Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On Fri, Oct 10, 2014 at 12:07:03AM +0400, Ilya Verbin wrote:

On 09 Oct 16:07, Ilya Verbin wrote:

+  /* By default the linker does not discard .gnu.offload_lto_* sections.  */
+  const char *linker_script = make_temp_file ("_linker_script.x");
+  FILE *stream = fopen (linker_script, "w");
+  if (!stream)
+    fatal_error ("fopen %s: %m", linker_script);
+  fprintf (stream, "SECTIONS { /DISCARD/ : { *("
+	   OFFLOAD_SECTION_NAME_PREFIX "*) } }\n");
+  fclose (stream);
+  printf ("%s\n", linker_script);
+
+  goto finish;
+}

Does this work with gold?  Are there any other linkers that support plugins but don't support linker scripts this way?

Oops, gold does not support scripts output from plugins:

error: SECTIONS seen after other input files; try -T/--script

Probably we should update the default linker scripts in binutils?  But without the latest ld/gold, all binaries compiled without -flto and with offloading will contain intermediate bytecode...

Actually, this issue is not due to outputting a script from a plugin; gold simply does not support partial linker scripts: https://sourceware.org/bugzilla/show_bug.cgi?id=17451
So it seems that discarding .gnu.offload_lto_* sections (like it is done for .gnu.lto_*) in the default ld and gold scripts is the right way?

I must say I'm not very familiar with the linker plugin API, but it surprises me that discarding sections is not something it allows.  Anyway, can you do the partial linker script for the bfd linker (is there a way to determine from the linker plugin API whether it is gold or bfd ld?), and for gold, for the time being, perhaps strip the sections in lto-wrapper and feed the ET_REL objects with the sections stripped back to the linker through the plugin API?

	Jakub
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
Adding Jakub.

-Maxim

On 10/09/2014 04:34 PM, Maxim Ostapenko wrote:

Hi,

After enabling the ASan, TSan and UBSan testsuites for an installed toolchain, many tests started to fail.  This is caused by wrong logic in the {asan,ubsan,tsan}_finish functions: they call restore_ld_library_path_env_vars, which is wrong because it drops some environment variables (GCC_EXEC_PREFIX, LD_LIBRARY_PATH, etc.) back to the state before gcc-dg.exp initialized the testing environment, so the installed GCC later fails to find some needed stuff.  Removing restore_ld_library_path_env_vars from {asan,ubsan,tsan}_finish seems to fix the issue.

Tested on x86_64-pc-linux-gnu, ok to commit?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-09  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 9769138..c98fd3c 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -132,7 +132,6 @@ proc asan_finish { args } {
 	    unset TEST_ALWAYS_FLAGS
 	}
     }
-    restore_ld_library_path_env_vars
 }

 # Symbolize lines like

diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index 54ec404..6f7a4d9 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -143,5 +143,4 @@ proc tsan_finish { args } {
     } else {
 	unset dg-do-what-default
     }
-    restore_ld_library_path_env_vars
 }

diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index 5a7a653..87c460f 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -114,5 +114,4 @@ proc ubsan_finish { args } {
 	    unset TEST_ALWAYS_FLAGS
 	}
     }
-    restore_ld_library_path_env_vars
 }
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On Fri, Oct 10, 2014 at 11:13:11AM +0400, Maxim Ostapenko wrote:

Adding Jakub.

-Maxim

On 10/09/2014 04:34 PM, Maxim Ostapenko wrote:

Hi,

After enabling the ASan, TSan and UBSan testsuites for an installed toolchain, many tests started to fail.  This is caused by wrong logic in the {asan,ubsan,tsan}_finish functions: they call restore_ld_library_path_env_vars, which is wrong because it drops some environment variables (GCC_EXEC_PREFIX, LD_LIBRARY_PATH, etc.) back to the state before gcc-dg.exp initialized the testing environment, so the installed GCC later fails to find some needed stuff.  Removing restore_ld_library_path_env_vars from {asan,ubsan,tsan}_finish seems to fix the issue.

Tested on x86_64-pc-linux-gnu, ok to commit?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-09  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

That looks wrong to me; we don't want to keep the libsanitizer paths in LD_LIBRARY_PATH* after we leave asan.exp etc.  So, perhaps instead save ld_library_path into some global variable (like {a,t,ub}san_saved_ld_library_path) during {a,t,ub}san_link_flags before appending anything there, and replace restore_ld_library_path_env_vars with

set ld_library_path ${a,t,ub}san_saved_ld_library_path
set_ld_library_path_env_vars

?

	Jakub
[PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Hi,

The patch enables EBX in the register allocator for x86 32-bit PIC mode.  It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html

Now there is a working version with a good performance and stability level; it could be a solid first step of EBX enabling.  Bootstrap and make check passed.  There are several changes in -m32 make check.

New pass:
gcc.target/i386/pr57003.c - before the patch there were not enough registers to PASS

New fails:
gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors, as we can do asm insertions with EBX
gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using a constant immediate
gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors, as there are enough registers

Performance on Silvermont (for the group of 3 patches):

At -Ofast -flto -funroll-loops -fPIC:
EEMBC benchs: +6% CoreMark, +3% Denmark; individual tests gained up to +25%.
SPEC2000int is almost unchanged, SPEC2000fp +1.7%.

At -O2 -fPIC:
EEMBC benchs: +2% CoreMark, +5% Denmark; individual tests gained up to +25%.
SPEC2000int and SPEC2000fp are almost unchanged.

ChangeLog:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* doc/tm.texi.in (TARGET_USE_PSEUDO_PIC_REG, TARGET_INIT_PIC_REG):
	Document.
	* doc/tm.texi: Regenerate.
	* gcc/function.c (assign_parms): Generate pseudo register for PIC.
	* gcc/init-regs.c (initialize_uninitialized_regs): Ignore pseudo
	PIC register.
	* gcc/ira-color.c (color_pass): Add check on pseudo register.
	* gcc/ira-emit.c (change_loop): Don't create copies for PIC pseudo
	register.
	* gcc/ira.c (split_live_ranges_for_shrink_wrap): Add check on pseudo
	register.
	(ira): Add target specific PIC register initialization.
	(do_reload): Keep PIC pseudo register.
	* gcc/lra-assigns.c (spill_for): Add checks on pseudo register.
	* gcc/lra-constraints.c (contains_symbol_ref_p): New.
	(lra_constraints): Enable lra risky transformations when PIC is
	pseudo register.
	* gcc/shrink-wrap.c (try_shrink_wrapping): Add check on pseudo
	register.
	* gcc/target.def (use_pseudo_pic_reg): New.
	(init_pic_reg): New.

enabling_ebx_general.patch
Description: Binary data
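The pic-1.c change above concerns inline asm that touches EBX: with the PIC base in a pseudo register, such asm is no longer rejected under -m32 -fPIC.  A minimal sketch of the kind of construct involved (hypothetical illustration, not the actual testcase, and guarded so it also builds on non-x86 hosts):

```c
#if defined (__i386__) || defined (__x86_64__)
/* The "b" constraint forces the output into %ebx/%rbx.  Before this
   patch series, claiming %ebx this way under -m32 -fPIC was an error,
   because %ebx was the fixed PIC base register.  */
static int
use_ebx (void)
{
  int x;
  __asm__ ("movl $1, %0" : "=b" (x));
  return x;
}
#else
/* Fallback so the sketch stays portable.  */
static int
use_ebx (void)
{
  return 1;
}
#endif
```

With a pseudo PIC register, the allocator can simply keep the PIC base elsewhere around the asm instead of erroring out.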
[PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
i386-specific part of the patch:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
	(ix86_init_pic_reg): New.
	(ix86_select_alt_pic_regnum): Add check on pseudo register.
	(ix86_save_reg): Likewise.
	(ix86_expand_prologue): Remove irrelevant code.
	(ix86_output_function_epilogue): Add check on pseudo register.
	(set_pic_reg_ever_alive): New.
	(legitimize_pic_address): Replace df_set_regs_ever_live with new
	set_pic_reg_ever_alive.
	(legitimize_tls_address): Likewise.
	(ix86_pic_register_p): New check.
	(ix86_delegitimize_address): Add check on pseudo register.
	(ix86_expand_call): Insert move from pseudo PIC register to ABI
	defined REAL_PIC_OFFSET_TABLE_REGNUM.
	(TARGET_INIT_PIC_REG): New.
	(TARGET_USE_PSEUDO_PIC_REG): New.
	(PIC_OFFSET_TABLE_REGNUM): New check.

enabling_ebx_i386.patch
Description: Binary data
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On Wed, Oct 8, 2014 at 9:08 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adds removal of user calls to chkp builtins which become useless after instrumentation.

Thanks,
Ilya
--
2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp.c (chkp_remove_useless_builtins): New.
	(chkp_execute): Remove useless calls to Pointer Bounds Checker
	builtins.

diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 5443950..b424af8 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -3805,6 +3805,49 @@ chkp_instrument_function (void)
     }
 }

+/* Find init/null/copy_ptr_bounds calls and replace them
+   with assignments.  It should allow better code
+   optimization.  */
+
+static void
+chkp_remove_useless_builtins ()
+{
+  basic_block bb, next;
+  gimple_stmt_iterator gsi;
+
+  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  do
+    {
+      next = bb->next_bb;

Please don't use ->next_bb but use FOR_EACH_BB_FN (bb, cfun) instead.

+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple stmt = gsi_stmt (gsi);
+	  tree fndecl;
+	  enum built_in_function fcode;
+
+	  /* Find builtins returning first arg and replace
+	     them with assignments.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL
+	      && (fndecl = gimple_call_fndecl (stmt))
+	      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+	      && (fcode = DECL_FUNCTION_CODE (fndecl))
+	      && (fcode == BUILT_IN_CHKP_INIT_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_NULL_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_COPY_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_SET_PTR_BOUNDS))
+	    {
+	      tree res = gimple_call_arg (stmt, 0);
+	      if (!update_call_from_tree (&gsi, res))
+		gimplify_and_update_call_from_tree (&gsi, res);

update_call_from_tree should always succeed with res being a call argument.

Richard.

+	      stmt = gsi_stmt (gsi);
+	      update_stmt (stmt);
+	    }
+	}
+      bb = next;
+    }
+  while (bb);
+}
+
 /* Initialize pass.  */

 static void
 chkp_init (void)
@@ -3872,6 +3915,8 @@ chkp_execute (void)

   chkp_instrument_function ();

+  chkp_remove_useless_builtins ();
+
   chkp_function_mark_instrumented (cfun->decl);

   chkp_fix_cfg ();
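For readers unfamiliar with the builtins being removed: each of them returns its first argument unchanged, and the remaining arguments only carry bounds information, which the instrumentation pass has already consumed by this point.  A small stand-in sketch of that "returns first arg" contract the pass exploits (the function below is an illustrative model, not one of the real __builtin___bnd_* entry points):

```c
/* Model of a chkp builtin such as a set/init/null/copy_ptr_bounds call:
   the bounds argument only feeds the instrumentation pass; the pointer
   result is the first argument unchanged.  */
static void *
bnd_builtin_model (void *ptr, unsigned long size)
{
  (void) size;   /* bounds info, already consumed by instrumentation */
  return ptr;    /* returns its first argument */
}

/* After chkp_remove_useless_builtins, a call of the form
       q = <chkp builtin> (p, n);
   degenerates to the plain assignment q = p, which later passes can
   copy-propagate away entirely.  */
```

This is why replacing the call with an assignment of the first argument is semantically a no-op once the bounds have been extracted.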
[PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
The patch improves performance when the previous two patches are applied.  It makes the RTL loop invariant behavior for GOT loads the same as it was before the two previous patches.  It improves 164.gzip (+9%) and 253.perlbmk (+2%), giving ~0.5% to SPEC2000int (compiled with "-m32 -Ofast -flto -funroll-loops -fPIC").

For example, in 164.gzip.  Before enabling EBX:

loop begin:
(I) 1. SI162 = prev (global address)
    2. SI163 = SI142 & 0xfff (SI142 modified in the loop)
    3. SI164 = EBX + SI162
    4. HI107 = SI163*2 + SI164
    5. SI142 = HI107

Only insn 1 is treated as invariant, and later combine propagates 2, 3, 4 into 5.

After enabling EBX:

loop begin:
(I) 1. SI163 = prev (global address)
    2. SI164 = SI142 & 0xfff (SI142 modified in the loop)
(I) 3. SI165 = SI143 + SI163
    4. HI107 = SI164*2 + SI165
    5. SI142 = HI107

Insns 1 and 3 are treated as invariants (143 is the GOT register) and hoisted outside the loop.  After that, the combine pass was unable to combine insns inside and outside the loop, which led to higher register pressure and therefore new spills/fills.

The patch fixes the x86 address cost so that the cost for addresses containing the GOT register becomes lower, as it was before enabling EBX.  In ix86_address_cost, the result of "REGNO (parts.base) >= FIRST_PSEUDO_REGISTER" for the hard ebx was always false.  The patch makes the condition give the same result when parts.base is the GOT register (and the same for parts.index).

2014-10-08  Evgeny Stupachenko  evstu...@gmail.com

	* gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
	when address contains GOT register.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b43e870..9d8cfd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool)
     cost++;

   if (parts.base
+      && (!pic_offset_table_rtx
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.base))
       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
       && parts.index
+      && (!pic_offset_table_rtx
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.index))
       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
       && parts.base != parts.index)
     cost++;
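A rough C model of the 164.gzip pattern discussed above (a hypothetical reconstruction from the RTL dump, not gzip's actual source): the table is a global array, so under -m32 -fPIC its address comes through the GOT, and whether the GOT-relative address computation is hoisted as a loop invariant decides how much register pressure the loop body carries.

```c
/* prev_model[] mirrors gzip's hash-chain table; the loop body is the
   SI142 = prev[SI142 & 0xfff] recurrence from the dump above.  Under
   -m32 -fPIC the address of prev_model is a GOT load, which is the
   loop-invariant insn 1 in the dump.  */
unsigned short prev_model[1 << 13];

unsigned
follow_chain (unsigned h, int n)
{
  while (n-- > 0)
    /* address = (invariant GOT-based base) + scaled index */
    h = prev_model[h & 0xfff];
  return h;
}
```

The patch's point is purely about costing: treating the GOT register like a pseudo in ix86_address_cost keeps such GOT-based addresses cheap, so invariant motion and combine make the same choices they made when EBX was a fixed register.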
RE: [PATCH] Clean up duplicated function seq_cost
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Thursday, October 09, 2014 5:21 PM
To: Zhenqiang Chen
Cc: GCC Patches
Subject: Re: [PATCH] Clean up duplicated function seq_cost

On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener richard.guent...@gmail.com wrote:

On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote:

Hi,

There are two implementations of seq_cost.  The function bodies are exactly the same.  The patch removes one of them and makes the other global.

Bootstrap and no make check regression on X86-64.  OK for trunk?

The prototype should go to cfgloopanal.c.

Err - cfgloopanal.h of course ;)  Or rather, the function sounds misplaced in cfgloopanal.c.

Thanks for the comments.  I think seq_cost should be in rtlanal.c, just like rtx_cost, insn_rtx_cost and so on.

ChangeLog:
2014-10-10  Zhenqiang Chen  zhenqiang.c...@arm.com

	* cfgloopanal.c (seq_cost): Delete.
	* rtl.h (seq_cost): New prototype.
	* rtlanal.c (seq_cost): New function.
	* tree-ssa-loop-ivopts.c (seq_cost): Delete.

diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 7ea1a5f..006b419 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -302,26 +302,6 @@ get_loop_level (const struct loop *loop)
   return mx;
 }

-/* Returns estimate on cost of computing SEQ.  */
-
-static unsigned
-seq_cost (const rtx_insn *seq, bool speed)
-{
-  unsigned cost = 0;
-  rtx set;
-
-  for (; seq; seq = NEXT_INSN (seq))
-    {
-      set = single_set (seq);
-      if (set)
-	cost += set_rtx_cost (set, speed);
-      else
-	cost++;
-    }
-
-  return cost;
-}
-
 /* Initialize the constants for computing set costs.  */

 void

diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..b697417 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *);
 extern bool keep_with_call_p (const rtx_insn *);
 extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
 extern int insn_rtx_cost (rtx, bool);
+extern unsigned seq_cost (const rtx_insn *, bool);

 /* Given an insn and condition, return a canonical description of
    the test being made.  */

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..93eda12 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5017,6 +5017,26 @@ insn_rtx_cost (rtx pat, bool speed)
   return cost > 0 ? cost : COSTS_N_INSNS (1);
 }

+/* Returns estimate on cost of computing SEQ.  */
+
+unsigned
+seq_cost (const rtx_insn *seq, bool speed)
+{
+  unsigned cost = 0;
+  rtx set;
+
+  for (; seq; seq = NEXT_INSN (seq))
+    {
+      set = single_set (seq);
+      if (set)
+	cost += set_rtx_cost (set, speed);
+      else
+	cost++;
+    }
+
+  return cost;
+}
+
 /* Given an insn INSN and condition COND, return the condition in a
    canonical form to simplify testing by callers.  Specifically:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 400798a..087ca26 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct iv_use *use,
   return NULL;
 }

-/* Returns estimate on cost of computing SEQ.  */
-
-static unsigned
-seq_cost (rtx_insn *seq, bool speed)
-{
-  unsigned cost = 0;
-  rtx set;
-
-  for (; seq; seq = NEXT_INSN (seq))
-    {
-      set = single_set (seq);
-      if (set)
-	cost += set_src_cost (SET_SRC (set), speed);
-      else
-	cost++;
-    }
-
-  return cost;
-}
-
 /* Produce DECL_RTL for object obj so it looks like it is stored in memory.  */

 static rtx
 produce_memory_decl_rtl (tree obj, int *regno)
Re: [patch] Fix miscompilation of gnat1 in LTO bootstrap
I can't see how this can work with LTO.  We need a middle-end way to represent the alias relation of those types.  At least I can't see how your simple patch covers all cases here?

It covers what I think is the most prominent case (unconstrained array types); the other cases are far less annoying in practice.

With LTO we preserve TYPE_ALIAS_SET == 0, so another way to fix this (and which I'd like more) is to do your patch in the Ada frontend, that is, use alias-set zero for all types you relate if flag_lto.

Really Big Hammer if you ask me.

Another way is to make LTO canonical type merging handle the case of type_contains_placeholder_p better, that is, by treating two such types as equivalent more easily.  For arrays this simply means hashing and comparing a non-constant TYPE_DOMAIN the same / as equal.  There is already some code handling PLACEHOLDER_EXPR specially, but it doesn't seem to be enough (why, in this case)?

Yes, gimple_canonical_types_compatible_p will return true for a couple of unconstrained (i.e. type_contains_placeholder_p) array types with the same component type.  The problem here is for an unconstrained array type and a constrained (regular) array type with the same component type.  Can I modify gimple_canonical_types_compatible_p to return true in that case as well?

--
Eric Botcazou
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On 10/10/2014 11:30 AM, Jakub Jelinek wrote:

On Fri, Oct 10, 2014 at 11:13:11AM +0400, Maxim Ostapenko wrote:

[Maxim's original mail and ChangeLog quoted as above.]

That looks wrong to me; we don't want to keep the libsanitizer paths in LD_LIBRARY_PATH* after we leave asan.exp etc.  So, perhaps instead save ld_library_path into some global variable (like {a,t,ub}san_saved_ld_library_path) during {a,t,ub}san_link_flags before appending there anything, and replace restore_ld_library_path_env_vars with

set ld_library_path ${a,t,ub}san_saved_ld_library_path
set_ld_library_path_env_vars

?

	Jakub

This works indeed.  However, calling set_ld_library_path_env_vars in {asan, tsan, ubsan}_finish will lead to updating LD_LIBRARY_PATH_{32,64}, LD_RUN_PATH etc. with $ld_library_path:$orig_ld_{library_path_32, library_path_64, run, etc}.  Is this fine?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-10  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_link_flags): Save ld_library_path.
	* lib/tsan-dg.exp (tsan_link_flags): Likewise.
	* lib/ubsan-dg.exp (ubsan_link_flags): Likewise.
	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.  Restore ld_library_path with
	saved value.  Restore LD_LIBRARY_PATH related env variables by
	calling set_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 9769138..4e8b4d6 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -47,11 +47,13 @@ proc asan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global asan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set asan_saved_library_path $ld_library_path

     if { $gccpath != "" } {
       if { [file exists ${gccpath}/libsanitizer/asan/.libs/libasan.a]
@@ -122,6 +124,8 @@ proc asan_finish { args } {
     global TEST_ALWAYS_FLAGS
     global asan_saved_TEST_ALWAYS_FLAGS
     global asan_saved_ALWAYS_CXXFLAGS
+    global asan_saved_library_path
+    global ld_library_path

     if [info exists asan_saved_ALWAYS_CXXFLAGS ] {
	set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS
@@ -132,7 +136,8 @@ proc asan_finish { args } {
	    unset TEST_ALWAYS_FLAGS
	}
     }
-    restore_ld_library_path_env_vars
+    set ld_library_path $asan_saved_library_path
+    set_ld_library_path_env_vars
 }

 # Symbolize lines like

diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index 54ec404..77c85ff 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -32,11 +32,13 @@ proc tsan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global tsan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set tsan_saved_library_path $ld_library_path

     if { $gccpath != "" } {
       if { [file exists ${gccpath}/libsanitizer/tsan/.libs/libtsan.a]
@@ -127,6 +129,8 @@ proc tsan_finish { args } {
     global tsan_saved_ALWAYS_CXXFLAGS
     global dg-do-what-default
     global tsan_saved_dg-do-what-default
+    global tsan_saved_library_path
+    global ld_library_path

     if [info exists tsan_saved_ALWAYS_CXXFLAGS ] {
	set ALWAYS_CXXFLAGS $tsan_saved_ALWAYS_CXXFLAGS
@@ -143,5 +147,6 @@ proc tsan_finish { args } {
     } else {
	unset dg-do-what-default
     }
-    restore_ld_library_path_env_vars
+    set ld_library_path $tsan_saved_library_path
+    set_ld_library_path_env_vars
 }

diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index 5a7a653..3bfdcc8 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -32,11 +32,13 @@ proc ubsan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global ubsan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On Fri, Oct 10, 2014 at 01:09:00PM +0400, Maxim Ostapenko wrote:

This works indeed.  However, calling set_ld_library_path_env_vars in {asan, tsan, ubsan}_finish will lead to updating LD_LIBRARY_PATH_{32,64}, LD_RUN_PATH etc. with $ld_library_path:$orig_ld_{library_path_32, library_path_64, run, etc}.  Is this fine?

Isn't that the state in which asan_init has been called?  I mean, all of asan.exp, tsan.exp and ubsan.exp source gcc-dg.exp, which does:

global orig_environment_saved

# This file may be sourced, so don't override environment settings
# that have been previously setup.
if { $orig_environment_saved == 0 } {
    append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
    set_ld_library_path_env_vars
}

so by the time asan_init etc. is called, it should have been in LD_LIBRARY_PATH* etc. already.

2014-10-10  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_link_flags): Save ld_library_path.
	* lib/tsan-dg.exp (tsan_link_flags): Likewise.
	* lib/ubsan-dg.exp (ubsan_link_flags): Likewise.
	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.  Restore ld_library_path with
	saved value.  Restore LD_LIBRARY_PATH related env variables by
	calling set_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

I'd rather duplicate the descriptions than duplicate the filenames in the same ChangeLog entry.  So please put all asan-dg.exp changes together, etc.  Ok with that change.

	Jakub
Re: Towards GNU11
On Thu, Oct 09, 2014 at 02:34:51PM -0700, Mike Stump wrote:

On Oct 7, 2014, at 2:07 PM, Marek Polacek pola...@redhat.com wrote:

I'd like to kick off a discussion about moving the default standard for C from gnu89 to gnu11.

I endorse the change of default.

Thanks for chiming in.

A wiki page that lists the types of changes people hit in real code, with how to fix them, would be useful and helpful.  It might be nice to have this in the document, but I don't know if people want to do that much work.  The wiki site is nice: if others do world builds, they can then easily add whatever they hit, which makes it even more complete.  This is a nice-to-have; I don't think the work going in should be gated on it.

Yeah.  I plan to write something into the porting-to document.

Two comments: thank you for all your hard work, and yes please.

Thank you very much!

	Marek
Re: [PATCH] Implement -fsanitize=object-size
On Thu, Oct 02, 2014 at 02:04:24PM +0200, Jakub Jelinek wrote:

Looks much better.

Cool.

There are some nits I'd change, like:
1) no need not to handle bitfields
2) IMHO it should handle PARM_DECL and RESULT_DECL alongside of VAR_DECL
3) decl_p IMHO should use just DECL_P
4) it doesn't make sense to build ADDR_EXPR if you are never going to use it

All these look good - I applied the patch.

Attaching some (incomplete) testcase I've used to look at the implementation.  E.g. f1 is for testing how it plays together with -fsanitize=bounds (perhaps, maybe as a follow-up, we could optimize by looking for a UBSAN_BOUNDS call with the expected arguments a few statements before the array access (e.g. remember the last UBSAN_BOUNDS seen in the same basic block) if inner is DECL_P), f2 shows a case which -fsanitize=bounds doesn't instrument but -fsanitize=object-size should, f3/f4 are the same with PARM_DECL, f5/f6 an unsuccessful attempt for RESULT_DECL, f7/f8 are bitfield tests, f9 something where __bos is folded very early (already before the objsz1 pass), f10 where __bos is folded during the objsz1 pass.  If you want to turn parts of that testcase into real ubsan tests, go ahead.

Other than that, if you are fine with the following changes, can you incorporate them into the patch and retest?

I created a new testcase based on that, thanks.

I couldn't test bootstrap-ubsan, because of:

error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO

but I remember that the previous version of the patch passed fine.

I had to add the following

+  if (LOCATION_LOCUS (loc) == UNKNOWN_LOCATION)
+    loc = input_location;

The reason is that on pr59250.C we ICEd, since we hit this assert in tree-object-size.c:pass_object_sizes::execute:

gcc_assert (TREE_CODE (result) == INTEGER_CST);

because fold_call_stmt returned an INTEGER_CST wrapped in a NOP_EXPR.  This NOP_EXPR is created in fold_builtin_n so we can put a location on it, and should be discarded in fold_call_stmt:

      ret = fold_builtin_n (loc, fndecl, args, nargs, ignore);
      if (ret)
	{
	  /* Propagate location information from original call to
	     expansion of builtin.  Otherwise things like
	     maybe_emit_chk_warning, that operate on the expansion of
	     a builtin, will use the wrong location information.  */
	  if (gimple_has_location (stmt))
	    {
	      tree realret = ret;
	      if (TREE_CODE (ret) == NOP_EXPR)
		realret = TREE_OPERAND (ret, 0);

However, STMT was without location.  We set the location properly in instrument_object_size, but in this case the location was UNKNOWN_LOCATION, so for UNKNOWN_LOCATION I use input_location.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-09  Marek Polacek  pola...@redhat.com

	* asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE.
	* doc/invoke.texi: Document -fsanitize=object-size.
	* flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE
	and or it into SANITIZE_UNDEFINED.
	* gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE.
	* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function.
	* internal-fn.def (UBSAN_OBJECT_SIZE): Define.
	* opts.c (common_handle_option): Handle -fsanitize=object-size.
	* ubsan.c: Include tree-object-size.h.
	(ubsan_type_descriptor): Call tree_to_uhwi instead of tree_to_shwi.
	(ubsan_expand_bounds_ifn): Use false instead of 0.
	(ubsan_expand_objsize_ifn): New function.
	(instrument_object_size): New function.
	(pass_ubsan::execute): Add object size instrumentation.
	* ubsan.h (ubsan_expand_objsize_ifn): Declare.

testsuite/
	* c-c++-common/ubsan/object-size-1.c: New test.
	* c-c++-common/ubsan/object-size-2.c: New test.
	* c-c++-common/ubsan/object-size-3.c: New test.
	* c-c++-common/ubsan/object-size-4.c: New test.
	* c-c++-common/ubsan/object-size-5.c: New test.
	* c-c++-common/ubsan/object-size-6.c: New test.
	* c-c++-common/ubsan/object-size-7.c: New test.
	* c-c++-common/ubsan/object-size-8.c: New test.
	* c-c++-common/ubsan/object-size-9.c: New test.
	* g++.dg/ubsan/object-size-1.C: New test.
	* gcc.dg/ubsan/object-size-9.c: New test.

diff --git gcc/asan.c gcc/asan.c
index fca4ee6..6ea3efe 100644
--- gcc/asan.c
+++ gcc/asan.c
@@ -2879,6 +2879,9 @@ pass_sanopt::execute (function *fun)
	      case IFN_UBSAN_BOUNDS:
		no_next = ubsan_expand_bounds_ifn (&gsi);
		break;
+	      case IFN_UBSAN_OBJECT_SIZE:
+		no_next = ubsan_expand_objsize_ifn (&gsi);
+		break;
	      case IFN_ASAN_CHECK:
		{
		  no_next =
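As a reminder of the semantics the new IFN_UBSAN_OBJECT_SIZE check leans on, here is a small sketch of the long-standing __builtin_object_size builtin itself: the sanitizer compares the accessed offset against exactly this value, and an all-ones result means "unknown, don't check".

```c
struct S { int a; char buf[8]; };

static struct S s;

/* Mode 0 measures to the end of the whole enclosing object; mode 1
   measures to the end of the closest enclosing subobject.  A result of
   (size_t) -1 would mean the size could not be determined.  */
static unsigned long
whole_object (void)
{
  return __builtin_object_size (&s, 0);
}

static unsigned long
subobject (void)
{
  return __builtin_object_size (s.buf, 1);
}
```

For a static object like s both calls fold to compile-time constants, which is the common case the gimple-fold change above exploits to drop the check entirely.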
Re: [PATCH] Implement -fsanitize=object-size
On Fri, Oct 10, 2014 at 12:04:08PM +0200, Marek Polacek wrote:

I couldn't test bootstrap-ubsan, because of:

error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO

but I remember that the previous version of the patch passed fine.

We build (intentionally) both the libubsan.so.* objects and the libubsan.a objects with -fPIC, but don't build the latter with -DPIC.  I guess we need to now: with -static-libubsan, libubsan.a is linked into shared libraries statically, and we definitely can't use .preinit_array in that case.  So, I think (untested) something like:

2014-10-10  Jakub Jelinek  ja...@redhat.com

	* ubsan/Makefile.am (DEFS): Add -DPIC.
	* ubsan/Makefile.in: Regenerated.

--- libsanitizer/ubsan/Makefile.am	2014-09-24 11:08:04.183026156 +0200
+++ libsanitizer/ubsan/Makefile.am	2014-10-10 12:15:19.124247283 +0200
@@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_
 # May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)

-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4

should fix this.

2014-10-09  Marek Polacek  pola...@redhat.com

Check the date ;)

	* asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE.
	* doc/invoke.texi: Document -fsanitize=object-size.
	* flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE
	and or it into SANITIZE_UNDEFINED.
	* gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE.
	* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function.
	* internal-fn.def (UBSAN_OBJECT_SIZE): Define.
	* opts.c (common_handle_option): Handle -fsanitize=object-size.
	* ubsan.c: Include tree-object-size.h.

I'd avoid the s.

--- gcc/gimple-fold.c
+++ gcc/gimple-fold.c
@@ -2662,6 +2662,19 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
				      gimple_call_arg (stmt, 1),
				      gimple_call_arg (stmt, 2));
	  break;
+	case IFN_UBSAN_OBJECT_SIZE:
+	  if (integer_all_onesp (gimple_call_arg (stmt, 2))

Formatting on the case line, there should be a tab.

+      gcc_assert (TREE_CODE (size) == INTEGER_CST);
+      /* See if we can discard the check.  */
+      if (integer_all_onesp (size))
+	/* Yes, __builtin_object_size couldn't determine the
+	   object size.  */;

I'd just treat TREE_CODE (size) != INTEGER_CST the same as integer_all_onesp.  It is very likely you'll get an INTEGER_CST there, but I'd be afraid that if somebody disables ccp, forwprop and similar optimizations, you might be unlucky enough to actually have an SSA_NAME there instead.

Ok with those changes.  After commit, please update gcc-5/changes.html.  Thanks.

	Jakub
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. Couple of nits below. +/* Set regs_ever_live for PIC base address register + to true if required. */ +static void +set_pic_reg_ever_alive () Please rename this function to set_pic_reg_ever_live. -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. OK for mainline after infrastructure patch is approved. Thanks, Uros.
[PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
Hi! This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-10-10 Jakub Jelinek ja...@redhat.com PR tree-optimization/63464 * tree-switch-conversion.c (struct case_bit_test): Remove hi and lo fields, add wide_int mask field. (emit_case_bit_tests): Add MAXVAL argument, rewrite uses of hi/lo fields into wide_int mask operations, optimize by pretending minval to be 0 if maxval is small enough. (process_switch): Adjust caller. --- gcc/tree-switch-conversion.c.jj 2014-08-01 09:24:12.000000000 +0200 +++ gcc/tree-switch-conversion.c 2014-10-07 12:36:29.438181857 +0200 @@ -246,8 +246,7 @@ This transformation was contributed by R struct case_bit_test { - HOST_WIDE_INT hi; - HOST_WIDE_INT lo; + wide_int mask; edge target_edge; tree label; int bits; @@ -294,13 +293,14 @@ case_bit_test_cmp (const void *p1, const are not guaranteed to be of the same type as INDEX_EXPR (the gimplifier doesn't change the type of case label values, and MINVAL and RANGE are derived from those values). +MAXVAL is MINVAL + RANGE. There *MUST* be MAX_CASE_BIT_TESTS or less unique case node targets.
*/ static void emit_case_bit_tests (gimple swtch, tree index_expr, -tree minval, tree range) +tree minval, tree range, tree maxval) { struct case_bit_test test[MAX_CASE_BIT_TESTS]; unsigned int i, j, k; @@ -324,6 +324,8 @@ emit_case_bit_tests (gimple swtch, tree tree word_type_node = lang_hooks.types.type_for_mode (word_mode, 1); tree word_mode_zero = fold_convert (word_type_node, integer_zero_node); tree word_mode_one = fold_convert (word_type_node, integer_one_node); + int prec = TYPE_PRECISION (word_type_node); + wide_int wone = wi::one (prec); memset (test, 0, sizeof (test)); @@ -348,8 +350,7 @@ emit_case_bit_tests (gimple swtch, tree if (k == count) { gcc_checking_assert (count < MAX_CASE_BIT_TESTS); - test[k].hi = 0; - test[k].lo = 0; + test[k].mask = wi::zero (prec); test[k].target_edge = e; test[k].label = label; test[k].bits = 1; @@ -367,14 +368,39 @@ emit_case_bit_tests (gimple swtch, tree CASE_HIGH (cs), minval)); for (j = lo; j <= hi; j++) -if (j >= HOST_BITS_PER_WIDE_INT) - test[k].hi |= (HOST_WIDE_INT) 1 << (j - HOST_BITS_PER_INT); - else - test[k].lo |= (HOST_WIDE_INT) 1 << j; + test[k].mask |= wi::lshift (wone, j); } qsort (test, count, sizeof (*test), case_bit_test_cmp); + /* If all values are in the 0 .. BITS_PER_WORD-1 range, we can get rid of + the minval subtractions, but it might make the mask constants more + expensive. So, compare the costs.
*/ + if (compare_tree_int (minval, 0) > 0 + && compare_tree_int (maxval, GET_MODE_BITSIZE (word_mode)) < 0) +{ + int cost_diff; + HOST_WIDE_INT m = tree_to_uhwi (minval); + rtx reg = gen_raw_REG (word_mode, 1); + bool speed_p = optimize_bb_for_speed_p (gimple_bb (swtch)); + cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg, + GEN_INT (-m)), speed_p); + for (i = 0; i < count; i++) + { + rtx r = immed_wide_int_const (test[i].mask, word_mode); + cost_diff += set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + r = immed_wide_int_const (wi::lshift (test[i].mask, m), word_mode); + cost_diff -= set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + } + if (cost_diff > 0) + { + for (i = 0; i < count; i++) + test[i].mask = wi::lshift (test[i].mask, m); + minval = build_zero_cst (TREE_TYPE (minval)); + range = maxval; + } +} + /* We generate two jumps to the default case label. Split the default edge, so that we don't have to do any PHI node updating. */ @@ -446,13 +472,7 @@ emit_case_bit_tests (gimple swtch, tree if (const & csui) goto target */ for (k = 0; k < count; k++) { - HOST_WIDE_INT a[2]; - - a[0] = test[k].lo; - a[1] = test[k].hi; - tmp = wide_int_to_tree (word_type_node, - wide_int::from_array (a, 2, - TYPE_PRECISION (word_type_node))); + tmp = wide_int_to_tree (word_type_node, test[k].mask); tmp = fold_build2 (BIT_AND_EXPR, word_type_node, csui, tmp); tmp = force_gimple_operand_gsi (gsi, tmp,
[PATCH] Fix up _Alignof with user aligned types (PR c/63495)
Hi! As the testcase shows, _Alignof can sometimes return a smaller number than the minimum alignment. That is because when laying out structures, fields with types with TYPE_USER_ALIGN set have also DECL_USER_ALIGN set and therefore neither BIGGEST_FIELD_ALIGNMENT nor ADJUST_FIELD_ALIGN is applied to them, but min_align_of_type was applying that unconditionally. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-10-10 Jakub Jelinek ja...@redhat.com PR c/63495 * stor-layout.c (min_align_of_type): Don't decrease alignment through BIGGEST_FIELD_ALIGNMENT or ADJUST_FIELD_ALIGN if TYPE_USER_ALIGN is set. * gcc.target/i386/pr63495.c: New test. --- gcc/stor-layout.c.jj 2014-09-01 09:43:56.000000000 +0200 +++ gcc/stor-layout.c 2014-10-09 09:56:49.760893630 +0200 @@ -2400,17 +2400,19 @@ min_align_of_type (tree type) { unsigned int align = TYPE_ALIGN (type); align = MIN (align, BIGGEST_ALIGNMENT); + if (!TYPE_USER_ALIGN (type)) +{ #ifdef BIGGEST_FIELD_ALIGNMENT - align = MIN (align, BIGGEST_FIELD_ALIGNMENT); + align = MIN (align, BIGGEST_FIELD_ALIGNMENT); #endif - unsigned int field_align = align; + unsigned int field_align = align; #ifdef ADJUST_FIELD_ALIGN - tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, - type); - field_align = ADJUST_FIELD_ALIGN (field, field_align); - ggc_free (field); + tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, type); + field_align = ADJUST_FIELD_ALIGN (field, field_align); + ggc_free (field); #endif - align = MIN (align, field_align); + align = MIN (align, field_align); +} return align / BITS_PER_UNIT; } --- gcc/testsuite/gcc.target/i386/pr63495.c.jj 2014-10-09 09:59:46.036634365 +0200 +++ gcc/testsuite/gcc.target/i386/pr63495.c 2014-10-09 10:01:09.231096165 +0200 @@ -0,0 +1,6 @@ +/* PR c/63495 */ +/* { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } } */ +/* { dg-options "-std=gnu11" } */ + +struct __attribute__ ((aligned (8))) S { char c; }; +_Static_assert (_Alignof (struct S) >= 8, "wrong alignment"); Jakub
[AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. commit ed8a64d6a7a23919b6b6d3a76cad90b9eae8a530 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 16:28:25 2014 + [AArch64] Add --enable-fix-cortex-a53-835769 configure switch. 2014-10-08 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 76a2480..db5ff59 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6387,6 +6387,15 @@ aarch64_override_options (void) aarch64_tune = selected_tune-core; aarch64_tune_params = selected_tune-tune; + if (aarch64_fix_a53_err835769 == 2) +{ +#ifdef TARGET_FIX_ERR_A53_835769_DEFAULT + aarch64_fix_a53_err835769 = 1; +#else + aarch64_fix_a53_err835769 = 0; +#endif +} + aarch64_override_options_after_change (); } diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 77deb2e..fc0307e 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -68,7 +68,7 @@ Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Generate code which uses only the general registers mfix-cortex-a53-835769 -Target Report Var(aarch64_fix_a53_err835769) Init(0) +Target Report Var(aarch64_fix_a53_err835769) Init(2) Workaround for ARM Cortex-A53 Erratum number 835769 mlittle-endian diff --git a/gcc/configure b/gcc/configure index 380a235..bd1215d 100755 --- a/gcc/configure +++ b/gcc/configure @@ -920,6 +920,7 @@ with_plugin_ld enable_gnu_indirect_function enable_initfini_array enable_comdat +enable_fix_cortex_a53_835769 with_glibc_version enable_gnu_unique_object enable_linker_build_id @@ -1638,6 +1639,14 @@ Optional Features: glibc systems --enable-initfini-array use .init_array/.fini_array sections --enable-comdat enable COMDAT group support + + --enable-fix-cortex-a53-835769 + enable workaround for AArch64 Cortex-A53 erratum + 835769 by default + --disable-fix-cortex-a53-835769 + disable workaround for AArch64 Cortex-A53 erratum + 835769 by default + --enable-gnu-unique-object enable the use of the @gnu_unique_object ELF extension on glibc systems @@ -18049,7 +18058,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18052 configure +#line 18061 configure #include 
confdefs.h #if HAVE_DLFCN_H @@ -18155,7 +18164,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18158 configure +#line 18167 configure #include confdefs.h #if HAVE_DLFCN_H @@ -23979,6 +23988,25 @@ $as_echo #define HAVE_AS_MABI_OPTION 1 confdefs.h done fi fi +# Enable default workaround for AArch64 Cortex-A53 erratum 835769. +# Check whether --enable-fix-cortex-a53-835769 was given. +if test ${enable_fix_cortex_a53_835769+set} = set; then : + enableval=$enable_fix_cortex_a53_835769; +case $enableval in + yes) +tm_defines=${tm_defines} TARGET_FIX_ERR_A53_835769_DEFAULT=1 +;; + no) +;; + *) +as_fn_error '$enableval' is an invalid value for --enable-fix-cortex-a53-835769.\ + Valid choices are 'yes' and 'no'. $LINENO 5 +;; + +esac + +fi + ;; # All TARGET_ABI_OSF targets. diff --git a/gcc/configure.ac b/gcc/configure.ac index eb480de..8f7c814 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3526,6 +3526,29 @@ case $target in done fi fi +# Enable default workaround for AArch64 Cortex-A53 erratum 835769. +AC_ARG_ENABLE(fix-cortex-a53-835769, +[
[PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. (aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options. 
commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 12:48:34 2014 + [AArch64] Add final prescan workaround for Cortex-A53 erratum. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index b5f53d2..c57a467 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl, extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); +extern bool aarch64_madd_needs_nop (rtx_insn *); +extern void aarch64_final_prescan_insn (rtx_insn *); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5144c35..76a2480 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type) return NULL; } +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} + +/* Find the first rtx_insn before insn that will generate an assembly + instruction. */ + +static rtx_insn * +aarch64_prev_real_insn (rtx_insn *insn) +{ + if (!insn) +return NULL; + + do +{ + insn = prev_real_insn (insn); +} + while (insn && recog_memoized (insn) < 0); + + return insn; +} + +static bool +is_madd_op (enum attr_type t1) +{ + unsigned int i; + /* A number of these may be AArch32 only. */ + enum attr_type mlatypes[] = { +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD, +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY, +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD + }; + + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++) +{ + if (t1 == mlatypes[i]) + return true; +} + + return false; +} + +/* Check if there is a register dependency between a load and the insn + for which we hold recog_data. */ + +static bool +dep_between_memop_and_curr (rtx memop) +{ + rtx load_reg; + int opno; + + if (!memop) +return false; + + if (!REG_P (SET_DEST (memop))) +return false; + + load_reg = SET_DEST (memop); + for (opno = 0; opno < recog_data.n_operands; opno++) +{ + rtx operand = recog_data.operand[opno]; + if (REG_P (operand) + && reg_overlap_mentioned_p (load_reg, operand)) +return true; + +} + return false; +} + +bool +aarch64_madd_needs_nop (rtx_insn* insn) +{ + enum attr_type attr_type; + rtx_insn *prev; + rtx body; + + if
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: Hi! This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? Also testing for RTX costs this early sounds bogus. Thanks, Richard. 2014-10-10 Jakub Jelinek ja...@redhat.com PR tree-optimization/63464 * tree-switch-conversion.c (struct case_bit_test): Remove hi and lo fields, add wide_int mask field. (emit_case_bit_tests): Add MAXVAL argument, rewrite uses of hi/lo fields into wide_int mask operations, optimize by pretending minval to be 0 if maxval is small enough. (process_switch): Adjust caller. --- gcc/tree-switch-conversion.c.jj 2014-08-01 09:24:12.000000000 +0200 +++ gcc/tree-switch-conversion.c 2014-10-07 12:36:29.438181857 +0200 @@ -246,8 +246,7 @@ This transformation was contributed by R struct case_bit_test { - HOST_WIDE_INT hi; - HOST_WIDE_INT lo; + wide_int mask; edge target_edge; tree label; int bits; @@ -294,13 +293,14 @@ case_bit_test_cmp (const void *p1, const are not guaranteed to be of the same type as INDEX_EXPR (the gimplifier doesn't change the type of case label values, and MINVAL and RANGE are derived from those values). +MAXVAL is MINVAL + RANGE. There *MUST* be MAX_CASE_BIT_TESTS or less unique case node targets.
*/ static void emit_case_bit_tests (gimple swtch, tree index_expr, - tree minval, tree range) + tree minval, tree range, tree maxval) { struct case_bit_test test[MAX_CASE_BIT_TESTS]; unsigned int i, j, k; @@ -324,6 +324,8 @@ emit_case_bit_tests (gimple swtch, tree tree word_type_node = lang_hooks.types.type_for_mode (word_mode, 1); tree word_mode_zero = fold_convert (word_type_node, integer_zero_node); tree word_mode_one = fold_convert (word_type_node, integer_one_node); + int prec = TYPE_PRECISION (word_type_node); + wide_int wone = wi::one (prec); memset (test, 0, sizeof (test)); @@ -348,8 +350,7 @@ emit_case_bit_tests (gimple swtch, tree if (k == count) { gcc_checking_assert (count < MAX_CASE_BIT_TESTS); - test[k].hi = 0; - test[k].lo = 0; + test[k].mask = wi::zero (prec); test[k].target_edge = e; test[k].label = label; test[k].bits = 1; @@ -367,14 +368,39 @@ emit_case_bit_tests (gimple swtch, tree CASE_HIGH (cs), minval)); for (j = lo; j <= hi; j++) -if (j >= HOST_BITS_PER_WIDE_INT) - test[k].hi |= (HOST_WIDE_INT) 1 << (j - HOST_BITS_PER_INT); - else - test[k].lo |= (HOST_WIDE_INT) 1 << j; + test[k].mask |= wi::lshift (wone, j); } qsort (test, count, sizeof (*test), case_bit_test_cmp); + /* If all values are in the 0 .. BITS_PER_WORD-1 range, we can get rid of + the minval subtractions, but it might make the mask constants more + expensive. So, compare the costs.
*/ + if (compare_tree_int (minval, 0) > 0 + && compare_tree_int (maxval, GET_MODE_BITSIZE (word_mode)) < 0) +{ + int cost_diff; + HOST_WIDE_INT m = tree_to_uhwi (minval); + rtx reg = gen_raw_REG (word_mode, 1); + bool speed_p = optimize_bb_for_speed_p (gimple_bb (swtch)); + cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg, + GEN_INT (-m)), speed_p); + for (i = 0; i < count; i++) + { + rtx r = immed_wide_int_const (test[i].mask, word_mode); + cost_diff += set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + r = immed_wide_int_const (wi::lshift (test[i].mask, m), word_mode); + cost_diff -= set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + } + if (cost_diff > 0) + { + for (i = 0; i < count; i++) + test[i].mask = wi::lshift (test[i].mask, m); + minval = build_zero_cst (TREE_TYPE (minval)); + range = maxval; + } +} + /* We generate two jumps to the default case label. Split the default edge, so that we don't have to do any PHI node updating. */ @@ -446,13 +472,7 @@ emit_case_bit_tests (gimple swtch, tree if (const & csui) goto target */ for (k = 0; k < count; k++) { - HOST_WIDE_INT a[2]; - - a[0] = test[k].lo; - a[1] = test[k].hi; - tmp = wide_int_to_tree (word_type_node, - wide_int::from_array (a, 2, - TYPE_PRECISION (word_type_node))); + tmp =
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, Oct 10, 2014 at 12:55:21PM +0200, Richard Biener wrote: On Fri, 10 Oct 2014, Jakub Jelinek wrote: This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? The && always has to mean separate basic blocks, as otherwise it is undefined behavior. I'd think this optimization would be too specialized for a general optimization, and I am unsure in which pass it would be desirable. Also testing for RTX costs this early sounds bogus. Well, the bit test optimization is already decided based on other rtx costs. Jakub
[PATCH] Fix PR63476
This fixes PRE not keeping virtual SSA form up-to-date during insertion and thus eventual VOP walks from the devirt code during elimination. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-10-10 Richard Biener rguent...@suse.de PR tree-optimization/63476 * tree-ssa-pre.c (struct bb_bitmap_sets): Add vop_on_exit member. (BB_LIVE_VOP_ON_EXIT): New define. (create_expression_by_pieces): Assign VUSEs to stmts. (compute_avail): Track BB_LIVE_VOP_ON_EXIT. (pass_pre::execute): Assert virtual SSA form is up-to-date after insertion. * g++.dg/torture/pr63476.C: New testcase. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 216037) +++ gcc/tree-ssa-pre.c (working copy) @@ -423,6 +423,9 @@ typedef struct bb_bitmap_sets /* A cache for value_dies_in_block_x. */ bitmap expr_dies; + /* The live virtual operand on successor edges. */ + tree vop_on_exit; + /* True if we have visited this block during ANTIC calculation. */ unsigned int visited : 1; @@ -440,6 +443,7 @@ typedef struct bb_bitmap_sets #define EXPR_DIES(BB) ((bb_value_sets_t) ((BB)->aux))->expr_dies #define BB_VISITED(BB) ((bb_value_sets_t) ((BB)->aux))->visited #define BB_MAY_NOTRETURN(BB) ((bb_value_sets_t) ((BB)->aux))->contains_may_not_return_call +#define BB_LIVE_VOP_ON_EXIT(BB) ((bb_value_sets_t) ((BB)->aux))->vop_on_exit /* Basic block list in postorder.
*/ @@ -2886,12 +2890,15 @@ create_expression_by_pieces (basic_block bitmap_value_replace_in_set (NEW_SETS (block), nameexpr); bitmap_value_replace_in_set (AVAIL_OUT (block), nameexpr); } + + gimple_set_vuse (stmt, BB_LIVE_VOP_ON_EXIT (block)); } gimple_seq_add_seq (stmts, forced_stmts); } name = make_temp_ssa_name (exprtype, NULL, "pretmp"); newstmt = gimple_build_assign (name, folded); + gimple_set_vuse (newstmt, BB_LIVE_VOP_ON_EXIT (block)); gimple_set_plf (newstmt, NECESSARY, false); gimple_seq_add_stmt (stmts, newstmt); @@ -3593,6 +3623,9 @@ compute_avail (void) son = next_dom_son (CDI_DOMINATORS, son)) worklist[sp++] = son; + BB_LIVE_VOP_ON_EXIT (ENTRY_BLOCK_PTR_FOR_FN (cfun)) + = ssa_default_def (cfun, gimple_vop (cfun)); + /* Loop until the worklist is empty. */ while (sp) { @@ -3607,7 +3640,10 @@ compute_avail (void) its immediate dominator. */ dom = get_immediate_dominator (CDI_DOMINATORS, block); if (dom) - bitmap_set_copy (AVAIL_OUT (block), AVAIL_OUT (dom)); + { + bitmap_set_copy (AVAIL_OUT (block), AVAIL_OUT (dom)); + BB_LIVE_VOP_ON_EXIT (block) = BB_LIVE_VOP_ON_EXIT (dom); + } /* Generate values for PHI nodes. */ for (gsi = gsi_start_phis (block); !gsi_end_p (gsi); gsi_next (gsi)) @@ -3617,7 +3653,10 @@ compute_avail (void) /* We have no need for virtual phis, as they don't represent actual computations. */ if (virtual_operand_p (result)) - continue; + { + BB_LIVE_VOP_ON_EXIT (block) = result; + continue; + } pre_expr e = get_or_alloc_expr_for_name (result); add_to_value (get_expr_value_id (e), e); @@ -3661,6 +3700,9 @@ compute_avail (void) bitmap_value_insert_into_set (AVAIL_OUT (block), e); } + if (gimple_vdef (stmt)) + BB_LIVE_VOP_ON_EXIT (block) = gimple_vdef (stmt); + if (gimple_has_side_effects (stmt) || stmt_could_throw_p (stmt) || is_gimple_debug (stmt)) @@ -4758,6 +4794,10 @@ pass_pre::execute (function *fun) remove_fake_exit_edges (); gsi_commit_edge_inserts (); + /* Eliminate folds statements which might (should not...)
end up + not keeping virtual operands up-to-date. */ + gcc_assert (!need_ssa_update_p (fun)); + /* Remove all the redundant expressions. */ todo |= eliminate (true); Index: gcc/testsuite/g++.dg/torture/pr63476.C === --- gcc/testsuite/g++.dg/torture/pr63476.C (revision 0) +++ gcc/testsuite/g++.dg/torture/pr63476.C (working copy) @@ -0,0 +1,39 @@ +// { dg-do compile } +// { dg-additional-options "-std=gnu++11" } + +enum class nsresult; +class A; +class B +{ +public: +B (int); +A *operator->(); +}; +class C +{ +}; +class A +{ +public: +virtual nsresult AddObserver (const char *, C *, bool) = 0; +}; +class D : A +{ + nsresult + AddObserver (const char *p1, C *p2, bool p3) + { + AddObserver (p1, p2, p3); + } +}; +char *prefList[]{}; +class F : C +{ + nsresult Install (); +}; +nsresult +F::Install () +{ + B branch = 0; + for (int i;;) +branch->AddObserver (prefList[i], this, false); +}
[PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote: This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that resulted in: [...] As said in the audit trail of the bugreport I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and amended testcase. The testcase and analysis will show a native read passing a possibly aliasing store. Attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P references as never aliasing. 2014-10-10 Uros Bizjak ubiz...@gmail.com * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P references when alignment ANDs are involved. (write_dependence_p): Ditto. (may_alias_p): Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in the gfortran testsuite due to an independent RTL infrastructure problem with VALUEs leaking into aliasing detecting functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to the mainline early next week. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483 [2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475 Uros. Index: alias.c === --- alias.c (revision 216025) +++ alias.c (working copy) @@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can produce them, so don't die.
*/ - if (MEM_READONLY_P (x)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - if (! mem_addr) { mem_addr = XEXP (mem, 0); @@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod } } + /* Read-only memory is by definition never modified, and therefore can't + conflict with anything. However, don't assume anything when AND + addresses are involved and leave to the code below to determine + dependence. We don't expect to find read-only set on MEM, but + stupid user tricks can produce them, so don't die. */ + if (MEM_READONLY_P (x) + && GET_CODE (x_addr) != AND + && GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (x_addr); if (base && (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF @@ -2576,16 +2580,6 @@ write_dependence_p (const_rtx mem, || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* A read from read-only memory can't conflict with read-write memory. */ - if (!writep && MEM_READONLY_P (mem)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - mem_addr = XEXP (mem, 0); if (!x_addr) { @@ -2603,6 +2597,21 @@ write_dependence_p (const_rtx mem, } } + /* A read from read-only memory can't conflict with read-write memory. + Don't assume anything when AND addresses are involved and leave to + the code below to determine dependence.
*/ + if (!writep + && MEM_READONLY_P (mem) + && GET_CODE (x_addr) != AND + && GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (mem_addr); if (! writep && base @@ -2690,18 +2699,6 @@ may_alias_p (const_rtx mem, const_rtx x) || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can
Re: [PATCH] Fix PR63379
On Thu, 9 Oct 2014, Richard Biener wrote: This fixes SLP vectorization of a MIN/MAX reduction where we choose a neutral element as the initial value of the first SLP group member. That's of course wrong. Simply don't choose one in which case the proper initial values will used. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Ok, that's a too simple fix and breaks reduction chains for which this code was added. Thus we have to support both variants dependent on if this is a reduction chain or not (let's hope we don't support the mixed case...) Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk sofar. Richard. 2014-10-09 Richard Biener rguent...@suse.de PR tree-optimization/63379 * tree-vect-slp.c (vect_get_constant_vectors): Do not compute a neutral operand for min/max when it is not a reduction chain. * gcc.dg/vect/pr63379.c: New testcase. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 216037) +++ gcc/tree-vect-slp.c (working copy) @@ -2395,13 +2395,21 @@ vect_get_constant_vectors (tree op, slp_ neutral_op = build_int_cst (TREE_TYPE (op), -1); break; - case MAX_EXPR: - case MIN_EXPR: -def_stmt = SSA_NAME_DEF_STMT (op); -loop = (gimple_bb (stmt))-loop_father; -neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt, -loop_preheader_edge (loop)); -break; + /* For MIN/MAX we don't have an easy neutral operand but +the initial values can be used fine here. Only for +a reduction chain we have to force a neutral element. 
*/ + case MAX_EXPR: + case MIN_EXPR: + if (!GROUP_FIRST_ELEMENT (stmt_vinfo)) + neutral_op = NULL; + else + { + def_stmt = SSA_NAME_DEF_STMT (op); + loop = (gimple_bb (stmt))->loop_father; + neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt, + loop_preheader_edge (loop)); + } + break; default: neutral_op = NULL; Index: gcc/testsuite/gcc.dg/vect/pr63379.c === --- gcc/testsuite/gcc.dg/vect/pr63379.c (revision 0) +++ gcc/testsuite/gcc.dg/vect/pr63379.c (working copy) @@ -0,0 +1,43 @@ +/* PR tree-optimization/63379 */ +/* { dg-do run } */ + +#include "tree-vect.h" + +extern void abort (void); + +typedef struct { +int x; +int y; +} Point; + +Point pt_array[25]; + +void __attribute__((noinline,noclone)) +generate_array(void) +{ + unsigned int i; + for (i = 0; i < 25; i++) +{ + pt_array[i].x = i; + pt_array[i].y = 1000+i; +} +} + +int main() +{ + check_vect (); + generate_array (); + Point min_pt = pt_array[0]; + Point *ptr, *ptr_end; + for (ptr = pt_array+1, ptr_end = pt_array+25; ptr != ptr_end; ++ptr) +{ + min_pt.x = (min_pt.x < ptr->x) ? min_pt.x : ptr->x; + min_pt.y = (min_pt.y < ptr->y) ? min_pt.y : ptr->y; +} + + if (min_pt.x != 0 || min_pt.y != 1000) +abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote: the patch improves performance when the previous ones are applied. It makes RTL loop invariant behavior for GOT loads the same as it was before the 2 previous patches. The patch fixes x86 address cost so that the cost for addresses with the GOT register becomes lower, as it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) >= FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes the condition result the same when parts.base is the GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; Please add a short comment here, explaining the reason for new condition. if (parts.base + && (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO (parts.base)) && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER) && parts.index + && (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO (parts.index)) && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER) && parts.base != parts.index) cost++; Otherwise LGTM, but please repost the patch with a comment. Uros.
[PATCH] Fix default(none) with array parameters (PR fortran/59488)
Hi! This patch makes Fortran array/derived parameters as well as vtables predetermined shared. For the standard proper those aren't really variables at all, and as they are global vars, shared is the best predetermination for them. 2014-10-10 Jakub Jelinek ja...@redhat.com PR fortran/59488 * trans-openmp.c (gfc_omp_predetermined_sharing): Return OMP_CLAUSE_DEFAULT_SHARED for parameters or vtables. * gfortran.dg/gomp/pr59488-1.f90: New test. * gfortran.dg/gomp/pr59488-2.f90: New test. --- gcc/fortran/trans-openmp.c.jj 2014-09-01 09:43:40.0 +0200 +++ gcc/fortran/trans-openmp.c 2014-10-10 10:15:00.426035513 +0200 @@ -135,6 +135,16 @@ gfc_omp_predetermined_sharing (tree decl if (GFC_DECL_RESULT (decl) && ! DECL_HAS_VALUE_EXPR_P (decl)) return OMP_CLAUSE_DEFAULT_SHARED; + /* These are either array or derived parameters, or vtables. + In the former cases, the OpenMP standard doesn't consider them to be + variables at all (they can't be redefined), but they can nevertheless appear + in parallel/task regions and for default(none) purposes treat them as shared. + For vtables likely the same handling is desirable. */ + if (TREE_CODE (decl) == VAR_DECL + && TREE_READONLY (decl) + && TREE_STATIC (decl)) +return OMP_CLAUSE_DEFAULT_SHARED; + return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } --- gcc/testsuite/gfortran.dg/gomp/pr59488-1.f90.jj 2014-10-10 10:06:38.231365699 +0200 +++ gcc/testsuite/gfortran.dg/gomp/pr59488-1.f902014-10-10 10:06:56.054036548 +0200 @@ -0,0 +1,13 @@ +! PR fortran/59488 +! { dg-do compile } +! { dg-options "-fopenmp" } + + implicit none + integer, parameter :: p(2) = (/ 11, 12 /) + integer :: r + + !$omp parallel do default(none) + do r = 1, 2 +print *, p(r) + end do +end --- gcc/testsuite/gfortran.dg/gomp/pr59488-2.f90.jj 2014-10-10 10:06:41.331308441 +0200 +++ gcc/testsuite/gfortran.dg/gomp/pr59488-2.f902014-10-10 10:08:07.0 +0200 @@ -0,0 +1,16 @@ +! PR fortran/59488 +! { dg-do compile } +! 
{ dg-options "-fopenmp" } + + implicit none + type t +integer :: s1, s2, s3 + end type + integer :: r + type(t), parameter :: u = t(1, 2, 3) + + !$omp parallel do default(none) + do r = 1, 2 +print *, u + end do +end Jakub
[PING v2][PATCH] Support for BIT_FIELD_REF in asan.c
On 10/03/2014 10:47 AM, Marat Zakirov wrote: On 09/26/2014 12:55 PM, Marat Zakirov wrote: Hi all! Here's a patch which instruments byte-aligned BIT_FIELD_REFs. During GCC asan-bootstrap and Linux kernel build I didn't find any cases where a BIT_FIELD_REF is not 8-bit aligned. But I do not have sufficient confidence to replace the current early return for misaligned BIT_FIELD_REFs with an assert. Ok to commit? --Marat gcc/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * asan.c (instrument_derefs): BIT_FIELD_REF added. gcc/testsuite/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * c-c++-common/asan/bitfield-5.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index cf5de27..451af33 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1705,6 +1705,7 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t, case INDIRECT_REF: case MEM_REF: case VAR_DECL: +case BIT_FIELD_REF: break; /* FALLTHRU */ default: diff --git a/gcc/testsuite/c-c++-common/asan/bitfield-5.c b/gcc/testsuite/c-c++-common/asan/bitfield-5.c new file mode 100644 index 000..eb5e9e9 --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/bitfield-5.c @@ -0,0 +1,24 @@ +/* Check BIT_FIELD_REF. */ + +/* { dg-do run } */ +/* { dg-shouldfail "asan" } */ + +struct A +{ + int y : 20; + int x : 13; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p) { + return ((struct A *)p)->x != 0; +} + +int +main () +{ + int a = 0; + return f (&a); +} + +/* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow" } */
[PATCH] Fix PR63419
force_gimple_operand doesn't really do a deep verification on its input when we ask for a non-simple result here. Instead it trusts that the CONSTRUCTOR PRE feeds it is valid GIMPLE. Unfortunately that isn't so if its elements required a conversion. The following fixes it by merging gimple_convert and its use in PRE from match-and-simplify (which avoids force_gimple_operand entirely in PRE). The implementation of gimple_convert is of course still using fold here. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-10-10 Richard Biener rguent...@suse.de PR tree-optimization/63419 * gimple-fold.h (gimple_convert): New function. * gimple-fold.c (gimple_convert): Likewise. * tree-ssa-pre.c (create_expression_by_pieces): Use gimple_convert to split out required conversions early. * g++.dg/torture/pr63419.C: New testcase. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 216069) --- gcc/gimple-fold.c (working copy) *** rewrite_to_defined_overflow (gimple stmt *** 5282,5284 --- 5282,5301 return stmts; } + + /* Return OP converted to TYPE by emitting a conversion statement on SEQ +if required using location LOC. Note that OP will be returned +unmodified if GIMPLE does not require an explicit conversion between +its type and TYPE. 
*/ + + tree + gimple_convert (gimple_seq *seq, location_t loc, tree type, tree op) + { + if (useless_type_conversion_p (type, TREE_TYPE (op))) + return op; + op = fold_convert_loc (loc, type, op); + gimple_seq stmts = NULL; + op = force_gimple_operand (op, &stmts, true, NULL_TREE); + gimple_seq_add_seq_without_update (seq, stmts); + return op; + } Index: gcc/gimple-fold.h === *** gcc/gimple-fold.h (revision 216069) --- gcc/gimple-fold.h (working copy) *** extern tree gimple_fold_indirect_ref (tr *** 45,48 --- 45,55 extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple); + extern tree gimple_convert (gimple_seq *, location_t, tree, tree); + inline tree + gimple_convert (gimple_seq *seq, tree type, tree op) + { + return gimple_convert (seq, UNKNOWN_LOCATION, type, op); + } + #endif /* GCC_GIMPLE_FOLD_H */ Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 216069) --- gcc/tree-ssa-pre.c (working copy) *** create_expression_by_pieces (basic_block *** 2819,2830 if (nary->opcode == POINTER_PLUS_EXPR) { if (i == 0) ! genop[i] = fold_convert (nary->type, genop[i]); else if (i == 1) ! genop[i] = convert_to_ptrofftype (genop[i]); } else ! genop[i] = fold_convert (TREE_TYPE (nary->op[i]), genop[i]); } if (nary->opcode == CONSTRUCTOR) { --- 2819,2833 if (nary->opcode == POINTER_PLUS_EXPR) { if (i == 0) ! genop[i] = gimple_convert (&forced_stmts, ! nary->type, genop[i]); else if (i == 1) ! genop[i] = gimple_convert (&forced_stmts, ! sizetype, genop[i]); } else ! genop[i] = gimple_convert (&forced_stmts, ! TREE_TYPE (nary->op[i]), genop[i]); } if (nary->opcode == CONSTRUCTOR) { *** create_expression_by_pieces (basic_block *** 2866,2873 statements. We have to call unshare_expr because force_gimple_operand may modify the tree we pass to it. */ ! 
folded = force_gimple_operand (unshare_expr (folded), &forced_stmts, false, NULL); /* If we have any intermediate expressions to the value sets, add them to the value sets and chain them in the instruction stream. */ --- 2869,2878 statements. We have to call unshare_expr because force_gimple_operand may modify the tree we pass to it. */ ! gimple_seq tem = NULL; ! folded = force_gimple_operand (unshare_expr (folded), &tem, false, NULL); + gimple_seq_add_seq_without_update (&forced_stmts, tem); /* If we have any intermediate expressions to the value sets, add them to the value sets and chain them in the instruction stream. */
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 12:55:21PM +0200, Richard Biener wrote: On Fri, 10 Oct 2014, Jakub Jelinek wrote: This patch adds a small optimization to emit_case_bit_tests, instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask) if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? The && always has to mean separate basic blocks, as otherwise it is undefined behavior. I'd think this optimization would be too specialized for a general optimization, and unsure in which pass it would be desirable. Yeah, I also didn't come up with a good place to do this in general. ifcombine maybe, which already does some pattern matching on if (x) if (y) ... If I understand correctly we have if ((x - low) <= (high - low)) { if ((1 << (x - low)) & mask) { ... which we'd like to transform to if (x <= high) { if ((1 << x) & (mask << low)) { ... ifcombine can certainly detect the pattern, but obviously the result wouldn't have the ifs combined in any way. And you'd need to check whether _all_ code that depends on both tests is ok with the transformed checks (not sure if that is always the case). Also testing for RTX costs this early sounds bogus. Well, the bit test optimization is already decided based on other rtx costs. Oh, I see. I suppose the patch is fine then. Thanks, Richard.
Re: [PING v2][PATCH] Support for BIT_FIELD_REF in asan.c
On Fri, Oct 10, 2014 at 03:40:57PM +0400, Marat Zakirov wrote: gcc/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * asan.c (instrument_derefs): BIT_FIELD_REF added. gcc/testsuite/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * c-c++-common/asan/bitfield-5.c: New test. Ok, thanks. Jakub
[Ada] New internal primitive Is_Subprogram_Or_Generic_Subprogram
This is a minor internal cleanup, to introduce a new primitive Is_Subprogram_Or_Generic_Subprogram with the obvious meaning. No external effect, no test required. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Robert Dewar de...@adacore.com * sem_ch7.adb, einfo.adb, einfo.ads, sem_prag.adb, sem_ch12.adb, freeze.adb, sem_util.adb, sem_res.adb, exp_ch6.adb, exp_ch13.adb, sem_ch6.adb, sem_cat.adb, sem_disp.adb (Is_Subprogram_Or_Generic_Subprogram): New primitive. Use this primitive throughout where appropriate. Index: sem_ch7.adb === --- sem_ch7.adb (revision 216063) +++ sem_ch7.adb (working copy) @@ -2808,7 +2808,7 @@ -- Body required if subprogram - elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + elsif Is_Subprogram_Or_Generic_Subprogram (P) then return True; -- Treat a block as requiring a body @@ -2937,7 +2937,7 @@ -- Body required if subprogram - elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + elsif Is_Subprogram_Or_Generic_Subprogram (P) then Error_Msg_N (info: requires body (subprogram case)?Y?, P); -- Body required if generic parent has Elaborate_Body Index: einfo.adb === --- einfo.adb (revision 216063) +++ einfo.adb (working copy) @@ -1129,8 +1129,7 @@ E_Package_Body, E_Subprogram_Body, E_Variable) - or else Is_Generic_Subprogram (Id) - or else Is_Subprogram (Id)); + or else Is_Subprogram_Or_Generic_Subprogram (Id)); return Node34 (Id); end Contract; @@ -3405,6 +3404,13 @@ return Ekind (Id) in Subprogram_Kind; end Is_Subprogram; + function Is_Subprogram_Or_Generic_Subprogram (Id : E) return B is + begin + return Ekind (Id) in Subprogram_Kind + or else + Ekind (Id) in Generic_Subprogram_Kind; + end Is_Subprogram_Or_Generic_Subprogram; + function Is_Task_Type(Id : E) return B is begin return Ekind (Id) in Task_Kind; @@ -3593,15 +3599,14 @@ begin pragma Assert (Ekind_In (Id, E_Entry, - E_Entry_Family, - E_Generic_Package, - E_Package, - E_Package_Body, - E_Subprogram_Body, - E_Variable, - E_Void) - or else 
Is_Generic_Subprogram (Id) - or else Is_Subprogram (Id)); + E_Entry_Family, + E_Generic_Package, + E_Package, + E_Package_Body, + E_Subprogram_Body, + E_Variable, + E_Void) + or else Is_Subprogram_Or_Generic_Subprogram (Id)); Set_Node34 (Id, V); end Set_Contract; Index: einfo.ads === --- einfo.ads (revision 216063) +++ einfo.ads (working copy) @@ -2974,6 +2974,10 @@ -- Applies to all entities, true for function, procedure and operator -- entities. +--Is_Subprogram_Or_Generic_Subprogram +-- Applies to all entities, true for function, procedure and operator +-- entities, and also for the corresponding generic entities. + --Is_Synchronized_Interface (synthesized) -- Defined in types that are interfaces. True if interface is declared -- synchronized, task, or protected, or is derived from a synchronized @@ -6964,6 +6968,7 @@ function Is_Scalar_Type (Id : E) return B; function Is_Signed_Integer_Type (Id : E) return B; function Is_Subprogram (Id : E) return B; + function Is_Subprogram_Or_Generic_Subprogram (Id : E) return B; function Is_Task_Type(Id : E) return B; function Is_Type (Id : E) return B; @@ -8800,6 +8805,7 @@ pragma Inline (Is_Base_Type); pragma Inline (Is_Package_Or_Generic_Package); pragma Inline (Is_Packed_Array); + pragma Inline (Is_Subprogram_Or_Generic_Subprogram); pragma Inline (Is_Volatile); pragma Inline (Is_Wrapper_Package); pragma Inline (Known_RM_Size); Index: sem_prag.adb === --- sem_prag.adb(revision 216063) +++ sem_prag.adb(working copy) @@ -6736,10 +6736,9 @@ ("dispatching subprogram# cannot use Stdcall convention!", Arg1); - -- Subprogram is allowed, but not a generic subprogram + -- Subprograms are not allowed - elsif not Is_Subprogram (E) - and then not Is_Generic_Subprogram (E) + elsif not Is_Subprogram_Or_Generic_Subprogram (E)
[Ada] Check for attempt to bind GNATprove files
If one or more objects are compiled in GNATprove mode (either by using GNATprove directly, or by using -gnatd.F), then the ALI file is marked and gnatbind will exit with a message as shown here. Given: 1. procedure linkdf is 2. begin 3.null; 4. end; If we first compile this with gcc -c linkdf.adb -gnatd.F and then try to do a gnatmake, we get error: one or more files compiled in GNATprove mode gnatmake: *** bind failed. Previously this was not detected and the linker bombed with peculiar error messages. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Robert Dewar de...@adacore.com * ali.adb (Scan_ALI): Read and process new GP flag on ALI P line. * ali.ads (GNATprove_Mode): New component in ALI table. (GNATprove_Mode_Specified): New global. * gnatbind.adb (Gnatbind): Give fatal error if any file compiled in GNATProve mode. * lib-writ.ads, lib-writ.adb (GP): New flag on P line for GNATProve_Mode. Index: lib-writ.adb === --- lib-writ.adb(revision 216063) +++ lib-writ.adb(working copy) @@ -1153,6 +1153,10 @@ end if; end if; + if GNATprove_Mode then + Write_Info_Str (" GP"); + end if; + if Partition_Elaboration_Policy /= ' ' then Write_Info_Str (" E"); Write_Info_Char (Partition_Elaboration_Policy); Index: lib-writ.ads === --- lib-writ.ads(revision 216063) +++ lib-writ.ads(working copy) @@ -192,6 +192,9 @@ -- the units in this file, where x is the first character -- (upper case) of the policy name (e.g. 'C' for Concurrent). + -- GP Set if this compilation was done in GNATprove mode, either + -- from direct use of GNATprove, or from use of -gnatd.F. + -- Lx A valid Locking_Policy pragma applies to all the units in -- this file, where x is the first character (upper case) of -- the policy name (e.g. 'C' for Ceiling_Locking). @@ -200,7 +203,9 @@ -- were not compiled to produce an object. This can occur as a -- result of the use of -gnatc, or if no object can be produced -- (e.g. when a package spec is compiled instead of the body, - -- or a subunit on its own). 
+ -- or a subunit on its own). Note that in GNATprove mode, we + -- do produce an object. The object is not suitable for binding + -- and linking, but we do not set NO, instead we set GP. -- NR No_Run_Time. Indicates that a pragma No_Run_Time applies -- to all units in the file. Index: ali.adb === --- ali.adb (revision 216063) +++ ali.adb (working copy) @@ -111,6 +111,7 @@ Locking_Policy_Specified := ' '; No_Normalize_Scalars_Specified := False; No_Object_Specified:= False; + GNATprove_Mode_Specified := False; Normalize_Scalars_Specified:= False; Partition_Elaboration_Policy_Specified := ' '; Queuing_Policy_Specified := ' '; @@ -875,6 +876,7 @@ First_Sdep => No_Sdep_Id, First_Specific_Dispatching => Specific_Dispatching.Last + 1, First_Unit => No_Unit_Id, +GNATprove_Mode => False, Last_Interrupt_State => Interrupt_States.Last, Last_Sdep => No_Sdep_Id, Last_Specific_Dispatching => Specific_Dispatching.Last, @@ -1089,6 +1091,13 @@ ALIs.Table (Id).Partition_Elaboration_Policy := Partition_Elaboration_Policy_Specified; +-- Processing for GP + +elsif C = 'G' then + Checkc ('P'); + GNATprove_Mode_Specified := True; + ALIs.Table (Id).GNATprove_Mode := True; + -- Processing for Lx elsif C = 'L' then Index: ali.ads === --- ali.ads (revision 216063) +++ ali.ads (working copy) @@ -176,6 +176,11 @@ -- always be set as well in this case. Not set if 'P' appears in -- Ignore_Lines. + GNATprove_Mode : Boolean; + -- Set to True if ALI and object file produced in GNATprove_Mode as + -- signalled by GP appearing on the P line. Not set if 'P' appears in + -- Ignore_Lines. + No_Object : Boolean; -- Set to True if no object file generated. Not set if 'P' appears in -- Ignore_Lines. @@ -465,6 +470,9 @@ -- Set to False by Initialize_ALI. Set to True if Scan_ALI reads -- a unit for which dynamic elaboration
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? OK /Marcus
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. OK /Marcus
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. Please mention *which* code you removed here. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This is not New check, but changed one. Please mention *what* changed. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + && (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after the e-mail lines. Jakub
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10/10/14 13:25, Marcus Shawcroft wrote: On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. OK /Marcus Thanks Marcus, committed as r216076. Can this go to the 4.9 branch as well? It applies cleanly there (with appropriately regenerated configure) and I tested it there with bootstraps with and without the option. Kyrill
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10 October 2014 13:38, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Thanks Marcus, committed as r216076. Can this go to the 4.9 branch as well? It applies cleanly there (with appropriately regenerated configure) and I tested it there with bootstraps with and without the option. Yes that's ok /Marcus
[Ada] Issue errors on illegal contracts unless SPARK_Mode is Off
Illegal Global/Depends contracts should be flagged by the frontend in code for which SPARK_Mode is not specified, as GNATprove relies on contracts being legal in those cases. The frontend should skip these errors only when SPARK_Mode is Off. Now fixed, as shown on the following example. Command: $ gcc -c notinspark.ads Output: --- 1. package Notinspark is 2. 3.function Get return Integer; 4. 5.procedure Set with 6. Global => (In_Out => Get); | global item must denote variable or state 7. 8. end Notinspark; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Yannick Moy m...@adacore.com * errout.adb (SPARK_Msg_N): Issue error unless SPARK_Mode is Off. Index: errout.adb === --- errout.adb (revision 216063) +++ errout.adb (working copy) @@ -3138,7 +3138,7 @@ procedure SPARK_Msg_N (Msg : String; N : Node_Or_Entity_Id) is begin - if SPARK_Mode = On then + if SPARK_Mode /= Off then Error_Msg_N (Msg, N); end if; end SPARK_Msg_N;
[Ada] Loop parameter is a constant in an iterator over a formal container.
This patch enforces the same semantics for the handling of loop parameters in element iterators over formal containers as for those over predefined containers: the loop parameter cannot be assigned to in user code. Compiling formal_test.adb must yield: formal_test.adb:15:07: assignment to loop parameter not allowed --- with Ada.Containers.Formal_Doubly_Linked_Lists; procedure Formal_Test is type E is range 1 .. 1000; package My_List is new Ada.Containers.Formal_Doubly_Linked_Lists (E); use My_List; Thing : My_List.List (10); C : Cursor; begin for I in 1 .. 10 loop Append (Thing, E (I)); end loop; for Element of Thing loop null; Element := Element * 3; -- ERROR end loop; end; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * exp_ch5.adb (Expand_Formal_Container_Element_Loop): Analyze declaration for loop parameter before rest of loop, and set entity kind to prevent assignments to it in the user code. * sem_ch3.adb (Analyze_Object_Contract): No contracts apply to the loop parameter in an element iteration over a formal container. 
Index: exp_ch5.adb === --- exp_ch5.adb (revision 216063) +++ exp_ch5.adb (working copy) @@ -2889,7 +2889,17 @@ Statements => New_List (New_Loop))); Rewrite (N, New_Loop); - Analyze (New_Loop); + + -- The loop parameter is declared by an object declaration, but within + -- the loop we must prevent user assignments to it, so we analyze the + -- declaration and reset the entity kind, before analyzing the rest of + -- the loop. + + Analyze (Elmt_Decl); + Set_Ekind (Defining_Identifier (Elmt_Decl), E_Loop_Parameter); + Set_Assignment_OK (Name (Elmt_Ref)); + + Analyze (N); end Expand_Formal_Container_Element_Loop; - Index: sem_ch3.adb === --- sem_ch3.adb (revision 216063) +++ sem_ch3.adb (working copy) @@ -3062,6 +3062,12 @@ Error_Msg_N ("constant cannot be volatile", Obj_Id); end if; + -- The loop parameter in an element iterator over a formal container + -- is declared with an object declaration but no contracts apply. + + elsif Ekind (Obj_Id) = E_Loop_Parameter then + null; + else pragma Assert (Ekind (Obj_Id) = E_Variable); -- The following checks are only relevant when SPARK_Mode is on as
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, I made some improvements to the patch. Two changes: 1. TARGET_LOOPS is now used as a condition of the doloop related patterns, which is more elegant. 2. As the trip count register of the zero-cost loop may potentially be spilled, we need to change the patterns in order to handle this issue. The solution is similar to that adopted by the c6x backend: just turn the zero-cost loop into a regular loop when that happens after reload is completed. Attached please find version 4 of the patch. Make check regression tested with xtensa-elf-gcc/simulator. OK for trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216079) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,20 @@ +2014-10-10 Felix Yang felix.y...@huawei.com + +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro. +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(loop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 216079) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL 9) (UNSPEC_TP 10) (UNSPEC_MEMW 11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END 13) (UNSPECV_SET_FP 1) (UNSPECV_ENTRY 2) @@ -1289,41 +1291,120 @@ (set_attr "length" "3")]) +;; Zero-overhead looping support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. 
+;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] - "" - "loopnez\t%0, %l1" +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] + "TARGET_LOOPS && optimize" + "loop\t%0, %l1_LEND" [(set_attr "type" "jump") (set_attr "mode" "none") (set_attr "length" "3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] - "" +(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "nonimmediate_operand" "=a,m") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END) + (clobber (match_scratch:SI 3 "=X,r"))] + "TARGET_LOOPS && optimize" + "#" + [(set_attr "type" "jump") + (set_attr "mode" "none") + (set_attr "length" "0")]) +(define_insn "loop_end" + [(set (pc) +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] + "TARGET_LOOPS && optimize" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type" "jump") (set_attr "mode" "none") (set_attr "length" "0")]) +(define_split + [(set (pc) +(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "") + (const_int 1)) + (label_ref (match_operand 
1 )) + (pc))) + (set (match_operand:SI 2 nonimmediate_operand ) +(plus:SI (match_dup 0) + (const_int -1))) + (unspec
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Updated ChangeLog:

2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
	(ix86_init_pic_reg): New.
	(ix86_select_alt_pic_regnum): Add check on pseudo register.
	(ix86_save_reg): Likewise.
	(ix86_expand_prologue): Remove PIC register initialization
	now performed in ix86_init_pic_reg.
	(ix86_output_function_epilogue): Add check on pseudo register.
	(set_pic_reg_ever_alive): New.
	(legitimize_pic_address): Replace df_set_regs_ever_live with new
	set_pic_reg_ever_alive.
	(legitimize_tls_address): Likewise.
	(ix86_pic_register_p): New check.
	(ix86_delegitimize_address): Add check on pseudo register.
	(ix86_expand_call): Insert move from pseudo PIC register to ABI
	defined REAL_PIC_OFFSET_TABLE_REGNUM.
	(TARGET_INIT_PIC_REG): New.
	(TARGET_USE_PSEUDO_PIC_REG): New.
	* config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return
	INVALID_REGNUM if pic_offset_table_rtx exists.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

I've added the following comment to the patch:

-   in which case we return (%ecx - %ebx) + foo.  */
+   in which case we return (%ecx - %ebx) + foo.
+
+   Note that when pseudo_pic_reg is used we can generate it only
+   before reload_completed.  */

On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc/ prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after the e-mail lines. Jakub
Re: [PATCH] gcc parallel make check
Hi Jakub, On 15 September 2014 18:05, Jakub Jelinek ja...@redhat.com wrote: [...]

 # For parallelized check-% targets, this decides whether parallelization
 # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything
 # but optional --target_board or --extra_opts arguments).  If desirable,
 # recursive make is run with check-parallel-$lang{,1,2,3,4,5} etc. goals,
 # which can be executed in parallel, as they are run in separate directories.
-# check-parallel-$lang{1,2,3,4,5} etc. goals invoke runtest with the longest
-# running *.exp files from the testsuite, as determined by check_$lang_parallelize
-# variable.  The check-parallel-$lang goal in that case invokes runtest with
-# all the remaining *.exp files not handled by the separate goals.
+# check-parallel-$lang{,1,2,3,4,5} etc. goals invoke runtest with
+# GCC_RUNTEST_PARALLELIZE_DIR var in the environment and runtest_file_p
+# dejaGNU procedure is overridden to additionally synchronize through
+# a $lang-parallel directory which tests will be run by which runtest instance.
 # Afterwards contrib/dg-extract-results.sh is used to merge the sum and log
 # files.  If parallelization isn't desirable, only one recursive make
 # is run with check-parallel-$lang goal and check_$lang_parallelize variable
@@ -3662,76 +3645,60 @@ check_p_subdirs=$(wordlist 1,$(words $(c
 # to lang_checks_parallelized variable and define check_$lang_parallelize
 # variable (see above check_gcc_parallelize description).
 $(lang_checks_parallelized): check-% : site.exp
-	@if [ -z "$(filter-out --target_board=%,$(filter-out --extra_opts%,$(RUNTESTFLAGS)))" ] \

Since you removed this test, the comment above is no longer accurate: setting RUNTESTFLAGS to whatever value no longer disables parallelization. Which leads me to discuss a bug I faced after you committed this change: I am testing a patch which brings a series of new tests.
$ RUNTESTFLAGS=my.exp make -jN check (in fact the 'make -j' is embedded in a larger build script) my.exp contains the following construct which is often used in the testsuite:
==
foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
    # If we're only testing specific files and this isn't one of them, skip it.
    if ![runtest_file_p $runtests $src] then {
	continue
    }
    c-torture-execute $src $additional_flags
    gcc-dg-runtest $src $additional_flags
}
==
Note that gcc-dg-runtest calls runtest_file_p too. What I observed is that if I use -j1, all my .c files get tested, while with N >= 2 some of them are silently skipped. It took me a while to figure out that it's because gcc-dg-runtest calls runtest_file_p, which means that runtest_file_p is called twice when the 1st invocation returns 1, and only once when the 1st invocation returns 0. For example, if we have pid0, pid1 the concurrent runtest processes, and file0.c, file1.c, the testcases, then:
* pid0 decides to keep file0.c file1.c file2.c file3.c file4.c. Since the above loop calls runtest_file_p twice for each, we reach the minor counter of 10.
* in the mean time, pid1 decides to skip file0.c, file1.c ... file9.c since it calls runtest_file_p only once for each
* pid1 increments its parallel counter to 1, and creates the new testing subdir
* pid1 decides to keep file10, file11, file12, file13 and file14 (again, 2 calls to runtest_file_p per testcase)
* pid0 increments its parallel counter to 1, and decides it has to skip it
* pid0 thus decides to skip file5, file6, file7, ... file14, calling runtest_file_p once for each
* etc...
In the end, we have ignored file5...file9. I'm not sure why you have made special cases for some of the existing *.exp when you forced them to disable parallelization. Was it to handle such cases? I'm not sure about the next step:
- should I modify my .exp file?
- should you modify gcc_parallel_test_run_p?
Even if I have to modify my .exp file, I think this is error prone, and others could introduce a similar construct in the future. Thanks, Christophe.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [12/n] Optimize string functions
On 09 Oct 11:02, Jeff Law wrote: On 10/08/14 13:18, Ilya Enkovich wrote: Hi, This patch introduces simple optimization of string function calls using variants with no checks and/or bounds copy when possible. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (check_infos): New. (chkp_get_nobnd_fndecl): New. (chkp_get_nochk_fndecl): New. (chkp_optimize_string_function_calls): New. (chkp_opt_init): New. (chkp_opt_fini): New. (chkp_opt_execute): New. (chkp_opt_gate): New. (pass_data_chkp_opt): New. (pass_chkp_opt): New. (make_pass_chkp_opt): New. Just a note, these reviews are going to come out-of-order. I thought I mentioned it in the prior message. Can you pull the optimization stuff into its own file. It seems to me it ought to be conceptually separate from expansion and the other miscellaneous stuff in tree-chkp.c. + +/* Find memcpy, mempcpy, memmove and memset calls, perform + checks before call and then call no_chk version of + functions. We do it on O2 to enable inlining of these + functions during expand. So is the purpose here to expose the checks that would normally be done in the mem* routines to their caller in the hopes that doing so will expose redundant checks? Or is there some other reason? There are few reasons to replace instrumented string functions: 1. As you said following redundant checks elimination may remove checks for some string functions 2. By default functions like memcpy should assume pointers are copied and copy bounds. If we know pointers are not copied, we may use faster version with no bounds copy 3. If we avoid both checks and bounds copy then it is a candidate for existing string function calls inlining in expand pass + + Also try to find memcpy, mempcpy, memmove and memset calls + which are known to not write pointers to memory and use + faster function versions for them. */ Can you please include tests for both classes of transformations? 
It doesn't have to be exhaustive, just a few simple tests to both verify the transformation is working today and to help ensure nobody breaks it in the future? I thought tests will be added later. I have ~250 tests to commit. Will check I have tests for optimizations. BTW this particular optimization can't work until we have instrumented builtin calls.

+void
+chkp_optimize_string_function_calls (void)
+{
+  basic_block bb;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Searching for replacable string function calls...\n");

s/replacable/replaceable/

Seems like if you're going to emit a message into the debug file that the compiler is searching for replaceable calls that you ought to emit a message when a replaceable call is found too. And doing so ought to make writing those testcases easier too since you can scan the appropriate dump file for the message :-)

+
+	  if (fndecl_nochk)
+	    fndecl = fndecl_nochk;
+
+	  gimple_call_set_fndecl (stmt, fndecl);
+
+	  /* If there is no nochk version of function then
+	     do nothing.  Otherwise insert checks before
+	     the call.  */
+	  if (!fndecl_nochk)
+	    continue;

It's a nit, but I'd tend to write that as:

if (!fndecl_nochk)
  continue;
fndecl = fndecl_nochk;
gimple_call_set_fndecl (stmt, fndecl);

There is one more assignment to fndecl above which makes your version nonequivalent.

+	     to perform runtiome check for size and perform

s/runtiome/runtime/

+	     checks only when size is not zero.  */
+	  if (!known)
+	    {
+	      gimple check = gimple_build_cond (NE_EXPR,
+						size,
+						size_zero_node,
+						NULL_TREE,
+						NULL_TREE);
+
+	      /* Split block before string function call.  */
+	      j = i;
+	      gsi_prev (&j);
+	      fall = split_block (bb, gsi_stmt (j));
+	      bb = fall->src;
+
+	      /* Add size check.  */
+	      j = gsi_last_bb (bb);
+	      if (gsi_end_p (j))
+		gsi_insert_before (&j, check, GSI_SAME_STMT);
+	      else
+		gsi_insert_after (&j, check, GSI_SAME_STMT);
+
+	      /* Create basic block for checks.  */
+	      check_bb = create_empty_bb (bb);
+	      make_edge (bb, check_bb, EDGE_TRUE_VALUE);
+	      make_single_succ_edge (check_bb, fall->dest, EDGE_FALLTHRU);
+
+	      /* Fix edge for splitted bb.  */
+	      fall->flags = EDGE_FALSE_VALUE;
+	      fall->count = bb->count;
+	      fall->probability = REG_BR_PROB_BASE;
+
+	      /* Update dominance
Re: [PATCH] gcc parallel make check
On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. See the patches I've posted when adding the fine-grained parallelization, e.g. go testsuite has been fixed that way, etc. So, in your above example, you'd need: gcc_parallel_test_enable 0 line before c-torture-execute and gcc_parallel_test_enable 1 line after gcc-dg-runtest. That way, if runtest_file_p says the test should be scheduled by current instance, all the subtests will be run there. If my.exp is part of gcc/testsuite, I'm sorry for missing it, if it is elsewhere, just fix it up. Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment them and through sed on the log files verify that each instance performs the same parallelization checks (same strings). Jakub
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10/10/14 15:18, Richard Earnshaw wrote: On 10/10/14 11:53, Kyrill Tkachov wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving a potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill

2014-10-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com
	    Ramana Radhakrishnan  ramana.radhakrish...@arm.com

	* config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
	(ADJUST_INSN_LENGTH): Define.
	* config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
	* config/aarch64/aarch64.c (is_mem_p): New function.
	(is_memory_op): Likewise.
	(aarch64_prev_real_insn): Likewise.
	(is_madd_op): Likewise.
	(dep_between_memop_and_next): Likewise.
	(aarch64_madd_needs_nop): Likewise.
	(aarch64_final_prescan_insn): Likewise.
	* doc/invoke.texi (AArch64 Options): Document
	-mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options.

a53-workaround-new.patch

commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
Author: Kyrill Tkachov kyrylo.tkac...@arm.com
Date:   Wed Oct 8 12:48:34 2014 +

    [AArch64] Add final prescan workaround for Cortex-A53 erratum.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b5f53d2..c57a467 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
 extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool aarch64_madd_needs_nop (rtx_insn *);
+extern void aarch64_final_prescan_insn (rtx_insn *);
 extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
 void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5144c35..76a2480 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
   return NULL;
 }
 
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+  rtx pattern = PATTERN (mem_insn);
+  return for_each_rtx (pattern, is_mem_p, NULL);
+}
+
+/* Find the first rtx_insn before insn that will generate an assembly
+   instruction.  */
+
+static rtx_insn *
+aarch64_prev_real_insn (rtx_insn *insn)
+{
+  if (!insn)
+    return NULL;
+
+  do
+    {
+      insn = prev_real_insn (insn);
+    }
+  while (insn && recog_memoized (insn) < 0);
+
+  return insn;
+}
+
+static bool
+is_madd_op (enum attr_type t1)
+{
+  unsigned int i;
+  /* A number of these may be AArch32 only.  */
+  enum attr_type mlatypes[] = {
+    TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
+    TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
+    TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
+  };
+
+  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
+    {
+      if (t1 == mlatypes[i])
+	return true;
+    }
+
+  return false;
+}
+
+/* Check if there is a register dependency between a load and the insn
+   for which we hold recog_data.  */
+
+static bool
+dep_between_memop_and_curr (rtx memop)
+{
+  rtx load_reg;
+  int opno;
+
+  if (!memop)
+    return false;
+
+  if (!REG_P (SET_DEST (memop)))
+    return false;
+
+  load_reg = SET_DEST (memop);
+  for (opno = 0; opno < recog_data.n_operands; opno++)
+    {
+      rtx operand = recog_data.operand[opno];
+      if (REG_P (operand)
+	  && reg_overlap_mentioned_p (load_reg, operand))
+	return true;
+
+    }
+
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [11/n] Optimization helpers
On 09 Oct 12:09, Jeff Law wrote: On 10/08/14 13:16, Ilya Enkovich wrote: Hi, This patch introduces structures and manipulation functions used by simple checker optimizations. Structures are used to hold checks information - type of check and checked address in a polynomial form. Thanks, Ilya --

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp.c (check_type): New.
	(pol_item): New.
	(address_t): New.
	(check_info): New.
	(bb_checks): New.
	(chkp_pol_item_compare): New.
	(chkp_pol_find): New.
	(chkp_extend_const): New.
	(chkp_add_addr_item): New.
	(chkp_sub_addr_item): New.
	(chkp_add_addr_addr): New.
	(chkp_sub_addr_addr): New.
	(chkp_mult_addr): New.
	(chkp_is_constant_addr): New.
	(chkp_print_addr): New.
	(chkp_collect_addr_value): New.
	(chkp_collect_value): New.
	(chkp_fill_check_info): New.

+/* Find plynomial item in ADDR with var equal to VAR

s/plynomial/polynomial/

With nit fixed and functions moved into whatever new file gets created for the optimization work this will be OK. jeff

Thanks for review! Here is a fixed version. Ilya

--

2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp-opt.c: New.
	* Makefile.in (OBJS): Add tree-chkp-opt.o.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d8c8488..cd45b29 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1389,6 +1389,7 @@ OBJS = \
 	tree-parloops.o \
 	tree-phinodes.o \
 	tree-chkp.o \
+	tree-chkp-opt.o \
 	tree-predcom.o \
 	tree-pretty-print.o \
 	tree-profile.o \
diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
new file mode 100644
index 000..103c4bb
--- /dev/null
+++ b/gcc/tree-chkp-opt.c
@@ -0,0 +1,463 @@
+/* Pointer Bounds Checker optimization pass.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Ilya Enkovich (ilya.enkov...@intel.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree-core.h"
+#include "stor-layout.h"
+#include "varasm.h"
+#include "tree.h"
+#include "target.h"
+#include "tree-iterator.h"
+#include "tree-cfg.h"
+#include "langhooks.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "diagnostic.h"
+#include "ggc.h"
+#include "output.h"
+#include "internal-fn.h"
+#include "is-a.h"
+#include "predict.h"
+#include "cfgloop.h"
+#include "stringpool.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "tree-ssa-operands.h"
+#include "tree-ssa-address.h"
+#include "tree-ssa.h"
+#include "ipa-inline.h"
+#include "basic-block.h"
+#include "tree-ssa-loop-niter.h"
+#include "gimple-expr.h"
+#include "gimple.h"
+#include "tree-phinodes.h"
+#include "gimple-ssa.h"
+#include "ssa-iterators.h"
+#include "gimple-pretty-print.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "gimplify-me.h"
+#include "print-tree.h"
+#include "expr.h"
+#include "tree-ssa-propagate.h"
+#include "gimple-fold.h"
+#include "gimple-walk.h"
+#include "tree-dfa.h"
+#include "tree-chkp.h"
+
+enum check_type
+{
+  CHECK_LOWER_BOUND,
+  CHECK_UPPER_BOUND
+};
+
+struct pol_item
+{
+  tree cst;
+  tree var;
+};
+
+struct address_t
+{
+  vec<struct pol_item> pol;
+};
+
+/* Structure to hold check information.  */
+struct check_info
+{
+  /* Type of the check.  */
+  check_type type;
+  /* Address used for the check.  */
+  address_t addr;
+  /* Bounds used for the check.  */
+  tree bounds;
+  /* Check statement.  Can be NULL for removed checks.  */
+  gimple stmt;
+};
+
+/* Structure to hold checks information for BB.  */
+struct bb_checks
+{
+  vec<struct check_info, va_heap, vl_ptr> checks;
+};
+
+static void chkp_collect_value (tree ssa_name, address_t &res);
+
+#define chkp_bndmk_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDMK))
+#define chkp_intersect_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_INTERSECT))
+#define chkp_checkl_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDCL))
+#define chkp_checku_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDCU))
+
+/* Comparator for pol_item structures I1 and I2 to be used
+   to find items with equal var.  Also used for polynomial
+   sorting.  */
+static int
+chkp_pol_item_compare (const void *i1, const void *i2)
+{
+  const struct pol_item *p1 = (const struct pol_item *)i1;
+  const struct pol_item *p2 = (const struct
[Ada] Operator name returned by GNAT.Source_Info.Enclosing_Entity
The string returned by GNAT.Source_Info.Enclosing_Entity did not include names of operators (e.g. "**"). The following program:

 1. with Text_IO; use Text_IO;
 2. with GNAT.Source_Info; use GNAT.Source_Info;
 3. procedure BadEE is
 4.    type R is new Boolean;
 5.    RV : R := True;
 6.
 7.    function "**" (X, Y : R) return String is
 8.    begin
 9.       return Enclosing_Entity;
10.    end;
11. begin
12.    Put_Line (RV ** RV);
13. end BadEE;

must output the string: BadEE."**"

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-10  Robert Dewar  de...@adacore.com

	* exp_intr.adb (Write_Entity_Name): Moved to outer level
	(Write_Entity_Name): Properly handle operator names
	(Expand_Source_Info): New procedure.
	* exp_intr.ads (Add_Source_Info): New procedure.

Index: exp_intr.adb
===
--- exp_intr.adb	(revision 216063)
+++ exp_intr.adb	(working copy)
@@ -36,7 +36,6 @@
 with Exp_Fixd; use Exp_Fixd;
 with Exp_Util; use Exp_Util;
 with Freeze;   use Freeze;
-with Namet;    use Namet;
 with Nmake;    use Nmake;
 with Nlists;   use Nlists;
 with Opt;      use Opt;
@@ -116,6 +115,96 @@
 --    Name_Compilation_Date - expand string with compilation date
 --    Name_Compilation_Time - expand string with compilation time

+   procedure Write_Entity_Name (E : Entity_Id);
+   -- Recursive procedure to construct string for qualified name of enclosing
+   -- program unit. The qualification stops at an enclosing scope has no
+   -- source name (block or loop). If entity is a subprogram instance, skip
+   -- enclosing wrapper package. The name is appended to the current contents
+   -- of Name_Buffer, incrementing Name_Len.
+
+   ---------------------
+   -- Add_Source_Info --
+   ---------------------
+
+   procedure Add_Source_Info (Loc : Source_Ptr; Nam : Name_Id) is
+      Ent : Entity_Id;
+
+      Save_NB : constant String  := Name_Buffer (1 .. Name_Len);
+      Save_NL : constant Natural := Name_Len;
+      -- Save current Name_Buffer contents
+
+   begin
+      Name_Len := 0;
+
+      -- Line
+
+      case Nam is
+
+         when Name_Line =>
+            Add_Nat_To_Name_Buffer (Nat (Get_Logical_Line_Number (Loc)));
+
+         when Name_File =>
+            Get_Decoded_Name_String
+              (Reference_Name (Get_Source_File_Index (Loc)));
+
+         when Name_Source_Location =>
+            Build_Location_String (Loc);
+
+         when Name_Enclosing_Entity =>
+
+            -- Skip enclosing blocks to reach enclosing unit
+
+            Ent := Current_Scope;
+            while Present (Ent) loop
+               exit when Ekind (Ent) /= E_Block
+                 and then Ekind (Ent) /= E_Loop;
+               Ent := Scope (Ent);
+            end loop;
+
+            -- Ent now points to the relevant defining entity
+
+            Write_Entity_Name (Ent);
+
+         when Name_Compilation_Date =>
+            declare
+               subtype S13 is String (1 .. 3);
+               Months : constant array (1 .. 12) of S13 :=
+                 ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
+                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
+
+               M1 : constant Character := Opt.Compilation_Time (6);
+               M2 : constant Character := Opt.Compilation_Time (7);
+
+               MM : constant Natural range 1 .. 12 :=
+                 (Character'Pos (M1) - Character'Pos ('0')) * 10 +
+                 (Character'Pos (M2) - Character'Pos ('0'));
+
+            begin
+               -- Reformat ISO date into MMM DD (__DATE__) format
+
+               Name_Buffer (1 .. 3)  := Months (MM);
+               Name_Buffer (4)       := ' ';
+               Name_Buffer (5 .. 6)  := Opt.Compilation_Time (9 .. 10);
+               Name_Buffer (7)       := ' ';
+               Name_Buffer (8 .. 11) := Opt.Compilation_Time (1 .. 4);
+               Name_Len := 11;
+            end;
+
+         when Name_Compilation_Time =>
+            Name_Buffer (1 .. 8) := Opt.Compilation_Time (12 .. 19);
+            Name_Len := 8;
+
+         when others =>
+            raise Program_Error;
+      end case;
+
+      -- Prepend original Name_Buffer contents
+
+      Name_Buffer (Save_NL + 1 .. Save_NL + Name_Len) :=
+        Name_Buffer (1 .. Name_Len);
+      Name_Buffer (1 .. Save_NL) := Save_NB;
+   end Add_Source_Info;
+
   ---------------------------------
   -- Expand_Binary_Operator_Call --
   ---------------------------------
@@ -718,61 +807,6 @@
       Loc : constant Source_Ptr := Sloc (N);
       Ent : Entity_Id;

-      procedure Write_Entity_Name (E : Entity_Id);
-      -- Recursive procedure to construct string for qualified name of
-      -- enclosing program unit. The qualification stops at an enclosing
-      -- scope has no source name (block or loop). If entity is a subprogram
-      -- instance, skip
[Ada] Implement new pragma Prefix_Exception_Messages
This implements a new configuration pragma

pragma Prefix_Exception_Messages;

which causes messages set using raise x with s to be prefixed by the expanded name of the enclosing entity if s is a string literal (if s is more complex, we assume the program is calculating exactly the message it wants). So for example, if we have the program:

1. pragma Prefix_Exception_Messages;
2. procedure Prefixem is
3.    procedure Inner is
4.    begin
5.       raise Constraint_Error with "explicit raise";
6.    end;
7. begin
8.    Inner;
9. end Prefixem;

The output will be:

raised CONSTRAINT_ERROR : Prefixem.Inner: explicit raise

This mode is automatic for run-time library files, so a typical message from the runtime library which used to look like:

raised GNAT.CALENDAR.TIME_IO.PICTURE_ERROR : null picture string

now looks like:

raised GNAT.CALENDAR.TIME_IO.PICTURE_ERROR : GNAT.Calendar.Time_IO.Image: null picture string

In the case of instantiations of containers, you will get the full qualified name of the particular instantiation that is involved. For example, the following program:

 1. with Ada.Containers.Ordered_Sets;
 2. procedure NoElmt is
 3.    package Ordered_Integer_Sets is
 4.      new Ada.Containers.Ordered_Sets (Integer);
 5.    use Ordered_Integer_Sets;
 6. begin
 7.    if No_Element < No_Element then
 8.       null;
 9.    end if;
10. end;

will output

raised CONSTRAINT_ERROR : NoElmt.Ordered_Integer_Sets."<": Left cursor equals No_Element

This allows disambiguation of messages without reintroducing line numbers which are problematic for maintaining tests over different versions and targets. Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-10  Robert Dewar  de...@adacore.com

	* exp_ch11.adb (Expand_N_Raise_Statement): Handle
	Prefix_Exception_Messages.
	* opt.adb: Handle new flags Prefix_Exception_Message[_Config].
	* opt.ads: New flags Prefix_Exception_Message[_Config].
	* par-prag.adb: New dummy entry for pragma Prefix_Exception_Messages.
	* snames.ads-tmpl: Add entries for new pragma Prefix_Exception_Messages.
	* sem_prag.adb: Implement new pragma Prefix_Exception_Messages
	* gnat_rm.texi: Document pragma Prefix_Exception_Messages.

Index: exp_ch11.adb
===
--- exp_ch11.adb	(revision 216063)
+++ exp_ch11.adb	(working copy)
@@ -29,6 +29,7 @@
 with Elists;   use Elists;
 with Errout;   use Errout;
 with Exp_Ch7;  use Exp_Ch7;
+with Exp_Intr; use Exp_Intr;
 with Exp_Util; use Exp_Util;
 with Namet;    use Namet;
 with Nlists;   use Nlists;
@@ -1565,6 +1566,22 @@

       if Present (Expression (N)) then

+         -- Adjust message to deal with Prefix_Exception_Messages. We only
+         -- add the prefix to string literals, if the message is being
+         -- constructed, we assume it already deals with uniqueness.
+
+         if Prefix_Exception_Messages
+           and then Nkind (Expression (N)) = N_String_Literal
+         then
+            Name_Len := 0;
+            Add_Source_Info (Loc, Name_Enclosing_Entity);
+            Add_Str_To_Name_Buffer (": ");
+            Add_String_To_Name_Buffer (Strval (Expression (N)));
+            Rewrite (Expression (N),
+              Make_String_Literal (Loc, Name_Buffer (1 .. Name_Len)));
+            Analyze_And_Resolve (Expression (N), Standard_String);
+         end if;
+
          -- Avoid passing exception-name'identity in runtimes in which this
          -- argument is not used. This avoids generating undefined references
          -- to these exceptions when compiling with no optimization

Index: gnat_rm.texi
===
--- gnat_rm.texi	(revision 216081)
+++ gnat_rm.texi	(working copy)
@@ -227,6 +227,7 @@
 * Pragma Precondition::
 * Pragma Predicate::
 * Pragma Preelaborable_Initialization::
+* Pragma Prefix_Exception_Messages::
 * Pragma Pre_Class::
 * Pragma Priority_Specific_Dispatching::
 * Pragma Profile::
@@ -1096,6 +1097,7 @@
 * Pragma Precondition::
 * Pragma Predicate::
 * Pragma Preelaborable_Initialization::
+* Pragma Prefix_Exception_Messages::
 * Pragma Pre_Class::
 * Pragma Priority_Specific_Dispatching::
 * Pragma Profile::
@@ -5692,6 +5694,34 @@
 versions of Ada as an implementation-defined pragma. See Ada 2012
 Reference Manual for details.
+@node Pragma Prefix_Exception_Messages +@unnumberedsec Pragma Prefix_Exception_Messages +@cindex Prefix_Exception_Messages +@cindex exception +@cindex Exception_Message +@findex Exceptions +@noindent +Syntax: + +@smallexample @c ada +pragma Prefix_Exception_Messages; +@end smallexample + +@noindent +This is an implementation-defined configuration pragma that affects the +behavior of raise statements with a message given as a static string +constant (typically a
[Ada] Spurious error on local instantiation of pure generic unit
This patch fixes an error in the legality checks of aspects that apply to library units: these aspects are legal on a local instantiation of a library-level generic unit that carries the aspect pure. The following must compile quietly: gcc -c my_buffer.adb --- package My_Buffer with Elaborate_Body is end My_Buffer; --- with Common.Gen_Circular_Buffer; package body My_Buffer is type Capacity_Count is range 0 .. 10; procedure Copy_Integer (From_Element : in Integer; To_Element :out Integer) is begin To_Element := From_Element; end Copy_Integer; package Buffer is new Common.Gen_Circular_Buffer (Capacity_Count = Capacity_Count, Capacity = 10, Element_Type = Integer, Copy_Element = Copy_Integer); end My_Buffer; --- package body Common.Gen_Circular_Buffer is procedure Initialize (Buffer : out Buffer_Type) is begin Buffer.Start_Index := Element_Index'First; Buffer.Count := 0; end Initialize; procedure Insert (Element : in Element_Type; Buffer : in out Buffer_Type) is End_Index : Element_Index; -- Index into the end of the buffer where Element is to be stored. 
begin if Element_Index'Last - Buffer.Start_Index = Buffer.Count then End_Index := Buffer.Start_Index + Buffer.Count; else End_Index := Element_Index'First + (Buffer.Count - ((Element_Index'Last - Buffer.Start_Index) + 1)); end if; Copy_Element (From_Element = Element, To_Element = Buffer.Elements (End_Index)); Buffer.Count := Buffer.Count + 1; end Insert; procedure Remove (Buffer : in out Buffer_Type; Element : out Element_Type) is begin Copy_Element (From_Element = Buffer.Elements (Buffer.Start_Index), To_Element = Element); Discard_First (Buffer); end Remove; procedure Discard_First (Buffer : in out Buffer_Type) is begin if Buffer.Start_Index = Element_Index'Last then Buffer.Start_Index := Element_Index'First; else Buffer.Start_Index := Buffer.Start_Index + 1; end if; Buffer.Count := Buffer.Count - 1; end Discard_First; function Capacity_Used (Buffer : in Buffer_Type) return Element_Count is begin return Buffer.Count; end Capacity_Used; end Common.Gen_Circular_Buffer; --- generic type Capacity_Count is range ; Capacity : Capacity_Count; type Element_Type is limited private; with procedure Copy_Element (From_Element : in Element_Type; To_Element : out Element_Type); package Common.Gen_Circular_Buffer with Pure is type Buffer_Type is limited private; type Element_Count is new Capacity_Count range 0 .. Capacity; procedure Initialize (Buffer : out Buffer_Type); procedure Insert (Element : in Element_Type; Buffer : in out Buffer_Type) with Pre = Capacity_Used (Buffer) Element_Count'Last; procedure Remove (Buffer : in out Buffer_Type; Element : out Element_Type) with Pre = Capacity_Used (Buffer) 0; procedure Discard_First (Buffer : in out Buffer_Type) with Pre = Capacity_Used (Buffer) 0; function Capacity_Used (Buffer : in Buffer_Type) return Element_Count; private subtype Element_Index is Element_Count range 1 .. 
Element_Count'Last; type Element_Array is array (Element_Index) of Element_Type; type Buffer_Type is record Start_Index : Element_Index; Count : Element_Count; Elements : Element_Array; -- Element storage. end record; end Common.Gen_Circular_Buffer; --- package Common with Pure is end Common; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * sem_ch13.adb (Analyze_Aspect_Specifications, Library_Unit_Aspects): Aspect specification is legal on a local instantiation of a library-level generic unit. Index: sem_ch13.adb === --- sem_ch13.adb(revision 216081) +++ sem_ch13.adb(working copy) @@ -3018,12 +3018,15 @@ -- of a package declaration, the pragma needs to be inserted -- in the list of declarations for the associated package. -- There is no issue of visibility delay for these aspects. + -- Aspect is legal on a local instantiation of a library- + -- level generic unit. if A_Id in Library_Unit_Aspects and then Nkind_In (N, N_Package_Declaration, N_Generic_Package_Declaration) and then Nkind (Parent (N)) /= N_Compilation_Unit +and then not Is_Generic_Instance (Defining_Entity (N)) then
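The wrap-around index arithmetic in Gen_Circular_Buffer's Insert and Discard_First can be sketched in C. This is a hedged translation of the Ada above (names are mine, and the 1-based Element_Index is mapped to 0-based array indices, which lets a modulo replace the explicit wrap test):

```c
#include <assert.h>

#define CAPACITY 10

struct buffer {
    int elements[CAPACITY];
    int start;   /* index of the first (oldest) element */
    int count;   /* number of elements currently stored */
};

/* Append at the logical end, wrapping past the last slot
   (the if/else in Insert collapses to a modulo with 0-based indices). */
static void buf_insert(struct buffer *b, int v)
{
    int end = (b->start + b->count) % CAPACITY;
    b->elements[end] = v;
    b->count++;
}

/* Remove and return the oldest element (Remove + Discard_First). */
static int buf_remove(struct buffer *b)
{
    int v = b->elements[b->start];
    b->start = (b->start + 1) % CAPACITY;
    b->count--;
    return v;
}
```

The preconditions from the Ada spec (Capacity_Used < Capacity before insert, Capacity_Used > 0 before remove) are the caller's responsibility in this sketch.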
[Ada] Ada2012 freeze rules for subprogram profiles
AI05-019 specifies that freezing a subprogram does not automatically freeze the profile, i.e. the types of the formals and the return type. In particular an attribute reference 'Access and its relatives do not freeze the profile. Compiling bd.ads must yield: bd.ads:15:34: incorrect expression for READ attribute --- with Ada.Streams; package BD is type My_Big_Int is range 0 .. 1; type Write_Ptr is access procedure (Stream : not null access Ada.Streams.Root_Stream_Type'Class; A : in My_Big_Int'Base); procedure Good_Write6 (Stream : not null access Ada.Streams.Root_Stream_Type'Class; A : in My_Big_Int'Base); WPtr : Write_Ptr := Good_Write6'Access; -- Does not freeze My_Big_Int (AI05-0019-1). for My_Big_Int'Read use WPtr.all; -- ERROR: private type My_Priv (D : Integer) is null record; end BD; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * freeze.adb (Freeze_Entity): Freezing a subprogram does not always freeze its profile. In particular, an attribute reference that takes the access type does not freeze the types of the formals. Index: freeze.adb === --- freeze.adb (revision 216089) +++ freeze.adb (working copy) @@ -4004,7 +4004,17 @@ -- any extra formal parameters are created since we now know -- whether the subprogram will use a foreign convention. -if not Is_Internal (E) then +-- In Ada 2012, freezing a subprogram does not always freeze +-- the corresponding profile (see AI05-019). An attribute +-- reference is not a freezing point of the profile. +-- Other constructs that should not freeze ??? + +if Ada_Version >= Ada_2005 + and then Nkind (N) = N_Attribute_Reference +then + null; + +elsif not Is_Internal (E) then declare F_Type: Entity_Id; R_Type: Entity_Id;
[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-10-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added. ccmc_v1.patch Description: Binary data
Re: [PATCH] gcc parallel make check
On 10 October 2014 16:19, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. Thanks for your prompt answer. Is this documented somewhere, so that such cases do not happen in the future? See the patches I've posted when adding the fine-grained parallelization, e.g. go testsuite has been fixed that way, etc. So, in your above example, you'd need: gcc_parallel_test_enable 0 line before c-torture-execute and gcc_parallel_test_enable 1 line after gcc-dg-runtest. That way, if runtest_file_p says the test should be scheduled by current instance, all the subtests will be run there. If my.exp is part of gcc/testsuite, I'm sorry for missing it, if it is elsewhere, just fix it up. It's in a patch which has been under review for quite some time (started before your change), that's why you missed it. Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment them and through sed on the log files verify that each instance performs the same parallelization checks (same strings). Yep, I saw those and also added other traces of my own :-) What about my remark about: # For parallelized check-% targets, this decides whether parallelization # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything # but optional --target_board or --extra_opts arguments). If desirable, I think it should be removed from gcc/Makefile.in Thanks, Christophe.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On 09 Oct 15:03, Jeff Law wrote: On 10/08/14 13:08, Ilya Enkovich wrote: Hi, This patch adds removal of user calls to chkp builtins which become useless after instrumentation. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. Please put this in the file with the optimizations. Tests too, which may require you to add some dumping bits into this code since you may not be able to directly see the behaviour you want in the gimple dumps later. What I'm a bit confused about is it looks like every one of these builtin calls you end up optimizing -- why not generate the simple copy in the first place and avoid the need for chkp_remove_useless_builtins completely? Clearly I'm missing something here. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 5443950..b424af8 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -3805,6 +3805,49 @@ chkp_instrument_function (void) } } +/* Find init/null/copy_ptr_bounds calls and replace them + with assignments. It should allow better code + optimization. */ + +static void +chkp_remove_useless_builtins () +{ + basic_block bb, next; + gimple_stmt_iterator gsi; + + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; + do +{ + next = bb->next_bb; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + tree fndecl; + enum built_in_function fcode; There's some kind of formatting goof on one or more of the preceding lines. They should be at the same indentation level. Most likely they aren't consistent with their use of tabs vs spaces. A nit, but please take care of it. If we end up keeping this code, then I think it'll be fine with the whitespace nit fixed. I just want to make sure there's a good reason to generate these builtins in the first place rather than the more direct assignment. 
Thanks, Jeff With this code we remove user builtins calls coming from source code. E.g.: p2 = (int *)__bnd_init_ptr_bounds (p1); *p2 = 0; which means p2 has value of p1 but has default bounds and following store is unchecked. These calls are important for instrumentation but useless after instrumentation. I don't think it is a part of checker optimizer because it doesn't optimize instrumentation code. Also this transformation is trivial enough for O0 and checker optimizer works starting from O2. Below is a version fixed according to Richard's comments. Thanks, Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 170b33b..fde0228 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -3805,6 +3805,44 @@ chkp_instrument_function (void) } } +/* Find init/null/copy_ptr_bounds calls and replace them + with assignments. It should allow better code + optimization. */ + +static void +chkp_remove_useless_builtins () +{ + basic_block bb, next; + gimple_stmt_iterator gsi; + + FOR_EACH_BB_FN (bb, cfun) +{ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + tree fndecl; + enum built_in_function fcode; + + /* Find builtins returning first arg and replace +them with assignments. */ + if (gimple_code (stmt) == GIMPLE_CALL + && (fndecl = gimple_call_fndecl (stmt)) + && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL + && (fcode = DECL_FUNCTION_CODE (fndecl)) + && (fcode == BUILT_IN_CHKP_INIT_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_NULL_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_COPY_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_SET_PTR_BOUNDS)) + { + tree res = gimple_call_arg (stmt, 0); + update_call_from_tree (gsi, res); + stmt = gsi_stmt (gsi); + update_stmt (stmt); + } +} +} +} + /* Initialize pass. 
*/ static void chkp_init (void) @@ -3872,6 +3910,8 @@ chkp_execute (void) chkp_instrument_function (); + chkp_remove_useless_builtins (); + chkp_function_mark_instrumented (cfun->decl); chkp_fix_cfg ();
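Ilya's example above can be modelled outside an MPX-instrumented build. `__bnd_init_ptr_bounds` is a checker builtin, so the stub below is a hypothetical stand-in with the property the pass relies on: once the bounds side effect is gone, the call just returns its first argument, which is why `chkp_remove_useless_builtins` can rewrite the call into a plain copy without changing the program's value semantics:

```c
#include <assert.h>

/* Hypothetical stand-in for __bnd_init_ptr_bounds: after instrumentation
   the remaining value behaviour is "return the first argument".  */
static void *bnd_init_ptr_bounds_stub(void *p)
{
    return p;
}

/* Before the cleanup: the pointer value flows through the builtin.  */
static int store_via_builtin(int *p1)
{
    int *p2 = (int *) bnd_init_ptr_bounds_stub(p1);
    *p2 = 42;            /* store through p2, bounds-wise unchecked */
    return *p1;
}

/* After chkp_remove_useless_builtins: the call became a copy
   (this is what update_call_from_tree with the first argument does).  */
static int store_via_copy(int *p1)
{
    int *p2 = p1;
    *p2 = 42;
    return *p1;
}
```

Both functions compute the same value, which is the invariant the cleanup pass preserves.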
Re: [PATCH] gcc parallel make check
On Fri, Oct 10, 2014 at 04:50:47PM +0200, Christophe Lyon wrote: On 10 October 2014 16:19, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. Thanks for your prompt answer. Is this documented somewhere, so that such cases do not happen in the future? Feel free to submit a documentation patch. It's in a patch which has been under review for quite some time (started before your change), that's why you missed it. Ah, ok. What about my remark about: # For parallelized check-% targets, this decides whether parallelization # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything # but optional --target_board or --extra_opts arguments). If desirable, I think it should be removed from gcc/Makefile.in Only the and RUNTESTFLAGS ... arguments part of that. Patch preapproved. Jakub
FW: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi, I forgot to mention that this patch was tested in combination with another patch by Alan Hayward (coming soon), with the following results: x86_64: No regressions. aarch64: No regressions. aarch64_be: Lots of vector tests now fixed. No new regressions. Regards, David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 10 October 2014 15:48 To: gcc-patches@gcc.gnu.org Subject: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-10-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added.
New rematerialization sub-pass in LRA
Here is a new rematerialization sub-pass of LRA. This summer, I tried to implement a separate register-pressure release pass based on rematerialization as described in Simpson's PhD. Although this approach is very attractive, as implementing a separate pass is simpler, it did not work in GCC. I saw no code improvement but mostly big performance degradations. We have pretty decent register pressure evaluation infrastructure, but apparently it is not enough to make the right rematerialization decisions based only on this info, as IRA/LRA perform many other optimizations which can be hard to take into account before the RA. So I guess Simpson's approach works only for a basic/simple RA on regular register-file architectures (as does the register-pressure decrease pass described in Morgan's book, which I also tried). So I had to implement rematerialization in LRA itself, where all the necessary info is available. The implementation is more complicated than Simpson's one, but it is worth it as there is no performance degradation in the vast majority of cases. The new LRA rematerialization sub-pass works right before the spilling sub-pass and tries to rematerialize values of spilled pseudos. To implement the new sub-pass, some important changes in LRA were done. First, lra-lives.c updates live info about all registers (not only allocatable ones) and, second, lra-constraints.c was modified to permit checking that an insn satisfies all constraints in the strict sense even if it still contains pseudos. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass generates smaller code on average on both architectures (although the improvement is not significant), adds 0.4% additional compilation time in -O2 mode of release GCC (according to user time of compilation of a 500K-line Fortran program and valgrind lackey insn counts for compiling combine.i) and about 0.7% in -O0 mode. 
As for performance, the best result I found is a 1% SPECFP2000 improvement on ARM Exynos 5410 (973 vs 963), but for Intel Haswell the performance results are practically the same (Haswell has a very sophisticated memory sub-system). There is room for improvement; I wrote some ideas in the top-level comment of the file lra-remat.c. The rematerialization sub-pass works at -O2 and higher, and a new option -flra-remat is introduced. The patch was successfully tested on x86-64 and ARM. I am going to submit it next week. Any comments are appreciated. 2014-10-10 Vladimir Makarov vmaka...@redhat.com * common.opt (flra-remat): New. * opts.c (default_options_table): Add entry for flra_remat. * timevar.def (TV_LRA_REMAT): New. * doc/invoke.texi (-flra-remat): Add description of the new option. * doc/passes.texi (-flra-remat): Remove lra-equivs.c and lra-saves.c. Add lra-remat.c. * Makefile.in (OBJS): Add lra-remat.o. * lra-remat.c: New file. * lra.c: Add info about the rematerialization pass in the top comment. (collect_non_operand_hard_regs, add_regs_to_insn_regno_info): Process unallocatable regs too. (lra_constraint_new_insn_uid_start): Remove. (lra): Add code for calling rematerialization sub-pass. * lra-int.h (lra_constraint_new_insn_uid_start): Remove. (lra_constrain_insn, lra_remat): New prototypes. (lra_eliminate_regs_1): Add parameter. * lra-lives.c (make_hard_regno_born, make_hard_regno_dead): Process unallocatable hard regs too. (process_bb_lives): Ditto. * lra-spills.c (remove_pseudos): Add argument to lra_eliminate_regs_1 call. * lra-eliminations.c (lra_eliminate_regs_1): Add parameter. Use it for sp offset calculation. (lra_eliminate_regs): Add argument for lra_eliminate_regs_1 call. (eliminate_regs_in_insn): Add parameter. Use it for sp offset calculation. (process_insn_for_elimination): Add argument for eliminate_regs_in_insn call. * lra-constraints.c (get_equiv_with_elimination): Add argument for lra_eliminate_regs_1 call. 
(process_addr_reg): Add parameter. Use it. (process_address_1): Ditto. Add argument for process_addr_reg call. (process_address): Ditto. (curr_insn_transform): Add parameter. Use it. Add argument for process_address calls. (lra_constrain_insn): New function. (lra_constraints): Add argument for curr_insn_transform call. Index: Makefile.in === --- Makefile.in (revision 216039) +++ Makefile.in (working copy) @@ -1296,6 +1296,7 @@ OBJS = \ lra-constraints.o \ lra-eliminations.o \ lra-lives.o \ + lra-remat.o \ lra-spills.o \ lto-cgraph.o \ lto-streamer.o \ Index: common.opt === --- common.opt (revision 216039) +++
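The idea the sub-pass exploits can be seen at the source level: instead of spilling a pseudo to the stack and reloading it at the use, recompute its value at the use point when the defining operands are still live. The two C functions below are a conceptual sketch (names are mine, not from lra-remat.c) of the two semantically equivalent shapes the allocator chooses between:

```c
#include <assert.h>

/* A value that is cheap to recompute from operands that stay live,
   e.g. an address computation.  */
static long addr_of(long base, long index)
{
    return base + index * 8;
}

/* "Spill" flavour: the computed value t is kept live across a region
   of high register pressure (modelled by the pressure expression),
   so the allocator would store t to a stack slot and reload it.  */
static long use_spilled(long base, long index, long other)
{
    long t = addr_of(base, index);   /* t would be spilled here */
    long pressure = other * 3 + 1;   /* stand-in for intervening work */
    return t + pressure;             /* t reloaded from the stack here */
}

/* Rematerialized flavour: no stack slot; t is recomputed at the use,
   which is legal because base and index are still available.  */
static long use_remat(long base, long index, long other)
{
    long pressure = other * 3 + 1;
    return addr_of(base, index) + pressure;  /* recomputed, not reloaded */
}
```

Rematerialization is profitable when the recomputation is cheaper than the store/load pair, which is why the sub-pass runs right before the spilling sub-pass, where that trade-off is visible.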
Re: RFA: RTL typesafety improvements for ira.c
On 2014-10-08 10:21 PM, Jeff Law wrote: On 10/01/14 17:27, David Malcolm wrote: FWIW, presumably insn here also can now be an rtx_insn *? (I'd like to eventually strengthen the params to the note-handling functions, so fixing this up now would help with that). Here's the updated patch to include strengthening insn to rtx_insn *. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? I'd like to get this wrapped up so that a patch in the same area from Felix can get rebased and move forward. It is ok for the trunk, Jeff. Sorry for the delay with the response. As you know I was busy with the rematerialization pass.
[PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
On Fri, Oct 10, 2014 at 5:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote: We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? OK. Thanks, Uros.
[AArch64] [BE] Fix vector load/stores to not use ld1/st1
This patch is dependent on "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe". It fixes up movoi/ci/xi for Big Endian, so that the lsb of a big-endian integer ends up in the low byte of the highest-numbered register. movoi uses stp/ldp. movxi needs a split version (which is shared with register-register movxi), which just splits to two new movs. movci can then split to three movs. A future patch will instead split to an movoi and a movti. There are no changes for LE. Ran the whole of check with both parts of "Make large opaque integer modes endianness-safe". No regressions. ChangeLog: gcc/: 2014-10-10 Alan Hayward alan.hayw...@arm.com * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): New operand for printing a q register+1. (aarch64_simd_emit_reg_reg_move): Replacement for aarch64_simd_disambiguate_copy that plants the required mov. * config/aarch64/aarch64-protos.h (aarch64_simd_emit_reg_reg_move): Replacement for aarch64_simd_disambiguate_copy. * config/aarch64/aarch64-simd.md (define_split): Use new aarch64_simd_emit_reg_reg_move. (define_expand mov<mode>): Less restrictive predicates. (define_insn *aarch64_mov<mode>): Simplify and only allow for LE. (define_insn *aarch64_be_movoi): New. BE only. Plant ldp or stp. (define_insn *aarch64_be_movci): New. BE only. No instructions. (define_insn *aarch64_be_movxi): New. BE only. No instructions. (define_split) OI mov: Use new aarch64_simd_emit_reg_reg_move. (define_split) CI mov: Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. (define_split) XI mov: Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. 0001-be-mov.patch Description: Binary data
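Getting the lsb into the low byte of the highest-numbered register amounts to reversing the order of the 64-bit subvalues that make up a multi-register value. A minimal model of that idea (this is only a conceptual sketch in C, not the aarch64_rev_reglist implementation, which works on register lists in the machine description):

```c
#include <assert.h>
#include <stdint.h>

/* Model an OI-mode (256-bit) value as four 64-bit subvalues, the
   lane holding the lsb first.  On big-endian, a move between the
   vector registers and memory needs the subvalues in the opposite
   order, so a register-list reversal is inserted around the move.  */
#define NLANES 4

static void reverse_reglist(uint64_t v[NLANES])
{
    for (int i = 0; i < NLANES / 2; i++) {
        uint64_t tmp = v[i];
        v[i] = v[NLANES - 1 - i];
        v[NLANES - 1 - i] = tmp;
    }
}
```

Reversing twice is the identity, which is why a BE round trip through memory (store with one reversal, load with the other) preserves the value.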
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 2014-10-10 3:42 AM, Evgeny Stupachenko wrote: Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Performance on Silvermont (for the group of 3 patches): at -Ofast -flto -funroll-loops -fPIC EEMBC benchs: +6% CoreMark +3% Denmark Individual test gained up to +25% SPEC2000int is almost unchanged, SPEC2000fp +1,7%: at -O2 -fPIC EEMBC benchs: +2% CoreMark +5% Denmark Individual test gained up to +25% SPEC2000int and SPEC2000fp are almost unchanged: Great results! The changes to RA (all ira* lra* file changes) is ok for me. I'll probably add some comments to these changes on next week. Thanks.
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10/10/14 15:22, Kyrill Tkachov wrote: On 10/10/14 15:18, Richard Earnshaw wrote: On 10/10/14 11:53, Kyrill Tkachov wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. 
(aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options. a53-workaround-new.patch commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 12:48:34 2014 + [AArch64] Add final prescan workaround for Cortex-A53 erratum. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index b5f53d2..c57a467 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl, extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); +extern bool aarch64_madd_needs_nop (rtx_insn *); +extern void aarch64_final_prescan_insn (rtx_insn *); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5144c35..76a2480 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type) return NULL; } +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} + +/* Find the first rtx_insn before insn that will generate an assembly + instruction. */ + +static rtx_insn * +aarch64_prev_real_insn (rtx_insn *insn) +{ + if (!insn) +return NULL; + + do +{ + insn = prev_real_insn (insn); +} + while (insn && recog_memoized (insn) < 0); + + return insn; +} + +static bool +is_madd_op (enum attr_type t1) +{ + unsigned int i; + /* A number of these may be AArch32 only. 
*/ + enum attr_type mlatypes[] = { +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD, +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY, +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD + }; + + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++) +{ + if (t1 == mlatypes[i]) + return true; +} + + return false; +} + +/* Check if there is a register dependency between a load and the insn + for which we hold recog_data. */ + +static bool +dep_between_memop_and_curr (rtx memop) +{ + rtx load_reg; + int opno; + + if (!memop) +return false; + + if (!REG_P (SET_DEST (memop))) +return false; + + load_reg = SET_DEST (memop); + for (opno = 0; opno < recog_data.n_operands;
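The prescan decision can be modelled outside the compiler: walk the insn stream and, whenever a 64-bit multiply-accumulate immediately follows a memory instruction, request a nop before it. This is a simplified model of what aarch64_madd_needs_nop checks (the real code also inspects register dependencies and insn attributes; the names below are mine):

```c
#include <assert.h>
#include <stddef.h>

enum insn_kind { INSN_OTHER, INSN_MEM, INSN_MADD };

/* Decide whether a nop must be emitted before insn i: true when a
   multiply-accumulate directly follows a memory (load/store/prefetch)
   instruction, the sequence the erratum workaround must break up.  */
static int madd_needs_nop(const enum insn_kind *insns, size_t i)
{
    if (insns[i] != INSN_MADD)
        return 0;
    return i > 0 && insns[i - 1] == INSN_MEM;
}

/* Count the nops the workaround would insert into a stream.  */
static size_t count_nops(const enum insn_kind *insns, size_t n)
{
    size_t nops = 0;
    for (size_t i = 0; i < n; i++)
        nops += madd_needs_nop(insns, i);
    return nops;
}
```

In the compiler the same decision is taken per insn from final prescan, with ADJUST_INSN_LENGTH accounting for the extra 4 bytes so branch ranges stay correct.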
[PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
Hi, The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost. Bootstrap and make check are in progress. The patch boosts (up to 2.5x improvement) several benchmarks compiled with -Ofast on Silvermont Spec2000: +5% gain on 173.applu +1% gain on 255.vortex Is it ok for trunk once it passes bootstrap and make check? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Increase PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost. * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New. * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates CPUs with high branch cost. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6337aa5..5ac10eb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p, opts->x_param_values, opts_set->x_param_values); + /* Extend full peel max insns parameter for CPUs with high branch cost. */ + if (TARGET_HIGH_BRANCH_COST) +maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS, + 120, + opts->x_param_values, + opts_set->x_param_values); + + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. 
*/ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 2c64162..da0c57b 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; #define TARGET_INTER_UNIT_CONVERSIONS \ ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS] #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT] +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST] #define TARGET_SCHEDULE ix86_tune_features[X86_TUNE_SCHEDULE] #define TARGET_USE_BT ix86_tune_features[X86_TUNE_USE_BT] #define TARGET_USE_INCDEC ix86_tune_features[X86_TUNE_USE_INCDEC] diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index b6b210e..04d8bf8 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, four_jump_limit, m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL | m_ATHLON_K8 | m_AMDFAM10) +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost. This could be + used to tune unroll, if-cvt, inline... heuristics. */ +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, high_branch_cost, + m_BONNELL | m_SILVERMONT | m_INTEL) + /*/ /* Integer instruction selection tuning */ /*/
Re: [patch] Use abi_tag attribute on std::list
On 03/10/14 15:49 +0100, Jonathan Wakely wrote: On 03/10/14 16:25 +0200, Marc Glisse wrote: On Fri, 3 Oct 2014, Jonathan Wakely wrote: This is the patch I intend to commit to make std::list::size() O(1) as required by C++11. This is an ABI change, so std::list will get tagged with abi_tag(cxx11) so that it mangles differently. Assuming a future where we have both _GLIBCXX_ABI_TAG_CXX11 and _GLIBCXX_ABI_TAG_CXX17, I don't really see how _GLIBCXX_DEFAULT_ABI_TAG is supposed to work. We don't want to define _GLIBCXX_DEFAULT_ABI_TAG to _GLIBCXX_ABI_TAG_CXX17 and suddenly have std::list change mangling. True. Should it be called _GLIBCXX_DEFAULT_ABI_TAG_CXX11, meaning _GLIBCXX_ABI_TAG_CXX11_IF_ENABLED_AND_NOTHING_OTHERWISE? I suppose so ... or _GLIBCXX_MAYBE_ABI_TAG_CXX11 ? Defining a dummy _M_distance in the old abi case is a bit strange (we could protect the single use with #if _GLIBCXX_USE_CXX11_ABI), but why not... Yeah, I was trying to minimise the preprocessor conditionals, but maybe that's one step too far. Do you mind if I move (in a future patch once yours is committed) _M_size into _M_impl::_M_node as suggested in PR 61347? Gah, that's where I had it until earlier this week, and I looked at it and wondered why it was in the _List_impl class (because you only need one member in there to benefit from the empty base-class optimisation). I will move it back there, since I already have that code on another branch, so there's no point making you change the code to match something I've already got! Thanks for your useful comments (as always). I've committed the attached patch, which I think is the same as the one I posted a week ago. Marc, I agree with your comments, but chose to ignore them ;-) For now, anyway. I want to get something committed, we can improve it in stage 3, or when we need to worry about a C++17 ABI change. Tested x86_64-linux, committed to trunk. 
Documentation for the abi_tag(cxx11) changes will follow ASAP, but probably during stage 3, once all the changes are committed. commit 3ac61312eb285d125da8af4c6cdaad1b1ce7d235 Author: Jonathan Wakely jwak...@redhat.com Date: Thu Aug 7 01:13:17 2014 +0100 PR libstdc++/49561 * acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI): Define. * configure.ac: Use GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI. * configure: Regenerate. * include/Makefile.am (stamp-cxx11-abi): New target. (c++config.h): Set _GLIBCXX_USE_CXX11_ABI macro. * include/Makefile.in: Regenerate. * include/bits/c++config: Add _GLIBCXX_USE_CXX11_ABI placeholder and define _GLIBCXX_DEFAULT_ABI_TAG. * include/bits/list.tcc (list::emplace(const_iterator, _Args...)): Increment size. (list::emplace(const_iterator, const value_type)): Likewise. (list::merge(list), list::merge(list, _StrictWeakOrdering)): Adjust list sizes. * include/bits/stl_list.h (_List_base, list): Add ABI tag macro. (_List_base::_M_size): New data member in cxx11 ABI mode. (_List_base::_S_distance(_List_node_base*, _List_node_base*)): New function. (_List_base::_M_get_size(), _List_base::_M_set_size(size_t), _List_base::_M_inc_size(size_t), _List_base::_M_dec_size(size_t), _List_base::_M_distance, _List_base::_M_node_count): New functions for accessing list size correctly for the ABI mode. (_List_base::_List_base(_List_base)): Copy size and reset source. (_List_base::_M_init()): Initialize size member. (list::size()): Use _List_base::_M_node_count. (list::swap(list)): Swap sizes. (list::splice(iterator, list)): Update sizes. (list::splice(iterator, list, iterator)): Likewise. (list::insert(iterator, const value_type)): Update size. (list::insert(iterator, _Args...)): Likewise. (list::_M_erase(iterator)): Likewise. * testsuite/23_containers/list/requirements/dr438/assign_neg.cc: Adjust. * testsuite/23_containers/list/requirements/dr438/constructor_1_neg.cc: Adjust. * testsuite/23_containers/list/requirements/dr438/constructor_2_neg.cc: Adjust. 
* testsuite/23_containers/list/requirements/dr438/insert_neg.cc: Adjust. * testsuite/ext/profile/mutex_extensions_neg.cc: Adjust. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 2650a5a..0229609 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -3831,6 +3831,25 @@ AC_DEFUN([GLIBCXX_CHECK_SDT_H], [ AC_MSG_RESULT($glibcxx_cv_sys_sdt_h) ]) +dnl +dnl Check if the user wants the new C++11-conforming ABI. +dnl +dnl --disable-libstdcxx-cxx11-abi will use old ABI for all types. +dnl +dnl Defines: +dnl _GLIBCXX_USE_ABI_TAG (always defined, either to 1 or 0) +dnl +AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI], [ + AC_ARG_ENABLE([libstdcxx-cxx11-abi], +AC_HELP_STRING([--disable-libstdcxx-cxx11-abi], + [disable the C++11-conforming ABI]),, + [enable_libstdcxx_cxx11_abi=yes]) + if test
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On 09 Oct 20:51, Jakub Jelinek wrote: On Thu, Oct 09, 2014 at 04:15:23PM +0400, Ilya Tocar wrote: --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21358,32 +21358,169 @@ ix86_expand_int_vcond (rtx operands[]) return true; } -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1) +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, struct expand_vec_perm_d *d) Too long line, please wrap it. Fixed. { - enum machine_mode mode = GET_MODE (op0); + enum machine_mode mode = GET_MODE (d ? d->op0 : op0); + switch (mode) { +case V8HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; My strong preference would be: enum machine_mode maskmode = mode; rtx (*gen) (rtx, rtx, rtx, rtx); right below the enum machine_mode mode = GET_MODE (d ? d->op0 : op0); line and then inside of the first switch just do: ... case V16SImode: if (!TARGET_AVX512F) return false; gen = gen_avx512f_vpermi2varv16si3; break; case V4SFmode: if (!TARGET_AVX512VL) return false; gen = gen_avx512vl_vpermi2varv4sf3; maskmode = V4SImode; break; ... etc., then in the mask = line use: mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec)); and finally instead of the second switch do: emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); return true; Updated patch below. --- gcc/config/i386/i386.c | 281 +++-- gcc/config/i386/sse.md | 45 2 files changed, 253 insertions(+), 73 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 352ab81..2247da8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21358,33 +21358,132 @@ ix86_expand_int_vcond (rtx operands[]) return true; } +/* AVX512F does support 64-byte integer vector operations, + thus the longest vector we are faced with is V64QImode. 
*/ +#define MAX_VECT_LEN 64 + +struct expand_vec_perm_d +{ + rtx target, op0, op1; + unsigned char perm[MAX_VECT_LEN]; + enum machine_mode vmode; + unsigned char nelt; + bool one_operand_p; + bool testing_p; +}; + static bool -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1) +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, + struct expand_vec_perm_d *d) { - enum machine_mode mode = GET_MODE (op0); + /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const + expander, so args are either in d, or in op0, op1 etc. */ + enum machine_mode mode = GET_MODE (d ? d->op0 : op0); + enum machine_mode maskmode = mode; + rtx (*gen) (rtx, rtx, rtx, rtx); + switch (mode) { +case V8HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; + gen = gen_avx512vl_vpermi2varv8hi3; + break; +case V16HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; + gen = gen_avx512vl_vpermi2varv16hi3; + break; +case V32HImode: + if (!TARGET_AVX512BW) + return false; + gen = gen_avx512bw_vpermi2varv32hi3; + break; +case V4SImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4si3; + break; +case V8SImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv8si3; + break; case V16SImode: - emit_insn (gen_avx512f_vpermi2varv16si3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (!TARGET_AVX512F) + return false; + gen = gen_avx512f_vpermi2varv16si3; + break; +case V4SFmode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4sf3; + maskmode = V4SImode; + break; +case V8SFmode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv8sf3; + maskmode = V8SImode; + break; case V16SFmode: - emit_insn (gen_avx512f_vpermi2varv16sf3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (!TARGET_AVX512F) + return false; + gen = gen_avx512f_vpermi2varv16sf3; + maskmode = V16SImode; + break; +case V2DImode: 
+ if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv2di3; + break; +case V4DImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4di3; + break; case V8DImode: - emit_insn (gen_avx512f_vpermi2varv8di3 (target, op0, -force_reg (V8DImode, mask), op1)); - return true; + if (!TARGET_AVX512F) + return false; +
Re: [PATCH, ARM] attribute target (thumb,arm)
On 10/09/2014 04:11 PM, Richard Earnshaw wrote: On 09/10/14 12:35, Christian Bruel wrote: On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: Hi Christian, snipped agreed stuff 3) about inlining I dislike inlining different modes, From a conceptual use, a user might want to switch mode only when changing a function's hotness. Usually inlining a cold function into a hot one is not what the user explicitly required when setting different mode attributes for them, __attribute__((thumb)) should not imply coldness or hotness. Inlining between cold and hot functions should be done based on profile feedback. The choice of compiling in Thumb1 state for coldness is a separate one because that's where the choice needs to be made. Ideally yes, but I think that a user's motivation to use attribute ((thumb)) is to reduce code size even in the cases where PFO is not available (libraries, kernel or user application build spec packaged without profile data). And there are cases where static probabilities are not enough and a user wants its own control with gprof or oprofile. But in this case, we could point to the __attribute__ ((cold)) on the function ? That would probably be the best workaround to propose if we recommend this Hot vs cold is interesting, but arm/thumb shouldn't be used to imply that. The days when ARM=fast, thumb=small are in the past now, and thumb2 code should be both fast and small. Indeed, smaller thumb2 code can be faster than larger ARM code simply because you can get more of it in the cache. The use of arm vs thumb is likely to be much more subtle now. I'm also very interested in this. From my last bench session, ARM mode could bring a speedup (from noise to 5/6%) but with a very big size penalty. So I believe there is room for fine tuning at application level, and I agree this is very subtle and difficult, but this is another topic. (That was with a GCC 4.8, maybe the gap has reduced since). 
But here is another scenario: Using attribute ((arm)) for exception entry points is indeed not related to hotness. But consider a typical thumb binary with an entry point in arm compiled in C (ex handler, a kernel...). Today, due to the file boundary, the thumb part is not inlined into the arm point. (Using -flto is not possible because the whole gimple would be thumb). We have no-inline attributes for scenarios like that. I don't think a specific use case should dominate other cases. That's severe, a no-inline attribute would disable inlining between same modes ! Now, using attribute ((target)) for the functions other than the entry point, with your approach they would all be inlined (assuming the cost allows this) and we would end up with an arm binary instead of a thumb binary... But there are still 3 points : - At least two other targets (i386, PowerPC) that support attribute_target disable inlining between modes that are not subsets. I like to think about homogeneity between targets and I find it odd to have different inlining rules... That's because use of specific instructions must not be allowed to leak past a gating check that is in the caller. It would be a disaster if a function that used a neon register, for example, was allowed to leak into code that might run on a target with no Neon register file. The ARM/thumb distinction shouldn't, by default, be limited in that manner. I believe inlining could happen from a subset of the architecture into a function using a superset, just not vice-versa. I'm afraid I misunderstand this, Do you want inlining from Thumb to a function using ARM because you consider thumb to be a subset of ARM ? You know better than I, but I never thought that, or maybe there is something to do with the unified assembler ? In this case I see the problem under a new light. With the unified assembly, indeed we could inline from any mode as long as there is no mode-dependent asm inside. 
- Scanning the function body to check for ASM_INPUT does not look very elegant (if this matters) because the asm could well be unrelated. The only case when it will always be a win to inline thumb into arm is when the cost of the inlined body is less than a BX instruction (but still, with branch prediction this cost is pondered). One of the problems with not inlining is that the C++ abstraction penalty is likely to shoot up. There will be many major lost optimization opportunities if we start down that path. I would never expect users to use this attribute extensively on inlined member functions. But I take your point. So I think inlining should only be disabled if there's some technical reason why it should be disabled, not because of some 'it might not always be ideal' feelings. Furthermore, we should expect users to use the other attributes consistently when they expect specific behaviours to occur. Sure, I too would have preferred objective benchmark results, but it's a little bit early to have experience with
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [15/n] Optimize redundant checks
2014-10-09 21:41 GMT+04:00 Jeff Law l...@redhat.com: On 10/08/14 13:22, Ilya Enkovich wrote: Hi, This patch adds removal of redundant (covered by other) checks into checker optimization. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_compare_checks): New. (chkp_remove_redundant_checks): New. (chkp_opt_execute): Run redundant checks removal algorithm. Wouldn't pure removal be better modeled in existing optimizers? DOM would seem to be a natural fit? Similarly, isn't the swapping very similar to a reverse DOM walk DSE-like optimizer? Deferring further review until those questions are answered? jeff Checks and intersection removal code was added as a simple pass catching trivial cases. I'm sure there are optimizations having common elements with what the checker optimizer does. But initially we didn't want to adopt existing optimizers because the GIMPLE representation of instrumentation was not stable and also we still don't know what the important targets for optimization are. The plan is to have a stable version first. After enabling, we want to do performance analysis and determine which optimizations are most required (it may turn out that checks removal doesn't give any significant performance gain at all), determine which parts of the current infrastructure may be re-used (if any) and implement a proper checker optimization. The current optimizer is a simple code cleanup. I do not think we should make any significant rework of it as a part of enabling. If the current approach seems to require significant changes to go to trunk then it should probably be delayed and go separately from the instrumentation pass. Thanks, Ilya
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Fri, Oct 10, 2014 at 07:47:19PM +0400, Ilya Tocar wrote: Updated patch below. You haven't posted ChangeLog entry this time, so using the last one: * config/i386/i386.c (MAX_VECT_LEN): Move above ix86_expand_vec_perm_vpermi2. ... * config/i386/sse.md (define_mode_iterator VI1_AVX512): New. I'd think you should avoid the line break after filename in these cases, so * config/i386/i386.c (MAX_VECT_LEN): Move above ix86_expand_vec_perm_vpermi2. ... * config/i386/sse.md (define_mode_iterator VI1_AVX512): New. Other than that nit it looks good to me. Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/08/2014 08:31 AM, Jiong Wang wrote: On 30/09/14 19:36, Jiong Wang wrote: 2014-09-30 17:30 GMT+01:00 Jeff Law l...@redhat.com: On 09/30/14 08:37, Jiong Wang wrote: On 30/09/14 05:21, Jeff Law wrote: I do agree with Richard that it would be useful to see the insns that are incorrectly sunk and the surrounding context. So I must be missing something. I thought the shrink-wrapping code wouldn't sink arithmetic/logical insns like we see with insn 14 and insn 182. I thought it was limited to reg-reg copies and constant initializations. yes, it was limited to reg-reg copies, and my previous sink improvement aimed to sink any rtx A: be single_set B: the src operand be any combination of no more than one register and no non-constant objects. while some operator like shift may have side effect. IMHO, all side effects are reflected on RTX, together with this fail_on_clobber_use modification, the rtx returned by single_set_no_clobber_use is safe to sink if it meets the above limit B and pass later register use/def check in move_insn_for_shrink_wrap ? Ping ~ And as there is NONDEBUG_INSN_P check before move_insn_for_shrink_wrap invoked, we could avoid creating new wrapper function by invoke single_set_2 directly. I'm committing the following to fix this. (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). r~ 2014-10-10 Richard Henderson r...@redhat.com PR target/63404 * shrink-wrap.c (move_insn_for_shrink_wrap): Don't use single_set. Restrict the set of expressions we're willing to move. 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index b1ff8a2..257812c 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -176,17 +176,40 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, basic_block next_block; edge live_edge; - /* Look for a simple register copy. */ - set = single_set (insn); - if (!set) + /* Look for a simple register assignment. We don't use single_set here + because we can't deal with any CLOBBERs, USEs, or REG_UNUSED secondary + destinations. */ + if (!INSN_P (insn)) +return false; + set = PATTERN (insn); + if (GET_CODE (set) != SET) return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (src)) + /* For the destination, we want only a register. Also disallow STACK + or FRAME related adjustments. They are likely part of the prologue, + so keep them in the entry block. */ + if (!REG_P (dest) + || dest == stack_pointer_rtx + || dest == frame_pointer_rtx + || dest == hard_frame_pointer_rtx) +return false; + + /* For the source, we want one of: + (1) A (non-overlapping) register + (2) A constant, + (3) An expression involving no more than one register. + + That last point comes from the code following, which was originally + written to handle only register move operations, and still only handles + a single source register when checking for overlaps. Happily, the + same checks can be applied to expressions like (plus reg const). */ + + if (CONSTANT_P (src)) +; + else if (!REG_P (src)) { - unsigned int reg_num = 0; - unsigned int nonconstobj_num = 0; rtx src_inner = NULL_RTX; if (can_throw_internal (insn)) @@ -196,30 +219,50 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) { rtx x = *iter; - if (REG_P (x)) + switch (GET_RTX_CLASS (GET_CODE (x))) { - reg_num++; - src_inner = x; + case RTX_CONST_OBJ: + case RTX_COMPARE: + case RTX_COMM_COMPARE: + case RTX_BIN_ARITH: + case RTX_COMM_ARITH: + case RTX_UNARY: + case RTX_TERNARY: + /* Constant or expression. 
Continue. */ + break; + + case RTX_OBJ: + case RTX_EXTRA: + switch (GET_CODE (x)) + { + case UNSPEC: + case SUBREG: + case STRICT_LOW_PART: + case PC: + /* Ok. Continue. */ + break; + + case REG: + /* Fail if we see a second inner register. */ + if (src_inner != NULL) + return false; + src_inner = x; + break; + + default: + return false; + } + break; + + default: + return false; } - else if (!CONSTANT_P (x) && OBJECT_P (x)) - nonconstobj_num++; } - if (nonconstobj_num > 0 -
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
On Fri, Oct 10, 2014 at 8:07 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): ^ It should be X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY. Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may -- H.J.
Re: [PATCH] Fix up _Alignof with user aligned types (PR c/63495)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: Hi! As the testcase shows, _Alignof can sometimes return smaller number than the minimum alignment. That is because when laying out structures, fields with types with TYPE_USER_ALIGN set have also DECL_USER_ALIGN set and therefore neither BIGGEST_FIELD_ALIGNMENT nor ADJUST_FIELD_ALIGN is applied to them, but min_align_of_type was applying that unconditionally. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-10-10 Jakub Jelinek ja...@redhat.com PR c/63495 * stor-layout.c (min_align_of_type): Don't decrease alignment through BIGGEST_FIELD_ALIGNMENT or ADJUST_FIELD_ALIGN if TYPE_USER_ALIGN is set. * gcc.target/i386/pr63495.c: New test. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/10/14 01:42, Evgeny Stupachenko wrote: Hi, The patch enables EBX in the RA for x86 32-bit PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is a working version with a good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before the patch there were not enough registers to PASS ?!? That doesn't make a lot of sense. More likely it was Uros's fix from yesterday to regcprop which causes this to pass again. Is it possible you updated your sources between testing runs and as a result picked up Uros's fix? New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX I think you should remove the dg-error directive. That turns this test into a simple confirmation that we can use %ebx in an asm even when generating PIC code. Can you add PR markers to your changelog PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate If you're not going to fix it, then you should xfail it. gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there are enough registers Right. Remove the dg-error and turn this into a test that effectively verifies that %ebx is no longer fixed when generating PIC code on i686. With those changes this is OK for the trunk. jeff
Re: [PATCH, ARM] attribute target (thumb,arm)
On 10/10/14 15:18, Christian Bruel wrote: On 10/09/2014 04:11 PM, Richard Earnshaw wrote: On 09/10/14 12:35, Christian Bruel wrote: On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: Hi Christian, snipped agreed stuf 3) about inlining I dislike inlining different modes, From a conceptual use, a user might want to switch mode only when changing a function's hotness. Usually inlining a cold function into a hot one is not what the user explicitly required when setting different mode attributes for them, __attribute__((thumb)) should not imply coldness or hotness. Inlining between cold and hot functions should be done based on profile feedback. The choice of compiling in Thumb1 state for coldness is a separate one because that's where the choice needs to be made. Ideally yes. but I think that a user motivation to use target attribute ((thumb) is to reduce code size even in the cases where PFO is not available (libraries, kernel or user application build spec packaged without profile data). And there are cases where static probabilities are not enough and that a user wants it own control with gprof or oprofile. But in this case, we could point to the __attribute__ ((cold)) on the function ? That would probably be the best workaround to propose if we recommend this Hot vs cold is interesting, but arm/thumb shouldn't be used to imply that. The days when ARM=fast, thumb=small are in the past now, and thumb2 code should be both fast and small. Indeed, smaller thumb2 code can be faster than larger ARM code simply because you can get more of it in the cache. The use of arm vs thumb is likely to be much more subtle now. I'm also very interested by this. From my last bench session, ARM mode could bring a speedup (from noise to 5/6%) but with a very big size penalty. So I believe there is room for fine tuning at application level, and I agree this is very subtle and difficult this is another topic. (That was with a GCC 4.8, maybe the gap has reduced since). 
It will obviously depend on the precise micro-architecture of the chip you run the code on; but things are improving all the time (at least, that's what we aim for :-). For example, we've recently throttled back the amount of instruction shortening for hot loops to reduce the number of instruction dependencies through the PSR flags. But here is another scenario: Using of attribute ((arm)) for exception entry points is indeed not related to hotness. But consider a typical thumb binary with an entry point in arm compiled in C (ex handler, a kernel...). Today due to the file boundary the thumb part is not inlined into the arm point. (Using -flto is not possible because the whole gimple would be thumb). We have no-inline attributes for scenarios like that. I don't think a specific use case should dominate other cases. That's severe, no-inline attribute would disable inlining same modes ! Yes, but why would that be a problem? What makes you think that inlining across modes is inherently wrong, when inlining within the same mode is not? Now, using attribute ((target)) for the functions others than the entry point, with your approach they would all be inlined (assuming the cost allow this) and we would end up with a arm binary instead of a thumb binary... But there are still 3 points : - At least 2 other target (i386, Powerpc) that support attribute_target disable inlining between modes that are not subsets. I like to think about homogeneity between targets and I find odd to have different inlining rules... That's because use of specific instructions must not be allowed to leak past a gating check that is in the caller. It would be a disaster if a function that used a neon register, for example, was allowed to leak into code that might run on a target with no Neon register file. The ARM/thumb distinction shouldn't, by default, be limited in that manner. I believe inlining could happen from a subset of the archtiecture into a function using a superset, just not vice-versa. 
I'm afraid I misunderstand this, Do you want inlining from Thumb to a function using ARM because you consider thumb to be a subset of ARM ? You know better that I but I never thought that, or maybe there is something to do with the unified assembler ? No, I think we can fundamentally inline either way, provided that there's no inline assembly code. Even then, we could probably get away with it 90% of the time. In this case I see the problem under a new light. With the unified assembly, indeed we could inline from any mode as long as no divide mode asm inside. Unfortunately, the two ISAs are not identical and neither is a subset of the other, despite the use of unified assembly syntax. For example, ARM state has RSC, but thumb doesn't. While Thumb has ORN, but ARM doesn't. We can't parse the ASM body to find out what it contains, so we have to play safe. [US]DIV isn't an issue once you
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On 10/10/14 08:52, Ilya Enkovich wrote: Thanks, Jeff With this code we remove user builtin calls coming from source code. E.g.: p2 = (int *)__bnd_init_ptr_bounds (p1); *p2 = 0; which means p2 has the value of p1 but has default bounds, and the following store is unchecked. These calls are important for instrumentation but useless after instrumentation. I don't think it is a part of the checker optimizer because it doesn't optimize instrumentation code. Also this transformation is trivial enough for -O0, and the checker optimizer works starting from -O2. Below is a version fixed according to Richard's comments. Thanks, Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. Tests and instrumentation are still needed. With some basic tests and instrumentation this will be OK. I hate to keep harping on tests, but few developers are going to be familiar with MPX and the related infrastructure, and those tests are critical to helping them know when they break something. Similarly, if the plan is to iterate on improving things, then those basic functionality tests will ultimately save time as you can smoke test before running larger benchmarks. jeff
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [11/n] Optimization helpers
On 10/10/14 08:24, Ilya Enkovich wrote: On 09 Oct 12:09, Jeff Law wrote: On 10/08/14 13:16, Ilya Enkovich wrote: Hi, This patch introduces structures and manipulation functions used by simple checker optimizations. Structures are used to hold check information - type of check and checked address in a polynomial form. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (check_type): New. (pol_item): New. (address_t): New. (check_info): New. (bb_checks): New. (chkp_pol_item_compare): New. (chkp_pol_find): New. (chkp_extend_const): New. (chkp_add_addr_item): New. (chkp_sub_addr_item): New. (chkp_add_addr_addr): New. (chkp_sub_addr_addr): New. (chkp_mult_addr): New. (chkp_is_constant_addr): New. (chkp_print_addr): New. (chkp_collect_addr_value): New. (chkp_collect_value): New. (chkp_fill_check_info): New. +/* Find plynomial item in ADDR with var equal to VAR s/plynomial/polynomial/ With nit fixed and functions moved into whatever new file gets created for the optimization work this will be OK. jeff Thanks for review! Here is a fixed version. Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp-opt.c: New. * Makefile.in (OBJS): Add tree-chkp-opt.o. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index d8c8488..cd45b29 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1389,6 +1389,7 @@ OBJS = \ tree-parloops.o \ tree-phinodes.o \ tree-chkp.o \ + tree-chkp-opt.o \ tree-predcom.o \ tree-pretty-print.o \ tree-profile.o \ diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c new file mode 100644 index 000..103c4bb --- /dev/null +++ b/gcc/tree-chkp-opt.c @@ -0,0 +1,463 @@ +/* Pointer Bounds Checker optimization pass. + Copyright (C) 2014 Free Software Foundation, Inc. + Contributed by Ilya Enkovich (ilya.enkov...@intel.com) + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree-core.h" +#include "stor-layout.h" +#include "varasm.h" +#include "tree.h" +#include "target.h" +#include "tree-iterator.h" +#include "tree-cfg.h" +#include "langhooks.h" +#include "tree-pass.h" +#include "hashtab.h" +#include "diagnostic.h" +#include "ggc.h" +#include "output.h" +#include "internal-fn.h" +#include "is-a.h" +#include "predict.h" +#include "cfgloop.h" +#include "stringpool.h" +#include "tree-ssa-alias.h" +#include "tree-ssanames.h" +#include "tree-ssa-operands.h" +#include "tree-ssa-address.h" +#include "tree-ssa.h" +#include "ipa-inline.h" +#include "basic-block.h" +#include "tree-ssa-loop-niter.h" +#include "gimple-expr.h" +#include "gimple.h" +#include "tree-phinodes.h" +#include "gimple-ssa.h" +#include "ssa-iterators.h" +#include "gimple-pretty-print.h" +#include "gimple-iterator.h" +#include "gimplify.h" +#include "gimplify-me.h" +#include "print-tree.h" +#include "expr.h" +#include "tree-ssa-propagate.h" +#include "gimple-fold.h" +#include "gimple-walk.h" +#include "tree-dfa.h" +#include "tree-chkp.h" Thanks. Looks good. As a follow-up, can you try to trim down what appear to be the over-zealous includes? It's a minor thing, but we are trying to be a bit saner about that kind of stuff than we've been in the past. If you've already done that, then, well, we've clearly still got a ways to go. For example, I can't see why you'd need output.h here :-0 Jeff
Regenerated libstdc++-v3/testsuite/Makefile.in
I've regenerated this file and committed it so I don't get harmless whitespace changes every time I run autoreconf.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [12/n] Optimize string functions
On 10/10/14 08:19, Ilya Enkovich wrote: So is the purpose here to expose the checks that would normally be done in the mem* routines to their caller in the hopes that doing so will expose redundant checks? Or is there some other reason? There are a few reasons to replace instrumented string functions: 1. As you said, following redundant checks elimination may remove checks for some string functions 2. By default functions like memcpy should assume pointers are copied and copy bounds. If we know pointers are not copied, we may use a faster version with no bounds copy 3. If we avoid both checks and bounds copy then it is a candidate for existing string function calls inlining in the expand pass Perfect. So this belongs in a comment in the code. I thought tests would be added later. Did you already post them? There's been so many patches I'm starting to lose track :-) For future reference, when you break a submission down into logical hunks, including the tests in those logical hunks helps. I realize the MPX work isn't as well suited for that kind of breakdown, but it's worth keeping in mind. I have ~250 tests to commit. Will check I have tests for optimizations. Excellent. BTW this particular optimization can't work until we have instrumented builtin calls. Yea, hopefully we'll get to that before close of stage1. It's a nit, but I'd tend to write that as: if (!fndecl_nochk) continue; fndecl = fndecl_nochk; gimple_call_set_fndecl (stmt, fndecl); There is one more assignment to fndecl above which makes your version nonequivalent. I had assumed the gimple_call_set_fndecl was a nop if we didn't change the fndecl. Is that not the case? I'm a bit surprised we don't have this kind of capability already broken out. But assuming that's the case, can you go ahead and break that out into its own little helper function? You don't need to find all the cases where we're doing this kind of thing today, just create the helper function and use it in your new code.
I may have missed such a function (I looked in the cfg hooks and tree-cfg.h). Hopefully someone will correct me if so. Thanks. I suspect everyone has just done their own implementation inline like you did. It's something I'll be keeping my eye out for in others' code so we can funnel everyone into your new function. ISTM many speculative optimizations are going to need that kind of helper. Taking into account non-instrumented builtin calls, I suppose this patch goes into a later builtin-related series. But here is a version with the changes. Yea, I think you're right. I think this is OK when the builtins are done. jeff
Re: [PATCH] Clean up duplicated function seq_cost
On 10/10/14 02:17, Zhenqiang Chen wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Thursday, October 09, 2014 5:21 PM To: Zhenqiang Chen Cc: GCC Patches Subject: Re: [PATCH] Clean up duplicated function seq_cost On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: Hi, There are two implementations of seq_cost. The function bodies are exactly the same. The patch removes one of them and makes the other global. Bootstrap and no make check regression on X86-64. OK for trunk? The prototype should go to cfgloopanal.c. Err - cfgloopanal.h of course ;) Or rather the function sounds misplaced in cfgloopanal.c. Thanks for the comments. I think seq_cost should be in rtlanal.c, just like rtx_cost, insn_rtx_cost and so on. ChangeLog: 2014-10-10 Zhenqiang Chen zhenqiang.c...@arm.com * cfgloopanal.c (seq_cost): Delete. * rtl.h (seq_cost): New prototype. * rtlanal.c (seq_cost): New function. * tree-ssa-loop-ivopts.c (seq_cost): Delete. Ok. Jeff
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
Thanks. I've modified ChangeLog. 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Remove m_SILVERMONT and m_INTEL from the tune. On Fri, Oct 10, 2014 at 7:58 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Oct 10, 2014 at 8:07 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): ^ It should be X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY. Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may -- H.J.
Re: [PATCH] Implement -fsanitize=object-size
On Fri, Oct 10, 2014 at 12:26:44PM +0200, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 12:04:08PM +0200, Marek Polacek wrote: I couldn't test bootstrap-ubsan, because of error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO but I remember that the previous version of the patch passed fine. We build (intentionally) both libubsan.so.* objects and libubsan.a objects with -fPIC, but don't build the latter with -DPIC. I guess we need now, with -static-libubsan libubsan.a is linked into shared libraries statically and we definitely can't use .preinit_array in that case. So, I think (untested) something like: 2014-10-10 Jakub Jelinek ja...@redhat.com * ubsan/Makefile.am (DEFS): Add -DPIC. * ubsan/Makefile.in: Regenerated. --- libsanitizer/ubsan/Makefile.am2014-09-24 11:08:04.183026156 +0200 +++ libsanitizer/ubsan/Makefile.am2014-10-10 12:15:19.124247283 +0200 @@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_ # May be used by toolexeclibdir. gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER) -DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS +DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS) ACLOCAL_AMFLAGS = -I m4 should fix this. 2014-10-09 Marek Polacek pola...@redhat.com Check the date ;) Adjusted. * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE. * doc/invoke.texi: Document -fsanitize=object-size. * flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE and or it into SANITIZE_UNDEFINED. * gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE. 
* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function. * internal-fn.def (UBSAN_OBJECT_SIZE): Define. * opts.c (common_handle_option): Handle -fsanitize=object-size. * ubsan.c: Include tree-object-size.h. I'd avoid the s. Adjusted. --- gcc/gimple-fold.c +++ gcc/gimple-fold.c @@ -2662,6 +2662,19 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace) gimple_call_arg (stmt, 1), gimple_call_arg (stmt, 2)); break; +case IFN_UBSAN_OBJECT_SIZE: + if (integer_all_onesp (gimple_call_arg (stmt, 2)) Formatting on the case line, there should be tab. Fixed. + + gcc_assert (TREE_CODE (size) == INTEGER_CST); + /* See if we can discard the check. */ + if (integer_all_onesp (size)) +/* Yes, __builtin_object_size couldn't determine the + object size. */; I'd just treat TREE_CODE (size) != INTEGER_CST the same as integer_all_onesp. It is very likely you'll get an INTEGER_CST there, but I'd be afraid if somebody disables ccp, forwprop and similar optimizations that if you are unlucky you might actually have an SSA_NAME there instead. Fixed. Ok with those changes. Thanks for reviewing! After commit, please update gcc-5/changes.html. Thanks. Sure. I'm applying the following. 2014-10-10 Marek Polacek pola...@redhat.com * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE. * doc/invoke.texi: Document -fsanitize=object-size. * flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE and or it into SANITIZE_UNDEFINED. * gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE. * internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function. * internal-fn.def (UBSAN_OBJECT_SIZE): Define. * opts.c (common_handle_option): Handle -fsanitize=object-size. * ubsan.c: Include tree-object-size.h. (ubsan_type_descriptor): Call tree_to_uhwi instead of tree_to_shwi. (ubsan_expand_bounds_ifn): Use false instead of 0. (ubsan_expand_objsize_ifn): New function. (instrument_object_size): New function. (pass_ubsan::execute): Add object size instrumentation. 
* ubsan.h (ubsan_expand_objsize_ifn): Declare. testsuite/ * c-c++-common/ubsan/object-size-1.c: New test. * c-c++-common/ubsan/object-size-2.c: New test. * c-c++-common/ubsan/object-size-3.c: New test. * c-c++-common/ubsan/object-size-4.c: New test. * c-c++-common/ubsan/object-size-5.c: New test. * c-c++-common/ubsan/object-size-6.c: New test. * c-c++-common/ubsan/object-size-7.c: New test. * c-c++-common/ubsan/object-size-8.c: New test. * c-c++-common/ubsan/object-size-9.c: New test. * g++.dg/ubsan/object-size-1.C: New test.
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar tocarip.in...@gmail.com wrote: My strong preference would be: enum machine_mode maskmode = mode; rtx (*gen) (rtx, rtx, rtx, rtx); right below the enum machine_mode mode = GET_MODE (d ? d->op0 : op0); line and then inside of the first switch just do: ... case V16SImode: if (!TARGET_AVX512F) return false; gen = gen_avx512f_vpermi2varv16si3; break; case V4SFmode: if (!TARGET_AVX512VL) return false; gen = gen_avx512vl_vpermi2varv4sf3; maskmode = V4SImode; break; ... etc., then in the mask = line use: mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec)); and finally instead of the second switch do: emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); return true; Updated patch below. Please recode that horrible first switch statement to: --cut here-- rtx (*gen) (rtx, rtx, rtx, rtx) = NULL; switch (mode) { case V8HImode: if (TARGET_AVX512VL && TARGET_AVX512BW) gen = gen_avx512vl_vpermi2varv8hi3; break; ... case V2DFmode: if (TARGET_AVX512VL) { gen = gen_avx512vl_vpermi2varv2df3; maskmode = V2DImode; } break; default: break; } if (gen == NULL) return false; --cut here-- The patch is OK with the above improvement. (Please also note that the patch has a bunch of i386.md changes that will clash with the followup patch series). Thanks, Uros.
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 16:59, Richard Henderson wrote: On 10/08/2014 08:31 AM, Jiong Wang wrote: Ping ~ And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap is invoked, we could avoid creating a new wrapper function by invoking single_set_2 directly. I'm committing the following to fix this. (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use a direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is still not shrink-wrapped under -mabi=ilp32. The reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while it could be treated as an expression. In your fix, if it's RTX_OBJ and LO_SUM, then the move is aborted by return NULL. anyway, thanks for the fix, I verified the shrink-wrapping of some hotspot functions in benchmarks on aarch64 is not affected. Regards, Jiong r~
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On 10/09/14 10:54, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite some similar constructs in i386.c. It is hard to say without the testcase what happens, perhaps one of RTX experts (CC'd) can advise what is recommended here. The problem is the call to expand_simple_binop. The source (op1) and the destination (op1) are obviously the same, so its going to clobber whatever value is in there. If there are other uses of the original value of op1, then things aren't going to work. But I'm a little unclear how there's be other later uses of that value. Perhaps Ilya could comment on that. Regardless, unless there's a strong reason to do so, I'd generally recommend passing a NULL_RTX as the target for expansions so that you always get a new pseudo. Lots of optimizers in the RTL world work better if we don't have pseudos with multiple assignments. 
By passing NULL_RTX for the target we get that property more often. So a change like Ilya suggests (though I'd use NULL_RTX rather than NULL) makes sense. Jeff
Re: New rematerialization sub-pass in LRA
On 10/10/14 09:02, Vladimir Makarov wrote: The new LRA rematerialization sub-pass works right before the spilling sub-pass and tries to rematerialize values of spilled pseudos. To implement the new sub-pass, some important changes in LRA were done. First, lra-lives.c updates live info about all registers (not only allocatable ones) and, second, lra-constraints.c was modified to permit checking that an insn satisfies all constraints in a strict sense even if it still contains pseudos. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass permits generating smaller code on average on both architectures (although the improvement is insignificant), adds 0.4% additional compilation time in -O2 mode of release GCC (according to user time of compilation of a 500K line fortran program and valgrind lackey # insns in combine.i compilation) and about 0.7% in -O0 mode. As for the performance result, the best I found is a 1% SPECFP2000 improvement on ARM Exynos 5410 (973 vs 963), but for Intel Haswell the performance results are practically the same (Haswell has a very good sophisticated memory sub-system). There is room for pass improvements. I wrote some ideas in the top level comment of file lra-remat.c. The rematerialization sub-pass will work at -O2 and higher, and a new option -flra-remat is introduced. The patch was successfully tested on x86-64 and ARM. I am going to submit it next week. Any comments are appreciated. I wonder if this could help with some of the rematerialization issues Intel is running into with their allocatable PIC register changes for i686. I'll let them pass along test cases you can play with :-) jeff
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
The linker already has a --strip-lto-sections option, and it's on by default. I'll approve a patch that modifies gold to recognize .gnu.offload_lto.* sections as part of --strip-lto-sections. Really, though, you should be setting the SHF_EXCLUDE bit on these sections. Do that and no special-casing will be necessary. Generating a linker script on the fly to discard these sections is, to me, rather hacky. There are better ways to do it. -cary On Thu, Oct 9, 2014 at 11:53 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 12:07:03AM +0400, Ilya Verbin wrote: On 09 Oct 16:07, Ilya Verbin wrote: + /* By default linker does not discard .gnu.offload_lto_* sections. */ + const char *linker_script = make_temp_file (_linker_script.x); + FILE *stream = fopen (linker_script, w); + if (!stream) + fatal_error (fopen %s: %m, linker_script); + fprintf (stream, SECTIONS { /DISCARD/ : { *( + OFFLOAD_SECTION_NAME_PREFIX *) } }\n); + fclose (stream); + printf (%s\n, linker_script); + + goto finish; +} Does this work with gold? Are there any other linkers that support plugins, but don't support linker scripts this way? Oops, gold does not support scripts, outputted from plugins :( error: SECTIONS seen after other input files; try -T/--script Probably, we should update default linker scripts in binutils? But without latest ld/gold all binaries compiled without -flto and with offloading will contain intermediate bytecode... Actually, this issue is not due to outputting a script from a plugin, gold just does not support partial linker scripts: https://sourceware.org/bugzilla/show_bug.cgi?id=17451 So it seems that discarding .gnu.offload_lto_* sections (like it is done for .gnu.lto_*) in the default ld and gold scripts is the right way? I must say I'm not very much familiar with the linker plugin API, but it surprises me that discarding sections is not something it allows. 
Anyway, can you do the partial linker script for the bfd linker (is there a way to determine from the linker plugin API if it is gold or bfd ld?), and for gold for the time being perhaps strip the sections in lto-wrapper? and feed the ET_REL objects with the sections stripped back to the linker through the plugin API? Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 09:39 AM, Jiong Wang wrote: (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use a direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is still not shrink-wrapped under -mabi=ilp32. The reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while it could be treated as an expression. In your fix, if it's RTX_OBJ and LO_SUM, then the move is aborted by return NULL. I missed that message. Interesting. I wonder what kind of fallout there would be from changing LO_SUM to RTX_BIN_ARITH, which is what it should have been all along. My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that LO_SUM should also be a kind of object too. But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been (plus reg (low symbol)) and thus more obvious. I'll see if I can bootstrap such a change on e.g. sparc or ppc32, which use lo_sum heavily. anyway, thanks for the fix, I verified the shrink-wrapping of some hotspot functions in benchmarks on aarch64 is not affected. Thanks for the additional testing. r~
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [15/n] Optimize redundant checks
On 10/10/14 09:50, Ilya Enkovich wrote: Checks and intersection removal code was added as a simple pass catching trivial cases. I'm sure there are optimizations having common elements with what the checker optimizer does. But initially we didn't want to adopt existing optimizers because the GIMPLE representation of instrumentation was not stable and also we still don't know what the important targets for optimizations are. Understood. The plan is to have a stable version first. After enabling we want to make a performance analysis and determine which optimizations are most required (it may appear checks removal doesn't give any significant performance gain at all), determine which of the current infrastructure may be re-used (if any) and implement proper checker optimization. The current optimizer is a simple code cleanup. I do not think we should make any significant rework of it as a part of enabling. If the current approach seems to require significant changes to go to trunk then it should probably be delayed and go separately from the instrumentation pass. Well, I think it should be trivial to handle the redundant check elimination in DOM. Most likely eliminate_redundant_computations needs some work to allow it to look inside those checks and get them recorded into its tables. With that in place, DOM should optimize this stuff without further intervention. It's probably less code than you've already written :-) The swapping variant feels like it should be simple to implement with the existing dominator walkers. But I haven't thought nearly as much about that one. jeff
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On Fri, Oct 10, 2014 at 09:51:02AM -0700, Cary Coutant wrote: The linker already has a --strip-lto-sections option, and it's on by default. I'll approve a patch that modifies gold to recognize .gnu.offload_lto.* sections as part of --strip-lto-sections. Really, though, you should be setting the SHF_EXCLUDE bit on these sections. Do that and no special-casing will be necessary. For that I guess lhd_begin_section would need to replace: section = get_section (name, SECTION_DEBUG, NULL); with: section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL); either just for the .gnu.offload_lto prefixed section, or all. The question is what will old assemblers and/or linkers do with that, and if there are any that support linker plugins, but not SHF_EXCLUDE. Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 10:52, Richard Henderson wrote: I missed that message. Interesting. I wonder what kind of fallout there would be from changing LO_SUM to RTX_BIN_ARITH, which is what it should have been all along. My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that LO_SUM should also be a kind of object too. Most likely. One could even make an argument that LO_SUM should be a constant, but that's a more intrusive change. But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. Jakub
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
The question is what will old assemblers and/or linkers do with that, and if there are any that support linker plugins, but not SHF_EXCLUDE. If it helps answer that question, SHF_EXCLUDE support has been in gold for 6 years, and in gas for 4. -cary
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote: The problem is in code introduced by Bernd in IRA and caller-saves.c in 2012. It is basically an optimization for functions returning always the same result as one argument (e.g. memcpy returning 1st argument). There are two possible solutions. First one is to prohibit the optimizations when there is a parallel in SET. Second one is to go deeper if the call result is guaranteed in the first element which is true for the patch. I suspect that the first solution will regress gcc.target/i386/retarg.c on i686 - that testcase test if referred optimization is effective. All things equal, I think we should go with the second solution. For the first solution, the patch would be Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,19 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo +from this call. 
*/ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } to_inherit_num = 0; For the second solution, the patch is Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,25 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (GET_CODE (dest) == PARALLEL) + { + dest = XVECEXP (dest, 0, 0); + if (GET_CODE (dest) == EXPR_LIST) + dest = XEXP (dest, 0); + } + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo from +this call. */ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } The first patch is safer but the second one is ok too. I have no particular preferences. Whatever we choose, analogous code in caller-saves.c should be changed too. Uros.
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On Fri, Oct 10, 2014 at 3:53 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? A few comments: This: +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} Should be using the new FOR_EACH_SUBRTX instead. 
This should simplify the code something like: static bool is_memory_op (rtx_insn *mem_insn) { subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, PATTERN (mem_insn), ALL) if (MEM_P (*iter)) return true; return false; } Also this should be a has_ function rather than a is_ function. I think you should have ADJUST_INSN_LENGTH as a function call rather than inline it there or at least wrap it with do { } while (0); which I think is documented part of the coding style. Also you added no comment before each function. The coding style says each function needs a comment describing what the function does. Thanks, Andrew Pinski Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. (aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options.
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
2014-10-10 20:45 GMT+04:00 Jeff Law l...@redhat.com: On 10/09/14 10:54, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite some similar constructs in i386.c. It is hard to say without the testcase what happens, perhaps one of RTX experts (CC'd) can advise what is recommended here. The problem is the call to expand_simple_binop. The source (op1) and the destination (op1) are obviously the same, so its going to clobber whatever value is in there. If there are other uses of the original value of op1, then things aren't going to work. But I'm a little unclear how there's be other later uses of that value. Perhaps Ilya could comment on that. op1 is a result of expand_normal called for SSA name. Other uses of op1 come from expand of uses of this SSA name in GIMPLE code. 
Regardless, unless there's a strong reason to do so, I'd generally recommend passing a NULL_RTX as the target for expansions so that you always get a new pseudo. Lots of optimizers in the RTL world work better if we don't have pseudos with multiple assignments. By passing NULL_RTX for the target we get that property more often. So a change like Ilya suggests (though I'd use NULL_RTX rather than NULL) makes sense. Will replace it with NULL_RTX. Thanks, Ilya Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or; as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures, but I believe it's typically implemented as addition. There's also typically an overlap between the high and low parts when building up addresses. GCC actually takes advantage of that overlap to reduce unnecessary HIGH expressions. And you can always have an oddball architecture like the PA, where the LO_SUM does something utterly braindead. It looks like this:

  addil %r27,symbolic nonsense

You might think %r27 holds the high bits of the address... Umm, no it doesn't. There's an implicit %r1 source/destination operand (which holds the high bits). So what this really does is add %r27, %r1, and the symbolic constant. %r27 is a data pointer. Obviously the implicit operand is used to get more bits holding the symbolic constant in the instruction. If that's not bad enough: if the object is in read-only memory, then the linker will modify the instruction to change %r27 to %r0 (hardwired 0 constant). But I'm getting off-topic here :-)

jeff
Re: [PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On 10/10/14 04:06, Uros Bizjak wrote: On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote: This message revives an old thread [1], where a miscompilation of gfortran on alpha was found that resulted in: [...] As said in the audit trail of the bug report, I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and an amended testcase. The testcase and analysis show a native read passing a possibly aliasing store. The attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P operands as never aliasing.

2014-10-10  Uros Bizjak  ubiz...@gmail.com

    * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P
    references when alignment ANDs are involved.
    (write_dependence_p): Ditto.
    (may_alias_p): Ditto.

The patch was bootstrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in the gfortran testsuite due to an independent RTL infrastructure problem with VALUEs leaking into the alias-detection functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to mainline early next week.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
[2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475

No objection. In fact, after reading everything it's pretty obvious to me that a /u MEM must be considered as potentially conflicting with writes that are implemented as RMW sequences to deal with the lack of byte access support. The escaping VALUE stuff is still in my queue.

jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 10:21 AM, Jeff Law wrote: On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures... Just to be clear, the LOW rtx code I mentioned upthread was for illustration purposes only. I only intend to change the rtx class of the existing LO_SUM. :-) r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:25, Richard Henderson wrote: On 10/10/2014 10:21 AM, Jeff Law wrote: On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures... Just to be clear, the LOW rtx code I mentioned upthread was for illustration purposes only. I only intend to change the rtx class of the existing LO_SUM. That was my expectation :-) jeff