Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On Fri, Oct 10, 2014 at 12:07:03AM +0400, Ilya Verbin wrote:

On 09 Oct 16:07, Ilya Verbin wrote:

+  /* By default the linker does not discard .gnu.offload_lto_* sections.  */
+  const char *linker_script = make_temp_file ("_linker_script.x");
+  FILE *stream = fopen (linker_script, "w");
+  if (!stream)
+    fatal_error ("fopen %s: %m", linker_script);
+  fprintf (stream, "SECTIONS { /DISCARD/ : { *("
+	   OFFLOAD_SECTION_NAME_PREFIX "*) } }\n");
+  fclose (stream);
+  printf ("%s\n", linker_script);
+
+  goto finish;
+}

Does this work with gold?  Are there any other linkers that support plugins but don't support linker scripts this way?

Oops, gold does not support scripts output from plugins:

error: SECTIONS seen after other input files; try -T/--script

Probably we should update the default linker scripts in binutils?  But without the latest ld/gold, all binaries compiled without -flto and with offloading will contain intermediate bytecode...

Actually, this issue is not due to outputting a script from a plugin; gold simply does not support partial linker scripts: https://sourceware.org/bugzilla/show_bug.cgi?id=17451
So it seems that discarding .gnu.offload_lto_* sections (like it is done for .gnu.lto_*) in the default ld and gold scripts is the right way?

I must say I'm not very familiar with the linker plugin API, but it surprises me that discarding sections is not something it allows.  Anyway, can you do the partial linker script for the bfd linker (is there a way to determine from the linker plugin API whether it is gold or bfd ld?), and for gold, for the time being, perhaps strip the sections in lto-wrapper and feed the ET_REL objects with the sections stripped back to the linker through the plugin API?

	Jakub
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
Adding Jakub.

-Maxim

On 10/09/2014 04:34 PM, Maxim Ostapenko wrote:

Hi,

After enabling the ASan, TSan and UBSan testsuites for an installed toolchain, many tests started to fail.  This is caused by wrong logic in the {asan,ubsan,tsan}_finish functions: they call restore_ld_library_path_env_vars, which is wrong because it drops some environment variables (GCC_EXEC_PREFIX, LD_LIBRARY_PATH, etc.) back to the state before gcc-dg.exp initialized the testing environment, so the installed GCC later fails to find some needed stuff.  Removing restore_ld_library_path_env_vars from {asan,ubsan,tsan}_finish seems to fix the issue.

Tested on x86_64-pc-linux-gnu, ok to commit?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-09  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 9769138..c98fd3c 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -132,7 +132,6 @@ proc asan_finish { args } {
 	    unset TEST_ALWAYS_FLAGS
 	}
     }
-    restore_ld_library_path_env_vars
 }

 # Symbolize lines like

diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index 54ec404..6f7a4d9 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -143,5 +143,4 @@ proc tsan_finish { args } {
     } else {
 	unset dg-do-what-default
     }
-    restore_ld_library_path_env_vars
 }

diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index 5a7a653..87c460f 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -114,5 +114,4 @@ proc ubsan_finish { args } {
 	    unset TEST_ALWAYS_FLAGS
 	}
     }
-    restore_ld_library_path_env_vars
 }
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On Fri, Oct 10, 2014 at 11:13:11AM +0400, Maxim Ostapenko wrote:

Adding Jakub.

-Maxim

On 10/09/2014 04:34 PM, Maxim Ostapenko wrote:

Hi,

After enabling the ASan, TSan and UBSan testsuites for an installed toolchain, many tests started to fail.  This is caused by wrong logic in the {asan,ubsan,tsan}_finish functions: they call restore_ld_library_path_env_vars, which is wrong because it drops some environment variables (GCC_EXEC_PREFIX, LD_LIBRARY_PATH, etc.) back to the state before gcc-dg.exp initialized the testing environment, so the installed GCC later fails to find some needed stuff.  Removing restore_ld_library_path_env_vars from {asan,ubsan,tsan}_finish seems to fix the issue.

Tested on x86_64-pc-linux-gnu, ok to commit?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-09  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

That looks wrong to me; we don't want to keep the libsanitizer paths in LD_LIBRARY_PATH* after we leave asan.exp etc.  So, perhaps instead save ld_library_path into some global variable (like {a,t,ub}san_saved_ld_library_path) during {a,t,ub}san_link_flags before appending anything there, and replace restore_ld_library_path_env_vars with

set ld_library_path ${a,t,ub}san_saved_ld_library_path
set_ld_library_path_env_vars

?

	Jakub
[PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Hi,

The patch enables EBX in the register allocator for x86 32-bit PIC mode.  It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html

Now there is a working version with a good performance and stability level; it could be a solid first step of EBX enabling.  Bootstrap and make check passed.  There are several changes in -m32 make check.

New pass:
gcc.target/i386/pr57003.c - before the patch there were not enough registers to PASS

New fails:
gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors, as we can do asm insertions with EBX
gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using a constant immediate
gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors, as there are enough registers

Performance on Silvermont (for the group of 3 patches):

At -Ofast -flto -funroll-loops -fPIC:
EEMBC benchs: +6% CoreMark, +3% Denmark; individual tests gained up to +25%.
SPEC2000int is almost unchanged, SPEC2000fp +1.7%.

At -O2 -fPIC:
EEMBC benchs: +2% CoreMark, +5% Denmark; individual tests gained up to +25%.
SPEC2000int and SPEC2000fp are almost unchanged.

ChangeLog:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* doc/tm.texi.in (TARGET_USE_PSEUDO_PIC_REG, TARGET_INIT_PIC_REG):
	Document.
	* doc/tm.texi: Regenerate.
	* gcc/function.c (assign_parms): Generate pseudo register for PIC.
	* gcc/init-regs.c (initialize_uninitialized_regs): Ignore pseudo
	PIC register.
	* gcc/ira-color.c (color_pass): Add check on pseudo register.
	* gcc/ira-emit.c (change_loop): Don't create copies for PIC pseudo
	register.
	* gcc/ira.c (split_live_ranges_for_shrink_wrap): Add check on pseudo
	register.
	(ira): Add target specific PIC register initialization.
	(do_reload): Keep PIC pseudo register.
	* gcc/lra-assigns.c (spill_for): Add checks on pseudo register.
	* gcc/lra-constraints.c (contains_symbol_ref_p): New.
	(lra_constraints): Enable lra risky transformations when PIC is
	pseudo register.
	* gcc/shrink-wrap.c (try_shrink_wrapping): Add check on pseudo
	register.
	* gcc/target.def (use_pseudo_pic_reg): New.
	(init_pic_reg): New.

enabling_ebx_general.patch
Description: Binary data
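The pic-1.c change above concerns inline asm that touches EBX: with the PIC base in a pseudo register, such asm is no longer rejected under -m32 -fPIC.  A minimal sketch of the kind of construct involved (hypothetical illustration, not the actual testcase, and guarded so it also builds on non-x86 hosts):

```c
#if defined (__i386__) || defined (__x86_64__)
/* The "b" constraint forces the output into %ebx/%rbx.  Before this
   patch series, claiming %ebx this way under -m32 -fPIC was an error,
   because %ebx was the fixed PIC base register.  */
static int
use_ebx (void)
{
  int x;
  __asm__ ("movl $1, %0" : "=b" (x));
  return x;
}
#else
/* Fallback so the sketch stays portable.  */
static int
use_ebx (void)
{
  return 1;
}
#endif
```

With a pseudo PIC register, the allocator can simply keep the PIC base elsewhere around the asm instead of erroring out.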
[PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
i386-specific part of the patch:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
	(ix86_init_pic_reg): New.
	(ix86_select_alt_pic_regnum): Add check on pseudo register.
	(ix86_save_reg): Likewise.
	(ix86_expand_prologue): Remove irrelevant code.
	(ix86_output_function_epilogue): Add check on pseudo register.
	(set_pic_reg_ever_alive): New.
	(legitimize_pic_address): Replace df_set_regs_ever_live with new
	set_pic_reg_ever_alive.
	(legitimize_tls_address): Likewise.
	(ix86_pic_register_p): New check.
	(ix86_delegitimize_address): Add check on pseudo register.
	(ix86_expand_call): Insert move from pseudo PIC register to ABI
	defined REAL_PIC_OFFSET_TABLE_REGNUM.
	(TARGET_INIT_PIC_REG): New.
	(TARGET_USE_PSEUDO_PIC_REG): New.
	(PIC_OFFSET_TABLE_REGNUM): New check.

enabling_ebx_i386.patch
Description: Binary data
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On Wed, Oct 8, 2014 at 9:08 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch adds removal of user calls to chkp builtins which become useless after instrumentation.

Thanks,
Ilya
--
2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp.c (chkp_remove_useless_builtins): New.
	(chkp_execute): Remove useless calls to Pointer Bounds Checker
	builtins.

diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 5443950..b424af8 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -3805,6 +3805,49 @@ chkp_instrument_function (void)
     }
 }

+/* Find init/null/copy_ptr_bounds calls and replace them
+   with assignments.  It should allow better code
+   optimization.  */
+
+static void
+chkp_remove_useless_builtins ()
+{
+  basic_block bb, next;
+  gimple_stmt_iterator gsi;
+
+  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  do
+    {
+      next = bb->next_bb;

Please don't use ->next_bb but use FOR_EACH_BB_FN (bb, cfun) instead.

+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple stmt = gsi_stmt (gsi);
+	  tree fndecl;
+	  enum built_in_function fcode;
+
+	  /* Find builtins returning first arg and replace
+	     them with assignments.  */
+	  if (gimple_code (stmt) == GIMPLE_CALL
+	      && (fndecl = gimple_call_fndecl (stmt))
+	      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+	      && (fcode = DECL_FUNCTION_CODE (fndecl))
+	      && (fcode == BUILT_IN_CHKP_INIT_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_NULL_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_COPY_PTR_BOUNDS
+		  || fcode == BUILT_IN_CHKP_SET_PTR_BOUNDS))
+	    {
+	      tree res = gimple_call_arg (stmt, 0);
+	      if (!update_call_from_tree (&gsi, res))
+		gimplify_and_update_call_from_tree (&gsi, res);

update_call_from_tree should always succeed with res being a call argument.

Richard.

+	      stmt = gsi_stmt (gsi);
+	      update_stmt (stmt);
+	    }
+	}
+      bb = next;
+    }
+  while (bb);
+}
+
 /* Initialize pass.  */

 static void
 chkp_init (void)
@@ -3872,6 +3915,8 @@ chkp_execute (void)

   chkp_instrument_function ();

+  chkp_remove_useless_builtins ();
+
   chkp_function_mark_instrumented (cfun->decl);

   chkp_fix_cfg ();
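For readers unfamiliar with the builtins being removed: each of them returns its first argument unchanged, and the remaining arguments only carry bounds information, which the instrumentation pass has already consumed by this point.  A small stand-in sketch of that "returns first arg" contract the pass exploits (the function below is an illustrative model, not one of the real __builtin___bnd_* entry points):

```c
/* Model of a chkp builtin such as a set/init/null/copy_ptr_bounds call:
   the bounds argument only feeds the instrumentation pass; the pointer
   result is the first argument unchanged.  */
static void *
bnd_builtin_model (void *ptr, unsigned long size)
{
  (void) size;   /* bounds info, already consumed by instrumentation */
  return ptr;    /* returns its first argument */
}

/* After chkp_remove_useless_builtins, a call of the form
       q = <chkp builtin> (p, n);
   degenerates to the plain assignment q = p, which later passes can
   copy-propagate away entirely.  */
```

This is why replacing the call with an assignment of the first argument is semantically a no-op once the bounds have been extracted.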
[PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
The patch improves performance when the previous two patches are applied.  It makes the RTL loop invariant behavior for GOT loads the same as it was before the two previous patches.  It improves 164.gzip (+9%) and 253.perlbmk (+2%), giving ~0.5% to SPEC2000int (compiled with "-m32 -Ofast -flto -funroll-loops -fPIC").

For example, in 164.gzip.  Before enabling EBX:

loop begin:
(I) 1. SI162 = prev (global address)
    2. SI163 = SI142 & 0xfff (SI142 modified in the loop)
    3. SI164 = EBX + SI162
    4. HI107 = SI163*2 + SI164
    5. SI142 = HI107

Only insn 1 is treated as invariant, and later combine propagates 2, 3, 4 into 5.

After enabling EBX:

loop begin:
(I) 1. SI163 = prev (global address)
    2. SI164 = SI142 & 0xfff (SI142 modified in the loop)
(I) 3. SI165 = SI143 + SI163
    4. HI107 = SI164*2 + SI165
    5. SI142 = HI107

Insns 1 and 3 are treated as invariants (143 is the GOT register) and hoisted outside the loop.  After that, the combine pass was unable to combine insns inside and outside the loop, which led to higher register pressure and therefore new spills/fills.

The patch fixes the x86 address cost so that the cost for addresses containing the GOT register becomes lower, as it was before enabling EBX.  In ix86_address_cost, the result of "REGNO (parts.base) >= FIRST_PSEUDO_REGISTER" for the hard ebx was always false.  The patch makes the condition give the same result when parts.base is the GOT register (and the same for parts.index).

2014-10-08  Evgeny Stupachenko  evstu...@gmail.com

	* gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
	when address contains GOT register.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b43e870..9d8cfd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool)
     cost++;

   if (parts.base
+      && (!pic_offset_table_rtx
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.base))
       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
       && parts.index
+      && (!pic_offset_table_rtx
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.index))
       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
       && parts.base != parts.index)
     cost++;
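A rough C model of the 164.gzip pattern discussed above (a hypothetical reconstruction from the RTL dump, not gzip's actual source): the table is a global array, so under -m32 -fPIC its address comes through the GOT, and whether the GOT-relative address computation is hoisted as a loop invariant decides how much register pressure the loop body carries.

```c
/* prev_model[] mirrors gzip's hash-chain table; the loop body is the
   SI142 = prev[SI142 & 0xfff] recurrence from the dump above.  Under
   -m32 -fPIC the address of prev_model is a GOT load, which is the
   loop-invariant insn 1 in the dump.  */
unsigned short prev_model[1 << 13];

unsigned
follow_chain (unsigned h, int n)
{
  while (n-- > 0)
    /* address = (invariant GOT-based base) + scaled index */
    h = prev_model[h & 0xfff];
  return h;
}
```

The patch's point is purely about costing: treating the GOT register like a pseudo in ix86_address_cost keeps such GOT-based addresses cheap, so invariant motion and combine make the same choices they made when EBX was a fixed register.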
RE: [PATCH] Clean up duplicated function seq_cost
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Thursday, October 09, 2014 5:21 PM
To: Zhenqiang Chen
Cc: GCC Patches
Subject: Re: [PATCH] Clean up duplicated function seq_cost

On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener richard.guent...@gmail.com wrote:

On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote:

Hi,

There are two implementations of seq_cost.  The function bodies are exactly the same.  The patch removes one of them and makes the other global.

Bootstrap and no make check regression on X86-64.  OK for trunk?

The prototype should go to cfgloopanal.c.

Err - cfgloopanal.h of course ;)  Or rather, the function sounds misplaced in cfgloopanal.c.

Thanks for the comments.  I think seq_cost should be in rtlanal.c, just like rtx_cost, insn_rtx_cost and so on.

ChangeLog:
2014-10-10  Zhenqiang Chen  zhenqiang.c...@arm.com

	* cfgloopanal.c (seq_cost): Delete.
	* rtl.h (seq_cost): New prototype.
	* rtlanal.c (seq_cost): New function.
	* tree-ssa-loop-ivopts.c (seq_cost): Delete.

diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 7ea1a5f..006b419 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -302,26 +302,6 @@ get_loop_level (const struct loop *loop)
   return mx;
 }

-/* Returns estimate on cost of computing SEQ.  */
-
-static unsigned
-seq_cost (const rtx_insn *seq, bool speed)
-{
-  unsigned cost = 0;
-  rtx set;
-
-  for (; seq; seq = NEXT_INSN (seq))
-    {
-      set = single_set (seq);
-      if (set)
-	cost += set_rtx_cost (set, speed);
-      else
-	cost++;
-    }
-
-  return cost;
-}
-
 /* Initialize the constants for computing set costs.  */

 void

diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..b697417 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *);
 extern bool keep_with_call_p (const rtx_insn *);
 extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
 extern int insn_rtx_cost (rtx, bool);
+extern unsigned seq_cost (const rtx_insn *, bool);

 /* Given an insn and condition, return a canonical description of
    the test being made.  */

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..93eda12 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5017,6 +5017,26 @@ insn_rtx_cost (rtx pat, bool speed)
   return cost > 0 ? cost : COSTS_N_INSNS (1);
 }

+/* Returns estimate on cost of computing SEQ.  */
+
+unsigned
+seq_cost (const rtx_insn *seq, bool speed)
+{
+  unsigned cost = 0;
+  rtx set;
+
+  for (; seq; seq = NEXT_INSN (seq))
+    {
+      set = single_set (seq);
+      if (set)
+	cost += set_rtx_cost (set, speed);
+      else
+	cost++;
+    }
+
+  return cost;
+}
+
 /* Given an insn INSN and condition COND, return the condition in a
    canonical form to simplify testing by callers.  Specifically:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 400798a..087ca26 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct iv_use *use,
   return NULL;
 }

-/* Returns estimate on cost of computing SEQ.  */
-
-static unsigned
-seq_cost (rtx_insn *seq, bool speed)
-{
-  unsigned cost = 0;
-  rtx set;
-
-  for (; seq; seq = NEXT_INSN (seq))
-    {
-      set = single_set (seq);
-      if (set)
-	cost += set_src_cost (SET_SRC (set), speed);
-      else
-	cost++;
-    }
-
-  return cost;
-}
-
 /* Produce DECL_RTL for object obj so it looks like it is stored in memory.  */

 static rtx
 produce_memory_decl_rtl (tree obj, int *regno)
Re: [patch] Fix miscompilation of gnat1 in LTO bootstrap
I can't see how this can work with LTO.  We need a middle-end way to represent the alias relation of those types.  At least I can't see how your simple patch covers all cases here?

It covers what I think is the most prominent case (unconstrained array types); the other cases are far less annoying in practice.

With LTO we preserve TYPE_ALIAS_SET == 0, so another way to fix this (and which I'd like more) is to do your patch in the Ada frontend, that is, use alias-set zero for all types you relate if flag_lto.

Really Big Hammer if you ask me.

Another way is to make LTO canonical type merging handle the case of type_contains_placeholder_p better, that is, by treating two such types as equivalent more easily.  For arrays this simply means hashing and comparing a non-constant TYPE_DOMAIN the same / as equal.  There is already some code handling PLACEHOLDER_EXPR specially, but it doesn't seem to be enough (why, in this case)?

Yes, gimple_canonical_types_compatible_p will return true for a couple of unconstrained (i.e. type_contains_placeholder_p) array types with the same component type.  The problem here is for an unconstrained array type and a constrained (regular) array type with the same component type.  Can I modify gimple_canonical_types_compatible_p to return true in that case as well?

--
Eric Botcazou
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On 10/10/2014 11:30 AM, Jakub Jelinek wrote:

On Fri, Oct 10, 2014 at 11:13:11AM +0400, Maxim Ostapenko wrote:

[Maxim's original mail and ChangeLog quoted as above.]

That looks wrong to me; we don't want to keep the libsanitizer paths in LD_LIBRARY_PATH* after we leave asan.exp etc.  So, perhaps instead save ld_library_path into some global variable (like {a,t,ub}san_saved_ld_library_path) during {a,t,ub}san_link_flags before appending there anything, and replace restore_ld_library_path_env_vars with

set ld_library_path ${a,t,ub}san_saved_ld_library_path
set_ld_library_path_env_vars

?

	Jakub

This works indeed.  However, calling set_ld_library_path_env_vars in {asan, tsan, ubsan}_finish will lead to updating LD_LIBRARY_PATH_{32,64}, LD_RUN_PATH etc. with $ld_library_path:$orig_ld_{library_path_32, library_path_64, run, etc}.  Is this fine?

-Maxim

gcc/testsuite/ChangeLog:

2014-10-10  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_link_flags): Save ld_library_path.
	* lib/tsan-dg.exp (tsan_link_flags): Likewise.
	* lib/ubsan-dg.exp (ubsan_link_flags): Likewise.
	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.  Restore ld_library_path with
	saved value.  Restore LD_LIBRARY_PATH related env variables by
	calling set_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 9769138..4e8b4d6 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -47,11 +47,13 @@ proc asan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global asan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set asan_saved_library_path $ld_library_path

     if { $gccpath != "" } {
       if { [file exists ${gccpath}/libsanitizer/asan/.libs/libasan.a]
@@ -122,6 +124,8 @@ proc asan_finish { args } {
     global TEST_ALWAYS_FLAGS
     global asan_saved_TEST_ALWAYS_FLAGS
     global asan_saved_ALWAYS_CXXFLAGS
+    global asan_saved_library_path
+    global ld_library_path

     if [info exists asan_saved_ALWAYS_CXXFLAGS ] {
	set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS
@@ -132,7 +136,8 @@ proc asan_finish { args } {
	    unset TEST_ALWAYS_FLAGS
	}
     }
-    restore_ld_library_path_env_vars
+    set ld_library_path $asan_saved_library_path
+    set_ld_library_path_env_vars
 }

 # Symbolize lines like

diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index 54ec404..77c85ff 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -32,11 +32,13 @@ proc tsan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global tsan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set tsan_saved_library_path $ld_library_path

     if { $gccpath != "" } {
       if { [file exists ${gccpath}/libsanitizer/tsan/.libs/libtsan.a]
@@ -127,6 +129,8 @@ proc tsan_finish { args } {
     global tsan_saved_ALWAYS_CXXFLAGS
     global dg-do-what-default
     global tsan_saved_dg-do-what-default
+    global tsan_saved_library_path
+    global ld_library_path

     if [info exists tsan_saved_ALWAYS_CXXFLAGS ] {
	set ALWAYS_CXXFLAGS $tsan_saved_ALWAYS_CXXFLAGS
@@ -143,5 +147,6 @@ proc tsan_finish { args } {
     } else {
	unset dg-do-what-default
     }
-    restore_ld_library_path_env_vars
+    set ld_library_path $tsan_saved_library_path
+    set_ld_library_path_env_vars
 }

diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index 5a7a653..3bfdcc8 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -32,11 +32,13 @@ proc ubsan_link_flags { paths } {
     global srcdir
     global ld_library_path
     global shlib_ext
+    global ubsan_saved_library_path

     set gccpath ${paths}
     set flags ""

     set shlib_ext [get_shlib_extension]
+    set
Re: [PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
On Fri, Oct 10, 2014 at 01:09:00PM +0400, Maxim Ostapenko wrote:

This works indeed.  However, calling set_ld_library_path_env_vars in {asan, tsan, ubsan}_finish will lead to updating LD_LIBRARY_PATH_{32,64}, LD_RUN_PATH etc. with $ld_library_path:$orig_ld_{library_path_32, library_path_64, run, etc}.  Is this fine?

Isn't that the state in which asan_init has been called?  I mean, all of asan.exp, tsan.exp and ubsan.exp source gcc-dg.exp, which does:

global orig_environment_saved

# This file may be sourced, so don't override environment settings
# that have been previously setup.
if { $orig_environment_saved == 0 } {
    append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
    set_ld_library_path_env_vars
}

so by the time asan_init etc. is called, it should have been in LD_LIBRARY_PATH* etc. already.

2014-10-10  Max Ostapenko  m.ostape...@partner.samsung.com

	* lib/asan-dg.exp (asan_link_flags): Save ld_library_path.
	* lib/tsan-dg.exp (tsan_link_flags): Likewise.
	* lib/ubsan-dg.exp (ubsan_link_flags): Likewise.
	* lib/asan-dg.exp (asan_finish): Remove
	restore_ld_library_path_env_vars.  Restore ld_library_path with
	saved value.  Restore LD_LIBRARY_PATH related env variables by
	calling set_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

I'd rather duplicate the descriptions than duplicate the filenames in the same ChangeLog entry.  So please put all asan-dg.exp changes together, etc.  Ok with that change.

	Jakub
Re: Towards GNU11
On Thu, Oct 09, 2014 at 02:34:51PM -0700, Mike Stump wrote:

On Oct 7, 2014, at 2:07 PM, Marek Polacek pola...@redhat.com wrote:

I'd like to kick off a discussion about moving the default standard for C from gnu89 to gnu11.

I endorse the change of default.

Thanks for chiming in.

A wiki page that lists the types of changes people hit in real code, with how to fix them, would be useful and helpful.  It might be nice to have this in the document, but I don't know if people want to do that much work.  The wiki site is nice: if others do world builds, they can then easily add whatever they hit, which makes it even more complete.  This is a nice-to-have; I don't think the work going in should be gated on it.

Yeah.  I plan to write something into the porting-to document.

Two comments: thank you for all your hard work, and yes please.

Thank you very much!

	Marek
Re: [PATCH] Implement -fsanitize=object-size
On Thu, Oct 02, 2014 at 02:04:24PM +0200, Jakub Jelinek wrote:

Looks much better.

Cool.

There are some nits I'd change, like:
1) no need not to handle bitfields
2) IMHO it should handle PARM_DECL and RESULT_DECL alongside of VAR_DECL
3) decl_p IMHO should use just DECL_P
4) it doesn't make sense to build ADDR_EXPR if you are never going to use it

All these look good - I applied the patch.

Attaching some (incomplete) testcase I've used to look at the implementation.  E.g. f1 is for testing how it plays together with -fsanitize=bounds (perhaps, maybe as a follow-up, we could optimize by looking for a UBSAN_BOUNDS call with the expected arguments a few statements before the array access (e.g. remember the last UBSAN_BOUNDS seen in the same basic block) if inner is DECL_P), f2 shows a case which -fsanitize=bounds doesn't instrument but -fsanitize=object-size should, f3/f4 are the same with PARM_DECL, f5/f6 an unsuccessful attempt for RESULT_DECL, f7/f8 are bitfield tests, f9 something where __bos is folded very early (already before the objsz1 pass), f10 where __bos is folded during the objsz1 pass.  If you want to turn parts of that testcase into real ubsan tests, go ahead.

Other than that, if you are fine with the following changes, can you incorporate them into the patch and retest?

I created a new testcase based on that, thanks.

I couldn't test bootstrap-ubsan, because of:

error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO

but I remember that the previous version of the patch passed fine.

I had to add the following

+  if (LOCATION_LOCUS (loc) == UNKNOWN_LOCATION)
+    loc = input_location;

The reason is that on pr59250.C we ICEd, since we hit this assert in tree-object-size.c:pass_object_sizes::execute:

gcc_assert (TREE_CODE (result) == INTEGER_CST);

because fold_call_stmt returned an INTEGER_CST wrapped in a NOP_EXPR.  This NOP_EXPR is created in fold_builtin_n so we can put a location on it, and should be discarded in fold_call_stmt:

      ret = fold_builtin_n (loc, fndecl, args, nargs, ignore);
      if (ret)
	{
	  /* Propagate location information from original call to
	     expansion of builtin.  Otherwise things like
	     maybe_emit_chk_warning, that operate on the expansion of
	     a builtin, will use the wrong location information.  */
	  if (gimple_has_location (stmt))
	    {
	      tree realret = ret;
	      if (TREE_CODE (ret) == NOP_EXPR)
		realret = TREE_OPERAND (ret, 0);

However, STMT was without location.  We set the location properly in instrument_object_size, but in this case the location was UNKNOWN_LOCATION, so for UNKNOWN_LOCATION I use input_location.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-09  Marek Polacek  pola...@redhat.com

	* asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE.
	* doc/invoke.texi: Document -fsanitize=object-size.
	* flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE
	and or it into SANITIZE_UNDEFINED.
	* gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE.
	* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function.
	* internal-fn.def (UBSAN_OBJECT_SIZE): Define.
	* opts.c (common_handle_option): Handle -fsanitize=object-size.
	* ubsan.c: Include tree-object-size.h.
	(ubsan_type_descriptor): Call tree_to_uhwi instead of tree_to_shwi.
	(ubsan_expand_bounds_ifn): Use false instead of 0.
	(ubsan_expand_objsize_ifn): New function.
	(instrument_object_size): New function.
	(pass_ubsan::execute): Add object size instrumentation.
	* ubsan.h (ubsan_expand_objsize_ifn): Declare.

testsuite/
	* c-c++-common/ubsan/object-size-1.c: New test.
	* c-c++-common/ubsan/object-size-2.c: New test.
	* c-c++-common/ubsan/object-size-3.c: New test.
	* c-c++-common/ubsan/object-size-4.c: New test.
	* c-c++-common/ubsan/object-size-5.c: New test.
	* c-c++-common/ubsan/object-size-6.c: New test.
	* c-c++-common/ubsan/object-size-7.c: New test.
	* c-c++-common/ubsan/object-size-8.c: New test.
	* c-c++-common/ubsan/object-size-9.c: New test.
	* g++.dg/ubsan/object-size-1.C: New test.
	* gcc.dg/ubsan/object-size-9.c: New test.

diff --git gcc/asan.c gcc/asan.c
index fca4ee6..6ea3efe 100644
--- gcc/asan.c
+++ gcc/asan.c
@@ -2879,6 +2879,9 @@ pass_sanopt::execute (function *fun)
	      case IFN_UBSAN_BOUNDS:
		no_next = ubsan_expand_bounds_ifn (&gsi);
		break;
+	      case IFN_UBSAN_OBJECT_SIZE:
+		no_next = ubsan_expand_objsize_ifn (&gsi);
+		break;
	      case IFN_ASAN_CHECK:
		{
		  no_next =
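As a reminder of the semantics the new IFN_UBSAN_OBJECT_SIZE check leans on, here is a small sketch of the long-standing __builtin_object_size builtin itself: the sanitizer compares the accessed offset against exactly this value, and an all-ones result means "unknown, don't check".

```c
struct S { int a; char buf[8]; };

static struct S s;

/* Mode 0 measures to the end of the whole enclosing object; mode 1
   measures to the end of the closest enclosing subobject.  A result of
   (size_t) -1 would mean the size could not be determined.  */
static unsigned long
whole_object (void)
{
  return __builtin_object_size (&s, 0);
}

static unsigned long
subobject (void)
{
  return __builtin_object_size (s.buf, 1);
}
```

For a static object like s both calls fold to compile-time constants, which is the common case the gimple-fold change above exploits to drop the check entirely.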
Re: [PATCH] Implement -fsanitize=object-size
On Fri, Oct 10, 2014 at 12:04:08PM +0200, Marek Polacek wrote:

I couldn't test bootstrap-ubsan, because of:

error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO

but I remember that the previous version of the patch passed fine.

We build (intentionally) both the libubsan.so.* objects and the libubsan.a objects with -fPIC, but don't build the latter with -DPIC.  I guess we need to now: with -static-libubsan, libubsan.a is linked into shared libraries statically, and we definitely can't use .preinit_array in that case.  So, I think (untested) something like:

2014-10-10  Jakub Jelinek  ja...@redhat.com

	* ubsan/Makefile.am (DEFS): Add -DPIC.
	* ubsan/Makefile.in: Regenerated.

--- libsanitizer/ubsan/Makefile.am	2014-09-24 11:08:04.183026156 +0200
+++ libsanitizer/ubsan/Makefile.am	2014-10-10 12:15:19.124247283 +0200
@@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_
 # May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)

-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4

should fix this.

2014-10-09  Marek Polacek  pola...@redhat.com

Check the date ;)

	* asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE.
	* doc/invoke.texi: Document -fsanitize=object-size.
	* flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE
	and or it into SANITIZE_UNDEFINED.
	* gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE.
	* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function.
	* internal-fn.def (UBSAN_OBJECT_SIZE): Define.
	* opts.c (common_handle_option): Handle -fsanitize=object-size.
	* ubsan.c: Include tree-object-size.h.

I'd avoid the s.

--- gcc/gimple-fold.c
+++ gcc/gimple-fold.c
@@ -2662,6 +2662,19 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
				      gimple_call_arg (stmt, 1),
				      gimple_call_arg (stmt, 2));
	  break;
+	case IFN_UBSAN_OBJECT_SIZE:
+	  if (integer_all_onesp (gimple_call_arg (stmt, 2))

Formatting on the case line, there should be a tab.

+      gcc_assert (TREE_CODE (size) == INTEGER_CST);
+      /* See if we can discard the check.  */
+      if (integer_all_onesp (size))
+	/* Yes, __builtin_object_size couldn't determine the
+	   object size.  */;

I'd just treat TREE_CODE (size) != INTEGER_CST the same as integer_all_onesp.  It is very likely you'll get an INTEGER_CST there, but I'd be afraid that if somebody disables ccp, forwprop and similar optimizations, you might be unlucky enough to actually have an SSA_NAME there instead.

Ok with those changes.  After commit, please update gcc-5/changes.html.  Thanks.

	Jakub
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. Couple of nits below. +/* Set regs_ever_live for PIC base address register + to true if required. */ +static void +set_pic_reg_ever_alive () Please rename this function to set_pic_reg_ever_live. -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. OK for mainline after infrastructure patch is approved. Thanks, Uros.
[PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
Hi! This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-10-10 Jakub Jelinek ja...@redhat.com PR tree-optimization/63464 * tree-switch-conversion.c (struct case_bit_test): Remove hi and lo fields, add wide_int mask field. (emit_case_bit_tests): Add MAXVAL argument, rewrite uses of hi/lo fields into wide_int mask operations, optimize by pretending minval to be 0 if maxval is small enough. (process_switch): Adjust caller. --- gcc/tree-switch-conversion.c.jj 2014-08-01 09:24:12.000000000 +0200 +++ gcc/tree-switch-conversion.c 2014-10-07 12:36:29.438181857 +0200 @@ -246,8 +246,7 @@ This transformation was contributed by R struct case_bit_test { - HOST_WIDE_INT hi; - HOST_WIDE_INT lo; + wide_int mask; edge target_edge; tree label; int bits; @@ -294,13 +293,14 @@ case_bit_test_cmp (const void *p1, const are not guaranteed to be of the same type as INDEX_EXPR (the gimplifier doesn't change the type of case label values, and MINVAL and RANGE are derived from those values). +MAXVAL is MINVAL + RANGE. There *MUST* be MAX_CASE_BIT_TESTS or less unique case node targets.
*/ static void emit_case_bit_tests (gimple swtch, tree index_expr, -tree minval, tree range) +tree minval, tree range, tree maxval) { struct case_bit_test test[MAX_CASE_BIT_TESTS]; unsigned int i, j, k; @@ -324,6 +324,8 @@ emit_case_bit_tests (gimple swtch, tree tree word_type_node = lang_hooks.types.type_for_mode (word_mode, 1); tree word_mode_zero = fold_convert (word_type_node, integer_zero_node); tree word_mode_one = fold_convert (word_type_node, integer_one_node); + int prec = TYPE_PRECISION (word_type_node); + wide_int wone = wi::one (prec); memset (test, 0, sizeof (test)); @@ -348,8 +350,7 @@ emit_case_bit_tests (gimple swtch, tree if (k == count) { gcc_checking_assert (count < MAX_CASE_BIT_TESTS); - test[k].hi = 0; - test[k].lo = 0; + test[k].mask = wi::zero (prec); test[k].target_edge = e; test[k].label = label; test[k].bits = 1; @@ -367,14 +368,39 @@ emit_case_bit_tests (gimple swtch, tree CASE_HIGH (cs), minval)); for (j = lo; j <= hi; j++) -if (j >= HOST_BITS_PER_WIDE_INT) - test[k].hi |= (HOST_WIDE_INT) 1 << (j - HOST_BITS_PER_INT); - else - test[k].lo |= (HOST_WIDE_INT) 1 << j; + test[k].mask |= wi::lshift (wone, j); } qsort (test, count, sizeof (*test), case_bit_test_cmp); + /* If all values are in the 0 .. BITS_PER_WORD-1 range, we can get rid of + the minval subtractions, but it might make the mask constants more + expensive. So, compare the costs.
*/ + if (compare_tree_int (minval, 0) > 0 + && compare_tree_int (maxval, GET_MODE_BITSIZE (word_mode)) < 0) +{ + int cost_diff; + HOST_WIDE_INT m = tree_to_uhwi (minval); + rtx reg = gen_raw_REG (word_mode, 1); + bool speed_p = optimize_bb_for_speed_p (gimple_bb (swtch)); + cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg, + GEN_INT (-m)), speed_p); + for (i = 0; i < count; i++) + { + rtx r = immed_wide_int_const (test[i].mask, word_mode); + cost_diff += set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + r = immed_wide_int_const (wi::lshift (test[i].mask, m), word_mode); + cost_diff -= set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + } + if (cost_diff > 0) + { + for (i = 0; i < count; i++) + test[i].mask = wi::lshift (test[i].mask, m); + minval = build_zero_cst (TREE_TYPE (minval)); + range = maxval; + } +} + /* We generate two jumps to the default case label. Split the default edge, so that we don't have to do any PHI node updating. */ @@ -446,13 +472,7 @@ emit_case_bit_tests (gimple swtch, tree if (const & csui) goto target */ for (k = 0; k < count; k++) { - HOST_WIDE_INT a[2]; - - a[0] = test[k].lo; - a[1] = test[k].hi; - tmp = wide_int_to_tree (word_type_node, - wide_int::from_array (a, 2, - TYPE_PRECISION (word_type_node))); + tmp = wide_int_to_tree (word_type_node, test[k].mask); tmp = fold_build2 (BIT_AND_EXPR, word_type_node, csui, tmp); tmp = force_gimple_operand_gsi (gsi, tmp,
[PATCH] Fix up _Alignof with user aligned types (PR c/63495)
Hi! As the testcase shows, _Alignof can sometimes return a smaller number than the minimum alignment. That is because when laying out structures, fields with types with TYPE_USER_ALIGN set have also DECL_USER_ALIGN set and therefore neither BIGGEST_FIELD_ALIGNMENT nor ADJUST_FIELD_ALIGN is applied to them, but min_align_of_type was applying that unconditionally. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-10-10 Jakub Jelinek ja...@redhat.com PR c/63495 * stor-layout.c (min_align_of_type): Don't decrease alignment through BIGGEST_FIELD_ALIGNMENT or ADJUST_FIELD_ALIGN if TYPE_USER_ALIGN is set. * gcc.target/i386/pr63495.c: New test. --- gcc/stor-layout.c.jj 2014-09-01 09:43:56.000000000 +0200 +++ gcc/stor-layout.c 2014-10-09 09:56:49.760893630 +0200 @@ -2400,17 +2400,19 @@ min_align_of_type (tree type) { unsigned int align = TYPE_ALIGN (type); align = MIN (align, BIGGEST_ALIGNMENT); + if (!TYPE_USER_ALIGN (type)) +{ #ifdef BIGGEST_FIELD_ALIGNMENT - align = MIN (align, BIGGEST_FIELD_ALIGNMENT); + align = MIN (align, BIGGEST_FIELD_ALIGNMENT); #endif - unsigned int field_align = align; + unsigned int field_align = align; #ifdef ADJUST_FIELD_ALIGN - tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, - type); - field_align = ADJUST_FIELD_ALIGN (field, field_align); - ggc_free (field); + tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, type); + field_align = ADJUST_FIELD_ALIGN (field, field_align); + ggc_free (field); #endif - align = MIN (align, field_align); + align = MIN (align, field_align); +} return align / BITS_PER_UNIT; } --- gcc/testsuite/gcc.target/i386/pr63495.c.jj 2014-10-09 09:59:46.036634365 +0200 +++ gcc/testsuite/gcc.target/i386/pr63495.c 2014-10-09 10:01:09.231096165 +0200 @@ -0,0 +1,6 @@ +/* PR c/63495 */ +/* { dg-do compile { target { i?86-*-linux* x86_64-*-linux* } } } */ +/* { dg-options "-std=gnu11" } */ + +struct __attribute__ ((aligned (8))) S { char c; }; +_Static_assert (_Alignof (struct S) >= 8, "wrong alignment"); Jakub
[AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. commit ed8a64d6a7a23919b6b6d3a76cad90b9eae8a530 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 16:28:25 2014 + [AArch64] Add --enable-fix-cortex-a53-835769 configure switch. 2014-10-08 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 76a2480..db5ff59 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6387,6 +6387,15 @@ aarch64_override_options (void) aarch64_tune = selected_tune-core; aarch64_tune_params = selected_tune-tune; + if (aarch64_fix_a53_err835769 == 2) +{ +#ifdef TARGET_FIX_ERR_A53_835769_DEFAULT + aarch64_fix_a53_err835769 = 1; +#else + aarch64_fix_a53_err835769 = 0; +#endif +} + aarch64_override_options_after_change (); } diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 77deb2e..fc0307e 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -68,7 +68,7 @@ Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Generate code which uses only the general registers mfix-cortex-a53-835769 -Target Report Var(aarch64_fix_a53_err835769) Init(0) +Target Report Var(aarch64_fix_a53_err835769) Init(2) Workaround for ARM Cortex-A53 Erratum number 835769 mlittle-endian diff --git a/gcc/configure b/gcc/configure index 380a235..bd1215d 100755 --- a/gcc/configure +++ b/gcc/configure @@ -920,6 +920,7 @@ with_plugin_ld enable_gnu_indirect_function enable_initfini_array enable_comdat +enable_fix_cortex_a53_835769 with_glibc_version enable_gnu_unique_object enable_linker_build_id @@ -1638,6 +1639,14 @@ Optional Features: glibc systems --enable-initfini-array use .init_array/.fini_array sections --enable-comdat enable COMDAT group support + + --enable-fix-cortex-a53-835769 + enable workaround for AArch64 Cortex-A53 erratum + 835769 by default + --disable-fix-cortex-a53-835769 + disable workaround for AArch64 Cortex-A53 erratum + 835769 by default + --enable-gnu-unique-object enable the use of the @gnu_unique_object ELF extension on glibc systems @@ -18049,7 +18058,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18052 configure +#line 18061 configure #include 
confdefs.h #if HAVE_DLFCN_H @@ -18155,7 +18164,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18158 configure +#line 18167 configure #include confdefs.h #if HAVE_DLFCN_H @@ -23979,6 +23988,25 @@ $as_echo #define HAVE_AS_MABI_OPTION 1 confdefs.h done fi fi +# Enable default workaround for AArch64 Cortex-A53 erratum 835769. +# Check whether --enable-fix-cortex-a53-835769 was given. +if test ${enable_fix_cortex_a53_835769+set} = set; then : + enableval=$enable_fix_cortex_a53_835769; +case $enableval in + yes) +tm_defines=${tm_defines} TARGET_FIX_ERR_A53_835769_DEFAULT=1 +;; + no) +;; + *) +as_fn_error '$enableval' is an invalid value for --enable-fix-cortex-a53-835769.\ + Valid choices are 'yes' and 'no'. $LINENO 5 +;; + +esac + +fi + ;; # All TARGET_ABI_OSF targets. diff --git a/gcc/configure.ac b/gcc/configure.ac index eb480de..8f7c814 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3526,6 +3526,29 @@ case $target in done fi fi +# Enable default workaround for AArch64 Cortex-A53 erratum 835769. +AC_ARG_ENABLE(fix-cortex-a53-835769, +[
[PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. (aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options. 
commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 12:48:34 2014 + [AArch64] Add final prescan workaround for Cortex-A53 erratum. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index b5f53d2..c57a467 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl, extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); +extern bool aarch64_madd_needs_nop (rtx_insn *); +extern void aarch64_final_prescan_insn (rtx_insn *); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5144c35..76a2480 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type) return NULL; } +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} + +/* Find the first rtx_insn before insn that will generate an assembly + instruction. */ + +static rtx_insn * +aarch64_prev_real_insn (rtx_insn *insn) +{ + if (!insn) +return NULL; + + do +{ + insn = prev_real_insn (insn); +} + while (insn && recog_memoized (insn) < 0); + + return insn; +} + +static bool +is_madd_op (enum attr_type t1) +{ + unsigned int i; + /* A number of these may be AArch32 only. */ + enum attr_type mlatypes[] = { +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD, +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY, +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD + }; + + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++) +{ + if (t1 == mlatypes[i]) + return true; +} + + return false; +} + +/* Check if there is a register dependency between a load and the insn + for which we hold recog_data. */ + +static bool +dep_between_memop_and_curr (rtx memop) +{ + rtx load_reg; + int opno; + + if (!memop) +return false; + + if (!REG_P (SET_DEST (memop))) +return false; + + load_reg = SET_DEST (memop); + for (opno = 0; opno < recog_data.n_operands; opno++) +{ + rtx operand = recog_data.operand[opno]; + if (REG_P (operand) + && reg_overlap_mentioned_p (load_reg, operand)) +return true; + +} + return false; +} + +bool +aarch64_madd_needs_nop (rtx_insn* insn) +{ + enum attr_type attr_type; + rtx_insn *prev; + rtx body; + + if
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: Hi! This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? Also testing for RTX costs this early sounds bogus. Thanks, Richard. 2014-10-10 Jakub Jelinek ja...@redhat.com PR tree-optimization/63464 * tree-switch-conversion.c (struct case_bit_test): Remove hi and lo fields, add wide_int mask field. (emit_case_bit_tests): Add MAXVAL argument, rewrite uses of hi/lo fields into wide_int mask operations, optimize by pretending minval to be 0 if maxval is small enough. (process_switch): Adjust caller. --- gcc/tree-switch-conversion.c.jj 2014-08-01 09:24:12.000000000 +0200 +++ gcc/tree-switch-conversion.c 2014-10-07 12:36:29.438181857 +0200 @@ -246,8 +246,7 @@ This transformation was contributed by R struct case_bit_test { - HOST_WIDE_INT hi; - HOST_WIDE_INT lo; + wide_int mask; edge target_edge; tree label; int bits; @@ -294,13 +293,14 @@ case_bit_test_cmp (const void *p1, const are not guaranteed to be of the same type as INDEX_EXPR (the gimplifier doesn't change the type of case label values, and MINVAL and RANGE are derived from those values). +MAXVAL is MINVAL + RANGE. There *MUST* be MAX_CASE_BIT_TESTS or less unique case node targets.
*/ static void emit_case_bit_tests (gimple swtch, tree index_expr, - tree minval, tree range) + tree minval, tree range, tree maxval) { struct case_bit_test test[MAX_CASE_BIT_TESTS]; unsigned int i, j, k; @@ -324,6 +324,8 @@ emit_case_bit_tests (gimple swtch, tree tree word_type_node = lang_hooks.types.type_for_mode (word_mode, 1); tree word_mode_zero = fold_convert (word_type_node, integer_zero_node); tree word_mode_one = fold_convert (word_type_node, integer_one_node); + int prec = TYPE_PRECISION (word_type_node); + wide_int wone = wi::one (prec); memset (test, 0, sizeof (test)); @@ -348,8 +350,7 @@ emit_case_bit_tests (gimple swtch, tree if (k == count) { gcc_checking_assert (count < MAX_CASE_BIT_TESTS); - test[k].hi = 0; - test[k].lo = 0; + test[k].mask = wi::zero (prec); test[k].target_edge = e; test[k].label = label; test[k].bits = 1; @@ -367,14 +368,39 @@ emit_case_bit_tests (gimple swtch, tree CASE_HIGH (cs), minval)); for (j = lo; j <= hi; j++) -if (j >= HOST_BITS_PER_WIDE_INT) - test[k].hi |= (HOST_WIDE_INT) 1 << (j - HOST_BITS_PER_INT); - else - test[k].lo |= (HOST_WIDE_INT) 1 << j; + test[k].mask |= wi::lshift (wone, j); } qsort (test, count, sizeof (*test), case_bit_test_cmp); + /* If all values are in the 0 .. BITS_PER_WORD-1 range, we can get rid of + the minval subtractions, but it might make the mask constants more + expensive. So, compare the costs.
*/ + if (compare_tree_int (minval, 0) > 0 + && compare_tree_int (maxval, GET_MODE_BITSIZE (word_mode)) < 0) +{ + int cost_diff; + HOST_WIDE_INT m = tree_to_uhwi (minval); + rtx reg = gen_raw_REG (word_mode, 1); + bool speed_p = optimize_bb_for_speed_p (gimple_bb (swtch)); + cost_diff = set_rtx_cost (gen_rtx_PLUS (word_mode, reg, + GEN_INT (-m)), speed_p); + for (i = 0; i < count; i++) + { + rtx r = immed_wide_int_const (test[i].mask, word_mode); + cost_diff += set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + r = immed_wide_int_const (wi::lshift (test[i].mask, m), word_mode); + cost_diff -= set_src_cost (gen_rtx_AND (word_mode, reg, r), speed_p); + } + if (cost_diff > 0) + { + for (i = 0; i < count; i++) + test[i].mask = wi::lshift (test[i].mask, m); + minval = build_zero_cst (TREE_TYPE (minval)); + range = maxval; + } +} + /* We generate two jumps to the default case label. Split the default edge, so that we don't have to do any PHI node updating. */ @@ -446,13 +472,7 @@ emit_case_bit_tests (gimple swtch, tree if (const & csui) goto target */ for (k = 0; k < count; k++) { - HOST_WIDE_INT a[2]; - - a[0] = test[k].lo; - a[1] = test[k].hi; - tmp = wide_int_to_tree (word_type_node, - wide_int::from_array (a, 2, - TYPE_PRECISION (word_type_node))); + tmp =
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, Oct 10, 2014 at 12:55:21PM +0200, Richard Biener wrote: On Fri, 10 Oct 2014, Jakub Jelinek wrote: This patch adds a small optimization to emit_case_bit_tests: instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask), if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? The && always has to mean separate basic blocks, as otherwise it is undefined behavior. I'd think this optimization would be too specialized for a general optimization, and I am unsure in which pass it would be desirable. Also testing for RTX costs this early sounds bogus. Well, the bit test optimization is already decided based on other rtx costs. Jakub
[PATCH] Fix PR63476
This fixes PRE not keeping virtual SSA form up-to-date during insertion and thus eventual VOP walks from the devirt code during elimination. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-10-10 Richard Biener rguent...@suse.de PR tree-optimization/63476 * tree-ssa-pre.c (struct bb_bitmap_sets): Add vop_on_exit member. (BB_LIVE_VOP_ON_EXIT): New define. (create_expression_by_pieces): Assign VUSEs to stmts. (compute_avail): Track BB_LIVE_VOP_ON_EXIT. (pass_pre::execute): Assert virtual SSA form is up-to-date after insertion. * g++.dg/torture/pr63476.C: New testcase. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 216037) +++ gcc/tree-ssa-pre.c (working copy) @@ -423,6 +423,9 @@ typedef struct bb_bitmap_sets /* A cache for value_dies_in_block_x. */ bitmap expr_dies; + /* The live virtual operand on successor edges. */ + tree vop_on_exit; + /* True if we have visited this block during ANTIC calculation. */ unsigned int visited : 1; @@ -440,6 +443,7 @@ typedef struct bb_bitmap_sets #define EXPR_DIES(BB) ((bb_value_sets_t) ((BB)->aux))->expr_dies #define BB_VISITED(BB) ((bb_value_sets_t) ((BB)->aux))->visited #define BB_MAY_NOTRETURN(BB) ((bb_value_sets_t) ((BB)->aux))->contains_may_not_return_call +#define BB_LIVE_VOP_ON_EXIT(BB) ((bb_value_sets_t) ((BB)->aux))->vop_on_exit /* Basic block list in postorder.
*/ @@ -2886,12 +2890,15 @@ create_expression_by_pieces (basic_block bitmap_value_replace_in_set (NEW_SETS (block), nameexpr); bitmap_value_replace_in_set (AVAIL_OUT (block), nameexpr); } + + gimple_set_vuse (stmt, BB_LIVE_VOP_ON_EXIT (block)); } gimple_seq_add_seq (stmts, forced_stmts); } name = make_temp_ssa_name (exprtype, NULL, "pretmp"); newstmt = gimple_build_assign (name, folded); + gimple_set_vuse (newstmt, BB_LIVE_VOP_ON_EXIT (block)); gimple_set_plf (newstmt, NECESSARY, false); gimple_seq_add_stmt (stmts, newstmt); @@ -3593,6 +3623,9 @@ compute_avail (void) son = next_dom_son (CDI_DOMINATORS, son)) worklist[sp++] = son; + BB_LIVE_VOP_ON_EXIT (ENTRY_BLOCK_PTR_FOR_FN (cfun)) + = ssa_default_def (cfun, gimple_vop (cfun)); + /* Loop until the worklist is empty. */ while (sp) { @@ -3607,7 +3640,10 @@ compute_avail (void) its immediate dominator. */ dom = get_immediate_dominator (CDI_DOMINATORS, block); if (dom) - bitmap_set_copy (AVAIL_OUT (block), AVAIL_OUT (dom)); + { + bitmap_set_copy (AVAIL_OUT (block), AVAIL_OUT (dom)); + BB_LIVE_VOP_ON_EXIT (block) = BB_LIVE_VOP_ON_EXIT (dom); + } /* Generate values for PHI nodes. */ for (gsi = gsi_start_phis (block); !gsi_end_p (gsi); gsi_next (gsi)) @@ -3617,7 +3653,10 @@ compute_avail (void) /* We have no need for virtual phis, as they don't represent actual computations. */ if (virtual_operand_p (result)) - continue; + { + BB_LIVE_VOP_ON_EXIT (block) = result; + continue; + } pre_expr e = get_or_alloc_expr_for_name (result); add_to_value (get_expr_value_id (e), e); @@ -3661,6 +3700,9 @@ compute_avail (void) bitmap_value_insert_into_set (AVAIL_OUT (block), e); } + if (gimple_vdef (stmt)) + BB_LIVE_VOP_ON_EXIT (block) = gimple_vdef (stmt); + if (gimple_has_side_effects (stmt) || stmt_could_throw_p (stmt) || is_gimple_debug (stmt)) @@ -4758,6 +4794,10 @@ pass_pre::execute (function *fun) remove_fake_exit_edges (); gsi_commit_edge_inserts (); + /* Eliminate folds statements which might (should not...)
end up + not keeping virtual operands up-to-date. */ + gcc_assert (!need_ssa_update_p (fun)); + /* Remove all the redundant expressions. */ todo |= eliminate (true); Index: gcc/testsuite/g++.dg/torture/pr63476.C === --- gcc/testsuite/g++.dg/torture/pr63476.C (revision 0) +++ gcc/testsuite/g++.dg/torture/pr63476.C (working copy) @@ -0,0 +1,39 @@ +// { dg-do compile } +// { dg-additional-options "-std=gnu++11" } + +enum class nsresult; +class A; +class B +{ +public: +B (int); +A *operator->(); +}; +class C +{ +}; +class A +{ +public: +virtual nsresult AddObserver (const char *, C *, bool) = 0; +}; +class D : A +{ + nsresult + AddObserver (const char *p1, C *p2, bool p3) + { + AddObserver (p1, p2, p3); + } +}; +char *prefList[]{}; +class F : C +{ + nsresult Install (); +}; +nsresult +F::Install () +{ + B branch = 0; + for (int i;;) +branch->AddObserver (prefList[i], this, false); +}
[PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote: This message revives an old thread [1], where the miscompilation of gfortran on alpha was found that resulted in: [...] As said in the audit trail of the bugreport I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and amended testcase. The testcase and analysis will show a native read passing a possibly aliasing store. Attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P references as never aliasing. 2014-10-10 Uros Bizjak ubiz...@gmail.com * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P references when alignment ANDs are involved. (write_dependence_p): Ditto. (may_alias_p): Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in the gfortran testsuite due to an independent RTL infrastructure problem with VALUEs leaking into aliasing detecting functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to the mainline early next week. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483 [2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475 Uros. Index: alias.c === --- alias.c (revision 216025) +++ alias.c (working copy) @@ -2458,18 +2458,6 @@ true_dependence_1 (const_rtx mem, enum machine_mod || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can produce them, so don't die.
*/ - if (MEM_READONLY_P (x)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - if (! mem_addr) { mem_addr = XEXP (mem, 0); @@ -2493,6 +2481,22 @@ true_dependence_1 (const_rtx mem, enum machine_mod } } + /* Read-only memory is by definition never modified, and therefore can't + conflict with anything. However, don't assume anything when AND + addresses are involved and leave to the code below to determine + dependence. We don't expect to find read-only set on MEM, but + stupid user tricks can produce them, so don't die. */ + if (MEM_READONLY_P (x) + && GET_CODE (x_addr) != AND + && GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (x_addr); if (base && (GET_CODE (base) == LABEL_REF || (GET_CODE (base) == SYMBOL_REF @@ -2576,16 +2580,6 @@ write_dependence_p (const_rtx mem, || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* A read from read-only memory can't conflict with read-write memory. */ - if (!writep && MEM_READONLY_P (mem)) -return 0; - - /* If we have MEMs referring to different address spaces (which can - potentially overlap), we cannot easily tell from the addresses - whether the references overlap. */ - if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) -return 1; - mem_addr = XEXP (mem, 0); if (!x_addr) { @@ -2603,6 +2597,21 @@ write_dependence_p (const_rtx mem, } } + /* A read from read-only memory can't conflict with read-write memory. + Don't assume anything when AND addresses are involved and leave to + the code below to determine dependence.
*/ + if (!writep + && MEM_READONLY_P (mem) + && GET_CODE (x_addr) != AND + && GET_CODE (mem_addr) != AND) +return 0; + + /* If we have MEMs referring to different address spaces (which can + potentially overlap), we cannot easily tell from the addresses + whether the references overlap. */ + if (MEM_ADDR_SPACE (mem) != MEM_ADDR_SPACE (x)) +return 1; + base = find_base_term (mem_addr); if (! writep && base @@ -2690,18 +2699,6 @@ may_alias_p (const_rtx mem, const_rtx x) || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER) return 1; - /* Read-only memory is by definition never modified, and therefore can't - conflict with anything. We don't expect to find read-only set on MEM, - but stupid user tricks can
Re: [PATCH] Fix PR63379
On Thu, 9 Oct 2014, Richard Biener wrote: This fixes SLP vectorization of a MIN/MAX reduction where we choose a neutral element as the initial value of the first SLP group member. That's of course wrong. Simply don't choose one in which case the proper initial values will used. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Ok, that's a too simple fix and breaks reduction chains for which this code was added. Thus we have to support both variants dependent on if this is a reduction chain or not (let's hope we don't support the mixed case...) Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk sofar. Richard. 2014-10-09 Richard Biener rguent...@suse.de PR tree-optimization/63379 * tree-vect-slp.c (vect_get_constant_vectors): Do not compute a neutral operand for min/max when it is not a reduction chain. * gcc.dg/vect/pr63379.c: New testcase. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 216037) +++ gcc/tree-vect-slp.c (working copy) @@ -2395,13 +2395,21 @@ vect_get_constant_vectors (tree op, slp_ neutral_op = build_int_cst (TREE_TYPE (op), -1); break; - case MAX_EXPR: - case MIN_EXPR: -def_stmt = SSA_NAME_DEF_STMT (op); -loop = (gimple_bb (stmt))-loop_father; -neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt, -loop_preheader_edge (loop)); -break; + /* For MIN/MAX we don't have an easy neutral operand but +the initial values can be used fine here. Only for +a reduction chain we have to force a neutral element. 
*/ + case MAX_EXPR: + case MIN_EXPR: + if (!GROUP_FIRST_ELEMENT (stmt_vinfo)) + neutral_op = NULL; + else + { + def_stmt = SSA_NAME_DEF_STMT (op); + loop = (gimple_bb (stmt))->loop_father; + neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt, + loop_preheader_edge (loop)); + } + break; default: neutral_op = NULL; Index: gcc/testsuite/gcc.dg/vect/pr63379.c === --- gcc/testsuite/gcc.dg/vect/pr63379.c (revision 0) +++ gcc/testsuite/gcc.dg/vect/pr63379.c (working copy) @@ -0,0 +1,43 @@ +/* PR tree-optimization/63379 */ +/* { dg-do run } */ + +#include "tree-vect.h" + +extern void abort (void); + +typedef struct { +int x; +int y; +} Point; + +Point pt_array[25]; + +void __attribute__((noinline,noclone)) +generate_array(void) +{ + unsigned int i; + for (i = 0; i < 25; i++) +{ + pt_array[i].x = i; + pt_array[i].y = 1000+i; +} +} + +int main() +{ + check_vect (); + generate_array (); + Point min_pt = pt_array[0]; + Point *ptr, *ptr_end; + for (ptr = pt_array+1, ptr_end = pt_array+25; ptr != ptr_end; ++ptr) +{ + min_pt.x = (min_pt.x < ptr->x) ? min_pt.x : ptr->x; + min_pt.y = (min_pt.y < ptr->y) ? min_pt.y : ptr->y; +} + + if (min_pt.x != 0 || min_pt.y != 1000) +abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote: the patch improves performance when the previous ones are applied. It makes RTL loop invariant behavior for GOT loads the same as it was before the 2 previous patches. The patch fixes x86 address cost so that the cost for addresses with the GOT register becomes lower, as it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) >= FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes the condition result the same when parts.base is the GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; Please add a short comment here, explaining the reason for new condition. if (parts.base + && (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO (parts.base)) && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER) && parts.index + && (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO (parts.index)) && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER) && parts.base != parts.index) cost++; Otherwise LGTM, but please repost the patch with a comment. Uros.
[PATCH] Fix default(none) with array parameters (PR fortran/59488)
Hi! This patch makes Fortran array/derived parameters as well as vtables predetermined shared. For the standard proper those aren't really variables at all, and as they are global vars, shared is the best predetermination for them. 2014-10-10 Jakub Jelinek ja...@redhat.com PR fortran/59488 * trans-openmp.c (gfc_omp_predetermined_sharing): Return OMP_CLAUSE_DEFAULT_SHARED for parameters or vtables. * gfortran.dg/gomp/pr59488-1.f90: New test. * gfortran.dg/gomp/pr59488-2.f90: New test. --- gcc/fortran/trans-openmp.c.jj 2014-09-01 09:43:40.0 +0200 +++ gcc/fortran/trans-openmp.c 2014-10-10 10:15:00.426035513 +0200 @@ -135,6 +135,16 @@ gfc_omp_predetermined_sharing (tree decl if (GFC_DECL_RESULT (decl) && ! DECL_HAS_VALUE_EXPR_P (decl)) return OMP_CLAUSE_DEFAULT_SHARED; + /* These are either array or derived parameters, or vtables. + In the former cases, the OpenMP standard doesn't consider them to be + variables at all (they can't be redefined), but they can nevertheless appear + in parallel/task regions and for default(none) purposes treat them as shared. + For vtables likely the same handling is desirable. */ + if (TREE_CODE (decl) == VAR_DECL + && TREE_READONLY (decl) + && TREE_STATIC (decl)) +return OMP_CLAUSE_DEFAULT_SHARED; + return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } --- gcc/testsuite/gfortran.dg/gomp/pr59488-1.f90.jj 2014-10-10 10:06:38.231365699 +0200 +++ gcc/testsuite/gfortran.dg/gomp/pr59488-1.f902014-10-10 10:06:56.054036548 +0200 @@ -0,0 +1,13 @@ +! PR fortran/59488 +! { dg-do compile } +! { dg-options "-fopenmp" } + + implicit none + integer, parameter :: p(2) = (/ 11, 12 /) + integer :: r + + !$omp parallel do default(none) + do r = 1, 2 +print *, p(r) + end do +end --- gcc/testsuite/gfortran.dg/gomp/pr59488-2.f90.jj 2014-10-10 10:06:41.331308441 +0200 +++ gcc/testsuite/gfortran.dg/gomp/pr59488-2.f902014-10-10 10:08:07.0 +0200 @@ -0,0 +1,16 @@ +! PR fortran/59488 +! { dg-do compile } +! 
{ dg-options "-fopenmp" } + + implicit none + type t +integer :: s1, s2, s3 + end type + integer :: r + type(t), parameter :: u = t(1, 2, 3) + + !$omp parallel do default(none) + do r = 1, 2 +print *, u + end do +end Jakub
[PING v2][PATCH] Support for BIT_FIELD_REF in asan.c
On 10/03/2014 10:47 AM, Marat Zakirov wrote: On 09/26/2014 12:55 PM, Marat Zakirov wrote: Hi all! Here's a patch which instruments byte-aligned BIT_FIELD_REFs. During GCC asan-bootstrap and Linux kernel build I didn't find any cases where a BIT_FIELD_REF is not 8-bit aligned. But I do not have sufficient confidence to replace the current early return for misaligned BIT_FIELD_REFs with an assert. Ok to commit? --Marat gcc/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * asan.c (instrument_derefs): BIT_FIELD_REF added. gcc/testsuite/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * c-c++-common/asan/bitfield-5.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index cf5de27..451af33 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1705,6 +1705,7 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t, case INDIRECT_REF: case MEM_REF: case VAR_DECL: +case BIT_FIELD_REF: break; /* FALLTHRU */ default: diff --git a/gcc/testsuite/c-c++-common/asan/bitfield-5.c b/gcc/testsuite/c-c++-common/asan/bitfield-5.c new file mode 100644 index 000..eb5e9e9 --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/bitfield-5.c @@ -0,0 +1,24 @@ +/* Check BIT_FIELD_REF. */ + +/* { dg-do run } */ +/* { dg-shouldfail "asan" } */ + +struct A +{ + int y : 20; + int x : 13; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p) { + return ((struct A *)p)->x != 0; +} + +int +main () +{ + int a = 0; + return f (&a); +} + +/* { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow" } */
[PATCH] Fix PR63419
force_gimple_operand doesn't really do a deep verification on its input when we ask for a non-simple result here. Instead it trusts that the CONSTRUCTOR PRE feeds it is valid GIMPLE. Unfortunately that isn't so if its elements required a conversion. The following fixes it by merging gimple_convert and its use in PRE from match-and-simplify (which avoids force_gimple_operand entirely in PRE). The implementation of gimple_convert is of course still using fold here. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-10-10 Richard Biener rguent...@suse.de PR tree-optimization/63419 * gimple-fold.h (gimple_convert): New function. * gimple-fold.c (gimple_convert): Likewise. * tree-ssa-pre.c (create_expression_by_pieces): Use gimple_convert to split out required conversions early. * g++.dg/torture/pr63419.C: New testcase. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 216069) --- gcc/gimple-fold.c (working copy) *** rewrite_to_defined_overflow (gimple stmt *** 5282,5284 --- 5282,5301 return stmts; } + + /* Return OP converted to TYPE by emitting a conversion statement on SEQ +if required using location LOC. Note that OP will be returned +unmodified if GIMPLE does not require an explicit conversion between +its type and TYPE. 
*/ + + tree + gimple_convert (gimple_seq *seq, location_t loc, tree type, tree op) + { + if (useless_type_conversion_p (type, TREE_TYPE (op))) + return op; + op = fold_convert_loc (loc, type, op); + gimple_seq stmts = NULL; + op = force_gimple_operand (op, &stmts, true, NULL_TREE); + gimple_seq_add_seq_without_update (seq, stmts); + return op; + } Index: gcc/gimple-fold.h === *** gcc/gimple-fold.h (revision 216069) --- gcc/gimple-fold.h (working copy) *** extern tree gimple_fold_indirect_ref (tr *** 45,48 --- 45,55 extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple); + extern tree gimple_convert (gimple_seq *, location_t, tree, tree); + inline tree + gimple_convert (gimple_seq *seq, tree type, tree op) + { + return gimple_convert (seq, UNKNOWN_LOCATION, type, op); + } + #endif /* GCC_GIMPLE_FOLD_H */ Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 216069) --- gcc/tree-ssa-pre.c (working copy) *** create_expression_by_pieces (basic_block *** 2819,2830 if (nary->opcode == POINTER_PLUS_EXPR) { if (i == 0) ! genop[i] = fold_convert (nary->type, genop[i]); else if (i == 1) ! genop[i] = convert_to_ptrofftype (genop[i]); } else ! genop[i] = fold_convert (TREE_TYPE (nary->op[i]), genop[i]); } if (nary->opcode == CONSTRUCTOR) { --- 2819,2833 if (nary->opcode == POINTER_PLUS_EXPR) { if (i == 0) ! genop[i] = gimple_convert (&forced_stmts, ! nary->type, genop[i]); else if (i == 1) ! genop[i] = gimple_convert (&forced_stmts, ! sizetype, genop[i]); } else ! genop[i] = gimple_convert (&forced_stmts, ! TREE_TYPE (nary->op[i]), genop[i]); } if (nary->opcode == CONSTRUCTOR) { *** create_expression_by_pieces (basic_block *** 2866,2873 statements. We have to call unshare_expr because force_gimple_operand may modify the tree we pass to it. */ ! 
folded = force_gimple_operand (unshare_expr (folded), &forced_stmts, false, NULL); /* If we have any intermediate expressions to the value sets, add them to the value sets and chain them in the instruction stream. */ --- 2869,2878 statements. We have to call unshare_expr because force_gimple_operand may modify the tree we pass to it. */ ! gimple_seq tem = NULL; ! folded = force_gimple_operand (unshare_expr (folded), &tem, false, NULL); + gimple_seq_add_seq_without_update (&forced_stmts, tem); /* If we have any intermediate expressions to the value sets, add them to the value sets and chain them in the instruction stream. */
Re: [PATCH] Small optimization for emit_case_bit_tests (PR tree-optimization/63464)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 12:55:21PM +0200, Richard Biener wrote: On Fri, 10 Oct 2014, Jakub Jelinek wrote: This patch adds a small optimization to emit_case_bit_tests, instead of emitting (for high, low, mask all constants) (x - low) <= (high - low) && ((1 << (x - low)) & mask) if high is smaller than BITS_PER_WORD and low > 0 we can emit x <= high && ((1 << x) & (mask << low)) and avoid the subtraction. Do this only if mask << low isn't more costly than mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? But isn't this a general optimization? The && always has to mean separate basic blocks, as otherwise it is undefined behavior. I'd think this optimization would be too specialized for a general optimization, and unsure in which pass it would be desirable. Yeah, I also didn't come up with a good place to do this in general. ifcombine maybe, which already does some pattern matching on if (x) if (y) ... If I understand correctly we have if ((x - low) <= (high - low)) { if ((1 << (x - low)) & mask) { ... which we'd like to transform to if (x <= high) { if ((1 << x) & (mask << low)) { ... ifcombine can certainly detect the pattern, but obviously the result wouldn't have the ifs combined in any way. And you'd need to check whether _all_ code that depends on both tests is ok with the transformed checks (not sure if that is always the case). Also testing for RTX costs this early sounds bogus. Well, the bit test optimization is already decided based on other rtx costs. Oh, I see. I suppose the patch is fine then. Thanks, Richard.
Re: [PING v2][PATCH] Support for BIT_FIELD_REF in asan.c
On Fri, Oct 10, 2014 at 03:40:57PM +0400, Marat Zakirov wrote: gcc/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * asan.c (instrument_derefs): BIT_FIELD_REF added. gcc/testsuite/ChangeLog: 2014-09-19 Marat Zakirov m.zaki...@samsung.com * c-c++-common/asan/bitfield-5.c: New test. Ok, thanks. Jakub
[Ada] New internal primitive Is_Subprogram_Or_Generic_Subprogram
This is a minor internal cleanup, to introduce a new primitive Is_Subprogram_Or_Generic_Subprogram with the obvious meaning. No external effect, no test required. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Robert Dewar de...@adacore.com * sem_ch7.adb, einfo.adb, einfo.ads, sem_prag.adb, sem_ch12.adb, freeze.adb, sem_util.adb, sem_res.adb, exp_ch6.adb, exp_ch13.adb, sem_ch6.adb, sem_cat.adb, sem_disp.adb (Is_Subprogram_Or_Generic_Subprogram): New primitive. Use this primitive throughout where appropriate. Index: sem_ch7.adb === --- sem_ch7.adb (revision 216063) +++ sem_ch7.adb (working copy) @@ -2808,7 +2808,7 @@ -- Body required if subprogram - elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + elsif Is_Subprogram_Or_Generic_Subprogram (P) then return True; -- Treat a block as requiring a body @@ -2937,7 +2937,7 @@ -- Body required if subprogram - elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + elsif Is_Subprogram_Or_Generic_Subprogram (P) then Error_Msg_N (info: requires body (subprogram case)?Y?, P); -- Body required if generic parent has Elaborate_Body Index: einfo.adb === --- einfo.adb (revision 216063) +++ einfo.adb (working copy) @@ -1129,8 +1129,7 @@ E_Package_Body, E_Subprogram_Body, E_Variable) - or else Is_Generic_Subprogram (Id) - or else Is_Subprogram (Id)); + or else Is_Subprogram_Or_Generic_Subprogram (Id)); return Node34 (Id); end Contract; @@ -3405,6 +3404,13 @@ return Ekind (Id) in Subprogram_Kind; end Is_Subprogram; + function Is_Subprogram_Or_Generic_Subprogram (Id : E) return B is + begin + return Ekind (Id) in Subprogram_Kind + or else + Ekind (Id) in Generic_Subprogram_Kind; + end Is_Subprogram_Or_Generic_Subprogram; + function Is_Task_Type(Id : E) return B is begin return Ekind (Id) in Task_Kind; @@ -3593,15 +3599,14 @@ begin pragma Assert (Ekind_In (Id, E_Entry, - E_Entry_Family, - E_Generic_Package, - E_Package, - E_Package_Body, - E_Subprogram_Body, - E_Variable, - E_Void) - or else 
Is_Generic_Subprogram (Id) - or else Is_Subprogram (Id)); + E_Entry_Family, + E_Generic_Package, + E_Package, + E_Package_Body, + E_Subprogram_Body, + E_Variable, + E_Void) + or else Is_Subprogram_Or_Generic_Subprogram (Id)); Set_Node34 (Id, V); end Set_Contract; Index: einfo.ads === --- einfo.ads (revision 216063) +++ einfo.ads (working copy) @@ -2974,6 +2974,10 @@ -- Applies to all entities, true for function, procedure and operator -- entities. +--Is_Subprogram_Or_Generic_Subprogram +-- Applies to all entities, true for function, procedure and operator +-- entities, and also for the corresponding generic entities. + --Is_Synchronized_Interface (synthesized) -- Defined in types that are interfaces. True if interface is declared -- synchronized, task, or protected, or is derived from a synchronized @@ -6964,6 +6968,7 @@ function Is_Scalar_Type (Id : E) return B; function Is_Signed_Integer_Type (Id : E) return B; function Is_Subprogram (Id : E) return B; + function Is_Subprogram_Or_Generic_Subprogram (Id : E) return B; function Is_Task_Type(Id : E) return B; function Is_Type (Id : E) return B; @@ -8800,6 +8805,7 @@ pragma Inline (Is_Base_Type); pragma Inline (Is_Package_Or_Generic_Package); pragma Inline (Is_Packed_Array); + pragma Inline (Is_Subprogram_Or_Generic_Subprogram); pragma Inline (Is_Volatile); pragma Inline (Is_Wrapper_Package); pragma Inline (Known_RM_Size); Index: sem_prag.adb === --- sem_prag.adb(revision 216063) +++ sem_prag.adb(working copy) @@ -6736,10 +6736,9 @@ ("dispatching subprogram# cannot use Stdcall convention!", Arg1); - -- Subprogram is allowed, but not a generic subprogram + -- Subprograms are not allowed - elsif not Is_Subprogram (E) - and then not Is_Generic_Subprogram (E) + elsif not Is_Subprogram_Or_Generic_Subprogram (E)
[Ada] Check for attempt to bind GNATprove files
If one or more objects are compiled in GNATprove mode (either by using GNATprove directly, or by using -gnatd.F), then the ALI file is marked and gnatbind will exit with a message as shown here. Given: 1. procedure linkdf is 2. begin 3.null; 4. end; If we first compile this with gcc -c linkdf.adb -gnatd.F and then try to do a gnatmake, we get error: one or more files compiled in GNATprove mode gnatmake: *** bind failed. Previously this was not detected and the linker bombed with peculiar error messages. Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Robert Dewar de...@adacore.com * ali.adb (Scan_ALI): Read and process new GP flag on ALI P line. * ali.ads (GNATprove_Mode): New component in ALI table. (GNATprove_Mode_Specified): New global. * gnatbind.adb (Gnatbind): Give fatal error if any file compiled in GNATProve mode. * lib-writ.ads, lib-writ.adb (GP): New flag on P line for GNATProve_Mode. Index: lib-writ.adb === --- lib-writ.adb(revision 216063) +++ lib-writ.adb(working copy) @@ -1153,6 +1153,10 @@ end if; end if; + if GNATprove_Mode then + Write_Info_Str (" GP"); + end if; + if Partition_Elaboration_Policy /= ' ' then Write_Info_Str (" E"); Write_Info_Char (Partition_Elaboration_Policy); Index: lib-writ.ads === --- lib-writ.ads(revision 216063) +++ lib-writ.ads(working copy) @@ -192,6 +192,9 @@ -- the units in this file, where x is the first character -- (upper case) of the policy name (e.g. 'C' for Concurrent). + -- GP Set if this compilation was done in GNATprove mode, either + -- from direct use of GNATprove, or from use of -gnatd.F. + -- Lx A valid Locking_Policy pragma applies to all the units in -- this file, where x is the first character (upper case) of -- the policy name (e.g. 'C' for Ceiling_Locking). @@ -200,7 +203,9 @@ -- were not compiled to produce an object. This can occur as a -- result of the use of -gnatc, or if no object can be produced -- (e.g. when a package spec is compiled instead of the body, - -- or a subunit on its own). 
+ -- or a subunit on its own). Note that in GNATprove mode, we + -- do produce an object. The object is not suitable for binding + -- and linking, but we do not set NO, instead we set GP. -- NR No_Run_Time. Indicates that a pragma No_Run_Time applies -- to all units in the file. Index: ali.adb === --- ali.adb (revision 216063) +++ ali.adb (working copy) @@ -111,6 +111,7 @@ Locking_Policy_Specified := ' '; No_Normalize_Scalars_Specified := False; No_Object_Specified:= False; + GNATprove_Mode_Specified := False; Normalize_Scalars_Specified:= False; Partition_Elaboration_Policy_Specified := ' '; Queuing_Policy_Specified := ' '; @@ -875,6 +876,7 @@ First_Sdep => No_Sdep_Id, First_Specific_Dispatching => Specific_Dispatching.Last + 1, First_Unit => No_Unit_Id, +GNATprove_Mode => False, Last_Interrupt_State => Interrupt_States.Last, Last_Sdep => No_Sdep_Id, Last_Specific_Dispatching => Specific_Dispatching.Last, @@ -1089,6 +1091,13 @@ ALIs.Table (Id).Partition_Elaboration_Policy := Partition_Elaboration_Policy_Specified; +-- Processing for GP + +elsif C = 'G' then + Checkc ('P'); + GNATprove_Mode_Specified := True; + ALIs.Table (Id).GNATprove_Mode := True; + -- Processing for Lx elsif C = 'L' then Index: ali.ads === --- ali.ads (revision 216063) +++ ali.ads (working copy) @@ -176,6 +176,11 @@ -- always be set as well in this case. Not set if 'P' appears in -- Ignore_Lines. + GNATprove_Mode : Boolean; + -- Set to True if ALI and object file produced in GNATprove_Mode as + -- signalled by GP appearing on the P line. Not set if 'P' appears in + -- Ignore_Lines. + No_Object : Boolean; -- Set to True if no object file generated. Not set if 'P' appears in -- Ignore_Lines. @@ -465,6 +470,9 @@ -- Set to False by Initialize_ALI. Set to True if Scan_ALI reads -- a unit for which dynamic elaboration
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? OK /Marcus
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. OK /Marcus
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. Please mention *which* code you removed here. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This is not New check, but changed one. Please mention *what* changed. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + && (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after the e-mail lines. Jakub
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10/10/14 13:25, Marcus Shawcroft wrote: On 10 October 2014 11:53, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Ok for trunk? Thanks, Kyrill 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. * configure: Regenerate. * config/aarch64/aarch64.c (aarch64_override_options): Handle TARGET_FIX_ERR_A53_835769_DEFAULT. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): Set Init value to 2. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-835769 option. OK /Marcus Thanks Marcus, committed as r216076. Can this go to the 4.9 branch as well? It applies cleanly there (with appropriately regenerated configure) and I tested it there with bootstraps with and without the option. Kyrill
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On 10 October 2014 13:38, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Thanks Marcus, committed as r216076. Can this go to the 4.9 branch as well? It applies cleanly there (with appropriately regenerated configure) and I tested it there with bootstraps with and without the option. Yes that's ok /Marcus
[Ada] Issue errors on illegal contracts unless SPARK_Mode is Off
Illegal Global/Depends contracts should be flagged by the frontend in code for which SPARK_Mode is not specified, as GNATprove relies on contracts being legal in those cases. The frontend should skip these errors only when SPARK_Mode is Off. Now fixed, as shown on the following example. Command: $ gcc -c notinspark.ads Output: --- 1. package Notinspark is 2. 3.function Get return Integer; 4. 5.procedure Set with 6. Global => (In_Out => Get); | global item must denote variable or state 7. 8. end Notinspark; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Yannick Moy m...@adacore.com * errout.adb (SPARK_Msg_N): Issue error unless SPARK_Mode is Off. Index: errout.adb === --- errout.adb (revision 216063) +++ errout.adb (working copy) @@ -3138,7 +3138,7 @@ procedure SPARK_Msg_N (Msg : String; N : Node_Or_Entity_Id) is begin - if SPARK_Mode = On then + if SPARK_Mode /= Off then Error_Msg_N (Msg, N); end if; end SPARK_Msg_N;
[Ada] Loop parameter is a constant in an iterator over a formal container.
This patch enforces the same semantics for the handling of loop parameters in element iterators over formal containers as for those over predefined containers: the loop parameter cannot be assigned to in user code. Compiling formal_test.adb must yield: formal_test.adb:15:07: assignment to loop parameter not allowed --- with Ada.Containers.Formal_Doubly_Linked_Lists; procedure Formal_Test is type E is range 1 .. 1000; package My_List is new Ada.Containers.Formal_Doubly_Linked_Lists (E); use My_List; Thing : My_List.List (10); C : Cursor; begin for I in 1 .. 10 loop Append (Thing, E (I)); end loop; for Element of Thing loop null; Element := Element * 3; -- ERROR end loop; end; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * exp_ch5.adb (Expand_Formal_Container_Element_Loop): Analyze declaration for loop parameter before rest of loop, and set entity kind to prevent assignments to it in the user code. * sem_ch3.adb (Analyze_Object_Contract): No contracts apply to the loop parameter in an element iteration over a formal container. 
Index: exp_ch5.adb === --- exp_ch5.adb (revision 216063) +++ exp_ch5.adb (working copy) @@ -2889,7 +2889,17 @@ Statements => New_List (New_Loop))); Rewrite (N, New_Loop); - Analyze (New_Loop); + + -- The loop parameter is declared by an object declaration, but within + -- the loop we must prevent user assignments to it, so we analyze the + -- declaration and reset the entity kind, before analyzing the rest of + -- the loop. + + Analyze (Elmt_Decl); + Set_Ekind (Defining_Identifier (Elmt_Decl), E_Loop_Parameter); + Set_Assignment_OK (Name (Elmt_Ref)); + + Analyze (N); end Expand_Formal_Container_Element_Loop; - Index: sem_ch3.adb === --- sem_ch3.adb (revision 216063) +++ sem_ch3.adb (working copy) @@ -3062,6 +3062,12 @@ Error_Msg_N ("constant cannot be volatile", Obj_Id); end if; + -- The loop parameter in an element iterator over a formal container + -- is declared with an object declaration but no contracts apply. + + elsif Ekind (Obj_Id) = E_Loop_Parameter then + null; + else pragma Assert (Ekind (Obj_Id) = E_Variable); -- The following checks are only relevant when SPARK_Mode is on as
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, I made some improvements to the patch. Two changes: 1. TARGET_LOOPS is now used as a condition of the doloop related patterns, which is more elegant. 2. As the trip count register of the zero-cost loop may potentially be spilled, we need to change the patterns in order to handle this issue. The solution is similar to that adopted by the c6x backend: just turn the zero-cost loop into a regular loop when that happens after reload is completed. Attached please find version 4 of the patch. Make check regression tested with xtensa-elf-gcc/simulator. OK for trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216079) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,20 @@ +2014-10-10 Felix Yang felix.y...@huawei.com + +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro. +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(loop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-10-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * configure.ac: Add --enable-fix-cortex-a53-835769 option. Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 216079) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL 9) (UNSPEC_TP 10) (UNSPEC_MEMW 11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END 13) (UNSPECV_SET_FP 1) (UNSPECV_ENTRY 2) @@ -1289,41 +1291,120 @@ (set_attr "length" "3")]) +;; Zero-overhead looping support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. 
+;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] - "" - "loopnez\t%0, %l1" +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] + "TARGET_LOOPS && optimize" + "loop\t%0, %l1_LEND" [(set_attr "type" "jump") (set_attr "mode" "none") (set_attr "length" "3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] - "" +(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "nonimmediate_operand" "=a,m") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END) + (clobber (match_scratch:SI 3 "=X,r"))] + "TARGET_LOOPS && optimize" + "#" + [(set_attr "type" "jump") + (set_attr "mode" "none") + (set_attr "length" "0")]) +(define_insn "loop_end" + [(set (pc) +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] + "TARGET_LOOPS && optimize" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type" "jump") (set_attr "mode" "none") (set_attr "length" "0")]) +(define_split + [(set (pc) +(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "") + (const_int 1)) + (label_ref (match_operand 
1 )) + (pc))) + (set (match_operand:SI 2 nonimmediate_operand ) +(plus:SI (match_dup 0) + (const_int -1))) + (unspec
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Updated ChangeLog:

2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com
	    Vladimir Makarov  vmaka...@redhat.com

	* config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
	(ix86_init_pic_reg): New.
	(ix86_select_alt_pic_regnum): Add check on pseudo register.
	(ix86_save_reg): Likewise.
	(ix86_expand_prologue): Remove PIC register initialization
	now performed in ix86_init_pic_reg.
	(ix86_output_function_epilogue): Add check on pseudo register.
	(set_pic_reg_ever_alive): New.
	(legitimize_pic_address): Replace df_set_regs_ever_live with new
	set_pic_reg_ever_alive.
	(legitimize_tls_address): Likewise.
	(ix86_pic_register_p): New check.
	(ix86_delegitimize_address): Add check on pseudo register.
	(ix86_expand_call): Insert move from pseudo PIC register to ABI
	defined REAL_PIC_OFFSET_TABLE_REGNUM.
	(TARGET_INIT_PIC_REG): New.
	(TARGET_USE_PSEUDO_PIC_REG): New.
	* config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return
	INVALID_REGNUM if pic_offset_table_rtx exists.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

I've added the following comment to the patch:

-   in which case we return (%ecx - %ebx) + foo.  */
+   in which case we return (%ecx - %ebx) + foo.
+
+   Note that when pseudo_pic_reg is used we can generate it only
+   before reload_completed.  */

On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc/ prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after the e-mail lines. Jakub
Re: [PATCH] gcc parallel make check
Hi Jakub, On 15 September 2014 18:05, Jakub Jelinek ja...@redhat.com wrote: [...]

 # For parallelized check-% targets, this decides whether parallelization
 # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything
 # but optional --target_board or --extra_opts arguments).  If desirable,
 # recursive make is run with check-parallel-$lang{,1,2,3,4,5} etc. goals,
 # which can be executed in parallel, as they are run in separate directories.
-# check-parallel-$lang{1,2,3,4,5} etc. goals invoke runtest with the longest
-# running *.exp files from the testsuite, as determined by check_$lang_parallelize
-# variable.  The check-parallel-$lang goal in that case invokes runtest with
-# all the remaining *.exp files not handled by the separate goals.
+# check-parallel-$lang{,1,2,3,4,5} etc. goals invoke runtest with
+# GCC_RUNTEST_PARALLELIZE_DIR var in the environment and runtest_file_p
+# dejaGNU procedure is overridden to additionally synchronize through
+# a $lang-parallel directory which tests will be run by which runtest instance.
 # Afterwards contrib/dg-extract-results.sh is used to merge the sum and log
 # files.  If parallelization isn't desirable, only one recursive make
 # is run with check-parallel-$lang goal and check_$lang_parallelize variable
@@ -3662,76 +3645,60 @@ check_p_subdirs=$(wordlist 1,$(words $(c
 # to lang_checks_parallelized variable and define check_$lang_parallelize
 # variable (see above check_gcc_parallelize description).
 $(lang_checks_parallelized): check-% : site.exp
-	@if [ -z "$(filter-out --target_board=%,$(filter-out --extra_opts%,$(RUNTESTFLAGS)))" ] \

Since you removed this test, the comment above is no longer accurate: setting RUNTESTFLAGS to whatever value no longer disables parallelization. Which leads me to discuss a bug I faced after you committed this change: I am testing a patch which brings a series of new tests.
$ RUNTESTFLAGS=my.exp make -jN check (in fact the 'make -j' is embedded in a larger build script) my.exp contains the following construct which is often used in the testsuite:
==
foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
    # If we're only testing specific files and this isn't one of them, skip it.
    if ![runtest_file_p $runtests $src] then {
	continue
    }
    c-torture-execute $src $additional_flags
    gcc-dg-runtest $src $additional_flags
}
==
Note that gcc-dg-runtest calls runtest_file_p too. What I observed is that if I use -j1, all my .c files get tested, while with N >= 2 some of them are silently skipped. It took me a while to figure out that it's because gcc-dg-runtest calls runtest_file_p, which means that runtest_file_p is called twice when the 1st invocation returns 1, and only once when the 1st invocation returns 0. For example, if we have pid0, pid1 the concurrent runtest processes, and file0.c, file1.c, the testcases, then:
* pid0 decides to keep file0.c file1.c file2.c file3.c file4.c. Since the above loop calls runtest_file_p twice for each, we reach the minor counter of 10.
* in the mean time, pid1 decides to skip file0.c, file1.c ... file9.c since it calls runtest_file_p only once for each
* pid1 increments its parallel counter to 1, and creates the new testing subdir
* pid1 decides to keep file10, file11, file12, file13 and file14 (again, 2 calls to runtest_file_p per testcase)
* pid0 increments its parallel counter to 1, and decides it has to skip it
* pid0 thus decides to skip file5, file6, file7, ... file14, calling runtest_file_p once for each
* etc...
In the end, we have ignored file5...file9. I'm not sure why you have made special cases for some of the existing *.exp when you forced them to disable parallelization. Was it to handle such cases? I'm not sure about the next step:
- should I modify my .exp file?
- should you modify gcc_parallel_test_run_p?
Even if I have to modify my .exp file, I think this is error prone, and others could introduce a similar construct in the future. Thanks, Christophe.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [12/n] Optimize string functions
On 09 Oct 11:02, Jeff Law wrote: On 10/08/14 13:18, Ilya Enkovich wrote: Hi, This patch introduces simple optimization of string function calls using variants with no checks and/or bounds copy when possible. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (check_infos): New. (chkp_get_nobnd_fndecl): New. (chkp_get_nochk_fndecl): New. (chkp_optimize_string_function_calls): New. (chkp_opt_init): New. (chkp_opt_fini): New. (chkp_opt_execute): New. (chkp_opt_gate): New. (pass_data_chkp_opt): New. (pass_chkp_opt): New. (make_pass_chkp_opt): New. Just a note, these reviews are going to come out-of-order. I thought I mentioned it in the prior message. Can you pull the optimization stuff into its own file. It seems to me it ought to be conceptually separate from expansion and the other miscellaneous stuff in tree-chkp.c. + +/* Find memcpy, mempcpy, memmove and memset calls, perform + checks before call and then call no_chk version of + functions. We do it on O2 to enable inlining of these + functions during expand. So is the purpose here to expose the checks that would normally be done in the mem* routines to their caller in the hopes that doing so will expose redundant checks? Or is there some other reason? There are few reasons to replace instrumented string functions: 1. As you said following redundant checks elimination may remove checks for some string functions 2. By default functions like memcpy should assume pointers are copied and copy bounds. If we know pointers are not copied, we may use faster version with no bounds copy 3. If we avoid both checks and bounds copy then it is a candidate for existing string function calls inlining in expand pass + + Also try to find memcpy, mempcpy, memmove and memset calls + which are known to not write pointers to memory and use + faster function versions for them. */ Can you please include tests for both classes of transformations? 
It doesn't have to be exhaustive, just a few simple tests to both verify the transformation is working today and to help ensure nobody breaks it in the future? I thought tests will be added later. I have ~250 tests to commit. Will check I have tests for optimizations. BTW this particular optimization can't work until we have instrumented builtin calls.

+void
+chkp_optimize_string_function_calls (void)
+{
+  basic_block bb;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Searching for replacable string function calls...\n");

s/replacable/replaceable/

Seems like if you're going to emit a message into the debug file that the compiler is searching for replaceable calls that you ought to emit a message when a replaceable call is found too. And doing so ought to make writing those testcases easier too since you can scan the appropriate dump file for the message :-)

+
+	  if (fndecl_nochk)
+	    fndecl = fndecl_nochk;
+
+	  gimple_call_set_fndecl (stmt, fndecl);
+
+	  /* If there is no nochk version of function then
+	     do nothing.  Otherwise insert checks before
+	     the call.  */
+	  if (!fndecl_nochk)
+	    continue;

It's a nit, but I'd tend to write that as:

if (!fndecl_nochk)
  continue;
fndecl = fndecl_nochk;
gimple_call_set_fndecl (stmt, fndecl);

There is one more assignment to fndecl above which makes your version nonequivalent.

+	     to perform runtiome check for size and perform

s/runtiome/runtime/

+	     checks only when size is not zero.  */
+	  if (!known)
+	    {
+	      gimple check = gimple_build_cond (NE_EXPR,
+						size,
+						size_zero_node,
+						NULL_TREE,
+						NULL_TREE);
+
+	      /* Split block before string function call.  */
+	      j = i;
+	      gsi_prev (&j);
+	      fall = split_block (bb, gsi_stmt (j));
+	      bb = fall->src;
+
+	      /* Add size check.  */
+	      j = gsi_last_bb (bb);
+	      if (gsi_end_p (j))
+		gsi_insert_before (&j, check, GSI_SAME_STMT);
+	      else
+		gsi_insert_after (&j, check, GSI_SAME_STMT);
+
+	      /* Create basic block for checks.  */
+	      check_bb = create_empty_bb (bb);
+	      make_edge (bb, check_bb, EDGE_TRUE_VALUE);
+	      make_single_succ_edge (check_bb, fall->dest, EDGE_FALLTHRU);
+
+	      /* Fix edge for splitted bb.  */
+	      fall->flags = EDGE_FALSE_VALUE;
+	      fall->count = bb->count;
+	      fall->probability = REG_BR_PROB_BASE;
+
+	      /* Update dominance
Re: [PATCH] gcc parallel make check
On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. See the patches I've posted when adding the fine-grained parallelization, e.g. go testsuite has been fixed that way, etc. So, in your above example, you'd need: gcc_parallel_test_enable 0 line before c-torture-execute and gcc_parallel_test_enable 1 line after gcc-dg-runtest. That way, if runtest_file_p says the test should be scheduled by current instance, all the subtests will be run there. If my.exp is part of gcc/testsuite, I'm sorry for missing it, if it is elsewhere, just fix it up. Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment them and through sed on the log files verify that each instance performs the same parallelization checks (same strings). Jakub
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10/10/14 15:18, Richard Earnshaw wrote: On 10/10/14 11:53, Kyrill Tkachov wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving a potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill

2014-10-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com
	    Ramana Radhakrishnan  ramana.radhakrish...@arm.com

	* config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
	(ADJUST_INSN_LENGTH): Define.
	* config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
	* config/aarch64/aarch64.c (is_mem_p): New function.
	(is_memory_op): Likewise.
	(aarch64_prev_real_insn): Likewise.
	(is_madd_op): Likewise.
	(dep_between_memop_and_next): Likewise.
	(aarch64_madd_needs_nop): Likewise.
	(aarch64_final_prescan_insn): Likewise.
	* doc/invoke.texi (AArch64 Options): Document
	-mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options.

a53-workaround-new.patch

commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
Author: Kyrill Tkachov kyrylo.tkac...@arm.com
Date:   Wed Oct 8 12:48:34 2014 +

    [AArch64] Add final prescan workaround for Cortex-A53 erratum.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b5f53d2..c57a467 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
 extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool aarch64_madd_needs_nop (rtx_insn *);
+extern void aarch64_final_prescan_insn (rtx_insn *);
 extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
 void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5144c35..76a2480 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
   return NULL;
 }
 
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+  rtx pattern = PATTERN (mem_insn);
+  return for_each_rtx (pattern, is_mem_p, NULL);
+}
+
+/* Find the first rtx_insn before insn that will generate an assembly
+   instruction.  */
+
+static rtx_insn *
+aarch64_prev_real_insn (rtx_insn *insn)
+{
+  if (!insn)
+    return NULL;
+
+  do
+    {
+      insn = prev_real_insn (insn);
+    }
+  while (insn && recog_memoized (insn) < 0);
+
+  return insn;
+}
+
+static bool
+is_madd_op (enum attr_type t1)
+{
+  unsigned int i;
+  /* A number of these may be AArch32 only.  */
+  enum attr_type mlatypes[] = {
+    TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
+    TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
+    TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
+  };
+
+  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
+    {
+      if (t1 == mlatypes[i])
+	return true;
+    }
+
+  return false;
+}
+
+/* Check if there is a register dependency between a load and the insn
+   for which we hold recog_data.  */
+
+static bool
+dep_between_memop_and_curr (rtx memop)
+{
+  rtx load_reg;
+  int opno;
+
+  if (!memop)
+    return false;
+
+  if (!REG_P (SET_DEST (memop)))
+    return false;
+
+  load_reg = SET_DEST (memop);
+  for (opno = 0; opno < recog_data.n_operands; opno++)
+    {
+      rtx operand = recog_data.operand[opno];
+      if (REG_P (operand)
+	  && reg_overlap_mentioned_p (load_reg, operand))
+	return true;
+
+    }
+
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [11/n] Optimization helpers
On 09 Oct 12:09, Jeff Law wrote: On 10/08/14 13:16, Ilya Enkovich wrote: Hi, This patch introduces structures and manipulation functions used by simple checker optimizations. Structures are used to hold checks information - type of check and checked address in a polynomial form. Thanks, Ilya --

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp.c (check_type): New.
	(pol_item): New.
	(address_t): New.
	(check_info): New.
	(bb_checks): New.
	(chkp_pol_item_compare): New.
	(chkp_pol_find): New.
	(chkp_extend_const): New.
	(chkp_add_addr_item): New.
	(chkp_sub_addr_item): New.
	(chkp_add_addr_addr): New.
	(chkp_sub_addr_addr): New.
	(chkp_mult_addr): New.
	(chkp_is_constant_addr): New.
	(chkp_print_addr): New.
	(chkp_collect_addr_value): New.
	(chkp_collect_value): New.
	(chkp_fill_check_info): New.

+/* Find plynomial item in ADDR with var equal to VAR

s/plynomial/polynomial/

With nit fixed and functions moved into whatever new file gets created for the optimization work this will be OK. jeff

Thanks for review! Here is a fixed version. Ilya

--

2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com

	* tree-chkp-opt.c: New.
	* Makefile.in (OBJS): Add tree-chkp-opt.o.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d8c8488..cd45b29 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1389,6 +1389,7 @@ OBJS = \
 	tree-parloops.o \
 	tree-phinodes.o \
 	tree-chkp.o \
+	tree-chkp-opt.o \
 	tree-predcom.o \
 	tree-pretty-print.o \
 	tree-profile.o \
diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
new file mode 100644
index 000..103c4bb
--- /dev/null
+++ b/gcc/tree-chkp-opt.c
@@ -0,0 +1,463 @@
+/* Pointer Bounds Checker optimization pass.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Ilya Enkovich (ilya.enkov...@intel.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree-core.h"
+#include "stor-layout.h"
+#include "varasm.h"
+#include "tree.h"
+#include "target.h"
+#include "tree-iterator.h"
+#include "tree-cfg.h"
+#include "langhooks.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "diagnostic.h"
+#include "ggc.h"
+#include "output.h"
+#include "internal-fn.h"
+#include "is-a.h"
+#include "predict.h"
+#include "cfgloop.h"
+#include "stringpool.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "tree-ssa-operands.h"
+#include "tree-ssa-address.h"
+#include "tree-ssa.h"
+#include "ipa-inline.h"
+#include "basic-block.h"
+#include "tree-ssa-loop-niter.h"
+#include "gimple-expr.h"
+#include "gimple.h"
+#include "tree-phinodes.h"
+#include "gimple-ssa.h"
+#include "ssa-iterators.h"
+#include "gimple-pretty-print.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "gimplify-me.h"
+#include "print-tree.h"
+#include "expr.h"
+#include "tree-ssa-propagate.h"
+#include "gimple-fold.h"
+#include "gimple-walk.h"
+#include "tree-dfa.h"
+#include "tree-chkp.h"
+
+enum check_type
+{
+  CHECK_LOWER_BOUND,
+  CHECK_UPPER_BOUND
+};
+
+struct pol_item
+{
+  tree cst;
+  tree var;
+};
+
+struct address_t
+{
+  vec<struct pol_item> pol;
+};
+
+/* Structure to hold check information.  */
+struct check_info
+{
+  /* Type of the check.  */
+  check_type type;
+  /* Address used for the check.  */
+  address_t addr;
+  /* Bounds used for the check.  */
+  tree bounds;
+  /* Check statement.  Can be NULL for removed checks.  */
+  gimple stmt;
+};
+
+/* Structure to hold checks information for BB.  */
+struct bb_checks
+{
+  vec<struct check_info, va_heap, vl_ptr> checks;
+};
+
+static void chkp_collect_value (tree ssa_name, address_t &res);
+
+#define chkp_bndmk_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDMK))
+#define chkp_intersect_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_INTERSECT))
+#define chkp_checkl_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDCL))
+#define chkp_checku_fndecl \
+  (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDCU))
+
+/* Comparator for pol_item structures I1 and I2 to be used
+   to find items with equal var.  Also used for polynomial
+   sorting.  */
+static int
+chkp_pol_item_compare (const void *i1, const void *i2)
+{
+  const struct pol_item *p1 = (const struct pol_item *)i1;
+  const struct pol_item *p2 = (const struct
[Ada] Operator name returned by GNAT.Source_Info.Enclosing_Entity
The string returned by GNAT.Source_Info.Enclosing_Entity did not include names of operators (e.g. "**"). The following program:

 1. with Text_IO; use Text_IO;
 2. with GNAT.Source_Info; use GNAT.Source_Info;
 3. procedure BadEE is
 4.    type R is new Boolean;
 5.    RV : R := True;
 6.
 7.    function "**" (X, Y : R) return String is
 8.    begin
 9.       return Enclosing_Entity;
10.    end;
11. begin
12.    Put_Line (RV ** RV);
13. end BadEE;

must output the string: BadEE."**"

Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-10  Robert Dewar  de...@adacore.com

	* exp_intr.adb (Write_Entity_Name): Moved to outer level
	(Write_Entity_Name): Properly handle operator names
	(Expand_Source_Info): New procedure.
	* exp_intr.ads (Add_Source_Info): New procedure.

Index: exp_intr.adb
===
--- exp_intr.adb	(revision 216063)
+++ exp_intr.adb	(working copy)
@@ -36,7 +36,6 @@
 with Exp_Fixd; use Exp_Fixd;
 with Exp_Util; use Exp_Util;
 with Freeze;   use Freeze;
-with Namet;    use Namet;
 with Nmake;    use Nmake;
 with Nlists;   use Nlists;
 with Opt;      use Opt;
@@ -116,6 +115,96 @@
 --    Name_Compilation_Date - expand string with compilation date
 --    Name_Compilation_Time - expand string with compilation time

+   procedure Write_Entity_Name (E : Entity_Id);
+   -- Recursive procedure to construct string for qualified name of enclosing
+   -- program unit. The qualification stops at an enclosing scope has no
+   -- source name (block or loop). If entity is a subprogram instance, skip
+   -- enclosing wrapper package. The name is appended to the current contents
+   -- of Name_Buffer, incrementing Name_Len.
+
+   ---------------------
+   -- Add_Source_Info --
+   ---------------------
+
+   procedure Add_Source_Info (Loc : Source_Ptr; Nam : Name_Id) is
+      Ent : Entity_Id;
+
+      Save_NB : constant String  := Name_Buffer (1 .. Name_Len);
+      Save_NL : constant Natural := Name_Len;
+      -- Save current Name_Buffer contents
+
+   begin
+      Name_Len := 0;
+
+      -- Line
+
+      case Nam is
+
+         when Name_Line =>
+            Add_Nat_To_Name_Buffer (Nat (Get_Logical_Line_Number (Loc)));
+
+         when Name_File =>
+            Get_Decoded_Name_String
+              (Reference_Name (Get_Source_File_Index (Loc)));
+
+         when Name_Source_Location =>
+            Build_Location_String (Loc);
+
+         when Name_Enclosing_Entity =>
+
+            -- Skip enclosing blocks to reach enclosing unit
+
+            Ent := Current_Scope;
+            while Present (Ent) loop
+               exit when Ekind (Ent) /= E_Block
+                 and then Ekind (Ent) /= E_Loop;
+               Ent := Scope (Ent);
+            end loop;
+
+            -- Ent now points to the relevant defining entity
+
+            Write_Entity_Name (Ent);
+
+         when Name_Compilation_Date =>
+            declare
+               subtype S13 is String (1 .. 3);
+               Months : constant array (1 .. 12) of S13 :=
+                 ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
+                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
+
+               M1 : constant Character := Opt.Compilation_Time (6);
+               M2 : constant Character := Opt.Compilation_Time (7);
+
+               MM : constant Natural range 1 .. 12 :=
+                 (Character'Pos (M1) - Character'Pos ('0')) * 10 +
+                 (Character'Pos (M2) - Character'Pos ('0'));
+
+            begin
+               -- Reformat ISO date into MMM DD (__DATE__) format
+
+               Name_Buffer (1 .. 3)  := Months (MM);
+               Name_Buffer (4)       := ' ';
+               Name_Buffer (5 .. 6)  := Opt.Compilation_Time (9 .. 10);
+               Name_Buffer (7)       := ' ';
+               Name_Buffer (8 .. 11) := Opt.Compilation_Time (1 .. 4);
+               Name_Len := 11;
+            end;
+
+         when Name_Compilation_Time =>
+            Name_Buffer (1 .. 8) := Opt.Compilation_Time (12 .. 19);
+            Name_Len := 8;
+
+         when others =>
+            raise Program_Error;
+      end case;
+
+      -- Prepend original Name_Buffer contents
+
+      Name_Buffer (Save_NL + 1 .. Save_NL + Name_Len) :=
+        Name_Buffer (1 .. Name_Len);
+      Name_Buffer (1 .. Save_NL) := Save_NB;
+   end Add_Source_Info;
+
   ---------------------------------
   -- Expand_Binary_Operator_Call --
   ---------------------------------
@@ -718,61 +807,6 @@
       Loc : constant Source_Ptr := Sloc (N);
       Ent : Entity_Id;

-      procedure Write_Entity_Name (E : Entity_Id);
-      -- Recursive procedure to construct string for qualified name of
-      -- enclosing program unit. The qualification stops at an enclosing
-      -- scope has no source name (block or loop). If entity is a subprogram
-      -- instance, skip
[Ada] Implement new pragma Prefix_Exception_Messages
This implements a new configuration pragma

pragma Prefix_Exception_Messages;

which causes messages set using raise x with s to be prefixed by the expanded name of the enclosing entity if s is a string literal (if s is more complex, we assume the program is calculating exactly the message it wants). So for example, if we have the program:

1. pragma Prefix_Exception_Messages;
2. procedure Prefixem is
3.    procedure Inner is
4.    begin
5.       raise Constraint_Error with "explicit raise";
6.    end;
7. begin
8.    Inner;
9. end Prefixem;

The output will be:

raised CONSTRAINT_ERROR : Prefixem.Inner: explicit raise

This mode is automatic for run-time library files, so a typical message from the runtime library which used to look like:

raised GNAT.CALENDAR.TIME_IO.PICTURE_ERROR : null picture string

now looks like:

raised GNAT.CALENDAR.TIME_IO.PICTURE_ERROR : GNAT.Calendar.Time_IO.Image: null picture string

In the case of instantiations of containers, you will get the full qualified name of the particular instantiation that is involved. For example, the following program:

 1. with Ada.Containers.Ordered_Sets;
 2. procedure NoElmt is
 3.    package Ordered_Integer_Sets is
 4.      new Ada.Containers.Ordered_Sets (Integer);
 5.    use Ordered_Integer_Sets;
 6. begin
 7.    if No_Element < No_Element then
 8.       null;
 9.    end if;
10. end;

will output

raised CONSTRAINT_ERROR : NoElmt.Ordered_Integer_Sets."<": Left cursor equals No_Element

This allows disambiguation of messages without reintroducing line numbers which are problematic for maintaining tests over different versions and targets. Tested on x86_64-pc-linux-gnu, committed on trunk

2014-10-10  Robert Dewar  de...@adacore.com

	* exp_ch11.adb (Expand_N_Raise_Statement): Handle
	Prefix_Exception_Messages.
	* opt.adb: Handle new flags Prefix_Exception_Message[_Config].
	* opt.ads: New flags Prefix_Exception_Message[_Config].
	* par-prag.adb: New dummy entry for pragma Prefix_Exception_Messages.
	* snames.ads-tmpl: Add entries for new pragma Prefix_Exception_Messages.
	* sem_prag.adb: Implement new pragma Prefix_Exception_Messages
	* gnat_rm.texi: Document pragma Prefix_Exception_Messages.

Index: exp_ch11.adb
===
--- exp_ch11.adb	(revision 216063)
+++ exp_ch11.adb	(working copy)
@@ -29,6 +29,7 @@
 with Elists;   use Elists;
 with Errout;   use Errout;
 with Exp_Ch7;  use Exp_Ch7;
+with Exp_Intr; use Exp_Intr;
 with Exp_Util; use Exp_Util;
 with Namet;    use Namet;
 with Nlists;   use Nlists;
@@ -1565,6 +1566,22 @@

       if Present (Expression (N)) then

+         -- Adjust message to deal with Prefix_Exception_Messages. We only
+         -- add the prefix to string literals, if the message is being
+         -- constructed, we assume it already deals with uniqueness.
+
+         if Prefix_Exception_Messages
+           and then Nkind (Expression (N)) = N_String_Literal
+         then
+            Name_Len := 0;
+            Add_Source_Info (Loc, Name_Enclosing_Entity);
+            Add_Str_To_Name_Buffer (": ");
+            Add_String_To_Name_Buffer (Strval (Expression (N)));
+            Rewrite (Expression (N),
+              Make_String_Literal (Loc, Name_Buffer (1 .. Name_Len)));
+            Analyze_And_Resolve (Expression (N), Standard_String);
+         end if;
+
          -- Avoid passing exception-name'identity in runtimes in which this
          -- argument is not used. This avoids generating undefined references
          -- to these exceptions when compiling with no optimization

Index: gnat_rm.texi
===
--- gnat_rm.texi	(revision 216081)
+++ gnat_rm.texi	(working copy)
@@ -227,6 +227,7 @@
 * Pragma Precondition::
 * Pragma Predicate::
 * Pragma Preelaborable_Initialization::
+* Pragma Prefix_Exception_Messages::
 * Pragma Pre_Class::
 * Pragma Priority_Specific_Dispatching::
 * Pragma Profile::
@@ -1096,6 +1097,7 @@
 * Pragma Precondition::
 * Pragma Predicate::
 * Pragma Preelaborable_Initialization::
+* Pragma Prefix_Exception_Messages::
 * Pragma Pre_Class::
 * Pragma Priority_Specific_Dispatching::
 * Pragma Profile::
@@ -5692,6 +5694,34 @@
 versions of Ada as an implementation-defined pragma. See Ada 2012
 Reference Manual for details.
+@node Pragma Prefix_Exception_Messages +@unnumberedsec Pragma Prefix_Exception_Messages +@cindex Prefix_Exception_Messages +@cindex exception +@cindex Exception_Message +@findex Exceptions +@noindent +Syntax: + +@smallexample @c ada +pragma Prefix_Exception_Messages; +@end smallexample + +@noindent +This is an implementation-defined configuration pragma that affects the +behavior of raise statements with a message given as a static string +constant (typically a
[Ada] Spurious error on local instantiation of pure generic unit
This patch fixes an error in the legality checks of aspects that apply to library units: these aspects are legal on a local instantiation of a library-level generic unit that carries the aspect pure. The following must compile quietly: gcc -c my_buffer.adb --- package My_Buffer with Elaborate_Body is end My_Buffer; --- with Common.Gen_Circular_Buffer; package body My_Buffer is type Capacity_Count is range 0 .. 10; procedure Copy_Integer (From_Element : in Integer; To_Element :out Integer) is begin To_Element := From_Element; end Copy_Integer; package Buffer is new Common.Gen_Circular_Buffer (Capacity_Count = Capacity_Count, Capacity = 10, Element_Type = Integer, Copy_Element = Copy_Integer); end My_Buffer; --- package body Common.Gen_Circular_Buffer is procedure Initialize (Buffer : out Buffer_Type) is begin Buffer.Start_Index := Element_Index'First; Buffer.Count := 0; end Initialize; procedure Insert (Element : in Element_Type; Buffer : in out Buffer_Type) is End_Index : Element_Index; -- Index into the end of the buffer where Element is to be stored. 
begin if Element_Index'Last - Buffer.Start_Index = Buffer.Count then End_Index := Buffer.Start_Index + Buffer.Count; else End_Index := Element_Index'First + (Buffer.Count - ((Element_Index'Last - Buffer.Start_Index) + 1)); end if; Copy_Element (From_Element = Element, To_Element = Buffer.Elements (End_Index)); Buffer.Count := Buffer.Count + 1; end Insert; procedure Remove (Buffer : in out Buffer_Type; Element : out Element_Type) is begin Copy_Element (From_Element = Buffer.Elements (Buffer.Start_Index), To_Element = Element); Discard_First (Buffer); end Remove; procedure Discard_First (Buffer : in out Buffer_Type) is begin if Buffer.Start_Index = Element_Index'Last then Buffer.Start_Index := Element_Index'First; else Buffer.Start_Index := Buffer.Start_Index + 1; end if; Buffer.Count := Buffer.Count - 1; end Discard_First; function Capacity_Used (Buffer : in Buffer_Type) return Element_Count is begin return Buffer.Count; end Capacity_Used; end Common.Gen_Circular_Buffer; --- generic type Capacity_Count is range ; Capacity : Capacity_Count; type Element_Type is limited private; with procedure Copy_Element (From_Element : in Element_Type; To_Element : out Element_Type); package Common.Gen_Circular_Buffer with Pure is type Buffer_Type is limited private; type Element_Count is new Capacity_Count range 0 .. Capacity; procedure Initialize (Buffer : out Buffer_Type); procedure Insert (Element : in Element_Type; Buffer : in out Buffer_Type) with Pre = Capacity_Used (Buffer) Element_Count'Last; procedure Remove (Buffer : in out Buffer_Type; Element : out Element_Type) with Pre = Capacity_Used (Buffer) 0; procedure Discard_First (Buffer : in out Buffer_Type) with Pre = Capacity_Used (Buffer) 0; function Capacity_Used (Buffer : in Buffer_Type) return Element_Count; private subtype Element_Index is Element_Count range 1 .. 
Element_Count'Last; type Element_Array is array (Element_Index) of Element_Type; type Buffer_Type is record Start_Index : Element_Index; Count : Element_Count; Elements : Element_Array; -- Element storage. end record; end Common.Gen_Circular_Buffer; --- package Common with Pure is end Common; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * sem_ch13.adb (Analyze_Aspect_Specifications, Library_Unit_Aspects): Aspect specification is legal on a local instantiation of a library-level generic unit. Index: sem_ch13.adb === --- sem_ch13.adb(revision 216081) +++ sem_ch13.adb(working copy) @@ -3018,12 +3018,15 @@ -- of a package declaration, the pragma needs to be inserted -- in the list of declarations for the associated package. -- There is no issue of visibility delay for these aspects. + -- Aspect is legal on a local instantiation of a library- + -- level generic unit. if A_Id in Library_Unit_Aspects and then Nkind_In (N, N_Package_Declaration, N_Generic_Package_Declaration) and then Nkind (Parent (N)) /= N_Compilation_Unit +and then not Is_Generic_Instance (Defining_Entity (N)) then
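The wrap-around index arithmetic in Gen_Circular_Buffer's Insert and Discard_First can be sketched in C. This is a hedged translation of the Ada above (names are mine, and the 1-based Element_Index is mapped to 0-based array indices, which lets a modulo replace the explicit wrap test):

```c
#include <assert.h>

#define CAPACITY 10

struct buffer {
    int elements[CAPACITY];
    int start;   /* index of the first (oldest) element */
    int count;   /* number of elements currently stored */
};

/* Append at the logical end, wrapping past the last slot
   (the if/else in Insert collapses to a modulo with 0-based indices). */
static void buf_insert(struct buffer *b, int v)
{
    int end = (b->start + b->count) % CAPACITY;
    b->elements[end] = v;
    b->count++;
}

/* Remove and return the oldest element (Remove + Discard_First). */
static int buf_remove(struct buffer *b)
{
    int v = b->elements[b->start];
    b->start = (b->start + 1) % CAPACITY;
    b->count--;
    return v;
}
```

The preconditions from the Ada spec (Capacity_Used < Capacity before insert, Capacity_Used > 0 before remove) are the caller's responsibility in this sketch.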
[Ada] Ada2012 freeze rules for subprogram profiles
AI05-019 specifies that freezing a subprogram does not automatically freeze the profile, i.e. the types of the formals and the return type. In particular an attribute reference 'Access and its relatives do not freeze the profile. Compiling bd.ads must yield: bd.ads:15:34: incorrect expression for READ attribute --- with Ada.Streams; package BD is type My_Big_Int is range 0 .. 1; type Write_Ptr is access procedure (Stream : not null access Ada.Streams.Root_Stream_Type'Class; A : in My_Big_Int'Base); procedure Good_Write6 (Stream : not null access Ada.Streams.Root_Stream_Type'Class; A : in My_Big_Int'Base); WPtr : Write_Ptr := Good_Write6'Access; -- Does not freeze My_Big_Int (AI05-0019-1). for My_Big_Int'Read use WPtr.all; -- ERROR: private type My_Priv (D : Integer) is null record; end BD; Tested on x86_64-pc-linux-gnu, committed on trunk 2014-10-10 Ed Schonberg schonb...@adacore.com * freeze.adb (Freeze_Entity): Freezing a subprogram does not always freeze its profile. In particular, an attribute reference that takes the access type does not freeze the types of the formals. Index: freeze.adb === --- freeze.adb (revision 216089) +++ freeze.adb (working copy) @@ -4004,7 +4004,17 @@ -- any extra formal parameters are created since we now know -- whether the subprogram will use a foreign convention. -if not Is_Internal (E) then +-- In Ada 2012, freezing a subprogram does not always freeze +-- the corresponding profile (see AI05-019). An attribute +-- reference is not a freezing point of the profile. +-- Other constructs that should not freeze ??? + +if Ada_Version >= Ada_2005 + and then Nkind (N) = N_Attribute_Reference +then + null; + +elsif not Is_Internal (E) then declare F_Type: Entity_Id; R_Type: Entity_Id;
[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-10-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added. ccmc_v1.patch Description: Binary data
Re: [PATCH] gcc parallel make check
On 10 October 2014 16:19, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. Thanks for your prompt answer. Is this documented somewhere, so that such cases do not happen in the future? See the patches I've posted when adding the fine-grained parallelization, e.g. go testsuite has been fixed that way, etc. So, in your above example, you'd need: gcc_parallel_test_enable 0 line before c-torture-execute and gcc_parallel_test_enable 1 line after gcc-dg-runtest. That way, if runtest_file_p says the test should be scheduled by current instance, all the subtests will be run there. If my.exp is part of gcc/testsuite, I'm sorry for missing it, if it is elsewhere, just fix it up. It's in a patch which has been under review for quite some time (started before your change), that's why you missed it. Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment them and through sed on the log files verify that each instance performs the same parallelization checks (same strings). Yep, I saw those and also added other traces of my own :-) What about my remark about: # For parallelized check-% targets, this decides whether parallelization # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything # but optional --target_board or --extra_opts arguments). If desirable, I think it should be removed from gcc/Makefile.in Thanks, Christophe.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On 09 Oct 15:03, Jeff Law wrote: On 10/08/14 13:08, Ilya Enkovich wrote: Hi, This patch adds removal of user calls to chkp builtins which become useless after instrumentation. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. Please put this in the file with the optimizations. Tests too, which may require you to add some dumping bits into this code since you may not be able to directly see the behaviour you want in the gimple dumps later. What I'm a bit confused about is it looks like every one of these builtin calls you end up optimizing -- why not generate the simple copy in the first place and avoid the need for chkp_remove_useless_builtins completely? Clearly I'm missing something here. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 5443950..b424af8 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -3805,6 +3805,49 @@ chkp_instrument_function (void) } } +/* Find init/null/copy_ptr_bounds calls and replace them + with assignments. It should allow better code + optimization. */ + +static void +chkp_remove_useless_builtins () +{ + basic_block bb, next; + gimple_stmt_iterator gsi; + + bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; + do +{ + next = bb->next_bb; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + tree fndecl; + enum built_in_function fcode; There's some kind of formatting goof on one or more of the preceding lines. They should be at the same indentation level. Most likely they aren't consistent with their use of tabs vs spaces. A nit, but please take care of it. If we end up keeping this code, then I think it'll be fine with the whitespace nit fixed. I just want to make sure there's a good reason to generate these builtins in the first place rather than the more direct assignment. 
Thanks, Jeff With this code we remove user builtins calls coming from source code. E.g.: p2 = (int *)__bnd_init_ptr_bounds (p1); *p2 = 0; which means p2 has value of p1 but has default bounds and following store is unchecked. These calls are important for instrumentation but useless after instrumentation. I don't think it is a part of checker optimizer because it doesn't optimize instrumentation code. Also this transformation is trivial enough for O0 and checker optimizer works starting from O2. Below is a version fixed according to Richard's comments. Thanks, Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 170b33b..fde0228 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -3805,6 +3805,44 @@ chkp_instrument_function (void) } } +/* Find init/null/copy_ptr_bounds calls and replace them + with assignments. It should allow better code + optimization. */ + +static void +chkp_remove_useless_builtins () +{ + basic_block bb, next; + gimple_stmt_iterator gsi; + + FOR_EACH_BB_FN (bb, cfun) +{ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + tree fndecl; + enum built_in_function fcode; + + /* Find builtins returning first arg and replace +them with assignments. */ + if (gimple_code (stmt) == GIMPLE_CALL + && (fndecl = gimple_call_fndecl (stmt)) + && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL + && (fcode = DECL_FUNCTION_CODE (fndecl)) + && (fcode == BUILT_IN_CHKP_INIT_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_NULL_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_COPY_PTR_BOUNDS + || fcode == BUILT_IN_CHKP_SET_PTR_BOUNDS)) + { + tree res = gimple_call_arg (stmt, 0); + update_call_from_tree (gsi, res); + stmt = gsi_stmt (gsi); + update_stmt (stmt); + } +} +} +} + /* Initialize pass. 
*/ static void chkp_init (void) @@ -3872,6 +3910,8 @@ chkp_execute (void) chkp_instrument_function (); + chkp_remove_useless_builtins (); + chkp_function_mark_instrumented (cfun->decl); chkp_fix_cfg ();
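Ilya's example above can be modelled outside an MPX-instrumented build. `__bnd_init_ptr_bounds` is a checker builtin, so the stub below is a hypothetical stand-in with the property the pass relies on: once the bounds side effect is gone, the call just returns its first argument, which is why `chkp_remove_useless_builtins` can rewrite the call into a plain copy without changing the program's value semantics:

```c
#include <assert.h>

/* Hypothetical stand-in for __bnd_init_ptr_bounds: after instrumentation
   the remaining value behaviour is "return the first argument".  */
static void *bnd_init_ptr_bounds_stub(void *p)
{
    return p;
}

/* Before the cleanup: the pointer value flows through the builtin.  */
static int store_via_builtin(int *p1)
{
    int *p2 = (int *) bnd_init_ptr_bounds_stub(p1);
    *p2 = 42;            /* store through p2, bounds-wise unchecked */
    return *p1;
}

/* After chkp_remove_useless_builtins: the call became a copy
   (this is what update_call_from_tree with the first argument does).  */
static int store_via_copy(int *p1)
{
    int *p2 = p1;
    *p2 = 42;
    return *p1;
}
```

Both functions compute the same value, which is the invariant the cleanup pass preserves.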
Re: [PATCH] gcc parallel make check
On Fri, Oct 10, 2014 at 04:50:47PM +0200, Christophe Lyon wrote: On 10 October 2014 16:19, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote: my.exp contains the following construct which is often used in the testsuite: == foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] { # If we're only testing specific files and this isn't one of them, skip it. if ![runtest_file_p $runtests $src] then { continue } c-torture-execute $src $additional_flags gcc-dg-runtest $src $additional_flags } == Note that gcc-dg-runtest calls runtest_file_p too. Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p is run the same number of times in all instances unless gcc_parallel_test_enable has been disabled. Thanks for your prompt answer. Is this documented somewhere, so that such cases do not happen in the future? Feel free to submit a documentation patch. It's in a patch which has been under review for quite some time (started before your change), that's why you missed it. Ah, ok. What about my remark about: # For parallelized check-% targets, this decides whether parallelization # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything # but optional --target_board or --extra_opts arguments). If desirable, I think it should be removed from gcc/Makefile.in Only the and RUNTESTFLAGS ... arguments part of that. Patch preapproved. Jakub
FW: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi, I forgot to mention that this patch was tested in combination with another patch by Alan Hayward (coming soon), with the following results: x86_64: No regressions. aarch64: No regressions. aarch64_be: Lots of vector tests now fixed. No new regressions. Regards, David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 10 October 2014 15:48 To: gcc-patches@gcc.gnu.org Subject: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-10-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added.
New rematerialization sub-pass in LRA
Here is a new rematerialization sub-pass of LRA. This summer, I tried to implement a separate register-pressure release pass based on rematerialization as described in Simpson's PhD. Although this approach is very attractive, as implementing a separate pass is simpler, it did not work in GCC. I saw no code improvement but mostly big performance degradations. We have pretty decent register pressure evaluation infrastructure, but apparently it is not enough to make the right rematerialization decisions based only on this info, as IRA/LRA perform many other optimizations which can be hard to take into account before the RA. So I guess Simpson's approach works only for a basic/simple RA on regular register-file architectures (as does the register-pressure decrease pass described in Morgan's book, which I also tried). So I had to implement rematerialization in LRA itself, where all the necessary info is available. The implementation is more complicated than Simpson's one, but it is worth it as there is no performance degradation in the vast majority of cases. The new LRA rematerialization sub-pass works right before the spilling sub-pass and tries to rematerialize values of spilled pseudos. To implement the new sub-pass, some important changes in LRA were done. First, lra-lives.c updates live info about all registers (not only allocatable ones) and, second, lra-constraints.c was modified to permit checking that an insn satisfies all constraints in the strict sense even if it still contains pseudos. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass generates smaller code on average on both architectures (although the improvement is not significant), adds 0.4% additional compilation time in -O2 mode of release GCC (according to user time of compilation of a 500K-line Fortran program and valgrind lackey insn counts for compiling combine.i) and about 0.7% in -O0 mode. 
As for performance, the best result I found is a 1% SPECFP2000 improvement on ARM Exynos 5410 (973 vs 963), but for Intel Haswell the performance results are practically the same (Haswell has a very sophisticated memory sub-system). There is room for improvement; I wrote some ideas in the top-level comment of the file lra-remat.c. The rematerialization sub-pass works at -O2 and higher, and a new option -flra-remat is introduced. The patch was successfully tested on x86-64 and ARM. I am going to submit it next week. Any comments are appreciated. 2014-10-10 Vladimir Makarov vmaka...@redhat.com * common.opt (flra-remat): New. * opts.c (default_options_table): Add entry for flra_remat. * timevar.def (TV_LRA_REMAT): New. * doc/invoke.texi (-flra-remat): Add description of the new option. * doc/passes.texi (-flra-remat): Remove lra-equivs.c and lra-saves.c. Add lra-remat.c. * Makefile.in (OBJS): Add lra-remat.o. * lra-remat.c: New file. * lra.c: Add info about the rematerialization pass in the top comment. (collect_non_operand_hard_regs, add_regs_to_insn_regno_info): Process unallocatable regs too. (lra_constraint_new_insn_uid_start): Remove. (lra): Add code for calling rematerialization sub-pass. * lra-int.h (lra_constraint_new_insn_uid_start): Remove. (lra_constrain_insn, lra_remat): New prototypes. (lra_eliminate_regs_1): Add parameter. * lra-lives.c (make_hard_regno_born, make_hard_regno_dead): Process unallocatable hard regs too. (process_bb_lives): Ditto. * lra-spills.c (remove_pseudos): Add argument to lra_eliminate_regs_1 call. * lra-eliminations.c (lra_eliminate_regs_1): Add parameter. Use it for sp offset calculation. (lra_eliminate_regs): Add argument for lra_eliminate_regs_1 call. (eliminate_regs_in_insn): Add parameter. Use it for sp offset calculation. (process_insn_for_elimination): Add argument for eliminate_regs_in_insn call. * lra-constraints.c (get_equiv_with_elimination): Add argument for lra_eliminate_regs_1 call. 
(process_addr_reg): Add parameter. Use it. (process_address_1): Ditto. Add argument for process_addr_reg call. (process_address): Ditto. (curr_insn_transform): Add parameter. Use it. Add argument for process_address calls. (lra_constrain_insn): New function. (lra_constraints): Add argument for curr_insn_transform call. Index: Makefile.in === --- Makefile.in (revision 216039) +++ Makefile.in (working copy) @@ -1296,6 +1296,7 @@ OBJS = \ lra-constraints.o \ lra-eliminations.o \ lra-lives.o \ + lra-remat.o \ lra-spills.o \ lto-cgraph.o \ lto-streamer.o \ Index: common.opt === --- common.opt (revision 216039) +++
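The idea the sub-pass exploits can be seen at the source level: instead of spilling a pseudo to the stack and reloading it at the use, recompute its value at the use point when the defining operands are still live. The two C functions below are a conceptual sketch (names are mine, not from lra-remat.c) of the two semantically equivalent shapes the allocator chooses between:

```c
#include <assert.h>

/* A value that is cheap to recompute from operands that stay live,
   e.g. an address computation.  */
static long addr_of(long base, long index)
{
    return base + index * 8;
}

/* "Spill" flavour: the computed value t is kept live across a region
   of high register pressure (modelled by the pressure expression),
   so the allocator would store t to a stack slot and reload it.  */
static long use_spilled(long base, long index, long other)
{
    long t = addr_of(base, index);   /* t would be spilled here */
    long pressure = other * 3 + 1;   /* stand-in for intervening work */
    return t + pressure;             /* t reloaded from the stack here */
}

/* Rematerialized flavour: no stack slot; t is recomputed at the use,
   which is legal because base and index are still available.  */
static long use_remat(long base, long index, long other)
{
    long pressure = other * 3 + 1;
    return addr_of(base, index) + pressure;  /* recomputed, not reloaded */
}
```

Rematerialization is profitable when the recomputation is cheaper than the store/load pair, which is why the sub-pass runs right before the spilling sub-pass, where that trade-off is visible.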
Re: RFA: RTL typesafety improvements for ira.c
On 2014-10-08 10:21 PM, Jeff Law wrote: On 10/01/14 17:27, David Malcolm wrote: FWIW, presumably insn here also can now be an rtx_insn *? (I'd like to eventually strengthen the params to the note-handling functions, so fixing this up now would help with that). Here's the updated patch to include strengthening insn to rtx_insn *. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? I'd like to get this wrapped up so that a patch in the same area from Felix can get rebased and move forward. It is ok for the trunk, Jeff. Sorry for the delay with the response. As you know I was busy with the rematerialization pass.
[PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
On Fri, Oct 10, 2014 at 5:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote: We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? OK. Thanks, Uros.
[AArch64] [BE] Fix vector load/stores to not use ld1/st1
This patch is dependent on "[AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe". It fixes up movoi/ci/xi for Big Endian, so that the lsb of a big-endian integer ends up in the low byte of the highest-numbered register. movoi uses stp/ldp. movxi needs a split version (which is shared with register-register movxi), which just splits to two new movs. movci can then split to three movs. A future patch will instead split to an movoi and a movti. There are no changes for LE. Ran the whole of check with both parts of "Make large opaque integer modes endianness-safe". No regressions. ChangeLog: gcc/: 2014-10-10 Alan Hayward alan.hayw...@arm.com * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): New operand for printing a q register+1. (aarch64_simd_emit_reg_reg_move): Replacement for aarch64_simd_disambiguate_copy that plants the required mov. * config/aarch64/aarch64-protos.h (aarch64_simd_emit_reg_reg_move): Replacement for aarch64_simd_disambiguate_copy. * config/aarch64/aarch64-simd.md (define_split): Use new aarch64_simd_emit_reg_reg_move. (define_expand mov<mode>): Less restrictive predicates. (define_insn *aarch64_mov<mode>): Simplify and only allow for LE. (define_insn *aarch64_be_movoi): New. BE only. Plant ldp or stp. (define_insn *aarch64_be_movci): New. BE only. No instructions. (define_insn *aarch64_be_movxi): New. BE only. No instructions. (define_split) OI mov: Use new aarch64_simd_emit_reg_reg_move. (define_split) CI mov: Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. (define_split) XI mov: Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. 0001-be-mov.patch Description: Binary data
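Getting the lsb into the low byte of the highest-numbered register amounts to reversing the order of the 64-bit subvalues that make up a multi-register value. A minimal model of that idea (this is only a conceptual sketch in C, not the aarch64_rev_reglist implementation, which works on register lists in the machine description):

```c
#include <assert.h>
#include <stdint.h>

/* Model an OI-mode (256-bit) value as four 64-bit subvalues, the
   lane holding the lsb first.  On big-endian, a move between the
   vector registers and memory needs the subvalues in the opposite
   order, so a register-list reversal is inserted around the move.  */
#define NLANES 4

static void reverse_reglist(uint64_t v[NLANES])
{
    for (int i = 0; i < NLANES / 2; i++) {
        uint64_t tmp = v[i];
        v[i] = v[NLANES - 1 - i];
        v[NLANES - 1 - i] = tmp;
    }
}
```

Reversing twice is the identity, which is why a BE round trip through memory (store with one reversal, load with the other) preserves the value.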
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 2014-10-10 3:42 AM, Evgeny Stupachenko wrote: Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Performance on Silvermont (for the group of 3 patches): at -Ofast -flto -funroll-loops -fPIC EEMBC benchs: +6% CoreMark +3% Denmark Individual test gained up to +25% SPEC2000int is almost unchanged, SPEC2000fp +1,7%: at -O2 -fPIC EEMBC benchs: +2% CoreMark +5% Denmark Individual test gained up to +25% SPEC2000int and SPEC2000fp are almost unchanged: Great results! The changes to RA (all ira* lra* file changes) is ok for me. I'll probably add some comments to these changes on next week. Thanks.
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On 10/10/14 15:22, Kyrill Tkachov wrote: On 10/10/14 15:18, Richard Earnshaw wrote: On 10/10/14 11:53, Kyrill Tkachov wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. 
(aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options. a53-workaround-new.patch commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4 Author: Kyrill Tkachov kyrylo.tkac...@arm.com Date: Wed Oct 8 12:48:34 2014 + [AArch64] Add final prescan workaround for Cortex-A53 erratum. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index b5f53d2..c57a467 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl, extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); +extern bool aarch64_madd_needs_nop (rtx_insn *); +extern void aarch64_final_prescan_insn (rtx_insn *); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5144c35..76a2480 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type) return NULL; } +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} + +/* Find the first rtx_insn before insn that will generate an assembly + instruction. */ + +static rtx_insn * +aarch64_prev_real_insn (rtx_insn *insn) +{ + if (!insn) +return NULL; + + do +{ + insn = prev_real_insn (insn); +} + while (insn && recog_memoized (insn) < 0); + + return insn; +} + +static bool +is_madd_op (enum attr_type t1) +{ + unsigned int i; + /* A number of these may be AArch32 only. 
*/ + enum attr_type mlatypes[] = { +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD, +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY, +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD + }; + + for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++) +{ + if (t1 == mlatypes[i]) + return true; +} + + return false; +} + +/* Check if there is a register dependency between a load and the insn + for which we hold recog_data. */ + +static bool +dep_between_memop_and_curr (rtx memop) +{ + rtx load_reg; + int opno; + + if (!memop) +return false; + + if (!REG_P (SET_DEST (memop))) +return false; + + load_reg = SET_DEST (memop); + for (opno = 0; opno < recog_data.n_operands;
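The prescan decision can be modelled outside the compiler: walk the insn stream and, whenever a 64-bit multiply-accumulate immediately follows a memory instruction, request a nop before it. This is a simplified model of what aarch64_madd_needs_nop checks (the real code also inspects register dependencies and insn attributes; the names below are mine):

```c
#include <assert.h>
#include <stddef.h>

enum insn_kind { INSN_OTHER, INSN_MEM, INSN_MADD };

/* Decide whether a nop must be emitted before insn i: true when a
   multiply-accumulate directly follows a memory (load/store/prefetch)
   instruction, the sequence the erratum workaround must break up.  */
static int madd_needs_nop(const enum insn_kind *insns, size_t i)
{
    if (insns[i] != INSN_MADD)
        return 0;
    return i > 0 && insns[i - 1] == INSN_MEM;
}

/* Count the nops the workaround would insert into a stream.  */
static size_t count_nops(const enum insn_kind *insns, size_t n)
{
    size_t nops = 0;
    for (size_t i = 0; i < n; i++)
        nops += madd_needs_nop(insns, i);
    return nops;
}
```

In the compiler the same decision is taken per insn from final prescan, with ADJUST_INSN_LENGTH accounting for the extra 4 bytes so branch ranges stay correct.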
[PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
Hi, The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost. Bootstrap and make check are in progress. The patch boosts (up to 2.5x improvement) several benchmarks compiled with -Ofast on Silvermont Spec2000: +5% gain on 173.applu +1% gain on 255.vortex Is it ok for trunk once it passes bootstrap and make check? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Increase PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost. * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New. * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates CPUs with high branch cost. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6337aa5..5ac10eb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p, opts->x_param_values, opts_set->x_param_values); + /* Extend full peel max insns parameter for CPUs with high branch cost. */ + if (TARGET_HIGH_BRANCH_COST) +maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS, + 120, + opts->x_param_values, + opts_set->x_param_values); + + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. 
*/ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 2c64162..da0c57b 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; #define TARGET_INTER_UNIT_CONVERSIONS \ ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS] #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT] +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST] #define TARGET_SCHEDULE ix86_tune_features[X86_TUNE_SCHEDULE] #define TARGET_USE_BT ix86_tune_features[X86_TUNE_USE_BT] #define TARGET_USE_INCDEC ix86_tune_features[X86_TUNE_USE_INCDEC] diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index b6b210e..04d8bf8 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, four_jump_limit, m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL | m_ATHLON_K8 | m_AMDFAM10) +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost. This could be + used to tune unroll, if-cvt, inline... heuristics. */ +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, high_branch_cost, + m_BONNELL | m_SILVERMONT | m_INTEL) + /*/ /* Integer instruction selection tuning */ /*/
Re: [patch] Use abi_tag attribute on std::list
On 03/10/14 15:49 +0100, Jonathan Wakely wrote: On 03/10/14 16:25 +0200, Marc Glisse wrote: On Fri, 3 Oct 2014, Jonathan Wakely wrote: This is the patch I intend to commit to make std::list::size() O(1) as required by C++11. This is an ABI change, so std::list will get tagged with abi_tag(cxx11) so that it mangles differently. Assuming a future where we have both _GLIBCXX_ABI_TAG_CXX11 and _GLIBCXX_ABI_TAG_CXX17, I don't really see how _GLIBCXX_DEFAULT_ABI_TAG is supposed to work. We don't want to define _GLIBCXX_DEFAULT_ABI_TAG to _GLIBCXX_ABI_TAG_CXX17 and suddenly have std::list change mangling. True. Should it be called _GLIBCXX_DEFAULT_ABI_TAG_CXX11, meaning _GLIBCXX_ABI_TAG_CXX11_IF_ENABLED_AND_NOTHING_OTHERWISE? I suppose so ... or _GLIBCXX_MAYBE_ABI_TAG_CXX11 ? Defining a dummy _M_distance in the old abi case is a bit strange (we could protect the single use with #if _GLIBCXX_USE_CXX11_ABI), but why not... Yeah, I was trying to minimise the preprocessor conditionals, but maybe that's one step too far. Do you mind if I move (in a future patch once yours is committed) _M_size into _M_impl::_M_node as suggested in PR 61347? Gah, that's where I had it until earlier this week, and I looked at it and wondered why it was in the _List_impl class (because you only need one member in there to benefit from the empty base-class optimisation). I will move it back there, since I already have that code on another branch, so there's no point making you change the code to match something I've already got! Thanks for your useful comments (as always). I've committed the attached patch, which I think is the same as the one I posted a week ago. Marc, I agree with your comments, but chose to ignore them ;-) For now, anyway. I want to get something committed, we can improve it in stage 3, or when we need to worry about a C++17 ABI change. Tested x86_64-linux, committed to trunk. 
Documentation for the abi_tag(cxx11) changes will follow ASAP, but probably during stage 3, once all the changes are committed. commit 3ac61312eb285d125da8af4c6cdaad1b1ce7d235 Author: Jonathan Wakely jwak...@redhat.com Date: Thu Aug 7 01:13:17 2014 +0100 PR libstdc++/49561 * acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI): Define. * configure.ac: Use GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI. * configure: Regenerate. * include/Makefile.am (stamp-cxx11-abi): New target. (c++config.h): Set _GLIBCXX_USE_CXX11_ABI macro. * include/Makefile.in: Regenerate. * include/bits/c++config: Add _GLIBCXX_USE_CXX11_ABI placeholder and define _GLIBCXX_DEFAULT_ABI_TAG. * include/bits/list.tcc (list::emplace(const_iterator, _Args...)): Increment size. (list::emplace(const_iterator, const value_type)): Likewise. (list::merge(list), list::merge(list, _StrictWeakOrdering)): Adjust list sizes. * include/bits/stl_list.h (_List_base, list): Add ABI tag macro. (_List_base::_M_size): New data member in cxx11 ABI mode. (_List_base::_S_distance(_List_node_base*, _List_node_base*)): New function. (_List_base::_M_get_size(), _List_base::_M_set_size(size_t), _List_base::_M_inc_size(size_t), _List_base::_M_dec_size(size_t), _List_base::_M_distance, _List_base::_M_node_count): New functions for accessing list size correctly for the ABI mode. (_List_base::_List_base(_List_base)): Copy size and reset source. (_List_base::_M_init()): Initialize size member. (list::size()): Use _List_base::_M_node_count. (list::swap(list)): Swap sizes. (list::splice(iterator, list)): Update sizes. (list::splice(iterator, list, iterator)): Likewise. (list::insert(iterator, const value_type)): Update size. (list::insert(iterator, _Args...)): Likewise. (list::_M_erase(iterator)): Likewise. * testsuite/23_containers/list/requirements/dr438/assign_neg.cc: Adjust. * testsuite/23_containers/list/requirements/dr438/constructor_1_neg.cc: Adjust. * testsuite/23_containers/list/requirements/dr438/constructor_2_neg.cc: Adjust. 
* testsuite/23_containers/list/requirements/dr438/insert_neg.cc: Adjust. * testsuite/ext/profile/mutex_extensions_neg.cc: Adjust. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 2650a5a..0229609 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -3831,6 +3831,25 @@ AC_DEFUN([GLIBCXX_CHECK_SDT_H], [ AC_MSG_RESULT($glibcxx_cv_sys_sdt_h) ]) +dnl +dnl Check if the user wants the new C++11-conforming ABI. +dnl +dnl --disable-libstdcxx-cxx11-abi will use old ABI for all types. +dnl +dnl Defines: +dnl _GLIBCXX_USE_ABI_TAG (always defined, either to 1 or 0) +dnl +AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_CXX11_ABI], [ + AC_ARG_ENABLE([libstdcxx-cxx11-abi], +AC_HELP_STRING([--disable-libstdcxx-cxx11-abi], + [disable the C++11-conforming ABI]),, + [enable_libstdcxx_cxx11_abi=yes]) + if test
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On 09 Oct 20:51, Jakub Jelinek wrote: On Thu, Oct 09, 2014 at 04:15:23PM +0400, Ilya Tocar wrote: --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21358,32 +21358,169 @@ ix86_expand_int_vcond (rtx operands[]) return true; } -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1) +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, struct expand_vec_perm_d *d) Too long line, please wrap it. Fixed. { - enum machine_mode mode = GET_MODE (op0); + enum machine_mode mode = GET_MODE (d ? d->op0 : op0); + switch (mode) { +case V8HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; My strong preference would be: enum machine_mode maskmode = mode; rtx (*gen) (rtx, rtx, rtx, rtx); right below the enum machine_mode mode = GET_MODE (d ? d->op0 : op0); line and then inside of the first switch just do: ... case V16SImode: if (!TARGET_AVX512F) return false; gen = gen_avx512f_vpermi2varv16si3; break; case V4SFmode: if (!TARGET_AVX512VL) return false; gen = gen_avx512vl_vpermi2varv4sf3; maskmode = V4SImode; break; ... etc., then in the mask = line use: mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec)); and finally instead of the second switch do: emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); return true; Updated patch below. --- gcc/config/i386/i386.c | 281 +++-- gcc/config/i386/sse.md | 45 2 files changed, 253 insertions(+), 73 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 352ab81..2247da8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21358,33 +21358,132 @@ ix86_expand_int_vcond (rtx operands[]) return true; } +/* AVX512F does support 64-byte integer vector operations, + thus the longest vector we are faced with is V64QImode. 
*/ +#define MAX_VECT_LEN 64 + +struct expand_vec_perm_d +{ + rtx target, op0, op1; + unsigned char perm[MAX_VECT_LEN]; + enum machine_mode vmode; + unsigned char nelt; + bool one_operand_p; + bool testing_p; +}; + static bool -ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1) +ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1, + struct expand_vec_perm_d *d) { - enum machine_mode mode = GET_MODE (op0); + /* ix86_expand_vec_perm_vpermi2 is called from both const and non-const + expander, so args are either in d, or in op0, op1 etc. */ + enum machine_mode mode = GET_MODE (d ? d->op0 : op0); + enum machine_mode maskmode = mode; + rtx (*gen) (rtx, rtx, rtx, rtx); + switch (mode) { +case V8HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; + gen = gen_avx512vl_vpermi2varv8hi3; + break; +case V16HImode: + if (!TARGET_AVX512VL || !TARGET_AVX512BW) + return false; + gen = gen_avx512vl_vpermi2varv16hi3; + break; +case V32HImode: + if (!TARGET_AVX512BW) + return false; + gen = gen_avx512bw_vpermi2varv32hi3; + break; +case V4SImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4si3; + break; +case V8SImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv8si3; + break; case V16SImode: - emit_insn (gen_avx512f_vpermi2varv16si3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (!TARGET_AVX512F) + return false; + gen = gen_avx512f_vpermi2varv16si3; + break; +case V4SFmode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4sf3; + maskmode = V4SImode; + break; +case V8SFmode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv8sf3; + maskmode = V8SImode; + break; case V16SFmode: - emit_insn (gen_avx512f_vpermi2varv16sf3 (target, op0, - force_reg (V16SImode, mask), - op1)); - return true; + if (!TARGET_AVX512F) + return false; + gen = gen_avx512f_vpermi2varv16sf3; + maskmode = V16SImode; + break; +case V2DImode: 
+ if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv2di3; + break; +case V4DImode: + if (!TARGET_AVX512VL) + return false; + gen = gen_avx512vl_vpermi2varv4di3; + break; case V8DImode: - emit_insn (gen_avx512f_vpermi2varv8di3 (target, op0, -force_reg (V8DImode, mask), op1)); - return true; + if (!TARGET_AVX512F) + return false; +
Re: [PATCH, ARM] attribute target (thumb,arm)
On 10/09/2014 04:11 PM, Richard Earnshaw wrote: On 09/10/14 12:35, Christian Bruel wrote: On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: Hi Christian, snipped agreed stuff 3) about inlining I dislike inlining different modes, From a conceptual use, a user might want to switch mode only when changing a function's hotness. Usually inlining a cold function into a hot one is not what the user explicitly required when setting different mode attributes for them, __attribute__((thumb)) should not imply coldness or hotness. Inlining between cold and hot functions should be done based on profile feedback. The choice of compiling in Thumb1 state for coldness is a separate one because that's where the choice needs to be made. Ideally yes, but I think that a user's motivation to use attribute ((thumb)) is to reduce code size even in the cases where PFO is not available (libraries, kernel or user application build spec packaged without profile data). And there are cases where static probabilities are not enough and a user wants its own control with gprof or oprofile. But in this case, we could point to the __attribute__ ((cold)) on the function ? That would probably be the best workaround to propose if we recommend this Hot vs cold is interesting, but arm/thumb shouldn't be used to imply that. The days when ARM=fast, thumb=small are in the past now, and thumb2 code should be both fast and small. Indeed, smaller thumb2 code can be faster than larger ARM code simply because you can get more of it in the cache. The use of arm vs thumb is likely to be much more subtle now. I'm also very interested in this. From my last bench session, ARM mode could bring a speedup (from noise to 5/6%) but with a very big size penalty. So I believe there is room for fine tuning at application level, and I agree this is very subtle and difficult, but this is another topic. (That was with a GCC 4.8, maybe the gap has reduced since). 
But here is another scenario: Using attribute ((arm)) for exception entry points is indeed not related to hotness. But consider a typical thumb binary with an entry point in arm compiled in C (ex handler, a kernel...). Today, due to the file boundary, the thumb part is not inlined into the arm point. (Using -flto is not possible because the whole gimple would be thumb). We have no-inline attributes for scenarios like that. I don't think a specific use case should dominate other cases. That's severe, a no-inline attribute would disable inlining between same modes ! Now, using attribute ((target)) for the functions other than the entry point, with your approach they would all be inlined (assuming the cost allows this) and we would end up with an arm binary instead of a thumb binary... But there are still 3 points : - At least two other targets (i386, PowerPC) that support attribute_target disable inlining between modes that are not subsets. I like to think about homogeneity between targets and I find it odd to have different inlining rules... That's because use of specific instructions must not be allowed to leak past a gating check that is in the caller. It would be a disaster if a function that used a neon register, for example, was allowed to leak into code that might run on a target with no Neon register file. The ARM/thumb distinction shouldn't, by default, be limited in that manner. I believe inlining could happen from a subset of the architecture into a function using a superset, just not vice-versa. I'm afraid I misunderstand this, Do you want inlining from Thumb to a function using ARM because you consider thumb to be a subset of ARM ? You know better than I, but I never thought that, or maybe there is something to do with the unified assembler ? In this case I see the problem under a new light. With the unified assembly, indeed we could inline from any mode as long as there is no mode-dependent asm inside. 
- Scanning the function body to check for ASM_INPUT does not look very elegant (if this matters) because the asm could well be unrelated. The only case when it will always be a win to inline thumb into arm is when the cost of the inlined body is less than a BX instruction (but still, with branch prediction this cost is pondered). One of the problems with not inlining is that the C++ abstraction penalty is likely to shoot up. There will be many major lost optimization opportunities if we start down that path. I would never expect users to use this attribute extensively on inlined member functions. But I take your point. So I think inlining should only be disabled if there's some technical reason why it should be disabled, not because of some 'it might not always be ideal' feelings. Furthermore, we should expect users to use the other attributes consistently when they expect specific behaviours to occur. Sure, I too would have preferred objective benchmark results, but it's a little bit early to have experience with
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [15/n] Optimize redundant checks
2014-10-09 21:41 GMT+04:00 Jeff Law l...@redhat.com: On 10/08/14 13:22, Ilya Enkovich wrote: Hi, This patch adds removal of redundant (covered by other) checks into checker optimization. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_compare_checks): New. (chkp_remove_redundant_checks): New. (chkp_opt_execute): Run redundant checks removal algorithm. Wouldn't pure removal be better modeled in existing optimizers? DOM would seem to be a natural fit? Similarly, isn't the swapping very similar to a reverse DOM walk DSE-like optimizer? Deferring further review until those questions are answered? jeff Checks and intersection removal code was added as a simple pass catching trivial cases. I'm sure there are optimizations having common elements with what the checker optimizer does. But initially we didn't want to adopt existing optimizers because the GIMPLE representation of instrumentation was not stable and also we still don't know what the important targets for optimization are. The plan is to have a stable version first. After enabling, we want to do performance analysis and determine which optimizations are most required (it may turn out that checks removal doesn't give any significant performance gain at all), determine which parts of the current infrastructure may be re-used (if any) and implement a proper checker optimization. The current optimizer is a simple code cleanup. I do not think we should make any significant rework of it as a part of enabling. If the current approach seems to require significant changes to go to trunk then it should probably be delayed and go separately from the instrumentation pass. Thanks, Ilya
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Fri, Oct 10, 2014 at 07:47:19PM +0400, Ilya Tocar wrote: Updated patch below. You haven't posted ChangeLog entry this time, so using the last one: * config/i386/i386.c (MAX_VECT_LEN): Move above ix86_expand_vec_perm_vpermi2. ... * config/i386/sse.md (define_mode_iterator VI1_AVX512): New. I'd think you should avoid the line break after filename in these cases, so * config/i386/i386.c (MAX_VECT_LEN): Move above ix86_expand_vec_perm_vpermi2. ... * config/i386/sse.md (define_mode_iterator VI1_AVX512): New. Other than that nit it looks good to me. Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/08/2014 08:31 AM, Jiong Wang wrote: On 30/09/14 19:36, Jiong Wang wrote: 2014-09-30 17:30 GMT+01:00 Jeff Law l...@redhat.com: On 09/30/14 08:37, Jiong Wang wrote: On 30/09/14 05:21, Jeff Law wrote: I do agree with Richard that it would be useful to see the insns that are incorrectly sunk and the surrounding context. So I must be missing something. I thought the shrink-wrapping code wouldn't sink arithmetic/logical insns like we see with insn 14 and insn 182. I thought it was limited to reg-reg copies and constant initializations. yes, it was limited to reg-reg copies, and my previous sink improvement aimed to sink any rtx A: be single_set B: the src operand be any combination of no more than one register and no non-constant objects. while some operator like shift may have side effect. IMHO, all side effects are reflected on RTX, together with this fail_on_clobber_use modification, the rtx returned by single_set_no_clobber_use is safe to sink if it meets the above limit B and pass later register use/def check in move_insn_for_shrink_wrap ? Ping ~ And as there is NONDEBUG_INSN_P check before move_insn_for_shrink_wrap invoked, we could avoid creating new wrapper function by invoke single_set_2 directly. I'm committing the following to fix this. (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). r~ 2014-10-10 Richard Henderson r...@redhat.com PR target/63404 * shrink-wrap.c (move_insn_for_shrink_wrap): Don't use single_set. Restrict the set of expressions we're willing to move. 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index b1ff8a2..257812c 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -176,17 +176,40 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, basic_block next_block; edge live_edge; - /* Look for a simple register copy. */ - set = single_set (insn); - if (!set) + /* Look for a simple register assignment. We don't use single_set here + because we can't deal with any CLOBBERs, USEs, or REG_UNUSED secondary + destinations. */ + if (!INSN_P (insn)) +return false; + set = PATTERN (insn); + if (GET_CODE (set) != SET) return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (src)) + /* For the destination, we want only a register. Also disallow STACK + or FRAME related adjustments. They are likely part of the prologue, + so keep them in the entry block. */ + if (!REG_P (dest) + || dest == stack_pointer_rtx + || dest == frame_pointer_rtx + || dest == hard_frame_pointer_rtx) +return false; + + /* For the source, we want one of: + (1) A (non-overlapping) register + (2) A constant, + (3) An expression involving no more than one register. + + That last point comes from the code following, which was originally + written to handle only register move operations, and still only handles + a single source register when checking for overlaps. Happily, the + same checks can be applied to expressions like (plus reg const). */ + + if (CONSTANT_P (src)) +; + else if (!REG_P (src)) { - unsigned int reg_num = 0; - unsigned int nonconstobj_num = 0; rtx src_inner = NULL_RTX; if (can_throw_internal (insn)) @@ -196,30 +219,50 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) { rtx x = *iter; - if (REG_P (x)) + switch (GET_RTX_CLASS (GET_CODE (x))) { - reg_num++; - src_inner = x; + case RTX_CONST_OBJ: + case RTX_COMPARE: + case RTX_COMM_COMPARE: + case RTX_BIN_ARITH: + case RTX_COMM_ARITH: + case RTX_UNARY: + case RTX_TERNARY: + /* Constant or expression. 
Continue. */ + break; + + case RTX_OBJ: + case RTX_EXTRA: + switch (GET_CODE (x)) + { + case UNSPEC: + case SUBREG: + case STRICT_LOW_PART: + case PC: + /* Ok. Continue. */ + break; + + case REG: + /* Fail if we see a second inner register. */ + if (src_inner != NULL) + return false; + src_inner = x; + break; + + default: + return false; + } + break; + + default: + return false; } - else if (!CONSTANT_P (x) && OBJECT_P (x)) - nonconstobj_num++; } - if (nonconstobj_num > 0 -
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
On Fri, Oct 10, 2014 at 8:07 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): ^ It should be X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY. Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may -- H.J.
Re: [PATCH] Fix up _Alignof with user aligned types (PR c/63495)
On Fri, 10 Oct 2014, Jakub Jelinek wrote: Hi! As the testcase shows, _Alignof can sometimes return smaller number than the minimum alignment. That is because when laying out structures, fields with types with TYPE_USER_ALIGN set have also DECL_USER_ALIGN set and therefore neither BIGGEST_FIELD_ALIGNMENT nor ADJUST_FIELD_ALIGN is applied to them, but min_align_of_type was applying that unconditionally. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-10-10 Jakub Jelinek ja...@redhat.com PR c/63495 * stor-layout.c (min_align_of_type): Don't decrease alignment through BIGGEST_FIELD_ALIGNMENT or ADJUST_FIELD_ALIGN if TYPE_USER_ALIGN is set. * gcc.target/i386/pr63495.c: New test. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/10/14 01:42, Evgeny Stupachenko wrote: Hi, The patch enables EBX in the RA for x86 32-bit PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is a working version with a good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before the patch there were not enough registers to PASS ?!? That doesn't make a lot of sense. More likely it was Uros's fix from yesterday to regcprop which causes this to pass again. Is it possible you updated your sources between testing runs and as a result picked up Uros's fix? New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX I think you should remove the dg-error directive. That turns this test into a simple confirmation that we can use %ebx in an asm even when generating PIC code. Can you add PR markers to your changelog PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate If you're not going to fix it, then you should xfail it. gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there are enough registers Right. Remove the dg-error and turn this into a test that effectively verifies that %ebx is no longer fixed when generating PIC code on i686. With those changes this is OK for the trunk. jeff
Re: [PATCH, ARM] attribute target (thumb,arm)
On 10/10/14 15:18, Christian Bruel wrote: On 10/09/2014 04:11 PM, Richard Earnshaw wrote: On 09/10/14 12:35, Christian Bruel wrote: On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: Hi Christian, snipped agreed stuf 3) about inlining I dislike inlining different modes, From a conceptual use, a user might want to switch mode only when changing a function's hotness. Usually inlining a cold function into a hot one is not what the user explicitly required when setting different mode attributes for them, __attribute__((thumb)) should not imply coldness or hotness. Inlining between cold and hot functions should be done based on profile feedback. The choice of compiling in Thumb1 state for coldness is a separate one because that's where the choice needs to be made. Ideally yes. but I think that a user motivation to use target attribute ((thumb) is to reduce code size even in the cases where PFO is not available (libraries, kernel or user application build spec packaged without profile data). And there are cases where static probabilities are not enough and that a user wants it own control with gprof or oprofile. But in this case, we could point to the __attribute__ ((cold)) on the function ? That would probably be the best workaround to propose if we recommend this Hot vs cold is interesting, but arm/thumb shouldn't be used to imply that. The days when ARM=fast, thumb=small are in the past now, and thumb2 code should be both fast and small. Indeed, smaller thumb2 code can be faster than larger ARM code simply because you can get more of it in the cache. The use of arm vs thumb is likely to be much more subtle now. I'm also very interested by this. From my last bench session, ARM mode could bring a speedup (from noise to 5/6%) but with a very big size penalty. So I believe there is room for fine tuning at application level, and I agree this is very subtle and difficult this is another topic. (That was with a GCC 4.8, maybe the gap has reduced since). 
It will obviously depend on the precise micro-architecture of the chip you run the code on; but things are improving all the time (at least, that's what we aim for :-). For example, we've recently throttled back the amount of instruction shortening for hot loops to reduce the number of instruction dependencies through the PSR flags. But here is another scenario: Using of attribute ((arm)) for exception entry points is indeed not related to hotness. But consider a typical thumb binary with an entry point in arm compiled in C (ex handler, a kernel...). Today due to the file boundary the thumb part is not inlined into the arm point. (Using -flto is not possible because the whole gimple would be thumb). We have no-inline attributes for scenarios like that. I don't think a specific use case should dominate other cases. That's severe, no-inline attribute would disable inlining same modes ! Yes, but why would that be a problem? What makes you think that inlining across modes is inherently wrong, when inlining within the same mode is not? Now, using attribute ((target)) for the functions others than the entry point, with your approach they would all be inlined (assuming the cost allow this) and we would end up with a arm binary instead of a thumb binary... But there are still 3 points : - At least 2 other target (i386, Powerpc) that support attribute_target disable inlining between modes that are not subsets. I like to think about homogeneity between targets and I find odd to have different inlining rules... That's because use of specific instructions must not be allowed to leak past a gating check that is in the caller. It would be a disaster if a function that used a neon register, for example, was allowed to leak into code that might run on a target with no Neon register file. The ARM/thumb distinction shouldn't, by default, be limited in that manner. I believe inlining could happen from a subset of the archtiecture into a function using a superset, just not vice-versa. 
I'm afraid I misunderstand this, Do you want inlining from Thumb to a function using ARM because you consider thumb to be a subset of ARM ? You know better that I but I never thought that, or maybe there is something to do with the unified assembler ? No, I think we can fundamentally inline either way, provided that there's no inline assembly code. Even then, we could probably get away with it 90% of the time. In this case I see the problem under a new light. With the unified assembly, indeed we could inline from any mode as long as no divide mode asm inside. Unfortunately, the two ISAs are not identical and neither is a subset of the other, despite the use of unified assembly syntax. For example, ARM state has RSC, but thumb doesn't. While Thumb has ORN, but ARM doesn't. We can't parse the ASM body to find out what it contains, so we have to play safe. [US]DIV isn't an issue once you
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [8/n] Remove useless builtin calls
On 10/10/14 08:52, Ilya Enkovich wrote: Thanks, Jeff With this code we remove user builtin calls coming from source code. E.g.: p2 = (int *)__bnd_init_ptr_bounds (p1); *p2 = 0; which means p2 has the value of p1 but has default bounds, and the following store is unchecked. These calls are important for instrumentation but useless after instrumentation. I don't think it is a part of the checker optimizer because it doesn't optimize instrumentation code. Also this transformation is trivial enough for -O0, and the checker optimizer works starting from -O2. Below is a version fixed according to Richard's comments. Thanks, Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_remove_useless_builtins): New. (chkp_execute): Remove useless calls to Pointer Bounds Checker builtins. Tests and instrumentation are still needed. With some basic tests and instrumentation this will be OK. I hate to keep harping on tests, but few developers are going to be familiar with MPX and the related infrastructure, and those tests are critical to helping them know when they break something. Similarly, if the plan is to iterate on improving things, then those basic functionality tests will ultimately save time as you can smoke test before running larger benchmarks. jeff
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [11/n] Optimization helpers
On 10/10/14 08:24, Ilya Enkovich wrote: On 09 Oct 12:09, Jeff Law wrote: On 10/08/14 13:16, Ilya Enkovich wrote: Hi, This patch introduces structures and manipulation functions used by simple checker optimizations. Structures are used to hold check information - type of check and checked address in a polynomial form. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (check_type): New. (pol_item): New. (address_t): New. (check_info): New. (bb_checks): New. (chkp_pol_item_compare): New. (chkp_pol_find): New. (chkp_extend_const): New. (chkp_add_addr_item): New. (chkp_sub_addr_item): New. (chkp_add_addr_addr): New. (chkp_sub_addr_addr): New. (chkp_mult_addr): New. (chkp_is_constant_addr): New. (chkp_print_addr): New. (chkp_collect_addr_value): New. (chkp_collect_value): New. (chkp_fill_check_info): New. +/* Find plynomial item in ADDR with var equal to VAR s/plynomial/polynomial/ With nit fixed and functions moved into whatever new file gets created for the optimization work this will be OK. jeff Thanks for review! Here is a fixed version. Ilya -- 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp-opt.c: New. * Makefile.in (OBJS): Add tree-chkp-opt.o. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index d8c8488..cd45b29 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1389,6 +1389,7 @@ OBJS = \ tree-parloops.o \ tree-phinodes.o \ tree-chkp.o \ + tree-chkp-opt.o \ tree-predcom.o \ tree-pretty-print.o \ tree-profile.o \ diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c new file mode 100644 index 000..103c4bb --- /dev/null +++ b/gcc/tree-chkp-opt.c @@ -0,0 +1,463 @@ +/* Pointer Bounds Checker optimization pass. + Copyright (C) 2014 Free Software Foundation, Inc. + Contributed by Ilya Enkovich (ilya.enkov...@intel.com) + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree-core.h" +#include "stor-layout.h" +#include "varasm.h" +#include "tree.h" +#include "target.h" +#include "tree-iterator.h" +#include "tree-cfg.h" +#include "langhooks.h" +#include "tree-pass.h" +#include "hashtab.h" +#include "diagnostic.h" +#include "ggc.h" +#include "output.h" +#include "internal-fn.h" +#include "is-a.h" +#include "predict.h" +#include "cfgloop.h" +#include "stringpool.h" +#include "tree-ssa-alias.h" +#include "tree-ssanames.h" +#include "tree-ssa-operands.h" +#include "tree-ssa-address.h" +#include "tree-ssa.h" +#include "ipa-inline.h" +#include "basic-block.h" +#include "tree-ssa-loop-niter.h" +#include "gimple-expr.h" +#include "gimple.h" +#include "tree-phinodes.h" +#include "gimple-ssa.h" +#include "ssa-iterators.h" +#include "gimple-pretty-print.h" +#include "gimple-iterator.h" +#include "gimplify.h" +#include "gimplify-me.h" +#include "print-tree.h" +#include "expr.h" +#include "tree-ssa-propagate.h" +#include "gimple-fold.h" +#include "gimple-walk.h" +#include "tree-dfa.h" +#include "tree-chkp.h" Thanks. Looks good. As a follow-up, can you try to trim down what appear to be the over-zealous includes? It's a minor thing, but we are trying to be a bit saner about that kind of stuff than we've been in the past. If you've already done that, then, well, we've clearly still got a ways to go. For example, I can't see why you'd need output.h here :-0 Jeff
Regenerated libstdc++-v3/testsuite/Makefile.in
I've regenerated this file and committed it so I don't get harmless whitespace changes every time I run autoreconf.
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [12/n] Optimize string functions
On 10/10/14 08:19, Ilya Enkovich wrote: So is the purpose here to expose the checks that would normally be done in the mem* routines to their caller in the hopes that doing so will expose redundant checks? Or is there some other reason? There are a few reasons to replace instrumented string functions: 1. As you said, following redundant checks elimination may remove checks for some string functions 2. By default functions like memcpy should assume pointers are copied and copy bounds. If we know pointers are not copied, we may use a faster version with no bounds copy 3. If we avoid both checks and bounds copy then it is a candidate for existing string function calls inlining in the expand pass Perfect. So this belongs in a comment in the code. I thought tests would be added later. Did you already post them? There's been so many patches I'm starting to lose track :-) For future reference, when you break a submission down into logical hunks, including the tests in those logical hunks helps. I realize the MPX work isn't as well suited for that kind of breakdown, but it's worth keeping in mind. I have ~250 tests to commit. Will check I have tests for optimizations. Excellent. BTW this particular optimization can't work until we have instrumented builtin calls. Yea, hopefully we'll get to that before close of stage1. It's a nit, but I'd tend to write that as: if (!fndecl_nochk) continue; fndecl = fndecl_nochk; gimple_call_set_fndecl (stmt, fndecl); There is one more assignment to fndecl above which makes your version nonequivalent. I had assumed the gimple_call_set_fndecl was a nop if we didn't change the fndecl. Is that not the case? I'm a bit surprised we don't have this kind of capability already broken out. But assuming that's the case, can you go ahead and break that out into its own little helper function? You don't need to find all the cases where we're doing this kind of thing today, just create the helper function and use it in your new code.
I may have missed such a function (I looked in the cfg hooks and tree-cfg.h). Hopefully someone will correct me if so. Thanks. I suspect everyone has just done their own implementation inline like you did. It's something I'll be keeping my eye out for in others' code so we can funnel everyone into your new function. ISTM many speculative optimizations are going to need that kind of helper. Taking into account non-instrumented builtin calls, I suppose this patch goes into a later builtin-related series. But here is a version with the changes. Yea, I think you're right. I think this is OK when the builtins are done. jeff
Re: [PATCH] Clean up duplicated function seq_cost
On 10/10/14 02:17, Zhenqiang Chen wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Thursday, October 09, 2014 5:21 PM To: Zhenqiang Chen Cc: GCC Patches Subject: Re: [PATCH] Clean up duplicated function seq_cost On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: Hi, There are two implementations of seq_cost. The function bodies are exactly the same. The patch removes one of them and makes the other global. Bootstrap and no make check regression on X86-64. OK for trunk? The prototype should go to cfgloopanal.c. Err - cfgloopanal.h of course ;) Or rather the function sounds misplaced in cfgloopanal.c. Thanks for the comments. I think seq_cost should be in rtlanal.c, just like rtx_cost, insn_rtx_cost and so on. ChangeLog: 2014-10-10 Zhenqiang Chen zhenqiang.c...@arm.com * cfgloopanal.c (seq_cost): Delete. * rtl.h (seq_cost): New prototype. * rtlanal.c (seq_cost): New function. * tree-ssa-loop-ivopts.c (seq_cost): Delete. Ok. Jeff
Re: [PATCH x86] Update PARTIAL_REG_DEPENDENCY tune
Thanks. I've modified ChangeLog. 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Remove m_SILVERMONT and m_INTEL from the tune. On Fri, Oct 10, 2014 at 7:58 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Oct 10, 2014 at 8:07 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, We've met several performance issues (up to 15%) on Silvermont caused by the PARTIAL_REG_DEPENDENCY tuning. Previously discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954 Propose removing Silvermont related tune from PARTIAL_REG_DEPENDENCY. The patch passed bootstrap, make check. Is it ok for trunk? Thanks, Evgeny 2014-10-10 Evgeny Stupachenko evstu...@gmail.com * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): ^ It should be X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY. Remove m_SILVERMONT and m_INTEL from the tune. diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 215c63c..b6b210e 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -58,8 +58,8 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, partial_reg_dependency, SPECfp regression, while enabling it on K8 brings roughly 2.4% regression that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, sse_partial_reg_dependency, - m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT - | m_INTEL | m_AMDFAM10 | m_BDVER | m_GENERIC) + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may -- H.J.
Re: [PATCH] Implement -fsanitize=object-size
On Fri, Oct 10, 2014 at 12:26:44PM +0200, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 12:04:08PM +0200, Marek Polacek wrote: I couldn't test bootstrap-ubsan, because of error: /home/polacek/x/trunk/prev-x86_64-unknown-linux-gnu/libsanitizer/ubsan/.libs/libubsan.a(ubsan_init.o): .preinit_array section is not allowed in DSO but I remember that the previous version of the patch passed fine. We build (intentionally) both libubsan.so.* objects and libubsan.a objects with -fPIC, but don't build the latter with -DPIC. I guess we need now, with -static-libubsan libubsan.a is linked into shared libraries statically and we definitely can't use .preinit_array in that case. So, I think (untested) something like: 2014-10-10 Jakub Jelinek ja...@redhat.com * ubsan/Makefile.am (DEFS): Add -DPIC. * ubsan/Makefile.in: Regenerated. --- libsanitizer/ubsan/Makefile.am2014-09-24 11:08:04.183026156 +0200 +++ libsanitizer/ubsan/Makefile.am2014-10-10 12:15:19.124247283 +0200 @@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_ # May be used by toolexeclibdir. gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER) -DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS +DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS) ACLOCAL_AMFLAGS = -I m4 should fix this. 2014-10-09 Marek Polacek pola...@redhat.com Check the date ;) Adjusted. * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE. * doc/invoke.texi: Document -fsanitize=object-size. * flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE and or it into SANITIZE_UNDEFINED. * gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE. 
* internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function. * internal-fn.def (UBSAN_OBJECT_SIZE): Define. * opts.c (common_handle_option): Handle -fsanitize=object-size. * ubsan.c: Include tree-object-size.h. I'd avoid the s. Adjusted. --- gcc/gimple-fold.c +++ gcc/gimple-fold.c @@ -2662,6 +2662,19 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace) gimple_call_arg (stmt, 1), gimple_call_arg (stmt, 2)); break; +case IFN_UBSAN_OBJECT_SIZE: + if (integer_all_onesp (gimple_call_arg (stmt, 2)) Formatting on the case line, there should be tab. Fixed. + + gcc_assert (TREE_CODE (size) == INTEGER_CST); + /* See if we can discard the check. */ + if (integer_all_onesp (size)) +/* Yes, __builtin_object_size couldn't determine the + object size. */; I'd just treat TREE_CODE (size) != INTEGER_CST the same as integer_all_onesp. It is very likely you'll get an INTEGER_CST there, but I'd be afraid if somebody disables ccp, forwprop and similar optimizations that if you are unlucky you might actually have an SSA_NAME there instead. Fixed. Ok with those changes. Thanks for reviewing! After commit, please update gcc-5/changes.html. Thanks. Sure. I'm applying the following. 2014-10-10 Marek Polacek pola...@redhat.com * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_OBJECT_SIZE. * doc/invoke.texi: Document -fsanitize=object-size. * flag-types.h (enum sanitize_code): Add SANITIZE_OBJECT_SIZE and or it into SANITIZE_UNDEFINED. * gimple-fold.c (gimple_fold_call): Optimize IFN_UBSAN_OBJECT_SIZE. * internal-fn.c (expand_UBSAN_OBJECT_SIZE): New function. * internal-fn.def (UBSAN_OBJECT_SIZE): Define. * opts.c (common_handle_option): Handle -fsanitize=object-size. * ubsan.c: Include tree-object-size.h. (ubsan_type_descriptor): Call tree_to_uhwi instead of tree_to_shwi. (ubsan_expand_bounds_ifn): Use false instead of 0. (ubsan_expand_objsize_ifn): New function. (instrument_object_size): New function. (pass_ubsan::execute): Add object size instrumentation. 
* ubsan.h (ubsan_expand_objsize_ifn): Declare. testsuite/ * c-c++-common/ubsan/object-size-1.c: New test. * c-c++-common/ubsan/object-size-2.c: New test. * c-c++-common/ubsan/object-size-3.c: New test. * c-c++-common/ubsan/object-size-4.c: New test. * c-c++-common/ubsan/object-size-5.c: New test. * c-c++-common/ubsan/object-size-6.c: New test. * c-c++-common/ubsan/object-size-7.c: New test. * c-c++-common/ubsan/object-size-8.c: New test. * c-c++-common/ubsan/object-size-9.c: New test. * g++.dg/ubsan/object-size-1.C: New test.
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
On Fri, Oct 10, 2014 at 5:47 PM, Ilya Tocar tocarip.in...@gmail.com wrote: My strong preference would be: enum machine_mode maskmode = mode; rtx (*gen) (rtx, rtx, rtx, rtx); right below the enum machine_mode mode = GET_MODE (d ? d->op0 : op0); line and then inside of the first switch just do: ... case V16SImode: if (!TARGET_AVX512F) return false; gen = gen_avx512f_vpermi2varv16si3; break; case V4SFmode: if (!TARGET_AVX512VL) return false; gen = gen_avx512vl_vpermi2varv4sf3; maskmode = V4SImode; break; ... etc., then in the mask = line use: mask = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (d->nelt, vec)); and finally instead of the second switch do: emit_insn (gen (target, op0, force_reg (maskmode, mask), op1)); return true; Updated patch below. Please recode that horrible first switch statement to: --cut here-- rtx (*gen) (rtx, rtx, rtx, rtx) = NULL; switch (mode) { case V8HImode: if (TARGET_AVX512VL && TARGET_AVX512BW) gen = gen_avx512vl_vpermi2varv8hi3; break; ... case V2DFmode: if (TARGET_AVX512VL) { gen = gen_avx512vl_vpermi2varv2df3; maskmode = V2DImode; } break; default: break; } if (gen == NULL) return false; --cut here-- The patch is OK with the above improvement. (Please also note that the patch has a bunch of i386.md changes that will clash with the followup patch series). Thanks, Uros.
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 16:59, Richard Henderson wrote: On 10/08/2014 08:31 AM, Jiong Wang wrote: Ping ~ And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap is invoked, we could avoid creating a new wrapper function by invoking single_set_2 directly. I'm committing the following to fix this. (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use a direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is still not shrink-wrapped under -mabi=ilp32. The reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while it could be treated as an expression. In your fix, if it's RTX_OBJ and LO_SUM, then the move is aborted by return NULL. anyway, thanks for the fix, I verified the shrink-wrapping of some hotspot functions in benchmarks on aarch64 is not affected. Regards, Jiong r~
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On 10/09/14 10:54, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite some similar constructs in i386.c. It is hard to say without the testcase what happens, perhaps one of RTX experts (CC'd) can advise what is recommended here. The problem is the call to expand_simple_binop. The source (op1) and the destination (op1) are obviously the same, so its going to clobber whatever value is in there. If there are other uses of the original value of op1, then things aren't going to work. But I'm a little unclear how there's be other later uses of that value. Perhaps Ilya could comment on that. Regardless, unless there's a strong reason to do so, I'd generally recommend passing a NULL_RTX as the target for expansions so that you always get a new pseudo. Lots of optimizers in the RTL world work better if we don't have pseudos with multiple assignments. 
By passing NULL_RTX for the target we get that property more often. So a change like Ilya suggests (though I'd use NULL_RTX rather than NULL) makes sense. Jeff
Re: New rematerialization sub-pass in LRA
On 10/10/14 09:02, Vladimir Makarov wrote: The new LRA rematerialization sub-pass works right before the spilling sub-pass and tries to rematerialize values of spilled pseudos. To implement the new sub-pass, some important changes in LRA were done. First, lra-lives.c updates live info about all registers (not only allocatable ones) and, second, lra-constraints.c was modified to permit checking that an insn satisfies all constraints in a strict sense even if it still contains pseudos. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass permits generating smaller code on average on both architectures (although the improvement is insignificant), adds 0.4% additional compilation time in -O2 mode of release GCC (according to user time of compilation of a 500K line fortran program and valgrind lackey # insns in combine.i compilation) and about 0.7% in -O0 mode. As for the performance result, the best I found is a 1% SPECFP2000 improvement on ARM Exynos 5410 (973 vs 963), but for Intel Haswell the performance results are practically the same (Haswell has a very good sophisticated memory sub-system). There is room for pass improvements. I wrote some ideas in the top level comment of file lra-remat.c. The rematerialization sub-pass will work at -O2 and higher, and a new option -flra-remat is introduced. The patch was successfully tested on x86-64 and ARM. I am going to submit it next week. Any comments are appreciated. I wonder if this could help with some of the rematerialization issues Intel is running into with their allocatable PIC register changes for i686. I'll let them pass along test cases you can play with :-) jeff
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
The linker already has a --strip-lto-sections option, and it's on by default. I'll approve a patch that modifies gold to recognize .gnu.offload_lto.* sections as part of --strip-lto-sections. Really, though, you should be setting the SHF_EXCLUDE bit on these sections. Do that and no special-casing will be necessary. Generating a linker script on the fly to discard these sections is, to me, rather hacky. There are better ways to do it. -cary On Thu, Oct 9, 2014 at 11:53 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 12:07:03AM +0400, Ilya Verbin wrote: On 09 Oct 16:07, Ilya Verbin wrote: + /* By default linker does not discard .gnu.offload_lto_* sections. */ + const char *linker_script = make_temp_file (_linker_script.x); + FILE *stream = fopen (linker_script, w); + if (!stream) + fatal_error (fopen %s: %m, linker_script); + fprintf (stream, SECTIONS { /DISCARD/ : { *( + OFFLOAD_SECTION_NAME_PREFIX *) } }\n); + fclose (stream); + printf (%s\n, linker_script); + + goto finish; +} Does this work with gold? Are there any other linkers that support plugins, but don't support linker scripts this way? Oops, gold does not support scripts, outputted from plugins :( error: SECTIONS seen after other input files; try -T/--script Probably, we should update default linker scripts in binutils? But without latest ld/gold all binaries compiled without -flto and with offloading will contain intermediate bytecode... Actually, this issue is not due to outputting a script from a plugin, gold just does not support partial linker scripts: https://sourceware.org/bugzilla/show_bug.cgi?id=17451 So it seems that discarding .gnu.offload_lto_* sections (like it is done for .gnu.lto_*) in the default ld and gold scripts is the right way? I must say I'm not very much familiar with the linker plugin API, but it surprises me that discarding sections is not something it allows. 
Anyway, can you do the partial linker script for the bfd linker (is there a way to determine from the linker plugin API if it is gold or bfd ld?), and for gold for the time being perhaps strip the sections in lto-wrapper? and feed the ET_REL objects with the sections stripped back to the linker through the plugin API? Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 09:39 AM, Jiong Wang wrote: (1) Don't bother modifying single_set; just look for a bare SET. (2) Tighten the set of expressions we're willing to move. (3) Use a direct return false in the failure case, rather than counting a non-zero number of non-constants, etc. Tested on x86_64, and against Andi's test case (unfortunately unreduced). minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is still not shrink-wrapped under -mabi=ilp32. The reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while it could be treated as an expression. In your fix, if it's RTX_OBJ and LO_SUM, then the move is aborted by return NULL. I missed that message. Interesting. I wonder what kind of fallout there would be from changing LO_SUM to RTX_BIN_ARITH, which is what it should have been all along. My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that LO_SUM should also be a kind of object too. But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been (plus reg (low symbol)) and thus more obvious. I'll see if I can bootstrap such a change on e.g. sparc or ppc32, which use lo_sum heavily. anyway, thanks for the fix, I verified the shrink-wrapping of some hotspot functions in benchmarks on aarch64 is not affected. Thanks for the additional testing. r~
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [15/n] Optimize redundant checks
On 10/10/14 09:50, Ilya Enkovich wrote: Checks and intersection removal code was added as a simple pass catching trivial cases. I'm sure there are optimizations having common elements with what the checker optimizer does. But initially we didn't want to adopt existing optimizers because the GIMPLE representation of instrumentation was not stable and also we still don't know what the important targets for optimizations are. Understood. The plan is to have a stable version first. After enabling we want to make a performance analysis and determine which optimizations are most required (it may appear checks removal doesn't give any significant performance gain at all), determine which of the current infrastructure may be re-used (if any) and implement proper checker optimization. The current optimizer is a simple code cleanup. I do not think we should make any significant rework of it as a part of enabling. If the current approach seems to require significant changes to go to trunk then it should probably be delayed and go separately from the instrumentation pass. Well, I think it should be trivial to handle the redundant check elimination in DOM. Most likely eliminate_redundant_computations needs some work to allow it to look inside those checks and get them recorded into its tables. With that in place, DOM should optimize this stuff without further intervention. It's probably less code than you've already written :-) The swapping variant feels like it should be simple to implement with the existing dominator walkers. But I haven't thought nearly as much about that one. jeff
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On Fri, Oct 10, 2014 at 09:51:02AM -0700, Cary Coutant wrote: The linker already has a --strip-lto-sections option, and it's on by default. I'll approve a patch that modifies gold to recognize .gnu.offload_lto.* sections as part of --strip-lto-sections. Really, though, you should be setting the SHF_EXCLUDE bit on these sections. Do that and no special-casing will be necessary. For that I guess lhd_begin_section would need to replace: section = get_section (name, SECTION_DEBUG, NULL); with: section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL); either just for the .gnu.offload_lto prefixed section, or all. The question is what will old assemblers and/or linkers do with that, and if there are any that support linker plugins, but not SHF_EXCLUDE. Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 10:52, Richard Henderson wrote: I missed that message. Interesting. I wonder what kind of fallout there would be from changing LO_SUM to RTX_BIN_ARITH, which is what it should have been all along. My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that LO_SUM should also be a kind of object too. Most likely. One could even make an argument that LO_SUM should be a constant, but that's a more intrusive change. But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. Jakub
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
The question is what will old assemblers and/or linkers do with that, and if there are any that support linker plugins, but not SHF_EXCLUDE. If it helps answer that question, SHF_EXCLUDE support has been in gold for 6 years, and in gas for 4. -cary
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On Wed, Oct 1, 2014 at 8:57 PM, Vladimir Makarov vmaka...@redhat.com wrote: The problem is in code introduced by Bernd in IRA and caller-saves.c in 2012. It is basically an optimization for functions returning always the same result as one argument (e.g. memcpy returning 1st argument). There are two possible solutions. First one is to prohibit the optimizations when there is a parallel in SET. Second one is to go deeper if the call result is guaranteed in the first element which is true for the patch. I suspect that the first solution will regress gcc.target/i386/retarg.c on i686 - that testcase test if referred optimization is effective. All things equal, I think we should go with the second solution. For the first solution, the patch would be Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,19 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo +from this call. 
*/ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } to_inherit_num = 0; For the second solution, the patch is Index: lra-constraints.c === --- lra-constraints.c (revision 215748) +++ lra-constraints.c (working copy) @@ -5348,16 +5348,25 @@ if (GET_CODE (pat) == PARALLEL) pat = XVECEXP (pat, 0, 0); dest = SET_DEST (pat); - start_sequence (); - emit_move_insn (cheap, copy_rtx (dest)); - restore = get_insns (); - end_sequence (); - lra_process_new_insns (curr_insn, NULL, restore, -Inserting call parameter restore); - /* We don't need to save/restore of the pseudo from -this call. */ - usage_insns[regno].calls_num = calls_num; - bitmap_set_bit (check_only_regs, regno); + if (GET_CODE (dest) == PARALLEL) + { + dest = XVECEXP (dest, 0, 0); + if (GET_CODE (dest) == EXPR_LIST) + dest = XEXP (dest, 0); + } + if (REG_P (dest)) + { + start_sequence (); + emit_move_insn (cheap, copy_rtx (dest)); + restore = get_insns (); + end_sequence (); + lra_process_new_insns (curr_insn, NULL, restore, +Inserting call parameter restore); + /* We don't need to save/restore of the pseudo from +this call. */ + usage_insns[regno].calls_num = calls_num; + bitmap_set_bit (check_only_regs, regno); + } } } The first patch is safer but the second one is ok too. I have no particular preferences. Whatever we choose, analogous code in caller-saves.c should be changed too. Uros.
Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769
On Fri, Oct 10, 2014 at 3:53 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. A linker patching technique can also be used, by moving potentially affected multiply-accumulate instruction into a patch region and replacing the original instruction with a branch to the patch. This patch achieves the compiler part by using the final prescan pass. The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH so that conditional branches work properly. The fix is disabled by default and can be turned on by the -mfix-cortex-a53-835769 compiler option. I'm attaching a trunk and a 4.9 version of the patch. The 4.9 version is different only in that rtx_insn* is replaced by rtx. Tested on aarch64-none-linux-gnu (and bootstrap with the option) and built various large benchmarks with it. Ok? A few comments: This: +static int +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED) +{ + return MEM_P (*x); +} + +static bool +is_memory_op (rtx_insn *mem_insn) +{ + rtx pattern = PATTERN (mem_insn); + return for_each_rtx (pattern, is_mem_p, NULL); +} Should be using the new FOR_EACH_SUBRTX instead. 
This should simplify the code something like: static bool is_memory_op (rtx_insn *mem_insn) { subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, PATTERN (mem_insn), ALL) if (MEM_P (*iter)) return true; return false; } Also this should be a has_ function rather than a is_ function. I think you should have ADJUST_INSN_LENGTH as a function call rather than inline it there or at least wrap it with do { } while (0); which I think is documented part of the coding style. Also you added no comment before each function. The coding style says each function needs a comment describing what the function does. Thanks, Andrew Pinski Thanks, Kyrill 2014-10-10 Kyrylo Tkachovkyrylo.tkac...@arm.com Ramana Radhakrishnanramana.radhakrish...@arm.com * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define. (ADJUST_INSN_LENGTH): Define. * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option. * config/aarch64/aarch64.c (is_mem_p): New function. (is_memory_op): Likewise. (aarch64_prev_real_insn): Likewise. (is_madd_op): Likewise. (dep_between_memop_and_next): Likewise. (aarch64_madd_needs_nop): Likewise. (aarch64_final_prescan_insn): Likewise. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769 and -mno-fix-cortex-a53-835769 options.
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
2014-10-10 20:45 GMT+04:00 Jeff Law l...@redhat.com: On 10/09/14 10:54, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite some similar constructs in i386.c. It is hard to say without the testcase what happens, perhaps one of RTX experts (CC'd) can advise what is recommended here. The problem is the call to expand_simple_binop. The source (op1) and the destination (op1) are obviously the same, so its going to clobber whatever value is in there. If there are other uses of the original value of op1, then things aren't going to work. But I'm a little unclear how there's be other later uses of that value. Perhaps Ilya could comment on that. op1 is a result of expand_normal called for SSA name. Other uses of op1 come from expand of uses of this SSA name in GIMPLE code. 
Regardless, unless there's a strong reason to do so, I'd generally recommend passing a NULL_RTX as the target for expansions so that you always get a new pseudo. Lots of optimizers in the RTL world work better if we don't have pseudos with multiple assignments. By passing NULL_RTX for the target we get that property more often. So a change like Ilya suggests (though I'd use NULL_RTX rather than NULL) makes sense. Will replace it with NULL_RTX. Thanks, Ilya Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or; as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures, but I believe it's typically implemented as addition. There's also typically an overlap between the high and low parts when building up addresses. GCC actually takes advantage of that overlap to reduce unnecessary HIGH expressions. And you can always have an oddball architecture like the PA, where the LO_SUM does something utterly braindead. It looks like this:

  addil %r27,symbolic nonsense

You might think %r27 holds the high bits of the address... Umm, no it doesn't. There's an implicit %r1 source/destination operand (which holds the high bits). So what this really does is add %r27, %r1, and the symbolic constant. %r27 is a data pointer. Obviously the implicit operand is used to get more bits holding the symbolic constant in the instruction. If that's not bad enough: if the object is in read-only memory, then the linker will modify the instruction to change %r27 to %r0 (hardwired 0 constant). But I'm getting off-topic here :-)

jeff
Re: [PATCH v2, RTL]: Fix PR 63483, Scheduler performs Invalid move of aliased memory reference
On 10/10/14 04:06, Uros Bizjak wrote: On Wed, Oct 8, 2014 at 1:56 PM, Uros Bizjak ubiz...@gmail.com wrote: This message revives an old thread [1], where a miscompilation of gfortran on alpha was found that resulted in: [...] As said in the audit trail of the bug report, I think that the caller of alpha_set_memflags is wrong in applying MEM flags from the _source_ operand to MEMs generated for the RMW sequence for the destination. It would be better to fix that instead. Please see comment #6 of the referred PR [1] for further analysis and an amended testcase. The testcase and analysis show a native read passing a possibly aliasing store. The attached v2 patch implements the same approach in all alias.c places that declare MEM_READONLY_P operands as never aliasing.

2014-10-10  Uros Bizjak  ubiz...@gmail.com

    * alias.c (true_dependence_1): Do not exit early for MEM_READONLY_P
    references when alignment ANDs are involved.
    (write_dependence_p): Ditto.
    (may_alias_p): Ditto.

The patch was bootstrapped and regression tested on x86_64-linux-gnu and alpha-linux-gnu. Unfortunately, there are still failures remaining in the gfortran testsuite due to an independent RTL infrastructure problem with VALUEs leaking into the alias-detection functions [2], [3]. The patch was discussed and OK'd by Richi in the PR audit trail. If there are no objections from RTL maintainers, I plan to commit it to mainline early next week.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
[2] https://gcc.gnu.org/ml/gcc/2014-10/msg00060.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63475

No objection. In fact, after reading everything it's pretty obvious to me that a /u MEM must be considered as potentially conflicting with writes that are implemented as RMW sequences to deal with the lack of byte access support. The escaping VALUE stuff is still in my queue.

jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 10:21 AM, Jeff Law wrote: On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures... Just to be clear, the LOW rtx code I mentioned upthread was for illustration purposes only. I only intend to change the rtx class of the existing LO_SUM. :-) r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:25, Richard Henderson wrote: On 10/10/2014 10:21 AM, Jeff Law wrote: On 10/10/14 11:04, Jakub Jelinek wrote: On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote: But it's really a lot more like a kind of PLUS. If instead we had a LOW to match HIGH it would have been Right. In fact, I believe at the hardware level it's typically implemented as addition. Can be addition or bitwise or, as high should have some low bits zero, both usually work the same. It can be bitwise-or on some architectures... Just to be clear, the LOW rtx code I mentioned upthread was for illustration purposes only. I only intend to change the rtx class of the existing LO_SUM. That was my expectation :-) jeff