Re: [PATCH x86, PR60451] Expand even/odd permutation using pack insn.
On Thu, Nov 20, 2014 at 5:25 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

Bootstrap / make check passed with updated patch. Is it still ok?

It looks like we don't need expand_vec_perm_vpshufb2_vpermq_even_odd any more with the patch. However the clean up will be in the separate patch after appropriate testing.

Modified ChangeLog:

2014-11-20 Evgeny Stupachenko evstu...@gmail.com

gcc/testsuite
PR target/60451
* gcc.target/i386/pr60451.c: New.

gcc/
PR target/60451
* config/i386/i386.c (expand_vec_perm_even_odd_pack): New.
(expand_vec_perm_even_odd_1): Add new expand for V8HI mode, replace for V16QI, V16HI and V32QI modes.
(ix86_expand_vec_perm_const_1): Add new expand.

OK.

Thanks,
Uros.
[PATCH] Backport PR61750 fix
The following backports a fix I applied to match.pd whilst merging from match-and-simplify to the original tree-ssa-forwprop.c code on the 4.9 branch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-11-21 Richard Biener rguent...@suse.de

PR tree-optimization/61750
* tree-ssa-forwprop.c (simplify_vce): Verify type sizes match for the resulting VIEW_CONVERT_EXPR.

Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c (revision 217764)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -3178,7 +3178,9 @@ simplify_vce (gimple_stmt_iterator *gsi)
       && (INTEGRAL_TYPE_P (TREE_TYPE (def_op))
           || POINTER_TYPE_P (TREE_TYPE (def_op)))
       && (TYPE_PRECISION (TREE_TYPE (op))
-          == TYPE_PRECISION (TREE_TYPE (def_op))))
+          == TYPE_PRECISION (TREE_TYPE (def_op)))
+      && (TYPE_SIZE (TREE_TYPE (op))
+          == TYPE_SIZE (TREE_TYPE (def_op))))
     {
       TREE_OPERAND (gimple_assign_rhs1 (stmt), 0) = def_op;
       update_stmt (stmt);
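The extra condition in this backport can be illustrated outside GCC with a small sketch. `TypeDesc` and `can_strip_conversion` are hypothetical names, not GCC APIs; they only model the two properties the patch compares (TYPE_PRECISION and TYPE_SIZE) and show that equal precision does not imply equal size.

```cpp
// Hypothetical stand-in for the two type properties the patch compares.
struct TypeDesc {
  unsigned precision;  // number of value bits (models TYPE_PRECISION)
  unsigned size_bits;  // storage size in bits (models TYPE_SIZE)
};

// A view-conversion reinterprets storage, so looking through an
// intermediate conversion is only safe when both the precision and
// the storage size of the two types agree.
bool can_strip_conversion(const TypeDesc &op, const TypeDesc &def_op) {
  return op.precision == def_op.precision &&
         op.size_bits == def_op.size_bits;
}
```

For example, a 24-bit-precision type stored in 32 bits and a 24-bit-precision type stored in 24 bits pass the precision check alone but fail the combined check.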
Re: [PATCH 2/2] PR debug/38757 continued. Handle C11, C++11 and C++14.
On Thu, Nov 20, 2014 at 11:30:12PM +0100, Mark Wielaard wrote:

@@ -19592,13 +19597,28 @@ gen_compile_unit_die (const char *filename)
     language = DW_LANG_C;
   if (strncmp (language_string, "GNU C++", 7) == 0)
-    language = DW_LANG_C_plus_plus;
+    {
+      language = DW_LANG_C_plus_plus;
+      if (dwarf_version >= 5 || !dwarf_strict)
+        {
+          if (strcmp (language_string, "GNU C++11") == 0)
+            language = DW_LANG_C_plus_plus_11;
+          else if (strcmp (language_string, "GNU C++14") == 0)
+            language = DW_LANG_C_plus_plus_14;
+        }
+    }

I think best would be to tweak

  if (value < 2 || value > 4)
    error_at (loc, "dwarf version %d is not supported", value);
  else
    opts->x_dwarf_version = value;

so that we accept value 5 too, and for now, until the most common consumers are changed, use

  if (dwarf_version >= 5 /* || !dwarf_strict */)

so that
- you can actually use it in the test with -gdwarf-5
- you can commit it right away
- people can start playing with what it will mean to support DWARF5

GCC 4.5 also allowed -gdwarf-4 even when DWARF4 has not been released yet. When there are consumers that can grok it, we can uncomment the || !dwarf_strict. Jason, do you agree?

   else if (strncmp (language_string, "GNU C", 5) == 0)
     {
       language = DW_LANG_C89;
       if (dwarf_version >= 3 || !dwarf_strict)
-        if (strcmp (language_string, "GNU C99") == 0)
-          language = DW_LANG_C99;
+        {
+          if (strcmp (language_string, "GNU C89") != 0)
+            language = DW_LANG_C99;
+
+          if (dwarf_version >= 5 || !dwarf_strict)
+            if (strcmp (language_string, "GNU C11") == 0)
+              language = DW_LANG_C11;
+        }

Shouldn't we emit at least DW_LANG_C99 for GNU C11 if not dwarf_version >= 5 /* || !dwarf_strict */ but dwarf_version >= 3 || !dwarf_strict is true?

BTW, noticed we don't have anything for Fortran 2003 and 2008, filed a DWARF Issue for that.

Jakub
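The selection logic under discussion, including the C99 fallback for C11 asked about above, could be sketched as a standalone function. This is only an illustration: the `Lang` enum and `pick_language` are hypothetical names (the real codes are the DW_LANG_* constants), and the sketch assumes the DWARF 5 codes are gated on the version alone, with the `|| !dwarf_strict` part left out as suggested.

```cpp
#include <cstring>

// Hypothetical language codes standing in for the DW_LANG_* constants.
enum Lang { LANG_C89, LANG_C99, LANG_C11, LANG_CXX, LANG_CXX11, LANG_CXX14 };

// Pick a language code from a front-end language string such as
// "GNU C11" or "GNU C++14". The string is assumed to name C or C++.
Lang pick_language(const char *s, int dwarf_version, bool dwarf_strict) {
  const bool allow_v3 = dwarf_version >= 3 || !dwarf_strict;
  const bool allow_v5 = dwarf_version >= 5;  // "|| !dwarf_strict" left out

  if (std::strncmp(s, "GNU C++", 7) == 0) {
    if (allow_v5) {
      if (std::strcmp(s, "GNU C++11") == 0) return LANG_CXX11;
      if (std::strcmp(s, "GNU C++14") == 0) return LANG_CXX14;
    }
    return LANG_CXX;
  }

  // Plain C: anything newer than C89 is reported as at least C99 when
  // that code may be used, and as C11 only when DWARF 5 codes are allowed.
  if (allow_v3 && std::strcmp(s, "GNU C89") != 0) {
    if (allow_v5 && std::strcmp(s, "GNU C11") == 0) return LANG_C11;
    return LANG_C99;
  }
  return LANG_C89;
}
```

With this shape, "GNU C11" under -gdwarf-4 maps to the C99 code rather than dropping back to C89.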
Re: [PATCH] OpenACC for C front end
On Thu, Nov 20, 2014 at 05:50:33PM -0600, James Norris wrote:

+ case 'h':
+   if (!strcmp ("host", p))
+     result = PRAGMA_OMP_CLAUSE_SELF;
+   break;

Shouldn't this be PRAGMA_OMP_CLAUSE_HOST (PRAGMA_OACC_CLAUSE_HOST) instead? It is _HOST in the C++ patch, are there no C tests with that clause covering it?

The host clause is a synonym for the self clause. The initial C++ patch did not treat host as a synonym and has been amended accordingly.

Can you add a comment mentioning that (for casual readers)?

There was a mistake in naming the function: c_parser_omp_clause_vector_length. Once it was renamed to c_parser_oacc_clause_vector_length, diff was able to keep track.

Great.

OK to commit after middle end is accepted?

Ok, thanks.

Jakub
Re: [PATCH] OpenACC for C++ front end
On Thu, Nov 20, 2014 at 05:33:57PM -0600, James Norris wrote:

+ t = OMP_CLAUSE_ASYNC_EXPR (c);
+ if (t == error_mark_node)
+   remove = true;
+ else if (!type_dependent_expression_p (t)
+          && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
+   {
+     error ("%<async%> expression must be integral");

You have OMP_CLAUSE_LOCATION (c) which you could use for error_at.

I followed the convention that was used elsewhere in the function at using error ().

Perhaps it would be better to change even those other spots in the function. But that can be certainly done as a follow-up patch.

Thank you for taking the time to review! OK to commit after middle end has been accepted?

Yes, thanks.

Jakub
Re: Add to maintainers list.
Hi,

2014-11-20 Alex Velenko alex.vele...@arm.com

* MAINTAINERS (write-after-approval): Add myself.

diff --git a/MAINTAINERS b/MAINTAINERS
index 11a28ef..eada4e9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -566,6 +566,7 @@ David Ung dav...@mips.com
 Neil Vachharajani nvach...@gmail.com
 Kris Van Hees kris.van.h...@oracle.com
 Joost VandeVondele joost.vandevond...@mat.ethz.ch
+Alex Velenko alex.vele...@arm.com
 Ilya Verbin iver...@gmail.com
 Kugan Vivekanandarajah kug...@linaro.org
 Tom de Vries t...@codesourcery.com

Can someone, please, approve?

Kind regards,
Alex
Re: Add to maintainers list.
On 20 November 2014 16:27, Alex Velenko alex.vele...@arm.com wrote: 2014-11-20 Alex Velenko alex.vele...@arm.com *MAINTAINERS (write-after-approval): Add myself. Your patch looks fine, commit it. /Marcus
Commit: Rl78: Save ES register in interrupt handlers
Hi Guys, I am applying the patch below to fix the RL78 backend so that it will preserve the ES register if an interrupt handler uses it. The ES register can be altered if a __far variable is addressed inside the handler. Tested without any regressions on an rl78-elf toolchain. Cheers Nick gcc/ChangeLog 2014-11-21 Nick Clifton ni...@redhat.com * config/rl78/rl78-real.md (movqi_from_es): New pattern. * config/rl78/rl78.c (struct machine_function): Add uses_es field. (rl78_expand_prologue): Save the ES register in interrupt handlers that use it. (rl78_expand_epilogue): Restore the ES register if necessary. (rl78_start_function): Mention if the function uses the ES register. (rl78_lo16): Record the use of the ES register. (transcode_memory_rtx): Likewise. Index: gcc/config/rl78/rl78-real.md === --- gcc/config/rl78/rl78-real.md (revision 217910) +++ gcc/config/rl78/rl78-real.md (working copy) @@ -36,6 +36,13 @@ mov\tes, %0 ) +(define_insn movqi_from_es + [(set (match_operand:QI 0 register_operand =a) + (reg:QI ES_REG))] + + mov\t%0, es +) + (define_insn movqi_cs [(set (reg:QI CS_REG) (match_operand:QI 0 register_operand a))] Index: gcc/config/rl78/rl78.c === --- gcc/config/rl78/rl78.c (revision 217910) +++ gcc/config/rl78/rl78.c (working copy) @@ -118,6 +118,9 @@ int virt_insns_ok; /* Set if the current function needs to clean up any trampolines. */ int trampolines_used; + /* True if the ES register is used and hence + needs to be saved inside interrupt handlers. */ + bool uses_es; }; /* This is our init_machine_status, as set in @@ -136,38 +139,36 @@ /* This pass converts virtual instructions using virtual registers, to real instructions using real registers. Rather than run it as reorg, we reschedule it before vartrack to help with debugging. 
*/ -namespace { - -const pass_data pass_data_rl78_devirt = +namespace { - RTL_PASS, /* type */ - devirt, /* name */ - OPTGROUP_NONE, /* optinfo_flags */ - TV_MACH_DEP, /* tv_id */ - 0, /* properties_required */ - 0, /* properties_provided */ - 0, /* properties_destroyed */ - 0, /* todo_flags_start */ - 0, /* todo_flags_finish */ -}; + const pass_data pass_data_rl78_devirt = +{ + RTL_PASS, /* type */ + devirt, /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_MACH_DEP, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; -class pass_rl78_devirt : public rtl_opt_pass -{ -public: - pass_rl78_devirt(gcc::context *ctxt) -: rtl_opt_pass(pass_data_rl78_devirt, ctxt) + class pass_rl78_devirt : public rtl_opt_pass { - } + public: +pass_rl78_devirt (gcc::context *ctxt) + : rtl_opt_pass (pass_data_rl78_devirt, ctxt) + { + } - /* opt_pass methods: */ - virtual unsigned int execute (function *) +/* opt_pass methods: */ +virtual unsigned int execute (function *) { rl78_reorg (); return 0; } - -}; - + }; } // anon namespace rtl_opt_pass * @@ -203,8 +204,7 @@ can eliminate the second SET. 
*/ if (prev rtx_equal_p (SET_DEST (prev), SET_SRC (set)) - rtx_equal_p (SET_DEST (set), SET_SRC (prev)) - ) + rtx_equal_p (SET_DEST (set), SET_SRC (prev))) { if (dump_file) fprintf (dump_file, Delete insn %d because it is redundant\n, @@ -216,7 +216,7 @@ else prev = set; } - + if (dump_file) print_rtl_with_bb (dump_file, get_insns (), 0); @@ -223,33 +223,32 @@ return 0; } -namespace { - -const pass_data pass_data_rl78_move_elim = +namespace { - RTL_PASS, /* type */ - move_elim, /* name */ - OPTGROUP_NONE, /* optinfo_flags */ - TV_MACH_DEP, /* tv_id */ - 0, /* properties_required */ - 0, /* properties_provided */ - 0, /* properties_destroyed */ - 0, /* todo_flags_start */ - 0, /* todo_flags_finish */ -}; + const pass_data pass_data_rl78_move_elim = +{ + RTL_PASS, /* type */ + move_elim, /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_MACH_DEP, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; -class pass_rl78_move_elim : public rtl_opt_pass -{ -public: - pass_rl78_move_elim(gcc::context *ctxt) -: rtl_opt_pass(pass_data_rl78_move_elim, ctxt) + class pass_rl78_move_elim : public rtl_opt_pass { - } + public: +pass_rl78_move_elim (gcc::context *ctxt) + : rtl_opt_pass (pass_data_rl78_move_elim, ctxt) + { + } - /* opt_pass methods: */ - virtual unsigned int execute (function *) { return move_elim_pass ();
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
PING. 200 currently looks optimal for x86. Let's commit the following: 2014-11-21 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Increase PARAM_MAX_COMPLETELY_PEELED_INSNS. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6337aa5..5ac10eb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p, opts-x_param_values, opts_set-x_param_values); + /* Extend full peel max insns parameter for x86. */ + maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS, +200, +opts-x_param_values, +opts_set-x_param_values); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts-x_flag_prefetch_loop_arrays 0 HAVE_prefetch On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Code size for spec2000 is almost unchanged (many benchmarks have the same binaries). For those that are changed we have the following numbers (200 vs 100, both dynamic build -Ofast -funroll-loops -flto): 183.equake +10% 164.gzip, 173.applu +3,5% 187.facerec, 191.fma3d +2,5% 200.sixstrack +2% 177.mesa, 178.galgel +1% On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka hubi...@ucw.cz wrote: 150 and 200 make Silvermont performance better on 173.applu (+8%) and 183.equake (+3%); Haswell spec2006 performance stays almost unchanged. Higher value of 300 leave the performance of mentioned tests unchanged, but add some regressions on other benchmarks. So I like 200 as well as 120 and 150, but can confirm performance gains only for x86. IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because this gave the performance boost without affecting the code size (on x86-64) and because this was previously 400, but it's your call. Both 150 or 200 globally work for me if there is not too much of code size bloat (did not see code size mentioned here). 
What I did before decreasing the bounds was strengthening the loop iteration count bounds and adding logic that predicts constant propagation enabled by unrolling. For this reason 400 became too large, as we did a lot more complete unrolling than before. Also, 400 in older compilers is not really 400 in newer ones. Because I saw performance drop only with values below 50, I went for 100. It would be very interesting to actually analyze what happens for those two benchmarks (that should not be too hard with perf). Honza
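The patch in this thread installs 200 through maybe_set_param_value, so it acts as a target default rather than a hard override. A minimal model of that behaviour, assuming a simplified parameter table, is sketched below; `Params`, `maybe_set` and `set_by_user` are hypothetical names for illustration and do not mirror GCC's actual data structures.

```cpp
#include <array>

// Hypothetical parameter table with a single entry.
enum Param { MAX_COMPLETELY_PEELED_INSNS, NUM_PARAMS };

struct Params {
  std::array<int, NUM_PARAMS> values{};     // current values
  std::array<bool, NUM_PARAMS> user_set{};  // true if given explicitly
};

// Install a target default, but never override an explicit user choice.
void maybe_set(Params &p, Param which, int value) {
  if (!p.user_set[which]) p.values[which] = value;
}

// Record an explicit setting, as a "--param name=value" option would.
void set_by_user(Params &p, Param which, int value) {
  p.values[which] = value;
  p.user_set[which] = true;
}
```

Under this model a user who passes their own value for the peeling limit keeps it, and only the untouched default moves from 100 to 200 on x86.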
Re: [PATCH] PR lto/63968: 175.vpr from cpu2000 fails to build with LTO
On 11/20/2014 10:13 PM, Jan Hubicka wrote:

Hello.

As I reimplemented fibheap as a C++ template, Honza told me that the replace_key method actually supports just the decrement operation. The old implementation suppresses any feedback if we try to increase a key:

fibheap.c:
...
  /* If we wanted to, we could actually do a real increase by redeleting and
     inserting. However, this would require O (log n) time. So just bail out
     for now. */
  if (fibheap_comp_data (heap, key, data, node) > 0)
    return NULL;
...

My reimplementation added an assert for this kind of operation, and as this PR shows, we try to do an increment in reorder-bb. Thus, I added a fibonacci_heap::replace_key method that can increment a key (it deletes the node and the new key is associated with the node).

The patch can bootstrap on x86_64-linux-pc and no new regression was introduced. I would like to ask someone if the increase operation for bb-reorder is valid or not?

Can you verify that the implementation is correct? I tend to remember that I introduced the lazy incrementation to the inliner both for performance and correctness reasons. I used to get odd orders when keys were increased.

Honza

Hello.

What kind of correctness do you mean? The old implementation didn't support the increment operation and the fact was hushed up.

Martin

Thanks,
Martin

gcc/ChangeLog:

2014-11-20 Martin Liska mli...@suse.cz

* bb-reorder.c (find_traces_1_round): decrease_key is replaced with replace_key method.
* fibonacci_heap.h (fibonacci_heap::insert): New argument.
(fibonacci_heap::replace_key_data): Likewise.
(fibonacci_heap::replace_key): New method that can even increment key, this operation costs O(log N).
(fibonacci_heap::extract_min): New argument.
(fibonacci_heap::delete_node): Likewise.
diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index 689d7b6..b568114 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -644,7 +644,7 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, (long) bbd[e-dest-index].node-get_key (), key); } - bbd[e-dest-index].heap-decrease_key + bbd[e-dest-index].heap-replace_key (bbd[e-dest-index].node, key); } } @@ -812,7 +812,7 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, e-dest-index, (long) bbd[e-dest-index].node-get_key (), key); } - bbd[e-dest-index].heap-decrease_key + bbd[e-dest-index].heap-replace_key (bbd[e-dest-index].node, key); } } diff --git a/gcc/fibonacci_heap.h b/gcc/fibonacci_heap.h index ecb92f8..3fce370 100644 --- a/gcc/fibonacci_heap.h +++ b/gcc/fibonacci_heap.h @@ -183,20 +183,27 @@ public: } /* For given NODE, set new KEY value. */ - K decrease_key (fibonacci_node_t *node, K key) + K replace_key (fibonacci_node_t *node, K key) { K okey = node-m_key; -gcc_assert (key = okey); replace_key_data (node, key, node-m_data); return okey; } + /* For given NODE, decrease value to new KEY. */ + K decrease_key (fibonacci_node_t *node, K key) + { +gcc_assert (key = node-m_key); +return replace_key (node, key); + } + /* For given NODE, set new KEY and DATA value. */ V *replace_key_data (fibonacci_node_t *node, K key, V *data); - /* Extract minimum node in the heap. */ - V *extract_min (); + /* Extract minimum node in the heap. If RELEASE is specified, + memory is released. */ + V *extract_min (bool release = true); /* Return value associated with minimum node in the heap. */ V *min () @@ -214,12 +221,15 @@ public: } /* Delete NODE in the heap. */ - V *delete_node (fibonacci_node_t *node); + V *delete_node (fibonacci_node_t *node, bool release = true); /* Union the heap with HEAPB. */ fibonacci_heap *union_with (fibonacci_heap *heapb); private: + /* Insert new NODE given by KEY and DATA associated with the key. 
*/ + fibonacci_node_t *insert (fibonacci_node_t *node, K key, V *data); + /* Insert it into the root list. */ void insert_root (fibonacci_node_t *node); @@ -322,6 +332,15 @@ fibonacci_heapK,V::insert (K key, V *data) /* Create the new node. */ fibonacci_nodeK,V *node = new fibonacci_node_t (); + return insert (node, key, data); +} + +/* Insert new NODE given by KEY and DATA associated with the key. */ + +templateclass K, class V +fibonacci_nodeK,V* +fibonacci_heapK,V::insert (fibonacci_node_t *node, K key, V *data) +{ /* Set the node's data. */ node-m_data = data; node-m_key = key; @@ -345,17 +364,22 @@ V* fibonacci_heapK,V::replace_key_data (fibonacci_nodeK,V *node, K key,
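The split between decrease_key and replace_key described in this message can be sketched with a much simpler container. The class below is a hypothetical illustration, not the patch's fibonacci_heap: it uses std::multimap as a stand-in ordered queue, so both operations cost O(log n) here, and it only shows the contract (decrease asserts the key does not grow, replace removes and re-inserts the entry).

```cpp
#include <cassert>
#include <map>

// Stand-in priority queue ordered by key. This is not a Fibonacci heap;
// it only illustrates the two key-update operations.
class KeyedQueue {
 public:
  using Node = std::multimap<long, int>::iterator;

  Node insert(long key, int data) { return m_.emplace(key, data); }

  // Set a new key in either direction by removing and re-inserting the
  // entry. Returns the handle of the re-inserted entry.
  Node replace_key(Node node, long key) {
    const int data = node->second;
    m_.erase(node);
    return m_.emplace(key, data);
  }

  // Decrease only: the caller promises that the key does not grow.
  Node decrease_key(Node node, long key) {
    assert(key <= node->first);
    return replace_key(node, key);
  }

  // Data associated with the smallest key; the queue must be non-empty.
  int min() const { return m_.begin()->second; }

 private:
  std::multimap<long, int> m_;
};
```

A caller such as the basic-block reordering code, which may raise a key, would use replace_key; callers that only ever lower keys can keep decrease_key and its assertion.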
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Fri, Nov 21, 2014 at 11:46 AM, Evgeny Stupachenko evstu...@gmail.com wrote: PING. 200 currently looks optimal for x86. Let's commit the following: 2014-11-21 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Increase PARAM_MAX_COMPLETELY_PEELED_INSNS. OK. Looks like a good performance vs. codesize tradeoff. Uros.
Re: [ia64 PATCH] Fix up ia64 attribute handling (PR target/61137)
Jakub Jelinek ja...@redhat.com writes: The following untested patch fixes that (tested on small-addr-1.c with a cross-compiler), I don't have ia64 hw nor spare cycles to test this though, so I'm just offering the patch as is if anyone wants to test it. Perhaps better testsuite coverage wouldn't hurt (test the model (small) attribute also in C++, perhaps test the common_object attribute on VMS?). 2014-11-20 Jakub Jelinek ja...@redhat.com PR target/61137 * config/ia64/ia64.c (ia64_attribute_takes_identifier_p): New function. (TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P): Redefine to it. Looks good. http://gcc.gnu.org/ml/gcc-testresults/2014-11/msg02276.html Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
[PATCH]Add myself to MAINTAINERS
Hi,

This patch is to add myself into Write After Approval section of MAINTAINERS file. Is it Okay to commit?

Regards,
Renlin Li

ChangeLog:

2014-11-21 Renlin Li renlin...@arm.com

* MAINTAINERS (Write After Approval): Add myself.

diff --git a/MAINTAINERS b/MAINTAINERS
index 56e68c5..96a7497 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -461,6 +461,7 @@ Georg-Johann Lay a...@gjlay.de
 Marc Lehmann p...@goof.com
 James Lemke jwle...@codesourcery.com
 Kriang Lerdsuwanakij lerds...@users.sourceforge.net
+Renlin Li renlin...@arm.com
 Xinliang David Li davi...@google.com
 Jiangning Liu jiangning@arm.com
 Sa Liu sa...@de.ibm.com
Re: [PATCH 8/9] Negative numbers added for sreal class.
On 11/14/2014 11:48 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 1:35 PM, mliska mli...@suse.cz wrote: gcc/ChangeLog: 2014-11-13 Martin Liska mli...@suse.cz * predict.c (propagate_freq): More elegant sreal API is used. (estimate_bb_frequencies): New static constants defined by sreal replace precomputed ones. * sreal.c (sreal::normalize): New function. (sreal::to_int): Likewise. (sreal::operator+): Likewise. (sreal::operator-): Likewise. * sreal.h: Definition of new functions added. Please use gcc_checking_assert()s everywhere. sreal is supposed to be fast... (I see it has current uses of gcc_assert - you may want to mass-convert them as a followup). --- gcc/predict.c | 30 +++- gcc/sreal.c | 56 gcc/sreal.h | 75 --- 3 files changed, 126 insertions(+), 35 deletions(-) diff --git a/gcc/predict.c b/gcc/predict.c index 0215e91..0f640f5 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -82,7 +82,7 @@ along with GCC; see the file COPYING3. If not see /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE, 1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. 
*/ -static sreal real_zero, real_one, real_almost_one, real_br_prob_base, +static sreal real_almost_one, real_br_prob_base, real_inv_br_prob_base, real_one_half, real_bb_freq_max; static void combine_predictions_for_insn (rtx_insn *, basic_block); @@ -2528,13 +2528,13 @@ propagate_freq (basic_block head, bitmap tovisit) bb-count = bb-frequency = 0; } - BLOCK_INFO (head)-frequency = real_one; + BLOCK_INFO (head)-frequency = sreal::one (); last = head; for (bb = head; bb; bb = nextbb) { edge_iterator ei; - sreal cyclic_probability = real_zero; - sreal frequency = real_zero; + sreal cyclic_probability = sreal::zero (); + sreal frequency = sreal::zero (); nextbb = BLOCK_INFO (bb)-next; BLOCK_INFO (bb)-next = NULL; @@ -2559,13 +2559,13 @@ propagate_freq (basic_block head, bitmap tovisit) * BLOCK_INFO (e-src)-frequency / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (e-src)-frequency; tmp *= real_inv_br_prob_base; frequency += tmp; } - if (cyclic_probability == real_zero) + if (cyclic_probability == sreal::zero ()) { BLOCK_INFO (bb)-frequency = frequency; } @@ -2577,7 +2577,7 @@ propagate_freq (basic_block head, bitmap tovisit) /* BLOCK_INFO (bb)-frequency = frequency / (1 - cyclic_probability) */ - cyclic_probability = real_one - cyclic_probability; + cyclic_probability = sreal::one () - cyclic_probability; BLOCK_INFO (bb)-frequency = frequency / cyclic_probability; } } @@ -2591,7 +2591,7 @@ propagate_freq (basic_block head, bitmap tovisit) = ((e-probability * BLOCK_INFO (bb)-frequency) / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (bb)-frequency; EDGE_INFO (e)-back_edge_prob = tmp * real_inv_br_prob_base; } @@ -2873,13 +2873,11 @@ estimate_bb_frequencies (bool force) if (!real_values_initialized) { real_values_initialized = 1; - real_zero = sreal (0, 0); - real_one = sreal (1, 0); - real_br_prob_base = sreal (REG_BR_PROB_BASE, 0); - real_bb_freq_max = sreal 
(BB_FREQ_MAX, 0); + real_br_prob_base = REG_BR_PROB_BASE; + real_bb_freq_max = BB_FREQ_MAX; real_one_half = sreal (1, -1); - real_inv_br_prob_base = real_one / real_br_prob_base; - real_almost_one = real_one - real_inv_br_prob_base; + real_inv_br_prob_base = sreal::one () / real_br_prob_base; + real_almost_one = sreal::one () - real_inv_br_prob_base; } mark_dfs_back_edges (); @@ -2897,7 +2895,7 @@ estimate_bb_frequencies (bool force) FOR_EACH_EDGE (e, ei, bb-succs) { - EDGE_INFO (e)-back_edge_prob = sreal (e-probability, 0); + EDGE_INFO (e)-back_edge_prob = e-probability; EDGE_INFO (e)-back_edge_prob *= real_inv_br_prob_base; } } @@ -2906,7 +2904,7 @@ estimate_bb_frequencies (bool force) to outermost to examine frequencies for back edges. */ estimate_loops (); - freq_max = real_zero; + freq_max = sreal::zero (); FOR_EACH_BB_FN (bb, cfun) if (freq_max BLOCK_INFO (bb)-frequency)
Re: [PATCH 1/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On 20 Nov 09:43, Uros Bizjak wrote:

On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

Hi,

New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch below adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside of the i386 backend. They are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Please split the patch into patch series, like it was done previously for AVX512F patches.

Uros.

This part adds avx512ifma. Bootstraps/passes make check.

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512IFMA_SET, OPTION_MASK_ISA_AVX512IFMA_UNSET): New.
(ix86_handle_option): Handle OPT_mavx512ifma.
* config.gcc: Add avx512ifmaintrin.h, avx512ifmavlintrin.h.
* config/i386/avx512ifmaintrin.h: New file.
* config/i386/avx512ifmavlintrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX512IFMA): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512ifma.
* config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX512IFMA__.
* config/i386/i386.c (ix86_target_string): Add -mavx512ifma.
(PTA_AVX512IFMA): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx512ifma.
(ix86_builtins): Add IX86_BUILTIN_VPMADD52LUQ512, IX86_BUILTIN_VPMADD52HUQ512, IX86_BUILTIN_VPMADD52LUQ256, IX86_BUILTIN_VPMADD52HUQ256, IX86_BUILTIN_VPMADD52LUQ128, IX86_BUILTIN_VPMADD52HUQ128, IX86_BUILTIN_VPMADD52LUQ512_MASKZ, IX86_BUILTIN_VPMADD52HUQ512_MASKZ, IX86_BUILTIN_VPMADD52LUQ256_MASKZ, IX86_BUILTIN_VPMADD52HUQ256_MASKZ, IX86_BUILTIN_VPMADD52LUQ128_MASKZ, IX86_BUILTIN_VPMADD52HUQ128_MASKZ.
(bdesc_special_args): Add __builtin_ia32_vpmadd52luq512_mask, __builtin_ia32_vpmadd52luq512_maskz, __builtin_ia32_vpmadd52huq512_mask, __builtin_ia32_vpmadd52huq512_maskx, __builtin_ia32_vpmadd52luq256_mask, __builtin_ia32_vpmadd52luq256_maskz, __builtin_ia32_vpmadd52huq256_mask, __builtin_ia32_vpmadd52huq256_maskz, __builtin_ia32_vpmadd52luq128_mask, __builtin_ia32_vpmadd52luq128_maskz, __builtin_ia32_vpmadd52huq128_mask, __builtin_ia32_vpmadd52huq128_maskz, * config/i386/i386.h (TARGET_AVX512IFMA, TARGET_AVX512IFMA_P): Define. * config/i386/i386.opt: Add mavx512ifma. * config/i386/immintrin.h: Include avx512ifmaintrin.h, avx512ifmavlintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VPMADD52LUQ, UNSPEC_VPMADD52HUQ. (VPMADD52): New iterator. (vpmadd52type): New attribute. (vpamdd52huqmode_maskz): New. (vpamdd52luqmode_maskz): Ditto. (vpamdd52vpmadd52typemodesd_maskz_name): Ditto. (vpamdd52vpmadd52typemode_mask): Ditto. gcc/testsuite/ * g++.dg/other/i386-2.C: Add -mavx512ifma. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/avx512f-helper.h: Add avx512ifma-check.h. * gcc.target/i386/avx512ifma-check.h: New. * gcc.target/i386/avx512ifma-vpmaddhuq-1.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddluq-1.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddluq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto. * gcc.target/i386/i386.exp (check_effective_target_avx512ifma): New. * gcc.target/i386/sse-12.c: Add new options. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. 
--- gcc/common/config/i386/i386-common.c | 16 ++ gcc/config.gcc | 6 +- gcc/config/i386/avx512ifmaintrin.h | 104 + gcc/config/i386/avx512ifmavlintrin.h | 164 + gcc/config/i386/cpuid.h| 1 + gcc/config/i386/driver-i386.c | 5 +- gcc/config/i386/i386-c.c | 2 + gcc/config/i386/i386.c | 35 + gcc/config/i386/i386.h | 2 + gcc/config/i386/i386.opt | 4 + gcc/config/i386/immintrin.h| 4 + gcc/config/i386/sse.md | 69 + gcc/testsuite/g++.dg/other/i386-2.C| 2 +- gcc/testsuite/g++.dg/other/i386-3.C| 2 +- gcc/testsuite/gcc.target/i386/avx512f-helper.h | 5 +
[PATCH] VRP: don't assume strict overflow semantics when checking if a loop wraps
When adjusting the value range of an induction variable using SCEV, VRP calls scev_probably_wraps_p() with use_overflow_semantics=true. This parameter set to true makes scev_probably_wraps_p() assume that signed induction variables never wrap, so for these variables it always returns false (when strict overflow rules are in effect). This is wrong because if a signed induction variable really does overflow then we want to give it an INF(OVF) value range and not the (finite) estimation returned by SCEV.

While this change shouldn't make a difference in code generation, it should help improve the coverage of -Wstrict-overflow warnings on induction variables like in the test case.

OK after bootstrap + regtest on x86_64-unknown-linux-gnu?

gcc/
* tree-vrp.c (adjust_range_with_scev): Call scev_probably_wraps_p with use_overflow_semantics=false.

gcc/testsuite/
* gcc.dg/Wstrict-overflow-27.c: New test.
---
 gcc/testsuite/gcc.dg/Wstrict-overflow-27.c | 22 ++++++++++++++++++++++
 gcc/tree-vrp.c                             |  2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wstrict-overflow-27.c

diff --git a/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c
new file mode 100644
index 000..c1f27ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrict-overflow -O2 -Wstrict-overflow" } */
+
+/* Warn about an overflow when folding i < 0.  */
+
+void bar (unsigned *p);
+
+int
+foo (unsigned *p)
+{
+  int i;
+  int sum = 0;
+
+  for (i = 0; i < *p; i++)
+    {
+      if (i < 0) /* { dg-warning "signed overflow" } */
+        sum += 2;
+      bar (p);
+    }
+
+  return sum;
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index a75138f..bf9ff61 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4270,7 +4270,7 @@ adjust_range_with_scev (value_range_t *vr, struct loop *loop,
       dir == EV_DIR_UNKNOWN
       /* ... or if it may wrap.  */
       || scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec),
-                                true))
+                                /*use_overflow_semantics=*/false))
     return;

   /* We use TYPE_MIN_VALUE and TYPE_MAX_VALUE here instead of
-- 
2.2.0.rc1.23.gf570943
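The question the patch wants answered without assuming "signed never wraps" can be sketched as a plain arithmetic check. The function below is a hypothetical illustration and not GCC's scev_probably_wraps_p: it assumes a 32-bit induction variable of the form init + step * k and a known upper bound `niter` on k, and it evaluates the final value in 64-bit arithmetic so the check itself cannot overflow.

```cpp
#include <cstdint>
#include <limits>

// Conservative check: can init + step * niter leave the 32-bit signed
// range? The sequence is monotone, so testing the last value suffices.
bool may_wrap(std::int32_t init, std::int32_t step, std::uint32_t niter) {
  const std::int64_t last =
      static_cast<std::int64_t>(init) +
      static_cast<std::int64_t>(step) * static_cast<std::int64_t>(niter);
  return last < std::numeric_limits<std::int32_t>::min() ||
         last > std::numeric_limits<std::int32_t>::max();
}
```

In the test case above, i starts at 0 with step 1 and is bounded only by an unsigned value, which may be as large as UINT_MAX, so a check of this kind reports a possible wrap.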
Re: [PATCH][ARM] Fix names of some rounding intrinsics, implement vrndx_f32 and vrndxq_f32
Ping again. Thanks, Kyrill On 13/11/14 14:45, Kyrill Tkachov wrote: Ping. Kyrill On 04/11/14 10:56, Kyrill Tkachov wrote: Phew, This one slipped through the cracks. Ping? https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01981.html Thanks, Kyrill On 23/09/14 16:25, Kyrill Tkachov wrote: On 23/09/14 16:07, Kyrill Tkachov wrote: Hi all, Some intrinsics had the wrong name (inconsistent with the NEON intrinsics spec). This patch fixes that and adds the vrndx_f32 and vrndxq_f32 intrinsics that were missing. For reference, the NEON intrinsics spec can be found at: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf Kyrill These map down to vrintx.f32 NEON instructions (d and q forms). We already had builtins defined for them, just the intrinsics were not wired up to them properly. Tested arm-none-eabi Ok for trunk? 2014-09-23 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vrndqn_f32): Rename to... (vrndnq_f32): ... this. (vrndqa_f32): Rename to... (vrndaq_f32): ... this. (vrndqp_f32): Rename to... (vrndpq_f32): ... this. (vrndqm_f32): Rename to... (vrndmq_f32): ... this. (vrndx_f32): New intrinsic. (vrndxq_f32): Likewise. 2014-09-23 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/arm/simd/neon-vrndx_f32_1.c: New test. * gcc.target/arm/simd/neon-vrndxq_f32_1.c: Likewise. * gcc.target/arm/neon/vrndqaf32.c: Rename to... * gcc.target/arm/neon/vrndaqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqmf32.c: Rename to... * gcc.target/arm/neon/vrndmqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqnf32.c: Rename to... * gcc.target/arm/neon/vrndnqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqpf32.c: Rename to... * gcc.target/arm/neon/vrndpqf32.c: ... This. Update intrinsic names.
Re: [patch c++]: Fix PR/63904
On Thu, Nov 20, 2014 at 8:48 PM, Kai Tietz ktiet...@googlemail.com wrote:

Hello,

this patch fixes a type-overflow issue caused by trying to cast a UHWI via tree_to_shwi. As soon as the value gets larger than SHWI_MAX, we get an error for it. So we need to cast it via tree_to_uhwi, and then cast it to the signed variant.

I think it's better to handle the degenerate case (no element) explicitly. And I would think that something like nelts should have a positive result, thus why is 'max' not unsigned? Also 'max' and using 'nelts' looks like a mismatch? max == nelts - 1. Ah, because array_type_nelts returns nelts - 1 ... how useful ;) Still you want to special-case the array_type_nelts == -1 case.

Richard.

ChangeLog

2014-11-20 Kai Tietz kti...@redhat.com

PR c++/63904
* constexpr.c (cxx_eval_vec_init_1): Avoid type-overflow issue.

2014-11-20 Kai Tietz kti...@redhat.com

PR c++/63904
* g++.dg/cpp0x/pr63904.C: New.

Regression tested for x86_64-unknown-linux-gnu. Ok for apply?

Regards,
Kai

Index: gcc/gcc/cp/constexpr.c
===================================================================
--- gcc.orig/gcc/cp/constexpr.c
+++ gcc/gcc/cp/constexpr.c
@@ -2006,12 +2050,12 @@ cxx_eval_vec_init_1 (const constexpr_ctx
                      bool *non_constant_p, bool *overflow_p)
 {
   tree elttype = TREE_TYPE (atype);
-  int max = tree_to_shwi (array_type_nelts (atype));
+  HOST_WIDE_INT max = (HOST_WIDE_INT) tree_to_uhwi (array_type_nelts (atype));
   verify_ctor_sanity (ctx, atype);
   vec<constructor_elt, va_gc> **p = &CONSTRUCTOR_ELTS (ctx->ctor);
   vec_alloc (*p, max + 1);
   bool pre_init = false;
-  int i;
+  HOST_WIDE_INT i;

   /* For the default constructor, build up a call to the default
      constructor of the element type. We only need to handle class types
Index: gcc/gcc/testsuite/g++.dg/cpp0x/pr63904.C
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/g++.dg/cpp0x/pr63904.C
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+
+template<int N>
+struct foo {
+    constexpr foo() : a() {}
+    int a[N];
+};
+
+int main() {
+  foo< (foo<1>{}).a[0] > f;
+  return 0;
+}
+
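Richard's suggestion to special-case the empty array, instead of relying on an unsigned-to-signed cast, can be sketched as follows. Both functions are hypothetical models, not GCC code: `nelts_minus_one` imitates array_type_nelts returning the element count minus one in an unsigned 64-bit value, and `element_count` assumes the real count fits in a signed 64-bit integer.

```cpp
#include <cstdint>

// Models array_type_nelts: the element count minus one, held in an
// unsigned 64-bit value, so an empty array yields the all-ones pattern.
std::uint64_t nelts_minus_one(std::uint64_t nelts) { return nelts - 1; }

// Recover the element count, treating the empty array explicitly
// rather than depending on how the all-ones value converts to signed.
std::int64_t element_count(std::uint64_t max_index) {
  if (max_index == UINT64_MAX) return 0;  // degenerate case: no elements
  return static_cast<std::int64_t>(max_index + 1);
}
```

The zero-element case is exactly what the test case exercises: `(foo<1>{}).a[0]` is 0, so the outer type is `foo<0>` with an empty array member.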
Re: [PATCH 2/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On 20 Nov 09:43, Uros Bizjak wrote: On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside if i386 backend. And are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. Ok for trunk? Please split the patch into patch series, like it was done previously for AVX512F patches. Uros. This part adds avx512vbmi. I'll send vpermi2b autogen patch together with v64qi const perm later. Boostraps/passes make check. Ok for trunk? gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VBMI_SET OPTION_MASK_ISA_AVX512VBMI_UNSET): New. (ix86_handle_option): Handle OPT_mavx512vbmi. * config.gcc: Add avx512vbmiintrin.h, avx512vbmivlintrin.h. * config/i386/avx512vbmiintrin.h: New file. * config/i386/avx512vbmivlintrin.h: Ditto. * config/i386/cpuid.h (bit_AVX512VBMI): New. * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512vbmi. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX512VBMI__. * config/i386/i386.c (ix86_target_string): Add -mavx512vbmi. (PTA_AVX512VBMI): Define. (ix86_option_override_internal): Handle new options. (ix86_valid_target_attribute_inner_p): Add avx512vbmi, (ix86_builtins): Add IX86_BUILTIN_VPMULTISHIFTQB512, IX86_BUILTIN_VPMULTISHIFTQB256, IX86_BUILTIN_VPMULTISHIFTQB128, IX86_BUILTIN_VPERMVARQI512_MASK, IX86_BUILTIN_VPERMT2VARQI512, IX86_BUILTIN_VPERMT2VARQI512_MASKZ, IX86_BUILTIN_VPERMI2VARQI512, IX86_BUILTIN_VPERMVARQI256_MASK, IX86_BUILTIN_VPERMVARQI128_MASK, IX86_BUILTIN_VPERMT2VARQI256, IX86_BUILTIN_VPERMT2VARQI256_MASKZ, IX86_BUILTIN_VPERMT2VARQI128, IX86_BUILTIN_VPERMI2VARQI256, IX86_BUILTIN_VPERMI2VARQI128. 
(bdesc_special_args): Add __builtin_ia32_vpmultishiftqb512_mask, __builtin_ia32_vpmultishiftqb256_mask, __builtin_ia32_vpmultishiftqb128_mask, __builtin_ia32_permvarqi512_mask, __builtin_ia32_vpermt2varqi512_mask, __builtin_ia32_vpermt2varqi512_maskz, __builtin_ia32_vpermi2varqi512_mask, __builtin_ia32_permvarqi256_mask, __builtin_ia32_permvarqi128_mask, __builtin_ia32_vpermt2varqi256_mask, __builtin_ia32_vpermt2varqi256_maskz, __builtin_ia32_vpermt2varqi128_mask, __builtin_ia32_vpermt2varqi128_maskz, __builtin_ia32_vpermi2varqi256_mask, __builtin_ia32_vpermi2varqi128_mask. (ix86_hard_regno_mode_ok): Allow big masks for AVX512VBMI. * config/i386/i386.h (TARGET_AVX512VBMI, TARGET_AVX512VBMI_P): Define. * config/i386/i386.opt: Add mavx512vbmi. * config/i386/immintrin.h: Include avx512vbmiintrin.h, avx512vbmivlintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VPMULTISHIFT. (VI1_AVX512VL): New iterator. (avx512_permvarmodemask_name): Use it. (avx512_vpermi2varmode3_maskz): Ditto. (avx512_vpermi2varmode3sd_maskz_name): Ditto. (avx512_vpermi2varmode3_mask): Ditto. (avx512_vpermt2varmode3_maskz): Ditto. (avx512_vpermt2varmode3sd_maskz_name): Ditto. (avx512_vpermt2varmode3_mask): Ditto. (vpmultishiftqbmodemask_name): Ditto. gcc/testsuite/ * g++.dg/other/i386-2.C: Add -mavx512vbmi. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/avx512f-helper.h: Add avx512vbmi-check.h. * gcc.target/i386/avx512vbmi-check.h: Ditto. * gcc.target/i386/avx512vbmi-vpermb-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermb-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermi2b-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermi2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermt2b-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermt2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c: Ditto. * gcc.target/i386/avx512vl-vpermb-2.c: Ditto. * gcc.target/i386/avx512vl-vpermi2b-2.c: Ditto. * gcc.target/i386/avx512vl-vpermt2b-2.c: Ditto. 
* gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmultishiftqb-2.c: Ditto. * gcc.target/i386/i386.exp (check_effective_target_avx512vbmi): New. * gcc.target/i386/sse-12.c: Add new options. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. ---
Re: [PATCH] Fix PR 63952 (Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4)
On Fri, Nov 21, 2014 at 2:51 AM, Ulrich Weigand uweig...@de.ibm.com wrote: Richard Biener wrote: This probably caused bootstrap on s390x-linux to fail as in PR63952 (last checked with rev. 217714). It seems we have both a back-end bug and a middle-end bug here. First of all, this code in optabs.c:prepare_cmp_insn is quite strange: if (GET_MODE_CLASS (mode) == MODE_CC) { gcc_assert (can_compare_p (comparison, CCmode, ccp_jump)); *ptest = gen_rtx_fmt_ee (comparison, VOIDmode, x, y); return; } Note that can_compare_p checks whether the back-end accepts a test RTX created via: test = gen_rtx_fmt_ee (code, mode, const0_rtx, const0_rtx); All back-end cbranchcc4 patterns however verify that the first operand of the comparison is the flags register, so a const0_rtx will never match. It doesn't seem useful to call can_compare_p with CCmode at all. The patch below changes prepare_cmp_insn do do an explicit insn_operand_matches test using the actual operands, just like is also done for non-CCmode comparisons. However, even so this is still rejected by the s390 back end. This is because the s390 cbranchcc4 pattern is really quite wrong; it is restricted to accepting only EQ/NE comparisons when it could simply accept any valid comparison (i.e. where s390_comparison is true). In addition, it has a TARGET_HARD_FLOAT check for no reason I can see, and it has custom expander code that is in all cases a no-op and results in exactly the pattern in the insn to be emitted anyway. Fixed by the patch below as well. Tested on s390x-ibm-linux (with and without --with-arch=z196). OK for mainline? Ok. Thanks, Richard. Bye, Ulrich ChangeLog: PR rtl-optimization/63952 * optabs.c (prepare_cmp_insn): Do not call can_compare_p for CCmode. * config/s390/s390.md (cbranchcc4): Accept any s390_comparison. Remove incorrect TARGET_HARD_FLOAT check and no-op expander code. 
Index: gcc/optabs.c === *** gcc/optabs.c(revision 217784) --- gcc/optabs.c(working copy) *** prepare_cmp_insn (rtx x, rtx y, enum rtx *** 4167,4174 if (GET_MODE_CLASS (mode) == MODE_CC) { ! gcc_assert (can_compare_p (comparison, CCmode, ccp_jump)); ! *ptest = gen_rtx_fmt_ee (comparison, VOIDmode, x, y); return; } --- 4167,4177 if (GET_MODE_CLASS (mode) == MODE_CC) { ! enum insn_code icode = optab_handler (cbranch_optab, CCmode); ! test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y); ! gcc_assert (icode != CODE_FOR_nothing !insn_operand_matches (icode, 0, test)); ! *ptest = test; return; } Index: gcc/config/s390/s390.md === *** gcc/config/s390/s390.md (revision 217784) --- gcc/config/s390/s390.md (working copy) *** *** 8142,8157 (define_expand cbranchcc4 [(set (pc) ! (if_then_else (match_operator 0 s390_eqne_operator [(match_operand 1 cc_reg_operand ) ! (match_operand 2 const0_operand )]) (label_ref (match_operand 3 )) (pc)))] ! TARGET_HARD_FLOAT ! s390_emit_jump (operands[3], ! s390_emit_compare (GET_CODE (operands[0]), operands[1], operands[2])); !DONE;) ! ;; --- 8142,8154 (define_expand cbranchcc4 [(set (pc) ! (if_then_else (match_operator 0 s390_comparison [(match_operand 1 cc_reg_operand ) ! (match_operand 2 const_int_operand )]) (label_ref (match_operand 3 )) (pc)))] ! ! ) ;; -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH]Add myself to MAINTAINERS
On 21/11/14 11:16, Renlin Li wrote:

Hi,

This patch is to add myself into Write After Approval section of MAINTAINERS file. Is it Okay to commit?

Regards,
Renlin Li

ChangeLog:

2014-11-21 Renlin Li renlin...@arm.com

* MAINTAINERS (Write After Approval): Add myself.

tmp.patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 56e68c5..96a7497 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -461,6 +461,7 @@ Georg-Johann Lay a...@gjlay.de
 Marc Lehmann p...@goof.com
 James Lemke jwle...@codesourcery.com
 Kriang Lerdsuwanakij lerds...@users.sourceforge.net
+Renlin Li renlin...@arm.com
 Xinliang David Li davi...@google.com
 Jiangning Liu jiangning@arm.com
 Sa Liu sa...@de.ibm.com

OK

R.
Re: [PATCH 3/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On 20 Nov 09:43, Uros Bizjak wrote:
On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

Hi,

The new revision of the Intel ISA reference [1] has new instructions: clwb, pcommit and new flavors of AVX512. The patch below adds them. I understand that stage 1 is closed; however, those changes shouldn't affect anything outside of the i386 backend and are extremely unlikely to break existing functionality. I personally think it's desirable for the newest GCC to support the newest spec. Bootstrapped/regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Please split the patch into patch series, like it was done previously for AVX512F patches.

Uros.

Done. This part adds clwb. Bootstrapped/passes make-check. Ok for trunk?

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_CLWB_UNSET, OPTION_MASK_ISA_CLWB_SET): New.
(ix86_handle_option): Handle OPT_mclwb.
* config.gcc: Add clwbintrin.h.
* config/i386/clwbintrin.h: New file.
* config/i386/cpuid.h (bit_CLWB): Define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect clwb.
* config/i386/i386-c.c (ix86_target_macros_internal): Define __CLWB__.
* config/i386/i386.c (ix86_target_string): Add -mclwb.
(PTA_CLWB): Define.
(ix86_option_override_internal): Handle new option.
(ix86_valid_target_attribute_inner_p): Add clwb.
(ix86_builtins): Add IX86_BUILTIN_CLWB.
(ix86_init_mmx_sse_builtins): Add __builtin_ia32_clwb.
(ix86_expand_builtin): Handle IX86_BUILTIN_CLWB.
* config/i386/i386.h (TARGET_CLWB, TARGET_CLWB_P): Define.
* config/i386/i386.md (unspecv): Add UNSPECV_CLWB.
(clwb): New instruction.
* config/i386/i386.opt: Add mclwb.
* config/i386/x86intrin.h: Include clwbintrin.h.

gcc/testsuite/
* g++.dg/other/i386-2.C: Add -mclwb.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/clwb-1.c: New test.
* gcc.target/i386/sse-12.c: Add new options.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
--- gcc/common/config/i386/i386-common.c | 15 +++ gcc/config.gcc | 4 +-- gcc/config/i386/clwbintrin.h | 49 ++ gcc/config/i386/cpuid.h| 1 + gcc/config/i386/driver-i386.c | 6 +++-- gcc/config/i386/i386-c.c | 2 ++ gcc/config/i386/i386.c | 23 gcc/config/i386/i386.h | 2 ++ gcc/config/i386/i386.md| 12 + gcc/config/i386/i386.opt | 4 +++ gcc/config/i386/x86intrin.h| 2 ++ gcc/testsuite/g++.dg/other/i386-2.C| 2 +- gcc/testsuite/g++.dg/other/i386-3.C| 2 +- gcc/testsuite/gcc.target/i386/clwb-1.c | 11 gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 2 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- 19 files changed, 134 insertions(+), 11 deletions(-) create mode 100644 gcc/config/i386/clwbintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/clwb-1.c diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 1c4f15e..bad0988 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE) #define OPTION_MASK_ISA_XSAVEC_SET \ (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE) +#define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -181,6 +182,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES +#define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. 
*/
@@ -901,6 +903,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;

+    case OPT_mclwb:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CLWB_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_CLWB_UNSET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_UNSET;
+	}
+      return true;
+
 /* Comes from final.c -- no real reason to change it. */
 #define MAX_CODE_ALIGN 16
diff --git a/gcc/config.gcc b/gcc/config.gcc
index da2a723..766f13b 100644
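The set/clear logic of the OPT_mclwb case can be sketched in standalone C. This is a simplified illustration, not the GCC code: struct isa_opts, handle_mclwb and the bit value of MASK_CLWB are hypothetical, and the real mask values come from GCC's generated option tables. The point it shows is that the "explicit" word records that the user mentioned the option, whether it was turned on or off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position; chosen only for the example.  */
#define MASK_CLWB (UINT64_C (1) << 3)

struct isa_opts
{
  uint64_t isa_flags;		/* extensions currently enabled */
  uint64_t isa_flags_explicit;	/* extensions the user named explicitly */
};

/* VALUE is nonzero for -mclwb and zero for -mno-clwb.  */
void
handle_mclwb (struct isa_opts *opts, int value)
{
  assert (opts != NULL);
  if (value)
    opts->isa_flags |= MASK_CLWB;	/* enable the extension */
  else
    opts->isa_flags &= ~MASK_CLWB;	/* disable it, keep other bits */

  /* In both cases remember that the choice was explicit, so later
     defaulting code does not override it.  */
  opts->isa_flags_explicit |= MASK_CLWB;
}
```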
Re: [PATCH 1/2] PR debug/38757 gcc does not emit DW_LANG_C99.
On Fri, Nov 21, 2014 at 8:56 AM, Jakub Jelinek ja...@redhat.com wrote:
On Thu, Nov 20, 2014 at 11:30:11PM +0100, Mark Wielaard wrote:

--- a/gcc/config/avr/avr-c.c
+++ b/gcc/config/avr/avr-c.c
@@ -386,7 +386,8 @@ avr_cpu_cpp_builtins (struct cpp_reader *pfile)
      (as mentioned in ISO/IEC DTR 18037; Annex F.2) which is not
      implemented in GCC up to now. */

-  if (!strcmp (lang_hooks.name, "GNU C"))
+  if (strncmp (lang_hooks.name, "GNU C", 5) == 0
+      && strncmp (lang_hooks.name, "GNU C++", 7) != 0)

I wonder if the tests for C language shouldn't be better done as

  (strncmp (lang_hooks.name, "GNU C", 5) == 0
   && strchr ("0123456789", lang_hooks.name[5]) != NULL)

or

  (strncmp (lang_hooks.name, "GNU C", 5) == 0
   && (ISDIGIT (lang_hooks.name[5]) || lang_hooks.name[5] == '\0'))

to make it explicit what we are looking for, not what we aren't. Or even make that a helper function in langhooks.[ch] lang_GNU_C (), lang_GNU_CXX ()

+     either, so for now use 0. Match GNU C++ first, since it needs to
+     be compared with strncmp, like GNU C, which has the same prefix. */
+  if (! strncmp (language_string, "GNU C++", 7)
+        || ! strcmp (language_string, "GNU Objective-C++"))

Wrong formatting, || should be below ! on the previous line.

+    i = 9;
+  else if (! strncmp (language_string, "GNU C", 5)
 	   || ! strcmp (language_string, "GNU GIMPLE")
 	   || ! strcmp (language_string, "GNU Go"))

And here too. But if you use a different check for C (see above), you could avoid moving the C++ case first.

--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -261,7 +261,8 @@ struct lang_hooks_for_lto
 struct lang_hooks
 {
-  /* String identifying the front end. e.g. GNU C++. */
+  /* String identifying the front end. e.g. GNU C++.
+     Might include language version being used. */

As we no longer have GNU C++ as any name, using it as an example is weird. So,

  /* String identifying the front end and optionally language standard
     version, e.g. GNU C++98 or GNU Java. */

? LGTM otherwise.

Yes, otherwise looks good.

Thanks,
Richard.

Jakub
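The helper suggested in the review can be sketched as a standalone predicate. This is only an illustration under assumptions: the names lang_is_gnu_c and lang_is_gnu_cxx are hypothetical (the mail proposes lang_GNU_C () and lang_GNU_CXX () but does not show their bodies), and the standard isdigit is used in place of GCC's ISDIGIT macro.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Nonzero for "GNU C" optionally followed by a standard version such as
   "GNU C99" or "GNU C11"; zero for "GNU C++..." and anything else.  */
int
lang_is_gnu_c (const char *name)
{
  assert (name != NULL);
  /* name[5] is only read once the first five characters matched, so it
     is at worst the terminating NUL.  */
  return strncmp (name, "GNU C", 5) == 0
	 && (isdigit ((unsigned char) name[5]) || name[5] == '\0');
}

/* Nonzero for any "GNU C++" variant, with or without a version suffix.  */
int
lang_is_gnu_cxx (const char *name)
{
  assert (name != NULL);
  return strncmp (name, "GNU C++", 7) == 0;
}
```

Stating what is accepted (a digit or the end of the string after the prefix) means the C check no longer depends on testing for C++ first.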
Re: [PATCH][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On 20 Nov 09:43, Uros Bizjak wrote:
On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

Hi,

The new revision of the Intel ISA reference [1] has new instructions: clwb, pcommit and new flavors of AVX512. The patch below adds them. I understand that stage 1 is closed; however, those changes shouldn't affect anything outside of the i386 backend and are extremely unlikely to break existing functionality. I personally think it's desirable for the newest GCC to support the newest spec. Bootstrapped/regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Please split the patch into patch series, like it was done previously for AVX512F patches.

Uros.

[1]:https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

This part adds pcommit. Bootstraps/passes make check. Ok for trunk?

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_PCOMMIT_UNSET, OPTION_MASK_ISA_PCOMMIT_SET): New.
(ix86_handle_option): Handle OPT_mpcommit.
* config.gcc: Add pcommitintrin.h.
* config/i386/pcommitintrin.h: New file.
* config/i386/cpuid.h (bit_PCOMMIT): Define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect pcommit.
* config/i386/i386-c.c (ix86_target_macros_internal): Define __PCOMMIT__.
* config/i386/i386.c (ix86_target_string): Add -mpcommit.
(PTA_PCOMMIT): Define.
(ix86_option_override_internal): Handle new option.
(ix86_valid_target_attribute_inner_p): Add pcommit.
(ix86_builtins): Add IX86_BUILTIN_PCOMMIT.
(bdesc_special_args): Add __builtin_ia32_pcommit.
* config/i386/i386.h (TARGET_PCOMMIT, TARGET_PCOMMIT_P): Define.
* config/i386/i386.md (unspecv): Add UNSPECV_PCOMMIT.
(pcommit): New instruction.
* config/i386/i386.opt: Add mpcommit.
* config/i386/x86intrin.h: Include pcommitintrin.h.
--- gcc/common/config/i386/i386-common.c | 15 ++ gcc/config.gcc| 4 +-- gcc/config/i386/cpuid.h | 1 + gcc/config/i386/driver-i386.c | 5 +++- gcc/config/i386/i386-c.c | 2 ++ gcc/config/i386/i386.c| 12 gcc/config/i386/i386.h| 2 ++ gcc/config/i386/i386.md | 10 +++ gcc/config/i386/i386.opt | 4 +++ gcc/config/i386/pcommitintrin.h | 49 +++ gcc/config/i386/x86intrin.h | 2 ++ gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/pcommit-1.c | 11 +++ gcc/testsuite/gcc.target/i386/sse-12.c| 2 +- gcc/testsuite/gcc.target/i386/sse-13.c| 2 +- gcc/testsuite/gcc.target/i386/sse-14.c| 2 +- gcc/testsuite/gcc.target/i386/sse-22.c| 2 +- gcc/testsuite/gcc.target/i386/sse-23.c| 2 +- 19 files changed, 121 insertions(+), 10 deletions(-) create mode 100644 gcc/config/i386/pcommitintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/pcommit-1.c diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index bad0988..2e09d77 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_XSAVEC_SET \ (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE) #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB +#define OPTION_MASK_ISA_PCOMMIT_SET OPTION_MASK_ISA_PCOMMIT /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -182,6 +183,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES +#define OPTION_MASK_ISA_PCOMMIT_UNSET OPTION_MASK_ISA_PCOMMIT #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. 
-mno-sse4 should the same
@@ -903,6 +905,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;

+    case OPT_mpcommit:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PCOMMIT_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PCOMMIT_UNSET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_UNSET;
+	}
+      return true;
+
     case OPT_mclwb:
       if (value)
 	{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 766f13b..fa3e1fc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -369,7 +369,7 @@ i[34567]86-*-*)
 		xsavesintrin.h avx512dqintrin.h avx512bwintrin.h avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
Re: [PATCH] Detect a pack-unpack pattern in GCC vectorizer and optimize it.
Hi,

Please note that currently the test:

int a[N];
short b[N*2];
for (int i = 0; i < N; ++i)
  a[i] = b[i*2];

is compiled to (with -march=corei7 -O2 -ftree-vectorize):

movdqa b(%rax), %xmm0
movdqa b-16(%rax), %xmm2
pand %xmm1, %xmm0
pand %xmm1, %xmm2
packusdw %xmm2, %xmm0
pmovsxwd %xmm0, %xmm2
psrldq $8, %xmm0
pmovsxwd %xmm0, %xmm0
movaps %xmm2, a-32(%rax)
movaps %xmm0, a-16(%rax)

which is closer to the requested sequence.

Thanks,
Evgeny

On Wed, Jun 25, 2014 at 8:34 PM, Cong Hou co...@google.com wrote:
On Tue, Jun 24, 2014 at 4:05 AM, Richard Biener richard.guent...@gmail.com wrote:
On Sat, May 3, 2014 at 2:39 AM, Cong Hou co...@google.com wrote:
On Mon, Apr 28, 2014 at 4:04 AM, Richard Biener rguent...@suse.de wrote:
On Thu, 24 Apr 2014, Cong Hou wrote:

Given the following loop:

int a[N];
short b[N*2];
for (int i = 0; i < N; ++i)
  a[i] = b[i*2];

After being vectorized, the access to b[i*2] will be compiled into several packing statements, while the type promotion from short to int will be compiled into several unpacking statements. With this patch, each pair of pack/unpack statements will be replaced by less expensive statements (with shift or bit-and operations).

On x86_64, the loop above will be compiled into the following assembly (with -O2 -ftree-vectorize):

movdqu 0x10(%rcx),%xmm3
movdqu -0x20(%rcx),%xmm0
movdqa %xmm0,%xmm2
punpcklwd %xmm3,%xmm0
punpckhwd %xmm3,%xmm2
movdqa %xmm0,%xmm3
punpcklwd %xmm2,%xmm0
punpckhwd %xmm2,%xmm3
movdqa %xmm1,%xmm2
punpcklwd %xmm3,%xmm0
pcmpgtw %xmm0,%xmm2
movdqa %xmm0,%xmm3
punpckhwd %xmm2,%xmm0
punpcklwd %xmm2,%xmm3
movups %xmm0,-0x10(%rdx)
movups %xmm3,-0x20(%rdx)

With this patch, the generated assembly is shown below:

movdqu 0x10(%rcx),%xmm0
movdqu -0x20(%rcx),%xmm1
pslld $0x10,%xmm0
psrad $0x10,%xmm0
pslld $0x10,%xmm1
movups %xmm0,-0x10(%rdx)
psrad $0x10,%xmm1
movups %xmm1,-0x20(%rdx)

Bootstrapped and tested on x86-64. OK for trunk?

This is an odd place to implement such a transform.
Also if it is faster or not depends on the exact ISA you target - for example ppc has constraints on the maximum number of shifts carried out in parallel and the above has 4 in very short succession. Esp. for the sign-extend path.

Thank you for the information about ppc. If this is an issue, I think we can do it in a target dependent way.

So this looks more like an opportunity for a post-vectorizer transform on RTL or for the vectorizer special-casing widening loads with a vectorizer pattern.

I am not sure if the RTL transform is more difficult to implement. I prefer the widening loads method, which can be detected in a pattern recognizer. The target related issue will be resolved by only expanding the widening load on those targets where this pattern is beneficial. But this requires new tree operations to be defined. What is your suggestion? I apologize for the delayed reply.

Likewise ;) I suggest to implement this optimization in vector lowering in tree-vect-generic.c. This sees for your example

vect__5.7_32 = MEM[symbol: b, index: ivtmp.15_13, offset: 0B];
vect__5.8_34 = MEM[symbol: b, index: ivtmp.15_13, offset: 16B];
vect_perm_even_35 = VEC_PERM_EXPR <vect__5.7_32, vect__5.8_34, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
vect__6.9_37 = [vec_unpack_lo_expr] vect_perm_even_35;
vect__6.9_38 = [vec_unpack_hi_expr] vect_perm_even_35;

where you can apply the pattern matching and transform (after checking with the target, of course).

This sounds good to me! I'll try to make a patch following your suggestion. Thank you!

Cong

Richard.

thanks,
Cong

Richard.
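The equivalence behind the pslld/psrad sequence in this thread can be checked with scalar C. The sketch below is an illustration, not the vectorizer's code: the function names are hypothetical, each 32-bit lane is built by hand in little-endian order so the example does not depend on the host byte order, and it assumes that converting an out-of-range unsigned value to int32_t wraps and that >> on a negative int32_t is an arithmetic shift (both hold for GCC).

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of the loop: a[i] = b[i*2], with the usual
   promotion from short to int.  */
void
widen_even_reference (int32_t *a, const int16_t *b, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    a[i] = b[i * 2];
}

/* Same result with shifts only.  Each pair of shorts is viewed as one
   32-bit lane (even element in the low half).  Shifting left by 16
   discards the odd element; the arithmetic shift right by 16 then
   sign-extends the even one.  */
void
widen_even_shifts (int32_t *a, const int16_t *b, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      uint32_t lane = (uint32_t) (uint16_t) b[i * 2]
		      | ((uint32_t) (uint16_t) b[i * 2 + 1] << 16);
      a[i] = (int32_t) (lane << 16) >> 16;
    }
}
```

The shift form needs no permutation of the input vectors, which is why the patched assembly is so much shorter than the pack/unpack version.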
Re: [ia64 PATCH] Fix up ia64 attribute handling (PR target/61137)
On Fri, Nov 21, 2014 at 12:01 PM, Andreas Schwab sch...@suse.de wrote: Jakub Jelinek ja...@redhat.com writes: The following untested patch fixes that (tested on small-addr-1.c with a cross-compiler), I don't have ia64 hw nor spare cycles to test this though, so I'm just offering the patch as is if anyone wants to test it. Perhaps better testsuite coverage wouldn't hurt (test the model (small) attribute also in C++, perhaps test the common_object attribute on VMS?). 2014-11-20 Jakub Jelinek ja...@redhat.com PR target/61137 * config/ia64/ia64.c (ia64_attribute_takes_identifier_p): New function. (TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P): Redefine to it. Looks good. http://gcc.gnu.org/ml/gcc-testresults/2014-11/msg02276.html Ok. Thanks, Richard. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PATCH] PR ipa/63909 ICE: SIGSEGV in ipa_icf_gimple::func_checker::compare_bb()
On 11/20/2014 05:41 PM, Richard Biener wrote: On Thu, Nov 20, 2014 at 5:30 PM, Martin Liška mli...@suse.cz wrote: Hello. Following patch fixes ICE in IPA ICF. Problem was that number of non-debug statements in a BB can change (for instance by IPA split), so that the number is recomputed. Huh, so can it get different for both candidates? I think the stmt compare loop should be terminated on gsi_end_p of either iterator and return false for any remaining non-debug-stmts on the other. Thus, not walk all stmts twice here. Hello. Sorry for the previous patch, you are right it can be fixed in purer way. Please take a look at attached patch. As IPA split is run early I don't see how it should affect a real IPA pass though? Sorry for non precise information, the problematic BB is changed here: #0 gsi_split_seq_before (i=0x7fffd550, pnew_seq=0x7fffd528) at ../../gcc/gimple-iterator.c:429 #1 0x00b95a2a in gimple_split_block (bb=0x76c41548, stmt=0x0) at ../../gcc/tree-cfg.c:5707 #2 0x007563cf in split_block (bb=0x76c41548, i=i@entry=0x0) at ../../gcc/cfghooks.c:508 #3 0x00756b44 in split_block_after_labels (bb=optimized out) at ../../gcc/cfghooks.c:549 #4 make_forwarder_block (bb=optimized out, redirect_edge_p=redirect_edge_p@entry=0x75d4e0 mfb_keep_just(edge_def*), new_bb_cbk=new_bb_cbk@entry=0x0) at ../../gcc/cfghooks.c:842 #5 0x0076085a in create_preheader (loop=0x76d56948, flags=optimized out) at ../../gcc/cfgloopmanip.c:1563 #6 0x00760aea in create_preheaders (flags=1) at ../../gcc/cfgloopmanip.c:1613 #7 0x009bc6b0 in apply_loop_flags (flags=15) at ../../gcc/loop-init.c:75 #8 0x009bc7d3 in loop_optimizer_init (flags=15) at ../../gcc/loop-init.c:136 #9 0x00957914 in estimate_function_body_sizes (node=0x76c47620, early=false) at ../../gcc/ipa-inline-analysis.c:2480 #10 0x0095948b in compute_inline_parameters (node=0x76c47620, early=false) at ../../gcc/ipa-inline-analysis.c:2907 #11 0x0095bd88 in inline_analyze_function (node=0x76c47620) at 
../../gcc/ipa-inline-analysis.c:3994 #12 0x0095bed3 in inline_generate_summary () at ../../gcc/ipa-inline-analysis.c:4045 #13 0x00a70b71 in execute_ipa_summary_passes (ipa_pass=0x1dcb9e0) at ../../gcc/passes.c:2137 #14 0x00777a15 in ipa_passes () at ../../gcc/cgraphunit.c:2074 #15 symbol_table::compile (this=this@entry=0x76c3a000) at ../../gcc/cgraphunit.c:2187 #16 0x00778bcd in symbol_table::finalize_compilation_unit (this=0x76c3a000) at ../../gcc/cgraphunit.c:2340 #17 0x006580ee in c_write_global_declarations () at ../../gcc/c/c-decl.c:10777 #18 0x00b5bb8b in compile_file () at ../../gcc/toplev.c:584 #19 0x00b5def1 in do_compile () at ../../gcc/toplev.c:2041 #20 0x00b5e0fa in toplev::main (this=0x7fffdc9f, argc=20, argv=0x7fffdd98) at ../../gcc/toplev.c:2138 #21 0x0063f1d9 in main (argc=20, argv=0x7fffdd98) at ../../gcc/main.c:38 Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin Thanks, Richard. Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin From 09b90f6a5ec1e49464f57c333af43574ad8c1375 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Thu, 20 Nov 2014 16:28:54 +0100 Subject: [PATCH] Fix and new test. gcc/ChangeLog: 2014-11-21 Martin Liska mli...@suse.cz * gimple-iterator.h (gsi_start_bb_nondebug): New function. * ipa-icf-gimple.c (func_checker::compare_bb): Correct iteration replaces loop based on precomputed number of non-debug statements. gcc/testsuite/ChangeLog: 2014-11-21 Martin Liska mli...@suse.cz * gcc.dg/ipa/pr63909.c: New test. 
---
 gcc/gimple-iterator.h | 13 +
 gcc/ipa-icf-gimple.c | 25 ++---
 gcc/testsuite/gcc.dg/ipa/pr63909.c | 27 +++
 3 files changed, 50 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr63909.c

diff --git a/gcc/gimple-iterator.h b/gcc/gimple-iterator.h
index fb6cc07..e9602b3 100644
--- a/gcc/gimple-iterator.h
+++ b/gcc/gimple-iterator.h
@@ -211,6 +211,19 @@ gsi_stmt (gimple_stmt_iterator i)
   return i.ptr;
 }

+/* Return a new iterator pointing to the first non-debug statement
+   in basic block BB. */
+
+static inline gimple_stmt_iterator
+gsi_start_bb_nondebug (basic_block bb)
+{
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  while (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi)))
+    gsi_next (&gsi);
+
+  return gsi;
+}
+
 /* Return a block statement iterator that points to the first non-label
    statement in block BB. */

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 8f2a438..ec0290a 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -559,24 +559,16 @@ func_checker::parse_labels (sem_bb *bb)
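The iteration scheme requested in the review (walk both blocks in lockstep, skip debug statements, stop when either side runs out) can be sketched on plain arrays. This is a simplified model and not the ipa-icf code: struct stmt, skip_debug and blocks_equal are hypothetical names, and a statement is reduced to an integer code plus a debug flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement: CODE identifies the operation, IS_DEBUG marks
   statements that must not influence the comparison.  */
struct stmt
{
  int code;
  int is_debug;
};

/* Return the first index at or after I that is not a debug statement,
   or N when no such statement is left.  */
size_t
skip_debug (const struct stmt *s, size_t n, size_t i)
{
  assert (i <= n);
  while (i < n && s[i].is_debug)
    i++;
  return i;
}

/* Compare two blocks statement by statement, ignoring debug statements.
   No precomputed statement count is needed.  */
int
blocks_equal (const struct stmt *a, size_t na,
	      const struct stmt *b, size_t nb)
{
  size_t i = skip_debug (a, na, 0);
  size_t j = skip_debug (b, nb, 0);

  while (i < na && j < nb)
    {
      if (a[i].code != b[j].code)
	return 0;
      i = skip_debug (a, na, i + 1);
      j = skip_debug (b, nb, j + 1);
    }

  /* Equal only when both sides ran out together; a leftover non-debug
     statement on either side means the blocks differ.  */
  return i == na && j == nb;
}
```

Because the loop never relies on a statement count computed earlier, it stays correct even if a block was changed between analysis and comparison.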
Re: [PATCH 8/9] Negative numbers added for sreal class.
On Fri, Nov 21, 2014 at 12:21 PM, Martin Liška mli...@suse.cz wrote: On 11/14/2014 11:48 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 1:35 PM, mliska mli...@suse.cz wrote: gcc/ChangeLog: 2014-11-13 Martin Liska mli...@suse.cz * predict.c (propagate_freq): More elegant sreal API is used. (estimate_bb_frequencies): New static constants defined by sreal replace precomputed ones. * sreal.c (sreal::normalize): New function. (sreal::to_int): Likewise. (sreal::operator+): Likewise. (sreal::operator-): Likewise. * sreal.h: Definition of new functions added. Please use gcc_checking_assert()s everywhere. sreal is supposed to be fast... (I see it has current uses of gcc_assert - you may want to mass-convert them as a followup). --- gcc/predict.c | 30 +++- gcc/sreal.c | 56 gcc/sreal.h | 75 --- 3 files changed, 126 insertions(+), 35 deletions(-) diff --git a/gcc/predict.c b/gcc/predict.c index 0215e91..0f640f5 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -82,7 +82,7 @@ along with GCC; see the file COPYING3. If not see /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE, 1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. 
*/ -static sreal real_zero, real_one, real_almost_one, real_br_prob_base, +static sreal real_almost_one, real_br_prob_base, real_inv_br_prob_base, real_one_half, real_bb_freq_max; static void combine_predictions_for_insn (rtx_insn *, basic_block); @@ -2528,13 +2528,13 @@ propagate_freq (basic_block head, bitmap tovisit) bb-count = bb-frequency = 0; } - BLOCK_INFO (head)-frequency = real_one; + BLOCK_INFO (head)-frequency = sreal::one (); last = head; for (bb = head; bb; bb = nextbb) { edge_iterator ei; - sreal cyclic_probability = real_zero; - sreal frequency = real_zero; + sreal cyclic_probability = sreal::zero (); + sreal frequency = sreal::zero (); nextbb = BLOCK_INFO (bb)-next; BLOCK_INFO (bb)-next = NULL; @@ -2559,13 +2559,13 @@ propagate_freq (basic_block head, bitmap tovisit) * BLOCK_INFO (e-src)-frequency / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (e-src)-frequency; tmp *= real_inv_br_prob_base; frequency += tmp; } - if (cyclic_probability == real_zero) + if (cyclic_probability == sreal::zero ()) { BLOCK_INFO (bb)-frequency = frequency; } @@ -2577,7 +2577,7 @@ propagate_freq (basic_block head, bitmap tovisit) /* BLOCK_INFO (bb)-frequency = frequency / (1 - cyclic_probability) */ - cyclic_probability = real_one - cyclic_probability; + cyclic_probability = sreal::one () - cyclic_probability; BLOCK_INFO (bb)-frequency = frequency / cyclic_probability; } } @@ -2591,7 +2591,7 @@ propagate_freq (basic_block head, bitmap tovisit) = ((e-probability * BLOCK_INFO (bb)-frequency) / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (bb)-frequency; EDGE_INFO (e)-back_edge_prob = tmp * real_inv_br_prob_base; } @@ -2873,13 +2873,11 @@ estimate_bb_frequencies (bool force) if (!real_values_initialized) { real_values_initialized = 1; - real_zero = sreal (0, 0); - real_one = sreal (1, 0); - real_br_prob_base = sreal (REG_BR_PROB_BASE, 0); - real_bb_freq_max = sreal 
(BB_FREQ_MAX, 0); + real_br_prob_base = REG_BR_PROB_BASE; + real_bb_freq_max = BB_FREQ_MAX; real_one_half = sreal (1, -1); - real_inv_br_prob_base = real_one / real_br_prob_base; - real_almost_one = real_one - real_inv_br_prob_base; + real_inv_br_prob_base = sreal::one () / real_br_prob_base; + real_almost_one = sreal::one () - real_inv_br_prob_base; } mark_dfs_back_edges (); @@ -2897,7 +2895,7 @@ estimate_bb_frequencies (bool force) FOR_EACH_EDGE (e, ei, bb-succs) { - EDGE_INFO (e)-back_edge_prob = sreal (e-probability, 0); + EDGE_INFO (e)-back_edge_prob = e-probability; EDGE_INFO (e)-back_edge_prob *= real_inv_br_prob_base; } } @@ -2906,7 +2904,7 @@ estimate_bb_frequencies (bool force) to outermost to examine frequencies for back edges. */
Re: [PATCH]Add myself to MAINTAINERS
On 2014.11.21 at 11:42 +, Richard Earnshaw wrote:
On 21/11/14 11:16, Renlin Li wrote:
Hi,
This patch adds me to the Write After Approval section of the MAINTAINERS file. Is it okay to commit?

OK

There is no need to ask for permission in this case:
http://gcc.gnu.org/svnwrite.html#authenticated
Once you get a gcc.gnu.org account you could just add yourself.

--
Markus
FW: [Aarch64][BE][2/2] Fix vector load/stores to not use ld1/st1
On 20/11/2014 18:13, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 14 November 2014 16:48, Alan Hayward alan.hayw...@arm.com wrote: This is a new version of my BE patch from a few weeks ago. This is part 2 and covers all the aarch64 changes. When combined with the first patch, It fixes up movoi/ci/xi for Big Endian, so that we end up with the lab of a big-endian integer to be in the low byte of the highest-numbered register. This patch requires part 1 and David Sherwood’s patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. When tested with David’s patch and [1/2] of this patch, no regressions were seen when testing aarch64 and x86_64 on make check. Changelog: 2014-11-14 Alan Hayward alan.hayw...@arm.com * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): new operand for printing a q register+1. Just a bunch of ChangeLog nits. +void aarch64_simd_emit_reg_reg_move (rtx *operands, enum machine_mode mode, + unsigned int count); Drop the formal argument names. Can you respin with these changes please. /Marcus New version. Identical to previous version of the patch except for: * removal of parameter names in aarch64-protos.h * new changelog 2014-11-21 Alan Hayward alan.hayw...@arm.com PR 57233 PR 59810 * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): New operand for printing a q register+1. (aarch64_simd_emit_reg_reg_move): Define. (aarch64_simd_disambiguate_copy): Remove. * config/aarch64/aarch64-protos.h (aarch64_simd_emit_reg_reg_move): Define. (aarch64_simd_disambiguate_copy): Remove. * config/aarch64/aarch64-simd.md (define_split): Use aarch64_simd_emit_reg_reg_move. (define_expand movmode): Less restrictive predicates. (define_insn *aarch64_movmode): Simplify and only allow for LE. (define_insn *aarch64_be_movoi): Define. (define_insn *aarch64_be_movci): Define. (define_insn *aarch64_be_movxi): Define. 
(define_split): OI mov. Use aarch64_simd_emit_reg_reg_move. (define_split): CI mov. Use aarch64_simd_emit_reg_reg_move. (define_split): XI mov. Use aarch64_simd_emit_reg_reg_move. Alan. 0001-BE-fix-load-stores.-Aarch64-code.-v2.patch Description: Binary data
Re: [PATCH 3/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On Fri, Nov 21, 2014 at 12:45 PM, Ilya Tocar tocarip.in...@gmail.com wrote: On 20 Nov 09:43, Uros Bizjak wrote: On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside if i386 backend. And are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. Ok for trunk? Please split the patch into patch series, like it was done previously for AVX512F patches. Uros. Done. This part adds clwb. Bootstrapped/passes make-check. Ok for trunk? gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_CLWB_UNSET, OPTION_MASK_ISA_CLWB_SET): New. (ix86_handle_option): Handle OPT_mclwb. * config.gcc: Add clwbintrin.h. * config/i386/clwbintrin.h: New file. * config/i386/cpuid.h (bit_CLWB): Define. * config/i386/driver-i386.c (host_detect_local_cpu): Detect clwb. * config/i386/i386-c.c (ix86_target_macros_internal): Define __CLWB__. * config/i386/i386.c (ix86_target_string): Add -mclwb. (PTA_CLWB): Define. (ix86_option_override_internal): Handle new option. (ix86_valid_target_attribute_inner_p): Add clwb. (ix86_builtins): Add IX86_BUILTIN_CLWB. (ix86_init_mmx_sse_builtins): Add __builtin_ia32_clwb. (ix86_expand_builtin): Handle IX86_BUILTIN_CLWB. * config/i386/i386.h (TARGET_CLWB, TARGET_CLWB_P): Define. * config/i386/i386.md (unspecv): Add UNSPECV_CLWB. (clwb): New instruction. * config/i386/i386.opt: Add mclwb. * config/i386/x86intrin.h: Include clwbintrin.h. gcc/testsuite/ * g++.dg/other/i386-2.C: Add -mclwb. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/clwb-1.c: New test. * gcc.target/i386/sse-12.c: Add new options. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. 
* gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. OK. Thanks, Uros. --- gcc/common/config/i386/i386-common.c | 15 +++ gcc/config.gcc | 4 +-- gcc/config/i386/clwbintrin.h | 49 ++ gcc/config/i386/cpuid.h| 1 + gcc/config/i386/driver-i386.c | 6 +++-- gcc/config/i386/i386-c.c | 2 ++ gcc/config/i386/i386.c | 23 gcc/config/i386/i386.h | 2 ++ gcc/config/i386/i386.md| 12 + gcc/config/i386/i386.opt | 4 +++ gcc/config/i386/x86intrin.h| 2 ++ gcc/testsuite/g++.dg/other/i386-2.C| 2 +- gcc/testsuite/g++.dg/other/i386-3.C| 2 +- gcc/testsuite/gcc.target/i386/clwb-1.c | 11 gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 2 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- 19 files changed, 134 insertions(+), 11 deletions(-) create mode 100644 gcc/config/i386/clwbintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/clwb-1.c diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 1c4f15e..bad0988 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_XSAVE) #define OPTION_MASK_ISA_XSAVEC_SET \ (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE) +#define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -181,6 +182,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES +#define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. 
*/ @@ -901,6 +903,19 @@ ix86_handle_option (struct gcc_options *opts, } return true; +case OPT_mclwb: + if (value) + { + opts-x_ix86_isa_flags |= OPTION_MASK_ISA_CLWB_SET; + opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_SET; + } + else + { + opts-x_ix86_isa_flags = ~OPTION_MASK_ISA_CLWB_UNSET; + opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLWB_UNSET; + } +
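The OPT_mclwb case above follows the usual pattern for an ISA option: the feature bit is set or cleared in the active flags, and in both cases it is recorded as explicitly chosen by the user. A minimal sketch of that pattern is given here. The struct isa_opts, the function handle_clwb and the bit value of MASK_CLWB are hypothetical, and the else branch is assumed to clear the bit with `&= ~mask`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only.  */
#define MASK_CLWB (UINT64_C (1) << 0)

/* Hypothetical stand-in for the two fields the option handler updates.  */
struct isa_opts
{
  uint64_t isa_flags;          /* features currently enabled */
  uint64_t isa_flags_explicit; /* features the user named on the command line */
};

/* value is true for -mclwb and false for -mno-clwb.  */
void
handle_clwb (struct isa_opts *opts, bool value)
{
  assert (opts != NULL);
  if (value)
    opts->isa_flags |= MASK_CLWB;
  else
    opts->isa_flags &= ~MASK_CLWB;
  /* Either way the user made an explicit choice for this feature.  */
  opts->isa_flags_explicit |= MASK_CLWB;
}
```

Keeping the explicit mask separate lets later option processing tell a default setting apart from one the user asked for.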
Re: [PATCH] VRP: don't assume strict overflow semantics when checking if a loop wraps
On Fri, Nov 21, 2014 at 12:29 PM, Patrick Palka patr...@parcs.ath.cx wrote: When adjusting the value range of an induction variable using SCEV, VRP calls scev_probably_wraps_p() with use_overflow_semantics=true. This parameter set to true makes scev_probably_wraps_p() assume that signed induction variables never wrap, so for these variables it always returns false (when strict overflow rules are in effect). This is wrong because if a signed induction variable really does overflow then we want to give it an INF(OVF) value range and not the (finite) estimation returned by SCEV. While this change shouldn't make a difference in code generation, it should help improve the coverage of -Wstrict-overflow warnings on induction variables like in the test case. OK after bootstrap + regtest on x86_64-unknown-linux-gnu? Hmm, I don't think the change won't affect code-generation. In fact we check for overflow ourselves in the most interesting case (the first block) - only the path adjusting min/max based on the init value and the max value of the type needs to know whether overflow may happen and fail or drop to +-INF(OVF). So I'd rather open-code the relevant cases and not call scev_probably_wraps_p at all. Richard. gcc/ * tree-vrp.c (adjust_range_with_scev): Call scev_probably_wraps_p with use_overflow_semantics=false. gcc/testsuite/ * gcc.dg/Wstrict-overflow-27.c: New test. --- gcc/testsuite/gcc.dg/Wstrict-overflow-27.c | 22 ++ gcc/tree-vrp.c | 2 +- 2 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/Wstrict-overflow-27.c diff --git a/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c new file mode 100644 index 000..c1f27ab --- /dev/null +++ b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -fstrict-overflow -O2 -Wstrict-overflow } */ + +/* Warn about an overflow when folding i 0. 
*/ + +void bar (unsigned *p); + +int +foo (unsigned *p) +{ + int i; + int sum = 0; + + for (i = 0; i *p; i++) +{ + if (i 0) /* { dg-warning signed overflow } */ + sum += 2; + bar (p); +} + + return sum; +} diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index a75138f..bf9ff61 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4270,7 +4270,7 @@ adjust_range_with_scev (value_range_t *vr, struct loop *loop, dir == EV_DIR_UNKNOWN /* ... or if it may wrap. */ || scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec), - true)) + /*use_overflow_semantics=*/false)) return; /* We use TYPE_MIN_VALUE and TYPE_MAX_VALUE here instead of -- 2.2.0.rc1.23.gf570943
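Richard's suggestion in this thread is to open-code the overflow check rather than call scev_probably_wraps_p. The sketch below is not that implementation. It only illustrates, under the assumption of a 32-bit int and a known iteration count, how one could decide whether a signed induction variable leaves the range of its type. The function name signed_iv_may_wrap is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if init + step * niter falls outside the range of int.
   The computation is done in 64 bits, which cannot overflow for any
   32-bit init and step and any 32-bit unsigned iteration count.  */
bool
signed_iv_may_wrap (int init, int step, uint32_t niter)
{
  /* The sketch relies on int being 32 bits wide.  */
  assert (sizeof (int) * CHAR_BIT == 32);

  int64_t last = (int64_t) init + (int64_t) step * (int64_t) niter;
  return last < INT_MIN || last > INT_MAX;
}
```

For the loop in the test case, where i starts at 0, steps by 1 and is bounded by an unsigned value, any bound above INT_MAX makes the function return true, which is the situation the proposed warning is meant to catch.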
Re: libsanitizer merge from upstream r221802
On Thu, Nov 13, 2014 at 12:16 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Nov 12, 2014 at 05:35:48PM -0800, Konstantin Serebryany wrote: Here is one more merge of libsanitizer (last one was in Sept). Tested on x86_64 Ubuntu 14.04 like this: rm -rf */{*/,}libsanitizer make -j 50 make -j 40 -C gcc check-g{cc,++} RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' \ make -j 40 -C gcc check-g{cc,++} RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp' \ make -j 40 -C gcc check RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' \ echo PASS Expected ChangeLog entry: 2014-11-12 Kostya Serebryany k...@google.com * All source files: Merge from upstream r221802. * sanitizer_common/sanitizer_symbolizer_libbacktrace.cc (LibbacktraceSymbolizer::SymbolizeData): replace 'address' with 'start' to follow the new interface. Capital R in Replace. All lines are indented by single tab, not tab and two spaces. * asan/Makefile.am (AM_CXXFLAGS): added -std=c++11. Capital A in Added. Also, I wonder if we shouldn't use -std=gnu++11 instead. As the sources are compiled by newly built compiler, it should be generally fine to use extensions in there. * interception/Makefile.am (AM_CXXFLAGS): added -std=c++11. * libbacktrace/Makefile.am (AM_CXXFLAGS): added -std=c++11. * lsan/Makefile.am (AM_CXXFLAGS): added -std=c++11. * sanitizer_common/Makefile.am (sanitizer_common_files): Added new files. (AM_CXXFLAGS): added -std=c++11. * tsan/Makefile.am (AM_CXXFLAGS): added -std=c++11. * ubsan/Makefile.am (AM_CXXFLAGS): added -std=c++11. Ditto. * asan/Makefile.in: Regenerate. * interception/Makefile.in: Regenerate. * libbacktrace/Makefile.in: Regenerate. * lsan/Makefile.in: Regenerate. * sanitizer_common/Makefile.in: Regenerate. * tsan/Makefile.in: Regenerate. * ubsan/Makefile.in: Regenerate. Other than that, it looks good to me, I've bootstrapped/regtested it on x86_64-linux and i686-linux too. So, with those changes ok for trunk (how do you decide about c++11 vs. 
gnu++11 I'll leave to you). A few questions regarding possible changes on the compiler side: 1) is __asan_poison_intra_object_redzone/__asan_unpoison_intra_object_redzone just for the ABI incompatible putting of red zones in between fields in structures? How do you handle whole struct copying in that case? Could it be done without changing ABI for a subset of structs which have natural padding in them? 2) regarding the tsan memory layout changes, is it now possible to support non-pie binaries? If yes, we should probably remove the: %{!pie:%{!shared:%e-fsanitize=thread linking must be done with -pie or -shared}}}\ and add testcases that would test that. Hi Jakub, Yes, I think it's the way to go. I've just committed the following revision to clang that removes -pie when compiling with tsan: http://llvm.org/viewvc/llvm-project?view=revisionrevision=222526 The tests in llvm tree pass with this change.
Re: [PATCH][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On Fri, Nov 21, 2014 at 12:50 PM, Ilya Tocar tocarip.in...@gmail.com wrote: On 20 Nov 09:43, Uros Bizjak wrote: On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside if i386 backend. And are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. Ok for trunk? Please split the patch into patch series, like it was done previously for AVX512F patches. Uros. [1]:https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf This part adds pcommit. Bootstrapps/passes make check. Ok for trunk? gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_PCOMMIT_UNSET, OPTION_MASK_ISA_PCOMMIT_SET): New. (ix86_handle_option): Handle OPT_mpcommit. * config.gcc: Add pcommitintrin.h * config/i386/pcommitintrin.h: New file. * config/i386/cpuid.h (bit_PCOMMIT): Define. * config/i386/driver-i386.c (host_detect_local_cpu): Detect pcommit. * config/i386/i386-c.c (ix86_target_macros_internal): Define __PCOMMIT__. * config/i386/i386.c (ix86_target_string): Add -mpcommit. (PTA_PCOMMIT): Define. (ix86_option_override_internal): Handle new option. (ix86_valid_target_attribute_inner_p): Add pcommit. (ix86_builtins): Add IX86_BUILTIN_PCOMMIT. (bdesc_special_args): Add __builtin_ia32_pcommit. * config/i386/i386.h (TARGET_PCOMMIT, TARGET_PCOMMIT_P): Define. * config/i386/i386.md (unspecv): Add UNSPECV_PCOMMIT. (pcommit): New instruction. * config/i386/i386.opt: Add mpcommit. * config/i386/x86intrin.h: Include pcommitintrin.h. OK with a small typo fix below. Thanks, Uros. 
--- gcc/common/config/i386/i386-common.c | 15 ++ gcc/config.gcc| 4 +-- gcc/config/i386/cpuid.h | 1 + gcc/config/i386/driver-i386.c | 5 +++- gcc/config/i386/i386-c.c | 2 ++ gcc/config/i386/i386.c| 12 gcc/config/i386/i386.h| 2 ++ gcc/config/i386/i386.md | 10 +++ gcc/config/i386/i386.opt | 4 +++ gcc/config/i386/pcommitintrin.h | 49 +++ gcc/config/i386/x86intrin.h | 2 ++ gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/pcommit-1.c | 11 +++ gcc/testsuite/gcc.target/i386/sse-12.c| 2 +- gcc/testsuite/gcc.target/i386/sse-13.c| 2 +- gcc/testsuite/gcc.target/i386/sse-14.c| 2 +- gcc/testsuite/gcc.target/i386/sse-22.c| 2 +- gcc/testsuite/gcc.target/i386/sse-23.c| 2 +- 19 files changed, 121 insertions(+), 10 deletions(-) create mode 100644 gcc/config/i386/pcommitintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/pcommit-1.c diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index bad0988..2e09d77 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_XSAVEC_SET \ (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE) #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB +#define OPTION_MASK_ISA_PCOMMIT_SET OPTION_MASK_ISA_PCOMMIT /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -182,6 +183,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES +#define OPTION_MASK_ISA_PCOMMIT_UNSET OPTION_MASK_ISA_PCOMMIT #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB /* SSE4 includes both SSE4.1 and SSE4.2. 
-mno-sse4 should the same @@ -903,6 +905,19 @@ ix86_handle_option (struct gcc_options *opts, } return true; +case OPT_mpcommit: + if (value) + { + opts-x_ix86_isa_flags |= OPTION_MASK_ISA_PCOMMIT_SET; + opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_SET; + } + else + { + opts-x_ix86_isa_flags = ~OPTION_MASK_ISA_PCOMMIT_UNSET; + opts-x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PCOMMIT_UNSET; + } + return true; + case OPT_mclwb: if (value) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 766f13b..fa3e1fc 100644 ---
Re: [PATCH] PR ipa/63909 ICE: SIGSEGV in ipa_icf_gimple::func_checker::compare_bb()
On Fri, Nov 21, 2014 at 12:52 PM, Martin Liška mli...@suse.cz wrote: On 11/20/2014 05:41 PM, Richard Biener wrote: On Thu, Nov 20, 2014 at 5:30 PM, Martin Liška mli...@suse.cz wrote: Hello. Following patch fixes ICE in IPA ICF. Problem was that number of non-debug statements in a BB can change (for instance by IPA split), so that the number is recomputed. Huh, so can it get different for both candidates? I think the stmt compare loop should be terminated on gsi_end_p of either iterator and return false for any remaining non-debug-stmts on the other. Thus, not walk all stmts twice here. Hello. Sorry for the previous patch, you are right it can be fixed in purer way. Please take a look at attached patch. As IPA split is run early I don't see how it should affect a real IPA pass though? Sorry for non precise information, the problematic BB is changed here: #0 gsi_split_seq_before (i=0x7fffd550, pnew_seq=0x7fffd528) at ../../gcc/gimple-iterator.c:429 #1 0x00b95a2a in gimple_split_block (bb=0x76c41548, stmt=0x0) at ../../gcc/tree-cfg.c:5707 #2 0x007563cf in split_block (bb=0x76c41548, i=i@entry=0x0) at ../../gcc/cfghooks.c:508 #3 0x00756b44 in split_block_after_labels (bb=optimized out) at ../../gcc/cfghooks.c:549 #4 make_forwarder_block (bb=optimized out, redirect_edge_p=redirect_edge_p@entry=0x75d4e0 mfb_keep_just(edge_def*), new_bb_cbk=new_bb_cbk@entry=0x0) at ../../gcc/cfghooks.c:842 #5 0x0076085a in create_preheader (loop=0x76d56948, flags=optimized out) at ../../gcc/cfgloopmanip.c:1563 #6 0x00760aea in create_preheaders (flags=1) at ../../gcc/cfgloopmanip.c:1613 #7 0x009bc6b0 in apply_loop_flags (flags=15) at ../../gcc/loop-init.c:75 #8 0x009bc7d3 in loop_optimizer_init (flags=15) at ../../gcc/loop-init.c:136 #9 0x00957914 in estimate_function_body_sizes (node=0x76c47620, early=false) at ../../gcc/ipa-inline-analysis.c:2480 #10 0x0095948b in compute_inline_parameters (node=0x76c47620, early=false) at ../../gcc/ipa-inline-analysis.c:2907 #11 0x0095bd88 in 
inline_analyze_function (node=0x76c47620) at ../../gcc/ipa-inline-analysis.c:3994 #12 0x0095bed3 in inline_generate_summary () at ../../gcc/ipa-inline-analysis.c:4045 #13 0x00a70b71 in execute_ipa_summary_passes (ipa_pass=0x1dcb9e0) at So inline_summary is generated after IPA-ICF does its job? But the bug is obviously that an IPA analysis phase does a code transform (here initializes loops without AVOID_CFG_MANIPULATIONS). Honza - if that is really needed then I think we should make sure loops are initialized at the start of the IPA analysis phase, not randomly inbetween. Thanks, Richard. ../../gcc/passes.c:2137 #14 0x00777a15 in ipa_passes () at ../../gcc/cgraphunit.c:2074 #15 symbol_table::compile (this=this@entry=0x76c3a000) at ../../gcc/cgraphunit.c:2187 #16 0x00778bcd in symbol_table::finalize_compilation_unit (this=0x76c3a000) at ../../gcc/cgraphunit.c:2340 #17 0x006580ee in c_write_global_declarations () at ../../gcc/c/c-decl.c:10777 #18 0x00b5bb8b in compile_file () at ../../gcc/toplev.c:584 #19 0x00b5def1 in do_compile () at ../../gcc/toplev.c:2041 #20 0x00b5e0fa in toplev::main (this=0x7fffdc9f, argc=20, argv=0x7fffdd98) at ../../gcc/toplev.c:2138 #21 0x0063f1d9 in main (argc=20, argv=0x7fffdd98) at ../../gcc/main.c:38 Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin Thanks, Richard. Patch can bootstrap on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin
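The comparison loop Richard describes earlier in this thread walks both statement sequences together, stops when either iterator reaches its end, and fails if the other sequence still has non-debug statements. A minimal sketch of that shape follows. It uses plain arrays instead of gimple iterators, and struct stmt, skip_debug and compare_stmt_seqs are hypothetical names, not the ipa-icf-gimple code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement record: an opcode plus a debug marker.  */
struct stmt
{
  int code;
  bool is_debug;
};

/* Advance i past any debug statements.  */
static size_t
skip_debug (const struct stmt *s, size_t n, size_t i)
{
  while (i < n && s[i].is_debug)
    i++;
  return i;
}

/* Compare the non-debug statements of two sequences in one pass.  */
bool
compare_stmt_seqs (const struct stmt *a, size_t na,
                   const struct stmt *b, size_t nb)
{
  assert ((a != NULL || na == 0) && (b != NULL || nb == 0));

  size_t i = skip_debug (a, na, 0);
  size_t j = skip_debug (b, nb, 0);

  while (i < na && j < nb)
    {
      if (a[i].code != b[j].code)
        return false;
      i = skip_debug (a, na, i + 1);
      j = skip_debug (b, nb, j + 1);
    }

  /* Equal only if both sequences ran out at the same time; a leftover
     non-debug statement on either side is a mismatch.  */
  return i == na && j == nb;
}
```

Because the end condition is checked on both sides, no separate pre-pass that counts non-debug statements is needed.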
Re: [PATCH 2/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On Fri, Nov 21, 2014 at 12:38 PM, Ilya Tocar tocarip.in...@gmail.com wrote: On 20 Nov 09:43, Uros Bizjak wrote: On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside if i386 backend. And are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. Ok for trunk? Please split the patch into patch series, like it was done previously for AVX512F patches. Uros. This part adds avx512vbmi. I'll send vpermi2b autogen patch together with v64qi const perm later. Boostraps/passes make check. Ok for trunk? gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VBMI_SET Please remove in the above line. OPTION_MASK_ISA_AVX512VBMI_UNSET): New. (ix86_handle_option): Handle OPT_mavx512vbmi. * config.gcc: Add avx512vbmiintrin.h, avx512vbmivlintrin.h. * config/i386/avx512vbmiintrin.h: New file. * config/i386/avx512vbmivlintrin.h: Ditto. * config/i386/cpuid.h (bit_AVX512VBMI): New. * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512vbmi. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX512VBMI__. * config/i386/i386.c (ix86_target_string): Add -mavx512vbmi. (PTA_AVX512VBMI): Define. (ix86_option_override_internal): Handle new options. 
(ix86_valid_target_attribute_inner_p): Add avx512vbmi, (ix86_builtins): Add IX86_BUILTIN_VPMULTISHIFTQB512, IX86_BUILTIN_VPMULTISHIFTQB256, IX86_BUILTIN_VPMULTISHIFTQB128, IX86_BUILTIN_VPERMVARQI512_MASK, IX86_BUILTIN_VPERMT2VARQI512, IX86_BUILTIN_VPERMT2VARQI512_MASKZ, IX86_BUILTIN_VPERMI2VARQI512, IX86_BUILTIN_VPERMVARQI256_MASK, IX86_BUILTIN_VPERMVARQI128_MASK, IX86_BUILTIN_VPERMT2VARQI256, IX86_BUILTIN_VPERMT2VARQI256_MASKZ, IX86_BUILTIN_VPERMT2VARQI128, IX86_BUILTIN_VPERMI2VARQI256, IX86_BUILTIN_VPERMI2VARQI128. (bdesc_special_args): Add __builtin_ia32_vpmultishiftqb512_mask, __builtin_ia32_vpmultishiftqb256_mask, __builtin_ia32_vpmultishiftqb128_mask, __builtin_ia32_permvarqi512_mask, __builtin_ia32_vpermt2varqi512_mask, __builtin_ia32_vpermt2varqi512_maskz, __builtin_ia32_vpermi2varqi512_mask, __builtin_ia32_permvarqi256_mask, __builtin_ia32_permvarqi128_mask, __builtin_ia32_vpermt2varqi256_mask, __builtin_ia32_vpermt2varqi256_maskz, __builtin_ia32_vpermt2varqi128_mask, __builtin_ia32_vpermt2varqi128_maskz, __builtin_ia32_vpermi2varqi256_mask, __builtin_ia32_vpermi2varqi128_mask. (ix86_hard_regno_mode_ok): Allow big masks for AVX512VBMI. * config/i386/i386.h (TARGET_AVX512VBMI, TARGET_AVX512VBMI_P): Define. * config/i386/i386.opt: Add mavx512vbmi. * config/i386/immintrin.h: Include avx512vbmiintrin.h, avx512vbmivlintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VPMULTISHIFT. (VI1_AVX512VL): New iterator. (avx512_permvarmodemask_name): Use it. (avx512_vpermi2varmode3_maskz): Ditto. (avx512_vpermi2varmode3sd_maskz_name): Ditto. (avx512_vpermi2varmode3_mask): Ditto. (avx512_vpermt2varmode3_maskz): Ditto. (avx512_vpermt2varmode3sd_maskz_name): Ditto. (avx512_vpermt2varmode3_mask): Ditto. (vpmultishiftqbmodemask_name): Ditto. gcc/testsuite/ * g++.dg/other/i386-2.C: Add -mavx512vbmi. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/avx512f-helper.h: Add avx512vbmi-check.h. * gcc.target/i386/avx512vbmi-check.h: Ditto. 
* gcc.target/i386/avx512vbmi-vpermb-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermb-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermi2b-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermi2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermt2b-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermt2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpmultishiftqb-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c: Ditto. * gcc.target/i386/avx512vl-vpermb-2.c: Ditto. * gcc.target/i386/avx512vl-vpermi2b-2.c: Ditto. * gcc.target/i386/avx512vl-vpermt2b-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmultishiftqb-2.c: Ditto. * gcc.target/i386/i386.exp (check_effective_target_avx512vbmi): New. * gcc.target/i386/sse-12.c: Add new options. *
Re: [PATCH 1/4][x86] Add clwb,pcommit,avx512avbmi,avx512ifma.
On Fri, Nov 21, 2014 at 12:21 PM, Ilya Tocar tocarip.in...@gmail.com wrote: On 20 Nov 09:43, Uros Bizjak wrote: On Wed, Nov 19, 2014 at 6:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, New revision of Intel ISA reference [1] has new instructions: Clwb, pcommit and new flavors of AVX512. Patch bellow adds them. I understand that stage 1 is closed, however those changes shouldn't affect anything outside if i386 backend. And are extremely unlikely to break existing functionality, and I personally think it's desirable for newest GCC to support newest spec. Bootstrapped/regtestsed on x86_64-unknown-linux-gnu. Ok for trunk? Please split the patch into patch series, like it was done previously for AVX512F patches. Uros. This part adds avx512ifma. Bootstraps/passes make check. gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512IFMA_SET, , OPTION_MASK_ISA_AVX512IFMA_UNSET): New. (ix86_handle_option): Handle OPT_mavx512ifma. * config.gcc: Add avx512ifmaintrin.h, avx512ifmavlintrin.h. * config/i386/avx512ifmaintrin.h: New file. * config/i386/avx512ifmaivlntrin.h: Ditto. * config/i386/cpuid.h (bit_AVX512IFMA): New. * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx512ifma. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX512IFMA__. * config/i386/i386.c (ix86_target_string): Add -mavx512ifma. (PTA_AVX512IFMA): Define. (ix86_option_override_internal): Handle new options. (ix86_valid_target_attribute_inner_p): Add avx512ifma. (ix86_builtins): Add IX86_BUILTIN_VPMADD52LUQ512, IX86_BUILTIN_VPMADD52HUQ512, IX86_BUILTIN_VPMADD52LUQ256, IX86_BUILTIN_VPMADD52HUQ256, IX86_BUILTIN_VPMADD52LUQ128, IX86_BUILTIN_VPMADD52HUQ128, IX86_BUILTIN_VPMADD52LUQ512_MASKZ, IX86_BUILTIN_VPMADD52HUQ512_MASKZ, IX86_BUILTIN_VPMADD52LUQ256_MASKZ, IX86_BUILTIN_VPMADD52HUQ256_MASKZ, IX86_BUILTIN_VPMADD52LUQ128_MASKZ, IX86_BUILTIN_VPMADD52HUQ128_MASKZ. 
(bdesc_special_args): Add __builtin_ia32_vpmadd52luq512_mask, __builtin_ia32_vpmadd52luq512_maskz, __builtin_ia32_vpmadd52huq512_mask, __builtin_ia32_vpmadd52huq512_maskx, __builtin_ia32_vpmadd52luq256_mask, __builtin_ia32_vpmadd52luq256_maskz, __builtin_ia32_vpmadd52huq256_mask, __builtin_ia32_vpmadd52huq256_maskz, __builtin_ia32_vpmadd52luq128_mask, __builtin_ia32_vpmadd52luq128_maskz, __builtin_ia32_vpmadd52huq128_mask, __builtin_ia32_vpmadd52huq128_maskz, * config/i386/i386.h (TARGET_AVX512IFMA, TARGET_AVX512IFMA_P): Define. * config/i386/i386.opt: Add mavx512ifma. * config/i386/immintrin.h: Include avx512ifmaintrin.h, avx512ifmavlintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VPMADD52LUQ, UNSPEC_VPMADD52HUQ. (VPMADD52): New iterator. (vpmadd52type): New attribute. (vpamdd52huqmode_maskz): New. (vpamdd52luqmode_maskz): Ditto. (vpamdd52vpmadd52typemodesd_maskz_name): Ditto. (vpamdd52vpmadd52typemode_mask): Ditto. gcc/testsuite/ * g++.dg/other/i386-2.C: Add -mavx512ifma. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/avx512f-helper.h: Add avx512ifma-check.h. * gcc.target/i386/avx512ifma-check.h: New. * gcc.target/i386/avx512ifma-vpmaddhuq-1.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddluq-1.c: Ditto. * gcc.target/i386/avx512ifma-vpmaddluq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto. * gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto. * gcc.target/i386/i386.exp (check_effective_target_avx512ifma): New. * gcc.target/i386/sse-12.c: Add new options. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. As discussed some time ago with Kirill, scan strings in the testsuite need %xmm\[0-9\]+ (please note + at the end, at least one number should be present), but this will be mass-fixed in the near future. OK for mainline. Thanks, Uros. 
--- gcc/common/config/i386/i386-common.c | 16 ++ gcc/config.gcc | 6 +- gcc/config/i386/avx512ifmaintrin.h | 104 + gcc/config/i386/avx512ifmavlintrin.h | 164 + gcc/config/i386/cpuid.h| 1 + gcc/config/i386/driver-i386.c | 5 +- gcc/config/i386/i386-c.c | 2 + gcc/config/i386/i386.c | 35 + gcc/config/i386/i386.h | 2
[AArch64, Obvious] Fix formatting of SHLL and friends
Hi,

I spotted in an assembly dump that the SHLL, SHLL2, SADDL, and SSUBL instructions appear out of line, as they are missing a tab between their mnemonic and their operands.

I've committed (revision 217917) the attached as the obvious fix to this.

Tested with a build-test and a run of aarch64.exp/simd.exp for aarch64-none-elf with no issues.

Cheers,
James

---
2014-11-21 James Greenhalgh james.greenha...@arm.com

	* config/aarch64/aarch64-simd.md (aarch64_ANY_EXTEND:suADDSUB:optablmode): Add a tab between output mnemonic and operands.
	(aarch64_simd_vec_unpacksu_lo_mode): Likewise.
	(aarch64_simd_vec_unpacksu_hi_mode): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 23345b1df1ebb28075edd2effd5f327749abd61d..926eb765e1bdc84f3f7873dbcd4030c4e2ea62a7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1175,7 +1175,7 @@ (define_insn aarch64_simd_vec_unpacksu
 (match_operand:VQW 2 vect_par_cnst_lo_half ) )))]
 TARGET_SIMD
- sushll %0.Vwtype, %1.Vhalftype, 0
+ sushll\t%0.Vwtype, %1.Vhalftype, 0
 [(set_attr type neon_shift_imm_long)]
 )
@@ -1186,7 +1186,7 @@ (define_insn aarch64_simd_vec_unpacksu
 (match_operand:VQW 2 vect_par_cnst_hi_half ) )))]
 TARGET_SIMD
- sushll2 %0.Vwtype, %1.Vtype, 0
+ sushll2\t%0.Vwtype, %1.Vtype, 0
 [(set_attr type neon_shift_imm_long)]
 )
@@ -2601,7 +2601,7 @@ (define_insn aarch64_ANY_EXTEND:suAD
 (ANY_EXTEND:VWIDE (match_operand:VDW 2 register_operand w]
 TARGET_SIMD
- ANY_EXTEND:suADDSUB:optabl %0.Vwtype, %1.Vtype, %2.Vtype
+ ANY_EXTEND:suADDSUB:optabl\t%0.Vwtype, %1.Vtype, %2.Vtype
 [(set_attr type neon_ADDSUB:optab_long)]
 )
[PATCH, committed] Add fgcse-sm test with scan-rtl-dump
Hi,

this patch adds a fgcse-sm test with a scan-rtl-dump directive.

The other fgcse-sm tests:
...
./gcc/testsuite/gcc.dg/pr45352-3.c
./gcc/testsuite/gcc.dg/torture/pr24257.c
./gcc/testsuite/gcc.target/i386/movsi-sm-1.c
./gcc/testsuite/g++.dg/opt/pr36185.C
...
do not check whether fgcse-sm actually does something.

Committed as trivial.

Thanks,
- Tom

2014-11-21 Tom de Vries t...@codesourcery.com

	* gcc.dg/store-motion-fgcse-sm.c: New test.
---
 gcc/testsuite/gcc.dg/store-motion-fgcse-sm.c | 32
 1 file changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/store-motion-fgcse-sm.c

diff --git a/gcc/testsuite/gcc.dg/store-motion-fgcse-sm.c b/gcc/testsuite/gcc.dg/store-motion-fgcse-sm.c
new file mode 100644
index 000..b331a24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store-motion-fgcse-sm.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-pre -fno-tree-loop-im -fgcse-sm -fdump-rtl-store_motion" } */
+
+/* tree-pre moves the *sum load out of the loop. ftree-loop-im moves the *sum
+   store out of the loop, so we disable it, to allow fgcse-sm to do it
+   instead. */
+
+#include <stdlib.h>
+
+void __attribute__((noinline))
+f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < n; ++i)
+    *sum += a[i];
+}
+
+int
+main ()
+{
+  unsigned int a[] = { 1, 10, 100 };
+  unsigned sum = 1000;
+
+  f (a, &sum, 3);
+  if (sum != 1111)
+    abort ();
+
+  return 0;
+}
+
+/* Check that -fgcse-sm did something for f. */
+/* { dg-final { scan-rtl-dump "STORE_MOTION of f, .* basic blocks, 1 insns deleted, 1 insns created" "store_motion" } } */
--
1.9.1
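To make the expected effect concrete, the following sketch shows by hand what moving the store out of the loop in f amounts to at the source level. It is an illustration only, not compiler output, and the names f_store_in_loop and f_store_sunk are hypothetical. The restrict qualifiers carry the same no-alias assumption as the test.

```c
#include <assert.h>
#include <stddef.h>

/* The loop as written in the test: one load and one store of *sum
   per iteration.  */
void
f_store_in_loop (const unsigned int *restrict a, unsigned int *restrict sum,
                 unsigned int n)
{
  assert (sum != NULL);
  for (unsigned int i = 0; i < n; ++i)
    *sum += a[i];
}

/* The same loop with the load hoisted above it and the store sunk
   below it, so that *sum is written once.  */
void
f_store_sunk (const unsigned int *restrict a, unsigned int *restrict sum,
              unsigned int n)
{
  assert (sum != NULL);
  unsigned int tmp = *sum;
  for (unsigned int i = 0; i < n; ++i)
    tmp += a[i];
  *sum = tmp;
}
```

Both functions leave the same value in *sum; only the number of memory writes differs, which is what the "1 insns deleted, 1 insns created" pattern in the dump is meant to confirm.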
Re: [PATCH 1/2] PR debug/38757 gcc does not emit DW_LANG_C99.
On Fri, 2014-11-21 at 12:48 +0100, Richard Biener wrote: On Fri, Nov 21, 2014 at 8:56 AM, Jakub Jelinek ja...@redhat.com wrote: On Thu, Nov 20, 2014 at 11:30:11PM +0100, Mark Wielaard wrote: --- a/gcc/config/avr/avr-c.c +++ b/gcc/config/avr/avr-c.c @@ -386,7 +386,8 @@ avr_cpu_cpp_builtins (struct cpp_reader *pfile) (as mentioned in ISO/IEC DTR 18037; Annex F.2) which is not implemented in GCC up to now. */ - if (!strcmp (lang_hooks.name, GNU C)) + if (strncmp (lang_hooks.name, GNU C, 5) == 0 + strncmp (lang_hooks.name, GNU C++, 7) != 0) I wonder if the tests for C language shouldn't be better done as (strncmp (lang_hooks.name, GNU C, 5) == 0 strchr (0123456789, lang_hooks.name[5]) != NULL) or (strncmp (lang_hooks.name, GNU C, 5) == 0 (ISDIGIT (lang_hooks.name[5]) || lang_hooks.name[5] == '\0')) to make it explicit what we are looking for, not what we aren't. Or even make that a helper function in langhooks.[ch] lang_GNU_C (), lang_GNU_CXX () Nice idea. I added those. It also fixes the formatting issues and makes the diff smaller. --- a/gcc/langhooks.h +++ b/gcc/langhooks.h @@ -261,7 +261,8 @@ struct lang_hooks_for_lto struct lang_hooks { - /* String identifying the front end. e.g. GNU C++. */ + /* String identifying the front end. e.g. GNU C++. + Might include language version being used. */ As we no longer have GNU C++ as any name, using it as an example is weird. So, /* String identifying the front end and optionally language standard version, e.g. GNU C++98 or GNU Java. */ ? Used Jakub's example text. OK to push? Thanks, Mark PR debug/38757 gcc does not emit DW_LANG_C99. For C and C++ add the language standard version in use to lang_hooks.name. Change users of lang_hook.name to check with new functions lang_GNU_C or lang_GNU_CXX. In dwarf2out.c output the DW_LANG_C version from the lang_hooks.name and merge any LTO TRANSLATION_UNIT_LANGUAGE found. 
Adds two testcases to dwarf2.exp to check the right DWARF DW_AT_language is set on the compile_unit depending on the -std=c89 or -std=c99 setting. gcc/c-family/ChangeLog PR debug/38757 * c-opts.c (set_std_c89): Set lang_hooks.name. (set_std_c99): Likewise. (set_std_c11): Likewise. (set_std_cxx98): Likewise. (set_std_cxx11): Likewise. (set_std_cxx14): Likewise. (set_std_cxx1z): Likewise. gcc/ChangeLog PR debug/38757 * config/avr/avr-c.c (avr_cpu_cpp_builtins): Use lang_GNU_C. * config/darwin.c (darwin_file_end): Use lang_GNU_CXX. (darwin_override_options): Likewise. * config/ia64/ia64.c (ia64_struct_retval_addr_is_first_parm_p): Likewise. * config/rs6000/rs6000.c (rs6000_output_function_epilogue): Likewise. * dbxout.c (get_lang_number): Likewise. (dbxout_type): Likewise. (dbxout_symbol_location): Likewise. * dwarf2out.c (add_prototyped_attribute): Add DW_AT_prototype also for DW_LANG_{C,C99,ObjC}. (highest_c_language): New function. (gen_compile_unit_die): Call highest_c_language to merge LTO TRANSLATION_UNIT_LANGUAGE. Use strncmp language_string to determine if DW_LANG_C99 or DW_LANG_C89 should be returned. * fold-const.c (fold_cond_expr_with_comparison): Use lang_GNU_CXX. * langhooks.h (struct lang_hooks): Add version comment to name. (lang_GNU_C): New function declaration. (lang_GNU_CXX): Likewise. * langhooks.c (lang_GNU_C): New function. (lang_GNU_CXX): Likewise. * vmsdbgout.c (vmsdbgout_init): Use lang_GNU_C and lang_GNU_CXX. gcc/testsuite/ChangeLog PR debug/38757 * gcc.dg/debug/dwarf2/lang-c89.c: New test. * gcc.dg/debug/dwarf2/lang-c99.c: Likewise. diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 000fdd2..08a36f0 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -1450,6 +1450,7 @@ set_std_c89 (int c94, int iso) flag_isoc94 = c94; flag_isoc99 = 0; flag_isoc11 = 0; + lang_hooks.name = GNU C89; } /* Set the C 99 standard (without GNU extensions if ISO). 
*/ @@ -1463,6 +1464,7 @@ set_std_c99 (int iso) flag_isoc11 = 0; flag_isoc99 = 1; flag_isoc94 = 1; + lang_hooks.name = GNU C99; } /* Set the C 11 standard (without GNU extensions if ISO). */ @@ -1476,6 +1478,7 @@ set_std_c11 (int iso) flag_isoc11 = 1; flag_isoc99 = 1; flag_isoc94 = 1; + lang_hooks.name = GNU C11; } /* Set the C++ 98 standard (without GNU extensions if ISO). */ @@ -1487,6 +1490,7 @@ set_std_cxx98 (int iso) flag_no_nonansi_builtin = iso; flag_iso = iso; cxx_dialect = cxx98; + lang_hooks.name = GNU C++98; } /* Set the C++ 2011 standard (without GNU extensions if ISO). */ @@ -1501,6 +1505,7 @@ set_std_cxx11 (int iso)
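For readers following the lang_GNU_C / lang_GNU_CXX discussion, here is a minimal standalone sketch of the prefix test Jakub suggested. It is an assumption-laden illustration, not the code that was committed: the function names are hypothetical, the name is passed in as a parameter instead of being read from lang_hooks.name, and isdigit from ctype.h stands in for GCC's ISDIGIT macro.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Nonzero if NAME is "GNU C" optionally followed by a standard version
   such as "89", "99" or "11".  "GNU C++..." is rejected because the
   character after the prefix is neither a digit nor the terminator.
   (Hypothetical stand-in for the helper discussed in the thread.)  */
int name_is_gnu_c(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "GNU C", 5) == 0
           && (isdigit((unsigned char) name[5]) || name[5] == '\0');
}

/* Nonzero if NAME starts with "GNU C++", with or without a version.  */
int name_is_gnu_cxx(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "GNU C++", 7) == 0;
}
```

Stating what is being looked for (a digit or the end of the string) keeps the C check from accidentally matching any other front end whose name happens to begin with "GNU C".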
Re: [PATCH 1/2] PR debug/38757 gcc does not emit DW_LANG_C99.
On Fri, Nov 21, 2014 at 02:01:55PM +0100, Mark Wielaard wrote: gcc/c-family/ChangeLog PR debug/38757 * c-opts.c (set_std_c89): Set lang_hooks.name. (set_std_c99): Likewise. (set_std_c11): Likewise. (set_std_cxx98): Likewise. (set_std_cxx11): Likewise. (set_std_cxx14): Likewise. (set_std_cxx1z): Likewise. gcc/ChangeLog PR debug/38757 * config/avr/avr-c.c (avr_cpu_cpp_builtins): Use lang_GNU_C. * config/darwin.c (darwin_file_end): Use lang_GNU_CXX. (darwin_override_options): Likewise. * config/ia64/ia64.c (ia64_struct_retval_addr_is_first_parm_p): Likewise. * config/rs6000/rs6000.c (rs6000_output_function_epilogue): Likewise. * dbxout.c (get_lang_number): Likewise. (dbxout_type): Likewise. (dbxout_symbol_location): Likewise. * dwarf2out.c (add_prototyped_attribute): Add DW_AT_prototype also for DW_LANG_{C,C99,ObjC}. (highest_c_language): New function. (gen_compile_unit_die): Call highest_c_language to merge LTO TRANSLATION_UNIT_LANGUAGE. Use strncmp language_string to determine if DW_LANG_C99 or DW_LANG_C89 should be returned. * fold-const.c (fold_cond_expr_with_comparison): Use lang_GNU_CXX. * langhooks.h (struct lang_hooks): Add version comment to name. (lang_GNU_C): New function declaration. (lang_GNU_CXX): Likewise. * langhooks.c (lang_GNU_C): New function. (lang_GNU_CXX): Likewise. * vmsdbgout.c (vmsdbgout_init): Use lang_GNU_C and lang_GNU_CXX. gcc/testsuite/ChangeLog PR debug/38757 * gcc.dg/debug/dwarf2/lang-c89.c: New test. * gcc.dg/debug/dwarf2/lang-c99.c: Likewise. Ok, thanks. Jakub
Re: FW: [Aarch64][BE][2/2] Fix vector load/stores to not use ld1/st1
On 21 November 2014 12:11, Alan Hayward alan.hayw...@arm.com wrote: 2014-11-21 Alan Hayward alan.hayw...@arm.com PR 57233 PR 59810 * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): New operand for printing a q register+1. (aarch64_simd_emit_reg_reg_move): Define. (aarch64_simd_disambiguate_copy): Remove. * config/aarch64/aarch64-protos.h (aarch64_simd_emit_reg_reg_move): Define. (aarch64_simd_disambiguate_copy): Remove. * config/aarch64/aarch64-simd.md (define_split): Use aarch64_simd_emit_reg_reg_move. (define_expand movmode): Less restrictive predicates. (define_insn *aarch64_movmode): Simplify and only allow for LE. (define_insn *aarch64_be_movoi): Define. (define_insn *aarch64_be_movci): Define. (define_insn *aarch64_be_movxi): Define. (define_split): OI mov. Use aarch64_simd_emit_reg_reg_move. (define_split): CI mov. Use aarch64_simd_emit_reg_reg_move. (define_split): XI mov. Use aarch64_simd_emit_reg_reg_move. I don't think we should claim to resolve 57233 here. The solution to 57233 from Marc just happened to expose the BE issues in aarch64. Otherwise OK. /Marcus
[wwwdocs] Document ARM --with-cpu changes for 5.0
Hi, As requested by Ramana when he OKed the initial change, the attached patch documents the changes I made to --with-cpu and --with-tune in this patch: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02618.html in the changes for GCC 5.0. OK? Thanks, James --- ? .git ? foo.patch ? htdocs/.#index.html.1.888 Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.40 diff -u -r1.40 changes.html --- htdocs/gcc-5/changes.html 20 Nov 2014 09:09:26 - 1.40 +++ htdocs/gcc-5/changes.html 20 Nov 2014 10:46:48 - @@ -393,6 +393,10 @@ non-unified syntax is used. However this is subject to change in future releases. Eventually the non-unified syntax will be deprecated. /li + li It is now a configure-time error to use the code--with-cpu/code + configure option with either of code--with-tune/code or + code--with-arch/code. + /li /ul h3 id=x86IA-32/x86-64/h3
Re: [PATCH] VRP: don't assume strict overflow semantics when checking if a loop wraps
On Fri, Nov 21, 2014 at 7:18 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Nov 21, 2014 at 12:29 PM, Patrick Palka patr...@parcs.ath.cx wrote: When adjusting the value range of an induction variable using SCEV, VRP calls scev_probably_wraps_p() with use_overflow_semantics=true. This parameter set to true makes scev_probably_wraps_p() assume that signed induction variables never wrap, so for these variables it always returns false (when strict overflow rules are in effect). This is wrong because if a signed induction variable really does overflow then we want to give it an INF(OVF) value range and not the (finite) estimation returned by SCEV. While this change shouldn't make a difference in code generation, it should help improve the coverage of -Wstrict-overflow warnings on induction variables like in the test case. OK after bootstrap + regtest on x86_64-unknown-linux-gnu? Hmm, I don't think the change won't affect code-generation. In fact we check for overflow ourselves in the most interesting case (the first block) - only the path adjusting min/max based on the init value and the max value of the type needs to know whether overflow may happen and fail or drop to +-INF(OVF). So I'd rather open-code the relevant cases and not call scev_probably_wraps_p at all. What kind of tests for overflow do you have in mind? max_loop_iterations() in this test case always return INT_MAX so there will be no overflow when computing the upper bound using the number of loop iterations. Do you mean to compare what max_loop_iterations() returns with the range that VRP has inferred for the induction variable? Richard. gcc/ * tree-vrp.c (adjust_range_with_scev): Call scev_probably_wraps_p with use_overflow_semantics=false. gcc/testsuite/ * gcc.dg/Wstrict-overflow-27.c: New test. 
--- gcc/testsuite/gcc.dg/Wstrict-overflow-27.c | 22 ++ gcc/tree-vrp.c | 2 +- 2 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/Wstrict-overflow-27.c diff --git a/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c new file mode 100644 index 000..c1f27ab --- /dev/null +++ b/gcc/testsuite/gcc.dg/Wstrict-overflow-27.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options -fstrict-overflow -O2 -Wstrict-overflow } */ + +/* Warn about an overflow when folding i 0. */ + +void bar (unsigned *p); + +int +foo (unsigned *p) +{ + int i; + int sum = 0; + + for (i = 0; i *p; i++) +{ + if (i 0) /* { dg-warning signed overflow } */ + sum += 2; + bar (p); +} + + return sum; +} diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index a75138f..bf9ff61 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4270,7 +4270,7 @@ adjust_range_with_scev (value_range_t *vr, struct loop *loop, dir == EV_DIR_UNKNOWN /* ... or if it may wrap. */ || scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec), - true)) + /*use_overflow_semantics=*/false)) return; /* We use TYPE_MIN_VALUE and TYPE_MAX_VALUE here instead of -- 2.2.0.rc1.23.gf570943
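The test case relies on a detail that is easy to miss: the loop counter is a signed int, but the bound *p is unsigned, so the comparison is done in unsigned arithmetic. The sketch below, which is not part of the patch and uses hypothetical function names, spells out that conversion and the bound above which the signed counter would have to overflow before the loop can exit. It assumes unsigned int can represent values larger than INT_MAX, which holds on the usual targets.

```c
#include <assert.h>
#include <limits.h>

/* What `i < n` means when i is int and n is unsigned int: i is first
   converted to unsigned, so a negative i compares as a huge value.  */
int below_bound(int i, unsigned int n)
{
    return (unsigned int) i < n;
}

/* Nonzero if `for (i = 0; i < n; i++)` with a signed i cannot finish
   without incrementing i past INT_MAX.  */
int loop_overflows_counter(unsigned int n)
{
    /* Assumption of this sketch: unsigned has more range than int.  */
    assert(UINT_MAX > (unsigned int) INT_MAX);
    return n > (unsigned int) INT_MAX;
}
```

When the bound can exceed INT_MAX, folding `i < 0` inside the loop to false is only justified by assuming signed overflow does not happen, which is the situation -Wstrict-overflow is meant to report.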
[patch] Fix tilepro includes
During the flattening of optabs.h, I updated all the config/* files which were affected. I've been getting spurious failures with config-list.mk where my changes would disappear and tracked down why. I was blissfully unaware that the tilepro ports mul-tables.c file is actually generated from gen-mul-tables.cc. This patch fixes the include issue by adding #include insn-codes.h to the generated files. I also added a comment indicating these are generated files, and to make changes in the generator. This allows all the tile* ports to compile properly again. OK for trunk? Andrew * config/tilepro/gen-mul-tables.cc: Add insn-codes.h to include list for generator file. Add comment indicating it is a generated file. * config/tilepro/mul-tables.c: Update generated file. * config/tilegx/mul-tables.c: Likewise. Index: config/tilepro/gen-mul-tables.cc === *** config/tilepro/gen-mul-tables.cc (revision 217787) --- config/tilepro/gen-mul-tables.cc (working copy) *** main () *** 1249,1258 --- 1249,1262 printf ( along with GCC; see the file COPYING3. If not see\n); printf ( http://www.gnu.org/licenses/. */\n); printf (\n); + printf (/* Note this file is auto-generated from gen-mul-tables.cc.\n); + printf ( Make any required changes there. */\n); + printf (\n); printf (#include \config.h\\n); printf (#include \system.h\\n); printf (#include \coretypes.h\\n); printf (#include \expr.h\\n); + printf (#include \insn-codes.h\\n); printf (#include \optabs.h\\n); printf (#include \%s-multiply.h\\n\n, ARCH); create_insn_code_compression_table (); Index: config/tilepro/mul-tables.c === *** config/tilepro/mul-tables.c (revision 217787) --- config/tilepro/mul-tables.c (working copy) *** *** 18,23 --- 18,26 along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ + /* Note this file is auto-generated from gen-mul-tables.cc. +Make any required changes there. 
*/ + #include config.h #include system.h #include coretypes.h Index: config/tilegx/mul-tables.c === *** config/tilegx/mul-tables.c (revision 217787) --- config/tilegx/mul-tables.c (working copy) *** *** 18,23 --- 18,26 along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ + /* Note this file is auto-generated from gen-mul-tables.cc. +Make any required changes there. */ + #include config.h #include system.h #include coretypes.h
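To make the generator change easier to picture, here is a small sketch of a function that formats the kind of prologue the patch makes gen-mul-tables.cc print: a note that the file is generated, followed by the include list with insn-codes.h placed before optabs.h. This is an illustration only. The function name is hypothetical, it writes into a caller-supplied buffer instead of printing to stdout as the real generator does, and the license banner is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the generated-file note and include list for ARCH into BUF.
   Returns the snprintf result: the number of characters needed, not
   counting the terminating NUL.  (Hypothetical helper.)  */
int format_prologue(char *buf, size_t size, const char *arch)
{
    assert(buf != NULL && arch != NULL && strlen(arch) > 0);
    return snprintf(buf, size,
                    "/* Note this file is auto-generated from gen-mul-tables.cc.\n"
                    "   Make any required changes there.  */\n"
                    "\n"
                    "#include \"config.h\"\n"
                    "#include \"system.h\"\n"
                    "#include \"coretypes.h\"\n"
                    "#include \"expr.h\"\n"
                    "#include \"insn-codes.h\"\n"
                    "#include \"optabs.h\"\n"
                    "#include \"%s-multiply.h\"\n\n",
                    arch);
}
```

Putting the fix in the generator rather than in the generated mul-tables.c files is what keeps the change from disappearing the next time the tables are regenerated.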
[PATCH,MIPS] Refine configure guard for .module availability
(I'm not sure if I need approval from someone else for MIPS specific top level 'configure' changes. I'm cautiously assuming I do for now.) Since adding o32 FFPXX support, the MIPS backend uses the .module directive to emit a .module [no]oddspreg when .module support is detected in binutils. The oddspreg option was however only added to binutils with FPXX and not the initial .module support. This leads to errors when using binutils-gdb between the following commits: commit 919731affbef19fcad8dddb0a595bb05755cb345 Author: mfortune matthew.fort...@imgtec.com Date: Tue May 20 13:28:20 2014 +0100 Add MIPS .module directive commit 351cdf24d223290b15fa991e5052ec9e9bd1e284 Author: Matthew Fortune matthew.fort...@imgtec.com Date: Tue Jul 29 11:27:59 2014 +0100 [MIPS] Implement O32 FPXX, FP64 and FP64A ABI extensions I have updated the configure check for .module to check for both .module and FPXX support. There was no point in separating the detection of .module from detection of FPXX as there is no need to switch to .module until using FPXX. Tested a build of the compiler for mipsel-linux-gnu, mips64el-linux-gnu with binutils which predates and postdates FPXX and checked that the configure results are correct and that .module vs .gnu_attribute is generated appropriately. Thanks, Matthew gcc/ * configure.ac: When checking for .module support ensure that o32 FPXX is supported to avoid a second configure check. * configure: Regenerate. diff --git a/gcc/configure.ac b/gcc/configure.ac index f6e7ec3..584400d 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -4280,8 +4280,9 @@ LCF0: [Define if your assembler supports .gnu_attribute.])]) gcc_GAS_CHECK_FEATURE([.module support], - gcc_cv_as_mips_dot_module,,, - [.module fp=32],, + gcc_cv_as_mips_dot_module,,[-32], + [.module mips2 + .module fp=xx],, [AC_DEFINE(HAVE_AS_DOT_MODULE, 1, [Define if your assembler supports .module.])]) if test x$gcc_cv_as_mips_dot_module = xno \
[committed] Cherry-pick a libsanitizer bugfix (PR sanitizer/64013)
Hi! I've committed this as obvious. 2014-11-21 Jakub Jelinek ja...@redhat.com PR sanitizer/64013 * sanitizer_common/sanitizer_linux.cc (FileExists): Cherry pick upstream r222532. --- libsanitizer/sanitizer_common/sanitizer_linux.cc(revision 222531) +++ libsanitizer/sanitizer_common/sanitizer_linux.cc(revision 222532) @@ -283,17 +283,15 @@ uptr internal_execve(const char *filenam // - sanitizer_common.h bool FileExists(const char *filename) { -#if SANITIZER_USES_CANONICAL_LINUX_SYSCALLS struct stat st; +#if SANITIZER_USES_CANONICAL_LINUX_SYSCALLS if (internal_syscall(SYSCALL(newfstatat), AT_FDCWD, filename, st, 0)) -return false; #else - struct stat st; if (internal_stat(filename, st)) +#endif return false; // Sanity check: filename is a regular file. return S_ISREG(st.st_mode); -#endif } uptr GetTid() { Jakub
Re: [rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1
On 14/11/2014 16:48, Alan Hayward alan.hayw...@arm.com wrote: This is a new version of my BE patch from a few weeks ago. This is part 1 and covers rtlanal.c. The second part will be aarch64 specific. When combined with the second patch, it fixes up movoi/ci/xi for Big Endian, so that the lsb of a big-endian integer ends up in the low byte of the highest-numbered register. This will apply cleanly by itself and no regressions were seen when testing aarch64 and x86_64 on make check. Changelog: 2014-11-14 Alan Hayward alan.hayw...@arm.com * rtlanal.c (subreg_get_info): Exit early for simple and common cases Alan. Hi, The second part to this patch (aarch64 specific) has been approved. Could someone review this one please. Thanks, Alan.
[PATCH] Improve PR63679
This patch picks up work that was in my working tree already and fixes it up. When targets choose not to emit piecewise aggregate inits during gimplification, or when that is disabled for other reasons (like being too large), then even FRE with all its tricks cannot constant fold from them. The following patch teaches it to do that via allowing offsetted reads (at the moment only reads from offset zero would have been handled) and finally trying to do a lookup from the static initializer. It also generalizes the code doing that to not only simplify reads from string constants but from arbitrary constants by means of the recently improved native_encode/interpret_expr code and from CONSTRUCTORs via using fold_ctor_reference. This exposes several testcases that use static uninitialized globals for which they don't expect loads to be optimized to zero ... Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress after a minor fix. To really fix PR63679 fold_ctor_reference would need to learn to combine several array fields to a vector constant or native_encode_expr would need to learn to encode CONSTRUCTORs. Also FRE would have to be run late. Still referencing that PR as it led me to re-investigate all this. I'll go ahead and apply this patch as bugfix on Monday unless somebody screams loudly. Richard. 2014-11-21 Richard Biener rguent...@suse.de PR tree-optimization/63679 * tree-ssa-sccvn.c: Include ipa-ref.h, plugin-api.h and cgraph.h. (copy_reference_ops_from_ref): Fix non-constant ADDR_EXPR case to properly leave off at -1. (fully_constant_vn_reference_p): Generalize folding from constant initializers. (vn_reference_lookup_3): When looking through aggregate copies handle offsetted reads and try simplifying the result to a constant. * gimple-fold.h (fold_ctor_reference): Export. * gimple-fold.c (fold_ctor_reference): Likewise. * gcc.dg/tree-ssa/ssa-fre-42.c: New testcase. * gcc.dg/tree-ssa/20030807-5.c: Avoid folding read from global to zero. 
* gcc.target/i386/ssetype-1.c: Likewise. * gcc.target/i386/ssetype-3.c: Likewise. * gcc.target/i386/ssetype-5.c: Likewise. Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c.orig 2014-11-21 11:09:55.230818525 +0100 --- gcc/tree-ssa-sccvn.c2014-11-21 14:51:03.328237909 +0100 *** along with GCC; see the file COPYING3. *** 65,70 --- 65,73 #include tree-ssa-sccvn.h #include tree-cfg.h #include domwalk.h + #include ipa-ref.h + #include plugin-api.h + #include cgraph.h /* This algorithm is based on the SCC algorithm presented by Keith Cooper and L. Taylor Simpson in SCC-Based Value numbering *** copy_reference_ops_from_ref (tree ref, v *** 936,942 temp.op0 = ref; break; } ! /* Fallthrough. */ /* These are only interesting for their operands, their existence, and their type. They will never be the last ref in the chain of references (IE they require an --- 939,945 temp.op0 = ref; break; } ! break; /* These are only interesting for their operands, their existence, and their type. They will never be the last ref in the chain of references (IE they require an *** fully_constant_vn_reference_p (vn_refere *** 1341,1364 } } ! /* Simplify reads from constant strings. */ ! else if (op-opcode == ARRAY_REF ! TREE_CODE (op-op0) == INTEGER_CST ! integer_zerop (op-op1) ! operands.length () == 2) ! { ! vn_reference_op_t arg0; ! arg0 = operands[1]; ! if (arg0-opcode == STRING_CST ! (TYPE_MODE (op-type) ! == TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0-op0 ! GET_MODE_CLASS (TYPE_MODE (op-type)) == MODE_INT ! GET_MODE_SIZE (TYPE_MODE (op-type)) == 1 ! tree_int_cst_sgn (op-op0) = 0 ! compare_tree_int (op-op0, TREE_STRING_LENGTH (arg0-op0)) 0) ! return build_int_cst_type (op-type, ! (TREE_STRING_POINTER (arg0-op0) ! [TREE_INT_CST_LOW (op-op0)])); } return NULL_TREE; --- 1344,1409 } } ! /* Simplify reads from constants or constant initializers. */ ! else if (BITS_PER_UNIT == 8 ! is_gimple_reg_type (ref-type) ! (!INTEGRAL_TYPE_P (ref-type) ! 
|| TYPE_PRECISION (ref-type) % BITS_PER_UNIT == 0)) ! { ! HOST_WIDE_INT off = 0; ! HOST_WIDE_INT size = tree_to_shwi (TYPE_SIZE (ref-type)); ! if (size % BITS_PER_UNIT != 0 ! || size MAX_BITSIZE_MODE_ANY_MODE) ! return NULL_TREE; ! size /= BITS_PER_UNIT; ! unsigned i;
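As a rough picture of the source pattern the description above is about, consider a local aggregate copied from a constant static initializer and then read at a nonzero offset. The sketch below is my own example, not one of the testcases in the patch, and it makes no claim about what any particular GCC version emits for it; it only shows a read whose value is fully determined by the initializer.

```c
/* A constant aggregate with a static initializer.  */
struct point3 { int x; int y; int z; };
static const struct point3 base_offset = { 10, 20, 30 };

/* The local is initialized by an aggregate copy, and the read of tmp.z
   is at a nonzero offset into that copy.  Looking through the copy to
   the initializer shows the result is always 30.  */
int read_through_copy(void)
{
    struct point3 tmp = base_offset;
    return tmp.z;
}
```

When the aggregate init is not split into per-field stores during gimplification, seeing that this function returns a constant requires looking through the copy at the right offset, which is the case the patch description says was previously only handled for offset zero.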
Re: SRA: don't drop clobbers
On Thu, Nov 20, 2014 at 7:11 PM, Martin Jambor mjam...@suse.cz wrote: Hi, On Mon, Nov 03, 2014 at 10:46:49PM +0100, Marc Glisse wrote: On Mon, 3 Nov 2014, Marc Glisse wrote: On Mon, 3 Nov 2014, Martin Jambor wrote: I just applied your patch on top of trunk revision 217032 on my Ah, that explains it, thanks. This patch is a follow-up to r217034. Still, I didn't expect the ICE you are seeing by applying this patch to older trunk, I'll try to reproduce that. It is TODO_update_address_taken that used to remove clobbers, and as you said ESRA goes straight to TODO_update_ssa, which explains why the clobbers caused trouble. In any case, after r217034, update_ssa should handle clobbers much better. Could you take another look based on a more recent trunk, please? Sorry for the delay. Anyway, on the current trunk (i.e. Tuesday checkout) the patch works as expected, there are assignments from default definitions now and even though we do not warn as we should, the patch improves the generated code. The function foo from the testcase is optimized to return SR.1_2(D); as soon as release_ssa now, whereas unpatched trunk leaves an undefined load even in the optimized dump. Thus, I like the patch and given that you posted it well before stage1 end, I'd like to see it committed. Richi, can you have a look and perhaps approve it? Yes, the patch is ok. Thanks, Richard. Thanks, Martin
Re: [PATCH 8/9] Negative numbers added for sreal class.
On 11/21/2014 01:03 PM, Richard Biener wrote: On Fri, Nov 21, 2014 at 12:21 PM, Martin Liška mli...@suse.cz wrote: On 11/14/2014 11:48 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 1:35 PM, mliska mli...@suse.cz wrote: gcc/ChangeLog: 2014-11-13 Martin Liska mli...@suse.cz * predict.c (propagate_freq): More elegant sreal API is used. (estimate_bb_frequencies): New static constants defined by sreal replace precomputed ones. * sreal.c (sreal::normalize): New function. (sreal::to_int): Likewise. (sreal::operator+): Likewise. (sreal::operator-): Likewise. * sreal.h: Definition of new functions added. Please use gcc_checking_assert()s everywhere. sreal is supposed to be fast... (I see it has current uses of gcc_assert - you may want to mass-convert them as a followup). --- gcc/predict.c | 30 +++- gcc/sreal.c | 56 gcc/sreal.h | 75 --- 3 files changed, 126 insertions(+), 35 deletions(-) diff --git a/gcc/predict.c b/gcc/predict.c index 0215e91..0f640f5 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -82,7 +82,7 @@ along with GCC; see the file COPYING3. If not see /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE, 1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. 
*/ -static sreal real_zero, real_one, real_almost_one, real_br_prob_base, +static sreal real_almost_one, real_br_prob_base, real_inv_br_prob_base, real_one_half, real_bb_freq_max; static void combine_predictions_for_insn (rtx_insn *, basic_block); @@ -2528,13 +2528,13 @@ propagate_freq (basic_block head, bitmap tovisit) bb-count = bb-frequency = 0; } - BLOCK_INFO (head)-frequency = real_one; + BLOCK_INFO (head)-frequency = sreal::one (); last = head; for (bb = head; bb; bb = nextbb) { edge_iterator ei; - sreal cyclic_probability = real_zero; - sreal frequency = real_zero; + sreal cyclic_probability = sreal::zero (); + sreal frequency = sreal::zero (); nextbb = BLOCK_INFO (bb)-next; BLOCK_INFO (bb)-next = NULL; @@ -2559,13 +2559,13 @@ propagate_freq (basic_block head, bitmap tovisit) * BLOCK_INFO (e-src)-frequency / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (e-src)-frequency; tmp *= real_inv_br_prob_base; frequency += tmp; } - if (cyclic_probability == real_zero) + if (cyclic_probability == sreal::zero ()) { BLOCK_INFO (bb)-frequency = frequency; } @@ -2577,7 +2577,7 @@ propagate_freq (basic_block head, bitmap tovisit) /* BLOCK_INFO (bb)-frequency = frequency / (1 - cyclic_probability) */ - cyclic_probability = real_one - cyclic_probability; + cyclic_probability = sreal::one () - cyclic_probability; BLOCK_INFO (bb)-frequency = frequency / cyclic_probability; } } @@ -2591,7 +2591,7 @@ propagate_freq (basic_block head, bitmap tovisit) = ((e-probability * BLOCK_INFO (bb)-frequency) / REG_BR_PROB_BASE); */ - sreal tmp (e-probability, 0); + sreal tmp = e-probability; tmp *= BLOCK_INFO (bb)-frequency; EDGE_INFO (e)-back_edge_prob = tmp * real_inv_br_prob_base; } @@ -2873,13 +2873,11 @@ estimate_bb_frequencies (bool force) if (!real_values_initialized) { real_values_initialized = 1; - real_zero = sreal (0, 0); - real_one = sreal (1, 0); - real_br_prob_base = sreal (REG_BR_PROB_BASE, 0); - real_bb_freq_max = sreal 
(BB_FREQ_MAX, 0); + real_br_prob_base = REG_BR_PROB_BASE; + real_bb_freq_max = BB_FREQ_MAX; real_one_half = sreal (1, -1); - real_inv_br_prob_base = real_one / real_br_prob_base; - real_almost_one = real_one - real_inv_br_prob_base; + real_inv_br_prob_base = sreal::one () / real_br_prob_base; + real_almost_one = sreal::one () - real_inv_br_prob_base; } mark_dfs_back_edges (); @@ -2897,7 +2895,7 @@ estimate_bb_frequencies (bool force) FOR_EACH_EDGE (e, ei, bb-succs) { - EDGE_INFO (e)-back_edge_prob = sreal (e-probability, 0); + EDGE_INFO (e)-back_edge_prob = e-probability; EDGE_INFO (e)-back_edge_prob *= real_inv_br_prob_base; } } @@ -2906,7 +2904,7 @@ estimate_bb_frequencies (bool force) to outermost to examine frequencies for back edges. */
Re: [PATCH 8/9] Negative numbers added for sreal class.
On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška mli...@suse.cz wrote: Hello. Ok, this is simplified, one can use sreal a = 12345 and it works ;) that's a new API, right? There is no max () and I think that using LONG_MIN here is asking for trouble (host dependence). The comment in the file says the max should be sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)? Sure, sreal can store much bigger(smaller) numbers :) Where do you need sreal::to_double? The host shouldn't perform double calculations so it can be only for dumping? In which case the user should have used sreal::dump (), maybe with extra arguments. That new function was a request from Honza, only for debugging purposes. I agree that dump should do this kind of job. If no other problem, I will run tests once more and commit it. Thanks, Martin -#define SREAL_MAX_EXP (INT_MAX / 4) +#define SREAL_MAX_EXP (INT_MAX / 8) this change doesn't look necessary anymore? Btw, it's also odd that... #define SREAL_PART_BITS 32 ... #define SREAL_MIN_SIG ((uint64_t) 1 (SREAL_PART_BITS - 1)) #define SREAL_MAX_SIG (((uint64_t) 1 SREAL_PART_BITS) - 1) thus all m_sig values fit in 32bits but we still use a uint64_t m_sig ... (the implementation uses 64bit for internal computations, but still the storage is wasteful?) Of course the way normalize() works requires that storage to be 64bits to store unnormalized values. I'd say ok with the SREAL_MAX_EXP change reverted. Thanks, Richard. Otherwise looks good to me and sorry for not noticing the above earlier. Thanks, Richard. Thanks, Martin }; extern void debug (sreal ref); @@ -76,12 +133,12 @@ inline sreal operator+= (sreal a, const sreal b) inline sreal operator-= (sreal a, const sreal b) { -return a = a - b; + return a = a - b; } inline sreal operator/= (sreal a, const sreal b) { -return a = a / b; + return a = a / b; } inline sreal operator*= (sreal a, const sreal b) -- 2.1.2
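To make the remarks about normalize() and the 64-bit significand more concrete, here is a small C sketch of a signed significand/exponent pair in the spirit of what is being discussed. It is emphatically not GCC's sreal: all names are hypothetical, it is plain C rather than a C++ class, it truncates where a real implementation might round, and it ignores exponent limits such as SREAL_MAX_EXP. It only shows how a value like 12345 is brought into the range delimited by the MIN_SIG and MAX_SIG style macros, and why intermediate values need more than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PART_BITS 32
#define SK_MIN_SIG ((uint64_t) 1 << (SK_PART_BITS - 1))
#define SK_MAX_SIG (((uint64_t) 1 << SK_PART_BITS) - 1)

/* Value represented: sig * 2^exp.  A nonzero normalized value keeps
   |sig| within [SK_MIN_SIG, SK_MAX_SIG].  (Hypothetical type.)  */
typedef struct {
    int64_t sig;
    int exp;
} sreal_sketch;

/* Magnitude in unsigned arithmetic, so INT64_MIN needs no special case.  */
static uint64_t sk_magnitude(int64_t v)
{
    return v < 0 ? (uint64_t) 0 - (uint64_t) v : (uint64_t) v;
}

/* Bring an arbitrary 64-bit significand into the normalized range.
   The input may be far outside 32 bits, which is why the storage is
   wider than the normalized value strictly needs.  */
sreal_sketch sk_normalize(int64_t sig, int exp)
{
    sreal_sketch r = { 0, 0 };
    if (sig == 0)
        return r;
    uint64_t mag = sk_magnitude(sig);
    while (mag < SK_MIN_SIG) { mag <<= 1; exp--; }
    while (mag > SK_MAX_SIG) { mag >>= 1; exp++; }  /* truncates low bits */
    r.sig = sig < 0 ? -(int64_t) mag : (int64_t) mag;
    r.exp = exp;
    return r;
}

sreal_sketch sk_from_int(int64_t v)
{
    return sk_normalize(v, 0);
}

/* Convert back to an integer, truncating toward zero.  */
int64_t sk_to_int(sreal_sketch a)
{
    uint64_t mag = sk_magnitude(a.sig);
    if (a.exp <= -64)
        return 0;
    assert(a.exp <= 31);            /* otherwise the result would not fit */
    mag = a.exp >= 0 ? mag << a.exp : mag >> -a.exp;
    return a.sig < 0 ? -(int64_t) mag : (int64_t) mag;
}
```

In this sketch sk_from_int(12345) ends up with exponent -18 and a significand between 2^31 and 2^32, and sk_to_int() recovers 12345 from it; negative values go through the same path with the sign carried on the significand.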
Re: [PATCH 8/9] Negative numbers added for sreal class.
On 11/21/2014 04:02 PM, Richard Biener wrote: On Fri, Nov 21, 2014 at 3:39 PM, Martin Liška mli...@suse.cz wrote: Hello. Ok, this is simplified, one can use sreal a = 12345 and it works ;) that's a new API, right? There is no max () and I think that using LONG_MIN here is asking for trouble (host dependence). The comment in the file says the max should be sreal (SREAL_MAX_SIG, SREAL_MAX_EXP) and the min sreal (-SREAL_MAX_SIG, SREAL_MAX_EXP)? Sure, sreal can store much bigger(smaller) numbers :) Where do you need sreal::to_double? The host shouldn't perform double calculations so it can be only for dumping? In which case the user should have used sreal::dump (), maybe with extra arguments. That new function was request from Honza, only for debugging purpose. I agree that dump should this kind of job. If no other problem, I will run tests once more and commit it. Thanks, Martin -#define SREAL_MAX_EXP (INT_MAX / 4) +#define SREAL_MAX_EXP (INT_MAX / 8) this change doesn't look necessary anymore? Btw, it's also odd that... #define SREAL_PART_BITS 32 ... #define SREAL_MIN_SIG ((uint64_t) 1 (SREAL_PART_BITS - 1)) #define SREAL_MAX_SIG (((uint64_t) 1 SREAL_PART_BITS) - 1) thus all m_sig values fit in 32bits but we still use a uint64_t m_sig ... (the implementation uses 64bit for internal computations, but still the storage is wasteful?) Of course the way normalize() works requires that storage to be 64bits to store unnormalized values. I'd say ok with the SREAL_MAX_EXP change reverted. Hi. You are right, this change was done because I used one bit for m_negative (bitfield), not needed any more. Final version attached. Thank you, Martin Thanks, Richard. Otherwise looks good to me and sorry for not noticing the above earlier. Thanks, Richard. 
Thanks,
Martin

 };

 extern void debug (sreal &ref);
@@ -76,12 +133,12 @@ inline sreal &operator+= (sreal &a, const sreal &b)

 inline sreal &operator-= (sreal &a, const sreal &b)
 {
-return a = a - b;
+  return a = a - b;
 }

 inline sreal &operator/= (sreal &a, const sreal &b)
 {
-return a = a / b;
+  return a = a / b;
 }

 inline sreal &operator*= (sreal &a, const sreal &b)
--
2.1.2

From b28e4264b5f9965ca5ab4f52ce6f4c9df00d4800 Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Fri, 21 Nov 2014 12:07:40 +0100
Subject: [PATCH 1/2] Negative numbers added for sreal class.

gcc/ChangeLog:

2014-11-13 Martin Liska mli...@suse.cz

* predict.c (propagate_freq): More elegant sreal API is used.
(estimate_bb_frequencies): Precomputed constants replaced by integer constants.
* sreal.c (sreal::normalize): New function.
(sreal::to_int): Likewise.
(sreal::operator+): Likewise.
(sreal::operator-): Likewise.
* sreal.h: Definition of new functions added.
---
 gcc/predict.c |  30
 gcc/sreal.c   | 114
 gcc/sreal.h   |  82
 3 files changed, 174 insertions(+), 52 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 779af11..0cfe4a9 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3. If not see
 /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE,
    1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. */
-static sreal real_zero, real_one, real_almost_one, real_br_prob_base,
+static sreal real_almost_one, real_br_prob_base,
   real_inv_br_prob_base, real_one_half, real_bb_freq_max;

 static void combine_predictions_for_insn (rtx_insn *, basic_block);
@@ -2541,13 +2541,13 @@ propagate_freq (basic_block head, bitmap tovisit)
       bb->count = bb->frequency = 0;
     }

-  BLOCK_INFO (head)->frequency = real_one;
+  BLOCK_INFO (head)->frequency = 1;
   last = head;
   for (bb = head; bb; bb = nextbb)
     {
       edge_iterator ei;
-      sreal cyclic_probability = real_zero;
-      sreal frequency = real_zero;
+      sreal cyclic_probability = 0;
+      sreal frequency = 0;

       nextbb = BLOCK_INFO (bb)->next;
       BLOCK_INFO (bb)->next = NULL;
@@ -2572,13 +2572,13 @@ propagate_freq (basic_block head, bitmap tovisit)
                    * BLOCK_INFO (e->src)->frequency /
                    REG_BR_PROB_BASE); */

-              sreal tmp (e->probability, 0);
+              sreal tmp = e->probability;
               tmp *= BLOCK_INFO (e->src)->frequency;
               tmp *= real_inv_br_prob_base;
               frequency += tmp;
             }

-          if (cyclic_probability == real_zero)
+          if (cyclic_probability == 0)
             {
               BLOCK_INFO (bb)->frequency = frequency;
             }
@@ -2590,7 +2590,7 @@ propagate_freq (basic_block head, bitmap tovisit)
               /* BLOCK_INFO (bb)->frequency = frequency
                  / (1 - cyclic_probability) */

-              cyclic_probability = real_one - cyclic_probability;
+              cyclic_probability = sreal (1) - cyclic_probability;
               BLOCK_INFO (bb)->frequency = frequency / cyclic_probability;
             }
         }
@@ -2604,7 +2604,7 @@ propagate_freq (basic_block head, bitmap tovisit)
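
The SREAL_MIN_SIG / SREAL_MAX_SIG exchange in this thread comes down to keeping a significand inside a 32-bit window and adjusting a binary exponent to compensate, with 64-bit storage so that unnormalized values fit. The following is a minimal sketch of that idea, not the sreal implementation from the patch: `ToyReal`, `normalize`, `from_int` and `to_int` are hypothetical names, and only the 32-bit window is taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a value is sig * 2^exp, with |sig| kept in a 32-bit window.
constexpr int kPartBits = 32;
constexpr uint64_t kMinSig = uint64_t{1} << (kPartBits - 1);
constexpr uint64_t kMaxSig = (uint64_t{1} << kPartBits) - 1;

struct ToyReal {
  int64_t sig;  // signed significand; 64 bits so unnormalized values fit
  int exp;      // binary exponent
};

// Shift the magnitude into [kMinSig, kMaxSig] and compensate in exp.
inline ToyReal normalize(ToyReal v) {
  if (v.sig == 0) return ToyReal{0, 0};
  const bool negative = v.sig < 0;
  uint64_t mag = negative ? uint64_t{0} - static_cast<uint64_t>(v.sig)
                          : static_cast<uint64_t>(v.sig);
  while (mag < kMinSig) { mag <<= 1; --v.exp; }
  while (mag > kMaxSig) { mag >>= 1; ++v.exp; }  // drops low bits
  v.sig = negative ? -static_cast<int64_t>(mag) : static_cast<int64_t>(mag);
  return v;
}

// Hypothetical helper: build a normalized value from a host integer.
inline ToyReal from_int(int64_t n) { return normalize(ToyReal{n, 0}); }

// Convert back, truncating toward zero; the caller keeps the result in range.
inline int64_t to_int(ToyReal v) {
  if (v.exp <= -kPartBits) return 0;  // |sig| < 2^32, so nothing survives
  if (v.exp < 0) return v.sig / (int64_t{1} << -v.exp);
  assert(v.exp < 31);
  return v.sig * (int64_t{1} << v.exp);
}
```

In this sketch `from_int(12345)` is stored with a large significand and a negative exponent, and `to_int` undoes the shift. Everything stays in integer arithmetic, which matches the point made above that the host should not perform double calculations.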
Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library
On 19 Nov 21:11, Ilya Enkovich wrote:
2014-11-19 20:55 GMT+03:00 Jeff Law l...@redhat.com:
On 11/19/14 07:15, Ilya Enkovich wrote:

--
2014-11-19 Ilya Enkovich ilya.enkov...@intel.com

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-11-19 Ilya Enkovich ilya.enkov...@intel.com

* gcc.c (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(MPX_SPEC): New.
(LINK_COMMAND_SPEC): Add MPX_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-11-19 Ilya Enkovich ilya.enkov...@intel.com

Initial commit.

So I have only done a cursory peek at this code, but one thing which I did immediately note was the CPU feature testing stuff. Shouldn't all that stuff be integrated into the feature testing bits already found in libgcc?

I'll have a look at these features.

I've asked the steering committee to vote on accepting the runtime -- necessary given Intel is keeping copyright ownership to the best of my knowledge.

Thanks!
Ilya

Jeff

Jakub objected to adding the CPUID checks used in the MPX runtime to __builtin_cpu_supports. So I just added the required bits to cpuid.h and removed the local implementation of cpuid. Is it OK?

Thanks,
Ilya
--
2014-11-21 Ilya Enkovich ilya.enkov...@intel.com

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-11-21 Ilya Enkovich ilya.enkov...@intel.com

* config/i386/cpuid.h (bit_MPX): New.
(bit_BNDREGS): New.
(bit_BNDCSR): New.
* gcc.c (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(MPX_SPEC): New.
(LINK_COMMAND_SPEC): Add MPX_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-11-21 Ilya Enkovich ilya.enkov...@intel.com

Initial commit.
diff --git a/Makefile.def b/Makefile.def index 40bbca9..4a535d2 100644 --- a/Makefile.def +++ b/Makefile.def @@ -128,6 +128,9 @@ target_modules = { module= libsanitizer; bootstrap=true; lib_path=.libs; raw_cxx=true; }; +target_modules = { module= libmpx; + bootstrap=true; + lib_path=.libs; }; target_modules = { module= libvtv; bootstrap=true; lib_path=.libs; diff --git a/configure.ac b/configure.ac index b27fb1d..ccb119b 100644 --- a/configure.ac +++ b/configure.ac @@ -162,6 +162,7 @@ target_libraries=target-libgcc \ target-libstdc++-v3 \ target-libsanitizer \ target-libvtv \ + target-libmpx \ target-libssp \ target-libquadmath \ target-libgfortran \ @@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then fi fi + +# Disable libmpx on unsupported systems. +if test -d ${srcdir}/libmpx; then +if test x$enable_libmpx = x; then + AC_MSG_CHECKING([for libmpx support]) + if (srcdir=${srcdir}/libmpx; \ + . ${srcdir}/configure.tgt; \ + test $LIBMPX_SUPPORTED != yes) + then + AC_MSG_RESULT([no]) + noconfigdirs=$noconfigdirs target-libmpx + else + AC_MSG_RESULT([yes]) + fi +fi +fi + + + # Disable libquadmath for some systems. case ${target} in avr-*-*) @@ -2652,6 +2672,11 @@ if echo ${target_configdirs} | grep libvtv /dev/null 21 bootstrap_target_libs=${bootstrap_target_libs}target-libvtv, fi +# If we are building libmpx, bootstrap it. +if echo ${target_configdirs} | grep libmpx /dev/null 21; then + bootstrap_target_libs=${bootstrap_target_libs}target-libmpx, +fi + # Determine whether gdb needs tk/tcl or not. # Use 'maybe' since enable_gdbtk might be true even if tk isn't available # and in that case we want gdb to be built without tk. Ugh! diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 85dcb98..8f5d76c 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1040,6 +1040,9 @@ fchkp-instrument-marked-only C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0) Instrument only functions marked with bnd_instrument attribute. 
+static-libmpx +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index 133e356..f85cebb 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -72,6 +72,7 @@ #define bit_AVX2 (1 5) #define bit_BMI2 (1 8) #define bit_RTM(1 11) +#define bit_MPX(1 14) #define bit_AVX512F(1 16) #define bit_AVX512DQ (1 17) #define bit_RDSEED (1 18) @@ -87,6 +88,10 @@ /* %ecx */ #define bit_PREFETCHWT1 (1 0) +/* XFEATURE_ENABLED_MASK register bits (%eax == 13, %ecx == 0) */ +#define
Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library
On 18 Nov 14:15, Jeff Law wrote:
On 11/18/14 09:48, Ilya Enkovich wrote:
On 15 Nov 00:10, Jeff Law wrote:
On 11/14/14 10:26, Ilya Enkovich wrote:

Hi,

This patch introduces a simple library with several wrappers to be used with MPX and Pointer Bounds Checker. The wrappers make it possible to obtain, copy and simply keep alive bounds through widely used library calls. This significantly increases checking quality.

Thanks,
Ilya
--
gcc/

2014-11-14 Ilya Enkovich ilya.enkov...@intel.com

* gcc.c (MPX_SPEC): Add wrappers library.

libmpx/

2014-11-14 Ilya Enkovich ilya.enkov...@intel.com

* Makefile.am (SUBDIRS): New.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Add mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.

As Joseph mentioned, symbol versioning. Anytime a target side library is added to GCC, it should be properly versioned.

Don't forget copyright headers in the new files. Remember it has to be suitable for embedding in the target without infecting the target with the GPL. LGPL or GPL + exception clause seem the most appropriate to me.

Jeff

Thank you for the review! Here is a version with license and versioning added.

Thanks,
Ilya
--
gcc/

2014-11-18 Ilya Enkovich ilya.enkov...@intel.com

* gcc.c (MPX_SPEC): Add wrappers library.

libmpx/

2014-11-18 Ilya Enkovich ilya.enkov...@intel.com

* Makefile.am (SUBDIRS): New.
(MAKEOVERRIDES): New.
* Makefile.in: Regenerate.
* configure.ac: Add mpxintr/Makefile to config files.
* configure: Regenerate.
* mpxwrap/Makefile.am: New.
* mpxwrap/Makefile.in: New.
* mpxwrap/libtool-version: New.
* mpxwrap/mpx_wrappers.cc: New.
* mpxwrap/libmpxwrappers.map: New.

OK.
Jeff

Hi,

There is a missing check in the libmpx configure. We may try to build mpxwrappers when binutils doesn't support MPX and thus get a build failure. I added a check for MPX support in the assembler being used, and the mpxwrappers library is now built conditionally.
Since the latest version of runtime library supports static link, I also supported -static-libmpxwrappers option. Does it look OK? Thanks, Ilya -- gcc/ 2014-11-21 Ilya Enkovich ilya.enkov...@intel.com * gcc.c (LIBMPX_WRAPPERSSPEC): New. (MPX_SPEC): Add wrappers library. * c-family/c.opt (static-libmpxwrappers): New. libmpx/ 2014-11-21 Ilya Enkovich ilya.enkov...@intel.com * Makefile.am (SUBDIRS): Add mpxwrap when used AS supports MPX. (MAKEOVERRIDES): New. * Makefile.in: Regenerate. * configure.ac: Check AS supports MPX. Add mpxintr/Makefile to config files. * configure: Regenerate. * mpxwrap/Makefile.am: New. * mpxwrap/Makefile.in: New. * mpxwrap/libtool-version: New. * mpxwrap/mpx_wrappers.cc: New. * mpxwrap/libmpxwrappers.map: New. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 8f5d76c..283c632 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1043,6 +1043,9 @@ Instrument only functions marked with bnd_instrument attribute. static-libmpx Driver +static-libmpxwrappers +Driver + fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus diff --git a/gcc/gcc.c b/gcc/gcc.c index 75e5767..aa8c9a3 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -828,9 +828,23 @@ proper position among the other output files. */ #endif #endif +#ifndef LIBMPXWRAPPERS_SPEC +#if defined(HAVE_LD_STATIC_DYNAMIC) +#define LIBMPXWRAPPERS_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:%{!fno-chkp-use-wrappers:\ +%{static:-lmpxwrappers}\ +%{!static:%{static-libmpxwrappers: LD_STATIC_OPTION --whole-archive}\ +-lmpxwrappers %{static-libmpxwrappers:--no-whole-archive \ +LD_DYNAMIC_OPTION } +#else +#define LIBMPXWRAPPERS_SPEC \ +%{mmpx:%{fcheck-pointer-bounds:{!fno-chkp-use-wrappers:-lmpxwrappers}}} +#endif +#endif + #ifndef MPX_SPEC #define MPX_SPEC \ -%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC }} +%{!nostdlib:%{!nodefaultlibs: LIBMPX_SPEC LIBMPXWRAPPERS_SPEC }} #endif /* -u* was put back because both BSD and SysV seem to support it. 
*/ diff --git a/libmpx/Makefile.am b/libmpx/Makefile.am index 6cee4ac..bd0a8b6 100644 --- a/libmpx/Makefile.am +++ b/libmpx/Makefile.am @@ -2,6 +2,9 @@ ACLOCAL_AMFLAGS = -I .. -I ../config if LIBMPX_SUPPORTED SUBDIRS = mpxrt +if MPX_AS_SUPPORTED +SUBDIRS += mpxwrap +endif nodist_toolexeclib_HEADERS = libmpx.spec endif @@ -45,3 +48,5 @@ AM_MAKEFLAGS = \ PICFLAG=$(PICFLAG) \ RANLIB=$(RANLIB) \ DESTDIR=$(DESTDIR) + +MAKEOVERRIDES = diff --git a/libmpx/configure.ac b/libmpx/configure.ac index bd7a5eb..180503c 100644 --- a/libmpx/configure.ac +++ b/libmpx/configure.ac @@ -93,6 +93,18 @@ AC_CHECK_TOOL(AS, as)
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus
Re: [PATCH][wwwdocs] Add Cortex-A53 erratum workaround note to AArch64 changes for 4.8
On 17 November 2014 11:42, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Makes sense. Here are the changes for the 4.9 and 4.8 changes.html pages. Ok?

This looks ok to me, I'd suggest changing...

+ <li>Starting with GCC 4.8.4 a workaround for the ARM Cortex-A53

to

+ <li>As of GCC 4.8.4

OK with that change.
/Marcus
Re: [PATCH][AArch64]Add vec_shr pattern for 64-bit vectors using ush{l,r}; enable tests.
On 14 November 2014 15:46, Alan Lawrence alan.lawre...@arm.com wrote:

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_shr<mode>): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_whole_vector_shift): Add aarch64{,_be}.

OK
/Marcus
Re: [PATCH][AArch64]Tidy up aarch64_simd_expand_args
On 17 November 2014 16:56, Alan Lawrence alan.lawre...@arm.com wrote: This is a pure tidyup, no new functionality. Changes are (1) Use op[0] to store the result operand, rather than a separate variable, thus combining the two large switch statements into one; (2) The 'arg' and 'mode' arrays were (almost-)only ever used to store data *within* each iteration, so turn them into scalar variables. (3) Use 'opc' rather than 'argc' as it indexes operands. Cross-tested check-gcc on aarch64-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Refactor by combining switch statements and make arrays into scalars. OK /Marcus
Re: [PATCH][AArch64] Add vector pattern for __builtin_ctz
On 14 November 2014 16:38, Jiong Wang jiong.w...@arm.com wrote:

gcc/

* config/aarch64/iterators.md (VS): New mode iterator.
(vsi2qi): New mode attribute.
(VSI2QI): Likewise.
* config/aarch64/aarch64-simd-builtins.def: New entry for ctz.
* config/aarch64/aarch64-simd.md (ctz<mode>2): New pattern for ctz.
* config/aarch64/aarch64-builtins.c (aarch64_builtin_vectorized_function): Support BUILT_IN_CTZ.

gcc/testsuite/

* gcc.target/aarch64/vect_ctz_1.c: New testcase.

OK
/Marcus
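
The ChangeLog does not say how the new ctz pattern is expanded. One common way to get count-trailing-zeros on a target that only has count-leading-zeros is to reverse the bits first. The scalar sketch below illustrates that identity under that assumption; it is not taken from the patch, and the function names are hypothetical.

```cpp
#include <cstdint>

// Reverse the bit order of a 32-bit value with the usual swap ladder.
inline uint32_t reverse_bits(uint32_t x) {
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}

// Portable count-leading-zeros; returns 32 for a zero input.
inline int count_leading_zeros(uint32_t x) {
  if (x == 0) return 32;
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// ctz(x) == clz(reverse_bits(x)): the lowest set bit becomes the highest.
inline int count_trailing_zeros(uint32_t x) {
  return count_leading_zeros(reverse_bits(x));
}
```

A vector version would apply the same two steps to every lane.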
Re: [PATCH][AArch64][1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On 18 November 2014 12:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 18/11/14 10:33, Kyrill Tkachov wrote: diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 264bf01..ad7ec43c 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -36,7 +36,6 @@ extern int arm_no_early_alu_shift_value_dep (rtx, rtx); extern int arm_no_early_mul_dep (rtx, rtx); extern int arm_no_early_store_addr_dep (rtx, rtx); extern bool arm_rtx_shift_left_p (rtx); - /* RTX cost table definitions. These are used when tuning for speed rather than for size and should reflect the_additional_ cost over the cost of the fastest instruction in the machine, which is COSTS_N_INSNS (1). This hunk should not be here. I'll remove it when I commit if approved... Sorry for that. Kyrill Ok, with that hunk dropped. /Marcus
Re: [PATCH][AArch64][2/5] Implement adrp+add fusion
On 18 November 2014 10:33, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c: Include tm-constrs.h (AARCH64_FUSE_ADRP_ADD): Define. (cortexa57_tunings): Add AARCH64_FUSE_ADRP_ADD to fuseable_ops. (cortexa53_tunings): Likewise. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_ADD. OK /Marcus
Re: [PATCH][AArch64][3/5] Implement fusion of MOVK+MOVK
On 18 November 2014 10:33, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (AARCH64_FUSE_MOVK_MOVK): Define. (cortexa53_tunings): Specify AARCH64_FUSE_MOVK_MOVK in fuseable_ops. (cortexa57_tunings): Likewise. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_MOVK_MOVK. OK /Marcus
Re: [PATCH][AArch64][4/5] Implement fusion of ADRP+LDR
On 18 November 2014 10:33, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (AARCH64_FUSE_ADRP_LDR): Define. (cortexa53_tunings): Specify AARCH64_FUSE_ADRP_LDR in fuseable_ops. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_LDR. OK /Marcus
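
The three fusion patches above share one shape: each core's tuning carries a mask of fuseable_ops, and the pair predicate accepts two adjacent instructions only if the matching bit is set. The sketch below is a rough illustration of that shape, not the code in aarch64.c. The bit values, the `Insn` struct and the exact operand conditions are assumptions; only the flag names and the per-core sets follow the ChangeLog entries.

```cpp
// Hypothetical bit values; only the flag names echo the ChangeLog entries.
enum FuseOp : unsigned {
  kFuseNothing = 0,
  kFuseAdrpAdd = 1u << 0,
  kFuseMovkMovk = 1u << 1,
  kFuseAdrpLdr = 1u << 2,
};

// Per-core sets as listed in the ChangeLogs of patches 2/5 to 4/5.
constexpr unsigned kToyCortexA53Ops = kFuseAdrpAdd | kFuseMovkMovk | kFuseAdrpLdr;
constexpr unsigned kToyCortexA57Ops = kFuseAdrpAdd | kFuseMovkMovk;

enum class Opcode { Adrp, Add, Movk, Ldr, Other };

// Simplified instruction: one written register, at most one read register.
struct Insn {
  Opcode op;
  int dest;  // register written
  int src;   // register read (address base for Ldr), -1 if none
};

// True when the tuning enables a fusion kind and (prev, curr) has its shape.
inline bool macro_fusion_pair_p(const Insn& prev, const Insn& curr,
                                unsigned fuseable_ops) {
  const bool curr_reads_prev = curr.src == prev.dest;
  if ((fuseable_ops & kFuseAdrpAdd) && prev.op == Opcode::Adrp &&
      curr.op == Opcode::Add && curr_reads_prev && curr.dest == prev.dest)
    return true;
  if ((fuseable_ops & kFuseMovkMovk) && prev.op == Opcode::Movk &&
      curr.op == Opcode::Movk && curr.dest == prev.dest)
    return true;
  if ((fuseable_ops & kFuseAdrpLdr) && prev.op == Opcode::Adrp &&
      curr.op == Opcode::Ldr && curr_reads_prev)
    return true;
  return false;
}
```

With these toy sets an adrp followed by a dependent ldr is accepted for the Cortex-A53 mask and rejected for the Cortex-A57 mask, mirroring the fact that only cortexa53_tunings gains AARCH64_FUSE_ADRP_LDR in patch 4/5.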
Re: [PATCH, i386] Add new arg values for __builtin_cpu_supports
On 11/20/14 09:40, Jakub Jelinek wrote:
On Thu, Nov 20, 2014 at 07:36:03PM +0300, Ilya Enkovich wrote:

Hi,

The MPX runtime checks some feature bits to verify that MPX is fully supported. The runtime does this with cpuid calls, but there is a __builtin_cpu_supports which may be used for that. Unfortunately it currently doesn't support the required bits. Will it be OK to add them for trunk?

I think using cpuid for that is just fine. __builtin_cpu_supports is for ISA additions users might actually want to version code for. MPX stuff, as the instructions are nops without hw support, is not something one would multi-version a function for. If anything, AVX512F and AVX512BW+VL might be good candidates for that, not MPX.

Sorry, I didn't know that __builtin_cpu_supports was really only meant for user multi-versioning. In that case, it won't make any sense to put the MPX stuff in there. Sorry for sending you down the wrong path, Ilya.

jeff
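
For readers unfamiliar with the multi-versioning use case mentioned here, this is roughly what a user-level dispatch on __builtin_cpu_supports looks like. It is a sketch under assumptions: the function names are hypothetical, the "AVX2 variant" is a stand-in that computes the same result, and the builtin is only used when compiling with a GCC-compatible compiler for x86.

```cpp
#include <cstddef>

using SumFn = long (*)(const int*, std::size_t);

// Baseline version that runs everywhere.
inline long sum_baseline(const int* data, std::size_t n) {
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Stand-in for a variant that would be compiled with AVX2 enabled.
// It returns the same result here; only the dispatch is the point.
inline long sum_avx2_variant(const int* data, std::size_t n) {
  return sum_baseline(data, n);
}

// Pick an implementation once, based on what the running CPU reports.
inline SumFn pick_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return sum_avx2_variant;
#endif
  return sum_baseline;
}

// Public entry point: resolves the variant on first use.
inline long sum(const int* data, std::size_t n) {
  static const SumFn fn = pick_sum();
  return fn(data, n);
}
```

The argument in this thread is that MPX does not fit this pattern: its instructions behave as nops on hardware without support, so there is no second variant to select.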
[C++ PATCH] Allow void type as a literal type in C++14
I noticed that C++14 [basic.types] says that a type is a literal type if it is: void, [...]. Yet our literal_type_p doesn't consider void type as a literal type. The following is an attempt to fix that along with a testcase. It seems that void was only added in C++14, so check for cxx14 as well.

Bootstrapped/regtested on ppc64-linux, ok for trunk?

2014-11-21 Marek Polacek pola...@redhat.com

* constexpr.c (literal_type_p): Return true for void type in C++14.

* g++.dg/cpp0x/constexpr-function2.C: Limit dg-error to C++11.
* g++.dg/cpp0x/constexpr-neg1.C: Likewise.
* g++.dg/cpp1y/constexpr-void1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 2678223..0a258cf 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -59,7 +59,8 @@ literal_type_p (tree t)
 {
   if (SCALAR_TYPE_P (t)
       || TREE_CODE (t) == VECTOR_TYPE
-      || TREE_CODE (t) == REFERENCE_TYPE)
+      || TREE_CODE (t) == REFERENCE_TYPE
+      || (VOID_TYPE_P (t) && cxx_dialect >= cxx14))
     return true;
   if (CLASS_TYPE_P (t))
     {
diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-function2.C gcc/testsuite/g++.dg/cpp0x/constexpr-function2.C
index 8c51c9d..95ee244 100644
--- gcc/testsuite/g++.dg/cpp0x/constexpr-function2.C
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-function2.C
@@ -23,7 +23,7 @@ constexpr int area = squarei(side); // { dg-error "side|argument" }
 int next(constexpr int x) // { dg-error "parameter" }
 { return x + 1; }

-constexpr void f(int x) // { dg-error "return type .void" }
+constexpr void f(int x) // { dg-error "return type .void" { target c++11_only } }
 { /* ...
*/ } constexpr int prev(int x) diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-neg1.C gcc/testsuite/g++.dg/cpp0x/constexpr-neg1.C index 35f5e8e..dfa1d6b 100644 --- gcc/testsuite/g++.dg/cpp0x/constexpr-neg1.C +++ gcc/testsuite/g++.dg/cpp0x/constexpr-neg1.C @@ -29,7 +29,7 @@ int next(constexpr int x) { // { dg-error parameter } extern constexpr int memsz;// { dg-error definition } // error: return type is void -constexpr void f(int x)// { dg-error void } +constexpr void f(int x)// { dg-error void { target c++11_only } } { /* ... */ } // error: use of decrement constexpr int prev(int x) diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-void1.C gcc/testsuite/g++.dg/cpp1y/constexpr-void1.C index e69de29..10ef5bc 100644 --- gcc/testsuite/g++.dg/cpp1y/constexpr-void1.C +++ gcc/testsuite/g++.dg/cpp1y/constexpr-void1.C @@ -0,0 +1,13 @@ +// { dg-do compile { target c++14 } } + +struct S +{ + int i = 20; + + constexpr void + foo (void) + { +if (i 20) + __builtin_abort (); + } +}; Marek
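
To make the effect of the change concrete, here is a small free-standing example of what C++14 permits once void counts as a literal type. It is an illustration only, not part of the patch or its testsuite; the function names are hypothetical and the code needs -std=c++14 or later.

```cpp
// A constexpr function may return void in C++14; in C++11 this is rejected
// because the return type is not a literal type.
constexpr void add_one(int& x) { x += 1; }

// Calls the void constexpr function during constant evaluation.
constexpr int plus_two(int x) {
  add_one(x);
  add_one(x);
  return x;
}

static_assert(plus_two(40) == 42, "evaluated at compile time");
```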
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Aehm Kirill, excuse me please, but if I do autogen Makefile.def I get this from svn diff Index: Makefile.in === --- Makefile.in (revision 217890) +++ Makefile.in (working copy) @@ -35238,9 +35238,6 @@ $(SHELL) $(srcdir)/mkinstalldirs $(TARGET_SUBDIR)/liboffloadmic ; \ $(NORMAL_TARGET_EXPORTS) \ echo Configuring in $(TARGET_SUBDIR)/liboffloadmic; \ - \ - this_target=${target_alias}; \ - \ cd $(TARGET_SUBDIR)/liboffloadmic || exit 1; \ case $(srcdir) in \ /* | [A-Za-z]:[\\/]*) topdir=$(srcdir) ;; \ @@ -35248,14 +35245,12 @@ sed -e 's,\./,,g' -e 's,[^/]*/,../,g' `$(srcdir) ;; \ esac; \ module_srcdir=liboffloadmic; \ - srcdiroption=--srcdir=$${topdir}/liboffloadmic; \ - libsrcdir=$$s/liboffloadmic; \ rm -f no-such-file || : ; \ CONFIG_SITE=no-such-file $(SHELL) \ $$s/$$module_srcdir/configure \ --srcdir=$${topdir}/$$module_srcdir \ $(TARGET_CONFIGARGS) --build=${build_alias} --host=${target_alias} \ - --target=$${this_target} $${srcdiroption} @extra_liboffloadmic_configure_flags@ \ + --target=${target_alias} @extra_liboffloadmic_configure_flags@ \ || exit 1 @endif target-liboffloadmic svn blame Makefile.in points to: r217498 | kyukhin | 2014-11-13 15:03:17 +0100 (Thu, 13 Nov 2014) | 110 lines [PATCH 2/4] OpenMP 4.0 offloading to Intel MIC: liboffloadmic. * Makefile.def: Add liboffloadmic to target_modules. Make liboffloadmic depend on libgomp's configure, libstdc++ and libgcc. * Makefile.in: Regenerate. * configure: Regenerate. * configure.ac: Add liboffloadmic to target binaries. Restrict liboffloadmic for POSIX and i*86, and x86_64 architectures. Add liboffloadmic to noconfig list when C++ is not supported. so, did you really regenerate Makefile.in in that patch, or am I missing something ? Regards, Bernd.
libgo patch committed: Use ppc64le for little-endian 64-bit PowerPC architecture
This patch by Lynn A. Boger changes libgo to use ppc64le for little-endian 64-bit PowerPC. Bootstrapped and ran testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r 96de84075614 libgo/configure.ac --- a/libgo/configure.acTue Nov 18 09:28:24 2014 -0800 +++ b/libgo/configure.acFri Nov 21 10:01:45 2014 -0800 @@ -194,6 +194,7 @@ mips_abi=unknown is_ppc=no is_ppc64=no +is_ppc64le=no is_s390=no is_s390x=no is_sparc=no @@ -266,11 +267,18 @@ #ifdef _ARCH_PPC64 #error 64-bit #endif], -[is_ppc=yes], [is_ppc64=yes]) +[is_ppc=yes], +[AC_COMPILE_IFELSE([ +#if defined(_BIG_ENDIAN) || defined(__BIG_ENDIAN__) +#error 64be +#endif], +[is_ppc64le=yes],[is_ppc64=yes])]) if test $is_ppc = yes; then GOARCH=ppc +elif test $is_ppc64 = yes; then + GOARCH=ppc64 else - GOARCH=ppc64 + GOARCH=ppc64le fi ;; s390*-*-*) @@ -310,6 +318,7 @@ AM_CONDITIONAL(LIBGO_IS_MIPSO64, test $mips_abi = o64) AM_CONDITIONAL(LIBGO_IS_PPC, test $is_ppc = yes) AM_CONDITIONAL(LIBGO_IS_PPC64, test $is_ppc64 = yes) +AM_CONDITIONAL(LIBGO_IS_PPC64LE, test $is_ppc64le = yes) AM_CONDITIONAL(LIBGO_IS_S390, test $is_s390 = yes) AM_CONDITIONAL(LIBGO_IS_S390X, test $is_s390x = yes) AM_CONDITIONAL(LIBGO_IS_SPARC, test $is_sparc = yes) diff -r 96de84075614 libgo/go/go/build/syslist.go --- a/libgo/go/go/build/syslist.go Tue Nov 18 09:28:24 2014 -0800 +++ b/libgo/go/go/build/syslist.go Fri Nov 21 10:01:45 2014 -0800 @@ -5,4 +5,4 @@ package build const goosList = darwin dragonfly freebsd linux nacl netbsd openbsd plan9 solaris windows -const goarchList = 386 amd64 amd64p32 arm arm64 alpha m68k mipso32 mipsn32 mipsn64 mipso64 ppc ppc64 s390 s390x sparc sparc64 +const goarchList = 386 amd64 amd64p32 arm arm64 alpha m68k mipso32 mipsn32 mipsn64 mipso64 ppc ppc64 ppc64le s390 s390x sparc sparc64 diff -r 96de84075614 libgo/testsuite/gotest --- a/libgo/testsuite/gotestTue Nov 18 09:28:24 2014 -0800 +++ b/libgo/testsuite/gotestFri Nov 21 10:01:45 2014 -0800 @@ -379,7 +379,7 @@ { text=T case $GOARCH in - ppc64) 
text=[TD] ;; + ppc64*) text=[TD] ;; esac symtogo='sed -e s/_test/XXXtest/ -e s/.*_\([^_]*\.\)/\1/ -e s/XXXtest/_test/'
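
The configure.ac change above is a two-step decision: first test whether _ARCH_PPC64 is defined, then, for 64-bit only, test the big-endian macros. A compact restatement of that decision table follows, as a sketch; the function name and the boolean parameters are hypothetical stand-ins for the results of the two compile checks.

```cpp
#include <string>

// is_64bit:      the compiler defines _ARCH_PPC64
// is_big_endian: the compiler defines _BIG_ENDIAN or __BIG_ENDIAN__
inline std::string ppc_goarch(bool is_64bit, bool is_big_endian) {
  if (!is_64bit) return "ppc";  // endianness is not consulted for 32-bit
  return is_big_endian ? "ppc64" : "ppc64le";
}
```

The gotest change then matches both 64-bit values with the single glob ppc64*.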
Re: [PATCH] Set goarch to ppc64le where needed for gccgo testing
On Wed, Nov 19, 2014 at 12:55 PM, Lynn A. Boger labo...@linux.vnet.ibm.com wrote: Updated patch: Thanks. Committed. Ian On 11/19/2014 09:01 AM, Lynn A. Boger wrote: Hi, This change goes along with the change to the GOARCH setting in gccgo for ppc64le which will be done in gofrontend. The description for that change is here: https://groups.google.com/forum/#!topic/gofrontend-dev/ocEttrpsw-s This change has been bootstrapped and tested along with the above change to gofrontend on ppc, ppc64, and ppc64le. 2014-11-19 Lynn Boger labo...@linux.vnet.ibm.com * gcc/testsuite/go.test/go-test.exp: Add case for ppc64le goarch value for go testing Index: gcc/testsuite/go.test/go-test.exp === --- gcc/testsuite/go.test/go-test.exp (revision 217507) +++ gcc/testsuite/go.test/go-test.exp (working copy) @@ -237,13 +237,15 @@ proc go-set-goarch { } { return } } - powerpc*-*-* { - if [check_effective_target_ilp32] { - set goarch ppc - } else { - set goarch ppc64 - } + powerpc-*-* { + set goarch ppc } + powerpc64-*-* { + set goarch ppc64 + } + powerpc64le-*-* { + set goarch ppc64le + } s390-*-* { set goarch s390 }
[PATCH 0/2, AArch64, v3] APM X-Gene 1 cost-table and pipeline model
The following patch-series adds optimized support for the APM X-Gene 1 by providing a cost-model and pipeline-model. The pipeline-model has a few long reservation-chains, but looking at the stats for the generated NDA shows that it's well below other AArch64 cores (e.g. Cortex-A53) in overall size.

This includes all the requested enhancements and cleans up the naming of the various states and reservations in 'xgene1.md'.

Even though it isn't wired into the 32bit ARM backend yet, we've decided to keep the machine-description in config/arm... after all, the X-Gene family is backwards compatible with ARMv7 and our benchmarking has shown good potential for performance improvements from improving the instruction selection and scheduling when using ARMv7 code (X-Gene 1 is a 4-way superscalar design).

After having a few further discussions with my colleagues regarding the latencies and modelling of divides in the pipeline, we've readjusted the modelling of the divides another time... even though it doesn't make a difference in real-world benchmarks.

Thanks to everyone who took the time to review and comment.

Philipp Tomsich (2):
  Core definition for APM XGene-1 and associated cost-table.
  Pipeline model for APM XGene-1.

 gcc/ChangeLog                        |  14 +
 gcc/config/aarch64/aarch64-cores.def |   1 +
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.c         |  62
 gcc/config/aarch64/aarch64.md        |   3 +-
 gcc/config/arm/aarch-cost-tables.h   | 101 +++
 gcc/config/arm/xgene1.md             | 532 +++
 gcc/doc/invoke.texi                  |   3 +-
 8 files changed, 715 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/arm/xgene1.md

--
1.9.1
[PATCH 1/2] Core definition for APM XGene-1 and associated cost-table.
To keep this change separately buildable from the pipeline model, this patch directs the APM XGene-1 to use the generic scheduling model. --- gcc/ChangeLog| 8 +++ gcc/config/aarch64/aarch64-cores.def | 1 + gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/config/aarch64/aarch64.c | 62 + gcc/config/arm/aarch-cost-tables.h | 101 +++ gcc/doc/invoke.texi | 3 +- 6 files changed, 175 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2fa58ca..c9ac0d9 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + + * config/aarch64/aarch64-cores.def (xgene1): Update/add the + xgene1 (APM XGene-1) core definition. + * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1 + * config/arm/aarch-cost-tables.h: Add cost tables for APM XGene-1 + * doc/invoke.texi: Document -mcpu=xgene1. + 2014-11-18 Maciej W. Rozycki ma...@codesourcery.com * config/mips/mips.md (compression): Add `micromips32' setting. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 312941f..e553e50 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -37,6 +37,7 @@ AARCH64_CORE(cortex-a53, cortexa53, cortexa53, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC, cortexa53) AARCH64_CORE(cortex-a57, cortexa15, cortexa15, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC, cortexa57) AARCH64_CORE(thunderx,thunderx, thunderx, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx) +AARCH64_CORE(xgene1, xgene1,xgene1,8, AARCH64_FL_FPSIMD, xgene1) /* V8 big.LITTLE implementations. 
*/ diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index c717ea8..6409082 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr tune - cortexa53,cortexa15,thunderx,cortexa57cortexa53 + cortexa53,cortexa15,thunderx,xgene1,cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) aarch64_tune diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4fec21e..9b92527 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -226,6 +226,27 @@ static const struct cpu_addrcost_table cortexa57_addrcost_table = #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ #endif +static const struct cpu_addrcost_table xgene1_addrcost_table = +{ +#if HAVE_DESIGNATED_INITIALIZERS + .addr_scale_costs = +#endif +{ + NAMED_PARAM (hi, 1), + NAMED_PARAM (si, 0), + NAMED_PARAM (di, 0), + NAMED_PARAM (ti, 1), +}, + NAMED_PARAM (pre_modify, 1), + NAMED_PARAM (post_modify, 0), + NAMED_PARAM (register_offset, 0), + NAMED_PARAM (register_extend, 1), + NAMED_PARAM (imm_offset, 0), +}; + +#if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 +__extension__ +#endif static const struct cpu_regmove_cost generic_regmove_cost = { NAMED_PARAM (GP2GP, 1), @@ -262,6 +283,17 @@ static const struct cpu_regmove_cost thunderx_regmove_cost = NAMED_PARAM (FP2FP, 4) }; +static const struct cpu_regmove_cost xgene1_regmove_cost = +{ + NAMED_PARAM (GP2GP, 1), + NAMED_PARAM (GP2FP, 8), + NAMED_PARAM (FP2GP, 8), + /* We currently do not provide direct support for TFmode Q-Q move. + Therefore we need to raise the cost above 2 in order to have + reload handle the situation. */ + NAMED_PARAM (FP2FP, 4) +}; + /* Generic costs for vector insn classes. 
*/ #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ @@ -302,6 +334,26 @@ static const struct cpu_vector_cost cortexa57_vector_cost = NAMED_PARAM (cond_not_taken_branch_cost, 1) }; +/* Generic costs for vector insn classes. */ +#if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 +__extension__ +#endif +static const struct cpu_vector_cost xgene1_vector_cost = +{ + NAMED_PARAM (scalar_stmt_cost, 1), + NAMED_PARAM (scalar_load_cost, 5), + NAMED_PARAM (scalar_store_cost, 1), + NAMED_PARAM (vec_stmt_cost, 2), + NAMED_PARAM (vec_to_scalar_cost, 4), + NAMED_PARAM (scalar_to_vec_cost, 4), + NAMED_PARAM (vec_align_load_cost, 10), + NAMED_PARAM (vec_unalign_load_cost, 10), + NAMED_PARAM (vec_unalign_store_cost, 2), + NAMED_PARAM (vec_store_cost, 2), + NAMED_PARAM (cond_taken_branch_cost, 2), + NAMED_PARAM (cond_not_taken_branch_cost, 1) +}; + #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ #endif @@ -345,6 +397,16 @@ static const struct tune_params thunderx_tunings = NAMED_PARAM (issue_rate, 2) }; +static const struct tune_params xgene1_tunings = +{ + xgene1_extra_costs, + xgene1_addrcost_table, + xgene1_regmove_cost,
[PATCH 2/2] Pipeline model for APM XGene-1.
--- gcc/ChangeLog | 6 + gcc/config/aarch64/aarch64.md | 3 +- gcc/config/arm/xgene1.md | 532 ++ 3 files changed, 540 insertions(+), 1 deletion(-) create mode 100644 gcc/config/arm/xgene1.md diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c9ac0d9..dad2278 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,11 @@ 2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + * config/aarch64/aarch64.md: Include xgene1.md. + (generic_sched): Set to no for xgene1. + * config/arm/xgene1.md: New file. + +2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + * config/aarch64/aarch64-cores.def (xgene1): Update/add the xgene1 (APM XGene-1) core definition. * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 597ff8c..1b36384 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -191,7 +191,7 @@ (define_attr generic_sched yes,no (const (if_then_else - (eq_attr tune cortexa53,cortexa15,thunderx) + (eq_attr tune cortexa53,cortexa15,thunderx,xgene1) (const_string no) (const_string yes @@ -199,6 +199,7 @@ (include ../arm/cortex-a53.md) (include ../arm/cortex-a15.md) (include thunderx.md) +(include ../arm/xgene1.md) ;; --- ;; Jumps and other miscellaneous insns diff --git a/gcc/config/arm/xgene1.md b/gcc/config/arm/xgene1.md new file mode 100644 index 000..563a959 --- /dev/null +++ b/gcc/config/arm/xgene1.md @@ -0,0 +1,532 @@ +;; Machine description for AppliedMicro xgene1 core. +;; Copyright (C) 2012-2014 Free Software Foundation, Inc. +;; Contributed by Theobroma Systems Design und Consulting GmbH. +;;See http://www.theobroma-systems.com for more info. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. 
+;; +;; GCC is distributed in the hope that it will be useful, but +;; WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;; General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; http://www.gnu.org/licenses/. + +;; Pipeline description for the xgene1 micro-architecture + +(define_automaton xgene1) + +(define_cpu_unit xgene1_decode_out0 xgene1) +(define_cpu_unit xgene1_decode_out1 xgene1) +(define_cpu_unit xgene1_decode_out2 xgene1) +(define_cpu_unit xgene1_decode_out3 xgene1) + +(define_cpu_unit xgene1_divide xgene1) +(define_cpu_unit xgene1_fp_divide xgene1) +(define_cpu_unit xgene1_fsu xgene1) +(define_cpu_unit xgene1_fcmp xgene1) + +(define_reservation xgene1_decode1op +( xgene1_decode_out0 ) +|( xgene1_decode_out1 ) +|( xgene1_decode_out2 ) +|( xgene1_decode_out3 ) +) +(define_reservation xgene1_decode2op +( xgene1_decode_out0 + xgene1_decode_out1 ) +|( xgene1_decode_out0 + xgene1_decode_out2 ) +|( xgene1_decode_out0 + xgene1_decode_out3 ) +|( xgene1_decode_out1 + xgene1_decode_out2 ) +|( xgene1_decode_out1 + xgene1_decode_out3 ) +|( xgene1_decode_out2 + xgene1_decode_out3 ) +) +(define_reservation xgene1_decodeIsolated +( xgene1_decode_out0 + xgene1_decode_out1 + xgene1_decode_out2 + xgene1_decode_out3 ) +) + +(define_insn_reservation xgene1_branch 1 + (and (eq_attr tune xgene1) + (eq_attr type branch)) + xgene1_decode1op) + +(define_insn_reservation xgene1_nop 1 + (and (eq_attr tune xgene1) + (eq_attr type no_insn)) + xgene1_decode1op) + +(define_insn_reservation xgene1_call 1 + (and (eq_attr tune xgene1) + (eq_attr type call)) + xgene1_decode2op) + +(define_insn_reservation xgene1_f_load 10 + (and (eq_attr tune xgene1) + (eq_attr type f_loadd,f_loads)) + xgene1_decode2op) + +(define_insn_reservation xgene1_f_store 4 + (and (eq_attr tune xgene1) + (eq_attr type 
f_stored,f_stores)) + xgene1_decode2op) + +(define_insn_reservation xgene1_fmov 2 + (and (eq_attr tune xgene1) + (eq_attr type fmov,fconsts,fconstd)) + xgene1_decode1op) + +(define_insn_reservation xgene1_f_mcr 10 + (and (eq_attr tune xgene1) + (eq_attr type f_mcr)) + xgene1_decodeIsolated) + +(define_insn_reservation xgene1_f_mrc 4 + (and (eq_attr tune xgene1) + (eq_attr type f_mrc)) + xgene1_decode2op) + +(define_insn_reservation xgene1_load_pair 6 + (and (eq_attr tune xgene1) + (eq_attr type load2)) + xgene1_decodeIsolated) + +(define_insn_reservation xgene1_store_pair 2 + (and (eq_attr
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Hi, On 21 Nov 19:19, Bernd Edlinger wrote: so, did you really regenerate Makefile.in in that patch, or am I missing something? You're right. This patch was rebased so many times that we may have forgotten to regenerate it before committing. Do you plan to submit a patch for Makefile.in? Or should I post this change separately for review (with regtesting)? -- Ilya
[RFC] First steps towards segregating types.
I've been trying to sort out how to proceed with the gimple_type work, and the first step always comes back to figuring out all the places types are used. This has turned out to be non-trivial and is difficult to do in an iterative way. I believe I've found a reasonable way to proceed. Over the next few months I plan to maintain a branch (tree-type) which leaves types still implemented as trees, and introduce 2 new typedefs and a few macros:
  typedef union tree_node *tree_type_ptr; // same as tree
  typedef const union tree_node *const_tree_type_ptr; // same as const_tree
I will introduce their use throughout the compiler where types are needed. This will tag all the type locations and still allow me to bootstrap and run tests to ensure things are still working. Meanwhile, I'll also maintain another patchset which can be applied to this branch and will switch those types to a completely separate type structure not connected to trees. It changes all the TYPE_ accessor macros to be incompatible with trees. This causes compilation errors everywhere a type is referenced, passed, used, or whatever. It is likely to pick up a few extra things along the way related to separating types that are not appropriate for the main branch. I can then go through the source files fixing the compilation issues raised by adding tree_type_ptr where appropriate and modifying whatever else is required to deal with a segregated type (there is no shortage of those!). These changes can then be applied to the main branch, and tested with a bootstrap/testrun/target-build cycle. I'll also try to keep the branch relatively current with mainline. Once the entire compiler has been processed, the next hunk of work would involve removing the types from the tree union and a multitude of related cleanups (I'm tracking a list). The 3 type structs would be replaced with a single type node and tree_type_ptr can be replaced with a pointer to the new type_node.
const_tree_type_ptr can also be replaced with a normal const version of the same pointer. We will *not* be stuck with the const_tree paradigm. It is just needed to enable compatibility with const_tree for now :-P There are a few issues, of course :-) The biggest issue is what to do with fields which can be either a type or a tree, i.e. TREE_VALUE() of a TREE_LIST can be a type, as can a TREE_VEC element or a DECL_CONTEXT. I think the DECL_INITIAL field is overloaded and can sometimes be a type, and this was recently introduced to TARGET_STATIC_CHAIN. I suspect the compilation process will identify others. Looking primarily at TREE_LIST first (which can be a mixed list of trees and types), the question is how to generally handle this situation. I have 2 workable approaches in mind, but am open to suggestions.
1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type.
2 - change the field (list->value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field.
Option 2 also introduces a change in current practice. TREE_VALUE() can be either an rvalue or an lvalue right now. This would no longer be possible and would require changing to a get_value(), set_value(), and value_ptr() model. There would be a tree variant and a type variant, along with asserts to make sure they are being used properly. These algorithmic changes can also be fully tested on the main branch. I've implemented this change, and it impacts 40 files which utilize TREE_VALUE as an lvalue. The upside of this is we at least have the illusion of more control.
I think the union could possibly be macroized/templated to be generally applicable in other circumstances. I'm not 100% sure, but I think the TYPE_REF approach could continue with the current lvalue or rvalue approach, perhaps with some tweaking... All conjecture, since I haven't prototyped it. It also provides a general mechanism for referencing a type node in any tree circumstance. I have a feeling this is the easiest approach, and it lends itself well to an initial implementation. At the moment I'm leaning this way, but I'm going to think about it over the weekend. Perhaps prototyping it next week will give me a stronger feeling one way or the other. I also suspect it will be worth introducing a TYPE_VEC node which parallels the TREE_VEC, only giving us a list of types. There may be places where a TREE_LIST consists entirely of types, and I'd consider trying to convert those to a TYPE_VEC. I've attached 2
RE: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Hi Ilya, On Fri, 21 Nov 2014 21:44:40, Ilya Verbin wrote: Hi, On 21 Nov 19:19, Bernd Edlinger wrote: so, did you really regenerate Makefile.in in that patch, or am I missing something? You're right. This patch was rebased so many times that we may have forgotten to regenerate it before committing. Do you plan to submit a patch for Makefile.in? Or should I post this change separately for review (with regtesting)? -- Ilya No, at least not immediately, so I would prefer that you go ahead with your patch ASAP. Thanks, Bernd.
[PATCH 1/2, PR 63814] Strengthen cgraph_edge_brings_value_p
Hi, PR 63814 is caused by cgraph_edge_brings_value_p misidentifying an edge to an expanded artificial thunk as an edge to the original node, which then leads to crazy double-cloning and doubling the thunks along the call. This patch fixes the bug by strengthening the predicate so that it knows where the value is supposed to go and can check that it goes there and not anywhere else. It also adds an extra availability check that was probably missing in it. Bootstrapped and tested on x86_64-linux, and i686-linux. OK for trunk? Thanks, Martin 2014-11-20 Martin Jambor mjam...@suse.cz PR ipa/63814 * ipa-cp.c (same_node_or_its_all_contexts_clone_p): New function. (cgraph_edge_brings_value_p): New parameter dest, use same_node_or_its_all_contexts_clone_p and check availability. (cgraph_edge_brings_value_p): Likewise. (get_info_about_necessary_edges): New parameter dest, pass it to cgraph_edge_brings_value_p. Update caller. (gather_edges_for_value): Likewise. (perhaps_add_new_callers): Use cgraph_edge_brings_value_p to check both the destination and availability. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -2785,17 +2785,31 @@ get_clone_agg_value (struct cgraph_node return NULL_TREE; } -/* Return true if edge CS does bring about the value described by SRC. */ +/* Return true is NODE is DEST or its clone for all contexts. */ static bool -cgraph_edge_brings_value_p (struct cgraph_edge *cs, - ipcp_value_sourcetree *src) +same_node_or_its_all_contexts_clone_p (cgraph_node *node, cgraph_node *dest) +{ + if (node == dest) +return true; + + struct ipa_node_params *info = IPA_NODE_REF (node); + return info-is_all_contexts_clone info-ipcp_orig_node == dest; +} + +/* Return true if edge CS does bring about the value described by SRC to node + DEST or its clone for all contexts. 
*/ + +static bool +cgraph_edge_brings_value_p (cgraph_edge *cs, ipcp_value_sourcetree *src, + cgraph_node *dest) { struct ipa_node_params *caller_info = IPA_NODE_REF (cs-caller); - cgraph_node *real_dest = cs-callee-function_symbol (); - struct ipa_node_params *dst_info = IPA_NODE_REF (real_dest); + enum availability availability; + cgraph_node *real_dest = cs-callee-function_symbol (availability); - if ((dst_info-ipcp_orig_node !dst_info-is_all_contexts_clone) + if (!same_node_or_its_all_contexts_clone_p (real_dest, dest) + || availability = AVAIL_INTERPOSABLE || caller_info-node_dead) return false; if (!src-val) @@ -2834,18 +2848,18 @@ cgraph_edge_brings_value_p (struct cgrap } } -/* Return true if edge CS does bring about the value described by SRC. */ +/* Return true if edge CS does bring about the value described by SRC to node + DEST or its clone for all contexts. */ static bool -cgraph_edge_brings_value_p (struct cgraph_edge *cs, - ipcp_value_sourceipa_polymorphic_call_context - *src) +cgraph_edge_brings_value_p (cgraph_edge *cs, + ipcp_value_sourceipa_polymorphic_call_context *src, + cgraph_node *dest) { struct ipa_node_params *caller_info = IPA_NODE_REF (cs-caller); cgraph_node *real_dest = cs-callee-function_symbol (); - struct ipa_node_params *dst_info = IPA_NODE_REF (real_dest); - if ((dst_info-ipcp_orig_node !dst_info-is_all_contexts_clone) + if (!same_node_or_its_all_contexts_clone_p (real_dest, dest) || caller_info-node_dead) return false; if (!src-val) @@ -2871,13 +2885,14 @@ get_next_cgraph_edge_clone (struct cgrap return next_edge_clone[cs-uid]; } -/* Given VAL, iterate over all its sources and if they still hold, add their - edge frequency and their number into *FREQUENCY and *CALLER_COUNT - respectively. */ +/* Given VAL that is intended for DEST, iterate over all its sources and if + they still hold, add their edge frequency and their number into *FREQUENCY + and *CALLER_COUNT respectively. 
*/ template typename valtype static bool -get_info_about_necessary_edges (ipcp_valuevaltype *val, int *freq_sum, +get_info_about_necessary_edges (ipcp_valuevaltype *val, cgraph_node *dest, + int *freq_sum, gcov_type *count_sum, int *caller_count) { ipcp_value_sourcevaltype *src; @@ -2890,7 +2905,7 @@ get_info_about_necessary_edges (ipcp_val struct cgraph_edge *cs = src-cs; while (cs) { - if (cgraph_edge_brings_value_p (cs, src)) + if (cgraph_edge_brings_value_p (cs, src, dest)) { count++; freq += cs-frequency; @@ -2907,12 +2922,13 @@ get_info_about_necessary_edges (ipcp_val return hot; } -/* Return a vector of incoming edges that do
RE: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs
-Original Message- From: Rozycki, Maciej Sent: Wednesday, November 19, 2014 8:05 AM To: gcc-patches@gcc.gnu.org Cc: Moore, Catherine; Eric Christopher; Matthew Fortune Subject: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs 2014-11-19 Maciej W. Rozycki ma...@codesourcery.com gcc/ * config/mips/mips.c (mips16_build_call_stub): Move the save of the return address in $18 ahead of passing arguments to FPRs. Maciej This looks OK. Please commit.
Re: [PATCH, i386] Add new arg values for __builtin_cpu_supports
2014-11-21 20:45 GMT+03:00 Jeff Law l...@redhat.com: On 11/20/14 09:40, Jakub Jelinek wrote: On Thu, Nov 20, 2014 at 07:36:03PM +0300, Ilya Enkovich wrote: Hi, MPX runtime checks some feature bits in order to check that MPX is fully supported. The runtime does it by cpuid calls, but there is a __builtin_cpu_supports which may be used for that. Unfortunately it currently doesn't support the required bits. Will it be OK to add them for trunk? I think using cpuid for that is just fine. __builtin_cpu_supports is for ISA additions users might actually want to version code for; MPX stuff, as the instructions are nops without hw support, is not something one would multi-version a function for. If anything, AVX512F and AVX512BW+VL might be good candidates for that, not MPX. Sorry, I didn't know that __builtin_cpu_supports was really only meant for user multi-versioning. In that case, it won't make any sense to put the MPX stuff in there. Sorry for sending you down the wrong path, Ilya. It's OK, the AVX guys will just transform this MPX patch into an AVX512 one :) Ilya jeff
[PATCH] Fix VRP handling of {ADD,SUB,MUL}_OVERFLOW (PR tree-optimization/64006)
Hi! As discussed on IRC and in the PR, these internal calls are quite unique for VRP in that they return _Complex integer result, which VRP doesn't track, but then extract using REALPART_EXPR/IMAGPART_EXPR the two results from that _Complex int and to generate good code it is desirable to get proper ranges of those two results. The problem is that right now this works only on the first VRP iteration, the REALPART_EXPR/IMAGPART_EXPR statements are handled if their operand is set by {ADD,SUB,MUL}_OVERFLOW. If we iterate because a VR of one of the internal call arguments changes, nothing in the propagator marks the REALPART_EXPR/IMAGPART_EXPR statements for reconsideration. The following patch handles this, by making the internal calls interesting to the propagator and returning the right SSA_PROP_* for it (depending on whether any of the value ranges of the REALPART_EXPR/IMAGPART_EXPR immediate uses would change or not). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-11-21 Jakub Jelinek ja...@redhat.com PR tree-optimization/64006 * tree-vrp.c (stmt_interesting_for_vrp): Return true for {ADD,SUB,MUL}_OVERFLOW internal calls. (vrp_visit_assignment_or_call): For {ADD,SUB,MUL}_OVERFLOW internal calls, check if any REALPART_EXPR/IMAGPART_EXPR immediate uses would change their value ranges and return SSA_PROP_INTERESTING if so, or SSA_PROP_NOT_INTERESTING if there are some REALPART_EXPR/IMAGPART_EXPR immediate uses interesting for vrp. * gcc.c-torture/execute/pr64006.c: New test. 
--- gcc/tree-vrp.c.jj 2014-11-21 10:17:05.0 +0100 +++ gcc/tree-vrp.c 2014-11-21 13:12:09.895013334 +0100 @@ -6949,6 +6949,20 @@ stmt_interesting_for_vrp (gimple stmt) (is_gimple_call (stmt) || !gimple_vuse (stmt))) return true; + else if (is_gimple_call (stmt) gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_ADD_OVERFLOW: + case IFN_SUB_OVERFLOW: + case IFN_MUL_OVERFLOW: + /* These internal calls return _Complex integer type, + but are interesting to VRP nevertheless. */ + if (lhs TREE_CODE (lhs) == SSA_NAME) + return true; + break; + default: + break; + } } else if (gimple_code (stmt) == GIMPLE_COND || gimple_code (stmt) == GIMPLE_SWITCH) @@ -7101,6 +7115,74 @@ vrp_visit_assignment_or_call (gimple stm return SSA_PROP_NOT_INTERESTING; } + else if (is_gimple_call (stmt) gimple_call_internal_p (stmt)) +switch (gimple_call_internal_fn (stmt)) + { + case IFN_ADD_OVERFLOW: + case IFN_SUB_OVERFLOW: + case IFN_MUL_OVERFLOW: + /* These internal calls return _Complex integer type, + which VRP does not track, but the immediate uses + thereof might be interesting. 
*/ + if (lhs TREE_CODE (lhs) == SSA_NAME) + { + imm_use_iterator iter; + use_operand_p use_p; + enum ssa_prop_result res = SSA_PROP_VARYING; + + set_value_range_to_varying (get_value_range (lhs)); + + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) + { + gimple use_stmt = USE_STMT (use_p); + if (!is_gimple_assign (use_stmt)) + continue; + enum tree_code rhs_code = gimple_assign_rhs_code (use_stmt); + if (rhs_code != REALPART_EXPR rhs_code != IMAGPART_EXPR) + continue; + tree rhs1 = gimple_assign_rhs1 (use_stmt); + tree use_lhs = gimple_assign_lhs (use_stmt); + if (TREE_CODE (rhs1) != rhs_code + || TREE_OPERAND (rhs1, 0) != lhs + || TREE_CODE (use_lhs) != SSA_NAME + || !stmt_interesting_for_vrp (use_stmt) + || (!INTEGRAL_TYPE_P (TREE_TYPE (use_lhs)) + || !TYPE_MIN_VALUE (TREE_TYPE (use_lhs)) + || !TYPE_MAX_VALUE (TREE_TYPE (use_lhs + continue; + + /* If there is a change in the value range for any of the + REALPART_EXPR/IMAGPART_EXPR immediate uses, return + SSA_PROP_INTERESTING. If there are any REALPART_EXPR + or IMAGPART_EXPR immediate uses, but none of them have + a change in their value ranges, return + SSA_PROP_NOT_INTERESTING. If there are no + {REAL,IMAG}PART_EXPR uses at all, + return SSA_PROP_VARYING. */ + value_range_t new_vr = VR_INITIALIZER; + extract_range_basic (new_vr, use_stmt); + value_range_t *old_vr = get_value_range (use_lhs); + if (old_vr-type != new_vr.type + || !vrp_operand_equal_p (old_vr-min, new_vr.min) +
[PATCH] Fix up __builtin_*_overflow expansion on some targets (PR target/63848)
Hi! Apparently, emit_cmp_and_jump_insns can silently generate wrong code for wider modes on some targets, so this patch changes all those calls in internal-fn.c to do_compare_rtx_and_jump, which is a wrapper around emit_cmp_and_jump_insns that should handle the wider mode comparison expansion. Unfortunately, the order of arguments is different :(. No new testcases provided, the existing testsuite exhibited this on various targets. Bootstrapped/regtested on x86_64-linux and i686-linux, tested on the testcases for ia64 and Uros tested the testcases on Alpha (in both cases they previously failed), ok for trunk? 2014-11-21 Jakub Jelinek ja...@redhat.com PR target/63848 PR target/63975 * internal-fn.c (expand_arith_overflow_result_store, expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow): Use do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns everywhere, adjust arguments to those functions. Use unsignedp = true for EQ, NE, GEU, LEU, LTU and GTU comparisons. --- gcc/internal-fn.c.jj2014-11-19 18:48:02.0 +0100 +++ gcc/internal-fn.c 2014-11-21 17:34:00.634621461 +0100 @@ -386,8 +386,8 @@ expand_arith_overflow_result_store (tree int uns = TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))); lres = convert_modes (tgtmode, mode, res, uns); gcc_assert (GET_MODE_PRECISION (tgtmode) GET_MODE_PRECISION (mode)); - emit_cmp_and_jump_insns (res, convert_modes (mode, tgtmode, lres, uns), - EQ, NULL_RTX, mode, false, done_label, + do_compare_rtx_and_jump (res, convert_modes (mode, tgtmode, lres, uns), + EQ, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); write_complex_part (target, const1_rtx, true); emit_label (done_label); @@ -533,8 +533,8 @@ expand_addsub_overflow (location_t loc, ? (CONST_SCALAR_INT_P (op0) REG_P (op1)) : CONST_SCALAR_INT_P (op1))) tem = op1; - emit_cmp_and_jump_insns (res, tem, code == PLUS_EXPR ? GEU : LEU, - NULL_RTX, mode, false, done_label, + do_compare_rtx_and_jump (res, tem, code == PLUS_EXPR ? 
GEU : LEU, + true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -549,7 +549,7 @@ expand_addsub_overflow (location_t loc, rtx tem = expand_binop (mode, add_optab, code == PLUS_EXPR ? res : op0, sgn, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (tem, op1, GEU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (tem, op1, GEU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -591,9 +591,9 @@ expand_addsub_overflow (location_t loc, emit_jump (do_error); else if (pos_neg == 3) /* If ARG0 is not known to be always positive, check at runtime. */ - emit_cmp_and_jump_insns (op0, const0_rtx, LT, NULL_RTX, mode, false, -do_error, PROB_VERY_UNLIKELY); - emit_cmp_and_jump_insns (op1, op0, LEU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (op0, const0_rtx, LT, false, mode, NULL_RTX, +NULL_RTX, do_error, PROB_VERY_UNLIKELY); + do_compare_rtx_and_jump (op1, op0, LEU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -607,7 +607,7 @@ expand_addsub_overflow (location_t loc, OPTAB_LIB_WIDEN); rtx tem = expand_binop (mode, add_optab, op1, sgn, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (op0, tem, LTU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (op0, tem, LTU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -619,8 +619,8 @@ expand_addsub_overflow (location_t loc, unsigned. */ res = expand_binop (mode, add_optab, op0, op1, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (res, const0_rtx, LT, NULL_RTX, mode, false, - do_error, PROB_VERY_UNLIKELY); + do_compare_rtx_and_jump (res, const0_rtx, LT, false, mode, NULL_RTX, + NULL_RTX, do_error, PROB_VERY_UNLIKELY); rtx tem = op1; /* The operation is commutative, so we can pick operand to compare against. For prec = BITS_PER_WORD, I think preferring REG operand @@ -633,7 +633,7 @@ expand_addsub_overflow (location_t loc, ? 
(CONST_SCALAR_INT_P (op1) REG_P (op0)) : CONST_SCALAR_INT_P (op0)) tem = op0; - emit_cmp_and_jump_insns (res, tem, GEU,
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Hi Jakub! On Fri, 21 Nov 2014 21:44:40, Ilya Verbin wrote: You're right. This patch was rebased so many times, that we may forget to regenerate it before committing. Build with liboffloadmic passed. OK for trunk? -- Ilya * Makefile.in: Regenerate. diff --git a/Makefile.in b/Makefile.in index f1ff972..0bae570 100644 --- a/Makefile.in +++ b/Makefile.in @@ -35238,9 +35238,6 @@ configure-target-liboffloadmic: $(SHELL) $(srcdir)/mkinstalldirs $(TARGET_SUBDIR)/liboffloadmic ; \ $(NORMAL_TARGET_EXPORTS) \ echo Configuring in $(TARGET_SUBDIR)/liboffloadmic; \ -\ - this_target=${target_alias}; \ -\ cd $(TARGET_SUBDIR)/liboffloadmic || exit 1; \ case $(srcdir) in \ /* | [A-Za-z]:[\\/]*) topdir=$(srcdir) ;; \ @@ -35248,14 +35245,12 @@ configure-target-liboffloadmic: sed -e 's,\./,,g' -e 's,[^/]*/,../,g' `$(srcdir) ;; \ esac; \ module_srcdir=liboffloadmic; \ - srcdiroption=--srcdir=$${topdir}/liboffloadmic; \ - libsrcdir=$$s/liboffloadmic; \ rm -f no-such-file || : ; \ CONFIG_SITE=no-such-file $(SHELL) \ $$s/$$module_srcdir/configure \ --srcdir=$${topdir}/$$module_srcdir \ $(TARGET_CONFIGARGS) --build=${build_alias} --host=${target_alias} \ - --target=$${this_target} $${srcdiroption} @extra_liboffloadmic_configure_flags@ \ + --target=${target_alias} @extra_liboffloadmic_configure_flags@ \ || exit 1 @endif target-liboffloadmic
[PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks
Hi, when debugging PR 63814 I noticed that when cgraph_node::create_clone was using redirect_edge_duplicating_thunks to redirect two edges to a thunk of a clone, two thunks were created, one for each edge. The reason is that even though duplicate_thunk_for_node attempts to locate an already created thunk, it does so by looking for a caller with thunk.thunk_p set and the previously created one does not have it set because (on i686) expand_thunk has expanded the thunk to gimple and cleared the flag. This patch fixes the issue by marking such expanded thunks with yet another flag and then uses the flag to identify such expanded thunks. Bootstrapped and tested on x86_64-linux and i686-linux. Honza, do you think this is a good approach? Is the patch OK for trunk? Thanks, Martin 2014-11-21 Martin Jambor mjam...@suse.cz * cgraph.h (cgraph_thunk_info): Converted thunk_p to a bit-field. Added new flag expanded_thunk_p. * cgraphunit.c (expand_thunk): Set expanded_thunk_p when appropriate. * cgraphclones.c (duplicate_thunk_for_node): Also re-use an expanded thunk if available. Index: src/gcc/cgraph.h === --- src.orig/gcc/cgraph.h +++ src/gcc/cgraph.h @@ -552,7 +552,9 @@ struct GTY(()) cgraph_thunk_info { bool virtual_offset_p; bool add_pointer_bounds_args; /* Set to true when alias node is thunk. */ - bool thunk_p; + unsigned thunk_p : 1; + /* Set when this is an already expanded thunk. */ + unsigned expanded_thunk_p : 1; }; /* Information about the function collected locally. 
Index: src/gcc/cgraphclones.c === --- src.orig/gcc/cgraphclones.c +++ src/gcc/cgraphclones.c @@ -311,7 +311,7 @@ duplicate_thunk_for_node (cgraph_node *t cgraph_edge *cs; for (cs = node-callers; cs; cs = cs-next_caller) -if (cs-caller-thunk.thunk_p +if ((cs-caller-thunk.thunk_p || cs-caller-thunk.expanded_thunk_p) cs-caller-thunk.this_adjusting == thunk-thunk.this_adjusting cs-caller-thunk.fixed_offset == thunk-thunk.fixed_offset cs-caller-thunk.virtual_offset_p == thunk-thunk.virtual_offset_p Index: src/gcc/cgraphunit.c === --- src.orig/gcc/cgraphunit.c +++ src/gcc/cgraphunit.c @@ -1504,6 +1504,7 @@ cgraph_node::expand_thunk (bool output_a set_cfun (NULL); TREE_ASM_WRITTEN (thunk_fndecl) = 1; thunk.thunk_p = false; + thunk.expanded_thunk_p = true; analyzed = false; } else @@ -1686,6 +1687,7 @@ cgraph_node::expand_thunk (bool output_a /* Since we want to emit the thunk, we explicitly mark its name as referenced. */ thunk.thunk_p = false; + thunk.expanded_thunk_p = true; lowered = true; bitmap_obstack_release (NULL); }
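The lookup change in duplicate_thunk_for_node can be illustrated outside GCC with a minimal sketch. Everything below is a simplified stand-in: thunk_info models only the fields compared in the patch, and mark_expanded and find_reusable_thunk are hypothetical helpers, not GCC functions. The point is only that a thunk whose thunk_p was cleared on expansion is still found once expanded_thunk_p is consulted as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for cgraph_thunk_info (illustration only).
struct thunk_info
{
  long fixed_offset;
  bool this_adjusting;
  bool virtual_offset_p;
  unsigned thunk_p : 1;           // still an unexpanded thunk
  unsigned expanded_thunk_p : 1;  // thunk already expanded to gimple

  thunk_info ()
    : fixed_offset (0), this_adjusting (false), virtual_offset_p (false),
      thunk_p (0), expanded_thunk_p (0)
  {}
};

// Hypothetical helper mirroring the flag updates the patch adds to
// expand_thunk: clear thunk_p, remember that this used to be a thunk.
inline void mark_expanded (thunk_info &t)
{
  t.thunk_p = 0;
  t.expanded_thunk_p = 1;
}

// Hypothetical helper: index of a caller thunk equivalent to WANTED,
// or -1 when a new thunk would have to be created.
inline int find_reusable_thunk (const std::vector<thunk_info> &callers,
                                const thunk_info &wanted)
{
  for (std::size_t i = 0; i < callers.size (); ++i)
    {
      const thunk_info &c = callers[i];
      if ((c.thunk_p || c.expanded_thunk_p)
          && c.this_adjusting == wanted.this_adjusting
          && c.fixed_offset == wanted.fixed_offset
          && c.virtual_offset_p == wanted.virtual_offset_p)
        return static_cast<int> (i);
    }
  return -1;
}
```

With only the thunk_p test, the already expanded caller would be skipped and a second thunk created, which is the duplication described above.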
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
On Fri, Nov 21, 2014 at 10:14:21PM +0300, Ilya Verbin wrote: On Fri, 21 Nov 2014 21:44:40, Ilya Verbin wrote: You're right. This patch was rebased so many times, that we may forget to regenerate it before committing. Build with liboffloadmic passed. OK for trunk? -- Ilya * Makefile.in: Regenerate. Ok. --- a/Makefile.in +++ b/Makefile.in @@ -35238,9 +35238,6 @@ configure-target-liboffloadmic: $(SHELL) $(srcdir)/mkinstalldirs $(TARGET_SUBDIR)/liboffloadmic ; \ $(NORMAL_TARGET_EXPORTS) \ echo Configuring in $(TARGET_SUBDIR)/liboffloadmic; \ - \ - this_target=${target_alias}; \ - \ cd $(TARGET_SUBDIR)/liboffloadmic || exit 1; \ case $(srcdir) in \ /* | [A-Za-z]:[\\/]*) topdir=$(srcdir) ;; \ @@ -35248,14 +35245,12 @@ configure-target-liboffloadmic: sed -e 's,\./,,g' -e 's,[^/]*/,../,g' `$(srcdir) ;; \ esac; \ module_srcdir=liboffloadmic; \ - srcdiroption=--srcdir=$${topdir}/liboffloadmic; \ - libsrcdir=$$s/liboffloadmic; \ rm -f no-such-file || : ; \ CONFIG_SITE=no-such-file $(SHELL) \ $$s/$$module_srcdir/configure \ --srcdir=$${topdir}/$$module_srcdir \ $(TARGET_CONFIGARGS) --build=${build_alias} --host=${target_alias} \ - --target=$${this_target} $${srcdiroption} @extra_liboffloadmic_configure_flags@ \ + --target=${target_alias} @extra_liboffloadmic_configure_flags@ \ || exit 1 @endif target-liboffloadmic Jakub
Re: [PATCH] PR lto/63968: 175.vpr from cpu2000 fails to build with LTO
Can you verify that the implementation is correct? I seem to remember that I introduced the lazy incrementation to the inliner for both performance and correctness reasons. I used to get odd orders when keys were increased. Honza Hello. What kind of correctness do you mean? The old implementation didn't support the increment operation and that fact was hushed up. I see, your patch actually implements a variant of the busy (and thus suboptimal) method of increasing a key by a combination of removal+insertion. I guess O(log n) is good enough for everything except the inliner, which does the lazy increases instead. Doing lazy increases probably means storing a pair of keys per node, which is wasteful, so the patch is OK as it is. Honza Martin
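The "busy" way of changing a key by removal plus insertion can be sketched as follows. This is not the heap from the patch: it swaps in an ordered std::set as the underlying structure, and keyed_queue with its members is a hypothetical name chosen for illustration. It assumes node ids are unique. Both the erase and the insert are O(log n), which is the cost mentioned in the reply.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Toy min-queue keyed by long; ids are hypothetical node identifiers.
class keyed_queue
{
public:
  // Assumes ID is not already present.
  void insert (int id, long key)
  {
    keys_[id] = key;
    order_.insert (std::make_pair (key, id));
  }

  // Change a key the busy way: remove the old entry, insert the new one.
  void replace_key (int id, long new_key)
  {
    std::map<int, long>::iterator it = keys_.find (id);
    assert (it != keys_.end ());
    order_.erase (std::make_pair (it->second, id));
    it->second = new_key;
    order_.insert (std::make_pair (new_key, id));
  }

  bool empty () const { return order_.empty (); }

  // Id of the entry with the smallest key; the queue must not be empty.
  int min_id () const { return order_.begin ()->second; }

private:
  std::map<int, long> keys_;                // id -> current key
  std::set<std::pair<long, int> > order_;   // (key, id), ordered by key
};
```

A lazy scheme would instead leave the stale entry in place and re-check the key when the node is extracted, which is why it needs a second key per node.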
Re: [RFC] First steps towards segregating types.
On Fri, Nov 21, 2014 at 1:48 PM, Andrew MacLeod amacl...@redhat.com wrote: 1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type. 2 - change the field (list->value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field. Seems to me that option 2 would also help against code that blindly looks at TREE_VALUE and assumes it to be a tree. Wouldn't that make initial implementation a bit more challenging? Option 1 does seem easier, but I kind of like the forcing of rvalues that option 2 provides. Also liking option 1. The final change to the final type should be simpler that way. Diego.
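For readers following the thread, option 2 (a tagged union with get/set accessors in place of an lvalue macro) could look roughly like the sketch below. None of these names exist in GCC: tree_node_stub, type_node_stub and list_value are hypothetical stand-ins, and the flag is a plain bool rather than a bit in the tree base. The sketch only shows how separate tree and type accessors with asserts catch a read of the wrong member.

```cpp
#include <cassert>

struct tree_node_stub { int code; };            // stand-in for a tree
struct type_node_stub { unsigned precision; };  // stand-in for a segregated type

// Hypothetical value slot of a list node: either a tree or a type.
class list_value
{
public:
  list_value () : is_type_ (false) { u_.tree_value = nullptr; }

  void set_tree_value (tree_node_stub *t)
  {
    is_type_ = false;
    u_.tree_value = t;
  }

  void set_type_value (type_node_stub *t)
  {
    is_type_ = true;
    u_.type_value = t;
  }

  bool holds_type () const { return is_type_; }

  // The accessors assert that the caller asked for the member that is
  // actually stored, so a mismatch fails loudly instead of being misread.
  tree_node_stub *get_tree_value () const
  {
    assert (!is_type_);
    return u_.tree_value;
  }

  type_node_stub *get_type_value () const
  {
    assert (is_type_);
    return u_.type_value;
  }

private:
  union
  {
    tree_node_stub *tree_value;
    type_node_stub *type_value;
  } u_;
  bool is_type_;  // the flag saying which union member is live
};
```

Because there is no way to obtain a writable reference to the slot, code that used the old macro as an lvalue has to be rewritten to call one of the setters, which is the forced change discussed above.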
[PATCH, PR 63551] Use proper type in evaluate_conditions_for_known_args
Hi, the testcase of PR 63551 passes a union of a signed and an unsigned integer between two functions as a parameter. The caller initializes it to an unsigned integer with the highest-order bit set; the callee loads the data through the signed field and compares it with zero. evaluate_conditions_for_known_args then wrongly evaluated the condition in these circumstances, which later led to the insertion of builtin_unreachable and a miscompilation. Fixed by fold_converting the known value first. I use the type of the value in the condition, which should do exactly the right thing because the value is taken from the corresponding gimple_cond statement, in which types must match. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2014-11-21 Martin Jambor mjam...@suse.cz PR ipa/63551 * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Convert value of the argument to the type of the value in the condition. testsuite/ * gcc.dg/ipa/pr63551.c: New test. Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -880,6 +880,7 @@ evaluate_conditions_for_known_args (stru } if (c-code == IS_NOT_CONSTANT || c-code == CHANGED) continue; + val = fold_convert (TREE_TYPE (c-val), val); res = fold_binary_to_constant (c-code, boolean_type_node, val, c-val); if (res integer_zerop (res)) continue; Index: src/gcc/testsuite/gcc.dg/ipa/pr63551.c === --- /dev/null +++ src/gcc/testsuite/gcc.dg/ipa/pr63551.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options -Os } */ + +union U +{ + unsigned int f0; + int f1; +}; + +int a, d; + +void +fn1 (union U p) +{ + if (p.f1 = 0) +if (a) + d = 0; +} + +void +fn2 () +{ + d = 0; + union U b = { 4294967286 }; + fn1 (b); +} + +int +main () +{ + fn2 (); + return 0; +}
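Why the conversion matters can be shown with plain integers, independent of GCC. The sketch below assumes 32-bit integers and uses a `<= 0` comparison purely for illustration; read_as_signed and the two compare helpers are hypothetical names. The same bit pattern that the test stores, 4294967286, compares differently against zero depending on whether it is evaluated in the unsigned type it was written with or in the signed type the callee reads it with.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of an unsigned 32-bit value as a signed one, much
// like storing through one union member and loading through the other.
inline std::int32_t read_as_signed (std::uint32_t bits)
{
  std::int32_t s;
  std::memcpy (&s, &bits, sizeof s);
  return s;
}

// Evaluate "value <= 0" in the type of the known value (unsigned).
// This models folding the condition without converting first.
inline bool le_zero_unconverted (std::uint32_t v)
{
  return v <= 0u;
}

// Evaluate "value <= 0" after converting to the type used in the
// condition (signed). This models the fold_convert added by the patch.
inline bool le_zero_converted (std::uint32_t v)
{
  return read_as_signed (v) <= 0;
}
```

For 4294967286 the unconverted evaluation says the condition is false while the converted one says it is true, so a decision based on the unconverted result contradicts what the callee computes at run time.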
Re: [RFC] First steps towards segregating types.
On November 21, 2014 8:45:09 PM CET, Diego Novillo dnovi...@google.com wrote: On Fri, Nov 21, 2014 at 1:48 PM, Andrew MacLeod amacl...@redhat.com wrote: 1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type. 2 - change the field (list-value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field. Seems to me that option 2 would also help against code that blindly looks at TREE_VALUE and assumes it to be a tree. Wouldn't that make initial implementation a bit more challenging? Option 1 does seem easier, but I kind of like the forcing of rvalues that option 2 provides. Also liking option 1. The final change to the final type should be simpler that way. I don't like either :). It seems you are concerned about uses from trees. An intermediate step here that would be useful is doing what David did for RTL insns and now gimple - expose tree_type as static type but keep tree as its base. Thus make references to trees that are always types use tree_type * while keeping those that can refer to types and sth else refer to tree. That's something that would not be completely artificial at this point. Richard. Diego.
Re: [RFC] First steps towards segregating types.
On 11/21/2014 02:45 PM, Diego Novillo wrote:

On Fri, Nov 21, 2014 at 1:48 PM, Andrew MacLeod amacl...@redhat.com wrote:

1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type.

2 - change the field (list->value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field.

Seems to me that option 2 would also help against code that blindly looks at TREE_VALUE and assumes it to be a tree. Wouldn't that make initial implementation a bit more challenging?

The opposite I think... option 2 requires compile time correctness. In order to get option 1 right, anywhere there is a TYPE_REF we're going to have to change it to look through the TYPE_REF to get the type. If we don't then I'll probably end up with run-time errors in the main branch when TREE_VALUE() gets an unexpected TYPE_REF, or something like that. I'm also somewhat concerned about places which use it as an LVALUE and write the wrong sort of thing back in. Won't catch that until runtime either. I'll know more next week about that aspect with some practical implementation.

Maybe even a combination... change to the get,set,ptr model for accessors, AND then use TYPE_REF... although at the same time it'd be nice to ditch the TREE_VALUE_PTR variations... that's virtually the same thing as an LVALUE :-P. That is non-trivial however. Perhaps that's a bad idea :-)

Option 1 does seem easier, but I kind of like the forcing of rvalues that option 2 provides.

Also liking option 1. The final change to the final type should be simpler that way.

It's also relatively easy to change individual cases from option 2 to option 1 down the road. The reverse is not true :-)

Andrew
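As an illustration of option 2 only, here is a rough C sketch of a tagged union with get/set accessors. The struct and function names are hypothetical stand-ins, not GCC's real tree or GTY machinery. Because each accessor returns a distinct pointer type, using the wrong one is a compile-time type mismatch, and the tag bit is additionally checked at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a generic tree node and a type node.  */
struct tree_node { int code; };
struct type_node { int precision; };

/* The list value from option 2: a flag bit plus a union of both kinds.  */
struct list_value
{
  bool is_type;                   /* the "bit in the base" */
  union
  {
    struct tree_node *tree_value;
    struct type_node *type_value;
  } u;
};

/* Setters keep the tag and the payload in sync.  */
static void
list_value_set_tree (struct list_value *v, struct tree_node *t)
{
  v->is_type = false;
  v->u.tree_value = t;
}

static void
list_value_set_type (struct list_value *v, struct type_node *t)
{
  v->is_type = true;
  v->u.type_value = t;
}

/* Getters check the bit before handing out the matching field.  */
static struct tree_node *
list_value_tree (const struct list_value *v)
{
  assert (!v->is_type);
  return v->u.tree_value;
}

static struct type_node *
list_value_type (const struct list_value *v)
{
  assert (v->is_type);
  return v->u.type_value;
}
```

There is deliberately no accessor returning an lvalue here, which is one way to read the "forcing of rvalues" remark: writes go through a setter that cannot store the wrong kind of pointer.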
Re: [PATCH, PR 63551] Use proper type in evaluate_conditions_for_known_args
On Fri, Nov 21, 2014 at 09:07:50PM +0100, Martin Jambor wrote: Hi, the testcase of PR 63551 passes a union between a signed and an unsigned integer between two functions as a parameter. The caller initializes to an unsigned integer with the highest order bit set, the callee loads the data through the signed field and compares with zero. evaluate_conditions_for_known_args then wrongly evaluated the condition in these circumstances, which later on lead to insertion of builtin_unreachable and miscompilation. Fixed by fold_converting the known value first. I use the type of the value in the condition which should do exactly the right thing because the value is taken from the corresponding gimple_cond statement in which types must match. Bootstrapped and tested on x86_64-linux. OK for trunk? I forgot, this is also a 4.9 bug and I have bootstrapped and tested it on top of the 4.9 branch as well. So OK for trunk and the 4.9 branch? Thanks, Martin 2014-11-21 Martin Jambor mjam...@suse.cz PR ipa/63551 * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Convert value of the argument to the type of the value in the condition. testsuite/ * gcc.dg/ipa/pr63551.c: New test. Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -880,6 +880,7 @@ evaluate_conditions_for_known_args (stru } if (c-code == IS_NOT_CONSTANT || c-code == CHANGED) continue; + val = fold_convert (TREE_TYPE (c-val), val); res = fold_binary_to_constant (c-code, boolean_type_node, val, c-val); if (res integer_zerop (res)) continue; Index: src/gcc/testsuite/gcc.dg/ipa/pr63551.c === --- /dev/null +++ src/gcc/testsuite/gcc.dg/ipa/pr63551.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options -Os } */ + +union U +{ + unsigned int f0; + int f1; +}; + +int a, d; + +void +fn1 (union U p) +{ + if (p.f1 = 0) +if (a) + d = 0; +} + +void +fn2 () +{ + d = 0; + union U b = { 4294967286 }; + fn1 (b); +} + +int +main () +{ + fn2 (); + return 0; +}
Re: [PATCH, PR 63551] Use proper type in evaluate_conditions_for_known_args
On November 21, 2014 9:07:50 PM CET, Martin Jambor mjam...@suse.cz wrote: Hi, the testcase of PR 63551 passes a union between a signed and an unsigned integer between two functions as a parameter. The caller initializes to an unsigned integer with the highest order bit set, the callee loads the data through the signed field and compares with zero. evaluate_conditions_for_known_args then wrongly evaluated the condition in these circumstances, which later on lead to insertion of builtin_unreachable and miscompilation. Fixed by fold_converting the known value first. I use the type of the value in the condition which should do exactly the right thing because the value is taken from the corresponding gimple_cond statement in which types must match. Bootstrapped and tested on x86_64-linux. OK for trunk? I think you want to use fold_unary (VIEW_CONVERT,...) Here if you consider the case with Int and float. And fail if that returns NULL or not a constant. Thanks, Richard. Thanks, Martin 2014-11-21 Martin Jambor mjam...@suse.cz PR ipa/63551 * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Convert value of the argument to the type of the value in the condition. testsuite/ * gcc.dg/ipa/pr63551.c: New test. 
Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -880,6 +880,7 @@ evaluate_conditions_for_known_args (stru } if (c-code == IS_NOT_CONSTANT || c-code == CHANGED) continue; + val = fold_convert (TREE_TYPE (c-val), val); res = fold_binary_to_constant (c-code, boolean_type_node, val, c-val); if (res integer_zerop (res)) continue; Index: src/gcc/testsuite/gcc.dg/ipa/pr63551.c === --- /dev/null +++ src/gcc/testsuite/gcc.dg/ipa/pr63551.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options -Os } */ + +union U +{ + unsigned int f0; + int f1; +}; + +int a, d; + +void +fn1 (union U p) +{ + if (p.f1 = 0) +if (a) + d = 0; +} + +void +fn2 () +{ + d = 0; + union U b = { 4294967286 }; + fn1 (b); +} + +int +main () +{ + fn2 (); + return 0; +}
[PATCH][OpenMP] Fix named critical sections inside target functions
Hi, '#pragma omp critical (name)' can be placed in the function, marked with '#pragma omp declare target', in this case the corresponding node should be marked as offloadable too. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? -- Ilya gcc/ * omp-low.c (lower_omp_critical): Mark critical sections inside target functions as offloadable. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3924282..6c5774c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -9366,16 +9366,6 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) DECL_ARTIFICIAL (decl) = 1; DECL_IGNORED_P (decl) = 1; - /* If '#pragma omp critical' is inside target region, the symbol must -be marked for offloading. */ - omp_context *octx; - for (octx = ctx-outer; octx; octx = octx-outer) - if (is_targetreg_ctx (octx)) - { - varpool_node::get_create (decl)-offloadable = 1; - break; - } - varpool_node::finalize_decl (decl); critical_name_mutexes-put (name, decl); @@ -9383,6 +9373,20 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) else decl = *n; + /* If '#pragma omp critical' is inside target region or +inside function marked as offloadable, the symbol must be +marked as offloadable too. */ + omp_context *octx; + if (cgraph_node::get (current_function_decl)-offloadable) + varpool_node::get_create (decl)-offloadable = 1; + else + for (octx = ctx-outer; octx; octx = octx-outer) + if (is_targetreg_ctx (octx)) + { + varpool_node::get_create (decl)-offloadable = 1; + break; + } + lock = builtin_decl_explicit (BUILT_IN_GOMP_CRITICAL_NAME_START); lock = build_call_expr_loc (loc, lock, 1, build_fold_addr_expr_loc (loc, decl));
Re: [PATCH] Fix VRP handling of {ADD,SUB,MUL}_OVERFLOW (PR tree-optimization/64006)
On November 21, 2014 8:04:39 PM CET, Jakub Jelinek ja...@redhat.com wrote: Hi! As discussed on IRC and in the PR, these internal calls are quite unique for VRP in that they return _Complex integer result, which VRP doesn't track, but then extract using REALPART_EXPR/IMAGPART_EXPR the two results from that _Complex int and to generate good code it is desirable to get proper ranges of those two results. The problem is that right now this works only on the first VRP iteration, the REALPART_EXPR/IMAGPART_EXPR statements are handled if their operand is set by {ADD,SUB,MUL}_OVERFLOW. If we iterate because a VR of one of the internal call arguments changes, nothing in the propagator marks the REALPART_EXPR/IMAGPART_EXPR statements for reconsideration. The following patch handles this, by making the internal calls interesting to the propagator and returning the right SSA_PROP_* for it (depending on whether any of the value ranges of the REALPART_EXPR/IMAGPART_EXPR immediate uses would change or not). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2014-11-21 Jakub Jelinek ja...@redhat.com PR tree-optimization/64006 * tree-vrp.c (stmt_interesting_for_vrp): Return true for {ADD,SUB,MUL}_OVERFLOW internal calls. (vrp_visit_assignment_or_call): For {ADD,SUB,MUL}_OVERFLOW internal calls, check if any REALPART_EXPR/IMAGPART_EXPR immediate uses would change their value ranges and return SSA_PROP_INTERESTING if so, or SSA_PROP_NOT_INTERESTING if there are some REALPART_EXPR/IMAGPART_EXPR immediate uses interesting for vrp. * gcc.c-torture/execute/pr64006.c: New test. 
--- gcc/tree-vrp.c.jj 2014-11-21 10:17:05.0 +0100 +++ gcc/tree-vrp.c 2014-11-21 13:12:09.895013334 +0100 @@ -6949,6 +6949,20 @@ stmt_interesting_for_vrp (gimple stmt) (is_gimple_call (stmt) || !gimple_vuse (stmt))) return true; + else if (is_gimple_call (stmt) gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) +{ +case IFN_ADD_OVERFLOW: +case IFN_SUB_OVERFLOW: +case IFN_MUL_OVERFLOW: + /* These internal calls return _Complex integer type, + but are interesting to VRP nevertheless. */ + if (lhs TREE_CODE (lhs) == SSA_NAME) +return true; + break; +default: + break; +} } else if (gimple_code (stmt) == GIMPLE_COND || gimple_code (stmt) == GIMPLE_SWITCH) @@ -7101,6 +7115,74 @@ vrp_visit_assignment_or_call (gimple stm return SSA_PROP_NOT_INTERESTING; } + else if (is_gimple_call (stmt) gimple_call_internal_p (stmt)) +switch (gimple_call_internal_fn (stmt)) + { + case IFN_ADD_OVERFLOW: + case IFN_SUB_OVERFLOW: + case IFN_MUL_OVERFLOW: + /* These internal calls return _Complex integer type, + which VRP does not track, but the immediate uses + thereof might be interesting. 
*/ + if (lhs TREE_CODE (lhs) == SSA_NAME) +{ + imm_use_iterator iter; + use_operand_p use_p; + enum ssa_prop_result res = SSA_PROP_VARYING; + + set_value_range_to_varying (get_value_range (lhs)); + + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) +{ + gimple use_stmt = USE_STMT (use_p); + if (!is_gimple_assign (use_stmt)) +continue; + enum tree_code rhs_code = gimple_assign_rhs_code (use_stmt); + if (rhs_code != REALPART_EXPR rhs_code != IMAGPART_EXPR) +continue; + tree rhs1 = gimple_assign_rhs1 (use_stmt); + tree use_lhs = gimple_assign_lhs (use_stmt); + if (TREE_CODE (rhs1) != rhs_code + || TREE_OPERAND (rhs1, 0) != lhs + || TREE_CODE (use_lhs) != SSA_NAME + || !stmt_interesting_for_vrp (use_stmt) + || (!INTEGRAL_TYPE_P (TREE_TYPE (use_lhs)) + || !TYPE_MIN_VALUE (TREE_TYPE (use_lhs)) + || !TYPE_MAX_VALUE (TREE_TYPE (use_lhs +continue; + + /* If there is a change in the value range for any of the + REALPART_EXPR/IMAGPART_EXPR immediate uses, return + SSA_PROP_INTERESTING. If there are any REALPART_EXPR + or IMAGPART_EXPR immediate uses, but none of them have + a change in their value ranges, return + SSA_PROP_NOT_INTERESTING. If there are no + {REAL,IMAG}PART_EXPR uses at all, + return SSA_PROP_VARYING. */ + value_range_t new_vr = VR_INITIALIZER; + extract_range_basic (new_vr, use_stmt); + value_range_t *old_vr = get_value_range (use_lhs); + if (old_vr-type != new_vr.type + || !vrp_operand_equal_p (old_vr-min,
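For context, a user-level sketch of the kind of code these internal calls come from. It assumes a compiler that provides `__builtin_add_overflow` (GCC 5 or later, or a recent Clang); `saturating_add` is a hypothetical example function. As far as I understand the lowering, the wrapped sum corresponds to the REALPART_EXPR of the internal call's result and the overflow flag to the IMAGPART_EXPR, so better ranges on those two values let VRP simplify the code after the check.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: add two ints, clamping instead of overflowing.  */
static int
saturating_add (int a, int b)
{
  int res;

  /* Returns true if the mathematical sum does not fit in RES.  */
  if (__builtin_add_overflow (a, b, &res))
    /* Signed addition can only overflow when both operands have the
       same sign, so the sign of B gives the direction.  */
    return b > 0 ? INT_MAX : INT_MIN;

  return res;
}
```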
Re: [RFC] First steps towards segregating types.
On 11/21/2014 03:13 PM, Richard Biener wrote: On November 21, 2014 8:45:09 PM CET, Diego Novillo dnovi...@google.com wrote: On Fri, Nov 21, 2014 at 1:48 PM, Andrew MacLeod amacl...@redhat.com wrote: 1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type. 2 - change the field (list-value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field. Seems to me that option 2 would also help against code that blindly looks at TREE_VALUE and assumes it to be a tree. Wouldn't that make initial implementation a bit more challenging? Option 1 does seem easier, but I kind of like the forcing of rvalues that option 2 provides. Also liking option 1. The final change to the final type should be simpler that way. I don't like either :). It seems you are concerned about uses from trees. An intermediate step here that would be useful is doing what David did for RTL insns and now gimple - expose tree_type as static type but keep tree as its base. Didn't say I was thrilled with either, just the only 2 I had come up with :-) Thus make references to trees that are always types use tree_type * while keeping those that can refer to types and sth else refer to tree. That's something that would not be completely artificial at this point. Richard. Or possibly a third type which is a hybrid of the two, and also maps to a tree... something like tree_type_hybrid * That could work, and will continue to highlight all the places which still need to be dealt with. And it's much less work.. :-) I'll give that a go and see how it plays out. 
Thanks Andrew
Re: [PATCH] Fix up __builtin_*_overflow expansion on some targets (PR target/63848)
On November 21, 2014 8:08:37 PM CET, Jakub Jelinek ja...@redhat.com wrote: Hi! Apparently, emit_cmp_and_jump_insns can silently generate wrong code for wider modes on some targets, so this patch changes all those calls in internal-fn.c to do_compare_rtx_and_jump, which is a wrapper around emit_cmp_and_jump_insns that should handle the wider mode comparison expansion. Unfortunately, the order of arguments is different :(. No new testcases provided, the existing testsuite exhibited this on various targets. Bootstrapped/regtested on x86_64-linux and i686-linux, tested on the testcases for ia64 and Uros tested the testcases on Alpha (in both cases they previously failed), ok for trunk? Ok. Thanks, Richard. 2014-11-21 Jakub Jelinek ja...@redhat.com PR target/63848 PR target/63975 * internal-fn.c (expand_arith_overflow_result_store, expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow): Use do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns everywhere, adjust arguments to those functions. Use unsignedp = true for EQ, NE, GEU, LEU, LTU and GTU comparisons. --- gcc/internal-fn.c.jj 2014-11-19 18:48:02.0 +0100 +++ gcc/internal-fn.c 2014-11-21 17:34:00.634621461 +0100 @@ -386,8 +386,8 @@ expand_arith_overflow_result_store (tree int uns = TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))); lres = convert_modes (tgtmode, mode, res, uns); gcc_assert (GET_MODE_PRECISION (tgtmode) GET_MODE_PRECISION (mode)); - emit_cmp_and_jump_insns (res, convert_modes (mode, tgtmode, lres, uns), - EQ, NULL_RTX, mode, false, done_label, + do_compare_rtx_and_jump (res, convert_modes (mode, tgtmode, lres, uns), + EQ, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); write_complex_part (target, const1_rtx, true); emit_label (done_label); @@ -533,8 +533,8 @@ expand_addsub_overflow (location_t loc, ? (CONST_SCALAR_INT_P (op0) REG_P (op1)) : CONST_SCALAR_INT_P (op1))) tem = op1; - emit_cmp_and_jump_insns (res, tem, code == PLUS_EXPR ? 
GEU : LEU, - NULL_RTX, mode, false, done_label, + do_compare_rtx_and_jump (res, tem, code == PLUS_EXPR ? GEU : LEU, + true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -549,7 +549,7 @@ expand_addsub_overflow (location_t loc, rtx tem = expand_binop (mode, add_optab, code == PLUS_EXPR ? res : op0, sgn, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (tem, op1, GEU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (tem, op1, GEU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -591,9 +591,9 @@ expand_addsub_overflow (location_t loc, emit_jump (do_error); else if (pos_neg == 3) /* If ARG0 is not known to be always positive, check at runtime. */ - emit_cmp_and_jump_insns (op0, const0_rtx, LT, NULL_RTX, mode, false, - do_error, PROB_VERY_UNLIKELY); - emit_cmp_and_jump_insns (op1, op0, LEU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (op0, const0_rtx, LT, false, mode, NULL_RTX, + NULL_RTX, do_error, PROB_VERY_UNLIKELY); + do_compare_rtx_and_jump (op1, op0, LEU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -607,7 +607,7 @@ expand_addsub_overflow (location_t loc, OPTAB_LIB_WIDEN); rtx tem = expand_binop (mode, add_optab, op1, sgn, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (op0, tem, LTU, NULL_RTX, mode, false, + do_compare_rtx_and_jump (op0, tem, LTU, true, mode, NULL_RTX, NULL_RTX, done_label, PROB_VERY_LIKELY); goto do_error_label; } @@ -619,8 +619,8 @@ expand_addsub_overflow (location_t loc, unsigned. */ res = expand_binop (mode, add_optab, op0, op1, NULL_RTX, false, OPTAB_LIB_WIDEN); - emit_cmp_and_jump_insns (res, const0_rtx, LT, NULL_RTX, mode, false, - do_error, PROB_VERY_UNLIKELY); + do_compare_rtx_and_jump (res, const0_rtx, LT, false, mode, NULL_RTX, + NULL_RTX, do_error, PROB_VERY_UNLIKELY); rtx tem = op1; /* The operation is commutative, so we can pick operand to compare against. 
For prec = BITS_PER_WORD, I think preferring REG operand @@ -633,7 +633,7 @@ expand_addsub_overflow (location_t loc, ? (CONST_SCALAR_INT_P (op1) REG_P (op0)) : CONST_SCALAR_INT_P (op0)) tem = op0;
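The unsigned comparisons in the hunks above (GEU for addition, LEU for subtraction, jumping to done_label when they hold) follow the usual wrap-around test. A plain C sketch of that test, with hypothetical function names and a fixed 32-bit width as an assumption; it mirrors the arithmetic only, not the RTL expansion:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned addition: no overflow iff the wrapped result is >= an operand
   (the GEU comparison).  Returns true on overflow.  */
static bool
uadd_overflows (uint32_t a, uint32_t b, uint32_t *res)
{
  *res = (uint32_t) (a + b);    /* wraps modulo 2^32 */
  return !(*res >= a);
}

/* Unsigned subtraction: no overflow iff the wrapped result is <= the
   minuend (the LEU comparison).  Returns true on overflow.  */
static bool
usub_overflows (uint32_t a, uint32_t b, uint32_t *res)
{
  *res = (uint32_t) (a - b);    /* wraps modulo 2^32 */
  return !(*res <= a);
}
```

Both comparisons are unsigned, which matches the ChangeLog note about passing unsignedp = true for GEU and LEU.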
Re: [PATCH][OpenMP] Fix named critical sections inside target functions
On Fri, Nov 21, 2014 at 11:19:26PM +0300, Ilya Verbin wrote: Hi, '#pragma omp critical (name)' can be placed in the function, marked with '#pragma omp declare target', in this case the corresponding node should be marked as offloadable too. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Please add a testcase for this. * omp-low.c (lower_omp_critical): Mark critical sections inside target functions as offloadable. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 3924282..6c5774c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -9366,16 +9366,6 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) DECL_ARTIFICIAL (decl) = 1; DECL_IGNORED_P (decl) = 1; - /* If '#pragma omp critical' is inside target region, the symbol must - be marked for offloading. */ - omp_context *octx; - for (octx = ctx-outer; octx; octx = octx-outer) - if (is_targetreg_ctx (octx)) - { - varpool_node::get_create (decl)-offloadable = 1; - break; - } - varpool_node::finalize_decl (decl); critical_name_mutexes-put (name, decl); @@ -9383,6 +9373,20 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) else decl = *n; + /* If '#pragma omp critical' is inside target region or + inside function marked as offloadable, the symbol must be + marked as offloadable too. */ + omp_context *octx; + if (cgraph_node::get (current_function_decl)-offloadable) + varpool_node::get_create (decl)-offloadable = 1; + else + for (octx = ctx-outer; octx; octx = octx-outer) + if (is_targetreg_ctx (octx)) + { + varpool_node::get_create (decl)-offloadable = 1; + break; + } + lock = builtin_decl_explicit (BUILT_IN_GOMP_CRITICAL_NAME_START); lock = build_call_expr_loc (loc, lock, 1, build_fold_addr_expr_loc (loc, decl)); Jakub
Re: [PATCH 2/2] PR debug/38757 continued. Handle C11, C++11 and C++14.
On Fri, Nov 21, 2014 at 09:28:45AM +0100, Jakub Jelinek wrote: I think best would be to tweak if (value 2 || value 4) error_at (loc, dwarf version %d is not supported, value); else opts-x_dwarf_version = value; so that we accept value 5 too, and for now, until the most common consumers are changed, use if (dwarf_version = 5 /* || !dwarf_strict */) so that - you can actually use it in the test with -gdwarf-5 - you can commit it right away - people can start playing with what it will mean to support DWARF5 GCC 4.5 also allowed -gdwarf-4 even when DWARF4 has not been released yet. When there are consumers that can grok it, we can uncomment the || !dwarf_strict. That makes sense and would be convenient for me. I made the change in opts.c and added some minimal documentation. And made sure we only emit the new DWARFv5 language values, but not yet anything else (the table header format has changed for debug_info and debug_line in v5, but we don't emit new style headers yet). The testcases were updated to explicitly add -gdwarf-5. else if (strncmp (language_string, GNU C, 5) == 0) { language = DW_LANG_C89; if (dwarf_version = 3 || !dwarf_strict) - if (strcmp (language_string, GNU C99) == 0) - language = DW_LANG_C99; + { + if (strcmp (language_string, GNU C89) != 0) + language = DW_LANG_C99; + + if (dwarf_version = 5 || !dwarf_strict) + if (strcmp (language_string, GNU C11) == 0) + language = DW_LANG_C11; + } Shouldn't we emit at least DW_LANG_C99 for GNU C11 if not dwarf_version = 5 /* || !dwarf_strict */ but dwarf_version = 3 || !dwarf_strict is true? Yes, that is the intention. If it is a versioned GNU C then it is at least DW_LANG_C89, if we have -gdwarf-3 or higher and it isn't GNU C89 then it is at least DW_LANG_C99 and if we have -gdwarf-5 and it is GNU C11 then we emit DW_LANG_C11. I added an explicit testcase for this. BTW, noticed we don't have anything for Fortran 2003 and 2008, filed a DWARF Issue for that. Thanks. 
I have only focussed on C and C++ because I don't know anything about version changes in other language standards. With the above change everything keeps working fine. You only need a patched GDB when explicitly using -gdwarf-5. OK to commit? Thanks, Mark PR debug/38757 continued. Handle C11, C++11 and C++14. Add experimental (minimal) DWARFv5 support. This change depends on the new DWARFv5 constants mentioned in the following draft: http://dwarfstd.org/doc/dwarf5.20141029.pdf gcc/ChangeLog * doc/invoke.texi (-gdwarf-@{version}): Mention experimental DWARFv5. * opts.c (common_handle_option): Accept -gdwarf-5. * dwarf2out.c (is_cxx): Add DW_LANG_C_plus_plus_11 and DW_LANG_C_plus_plus_14. (lower_bound_default): Likewise. Plus DW_LANG_C11. (gen_compile_unit_die): Output DW_LANG_C_plus_plus_11, DW_LANG_C_plus_plus_14 or DW_LANG_C11. (output_compilation_unit_header): Output at most a DWARFv4 header. (output_skeleton_debug_sections): Likewise. (output_line_info): Likewise. (output_aranges): Document header version number. gcc/testsuite/ChangeLog * gcc.dg/debug/dwarf2/lang-c11.c: New test. * gcc.dg/debug/dwarf2/lang-c11-d4-strict.c: Likewise. * g++.dg/debug/dwarf2/lang-cpp11.C: Likewise. * g++.dg/debug/dwarf2/lang-cpp14.C: Likewise. * g++.dg/debug/dwarf2/lang-cpp98.C: Likewise. include/ChangeLog * dwarf2.h: Add DW_LANG_C_plus_plus_11, DW_LANG_C11 and DW_LANG_C_plus_plus_14. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..d7bce2a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -5407,8 +5407,8 @@ assembler (GAS) to fail with an error. @item -gdwarf-@var{version} @opindex gdwarf-@var{version} Produce debugging information in DWARF format (if that is supported). -The value of @var{version} may be either 2, 3 or 4; the default version -for most targets is 4. +The value of @var{version} may be either 2, 3, 4 or 5; the default version +for most targets is 4. DWARF Version 5 is only experimental. 
Note that with DWARF Version 2, some ports require and always use some non-conflicting DWARF 3 extensions in the unwind tables. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 3d50ac9..d0eaaf1 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -4684,7 +4684,8 @@ is_cxx (void) { unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language); - return lang == DW_LANG_C_plus_plus || lang == DW_LANG_ObjC_plus_plus; + return (lang == DW_LANG_C_plus_plus || lang == DW_LANG_ObjC_plus_plus + || lang == DW_LANG_C_plus_plus_11 || lang == DW_LANG_C_plus_plus_14); } /* Return TRUE if the language is Java. */ @@ -8966,7 +8967,9 @@ output_die (dw_die_ref die) static void
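The C language selection described in this mail can be written out as a small standalone function. This is a sketch under assumptions: the enum values are stand-ins for the DW_LANG_* constants, `pick_c_language` is a hypothetical name, and the `|| !dwarf_strict` part of the version 5 check is left commented out as proposed in the thread.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-ins for DW_LANG_C89, DW_LANG_C99 and DW_LANG_C11.  */
enum c_lang { LANG_C89, LANG_C99, LANG_C11 };

/* Hypothetical helper mirroring the "GNU C" branch of the patch.  */
static enum c_lang
pick_c_language (const char *language_string, int dwarf_version,
                 bool dwarf_strict)
{
  /* Any versioned GNU C is at least C89.  */
  enum c_lang language = LANG_C89;

  if (dwarf_version >= 3 || !dwarf_strict)
    {
      /* Anything newer than C89 is at least C99.  */
      if (strcmp (language_string, "GNU C89") != 0)
        language = LANG_C99;

      /* C11 is only emitted with an explicit -gdwarf-5 for now.  */
      if (dwarf_version >= 5 /* || !dwarf_strict */)
        if (strcmp (language_string, "GNU C11") == 0)
          language = LANG_C11;
    }

  return language;
}
```

In particular, GNU C11 with -gdwarf-4 falls back to the C99 code, which is the behaviour asked about earlier in the thread.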
Re: [RFC] First steps towards segregating types.
On November 21, 2014 9:22:08 PM CET, Andrew MacLeod amacl...@redhat.com wrote: On 11/21/2014 03:13 PM, Richard Biener wrote: On November 21, 2014 8:45:09 PM CET, Diego Novillo dnovi...@google.com wrote: On Fri, Nov 21, 2014 at 1:48 PM, Andrew MacLeod amacl...@redhat.com wrote: 1 - introduce a TYPE_REF tree node, which is effectively just a 'typed' tree node, and the TREE_TYPE() field of a TYPE_REF node would point to the type node. Any routines which utilize a TYPE node in a tree list would have to be modified to make use of this new TYPE_REF node to refer to the type. 2 - change the field (list-value in this case) to be a tagged union of { tree tree_value, tree_type_ptr type_value } and use a bit in the base to flag which kind of value it is. This would be compatible with GTY, and would require changing routines and algorithms to check the bit and use the right field. Seems to me that option 2 would also help against code that blindly looks at TREE_VALUE and assumes it to be a tree. Wouldn't that make initial implementation a bit more challenging? Option 1 does seem easier, but I kind of like the forcing of rvalues that option 2 provides. Also liking option 1. The final change to the final type should be simpler that way. I don't like either :). It seems you are concerned about uses from trees. An intermediate step here that would be useful is doing what David did for RTL insns and now gimple - expose tree_type as static type but keep tree as its base. Didn't say I was thrilled with either, just the only 2 I had come up with :-) Thus make references to trees that are always types use tree_type * while keeping those that can refer to types and sth else refer to tree. That's something that would not be completely artificial at this point. Richard. Or possibly a third type which is a hybrid of the two, and also maps to a tree... something like tree_type_hybrid * That could work, and will continue to highlight all the places which still need to be dealt with. 
Well, on a case by case basis you could find a better union in the tree type hierarchy. Richard. And it's much less work.. :-) I'll give that a go and see how it plays out. Thanks Andrew
[PATCH v2] gcc/ubsan.c: Use 'pretty_print' for 'pretty_name' to avoid memory overflow
According to the code below, 'pretty_name' may need more than 16 additional bytes (the length is unbounded for array types). There is an easy way to fix it: use 'pretty_print' for 'pretty_name'. Also make the code follow the 2-space indentation coding style (originally, some of the code used 1-space indentation). It passes the testsuite on Fedora 20 x86_64-unknown-linux-gnu. 2014-11-22 Chen Gang gang.chen.5...@gmail.com * ubsan.c (ubsan_type_descriptor): Use 'pretty_print' for 'pretty_name' to avoid memory overflow --- gcc/ubsan.c | 57 +++-- 1 file changed, 27 insertions(+), 30 deletions(-) diff --git a/gcc/ubsan.c b/gcc/ubsan.c index 41cf546..c03b000 100644 --- a/gcc/ubsan.c +++ b/gcc/ubsan.c @@ -336,7 +336,7 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style pstyle) tree dtype = ubsan_type_descriptor_type (); tree type2 = type; const char *tname = NULL; - char *pretty_name; + pretty_printer pretty_name; unsigned char deref_depth = 0; unsigned short tkind, tinfo; @@ -375,54 +375,50 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style pstyle) /* We weren't able to determine the type name. */ tname = unknown; - /* Decorate the type name with '', '*', struct, or union. */ - pretty_name = (char *) alloca (strlen (tname) + 16 + deref_depth); if (pstyle == UBSAN_PRINT_POINTER) { - int pos = sprintf (pretty_name, '%s%s%s%s%s%s%s, -TYPE_VOLATILE (type2) ? volatile : , -TYPE_READONLY (type2) ? const : , -TYPE_RESTRICT (type2) ? restrict : , -TYPE_ATOMIC (type2) ? _Atomic : , -TREE_CODE (type2) == RECORD_TYPE -? struct -: TREE_CODE (type2) == UNION_TYPE - ? union : , tname, -deref_depth == 0 ? : ); + pp_printf (pretty_name, '%s%s%s%s%s%s%s, +TYPE_VOLATILE (type2) ? volatile : , +TYPE_READONLY (type2) ? const : , +TYPE_RESTRICT (type2) ? restrict : , +TYPE_ATOMIC (type2) ? _Atomic : , +TREE_CODE (type2) == RECORD_TYPE +? struct +: TREE_CODE (type2) == UNION_TYPE + ? union : , tname, +deref_depth == 0 ? 
: ); while (deref_depth-- 0) -pretty_name[pos++] = '*'; - pretty_name[pos++] = '\''; - pretty_name[pos] = '\0'; + pp_star(pretty_name); + pp_quote(pretty_name); } else if (pstyle == UBSAN_PRINT_ARRAY) { /* Pretty print the array dimensions. */ gcc_assert (TREE_CODE (type) == ARRAY_TYPE); tree t = type; - int pos = sprintf (pretty_name, '%s , tname); + pp_printf (pretty_name, '%s , tname); while (deref_depth-- 0) -pretty_name[pos++] = '*'; + pp_star(pretty_name); while (TREE_CODE (t) == ARRAY_TYPE) { - pretty_name[pos++] = '['; + pp_left_bracket(pretty_name); tree dom = TYPE_DOMAIN (t); if (dom TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST) - pos += sprintf (pretty_name[pos], HOST_WIDE_INT_PRINT_DEC, - tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1); + pp_printf (pretty_name, HOST_WIDE_INT_PRINT_DEC, + tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1); else /* ??? We can't determine the variable name; print VLA unspec. */ - pretty_name[pos++] = '*'; - pretty_name[pos++] = ']'; + pp_star(pretty_name); + pp_right_bracket(pretty_name); t = TREE_TYPE (t); } - pretty_name[pos++] = '\''; - pretty_name[pos] = '\0'; + pp_quote(pretty_name); - /* Save the tree with stripped types. */ - type = t; + /* Save the tree with stripped types. */ + type = t; } else -sprintf (pretty_name, '%s', tname); +pp_printf (pretty_name, '%s', tname); switch (TREE_CODE (type)) { @@ -459,8 +455,9 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style pstyle) DECL_IGNORED_P (decl) = 1; DECL_EXTERNAL (decl) = 0; - size_t len = strlen (pretty_name); - tree str = build_string (len + 1, pretty_name); + const char *tmp = pp_formatted_text(pretty_name); + size_t len = strlen (tmp); + tree str = build_string (len + 1, tmp); TREE_TYPE (str) = build_array_type (char_type_node, build_index_type (size_int (len))); TREE_READONLY (str) = 1; -- 1.9.3
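To illustrate why a fixed "+ 16" slack is not enough for the array case, here is a standalone C sketch that measures the output before writing it. It is not the patch's approach (the patch uses GCC's pretty_printer); `format_array_type` and its convention that a dimension of 0 stands for an unknown (VLA) bound are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "'<tname> [d0][d1]...'" in a buffer sized
   from the actual contents.  A dimension of 0 is printed as "[*]".
   Returns a malloc'd string, or NULL on allocation failure.  */
static char *
format_array_type (const char *tname, const unsigned long *dims,
                   size_t ndims)
{
  /* First pass: measure.  Opening quote and space, then closing quote.  */
  size_t len = strlen (tname) + 2 + 1;
  for (size_t i = 0; i < ndims; i++)
    len += dims[i] ? (size_t) snprintf (NULL, 0, "[%lu]", dims[i]) : 3;

  char *buf = malloc (len + 1);
  if (!buf)
    return NULL;

  /* Second pass: write, with every call bounded by the space left.  */
  size_t pos = (size_t) snprintf (buf, len + 1, "'%s ", tname);
  for (size_t i = 0; i < ndims; i++)
    if (dims[i])
      pos += (size_t) snprintf (buf + pos, len + 1 - pos, "[%lu]", dims[i]);
    else
      pos += (size_t) snprintf (buf + pos, len + 1 - pos, "[*]");
  snprintf (buf + pos, len + 1 - pos, "'");

  return buf;
}
```

Three dimensions of a million elements each already produce more than strlen (tname) + 16 bytes, which is the overflow the patch is guarding against.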
Re: [PATCH][OpenMP] Fix named critical sections inside target functions
On 21 Nov 2014, at 23:36, Jakub Jelinek ja...@redhat.com wrote: On Fri, Nov 21, 2014 at 11:19:26PM +0300, Ilya Verbin wrote: Hi, '#pragma omp critical (name)' can be placed in the function, marked with '#pragma omp declare target', in this case the corresponding node should be marked as offloadable too. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Please add a testcase for this. By default with disabled offloading it will always PASS. Add anyway? -- Ilya
Re: [PATCH][OpenMP] Fix named critical sections inside target functions
On Fri, Nov 21, 2014 at 1:08 PM, Ilya Verbin iver...@gmail.com wrote: On 21 Nov 2014, at 23:36, Jakub Jelinek ja...@redhat.com wrote: On Fri, Nov 21, 2014 at 11:19:26PM +0300, Ilya Verbin wrote: Hi, '#pragma omp critical (name)' can be placed in the function, marked with '#pragma omp declare target', in this case the corresponding node should be marked as offloadable too. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Please add a testcase for this. By default with disabled offloading it will always PASS. Add anyway? Have you fixed the offloading issue with binutils 2.25? -- H.J.