[PATCH]: Guard division by zero in ipa-icf dumps
Hello!

2014-10-24  Martin Liska  mli...@suse.cz

	* ipa-icf.c (sem_item_optimizer::parse_nonsingleton_classes):
	Guard division by zero in dumps.
	(sem_item_optimizer::merge_classes): Ditto.

Tested on alphaev68-linux-gnu, committed to mainline as obvious.

Uros.

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index d1238a4..e7a293e 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1736,7 +1736,7 @@ sem_item_optimizer::parse_nonsingleton_classes (void)
   if (dump_file)
     fprintf (dump_file, "Init called for %u items (%.2f%%).\n",
	     init_called_count,
-	     100.0f * init_called_count / m_items.length ());
+	     m_items.length () ? 100.0f * init_called_count / m_items.length () : 0.0f);
 }

 /* Equality function for semantic items is used to subdivide existing
@@ -2196,14 +2196,15 @@ sem_item_optimizer::merge_classes (unsigned int prev_class_count)
       fprintf (dump_file, "Congruent classes before: %u, after: %u\n",
	       prev_class_count, class_count);
       fprintf (dump_file, "Average class size before: %.2f, after: %.2f\n",
-	       1.0f * item_count / prev_class_count,
-	       1.0f * item_count / class_count);
+	       prev_class_count ? 1.0f * item_count / prev_class_count : 0.0f,
+	       class_count ? 1.0f * item_count / class_count : 0.0f);
       fprintf (dump_file, "Average non-singular class size: %.2f, count: %u\n",
-	       1.0f * non_singular_classes_sum / non_singular_classes_count,
+	       non_singular_classes_count ? 1.0f * non_singular_classes_sum /
+	       non_singular_classes_count : 0.0f,
	       non_singular_classes_count);
       fprintf (dump_file, "Equal symbols: %u\n", equal_items);
       fprintf (dump_file, "Fraction of visited symbols: %.2f%%\n\n",
-	       100.0f * equal_items / item_count);
+	       item_count ? 100.0f * equal_items / item_count : 0.0f);
     }

   for (hash_table <congruence_class_group_hash>::iterator it = m_classes.begin ();
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
> Here the key point is we need a general purpose register for the loop
> instruction.

So the question to ask here is: how does this work today, without loop
instructions? Somehow--even when it has been spilled--a branch instruction
can test the trip count. There should be no difference.

> And we cannot use zero-cost looping in this situation. And that's why I
> split the zero_cost_loop_end into a normal test and branch. Also note
> that the hwloop_pattern_reg interface also expects a general purpose
> register in the doloop_end pattern.

If there were no loop instruction, how would this work?
[PATCH] c++ify sreal
From: Trevor Saunders tsaund...@mozilla.com

Hi,

do $subject, and cleanup for always 64 bit hwi.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Trev

gcc/ChangeLog:

2014-10-24  Trevor Saunders  tsaund...@mozilla.com

	* ipa-inline.c (edge_badness): Adjust.
	(inline_small_functions): Likewise.
	* predict.c (propagate_freq): Likewise.
	(estimate_bb_frequencies): Likewise.
	* sreal.c (sreal::dump): Rename from dump_sreal.
	(debug): Adjust.
	(copy): Remove function.
	(sreal::shift_right): Rename from sreal_sift_right.
	(sreal::normalize): Rename from normalize.
	(sreal_init): Remove function.
	(sreal::to_int): Rename from sreal_to_int.
	(sreal_compare): Remove function.
	(sreal::operator+): Rename from sreal_add.
	(sreal::operator-): Rename from sreal_sub.
	(sreal::operator*): Rename from sreal_mul.
	(sreal::operator/): Rename from sreal_div.
	* sreal.h (class sreal): Adjust.
	(inline sreal operator+=): New operator.
	(inline sreal operator-=): Likewise.
	(inline sreal operator/=): Likewise.
	(inline sreal operator*=): Likewise.
	(inline bool operator!=): Likewise.
	(inline bool operator>): Likewise.
	(inline bool operator<=): Likewise.
	(inline bool operator>=): Likewise.
---
 gcc/ipa-inline.c |  25 ++-
 gcc/predict.c    |  82 --
 gcc/sreal.c      | 479 +++
 gcc/sreal.h      |  97 ---
 4 files changed, 213 insertions(+), 470 deletions(-)

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index e79a4dd..cca1fb3 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -939,29 +939,28 @@ edge_badness (struct cgraph_edge *edge, bool dump)
   else if (max_count)
     {
-      sreal tmp, relbenefit_real, growth_real;
       int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
       /* Capping edge->count to max_count.  edge->count can be larger than
	 max_count if an inline adds new edges which increase max_count
	 after max_count is computed.  */
       gcov_type edge_count = edge->count > max_count ? max_count : edge->count;

-      sreal_init (&relbenefit_real, relbenefit, 0);
-      sreal_init (&growth_real, growth, 0);
+      sreal relbenefit_real (relbenefit, 0);
+      sreal growth_real (growth, 0);

       /* relative_edge_count.  */
-      sreal_init (&tmp, edge_count, 0);
-      sreal_div (&tmp, &tmp, &max_count_real);
+      sreal tmp (edge_count, 0);
+      tmp /= max_count_real;

       /* relative_time_benefit.  */
-      sreal_mul (&tmp, &tmp, &relbenefit_real);
-      sreal_div (&tmp, &tmp, &max_relbenefit_real);
+      tmp *= relbenefit_real;
+      tmp /= max_relbenefit_real;

       /* growth_f_caller.  */
-      sreal_mul (&tmp, &tmp, &half_int_min_real);
-      sreal_div (&tmp, &tmp, &growth_real);
+      tmp *= half_int_min_real;
+      tmp /= growth_real;

-      badness = -1 * sreal_to_int (&tmp);
+      badness = -1 * tmp.to_int ();

       if (dump)
	 {
@@ -1604,9 +1603,9 @@ inline_small_functions (void)
	  if (max_count < edge->count)
	    max_count = edge->count;
       }
-  sreal_init (&max_count_real, max_count, 0);
-  sreal_init (&max_relbenefit_real, RELATIVE_TIME_BENEFIT_RANGE, 0);
-  sreal_init (&half_int_min_real, INT_MAX / 2, 0);
+  max_count_real = sreal (max_count, 0);
+  max_relbenefit_real = sreal (RELATIVE_TIME_BENEFIT_RANGE, 0);
+  half_int_min_real = sreal (INT_MAX / 2, 0);
   ipa_free_postorder_info ();
   initialize_growth_caches ();

diff --git a/gcc/predict.c b/gcc/predict.c
index 5f5d4a5..10675c3 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2571,15 +2571,13 @@ propagate_freq (basic_block head, bitmap tovisit)
	bb->count = bb->frequency = 0;
     }

-  memcpy (&BLOCK_INFO (head)->frequency, &real_one, sizeof (real_one));
+  BLOCK_INFO (head)->frequency = real_one;
   last = head;
   for (bb = head; bb; bb = nextbb)
     {
       edge_iterator ei;
-      sreal cyclic_probability, frequency;
-
-      memcpy (&cyclic_probability, &real_zero, sizeof (real_zero));
-      memcpy (&frequency, &real_zero, sizeof (real_zero));
+      sreal cyclic_probability = real_zero;
+      sreal frequency = real_zero;

       nextbb = BLOCK_INFO (bb)->next;
       BLOCK_INFO (bb)->next = NULL;
@@ -2596,42 +2594,34 @@ propagate_freq (basic_block head, bitmap tovisit)
	  FOR_EACH_EDGE (e, ei, bb->preds)
	    if (EDGE_INFO (e)->back_edge)
	      {
-		sreal_add (&cyclic_probability, &cyclic_probability,
-			   &EDGE_INFO (e)->back_edge_prob);
+		cyclic_probability += EDGE_INFO (e)->back_edge_prob;
	      }
	    else if (!(e->flags & EDGE_DFS_BACK))
	      {
-		sreal tmp;
-		/* frequency += (e->probability * BLOCK_INFO (e->src)->frequency /
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
>> Here the key point is we need a general purpose register for the loop
>> instruction.
>
> So the question to ask here is: how does this work today, without loop
> instructions? Somehow--even when it has been spilled--a branch
> instruction can test the trip count. There should be no difference.
>
>> And we cannot use zero-cost looping in this situation. And that's why I
>> split the zero_cost_loop_end into a normal test and branch. Also note
>> that the hwloop_pattern_reg interface also expects a general purpose
>> register in the doloop_end pattern.
>
> If there were no loop instruction, how would this work?

Just take a look at my patch. I handle this in the new define_split:

+(define_split
+  [(set (pc)
+	(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "")
+			  (const_int 1))
+		      (label_ref (match_operand 1 ""))
+		      (pc)))
+   (set (match_operand:SI 2 "nonimmediate_operand" "")
+	(plus:SI (match_dup 0)
+		 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)
+   (clobber (match_scratch 3))]
+  "TARGET_LOOPS && optimize && reload_completed"
+  [(const_int 0)]
+{
+  if (!REG_P (operands[0]))
+    {
+      rtx test;
+
+      /* Fallback into a normal conditional branch insn.  */
+      emit_move_insn (operands[3], operands[0]);
+      emit_insn (gen_addsi3 (operands[3], operands[3], constm1_rtx));
+      emit_move_insn (operands[0], operands[3]);
+      test = gen_rtx_NE (VOIDmode, operands[3], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (test, operands[3],
+				      const0_rtx, operands[1]));
+    }
+  else
+    {
+      emit_jump_insn (gen_loop_end (operands[0], operands[1], operands[2]));
+    }
+
+  DONE;
+})
Re: [PATCH] Add zero-overhead looping for xtensa backend
I mean without your patch at all.

On Thu, Oct 23, 2014 at 11:30 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
> Just take a look at my patch. I handle this in the new define_split:
> [...]
Re: [PATCH] Add zero-overhead looping for xtensa backend
1. The original xtensa port never generates loop instruction at all.
2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
   zero-cost loop instruction.

Is that clear?

> I mean without your patch at all.
>
> On Thu, Oct 23, 2014 at 11:30 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
>> Just take a look at my patch. I handle this in the new define_split:
>> [...]
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
> 1. The original xtensa port never generates loop instruction at all.
> 2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
>    zero-cost loop instruction.
>
> Is that clear?

We are talking in circles. I understand very well what goes on here. My
point is:

1. Right now, today, GCC generates loops with branch instructions even
   when the trip count is spilled.
2. Branch instructions and loop instructions have identical register
   requirements.

Therefore:

3. loop instructions should be generatable when the trip count is spilled.
[PATCH] Fix bootstrap/PR63632
r216566 (r216568 for 4.9 branch) added %{fno-lto} to LINK_COMMAND_SPEC.
However the linker doesn't understand -fno-lto and errors out. This causes
LTO/PGO bootstrap to fail, because -fno-lto is used during STAGEprofile.
Fixed by filtering out -fno-lto in collect2.c.

LTO/PGO bootstrapped and tested on powerpc64-unknown-linux-gnu.
Preapproved by Jakub on IRC. Applied to trunk and 4.9 branch.

2014-10-24  Markus Trippelsdorf  mar...@trippelsdorf.de

	PR bootstrap/63632
	* collect2.c (main): Filter out -fno-lto.

	PR bootstrap/63632
	* g++.dg/torture/pr63632.C: New test.

diff --git a/gcc/collect2.c b/gcc/collect2.c
index c54e6fb51578..7c067ffcafbb 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -1311,6 +1311,12 @@ main (int argc, char **argv)
	      ld1--;
	      ld2--;
	    }
+	  else if (strncmp (arg, "-fno-lto", 8) == 0)
+	    {
+	      /* Do not pass -fno-lto to the linker.  */
+	      ld1--;
+	      ld2--;
+	    }
 #ifdef TARGET_AIX_VERSION
	  else
	    {
diff --git a/gcc/testsuite/g++.dg/torture/pr63632.C b/gcc/testsuite/g++.dg/torture/pr63632.C
new file mode 100644
index ..48cd8692412a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr63632.C
@@ -0,0 +1,5 @@
+// PR bootstrap/63632
+// { dg-do link }
+// { dg-options "-fno-lto" }
+
+int main () {}
-- 
Markus
Re: [PATCH] Add zero-overhead looping for xtensa backend
Thanks for the explanation. I think I am clear about what you are thinking
now. That's an interesting question. I am not sure about the reason why
GCC's reload cannot handle a doloop_end insn. I guess maybe the doloop_end
pattern is special? I mean it's a branch insn in a parallel form.

On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
>> 1. The original xtensa port never generates loop instruction at all.
>> 2. A port doesn't need to implement hwloop_pattern_reg hook if it has
>>    no zero-cost loop instruction.
>> Is that clear?
>
> We are talking in circles. I understand very well what goes on here. My
> point is:
> 1. Right now, today, GCC generates loops with branch instructions even
>    when the trip count is spilled.
> 2. Branch instructions and loop instructions have identical register
>    requirements.
> Therefore:
> 3. loop instructions should be generatable when the trip count is
>    spilled.
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Thu, Oct 23, 2014 at 11:51 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
> Thanks for the explanation. I think I am clear about what you are
> thinking now. That's an interesting question. I am not sure about the
> reason why GCC's reload cannot handle a doloop_end insn. I guess maybe
> the doloop_end pattern is special? I mean it's a branch insn in a
> parallel form.

No, it is not special. Jumps are just never handled by reload. I thought
this was documented somewhere also. Basically the main issue with jumps
is: where does the reload value go, which side of the jump?

Thanks,
Andrew

> [...]
Re: [PATCH 5/5] add libcc1
On 10/10/14 22:58, Jeff Law wrote:
> On 10/09/14 03:07, Phil Muldoon wrote:
>> Sorry for taking so long to reply. We've talked, on irc and elsewhere a
>> little (some at the Cauldron too!). I think the consensus is as nobody
>> has explicitly mentioned anything, this is OK to go in?
>
> Yes, please go ahead and check it in. You'll be the first contact point
> if something goes wrong :-)
>
> Given the length of time since the original post and now, can you please
> do sanity bootstrap to make sure nothing's bitrotted before you commit?

I rebased the patch on top of GCC head (from the git repository), updated
the ChangeLogs, etc from two days ago (it takes two days to do a full
rebase, pristine and patched bootstrap and testrun on my poor laptop ;).
I've built both pristine and patched branches with bootstrap enabled. I
ran both testsuites and used contrib/compare_tests to make sure everything
was as it should be. compare_tests reports everything as fine.

One minor change I found was due to some ongoing work on hash_tables. It
seems the parameterless constructor call for a new hash table has been
removed. This was trivially fixed with the patch attached. Even though
(to me) it is obvious, what do you think?

Cheers

Phil

--

diff --git a/libcc1/plugin.cc b/libcc1/plugin.cc
index fbb49d3..5cdd19d 100644
--- a/libcc1/plugin.cc
+++ b/libcc1/plugin.cc
@@ -220,13 +220,10 @@ static plugin_context *current_context;
 plugin_context::plugin_context (int fd)
   : cc1_plugin::connection (fd),
-    address_map (),
-    preserved (),
-    file_names ()
+    address_map (30),
+    preserved (30),
+    file_names (30)
 {
-  address_map.create (20);
-  preserved.create (20);
-  file_names.create (20);
 }

 void
@@ -236,8 +233,8 @@ plugin_context::mark ()
        it != address_map.end (); ++it)
     {
-      ggc_mark ((*it).decl);
-      ggc_mark ((*it).address);
+      ggc_mark ((*it)->decl);
+      ggc_mark ((*it)->address);
     }

   for (hash_table <pointer_hash <tree_node> >::iterator it = preserved.begin ();
Re: [PATCH 5/5] add libcc1
On Fri, Oct 24, 2014 at 08:15:36AM +0100, Phil Muldoon wrote:
> [rebase and testing report snipped]
>
> One minor change I found was due to some ongoing work on hash_tables. It
> seems the parameterless constructor call for a new hash table has been
> removed. This was trivially fixed with the patch attached. Even though
> (to me) it is obvious, what do you think?
>
> --- a/libcc1/plugin.cc
> +++ b/libcc1/plugin.cc
> @@ -220,13 +220,10 @@ static plugin_context *current_context;
>  plugin_context::plugin_context (int fd)
>    : cc1_plugin::connection (fd),
> -    address_map (),
> -    preserved (),
> -    file_names ()
> +    address_map (30),
> +    preserved (30),
> +    file_names (30)
>  {
> -  address_map.create (20);
> -  preserved.create (20);
> -  file_names.create (20);

This is http://gcc.gnu.org/r211936 , i.e.
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01598.html
so the changes are fine.

>  }
>
>  void
> @@ -236,8 +233,8 @@ plugin_context::mark ()
>         it != address_map.end (); ++it)
>      {
> -      ggc_mark ((*it).decl);
> -      ggc_mark ((*it).address);
> +      ggc_mark ((*it)->decl);
> +      ggc_mark ((*it)->address);
>      }

And this is http://gcc.gnu.org/r211937 , i.e.
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01599.html
in action.

>    for (hash_table <pointer_hash <tree_node> >::iterator it = preserved.begin ();

So, if these are the only non-obvious changes you needed, please go ahead
and commit.

	Jakub
Re: [PATCH] Fix genmatch linking
Richard Biener rguent...@suse.de writes:
> On Thu, 23 Oct 2014, Richard Biener wrote:
>> Final try for today.
>
> And this may work as well and is slightly simpler.
>
> Index: gcc/Makefile.in
> ===
> --- gcc/Makefile.in	(revision 216590)
> +++ gcc/Makefile.in	(working copy)
> @@ -981,7 +981,15 @@ else
>  LIBIBERTY = ../libiberty/libiberty.a
>  BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
>  endif
> +# For stage1 and when cross-compiling use the build libcpp which is
> +# built with NLS disabled.  For stage2+ use the host library and
> +# its dependencies.
> +ifeq ($(build_objdir),$(build_libobjdir))
>  BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
> +else
> +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
> +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
> +endif
>
>  # Dependencies on the intl and portability libraries.
>  LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
>
> Can you test it please?

Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete
just fine. Thanks.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix genmatch linking
On Thu, 23 Oct 2014, Richard Sandiford wrote:
> Richard Biener rguent...@suse.de writes:
>> This adds a libcpp host module without NLS and ICONV support and
>> properly links genmatch against the build libcpp instead of the host
>> one.
>>
>> Bootstrap running on x86_64-unknown-linux-gnu (stage1 all-gcc finished
>> fine).
>>
>> Ok for trunk?
>>
>> Thanks,
>> Richard.
>>
>> 2014-10-23  Richard Biener  rguent...@suse.de
>>
>>	* Makefile.def: Add libcpp build module and dependencies.
>>	* configure.ac: Add libcpp build module.
>>	* Makefile.in: Regenerate.
>>	* configure: Likewise.
>>
>> gcc/
>>	* Makefile.in (BUILD_CPPLIB): Add.
>>	(build/genmatch$(build_exeext)): Use BUILD_CPPLIB, not CPPLIB.
>>	Drop LIBIBERTY.
>
> This breaks a -j1 (!) build on x86_64-linux-gnu for me with:
>
> g++ -I/blah/libcpp -I. -I/blah/libcpp/../include -I/blah/libcpp/include
> -g -O2 -W -Wall -Wno-narrowing -Wwrite-strings -Wmissing-format-attribute
> -pedantic -Wno-long-long -fno-exceptions -fno-rtti -I/blah/libcpp -I.
> -I/blah/libcpp/../include -I/blah/libcpp/include -c -o charset.o
> -MT charset.o -MMD -MP -MF .deps/charset.Tpo /blah/libcpp/charset.c
> In file included from /blah/libcpp/../include/hashtab.h:40:0,
>                  from /blah/libcpp/../include/filenames.h:29,
>                  from /blah/libcpp/system.h:367,
>                  from /blah/libcpp/charset.c:21:
> /blah/libcpp/../include/ansidecl.h:171:64: error: new declaration 'char* basename(const char*)'
>  # define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m)))
> /blah/libcpp/../include/libiberty.h:113:64: note: in expansion of macro 'ATTRIBUTE_NONNULL'
>  extern char *basename (const char *) ATTRIBUTE_RETURNS_NONNULL ATTRIBUTE_NONNULL(1);
> In file included from /blah/libcpp/system.h:201:0,
>                  from /blah/libcpp/charset.c:21:
> /usr/include/string.h:597:26: error: ambiguates old declaration 'const char* basename(const char*)'
>  extern "C++" const char *basename (const char *__filename)
> make: *** [charset.o] Error 1
>
> The build libiberty and libcpp share a config.cache file and both check
> for basename.

Ugh - why do they do this kind of thing?

Btw, I can't decipher where or how it's doing that.

> The problem is that libiberty is built without _GNU_SOURCE and doesn't
> see the declaration while libcpp is built with it and does see the
> declaration. So if libiberty configures first we assume basename isn't
> available. If they both configure at the same time we're OK.

Bah. Where does the _GNU_SOURCE come from? Is it libcpp being built with
C++ and libiberty built with C?

I think the solution is to force a separate config.cache for libcpp. But
as I didn't yet figure out how it ends up sharing config.cache I didn't
figure out how to disable that sharing...

Thanks,
Richard.

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH 1/2, x86, PR63534] Fix darwin bootstrap
On Thu, Oct 23, 2014 at 11:19 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> Linux x86 bootstrap/make check passed. Darwin x86 bootstrap passed
> configured with --with-cpu=intel. The --with-cpu=corei7 issue refers to
> PR63620.
>
> On Fri, Oct 17, 2014 at 6:08 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> Hi,
>>
>> The patch fixes the 1st fail in darwin bootstrap. When the PIC register
>> is a pseudo we don't need to init it after setjmp or a non-local goto.
>> Is it ok?
>>
>> ChangeLog:
>>
>> 2014-10-17  Evgeny Stupachenko  evstu...@gmail.com
>>
>>	PR target/63534
>>	* config/i386/i386.c (builtin_setjmp_receiver): Delete.
>>	(nonlocal_goto_receiver): Ditto.

This one needs approval from Darwin maintainer.

Uros.
Re: PATCH: fix breakage from [PATCH] Fix genmatch linking
On Fri, 24 Oct 2014, Hans-Peter Nilsson wrote:
>> From: Richard Biener rguent...@suse.de
>> Date: Thu, 23 Oct 2014 10:47:43 +0200
>>
>> This adds a libcpp host module without NLS and ICONV support and
>> properly links genmatch against the build libcpp instead of the host
>> one.
>>
>> Bootstrap running on x86_64-unknown-linux-gnu (stage1 all-gcc finished
>> fine).
>>
>> Ok for trunk?
>>
>> Thanks,
>> Richard.
>>
>> 2014-10-23  Richard Biener  rguent...@suse.de
>>
>>	* Makefile.def: Add libcpp build module and dependencies.
>>	* configure.ac: Add libcpp build module.
>>	* Makefile.in: Regenerate.
>>	* configure: Likewise.
>
> You only exposed a dormant issue, but ever since this commit (nominally,
> some commit in (216573:216588] but I only see this one matching) I see,
> for cross-builds (to cris-elf):
>
> make[2]: Entering directory `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/libcpp'
> g++ -I/tmp/hpautotest-gcc1/gcc/libcpp -I.
> -I/tmp/hpautotest-gcc1/gcc/libcpp/../include
> -I/tmp/hpautotest-gcc1/gcc/libcpp/include -g -O2 -W -Wall
> -Wwrite-strings -Wmissing-format-attribute -pedantic -Wno-long-long
> -fno-exceptions -fno-rtti -I/tmp/hpautotest-gcc1/gcc/libcpp -I.
> -I/tmp/hpautotest-gcc1/gcc/libcpp/../include
> -I/tmp/hpautotest-gcc1/gcc/libcpp/include -c -o charset.o -MT charset.o
> -MMD -MP -MF .deps/charset.Tpo /tmp/hpautotest-gcc1/gcc/libcpp/charset.c
> In file included from /tmp/hpautotest-gcc1/gcc/libcpp/system.h:370,
>                  from /tmp/hpautotest-gcc1/gcc/libcpp/charset.c:21:
> /tmp/hpautotest-gcc1/gcc/libcpp/../include/libiberty.h:113: error: new declaration 'char* basename(const char*)'
> /usr/include/string.h:601: error: ambiguates old declaration 'const char* basename(const char*)'
> make[2]: *** [charset.o] Error 1
> make[2]: Leaving directory `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/libcpp'
> make[1]: *** [all-build-libcpp] Error 2
> make[1]: Leaving directory `/tmp/hpautotest-gcc1/cris-elf/gccobj'
> make: *** [all] Error 2
>
> Above that, we have:
>
> checking whether basename is declared... (cached) no
>
> and above that, we have:
>
> make[2]: Leaving directory `/tmp/hpautotest-gcc1/cris-elf/gccobj/build-x86_64-unknown-linux-gnu/fixincludes'
> mkdir -p -- build-x86_64-unknown-linux-gnu/libcpp
> Configuring in build-x86_64-unknown-linux-gnu/libcpp
> configure: loading cache ../config.cache
>
> which is apparently set due to (above that, first non-cached):
>
> mkdir -p -- build-x86_64-unknown-linux-gnu/libiberty
> Configuring in build-x86_64-unknown-linux-gnu/libiberty
> configure: creating cache ../config.cache
> ...
> checking whether basename is declared... no
>
> Your commit introduces build-subdirectories for cross-builds.
> Build-subdirs share a config.cache (in build-host/config.cache), with
> subdirs in build-host being fixincludes, libcpp and libiberty. But,
> libiberty and fixincludes are configure-tested and compiled using gcc,
> while libcpp is compiled with g++, which causes a different set of
> declarations to be exposed, so the shared config.cache is invalid and
> its use is bogus. Not sure how this works for native builds.

The libcpp configure checks are actually run with gcc, which is bogus by
itself, but apparently working.

> I guess the C vs. C++ declaration etc. differences for libcpp are mostly
> hidden by using _GNU_SOURCE (through AC_USE_SYSTEM_EXTENSIONS), and I'm
> a bit surprised that's not used for libiberty and fixincludes. Still, a
> red herring. Aligning those options *may* cause the build to succeed,
> but I think that'd be too much of sweeping the issue under the carpet.
> It seems more correct to just disable the config.cache sharing between
> the differently-configured build-subdirectories, as is already done for
> host-libraries and target-libraries, even if that may slow down the
> builds. (Erroring out is infinitely slower. :)
>
> Still, I don't understand exactly how your patch introduces
> build-subdirectories where there were none before. Maybe that
> +all-gcc: maybe-all-build-libcpp
> was wrong and should be different?

No, we do need a build-libcpp to build gcc/build/genmatch. Not sure how
you got around without a build-libiberty as other gen* programs surely
require that.

> Anyway, with this, a cris-elf cross build passes the point of failure;
> compilers and libraries built, progressing into testing. Ok to commit?

Ok.

Thanks,
Richard.

> toplev:
>	* configure.ac (build_configargs): Don't share config.cache
>	between build subdirs.
>
> Index: configure.ac
> ===
> --- configure.ac	(revision 216610)
> +++ configure.ac	(working copy)
> @@ -2922,8 +2922,10 @@ AC_ARG_VAR([target_configargs],
>  # For the build-side libraries, we just need to pretend we're native,
>  # and not use the same cache file.  Multilibs are neither needed nor
> -# desired.
> -build_configargs=$build_configargs --cache-file=../config.cache ${baseargs}
> +#
Re: [PATCH] Fix genmatch linking
On Fri, 24 Oct 2014, Rainer Orth wrote:
> Richard Biener rguent...@suse.de writes:
>> On Thu, 23 Oct 2014, Richard Biener wrote:
>>> Final try for today.
>>
>> And this may work as well and is slightly simpler.
>> [patch quoted below]
>>
>> Can you test it please?
>
> Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete
> just fine.

Great. Installed as follows.

Richard.

2014-10-24  Richard Biener  rguent...@suse.de

	* Makefile.in (BUILD_CPPLIB): When in stage2+ use the host
	library and make sure to pull in the required libintl
	and libiconv dependencies.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 216590)
+++ gcc/Makefile.in	(working copy)
@@ -981,7 +981,15 @@ else
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a
 endif
+# For stage1 and when cross-compiling use the build libcpp which is
+# built with NLS disabled.  For stage2+ use the host library and
+# its dependencies.
+ifeq ($(build_objdir),$(build_libobjdir))
 BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a
+else
+BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV)
+build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP)
+endif

 # Dependencies on the intl and portability libraries.
 LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
Re: [Patch ARM-AArch64/testsuite v3 00/21] Neon intrinsics executable tests
On 21 October 2014 14:02, Christophe Lyon christophe.l...@linaro.org wrote: This patch series is an updated version of the series I sent here: https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html I addressed comments from Marcus and Richard, and decided to skip support for half-precision variants for the time being. I'll post dedicated patches later. Compared to v2: - the directory containing the new tests is named gcc.target/aarch64/adv-simd instead of gcc.target/aarch64/neon-intrinsics. - the driver is named adv-simd.exp instead of neon-intrinsics.exp - the driver is guarded against the new test parallelization framework - the README file uses 'Advanced SIMD (Neon)' instead of 'Neon' Thank you Christophe. Please commit all 21 patches in the series. /Marcus
[AARCH64, Question] Does AARCH64 GCC support long calls?
Hi, I find that the -mlong-calls option is not there for AARCH64. So can this port generate long calls? Are there any plans for this option? I would like to have a go at it if it's missing :-) Thanks.
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On 10/23/2014 03:10 PM, Andrey Ryabinin wrote: On 10/23/2014 02:38 PM, Jakub Jelinek wrote: On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote: Actually this is a historical artifact. If inlining proves to be significantly faster, they may want to switch. Ok. So, at that point you can include your ugly hacks in __asan_load* logic in the kernel, the difference between __asan_load4 and __asan_load4_noabort will be just that the latter will always return, while the former will not if some error has been reported. All the __asan_load* and __asan_store* entrypoints, regardless of -f{,no-}sanitize-recover=kernel-address are by definition not noreturn, they in the common case (if the code is not buggy) return. Perhaps we should just keep __asan_load* as is and leave the decision whether to abort or continue for the runtime? This would make semantics of -fsanitize-recover cumbersome though (because it wouldn't work if user selects outline instrumentation). Well, the don't ever report anything while some per-CPU flag is set thing can be considered as part of the is this memory access ok test, it is pretending everything is accessible. But, otherwise, if it is supposed to be developer's decision at compile time, __asan_load*_noabort should better always continue, even if it reported issues, and __asan_load* should better not return after reporting errors. True, but why we need new functions for that. __asan_load could also abort or not depending on what user/developer wants. Why we have to rebuild the entire kernel if someone wants to switch from abort to noabort? I'm not against __asan_load_noabort, I'm just saying that this is no point to have separate __asan_load/__asan_load_noabort functions in kernel. I'd still suggest to emit __asan_load_noabort so that we match userspace (where __asan_load strictly matches __asan_report in terminating the program). 
Behavior of __asan_load_noabort can further be restricted by user via various environment settings (kernel parameters, /proc, etc.). @Dmitry: what's your opinion on this? -Y
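The thread above is about outline instrumentation, where the compiler emits a call such as __asan_load4 before each 4-byte access, and about whether the abort/noabort decision belongs in the compiler (two entry points) or in the runtime (one entry point plus a flag). A minimal C sketch of the contract being debated — these are hypothetical stand-ins, not the real kernel or libasan entry points:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical models of the runtime entry points discussed above.
   The real functions check shadow memory; here we only model the
   abort vs. noabort contract: both report a bad access, but the
   noabort flavor always returns so execution continues, while the
   plain flavor terminates after reporting. */

static int reports;  /* number of errors reported so far */

static void report_error (const char *fn, const void *addr)
{
  reports++;
  fprintf (stderr, "%s: bad access at %p\n", fn, addr);
}

/* Recovering variant: report and return. */
void asan_load4_noabort_model (const void *addr, int poisoned)
{
  if (poisoned)
    report_error ("__asan_load4_noabort", addr);
}

/* Aborting variant: does not return once an error is reported. */
void asan_load4_model (const void *addr, int poisoned)
{
  if (poisoned)
    {
      report_error ("__asan_load4", addr);
      exit (1);
    }
}
```

Ryabinin's point is that with a single entry point the runtime could consult a flag instead, avoiding a kernel rebuild to switch modes; the cost is that -fsanitize-recover semantics then depend on runtime state.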
Re: [PATCH PR63173] [AARCH64, NEON] Improve vld[234](q?)_dup intrinsics
On 24 October 2014 03:21, Yangfei (Felix) felix.y...@huawei.com wrote: Thanks for the comments. I updated the patch with the intrinsic moved to its place. Attached please find the new version of the patch. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 216558) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,39 @@ +2014-10-23 Felix Yang felix.y...@huawei.com + Jiji Jiang jiangj...@huawei.com Double space before . Otherwise OK. Thanks /Marcus
Re: [PATCH] c++ify sreal
On Fri, Oct 24, 2014 at 8:28 AM, tsaund...@mozilla.com wrote: From: Trevor Saunders tsaund...@mozilla.com Hi, do $subject, and cleanup for always 64 bit hwi. bootstrapped + regtested x86_64-unknown-linux-gnu, ok? Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please? Thanks, Richard. Trev gcc/ChangeLog: 2014-10-24 Trevor Saunders tsaund...@mozilla.com * ipa-inline.c (edge_badness): Adjust. (inline_small_functions): Likewise. * predict.c (propagate_freq): Likewise. (estimate_bb_frequencies): Likewise. * sreal.c (sreal::dump): Rename from dump_sreal. (debug): Adjust. (copy): Remove function. (sreal::shift_right): Rename from sreal_sift_right. (sreal::normalize): Rename from normalize. (sreal_init): Remove function. (sreal::to_int): Rename from sreal_to_int. (sreal_compare): Remove function. (sreal::operator+): Rename from sreal_add. (sreal::operator-): Rename from sreal_sub. (sreal::operator*): Rename from sreal_mul. (sreal::operator/): Rename from sreal_div. * sreal.h (class sreal): Adjust. (inline sreal operator+=): New operator. (inline sreal operator-=): Likewise. (inline sreal operator/=): Likewise. (inline sreal operator*=): Likewise. (inline bool operator!=): Likewise. (inline bool operator): Likewise. (inline bool operator=): Likewise. (inline bool operator=): Likewise. --- gcc/ipa-inline.c | 25 ++- gcc/predict.c| 82 -- gcc/sreal.c | 479 +++ gcc/sreal.h | 97 --- 4 files changed, 213 insertions(+), 470 deletions(-) diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c index e79a4dd..cca1fb3 100644 --- a/gcc/ipa-inline.c +++ b/gcc/ipa-inline.c @@ -939,29 +939,28 @@ edge_badness (struct cgraph_edge *edge, bool dump) else if (max_count) { - sreal tmp, relbenefit_real, growth_real; int relbenefit = relative_time_benefit (callee_info, edge, edge_time); /* Capping edge-count to max_count. edge-count can be larger than max_count if an inline adds new edges which increase max_count after max_count is computed. 
*/ gcov_type edge_count = edge->count > max_count ? max_count : edge->count; - sreal_init (&relbenefit_real, relbenefit, 0); - sreal_init (&growth_real, growth, 0); + sreal relbenefit_real (relbenefit, 0); + sreal growth_real (growth, 0); /* relative_edge_count. */ - sreal_init (&tmp, edge_count, 0); - sreal_div (&tmp, &tmp, &max_count_real); + sreal tmp (edge_count, 0); + tmp /= max_count_real; /* relative_time_benefit. */ - sreal_mul (&tmp, &tmp, &relbenefit_real); - sreal_div (&tmp, &tmp, &max_relbenefit_real); + tmp *= relbenefit_real; + tmp /= max_relbenefit_real; /* growth_f_caller. */ - sreal_mul (&tmp, &tmp, &half_int_min_real); - sreal_div (&tmp, &tmp, &growth_real); + tmp *= half_int_min_real; + tmp /= growth_real; - badness = -1 * sreal_to_int (&tmp); + badness = -1 * tmp.to_int (); if (dump) { @@ -1604,9 +1603,9 @@ inline_small_functions (void) if (max_count < edge->count) max_count = edge->count; } - sreal_init (&max_count_real, max_count, 0); - sreal_init (&max_relbenefit_real, RELATIVE_TIME_BENEFIT_RANGE, 0); - sreal_init (&half_int_min_real, INT_MAX / 2, 0); + max_count_real = sreal (max_count, 0); + max_relbenefit_real = sreal (RELATIVE_TIME_BENEFIT_RANGE, 0); + half_int_min_real = sreal (INT_MAX / 2, 0); ipa_free_postorder_info (); initialize_growth_caches (); diff --git a/gcc/predict.c b/gcc/predict.c index 5f5d4a5..10675c3 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -2571,15 +2571,13 @@ propagate_freq (basic_block head, bitmap tovisit) bb->count = bb->frequency = 0; } - memcpy (&BLOCK_INFO (head)->frequency, &real_one, sizeof (real_one)); + BLOCK_INFO (head)->frequency = real_one; last = head; for (bb = head; bb; bb = nextbb) { edge_iterator ei; - sreal cyclic_probability, frequency; - - memcpy (&cyclic_probability, &real_zero, sizeof (real_zero)); - memcpy (&frequency, &real_zero, sizeof (real_zero)); + sreal cyclic_probability = real_zero; + sreal frequency = real_zero; nextbb = BLOCK_INFO (bb)->next; BLOCK_INFO (bb)->next = NULL; @@ -2596,42 +2594,34 @@ propagate_freq
(basic_block head, bitmap tovisit) FOR_EACH_EDGE (e, ei, bb->preds) if (EDGE_INFO (e)->back_edge) { - sreal_add (&cyclic_probability, &cyclic_probability, - &EDGE_INFO (e)->back_edge_prob); + cyclic_probability +=
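The refactoring quoted above is mechanical: the C-style entry points (sreal_add, sreal_mul, sreal_div taking pointer arguments) become member operators, so call sites read "tmp /= max_count_real" instead of "sreal_div (&tmp, &tmp, &max_count_real)". A toy C++ sketch of the same shape — the significand/exponent representation here is illustrative only, not GCC's actual sreal:

```cpp
#include <cstdint>

/* Toy value class mirroring the sreal C++ification: free functions
   with out-parameters become member operators, and sreal_init becomes
   a constructor. */
class toy_real
{
public:
  toy_real (int64_t sig = 0, int exp = 0) : m_sig (sig), m_exp (exp) {}

  toy_real operator* (const toy_real &other) const
  {
    return toy_real (m_sig * other.m_sig, m_exp + other.m_exp);
  }
  toy_real operator/ (const toy_real &other) const
  {
    return toy_real (m_sig / other.m_sig, m_exp - other.m_exp);
  }
  toy_real &operator*= (const toy_real &other) { return *this = *this * other; }
  toy_real &operator/= (const toy_real &other) { return *this = *this / other; }

  /* Counterpart of sreal_to_int becoming sreal::to_int.  */
  int64_t to_int () const
  {
    int64_t v = m_sig;
    for (int e = m_exp; e > 0; e--) v <<= 1;   /* apply positive exponent */
    for (int e = m_exp; e < 0; e++) v >>= 1;   /* apply negative exponent */
    return v;
  }

private:
  int64_t m_sig;  /* significand */
  int m_exp;      /* binary exponent */
};
```

The compound operators (+=, -=, *=, /=) implemented in terms of the binary ones are what allow the terse call sites in edge_badness and propagate_freq above.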
[committed] MAINTAINERS: add myself to write-after-approval list.
2014-10-24 Daniel Hellstrom dan...@gaisler.com * MAINTAINERS (write-after-approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 216624) +++ MAINTAINERS (working copy) @@ -408,6 +408,7 @@ Pat Haugen pthau...@us.ibm.com Mark Heffernan meh...@google.com George Helffrich geo...@gcc.gnu.org +Daniel Hellstrom dan...@gaisler.com Fergus Henderson f...@cs.mu.oz.au Stuart Henderson shend...@gcc.gnu.org Matthew Hiller hil...@redhat.com
Re: [libgomp, libiberty, libobjc] Fix gnu11 fallout on Solaris 10+
Richard Henderson r...@redhat.com writes: On 10/22/2014 04:43 AM, Rainer Orth wrote: The gnu11 patch broke Solaris 10 and 11 bootstrap: sys/feature_test.h has /* * It is invalid to compile an XPG3, XPG4, XPG4v2, or XPG5 application * using c99. The same is true for POSIX.1-1990, POSIX.2-1992, POSIX.1b, * and POSIX.1c applications. Likewise, it is invalid to compile an XPG6 * or a POSIX.1-2001 application with anything other than a c99 or later * compiler. Therefore, we force an error in both cases. */ #if defined(_STDC_C99) && (defined(__XOPEN_OR_POSIX) && !defined(_XPG6)) #error Compiler or options invalid for pre-UNIX 03 X/Open applications \ and pre-2001 POSIX applications #elif !defined(_STDC_C99) \ && (defined(__XOPEN_OR_POSIX) && defined(_XPG6)) #error Compiler or options invalid; UNIX 03 and POSIX.1-2001 applications \ require the use of c99 #endif so the headers now error out for any definition of (say) _XOPEN_SOURCE=600. Hum. Does it hurt us to simply update to 600 everywhere? If we can, I'd prefer that as a solution. But if that causes more problems than it solves, I'm ok with this as a solution. This certainly depends on what the oldest systems we still support are. E.g. in Solaris 8 (no longer supported on mainline, just serving as illustration) sys/feature_test.h we have for XPG5/UNIX 98: #if (_XOPEN_SOURCE - 0 == 500) #define _XPG5 Changing _XOPEN_SOURCE to 600 loses this completely. Same on Solaris 9 (equally no longer supported), and at least some pre-XPG6 systems, I fear. I cannot tell if we still support any of those, though. This might not even be an issue for the cases at hand: e.g. even Solaris 8 pthread.h defines PTHREAD_MUTEX_RECURSIVE (the reason to define _XOPEN_SOURCE in libgomp/config/posix/lock.c) unconditionally. We might as well try and watch out for breakage, given that we are still in stage1. I would like the comments updated to mention the reason for XPG6; just saying that Solaris requires it for C99 and later seems sufficient.
Sure, will do once we've decided which route to follow. And even with the _XOPEN_SOURCE business out of the way, there's still the question what to do about _POSIX_SOURCE in libiberty/sigsetmask.c. Given that it was introduced almost 20 years ago, it may well be unnecessary for the systems we still care about. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][match-and-simplify] Fix failed init of force_no_side_effects
Committed. Richard. 2014-10-24 Richard Biener rguent...@suse.de * genmatch.c (capture_info::capture_info): Initialize force_no_side_effects. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 216594) +++ gcc/genmatch.c (working copy) @@ -1902,6 +1902,7 @@ capture_info::capture_info (simplify *s) return; } + force_no_side_effects = 0; info.safe_grow_cleared (s->capture_max + 1); e = as_a <expr *> (s->match); for (unsigned i = 0; i < e->ops.length (); ++i)
[PATCH, alpha]: Fix c11 fallout in alpha testsuite
2014-10-24 Uros Bizjak ubiz...@gmail.com * gcc.target/alpha/980217-1.c (main): Fix implicit int. * gcc.target/alpha/pr19518.c (_mm_setzero_si64): New function. Tested on alphaev68-linux-gnu and committed to mainline SVN. Uros. Index: gcc.target/alpha/980217-1.c === --- gcc.target/alpha/980217-1.c (revision 216611) +++ gcc.target/alpha/980217-1.c (working copy) @@ -26,7 +26,7 @@ return ~(t 31) (1 - ((ix 0x8000) 30)); } -main () +int main () { float x = 1.0 / 0.0; int i = isinff (x); Index: gcc.target/alpha/pr19518.c === --- gcc.target/alpha/pr19518.c (revision 216611) +++ gcc.target/alpha/pr19518.c (working copy) @@ -9,6 +9,10 @@ typedef int FbStride; typedef unsigned long __m64; extern __m64 load (__m64); +static __inline __m64 _mm_setzero_si64(void) +{ + return (__m64)0L; +} static __inline __m64 _mm_adds_pu8(__m64 __m1, __m64 __m2) { return __m1 + __builtin_alpha_minsb8(__m2, ~__m1);
Re: avoid alignment of static variables affecting stack's
On 23.10.14 at 20:13, l...@redhat.com wrote: On 10/23/14 01:09, Jan Beulich wrote: On 23.10.14 at 08:50, ja...@redhat.com wrote: On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote: Function (or more narrow) scope static variables (as well as others not placed on the stack) should also not have any effect on the stack alignment. I noticed the issue first with Linux'es dynamic_pr_debug() construct using an 8-byte aligned sub-file-scope local variable. According to my checking bad behavior started with 4.6.x (4.5.3 was still okay), but generated code got quite a bit worse as of 4.9.0. If the static/external var has BLKmode, then perhaps it is safe, but I wonder about other vars, say vectors etc. Such vars are most likely loaded from their memory location, and if for some reason that needs to be spilled again, stack realignment would not be able to do that. Or do we inspect the IL and for any pseudos with modes needing larger alignment we adjust the dynamic stack realignment fields? I don't know, but it would seem to me that this ought to happen anyway: If the pseudo holds the result of some computation other than a simple load from memory and needs spilling, the same would apply afaict. For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Jan
Re: Move loop peeling from RTL to gimple
Hello! Hi, this is update of my 2013 update to 2012 patch to move rtl loop peeling to tree level. This is to expose optimization oppurtunities earlier. Incrementally I think I can also improve profiling to provide a histogram on loop iterations and get more sensible peeling decisions. profiled-bootstrapped/regtested x86_64-linux, OK? Ok. Thanks, Richard. Honza * loop-unroll.c: (decide_unrolling_and_peeling): Rename to (decide_unrolling): ... this one. (peel_loops_completely): Remove. (decide_peel_simple): Remove. (decide_peel_once_rolling): Remove. (decide_peel_completely): Remove. (peel_loop_simple): Remove. (peel_loop_completely): Remove. (unroll_and_peel_loops): Rename to ... (unroll_loops): ... this one; handle only unrolling. * cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and LPT_PEEL_SIMPLE. (UAP_PEEL): Remove. (unroll_and_peel_loops): Remove. (unroll_loops): New. * passes.def: Replace pass_rtl_unroll_and_peel_loops by pass_rtl_unroll_loops. * loop-init.c (gate_rtl_unroll_and_peel_loops, rtl_unroll_and_peel_loops): Rename to ... (gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update. (pass_rtl_unroll_and_peel_loops): Rename to ... (pass_rtl_unroll_loops): ... this one. * tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove. (make_pass_rtl_unroll_loops): New. * tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size, try_peel_loop): New. (canonicalize_loop_induction_variables): Update. * gcc.dg/tree-prof/peel-1.c: Update. * gcc.dg/tree-prof/unroll-1.c: Update. * gcc.dg/gcc.dg/unroll_1.c: Update. * gcc.dg/gcc.dg/unroll_2.c: Update. * gcc.dg/gcc.dg/unroll_3.c: Update. * gcc.dg/gcc.dg/unroll_4.c: Update. This patch caused: UNRESOLVED: gcc.dg/tree-prof/peel-1.c scan-rtl-dump loop2_unroll Considering simply peeling loop The gcc.dg/tree-prof/peel-1.c testcase needs further updates in dump scan and cleanup dg-directives. Uros.
Re: avoid alignment of static variables affecting stack's
On Fri, Oct 24, 2014 at 11:01 AM, Jan Beulich jbeul...@suse.com wrote: On 23.10.14 at 20:13, l...@redhat.com wrote: On 10/23/14 01:09, Jan Beulich wrote: On 23.10.14 at 08:50, ja...@redhat.com wrote: On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote: Function (or more narrow) scope static variables (as well as others not placed on the stack) should also not have any effect on the stack alignment. I noticed the issue first with Linux'es dynamic_pr_debug() construct using an 8-byte aligned sub-file-scope local variable. According to my checking bad behavior started with 4.6.x (4.5.3 was still okay), but generated code got quite a bit worse as of 4.9.0. If the static/external var has BLKmode, then perhaps it is safe, but I wonder about other vars, say vectors etc. Such vars are most likely loaded from their memory location, and if for some reason that needs to be spilled again, stack realignment would not be able to do that. Or do we inspect the IL and for any pseudos with modes needing larger alignment we adjust the dynamic stack realignment fields? I don't know, but it would seem to me that this ought to happen anyway: If the pseudo holds the result of some computation other than a simple load from memory and needs spilling, the same would apply afaict. For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. Richard. Jan
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, In my initial design I did such splitting but before start real if-conversion but I decided to not perform it since code size for if-converted loop is growing (number of phi nodes is increased). It is worth noting also that for phi with #nodes 2 we need to get all predicates (except one) to do phi-predication and it means that block containing such phi can have only 1 critical edge. Can you point me to the patch with the special insertion code then? I definitely want to avoid the mess we ran into with the reassoc code clever insertion code. Richard. Thanks. Yuri. 2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I should re-design my extensions? Well, we have infrastructure for inserting code on edges and you've made critical edges predicated correctly. So why re-invent the wheel? I realize this is very similar to my initial suggestion to simply split critical edges in loops you want to if-convert but delays splitting until it turns out to be necessary (which might be good for the !force_vect case). 
For edge predicates you simply can emit their computation on the edge, no? Btw, I very originally suggested to rework if-conversion to only record edge predicates - having both block and edge predicates somewhat complicates the code and makes it harder to maintain (thus also the suggestion to simply split critical edges if necessary to make BB predicates work always). Your patches add a lot of code and to me it seems we can avoid doing so much special casing. For example attacking the critical edge issue by a simple Index: tree-if-conv.c === --- tree-if-conv.c (revision 216508) +++ tree-if-conv.c (working copy) @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop, if (EDGE_COUNT (e->src->succs) == 1) found = true; if (!found) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "only critical predecessors\n"); - return false; - } + split_edge (EDGE_PRED (bb, 0)); } return true; it changes the number of blocks in the loop, so get_loop_body_in_if_conv_order should probably be re-done with the above eventually signalling that it created a new block. Or the above should populate a vector of edges to split and do that after the loop calling if_convertible_bb_p. Richard. Richard. Thanks. Yuri. BTW Jeff did initial review of my changes related to predicate computation for join blocks. I presented him an updated patch with test-case and some minor changes. But I still did not get any feedback on it. Could you please also take a look at it?
Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see gsi_commit_edge_inserts implementation). Richard. Thanks ahead. 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4). But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply
Re: [PATCH] Simple improvement for predicate computation in if-convert phase.
On Fri, Oct 17, 2014 at 3:08 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Jeff, I prepared another patch that includes a test-case as you requested. Below are answers to your questions. First, for the benefit of anyone trying to understand what you're doing, defining what cd equivalent means would be helpful. I added the following comment to the function: "we call basic blocks bb1 and bb2 cd-equivalent if they are executed under the same condition." Is it sufficient? So, do you have a case where the dominated_by_p test above is true and is_predicated(bb) returns true as well? I think this part of the change is largely responsible for the hack you're doing with having the function scoped static variable join_bb. I don't have such a test-case and I assume that if bb is always executed, it is not predicated. I also deleted join_bb in my changes. Is it OK for trunk now? Ok. Thanks, Richard. Thanks. Yuri. 2014-10-17 Yuri Rumyantsev ysrum...@gmail.com gcc/ChangeLog * tree-if-conv.c (add_to_predicate_list): Check unconditionally that bb is always executed to early exit. Use predicate of cd-equivalent block for join blocks if it exists. (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree. (tree_if_conversion): Free post-dominance information. gcc/testsuite/ChangeLog * gcc/dg/tree-ssa/ifc-cd.c: New test. 2014-10-17 1:16 GMT+04:00 Jeff Law l...@redhat.com: On 10/16/14 05:52, Yuri Rumyantsev wrote: Hi All, Here is a simple enhancement for predicate computation in the if-convert phase: We use the notion of cd-equivalence to get a simpler predicate for a join block, e.g. if a join block has 2 predecessors with predicates p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of p1 & p2 | p1 & !p2. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-10-16 Yuri Rumyantsev ysrum...@gmail.com * tree-if-conv.c (add_to_predicate_list): Check unconditionally that bb is always executed to early exit.
Use predicate of cd-equivalent block for join blocks if it exists. (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree. (tree_if_conversion): Free post-dominance information. First, for the benefit of anyone trying to understand what you're doing, defining what cd equivalent means would be helpful. if-conv.patch Index: tree-if-conv.c === --- tree-if-conv.c (revision 216217) +++ tree-if-conv.c (working copy) @@ -396,25 +396,51 @@ } /* Add condition NC to the predicate list of basic block BB. LOOP is - the loop to be if-converted. */ + the loop to be if-converted. Use predicate of cd-equivalent block + for join bb if it exists. */ static inline void add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) { tree bc, *tp; + basic_block dom_bb; + static basic_block join_bb = NULL; if (is_true_predicate (nc)) return; - if (!is_predicated (bb)) + /* If dominance tells us this basic block is always executed, + don't record any predicates for it. */ + if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + return; So, do you have a case where the dominated_by_p test above is true and is_predicated(bb) returns true as well? I think this part of the change is largely responsible for the hack you're doing with having the function scoped static variable join_bb. + + /* If predicate has been already set up for given bb using cd-equivalent + block predicate, simply escape. */ + if (join_bb == bb) + return; I *really* dislike the state you're carrying around via join_bb. ISTM that if you compute that there's an equivalence, then you just set the predicate for the equivalent block and the right things would have happened if you had not changed the test above. You also need a testcase. It doesn't have to be extensive, but at least some basic smoke test to verify basic operation of this code. It's perfectly fine to scan the debugging dumps for debug output. jeff
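The simplification this patch implements rests on a plain boolean identity: if a join block's two predecessors are guarded by p1 && p2 and p1 && !p2, the join executes exactly when p1 holds, so (p1 && p2) || (p1 && !p2) collapses to p1. A small C sketch of the identity (illustrative functions, not GCC code):

```c
/* Predicate the if-converter would build for the join block by
   disjoining its incoming edge predicates. */
static int join_predicate_full (int p1, int p2)
{
  return (p1 && p2) || (p1 && !p2);
}

/* Predicate obtained via the cd-equivalence shortcut: the join block
   is control-dependence-equivalent to the block guarded by p1 alone. */
static int join_predicate_simplified (int p1, int p2)
{
  (void) p2;  /* p2 cancels out of the disjunction */
  return p1 != 0;
}
```

Exhaustively checking both truth-table inputs confirms the two agree, which is why reusing the cd-equivalent block's predicate is safe.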
Re: avoid alignment of static variables affecting stack's
On 24.10.14 at 11:10, richard.guent...@gmail.com wrote: On Fri, Oct 24, 2014 at 11:01 AM, Jan Beulich jbeul...@suse.com wrote: On 23.10.14 at 20:13, l...@redhat.com wrote: On 10/23/14 01:09, Jan Beulich wrote: On 23.10.14 at 08:50, ja...@redhat.com wrote: On Thu, Oct 23, 2014 at 07:30:27AM +0100, Jan Beulich wrote: Function (or more narrow) scope static variables (as well as others not placed on the stack) should also not have any effect on the stack alignment. I noticed the issue first with Linux'es dynamic_pr_debug() construct using an 8-byte aligned sub-file-scope local variable. According to my checking bad behavior started with 4.6.x (4.5.3 was still okay), but generated code got quite a bit worse as of 4.9.0. If the static/external var has BLKmode, then perhaps it is safe, but I wonder about other vars, say vectors etc. Such vars are most likely loaded from their memory location, and if for some reason that needs to be spilled again, stack realignment would not be able to do that. Or do we inspect the IL and for any pseudos with modes needing larger alignment we adjust the dynamic stack realignment fields? I don't know, but it would seem to me that this ought to happen anyway: If the pseudo holds the result of some computation other than a simple load from memory and needs spilling, the same would apply afaict. For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. 
Thanks for confirming. So is the patch then okay to commit as is? Jan
Re: avoid alignment of static variables affecting stack's
On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote: For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. From what I remember, at RA time you already have to know conservatively that you'll want to do dynamic stack realignment and what the highest needed alignment will be, so various parts of expansion etc. conservatively compute what will be needed. I think that is because you e.g. need to reserve some registers (vDRAP, etc.) if doing dynamic realignment. If you conservatively assume you'll need dynamic stack realignment and after RA you find you really don't need it, there are some optimizations in prologue threading where it attempts to at least decrease amount of unnecessary code, but the harm has already been done. Might be that with LRA perhaps this could be changed and not conservatively assume more alignment than proven to be needed, but such code isn't there I think. Jakub
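The case this thread is about can be sketched in a few lines of C. An over-aligned function-scope static (like the descriptor behind the kernel's dynamic_pr_debug()) lives in static storage, where the assembler and linker provide its alignment, so its presence alone should not force the containing function to dynamically realign its stack frame. Illustrative names, not the kernel's actual code:

```c
#include <stdint.h>

/* Function-scope static with 8-byte alignment.  Its address and
   alignment come from the object file's data section; no stack slot
   is involved, which is why Jan's patch excludes such variables from
   the stack-alignment computation. */
static int aligned_static_ok (void)
{
  static long long desc __attribute__ ((aligned (8)));
  desc = 0;
  return ((uintptr_t) &desc % 8) == 0;
}
```

The subtlety Jakub raises is separate: the *stack* alignment estimate is computed conservatively before register allocation (to know whether to reserve vDRAP etc.), so if a value loaded from such a static ever needs an aligned spill slot, the spilling code must account for that itself.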
Re: [testsuite] Fix gcc.dg/ipa/ipa-icf-2[18].c on Solaris
Jan Hubicka hubi...@ucw.cz writes: I had a terrible time figuring out what march value i686-unknown-linux-gnu uses; only in gcc -v it's possible to see that it uses pentiumpro, unlike pentium4 on Solaris/x86, which explains the difference. What is the reason to use pentium4 for Solaris/x86? Is it -march setting or -mtune? For -mtune it is a terrible choice This was changed in http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01483.html http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00254.html It's with_arch_32=pentium4 and with_tune_32=generic now. The latest Oracle Studio compilers default to -xarch=sse2, which is sse2 May use 386, MMX, Pentium_pro, SSE and SSE2 instructions. The pentium4 over pentiumpro choice may well be the result of a misunderstanding: as I said, the code in config.gcc setting with_arch* is almost impossible to follow. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [match-and-simplify] add new lower function
On Thu, Oct 23, 2014 at 11:22 AM, Prathamesh Kulkarni bilbotheelffri...@gmail.com wrote: Instead of iterating separately for each lowering function (lower_opt_convert, etc.), add a new lower function and pass lower_opt_convert, lower_for, etc. to it as callbacks. I don't think this is more readable. Thanks, Richard. * genmatch.c (lower): New overloaded function. (lower): Adjust to call overloaded lower. Thanks, Prathamesh
Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Jakub Jelinek ja...@redhat.com writes: On Thu, Oct 23, 2014 at 02:58:06PM +0200, Rainer Orth wrote: Unfortunately, I see some problems with those tests on Solaris: * On Solaris/x86, I get FAIL: gcc.dg/pr63594-2.c execution test for 32-bit. Any particular reason to restrict -mno-mmx to Linux/x86? Manually building the testcase with -mno-mmx on Solaris/x86 seems to cure the failure. No reason, probably finger memory without lots of thinking. The reason for -mno-mmx is that the functions use floating point vectors and scalar floating point arithmetics in the same function. Feel free to change both pr63594-{1,2}.c with s/linux//g . Ok, will do and commit after Linux and Solaris testing. Here's what I've checked in after i686-unknown-linux-gnu, x86_64-unknown-linux-gnu, and i386-pc-solaris2.11 testing: 2014-10-24 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.dg/pr63594-1.c: Apply -mno-mmx to all i?86-*-* and x86_64-*-* targets. * gcc.dg/pr63594-2.c: Likewise. 
diff --git a/gcc/testsuite/gcc.dg/pr63594-1.c b/gcc/testsuite/gcc.dg/pr63594-1.c
--- a/gcc/testsuite/gcc.dg/pr63594-1.c
+++ b/gcc/testsuite/gcc.dg/pr63594-1.c
@@ -1,7 +1,7 @@
 /* PR target/63594 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
 #define C1 c
 #define C2 C1, C1
diff --git a/gcc/testsuite/gcc.dg/pr63594-2.c b/gcc/testsuite/gcc.dg/pr63594-2.c
--- a/gcc/testsuite/gcc.dg/pr63594-2.c
+++ b/gcc/testsuite/gcc.dg/pr63594-2.c
@@ -1,7 +1,7 @@
 /* PR target/63594 */
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
 #define C1 c
 #define C2 C1, C1

* On 64-bit Solaris/SPARC, I get

FAIL: gcc.dg/pr63594-1.c (internal compiler error)
FAIL: gcc.dg/pr63594-1.c (test for excess errors)
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function 'test1float1':
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1: internal compiler error: Bus Error
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note: in expansion of macro 'T'
0x751c03 crash_signal
	/vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
0x44ffb4 gen_group_rtx(rtx_def*)
	/vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
0x4f8167 expand_function_start(tree_node*)
	/vol/gcc/src/hg/trunk/local/gcc/function.c:4803
0x36278f execute
	/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709

Works fine on x86_64, and doesn't seem to be related to the fix in any way, it seems the ICE is related to returning or passing the vectors, so supposedly some latent Solaris/SPARC issue? Ok, I'll file a PR and Cc Eric. This seems to be the same issue as PR target/61535. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Fix for PR63595
Hello. The following patch adds PHI result comparison to IPA ICF. Bootstrap works on x86_64-linux-pc, no regression observed. Ready for trunk? Thanks, Martin

gcc/testsuite/ChangeLog:

2014-10-24 Martin Liska mli...@suse.cz

	* gcc.dg/ipa/pr63595.c: New test.

gcc/ChangeLog:

2014-10-24 Martin Liska mli...@suse.cz

	* ipa-icf.c (sem_function::compare_phi_node): PHI result comparison added.

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index d1238a4..7456fec 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -869,6 +869,12 @@ sem_function::compare_phi_node (basic_block bb1, basic_block bb2)
       phi1 = gsi_stmt (si1);
       phi2 = gsi_stmt (si2);
 
+      tree phi_result1 = gimple_phi_result (phi1);
+      tree phi_result2 = gimple_phi_result (phi2);
+
+      if (!m_checker->compare_operand (phi_result1, phi_result2))
+	return return_false_with_msg ("PHI results are different");
+
       size1 = gimple_phi_num_args (phi1);
       size2 = gimple_phi_num_args (phi2);
diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c b/gcc/testsuite/gcc.dg/ipa/pr63595.c
new file mode 100644
index 000..52851fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf-details" } */
+
+typedef int size_t;
+
+typedef struct TypHeader {
+    unsigned long size;
+    struct TypHeader * * ptr;
+    char name[3];
+    unsigned char type;
+} * TypHandle;
+
+__attribute__((noinline))
+static TypHandle Error(const char *str, unsigned long l1, unsigned long l2)
+{
+  return 0;
+}
+
+extern TypHandle (* EvTab[81]) ( TypHandle hd );
+extern TypHandle (*TabProd[28][28]) ( TypHandle, TypHandle );
+
+__attribute__((noinline))
+TypHandle FunOnRight (TypHandle hdCall)
+{
+    TypHandle hdRes;
+    TypHandle hdPnt;
+    TypHandle hdElm;
+
+    if ( ((hdCall)->size) != 3*((size_t)sizeof(TypHandle)) )
+        return Error("",0L,0L);
+    hdPnt = ((long)(((TypHandle*)((hdCall)->ptr))[1])&1 ? (((TypHandle*)((hdCall)->ptr))[1]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[1]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[1])->type))])((((TypHandle*)((hdCall)->ptr))[1])));
+    hdElm = ((long)(((TypHandle*)((hdCall)->ptr))[2])&1 ? (((TypHandle*)((hdCall)->ptr))[2]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[2]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[2])->type))])((((TypHandle*)((hdCall)->ptr))[2])));
+
+    hdRes = ((*TabProd[(((long)(hdPnt) & 1) ? 1 : ((hdPnt)->type))][(((long)(hdElm) & 1) ? 1 : ((hdElm)->type))])((hdPnt),(hdElm)));
+    return hdRes;
+}
+
+__attribute__((noinline))
+TypHandle FunOnLeft (TypHandle hdCall)
+{
+    TypHandle hdRes;
+    TypHandle hdPnt;
+    TypHandle hdElm;
+
+    if ( ((hdCall)->size) != 3*((size_t)sizeof(TypHandle)) )
+        return Error("",0L,0L);
+    hdPnt = ((long)(((TypHandle*)((hdCall)->ptr))[1])&1 ? (((TypHandle*)((hdCall)->ptr))[1]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[1]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[1])->type))])((((TypHandle*)((hdCall)->ptr))[1])));
+    hdElm = ((long)(((TypHandle*)((hdCall)->ptr))[2])&1 ? (((TypHandle*)((hdCall)->ptr))[2]) : (* EvTab[(((long)(((TypHandle*)((hdCall)->ptr))[2]) & 1) ? 1 : ((((TypHandle*)((hdCall)->ptr))[2])->type))])((((TypHandle*)((hdCall)->ptr))[2])));
+
+    hdRes = ((*TabProd[(((long)(hdElm) & 1) ? 1 : ((hdElm)->type))][(((long)(hdPnt) & 1) ? 1 : ((hdPnt)->type))])((hdElm),(hdPnt)));
+    return hdRes;
+}
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf" } } */
+/* { dg-final { scan-ipa-dump "PHI results are different" "icf" } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */
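Stripped of the GAP-style macros, the hazard the patch guards against can be distilled into a short sketch (hypothetical code, not the PR's own testcase): two functions whose bodies are structurally identical except for the order of the operands flowing into the result, so an ICF pass that compared everything but the PHI operands could wrongly unify them.

```c
/* Hypothetical distillation of the PR63595 pattern: on_right and
   on_left have the same control flow and the same statement shapes,
   and differ only in the order of the operands that feed the returned
   value, much like TabProd(hdPnt, hdElm) vs. TabProd(hdElm, hdPnt) in
   the real testcase.  Merging them would be a miscompilation.  */

__attribute__((noinline))
int on_right (int pnt, int elm)
{
  return pnt - elm;             /* combine (pnt, elm) */
}

__attribute__((noinline))
int on_left (int pnt, int elm)
{
  return elm - pnt;             /* combine (elm, pnt) */
}
```

With the fix applied, the ICF dump for the real testcase reports the pair as distinct ("Equal symbols: 0") instead of merging it.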
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com wrote: On 10/23/2014 03:10 PM, Andrey Ryabinin wrote: On 10/23/2014 02:38 PM, Jakub Jelinek wrote: On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote: Actually this is a historical artifact. If inlining proves to be significantly faster, they may want to switch. Ok. So, at that point you can include your ugly hacks in __asan_load* logic in the kernel, the difference between __asan_load4 and __asan_load4_noabort will be just that the latter will always return, while the former will not if some error has been reported. All the __asan_load* and __asan_store* entrypoints, regardless of -f{,no-}sanitize-recover=kernel-address are by definition not noreturn, they in the common case (if the code is not buggy) return. Perhaps we should just keep __asan_load* as is and leave the decision whether to abort or continue for the runtime? This would make semantics of -fsanitize-recover cumbersome though (because it wouldn't work if user selects outline instrumentation). Well, the don't ever report anything while some per-CPU flag is set thing can be considered as part of the is this memory access ok test, it is pretending everything is accessible. But, otherwise, if it is supposed to be developer's decision at compile time, __asan_load*_noabort should better always continue, even if it reported issues, and __asan_load* should better not return after reporting errors. True, but why we need new functions for that. __asan_load could also abort or not depending on what user/developer wants. Why we have to rebuild the entire kernel if someone wants to switch from abort to noabort? I'm not against __asan_load_noabort, I'm just saying that this is no point to have separate __asan_load/__asan_load_noabort functions in kernel. I'd still suggest to emit __asan_load_noabort so that we match userspace (where __asan_load strictly matches __asan_report in terminating the program). 
Behavior of __asan_load_noabort can further be restricted by the user via various environment settings (kernel parameters, /proc, etc.). @Dmitry: what's your opinion on this? I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is not the default mode for both user-space asan and kasan (at least in the envisioned future). I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. Can we use just one version of __asan_load and let the runtime decide on abort?
Re: [PATCH] Fix genmatch linking
On Fri, 24 Oct 2014, Richard Biener wrote: On Fri, 24 Oct 2014, Rainer Orth wrote: Richard Biener rguent...@suse.de writes: On Thu, 23 Oct 2014, Richard Biener wrote: Final try for today. And this may work as well and is slightly simpler. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 216590) +++ gcc/Makefile.in (working copy) @@ -981,7 +981,15 @@ else LIBIBERTY = ../libiberty/libiberty.a BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a endif +# For stage1 and when cross-compiling use the build libcpp which is +# built with NLS disabled. For stage2+ use the host library and +# its dependencies. +ifeq ($(build_objdir),$(build_libobjdir)) BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a +else +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV) +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) +endif # Dependencies on the intl and portability libraries. LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \ Can you test it please? Sure: this version allowed an i386-pc-solaris2.10 bootstrap to complete just fine. Great. Installed as follows. Richard. 2014-10-24 Richard Biener rguent...@suse.de * Makefile.in (BUILD_CPPLIB): When in stage2+ use the host library and make sure to pull in the required libintl and libiconv dependencies. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 216590) +++ gcc/Makefile.in (working copy) @@ -981,7 +981,15 @@ else LIBIBERTY = ../libiberty/libiberty.a BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a endif +# For stage1 and when cross-compiling use the build libcpp which is +# built with NLS disabled. For stage2+ use the host library and +# its dependencies. 
+ifeq ($(build_objdir),$(build_libobjdir)) BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a +else +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV) +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) +endif # Dependencies on the intl and portability libraries. LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \ Dominique reported that this fails for system libiconv but built libintl. Which might be fixed by the following. Does that still work for you? Thanks, Richard. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 216626) +++ gcc/Makefile.in (working copy) @@ -981,15 +981,6 @@ else LIBIBERTY = ../libiberty/libiberty.a BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a endif -# For stage1 and when cross-compiling use the build libcpp which is -# built with NLS disabled. For stage2+ use the host library and -# its dependencies. -ifeq ($(build_objdir),$(build_libobjdir)) -BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a -else -BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV) -build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) -endif # Dependencies on the intl and portability libraries. LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \ @@ -2529,6 +2520,17 @@ genprog = $(genprogerr) check checksum c # These programs need libs over and above what they get from the above list. build/genautomata$(build_exeext) : BUILD_LIBS += -lm +# For stage1 and when cross-compiling use the build libcpp which is +# built with NLS disabled. For stage2+ use the host library and +# its dependencies. 
+ifeq ($(build_objdir),$(build_libobjdir)) +BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a +else +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) +build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV) +endif + build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \ $(BUILD_ERRORS) build/vec.o build/hash-table.o
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote: I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is non a default mode for both user-space asan and kasan (at least in the envisioned future). I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. Can we use just one version of __asan_load and let runtime decide on abort? __asan_load actually must never be noreturn, because in the common case where the load is valid it of course returns. Jakub
Re: avoid alignment of static variables affecting stack's
On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote: For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. From what I remember, at RA time you already have to know conservatively that you'll want to do dynamic stack realignment and what the highest needed alignment will be, so various parts of expansion etc. conservatively compute what will be needed. I think that is because you e.g. need to reserve some registers (vDRAP, etc.) if doing dynamic realignment. If you conservatively assume you'll need dynamic stack realignment and after RA you find you really don't need it, there are some optimizations in prologue threading where it attempts to at least decrease amount of unnecessary code, but the harm has already been done. Might be that with LRA perhaps this could be changed and not conservatively assume more alignment than proven to be needed, but such code isn't there I think. I stand corrected then. Richard. Jakub
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On Fri, Oct 24, 2014 at 1:50 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote: I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is non a default mode for both user-space asan and kasan (at least in the envisioned future). I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. Can we use just one version of __asan_load and let runtime decide on abort? __asan_load actually must never be noreturn, because in the common case where the load is valid it of course returns. Right! Then I am puzzled by that message by Yury: I'd still suggest to emit __asan_load_noabort so that we match userspace (where __asan_load strictly matches __asan_report in terminating the program) Why are we discussing __asan_load_noabort?
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On Fri, Oct 24, 2014 at 11:50:58AM +0200, Jakub Jelinek wrote: On Fri, Oct 24, 2014 at 01:44:27PM +0400, Dmitry Vyukov wrote: I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is non a default mode for both user-space asan and kasan (at least in the envisioned future). I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. Can we use just one version of __asan_load and let runtime decide on abort? __asan_load actually must never be noreturn, because in the common case where the load is valid it of course returns. The point of __asan_load*_noabort (vs. __asan_load*) and __asan_report*_noabort (vs. __asan_report*) is to allow the choice what is fatal and what is not fatal to be done at compile time, per compilation unit. For __asan_report* that is a must, as __asan_report* without noabort is noreturn, for __asan_load* which is not noreturn the implementation can of course choose not to make something fatal when it wishes. Without -fsanitize-recover={address,kernel-address} support, the choice could be done only per program or kernel globally, without a way for programmer to differentiate. Jakub
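The contract Jakub spells out can be modeled in a few lines of C. This is a toy sketch with made-up names (toy_asan_load, shadow_poisoned), not the real libasan or kasan code: both entry points return on a clean access; on a poisoned one, the plain flavor terminates after reporting while the _noabort flavor reports and continues.

```c
#include <stdlib.h>

/* Toy model of the outline-instrumentation contract discussed above.
   shadow_poisoned() stands in for the real shadow-memory check; no
   memory is ever dereferenced here.  */

static int reports;                     /* errors seen so far */

static int shadow_poisoned (const void *addr)
{
  /* Pretend odd addresses are poisoned, purely for the demo.  */
  return ((unsigned long) addr) & 1;
}

/* Recovering flavor: diagnose but always return.  */
void toy_asan_load_noabort (const void *addr)
{
  if (shadow_poisoned (addr))
    reports++;                          /* report and continue */
}

/* Fatal flavor: returns only when the access is valid, which is the
   common case; it is still not a noreturn function.  */
void toy_asan_load (const void *addr)
{
  if (shadow_poisoned (addr))
    {
      reports++;
      abort ();                         /* report and terminate */
    }
}

int toy_report_count (void) { return reports; }
```

The per-compilation-unit choice -fsanitize-recover enables then amounts to emitting calls to one flavor or the other at compile time.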
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On 10/24/2014 01:44 PM, Dmitry Vyukov wrote: On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com wrote: On 10/23/2014 03:10 PM, Andrey Ryabinin wrote: On 10/23/2014 02:38 PM, Jakub Jelinek wrote: On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote: Actually this is a historical artifact. If inlining proves to be significantly faster, they may want to switch. Ok. So, at that point you can include your ugly hacks in __asan_load* logic in the kernel, the difference between __asan_load4 and __asan_load4_noabort will be just that the latter will always return, while the former will not if some error has been reported. All the __asan_load* and __asan_store* entrypoints, regardless of -f{,no-}sanitize-recover=kernel-address are by definition not noreturn, they in the common case (if the code is not buggy) return. Perhaps we should just keep __asan_load* as is and leave the decision whether to abort or continue for the runtime? This would make semantics of -fsanitize-recover cumbersome though (because it wouldn't work if user selects outline instrumentation). Well, the don't ever report anything while some per-CPU flag is set thing can be considered as part of the is this memory access ok test, it is pretending everything is accessible. But, otherwise, if it is supposed to be developer's decision at compile time, __asan_load*_noabort should better always continue, even if it reported issues, and __asan_load* should better not return after reporting errors. True, but why we need new functions for that. __asan_load could also abort or not depending on what user/developer wants. Why we have to rebuild the entire kernel if someone wants to switch from abort to noabort? I'm not against __asan_load_noabort, I'm just saying that this is no point to have separate __asan_load/__asan_load_noabort functions in kernel. 
I'd still suggest to emit __asan_load_noabort so that we match userspace (where __asan_load strictly matches __asan_report in terminating the program). Behavior of __asan_load_noabort can further be restricted by user via various environment settings (kernel parameters, /proc, etc.). @Dmitry: what's your opinion on this? I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is non a default mode for both user-space asan and kasan (at least in the envisioned future). Well, it's still enabled automatically if number of memory accesses is getting large enough (I think it was 7000?) so it is default in a way. I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. As Jakub mentioned both __asan_load and __asan_load_noabort will _not_ be NORETURN. Two different versions are necessary so that compiler can insert outline instrumentation that matches the -fsanitize-recover setting. Can we use just one version of __asan_load and let runtime decide on abort? In this case semantics of inline and outline instrumentation will differ (the former depending on -fsanitize-recover compile-time setting and the latter depending on runtime options) which may be undesirable given that compiler may automatically choose to switch from inline to outline depending on function size. -Y
Re: [PATCH] Fix for PR63595
On Fri, Oct 24, 2014 at 11:35 AM, Martin Liška mli...@suse.cz wrote: Hello. Following patch contains addition of PHI result comparison in IPA ICF. Boostrap works on x86_64-linux-pc, no regression observed. Ready for trunk? Ok. Thanks, Richard. Thanks, Martin
Re: avoid alignment of static variables affecting stack's
On 24.10.14 at 11:52, richard.guent...@gmail.com wrote: On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote: For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. From what I remember, at RA time you already have to know conservatively that you'll want to do dynamic stack realignment and what the highest needed alignment will be, so various parts of expansion etc. conservatively compute what will be needed. I think that is because you e.g. need to reserve some registers (vDRAP, etc.) if doing dynamic realignment. If you conservatively assume you'll need dynamic stack realignment and after RA you find you really don't need it, there are some optimizations in prologue threading where it attempts to at least decrease amount of unnecessary code, but the harm has already been done. Might be that with LRA perhaps this could be changed and not conservatively assume more alignment than proven to be needed, but such code isn't there I think. I stand corrected then. So am I to conclude then that I need to take out the hard register check in order for this to be accepted? Jan
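The declaration shape under discussion can be illustrated with a minimal C example (a sketch, not the patch's own testcase): an over-aligned variable in static storage gets its alignment from the linker, so functions that merely refer to it should not need dynamic stack realignment on its account.

```c
#include <stdint.h>

/* Minimal illustration of the case discussed above: a variable in
   static storage with an alignment request larger than the stack's
   natural alignment.  Its address is fixed at link time, so the
   64-byte request is satisfied without realigning any caller's
   stack frame.  */

static int counter __attribute__ ((aligned (64)));

int aligned_ok (void)
{
  /* The linker placed 'counter' on a 64-byte boundary.  */
  return ((uintptr_t) &counter % 64) == 0;
}
```

A hard register variable, by contrast, has no memory home at all today, which is the case the thread debates.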
Re: [PATCHv4] Enable -fsanitize-recover for KASan
On Fri, Oct 24, 2014 at 1:59 PM, Yury Gribov y.gri...@samsung.com wrote: On 10/24/2014 01:44 PM, Dmitry Vyukov wrote: On Fri, Oct 24, 2014 at 12:28 PM, Yury Gribov y.gri...@samsung.com wrote: On 10/23/2014 03:10 PM, Andrey Ryabinin wrote: On 10/23/2014 02:38 PM, Jakub Jelinek wrote: On Thu, Oct 23, 2014 at 02:33:42PM +0400, Yury Gribov wrote: Actually this is a historical artifact. If inlining proves to be significantly faster, they may want to switch. Ok. So, at that point you can include your ugly hacks in __asan_load* logic in the kernel, the difference between __asan_load4 and __asan_load4_noabort will be just that the latter will always return, while the former will not if some error has been reported. All the __asan_load* and __asan_store* entrypoints, regardless of -f{,no-}sanitize-recover=kernel-address are by definition not noreturn, they in the common case (if the code is not buggy) return. Perhaps we should just keep __asan_load* as is and leave the decision whether to abort or continue for the runtime? This would make semantics of -fsanitize-recover cumbersome though (because it wouldn't work if user selects outline instrumentation). Well, the don't ever report anything while some per-CPU flag is set thing can be considered as part of the is this memory access ok test, it is pretending everything is accessible. But, otherwise, if it is supposed to be developer's decision at compile time, __asan_load*_noabort should better always continue, even if it reported issues, and __asan_load* should better not return after reporting errors. True, but why we need new functions for that. __asan_load could also abort or not depending on what user/developer wants. Why we have to rebuild the entire kernel if someone wants to switch from abort to noabort? I'm not against __asan_load_noabort, I'm just saying that this is no point to have separate __asan_load/__asan_load_noabort functions in kernel. 
I'd still suggest to emit __asan_load_noabort so that we match userspace (where __asan_load strictly matches __asan_report in terminating the program). Behavior of __asan_load_noabort can further be restricted by user via various environment settings (kernel parameters, /proc, etc.). @Dmitry: what's your opinion on this? I am somewhat lost in this thread and probably missing something. But why do we need __asan_load (which is not noabort) at all? Outline instrumentation is non a default mode for both user-space asan and kasan (at least in the envisioned future). Well, it's still enabled automatically if number of memory accesses is getting large enough (I think it was 7000?) so it is default in a way. I would expect that these non-typical cases that use outline instrumentation can also bear the overhead of non-noreturn functions. As Jakub mentioned both __asan_load and __asan_load_noabort will _not_ be NORETURN. Two different versions are necessary so that compiler can insert outline instrumentation that matches the -fsanitize-recover setting. Can we use just one version of __asan_load and let runtime decide on abort? In this case semantics of inline and outline instrumentation will differ (the former depending on -fsanitize-recover compile-time setting and the latter depending on runtime options) which may be undesirable given that compiler may automatically choose to switch from inline to outline depending on function size. Ok, thank you, now I am on the same page. Yury, I would expect that the inline instrumentation will become the default in the kernel as well (it's 2 times faster, or maybe even more if happens to affect loop registration). Do you agree? If we consider inline as default, then the user won't be able to simply switch between abort/noabort with a runtime flag w/o rebuilding the kernel, because for __asan_report* we have to have 2 versions. 
I would also consider that abort/noabort is pretty persistent in a particular testing environment -- you either use one or another and do not switch frequently between them. Taking that into account and the fact that __asan_load* can be emitted due to call threshold, I am mildly in favor of Jakub's position of making it all consistent between user/kernel and load/report and explicit. To make it clear, I mean the runtime interface: __asan_load* -- not-noreturn, always aborts in runtime on failure __asan_load*_noabort - not-noreturn, never aborts in runtime on failure
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard, The patch containing the new code related to extended predication is attached. Here are a few comments which explain the main goals of the design.

1. I don't want to insert any critical edge splitting since it may lead to less efficient binaries (I remember some performance issue when we designed the lazy code motion algorithm in the SPARC compiler).

2. One special case of extended PHI node predication was introduced for when #arguments is more than 2 but only two arguments are different and one argument has the only occurrence. For such a PHI, conditional scalar reduction is applied. This corresponds to the following: if (q1 && q2 && q3) var++; A new function phi_has_two_different_args was introduced to detect such a phi.

3. The original algorithm for PHI predication used the assumption that at least one incoming edge of a block containing a PHI is not critical - it guarantees that all computations related to the predicate of the normal edge are already inserted above this block, and the code related to PHI predication can be inserted at the beginning of the block. But this is not true for critical edges, for which the predicate computations are in the block where the code for phi predication must be inserted. So a new function find_insertion_point is introduced which simply finds the last statement in the block defining the predicates corresponding to all incoming edges and inserts the phi predication code after it (with some minor exceptions).

If you need more comments or something is unclear, let me know. Thanks. Yuri.

ChangeLog:

2014-10-24 Yuri Rumyantsev ysrum...@gmail.com

	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE instead of loop flag.
	(if_convertible_bb_p): Allow bb to have more than 2 predecessors if FLAG_FORCE_VECTORIZE is true.
	(if_convertible_bb_p): Delete check that bb has at least one non-critical incoming edge.
	(phi_has_two_different_args): New function.
	(is_cond_scalar_reduction): Add argument EXTENDED to choose access to phi arguments.
	Invoke phi_has_two_different_args to get phi arguments if EXTENDED is true. Change the check that the block containing the reduction statement candidate is a predecessor of the phi-block, since the phi may have more than two arguments.
	(convert_scalar_cond_reduction): Add argument BEFORE to insert a statement before/after the gsi point.
	(predicate_scalar_phi): Add argument false (which means non-extended predication) to the call of is_cond_scalar_reduction. Add argument true (which corresponds to argument BEFORE) to the call of convert_scalar_cond_reduction.
	(get_predicate_for_edge): New function.
	(predicate_arbitrary_scalar_phi): New function.
	(predicate_extended_scalar_phi): New function.
	(find_insertion_point): New function.
	(predicate_all_scalar_phis): Add two boolean variables EXTENDED and BEFORE. Initialize EXTENDED to true if the BB containing the phi has more than 2 predecessors or both incoming edges are critical. Invoke find_phi_replacement_condition and predicate_scalar_phi or find_insertion_point and predicate_extended_scalar_phi depending on the EXTENDED value.
	(insert_gimplified_predicates): Add check that a non-predicated block may have statements to insert. Insert the predicate of BB just after the label if FLAG_FORCE_VECTORIZE is true.
	(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE, which is a copy of the inner or outer loop field force_vectorize.

2014-10-24 13:12 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, In my initial design I did such splitting, but before starting real if-conversion, and I decided not to perform it since the code size for the if-converted loop grows (the number of phi nodes is increased). It is worth noting also that for a phi with #nodes > 2 we need to get all predicates (except one) to do phi-predication, and it means that a block containing such a phi can have only 1 critical edge. Can you point me to the patch with the special insertion code then?
I definitely want to avoid the mess we ran into with the reassoc code clever insertion code. Richard. Thanks. Yuri. 2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I
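The conditional scalar reduction described earlier in the thread, the guarded increment if (q1 && q2 && q3) var++;, can be sketched in plain C (illustrative code, not the patch itself): if-conversion turns the guarded increment into a branchless accumulation that a vectorizer can handle.

```c
/* Sketch of conditional scalar reduction.  count_branchy is the
   source form, where the PHI for 'var' collects values along several
   control-flow paths; count_ifconverted is the flat, predicated form
   that if-conversion effectively produces.  Both compute the same
   count.  */

int count_branchy (const int *q1, const int *q2, const int *q3, int n)
{
  int var = 0;
  for (int i = 0; i < n; i++)
    if (q1[i] && q2[i] && q3[i])
      var++;
  return var;
}

int count_ifconverted (const int *q1, const int *q2, const int *q3, int n)
{
  int var = 0;
  for (int i = 0; i < n; i++)
    var += (q1[i] != 0) & (q2[i] != 0) & (q3[i] != 0);  /* predicated */
  return var;
}
```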
Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 23 October 2014 18:51, Charles Baylis charles.bay...@linaro.org wrote: Otherwise this and the previous 1/2 associated patch look good, can you respin with these tidy ups? OK for trunk? OK /Marcus
Re: [patch,avr] tweak sign extensions, take #2
On 10/23/2014 08:16 PM, Denis Chertykov wrote: This optimization makes most sign-extensions one instruction shorter in the case when the source register may be clobbered and the register numbers are different. Source and destination may overlap. Ok for trunk? Johann gcc/ * config/avr/avr.md (extendqihi2, extendqipsi2, extendqisi2) (extendhipsi2, extendhisi2): Optimize if source reg is unused after the insns and has different REGNO than destination. Approved. Denis.

Finally I switched to a solution that avoids all the ugly asm snippets and special casing, and which is exact w.r.t. code size. So allow me to drop the patch from above and to propose this one for trunk. Sorry for the inconvenience. In any case it uses the LSL/SBC idiom instead of the old CLR/SBRC/COM. Johann

	* avr-protos.h (avr_out_sign_extend): New.
	* avr.c (avr_adjust_insn_length) [ADJUST_LEN_SEXT]: Handle.
	(avr_out_sign_extend): New function.
	* avr.md (extendqihi2, extendqipsi2, extendqisi2, extendhipsi2)
	(extendhisi2, extendpsisi2): Use it.
	(adjust_len) [sext]: New.

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h (revision 216591)
+++ config/avr/avr-protos.h (working copy)
@@ -57,6 +57,7 @@ extern const char *avr_out_compare (rtx_
 extern const char *avr_out_compare64 (rtx_insn *, rtx*, int*);
 extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
+extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
 extern const char *ashlqi3_out (rtx_insn *insn, rtx operands[], int *len);
 extern const char *ashlhi3_out (rtx_insn *insn, rtx operands[], int *len);

Index: config/avr/avr.c
===
--- config/avr/avr.c (revision 216592)
+++ config/avr/avr.c (working copy)
@@ -7734,6 +7734,56 @@ avr_out_bitop (rtx insn, rtx *xop, int *
 }
 
+/* Output sign extension from XOP[1] to XOP[0] and return "".
+ If PLEN == NULL, print assembler instructions to perform the operation; + otherwise, set *PLEN to the length of the instruction sequence (in words) + as printed with PLEN == NULL. */ + +const char* +avr_out_sign_extend (rtx_insn *insn, rtx *xop, int *plen) + { + // Size in bytes of source resp. destination operand. + unsigned n_src = GET_MODE_SIZE (GET_MODE (xop[1])); + unsigned n_dest = GET_MODE_SIZE (GET_MODE (xop[0])); + rtx r_msb = all_regs_rtx[REGNO (xop[1]) + n_src - 1]; + + if (plen) +*plen = 0; + + // Copy destination to source + + if (REGNO (xop[0]) != REGNO (xop[1])) +{ + gcc_assert (n_src <= 2); + + if (n_src == 2) +avr_asm_len (AVR_HAVE_MOVW + ? "movw %0,%1" + : "mov %B0,%B1", xop, plen, 1); + if (n_src == 1 || !AVR_HAVE_MOVW) +avr_asm_len ("mov %A0,%A1", xop, plen, 1); +} + + // Set Carry to the sign bit MSB.7... + + if (REGNO (xop[0]) == REGNO (xop[1]) + || !reg_unused_after (insn, r_msb)) +{ + avr_asm_len ("mov __tmp_reg__,%0", &r_msb, plen, 1); + r_msb = tmp_reg_rtx; +} + + avr_asm_len ("lsl %0", &r_msb, plen, 1); + + // ...and propagate it to all the new sign bits + + for (unsigned n = n_src; n < n_dest; n++) +avr_asm_len ("sbc %0,%0", &all_regs_rtx[REGNO (xop[0]) + n], plen, 1); + + return ""; +} + + /* PLEN == NULL: Output code to add CONST_INT OP[0] to SP. PLEN != NULL: Set *PLEN to the length of that sequence. Return "". 
*/ @@ -8578,6 +8628,7 @@ avr_adjust_insn_length (rtx_insn *insn, case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, len); break; case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, len); break; case ADJUST_LEN_LPM: avr_out_lpm (insn, op, len); break; +case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, len); break; case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, len); break; case ADJUST_LEN_UFRACT: avr_out_fract (insn, op, false, len); break; Index: config/avr/avr.md === --- config/avr/avr.md (revision 216592) +++ config/avr/avr.md (working copy) @@ -147,7 +147,7 @@ (define_attr "length" ;; Otherwise do special processing depending on the attribute. (define_attr "adjust_len" - "out_bitop, plus, addto_sp, + "out_bitop, plus, addto_sp, sext, tsthi, tstpsi, tstsi, compare, compare64, call, mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32, ufract, sfract, round, @@ -4174,62 +4174,66 @@ (define_insn "extendqihi2" [(set (match_operand:HI 0 "register_operand" "=r,r") (sign_extend:HI (match_operand:QI 1 "combine_pseudo_register_operand" "0,*r")))] - "@ - clr %B0\;sbrc %0,7\;com %B0 - mov %A0,%A1\;clr %B0\;sbrc %A0,7\;com %B0" + { +return avr_out_sign_extend (insn, operands, NULL); + } [(set_attr "length" "3,4") - (set_attr "cc" "set_n,set_n")]) +
Re: avoid alignment of static variables affecting stack's
On Fri, Oct 24, 2014 at 12:12 PM, Jan Beulich jbeul...@suse.com wrote: On 24.10.14 at 11:52, richard.guent...@gmail.com wrote: On Fri, Oct 24, 2014 at 11:18 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 24, 2014 at 11:10:08AM +0200, Richard Biener wrote: For something in static storage, this seems OK. However, I think a hard register variable ought to be left alone -- even if we can't spill it to a stack slot today, there's a reasonable chance we might add that capability in the future. Hmm, but then wouldn't it need to be the code generating the spill that's responsible for enforcing suitable alignment? I can certainly re-submit without the hard register special cased (as it would still fix the original issue I'm seeing), but it feels wrong to do so. Yes, ISTR the spilling code is supposed to update the required stack alignment. After all the RA decision might affect required alignment of spills. From what I remember, at RA time you already have to know conservatively that you'll want to do dynamic stack realignment and what the highest needed alignment will be, so various parts of expansion etc. conservatively compute what will be needed. I think that is because you e.g. need to reserve some registers (vDRAP, etc.) if doing dynamic realignment. If you conservatively assume you'll need dynamic stack realignment and after RA you find you really don't need it, there are some optimizations in prologue threading where it attempts to at least decrease amount of unnecessary code, but the harm has already been done. Might be that with LRA perhaps this could be changed and not conservatively assume more alignment than proven to be needed, but such code isn't there I think. I stand corrected then. So am I to conclude then that I need to take out the hard register check in order for this to be accepted? Yes. Thanks, Richard. Jan
Re: [PATCH][AArch64] Cleanup logic around aarch64_final_prescan
On 22 October 2014 14:57, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-10-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.h (ADJUST_INSN_LENGTH): Wrap definition in do while (0). * config/aarch64/aarch64.c (is_mem_p): Delete. (is_memory_op): Rename to... (has_memory_op): ... This. Use FOR_EACH_SUBRTX. (dep_between_memop_and_curr): Assert that the input is a SET. (aarch64_madd_needs_nop): Add comment. Do not call dep_between_memop_and_curr on NULL body. (aarch64_final_prescan_insn): Add comment. Include rtl-iter.h. OK /Marcus
Re: [PATCH] PR58867 ASan and UBSan tests not run for installed testing.
some time ago, Andrew wrote a patch that fixes PR58867 (http://patchwork.ozlabs.org/patch/286866/), but for some reasons it wasn't committed to trunk. This is resurrected Andrew's patch, extended to support Tsan testsuite. This patch broke --disable-libsanitizer though, i.e. you now get gazillions of sanitizer failures in the C and C++ testsuites. -- Eric Botcazou
Re: [PATCH][AArch64][4.8] Backport Cortex-A53 erratum 835769 workaround
On 17 October 2014 16:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is the 4.8 backport of the Cortex-A53 erratum 835769 workaround. 4.8 doesn't have rtx_insns and the type attributes are different. Other than that there's not much different from the trunk version. Bootstrapped and tested on aarch64-none-linux-gnu with and without the workaround enabled. Compiled various large benchmarks with it. Ok for the 4.8 branch? OK /Marcus
[PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns
This combines the already posted 3/n (first simple patterns), 4/n (hook into fold-const.c) and not yet posted 5/n (hook into fold_stmt). Over the first posting this also contains recent improvements to the generator from the branch regarding to TREE_SIDE_EFFECTS and NON_LVALUE_EXPR handling. The hook into fold_stmt leaves all existing calls in doing what fold_stmt does currently (the match-and-simplify machinery will not follow SSA edges). It adds the ability to enable that though via an overload taking a valueization hook as argument (this is how tree-ssa-forwprop.c will exercise it). Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Richard. 2014-10-24 Richard Biener rguent...@suse.de * genmatch.c (expr::gen_transform): Use fold_buildN_loc and build_call_expr_loc. (dt_simplify::gen): Drop non_lvalue for GIMPLE, use non_lvalue_loc to build it for GENERIC. (decision_tree::gen_generic): Add location argument to generic_simplify prototype. (capture_info): New class. (capture_info::capture_info): New constructor. (capture_info::walk_match): New method. (capture_info::walk_result): New method. (capture_info::walk_c_expr): New method. (dt_simplify::gen): Handle preserving side-effects for GENERIC code generation. (decision_tree::gen_generic): Do not reject operands with TREE_SIDE_EFFECTS. * generic-match.h: New file. * generic-match-head.c: Include generic-match.h, not gimple-match.h. * match.pd: Add some constant folding patterns from fold-const.c. * fold-const.c: Include generic-match.h. (fold_unary_loc): Dispatch to generic_simplify. (fold_ternary_loc): Likewise. (fold_binary_loc): Likewise. Remove patterns now implemented by generic_simplify. * gimple-fold.c (replace_stmt_with_simplification): New function. (fold_stmt_1): Add valueize parameter, dispatch to gimple_simplify. (no_follow_ssa_edges): New function. (fold_stmt): New overload with valueization hook. Use no_follow_ssa_edges for the overload without hook. (fold_stmt_inplace): Likewise. 
* gimple-fold.h (no_follow_ssa_edges): Declare. Index: gcc/generic-match.h === *** /dev/null 1970-01-01 00:00:00.0 + --- gcc/generic-match.h 2014-10-23 15:45:28.322836040 +0200 *** *** 0 --- 1,33 + /* Generic simplify definitions. + +Copyright (C) 2011-2014 Free Software Foundation, Inc. +Contributed by Richard Guenther rguent...@suse.de + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 3, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + http://www.gnu.org/licenses/. */ + + #ifndef GCC_GENERIC_MATCH_H + #define GCC_GENERIC_MATCH_H + + /* Note the following functions are supposed to be only used from +fold_unary_loc, fold_binary_loc and fold_ternary_loc respectively. +They are not considered a public API. */ + + tree generic_simplify (location_t, enum tree_code, tree, tree); + tree generic_simplify (location_t, enum tree_code, tree, tree, tree); + tree generic_simplify (location_t, enum tree_code, tree, tree, tree, tree); + + #endif /* GCC_GENERIC_MATCH_H */ Index: gcc/generic-match-head.c === *** gcc/generic-match-head.c.orig 2014-10-23 15:45:26.935836135 +0200 --- gcc/generic-match-head.c2014-10-23 15:45:28.322836040 +0200 *** along with GCC; see the file COPYING3. *** 43,48 #include tree-phinodes.h #include ssa-iterators.h #include dumpfile.h ! #include gimple-match.h --- 43,48 #include tree-phinodes.h #include ssa-iterators.h #include dumpfile.h ! 
#include generic-match.h Index: gcc/fold-const.c === *** gcc/fold-const.c.orig 2014-10-23 15:44:38.601839463 +0200 --- gcc/fold-const.c2014-10-23 15:45:51.976834411 +0200 *** along with GCC; see the file COPYING3. *** 70,75 --- 70,76 #include hash-table.h /* Required for ENABLE_FOLD_CHECKING. */ #include builtins.h #include cgraph.h + #include generic-match.h /* Nonzero if we are folding constants inside an initializer; zero otherwise. */ *** fold_unary_loc (location_t loc, enum tre ***
Re: [PATCH][AArch64][4.8] Add --enable-fix-cortex-a53-835769 configure option
On 17 October 2014 16:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is the 4.8 backport of the configure option --enable-fix-cortex-a53-835769 to enable the workaround for the Cortex-A53 erratum 835769 by default. The patch is very similar to the trunk version, just some differences in the placement of the relevant sections. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for the 4.8 branch together with the -mfix-cortex-a53-835769 option backport? OK /Marcus
Re: [PATCH] Fix genmatch linking
Richard Biener rguent...@suse.de writes: Dominique reported that this fails for system libiconv but built libintl. Which might be fixed by the following. Does that still work for you? It does: an i386-pc-solaris2.10 bootstrap has finished by now and make check is running. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround
On 22 October 2014 15:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch contains the LINK_SPEC changes required to pass on the linker option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769 (or by default when configured with --enable-fix-cortex-a53-835769). This requires a binutils installation with the patch posted at https://sourceware.org/ml/binutils/2014-10/msg00198.html applied. Bootstrapped and tested on aarch64-none-linux-gnu and built various benchmarks. This patch applies to 4.9 (4.8 version will be posted separately) and has been tested there as well. Ok for trunk and 4.9? The corresponding binutils changes are committed on binutils trunk, 2.25 and 2.24. The trunk patch is OK. Given that Jakub is in the process of preparing a 4.9.2 I'd like an explicit OK before we commit on 4.9. Jakub? Cheers /Marcus
[match-and-simplify] Merge from trunk
2014-10-24 Richard Biener rguent...@suse.de Merge from trunk r216543 through r216631. Brings back second merge piece.
Re: [PATCH][AArch64] LINK_SPEC changes for Cortex-A53 erratum 835769 workaround
On Fri, Oct 24, 2014 at 12:04:52PM +0100, Marcus Shawcroft wrote: On 22 October 2014 15:20, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch contains the LINK_SPEC changes required to pass on the linker option --fix-cortex-a53-835769 when compiling with -mfix-cortex-a53-835769 (or by default when configured with --enable-fix-cortex-a53-835769). This requires a binutils installation with the patch posted at https://sourceware.org/ml/binutils/2014-10/msg00198.html applied. Bootstrapped and tested on aarch64-none-linux-gnu and built various benchmarks. This patch applies to 4.9 (4.8 version will be posted separately) and has been tested there as well. Ok for trunk and 4.9? The corresponding binutils changes are committed on binutils trunk, 2.25 and 2.24. The trunk patch is OK. Given that Jakub is in the process of preparing a 4.9.2 I'd like an explicit OK before we commit on 4.9. Jakub? Is that a regression on the 4.9 branch? If not, I'd prefer if it could wait for 4.9.3. Jakub
Re: [PATCH] Fix genmatch linking
On Fri, 24 Oct 2014, Rainer Orth wrote: Richard Biener rguent...@suse.de writes: Dominique reported that this fails for system libiconv but built libintl. Which might be fixed by the following. Does that still work for you? It does: an i386-pc-solaris2.10 bootstrap has finished by now and make check is running. Dominique reported an ok as well. Bootstrapped myself on x86_64-unknown-linux-gnu and commited as r216632. Richard. 2014-10-24 Richard Biener rguent...@suse.de * Makefile.in (BUILD_CPPLIB): Move $(LIBINTL) $(LIBICONV) to genmatch BUILD_LIBS instead. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 216626) +++ gcc/Makefile.in (working copy) @@ -981,15 +981,6 @@ else LIBIBERTY = ../libiberty/libiberty.a BUILD_LIBIBERTY = $(build_libobjdir)/libiberty/libiberty.a endif -# For stage1 and when cross-compiling use the build libcpp which is -# built with NLS disabled. For stage2+ use the host library and -# its dependencies. -ifeq ($(build_objdir),$(build_libobjdir)) -BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a -else -BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) $(LIBINTL) $(LIBICONV) -build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) -endif # Dependencies on the intl and portability libraries. LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \ @@ -2529,6 +2520,17 @@ genprog = $(genprogerr) check checksum c # These programs need libs over and above what they get from the above list. build/genautomata$(build_exeext) : BUILD_LIBS += -lm +# For stage1 and when cross-compiling use the build libcpp which is +# built with NLS disabled. For stage2+ use the host library and +# its dependencies. 
+ifeq ($(build_objdir),$(build_libobjdir)) +BUILD_CPPLIB = $(build_libobjdir)/libcpp/libcpp.a +else +BUILD_CPPLIB = $(CPPLIB) $(LIBIBERTY) +build/genmatch$(build_exeext): BUILD_LIBDEPS += $(LIBINTL_DEP) $(LIBICONV_DEP) +build/genmatch$(build_exeext): BUILD_LIBS += $(LIBINTL) $(LIBICONV) +endif + build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \ $(BUILD_ERRORS) build/vec.o build/hash-table.o
Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns
+ /* Same applies to modulo operations, but fold is inconsistent here +and simplifies 0 % x to 0, only preserving literal 0 % 0. */ + (for op (ceil_mod floor_mod round_mod trunc_mod) + /* 0 % X is always zero. */ + (simplify + (trunc_mod integer_zerop@0 @1) + /* But not for 0 % 0 so that we can get the proper warnings and errors. */ + (if (!integer_zerop (@1)) +@0)) + /* X % 1 is always zero. */ + (simplify + (trunc_mod @0 integer_onep) + { build_zero_cst (type); })) op is unused, you probably meant to replace trunc_mod with it. -- Marc Glisse
Re: [PATCH][3,4,5/n] Merge from match-and-simplify, fold, fold_stmt and first patterns
On Fri, 24 Oct 2014, Marc Glisse wrote: + /* Same applies to modulo operations, but fold is inconsistent here +and simplifies 0 % x to 0, only preserving literal 0 % 0. */ + (for op (ceil_mod floor_mod round_mod trunc_mod) + /* 0 % X is always zero. */ + (simplify + (trunc_mod integer_zerop@0 @1) + /* But not for 0 % 0 so that we can get the proper warnings and errors. */ + (if (!integer_zerop (@1)) +@0)) + /* X % 1 is always zero. */ + (simplify + (trunc_mod @0 integer_onep) + { build_zero_cst (type); })) op is unused, you probably meant to replace trunc_mod with it. Oh, indeed. I'll fix that up next week (heh - sth for a first warning from genmatch!). Thanks, Richard.
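For concreteness, applying Marc's fix means using the iterator variable inside the for body; a sketch of the corrected match.pd pattern:

```
(for op (ceil_mod floor_mod round_mod trunc_mod)
 /* 0 % X is always zero.  */
 (simplify
  (op integer_zerop@0 @1)
  /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
  (if (!integer_zerop (@1))
   @0))
 /* X % 1 is always zero.  */
 (simplify
  (op @0 integer_onep)
  { build_zero_cst (type); }))
```

With `op` in place, the two simplifications are instantiated once per modulo variant rather than only for trunc_mod.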
[PATCH][ARM] revert changes on check_effective_target_arm_*_ok
we should not add explicit declaration there. arm_neon.h contains those prototype already. they will be available if the compiler configuration is with related builtin predefine, for example __ARM_FEATURE_CRYPTO. so, actually, if there is any warning when compile these test programs, they are expected, and we rely on these warnings to check whether certain features are available. previously, I only verified on arm-none-linux-gnueabi cross check, so have not exposed these regressions. no verified on arm-none-linux-gnueabihf, regression gone away on arm directory. make check RUNTESTFLAGS=aapcs.exp neon.exp acle.exp simd.exp arm.exp ok for trunk? gcc/testsuite/ * lib/target-supports.exp (check_effective_target_arm_crypto_ok_nocache): Remove declaration for vaeseq_u8. (check_effective_target_arm_neon_fp16_ok_nocache): Remove declaration for vcvt_f16_f32. (check_effective_target_arm_neonv2_ok_nocache): Remove declaration for vfma_f32. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 91460c2..4398345 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2404,7 +2404,6 @@ proc check_effective_target_arm_crypto_ok_nocache { } { foreach flags { -mfloat-abi=softfp -mfpu=crypto-neon-fp-armv8 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp} { if { [check_no_compiler_messages_nocache arm_crypto_ok object { #include arm_neon.h - extern uint8x16_t vaeseq_u8 (uint8x16_t, uint8x16_t); uint8x16_t foo (uint8x16_t a, uint8x16_t b) { @@ -2549,7 +2548,6 @@ proc check_effective_target_arm_neon_fp16_ok_nocache { } { -mfpu=neon-fp16 -mfloat-abi=softfp} { if { [check_no_compiler_messages_nocache arm_neon_fp_16_ok object { #include arm_neon.h - extern float16x4_t vcvt_f16_f32 (float32x4_t); float16x4_t foo (float32x4_t arg) { @@ -2625,7 +2623,6 @@ proc check_effective_target_arm_neonv2_ok_nocache { } { foreach flags { -mfloat-abi=softfp -mfpu=neon-vfpv4 -mfpu=neon-vfpv4 -mfloat-abi=softfp} { if { 
[check_no_compiler_messages_nocache arm_neonv2_ok object { #include arm_neon.h - extern float32x2_t vfma_f32 (float32x2_t, float32x2_t, float32x2_t); float32x2_t foo (float32x2_t a, float32x2_t b, float32x2_t c) {
[PATCH][ARM] gnu11 cleanup for aapcs testcases
a further cleanup under the aapcs sub-directory. ok for trunk? gcc/testsuite/ * gcc.target/arm/aapcs/abitest.h: Declare memcmp. diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h index 06a92c3..7bce58b 100644 --- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h +++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h @@ -49,6 +49,8 @@ extern void abort (void); +typedef unsigned int size_t; +extern int memcmp (const void *s1, const void *s2, size_t n); __attribute__((naked)) void dumpregs () __asm ("myfunc"); __attribute__((naked)) void dumpregs ()
Re: [PATCH][ARM] gnu11 cleanup for aapcs testcases
On Fri, Oct 24, 2014 at 12:48:24PM +0100, Jiong Wang wrote: a further cleanup under the aapcs sub-directory. ok for trunk? gcc/testsuite/ * gcc.target/arm/aapcs/abitest.h: Declare memcmp. diff --git a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h index 06a92c3..7bce58b 100644 --- a/gcc/testsuite/gcc.target/arm/aapcs/abitest.h +++ b/gcc/testsuite/gcc.target/arm/aapcs/abitest.h @@ -49,6 +49,8 @@ extern void abort (void); +typedef unsigned int size_t; +extern int memcmp (const void *s1, const void *s2, size_t n); You can use __SIZE_TYPE__ and then you don't need the typedef. Marek
Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Rainer Orth wrote: However, as a quick first step, does adding the ilp32 / lp64 (and keeping the architectures list for now) solve the immediate problem? Patch attached, OK for trunk? No, as I said this is wrong for biarch targets like sparc and i386. When you say no this does not solve the immediate problem, are you saying that you are (still) seeing test failures with the require-effective-target patch applied? Or is the issue that this would not execute the tests as widely as might be possible? In principle I'm quite happy to relax the target patterns, although have been having issues with sparc (below)... Re. what the architectures have in common is largely that these are the primary/secondary archs on which I've checked the test passes! I can now add mips and microblaze to this list, however I'm nervous of dropping the target entirely given the very large number of target architectures gcc supports; and e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31 places, not ashiftrt:SI, which does not match the simplification criteria in combine.c. This should be something like { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by your target list. Keep the list sorted alphabetically and best add an explanation so others know what those targets have in common. So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I find * without -m64, my dg-require-effective-target ilp32 causes the 32-bit test to execute, and pass; dg-require-effective-target lp64 prevents execution of the 64-bit test (which would fail) - so all as expected and desired. * with -lp64, behaviour is as previous (this is probably expected) * with -m64, dg-require-effective-target ilp32 still causes the test to execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places, which doesn't meet the simplification criteria in combine.c - this is pretty much as expected). 
dg-require-effective-target lp64 stops the 64-bit test from executing however (despite that it would now pass). Can you clarify what I should be doing on sparc, therefore? Thanks for your help! Alan
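For readers following the thread, the two DejaGnu idioms under discussion look roughly like this (a sketch; the exact target list for the final patch was still being settled):

```
/* Alan's approach: gate each variant on the pointer-size effective target.  */
/* { dg-require-effective-target ilp32 } */

/* Rainer's suggestion: keep an explicit target list so biarch targets such
   as sparc and i386 are matched by triplet and filtered by ABI.  */
/* { dg-do compile { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */
/* { dg-require-effective-target lp64 } */
```

The sparc behaviour Alan reports suggests the effective-target checks were not picking up the `-m64` multilib flag in his stage-1 setup, which is the point needing clarification.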
[PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral
This is the first half of my previous patch series (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part making the REDUC_..._EXPR tree codes endian-neutral, and adding a new reduce-to-scalar optab in place of the endianness-dependent reduc_[us](plus|min|max)_optab. I'm leaving the vec_shr portion out of this patch series, as the link between the two halves is only the end goal of removing an if (BYTES_BIG_ENDIAN) from tree-vect-loop.c; this series removes that from one code path so can stand alone. Patches 1-6 are as previously posted apart from rebasing and removing the old/poisoned AArch64 patterns as per maintainer's request. Patches 1, 2, 4, 5 and 6 have already been approved; patch 3 was discussed somewhat, but I think we decided against most of the ideas raised, and I have added a comment to scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good to me" and wonder if I should have taken that as an approval, but I didn't read it that way at the time...??? Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out how to do the same for MIPS (specifically what I need to add to mips_expand_vec_reduc), and have had no response from the maintainers, so am leaving that for now. Also I haven't migrated (or worked out how to target) rs6000/paired.md; help would be most welcome. The suggestion was then to complete the migration by removing the old optabs. There are a few options here and I'll follow up with appropriate patches according to feedback received. I see options: (1) just delete the old optabs (and the migration code). This would performance-regress the MIPS backend, but should not break it, although one should really do *something* with the then-unused reduc_[us](plus|min|max)_optab in config/mips/loongson.md. 
(2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break the MIPS backend if something were not done with its existing patterns. (2a) Alternatively I could just use a different new name, e.g. reduce_, reduct_, vec_reduc_..., anything that's less of a mouthful than reduc_..._scal. Whilst being only-very-slightly-different from the current reduc_... might be confusing, so might changing the meaning of the optab, and its signature, with the existing name, so am open to suggestions? Cheers, Alan commit 9819291c17610dcdcca19a3d9ea3a4260df0577e Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 21 13:05:43 2014 +0100 Temporarily remove gimple_fold diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 3dba1b2..a49da89 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1188,6 +1188,9 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args, return NULL_TREE; } +/* Handling of reduction operations temporarily removed so as to decouple + changes to tree codes from AArch64 NEON Intrinsics. 
*/ +#if 0 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) { @@ -1259,6 +1262,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) return changed; } +#endif void aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index db5ff59..27d82f3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10015,8 +10015,8 @@ aarch64_asan_shadow_offset (void) #undef TARGET_FRAME_POINTER_REQUIRED #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required -#undef TARGET_GIMPLE_FOLD_BUILTIN -#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin +//#undef TARGET_GIMPLE_FOLD_BUILTIN +//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin #undef TARGET_GIMPLIFY_VA_ARG_EXPR #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_exprcommit bf6d5d32c552ce1c6ccd890f501db4f39291088f Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Jul 29 11:46:01 2014 +0100 Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess) diff --git a/gcc/expr.c b/gcc/expr.c index a6233f3..c792028 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, { op0 = expand_normal (treeop0); this_optab = optab_for_tree_code (code, type, optab_default); -temp = expand_unop (mode, this_optab, op0, target, unsignedp); +enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0)); +temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp); +gcc_assert (temp); +/* The tree code produces a scalar result, but (somewhat by convention) + the optab produces a vector with the result in element 0 if + little-endian, or element N-1 if big-endian. So pull the scalar
[PATCH 7/11][ARM] Migrate to new reduc_plus_scal_optab
This migrates ARM from reduc_splus_optab and reduc_uplus optab to a single reduc_plus_optab. Tested, in combination with next patch: bootstrap on arm-none-linux-gnueabihf cross-tested check-gcc on arm-none-eabi. gcc/ChangeLog: config/arm/neon.md (reduc_plus_*): Rename to... (reduc_plus_scal_*): ...this; reduce to temp and extract scalar result.commit 22e60bd46f2a591f5357a543d76b19ed89f401ed Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 28 16:12:24 2014 +0100 ARM reduc_plus_scal, V_elem not V_ext, rm old reduc_[us]plus, emit the extract! diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 41cf913..d13fe5d 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1349,33 +1349,47 @@ ;; Reduction operations -(define_expand reduc_splus_mode - [(match_operand:VD 0 s_register_operand ) +(define_expand reduc_plus_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VD 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) { - neon_pairwise_reduce (operands[0], operands[1], MODEmode, + rtx vec = gen_reg_rtx (MODEmode); + neon_pairwise_reduce (vec, operands[1], MODEmode, gen_neon_vpadd_internalmode); + /* The same result is actually computed into every element. 
*/ + emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx)); DONE; }) -(define_expand reduc_splus_mode - [(match_operand:VQ 0 s_register_operand ) +(define_expand reduc_plus_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VQ 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) !BYTES_BIG_ENDIAN { rtx step1 = gen_reg_rtx (V_HALFmode); - rtx res_d = gen_reg_rtx (V_HALFmode); emit_insn (gen_quad_halves_plusmode (step1, operands[1])); - emit_insn (gen_reduc_splus_V_half (res_d, step1)); - emit_insn (gen_move_lo_quad_mode (operands[0], res_d)); + emit_insn (gen_reduc_plus_scal_V_half (operands[0], step1)); + + DONE; +}) + +(define_expand reduc_plus_scal_v2di + [(match_operand:DI 0 nonimmediate_operand =w) + (match_operand:V2DI 1 s_register_operand )] + TARGET_NEON !BYTES_BIG_ENDIAN +{ + rtx vec = gen_reg_rtx (V2DImode); + + emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1])); + emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx)); DONE; }) -(define_insn reduc_splus_v2di +(define_insn arm_reduc_plus_internal_v2di [(set (match_operand:V2DI 0 s_register_operand =w) (unspec:V2DI [(match_operand:V2DI 1 s_register_operand w)] UNSPEC_VPADD))] @@ -1384,17 +1398,6 @@ [(set_attr type neon_add_q)] ) -;; NEON does not distinguish between signed and unsigned addition except on -;; widening operations. -(define_expand reduc_uplus_mode - [(match_operand:VDQI 0 s_register_operand ) - (match_operand:VDQI 1 s_register_operand )] - TARGET_NEON (Is_d_reg || !BYTES_BIG_ENDIAN) -{ - emit_insn (gen_reduc_splus_mode (operands[0], operands[1])); - DONE; -}) - (define_expand reduc_smin_mode [(match_operand:VD 0 s_register_operand ) (match_operand:VD 1 s_register_operand )]
Re: [PATCH][ARM] revert changes on check_effective_target_arm_*_ok
On Fri, Oct 24, 2014 at 12:47 PM, Jiong Wang jiong.w...@arm.com wrote: we should not add explicit declaration there. arm_neon.h contains those prototype already. they will be available if the compiler configuration is with related builtin predefine, for example __ARM_FEATURE_CRYPTO. so, actually, if there is any warning when compile these test programs, they are expected, and we rely on these warnings to check whether certain features are available. previously, I only verified on arm-none-linux-gnueabi cross check, so have not exposed these regressions. I had also missed the vaes and vfma turning on by default by this change. This is OK. ramana no verified on arm-none-linux-gnueabihf, regression gone away on arm directory. make check RUNTESTFLAGS=aapcs.exp neon.exp acle.exp simd.exp arm.exp ok for trunk? gcc/testsuite/ * lib/target-supports.exp (check_effective_target_arm_crypto_ok_nocache): Remove declaration for vaeseq_u8. (check_effective_target_arm_neon_fp16_ok_nocache): Remove declaration for vcvt_f16_f32. (check_effective_target_arm_neonv2_ok_nocache): Remove declaration for vfma_f32.
[PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab
Similarly to last patch. Tested, in combination with previous patch: bootstrap on arm-none-linux-gnueabihf cross-tested check-gcc on arm-none-eabi. gcc/ChangeLog: config/arm/neon.md (reduc_smin_mode *2): Rename to... (reduc_smin_scal_mode *2): ...this; extract scalar result. (reduc_smax_mode *2): Rename to... (reduc_smax_scal_mode *2): ...this; extract scalar result. (reduc_umin_mode *2): Rename to... (reduc_umin_scal_mode *2): ...this; extract scalar result. (reduc_umax_mode *2): Rename to... (reduc_umax_scal_mode *2): ...this; extract scalar result.commit 537c31561933f8054a2289198f35b19cf5c4196e Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Aug 28 16:49:24 2014 +0100 ARM reduc_[us](min|max)_scal, V_elem not V_ext, rm old non-_scal version. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index d13fe5d..19e1ba0 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1398,104 +1398,109 @@ [(set_attr type neon_add_q)] ) -(define_expand reduc_smin_mode - [(match_operand:VD 0 s_register_operand ) +(define_expand reduc_smin_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VD 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) { - neon_pairwise_reduce (operands[0], operands[1], MODEmode, + rtx vec = gen_reg_rtx (MODEmode); + + neon_pairwise_reduce (vec, operands[1], MODEmode, gen_neon_vpsminmode); + /* The result is computed into every element of the vector. 
*/ + emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx)); DONE; }) -(define_expand reduc_smin_mode - [(match_operand:VQ 0 s_register_operand ) +(define_expand reduc_smin_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VQ 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) !BYTES_BIG_ENDIAN { rtx step1 = gen_reg_rtx (V_HALFmode); - rtx res_d = gen_reg_rtx (V_HALFmode); emit_insn (gen_quad_halves_sminmode (step1, operands[1])); - emit_insn (gen_reduc_smin_V_half (res_d, step1)); - emit_insn (gen_move_lo_quad_mode (operands[0], res_d)); + emit_insn (gen_reduc_smin_scal_V_half (operands[0], step1)); DONE; }) -(define_expand reduc_smax_mode - [(match_operand:VD 0 s_register_operand ) +(define_expand reduc_smax_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VD 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) { - neon_pairwise_reduce (operands[0], operands[1], MODEmode, + rtx vec = gen_reg_rtx (MODEmode); + neon_pairwise_reduce (vec, operands[1], MODEmode, gen_neon_vpsmaxmode); + /* The result is computed into every element of the vector. 
*/ + emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx)); DONE; }) -(define_expand reduc_smax_mode - [(match_operand:VQ 0 s_register_operand ) +(define_expand reduc_smax_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VQ 1 s_register_operand )] TARGET_NEON (!Is_float_mode || flag_unsafe_math_optimizations) !BYTES_BIG_ENDIAN { rtx step1 = gen_reg_rtx (V_HALFmode); - rtx res_d = gen_reg_rtx (V_HALFmode); emit_insn (gen_quad_halves_smaxmode (step1, operands[1])); - emit_insn (gen_reduc_smax_V_half (res_d, step1)); - emit_insn (gen_move_lo_quad_mode (operands[0], res_d)); + emit_insn (gen_reduc_smax_scal_V_half (operands[0], step1)); DONE; }) -(define_expand reduc_umin_mode - [(match_operand:VDI 0 s_register_operand ) +(define_expand reduc_umin_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VDI 1 s_register_operand )] TARGET_NEON { - neon_pairwise_reduce (operands[0], operands[1], MODEmode, + rtx vec = gen_reg_rtx (MODEmode); + neon_pairwise_reduce (vec, operands[1], MODEmode, gen_neon_vpuminmode); + /* The result is computed into every element of the vector. 
*/ + emit_insn (gen_vec_extractmode (operands[0], vec, const0_rtx)); DONE; }) -(define_expand reduc_umin_mode - [(match_operand:VQI 0 s_register_operand ) +(define_expand reduc_umin_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VQI 1 s_register_operand )] TARGET_NEON !BYTES_BIG_ENDIAN { rtx step1 = gen_reg_rtx (V_HALFmode); - rtx res_d = gen_reg_rtx (V_HALFmode); emit_insn (gen_quad_halves_uminmode (step1, operands[1])); - emit_insn (gen_reduc_umin_V_half (res_d, step1)); - emit_insn (gen_move_lo_quad_mode (operands[0], res_d)); + emit_insn (gen_reduc_umin_scal_V_half (operands[0], step1)); DONE; }) -(define_expand reduc_umax_mode - [(match_operand:VDI 0 s_register_operand ) +(define_expand reduc_umax_scal_mode + [(match_operand:V_elem 0 nonimmediate_operand ) (match_operand:VDI 1 s_register_operand )] TARGET_NEON { - neon_pairwise_reduce (operands[0], operands[1], MODEmode, + rtx vec
Re: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral
On Fri, 24 Oct 2014, Alan Lawrence wrote: This is the first half of my previous patch series (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html), that is the part making the REDUC_..._EXPR tree codes endian-neutral, and adding a new reduce-to-scalar optab in place of the endianness-dependent reduc_[us](plus|min|max)_optab. I'm leaving the vec_shr portion out of this patch series, as the link between the two halves is only the end goal of removing an if (BYTES_BIG_ENDIAN) from tree-vect-loop.c; this series removes that from one code path so can stand alone. Patches 1-6 are as previously posted apart from rebasing and removing the old/poisoned AArch64 patterns as per the maintainer's request. Patches 1, 2, 4, 5 and 6 have already been approved; patch 3 was discussed somewhat, but I think we decided against most of the ideas raised, and I have added a comment to scalar_reduc_to_vector. I now reread Richie's "Otherwise the patch looks good to me" and wonder if I should have taken that as an approval, but I didn't read it that way at the time...??? Yes, it was an approval ;) Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to the new reduc_(plus|[us](min|max))_scal_optab. I have not managed to work out how to do the same for MIPS (specifically what I need to add to mips_expand_vec_reduc), and have had no response from the maintainers, so am leaving that for now. Also I haven't migrated (or worked out how to target) rs6000/paired.md; help would be most welcome. The suggestion was then to complete the migration by removing the old optabs. There are a few options here and I'll follow up with appropriate patches according to feedback received. I see options: (1) just delete the old optabs (and the migration code). This would performance-regress the MIPS backend, but should not break it, although one should really do *something* with the then-unused reduc_[us](plus|min|max)_optab in config/mips/loongson.md.
(2) also renaming reduc_..._scal_optab back to reduc_..._optab; would break the MIPS backend if something were not done with its existing patterns. (2a) Alternatively I could just use a different new name, e.g. reduce_, reduct_, vec_reduc_..., anything that's less of a mouthful than reduc_..._scal. Whilst being only-very-slightly-different from the current reduc_... might be confusing, so might changing the meaning of the optab, and its signature, with the existing name, so am open to suggestions? I definitely prefer (2). Thanks, Richard.
[PATCH 9/11][i386] Migrate reduction optabs to reduc_..._scal
Bootstrapped and check-gcc on x86_64-none-linux-gnu. gcc/ChangeLog: * config/i386/i386.c (ix86_expand_reduc): Extract result into scalar. * config/i386/sse.md (reduc_splus_v8df, reduc_code_mode * 3, reduc_umin_v8hi): Rename to... (reduc_plus_scal_v8df, reduc_code_scal_mode * 3, reduc_umin_scal_v8hi): ...these, changing result mode to scalar. (reduc_splus_v4df, reduc_splus_v2df, reduc_splus_v16sf, reduc_splus_v8sf, reduc_splus_v4sf): Rename to... (reduc_plus_scal_v4df, reduc_plus_scal_v2df, reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf): ...these, adding gen_vec_extract for scalar result.commit 80b0d10a78b2f3e86325f373e99e9cf71e42e622 Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Oct 7 13:25:08 2014 +0100 i386 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 4c4a6eb..670a5f5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41211,12 +41211,12 @@ emit_reduc_half (rtx dest, rtx src, int i) } /* Expand a vector reduction. FN is the binary pattern to reduce; - DEST is the destination; IN is the input vector. */ + DEST is the (scalar) destination; IN is the input vector. 
*/ void ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in) { - rtx half, dst, vec = in; + rtx half, dst = NULL_RTX, vec = in; enum machine_mode mode = GET_MODE (in); int i; @@ -41225,23 +41225,21 @@ ix86_expand_reduc (rtx (*fn) (rtx, rtx, rtx), rtx dest, rtx in) mode == V8HImode fn == gen_uminv8hi3) { - emit_insn (gen_sse4_1_phminposuw (dest, in)); - return; + dst = gen_reg_rtx (mode); + emit_insn (gen_sse4_1_phminposuw (dst, in)); } - - for (i = GET_MODE_BITSIZE (mode); - i GET_MODE_BITSIZE (GET_MODE_INNER (mode)); - i = 1) -{ + else +for (i = GET_MODE_BITSIZE (mode); + i GET_MODE_BITSIZE (GET_MODE_INNER (mode)); + i = 1) + { half = gen_reg_rtx (mode); emit_reduc_half (half, vec, i); - if (i == GET_MODE_BITSIZE (GET_MODE_INNER (mode)) * 2) - dst = dest; - else - dst = gen_reg_rtx (mode); + dst = gen_reg_rtx (mode); emit_insn (fn (dst, half, vec)); vec = dst; } + ix86_expand_vector_extract (false, dest, dst, 0); } /* Target hook for scalar_mode_supported_p. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index e7646d7..e4e0b95 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2238,8 +2238,8 @@ (set_attr prefix_rep 1,*) (set_attr mode V4SF)]) -(define_expand reduc_splus_v8df - [(match_operand:V8DF 0 register_operand) +(define_expand reduc_plus_scal_v8df + [(match_operand:DF 0 register_operand) (match_operand:V8DF 1 register_operand)] TARGET_AVX512F { @@ -2247,30 +2247,35 @@ DONE; }) -(define_expand reduc_splus_v4df - [(match_operand:V4DF 0 register_operand) +(define_expand reduc_plus_scal_v4df + [(match_operand:DF 0 register_operand) (match_operand:V4DF 1 register_operand)] TARGET_AVX { rtx tmp = gen_reg_rtx (V4DFmode); rtx tmp2 = gen_reg_rtx (V4DFmode); + rtx tmp3 = gen_reg_rtx (V4DFmode); + emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1])); emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1))); - emit_insn (gen_addv4df3 (operands[0], tmp, tmp2)); + emit_insn (gen_addv4df3 (tmp3, tmp, tmp2)); + 
emit_insn (gen_vec_extractv4df (operands[0], tmp3, GEN_INT (1))); DONE; }) -(define_expand reduc_splus_v2df - [(match_operand:V2DF 0 register_operand) +(define_expand reduc_plus_scal_v2df + [(match_operand:DF 0 register_operand) (match_operand:V2DF 1 register_operand)] TARGET_SSE3 { - emit_insn (gen_sse3_haddv2df3 (operands[0], operands[1], operands[1])); + rtx tmp = gen_reg_rtx (V2DFmode); + emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1])); + emit_insn (gen_vec_extractv2df (operands[0], tmp, GEN_INT (0))); DONE; }) -(define_expand reduc_splus_v16sf - [(match_operand:V16SF 0 register_operand) +(define_expand reduc_plus_scal_v16sf + [(match_operand:SF 0 register_operand) (match_operand:V16SF 1 register_operand)] TARGET_AVX512F { @@ -2278,30 +2283,35 @@ DONE; }) -(define_expand reduc_splus_v8sf - [(match_operand:V8SF 0 register_operand) +(define_expand reduc_plus_scal_v8sf + [(match_operand:SF 0 register_operand) (match_operand:V8SF 1 register_operand)] TARGET_AVX { rtx tmp = gen_reg_rtx (V8SFmode); rtx tmp2 = gen_reg_rtx (V8SFmode); + rtx tmp3 = gen_reg_rtx (V8SFmode); + emit_insn (gen_avx_haddv8sf3 (tmp, operands[1], operands[1])); emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp)); emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1))); - emit_insn (gen_addv8sf3 (operands[0], tmp, tmp2)); + emit_insn (gen_addv8sf3 (tmp3, tmp, tmp2)); + emit_insn (gen_vec_extractv8sf (operands[0], tmp3, GEN_INT (0))); DONE; })
[PATCH] c11-atomic-exec-5: Avoid dead code where LDBL_MANT_DIG is 106
Hi, Commit 216437 missed a part of Adhemerval's original change that made `long_double_add_overflow', `complex_long_double_add_overflow', `long_double_sub_overflow' and `complex_long_double_sub_overflow' tests consistently defined only if called. These tests are now only made under the `LDBL_MANT_DIG != 106' condition, otherwise there is no need to provide definitions that become dead code. Here's the missing part, I have verified the source still builds after the change manually with: $ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o c11-atomic-exec-5 c11-atomic-exec-5.c and: $ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o c11-atomic-exec-5 c11-atomic-exec-5.c It also passed regression testing with the powerpc-gnu-linux target and my usual multilibs that have LDBL_MANT_DIG set to 106, which is the only case this change really affects. 
Without this change I get this instead: $ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=113 -Wunused-function -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o c11-atomic-exec-5 c11-atomic-exec-5.c $ (OK), and: $ gcc -U__LDBL_MANT_DIG__ -D__LDBL_MANT_DIG__=106 -Wunused-function -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -latomic -o c11-atomic-exec-5 c11-atomic-exec-5.c c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_add_overflow' definedbut not used [-Wunused-function] test_main_##NAME (void) \ ^ c11-atomic-exec-5.c:334:1: note: in expansion of macro 'TEST_FUNCS' TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0, ^ c11-atomic-exec-5.c:62:1: warning: 'test_main_complex_long_double_add_overflow'defined but not used [-Wunused-function] test_main_##NAME (void) \ ^ c11-atomic-exec-5.c:352:1: note: in expansion of macro 'TEST_FUNCS' TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += LDBL_MAX, 0, ^ c11-atomic-exec-5.c:62:1: warning: 'test_main_long_double_sub_overflow' definedbut not used [-Wunused-function] test_main_##NAME (void) \ ^ c11-atomic-exec-5.c:358:1: note: in expansion of macro 'TEST_FUNCS' TEST_FUNCS (long_double_sub_overflow, long double, , -= LDBL_MAX, 0, ^ c11-atomic-exec-5.c:62:1: warning: 'test_main_complex_long_double_sub_overflow'defined but not used [-Wunused-function] test_main_##NAME (void) \ ^ c11-atomic-exec-5.c:376:1: note: in expansion of macro 'TEST_FUNCS' TEST_FUNCS (complex_long_double_sub_overflow, _Complex long double, , -= LDBL_MAX, 0, ^ $ (not quite so). This also wraps the definitions of the `NOT_LDBL_EPSILON_2' and `NOT_MINUS_LDBL_EPSILON_2' macros into this condition, but these aren't referred to if `LDBL_MANT_DIG' is 106 either. No changes compared to original code so all credit goes to Adhemerval. OK to apply? 
2014-10-24 Adhemerval Zanella azane...@linux.vnet.ibm.com gcc/testsuite/ * gcc.dg/atomic/c11-atomic-exec-5.c (test_main_long_double_add_overflow): Only actually define if LDBL_MANT_DIG != 106. (test_main_complex_long_double_add_overflow): Likewise. (test_main_long_double_sub_overflow): Likewise. (test_main_complex_long_double_sub_overflow): Likewise. (NOT_LDBL_EPSILON_2): Likewise. (NOT_MINUS_LDBL_EPSILON_2): Likewise. Maciej gcc-r216437-azanella-rs6000-atomic-assign-expand-env-update.diff Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c === --- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 2014-10-22 21:59:45.788954624 +0100 +++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 2014-10-22 21:59:15.788143775 +0100 @@ -331,11 +331,11 @@ TEST_FUNCS (complex_double_div_overflow, TEST_FUNCS (long_double_add_invalid, long double, , += __builtin_infl (), 0, 0, __builtin_isinf, 0, -__builtin_infl (), FE_INVALID) +#if LDBL_MANT_DIG != 106 TEST_FUNCS (long_double_add_overflow, long double, , += LDBL_MAX, 0, LDBL_MAX, __builtin_isinf, FE_OVERFLOW | FE_INEXACT, 0, 0) #define NOT_LDBL_EPSILON_2(X) ((X) != LDBL_EPSILON / 2) -#if LDBL_MANT_DIG != 106 TEST_FUNCS (long_double_add_inexact, long double, , += LDBL_EPSILON / 2, 0, 1.0L, NOT_LDBL_EPSILON_2, FE_INEXACT, 0, 0) @@ -348,18 +348,18 @@ TEST_FUNCS (long_double_preinc_inexact, TEST_FUNCS (long_double_postinc_inexact, long double, , ++, 0, LDBL_EPSILON / 2, NOT_MINUS_1, FE_INEXACT, -1, 0) -#endif TEST_FUNCS (complex_long_double_add_overflow, _Complex long double, , += LDBL_MAX, 0, LDBL_MAX, REAL_ISINF, FE_OVERFLOW | FE_INEXACT, 0, 0) +#endif TEST_FUNCS (long_double_sub_invalid, long
[Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab
This is an attempt to migrate IA64 to the newer optabs; however, I found none of the tests in gcc.dg/vect seemed to touch any of the affected patterns, so this is only really tested by building a stage-1 compiler. gcc/ChangeLog: * config/ia64/vect.md (reduc_splus_v2sf): Rename to... (reduc_plus_scal_v2sf): ...this, add a vec_extractv2sf. (reduc_smin_v2sf): Rename to... (reduc_smin_scal_v2sf): ...this, add a vec_extractv2sf. (reduc_smax_v2sf): Rename to... (reduc_smax_scal_v2sf): ...this, add a vec_extractv2sf.
[PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
This migrates the reduction patterns in altivec.md and vector.md to the new names. I've not touched paired.md as I wasn't really sure how to fix that (how do I vec_extractv2sf ?), moreover the testing I did didn't seem to exercise any of those patterns (iow: I'm not sure what would be an appropriate target machine?). I note the reduc_uplus_v16qi (which I've removed, as unsigned and signed addition should be equivalent) differed from reduc_splus_v16qi in using gen_altivec_vsum4ubs rather than gen_altivec_vsum4sbs. Testcases gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c} thus produce assembly which differs from previously (only) in that vsum4ubs becomes vsum4sbs. These tests are still passing so I assume this is OK. The combining of signed and unsigned addition also improves gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c} : these are now reduced using direct vector reduction, rather than with shifts as previously (because there was only a reduc_splus rather than the reduc_uplus these tests looked for). ((Side note: the RTL changes to vector.md are to match the combine patterns in vsx.md; now that we no longer depend upon combine to generate those patterns (as the optab outputs them directly), one might wish to remove the smaller pattern from vsx.md, and/or simplify the RTL. I theorize that a reduction of a two-element vector is just adding the first element to the second, so maybe to something like [(parallel [(set (match_operand:DF 0 vfloat_operand ) (VEC_reduc:V2DF (vec_select:DF (match_operand:V2DF 1 vfloat_operand ) (parallel [(const_int 1)])) (vec_select:DF (match_dup 1) (parallel [(const_int 0)] (clobber (match_scratch:V2DF 2 ))])] but I think it's best for me to leave that to the port maintainers.)) Bootstrapped and check-gcc on powerpc64-none-linux-gnu (gcc110.fsffrance.org, with thanks to the GCC Compile Farm).
gcc/ChangeLog: * config/rs6000/altivec.md (reduc_splus_mode): Rename to... (reduc_plus_scal_mode): ...this, and rs6000_expand_vector_extract. (reduc_uplus_v16qi): Remove. * config/rs6000/vector.md (VEC_reduc_name): change splus to plus (reduc_VEC_reduc_name_v2df): Rename to... (reduc_VEC_reduc_name_scal_v2df): ...this, wrap VEC_reduc in a vec_select of element 1. (reduc_VEC_reduc_name_v4sf): Rename to... (reduc_VEC_reduc_name_scal_v4sf): ...this, wrap VEC_reduc in a vec_select of element 3, add scratch register.
Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Ooops, attached.commit e48d59399722ce8316d4b1b4f28b40d87b1193fa Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Oct 7 15:28:47 2014 +0100 PowerPC v2 (but not paired.md) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 02ea142..92bb5d0 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -2596,35 +2596,22 @@ operands[3] = gen_reg_rtx (GET_MODE (operands[0])); }) -(define_expand reduc_splus_mode - [(set (match_operand:VIshort 0 register_operand =v) +(define_expand reduc_plus_scal_mode + [(set (match_operand:VI_scalar 0 register_operand =v) (unspec:VIshort [(match_operand:VIshort 1 register_operand v)] UNSPEC_REDUC_PLUS))] TARGET_ALTIVEC { rtx vzero = gen_reg_rtx (V4SImode); rtx vtmp1 = gen_reg_rtx (V4SImode); - rtx dest = gen_lowpart (V4SImode, operands[0]); + rtx vtmp2 = gen_reg_rtx (MODEmode); + rtx dest = gen_lowpart (V4SImode, vtmp2); + HOST_WIDE_INT last_elem = GET_MODE_NUNITS (MODEmode) - 1; emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); emit_insn (gen_altivec_vsum4sVI_chars (vtmp1, operands[1], vzero)); emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero)); - DONE; -}) - -(define_expand reduc_uplus_v16qi - [(set (match_operand:V16QI 0 register_operand =v) -(unspec:V16QI [(match_operand:V16QI 1 register_operand v)] - UNSPEC_REDUC_PLUS))] - TARGET_ALTIVEC -{ - rtx vzero = gen_reg_rtx (V4SImode); - rtx vtmp1 = gen_reg_rtx (V4SImode); - rtx dest = gen_lowpart (V4SImode, operands[0]); - - emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero)); - emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero)); + rs6000_expand_vector_extract (operands[0], vtmp2, last_elem); DONE; }) diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 237724e..54b18aa 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -81,7 +81,7 @@ ;; Vector reduction code iterators (define_code_iterator VEC_reduc 
[plus smin smax]) -(define_code_attr VEC_reduc_name [(plus splus) +(define_code_attr VEC_reduc_name [(plus plus) (smin smin) (smax smax)]) @@ -1077,18 +1077,20 @@ ;; Vector reduction expanders for VSX -(define_expand reduc_VEC_reduc_name_v2df - [(parallel [(set (match_operand:V2DF 0 vfloat_operand ) - (VEC_reduc:V2DF - (vec_concat:V2DF - (vec_select:DF - (match_operand:V2DF 1 vfloat_operand ) - (parallel [(const_int 1)])) - (vec_select:DF - (match_dup 1) - (parallel [(const_int 0)]))) - (match_dup 1))) - (clobber (match_scratch:V2DF 2 ))])] +(define_expand reduc_VEC_reduc_name_scal_v2df + [(parallel [(set (match_operand:DF 0 vfloat_operand ) + (vec_select:DF + (VEC_reduc:V2DF + (vec_concat:V2DF + (vec_select:DF + (match_operand:V2DF 1 vfloat_operand ) + (parallel [(const_int 1)])) + (vec_select:DF + (match_dup 1) + (parallel [(const_int 0)]))) + (match_dup 1)) + (parallel [(const_int 1)]))) + (clobber (match_scratch:DF 2 ))])] VECTOR_UNIT_VSX_P (V2DFmode) ) @@ -1099,13 +1101,16 @@ ; is to allow us to use a code iterator, but not completely list all of the ; vector rotates, etc. to prevent canonicalization -(define_expand reduc_VEC_reduc_name_v4sf - [(parallel [(set (match_operand:V4SF 0 vfloat_operand ) - (VEC_reduc:V4SF - (unspec:V4SF [(const_int 0)] UNSPEC_REDUC) - (match_operand:V4SF 1 vfloat_operand ))) +(define_expand reduc_VEC_reduc_name_scal_v4sf + [(parallel [(set (match_operand:SF 0 vfloat_operand ) + (vec_select:SF + (VEC_reduc:V4SF + (unspec:V4SF [(const_int 0)] UNSPEC_REDUC) + (match_operand:V4SF 1 vfloat_operand )) + (parallel [(const_int 3)]))) (clobber (match_scratch:V4SF 2 )) - (clobber (match_scratch:V4SF 3 ))])] + (clobber (match_scratch:V4SF 3 )) + (clobber (match_scratch:V4SF 4 ))])] VECTOR_UNIT_VSX_P (V4SFmode) )
Re: [Protopatch 11/11][IA64] Migrate to reduc_(plus|min|max)_scal_v2df optab
Ooops, attached.commit 56296417b9f6795e541b1101dce6e6ac1789de9a Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Oct 8 15:58:27 2014 +0100 IA64 (?!) diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md index e3ce292..45f4156 100644 --- a/gcc/config/ia64/vect.md +++ b/gcc/config/ia64/vect.md @@ -1217,45 +1217,54 @@ fpmin %0 = %1, %2 [(set_attr itanium_class fmisc)]) -(define_expand reduc_splus_v2sf - [(match_operand:V2SF 0 fr_register_operand ) +(define_expand reduc_plus_scal_v2sf + [(match_operand:SF 0 fr_register_operand ) (match_operand:V2SF 1 fr_register_operand )] { rtx tmp = gen_reg_rtx (V2SFmode); + rtx tmp2 = gen_reg_rtx (V2SFmode); + if (TARGET_BIG_ENDIAN) emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1])); else emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode))); - emit_insn (gen_addv2sf3 (operands[0], operands[1], tmp)); + emit_insn (gen_addv2sf3 (tmp2, operands[1], tmp)); + emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0))); DONE; }) -(define_expand reduc_smax_v2sf - [(match_operand:V2SF 0 fr_register_operand ) +(define_expand reduc_smax_scal_v2sf + [(match_operand:SF 0 fr_register_operand ) (match_operand:V2SF 1 fr_register_operand )] { rtx tmp = gen_reg_rtx (V2SFmode); + rtx tmp2 = gen_reg_rtx (V2SFmode); + if (TARGET_BIG_ENDIAN) emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), operands[1])); else emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode))); - emit_insn (gen_smaxv2sf3 (operands[0], operands[1], tmp)); + emit_insn (gen_smaxv2sf3 (tmp2, operands[1], tmp)); + emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0))); DONE; }) -(define_expand reduc_smin_v2sf - [(match_operand:V2SF 0 fr_register_operand ) +(define_expand reduc_smin_scal_v2sf + [(match_operand:SF 0 fr_register_operand ) (match_operand:V2SF 1 fr_register_operand )] { rtx tmp = gen_reg_rtx (V2SFmode); + rtx tmp2 = gen_reg_rtx (V2SFmode); + if (TARGET_BIG_ENDIAN) emit_insn (gen_fswap (tmp, CONST0_RTX (V2SFmode), 
operands[1])); else emit_insn (gen_fswap (tmp, operands[1], CONST0_RTX (V2SFmode))); - emit_insn (gen_sminv2sf3 (operands[0], operands[1], tmp)); + emit_insn (gen_sminv2sf3 (tmp2, operands[1], tmp)); + emit_insn (gen_vec_extractv2sf (operands[0], tmp2, GEN_INT (0))); DONE; })
[v3] Minor atomic tweaks
Hi, tested x86_64-linux. Thanks, Paolo. /// 2014-10-24 Paolo Carlini paolo.carl...@oracle.com * include/bits/atomic_base.h: Avoid including stdbool.h. * include/std/atomic: When __cplusplus 201103L skip the rest of the header. * testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc: Adjust. Index: include/bits/atomic_base.h === --- include/bits/atomic_base.h (revision 216624) +++ include/bits/atomic_base.h (working copy) @@ -33,7 +33,6 @@ #pragma GCC system_header #include bits/c++config.h -#include stdbool.h #include stdint.h #include bits/atomic_lockfree_defines.h Index: include/std/atomic === --- include/std/atomic (revision 216624) +++ include/std/atomic (working copy) @@ -36,7 +36,7 @@ #if __cplusplus 201103L # include bits/c++0x_warning.h -#endif +#else #include bits/atomic_base.h @@ -1129,4 +1129,6 @@ _GLIBCXX_END_NAMESPACE_VERSION } // namespace -#endif +#endif // C++11 + +#endif // _GLIBCXX_ATOMIC Index: testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc === --- testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(revision 216624) +++ testsuite/29_atomics/headers/atomic/std_c++0x_neg.cc(working copy) @@ -18,7 +18,7 @@ // with this library; see the file COPYING3. If not see // http://www.gnu.org/licenses/. -#include atomic // { dg-excess-errors In file included from } +#include atomic // { dg-error ISO C.. 2011 { target *-*-* } 32 }
Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Alan Lawrence alan.lawre...@arm.com writes: Rainer Orth wrote: However, as a quick first step, does adding the ilp32 / lp64 (and keeping the architectures list for now) solve the immediate problem? Patch attached, OK for trunk? No, as I said this is wrong for biarch targets like sparc and i386. When you say no this does not solve the immediate problem, are you saying that you are (still) seeing test failures with the require-effective-target patch applied? Or is the issue that this would not execute the tests as I didn't try that patch yet, but the target part is wrong, as I tried to explain. Consider the sparc case: * if you configure for sparc-sun-solaris2.11, you default to -m32 (i.e. ilp32), while -m64 is lp64 * if you configure for sparcv9-sun-solaris2.11 instead, you default to -m64 (lp64), but get ilp32 with -m32 So, irrespective of the sparc vs. sparc64 (which is wrong, btw., the canonical form for 64-bit-default sparc is sparcv9) forms, you can get ilp32 and lp64 with both. Similar issues hold for i?86 vs. x86_64 and probably other biarch targets like powerpc vs. powerpc64, so you need to use the most generic forms of the target names in your target lists. widely as might be possible? In principle I'm quite happy to relax the target patterns, although have been having issues with sparc (below)... Re. what the architectures have in common is largely that these are the primary/secondary archs on which I've checked the test passes! I can now add mips and microblaze to this list, however I'm nervous of dropping the target entirely given the very large number of target architectures gcc supports; and e.g. IA64 (in ILP32 mode) generates an ashiftrt:DI by 31 places, not ashiftrt:SI, which does not match the simplification criteria in combine.c. As I stated before, such target lists without any explanation are bound to confuse future readers/testers: at the very least, add comments explaining what those lists have in common.
OTOH, at this stage it might be best to just drop the target list for now, learn which targets pass and fail the tests, and then reintroduce them or, better yet, add an effective-target keyword which matches them. Otherwise, you'll never get test coverage beyond your current list. This should be something like { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } E.g. sparc-sun-solaris2.11 with -m64 is lp64, but would be excluded by your target list. Keep the list sorted alphabetically and best add an explanation so others know what those targets have in common. So I've built a stage-1 compiler with --target=sparc-sun-solaris2.11, and I find * without -m64, my dg-require-effective-target ilp32 causes the 32-bit test to execute, and pass; dg-require-effective-target lp64 prevents execution of the 64-bit test (which would fail) - so all as expected and desired. * with -lp64, behaviour is as previous (this is probably expected) Huh? What's -lp64? * with -m64, dg-require-effective-target ilp32 still causes the test to execute (but it fails, as the RTL now has an ashiftrt:DI by 31 places, which doesn't meet the simplification criteria in combine.c - this is pretty much as expected). dg-require-effective-target lp64 stops the 64-bit test from executing however (despite that it would now pass). Can you clarify what I should be doing on sparc, therefore? It's not only about sparc, but about all biarch targets. 
The following patch (which only includes the parts strictly necessary to avoid the failures, nothing else I suggested above) works for me on sparc-sun-solaris2.11 (-m32 and -m64), x86_64-unknown-linux-gnu (-m64 and -m32), and i686-unknown-linux-gnu (-m32 and -m64): the first test is run for 64-bit only, while the second one only for 32-bit: diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c --- a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c +++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c @@ -1,4 +1,5 @@ -/* { dg-do compile {target sparc64*-*-* aarch64*-*-* x86_64-*-* powerpc64*-*-*} } */ +/* { dg-do compile { target aarch64*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */ +/* { dg-require-effective-target lp64 } */ /* { dg-options -O2 -fdump-rtl-combine-all } */ typedef long long int int64_t; diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c --- a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c +++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c @@ -1,4 +1,5 @@ -/* { dg-do compile {target arm*-*-* i?86-*-* powerpc-*-* sparc-*-*} } */ +/* { dg-do compile { target arm*-*-* i?86-*-* powerpc*-*-* sparc*-*-* x86_64-*-* } } */ +/* { dg-require-effective-target ilp32 } */ /* { dg-options -O2 -fdump-rtl-combine-all } */ typedef long int32_t; Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][6/n] Merge from match-and-simplify, make forwprop fold all stmts
This patch makes GIMPLE forwprop fold all statements, following single-use SSA edges only (as suggested by Jeff and certainly how this will regress the least until we replace manual simplification code that does not restrict itself this way). forwprop is run up to 4 times at the moment (once only for -Og, not at all for -O0), which still seems reasonable. IMHO the forwprop pass immediately after inlining is somewhat superfluous, it was added there just for its ADDR_EXPR propagation. We should eventually split this pass into two. Note that just folding what we propagated into (like the SSA propagators do during substitute-and-fold phase) will miss cases where we propagate into a stmt feeding the one we could simplify. Unless we always fold all single-use (and their use) stmts we have to fold everything from time to time. Changing how / when we fold stuff is certainly sth to look after with fold_stmt now being able to follow SSA edges. Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress. From earlier testing I remember I need to adjust a few testcases that don't expect the early folding - notably two strlenopt cases (previously XFAILed but then PASSed again). I also expect to massage the single-use heuristic as I get to merging the patterns I added for the various forwprop manual pattern matchings to trunk (a lot of them do not restrict themselves this way). Does this otherwise look ok? Thanks, Richard. 2014-10-24 Richard Biener rguent...@suse.de * tree-ssa-forwprop.c: Include tree-cfgcleanup.h and tree-into-ssa.h. (lattice): New global. (fwprop_ssa_val): New function. (fold_all_stmts): Likewise. (pass_forwprop::execute): Finally fold all stmts. Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (svn+ssh://rgue...@gcc.gnu.org/svn/gcc/trunk/gcc/tree-ssa-forwprop.c) (revision 216631) +++ gcc/tree-ssa-forwprop.c (.../gcc/tree-ssa-forwprop.c) (working copy) @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3. 
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dom.h"
 #include "builtins.h"
+#include "tree-cfgcleanup.h"
+#include "tree-into-ssa.h"

 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -3586,6 +3588,93 @@ simplify_mult (gimple_stmt_iterator *gsi
   return false;
 }
+
+
+/* Const-and-copy lattice for fold_all_stmts.  */
+static vec<tree> lattice;
+
+/* Primitive lattice function for gimple_simplify.  */
+
+static tree
+fwprop_ssa_val (tree name)
+{
+  /* First valueize NAME.  */
+  if (TREE_CODE (name) == SSA_NAME
+      && SSA_NAME_VERSION (name) < lattice.length ())
+    {
+      tree val = lattice[SSA_NAME_VERSION (name)];
+      if (val)
+	name = val;
+    }
+  /* If NAME is not the only use signal we don't want to continue
+     matching into its definition.  */
+  if (TREE_CODE (name) == SSA_NAME
+      && !has_single_use (name))
+    return NULL_TREE;
+  return name;
+}
+
+/* Fold all stmts using fold_stmt following only single-use chains
+   and using a simple const-and-copy lattice.  */
+
+static bool
+fold_all_stmts (struct function *fun)
+{
+  bool cfg_changed = false;
+
+  /* Combine stmts with the stmts defining their operands.  Do that
+     in an order that guarantees visiting SSA defs before SSA uses.  */
+  lattice.create (num_ssa_names);
+  lattice.quick_grow_cleared (num_ssa_names);
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun));
+  int postorder_num = inverted_post_order_compute (postorder);
+  for (int i = 0; i < postorder_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]);
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple stmt = gsi_stmt (gsi);
+	  gimple orig_stmt = stmt;
+
+	  if (fold_stmt (&gsi, fwprop_ssa_val))
+	    {
+	      stmt = gsi_stmt (gsi);
+	      if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt)
+		  && gimple_purge_dead_eh_edges (bb))
+		cfg_changed = true;
+	      /* Cleanup the CFG if we simplified a condition to
+		 true or false.  */
+	      if (gimple_code (stmt) == GIMPLE_COND
+		  && (gimple_cond_true_p (stmt)
+		      || gimple_cond_false_p (stmt)))
+		cfg_changed = true;
+	      update_stmt (stmt);
+	    }
+
+	  /* Fill up the lattice.  */
+	  if (gimple_assign_single_p (stmt))
+	    {
+	      tree lhs = gimple_assign_lhs (stmt);
+	      tree rhs = gimple_assign_rhs1 (stmt);
+	      if (TREE_CODE (lhs) == SSA_NAME)
+		{
+		  if (TREE_CODE (rhs) == SSA_NAME)
+		    lattice[SSA_NAME_VERSION (lhs)] = fwprop_ssa_val (rhs);
+		  else if (is_gimple_min_invariant (rhs))
+
Re: Patch committed: Don't define TARGET_HAS_F_SETLKW
Ian Taylor i...@golang.org writes: 2014-10-23 Ian Lance Taylor i...@google.com * config/mep/mep.h (TARGET_HAS_F_SETLKW): Don't define. s/define/undefine/ Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
[PATCH] Fix modulo patterns in match.pd
As noted by Marc I forgot to actually utilize the iterator variable. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. PS: How do we want to refer to patterns in ChangeLogs?

2014-10-24  Richard Biener  rguent...@suse.de

	* match.pd (0 % X): Properly use the iterator iterating over
	all modulo operators.
	(X % 1): Likewise.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 216648)
+++ gcc/match.pd	(working copy)
@@ -64,13 +64,13 @@ (define_predicates
 (for op (ceil_mod floor_mod round_mod trunc_mod)
  /* 0 % X is always zero.  */
  (simplify
-  (trunc_mod integer_zerop@0 @1)
+  (op integer_zerop@0 @1)
   /* But not for 0 % 0 so that we can get the proper warnings and
      errors.  */
   (if (!integer_zerop (@1))
    @0))
 /* X % 1 is always zero.  */
  (simplify
-  (trunc_mod @0 integer_onep)
+  (op @0 integer_onep)
  { build_zero_cst (type); }))

 /* x | ~0 -> ~0 */
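For readers unfamiliar with the match.pd iterator syntax: a `(for ...)` block instantiates the enclosed `(simplify ...)` once per listed operator, substituting the iterator variable. A sketch (not actual GCC source) of the effective expansion after the fix:

```
(for op (ceil_mod floor_mod round_mod trunc_mod)
 /* X % 1 is always zero.  */
 (simplify
  (op @0 integer_onep)
  { build_zero_cst (type); }))

;; behaves as if one had written, for each operator in the list:
(simplify (ceil_mod  @0 integer_onep) { build_zero_cst (type); })
(simplify (floor_mod @0 integer_onep) { build_zero_cst (type); })
(simplify (round_mod @0 integer_onep) { build_zero_cst (type); })
(simplify (trunc_mod @0 integer_onep) { build_zero_cst (type); })
```

The bug being fixed was that the body still hard-coded `trunc_mod`, so every instantiation of the loop matched only `trunc_mod` and the other three modulo operators were never simplified.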
Re: [PATCH] Fix modulo patterns in match.pd
On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote: As noted by Marc I forgot to actually utilize the iterator variable. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. PS: How do we want to refer to patterns in ChangeLogs? Perhaps the syntax should be (simplify name (...) { ... }) (maybe the name being optional?), where you'd give some name to the simplification, say 0 % X or 0 % X = 0 or 0 % X variant 3 or whatever, then you could easily refer to those strings in ChangeLog, on gcc-patches, in comments etc. 2014-10-24 Richard Biener rguent...@suse.de * match.pd (0 % X): Properly use the iterator iterating over all modulo operators. (X % 1): Likewise. Index: gcc/match.pd === --- gcc/match.pd (revision 216648) +++ gcc/match.pd (working copy) @@ -64,13 +64,13 @@ (define_predicates (for op (ceil_mod floor_mod round_mod trunc_mod) /* 0 % X is always zero. */ (simplify - (trunc_mod integer_zerop@0 @1) + (op integer_zerop@0 @1) /* But not for 0 % 0 so that we can get the proper warnings and errors. */ (if (!integer_zerop (@1)) @0)) /* X % 1 is always zero. */ (simplify - (trunc_mod @0 integer_onep) + (op @0 integer_onep) { build_zero_cst (type); })) /* x | ~0 - ~0 */ Jakub
Re: [PATCH] Fix modulo patterns in match.pd
On Fri, 24 Oct 2014, Jakub Jelinek wrote: On Fri, Oct 24, 2014 at 03:27:19PM +0200, Richard Biener wrote: As noted by Marc I forgot to actually utilize the iterator variable. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. PS: How do we want to refer to patterns in ChangeLogs? Perhaps the syntax should be (simplify name (...) { ... }) (maybe the name being optional?), where you'd give some name to the simplification, say 0 % X or 0 % X = 0 or 0 % X variant 3 or whatever, then you could easily refer to those strings in ChangeLog, on gcc-patches, in comments etc. I ripped out optional name support when I added user-defined predicates which look like (match truth_valued_p (truth_not @0)) or (match (logical_inverted_value @0) (bit_not truth_valued_p@0)) (un-)conveniently the parsers for (simplify...) and (match...) are shared. I can see to re-add the optional pattern naming. OTOH it will be fun to invent an unique name for each of them ;) (patternN anyone? ...) Richard. 2014-10-24 Richard Biener rguent...@suse.de * match.pd (0 % X): Properly use the iterator iterating over all modulo operators. (X % 1): Likewise. Index: gcc/match.pd === --- gcc/match.pd(revision 216648) +++ gcc/match.pd(working copy) @@ -64,13 +64,13 @@ (define_predicates (for op (ceil_mod floor_mod round_mod trunc_mod) /* 0 % X is always zero. */ (simplify - (trunc_mod integer_zerop@0 @1) + (op integer_zerop@0 @1) /* But not for 0 % 0 so that we can get the proper warnings and errors. */ (if (!integer_zerop (@1)) @0)) /* X % 1 is always zero. */ (simplify - (trunc_mod @0 integer_onep) + (op @0 integer_onep) { build_zero_cst (type); })) /* x | ~0 - ~0 */
[PATCHv5][Kasan] Allow to override Asan shadow offset from command line
Hi all, On 10/17/2014 11:53 AM, Yury Gribov wrote: On 09/29/2014 09:21 PM, Yury Gribov wrote: Kasan developers have asked for an option to override the offset of the Asan shadow memory region. This should simplify experimenting with memory layouts on 64-bit architectures. New patch which checks that -fasan-shadow-offset is only enabled for -fsanitize=kernel-address. I (unfortunately) can't make this a --param because this can be a 64-bit value. New patchset that adds strtoull to libiberty (blind copy-paste of the already existing strtoul.c) and uses it to parse -fasan-shadow-offset (to avoid problems with compiling for a 64-bit target on a 32-bit host). A new version of the patchset which does a proper implementation of strtoll/strtoull in libiberty (with tests, docs and stuff). Bootstrapped and regtested on x64. As mentioned previously, I'm not sure how to properly test the strtoll implementation (strtoll is already part of Linux glibc so my implementation is not compiled in by default). I've manually embedded strtoll.o/strtoull.o into libiberty.a and verified that regression tests passed. -Y

From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001
From: Yury Gribov y.gri...@samsung.com
Date: Thu, 16 Oct 2014 18:31:10 +0400
Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty.

2014-10-20  Yury Gribov  y.gri...@samsung.com

include/
	* libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.

libiberty/
	* strtoll.c: New file.
	* strtoull.c: New file.
	* configure.ac: Add long long checks.  Add harness for strtoll
	and strtoull.  Check decls for strtol, strtoul, strtoll, strtoull.
	* Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
	* config.in: Regenerate.
	* configure: Regenerate.
	* functions.texi: Regenerate.
	* testsuite/Makefile.in (check-strtol): New rule.
	(test-strtol): Likewise.
	(mostlyclean): Clean up strtol test.
	* testsuite/test-strtol.c: New test.
---
 include/libiberty.h               |  27 ++
 libiberty/Makefile.in             |  46 +++---
 libiberty/config.in               |  31 +++
 libiberty/configure               | 122 +++-
 libiberty/configure.ac            |  14 ++-
 libiberty/functions.texi          |  18
 libiberty/strtoll.c               | 175 +++
 libiberty/strtoull.c              | 122
 libiberty/testsuite/Makefile.in   |  12 ++-
 libiberty/testsuite/test-strtol.c | 184 +
 10 files changed, 733 insertions(+), 18 deletions(-)
 create mode 100644 libiberty/strtoll.c
 create mode 100644 libiberty/strtoull.c
 create mode 100644 libiberty/testsuite/test-strtol.c

diff --git a/include/libiberty.h b/include/libiberty.h
index d09c9a5..26355a9 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -655,6 +655,33 @@ extern size_t strnlen (const char *, size_t);
 extern int strverscmp (const char *, const char *);
 #endif

+#if defined(HAVE_DECL_STRTOL) && !HAVE_DECL_STRTOL
+extern long int strtol (const char *nptr,
+                        char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOUL) && !HAVE_DECL_STRTOUL
+extern unsigned long int strtoul (const char *nptr,
+                                  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOLL) && !HAVE_DECL_STRTOLL
+__extension__
+extern long long int strtoll (const char *nptr,
+                              char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOULL) && !HAVE_DECL_STRTOULL
+__extension__
+extern unsigned long long int strtoull (const char *nptr,
+                                        char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP
+/* Compare version strings.
*/ +extern int strverscmp (const char *, const char *); +#endif + /* Set the title of a process */ extern void setproctitle (const char *name, ...); diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in index 9b87720..1b0d8ae 100644 --- a/libiberty/Makefile.in +++ b/libiberty/Makefile.in @@ -152,8 +152,8 @@ CFILES = alloca.c argv.c asprintf.c atexit.c\ spaces.c splay-tree.c stack-limit.c stpcpy.c stpncpy.c \ strcasecmp.c strchr.c strdup.c strerror.c strncasecmp.c \ strncmp.c strrchr.c strsignal.c strstr.c strtod.c strtol.c \ - strtoul.c strndup.c strnlen.c strverscmp.c \ - timeval-utils.c tmpnam.c \ + strtoll.c strtoul.c strtoull.c strndup.c strnlen.c \ + strverscmp.c timeval-utils.c tmpnam.c\ unlink-if-ordinary.c \ vasprintf.c vfork.c vfprintf.c vprintf.c vsnprintf.c vsprintf.c \ waitpid.c \ @@ -219,8 +219,8 @@ CONFIGURED_OFILES = ./asprintf.$(objext) ./atexit.$(objext) \ ./strchr.$(objext) ./strdup.$(objext) ./strncasecmp.$(objext) \ ./strncmp.$(objext) ./strndup.$(objext) ./strnlen.$(objext) \ ./strrchr.$(objext) ./strstr.$(objext) ./strtod.$(objext) \ - ./strtol.$(objext) ./strtoul.$(objext) ./strverscmp.$(objext)
Re: [patch,avr] tweak sign extensions, take #2
2014-10-24 14:37 GMT+04:00 Georg-Johann Lay a...@gjlay.de: On 10/23/2014 08:16 PM, Denis Chertykov wrote: This optimization makes most sign-extensions one instruction shorter in the case when the source register may be clobbered and the register numbers are different. Source and destination may overlap. Ok for trunk? Johann

gcc/
	* config/avr/avr.md (extendqihi2, extendqipsi2, extendqisi2)
	(extendhipsi2, extendhisi2): Optimize if source reg is unused
	after the insns and has different REGNO than destination.

Approved. Denis. Finally I switched to a solution that avoids all the ugly asm snippets and special casing, and which is exact w.r.t. code size. So allow me to drop the patch from above and to propose this one for trunk. Sorry for the inconvenience. In any case it uses the LSL/SBC idiom instead of the old CLR/SBRC/COM. Johann

	* avr-protos.h (avr_out_sign_extend): New.
	* avr.c (avr_adjust_insn_length) [ADJUST_LEN_SEXT]: Handle.
	(avr_out_sign_extend): New function.
	* avr.md (extendqihi2, extendqipsi2, extendqisi2, extendhipsi2)
	(extendhisi2, extendpsisi2): Use it.
	(adjust_len) [sext]: New.

I agree with you, it's better. Approved. Denis.
Re: [PATCHv5][Kasan] Allow to override Asan shadow offset from command line
On Fri, Oct 24, 2014 at 05:56:37PM +0400, Yury Gribov wrote: From 1882c41de6c8ae53b7e199b3cc655b6f4b31e8fb Mon Sep 17 00:00:00 2001 From: Yury Gribov y.gri...@samsung.com Date: Thu, 16 Oct 2014 18:31:10 +0400 Subject: [PATCH 1/2] Add strtoll and strtoull to libiberty. 2014-10-20 Yury Gribov y.gri...@samsung.com include/ * libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes. libiberty/ * strtoll.c: New file. * strtoull.c: New file. * configure.ac: Add long long checks. Add harness for strtoll and strtoull. Check decls for strtol, strtoul, strtoll, strtoull. * Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull. * config.in: Regenerate. * configure: Regenerate. * functions.texi: Regenerate. * testsuite/Makefile.in (check-strtol): New rule. (test-strtol): Likewise. (mostlyclean): Clean up strtol test. * testsuite/test-strtol.c: New test.

Ian, can you please review this?

--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -883,6 +883,10 @@ fsanitize=
 Common Driver Report Joined
 Select what to sanitize

+fasan-shadow-offset=
+Common Joined RejectNegative Var(common_deferred_options) Defer
+-fasan-shadow-offset=<string>	Use custom shadow memory offset.

Shouldn't that be =<number> or =<address> instead of <string>?

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -297,7 +297,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Debugging Options,,Options for Debugging Your Program or GCC}.
 @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion
 -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
--fsanitize-undefined-trap-on-error @gol
+-fasan-shadow-offset=@var{string} -fsanitize-undefined-trap-on-error @gol

Likewise here, @var{number} instead.
-fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol -fdisable-ipa-@var{pass_name} @gol -fdisable-rtl-@var{pass_name} @gol @@ -5642,6 +5642,12 @@ While @option{-ftrapv} causes traps for signed overflows to be emitted, @option{-fsanitize=undefined} gives a diagnostic message. This currently works only for the C family of languages. +@item -fasan-shadow-offset=@var{string} And here. Otherwise looks good to me. Jakub
Re: [PATCH][optabs] PR63442 libgcc_cmp_return_mode not always return word_mode
ping~ thanks. Regards, Jiong On 17/10/14 13:04, Jiong Wang wrote: The cause should be one minor bug in prepare_cmp_insn. The last mode parameter pmode of prepare_cmp_insn should match the mode of the first parameter x, but during the recursive call of prepare_cmp_insn, x has the mode of targetm.libgcc_cmp_return_mode () while pmode is assigned word_mode. Generally this is OK, because the default libgcc_cmp_return_mode hook always returns word_mode, but AArch64 has a target-private implementation which always returns SImode, so there is a mismatch which causes an ICE later. This minor issue was hidden because nearly all other targets use the default hook, and the compare is rarely invoked. Thanks

gcc/
	PR target/63442
	* optabs.c (prepare_cmp_insn): Use target hook libgcc_cmp_return_mode
	instead of word_mode.
Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode
The following patch aligns the stack for mcount, and there should be no problems with unwind as ix86_frame_pointer_required is true when crtl->profile is true and flag_fentry is false (we call mcount after the function prologue). When flag_fentry is true it is set to false in 32-bit PIC mode:

  if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
    {
      if (opts->x_flag_fentry > 0)
	sorry ("-mfentry isn%'t supported for 32-bit in combination "
	       "with -fpic");
      opts->x_flag_fentry = 0;
    }

2014-10-24  Evgeny Stupachenko  evstu...@gmail.com

	PR target/63534
	* config/i386/i386.c (x86_function_profiler): Add GOT register
	init for mcount call.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6235c4f..2dff29c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
       else
 	x86_print_call_or_nop (file, mcount_name);
     }
+  /* At this stage we can't determine where the GOT register is, as RA can
+     allocate it to any hard register.  Therefore we need to set it once
+     again.  */
   else if (flag_pic)
     {
+      pic_labels_used |= 1 << BX_REG;
+      fprintf (file, "\tsub\t$16, %%esp\n");
+      fprintf (file, "\tmovl\t%%ebx, (%%esp)\n");
+      fprintf (file, "\tcall\t__x86.get_pc_thunk.bx\n");
+      fprintf (file, "\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
 #ifndef NO_PROFILE_COUNTERS
       fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
 	       LPREFIX, labelno);
 #endif
       fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
+      fprintf (file, "\tmovl\t(%%esp), %%ebx\n");
+      fprintf (file, "\tadd\t$16, %%esp\n");
     }
   else
     {

On Fri, Oct 17, 2014 at 6:38 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 17, 2014 at 06:30:42PM +0400, Evgeny Stupachenko wrote: Hi, The patch fixes profile in 32bits PIC mode (only -p option affected). x86 bootstrap, make check passed spec2000 o2 -p train data on Corei7: CINT -5% CFP +1,5 compared to a compiler before enabling ebx.
There is a potential performance improve after the patch applied suggested by Jakub: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534#c8 There is opened bug on this: PR63527. However the fix of the bug is more complicated. Is it ok? Unfortunately I don't think it is ok. 1) you don't set the appropriate bit in pic_labels_used (for ebx) 2) more importantly, it causes the stack to be misaligned (i.e. violating ABI) for the _mcount call, and, break unwind info. 2014-10-16 Evgeny Stupachenko evstu...@gmail.com PR target/63534 * config/i386/i386.c (x86_function_profiler): Add GOT register init for mcount call. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a3ca2ed..5117572 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39119,11 +39126,15 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) } else if (flag_pic) { + fprintf (file,\tpush\t%%ebx\n); + fprintf (file,\tcall\t__x86.get_pc_thunk.bx\n); + fprintf (file,\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n); #ifndef NO_PROFILE_COUNTERS fprintf (file, \tleal\t%sP%d@GOTOFF(%%ebx),%% PROFILE_COUNT_REGISTER \n, LPREFIX, labelno); #endif fprintf (file, 1:\tcall\t*%s@GOT(%%ebx)\n, mcount_name); + fprintf (file,\tpop\t%%ebx\n); } else { Jakub
Unifying std::atomic_int and std::atomic<int>
Our <atomic> was implemented (by Benjamin IIRC) based on an early C++0x draft when the spec was still trying to be valid for both C and C++. Part of the C compatibility aspect was that std::atomic_int is allowed to be either a typedef for std::atomic<int> or a base class of it, so that a C library could define std::atomic_int and then the C++ library could make std::atomic<int> derive from that. In the final C11 spec atomics work completely differently, and atomic_int is a typedef for _Atomic int, which is not a valid base class. So the old C++0x draft's compatibility aim is impossible, atomic_int can never be the same type in C and C++. In our implementation, std::atomic_int is a base class of std::atomic<int>, which has no benefit I can see, but causes https://gcc.gnu.org/PR60940 Rather than overloading every atomic_op() non-member function to handle the derived class and the base class, it would be simpler to just get rid of the base classes and make atomic_xxx a typedef for atomic<xxx>, as the attached patch does for atomic_{bool,char,schar}. Does anyone object to that change? If you object, are you prepared to do the work to fix PR60940? :-) [Note:- it could probably be simplified even further so atomic<char> is just:

template<>
struct atomic<char> : public __atomic_base<char>
{ using __atomic_base<char>::__atomic_base; };

But that could be done later as it wouldn't change anything observable, making atomic_char a typedef for atomic<char> is the observable and IMHO important change.
-end note]

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 1fc0ebb..a591c46 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -120,12 +120,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _IntTp>
     struct __atomic_base;

-  /// atomic_char
-  typedef __atomic_base<char> atomic_char;
-
-  /// atomic_schar
-  typedef __atomic_base<signed char> atomic_schar;
-
   /// atomic_uchar
   typedef __atomic_base<unsigned char> atomic_uchar;

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 85dc252..c58853e 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -49,21 +49,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    * @{
    */

-  /// atomic_bool
+  template<typename _Tp>
+    struct atomic;
+
+  /// atomic<bool>
   // NB: No operators or fetch-operations for this type.
-  struct atomic_bool
+  template<>
+  struct atomic<bool>
   {
   private:
     __atomic_base<bool>	_M_base;

   public:
-    atomic_bool() noexcept = default;
-    ~atomic_bool() noexcept = default;
-    atomic_bool(const atomic_bool&) = delete;
-    atomic_bool& operator=(const atomic_bool&) = delete;
-    atomic_bool& operator=(const atomic_bool&) volatile = delete;
+    atomic() noexcept = default;
+    ~atomic() noexcept = default;
+    atomic(const atomic&) = delete;
+    atomic& operator=(const atomic&) = delete;
+    atomic& operator=(const atomic&) volatile = delete;

-    constexpr atomic_bool(bool __i) noexcept : _M_base(__i) { }
+    constexpr atomic(bool __i) noexcept : _M_base(__i) { }

     bool operator=(bool __i) noexcept
@@ -151,6 +155,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     { return _M_base.compare_exchange_strong(__i1, __i2, __m); }
   };

+  /// atomic_bool
+  typedef atomic<bool> atomic_bool;
+
   /**
    * @brief Generic atomic type, primary class template.
@@ -485,31 +492,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };

-  /// Explicit specialization for bool.
-  template<>
-    struct atomic<bool> : public atomic_bool
-    {
-      typedef bool __integral_type;
-      typedef atomic_bool __base_type;
-
-      atomic() noexcept = default;
-      ~atomic() noexcept = default;
-      atomic(const atomic&) = delete;
-      atomic& operator=(const atomic&) = delete;
-      atomic& operator=(const atomic&) volatile = delete;
-
-      constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }
-
-      using __base_type::operator __integral_type;
-      using __base_type::operator=;
-    };
-
   /// Explicit specialization for char.
   template<>
-    struct atomic<char> : public atomic_char
+    struct atomic<char> : public __atomic_base<char>
     {
       typedef char __integral_type;
-      typedef atomic_char __base_type;
+      typedef __atomic_base<char> __base_type;

       atomic() noexcept = default;
       ~atomic() noexcept = default;
@@ -523,12 +511,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       using __base_type::operator=;
     };

+  /// atomic_char
+  typedef atomic<char> atomic_char;
+
   /// Explicit specialization for signed char.
   template<>
-    struct atomic<signed char> : public atomic_schar
+    struct atomic<signed char> : public __atomic_base<signed char>
     {
       typedef signed char __integral_type;
-
[PATCH] Fix typedef-name printing (PR c/56980)
Our current C pretty printer output sometimes looks a bit goofy: expected ‘enum F *’ but argument is of type ‘enum F *’. It's because it always prints struct/union/enum even though the type is a typedef name. This patch ought to fix this. We've got a bunch of reports about this over the years... The C++ printer can also print B* {aka A*}, I'll try to teach c_tree_printer to do something similar as well. Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-10-24  Marek Polacek  pola...@redhat.com

	PR c/56980
	* c-pretty-print.c (c_pretty_printer::simple_type_specifier): Don't
	print struct/union/enum for typedefed names.

	* gcc.dg/pr56980.c: New test.

diff --git gcc/c-family/c-pretty-print.c gcc/c-family/c-pretty-print.c
index 3b2dbc1..9096a07 100644
--- gcc/c-family/c-pretty-print.c
+++ gcc/c-family/c-pretty-print.c
@@ -416,7 +416,9 @@ c_pretty_printer::simple_type_specifier (tree t)
     case UNION_TYPE:
     case RECORD_TYPE:
     case ENUMERAL_TYPE:
-      if (code == UNION_TYPE)
+      if (TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL)
+	/* Don't decorate the type if this is a typedef name.  */;
+      else if (code == UNION_TYPE)
 	pp_c_ws_string (this, "union");
       else if (code == RECORD_TYPE)
 	pp_c_ws_string (this, "struct");

diff --git gcc/testsuite/gcc.dg/pr56980.c gcc/testsuite/gcc.dg/pr56980.c
index e69de29..f48379a 100644
--- gcc/testsuite/gcc.dg/pr56980.c
+++ gcc/testsuite/gcc.dg/pr56980.c
@@ -0,0 +1,24 @@
+/* PR c/56980 */
+/* { dg-do compile } */
+
+typedef struct A { int i; } B;
+typedef union U { int i; } V;
+typedef enum E { G } F;
+
+void foo_s (struct A); /* { dg-message "expected .struct A. but argument is of type .B \\*." } */
+void foo_u (union U); /* { dg-message "expected .union U. but argument is of type .V \\*." } */
+void foo_e (enum E); /* { dg-message "expected .enum E. but argument is of type .F \\*." } */
+void foo_sp (B *); /* { dg-message "expected .B \\*. but argument is of type .struct B \\*." } */
+void foo_up (V *); /* { dg-message "expected .V \\*. but argument is of type .union V \\*." } */
+void foo_ep (F *); /* { dg-message "expected .F \\*. but argument is of type .enum F \\*." } */
+
+void
+bar (B *b, V *v, F *f)
+{
+  foo_s (b); /* { dg-error "incompatible" } */
+  foo_u (v); /* { dg-error "incompatible" } */
+  foo_e (f); /* { dg-error "incompatible" } */
+  foo_sp ((struct B *) b); /* { dg-error "passing argument" } */
+  foo_up ((union V *) v); /* { dg-error "passing argument" } */
+  foo_ep (__extension__ (enum F *) f); /* { dg-error "passing argument" } */
+}

Marek
Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
On 20 Oct 15:19, Ilya Verbin wrote: On 15 Oct 16:23, Richard Biener wrote: +static bool +initialize_offload (void) +{ + bool have_offload = false; + struct cgraph_node *node; + struct varpool_node *vnode; + + FOR_EACH_DEFINED_FUNCTION (node) +if (lookup_attribute (omp declare target, DECL_ATTRIBUTES (node-decl))) + { + have_offload = true; + break; + } + + FOR_EACH_DEFINED_VARIABLE (vnode) +{ + if (!lookup_attribute (omp declare target, + DECL_ATTRIBUTES (vnode-decl)) + || TREE_CODE (vnode-decl) != VAR_DECL + || DECL_SIZE (vnode-decl) == 0) + continue; + have_offload = true; +} + + return have_offload; +} + I wonder if we can avoid the above by means of a global have_offload flag? (or inside gcc::context) +/* Select what needs to be streamed out. In regular lto mode stream everything. + In offload lto mode stream only stuff marked with an attribute. */ +void +select_what_to_stream (bool offload_lto_mode) +{ + struct symtab_node *snode; + FOR_EACH_SYMBOL (snode) +snode-need_lto_streaming + = !offload_lto_mode || lookup_attribute (omp declare target, +DECL_ATTRIBUTES (snode-decl)); I suppose I suggested this already earlier this year. Why keep this artificial attribute when you have a cgraph node flag? + /* If '#pragma omp critical' is inside target region, the symbol must + have an 'omp declare target' attribute. */ + omp_context *octx; + for (octx = ctx-outer; octx; octx = octx-outer) + if (is_targetreg_ctx (octx)) + { + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier (omp declare target), +NULL_TREE, DECL_ATTRIBUTES (decl)); Here - why not set a flag on cgraph_get_node (decl) instead? I thought that select_what_to_stream is exactly what you've suggested. Could you please clarify this? You propose to replace omp declare target attribure with some cgraph node flag like need_offload? But we'll need need_lto_streaming anyway, since for LTO it should be 1 for all nodes, but for offloading it should be equal to need_offload. 
We have to set the global have_offload flag in few places in omp-low.c and in FE (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl, fortran/trans-decl.c:add_attributes_to_decl). This way looks for me a bit more complicated than the current approach. Actually, we could follow Jakub's suggestion of caching the attribute in a bit field, and set the global have_offload flag on the run without any changes in FE. However, I don't know a suitable place for it. If you agree with the approach, could you please specify the place? Thanks, -- Ilya
Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
On Fri, Oct 24, 2014 at 06:16:01PM +0400, Ilya Verbin wrote: We have to set the global have_offload flag in few places in omp-low.c and in FE (c/c-decl.c:c_decl_attributes, fortran/trans-common.c:build_common_decl, fortran/trans-decl.c:add_attributes_to_decl). This way looks for me a bit more complicated than the current approach. Actually, we could follow Jakub's suggestion of caching the attribute in a bit field, and set the global have_offload flag on the run without any changes in FE. However, I don't know a suitable place for it. If you agree with the approach, could you please specify the place? Can't you do that when creating the cgraph or varpool nodes? I'd expect the attribute to be already present on the decls at those spots. Jakub
Re: [PATCH, x86, 63534] Fix '-p' profile for 32 bit PIC mode
On Fri, Oct 24, 2014 at 06:12:15PM +0400, Evgeny Stupachenko wrote: The following patch align stack for mcount and there should be no problems with unwind as ix86_frame_pointer_required is true when crtl-profile is true and flag_fentry is false (we call mcount after function prolog). When flag_fentry is true it is set to false in 32bit PIC mode: if (!TARGET_64BIT_P (opts-x_ix86_isa_flags) opts-x_flag_pic) { if (opts-x_flag_fentry 0) sorry (-mfentry isn%'t supported for 32-bit in combination with -fpic); opts-x_flag_fentry = 0; } What is wrong in emitting the set_got right before the PROLOGUE_END note and that way sharing a single load from both? This looks just as a hack. 2014-10-24 Evgeny Stupachenko evstu...@gmail.com PR target/63534 * config/i386/i386.c (x86_function_profiler): Add GOT register init for mcount call. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6235c4f..2dff29c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) else x86_print_call_or_nop (file, mcount_name); } + /* At this stage we can't detrmine where GOT register is, as RA can allocate + it to any hard register. Therefore we need to set it once again. */ else if (flag_pic) { + pic_labels_used |= 1 BX_REG; + fprintf (file,\tsub\t$16, %%esp\n); + fprintf (file,\tmovl\t%%ebx, (%%esp)\n); + fprintf (file,\tcall\t__x86.get_pc_thunk.bx\n); + fprintf (file,\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n); #ifndef NO_PROFILE_COUNTERS fprintf (file, \tleal\t%sP%d@GOTOFF(%%ebx),%% PROFILE_COUNT_REGISTER \n, LPREFIX, labelno); #endif fprintf (file, 1:\tcall\t*%s@GOT(%%ebx)\n, mcount_name); + fprintf (file,\tmovl\t(%%esp), %%ebx\n); + fprintf (file,\tadd\t$16, %%esp\n); } else { Jakub
Re: [PATCH 3/4] Add libgomp plugin for Intel MIC
On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
> > malloc can fail, SIGSEGV in response to that is not desirable.  Can't
> > you fall back to alloca, or use just alloca, or use alloca with a
> > malloc fallback?
>
> I replaced it with alloca.

There is a risk if a suid or otherwise privilege-escalated program uses
it and an attacker passes huge env vars.  Perhaps use alloca if it is
<= 2KB and malloc otherwise, and in that case, if malloc fails, just do
a fatal error?

> > Where does this artificial limit come from?  Using libNNN.so library
> > names?  Can't you use lib%d.so instead?
>
> Yes, it comes from the Image structure
> (liboffloadmic/runtime/offload_host.h:52).  It must contain a
> null-terminated name, therefore I need to allocate some space for the
> name in the plugin's struct TargetImage.  But the structure can't
> contain any bytes after the trailing zero and before the actual data.
> So, now I extended the name to 10 digits and removed the comparison
> with 1000.

Ok.  Also, seeing register_image, shouldn't there be a
GOMP_OFFLOAD_unregister_image which would be invoked when the library
containing MIC offloading regions is dlclosed?  One could use
__cxa_atexit or similar for that, something that is given __dso_handle.
Or is no cleanup necessary?  At least unregistering it from the
translation tables, because the same addresses might be reused by a
different shared library?  With dlopen/dlclose in mind, 1000 might be
easily reached; consider dlopening/dlclosing (perhaps over a longer
time, by a long-running daemon) a shared library containing a
#pragma omp target region.

> Hmm, previously we've tested only cases when all libraries are loaded
> before the first offload.  Offloading from a dlopened library after
> the call to gomp_target_init isn't working.  So, this will require
> some changes in libgomp/target.c.  Is it OK to fix this bug in a
> separate patch?

I guess it can be done incrementally, even during stage3.

	Jakub
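The fixed-width naming scheme Ilya describes can be sketched like this. The struct layout and `set_image_name` are illustrative assumptions, not the actual liboffloadmic `Image`/`TargetImage` layout; the sketch only shows why widening the counter to a zero-padded 10-digit field removes the need for the comparison with 1000: every name has the same length, so the NUL terminator (and therefore the payload that must immediately follow it) sits at a fixed offset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "lib" + 10 zero-padded digits + ".so" + NUL terminator.  */
#define NAME_LEN (sizeof ("lib0000000000.so"))

/* Illustrative layout: the image must begin with a NUL-terminated name,
   and the payload must follow the terminator with no gap, so the name
   field is padded to a fixed width up front.  */
struct target_image
{
  char name[NAME_LEN];   /* "lib%010u.so", always exactly 16 chars */
  /* payload bytes would follow here in the real structure */
};

static void
set_image_name (struct target_image *img, unsigned num)
{
  /* %010u always emits exactly 10 digits, so no range check on num is
     needed and the payload offset never moves.  */
  snprintf (img->name, sizeof img->name, "lib%010u.so", num);
}
```

Because the width is fixed, image number 7 and image number 4000000000 produce names of identical length.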
[PATCH v2] avoid alignment of static variables affecting stack's
Function-scope (or more narrow) static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment.  I first noticed the issue with Linux's dynamic_pr_debug()
construct using an 8-byte-aligned sub-file-scope local variable.
According to my checking, the bad behavior started with 4.6.x (4.5.3 was
still okay), but the generated code got quite a bit worse as of 4.9.0.

[v2: Drop inclusion of hard register variables, as requested by Jakub
and Richard.]

gcc/
2014-10-24  Jan Beulich  <jbeul...@suse.com>

	* cfgexpand.c (expand_one_var): Exclude static and external
	variables when adjusting stack alignment related state.

gcc/testsuite/
2014-10-24  Jan Beulich  <jbeul...@suse.com>

	* gcc.c-torture/execute/stkalign.c: New.

--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1233,12 +1233,15 @@ static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
+  bool stack = true;
   tree origvar = var;
 
   var = SSAVAR (var);
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
     {
+      stack = !TREE_STATIC (var) && !DECL_EXTERNAL (var);
+
       /* Because we don't know if VAR will be in register or on stack,
 	 we conservatively assume it will be on stack even if VAR is
 	 eventually put into register after RA pass.  For non-automatic
@@ -1267,22 +1270,25 @@ expand_one_var (tree var, bool toplevel,
 	align = POINTER_SIZE;
     }
 
-  if (SUPPORTS_STACK_ALIGNMENT
-      && crtl->stack_alignment_estimated < align)
+  if (stack)
     {
-      /* stack_alignment_estimated shouldn't change after stack
-         realign decision made */
-      gcc_assert (!crtl->stack_realign_processed);
-      crtl->stack_alignment_estimated = align;
+      if (SUPPORTS_STACK_ALIGNMENT
+	  && crtl->stack_alignment_estimated < align)
+	{
+	  /* stack_alignment_estimated shouldn't change after stack
+	     realign decision made */
+	  gcc_assert (!crtl->stack_realign_processed);
+	  crtl->stack_alignment_estimated = align;
+	}
+
+      /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
+	 So here we only make sure stack_alignment_needed >= align.  */
+      if (crtl->stack_alignment_needed < align)
+	crtl->stack_alignment_needed = align;
+      if (crtl->max_used_stack_slot_alignment < align)
+	crtl->max_used_stack_slot_alignment = align;
     }
 
-  /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
-     So here we only make sure stack_alignment_needed >= align.  */
-  if (crtl->stack_alignment_needed < align)
-    crtl->stack_alignment_needed = align;
-  if (crtl->max_used_stack_slot_alignment < align)
-    crtl->max_used_stack_slot_alignment = align;
-
   if (TREE_CODE (origvar) == SSA_NAME)
     {
       gcc_assert (TREE_CODE (var) != VAR_DECL
--- a/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ b/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include <assert.h>
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" (x));
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}
Re: [PATCH 3/4] Add libgomp plugin for Intel MIC
On 24 Oct 16:35, Jakub Jelinek wrote:
> On Thu, Oct 23, 2014 at 07:41:12PM +0400, Ilya Verbin wrote:
> > > malloc can fail, SIGSEGV in response to that is not desirable.
> > > Can't you fall back to alloca, or use just alloca, or use alloca
> > > with a malloc fallback?
> >
> > I replaced it with alloca.
>
> There is a risk if a suid or otherwise privilege-escalated program
> uses it and an attacker passes huge env vars.  Perhaps use alloca if
> it is <= 2KB and malloc otherwise, and in that case, if malloc fails,
> just do a fatal error?

Why is this preferable to just malloc + a fatal error?  This function is
executed only once at plugin initialization, therefore no real
performance gain could be achieved.

Thanks,
  -- Ilya
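The pattern Jakub proposes (alloca for small sizes, malloc with a fatal error on failure otherwise) can be sketched as below. This is a hedged sketch, not libgomp's actual code: `fatal_error`, `xmalloc_or_die`, and the macro names are hypothetical. Note that alloca'd memory vanishes when the allocating function returns, so the size test has to live in a macro expanded at the use site, not in a helper function that returns the pointer.

```c
#include <alloca.h>   /* glibc; on other platforms alloca may live elsewhere */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define SMALL_ALLOC_LIMIT 2048  /* the ~2KB threshold from the thread */

/* Hypothetical stand-in for the plugin's error reporting.  */
static void
fatal_error (const char *msg)
{
  fprintf (stderr, "%s\n", msg);
  exit (1);
}

/* malloc that never returns NULL: on failure it reports a fatal error
   instead of letting a later dereference SIGSEGV.  */
static void *
xmalloc_or_die (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    fatal_error ("out of memory");
  return p;
}

/* Small requests come from the caller's stack frame via alloca (cheap,
   no free needed); large ones fall back to the checked malloc, which
   bounds how much attacker-controlled data (e.g. huge env vars) can
   land on the stack of a suid program.  */
#define LOCAL_ALLOC(n) \
  ((n) <= SMALL_ALLOC_LIMIT ? alloca (n) : xmalloc_or_die (n))

/* Free only what was malloc'd; alloca'd memory is reclaimed on return.  */
#define LOCAL_FREE(p, n) \
  do { if ((n) > SMALL_ALLOC_LIMIT) free (p); } while (0)
```

Ilya's counterpoint stands, though: for a once-per-initialization path, plain `xmalloc_or_die` alone is simpler and costs nothing measurable.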
Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 24 October 2014 11:23, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:
> On 23 October 2014 18:51, Charles Baylis charles.bay...@linaro.org wrote:
> > > Otherwise this and the previous associated 1/2 patch look good; can
> > > you respin with these tidy-ups?
> >
> > OK for trunk?
>
> OK
> /Marcus

Committed to trunk as r216671 and r216672.
RE: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral
Alan Lawrence alan.lawre...@arm.com writes:
> Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to
> the new reduc_(plus|[us](min|max))_scal_optab.  I have not managed to
> work out how to do the same for MIPS (specifically what I need to add
> to mips_expand_vec_reduc), and have had no response from the
> maintainers, so am

Sorry, I was looking at this but failed to send an email saying so.  The
lack of vec_extract appears to be the stumbling point here, so at the
very least we need to add a naïve version of that, I believe.

> (2) also renaming reduc_..._scal_optab back to reduc_..._optab; would
> break the MIPS backend if something were not done with its existing
> patterns.

I suspect we can deal with this in time to make a rename OK.

One thing occurred to me about this change in general, which is that on
the whole the reduction to a scalar seems good for an epilogue, but is
there a problem if the result is then replicated across a vector for
further processing?  I.e. a vector is reduced to a scalar, which moves
the value from a SIMD register to a GP register (because scalar modes
are generally not supported in SIMD registers) and then gets moved back
to a SIMD register to form part of a new vector.  Would you expect the
redundant moves to get eliminated?

Thanks,
Matthew