[PATCH] Low-hanging C++-lexer speedup (PR c++/24208)
Within cp/parser.c, cp_lexer_peek_token and the rest of the token-related
functions are bloated by lexer-debugging code, code that is completely
dead unless calls to the functions cp_lexer_[start|stop]_debugging are
deliberately inserted somewhere in the parser source code for temporary
debugging purposes. The compiler doesn't fold away this dead code at
compile time because it cannot prove that the flag lexer->debugging_p
doesn't change. So we end up with this dead debugging code, guarded by
cp_lexer_debugging_p, in the release binary.

This is especially wasteful with code like

  while (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ)
         && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)
         && cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN)
         && cp_lexer_next_token_is_not (parser->lexer, CPP_EOF))

which, after inlining, ought to be equivalent to:

  token = parser->lexer->token;
  while (token != CPP_EQ
         && token != CPP_COMMA
         && token != CPP_CLOSE_PAREN
         && token != CPP_EOF)

but because of the lexer-debugging stuff getting in the way of
inlining/CSE, the final code is much worse.

This patch helps the compiler to fold away calls to cp_lexer_debugging_p
when the lexer is not being debugged, by adding a new macro that
short-circuits the cp_lexer_debugging_p predicate.

This change reduces the size of parser.o by 3.5% -- from 6060 Kb to
5852 Kb. This change also reduces the time it takes to compile a dummy
C++ file of mine from 1.95s to 1.85s, a reduction of 5%.

Bootstrapped + regtested on x86_64-pc-linux-gnu. Does this patch look
OK to commit?

gcc/cp/ChangeLog:

    PR c++/24208
    * parser.c (LEXER_DEBUGGING_ENABLED_P): New macro.
    (cp_lexer_debugging_p): Use it.
    (cp_lexer_start_debugging): Likewise.
    (cp_lexer_stop_debugging): Likewise.
---
 gcc/cp/parser.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 33f1df3..d03b0c9 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -706,11 +706,21 @@ cp_lexer_destroy (cp_lexer *lexer)
   ggc_free (lexer);
 }
 
+/* This needs to be set to TRUE before the lexer-debugging infrastructure can
+   be used. The point of this flag is to help the compiler to fold away calls
+   to cp_lexer_debugging_p within this source file at compile time, when the
+   lexer is not being debugged. */
+
+#define LEXER_DEBUGGING_ENABLED_P false
+
 /* Returns nonzero if debugging information should be output. */
 
 static inline bool
 cp_lexer_debugging_p (cp_lexer *lexer)
 {
+  if (!LEXER_DEBUGGING_ENABLED_P)
+    return false;
+
   return lexer->debugging_p;
 }
 
@@ -1296,6 +1306,10 @@ debug (cp_token *ptr)
 static void
 cp_lexer_start_debugging (cp_lexer* lexer)
 {
+  if (!LEXER_DEBUGGING_ENABLED_P)
+    fatal_error (input_location,
+                 "LEXER_DEBUGGING_ENABLED_P is not set to true");
+
   lexer->debugging_p = true;
   cp_lexer_debug_stream = stderr;
 }
@@ -1305,6 +1319,10 @@ cp_lexer_start_debugging (cp_lexer* lexer)
 static void
 cp_lexer_stop_debugging (cp_lexer* lexer)
 {
+  if (!LEXER_DEBUGGING_ENABLED_P)
+    fatal_error (input_location,
+                 "LEXER_DEBUGGING_ENABLED_P is not set to true");
+
   lexer->debugging_p = false;
   cp_lexer_debug_stream = NULL;
 }
-- 
2.7.0.134.gf5046bd.dirty
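The effect of the new macro can be seen in isolation with a small standalone sketch. Everything named toy_* or TOY_* below is hypothetical and only stands in for the cp_lexer machinery; it is not the parser.c code. Because the macro expands to a literal false, the predicate becomes a compile-time constant, so an optimizing compiler can be expected to drop both the load of the flag and the guarded debug call:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for GCC's cp_lexer; only the flag matters here.
struct toy_lexer
{
  bool debugging_p;
};

// Mirrors the role of LEXER_DEBUGGING_ENABLED_P in the patch.
#define TOY_LEXER_DEBUGGING_ENABLED_P false

static inline bool
toy_lexer_debugging_p (const toy_lexer *lexer)
{
  if (!TOY_LEXER_DEBUGGING_ENABLED_P)
    return false;               // constant result, the load below is dead
  return lexer->debugging_p;
}

// A peek-like helper whose debug branch depends on the predicate.
static inline int
toy_peek (const toy_lexer *lexer, int token)
{
  if (toy_lexer_debugging_p (lexer))
    std::fprintf (stderr, "peek: %d\n", token);
  return token;
}
```

Setting TOY_LEXER_DEBUGGING_ENABLED_P to true brings back the run-time check of debugging_p, which matches how the patch expects the real macro to be edited by hand for a debugging session.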
[PATCH, libstdc++-v3] Fix import of wide character related symbols in stdlib.h wrapper
include/c_compatibility/stdlib.h imports wide character related symbols
into the global namespace unconditionally, which causes the libstdc++-v3
build to fail when one or both of _GLIBCXX_USE_WCHAR_T and
_GLIBCXX_HAVE_MBSTATE_T are not defined.

The included patch changes it to import them into the global namespace
only when they are defined in cstdlib.

Andris

2016-01-26  Andris Pavenis

    * include/c_compatibility/stdlib.h: Include wide character related
    definitions only when they are available in cstdlib.

From 17778d89abe4f51f929806e67d2e2352b6b4376e Mon Sep 17 00:00:00 2001
From: Andris Pavenis
Date: Tue, 26 Jan 2016 06:24:48 +0200
Subject: [PATCH] [PATCH,libstdc++-v3] Fix use of wide character related
 symbols in stdlib.h wrapper

* include/c_compatibility/stdlib.h: include wide character related
  definitions only when they are available in cstdlib
---
 libstdc++-v3/include/c_compatibility/stdlib.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/include/c_compatibility/stdlib.h b/libstdc++-v3/include/c_compatibility/stdlib.h
index bd72580..31e7e5f 100644
--- a/libstdc++-v3/include/c_compatibility/stdlib.h
+++ b/libstdc++-v3/include/c_compatibility/stdlib.h
@@ -62,9 +62,11 @@ using std::getenv;
 using std::labs;
 using std::ldiv;
 using std::malloc;
+#ifdef _GLIBCXX_HAVE_MBSTATE_T
 using std::mblen;
 using std::mbstowcs;
 using std::mbtowc;
+#endif // _GLIBCXX_HAVE_MBSTATE_T
 using std::qsort;
 using std::rand;
 using std::realloc;
@@ -73,8 +75,10 @@ using std::strtod;
 using std::strtol;
 using std::strtoul;
 using std::system;
+#ifdef _GLIBCXX_USE_WCHAR_T
 using std::wcstombs;
 using std::wctomb;
+#endif // _GLIBCXX_USE_WCHAR_T
 #endif
 
 #endif
-- 
2.5.0
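The guarding idea can be tried outside libstdc++ with a rough sketch. The TOY_* macros and the toy_global namespace are made up for illustration; in the real header the macros come from the library's configuration and the using-declarations land in the global namespace. A name is only re-exported when its feature macro says the underlying declaration exists:

```cpp
#include <cassert>
#include <cstdlib>

// Assumption: hand-written stand-ins for the configure-time macros named in
// the patch.  TOY_USE_WCHAR_T is deliberately left undefined.
#define TOY_HAVE_MBSTATE_T 1

namespace toy_global
{
  // Always available, so imported unconditionally.
  using std::malloc;
  using std::free;

#ifdef TOY_HAVE_MBSTATE_T
  // Only imported when the multibyte support is reported as present.
  using std::mblen;
#endif

#ifdef TOY_USE_WCHAR_T
  // Skipped in this sketch because the macro is not defined.
  using std::wcstombs;
#endif
}
```

With TOY_USE_WCHAR_T undefined, any use of toy_global::wcstombs fails to compile at the point of use, instead of the wrapper itself failing on a using-declaration for a name that was never declared.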
Re: [PATCH] OpenACC use_device clause ICE fix
On 2016/1/25 7:06 PM, Jakub Jelinek wrote:
> The following ICEs without the patch and works with it, so I think it is
> better:
>
> 2016-01-25  Jakub Jelinek
>
>     * omp-low.c (lower_omp_target) : Set
>     DECL_VALUE_EXPR of new_var even for the non-array case. Look
>     through DECL_VALUE_EXPR for expansion.
>
>     * c-c++-common/goacc/use_device-1.c: New test.

Thanks, the test was indeed just a reduction of a whole example program,
which I'm not sure we're at liberty to directly include in the testsuite.
I've verified that the patch allows the program to build and run correctly.

Thanks,
Chung-Lin
Re: Speedup configure and build with system.h
Hi,

On Mon, 25 Jan 2016, Uros Bizjak wrote:

> This patch caused bootstrap failure on non-c++11 bootstrap compiler
> [1], e.g. CentOS 5.11.
>
> The problem is with std::swap, which was defined in header <algorithm>
> until c++11 [2].
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69464
> [2] http://en.cppreference.com/w/cpp/algorithm/swap

Meh. Can you try the attached patch with a configure test (it includes
the generated files)? It works for me with 4.3.4, and should make your
build include <algorithm> always.

Ciao,
Michael.

Index: configure.ac
===================================================================
--- configure.ac (revision 232675)
+++ configure.ac (working copy)
@@ -416,6 +416,15 @@ struct X { typedef long long
 ]], [[X::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses long long])])
 fi
 
+AC_CACHE_CHECK(for std::swap in <utility>, ac_cv_std_swap_in_utility, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <utility>
+]], [[int a, b; std::swap(a,b);]])],[ac_cv_std_swap_in_utility=yes],[ac_cv_std_swap_in_utility=no])])
+if test $ac_cv_std_swap_in_utility = yes; then
+  AC_DEFINE(HAVE_SWAP_IN_UTILITY, 1,
+  [Define if <utility> defines std::swap.])
+fi
+
 # Check whether compiler is affected by placement new aliasing bug (PR 29286).
 # If the host compiler is affected by the bug, and we build with optimization
 # enabled (which happens e.g. when cross-compiling), the pool allocator may
Index: system.h
===================================================================
--- system.h (revision 232736)
+++ system.h (working copy)
@@ -217,7 +217,7 @@ extern int errno;
 #endif
 
 #ifdef __cplusplus
-#ifdef INCLUDE_ALGORITHM
+#if defined (INCLUDE_ALGORITHM) || !defined (HAVE_SWAP_IN_UTILITY)
 # include <algorithm>
 #endif
 # include
Index: configure
===================================================================
--- configure (revision 232675)
+++ configure (working copy)
@@ -6534,6 +6534,40 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for std::swap in <utility>" >&5
+$as_echo_n "checking for std::swap in <utility>... " >&6; }
+if test "${ac_cv_std_swap_in_utility+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+#include <utility>
+
+int
+main ()
+{
+int a, b; std::swap(a,b);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  ac_cv_std_swap_in_utility=yes
+else
+  ac_cv_std_swap_in_utility=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_std_swap_in_utility" >&5
+$as_echo "$ac_cv_std_swap_in_utility" >&6; }
+if test $ac_cv_std_swap_in_utility = yes; then
+
+$as_echo "#define HAVE_SWAP_IN_UTILITY 1" >>confdefs.h
+
+fi
+
 # Check whether compiler is affected by placement new aliasing bug (PR 29286).
 # If the host compiler is affected by the bug, and we build with optimization
 # enabled (which happens e.g. when cross-compiling), the pool allocator may
@@ -18419,7 +18453,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18422 "configure"
+#line 18456 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18525,7 +18559,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18528 "configure"
+#line 18562 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: config.in
===================================================================
--- config.in (revision 232675)
+++ config.in (working copy)
@@ -1705,6 +1705,12 @@
 #endif
 
 
+/* Define if <utility> defines std::swap. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_SWAP_IN_UTILITY
+#endif
+
+
 /* Define to 1 if you have the `sysconf' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_SYSCONF
@@ -1865,7 +1871,8 @@
 #endif
 
 
-/* Define if your assembler supports .dwsect 0xB */
+/* Define if your assembler supports AIX debug frame section label reference.
+   */
 #ifndef USED_FOR_TARGET
 #undef HAVE_XCOFF_DWARF_EXTRAS
 #endif
Re: [PATCH 0/2] [ARC] Small fixes
On 25/01/16 13:33, Claudiu Zissulescu wrote:
> Please find attached two small patches which are fixing two issues within the ARC backend:
> 1. The first one fixes predicates used by arcset* patterns.
> 2. The second one rejects constant-constant comparisons. This situation may happen during the CSE step.

These are OK.

FWIW, there's probably a missed optimization here - these constant - constant comparisons could be folded down further.
Re: [PATCH, PR69421] Check vector types of COND_EXPR operands are compatible when vectorizing it
On Mon, Jan 25, 2016 at 11:16 AM, Ilya Enkovichwrote: > Hi, > > This patch covers one more case when boolean operands get different > vectypes and we don't detect it. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk? Ok. Richard. > Thanks, > Ilya > -- > gcc/ > > 2016-01-25 Ilya Enkovich > > PR target/69421 > * tree-vect-stmts.c (vectorizable_condition): Check vectype > of operands is compatible with a statement vectype. > > gcc/testsuite/ > > 2016-01-25 Ilya Enkovich > > PR target/69421 > * gcc.dg/pr69421.c: New test. > > > diff --git a/gcc/testsuite/gcc.dg/pr69421.c b/gcc/testsuite/gcc.dg/pr69421.c > new file mode 100644 > index 000..252e22c > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr69421.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3" } */ > + > +struct A { double a; }; > +double a; > + > +void > +foo (_Bool *x) > +{ > + long i; > + for (i = 0; i < 64; i++) > +{ > + struct A c; > + x[i] = c.a || a; > +} > +} > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index 1d2246d..ed2ce07 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -7528,6 +7528,7 @@ vectorizable_condition (gimple *stmt, > gimple_stmt_iterator *gsi, > >tree vectype = STMT_VINFO_VECTYPE (stmt_info); >int nunits = TYPE_VECTOR_SUBPARTS (vectype); > + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; > >if (slp_node || PURE_SLP_STMT (stmt_info)) > ncopies = 1; > @@ -7547,9 +7548,17 @@ vectorizable_condition (gimple *stmt, > gimple_stmt_iterator *gsi, > return false; > >gimple *def_stmt; > - if (!vect_is_simple_use (then_clause, stmt_info->vinfo, _stmt, )) > + if (!vect_is_simple_use (then_clause, stmt_info->vinfo, _stmt, , > + )) > +return false; > + if (!vect_is_simple_use (else_clause, stmt_info->vinfo, _stmt, , > + )) > return false; > - if (!vect_is_simple_use (else_clause, stmt_info->vinfo, _stmt, )) > + > + if (vectype1 && !useless_type_conversion_p (vectype, vectype1)) > +return false; > + > + if (vectype2 && 
!useless_type_conversion_p (vectype, vectype2)) > return false; > >masked = !COMPARISON_CLASS_P (cond_expr);
Re: [PATCH] Fix aarch64 bootstrap (pr69416)
On 22 January 2016 at 18:07, Richard Henderson wrote:
> The bare CONST_INT inside the CCmode IF_THEN_ELSE is causing combine to make
> incorrect simplifications. At this stage it feels safer to wrap the
> CONST_INT inside of an UNSPEC than make more generic changes to combine.
>
> But we should definitely investigate combine's CCmode issues for gcc7.
>
>
> Ok?
>

Hi,

After this, I'm seeing that this test now FAILs:
gcc.target/aarch64/ccmp_1.c scan-assembler adds\t

Christophe

>
> r~
Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables
Hi!

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >index 62e5454..cdaee41 100644
> >--- a/gcc/lto-cgraph.c
> >+++ b/gcc/lto-cgraph.c
> >@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >        tree fn_decl
> >          = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >        vec_safe_push (offload_funcs, fn_decl);
> >+
> >+       /* Prevent IPA from removing fn_decl as unreachable, since there
> >+          may be no refs from the parent function to child_fn in offload
> >+          LTO mode. */
> >+       cgraph_node::get (fn_decl)->mark_force_output ();
> >      }
> >    else if (tag == LTO_symtab_variable)
> >      {
> >@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >        tree var_decl
> >          = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >        vec_safe_push (offload_vars, var_decl);
> >+
> >+       /* Prevent IPA from removing var_decl as unused, since there
> >+          may be no refs to var_decl in offload LTO mode. */
> >+       varpool_node::get (var_decl)->force_output = 1;
> >      }

This doesn't work when there is more than one LTO partition: only the
first partition contains the full offload table (to maintain correct
order), but cgraph and varpool nodes aren't necessarily created for the
first partition. To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)

  -- Ilya
Re: [hsa merge 07/10] IPA-HSA pass
On 01/16/2016 11:00 AM, Jan Hubicka wrote: > Can't it be represented via explicit REF_ADDR or something like that? > > Honza Hi. Sure, I've just done a patch that can do that. However, as we're currently in stage4, that change would probably require explicit permission of a release manager? Thanks, Martin >From 9639fff94d043c55b55bfb12bb086032db565f0a Mon Sep 17 00:00:00 2001 From: marxinDate: Mon, 25 Jan 2016 16:11:00 +0100 Subject: [PATCH] HSA: simplify partitioning of HSA kernels and host impls. gcc/lto/ChangeLog: 2016-01-25 Martin Liska * lto-partition.c (add_symbol_to_partition_1): Remove usage of hsa_summaries. gcc/ChangeLog: 2016-01-25 Martin Liska * hsa.c (hsa_summary_t::link_functions): Create IPA_REF_ADDR reference for an HSA kernel and its host function. --- gcc/hsa.c | 5 + gcc/lto/lto-partition.c | 19 --- 2 files changed, 5 insertions(+), 19 deletions(-) diff --git a/gcc/hsa.c b/gcc/hsa.c index ec23f81..f0b3205 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -781,6 +781,11 @@ hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host, TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; + + /* Create reference between a kernel and a corresponding host implementation + to quarantee LTO streaming to a same LTRANS. */ + if (kind == HSA_KERNEL) +gpu->create_reference (host, IPA_REF_ADDR); } /* Add a HOST function to HSA summaries. */ diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index eb28fed..9eb63c2 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -34,7 +34,6 @@ along with GCC; see the file COPYING3. If not see #include "ipa-prop.h" #include "ipa-inline.h" #include "lto-partition.h" -#include "hsa.h" vec ltrans_partitions; @@ -171,24 +170,6 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node) Therefore put it into the same partition. 
*/ if (cnode->instrumented_version) add_symbol_to_partition_1 (part, cnode->instrumented_version); - - /* Add an HSA associated with the symbol. */ - if (hsa_summaries != NULL) - { - hsa_function_summary *s = hsa_summaries->get (cnode); - if (s->m_kind == HSA_KERNEL) - { - /* Add binded function. */ - bool added = add_symbol_to_partition_1 (part, - s->m_binded_function); - gcc_assert (added); - if (symtab->dump_file) - fprintf (symtab->dump_file, - "adding an HSA function (host/gpu) to the " - "partition: %s\n", - s->m_binded_function->name ()); - } - } } add_references_to_partition (part, node); -- 2.7.0
Re: [PATCH] PR c++/69399: Add HAVE_WORKING_CXX_BUILTIN_CONSTANT_P
On Fri, Jan 22, 2016 at 7:55 PM, H.J. Luwrote: > Without the fix for PR 65656, g++ miscompiles __builtin_constant_p in > wi::lrshift in wide-int.h. Add a check with PR 65656 testcase to verify > that C++ __builtin_constant_p works properly. > > Tested on x86-64 with working GCC: > > gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ > prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ > stage1-gcc/auto-host.h:#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1 > > and broken GCC: > > gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ > prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ > stage1-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ > > Ok for trunk? I have a hard time seeing how we are "miscompiling" if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT) ? xi.len == 1 && xi.val[0] >= 0 : xi.precision <= HOST_BITS_PER_WIDE_INT) anything that relies on __builtin_constant_p () for sematics is fishy so why is this not simply an lrshfit implementation bug? Richard. > Thanks. > > H.J. > --- > gcc/ > > PR c++/69399 > * configure.ac: Check if C++ __builtin_constant_p works > properly. > (HAVE_WORKING_CXX_BUILTIN_CONSTANT_P): AC_DEFINE. > * system.h (STATIC_CONSTANT_P): Use __builtin_constant_p only > if HAVE_WORKING_CXX_BUILTIN_CONSTANT_P is defined. > * config.in: Regenerated. > * configure: Likewise. > > gcc/testsuite/ > > PR c++/69399 > * gcc.dg/torture/pr69399.c: New test. 
> --- > gcc/config.in | 10 - > gcc/configure | 41 > -- > gcc/configure.ac | 27 ++ > gcc/system.h | 2 +- > gcc/testsuite/gcc.dg/torture/pr69399.c | 21 + > 5 files changed, 97 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr69399.c > > diff --git a/gcc/config.in b/gcc/config.in > index 1796e1d..11ebf5c 100644 > --- a/gcc/config.in > +++ b/gcc/config.in > @@ -1846,6 +1846,13 @@ > #endif > > > +/* Define this macro if C++ __builtin_constant_p with constexpr does not > crash > + with a variable. */ > +#ifndef USED_FOR_TARGET > +#undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P > +#endif > + > + > /* Define to 1 if `fork' works. */ > #ifndef USED_FOR_TARGET > #undef HAVE_WORKING_FORK > @@ -1865,7 +1872,8 @@ > #endif > > > -/* Define if your assembler supports .dwsect 0xB */ > +/* Define if your assembler supports AIX debug frame section label reference. > + */ > #ifndef USED_FOR_TARGET > #undef HAVE_XCOFF_DWARF_EXTRAS > #endif > diff --git a/gcc/configure b/gcc/configure > index ff646e8..2798231 100755 > --- a/gcc/configure > +++ b/gcc/configure > @@ -6534,6 +6534,43 @@ fi > rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext > fi > > +# Check if C++ __builtin_constant_p works properly. Without the fix > +# for PR 65656, g++ miscompiles __builtin_constant_p in wi::lrshift in > +# wide-int.h. Add a check with PR 65656 testcase to verify that C++ > +# __builtin_constant_p works properly. > +if test "$GCC" = yes; then > + saved_CFLAGS="$CFLAGS" > + saved_CXXFLAGS="$CXXFLAGS" > + CFLAGS="$CFLAGS -O -x c++ -std=c++11" > + CXXFLAGS="$CXXFLAGS -O -x c++ -std=c++11" > + { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX > __builtin_constant_p works with constexpr" >&5 > +$as_echo_n "checking whether $CXX __builtin_constant_p works with > constexpr... " >&6; } > + cat confdefs.h - <<_ACEOF >conftest.$ac_ext > +/* end confdefs.h. */ > + > +int > +foo (int argc) > +{ > + constexpr bool x = __builtin_constant_p(argc); > + return x ? 
1 : 0; > +} > + > +_ACEOF > +if ac_fn_cxx_try_compile "$LINENO"; then : > + { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 > +$as_echo "yes" >&6; } > + > +$as_echo "#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1" >>confdefs.h > + > +else > + { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 > +$as_echo "no" >&6; } > +fi > +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext > + CFLAGS="$saved_CFLAGS" > + CXXFLAGS="$saved_CXXFLAGS" > +fi > + > # Check whether compiler is affected by placement new aliasing bug (PR > 29286). > # If the host compiler is affected by the bug, and we build with optimization > # enabled (which happens e.g. when cross-compiling), the pool allocator may > @@ -18419,7 +18456,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 18422 "configure" > +#line 18459 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -18525,7 +18562,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 18528 "configure" >
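Richard's objection, that nothing should rely on __builtin_constant_p for its semantics, can be illustrated with a small sketch. The macro and function below are hypothetical and are not the wide-int.h code; the macro is written in the spirit of STATIC_CONSTANT_P. The builtin only selects a shortcut, and the general path returns the same value for the same inputs, so the result does not depend on whether the compiler manages to fold the condition:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hint macro: true only when the compiler can prove X at
// compile time; otherwise it quietly evaluates to false.
#if defined (__GNUC__)
# define TOY_STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))
#else
# define TOY_STATIC_CONSTANT_P(X) (false && (X))
#endif

// Logical right shift on 32 bits that tolerates shift counts >= 32.
static inline std::uint32_t
toy_shift_right (std::uint32_t x, unsigned n)
{
  if (TOY_STATIC_CONSTANT_P (n == 0))
    return x;                     // shortcut, taken only when provable
  return n >= 32 ? 0u : x >> n;   // general path, also correct for n == 0
}
```

If the two paths could ever disagree, the observable behaviour would change with optimization level and inlining decisions, which is the kind of dependence the review comment calls fishy.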
Re: [PATCH, rs6000] Fix PR63354
On Sun, Jan 24, 2016 at 9:17 PM, Bill Schmidt wrote:
> Hi Jan, thanks for the report! Patch below that should fix the problem.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu, no
> regressions. David, is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2016-01-24  Bill Schmidt
>
>     * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Add
>     decl with __attribute__ ((unused)) annotation.

Okay.

Thanks, David
Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
On 22 January 2016 at 18:06, Alan Lawrencewrote: > On 20/01/16 21:10, Christophe Lyon wrote: >> >> On 19 January 2016 at 15:51, Alan Lawrence >> wrote: >>> >>> On 19/01/16 11:15, Christophe Lyon wrote: >>> >> For neon_vdupn, I chose to implement neon_vdup_nv4hf and >> neon_vdup_nv8hf instead of updating the VX iterator because I thought >> it was not desirable to impact neon_vrev32. > > > > Well, the same instruction will suffice for vrev32'ing vectors of HF > just > as > well as vectors of HI, so I think I'd argue that's harmless enough. To > gain the > benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, > though. > Since this is more intrusive, I'd rather leave that part for later. OK? >>> >>> >>> >>> Sure. >>> >> +#ifdef __ARM_BIG_ENDIAN >> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also >> the >> + right value for vectors with 8 lanes. */ >> +#define __arm_lane(__vec, __idx) (__idx ^ 3) >> +#else >> +#define __arm_lane(__vec, __idx) __idx >> +#endif >> + > > > > Looks right, but sounds... my concern here is that I'm hoping at some > point we > will move the *other* vget/set_lane intrinsics to use GCC vector > extensions > too. At which time (unlike __aarch64_lane which can be used everywhere) > this > will be the wrong formula. Can we name (and/or comment) it to avoid > misleading > anyone? The key characteristic seems to be that it is for vectors of > 16-bit > elements only. > I'm not to follow, here. Looking at the patterns for neon_vget_lane_*internal in neon.md, I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts". Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq), that would be similar to the aarch64 ones (by computing the number of lanes of the input vector), but the "q" one would use half the total number of lanes instead? >>> >>> >>> >>> That works for me! 
Sthg like: >>> >>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx >>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) + >>> (NUM_LANES(__vec)/2 - __idx) >>> //or similarly >>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1)) >>> >>> Alternatively I'd been thinking >>> >>> #define __arm_lane_32xN(__idx) __idx ^ 1 >>> #define __arm_lane_16xN(__idx) __idx ^ 3 >>> #define __arm_lane_8xN(__idx) __idx ^ 7 >>> >>> Bear in mind PR64893 that we had on AArch64 :-( >>> >> >> Here is a new version, based on the comments above. >> I've also removed the addition of arm_fp_ok effective target since I >> added that in my other testsuite patch. >> >> OK now? > > > Yes. I realize my worry about PR64893 doesn't apply here since we pass > constant lane numbers / vector bounds into __builtin_arm_lane_check. Thanks! > Thanks, I guess I still have to wait for Kyrill/Ramana 's approval. > --Alan > >> >> Thanks, >> >> Christophe >> >>> Cheers, Alan > >
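The per-element-size variants suggested above (idx ^ 1, idx ^ 3, idx ^ 7) share one piece of arithmetic, which can be checked with a short sketch. The helper names are hypothetical and this is not the arm_neon.h code; it only shows that XOR-ing with (lanes per 64-bit half - 1) reverses the lane order inside each 64-bit half, which is why the constant 3 serves both 4-lane and 8-lane vectors of 16-bit elements:

```cpp
#include <cassert>

// Number of lanes that fit in one 64-bit half for a given element width.
constexpr unsigned
toy_lanes_per_dword (unsigned elem_bits)
{
  return 64u / elem_bits;
}

// Lane remapping sketch: identity on little-endian, reversal within each
// 64-bit half on big-endian.  Assumes elem_bits is 8, 16 or 32.
constexpr unsigned
toy_arm_lane (unsigned idx, unsigned elem_bits, bool big_endian)
{
  return big_endian ? (idx ^ (toy_lanes_per_dword (elem_bits) - 1u)) : idx;
}

// 16-bit elements: lanes 0..3 map to 3..0 and lanes 4..7 map to 7..4.
static_assert (toy_arm_lane (0, 16, true) == 3, "low half is reversed");
static_assert (toy_arm_lane (4, 16, true) == 7, "high half is reversed");
```

Whether this mapping is the right one for a particular intrinsic is the question the thread settles; the sketch only confirms what the XOR does to the index.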
Re: Mark oacc kernels fns
On Mon, Jan 25, 2016 at 10:06:50AM -0500, Nathan Sidwell wrote: > On 01/04/16 10:39, Nathan Sidwell wrote: > >There's currently no robust predicate to determine whether an oacc offload > >function is for a kernels region (as opposed to a parallel region). The > >test in > >tree-ssa-loop.c uses the heuristic of seeing if all the dimensions are > >defaulted > > (which can easily be true for parallel offloads at that point). > > > >This patch marks TREE_PUBLIC on the offload attribute values, to note kernels > >regions, and adds a predicate to check that. I also broke out the function > >level determination from oacc_validate_dims, as there it was only laziness > >on my > >part to have not done that earlier. > > > >Using these predicates improves the dump output of the openacc device > >lowering > >pass too. > > > >ok? > > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00092.html > ping? Ok, thanks. Jakub
[PATCH 0/2] [ARC] Small fixes
Please find attached two small patches which fix two issues within the ARC backend:
1. The first one fixes predicates used by arcset* patterns.
2. The second one rejects constant-constant comparisons. This situation may happen during the CSE step.

Ok to apply?
Claudiu

Claudiu Zissulescu (2):
  [ARC] Fix arcset* pattern's predicate.
  [ARC] Reject constant-constant comparison.

 gcc/config/arc/arc.md        | 18 +++++++++++-------
 gcc/config/arc/predicates.md |  2 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

-- 
1.9.1
[PATCH 1/2] [ARC] Fix arcset* pattern's predicate.
gcc/ 2016-01-25 Claudiu Zissulescu* config/arc/arc.md (cstoresi4): Force operand into register. (arcset): Fix predicate. (arcsetltu): Likewise. (arcsetgeu): Likewise. (arcsethi): Likewise. (arcsetls): Likewise. --- gcc/config/arc/arc.md | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 222a468..602cf0b 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -3346,8 +3346,9 @@ (define_expand "cstoresi4" [(set (match_operand:SI 0 "dest_reg_operand" "") - (match_operator:SI 1 "ordered_comparison_operator" [(match_operand:SI 2 "nonmemory_operand" "") - (match_operand:SI 3 "nonmemory_operand" "")]))] + (match_operator:SI 1 "ordered_comparison_operator" + [(match_operand:SI 2 "nonmemory_operand" "") + (match_operand:SI 3 "nonmemory_operand" "")]))] "" { if (!TARGET_CODE_DENSITY) @@ -3358,6 +3359,9 @@ emit_insn (gen_scc_insn (operands[0], operands[1])); DONE; } + if (!register_operand (operands[2], SImode)) +operands[2] = force_reg (SImode, operands[2]); + }) (define_mode_iterator SDF [SF DF]) @@ -5414,7 +5418,7 @@ (define_insn "arcset" [(set (match_operand:SI 0 "register_operand""=r,r,r,r,r,r,r") - (arcCC_cond:SI (match_operand:SI 1 "nonmemory_operand" "0,r,0,r,0,0,r") + (arcCC_cond:SI (match_operand:SI 1 "register_operand" "0,r,0,r,0,0,r") (match_operand:SI 2 "nonmemory_operand" "r,r,L,L,I,n,n")))] "TARGET_V2 && TARGET_CODE_DENSITY" "set%? %0, %1, %2" @@ -5427,7 +5431,7 @@ (define_insn "arcsetltu" [(set (match_operand:SI 0 "register_operand" "=r,r,r,r,r, r, r") - (ltu:SI (match_operand:SI 1 "nonmemory_operand" "0,r,0,r,0, 0, r") + (ltu:SI (match_operand:SI 1 "register_operand" "0,r,0,r,0, 0, r") (match_operand:SI 2 "nonmemory_operand" "r,r,L,L,I, n, n")))] "TARGET_V2 && TARGET_CODE_DENSITY" "setlo%? 
%0, %1, %2" @@ -5440,7 +5444,7 @@ (define_insn "arcsetgeu" [(set (match_operand:SI 0 "register_operand" "=r,r,r,r,r, r, r") - (geu:SI (match_operand:SI 1 "nonmemory_operand" "0,r,0,r,0, 0, r") + (geu:SI (match_operand:SI 1 "register_operand" "0,r,0,r,0, 0, r") (match_operand:SI 2 "nonmemory_operand" "r,r,L,L,I, n, n")))] "TARGET_V2 && TARGET_CODE_DENSITY" "seths%? %0, %1, %2" @@ -5454,7 +5458,7 @@ ;; Special cases of SETCC (define_insn_and_split "arcsethi" [(set (match_operand:SI 0 "register_operand" "=r,r, r,r") - (gtu:SI (match_operand:SI 1 "nonmemory_operand" "r,r, r,r") + (gtu:SI (match_operand:SI 1 "register_operand" "r,r, r,r") (match_operand:SI 2 "nonmemory_operand" "0,r,C62,n")))] "TARGET_V2 && TARGET_CODE_DENSITY" "setlo%? %0, %2, %1" @@ -5477,7 +5481,7 @@ (define_insn_and_split "arcsetls" [(set (match_operand:SI 0 "register_operand" "=r,r, r,r") - (leu:SI (match_operand:SI 1 "nonmemory_operand" "r,r, r,r") + (leu:SI (match_operand:SI 1 "register_operand" "r,r, r,r") (match_operand:SI 2 "nonmemory_operand" "0,r,C62,n")))] "TARGET_V2 && TARGET_CODE_DENSITY" "seths%? %0, %2, %1" -- 1.9.1
[PATCH 2/2] [ARC] Reject constant-constant comparison.
gcc/
2016-01-25  Claudiu Zissulescu

    * config/arc/predicates.md (proper_comparison_operator): Reject
    constant-constant comparison.
---
 gcc/config/arc/predicates.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index 52ac2ac..d384d70 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -510,6 +510,8 @@
     /* From combiner. */
     case QImode: case HImode: case DImode: case SFmode: case DFmode:
       return 0;
+    case VOIDmode:
+      return 0;
     default:
       gcc_unreachable ();
     }
-- 
1.9.1
[PATCH] Fix PR69380
Tested on x86_64-linux, applied.

Richard.

2016-01-25  Richard Biener

    PR testsuite/69380
    * g++.dg/tree-ssa/pr69336.C: Restrict to x86_64 and i?86.

Index: gcc/testsuite/g++.dg/tree-ssa/pr69336.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/pr69336.C (revision 232792)
+++ gcc/testsuite/g++.dg/tree-ssa/pr69336.C (working copy)
@@ -83,4 +83,4 @@ int main(void)
   return 0;
 }
 
-// { dg-final { scan-tree-dump-not "cmap" "optimized" } }
+// { dg-final { scan-tree-dump-not "cmap" "optimized" { target x86_64-*-* i?86-*-* } } }
RE: [PATCH 0/2] [ARC] Small fixes
> FWIW, there's probably a missed optimization here - these constant -
> constant comparisons could be folded down further.

They are. The issue is that when CSE runs, it wants to validate a new
instruction with the propagated constant. That validation checks
proper_comparison_operator, which hits the assert at the end of the
predicate and leads to errors. By returning zero instead, the predicate
rejects the instruction change, and the constant comparison is handled
later on by other steps.

//Claudiu
Re: [PATCH] Fix the remaining PR c++/24666 blockers (arrays decay to pointers too early)
On 01/22/2016 05:30 PM, Patrick Palka wrote: On Fri, 22 Jan 2016, Jason Merrill wrote: On 01/22/2016 11:17 AM, Patrick Palka wrote: On Thu, 21 Jan 2016, Patrick Palka wrote: On Thu, 21 Jan 2016, Jason Merrill wrote: On 01/19/2016 10:30 PM, Patrick Palka wrote: * g++.dg/template/unify17.C: XFAIL. Hmm, I'm not comfortable with deliberately regressing this testcase. template -void bar (void (T[5])); // { dg-error "array of 'void'" } +void bar (void (T[5])); // { dg-error "array of 'void'" "" { xfail *-*-* } } Can we work it so that T[5] also is un-decayed in the DECL_ARGUMENTS of bar, but decayed in the TYPE_ARG_TYPES? I think so, I'll try it. Well, I tried it and the result is really ugly and it only "somewhat" works. (Maybe I'm just missing something obvious though.) The ugliness comes from the fact that decaying an array parameter type of a function type embedded deep within some tree structure requires rebuilding all the tree structures in between to avoid issues due to tree sharing. Yes, that does complicate things. This approach only "somewhat" works because it currently looks through function, pointer, reference and array types. Right, you would need to handle template arguments as well. And I just noticed that this approach does not work at all for USING_DECLs because no PARM_DECL is ever retained anyway in that case. I don't understand what you mean about USING_DECLs. I just meant that we fail and would continue to fail to diagnose an "array of void" error in the following test case: template using X = void (T[5]); void foo (X); True. I think here we want to get the error when instantiating X. I think a better, complete fix for this issue would be to, one way or another, be able to get at the PARM_DECLs that correspond to a given FUNCTION_TYPE. Say, if, the TREE_CHAIN of a FUNCTION_TYPE optionally pointed to its PARM_DECLs, or something. What do you think? Hmm. 
So void(int[5]) and void(int*) would be distinct types, but they would share TYPE_CANONICAL, as though one is a typedef of the other? Interesting, but I'm not sure how that would interact with template argument canonicalization. Well, that can probably be made to work by treating dependent template arguments as distinct more frequently. Another thought: What if we keep a list of arrays we need to substitute into for a particular function template? That approach definitely seems easier to reason about. And it could properly handle "using" templates as well as variable templates -- any TEMPLATE_DECL, I think. Agreed. Jason
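For readers following the thread, a minimal sketch of the decay being discussed (not part of any posted patch; the helper name decays_to_same_type is made up for illustration). It shows that the array bound in a parameter declaration does not survive into the function type, which is why the "array of void" diagnostic cannot be recovered from the function type alone:

```cpp
#include <type_traits>

// Parameter types are adjusted when the function type is formed, so the
// array bound is not part of the resulting type.
using takes_array = void (int[5]);
using takes_pointer = void (int *);

static_assert (std::is_same<takes_array, takes_pointer>::value,
               "void(int[5]) and void(int*) name the same function type");

// Hypothetical helper, not from the thread: reports whether the two
// spellings produce identical function types for a given element type.
template <typename T>
constexpr bool decays_to_same_type ()
{
  return std::is_same<void (T[5]), void (T *)>::value;
}
```

Instantiating the helper with T = void would be rejected, since it would form an array of void; that is the diagnostic the unify17.C testcase is looking for.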
Re: Mark oacc kernels fns
On 01/04/16 10:39, Nathan Sidwell wrote: There's currently no robust predicate to determine whether an oacc offload function is for a kernels region (as opposed to a parallel region). The test in tree-ssa-loop.c uses the heuristic of seeing if all the dimensions are defaulted (which can easily be true for parallel offloads at that point). This patch marks TREE_PUBLIC on the offload attribute values, to note kernels regions, and adds a predicate to check that. I also broke out the function level determination from oacc_validate_dims, as there it was only laziness on my part to have not done that earlier. Using these predicates improves the dump output of the openacc device lowering pass too. ok? https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00092.html ping?
Re: Speedup configure and build with system.h
Hi, On Fri, 22 Jan 2016, Jakub Jelinek wrote: > > > This may have caused: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69434 > > > > Guess we need: > > > > 2016-01-22 Jakub Jelinek> > > > PR bootstrap/69434 > > * genrecog.c: Define INCLUDE_ALGORITHM before including system.h, > > remove include. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Thanks for the fixup, the problem didn't happen on my system. This usage of sse3 intrinsics inside installed STL headers seems a bit unfortunate (not to mention the dubiousness of placing a 180 line function containing 11 loops nested to level 4 for arcane functionality into a header; but I guess this battle is lost with STL). Ciao, Michael.
Re: Speedup configure and build with system.h
On Mon, 25 Jan 2016, Michael Matz wrote: > Hi, > > On Mon, 25 Jan 2016, Uros Bizjak wrote: > > > This patch caused bootstrap failure on non-c++11 bootstrap compiler > > [1], e.g. CentOS 5.11. > > > > The problem is with std::swap, which was defined in header <algorithm> > > until c++11 [2]. > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69464 > > [2] http://en.cppreference.com/w/cpp/algorithm/swap > > Meh. Can you try the attached patch with a configure test (it includes > the generated files)? It works for me with 4.3.4, and should make your > build include <algorithm> always. Ok with a proper changelog. Thanks, Richard.
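A small sketch of the portability point raised above, assuming nothing beyond the standard library: std::swap was declared in <algorithm> before C++11 and in <utility> from C++11 on, so code that must build with both old and new bootstrap compilers can simply include both headers. The helper swap_ints is hypothetical and is not taken from the patch.

```cpp
#include <algorithm>  // declares std::swap in C++98/C++03
#include <utility>    // declares std::swap from C++11 onwards

// Hypothetical helper: exchanges two ints through std::swap, which is
// visible here regardless of which of the two headers provides it.
inline void swap_ints (int &a, int &b)
{
  std::swap (a, b);
}
```

Including both headers costs a little compile time, which is presumably why the thread prefers a configure test over an unconditional include.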
Re: [PATCH, rs6000] Fix PR63354
Thanks, committed as r232793. Bill On Mon, 2016-01-25 at 08:54 -0500, David Edelsohn wrote: > On Sun, Jan 24, 2016 at 9:17 PM, Bill Schmidt wrote: > > > Hi Jan, thanks for the report! Patch below that should fix the problem. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu, no > > regressions. David, is this ok for trunk? > > > > Thanks, > > Bill > > > > > > 2016-01-24 Bill Schmidt > > > > * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Add > > decl with __attribute__ ((unused)) annotation. > > Okay. > > Thanks, David >
Re: [PATCH] Fix PR64091
On Mon, 25 Jan 2016, Tom de Vries wrote: > On 27/11/14 15:13, Richard Biener wrote: > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > Richard. > > > > 2014-11-27 Richard Biener> > > > PR tree-optimization/64088 > > * tree-ssa-tail-merge.c (update_debug_stmt): After resetting > > the stmt break from the loop over use operands. > > > > * gcc.dg/torture/pr64091.c: New testcase. > > > > Index: gcc/tree-ssa-tail-merge.c > > === > > --- gcc/tree-ssa-tail-merge.c (revision 218117) > > +++ gcc/tree-ssa-tail-merge.c (working copy) > > @@ -1606,9 +1613,7 @@ update_debug_stmt (gimple stmt) > > { > > use_operand_p use_p; > > ssa_op_iter oi; > > - basic_block bbdef, bbuse; > > - gimple def_stmt; > > - tree name; > > + basic_block bbuse; > > > > if (!gimple_debug_bind_p (stmt)) > > return; > > @@ -1616,19 +1621,16 @@ update_debug_stmt (gimple stmt) > > bbuse = gimple_bb (stmt); > > FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, oi, SSA_OP_USE) > > { > > - name = USE_FROM_PTR (use_p); > > - gcc_assert (TREE_CODE (name) == SSA_NAME); > > - > > - def_stmt = SSA_NAME_DEF_STMT (name); > > - gcc_assert (def_stmt != NULL); > > - > > - bbdef = gimple_bb (def_stmt); > > + tree name = USE_FROM_PTR (use_p); > > + gimple def_stmt = SSA_NAME_DEF_STMT (name); > > + basic_block bbdef = gimple_bb (def_stmt); > > if (bbdef == NULL || bbuse == bbdef > > || dominated_by_p (CDI_DOMINATORS, bbuse, bbdef)) > > continue; > > > > gimple_debug_bind_reset_value (stmt); > > update_stmt (stmt); > > + break; > > } > > } > > > > Index: gcc/testsuite/gcc.dg/torture/pr64091.c > > === > > --- gcc/testsuite/gcc.dg/torture/pr64091.c (revision 0) > > +++ gcc/testsuite/gcc.dg/torture/pr64091.c (working copy) > > @@ -0,0 +1,28 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-g" } */ > > + > > +extern int foo(void); > > + > > +int main(void) > > +{ > > + int i, a, b; > > + > > + if (foo()) > > +return 0; > > + > > + for (i = 0, a = 0, b = 0; i < 3; i++, a++) > > + { > > 
+if (foo()) > > + break; > > + > > +if (b += a) > > + a = 0; > > + } > > + > > + if (!a) > > +return 2; > > + > > + b += a; > > + > > + return 0; > > +} > > > > Hi, > > the ICE that the patch above fixes does not reproduce on 4.9. > > One reason is that an edge_flag EDGE_EXECUTABLE happens to be set, which > prevents tail-merge from doing a merge. > > Another reason is that the use that is added to the free_uses freelist during > update_stmt happens to be not immediately reused, so the contents remain the > same. > > Using first attached patch, which: > - clears EDGE_EXECUTABLE in tail-merge, and This shows a latent issue in tail-merging: it doesn't ignore edge flags that are "private" (that is, they have random state upon pass entry). > - clears the contents of a use when adding it to the freelist > we manage to trigger the same problem with 4.9. > > It seems possible to me that the same problem could trigger on 4.9 for a > different example without the trigger patch. > > The second attached patch is a minimal version of the above fix. > > OK for 4.9? Ok (the minimal fix). Thanks, Richard. > Thanks, > - Tom > > > > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [hsa merge 07/10] IPA-HSA pass
On Mon, Jan 25, 2016 at 04:21:50PM +0100, Martin Liška wrote: > On 01/16/2016 11:00 AM, Jan Hubicka wrote: > > Can't it be represented via explicit REF_ADDR or something like that? > > > > Honza > > Hi. > > Sure, I've just done a patch that can do that. However, as we're currently in > stage4, > that change would probably require explicit permission of a release manager? If Honza is fine with it and you've tested it, this is ok for trunk. > >From 9639fff94d043c55b55bfb12bb086032db565f0a Mon Sep 17 00:00:00 2001 > From: marxin> Date: Mon, 25 Jan 2016 16:11:00 +0100 > Subject: [PATCH] HSA: simplify partitioning of HSA kernels and host impls. > > gcc/lto/ChangeLog: > > 2016-01-25 Martin Liska > > * lto-partition.c (add_symbol_to_partition_1): Remove usage > of hsa_summaries. > > gcc/ChangeLog: > > 2016-01-25 Martin Liska > > * hsa.c (hsa_summary_t::link_functions): Create IPA_REF_ADDR > reference for an HSA kernel and its host function. > --- > gcc/hsa.c | 5 + > gcc/lto/lto-partition.c | 19 --- > 2 files changed, 5 insertions(+), 19 deletions(-) > > diff --git a/gcc/hsa.c b/gcc/hsa.c > index ec23f81..f0b3205 100644 > --- a/gcc/hsa.c > +++ b/gcc/hsa.c > @@ -781,6 +781,11 @@ hsa_summary_t::link_functions (cgraph_node *gpu, > cgraph_node *host, >TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; >TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; >DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; > + > + /* Create reference between a kernel and a corresponding host > implementation > + to quarantee LTO streaming to a same LTRANS. */ > + if (kind == HSA_KERNEL) > +gpu->create_reference (host, IPA_REF_ADDR); > } > > /* Add a HOST function to HSA summaries. */ > diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c > index eb28fed..9eb63c2 100644 > --- a/gcc/lto/lto-partition.c > +++ b/gcc/lto/lto-partition.c > @@ -34,7 +34,6 @@ along with GCC; see the file COPYING3. 
If not see > #include "ipa-prop.h" > #include "ipa-inline.h" > #include "lto-partition.h" > -#include "hsa.h" > > vec ltrans_partitions; > > @@ -171,24 +170,6 @@ add_symbol_to_partition_1 (ltrans_partition part, > symtab_node *node) >Therefore put it into the same partition. */ >if (cnode->instrumented_version) > add_symbol_to_partition_1 (part, cnode->instrumented_version); > - > - /* Add an HSA associated with the symbol. */ > - if (hsa_summaries != NULL) > - { > - hsa_function_summary *s = hsa_summaries->get (cnode); > - if (s->m_kind == HSA_KERNEL) > - { > - /* Add binded function. */ > - bool added = add_symbol_to_partition_1 (part, > - s->m_binded_function); > - gcc_assert (added); > - if (symtab->dump_file) > - fprintf (symtab->dump_file, > - "adding an HSA function (host/gpu) to the " > - "partition: %s\n", > - s->m_binded_function->name ()); > - } > - } > } > >add_references_to_partition (part, node); > -- > 2.7.0 > Jakub
Re: [PATCH] Fix PR64091
On 27/11/14 15:13, Richard Biener wrote: Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-11-27 Richard BienerPR tree-optimization/64088 * tree-ssa-tail-merge.c (update_debug_stmt): After resetting the stmt break from the loop over use operands. * gcc.dg/torture/pr64091.c: New testcase. Index: gcc/tree-ssa-tail-merge.c === --- gcc/tree-ssa-tail-merge.c (revision 218117) +++ gcc/tree-ssa-tail-merge.c (working copy) @@ -1606,9 +1613,7 @@ update_debug_stmt (gimple stmt) { use_operand_p use_p; ssa_op_iter oi; - basic_block bbdef, bbuse; - gimple def_stmt; - tree name; + basic_block bbuse; if (!gimple_debug_bind_p (stmt)) return; @@ -1616,19 +1621,16 @@ update_debug_stmt (gimple stmt) bbuse = gimple_bb (stmt); FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, oi, SSA_OP_USE) { - name = USE_FROM_PTR (use_p); - gcc_assert (TREE_CODE (name) == SSA_NAME); - - def_stmt = SSA_NAME_DEF_STMT (name); - gcc_assert (def_stmt != NULL); - - bbdef = gimple_bb (def_stmt); + tree name = USE_FROM_PTR (use_p); + gimple def_stmt = SSA_NAME_DEF_STMT (name); + basic_block bbdef = gimple_bb (def_stmt); if (bbdef == NULL || bbuse == bbdef || dominated_by_p (CDI_DOMINATORS, bbuse, bbdef)) continue; gimple_debug_bind_reset_value (stmt); update_stmt (stmt); + break; } } Index: gcc/testsuite/gcc.dg/torture/pr64091.c === --- gcc/testsuite/gcc.dg/torture/pr64091.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr64091.c (working copy) @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-g" } */ + +extern int foo(void); + +int main(void) +{ + int i, a, b; + + if (foo()) +return 0; + + for (i = 0, a = 0, b = 0; i < 3; i++, a++) + { +if (foo()) + break; + +if (b += a) + a = 0; + } + + if (!a) +return 2; + + b += a; + + return 0; +} Hi, the ICE that the patch above fixes does not reproduce on 4.9. One reason is that an edge_flag EDGE_EXECUTABLE happens to be set, which prevents tail-merge from doing a merge. 
Another reason is that the use that is added to the free_uses freelist during update_stmt happens to be not immediately reused, so the contents remain the same. Using first attached patch, which: - clears EDGE_EXECUTABLE in tail-merge, and - clears the contents of a use when adding it to the freelist we manage to trigger the same problem with 4.9. It seems possible to me that the same problem could trigger on 4.9 for a different example without the trigger patch. The second attached patch is a minimal version of the above fix. OK for 4.9? Thanks, - Tom trigger --- gcc/tree-ssa-operands.c | 6 +- gcc/tree-ssa-tail-merge.c | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c index 76d04630..d249354 100644 --- a/gcc/tree-ssa-operands.c +++ b/gcc/tree-ssa-operands.c @@ -402,7 +402,11 @@ finalize_ssa_uses (struct function *fn, gimple stmt) if (old_ops) { for (ptr = old_ops; ptr; ptr = ptr->next) - delink_imm_use (USE_OP_PTR (ptr)); + { + delink_imm_use (USE_OP_PTR (ptr)); + USE_OP_PTR (ptr)->use = NULL; + USE_OP_PTR (ptr)->loc.stmt = NULL; + } old_ops->next = gimple_ssa_operands (fn)->free_uses; gimple_ssa_operands (fn)->free_uses = old_ops; } diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index b5165d5..af2641c 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -725,6 +725,7 @@ find_same_succ_bb (basic_block bb, same_succ *same_p) { int index = e->dest->index; bitmap_set_bit (same->succs, index); + e->flags &= ~EDGE_EXECUTABLE; same_succ_edge_flags[index] = e->flags; } EXECUTE_IF_SET_IN_BITMAP (same->succs, 0, j, bj) Backport "Fix PR64091" 2016-01-25 Tom de Vries backport from trunk: 2014-11-27 Richard Biener PR tree-optimization/PR64091 * tree-ssa-tail-merge.c (update_debug_stmt): After resetting the stmt break from the loop over use operands. * gcc.dg/torture/pr64091.c: New testcase.
--- gcc/testsuite/gcc.dg/torture/pr64091.c | 28 gcc/tree-ssa-tail-merge.c | 1 + 2 files changed, 29 insertions(+) diff --git a/gcc/testsuite/gcc.dg/torture/pr64091.c b/gcc/testsuite/gcc.dg/torture/pr64091.c new file mode 100644 index 000..0cd994a --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr64091.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-g" } */ + +extern int foo(void); + +int main(void) +{ + int i, a, b; + + if (foo()) +return 0; + + for (i = 0, a = 0, b = 0; i < 3; i++, a++) + { +
Re: [PATCH] fix gimplification of call parameters (PR cilkplus/69267)
ping On Tue, Jan 19, 2016 at 9:28 AM, Ryan Burn wrote: > Does this look ok? > >> On Jan 15, 2016, at 5:41 PM, Ryan Burn wrote: >> >> This patch changes the function cilk_gimplify_call_params_in_spawned_fn to >> use gimplify_arg instead of gimplify_expr. It fixes an ICE when calling a >> function with a constructed empty class as the argument. >> >> Bootstrapped and regression tested on x86_64-linux. >> >> 2016-01-15 Ryan Burn >> >>PR cilkplus/69267 >>* cilk.c (cilk_gimplify_call_params_in_spawned_fn): Change to use >>gimplify_arg. Removed superfluous post_p argument. >>* c-family.h (cilk_gimplify_call_params_in_spawned_fn): Removed >>superfluous post_p argument. >>* c-gimplify.c (c_gimplify_expr): Likewise. >> >> gcc/cp/ChangeLog: >> >> 2016-01-15 Ryan Burn >> >>PR cilkplus/69267 >>* cp-gimplify.c (cilk_cp_gimplify_call_params_in_spawned_fn): Removed >>superfluous post_p argument in call to >>cilk_gimplify_call_params_in_spawned_fn. >> >> gcc/testsuite/ChangeLog: >> >> 2016-01-15 Ryan Burn >> >> PR cilkplus/69267 >> * g++.dg/cilk-plus/CK/pr69267.cc: New test. >> >> >> >> >
Re: C++ PATCH for c++/69379 (ICE with PTRMEM_CST wrapped in NOP_EXPR)
On 01/22/2016 05:07 PM, Marek Polacek wrote: On Fri, Jan 22, 2016 at 03:38:26PM -0500, Jason Merrill wrote: If we have a NOP_EXPR to the same type, we should strip it here. This helps for the unreduced testcases in the PR, but not for the reduced one, because for the reduced one, the types are not the same. One type is struct { void Dict:: (struct Dict *, T) * __pfn; long int __delta; } and the second one struct { void Dict:: (struct Dict *) * __pfn; long int __delta; } The NOP_EXPR in this case originated in build_reinterpret_cast_1: 7070 else if ((TYPE_PTRFN_P (type) && TYPE_PTRFN_P (intype)) 7071|| (TYPE_PTRMEMFUNC_P (type) && TYPE_PTRMEMFUNC_P (intype))) 7072 return build_nop (type, expr); Well, a reinterpret_cast makes the expression non-constant, so we can recognize that case (when the types are unrelated) and bail out. After that we probably still need to deal with the case of conversion to a pointer-to-member-of-base type; for functions it looks like we can just copy the PTRMEM_CST and give it a different type, but for data members I think we'll need to add support for the type not matching the member in expand_ptrmem_cst. Jason
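To illustrate the distinction Jason draws, here is a hedged sketch (the type and variable names are invented, and this is not the testcase from the PR). An implicit pointer-to-member conversion from base to derived is fine in a constant expression, whereas a reinterpret_cast between unrelated pointer-to-member-function types is not, so that form only appears in a comment:

```cpp
struct Base
{
  int m = 5;
  int f () const { return 7; }
};

struct Derived : Base {};

// Implicit Base::* -> Derived::* conversions; both initializers are
// constant expressions.
constexpr int Derived::*pdm = &Base::m;
constexpr int (Derived::*pmf) () const = &Base::f;

// Not a constant expression, so this would be rejected if uncommented:
// constexpr int (Derived::*bad) (int) const
//   = reinterpret_cast<int (Derived::*) (int) const> (&Base::f);

// Hypothetical helpers that read through the constant pointers.
inline int read_member (const Derived &d) { return d.*pdm; }
inline int call_member (const Derived &d) { return (d.*pmf) (); }
```

The derived-to-base direction Jason mentions needs an explicit static_cast and is deliberately left out of the sketch.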
[PATCH] Fix a typo in ppc libgcc (PR target/69444)
Hi! The soft-fp multilib of powerpc libgcc doesn't build because of a typo in the conditional - the guarded code uses inline asm that assumes hard float. Ok for trunk? 2016-01-25 Jakub Jelinek PR target/69444 * config/rs6000/sfp-machine.h: Fix a typo in #ifndef - __NO_FPRS__ instead of ___NO_FPRS__. --- libgcc/config/rs6000/sfp-machine.h.jj 2016-01-21 21:27:57.0 +0100 +++ libgcc/config/rs6000/sfp-machine.h 2016-01-25 11:45:48.093285428 +0100 @@ -110,7 +110,7 @@ typedef int __gcc_CMPtype __attribute__ floating point on pre-ISA 3.0 machines without the IEEE 128-bit floating point support. */ -#ifndef ___NO_FPRS__ +#ifndef __NO_FPRS__ #define ISA_BIT(x) (1LL << (63 - x)) /* Use the same bits of the FPSCR. */ Jakub
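The failure mode is easy to reproduce in isolation. The sketch below uses made-up macro names (DEMO_NO_FPRS stands in for the compiler-predefined __NO_FPRS__, and DEMO_NO_FPRS_ for the misspelled guard); it only shows that an #ifndef on a name nobody defines is always taken, so the hard-float-only code gets compiled even for a soft-float configuration:

```cpp
// Pretend we are building the soft-float multilib: the "no FPRs" macro
// is defined.  (Hypothetical name; the real one is predefined by GCC.)
#define DEMO_NO_FPRS 1

// Misspelled guard: DEMO_NO_FPRS_ is never defined anywhere, so the
// #ifndef branch is taken unconditionally.
#ifndef DEMO_NO_FPRS_
static const bool typo_guard_enables_fp_code = true;
#else
static const bool typo_guard_enables_fp_code = false;
#endif

// Correctly spelled guard: the hard-float-only code is skipped.
#ifndef DEMO_NO_FPRS
static const bool correct_guard_enables_fp_code = true;
#else
static const bool correct_guard_enables_fp_code = false;
#endif
```

The preprocessor gives no diagnostic for the misspelling, which is why the problem only showed up as a build failure in the soft-fp multilib.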
[PATCH] [PR tree-optimization/69196] [PR tree-optimization/68398] Reorganize profitability testing for FSM jump threading
This is the first of a few patches to address 69196 (code size regression with jump threading) and 68398 (coremark regression due to FSM changes). While these are caused by distinct issues, they hit the same hunk of code. I could address them independently, but I believe in the end it'll just make the whole process more difficult. This hunk of work is a bit of reorganization. Right now valid_jump_thread_path searches the path for particular nuggets of information (did we thread through a multiway branch, did we thread a multiway branch, did we thread through the loop latch, did we thread the loop latch, etc). We really want that code in the FSM detection side so that we can make better decisions about whether or not a particular FSM path is likely to be profitable to thread. So this patch moves that code into the detection side. Second, the limiters for the FSM code had some minor bugs. For example, it didn't count one of the blocks when determining how many blocks were in the FSM path. It didn't account for PHIs when determining how many statements we'd copy, etc. Before the limiters are re-tuned, the basic accounting needs to be more accurate. This turns out to have a tiny positive impact on 69196, but the primary purpose of this patch is putting bits in the right place and fixing dumb accounting errors. The real work to address 69196 and 68398 will come in follow-up patches. The testsuite changes are totally an artifact of changing how we detect the actions of the jump threader. Bootstrapped & regression tested on x86_64. Installed on the trunk. Now to actually fix the regressions :-) Jeff commit e25c808d9975556443d1bf90f968f0fd567a5de6 Author: law Date: Mon Jan 25 19:19:09 2016 + PR tree-optimization/69196 PR tree-optimization/68398 * tree-ssa-threadupdate.h (enum bb_dom_status): Moved here from tree-ssa-threadupdate.c.
(determine_bb_domination_status): Prototype * tree-ssa-threadupdate.c (enum bb_dom_status): Remove (determine_bb_domination_status): No longer static. (valid_jump_thread_path): Remove code to detect characteristics of the jump thread path not associated with correctness. * tree-ssa-threadbackward.c (fsm_find_control_statment_thread_paths): Correct test for thread path length. Count PHIs for real operands as statements that need to be copied. Do not count ASSERT_EXPRs. Look at all the blocks in the thread path. Compute and selectively filter thread paths based on threading through the latch, threading a multiway branch or crossing a multiway branch. PR tree-optimization/69196 PR tree-optimization/68398 * gcc.dg/tree-ssa/pr66752-3.c: Update expected output * gcc.dg/tree-ssa/pr68198.c: Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232802 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6d51578..d9d59d7 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,21 @@ +2016-01-25 Jeff Law + + PR tree-optimization/69196 + PR tree-optimization/68398 + * tree-ssa-threadupdate.h (enum bb_dom_status): Moved here from + tree-ssa-threadupdate.c. + (determine_bb_domination_status): Prototype + * tree-ssa-threadupdate.c (enum bb_dom_status): Remove + (determine_bb_domination_status): No longer static. + (valid_jump_thread_path): Remove code to detect characteristics + of the jump thread path not associated with correctness. + * tree-ssa-threadbackward.c (fsm_find_control_statment_thread_paths): + Correct test for thread path length. Count PHIs for real operands as + statements that need to be copied. Do not count ASSERT_EXPRs. + Look at all the blocks in the thread path. Compute and selectively + filter thread paths based on threading through the latch, threading + a multiway branch or crossing a multiway branch. 
+ 2016-01-25 Bill Schmidt * config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Add diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 763ceac..7e5daa9 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,10 @@ +2016-01-25 Jeff Law + + PR tree-optimization/69196 + PR tree-optimization/68398 + * gcc.dg/tree-ssa/pr66752-3.c: Update expected output + * gcc.dg/tree-ssa/pr68198.c: Likewise. + 2016-01-25 David Edelsohn PR target/69469 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index 1f27b1a..6eeaca5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -33,9 +33,9 @@ foo (int N, int c, int b, int *a) }
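For context, the "FSM" paths this patch is concerned with typically come from loops like the following hypothetical word counter (not taken from either PR or from the testsuite). The state variable only ever receives constants, so on many iterations the next switch target is already known from the previous one. This is roughly the kind of path FSM jump threading looks for; whether a given path is actually threaded depends on the block and statement limits the message above describes.

```cpp
// Hypothetical two-state machine: counts space-separated words.
enum scan_state { OUTSIDE, INSIDE };

inline int count_words (const char *s)
{
  int words = 0;
  scan_state st = OUTSIDE;

  for (; *s; ++s)
    switch (st)  // multiway branch on a variable that only holds constants
      {
      case OUTSIDE:
        if (*s != ' ')
          {
            st = INSIDE;  // next iteration is known to reach case INSIDE
            ++words;
          }
        break;

      case INSIDE:
        if (*s == ' ')
          st = OUTSIDE;  // next iteration is known to reach case OUTSIDE
        break;
      }

  return words;
}
```

Threading such a path means duplicating the blocks along it, which is why the accounting of blocks, statements and PHIs matters for code size.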
Re: [PATCH] PR c++/69399: Add HAVE_WORKING_CXX_BUILTIN_CONSTANT_P
On Mon, Jan 25, 2016 at 4:40 AM, Richard Biener wrote: > On Fri, Jan 22, 2016 at 7:55 PM, H.J. Lu wrote: >> Without the fix for PR 65656, g++ miscompiles __builtin_constant_p in >> wi::lrshift in wide-int.h. Add a check with PR 65656 testcase to verify >> that C++ __builtin_constant_p works properly. >> >> Tested on x86-64 with working GCC: >> >> gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ >> prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ >> stage1-gcc/auto-host.h:#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1 >> >> and broken GCC: >> >> gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ >> prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ >> stage1-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */ >> >> Ok for trunk? > > I have a hard time seeing how we are "miscompiling" > > if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT) > ? xi.len == 1 && xi.val[0] >= 0 > : xi.precision <= HOST_BITS_PER_WIDE_INT) > > anything that relies on __builtin_constant_p () for semantics is fishy so why > is this not simply an lrshift implementation bug? > We hit this via: Breakpoint 1, wi::lrshift >, generic_wide_int > > (x=..., y=...) at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2898 2898 val[0] = xi.to_uhwi () >> shift; (gdb) bt #0 wi::lrshift >, generic_wide_int > > (x=..., y=...) at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2898 #1 0x009e7bbe in wi::rshift >, generic_wide_int > > (sgn=, y=..., x=...) at /export/gnu/import/git/sources/gcc-release/gcc/wide-int.h:2947 #2 bit_value_binop_1 (code=code@entry=RSHIFT_EXPR, type=type@entry=0x7fffefe82dc8, val=val@entry=0x7fffd7c0, mask=mask@entry=0x7fffd790, r1type=0x7fffefe82dc8, r1val=..., r1mask=..., r2type=0x7fffefd6b690, r2val=..., r2mask=...) 
at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1348 #3 0x009e9e7b in bit_value_binop (code=code@entry=RSHIFT_EXPR, type=0x7fffefe82dc8, rhs1=rhs1@entry=0x7fffefd71708, rhs2=) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1549 #4 0x009eb520 in evaluate_stmt (stmt=stmt@entry=0x7fffefe9a160) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:1785 #5 0x009ec8d2 in visit_assignment (stmt=stmt@entry=0x7fffefe9a160, output_p=output_p@entry=0x7fffdba0) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2258 #6 0x009ec9c2 in ccp_visit_stmt (stmt=0x7fffefe9a160, taken_edge_p=0x7fffdba8, output_p=0x7fffdba0) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2336 ---Type to continue, or q to quit--- #7 0x00a4efcf in simulate_stmt (stmt=stmt@entry=0x7fffefe9a160) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:348 #8 0x00a50f79 in simulate_block (block=) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:471 #9 ssa_propagate ( visit_stmt=visit_stmt@entry=0x9ec937 , visit_phi=visit_phi@entry=0x9e6aa5 ) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-propagate.c:888 #10 0x009e6295 in do_ssa_ccp () at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2382 #11 (anonymous namespace)::pass_ccp::execute (this=) at /export/gnu/import/git/sources/gcc-release/gcc/tree-ssa-ccp.c:2415 #12 0x0089ca0c in execute_one_pass (pass=pass@entry=0x19b4bf0) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2330 #13 0x0089cd62 in execute_pass_list_1 (pass=0x19b4bf0) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2382 #14 0x0089cd7f in execute_pass_list_1 (pass=0x19b4a70, pass@entry=0x19b48f0) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2383 #15 0x0089cd9c in execute_pass_list (fn=0x7fffefe98000, pass=0x19b48f0) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2393 #16 0x0089ba57 in do_per_function_toporder ( 
callback=callback@entry=0x89cd83 , ---Type to continue, or q to quit--- data=0x19b48f0) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:1728 #17 0x0089d3e3 in execute_ipa_pass_list (pass=0x19b4890) at /export/gnu/import/git/sources/gcc-release/gcc/passes.c:2736 #18 0x0066f3ac in ipa_passes () at /export/gnu/import/git/sources/gcc-release/gcc/cgraphunit.c:2172 #19 symbol_table::compile (this=this@entry=0x7fffefd6b000) at
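As an aside on Richard's point that semantics should not depend on __builtin_constant_p, here is a minimal sketch. DEMO_STATIC_CONSTANT_P is a stand-in written for this example and is only assumed to resemble the macro in GCC's sources; times_factor is hypothetical. The builtin merely selects between two arms that compute the same value, so the result is the same whether or not the compiler manages to fold the test:

```cpp
// Stand-in for a STATIC_CONSTANT_P-style hint (an assumption for this
// sketch): true only if the compiler can prove X is a true constant.
#if defined (__GNUC__)
# define DEMO_STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))
#else
# define DEMO_STATIC_CONSTANT_P(X) (false && (X))
#endif

// Hypothetical function: the hint only picks a cheaper instruction
// sequence; both arms return x * factor.
inline unsigned long
times_factor (unsigned long x, unsigned long factor)
{
  if (DEMO_STATIC_CONSTANT_P (factor == 2))
    return x << 1;      // fast arm, taken only when factor is provably 2
  return x * factor;    // general arm, always correct
}
```

Written this way, a compiler that answers the builtin differently can change performance but not results, which is the property Richard is asking about for wi::lrshift.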
Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
On 22 January 2016 at 12:56, Richard Biener wrote: > On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel wrote: >> >> >> On 01/19/2016 04:18 PM, Richard Biener wrote: >>> >>> maybe just if (currently_expanding_to_rtl)? >>> >>> But yes, this looks like a safe variant of the fix. >>> >>> Richard. >>> >> thanks, currently_expanding_to_rtl works perfectly. So the final version. >> I added a test for each target. > > Ok. > Hi, This small patch is needed to make the new test pass on arm hard-float targets (e.g. arm-none-linux-gnueabihf). I'm not sure it counts as obvious, so here it is. OK? Christophe. DATE Christophe Lyon * gcc.target/arm/pr68674.c: Check and use arm_fp effective target. > Thanks, > Richard. > >> bootstrapped / tested for : >> unix/-m32/-march=i586 >> unix >> >> arm-qemu/ >> arm-qemu//-mfpu=neon >> arm-qemu//-mfpu=neon-fp-armv8 >> >> aarch64-qemu >> >> >> >> >> >> >> diff --git a/gcc/testsuite/gcc.target/arm/pr68674.c b/gcc/testsuite/gcc.target/arm/pr68674.c index a31a88a..0b32374 100644 --- a/gcc/testsuite/gcc.target/arm/pr68674.c +++ b/gcc/testsuite/gcc.target/arm/pr68674.c @@ -1,7 +1,9 @@ /* PR target/68674 */ /* { dg-do compile } */ /* { dg-require-effective-target arm_neon_ok } */ -/* { dg-options "-O2 -mfloat-abi=softfp" } */ +/* { dg-require-effective-target arm_fp_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_fp } */ #pragma GCC target ("fpu=vfp")
[PATCH] PR other/69006: fix extra newlines after diagnostics (v2)
Here's an updated version of the patch. On Wed, 2016-01-13 at 18:32 +0100, Bernd Schmidt wrote: > On 01/13/2016 01:57 AM, David Malcolm wrote: > > There are five places in trunk that can call diagnostic_show_locus. > > I'd kind of like to see before/after example output for all of these, to > make sure that we are indeed removing only unnecessary newlines. Here's an attempt to show all of the cases, for the 4 out of 5 meaningful sites. It's rather long, so by way of summary it's as if I'd hand-unrolled these nested loops: for each of the 4 usage sites in the source code: for each of "before the patch" vs "after the patch": for each of with, then without -fno-diagnostics-show-caret (i.e. first without the quoted source text, then with it). giving 16 examples, using "VVV" and "^^^" to mark the bounds of what I'm quoting (to make it easier to see trailing newlines). USAGE SITE (1): in default_diagnostic_finalizer As before, the patch updates this to remove a newline immediately after a call to diagnostic_show_locus. Example of use: the Go frontend, e.g. 
go.test/test/assign.go: Before, with -fno-diagnostics-show-caret: $ gccgo ../../src/gcc/testsuite/go.test/test/assign.go -I../x86_64-pc-linux-gnu/libgo -O -fno-show-column -pedantic-errors -S -o assign.s -fno-diagnostics-show-caret ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘state’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘sema’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:45: error: unknown field ‘key’ in ‘sync.Mutex’ Before, without -fno-diagnostics-show-caret: $ gccgo ../../src/gcc/testsuite/go.test/test/assign.go -I../x86_64-pc-linux-gnu/libgo -O -fno-show-column -pedantic-errors -S -o assign.s ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘state’ in ‘sync.Mutex’ literal x := sync.Mutex{0, 0} // ERROR "assignment.*Mutex" ^ ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘sema’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:45: error: unknown field ‘key’ in ‘sync.Mutex’ x := sync.Mutex{key: 0} // ERROR "(unknown|assignment).*Mutex" ^ (note the erroneous trailing blank lines after the caret lines) After, with -fno-diagnostics-show-caret: $ ./gccgo -B. ../../src/gcc/testsuite/go.test/test/assign.go -I../x86_64-pc-linux-gnu/libgo -O -fno-show-column -pedantic-errors -S -o assign.s ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘state’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘sema’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:45: error: unknown field ‘key’ in ‘sync.Mutex’ (i.e. unchanged) After, without -fno-diagnostics-show-caret: $ ./gccgo -B. 
../../src/gcc/testsuite/go.test/test/assign.go -I../x86_64-pc-linux-gnu/libgo -O -fno-show-column -pedantic-errors -S -o assign.s -fno-diagnostics-show-caret ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘state’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘sema’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:45: error: unknown field ‘key’ in ‘sync.Mutex’ [david@c64 gcc]$ ./gccgo -B. ../../src/gcc/testsuite/go.test/test/assign.go -I../x86_64-pc-linux-gnu/libgo -O -fno-show-column -pedantic-errors -S -o assign.s ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘state’ in ‘sync.Mutex’ literal x := sync.Mutex{0, 0} // ERROR "assignment.*Mutex" ^ ../../src/gcc/testsuite/go.test/test/assign.go:41: error: assignment of unexported field ‘sema’ in ‘sync.Mutex’ literal ../../src/gcc/testsuite/go.test/test/assign.go:45: error: unknown field ‘key’ in ‘sync.Mutex’ x := sync.Mutex{key: 0} // ERROR "(unknown|assignment).*Mutex" ^ (i.e. fixing the erroneous trailing blank lines after the caret lines) USAGE SITE (2): c_diagnostic_finalizer: Likewise. Example of use: C frontend, e.g. gcc.dg/2003-1.c Before, with -fno-diagnostics-show-caret: $ gcc
RE: [PATCH] Skip re-computing the mips frame info after reload completed
Bernd Edlinger writes:
> Matthew Fortune writes:
> > Has the patch been tested beyond just building GCC? I can do a
> > test run for you if you don't have things set up to do one yourself.
>
> I built a cross-gcc with all languages and a cross-glibc, but I have
> not set up an emulation environment, so if you could give it a test
> that would be highly welcome.

mipsel-linux-gnu test results are the same before and after this patch.
Please go ahead and commit.

Thanks,
Matthew
Re: C++ PATCH for c++/69379 (ICE with PTRMEM_CST wrapped in NOP_EXPR)
On Mon, Jan 25, 2016 at 10:08:34AM -0500, Jason Merrill wrote:
> On 01/22/2016 05:07 PM, Marek Polacek wrote:
> >On Fri, Jan 22, 2016 at 03:38:26PM -0500, Jason Merrill wrote:
> >>If we have a NOP_EXPR to the same type, we should strip it here.
> >
> >This helps for the unreduced testcases in the PR, but not for the reduced one,
> >because for the reduced one, the types are not the same. One type is
> >struct
> >{
> >  void Dict:: (struct Dict *, T) * __pfn;
> >  long int __delta;
> >}
> >and the second one
> >struct
> >{
> >  void Dict:: (struct Dict *) * __pfn;
> >  long int __delta;
> >}
> >
> >The NOP_EXPR in this case originated in build_reinterpret_cast_1:
> >7070   else if ((TYPE_PTRFN_P (type) && TYPE_PTRFN_P (intype))
> >7071            || (TYPE_PTRMEMFUNC_P (type) && TYPE_PTRMEMFUNC_P (intype)))
> >7072     return build_nop (type, expr);
>
> Well, a reinterpret_cast makes the expression non-constant, so we can
> recognize that case (when the types are unrelated) and bail out. After that
> we probably still need to deal with the case of conversion to a
> pointer-to-member-of-base type; for functions it looks like we can just copy
> the PTRMEM_CST and give it a different type, but for data members I think
> we'll need to add support for the type not matching the member in
> expand_ptrmem_cst.

It appears that handling the case when the types don't match is sufficient, at
least all the tests pass, thus the following should be enough. If you want me
to take care of the rest then please let me know, though without a testcase it
might be harder to get right.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-25  Marek Polacek

	PR c++/69379
	* constexpr.c (cxx_eval_constant_expression): Handle PTRMEM_CSTs
	wrapped in NOP_EXPRs.

	* g++.dg/pr69379.C: New test.
diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 6b0e5a8..4b952d1 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -3619,6 +3619,20 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 	if (TREE_CODE (op) == PTRMEM_CST
 	    && !TYPE_PTRMEM_P (type))
 	  op = cplus_expand_constant (op);
+	if (TREE_CODE (op) == PTRMEM_CST && tcode == NOP_EXPR)
+	  {
+	    if (same_type_ignoring_top_level_qualifiers_p (type,
+							   TREE_TYPE (op)))
+	      STRIP_NOPS (t);
+	    else
+	      {
+		if (!ctx->quiet)
+		  error_at (EXPR_LOC_OR_LOC (t, input_location),
+			    "reinterpret_cast has different types");
+		*non_constant_p = true;
+		return t;
+	      }
+	  }
 	if (POINTER_TYPE_P (type)
 	    && TREE_CODE (op) == INTEGER_CST
 	    && !integer_zerop (op))
diff --git gcc/testsuite/g++.dg/pr69379.C gcc/testsuite/g++.dg/pr69379.C
index e69de29..249ad00 100644
--- gcc/testsuite/g++.dg/pr69379.C
+++ gcc/testsuite/g++.dg/pr69379.C
@@ -0,0 +1,20 @@
+// PR c++/69379
+// { dg-do compile }
+// { dg-options "-Wformat" }
+
+typedef int T;
+class A {
+public:
+  template A(const char *, D);
+  template
+  void m_fn1(const char *, Fn, A1 const &, A2);
+};
+struct Dict {
+  void m_fn2();
+};
+void fn1() {
+  A a("", "");
+  typedef void *Get;
+  typedef void (Dict::*d)(T);
+  a.m_fn1("", Get(), d(::m_fn2), "");
+}

	Marek
Re: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order
Andreas Schwab wrote:
> FAIL: gcc.target/aarch64/ccmp_1.c scan-assembler-times \tcmp\tw[0-9]+, 0 4
> FAIL: gcc.target/aarch64/ccmp_1.c scan-assembler adds\t
> FAIL: gcc.target/aarch64/ccmp_1.c scan-assembler-times fccmpe\t.*0\\.0 1

Yes I noticed those too, and here is the fix. Richard's recent change added
UNSPEC to the CCMP patterns to stop combine optimizing the CCMP CCmode immediate
in a rare case. This requires a change to the CCMP cost calculation as the CCMP
instruction with unspec is no longer recognized.

Fix the ccmp_1.c test to allow both '0' and 'wzr' on cmp - BTW is there a regular
expression that correctly implements (0|xzr)? If I use that the test still fails
somehow but \[0wzr\]+ works fine... Is the correct syntax documented somewhere?

Finally to ensure FCCMPE is emitted on relational compares, add -ffinite-math-only.

ChangeLog:
2016-01-25  Wilco Dijkstra

gcc/
	* config/aarch64/aarch64.c (aarch64_if_then_else_costs):
	Remove CONST_INT_P check in CCMP cost calculation.

gcc/testsuite/
	* gcc.target/aarch64/ccmp_1.c: Fix test issues.

---
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6c570c7db1cfbd0415e73fb110ce5d70aa09b540..7f304b78a3e48862bf5aaf855e307fe90969dd8c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6014,7 +6014,7 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
   else if (GET_MODE_CLASS (GET_MODE (inner)) == MODE_CC)
     {
       /* CCMP.  */
-      if ((GET_CODE (op1) == COMPARE) && CONST_INT_P (op2))
+      if (GET_CODE (op1) == COMPARE)
 	{
 	  /* Increase cost of CCMP reg, 0, imm, CC to prefer CMP reg, 0.  */
 	  if (XEXP (op1, 1) == const0_rtx)
diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
index 7c39b61a585a1d4d662b0736e1c80e06bdc6b4ce..8e3f8629f802eec64c95080a23f320712333471b 100644
--- a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -ffinite-math-only" } */
 
 int
 f1 (int a)
@@ -85,7 +85,7 @@ f13 (int a, int b)
 
 /* { dg-final { scan-assembler "cmp\t(.)+34" } } */
 /* { dg-final { scan-assembler "cmp\t(.)+35" } } */
 
-/* { dg-final { scan-assembler-times "\tcmp\tw\[0-9\]+, 0" 4 } } */
+/* { dg-final { scan-assembler-times "\tcmp\tw\[0-9\]+, \[0wzr\]+" 4 } } */
 /* { dg-final { scan-assembler-times "fcmpe\t(.)+0\\.0" 2 } } */
 /* { dg-final { scan-assembler-times "fcmp\t(.)+0\\.0" 2 } } */
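
On the regular-expression question in the message above, here is a small sketch of how a bracket expression such as [0wzr]+ differs from an alternation such as (0|wzr). It uses C++ std::regex with ECMAScript syntax, which is an assumption made for illustration only: DejaGnu patterns are Tcl regular expressions, and this sketch does not try to explain why the alternation failed in the testsuite. The helper name operand_matches is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole operand text matches the pattern.
// ECMAScript syntax is assumed here; DejaGnu itself uses Tcl regexps.
bool operand_matches(const std::string& pattern, const std::string& operand) {
    return std::regex_match(operand, std::regex(pattern));
}

// The bracket expression accepts any non-empty run of the characters
// '0', 'w', 'z' and 'r', so it is looser than the alternation.
const char* const kCharClass = "[0wzr]+";

// The alternation accepts exactly "0" or exactly "wzr".
// (?: ... ) is a non-capturing group.
const char* const kAlternation = "(?:0|wzr)";
```

Both patterns accept "0" and "wzr", but only the bracket expression also accepts strings such as "zzz" or "w0", so the committed pattern is somewhat more permissive than the alternation would be.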
[PATCH] Handle -fsanitize=* in lto-wrapper (PR lto/69254)
Hi!

Here is an attempt to handle -f{,no-}sanitize= options in LTO wrapper.
In addition to that I've noticed ICEs e.g. if some OpenMP code is compiled
with -c -flto -fopenmp, but final link is -fno-openmp, similarly for
openacc, -fcilkplus is similar but used to be handled even less.

The intended behavior for -f{,no-}sanitize= is that for the ubsan sanitizers
which are typically lowered before IPA, but are often using builtins that
need initialization even at the LTO level, we collect from each TU info on
whether any ubsan sanitizers have been enabled (note, this needs parsing
of the options, because we can e.g. have
-fsanitize=shift,return -fno-sanitize=undefined -fsanitize=integer-divide-by-zero
) and turn that into -fsanitize=shift from all the TUs if any of them needed
any (randomly chosen sanitizer that is handled by FEs only).
For address or thread sanitizers, which are handled solely post IPA,
the choice whether to sanitize is left to the linker command line.
And finally we need to ensure that e.g. -fno-sanitize=address,shift
doesn't turn off the ubsan sanitizers.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-25  Jakub Jelinek

	PR lto/69254
	* opts.h (parse_sanitizer_options): New prototype.
	* opts.c (sanitizer_opts): New array.
	(parse_sanitizer_options): New function.
	(common_handle_option): Use parse_sanitizer_options.
	* lto-opts.c (lto_write_options): Write also -f{,no-}sanitize=
	options.
	* lto-wrapper.c (sanitize_shift_decoded_opt): New function.
	(merge_and_complain): Determine if any -fsanitize= options
	enabled at the end any undefined behavior sanitizers, and
	append -fsanitize=shift if needed. Handle -fcilkplus.
	(append_compiler_options): Handle -fcilkplus and -fsanitize=.
	(append_linker_options): Ignore -fno-{openmp,openacc,cilkplus}.
	(find_and_merge_options): Canonicalize -fsanitize= options.
	(run_gcc): Append -fsanitize=shift if compiler options set it
	and linker options might override it.
--- gcc/opts.h.jj	2016-01-23 00:13:00.714017906 +0100
+++ gcc/opts.h	2016-01-25 14:06:31.833127411 +0100
@@ -372,6 +372,8 @@ extern void control_warning_option (unsi
 extern char *write_langs (unsigned int mask);
 extern void print_ignored_options (void);
 extern void handle_common_deferred_options (void);
+unsigned int parse_sanitizer_options (const char *, location_t, int,
+				      unsigned int, int, bool);
 extern bool common_handle_option (struct gcc_options *opts,
 				  struct gcc_options *opts_set,
 				  const struct cl_decoded_option *decoded,
--- gcc/opts.c.jj	2016-01-23 00:13:00.662018617 +0100
+++ gcc/opts.c	2016-01-25 14:06:31.834127398 +0100
@@ -1433,6 +1433,104 @@ enable_fdo_optimizations (struct gcc_opt
     opts->x_flag_tree_loop_distribute_patterns = value;
 }
 
+/* -f{,no-}sanitize{,-recover}= suboptions.  */
+static const struct sanitizer_opts_s
+{
+  const char *const name;
+  unsigned int flag;
+  size_t len;
+} sanitizer_opts[] =
+{
+#define SANITIZER_OPT(name, flags) { #name, flags, sizeof #name - 1 }
+  SANITIZER_OPT (address, SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS),
+  SANITIZER_OPT (kernel-address, SANITIZE_ADDRESS | SANITIZE_KERNEL_ADDRESS),
+  SANITIZER_OPT (thread, SANITIZE_THREAD),
+  SANITIZER_OPT (leak, SANITIZE_LEAK),
+  SANITIZER_OPT (shift, SANITIZE_SHIFT),
+  SANITIZER_OPT (integer-divide-by-zero, SANITIZE_DIVIDE),
+  SANITIZER_OPT (undefined, SANITIZE_UNDEFINED),
+  SANITIZER_OPT (unreachable, SANITIZE_UNREACHABLE),
+  SANITIZER_OPT (vla-bound, SANITIZE_VLA),
+  SANITIZER_OPT (return, SANITIZE_RETURN),
+  SANITIZER_OPT (null, SANITIZE_NULL),
+  SANITIZER_OPT (signed-integer-overflow, SANITIZE_SI_OVERFLOW),
+  SANITIZER_OPT (bool, SANITIZE_BOOL),
+  SANITIZER_OPT (enum, SANITIZE_ENUM),
+  SANITIZER_OPT (float-divide-by-zero, SANITIZE_FLOAT_DIVIDE),
+  SANITIZER_OPT (float-cast-overflow, SANITIZE_FLOAT_CAST),
+  SANITIZER_OPT (bounds, SANITIZE_BOUNDS),
+  SANITIZER_OPT (bounds-strict, SANITIZE_BOUNDS | SANITIZE_BOUNDS_STRICT),
+  SANITIZER_OPT (alignment, SANITIZE_ALIGNMENT),
+  SANITIZER_OPT (nonnull-attribute, SANITIZE_NONNULL_ATTRIBUTE),
+  SANITIZER_OPT (returns-nonnull-attribute,
+		 SANITIZE_RETURNS_NONNULL_ATTRIBUTE),
+  SANITIZER_OPT (object-size, SANITIZE_OBJECT_SIZE),
+  SANITIZER_OPT (vptr, SANITIZE_VPTR),
+  SANITIZER_OPT (all, ~0),
+#undef SANITIZER_OPT
+  { NULL, 0, 0 }
+};
+
+/* Parse comma separated sanitizer suboptions from P for option SCODE,
+   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
+   don't issue diagnostics.  */
+
+unsigned int
+parse_sanitizer_options (const char *p, location_t loc, int scode,
+			 unsigned int flags, int value, bool complain)
+{
+  enum opt_code code = (enum opt_code) scode;
+  while (*p != 0)
+    {
+
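
The reason the options have to be parsed in order, as the message above explains, can be sketched in a few lines. This is a standalone illustration and not the patch's parse_sanitizer_options: the flag bits, the reduced name table and the function names apply_list, parse_options and needs_ubsan_init are all hypothetical, and "undefined" is assumed to cover only the four ubsan checks listed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flag bits, loosely modelled on the SANITIZE_* names above.
enum : std::uint32_t {
    kAddress = 1u << 0,
    kThread = 1u << 1,
    kShift = 1u << 2,
    kDivide = 1u << 3,
    kReturn = 1u << 4,
    kNull = 1u << 5,
    // In this sketch "undefined" is just the union of the ubsan bits.
    kUndefined = kShift | kDivide | kReturn | kNull,
};

struct SanitizerName {
    const char* name;
    std::uint32_t flags;
};

// Reduced table; the real sanitizer_opts array has many more entries.
const SanitizerName kTable[] = {
    {"address", kAddress},
    {"thread", kThread},
    {"shift", kShift},
    {"integer-divide-by-zero", kDivide},
    {"return", kReturn},
    {"null", kNull},
    {"undefined", kUndefined},
};

// Apply one comma-separated suboption list to FLAGS, setting or clearing bits.
std::uint32_t apply_list(std::uint32_t flags, const std::string& list, bool enable) {
    std::size_t pos = 0;
    while (pos <= list.size()) {
        std::size_t comma = list.find(',', pos);
        if (comma == std::string::npos)
            comma = list.size();
        const std::string item = list.substr(pos, comma - pos);
        for (const SanitizerName& entry : kTable)
            if (item == entry.name)
                flags = enable ? (flags | entry.flags) : (flags & ~entry.flags);
        pos = comma + 1;
    }
    return flags;
}

// Walk the options left to right; later options override earlier ones.
std::uint32_t parse_options(const std::vector<std::string>& args) {
    const std::string on = "-fsanitize=";
    const std::string off = "-fno-sanitize=";
    std::uint32_t flags = 0;
    for (const std::string& arg : args) {
        if (arg.compare(0, on.size(), on) == 0)
            flags = apply_list(flags, arg.substr(on.size()), true);
        else if (arg.compare(0, off.size(), off) == 0)
            flags = apply_list(flags, arg.substr(off.size()), false);
    }
    return flags;
}

// True if any ubsan check is still enabled after all options are applied.
bool needs_ubsan_init(std::uint32_t flags) {
    return (flags & kUndefined) != 0;
}
```

With the example from the message, -fsanitize=shift,return -fno-sanitize=undefined -fsanitize=integer-divide-by-zero, only the divide bit survives, so a wrapper following this scheme would still need to pass some ubsan option to the link step. Likewise -fsanitize=undefined followed by -fno-sanitize=address,shift leaves the other ubsan bits set.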
Re: [PATCH][ARM] Fix PR target/69245 Rewrite arm_set_current_function
On 22/01/16 14:51, Christian Bruel wrote:
Hi Kyrill,

On 01/22/2016 03:17 PM, Kyrill Tkachov wrote:
Hi Christian,

On 22/01/16 14:07, Christian Bruel wrote:
Hi Kyrill,

On 01/21/2016 01:22 PM, Kyrill Tkachov wrote:
Hi Christian,

On 21/01/16 10:36, Christian Bruel wrote:
The current arm_set_current_function was both awkward and buggy. For instance using partially set TARGET_OPTION set from pragma_parse, while restore_target_globals nor arm_option_params_internal was not reset. Another issue is that in some paths, target_reinit was not called due to old cached target_option_current_node value. for instance with

foo{}
#pragma GCC target ...

foo was called with global_options set from old GCC target (which was wrong) and correct rtl values.

This is a reimplementation of the function. Hoping to be easier to read (and maintain). Solves the current issues seen so far.

regtested for arm-linux-gnueabi -mfpu=vfp, -mfpu=neon,-mfpu=neon-fp-armv8

Thanks for the patch, I'll try it out. In the meantime there's a couple of style and typo nits...

+  /* Make sure that target_reinit is called for next function, since
+     TREE_TARGET_OPTION might change with the #pragma even if there are
+     no target attribute attached to the function.  */

s/attribute/attributes

-  arm_previous_fndecl = fndecl;
+  /* if no attribute, use the mode set by the current pragma target.  */
+  if (! new_tree)
+    new_tree = target_option_current_node;
+

s/if/If/

+      /* now target_reinit can save the state for later.  */
+      TREE_TARGET_GLOBALS (new_tree)
+	= save_target_globals_default_opts ();

s/now/Now/

While playing on my side. I realized that we could simplify the patch further by removing the need to set and use target_option_current_node, since this is redundant with what handle_pragma_push/pop_options does. Also since that the functions inside a pragma GCC target region will have DECL_FUNCTION_SPECIFIC_TARGET set already, we don't seem to need a special case for those.

With this V2, arm_set_current_function is becoming more minimalist and still fixes the current issues.

Could you test this version instead ?

Thanks, I'll check this out instead. I've played a bit with your previous version and the effect on the testcases looked ok, but I have a couple of comments on the testcase in the meantime

Index: gcc/testsuite/gcc.target/arm/pr69245.c
===
--- gcc/testsuite/gcc.target/arm/pr69245.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr69245.c	(working copy)
@@ -0,0 +1,24 @@
+/* PR target/69245 */
+/* Test that pop_options restores the vfp fpu mode.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-add-options arm_fp } */
+
+#pragma GCC target "fpu=vfp"
+
+#pragma GCC push_options
+#pragma GCC target "fpu=neon"
+int a, c, d;
+float b;
+static int fn1 ()
+{
+  return 0;
+}
+#pragma GCC pop_options
+
+void fn2 ()
+{
+  d = b * c + a;
+}
+
+/* { dg-final { scan-assembler-times "\.fpu vfp" 1 } } */

PR 69245 is an ICE whereas your testcase doesn't ICE without the patch, it just fails the scan-assembler check. I'd like to have the testcase trigger the ICE without your patch. For that we need -O2 in dg-options. Also, the "fpu=vfp" pragma you put in the beginning doesn't allow the ICE to trigger, presumably because it triggers a different path through the pragma option popping code. So removing that pragma and instead changing the dg-add-options from arm_fp to arm_vfp3 (which is floating-point without the vfma instruction causes the ICE) does the trick for me. Also the "fpu=neon" pragma should also be changed to be "fpu=neon-vfpv4" because that setting allows the vfma instruction which is being wrongly considered in fn2(). I suppose you'll then want to change the scan-assembler directive to look for \.fpu vfp3.

ah yes ! OK for -O2, I thought I had it, must have been deleted somewhere :-(

I added the #pragma GCC target "fpu=vfp" to have some kind of deterministic checks to guard against the options permutations that Christophe stresses during his validations. so for instance the ".fpu scan-assembler would change depending on the default options...

so the following test should ICE with the all configurations (!-mfloat-abi=soft) in -O2

#pragma GCC target "fpu=vfp"

#pragma GCC push_options
#pragma GCC target "fpu=neon-vfpv4"
int a, c, d;
float b;
static int fn1 ()
{
  return 0;
}
#pragma GCC pop_options

void fn2 ()
{
  d = b * c + a;
}

Ah ok, I needed to update my tree to include your other midend fixes in this area. I played around with the patch and gave it a bootstrap as well. I wanted to make a sanity check on compile-time performance for files using arm_neon.h and I didn't spot any measurable regressions.

So this is ok for trunk with the testcase changed as discussed above and using -O2 optimisation level and with a couple comment fixes below.

- arm_previous_fndecl =
Re: [PATCH, committed][gcc-5-branch] Fix broken test case derived_constructor_comps_6.f90
On Mon, 2016-01-25 at 18:51 +0100, Paul Richard Thomas wrote:
> On 25 January 2016 at 18:33, Peter Bergner wrote:
>> I'll leave it to you or someone else to fix the -m32 bug, since
>> I'm not seeing it on my system.
>
> Neither am I :-(

I have a powerpc64-linux (ie, BE) system I can jump on that does support
-m32. I'll see if I can't recreate the -m32 segv Dominique was seeing.

Peter
Re: [PING^2][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m
Ping.

On 27/10/15 17:03, Andre Vieira wrote:
Ping.

BR,
Andre

On 13/10/15 18:01, Andre Vieira wrote:
This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license.

The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license.

The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size. So it will only be used if __OPTIMIZE_SIZE__ is not defined.

Make check passed for armv6-m.

libgcc/ChangeLog:
2015-08-10  Hale Wang
	    Andre Vieira

	* config/arm/lib1funcs.S: Add new wrapper.
Re: [aarch64] Improve TImode constant moves
On 01/25/2016 01:32 AM, Kyrill Tkachov wrote:
> +    case CONST_WIDE_INT:
> +      *cost = 0;
> +      for (unsigned int n = CONST_WIDE_INT_NUNITS(x), i = 0; i < n; ++i)
> +	{
> +	  unsigned HOST_WIDE_INT e = CONST_WIDE_INT_ELT(x, i);
> +	  if (e != 0)
> +	    *cost += COSTS_N_INSNS (aarch64_internal_mov_immediate
> +				    (NULL_RTX, GEN_INT (e), false, DImode));
> +	}
> +      return true;
> +
>
> We usually avoid creating intermediate rtxes in the cost function because
> it can potentially be called many times during compilation and we want to avoid
> creating too many short-lived objects, though I suppose there's no way getting
> around this one (the GEN_INT call).

Well, it's only aarch64_internal_mov_immediate -- we could change the interface
to provide the HOST_WIDE_INT value directly. But that was more than I wanted to
do for enabling splittable TImode constants.

r~
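
The shape of the quoted cost loop, and of the interface change suggested in the reply (taking the integer value directly rather than an rtx), can be sketched outside GCC as follows. Everything here is an assumption made for illustration: mov_immediate_insns is a hypothetical stand-in that simply counts non-zero 16-bit chunks, whereas the real aarch64_internal_mov_immediate knows about more encodings, and the COSTS_N_INSNS scaling is left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for aarch64_internal_mov_immediate, taking the
// 64-bit value directly so that no temporary object is needed.
// Simplified model: one instruction per non-zero 16-bit chunk, and one
// instruction for zero.
int mov_immediate_insns(std::uint64_t value) {
    int count = 0;
    for (int shift = 0; shift < 64; shift += 16)
        if ((value >> shift) & 0xffff)
            ++count;
    return count == 0 ? 1 : count;
}

// Mirror of the quoted loop: sum the cost of every non-zero 64-bit
// element of a wide constant and skip zero elements entirely.
int wide_int_cost(const std::vector<std::uint64_t>& elements) {
    int cost = 0;
    for (std::uint64_t e : elements)
        if (e != 0)
            cost += mov_immediate_insns(e);
    return cost;
}
```

Under this simplified model a 128-bit constant whose high half is zero costs only as much as its low half, which matches the intent of skipping zero elements in the quoted code.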
[PATCH, gcc7, aarch64] Add arithmetic overflow patterns
After having just spent a few days looking through dumps of builtin-overflow-*.c
for regressions while testing the patch for the TImode arithmetic PR, I thought
I'd go ahead and post a patch to make use of the overflow bit on aarch64.

Consider this queued for stage1.

r~

	* config/aarch64/aarch64-modes.def (CC_V): New.
	* config/aarch64/aarch64.c (aarch64_zero_extend_const_eq): New.
	(aarch64_select_cc_mode): Test for signed overflow using CC_Vmode.
	(aarch64_get_condition_code_1): Handle CC_Vmode.
	* config/aarch64/aarch64-protos.h: Update.
	* config/aarch64/aarch64.md (addv4, uaddv4): New.
	(addti3): Create simpler code if low part is already known to be 0.
	(addvti4, uaddvti4): New.
	(*add3_compareC_cconly_imm): New.
	(*add3_compareC_cconly): New.
	(*add3_compareC_imm): New.
	(*add3_compareC): Rename from add3_compare1; do not handle
	constants within this pattern.
	(*add3_compareV_cconly_imm): New.
	(*add3_compareV_cconly): New.
	(*add3_compareV_imm): New.
	(add3_compareV): New.
	(add3_carryinC, add3_carryinV): New.
	(*add3_carryinC_zero, *add3_carryinV_zero): New.
	(*add3_carryinC, *add3_carryinV): New.
	(subv4, usubv4): New.
	(subti): Handle op1 zero.
	(subvti4, usub4ti4): New.
	(*sub3_compare1_imm): New.
	(sub3_carryinCV): New.
	(*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
	(*sub3_carryinCV_z2, *sub3_carryinCV): New.

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 7de0b3f..f34345a 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -26,6 +26,7 @@ CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS). */
 CC_MODE (CC_NZ);    /* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z);     /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C);     /* Only C bit of condition flags is valid.  */
+CC_MODE (CC_V);     /* Only V bit of condition flags is valid.  */
 
 /* Half-precision floating point for __fp16.  */
 FLOAT_MODE (HF, 2, 0);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 15fc37d..32cf245 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -289,6 +289,7 @@ void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_modes_tieable_p (machine_mode mode1,
 			      machine_mode mode2);
+bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 bool aarch64_mov_operand_p (rtx, machine_mode);
 int aarch64_simd_attr_length_rglist (enum machine_mode);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c18ab2..191d081 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1489,6 +1489,16 @@ aarch64_split_simd_move (rtx dst, rtx src)
     }
 }
 
+bool
+aarch64_zero_extend_const_eq (machine_mode xmode, rtx x,
+			      machine_mode ymode, rtx y)
+{
+  rtx r = simplify_const_unary_operation (ZERO_EXTEND, xmode, y, ymode);
+  gcc_assert (r != NULL);
+  return rtx_equal_p (x, r);
+}
+
+
 static rtx
 aarch64_force_temporary (machine_mode mode, rtx x, rtx value)
 {
@@ -4192,6 +4202,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
       && GET_CODE (y) == ZERO_EXTEND)
     return CC_Cmode;
 
+  /* A test for signed overflow.  */
+  if ((GET_MODE (x) == DImode || GET_MODE (x) == TImode)
+      && code == NE
+      && GET_CODE (x) == PLUS
+      && GET_CODE (y) == SIGN_EXTEND)
+    return CC_Vmode;
+
   /* For everything else, return CCmode.  */
   return CCmode;
 }
@@ -4300,6 +4317,15 @@ aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
 	}
       break;
 
+    case CC_Vmode:
+      switch (comp_code)
+	{
+	case NE: return AARCH64_VS;
+	case EQ: return AARCH64_VC;
+	default: return -1;
+	}
+      break;
+
     default:
       return -1;
       break;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 363785e..46056f2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1703,22 +1703,150 @@
 }
 )
 
+(define_expand "addv4"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")
+   (match_operand:GPI 2 "register_operand")
+   (match_operand 3 "")]
+  ""
+{
+  emit_insn (gen_add3_compareV (operands[0], operands[1], operands[2]));
+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Vmode, CC_REGNUM), const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+			    gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+			    pc_rtx);
+  emit_jump_insn
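
For readers less familiar with the builtin-overflow tests mentioned at the top of this message, the source-level construct that exercises overflow patterns of this kind is GCC's __builtin_add_overflow family. The sketch below is an illustration only: the helper names checked_add and checked_add_unsigned are hypothetical, and which instructions the compiler actually emits for them depends on the target and on patches such as this one.

```cpp
#include <climits>

// Hypothetical helper: signed addition that reports success.
// __builtin_add_overflow returns true when the mathematical result does
// not fit in *sum, so the return value is inverted here.
bool checked_add(long a, long b, long* sum) {
    return !__builtin_add_overflow(a, b, sum);
}

// Hypothetical helper: the unsigned counterpart, where "overflow" means
// a carry out of the top bit.
bool checked_add_unsigned(unsigned a, unsigned b, unsigned* sum) {
    return !__builtin_add_overflow(a, b, sum);
}
```

The signed form is the case the new CC_V mode is meant to test via the V flag, and the unsigned form corresponds to the carry-based CC_C mode that the surrounding code already handles.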
Re: [PATCH] fix #69317 - [6 regression] wrong ABI version in -Wabi warnings
Ping: I'm looking for a review/approval of the almost trivial patch below:

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01206.html

On 01/16/2016 05:42 PM, Martin Sebor wrote:
> While adding an ABI warning in the patch for bug 69277 I noticed
> that the ABI version printed by GCC 6.0 in some -Wabi diagnostics
> is incorrect: while 5.1.0 prints the versions of the ABI given by
> the -Wabi=X and -fabi-version=Y options (i.e., it mentions both
> X and Y), 6.0 prints the same version twice (just Y).
>
> The attached patch fixes this and adds tests to verify that the
> referenced versions are as expected (it uses ABIs 2 and 3 but
> tests exercising the other ABI changes should be added as well).
>
> Martin
Re: [PATCH] Fix aarch64 bootstrap (pr69416)
On 01/25/2016 05:28 AM, Christophe Lyon wrote:
> After this, I'm seeing this test now FAILs:
> gcc.target/aarch64/ccmp_1.c scan-assembler adds\t

That test case is badly written. In addition to that one, several of the other
failures that I see within that file are simply equally optimal alternative
choices for the compiler.

The file needs to be split up and simpler more directed tests written.

r~
[patch] bootstrap/69464 Avoid including all of <random> in <algorithm>
In C++11 mode <algorithm> defines std::shuffle which uses
std::uniform_int_distribution. It doesn't need the rest of <random>,
which is huge, especially on x86 with SSE3 support, where pmmintrin.h
is pulled in.

This moves the definition of std::uniform_int_distribution to a new
header, and makes <algorithm> include that instead of <random>.

That removes 23kloc from <algorithm>, making it much less of a problem
for the rest of the compiler to use during bootstrap.

The change revealed a few testsuite bugs where tests (incorrectly)
relied on <algorithm> pulling in <memory> or <vector> indirectly. It's
likely that some programs will stop compiling because of this change,
the fix will be to add the necessary headers.

Tested x86_64-linux and powerpc64-linux, committed to trunk.

commit b8cab1de33b9d3fbfbe984b6b7e9d3aa41b8e80e
Author: Jonathan Wakely
Date:   Mon Jan 25 14:15:24 2016 +

    Avoid including all of <random> in <algorithm>

	PR libstdc++/69464
	* include/Makefile.am: Add new header.
	* include/Makefile.in: Regenerate.
	* include/bits/random.h (uniform_int_distribution): Move to
	bits/uniform_int_dist.h.
	* include/bits/random.tcc (uniform_int_distribution::operator(),
	uniform_int_distribution::__generate_impl): Likewise.
	* include/bits/uniform_int_dist.h: New header.
	* include/bits/stl_algo.h [__cplusplus >= 201103L]: Include
	<bits/uniform_int_dist.h> instead of <random>.
	* testsuite/20_util/specialized_algorithms/uninitialized_copy/
	move_iterators/1.cc: Include correct header for uninitialized_copy.
	* testsuite/20_util/specialized_algorithms/uninitialized_copy_n/
	move_iterators/1.cc: Likewise.
	* testsuite/25_algorithms/nth_element/58800.cc: Include correct
	header for vector.
	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error lines.
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 573f057..0b34c3c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -180,6 +180,7 @@ bits_headers = \
 	${bits_srcdir}/stl_vector.h \
 	${bits_srcdir}/streambuf.tcc \
 	${bits_srcdir}/stringfwd.h \
+	${bits_srcdir}/uniform_int_dist.h \
 	${bits_srcdir}/unique_ptr.h \
 	${bits_srcdir}/unordered_map.h \
 	${bits_srcdir}/unordered_set.h \
diff --git a/libstdc++-v3/include/bits/random.h b/libstdc++-v3/include/bits/random.h
index 63f57d5..1babe80 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -32,6 +32,7 @@
 #define _RANDOM_H 1
 
 #include <vector>
+#include <bits/uniform_int_dist.h>
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -149,14 +150,6 @@ _GLIBCXX_END_NAMESPACE_VERSION
       __mod(_Tp __x)
       { return _Mod<_Tp, __m, __a, __c>::__calc(__x); }
 
-    /* Determine whether number is a power of 2.  */
-    template<typename _Tp>
-      inline bool
-      _Power_of_2(_Tp __x)
-      {
-	return ((__x - 1) & __x) == 0;
-      };
-
     /*
      * An adaptor class for converting the output of any Generator into
      * the input for a specific Distribution.
@@ -1656,164 +1649,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    * @{
    */
 
-  /**
-   * @brief Uniform discrete distribution for random numbers.
-   * A discrete random distribution on the range @f$[min, max]@f$ with equal
-   * probability throughout the range.
-   */
-  template<typename _IntType = int>
-    class uniform_int_distribution
-    {
-      static_assert(std::is_integral<_IntType>::value,
-		    "template argument not an integral type");
-
-    public:
-      /** The type of the range of the distribution. */
-      typedef _IntType result_type;
-      /** Parameter type. */
-      struct param_type
-      {
-	typedef uniform_int_distribution<_IntType> distribution_type;
-
-	explicit
-	param_type(_IntType __a = 0,
-		   _IntType __b = std::numeric_limits<_IntType>::max())
-	: _M_a(__a), _M_b(__b)
-	{
-	  __glibcxx_assert(_M_a <= _M_b);
-	}
-
-	result_type
-	a() const
-	{ return _M_a; }
-
-	result_type
-	b() const
-	{ return _M_b; }
-
-	friend bool
-	operator==(const param_type& __p1, const param_type& __p2)
-	{ return __p1._M_a == __p2._M_a && __p1._M_b == __p2._M_b; }
-
-      private:
-	_IntType _M_a;
-	_IntType _M_b;
-      };
-
-    public:
-      /**
-       * @brief Constructs a uniform distribution object.
-       */
-      explicit
-      uniform_int_distribution(_IntType __a = 0,
-			       _IntType __b = std::numeric_limits<_IntType>::max())
-      : _M_param(__a, __b)
-      { }
-
-      explicit
-      uniform_int_distribution(const param_type& __p)
-      : _M_param(__p)
-      { }
-
-      /**
-       * @brief Resets the distribution state.
-       *
-       * Does nothing for the uniform integer distribution.
-       */
-      void
-      reset() { }
-
-      result_type
-      a() const
-      { return _M_param.a(); }
-
-      result_type
-      b() const
-      { return _M_param.b(); }
-
-      /**
-       * @brief Returns the parameter set of the distribution.
-       */
-      param_type
-      param() const
-      { return _M_param; }
-
-      /**
-       * @brief Sets the parameter set of
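
As a concrete illustration of the kind of user code affected by this header change: a program that calls std::shuffle with a standard engine needs the engine's own header, and should include it directly rather than rely on another header pulling it in. A minimal sketch follows; the helper name shuffled_copy is hypothetical.

```cpp
#include <algorithm>  // std::shuffle
#include <random>     // std::mt19937; include it explicitly
#include <vector>     // std::vector; likewise not guaranteed by <algorithm>

// Hypothetical helper: return a shuffled copy of VALUES, using a
// Mersenne Twister engine seeded with SEED.
std::vector<int> shuffled_copy(std::vector<int> values, unsigned seed) {
    std::mt19937 gen(seed);
    std::shuffle(values.begin(), values.end(), gen);
    return values;
}
```

Code written this way compiles regardless of which internal headers a given library version happens to include from its algorithm header.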
[PATCH, committed][gcc-5-branch] Fix broken test case derived_constructor_comps_6.f90
When the test case derived_constructor_comps_6.f90 was backported to
the FSF 5 branch, a '}' on the dg-additional-options was dropped causing
the test case to fail. I have added it back and committed it as obvious.

Peter

	PR fortran/61831
	* gfortran.dg/derived_constructor_comps_6.f90: Add missing } to
	fix up dg-additional-options.

Index: gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90
===
--- gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90	(revision 232798)
+++ gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90	(working copy)
@@ -1,5 +1,5 @@
 ! { dg-do run }
-! { dg-additional-options "-fdump-tree-original"
+! { dg-additional-options "-fdump-tree-original" }
 !
 ! PR fortran/61831
 ! The deallocation of components of array constructor elements
Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts
On Mon, Jan 25, 2016 at 3:39 AM, Bernd Schmidt wrote:
> On 01/23/2016 12:52 AM, Ian Lance Taylor wrote:
>
>> 2016-01-22  Ian Lance Taylor
>>
>> * common.opt (fkeep-gc-roots-live): New option.
>> * tree-ssa-loop-ivopts.c (add_candidate_1): If
>> -fkeep-gc-roots-live, skip pointers.
>> (add_iv_candidate_for_biv): Handle add_candidate_1 returning
>> NULL.
>> * doc/invoke.texi (Optimize Options): Document
>> -fkeep-gc-roots-live.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-01-22  Ian Lance Taylor
>>
>> * gcc.dg/tree-ssa/ivopt_5.c: New test.
>
>
> Patch not attached?

The patch is there in the mailing list. See the attachment on
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01781.html .

Ian
[PATCH] pr 65702 - error out for invalid register asms earlier
From: Trevor Saunders

Hi,

$subject. To avoid regressions I kept the checks when generating rtl, but I
believe its impossible for those to trigger now and we can remove the checks.

bootstrapped + regtested on x86_64-linux-gnu, ok?

Trev

gcc/c/ChangeLog:

2016-01-25  Trevor Saunders

	* c-decl.c (finish_decl): Check if asm register is valid.

gcc/ChangeLog:

2016-01-25  Trevor Saunders

	* varasm.c (register_asmspec_ok_p): New function.
	(make_decl_rtl): Adjust.
	* varasm.h (register_asmspec_ok_p): New prototype.

gcc/cp/ChangeLog:

2016-01-25  Trevor Saunders

	* decl.c (make_rtl_for_nonlocal_decl): Check if register asm is valid.
---
 gcc/c/c-decl.c                                |   8 +-
 gcc/cp/decl.c                                 |   4 +-
 gcc/testsuite/g++.dg/torture/register-asm-1.C |  14 +++
 gcc/testsuite/gcc.dg/reg-vol-struct-1.c       |   2 +-
 gcc/testsuite/gcc.dg/torture/register-asm-1.c |  12 +++
 gcc/varasm.c                                  | 150 +++---
 gcc/varasm.h                                  |   3 +
 7 files changed, 129 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/register-asm-1.C
 create mode 100644 gcc/testsuite/gcc.dg/torture/register-asm-1.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 1ec6042..9257f35 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -4867,7 +4867,9 @@ finish_decl (tree decl, location_t init_loc, tree init,
 	     when a tentative file-scope definition is seen.
 	     But at end of compilation, do output code for them.  */
 	  DECL_DEFER_OUTPUT (decl) = 1;
-	  if (asmspec && C_DECL_REGISTER (decl))
+	  if (asmspec
+	      && C_DECL_REGISTER (decl)
+	      && register_asmspec_ok_p (decl, asmspec, DECL_MODE (decl)))
 	    DECL_HARD_REGISTER (decl) = 1;
 	  rest_of_decl_compilation (decl, true, 0);
 	}
@@ -4878,7 +4880,9 @@ finish_decl (tree decl, location_t init_loc, tree init,
 	     in a particular register.  */
 	  if (asmspec && C_DECL_REGISTER (decl))
 	    {
-	      DECL_HARD_REGISTER (decl) = 1;
+	      if (register_asmspec_ok_p (decl, asmspec, DECL_MODE (decl)))
+		DECL_HARD_REGISTER (decl) = 1;
+
 	      /* This cannot be done for a structure with volatile
 		 fields, on which DECL_REGISTER will have been
 		 reset.  */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index f4604b6..6d130bd 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6201,7 +6201,9 @@ make_rtl_for_nonlocal_decl (tree decl, tree init, const char* asmspec)
       /* The `register' keyword, when used together with an
 	 asm-specification, indicates that the variable should be
 	 placed in a particular register.  */
-      if (VAR_P (decl) && DECL_REGISTER (decl))
+      if (VAR_P (decl)
+	  && DECL_REGISTER (decl)
+	  && register_asmspec_ok_p (decl, asmspec, DECL_MODE (decl)))
 	{
 	  set_user_assembler_name (decl, asmspec);
 	  DECL_HARD_REGISTER (decl) = 1;
diff --git a/gcc/testsuite/g++.dg/torture/register-asm-1.C b/gcc/testsuite/g++.dg/torture/register-asm-1.C
new file mode 100644
index 000..b5cfc84
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/register-asm-1.C
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+class A {
+  int m_fn1() const;
+};
+int a[1];
+int b;
+int A::m_fn1() const {
+  register int c asm(""); // { dg-error "invalid register name for 'c'" }
+  while (b)
+    if (a[5])
+      c = b;
+  return c;
+}
diff --git a/gcc/testsuite/gcc.dg/reg-vol-struct-1.c b/gcc/testsuite/gcc.dg/reg-vol-struct-1.c
index b885f91..e67c7a2 100644
--- a/gcc/testsuite/gcc.dg/reg-vol-struct-1.c
+++ b/gcc/testsuite/gcc.dg/reg-vol-struct-1.c
@@ -12,7 +12,7 @@ f (void)
 {
   register struct S a;
   register struct S b[2];
-  register struct S c __asm__("nosuchreg"); /* { dg-error "object with volatile field" "explicit reg name" } */
+  register struct S c __asm__("nosuchreg"); /* { dg-error "invalid register name for 'c'|cannot put object with volatile field into register" } */
   /* { dg-error "address of register" "explicit address" } */
   b; /* { dg-error "address of register" "implicit address" } */
 }
diff --git a/gcc/testsuite/gcc.dg/torture/register-asm-1.c b/gcc/testsuite/gcc.dg/torture/register-asm-1.c
new file mode 100644
index 000..1949f62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/register-asm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+  int a[1], b;
+int
+foo ()
+{
+register int c asm (""); /* { dg-error "invalid register name for 'c'" } */
+  while (b)
+    if (a[5])
+      c = b;
+return c;
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 3a3573e..7e3aebc9 100644
---
Re: [PATCH] pr 65702 - error out for invalid register asms earlier
On 01/25/2016 04:36 PM, tbsaunde+...@tbsaunde.org wrote: > $subject. To avoid regressions I kept the checks when generating rtl, but I > believe its impossible for those to trigger now and we can remove the checks. > > bootstrapped + regtested on x86_64-linux-gnu, ok? Is this still an issue? I committed a fix for a similar PR a few weeks ago, and I can't make the testcase from 65702 ICE. Bernd
Re: [PATCH] pr 65702 - error out for invalid register asms earlier
On Mon, Jan 25, 2016 at 04:42:58PM +0100, Bernd Schmidt wrote: > On 01/25/2016 04:36 PM, tbsaunde+...@tbsaunde.org wrote: > >$subject. To avoid regressions I kept the checks when generating rtl, but I > >believe its impossible for those to trigger now and we can remove the checks. > > > >bootstrapped + regtested on x86_64-linux-gnu, ok? > > Is this still an issue? I committed a fix for a similar PR a few weeks ago, > and I can't make the testcase from 65702 ICE. hrm, I guess my tree was more out of date than I thought, it doesn't ICE for me at r232662. Never mind then ;) Trev > > > Bernd
Re: [PATCH] fix #69251 - [6 Regression] ICE in unify_array_domain on a flexible array member
On 01/21/2016 04:32 PM, Martin Sebor wrote: On 01/21/2016 04:19 PM, Jason Merrill wrote: Can we reconsider the representation of flexible arrays? Following the example of the C front-end is causing a lot of trouble, and using a null TYPE_DOMAIN seems more intuitive. I remember running into at least one ICE in the middle end with the alternate representation (null TYPE_DOMAIN). At this late stage I would worry about the fallout from that. It seems that outside of 69251 and 69277 the problems are mostly triggered by ill-formed code that wasn't being tested and I'm hoping that the problems in the well-formed cases have been reported (and with the patches I've sent fixed). At the same time, based on some debugging I had to do for 69251 (ICE in unify_array_domain on a flexible array member) it seems that it might make handling them in template easier. In a discussion with Jason in IRC I agreed to submit a patch changing the representation of flexible array members in the C++ front end to use a null domain rather than a domain with a null upper bound. Attached is a patch making the requested change. It fixes the following bugs: c++/69251 - [6 Regression] ICE in unify_array_domain on a flexible array member (the bug in the Subject) c++/69253 - [6 Regression] ICE in cxx_incomplete_type_diagnostic initializing a flexible array member with empty string with the original patch here: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01325.htm and c++/69290 - [6 Regression] ICE on invalid initialization of a flexible array member with the original patch here: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01685.html as well as c++/69277 - [6 Regression] ICE mangling a flexible array member with its final patch posted here https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01233.html The downside of this approach is that it prevents everything but the front end from distinguishing flexible array members from arrays of unspecified or unknown bounds. 
The immediate impact is that prevents us from maintaining ABI compatibility with GCC 5 (with -fabi-version=9) and from diagnosing the mangling change. This means should we decide to adopt this approach, the final version of the patch for c++/69277 mentioned above that's still pending approval will need to be tweaked to have the ABI checks removed. I successfully tested the new patch on x86_64. Martin PR c++/69251 - [6 Regression] ICE in unify_array_domain on a flexible array member PR c++/69253 - [6 Regression] ICE in cxx_incomplete_type_diagnostic initializing a flexible array member with empty string PR c++/69290 - [6 Regression] ICE on invalid initialization of a flexible array member gcc/testsuite/ChangeLog: 2016-01-25 Martin SeborPR c++/69253 PR c++/69251 PR c++/69290 * g++.dg/ext/flexarray-subst.C: New test. * g++.dg/ext/flexary11.C: New test. * g++.dg/ext/flexary12.C: New test. * g++.dg/ext/flexary13.C: New test. * g++.dg/ext/flexary14.C: New test. * g++.dg/other/dump-ada-spec-2.C: Adjust. gcc/cp/ChangeLog: 2016-01-25 Martin Sebor PR c++/69253 PR c++/69251 PR c++/69290 * decl.c (compute_array_index_type): Return null for flexible array members. * tree.c (array_of_runtime_bound_p): Handle gracefully array types with null TYPE_MAX_VALUE. (build_ctor_subob_ref): Loosen debug checking to handle flexible array members. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index ceeef60..beb7c58 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -8638,8 +8638,9 @@ compute_array_index_type (tree name, tree size, tsubst_flags_t complain) tree itype; tree osize = size; + /* Flexible array members have no domain. 
*/ if (size == NULL_TREE) -return build_index_type (NULL_TREE); +return NULL_TREE; if (error_operand_p (size)) return error_mark_node; diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index e2123ac..779652c 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -937,9 +937,10 @@ array_of_runtime_bound_p (tree t) tree dom = TYPE_DOMAIN (t); if (!dom) return false; - tree max = TYPE_MAX_VALUE (dom); - return (!potential_rvalue_constant_expression (max) - || (!value_dependent_expression_p (max) && !TREE_CONSTANT (max))); + if (tree max = TYPE_MAX_VALUE (dom)) +return (!potential_rvalue_constant_expression (max) + || (!value_dependent_expression_p (max) && !TREE_CONSTANT (max))); + return false; } /* Return a reference type node referring to TO_TYPE. If RVAL is @@ -2556,8 +2557,21 @@ build_ctor_subob_ref (tree index, tree type, tree obj) obj = build_class_member_access_expr (obj, index, NULL_TREE, /*reference*/false, tf_none); if (obj) -gcc_assert (same_type_ignoring_top_level_qualifiers_p (type, - TREE_TYPE (obj))); +{ + tree objtype = TREE_TYPE (obj); + if (TREE_CODE (objtype) == ARRAY_TYPE + && (!TYPE_DOMAIN (objtype) + || !TYPE_MAX_VALUE (TYPE_DOMAIN (objtype + { + /* When the
Re: [PATCH, committed][gcc-5-branch] Fix broken test case derived_constructor_comps_6.f90
Dear Peter, Many thanks! I have been away, got back last night and had intended to deal with it tonight. You should note that Dominique has flagged up that it fails with -m32. Tshuess Paul On 25 January 2016 at 18:09, Peter Bergner wrote: > When the test case derived_constructor_comps_6.f90 was backported to > the FSF 5 branch, a '}' on the dg-additional-options was dropped > causing the test case to fail. I have added it back and committed > it as obvious. > > Peter > > PR fortran/61831 > * gfortran.dg/derived_constructor_comps_6.f90: Add missing } to fix > up dg-additional-options. > > > Index: gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90 > === > --- gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90 (revision > 232798) > +++ gcc/testsuite/gfortran.dg/derived_constructor_comps_6.f90 (working copy) > @@ -1,5 +1,5 @@ > ! { dg-do run } > -! { dg-additional-options "-fdump-tree-original" > +! { dg-additional-options "-fdump-tree-original" } > ! > ! PR fortran/61831 > ! The deallocation of components of array constructor elements > -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
Re: [patch,ira]: Improve on updated memory cost in coloring pass of integrated register allocator.
On 01/23/2016 06:09 AM, Ajit Kumar Agarwal wrote:

> This patch improves the updated memory cost in the coloring pass of the integrated register allocator. Only the enter_freq of the loop is considered in the updated memory cost in the coloring pass. Considering only enter_freq is based on the idea that whatever is live-out of the entry (header) of the loop is live-in and live-out throughout the loop. The exit frequency is ignored in the updated memory cost in the coloring pass.

As we put stores for spilled pseudos on loop entry and loads on the loop exits, ignoring loop exits means to me that we basically ignore the cost of the loads, which is probably wrong in the general case.

> This increases the updated memory cost, giving more chances of reducing spills and fetches and a better assignment.
>
> The claim that what is live-out of the loop header is live-in and live-out throughout the loop is based on the following. If a variable v is live-out at the header of the loop, then v is live-in at every node in the loop. To prove this, consider a loop L with header h such that the variable v, defined at d, is live-in at h. Since v is live at h, d is not part of L. This follows from the dominance property, i.e. h is strictly dominated by d. Furthermore, there exists a path from h to a use of v which does not go through d. For every node p in the loop, since the loop is a strongly connected component of the CFG, there exists a path, consisting only of nodes of L, from p to h. Concatenating these two paths proves that v is live-in and live-out of p.
>
> Bootstrapped on X86_64. A performance run was done on the SPEC CPU2000 benchmarks, with the following results.
>
> SPEC INT benchmarks (mean score with this patch vs. without this patch: 3729.777 vs. 3717.083):
>
>   Benchmark     Gain
>   186.crafty    2.78%
>   176.gcc       0.7%
>   253.perlbmk   0.75%
>   255.vortex    0.82%
>
> SPEC FP benchmarks (mean score with this patch vs. without this patch: 4774.65 vs. 4751.838):
>
>   Benchmark     Gain
>   168.wupwise   0.77%
>   171.swim      1.5%
>   177.mesa      1.2%
>   200.sixtrack  1.2%
>   178.galgel    0.6%
>   179.art       0.6%
>   183.equake    0.5%
>   187.facerec   0.7%

Thanks for trying to improve GCC performance, Ajit. Unfortunately, I got different numbers on SPEC2000 with your patch. The different results might be a consequence of a different test setup. I got the following numbers on a 4.2GHz i7-4790K (Haswell) using -Ofast -mtune=corei7. Using the tune option is important, as RA will try to improve code for the Haswell architecture.

            with patch   without patch
  64-bit:
    Int     5123         5126
    FP      6886         6897
  32-bit:
    Int     4754         4763
    FP      6363         6346

Here the first column is GCC with your patch and the second one is without your patch. Only the 32-bit FP score was improved by your patch. These days practically nobody uses 32-bit code for FP benchmarks. So unfortunately I cannot approve the patch. Sorry.
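To compare the four aggregate scores quoted above with the per-benchmark percentages, the relative change can be computed directly. The following is only a sketch in C: the function and table names are made up for illustration and are not part of GCC or SPEC, and the numbers are simply the ones given in the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Percentage change of the patched score relative to the unpatched one.
   A positive result means the patched compiler scored higher.  */
double relative_gain_percent(double with_patch, double without_patch)
{
  assert(without_patch > 0.0);
  return (with_patch - without_patch) / without_patch * 100.0;
}

/* Print the relative change for the four scores quoted in the reply.  */
void print_reply_scores(void)
{
  static const struct {
    const char *name;
    double with_patch;
    double without_patch;
  } rows[] = {
    { "64-bit Int", 5123.0, 5126.0 },
    { "64-bit FP",  6886.0, 6897.0 },
    { "32-bit Int", 4754.0, 4763.0 },
    { "32-bit FP",  6363.0, 6346.0 },
  };

  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf("%-10s %+.2f%%\n", rows[i].name,
           relative_gain_percent(rows[i].with_patch, rows[i].without_patch));
}
```

Calling print_reply_scores() from a main function prints one signed percentage per row. Only the 32-bit FP row comes out positive, and every difference is well under one percent.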
Re: [PATCH, committed][gcc-5-branch] Fix broken test case derived_constructor_comps_6.f90
On Mon, 2016-01-25 at 18:17 +0100, Paul Richard Thomas wrote: > Many thanks! I have been away, got back last night and had intended > to deal with it tonight. No problem. > You should note that Dominique has flagged up that it fails with > -m32. I was building on powerpc64le-linux (which doesn't support -m32) when I encountered the missing '}', so I only fixed the bug I saw. I'll leave it to you or someone else to fix the -m32 bug, since I'm not seeing it on my system. Peter
Re: [PATCH, committed][gcc-5-branch] Fix broken test case derived_constructor_comps_6.f90
Neither am I :-( Paul On 25 January 2016 at 18:33, Peter Bergner wrote: > On Mon, 2016-01-25 at 18:17 +0100, Paul Richard Thomas wrote: >> Many thanks! I have been away, got back last night and had intended >> to deal with it tonight. > > No problem. > > >> You should note that Dominique has flagged up that it fails with >> -m32. > > I was building on powerpc64le-linux (which doesn't support -m32) > when I encountered the missing '}', so I only fixed the bug I saw. > I'll leave it to you or someone else to fix the -m32 bug, since > I'm not seeing it on my system. > > Peter > > -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
Re: gomp_target_fini
On Jan 22, 2016, at 2:16 AM, Jakub Jelinek wrote: > On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote: >> Thomas, I've mentioned this issue before - there is sometimes just too much >> irrelevant stuff to wade through in your patch submissions, and it >> discourages review. The discussion of the actual problem begins more than >> halfway through your multi-page mail. Please try to be more concise. >> >> On 12/16/2015 01:30 PM, Thomas Schwinge wrote: >>> Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the >>> atexit handler, gomp_target_fini, which, with the device lock held, will >>> call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to >>> clean up. >>> >>> Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA >>> context is now in an inconsistent state >> >>> Thus, any cuMemFreeHost invocations that are run during clean-up will now >>> also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call >>> GOMP_PLUGIN_fatal, which again will trigger the same or another >>> (GOMP_offload_unregister_ver) atexit handler, which will then deadlock >>> trying to lock the device again, which is still locked. >> >>> libgomp/ >>> * error.c (gomp_vfatal): Call _exit instead of exit. >> >> It seems unfortunate to disable the atexit handlers for everything for what >> seems purely an nvptx problem. >> >> What exactly happens if you don't register the cleanups with atexit in the >> first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in the >> cleanup functions? > > I agree, _exit is just wrong, there could be important atexit hooks from the > application. You can set some flag that the libgomp or nvptx plugin atexit > hooks should not do anything, or should do things differently. But > bypassing all atexit handlers is risky. I’d use the phrase "is wrong".
Just create a semaphore that says that init was fully done: set it at the end of init, test it at the beginning of the cleanup, and reset it any time you want to cancel the cleanup. Think of it as an is_valid predicate. Any operation that needs the state to be valid can query it first, and fail otherwise.
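The suggestion above can be sketched in a few lines of C. This is not libgomp code: every name below is hypothetical, the "semaphore" is modeled as a plain C11 atomic flag, and the real cleanup work is replaced by a counter so that the behavior can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not libgomp names.  */
static atomic_bool state_valid;  /* zero-initialized: false until init completes */
static int cleanups_done;        /* counts real cleanup runs, for demonstration */

/* Call as the last step of initialization.  */
void mark_init_done(void)
{
  atomic_store(&state_valid, true);
}

/* Call from the fatal-error path to cancel any later cleanup.  */
void cancel_cleanup(void)
{
  atomic_store(&state_valid, false);
}

/* The is_valid predicate: operations that need valid state query it first.  */
bool is_valid(void)
{
  return atomic_load(&state_valid);
}

/* Cleanup hook, of the kind one would register with atexit().  It clears
   the flag as it tests it, so it does real work at most once per
   completed init and never after cancel_cleanup().  */
void cleanup_hook(void)
{
  if (!atomic_exchange(&state_valid, false))
    return;
  cleanups_done++;       /* real code would release device resources here */
  assert(!is_valid());   /* holds in this single-threaded sketch */
}

/* Number of times the hook actually performed cleanup.  */
int cleanups_run(void)
{
  return cleanups_done;
}
```

With this shape the error path can still call exit(), so the application's own atexit handlers run, while the plugin's hook becomes a no-op once the state has been marked invalid.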
Re: [PATCH] Fix aarch64 bootstrap (pr69416)
Richard Henderson wrote: > On 01/25/2016 05:28 AM, Christophe Lyon wrote: > > After this, I'm seeing this test now FAILs: > > gcc.target/aarch64/ccmp_1.c scan-assembler adds\t > > That test case is badly written. In addition to that one, several of the other > failures that I see within that file are simply equally optimal alternative > choices for the compiler. The file needs to be split up and simpler more > directed tests written. The test case was written specifically to emit 'adds' as that is the optimal sequence. It is a regression caused by wrapping the immediate in an unspec which disables costing of all CCMPs... I have a patch for this. The zero issue is due to the testcase assuming GCC emits '0' and 'wzr' correctly - it was based on a very old patch that emits the correct zero for compares that hasn't been OK'd yet. And the failure to emit an fccmp is due to a recent fix to NaN handling in compares, so that testcase now needs -ffinite-math-only. Wilco
Re: [PATCH, AArch64] Fix for PR67896 (C++ FE cannot distinguish __Poly{8,16,64,128}_t types)
On Jan 25, 2016, at 4:15 AM, James Greenhalgh wrote: >> P.S.: I haven't signed the copyright assignment to the FSF. The change is really small but I can do the paperwork if required. > > I can't commit it on your behalf until we've heard back regarding whether > this needs a copyright assignment to the FSF, but once I've heard I'd > be happy to commit this for you. This is fine for the tree without paperwork. Though, if you work on GCC on a regular basis and are likely to contribute more work in the future, it is nice to get the paperwork out of the way for next time.
[C PATCH] Fix -Wunused-function (PR debug/66869)
Hi! The early-debug changes moved warnings about unused functions into cgraph. The problem is that if we have just unused declarations, they aren't sometimes even registered with cgraph and therefore we no longer warn. Here is an attempt to register those with cgraph anyway to get the warning, for C FE only (no idea where to do that in C++ FE). Or anyone has better suggestions what to do? Bootstrapped/regtested on x86_64-linux and i686-linux. 2016-01-25 Jakub JelinekPR debug/66869 * c-decl.c (c_write_global_declarations_1): For warn_unused_function, ensure creation of cgraph node even if there is no definition. * gcc.dg/pr66869.c: New test. --- gcc/c/c-decl.c.jj 2016-01-21 00:41:47.0 +0100 +++ gcc/c/c-decl.c 2016-01-25 16:36:31.973504082 +0100 @@ -10741,11 +10741,19 @@ c_write_global_declarations_1 (tree glob if (TREE_CODE (decl) == FUNCTION_DECL && DECL_INITIAL (decl) == 0 && DECL_EXTERNAL (decl) - && !TREE_PUBLIC (decl) - && C_DECL_USED (decl)) + && !TREE_PUBLIC (decl)) { - pedwarn (input_location, 0, "%q+F used but never defined", decl); - TREE_NO_WARNING (decl) = 1; + if (C_DECL_USED (decl)) + { + pedwarn (input_location, 0, "%q+F used but never defined", decl); + TREE_NO_WARNING (decl) = 1; + } + /* For -Wunused-function push the unused statics into cgraph, +so that check_global_declaration emits the warning. */ + else if (warn_unused_function + && ! DECL_ARTIFICIAL (decl) + && ! TREE_NO_WARNING (decl)) + cgraph_node::get_create (decl); } wrapup_global_declaration_1 (decl); --- gcc/testsuite/gcc.dg/pr66869.c.jj 2016-01-25 16:38:39.037758657 +0100 +++ gcc/testsuite/gcc.dg/pr66869.c 2016-01-25 16:39:42.346888954 +0100 @@ -0,0 +1,6 @@ +/* PR debug/66869 */ +/* { dg-do compile } */ +/* { dg-options "-Wunused-function" } */ + +static void test (void); /* { dg-warning "'test' declared 'static' but never defined" } */ +int i; Jakub
Re: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order
On 01/25/2016 12:09 PM, Wilco Dijkstra wrote: > BTW is there a regular expression that correctly implements (0|xzr)? (0|wzr) works fine for me; I've got exactly that fix in one of my trees. r~
[Patch, fortran] PR69385 - [6 regression] ICE on valid with -fcheck=mem
Dear All, The initial report concerns initialization assignments that should be excluded from the check for assignment of scalars to unallocated arrays. This part is so trivial that it does not require a test. On the other hand, the block that implemented the check was plain and simple wrong and the rest of the patch corrects this. It is commented such as to be fully comprehensible. Bootstrapped and regtested on FC21/x86_64 - OK for trunk and for 5-branch when all the wrinkles (PR69422 and 69423) are sorted out? Cheers Paul 2016-01-25 Paul ThomasPR fortran/69385 * trans-expr.c (gfc_trans_assignment_1): Exclude initialization assignments from check on assignment of scalars to unassigned arrays and correct wrong code within the corresponding block. 2015-01-25 Paul Thomas PR fortran/69385 * gfortran.dg/allocate_error_6.f90: New test. Index: gcc/fortran/trans-expr.c === *** gcc/fortran/trans-expr.c(revision 232800) --- gcc/fortran/trans-expr.c(working copy) *** gfc_trans_assignment_1 (gfc_expr * expr1 *** 9286,9291 --- 9286,9292 { gfc_conv_expr (, expr1); if (gfc_option.rtcheck & GFC_RTCHECK_MEM + && !init_flag && gfc_expr_attr (expr1).allocatable && expr1->rank && !expr2->rank) *** gfc_trans_assignment_1 (gfc_expr * expr1 *** 9293,9306 tree cond; const char* msg; ! tmp = expr1->symtree->n.sym->backend_decl; ! if (POINTER_TYPE_P (TREE_TYPE (tmp))) ! tmp = build_fold_indirect_ref_loc (input_location, tmp); ! if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp))) ! tmp = gfc_conv_descriptor_data_get (tmp); ! else ! tmp = TREE_OPERAND (lse.expr, 0); cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, tmp, build_int_cst (TREE_TYPE (tmp), 0)); --- 9294,9310 tree cond; const char* msg; ! /* We should only get array references here. */ ! gcc_assert (TREE_CODE (lse.expr) == POINTER_PLUS_EXPR ! || TREE_CODE (lse.expr) == ARRAY_REF); ! /* 'tmp' is either the pointer to the array(POINTER_PLUS_EXPR) !or the array itself(ARRAY_REF). */ ! tmp = TREE_OPERAND (lse.expr, 0); ! ! 
/* Provide the address of the array. */ ! if (TREE_CODE (lse.expr) == ARRAY_REF) ! tmp = gfc_build_addr_expr (NULL_TREE, tmp); cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, tmp, build_int_cst (TREE_TYPE (tmp), 0)); Index: gcc/testsuite/gfortran.dg/allocate_error_6.f90 === *** gcc/testsuite/gfortran.dg/allocate_error_6.f90 (revision 0) --- gcc/testsuite/gfortran.dg/allocate_error_6.f90 (working copy) *** *** 0 --- 1,40 + ! { dg-do run } + ! { dg-additional-options "-fcheck=mem" } + ! { dg-shouldfail "Fortran runtime error: Assignment of scalar to unallocated array" } + ! + ! This omission was encountered in the course of fixing PR54070. Whilst this is a + ! very specific case, others such as allocatable components have been tested. + ! + ! Contributed by Tobias Burnus + ! + function g(a) result (res) + real :: a + real,allocatable :: res(:) + res = a ! Since 'res' is not allocated, a runtime error should occur. + end function + + interface + function g(a) result(res) + real :: a + real,allocatable :: res(:) + end function + end interface + ! print *, g(2.0) + ! call foo + call foofoo + contains + subroutine foo + type bar + real, allocatable, dimension(:) :: r + end type + type (bar) :: foobar + foobar%r = 1.0 + end subroutine + subroutine foofoo + type barfoo + character(:), allocatable, dimension(:) :: c + end type + type (barfoo) :: foobarfoo + foobarfoo%c = "1.0" + end subroutine + end
Re: [Patch, fortran] PR69385 - [6 regression] ICE on valid with -fcheck=mem
Hi Paul, seems we were pretty well-synchronized in posting this (in the PR it sounded as if you wanted me to submit it ...) In any case, the patch is ok for my taste. Thanks! Cheers, Janus 2016-01-25 22:02 GMT+01:00 Paul Richard Thomas: > Dear All, > > The initial report concerns initialization assignments that should be > excluded from the check for assignment of scalars to unallocated > arrays. This part is so trivial that it does not require a test. On > the other hand, the block that implemented the check was plain and > simple wrong and the rest of the patch corrects this. It is commented > such as to be fully comprehensible. > > Bootstrapped and regtested on FC21/x86_64 - OK for trunk and for > 5-branch when all the wrinkles (PR69422 and 69423) are sorted out? > > Cheers > > Paul > > 2016-01-25 Paul Thomas > > PR fortran/69385 > * trans-expr.c (gfc_trans_assignment_1): Exclude initialization > assignments from check on assignment of scalars to unassigned > arrays and correct wrong code within the corresponding block. > > 2015-01-25 Paul Thomas > > PR fortran/69385 > * gfortran.dg/allocate_error_6.f90: New test.
Re: [PATCH] Fix a typo in ppc libgcc (PR target/69444)
On Mon, Jan 25, 2016 at 3:34 PM, Jakub Jelinek wrote: > Hi! > > The soft-fp multilib of powerpc libgcc doesn't build because of a typo > in the conditional - the guarded code uses inline asm that assumes hard > float. > > Ok for trunk? > > 2016-01-25 Jakub Jelinek > > PR target/69444 > * config/rs6000/sfp-machine.h: Fix a typo in #ifndef - __NO_FPRS__ > instead of ___NO_FPRS__. Okay. Thanks, David
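Why such a typo goes unnoticed until a soft-float build is tried can be shown with a small, self-contained sketch. The macro names below are placeholders chosen for this example (DEMO_NO_FPRS plays the role of __NO_FPRS__ and DEMO__NO_FPRS the role of the misspelled ___NO_FPRS__); the code is not taken from sfp-machine.h.

```c
#include <assert.h>

/* Pretend this is a soft-float build: the "no FPRs" macro is defined.  */
#define DEMO_NO_FPRS 1

/* Misspelled guard.  The preprocessor treats an unknown macro name as
   simply "not defined", so the hard-float branch is selected with no
   diagnostic at all.  */
int hard_float_code_selected_with_typo(void)
{
#ifndef DEMO__NO_FPRS
  return 1;   /* the inline asm that assumes hard float would go here */
#else
  return 0;
#endif
}

/* Correctly spelled guard: the hard-float branch is skipped.  */
int hard_float_code_selected_fixed(void)
{
#ifndef DEMO_NO_FPRS
  return 1;
#else
  return 0;
#endif
}

/* Small self-check showing what each variant selects.  */
void demo_self_check(void)
{
  assert(hard_float_code_selected_with_typo() == 1);
  assert(hard_float_code_selected_fixed() == 0);
}
```

On a hard-float build neither spelling is defined, so both guards behave the same way, which would explain why only the soft-fp multilib failed to build.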
Re: Incorrect code due to indirect tail call of varargs function with hard float ABI
This issue also remains in 4.9 and 5.0 branches. Is this OK to backport to the release branches. Thanks, Kugan On 02/12/15 10:00, Kugan wrote: > >>> >>> gcc/ChangeLog: >>> >>> 2015-11-18 Kugan Vivekanandarajah>>> >>> PR target/68390 >>> * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type >>> for indirect function call. >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2015-11-18 Kugan Vivekanandarajah >>> >>> PR target/68390 >>> * gcc.target/arm/PR68390.c: New test. >>> >> >> s/PR/pr in the test name and put this in gcc.c-torture/execute instead - >> there is nothing ARM specific about the test. Tests in gcc.target/arm should >> really only be architecture specific. This isn't. >> >>> >>> >>> >>> p.txt >>> >>> >>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>> index a379121..0dae7da 100644 >>> --- a/gcc/config/arm/arm.c >>> +++ b/gcc/config/arm/arm.c >>> @@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp) >>> a VFP register but then need to transfer it to a core >>> register. */ >>>rtx a, b; >>> + tree fn_decl = decl; >> >> Call it decl_or_type instead - it's really that ... >> >>> >>> - a = arm_function_value (TREE_TYPE (exp), decl, false); >>> + /* If it is an indirect function pointer, get the function type. */ >>> + if (!decl) >>> + fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp))); >>> + >> >> This is probably just my mail client - but please watch out for indentation. >> >>> + a = arm_function_value (TREE_TYPE (exp), fn_decl, false); >>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)), >>> cfun->decl, false); >>>if (!rtx_equal_p (a, b)) >> >> >> OK with those changes. >> >> Ramana >> >
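For readers without the PR at hand, the shape of code this thread is concerned with looks roughly like the sketch below: a non-variadic function that returns a double by calling a variadic function through a pointer, in tail position. This is not the PR68390 test case (which is not shown in this thread), and all names are invented for the example. The stated assumption is that, under the ARM hard-float ABI, a variadic callee may return its result in a different register than a non-variadic caller is expected to, which is why the compiler needs the callee's function type even when there is no decl for an indirect call.

```c
#include <assert.h>
#include <stdarg.h>

/* Variadic callee: sums `count` double arguments.  */
static double sum_doubles(int count, ...)
{
  va_list ap;
  double total = 0.0;

  assert(count >= 0);
  va_start(ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg(ap, double);
  va_end(ap);
  return total;
}

/* A volatile function pointer keeps the call indirect, so the compiler
   sees only the callee's type at the call site, not its declaration.  */
static double (*volatile callee)(int, ...) = sum_doubles;

/* Non-variadic caller.  The call is in tail position, so an optimizing
   compiler may consider turning it into a sibling call.  */
double tail_call_sum(double a, double b)
{
  return callee(2, a, b);
}
```

On any target the function must return a + b; a miscompiled sibling call of the kind described in this thread would show up as a wrong return value.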
[PATCH 1/3] add missing testcase
--- gcc/testsuite/gcc.dg/graphite/pr69292.c | 19 +++ 1 file changed, 19 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/graphite/pr69292.c diff --git a/gcc/testsuite/gcc.dg/graphite/pr69292.c b/gcc/testsuite/gcc.dg/graphite/pr69292.c new file mode 100644 index 000..b925181 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr69292.c @@ -0,0 +1,19 @@ +/* { dg-options "-O2 -floop-nest-optimize" } */ + +int m[1]; + +void +foo (double a[20][20], double b[20]) +{ + int i, j, k; + + for (i = 0; i < m[0]; ++i) +for (j = 0; j < m[0]; ++j) + a[i][j] = a[i][j] + 1; + + for (k = 0; k < 20; ++k) +for (i = 0; i < m[0]; ++i) + for (j = 0; j < m[0]; ++j) + b[i] = b[i] + a[i][j]; +} + -- 2.5.0
[PATCH] pr69477 - attribute aligned documentation misleading
The attached patch adjusts the documentation of attribute aligned and attribute pack so as to prevent misreading the text of the former attribute as if it had read: Specifying attribute aligned for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. ... Martin PR other/69477 - attribute aligned documentation misleading gcc/ChangeLog: 2016-01-25 Martin SeborPR other/69477 * doc/extend.texi (Common Type Attributes): Move text that talks about attribute packed from attribute aligned to the section discussing the former attribute for clarity. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 232765) +++ gcc/doc/extend.texi (working copy) @@ -6307,9 +6307,6 @@ relevant type, and the code that the com pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types. -The @code{aligned} attribute can only increase the alignment; but you -can decrease it by specifying @code{packed} as well. See below. - Note that the effectiveness of @code{aligned} attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum @@ -6319,36 +6316,8 @@ up to a maximum of 8-byte alignment, the in an @code{__attribute__} still only provides you with 8-byte alignment. See your linker documentation for further information. -@opindex fshort-enums -Specifying this attribute for @code{struct} and @code{union} types is -equivalent to specifying the @code{packed} attribute on each of the -structure or union members. Specifying the @option{-fshort-enums} -flag on the line is equivalent to specifying the @code{packed} -attribute on all @code{enum} definitions. 
- -In the following example @code{struct my_packed_struct}'s members are -packed closely together, but the internal layout of its @code{s} member -is not packed---to do that, @code{struct my_unpacked_struct} needs to -be packed too. - -@smallexample -struct my_unpacked_struct - @{ -char c; -int i; - @}; - -struct __attribute__ ((__packed__)) my_packed_struct - @{ - char c; - int i; - struct my_unpacked_struct s; - @}; -@end smallexample - -You may only specify this attribute on the definition of an @code{enum}, -@code{struct} or @code{union}, not on a @code{typedef} that does not -also define the enumerated type, structure or union. +The @code{aligned} attribute can only increase alignment. Alignment +can be decreased by specifying the @code{packed} attribute. See below. @item bnd_variable_size @cindex @code{bnd_variable_size} type attribute @@ -6476,6 +6445,37 @@ of the structure or union is placed to m attached to an @code{enum} definition, it indicates that the smallest integral type should be used. +@opindex fshort-enums +Specifying the @code{packed} attribute for @code{struct} and @code{union} +types is equivalent to specifying the @code{packed} attribute on each +of the structure or union members. Specifying the @option{-fshort-enums} +flag on the command line is equivalent to specifying the @code{packed} +attribute on all @code{enum} definitions. + +In the following example @code{struct my_packed_struct}'s members are +packed closely together, but the internal layout of its @code{s} member +is not packed---to do that, @code{struct my_unpacked_struct} needs to +be packed too. 
+ +@smallexample +struct my_unpacked_struct + @{ +char c; +int i; + @}; + +struct __attribute__ ((__packed__)) my_packed_struct + @{ + char c; + int i; + struct my_unpacked_struct s; + @}; +@end smallexample + +You may only specify the @code{packed} attribute attribute on the definition +of an @code{enum}, @code{struct} or @code{union}, not on a @code{typedef} +that does not also define the enumerated type, structure or union. + @item scalar_storage_order ("@var{endianness}") @cindex @code{scalar_storage_order} type attribute When attached to a @code{union} or a @code{struct}, this attribute sets
Re: [hsa merge 07/10] IPA-HSA pass
> On Mon, Jan 25, 2016 at 04:21:50PM +0100, Martin Liška wrote: > > On 01/16/2016 11:00 AM, Jan Hubicka wrote: > > > Can't it be represented via explicit REF_ADDR or something like that? > > > > > > Honza > > > > Hi. > > > > Sure, I've just done a patch that can do that. However, as we're currently > > in stage4, > > that change would probably require explicit permission of a release manager? > > If Honza is fine with it and you've tested it, this is ok for trunk. It looks fine to me. Honza
[PATCH 2/3] fix PR68343: disable fuse-*.c tests for isl 0.14 or earlier
The patch disables all fuse-*.c tests when configuring gcc with isl 0.14 or earlier. ChangeLog: * Makefile.in: Regenerate. * Makefile.tpl: Export ISLVER. * configure: Regenerate. * config/isl.m4: Detect isl-0.15. gcc/ * Makefile.in: Set ISLVER in site.exp. * config.in: Regenerate. * configure: Regenerate. * configure.ac: Define HAVE_isl for isl-0.15. gcc/testsuite/ * gcc.dg/graphite/graphite.exp: Only run the fuse-*.c tests with isl-0.15. --- Makefile.in| 2 ++ Makefile.tpl | 2 ++ config/isl.m4 | 12 configure | 29 + gcc/Makefile.in| 1 + gcc/testsuite/gcc.dg/graphite/fuse-2.c | 4 ++-- gcc/testsuite/gcc.dg/graphite/graphite.exp | 8 +++- 7 files changed, 55 insertions(+), 3 deletions(-) diff --git a/Makefile.in b/Makefile.in index 20d1c90..a519a54 100644 --- a/Makefile.in +++ b/Makefile.in @@ -222,6 +222,7 @@ HOST_EXPORTS = \ GMPINC="$(HOST_GMPINC)"; export GMPINC; \ ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \ ISLINC="$(HOST_ISLINC)"; export ISLINC; \ + ISLVER="$(HOST_ISLVER)"; export ISLVER; \ LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \ LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \ XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export XGCC_FLAGS_FOR_TARGET; \ @@ -315,6 +316,7 @@ HOST_GMPINC = @gmpinc@ # Where to find isl HOST_ISLLIBS = @isllibs@ HOST_ISLINC = @islinc@ +HOST_ISLVER = @islver@ # Where to find libelf HOST_LIBELFLIBS = @libelflibs@ diff --git a/Makefile.tpl b/Makefile.tpl index 2567365..829f664 100644 --- a/Makefile.tpl +++ b/Makefile.tpl @@ -225,6 +225,7 @@ HOST_EXPORTS = \ GMPINC="$(HOST_GMPINC)"; export GMPINC; \ ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \ ISLINC="$(HOST_ISLINC)"; export ISLINC; \ + ISLVER="$(HOST_ISLVER)"; export ISLVER; \ LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \ LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \ XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export XGCC_FLAGS_FOR_TARGET; \ @@ -318,6 +319,7 @@ HOST_GMPINC = @gmpinc@ # Where to find isl HOST_ISLLIBS = @isllibs@ HOST_ISLINC = @islinc@ 
+HOST_ISLVER = @islver@ # Where to find libelf HOST_LIBELFLIBS = @libelflibs@ diff --git a/config/isl.m4 b/config/isl.m4 index 86ccb94..0103f1f 100644 --- a/config/isl.m4 +++ b/config/isl.m4 @@ -117,6 +117,18 @@ AC_DEFUN([ISL_CHECK_VERSION], AC_MSG_RESULT([recommended isl version is 0.15, minimum required isl version 0.14 is deprecated]) fi +AC_MSG_CHECKING([for isl-0.15]) +AC_TRY_LINK([#include ], +[isl_options_set_schedule_serialize_sccs (NULL, 0);], +[ac_has_isl_options_set_schedule_serialize_sccs=yes], +[ac_has_isl_options_set_schedule_serialize_sccs=no]) +AC_MSG_RESULT($ac_has_isl_options_set_schedule_serialize_sccs) + +if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; then + islver="0.15" + AC_SUBST([islver]) +fi + CFLAGS=$_isl_saved_CFLAGS LDFLAGS=$_isl_saved_LDFLAGS LIBS=$_isl_saved_LIBS diff --git a/configure b/configure index cae3373..b9a4b51 100755 --- a/configure +++ b/configure @@ -650,6 +650,7 @@ extra_linker_plugin_flags extra_linker_plugin_configure_flags islinc isllibs +islver poststage1_ldflags poststage1_libs stage1_ldflags @@ -6048,6 +6049,34 @@ $as_echo "$gcc_cv_isl" >&6; } $as_echo "recommended isl version is 0.15, minimum required isl version 0.14 is deprecated" >&6; } fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for isl-0.15" >&5 +$as_echo_n "checking for isl-0.15... " >&6; } +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +#include +int +main () +{ +isl_options_set_schedule_serialize_sccs (NULL, 0); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_has_isl_options_set_schedule_serialize_sccs=yes +else + ac_has_isl_options_set_schedule_serialize_sccs=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_has_isl_options_set_schedule_serialize_sccs" >&5 +$as_echo "$ac_has_isl_options_set_schedule_serialize_sccs" >&6; } + +if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; then + islver="0.15" + +fi + CFLAGS=$_isl_saved_CFLAGS LDFLAGS=$_isl_saved_LDFLAGS LIBS=$_isl_saved_LIBS diff --git a/gcc/Makefile.in b/gcc/Makefile.in index ab9cbbf..aa3c018 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3698,6 +3698,7 @@ site.exp: ./config.status Makefile echo "set PLUGINCFLAGS \"$(PLUGINCFLAGS)\"" >> ./site.tmp; \ echo "set GMPINC \"$(GMPINC)\"" >>
[PATCH 3/3] new scop schedule for isl-0.15
Keep unchanged the implementation for isl-0.14. * graphite-poly.c (apply_poly_transforms): Simplify. (print_isl_set): Use more readable format: ISL_YAML_STYLE_BLOCK. (print_isl_map): Same. (print_isl_union_map): Same. (print_isl_schedule): New. (debug_isl_schedule): New. * graphite-dependences.c (scop_get_reads): Do not call isl_union_map_add_map that is undocumented isl functionality. (scop_get_must_writes): Same. (scop_get_may_writes): Same. (scop_get_original_schedule): Remove. (scop_get_dependences): Do not call isl_union_map_compute_flow that is deprecated in isl 0.15. Instead, use isl_union_access_* interface. (compute_deps): Remove. * graphite-isl-ast-to-gimple.c (print_schedule_ast): New. (debug_schedule_ast): New. (translate_isl_ast_to_gimple::scop_to_isl_ast): Call set_separate_option. (graphite_regenerate_ast_isl): Add dump. (translate_isl_ast_to_gimple::scop_to_isl_ast): Generate code from scop->transformed_schedule. (graphite_regenerate_ast_isl): Add more dump. * graphite-optimize-isl.c (optimize_isl): Set scop->transformed_schedule. Check whether schedules are equal. (apply_poly_transforms): Move here. * graphite-poly.c (apply_poly_transforms): ... from here. (free_poly_bb): Static. (free_scop): Static. (pbb_number_of_iterations_at_time): Remove. (print_isl_ast): New. (debug_isl_ast): New. (debug_scop_pbb): New. * graphite-scop-detection.c (print_edge): Move. (print_sese): Move. * graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Remove. (build_scop_scattering): Remove. (create_pw_aff_from_tree): Assert instead of bailing out. (add_condition_to_pbb): Remove unused code, do not fail. (add_conditions_to_domain): Same. (add_conditions_to_constraints): Remove. (build_scop_context): New. (add_iter_domain_dimension): New. (build_iteration_domains): Initialize pbb->iterators. Call add_conditions_to_domain. (nested_in): New. (loop_at): New. (index_outermost_in_loop): New. (index_pbb_in_loop): New. (outermost_pbb_in): New. (add_in_sequence): New. 
(add_outer_projection): New. (outer_projection_mupa): New. (add_loop_schedule): New. (build_schedule_pbb): New. (build_schedule_loop): New. (embed_in_surrounding_loops): New. (build_schedule_loop_nest): New. (build_original_schedule): New. (build_poly_scop): Call build_original_schedule. * graphite.h: Declare print_isl_schedule and debug_isl_schedule. (free_poly_dr): Remove. (struct poly_bb): Add iterators. Remove schedule, transformed, saved. (free_poly_bb): Remove. (debug_loop_vec): Remove. (print_isl_ast): Declare. (debug_isl_ast): Declare. (scop_do_interchange): Remove. (scop_do_strip_mine): Remove. (scop_do_block): Remove. (flatten_all_loops): Remove. (optimize_isl): Remove. (pbb_number_of_iterations_at_time): Remove. (debug_scop_pbb): Declare. (print_schedule_ast): Declare. (debug_schedule_ast): Declare. (struct scop): Remove schedule. Add original_schedule, transformed_schedule. (free_gimple_poly_bb): Remove. (print_generated_program): Remove. (debug_generated_program): Remove. (unify_scattering_dimensions): Remove. * sese.c (print_edge): ... here. (print_sese): ... here. (debug_edge): ... here. (debug_sese): ... here. * sese.h (print_edge): Declare. (print_sese): Declare. (dump_edge): Declare. (dump_sese): Declare. --- gcc/graphite-dependences.c | 123 ++- gcc/graphite-isl-ast-to-gimple.c | 203 +++- gcc/graphite-optimize-isl.c| 177 +++--- gcc/graphite-poly.c| 154 + gcc/graphite-scop-detection.c | 15 - gcc/graphite-sese-to-poly.c| 365 ++--- gcc/graphite.h | 48 +-- gcc/sese.c | 34 ++ gcc/sese.h | 7 +- gcc/testsuite/gcc.dg/graphite/pr35356-1.c | 2 +- .../gfortran.dg/graphite/interchange-3.f90 | 2 +- 11 files changed, 836 insertions(+), 294 deletions(-) diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c index 0544700..f9d5bc3 100644 --- a/gcc/graphite-dependences.c +++ b/gcc/graphite-dependences.c @@ -66,7 +66,7 @@ add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb) /* Returns all the memory reads in
[patch] fix gccjit build failure
gccjit currently fails to build because it needs an additional header. OK to install on the trunk?

Matthias

    * jit-playback.c: Include <pthread.h>.

--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
 #include "jit-builtins.h"
 #include "jit-tempdir.h"
 
+#include <pthread.h>
+
 /* gcc::jit::playback::context::build_cast uses the convert.h API,
    which in turn requires the frontend to provide a "convert"
Re: Wonly-top-basic-asm
Hi David,

On Sun, Jan 24, 2016 at 02:23:53PM -0800, David Wohlferd wrote:
> - Warn that this could change in future versions of gcc. To avoid
> impacts from this change, use extended asm.
> - Implement and document -Wonly-top-basic-asm (disabled by default) as a
> way to locate affected statements.

In my opinion we should not warn for any asm that means the same both as basic and as extended asm. The problem then becomes, what *is* the meaning of a basic asm, what does it clobber.

Currently the only differences are:
- asms that have a % in the string, or {|} on targets with ASSEMBLER_DIALECT;
- ia64 (for stop bits);
- mep, and this one is easily fixed.
- basic asms do not get TARGET_MD_ASM_ADJUST.

Segher
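To make the first difference in that list concrete, here is a minimal sketch of how a % is treated in basic versus extended asm. It assumes an x86 target with the default AT&T assembler syntax; the function name roundtrip_through_asm is made up for illustration, and on other targets the asm statements are simply compiled out.

```c
#include <assert.h>

/* Hypothetical helper: copies V through inline asm on x86 (GNU-style asm,
   AT&T syntax assumed); on other targets it just returns V.  */
int
roundtrip_through_asm (int v)
{
  int out = v;
#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
  /* Basic asm: the string is passed to the assembler as written, so a
     single '%' is simply part of the register name.  */
  __asm__ ("movw %ax, %ax");

  /* Extended asm (note the colons): '%' introduces an operand reference,
     so a literal register name has to be written with '%%'.  */
  __asm__ ("movw %%ax, %%ax" : : );

  /* Extended asm with operands: %0 and %1 are filled in by the compiler.  */
  __asm__ ("movl %1, %0" : "=r" (out) : "r" (v));
#endif
  /* Sanity check: the register copy must preserve the value.  */
  assert (out == v);
  return out;
}
```

The first two statements assemble to the same no-op instruction, which is the kind of asm that means the same in both forms only once the % characters are doubled.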
Re: [PATCH] jit: Fix missing references to pthread in jit-playback.c
On Sat, 2016-01-23 at 19:08 +0100, Iain Buclaw wrote:
> Hi,
>
> I noticed when building from 2016-01-17 snapshot that the JIT frontend
> failed to build.
>
> ---
> jit-playback.c:2075:36: error: ‘PTHREAD_MUTEX_INITIALIZER’ was not
> declared in this scope
> jit-playback.c: In member function ‘void
> gcc::jit::playback::context::acquire_mutex()’:
> jit-playback.c:2086:33: error: ‘pthread_mutex_lock’ was not declared
> in this scope
> jit-playback.c: In member function ‘void
> gcc::jit::playback::context::release_mutex()’:
> jit-playback.c:2100:35: error: ‘pthread_mutex_unlock’ was not declared
> in this scope
> ---
>
> I'm not sure if this is something environmental on my side, or some
> reorder/removals were done in the gcc headers included by the JIT
> frontend, however this was needed in order to continue.

Thanks. Doko just reported the same issue, and I now see it (with r232813) so this isn't just at your end.

OK for trunk.

Dave
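For readers unfamiliar with the three identifiers in the error log, a minimal standalone sketch of the same pattern follows. The names demo_mutex, demo_counter and demo_locked_increment are hypothetical and are not taken from jit-playback.c; the point is only that all three pthread names come from <pthread.h>, so the code fails in the reported way when no header pulls it in.

```c
#include <assert.h>
#include <pthread.h>  /* Declares PTHREAD_MUTEX_INITIALIZER and the lock calls.  */

/* Hypothetical file-scope mutex guarding a counter.  */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter;

/* Increment the counter under the mutex and return the new value.  */
int
demo_locked_increment (void)
{
  int rc = pthread_mutex_lock (&demo_mutex);
  assert (rc == 0);

  int value = ++demo_counter;

  rc = pthread_mutex_unlock (&demo_mutex);
  assert (rc == 0);
  return value;
}
```

Depending on the C library, linking may additionally need -pthread.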
[PATCH] PR target/68986: [5/6 Regression] internal compiler error: Segmentation fault
Stack alignment adjustment for __tls_get_addr should be done in ix86_update_stack_boundary, not ix86_compute_frame_layout. Also there is no need to over-align stack for __tls_get_addr and function with __tls_get_addr call isn't a leaf function. Tested on x86-64 with -m32 on testsuite. OK for trunk? Thanks. H.J. --- gcc/ PR target/68986 * config/i386/i386.c (ix86_compute_frame_layout): Move stack alignment adjustment to ... (ix86_update_stack_boundary): Here. Don't over-align stack for __tls_get_addr. (ix86_finalize_stack_realign_flags): Use stack_alignment_needed if __tls_get_addr is called. gcc/testsuite/ PR target/68986 * gcc.target/i386/pr68986-1.c: New test. * gcc.target/i386/pr68986-2.c: Likewise. * gcc.target/i386/pr68986-3.c: Likewise. --- gcc/config/i386/i386.c| 24 +++- gcc/testsuite/gcc.target/i386/pr68986-1.c | 11 +++ gcc/testsuite/gcc.target/i386/pr68986-2.c | 13 + gcc/testsuite/gcc.target/i386/pr68986-3.c | 13 + 4 files changed, 48 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-3.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 34b57a4..9c27ea9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11360,18 +11360,6 @@ ix86_compute_frame_layout (struct ix86_frame *frame) crtl->preferred_stack_boundary = 128; crtl->stack_alignment_needed = 128; } - /* preferred_stack_boundary is never updated for call - expanded from tls descriptor. Update it here. We don't update it in - expand stage because according to the comments before - ix86_current_function_calls_tls_descriptor, tls calls may be optimized - away. 
*/ - else if (ix86_current_function_calls_tls_descriptor - && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY) -{ - crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; - if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY) - crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY; -} stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT; preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT; @@ -12043,6 +12031,15 @@ ix86_update_stack_boundary (void) && cfun->stdarg && crtl->stack_alignment_estimated < 128) crtl->stack_alignment_estimated = 128; + + /* __tls_get_addr needs to be called with 16-byte aligned stack. */ + if (ix86_tls_descriptor_calls_expanded_in_cfun + && crtl->preferred_stack_boundary < 128) +{ + crtl->preferred_stack_boundary = 128; + if (crtl->stack_alignment_needed < 128) + crtl->stack_alignment_needed = 128; +} } /* Handle the TARGET_GET_DRAP_RTX hook. Return NULL if no DRAP is @@ -12506,7 +12503,8 @@ ix86_finalize_stack_realign_flags (void) = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary ? crtl->parm_stack_boundary : ix86_incoming_stack_boundary); unsigned int stack_realign = (incoming_stack_boundary - < (crtl->is_leaf + < ((crtl->is_leaf + && !ix86_current_function_calls_tls_descriptor) ? 
crtl->max_used_stack_slot_alignment : crtl->stack_alignment_needed)); diff --git a/gcc/testsuite/gcc.target/i386/pr68986-1.c b/gcc/testsuite/gcc.target/i386/pr68986-1.c new file mode 100644 index 000..998f34f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr68986-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-require-effective-target fpic } */ +/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=5 -mincoming-stack-boundary=4" } */ + +extern __thread int msgdata; +int +foo () +{ + return msgdata; +} diff --git a/gcc/testsuite/gcc.target/i386/pr68986-2.c b/gcc/testsuite/gcc.target/i386/pr68986-2.c new file mode 100644 index 000..23f9a52 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr68986-2.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-require-effective-target fpic } */ +/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2 -m32" } */ + +extern __thread int msgdata; +int +foo () +{ + return msgdata; +} + +/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr68986-3.c b/gcc/testsuite/gcc.target/i386/pr68986-3.c new file mode 100644 index 000..5744cf2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr68986-3.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* {
Re: [PATCH] OpenACC use_device clause ICE fix
On 2016/1/22 12:32 AM, Jakub Jelinek wrote:
> On Thu, Jan 21, 2016 at 10:22:19PM +0800, Chung-Lin Tang wrote:
>> On 2016/1/20 09:17 PM, Bernd Schmidt wrote:
>>> On 01/05/2016 02:15 PM, Chung-Lin Tang wrote:
>>>>     * omp-low.c (scan_sharing_clauses): Call add_local_decl() for
>>>>     use_device/use_device_ptr variables.
>>>
>>> It looks vaguely plausible, but if everything is part of the host
>>> function, why make a copy of the decl at all? I.e. what happens if you
>>> just remove the install_var_local call?
>>
>> Because (only) inside the OpenMP context, the variable is supposed to
>> contain the device-side value; a runtime call is used to obtain the
>> value from the device back to host. So a new variable is created, the
>> remap_decl mechanisms are used to change references inside the omp
>> context, and other references of the original variable are not touched.
>
> The patch looks wrong to me, the var shouldn't be actually used,
> it is supposed to have DECL_VALUE_EXPR set for it during omp lowering and
> the following gimplification is supposed to replace it.
>
> I've tried the testcases you've listed and couldn't get an ICE, so, if you
> see some ICE, can you mail the testcase (in patch form)?
> Perhaps there is something wrong with the OpenACC lowering?
>
> Jakub
>

I've attached a small testcase that triggers the ICE under -fopenacc. This still happens under current trunk.

Thanks,
Chung-Lin

void foo (float *x, float *y)
{
  int n = 1 << 20;
#pragma acc data create(x[0:n]) copyout(y[0:n])
  {
#pragma acc host_data use_device(x,y)
    {
      for (int i = 1 ; i < n; i++)
        y[0] += x[i] * y[i];
    }
  }
}
[gomp4] Merge trunk r232548 (2016-01-19) into gomp-4_0-branch
Hi!

Committed to gomp-4_0-branch in r232784:

commit 9cfa5d5eb5fd3b186124883a76232189b359b3de
Merge: 312e74d 56778b6
Author: tschwinge
Date:   Mon Jan 25 07:35:18 2016 +

    svn merge -r 232189:232548 svn+ssh://gcc.gnu.org/svn/gcc/trunk

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232784 138bc75d-0d04-0410-961f-82ee72b054a4

Grüße
 Thomas
[PATCH] Fix PR69393
The following fixes an issue with LTO and debug info of OMP vars. LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2016-01-25 Richard BienerPR lto/69393 * dwarf2out.c (is_naming_typedef_decl): Not when DECL_NAMELESS. * tree-streamer-out.c (pack_ts_base_value_fields): Stream DECL_NAMELESS. * tree-streamer-in.c (unpack_ts_base_value_fields): Likewise. * testsuite/libgomp.c++/pr69393.C: New testcase. Index: gcc/dwarf2out.c === *** gcc/dwarf2out.c (revision 232717) --- gcc/dwarf2out.c (working copy) *** is_naming_typedef_decl (const_tree decl) *** 22970,22975 --- 22970,22976 { if (decl == NULL_TREE || TREE_CODE (decl) != TYPE_DECL + || DECL_NAMELESS (decl) || !is_tagged_type (TREE_TYPE (decl)) || DECL_IS_BUILTIN (decl) || is_redundant_typedef (decl) Index: gcc/tree-streamer-out.c === *** gcc/tree-streamer-out.c (revision 232717) --- gcc/tree-streamer-out.c (working copy) *** pack_ts_base_value_fields (struct bitpac *** 87,93 bp_pack_value (bp, TREE_ADDRESSABLE (expr), 1); bp_pack_value (bp, TREE_THIS_VOLATILE (expr), 1); if (DECL_P (expr)) ! bp_pack_value (bp, DECL_UNSIGNED (expr), 1); else if (TYPE_P (expr)) bp_pack_value (bp, TYPE_UNSIGNED (expr), 1); else --- 87,96 bp_pack_value (bp, TREE_ADDRESSABLE (expr), 1); bp_pack_value (bp, TREE_THIS_VOLATILE (expr), 1); if (DECL_P (expr)) ! { ! bp_pack_value (bp, DECL_UNSIGNED (expr), 1); ! bp_pack_value (bp, DECL_NAMELESS (expr), 1); ! } else if (TYPE_P (expr)) bp_pack_value (bp, TYPE_UNSIGNED (expr), 1); else Index: gcc/tree-streamer-in.c === *** gcc/tree-streamer-in.c (revision 232717) --- gcc/tree-streamer-in.c (working copy) *** unpack_ts_base_value_fields (struct bitp *** 116,122 TREE_ADDRESSABLE (expr) = (unsigned) bp_unpack_value (bp, 1); TREE_THIS_VOLATILE (expr) = (unsigned) bp_unpack_value (bp, 1); if (DECL_P (expr)) ! 
DECL_UNSIGNED (expr) = (unsigned) bp_unpack_value (bp, 1); else if (TYPE_P (expr)) TYPE_UNSIGNED (expr) = (unsigned) bp_unpack_value (bp, 1); else --- 116,125 TREE_ADDRESSABLE (expr) = (unsigned) bp_unpack_value (bp, 1); TREE_THIS_VOLATILE (expr) = (unsigned) bp_unpack_value (bp, 1); if (DECL_P (expr)) ! { ! DECL_UNSIGNED (expr) = (unsigned) bp_unpack_value (bp, 1); ! DECL_NAMELESS (expr) = (unsigned) bp_unpack_value (bp, 1); ! } else if (TYPE_P (expr)) TYPE_UNSIGNED (expr) = (unsigned) bp_unpack_value (bp, 1); else Index: libgomp/testsuite/libgomp.c++/pr69393.C === *** libgomp/testsuite/libgomp.c++/pr69393.C (revision 0) --- libgomp/testsuite/libgomp.c++/pr69393.C (working copy) *** *** 0 --- 1,16 + // { dg-do run } + // { dg-require-effective-target lto } + // { dg-options "-flto -g -fopenmp" } + + int e = 5; + + int + main () + { + int a[e]; + a[0] = 6; + #pragma omp parallel + if (a[0] != 6) + __builtin_abort (); + return 0; + }
Re: [aarch64] Improve TImode constant moves
Hi Richard,

On 24/01/16 10:54, Richard Henderson wrote:
> This looks to be an incomplete transition of the aarch64 backend to
> CONST_WIDE_INT. I haven't checked to see if it's a regression from gcc5,
> but I suspect not, since there should have been similar checks for
> CONST_DOUBLE.

FWIW, I defined TARGET_SUPPORTS_WIDE_INT for aarch64 on trunk and the GCC 5 branch in order to fix PR 68129.

> This is probably gcc7 fodder, but it helped me debug another TImode PR.
>
> r~

> +    case CONST_WIDE_INT:
> +      *cost = 0;
> +      for (unsigned int n = CONST_WIDE_INT_NUNITS(x), i = 0; i < n; ++i)
> +        {
> +          unsigned HOST_WIDE_INT e = CONST_WIDE_INT_ELT(x, i);
> +          if (e != 0)
> +            *cost += COSTS_N_INSNS (aarch64_internal_mov_immediate
> +                                    (NULL_RTX, GEN_INT (e), false, DImode));
> +        }
> +      return true;
> +

We usually avoid creating intermediate rtxes in the cost function because it can potentially be called many times during compilation and we want to avoid creating too many short-lived objects, though I suppose there's no way of getting around this one (the GEN_INT call).

Thanks,
Kyrill
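The shape of the quoted cost loop can be modelled outside the compiler. The sketch below is a simplified standalone model, not the aarch64 backend: chunk_mov_insns is a hypothetical stand-in that counts nonzero 16-bit chunks (roughly a MOVZ/MOVK sequence) and does not reproduce what aarch64_internal_mov_immediate computes. Only the outer loop, which sums a per-element cost and skips zero elements, mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element cost: one instruction per nonzero 16-bit chunk.
   This is an assumption made for illustration only.  */
static int
chunk_mov_insns (uint64_t e)
{
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((e >> shift) & 0xffff)
      n++;
  return n;
}

/* Sum the cost over the 64-bit elements of a wide constant, skipping
   elements that are zero, as the quoted loop does.  */
int
wide_const_cost (const uint64_t *elts, size_t nunits)
{
  assert (elts != NULL || nunits == 0);

  int cost = 0;
  for (size_t i = 0; i < nunits; ++i)
    if (elts[i] != 0)
      cost += chunk_mov_insns (elts[i]);
  return cost;
}
```

Because this model works on plain integers, it also sidesteps the allocation concern raised above; whether the real helper could be called the same way is not something this sketch establishes.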
Re: [PATCH] OpenACC use_device clause ICE fix
On Mon, Jan 25, 2016 at 05:52:56PM +0900, Chung-Lin Tang wrote: > I've attached a small testcase that triggers the ICE under -fopenacc. This > stll > happens under current trunk. Then I think I'd prefer (untested so far): 2016-01-25 Jakub Jelinek* omp-low.c (lower_omp_target) : Set DECL_VALUE_EXPR of new_var even for the non-array case. Look through DECL_VALUE_EXPR for expansion. * c-c++-common/goacc/use_device-1.c: New test. --- gcc/omp-low.c.jj2016-01-21 00:55:19.0 +0100 +++ gcc/omp-low.c 2016-01-25 10:45:30.995510057 +0100 @@ -15878,6 +15878,14 @@ lower_omp_target (gimple_stmt_iterator * SET_DECL_VALUE_EXPR (new_var, x); DECL_HAS_VALUE_EXPR_P (new_var) = 1; } + else + { + tree new_var = lookup_decl (var, ctx); + x = create_tmp_var_raw (TREE_TYPE (new_var), get_name (new_var)); + gimple_add_tmp_var (x); + SET_DECL_VALUE_EXPR (new_var, x); + DECL_HAS_VALUE_EXPR_P (new_var) = 1; + } break; } @@ -16493,6 +16501,7 @@ lower_omp_target (gimple_stmt_iterator * x = build_fold_addr_expr (v); } } + new_var = DECL_VALUE_EXPR (new_var); x = fold_convert (TREE_TYPE (new_var), x); gimplify_expr (, _body, NULL, is_gimple_val, fb_rvalue); gimple_seq_add_stmt (_body, --- gcc/testsuite/c-c++-common/goacc/use_device-1.c.jj 2016-01-25 10:56:33.472310437 +0100 +++ gcc/testsuite/c-c++-common/goacc/use_device-1.c 2016-01-25 10:56:43.128176481 +0100 @@ -0,0 +1,15 @@ +/* { dg-do compile } */ + +void +foo (float *x, float *y) +{ + int n = 1 << 20; +#pragma acc data create(x[0:n]) copyout(y[0:n]) + { +#pragma acc host_data use_device(x,y) +{ + for (int i = 1; i < n; i++) + y[0] += x[i] * y[i]; +} + } +} Jakub
[libmpx, committed] Fix verbosity for error messages
Hi,

This is an obvious patch fixing the verbosity level used for part of the error messages. Bootstrapped on x86_64-pc-linux-gnu. Applied to trunk and gcc-5-branch.

Thanks,
Ilya
--
libmpx/

2016-01-20  Ilya Enkovich

    * mpxrt/mpxrt.c (handler): Fix verbosity for error message.

diff --git a/libmpx/mpxrt/mpxrt.c b/libmpx/mpxrt/mpxrt.c
index bcdd3a6..b52906b 100644
--- a/libmpx/mpxrt/mpxrt.c
+++ b/libmpx/mpxrt/mpxrt.c
@@ -268,7 +268,7 @@ handler (int sig __attribute__ ((unused)),
           __mpxrt_write_uint (VERB_ERROR, trapno, 10);
           __mpxrt_write (VERB_ERROR, ", ip = 0x");
           __mpxrt_write_uint (VERB_ERROR, ip, 16);
-          __mpxrt_write (VERB_BR, "\n");
+          __mpxrt_write (VERB_ERROR, "\n");
           exit (255);
         }
       else
@@ -277,7 +277,7 @@ handler (int sig __attribute__ ((unused)),
           __mpxrt_write_uint (VERB_ERROR, trapno, 10);
           __mpxrt_write (VERB_ERROR, "! at 0x");
           __mpxrt_write_uint (VERB_ERROR, ip, 16);
-          __mpxrt_write (VERB_BR, "\n");
+          __mpxrt_write (VERB_ERROR, "\n");
           exit (255);
         }
     }
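Why the level of the trailing newline matters can be sketched with a toy verbosity filter. Everything here is an assumption made for illustration: the demo_ names, the ordering of the levels, and the rule that a message is emitted only when its level does not exceed the configured one are not taken from mpxrt.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical levels, ordered from least to most verbose.  */
typedef enum
{
  DEMO_VERB_ERROR,
  DEMO_VERB_INFO,
  DEMO_VERB_BR,
  DEMO_VERB_DEBUG
} demo_verbosity;

static demo_verbosity demo_level = DEMO_VERB_ERROR;  /* Errors only.  */
static char demo_out[128];                           /* Captured output.  */

/* Append STR to the captured output if its level passes the filter.  */
static void
demo_write (demo_verbosity vt, const char *str)
{
  assert (str != NULL);
  if (vt > demo_level)
    return;
  strncat (demo_out, str, sizeof demo_out - strlen (demo_out) - 1);
}

/* Emit an error message whose newline is written at NEWLINE_LEVEL.  */
const char *
demo_report (demo_verbosity newline_level)
{
  demo_out[0] = '\0';
  demo_write (DEMO_VERB_ERROR, "ip = 0x1234");
  demo_write (newline_level, "\n");
  return demo_out;
}
```

Under these assumptions, writing the newline at the more verbose level drops it whenever only errors are shown, so the message is left unterminated; writing it at the error level keeps the two in step.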
Re: [PATCH] OpenACC use_device clause ICE fix
On Mon, Jan 25, 2016 at 11:02:05AM +0100, Jakub Jelinek wrote: > On Mon, Jan 25, 2016 at 10:58:17AM +0100, Jakub Jelinek wrote: > > --- gcc/testsuite/c-c++-common/goacc/use_device-1.c.jj 2016-01-25 > > 10:56:33.472310437 +0100 > > +++ gcc/testsuite/c-c++-common/goacc/use_device-1.c 2016-01-25 > > 10:56:43.128176481 +0100 > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > + > > +void > > +foo (float *x, float *y) > > +{ > > + int n = 1 << 20; > > +#pragma acc data create(x[0:n]) copyout(y[0:n]) > > + { > > +#pragma acc host_data use_device(x,y) > > +{ > > + for (int i = 1; i < n; i++) > > + y[0] += x[i] * y[i]; > > +} > > + } > > +} > > Though the testcase looks invalid to me, how can you dereference > the device pointer on the host? Though, for a testcase that it doesn't ICE > maybe good enough. The following ICEs without the patch and works with it, so I think it is better: 2016-01-25 Jakub Jelinek* omp-low.c (lower_omp_target) : Set DECL_VALUE_EXPR of new_var even for the non-array case. Look through DECL_VALUE_EXPR for expansion. * c-c++-common/goacc/use_device-1.c: New test. 
--- gcc/omp-low.c.jj2016-01-21 00:55:19.0 +0100 +++ gcc/omp-low.c 2016-01-25 10:45:30.995510057 +0100 @@ -15878,6 +15878,14 @@ lower_omp_target (gimple_stmt_iterator * SET_DECL_VALUE_EXPR (new_var, x); DECL_HAS_VALUE_EXPR_P (new_var) = 1; } + else + { + tree new_var = lookup_decl (var, ctx); + x = create_tmp_var_raw (TREE_TYPE (new_var), get_name (new_var)); + gimple_add_tmp_var (x); + SET_DECL_VALUE_EXPR (new_var, x); + DECL_HAS_VALUE_EXPR_P (new_var) = 1; + } break; } @@ -16493,6 +16501,7 @@ lower_omp_target (gimple_stmt_iterator * x = build_fold_addr_expr (v); } } + new_var = DECL_VALUE_EXPR (new_var); x = fold_convert (TREE_TYPE (new_var), x); gimplify_expr (, _body, NULL, is_gimple_val, fb_rvalue); gimple_seq_add_stmt (_body, --- gcc/testsuite/c-c++-common/goacc/use_device-1.c.jj 2016-01-25 10:56:33.472310437 +0100 +++ gcc/testsuite/c-c++-common/goacc/use_device-1.c 2016-01-25 10:56:43.128176481 +0100 @@ -0,0 +1,14 @@ +/* { dg-do compile } */ + +void bar (float *, float *); + +void +foo (float *x, float *y) +{ + int n = 1 << 10; +#pragma acc data create(x[0:n]) copyout(y[0:n]) + { +#pragma acc host_data use_device(x,y) +bar (x, y); + } +} Jakub
[PATCH] Fix PR69376
The following makes SCCVN properly save/restore SSA_NAME_ANTI_RANGE_P. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2016-01-25 Richard BienerPR tree-optimization/69376 * tree-ssa-sccvn.h (struct vn_ssa_aux): Add range_info_anti_range_p flag. (VN_INFO_ANTI_RANGE_P): New inline. (VN_INFO_RANGE_TYPE): Likewise. * tree-ssa-sccvn.c (set_ssa_val_to): Also record and copy SSA_NAME_ANTI_RANGE_P. (free_scc_vn): Restore SSA_NAME_ANTI_RANGE_P. * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): Properly query VN_INFO_RANGE_TYPE. * gcc.dg/torture/pr69376.c: New testcase. Index: gcc/tree-ssa-sccvn.h === *** gcc/tree-ssa-sccvn.h(revision 232717) --- gcc/tree-ssa-sccvn.h(working copy) *** typedef struct vn_ssa_aux *** 191,196 --- 191,199 insertion of such with EXPR as definition is required before a use can be created of it. */ unsigned needs_insertion : 1; + + /* Whether range-info is anti-range. */ + unsigned range_info_anti_range_p : 1; } *vn_ssa_aux_t; enum vn_lookup_kind { VN_NOWALK, VN_WALK, VN_WALKREWRITE }; *** VN_INFO_RANGE_INFO (tree name) *** 253,258 --- 256,279 : SSA_NAME_RANGE_INFO (name)); } + /* Whether the original range info of NAME is an anti-range. */ + + inline bool + VN_INFO_ANTI_RANGE_P (tree name) + { + return (VN_INFO (name)->info.range_info + ? VN_INFO (name)->range_info_anti_range_p + : SSA_NAME_ANTI_RANGE_P (name)); + } + + /* Get at the original range info kind for NAME. */ + + inline value_range_type + VN_INFO_RANGE_TYPE (tree name) + { + return VN_INFO_ANTI_RANGE_P (name) ? VR_ANTI_RANGE : VR_RANGE; + } + /* Get at the original pointer info for NAME. */ inline ptr_info_def * Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 232717) --- gcc/tree-ssa-pre.c (working copy) *** eliminate_dom_walker::before_dom_childre *** 4047,4053 && ! VN_INFO_RANGE_INFO (sprime) && b == sprime_b) duplicate_ssa_name_range_info (sprime, ! SSA_NAME_RANGE_TYPE (lhs), VN_INFO_RANGE_INFO (lhs)); } --- 4047,4053 && ! 
VN_INFO_RANGE_INFO (sprime) && b == sprime_b) duplicate_ssa_name_range_info (sprime, ! VN_INFO_RANGE_TYPE (lhs), VN_INFO_RANGE_INFO (lhs)); } Index: gcc/testsuite/gcc.dg/torture/pr69376.c === *** gcc/testsuite/gcc.dg/torture/pr69376.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr69376.c (working copy) *** *** 0 --- 1,45 + /* { dg-do run } */ + /* { dg-require-effective-target int32plus } */ + + int printf (const char *, ...); + + unsigned a, c, *d, f; + char b, e; + short g; + + void + fn1 () + { + unsigned h = 4294967290; + if (b >= 0) + { + h = b; + c = b / 290; + f = ~(c - (8 || h)); + if (f) + printf ("%d\n", 1); + if (f) + printf ("%d\n", f); + g = ~f; + if (c < 3) + { + int i = -h < ~c; + unsigned j; + if (i) + j = h; + h = -j * g; + } + c = h; + } + unsigned k = ~h; + char l = e || g; + if (l < 1 || k < 7) + *d = a; + } + + int + main () + { + fn1 (); + return 0; + } Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 232717) --- gcc/tree-ssa-sccvn.c(working copy) *** set_ssa_val_to (tree from, tree to) *** 3139,3153 { /* Save old info. */ if (! VN_INFO (to)->info.range_info) ! VN_INFO (to)->info.range_info = SSA_NAME_RANGE_INFO (to); /* Use that from the dominator. */ SSA_NAME_RANGE_INFO (to) = SSA_NAME_RANGE_INFO (from); } else { /* Save old info. */ if (! VN_INFO (to)->info.range_info) ! VN_INFO (to)->info.range_info = SSA_NAME_RANGE_INFO (to); /* Rather than allocating memory and unioning the info just clear it. */ SSA_NAME_RANGE_INFO (to) = NULL; --- 3139,3162 { /* Save old info. */ if (! VN_INFO (to)->info.range_info) ! { !
Re: [PATCH] OpenACC use_device clause ICE fix
On Mon, Jan 25, 2016 at 10:58:17AM +0100, Jakub Jelinek wrote:
> --- gcc/testsuite/c-c++-common/goacc/use_device-1.c.jj 2016-01-25 10:56:33.472310437 +0100
> +++ gcc/testsuite/c-c++-common/goacc/use_device-1.c 2016-01-25 10:56:43.128176481 +0100
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +foo (float *x, float *y)
> +{
> +  int n = 1 << 20;
> +#pragma acc data create(x[0:n]) copyout(y[0:n])
> +  {
> +#pragma acc host_data use_device(x,y)
> +    {
> +      for (int i = 1; i < n; i++)
> +        y[0] += x[i] * y[i];
> +    }
> +  }
> +}

Though the testcase looks invalid to me: how can you dereference the device pointer on the host? Still, for a testcase that only checks that it doesn't ICE, it is maybe good enough.

    Jakub
Re: [gomp4, PR68977, Committed] Don't gimplify in ssa mode if seen_error in oacc_xform_loop
On 14/01/16 10:43, Richard Biener wrote: On Wed, Jan 13, 2016 at 9:04 PM, Tom de Vrieswrote: Hi, At r231739, there was an ICE when checking code generated by oacc_xform_loop, in case the source contained an error. Due to seen_error (), gimplification during oacc_xform_loop bailed out, and an uninitialized var was introduced. Because of gimplifying in ssa mode, that caused an ICE. I can't reproduce this any longer, but I think the fix still makes sense. The patch makes sure oacc_xform_loop gimplifies in non-ssa mode if seen_error (). I don't think it makes "sense" in any way. After seen_error () a following ICE will be "confused after earlier errors" in release mode and thus I think that's not an important problem to paper over with this kind of "hack". I'd rather avoid doing any of omp-low if seen_error ()? The error triggered in oacc_device_lower, so there's nothing we can do before (in omp-low). How about this fix, which replaces the oacc ifn calls with zero-assignments if seen_error ()? Thanks, - Tom Ignore oacc ifn if seen_error in execute_oacc_device_lower --- gcc/omp-low.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 2de3aeb..f678f05 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -20201,7 +20201,7 @@ execute_oacc_device_lower () /* Rewind to allow rescan. */ gsi_prev (); - bool rescan = false, remove = false; + bool rescan = false, remove = false, assign_zero = false; enum internal_fn ifn_code = gimple_call_internal_fn (call); switch (ifn_code) @@ -20209,11 +20209,25 @@ execute_oacc_device_lower () default: break; case IFN_GOACC_LOOP: + if (seen_error ()) + { + remove = true; + assign_zero = true; + break; + } + oacc_xform_loop (call); rescan = true; break; case IFN_GOACC_REDUCTION: + if (seen_error ()) + { + remove = true; + assign_zero = true; + break; + } + /* Mark the function for SSA renaming. 
*/ mark_virtual_operands_for_renaming (cfun); @@ -20228,6 +20242,13 @@ execute_oacc_device_lower () case IFN_UNIQUE: { + if (seen_error ()) + { + remove = true; + assign_zero = true; + break; + } + enum ifn_unique_kind kind = ((enum ifn_unique_kind) TREE_INT_CST_LOW (gimple_call_arg (call, 0))); @@ -20266,11 +20287,19 @@ execute_oacc_device_lower () { if (gimple_vdef (call)) replace_uses_by (gimple_vdef (call), gimple_vuse (call)); - if (gimple_call_lhs (call)) + tree lhs = gimple_call_lhs (call); + if (lhs != NULL_TREE) { - /* Propagate the data dependency var. */ - gimple *ass = gimple_build_assign (gimple_call_lhs (call), - gimple_call_arg (call, 1)); + gimple *ass; + if (assign_zero) + { + tree zero = build_zero_cst (TREE_TYPE (lhs)); + ass = gimple_build_assign (lhs, zero); + } + else + /* Propagate the data dependency var. */ + ass = gimple_build_assign (lhs, gimple_call_arg (call, 1)); + gsi_replace (, ass, false); } else
Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning
On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote:
>
> Hi,
>
> I've seen a couple of large performance issues caused by expanding
> the high-precision reciprocal square root for Cortex-A57, so I'd like
> to turn it off by default.
>
> This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
> Spec2000, good (~5.5%) for gromacs from Spec2006, and very good (>10%) for
> some private microbenchmark kernels which stress the divide/sqrt/multiply
> units. It therefore seems to me to be the correct choice to make across
> a number of workloads.
>
> Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
>
> OK?

*Ping*

Thanks,
James

> ---
> 2015-12-11  James Greenhalgh
>
>     * config/aarch64/aarch64.c (cortexa57_tunings): Remove
>     AARCH64_EXTRA_TUNE_RECIP_SQRT.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1d5d898..999c9fc 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
>    0, /* max_case_values. */
>    0, /* cache_line_size. */
>    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
> -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
> -   | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */
> +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */
>  };
>
>  static const struct tune_params cortexa72_tunings =
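For context on what is being switched off: a reciprocal square root expansion of this kind generally starts from a cheap estimate and refines it with Newton-Raphson steps, trading a divide and a square root for several dependent multiplies. The sketch below shows that idea in plain C. It is not the sequence GCC emits; the initial estimate uses a well-known bit trick as a stand-in for a hardware estimate instruction, and the name rsqrt_newton and the step count are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(a) for a > 0 using STEPS Newton-Raphson refinements
   of a crude initial estimate.  */
float
rsqrt_newton (float a, int steps)
{
  assert (a > 0.0f);

  /* Crude initial estimate from the bit pattern of A (stand-in for a
     hardware estimate instruction).  */
  uint32_t bits;
  memcpy (&bits, &a, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float x;
  memcpy (&x, &bits, sizeof x);

  /* Each step computes x = x * (1.5 - 0.5 * a * x * x), which roughly
     squares the relative error of the estimate.  */
  for (int i = 0; i < steps; ++i)
    x = x * (1.5f - 0.5f * a * x * x);
  return x;
}
```

Each refinement step depends on the previous one, which is one plausible reason such an expansion can lose to the plain instructions on some cores and win on others, consistent with the mixed benchmark results quoted above.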
[PATCH, PR69421] Check vector types of COND_EXPR operands are compatible when vectorizing it
Hi,

This patch covers one more case when boolean operands get different
vectypes and we don't detect it.

Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk?

Thanks,
Ilya
--
gcc/

2016-01-25  Ilya Enkovich

        PR target/69421
        * tree-vect-stmts.c (vectorizable_condition): Check vectype
        of operands is compatible with a statement vectype.

gcc/testsuite/

2016-01-25  Ilya Enkovich

        PR target/69421
        * gcc.dg/pr69421.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr69421.c b/gcc/testsuite/gcc.dg/pr69421.c
new file mode 100644
index 0000000..252e22c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr69421.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+struct A { double a; };
+double a;
+
+void
+foo (_Bool *x)
+{
+  long i;
+  for (i = 0; i < 64; i++)
+    {
+      struct A c;
+      x[i] = c.a || a;
+    }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1d2246d..ed2ce07 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7528,6 +7528,7 @@ vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,

   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;

   if (slp_node || PURE_SLP_STMT (stmt_info))
     ncopies = 1;
@@ -7547,9 +7548,17 @@ vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
     return false;

   gimple *def_stmt;
-  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dt))
+  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dt,
+                           &vectype1))
+    return false;
+  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dt,
+                           &vectype2))
     return false;
-  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dt))
+
+  if (vectype1 && !useless_type_conversion_p (vectype, vectype1))
+    return false;
+
+  if (vectype2 && !useless_type_conversion_p (vectype, vectype2))
     return false;

   masked = !COMPARISON_CLASS_P (cond_expr);
RE: [PATCH] [ARC] Add basic support for double load and store instructions
Committed r232788

Thanks,
Claudiu

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Sunday, January 24, 2016 3:26 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois Bedard; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH] [ARC] Add basic support for double load and store
> instructions
>
>
>
> On 22/01/16 11:59, Claudiu Zissulescu wrote:
> > Thank you for the feedback. I hope this new patch solves the outstanding
> > issues. Please find it attached.
>
> This is OK.
Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
On Mon, Jan 11, 2016 at 11:53:39AM +0000, James Greenhalgh wrote:
>
> Hi,
>
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this; first is consistency across -mcpu targets, second is enabling more
> -mcpu targets to use the flag for peak tuning.
>
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're
> under -ffast-math but I take this flag to mean the user explicitly wants the
> low-precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the
> high-precision case. I'd prefer to keep things as predictable as possible,
> and here that means always emitting the low-precision expansion when asked.
>
> Judging by the comments in the thread proposing the reciprocal square
> root optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any
> cores which do not explicitly ask for it. Currently that is Cortex-A57
> and xgene, though I will be proposing a patch to remove Cortex-A57 from
> that list shortly.
>
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an
> internal flag which also changes the performance characteristics where
> high-precision is needed. This conflates two decisions the target might
> want to make, and reduces the applicability of an option targets might
> want to enable for performance. In particular, I'd still like to see
> -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> sequence for floats under Cortex-A57.
>
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
>
> I've also put this through bootstrap and test on aarch64-none-linux-gnu
> with no issues.
>
> OK?

*Ping*

Thanks,
James

> 2015-12-10  James Greenhalgh
>
>         * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
>         reciprocal sqrt for -mlow-precision-recip-sqrt.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9142ac0..1d5d898 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
>  {
>    return (!flag_trapping_math
>            && flag_unsafe_math_optimizations
> -          && (aarch64_tune_params.extra_tuning_flags
> -              & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> +          && ((aarch64_tune_params.extra_tuning_flags
> +               & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> +              || flag_mrecip_low_precision_sqrt));
>  }
>
>  /* Function to decide when to use
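The change to use_rsqrt_p can be modelled outside the compiler. The following is only a sketch: the Options struct, the TUNE_RECIP_SQRT bit value and the two function names are hypothetical stand-ins for GCC's global option state, not the actual aarch64.c code.

```cpp
// Stand-alone model of the use_rsqrt_p predicate before and after the patch.
// Every name below is a hypothetical stand-in for GCC's global option state.
struct Options {
  bool trapping_math;              // models flag_trapping_math
  bool unsafe_math_optimizations;  // models flag_unsafe_math_optimizations
  bool low_precision_recip_sqrt;   // models flag_mrecip_low_precision_sqrt
  unsigned extra_tuning_flags;     // models aarch64_tune_params.extra_tuning_flags
};

// Assumed bit value; the real constant is defined elsewhere in the backend.
const unsigned TUNE_RECIP_SQRT = 1u << 0;

// Before: only cores whose tuning sets the flag get the software expansion.
bool use_rsqrt_before(const Options &o) {
  return !o.trapping_math
         && o.unsafe_math_optimizations
         && (o.extra_tuning_flags & TUNE_RECIP_SQRT) != 0;
}

// After: the command-line option alone is enough, whatever the core tuning.
bool use_rsqrt_after(const Options &o) {
  return !o.trapping_math
         && o.unsafe_math_optimizations
         && ((o.extra_tuning_flags & TUNE_RECIP_SQRT) != 0
             || o.low_precision_recip_sqrt);
}
```

With the option set and no tuning bit, use_rsqrt_before returns false and use_rsqrt_after returns true; with the option unset, both versions agree, which matches the claim that behaviour without -mlow-precision-recip-sqrt does not change.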
Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts
On 01/23/2016 12:52 AM, Ian Lance Taylor wrote:
> 2016-01-22  Ian Lance Taylor
>
>         * common.opt (fkeep-gc-roots-live): New option.
>         * tree-ssa-loop-ivopts.c (add_candidate_1): If
>         -fkeep-gc-roots-live, skip pointers.
>         (add_iv_candidate_for_biv): Handle add_candidate_1 returning
>         NULL.
>         * doc/invoke.texi (Optimize Options): Document
>         -fkeep-gc-roots-live.
>
> gcc/testsuite/ChangeLog:
>
> 2016-01-22  Ian Lance Taylor
>
>         * gcc.dg/tree-ssa/ivopt_5.c: New test.

Patch not attached?


Bernd
Re: [AARCH64][ACLE][NEON] Implement vcvt*_s64_f64 and vcvt*_u64_f64 NEON intrinsics.
On Thu, Jan 21, 2016 at 12:32:07PM +0000, James Greenhalgh wrote:
> On Wed, Jan 13, 2016 at 05:44:30PM +0000, Bilyan Borisov wrote:
> > This patch implements all the vcvtR_s64_f64 and vcvtR_u64_f64 vector
> > intrinsics, where R is ['',a,m,n,p]. Since these intrinsics are
> > identical in semantics to the corresponding scalar variants, they are
> > implemented in terms of them, with appropriate packing and unpacking
> > of vector arguments. New test cases, covering all the intrinsics were
> > also added.
>
> This patch is very low risk, gets us another step towards closing pr58693,
> and was posted before the Stage 3 deadline. This is OK for trunk.

I realised you don't have commit access, so I've committed this on your
behalf as revision 232789.

Thanks,
James

> > gcc/
> >
> > 2015-XX-XX  Bilyan Borisov
> >
> >         * config/aarch64/arm_neon.h (vcvt_s64_f64): New intrinsic.
> >         (vcvt_u64_f64): Likewise.
> >         (vcvta_s64_f64): Likewise.
> >         (vcvta_u64_f64): Likewise.
> >         (vcvtm_s64_f64): Likewise.
> >         (vcvtm_u64_f64): Likewise.
> >         (vcvtn_s64_f64): Likewise.
> >         (vcvtn_u64_f64): Likewise.
> >         (vcvtp_s64_f64): Likewise.
> >         (vcvtp_u64_f64): Likewise.
> >
> > gcc/testsuite/
> >
> > 2015-XX-XX  Bilyan Borisov
> >
> >         * gcc.target/aarch64/simd/vcvt_s64_f64_1.c: New.
> >         * gcc.target/aarch64/simd/vcvt_u64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvta_s64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvta_u64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtm_s64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtm_u64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtn_s64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtn_u64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtp_s64_f64_1.c: Likewise.
> >         * gcc.target/aarch64/simd/vcvtp_u64_f64_1.c: Likewise.
>
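The "scalar variant plus packing and unpacking" structure described in the quoted patch summary can be sketched in portable C++. This is an illustration only: F64x1, S64x1 and the cvt_* names are hypothetical and are not the arm_neon.h definitions. The mapping of suffixes to rounding modes (none: toward zero, a: nearest with ties away, m: toward minus infinity, n: nearest with ties to even, p: toward plus infinity) follows the usual ACLE naming and is not stated in the thread. Saturation and NaN handling of the real instructions are not modelled.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical one-lane stand-ins for float64x1_t and int64x1_t.
struct F64x1 { double lane[1]; };
struct S64x1 { std::int64_t lane[1]; };

// Scalar models of each variant. Inputs are assumed finite and within the
// int64_t range; out-of-range values are not handled here.
std::int64_t cvt_zero(double x)  { return static_cast<std::int64_t>(std::trunc(x)); }
std::int64_t cvt_away(double x)  { return static_cast<std::int64_t>(std::round(x)); }
std::int64_t cvt_minus(double x) { return static_cast<std::int64_t>(std::floor(x)); }
std::int64_t cvt_plus(double x)  { return static_cast<std::int64_t>(std::ceil(x)); }

// Ties-to-even relies on the default floating-point rounding mode being
// in effect when std::nearbyint is called.
std::int64_t cvt_nearest_even(double x) {
  return static_cast<std::int64_t>(std::nearbyint(x));
}

// Vector form: unpack the single lane, apply the scalar variant, repack.
S64x1 cvt_vector(F64x1 v, std::int64_t (*scalar)(double)) {
  S64x1 r;
  r.lane[0] = scalar(v.lane[0]);
  return r;
}
```

For example, cvt_vector applied with cvt_nearest_even to a lane holding 2.5 yields 2, while cvt_away on the same input yields 3.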
Minor tweaks to documentation of scalar_storage_order
Tested on x86_64-suse-linux, applied on the mainline as obvious.


2016-01-25  Eric Botcazou

        * doc/extend.texi (scalar_storage_order type attribute): Fix typo and
        improve wording for mixed storage order support.

--
Eric Botcazou

Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 232773)
+++ doc/extend.texi (working copy)
@@ -6481,10 +6481,10 @@ integral type should be used.
 When attached to a @code{union} or a @code{struct}, this attribute sets
 the storage order, aka endianness, of the scalar fields of the type, as well
 as the array fields whose component is scalar. The supported
-endianness are @code{big-endian} and @code{little-endian}. The attribute
+endiannesses are @code{big-endian} and @code{little-endian}. The attribute
 has no effects on fields which are themselves a @code{union}, a @code{struct}
 or an array whose component is a @code{union} or a @code{struct}, and it is
-possible to have fields with a different scalar storage order than the
+possible for these fields to have a different scalar storage order than the
 enclosing type.

 This attribute is supported only for targets that use a uniform default
Re: [PATCH, AArch64] Fix for PR67896 (C++ FE cannot distinguish __Poly{8,16,64,128}_t types)
On Wed, Jan 20, 2016 at 09:27:41PM +0100, Roger Ferrer Ibáñez wrote:
> Hi James,
>
> > This patch looks technically correct to me, though there is a small
> > style issue to correct (in-line below), and your ChangeLogs don't fit
> > our usual style.
>
> thank you very much for the useful comments. I'm attaching a new
> version of the patch with the style issues (hopefully) ironed out.

Thanks, this version of the patch looks correct to me.

> > > P.S.: I haven't signed the copyright assignment to the FSF. The change
> > > is really small but I can do the paperwork if required.

I can't commit it on your behalf until we've heard back regarding whether
this needs a copyright assignment to the FSF, but once I've heard I'd be
happy to commit this for you.

I'll expand the CC list a bit further to see if we can get an answer on
that.

Thanks again for the analysis and patch.

James

> gcc/ChangeLog:
>
> 2016-01-19  Roger Ferrer Ibáñez
>
>         PR target/67896
>         * config/aarch64/aarch64-builtins.c
>         (aarch64_init_simd_builtin_types): Do not set structural
>         equality to __Poly{8,16,64,128}_t types.
>
> gcc/testsuite/ChangeLog:
>
> 2016-01-19  Roger Ferrer Ibáñez
>
>         PR target/67896
>         * gcc.target/aarch64/simd/pr67896.C: New.
>
> --
> Roger Ferrer Ibáñez

> From 72c065f6a3f9d168baf357de1b567faa6042c03b Mon Sep 17 00:00:00 2001
> From: Roger Ferrer Ibanez
> Date: Wed, 20 Jan 2016 21:11:42 +0100
> Subject: [PATCH] Do not set structural equality on polynomial types
>
> ---
>  gcc/config/aarch64/aarch64-builtins.c           | 10 ++++++----
>  gcc/testsuite/gcc.target/aarch64/simd/pr67896.C |  7 +++++++
>  2 files changed, 13 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
> index bd7a8dd..40272ed 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -610,14 +610,16 @@ aarch64_init_simd_builtin_types (void)
>        enum machine_mode mode = aarch64_simd_types[i].mode;
>
>        if (aarch64_simd_types[i].itype == NULL)
> -        aarch64_simd_types[i].itype =
> -          build_distinct_type_copy
> -            (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
> +        {
> +          aarch64_simd_types[i].itype
> +            = build_distinct_type_copy
> +              (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
> +          SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
> +        }
>
>        tdecl = add_builtin_type (aarch64_simd_types[i].name,
>                                  aarch64_simd_types[i].itype);
>        TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
> -      SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
>      }
>
>  #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C b/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
> new file mode 100644
> index 0000000..1f916e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
> @@ -0,0 +1,7 @@
> +typedef __Poly8_t A;
> +typedef __Poly16_t A; /* { dg-error "conflicting declaration" } */
> +typedef __Poly64_t A; /* { dg-error "conflicting declaration" } */
> +typedef __Poly128_t A; /* { dg-error "conflicting declaration" } */
> +
> +typedef __Poly8x8_t B;
> +typedef __Poly16x8_t B; /* { dg-error "conflicting declaration" } */
> --
> 2.1.4
>
Re: Speedup configure and build with system.h
Hello!

>         * system.h (string, algorithm): Include only conditionally.
>         (new): Include always under C++.
>         * bb-reorder.c (toplevel): Define USES_ALGORITHM.
>         * final.c (toplevel): Ditto.
>         * ipa-chkp.c (toplevel): Define USES_STRING.
>         * genconditions.c (write_header): Make gencondmd.c define
>         USES_STRING.
>         * mem-stats.h (mem_usage::print_dash_line): Don't use std::string.
>
>         * config/aarch64/aarch64.c (toplevel): Define USES_STRING.
>         * common/config/aarch64/aarch64-common.c (toplevel): Ditto.

This patch caused a bootstrap failure with a non-C++11 bootstrap compiler
[1], e.g. on CentOS 5.11.

The problem is with std::swap, which was defined in the <algorithm> header
until C++11 [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69464
[2] http://en.cppreference.com/w/cpp/algorithm/swap

Uros.
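The header dependency behind this failure can be shown with a small stand-alone example. This is a sketch, not the fix that went into system.h; swap_values is a hypothetical helper name. Including both headers makes std::swap visible whether the compiler follows C++98/03 (where it is declared in <algorithm>) or C++11 and later (where it is declared in <utility>).

```cpp
#include <algorithm>  // declares std::swap in C++98/03
#include <utility>    // declares std::swap in C++11 and later

// Hypothetical helper: compiles under either language version because
// both candidate headers for std::swap are included above.
template <typename T>
void swap_values(T &a, T &b) {
  std::swap(a, b);
}
```

Code that includes only <utility> and calls std::swap is what breaks on a pre-C++11 bootstrap compiler once <algorithm> is no longer pulled in unconditionally.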
Re: Wonly-top-basic-asm
On 01/24/2016 11:23 PM, David Wohlferd wrote:
> +Wonly-top-basic-asm
> +C ObjC ObjC++ C++ Var(warn_only_top_basic_asm) Warning
> +Warn on unsafe uses of basic asm.

Maybe just -Wbasic-asm?

> +/* Warn on basic asm used inside of functions,
> + EXCEPT when in naked functions. Also allow asm(""). */

Two spaces after a sentence.

> +if (warn_only_top_basic_asm && (TREE_STRING_LENGTH (str) != 1) )

Unnecessary parens, and extra space before closing paren.

> + if (warn_only_top_basic_asm &&
> + (TREE_STRING_LENGTH (string) != 1))

Extra parens, and && goes first on the next line.

> + warning_at(asm_loc, OPT_Wonly_top_basic_asm,

Space before "(".

> + "asm statement in function does not use extended syntax");

Could break that into ".." "..." over two lines so as to keep indentation.

> -asm ("");
> +asm ("":::);

Is that necessary? As far as I can tell we're treating these equally.

> @@ -7487,6 +7490,8 @@ consecutive in the output, put them in a single
> multi-instruction @code{asm} statement. Note that GCC's optimizers can move
> @code{asm} statements relative to other code, including across jumps.
> +Using inputs and outputs with extended @code{asm} can help correctly position
> +your asm.

Not sure this is needed either. Sounds a bit like advertising :) In
general the doc changes seem much too verbose to me.

> +Extended @code{asm}'s @samp{%=} may help resolve this.

Same here. I think the block that recommends extended asm is good
enough. I think the next part could be shrunk significantly too.

> -Here is an example of basic @code{asm} for i386:
> +Basic @code{asm} statements within functions do not perform an implicit
> +"memory" clobber (@pxref{Clobbers}). Also, there is no implicit clobbering
> +of @emph{any} registers, so (other than "naked" functions which follow the

"other than in"? Also @code{naked} maybe. I'd place a note about
clobbering after the existing "To access C data, it is better to use
extended asm".

> +ABI rules) changed registers must be restored to their original value before
> +exiting the @code{asm}. While this behavior has not always been documented,
> +GCC has worked this way since at least v2.95.3. Also, lacking inputs and
> +outputs means that GCC's optimizers may have difficulties consistently
> +positioning the basic @code{asm} in the generated code.

The existing text already mentions ordering issues. Lose this block.

> +The concept of ``clobbering'' does not apply to basic @code{asm} statements
> +outside of functions (aka top-level asm).

Stating the obvious?

> +@strong{Warning!} This "clobber nothing" behavior may be different than how

Ok there is precedent for this, but it's spelt "@strong{Warning:}" in
all other cases. Still, I'd probably also shrink this paragraph and put
a note about lack of C semantics and possibly different behaviour from
other compilers near the top, where we say that extended asm is better
in most cases.

> +other compilers treat basic @code{asm}, since the C standards for the
> +@code{asm} statement provide no guidance regarding these semantics. As a
> +result, @code{asm} statements that work correctly on other compilers may not
> +work correctly with GCC (and vice versa), even though they both compile
> +without error. Also, there is discussion underway about changing GCC to
> +have basic @code{asm} clobber at least memory and perhaps some (or all)
> +registers. If implemented, this change may fix subtle problems with
> +existing @code{asm} statements. However it may break or slow down ones that
> +were working correctly.

How would such a change break anything? I'd also not mention discussion
underway, just say "Future versions of GCC may treat basic @code{asm} as
clobbering memory".

> +If your existing code needs clobbers that GCC's basic @code{asm} is not
> +providing, or if you want to 'future-proof' your asm against possible
> +changes to basic @code{asm}'s semantics, use extended @code{asm}.

Recommending it too often. Lose this.

> +Extended @code{asm} allows you to specify what (if anything) needs to be
> +clobbered for your code to work correctly.

And again.

> You can use @ref{Warning
> +Options, @option{-Wonly-top-basic-asm}} to locate basic @code{asm}

I think just plain @option is usual.

> +statements that may need changes, and refer to
> +@uref{https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, How to convert
> +from basic asm to extended asm} for information about how to perform the
> +conversion.

A link is probably good if we have such a page.

> +Here is an example of top-level basic @code{asm} for i386 that defines an
> +asm macro. That macro is then invoked from within a function using
> +extended @code{asm}:

The updated example also looks good.

I think I'm fine with the concept but I'd like to see an updated patch
with better docs.


Bernd
[PATCH] rs6000: Put back the 's' output modifier
It turns out the 's' output modifier is used in some glibc math code,
and is in an installed header even. So let's put it back; it is much
less of a burden to support it a bit longer than to deal with the
fallout. (It is also being fixed for glibc.)

Tested on powerpc64-linux-gcc; is this okay for mainline?


Segher


2016-01-26  Segher Boessenkool

        * config/rs6000/rs6000.c (print_operand): Rollback 's' removal.

---
 gcc/config/rs6000/rs6000.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ba4aeab..2a3a441 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19949,6 +19949,14 @@ print_operand (FILE *file, rtx x, int code)
         fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
       return;

+    case 's':
+      /* Low 5 bits of 32 - value */
+      if (! INT_P (x))
+        output_operand_lossage ("invalid %%s value");
+      else
+        fprintf (file, HOST_WIDE_INT_PRINT_DEC, (32 - INTVAL (x)) & 31);
+      return;
+
     case 't':
       /* Like 'J' but get to the OVERFLOW/UNORDERED bit.  */
       gcc_assert (REG_P (x) && GET_MODE (x) == CCmode);
--
1.9.3
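The value the restored 's' case prints, the low 5 bits of 32 minus the operand, can be checked in isolation. The sketch below is not rs6000.c code: s_modifier_value, rotl32 and rotr32 are hypothetical helper names. The rotate identity is shown as one plausible reason such a value is useful (turning a right-rotate count into the matching left-rotate count on 32-bit values); the thread does not say what the glibc code uses it for.

```cpp
#include <cstdint>

// Value printed for the 's' modifier: the low 5 bits of (32 - value).
long long s_modifier_value(long long value) {
  return (32 - value) & 31;
}

// 32-bit rotates. Both shift counts are masked to 0..31, so a count of 0
// never turns into an undefined shift by 32.
std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  return (x << (n & 31)) | (x >> ((32 - n) & 31));
}

std::uint32_t rotr32(std::uint32_t x, unsigned n) {
  return (x >> (n & 31)) | (x << ((32 - n) & 31));
}
```

For a count n in 0..31, rotating left by s_modifier_value(n) gives the same result as rotating right by n; for instance an operand of 8 maps to 24.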