Re: [PATCH, rs6000] Fix PR target/80210: ICE in extract_insn
On 8/18/17 6:27 PM, Segher Boessenkool wrote:
> On Thu, Aug 17, 2017 at 07:02:14PM -0500, Peter Bergner wrote:
>> This is also broken in GCC 7, GCC 6 and GCC 5.  Ok for those after this
>> has been on trunk for a little while and assuming testing passes?
>
> Okay for trunk and all branches.  Thanks!

Ok, committed to trunk, and the backport bootstraps/regtesting are running.
If they're clean, I'll commit this sometime next week, after the trunk
version has baked a little bit.  Thanks!

Peter
[PATCH 1/2] c-family/c/c++: pass optional vec to c-format.c
This patch passes along the vec of argument locations at a callsite
from the C frontend to check_function_arguments and from there to
c-format.c, so that we can underline the pertinent argument to
mismatched format codes even for tree codes like decls and constants
that lack a location_t for their usage sites.

This takes e.g.:

  printf("hello %i %i %i ", foo, bar, baz);
                ~^
                %s

to:

  printf("hello %i %i %i ", foo, bar, baz);
                ~^~~~
                %s

which is useful for cases where there's more than one variadic
argument.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
	* c-common.c (check_function_arguments): Add "arglocs" param; pass
	it to check_function_format.
	* c-common.h (check_function_arguments): Add vec<location_t> *
	param.
	(check_function_format): Likewise.
	* c-format.c (struct format_check_context): Add field "arglocs".
	(check_function_format): Add param "arglocs"; pass it to
	check_format_info.
	(check_format_info): Add param "arglocs"; use it to initialize
	new field of format_ctx.
	(check_format_arg): Pass format_ctx->arglocs to new param of
	check_format_info_main.
	(class argument_parser): New field "arglocs".
	(argument_parser::argument_parser): Add "arglocs_" param and use
	it to initialize new field.
	(argument_parser::check_argument_type): Pass new arglocs field to
	check_format_types.
	(check_format_info_main): Add param "arglocs", and use it when
	constructing arg_parser.
	(check_format_types): Add param "arglocs"; use it if non-NULL
	when !EXPR_HAS_LOCATION (cur_param) to get at location
	information.

gcc/c/ChangeLog:
	* c-typeck.c (build_function_call_vec): Pass arg_loc to call
	to check_function_arguments.

gcc/cp/ChangeLog:
	* call.c (build_over_call): Pass NULL for new parameter to
	check_function_arguments.
	* typeck.c (cp_build_function_call_vec): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/format/diagnostic-ranges.c: Update expected results
	to show underlining of all pertinent params.
	* gcc.dg/format/pr72858.c: Likewise.
---
 gcc/c-family/c-common.c                         |   4 +-
 gcc/c-family/c-common.h                         |   4 +-
 gcc/c-family/c-format.c                         |  52
 gcc/c/c-typeck.c                                |   2 +-
 gcc/cp/call.c                                   |   2 +-
 gcc/cp/typeck.c                                 |   2 +-
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c |  41 --
 gcc/testsuite/gcc.dg/format/pr72858.c           | 102 +---
 8 files changed, 98 insertions(+), 111 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4dc3b33..156c89d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5539,7 +5539,7 @@ attribute_fallthrough_p (tree attr)
    diagnostics.  Return true if -Wnonnull warning has been diagnosed.  */

 bool
 check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
-			  int nargs, tree *argarray)
+			  int nargs, tree *argarray, vec<location_t> *arglocs)
 {
   bool warned_p = false;

@@ -5553,7 +5553,7 @@ check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
   /* Check for errors in format strings.  */

   if (warn_format || warn_suggest_attribute_format)
-    check_function_format (TYPE_ATTRIBUTES (fntype), nargs, argarray);
+    check_function_format (TYPE_ATTRIBUTES (fntype), nargs, argarray, arglocs);

   if (warn_format)
     check_function_sentinel (fntype, nargs, argarray);
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 980d396..8e36768 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -808,7 +808,7 @@ extern tree fname_decl (location_t, unsigned, tree);
 extern int check_user_alignment (const_tree, bool);
 extern bool check_function_arguments (location_t loc, const_tree, const_tree,
-				      int, tree *);
+				      int, tree *, vec<location_t> *);
 extern void check_function_arguments_recurse (void (*) (void *, tree,
							unsigned HOST_WIDE_INT),
@@ -816,7 +816,7 @@ extern void check_function_arguments_recurse (void (*)
						      unsigned HOST_WIDE_INT);
 extern bool check_builtin_function_arguments (location_t, vec<location_t>, tree,
					      int, tree *);
-extern void check_function_format (tree, int, tree *);
+extern void check_function_format (tree, int, tree *, vec<location_t> *);
 extern bool attribute_fallthrough_p (tree);
 extern tree handle_format_attribute (tree *, tree, t
[PATCH 2/2] C: use full locations within c_parser_expr_list's vec
The previous patch uncovered a bug in how c_parser_expr_list builds the
vec<location_t>: it was only using the location of the first token
within each assignment-expression in the expr-list.

This shows up in e.g. this -Wformat warning, where only part of the
2nd param is underlined:

  printf("hello %i", (long)0);
                ~^   ~
                %li

This patch fixes c_parser_expr_list to use the full range of each
assignment-expression in the list for the vec<location_t>, so that for
the above we print:

  printf("hello %i", (long)0);
                ~^   ~~~
                %li

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
	* c-parser.c (c_parser_expr_list): Use c_expr::get_location ()
	rather than peeking the location of the first token.
	* c-tree.h (c_expr::get_location): New method.

gcc/testsuite/ChangeLog:
	* gcc.dg/format/diagnostic-ranges.c (test_mismatching_types):
	Update expected result to show all of "(long)0" being underlined.
	* gcc.dg/plugin/diagnostic-test-string-literals-1.c
	(test_multitoken_macro): Update expected underlining.
---
 gcc/c/c-parser.c                                      | 11 +--
 gcc/c/c-tree.h                                        |  8
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c       |  2 +-
 .../gcc.dg/plugin/diagnostic-test-string-literals-1.c |  2 +-
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1402ba6..dfc75e0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -8933,7 +8933,6 @@ c_parser_expr_list (c_parser *parser, bool convert_p, bool fold_p,
   vec<tree, va_gc> *ret;
   vec<tree, va_gc> *orig_types;
   struct c_expr expr;
-  location_t loc = c_parser_peek_token (parser)->location;
   unsigned int idx = 0;

   ret = make_tree_vector ();
@@ -8946,14 +8945,14 @@ c_parser_expr_list (c_parser *parser, bool convert_p, bool fold_p,
     c_parser_check_literal_zero (parser, literal_zero_mask, 0);
   expr = c_parser_expr_no_commas (parser, NULL);
   if (convert_p)
-    expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+    expr = convert_lvalue_to_rvalue (expr.get_location (), expr, true, true);
   if (fold_p)
     expr.value = c_fully_fold (expr.value, false, NULL);
   ret->quick_push (expr.value);
   if (orig_types)
     orig_types->quick_push (expr.original_type);
   if (locations)
-    locations->safe_push (loc);
+    locations->safe_push (expr.get_location ());
   if (sizeof_arg != NULL
       && expr.original_code == SIZEOF_EXPR)
     {
@@ -8963,19 +8962,19 @@ c_parser_expr_list (c_parser *parser, bool convert_p, bool fold_p,
   while (c_parser_next_token_is (parser, CPP_COMMA))
     {
       c_parser_consume_token (parser);
-      loc = c_parser_peek_token (parser)->location;
       if (literal_zero_mask)
	c_parser_check_literal_zero (parser, literal_zero_mask, idx + 1);
       expr = c_parser_expr_no_commas (parser, NULL);
       if (convert_p)
-	expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+	expr = convert_lvalue_to_rvalue (expr.get_location (), expr, true,
+					 true);
       if (fold_p)
	expr.value = c_fully_fold (expr.value, false, NULL);
       vec_safe_push (ret, expr.value);
       if (orig_types)
	vec_safe_push (orig_types, expr.original_type);
       if (locations)
-	locations->safe_push (loc);
+	locations->safe_push (expr.get_location ());
       if (++idx < 3
	  && sizeof_arg != NULL
	  && expr.original_code == SIZEOF_EXPR)
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 92bcc70..5182cc5 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -147,6 +147,14 @@ struct c_expr
   location_t get_start () const { return src_range.m_start; }
   location_t get_finish () const { return src_range.m_finish; }

+  location_t get_location () const
+  {
+    if (CAN_HAVE_LOCATION_P (value))
+      return EXPR_LOCATION (value);
+    else
+      return make_location (get_start (), get_start (), get_finish ());
+  }
+
   /* Set the value to error_mark_node whilst ensuring that src_range
      is initialized.  */
   void set_error ()
diff --git a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
index bc6f665..e56e159 100644
--- a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
+++ b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
@@ -25,7 +25,7 @@ void test_mismatching_types (const char *msg)
   printf("hello %i", (long)0);  /* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'long int' " } */
   /* { dg-begin-multiline-output "" }
    printf("hello %i", (long)0);
-                 ~^   ~
+                 ~^   ~~~
                  %li
      { dg-end-multiline-output "" } */
 }
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c b/gcc/testsuite/gcc.dg/plugin/diagno
Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target
On Fri, Aug 18, 2017 at 04:49:44PM -0500, Steven Munroe wrote:
> On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote:
> > On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> > > This is part 3/3 for contributing PPC64LE support for X86 SSE
> > > intrinsics.  This patch includes testsuite/gcc.target tests for the
> > > intrinsics included by xmmintrin.h.
> >
> > > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \
> >
> > Should that be UNION_TYPE?
>
> It is spelled 'UINON_TYPE' in
> ./gcc/testsuite/gcc.target/i386/m128-check.h, which is the source for
> the powerpc version.
>
> There is no obvious reason why it could not be spelled UNION_TYPE.
> Unless there is some symbol collision further up the SSE/AVX stack.
>
> Bingo:
>
> avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME)
>
> I propose not to change this.

Heh.  Okay :-)

Segher
[committed] jit: fix segfault with autovectorization (PR tree-optimization/46805)
On Thu, 2017-08-17 at 11:51 -0400, David Malcolm wrote:
> On Thu, 2017-08-17 at 09:15 +1200, Michael Cree wrote:
> > On Wed, Aug 16, 2017 at 10:01:57AM -0400, David Malcolm wrote:
> > > On Wed, 2017-08-16 at 21:58 +1200, Michael Cree wrote:
> > > >
> > > > But I have hit a problem which I suspect is a bug in the gcc
> > > > optimiser.
> > > >
> > > > In the vein of your example above, but working on uint8_t pixel
> > > > data and adding saturation, the jit compiler segfaults in the
> > > > optimiser.  I provide below the gimple produced by the function
> > > > that causes the problem (I presume that is more useful than the
> > > > code calling the gcc_jit routines),
> > >
> > > There's actually a handy entrypoint for generating minimal
> > > reproducers for such crashes:
> > >   gcc_jit_context_dump_reproducer_to_file
> > >
> > > https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#gcc_jit_context_dump_reproducer_to_file
> > >
> > > Can you add a call to that to your code (after the context is
> > > fully populated), and see if the resulting .c file leads to the
> > > crash when run?  If so, can you post the .c file here please (or
> > > attach it to bugzilla), and hopefully I can then reproduce it at
> > > my end.
> >
> > Attached.
> >
> > Cheers
> > Michael.
>
> Thanks.  I'm able to reproduce the crash using that; am
> investigating.
>
> Dave

libgccjit ran into its own version of PR tree-optimization/46805 (seen
with the Go frontend); this patch fixes it in the same way.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
Takes jit.sum from 9759 passes to 9769.

Committed to trunk as r251192.

gcc/jit/ChangeLog:
	PR tree-optimization/46805
	* dummy-frontend.c (jit_langhook_type_for_mode): Handle vector
	types.

gcc/testsuite/ChangeLog:
	PR tree-optimization/46805
	* jit.dg/all-non-failing-tests.h: Add test-autovectorize.c.
	* jit.dg/test-autovectorize.c: New test case.
---
 gcc/jit/dummy-frontend.c                     |  11 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  10 +
 gcc/testsuite/jit.dg/test-autovectorize.c    | 375 +++
 3 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-autovectorize.c

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index d7d2172..b217290 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -165,6 +165,17 @@ jit_langhook_parse_file (void)
 static tree
 jit_langhook_type_for_mode (machine_mode mode, int unsignedp)
 {
+  /* Build any vector types here (see PR 46805).  */
+  if (VECTOR_MODE_P (mode))
+    {
+      tree inner;
+
+      inner = jit_langhook_type_for_mode (GET_MODE_INNER (mode), unsignedp);
+      if (inner != NULL_TREE)
+	return build_vector_type_for_mode (inner, mode);
+      return NULL_TREE;
+    }
+
   if (mode == TYPE_MODE (float_type_node))
     return float_type_node;

diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index 4af704a..acfcc40 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -50,6 +50,13 @@
 #undef create_code
 #undef verify_code

+/* test-autovectorize.c */
+#define create_code create_code_autovectorize
+#define verify_code verify_code_autovectorize
+#include "test-autovectorize.c"
+#undef create_code
+#undef verify_code
+
 /* test-calling-external-function.c */
 #define create_code create_code_calling_external_function
 #define verify_code verify_code_calling_external_function
@@ -267,6 +274,9 @@ const struct testcase testcases[] = {
   {"arrays",
    create_code_arrays,
    verify_code_arrays},
+  {"autovectorize",
+   create_code_autovectorize,
+   verify_code_autovectorize},
   {"calling_external_function",
    create_code_calling_external_function,
    verify_code_calling_external_function},
diff --git a/gcc/testsuite/jit.dg/test-autovectorize.c b/gcc/testsuite/jit.dg/test-autovectorize.c
new file mode 100644
index 000..9cfd0b2
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-autovectorize.c
@@ -0,0 +1,375 @@
+#include <libgccjit.h>
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void * user_data)
+{
+  gcc_jit_type *type_void = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID);
+  gcc_jit_type *type_int = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *type_unsigned_char = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_UNSIGNED_CHAR);
+  gcc_jit_type *type_void_ptr =
+    gcc_jit_type_get_pointer (type_void);
+  gcc_jit_type *type_void_ptr_ptr =
+    gcc_jit_type_get_pointer (type_void_ptr);
+  gcc_jit_type *type_unsigned_char__ =
+    gcc_jit_type_get_pointer (type_unsigned_char);
+  gcc_jit_field *field_x =
+    gcc_jit_context_new_field (ctxt,
+			       NULL, /* gcc_jit_location *loc */
+			       type_int, /* gcc_jit_type *type, */
+
[committed] jit: make simpler reproducers
The C reproducers generated by gcc_jit_context_dump_reproducer_to_file
contain numerous pointer values (from %p) to ensure uniqueness of the
identifiers, but this makes them less readable than they could be.

This patch updates reproducer::make_identifier so that the pointer is
only added if it's necessary for uniqueness.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r251191.

gcc/jit/ChangeLog:
	* jit-recording.c (class gcc::jit::reproducer): Rename field
	"m_identifiers" to "m_map_memento_to_identifier".  Add field
	"m_set_identifiers" and struct hash_traits for it.
	(gcc::jit::reproducer::reproducer): Update for above.
	(convert_to_identifier): New function.
	(gcc::jit::reproducer::ensure_identifier_is_unique): New method.
	(gcc::jit::reproducer::make_identifier): Avoid appending the %p
	unless necessary for uniqueness.  Update for field renaming.
	(gcc::jit::reproducer::get_identifier): Update for field renaming.
---
 gcc/jit/jit-recording.c | 62 +++--
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index ea4ebb1..0e7f46e0 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -236,7 +236,16 @@ class reproducer : public dump
     GNU_PRINTF(2, 3);

  private:
-  hash_map<recording::memento *, const char *> m_identifiers;
+  const char * ensure_identifier_is_unique (const char *candidate, void *ptr);
+
+ private:
+  hash_map<recording::memento *, const char *> m_map_memento_to_identifier;
+
+  struct hash_traits : public string_hash
+  {
+    static void remove (const char *) {}
+  };
+  hash_set<const char *, hash_traits> m_set_identifiers;

   allocator m_allocator;
 };
@@ -245,7 +254,8 @@ class reproducer : public dump
 reproducer::reproducer (recording::context &ctxt,
			const char *filename) :
   dump (ctxt, filename, 0),
-  m_identifiers (),
+  m_map_memento_to_identifier (),
+  m_set_identifiers (),
   m_allocator ()
 {
 }
@@ -286,6 +296,35 @@ reproducer::write_args (const vec<recording::context *> &contexts)
     }
 }

+/* Ensure that STR is a valid C identifier by overwriting
+   any invalid chars in-place with underscores.
+
+   This doesn't special-case the first character.  */
+
+static void
+convert_to_identifier (char *str)
+{
+  for (char *p = str; *p; p++)
+    if (!ISALNUM (*p))
+      *p = '_';
+}
+
+/* Given CANDIDATE, a possible C identifier for use in a reproducer,
+   ensure that it is unique within the generated source file by
+   appending PTR to it if necessary.  Return the resulting string.
+
+   The reproducer will eventually clean up the buffer in its dtor.  */
+
+const char *
+reproducer::ensure_identifier_is_unique (const char *candidate, void *ptr)
+{
+  if (m_set_identifiers.contains (candidate))
+    candidate = m_allocator.xstrdup_printf ("%s_%p", candidate, ptr);
+  gcc_assert (!m_set_identifiers.contains (candidate));
+  m_set_identifiers.add (candidate);
+  return candidate;
+}
+
 /* Generate a C identifier for the given memento, associating the generated
    buffer with the memento (for future calls to get_identifier et al).

@@ -293,21 +332,20 @@ reproducer::write_args (const vec<recording::context *> &contexts)
 const char *
 reproducer::make_identifier (recording::memento *m, const char *prefix)
 {
-  char *result;
+  const char *result;
   if (strlen (m->get_debug_string ()) < 100)
     {
-      result = m_allocator.xstrdup_printf ("%s_%s_%p",
-					   prefix,
-					   m->get_debug_string (),
-					   (void *) m);
-      for (char *p = result; *p; p++)
-	if (!ISALNUM (*p))
-	  *p = '_';
+      char *buf = m_allocator.xstrdup_printf ("%s_%s",
+					      prefix,
+					      m->get_debug_string ());
+      convert_to_identifier (buf);
+      result = buf;
     }
   else
     result = m_allocator.xstrdup_printf ("%s_%p",
					 prefix, (void *) m);
-  m_identifiers.put (m, result);
+  result = ensure_identifier_is_unique (result, m);
+  m_map_memento_to_identifier.put (m, result);
   return result;
 }

@@ -350,7 +388,7 @@ reproducer::get_identifier (recording::memento *m)
       if (!loc->created_by_user ())
	return "NULL";
-  const char **slot = m_identifiers.get (m);
+  const char **slot = m_map_memento_to_identifier.get (m);
   if (!slot)
     {
       get_context ().add_error (NULL,
-- 
1.8.5.3
Re: [PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target
On Thu, Aug 17, 2017 at 08:40:34PM -0500, Steven Munroe wrote:
> > > +/* Convert the lower SPFP value to a 32-bit integer according to the
> > > +   current rounding mode.  */
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_cvtss_si32 (__m128 __A)
> > > +{
> > > +  __m64 res = 0;
> > > +#ifdef _ARCH_PWR8
> > > +  __m128 vtmp;
> > > +  __asm__(
> > > +      "xxsldwi %x1,%x2,%x2,3;\n"
> > > +      "xscvspdp %x1,%x1;\n"
> > > +      "fctiw %1,%1;\n"
> > > +      "mfvsrd %0,%x1;\n"
> > > +      : "=r" (res),
> > > +	"=&wi" (vtmp)
> > > +      : "wa" (__A)
> > > +      : );
> > > +#endif
> > > +  return (res);
> > > +}
> >
> > Maybe it could do something better than return the wrong answer for
> > non-p8?
>
> Ok this gets tricky.  Before _ARCH_PWR8 the vector to scalar transfer
> would go through storage.  But that is not the worst of it.

Float to int conversion goes through storage on older systems, too.

> The semantics of cvtss require rint or llrint.  But __builtin_rint will
> generate a call to libm unless we assert -ffast-math.

Yeah, we should fix that some day.  If we can.

> And we don't have
> builtins to generate fctiw/fctid directly.

Yup.  Well, __builtin_rint*, but that currently calls out to libm.

> So I will add the #else using __builtin_rint if that libm dependency is
> ok (this will pop in the DG test for older machines.

Another option is to not support this intrinsic for < POWER8.  I don't
have a big (or well-informed) opinion on which is best; but I doubt
always returning 0 is the best we can do ;-)

Segher
Re: [PATCH, rs6000] Fix PR target/80210: ICE in extract_insn
On Thu, Aug 17, 2017 at 07:02:14PM -0500, Peter Bergner wrote:
> PR target/80210 exposes a problem in rs6000_set_current_function(),
> where it fails to correctly clear the rs6000_previous_fndecl cache.
> With the test case, we notice that rs6000_previous_fndecl is set (when
> it shouldn't be) and we end up restoring options from it.  In this
> case, we end up disabling HW sqrt (because of the pragma) when we
> earlier decided we could generate HW sqrts, which leads to an ICE.
>
> The current code in rs6000_set_current_function() is kind of a rat's
> nest, so I threw it out and rewrote it, modeling it after how S390 and
> i386 handle it, which correctly clears the *_previous_fndecl caches.
> It also makes the code much more readable in my view.

Yeah, it almost looks easy this way.  Treacherous :-)

> This passed bootstrap and regtesting with no regressions and it fixes
> the ICE.  Ok for trunk?
>
> This is also broken in GCC 7, GCC 6 and GCC 5.  Ok for those after this
> has been on trunk for a little while and assuming testing passes?

Okay for trunk and all branches.  Thanks!

Segher

> gcc/
> 	* config/rs6000/rs6000.c (rs6000_activate_target_options): New
> 	function.
> 	(rs6000_set_current_function): Rewrite function to use it.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/pr80210.c: New test.
Re: [PATCH], Delete some PowerPC debug switches
Hi!

On Thu, Aug 17, 2017 at 07:48:54PM -0400, Michael Meissner wrote:
> This patch deletes some of the debug switches that I've added over the
> years, and now don't use any more.
>
> I did bootstrap builds and make check runs on a little endian power8
> system.  There were no regressions.  Can I check this into the trunk?

Maybe some (or all) of them can be deleted, not leaving a stub?  The
options were undocumented after all, no one should have been using them.
Leaving stubs doesn't hurt much of course, just a bit of clutter.

> +; This option existed in the past, but it hadn't been used in a while
>  mallow-df-permute
> -Target Undocumented Var(TARGET_ALLOW_DF_PERMUTE) Save
> -; Allow permutation of DF/DI vectors
> +Target RejectNegative Undocumented Ignore

That comment isn't very enlightening; just use the same text as for the
others?

The patch is okay for trunk with or without that change.  Thanks!

Segher
Re: [PATCH] Fix fallout from VRP strict-overflow changes
On Aug 17 2017, Richard Biener wrote:
> I was notified I broke proper handling of undefined overflow in
> multiplicative ops handling.  The following resurrects previous
> behavior (and adds a testcase).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

This breaks gfortran.dg/alloc_comp_auto_array_2.f90 on aarch64 with
-mabi=ilp32 (only for -O3):

FAIL: gfortran.dg/alloc_comp_auto_array_2.f90   -O3 -g  (test for excess errors)
Excess errors:
/opt/gcc/gcc-20170818/gcc/testsuite/gfortran.dg/alloc_comp_auto_array_2.f90:33:0: Warning: '__builtin_memcpy' specified size between 2147483648 and 4294967295 exceeds maximum object size 2147483647 [-Wstringop-overflow=]

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target
On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote:
> On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> > This is part 3/3 for contributing PPC64LE support for X86 SSE
> > intrinsics.  This patch includes testsuite/gcc.target tests for the
> > intrinsics included by xmmintrin.h.
>
> > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \
>
> Should that be UNION_TYPE?

It is spelled 'UINON_TYPE' in
./gcc/testsuite/gcc.target/i386/m128-check.h, which is the source for
the powerpc version.

There is no obvious reason why it could not be spelled UNION_TYPE.
Unless there is some symbol collision further up the SSE/AVX stack.

Bingo:

avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME)

I propose not to change this.
Re: [PATCH, rs6000] testcase coverage for vec_perm built-ins
Hi Will,

On Thu, Aug 17, 2017 at 09:19:23AM -0500, Will Schmidt wrote:
> Add some testcase coverage for the vector permute intrinsics.
>
> Tested across power platforms.  OK for trunk?

> 	* gcc.target/powerpc/fold-vec-perm-char.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-double.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-float.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-int.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-longlong.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-pixel.c: New.
> 	* gcc.target/powerpc/fold-vec-perm-short.c: New.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-double.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with
> +   double inputs produce the right results.  */

That suggests it is a run test, but it's not.  s/results/code/ maybe?
(Same in other tests.)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-float.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with float
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-maltivec -O2" } */

vsx vs. altivec again.  You probably just need to add a comment about
what this is for, or you can use -mvsx instead.

> new file mode 100644
> index 000..9f5c786
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-pixel.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with pixel
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mvsx -O2" } */

Why vsx for pixel?  It's an altivec thing.

Segher
Re: [PATCH, rs6000] testcase coverage for vec_ld built-ins
On Thu, Aug 17, 2017 at 09:19:17AM -0500, Will Schmidt wrote:
> Add testcase coverage for the vec_ld intrinsic builtins.
>
> Tested across power platforms (p6 and newer).  OK for trunk?

> 	* gcc.target/powerpc/fold-vec-ld-char.c: New.
> 	* gcc.target/powerpc/fold-vec-ld-double.c: New.
> 	* gcc.target/powerpc/fold-vec-ld-float.c: New.
> 	* gcc.target/powerpc/fold-vec-ld-int.c: New.
> 	* gcc.target/powerpc/fold-vec-ld-longlong.c: New.
> 	* gcc.target/powerpc/fold-vec-ld-short.c: New.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-float.c
> @@ -0,0 +1,23 @@
> +/* Verify that overloaded built-ins for vec_ld with float
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-maltivec -O2" } */

This is vsx_ok but you're only using -maltivec?  Should it use
altivec_ok instead?

> @@ -0,0 +1,28 @@
> +/* Verify that overloaded built-ins for vec_ld* with long long
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mvsx -mpower8-vector -O2" } */

-mvsx is redundant with -mpower8-vector; remove -mvsx?

Okay for trunk with those two things taken care of.  Thanks,

Segher
Re: [PATCH, rs6000] Add testcase coverage for vec_sums built-in
On Thu, Aug 17, 2017 at 09:18:54AM -0500, Will Schmidt wrote:
> Add some testcase coverage for the vec_sums() built-in.
>
> Tested across power platforms (p6 and newer).  OK for trunk?

Okay, thanks!

Segher

> 	* gcc.target/powerpc/fold-vec-sums-int.c: New.
Re: [PATCH], Enable -mfloat128 by default on PowerPC VSX systems
On Wed, Aug 16, 2017 at 06:59:50PM -0400, Michael Meissner wrote:
> This patch enables -mfloat128 to be the default on PowerPC Linux VSX
> systems.
>
> This patch depends on the libquadmath/81848 patch being approved and
> installed:
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00977.html

That patch is still waiting, but I'll review this one already.

> I've checked this on a big endian power7 system (both 32-bit and
> 64-bit) and a little endian power8 system.

It may be good to test it on a system without VSX as well, if you can?

> +The default for @option{-mfloat128} is enabled on PowerPC Linux
> +systems using the VSX instruction set, and disabled on other systems.
> +
> +The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7},
> +@option{-mcpu=power8}) must be enabled to use the IEEE 128-bit
> +floating point support.  The IEEE 128-bit floating point support only
> +works on PowerPC Linux systems.

Maybe swap these two?  More logical that way I think.

> +The default for @option{-mfloat128-hardware} is enabled on PowerPC
> +Linux systems using the ISA 3.0 instruction set, and disabled on other
> +systems.

You cannot enable it without ISA 3.0; it looks like it does not say that?

The patch looks fine (maybe improve the doc a bit), it's approved for
trunk.  Thanks!

Segher
libgo patch commited: Don't use little-endian code to dump big-endian PPC regs
According to GCC PR 81893, the code that dumps the registers for PPC
only works for little-endian.  This patch fixes it to only be used in
that case.  Bootstrapped on x86_64-pc-linux-gnu, for what that's worth.
Committed to mainline.

Ian

Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE	(revision 251182)
+++ gcc/go/gofrontend/MERGE	(working copy)
@@ -1,4 +1,4 @@
-28e49825162465172ed706283628bf5cc1996260
+2c4a2bd826e58c8c8c51b9196c8d2c67abc4037e

 The first line of this file holds the git revision number of the
 last merge done from the gofrontend repository.
Index: libgo/runtime/go-signal.c
===================================================================
--- libgo/runtime/go-signal.c	(revision 251127)
+++ libgo/runtime/go-signal.c	(working copy)
@@ -343,7 +343,7 @@ dumpregs(siginfo_t *info __attribute__((
 #endif
 #endif

-#ifdef __PPC__
+#if defined(__PPC__) && defined(__LITTLE_ENDIAN__)
 #ifdef __linux__
	{
		mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
@@ -359,6 +359,9 @@ dumpregs(siginfo_t *info __attribute__((
		runtime_printf("xer %X\n", m->regs->xer);
	}
 #endif
+#endif
+
+#ifdef __PPC__
 #ifdef _AIX
	{
		mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
Re: std::list optimizations
On 28/07/17 18:42 +0200, François Dumont wrote:

Hi

Completing execution of the testsuite revealed a bug.  So here is the
corrected version of this patch.

François

On 21/07/2017 19:14, François Dumont wrote:

Hi

Here is a proposal for 2 optimizations in the std::list implementation.

First, an optimization of the move constructor taking an allocator, for
always-equal allocators.  Compared to the version in my previous
std::list patch, I am now doing it at the std::list level rather than at
the _List_base level.  This way we won't instantiate the insert call and
we won't check for an empty list when the allocators always compare
equal.

2nd optimization: I replace the _S_distance method with the
std::distance algo, which benefits from the nice [begin(), end()) range
optimization when the cxx11 ABI is being used.

Note that I am proposing the 2 changes in 1 patch to save some review
time, but I can commit them separately.

Tested under x86_64 Linux normal mode.

	* include/bits/stl_list.h (_List_base<>::_S_distance): Remove.
	(_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
	(_List_base(_List_base&&, _Node_alloc_type&&)): Use latter.
	(_List_base(_Node_alloc_type&&)): New.
	(_List_base<>::_M_distance, _List_base<>::_M_node_count): Move...
	(list<>::_M_distance, list<>::_M_node_count): ...here.  Replace
	calls to _S_distance with calls to std::distance.
	(list(list&&, const allocator_type&, true_type)): New.
	(list(list&&, const allocator_type&, false_type)): New.
	(list(list&&, const allocator_type&)): Adapt to call the latter.

Ok to commit?
François diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h index cef94f7..eb71a5e 100644 --- a/libstdc++-v3/include/bits/stl_list.h +++ b/libstdc++-v3/include/bits/stl_list.h @@ -364,19 +364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 rebind<_List_node<_Tp> >::other _Node_alloc_type; typedef __gnu_cxx::__alloc_traits<_Node_alloc_type> _Node_alloc_traits; - static size_t - _S_distance(const __detail::_List_node_base* __first, - const __detail::_List_node_base* __last) - { - size_t __n = 0; - while (__first != __last) - { - __first = __first->_M_next; - ++__n; - } - return __n; - } - Removing this function will break code that expects it to exist in an explicit instantiation. The function could be left there, but unused. struct _List_impl : public _Node_alloc_type { @@ -393,6 +380,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 #if __cplusplus >= 201103L _List_impl(_List_impl&&) = default; + _List_impl(_List_impl&& __x, _Node_alloc_type&& __a) + : _Node_alloc_type(std::move(__a)), _M_node(std::move(__x._M_node)) + { } + This is fine. 
_List_impl(_Node_alloc_type&& __a) noexcept : _Node_alloc_type(std::move(__a)) { } @@ -409,28 +400,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 void _M_inc_size(size_t __n) { _M_impl._M_node._M_size += __n; } void _M_dec_size(size_t __n) { _M_impl._M_node._M_size -= __n; } - - size_t - _M_distance(const __detail::_List_node_base* __first, - const __detail::_List_node_base* __last) const - { return _S_distance(__first, __last); } - - // return the stored size - size_t _M_node_count() const { return _M_get_size(); } #else // dummy implementations used when the size is not stored size_t _M_get_size() const { return 0; } void _M_set_size(size_t) { } void _M_inc_size(size_t) { } void _M_dec_size(size_t) { } - size_t _M_distance(const void*, const void*) const { return 0; } - - // count the number of nodes - size_t _M_node_count() const - { - return _S_distance(_M_impl._M_node._M_next, - std::__addressof(_M_impl._M_node)); - } #endif Same problem again: these will no longer be defined by explicit instantiations. typename _Node_alloc_traits::pointer @@ -466,12 +441,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _List_base(_List_base&&) = default; _List_base(_List_base&& __x, _Node_alloc_type&& __a) + : _M_impl(std::move(__x._M_impl), std::move(__a)) + { } + + _List_base(_Node_alloc_type&& __a) : _M_impl(std::move(__a)) - { - if (__x._M_get_Node_allocator() == _M_get_Node_allocator()) - _M_move_nodes(std::move(__x)); - // else caller must move individual elements. - } + { } I like this change in principle, but it alters the behaviour of an existing constructor. Existing code might use the constructor and get broken by this. You can avoid that by leaving the existing constructor alone and adding two new ones for new code to use. Reordering the parameters will make the new one distinct from the old one: // Used when is_always_equal _List_base(_Node_alloc_type&& __a, _List_base&& __x)) : _M_impl(std::move(_
Re: [PATCH] detect incompatible aliases (PR c/81854)
On 08/18/2017 10:48 AM, Jonathan Wakely wrote: On 18/08/17 08:54 -0600, Martin Sebor wrote: On 08/18/2017 07:10 AM, Jonathan Wakely wrote: On 17/08/17 21:21 -0600, Martin Sebor wrote: Joseph, while looking into implementing your enhancement request pr81824 I noticed that GCC silently accepts incompatible alias declarations (pr81854), so as a sort of proof of concept for the former I enhanced the checking already done for other kinds of incompatibilities to also detect those mentioned in the latter bug. Attached is this patch, tested on x86_64-linux. Jonathan, the patch requires suppressing the warning in libstdc++ compatibility symbol definitions in compatibility.cc. I couldn't find a way to do it without the suppression but I'd be happy to try again if you have an idea for how. Doing it that way is fine, but ... diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..5f56b9e 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv _ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv */ +// Disable warning about declaring aliases between functions with +// incompatible types. +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wattributes" + Could this be moved closer to the point where it's needed? It's not needed until after line 361, right? Sure. The other possibility that I forgot to mention is to declare the alias without a prototype, which in C++ looks like this: void foo (...); The patch would then look like this. Do you have a preference between these two approaches? If this doesn't change the generated code but avoids the warnings, then I think I prefer this, i.e. fix the code, not just suppress the warnings. It's the same as calling a function without a prototype in C. 
I rebuilt libstdc++ with this change and reran the test suite with no unexpected failures, so I'll go ahead and commit this change instead. To be clear, though, this is just a suppression mechanism not unlike a cast. The ideal solution would be to declare the aliases to have the right type, e.g., using __typeof__ or decltype. I just couldn't find a way to make it work with the macros. Martin diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..b49a5ca 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION #define _GLIBCXX_3_4_SYMVER(XXname, name) \ extern "C" void \ - _X##name() \ + _X##name(...) \ __attribute__ ((alias(#XXname))); \ asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4"); #define _GLIBCXX_3_4_5_SYMVER(XXname, name) \ extern "C" void \ - _Y##name() \ + _Y##name(...) \ __attribute__ ((alias(#XXname))); \ asm (".symver " "_Y" #name "," #name "@@GLIBCXX_3.4.5");
[PATCH/AARCH64] Generate FRINTZ for (double)(long) under -ffast-math on aarch64
Like https://gcc.gnu.org/ml/gcc-patches/2010-09/msg00060.html for PowerPC, we should do something similar for aarch64. This pattern does show up in SPEC CPU 2006 in astar but I did not look into performance improvement of it though. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * config/aarch64/aarch64.md (*frintz): New pattern. testsuite/ChangeLog: * testsuite/gcc.target/aarch64/floattointtofloat-1.c: New testcase. commit 9cef5e196729df5a197b81b72192d687683a057a Author: Andrew Pinski Date: Thu Aug 17 22:31:15 2017 -0700 Fix 19869: (double)(long)double_var could be better optimized Use FRINTZ to optimize this. Signed-off-by: Andrew Pinski diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5e6b027..1439bd0 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4645,6 +4645,16 @@ [(set_attr "type" "f_cvti2f")] ) +(define_insn "*frintz" + [(set (match_operand:GPF 0 "register_operand" "=w") +(float:GPF (fix:GPI (match_operand:GPF 1 "register_operand" "w"] + "TARGET_FLOAT && flag_unsafe_math_optimizations && !flag_trapping_math" + "frintz %0, %1" + [(set_attr "simd" "yes") + (set_attr "fp" "no") + (set_attr "type" "neon_int_to_fp_")] +) + ;; If we do not have ARMv8.2-A 16-bit floating point extensions, the ;; midend will arrange for an SImode conversion to HFmode to first go ;; through DFmode, then to HFmode. 
But first it will try converting diff --git a/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c b/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c new file mode 100644 index 000..c1209c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast" } */ +double dll(double a) +{ + return (long long)a; +} +double di(double a) +{ + return (int)a; +} +float fll(float a) +{ + return (long long)a; +} +float fi(float a) +{ + return (int)a; +} +/* { dg-final { scan-assembler-times "frintz" 4 } } */ +
[PATCH] Fix PR81488
Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81488 reports a problem with SLSR where too many memory resources are required to complete SLSR processing of conditional candidates. The code in question contains large trees of PHI dependencies that must be examined in order to find all candidates for replacement. Not only are the dependency chains deep, but many PHIs contain duplicate source operands arriving by different paths, and SLSR isn't currently smart enough to avoid tracing them more than once. This leads to exponential behavior and a bad ending. Even removing the exponential behavior is not sufficient to fix the problem. The dependencies are just too complex. So it is also necessary to put a limit on how much time we want to spend examining PHI nodes before giving up. I've arbitrarily chosen 16 as the maximum number of PHI nodes to visit for each candidate, which seems likely to be sufficient in most cases. A side benefit of removing the exponential behavior is better accuracy in making cost-model decisions. With tracing through the same PHI dependencies more than once, the insertion (+) and replacement (-) costs are overcounted. This should now be improved. The original bug went latent on the trunk after it was reported, but I was able to reproduce with an older revision and verify that the following patch fixes the problem. I've also bootstrapped and tested it on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill 2017-08-18 Bill Schmidt PR tree-optimization/81488 * gimple-ssa-strength-reduction (struct slsr_cand_d): Add visited and cached_basis fields. (MAX_SPREAD): New constant. (alloc_cand_and_find_basis): Initialize new fields. (clear_visited): New function. (create_phi_basis_1): Rename from create_phi_basis, set visited and cached_basis fields. (create_phi_basis): New wrapper function. (phi_add_costs_1): Rename from phi_add_costs, add spread parameter, set visited field, short-circuit when limits reached. 
(phi_add_costs): New wrapper function. (record_phi_increments_1): Rename from record_phi_increments, set visited field. (record_phi_increments): New wrapper function. (phi_incr_cost_1): Rename from phi_incr_cost, set visited field. (phi_incr_cost): New wrapper function. (all_phi_incrs_profitable_1): Rename from all_phi_incrs_profitable, set visited field. (all_phi_incrs_profitable): New wrapper function. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 251159) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -281,6 +281,14 @@ struct slsr_cand_d /* Savings that can be expected from eliminating dead code if this candidate is replaced. */ int dead_savings; + + /* For PHI candidates, use a visited flag to keep from processing the + same PHI twice from multiple paths. */ + int visited; + + /* We sometimes have to cache a phi basis with a phi candidate to + avoid processing it twice. Valid only if visited==1. */ + tree cached_basis; }; typedef struct slsr_cand_d slsr_cand, *slsr_cand_t; @@ -369,7 +377,11 @@ enum count_phis_status DONT_COUNT_PHIS = 0, COUNT_PHIS = 1 }; - + +/* Constrain how many PHI nodes we will visit for a conditional + candidate (depth and breadth). */ +const int MAX_SPREAD = 16; + /* Pointer map embodying a mapping from statements to candidates. */ static hash_map *stmt_cand_map; @@ -671,6 +683,8 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi c->sibling = 0; c->def_phi = kind == CAND_MULT ? find_phi_def (base) : 0; c->dead_savings = savings; + c->visited = 0; + c->cached_basis = NULL_TREE; cand_vec.safe_push (c); @@ -2317,19 +2331,33 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b return lhs; } -/* Given a candidate C with BASIS_NAME being the LHS of C's basis which - is hidden by the phi node FROM_PHI, create a new phi node in the same - block as FROM_PHI. 
The new phi is suitable for use as a basis by C, - with its phi arguments representing conditional adjustments to the - hidden basis along conditional incoming paths. Those adjustments are - made by creating add statements (and sometimes recursively creating - phis) along those incoming paths. LOC is the location to attach to - the introduced statements. KNOWN_STRIDE is true iff C's stride is a - constant. */ +/* Clear the visited field for a tree of PHI candidates. */ +static void +clear_visited (gphi *phi) +{ + unsigned i; + slsr_cand_t phi_cand = *stmt_cand_map->get (phi); + + if (phi_cand->visited) +{ + phi_cand->visited = 0; + + for (i = 0; i < gimple_phi_num_args (phi); i++) + { + tree arg = gimple_phi_arg_def (phi, i); + gimple *arg_def
[PATCH] Simplify allocator usage in unordered containers
While fixing PR 81891 I noticed that _Hashtable always creates a __value_alloc_type for constructing and destroying the elements, which is unnecessary. https://wg21.link/lwg2218 confirmed that. We can avoid constructing new allocators just for these calls and can call them directly on the stored allocator. * include/bits/hashtable_policy.h (_ReuseOrAllocNode): Remove __value_alloc_type and __value_alloc_traits typedefs. (_ReuseOrAllocNode::operator()): Call construct and destroy on the node allocator. (_Hashtable_alloc): Simplify __value_alloc_traits typedef. (_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&&...)): Call construct on the node allocator. (_Hashtable_alloc<_NodeAlloc>::_M_deallocate_node(__node_type*)): Call destroy on the node allocator. Tested powerpc64le-linux, committed to trunk. commit caa7283396a9f74a42c979697b0a53115cee Author: Jonathan Wakely Date: Fri Aug 18 19:12:00 2017 +0100 Simplify allocator usage in unordered containers * include/bits/hashtable_policy.h (_ReuseOrAllocNode): Remove __value_alloc_type and __value_alloc_traits typedefs. (_ReuseOrAllocNode::operator()): Call construct and destroy on the node allocator. (_Hashtable_alloc): Simplify __value_alloc_traits typedef. (_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&&...)): Call construct on the node allocator. (_Hashtable_alloc<_NodeAlloc>::_M_deallocate_node(__node_type*)): Call destroy on the node allocator. 
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index a3a31d1fb11..5f2d8776aaa 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -111,9 +111,6 @@ namespace __detail private: using __node_alloc_type = _NodeAlloc; using __hashtable_alloc = _Hashtable_alloc<__node_alloc_type>; - using __value_alloc_type = typename __hashtable_alloc::__value_alloc_type; - using __value_alloc_traits = - typename __hashtable_alloc::__value_alloc_traits; using __node_alloc_traits = typename __hashtable_alloc::__node_alloc_traits; using __node_type = typename __hashtable_alloc::__node_type; @@ -135,18 +132,17 @@ namespace __detail __node_type* __node = _M_nodes; _M_nodes = _M_nodes->_M_next(); __node->_M_nxt = nullptr; - __value_alloc_type __a(_M_h._M_node_allocator()); - __value_alloc_traits::destroy(__a, __node->_M_valptr()); + auto& __a = _M_h._M_node_allocator(); + __node_alloc_traits::destroy(__a, __node->_M_valptr()); __try { - __value_alloc_traits::construct(__a, __node->_M_valptr(), - std::forward<_Arg>(__arg)); + __node_alloc_traits::construct(__a, __node->_M_valptr(), +std::forward<_Arg>(__arg)); } __catch(...) { __node->~__node_type(); - __node_alloc_traits::deallocate(_M_h._M_node_allocator(), - __node, 1); + __node_alloc_traits::deallocate(__a, __node, 1); __throw_exception_again; } return __node; @@ -2000,10 +1996,8 @@ namespace __detail // Use __gnu_cxx to benefit from _S_always_equal and al. 
using __node_alloc_traits = __gnu_cxx::__alloc_traits<__node_alloc_type>; - using __value_type = typename __node_type::value_type; - using __value_alloc_type = - __alloc_rebind<__node_alloc_type, __value_type>; - using __value_alloc_traits = std::allocator_traits<__value_alloc_type>; + using __value_alloc_traits = typename __node_alloc_traits::template + rebind_traits; using __node_base = __detail::_Hash_node_base; using __bucket_type = __node_base*; @@ -2057,10 +2051,10 @@ namespace __detail __node_type* __n = std::__addressof(*__nptr); __try { - __value_alloc_type __a(_M_node_allocator()); ::new ((void*)__n) __node_type; - __value_alloc_traits::construct(__a, __n->_M_valptr(), - std::forward<_Args>(__args)...); + __node_alloc_traits::construct(_M_node_allocator(), + __n->_M_valptr(), + std::forward<_Args>(__args)...); return __n; } __catch(...) @@ -2076,8 +2070,7 @@ namespace __detail { typedef typename __node_alloc_traits::pointer _Ptr; auto __ptr = std::pointer_traits<_Ptr>::pointer_to(*__n); - __value_alloc
Re: Ping on target independent stack clash protection patches
> #01 of #08: > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01971.html > > #02 of #08: > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01972.html > > #03 of #08: > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01974.html > > Need to reach some kind of closure on these, then I can start pinging > the target maintainers for the rest of the bits... All OK with me, thanks for your attention to the interaction with Ada. Minor nit: + Stack checking is designed to detect infinite recursion for Ada + programs. Furthermore stack checking tries to ensure that scenario + that enough stack space is left to run a signal handler. Let's use the same wording as in invoke.texi: "...infinite recursion and stack overflows for...". Missing "in" before "that scenario". [Ada folks in the embedded world are mainly scared about the possibility of tasks (threads) overwriting each other's stack; in that case, their only requirement is to be able to run a last chance handler to terminate the task properly. But the ACATS testsuite contains a handful of tests that literally play with stack overflows and this complicates the implementation for artificial reasons]. -- Eric Botcazou
[PATCH] PR libstdc++/81891 fix double-free in hashtable constructor
We introduced a regression in r214986 when changing the _Hashtable range constructor to delegate to another constructor. That change means that the object has completed construction after the target constructor completes, and so if an exception is thrown in the delegating constructor then the destructor will run. This results in calling _M_deallocate_buckets twice, causing a double-free. The fix is to simply omit the try-catch in the delegating constructor, so that the destructor takes care of the clean up. PR libstdc++/81891 * include/bits/hashtable.h (_Hashtable(_InputIterator, _InputIterator, size_type, const _H1&, const _H2&, const _Hash&, const _Equal&, const _ExtractKey&, const allocator_type&)): Let destructor do clean up if an exception is thrown. * testsuite/23_containers/unordered_map/cons/81891.cc: New. Tested powerpc64le-linux, committed to trunk. As a regression since 4.9 this needs to be backported to all active branches. commit 30ce5842ed9b91813b97b756960afe6369f1a568 Author: Jonathan Wakely Date: Fri Aug 18 18:03:51 2017 +0100 PR libstdc++/81891 fix double-free in hashtable constructor PR libstdc++/81891 * include/bits/hashtable.h (_Hashtable(_InputIterator, _InputIterator, size_type, const _H1&, const _H2&, const _Hash&, const _Equal&, const _ExtractKey&, const allocator_type&)): Let destructor do clean up if an exception is thrown. * testsuite/23_containers/unordered_map/cons/81891.cc: New. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index bc7448bcf13..e0806dc93a1 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -973,17 +973,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_bucket_count = __bkt_count; } - __try - { - for (; __f != __l; ++__f) - this->insert(*__f); - } - __catch(...) - { - clear(); - _M_deallocate_buckets(); - __throw_exception_again; - } + for (; __f != __l; ++__f) + this->insert(*__f); } templatehttp://www.gnu.org/licenses/>. 
+ +// { dg-do run { target c++11 } } + +#include <unordered_map> +#include <testsuite_allocator.h> +#include <testsuite_hooks.h> + +struct fails_on_copy { + fails_on_copy() = default; + fails_on_copy(const fails_on_copy&) { throw 0; }; +}; + +using value_type = std::pair<const int, fails_on_copy>; + +void +test01() +{ + value_type p; + try + { +std::unordered_map<int, fails_on_copy> umap(&p, &p + 1); + } + catch(...) + { } +} + +void +test02() +{ + using Alloc = __gnu_test::tracker_allocator<value_type>; + using std::hash; + using std::equal_to; + + value_type p; + try + { +std::unordered_map<int, fails_on_copy, hash<int>, equal_to<int>, Alloc> + umap(&p, &p + 1); + } + catch(...) + { } + + using counter = __gnu_test::tracker_allocator_counter; + VERIFY(counter::get_allocation_count() == counter::get_deallocation_count()); +} + +int +main() +{ + test01(); + test02(); +}
Re: [PATCH] detect incompatible aliases (PR c/81854)
On 18/08/17 08:54 -0600, Martin Sebor wrote: On 08/18/2017 07:10 AM, Jonathan Wakely wrote: On 17/08/17 21:21 -0600, Martin Sebor wrote: Joseph, while looking into implementing your enhancement request pr81824 I noticed that GCC silently accepts incompatible alias declarations (pr81854), so as a sort of proof of concept for the former I enhanced the checking already done for other kinds of incompatibilities to also detect those mentioned in the latter bug. Attached is this patch, tested on x86_64-linux. Jonathan, the patch requires suppressing the warning in libstdc++ compatibility symbol definitions in compatibility.cc. I couldn't find a way to do it without the suppression but I'd be happy to try again if you have an idea for how. Doing it that way is fine, but ... diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..5f56b9e 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv _ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv */ +// Disable warning about declaring aliases between functions with +// incompatible types. +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wattributes" + Could this be moved closer to the point where it's needed? It's not needed until after line 361, right? Sure. The other possibility that I forgot to mention is to declare the alias without a prototype, which in C++ looks like this: void foo (...); The patch would then look like this. Do you have a preference between these two approaches? If this doesn't change the generated code but avoids the warnings, then I think I prefer this, i.e. fix the code, not just suppress the warnings. 
Martin diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..b49a5ca 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION #define _GLIBCXX_3_4_SYMVER(XXname, name) \ extern "C" void \ - _X##name() \ + _X##name(...) \ __attribute__ ((alias(#XXname))); \ asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4"); #define _GLIBCXX_3_4_5_SYMVER(XXname, name) \ extern "C" void \ - _Y##name() \ + _Y##name(...) \ __attribute__ ((alias(#XXname))); \ asm (".symver " "_Y" #name "," #name "@@GLIBCXX_3.4.5");
Re: [PATCH] PR libstdc++/79162 ambiguity in string assignment due to string_view overload (LWG 2946)
Hi, This is a friendly reminder asking for a review of the suggested patch! Thanks, - Daniel 2017-07-30 15:01 GMT+02:00 Daniel Krügler : > 2017-07-28 22:40 GMT+02:00 Daniel Krügler : >> 2017-07-28 22:29 GMT+02:00 Daniel Krügler : >>> 2017-07-28 22:25 GMT+02:00 Tim Song : On Fri, Jul 28, 2017 at 4:10 PM, Daniel Krügler wrote: > + // Performs an implicit conversion from _Tp to __sv_type. > + template > +static __sv_type _S_to_string_view(const _Tp& __svt) > +{ > + return __svt; > +} I might have gone for +static __sv_type _S_to_string_view(__sv_type __svt) noexcept +{ + return __svt; +} With that, we can also use noexcept(_S_to_string_view(__t)) to make up for the absence of is_nothrow_convertible (basically the same thing I did in LWG 2993's PR). >>> >>> Agreed, that makes very much sense. I will adjust the P/R, but before >>> I resubmit I would like to get feedback whether the other two compare >>> functions also should become conditionally noexcept. >> >> Locally I have now performed the sole change of the _S_to_string_view >> declaration getting rid of the template, but would also like to gather >> feedback from the maintainers whether I should also change the form of >> the conditional noexcept to use the expression >> >> noexcept(_S_to_string_view(__t)) >> >> instead of the current >> >> is_same<_Tp, __sv_type>::value >> >> as suggested by Tim Song. >> >> I'm asking also, because I have a paper proposing to standardize >> is_nothrow_convertible submitted for the upcoming C++ mailing - This >> would be one of the first applications in the library ;-) > > A slightly revised patch update: It replaces the _S_to_string_view > template by a simpler _S_to_string_view function as of Tim Song's > suggestion, but still uses the simplified noexcept specification > deferring it to a future application case for is_nothrow_convertible. > Furthermore now all three compare function templates are now > (conditionally) noexcept by an (off-list) suggestion from Jonathan > Wakely. 
> > Thanks, > > - Daniel
Re: [PATCH] detect incompatible aliases (PR c/81854)
On 08/18/2017 07:10 AM, Jonathan Wakely wrote: On 17/08/17 21:21 -0600, Martin Sebor wrote: Joseph, while looking into implementing your enhancement request pr81824 I noticed that GCC silently accepts incompatible alias declarations (pr81854), so as a sort of proof of concept for the former I enhanced the checking already done for other kinds of incompatibilities to also detect those mentioned in the latter bug. Attached is this patch, tested on x86_64-linux. Jonathan, the patch requires suppressing the warning in libstdc++ compatibility symbol definitions in compatibility.cc. I couldn't find a way to do it without the suppression but I'd be happy to try again if you have an idea for how. Doing it that way is fine, but ... diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..5f56b9e 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv _ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv */ +// Disable warning about declaring aliases between functions with +// incompatible types. +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wattributes" + Could this be moved closer to the point where it's needed? It's not needed until after line 361, right? Sure. The other possibility that I forgot to mention is to declare the alias without a prototype, which in C++ looks like this: void foo (...); The patch would then look like this. Do you have a preference between these two approaches? Martin diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..b49a5ca 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION #define _GLIBCXX_3_4_SYMVER(XXname, name) \ extern "C" void \ - _X##name() \ + _X##name(...) 
\ __attribute__ ((alias(#XXname))); \ asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4"); #define _GLIBCXX_3_4_5_SYMVER(XXname, name) \ extern "C" void \ - _Y##name() \ + _Y##name(...) \ __attribute__ ((alias(#XXname))); \ asm (".symver " "_Y" #name "," #name "@@GLIBCXX_3.4.5");
[PATCH] Asm memory constraints
This patch adds some documentation on asm memory constraints, aimed especially at constraints for arrays. I may have invented something new here as I've never seen "=m" (*(T (*)[]) ptr) used before. So this isn't simply a documentation patch. It needs blessing from a global maintainer, I think, as to whether this is a valid approach and something that gcc ought to continue supporting. My poking around the code and looking at dumps convinced me that it's OK.. PR inline-asm/81890 * doc/extend.texi (Clobbers): Correct vax example. Delete old example of a memory input for a string of known length. Move commentary out of table. Add a number of new examples covering array memory inputs and outputs. testsuite/ * gcc.target/i386/asm-mem.c: New test. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 93d542d..224518f 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8755,7 +8755,7 @@ registers: asm volatile ("movc3 %0, %1, %2" : /* No outputs. */ : "g" (from), "g" (to), "g" (count) - : "r0", "r1", "r2", "r3", "r4", "r5"); + : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); @end example Also, there are two special clobber arguments: @@ -8786,14 +8786,75 @@ Note that this clobber does not prevent the @emph{processor} from doing speculative reads past the @code{asm} statement. To prevent that, you need processor-specific fence instructions. -Flushing registers to memory has performance implications and may be an issue -for time-sensitive code. You can use a trick to avoid this if the size of -the memory being accessed is known at compile time. For example, if accessing -ten bytes of a string, use a memory input like: +@end table -@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}. +Flushing registers to memory has performance implications and may be +an issue for time-sensitive code. You can provide better information +to GCC to avoid this, as shown in the following examples. 
At a +minimum, aliasing rules allow GCC to know what memory @emph{doesn't} +need to be flushed. Also, if GCC can prove that all of the outputs of +a non-volatile @code{asm} statement are unused, then the @code{asm} +may be deleted. Removal of otherwise dead @code{asm} statements will +not happen if they clobber @code{"memory"}. -@end table +Here is a fictitious sum of squares instruction, that takes two +pointers to floating point values in memory and produces a floating +point register output. +Notice that @code{x}, and @code{y} both appear twice in the @code{asm} +parameters, once to specify memory accessed, and once to specify a +base register used by the @code{asm}. You won't normally be wasting a +register by doing this as GCC can use the same register for both +purposes. However, it would be foolish to use both @code{%1} and +@code{%3} for @code{x} in this @code{asm} and expect them to be the +same. In fact, @code{%3} may well not even be a register. It might +be a symbolic memory reference to the object pointed to by @code{x}. + +@smallexample +asm ("sumsq %0, %1, %2" + : "+f" (result) + : "r" (x), "r" (y), "m" (*x), "m" (*y)); +@end smallexample + +Here is a fictitious @code{*z++ = *x++ * *y++} instruction. +Notice that the @code{x}, @code{y} and @code{z} pointer registers +must be specified as input/output because the @code{asm} modifies +them. + +@smallexample +asm ("vecmul %0, %1, %2" + : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) + : "m" (*x), "m" (*y)); +@end smallexample + +An x86 example where the string memory argument is of unknown length. + +@smallexample +asm("repne scasb" +: "=c" (count), "+D" (p) +: "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); +@end smallexample + +If you know the above will only be reading a ten byte array then you +could instead use a memory input like: +@code{"m" (*(const char (*)[10]) p)}. 
+ +Here is an example of a PowerPC vector scale implemented in assembly, +complete with vector and condition code clobbers, and some initialized +offset registers that are unchanged by the @code{asm}. + +@smallexample +void +dscal (size_t n, double *x, double alpha) +@{ + asm ("/* lots of asm here */" + : "+m" (*(double (*)[n]) x), "+r" (n), "+b" (x) + : "d" (alpha), "b" (32), "b" (48), "b" (64), + "b" (80), "b" (96), "b" (112) + : "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); +@} +@end smallexample @anchor{GotoLabels} @subsubsection Goto Labels diff --git a/gcc/testsuite/gcc.target/i386/asm-mem.c b/gcc/testsuite/gcc.target/i386/asm-mem.c new file mode 100644 index 000..01522fe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/asm-mem.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +/* Check that "m" array references are effective in preventin
constexpr basic_string_view as required by C++17
The current implementation is missing a number of constexpr specifiers in basic_string_view. (Bug 70483) The patch adds the missing constexpr's as required by C++17. I ran the libstdc++-v3 testsuite and got no errors. I didn't change the basic_string_view test cases, because I couldn't find an existing pattern for testing whether the function declarations are constexpr. (This is my first patch for GCC.) diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index fd9a6afb..86f1a72 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,8 @@ +2017-08-18 Benjamin Buch + + * include/std/string_view: Added constexpr as required by C++17. + * include/bits/string_view.tcc: Likewise. + 2017-08-11 Jonathan Wakely PR libstdc++/81808 @@ -2783,7 +2788,7 @@ 2017-01-01 Jakub Jelinek Update copyright years. - + Copyright (C) 2017 Free Software Foundation, Inc. Copying and distribution of this file, with or without modification, diff --git a/libstdc++-v3/include/bits/string_view.tcc b/libstdc++-v3/include/bits/string_view.tcc index f4bd50f..b8ab78c 100644 --- a/libstdc++-v3/include/bits/string_view.tcc +++ b/libstdc++-v3/include/bits/string_view.tcc @@ -45,7 +45,7 @@ namespace std _GLIBCXX_VISIBILITY(default) _GLIBCXX_BEGIN_NAMESPACE_VERSION template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find(const _CharT* __str, size_type __pos, size_type __n) const noexcept { @@ -66,7 +66,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find(_CharT __c, size_type __pos) const noexcept { @@ -82,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>::
rfind(const _CharT* __str, size_type __pos, size_type __n) const noexcept { @@ -102,7 +102,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: rfind(_CharT __c, size_type __pos) const noexcept { @@ -119,7 +119,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_first_of(const _CharT* __str, size_type __pos, size_type __n) const { @@ -135,7 +135,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_last_of(const _CharT* __str, size_type __pos, size_type __n) const { @@ -156,7 +156,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_first_not_of(const _CharT* __str, size_type __pos, size_type __n) const { @@ -168,7 +168,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_first_not_of(_CharT __c, size_type __pos) const noexcept { @@ -179,7 +179,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_last_not_of(const _CharT* __str, size_type __pos, size_type __n) const { @@ -200,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -typename basic_string_view<_CharT, _Traits>::size_type +constexpr typename 
basic_string_view<_CharT, _Traits>::size_type basic_string_view<_CharT, _Traits>:: find_last_not_of(_CharT __c, size_type __pos) const noexcept { diff --git a/libstdc++-v3/include/std/string_view b/libstdc++-v3/include/std/string_view index 88a7686..8b4ac05 100644 --- a/libstdc++-v3/include/std/string_view +++ b/libstdc++-v3/include/std/string_view @@ -106,7 +106,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_str{__str} { } - basic_string_view& + constexpr basic_string_view& operator=(const basic_string_view&) noexcept = default; // [string.view.iterators], iterators @@ -127,19 +127,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Friday, August 18, 2017 3:53 PM > To: Tsimbalist, Igor V > Cc: gcc-patches@gcc.gnu.org > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling > > On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V > wrote: > >> -Original Message- > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, August 15, 2017 3:43 PM > >> To: Tsimbalist, Igor V > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling > >> > >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V > >> wrote: > >> > Part#1. Add generic part for Intel CET enabling. > >> > > >> > The spec is available at > >> > > >> > https://software.intel.com/sites/default/files/managed/4d/2a/contro > >> > l-f low-enforcement-technology-preview.pdf > >> > > >> > High-level design. > >> > -- > >> > > >> > A proposal is to introduce a target independent flag > >> > -finstrument-control-flow with a semantic to instrument a code to > >> > control validness or integrity of control-flow transfers using jump > >> > and call instructions. The main goal is to detect and block a > >> > possible malware execution through transfer the execution to > >> > unknown target address. Implementation could be either software or > >> > target based. Any target platforms can provide their implementation > >> > for instrumentation under this option. > >> > > >> > When the -finstrument-control-flow flag is set each implementation > >> > has to check if a support exists for a target platform and report > >> > an error if no support is found. > >> > > >> > The compiler should instrument any control-flow transfer points in > >> > a program (ex. call/jmp/ret) as well as any landing pads, which are > >> > targets of for control-flow transfers. > >> > > >> > A new 'notrack' attribute is introduced to provide hand tuning support. 
> >> > The attribute directs the compiler to skip a call to a function and > >> > a function's landing pad from instrumentation (tracking). The > >> > attribute can be used for function and pointer to function types, > >> > otherwise it will be ignored. The attribute is saved in a type and > >> > propagated to a GIMPLE call statement and later to a call instruction. > >> > > >> > Currently all platforms except i386 will report the error and do no > >> > instrumentation. i386 will provide the implementation based on a > >> > specification published by Intel for a new technology called > >> > Control-flow Enforcement Technology (CET). > >> > >> diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d > >> 100644 > >> --- a/gcc/gimple.c > >> +++ b/gcc/gimple.c > >> @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t) > >>gimple_set_no_warning (call, TREE_NO_WARNING (t)); > >>gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t)); > >> > >> + if (fndecl == NULL_TREE) > >> +{ > >> + /* Find the type of an indirect call. */ > >> + tree addr = CALL_EXPR_FN (t); > >> + if (TREE_CODE (addr) != FUNCTION_DECL) > >> + { > >> + tree fntype = TREE_TYPE (addr); > >> + gcc_assert (POINTER_TYPE_P (fntype)); > >> + fntype = TREE_TYPE (fntype); > >> + > >> + /* Check if its type has the no-track attribute and propagate > >> +it to the CALL insn. */ > >> + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype))) > >> + gimple_call_set_with_notrack (call, TRUE); > >> + } > >> +} > >> > >> this means notrack is not recognized if fndecl is not NULL. Note > >> that only the two callers know the real function type in effect (they > >> call gimple_call_set_fntype with it). I suggest to pass down that > >> type to gimple_build_call_from_tree and move the > >> gimple_call_set_fntype call there as well. And simply use the type for the > above. > > > > The best way to say is notrack is not propagated if fndecl is not NULL. 
> Fndecl, if not NULL, is a direct call and notrack is not applicable for such > calls. I > will add a comment before the if. > > Hmm. But what does this mean? I guess direct calls are always 'notrack' then > and thus we're fine to ignore it. Yes, that's correct - direct calls are always 'notrack' and we are ignoring it in calls. > > I would like to propose modifying the existing code without changing > interfaces. The idea is that at the time the notrack is propagated (the code > snippet above) the gimple call was created and the correct type was assigned > to the 'call' exactly by gimple_call_set_fntype. My proposal is to get the > type > out of the gimple 'call' (like gimple_call_fntype) instead of the tree 't'. > Is it > right? > > Yes, but note that the type on the gimple 'call' isn't yet correctly set > (only the > caller does that). The gimple_build_call_from_tree is really an ugly thing > and > it should be privatized inside th
Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
On 18.08.2017 12:01, Richard Biener wrote: On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay wrote: On 28.07.2017 09:34, Richard Biener wrote: On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay wrote: On 27.07.2017 14:34, Richard Biener wrote: On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay wrote: For some targets, the best place to put read-only lookup tables as generated by -ftree-switch-conversion is not the generic address space but some target specific address space. This is the case for AVR, where for most devices .rodata must be located in RAM. Part #1 adds a new, optional target hook that queries the back-end about the desired address space. There is currently only one user: tree-switch-conversion.c::build_one_array() which re-builds value_type and array_type if the address space returned by the backend is not the generic one. Part #2 is the AVR part that implements the new hook and adds some sugar around it. Given that switch-conversion just creates a constant initializer doesn't AVR benefit from handling those uniformly (at RTL expansion?). Not sure but I think it goes through the regular constant pool handling. Richard. avr doesn't use constant pools because they are not profitable for several reasons. Moreover, it's not sufficient to just patch the pool's section, you'd also have to patch *all* accesses to also use the exact same address space so that they use the correct instruction (LPM instead of LD). Sure. So then not handle it in constant pool handling but in expanding the initializers of global vars. I think the entry for this is varasm.c:assemble_variable - that sets DECL_RTL for the variable. That cannot work, and here is why; just for completeness: cgraphunit.c::symbol_table::compile() runs ... expand_all_functions (); output_variables (); // runs assemble_variable ... So any patching of DECL_RTL will result in wrong code: The address space of the decl won't match the address space used by the code accessing decl. 
Ok, so it's more tricky to hack it that way (you'd need to catch the time the decl gets its DECL_RTL set, not sure where that happens for globals). Too late, IMO. Problem is that any reference must also have the same AS. Hence if somewhere, before patching decl_rtl, some code uses TREE_TYPE on respective decl, that type will miss the AS, same for all variables / pointers created with that type. Been there, seen it... That leaves doing a more broad transform. One convenient place to hook in would be the ipa_single_use pass which computes the varpool_node.used_by_single_function flag. All such variables would be suitable for promotion (with some additional constraints I suppose). You'd add a transform phase to the pass that rewrites the decls and performs GIMPLE IL adjustments (I think you need to patch memory references to use the address-space). Rewriting DECLs is not enough. Only the place that cooks up the decl (implicitly) knows whether it's appropriate to use different AS and whether is has control over diffusion into TREE_TYPEs. And as we have strong filter (see below) which includes DECL_ARTIFICIAL, almost nothing will remain to be transformed anyway... Of course there would need to be a target hook determining if/what section/address-space a variable should be promoted to which somehow would allow switch-conversion to use that as well. Ho humm. That said, do you think the switch-conversion created decl is the only one that benefits from compiler-directed promotion to different address-space? Yes. Only compiler-generated lookup-tables may be transformed, and the only such tables I know of are CSWTCH and vtables. The conditions so far are: * TREE_STATIC * !TREE_PUBLIC * TREE_READONLY (at least for the case this PR is after) * DECL_ARTIFICIAL (or otherwise exclude inline asm) * decl must not be cooked up by non-C FE (i.e. vtables are out). * Not weak, comdat or linkonce (again, vtables are out). * Not for debug purpose (like what dwarf2asm does). 
* No section attribute (actually also CSWTCH could be mentioned in linker script, but we can argue that artificials are managed by the compiler + user has option to control CSWTCH). What remains is CSWTCH. Johann
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 03/08 - V3
On Mon, Jul 31, 2017 at 7:41 AM, Jeff Law wrote: > This patch introduces some routines for logging of stack clash > protection actions. > > I don't think this patch changed at all relative to V2. Ok. Richard. > > Jeff > > > * function.c (dump_stack_clash_frame_info): New function. > * function.h (dump_stack_clash_frame_info): Prototype. > (enum stack_clash_probes): New enum. > > diff --git a/gcc/function.c b/gcc/function.c > index f625489..ca48b3f 100644 > --- a/gcc/function.c > +++ b/gcc/function.c > @@ -5695,6 +5695,58 @@ get_arg_pointer_save_area (void) >return ret; > } > > + > +/* If debugging dumps are requested, dump information about how the > + target handled -fstack-check=clash for the prologue. > + > + PROBES describes what if any probes were emitted. > + > + RESIDUALS indicates if the prologue had any residual allocation > + (i.e. total allocation was not a multiple of PROBE_INTERVAL). */ > + > +void > +dump_stack_clash_frame_info (enum stack_clash_probes probes, bool residuals) > +{ > + if (!dump_file) > +return; > + > + switch (probes) > +{ > +case NO_PROBE_NO_FRAME: > + fprintf (dump_file, > + "Stack clash no probe no stack adjustment in prologue.\n"); > + break; > +case NO_PROBE_SMALL_FRAME: > + fprintf (dump_file, > + "Stack clash no probe small stack adjustment in prologue.\n"); > + break; > +case PROBE_INLINE: > + fprintf (dump_file, "Stack clash inline probes in prologue.\n"); > + break; > +case PROBE_LOOP: > + fprintf (dump_file, "Stack clash probe loop in prologue.\n"); > + break; > +} > + > + if (residuals) > +fprintf (dump_file, "Stack clash residual allocation in prologue.\n"); > + else > +fprintf (dump_file, "Stack clash no residual allocation in prologue.\n"); > + > + if (frame_pointer_needed) > +fprintf (dump_file, "Stack clash frame pointer needed.\n"); > + else > +fprintf (dump_file, "Stack clash no frame pointer needed.\n"); > + > + if (TREE_THIS_VOLATILE (cfun->decl)) > +fprintf (dump_file, > +"Stack clash noreturn prologue, assuming 
no implicit" > +" probes in caller.\n"); > + else > +fprintf (dump_file, > +"Stack clash not noreturn prologue.\n"); > +} > + > /* Add a list of INSNS to the hash HASHP, possibly allocating HASHP > for the first time. */ > > diff --git a/gcc/function.h b/gcc/function.h > index 0f34bcd..87dac80 100644 > --- a/gcc/function.h > +++ b/gcc/function.h > @@ -553,6 +553,14 @@ do { > \ >((TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn) \ > ? MAX (FUNCTION_BOUNDARY, 2 * BITS_PER_UNIT) : FUNCTION_BOUNDARY) > > +enum stack_clash_probes { > + NO_PROBE_NO_FRAME, > + NO_PROBE_SMALL_FRAME, > + PROBE_INLINE, > + PROBE_LOOP > +}; > + > +extern void dump_stack_clash_frame_info (enum stack_clash_probes, bool); > > > extern void push_function_context (void); >
libgo patch committed: Fix cgo tests for AIX
This libgo patch by Tony Reix fixes the cgo tests for AIX. Bootstrapped and ran testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 251179) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -9ff49c64ea6dbb5e08d1fa859b99b06049413279 +28e49825162465172ed706283628bf5cc1996260 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/misc/cgo/test/cthread_unix.c === --- libgo/misc/cgo/test/cthread_unix.c (revision 250873) +++ libgo/misc/cgo/test/cthread_unix.c (working copy) @@ -2,7 +2,7 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build darwin dragonfly freebsd linux netbsd openbsd solaris +// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris #include #include "_cgo_export.h" Index: libgo/misc/cgo/test/issue18146.go === --- libgo/misc/cgo/test/issue18146.go (revision 250873) +++ libgo/misc/cgo/test/issue18146.go (working copy) @@ -50,6 +50,8 @@ func test18146(t *testing.T) { nproc = 6 case "darwin", "dragonfly", "freebsd", "netbsd", "openbsd": nproc = 7 + case "aix": + nproc = 9 } if setNproc { var rlim syscall.Rlimit
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08 - V3
On Mon, Jul 31, 2017 at 7:38 AM, Jeff Law wrote: > > This patch introduces generic mechanisms to protect the dynamically > allocated stack space against stack-clash attacks. > > Changes since V2: > > Dynamic allocations can be emitted as unrolled inlined probes or with a > rotated loop. Blockage insns are also properly emitted for the dynamic > area probes and the dynamic area probing now supports targets that may > make optimistic assumptions in their prologues. Finally it uses the new > param to control the probing interval. > > Tests were updated to explicitly specify the guard and probing interval. > New test to check inline/unrolled probes as well as rotated loop. > > > > * explow.c: Include "params.h". > (anti_adjust_stack_and_probe_stack_clash): New function. > (get_stack_check_protect): Likewise. > (compute_stack_clash_protection_loop_data): Likewise. > (emit_stack_clash_protection_loop_start): Likewise. > (emit_stack_clash_protection_loop_end): Likewise. > (allocate_dynamic_stack_space): Use get_stack_check_protect. > Use anti_adjust_stack_and_probe_stack_clash. > * explow.h (compute_stack_clash_protection_loop_data): Prototype. > (emit_stack_clash_protection_loop_start): Likewise. > (emit_stack_clash_protection_loop_end): Likewise. > * rtl.h (get_stack_check_protect): Prototype. > * defaults.h (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE): > Define new default. > * doc/tm.texi.in (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE): > Define. Please make this a hook instead of a target macro. Besides this it looks good (I trust you on the RTL details). Thanks, Richard. > * doc/tm.texi: Rebuilt. > > * config/aarch64/aarch64.c (aarch64_expand_prologue): Use > get_stack_check_protect. > * config/alpha/alpha.c (alpha_expand_prologue): Likewise. > * config/arm/arm.c (arm_expand_prologue): Likewise. > * config/i386/i386.c (ix86_expand_prologue): Likewise. > * config/ia64/ia64.c (ia64_expand_prologue): Likewise. 
> * config/mips/mips.c (mips_expand_prologue): Likewise. > * config/powerpcspe/powerpcspe.c (rs6000_emit_prologue): Likewise. > * config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise. > * config/sparc/sparc.c (sparc_expand_prologue): Likewise. > > > testsuite > > * gcc.dg/stack-check-3.c: New test. > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index ef1b5a8..0a8b40a 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -3676,12 +3676,14 @@ aarch64_expand_prologue (void) > { >if (crtl->is_leaf && !cfun->calls_alloca) > { > - if (frame_size > PROBE_INTERVAL && frame_size > STACK_CHECK_PROTECT) > - aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, > - frame_size - STACK_CHECK_PROTECT); > + if (frame_size > PROBE_INTERVAL > + && frame_size > get_stack_check_protect ()) > + aarch64_emit_probe_stack_range (get_stack_check_protect (), > + (frame_size > +- get_stack_check_protect ())); > } >else if (frame_size > 0) > - aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size); > + aarch64_emit_probe_stack_range (get_stack_check_protect (), > frame_size); > } > >aarch64_sub_sp (IP0_REGNUM, initial_adjust, true); > diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c > index 00a69c1..91f3d7c 100644 > --- a/gcc/config/alpha/alpha.c > +++ b/gcc/config/alpha/alpha.c > @@ -7741,7 +7741,7 @@ alpha_expand_prologue (void) > >probed_size = frame_size; >if (flag_stack_check) > -probed_size += STACK_CHECK_PROTECT; > +probed_size += get_stack_check_protect (); > >if (probed_size <= 32768) > { > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index c6101ef..9822ca7 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -21680,13 +21680,13 @@ arm_expand_prologue (void) > >if (crtl->is_leaf && !cfun->calls_alloca) > { > - if (size > PROBE_INTERVAL && size > STACK_CHECK_PROTECT) > - arm_emit_probe_stack_range (STACK_CHECK_PROTECT, > - size - STACK_CHECK_PROTECT, > + if (size 
> PROBE_INTERVAL && size > get_stack_check_protect ()) > + arm_emit_probe_stack_range (get_stack_check_protect (), > + size - get_stack_check_protect (), > regno, live_regs_mask); > } >else if (size > 0) > - arm_emit_probe_stack_range (STACK_CHECK_PROTECT, size, > + arm_emit_probe_stack_range (get_stack_check_protect (), size, >
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3
On Mon, Jul 31, 2017 at 7:35 AM, Jeff Law wrote: > This patch introduces the stack clash protection options > > Changes since V2: > > Adds two new params. The first controls the size of the guard area. > This controls the threshold for when a function prologue requires > probes. The second controls the probing interval -- ie, once probes are > needed, how often do we emit them. These are really meant more for > developers to experiment with than users. Regardless I did go ahead and > document them. > > It also adds some sanity checking WRT combining stack clash protection > with -fstack-check. diff --git a/gcc/params.c b/gcc/params.c index fab0ffa..8afe4c4 100644 --- a/gcc/params.c +++ b/gcc/params.c @@ -209,6 +209,11 @@ set_param_value (const char *name, int value, error ("maximum value of parameter %qs is %u", compiler_params[i].option, compiler_params[i].max_value); + else if ((strcmp (name, "stack-clash-protection-guard-size") == 0 +|| strcmp (name, "stack-clash-protection-probe-interval") == 0) + && exact_log2 (value) == -1) +error ("value of parameter %qs must be a power of 2", + compiler_params[i].option); else set_param_value_internal ((compiler_param) i, value, params, params_set, true); I don't like this. Either use them as if they were power-of-two (floor_log2/ceil_log2 as appropriate) or simply make them take the logarithm instead (like -mincoming-stack-boundary and friends). Both -fstack-clash-protection and -fstack-check cannot be turned off per function. This means they would need merging in lto-wrapper. The alternative is to mark them with 'Optimization' and allow per-function specification (like we do for -fstack-protector). Otherwise ok. Richard. > Jeff > > > * common.opt (-fstack-clash-protection): New option. > * flag-types.h (enum stack_check_type): Note difference between > -fstack-check= and -fstack-clash-protection. > * params.h (set_param_value): Verify stack-clash-protection > params are powers of two.
> * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): New PARAM. > (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Likewise. > * toplev.c (process_options): Issue warnings/errors for cases > not handled with -fstack-clash-protection. > > > > * doc/invoke.texi (-fstack-clash-protection): Document new option. > (-fstack-check): Note additional problem with -fstack-check=generic. > Note that -fstack-check is primarily for Ada and refer users > to -fstack-clash-protection for stack-clash-protection. > Document new params for stack clash protection. > > > > > * toplev.c (process_options): Handle -fstack-clash-protection. > > testsuite/ > > * gcc.dg/stack-check-2.c: New test. > * lib/target-supports.exp > (check_effective_target_supports_stack_clash_protection): New > function. > (check_effective_target_frame_pointer_for_non_leaf): Likewise. > (check_effective_target_caller_implicit_probes): Likewise. > > > > diff --git a/gcc/common.opt b/gcc/common.opt > index e81165c..cfaf2bc 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2306,6 +2306,11 @@ fstack-check > Common Alias(fstack-check=, specific, no) > Insert stack checking code into the program. Same as -fstack-check=specific. > > +fstack-clash-protection > +Common Report Var(flag_stack_clash_protection) > +Insert code to probe each page of stack space as it is allocated to protect > +from stack-clash style attacks > + > fstack-limit > Common Var(common_deferred_options) Defer > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 3e5cee8..8095dc2 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -10128,6 +10128,20 @@ compilation without. The value for compilation with > profile feedback > needs to be more conservative (higher) in order to make tracer > effective. > > +@item stack-clash-protection-guard-size > +The size (in bytes) of the stack guard. Acceptable values are powers of 2 > +between 4096 and 134217728 and defaults to 4096. 
Higher values may reduce > the > +number of explicit probes, but a value larger than the operating system > +provided guard will leave code vulnerable to stack clash style attacks. > + > +@item stack-clash-protection-probe-interval > +Stack clash protection involves probing stack space as it is allocated. This > +param controls the maximum distance (in bytes) between probes into the stack. > +Acceptable values are powers of 2 between 4096 and 65536 and defaults to > +4096. Higher values may reduce the number of explicit probes, but a value > +larger than the operating system provided guard will leave code vulnerable to > +stack clash style attacks. > + > @item max-cse-path-length > > The maximum number of basic blocks on path that CSE considers. > @@ -11333,7 +11347,8 @@ target support in the compiler but comes with the > following
Re: [PATCH 4/3] improve detection of attribute conflicts (PR 81544)
On 16/08/17 16:38 -0600, Martin Sebor wrote: Jon, Attached is the libstdc++ only patch to remove the pointless const attribute from __pool::_M_destroy_thread_key(void*). https://gcc.gnu.org/ml/gcc/2017-08/msg00027.html I only belatedly now broke it out of the larger patch under review here: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00599.html Thanks Martin libstdc++-v3/ChangeLog: PR c/81544 * include/ext/mt_allocator.h (_M_destroy_thread_key): Remove pointless attribute const. OK. diff --git a/libstdc++-v3/include/ext/mt_allocator.h b/libstdc++-v3/include/ext/mt_allocator.h index effb13b..f349ff8 100644 --- a/libstdc++-v3/include/ext/mt_allocator.h +++ b/libstdc++-v3/include/ext/mt_allocator.h @@ -355,7 +355,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } // XXX GLIBCXX_ABI Deprecated - _GLIBCXX_CONST void + void _M_destroy_thread_key(void*) throw (); size_t
Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V wrote: >> -Original Message- >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> Sent: Tuesday, August 15, 2017 3:43 PM >> To: Tsimbalist, Igor V >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling >> >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V >> wrote: >> > Part#1. Add generic part for Intel CET enabling. >> > >> > The spec is available at >> > >> > https://software.intel.com/sites/default/files/managed/4d/2a/control-f >> > low-enforcement-technology-preview.pdf >> > >> > High-level design. >> > -- >> > >> > A proposal is to introduce a target independent flag >> > -finstrument-control-flow with a semantic to instrument a code to >> > control validness or integrity of control-flow transfers using jump >> > and call instructions. The main goal is to detect and block a possible >> > malware execution through transfer the execution to unknown target >> > address. Implementation could be either software or target based. Any >> > target platforms can provide their implementation for instrumentation >> > under this option. >> > >> > When the -finstrument-control-flow flag is set each implementation has >> > to check if a support exists for a target platform and report an error >> > if no support is found. >> > >> > The compiler should instrument any control-flow transfer points in a >> > program (ex. call/jmp/ret) as well as any landing pads, which are >> > targets of for control-flow transfers. >> > >> > A new 'notrack' attribute is introduced to provide hand tuning support. >> > The attribute directs the compiler to skip a call to a function and a >> > function's landing pad from instrumentation (tracking). The attribute >> > can be used for function and pointer to function types, otherwise it >> > will be ignored. The attribute is saved in a type and propagated to a >> > GIMPLE call statement and later to a call instruction. 
>> > >> > Currently all platforms except i386 will report the error and do no >> > instrumentation. i386 will provide the implementation based on a >> > specification published by Intel for a new technology called >> > Control-flow Enforcement Technology (CET). >> >> diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d 100644 >> --- a/gcc/gimple.c >> +++ b/gcc/gimple.c >> @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t) >>gimple_set_no_warning (call, TREE_NO_WARNING (t)); >>gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t)); >> >> + if (fndecl == NULL_TREE) >> +{ >> + /* Find the type of an indirect call. */ >> + tree addr = CALL_EXPR_FN (t); >> + if (TREE_CODE (addr) != FUNCTION_DECL) >> + { >> + tree fntype = TREE_TYPE (addr); >> + gcc_assert (POINTER_TYPE_P (fntype)); >> + fntype = TREE_TYPE (fntype); >> + >> + /* Check if its type has the no-track attribute and propagate >> +it to the CALL insn. */ >> + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype))) >> + gimple_call_set_with_notrack (call, TRUE); >> + } >> +} >> >> this means notrack is not recognized if fndecl is not NULL. Note that only >> the >> two callers know the real function type in effect (they call >> gimple_call_set_fntype with it). I suggest to pass down that type to >> gimple_build_call_from_tree and move the gimple_call_set_fntype call >> there as well. And simply use the type for the above. > > The best way to say is notrack is not propagated if fndecl is not NULL. > Fndecl, if not NULL, is a direct call and notrack is not applicable for such > calls. I will add a comment before the if. Hmm. But what does this mean? I guess direct calls are always 'notrack' then and thus we're fine to ignore it. > I would like to propose modifying the existing code without changing > interfaces. 
The idea is that at the time the notrack is propagated (the code > snippet above) the gimple call was created and the correct type was assigned > to the 'call' exactly by gimple_call_set_fntype. My proposal is to get the > type out of the gimple 'call' (like gimple_call_fntype) instead of the tree > 't'. Is it right? Yes, but note that the type on the gimple 'call' isn't yet correctly set (only the caller does that). The gimple_build_call_from_tree is really an ugly thing and it should be privatized inside the gimplifier... >> +static inline bool >> +gimple_call_with_notrack_p (const gimple *gs) { >> + const gcall *gc = GIMPLE_CHECK2 (gs); >> + return gimple_call_with_notrack_p (gc); } >> >> please do not add gimple * overloads for new APIs, instead make sure to >> pass down gcalls at callers. > > Ok, I will remove. > >> Please change the names to omit 'with_', thus just notrack and >> GF_CALL_NOTRACK. > > Ok, I will rename. > >> I think 'notrack' is somewhat unspecific of a name, what prevented you to >> use 'nocet'? > > Actually
Re: [PATCH v2] Simplify pow with constant
On Fri, Aug 18, 2017 at 2:47 PM, Wilco Dijkstra wrote: > Alexander Monakov wrote: >> >> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to >> return 1.0 in that case, but x * C1 == NaN). There's another existing >> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is >> not a new problem. >> >> The whole set of these match.pd transforms is guarded by >> flag_unsafe_math_optimizations, which is a bit strange, on the one hand >> it does not include -ffinite-math-only, but on the other hand it's >> defined broadly enough to imply that. > > Yes I was assuming that unsafe_math_optimizations was enough for these > transformations to be... safe. I've added an isfinite check so that case > works fine. > It looks we need to go through the more complex transformations (especially > given pow has so many special cases) and add more finite_math checks. > Here's the new version: > > > This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C). > Do this only for fast-math as accuracy is reduced. This is much faster > since pow is more complex than exp - with current GLIBC the speedup is > more than 7 times for this transformation. > > The worst-case ULP error of the transformation for powf (10.0, x) in SPEC > was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower. You can use exp10/pow10 in match.pd but it will only match if the programmer already uses those functions as they are not C standard. There's also exp2 which is a C99 function. Is there any good reason to convert exp (C * x) to exp2 (C' * x)? (besides if C' becomes 1) So eventually we can match pow (10., x) -> exp10 and pow (2., x) -> exp2 (though I believe in canonicalization as well -- all exp calls could be canonicalized to pow calls to simplify the cases we need to handle...). The patch is ok. Thanks, Richard. > ChangeLog: > 2017-08-18 Wilco Dijkstra > > * match.pd: Add pow (C, x) simplification. 
> -- > diff --git a/gcc/match.pd b/gcc/match.pd > index > 0e36f46b914bc63c257cef47152ab1aa507963e5..a5552c5096de5100a882d52add6b620ba87c1f72 > 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3622,6 +3622,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (logs (pows @0 @1)) > (mult @1 (logs @0 > > + /* pow(C,x) -> exp(log(C)*x) if C > 0. */ > + (for pows (POW) > + exps (EXP) > + logs (LOG) > + (simplify > + (pows REAL_CST@0 @1) > +(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) > +&& real_isfinite (TREE_REAL_CST_PTR (@0))) > + (exps (mult (logs @0) @1) > + > (for sqrts (SQRT) >cbrts (CBRT) >pows (POW) > >
RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> -----Original Message----- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, August 15, 2017 3:43 PM > To: Tsimbalist, Igor V > Cc: gcc-patches@gcc.gnu.org > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling > > On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V > wrote: > > Part#1. Add generic part for Intel CET enabling. > > > > The spec is available at > > > > https://software.intel.com/sites/default/files/managed/4d/2a/control-f > > low-enforcement-technology-preview.pdf > > > > High-level design. > > -- > > > > The proposal is to introduce a target independent flag > > -finstrument-control-flow with the semantics of instrumenting code to > > check the validity or integrity of control-flow transfers using jump > > and call instructions. The main goal is to detect and block possible > > malware execution through transferring the execution to an unknown target > > address. Implementation could be either software or target based. Any > > target platform can provide its implementation for instrumentation > > under this option. > > > > When the -finstrument-control-flow flag is set, each implementation has > > to check if support exists for a target platform and report an error > > if no support is found. > > > > The compiler should instrument any control-flow transfer points in a > > program (ex. call/jmp/ret) as well as any landing pads, which are > > targets of control-flow transfers. > > > > A new 'notrack' attribute is introduced to provide hand tuning support. > > The attribute directs the compiler to skip a call to a function and a > > function's landing pad from instrumentation (tracking). The attribute > > can be used for function and pointer to function types, otherwise it > > will be ignored. The attribute is saved in a type and propagated to a > > GIMPLE call statement and later to a call instruction. > > > > Currently all platforms except i386 will report the error and do no > > instrumentation. 
i386 will provide the implementation based on a > > specification published by Intel for a new technology called > > Control-flow Enforcement Technology (CET). > > diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d 100644 > --- a/gcc/gimple.c > +++ b/gcc/gimple.c > @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t) >gimple_set_no_warning (call, TREE_NO_WARNING (t)); >gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t)); > > + if (fndecl == NULL_TREE) > +{ > + /* Find the type of an indirect call. */ > + tree addr = CALL_EXPR_FN (t); > + if (TREE_CODE (addr) != FUNCTION_DECL) > + { > + tree fntype = TREE_TYPE (addr); > + gcc_assert (POINTER_TYPE_P (fntype)); > + fntype = TREE_TYPE (fntype); > + > + /* Check if its type has the no-track attribute and propagate > +it to the CALL insn. */ > + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype))) > + gimple_call_set_with_notrack (call, TRUE); > + } > +} > > this means notrack is not recognized if fndecl is not NULL. Note that only > the > two callers know the real function type in effect (they call > gimple_call_set_fntype with it). I suggest to pass down that type to > gimple_build_call_from_tree and move the gimple_call_set_fntype call > there as well. And simply use the type for the above. A better way to say it is that notrack is not propagated if fndecl is not NULL. Fndecl, if not NULL, is a direct call and notrack is not applicable for such calls. I will add a comment before the if. I would like to propose modifying the existing code without changing interfaces. The idea is that at the time the notrack is propagated (the code snippet above) the gimple call was created and the correct type was assigned to the 'call' exactly by gimple_call_set_fntype. My proposal is to get the type out of the gimple 'call' (like gimple_call_fntype) instead of the tree 't'. Is that right? 
> +static inline bool > +gimple_call_with_notrack_p (const gimple *gs) { > + const gcall *gc = GIMPLE_CHECK2 (gs); > + return gimple_call_with_notrack_p (gc); } > > please do not add gimple * overloads for new APIs, instead make sure to > pass down gcalls at callers. Ok, I will remove. > Please change the names to omit 'with_', thus just notrack and > GF_CALL_NOTRACK. Ok, I will rename. > I think 'notrack' is somewhat unspecific of a name, what prevented you from > using 'nocet'? Actually it's specific. The HW will have a prefix with exactly this name and the same meaning. And I think, what is more important, 'track/notrack' gives better semantics for a user. CET is a name bound to Intel-specific technology. > Any idea how to implement a software-based solution efficiently? > Creating a table of valid destination addresses in a special section should be > possible without too much work, am I right in that only indirect control > transfer is checked? Thus CET assumes the code
Re: [PATCH] detect incompatible aliases (PR c/81854)
On 17/08/17 21:21 -0600, Martin Sebor wrote: Joseph, while looking into implementing your enhancement request pr81824 I noticed that GCC silently accepts incompatible alias declarations (pr81854), so as a sort of proof of concept for the former I enhanced the checking already done for other kinds of incompatibilities to also detect those mentioned in the latter bug. Attached is this patch, tested on x86_64-linux. Jonathan, the patch requires suppressing the warning in libstdc++ compatibility symbol definitions in compatibility.cc. I couldn't find a way to do it without the suppression but I'd be happy to try again if you have an idea for how. Doing it that way is fine, but ... diff --git a/libstdc++-v3/src/c++98/compatibility.cc b/libstdc++-v3/src/c++98/compatibility.cc index 381f4c4..5f56b9e 100644 --- a/libstdc++-v3/src/c++98/compatibility.cc +++ b/libstdc++-v3/src/c++98/compatibility.cc @@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv _ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv */ +// Disable warning about declaring aliases between functions with +// incompatible types. +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wattributes" + Could this be moved closer to the point where it's needed? It's not needed until after line 361, right? namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -509,6 +514,9 @@ _GLIBCXX_MATHL_WRAPPER1 (tan, GLIBCXX_3.4); #endif #endif // _GLIBCXX_LONG_DOUBLE_COMPAT +// Restore disable -Wattributes +#pragma GCC diagnostic pop + #endif #ifdef _GLIBCXX_LONG_DOUBLE_COMPAT
Re: [PATCH v2] Simplify pow with constant
Alexander Monakov wrote: > > Note this changes the outcome for C == +Inf, x == 0 (pow is specified to > return 1.0 in that case, but x * C1 == NaN). There's another existing > transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is > not a new problem. > > The whole set of these match.pd transforms is guarded by > flag_unsafe_math_optimizations, which is a bit strange, on the one hand > it does not include -ffinite-math-only, but on the other hand it's > defined broadly enough to imply that. Yes I was assuming that unsafe_math_optimizations was enough for these transformations to be... safe. I've added an isfinite check so that case works fine. It looks we need to go through the more complex transformations (especially given pow has so many special cases) and add more finite_math checks. Here's the new version: This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C). Do this only for fast-math as accuracy is reduced. This is much faster since pow is more complex than exp - with current GLIBC the speedup is more than 7 times for this transformation. The worst-case ULP error of the transformation for powf (10.0, x) in SPEC was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower. ChangeLog: 2017-08-18 Wilco Dijkstra * match.pd: Add pow (C, x) simplification. -- diff --git a/gcc/match.pd b/gcc/match.pd index 0e36f46b914bc63c257cef47152ab1aa507963e5..a5552c5096de5100a882d52add6b620ba87c1f72 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3622,6 +3622,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (logs (pows @0 @1)) (mult @1 (logs @0 + /* pow(C,x) -> exp(log(C)*x) if C > 0. */ + (for pows (POW) + exps (EXP) + logs (LOG) + (simplify + (pows REAL_CST@0 @1) +(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) +&& real_isfinite (TREE_REAL_CST_PTR (@0))) + (exps (mult (logs @0) @1) + (for sqrts (SQRT) cbrts (CBRT) pows (POW)
Re: C PATCH to remove unused block of code
On Thu, Aug 17, 2017 at 02:17:31PM +, Joseph Myers wrote: > On Thu, 17 Aug 2017, Marek Polacek wrote: > > > I've been itching to remove this code for some time now. The comment > > suggests > > that the code is actually unused, so I replaced the body of that "else if" > > with > > gcc_unreachable (); and ran regtest/bootstrap and nothing broke, so I > > propose > > to do away with it. > > > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > OK, with an appropriate change to the comment above the > c_parser_postfix_expression function to say that compound literals are not > handled here and callers have to call > c_parser_postfix_expression_after_paren_type on encountering them. (I've > reviewed all paths to c_parser_postfix_expression in the current parser > and don't think any of them can send compound literals to it, because > either they are parsing a specific more restricted syntax incompatible > with compound literals (OMP cases), are parsing an expression that might be > a cast expression so need to parse the (type) first to distinguish, or are > parsing a context such as sizeof where a parenthesized type name is > allowed as well as a postfix expression and again need to parse the (type) > first to distinguish.) Thanks for verifying. Here's what I'm going to install: 2017-08-18 Marek Polacek * c-parser.c (c_parser_postfix_expression): Remove unused code. Update commentary. diff --git gcc/c/c-parser.c gcc/c/c-parser.c index 1402ba67204..5c965d4420d 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -7752,7 +7752,8 @@ c_parser_generic_selection (c_parser *parser) } /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2, - C11 6.5.1-6.5.2). + C11 6.5.1-6.5.2). Compound literals aren't handled here; callers have to + call c_parser_postfix_expression_after_paren_type on encountering them. 
postfix-expression: primary-expression @@ -7792,7 +7793,7 @@ c_parser_generic_selection (c_parser *parser) __builtin_types_compatible_p ( type-name , type-name ) __builtin_complex ( assignment-expression , assignment-expression ) __builtin_shuffle ( assignment-expression , assignment-expression ) - __builtin_shuffle ( assignment-expression , + __builtin_shuffle ( assignment-expression , assignment-expression , assignment-expression, ) @@ -7943,28 +7944,6 @@ c_parser_postfix_expression (c_parser *parser) set_c_expr_source_range (&expr, loc, close_loc); mark_exp_read (expr.value); } - else if (c_token_starts_typename (c_parser_peek_2nd_token (parser))) - { - /* A compound literal. ??? Can we actually get here rather -than going directly to -c_parser_postfix_expression_after_paren_type from -elsewhere? */ - location_t loc; - struct c_type_name *type_name; - c_parser_consume_token (parser); - loc = c_parser_peek_token (parser)->location; - type_name = c_parser_type_name (parser); - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, -"expected %<)%>"); - if (type_name == NULL) - { - expr.set_error (); - } - else - expr = c_parser_postfix_expression_after_paren_type (parser, -type_name, -loc); - } else { /* A parenthesized expression. */ Marek
Re: Add a full_integral_type_p helper function
On Fri, Aug 18, 2017 at 1:04 PM, Richard Sandiford wrote: > Richard Biener writes: >> On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford >> wrote: >>> There are several places that test whether: >>> >>> TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t)) >>> >>> for some integer type T. With SVE variable-length modes, this would >>> need to become: >>> >>> TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t)) >>> >>> (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case). >>> But rather than add the "SCALAR_" everywhere, it seemed neater to >>> introduce a new helper function that tests whether T is an integral >>> type that has the same number of bits as its underlying mode. This >>> patch does that, calling it full_integral_type_p. >>> >>> It isn't possible to use TYPE_MODE in tree.h because vector_type_mode >>> is defined in stor-layout.h, so for now the function accesses the mode >>> field directly. After the 77-patch machine_mode series (thanks again >>> Jeff for the reviews) it would use SCALAR_TYPE_MODE instead. >>> >>> Of the changes that didn't previously have an INTEGRAL_TYPE_P check: >>> >>> - for fold_single_bit_test_into_sign_test it is obvious from the >>> integer_foop tests that this is restricted to integral types. >>> >>> - vect_recog_vector_vector_shift_pattern is inherently restricted >>> to integral types. >>> >>> - the register_edge_assert_for_2 hunk is dominated by: >>> >>> TREE_CODE (val) == INTEGER_CST >>> >>> - the ubsan_instrument_shift hunk is preceded by an early exit: >>> >>> if (!INTEGRAL_TYPE_P (type0)) >>> return NULL_TREE; >>> >>> - the second and third match.pd hunks are from: >>> >>> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1)) >>> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1)) >>>if the new mask might be further optimized. */ >>> >>> I'm a bit confused about: >>> >>> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)) >>>when profitable. 
>>>For bitwise binary operations apply operand conversions to the >>>binary operation result instead of to the operands. This allows >>>to combine successive conversions and bitwise binary operations. >>>We combine the above two cases by using a conditional convert. */ >>> (for bitop (bit_and bit_ior bit_xor) >>> (simplify >>> (bitop (convert @0) (convert? @1)) >>> (if (((TREE_CODE (@1) == INTEGER_CST >>> && INTEGRAL_TYPE_P (TREE_TYPE (@0)) >>> && int_fits_type_p (@1, TREE_TYPE (@0))) >>> || types_match (@0, @1)) >>>/* ??? This transform conflicts with fold-const.c doing >>> Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>> constants (if x has signed type, the sign bit cannot be set >>> in c). This folds extension into the BIT_AND_EXPR. >>> Restrict it to GIMPLE to avoid endless recursions. */ >>>&& (bitop != BIT_AND_EXPR || GIMPLE) >>>&& (/* That's a good idea if the conversion widens the operand, thus >>> after hoisting the conversion the operation will be narrower. >>> */ >>>TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type) >>>/* It's also a good idea if the conversion is to a non-integer >>> mode. */ >>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT >>>/* Or if the precision of TO is not the same as the precision >>> of its mode. */ >>>|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE >>> (type >>>(convert (bitop @0 (convert @1)) >>> >>> though. The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't >>> rely on @0 and @1 being integral (although conversions from float would >>> use FLOAT_EXPR), but then what is: >> >> bit_and is valid on POINTER_TYPE and vector integer types >> >>> >>>/* It's also a good idea if the conversion is to a non-integer >>> mode. */ >>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT >>> >>> letting through? MODE_PARTIAL_INT maybe, but that's a sort of integer >>> mode too. MODE_COMPLEX_INT or MODE_VECTOR_INT? I thought for those >>> it would be better to apply the scalar rules to the element type. 
>> >> I suppose extra caution ;) I think I have seen BLKmode for not naturally >> aligned integer types at least on strict-align targets? The code is >> a copy from original code in tree-ssa-forwprop.c. >> >>> Either way, having allowed all non-INT modes, using full_integral_type_p >>> for the remaining condition seems correct. >>> >>> If the feeling is that this isn't a useful abstraction, I can just update >>> each site individually to cope with variable-sized modes. >> >> I think "full_integral_type_p" is a name from which I cannot infer >> its meaning. Maybe type_has_mode_precision_p? Or >> type_matches_mode_p? > > type_has_mode_precision_p sounds good. With that name I
Re: [PATCH] detect incompatible aliases (PR c/81854)
On Thu, 17 Aug 2017, Martin Sebor wrote: > PPS To make use of this checking (and compile without the new > warnings) Glibc needs the following patch: This should be submitted to libc-alpha (independently of the GCC patch, presuming existing GCC versions accept the correct types). -- Joseph S. Myers jos...@codesourcery.com
Re: Add a full_integral_type_p helper function
Richard Biener writes: > On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford > wrote: >> There are several places that test whether: >> >> TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t)) >> >> for some integer type T. With SVE variable-length modes, this would >> need to become: >> >> TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t)) >> >> (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case). >> But rather than add the "SCALAR_" everywhere, it seemed neater to >> introduce a new helper function that tests whether T is an integral >> type that has the same number of bits as its underlying mode. This >> patch does that, calling it full_integral_type_p. >> >> It isn't possible to use TYPE_MODE in tree.h because vector_type_mode >> is defined in stor-layout.h, so for now the function accesses the mode >> field directly. After the 77-patch machine_mode series (thanks again >> Jeff for the reviews) it would use SCALAR_TYPE_MODE instead. >> >> Of the changes that didn't previously have an INTEGRAL_TYPE_P check: >> >> - for fold_single_bit_test_into_sign_test it is obvious from the >> integer_foop tests that this is restricted to integral types. >> >> - vect_recog_vector_vector_shift_pattern is inherently restricted >> to integral types. >> >> - the register_edge_assert_for_2 hunk is dominated by: >> >> TREE_CODE (val) == INTEGER_CST >> >> - the ubsan_instrument_shift hunk is preceded by an early exit: >> >> if (!INTEGRAL_TYPE_P (type0)) >> return NULL_TREE; >> >> - the second and third match.pd hunks are from: >> >> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1)) >> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1)) >>if the new mask might be further optimized. */ >> >> I'm a bit confused about: >> >> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)) >>when profitable. >>For bitwise binary operations apply operand conversions to the >>binary operation result instead of to the operands. 
This allows >>to combine successive conversions and bitwise binary operations. >>We combine the above two cases by using a conditional convert. */ >> (for bitop (bit_and bit_ior bit_xor) >> (simplify >> (bitop (convert @0) (convert? @1)) >> (if (((TREE_CODE (@1) == INTEGER_CST >> && INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> && int_fits_type_p (@1, TREE_TYPE (@0))) >> || types_match (@0, @1)) >>/* ??? This transform conflicts with fold-const.c doing >> Convert (T)(x & c) into (T)x & (T)c, if c is an integer >> constants (if x has signed type, the sign bit cannot be set >> in c). This folds extension into the BIT_AND_EXPR. >> Restrict it to GIMPLE to avoid endless recursions. */ >>&& (bitop != BIT_AND_EXPR || GIMPLE) >>&& (/* That's a good idea if the conversion widens the operand, thus >> after hoisting the conversion the operation will be narrower. >> */ >>TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type) >>/* It's also a good idea if the conversion is to a non-integer >> mode. */ >>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT >>/* Or if the precision of TO is not the same as the precision >> of its mode. */ >>|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE >> (type >>(convert (bitop @0 (convert @1)) >> >> though. The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't >> rely on @0 and @1 being integral (although conversions from float would >> use FLOAT_EXPR), but then what is: > > bit_and is valid on POINTER_TYPE and vector integer types > >> >>/* It's also a good idea if the conversion is to a non-integer >> mode. */ >>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT >> >> letting through? MODE_PARTIAL_INT maybe, but that's a sort of integer >> mode too. MODE_COMPLEX_INT or MODE_VECTOR_INT? I thought for those >> it would be better to apply the scalar rules to the element type. > > I suppose extra caution ;) I think I have seen BLKmode for not naturally > aligned integer types at least on strict-align targets? 
The code is > a copy from original code in tree-ssa-forwprop.c. > >> Either way, having allowed all non-INT modes, using full_integral_type_p >> for the remaining condition seems correct. >> >> If the feeling is that this isn't a useful abstraction, I can just update >> each site individually to cope with variable-sized modes. > > I think "full_integral_type_p" is a name from which I cannot infer > its meaning. Maybe type_has_mode_precision_p? Or > type_matches_mode_p? type_has_mode_precision_p sounds good. With that name I guess it should be written to cope with all types (even those with variable- width modes), so I think we'd need to continue using TYPE_MODE. The VECTOR_MODE_P check shoul
Re: Add a full_integral_type_p helper function
On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford wrote: > There are several places that test whether: > > TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t)) > > for some integer type T. With SVE variable-length modes, this would > need to become: > > TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t)) > > (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case). > But rather than add the "SCALAR_" everywhere, it seemed neater to > introduce a new helper function that tests whether T is an integral > type that has the same number of bits as its underlying mode. This > patch does that, calling it full_integral_type_p. > > It isn't possible to use TYPE_MODE in tree.h because vector_type_mode > is defined in stor-layout.h, so for now the function accesses the mode > field directly. After the 77-patch machine_mode series (thanks again > Jeff for the reviews) it would use SCALAR_TYPE_MODE instead. > > Of the changes that didn't previously have an INTEGRAL_TYPE_P check: > > - for fold_single_bit_test_into_sign_test it is obvious from the > integer_foop tests that this is restricted to integral types. > > - vect_recog_vector_vector_shift_pattern is inherently restricted > to integral types. > > - the register_edge_assert_for_2 hunk is dominated by: > > TREE_CODE (val) == INTEGER_CST > > - the ubsan_instrument_shift hunk is preceded by an early exit: > > if (!INTEGRAL_TYPE_P (type0)) > return NULL_TREE; > > - the second and third match.pd hunks are from: > > /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1)) > (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1)) >if the new mask might be further optimized. */ > > I'm a bit confused about: > > /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)) >when profitable. >For bitwise binary operations apply operand conversions to the >binary operation result instead of to the operands. This allows >to combine successive conversions and bitwise binary operations. 
>We combine the above two cases by using a conditional convert. */ > (for bitop (bit_and bit_ior bit_xor) > (simplify > (bitop (convert @0) (convert? @1)) > (if (((TREE_CODE (@1) == INTEGER_CST > && INTEGRAL_TYPE_P (TREE_TYPE (@0)) > && int_fits_type_p (@1, TREE_TYPE (@0))) > || types_match (@0, @1)) >/* ??? This transform conflicts with fold-const.c doing > Convert (T)(x & c) into (T)x & (T)c, if c is an integer > constants (if x has signed type, the sign bit cannot be set > in c). This folds extension into the BIT_AND_EXPR. > Restrict it to GIMPLE to avoid endless recursions. */ >&& (bitop != BIT_AND_EXPR || GIMPLE) >&& (/* That's a good idea if the conversion widens the operand, thus > after hoisting the conversion the operation will be narrower. > */ >TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type) >/* It's also a good idea if the conversion is to a non-integer > mode. */ >|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT >/* Or if the precision of TO is not the same as the precision > of its mode. */ >|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE (type >(convert (bitop @0 (convert @1)) > > though. The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't > rely on @0 and @1 being integral (although conversions from float would > use FLOAT_EXPR), but then what is: bit_and is valid on POINTER_TYPE and vector integer types > >/* It's also a good idea if the conversion is to a non-integer > mode. */ >|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT > > letting through? MODE_PARTIAL_INT maybe, but that's a sort of integer > mode too. MODE_COMPLEX_INT or MODE_VECTOR_INT? I thought for those > it would be better to apply the scalar rules to the element type. I suppose extra caution ;) I think I have seen BLKmode for not naturally aligned integer types at least on strict-align targets? The code is a copy from original code in tree-ssa-forwprop.c. 
> Either way, having allowed all non-INT modes, using full_integral_type_p > for the remaining condition seems correct. > > If the feeling is that this isn't a useful abstraction, I can just update > each site individually to cope with variable-sized modes. I think "full_integral_type_p" is a name from which I cannot infer its meaning. Maybe type_has_mode_precision_p? Or type_matches_mode_p? Does TYPE_PRECISION == GET_MODE_PRECISION imply TYPE_SIZE == GET_MODE_BITSIZE btw? Richard. > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Richard > > > 2017-08-18 Richard Sandiford > Alan Hayward > David Sherwood > > gcc/ > * tree.h (scalar_type_is_full_p): New function. > (full_integral_type_p
Re: PR81635: Use chrecs to help find related data refs
On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener wrote: > On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng wrote: >> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford >> wrote: >>> "Bin.Cheng" writes: On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford wrote: > "Bin.Cheng" writes: >> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford >> wrote: >>> "Bin.Cheng" writes: On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford wrote: > The first loop in the testcase regressed after my recent changes to > dr_analyze_innermost. Previously we would treat "i" as an iv even > for bb analysis and end up with: > >DR_BASE_ADDRESS: p or q >DR_OFFSET: 0 >DR_INIT: 0 or 4 >DR_STEP: 16 > > We now always keep the step as 0 instead, so for an int "i" we'd have: > >DR_BASE_ADDRESS: p or q >DR_OFFSET: (intptr_t) i >DR_INIT: 0 or 4 >DR_STEP: 0 > > This is also what we'd like to have for the unsigned "i", but the > problem is that strip_constant_offset thinks that the "i + 1" in > "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". > The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA > name that holds "(intptr_t) (i + 1)", meaning that the accesses no > longer seem to be related to the [i] ones. Didn't read the change in detail, so sorry if I mis-understood the issue. I made changes in scev to better fold type conversion by various sources of information, for example, vrp, niters, undefined overflow behavior etc. In theory these information should be available for other optimizers without querying scev. For the mentioned test, vrp should compute accurate range information for "i" so that we can fold (intptr_t) (i + 1) it without worrying overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 could be more expensive (especially in case of (T)(i + j)), or because the CST part is in bigger precision after conversion. But such folding is wanted in several places, e.g, IVOPTs. 
To provide such an interface, we changed tree-affine and made it do aggressive fold. I am curious if it's possible to use aff_tree to implement strip_constant_offset here since aggressive folding is wanted. After all, using additional chrec looks a little heavy wrt the simple test. >>> >>> Yeah, using aff_tree does work here when the bounds are constant. >>> It doesn't look like it works for things like: >>> >>> double p[1000]; >>> double q[1000]; >>> >>> void >>> f4 (unsigned int n) >>> { >>> for (unsigned int i = 0; i < n; i += 4) >>> { >>> double a = q[i] + p[i]; >>> double b = q[i + 1] + p[i + 1]; >>> q[i] = a; >>> q[i + 1] = b; >>> } >>> } >>> >>> though, where the bounds on the global arrays guarantee that [i + 1] >>> can't >>> overflow, even though "n" is unconstrained. The patch as posted handles >>> this case too. >> BTW is this a missed optimization in value range analysis? The range >> information for i should flow in a way like: array boundary -> niters >> -> scev/vrp. >> I think that's what niters/scev do in analysis. > > Yeah, maybe :-) It looks like the problem is that when SLP runs, > the previous VRP pass came before loop header copying, so the (single) > header has to cope with the n == 0 case. Thus we get: Ah, there are several passes in between vrp and pass_ch, not sure if any such pass depends on vrp intensively. I would suggest reordering the two passes, or adding a standalone VRP interface that updates information for the loop region after the header is copied? This is a non-trivial issue that needs to be fixed. The niters analyzer relies on simplify_using_initial_conditions heavily to get the same information, which in my opinion should be provided by VRP. Though this won't be able to obsolete simplify_using_initial_conditions because VRP is weak in symbolic range... 
>
> Visiting statement:
> i_15 = ASSERT_EXPR ;
> Intersecting
>   [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
> and
>   [0, 0]
> to
>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
> Intersecting
>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
> and
>   [0, 1000]
> to
>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
> Found new range for i_15: [0, 0]
>
> Visiting statement:
>>
Re: PR81635: Use chrecs to help find related data refs
On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng wrote:
> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford wrote:
>> "Bin.Cheng" writes:
>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford wrote:
 "Bin.Cheng" writes:
 > On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford wrote:
 >> "Bin.Cheng" writes:
 >>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford wrote:

 The first loop in the testcase regressed after my recent changes to
 dr_analyze_innermost.  Previously we would treat "i" as an iv even
 for bb analysis and end up with:

   DR_BASE_ADDRESS: p or q
   DR_OFFSET: 0
   DR_INIT: 0 or 4
   DR_STEP: 16

 We now always keep the step as 0 instead, so for an int "i" we'd have:

   DR_BASE_ADDRESS: p or q
   DR_OFFSET: (intptr_t) i
   DR_INIT: 0 or 4
   DR_STEP: 0

 This is also what we'd like to have for the unsigned "i", but the
 problem is that strip_constant_offset thinks that the "i + 1" in
 "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
 The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
 name that holds "(intptr_t) (i + 1)", meaning that the accesses no
 longer seem to be related to the [i] ones.

>>> Didn't read the change in detail, so sorry if I misunderstood the issue.
>>> I made changes in scev to better fold type conversions using various
>>> sources of information, for example VRP, niters, undefined overflow
>>> behavior, etc.  In theory this information should be available to
>>> other optimizers without querying scev.  For the mentioned test, VRP
>>> should compute an accurate range for "i" so that we can fold
>>> (intptr_t) (i + 1) without worrying about overflow.  Note we don't do
>>> it in generic folding because (intptr_t) (i) + 1 could be more
>>> expensive (especially in the case of (T)(i + j)), or because the CST
>>> part is in bigger precision after conversion.
>>> But such folding is wanted in several places, e.g., IVOPTs.  To
>>> provide such an interface, we changed tree-affine and made it do
>>> aggressive folding.  I am curious whether it's possible to use
>>> aff_tree to implement strip_constant_offset here, since aggressive
>>> folding is wanted.  After all, using an additional chrec looks a
>>> little heavyweight wrt the simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>>   double p[1000];
>>   double q[1000];
>>
>>   void
>>   f4 (unsigned int n)
>>   {
>>     for (unsigned int i = 0; i < n; i += 4)
>>       {
>>         double a = q[i] + p[i];
>>         double b = q[i + 1] + p[i + 1];
>>         q[i] = a;
>>         q[i + 1] = b;
>>       }
>>   }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1]
>> can't overflow, even though "n" is unconstrained.  The patch as posted
>> handles this case too.
> BTW is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array boundary -> niters
> -> scev/vrp.
> I think that's what niters/scev do in analysis.

Yeah, maybe :-)  It looks like the problem is that when SLP runs,
the previous VRP pass came before loop header copying, so the (single)
header has to cope with the n == 0 case.  Thus we get:

>>> Ah, there are several passes in between vrp and pass_ch; I'm not sure
>>> if any such pass depends on vrp intensively.  I would suggest
>>> reordering the two passes, or a standalone VRP interface updating
>>> information for the loop region after the header is copied.  This is
>>> a non-trivial issue that needs to be fixed.  The niters analyzer
>>> relies heavily on simplify_using_initial_conditions to get the same
>>> information, which in my opinion should be provided by VRP.  Though
>>> this won't be able to obsolete simplify_using_initial_conditions,
>>> because VRP is weak in symbolic ranges...
>>> Visiting statement:
>>> i_15 = ASSERT_EXPR ;
>>> Intersecting
>>>   [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>> and
>>>   [0, 0]
>>> to
>>>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>> Intersecting
>>>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>> and
>>>   [0, 1000]
>>> to
>>>   [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>> Found new range for i_15: [0, 0]
>>>
>>> Visiting statement:
>>> _3 = i_15 + 1;
>>> Match-and-simplified i_15 + 1 to 1
>>> Intersecting
>>>   [1, 1]
>>> and
>>>   [0, +INF]
>>> to
>>>   [1, 1]
>>> Found new range for _3: [1
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On 08/18/2017 11:30 AM, Richard Biener wrote:
> On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška wrote:
>> On 08/14/2017 10:32 AM, Richard Biener wrote:
>>> Hmm, but the existing "lowering" part is called from the
>>> switch-conversion pass.  So I'm not sure a new file is good.
>>
>> Good, I'm not against having that in a single file.  So the new version
>> of the patch does that.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>
> Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
> If not can you please factor out the data structure and common code?
> (case.[Ch]?)

You are right.  As we'll generate just the jump table in stmt.c, the proper
fix is to remove all usages of 'case_node' in the file, because simple
iteration over the labels will work fine.  Let me do it incrementally to
minimize fallout :)

Martin

>
> Thanks,
> Richard.
>
>> Martin
Re: [PATCH] Fix file find utils and add unit tests (PR driver/81829).
On 08/15/2017 02:45 PM, Martin Liška wrote:
> Hi.
>
> As shown in the PR, the remove_prefix function is written wrongly.  It
> does not distinguish between a node in the linked list and
> pprefix->plist.  So I decided to rewrite it.
> Apart from that, I identified a discrepancy between do_add_prefix and
> prefix_from_string, where the latter always appends DIR_SEPARATOR (if
> missing).  So I also fixed the former function.
> And I'm adding unit tests for all functions in file-find.c.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2017-08-14  Martin Liska
>
> 	PR driver/81829
> 	* file-find.c (do_add_prefix): Always append DIR_SEPARATOR
> 	at the end of a prefix.
> 	(remove_prefix): Properly remove elements and accept also
> 	path without a trailing DIR_SEPARATOR.
> 	(purge): New function.
> 	(file_find_verify_prefix_creation): Likewise.
> 	(file_find_verify_prefix_add): Likewise.
> 	(file_find_verify_prefix_removal): Likewise.
> 	(file_find_c_tests): Likewise.
> 	* selftest-run-tests.c (selftest::run_tests): Add new
> 	file_find_c_tests.
> 	* selftest.h (file_find_c_tests): Likewise.
> ---
>  gcc/file-find.c          | 182 ++-
>  gcc/selftest-run-tests.c |   1 +
>  gcc/selftest.h           |   1 +
>  3 files changed, 167 insertions(+), 17 deletions(-)

Hi.

As doko pointed out, the first version was not correct.  Let me describe
2 scenarios which should be supported:

1) My original motivation, where I configure gcc w/ --prefix=/my_bin and
I manually create in /my_bin/bin:

$ ln -s gcc-ar ar
$ ln -s gcc-nm nm
$ ln -s gcc-ranlib ranlib

and then set PATH=/my_bin/bin:... and LD_LIBRARY_PATH, and I don't have
to care about NM=gcc-nm and so on.  That's how I usually test a newly
built GCC.

2) Using NM=gcc-nm and so on will force usage of the LTO plugin.  That's
what was broken in the first version of the patch.
In order to support both cases we need to identify the real name of the
GCC wrapper (like /my_bin/bin/gcc-ar), and when find_a_file (&path,
real_exe_name, X_OK) points to the same file, the directory should be
ignored from the prefix.

Can you please test the attached patch in your scenario?

Thanks,
Martin

>From f8029ed6d3dd444ee2608146118f2189cf9ef0d8 Mon Sep 17 00:00:00 2001
From: marxin
Date: Mon, 14 Aug 2017 13:56:32 +0200
Subject: [PATCH] Fix file find utils and add unit tests (PR driver/81829).

gcc/ChangeLog:

2017-08-14  Martin Liska

	PR driver/81829
	* file-find.c (do_add_prefix): Always append DIR_SEPARATOR
	at the end of a prefix.
	(remove_prefix): Properly remove elements and accept also
	path without a trailing DIR_SEPARATOR.
	(purge): New function.
	(file_find_verify_prefix_creation): Likewise.
	(file_find_verify_prefix_add): Likewise.
	(file_find_verify_prefix_removal): Likewise.
	(file_find_c_tests): Likewise.
	* selftest-run-tests.c (selftest::run_tests): Add new
	file_find_c_tests.
	* selftest.h (file_find_c_tests): Likewise.
---
 gcc/file-find.c          | 182 ++-
 gcc/gcc-ar.c             |  19 +++--
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h           |   1 +
 4 files changed, 179 insertions(+), 24 deletions(-)

diff --git a/gcc/file-find.c b/gcc/file-find.c
index b072a4993d7..23a883042a2 100644
--- a/gcc/file-find.c
+++ b/gcc/file-find.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "filenames.h"
 #include "file-find.h"
+#include "selftest.h"

 static bool debug = false;

@@ -126,11 +127,22 @@ do_add_prefix (struct path_prefix *pprefix, const char *prefix, bool first)
   /* Keep track of the longest prefix.  */
   len = strlen (prefix);
+  bool append_separator = !IS_DIR_SEPARATOR (prefix[len - 1]);
+  if (append_separator)
+    len++;
+
   if (len > pprefix->max_len)
     pprefix->max_len = len;

   pl = XNEW (struct prefix_list);
-  pl->prefix = xstrdup (prefix);
+  char *dup = XCNEWVEC (char, len + 1);
+  memcpy (dup, prefix, append_separator ? len - 1 : len);
+  if (append_separator)
+    {
+      dup[len - 1] = DIR_SEPARATOR;
+      dup[len] = '\0';
+    }
+  pl->prefix = dup;

   if (*prev)
     pl->next = *prev;
@@ -212,34 +224,170 @@ prefix_from_string (const char *p, struct path_prefix *pprefix)
 void
 remove_prefix (const char *prefix, struct path_prefix *pprefix)
 {
-  struct prefix_list *remove, **prev, **remove_prev = NULL;
+  char *dup = NULL;
   int max_len = 0;
+  size_t len = strlen (prefix);
+  if (prefix[len - 1] != DIR_SEPARATOR)
+    {
+      char *dup = XNEWVEC (char, len + 2);
+      memcpy (dup, prefix, len);
+      dup[len] = DIR_SEPARATOR;
+      dup[len + 1] = '\0';
+      prefix = dup;
+    }

   if (pprefix->plist)
     {
-      prev = &pprefix->plist;
-      for (struct prefix_list *pl = pprefix->plist; pl->next; pl = pl->next)
+      prefix_li
Re: [PATCH] [PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining
On Tue, Aug 15, 2017 at 1:16 PM, Richard Biener wrote:
> On Sat, Aug 12, 2017 at 11:09 AM, Pierre-Marie de Rodat wrote:
>> On 08/11/2017 11:29 PM, Jason Merrill wrote:
>>>
>>> OK.
>>
>> Committed.  Thank you for your sustained review effort, Jason. :-)
>
> The way you use decl_ultimate_origin conflicts with the early LTO debug
> patches, which make dwarf2out_abstract_function call set_decl_origin_self
> and thus the assert in gen_typedef_die triggers (and the rest probably
> misbehaves).
>
> Now I wonder whether we at any point need that self-origin?
>
> Currently it's set via
>
> static dw_die_ref
> gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>               dw_die_ref context_die)
> {
> ...
>     case FUNCTION_DECL:
> #if 0
>       /* FIXME */
>       /* This doesn't work because the C frontend sets DECL_ABSTRACT_ORIGIN
>          on local redeclarations of global functions.  That seems broken.  */
>       if (current_function_decl != decl)
>         /* This is only a declaration.  */;
> #endif
>
>       /* If we're emitting a clone, emit info for the abstract instance.  */
>       if (origin || DECL_ORIGIN (decl) != decl)
>         dwarf2out_abstract_function (origin
>                                      ? DECL_ORIGIN (origin)
>                                      : DECL_ABSTRACT_ORIGIN (decl));
>
>       /* If we're emitting an out-of-line copy of an inline function,
>          emit info for the abstract instance and set up to refer to it.  */
>       else if (cgraph_function_possibly_inlined_p (decl)
>                && ! DECL_ABSTRACT_P (decl)
>                && ! class_or_namespace_scope_p (context_die)
>                /* dwarf2out_abstract_function won't emit a die if this is
>                   just a declaration.  We must avoid setting
>                   DECL_ABSTRACT_ORIGIN in that case, because that works
>                   only if we have a die.  */
>                && DECL_INITIAL (decl) != NULL_TREE)
>         {
>           dwarf2out_abstract_function (decl);
>           set_decl_origin_self (decl);
>         }
>
> ok, not doing this at all doesn't work, doing it only in the above case
> neither.
>
> Bah.
>
> Can anyone explain to me why we do the set_decl_origin_self dance?

Ok, so I need the following incremental patch to fix the fallout.
This allows Ada LTO bootstrap to succeed with the early LTO debug patches.

I assume this change is ok on top of the LTO debug patches unless I hear
otherwise by Monday (when I will then finally commit the series).

Full bootstrap/testing running now.

Thanks,
Richard.

2017-08-18  Richard Biener

	* dwarf2out.c (modified_type_die): Check for self origin before
	recursing.
	(gen_type_die_with_usage): Likewise.
	(gen_typedef_die): Allow self origin.
	* tree.c (variably_modified_type_p): Guard against Ada recursive
	pointer types.
Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space
On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay wrote:
> On 28.07.2017 09:34, Richard Biener wrote:
>>
>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay wrote:
>>>
>>> On 27.07.2017 14:34, Richard Biener wrote:
>>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay wrote:
>>>>>
>>>>> For some targets, the best place to put read-only lookup tables as
>>>>> generated by -ftree-switch-conversion is not the generic address space
>>>>> but some target specific address space.
>>>>>
>>>>> This is the case for AVR, where for most devices .rodata must be
>>>>> located in RAM.
>>>>>
>>>>> Part #1 adds a new, optional target hook that queries the back-end
>>>>> about the desired address space.  There is currently only one user:
>>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type
>>>>> and array_type if the address space returned by the backend is not
>>>>> the generic one.
>>>>>
>>>>> Part #2 is the AVR part that implements the new hook and adds some
>>>>> sugar around it.
>>>>
>>>> Given that switch-conversion just creates a constant initializer
>>>> doesn't AVR benefit from handling those uniformly (at RTL expansion?).
>>>> Not sure but I think it goes through the regular constant pool
>>>> handling.
>>>>
>>>> Richard.
>>>
>>> avr doesn't use constant pools because they are not profitable for
>>> several reasons.
>>>
>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>> also have to patch *all* accesses to also use the exact same address
>>> space so that they use the correct instruction (LPM instead of LD).
>>
>> Sure.  So then not handle it in constant pool handling but in expanding
>> the initializers of global vars.  I think the entry for this is
>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>
> That cannot work, and here is why; just for completeness:
>
> cgraphunit.c::symbol_table::compile()
>
> runs
>
> ...
> expand_all_functions ();
> output_variables ();  // runs assemble_variable
> ...
> So any patching of DECL_RTL will result in wrong code: The address
> space of the decl won't match the address space used by the code
> accessing decl.

Ok, so it's more tricky to hack it that way (you'd need to catch the time
the decl gets its DECL_RTL set, not sure where that happens for globals).

That leaves doing a more broad transform.  One convenient place to hook in
would be the ipa_single_use pass which computes the
varpool_node.used_by_single_function flag.  All such variables would be
suitable for promotion (with some additional constraints I suppose).
You'd add a transform phase to the pass that rewrites the decls and
performs GIMPLE IL adjustments (I think you need to patch memory
references to use the address-space).

Of course there would need to be a target hook determining if/what
section/address-space a variable should be promoted to, which somehow
would allow switch-conversion to use that as well.  Ho humm.

That said, do you think the switch-conversion created decl is the only
one that benefits from compiler-directed promotion to a different
address-space?

Richard.

> Johann
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška wrote:
> On 08/14/2017 10:32 AM, Richard Biener wrote:
>> Hmm, but the existing "lowering" part is called from the
>> switch-conversion pass.  So I'm not sure a new file is good.
>
> Good, I'm not against having that in a single file.  So the new version
> of the patch does that.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
If not can you please factor out the data structure and common code?
(case.[Ch]?)

Thanks,
Richard.

> Martin
Re: [GCC RFC]Expensive internal function calls.
On Fri, Aug 18, 2017 at 10:45 AM, Bin Cheng wrote:
> Hi,
> As a followup patch for the fix to PR81832, this patch considers internal
> function calls to IFN_LOOP_DIST_ALIAS and IFN_LOOP_VECTORIZED as
> expensive.  Or interpreted in another way: return false since we
> shouldn't be asked the question?  Any comments?
> BTW, I have no problem with dropping this if it's not appropriate.

I think, as the meaning is to cost the actual call, making it expensive is
wrong given it will end up as a constant.  So, no, this doesn't look
correct.

Richard.

> Thanks,
> bin
> 2017-08-16  Bin Cheng
>
> 	* gimple.c (gimple_inexpensive_call_p): Consider IFN_LOOP_DIST_ALIAS
> 	and IFN_LOOP_VECTORIZED as expensive calls.
[GCC RFC]Expensive internal function calls.
Hi,
As a followup patch for the fix to PR81832, this patch considers internal
function calls to IFN_LOOP_DIST_ALIAS and IFN_LOOP_VECTORIZED as expensive.
Or interpreted in another way: return false since we shouldn't be asked the
question?  Any comments?
BTW, I have no problem with dropping this if it's not appropriate.

Thanks,
bin
2017-08-16  Bin Cheng

	* gimple.c (gimple_inexpensive_call_p): Consider IFN_LOOP_DIST_ALIAS
	and IFN_LOOP_VECTORIZED as expensive calls.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index c4e6f81..6d4e376 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -3038,7 +3038,16 @@ bool
 gimple_inexpensive_call_p (gcall *stmt)
 {
   if (gimple_call_internal_p (stmt))
-    return true;
+    {
+      /* Some internal function calls are only meant to indicate temporary
+	 arrangement of optimization and are never used in code generation.
+	 We always consider these calls expensive.  */
+      if (gimple_call_internal_p (stmt, IFN_LOOP_DIST_ALIAS)
+	  || gimple_call_internal_p (stmt, IFN_LOOP_VECTORIZED))
+	return false;
+
+      return true;
+    }
   tree decl = gimple_call_fndecl (stmt);
   if (decl && is_inexpensive_builtin (decl))
     return true;
Re: [PATCH] PR c++/80287 add new testcase
On 4 August 2017 at 15:52, Yvan Roux wrote:
> On 13 July 2017 at 14:02, Yvan Roux wrote:
>> Hi,
>>
>> as discussed in the PR, this patch adds a new reduced testcase which
>> doesn't rely on c++17 features; this is a prereq to the backport of
>> the fix into the GCC 6 branch, which is impacted by this issue.
>>
>> Validated on x86, ARM and AArch64 targets.
>>
>> Ok for trunk?  And maybe on gcc-7-branch?
>
> ping

ping
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00730.html

>> Thanks,
>> Yvan
>>
>> gcc/testsuite
>> 2017-07-13  Yvan Roux
>>
>> 	PR c++/80287
>> 	* g++.dg/pr80287.C: New test.
Re: [Libgomp, Fortran] Fix canadian cross build
On 4 August 2017 at 15:52, Yvan Roux wrote:
> On 11 July 2017 at 12:25, Yvan Roux wrote:
>> On 3 July 2017 at 11:21, Yvan Roux wrote:
>>> On 23 June 2017 at 15:44, Yvan Roux wrote:
>>>> Hello,
>>>>
>>>> Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
>>>> missing in a canadian cross build, at least when the target gfortran
>>>> compiler comes from PATH and not from GFORTRAN_FOR_TARGET.
>>>>
>>>> Back in 2010, an executability test of GFORTRAN was added to fix the
>>>> libgomp build on cygwin, but when the executable doesn't contain the
>>>> path, "test -x" fails and parts of the library are not built.
>>>>
>>>> This patch fixes the issue by using the M4 macro AC_PATH_PROG (which
>>>> returns the absolute name) instead of AC_CHECK_PROG in the function
>>>> defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed
>>>> it to NCN_STRICT_PATH_TARGET_TOOLS to keep the semantics used in M4.
>>>>
>>>> Tested by building cross and canadian cross toolchains (host:
>>>> i686-w64-mingw32) for arm-linux-gnueabihf with the issue and with a
>>>> complete libgomp.
>>>>
>>>> ok for trunk?
>>>
>>> ping?
>>
>> ping?
>
> ping

ping
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html

>>>> Thanks
>>>> Yvan
>>>>
>>>> config/ChangeLog
>>>> 2017-06-23  Yvan Roux
>>>>
>>>> 	* acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
>>>> 	(NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the
>>>> 	replacement of AC_CHECK_PROG by AC_PATH_PROG to get the absolute
>>>> 	name of the program.
>>>> 	(ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
>>>>
>>>> ChangeLog
>>>> 2017-06-23  Yvan Roux
>>>>
>>>> 	* configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
>>>> 	NCN_STRICT_CHECK_TARGET_TOOLS.
>>>> 	* configure: Regenerate.
Add a partial_integral_type_p helper function
This patch adds a partial_integral_type_p function, to go along with the
full_integral_type_p added by the previous patch.

Of the changes that didn't previously have an INTEGRAL_TYPE_P check:

- the convert_to_integer_1 hunks are dominated by a case version of
  INTEGRAL_TYPE_P.

- the merge_ranges hunk is dominated by an ENUMERAL_TYPE case.

- vectorizable_reduction has the comment:

    /* Do not try to vectorize bit-precision reductions.  */

  and so I think was only concerned with integers.

- vectorizable_assignment has the comment:

    /* We do not handle bit-precision changes.  */

  and the later:

    /* But a conversion that does not change the bit-pattern is ok.  */
    && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
	  > TYPE_PRECISION (TREE_TYPE (op)))
	 && TYPE_UNSIGNED (TREE_TYPE (op)))

  would only make sense if OP is also an integral type.

- vectorizable_shift is inherently restricted to integers.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

2017-08-17  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* tree.h (partial_integral_type_p): New function.
	* convert.c (convert_to_integer_1): Use it.
	* expr.c (store_field, expand_expr_real_2, expand_expr_real_1):
	Likewise.
	* fold-const.c (merge_ranges): Likewise.
	* tree-ssa-math-opts.c (convert_mult_to_fma): Likewise.
	* tree-tailcall.c (process_assignment): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Likewise.
	* tree-vect-stmts.c (vectorizable_conversion): Likewise.
	(vectorizable_assignment, vectorizable_shift): Likewise.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-08-18 08:35:58.031690315 +0100
+++ gcc/tree.h	2017-08-18 08:36:07.208306339 +0100
@@ -5414,4 +5414,13 @@ full_integral_type_p (const_tree t)
   return INTEGRAL_TYPE_P (t) && scalar_type_is_full_p (t);
 }

+/* Return true if T is an integral type that has fewer bits than
+   its underlying mode.  */
+
+inline bool
+partial_integral_type_p (const_tree t)
+{
+  return INTEGRAL_TYPE_P (t) && !scalar_type_is_full_p (t);
+}
+
 #endif  /* GCC_TREE_H  */

Index: gcc/convert.c
===================================================================
--- gcc/convert.c	2017-08-10 14:36:09.015436664 +0100
+++ gcc/convert.c	2017-08-18 08:36:07.203306339 +0100
@@ -711,8 +711,7 @@ convert_to_integer_1 (tree type, tree ex
 	 the signed-to-unsigned case the high-order bits have to be cleared.  */
       if (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (TREE_TYPE (expr))
-	  && (TYPE_PRECISION (TREE_TYPE (expr))
-	      != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (expr)))))
+	  && partial_integral_type_p (TREE_TYPE (expr)))
	code = CONVERT_EXPR;
       else
	code = NOP_EXPR;
@@ -725,7 +724,7 @@ convert_to_integer_1 (tree type, tree ex
	 type corresponding to its mode, then do a nop conversion
	 to TYPE.  */
       else if (TREE_CODE (type) == ENUMERAL_TYPE
-	       || outprec != GET_MODE_PRECISION (TYPE_MODE (type)))
+	       || partial_integral_type_p (type))
	{
	  expr = convert (lang_hooks.types.type_for_mode (TYPE_MODE (type),
							  TYPE_UNSIGNED (type)),
			  expr);

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-08-03 10:40:54.807600276 +0100
+++ gcc/expr.c	2017-08-18 08:36:07.204306339 +0100
@@ -6834,8 +6834,7 @@ store_field (rtx target, HOST_WIDE_INT b
       if (nop_def)
	{
	  tree type = TREE_TYPE (exp);
-	  if (INTEGRAL_TYPE_P (type)
-	      && TYPE_PRECISION (type) < GET_MODE_BITSIZE (TYPE_MODE (type))
+	  if (partial_integral_type_p (type)
	      && bitsize == TYPE_PRECISION (type))
	    {
	      tree op = gimple_assign_rhs1 (nop_def);
@@ -8243,8 +8242,7 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
   /* An operation in what may be a bit-field type needs the
      result to be reduced to the precision of the bit-field type,
      which is narrower than that of the type's mode.  */
-  reduce_bit_field = (INTEGRAL_TYPE_P (type)
-		      && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
+  reduce_bit_field = partial_integral_type_p (type);

   if (reduce_bit_field && modifier == EXPAND_STACK_PARM)
     target = 0;
@@ -9669,9 +9667,7 @@ expand_expr_real_1 (tree exp, rtx target
       /* An operation in what may be a bit-field type needs the
	 result to be reduced to the precision of the bit-field type,
	 which is narrower than that of the type's mode.  */
-      reduce_bit_field = (!ignore
-			  && INTEGRAL_TYPE_P (type)
-			  && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
+      reduce_bit_field = (!ignore && partial_integral_type_p (type));

      /* If we a
Add a full_integral_type_p helper function
There are several places that test whether:

  TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t))

for some integer type T.  With SVE variable-length modes, this would need
to become:

  TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t))

(or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case).  But
rather than add the "SCALAR_" everywhere, it seemed neater to introduce
a new helper function that tests whether T is an integral type that has
the same number of bits as its underlying mode.  This patch does that,
calling it full_integral_type_p.

It isn't possible to use TYPE_MODE in tree.h because vector_type_mode is
defined in stor-layout.h, so for now the function accesses the mode field
directly.  After the 77-patch machine_mode series (thanks again Jeff for
the reviews) it would use SCALAR_TYPE_MODE instead.

Of the changes that didn't previously have an INTEGRAL_TYPE_P check:

- for fold_single_bit_test_into_sign_test it is obvious from the
  integer_foop tests that this is restricted to integral types.

- vect_recog_vector_vector_shift_pattern is inherently restricted to
  integral types.

- the register_edge_assert_for_2 hunk is dominated by:

    TREE_CODE (val) == INTEGER_CST

- the ubsan_instrument_shift hunk is preceded by an early exit:

    if (!INTEGRAL_TYPE_P (type0))
      return NULL_TREE;

- the second and third match.pd hunks are from:

    /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
       (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
       if the new mask might be further optimized.  */

I'm a bit confused about:

  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
     when profitable.
     For bitwise binary operations apply operand conversions to the
     binary operation result instead of to the operands.  This allows
     to combine successive conversions and bitwise binary operations.
     We combine the above two cases by using a conditional convert.  */
  (for bitop (bit_and bit_ior bit_xor)
   (simplify
    (bitop (convert @0) (convert? @1))
    (if (((TREE_CODE (@1) == INTEGER_CST
	   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
	   && int_fits_type_p (@1, TREE_TYPE (@0)))
	  || types_match (@0, @1))
	 /* ???  This transform conflicts with fold-const.c doing
	    Convert (T)(x & c) into (T)x & (T)c, if c is an integer
	    constants (if x has signed type, the sign bit cannot be set
	    in c).  This folds extension into the BIT_AND_EXPR.
	    Restrict it to GIMPLE to avoid endless recursions.  */
	 && (bitop != BIT_AND_EXPR || GIMPLE)
	 && (/* That's a good idea if the conversion widens the operand, thus
		after hoisting the conversion the operation will be narrower.  */
	     TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
	     /* It's also a good idea if the conversion is to a non-integer
		mode.  */
	     || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
	     /* Or if the precision of TO is not the same as the precision
		of its mode.  */
	     || TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE (type))))
     (convert (bitop @0 (convert @1)))))

though.  The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't
rely on @0 and @1 being integral (although conversions from float would
use FLOAT_EXPR), but then what is:

    /* It's also a good idea if the conversion is to a non-integer
       mode.  */
    || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT

letting through?  MODE_PARTIAL_INT maybe, but that's a sort of integer
mode too.  MODE_COMPLEX_INT or MODE_VECTOR_INT?  I thought for those it
would be better to apply the scalar rules to the element type.

Either way, having allowed all non-INT modes, using full_integral_type_p
for the remaining condition seems correct.  If the feeling is that this
isn't a useful abstraction, I can just update each site individually to
cope with variable-sized modes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

2017-08-18  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* tree.h (scalar_type_is_full_p): New function.
	(full_integral_type_p): Likewise.
	* fold-const.c (fold_single_bit_test_into_sign_test): Likewise.
	* match.pd: Likewise.
	* tree-ssa-forwprop.c (simplify_rotate): Likewise.
	* tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern)
	(vect_recog_mult_pattern, vect_recog_divmod_pattern): Likewise.
	(adjust_bool_pattern): Likewise.
	* tree-vrp.c (register_edge_assert_for_2): Likewise.
	* ubsan.c (instrument_si_overflow): Likewise.

gcc/c-family/
	* c-ubsan.c (ubsan_instrument_shift): Use full_integral_type_p.

diff --git a/gcc/c-family/c-ubsan.c b/gcc/c-family/c-ubsan.c
index b1386db..20f78e7 100644
--- a/gcc/c-family/c-ubsan.c
+++ b/gcc/c-family/c-ubsan.c
@@ -131,8 +131,8 @@ ub
Re: [PATCH v2] Simplify pow with constant
On Thu, Aug 17, 2017 at 5:43 PM, Alexander Monakov wrote:
> On Thu, 17 Aug 2017, Wilco Dijkstra wrote:
>
>> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
>
> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
> return 1.0 in that case, but x * C1 == NaN).  There's another existing
> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is
> not a new problem.
>
> The whole set of these match.pd transforms is guarded by
> flag_unsafe_math_optimizations, which is a bit strange; on the one hand
> it does not include -ffinite-math-only, but on the other hand it's
> defined broadly enough to imply that.

No, flag_unsafe_math_optimizations doesn't imply -ffinite-math-only.
-funsafe-math-optimizations these days should guard transforms that affect
the value of the result (but not its classification).  There are more
specific variants like -fassociative-math which also affect results, but
in a more specific way.

> (to be clear, I'm not objecting to the patch, just pointing out an
> existing inconsistency in case someone can offer clarifications)

We shouldn't add such issues knowingly; I guess for this case it's easy to
check that C is not +Inf.  For the existing transforms with the same issue,
guarding with -ffinite-math-only is an appropriate fix.

Richard.

> Alexander