[patch, committed] Bug 79841 - Inconsistent diagnostics
Committed as obvious after regression testing.

Regards, Jerry

Author: jvdelisle
Date: Sat Mar 4 03:13:34 2017
New Revision: 245891

URL: https://gcc.gnu.org/viewcvs?rev=245891&root=gcc&view=rev
Log:
2017-03-03  Jerry DeLisle

	PR fortran/79841
	* openmp.c (check_symbol_not_pointer): Adjust diagnostic.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/openmp.c
New German PO file for 'gcc' (version 7.1-b20170226)
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted by the German
team of translators.  The file is available at:

    http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.1-b20170226.de.po', has just now been sent to you in a
separate email.)

All other PO files for your package are available in:

    http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

    http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

                                The Translation Project robot, in the
                                name of your translation coordinator.
Re: [RFC] VEC_SELECT sanity checking in genrecog
On Fri, Mar 03, 2017 at 02:58:06PM -0600, Segher Boessenkool wrote:
> On Fri, Mar 03, 2017 at 03:43:03PM -0500, Michael Meissner wrote:
> > On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> > > On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > > Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> > > Mike, you know this code best, what do you think?
> >
> > Bill Schmidt added these, but he is gone for the day.
> >
> > No, these patterns need to use VSX_LE, but V1TI should be moved from
> > VSX_LE to VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit
> > floating point and 128-bit integers that can't use vec_select.  In fact
> > there is this comment:
> >
> > ;; Little endian word swapping for 128-bit types that are either scalars or the
> > ;; special V1TI container class, which it is not appropriate to use vec_select
> > ;; for the type.
>
> Okay, but there are also these comments:
>
> ;; Iterator for the 2 64-bit vector types
> (define_mode_iterator VSX_D [V2DF V2DI])
>
> ;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with
> ;; lxvd2x to properly handle swapping words on little endian
> (define_mode_iterator VSX_LE [V2DF V2DI V1TI])
>
> so that removing V1TI from VSX_LE makes it exactly equal to VSX_D.

Yep, VSX_LE would now be redundant.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [RFC] VEC_SELECT sanity checking in genrecog
On Fri, Mar 03, 2017 at 03:43:03PM -0500, Michael Meissner wrote:
> On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> > On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> > Mike, you know this code best, what do you think?
>
> Bill Schmidt added these, but he is gone for the day.
>
> No, these patterns need to use VSX_LE, but V1TI should be moved from VSX_LE to
> VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit floating point and
> 128-bit integers that can't use vec_select.  In fact there is this comment:
>
> ;; Little endian word swapping for 128-bit types that are either scalars or the
> ;; special V1TI container class, which it is not appropriate to use vec_select
> ;; for the type.

Okay, but there are also these comments:

;; Iterator for the 2 64-bit vector types
(define_mode_iterator VSX_D [V2DF V2DI])

;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with
;; lxvd2x to properly handle swapping words on little endian
(define_mode_iterator VSX_LE [V2DF V2DI V1TI])

so that removing V1TI from VSX_LE makes it exactly equal to VSX_D.


Segher
Re: [RFC] VEC_SELECT sanity checking in genrecog
On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> Hi,
>
> On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > E.g. in vsx.md the thing is that
> > the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode,
> > and uses exactly two elements in parallel, which doesn't make any sense
> > for V1TI.
> >
> > ../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1
> > ../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1
> > ../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1
>
> Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> Mike, you know this code best, what do you think?

Bill Schmidt added these, but he is gone for the day.

No, these patterns need to use VSX_LE, but V1TI should be moved from VSX_LE to
VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit floating point and
128-bit integers that can't use vec_select.  In fact there is this comment:

;; Little endian word swapping for 128-bit types that are either scalars or the
;; special V1TI container class, which it is not appropriate to use vec_select
;; for the type.
(define_insn "*vsx_le_permute_<mode>"
  [(set (match_operand:VSX_LE_128 0 "nonimmediate_operand" "=<VSa>,<VSa>,Z")
	(rotate:VSX_LE_128
	 (match_operand:VSX_LE_128 1 "input_operand" "<VSa>,Z,<VSa>")
	 (const_int 64)))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
  "@
   xxpermdi %x0,%x1,%x1,2
   lxvd2x %x0,%y1
   stxvd2x %x1,%y0"
  [(set_attr "length" "4")
   (set_attr "type" "vecperm,vecload,vecstore")])

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [RFC] VEC_SELECT sanity checking in genrecog
Hi,

On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> E.g. in vsx.md the thing is that
> the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode,
> and uses exactly two elements in parallel, which doesn't make any sense
> for V1TI.
>
> ../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1
> ../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1
> ../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1

Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
Mike, you know this code best, what do you think?


Segher
[committed] Small C OpenMP diagnostics fix (PR c/79837)
Hi!

In both cases, starting with the UDR additions we are looking for any
identifier; both min and max are identifiers, so there is no need to mention
them explicitly.  They are just kind of like predefined UDRs.

Committed to trunk.

2017-03-03  Jakub Jelinek

	PR c/79837
	* c-parser.c (c_parser_omp_clause_reduction): Don't mention
	%<min%> or %<max%> in the diagnostics, instead mention identifier.
	(c_parser_omp_declare_reduction): Don't mention %<min%> in
	the diagnostics.

--- gcc/c/c-parser.c.jj	2017-03-03 20:53:36.0 +0100
+++ gcc/c/c-parser.c	2017-03-03 21:13:35.058836286 +0100
@@ -12112,7 +12112,7 @@ c_parser_omp_clause_reduction (c_parser
	default:
	  c_parser_error (parser,
			  "expected %<+%>, %<*%>, %<-%>, %<&%>, "
-			  "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or %<max%>");
+			  "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
	  return list;
	}
@@ -16960,7 +16960,7 @@ c_parser_omp_declare_reduction (c_parser
     default:
       c_parser_error (parser,
		      "expected %<+%>, %<*%>, %<-%>, %<&%>, "
-		      "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or identifier");
+		      "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
       goto fail;
     }

	Jakub
[committed] Fix 3 C FE diagnostic messages (PR c/79836)
Hi!

Missed % before > in 3 spots (strangely, all of them in the C FE).  The
first hunk committed as obvious, the other two fall under OpenMP
maintenance; committed to trunk.

2017-03-03  Jakub Jelinek

	PR c/79836
	* c-parser.c (c_parser_generic_selection): Use %<_Generic%>
	instead of %<_Generic>.
	(c_parser_omp_ordered): Use %<depend%> instead of %<depend>.
	(c_parser_omp_target_exit_data): Use %<delete%> instead of
	%<delete>.

--- gcc/c/c-parser.c.jj	2017-02-28 16:24:05.0 +0100
+++ gcc/c/c-parser.c	2017-03-03 20:53:36.394531548 +0100
@@ -7452,7 +7452,7 @@ c_parser_generic_selection (c_parser *pa
   else
     {
       error_at (assoc.type_location,
-		"%<_Generic> selector matches multiple associations");
+		"%<_Generic%> selector matches multiple associations");
       inform (matched_assoc.type_location,
	       "other match is here");
     }
@@ -15454,7 +15454,7 @@ c_parser_omp_ordered (c_parser *parser,
   if (context == pragma_stmt)
     {
       error_at (loc,
-		"%<#pragma omp ordered%> with %<depend> clause may "
+		"%<#pragma omp ordered%> with %<depend%> clause may "
		"only be used in compound statements");
       c_parser_skip_to_pragma_eol (parser, false);
       return false;
@@ -16297,7 +16297,7 @@ c_parser_omp_target_exit_data (location_
	map_seen |= 1;
       error_at (OMP_CLAUSE_LOCATION (*pc),
		"%<#pragma omp target exit data%> with map-type other "
-		"than %<from%>, %<release%> or %<delete> on %
[PATCH] Add " && !(MEM_P (operands[0]) && MEM_P (operands[1]))" to condition of some sse.md instructions
On Fri, Mar 03, 2017 at 12:18:09PM +0100, Uros Bizjak wrote:
> Yes. Although expander takes care not to generate two memory
> references, combine can propagate memory to the other operand,
> creating semi-invalid RTX that is later resolved by RA.

Here is a patch which does that.  Bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek

	* config/i386/sse.md (sse_storehps, sse_storelps, avx__,
	avx512f__, avx512f__256): Require in condition that at least one
	operand is not a MEM.

--- gcc/config/i386/sse.md.jj	2017-03-03 17:08:53.0 +0100
+++ gcc/config/i386/sse.md	2017-03-03 17:39:05.907871328 +0100
@@ -6637,7 +6637,7 @@ (define_insn "sse_storehps"
	(vec_select:V2SF
	  (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o")
	  (parallel [(const_int 2) (const_int 3)])))]
-  "TARGET_SSE"
+  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
    %vmovhps\t{%1, %0|%q0, %1}
    %vmovhlps\t{%1, %d0|%d0, %1}
@@ -6690,7 +6690,7 @@ (define_insn "sse_storelps"
	(vec_select:V2SF
	  (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m")
	  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
+  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
    %vmovlps\t{%1, %0|%q0, %1}
    %vmovaps\t{%1, %0|%0, %1}
@@ -18661,7 +18661,7 @@ (define_insn_and_split "avx_
	 1 "nonimmediate_operand" "xm,x")]
	UNSPEC_CAST))]
-  "TARGET_AVX"
+  "TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
@@ -19575,7 +19575,7 @@ (define_insn_and_split "avx512f_
	 1 "nonimmediate_operand" "xm,x")]
	UNSPEC_CAST))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
@@ -19592,7 +19592,7 @@ (define_insn_and_split "avx512f_
	 1 "nonimmediate_operand" "xm,x")]
	UNSPEC_CAST))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]

	Jakub
[PATCH] Fix avx512 vpermq (PR target/79812)
Hi!

vpermq/vpermpd instructions for 512-bit vectors use bogus RTL and if we
happen to simplify-rtx.c it, we ICE.  The problem is that for V8D[IF]mode
VEC_SELECT we need to use a PARALLEL with 8 elements, not 4.

The _vec_dup_1 change is unrelated to this, spotted first by manual
inspection and verified by the genrecog.c verify_pattern patch; the
broadcast wants to broadcast the first element, so it should be a scalar
vec_select in vec_duplicate, rather than same size vector as vec_select's
operand.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek

	PR target/79812
	* config/i386/sse.md (VI8F_256_512): Remove mode iterator.
	(_perm): Rename to ...
	(avx2_perm): ... this.  Use VI8F_256 iterator instead of
	VI8F_256_512.
	(_perm_mask): Rename to ...
	(avx512vl_perm_mask): ... this.  Use VI8F_256 iterator instead
	of VI8F_256_512.
	(_perm_1): Rename to ...
	(avx2_perm_1

 >> 0) & 3),
-				   GEN_INT ((mask >> 2) & 3),
-				   GEN_INT ((mask >> 4) & 3),
-				   GEN_INT ((mask >> 6) & 3)));
+  emit_insn (gen_avx2_perm_1 (operands[0], operands[1],
+			      GEN_INT ((mask >> 0) & 3),
+			      GEN_INT ((mask >> 2) & 3),
+			      GEN_INT ((mask >> 4) & 3),
+			      GEN_INT ((mask >> 6) & 3)));
   DONE;
 })
 
-(define_expand "_perm_mask"
-  [(match_operand:VI8F_256_512 0 "register_operand")
-   (match_operand:VI8F_256_512 1 "nonimmediate_operand")
+(define_expand "avx512vl_perm_mask"
+  [(match_operand:VI8F_256 0 "register_operand")
+   (match_operand:VI8F_256 1 "nonimmediate_operand")
    (match_operand:SI 2 "const_0_to_255_operand")
-   (match_operand:VI8F_256_512 3 "vector_move_operand")
+   (match_operand:VI8F_256 3 "vector_move_operand")
    (match_operand: 4 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512VL"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen__perm_1_mask (operands[0], operands[1],
-			       GEN_INT ((mask >> 0) & 3),
-			       GEN_INT ((mask >> 2) & 3),
-			       GEN_INT ((mask >> 4) & 3),
-			       GEN_INT ((mask >> 6) & 3),
-			       operands[3], operands[4]));
+			       GEN_INT ((mask >> 0) & 3),
+			       GEN_INT ((mask >> 2) & 3),
+			       GEN_INT ((mask >> 4) & 3),
+			       GEN_INT ((mask >> 6) & 3),
+			       operands[3], operands[4]));
   DONE;
 })
 
-(define_insn "_perm_1"
-  [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
-	(vec_select:VI8F_256_512
-	  (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
+(define_insn "avx2_perm_1"
+  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+	(vec_select:VI8F_256
+	  (match_operand:VI8F_256 1 "nonimmediate_operand" "vm")
	  (parallel [(match_operand 2 "const_0_to_3_operand")
		     (match_operand 3 "const_0_to_3_operand")
		     (match_operand 4 "const_0_to_3_operand")
@@ -17361,6 +17359,77 @@
[C++ PATCH] Fix ICE on empty statement expression (PR c++/79822)
Hi!

Seems callers of cxx_eval_constant_expression don't really like if it
returns NULL, which is what cxx_eval_statement_list returns for an empty
STATEMENT_LIST.  In statement expressions we treat those the same as
(void) 0 or any other statement with void value.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek

	PR c++/79822
	* constexpr.c (cxx_eval_statement_list): Treat empty ({ }) like
	({ (void) 0; }).

	* g++.dg/cpp0x/constexpr-79822.C: New test.

--- gcc/cp/constexpr.c.jj	2017-02-22 23:03:12.0 +0100
+++ gcc/cp/constexpr.c	2017-03-03 12:42:02.391773591 +0100
@@ -3725,8 +3725,9 @@ cxx_eval_statement_list (const constexpr
 {
   tree_stmt_iterator i;
   tree local_target;
-  /* In a statement-expression we want to return the last value.  */
-  tree r = NULL_TREE;
+  /* In a statement-expression we want to return the last value.
+     For empty statement expression return void_node.  */
+  tree r = void_node;
   if (!jump_target)
     {
       local_target = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp0x/constexpr-79822.C.jj	2017-03-03 12:49:41.595614494 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-79822.C	2017-03-03 12:50:51.753673495 +0100
@@ -0,0 +1,12 @@
+// PR c++/79822
+// { dg-do compile }
+// { dg-options "" }
+
+bool
+foo ()
+{
+  bool a = ({ }) && false;		// { dg-error "could not convert" }
+  bool b = ({ ; }) && false;		// { dg-error "could not convert" }
+  bool c = ({ (void) 1; }) && false;	// { dg-error "could not convert" }
+  return a && b && c;
+}

	Jakub
Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)
On 03/03/2017 02:33 PM, Marek Polacek wrote:

2017-03-03  Marek Polacek

	PR c/79758
	* c-decl.c (store_parm_decls_oldstyle): Check if the element of
	current_function_prototype_arg_types is error_mark_node.  Fix
	formatting.  Use TREE_VALUE instead of TREE_TYPE.

	* gcc.dg/noncompile/pr79758.c: New test.

Ok.


Bernd
[maintainer-scripts] gcc_release: replace MD5 hashing
Looking how our snapshots are secured in terms of checksums, I noticed
we're still using MD5 and SHA-1 hashes.  Which is unfortunate, given that
MD5 has been considered weak for what, a decade?, and SHA-1 has been
considered weak for years as well and now been demonstrated broken for
real.

So I went ahead and added SHA-256 as new primary hash, demoting SHA-1 to
secondary, and removing MD5.

I also updated the copy of this script on gcc.gnu.org and will have a
closer look at coming snapshots.

Gerald

2017-03-03  Gerald Pfeifer

	* gcc_release (snapshot_print): Replace md5 checksum by sha256.

Index: gcc_release
===================================================================
--- gcc_release	(revision 245874)
+++ gcc_release	(revision 245875)
@@ -356,7 +356,7 @@
 # Print description if snapshot exists.
 snapshot_print() {
   if [ -e ${RELEASE}/$1 ]; then
-    hash=`openssl md5 ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`
+    hash=`openssl sha256 ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`
     hash2=`openssl sha1 ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`

     printf "  %-37s%s\n\n  %s\n  %s\n\n" "$1" "$2" "$hash" "$hash2" \
[PATCH], PR target/79038, Improve char/short -> _Float128 conversions
On January 31st I committed a patch that improves conversion of
signed/unsigned char/short values to 32-bit and 64-bit floating point on
Power9, particularly when the values are coming from memory.  This adds
similar support to _Float128/__float128 (i.e. IEEE 128-bit floating
point).

I have tested this patch via bootstrap and make check and there were no
regressions.  Can I check this patch into the trunk?

[gcc]
2017-03-03  Michael Meissner

	PR target/79038
	* config/rs6000/rs6000.md (float2): Define insns to convert from
	signed/unsigned char/short to IEEE 128-bit floating point.
	(floatuns2): Likewise.

[gcc/testsuite]
2017-02-24  Michael Meissner

	PR target/79038
	* gcc.target/powerpc/pr79038-1.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 245815)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -14467,6 +14467,43 @@ (define_insn_and_split "float_si2_
     operands[2] = gen_reg_rtx (DImode);
 })
 
+(define_insn_and_split "float2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+	(float:IEEE128 (match_operand:QHI 1 "nonimmediate_operand" "v,r,Z")))
+   (clobber (match_scratch:DI 2 "=X,r,X"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx dest_di = gen_rtx_REG (DImode, REGNO (dest));
+
+  if (altivec_register_operand (src, mode))
+    emit_insn (gen_extenddi2 (dest_di, src));
+  else if (int_reg_operand (src, mode))
+    {
+      rtx ext_di = operands[2];
+      emit_insn (gen_extenddi2 (ext_di, src));
+      emit_move_insn (dest_di, ext_di);
+    }
+  else if (MEM_P (src))
+    {
+      rtx dest_qhi = gen_rtx_REG (mode, REGNO (dest));
+      emit_move_insn (dest_qhi, src);
+      emit_insn (gen_extenddi2 (dest_di, dest_qhi));
+    }
+  else
+    gcc_unreachable ();
+
+  emit_insn (gen_float_di2_hw (dest, dest_di));
+  DONE;
+}
+  [(set_attr "length" "8,12,12")
+   (set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 (define_insn "floatuns_di2_hw"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
	(unsigned_float:IEEE128
@@ -14493,6 +14530,38 @@ (define_insn_and_split "floatuns_s
     operands[2] = gen_reg_rtx (DImode);
 })
 
+(define_insn_and_split "floatuns2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+	(unsigned_float:IEEE128
+	 (match_operand:QHI 1 "nonimmediate_operand" "v,r,Z")))
+   (clobber (match_scratch:DI 2 "=X,r,X"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx dest_di = gen_rtx_REG (DImode, REGNO (dest));
+
+  if (altivec_register_operand (src, mode) || MEM_P (src))
+    emit_insn (gen_zero_extenddi2 (dest_di, src));
+  else if (int_reg_operand (src, mode))
+    {
+      rtx ext_di = operands[2];
+      emit_insn (gen_zero_extenddi2 (ext_di, src));
+      emit_move_insn (dest_di, ext_di);
+    }
+  else
+    gcc_unreachable ();
+
+  emit_insn (gen_floatuns_di2_hw (dest, dest_di));
+  DONE;
+}
+  [(set_attr "length" "8,12,8")
+   (set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 ;; IEEE 128-bit instructions with round to odd semantics
 (define_insn "*truncdf2_odd"
   [(set (match_operand:DF 0 "vsx_register_operand" "=v")
Index: gcc/testsuite/gcc.target/powerpc/pr79038-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79038-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79038-1.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 -mfloat128" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+TYPE
+return_convert_uchar_to_float128_mem (unsigned char *p_uc)
+{
+  return (TYPE) p_uc[10];
+}
+
+TYPE
+return_convert_schar_to_float128_mem (signed char *p_sc)
+{
+  return (TYPE) p_sc[10];
+}
+
+TYPE
+return_convert_ushort_to_float128_mem (unsigned short *p_us)
+{
+  return (TYPE) p_us[10];
+}
+
+TYPE
+return_convert_sshort_to_float128_mem (short *p_ss)
+{
+  return (TYPE) p_ss[10];
+}
+
+/* { dg-final { scan-assembler {\mlxsi[bh]zx\M} } } */
+/* { dg-final { scan-assembler {\mvexts[bh]2d\M} } } */
+/* { dg-final { scan-assembler-not {\mextsb\M} } } */
+/* { dg-final { scan-assembler-not {\ml[bh][az]\M} } } */
+/* { dg-final { scan-assembler-not
Re: [PATCH] [MSP430] PR78838: Do not add section name prefixes when section name is .lowtext
ping

The MSP430 target supports the automatic placement of functions and data in
different memory regions when passing the "-mdata-region=either" and
"-mcode-region=either" options.  MSP430x devices support the large memory
model, "-mlarge", which enables 20-bit pointers; however, the vector table
across all MSP430 targets only accepts 16-bit pointers.  To prevent the use
of 20-bit pointers in the vector table when the large memory model is used,
the MSP430 backend currently places functions specified with the interrupt
attribute in a section called ".lowtext".  This section is placed at the
beginning of .text in the MSP430 linker scripts.

PR target/78838 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78838) reports
that a function with the interrupt attribute is placed in a non-existent
section ".either.lowtext" when "-mlarge", "-mcode-region=either" and
"-ffunction-sections" are passed.  The backend has correctly decided to
place the function in .lowtext, but has applied the .either prefix, which is
undesirable.  No additional .lower/.upper/.either prefixes should be applied
to the section name once it has been placed in .lowtext; the patch below
implements this.

I've built and tested successfully with no regressions reported:
  Target is msp430-unknown-elf
  Host is x86_64-unknown-linux-gnu
  Testsuite variations:
    msp430-sim/-mcpu=msp430x/-mlarge/-mdata-region=either/-mcode-region=either

The section .lowtext will only be utilised if the large memory model is
used, which is why I have only tested with this testsuite variation.  I
haven't run the g++ testsuite because msp430-elf doesn't build with C++
support at the moment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79242

I don't have write access to the GCC SVN repository, so if this patch is
satisfactory, I would appreciate it if someone could commit it for me.
Thanks.
---

2017-02-XX  Jozef Lawrynowicz

gcc/
	PR target/78838
	* config/msp430/msp430.c (gen_prefix): Return NULL when section
	name is .lowtext
	(msp430_section_is_lowtext): New function.

gcc/testsuite
	PR target/78838
	* gcc.target/msp430/interrupt_fn_placement.c: New test

From: Jozef Lawrynowicz
Date: Wed, 15 Feb 2017 13:03:40 +0000
Subject: [PATCH] [MSP430] PR78838: Do not add section name prefixes when the
 section name is .lowtext

---
 gcc/config/msp430/msp430.c                               | 14 ++++++++++++++
 gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c |  9 +++++++++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 6f63116..c9dffb1 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1808,6 +1808,15 @@ is_critical_func (tree decl = current_function_decl)
   return has_attr (ATTR_CRIT, decl);
 }
 
+static bool
+msp430_section_is_lowtext (tree decl = current_function_decl)
+{
+  if (decl == NULL_TREE)
+    return false;
+  const char * dec_name = DECL_SECTION_NAME (decl);
+  return dec_name && strcmp (".lowtext", dec_name) == 0;
+}
+
 #undef  TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
 #define TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS msp430_allocate_stack_slots_for_args
@@ -2146,6 +2155,11 @@ gen_prefix (tree decl)
   if (has_attr ("section", decl))
     return NULL;
 
+  /* If the function has been put in the .lowtext section because it is an interrupt
+   * handler, and the large memory model is used, then do not add any prefixes. */
+  if (msp430_section_is_lowtext (decl))
+    return NULL;
+
   /* If the object has __attribute__((lower)) then use the ".lower." prefix.
 */
   if (has_attr (ATTR_LOWER, decl))
     return lower_prefix;
diff --git a/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c b/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c
new file mode 100644
index 000..d5d2113
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c
@@ -0,0 +1,9 @@
+/* { dg-do link } */
+/* { dg-options "-mlarge -mcode-region=either -ffunction-sections" } */
+
+void __attribute__((interrupt(2))) ir_1(void) {
+}
+
+int main(void) {
+  while(1);
+}
-- 
1.8.3.1
Re: [PATCH] [ARC] Handle complex PIC move patterns.
* Claudiu Zissulescu [2017-02-28 16:59:58 +0100]:

> Hi,
>
> fwprop step is placing in the REG_EQUIV notes constant pic unspecs
> expressions. Then, loop may use these notes for optimizations
> rezulting in complex patterns that are not supported by the current
> implementation.
>
> The patch adds handling of complex PIC addresses having MINUS or UNARY
> operations.
>
> Ok to apply?
> Claudiu

There's a couple of spelling mistakes I've marked inline, but otherwise,
this seems sensible.

Thanks,
Andrew

>
> gcc/
> 2017-02-28  Claudiu Zissulescu
>
> 	* config/arc/arc.c (arc_legitimize_pic_address): Handle PIC
> 	expressions with MINUS and UNARY ops.
>
> gcc/testsuite
> 2017-02-28  Claudiu Zissulescu
>
> 	* gcc.target/arc/pr9001090948.c: New file.
> ---
>  gcc/config/arc/arc.c                        | 49 -
>  gcc/testsuite/gcc.target/arc/pr9001090948.c | 25 +++
>  2 files changed, 73 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/pr9001090948.c
>
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 508a9a6..3b94d7c 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -5024,8 +5024,55 @@ arc_legitimize_pic_address (rtx orig, rtx oldx)
> 	  /* Check that the unspec is one of the ones we generate?  */
> 	  return orig;
> 	}
> +      /* fwprop is placing in the REG_EQUIV notes constant pic
> +	 unspecs expressions.  Then, loop may use these notes for
> +	 optimizations rezulting in complex patterns that are not

s/rezulting/resulting/

> +	 supported by the current implementation.  The following
> +	 two if-cases are simplifying the complex patters in

s/patters in/patterns to/

> +	 simpler ones.
 */
> +      else if (GET_CODE (addr) == MINUS)
> +	{
> +	  rtx op0 = XEXP (addr, 0);
> +	  rtx op1 = XEXP (addr, 1);
> +	  gcc_assert (oldx);
> +	  gcc_assert (GET_CODE (op1) == UNSPEC);
> +
> +	  emit_move_insn (oldx,
> +			  gen_rtx_CONST (SImode,
> +					 arc_legitimize_pic_address (op1,
> +								     NULL_RTX)));
> +	  emit_insn (gen_rtx_SET (oldx, gen_rtx_MINUS (SImode, op0, oldx)));
> +	  return oldx;
> +
> +	}
> +      else if (GET_CODE (addr) != PLUS)
> +	{
> +	  rtx tmp = XEXP (addr, 0);
> +	  enum rtx_code code = GET_CODE (addr);
> +
> +	  /* It only works for UNARY operations.  */
> +	  gcc_assert (UNARY_P (addr));
> +	  gcc_assert (GET_CODE (tmp) == UNSPEC);
> +	  gcc_assert (oldx);
> +
> +	  emit_move_insn
> +	    (oldx,
> +	     gen_rtx_CONST (SImode,
> +			    arc_legitimize_pic_address (tmp,
> +							NULL_RTX)));
> +
> +	  emit_insn (gen_rtx_SET (oldx,
> +				  gen_rtx_fmt_ee (code, SImode,
> +						  oldx, const0_rtx)));
> +
> +	  return oldx;
> +	}
>        else
> -	gcc_assert (GET_CODE (addr) == PLUS);
> +	{
> +	  gcc_assert (GET_CODE (addr) == PLUS);
> +	  if (GET_CODE (XEXP (addr, 0)) == UNSPEC)
> +	    return orig;
> +	}
>      }
>
>    if (GET_CODE (addr) == PLUS)
> diff --git a/gcc/testsuite/gcc.target/arc/pr9001090948.c b/gcc/testsuite/gcc.target/arc/pr9001090948.c
> new file mode 100644
> index 000..103f4ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/pr9001090948.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "ARC600 doesn't support pic" { arc6xx } } */
> +/* { dg-options "-Os -fPIC" } */
> +#include <stddef.h>
> +#include <string.h>
> +
> +char *
> +strip_trail (const char str[], size_t n)
> +{
> +  static char buf[1025];
> +  int j;
> +
> +  strncpy (buf, str, n);
> +  buf[n] = '\0';
> +
> +  for (j = strlen (buf) - 1; j >= 0; j--)
> +    {
> +      if (buf[j] != ' ')
> +	break;
> +
> +      buf[j] = '\0';
> +    }
> +
> +  return buf;
> +}
> --
> 1.9.1
>
Re: Fix (work around) LRA infinite loop, PR78911
On 03/03/2017 08:36 AM, Bernd Schmidt wrote:
> Reload is designed in a way to avoid cycles and to process all reloads
> for an insn in order of ascending class so as to avoid this kind of
> issue.  With LRA I'm really not sure how to fix this properly, but the
> following patch seems to cure the PR at least, by recognizing when we're
> about to spill a reload reg whose assigned class contains only a single
> register.

I had conversations with several people who worked on different RAs and
came to the conclusion that looping is a pretty general problem in
iterative classical RAs.  Unfortunately, it is not discussed in the
literature and ad-hoc approaches are used (that is probably why it is not
discussed in scientific literature).

In the early stages of LRA development, looping was a very frequent bug,
and ad-hoc approaches were used there as well.  The major way is to
predict that a given decision can result in repeating patterns and to
avoid such a decision (e.g. lra_constraints.c::process_alt_operands
contains such code).  Reasons for looping can have different natures, and
right now I have no systematic solution in mind that avoids all of them.

> Bootstrapped and tested on x86_64-linux, ok?

Yes.  Thank you for working on this, Bernd.  The issue mentioned in the
FIXME part of the comment can be addressed later if we find cases for
which your fix is not enough.
C++ PATCH to fix ICE with NSDMI and this pointer (PR c++/79796)
We weren't replacing PLACEHOLDER_EXPRs in the following testcase, leading
to a crash in the gimplifier.  The fix seems to be to use
replace_placeholders; the question is where.  These three functions have
it:
  cp_gimplify_init_expr
  store_init_value
  build_new_1
But we call none of them here, so I tried adding the call to
build_over_call, right after we create the MODIFY_EXPR with the NSDMI, and
that seemed to work out.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-03-03  Marek Polacek

	PR c++/79796 - ICE with NSDMI and this pointer
	* call.c (build_over_call): Handle NSDMI with a 'this' by calling
	replace_placeholders.

	* g++.dg/cpp0x/nsdmi13.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index dc629b96..b821224 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -8047,6 +8047,9 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
	{
	  arg = cp_build_indirect_ref (arg, RO_NULL, complain);
	  val = build2 (MODIFY_EXPR, TREE_TYPE (to), to, arg);
+	  if (cxx_dialect >= cxx14)
+	    /* Handle NSDMI that refer to the object being initialized.  */
+	    replace_placeholders (arg, to);
	}
       else
	{
diff --git gcc/testsuite/g++.dg/cpp0x/nsdmi13.C gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
index e69de29..2751da3 100644
--- gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
+++ gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
@@ -0,0 +1,13 @@
+// PR c++/79796
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  A* p = this;
+};
+
+void foo()
+{
+  A a;
+  a = A({});
+}

	Marek
[RFC] VEC_SELECT sanity checking in genrecog
Hi! When working on PR79812, which was caused by bugs in an x86 define_insn (it used vec_select with a parallel as the last operand that had an incorrect number of operands), I wrote the following sanity check. The thing is that e.g. simplify-rtx.c already has such assertions: if (!VECTOR_MODE_P (mode)) { gcc_assert (VECTOR_MODE_P (GET_MODE (trueop0))); gcc_assert (mode == GET_MODE_INNER (GET_MODE (trueop0))); gcc_assert (GET_CODE (trueop1) == PARALLEL); gcc_assert (XVECLEN (trueop1, 0) == 1); gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0))); ... else { gcc_assert (VECTOR_MODE_P (GET_MODE (trueop0))); gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (GET_MODE (trueop0))); gcc_assert (GET_CODE (trueop1) == PARALLEL); if (GET_CODE (trueop0) == CONST_VECTOR) { int elt_size = GET_MODE_UNIT_SIZE (mode); unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size); rtvec v = rtvec_alloc (n_elts); unsigned int i; gcc_assert (XVECLEN (trueop1, 0) == (int) n_elts); ... and the documentation says: @findex vec_select @item (vec_select:@var{m} @var{vec1} @var{selection}) This describes an operation that selects parts of a vector. @var{vec1} is the source vector, and @var{selection} is a @code{parallel} that contains a @code{const_int} for each of the subparts of the result vector, giving the number of the source subpart that should be stored into it. The result mode @var{m} is either the submode for a single element of @var{vec1} (if only one subpart is selected), or another vector mode with that element submode (if multiple subparts are selected). Unfortunately, even after fixing 2 x86 bugs with that, I'm seeing tons of issues on other targets (many others are ok, though) below. So, is the verification the right thing to do, and are all of these md bugs that should be fixed (with the verification added afterwards)? If not, I'd be afraid that, if people are unlucky and the combiner or some other pass constant-propagates something, we can ICE on those assertions in simplify-rtx.c. E.g. 
in vsx.md the thing is that the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode, and uses exactly two elements in the parallel, which doesn't make any sense for V1TI. ../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1 ../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1 ../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1 ../../gcc/config/aarch64/aarch64-simd.md:79:1: DImode of first vec_select operand is not a vector mode ../../gcc/config/aarch64/aarch64-simd.md:79:1: DFmode of first vec_select operand is not a vector mode ../../gcc/config/aarch64/aarch64-simd.md:588:1: DImode of first vec_select operand is not a vector mode ../../gcc/config/aarch64/aarch64-simd.md:588:1: DFmode of first vec_select operand is not a vector mode ../../gcc/config/aarch64/aarch64-simd.md:3192:1: DFmode of first vec_select operand is not a vector mode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand SImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch 
between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand SImode ../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand SImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select HImode and its operand QImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand QImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and its operand HImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand HImode ../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and
[PATCH][AArch64] Fix bootstrap due to wide_int .elt (0) uninit warning
Hi all, This patch fixes the aarch64 bootstrap failure I've encountered. Richi suggested on IRC that we should be using .ulow () on the wide_int instead of accessing .elt (0), since the latter doesn't play well with the uninit analysis. Bootstrapped and tested on aarch64-none-linux-gnu. Committing to trunk in the interest of fixing the build. 2017-03-03 Kyrylo Tkachov  * config/aarch64/aarch64.c (aarch64_float_const_representable_p): Use wide_int::ulow () instead of .elt (0). diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ab1bdc0..714bb79 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -12496,7 +12496,7 @@ aarch64_float_const_representable_p (rtx x) /* If the low part of the mantissa has bits set we cannot represent the value. */ - if (w.elt (0) != 0) + if (w.ulow () != 0) return false; /* We have rejected the lower HOST_WIDE_INT, so update our understanding of how many bits lie in the mantissa and
[PATCH, i386]: Fix a couple of split insn constraints
Hello! While looking in this area, I've noticed that there are constraints that allow wide immediate operands, which result in unsupported immediate->memory moves on 64-bit targets. Since these immediates have to be limited to 32 bits, we have to use the C constraint, which allows only the all-bits-zero 0.0 FP value. 2017-03-03 Uros Bizjak  * config/i386/i386.md (*pushtf): Change *roF constraint to *roC. (*pushxf): Limit oF constraint to 32bit targets and add oC constraint for 64bit targets. (pushxf splitter): Use PUSH_ROUNDING to calculate stack adjustment. (*pushdf): Change rmF constraint to rmC. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and committed. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 245874) +++ config/i386/i386.md (working copy) @@ -2992,7 +2992,7 @@ (define_insn "*pushtf" [(set (match_operand:TF 0 "push_operand" "=<,<") - (match_operand:TF 1 "general_no_elim_operand" "v,*roF"))] + (match_operand:TF 1 "general_no_elim_operand" "v,*roC"))] "TARGET_64BIT || TARGET_SSE" { /* This insn should be already split before reg-stack. */ @@ -3046,17 +3046,18 @@ (symbol_ref "true")))]) (define_insn "*pushxf" - [(set (match_operand:XF 0 "push_operand" "=<,<,<,<") - (match_operand:XF 1 "general_no_elim_operand" "f,r,*r,oF"))] + [(set (match_operand:XF 0 "push_operand" "=<,<,<,<,<") + (match_operand:XF 1 "general_no_elim_operand" "f,r,*r,oF,oC"))] "" { /* This insn should be already split before reg-stack. 
*/ gcc_unreachable (); } - [(set_attr "type" "multi") - (set_attr "unit" "i387,*,*,*") + [(set_attr "isa" "*,*,*,nox64,x64") + (set_attr "type" "multi") + (set_attr "unit" "i387,*,*,*,*") (set (attr "mode") - (cond [(eq_attr "alternative" "1,2,3") + (cond [(eq_attr "alternative" "1,2,3,4") (if_then_else (match_test "TARGET_64BIT") (const_string "DI") (const_string "SI")) @@ -3075,7 +3076,7 @@ [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2))) (set (match_dup 0) (match_dup 1))] { - operands[2] = GEN_INT (-GET_MODE_SIZE (XFmode)); + operands[2] = GEN_INT (-PUSH_ROUNDING (GET_MODE_SIZE (XFmode))); /* Preserve memory attributes. */ operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx); }) @@ -3082,7 +3083,7 @@ (define_insn "*pushdf" [(set (match_operand:DF 0 "push_operand" "=<,<,<,<,<,<") - (match_operand:DF 1 "general_no_elim_operand" "f,r,*r,oF,rmF,x"))] + (match_operand:DF 1 "general_no_elim_operand" "f,r,*r,oF,rmC,x"))] "" { /* This insn should be already split before reg-stack. */
Re: [PATCH] PR libstdc++/79789 fix non-reserved names in headers
This patch caused a new regression on AIX. - David FAIL: 17_intro/names.cc (test for excess errors) Excess errors: /tmp/GCC/gcc/include-fixed/sys/types.h:600: error: expected unqualified-id before '[' token /tmp/GCC/gcc/include-fixed/sys/types.h:600: error: expected ')' before '[' token /tmp/GCC/gcc/include-fixed/sys/types.h:668: error: expected unqualified-id before '[' token /tmp/GCC/gcc/include-fixed/sys/types.h:668: error: expected ')' before '[' token /tmp/GCC/gcc/include-fixed/stdlib.h:332: error: 'parameter' declared as function returning a function /tmp/GCC/gcc/include-fixed/stdlib.h:332: error: expected primary-expression before 'void' /tmp/GCC/gcc/include-fixed/math.h:1458: error: expected unqualified-id before ',' token /tmp/GCC/gcc/include-fixed/math.h:1458: error: expected ')' before ',' token AIX types.h:600 is typedef struct { int r[1]; } * physadr_t; stdlib.h:332 is extern int at_quick_exit(void (*f)(void)); math.h:1458 is struct dbl_hypot { double x, y; };
Re: [PR79793] Fixes interrupt stack alignment
On Fri, Mar 3, 2017 at 4:20 PM, Koval, Julia wrote: > Hi, > This patch fixes PR79731. Ok for trunk? > > gcc/ > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Fix > boundary for interrupt ...: Set incoming stack boundary to 128 for 64-bit targets. > gcc/testsuite/ > * gcc.target/i386/interrupt-12.c: Fix test. > * gcc.target/i386/interrupt-13.c: Ditto. > * gcc.target/i386/interrupt-14.c: Ditto. > * gcc.target/i386/interrupt-15.c: Ditto. ...: Update scan-assembler-times directives. Please also add "PR target/79731" to the top of the ChangeLog entry. OK with the above ChangeLog changes. Thanks, Uros.
[PR79793] Fixes interrupt stack alignment
Hi, This patch fixes PR79731. Ok for trunk? gcc/ * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Fix boundary for interrupt gcc/testsuite/ * gcc.target/i386/interrupt-12.c: Fix test. * gcc.target/i386/interrupt-13.c: Ditto. * gcc.target/i386/interrupt-14.c: Ditto. * gcc.target/i386/interrupt-15.c: Ditto. Julia [attachment: patch_interrupt_3.3]
[PATCH] libbacktrace: Fix compile warning building testcase
Hello, I hit this compile warning running `make check` on AIX. Since -Werror is enabled by default here, it causes a build failure. Tested on powerpc-ibm-aix7.2.0.0. This is my first patch submission for GCC, so please let me know if I need to do anything differently. Thanks Sam >From a0d73b85afd593aafd50e454d9a86acd9f0bc489 Mon Sep 17 00:00:00 2001 From: Sam Thursfield Date: Fri, 3 Mar 2017 08:38:42 -0600 Subject: [PATCH] libbacktrace: Fix compile warning building testcase This warning occurs when the BACKTRACE_SUPPORTED flag is undefined. ../../../gcc/libbacktrace/btest.c:624:1: error: 'test5' defined but not used [-Werror=unused-function] test5 (void) ^ cc1: all warnings being treated as errors libbacktrace/: 2017-03-03 Sam Thursfield * btest.c (test5): Fix compile warning when BACKTRACE_SUPPORTED isn't defined. --- libbacktrace/btest.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/libbacktrace/btest.c b/libbacktrace/btest.c index e28a3d8..bed547a 100644 --- a/libbacktrace/btest.c +++ b/libbacktrace/btest.c @@ -616,6 +616,7 @@ f33 (int f1line, int f2line) return failures; } +#if BACKTRACE_SUPPORTED #if BACKTRACE_SUPPORTS_DATA int global = 1; @@ -687,6 +688,7 @@ test5 (void) } #endif /* BACKTRACE_SUPPORTS_DATA */ +#endif /* BACKTRACE_SUPPORTED */ static void error_callback_create (void *data ATTRIBUTE_UNUSED, const char *msg, -- 2.2.2
Fix IRA issue, PR79728
This is an ICE where setup_pressure_classes fails if xmm0 is a global reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only SSE_FIRST_REG as the third register class. The problem is that the costs for moving between SSE_FIRST_REG and SSE_REGS are inflated because we think we have no available registers in SSE_FIRST_REG (since the only register in that class is global), and that somewhat confuses the algorithm. The following fixes it by tweaking contains_regs_of_mode. Out of caution, I've retained the old meaning for the code in reload which uses this. Bootstrapped and tested on x86_64-linux. Ok? Bernd PR rtl-optimization/79728 * regs.h (struct target_regs): New field x_contains_allocatable_regs_of_mode. (contains_allocatable_regs_of_mode): New macro. * reginfo.c (init_reg_sets_1): Initialize it, and change contains_reg_of_mode so it includes global regs as well. * reload.c (push_reload): Use contains_allocatable_regs_of_mode rather than contains_regs_of_mode. PR rtl-optimization/79728 * gcc.target/i386/sse-globalreg.c: New test. 
Index: gcc/reginfo.c === --- gcc/reginfo.c (revision 245685) +++ gcc/reginfo.c (working copy) @@ -468,19 +468,29 @@ init_reg_sets_1 (void) memset (contains_reg_of_mode, 0, sizeof (contains_reg_of_mode)); for (m = 0; m < (unsigned int) MAX_MACHINE_MODE; m++) { - HARD_REG_SET ok_regs; + HARD_REG_SET ok_regs, ok_regs2; CLEAR_HARD_REG_SET (ok_regs); + CLEAR_HARD_REG_SET (ok_regs2); for (j = 0; j < FIRST_PSEUDO_REGISTER; j++) - if (!fixed_regs [j] && HARD_REGNO_MODE_OK (j, (machine_mode) m)) - SET_HARD_REG_BIT (ok_regs, j); + if (!TEST_HARD_REG_BIT (fixed_nonglobal_reg_set, j) + && HARD_REGNO_MODE_OK (j, (machine_mode) m)) + { + SET_HARD_REG_BIT (ok_regs, j); + if (!fixed_regs[j]) + SET_HARD_REG_BIT (ok_regs2, j); + } for (i = 0; i < N_REG_CLASSES; i++) if ((targetm.class_max_nregs ((reg_class_t) i, (machine_mode) m) <= reg_class_size[i]) && hard_reg_set_intersect_p (ok_regs, reg_class_contents[i])) { - contains_reg_of_mode [i][m] = 1; - have_regs_of_mode [m] = 1; + contains_reg_of_mode[i][m] = 1; + if (hard_reg_set_intersect_p (ok_regs2, reg_class_contents[i])) + { + have_regs_of_mode[m] = 1; + contains_allocatable_reg_of_mode[i][m] = 1; + } } } } Index: gcc/regs.h === --- gcc/regs.h (revision 245685) +++ gcc/regs.h (working copy) @@ -218,6 +218,10 @@ struct target_regs { /* 1 if the corresponding class contains a register of the given mode. */ char x_contains_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE]; + /* 1 if the corresponding class contains a register of the given mode + which is not global and can therefore be allocated. */ + char x_contains_allocatable_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE]; + /* Record for each mode whether we can move a register directly to or from an object of that mode in memory. If we can't, we won't try to use that mode directly when accessing a field of that mode. 
*/ @@ -243,6 +247,8 @@ extern struct target_regs *this_target_r (this_target_regs->x_have_regs_of_mode) #define contains_reg_of_mode \ (this_target_regs->x_contains_reg_of_mode) +#define contains_allocatable_reg_of_mode \ + (this_target_regs->x_contains_allocatable_reg_of_mode) #define direct_load \ (this_target_regs->x_direct_load) #define direct_store \ Index: gcc/reload.c === --- gcc/reload.c (revision 245685) +++ gcc/reload.c (working copy) @@ -1055,7 +1055,7 @@ push_reload (rtx in, rtx out, rtx *inloc #ifdef CANNOT_CHANGE_MODE_CLASS && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass) #endif - && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))] + && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (in))] && (CONSTANT_P (SUBREG_REG (in)) || GET_CODE (SUBREG_REG (in)) == PLUS || strict_low @@ -1164,7 +1164,7 @@ push_reload (rtx in, rtx out, rtx *inloc #ifdef CANNOT_CHANGE_MODE_CLASS && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass) #endif - && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))] + && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (out))] && (CONSTANT_P (SUBREG_REG (out)) || strict_low || (((REG_P (SUBREG_REG (out)) Index: gcc/testsuite/gcc.target/i386/sse-globalreg.c === --- gcc/testsuite/gcc.target/i386/sse-globalreg.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse-globalreg.c (working copy) @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -w" } */ +/* { dg-require-effective-target sse2 } */ + +register
Fix (work around) LRA infinite loop, PR78911
In this PR, we have an endless cycle in LRA, generating ever more instructions. The relevant portions of the dump seem to be these: ** Local #9: ** Creating newreg=130 from oldreg=128, assigning class CREG to r130 18: {r117:DI=unspec/v[[r129:SI*0x8+r123:SI],r117:DI,r122:SI,r130:SI,0x8005] 62;[r129:SI*0x8+r123:SI]=unspec/v[0] 62;flags:CCZ=unspec/v[0] 62;} ** Assignment #7: ** Assigning to 130 (cl=CREG, orig=114, freq=1148, tfirst=130, tfreq=1148)... 2nd iter for reload pseudo assignments: Reload r130 assignment failure Spill r87(hr=5, freq=4296) Spill r123(hr=4, freq=1148) Spill reload r129(hr=2, freq=1148) ** Local #11: ** Spilling non-eliminable hard regs: 6 Creating newreg=131, assigning class GENERAL_REGS to address r131 Change to class INDEX_REGS for r131 ** Assignment #8: ** Assigning to 131 (cl=INDEX_REGS, orig=131, freq=1148, tfirst=131, tfreq=1148)... Trying 0: spill 117(freq=1722) Now best 0(cost=574, bad_spills=0, insn_pseudos=1) Trying 1: spill 117(freq=1722) Trying 2: spill 130(freq=1148) Now best 2(cost=0, bad_spills=0, insn_pseudos=1) Trying 3: spill 122(freq=1148) Trying 4: spill 129(freq=1148) ** Local #12: ** Creating newreg=133 from oldreg=130, assigning class CREG to r133 What seems to happen is that in the "Assignment #8" phase, we're considering one new reload reg only, and we end up spilling r130, which only allows a register from a single class. Then we reload r130 again and make r133, and the cycle starts again. Reload is designed to avoid cycles and to process all reloads for an insn in order of ascending class so as to avoid this kind of issue. With LRA I'm really not sure how to fix this properly, but the following patch seems to cure the PR at least, by recognizing when we're about to spill a reload reg whose assigned class contains only a single register. Bootstrapped and tested on x86_64-linux, ok? Bernd PR rtl-optimization/78911 * lra-assigns.c (must_not_spill_p): New function. (spill_for): Use it. 
PR rtl-optimization/78911 * gcc.target/i386/pr78911-1.c: New test. * gcc.target/i386/pr78911-2.c: New test. Index: gcc/lra-assigns.c === --- gcc/lra-assigns.c (revision 245685) +++ gcc/lra-assigns.c (working copy) @@ -889,6 +889,30 @@ assign_temporarily (int regno, int hard_ live_pseudos_reg_renumber[regno] = hard_regno; } +/* Return true iff there is a reason why pseudo SPILL_REGNO should not + be spilled. */ +static bool +must_not_spill_p (unsigned spill_regno) +{ + if ((pic_offset_table_rtx != NULL + && spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno >= lra_constraint_new_regno_start + && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno) + && ! bitmap_bit_p (&lra_split_regs, spill_regno) + && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno) + && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))) +return true; + /* A reload pseudo that requires a singleton register class should + not be spilled. + FIXME: this mitigates the issue on certain i386 patterns, but + does not solve the general case where existing reloads fully + cover a limited register class. */ + if (!bitmap_bit_p (_reload_pseudos, spill_regno) + && reg_class_size [reg_preferred_class (spill_regno)] == 1) +return true; + return false; +} + /* Array used for sorting reload pseudos for subsequent allocation after spilling some pseudo. */ static int *sorted_reload_pseudos; @@ -960,13 +984,7 @@ spill_for (int regno, bitmap spilled_pse /* Spill pseudos. */ static_p = false; EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi) - if ((pic_offset_table_rtx != NULL - && spill_regno == REGNO (pic_offset_table_rtx)) - || ((int) spill_regno >= lra_constraint_new_regno_start - && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno) - && ! bitmap_bit_p (&lra_split_regs, spill_regno) - && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno) - && ! 
bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))) + if (must_not_spill_p (spill_regno)) goto fail; else if (non_spilled_static_chain_regno_p (spill_regno)) static_p = true; Index: gcc/testsuite/gcc.target/i386/pr78911-1.c === --- gcc/testsuite/gcc.target/i386/pr78911-1.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr78911-1.c (working copy) @@ -0,0 +1,22 @@ +/* PR rtl-optimization/78911 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fno-strict-aliasing -fno-omit-frame-pointer" } */ +/* { dg-additional-options "-fPIC" { target fpic } } */ +/* { dg-additional-options "-march=pentium-m" { target ia32 } } */ + +int a, b, d, e; +long long *c; + +static
Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)
On Thu, Mar 02, 2017 at 08:47:57PM +0100, Bernd Schmidt wrote: > On 03/02/2017 06:35 PM, Marek Polacek wrote: > > While at it, I fixed wrong formatting in the nearby code. Also use > > NULL_TREE > > instead of 0 where appropriate. I really dislike those zeros-as-trees; one > > day I'll just go and turn them into NULL_TREEs. > > I sympathize, but it makes it harder to see which parts of the patch are > actual changes. Generally it's best to separate functional and formatting > changes. True, but I thought I'd do it anyway while I'm changing the very same if; formatting fixes along with actual fixes are not that rare. > > @@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct > > c_arg_info *arg_info) > > declared for the arg. ISO C says we take the unqualified > > type for parameters declared with qualified type. */ > > if (TREE_TYPE (parm) != error_mark_node > > - && TREE_TYPE (type) != error_mark_node > > + && TREE_TYPE (TREE_VALUE (type)) != error_mark_node > > && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm)) > >!= TYPE_ATOMIC (TREE_VALUE (type))) > > || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)), > > Isn't this most likely intended to be TREE_VALUE (type) != error_mark_node? Yea, that's what I had in mind but goofed. > > @@ -9017,7 +9020,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct > > c_arg_info *arg_info) > > if (targetm.calls.promote_prototypes (TREE_TYPE > > (current_function_decl)) > > && INTEGRAL_TYPE_P (TREE_TYPE (parm)) > > && TYPE_PRECISION (TREE_TYPE (parm)) > > - < TYPE_PRECISION (integer_type_node)) > > +< TYPE_PRECISION (integer_type_node)) > > Should add the necessary parens while fixing formatting. Dunno if they're necessary here, but I added them. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2017-03-03 Marek Polacek  PR c/79758 * c-decl.c (store_parm_decls_oldstyle): Check if the element of current_function_prototype_arg_types is error_mark_node. Fix formatting. Use TREE_VALUE instead of TREE_TYPE. 
* gcc.dg/noncompile/pr79758.c: New test. diff --git gcc/c/c-decl.c gcc/c/c-decl.c index 32edacc..f46ca11 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -8965,12 +8965,15 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) tree type; for (parm = DECL_ARGUMENTS (fndecl), type = current_function_prototype_arg_types; - parm || (type && TREE_VALUE (type) != error_mark_node - && (TYPE_MAIN_VARIANT (TREE_VALUE (type)) != void_type_node)); + parm || (type != NULL_TREE + && TREE_VALUE (type) != error_mark_node + && TYPE_MAIN_VARIANT (TREE_VALUE (type)) != void_type_node); parm = DECL_CHAIN (parm), type = TREE_CHAIN (type)) { - if (parm == 0 || type == 0 - || TYPE_MAIN_VARIANT (TREE_VALUE (type)) == void_type_node) + if (parm == NULL_TREE + || type == NULL_TREE + || (TREE_VALUE (type) != error_mark_node + && TYPE_MAIN_VARIANT (TREE_VALUE (type)) == void_type_node)) { if (current_function_prototype_built_in) warning_at (DECL_SOURCE_LOCATION (fndecl), @@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) declared for the arg. ISO C says we take the unqualified type for parameters declared with qualified type. 
*/ if (TREE_TYPE (parm) != error_mark_node - && TREE_TYPE (type) != error_mark_node + && TREE_VALUE (type) != error_mark_node && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm)) != TYPE_ATOMIC (TREE_VALUE (type))) || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)), @@ -9016,8 +9019,8 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) if (targetm.calls.promote_prototypes (TREE_TYPE (current_function_decl)) && INTEGRAL_TYPE_P (TREE_TYPE (parm)) - && TYPE_PRECISION (TREE_TYPE (parm)) - < TYPE_PRECISION (integer_type_node)) + && (TYPE_PRECISION (TREE_TYPE (parm)) + < TYPE_PRECISION (integer_type_node))) DECL_ARG_TYPE (parm) = c_type_promotes_to (TREE_TYPE (parm)); diff --git gcc/testsuite/gcc.dg/noncompile/pr79758.c gcc/testsuite/gcc.dg/noncompile/pr79758.c index e69de29..aeaf7c7 100644 --- gcc/testsuite/gcc.dg/noncompile/pr79758.c +++ gcc/testsuite/gcc.dg/noncompile/pr79758.c @@ -0,0 +1,6 @@ +/* PR c/79758 */ +/* { dg-do compile } */ + +void fn1 (int[a]) { }; /* { dg-error "undeclared here|parameter name omitted" } */
RE: [PATCH,testsuite] MIPS: Fix register mode checking for n64 in pr68273.c.
> From: Matthew Fortune > > > > gcc/testsuite/ > > > > * gcc.target/mips/pr68273.c (dg-final): Match SImode registers only for > > ilp32 targets and match DImode registers for lp64 targets. > > OK, thanks. > > Matthew Committed as r245874. Thanks, Toma
Re: [OBVIOUS] [PATCH] Remove unused variable.
The missing patch. >From d539ba1a94ab85445f54c1742ff8e924e915f06a Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 3 Mar 2017 13:50:42 +0100 Subject: [PATCH] Remove unused variable. gcc/ChangeLog: 2017-03-03 Martin Liska * tree-ssa-loop-prefetch.c (pass_loop_prefetch::execute): Remove unused variable. --- gcc/tree-ssa-loop-prefetch.c | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c index a1372074e9f..339eeafcfee 100644 --- a/gcc/tree-ssa-loop-prefetch.c +++ b/gcc/tree-ssa-loop-prefetch.c @@ -2035,7 +2035,6 @@ pass_loop_prefetch::execute (function *fun) if (number_of_loops (fun) <= 1) return 0; - bool warned_p = false; if ((PREFETCH_BLOCK & (PREFETCH_BLOCK - 1)) != 0) { static bool warned = false; -- 2.11.1
Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)
On Fri, 3 Mar 2017, Jakub Jelinek wrote: > On Fri, Mar 03, 2017 at 09:21:55AM +0100, Richard Biener wrote: > > > --- gcc/gimple-fold.c.jj 2017-02-07 16:40:45.0 +0100 > > > +++ gcc/gimple-fold.c 2017-03-02 16:04:51.304850077 +0100 > > > @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi > > >tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt))); > > >tree ctype = build_complex_type (itype); > > >tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0); > > > + bool may_throw = false; > > > + edge e = NULL; > > >gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)), > > > expected); > > >gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi > > >gimple_set_vdef (g, gimple_vdef (stmt)); > > >gimple_set_vuse (g, gimple_vuse (stmt)); > > >SSA_NAME_DEF_STMT (gimple_vdef (g)) = g; > > > - if (gimple_call_lhs (stmt)) > > > + tree oldlhs = gimple_call_lhs (stmt); > > > + if (flag_non_call_exceptions && stmt_ends_bb_p (stmt)) > > > > I think a more appropriate check is stmt_can_throw_internal (stmt) > > Ok. I suppose without "flag_non_call_exceptions && ". Yeah. > > > { > > > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > + may_throw = true; > > > > and 'throws' rather than 'may_throw'. > > Ack. > > > > + gimple_call_set_lhs (stmt, NULL_TREE); > > > + gsi_replace (gsi, g, true); > > > + e = find_fallthru_edge (gsi_bb (*gsi)->succs); > > > +} > > > + gimple_call_set_nothrow (as_a <gcall *> (g), flag_non_call_exceptions > > > == 0); > > > > it should copy nothrow state from the source(s) of the folding if you > > consider flag_non_call_exceptions == true in this fn but working on > > inlined code from a flag_non_call_exceptions == false function. At least > > for calls in the source(s) that works, for other stmts I think we do > > not have an explicit nothrow on the stmt. > > So just > gimple_call_set_nothrow (as_a <gcall *> (g), !throws); > then? Hmm, no. 
That will result in stmt_can_throw_external to return false. Simply gimple_call_set_nothrow (g, gimple_call_nothrow (orig-call)) > > > + if (oldlhs) > > > +{ > > > + if (!may_throw) > > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > >g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR, > > > build1 (IMAGPART_EXPR, itype, lhs)); > > > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > - g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR, > > > -gimple_assign_lhs (g)); > > > + if (may_throw) > > > + { > > > + gsi_insert_on_edge_immediate (e, g); > > > + *gsi = gsi_for_stmt (g); > > > + } > > > + else > > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > > + g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g)); > > > + if (may_throw) > > > + gsi_insert_after (gsi, g, GSI_NEW_STMT); > > > } > > > - gsi_replace (gsi, g, true); > > > + if (!may_throw) > > > +gsi_replace (gsi, g, true); > > >g = gimple_build_assign (make_ssa_name (itype), REALPART_EXPR, > > > build1 (REALPART_EXPR, itype, lhs)); > > > - gsi_insert_after (gsi, g, GSI_NEW_STMT); > > > + if (may_throw && oldlhs == NULL_TREE) > > > +{ > > > + gsi_insert_on_edge_immediate (e, g); > > > + *gsi = gsi_for_stmt (g); > > > +} > > > + else > > > +gsi_insert_after (gsi, g, GSI_NEW_STMT); > > > > rather than these dances with if (may_throw) isn't it easier to > > compute an insert gsi after finding the fallthru edge, splitting > > the edge if it is a critical one? (basically doing a less > > fancy gimple_find_edge_insert_loc) > > I can't see how that could simplify the code, gimple_find_edge_insert_loc > is lots of code, not exported and can return ins_after/!ins_after which > afterwards needs to be honored. 
> > I've simplified the code a little bit by doing gsi_replace early, then > the difference is pretty much just: > - gsi_insert_after (gsi, g, GSI_NEW_STMT); > + if (throws) > + { > + gsi_insert_on_edge_immediate (e, g); > + *gsi = gsi_for_stmt (g); > + } > + else > + gsi_insert_after (gsi, g, GSI_NEW_STMT); > that has to be done on the first insertion after the call (which > unfortunately means two locations). Another option I see is to > gimple_seq_add_stmt_without_update all the stmts following the call > into a seq and then conditionally insert that seq on the edge (immediate) > or insert the seq after the call. It would make sense only for > gimple-fold.c, in tree-ssa-ccp.c there is just one stmt added this way, > the debug stmt needs extra care. As you prefer. Looks good apart from the set_nothrow issue above. Thanks, Richard. > 2017-03-03 Jakub Jelinek> > PR middle-end/79805 > * internal-fn.def
Re: Fix ICE in use-after-scope w/ -fno-tree-dce (PR, sanitize/79783).
On Thu, Mar 02, 2017 at 06:49:32PM +0100, Martin Liška wrote: > It can happen with inlining and -fno-tree-dce that VAR_DECL for a SSA > NAME was removed and thus the poisoning should not have any usage. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? > Martin > >From d8aa72dc1d696f5500c00b6c2f532f2a87cf58d2 Mon Sep 17 00:00:00 2001 > From: marxin> Date: Thu, 2 Mar 2017 11:55:00 +0100 > Subject: [PATCH] Fix ICE in use-after-scope w/ -fno-tree-dce (PR > sanitize/79783). > > gcc/ChangeLog: > > 2017-03-02 Martin Liska > > PR sanitize/79783 > * asan.c (asan_expand_poison_ifn): Do not expand ASAN_POISON > when having a SSA NAME w/o VAR_DECL assigned to it. > > gcc/testsuite/ChangeLog: > > 2017-03-02 Martin Liska > > PR sanitize/79783 > * g++.dg/asan/pr79783.C: New test. > --- > gcc/asan.c | 5 - > gcc/testsuite/g++.dg/asan/pr79783.C | 19 +++ > 2 files changed, 23 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/asan/pr79783.C > > diff --git a/gcc/asan.c b/gcc/asan.c > index 6cdd59b7ea7..307423ced03 100644 > --- a/gcc/asan.c > +++ b/gcc/asan.c > @@ -3107,7 +3107,10 @@ asan_expand_poison_ifn (gimple_stmt_iterator *iter, > { >gimple *g = gsi_stmt (*iter); >tree poisoned_var = gimple_call_lhs (g); > - if (!poisoned_var) > + > + /* It can happen with inlining and -fno-tree-dce that VAR_DECL for a SSA > + NAME was removed and thus the poisoning should not have any usage. */ > + if (!poisoned_var || SSA_NAME_VAR (poisoned_var) == NULL_TREE) I wonder if it wouldn't be better to do: if (!poisoned_var || has_zero_uses (poisoned_var)) perhaps with -fno-tree-dce we could end up with SSA_NAME_VAR being non-NULL, yet no uses; in that case there is nothing to warn about. On the other side, in theory we could also end up with anonymous SSA_NAME and some uses - in that case perhaps it would be better to warn. 
So do:

  if (SSA_NAME_VAR (poisoned_var) == NULL_TREE)
    SSA_NAME_VAR (poisoned_var) = create_tmp_var (TREE_TYPE (poisoned_var));
  tree shadow_var = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
					    shadow_vars_mapping);

or so?  We'll need SSA_NAME_VAR non-NULL so that we can use a default def
later.

	Jakub
[OBVIOUS] [PATCH] Remove unused variable.
Hello. This fixes a copy-and-paste error; it's trivial, thus I'm going to install it. Thanks, Martin
[PATCH, testsuite]: Disable some tests on x32
x32 does not support -mmpx.

2017-03-03  Uros Bizjak

	* g++.dg/pr71624.C: Disable for x32.
	* g++.dg/pr71633.C: Ditto.

Tested on x86_64-linux-gnu {,32}, committed.

Uros.

Index: g++.dg/pr71624.C
===================================================================
--- g++.dg/pr71624.C	(revision 245860)
+++ g++.dg/pr71624.C	(working copy)
@@ -1,5 +1,5 @@
 /* PR71624 */
-// { dg-do compile { target i?86-*-* x86_64-*-* } }
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! x32 } } } } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
 class c1
Index: g++.dg/pr71633.C
===================================================================
--- g++.dg/pr71633.C	(revision 245860)
+++ g++.dg/pr71633.C	(working copy)
@@ -1,5 +1,5 @@
 /* PR71633 */
-// { dg-do compile { target i?86-*-* x86_64-*-* } }
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! x32 } } } } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
 class c1
Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)
On Fri, Mar 3, 2017 at 12:30 PM, Jakub Jelinek wrote:
>> Regarding the implementation, I'd rather see another flag for
>> i386-builtin.def entries, where we can mark if matching memory operand
>> is allowed. The expander can then look at the flag, and do some
>> special magic.
>
> I think maintaining such a flag would be a nightmare, I think more insns
> will need it in the future and it will be hard to remember.  Maybe it is
> better not to do anything about those (i.e. apply my patch as is), after
> all -O0 is not optimized code.

Yes, I agree.

Uros.
[PATCH] Properly handle __cxa_pure_virtual visibility (PR lto/79760).
Hello. I'm sending Honza's patch (which is pre-approved) that I've just tested on ppc64le-redhat-linux. It also fixes the issue spotted in the PR. Martin >From 10050e47af7dbf217469da083666b16dbbbf606d Mon Sep 17 00:00:00 2001 From: marxinDate: Thu, 2 Mar 2017 18:46:25 +0100 Subject: [PATCH] Properly handle __cxa_pure_virtual visibility (PR lto/79760). gcc/ChangeLog: 2017-03-02 Jan Hubicka PR lto/79760 * ipa-devirt.c (maybe_record_node): Properly handle __cxa_pure_virtual visibility. --- gcc/ipa-devirt.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index 4428d679722..0e5eb85b61d 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -2462,10 +2462,19 @@ maybe_record_node (vec , nodes.safe_push (target_node); } } - else if (completep - && (!type_in_anonymous_namespace_p - (DECL_CONTEXT (target)) - || flag_ltrans)) + else if (!completep) +; + /* We have definition of __cxa_pure_virtual that is not accessible (it is + optimized out or partitioned to other unit) so we can not add it. When + not sanitizing, there is nothing to do. + Otherwise declare the list incomplete. */ + else if (pure_virtual) +{ + if (flag_sanitize & SANITIZE_UNREACHABLE) + *completep = false; +} + else if (flag_ltrans + || !type_in_anonymous_namespace_p (DECL_CONTEXT (target))) *completep = false; } -- 2.11.1
Re: [C++ Patch] PR 70266
OK.

On Fri, Mar 3, 2017 at 12:26 AM, Paolo Carlini wrote:
> Hi,
>
> one more case of OVERLOAD filtering to cxx_eval_constant_expression and
> causing an ICE.
>
> In order to have a proper diagnostic early enough I think that in
> build_must_not_throw_expr we want to use perform_implicit_conversion_flags
> on the condition before calling cxx_constant_value for it. That would be
> also analogous to the code we already have in build_noexcept_spec, for
> example. Tested x86_64-linux.
>
> Thanks, Paolo.
[PATCH] Revert part of fixed-value fix
The following reverts uhigh back to elt (1) which are not 100% semantically
equivalent (and the change is not needed to avoid the warning-prone elt (0)).

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-03-03  Richard Biener

	* fixed-value.c (fixed_from_string): Restore use of elt (1)
	in place of uhigh ().
	(fixed_convert_from_real): Likewise.

Index: gcc/fixed-value.c
===================================================================
--- gcc/fixed-value.c	(revision 245865)
+++ gcc/fixed-value.c	(working copy)
@@ -131,7 +131,7 @@ fixed_from_string (FIXED_VALUE_TYPE *f,
   wide_int w = real_to_integer (&real_value, &fail, GET_MODE_PRECISION (mode));
   f->data.low = w.ulow ();
-  f->data.high = w.uhigh ();
+  f->data.high = w.elt (1);
   if (temp == FIXED_MAX_EPS && ALL_FRACT_MODE_P (f->mode))
     {
@@ -1050,7 +1050,7 @@ fixed_convert_from_real (FIXED_VALUE_TYP
   wide_int w = real_to_integer (&real_value, &fail, GET_MODE_PRECISION (mode));
   f->data.low = w.ulow ();
-  f->data.high = w.uhigh ();
+  f->data.high = w.elt (1);
   temp = check_real_for_fixed_mode (&real_value, mode);
   if (temp == FIXED_UNDERFLOW) /* Minimum.  */
     {
Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)
On Fri, Mar 03, 2017 at 12:18:09PM +0100, Uros Bizjak wrote: > > as the constraint require that both operands aren't memory, shall I create a > > patch for that? This is the first category below. > > Yes. Although expander takes care not to generate two memory > references, combine can propagate memory to the other operand, > creating semi-invalid RTX that is later resolved by RA. Ok, will work on a patch. > > The second category is where matching operand is ok, so my patch can > > pessimize stuff. I wonder if we couldn't handle this like: > > /* If we aren't optimizing, only allow one memory operand to be > > generated. */ > > if (memory_operand (op, mode)) > > { > > const char *const constraint > > = insn_data[icode].operand[i + adjust + 1].constraint; > > if (optimize > > || num_memory != 1 > > || !rtx_equal_p (real_target, op)) > > num_memory++; > > /* sse2_movsd allows matching operand. */ > > else if (icode == CODE_FOR_sse2_movsd) > > ; > > /* Various masked insns allow matching operand. */ > > else if (insn_data[icode].operand[i + adjust + 1].predicate > >== vector_move_operand > >&& (strcmp (constraint, "0C") == 0 > >|| strcmp (constraint, "0C,0") == 0)) > > ; > > else > > num_memory++; > > } > > (though perhaps sse2_movsd is still too simplistic, because it has just > > =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is > > fine as is, if it is 1, then only if the memory is offsettable; though > > perhaps LRA can just turn non-offsettable memory into offsettable one > > through secondary reload). > > There is similarity with arithmetic operations. Please note that arith > ops fixup their arguments using ix86_expand_binary_operator, and their > insn patterns have to satisfy ix86_binary_operator_ok insn predicate. > However, even using this infrastructure, I have noticed that the > compiler is quite sloppy with memory operands, and we mostly get > matched memory after combine pass (if we are lucky). 
> > Regarding the implementation, I'd rather see another flag for > i386-builtin.def entries, where we can mark if matching memory operand > is allowed. The expander can then look at the flag, and do some > special magic. I think maintaining such a flag would be a nightmare, I think more insns will need it in the future and it will be hard to remember. Maybe it is better not to do anything about those (i.e. apply my patch as is), after all -O0 is not optimized code. Jakub
Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)
On Fri, Mar 3, 2017 at 11:35 AM, Jakub Jelinekwrote: > On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote: >> > Or are there any define_insn/define_expand where it is desirable to have >> > both input and output operand a MEM (and does it have to be matching)? >> > For various scalar binary and unary expanders the backend already uses a >> > helper >> > that will force something into memory if dest and src are both memory and >> > not rtx_equal_p, but do we have anything like that in anything these two >> > builtin expanders emit? >> >> Any insn with matched memory (movlps, movhps and similar) can also >> operate with matched register. To my knowledge, all insn patterns are >> written in this way, since in the past we had plenty of problems with >> matched memory operands. >> >> Also, I really don't remember what benefit brings us to *not* force >> input operand to a register at -O0 at expand time and leave to RA to >> find matched memory. OTOH, with -O0 we already have so many >> unnecessary moves that you get tears in the eyes... > > So, I've tried: > make mddump > for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' > ../../gcc/config/i386/i386-builtin.def`; do sed -n > '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md; done > and looked for "=.*[mo] in there. > Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])), > e.g. > (define_insn_and_split "avx512f__" > [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m") > (unspec:AVX512MODE2P > [(match_operand: 1 "nonimmediate_operand" "xm,x")] > UNSPEC_CAST))] > "TARGET_AVX512F" > "#" > "&& reload_completed" > [(set (match_dup 0) (match_dup 1))] > as the constraint require that both operands aren't memory, shall I create a > patch for that? This is the first category below. Yes. Although expander takes care not to generate two memory references, combine can propagate memory to the other operand, creating semi-invalid RTX that is later resolved by RA. 
> The second category is where matching operand is ok, so my patch can > pessimize stuff. I wonder if we couldn't handle this like: > /* If we aren't optimizing, only allow one memory operand to be > generated. */ > if (memory_operand (op, mode)) > { > const char *const constraint > = insn_data[icode].operand[i + adjust + 1].constraint; > if (optimize > || num_memory != 1 > || !rtx_equal_p (real_target, op)) > num_memory++; > /* sse2_movsd allows matching operand. */ > else if (icode == CODE_FOR_sse2_movsd) > ; > /* Various masked insns allow matching operand. */ > else if (insn_data[icode].operand[i + adjust + 1].predicate >== vector_move_operand >&& (strcmp (constraint, "0C") == 0 >|| strcmp (constraint, "0C,0") == 0)) > ; > else > num_memory++; > } > (though perhaps sse2_movsd is still too simplistic, because it has just > =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is > fine as is, if it is 1, then only if the memory is offsettable; though > perhaps LRA can just turn non-offsettable memory into offsettable one > through secondary reload). There is similarity with arithmetic operations. Please note that arith ops fixup their arguments using ix86_expand_binary_operator, and their insn patterns have to satisfy ix86_binary_operator_ok insn predicate. However, even using this infrastructure, I have noticed that the compiler is quite sloppy with memory operands, and we mostly get matched memory after combine pass (if we are lucky). Regarding the implementation, I'd rather see another flag for i386-builtin.def entries, where we can mark if matching memory operand is allowed. The expander can then look at the flag, and do some special magic. > And the last category is with "=m" destination (not allowing anything else) > and (match_dup 0) somewhere in the pattern, I believe those have to be > expanded specially, because otherwise one e.g. 
wouldn't get MEM_P target > if optimize at all (I think the builtins are supposed to pass pointer to the > result in that case) and the match_dup just doesn't appear as another operand. It looks to me, that the expander already takes care for matching, where strictly required. OTOH, matching memory requirement would also fit in the proposed implementation with flag in builtin entries, we can mark if matching memory operand is required, not only allowed. However, as said above, expanding with memory operands is somehow neglected area, desperately looking for some love. Uros. > missing !(MEM && MEM) ? > === > avx512f_pd512_256pd > avx512f_pd512_pd > avx512f_ps512_256ps >
Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)
On Fri, Mar 03, 2017 at 09:21:55AM +0100, Richard Biener wrote: > > --- gcc/gimple-fold.c.jj2017-02-07 16:40:45.0 +0100 > > +++ gcc/gimple-fold.c 2017-03-02 16:04:51.304850077 +0100 > > @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi > >tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt))); > >tree ctype = build_complex_type (itype); > >tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0); > > + bool may_throw = false; > > + edge e = NULL; > >gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)), > >expected); > >gsi_insert_before (gsi, g, GSI_SAME_STMT); > > @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi > >gimple_set_vdef (g, gimple_vdef (stmt)); > >gimple_set_vuse (g, gimple_vuse (stmt)); > >SSA_NAME_DEF_STMT (gimple_vdef (g)) = g; > > - if (gimple_call_lhs (stmt)) > > + tree oldlhs = gimple_call_lhs (stmt); > > + if (flag_non_call_exceptions && stmt_ends_bb_p (stmt)) > > I think a more appropriate check is stmt_can_throw_internal (stmt) Ok. I suppose without "flag_non_call_exceptions && ". > > { > > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > > + may_throw = true; > > and 'throws' rather than 'may_throw'. Ack. > > + gimple_call_set_lhs (stmt, NULL_TREE); > > + gsi_replace (gsi, g, true); > > + e = find_fallthru_edge (gsi_bb (*gsi)->succs); > > +} > > + gimple_call_set_nothrow (as_a (g), flag_non_call_exceptions == > > 0); > > it should copy nothrow state from the source(s) of the folding if you > consider flag_non_call_exceptions == true in this fn but working on > inlined code from a flag_non_call_exceptions == false function. At least > for calls in the source(s) that works, for other stmts I think we do > not have an explicit nothrow on the stmt. So just gimple_call_set_nothrow (as_a (g), !throws); then? 
> > + if (oldlhs) > > +{ > > + if (!may_throw) > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > >g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR, > >build1 (IMAGPART_EXPR, itype, lhs)); > > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > > - g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR, > > - gimple_assign_lhs (g)); > > + if (may_throw) > > + { > > + gsi_insert_on_edge_immediate (e, g); > > + *gsi = gsi_for_stmt (g); > > + } > > + else > > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > > + g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g)); > > + if (may_throw) > > + gsi_insert_after (gsi, g, GSI_NEW_STMT); > > } > > - gsi_replace (gsi, g, true); > > + if (!may_throw) > > +gsi_replace (gsi, g, true); > >g = gimple_build_assign (make_ssa_name (itype), REALPART_EXPR, > >build1 (REALPART_EXPR, itype, lhs)); > > - gsi_insert_after (gsi, g, GSI_NEW_STMT); > > + if (may_throw && oldlhs == NULL_TREE) > > +{ > > + gsi_insert_on_edge_immediate (e, g); > > + *gsi = gsi_for_stmt (g); > > +} > > + else > > +gsi_insert_after (gsi, g, GSI_NEW_STMT); > > rather than these dances with if (may_throw) isn't it easier to > compute an insert gsi after finding the fallthru edge, splitting > the edge if it is a critical one? (basically doing a less > fancy gimple_find_edge_insert_loc) I can't see how that could simplify the code, gimple_find_edge_insert_loc is lots of code, not exported and can return ins_after/!ins_after which afterwards needs to be honored. I've simplified the code a little bit by doing gsi_replace early, then the difference is pretty much just: - gsi_insert_after (gsi, g, GSI_NEW_STMT); + if (throws) + { + gsi_insert_on_edge_immediate (e, g); + *gsi = gsi_for_stmt (g); + } + else + gsi_insert_after (gsi, g, GSI_NEW_STMT); that has to be done on the first insertion after the call (which unfortunately means two locations). 
Another option I see is to gimple_seq_add_stmt_without_update all the stmts following the call into a seq and then conditionally insert that seq on the edge (immediate) or insert the seq after the call. It would make sense only for gimple-fold.c, in tree-ssa-ccp.c there is just one stmt added this way, the debug stmt needs extra care. 2017-03-03 Jakub JelinekPR middle-end/79805 * internal-fn.def (ATOMIC_BIT_TEST_AND_SET, ATOMIC_BIT_TEST_AND_RESET, ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_COMPARE_EXCHANGE): Remove ECF_NOTHROW. * gimple-fold.c (fold_builtin_atomic_compare_exchange): Set gimple_call_nothrow_p flag based on whether original builtin can throw. If it can, emit following stmts on the fallthrough edge. * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Similarly, except don't create new bb if inserting just
Re: GCSE: Use HOST_WIDE_INT instead of int (PR rtl-optimization/79574).
On Thu, Mar 2, 2017 at 8:58 PM, Bernd Schmidt wrote:
> On 03/02/2017 06:50 PM, Martin Liška wrote:
>> Hello.
>>
>> This is second part of fixes needed to not trigger integer overflow in
>> gcse pass.
>
> So, how is this intended to work? The min/max stored in the param is an int,
> and by using a HOST_WIDE_INT here, we expect that it is a larger type and
> therefore won't overflow?

HOST_WIDE_INT is equal to uint64_t (and we could mass-replace where we
can do so consistently)

Richard.

>>    {
>>      expr = flat_table[i];
>>      fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
>> -             expr->bitmap_index, hash_val[i], expr->max_distance);
>> +             expr->bitmap_index, hash_val[i], (int)expr->max_distance);
>>      print_rtl (file, expr->expr);
>>      fprintf (file, "\n");
>
> Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess.
>
>
> Bernd
Re: [PATCH] free MPFR caches in gimple-ssa-sprintf.c (PR 79699)
On Thu, Mar 2, 2017 at 10:33 PM, Martin Sebor wrote:
> On 03/02/2017 01:08 AM, Richard Biener wrote:
>> On Thu, Mar 2, 2017 at 2:01 AM, Joseph Myers wrote:
>>> On Wed, 1 Mar 2017, Martin Sebor wrote:
>>>> Joseph, since you commented on the bug, do you have a suggestion for
>>>> a different site for it or a guard?  The only other call to the
>>>> function is in the Fortran FE and it's neither guarded nor does it
>>>> appear to ever be called.
>>>
>>> I don't think a guard is needed.  Arguably it should be called from an
>>> atexit handler, but since we don't have such a handler calling it from the
>>> relevant pass seems reasonable (and I'm not sure what the right way to
>>> handle such freeing of memory in the JIT context is).
>>
>> IMHO we should call it from gcc::~context
>
> Thanks, that makes sense to me.  The attached patch does that.

"mpfr.h" as include is wrong, please use "realmpfr.h" as all other code in
GCC (which includes <mpfr.h> and <mpc.h>; ideally those would move to
system.h guarded with an INCLUDE_XXX guard or always included as we
include gmp.h).

Ok with that change,
Richard.

> Martin
Re: Add -Wdisabled-optimization to loop prefetching pass (PR tree-optimization/79803).
On Fri, Mar 3, 2017 at 11:57 AM, Richard Biener wrote:
> On Thu, Mar 2, 2017 at 6:48 PM, Martin Liška wrote:
>> Hello.
>>
>> I've just followed Richi's advice in the PR.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>
> I think the check tests for power-of-two, not even.  Also you should mention
> this is a --param, otherwise people might think their CPU is broken ;)
>
> Ok otherwise.

Oh, and maybe only warn once, not for each function...

> Thanks,
> Richard.
>
>> Martin
Re: Add -Wdisabled-optimization to loop prefetching pass (PR tree-optimization/79803).
On Thu, Mar 2, 2017 at 6:48 PM, Martin Liška wrote:
> Hello.
>
> I've just followed Richi's advice in the PR.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

I think the check tests for power-of-two, not even.  Also you should mention
this is a --param, otherwise people might think their CPU is broken ;)

Ok otherwise.

Thanks,
Richard.

> Martin
Re: [PATCH] S/390: Disable vectorization for loops with few iterations
On Thu, Mar 2, 2017 at 4:39 PM, Robin Dappwrote: > Hi, > > the following patch defines the PARAM_MIN_VECT_LOOP_BOUND parameter in > the s390 backend. It helps with the vectorization epilogue problem > described here [1]. > I see an overall performance increase of > 1% in SPECfp2006, yet some > cases like cactusADM regress. This seems to be caused by the vectorizer > creating an epilogue guard for one more iteration than before, which, in > turn, causes e.g. predcom to run on the epilogue that it used to ignore > before ("Loop iterates only 1 time, nothing to do."). Subsequent, > minor, effects cause an eventual slowdown. > > Until the reason for the bad epilogue code is understood, this patch > mitigates the problem. When investigating the issue, I stumbled across > an attempt to vectorize the epilogue itself as well as combine it with > the vectorized loop in addition to vector masking [2]. A similar > approach might also help here. My original observation of high register > pressure within the epilogue still stands. In this specific case, it > would most likely suffice to save all registers once, run the epilogue > and restore the registers. I'm pretty sure this would be faster than > the "spill fest" that's currently happening. Note it also honors PARAM_MIN_VECT_LOOP_BOUND if there's no epilouge. Richard. > Regards > Robin > > [1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html > [2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html > > -- > > gcc/ChangeLog: > > 2017-03-02 Robin Dapp > > * config/s390/s390.c (s390_option_override_internal): Set > PARAM_MIN_VECT_LOOP_BOUND
[PATCH] rs6000: Fix for -mwarn-cell-microcode (PR43763)
If using -mwarn-cell-microcode, rs6000_final_prescan_insn calls
get_insn_template to get the name of the machine instruction. But,
get_insn_template calls the output template if that is code, and that then
can modify recog_data (it is normal to change the operands, for example).

This patch saves and restores recog_data around the call to
get_insn_template to fix the problems this causes.

Bootstrap and test running on powerpc64-linux; I'll commit it to trunk if
that goes well (and 6 and 5 later).

Segher

2017-03-03  Segher Boessenkool

	PR target/43763
	* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Save and
	restore recog_data (including the operand rtxes inside it) around
	the call to get_insn_data.

---
 gcc/config/rs6000/rs6000.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3f13ce5..ecc7823 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -39041,7 +39041,12 @@ rs6000_final_prescan_insn (rtx_insn *insn, rtx *operand ATTRIBUTE_UNUSED,
   if (insn_code_number < 0)
     return;
 
+  /* get_insn_data can modify recog_data, so save and restore it.  */
+  struct recog_data_d recog_data_save = recog_data;
+  for (int i = 0; i < recog_data.n_operands; i++)
+    recog_data.operand[i] = copy_rtx (recog_data.operand[i]);
   temp = get_insn_template (insn_code_number, insn);
+  recog_data = recog_data_save;
 
   if (get_attr_cell_micro (insn) == CELL_MICRO_ALWAYS)
     warning_at (location, OPT_mwarn_cell_microcode,
-- 
1.9.3
Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)
On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote: > > Or are there any define_insn/define_expand where it is desirable to have > > both input and output operand a MEM (and does it have to be matching)? > > For various scalar binary and unary expanders the backend already uses a > > helper > > that will force something into memory if dest and src are both memory and > > not rtx_equal_p, but do we have anything like that in anything these two > > builtin expanders emit? > > Any insn with matched memory (movlps, movhps and similar) can also > operate with matched register. To my knowledge, all insn patterns are > written in this way, since in the past we had plenty of problems with > matched memory operands. > > Also, I really don't remember what benefit brings us to *not* force > input operand to a register at -O0 at expand time and leave to RA to > find matched memory. OTOH, with -O0 we already have so many > unnecessary moves that you get tears in the eyes... So, I've tried: make mddump for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' ../../gcc/config/i386/i386-builtin.def`; do sed -n '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md; done and looked for "=.*[mo] in there. Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])), e.g. (define_insn_and_split "avx512f__" [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m") (unspec:AVX512MODE2P [(match_operand: 1 "nonimmediate_operand" "xm,x")] UNSPEC_CAST))] "TARGET_AVX512F" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 1))] as the constraint require that both operands aren't memory, shall I create a patch for that? This is the first category below. The second category is where matching operand is ok, so my patch can pessimize stuff. I wonder if we couldn't handle this like: /* If we aren't optimizing, only allow one memory operand to be generated. 
*/ if (memory_operand (op, mode)) { const char *const constraint = insn_data[icode].operand[i + adjust + 1].constraint; if (optimize || num_memory != 1 || !rtx_equal_p (real_target, op)) num_memory++; /* sse2_movsd allows matching operand. */ else if (icode == CODE_FOR_sse2_movsd) ; /* Various masked insns allow matching operand. */ else if (insn_data[icode].operand[i + adjust + 1].predicate == vector_move_operand && (strcmp (constraint, "0C") == 0 || strcmp (constraint, "0C,0") == 0)) ; else num_memory++; } (though perhaps sse2_movsd is still too simplistic, because it has just =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is fine as is, if it is 1, then only if the memory is offsettable; though perhaps LRA can just turn non-offsettable memory into offsettable one through secondary reload). And the last category is with "=m" destination (not allowing anything else) and (match_dup 0) somewhere in the pattern, I believe those have to be expanded specially, because otherwise one e.g. wouldn't get MEM_P target if optimize at all (I think the builtins are supposed to pass pointer to the result in that case) and the match_dup just doesn't appear as another operand. missing !(MEM && MEM) ? 
=== avx512f_pd512_256pd avx512f_pd512_pd avx512f_ps512_256ps avx512f_ps512_ps avx512f_si512_256si avx512f_si512_si avx_pd256_pd avx_ps256_ps avx_si256_si sse_storehps sse_storelps valid matching mem == avx512f_ss_truncatev16siv16hi2_mask avx512f_ss_truncatev16siv16qi2_mask avx512f_ss_truncatev8div8hi2_mask avx512f_ss_truncatev8div8si2_mask avx512f_truncatev16siv16hi2_mask avx512f_truncatev16siv16qi2_mask avx512f_truncatev8div8hi2_mask avx512f_truncatev8div8si2_mask avx512f_us_truncatev16siv16hi2_mask avx512f_us_truncatev16siv16qi2_mask avx512f_us_truncatev8div8hi2_mask avx512f_us_truncatev8div8si2_mask avx512f_vcvtps2ph512_mask avx512vl_ss_truncatev16hiv16qi2_mask avx512vl_ss_truncatev4div4si2_mask avx512vl_ss_truncatev8siv8hi2_mask avx512vl_truncatev16hiv16qi2_mask avx512vl_truncatev4div4si2_mask avx512vl_truncatev8siv8hi2_mask avx512vl_us_truncatev16hiv16qi2_mask avx512vl_us_truncatev4div4si2_mask avx512vl_us_truncatev8siv8hi2_mask sse2_movsd vcvtps2ph256_mask match_dup on mem target === avx512f_ss_truncatev8div16qi2_mask_store avx512f_storev16sf_mask avx512f_storev16si_mask avx512f_storev8df_mask avx512f_storev8di_mask avx512f_truncatev8div16qi2_mask_store avx512f_us_truncatev8div16qi2_mask_store avx512vl_compressstorev2df_mask avx512vl_compressstorev2di_mask avx512vl_compressstorev4df_mask avx512vl_compressstorev4di_mask avx512vl_compressstorev4sf_mask avx512vl_compressstorev4si_mask
Re: GCSE: Use HOST_WIDE_INT instead of int (PR rtl-optimization/79574).
On 03/02/2017 08:58 PM, Bernd Schmidt wrote:
> On 03/02/2017 06:50 PM, Martin Liška wrote:
>> Hello.
>>
>> This is second part of fixes needed to not trigger integer overflow in
>> gcse pass.
>
> So, how is this intended to work? The min/max stored in the param is an int,
> and by using a HOST_WIDE_INT here, we expect that it is a larger type and
> therefore won't overflow?

Using int type is fine, but gcse.c does arithmetic with a distance:

  /* Prefer to hoist EXPR if register pressure is decreased.  */
  if (decreased_pressure > *nregs)
    distance += bb_size[bb->index];
  ...
  max_distance += (bb_size[dominated->index]
		   - to_bb_head[INSN_UID (occr->insn)]);

>>    {
>>      expr = flat_table[i];
>>      fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
>> -             expr->bitmap_index, hash_val[i], expr->max_distance);
>> +             expr->bitmap_index, hash_val[i], (int)expr->max_distance);
>>      print_rtl (file, expr->expr);
>>      fprintf (file, "\n");
>
> Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess.

Thanks for the nit; I'll install the patch with that.

Martin

> Bernd
[C++ Patch] PR 70266
Hi, one more case of OVERLOAD filtering to cxx_eval_constant_expression and causing an ICE. In order to have a proper diagnostic early enough I think that in build_must_not_throw_expr we want to use perform_implicit_conversion_flags on the condition before calling cxx_constant_value for it. That would be also analogous to the code we already have in build_noexcept_spec, for example. Tested x86_64-linux. Thanks, Paolo. // /cp 2017-03-03 Paolo CarliniPR c++/70266 * except.c (build_must_not_throw_expr): Perform the implicit conversions on the condition. /testsuite 2017-03-03 Paolo Carlini PR c++/70266 * g++.dg/tm/pr70266.C: New. Index: cp/except.c === --- cp/except.c (revision 245836) +++ cp/except.c (working copy) @@ -268,6 +268,9 @@ build_must_not_throw_expr (tree body, tree cond) if (cond && !value_dependent_expression_p (cond)) { + cond = perform_implicit_conversion_flags (boolean_type_node, cond, + tf_warning_or_error, + LOOKUP_NORMAL); cond = cxx_constant_value (cond); if (integer_zerop (cond)) return body; Index: testsuite/g++.dg/tm/pr70266.C === --- testsuite/g++.dg/tm/pr70266.C (revision 0) +++ testsuite/g++.dg/tm/pr70266.C (working copy) @@ -0,0 +1,8 @@ +// { dg-do compile { target c++11 } } +// { dg-options "-fgnu-tm" } + +template < typename T > +int foo (int x, T t) +{ + return __transaction_atomic noexcept (foo) (1); // { dg-error "cannot resolve" } +}
Re: [PATCH] Avoid peeling for gaps if accesses are aligned
On Fri, 3 Mar 2017, Richard Sandiford wrote: > Richard Bienerwrites: > > On Wed, 1 Mar 2017, Richard Sandiford wrote: > > > >> Richard Biener writes: > >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: > >> > > >> >> Richard Biener writes: > >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: > >> >> > > >> >> >> Richard Biener writes: > >> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: > >> >> >> > > >> >> >> >> Sorry for the late reply, but: > >> >> >> >> > >> >> >> >> Richard Biener writes: > >> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote: > >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> Currently we force peeling for gaps whenever element > >> >> >> >> >> overrun can occur > >> >> >> >> >> but for aligned accesses we know that the loads won't trap > >> >> >> >> >> and thus > >> >> >> >> >> we can avoid this. > >> >> >> >> >> > >> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu > >> >> >> >> >> (I expect > >> >> >> >> >> some testsuite fallout here so didn't bother to invent a > >> >> >> >> >> new testcase). > >> >> >> >> >> > >> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in > >> >> >> >> >> general > >> >> >> >> >> (even when not trapping). Like for ASAN or valgrind. > >> >> >> >> > > >> >> >> >> > This is what I applied. > >> >> >> >> > > >> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu. > >> >> >> >> > > >> >> >> >> > Richard. > >> >> >> >> [...] > >> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > >> >> >> >> > index 15aec21..c29e73d 100644 > >> >> >> >> > --- a/gcc/tree-vect-stmts.c > >> >> >> >> > +++ b/gcc/tree-vect-stmts.c > >> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, > >> >> >> >> > tree vectype, bool slp, > >> >> >> >> >/* If there is a gap at the end of the group then these > >> >> >> >> > optimizations > >> >> >> >> > would access excess elements in the last iteration. 
*/ > >> >> >> >> >bool would_overrun_p = (gap != 0); > >> >> >> >> > + /* If the access is aligned an overrun is fine. */ > >> >> >> >> > + if (would_overrun_p > >> >> >> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info))) > >> >> >> >> > + would_overrun_p = false; > >> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info) > >> >> >> >> > && (can_overrun_p || !would_overrun_p) > >> >> >> >> > && compare_step_with_zero (stmt) > 0) > >> >> >> >> > >> >> >> >> ...is this right for all cases? I think it only looks for > >> >> >> >> single-vector > >> >> >> >> alignment, but the gap can in principle be vector-sized or larger, > >> >> >> >> at least for load-lanes. > >> >> >> >> > >> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4 > >> >> >> >> and a gap of 2 or 3. Even if the access itself is aligned, the > >> >> >> >> group > >> >> >> >> spans two vectors and we have no guarantee that the second one > >> >> >> >> is mapped. > >> >> >> > > >> >> >> > The check assumes that if aligned_access_p () returns true then the > >> >> >> > whole access is aligned in a way that it can't cross page > >> >> >> > boundaries. > >> >> >> > That's of course not the case if alignment is 16 bytes but the > >> >> >> > access > >> >> >> > will be a multiple of that. > >> >> >> > > >> >> >> >> I haven't been able to come up with a testcase though. We seem > >> >> >> >> to be > >> >> >> >> overly conservative when computing alignments. > >> >> >> > > >> >> >> > Not sure if we can run into this with load-lanes given that bumps > >> >> >> > the > >> >> >> > vectorization factor. Also does load-lane work with gaps? > >> >> >> > > >> >> >> > I think that gap can never be larger than nunits-1 so it is by > >> >> >> > definition > >> >> >> > in the last "vector" independent of the VF. 
> >> >> >> > > >> >> >> > Classical gap case is > >> >> >> > > >> >> >> > for (i=0; i >> >> >> > { > >> >> >> >y[3*i + 0] = x[4*i + 0]; > >> >> >> >y[3*i + 1] = x[4*i + 1]; > >> >> >> >y[3*i + 2] = x[4*i + 2]; > >> >> >> > } > >> >> >> > > >> >> >> > where x has a gap of 1. You'll get VF of 12 for the above. Make > >> >> >> > the y's different streams and you should get the perfect case for > >> >> >> > load-lane: > >> >> >> > > >> >> >> > for (i=0; i >> >> >> > { > >> >> >> >y[i] = x[4*i + 0]; > >> >> >> >z[i] = x[4*i + 1]; > >> >> >> >w[i] = x[4*i + 2]; > >> >> >> > } > >> >> >> > > >> >> >> > previously we'd peel at least 4 iterations into the epilogue for > >> >> >> > the fear of accessing x[4*i + 3]. When x is V4SI aligned that's > >> >> >> > ok. > >> >> >> > >> >> >> The case I was thinking of was like the second, but with the > >> >> >> element type being DI or DF and with the + 2 statement removed. > >> >> >> E.g.: > >> >> >> > >> >> >> double
[PATCH][C++] Fix PR79825
The following obvious patch fixes PR79825: we were not treating
EMPTY_CLASS_EXPR as simple_empty_class_p.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-03-03  Richard Biener

	PR c++/79825
	* cp-gimplify.c (simple_empty_class_p): Handle EMPTY_CLASS_EXPR.

	* g++.dg/warn/Wuninitialized-8.C: New testcase.

Index: gcc/cp/cp-gimplify.c
===================================================================
*** gcc/cp/cp-gimplify.c	(revision 245863)
--- gcc/cp/cp-gimplify.c	(working copy)
*************** simple_empty_class_p (tree type, tree op
*** 549,554 ****
--- 549,555 ----
    return ((TREE_CODE (op) == COMPOUND_EXPR
  	   && simple_empty_class_p (type, TREE_OPERAND (op, 1)))
+ 	  || TREE_CODE (op) == EMPTY_CLASS_EXPR
  	  || is_gimple_lvalue (op)
  	  || INDIRECT_REF_P (op)
  	  || (TREE_CODE (op) == CONSTRUCTOR

Index: gcc/testsuite/g++.dg/warn/Wuninitialized-8.C
===================================================================
*** gcc/testsuite/g++.dg/warn/Wuninitialized-8.C	(nonexistent)
--- gcc/testsuite/g++.dg/warn/Wuninitialized-8.C	(working copy)
***************
*** 0 ****
--- 1,18 ----
+ // { dg-do compile }
+ // { dg-options "-Wuninitialized" }
+ 
+ struct A;
+ struct B
+ {
+   B(A);
+ };
+ struct C
+ {
+   template <typename PassT> void m_fn1(PassT p1) { new B(p1); } // { dg-bogus "uninitialized" }
+ };
+ struct A {};
+ void fn1()
+ {
+   C a;
+   a.m_fn1(A());
+ }

[wwwdocs] gcc-4.7/changes.html -- "typed DWARF stack"
This link broke, but I found a replacement that is actually more in line
with the other references in that section (in that it uses the ShowIssue
facility on www.dwarfstd.org).

Applied.

Gerald

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.145
diff -u -r1.145 changes.html
--- changes.html	28 Dec 2016 23:42:42 -0000	1.145
+++ changes.html	3 Mar 2017 10:06:00 -0000
@@ -997,7 +997,7 @@
 value and <a
 href="http://www.dwarfstd.org/ShowIssue.php?issue=100909.2">call site
 information</a>,
-<a href="http://www.dwarfstd.org/doc/040408.1.html">typed DWARF stack</a>
+<a href="http://www.dwarfstd.org/ShowIssue.php?issue=140425.1">typed DWARF stack</a>
 or <a href="http://www.dwarfstd.org/ShowIssue.php?issue=110722.1">a
 more compact macro representation</a>.  Support for these extensions
Re: [PATCH] Avoid peeling for gaps if accesses are aligned
Richard Bienerwrites: > On Wed, 1 Mar 2017, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: >> > >> >> Richard Biener writes: >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: >> >> > >> >> >> Richard Biener writes: >> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote: >> >> >> > >> >> >> >> Sorry for the late reply, but: >> >> >> >> >> >> >> >> Richard Biener writes: >> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote: >> >> >> >> > >> >> >> >> >> >> >> >> >> >> Currently we force peeling for gaps whenever element >> >> >> >> >> overrun can occur >> >> >> >> >> but for aligned accesses we know that the loads won't trap >> >> >> >> >> and thus >> >> >> >> >> we can avoid this. >> >> >> >> >> >> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu >> >> >> >> >> (I expect >> >> >> >> >> some testsuite fallout here so didn't bother to invent a >> >> >> >> >> new testcase). >> >> >> >> >> >> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in general >> >> >> >> >> (even when not trapping). Like for ASAN or valgrind. >> >> >> >> > >> >> >> >> > This is what I applied. >> >> >> >> > >> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu. >> >> >> >> > >> >> >> >> > Richard. >> >> >> >> [...] >> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c >> >> >> >> > index 15aec21..c29e73d 100644 >> >> >> >> > --- a/gcc/tree-vect-stmts.c >> >> >> >> > +++ b/gcc/tree-vect-stmts.c >> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, >> >> >> >> > tree vectype, bool slp, >> >> >> >> >/* If there is a gap at the end of the group then these >> >> >> >> > optimizations >> >> >> >> >would access excess elements in the last iteration. */ >> >> >> >> >bool would_overrun_p = (gap != 0); >> >> >> >> > + /* If the access is aligned an overrun is fine. 
*/ >> >> >> >> > + if (would_overrun_p >> >> >> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info))) >> >> >> >> > + would_overrun_p = false; >> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info) >> >> >> >> > && (can_overrun_p || !would_overrun_p) >> >> >> >> > && compare_step_with_zero (stmt) > 0) >> >> >> >> >> >> >> >> ...is this right for all cases? I think it only looks for >> >> >> >> single-vector >> >> >> >> alignment, but the gap can in principle be vector-sized or larger, >> >> >> >> at least for load-lanes. >> >> >> >> >> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4 >> >> >> >> and a gap of 2 or 3. Even if the access itself is aligned, the >> >> >> >> group >> >> >> >> spans two vectors and we have no guarantee that the second one >> >> >> >> is mapped. >> >> >> > >> >> >> > The check assumes that if aligned_access_p () returns true then the >> >> >> > whole access is aligned in a way that it can't cross page boundaries. >> >> >> > That's of course not the case if alignment is 16 bytes but the access >> >> >> > will be a multiple of that. >> >> >> > >> >> >> >> I haven't been able to come up with a testcase though. We seem to >> >> >> >> be >> >> >> >> overly conservative when computing alignments. >> >> >> > >> >> >> > Not sure if we can run into this with load-lanes given that bumps the >> >> >> > vectorization factor. Also does load-lane work with gaps? >> >> >> > >> >> >> > I think that gap can never be larger than nunits-1 so it is by >> >> >> > definition >> >> >> > in the last "vector" independent of the VF. >> >> >> > >> >> >> > Classical gap case is >> >> >> > >> >> >> > for (i=0; i > >> >> > { >> >> >> >y[3*i + 0] = x[4*i + 0]; >> >> >> >y[3*i + 1] = x[4*i + 1]; >> >> >> >y[3*i + 2] = x[4*i + 2]; >> >> >> > } >> >> >> > >> >> >> > where x has a gap of 1. You'll get VF of 12 for the above. 
Make >> >> >> > the y's different streams and you should get the perfect case for >> >> >> > load-lane: >> >> >> > >> >> >> > for (i=0; i > >> >> > { >> >> >> >y[i] = x[4*i + 0]; >> >> >> >z[i] = x[4*i + 1]; >> >> >> >w[i] = x[4*i + 2]; >> >> >> > } >> >> >> > >> >> >> > previously we'd peel at least 4 iterations into the epilogue for >> >> >> > the fear of accessing x[4*i + 3]. When x is V4SI aligned that's >> >> >> > ok. >> >> >> >> >> >> The case I was thinking of was like the second, but with the >> >> >> element type being DI or DF and with the + 2 statement removed. >> >> >> E.g.: >> >> >> >> >> >> double __attribute__((noinline)) >> >> >> foo (double *a) >> >> >> { >> >> >> double res = 0.0; >> >> >> for (int n = 0; n < 256; n += 4) >> >> >> res += a[n] + a[n + 1]; >> >> >> return res; >> >> >> } >> >> >> >> >> >> (with -ffast-math). We do use LD4 for this, and having "a" aligned >> >> >> to V2DF isn't enough to guarantee that we can access a[n + 2] >> >>
Re: [PATCH, i386]: Fix PR79514, ICE with -m96bit-long-double on x86_64
On Thu, Mar 2, 2017 at 7:52 PM, Uros Bizjak wrote:
> Attached patch adds new insn_and_split pattern to handle pre_modify
> XFmode push variant. -m96bit-long-double option sets XFmode size to 12
> bytes that is not divisible by the push size (8 bytes), resulting in
> the pre_modify push variant.

Attached patch is also needed to avoid ICE on x32.

2017-03-03  Uros Bizjak

	PR target/79514
	* config/i386/i386.md (*pushxf_rounded): Use Pmode instead of DImode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 245860)
+++ config/i386/i386.md	(working copy)
@@ -3017,14 +3017,14 @@
 (define_insn_and_split "*pushxf_rounded"
   [(set (mem:XF
-	 (pre_modify:DI
-	  (reg:DI SP_REG)
-	  (plus:DI (reg:DI SP_REG) (const_int -16))))
+	 (pre_modify:P
+	  (reg:P SP_REG)
+	  (plus:P (reg:P SP_REG) (const_int -16))))
 	(match_operand:XF 0 "nonmemory_no_elim_operand" "f,r,*r,C"))]
   "TARGET_64BIT"
   "#"
   "&& 1"
-  [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
+  [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (const_int -16)))
    (set (match_dup 1) (match_dup 0))]
 {
   rtx pat = PATTERN (curr_insn);
Re: [PATCH docs] remove Java from GCC 7 release criteria
On Thu, 2 Mar 2017, Martin Sebor wrote: > On 03/02/2017 12:58 AM, Richard Biener wrote: > > On Wed, 1 Mar 2017, Martin Sebor wrote: > > > > > On 02/28/2017 11:41 PM, Richard Biener wrote: > > > > On March 1, 2017 3:34:46 AM GMT+01:00, Martin Sebor> > > > wrote: > > > > > On 02/28/2017 01:41 PM, Richard Biener wrote: > > > > > > On February 28, 2017 7:00:39 PM GMT+01:00, Jeff Law > > > > > > > > > > > wrote: > > > > > > > On 02/28/2017 10:54 AM, Martin Sebor wrote: > > > > > > > > The GCC 7 release criteria page mentions Java even though > > > > > > > > the front end has been removed. The attached patch removes Java > > > > > > > > from the criteria page. While reviewing the rest of the text I > > > > > > > > noticed a few minor typos that I corrected in the patch as well. > > > > > > > > > > > > > > > > Btw., as an aside, I read the page to see if I could find out > > > > > > > > more > > > > > > > > about the "magic" bug counts that are being aimed for to decide > > > > > when > > > > > > > > to cut the release. Can someone say what those are and where to > > > > > > > > find them? I understand from the document that they're not > > > > > > > > exact > > > > > > > > but even ballpark numbers would be useful. > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > WRT the bug counts. 0 P1 regressions, < 100 P1-P3 regressions. > > > > > > > I'm > > > > > > > not > > > > > > > sure if that's documented anywhere though. > > > > > > > > > > > > Actually the only criteria is zero P1 regressions. Those are > > > > > documented to block a release. > > > > > > > > > > Yes, that is mentioned in the document. Would it be fair to say > > > > > that the number of P2 bugs (or regressions) or their nature plays > > > > > into the decision in some way as well? If so, what can the release > > > > > criteria say about it? > > > > > > > > Ultimatively P2 bugs do not play a role and 'time' will trump them. 
> > > > OTOH we > > > > never were in an uncomfortable situation with P2s at the desired point > > > > of > > > > release. > > > > > > > > Also note that important P2 bugs can be promoted to P1 and not important > > > > P1 > > > > to P2. > > > > > > > > > I'm trying to get a better idea which bugs to work on and where > > > > > my help might have the biggest impact. I think having better > > > > > visibility into the bug triage process (such as bug priorities > > > > > and how they impact the release schedule) might help others > > > > > focus too. > > > > > > > > In order of importance: > > > > - P1 > > > > - wrong-code, rejects-valid, ice-on-valid (even if not regressions, > > > > regressions more important) > > > > - P2 regressions, more recent ones first (newest working version) > > > > > > I see. This is helpful, thanks. > > > > > > The kinds of problems you mention are discussed in the document > > > so just to make the importance clear, would adding the following > > > after this sentence > > > > > > In general bugs blocking the release are marked with priority P1 > > > (Maintaining the GCC Bugzilla database). > > > > > > accurately reflect what you described? > > > > > > As a general rule of thumb, within each priority level, bugs that > > > result in incorrect code are considered more urgent than those > > > that lead to rejecting valid code, which in turn are viewed as > > > more severe than ice-on-valid code (compiler crashes). More > > > recently reported bugs are also prioritized over very old ones. > > > > I'd rather see to clarify things in bugs/management.html. Note > > that wrong-code, rejects-valid, ice-on-valid are equally important. > > Less important would be accepts-invalid or ice-on-invalid or, of course, > > missed-optimization. Also it's not more recently _reported_ bugs > > but a [6/7 Regression] is more important to fix than a [5/6/7 Regression] > > (this is also why [7 Regression]s are P1 by default). > > Got it. 
Attached is a rewording of the paragraph above added to > bugs/management.html. I put it under the Importance (Severity > and Priority) heading but it could probably fit just as well under > Procedures and Policies. > > Please let me know if there's anything else that can be said to > further clarify things, or if you have any other suggestions. Looks fine! Thus, ok. Richard.
Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)
On Thu, 2 Mar 2017, Jakub Jelinek wrote: > Hi! > > The __atomic/__sync builtins have nothrow conditional on > -fnon-call-exceptions, but internal fns have right now only constant flags. > As per discussions on IRC, this patch removes ECF_NOTHROW from those 4 ifns > and instead calls gimple_call_set_nothrow when creating those to say > if it can or can't throw. Furthermore, if we need to add statements after > the former builtin, now ifn with -fnon-call-exceptions, we need to insert > them on the fallthru edge instead of right after it. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-03-02 Jakub Jelinek> > PR middle-end/79805 > * internal-fn.def (ATOMIC_BIT_TEST_AND_SET, ATOMIC_BIT_TEST_AND_RESET, > ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_COMPARE_EXCHANGE): Remove > ECF_NOTHROW. > * gimple-fold.c (fold_builtin_atomic_compare_exchange): Set > gimple_call_nothrow_p flag based on flag_non_call_exceptions. For > -fnon-call-exceptions emit following stmts on the fallthrough edge. > (optimize_atomic_bit_test_and): Similarly, except don't create new > bb if inserting just debug stmts on the edge, try to insert them > on the fallthru bb or just reset debug stmts. > > * g++.dg/opt/pr79805.C: New test. > > --- gcc/internal-fn.def.jj2017-02-09 14:55:38.0 +0100 > +++ gcc/internal-fn.def 2017-03-02 13:14:43.659419232 +0100 > @@ -205,11 +205,13 @@ DEF_INTERNAL_FN (GOACC_TILE, ECF_NOTHROW > current target. */ > DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL) > > -/* Atomic functions. */ > -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL) > -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, > NULL) > -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL) > -DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF | ECF_NOTHROW, NULL) > +/* Atomic functions. 
These don't have ECF_NOTHROW because for > + -fnon-call-exceptions they can throw, otherwise we set > + gimple_call_nothrow_p on it. */ > +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF, NULL) > +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF, NULL) > +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF, NULL) > +DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF, NULL) > > /* To implement [[fallthrough]]. */ > DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL) > --- gcc/gimple-fold.c.jj 2017-02-07 16:40:45.0 +0100 > +++ gcc/gimple-fold.c 2017-03-02 16:04:51.304850077 +0100 > @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi >tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt))); >tree ctype = build_complex_type (itype); >tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0); > + bool may_throw = false; > + edge e = NULL; >gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)), > expected); >gsi_insert_before (gsi, g, GSI_SAME_STMT); > @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi >gimple_set_vdef (g, gimple_vdef (stmt)); >gimple_set_vuse (g, gimple_vuse (stmt)); >SSA_NAME_DEF_STMT (gimple_vdef (g)) = g; > - if (gimple_call_lhs (stmt)) > + tree oldlhs = gimple_call_lhs (stmt); > + if (flag_non_call_exceptions && stmt_ends_bb_p (stmt)) I think a more appropriate check is stmt_can_throw_internal (stmt) > { > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > + may_throw = true; and 'throws' rather than 'may_throw'. > + gimple_call_set_lhs (stmt, NULL_TREE); > + gsi_replace (gsi, g, true); > + e = find_fallthru_edge (gsi_bb (*gsi)->succs); > +} > + gimple_call_set_nothrow (as_a (g), flag_non_call_exceptions == > 0); it should copy nothrow state from the source(s) of the folding if you consider flag_non_call_exceptions == true in this fn but working on inlined code from a flag_non_call_exceptions == false function. 
At least for calls in the source(s) that works, for other stmts I think we do not have an explicit nothrow on the stmt. > + if (oldlhs) > +{ > + if (!may_throw) > + gsi_insert_before (gsi, g, GSI_SAME_STMT); >g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR, > build1 (IMAGPART_EXPR, itype, lhs)); > - gsi_insert_before (gsi, g, GSI_SAME_STMT); > - g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR, > -gimple_assign_lhs (g)); > + if (may_throw) > + { > + gsi_insert_on_edge_immediate (e, g); > + *gsi = gsi_for_stmt (g); > + } > + else > + gsi_insert_before (gsi, g, GSI_SAME_STMT); > + g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g)); > + if (may_throw) > + gsi_insert_after (gsi, g, GSI_NEW_STMT); > } > - gsi_replace (gsi, g, true); > + if (!may_throw) >
Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)
On Thu, Mar 2, 2017 at 10:44 PM, Jakub Jelinekwrote: > Hi! > > For -O1 and above, we force all operands and targets of x86 builtins into > registers during expansion, because we expect combiner will do its job. > But for -O0, we try to keep them in MEM if the predicate allows it, we just > make sure that at most one input operand is a MEM. We still allow > MEM destination and one MEM input operand, which is why we ICE - most > insns/expanders don't allow that in the predicates and it is fine, but > various insns/expanders have e.g. nonimmediate_operand on both target > and some input operand and just have a condition that requires that one > of those is a register or that both of them are not memory. The expanders > don't check the condition though, it is checked only when the insn is being > recognized in vregs pass and that is too late. > The following patch fixes that by forcing all input operands at -O0 into > non-memory if the target is memory. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Or are there any define_insn/define_expand where it is desirable to have > both input and output operand a MEM (and does it have to be matching)? > For various scalar binary and unary expanders the backend already uses a > helper > that will force something into memory if dest and src are both memory and > not rtx_equal_p, but do we have anything like that in anything these two > builtin expanders emit? Any insn with matched memory (movlps, movhps and similar) can also operate with matched register. To my knowledge, all insn patterns are written in this way, since in the past we had plenty of problems with matched memory operands. Also, I really don't remember what benefit brings us to *not* force input operand to a register at -O0 at expand time and leave to RA to find matched memory. OTOH, with -O0 we already have so many unnecessary moves that you get tears in the eyes... 
> 2017-03-02 Jakub Jelinek > > PR target/79807 > * config/i386/i386.c (ix86_expand_multi_arg_builtin): If target > is a memory operand, increase num_memory. > (ix86_expand_args_builtin): Likewise. > > * gcc.target/i386/pr79807.c: New test. OK. Thanks, Uros. > > --- gcc/config/i386/i386.c.jj 2017-03-02 08:08:43.0 +0100 > +++ gcc/config/i386/i386.c 2017-03-02 17:58:28.396999033 +0100 > @@ -34249,6 +34249,8 @@ ix86_expand_multi_arg_builtin (enum insn >|| GET_MODE (target) != tmode >|| !insn_data[icode].operand[0].predicate (target, tmode)) > target = gen_reg_rtx (tmode); > + else if (memory_operand (target, tmode)) > +num_memory++; > >gcc_assert (nargs <= 4); > > @@ -35534,6 +35536,8 @@ ix86_expand_args_builtin (const struct b > || GET_MODE (target) != tmode > || !insn_p->operand[0].predicate (target, tmode)) > target = gen_reg_rtx (tmode); > + else if (memory_operand (target, tmode)) > + num_memory++; >real_target = target; > } >else > --- gcc/testsuite/gcc.target/i386/pr79807.c.jj 2017-03-02 18:03:30.032023436 > +0100 > +++ gcc/testsuite/gcc.target/i386/pr79807.c 2017-03-02 18:02:53.0 > +0100 > @@ -0,0 +1,12 @@ > +/* PR target/79807 */ > +/* { dg-do compile } */ > +/* { dg-options "-O0 -mavx -ffloat-store" } */ > + > +typedef double __v2df __attribute__ ((__vector_size__ (16))); > +typedef double __v4df __attribute__ ((__vector_size__ (32))); > + > +__v2df > +foo (__v4df x) > +{ > + return __builtin_ia32_pd_pd256 (x); > +} > > Jakub