Re: [PATCH] Fix for PR/62089 (enable missing Asan checks)
On 08/14/2014 09:46 AM, Yury Gribov wrote: Something like this?

Forgot to mention: I fully tested the new patch on x64 (bootstrap, regtest, Asan-bootstrap).
Re: [PATCH 2/3]Improve induction variable elimination
On Fri, Jul 25, 2014 at 8:35 PM, Richard Biener richard.guent...@gmail.com wrote:
> On Thu, Jul 17, 2014 at 11:08 AM, Bin Cheng bin.ch...@arm.com wrote:
>> Hi,
>> As quoted from the function difference_cannot_overflow_p,
>>
>>   /* TODO: deeper inspection may be necessary to prove the equality.  */
>>   switch (code)
>>     {
>>     case PLUS_EXPR:
>>       return expr_equal_p (e1, offset) || expr_equal_p (e2, offset);
>>     case POINTER_PLUS_EXPR:
>>       return expr_equal_p (e2, offset);
>>     default:
>>       return false;
>>     }
>>
>> the overflow check can be improved by using deeper inspection to prove the
>> equality.  This patch does that by making two improvements:
>>   a) It handles constant cases.
>>   b) It uses affine expansion as the deeper inspection to check the equality.
>> As a result, the functions strip_wrap_conserving_type_conversions and
>> expr_equal_p can now be removed.  A test case is also added to illustrate
>> the iv elimination opportunity captured by this patch.
>>
>> Thanks,
>> bin
>
> You add special casing for constants but I don't see any testcases for that.
> Specifically
>
> +  /* No overflow if offset is zero.  */
> +  if (offset == integer_zero_node)
>     return true;
>
> is a bogus check (use integer_zerop).  Apart from the special-casing of
> constants the patch looks good to me.

Hi,
I changed/rebased the patch, which passes bootstrap/test on x86_64.  Since it
was reviewed/approved before, I will commit it in 24 hours if there is no
objection.

Thanks,
bin

2014-07-17  Bin Cheng  bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (ivopts_data): New field name_expansion.
	(tree_ssa_iv_optimize_init): Initialize name_expansion.
	(tree_ssa_iv_optimize_finalize): Free name_expansion.
	(strip_wrap_conserving_type_conversions, expr_equal_p): Delete.
	(difference_cannot_overflow_p): New parameter.  Handle constant
	cases.  Use affine expansion for equality check.
	(iv_elimination_compare_lt): Pass new argument.

gcc/testsuite/ChangeLog
2014-07-17  Bin Cheng  bin.ch...@arm.com

	* gcc.dg/tree-ssa/ivopts-lt-2.c: New test.
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts" } */
+
+void
+f1 (int *p, unsigned int i)
+{
+  p += i;
+  do
+    {
+      *p = 0;
+      p += 1;
+      i++;
+    }
+  while (i < 100);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 213937)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -323,6 +323,9 @@ struct ivopts_data
   /* A bitmap of important candidates.  */
   bitmap important_candidates;
 
+  /* Cache used by tree_to_aff_combination_expand.  */
+  hash_map<tree, name_expansion *> *name_expansion_cache;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -876,6 +879,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *dat
   data->iv_candidates.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
+  data->name_expansion_cache = NULL;
   decl_rtl_to_reset.create (20);
 }
 
@@ -4462,75 +4466,20 @@ iv_elimination_compare (struct ivopts_data *data,
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
-static tree
-strip_wrap_conserving_type_conversions (tree exp)
-{
-  while (tree_ssa_useless_type_conversion (exp)
-	 && (nowrap_type_p (TREE_TYPE (exp))
-	     == nowrap_type_p (TREE_TYPE (TREE_OPERAND (exp, 0)))))
-    exp = TREE_OPERAND (exp, 0);
-  return exp;
-}
-
-/* Walk the SSA form and check whether E == WHAT.  Fairly simplistic, we
-   check for an exact match.  */
-
-static bool
-expr_equal_p (tree e, tree what)
-{
-  gimple stmt;
-  enum tree_code code;
-
-  e = strip_wrap_conserving_type_conversions (e);
-  what = strip_wrap_conserving_type_conversions (what);
-
-  code = TREE_CODE (what);
-  if (TREE_TYPE (e) != TREE_TYPE (what))
-    return false;
-
-  if (operand_equal_p (e, what, 0))
-    return true;
-
-  if (TREE_CODE (e) != SSA_NAME)
-    return false;
-
-  stmt = SSA_NAME_DEF_STMT (e);
-  if (gimple_code (stmt) != GIMPLE_ASSIGN
-      || gimple_assign_rhs_code (stmt) != code)
-    return false;
-
-  switch (get_gimple_rhs_class (code))
-    {
-    case GIMPLE_BINARY_RHS:
-      if (!expr_equal_p (gimple_assign_rhs2 (stmt), TREE_OPERAND (what, 1)))
-	return false;
-      /* Fallthru.  */
-
-    case GIMPLE_UNARY_RHS:
-    case GIMPLE_SINGLE_RHS:
-      return expr_equal_p (gimple_assign_rhs1 (stmt), TREE_OPERAND (what, 0));
-    default:
-
Re: [PATCH][www] Document versioning scheme for GCC 5 and up
Hi!

Apparently using numbers without dots in "4.9/5 Regression" has the
disadvantage that some bugs make it to the query even when they should not,
like
[4.8 Regression] ICE in set_address_disp, at rtlanal.c:5537
or
avx512f-ceil-sfix-vec-2.c and avx512f-floor-sfix-vec-2.c FAIL on Solaris9/x86
Matching a string "5" in the short_desc is simply too fragile.  Note even the
4.9 search strings are fragile; some people happened to type
"[4.8 Regression] Something bla bla, works in 4.9" or similar.

Wonder what extra resources it would take to use regexps instead, and dunno
whether we need to escape it further.

Thoughts?

--- index.html	13 Aug 2014 13:09:00 -0000	1.934
+++ index.html	14 Aug 2014 07:52:01 -0000
@@ -176,10 +176,10 @@ Any additions? Don't be shy, send them
       <br />
       <span class="regress">
       <a
-      href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advanced&amp;short_desc_type=allwordssubstr&amp;short_desc=5&amp;target_milestone=4.8.4&amp;target_milestone=4.9.2&amp;target_milestone=5.0&amp;known_to_fail_type=allwordssubstr&amp;known_to_work_type=allwordssubstr&amp;long_desc_type=allwordssubstr&amp;long_desc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_file_loc=&amp;gcchost_type=allwordssubstr&amp;gcchost=&amp;gcctarget_type=allwordssubstr&amp;gcctarget=&amp;gccbuild_type=allwordssubstr&amp;gccbuild=&amp;keywords_type=allwords&amp;keywords=&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=SUSPENDED&amp;bug_status=WAITING&amp;bug_status=REOPENED&amp;priority=P1&amp;priority=P2&amp;priority=P3&amp;emailtype1=substring&amp;email1=&amp;emailtype2=substring&amp;email2=&amp;bugidtype=include&amp;bug_id=&amp;votes=&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;cmdtype=doit&amp;order=Reuse+same+sort+as+last+time&amp;field0-0-0=noop&amp;type0-0-0=noop&amp;value0-0-0=">Serious
+      href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advanced&amp;short_desc_type=regexp&amp;short_desc=\[([ 0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression *\]&amp;target_milestone=4.8.4&amp;target_milestone=4.9.2&amp;target_milestone=5.0&amp;known_to_fail_type=allwordssubstr&amp;known_to_work_type=allwordssubstr&amp;long_desc_type=allwordssubstr&amp;long_desc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_file_loc=&amp;gcchost_type=allwordssubstr&amp;gcchost=&amp;gcctarget_type=allwordssubstr&amp;gcctarget=&amp;gccbuild_type=allwordssubstr&amp;gccbuild=&amp;keywords_type=allwords&amp;keywords=&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=SUSPENDED&amp;bug_status=WAITING&amp;bug_status=REOPENED&amp;priority=P1&amp;priority=P2&amp;priority=P3&amp;emailtype1=substring&amp;email1=&amp;emailtype2=substring&amp;email2=&amp;bugidtype=include&amp;bug_id=&amp;votes=&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;cmdtype=doit&amp;order=Reuse+same+sort+as+last+time&amp;field0-0-0=noop&amp;type0-0-0=noop&amp;value0-0-0=">Serious
       regressions</a>. <a
-      href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advanced&amp;short_desc_type=allwordssubstr&amp;short_desc=5&amp;target_milestone=4.8.4&amp;target_milestone=4.9.2&amp;target_milestone=5.0&amp;known_to_fail_type=allwordssubstr&amp;known_to_work_type=allwordssubstr&amp;long_desc_type=allwordssubstr&amp;long_desc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_file_loc=&amp;gcchost_type=allwordssubstr&amp;gcchost=&amp;gcctarget_type=allwordssubstr&amp;gcctarget=&amp;gccbuild_type=allwordssubstr&amp;gccbuild=&amp;keywords_type=allwords&amp;keywords=&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=SUSPENDED&amp;bug_status=WAITING&amp;bug_status=REOPENED&amp;emailtype1=substring&amp;email1=&amp;emailtype2=substring&amp;email2=&amp;bugidtype=include&amp;bug_id=&amp;votes=&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;cmdtype=doit&amp;order=Reuse+same+sort+as+last+time&amp;field0-0-0=noop&amp;type0-0-0=noop&amp;value0-0-0=">All
+      href="https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advanced&amp;short_desc_type=regexp&amp;short_desc=\[([ 0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression *\]&amp;target_milestone=4.8.4&amp;target_milestone=4.9.2&amp;target_milestone=5.0&amp;known_to_fail_type=allwordssubstr&amp;known_to_work_type=allwordssubstr&amp;long_desc_type=allwordssubstr&amp;long_desc=&amp;bug_file_loc_type=allwordssubstr&amp;bug_file_loc=&amp;gcchost_type=allwordssubstr&amp;gcchost=&amp;gcctarget_type=allwordssubstr&amp;gcctarget=&amp;gccbuild_type=allwordssubstr&amp;gccbuild=&amp;keywords_type=allwords&amp;keywords=&amp;bug_status=UNCONFIRMED&amp;bug_status=NEW&amp;bug_status=ASSIGNED&amp;bug_status=SUSPENDED&amp;bug_status=WAITING&amp;bug_status=REOPENED&amp;emailtype1=substring&amp;email1=&amp;emailtype2=substring&amp;email2=&amp;bugidtype=include&amp;bug_id=&amp;votes=&amp;chfieldfrom=&amp;chfieldto=Now&amp;chfieldvalue=&amp;cmdtype=doit&amp;order=Reuse+same+sort+as+last+time&amp;field0-0-0=noop&amp;type0-0-0=noop&amp;value0-0-0=">All
       regressions</a>.
       </span>
      </dd>

	Jakub
Re: [PATCH] Fix for PR/62089 (enable missing Asan checks)
On Thu, Aug 14, 2014 at 09:46:46AM +0400, Yury Gribov wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1690,22 +1690,21 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
>    int volatilep = 0, unsignedp = 0;
>    tree inner = get_inner_reference (t, &bitsize, &bitpos, &offset,
> 				    &mode, &unsignedp, &volatilep, false);
> -  if (((size_in_bytes & (size_in_bytes - 1)) == 0
> -       && (bitpos % (size_in_bytes * BITS_PER_UNIT)))
> -      || bitsize != size_in_bytes * BITS_PER_UNIT)
> +
> +  if (TREE_CODE (t) == COMPONENT_REF)
>      {
> -      if (TREE_CODE (t) == COMPONENT_REF
> -	  && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)) != NULL_TREE)
> +      if (DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)) != NULL_TREE)
> 	{
> 	  tree repr = DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1));
> 	  instrument_derefs (iter, build3 (COMPONENT_REF, TREE_TYPE (repr),
> 					   TREE_OPERAND (t, 0), repr,
> 					   NULL_TREE), location, is_store);
> +	  return;
> 	}
> -      return;
> +      else if (bitpos % BITS_PER_UNIT
> +	       || bitsize != size_in_bytes * BITS_PER_UNIT)
> +	return;

No, this should be if, not else if, and be after the } below.
We really can't handle it otherwise.

Generally, the bitfield COMPONENT_REFs should have
DECL_BIT_FIELD_REPRESENTATIVE which is not a bitfield, therefore the common
case will be handled.  As for BIT_FIELD_REFs, the question is what access to
use instead of them.

	Jakub
Re: [testsuite patch] add __ARM_ARCH check for arm_v8_neon_ok
On Mon, Aug 11, 2014 at 11:01 PM, Janis Johnson janis_john...@mentor.com wrote:
> The check for effective target arm_v8_neon_ok passes even if __ARM_ARCH is
> not 8 or greater, but then some tests fail because intrinsic functions used
> in the test have not been declared.  This patch requires that __ARM_ARCH be
> 8 or greater.
>
> Tested for arm-none-linux-gnu for mainline and 4.9 with a variety of
> multilib flags.

Out of curiosity - a number of tests are gated off by the target triplet
arm*-*-*eabi*; I wonder how many of them actually run if you test with
arm-none-linux-gnu.  Also, given that this is the pre-EABI Linux triplet,
I'd prefer not to conflate it with the use of the triplet
arm-none-linux-gnueabi(hf).

regards
Ramana

> OK for mainline and the 4.9 branch?
>
> Janis
Re: [PATCH][optabs.c] Fix PR 61713: ICE when expanding single-threaded version of atomic_test_and_set
Hi all,

CC'ing release manager,

Is this ok to backport to 4.9? Tested there with no problems.

Kyrill

On 04/08/14 17:54, Kyrill Tkachov wrote:
> On 25/07/14 23:05, Jeff Law wrote:
>> On 07/23/14 02:53, Kyrill Tkachov wrote:
>>> Darn, had forgotten to attach the patch...
>>>
>>> On 16/07/14 12:30, Kyrill Tkachov wrote:
>>>> Hi all,
>>>>
>>>> This fixes the PR mentioned in the subject.  When expanding
>>>> atomic_test_and_set we try the corresponding sync optabs, and if none
>>>> exist, as for pre-SMP ARM architectures (-march=armv6 and earlier),
>>>> the code in optabs.c reverts to a single-threaded implementation that
>>>> just does a load and a store.  However, if the result of the operation
>>>> is to be ignored, the code in builtins.c follows the convention that
>>>> the target RTX is set to const0_rtx to signify that the result is
>>>> ignored, and the expand functions in optabs.c are supposed to check
>>>> for that and act appropriately.  The single-threaded codepath in
>>>> expand_atomic_test_and_set in optabs.c didn't perform this check and
>>>> instead tried to emit a move into const0_rtx, which ICEs further down
>>>> the line.
>>>>
>>>> This patch fixes that by checking whether the result is ignored and,
>>>> if it is, omitting the load.  I wouldn't dare remove the load in
>>>> normal atomic handling code due to all the memory ordering subtleties,
>>>> but this code path occurs only for a trivially single-threaded
>>>> bare-metal target, where the compiler assumes no races and no dodgy
>>>> context switches and implements this with a ldrb+strb, so I think
>>>> removing the ldrb is valid.
>>>>
>>>> Bootstrapped on arm, aarch64 and x86 and tested as well.
>>>> Ok for trunk?  This appears on 4.9 as well; I'll test it on that
>>>> branch too.
>>>>
>>>> 2014-07-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com
>>>>
>>>> 	PR target/61713
>>>> 	* gcc/optabs.c (expand_atomic_test_and_set): Do not try to emit
>>>> 	move to subtarget in serial version if result is ignored.
>>
>> OK.
>> jeff
>
> Thanks Jeff,
>
> Is this ok for the 4.9 branch as well if testing comes back ok?
> (It applies cleanly there.)
>
> Kyrill
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
Ping.

On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson r...@redhat.com wrote:
>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>
>> insn_num might as well be bool avx2, since it's only ever set to two
>> values.
>
> Agree.  However: after the alignment, a one-operand permutation could be
> just a move and take 2 instructions for AVX2 as well; for AVX2 there could
> be other cases where the scheme takes 4 or 5 instructions; we can leave
> that for a potential AVX-512 extension.
>
>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>> +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>> +    {
>>> +      if (!TARGET_SSSE3)
>>> +	return false;
>>> +    }
>>> +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>> +    {
>>> +      if (!TARGET_AVX2)
>>> +	return false;
>>> +    }
>>> +  /* Other sizes are not supported.  */
>>> +  else
>>>      return false;
>>
>> And you'd better check it up here because...
>
> Correct.  The following should resolve the issue:
>
>   /* For AVX2 we need more than 2 instructions when the alignment
>      by itself does not produce the desired permutation.  */
>   if (TARGET_AVX2 && insn_num <= 2)
>     return false;
>
>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>> +     operand permutation.  */
>>> +  if (insn_num == 2)
>>> +    {
>>> +      ok = expand_vec_perm_1 (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>> +     vperm plus 4 instructions for one operand permutation.  */
>>> +  else if (insn_num == 6)
>>> +    {
>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  else
>>> +    ok = false;
>>>
>>>    return ok;
>>
>> ... down here you'll simply ICE from the gcc_assert.
>>
>> r~
[PATCH testcase]fix failure of g++.dg/ext/arm-fp16/fp16-mangle-1.C
Hi,

g++.dg/ext/arm-fp16/fp16-mangle-1.C fails because GCC now sets DECL_COMDAT on
template instantiations if flag_implicit_templates is in effect; DECL_WEAK is
then set accordingly.  As a result, checking for ".global" fails on this case.
This patch fixes that by relaxing the scanned string to ".global|weak", in
case the symbol is still global when some options are specified explicitly.
The change was introduced by
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213307.

Hi Jason, could you confirm that I understood the issue correctly?  Thanks
very much.

Thanks,
bin

gcc/testsuite/ChangeLog
2014-08-14  Bin Cheng  bin.ch...@arm.com

	* g++.dg/ext/arm-fp16/fp16-mangle-1.C: Also check .weak.

Index: gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C
===================================================================
--- gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C	(revision 213809)
+++ gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C	(working copy)
@@ -9,6 +9,6 @@
 /* { dg-final { scan-assembler "\t.global\t_Z1gPDhS_" } } */
 void g (__fp16 *x, __fp16 *y) { }
 
-/* { dg-final { scan-assembler "\t.global\t_ZN1SIDhDhE1iE" } } */
-template <typename T, typename U> struct S { static int i; };
+/* { dg-final { scan-assembler "\t.global|weak\t_ZN1SIDhDhE1iE" } } */
+template <typename T, typename U> struct S { static int i; };
 template <> int S<__fp16, __fp16>::i = 3;
Re: [PATCH] Fix PR61950
Dear Richie and Tobias,

I OK'd this patch on 9th August, but I now see that the posting bounced
because of MIME content emanating from my phone mail reader :-(

I also thought that the patch is obvious.

Cheers

Paul

On 13 August 2014 23:22, Tobias Burnus bur...@net-b.de wrote:
> Hi Richard,
>
> sorry for the belated reply, due to lazily spending my time at the sea ...
>
> The patch is okay - and I would even claim that it is obvious.
>
> Regarding the data type, I have no idea why it uses a hard-coded 32bit
> integer instead of a size-type one.  I have added it as an item to
> https://gcc.gnu.org/wiki/LibgfortranAbiCleanup to change it once we break
> the ABI.
>
> Tobias
>
> PS: I wonder where all the other patch reviewers are, given that quite a
> few patches have accumulated.  In particular, I am lacking a patch reviewer
> myself.
>
> Richard Biener wrote:
>> The following fixes PR61950 by properly initializing the _size field of
>> the static constructor for the vtable init member.  Previously gfortran
>> would use a 64bit integer to initialize the 32bit size field (not sure
>> why larger aggregates are not considered).  This breaks more
>> sophisticated constant-folding, and previously inhibited constant folding
>> (which would have worked in this case had there been no mismatch between
>> initializer and field).
>>
>> Bootstrap and regtest ongoing on powerpc64-linux-gnu, ok?
>>
>> I'm not considering a backport as it is only a missed optimization there.
>>
>> Thanks,
>> Richard.
>>
>> 2014-07-31  Richard Biener  rguent...@suse.de
>>
>> 	* trans-expr.c (gfc_conv_structure): Initialize _size with a
>> 	value of proper type.
Index: gcc/fortran/trans-expr.c
===================================================================
--- gcc/fortran/trans-expr.c	(revision 213342)
+++ gcc/fortran/trans-expr.c	(working copy)
@@ -6260,7 +6260,9 @@ gfc_conv_structure (gfc_se * se, gfc_exp
       else if (cm->ts.u.derived && strcmp (cm->name, "_size") == 0)
	{
	  val = TYPE_SIZE_UNIT (gfc_get_derived_type (cm->ts.u.derived));
-	  CONSTRUCTOR_APPEND_ELT (v, cm->backend_decl, val);
+	  CONSTRUCTOR_APPEND_ELT (v, cm->backend_decl,
+				  fold_convert (TREE_TYPE (cm->backend_decl),
+						val));
	}
       else
	{

-- 
The knack of flying is learning how to throw yourself at the ground and miss.
       --Hitchhikers Guide to the Galaxy
Re: [PATCH][www] Document versioning scheme for GCC 5 and up
On Thu, 14 Aug 2014, Jakub Jelinek wrote:

> Hi!
>
> Apparently using numbers without dots in "4.9/5 Regression" has the
> disadvantage that some bugs make it to the query even when they should not,
> like
> [4.8 Regression] ICE in set_address_disp, at rtlanal.c:5537
> or
> avx512f-ceil-sfix-vec-2.c and avx512f-floor-sfix-vec-2.c FAIL on Solaris9/x86
> Matching a string "5" in the short_desc is simply too fragile.  Note even the
> 4.9 search strings are fragile, some people happened to type
> "[4.8 Regression] Something bla bla, works in 4.9" or similar.
>
> Wonder what extra resources would it take to use regexps instead, and dunno
> if we don't need to escape it further.

Works for me.

Richard.

> Thoughts?
>
> [...]
Re: [PATCH][optabs.c] Fix PR 61713: ICE when expanding single-threaded version of atomic_test_and_set
On 14/08/14 09:09, Kyrill Tkachov wrote:
> Hi all,
>
> CC'ing release manager,
>
> Is this ok to backport to 4.9? Tested there with no problems.

Ah, I see Jeff already ok'd it; sorry for the noise, I must have missed it
then.

Kyrill

> [...]
Re: [PATCH] Fix PR61950
On Thu, 14 Aug 2014, Paul Richard Thomas wrote:

> Dear Richie and Tobias,
>
> I OK'd this patch on 9th August but I now see that the posting bounced
> because of mime content emanating from my phone mail reader :-(
>
> I also thought that the patch is obvious.

I already applied it after your approval.  I wasn't sure about the
obviousness because of the 32bit thing (the actual bug may be different).

Richard.

> [...]

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] Fix PR62077
On Wed, 13 Aug 2014, Jason Merrill wrote:

> On 08/13/2014 10:28 AM, Richard Biener wrote:
>> So far the patch survived building stage2 in an LTO bootstrap on the 4.9
>> branch; full testing is scheduled for trunk.
>
> The patch breaks a lot of C++ testcases, such as
> g++.old-deja/g++.other/cvt1.C; I think you need to share the
> set-canonical-type code with the template path.
>
>> Jason, are you happy with that (esp. ripping out the odd type completion
>> stuff that also messes with types recorded in said hashtable)?
>
> I'm nervous about it, since it leads to ARRAY_TYPEs with different
> TYPE_ALIGN than their elements, though I'm not sure this actually breaks
> anything.  Perhaps we could copy TYPE_ALIGN and TYPE_USER_ALIGN at the
> same place we copy TYPE_NEEDS_CONSTRUCTING.

Um, ok.  I don't feel like fiddling with this C++ frontend part and am going
to try to work around it in tree.c with something like

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 213814)
+++ gcc/tree.c	(working copy)
@@ -6759,6 +6759,13 @@ type_hash_canon (unsigned int hashcode,
   t1 = type_hash_lookup (hashcode, type);
   if (t1 != 0)
     {
+      if (TYPE_MAIN_VARIANT (t1) != t1)
+	{
+	  /* Oops.  C++ FE fun.  Overwrite the entry.  */
+	  type_hash_add (hashcode, type);
+	  return type;
+	}
+
       gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
       if (GATHER_STATISTICS)
	{
	  tree_code_counts[(int) TREE_CODE (type)]--;

as I seem to have other GC-related LTO IL differences to chase.

So - can you take over this C++ frontend issue?

Thanks,
Richard.
Re: [PATCH] Fix PR62077
On Thu, 14 Aug 2014, Richard Biener wrote:

> Um, ok.  I don't feel like fiddling with this C++ frontend part and am
> going to try to work around it in tree.c with something like
>
> Index: gcc/tree.c
> [...]

Awww, and that quickly runs into a bootstrap ICE like

In file included from
/abuild/rguenther/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/rope:2975:0,
                 from
/space/rguenther/src/svn/trunk/libstdc++-v3/include/precompiled/extc++.h:52:
/abuild/rguenther/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/ropeimpl.h:1185:52:
internal compiler error: canonical types differ for identical types const
long unsigned int [46] and const long unsigned int [46]
     _S_min_len[int(__detail::_S_max_rope_depth) + 1] = {

Richard.
Re: PING – Re: [Patch, Fortran] -fcoarray=lib - support CRITICAL, prepare for locking support
Dear Tobias, dear all,

This patch and the documentation patch are OK for trunk. Many thanks

Paul

On 6 August 2014 08:46, Tobias Burnus bur...@net-b.de wrote: * PING * – of the patch with the obvious change mentioned by Alessandro (i.e. using if (is_lock_type))?

Tobias

On 1 August 2014 21:57, Alessandro Fanfarillo wrote: Hello, I was implementing lock/unlock on the library side when I found a possible problem in the patch:

   if (is_lock_type == GFC_CAF_CRITICAL)
+    reg_type = sym->attr.artificial ? GFC_CAF_CRITICAL : GFC_CAF_LOCK_STATIC;
+  else
+    reg_type = GFC_CAF_COARRAY_STATIC;

The if statement cannot be true since is_lock_type is a boolean and GFC_CAF_CRITICAL is 4. Using if (is_lock_type) it produces the right result for the lock registration. Regards Alessandro

2014-07-28 14:37 GMT-06:00 Tobias Burnus bur...@net-b.de: This patch implements -fcoarray=lib support for CRITICAL blocks and includes some preparatory work for locking. In particular:

* Updated the documentation for locking/critical, minor cleanup. The patch goes on top of the unreviewed patch https://gcc.gnu.org/ml/fortran/2014-07/msg00155.html
* Add a libcaf_single implementation for lock/unlock
* Add lock/unlock calls for CRITICAL
* Register static/SAVEd locking variables and locking variables for critical sections.

Built and currently regtesting on x86-64-gnu-linux. OK if it regtests successfully?
* * *

Still to be done as follow-up:
* Handling the registering of lock-type components in statically allocated derived types
* Handling the registering of lock-type variables and components with allocate and with implicit/explicit deallocate
* Calling lock/unlock functions for those
* Test case for locking and critical blocks

Other coarray to-do items:
* Type-conversion test case missing
* Vector subscript library implementation + test cases
* Extending the documentation
* Issues with striding for coarray components of derived types
* Nonallocatable polymorphic coarrays and select type/associate
* Allocatable/pointer components of coarrays; co_reduce and co_broadcast

Tobias

--
The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)
On Thu, Aug 14, 2014 at 05:12:49AM +0200, Bernd Schmidt wrote: On 08/12/2014 09:35 PM, Jakub Jelinek wrote: As detailed in the PR, find_inc ignored any possible clobbers on inc_insn (typically the %cc/flags/etc. register) and thus we could ignore all register dependencies between mem_insn and inc_insn even when we could only safely ignore the mem_reg0 register dependency.

I've been trying to remember how I intended to prevent this, and there is indeed a mechanism: DEP_MULTIPLE, which ought to be set for any dependency between two insns that is found for more than one reason. find_inc does not try to break such dependencies, which should make it examine only cases where the only reason for a dependency is between the incremented register and the memory address. I'm pretty sure this worked at some point.

It can't have ever worked. DEP_MULTIPLE is set only in update_dep. But what happens here is that while add_or_update_dep_1 is indeed called twice, with

dep->pro:
(insn 4485 4484 2569 3 (parallel [
            (set (reg:SI 10 %r10 [ D.1921 ])
                (plus:SI (plus:SI (ltu:SI (reg:CCL1 33 %cc)
                            (const_int 0 [0]))
                        (reg:SI 10 %r10 [ D.1921 ]))
                    (mem:SI (plus:SI (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
                            (const_int 128 [0x80])) [4 MEM[base: _1817, offset: 128B]+0 S4 A64])))
            (clobber (reg:CC 33 %cc))
        ]) pr62025.c:139 407 {*addsi3_alc}
     (expr_list:REG_DEAD (reg:CCL1 33 %cc)
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))

dep->con:
(insn 1027 2571 2573 3 (parallel [
            (set (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
                (plus:SI (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
                    (const_int 128 [0x80])))
            (clobber (reg:CC 33 %cc))
        ]) 327 {*addsi3}
     (expr_list:REG_UNUSED (reg:CC 33 %cc)
        (nil)))

dep->type: REG_DEP_ANTI

(once because of the %cc clobber -> %cc use, and then because of the %r3 set -> %r3 use), update_dep isn't called at all and thus DEP_MULTIPLE is not set.
That is because we hit the

  if (true_dependency_cache != NULL)
    {
      switch (ask_dependency_caches (new_dep))
	{
	case DEP_PRESENT:
	  return DEP_PRESENT;

path and return DEP_PRESENT immediately. Thus, DEP_MULTIPLE is set only if you have multiple kinds of dependencies between pro and con, not just multiple reasons to have the same kind of dependency.

So, to set DEP_MULTIPLE even in the case where ask_dependency_caches returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't that going to be expensive and totally kill all the effects of true_dependency_cache?) and set DEP_MULTIPLE on it if not already set.

Jakub
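The structural problem described above is generic: a fast membership cache answers only "is there already a dependency?", so an early return on a cache hit never reaches the code that records that the same dependency was found for a second reason. A toy model, with all names invented (this is not GCC's scheduler code):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 32

struct dep
{
  int pro, con;
  bool multiple;	/* discovered for more than one reason? */
};

static struct dep deps[MAX_DEPS];
static int n_deps;
static bool cache[8][8];	/* fast "is there any pro->con dep?" bitmap */

static struct dep *
find_dep (int pro, int con)
{
  for (int i = 0; i < n_deps; i++)
    if (deps[i].pro == pro && deps[i].con == con)
      return &deps[i];
  return NULL;
}

/* Record a pro->con dependency, flagging it as multiple on every
   sighting after the first.  */
static struct dep *
add_dep (int pro, int con)
{
  if (cache[pro][con])
    {
      /* Cache says "present": the slow lookup here is the price of
	 setting the flag correctly, mirroring Jakub's concern above.  */
      struct dep *d = find_dep (pro, con);
      d->multiple = true;
      return d;
    }

  cache[pro][con] = true;
  deps[n_deps] = (struct dep) { pro, con, false };
  return &deps[n_deps++];
}
```

Without the slow lookup on the cache-hit path, `multiple` would stay false forever once the cache answers, which is exactly the DEP_MULTIPLE failure mode in the thread.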
Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)
On Thu, Aug 14, 2014 at 11:34:04AM +0200, Jakub Jelinek wrote: So, to set DEP_MULTIPLE even in the case where ask_dependency_caches returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't that going to be expensive and totally kill all the effects of true_dependency_cache?) and set DEP_MULTIPLE on it if not already set.

Something like this (untested except for the openssl s390 testcase); I haven't checked what effect it has on the number of successful find_inc cases.

--- gcc/sched-deps.c.jj	2014-08-12 17:06:26.0 +0200
+++ gcc/sched-deps.c	2014-08-14 11:46:25.509631874 +0200
@@ -1233,6 +1233,13 @@ add_or_update_dep_1 (dep_t new_dep, bool
       switch (ask_dependency_caches (new_dep))
	{
	case DEP_PRESENT:
+	  dep_t present_dep;
+	  sd_iterator_def sd_it;
+
+	  present_dep = sd_find_dep_between_no_cache (DEP_PRO (new_dep),
+						      DEP_CON (new_dep),
+						      resolved_p, &sd_it);
+	  DEP_MULTIPLE (present_dep) = 1;
	  return DEP_PRESENT;
 
	case DEP_CHANGED:
@@ -4752,23 +4759,6 @@ find_inc (struct mem_inc_info *mii, bool
	  goto next;
	}
 
-      /* The inc instruction could have clobbers, make sure those
-	 registers are not used in mem insn.  */
-      FOR_EACH_INSN_DEF (def, mii->inc_insn)
-	if (!reg_overlap_mentioned_p (DF_REF_REG (def), mii->mem_reg0))
-	  {
-	    df_ref use;
-	    FOR_EACH_INSN_USE (use, mii->mem_insn)
-	      if (reg_overlap_mentioned_p (DF_REF_REG (def),
-					   DF_REF_REG (use)))
-		{
-		  if (sched_verbose >= 5)
-		    fprintf (sched_dump,
-			     "inc clobber used in store failure.\n");
-		  goto next;
-		}
-	  }
-
       newaddr = mii->inc_input;
       if (mii->mem_index != NULL_RTX)
	newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr,

Jakub
Re: RFC: Patch for switch elimination (PR 54742)
On Wed, Aug 13, 2014 at 11:06 PM, Jeff Law l...@redhat.com wrote: On 08/13/14 14:55, Sebastian Pop wrote: Steve Ellcey wrote:

+/* This file implements an optimization where, when a variable is set
+   to a constant value and there is a path that leads from that definition
+   to a switch statement that uses that variable as its controlling expression,
+   we duplicate the blocks on this path and replace the jump to the switch
+   statement with a direct jump to the label of the switch block that control
+   would go to based on the value of the variable.  This can come up in
+   loops/switch statements that implement state machines.

Can you explain why the jump-threading pass cannot do this? Why do we need another pass to handle a corner case that jump threading does not catch yet?

I'm pretty sure jump threading *could* handle it, but after looking at the full testcase when it was posted, I'm not sure it's *wise* to handle this in jump threading. The core issue is we have to look even deeper into the CFG than was originally envisioned and do quite a bit more block duplication than was originally envisioned. That's going to have a notable compile-time cost (and, to a lesser extent, issues around code size).

It's unfortunate that the testcase when we first looked at this was over-simplified and not actually representative of what happens in Coremark. Had I seen the Coremark testcase, I probably wouldn't have suggested we tackle the problem in jump threading, and the extensions I made to jump threading would _not_ have included aggressively following backedges in the CFG. You'll note in a separate thread Steve and I discussed this during Cauldron, and it was at my recommendation Steve resurrected his proof-of-concept plugin and started beating it into shape.

But do we really want a pass just to help Coremark?

Richard.

jeff
Re: [PATCH] Don't set dir prefix twice (PR middle-end/60484)
On Thu, Aug 14, 2014 at 7:34 AM, Joey Ye joey.ye...@gmail.com wrote: PR60484 is marked as a 4.7/4.8 regression and it was reported against 4.8 recently by a user. OK to backport to 4.7/4.8?

The 4.7 branch is closed.

Richard.

- Joey

On Sat, Mar 15, 2014 at 1:43 AM, Joseph S. Myers jos...@codesourcery.com wrote: On Fri, 14 Mar 2014, Marek Polacek wrote: This patch makes sure that we set the directory prefix of dump_base_name only once; otherwise we'd end up with an invalid path, resulting in "error: could not open dump file ...". This happened because finish_options is called for every optimize attribute and once more for the command-line options, and every time it added the directory prefix. Regtested/bootstrapped on x86_64-linux, ok for trunk?

OK, though I think it might be better to use separate fields of gcc_options for the originally specified name and the prefixed version.

-- Joseph S. Myers jos...@codesourcery.com
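Joseph's suggestion (separate fields for the original and the prefixed name) makes the prefixing idempotent by construction: no matter how many times the options-finishing hook runs, the prefixed name is recomputed from the pristine base name rather than from its own previous value. A sketch under assumed names (the struct, fields, and function below are illustrative, not GCC's):

```c
#include <stdio.h>
#include <string.h>

struct dump_opts
{
  char base[128];	/* name as given on the command line */
  char prefixed[256];	/* directory prefix applied exactly once */
};

/* May be called once per 'optimize' attribute plus once for the
   command line; because it always starts from BASE, prefixes can no
   longer stack into an invalid path like "dir/dir/foo.c".  */
static void
finish_options (struct dump_opts *o, const char *dir)
{
  snprintf (o->prefixed, sizeof o->prefixed, "%s/%s", dir, o->base);
}
```

Contrast this with prefixing a single shared field in place, where each extra call prepends the directory again, which is the bug the patch fixes.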
[PATCH] Speed up type_hash_canon
This speeds up type_hash_canon by avoiding a 2nd hashtable lookup in the case no previous same type is in the hashtable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

	* tree.c (type_hash_lookup, type_hash_add): Merge into ...
	(type_hash_canon): ... this and avoid 2nd lookup for the add.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 213951)
+++ gcc/tree.c (working copy)
@@ -6763,44 +6763,6 @@ type_hash_hash (const void *item)
   return ((const struct type_hash *) item)->hash;
 }
 
-/* Look in the type hash table for a type isomorphic to TYPE.
-   If one is found, return it.  Otherwise return 0.  */
-
-static tree
-type_hash_lookup (hashval_t hashcode, tree type)
-{
-  struct type_hash *h, in;
-
-  /* The TYPE_ALIGN field of a type is set by layout_type(), so we
-     must call that routine before comparing TYPE_ALIGNs.  */
-  layout_type (type);
-
-  in.hash = hashcode;
-  in.type = type;
-
-  h = (struct type_hash *) htab_find_with_hash (type_hash_table, &in,
-						hashcode);
-  if (h)
-    return h->type;
-  return NULL_TREE;
-}
-
-/* Add an entry to the type-hash-table
-   for a type TYPE whose hash code is HASHCODE.  */
-
-static void
-type_hash_add (hashval_t hashcode, tree type)
-{
-  struct type_hash *h;
-  void **loc;
-
-  h = ggc_alloc<type_hash> ();
-  h->hash = hashcode;
-  h->type = type;
-  loc = htab_find_slot_with_hash (type_hash_table, h, hashcode, INSERT);
-  *loc = (void *)h;
-}
-
 /* Given TYPE, and HASHCODE its hash code, return the canonical
    object for an identical type if one already exists.
    Otherwise, return TYPE, and record it as the canonical object.
@@ -6813,17 +6775,28 @@ type_hash_add (hashval_t hashcode, tree
 tree
 type_hash_canon (unsigned int hashcode, tree type)
 {
-  tree t1;
+  type_hash in;
+  void **loc;
 
   /* The hash table only contains main variants, so ensure that's
      what we're being passed.  */
   gcc_assert (TYPE_MAIN_VARIANT (type) == type);
 
-  /* See if the type is in the hash table already.  If so, return it.
-     Otherwise, add the type.  */
-  t1 = type_hash_lookup (hashcode, type);
-  if (t1 != 0)
+  /* The TYPE_ALIGN field of a type is set by layout_type(), so we
+     must call that routine before comparing TYPE_ALIGNs.  */
+  layout_type (type);
+
+  in.hash = hashcode;
+  in.type = type;
+
+  loc = htab_find_slot_with_hash (type_hash_table, &in, hashcode, INSERT);
+  if (*loc)
     {
+      tree t1 = ((type_hash *) *loc)->type;
+      /* ??? We'd like to assert here that the hashtable only contains
+	 main variants but the C++ frontend breaks this by modifying
+	 types already in the hashtable in build_cplus_array_type.  */
+      /* gcc_assert (TYPE_MAIN_VARIANT (t1) == t1); */
       if (GATHER_STATISTICS)
	{
	  tree_code_counts[(int) TREE_CODE (type)]--;
@@ -6834,7 +6807,13 @@ type_hash_canon (unsigned int hashcode,
     }
   else
     {
-      type_hash_add (hashcode, type);
+      struct type_hash *h;
+
+      h = ggc_alloc<type_hash> ();
+      h->hash = hashcode;
+      h->type = type;
+      *loc = (void *)h;
+
       return type;
     }
 }
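The speed-up above is an instance of a general pattern: instead of a lookup followed by a separate insert (two probe sequences over the table), ask for the slot once with INSERT semantics and let the caller fill it if empty. A minimal open-addressing sketch of the same idea, not GCC's libiberty htab (names are invented, and pointer equality stands in for the real type comparison):

```c
#include <stddef.h>

#define TABSIZE 16u	/* power of two keeps the probe math simple */

static const void *slots[TABSIZE];

/* Return the address of the slot for HASH/KEY: either the slot holding
   an equal entry or the first empty slot, which the caller may fill.
   This is the single probe sequence shared by lookup and insert.  */
static const void **
find_slot (unsigned hash, const void *key)
{
  for (unsigned i = 0; i < TABSIZE; i++)
    {
      const void **slot = &slots[(hash + i) & (TABSIZE - 1)];
      if (*slot == NULL || *slot == key)
	return slot;
    }
  return NULL;	/* table full */
}

/* Find-or-insert in one pass, like htab_find_slot_with_hash with
   INSERT in the patch above.  */
static const void *
canon (unsigned hash, const void *key)
{
  const void **slot = find_slot (hash, key);
  if (*slot != NULL)
    return *slot;	/* existing canonical entry */
  *slot = key;		/* record KEY as canonical */
  return key;
}
```

With the old lookup-then-add structure, the miss path walks the same probe sequence twice; the merged version touches it once, which is where the saving comes from.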
[C++ Patch] PR 54377
Hi,

this is a diagnostic issue about the error message for a wrong number of template arguments vs default arguments. The fix is simple, but also replacing the second error with a note, as recommended by Manuel in the audit trail, means that we have to adjust a few existing testcases. Note, I'm replacing 'or more' with 'at least', as, again, recommended in the audit trail, and consistently with the other similar error message recently added.

Tested x86_64-linux.

Thanks, Paolo.

///

/cp
2014-08-14  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/54377
	* pt.c (coerce_template_parms): Improve error message vs default
	arguments.

/testsuite
2014-08-14  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/54377
	* g++.dg/template/pr54377.C: New.
	* g++.dg/cpp0x/alias-decl-2.C: Adjust.
	* g++.dg/cpp0x/pr51226.C: Likewise.
	* g++.dg/cpp0x/variadic2.C: Likewise.
	* g++.dg/parse/too-many-tmpl-args1.C: Likewise.
	* g++.dg/template/dtor3.C: Likewise.
	* g++.dg/template/qualttp4.C: Likewise.
	* g++.dg/template/spec28.C: Likewise.
	* g++.old-deja/g++.brendan/crash8.C: Likewise.
	* g++.old-deja/g++.pt/ttp7.C: Likewise.

Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 213952)
+++ cp/pt.c (working copy)
@@ -6861,19 +6861,24 @@ coerce_template_parms (tree parms,
   int variadic_args_p = 0;
   int post_variadic_parms = 0;
 
+  /* Likewise for parameters with default arguments.  */
+  int default_p = 0;
+
   if (args == error_mark_node)
     return error_mark_node;
 
   nparms = TREE_VEC_LENGTH (parms);
 
-  /* Determine if there are any parameter packs.  */
+  /* Determine if there are any parameter packs or default arguments.  */
   for (parm_idx = 0; parm_idx < nparms; ++parm_idx)
     {
-      tree tparm = TREE_VALUE (TREE_VEC_ELT (parms, parm_idx));
+      tree parm = TREE_VEC_ELT (parms, parm_idx);
       if (variadic_p)
	++post_variadic_parms;
-      if (template_parameter_pack_p (tparm))
+      if (template_parameter_pack_p (TREE_VALUE (parm)))
	++variadic_p;
+      if (TREE_PURPOSE (parm))
+	++default_p;
     }
 
   inner_args = orig_inner_args = INNERMOST_TEMPLATE_ARGS (args);
@@ -6902,11 +6907,11 @@ coerce_template_parms (tree parms,
     {
       if (complain & tf_error)
	{
-	  if (variadic_p)
+	  if (variadic_p || default_p)
	    {
-	      nparms -= variadic_p;
+	      nparms -= variadic_p ? variadic_p : default_p;
	      error ("wrong number of template arguments "
-		     "(%d, should be %d or more)", nargs, nparms);
+		     "(%d, should be at least %d)", nargs, nparms);
	    }
	  else
	    error ("wrong number of template arguments "
@@ -6913,7 +6918,7 @@ coerce_template_parms (tree parms,
		   "(%d, should be %d)", nargs, nparms);
 
	  if (in_decl)
-	    error ("provided for %q+D", in_decl);
+	    inform (input_location, "provided for %q+D", in_decl);
	}
 
       return error_mark_node;

Index: testsuite/g++.dg/cpp0x/alias-decl-2.C
===================================================================
--- testsuite/g++.dg/cpp0x/alias-decl-2.C (revision 213952)
+++ testsuite/g++.dg/cpp0x/alias-decl-2.C (working copy)
@@ -22,7 +22,7 @@
 template<class T> using Vec = Vector<T, Alloc<T> >;
 
 template<class T> void g(Vector<T, Alloc<T> >&);
 
-template<template<class T> class TT> void h(TT<int>); // { dg-error "provided for" }
+template<template<class T> class TT> void h(TT<int>); // { dg-message "provided for" }
 
 void bar()

Index: testsuite/g++.dg/cpp0x/pr51226.C
===================================================================
--- testsuite/g++.dg/cpp0x/pr51226.C (revision 213952)
+++ testsuite/g++.dg/cpp0x/pr51226.C (working copy)
@@ -1,7 +1,7 @@
 // PR c++/51226
 // { dg-do compile { target c++11 } }
 
-template<int> struct A // { dg-error "provided" }
+template<int> struct A // { dg-message "provided" }
 {
   enum E : int;
 };

Index: testsuite/g++.dg/cpp0x/variadic2.C
===================================================================
--- testsuite/g++.dg/cpp0x/variadic2.C (revision 213952)
+++ testsuite/g++.dg/cpp0x/variadic2.C (working copy)
@@ -6,9 +6,9 @@
 template<typename... = int> // { dg-error "default" }
 class tuple3;
 
 template<typename T1, typename T2, typename... Rest>
-struct two_or_more {}; // { dg-error "provided for" }
+struct two_or_more {}; // { dg-message "provided for" }
 
-typedef two_or_more<int> bad; // { dg-error "2 or more" "2 or more" }
+typedef two_or_more<int> bad; // { dg-error "at least 2" "at least 2" }
 
 void f()
 {

Index: testsuite/g++.dg/parse/too-many-tmpl-args1.C
===================================================================
--- testsuite/g++.dg/parse/too-many-tmpl-args1.C (revision 213952)
+++ testsuite/g++.dg/parse/too-many-tmpl-args1.C
Re: [patch,gomp4] make fortran loop variables implicitly private in openacc
Hi!

On Wed, 13 Aug 2014 22:41:47 +0200, Tobias Burnus bur...@net-b.de wrote: Cesar Philippidis wrote: According to section 2.6.1 in the OpenACC spec, Fortran loop variables should be implicitly private, like in OpenMP. This patch does just so.

Makes sense. Looking at the patch, I wonder whether the context is properly handled when mixing OpenMP and OpenACC. I have the feeling that one then might end up using the OpenACC context when one should use the OpenMP context. However, I have not fully followed the program flow. For instance, would something like

  !$oacc parallel
  !$omp simd private(i) reduction(+:c)
  do i = 1, n
    ...
  end do

be properly handled?

While that isn't something we're currently focussing on supporting, my intuition tells me that there shouldn't be a separate oacc_current_ctx, and omp_current_ctx should be used instead for OpenACC contexts, too. (If you guys agree, don't let fixing that hold up the patch as posted.)

Regards,
Thomas
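For reference, the rule the patch implements can be illustrated in C (the patch itself is about the Fortran front end): under OpenACC, the loop variable of a loop associated with a loop directive is predetermined private, so it need not appear in a private() clause. Compiled without OpenACC support the pragma is simply ignored and the function runs serially, so the sketch is testable either way:

```c
/* Sum 0..n-1; `i` is implicitly private per the OpenACC rule discussed
   above, and `sum` is combined via the reduction clause.  Without
   -fopenacc the pragma is ignored and the loop runs serially.  */
static int
sum_to (int n)
{
  int sum = 0;
  int i;	/* no private(i) needed: predetermined private */

#pragma acc parallel loop reduction(+:sum)
  for (i = 0; i < n; i++)
    sum += i;

  return sum;
}
```

The mixed OpenMP/OpenACC question raised in the thread is about which internal context such a loop is resolved against, not about this user-visible privatization rule.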
[PATCH i386 AVX512] [9/n] Extend iterators and attributes.
Hello,

This patch extends iterators and iterator mode attributes to support new patterns (future patches). Bootstrapped. Is it ok for trunk?

	* config/i386/sse.md (define_mode_attr avx512): New.
	(define_mode_attr sse2_avx_avx512f): Allow V8HI, V16HI, V32HI,
	V2DI, V4DI modes.
	(define_mode_attr sse2_avx2): Allow V64QI, V32HI, V4TI modes.
	(define_mode_attr ssse3_avx2): Ditto.
	(define_mode_attr sse4_1_avx2): Allow V64QI, V32HI, V8DI modes.
	(define_mode_attr avx2_avx512bw): New.
	(define_mode_attr ssedoublemodelower): New.
	(define_mode_attr ssedoublemode): Allow V8SF, V8SI, V4DI, V4DF,
	V4SI, V32HI, V64QI modes.
	(define_mode_attr ssebytemode): Allow V8DI modes.
	(define_mode_attr sseinsnmode): Allow V4TI, V32HI, V64QI modes.
	(define_mode_attr sseintvecmodelower): Allow V8DF, V4TI modes.
	(define_mode_attr ssePSmode2): New.
	(define_mode_attr ssescalarsize): Allow V64QI, V32QI, V16QI,
	V8HI, V16HI, V32HI modes.
	(define_mode_attr dbpsadbwmode): New.
	(define_mode_attr bcstscalarsuff): Allow V64QI, V32QI, V16QI,
	V32HI, V16HI, V8HI, V8SI, V4SI, V4DI, V2DI, V8SF, V4SF, V4DF,
	V2DF modes.
	(vi8_sse4_1_avx2_avx512): New.
	(define_insn "<sse4_1_avx2>_movntdqa"): Use
	<vi8_sse4_1_avx2_avx512> mode attribute.
-- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index ebe38f3..89a1842 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -331,32 +331,41 @@ (V4SI TARGET_AVX2) (V2DI TARGET_AVX2) (V8SI TARGET_AVX2) (V4DI TARGET_AVX2)]) +(define_mode_attr avx512 + [(V16QI avx512vl) (V32QI avx512vl) (V64QI avx512bw) + (V8HI avx512vl) (V16HI avx512vl) (V32HI avx512bw) + (V4SI avx512vl) (V8SI avx512vl) (V16SI avx512f) + (V2DI avx512vl) (V4DI avx512vl) (V8DI avx512f) + (V4SF avx512vl) (V8SF avx512vl) (V16SF avx512f) + (V2DF avx512vl) (V4DF avx512vl) (V8DF avx512f)]) + (define_mode_attr sse2_avx_avx512f [(V16QI sse2) (V32QI avx) (V64QI avx512f) + (V8HI avx512vl) (V16HI avx512vl) (V32HI avx512bw) (V4SI sse2) (V8SI avx) (V16SI avx512f) - (V8DI avx512f) + (V2DI avx512vl) (V4DI avx512vl) (V8DI avx512f) (V16SF avx512f) (V8SF avx) (V4SF avx) (V8DF avx512f) (V4DF avx) (V2DF avx)]) (define_mode_attr sse2_avx2 - [(V16QI sse2) (V32QI avx2) - (V8HI sse2) (V16HI avx2) + [(V16QI sse2) (V32QI avx2) (V64QI avx512bw) + (V8HI sse2) (V16HI avx2) (V32HI avx512bw) (V4SI sse2) (V8SI avx2) (V16SI avx512f) (V2DI sse2) (V4DI avx2) (V8DI avx512f) - (V1TI sse2) (V2TI avx2)]) + (V1TI sse2) (V2TI avx2) (V4TI avx512bw)]) (define_mode_attr ssse3_avx2 - [(V16QI ssse3) (V32QI avx2) -(V4HI ssse3) (V8HI ssse3) (V16HI avx2) + [(V16QI ssse3) (V32QI avx2) (V64QI avx512bw) +(V4HI ssse3) (V8HI ssse3) (V16HI avx2) (V32HI avx512bw) (V4SI ssse3) (V8SI avx2) (V2DI ssse3) (V4DI avx2) -(TI ssse3) (V2TI avx2)]) +(TI ssse3) (V2TI avx2) (V4TI avx512bw)]) (define_mode_attr sse4_1_avx2 - [(V16QI sse4_1) (V32QI avx2) -(V8HI sse4_1) (V16HI avx2) + [(V16QI sse4_1) (V32QI avx2) (V64QI avx512bw) +(V8HI sse4_1) (V16HI avx2) (V32HI avx512bw) (V4SI sse4_1) (V8SI avx2) (V16SI avx512f) -(V2DI sse4_1) (V4DI avx2) (V8DI avx512f)]) +(V2DI sse4_1) (V4DI avx2) (V8DI avx512dq)]) (define_mode_attr avx_avx2 [(V4SF avx) (V2DF avx) @@ -376,6 +385,13 @@ (V8SF avx2) (V16SF avx512f) (V4DF avx2) (V8DF avx512f)]) 
+(define_mode_attr avx2_avx512bw + [(V4SI avx2) (V8SI avx2) (V16SI avx512f) + (V2DI avx2) (V4DI avx2) (V8DI avx512f) + (V4SF avx2) (V8SF avx2) (V16SF avx512f) + (V2DF avx2) (V4DF avx2) (V8DF avx512f) + (V8HI avx512vl) (V16HI avx512vl) (V32HI avx512bw)]) + (define_mode_attr shuffletype [(V16SF f) (V16SI i) (V8DF f) (V8DI i) (V8SF f) (V8SI i) (V4DF f) (V4DI i) @@ -386,13 +402,19 @@ (define_mode_attr ssequartermode [(V16SF V4SF) (V8DF V2DF) (V16SI V4SI) (V8DI V2DI)]) +(define_mode_attr ssedoublemodelower + [(V16QI v16hi) (V32QI v32hi) (V64QI v64hi) + (V8HI v8si) (V16HI v16si) (V32HI v32si) + (V4SI v4di) (V8SI v8di) (V16SI v16di)]) + (define_mode_attr ssedoublemode [(V16SF V32SF) (V16SI V32SI) (V8DI V16DI) (V8DF V16DF) - (V16HI V16SI) (V8HI V8SI) (V4HI V4SI) - (V32QI V32HI) (V16QI V16HI)]) + (V8SF V16SF) (V8SI V16SI) (V4DI V8DI) (V4DF V8DF) + (V16HI V16SI) (V8HI V8SI) (V4HI V4SI) (V4SI V4DI) + (V32HI V32SI) (V32QI V32HI) (V16QI V16HI) (V64QI V64HI)]) (define_mode_attr ssebytemode - [(V4DI V32QI) (V2DI V16QI)]) + [(V8DI V64QI) (V4DI V32QI) (V2DI V16QI)]) ;; All 128bit vector integer modes (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI]) @@ -461,7 +483,7 @@ ;; SSE instruction mode (define_mode_attr sseinsnmode - [(V64QI XI) (V32HI XI) (V16SI XI) (V8DI XI) + [(V64QI XI) (V32HI XI) (V16SI XI) (V8DI XI) (V4TI XI) (V32QI OI) (V16HI OI) (V8SI OI) (V4DI OI) (V2TI OI) (V16QI TI) (V8HI TI) (V4SI TI) (V2DI
[PATCH i386 AVX512] [10/n] Add vector move/load/store.
Hello,

This patch extends load/store insns. No built-ins are added in this patch. Bootstrapped. New tests on top of the patch set all pass under the simulator. Is it ok for trunk?

gcc/
	* config/i386/i386.c (ix86_expand_special_args_builtin): Handle
	avx512vl_storev8sf_mask, avx512vl_storev8si_mask,
	avx512vl_storev4df_mask, avx512vl_storev4di_mask,
	avx512vl_storev4sf_mask, avx512vl_storev4si_mask,
	avx512vl_storev2df_mask, avx512vl_storev2di_mask,
	avx512vl_loadv8sf_mask, avx512vl_loadv8si_mask,
	avx512vl_loadv4df_mask, avx512vl_loadv4di_mask,
	avx512vl_loadv4sf_mask, avx512vl_loadv4si_mask,
	avx512vl_loadv2df_mask, avx512vl_loadv2di_mask,
	avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask,
	avx512vl_loadv16qi_mask, avx512bw_loadv32hi_mask,
	avx512vl_loadv16hi_mask, avx512vl_loadv8hi_mask.
	* config/i386/i386.md: Allow V32HI mode.
	* config/i386/sse.md (define_mode_iterator VMOVE): Allow V4TI mode.
	(define_mode_iterator V_AVX512VL): New.
	(define_mode_iterator V): New handling for AVX512VL.
	(define_insn "avx512f_load<mode>_mask"): Delete.
	(define_insn "<avx512>_load<mode>_mask"): New.
	(define_insn "avx512f_store<mode>_mask"): Delete.
	(define_insn "<avx512>_store<mode>_mask"): New.
-- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 183b7be..da01ac6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_storev16si_mask: case CODE_FOR_avx512f_storev8df_mask: case CODE_FOR_avx512f_storev8di_mask: + case CODE_FOR_avx512vl_storev8sf_mask: + case CODE_FOR_avx512vl_storev8si_mask: + case CODE_FOR_avx512vl_storev4df_mask: + case CODE_FOR_avx512vl_storev4di_mask: + case CODE_FOR_avx512vl_storev4sf_mask: + case CODE_FOR_avx512vl_storev4si_mask: + case CODE_FOR_avx512vl_storev2df_mask: + case CODE_FOR_avx512vl_storev2di_mask: aligned_mem = true; break; default: @@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_loadv16si_mask: case CODE_FOR_avx512f_loadv8df_mask: case CODE_FOR_avx512f_loadv8di_mask: + case CODE_FOR_avx512vl_loadv8sf_mask: + case CODE_FOR_avx512vl_loadv8si_mask: + case CODE_FOR_avx512vl_loadv4df_mask: + case CODE_FOR_avx512vl_loadv4di_mask: + case CODE_FOR_avx512vl_loadv4sf_mask: + case CODE_FOR_avx512vl_loadv4si_mask: + case CODE_FOR_avx512vl_loadv2df_mask: + case CODE_FOR_avx512vl_loadv2di_mask: + case CODE_FOR_avx512bw_loadv64qi_mask: + case CODE_FOR_avx512vl_loadv32qi_mask: + case CODE_FOR_avx512vl_loadv16qi_mask: + case CODE_FOR_avx512bw_loadv32hi_mask: + case CODE_FOR_avx512vl_loadv16hi_mask: + case CODE_FOR_avx512vl_loadv8hi_mask: aligned_mem = true; break; default: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index a72c206..b8ce6c0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1054,7 +1054,7 @@ (V4SF ps) (V2DF pd) (V16QI b) (V8HI w) (V4SI d) (V2DI q) (V32QI b) (V16HI w) (V8SI d) (V4DI q) - (V64QI b) (V16SI d) (V8DI q)]) + (V64QI b) (V32HI w) (V16SI d) (V8DI q)]) ;; SSE vector suffix for floating point modes (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)]) diff 
--git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 89a1842..910b29b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -146,10 +146,21 @@ (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI (V8DI TARGET_AVX512F) (V4DI TARGET_AVX) V2DI - (V2TI TARGET_AVX) V1TI + (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI (V16SF TARGET_AVX512F) (V8SF TARGET_AVX) V4SF (V8DF TARGET_AVX512F) (V4DF TARGET_AVX) V2DF]) +;; All AVX512VL vector modes +(define_mode_iterator V_AVX512VL + [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL TARGET_AVX512BW) + (V16QI TARGET_AVX512VL TARGET_AVX512BW) + (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL TARGET_AVX512BW) + (V8HI TARGET_AVX512VL TARGET_AVX512BW) + (V16SI TARGET_AVX512F) (V8SI TARGET_AVX512VL) (V4SI TARGET_AVX512VL) + (V8DI TARGET_AVX512F) (V4DI TARGET_AVX512VL) (V2DI TARGET_AVX512VL) + (V16SF TARGET_AVX512F) (V8SF TARGET_AVX512VL) (V4SF TARGET_AVX512VL) + (V8DF TARGET_AVX512F) (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)]) + ;; All vector modes (define_mode_iterator V [(V32QI TARGET_AVX) V16QI @@ -708,12 +719,10 @@ case 2: /* There is no evex-encoded vmov* for sizes smaller than 64-bytes in avx512f, so we need to use workarounds, to access sse registers -16-31, which are evex-only. */ - if (TARGET_AVX512F MODE_SIZE 64 -
Re: [PATCH i386 AVX512] [9/n] Extend iterators and attributes.
On Thu, Aug 14, 2014 at 1:20 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: This patch extends iterators and iterator mode attributes to support new patterns (future patches). Bootstrapped. Is it ok for trunk?

	* config/i386/sse.md (define_mode_attr avx512): New.
	(define_mode_attr sse2_avx_avx512f): Allow V8HI, V16HI, V32HI,
	V2DI, V4DI modes.
	(define_mode_attr sse2_avx2): Allow V64QI, V32HI, V4TI modes.
	(define_mode_attr ssse3_avx2): Ditto.
	(define_mode_attr sse4_1_avx2): Allow V64QI, V32HI, V8DI modes.
	(define_mode_attr avx2_avx512bw): New.
	(define_mode_attr ssedoublemodelower): New.
	(define_mode_attr ssedoublemode): Allow V8SF, V8SI, V4DI, V4DF,
	V4SI, V32HI, V64QI modes.
	(define_mode_attr ssebytemode): Allow V8DI modes.
	(define_mode_attr sseinsnmode): Allow V4TI, V32HI, V64QI modes.
	(define_mode_attr sseintvecmodelower): Allow V8DF, V4TI modes.
	(define_mode_attr ssePSmode2): New.
	(define_mode_attr ssescalarsize): Allow V64QI, V32QI, V16QI,
	V8HI, V16HI, V32HI modes.
	(define_mode_attr dbpsadbwmode): New.
	(define_mode_attr bcstscalarsuff): Allow V64QI, V32QI, V16QI,
	V32HI, V16HI, V8HI, V8SI, V4SI, V4DI, V2DI, V8SF, V4SF, V4DF,
	V2DF modes.
	(vi8_sse4_1_avx2_avx512): New.
	(define_insn "<sse4_1_avx2>_movntdqa"): Use
	<vi8_sse4_1_avx2_avx512> mode attribute.

OK (although I'd put single-use attributes - blendbits and probably the new dbpsadbwmode - nearby their users).

Uros.
[PATCH i386 AVX512] [11/n] AVX-512DQ extract.
Hello,

This patch extends the vec_extract_hi_<mode> pattern to support the AVX-512DQ insn. Bootstrapped. New tests on top of the patch set all pass under the simulator. Is it ok for trunk?

gcc/
	* config/i386/i386.md (define_attr "isa"): Add avx512dq,
	noavx512dq.
	(define_attr "enabled"): Ditto.
	* config/i386/sse.md (define_insn
	"vec_extract_hi_<mode><mask_name>"): Support masking.

--
Thanks, K

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b8ce6c0..3a797c8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -741,7 +741,7 @@
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-		    fma_avx512f,avx512bw,noavx512bw"
+		    fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -774,6 +774,8 @@
	   (symbol_ref "TARGET_FMA || TARGET_AVX512F")
	 (eq_attr "isa" "avx512bw") (symbol_ref "TARGET_AVX512BW")
	 (eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
+	 (eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ")
+	 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
	]
	(const_int 1)))
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 910b29b..3d3d1a0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6132,6 +6132,29 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "vec_extract_hi_<mode><mask_name>"
+  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>,vm")
+	(vec_select:<ssehalfvecmode>
+	  (match_operand:V16FI 1 "register_operand" "v,v")
+	  (parallel [(const_int 8) (const_int 9)
+		     (const_int 10) (const_int 11)
+		     (const_int 12) (const_int 13)
+		     (const_int 14) (const_int 15)])))]
+  "TARGET_AVX512F && (!<mask_applied> || TARGET_AVX512DQ)"
+  "@
+   vextract<shuffletype>32x8\t{$0x1, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x1}
+   vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix_extra" "1")
+   (set_attr "isa" "avx512dq,noavx512dq")
+   (set_attr "length_immediate" "1")
+   (set (attr "memory")
+	(if_then_else (match_test "MEM_P (operands[0])")
+	  (const_string "store")
+	  (const_string "none")))
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
    (match_operand:V_256 1 "register_operand")
@@ -6178,23 +6201,6 @@
   DONE;
 })
 
-(define_insn "vec_extract_hi_<mode>"
-  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,m")
-	(vec_select:<ssehalfvecmode>
-	  (match_operand:V16FI 1 "nonimmediate_operand" "v,v")
-	  (parallel [(const_int 8) (const_int 9)
-		     (const_int 10) (const_int 11)
-		     (const_int 12) (const_int 13)
-		     (const_int 14) (const_int 15)])))]
-  "TARGET_AVX512F"
-  "vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 (define_insn_and_split "vec_extract_lo_<mode>"
   [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=x,m")
	(vec_select:<ssehalfvecmode>
[PATCH i386 AVX512] [12/n] Extend OI/TImode moves.
Hello, This patch extends moves to OI/TI mode. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.md (define_insn *movoi_internal_avx): Add EVEX version. (define_insn *movti_internal): Ditto. -- Thanks, K diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 3a797c8..39fb2307 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1920,8 +1920,8 @@ (set_attr mode XI)]) (define_insn *movoi_internal_avx - [(set (match_operand:OI 0 nonimmediate_operand =x,x ,m) - (match_operand:OI 1 vector_move_operand C ,xm,x))] + [(set (match_operand:OI 0 nonimmediate_operand =v,v ,m) + (match_operand:OI 1 vector_move_operand C ,vm,v))] TARGET_AVX !(MEM_P (operands[0]) MEM_P (operands[1])) { switch (get_attr_type (insn)) @@ -1935,6 +1935,8 @@ { if (get_attr_mode (insn) == MODE_V8SF) return vmovups\t{%1, %0|%0, %1}; + else if (get_attr_mode (insn) == MODE_XI) + return vmovdqu32\t{%1, %0|%0, %1}; else return vmovdqu\t{%1, %0|%0, %1}; } @@ -1942,6 +1944,8 @@ { if (get_attr_mode (insn) == MODE_V8SF) return vmovaps\t{%1, %0|%0, %1}; + else if (get_attr_mode (insn) == MODE_XI) + return vmovdqa32\t{%1, %0|%0, %1}; else return vmovdqa\t{%1, %0|%0, %1}; } @@ -1953,7 +1957,10 @@ [(set_attr type sselog1,ssemov,ssemov) (set_attr prefix vex) (set (attr mode) - (cond [(match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) + (cond [(ior (match_operand 0 ext_sse_reg_operand) + (match_operand 1 ext_sse_reg_operand)) +(const_string XI) + (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) (const_string V8SF) (and (eq_attr alternative 2) (match_test TARGET_SSE_TYPELESS_STORES)) @@ -1962,8 +1969,8 @@ (const_string OI)))]) (define_insn *movti_internal - [(set (match_operand:TI 0 nonimmediate_operand =!r ,o ,x,x ,m) - (match_operand:TI 1 general_operand riFo,re,C,xm,x))] + [(set (match_operand:TI 0 nonimmediate_operand =!r ,o ,v,v ,m) + (match_operand:TI 1 general_operand riFo,re,C,vm,v))] (TARGET_64BIT || TARGET_SSE) 
!(MEM_P (operands[0]) MEM_P (operands[1])) { @@ -1983,6 +1990,8 @@ { if (get_attr_mode (insn) == MODE_V4SF) return %vmovups\t{%1, %0|%0, %1}; + else if (get_attr_mode (insn) == MODE_XI) + return vmovdqu32\t{%1, %0|%0, %1}; else return %vmovdqu\t{%1, %0|%0, %1}; } @@ -1990,6 +1999,8 @@ { if (get_attr_mode (insn) == MODE_V4SF) return %vmovaps\t{%1, %0|%0, %1}; + else if (get_attr_mode (insn) == MODE_XI) + return vmovdqa32\t{%1, %0|%0, %1}; else return %vmovdqa\t{%1, %0|%0, %1}; } @@ -2005,7 +2016,10 @@ (const_string maybe_vex) (const_string orig))) (set (attr mode) - (cond [(eq_attr alternative 0,1) + (cond [(ior (match_operand 0 ext_sse_reg_operand) + (match_operand 1 ext_sse_reg_operand)) +(const_string XI) + (eq_attr alternative 0,1) (const_string DI) (ior (not (match_test TARGET_SSE2)) (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL))
Re: [PATCH i386 AVX512] [10/n] Add vector move/load/store.
On Thu, Aug 14, 2014 at 1:30 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends load/store insns. No built-ins added in this patch. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_special_args_builtin): Handle avx512vl_storev8sf_mask, avx512vl_storev8si_mask, avx512vl_storev4df_mask, avx512vl_storev4di_mask, avx512vl_storev4sf_mask, avx512vl_storev4si_mask, avx512vl_storev2df_mask, avx512vl_storev2di_mask, avx512vl_loadv8sf_mask, avx512vl_loadv8si_mask, avx512vl_loadv4df_mask, avx512vl_loadv4di_mask, avx512vl_loadv4sf_mask, avx512vl_loadv4si_mask, avx512vl_loadv2df_mask, avx512vl_loadv2di_mask, avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask, avx512vl_loadv16qi_mask, avx512bw_loadv32hi_mask, avx512vl_loadv16hi_mask, avx512vl_loadv8hi_mask. * config/i386/i386.md: Allow V32HI mode. Please update the above entry. * config/i386/sse.md (define_mode_iterator VMOVE): Allow V4TI mode. (define_mode_iterator V_AVX512VL): New. (define_mode_iterator V): New handling for AVX512VL. (define_insn avx512f_loadmode_mask): Delete. (define_insn avx512_loadmode_mask): New. (define_insn avx512f_storemode_mask): Delete. (define_insn avx512_storemode_mask): New. 
-- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 183b7be..da01ac6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_storev16si_mask: case CODE_FOR_avx512f_storev8df_mask: case CODE_FOR_avx512f_storev8di_mask: + case CODE_FOR_avx512vl_storev8sf_mask: + case CODE_FOR_avx512vl_storev8si_mask: + case CODE_FOR_avx512vl_storev4df_mask: + case CODE_FOR_avx512vl_storev4di_mask: + case CODE_FOR_avx512vl_storev4sf_mask: + case CODE_FOR_avx512vl_storev4si_mask: + case CODE_FOR_avx512vl_storev2df_mask: + case CODE_FOR_avx512vl_storev2di_mask: aligned_mem = true; break; default: @@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_loadv16si_mask: case CODE_FOR_avx512f_loadv8df_mask: case CODE_FOR_avx512f_loadv8di_mask: + case CODE_FOR_avx512vl_loadv8sf_mask: + case CODE_FOR_avx512vl_loadv8si_mask: + case CODE_FOR_avx512vl_loadv4df_mask: + case CODE_FOR_avx512vl_loadv4di_mask: + case CODE_FOR_avx512vl_loadv4sf_mask: + case CODE_FOR_avx512vl_loadv4si_mask: + case CODE_FOR_avx512vl_loadv2df_mask: + case CODE_FOR_avx512vl_loadv2di_mask: + case CODE_FOR_avx512bw_loadv64qi_mask: + case CODE_FOR_avx512vl_loadv32qi_mask: + case CODE_FOR_avx512vl_loadv16qi_mask: + case CODE_FOR_avx512bw_loadv32hi_mask: + case CODE_FOR_avx512vl_loadv16hi_mask: + case CODE_FOR_avx512vl_loadv8hi_mask: aligned_mem = true; break; default: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index a72c206..b8ce6c0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1054,7 +1054,7 @@ (V4SF ps) (V2DF pd) (V16QI b) (V8HI w) (V4SI d) (V2DI q) (V32QI b) (V16HI w) (V8SI d) (V4DI q) - (V64QI b) (V16SI d) (V8DI q)]) + (V64QI b) (V32HI w) (V16SI d) (V8DI q)]) ;; SSE vector suffix for floating point modes (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)]) diff 
--git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 89a1842..910b29b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -146,10 +146,21 @@ (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI (V8DI TARGET_AVX512F) (V4DI TARGET_AVX) V2DI - (V2TI TARGET_AVX) V1TI + (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI Are you sure TARGET_AVX is the correct condition for V4TI? (V16SF TARGET_AVX512F) (V8SF TARGET_AVX) V4SF (V8DF TARGET_AVX512F) (V4DF TARGET_AVX) V2DF]) +;; All AVX512VL vector modes +(define_mode_iterator V_AVX512VL + [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL TARGET_AVX512BW) + (V16QI TARGET_AVX512VL TARGET_AVX512BW) + (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL TARGET_AVX512BW) Is the above OK? So, you have to pass -m...vl and -m...bw to the compiler to enable these modes? Uros.
[PATCH] Fix PR62067
A static checker determined that LTO def_fn_type has erratic va_start/end combinations on error paths. The following fixes that. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-08-14 Richard Biener rguent...@suse.de * lto-lang.c (def_fn_type): Fix error handling wrt va_end. Index: gcc/lto/lto-lang.c === --- gcc/lto/lto-lang.c (revision 213958) +++ gcc/lto/lto-lang.c (working copy) @@ -524,6 +524,7 @@ def_fn_type (builtin_type def, builtin_t tree *args = XALLOCAVEC (tree, n); va_list list; int i; + bool err = false; va_start (list, n); for (i = 0; i n; ++i) @@ -531,22 +532,22 @@ def_fn_type (builtin_type def, builtin_t builtin_type a = (builtin_type) va_arg (list, int); t = builtin_types[a]; if (t == error_mark_node) - goto egress; + err = true; args[i] = t; } va_end (list); t = builtin_types[ret]; + if (err) +t = error_mark_node; if (t == error_mark_node) -goto egress; - if (var) +; + else if (var) t = build_varargs_function_type_array (t, n, args); else t = build_function_type_array (t, n, args); - egress: builtin_types[def] = t; - va_end (list); } /* Used to help initialize the builtin-types.def table. When a type of
[PATCH PR62011]
Hi All, Here is a fix for PR 62011: it removes the false dependency of unary bit-manipulation instructions on recent big-core chips (Sandy Bridge and Haswell) by emitting a zeroing of the destination register before the bit-manipulation instruction in the assembly output. I checked that performance is restored for the popcnt, lzcnt and tzcnt instructions. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-08-14 Yuri Rumyantsev ysrum...@gmail.com PR target/62011 * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function prototype. * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function. * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM): New macro. * config/i386/i386.md (ctzmode2, clzmode2_lzcnt, popcountmode2, *popcountmode2_cmp, *popcountsi2_cmp_zext): Output zeroing destination register for unary bit-manipulation instructions if required. * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New. [patch attached as binary data]
[PATCH] Fix PR62081
The following fixes missing dominator computation before fixing loops. Rather than doing even more such weird stuff in a pass gate function, this patch puts the fixup into a new pass scheduled before the loop pass group's gate. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-08-14 Richard Biener rguent...@suse.de PR tree-optimization/62081 * tree-ssa-loop.c (pass_fix_loops): New pass. (pass_tree_loop::gate): Do not fixup loops here. * tree-pass.h (make_pass_fix_loops): Declare. * passes.def: Schedule pass_fix_loops before GIMPLE loop passes. Index: gcc/tree-ssa-loop.c === --- gcc/tree-ssa-loop.c (revision 213956) +++ gcc/tree-ssa-loop.c (working copy) @@ -43,6 +43,56 @@ along with GCC; see the file COPYING3. #include tree-vectorizer.h +/* A pass making sure loops are fixed up. */ + +namespace { + +const pass_data pass_data_fix_loops = +{ + GIMPLE_PASS, /* type */ + fix_loops, /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_LOOP, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_fix_loops : public gimple_opt_pass +{ +public: + pass_fix_loops (gcc::context *ctxt) +: gimple_opt_pass (pass_data_fix_loops, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return flag_tree_loop_optimize; } + + virtual unsigned int execute (function *fn); +}; // class pass_fix_loops + +unsigned int +pass_fix_loops::execute (function *) +{ + if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) +{ + calculate_dominance_info (CDI_DOMINATORS); + fix_loop_structure (NULL); +} + return 0; +} + +} // anon namespace + +gimple_opt_pass * +make_pass_fix_loops (gcc::context *ctxt) +{ + return new pass_fix_loops (ctxt); +} + + /* Gate for loop pass group. The group is controlled by -ftree-loop-optimize but we also avoid running it when the IL doesn't contain any loop. 
*/ @@ -57,9 +107,6 @@ gate_loop (function *fn) if (!loops_for_fn (fn)) return true; - /* Make sure to drop / re-discover loops when necessary. */ - if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) -fix_loop_structure (NULL); return number_of_loops (fn) 1; } Index: gcc/passes.def === --- gcc/passes.def (revision 213956) +++ gcc/passes.def (working copy) @@ -200,7 +200,10 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_asan); NEXT_PASS (pass_tsan); /* Pass group that runs when 1) enabled, 2) there are loops -in the function. */ +in the function. Make sure to run pass_fix_loops before +to discover/remove loops before running the gate function +of pass_tree_loop. */ + NEXT_PASS (pass_fix_loops); NEXT_PASS (pass_tree_loop); PUSH_INSERT_PASSES_WITHIN (pass_tree_loop) NEXT_PASS (pass_tree_loop_init); Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 213956) +++ gcc/tree-pass.h (working copy) @@ -349,6 +349,7 @@ extern gimple_opt_pass *make_pass_sra_ea extern gimple_opt_pass *make_pass_early_ipa_sra (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
Re: [PATCH i386 AVX512] [9/n] Extend iterators and attributes.
Hello Uroš, On 14 Aug 13:35, Uros Bizjak wrote: OK (although I'd put single-use attributes - blendbits and probably new dbpsadbwmode - nearby their users). Thanks! I'll move `blendbits' next to its user. Since `dbpsadbwmode' currently has no user on main trunk, I'll move it next to its user in the patch that introduces that user. -- Thanks, K
Re: [PATCH i386 AVX512] [11/n] AVX-512DQ extract.
On Thu, Aug 14, 2014 at 1:36 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: This patch extends vec_extract_hi_mode pattern to support AVX-512DQ insn. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.md (define_attr isa): Add avx512dq, noavx512dq. (define_attr enabled): Ditto. * config/i386/sse.md (define_insn vec_extract_hi_modemask_name): Support masking. -- Thanks, K diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index b8ce6c0..3a797c8 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -741,7 +741,7 @@ (define_attr isa base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64, sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx, avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f, - fma_avx512f,avx512bw,noavx512bw + fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq (const_string base)) (define_attr enabled @@ -774,6 +774,8 @@ (symbol_ref TARGET_FMA || TARGET_AVX512F) (eq_attr isa avx512bw) (symbol_ref TARGET_AVX512BW) (eq_attr isa noavx512bw) (symbol_ref !TARGET_AVX512BW) +(eq_attr isa avx512dq) (symbol_ref TARGET_AVX512DQ) +(eq_attr isa noavx512dq) (symbol_ref !TARGET_AVX512DQ) ] (const_int 1))) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 910b29b..3d3d1a0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -6132,6 +6132,29 @@ (set_attr prefix evex) (set_attr mode sseinsnmode)]) +(define_insn vec_extract_hi_modemask_name + [(set (match_operand:ssehalfvecmode 0 store_mask_predicate =store_mask_constraint,vm) + (vec_select:ssehalfvecmode + (match_operand:V16FI 1 register_operand v,v) + (parallel [(const_int 8) (const_int 9) +(const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)])))] + TARGET_AVX512F (!mask_applied || TARGET_AVX512DQ) + @ + vextractshuffletype32x8\t{$0x1, %1, %0mask_operand2|%0mask_operand2, %1, 0x1} + vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1} + [(set_attr type sselog) + 
(set_attr prefix_extra 1) + (set_attr isa avx512dq,noavx512dq) + (set_attr length_immediate 1) + (set (attr memory) + (if_then_else (match_test MEM_P (operands[0])) + (const_string store) + (const_string none))) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) Please change the type to sselog1 (the pattern has only one input operand). The memory attribute will be then set automatically and its explicit handling can be removed. OK with this change. Uros.
[PATCH i386 AVX512] [13/n] Add VF2_AVX512VL and vcvt[t]pd2 insns.
Hello, This patch introduces vcvt[t]pd2 patterns. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.md (define_code_attr ufix_bool): New. * config/i386/sse.md (define_mode_iterator VF2_AVX512VL): New. (define_mode_attr sseintvecmode2): New. (define_insn ufix_truncv2dfv2si2mask_name): Add masking. (define_insn fixsuffixfix_truncv4dfv4si2mask_name): New. (define_insn fixsuffixfix_truncmodesseintvecmodelower2mask_nameround_saeonly_name): New. (define_insn fix_notruncmodesseintvecmodelower2mask_nameround_name): New. (define_insn ufix_notruncmodesseintvecmodelower2mask_nameround_name): New. -- Thanks, K diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 39fb2307..f6351cf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -881,6 +881,7 @@ ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) +(define_code_attr ufix_bool [(fix false) (unsigned_fix true)]) ;; All integer modes. 
(define_mode_iterator SWI1248x [QI HI SI DI]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 3d3d1a0..eaf97ab 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -228,6 +228,9 @@ (define_mode_iterator VF_512 [V16SF V8DF]) +(define_mode_iterator VF2_AVX512VL + [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F) @@ -523,6 +526,10 @@ (V32HI V32HI) (V64QI V64QI) (V32QI V32QI) (V16QI V16QI)]) +(define_mode_attr sseintvecmode2 + [(V8DF XI) (V4DF OI) (V2DF TI) + (V8SF OI) (V4SF TI)]) + (define_mode_attr sseintvecmodelower [(V16SF v16si) (V8DF v8di) (V8SF v8si) (V4DF v4di) @@ -4236,15 +4243,58 @@ (set_attr prefix evex) (set_attr mode OI)]) -(define_insn fix_truncv4dfv4si2 - [(set (match_operand:V4SI 0 register_operand =x) - (fix:V4SI (match_operand:V4DF 1 nonimmediate_operand xm)))] - TARGET_AVX - vcvttpd2dq{y}\t{%1, %0|%0, %1} +(define_insn ufix_truncv2dfv2si2mask_name + [(set (match_operand:V4SI 0 register_operand =v) + (vec_concat:V4SI + (unsigned_fix:V2SI (match_operand:V2DF 1 nonimmediate_operand vm)) + (const_vector:V2SI [(const_int 0) (const_int 0)])))] + TARGET_AVX512VL + vcvttpd2udq{x}\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssecvt) - (set_attr prefix vex) + (set_attr prefix evex) + (set_attr mode TI)]) + +(define_insn fixsuffixfix_truncv4dfv4si2mask_name + [(set (match_operand:V4SI 0 register_operand =v) + (any_fix:V4SI (match_operand:V4DF 1 nonimmediate_operand vm)))] + (TARGET_AVX !ufix_bool) || (TARGET_AVX512VL TARGET_AVX512F) + vcvttpd2fixsuffixdq{y}\t{%1, %0mask_operand2|%0mask_operand2, %1} + [(set_attr type ssecvt) + (set_attr prefix maybe_evex) (set_attr mode OI)]) +(define_insn fixsuffixfix_truncmodesseintvecmodelower2mask_nameround_saeonly_name + [(set (match_operand:sseintvecmode 0 register_operand =v) + (any_fix:sseintvecmode + (match_operand:VF2_AVX512VL 1 round_saeonly_nimm_predicate 
round_saeonly_constraint)))] + TARGET_AVX512DQ round_saeonly_mode512bit_condition + vcvttpd2fixsuffixqq\t{round_saeonly_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_saeonly_mask_op2} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode sseintvecmode2)]) + +(define_insn fix_notruncmodesseintvecmodelower2mask_nameround_name + [(set (match_operand:sseintvecmode 0 register_operand =v) + (unspec:sseintvecmode + [(match_operand:VF2_AVX512VL 1 round_nimm_predicate round_constraint)] + UNSPEC_FIX_NOTRUNC))] + TARGET_AVX512DQ round_mode512bit_condition + vcvtpd2qq\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_mask_op2} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode sseintvecmode2)]) + +(define_insn ufix_notruncmodesseintvecmodelower2mask_nameround_name + [(set (match_operand:sseintvecmode 0 register_operand =v) + (unspec:sseintvecmode + [(match_operand:VF2_AVX512VL 1 nonimmediate_operand round_constraint)] + UNSPEC_UNSIGNED_FIX_NOTRUNC))] + TARGET_AVX512DQ round_mode512bit_condition + vcvtpd2uqq\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_mask_op2} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode sseintvecmode2)]) + (define_expand avx_cvttpd2dq256_2 [(set (match_operand:V8SI 0 register_operand) (vec_concat:V8SI
[PATCH i386 AVX512] [15/n] Extend vcvtudq2ps to avx512vl.
Hello, This patch extends vcvtudq2ps to support AVX-512VL new insns. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator VF1_AVX512VL): New. (define_insn ufloatv16siv16sf2mask_nameround_name): Delete. (define_insn ufloatsseintvecmodelowermode2mask_nameround_name): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d5598d4..819dae1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -237,6 +237,9 @@ (define_mode_iterator VF2_AVX512VL [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)]) +(define_mode_iterator VF1_AVX512VL + [V16SF (V8SF TARGET_AVX512VL) (V4SF TARGET_AVX512VL)]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F) @@ -3687,15 +3690,15 @@ (set_attr prefix maybe_vex) (set_attr mode sseinsnmode)]) -(define_insn ufloatv16siv16sf2mask_nameround_name - [(set (match_operand:V16SF 0 register_operand =v) - (unsigned_float:V16SF - (match_operand:V16SI 1 round_nimm_predicate round_constraint)))] +(define_insn ufloatsseintvecmodelowermode2mask_nameround_name + [(set (match_operand:VF1_AVX512VL 0 register_operand =v) + (unsigned_float:VF1_AVX512VL + (match_operand:sseintvecmode 1 nonimmediate_operand round_constraint)))] TARGET_AVX512F vcvtudq2ps\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_mask_op2} [(set_attr type ssecvt) (set_attr prefix evex) - (set_attr mode V16SF)]) + (set_attr mode MODE)]) (define_expand floatunssseintvecmodelowermode2 [(match_operand:VF1 0 register_operand)
Fwd: [PATCH i386 AVX512] [14/n] Add convert to PS insn patterns.
Adding community. -- Forwarded message -- From: Kirill Yukhin kirill.yuk...@gmail.com Date: Thu, Aug 14, 2014 at 4:16 PM Subject: [PATCH i386 AVX512] [14/n] Add convert to PS insn patterns. To: Uros Bizjak ubiz...@gmail.com Hello, This patch introduces new patterns for conversions to PS. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c: Rename ufloatv8siv8df_mask to ufloatv8siv8df2_mask. * config/i386/i386.md (define_code_iterator any_float): New. (define_code_attr floatsuffix): New. * config/i386/sse.md (define_mode_iterator VF1_128_256VL): New. (define_mode_iterator VF2_512_256VL): New. (define_insn floatsi2dfmodelowermode2mask_name): Remove unnecessary TARGET check. (define_insn ufloatv8siv8dfmask_name): Delete. (define_insn floatsuffixfloatsseintvecmodelowermode2mask_nameround_name): New. (define_mode_attr qq2pssuff): New. (define_mode_attr sselongvecmode): New. (define_mode_attr sselongvecmodelower): New. (define_mode_attr sseintvecmode3): New. (define_insn floatsuffixfloatsselongvecmodelowermode2mask_nameround_name): New. (define_insn *floatsuffixfloatv2div2sf2): New. (define_insn floatsuffixfloatv2div2sf2_mask): New. (define_insn ufloatsi2dfmodelowermode2mask_name): New. (define_insn ufloatv2siv2df2mask_name): New. 
-- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index da01ac6..e2eda2a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -30036,7 +30036,7 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_compressv16sf_mask, __builtin_ia32_compresssf512_mask, IX86_BUILTIN_COMPRESSPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_floatv8siv8df2_mask, __builtin_ia32_cvtdq2pd512_mask, IX86_BUILTIN_CVTDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_QI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vcvtps2ph512_mask, __builtin_ia32_vcvtps2ph512_mask, IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_HI }, - { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufloatv8siv8df_mask, __builtin_ia32_cvtudq2pd512_mask, IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_QI }, + { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufloatv8siv8df2_mask, __builtin_ia32_cvtudq2pd512_mask, IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_QI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_cvtusi2sd32, __builtin_ia32_cvtusi2sd32, IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_expandv8df_mask, __builtin_ia32_expanddf512_mask, IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_expandv8df_maskz, __builtin_ia32_expanddf512_maskz, IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI }, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index f6351cf..e686cf6 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -883,6 +883,10 @@ (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) (define_code_attr ufix_bool [(fix false) (unsigned_fix true)]) +;; Used in signed and unsigned float. 
+(define_code_iterator any_float [float unsigned_float]) +(define_code_attr floatsuffix [(float ) (unsigned_float u)]) + ;; All integer modes. (define_mode_iterator SWI1248x [QI HI SI DI]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index eaf97ab..d5598d4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -205,6 +205,9 @@ (define_mode_iterator VF1_128_256 [(V8SF TARGET_AVX) V4SF]) +(define_mode_iterator VF1_128_256VL + [V8SF (V4SF TARGET_AVX512VL)]) + ;; All DFmode vector float modes (define_mode_iterator VF2 [(V8DF TARGET_AVX512F) (V4DF TARGET_AVX) V2DF]) @@ -216,6 +219,9 @@ (define_mode_iterator VF2_512_256 [(V8DF TARGET_AVX512F) (V4DF TARGET_AVX)]) +(define_mode_iterator VF2_512_256VL + [V8DF (V4DF TARGET_AVX512VL)]) + ;; All 128bit vector float modes (define_mode_iterator VF_128 [V4SF (V2DF TARGET_SSE2)]) @@ -4091,21 +4097,94 @@ (define_insn floatsi2dfmodelowermode2mask_name [(set (match_operand:VF2_512_256 0 register_operand =v) (float:VF2_512_256 (match_operand:si2dfmode 1 nonimmediate_operand vm)))] - TARGET_AVX mask_mode512bit_condition + mask_mode512bit_condition vcvtdq2pd\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssecvt) (set_attr prefix maybe_vex) (set_attr mode MODE)]) -(define_insn ufloatv8siv8dfmask_name - [(set (match_operand:V8DF 0 register_operand =v) - (unsigned_float:V8DF - (match_operand:V8SI 1 nonimmediate_operand vm)))] - TARGET_AVX512F +(define_insn floatsuffixfloatsseintvecmodelowermode2mask_nameround_name + [(set
Re: [PATCH i386 AVX512] [10/n] Add vector move/load/store.
On 14 Aug 13:45, Uros Bizjak wrote: Please update the above entry. Whoops. Updated ChangeLog: gcc/ * config/i386/i386.c (ix86_expand_special_args_builtin): Handle avx512vl_storev8sf_mask, avx512vl_storev8si_mask, avx512vl_storev4df_mask, avx512vl_storev4di_mask, avx512vl_storev4sf_mask, avx512vl_storev4si_mask, avx512vl_storev2df_mask, avx512vl_storev2di_mask, avx512vl_loadv8sf_mask, avx512vl_loadv8si_mask, avx512vl_loadv4df_mask, avx512vl_loadv4di_mask, avx512vl_loadv4sf_mask, avx512vl_loadv4si_mask, avx512vl_loadv2df_mask, avx512vl_loadv2di_mask, avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask, avx512vl_loadv16qi_mask, avx512bw_loadv32hi_mask, avx512vl_loadv16hi_mask, avx512vl_loadv8hi_mask. * config/i386/i386.md (define_mode_attr ssemodesuffix): Allow V32HI mode. * config/i386/sse.md (define_mode_iterator VMOVE): Allow V4TI mode. (define_mode_iterator V_AVX512VL): New. (define_mode_iterator V): New handling for AVX512VL. (define_insn avx512f_loadmode_mask): Delete. (define_insn avx512_loadmode_mask): New. (define_insn avx512f_storemode_mask): Delete. (define_insn avx512_storemode_mask): New. --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -146,10 +146,21 @@ (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI (V8DI TARGET_AVX512F) (V4DI TARGET_AVX) V2DI - (V2TI TARGET_AVX) V1TI + (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI Are you sure TARGET_AVX is the correct condition for V4TI? Right! This should be TARGET_AVX512BW (because corresponding shifts belong to AVX-512BW). +;; All AVX512VL vector modes +(define_mode_iterator V_AVX512VL + [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL TARGET_AVX512BW) + (V16QI TARGET_AVX512VL TARGET_AVX512BW) + (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL TARGET_AVX512BW) Is the above OK? So, you have to pass -m...vl and -m...bw to the compiler to enable these modes? Yeah. This looks strange, but should be so. Simplest example: vpaddb (with regno 15). 
This insn is enabled only when AVX512VL *and* AVX512BW bits are on. Updated patch in the bottom. Bootstrapped. Is it ok? -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 183b7be..da01ac6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_storev16si_mask: case CODE_FOR_avx512f_storev8df_mask: case CODE_FOR_avx512f_storev8di_mask: + case CODE_FOR_avx512vl_storev8sf_mask: + case CODE_FOR_avx512vl_storev8si_mask: + case CODE_FOR_avx512vl_storev4df_mask: + case CODE_FOR_avx512vl_storev4di_mask: + case CODE_FOR_avx512vl_storev4sf_mask: + case CODE_FOR_avx512vl_storev4si_mask: + case CODE_FOR_avx512vl_storev2df_mask: + case CODE_FOR_avx512vl_storev2di_mask: aligned_mem = true; break; default: @@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case CODE_FOR_avx512f_loadv16si_mask: case CODE_FOR_avx512f_loadv8df_mask: case CODE_FOR_avx512f_loadv8di_mask: + case CODE_FOR_avx512vl_loadv8sf_mask: + case CODE_FOR_avx512vl_loadv8si_mask: + case CODE_FOR_avx512vl_loadv4df_mask: + case CODE_FOR_avx512vl_loadv4di_mask: + case CODE_FOR_avx512vl_loadv4sf_mask: + case CODE_FOR_avx512vl_loadv4si_mask: + case CODE_FOR_avx512vl_loadv2df_mask: + case CODE_FOR_avx512vl_loadv2di_mask: + case CODE_FOR_avx512bw_loadv64qi_mask: + case CODE_FOR_avx512vl_loadv32qi_mask: + case CODE_FOR_avx512vl_loadv16qi_mask: + case CODE_FOR_avx512bw_loadv32hi_mask: + case CODE_FOR_avx512vl_loadv16hi_mask: + case CODE_FOR_avx512vl_loadv8hi_mask: aligned_mem = true; break; default: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index a72c206..b8ce6c0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1054,7 +1054,7 @@ (V4SF ps) (V2DF pd) (V16QI b) (V8HI w) (V4SI d) (V2DI q) (V32QI b) (V16HI w) (V8SI d) (V4DI q) - (V64QI b) (V16SI d) (V8DI q)]) + (V64QI b) (V32HI w) (V16SI 
d) (V8DI q)]) ;; SSE vector suffix for floating point modes (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 89a1842..ea56bcb 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -146,10 +146,21 @@ (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI (V8DI TARGET_AVX512F) (V4DI TARGET_AVX) V2DI - (V2TI TARGET_AVX) V1TI + (V4TI TARGET_AVX512BW) (V2TI TARGET_AVX) V1TI (V16SF TARGET_AVX512F) (V8SF
[PATCH] Bump BASE-VER, change snapshots to gcc-5-2014xxxxx
See $subject. Ok? Thanks, Richard. 2014-08-14 Richard Biener rguent...@suse.de maintainer-scripts/ * crontab: Change trunk snapshots from 4.10 to 5. gcc/ * BASE-VER: Change to 5.0.0 Index: maintainer-scripts/crontab === --- maintainer-scripts/crontab (revision 213958) +++ maintainer-scripts/crontab (working copy) @@ -3,4 +3,4 @@ 55 0 * * * sh /home/gccadmin/scripts/update_web_docs_libstdcxx_svn 32 22 * * 3 sh /home/gccadmin/scripts/gcc_release -s 4.9:branches/gcc-4_9-branch -l -d /sourceware/snapshot-tmp/gcc all 32 22 * * 4 sh /home/gccadmin/scripts/gcc_release -s 4.8:branches/gcc-4_8-branch -l -d /sourceware/snapshot-tmp/gcc all -32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 4.10:trunk -l -d /sourceware/snapshot-tmp/gcc all +32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 5:trunk -l -d /sourceware/snapshot-tmp/gcc all Index: gcc/BASE-VER === --- gcc/BASE-VER(revision 213958) +++ gcc/BASE-VER(working copy) @@ -1 +1 @@ -4.10.0 +5.0.0
Re: [PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.
I've observed a SPEC2006 failure on the avx512-vlbwdq branch. It was caused by hardreg_cprop: maybe_mode_change assumed that all registers of the same register class and the same mode are OK. This is not the case for i386/avx512; we need to honor HARD_REGNO_MODE_OK. One could argue that having a class where some members are OK for use in a particular mode, but other members are not, is the core issue here. Can you describe a bit why you've got a class of that nature? Background on that would be useful. AVX-512 added 16 new xmm registers (xmm16-xmm31). Those registers require EVEX encoding. With avx512f only the 512-bit-wide versions of instructions have an EVEX encoding, but with avx512vl all versions have it. Most instructions have the same macroized pattern for the 128/256/512-bit vector lengths. They all use constraint 'v', which corresponds to class ALL_SSE_REGS (xmm0-xmm31). To disallow e.g. xmm20 in the 256-bit case (avx512f only) and allow it only in the avx512vl case, HARD_REGNO_MODE_OK checks whether the regno is EVEX-only and disallows it if the mode is not 512-bit.
Re: [PATCH] Bump BASE-VER, change snapshots to gcc-5-2014xxxxx
On Thu, Aug 14, 2014 at 03:09:26PM +0200, Richard Biener wrote: See $subject. Ok? Thanks, Richard. 2014-08-14 Richard Biener rguent...@suse.de maintainer-scripts/ * crontab: Change trunk snapshots from 4.10 to 5. gcc/ * BASE-VER: Change to 5.0.0 LGTM. --- maintainer-scripts/crontab(revision 213958) +++ maintainer-scripts/crontab(working copy) @@ -3,4 +3,4 @@ 55 0 * * * sh /home/gccadmin/scripts/update_web_docs_libstdcxx_svn 32 22 * * 3 sh /home/gccadmin/scripts/gcc_release -s 4.9:branches/gcc-4_9-branch -l -d /sourceware/snapshot-tmp/gcc all 32 22 * * 4 sh /home/gccadmin/scripts/gcc_release -s 4.8:branches/gcc-4_8-branch -l -d /sourceware/snapshot-tmp/gcc all -32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 4.10:trunk -l -d /sourceware/snapshot-tmp/gcc all +32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 5:trunk -l -d /sourceware/snapshot-tmp/gcc all Index: gcc/BASE-VER === --- gcc/BASE-VER (revision 213958) +++ gcc/BASE-VER (working copy) @@ -1 +1 @@ -4.10.0 +5.0.0 Jakub
[PINGv3][PATCHv3] Fix vector tests on ARM platforms with disabled unaligned accesses
On 08/07/2014 12:52 PM, Marat Zakirov wrote: On 07/31/2014 04:08 PM, Marat Zakirov wrote: On 07/24/2014 07:40 PM, Marat Zakirov wrote: On 07/24/2014 04:27 PM, Marat Zakirov wrote: On 07/23/2014 06:23 PM, Marat Zakirov wrote: Hi there! I made a patch which fixes regressions on ARM platforms with disabled unaligned accesses. The problem is that the 'arm_vect_no_misalign' predicate does not check the 'unaligned_access' global variable to determine whether unaligned accesses to vectors are allowed. This leads to spurious vect.exp test failures when GCC is configured --with-specs=%{!munaligned-access:-mno-unaligned-access}. The attached patch fixes the ARM predicate and several tests to correctly handle the issue. The patch was reg. tested for multiple targets (ARM, Thumb-1, Thumb-2, x86, x86_64) with and without -mno-unaligned-access. Analysis showed the patch affects only vect.exp tests, so only vect.exp was tested. For x86, x86_64, ARM without -mno-unaligned-access, Thumb-2 without -mno-unaligned-access and Thumb-1, no regressions occurred. For ARM/Thumb-2 with -mno-unaligned-access the patch fixed most of the failures but triggered some problems (see attached log) for current vect.exp tests: 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61887 2) Some XPASSes due to unexpected loop versioning (e.g. gcc.dg/vect/pr33804.c). 3) After the predicate fix, some passing tests which require unaligned vector support became NA (this was expected). Here is a new version of the patch and the regression log. On the current trunk the results are slightly different due to patches from Richard Biener (no UNRESOLVED fails) but some PASS-XPASS regressions still remain (see attachment): PASS-XPASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c scan-tree-dump-times vect vectorized 1 loops 1 PASS-XPASS: gcc.dg/vect/pr33804.c -flto -ffat-lto-objects scan-tree-dump-times vect vectorized 1 loops 1 etc. These XPASSes are due to code versioning: current GCC creates 2 versions of the loop: aligned and misaligned.
It looks like they are slightly out of date, at least for ARM.

On 07/24/2014 06:50 PM, Ramana Radhakrishnan wrote:

This is redundant.

- || (defined(__ARMEL__) \
+ || (defined(__ARM_FEATURE_UNALIGNED) \
+    && defined(__ARMEL__) \

As is this line.

     && (!defined(__thumb__) || defined(__thumb2__)))

I think you can restrict the check to defined(__ARM_FEATURE_UNALIGNED) && defined(__ARMEL__). __ARM_FEATURE_UNALIGNED should tell you whether unaligned access is allowed or not, therefore you should no longer require any specific architectural checks.

#error FOO
#endif

I'm not sure about the original intent of the tests right now.

Ramana

Thank you Ramana!

--Marat

gcc/testsuite/ChangeLog:

2014-07-23  Marat Zakirov  m.zaki...@samsung.com

	* gcc.dg/vect/vect-109.c: Skip predicate added.
	* gcc.dg/vect/vect-93.c: Test check fixed.
	* gcc.dg/vect/bb-slp-10.c: Likewise.
	* lib/target-supports.exp
	(check_effective_target_arm_vect_no_misalign): Check unaligned
	feature.

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
index a1850ed..0090a4b 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
@@ -49,7 +49,7 @@ int main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times unsupported alignment in basic block. 1 slp2 { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump unsupported alignment in basic block.
1 slp2 { xfail vect_element_align } } } */
 /* { dg-final { scan-tree-dump-times basic block vectorized 1 slp2 { target vect_element_align } } } */
 /* { dg-final { cleanup-tree-dump slp2 } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index 854c970..c671175 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if { vect_no_align } } */
 /* { dg-require-effective-target vect_int } */

 #include <stdarg.h>
diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
index 65403eb..1065a6e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-93.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
@@ -79,7 +79,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times vectorized 2 loops 1 vect { target vect_no_align } } } */

 /* in main: */
-/* { dg-final { scan-tree-dump-times vectorized 0 loops 1 vect { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target vect_no_align } } } */
 /* { dg-final { scan-tree-dump-times Vectorizing an unaligned access 1 vect { xfail { vect_no_align } } } } */
 /* { dg-final { cleanup-tree-dump vect } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index db65ebe..35076d2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2293,8 +2293,8 @@ proc
Re: [PATCH i386 AVX512] [12/n] Extend OI/TImode moves.
On Thu, Aug 14, 2014 at 1:42 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends moves to OI/TI mode. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.md (define_insn *movoi_internal_avx): Add EVEX version. (define_insn *movti_internal): Ditto. OK. Thanks, Uros.
Re: [PATCH PR62011]
On Thu, 14 Aug 2014, Yuri Rumyantsev wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions.

I am not an x86 reviewer, but one thing looks a bit superfluous to me:

+/* Retirn true if we need to insert before bit-manipulation instruction
   ^^^^^^ note typo
+   zeroing of its destination register.  */
+bool
+ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[])
+{
+  unsigned int regno0;
+  df_ref use;
+  if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun))
+    return false;
+  regno0 = true_regnum (operands[0]);
+  /* Check if insn does not use REGNO0.  */
+  FOR_EACH_INSN_USE (use, insn)
+    if (regno0 == DF_REF_REGNO (use))
+      return false;
+  return true;
+}

The loop is to prevent adding the xor when the dest operand is also the source operand. Looks like a simpler reg_or_subregno (operands[0]) == reg_or_subregno (operands[1]) could be used here, as long as the assumption that this is called only for two-operand instructions holds?

Alexander
Re: [PATCH PR62011]
2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru: On Thu, 14 Aug 2014, Yuri Rumyantsev wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions. I am not an x86 reviewer, but one thing looks a bit superfluous to me: +/* Retirn true if we need to insert before bit-manipulation instruction note typo^ + zeroing of its destination register. */ +bool +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[]) +{ + unsigned int regno0; + df_ref use; + if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun)) +return false; + regno0 = true_regnum (operands[0]); + /* Check if insn does not use REGNO0. */ + FOR_EACH_INSN_USE (use, insn) +if (regno0 == DF_REF_REGNO (use)) + return false; + return true; +} The loop is to prevent adding the xor when the dest operand is also the source operand. Looks like a simpler reg_or_subregno (operands[0]) == reg_or_subregno (operands[1]) could be used here, as long as the assumption that this is called only for two-operand instruction holds? This wouldn't cover memory operand case. Ilya Alexander
Re: [PATCH i386 AVX512] [13/n] Add VF2_AVX512VL and vcvt[t]pd2 insns.
On Thu, Aug 14, 2014 at 2:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch introduces vcvt[t]pd2 patterns. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.md (define_code_attr ufix_bool): New. * config/i386/sse.md (define_mode_iterator VF2_AVX512VL): New. (define_mode_attr sseintvecmode2): New. (define_insn ufix_truncv2dfv2si2mask_name): Add masking. (define_insn fixsuffixfix_truncv4dfv4si2mask_name): New. (define_insn fixsuffixfix_truncmodesseintvecmodelower2mask_nameround_saeonly_name): New. (define_insn fix_notruncmodesseintvecmodelower2mask_nameround_name): New. (define_insn ufix_notruncmodesseintvecmodelower2mask_nameround_name): New. -- Thanks, K diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 39fb2307..f6351cf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -881,6 +881,7 @@ ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) +(define_code_attr ufix_bool [(fix false) (unsigned_fix true)]) ;; All integer modes. 
(define_mode_iterator SWI1248x [QI HI SI DI]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 3d3d1a0..eaf97ab 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -228,6 +228,9 @@ (define_mode_iterator VF_512 [V16SF V8DF]) +(define_mode_iterator VF2_AVX512VL + [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)]) + ;; All vector integer modes (define_mode_iterator VI [(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F) @@ -523,6 +526,10 @@ (V32HI V32HI) (V64QI V64QI) (V32QI V32QI) (V16QI V16QI)]) +(define_mode_attr sseintvecmode2 + [(V8DF XI) (V4DF OI) (V2DF TI) + (V8SF OI) (V4SF TI)]) + (define_mode_attr sseintvecmodelower [(V16SF v16si) (V8DF v8di) (V8SF v8si) (V4DF v4di) @@ -4236,15 +4243,58 @@ (set_attr prefix evex) (set_attr mode OI)]) -(define_insn fix_truncv4dfv4si2 - [(set (match_operand:V4SI 0 register_operand =x) - (fix:V4SI (match_operand:V4DF 1 nonimmediate_operand xm)))] - TARGET_AVX - vcvttpd2dq{y}\t{%1, %0|%0, %1} +(define_insn ufix_truncv2dfv2si2mask_name + [(set (match_operand:V4SI 0 register_operand =v) + (vec_concat:V4SI + (unsigned_fix:V2SI (match_operand:V2DF 1 nonimmediate_operand vm)) + (const_vector:V2SI [(const_int 0) (const_int 0)])))] + TARGET_AVX512VL + vcvttpd2udq{x}\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssecvt) - (set_attr prefix vex) + (set_attr prefix evex) + (set_attr mode TI)]) + +(define_insn fixsuffixfix_truncv4dfv4si2mask_name + [(set (match_operand:V4SI 0 register_operand =v) + (any_fix:V4SI (match_operand:V4DF 1 nonimmediate_operand vm)))] + (TARGET_AVX !ufix_bool) || (TARGET_AVX512VL TARGET_AVX512F) + vcvttpd2fixsuffixdq{y}\t{%1, %0mask_operand2|%0mask_operand2, %1} + [(set_attr type ssecvt) + (set_attr prefix maybe_evex) (set_attr mode OI)]) Please split this pattern to fix and unsigned_fix. The macroisation here is confusing. You should then remove ufix_bool attribute. OK with this change. Thanks, Uros.
Re: [C++ Patch] PR 54377
On 08/14/2014 07:14 AM, Paolo Carlini wrote: + nparms -= variadic_p ? variadic_p : default_p; What if you have both default arguments and parameter packs? Jason
Re: [PATCH i386 AVX512] [15/n] Extend vcvtudq2ps to avx512vl.
On Thu, Aug 14, 2014 at 2:20 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, This patch extends vcvtudq2ps to support AVX-512VL new insns. Bootstrapped. New tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator VF1_AVX512VL): New. (define_insn ufloatv16siv16sf2mask_nameround_name): Delete. (define_insn ufloatsseintvecmodelowermode2mask_nameround_name): New. OK. Thanks, Uros.
Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
On Mon, Jul 28, 2014 at 8:08 AM, Wei Mi w...@google.com wrote: But fact is that it is _not_ necessary to split the block because there are no outgoing abnormal edges from it. The verifier failure is an artifact from using the same predicates during CFG building and CFG verifying (usually ok, but for this particular case it leads to this issue). So I don't think your patch is the proper way to address this issue (while it certainly works). Instead whether a call can make abnormal gotos should be recorded per-call and stored on the gimple-call. Martin - this is exactly one of the cases your patch would address? Thanks for the comment and thanks to Martin's patch. I tried the patch. It works well to address both pr60449 and pr61776 after some extension. One extension is to replace the GF_CALL_LEAF attribute with GF_CALL_NO_ABNORMAL_GOTO. That is because not only could dropping the leaf attribute during LTO symbol merging introduce the control-flow verification problem in pr60449; dropping const/pure attributes could introduce the same problem too. It is unnecessary to introduce per-call attributes for all three of ECF_LEAF/ECF_CONST/ECF_PURE, so GF_CALL_NO_ABNORMAL_GOTO is introduced to indicate that a call stmt has no abnormal goto. GF_CALL_NO_ABNORMAL_GOTO will be set according to gimple_call_flags() once the gimple call stmt is created, then updated in execute_fixup_cfg and cleanup_tree_cfg. I posted the extended patch here. I didn't add the noreturn part because it has no direct impact on pr60449 and pr61776. I can help Martin to test and post that part as an independent patch later. Bootstrap and regression pass on x86_64-linux-gnu. Is it ok?

+static void
+update_no_abnormal_goto_attr (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {

It should be enough to check these on last stmts of a basic block, no?
That you call update_no_abnormal_goto_attr from two places before cleanup_tree_cfg_bb suggests you may want to perform this change in cleanup_control_flow_bb which already looks at the last stmt only? Btw, I originally had this whole idea of moving flags to the gimple stmt level because of the interesting way we handle the noreturn attribute (calls to noreturn functions also end basic-blocks). Thus would it be possible to turn all these into a single flag, GF_CALL_CTRL_ALTERING? That is, cover everything that is_ctrl_altering_stmt covers? I suggest we initialize it at CFG build time and only ever clear it later. Sorry for not thinking about this earlier (and for the slow review). Thanks, Richard. Thanks, Wei.
Re: Fix if-conversion pass for dead type-unsafe code
On Sat, Aug 9, 2014 at 7:14 AM, Tom de Vries tom_devr...@mentor.com wrote: On 08-08-14 17:17, Tom de Vries wrote: Maybe instead of a new mem_alias_equal_p simply compare MEM_ATTRs with mem_attrs_eq_p? I propose to fix it this way (as attached) on 4.8/4.9/trunk, and maybe do a more efficient handling on trunk as a follow-up patch. I'll put this through bootstrap/test again. Bootstrapped and reg-tested on trunk x86_64. Re-attached patch OK for trunk, 4.8, 4.9 ? Ok. (did you check the effect on code-generation? that is, how many opportunities compiling GCC do we lose?) Thanks, Richard. Thanks, - Tom
[PATCH][LTO] Hide string streaming details
This hides a part of the internal strings processing detail by using appropriate APIs. LTO bootstrap ongoing on x86_64-unknown-linux-gnu.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

	* data-streamer.h (streamer_string_index, string_for_index):
	Remove.
	* data-streamer-out.c (streamer_string_index): Make static.
	* data-streamer-in.c (string_for_index): Likewise.
	* lto-streamer-out.c (lto_output_location): Use bp_pack_string.
	* lto-streamer-in.c (lto_input_location): Use bp_unpack_string.

Index: gcc/data-streamer.h
===
--- gcc/data-streamer.h	(revision 213966)
+++ gcc/data-streamer.h	(working copy)
@@ -57,8 +57,6 @@ void streamer_write_hwi (struct output_b
 void streamer_write_gcov_count (struct output_block *, gcov_type);
 void streamer_write_string (struct output_block *, struct lto_output_stream *,
			    const char *, bool);
-unsigned streamer_string_index (struct output_block *, const char *,
-				unsigned int, bool);
 void streamer_write_string_with_length (struct output_block *,
					struct lto_output_stream *,
					const char *, unsigned int, bool);
@@ -74,7 +72,6 @@ void streamer_write_data_stream (struct
					size_t);

 /* In data-streamer-in.c  */
-const char *string_for_index (struct data_in *, unsigned int, unsigned int *);
 const char *streamer_read_string (struct data_in *, struct lto_input_block *);
 const char *streamer_read_indexed_string (struct data_in *,
					  struct lto_input_block *,
Index: gcc/data-streamer-out.c
===
--- gcc/data-streamer-out.c	(revision 213966)
+++ gcc/data-streamer-out.c	(working copy)
@@ -81,7 +81,7 @@ lto_append_block (struct lto_output_stre
    When PERSISTENT is set, the string S is supposed to not change during
    duration of the OB and thus OB can keep pointer into it.  */

-unsigned
+static unsigned
 streamer_string_index (struct output_block *ob, const char *s,
		       unsigned int len, bool persistent)
 {
Index: gcc/data-streamer-in.c
===
--- gcc/data-streamer-in.c	(revision 213966)
+++ gcc/data-streamer-in.c	(working copy)
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.
 /* Read a string from the string table in DATA_IN using input block IB.
    Write the length to RLEN.  */

-const char *
+static const char *
 string_for_index (struct data_in *data_in, unsigned int loc, unsigned int *rlen)
 {
   unsigned int len;
Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c	(revision 213966)
+++ gcc/lto-streamer-out.c	(working copy)
@@ -189,10 +189,7 @@ lto_output_location (struct output_block
   bp_pack_value (bp, ob->current_col != xloc.column, 1);

   if (ob->current_file != xloc.file)
-    bp_pack_var_len_unsigned (bp,
-			      streamer_string_index (ob, xloc.file,
-						     strlen (xloc.file) + 1,
-						     true));
+    bp_pack_string (ob, bp, xloc.file, true);
   ob->current_file = xloc.file;

   if (ob->current_line != xloc.line)
Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c	(revision 213966)
+++ gcc/lto-streamer-in.c	(working copy)
@@ -154,7 +154,6 @@ lto_input_location (struct bitpack_d *bp
   static int current_line;
   static int current_col;
   bool file_change, line_change, column_change;
-  unsigned len;
   bool prev_file = current_file != NULL;

   if (bp_unpack_value (bp, 1))
@@ -165,10 +164,7 @@ lto_input_location (struct bitpack_d *bp
   column_change = bp_unpack_value (bp, 1);

   if (file_change)
-    current_file = canon_file_name
-		     (string_for_index (data_in,
-					bp_unpack_var_len_unsigned (bp),
-					&len));
+    current_file = canon_file_name (bp_unpack_string (data_in, bp));

   if (line_change)
     current_line = bp_unpack_var_len_unsigned (bp);
[PATCH] Fix PR62031
This fixes a wrong answer from data-dependence analysis by realizing that _all_ (even non-evolving) indirect accesses cannot be constrained to a full object size. This also gets rid of that ugly DR_UNCONSTRAINED_BASE hack (but effectively makes it always active). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

	PR tree-optimization/62031
	* tree-data-ref.c (dr_analyze_indices): Do not set
	DR_UNCONSTRAINED_BASE.
	(dr_may_alias_p): All indirect accesses have to go the formerly
	DR_UNCONSTRAINED_BASE path.
	* tree-data-ref.h (struct indices): Remove unconstrained_base member.
	(DR_UNCONSTRAINED_BASE): Remove.

	* gcc.dg/torture/pr62031.c: New testcase.

Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c	(revision 213958)
+++ gcc/tree-data-ref.c	(working copy)
@@ -982,7 +982,6 @@ dr_analyze_indices (struct data_referenc
	  ref = fold_build2_loc (EXPR_LOCATION (ref),
				 MEM_REF, TREE_TYPE (ref),
				 base, memoff);
-	  DR_UNCONSTRAINED_BASE (dr) = true;
	  access_fns.safe_push (access_fn);
	}
     }
@@ -1389,14 +1388,20 @@ dr_may_alias_p (const struct data_refere
	  return false;
	}

-  /* If we had an evolution in a MEM_REF BASE_OBJECT we do not know
-     the size of the base-object.  So we cannot do any offset/overlap
-     based analysis but have to rely on points-to information only.  */
+  /* If we had an evolution in a pointer-based MEM_REF BASE_OBJECT we
+     do not know the size of the base-object.  So we cannot do any
+     offset/overlap based analysis but have to rely on points-to
+     information only.  */
   if (TREE_CODE (addr_a) == MEM_REF
-      && DR_UNCONSTRAINED_BASE (a))
+      && TREE_CODE (TREE_OPERAND (addr_a, 0)) == SSA_NAME)
     {
-      if (TREE_CODE (addr_b) == MEM_REF
-	  && DR_UNCONSTRAINED_BASE (b))
+      /* For true dependences we can apply TBAA.  */
+      if (flag_strict_aliasing
+	  && DR_IS_WRITE (a) && DR_IS_READ (b)
+	  && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
+				     get_alias_set (DR_REF (b))))
+	return false;
+      if (TREE_CODE (addr_b) == MEM_REF)
	return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
				       TREE_OPERAND (addr_b, 0));
       else
@@ -1404,9 +1409,21 @@ dr_may_alias_p (const struct data_refere
					 build_fold_addr_expr (addr_b));
     }
   else if (TREE_CODE (addr_b) == MEM_REF
-	   && DR_UNCONSTRAINED_BASE (b))
-    return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
-				   TREE_OPERAND (addr_b, 0));
+	   && TREE_CODE (TREE_OPERAND (addr_b, 0)) == SSA_NAME)
+    {
+      /* For true dependences we can apply TBAA.  */
+      if (flag_strict_aliasing
+	  && DR_IS_WRITE (a) && DR_IS_READ (b)
+	  && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
+				     get_alias_set (DR_REF (b))))
+	return false;
+      if (TREE_CODE (addr_a) == MEM_REF)
+	return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
+				       TREE_OPERAND (addr_b, 0));
+      else
+	return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+				       TREE_OPERAND (addr_b, 0));
+    }

   /* Otherwise DR_BASE_OBJECT is an access that covers the whole object
      that is being subsetted in the loop nest.  */

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h	(revision 213958)
+++ gcc/tree-data-ref.h	(working copy)
@@ -81,10 +81,6 @@ struct indices

   /* A list of chrecs.  Access functions of the indices.  */
   vec<tree> access_fns;
-
-  /* Whether BASE_OBJECT is an access representing the whole object
-     or whether the access could not be constrained.  */
-  bool unconstrained_base;
 };

 struct dr_alias
@@ -195,7 +191,6 @@ struct data_reference
 #define DR_STMT(DR)		(DR)->stmt
 #define DR_REF(DR)		(DR)->ref
 #define DR_BASE_OBJECT(DR)	(DR)->indices.base_object
-#define DR_UNCONSTRAINED_BASE(DR) (DR)->indices.unconstrained_base
 #define DR_ACCESS_FNS(DR)	(DR)->indices.access_fns
 #define DR_ACCESS_FN(DR, I)	DR_ACCESS_FNS (DR)[I]
 #define DR_NUM_DIMENSIONS(DR)	DR_ACCESS_FNS (DR).length ()

Index: gcc/testsuite/gcc.dg/torture/pr62031.c
===
--- gcc/testsuite/gcc.dg/torture/pr62031.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr62031.c	(working copy)
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define NUM_OF_STATES 4
+typedef unsigned int entry_t[2];
+typedef struct entries_item { entry_t
Re: [PATCH PR62011]
For example, for the first loop of the attached test case we do not prepend xor to popcnt when the destination register is also used as a source:

.L23:
	leal	1(%rdx), %ecx
	popcntq	(%rbx,%rax,8), %rax
	leal	2(%rdx), %r8d
	popcntq	(%rbx,%rcx,8), %rcx
	addq	%rax, %rcx
	leal	3(%rdx), %esi
	xorq	%rax, %rax
	popcntq	(%rbx,%r8,8), %rax
	addq	%rax, %rcx
	xorq	%rax, %rax
	popcntq	(%rbx,%rsi,8), %rax
	addq	%rax, %rcx
	leal	4(%rdx), %eax
	addq	%rcx, %r14
	movq	%rax, %rdx
	cmpq	%rax, %r12
	ja	.L23

2014-08-14 18:06 GMT+04:00 Ilya Enkovich enkovich@gmail.com: 2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru: On Thu, 14 Aug 2014, Yuri Rumyantsev wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions. I am not an x86 reviewer, but one thing looks a bit superfluous to me: +/* Retirn true if we need to insert before bit-manipulation instruction note typo^ + zeroing of its destination register. */ +bool +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[]) +{ + unsigned int regno0; + df_ref use; + if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun)) +return false; + regno0 = true_regnum (operands[0]); + /* Check if insn does not use REGNO0. */ + FOR_EACH_INSN_USE (use, insn) +if (regno0 == DF_REF_REGNO (use)) + return false; + return true; +} The loop is to prevent adding the xor when the dest operand is also the source operand. Looks like a simpler reg_or_subregno (operands[0]) == reg_or_subregno (operands[1]) could be used here, as long as the assumption that this is called only for two-operand instruction holds? This wouldn't cover memory operand case. Ilya Alexander
Re: [PATCH 1/2, x86] Add palignr support for AVX2.
On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Ping. On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko evstu...@gmail.com wrote: On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson r...@redhat.com wrote: On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:

-expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)

insn_num might as well be bool avx2, since it's only ever set to two values.

Agree. However: after the alignment, one operand permutation could be just a move and take 2 instructions for AVX2 as well; for AVX2 there could be other cases when the scheme takes 4 or 5 instructions; we can leave it for a potential avx512 extension.

-  /* Even with AVX, palignr only operates on 128-bit vectors.  */
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+	return false;
+    }
+  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+	return false;
+    }
+  /* Other sizes are not supported.  */
+  else
+    return false;

And you'd better check it up here because...

Correct. The following should resolve the issue:

  /* For AVX2 we need more than 2 instructions when the alignment
     by itself does not produce the desired permutation.  */
  if (TARGET_AVX2 && insn_num <= 2)
    return false;

+  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
+     operand permutation.  */
+  if (insn_num == 2)
+    {
+      ok = expand_vec_perm_1 (&dcopy);
+      gcc_assert (ok);
+    }
+  /* For AVX2 we need 2 instructions for the shift: vpalignr and
+     vperm plus 4 instructions for one operand permutation.  */
+  else if (insn_num == 6)
+    {
+      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
+      gcc_assert (ok);
+    }
+  else
+    ok = false;

  return ok;

... down here you'll simply ICE from the gcc_assert.
Can you modify your patch to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128 with a testcase? -- H.J.
Re: [PATCH PR62011]
On Thu, Aug 14, 2014 at 4:06 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru: On Thu, 14 Aug 2014, Yuri Rumyantsev wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions. I am not an x86 reviewer, but one thing looks a bit superfluous to me: +/* Retirn true if we need to insert before bit-manipulation instruction note typo^ + zeroing of its destination register. */ +bool +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[]) +{ + unsigned int regno0; + df_ref use; + if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun)) +return false; + regno0 = true_regnum (operands[0]); + /* Check if insn does not use REGNO0. */ + FOR_EACH_INSN_USE (use, insn) +if (regno0 == DF_REF_REGNO (use)) + return false; + return true; +} The loop is to prevent adding the xor when the dest operand is also the source operand. Looks like a simpler reg_or_subregno (operands[0]) == reg_or_subregno (operands[1]) could be used here, as long as the assumption that this is called only for two-operand instruction holds? This wouldn't cover memory operand case.

You should use reg_mentioned_p, like:

  if (!reg_mentioned_p (operands[0], operands[1]))
    output_asm_insn ("xor{l}\t%k0,%k0", operands);
  ...

Uros.
Re: [PATCH PR62011]
On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-08-14 Yuri Rumyantsev ysrum...@gmail.com PR target/62011 * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function prototype. * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function. * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros. * config/i386/i386.md (ctzmode2, clzmode2_lzcnt, popcountmode2, *popcountmode2_cmp, *popcountsi2_cmp_zext): Output zeroing destination register for unary bit-manipulation instructions if required.

Why don't you use a splitter to generate the XOR?

* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

Is this needed for r16 and r32? The original report says that only r64 is affected: http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance Have you tried this on Silvermont? Does it help Silvermont? -- H.J.
Re: [PATCH testcase]fix failure of g++.dg/ext/arm-fp16/fp16-mangle-1.C
On 08/14/2014 04:31 AM, Bin Cheng wrote: g++.dg/ext/arm-fp16/fp16-mangle-1.C fails because GCC now sets DECL_COMDAT on template instantiations if flag_implicit_templates is in effect. Then DECL_WEAK will be set accordingly. As a result, checking for .global would fail on this case. But i in this testcase is an explicit specialization, not an instantiation, so this is a bug. Jason
Re: [PATCH 2/3]Improve induction variable elimination
Bin.Cheng wrote: The overflow check can be improved by using deeper inspection to prove the equality. This patch deals with that by making below two improvements: a) Handles constant cases. b) Uses affine expansion as deeper inspection to check the equality. Looks good to me. Thanks, Sebastian
Re: [PATCH PR62011]
It does not help Silvermont, i.e. only Haswell and SandyBridge are affected. I don't use splitter since (1) it deletes zeroing of dest reg; (2) scheduler can hoist them up . I will try r16/r32 variants and tell you later. 2014-08-14 19:18 GMT+04:00 H.J. Lu hjl.to...@gmail.com: On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a fix for PR 62011 - remove false dependency for unary bit-manipulation instructions for latest BigCore chips (Sandybridge and Haswell) by outputting in assembly file zeroing destination register before bmi instruction. I checked that performance restored for popcnt, lzcnt and tzcnt instructions. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-08-14 Yuri Rumyantsev ysrum...@gmail.com PR target/62011 * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function prototype. * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function. * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros. * config/i386/i386.md (ctzmode2, clzmode2_lzcnt, popcountmode2, *popcountmode2_cmp, *popcountsi2_cmp_zext): Output zeroing destination register for unary bit-manipulation instructions if required. Why don't you use splitter to to generate XOR? * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New. Is this needed for r16 and r32? The original report says that only r64 is affected: http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance Have you tried this on Silvermont? Does it help Silvermont? -- H.J.
Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)
On Thu, Aug 14, 2014 at 11:49:37AM +0200, Jakub Jelinek wrote: On Thu, Aug 14, 2014 at 11:34:04AM +0200, Jakub Jelinek wrote: So, to set DEP_MULTIPLE even in the case where ask_dependency_caches returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't that going to be expensive and totally kill all the effects of true_dependency_cache?) and set DEP_MULTIPLE on it if not already set. Something like (untested except for the openssl s390 testcase), haven't checked what effects it has on the number of successful find_inc cases. Here is the same thing with a ChangeLog entry, after bootstrap/regtest on x86_64-linux and i686-linux. With extra instrumentation (pretty much not removing the find_inc hunk and adding the patch from yesterday too + statistics accumulation) I see a comparable amount of successful parse_add_or_inc calls in find_inc (around 1.2M), but to my surprise still about 13000 cases where DEP_MULTIPLE is not set, yet there is clearly more than one dependency between mem_inc and inc_insn. The only testcase I've looked at so far has been -m32 -O3 -fomit-frame-pointer -funroll-loops -funroll-all-loops 990517-1.c and what I see there is we have sp += 24, clobber flags; ... sp += 16, clobber flags; ... flags = ([sp+1096] eax) != 0; The dep between the middle insn and last insn has DEP_MULTIPLE set, one dep there has been for sp setter vs. sp user and the other for clobber flags vs. flags setter. But for the first insn and third insn pair, only the clobber flags vs. flags setter dependency has been recorded, presumably the one on sp not, because there is some other setter after that. I hope the scheduler doesn't attempt to swap sp += 24 with the flags setter because of the sp += 16 vs. flags setter dependency and sp += 24 vs. sp += 16 dependency, but I feel kind of uneasy with find_inc assuming the recorded dependency is the one for the mem_reg0, when in this case the dependency is there for a completely different register.
2014-08-14 Jakub Jelinek ja...@redhat.com PR target/62025 * sched-deps.c (add_or_update_dep_1): If ask_dependency_caches returned DEP_PRESENT, make sure to set DEP_MULTIPLE on present_dep. (find_inc): Revert 2014-08-13 change. --- gcc/sched-deps.c.jj 2014-08-12 17:06:26.0 +0200 +++ gcc/sched-deps.c 2014-08-14 11:46:25.509631874 +0200 @@ -1233,6 +1233,13 @@ add_or_update_dep_1 (dep_t new_dep, bool switch (ask_dependency_caches (new_dep)) { case DEP_PRESENT: + dep_t present_dep; + sd_iterator_def sd_it; + + present_dep = sd_find_dep_between_no_cache (DEP_PRO (new_dep), + DEP_CON (new_dep), + resolved_p, &sd_it); + DEP_MULTIPLE (present_dep) = 1; return DEP_PRESENT; case DEP_CHANGED: @@ -4752,23 +4759,6 @@ find_inc (struct mem_inc_info *mii, bool goto next; } - /* The inc instruction could have clobbers, make sure those - registers are not used in mem insn. */ - FOR_EACH_INSN_DEF (def, mii->inc_insn) - if (!reg_overlap_mentioned_p (DF_REF_REG (def), mii->mem_reg0)) - { - df_ref use; - FOR_EACH_INSN_USE (use, mii->mem_insn) - if (reg_overlap_mentioned_p (DF_REF_REG (def), - DF_REF_REG (use))) - { - if (sched_verbose >= 5) - fprintf (sched_dump, - "inc clobber used in store failure.\n"); - goto next; - } - } - newaddr = mii->inc_input; if (mii->mem_index != NULL_RTX) newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr, Jakub
Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)
On 08/14/2014 05:50 PM, Jakub Jelinek wrote: I hope the scheduler doesn't attempt to swap sp += 24 with flags setter because of the sp += 16 vs. flags setter dependency and sp += 24 vs. sp += 16 dependency, but I feel kind of uneasy with find_inc assuming the recorded dependency is the one for the mem_reg0, when in this case the dependency is there for completely different register. Let me think about that for a while. Thanks for debugging the cache problem. Bernd
[linaro/gcc-4_8-branch] Merge from gcc-4_8-branch and backports
Hi all we have merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to revision 213802 as r213944. We have also backported this set of revisions: r204251 as r213841 PR sanitizer/58543 r206529 as r213842 PR target/59744 r206530 as r213842 PR target/59744 / fix changelog typo This will be part of our 2014.08 4.8 release. Thanks, Yvan
Re: RFC: Patch for switch elimination (PR 54742)
On 08/14/14 04:32, Richard Biener wrote: You'll note in a separate thread Steve and I discussed this during Cauldron and it was at my recommendation Steve resurrected his proof of concept plugin and started beating it into shape. But do we really want a pass just to help coremark? And that's the biggest argument against Steve's work. In theory it should be applicable to other FSMs, but nobody's come forth with additional testcases from real world applications. jeff
[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports
Hi all we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to revision 213803 as r213943. We have also backported this set of revisions: r211140 as r213455 [AArch64] Drop ISB after FPCR write. r211270 as r213790 [AArch64] Remove from arm_neon.h functions not in the spec r211271 as r213791 Fix check for __FAST_MATH in arm_neon.h r211273 as r213792 [Patch,testsuite] Fix bind_pic_locally r211275 as r213792 [Patch,testsuite] Fix tests that fail due to symbol visibility when -fPIC r211503 as r213793 [AArch64] fix and enable non-const shuffle for bigendian using TBL instruction r211779 as r213793 [AArch64][committed] Delete unused variable i r212023 as r213794 [AArch64] Fix constraint vec_unpack_trunk r212024 as r213795 [AArch32] Cortex-A5 rtx costs table r212142 as r213796 [AArch32] Handle clz, rbit types in arm pipeline descriptions r212225 as r213797 [AArch64] Fix argument types for some high_lane* intrinsics implemented in assembly r212296 as r213798 [AArch64] Handle fcvta[su] and frint in RTX cost function r212358 as r213799 [AArch64] Relocate saved_varargs_size. r212512 as r213799 [AArch64] Restructure callee save slot allocation logic. r212752 as r213799 Unify slots x29/x30 r212753 as r213799 [AArch64] Add frame_size and hard_fp_offset to machine.frame r212912 as r213799 [AArch64] GNU-Stylize some un-formatted code. r212913 as r213799 [AArch64] Consistent parameter types in prologue/epilogue generation. r212943 as r213799 [AArch64] Remove useless local variable. r212945 as r213799 [AArch64] Remove useless parameter base_rtx. r212946 as r213799 [AArch64] Use register offset in cfun-machine-frame.reg_offset r212947 as r213799 [AArch64] Remove useless variable 'increment' r212949 as r213799 [AArch64] Hoist calculation of register rtx. r212950 as r213799 [AArch64] Refactor code out into aarch64_next_callee_save r212951 as r213799 [AArch64] Use helper functions to handle multiple modes. 
r212952 as r213799 [AArch64] Unify vector and core register save/restore code. r212954 as r213799 [AArch64] Split save restore path. r212955 as r213799 [AArch64] Simplify prologue expand using new helper functions. r212956 as r213799 [AArch64] Simplify epilogue expansion using new helper functions. r212957 as r213799 [AArch64] Prologue and epilogue test cases. r212958 as r213799 [AArch64] Optimize epilogue in the presence of an outgoing args area. r212959 as r213799 [AArch64] Extend frame state to track WB candidates. r212976 as r213799 [AArch64] Infrastructure for optional use of writeback r212996 as r213799 [AArch64] Optimize prologue when there is no frame pointer. r212997 as r213799 [AArch64] Optimize epilogue when there is no frame r212999 as r213800 [PATCH, ARM] Fix PR61948 (ICE with DImode shift by 1 bit) r213000 as r213801 PR 61713: ICE when expanding single-threaded version of atomic_test_and_set r213376 as r213817 [AArch64][1/2] Remove UNSPEC_CLS and use clrsb RTL code in its place r213555 as r213817 [AArch64][2/2] Add rtx cost function handling of clz, clrsb, rbit This will be part of our 2014.08 4.9 release. Thanks, Yvan
Re: [C++ Patch] PR 54377
Hi, On 08/14/2014 04:18 PM, Jason Merrill wrote: On 08/14/2014 07:14 AM, Paolo Carlini wrote: + nparms -= variadic_p ? variadic_p : default_p; What if you have both default arguments and parameter packs? Right. Got distracted by the minor secondary issues... It seems to me that simply adding the counters works, i.e., in the additional cpp0x testcase we thus say 'at least 2', instead of a wrong 'at least 3'. I'm finishing testing the below. Thanks, Paolo. Index: cp/pt.c === --- cp/pt.c (revision 213952) +++ cp/pt.c (working copy) @@ -6861,19 +6861,24 @@ coerce_template_parms (tree parms, int variadic_args_p = 0; int post_variadic_parms = 0; + /* Likewise for parameters with default arguments. */ + int default_p = 0; + if (args == error_mark_node) return error_mark_node; nparms = TREE_VEC_LENGTH (parms); - /* Determine if there are any parameter packs. */ + /* Determine if there are any parameter packs or default arguments. */ for (parm_idx = 0; parm_idx < nparms; ++parm_idx) { - tree tparm = TREE_VALUE (TREE_VEC_ELT (parms, parm_idx)); + tree parm = TREE_VEC_ELT (parms, parm_idx); if (variadic_p) ++post_variadic_parms; - if (template_parameter_pack_p (tparm)) + if (template_parameter_pack_p (TREE_VALUE (parm))) ++variadic_p; + if (TREE_PURPOSE (parm)) + ++default_p; } inner_args = orig_inner_args = INNERMOST_TEMPLATE_ARGS (args); @@ -6902,11 +6907,11 @@ coerce_template_parms (tree parms, { if (complain & tf_error) { - if (variadic_p) + if (variadic_p || default_p) { - nparms -= variadic_p; + nparms -= variadic_p + default_p; error ("wrong number of template arguments " -"(%d, should be %d or more)", nargs, nparms); +"(%d, should be at least %d)", nargs, nparms); } else error ("wrong number of template arguments " @@ -6913,7 +6918,7 @@ coerce_template_parms (tree parms, "(%d, should be %d)", nargs, nparms); if (in_decl) - error ("provided for %q+D", in_decl); + inform (input_location, "provided for %q+D", in_decl); } return error_mark_node; Index: testsuite/g++.dg/cpp0x/alias-decl-2.C === --- testsuite/g++.dg/cpp0x/alias-decl-2.C (revision 213952) +++ testsuite/g++.dg/cpp0x/alias-decl-2.C (working copy) @@ -22,7 +22,7 @@ template<class T> using Vec = Vector<T, Alloc<T> >; template<class T> void g(Vector<T, Alloc<T> >&); -template<template<class T> class TT> void h(TT<int>); // { dg-error "provided for" } +template<template<class T> class TT> void h(TT<int>); // { dg-message "provided for" } void bar() Index: testsuite/g++.dg/cpp0x/pr51226.C === --- testsuite/g++.dg/cpp0x/pr51226.C (revision 213952) +++ testsuite/g++.dg/cpp0x/pr51226.C (working copy) @@ -1,7 +1,7 @@ // PR c++/51226 // { dg-do compile { target c++11 } } -template<int> struct A // { dg-error "provided" } +template<int> struct A // { dg-message "provided" } { enum E : int; }; Index: testsuite/g++.dg/cpp0x/pr54377.C === --- testsuite/g++.dg/cpp0x/pr54377.C (revision 0) +++ testsuite/g++.dg/cpp0x/pr54377.C (working copy) @@ -0,0 +1,7 @@ +// PR c++/54377 +// { dg-do compile { target c++11 } } + +template <typename, typename, typename = void, typename...> +struct foo {}; // { dg-message "provided for" } + +foo<int> f; // { dg-error "at least 2" } Index: testsuite/g++.dg/cpp0x/variadic2.C === --- testsuite/g++.dg/cpp0x/variadic2.C (revision 213952) +++ testsuite/g++.dg/cpp0x/variadic2.C (working copy) @@ -6,9 +6,9 @@ template<typename... = int> // { dg-error "default" } class tuple3; template<typename T1, typename T2, typename... Rest> -struct two_or_more {}; // { dg-error "provided for" } +struct two_or_more {}; // { dg-message "provided for" } -typedef two_or_more<int> bad; // { dg-error "2 or more" "2 or more" } +typedef two_or_more<int> bad; // { dg-error "at least 2" "at least 2" } void f() { Index: testsuite/g++.dg/parse/too-many-tmpl-args1.C === --- testsuite/g++.dg/parse/too-many-tmpl-args1.C (revision 213952) +++ testsuite/g++.dg/parse/too-many-tmpl-args1.C (working copy) @@ -2,7 +2,7 @@ // Origin: Wolfgang Bangerth bange...@ticam.utexas.edu // { dg-do compile } -template <typename T> class A // { dg-error "" } +template <typename T> class A // { dg-message "" } { struct B; template <typename U> friend
Re: RFC: Patch for switch elimination (PR 54742)
On 08/14/14 10:12, David Malcolm wrote: On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote: On 08/14/14 04:32, Richard Biener wrote: You'll note in a separate thread Steve and I discussed this during Cauldron and it was at my recommendation Steve resurrected his proof of concept plugin and started beating it into shape. But do we really want a pass just to help coremark? And that's the biggest argument against Steve's work. In theory it should be applicable to other FSMs, but nobody's come forth with additional testcases from real world applications. Maybe a regex library? Perhaps: http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ? The key is that at least some states tell you at compile time what state you'll be in during the next loop iteration. Thus instead of coming around the loop, evaluating the switch condition, then doing the multi-way branch, we just directly jump to the case for the next iteration. I've never looked at the PCRE code to know if it's got cases like that. jeff
Re: [PATCH] Fix PR62077
On 08/14/2014 05:07 AM, Richard Biener wrote: So - can you take over this C++ frontend issue? OK. Jason
Re: RFC: Patch for switch elimination (PR 54742)
On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote: On 08/14/14 04:32, Richard Biener wrote: You'll note in a separate thread Steve and I discussed this during Cauldron and it was at my recommendation Steve resurrected his proof of concept plugin and started beating it into shape. But do we really want a pass just to help coremark? And that's the biggest argument against Steve's work. In theory it should be applicable to other FSMs, but nobody's come forth with additional testcases from real world applications. Maybe a regex library? Perhaps: http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ? Dave
Re: [PATCH 000/236] Introduce rtx subclasses
On Wed, 2014-08-13 at 20:13 -0400, David Malcolm wrote: On Wed, 2014-08-06 at 13:19 -0400, David Malcolm wrote: This is the patch series I spoke about at Cauldron in the talk A proposal for typesafe RTL; slides here: http://dmalcolm.fedorapeople.org/presentations/cauldron-2014/rtl They can also be seen at: https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v20/ [...snip...] The patches (and my control for testing) has been on top of r211061, which being from 2014-05-29 is now two months old, but hopefully is still reviewable; naturally I'll perform bootstrap and regression testing for each patch in turn before committing. FWIW I've now rebased the patches against today's trunk (specifically r213914) and I've backed up the results to: https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v21/ Still TODO: * rename as_a_nullable to safe_as_a * eliminate rtx_real_insn * make various scaffolding functions be inline, updating gdbinit.in as appropriate. [...snip...] Likewise; latest WIP: https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v22/ has: * the renaming of as_a_nullable to safe_as_a * the elimination of rtx_real_insn Still TODO: * make various scaffolding functions be inline, updating gdbinit.in as
[PATCH] Handle -fsanitize=leak more similarly to address/thread
Hi! Right now when -fsanitize=leak adds -llsan, it adds it late on the command line, so e.g. -lstdc++ comes after it, which seems to be bad. The following patch puts it early on the link command line like we do for -lasan or -ltsan. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-08-14 Jakub Jelinek ja...@redhat.com * config/gnu-user.h (LIBLSAN_EARLY_SPEC): Define. * gcc.c (LIBLSAN_SPEC, LIBLSAN_EARLY_SPEC): Follow LIBTSAN*_SPEC. (SANITIZER_EARLY_SPEC): Include LIBLSAN_EARLY_SPEC for -fsanitize=leak. --- gcc/config/gnu-user.h.jj2014-01-03 11:41:11.0 +0100 +++ gcc/config/gnu-user.h 2014-08-11 13:42:24.187122214 +0200 @@ -114,7 +114,8 @@ see the files COPYING3 and COPYING.RUNTI /* Link -lasan early on the command line. For -static-libasan, don't link it for -shared link, the executable should be compiled with -static-libasan in that case, and for executable link link with --{,no-}whole-archive around - it to force everything into the executable. And similarly for -ltsan. */ + it to force everything into the executable. And similarly for -ltsan + and -llsan. 
*/ #if defined(HAVE_LD_STATIC_DYNAMIC) #undef LIBASAN_EARLY_SPEC #define LIBASAN_EARLY_SPEC %{!shared:libasan_preinit%O%s} \ @@ -125,4 +126,8 @@ see the files COPYING3 and COPYING.RUNTI #define LIBTSAN_EARLY_SPEC %{static-libtsan:%{!shared: \ LD_STATIC_OPTION --whole-archive -ltsan --no-whole-archive \ LD_DYNAMIC_OPTION }}%{!static-libtsan:-ltsan} +#undef LIBLSAN_EARLY_SPEC +#define LIBLSAN_EARLY_SPEC %{static-liblsan:%{!shared: \ + LD_STATIC_OPTION --whole-archive -llsan --no-whole-archive \ + LD_DYNAMIC_OPTION }}%{!static-liblsan:-llsan} #endif --- gcc/gcc.c.jj2014-08-01 09:23:35.0 +0200 +++ gcc/gcc.c 2014-08-11 13:45:27.994253297 +0200 @@ -576,13 +576,19 @@ proper position among the other output f #ifndef LIBLSAN_SPEC #define STATIC_LIBLSAN_LIBS \ %{static-liblsan:%:include(libsanitizer.spec)%(link_liblsan)} -#ifdef HAVE_LD_STATIC_DYNAMIC -#define LIBLSAN_SPEC %{!shared:%{static-liblsan: LD_STATIC_OPTION \ +#ifdef LIBLSAN_EARLY_SPEC +#define LIBLSAN_SPEC STATIC_LIBLSAN_LIBS +#elif defined(HAVE_LD_STATIC_DYNAMIC) +#define LIBLSAN_SPEC %{static-liblsan: LD_STATIC_OPTION \ } -llsan %{static-liblsan: LD_DYNAMIC_OPTION } \ -STATIC_LIBLSAN_LIBS } +STATIC_LIBLSAN_LIBS #else -#define LIBLSAN_SPEC %{!shared:-llsan STATIC_LIBLSAN_LIBS } +#define LIBLSAN_SPEC -llsan STATIC_LIBLSAN_LIBS +#endif #endif + +#ifndef LIBLSAN_EARLY_SPEC +#define LIBLSAN_EARLY_SPEC #endif #ifndef LIBUBSAN_SPEC @@ -772,7 +778,8 @@ proper position among the other output f #ifndef SANITIZER_EARLY_SPEC #define SANITIZER_EARLY_SPEC \ %{!nostdlib:%{!nodefaultlibs:%{%:sanitize(address): LIBASAN_EARLY_SPEC } \ -%{%:sanitize(thread): LIBTSAN_EARLY_SPEC }}} +%{%:sanitize(thread): LIBTSAN_EARLY_SPEC } \ +%{%:sanitize(leak): LIBLSAN_EARLY_SPEC }}} #endif /* Linker command line options for -fsanitize= late on the command line. */ Jakub
Re: [C++ Patch] PR 54377
OK. Jason
[committed] Two !$omp declare simd fixes (PR fortran/62076)
Hi! I've committed the following fix for two issues revealed e.g. by valgrind on some of the udr*.f90 testcases. buffer could be uninitialized, and gfc_free_omp_udr could free symbols in the combiner_ns (or initializer_ns) when freeing those whole namespaces, so if some symbols were queued for commit/undo, it is bad to first free the symbols and then try to undo them. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 4.9 branch. 2014-08-14 Jakub Jelinek ja...@redhat.com PR fortran/62076 * openmp.c (gfc_match_omp_clauses): When failed to match operator name, defined op name or name, set buffer to empty string. Don't call gfc_find_omp_udr if buffer is empty string. (gfc_match_omp_declare_reduction): Call gfc_undo_symbols () before calling gfc_free_omp_udr. --- gcc/fortran/openmp.c.jj 2014-08-12 15:42:44.239327384 +0200 +++ gcc/fortran/openmp.c 2014-08-14 12:17:20.554565218 +0200 @@ -464,7 +464,11 @@ gfc_match_omp_clauses (gfc_omp_clauses * || !gfc_add_intrinsic (&sym->attr, NULL))) rop = OMP_REDUCTION_NONE; } - gfc_omp_udr *udr = gfc_find_omp_udr (gfc_current_ns, buffer, NULL); + else + buffer[0] = '\0'; + gfc_omp_udr *udr + = (buffer[0] + ? gfc_find_omp_udr (gfc_current_ns, buffer, NULL) : NULL); gfc_omp_namelist **head = NULL; if (rop == OMP_REDUCTION_NONE && udr) rop = OMP_REDUCTION_USER; @@ -1240,6 +1244,7 @@ gfc_match_omp_declare_reduction (void) syntax: gfc_current_locus = old_loc; gfc_current_ns = combiner_ns->parent; + gfc_undo_symbols (); gfc_free_omp_udr (omp_udr); return MATCH_ERROR; } Jakub
Re: [PATCH] Asan static optimization (draft)
Indeed, thanks for working on this. We've been wanting such optimization phase from day one, but never got to implementing it (except for a few simple ones). https://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations There have been several attempts outside of our team to do such optimizations, but none of them made it to llvm trunk. In order for your work to be generally useful, I'd ask several things: - Update https://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations with examples that will be handled - Create small standalone test cases in C - Don't put the tests under GPL (otherwise we'll not be able to reuse them in LLVM) make sure that sanopt performs conservative optimization Yes. I don't know a good solution to this problem, which is why we did not attack it before. Increasing asan speed will save lots of CPU time, but missing a single software bug due to an overly aggressive optimization may cost much more. We can do a detailed analysis of *simple* peephole-like optimizers, but *proving* that a complex optimizer does not remove needed checks is hardly practically possible. On Fri, Aug 8, 2014 at 3:26 AM, Yury Gribov y.gri...@samsung.com wrote: Hi all, I have been working on Asan global optimization pass lately. The goal is to remove redundant Asan checks from sanitized code. This should hopefully reduce Asan's speed/size overhead (which is currently ~2x). The patch is not yet ready for trunk (e.g. I haven't done bootstrap, etc. but Asan testsuite passed wo errors) but I thought I'd send it for preliminary review of algorithm and data structures (*). Current implementation (based on existing sanopt pass) uses a simple iterative intra-procedural dataflow to compute information about living checks. For each pointer we track the size of memory that was already checked for it. During dataflow iterations, living checks are merged at blocks start in a natural way. 
I decided to keep current (local) Asan optimizations because they reduce compilation time by dropping many obviously redundant checks much earlier in the compilation pipeline. Current results seem to be encouraging: the pass was able to remove 112 checks (out of 1768) in gcc/asan.c without significant increase in sanopt pass runtime. Before upstreaming this code, I plan to 1) develop extensive set of tests to make sure that sanopt performs conservative optimization i.e. does not remove checks too agressively (I guess this is a critically important prerequisite so any suggestions are welcomed) 2) disable optimizations for very large functions to avoid unbearable compile times 3) do detailed performance and efficiency measuments for Asan-bootstrap I also have some ideas for improving this code (and I'm certainly open to suggestions from community): 1) propagating checking information through assignments and PHI-nodes (and maybe even pointer arithmetic) should improve efficiency 2) ditto for optimization of symbolic ranges; actually this looks like a relatively easy addition to current code (I just need to keep a list of checked symbolic ranges to check_info structure) 3) in addition to redundant check removal, we could also move duplicate checks from e.g. branches of if-statement to their dominators. -Y (*) The patch relies on some additional changes in hash-table and CFG which have not yet been upstreamed so it won't go with current trunk.
Re: [Patch] PR55189 enable -Wreturn-type by default
On 12/08/2014 19:48, Joseph S. Myers wrote: On Mon, 11 Aug 2014, Sylvestre Ledru wrote: The test Wmissing-return2.c only has one of the two warnings. But as per -Wreturn-type = Run both, and for backwards compatibility with the existing definition of -Wreturn-type, both warnings should appear for this test. Make sense. Thanks for the feedback and the help. Here it is: https://github.com/sylvestre/gcc/commit/089ffc9fb85034111b892ee10190dc12b5dbe551 Yes, those tests seem as I expect. So, here is the full commit now. Since the patch is 600k, I uploaded it here: http://sylvestre.ledru.info/0001-Enable-warning-Wreturn-type-by-default.patch Let me know when I can commit that. At the end of this mail, I also added the changelog for Enable -Wimplicit-int by default Patch: http://sylvestre.ledru.info/0002-Enable-Wimplicit-int-by-default.patch Let me know if you prefer a separate mail. Everything is on github if it helps: https://github.com/sylvestre/gcc gcc/c-family/ChangeLog: 2014-08-13 Sylvestre Ledru sylves...@debian.org * c.opt: Enable -Wreturn-type by default Add -Wmissing-return: Warn whenever control may reach end of non-void function gcc/ChangeLog: 2014-08-13 Sylvestre Ledru sylves...@debian.org * doc/invoke.texi: Document new flag -Wmissing-return Update -Wreturn-type * tree-cfg.c (pass_warn_function_return::execute): Introduce -Wreturn-type management gcc/fortran/ChangeLog: 2014-08-13 Sylvestre Ledru sylves...@debian.org * options.c (gfc_handle_option): Manage the new flag -Wmissing-return libgomp/ChangeLog: 2014-08-13 Sylvestre Ledru sylves...@debian.org * testsuite/libgomp.c++/loop-2.C: Update the test with -Wreturn-type by default and -Wmissing-return * testsuite/libgomp.c++/loop-4.C: likewise * testsuite/libgomp.c++/parallel-1.C: likewise * testsuite/libgomp.c++/shared-1.C: likewise * testsuite/libgomp.c++/single-1.C: likewise * testsuite/libgomp.c++/single-2.C: likewise * testsuite/libgomp.c/omp-loop02.c: likewise * testsuite/libgomp.c/omp-parallel-for.c: 
likewise * testsuite/libgomp.c/omp-parallel-if.c: likewise * testsuite/libgomp.c/omp-single-1.c: likewise * testsuite/libgomp.c/omp-single-2.c: likewise * testsuite/libgomp.c/omp_matvec.c: likewise * testsuite/libgomp.c/omp_workshare3.c: likewise * testsuite/libgomp.c/omp_workshare4.c: likewise * testsuite/libgomp.c/pr30494.c (check): likewise * testsuite/libgomp.c/shared-1.c: likewise gcc/testsuite/ChangeLog: 2014-08-13 Sylvestre Ledru sylves...@debian.org * gcc.dg/Wmissing-return1.c: New test which tests the new behavior * gcc.dg/Wmissing-return2.c: New test which tests the new behavior * gcc.dg/Wmissing-return3.c: New test which tests the new behavior * gcc.dg/Wmissing-return4.c: New test which tests the new behavior * gcc.dg/Wmissing-return5.c: New test which tests the new behavior * c-c++-common/asan/no-redundant-instrumentation-2.c (main): Update the test with -Wreturn-type by default and -Wmissing-return * c-c++-common/cilk-plus/AN/decl-ptr-colon.c (int main): likewise * c-c++-common/cilk-plus/AN/parser_errors.c: likewise * c-c++-common/cilk-plus/AN/parser_errors2.c: likewise * c-c++-common/cilk-plus/AN/parser_errors3.c: likewise * c-c++-common/cilk-plus/AN/pr57457-2.c: likewise * c-c++-common/cilk-plus/AN/pr57541-2.c (void foo1): likewise (void foo2): likewise * c-c++-common/cilk-plus/AN/pr57541.c (int foo): likewise (int foo1): likewise * c-c++-common/cilk-plus/CK/pr60197.c: likewise * c-c++-common/cilk-plus/CK/spawn_in_return.c: likewise * c-c++-common/convert-vec-1.c: likewise * c-c++-common/dfp/call-by-value.c (int foo32): likewise (int foo64): likewise (int foo128): likewise * c-c++-common/pr36513-2.c (int main2): likewise * c-c++-common/pr36513.c (int main1): likewise * c-c++-common/pr43772.c: likewise * c-c++-common/pr49706-2.c (same): likewise * c-c++-common/raw-string-3.c: likewise * c-c++-common/tm/pr54893.c: likewise * c-c++-common/tm/trxn-expr-2.c: likewise * c-c++-common/tsan/fd_pipe_race.c: likewise * c-c++-common/tsan/tls_race.c: likewise * 
c-c++-common/vector-1.c (int f): likewise * c-c++-common/vector-2.c (void f): likewise * g++.dg/abi/covariant2.C (struct c3): likewise (struct c7): likewise * g++.dg/abi/covariant3.C: likewise * g++.dg/abi/key2.C (int sub): likewise * g++.dg/abi/mangle7.C: likewise * g++.dg/bprob/g++-bprob-1.C (call_for): likewise * g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc: likewise * g++.dg/conversion/op1.C (class C): likewise (int fn): likewise * g++.dg/conversion/op6.C: likewise * g++.dg/cpp0x/access01.C: likewise * g++.dg/cpp0x/alias-decl-19.C: likewise * g++.dg/cpp0x/auto2.C (struct A): likewise * g++.dg/cpp0x/constexpr-defarg2.C: likewise *
Re: [PATCH] Fix PR62081
Richard Biener wrote: The following fixes missing dominator computation before fixing loops. Rather than doing even more such weird stuff in a pass gate function, this puts it into a new pass scheduled before the loop passes' gate. Ok. +unsigned int +pass_fix_loops::execute (function *) +{ I would add an early exit if there are no loops in the function (like in the original code below...) if (!loops_for_fn (fn)) return 0; + if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) +{ + calculate_dominance_info (CDI_DOMINATORS); + fix_loop_structure (NULL); +} + return 0; +} [...] /* Gate for loop pass group. The group is controlled by -ftree-loop-optimize but we also avoid running it when the IL doesn't contain any loop. */ @@ -57,9 +107,6 @@ gate_loop (function *fn) if (!loops_for_fn (fn)) return true; ... here. - /* Make sure to drop / re-discover loops when necessary. */ - if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) -fix_loop_structure (NULL); return number_of_loops (fn) > 1; }
Re: [PATCH] Fix UB in diagnostic.c
On Wed, Aug 13, 2014 at 09:03:37PM +0200, Manuel López-Ibáñez wrote: I don't think this is the right fix. The problem is that we are trying to print the caret in a column that is larger than the line_width. We do this because the file given by the line directive has nothing to do with the actual code we are parsing. I think in that case it is ok to just give up and not print a caret. So something like: @@ -300,11 +299,11 @@ diagnostic_show_locus (diagnostic_contex return; context->last_location = diagnostic->location; s = expand_location_to_spelling_point (diagnostic->location); line = location_get_source_line (s, &line_width); - if (line == NULL) + if (line == NULL || s.column > line_width) return; max_width = context->caret_max_width; line = adjust_line (line, line_width, max_width, &(s.column)); Nonetheless, perhaps in addition adjust_line should have gcc_checking_assert (line_width >= column). Yea, I think this would be much better, thanks. Done in the following. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-08-14 Marek Polacek pola...@redhat.com Manuel López-Ibáñez m...@gcc.gnu.org PR c/62059 * diagnostic.c (adjust_line): Add gcc_checking_assert. (diagnostic_show_locus): Don't print caret diagnostic if a column is larger than the line_width.
diff --git gcc/diagnostic.c gcc/diagnostic.c index 0cc7593..2109f31 100644 --- gcc/diagnostic.c +++ gcc/diagnostic.c @@ -270,6 +270,7 @@ adjust_line (const char *line, int line_width, int right_margin = 10; int column = *column_p; + gcc_checking_assert (line_width >= column); right_margin = MIN (line_width - column, right_margin); right_margin = max_width - right_margin; if (line_width >= max_width && column > right_margin) @@ -302,7 +303,7 @@ diagnostic_show_locus (diagnostic_context * context, context->last_location = diagnostic->location; s = expand_location_to_spelling_point (diagnostic->location); line = location_get_source_line (s, &line_width); - if (line == NULL) + if (line == NULL || s.column > line_width) return; max_width = context->caret_max_width; Marek
Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
On 08/13/2014 05:29 AM, Kyrill Tkachov wrote: Is the attached patch ok? It just moves the section as you suggested. I did a build of the Linux kernel with and without this patch to make sure no code-gen was accidentally affected. Looks good. We'd need to store a mapping from constant to RTXes and everytime we have a cache hit we'd have to tweak them to make sure the registers involved are correct. I had a quick play with this but ended up with LRA ICEs :( You mis-understand how I meant to memoize. Have a look at how we cache expansions for multiply in synth_mult: we don't record registers or rtx's, but we do record the operation and arguments. So you could consider building a trie, indexed by a hash. struct imm_algorithm { HOST_WIDE_INT op1; imm_algorithm *prev; enum operation { // op1 is accepted by aarch64_mov_operand. // prev should be null. mov, // op1 is to be inserted at the given position // to the value constructed by prev. movk_48, movk_32, movk_16, movk_0, // op1 is an arithmetic immediate to be applied // to the value constructed by prev add, sub, // op1 is a logical immediate to be applied to // the value constructed by prev and, ior, } code; }; r~
[patch, fortran] Fix PR 62106
Hello world, the attached patch fixes the regression by making sure we never try to create a temporary variable from a temporary variable, which happened in the wrong order. Regression-tested. OK for trunk and 4.9? 2014-08-19 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/62106 * gfortran.h (symbol_attribute): Add fe_temp flag. * frontend-passes.c (is_fe_temp): New function. (create_var): Don't add a temporary for an already created variable or for a constant. (combine_array_constructor): Remove special handling for constants. 2014-08-19 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/62106 * gfortran.dg/array_constructor_49.f90: New test. Index: frontend-passes.c === --- frontend-passes.c (revision 213778) +++ frontend-passes.c (working copy) @@ -430,11 +430,26 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre return 0; } +/* Auxiliary function to check if an expression is a temporary created by + create_var. */ + +static bool +is_fe_temp (gfc_expr *e) +{ + if (e->expr_type != EXPR_VARIABLE) +return false; + + return e->symtree->n.sym->attr.fe_temp; +} + + /* Returns a new expression (a variable) to be used in place of the old one, with an assignment statement before the current statement to set the value of the variable. Creates a new BLOCK for the statement if that hasn't already been done and puts the statement, plus the - newly created variables, in that block. */ + newly created variables, in that block. Special cases: If the + expression is constant or a temporary which has already + been created, just copy it. */ static gfc_expr* create_var (gfc_expr * e) @@ -448,6 +463,9 @@ create_var (gfc_expr * e) gfc_namespace *ns; int i; + if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e)) +return gfc_copy_expr (e); + /* If the block hasn't already been created, do so.
*/ if (inserted_block == NULL) { @@ -522,6 +540,7 @@ create_var (gfc_expr * e) symbol->attr.flavor = FL_VARIABLE; symbol->attr.referenced = 1; symbol->attr.dimension = e->rank > 0; + symbol->attr.fe_temp = 1; gfc_commit_symbol (symbol); result = gfc_get_expr (); @@ -1082,10 +1101,7 @@ combine_array_constructor (gfc_expr *e) if (op2->ts.type == BT_CHARACTER) return false; - if (op2->expr_type == EXPR_CONSTANT) -scalar = gfc_copy_expr (op2); - else -scalar = create_var (gfc_copy_expr (op2)); + scalar = create_var (gfc_copy_expr (op2)); oldbase = op1->value.constructor; newbase = NULL; Index: gfortran.h === --- gfortran.h (revision 213778) +++ gfortran.h (working copy) @@ -739,7 +739,7 @@ typedef struct optional:1, pointer:1, target:1, value:1, volatile_:1, temporary:1, dummy:1, result:1, assign:1, threadprivate:1, not_always_present:1, implied_index:1, subref_array_pointer:1, proc_pointer:1, asynchronous:1, -contiguous:1; +contiguous:1, fe_temp:1; /* For CLASS containers, the pointer attribute is sometimes set internally even though it was not directly specified. In this case, keep the ! { dg-do run } ! { dg-options "-ffrontend-optimize -fdump-tree-original" } ! PR 62106 - this used to give wrong results because ! of a bogus extra temporary variable. ! Original test case by Martien Hulsen program t integer :: ndim=2, ndfp=4, i character (len=8) :: line write (unit=line,fmt='(4I2)') (/ ( i, i = 1, ndfp ) /) + ndim if (line /= ' 3 4 5 6') call abort end program t ! { dg-final { scan-tree-dump-times "__var" 3 "original" } } ! { dg-final { cleanup-tree-dump "original" } }
Re: RFC: Patch for switch elimination (PR 54742)
On Wed, 2014-08-13 at 11:52 +0200, Richard Biener wrote: On Wed, Aug 13, 2014 at 4:54 AM, Bin.Cheng amker.ch...@gmail.com wrote: On Wed, Aug 13, 2014 at 4:40 AM, Jeff Law l...@redhat.com wrote: On 08/12/14 14:23, Richard Biener wrote: On August 12, 2014 8:31:16 PM CEST, Jeff Law l...@redhat.com wrote: On 08/12/14 11:46, Steve Ellcey wrote: Try setting the header and latch fields for the loop structure to NULL, then call loops_set_state (LOOPS_NEED_FIXUP). But that is _not_ the appropriate way of keeping loops preserved! I think that's done when we've scrogged the loop beyond repair and want the structures rebuilt. IIRC, that's what you recommended we do. An update on this part of the patch. I found that just calling 'loops_set_state (LOOPS_NEED_FIXUP)' without setting the header and latch fields to NULL first is enough to fix the problem I had earlier. I thought I had tried that before, but either I was calling something else or I had put the call in the wrong place; it is working now. Steve Ellcey
[C PATCH] Allow ATOMIC_*_LOCK_FREE in #if directive (DR#458)
DR#458 is about the use of the ATOMIC_*_LOCK_FREE macros, defined in stdatomic.h, in #if directives. The proposed Technical Corrigendum for this DR is that these macros should expand to constant expressions suitable for use in #if preprocessing directives. This patch does that by mapping these macros to the __GCC_ATOMIC_*_LOCK_FREE variants defined in c-cppbuiltin.c:cpp_atomic_builtins. (It's what libsupc++ does.) The current implementation wouldn't work in #if directives. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-08-14 Marek Polacek pola...@redhat.com DR 458 * ginclude/stdatomic.h (__atomic_type_lock_free): Remove. (ATOMIC_*_LOCK_FREE): Map to __GCC_ATOMIC_*_LOCK_FREE. * gcc.dg/c11-stdatomic-2.c: New test. diff --git gcc/ginclude/stdatomic.h gcc/ginclude/stdatomic.h index 108259b4..26b4b6e 100644 --- gcc/ginclude/stdatomic.h +++ gcc/ginclude/stdatomic.h @@ -95,30 +95,16 @@ typedef _Atomic __UINTMAX_TYPE__ atomic_uintmax_t; #define atomic_signal_fence(MO) __atomic_signal_fence (MO) #define atomic_is_lock_free(OBJ) __atomic_is_lock_free (sizeof (*(OBJ)), (OBJ)) -#define __atomic_type_lock_free(T) \ - (__atomic_always_lock_free (sizeof (T), (void *) 0) \ - ? 2 \ - : (__atomic_is_lock_free (sizeof (T), (void *) 0) ?
1 : 0)) -#define ATOMIC_BOOL_LOCK_FREE \ - __atomic_type_lock_free (atomic_bool) -#define ATOMIC_CHAR_LOCK_FREE \ - __atomic_type_lock_free (atomic_char) -#define ATOMIC_CHAR16_T_LOCK_FREE \ - __atomic_type_lock_free (atomic_char16_t) -#define ATOMIC_CHAR32_T_LOCK_FREE \ - __atomic_type_lock_free (atomic_char32_t) -#define ATOMIC_WCHAR_T_LOCK_FREE \ - __atomic_type_lock_free (atomic_wchar_t) -#define ATOMIC_SHORT_LOCK_FREE \ - __atomic_type_lock_free (atomic_short) -#define ATOMIC_INT_LOCK_FREE \ - __atomic_type_lock_free (atomic_int) -#define ATOMIC_LONG_LOCK_FREE \ - __atomic_type_lock_free (atomic_long) -#define ATOMIC_LLONG_LOCK_FREE \ - __atomic_type_lock_free (atomic_llong) -#define ATOMIC_POINTER_LOCK_FREE \ - __atomic_type_lock_free (void * _Atomic) +#define ATOMIC_BOOL_LOCK_FREE __GCC_ATOMIC_BOOL_LOCK_FREE +#define ATOMIC_CHAR_LOCK_FREE __GCC_ATOMIC_CHAR_LOCK_FREE +#define ATOMIC_CHAR16_T_LOCK_FREE __GCC_ATOMIC_CHAR16_T_LOCK_FREE +#define ATOMIC_CHAR32_T_LOCK_FREE __GCC_ATOMIC_CHAR32_T_LOCK_FREE +#define ATOMIC_WCHAR_T_LOCK_FREE __GCC_ATOMIC_WCHAR_T_LOCK_FREE +#define ATOMIC_SHORT_LOCK_FREE __GCC_ATOMIC_SHORT_LOCK_FREE +#define ATOMIC_INT_LOCK_FREE __GCC_ATOMIC_INT_LOCK_FREE +#define ATOMIC_LONG_LOCK_FREE __GCC_ATOMIC_LONG_LOCK_FREE +#define ATOMIC_LLONG_LOCK_FREE __GCC_ATOMIC_LLONG_LOCK_FREE +#define ATOMIC_POINTER_LOCK_FREE __GCC_ATOMIC_POINTER_LOCK_FREE /* Note that these macros require __typeof__ and __auto_type to remove diff --git gcc/testsuite/gcc.dg/c11-stdatomic-2.c gcc/testsuite/gcc.dg/c11-stdatomic-2.c index e69de29..d746b5c 100644 --- gcc/testsuite/gcc.dg/c11-stdatomic-2.c +++ gcc/testsuite/gcc.dg/c11-stdatomic-2.c @@ -0,0 +1,27 @@ +/* Test stdatomic.h header contents. Test that ATOMIC_*_LOCK_FREE + macros can be used in an #if directive (DR#458). 
*/ +/* { dg-do preprocess } */ +/* { dg-options "-std=c11 -pedantic-errors" } */ + +#include <stdatomic.h> + +#if ATOMIC_BOOL_LOCK_FREE +#endif +#if ATOMIC_CHAR_LOCK_FREE +#endif +#if ATOMIC_CHAR16_T_LOCK_FREE +#endif +#if ATOMIC_CHAR32_T_LOCK_FREE +#endif +#if ATOMIC_WCHAR_T_LOCK_FREE +#endif +#if ATOMIC_SHORT_LOCK_FREE +#endif +#if ATOMIC_INT_LOCK_FREE +#endif +#if ATOMIC_LONG_LOCK_FREE +#endif +#if ATOMIC_LLONG_LOCK_FREE +#endif +#if ATOMIC_POINTER_LOCK_FREE +#endif Marek
Re: [PATCH] Fix PR62031
Richard Biener wrote: This fixes a wrong answer from data-dependence analysis by realizing that _all_ (even non-evolving) indirect accesses cannot be constrained to a full object size. This also gets rid of that ugly DR_UNCONSTRAINED_BASE hack (but effectively makes it always active). Looks good to me. @@ -1404,9 +1409,21 @@ dr_may_alias_p (const struct data_refere build_fold_addr_expr (addr_b)); } else if (TREE_CODE (addr_b) == MEM_REF - && DR_UNCONSTRAINED_BASE (b)) -return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a), -TREE_OPERAND (addr_b, 0)); + && TREE_CODE (TREE_OPERAND (addr_b, 0)) == SSA_NAME) +{ + /* For true dependences we can apply TBAA. */ + if (flag_strict_aliasing + && DR_IS_WRITE (a) && DR_IS_READ (b) + && !alias_sets_conflict_p (get_alias_set (DR_REF (a)), + get_alias_set (DR_REF (b)))) + return false; + if (TREE_CODE (addr_a) == MEM_REF) + return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0), +TREE_OPERAND (addr_b, 0)); + else Remove the else here. + return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a), +TREE_OPERAND (addr_b, 0)); +} /* Otherwise DR_BASE_OBJECT is an access that covers the whole object that is being subsetted in the loop nest. */
Re: [C PATCH] Allow ATOMIC_*_LOCK_FREE in #if directive (DR#458)
On Thu, 14 Aug 2014, Marek Polacek wrote: The DR#458 is about the usage of ATOMIC_*_LOCK_FREE macros defined in stdatomic.h in the #if directives. Proposed Technical Corrigendum of this DR is that these macros should expand to constant expressions suitable for use in #if preprocessing directives. This patch does that by mapping these macros to __GCC_ATOMIC_*_LOCK_FREE variants defined in c-cppbuiltin.c:cpp_atomic_builtins. (It's what libsupc++ does.) The current implementation wouldn't work in #if directives. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [patch, fortran] Fix PR 62106
On Thu, Aug 14, 2014 at 07:40:52PM +0200, Thomas Koenig wrote: Hello world, the attached patch fixes the regression by making sure we never try to create a temporary variable from a temporary variable, which happened in the wrong order. Regression-tested. OK for trunk and 4.9? Looks good to me. -- Steve
Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting
On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote: This bug is caused by my last patch, which did not differentiate between explicit section names (via attributes) and implicit section names (via -ffunction-section). This patch fixes that. -- diff --git gcc/bb-reorder.c gcc/bb-reorder.c index 8f8c420..2115b01 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. */ optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index 817507f..738675a 100644 --- gcc/tree.h +++ gcc/tree.h @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl { #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Specify whether the section name was explicitly set with decl_attributes. */ +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) ? false : \ + !!DECL_SECTION_NAME(NODE)) Personally, I think it is clearer to simply write this as: #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ (DECL_SECTION_NAME(NODE) != NULL_TREE \ && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) Teresa + struct GTY(()) tree_decl_with_vis { struct tree_decl_with_rtl common; tree assembler_name; -- -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: RFC: Patch for switch elimination (PR 54742)
On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote: On 08/14/14 10:12, David Malcolm wrote: On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote: On 08/14/14 04:32, Richard Biener wrote: You'll note in a separate thread Steve and I discussed this during Cauldron and it was at my recommendation Steve resurrected his proof of concept plugin and started beating it into shape. But do we really want a pass just to help coremark? And that's the biggest argument against Steve's work. In theory it should be applicable to other FSMs, but nobody's come forth with additional testcases from real world applications. Maybe a regex library? Perhaps: http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ? The key is that at least some states tell you at compile time what state you'll be in during the next loop iteration. Thus instead of coming around the loop, evaluating the switch condition, then doing the multi-way branch, we just directly jump to the case for the next iteration. I've never looked at the PCRE code to know if it's got cases like that. jeff I compiled PCRE but it never triggered this optimization (even if I bumped up the parameters for instruction counts and paths). I understand the desire not to add optimizations just for benchmarks, but we do know other compilers have added this optimization for coremark (See http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly show interest in having it even if it is just for a benchmark. Does 'competing against other compilers' sound better than 'optimizing for a benchmark'? Steve Ellcey sell...@mips.com
[Google/4_9] A couple gcov-tool fixes
Fix a couple problems found during testing. Backport from trunk (r212694) failed to fix up gcov_read_counter invocations in google-specific code. Also, forward port r211800 from google/4_8 to tolerate differences after COMDAT fixup. Passes manual testing, ok if it passes regression tests? Thanks, Teresa 2014-08-14 Teresa Johnson tejohn...@google.com * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter. Relax the check after COMDAT fixup. (__gcov_merge_icall_topn): Use gcov_get_counter. Index: libgcov-merge.c === --- libgcov-merge.c (revision 213975) +++ libgcov-merge.c (working copy) @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c gcc_assert (!(n_counters % 2)); for (i = 0; i < n_counters; i += 2) { - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); /* Note that global id counter may never have been set if no calls were made from this call-site. */ @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c else if (__gcov_is_gid_insane (global_id)) global_id = counters[i]; - gcc_assert (counters[i] == global_id); + /* In the case of inconsistency, use the src's target. */ + if (counters[i] != global_id) +fprintf (stderr, "Warning: Inconsistent call targets in " + "direct-call profile.\n"); } else if (global_id) counters[i] = global_id; @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi } /* Skip the number_of_eviction entry. */ - gcov_read_counter (); + gcov_get_counter (); for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) { int found = 0; - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); for (m = 0; m < j; m += 2) { if (tmp_array[m] == global_id) -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[C++ Patch] pedwarn issues
Hi, a few cases where under SFINAE we just go ahead and don't immediately return error_mark_node. During the work I also noticed a glitch in cxx_sizeof_or_alignof_type where we do complain & tf_warning_or_error but in that case complain is a bool. Tested x86_64-linux as usual. Thanks, Paolo. 2014-08-14 Paolo Carlini paolo.carl...@oracle.com * typeck.c (composite_pointer_type, cxx_sizeof_or_alignof_type, cp_build_array_ref, cp_build_function_call_vec): When a pedwarn is suppressed under SFINAE, return error_mark_node. * typeck.c (cxx_sizeof_or_alignof_type): Fix complain & tf_warning_or_error, where complain is a bool, glitch. Index: typeck.c === --- typeck.c (revision 213976) +++ typeck.c (working copy) @@ -597,28 +597,34 @@ composite_pointer_type (tree t1, tree t2, tree arg tree attributes; tree result_type; - if (TYPE_PTRFN_P (t2) && (complain & tf_error)) -{ - switch (operation) - { - case CPO_COMPARISON: -pedwarn (input_location, OPT_Wpedantic, - "ISO C++ forbids comparison between " - "pointer of type %<void *%> and pointer-to-function"); -break; - case CPO_CONVERSION: -pedwarn (input_location, OPT_Wpedantic, - "ISO C++ forbids conversion between " - "pointer of type %<void *%> and pointer-to-function"); -break; - case CPO_CONDITIONAL_EXPR: -pedwarn (input_location, OPT_Wpedantic, - "ISO C++ forbids conditional expression between " - "pointer of type %<void *%> and pointer-to-function"); -break; - default: -gcc_unreachable (); - } + if (TYPE_PTRFN_P (t2)) + { + if (complain & tf_error) + { + switch (operation) + { + case CPO_COMPARISON: + pedwarn (input_location, OPT_Wpedantic, + "ISO C++ forbids comparison between pointer " + "of type %<void *%> and pointer-to-function"); + break; + case CPO_CONVERSION: + pedwarn (input_location, OPT_Wpedantic, + "ISO C++ forbids conversion between pointer " + "of type %<void *%> and pointer-to-function"); + break; + case CPO_CONDITIONAL_EXPR: + pedwarn (input_location, OPT_Wpedantic, + "ISO C++ forbids conditional expression between " + "pointer of type %<void *%> and " + "pointer-to-function"); + break; + default: + gcc_unreachable (); + } + } + else + return error_mark_node; } result_type = cp_build_qualified_type (void_type_node, @@ -1536,6 +1542,8 @@ cxx_sizeof_or_alignof_type (tree type, enum tree_c pedwarn (input_location, OPT_Wpointer_arith, "invalid application of %qs to a member function", operator_name_info[(int) op].name); + else + return error_mark_node; value = size_one_node; } @@ -1561,7 +1569,7 @@ cxx_sizeof_or_alignof_type (tree type, enum tree_c if (cxx_dialect >= cxx1y && array_of_runtime_bound_p (type) && (flag_iso || warn_vla > 0)) { - if (complain & tf_warning_or_error) + if (complain) pedwarn (input_location, OPT_Wvla, "taking sizeof array of runtime bound"); else @@ -3129,9 +3137,14 @@ cp_build_array_ref (location_t loc, tree array, tr return error_mark_node; } - if (!lvalue_p (array) && (complain & tf_error)) - pedwarn (loc, OPT_Wpedantic, - "ISO C++ forbids subscripting non-lvalue array"); + if (!lvalue_p (array)) + { + if (complain & tf_error) + pedwarn (loc, OPT_Wpedantic, + "ISO C++ forbids subscripting non-lvalue array"); + else + return error_mark_node; + } /* Note in C++ it is valid to subscript a `register' array, since it is valid to take the address of something with that @@ -3467,10 +3480,14 @@ cp_build_function_call_vec (tree function, vec<tre fndecl = function; /* Convert anything with function type to a pointer-to-function. */ - if (DECL_MAIN_P (function) && (complain & tf_error)) - pedwarn (input_location, OPT_Wpedantic, - "ISO C++ forbids calling %<::main%> from within program"); - + if (DECL_MAIN_P (function)) + { + if (complain & tf_error) + pedwarn (input_location, OPT_Wpedantic, +
Re: RFC: Patch for switch elimination (PR 54742)
Steve Ellcey wrote: I understand the desire not to add optimizations just for benchmarks but we do know other compilers have added this optimization for coremark (See http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly shows interest in having it even if it is just for a benchmark. Does 'competing against other compilers' sound better then 'optimizing for a benchmark'? I definitely would like to see GCC trunk do this transform. What about we integrate the new pass, and then when jump-threading manages to catch the coremark loop, we remove the pass? Thanks, Sebastian
Re: [Patch] PR55189 enable -Wreturn-type by default
--- a/gcc/fortran/options.c +++ b/gcc/fortran/options.c @@ -693,6 +693,10 @@ gfc_handle_option (size_t scode, const char *arg, int value, gfc_option.warn_line_truncation = value; break; +case OPT_Wmissing_return: + warn_missing_return = value; + break; + case OPT_Wrealloc_lhs: gfc_option.warn_realloc_lhs = value; break; The entry in c.opt says this is a C/C++ option, so why do you need this? +Wmissing-return +C ObjC C++ ObjC++ Var(warn_missing_return) LangEnabledBy(C ObjC C++ ObjC++,Wreturn-type) +Warn whenever control may reach end of non-void function This should prevent -Wreturn-type in Fortran from trying to enable -Wmissing-return; if it doesn't, that is a bug. In any case, the work-around should be adding Wmissing-return to fortran/lang.opt with a ??? comment, not there. --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -472,7 +472,7 @@ C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning LangEnabledBy(C Warn about implicit function declarations Wimplicit-int -C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit) +C ObjC Var(warn_implicit_int) Warning Warn when a declaration does not specify a type Wimport diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5ae910c..3f2019a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -3615,7 +3615,7 @@ This warning is enabled by @option{-Wall} in C++. @opindex Wimplicit-int @opindex Wno-implicit-int Warn when a declaration does not specify a type. -This warning is enabled by @option{-Wall}. +This warning is enabled by default. @item -Wimplicit-function-declaration @r{(C and Objective-C only)} @opindex Wimplicit-function-declaration Does this patch actually enable -Wimplicit-int by default? The default without Init() should be zero! And according to this: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01367.html we still want -Wno-implicit to disable -Wimplicit-int (and -Werror=implicit to set -Werror=implicit-int), so the LangEnabledBy() should stay.
The documentation could say: This warning is enabled by default and it is also controlled by -Wimplicit. Cheers, Manuel.
Re: [patch] No allocation for empty unordered containers
On 13/08/2014 11:50, Jonathan Wakely wrote: Yes you can, it's conforming to replace a (non-virtual) member function with default arguments by two or more member functions. We do it all the time. See 17.6.5.5 [member.functions] p2. You should have told me sooner! But of course no-one is supposed to ignore the Standard :-). Here, then, is the patch to introduce a default constructor with compiler-computed noexcept qualification. Note that I also made the allocator-aware default constructor allocation-free; however, its noexcept qualification has to be written manually, which I find quite a burden. Do you think we should do so now? 2014-08-14 François Dumont fdum...@gcc.gnu.org * include/bits/hashtable_policy.h (_Prime_rehash_policy): Qualify constructor noexcept. (_Hash_code_base): All specializations default constructible if possible. (_Hashtable_base): Likewise. * include/bits/hashtable.h (_Hashtable()): Implementation defaulted. * include/bits/unordered_map.h (unordered_map::unordered_map()): New, implementation defaulted. (unordered_multimap::unordered_multimap()): Likewise. * include/bits/unordered_set.h (unordered_set::unordered_set()): Likewise. (unordered_multiset::unordered_multiset()): Likewise. * include/debug/unordered_map: Likewise. * include/debug/unordered_set: Likewise. * testsuite/23_containers/unordered_map/allocator/noexcept.cc (test04()): New. * testsuite/23_containers/unordered_multimap/allocator/noexcept.cc (test04()): New. * testsuite/23_containers/unordered_set/allocator/noexcept.cc (test04()): New. * testsuite/23_containers/unordered_multiset/allocator/noexcept.cc (test04()): New. I am preparing a patch for profile mode, so I will submit the modification for this mode with that big patch. Tested under Linux x86_64. Ok to commit? François Index: include/bits/hashtable_policy.h === --- include/bits/hashtable_policy.h (revision 213780) +++ include/bits/hashtable_policy.h (working copy) @@ -460,7 +460,7 @@ /// smallest prime that keeps the load factor small enough.
struct _Prime_rehash_policy { -_Prime_rehash_policy(float __z = 1.0) +_Prime_rehash_policy(float __z = 1.0) noexcept : _M_max_load_factor(__z), _M_next_resize(0) { } float @@ -1071,7 +1071,8 @@ typedef void* __hash_code; typedef _Hash_node<_Value, false> __node_type; - // We need the default constructor for the local iterators. + // We need the default constructor for the local iterators and _Hashtable + // default constructor. _Hash_code_base() = default; _Hash_code_base(const _ExtractKey& __ex, const _H1&, const _H2&, @@ -1161,7 +1162,8 @@ typedef std::size_t __hash_code; typedef _Hash_node<_Value, false> __node_type; - // We need the default constructor for the local iterators. + // We need the default constructor for the local iterators and _Hashtable + // default constructor. _Hash_code_base() = default; _Hash_code_base(const _ExtractKey& __ex, @@ -1250,6 +1252,8 @@ typedef std::size_t __hash_code; typedef _Hash_node<_Value, true> __node_type; + // We need the default constructor for _Hashtable default constructor.
+ _Hash_code_base() = default; _Hash_code_base(const _ExtractKey& __ex, const _H1& __h1, const _H2& __h2, const _Default_ranged_hash&) @@ -1694,6 +1698,7 @@ __hash_code, __hash_cached::value; protected: +_Hashtable_base() = default; _Hashtable_base(const _ExtractKey& __ex, const _H1& __h1, const _H2& __h2, const _Hash& __hash, const _Equal& __eq) : __hash_code_base(__ex, __h1, __h2, __hash), _EqualEBO(__eq) @@ -1906,6 +1911,7 @@ __alloc_rebind<__node_alloc_type, __bucket_type>; using __bucket_alloc_traits = std::allocator_traits<__bucket_alloc_type>; + _Hashtable_alloc() = default; _Hashtable_alloc(const _Hashtable_alloc&) = default; _Hashtable_alloc(_Hashtable_alloc&&) = default; Index: include/bits/hashtable.h === --- include/bits/hashtable.h (revision 213780) +++ include/bits/hashtable.h (working copy) @@ -310,10 +310,10 @@ const_local_iterator; private: - __bucket_type* _M_buckets; - size_type _M_bucket_count; + __bucket_type* _M_buckets = &_M_single_bucket; + size_type _M_bucket_count = 1; __node_base _M_before_begin; - size_type _M_element_count; + size_type _M_element_count = 0; _RehashPolicy _M_rehash_policy; // A single bucket used when only need for 1 bucket. Especially @@ -322,7 +322,7 @@ // qualified. // Note that we can't leave hashtable with 0 bucket without adding // numerous checks in the code to avoid 0 modulus. - __bucket_type
Re: [C++ Patch] pedwarn issues
OK, thanks. Jason
Re: [PATCH 130/236] config/bfin: Use rtx_insn
On 08/06/14 11:21, David Malcolm wrote: gcc/ * config/bfin/bfin-protos.h (asm_conditional_branch): Strengthen param 1 from rtx to rtx_insn *. * config/bfin/bfin.c (expand_prologue_reg_save): Likewise for the various locals named insn. (expand_epilogue_reg_restore): Likewise. (frame_related_constant_load): Likewise. (add_to_reg): Likewise. (emit_link_insn): Likewise. (do_link): Likewise. (expand_interrupt_handler_prologue): Likewise. (branch_dest): Likewise for param branch. (asm_conditional_branch): Likewise for param insn. (gen_one_bundle): Likewise for elements of param slot and local t. (bfin_gen_bundles): Likewise for locals insn, next and elements of local slot. (reorder_var_tracking_notes): Likewise for locals insn, next, queue, next_queue, prev. (workaround_rts_anomaly): Likewise for locals insn, first_insn. (add_sched_insns_for_speculation): Likewise for local insn. OK. As are patches: #131-155 I resisted the nostalgia trip through a variety of backends I've hacked through the years and focused strictly on the patches :-) Jeff
Re: [Google/4_9] A couple gcov-tool fixes
On Thu, Aug 14, 2014 at 11:36 AM, Xinliang David Li davi...@google.com wrote: Ok. The interfaces for counter reading/getting now become confusing. Should they be better documented somewhere so that developers know which is the right one to use in a given context? I think it is documented in libgcov.h and accesses should be through the new gcov_get_counter interface, but these just got missed when it was ported from trunk. Teresa David On Thu, Aug 14, 2014 at 11:27 AM, Teresa Johnson tejohn...@google.com wrote: Fix a couple problems found during testing. Backport from trunk (r212694) failed to fix up gcov_read_counter invocations in google-specific code. Also, forward port r211800 from google/4_8 to tolerate differences after COMDAT fixup. Passes manual testing, ok if it passes regression tests? Thanks, Teresa 2014-08-14 Teresa Johnson tejohn...@google.com * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter. Relax the check after COMDAT fixup. (__gcov_merge_icall_topn): Use gcov_get_counter. Index: libgcov-merge.c === --- libgcov-merge.c (revision 213975) +++ libgcov-merge.c (working copy) @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c gcc_assert (!(n_counters % 2)); for (i = 0; i < n_counters; i += 2) { - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); /* Note that global id counter may never have been set if no calls were made from this call-site. */ @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c else if (__gcov_is_gid_insane (global_id)) global_id = counters[i]; - gcc_assert (counters[i] == global_id); + /* In the case of inconsistency, use the src's target.
*/ + if (counters[i] != global_id) +fprintf (stderr, "Warning: Inconsistent call targets in " + "direct-call profile.\n"); } else if (global_id) counters[i] = global_id; @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi } /* Skip the number_of_eviction entry. */ - gcov_read_counter (); + gcov_get_counter (); for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) { int found = 0; - gcov_type global_id = gcov_read_counter (); - gcov_type call_count = gcov_read_counter (); + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); for (m = 0; m < j; m += 2) { if (tmp_array[m] == global_id) -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH Fortran/Diagnostics] Move Fortran to common diagnostics machinery
Hi Tobias (or any other Fortran maintainer), Is this patch OK? https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00488.html Thanks, Manuel.
Re: [PATCH 012/236] Convert DF_REF_INSN to a function for now
On Wed, 2014-08-13 at 20:55 -0600, Jeff Law wrote: On 08/13/14 18:11, David Malcolm wrote: On Wed, 2014-08-13 at 14:34 -0600, Jeff Law wrote: On 08/13/14 14:28, David Malcolm wrote: Thanks. Although this function gets converted back to a macro in patch 191, I just realized that in the meantime it's not inlined, as is the case for some of the other macro-function conversions in patches 13-16. Do I need to convert them to inline functions with the appropriate headers, and is that regarded as a sufficiently trivial fix to the stuff you've already reviewed to not need re-review? (I will bootstrap & test). I'd just make it a follow-up. #237 ;-) Right, but these would be modifications to stuff that only exists between phases 1-3 of the kit, before going away in phase 4, so although it might be #237, it would need to be applied when the individual changes go in. If they're around just through phases #1-#3, then I wouldn't worry about it. ...and indeed, having experimented with it, the inline function approach would need to include more header files: for example, an inline implementation of, say, BB_HEAD() in basic-block.h would look like this: inline rtx_insn *BB_HEAD (const_basic_block bb) { rtx insn = bb->il.x.head_; return safe_as_a <rtx_insn *> (insn); } but this requires everything including basic-block.h to know about safe_as_a <rtx_insn *> (rtx) and hence have access to is_a_helper <rtx_insn *> (rtx), and hence needs to include rtl.h i.e. either basic-block.h would need to include rtl.h, or everything including it would. Similar considerations apply to the macros in the various other header files. Touching the header file graph doesn't seem like a good thing to be doing for a transient effort during phases 1-3 of applying the kit, so I'll hold off on making that change, and go with the patches as-is.
Transient, yes, but given the amount of time for me to simply bootstrap each candidate patch, the non-inlined functions could last in trunk for a couple of weeks (there are only 168 hours in a week, and a bootstrap + regtest takes about 3 hours on my box, so for 236 patches we're talking ~4 weeks of compute time just for that). Well, I'd suggest a few things. 1. For the config/ changes, a full bootstrap is not necessary. For those targets which are embedded, just build a stage1 cross to the target to verify it builds and call it good. 2. For targets where you can bootstrap, go ahead and do so, but just on those targets. Some of these you can probably have going in parallel. 3. Some of the changes are so trivial that I'd squash them together in a single build/test cycle. Thanks. FWIW I'm working on followup cleanups whilst waiting for the build/tests. I would even consider seeing all the scaffolding go in as a single chunk. It's nice to see the individuals during the review process, but I wouldn't lose any sleep if we bundled those together. I guess the underlying point here is that this is a big change and I'm trying to be fastidious here. Murphy's Law suggests I'm going to break at least *something* :( Understood, but we don't want to be so fastidious that the time this stuff is in flux is so long that it creates more problems than it would if we took some sensible and safe shortcuts. (nods) Thanks again for the reviews. Dave
Re: [GOOGLE] Fix the bug where implicit section names prevent function splitting
Patch v2. Trunk no longer sets SECTION_NAME for implicit section names, so this probably does not apply to trunk. It's probably not necessary for trunk either. Tested for Google 4.8 (albeit unnecessary) and 4.9 branch. diff --git gcc/bb-reorder.c gcc/bb-reorder.c index a1b3e65..b9a829e 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. */ && optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index b656b7b..308eef8 100644 --- gcc/tree.h +++ gcc/tree.h @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Speficy whether the section name was explicitly set with decl_attributes. */ +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_SECTION_NAME(NODE) != NULL_TREE \ + && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) + extern tree decl_debug_expr_lookup (tree); extern void decl_debug_expr_insert (tree, tree); -- On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com wrote: On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote: This bug is caused by my last patch, which did not differentiate between explicit section names (via attributes) and implicit section names (via -ffunction-section). This patch fixes that. -- diff --git gcc/bb-reorder.c gcc/bb-reorder.c index 8f8c420..2115b01 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. 
*/ && optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index 817507f..738675a 100644 --- gcc/tree.h +++ gcc/tree.h @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl { #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Speficy whether the section name was explicitly set with decl_attributes. */ +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \ + !!DECL_SECTION_NAME(NODE)) Personally, I think it is clearer to simply write this as: #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ (DECL_SECTION_NAME(NODE) != NULL_TREE \ && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) Teresa + struct GTY(()) tree_decl_with_vis { struct tree_decl_with_rtl common; tree assembler_name; -- -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Fix the bug where implicit section names prevent function splitting
On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang ahyan...@google.com wrote: Patch v2. Trunk no longer sets SECTION_NAME for implicit section names, so this probably does not apply to trunk. It's probably not necessary for trunk either. Tested for Google 4.8 (albeit unnecessary) and 4.9 branch. diff --git gcc/bb-reorder.c gcc/bb-reorder.c index a1b3e65..b9a829e 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. */ && optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index b656b7b..308eef8 100644 --- gcc/tree.h +++ gcc/tree.h @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Speficy whether the section name was explicitly set with decl_attributes. */ Typo: Specify. Otherwise looks ok to me. Thanks, Teresa +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_SECTION_NAME(NODE) != NULL_TREE \ + && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) + extern tree decl_debug_expr_lookup (tree); extern void decl_debug_expr_insert (tree, tree); -- On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com wrote: On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote: This bug is caused by my last patch, which did not differentiate between explicit section names (via attributes) and implicit section names (via -ffunction-section). This patch fixes that. 
-- diff --git gcc/bb-reorder.c gcc/bb-reorder.c index 8f8c420..2115b01 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. */ && optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index 817507f..738675a 100644 --- gcc/tree.h +++ gcc/tree.h @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl { #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Speficy whether the section name was explicitly set with decl_attributes. */ +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \ + !!DECL_SECTION_NAME(NODE)) Personally, I think it is clearer to simply write this as: #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ (DECL_SECTION_NAME(NODE) != NULL_TREE \ && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) Teresa + struct GTY(()) tree_decl_with_vis { struct tree_decl_with_rtl common; tree assembler_name; -- -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH] cygwin: __cxa_atexit support
This patch implements __cxa_atexit support for Cygwin targets. This requires Cygwin 1.7.32 and binutils master. Net difference in check-c++ results on i686-pc-cygwin: # of unexpected failures -11 # of unexpected successes -3 # of expected failures -61 # of unsupported tests -46 -- Yaakov Selkowitz Associate Software Engineer, ARM Red Hat, Inc. 2014-08-12 Yaakov Selkowitz yselk...@redhat.com gcc/ * config/i386/cygwin.h (STARTFILE_SPEC): Use crtbeginS.o if shared. * config.gcc (*-*-cygwin*): Use __cxa_atexit by default. libgcc/ * config/i386/cygming-crtend.c (register_frame_ctor): Move atexit call from here... * config/i386/cygming-crtbegin.c (__gcc_register_frame): ...to here. (__dso_handle): Define on Cygwin. * config/i386/t-cygming (crtbeginS.o): New rule. * config.host (*-*-cygwin*): Add crtbeginS.o to extra_parts. Index: gcc/config/i386/cygwin.h === --- gcc/config/i386/cygwin.h (revision 213759) +++ gcc/config/i386/cygwin.h (working copy) @@ -40,7 +40,7 @@ #define STARTFILE_SPEC \ %{!shared: %{!mdll: crt0%O%s \ %{pg:gcrt0%O%s}}}\ - crtbegin.o%s + %{shared:crtbeginS.o%s;:crtbegin.o%s} #undef ENDFILE_SPEC #define ENDFILE_SPEC \ Index: gcc/config.gcc === --- gcc/config.gcc (revision 213759) +++ gcc/config.gcc (working copy) @@ -1575,6 +1575,7 @@ if test x$enable_threads = xyes; then thread_file='posix' fi + default_use_cxa_atexit=yes use_gcc_stdint=wrap ;; x86_64-*-cygwin*) @@ -1590,6 +1591,7 @@ if test x$enable_threads = xyes; then thread_file='posix' fi + default_use_cxa_atexit=yes use_gcc_stdint=wrap tm_defines=${tm_defines} TARGET_CYGWIN64=1 ;; Index: libgcc/config/i386/cygming-crtbegin.c === --- libgcc/config/i386/cygming-crtbegin.c (revision 213759) +++ libgcc/config/i386/cygming-crtbegin.c (working copy) @@ -111,6 +111,23 @@ = { }; #endif +#ifdef __CYGWIN__ +/* Declare the __dso_handle variable. It should have a unique value + in every shared-object; in a main program its value is zero. The + object should in any case be protected. 
This means the instance + in one DSO or the main program is not used in another object. The + dynamic linker takes care of this. */ + +#ifdef CRTSTUFFS_O +extern void *__ImageBase; +void *__dso_handle = &__ImageBase; +#else +void *__dso_handle = 0; +#endif + +#endif /* __CYGWIN__ */ + + /* Pull in references from libgcc.a(unwind-dw2-fde.o) in the startfile. These are referenced by a ctor and dtor in crtend.o. */ extern void __gcc_register_frame (void); @@ -161,6 +178,13 @@ register_class_fn (__JCR_LIST__); } #endif + +#if DEFAULT_USE_CXA_ATEXIT + /* If we use the __cxa_atexit method to register C++ dtors + at object construction, also use atexit to register eh frame + info cleanup. */ + atexit(__gcc_deregister_frame); +#endif /* DEFAULT_USE_CXA_ATEXIT */ } void Index: libgcc/config/i386/cygming-crtend.c === --- libgcc/config/i386/cygming-crtend.c (revision 213759) +++ libgcc/config/i386/cygming-crtend.c (working copy) @@ -70,12 +70,6 @@ register_frame_ctor (void) { __gcc_register_frame (); -#if DEFAULT_USE_CXA_ATEXIT - /* If we use the __cxa_atexit method to register C++ dtors - at object construction, also use atexit to register eh frame - info cleanup. */ - atexit (__gcc_deregister_frame); -#endif } #if !DEFAULT_USE_CXA_ATEXIT Index: libgcc/config/i386/t-cygming === --- libgcc/config/i386/t-cygming (revision 213759) +++ libgcc/config/i386/t-cygming (working copy) @@ -8,6 +8,9 @@ crtbegin.o: $(srcdir)/config/i386/cygming-crtbegin.c $(crt_compile) -fno-omit-frame-pointer -c $< +crtbeginS.o: $(srcdir)/config/i386/cygming-crtbegin.c + $(crt_compile) -fno-omit-frame-pointer -c $< -DCRTSTUFFS_O + # We intentionally use an implementation-reserved init priority of 0, # so allow the warning. 
crtend.o: $(srcdir)/config/i386/cygming-crtend.c Index: libgcc/config.host === --- libgcc/config.host (revision 213759) +++ libgcc/config.host (working copy) @@ -614,7 +614,7 @@ i[4567]86-wrs-vxworks|i[4567]86-wrs-vxworksae) ;; i[34567]86-*-cygwin*) - extra_parts=crtbegin.o crtend.o crtfastmath.o + extra_parts=crtbegin.o crtbeginS.o crtend.o crtfastmath.o # This has to match the logic for DWARF2_UNWIND_INFO in gcc/config/i386/cygming.h if test x$enable_sjlj_exceptions = xyes; then
Re: [GOOGLE] Fix the bug where implicit section names prevent function splitting
Thank you. I fixed the typo and committed. On Thu, Aug 14, 2014 at 1:49 PM, Teresa Johnson tejohn...@google.com wrote: On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang ahyan...@google.com wrote: Patch v2. Trunk no longer sets SECTION_NAME for implicit section names, so this probably does not apply to trunk. It's probably not necessary for trunk either. Tested for Google 4.8 (albeit unnecessary) and 4.9 branch. diff --git gcc/bb-reorder.c gcc/bb-reorder.c index a1b3e65..b9a829e 100644 --- gcc/bb-reorder.c +++ gcc/bb-reorder.c @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void) we are going to omit the reordering. */ && optimize_function_for_speed_p (cfun) && !DECL_ONE_ONLY (current_function_decl) - && !DECL_SECTION_NAME (current_function_decl)); + && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl)); } /* This function is the main 'entrance' for the optimization that diff --git gcc/tree.h gcc/tree.h index b656b7b..308eef8 100644 --- gcc/tree.h +++ gcc/tree.h @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.implicit_section_name_p) +/* Speficy whether the section name was explicitly set with decl_attributes. */ Typo: Specify. Otherwise looks ok to me. Thanks, Teresa +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \ + (DECL_SECTION_NAME(NODE) != NULL_TREE \ + && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)) + extern tree decl_debug_expr_lookup (tree); extern void decl_debug_expr_insert (tree, tree); -- On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com wrote: On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote: This bug is caused by my last patch, which did not differentiate between explicit section names (via attributes) and implicit section names (via -ffunction-section). This patch fixes that. 
[PATCH] __cxa_thread_atexit fixes for Cygwin/MinGW-w64
The attached patch implements premature DLL unloading prevention in __cxa_thread_atexit for Cygwin and MinGW-w64 targets. The mingw.org target is welcome to do the same in their os_defines.h, but this code does require Windows XP/2003, and they have historically catered to older platforms. MinGW cannot practically implement __cxa_thread_atexit_impl because it has no control over the contents of its libc (msvcrt), and implementing this in the (static) mingw32 support library would be of little benefit because it would still end up statically linked into libstdc++. Also, the GPL-with-exceptions libstdc++ implementation cannot be copied as a basis of our own __cxa_thread_atexit_impl due to licensing conflicts with both Cygwin (copyright assignment) and MinGW (public domain). The testsuite shows no regressions with this patch. -- Yaakov Selkowitz Associate Software Engineer, ARM Red Hat, Inc. 2014-08-13 Yaakov Selkowitz yselk...@redhat.com libstdc++-v3/ * config/os/mingw32-w64/os_defines.h (_GLIBCXX_THREAD_ATEXIT_WIN32): Define. * config/os/newlib/os_defines.h (_GLIBCXX_THREAD_ATEXIT_WIN32): Ditto. * libsupc++/atexit_thread.cc [_GLIBCXX_THREAD_ATEXIT_WIN32]: #include <windows.h>. (struct elt): Add dll member. (run): Decrement dll refcount. (__cxxabiv1::__cxa_thread_atexit): Increment dll refcount. Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h === --- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 213759) +++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy) @@ -78,4 +78,9 @@ #define _GLIBCXX_LLP64 1 #endif +// Enable use of GetModuleHandleEx (requires Windows XP/2003) in +// __cxa_thread_atexit to prevent modules from being unloaded before +// their dtors are called +#define _GLIBCXX_THREAD_ATEXIT_WIN32 1 + #endif Index: libstdc++-v3/config/os/newlib/os_defines.h === --- libstdc++-v3/config/os/newlib/os_defines.h (revision 213759) +++ libstdc++-v3/config/os/newlib/os_defines.h (working copy) @@ -47,6 +47,12 @@ // See libstdc++/20806. 
#define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1 + +// Enable use of GetModuleHandleEx (requires Windows XP/2003) in +// __cxa_thread_atexit to prevent modules from being unloaded before +// their dtors are called +#define _GLIBCXX_THREAD_ATEXIT_WIN32 1 + #endif #endif Index: libstdc++-v3/libsupc++/atexit_thread.cc === --- libstdc++-v3/libsupc++/atexit_thread.cc (revision 213759) +++ libstdc++-v3/libsupc++/atexit_thread.cc (working copy) @@ -25,6 +25,10 @@ #include <cstdlib> #include <new> #include <bits/gthr.h> +#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32 +#define WIN32_LEAN_AND_MEAN +#include <windows.h> +#endif #if _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL @@ -47,6 +51,9 @@ void (*destructor)(void *); void *object; elt *next; +#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32 +HMODULE dll; +#endif }; // Keep a per-thread list of cleanups in gthread_key storage. @@ -62,6 +69,11 @@ { elt *old_e = e; e->destructor (e->object); +#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32 + /* Decrement DLL count */ + if (e->dll) + FreeLibrary (e->dll); +#endif e = e->next; delete (old_e); } @@ -133,6 +145,14 @@ new_elt->destructor = dtor; new_elt->object = obj; new_elt->next = first; +#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32 + /* Store the DLL address for a later call to FreeLibrary in new_elt and + increment DLL load count. This blocks the unloading of the DLL + before the thread-local dtors have been called. This does NOT help + if FreeLibrary/dlclose is called in excess. */ + GetModuleHandleExW (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, + (LPCWSTR) dtor, &new_elt->dll); +#endif if (__gthread_active_p ()) __gthread_setspecific (key, new_elt);
Re: [PATCH 007/236] New function: for_each_rtx_in_insn
On Tue, 2014-08-12 at 15:08 -0600, Jeff Law wrote: On 08/06/14 11:19, David Malcolm wrote: gcc/ * rtl.h (for_each_rtx_in_insn): New function. * rtlanal.c (for_each_rtx_in_insn): Likewise. OK. Note that we're moving away from for_each_rtx... I haven't looked, but there's a reasonable chance we may not need it after Richard S.'s work is committed. Something to watch. CCing Richard S, as a heads up. Richard, for context, the patch in this thread is this one: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00494.html which adds a for_each_rtx_in_insn function, to work on a top-level insn node, as part of a big cleanup I'm working on that distinguishes between insn nodes vs arbitrary rtx nodes: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00498.html I believe that your rtl-iter reworking: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00133.html hasn't landed in trunk yet (I don't see an rtl-iter.h there yet). Summary of notes below: I think our patch series are mostly compatible, but I think there's a conflict between them in rtlanal.c to do with for_each_inc_dec, at patch 176 in my kit vs patch 40 of your kit. I think it's resolvable, though. More detailed notes follow: My reading of Richard's patch series is that it optimizes many specific for_each_rtx implementations, but keeps the existing generic one around. Some notes on all of the places in which I've made use of the new for_each_rtx_in_insn in my patchkit: cfgcleanup.c:1728:for_each_rtx_in_insn (BB_END (bb1), replace_label, rr); cfgcleanup.c:1742:for_each_rtx_in_insn (BB_END (bb1), replace_label, rr); (both in outgoing_edges_match; I don't think Richard's patchkit touches this one). 
cfgcleanup.c:2002: for_each_rtx_in_insn (insn, replace_label, rr); (in try_crossjump_to_edge; likewise, I don't think Richard's kit touches this) ira.c:3350: for_each_rtx_in_insn (insn, set_paradoxical_subreg, ...in update_equiv_regs, the use added in my [PATCH 191/236] Remove DF_REF_INSN scaffolding: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00554.html It looks like Richard's [PATCH 25/50] ira.c:set_paradoxical_subreg: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00161.html optimizes the current code, and my reading of his patch is that it's compatible with my patchkit; the for_each_rtx_in_insn can be eliminated. Hence, depending on whose patch goes in first, the other patch will need some fixup, but I think it will be trivial to fix up. rtlanal.c:3103: return for_each_rtx_in_insn (insn, for_each_inc_dec_find_mem, data); ...in for_each_inc_dec, the use added in my: [PATCH 176/236] cselib and incdec: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00711.html Looks like Richard's [PATCH 40/50] rtlanal.c:for_each_inc_dec: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00177.html optimizes this. Merging these looks more involved: in my patch series I establish that the first param to for_each_inc_dec is an rtx_insn **, which may cause type issues in Richard's new code as is, but perhaps may allow some simplification? Given that the loc passed in is an insn, am I right in thinking we're only interested in MEMs within the PATTERN of the insn? Also, am I right in thinking that that particular callback can't modify the ptr that's passed in (I don't see writes back to *r)? If so, that suggests that we can lose an indirection from the first param, and simply work on PATTERN (insn). 
config/i386/i386.c:46886:for_each_rtx_in_insn (insn, (rtx_function) ix86_loop_memcount, (in ix86_loop_unroll_adjust) config/s390/s390.c:11820:for_each_rtx_in_insn (insn, (rtx_function) check_dpu, &mem_count); (in s390_loop_unroll_adjust) (as far as I can see Richard's patches don't touch the config subdirs). Hope this is sane. Dave