Re: [PATCH 4/4] Fix leading spaces.
2013/6/16 Ondřej Bílka nel...@seznam.cz: On Sat, Jun 15, 2013 at 05:13:31PM +0800, Chung-Ju Wu wrote: 2013/6/14 Joseph S. Myers jos...@codesourcery.com: On Thu, 13 Jun 2013, Richard Biener wrote: Btw, rather than these kind of patches I'd appreciate if someone would look at a simple pre(post?)-commit hook that enforces those whitespace rules. In the cpp testsuite we definitely want tests with bad whitespace (e.g. gcc.dg/cpp/backslash*.c) and generally I think it's wrong to be changing whitespace in the testsuite without an understanding of what the test is testing and how the whitespace is irrelevant to that (more generally, cleanups of compiler tests are suspect without such an understanding of what is or is not significant in a particular test). And so you need to allow addition of otherwise bad whitespace there. It's not obvious whether there might be other cases needing such whitespace as well. Either by adjusting the committed content or by rejecting the commit(?) I don't think hooks adjusting committed content are likely to work at all. -- Joseph S. Myers jos...@codesourcery.com By having a look at Ondřej's patch, it is nice to fix existing codes with proper whitespace/tab rules. But it covers too much under whole gcc source tree. Like Joseph said, we may accidentally change the cases that need bad whitespace. Perhaps we should gradually fix them? i.e. We can only fix leading spaces for some obvious cases (gcc/*.c gcc/c-family/*.c ...), leaving other source (gcc/testsuite/* gcc/config/* ...) untouched. I uploaded new version that touches only gcc/*.[ch] gcc/c-family/*.[ch] to http://kam.mff.cuni.cz/~ondra/stylepp.tar.bz2 patches are also updated. IMHO, the preliminary whitelist could be: gcc/*.[ch] gcc/c/*.[ch] gcc/c-family/*.[ch] gcc/common/*.[ch] gcc/cp/*.[ch] Anyway what in gcc/config/*.c depends on leading/trailing spaces? In general, I agree with you that all stuff under gcc/config/ and gcc/common/config/ are supposed to follow whitespace rules. But I think it would be better to have corresponding port maintainers make the decision. Your tool and patches look great to me. It helps fixup the existing codes with strong whitespace convention. But I am sorry that I don't have right to approve it. You will need some reviewers to review the patch and give the OK. If you do not receive any response to the patches within two weeks or so, you can 'ping' it with a follow-up mail to remind reviewers. :) Best regards, jasonwucj
Re: [gomp4] Some progress on #pragma omp simd
On Mon, Jun 17, 2013 at 07:59:15PM -0500, Aldy Hernandez wrote: As discussed on IRC. Attached are these changes you requested, plus changing OMP_CLAUSE__SIMDUID__UID to OMP_CLAUSE__SIMDUID__DECL. I will tackle the dot named builtins in the next iteration. Thanks. --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -227,6 +227,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT) +DEF_FUNCTION_TYPE_1 (BT_FN_UINT_PTR, BT_UINT, BT_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG) DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG) DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16) You can avoid this by using say unsigned_type_node as the type of the magic decl rather than pointer type. Though, with internal functions this will not be needed anyway. diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 6cc9a6c..41677bc 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -176,7 +176,7 @@ struct GTY ((chain_next (%h.next))) loop { /* For SIMD loops, this is a unique identifier of the loop, referenced by __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane builtins. */ - unsigned int simduid; + tree simduid; /* True if we should try harder to vectorize this loop. */ bool force_vect; Please move simduid after force_vect, so that it is better packed. Jakub
Re: [C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)
On Mon, Jun 17, 2013 at 6:54 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Mon, 17 Jun 2013, Jakub Jelinek wrote: The following patch shows a performance regression caused by the C++ changes to evaluate side-effects in x += foo (); first and only then do the addition. Previously if x was say int and foo () returned unsigned long long, convert_to_integer would narrow the addition to unsigned int, but as we now pass to convert_to_* a COMPOUND_EXPR with the side-effects on the op0 and addition on op1 of the COMPOUND_EXPR, convert_to_integer doesn't perform this shortening and unfortunately we don't have any gimple optimization yet to do this later on. This patch fixes it by handling COMPOUND_EXPR in convert_to_* where we care about TREE_CODE of the expr. Ok for trunk? Ok for 4.8 as well after a while? I suppose it's OK to fix the regression, though really we should be eliminating these early convert_to_* optimizations (optimizing later on GIMPLE if possible) rather than adding to them. I agree. The correct place for this is in tree-ssa-forwprop.c (for now). Richard.
[PATCH][ARM][7/n] Partial IT block deprecation in ARMv8 AArch32 - straightforward arm.md changes
Hi all, This patch adjusts a number of patterns in arm.md for the IT block rules in -mrestrict-it. The changes are mostly straightforward, disabling the cond_exec version for -mrestrict-it in some cases and adding alternatives that can produce a 16-bit encoding in others. Bootstrapped on Cortex-A15 and tested arm-none-eabi on a model on armv7-a and armv8-a architecture levels. Ok for trunk? Thanks, Kyrill 2013-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (arm_mulsi3_v6): Add alternative for 16-bit encoding. (mulsi3addsi_v6): Disable predicable variant for arm_restrict_it. (mulsi3subsi): Likewise. (mulsidi3adddi): Likewise. (mulsidi3_v6): Likewise. (umulsidi3_v6): Likewise. (umulsidi3adddi_v6): Likewise. (smulsi3_highpart_v6): Likewise. (umulsi3_highpart_v6): Likewise. (mulhisi3tb): Likewise. (mulhisi3bt): Likewise. (mulhisi3tt): Likewise. (maddhisi4): Likewise. (maddhisi4tb): Likewise. (maddhisi4tt): Likewise. (maddhidi4): Likewise. (maddhidi4tb): Likewise. (maddhidi4tt): Likewise. (zeroextractsi_compare0_scratch): Likewise. (insv_zero): Likewise. (insv_t2): Likewise. (anddi_notzesidi_di): Likewise. (anddi_notsesidi_di): Likewise. (andsi_notsi_si): Likewise. (iordi_zesidi_di): Likewise. (xordi_zesidi_di): Likewise. (andsi_iorsi3_notsi): Likewise. (smax_0): Likewise. (smax_m1): Likewise. (smin_0): Likewise. (not_shiftsi): Likewise. (unaligned_loadsi): Likewise. (unaligned_loadhis): Likewise. (unaligned_loadhiu): Likewise. (unaligned_storesi): Likewise. (unaligned_storehi): Likewise. (extv_reg): Likewise. (extzv_t2): Likewise. (divsi3): Likewise. (udivsi3): Likewise. (arm_zero_extendhisi2addsi): Likewise. (arm_zero_extendqisi2addsi): Likewise. (compareqi_eq0): Likewise. (arm_extendhisi2_v6): Likewise. (arm_extendqisi2addsi): Likewise. (arm_movt): Likewise. (thumb2_ldrd): Likewise. (thumb2_ldrd_base): Likewise. (thumb2_ldrd_base_neg): Likewise. (thumb2_strd): Likewise. (thumb2_strd_base): Likewise. (thumb2_strd_base_neg): Likewise. (arm_negsi2): Add alternative for 16-bit encoding. (arm_one_cmplsi2): Likewise. 08-armmd-straight.patch Description: Binary data
Re: [ping**2] Nios II port
On 13/6/18 上午3:05, Sandra Loosemore wrote: Ping? I think these are the most recent versions of the unreviewed patches in the series: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html There are also these two parts that have been reviewed already: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01329.html [approved but not applied yet?] That Dwarf fix has already been applied. http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html [needs minor cleanup and separate consideration] Yes, I forgot about this one. Need some cleanup to be more robust. Will resubmit separately. Chung-Lin
[PATCH, ARM] Fix unrecognizable vector comparisons
Hi, During expand, function vcondmodemode inverses some CMP, e.g. a LE b - b GE a But if b is CONST0_RTX, b GE a will be an illegal insn. (insn 933 932 934 113 (set (reg:V4SI 1027) (unspec:V4SI [ (const_vector:V4SI [ (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) ]) (reg:V4SI 1023 [ vect_var_.49 ]) (const_int 1 [0x1]) ] UNSPEC_VCGE)) PUGHSlab/Mapping.c:567 -1 (nil)) Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1189445 for more. And the bug also happens for FSF trunk. The similar issue (https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942) had fixed on AARCH64: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00581.html The patch is similar to the fix for aarch64. Bootstrap and no make check regression on Panda Board. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-06-18 Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/neon.md (vcond): Fix floating-point vector comparisons against 0. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index e814df0..9299ae5 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1671,6 +1671,7 @@ ? 3 : 1; rtx magic_rtx = GEN_INT (magic_word); int inverse = 0; + int use_zero_form = 0; int swap_bsl_operands = 0; rtx mask = gen_reg_rtx (V_cmp_resultmode); rtx tmp = gen_reg_rtx (V_cmp_resultmode); @@ -1681,12 +1682,16 @@ switch (GET_CODE (operands[3])) { case GE: +case GT: case LE: +case LT: case EQ: - if (!REG_P (operands[5]) - (operands[5] != CONST0_RTX (MODEmode))) - operands[5] = force_reg (MODEmode, operands[5]); - break; + if (operands[5] == CONST0_RTX (MODEmode)) + { + use_zero_form = 1; + break; + } + /* Fall through. */ default: if (!REG_P (operands[5])) operands[5] = force_reg (MODEmode, operands[5]); @@ -1737,7 +1742,26 @@ a GT b - a GT b a LE b - b GE a a LT b - b GT a -a EQ b - a EQ b */ +a EQ b - a EQ b +Note that there also exist direct comparison against 0 forms, +so catch those as a special case. */ + if (use_zero_form) + { + inverse = 0; + switch (GET_CODE (operands[3])) + { + case LT: + base_comparison = gen_neon_vcltmode; + break; + case LE: + base_comparison = gen_neon_vclemode; + break; + default: + /* Do nothing, other zero form cases already have the correct +base_comparison. */ + break; + } + } if (!inverse) emit_insn (base_comparison (mask, operands[4], operands[5], magic_rtx));
[Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
Hi, the patch replaces next_real_insn with next_active_insn when checking for JUMP TABLE insns. This fixes ESA mode bootstrap on s390 which broke with r197266. Comitted to mainline Bye, -Andreas- 2013-06-18 Andreas Krebbel andreas.kreb...@de.ibm.com PR target/57609 * config/s390/s390.c (s390_chunkify_start): Replace next_real_insn with next_active_insn. --- gcc/config/s390/s390.c |4 1 file changed, 4 modifications(!) Index: gcc/config/s390/s390.c === *** gcc/config/s390/s390.c.orig --- gcc/config/s390/s390.c *** s390_chunkify_start (void) *** 7012,7018 if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { ! rtx vec_insn = next_real_insn (insn); if (! vec_insn || ! JUMP_TABLE_DATA_P (vec_insn)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } --- 7012,7018 if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { ! rtx vec_insn = next_active_insn (insn); if (! vec_insn || ! JUMP_TABLE_DATA_P (vec_insn)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } *** s390_chunkify_start (void) *** 7043,7049 { /* Find the jump table used by this casesi jump. */ rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); ! rtx vec_insn = next_real_insn (vec_label); if (vec_insn JUMP_TABLE_DATA_P (vec_insn)) { rtx vec_pat = PATTERN (vec_insn); --- 7043,7049 { /* Find the jump table used by this casesi jump. */ rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); ! rtx vec_insn = next_active_insn (vec_label); if (vec_insn JUMP_TABLE_DATA_P (vec_insn)) { rtx vec_pat = PATTERN (vec_insn);
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
On Tue, Jun 18, 2013 at 10:57 AM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the patch replaces next_real_insn with next_active_insn when checking for JUMP TABLE insns. This fixes ESA mode bootstrap on s390 which broke with r197266. Comitted to mainline Please revert this and find another solution. JUMP_TABLE_DATA is not an active insn, and I will be removing it soon from active_insn_p. How big does a FIXME have to be for people to notice? Ciao! Steven
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
2013/6/18 Steven Bosscher stevenb@gmail.com: On Tue, Jun 18, 2013 at 10:57 AM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the patch replaces next_real_insn with next_active_insn when checking for JUMP TABLE insns. This fixes ESA mode bootstrap on s390 which broke with r197266. Comitted to mainline Please revert this and find another solution. JUMP_TABLE_DATA is not an active insn, and I will be removing it soon from active_insn_p. How big does a FIXME have to be for people to notice? Ciao! Steven Hi, Steven, Do you mean that your comment 6 in PR56809 is not a good solution since you are going to remove JUMP_TABLE_DAT from active_insn_p ?? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56809 Best regards, jasonwucj
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
On Tue, Jun 18, 2013 at 11:29 AM, Chung-Ju Wu wrote: Do you mean that your comment 6 in PR56809 is not a good solution since you are going to remove JUMP_TABLE_DAT from active_insn_p ?? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56809 No, that one I'll fix, it's fall-out from the JUMP_TABLE_DATA change and thus my responsibility to fix. Ciao! Steven
Re: [PATCH, ARM] Fix unrecognizable vector comparisons
On 06/18/13 09:50, Zhenqiang Chen wrote: Hi, During expand, function vcondmodemode inverses some CMP, e.g. a LE b - b GE a But if b is CONST0_RTX, b GE a will be an illegal insn. (insn 933 932 934 113 (set (reg:V4SI 1027) (unspec:V4SI [ (const_vector:V4SI [ (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) (const_int 0 [0]) ]) (reg:V4SI 1023 [ vect_var_.49 ]) (const_int 1 [0x1]) ] UNSPEC_VCGE)) PUGHSlab/Mapping.c:567 -1 (nil)) Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1189445 for more. And the bug also happens for FSF trunk. The similar issue (https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942) had fixed on AARCH64: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00581.html The patch is similar to the fix for aarch64. Bootstrap and no make check regression on Panda Board. Is it OK for trunk and 4.8? No, not without an appropriate set of testcases that exercise these cases. regards Ramana Thanks! -Zhenqiang ChangeLog: 2013-06-18 Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/neon.md (vcond): Fix floating-point vector comparisons against 0. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index e814df0..9299ae5 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1671,6 +1671,7 @@ ? 3 : 1; rtx magic_rtx = GEN_INT (magic_word); int inverse = 0; + int use_zero_form = 0; int swap_bsl_operands = 0; rtx mask = gen_reg_rtx (V_cmp_resultmode); rtx tmp = gen_reg_rtx (V_cmp_resultmode); @@ -1681,12 +1682,16 @@ switch (GET_CODE (operands[3])) { case GE: +case GT: case LE: +case LT: case EQ: - if (!REG_P (operands[5]) - (operands[5] != CONST0_RTX (MODEmode))) - operands[5] = force_reg (MODEmode, operands[5]); - break; + if (operands[5] == CONST0_RTX (MODEmode)) + { + use_zero_form = 1; + break; + } + /* Fall through. */ default: if (!REG_P (operands[5])) operands[5] = force_reg (MODEmode, operands[5]); @@ -1737,7 +1742,26 @@ a GT b - a GT b a LE b - b GE a a LT b - b GT a -a EQ b - a EQ b */ +a EQ b - a EQ b +Note that there also exist direct comparison against 0 forms, +so catch those as a special case. */ + if (use_zero_form) + { + inverse = 0; + switch (GET_CODE (operands[3])) + { + case LT: + base_comparison = gen_neon_vcltmode; + break; + case LE: + base_comparison = gen_neon_vclemode; + break; + default: + /* Do nothing, other zero form cases already have the correct +base_comparison. */ + break; + } + } if (!inverse) emit_insn (base_comparison (mask, operands[4], operands[5], magic_rtx));
Re: [PATCH] Fix up bmi_bextr_mode (PR target/57623)
On Mon, Jun 17, 2013 at 6:11 PM, Jakub Jelinek ja...@redhat.com wrote: This instruction has the predicates/constraints wrong, the r/m argument is the middle-one, the value from which it should be extracted, rather than the packed start/length argument. This got broken with PR50766, where the patch hasn't touched just BMI2, but for unknown reasons also this BMI instruction which was handled right in GCC 4.6. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7? BTW, the AVX2 docs document _bextr_u{32,64} 3 argument intrinsics, rather than the __bextr_u{32,64} 2 argument intrinsics we have in GCC right now. Something to fix up (as the names are different, perhaps we can both the old ones and the new ones implemented say as return __bextr_u32 (x, (y 255) | (z 8)); or similar)? It looks to me that GCC's double-underscored version is wrong. We are looking for compatibility with official bmiintrin.h header. Kirill, can you please comment on this issue? 2013-06-17 Jakub Jelinek ja...@redhat.com PR target/57623 * config/i386/i386.md (bmi_bextr_mode): Swap predicates and constraints of operand 1 and 2. * gcc.target/i386/bmi-bextr-3.c: New test. The fix itself looks good to me as far as insn is concerned, but please wait until the issue with builtins is resolved. Thanks, Uros.
Re: [PATCH] Fix up bmi_bextr_mode (PR target/57623)
On Tue, Jun 18, 2013 at 11:47:29AM +0200, Uros Bizjak wrote: On Mon, Jun 17, 2013 at 6:11 PM, Jakub Jelinek ja...@redhat.com wrote: This instruction has the predicates/constraints wrong, the r/m argument is the middle-one, the value from which it should be extracted, rather than the packed start/length argument. This got broken with PR50766, where the patch hasn't touched just BMI2, but for unknown reasons also this BMI instruction which was handled right in GCC 4.6. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7? BTW, the AVX2 docs document _bextr_u{32,64} 3 argument intrinsics, rather than the __bextr_u{32,64} 2 argument intrinsics we have in GCC right now. Something to fix up (as the names are different, perhaps we can both the old ones and the new ones implemented say as return __bextr_u32 (x, (y 255) | (z 8)); or similar)? It looks to me that GCC's double-underscored version is wrong. We are looking for compatibility with official bmiintrin.h header. Kirill, can you please comment on this issue? Well, bmiintrin.h has been added first as an AMD ISA extension, and I think for AMD extensions the GCC headers are the official place (the AMD docs don't mention the intrinsics at all), and only later on also added as Intel ISA extension. So I'd say it is fine to keep the __bextr_* variants (as written in the PR, sometimes the two argument ones can be useful if you e.g. precompute the start/length pairs ahead of time) and just add the one underscored ones to match ICC/AVX2 documentation. Jakub
[PATCH] Fix PR57334
This fixes PR57334, properly merging the chain of symtab nodes sharing the same assembler name. LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-06-18 Richard Biener rguent...@suse.de PR lto/57334 * lto-symtab.c (lto_symtab_merge_decls): Process nodes properly. Index: gcc/lto-symtab.c === --- gcc/lto-symtab.c(revision 200163) +++ gcc/lto-symtab.c(working copy) @@ -522,19 +522,9 @@ lto_symtab_merge_decls (void) symtab_initialize_asm_name_hash (); FOR_EACH_SYMBOL (node) -if (lto_symtab_symbol_p (node) +if (!node-symbol.previous_sharing_asm_name node-symbol.next_sharing_asm_name) - { -symtab_node n; - - /* To avoid duplicated work, see if this is first real symbol in the - chain. */ - for (n = node-symbol.previous_sharing_asm_name; -n !lto_symtab_symbol_p (n); n = n-symbol.previous_sharing_asm_name) - ; - if (!n) - lto_symtab_merge_decls_1 (node); - } + lto_symtab_merge_decls_1 (node); } /* Helper to process the decl chain for the symbol table entry *SLOT. */
Re: [PATCH, ARM] Fix gcc.dg/pr48335-5.c ICE with NEON enabled
Thanks, Julian ChangeLog gcc/ * config/arm/arm.c (neon_vector_mem_operand): Add strict argument. Permit virtual register pre-reload if !strict. (coproc_secondary_reload_class): Adjust for neon_vector_mem_operand change. * config/arm/arm-protos.h (neon_vector_mem_operand): Adjust prototype. * config/arm/neon.md (movmisalignmode): Use neon_perm_struct_or_reg_operand instead of neon_struct_or_register_operand. (*movmisalignmode_neon_load, *movmisalignmode_neon_store): Use neon_permissive_struct_operand instead of neon_struct_operand. * config/arm/constraints.md (Un, Um, Us): Adjust calls to neon_vector_mem_operand. * config/arm/predicates.md (neon_struct_operand): Adjust call to neon_vector_mem_operand. (neon_permissive_struct_operand): New. (neon_struct_or_register_operand): Rename to... (neon_perm_struct_or_reg_operand): This. Adjust call to neon_vector_mem_operand. Ok but this also needs to go to FSF 4.8 if no RM objects and after a few days of soaking on trunk. regards Ramana
Re: [PATCH] Re-write LTO type merging again, do tree merging
On Mon, 17 Jun 2013, Andi Kleen wrote: Current trunk cannot build LTO kernels now, with your patch, as reported earlier. Please fix ASAP. I'm surprised that you checked in a patchkit with such serious reported problems. -Andi backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c: In function 'unlzo': /backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c:79:8: internal compiler error: in expand_expr_real_1, at expr.c:9361 parse += 7; ^ 0x5ea175 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**) ../../gcc/gcc/expr.c:9356 That suggests Index: gcc/expr.c === --- gcc/expr.c (revision 200164) +++ gcc/expr.c (working copy) @@ -9353,7 +9353,7 @@ expand_expr_real_1 (tree exp, rtx target /* Variables inherited from containing functions should have been lowered by this point. */ context = decl_function_context (exp); - gcc_assert (!context + gcc_assert (SCOPE_FILE_SCOPE_P (context) || context == current_function_decl || TREE_STATIC (exp) || DECL_EXTERNAL (exp) might fix it. Richard.
Re: [PATCH] PowerPC: Fix test case for PR55033
Hello Chung-Ju, On 06/18/2013 05:12 AM, Chung-Ju Wu wrote: 2013/6/18 David Edelsohn dje@gmail.com: gcc/testsuite/ChangeLog 2013-06-17 Sebastian Huber sebastian.hu...@embedded-brains.de PR target/55033 * gcc.target/powerpc/pr55033.c: Fix options. Okay. Thanks, David P.S. Please explicitly copy me on patches. Hi, Sebastian, Since David has pproved your patch, do you need me to help commit this fix again? I'd happy to do this for you. :) yes, please commit it for me. Thanks. -- Sebastian Huber, embedded brains GmbH Address : Dornierstr. 4, D-82178 Puchheim, Germany Phone : +49 89 189 47 41-16 Fax : +49 89 189 47 41-09 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
PING [C++ docs patch] PR 56544
Hi, On 06/08/2013 10:57 AM, Paolo Carlini wrote: Hi, the bug reminds us to update the documentation about the value of __cplusplus. I tentatively prepared the below, is it clear enough? We could probably apply something to the branch too, without the -std=c++1y bits, thus end simply like '; or @code{201103L}, per the 2011 C++ standard' or more verbosely say that with -std=c++1y too the value is 201103L. Is this patch straightforward enough to go in? Opinions about the branch? Thanks... Paolo.
Re: [Patch wwwdocs] gcc-4.9 changes: mention support of the Intel Silvermont microarchitecture
Ping? On Mon, Jun 10, 2013 at 2:13 PM, Igor Zamyatin izamya...@gmail.com wrote: Hi! This patch mentions support of Silvermont architecture in the gcc-4.9/changes.html page. OK to install? Thanks, Igor Index: htdocs/gcc-4.9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v retrieving revision 1.17 diff -c -1 -r1.17 changes.html *** htdocs/gcc-4.9/changes.html 6 Jun 2013 11:15:24 - 1.17 --- htdocs/gcc-4.9/changes.html 10 Jun 2013 10:11:24 - *** *** 177,178 --- 177,185 + h3IA-32/x86-64/h3 + ul + li GCC now supports new Intel microarchitecture named Silvermont + through code-march=slm/code. + /li + /ul + h3 id=rxRX/h3
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo oleg.e...@t-online.de wrote: On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote: My mistake. It's because arm_legitimize_address cannot re-factor [r105 + r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4]. Do you suggest that this kind of transformation should be done in backend? I can think of some disadvantages by doing it in backend: Different kinds of address expression might be wanted in different phase of compilation, for example, sometimes GCC wants to force computation of address expression out of memory access because it cannot CSE array indexing. It's difficult to differentiate in backend. It's equally difficult in memory_address_addr_space and it affects all the architectures at once... You said in the original message that Thumb2 and ARM modes are already not equally sensitive to it, so it's not unthinkable that some architectures might not want it at all. Therefore, yes, this should preferably be handled in arm_legitimize_address. My observation is, that legitimizing addressing modes in the backend by looking at one isolated address works, but doesn't give good results. In the SH backend there is this a particular case with displacement addressing (register + constant). On SH displacements for byte addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed heuristic to satisfy the displacement constraint and splits out a plus insn if needed to adjust the base address. Of course that fixed heuristic doesn't work for some cases and thus sometimes results in unnecessary base address adjustments. I had the idea of replacing the current (partially defunct) auto-inc-dec RTL pass with a more generic addressing mode selection pass: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 Any suggestions/comments/... are highly appreciated. In PR56590, is PR50749 the only one that correlate with auto-inc-dec? Others seem just problems of wrong addressing mode. And one point on PR50749, auto-inc-dec depends on ivopt to choose auto-increment candidate. Since you disabled ivopt, I bet GCC will miss lots of auto-increment opportunities. Thanks. bin -- Best Regards.
Re: [PATCH] Fix up bmi_bextr_mode (PR target/57623)
On Tue, Jun 18, 2013 at 11:51 AM, Jakub Jelinek ja...@redhat.com wrote: This instruction has the predicates/constraints wrong, the r/m argument is the middle-one, the value from which it should be extracted, rather than the packed start/length argument. This got broken with PR50766, where the patch hasn't touched just BMI2, but for unknown reasons also this BMI instruction which was handled right in GCC 4.6. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7? BTW, the AVX2 docs document _bextr_u{32,64} 3 argument intrinsics, rather than the __bextr_u{32,64} 2 argument intrinsics we have in GCC right now. Something to fix up (as the names are different, perhaps we can both the old ones and the new ones implemented say as return __bextr_u32 (x, (y 255) | (z 8)); or similar)? It looks to me that GCC's double-underscored version is wrong. We are looking for compatibility with official bmiintrin.h header. Kirill, can you please comment on this issue? Well, bmiintrin.h has been added first as an AMD ISA extension, and I think for AMD extensions the GCC headers are the official place (the AMD docs don't mention the intrinsics at all), and only later on also added as Intel ISA extension. So I'd say it is fine to keep the __bextr_* variants (as written in the PR, sometimes the two argument ones can be useful if you e.g. precompute the start/length pairs ahead of time) and just add the one underscored ones to match ICC/AVX2 documentation. I agree with this proposal, but probably we need to review the whole bmiintrin.h for intel additions. Uros.
Re: [PATCH] Re-write LTO type merging again, do tree merging
On Mon, 17 Jun 2013, Andi Kleen wrote: Richard Biener rguent...@suse.de writes: Andi Kleen a...@firstfloor.org wrote: Current trunk cannot build LTO kernels now, with your patch, as reported earlier. Please fix ASAP. I'm surprised that you checked in a patchkit with such serious reported problems. Instructions for reproducing this issue appreciated. I've never built lto kernels. Install recent HJ Lu's Linux binutils somewhere (https://www.kernel.org/pub/linux/devel/binutils/) Build a gcc that uses the linker from above as plugin ld Check out git://github.com/andikleen/linux-misc -b lto-3.9 Put attached config as .config into build dir. make oldconfig make CC=gcc LD=ld-from-linux-binutils AR=gcc-ar -j .. Ok, it doesn't use LTO for me, not even with adding CFLAGS=-O2 -flto here. Richard. -Andi -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [patch] gcov intermediate format
Ping. Th patch is OK, thanks! I see you added gcov.exp file support, do you have a testcases? Honza
[PATCH] Fix c90-fordecl-1.c test
When I fixed PR57630, I failed to adjust the expected message in gcc.dg/c90-fordecl-1.c test (sorry for that); so we regressed. Fixed thusly, will commit as obvious. Tested with make check-gcc RUNTESTFLAGS=dg.exp=c90-fordecl-1.c 2013-06-18 Marek Polacek pola...@redhat.com * gcc.dg/c90-fordecl-1.c: Adjust expected message. --- gcc/testsuite/gcc.dg/c90-fordecl-1.c.mp32013-06-18 12:10:11.233225073 +0200 +++ gcc/testsuite/gcc.dg/c90-fordecl-1.c2013-06-18 12:11:27.332603234 +0200 @@ -9,6 +9,6 @@ foo (void) int j = 0; for (int i = 1; i = 10; i++) /* { dg-bogus warning warning in place of error } */ j += i; - /* { dg-error 'for' loop initial declarations are only allowed in C99 mode declaration in for loop { target *-*-* } 10 } */ - /* { dg-message note: use option -std=c99 or -std=gnu99 to compile your code note { target *-*-* } 10 }} */ + /* { dg-error 'for' loop initial declarations are only allowed in C99 or C11 mode declaration in for loop { target *-*-* } 10 } */ + /* { dg-message note: use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 to compile your code note { target *-*-* } 10 }} */ } Marek
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
On 18/06/13 11:35, Steven Bosscher wrote: On Tue, Jun 18, 2013 at 11:29 AM, Chung-Ju Wu wrote: Do you mean that your comment 6 in PR56809 is not a good solution since you are going to remove JUMP_TABLE_DAT from active_insn_p ?? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56809 No, that one I'll fix, it's fall-out from the JUMP_TABLE_DATA change and thus my responsibility to fix. Btw. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57609 is also a fall-out from the JUMP_TABLE_DATA change. -Andreas-
[PATCH, libfortran]: Initialize result variable (+ other changes)
Hello! Attached patch initializes return variable in get_fpu_except_flags. Additionally, it uses __asm__ and __volatile__ consistently, as recommended for header files and unifies a bunch of formatting issues throughout the file. 2012-06-18 Uros Bizjak ubiz...@gmail.com * config/fpu-387.h: Use __asm__ and __volatile__ consistently. (get_fpu_except_flags): Initialize result. Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline? Uros. Index: config/fpu-387.h === --- config/fpu-387.h(revision 200163) +++ config/fpu-387.h(working copy) @@ -73,7 +73,7 @@ has_sse (void) /* We need a single SSE instruction here so the handler can safely skip over it. */ - __asm__ volatile (movaps %xmm0,%xmm0); + __asm__ __volatile__ (movaps\t%xmm0,%xmm0); sigaction (SIGILL, oact, NULL); @@ -100,7 +100,7 @@ void set_fpu (void) { unsigned short cw; - asm volatile (fnstcw %0 : =m (cw)); + __asm__ __volatile__ (fnstcw\t%0 : =m (cw)); cw |= (_FPU_MASK_IM | _FPU_MASK_DM | _FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_PM); @@ -112,13 +112,13 @@ void set_fpu (void) if (options.fpe GFC_FPE_UNDERFLOW) cw = ~_FPU_MASK_UM; if (options.fpe GFC_FPE_INEXACT) cw = ~_FPU_MASK_PM; - asm volatile (fldcw %0 : : m (cw)); + __asm__ __volatile__ (fldcw\t%0 : : m (cw)); if (has_sse()) { unsigned int cw_sse; - asm volatile (%vstmxcsr %0 : =m (cw_sse)); + __asm__ __volatile__ (%vstmxcsr\t%0 : =m (cw_sse)); cw_sse = 0x; cw_sse |= (_FPU_MASK_IM | _FPU_MASK_DM | _FPU_MASK_ZM | _FPU_MASK_OM @@ -131,7 +131,7 @@ void set_fpu (void) if (options.fpe GFC_FPE_UNDERFLOW) cw_sse = ~(_FPU_MASK_UM 7); if (options.fpe GFC_FPE_INEXACT) cw_sse = ~(_FPU_MASK_PM 7); - asm volatile (%vldmxcsr %0 : : m (cw_sse)); + __asm__ __volatile__ (%vldmxcsr\t%0 : : m (cw_sse)); } } @@ -139,7 +139,7 @@ void set_fpu (void) int get_fpu_except_flags (void) { - int result; + int result = 0; unsigned short cw; __asm__ __volatile__ (fnstsw\t%0 : =a (cw)); @@ -147,27 +147,18 @@ get_fpu_except_flags (void) if (has_sse()) { unsigned int cw_sse; + __asm__ __volatile__ (%vstmxcsr\t%0 : =m (cw_sse)); + cw |= cw_sse; } - if (cw _FPU_MASK_IM) -result |= GFC_FPE_INVALID; + if (cw _FPU_MASK_IM) result |= GFC_FPE_INVALID; + if (cw _FPU_MASK_DM) result |= GFC_FPE_DENORMAL; + if (cw _FPU_MASK_ZM) result |= GFC_FPE_ZERO; + if (cw _FPU_MASK_OM) result |= GFC_FPE_OVERFLOW; + if (cw _FPU_MASK_UM) result |= GFC_FPE_UNDERFLOW; + if (cw _FPU_MASK_PM) result |= GFC_FPE_INEXACT; - if (cw _FPU_MASK_ZM) -result |= GFC_FPE_ZERO; - - if (cw _FPU_MASK_OM) -result |= GFC_FPE_OVERFLOW; - - if (cw _FPU_MASK_UM) -result |= GFC_FPE_UNDERFLOW; - - if (cw _FPU_MASK_DM) -result |= GFC_FPE_DENORMAL; - - if (cw _FPU_MASK_PM) -result |= GFC_FPE_INEXACT; - return result; }
[PATCH] Remove LTO streamer cache pointer-map for LTO input
It is not necessary to maintain the pointer-map from cache entry to cache index when reading trees. A quick estimate using the latest WPA stats from firefox estimates its size to at least 1.5GB - apart from the cost to maintain it. So the following patch makes it possible to not maintain that map. LTO bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-06-18 Richard Biener rguent...@suse.de * tree-streamer.h (streamer_tree_cache_create): Adjust prototype. * tree-streamer.c (streamer_tree_cache_create): Make maintaining the map from cache entry to cache index optional. (streamer_tree_cache_replace_tree): Adjust accordingly. (streamer_tree_cache_append): Likewise. (streamer_tree_cache_delete): Likewise. * lto-streamer-in.c (lto_data_in_create): Do not maintain the streamer cache map from cache entry to cache index. * lto-streamer-out.c (create_output_block): Adjust. lto/ * lto.c (lto_register_var_decl_in_symtab): Pass in cache index and use it. (lto_register_function_decl_in_symtab): Likewise. (cmp_tree): New function. (unify_scc): Instead of using the streamer cache map from entry to cache index match up the two maps we have by sorting them. Adjust calls to lto_register_var_decl_in_symtab and lto_register_function_decl_in_symtab. Index: gcc/tree-streamer.c === *** gcc/tree-streamer.c.orig2013-06-18 13:00:34.0 +0200 --- gcc/tree-streamer.c 2013-06-18 13:00:46.042791385 +0200 *** streamer_tree_cache_replace_tree (struct *** 197,203 hashval_t hash = 0; if (cache-hashes.exists ()) hash = streamer_tree_cache_get_hash (cache, ix); ! streamer_tree_cache_insert_1 (cache, t, hash, ix, false); } --- 197,206 hashval_t hash = 0; if (cache-hashes.exists ()) hash = streamer_tree_cache_get_hash (cache, ix); ! if (!cache-node_map) ! streamer_tree_cache_add_to_node_array (cache, ix, t, hash); ! else ! streamer_tree_cache_insert_1 (cache, t, hash, ix, false); } *** streamer_tree_cache_append (struct strea *** 208,214 tree t, hashval_t hash) { unsigned ix = cache-nodes.length (); ! streamer_tree_cache_insert_1 (cache, t, hash, ix, false); } /* Return true if tree node T exists in CACHE, otherwise false. If IX_P is --- 211,220 tree t, hashval_t hash) { unsigned ix = cache-nodes.length (); ! if (!cache-node_map) ! streamer_tree_cache_add_to_node_array (cache, ix, t, hash); ! else ! streamer_tree_cache_insert_1 (cache, t, hash, ix, false); } /* Return true if tree node T exists in CACHE, otherwise false. If IX_P is *** preload_common_nodes (struct streamer_tr *** 319,331 /* Create a cache of pickled nodes. */ struct streamer_tree_cache_d * ! streamer_tree_cache_create (bool with_hashes) { struct streamer_tree_cache_d *cache; cache = XCNEW (struct streamer_tree_cache_d); ! cache-node_map = pointer_map_create (); cache-nodes.create (165); if (with_hashes) cache-hashes.create (165); --- 325,338 /* Create a cache of pickled nodes. */ struct streamer_tree_cache_d * ! streamer_tree_cache_create (bool with_hashes, bool with_map) { struct streamer_tree_cache_d *cache; cache = XCNEW (struct streamer_tree_cache_d); ! if (with_map) ! cache-node_map = pointer_map_create (); cache-nodes.create (165); if (with_hashes) cache-hashes.create (165); *** streamer_tree_cache_delete (struct strea *** 347,353 if (c == NULL) return; ! pointer_map_destroy (c-node_map); c-nodes.release (); c-hashes.release (); free (c); --- 354,361 if (c == NULL) return; ! if (c-node_map) ! pointer_map_destroy (c-node_map); c-nodes.release (); c-hashes.release (); free (c); Index: gcc/tree-streamer.h === *** gcc/tree-streamer.h.orig2013-06-18 13:00:34.0 +0200 --- gcc/tree-streamer.h 2013-06-18 13:00:46.061791614 +0200 *** void streamer_tree_cache_append (struct *** 98,104 hashval_t); bool streamer_tree_cache_lookup (struct streamer_tree_cache_d *, tree, unsigned *); ! struct streamer_tree_cache_d *streamer_tree_cache_create (bool); void streamer_tree_cache_delete (struct streamer_tree_cache_d *); /* Return the tree node at slot IX in CACHE. */ --- 98,104 hashval_t); bool streamer_tree_cache_lookup (struct streamer_tree_cache_d *, tree, unsigned *); ! struct streamer_tree_cache_d
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
On Tue, Jun 18, 2013 at 10:59:56AM +0200, Steven Bosscher wrote: On Tue, Jun 18, 2013 at 10:57 AM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the patch replaces next_real_insn with next_active_insn when checking for JUMP TABLE insns. This fixes ESA mode bootstrap on s390 which broke with r197266. Comitted to mainline Please revert this and find another solution. JUMP_TABLE_DATA is not an active insn, and I will be removing it soon from active_insn_p. I don't see which of the other iterators would fit here. So I'll need my own. How do you intend to fix this for the other targets? Will there be a new iterator available? If you don't want me to use next_active_insn I probably have to do something like this instead: --- gcc/config/s390/s390.c | 30 ++ 1 file changed, 22 insertions(+), 8 modifications(!) Index: gcc/config/s390/s390.c === *** gcc/config/s390/s390.c.orig --- gcc/config/s390/s390.c *** s390_mainpool_cancel (struct constant_po *** 6792,6797 --- 6792,6819 s390_free_pool (pool); } + /* Return true if LABEL is directly followed by a jump table insn. If +JUMP_TABLE_INSN is non-null the jump table insn is returned +there. */ + static bool + s390_is_jumptable_label_p (rtx label, rtx *jump_table_insn) + { + while (label) + { + label = NEXT_INSN (label); + + if (label == NULL_RTX || NONDEBUG_INSN_P (label)) + return false; + + if (JUMP_TABLE_DATA_P (label)) + { + if (jump_table_insn != NULL) + *jump_table_insn = label; + return true; + } + } + return false; + } /* Chunkify the literal pool. */ *** s390_chunkify_start (void) *** 7012,7019 if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { ! rtx vec_insn = next_active_insn (insn); ! if (! vec_insn || ! JUMP_TABLE_DATA_P (vec_insn)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } --- 7034,7040 if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { ! if (!s390_is_jumptable_label_p (insn, NULL)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } *** s390_chunkify_start (void) *** 7043,7050 { /* Find the jump table used by this casesi jump. */ rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); ! rtx vec_insn = next_active_insn (vec_label); ! if (vec_insn JUMP_TABLE_DATA_P (vec_insn)) { rtx vec_pat = PATTERN (vec_insn); int i, diff_p = GET_CODE (vec_pat) == ADDR_DIFF_VEC; --- 7064,7072 { /* Find the jump table used by this casesi jump. */ rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); ! rtx vec_insn; ! ! if (s390_is_jumptable_label_p (vec_label, vec_insn)) { rtx vec_pat = PATTERN (vec_insn); int i, diff_p = GET_CODE (vec_pat) == ADDR_DIFF_VEC;
[ARM][Insn classification refactoring 1/N] Move mult/div attribute values from insn to type
Hi, This is the first of a series of patches to implement a single, unified and fine grained instruction classification attribute. The first few patches will propose a refactoring of the instruction classifications we currently have in place. This first patch moves the multiplication and division attribute values from insn to type. OK for trunk? - Thanks Sofiane arm-move-mult-div-insn.diff Description: Binary data
Re: [ARM][Insn classification refactoring 1/N] Move mult/div attribute values from insn to type
On 18/06/13 13:33, Sofiane Naci wrote: Hi, This is the first of a series of patches to implement a single, unified and fine grained instruction classification attribute. The first few patches will propose a refactoring of the instruction classifications we currently have in place. This first patch moves the multiplication and division attribute values from insn to type. OK for trunk? - Thanks Sofiane= OK. R.
Re: RFA: Fix rtl-optimization/57425
On Sun, 16 Jun 2013, Joern Rennecke wrote: Quoting Eric Botcazou ebotca...@adacore.com: Could you also check that your patch also fixes PR opt/57569 and, if so, add the reference to the ChangeLog as well as the testcase? Attached is what I'm currently testing. bootstrap on i686-pc-linux-gnu finished, now regtesting. On x86_64-pc-linux-gnu, bootstrap is still in progress I also still have to verify if the pr57569.c testcase FAILs/PASSes for unpatched/patched svn trunk. Careful. The patch uses the wrong order of arguments in the call to the new function: /* Returns nonzero if a write to X might alias a previous read from - (or, if WRITEP is nonzero, a write to) MEM. */ + (or, if WRITEP is true, a write to) MEM. + If MEM_CANONCALIZED is nonzero, CANON_MEM_ADDR is the canonicalized + address of MEM, and MEM_MODE the mode for that access. */ static int -write_dependence_p (const_rtx mem, const_rtx x, int writep) +write_dependence_p (const_rtx mem, enum machine_mode mem_mode, + rtx canon_mem_addr, const_rtx x, + bool mem_canonicalized, bool writep) So, first argument is the thing in question (the one that might be clobbered), the second (or in the new interface the fourth) is the write that might clobber it ... /* Likewise, but we already have a canonicalized MEM_ADDR for MEM. + Also, consider MEM in MEM_MODE (which might be from an enclosing + STRICT_LOW_PART / ZERO_EXTRACT). */ + +int +canon_anti_dependence (const_rtx mem, enum machine_mode mem_mode, + rtx mem_addr, const_rtx x) +{ + return write_dependence_p (mem, mem_mode, mem_addr, x, ... same here, first the read, then the potentially destroying write ... if (*x MEM_P (*x)) -return canon_true_dependence (d-exp, d-mode, d-addr, *x, NULL_RTX); +return canon_anti_dependence (d-exp, d-mode, d-addr, *x); else ... but here you use the same order as for true_dependence, to cite from the function comment: /* True dependence: X is read after store in MEM takes place. */ int true_dependence (const_rtx mem, enum machine_mode mem_mode, const_rtx x) So, first the potentially clobbering write, then the read. And indeed in check_dependence d-exp is the write and x the read that is potentially clobbered. I.e. the arguments after your patch are exactly swapped. This is usually harmless, but not always, so that should be corrected before check in. The change in cselib.c:cselib_invalidate_mem has the same problem. Generally I would prefer simple interfaces to the query functions, dependence problems are hard enough to think about without functions needing four arguments. Does it really save much to not canonicalize the mem address for some calls? Ciao, Michael.
Re: Apply powerpc64le patches to gcc-4.8 branch?
On Tue, Jun 18, 2013 at 12:12 AM, Alan Modra amo...@gmail.com wrote: I'd like to apply the following set of patches supporting powerpc64le to the 4.8 branch. David has stated that he's happy with the idea, even though technically these are not regressions. OK? http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01374.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00211.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00214.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00244.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00297.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00435.html http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00476.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00165.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00166.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00388.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00554.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00684.html http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00766.html All of the rs6000 parts are okay with me. Thanks, David
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote: On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo oleg.e...@t-online.de wrote: My observation is, that legitimizing addressing modes in the backend by looking at one isolated address works, but doesn't give good results. In the SH backend there is this a particular case with displacement addressing (register + constant). On SH displacements for byte addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed heuristic to satisfy the displacement constraint and splits out a plus insn if needed to adjust the base address. Of course that fixed heuristic doesn't work for some cases and thus sometimes results in unnecessary base address adjustments. I had the idea of replacing the current (partially defunct) auto-inc-dec RTL pass with a more generic addressing mode selection pass: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 Any suggestions/comments/... are highly appreciated. In PR56590, is PR50749 the only one that correlate with auto-inc-dec? Others seem just problems of wrong addressing mode. Yes, PR 50749 was the initial description of auto-inc-dec defects. PR 52049 is also related to it, as the code examples there are candidates for post-inc addressing mode. In that case, if 'int' is replaced with 'float' on SH post-inc is the optimal mode, because it doesn't have displacement addressing for FPU loads, except than SH2A. Even then, using post-inc is better as it has a more compact instruction encoding. The current auto-inc-dec is not able to discover such opportunities if, for example, mem accesses are reordered by preceding optimization passes. And one point on PR50749, auto-inc-dec depends on ivopt to choose auto-increment candidate. Since you disabled ivopt, I bet GCC will miss lots of auto-increment opportunities. No, I haven't disabled ivopt. Cheers, Oleg
[PATCH] Fix mv?.C ICEs
This fixes the mv?.C ICEs in the testsuite (and transforms a subset of them to execute fails for me). Running target unix/ FAIL: g++.dg/ext/mv12.C -std=gnu++98 execution test FAIL: g++.dg/ext/mv12.C -std=gnu++11 execution test FAIL: g++.dg/ext/mv2.C -std=gnu++98 execution test FAIL: g++.dg/ext/mv2.C -std=gnu++11 execution test FAIL: g++.dg/ext/mv5.C -std=gnu++98 execution test FAIL: g++.dg/ext/mv5.C -std=gnu++11 execution test Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-06-18 Richard Biener rguent...@suse.de * Makefile.in (cgraphunit.o): Add $(CFGLOOP_H) dependency. * cgraphunit.c: Include cfgloop.h. (init_lowered_empty_function): Initialize the loop tree. (assemble_thunk): Insert new BBs into loops. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 200164) +++ gcc/Makefile.in (working copy) @@ -2903,7 +2903,7 @@ cgraphunit.o : cgraphunit.c $(CONFIG_H) $(TREE_FLOW_H) $(TREE_PASS_H) debug.h $(DIAGNOSTIC_H) \ $(FIBHEAP_H) output.h $(PARAMS_H) $(RTL_H) $(IPA_PROP_H) \ gt-cgraphunit.h tree-iterator.h $(COVERAGE_H) $(TREE_DUMP_H) \ - $(GIMPLE_PRETTY_PRINT_H) $(IPA_INLINE_H) $(IPA_UTILS_H) \ + $(GIMPLE_PRETTY_PRINT_H) $(IPA_INLINE_H) $(IPA_UTILS_H) $(CFGLOOP_H) \ $(LTO_STREAMER_H) output.h $(REGSET_H) $(EXCEPT_H) $(GCC_PLUGIN_H) plugin.h cgraphclones.o : cgraphclones.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(TREE_H) langhooks.h $(TREE_INLINE_H) toplev.h $(DIAGNOSTIC_CORE_H) $(FLAGS_H) $(GGC_H) \ Index: gcc/cgraphunit.c === --- gcc/cgraphunit.c(revision 200164) +++ gcc/cgraphunit.c(working copy) @@ -192,6 +192,7 @@ along with GCC; see the file COPYING3. #include ipa-utils.h #include lto-streamer.h #include except.h +#include cfgloop.h #include regset.h /* FIXME: For reg_obstack. */ /* Queue of cgraph nodes scheduled to be added into cgraph. This is a @@ -1196,18 +1197,24 @@ init_lowered_empty_function (tree decl, init_tree_ssa (cfun); init_ssa_operands (cfun); cfun-gimple_df-in_ssa_p = true; + cfun-curr_properties |= PROP_ssa; } DECL_INITIAL (decl) = make_node (BLOCK); DECL_SAVED_TREE (decl) = error_mark_node; - cfun-curr_properties |= -(PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa | PROP_gimple_any); + cfun-curr_properties |= (PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_any + | PROP_cfg | PROP_loops); + + set_loops_for_fn (cfun, ggc_alloc_cleared_loops ()); + init_loops_structure (cfun, loops_for_fn (cfun), 1); + loops_for_fn (cfun)-state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES; /* Create BB for body of the function and connect it properly. */ bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR); - make_edge (ENTRY_BLOCK_PTR, bb, 0); + make_edge (ENTRY_BLOCK_PTR, bb, EDGE_FALLTHRU); make_edge (bb, EXIT_BLOCK_PTR, 0); + add_bb_to_loop (bb, ENTRY_BLOCK_PTR-loop_father); return bb; } @@ -1452,6 +1459,9 @@ assemble_thunk (struct cgraph_node *node then_bb = create_basic_block (NULL, (void *) 0, bb); return_bb = create_basic_block (NULL, (void *) 0, then_bb); else_bb = create_basic_block (NULL, (void *) 0, else_bb); + add_bb_to_loop (then_bb, bb-loop_father); + add_bb_to_loop (return_bb, bb-loop_father); + add_bb_to_loop (else_bb, bb-loop_father); remove_edge (single_succ_edge (bb)); true_label = gimple_block_label (then_bb); stmt = gimple_build_cond (NE_EXPR, restmp,
Re: [PATCH] Re-write LTO type merging again, do tree merging
make oldconfig make CC=gcc LD=ld-from-linux-binutils AR=gcc-ar -j .. Ok, it doesn't use LTO for me, not even with adding CFLAGS=-O2 -flto here. Can you send me a build log with V=1 ? There are some checks for the environment at the beginning, maybe they fail. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [C++ Patch / RFC] PR 53211
On 06/17/2013 08:21 PM, Paolo Carlini wrote: I see... There is a little difficulty in that 56794 involves a non-type variadic parameter and in that case type_dependent_expression_p returns false. If I use value_dependent_expression_p things work, but I'm not sure it's 100% correct. I don't think we need to consider whether the initializer is dependent here; if the array has no length and has an initializer, it must be that we couldn't determine its length in cp_complete_array_type because it is dependent. Eventually, we'll have to decide where we want to commit this: 56794 is fixed in 4.7 and 4.8 too, but it's true that the issue with the specific testcase attached by Jon isn't a regression. I think apply it to 4.8 as well. Jason
[PATCH] lto_tree_ref_encoder TLC
This makes us use a pointer-map for the hashtable in lto_tree_ref_encoder which avoids gazillion of malloc/free calls. LTO bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-06-18 Richard Biener rguent...@suse.de * Makefile.in (LTO_STREAMER_H): Add pointer-set.h dependency. * lto-streamer.h: Include pointer-set.h. (struct lto_decl_slot): Remove. (struct lto_tree_ref_encoder): Make tree_hash_table a pointer-map. Remove next_index entry. (lto_hash_decl_slot_node, lto_eq_decl_slot_node, lto_hash_type_slot_node, lto_eq_type_slot_node): Remove. (lto_init_tree_ref_encoder): Adjust. (lto_destroy_tree_ref_encoder): Likewise. * lto-section-out.c (lto_hash_decl_slot_node, lto_eq_decl_slot_node, lto_hash_type_slot_node, lto_eq_type_slot_node): Remove. (lto_output_decl_index): Adjust. (lto_new_out_decl_state): Likewise. (lto_record_function_out_decl_state): Likewise. * lto-streamer-out.c (copy_function): Likewise. Index: gcc/lto-streamer.h === *** gcc/lto-streamer.h (revision 200165) --- gcc/lto-streamer.h (working copy) *** along with GCC; see the file COPYING3. *** 33,38 --- 33,39 #include alloc-pool.h #include gcov-io.h #include diagnostic.h + #include pointer-set.h /* Define when debugging the LTO streamer. This causes the writer to output the numeric value for the memory address of the tree node *** struct GTY(()) lto_tree_ref_table *** 474,494 }; - /* Mapping between trees and slots in an array. */ - struct lto_decl_slot - { - tree t; - int slot_num; - }; - - /* The lto_tree_ref_encoder struct is used to encode trees into indices. */ struct lto_tree_ref_encoder { ! htab_t tree_hash_table; /* Maps pointers to indices. */ ! unsigned int next_index;/* Next available index. */ ! vectree trees;/* Maps indices to pointers. */ }; --- 475,486 }; /* The lto_tree_ref_encoder struct is used to encode trees into indices. */ struct lto_tree_ref_encoder { ! pointer_map_t *tree_hash_table; /* Maps pointers to indices. */ ! vectree trees;/* Maps indices to pointers. */ }; *** extern void lto_value_range_error (const *** 788,797 HOST_WIDE_INT) ATTRIBUTE_NORETURN; /* In lto-section-out.c */ - extern hashval_t lto_hash_decl_slot_node (const void *); - extern int lto_eq_decl_slot_node (const void *, const void *); - extern hashval_t lto_hash_type_slot_node (const void *); - extern int lto_eq_type_slot_node (const void *, const void *); extern void lto_begin_section (const char *, bool); extern void lto_end_section (void); extern void lto_write_stream (struct lto_output_stream *); --- 780,785 *** lto_tag_check_range (enum LTO_tags actua *** 1007,1017 /* Initialize an lto_out_decl_buffer ENCODER. */ static inline void ! lto_init_tree_ref_encoder (struct lto_tree_ref_encoder *encoder, ! htab_hash hash_fn, htab_eq eq_fn) { ! encoder-tree_hash_table = htab_create (37, hash_fn, eq_fn, free); ! encoder-next_index = 0; encoder-trees.create (0); } --- 995,1003 /* Initialize an lto_out_decl_buffer ENCODER. */ static inline void ! lto_init_tree_ref_encoder (struct lto_tree_ref_encoder *encoder) { ! encoder-tree_hash_table = pointer_map_create (); encoder-trees.create (0); } *** lto_destroy_tree_ref_encoder (struct lto *** 1023,1029 { /* Hash table may be delete already. */ if (encoder-tree_hash_table) ! htab_delete (encoder-tree_hash_table); encoder-trees.release (); } --- 1009,1015 { /* Hash table may be delete already. */ if (encoder-tree_hash_table) ! pointer_map_destroy (encoder-tree_hash_table); encoder-trees.release (); } Index: gcc/lto-section-out.c === *** gcc/lto-section-out.c (revision 200165) --- gcc/lto-section-out.c (working copy) *** static veclto_out_decl_state_ptr decl_ *** 48,107 generate the decl directory later. */ veclto_out_decl_state_ptr lto_function_decl_states; - /* Returns a hash code for P. */ - - hashval_t - lto_hash_decl_slot_node (const void *p) - { - const struct lto_decl_slot *ds = (const struct lto_decl_slot *) p; - - /* - return (hashval_t) DECL_UID (ds-t); - */ - return (hashval_t) TREE_HASH (ds-t); - } - - - /* Returns nonzero if P1 and P2 are equal. */ - - int - lto_eq_decl_slot_node (const void *p1, const void *p2) - { - const struct lto_decl_slot *ds1 = - (const struct lto_decl_slot *) p1; - const struct lto_decl_slot *ds2 = - (const struct lto_decl_slot *) p2; - - /*
Re: [C++ Patch / RFC] PR 53211
Hi, On 06/18/2013 04:15 PM, Jason Merrill wrote: On 06/17/2013 08:21 PM, Paolo Carlini wrote: I see... There is a little difficulty in that 56794 involves a non-type variadic parameter and in that case type_dependent_expression_p returns false. If I use value_dependent_expression_p things work, but I'm not sure it's 100% correct. I don't think we need to consider whether the initializer is dependent here; if the array has no length and has an initializer, it must be that we couldn't determine its length in cp_complete_array_type because it is dependent. Ah, fantastic. I really hoped we could say something like that but seemed too easy ;) I'm finishing testing the below then. Eventually, we'll have to decide where we want to commit this: 56794 is fixed in 4.7 and 4.8 too, but it's true that the issue with the specific testcase attached by Jon isn't a regression. I think apply it to 4.8 as well. Agreed. Paolo. Index: cp/parser.c === --- cp/parser.c (revision 200169) +++ cp/parser.c (working copy) @@ -9750,10 +9750,7 @@ cp_parser_range_for (cp_parser *parser, tree scope range_expr = error_mark_node; stmt = begin_range_for_stmt (scope, init); finish_range_for_decl (stmt, range_decl, range_expr); - if (range_expr != error_mark_node - !type_dependent_expression_p (range_expr) - /* The length of an array might be dependent. */ - COMPLETE_TYPE_P (complete_type (TREE_TYPE (range_expr))) + if (!type_dependent_expression_p (range_expr) /* do_auto_deduction doesn't mess with template init-lists. */ !BRACE_ENCLOSED_INITIALIZER_P (range_expr)) do_range_for_auto_deduction (range_decl, range_expr); Index: cp/pt.c === --- cp/pt.c (revision 200169) +++ cp/pt.c (working copy) @@ -20079,6 +20079,29 @@ type_dependent_expression_p (tree expression) VAR_HAD_UNKNOWN_BOUND (expression)) return true; + /* An array of unknown bound depending on a variadic parameter, eg: + + templatetypename... Args + void foo (Args... args) + { + int arr[] = { args... }; + } + + templateint... vals + void bar () + { + int arr[] = { vals... }; + } + + If the array has no length and has an initializer, it must be that + we couldn't determine its length in cp_complete_array_type because + it is dependent. */ + if (VAR_P (expression) + TREE_CODE (TREE_TYPE (expression)) == ARRAY_TYPE + !TYPE_DOMAIN (TREE_TYPE (expression)) + DECL_INITIAL (expression)) + return true; + if (TREE_TYPE (expression) == unknown_type_node) { if (TREE_CODE (expression) == ADDR_EXPR) Index: testsuite/g++.dg/cpp0x/decltype55.C === --- testsuite/g++.dg/cpp0x/decltype55.C (revision 0) +++ testsuite/g++.dg/cpp0x/decltype55.C (working copy) @@ -0,0 +1,20 @@ +// PR c++/53211 +// { dg-do compile { target c++11 } } + +templatetypename A, typename B + struct is_same { static const bool value = false; }; + +templatetypename A + struct is_sameA, A { static const bool value = true; }; + +templatetypename... Args +void func(Args... args) +{ + int arr[] = { args... }; + static_assert (is_samedecltype(arr), int[sizeof...(Args)]::value, ); +} + +int main() +{ + func(1, 2, 3, 4); +}
[ARM][Insn classification refactoring 2/N] Update instruction classification documentation
Hi, This patch updates the documentation for type attribute. It complements the changes proposed in the previous patch OK for trunk? - Thanks Sofiane arm-update-insn-class-doc.diff Description: Binary data
Re: [PATCH] Add command line parsing of -fsanitize
On Mon, Jun 17, 2013 at 06:01:10PM +0200, Jakub Jelinek wrote: On Mon, Jun 17, 2013 at 03:52:54PM +, Joseph S. Myers wrote: On Mon, 17 Jun 2013, Jakub Jelinek wrote: +; What the sanitizer should instrument +Variable +unsigned int flag_sanitize Can't you just add Var(flag_sanitize) to the line after fsanitize= ? I think that would create a string variable, whereas an integer is what's wanted here. We already have say: Wstack-usage= Common Joined RejectNegative UInteger Var(warn_stack_usage) Init(-1) Warning Warn if stack usage might be larger than specified amount that creates #ifdef GENERATOR_FILE extern int warn_stack_usage; #else int x_warn_stack_usage; #define warn_stack_usage global_options.x_warn_stack_usage #endif so I guess UInteger Var(flag_sanitize) then would do the trick. Nope, that would mean that argument to ‘-fsanitize=’ should be a non-negative integer. So I'm keeping common.opt as it was... Though, now looking at it, -fsanitize={address,thread} aren't RejectNegative, so they accept also -fno-sanitize=address etc. Marek, thus your patch should handle that properly too, say if you do: -fsanitize=undefined -fno-sanitize=shift it should first or into the flag_sanitize bitmask SANITIZE_UNDEFINED and then and it with ~SANITIZE_SHIFT. Ok, should be done now (together with other nit-fixes). Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-06-18 Marek Polacek pola...@redhat.com * common.opt (flag_sanitize): Add variable. (fsanitize=): Add option. (fsanitize=thread): Remove option. (fsanitize=address): Likewise. * flag-types.h (sanitize_code): New enum. * opts.c (common_handle_option): Parse command line arguments of -fsanitize=. * varasm.c (get_variable_section): Adjust. (assemble_noswitch_variable): Likewise. (assemble_variable): Likewise. (output_constant_def_contents): Likewise. (categorize_decl_for_section): Likewise. (place_block_symbol): Likewise. (output_object_block): Likewise. * builtins.def: Likewise. * toplev.c (compile_file): Likewise. (process_options): Likewise. * cppbuiltin.c: Likewise. * tsan.c (tsan_pass): Likewise. (tsan_gate): Likewise. (tsan_gate_O0): Likewise. * cfgexpand.c (partition_stack_vars): Likewise. (expand_stack_vars): Likewise. (defer_stack_allocation): Likewise. (expand_used_vars): Likewise. * cfgcleanup.c (old_insns_match_p): Likewise. * asan.c (asan_finish_file): Likewise. (asan_instrument): Likewise. (gate_asan): Likewise. --- gcc/opts.c.mp 2013-06-18 13:56:52.113609043 +0200 +++ gcc/opts.c 2013-06-18 13:57:50.595853767 +0200 @@ -1405,6 +1405,70 @@ common_handle_option (struct gcc_options opts-x_exit_after_options = true; break; +case OPT_fsanitize_: + { + const char *p = arg; + while (*p != 0) + { + static const struct + { + const char *const name; + unsigned int flag; + size_t len; + } spec[] = + { + { address, SANITIZE_ADDRESS, sizeof address - 1 }, + { thread, SANITIZE_THREAD, sizeof thread - 1 }, + /* For now, let's hide this. + { shift, SANITIZE_SHIFT, sizeof shift - 1 }, + { integer-divide-by-zero, SANITIZE_DIVIDE, + sizeof integer-divide-by-zero - 1 }, + { undefined, SANITIZE_UNDEFINED, sizeof undefined - 1 }, + */ + { NULL, 0, 0 } + }; + const char *comma; + size_t len, i; + bool found = false; + + comma = strchr (p, ','); + if (comma == NULL) + len = strlen (p); + else + len = comma - p; + if (len == 0) + { + p = comma + 1; + continue; + } + + /* Check to see if the string matches an option class name. */ + for (i = 0; spec[i].name != NULL; ++i) + if (len == spec[i].len + memcmp (p, spec[i].name, len) == 0) + { + /* Handle both -fsanitize and -fno-sanitize cases. */ + if (value) + flag_sanitize |= spec[i].flag; + else + flag_sanitize = ~spec[i].flag; + found = true; + break; + } + + if (! found) + warning_at (loc, 0, + unrecognized argument to -fsanitize= option: %q.*s, + (int) len, p); + + if (comma == NULL) + break; + p = comma + 1; + } + + break; + } + case OPT_O: case OPT_Os: case OPT_Ofast: --- gcc/varasm.c.mp 2013-06-18
[PATCH, ARM] Reintroduce minipool ranges for zero-extension insn patterns
Hi, The following patch removed pool_range/neg_pool_range attributes from several instructions as a cleanup, which I believe to have been incorrect: http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01036.html On a Mentor-local branch, this caused problems with instructions like: (insn 77 53 87 (set (reg:SI 8 r8 [orig:197 s.4 ] [197]) (zero_extend:SI (mem/u/c:HI (symbol_ref/u:SI (*.LC0) [flags 0x2]) [7 S2 A16]))) [...] 161 {*arm_zero_extendhisi2_v6} (nil)) The reasoning behind the cleanup was that the instructions in question have no immediate constraints -- but the minipool code is used for more than just immediates, e.g. in the above case where a symbol reference (m) is loaded. I don't have a test case for the problem on mainline at present, but I believe it is still a latent bug. Tested with the default multilibs (ARM Thumb mode) on arm-none-eabi, with no regressions. (The patch has also been tested with more multilibs on our local branches for a while, and I did ensure previously that it did not adversely affect Bernd's patch linked above.) OK to apply? Thanks, Julian ChangeLog gcc/ * arm.md (*thumb1_zero_extendhisi2, *arm_zero_extendhisi2) (*arm_zero_extendhisi2_v6, *thumb1_zero_extendqisi2) (*thumb1_zero_extendqisi2_v6, *arm_zero_extendqisi2) (*arm_zero_extendqisi2_v6): Add pool_range, neg_pool_range attributes. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 200171) +++ gcc/config/arm/arm.md (working copy) @@ -5313,7 +5313,8 @@ [(if_then_else (eq_attr is_arch6 yes) (const_int 2) (const_int 4)) (const_int 4)]) - (set_attr type simple_alu_shift, load_byte)] + (set_attr type simple_alu_shift, load_byte) + (set_attr pool_range *,60)] ) (define_insn *arm_zero_extendhisi2 @@ -5324,7 +5325,9 @@ # ldr%(h%)\\t%0, %1 [(set_attr type alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,256) + (set_attr neg_pool_range *,244)] ) (define_insn *arm_zero_extendhisi2_v6 @@ -5335,7 +5338,9 @@ uxth%?\\t%0, %1 ldr%(h%)\\t%0, %1 [(set_attr predicable yes) - (set_attr type simple_alu_shift,load_byte)] + (set_attr type simple_alu_shift,load_byte) + (set_attr pool_range *,256) + (set_attr neg_pool_range *,244)] ) (define_insn *arm_zero_extendhisi2addsi @@ -5405,7 +5410,8 @@ uxtb\\t%0, %1 ldrb\\t%0, %1 [(set_attr length 2) - (set_attr type simple_alu_shift,load_byte)] + (set_attr type simple_alu_shift,load_byte) + (set_attr pool_range *,32)] ) (define_insn *arm_zero_extendqisi2 @@ -5417,7 +5423,9 @@ ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2 [(set_attr length 8,4) (set_attr type alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,4096) + (set_attr neg_pool_range *,4084)] ) (define_insn *arm_zero_extendqisi2_v6 @@ -5428,7 +5436,9 @@ uxtb%(%)\\t%0, %1 ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2 [(set_attr type simple_alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,4096) + (set_attr neg_pool_range *,4084)] ) (define_insn *arm_zero_extendqisi2addsi
Re: [PATCH, ARM] Reintroduce minipool ranges for zero-extension insn patterns
On 18/06/13 16:42, Julian Brown wrote: Hi, The following patch removed pool_range/neg_pool_range attributes from several instructions as a cleanup, which I believe to have been incorrect: http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01036.html On a Mentor-local branch, this caused problems with instructions like: (insn 77 53 87 (set (reg:SI 8 r8 [orig:197 s.4 ] [197]) (zero_extend:SI (mem/u/c:HI (symbol_ref/u:SI (*.LC0) [flags 0x2]) [7 S2 A16]))) [...] 161 {*arm_zero_extendhisi2_v6} (nil)) The reasoning behind the cleanup was that the instructions in question have no immediate constraints -- but the minipool code is used for more than just immediates, e.g. in the above case where a symbol reference (m) is loaded. I don't have a test case for the problem on mainline at present, but I believe it is still a latent bug. Tested with the default multilibs (ARM Thumb mode) on arm-none-eabi, with no regressions. (The patch has also been tested with more multilibs on our local branches for a while, and I did ensure previously that it did not adversely affect Bernd's patch linked above.) OK to apply? Pushing extending loads (particularly sign-extending loads) into the minipools adversely affects freedom of pool placement (since the offset ranges are limited). And they shouldn't be happening anyway. I really don't think this is the right solution to the problem you have. R. Thanks, Julian ChangeLog gcc/ * arm.md (*thumb1_zero_extendhisi2, *arm_zero_extendhisi2) (*arm_zero_extendhisi2_v6, *thumb1_zero_extendqisi2) (*thumb1_zero_extendqisi2_v6, *arm_zero_extendqisi2) (*arm_zero_extendqisi2_v6): Add pool_range, neg_pool_range attributes. ze-minipool-ranges-fsf-2.diff Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 200171) +++ gcc/config/arm/arm.md (working copy) @@ -5313,7 +5313,8 @@ [(if_then_else (eq_attr is_arch6 yes) (const_int 2) (const_int 4)) (const_int 4)]) - (set_attr type simple_alu_shift, load_byte)] + (set_attr type simple_alu_shift, load_byte) + (set_attr pool_range *,60)] ) (define_insn *arm_zero_extendhisi2 @@ -5324,7 +5325,9 @@ # ldr%(h%)\\t%0, %1 [(set_attr type alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,256) + (set_attr neg_pool_range *,244)] ) (define_insn *arm_zero_extendhisi2_v6 @@ -5335,7 +5338,9 @@ uxth%?\\t%0, %1 ldr%(h%)\\t%0, %1 [(set_attr predicable yes) - (set_attr type simple_alu_shift,load_byte)] + (set_attr type simple_alu_shift,load_byte) + (set_attr pool_range *,256) + (set_attr neg_pool_range *,244)] ) (define_insn *arm_zero_extendhisi2addsi @@ -5405,7 +5410,8 @@ uxtb\\t%0, %1 ldrb\\t%0, %1 [(set_attr length 2) - (set_attr type simple_alu_shift,load_byte)] + (set_attr type simple_alu_shift,load_byte) + (set_attr pool_range *,32)] ) (define_insn *arm_zero_extendqisi2 @@ -5417,7 +5423,9 @@ ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2 [(set_attr length 8,4) (set_attr type alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,4096) + (set_attr neg_pool_range *,4084)] ) (define_insn *arm_zero_extendqisi2_v6 @@ -5428,7 +5436,9 @@ uxtb%(%)\\t%0, %1 ldr%(b%)\\t%0, %1\\t%@ zero_extendqisi2 [(set_attr type simple_alu_shift,load_byte) - (set_attr predicable yes)] + (set_attr predicable yes) + (set_attr pool_range *,4096) + (set_attr neg_pool_range *,4084)] ) (define_insn *arm_zero_extendqisi2addsi
Re: [PATCH] Re-write LTO type merging again, do tree merging
That suggests Index: gcc/expr.c === --- gcc/expr.c (revision 200164) +++ gcc/expr.c (working copy) @@ -9353,7 +9353,7 @@ expand_expr_real_1 (tree exp, rtx target /* Variables inherited from containing functions should have been lowered by this point. */ context = decl_function_context (exp); - gcc_assert (!context + gcc_assert (SCOPE_FILE_SCOPE_P (context) || context == current_function_decl || TREE_STATIC (exp) || DECL_EXTERNAL (exp) might fix it. Just confirmed with the small build. It does. Running the large build now. Please check in. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
Ping. On 06/06/2013 01:23 PM, Meador Inge wrote: On 06/06/2013 08:11 AM, Richard Earnshaw wrote: I understand (and agree with) this bit... +(define_peephole2 + [(set (reg:CC CC_REGNUM) +(compare:CC (match_operand:SI 0 register_operand ) +(match_operand:SI 1 arm_rhs_operand ))) + (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0)) + (set (match_operand:SI 2 register_operand ) (const_int 0))) + (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0)) + (set (match_dup 2) (const_int 1))) + (match_scratch:SI 3 r)] + TARGET_32BIT !peep2_reg_dead_p (3, operands[0]) + [(set (match_dup 3) (minus:SI (match_dup 0) (match_dup 1))) + (parallel +[(set (reg:CC CC_REGNUM) + (compare:CC (const_int 0) (match_dup 3))) + (set (match_dup 2) (minus:SI (const_int 0) (match_dup 3)))]) + (set (match_dup 2) +(plus:SI (plus:SI (match_dup 2) (match_dup 3)) + (geu:SI (reg:CC CC_REGNUM) (const_int 0]) + ... but what's this bit about? The original intent was to revert back to the original peephole pattern (pre-PR 46975) when the CC reg is still live, but that doesn't properly maintain the CC state either (it just happened to pass in the test case I was looking at because I only cared about the Z flag, which is maintained the same). OK with the above bit left out? -- Meador Inge CodeSourcery / Mentor Embedded
Re: [PATCH] Remove LTO streamer cache pointer-map for LTO input
Richard Biener rguent...@suse.de writes: It is not necessary to maintain the pointer-map from cache entry to cache index when reading trees. A quick estimate using the latest WPA stats from firefox estimates its size to at least 1.5GB - apart from the cost to maintain it. So the following patch makes it possible to not maintain that map. Cool! It also costs a lot of CPU cycles, according to my profiles. -Andi -- a...@linux.intel.com -- Speaking for myself only
PATCH: Fix a typo in comments in config/i386/i386.c
Hi, I checked in this patch to fix a typo in comments in config/i386/i386.c. H.J. --- Index: ChangeLog === --- ChangeLog (revision 200173) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-06-18 H.J. Lu hongjiu...@intel.com + + * config/i386/i386.c (initial_ix86_tune_features): Fix a typo + in comments. + 2013-06-18 Julian Brown jul...@codesourcery.com * config/arm/arm.c (neon_vector_mem_operand): Add strict argument. Index: config/i386/i386.c === --- config/i386/i386.c (revision 200173) +++ config/i386/i386.c (working copy) @@ -2085,7 +2085,7 @@ static unsigned int initial_ix86_tune_fe instructions. */ ~m_ATOM, - /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching + /* X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL: Enable software prefetching at -O3. For the moment, the prefetching seems badly tuned for Intel chips. */ m_K6_GEODE | m_AMD_MULTIPLE,
Re: [patch] gcov intermediate format
On Tue, Jun 18, 2013 at 3:28 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. Th patch is OK, thanks! I see you added gcov.exp file support, do you have a testcases? Yes, I added support for verifying intermediate format in gcov.exp. I also added a minimal testcase for intermediate format in testsuite/g++.dg/gcov directory. Thanks, Sharad Honza
Re: [Committed] S/390: PR57609 fix - use next_active_insn instead of next_real_insn
On Tue, Jun 18, 2013 at 2:17 PM, Andreas Krebbel wrote: If you don't want me to use next_active_insn I probably have to do something like this instead: No. If the label is followed by jump table data, then NEXT_INSN(label) will be the JUMP_TABLE_DATA rtx. See tablejump_p. So the following should work: Index: s390.c === --- s390.c (revision 200173) +++ s390.c (working copy) @@ -7023,7 +7023,7 @@ s390_chunkify_start (void) if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { - rtx vec_insn = next_active_insn (insn); + rtx vec_insn = NEXT_INSN (insn); if (! vec_insn || ! JUMP_TABLE_DATA_P (vec_insn)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } @@ -7054,7 +7054,7 @@ s390_chunkify_start (void) { /* Find the jump table used by this casesi jump. */ rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); - rtx vec_insn = next_active_insn (vec_label); + rtx vec_insn = NEXT_INSN (vec_label); if (vec_insn JUMP_TABLE_DATA_P (vec_insn)) { rtx vec_pat = PATTERN (vec_insn); BTW I don't understand how a label satisfying the following can be true for a label before a jump table: if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) LABEL_PRESERVE_P should never be set on a label before a JUMP_TABLE_DATA, and LABEL_NAME should be NULL. Better yet would be to use tablejump_p instead of examining the pattern by hand, e.g.: Index: s390.c === --- s390.c (revision 200173) +++ s390.c (working copy) @@ -7023,7 +7023,7 @@ s390_chunkify_start (void) if (LABEL_P (insn) (LABEL_PRESERVE_P (insn) || LABEL_NAME (insn))) { - rtx vec_insn = next_active_insn (insn); + rtx vec_insn = NEXT_INSN (insn); if (! vec_insn || ! JUMP_TABLE_DATA_P (vec_insn)) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (insn)); } @@ -7033,6 +7033,8 @@ s390_chunkify_start (void) else if (JUMP_P (insn)) { rtx pat = PATTERN (insn); + rtx table; + if (GET_CODE (pat) == PARALLEL XVECLEN (pat, 0) 2) pat = XVECEXP (pat, 0, 0); @@ -7046,28 +7048,18 @@ s390_chunkify_start (void) bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (label)); } } - else if (GET_CODE (pat) == PARALLEL - XVECLEN (pat, 0) == 2 - GET_CODE (XVECEXP (pat, 0, 0)) == SET - GET_CODE (XVECEXP (pat, 0, 1)) == USE - GET_CODE (XEXP (XVECEXP (pat, 0, 1), 0)) == LABEL_REF) - { - /* Find the jump table used by this casesi jump. */ - rtx vec_label = XEXP (XEXP (XVECEXP (pat, 0, 1), 0), 0); - rtx vec_insn = next_active_insn (vec_label); - if (vec_insn JUMP_TABLE_DATA_P (vec_insn)) - { - rtx vec_pat = PATTERN (vec_insn); - int i, diff_p = GET_CODE (vec_pat) == ADDR_DIFF_VEC; - - for (i = 0; i XVECLEN (vec_pat, diff_p); i++) - { - rtx label = XEXP (XVECEXP (vec_pat, diff_p, i), 0); - - if (s390_find_pool (pool_list, label) - != s390_find_pool (pool_list, insn)) - bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (label)); - } + else if (tablejump_p (insn, NULL, table)) + { + rtx vec_pat = PATTERN (table); + int i, diff_p = GET_CODE (vec_pat) == ADDR_DIFF_VEC; + + for (i = 0; i XVECLEN (vec_pat, diff_p); i++) + { + rtx label = XEXP (XVECEXP (vec_pat, diff_p, i), 0); + + if (s390_find_pool (pool_list, label) + != s390_find_pool (pool_list, insn)) + bitmap_set_bit (far_labels, CODE_LABEL_NUMBER (label)); } } } Ciao! Steven
Re: [PATCH] Re-write LTO type merging again, do tree merging
Just confirmed with the small build. It does. Running the large build now. Large build worked too. Please check in.
Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
On Wed, May 22, 2013 at 4:52 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: 2013-05-22 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/predicates.md (fusion_gpr_addis): New predicates to support power8 load fusion. (fusion_gpr_mem_load): Likewise. * config/rs6000/rs6000-modes.def (PTImode): Update a comment. * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): New declarations for power8 load fusion. (emit_fusion_gpr_load): Likewise. * config/rs6000/rs6000.opt (-mlra): New undocumented switch to turn on using the LRA register allocator. (-mconstrain-regs): New undocumented switch to constrain non-integer values from being loaded into the LR or CTR registers. This really should have been a separate patch. * config/rs6000/rs6000.c (TARGET_LRA_P): If -mlra, turn on using the LRA register allocator. (rs6000_lra_p): Likewise. (rs6000_hard_regno_mode_ok): Allow DI/DD/SF/SD modes in altivec registers if power8. If -mconstrain-regs, only allow int modes into LR, CTR, and special purpose registers. (rs6000_debug_reg_global): Print -mlra, -mconstrain-regs status if debugging. (rs6000_init_hard_regno_mode_ok): Mark that SFmode can use Altivec registers in the future. (rs6000_option_override_internal): If tuning for power8, turn on fusion mode by default. Turn on sign extending fusion mode if normal fusion mode is on, and we are at -O2 or -O3. (rs6000_opt_masks): Add -mlra, -mconstrain-regs. (fusion_gpr_load_p): New function, return true if we can fuse an addis instruction with a dependent load to a GPR. (emit_fusion_gpr_load): Emit the instructions for power8 load fusion to GPRs. * config/rs6000/vsx.md (VSX load fusion peepholes): Add peepholes to fuse together an addi instruction with a VSX load instruction. * config/rs6000/rs6000.md (GPR load fusion peepholes): Add peepholes to fuse an addis instruction with a load to a GPR base register, if the addis instruction is dead after the load, by using the register to be loaded for the addis. If we are supporting sign extending fusions, convert sign extending loads to zero extending loads and an explicit sign extension. + /* 32-bit is not done yet. */ + if (TARGET_ELF !TARGET_POWERPC64) +return 0; What does 32-bit is not done yet. mean? This means PPC32 Linux is not supported but PPC32 AIX is supported? + if (TARGET_ELF !TARGET_POWERPC64) +return 0; Please return true and false from new predicates, not 1 and 0. + +case DImode: + if (TARGET_POWERPC64) +{ + mode_name = long; + load_str = ld; +} + break; What happens for DImode when not TARGET_POWERPC64? This should be gcc_unreachable()? Thanks, David
FW: [PATCH] RTEMS: Use strict DWARF-2 on ARM, PowerPC, SPARC
Hi, Forwarding this patch to gcc-patches... Cheers! Cindy From: rtems-devel-boun...@rtems.org [rtems-devel-boun...@rtems.org] on behalf of Sebastian Huber [sebastian.hu...@embedded-brains.de] Sent: Tuesday, June 18, 2013 4:58 AM To: rtems-de...@rtems.org Subject: [PATCH] RTEMS: Use strict DWARF-2 on ARM, PowerPC, SPARC Some debuggers do not cope with the new DWARF3/4 debug format introduced with GCC 4.8. Default to strict DWARF-2 on ARM, PowerPC and SPARC for now. This patch should be committed to GCC 4.8 and 4.9. gcc/ChangeLog 2013-06-18 Sebastian Huber sebastian.hu...@embedded-brains.de * config/rtems.c: New. * config.gcc (*-*-rtems*): Set extra_objs. * config/rtems.h (rtems_override_options): Declare. (RTEMS_OVERRIDE_OPTIONS): Define. * config/t-rtems (rtems.o): New. * config/arm/rtems-eabi.h (SUBTARGET_OVERRIDE_OPTIONS): Define. * config/rs6000/rtems.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Define. * config/sparc/rtemself.h (SUBTARGET_OVERRIDE_OPTIONS): Define. --- gcc/config.gcc |1 + gcc/config/arm/rtems-eabi.h |3 +++ gcc/config/rs6000/rtems.h |3 +++ gcc/config/rtems.c | 37 + gcc/config/rtems.h |4 gcc/config/sparc/rtemself.h |3 +++ gcc/config/t-rtems |4 7 files changed, 55 insertions(+), 0 deletions(-) create mode 100644 gcc/config/rtems.c diff --git a/gcc/config.gcc b/gcc/config.gcc index 1a0be50..6394551 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -734,6 +734,7 @@ case ${target} in yes) thread_file='rtems' ;; esac extra_options=${extra_options} rtems.opt + extra_objs=rtems.o use_gcc_stdint=wrap ;; *-*-uclinux*) diff --git a/gcc/config/arm/rtems-eabi.h b/gcc/config/arm/rtems-eabi.h index 77fcf1a..a7ccdae 100644 --- a/gcc/config/arm/rtems-eabi.h +++ b/gcc/config/arm/rtems-eabi.h @@ -27,3 +27,6 @@ builtin_assert (system=rtems);\ TARGET_BPABI_CPP_BUILTINS();\ } while (0) + +#undef SUBTARGET_OVERRIDE_OPTIONS +#define SUBTARGET_OVERRIDE_OPTIONS RTEMS_OVERRIDE_OPTIONS diff --git a/gcc/config/rs6000/rtems.h b/gcc/config/rs6000/rtems.h index b910b5e..cf53592 100644 --- a/gcc/config/rs6000/rtems.h +++ b/gcc/config/rs6000/rtems.h @@ -34,6 +34,9 @@ } \ while (0) +#undef SUBSUBTARGET_OVERRIDE_OPTIONS +#define SUBSUBTARGET_OVERRIDE_OPTIONS RTEMS_OVERRIDE_OPTIONS + #undef CPP_OS_DEFAULT_SPEC #define CPP_OS_DEFAULT_SPEC %(cpp_os_rtems) diff --git a/gcc/config/rtems.c b/gcc/config/rtems.c new file mode 100644 index 000..4250ba3 --- /dev/null +++ b/gcc/config/rtems.c @@ -0,0 +1,37 @@ +/* Common RTEMS target definitions for GNU compiler. + Copyright (C) 2013 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include target.h +#include tm.h + +void +rtems_override_options (void) +{ + /* Use strict DWARF-2 unless specifically selected. This is a + workaround for a lack of tool support. */ + + if (!global_options_set.x_dwarf_strict) +dwarf_strict = 1; + + if (!global_options_set.x_dwarf_version) +dwarf_version = 2; +} diff --git a/gcc/config/rtems.h b/gcc/config/rtems.h index 4d94f82..8ea97fe 100644 --- a/gcc/config/rtems.h +++ b/gcc/config/rtems.h @@ -43,3 +43,7 @@ along with GCC; see the file COPYING3. If not see -lc -lgcc --end-group %{!qnolinkcmds: -T linkcmds%s}}} #define TARGET_POSIX_IO + +/* Do RTEMS specific parts of TARGET_OPTION_OVERRIDE. */ +extern void rtems_override_options (void); +#define RTEMS_OVERRIDE_OPTIONS rtems_override_options () diff --git a/gcc/config/sparc/rtemself.h b/gcc/config/sparc/rtemself.h index 7a69dfa..2ee939c 100644 --- a/gcc/config/sparc/rtemself.h +++ b/gcc/config/sparc/rtemself.h @@ -29,5 +29,8 @@ along with GCC; see the file COPYING3. If not see } \ while (0) +#undef SUBTARGET_OVERRIDE_OPTIONS +#define SUBTARGET_OVERRIDE_OPTIONS RTEMS_OVERRIDE_OPTIONS + /* Use the default */ #undef LINK_GCC_C_SEQUENCE_SPEC diff --git a/gcc/config/t-rtems b/gcc/config/t-rtems index baa00d8..5a8e88a 100644 --- a/gcc/config/t-rtems +++ b/gcc/config/t-rtems @@ -1,2 +1,6 @@ # RTEMS always has limits.h.
Re: [gomp4] Some progress on #pragma omp simd
Please move simduid after force_vect, so that it is better packed. Fixed. I also rewrote the builtins to use the internal function doo-dah as previously suggested. Let me know if this is fine with y'all and you (Jakub) can keep this patch and apply it on top of your pending patchset for clauses. diff --git a/gcc/ChangeLog.gomp b/gcc/ChangeLog.gomp index 7f9151d..b023173 100644 --- a/gcc/ChangeLog.gomp +++ b/gcc/ChangeLog.gomp @@ -1,3 +1,30 @@ +2013-06-17 Aldy Hernandez al...@redhat.com + + * builtin-types.def (BT_FN_UINT_PTR): New. + * omp-builtins.def (BUILT_IN_GOMP_SIMD_LANE): Remove. + (BUILT_IN_GOMP_SIMD_VF): Remove. + * internal-fn.c (expand_GOMP_SIMD_LANE): New function. + (expand_GOMP_SIMD_VF): Same. + * internal-fn.def (GOMP_SIMD_LANE): New. + (GOMP_SIMD_VF): New. + * cfgloop.h (struct loop): Change type of simduid to tree. + * omp-low.c (lower_rec_input_clauses): Adapt to use simduid as a + tree and use internal functions instead of built-ins. + (expand_omp_simd): Same. + * tree-data-ref.c (get_references_in_stmt): Same. + Use internal functions instead of built-ins. + * tree-vect-data-refs.c (vect_analyze_data_refs): Same. + * tree-vectorizer.c (struct simduid_to_vf): Change type of simduid + to tree. + (simduid_to_vf::hash): Hash pointer. + (adjust_simduid_builtins): Add comment. + Use simduid as tree. + Use internal functions instead of built-ins. + * tree-pretty-print.c (dump_omp_clause): Rename + OMP_CLAUSE__SIMDUID__UID to OMP_CLAUSE__SIMDUID__DECL. + * tree.h (OMP_CLAUSE__SIMDUID__DECL): Rename from + OMP_CLAUSE__SIMDUID__UID. + 2013-06-14 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_gimple_omp_for): Don't handle diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 6cc9a6c..794599b 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -174,13 +174,13 @@ struct GTY ((chain_next (%h.next))) loop { of the loop can be safely evaluated concurrently. */ int safelen; - /* For SIMD loops, this is a unique identifier of the loop, referenced - by __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane builtins. */ - unsigned int simduid; - /* True if we should try harder to vectorize this loop. */ bool force_vect; + /* For SIMD loops, this is a unique identifier of the loop, referenced + by __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane builtins. */ + tree simduid; + /* Upper bound on number of iterations of a loop. */ struct nb_iter_bound *bounds; diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index b841abd..0ed0d9a 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -109,6 +109,22 @@ expand_STORE_LANES (gimple stmt) expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops); } +/* This should get expanded in adjust_simduid_builtins. */ + +static void +expand_GOMP_SIMD_LANE (gimple stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +/* This should get expanded in adjust_simduid_builtins. */ + +static void +expand_GOMP_SIMD_VF (gimple stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 8900d90..fdbcbe8 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -40,3 +40,5 @@ along with GCC; see the file COPYING3. If not see DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF) DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF) +DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index 8ad2113..44c48f4 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -218,8 +218,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, GOMP_single_copy_start, BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, GOMP_single_copy_end, BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST) - -DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_LANE, GOMP.simd_lane, - BT_FN_UINT_UINT, ATTR_NOVOPS_NOTHROW_LEAF_LIST) -DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_VF, GOMP.simd_vf, - BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index a9e2758..8bb7004 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -2497,7 +2497,6 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, bool copyin_by_ref = false; bool lastprivate_firstprivate = false; int pass; - static int simd_uid; bool is_simd = (gimple_code (ctx-stmt) == GIMPLE_OMP_FOR gimple_omp_for_kind (ctx-stmt) == GF_OMP_FOR_KIND_SIMD); int max_vf = 0; @@ -2887,23 +2886,21 @@ lower_rec_input_clauses (tree clauses, gimple_seq
Re: RFA: Fix rtl-optimization/57425
Quoting Michael Matz m...@suse.de: So, first the potentially clobbering write, then the read. And indeed in check_dependence d-exp is the write and x the read that is potentially clobbered. Oops, you are right. I got confused because what is X in cse.c:invalidate ends up as d-exp in cse.c:check_dependence, and what is p-canon_exp in cse.c:invalidate ends up as X in cse.c:check_dependence. I was a bit puzzled why we could have a STRICT_LOW_PART for MEM. Now it makes more sense. I also see now that both the read expression from the hash table and the write address are canonicalized, so if we have a canon_* interface, we want to use both. Generally I would prefer simple interfaces to the query functions, dependence problems are hard enough to think about without functions needing four arguments. Does it really save much to not canonicalize the mem address for some calls? I haven't measured it myself, but I suppose it must be there for a good reason. Of course a simpler interface would be nice, however... When cse encounters a write, it goes through the entire hash table of expressions and checks dependencies. So for (extended?) basic blocks with lots of expressions writes, we got O(n^2) dependency checks.
Re: Symtab cleanups 4/17 - ICE in GUPC due to use of init section
On 06/05/13 16:18:52, Jan Hubicka wrote: Hi, this patch deals with C++ keyed methods and explicit instantiations. Currently C++ calls mark_used that ultimately sets force_output on the functions. This is equivalent to attribute ((used)) on the function/variable and it is bit too strong. [...] This patch leads to in ICE on the GUPC branch after attempting a merge from the trunk. This may be because GUPC is attempting to write to the (assembler) output too early, or because some additional tree meta data needs to be set before the cgraph-related pass is run. I'd appreciate your suggestions on how to best make this work within the new scheme of things. As background, in certain situations GUPC generates initialization code that is run before main() is called. This initialization code is run after the UPC runtime has set up things like shared memory. The actual program initial entry point (main) is in the runtime. GCC's constructors won't do the job here because constructors are run _before_ the runtime's main() entry point is called. The ICE (in debug builds with checks enabled) is shown below. test25.upc:91:1: internal compiler error: in decide_is_symbol_needed, at cgraphunit.c:230 This failure occurs in tests that require active initialization of shared variables or static variables that are initialized with expressions involving the addresses of UPC shared variables. This assertion is failing: 227 /* Double check that no one output the function into assembly file 228 early. */ 229 gcc_checking_assert (!DECL_ASSEMBLER_NAME_SET_P (decl) 230 || !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME The failure is a result of this patch: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00240.html [^] It was committed as: r199695 | hubicka | 2013-06-05 07:15:31 -0700 (Wed, 05 Jun 2013) The reason that this fails is that GNU UPC has the following code in gcc/upc/upc-act.c: /* Build a function that will be called by the UPC runtime to initialize UPC shared variables. STMT_LIST is a list of initialization statements. */ static void upc_build_init_func (const tree stmt_list) { [...] finish_function (); gcc_assert (DECL_RTL (init_func)); mark_decl_referenced (init_func); DECL_PRESERVE_P (init_func) = 1; upc_init_array_section = get_section (UPC_INIT_ARRAY_SECTION_NAME, 0, NULL); init_func_symbol = XEXP (DECL_RTL (init_func), 0); assemble_addr_to_section (init_func_symbol, upc_init_array_section); } See: http://gcc.gnu.org/viewcvs/gcc/branches/gupc/gcc/upc/upc-act.c?revision=197503view=markup (near line 1448). The function upc_build_init_func() is called after the entire compilation unit has been compiled. It is called from upc_write_global_declarations() which is called from c_common_parse_file(), just after parsing the main source file. 1054 void 1055 c_common_parse_file (void) 1056 { 1057 unsigned int i; 1058 1059 i = 0; 1060 for (;;) 1061 { 1062 c_finish_options (); 1063 pch_init (); 1064 push_file_scope (); 1065 c_parse_file (); 1066 /* Generate UPC global initialization code, if required. */ 1067 if (c_dialect_upc ()) 1068 upc_write_global_declarations (); 1069 pop_file_scope (); See: http://gcc.gnu.org/viewcvs/gcc/branches/gupc/gcc/c-family/c-opts.c?revision=198453view=markup (near line 1068). It seems that GUPC may be calling assemble_addr_to_section() too early and that some other method of locating the UPC shared data related initialization into the UPC upc_init_array section needs to be implemented. Either that, or something needs to be set in the function's tree node to indicate that no further special handling is needed. Please advise. Thanks, - Gary
Re: [PATCH] Add command line parsing of -fsanitize
On Tue, Jun 18, 2013 at 04:42:51PM +0200, Marek Polacek wrote: Ok, should be done now (together with other nit-fixes). Regtested/bootstrapped on x86_64-linux, ok for trunk? Looks good to me, the only thing I'm worried about are how this interferes with the %{fsanitize=address:...} and %{fsanitize=thread:...} bits in gcc.c. Because we should link in -lasan even for -fsanitize=shift,address,undefined and should not link in -lasan for -fsanitize=address -fno-sanitize=undefined,address,shift (generally, what we have guarded right now with %{fsanitize=address:...} should be done if flag_sanitize SANITIZE_ADDRESS is going to be true in the end, etc., and we'll need to link in -lubsan whenever at least one of the undefined options are set in the bitmask. -lubsan isn't incompatible with -lasan nor -ltsan, but -lasan and -ltsan are incompatible. Joseph, any thoughts how to deal with this? Jakub
Re: Symtab cleanups 4/17 - ICE in GUPC due to use of init section
On Tue, Jun 18, 2013 at 10:19 PM, Gary Funck wrote: It seems that GUPC may be calling assemble_addr_to_section() too early and that some other method of locating the UPC shared data related initialization into the UPC upc_init_array section needs to be implemented. Either that, or something needs to be set in the function's tree node to indicate that no further special handling is needed. Please advise. The advice would have to be that the front end should not write out anything to the assembler file. Why not just emit the function as GIMPLE (even if your stmt_list is empty) and let your main() call it? Ciao! Steven
Re: [PATCH] Re-write LTO type merging again, do tree merging
On Tue, Jun 18, 2013 at 08:04:15PM +0200, Andi Kleen wrote: Just confirmed with the small build. It does. Running the large build now. Large build worked too. Also it seems to be drastically faster. I haven't done a proper measurement run, but the initial run was 58% faster than 4.8, using 42% less peak RSS. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [C++ Patch / RFC] PR 53211
On 06/18/2013 10:46 AM, Paolo Carlini wrote: Ah, fantastic. I really hoped we could say something like that but seemed too easy ;) I'm finishing testing the below then. OK. Jason
Re: [PATCH] Add command line parsing of -fsanitize
On Tue, 18 Jun 2013, Jakub Jelinek wrote: On Tue, Jun 18, 2013 at 04:42:51PM +0200, Marek Polacek wrote: Ok, should be done now (together with other nit-fixes). Regtested/bootstrapped on x86_64-linux, ok for trunk? Looks good to me, the only thing I'm worried about are how this interferes with the %{fsanitize=address:...} and %{fsanitize=thread:...} bits in gcc.c. Because we should link in -lasan even for -fsanitize=shift,address,undefined and should not link in -lasan for -fsanitize=address -fno-sanitize=undefined,address,shift (generally, what we have guarded right now with %{fsanitize=address:...} should be done if flag_sanitize SANITIZE_ADDRESS is going to be true in the end, etc., and we'll need to link in -lubsan whenever at least one of the undefined options are set in the bitmask. -lubsan isn't incompatible with -lasan nor -ltsan, but -lasan and -ltsan are incompatible. Joseph, any thoughts how to deal with this? Try defining a new spec function or functions that uses flag_sanitize to determine what linker arguments to pass? Since the option handling is in opts.c it should get run in the driver so flag_sanitize should be set correctly there; as long as the specs in question run after the relevant option processing, a spec function should work for this. -- Joseph S. Myers jos...@codesourcery.com
Re: GCC does not support *mmintrin.h with function specific opts
On Mon, Jun 17, 2013 at 10:49 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Jun 14, 2013 at 11:08 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Jun 14, 2013 at 1:43 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Jun 14, 2013 at 4:52 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Jun 13, 2013 at 12:40 PM, Jan Hubicka hubi...@ucw.cz wrote: * tree-inline.c (expand_call_inline): Allow the error to be flagged in early inline pass. * ipa-inline.c (inline_always_inline_functions): Pretend always_inline functions are inlined during failures to flag an error. * gcc.target/i386/inline_error.c: New test. This patch is OK if it passes testing. Two tests gcc.c-torture/compile/pr43791.c and pr44043.c are failing because of always_inline functions being present that cannot be inlined and the compiler is now generating error messages. I will fix them and resend the patch. Quick look - pr43791.c is not expected to work at -O0, so skip -O0 for example by guarding the whole thing with #if __OPTIMIZED__ 0. Similar for pr44043.c. Seems like __OPTIMIZED__ is not defined in my config, dont know why. I see other tests checking for __OPTIMIZED. I fixed these two tests by adding a dg-prune-output at the end. Is that reasonable? I have attached the patch with all tests passing now. I have attached the latest patch with a small error in the previous patch fixed. I will submit this patch if there are no objections. I have committed this patch. Thanks Sri Thanks Sri Thanks Sri That we didn't error at -O0 before is a bug. Eventually I was suggesting to error if we end up outputting the body of an always_inline function, that is, any uses remain (including indirect calls or address-takens which is where we do not error right now). Richard. Thanks Sri Thanks for your patience! Honza
[patch] fix libstdc++/57641
Instead of fixing the bug three times I refactored the try_lock_xxx functions into a mixin template and used that in the various timed mutexes. PR libstdc++/57641 * include/std/mutex (timed_mutex, recursive_timed_mutex): Move common functionality to new __timed_mutex_impl mixin. Overload try_lock_until to handle conversion between different clocks. Replace constrained __try_lock_for_impl overloads with conditional increment. * include/std/shared_mutex (shared_mutex::_Mutex): Use the new mixin. * testsuite/30_threads/timed_mutex/try_lock_until/57641.cc: New. Tested x86_64-linux, committed to trunk. commit 29a304a2644431769a5d1cc0ecfe6cef71838fc3 Author: Jonathan Wakely jwakely@gmail.com Date: Tue Jun 18 22:54:35 2013 +0100 PR libstdc++/57641 * include/std/mutex (timed_mutex, recursive_timed_mutex): Move common functionality to new __timed_mutex_impl mixin. Overload try_lock_until to handle conversion between different clocks. Replace constrained __try_lock_for_impl overloads with conditional increment. * include/std/shared_mutex (shared_mutex::_Mutex): Use the new mixin. * testsuite/30_threads/timed_mutex/try_lock_until/57641.cc: New. diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex index cdd05a3..40b2e31 100644 --- a/libstdc++-v3/include/std/mutex +++ b/libstdc++-v3/include/std/mutex @@ -199,15 +199,57 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION }; #if _GTHREAD_USE_MUTEX_TIMEDLOCK - /// timed_mutex - class timed_mutex : private __mutex_base - { + templatetypename _Derived +class __timed_mutex_impl +{ +protected: #ifdef _GLIBCXX_USE_CLOCK_MONOTONIC -typedef chrono::steady_clock __clock_t; + typedef chrono::steady_clock __clock_t; #else -typedef chrono::high_resolution_clock __clock_t; + typedef chrono::high_resolution_clock__clock_t; #endif + templatetypename _Rep, typename _Period + bool + _M_try_lock_for(const chrono::duration_Rep, _Period __rtime) + { + auto __rt = chrono::duration_cast__clock_t::duration(__rtime); + if (ratio_greater__clock_t::period, _Period()) + ++__rt; + + return _M_try_lock_until(__clock_t::now() + __rt); + } + + templatetypename _Duration + bool + _M_try_lock_until(const chrono::time_point__clock_t, + _Duration __atime) + { + chrono::time_point__clock_t, chrono::seconds __s = + chrono::time_point_castchrono::seconds(__atime); + + chrono::nanoseconds __ns = + chrono::duration_castchrono::nanoseconds(__atime - __s); + + __gthread_time_t __ts = { + static_caststd::time_t(__s.time_since_epoch().count()), + static_castlong(__ns.count()) + }; + + auto __mutex = static_cast_Derived*(this)-native_handle(); + return !__gthread_mutex_timedlock(__mutex, __ts); + } + + templatetypename _Clock, typename _Duration + bool + _M_try_lock_until(const chrono::time_point_Clock, _Duration __atime) + { return _M_try_lock_for(__atime - _Clock::now()); } +}; + + /// timed_mutex + class timed_mutex + : private __mutex_base, public __timed_mutex_impltimed_mutex + { public: typedef __native_type* native_handle_type; @@ -237,25 +279,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template class _Rep, class _Period bool try_lock_for(const chrono::duration_Rep, _Period __rtime) - { return __try_lock_for_impl(__rtime); } + { return _M_try_lock_for(__rtime); } template class _Clock, class _Duration bool try_lock_until(const chrono::time_point_Clock, _Duration __atime) - { - chrono::time_point_Clock, chrono::seconds __s = - chrono::time_point_castchrono::seconds(__atime); - - chrono::nanoseconds __ns = - chrono::duration_castchrono::nanoseconds(__atime - __s); - - __gthread_time_t __ts = { - static_caststd::time_t(__s.time_since_epoch().count()), - static_castlong(__ns.count()) - }; - - return !__gthread_mutex_timedlock(_M_mutex, __ts); - } + { return _M_try_lock_until(__atime); } void unlock() @@ -267,40 +296,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION native_handle_type native_handle() { return _M_mutex; } - - private: -templatetypename _Rep, typename _Period - typename enable_if - ratio_less_equal__clock_t::period, _Period::value, bool::type - __try_lock_for_impl(const chrono::duration_Rep, _Period __rtime) - { - __clock_t::time_point __atime = __clock_t::now() - + chrono::duration_cast__clock_t::duration(__rtime); - - return try_lock_until(__atime); - } - -template typename _Rep, typename _Period - typename
Re: Symtab cleanups 4/17 - ICE in GUPC due to use of init section
On 06/18/13 22:27:51, Steven Bosscher wrote: The advice would have to be that the front end should not write out anything to the assembler file. Why not just emit the function as GIMPLE (even if your stmt_list is empty) and let your main() call it? The initialization function is currently generated in tree form in the usual way (it will be gimplified when the gimple pass is run). The code that is being generated is roughly equivalent to: static void __upc_init_decls (void) { /* Compiler generated: Initialize data related to UPC shared variables. */ } static void (*const __upc_init_addr) (void) __attribute__ ((section (upc_init_array))) = __upc_init_decls; The __upc_init_decls() function is generated on a per file (compilation unit) basis, if there is UPC related data that needs to be initialized. Thus, calling it from main() will not be sufficient. The address of this function is added to the section named upc_init_array. Sentinel symbols are added to the section by the inclusion of a upcrtbegin.o and upcrtend.o file. This is done by the gupc command line driver. The UPC runtime traverses this table and calls each per file initialization function. Would you happen to know of any code in GCC that creates a file scoped variable and initializes, perhaps along the lines of that shown above? I should be able to eventually figure something out, but any pointers/suggestions would be appreciated. Note that in the longer term it be better not to create a uniquely named section for this purpose, but that will require some re-work of the UPC runtime. For now, I'd like to find a method that of generating code along the lines of that shown above. Thanks, - Gary
Re: [PATCH] Cilk Plus Array Notation for C++
Thanks for fixing everything. This looks much better. I don't have much, just small typos. It's up to Jason and Richard to take it from here. Aldy + else + { + val = convert_like_with_context (conv, arg, fn, i-is_method, + conversion_warning + ? complain + : complain (~tf_warning)); s/i-is_method/i - is_method/ I know this isn't your fault, since the original code had this, but if you're going to copy over, might as well fix small formatting errors :). +/* Extracts all the array notation triplet information from LIST and stores them + in a 2-D arrays (size x rank) of START, LENGTH and STRIDE, holding the s/2-D arrays/2-D array/ + For example: For an array notation A[5:10:2], the vector start will be Extra space after start. + /* We need to do this because we are faking the builtin function types, +so the compiler does a bunch of typecasts and this will get rid of +all that! */ + while (TREE_CODE (call_fn) == CONVERT_EXPR +|| TREE_CODE (call_fn) == NOP_EXPR) + call_fn = TREE_OPERAND (call_fn, 0); We already have an automated way of doing this. Will this work for you?: /* Given an expression as a tree, strip any conversion that generates no instruction. Accepts both tree and const_tree arguments since we are not modifying the tree itself. */ #define STRIP_NOPS(EXP) \ (EXP) = tree_strip_nop_conversions (CONST_CAST_TREE (EXP)) Similarly elsewhere. +/* Helper function for expand_conditonal_array_notations. Encloses the Two spaces after . +static tree +expand_return_expr (tree expr) +{ + tree new_mod_list, new_var, new_mod, retval_expr; + location_t loc = EXPR_LOCATION (expr); + + if (TREE_CODE (expr) != RETURN_EXPR) +return expr; Well, since we're using C++, might as well move the comparison with RETURN_EXPR earlier, to avoid setting loc. + /* Skip empty subtrees. */ + if (!t) +return t; No need to document the obvious. +case ARRAY_NOTATION_REF: + { + tree start_index, length, stride; + op1 = tsubst_non_call_postfix_expression (ARRAY_NOTATION_ARRAY (t), + args, complain, in_decl); + start_index = RECUR (ARRAY_NOTATION_START (t)); + length = RECUR (ARRAY_NOTATION_LENGTH (t)); + stride = RECUR (ARRAY_NOTATION_STRIDE (t)); + if (!cilkplus_an_triplet_types_ok_p (loc, start_index, length, stride, +TREE_TYPE (op1))) + RETURN (error_mark_node); + RETURN (build_array_notation_ref (EXPR_LOCATION (t), op1, start_index, + length, stride, TREE_TYPE (op1))); Indentation of RETURN is wrong. + /* If the return expression contains array notatinos, then flag it as s/notatinos/notations. Though I like the sound of notatinos :-). + /* If find_rank returns false, then it should have reported an error, s/ then/then + to convert them. They will be broken up into modify exprs in future, Two spaces after period.
[C++ patch, obvious] PR 57638
Hi, if there are no objections, tomorrow I will apply this patchlet from Manuel which I consider obvious. Tested x86_64-linux. Thanks, Paolo. / /cp 2013-06-19 Manuel Lopez-Ibanez m...@gcc.gnu.org PR c++/57638 * pt.c (unify, [TEMPLATE_PARM_INDEX]): Pass to unify_type_mismatch TREE_TYPE (arg), not arg itself. /testsuite 2013-06-19 Manuel Lopez-Ibanez m...@gcc.gnu.org PR c++/57638 * g++.dg/template/error53.C: New. Index: cp/pt.c === --- cp/pt.c (revision 200178) +++ cp/pt.c (working copy) @@ -16961,7 +16961,7 @@ unify (tree tparms, tree targs, tree parm, tree ar later. */ return unify_success (explain_p); else - return unify_type_mismatch (explain_p, tparm, arg); + return unify_type_mismatch (explain_p, tparm, TREE_TYPE (arg)); /* If ARG is a parameter pack or an expansion, we cannot unify against it unless PARM is also a parameter pack. */ Index: testsuite/g++.dg/template/error53.C === --- testsuite/g++.dg/template/error53.C (revision 0) +++ testsuite/g++.dg/template/error53.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/57638 + +templateint x +struct S {}; + +templatelong long i +void g(Si); + +void f() +{ + S1000 t; + g(t); // { dg-error no matching } +} // { dg-message mismatched types 'long long int' and 'int' { target *-*-* } 12 }
[PATCH] Improve folding of bitwise ops feeding conditionals for single bit types
The notable changes since the last version: First, it should properly handle signed single bit types, though I haven't tested it with real code. Second, the transformation is only applied when the result is used in a conditional. Thus it's much less likely to pessimize targets with and-not instructions as it's highly likely we'll eliminate two gimple statements rather than just one. Other comments (such as not needing to retrieve gsi_stmt) were also addressed. Testcase was renamed, but is otherwise unchanged. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the trunk? * tree-ssa-forwprop.c (simplify_bitwise_binary_boolean): New function. (simplify_bitwise_binary): Use it to simpify certain binary ops on booleans. * gcc.dg/tree-ssa/forwprop-28.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c new file mode 100644 index 000..2c42065 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-28.c @@ -0,0 +1,76 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-forwprop1 } */ + +extern char * frob (void); +extern _Bool testit(void); + +test (int code) +{ + char * temp = frob();; + int rotate = (code == 22); + if (temp == 0 !rotate) + oof(); +} + +test_2 (int code) +{ + char * temp = frob(); + int rotate = (code == 22); + if (!rotate temp == 0) + oof(); +} + + +test_3 (int code) +{ + char * temp = frob(); + int rotate = (code == 22); + if (!rotate || temp == 0) + oof(); +} + + +test_4 (int code) +{ + char * temp = frob(); + int rotate = (code == 22); + if (temp == 0 || !rotate) + oof(); +} + + +test_5 (int code) +{ + _Bool temp = testit();; + _Bool rotate = (code == 22); + if (temp == 0 !rotate) + oof(); +} + +test_6 (int code) +{ + _Bool temp = testit(); + _Bool rotate = (code == 22); + if (!rotate temp == 0) + oof(); +} + + +test_7 (int code) +{ + _Bool temp = testit(); + _Bool rotate = (code == 22); + if (!rotate || temp == 0) + oof(); +} + + +test_8 (int code) +{ + _Bool temp = testit(); + _Bool rotate = (code == 22); + if (temp == 0 || !rotate) + oof(); +} + +/* { dg-final { scan-tree-dump-times Replaced 8 forwprop1} } */ diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c index c6a7eaf..29a0bb7 100644 --- a/gcc/tree-ssa-forwprop.c +++ b/gcc/tree-ssa-forwprop.c @@ -1870,6 +1870,52 @@ hoist_conversion_for_bitop_p (tree to, tree from) return false; } +/* GSI points to a statement of the form + + result = OP0 CODE OP1 + + Where OP0 and OP1 are single bit SSA_NAMEs and CODE is either + BIT_AND_EXPR or BIT_IOR_EXPR. + + If OP0 is fed by a bitwise negation of another single bit SSA_NAME, + then we can simplify the two statements into a single LT_EXPR or LE_EXPR + when code is BIT_AND_EXPR and BIT_IOR_EXPR respectively. + + If a simplification is mode, return TRUE, else return FALSE. */ +static bool +simplify_bitwise_binary_boolean (gimple_stmt_iterator *gsi, +enum tree_code code, +tree op0, tree op1) +{ + gimple op0_def_stmt = SSA_NAME_DEF_STMT (op0); + + if (!is_gimple_assign (op0_def_stmt) + || (gimple_assign_rhs_code (op0_def_stmt) != BIT_NOT_EXPR)) +return false; + + tree x = gimple_assign_rhs1 (op0_def_stmt); + if (TREE_CODE (x) == SSA_NAME + INTEGRAL_TYPE_P (TREE_TYPE (x)) + TYPE_PRECISION (TREE_TYPE (x)) == 1 + TYPE_UNSIGNED (TREE_TYPE (x)) == TYPE_UNSIGNED (TREE_TYPE (op1))) +{ + enum tree_code newcode; + + gimple stmt = gsi_stmt (*gsi); + gimple_assign_set_rhs1 (stmt, x); + gimple_assign_set_rhs2 (stmt, op1); + if (code == BIT_AND_EXPR) + newcode = TYPE_UNSIGNED (TREE_TYPE (x)) ? LT_EXPR : GT_EXPR; + else + newcode = TYPE_UNSIGNED (TREE_TYPE (x)) ? LE_EXPR : GE_EXPR; + gimple_assign_set_rhs_code (stmt, newcode); + update_stmt (stmt); + return true; +} + return false; + +} + /* Simplify bitwise binary operations. Return true if a transformation applied, otherwise return false. */ @@ -2117,8 +2163,44 @@ simplify_bitwise_binary (gimple_stmt_iterator *gsi) return true; } } -} + /* If arg1 and arg2 are booleans (or any single bit type) + then try to simplify: + + (~X Y) - X Y + (X ~Y) - Y X + (~X | Y) - X = Y + (X | ~Y) - Y = X + + But only do this if our result feeds into a comparison as + this transformation is not always a win, particularly on + targets with and-not instructions. */ + if (TREE_CODE (arg1) == SSA_NAME + TREE_CODE (arg2) == SSA_NAME + INTEGRAL_TYPE_P (TREE_TYPE (arg1)) + TYPE_PRECISION (TREE_TYPE (arg1)) == 1 + TYPE_PRECISION (TREE_TYPE (arg2)) == 1 + (TYPE_UNSIGNED (TREE_TYPE (arg1)) + ==
Re: [PATCH] PR57518, RA generated redundent code
Ping. On Wed, Jun 12, 2013 at 2:44 PM, Wei Mi w...@google.com wrote: Hi, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 pr57518 happened because update_equiv_regs in IRA marked a reg equivalent with a mem, lowered its mem_cost in scan_one_insn, set NO_REGS to its rclass, but didn't consider the reg was used in paradoxical subreg which prevented the reg from being replaced by mem in LRA phase. This patch is to check whether a reg is used in a paradoxical subreg in update_equiv_regs before reg is set as equivalent to a mem. bootstrap and regression test on x86_64-linux-gnu ok. Is it ok for trunk and gcc-4.8 branch? Thanks, Wei.