Re: [PATCH] fixup libobjc usage of PCC_BITFIELD_TYPE_MATTERS
On Sun, May 03, 2015 at 10:59:46AM +0200, Andreas Schwab wrote:
> tbsaunde+...@tbsaunde.org writes:
>
>> +AC_DEFUN([gt_BITFIELD_TYPE_MATTERS],
>> +[
>> +  AC_CACHE_CHECK([if the type of bitfields matters], gt_cv_bitfield_type_matters,
>> +  [
>> +    AC_TRY_COMPILE(
>> +      [struct foo1 { char x; char :0; char y; };
>> +struct foo2 { char x; int :0; char y; };
>> +int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
>> +int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1]; ],
>> +      [], gt_cv_bitfield_type_matters=yes, gt_cv_bitfield_type_matters=no)
>> +  ])
>> +  if test $gt_cv_bitfield_type_matters = yes; then
>> +    AC_DEFINE(HAVE_BITFIELD_TYPE_MATTERS, 1,
>> +      [Define if the type of bitfields affects alignment.])
>> +  fi
>> +])
>
> gcc/config/aarch64/aarch64.h:#define PCC_BITFIELD_TYPE_MATTERS 1
>
> configure:11554: /opt/gcc/gcc-20150503/Build/./gcc/xgcc -B/opt/gcc/gcc-20150503/Build/./gcc/ -B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem /usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include-c -O2 -g conftest.c 5
> conftest.c:27:5: error: size of array 'foo2test' is negative
>  int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1];
>      ^
> configure:11554: $? = 1

ok, a quick test seems to show Jakub's version of the test works in this case, so let's try that.

Trev

> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> And now for something completely different.
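For readers who want to see what the configure probe quoted above is measuring, here is a minimal C sketch; it is not part of the patch. It reuses the two structs from the quoted test, while the helper names (offset_of_y_char, offset_of_y_int, bitfield_type_matters, check_layout_invariants) are made up for illustration. The actual sizes depend on the target ABI (the aarch64 log above shows the sizeof (struct foo2) == 5 assumption failing there), so the sketch only reports what the current target does and asserts nothing about specific sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Same shapes as in the quoted configure test.  */
struct foo1 { char x; char : 0; char y; };
struct foo2 { char x; int  : 0; char y; };

/* Offset of y when the zero-width bit-field is declared as char.  */
size_t offset_of_y_char (void) { return offsetof (struct foo1, y); }

/* Offset of y when the zero-width bit-field is declared as int.  */
size_t offset_of_y_int (void) { return offsetof (struct foo2, y); }

/* Returns 1 if declaring the zero-width bit-field as int instead of
   char changes the layout on this target, 0 otherwise.  */
int bitfield_type_matters (void)
{
  return offsetof (struct foo2, y) != offsetof (struct foo1, y)
         || sizeof (struct foo2) != sizeof (struct foo1);
}

/* Only target-independent facts are asserted here: y follows x, and
   each struct is large enough to hold its last member.  */
void check_layout_invariants (void)
{
  assert (offset_of_y_char () >= 1);
  assert (sizeof (struct foo1) > offset_of_y_char ());
  assert (sizeof (struct foo2) > offset_of_y_int ());
}
```

Printing sizeof (struct foo2) on a few targets is a quick way to see why a hard-coded 5 is fragile as a probe.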
Re: [PATCH, PR65915] Fix float conversion split.
On Thu, Apr 30, 2015 at 5:18 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Apr 30, 2015 at 8:15 AM, Ilya Tocar tocarip.in...@gmail.com wrote: Hi, Looks like I missed some splits, which caused PR65915. Patch below fixes it. Ok for trunk? 2015-04-28 Ilya Tocar ilya.to...@intel.com * config/i386/i386.md (define_split): Check for xmm16+, when splitting scalar float conversion. --- gcc/config/i386/i386.md | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 937871a..af1cd9b 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4897,7 +4897,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed SSE_REG_P (operands[0]) -(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -4921,7 +4923,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_SSE_PARTIAL_REG_DEPENDENCY optimize_function_for_speed_p (cfun) -reload_completed SSE_REG_P (operands[0]) +reload_completed SSE_REG_P (operands[0]) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { const machine_mode vmode = MODEF:ssevecmodemode; -- 1.8.3.1 Updated version below (now with test). 
--- gcc/config/i386/i386.md | 8 ++-- gcc/config/i386/sse.md | 6 +++--- gcc/testsuite/gcc.target/i386/pr65915.c | 6 ++ 3 files changed, 15 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr65915.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 937871a..af1cd9b 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4897,7 +4897,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed SSE_REG_P (operands[0]) -(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -4921,7 +4923,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_SSE_PARTIAL_REG_DEPENDENCY optimize_function_for_speed_p (cfun) -reload_completed SSE_REG_P (operands[0]) +reload_completed SSE_REG_P (operands[0]) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { const machine_mode vmode = MODEF:ssevecmodemode; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9b7009a..c61098d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4258,11 +4258,11 @@ (set_attr mode TI)]) (define_insn sse2_cvtsi2sd - [(set (match_operand:V2DF 0 register_operand =x,x,x) + [(set (match_operand:V2DF 0 register_operand =x,x,v) (vec_merge:V2DF (vec_duplicate:V2DF (float:DF (match_operand:SI 2 nonimmediate_operand r,m,rm))) - (match_operand:V2DF 1 register_operand 0,0,x) + (match_operand:V2DF 1 register_operand 0,0,v) (const_int 1)))] TARGET_SSE2 @ @@ -4275,7 +4275,7 @@ (set_attr amdfam10_decode vector,double,*) (set_attr bdver1_decode double,direct,*) (set_attr btver2_decode double,double,double) - (set_attr prefix orig,orig,vex) + (set_attr prefix orig,orig,maybe_evex) (set_attr mode DF)]) (define_insn sse2_cvtsi2sdqround_name diff --git 
a/gcc/testsuite/gcc.target/i386/pr65915.c b/gcc/testsuite/gcc.target/i386/pr65915.c
new file mode 100644
index 000..990c5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65915.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -fpic -mcmodel=medium" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target lp64 } */
+
+#include "avx512f-vrndscalepd-2.c"

Missing testcases for

FAIL: gcc.target/i386/avx512f-vrndscaleps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512vl-vrndscaleps-2.c (internal compiler error)

The attached test is OK, since these two would test for the same problem.

as well as ChangeLog entries.

ChangeLog is missing. Please add PR number and describe *each* change accurately. You can say (vector convert to float splitter) for this particular nameless splitter.

Please repost the patch with updated ChangeLog.

Uros.
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Sat, 2 May 2015, Jakub Jelinek wrote: Hi! This is an attempt to fix the following testcase (reduced from gsoap) similarly how you've fixed another issue with r221795 other AAPCS regressions introduced with r221348 change. This patch passed bootstrap/regtest on {x86_64,i686,armv7hl,aarch64,powerpc64{,le},s390{,x}}-linux. Though, it still doesn't fix profiledbootstrap on armv7hl that is broken since r221348, so other issues are lurking in there, and I must say I'm not entirely sure about this, because it changes alignment even when the original access had higher alignment. I was trying something like: struct B { char *a, *b; }; typedef struct B C __attribute__((aligned (8))); struct A { C a; int b; long long c; }; char v[3]; __attribute__((noinline, noclone)) void fn1 (C x, C y) { if (x.a != v[1] || y.a != v[2]) __builtin_abort (); v[1]++; } __attribute__((noinline, noclone)) int fn2 (C x) { asm volatile ( : +g (x.a) : : memory); asm volatile ( : +g (x.b) : : memory); return x.a == v[0]; } __attribute__((noinline, noclone)) void fn3 (const char *x) { if (x[0] != 0) __builtin_abort (); } static struct A foo (const char *x, struct A y, struct A z) { struct A r = { { 0, 0 }, 0, 0 }; if (y.b z.b) { if (fn2 (y.a) fn2 (z.a)) switch (x[0]) { case '|': break; default: fn3 (x); } fn1 (y.a, z.a); } return r; } __attribute__((noinline, noclone)) int bar (int x, struct A *y) { switch (x) { case 219: foo (+, y[-2], y[0]); case 220: foo (-, y[-2], y[0]); } } int main () { struct A a[3] = { { { v[1], v[0] }, 1, 1LL }, { { v[0], v[0] }, 0, 0LL }, { { v[2], v[0] }, 2, 2LL } }; bar (220, a + 2); if (v[1] != 1) __builtin_abort (); return 0; } and this patch indeed changes the register passing, eventhough it probably shouldn't (though, the testcase doesn't fail). 
Wouldn't it be possible to preserve the original type (before we call build_aligned_type on it) somewhere in SRA data structures, perhaps keep expr (the new MEM_REF) use the aligned type, but type field be the non-aligned one? Not sure how this helps when SRA tears apart the parameter. That is, isn't the important thing that both the IPA modified function argument types/decls have the same type as the types of the parameters SRA ends up passing? (as far as alignment goes?) Yes, of course using natural alignment makes sure that the backend can handle alignment properly and we don't run into oddball bugs here. 2015-05-02 Jakub Jelinek ja...@redhat.com PR target/65956 * tree-sra.c (turn_representatives_into_adjustments): For adj.type, use TYPE_MAIN_VARIANT of repr-type with TYPE_QUALS. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/tree-sra.c.jj 2015-04-20 14:35:47.0 +0200 +++ gcc/tree-sra.c2015-05-01 01:08:34.092636496 +0200 @@ -4427,7 +4427,11 @@ turn_representatives_into_adjustments (v gcc_assert (repr-base == parm); adj.base_index = index; adj.base = repr-base; - adj.type = repr-type; + /* Drop any special alignment on the type if it's not on the + main variant. This avoids issues with weirdo ABIs like + AAPCS. */ + adj.type = build_qualified_type (TYPE_MAIN_VARIANT (repr-type), +TYPE_QUALS (repr-type)); So - this changes the function argument type of the clone? Does it also change the type of the value we pass to the function? That is, why drop the alignment here but not avoid attaching it to repr-type in the first place as my fix for the other issue did? Doesn't the above just make it inconsistent by default? There is also the correctness issue of under-aligned types (which was what the original code using build_aligned_type cared for - before I fixed it to also preserve over-alignment). That said - somewhere we create the register we use for passing the argument, and only the type of that register needs fixing IMHO. 
We also have ptype = adj-type; if (is_gimple_reg_type (ptype)) { unsigned malign = GET_MODE_ALIGNMENT (TYPE_MODE (ptype)); if (TYPE_ALIGN (ptype) malign) ptype = build_aligned_type (ptype, malign); in ipa_modify_formal_parameters. That looks odd for by-value passing as well. When modifying the function bodies we simply take what was set in -new_decl which we'd populate above in ipa_modify_formal_parameters. It seems to me that ipa_modify_expr should look to preserve alignment at the callers site (for loading into the regs we pass) for non-reference passing. Esp. if (cand-by_ref) src = build_simple_mem_ref (cand-new_decl); looks bogus in this
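As a plain C illustration of the distinction this thread keeps returning to (an over-aligned typedef versus the underlying "main variant" type), here is a small sketch. It says nothing about SRA internals. The struct mirrors struct B from the quoted testcase, but the typedef name C16 and the 16-byte alignment are assumptions chosen so that the difference is visible on both 32-bit and 64-bit targets (the testcase itself uses aligned (8)). It relies on the GNU aligned attribute.

```c
#include <assert.h>
#include <stddef.h>

struct B { char *a, *b; };

/* Hypothetical over-aligned variant of struct B (the quoted testcase
   uses aligned (8); 16 is used here only to make the effect visible
   on LP64 targets as well).  */
typedef struct B C16 __attribute__ ((aligned (16)));

/* Alignment of the underlying struct type.  */
size_t natural_alignment (void) { return _Alignof (struct B); }

/* Alignment requested through the typedef.  */
size_t variant_alignment (void) { return _Alignof (C16); }

/* The typedef carries the extra alignment; the struct itself does not.  */
void check_alignment (void)
{
  assert (variant_alignment () == 16);
  assert (natural_alignment () == _Alignof (char *));
}
```

An ABI that chooses argument registers based on type alignment can treat a C16 argument differently from a struct B argument, which is roughly why it matters which of the two types ends up on the rewritten parameter.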
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
Hi! On Thu, 30 Apr 2015 14:47:03 +0200, I wrote: Here is a patch, prepared by Jim Norris, to fix dg-shouldfail usage in OpenACC libgomp tests. It introduces two regressions (that is, makes the existing errors visible), which shall then be fixed later on: libgomp.oacc-c-c++-common/lib-3.c, and libgomp.oacc-c-c++-common/lib-42.c. As obvious, committed to trunk in r222620: [...] So much for obvious ;-) -- https://gcc.gnu.org/PR65993. Dave, would you please test the following patch, and report the regression status compared to before r222620? (Compared to your existing r222021 results, as posted in the PR, for example.) Additionally to the %p format specifier printing a 0x prefix vs. not doing that, I've also changed the expected (nil) output for NULL pointers to instead match basically everything. libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c | 4 ++-- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c | 4 ++-- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-8.c | 4 ++-- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c | 2 +- 
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-48.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-52.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-53.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-54.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-57.c | 2 +- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-58.c | 2 +- libgomp/testsuite/libgomp.oacc-fortran/data-already-1.f | 2 +- libgomp/testsuite/libgomp.oacc-fortran/data-already-2.f | 2 +- libgomp/testsuite/libgomp.oacc-fortran/data-already-8.f | 2 +- 35 files changed, 38 insertions(+), 38 deletions(-) diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c index fec2214..c0a5d00 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c @@ -64,5 +64,5 @@ main (int argc, char **argv) return 0; } -/* { dg-output Trying to map into device \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) object when \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) is already mapped } +/* { dg-output Trying to map into device \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) object when \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) is already mapped } */ /* { dg-shouldfail } */ diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c index 83c0a42..0c61a66 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c @@ -15,5 +15,5 @@ 
main (int argc, char *argv[]) return 0; } -/* { dg-shouldfail } - { dg-output Trying to map into device .* object when .* is already mapped } */ +/* { dg-output Trying to map into device \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) object when \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) is already mapped } */ +/* { dg-shouldfail } */ diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c index 137d8ce..cd9fea3 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c @@ -12,5 +12,5 @@ main (int argc, char *argv[]) return 0; } -/*
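The relaxed dg-output patterns above accept an address with or without a 0x prefix by matching the character class [0-9a-fA-FxX]+. The following C sketch mirrors that character class outside of DejaGnu, purely as an illustration; the function name is_address_token is hypothetical and the code is not taken from the testsuite.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if S is non-empty and consists only of characters from
   [0-9a-fA-FxX], i.e. what the relaxed pattern accepts for one
   address; returns 0 otherwise.  */
int is_address_token (const char *s)
{
  if (s == NULL || *s == '\0')
    return 0;
  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (!isxdigit (c) && c != 'x' && c != 'X')
        return 0;
    }
  return 1;
}

/* Both %p styles mentioned in the message pass; "(nil)" does not.  */
void check_address_tokens (void)
{
  assert (is_address_token ("0x7ffd12ab"));
  assert (is_address_token ("7FFD12AB"));
  assert (!is_address_token ("(nil)"));
}
```

Note that a string such as "(nil)" does not fit this class, so NULL-pointer output needs a looser pattern, as the message describes.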
Re: [rs6000] Fix compare debug failure on AIX
On Mon, May 4, 2015 at 2:32 AM, David Edelsohn dje@gmail.com wrote:
> On Sat, May 2, 2015 at 6:04 AM, Eric Botcazou ebotca...@adacore.com wrote:
>>> Why should GCC unnecessarily create stack frames to avoid
>>> compare-debug testcase failures?
>>
>> I'm not sure I understand the question... compare-debug failures are
>> failures (-g is not supposed to change the generated code and this
>> XCOFF-specific bug was reported to us) so they need to be fixed.
>>
>> From there on, as Alan said, there are 2 cases: either AIX needs a frame
>> for debugging or it doesn't. If the latter, then the lines can simply be
>> deleted. If the former, we have to draw a line somewhere; Alan suggests
>> always creating a frame while I suggest creating it only at -O0 and -Og.
>
> I believe that AIX does need a frame for debugging. I don't remember
> the exact reason off hand.
>
> I'm sorry that XCOFF debugging changes the generated code (only in the
> sense of allocating a frame), but that is a system dependency. It's
> been this way for over 20 years. I see no reason to produce worse
> code at -O0 when not debugging simply to make testcases happier.

The simple reason is because it is policy for GCC to generate the same code with -g0 and -g. You can't simply say you don't care. You never want to run into the situation that you miscompile a program with -g0 but not with -g because that's very much no fun to debug.

Yes, I don't think we have this policy written down anywhere - something we should improve on.

Richard.

> By the way, I'm still waiting for the DWARF debugging patches from
> Adacore compatible with AIX as and ld. DWARF debugging would not
> require pushing a frame, and would resolve the failure when testing
> with DWARF. The patch would be adjusted to only push a frame when
> writing XCOFF debugging.
>
> - David
Re: [patch] Perform anonymous constant propagation during inlining
On Fri, May 1, 2015 at 8:09 PM, Eric Botcazou ebotca...@adacore.com wrote:
>> OK, how aggressive then? We could as well do the substitution for all
>> copies:
>>
>>       /* For EXPAND_INITIALIZER try harder to get something simpler.
>>          Otherwise, substitute copies on the RHS, this can propagate
>>          constants at -O0 and thus simplify arithmetic operations.  */
>>       if (g == NULL
>>           && !SSA_NAME_IS_DEFAULT_DEF (exp)
>>           && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
>>           && (modifier == EXPAND_INITIALIZER
>>               || (modifier != EXPAND_WRITE
>>                   && gimple_assign_copy_p (SSA_NAME_DEF_STMT (exp))))
>>           && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
>>         g = SSA_NAME_DEF_STMT (exp);
>
> This doesn't work (this generates wrong code because this creates
> overlapping live ranges for SSA_NAMEs with the same base variable).
> Here's the latest working version, all the predicates and accessors
> used are inlined.

Hum, the fact that your earlier version created wrong code (get_gimple_for_ssa_name already returned false here) points at some issues with EXPAND_INITIALIZER as well, no...?

That said, the path you add is certainly safe (though maybe we want to change get_gimple_for_ssa_name to return tcc_constant single-use defs even if TER is disabled (thus at -O0 - and only at -O0, otherwise it shouldn't happen)). That would cover more cases of get_gimple_for_ssa_name uses (I can see optimize_bitfield_expansion for example...)

So, your patch is ok for trunk unless you want to explore the get_gimple_for_ssa_name improvement suggestion.

I also wonder about EXPAND_INITIALIZER creating overlapping live ranges (or moving loads across stores).

Thanks,
Richard.

> Tested on x86_64-suse-linux, OK for the mainline?
>
> 2015-05-01 Eric Botcazou ebotca...@adacore.com
>
>         * expr.c (expand_expr_real_1) SSA_NAME: Try to substitute
>         constants on the RHS of expressions.
>         * gimple-expr.h (is_gimple_constant): Reorder.
>
> --
> Eric Botcazou
[PING^4] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
Hi,

This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html

Regtested with aarch64-linux-gnu on QEMU. The patch also shows no regressions on the big-endian aarch64_be-linux-gnu target.

OK for the trunk?

Thanks.
Jiang jiji
Re: PR 64454: Improve VRP for %
On Sat, May 2, 2015 at 12:46 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, this patch tries to tighten a bit the range estimate for x%y. slp-perm-7.c started failing by vectorizing more than expected, I assumed it was a good thing and updated the test. I am less conservative than Jakub with division by 0, but I still don't really understand how empty ranges are supposed to be represented in VRP. Bootstrap+testsuite on x86_64-linux-gnu. Hmm, so I don't like how you (continute to) use trees for the constant computations. wide-ints would be a better fit today. I also notice that fold_unary_to_constant can return NULL_TREE and neither the old nor your code handles that. empty ranges are basically UNDEFINED. Aren't you pessimizing the case where the old code used value_range_nonnegative_p() by just using TYPE_UNSIGNED? Thanks, Richard. 2015-05-02 Marc Glisse marc.gli...@inria.fr PR tree-optimization/64454 gcc/ * tree-vrp.c (extract_range_from_binary_expr_1) TRUNC_MOD_EXPR: Rewrite. gcc/testsuite/ * gcc.dg/tree-ssa/vrp97.c: New file. * gcc.dg/vect/slp-perm-7.c: Update. 
-- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/vrp97.c === --- gcc/testsuite/gcc.dg/tree-ssa/vrp97.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/vrp97.c (working copy) @@ -0,0 +1,13 @@ +/* PR tree-optimization/64454 */ +/* { dg-options -O2 -fdump-tree-vrp1 } */ + +int f(int a, int b) +{ +if (a -3 || a 13) __builtin_unreachable(); +if (b -6 || b 9) __builtin_unreachable(); +int c = a % b; +return c = -3 c = 8; +} + +/* { dg-final { scan-tree-dump return 1; vrp1 } } */ +/* { dg-final { cleanup-tree-dump vrp1 } } */ Index: gcc/testsuite/gcc.dg/vect/slp-perm-7.c === --- gcc/testsuite/gcc.dg/vect/slp-perm-7.c (revision 222708) +++ gcc/testsuite/gcc.dg/vect/slp-perm-7.c (working copy) @@ -63,15 +63,15 @@ int main (int argc, const char* argv[]) foo (input, output, input2, output2); for (i = 0; i N; i++) if (output[i] != check_results[i] || output2[i] != check_results2[i]) abort (); return 0; } -/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target vect_perm } } } */ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect { target vect_perm } } } */ /* { dg-final { scan-tree-dump-times vectorizing stmts using SLP 1 vect { target vect_perm } } } */ /* { dg-final { cleanup-tree-dump vect } } */ Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 222708) +++ gcc/tree-vrp.c (working copy) @@ -3189,40 +3189,83 @@ extract_range_from_binary_expr_1 (value_ } } else { extract_range_from_multiplicative_op_1 (vr, code, vr0, vr1); return; } } else if (code == TRUNC_MOD_EXPR) { - if (vr1.type != VR_RANGE - || range_includes_zero_p (vr1.min, vr1.max) != 0 - || vrp_val_is_min (vr1.min)) + if (range_is_null (vr1)) + { + set_value_range_to_undefined (vr); + return; + } + // Some propagation of symbolic ranges should be possible + // at least in the unsigned case. 
+ bool has_vr0 = vr0.type == VR_RANGE !symbolic_range_p (vr0); + bool has_vr1 = vr1.type == VR_RANGE !symbolic_range_p (vr1); + if (!has_vr0 !has_vr1) { set_value_range_to_varying (vr); return; } type = VR_RANGE; - /* Compute MAX |vr1.min|, |vr1.max| - 1. */ - max = fold_unary_to_constant (ABS_EXPR, expr_type, vr1.min); - if (tree_int_cst_lt (max, vr1.max)) - max = vr1.max; - max = int_const_binop (MINUS_EXPR, max, build_int_cst (TREE_TYPE (max), 1)); - /* If the dividend is non-negative the modulus will be -non-negative as well. */ - if (TYPE_UNSIGNED (expr_type) - || value_range_nonnegative_p (vr0)) - min = build_int_cst (TREE_TYPE (max), 0); + if (TYPE_UNSIGNED (expr_type)) + { + // A % B is at most A and smaller than B. + min = build_int_cst (expr_type, 0); + if (has_vr0 (!has_vr1 || tree_int_cst_lt (vr0.max, vr1.max))) + max = vr0.max; + else + max = int_const_binop (MINUS_EXPR, vr1.max, + build_int_cst (expr_type, 1)); + } else - min = fold_unary_to_constant (NEGATE_EXPR, expr_type, max); + { + tree min1 = NULL_TREE; + tree max1 = NULL_TREE; + if (has_vr1) + { + // ABS (A % B) ABS (B) + max1 = fold_unary_to_constant (ABS_EXPR, expr_type, vr1.min); + if (tree_int_cst_lt (max1,
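To make the range claim in the new vrp97.c test concrete, here is a brute-force C sketch, independent of the VRP code in the patch. It assumes the garbled test reads a in [-3, 13] and b in [-6, 9] with an expected result range of [-3, 8]; that reading is inferred from the surviving constants. The names struct range and mod_range are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

struct range { int min, max; };

/* Brute-force the range of a % b for a in [amin, amax] and
   b in [bmin, bmax].  b == 0 is skipped because the operation is
   undefined there.  Intended for small ranges only; INT_MIN % -1
   is not handled.  */
struct range mod_range (int amin, int amax, int bmin, int bmax)
{
  struct range r = { INT_MAX, INT_MIN };

  assert (amin <= amax && bmin <= bmax);
  for (int a = amin; a <= amax; a++)
    for (int b = bmin; b <= bmax; b++)
      {
        if (b == 0)
          continue;
        int c = a % b;   /* C99: result has the sign of the dividend.  */
        if (c < r.min)
          r.min = c;
        if (c > r.max)
          r.max = c;
      }
  return r;
}
```

The result is bounded on each side both by the dividend and by |b| - 1: the minimum is limited by the most negative dividend (-3), and the maximum by the largest |b| minus one (8), since that is smaller than the largest dividend (13).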
Re: [PATCH, AArch64] Add Cortex-A53 erratum 843419 configure-time option
Hi Marcus, On 1 May 2015 at 17:18, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 1 May 2015 at 14:56, Yvan Roux yvan.r...@linaro.org wrote: 2015-05-01 Yvan Roux yvan.r...@linaro.org * configure.ac: Add --enable-fix-cortex-a53-843419 option. * configure: Regenerate. * config/aarch64/aarch64-elf-raw.h (CA53_ERR_843419_SPEC): Define. (LINK_SPEC): Include CA53_ERR_843419_SPEC. * config/aarch64/aarch64-linux.h (CA53_ERR_843419_SPEC): Define. (LINK_SPEC): Include CA53_ERR_843419_SPEC. * doc/install.texi (aarch64*-*-*): Document new --enable-fix-cortex-a53-843419 option * config/aarch64/aarch64.opt (mfix-cortex-a53-843419): New option. * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-843419 and -mno-fix-cortex-a53-8434199 options. +@option{--enable-fix-cortex-a53-843419} option. This erratum workaround is +made at link time and enabling it by default in GCC will only pass the How about something like The workaround is applied at link time. Enabling the workaround will cause GCC to pass the relevant option to the linker. ? Yes this is a better formulation. +corresponding flag to the linker. It can be explicitly disabled during +compilation by passing the @option{-mno-fix-cortex-a53-835769} option. Copy paste error here with the previous errata number. Here is the patch with the modifications. Is it needed to backport it into 4.9 and 5.1 branches ? 
Cheers, Yvan diff --git a/gcc/config/aarch64/aarch64-elf-raw.h b/gcc/config/aarch64/aarch64-elf-raw.h index ebeeb50..bd5e51c 100644 --- a/gcc/config/aarch64/aarch64-elf-raw.h +++ b/gcc/config/aarch64/aarch64-elf-raw.h @@ -35,10 +35,19 @@ %{mfix-cortex-a53-835769:--fix-cortex-a53-835769} #endif +#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT +#define CA53_ERR_843419_SPEC \ + %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419} +#else +#define CA53_ERR_843419_SPEC \ + %{mfix-cortex-a53-843419:--fix-cortex-a53-843419} +#endif + #ifndef LINK_SPEC #define LINK_SPEC %{mbig-endian:-EB} %{mlittle-endian:-EL} -X \ -maarch64elf%{mabi=ilp32*:32}%{mbig-endian:b} \ - CA53_ERR_835769_SPEC + CA53_ERR_835769_SPEC \ + CA53_ERR_843419_SPEC #endif #endif /* GCC_AARCH64_ELF_RAW_H */ diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 9abb252..7973268 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -49,8 +49,17 @@ %{mfix-cortex-a53-835769:--fix-cortex-a53-835769} #endif +#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT +#define CA53_ERR_843419_SPEC \ + %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419} +#else +#define CA53_ERR_843419_SPEC \ + %{mfix-cortex-a53-843419:--fix-cortex-a53-843419} +#endif + #define LINK_SPEC LINUX_TARGET_LINK_SPEC \ - CA53_ERR_835769_SPEC + CA53_ERR_835769_SPEC \ + CA53_ERR_843419_SPEC #define GNU_USER_TARGET_MATHFILE_SPEC \ %{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index f2ef124..6d72ac2 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -71,6 +71,10 @@ mfix-cortex-a53-835769 Target Report Var(aarch64_fix_a53_err835769) Init(2) Workaround for ARM Cortex-A53 Erratum number 835769 +mfix-cortex-a53-843419 +Target Report +Workaround for ARM Cortex-A53 Erratum number 843419 + mlittle-endian Target Report RejectNegative InverseMask(BIG_END) Assume target CPU is 
configured as little endian diff --git a/gcc/configure b/gcc/configure index 84f58ce..e563e94 100755 --- a/gcc/configure +++ b/gcc/configure @@ -923,6 +923,7 @@ enable_gnu_indirect_function enable_initfini_array enable_comdat enable_fix_cortex_a53_835769 +enable_fix_cortex_a53_843419 with_glibc_version enable_gnu_unique_object enable_linker_build_id @@ -1648,6 +1649,14 @@ Optional Features: disable workaround for AArch64 Cortex-A53 erratum 835769 by default + + --enable-fix-cortex-a53-843419 + enable workaround for AArch64 Cortex-A53 erratum + 843419 by default + --disable-fix-cortex-a53-843419 + disable workaround for AArch64 Cortex-A53 erratum + 843419 by default + --enable-gnu-unique-object enable the use of the @gnu_unique_object ELF extension on glibc systems @@ -18153,7 +18162,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18156 configure +#line 18165 configure #include confdefs.h #if HAVE_DLFCN_H @@ -18259,7 +18268,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18262 configure +#line 18271 configure
Re: [RFA] More type narrowing in match.pd V2
On Sat, May 2, 2015 at 2:36 AM, Jeff Law l...@redhat.com wrote: Here's an updated patch to add more type narrowing to match.pd. Changes since the last version: Slight refactoring of the condition by using types_match as suggested by Richi. I also applied the new types_match to 2 other patterns in match.pd where it seemed clearly appropriate. Additionally the transformation is restricted by using the new single_use predicate. I didn't change other patterns in match.pd to use the new single_use predicate. But some probably could be changed. This (of course) continues to pass the bootstrap and regression check for x86-linux-gnu. There's still a ton of work to do in this space. This is meant to be an incremental stand-alone improvement. OK now? Ok with the {gimple,generic}-match-head.c changes mentioned in the ChangeLog. Thanks, Richard. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e006b26..5ee89de 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2015-05-01 Jeff Law l...@redhat.com + + * match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New + simplifier to narrow arithmetic. + 2015-05-01 Rasmus Villemoes r...@rasmusvillemoes.dk * match.pd: New simplification patterns. diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c index daa56aa..303b237 100644 --- a/gcc/generic-match-head.c +++ b/gcc/generic-match-head.c @@ -70,4 +70,20 @@ along with GCC; see the file COPYING3. If not see #include dumpfile.h #include generic-match.h +/* Routine to determine if the types T1 and T2 are effectively + the same for GENERIC. */ +inline bool +types_match (tree t1, tree t2) +{ + return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2); +} + +/* Return if T has a single use. For GENERIC, we assume this is + always true. 
*/ + +inline bool +single_use (tree t) +{ + return true; +} diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index c7b2f95..dc13218 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -861,3 +861,21 @@ do_valueize (tree (*valueize)(tree), tree op) return op; } +/* Routine to determine if the types T1 and T2 are effectively + the same for GIMPLE. */ + +inline bool +types_match (tree t1, tree t2) +{ + return types_compatible_p (t1, t2); +} + +/* Return if T has a single use. For GIMPLE, we also allow any + non-SSA_NAME (ie constants) and zero uses to cope with uses + that aren't linked up yet. */ + +inline bool +single_use (tree t) +{ + return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t); +} diff --git a/gcc/match.pd b/gcc/match.pd index 87ecaf1..51a950a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -289,8 +289,7 @@ along with GCC; see the file COPYING3. If not see (if (((TREE_CODE (@1) == INTEGER_CST INTEGRAL_TYPE_P (TREE_TYPE (@0)) int_fits_type_p (@1, TREE_TYPE (@0))) - || (GIMPLE types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))) - || (GENERIC TREE_TYPE (@0) == TREE_TYPE (@1))) + || types_match (TREE_TYPE (@0), TREE_TYPE (@1))) /* ??? This transform conflicts with fold-const.c doing Convert (T)(x c) into (T)x (T)c, if c is an integer constants (if x has signed type, the sign bit cannot be set @@ -949,8 +948,7 @@ along with GCC; see the file COPYING3. If not see /* Unordered tests if either argument is a NaN. */ (simplify (bit_ior (unordered @0 @0) (unordered @1 @1)) - (if ((GIMPLE types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1))) - || (GENERIC TREE_TYPE (@0) == TREE_TYPE (@1))) + (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))) (unordered @0 @1))) (simplify (bit_ior:c (unordered @0 @0) (unordered:c@2 @0 @1)) @@ -1054,7 +1052,7 @@ along with GCC; see the file COPYING3. If not see operation and convert the result to the desired type. 
*/ (for op (plus minus) (simplify -(convert (op (convert@2 @0) (convert@3 @1))) +(convert (op@4 (convert@2 @0) (convert@3 @1))) (if (INTEGRAL_TYPE_P (type) /* We check for type compatibility between @0 and @1 below, so there's no need to check that @1/@3 are integral types. */ @@ -1070,15 +1068,45 @@ along with GCC; see the file COPYING3. If not see TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type)) /* The inner conversion must be a widening conversion. */ TYPE_PRECISION (TREE_TYPE (@2)) TYPE_PRECISION (TREE_TYPE (@0)) - ((GENERIC - (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) - == TYPE_MAIN_VARIANT (TREE_TYPE (@1))) - (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) - == TYPE_MAIN_VARIANT (type))) -|| (GIMPLE - types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)) - types_compatible_p (TREE_TYPE (@0), type +
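The ChangeLog entry above describes narrowing arithmetic of the form "convert both operands to a wider type, add or subtract, then mask". The C sketch below checks the underlying arithmetic identity exhaustively for 8-bit operands. It is only an illustration of why such narrowing is value-preserving; it does not reproduce the conditions the match.pd pattern actually tests, and the function names are hypothetical.

```c
#include <assert.h>

/* Wide form: widen both operands to int, add, then mask to 8 bits.  */
unsigned add_wide_then_mask (unsigned char a, unsigned char b)
{
  return (unsigned) ((int) a + (int) b) & 0xffu;
}

/* Narrow form: the sum truncated to unsigned char, which wraps
   modulo 256.  (C still promotes to int internally; the cast models
   an 8-bit unsigned addition.)  */
unsigned add_narrow (unsigned char a, unsigned char b)
{
  unsigned char sum = (unsigned char) (a + b);
  return sum;
}

/* Counts the operand pairs for which the two forms disagree.  */
int count_mismatches (void)
{
  int bad = 0;
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      if (add_wide_then_mask ((unsigned char) a, (unsigned char) b)
          != add_narrow ((unsigned char) a, (unsigned char) b))
        bad++;
  return bad;
}

/* The two forms agree for every pair of 8-bit operands.  */
void check_narrowing (void)
{
  assert (count_mismatches () == 0);
}
```

The same identity holds for subtraction with unsigned wraparound, which matches the (for op (plus minus) ...) form of the pattern.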
[PATCH, ARM] Fix testcases that require Thumb2 effective target.
Hi,

This patch fixes two ARM testcases that require the Thumb2 effective target. One is built for Cortex-M3, and the purpose of the second is to generate the thumb2_addsi3_compare0_scratch insn; both fail when compiled for armv5t, for instance.

Built and regtested. Is it OK for trunk?

Thanks,
Yvan

2015-05-04  Yvan Roux  yvan.r...@linaro.org

    * gcc.target/arm/pr65067.c: Require Thumb2 effective target.
    * gcc.target/arm/pr65924.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/arm/pr65067.c b/gcc/testsuite/gcc.target/arm/pr65067.c
index 9ddd7bb..05da294 100644
--- a/gcc/testsuite/gcc.target/arm/pr65067.c
+++ b/gcc/testsuite/gcc.target/arm/pr65067.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-mthumb -mcpu=cortex-m3 -O2" } */
 
 struct tmp {
diff --git a/gcc/testsuite/gcc.target/arm/pr65924.c b/gcc/testsuite/gcc.target/arm/pr65924.c
index 746749f..e1ad394 100644
--- a/gcc/testsuite/gcc.target/arm/pr65924.c
+++ b/gcc/testsuite/gcc.target/arm/pr65924.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-O2 -mthumb" } */
 
 int a, b, c;
Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease
Jakub Jelinek ja...@redhat.com writes:

> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>> Hi,
>>
>> I checked this patch into gcc-5-branch.
>
> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme

HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0 versions of gcc identify as 5.1.1, with nothing such as a datestamp or revision to tell them apart.

    Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease
On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
> Jakub Jelinek ja...@redhat.com writes:
>> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>>> Hi,
>>>
>>> I checked this patch into gcc-5-branch.
>>
>> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
>
> HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
> versions of gcc identify as 5.1.1, with no way of telling them apart,
> like datestamp and revision.

That suggests we should change

DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""

so that it also expands to DATESTAMP_c if DEVPHASE_c is empty but BASEVER_c does not end with .0.

    Jakub
Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
> Hi Christian,
> I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64.  The
> original patch is x86 specific, while the case is added as a general
> one.  Could you please have a look at this?
>
> FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline hooray[^\\n]*inline copy in test

That is the same latent bug for aarch64: alignment flags are not propagated with attribute optimize (O2).

Testing the attached patch.

Christian

Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c (revision 222627)
+++ config/aarch64/aarch64.c (working copy)
@@ -6908,18 +6908,6 @@
 #endif
     }
 
-  /* If not opzimizing for size, set the default
-     alignment to what the target wants */
-  if (!optimize_size)
-    {
-      if (align_loops <= 0)
-        align_loops = aarch64_tune_params->loop_align;
-      if (align_jumps <= 0)
-        align_jumps = aarch64_tune_params->jump_align;
-      if (align_functions <= 0)
-        align_functions = aarch64_tune_params->function_align;
-    }
-
   if (AARCH64_TUNE_FMA_STEERING)
     aarch64_register_fma_steering ();
 
@@ -6935,6 +6923,18 @@
     flag_omit_leaf_frame_pointer = false;
   else if (flag_omit_leaf_frame_pointer)
     flag_omit_frame_pointer = true;
+
+  /* If not opzimizing for size, set the default
+     alignment to what the target wants */
+  if (!optimize_size)
+    {
+      if (align_loops <= 0)
+        align_loops = aarch64_tune_params->loop_align;
+      if (align_jumps <= 0)
+        align_jumps = aarch64_tune_params->jump_align;
+      if (align_functions <= 0)
+        align_functions = aarch64_tune_params->function_align;
+    }
 }
 
 static struct machine_function *
Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease
On Mon, 4 May 2015, Jakub Jelinek wrote:

> On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
>> Jakub Jelinek ja...@redhat.com writes:
>>> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>>>> Hi,
>>>>
>>>> I checked this patch into gcc-5-branch.
>>>
>>> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
>>
>> HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
>> versions of gcc identify as 5.1.1, with no way of telling them apart,
>> like datestamp and revision.
>
> That suggests we should change
> DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
> but BASEVER_c does not end with .0

Yes.

Richard.

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
Re: [patch] Perform anonymous constant propagation during inlining
> Hum, the fact that your earlier version created wrong code
> (get_gimple_for_ssa_name already returned false here) points at some
> issues with EXPAND_INITIALIZER as well, no...?

Theoretically yes but, in practice, EXPAND_INITIALIZER is used in varasm.c and for debugging stuff only, so I don't think that's a real concern.

> That said, the path you add is certainly safe (though maybe we want to
> change get_gimple_for_ssa_name to return tcc_constant single-use defs
> even if TER is disabled (thus at -O0 - and only at -O0, otherwise it
> shouldn't happen).
>
> That would cover more cases of get_gimple_for_ssa_name uses (I can see
> optimize_bitfield_expansion for example...)

optimize_bitfield_assignment_op is only interested in loads from bitfields though.  The get_gimple_for_ssa_name route would be interesting to bypass the stmt_is_replaceable_p test, i.e. to bypass the single-use test, but this could be counter-productive at -O0 so I'm not sure it's worth the trouble.

-- 
Eric Botcazou
[PATCH, AArch64] [4.8] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )
As you suggested, I have split the backport of PR64304 into two emails; this one is for the 4.8 branch.

This patch backports the fix for PR target/64304 (miscompilation with -mgeneral-regs-only) to the 4.8 branch from trunk r219844.

Tested for aarch64 under QEMU on an x86_64 host. OK for 4.8?

diff -rupN gcc-4.8-20150226/gcc/ChangeLog gcc-4.8-20150226.pr64304//gcc/ChangeLog
--- gcc-4.8-20150226/gcc/ChangeLog 2015-03-04 21:13:46.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/ChangeLog 2015-03-04 21:19:49.0 -0500
@@ -1,3 +1,13 @@
+2015-03-05  Shanyao Chen  chenshan...@huawei.com
+
+    Backported from mainline
+    2015-01-19  Jiong Wang  jiong.w...@arm.com
+                Andrew Pinski  apin...@cavium.com
+
+    PR target/64304
+    * config/aarch64/aarch64.md (define_insn "*ashl<mode>3_insn"): Deleted.
+    (ashl<mode>3): Don't expand if operands[2] is not constant.
+
 2015-02-26  Peter Bergner  berg...@vnet.ibm.com
 
     Backport from mainline
diff -rupN gcc-4.8-20150226/gcc/config/aarch64/aarch64.md gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md
--- gcc-4.8-20150226/gcc/config/aarch64/aarch64.md 2015-03-04 21:14:29.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 21:21:54.0 -0500
@@ -2612,6 +2612,8 @@
             DONE;
           }
       }
+    else
+      FAIL;
   }
 )
 
@@ -2681,16 +2683,6 @@
    (set_attr "mode" "SI")]
 )
 
-(define_insn "*ashl<mode>3_insn"
-  [(set (match_operand:SHORT 0 "register_operand" "=r")
-        (ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
-                      (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))]
-  ""
-  "lsl\\t%<w>0, %<w>1, %<w>2"
-  [(set_attr "v8type" "shift")
-   (set_attr "mode" "<MODE>")]
-)
-
 (define_insn "*<optab><mode>3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
         (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff -rupN gcc-4.8-20150226/gcc/testsuite/ChangeLog gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog
--- gcc-4.8-20150226/gcc/testsuite/ChangeLog 2015-03-04 21:16:54.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog 2015-03-04 21:22:58.0 -0500
@@ -1,3 +1,10 @@
+2015-03-05  Shanyao Chen  chenshan...@huawei.com
+
+    Backported from mainline
+    2015-01-19  Jiong Wang  jiong.w...@arm.com
+
+    * gcc.target/aarch64/pr64304.c: New testcase.
+
 2015-02-26  Peter Bergner  berg...@vnet.ibm.com
 
     Backport from mainline
diff -rupN gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c
--- gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c 1969-12-31 19:00:00.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c 2015-03-04 21:12:15.0 -0500
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned char byte = 0;
+
+void
+set_bit (unsigned int bit, unsigned char value)
+{
+  unsigned char mask = (unsigned char) (1 << (bit & 7));
+
+  if (! value)
+    byte &= (unsigned char)~mask;
+  else
+    byte |= mask;
+  /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */
+}
+
+/* { dg-final { cleanup-saved-temps } } */
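For readers who want to see what the new testcase guards, here is a minimal sketch of the same source-level behaviour in plain C. The names byte_val and set_bit_sketch are illustrative only and are not part of the patch; the point is simply that the shift count must be reduced with `bit & 7` before shifting, which is why the testcase scans the assembly for an `and ... 7`.

```c
/* Illustrative state, standing in for the testcase's global `byte'.  */
static unsigned char byte_val = 0;

/* Set or clear one bit of byte_val.  The bit index is reduced modulo 8
   first, so an index such as 9 addresses bit 1.  */
static void
set_bit_sketch (unsigned int bit, unsigned char value)
{
  unsigned char mask = (unsigned char) (1u << (bit & 7));

  if (!value)
    byte_val &= (unsigned char) ~mask;
  else
    byte_val |= mask;
}
```

With this definition, calling set_bit_sketch (9, 1) on a zero byte sets bit 1, and set_bit_sketch (9, 0) clears it again.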
Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease
On Mon, May 04, 2015 at 11:31:11AM +0200, Richard Biener wrote:
> On Mon, 4 May 2015, Jakub Jelinek wrote:
>> On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
>>> Jakub Jelinek ja...@redhat.com writes:
>>>> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>>>>> Hi,
>>>>>
>>>>> I checked this patch into gcc-5-branch.
>>>>
>>>> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
>>>
>>> HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
>>> versions of gcc identify as 5.1.1, with no way of telling them apart,
>>> like datestamp and revision.
>>
>> That suggests we should change
>> DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
>> so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
>> but BASEVER_c does not end with .0
>
> Yes.

Here is a patch to do that. OK for trunk/5?

2015-05-04  Jakub Jelinek  ja...@redhat.com

    * Makefile.in (PATCHLEVEL_c): New variable.
    (DATESTAMP_s, REVISION_s): If PATCHLEVEL_c is not 0, expand
    the same way as if DEVPHASE_c was non-empty.

--- gcc/Makefile.in.jj 2015-04-12 21:50:12.0 +0200
+++ gcc/Makefile.in 2015-05-04 12:03:03.394797230 +0200
@@ -828,14 +828,20 @@ endif
 
 version     := $(BASEVER_c)
 
+PATCHLEVEL_c := \
+  $(shell echo $(BASEVER_c) | sed -e 's/^[0-9]*\.[0-9]*\.\([0-9]*\)$$/\1/')
+
+
 # For use in version.c - double quoted strings, with appropriate
 # surrounding punctuation and spaces, and with the datestamp and
 # development phase collapsed to the empty string in release mode
-# (i.e. if DEVPHASE_c is empty).  The space immediately after the
-# comma in the $(if ...) constructs is significant - do not remove it.
+# (i.e. if DEVPHASE_c is empty and PATCHLEVEL_c is 0).  The space
+# immediately after the comma in the $(if ...) constructs is
+# significant - do not remove it.
 BASEVER_s   := "\"$(BASEVER_c)\""
 DEVPHASE_s  := "\"$(if $(DEVPHASE_c), ($(DEVPHASE_c)))\""
-DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
+DATESTAMP_s := \
+  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s    := "\"@REPORT_BUGS_TO@\""
 
@@ -843,7 +849,8 @@ PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
 
 ifdef REVISION_c
-REVISION_s  := "\"$(if $(DEVPHASE_c), $(REVISION_c))\""
+REVISION_s  := \
+  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(REVISION_c))\""
 else
 REVISION_s  := "\"\""
 endif

    Jakub
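The decision the Makefile change encodes can be sketched in C as follows. This is only an illustration of the rule discussed in this thread (show the datestamp and revision when DEV-PHASE is non-empty or the patchlevel is not 0); the helper names patchlevel_of and show_datestamp are hypothetical and do not exist in GCC.

```c
#include <stdio.h>

/* Hypothetical helper: return the third component of a
   "major.minor.patch" version string, or -1 if the string does not
   have exactly that shape.  */
static int
patchlevel_of (const char *basever)
{
  int major, minor, patch;
  char trailing;

  /* Exactly three conversions must succeed; a fourth one would mean
     there is trailing text after the patchlevel.  */
  if (sscanf (basever, "%d.%d.%d%c", &major, &minor, &patch, &trailing) != 3)
    return -1;
  return patch;
}

/* Hypothetical helper: nonzero if the datestamp and revision should be
   part of the version string, i.e. if DEV-PHASE is non-empty or the
   patchlevel is anything other than 0.  */
static int
show_datestamp (const char *devphase, const char *basever)
{
  return devphase[0] != '\0' || patchlevel_of (basever) != 0;
}
```

Under this sketch a plain release such as 5.1.0 with an empty DEV-PHASE stays free of the datestamp, while a 5.1.1 snapshot from the branch gets one even though DEV-PHASE is empty.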
Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease
On Mon, 4 May 2015, Jakub Jelinek wrote:

> On Mon, May 04, 2015 at 11:31:11AM +0200, Richard Biener wrote:
>> On Mon, 4 May 2015, Jakub Jelinek wrote:
>>> On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
>>>> Jakub Jelinek ja...@redhat.com writes:
>>>>> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I checked this patch into gcc-5-branch.
>>>>>
>>>>> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
>>>>
>>>> HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
>>>> versions of gcc identify as 5.1.1, with no way of telling them apart,
>>>> like datestamp and revision.
>>>
>>> That suggests we should change
>>> DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
>>> so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
>>> but BASEVER_c does not end with .0
>>
>> Yes.
>
> Here is a patch to do that, ok for trunk/5?

Looks good to me.

Thanks,
Richard.

> 2015-05-04  Jakub Jelinek  ja...@redhat.com
>
>     * Makefile.in (PATCHLEVEL_c): New variable.
>     (DATESTAMP_s, REVISION_s): If PATCHLEVEL_c is not 0, expand
>     the same way as if DEVPHASE_c was non-empty.
>
> --- gcc/Makefile.in.jj 2015-04-12 21:50:12.0 +0200
> +++ gcc/Makefile.in 2015-05-04 12:03:03.394797230 +0200
> @@ -828,14 +828,20 @@ endif
>
>  version     := $(BASEVER_c)
>
> +PATCHLEVEL_c := \
> +  $(shell echo $(BASEVER_c) | sed -e 's/^[0-9]*\.[0-9]*\.\([0-9]*\)$$/\1/')
> +
> +
>  # For use in version.c - double quoted strings, with appropriate
>  # surrounding punctuation and spaces, and with the datestamp and
>  # development phase collapsed to the empty string in release mode
> -# (i.e. if DEVPHASE_c is empty).  The space immediately after the
> -# comma in the $(if ...) constructs is significant - do not remove it.
> +# (i.e. if DEVPHASE_c is empty and PATCHLEVEL_c is 0).  The space
> +# immediately after the comma in the $(if ...) constructs is
> +# significant - do not remove it.
>  BASEVER_s   := "\"$(BASEVER_c)\""
>  DEVPHASE_s  := "\"$(if $(DEVPHASE_c), ($(DEVPHASE_c)))\""
> -DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> +DATESTAMP_s := \
> +  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
>  PKGVERSION_s:= "\"@PKGVERSION@\""
>  BUGURL_s    := "\"@REPORT_BUGS_TO@\""
>
> @@ -843,7 +849,8 @@ PKGVERSION  := @PKGVERSION@
>  BUGURL_TEXI := @REPORT_BUGS_TEXI@
>
>  ifdef REVISION_c
> -REVISION_s  := "\"$(if $(DEVPHASE_c), $(REVISION_c))\""
> +REVISION_s  := \
> +  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(REVISION_c))\""
>  else
>  REVISION_s  := "\"\""
>  endif
>
>     Jakub

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)
[PATCH, AArch64] [4.9] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )
As you suggested, I have split the backport of PR64304 into two emails; this one is for the 4.9 branch.

This patch backports the fix for PR target/64304 (miscompilation with -mgeneral-regs-only) to the 4.9 branch from trunk r219844.

Tested for aarch64 under QEMU on an x86_64 host. OK for 4.9?

diff -rupN gcc-4.9-20150225/gcc/ChangeLog gcc-4.9-20150225.pr64304//gcc/ChangeLog
--- gcc-4.9-20150225/gcc/ChangeLog 2015-03-04 20:48:30.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/ChangeLog 2015-03-04 20:55:59.0 -0500
@@ -1,3 +1,13 @@
+2015-03-05  Shanyao Chen  chenshan...@huawei.com
+
+    Backported from mainline
+    2015-01-19  Jiong Wang  jiong.w...@arm.com
+                Andrew Pinski  apin...@cavium.com
+
+    PR target/64304
+    * config/aarch64/aarch64.md (define_insn "*ashl<mode>3_insn"): Deleted.
+    (ashl<mode>3): Don't expand if operands[2] is not constant.
+
 2015-02-25  Kai Tietz  kti...@redhat.com
 
     PR tree-optimization/61917
diff -rupN gcc-4.9-20150225/gcc/config/aarch64/aarch64.md gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md
--- gcc-4.9-20150225/gcc/config/aarch64/aarch64.md 2015-03-04 20:41:03.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 20:46:44.0 -0500
@@ -2719,6 +2719,8 @@
             DONE;
           }
       }
+    else
+      FAIL;
   }
 )
 
@@ -2947,15 +2949,6 @@
   [(set_attr "type" "shift_reg")]
 )
 
-(define_insn "*ashl<mode>3_insn"
-  [(set (match_operand:SHORT 0 "register_operand" "=r")
-        (ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
-                      (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))]
-  ""
-  "lsl\\t%<w>0, %<w>1, %<w>2"
-  [(set_attr "type" "shift_reg")]
-)
-
 (define_insn "*<optab><mode>3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
         (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff -rupN gcc-4.9-20150225/gcc/testsuite/ChangeLog gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog
--- gcc-4.9-20150225/gcc/testsuite/ChangeLog 2015-03-04 21:00:24.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog 2015-03-04 21:03:21.0 -0500
@@ -1,3 +1,10 @@
+2015-03-05  Shanyao Chen  chenshan...@huawei.com
+
+    Backported from mainline
+    2015-01-19  Jiong Wang  jiong.w...@arm.com
+
+    * gcc.target/aarch64/pr64304.c: New testcase.
+
 2015-02-25  Kai Tietz  kti...@redhat.com
 
     Backported from mainline
diff -rupN gcc-4.9-20150225/gcc/testsuite/gcc.target/aarch64/pr64304.c gcc-4.9-20150225.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c
--- gcc-4.9-20150225/gcc/testsuite/gcc.target/aarch64/pr64304.c 1969-12-31 19:00:00.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c 2015-03-04 20:59:24.0 -0500
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned char byte = 0;
+
+void
+set_bit (unsigned int bit, unsigned char value)
+{
+  unsigned char mask = (unsigned char) (1 << (bit & 7));
+
+  if (! value)
+    byte &= (unsigned char)~mask;
+  else
+    byte |= mask;
+  /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */
+}
+
+/* { dg-final { cleanup-saved-temps } } */
[PATCH] Fix PR65965
We don't support vectorizing group stores with gaps, so the natural thing is to split groups at such boundaries. This enables more BB vectorization (and likely loop vectorization as well, though I suspect only in some weird cases).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-04  Richard Biener  rguent...@suse.de

    PR tree-optimization/65965
    * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Split
    store groups at gaps.

    * gcc.dg/vect/bb-slp-33.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c (revision 222758)
+++ gcc/tree-vect-data-refs.c (working copy)
@@ -2602,6 +2602,15 @@ vect_analyze_data_ref_accesses (loop_vec
           if ((init_b - init_a) % type_size_a != 0)
             break;
 
+          /* If we have a store, the accesses are adjacent.  This splits
+             groups into chunks we support (we don't support vectorization
+             of stores with gaps).  */
+          if (!DR_IS_READ (dra)
+              && (((unsigned HOST_WIDE_INT)init_b
+                   - TREE_INT_CST_LOW (DR_INIT (datarefs_copy[i-1])))
+                  != type_size_a))
+            break;
+
           /* The step (if not zero) is greater than the difference between
              data-refs' inits.  This splits groups into suitable sizes.  */
           HOST_WIDE_INT step = tree_to_shwi (DR_STEP (dra));
Index: gcc/testsuite/gcc.dg/vect/bb-slp-33.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-33.c (revision 0)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-33.c (working copy)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+
+void __attribute__((noinline,noclone))
+test(int *__restrict__ a, int *__restrict__ b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[5] = 0;
+  a[6] = 0;
+  a[7] = 0;
+  a[8] = 0;
+}
+
+int main()
+{
+  int a[9];
+  int b[4];
+  b[0] = 1;
+  __asm__ volatile ("");
+  b[1] = 2;
+  __asm__ volatile ("");
+  b[2] = 3;
+  __asm__ volatile ("");
+  b[3] = 4;
+  __asm__ volatile ("");
+  a[4] = 7;
+  check_vect ();
+  test(a, b);
+  if (a[0] != 1
+      || a[1] != 2
+      || a[2] != 3
+      || a[3] != 4
+      || a[4] != 7
+      || a[5] != 0
+      || a[6] != 0
+      || a[7] != 0
+      || a[8] != 0)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" { target { vect_element_align || vect_hw_misalign } } } } */
+/* { dg-final { cleanup-tree-dump "slp2" } } */
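The grouping rule described in this mail can be illustrated outside the vectorizer with a small stand-alone sketch. It is not the code in tree-vect-data-refs.c; the function split_store_groups and its parameters are hypothetical, and it assumes the store offsets are already sorted and all stores have the same element size.

```c
#include <stddef.h>

/* Hypothetical sketch: OFFSETS holds N sorted byte offsets of stores of
   ELEM_SIZE bytes each.  Store a group id for every access in GROUP_ID,
   starting a new group whenever a store is not exactly adjacent to the
   previous one (i.e. there is a gap).  Returns the number of groups.  */
static size_t
split_store_groups (const long *offsets, size_t n, long elem_size,
                    size_t *group_id)
{
  size_t groups = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* The first store opens a group; so does any store that does not
         start right where the previous one ended.  */
      if (i == 0 || offsets[i] - offsets[i - 1] != elem_size)
        groups++;
      group_id[i] = groups - 1;
    }
  return groups;
}
```

For the stores in the new testcase (a[0] to a[3], then a[5] to a[8], with 4-byte elements assumed) this yields two groups of four, which matches the two SLP instances the testcase expects.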
[PATCH] Fix PR65935
The following fixes PR65935 where the vectorizer is confused after SLP operands swapping to see the stmts in the IL with unswapped operands. As we already swap for different def-kinds just swap for other swaps as well. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-05-04 Richard Biener rguent...@suse.de PR tree-optimization/65935 * tree-vect-slp.c (vect_build_slp_tree): If we swapped operands then make sure to apply that swapping to the IL. * gcc.dg/vect/pr65935.c: New testcase. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 222758) --- gcc/tree-vect-slp.c (working copy) *** vect_build_slp_tree (loop_vec_info loop_ *** 1081,1093 dump_printf (MSG_NOTE, %d , j); } dump_printf (MSG_NOTE, \n); ! /* And try again ... */ if (vect_build_slp_tree (loop_vinfo, bb_vinfo, child, group_size, max_nunits, loads, vectorization_factor, ! matches, npermutes, this_tree_size, max_tree_size)) { oprnd_info-def_stmts = vNULL; SLP_TREE_CHILDREN (*node).quick_push (child); continue; --- 1081,1105 dump_printf (MSG_NOTE, %d , j); } dump_printf (MSG_NOTE, \n); ! /* And try again with scratch 'matches' ... */ ! bool *tem = XALLOCAVEC (bool, group_size); if (vect_build_slp_tree (loop_vinfo, bb_vinfo, child, group_size, max_nunits, loads, vectorization_factor, ! tem, npermutes, this_tree_size, max_tree_size)) { + /* ... so if successful we can apply the operand swapping +to the GIMPLE IL. This is necessary because for example +vect_get_slp_defs uses operand indexes and thus expects +canonical operand order. 
*/ + for (j = 0; j group_size; ++j) + if (!matches[j]) + { + gimple stmt = SLP_TREE_SCALAR_STMTS (*node)[j]; + swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt), + gimple_assign_rhs2_ptr (stmt)); + } oprnd_info-def_stmts = vNULL; SLP_TREE_CHILDREN (*node).quick_push (child); continue; Index: gcc/testsuite/gcc.dg/vect/pr65935.c === *** gcc/testsuite/gcc.dg/vect/pr65935.c (revision 0) --- gcc/testsuite/gcc.dg/vect/pr65935.c (working copy) *** *** 0 --- 1,63 + /* { dg-do run } */ + /* { dg-additional-options -O3 } */ + /* { dg-require-effective-target vect_double } */ + + #include tree-vect.h + + extern void abort (void); + extern void *malloc (__SIZE_TYPE__); + + struct site { + struct { + struct { + double real; + double imag; + } e[3][3]; + } link[32]; + double phase[32]; + } *lattice; + int sites_on_node; + + void rephase (void) + { + int i,j,k,dir; + struct site *s; + for(i=0,s=lattice;isites_on_node;i++,s++) + for(dir=0;dir32;dir++) + for(j=0;j3;j++)for(k=0;k3;k++) + { + s-link[dir].e[j][k].real *= s-phase[dir]; + s-link[dir].e[j][k].imag *= s-phase[dir]; + } + } + + int main() + { + int i,j,k; + check_vect (); + sites_on_node = 1; + lattice = malloc (sizeof (struct site) * sites_on_node); + for (i = 0; i 32; ++i) + { + lattice-phase[i] = i; + for (j = 0; j 3; ++j) + for (k = 0; k 3; ++k) + { + lattice-link[i].e[j][k].real = 1.0; + lattice-link[i].e[j][k].imag = 1.0; + __asm__ volatile ( : : : memory); + } + } + rephase (); + for (i = 0; i 32; ++i) + for (j = 0; j 3; ++j) + for (k = 0; k 3; ++k) + if (lattice-link[i].e[j][k].real != i + || lattice-link[i].e[j][k].imag != i) + abort (); + return 0; + } + + /* { dg-final { scan-tree-dump-times vectorized 1 loops 1 slp1 } } */ + /* { dg-final { cleanup-tree-dump slp1 } } */ + /* { dg-final { cleanup-tree-dump vect } } */
Re: [PATCH] Fix PR65935
On Mon, May 4, 2015 at 4:15 AM, Richard Biener rguent...@suse.de wrote: The following fixes PR65935 where the vectorizer is confused after SLP operands swapping to see the stmts in the IL with unswapped operands. As we already swap for different def-kinds just swap for other swaps as well. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-05-04 Richard Biener rguent...@suse.de PR tree-optimization/65935 * tree-vect-slp.c (vect_build_slp_tree): If we swapped operands then make sure to apply that swapping to the IL. * gcc.dg/vect/pr65935.c: New testcase. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 222758) --- gcc/tree-vect-slp.c (working copy) *** vect_build_slp_tree (loop_vec_info loop_ *** 1081,1093 dump_printf (MSG_NOTE, %d , j); } dump_printf (MSG_NOTE, \n); ! /* And try again ... */ if (vect_build_slp_tree (loop_vinfo, bb_vinfo, child, group_size, max_nunits, loads, vectorization_factor, ! matches, npermutes, this_tree_size, max_tree_size)) { oprnd_info-def_stmts = vNULL; SLP_TREE_CHILDREN (*node).quick_push (child); continue; --- 1081,1105 dump_printf (MSG_NOTE, %d , j); } dump_printf (MSG_NOTE, \n); ! /* And try again with scratch 'matches' ... */ ! bool *tem = XALLOCAVEC (bool, group_size); if (vect_build_slp_tree (loop_vinfo, bb_vinfo, child, group_size, max_nunits, loads, vectorization_factor, ! tem, npermutes, this_tree_size, max_tree_size)) { + /* ... so if successful we can apply the operand swapping +to the GIMPLE IL. This is necessary because for example +vect_get_slp_defs uses operand indexes and thus expects +canonical operand order. 
*/ + for (j = 0; j group_size; ++j) + if (!matches[j]) + { + gimple stmt = SLP_TREE_SCALAR_STMTS (*node)[j]; + swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt), + gimple_assign_rhs2_ptr (stmt)); + } oprnd_info-def_stmts = vNULL; SLP_TREE_CHILDREN (*node).quick_push (child); continue; Index: gcc/testsuite/gcc.dg/vect/pr65935.c === *** gcc/testsuite/gcc.dg/vect/pr65935.c (revision 0) --- gcc/testsuite/gcc.dg/vect/pr65935.c (working copy) *** *** 0 --- 1,63 + /* { dg-do run } */ + /* { dg-additional-options -O3 } */ + /* { dg-require-effective-target vect_double } */ + + #include tree-vect.h + + extern void abort (void); + extern void *malloc (__SIZE_TYPE__); + + struct site { + struct { + struct { + double real; + double imag; + } e[3][3]; + } link[32]; + double phase[32]; + } *lattice; + int sites_on_node; + + void rephase (void) + { + int i,j,k,dir; + struct site *s; + for(i=0,s=lattice;isites_on_node;i++,s++) + for(dir=0;dir32;dir++) + for(j=0;j3;j++)for(k=0;k3;k++) + { + s-link[dir].e[j][k].real *= s-phase[dir]; + s-link[dir].e[j][k].imag *= s-phase[dir]; + } + } + + int main() + { + int i,j,k; + check_vect (); + sites_on_node = 1; + lattice = malloc (sizeof (struct site) * sites_on_node); + for (i = 0; i 32; ++i) + { + lattice-phase[i] = i; + for (j = 0; j 3; ++j) + for (k = 0; k 3; ++k) + { + lattice-link[i].e[j][k].real = 1.0; + lattice-link[i].e[j][k].imag = 1.0; + __asm__ volatile ( : : : memory); + } + } + rephase (); + for (i = 0; i 32; ++i) + for (j = 0; j 3; ++j) + for (k = 0; k 3; ++k) + if (lattice-link[i].e[j][k].real != i + || lattice-link[i].e[j][k].imag != i) + abort (); + return 0; + } + + /* { dg-final { scan-tree-dump-times vectorized 1 loops 1 slp1 } } */ + /* { dg-final { cleanup-tree-dump slp1 } } */ + /* { dg-final { cleanup-tree-dump vect } } */ Need for these when it is a run-time test. -- H.J.
[PATCH] Remove dead code.
This patch removes a write-only variable from the C++ code.

ChangeLog:
--
2015-05-04  Dominik Vogt  v...@linux.vnet.ibm.com

    * call.c (print_z_candidates): Remove dead code.
--

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

From 6943ad84a5a5b69c7cf5df1ea5bb6ab5fd254825 Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Mon, 4 May 2015 12:46:21 +0100
Subject: [PATCH] Remove dead code.

---
 gcc/cp/call.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 31d2b9c..55350f8 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3436,7 +3436,6 @@ print_z_candidates (location_t loc, struct z_candidate *candidates)
 {
   struct z_candidate *cand1;
   struct z_candidate **cand2;
-  int n_candidates;
 
   if (!candidates)
     return;
@@ -3478,9 +3477,6 @@ print_z_candidates (location_t loc, struct z_candidate *candidates)
         }
     }
 
-  for (n_candidates = 0, cand1 = candidates; cand1; cand1 = cand1->next)
-    n_candidates++;
-
   for (; candidates; candidates = candidates->next)
     print_z_candidate (loc, "candidate:", candidates);
 }
-- 
2.3.0
Re: [rs6000] Fix compare debug failure on AIX
On 04 May 2015, at 02:32, David Edelsohn dje@gmail.com wrote:

> On Sat, May 2, 2015 at 6:04 AM, Eric Botcazou ebotca...@adacore.com wrote:
>>> Why should GCC unnecessarily create stack frames to avoid
>>> compare-debug testcase failures?
>>
>> I'm not sure I understand the question... compare-debug failures are
>> failures (-g is not supposed to change the generated code and this
>> XCOFF-specific bug was reported to us) so they need to be fixed.
>>
>> From there on, as Alan said, there are 2 cases: either AIX needs a
>> frame for debugging or it doesn't. If the latter, then the lines can
>> simply be deleted. If the former, we have to draw a line somewhere;
>> Alan suggests always creating a frame while I suggest creating it only
>> at -O0 and -Og.
>
> I believe that AIX does need a frame for debugging. I don't remember
> the exact reason off hand.
>
> I'm sorry that XCOFF debugging changes the generated code (only in the
> sense of allocating a frame), but that is a system dependency. It's
> been this way for over 20 years. I see no reason to produce worse code
> at -O0 when not debugging simply to make testcases happier.
>
> By the way, I'm still waiting for the DWARF debugging patches from
> Adacore compatible with AIX as and ld. DWARF debugging would not
> require pushing a frame, and would resolve the failure when testing
> with DWARF. The patch would be adjusted to only push a frame when
> writing XCOFF debugging.

Sorry but we don’t have these patches. We have a tiny patch to generate DWARF debug info on XCOFF platforms, but it requires GNU as and ld.

Tristan.
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
Thomas Schwinge tho...@codesourcery.com writes:

> Additionally to the %p format specifier printing a 0x prefix vs. not
> doing that, I've also changed the expected (nil) output for NULL
> pointers to instead match basically everything.

You cannot expect printf to print (nil) or a variant for NULL pointers.  E.g. on Solaris 10 you get a SEGV instead.

    Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PR 64454: Improve VRP for %
On Mon, 4 May 2015, Richard Biener wrote: On Sat, May 2, 2015 at 12:46 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, this patch tries to tighten a bit the range estimate for x%y. slp-perm-7.c started failing by vectorizing more than expected, I assumed it was a good thing and updated the test. I am less conservative than Jakub with division by 0, but I still don't really understand how empty ranges are supposed to be represented in VRP. Bootstrap+testsuite on x86_64-linux-gnu. Hmm, so I don't like how you (continue to) use trees for the constant computations. wide-ints would be a better fit today. I also notice that fold_unary_to_constant can return NULL_TREE and neither the old nor your code handles that. You are right. I was lazy and tried to keep this part of the old code, I shouldn't have... empty ranges are basically UNDEFINED. Cool, that's what I did. But I don't see code adding calls to __builtin_unreachable() when an empty range is detected. Maybe that almost never happens? Aren't you pessimizing the case where the old code used value_range_nonnegative_p() by just using TYPE_UNSIGNED? I don't think so. The old code only handled signed types in the positive case, while I have a more complete handling of signed types, which should do at least as well as the old one even in the positive case. -- Marc Glisse
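The kind of bound being discussed for x%y can be illustrated outside GCC. This is a minimal sketch, not the VRP code from the patch: it assumes C's truncating %, a divisor range that excludes 0, and values well away from LONG_MIN. The names struct irange and mod_range are hypothetical.

```c
#include <assert.h>

/* Hypothetical closed interval [lo, hi]; not a GCC type. */
struct irange { long lo, hi; };

static long abs_l(long v) { return v < 0 ? -v : v; }
static long max_l(long a, long b) { return a > b ? a : b; }
static long min_l(long a, long b) { return a < b ? a : b; }

/* Conservative bounds for x % y with x in X and y in Y, where 0 is not in Y.
   With truncating division the result has the sign of x (or is 0), its
   magnitude is at most |y| - 1, and it never exceeds |x|. */
struct irange mod_range(struct irange x, struct irange y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    assert(y.lo > 0 || y.hi < 0);            /* division by zero excluded */

    long ymax = max_l(abs_l(y.lo), abs_l(y.hi)) - 1;

    struct irange r;
    r.lo = x.lo < 0 ? -min_l(ymax, -x.lo) : 0;   /* negative only if x can be */
    r.hi = x.hi > 0 ? min_l(ymax, x.hi) : 0;     /* positive only if x can be */
    return r;
}
```

For x in [-5, 100] and y in [1, 8] this yields [-5, 7]; for x in [0, 3] and y in [10, 20] it yields [0, 3], since a small dividend passes through unchanged.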
[PATCH] Fix(?) PR66002
This fixes a missed vectorization of a function in paq8p. Without merged PHI nodes phiopt doesn't recognize adjacent MIN/MAX_EXPRs. Certainly no other pass I schedule mergephi over cares for merged PHIs (DCE might even be confused here). Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-05-04 Richard Biener rguent...@suse.de PR tree-optimization/66002 * passes.def: Schedule pass_merge_phi after VRP, right before ifcombine and phiopt. * gcc.dg/vect/vect-125.c: New testcase. Index: gcc/passes.def === *** gcc/passes.def (revision 222760) --- gcc/passes.def (working copy) *** along with GCC; see the file COPYING3. *** 168,174 NEXT_PASS (pass_build_alias); NEXT_PASS (pass_return_slot); NEXT_PASS (pass_fre); - NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_chkp_opt); NEXT_PASS (pass_dce); --- 168,173 *** along with GCC; see the file COPYING3. *** 176,181 --- 175,181 NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_tail_recursion); Index: gcc/testsuite/gcc.dg/vect/vect-125.c === *** gcc/testsuite/gcc.dg/vect/vect-125.c(revision 0) --- gcc/testsuite/gcc.dg/vect/vect-125.c(working copy) *** *** 0 --- 1,19 + /* { dg-do compile } */ + /* { dg-require-effective-target vect_int } */ + /* { dg-require-effective-target vect_pack_trunc } */ + /* { dg-require-effective-target vect_unpack } */ + + void train(short *t, short *w, int n, int err) + { + n=(n+7)-8; + for (int i=0; in; ++i) + { + int wt=w[i]+((t[i]*err*216)+11); + if (wt-32768) wt=-32768; + if (wt32767) wt=32767; + w[i]=wt; + } + } + + /* { dg-final { scan-tree-dump vectorized 1 loops vect { xfail vect_no_int_max } } } */ + /* { dg-final { cleanup-tree-dump vect } } */
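For readers who want to see the shape of loop at stake: a minimal sketch of a saturating 16-bit weight update, where the two clamps are what phiopt would need to recognize as adjacent MAX_EXPR/MIN_EXPR. The shift amounts and rounding are assumptions for illustration, not a quotation of the testcase, and train_sketch is a hypothetical name. Right-shifting a negative int is implementation-defined in C; an arithmetic shift is assumed.

```c
#include <assert.h>

/* Saturating update of 16-bit weights w[] from inputs t[] and error err.
   Assumes t[i] * err * 2 does not overflow int. */
void train_sketch(const short *t, short *w, int n, int err)
{
    assert(n <= 0 || (t && w));
    for (int i = 0; i < n; ++i) {
        /* Scaled, rounded correction term (assumed form). */
        int wt = w[i] + ((((t[i] * err * 2) >> 16) + 1) >> 1);
        if (wt < -32768) wt = -32768;   /* lower clamp: MAX_EXPR <wt, -32768> */
        if (wt > 32767) wt = 32767;     /* upper clamp: MIN_EXPR <wt, 32767> */
        w[i] = (short)wt;
    }
}
```

Both clamps operate on the same value in sequence, which is why the order of mergephi relative to phiopt can decide whether the pair is seen as one min/max chain.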
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
On 2015-05-04 4:32 AM, Thomas Schwinge wrote: Dave, would you please test the following patch, and report the regression status compared to before r222620? (Compared to your existing r222021 results, as posted in the PR, for example.) With patch, we have the following fails on hppa2.0w-hp-hpux11.11: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-3.c -DACC_DEVICE_TYPE_host =1 -DACC_MEM_SHARED=1 output pattern test, is libgomp: no device found , should match device [0-9]+\([0-9]+\) is initialized FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-42.c -DACC_DEVICE_TYPE_hos t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match \[[0-9a-fA-FxX]+,2 56\] is not mapped FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_hos t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size Running /test/gnu/gcc/gcc/libgomp/testsuite/libgomp.oacc-c++/c++.exp ... FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is libgomp: no device found , should match device [0-9]+\([0-9]+\) is initialized FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-42.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match \[[0-9a-fA-FxX]+,256\] is not mapped FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size Note this is a 32-bit build and not the 64-bit build reported in PR. However, I would expect similar printf support. Don't have a 64-bit build handy. Dave -- John David Anglin dave.ang...@bell.net
Re: Extend verify_type to check various uses of TYPE_MINVAL
Jan Hubicka hubi...@ucw.cz writes: Hi, this patch extends verify_type to check various uses of TYPE_MINVAL. I also added check that MIN_VALUE have compatible type with T: useless_type_conversion_p (const_cast tree (t), TREE_TYPE (TYPE_MIN_VALUE (t))) but that one fails interesting ways for C sizetype. I will try to look into this and thus this patch omits it. The main motivation is to check that various frontend overrides of TYPE_MINVAL are under control. Bootstrapped/regtested x86_64-linux, will commit it as obvious. Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at least on i386-pc-solaris2.10. E.g. FAIL: gnat.dg/lto1.adb (test for excess errors) WARNING: gnat.dg/lto1.adb compilation failed to produce executable FAIL: gnat.dg/lto1.adb (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1: error: TYPE_MIN_VALUE is not constant placeholder_expr feb2b9b0 type integer_type fea16000 sizetype public unsigned SI size integer_cst fea041cc constant 32 unit size integer_cst fea041e0 constant 4 align 32 symtab 0 alias set -1 canonical type fea16000 precision 32 min integer_cst fea041f4 0 max integer_cst fea04000 4294967295 integer_type feb67ba0 lto1_pkg__Tfiltering_levels_tB___UB0 type integer_type fea16000 sizetype public unsigned SI size integer_cst fea041cc constant 32 unit size integer_cst fea041e0 constant 4 align 32 symtab 0 alias set -1 canonical type fea16000 precision 32 min integer_cst fea041f4 0 max integer_cst fea04000 4294967295 sizes-gimplified visited SI size integer_cst fea041cc 32 unit size integer_cst fea041e0 4 align 32 symtab 0 alias set -1 canonical type feb67ba0 precision 32 min placeholder_expr feb2b9b0 max placeholder_expr feb2b9c0 index type integer_type feb67b40 type enumeral_type feb67960 lto1_pkg__filtering_level_t sizes-gimplified visited unsigned QI size integer_cst fea042d0 constant 8 unit size integer_cst fea042e4 constant 1 align 8 symtab 0 alias set -1 canonical type 
feb67960 precision 8 min integer_cst feb60e4c 0 max integer_cst feb60f3c 255 values tree_list feb648e8 purpose identifier_node feb639d8 lto1_pkg__none value integer_cst feb60e4c constant visited 0 chain tree_list feb64918 purpose identifier_node feb639f4 lto1_pkg__pr_in_clutter value integer_cst feb60f50 constant 1 chain tree_list feb64930 purpose identifier_node feb63a10 lto1_pkg__ssr_plots value integer_cst feb60f78 constant 2 chain tree_list feb64948 purpose identifier_node feb63a2c lto1_pkg__pr_plots value integer_cst feb60fa0 3 context translation_unit_decl fed805f0 D.18 chain type_decl feb687e8 lto1_pkg__filtering_level_t QI size integer_cst fea042d0 8 unit size integer_cst fea042e4 1 align 8 symtab 0 alias set -1 canonical type feb67b40 precision 8 min integer_cst feb60e4c 0 max integer_cst feb60fa0 3 RM min component_ref feb63a9c RM max component_ref feb63ab8 chain type_decl feb68a10 D.4194 Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
Rainer Orth r...@cebitec.uni-bielefeld.de writes: You cannot expect printf to print (nil) or variant for NULL pointers. E.g. on Solaris 10 you get a SEGV instead. You are probably mixing it up with %s. %p is required to handle NULL like any other valid pointer value. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
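To illustrate the %p point: a minimal sketch, assuming a hosted C99 implementation. format_pointer is a hypothetical helper. The text produced for a null pointer is implementation-defined (for example "(nil)" or "0x0"), which is why a test pattern should not hard-code one spelling, but the call itself is valid. Passing NULL for %s, by contrast, is undefined behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Format pointer p with %p into buf. Returns the number of characters
   that were (or would have been) written, or a negative value on error.
   p may be NULL: %p accepts any void * value. */
int format_pointer(char *buf, size_t size, const void *p)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%p", (void *)p);
}
```

A caller would pass a buffer of, say, 64 bytes and compare the result only against a loose pattern, as the updated tests do.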
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
Andreas Schwab sch...@linux-m68k.org writes: Rainer Orth r...@cebitec.uni-bielefeld.de writes: You cannot expect printf to print (nil) or variant for NULL pointers. E.g. on Solaris 10 you get a SEGV instead. You are probably mixing it up with %s. %p is required to handle NULL like any other valid pointer value. Seems so. Sorry for the noise. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
Hi, On Thu, 30 Apr 2015, Sriraman Tallam wrote: We noticed that one of our benchmarks sped up by ~1% when we eliminated PLT stubs for some of the hot external library functions like memcmp, pow. The win was from better icache and itlb performance. The main reason was that the PLT stubs had no spatial locality with the call-sites. I have started looking at ways to tell the compiler to eliminate PLT stubs (in-effect inline them) for specified external functions, for x86_64. I have a proposal and a patch and I would like to hear what you think. This comes with caveats. This cannot be generally done for all functions marked extern as it is impossible for the compiler to say if a function is truly extern (defined in a shared library). If a function is not truly extern (ends up defined in the final executable), then calling it indirectly is a performance penalty as it could have been a direct call. This can be fixed by Alan's idea. Further, the newly created GOT entries are fixed up at start-up and do not get lazily bound. And this can be fixed by some enhancements in the linker and dynamic linker. The idea is to still generate a PLT stub and make its GOT entry point to it initially (like a normal got.plt slot). Then the first indirect call will use the address of PLT entry (starting lazy resolution) and update the GOT slot with the real address, so further indirect calls will directly go to the function. This requires a new asm marker (and hence new reloc) as normally if there's a GOT slot it's filled by the real symbol's address, unlike if there's only a got.plt slot. E.g. a call *foo@GOTPLT(%rip) would generate a GOT slot (and fill its address into above call insn), but generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one. Ciao, Michael.
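The lazily bound slot described above can be modelled in plain C. This is only a simulation under stated assumptions: foo_slot stands in for the GOT entry, foo_stub for the PLT entry that triggers resolution, and real_foo for the symbol's final address. All names are hypothetical, and no linker or dynamic loader is involved.

```c
#include <assert.h>

typedef int (*fn_t)(int);

static int resolve_count;                       /* how often "resolution" ran */

static int real_foo(int x) { return x + 1; }    /* stands in for the real symbol */
static int foo_stub(int x);

/* Simulated GOT slot: initially points at the stub, like a got.plt slot. */
static fn_t foo_slot = foo_stub;

/* Simulated PLT entry: resolve once, patch the slot, then forward the call. */
static int foo_stub(int x)
{
    ++resolve_count;
    foo_slot = real_foo;
    return real_foo(x);
}

/* What the call site does: an indirect call through the slot. */
int call_foo(int x)
{
    assert(foo_slot != 0);
    return foo_slot(x);
}
```

The first call_foo goes through the stub and patches the slot; later calls reach real_foo directly, which is the behavior the proposed reloc would give an indirect call site.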
Re: Extend verify_type to check various uses of TYPE_MINVAL
Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at least on i386-pc-solaris2.10. E.g. FAIL: gnat.dg/lto1.adb (test for excess errors) WARNING: gnat.dg/lto1.adb compilation failed to produce executable FAIL: gnat.dg/lto1.adb (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1: error: TYPE_MIN_VALUE is not constant TYPE_MIN_VALUE can be arbitrary in Ada, with or without LTO. For package Q is function LB return Natural; function UB return Natural; end Q; with Q; package P is type Arr1 is array (Natural range <>) of Boolean; subtype Arr2 is Arr1 (Q.LB .. Q.UB); end P; the TYPE_DOMAIN of Arr2 is domain integer_type 0x769be000 type integer_type 0x76d0e0a8 sizetype sizes-gimplified visited DI size integer_cst 0x76d0abb8 64 unit size integer_cst 0x76d0abd0 8 align 64 symtab 0 alias set -1 canonical type 0x769be000 precision 64 min nop_expr 0x769bd000 max cond_expr 0x769b9420 -- Eric Botcazou
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Mon, May 04, 2015 at 10:11:13AM +0200, Richard Biener wrote: Not sure how this helps when SRA tears apart the parameter. That is, isn't the important thing that both the IPA modified function argument types/decls have the same type as the types of the parameters SRA ends up passing? (as far as alignment goes?) Yes, of course using natural alignment makes sure that the backend can handle alignment properly and we don't run into oddball bugs here. On IRC we were discussing making /* Return true if mode/type need doubleword alignment. */ static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + || (type TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY)); } Looking at struct S { char a[16]; }; typedef struct S T; typedef struct S U __attribute__((aligned (16))); struct V { U u; T v; }; typedef int N __attribute__((aligned (16))); T t1; U u1; int a[3]; void f5 (__builtin_va_list *ap) { t1 = __builtin_va_arg (*ap, T); a[0] = __builtin_va_arg (*ap, int); u1 = __builtin_va_arg (*ap, U); a[1] = __builtin_va_arg (*ap, int); a[2] = __builtin_va_arg (*ap, N); } void f6 (int, N, int, U); void f7 (void) { U u = {}; f6 (0, (N) 1, 0, u); } and s/16/8/g output, it seems that neither i?86 nor x86_64 care about the alignment for any passing, ppc64le cares about aggregates, but not scalars apparently (with a warning that the passing changed), arm cares about both. And the f7 function shows that for non-aggregates, what arm does is simply never going to work, because there is no way to pass down the scalars aligned, f6 is still called with 1 in int type rather than N. So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. The following testcase shows that eipa_sra changes alignment even for the aggregates. 
Change aligned (8) to aligned (4) to see another possibility. /* PR target/65956 */ struct B { char *a, *b; }; typedef struct B C __attribute__((aligned (8))); struct A { C a; int b; long long c; }; char v[3]; __attribute__((noinline, noclone)) void fn1 (int v, ...) { __builtin_va_list ap; __builtin_va_start (ap, v); C c, d; c = __builtin_va_arg (ap, C); __builtin_va_arg (ap, int); d = __builtin_va_arg (ap, C); __builtin_va_end (ap); if (c.a != v[1] || d.a != v[2]) __builtin_abort (); v[1]++; } __attribute__((noinline, noclone)) int fn2 (C x) { asm volatile ( : +g (x.a) : : memory); asm volatile ( : +g (x.b) : : memory); return x.a == v[0]; } __attribute__((noinline, noclone)) void fn3 (const char *x) { if (x[0] != 0) __builtin_abort (); } static struct A foo (const char *x, struct A y, struct A z) { struct A r = { { 0, 0 }, 0, 0 }; if (y.b z.b) { if (fn2 (y.a) fn2 (z.a)) switch (x[0]) { case '|': break; default: fn3 (x); } fn1 (0, y.a, 0, z.a); } return r; } __attribute__((noinline, noclone)) int bar (int x, struct A *y) { switch (x) { case 219: foo (+, y[-2], y[0]); case 220: foo (-, y[-2], y[0]); } } int main () { struct A a[3] = { { { v[1], v[0] }, 1, 1LL }, { { v[0], v[0] }, 0, 0LL }, { { v[2], v[0] }, 2, 2LL } }; bar (220, a + 2); if (v[1] != 1) __builtin_abort (); return 0; } Jakub
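The distinction the proposed arm_needs_doubleword_align change relies on can be seen in isolation. A minimal sketch, assuming GCC or Clang (the aligned attribute and __alignof__ are extensions): the typedef U carries the user-requested alignment, while struct S, which I take to be what TYPE_MAIN_VARIANT yields for it, keeps the natural one. check_alignment_example is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a[16]; };
typedef struct S T;
typedef struct S U __attribute__((aligned (16)));   /* over-aligned variant */

size_t align_of_T(void) { return __alignof__(T); }
size_t align_of_U(void) { return __alignof__(U); }

/* Same size and layout, different alignment: only the typedef differs. */
void check_alignment_example(void)
{
    assert(sizeof(T) == sizeof(U));
    assert(align_of_T() < align_of_U());
}
```

An ABI decision keyed on the typedef's alignment therefore differs from one keyed on the underlying struct, which is the behavior difference the thread is weighing.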
Re: [C++17] Implement N3928 - Extending static_assert
On Sat, May 02, 2015 at 04:16:18PM -0400, Ed Smith-Rowland wrote: This extends static_assert to not require a message string. I elected to make this work also for C++11 and C++14 and warn only with -pedantic. I think many people just write static_assert(thing, "");. I took the path of building an empty string in the parser in this case. I wasn't sure if setting message to NULL_TREE would cause sadness later on or not. I also, perhaps in a fit of overzealousness, made finish_static_assert not print the extra ": " and an empty message in this case. I didn't modify _Static_assert for C. I'm not aware of any C DR that is asking for _Static_assert (cst-expr), so I suppose there's no need to change C at this point. Marek
[Committed] Restore bootstrap for ARM
All, I committed the below as obvious. Andreas 2015-05-04 Andreas Tobler andre...@gcc.gnu.org * config/arm/arm.c: Restore bootstrap. Index: config/arm/arm.c === --- config/arm/arm.c(revision 222767) +++ config/arm/arm.c(working copy) @@ -150,7 +150,7 @@ static void assign_minipool_offsets (Mfix *); static void arm_print_value (FILE *, rtx); static void dump_minipool (rtx_insn *); -static int arm_barrier_cost (rtx); +static int arm_barrier_cost (rtx_insn *); static Mfix *create_fix_barrier (Mfix *, HOST_WIDE_INT); static void push_minipool_barrier (rtx_insn *, HOST_WIDE_INT); static void push_minipool_fix (rtx_insn *, HOST_WIDE_INT, rtx *,
PIC calls without PLT, generic implementation
A recent post by Sriraman prompts me to post my -fno-plt approach sooner rather than later; I was working on no-PLT PIC codegen in the last few days too. Although I'm posting a patch series, half of it is i386 backend tuning and can go in independently. Except for one patch where it's noted specifically, the patches were bootstrapped and regtested together, not separately, on x86-64. Likewise the improvement claimed below is obtained with GCC with all patches applied, the difference being only in the -fno-plt flag. The approach taken here is different. Instead of adjusting call expansion in the back end, I force the callee address to be loaded into a pseudo at RTL expansion time, similar to function CSE, which is not enabled on most targets. The address load (which loads from GOT) can be moved out of loops, scheduled, or, on x86, re-fused with the indirect jump by peepholes. On 32-bit x86, it also allows the compiler to use registers other than %ebx for the GOT pointer (which can be a win since %ebx is callee-saved). The benefit of PLT is the possibility of lazy relocation. It is not possible with BIND_NOW, in particular when -z relro -z now flags were used at link time as a security hardening measure. Performance-critical executables do not particularly need PLT and lazy relocation either, except if they are used very frequently, with each individual run time extremely small -- but in that case they can benefit massively from static linking or less massively from prelinking, and with prelinking they can get the benefit of no-plt. I've used LLVM/Clang to evaluate the performance impact of PLT-less PIC codegen. I configured with cmake -DLLVM_ENABLE_PIC=ON -DBUILD_SHARED_LIBS=ON \ -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=OFF from the 3.6 release branch; this configuration mimics the non-static build that e.g. OpenSUSE is using, and produces Clang dependent on 112 clang/llvm shared libraries, with roughly 24000 externally visible functions. 
Without input files time is mostly spent on dynamic linking, so without prelink there's a predictable regression, from 55 to 140 ms. On C++ hello world, I get:

           PLT      no-PLT   PLT+BIND_NOW
[32bit]    430 ms   535 ms   590 ms
[64bit]    410 ms   495 ms   555 ms

So no-PLT is 20% slower than default, but already 10% faster when non-lazy binding is forced. On tramp3d compilation with -O2 -g I get:

           PLT      no-PLT
[32bit]    49.0 s   43.3 s
[64bit]    41.6 s   36.8 s

So on long-running compiles -fno-plt is a very significant win. Note that I'm using Clang as a (perhaps extreme) example of PIC-call-intensive code, but the argument about -fno-plt being useful for performance should apply generally. When looking at code size changes, there's a 1% improvement on 32-bit libstdc++ and a small regression on 64-bit. On LLVM/Clang, there's an overall size regression on both 32-bit and 64-bit; I've tried to analyze it and so far came up with one possible cause, which is detailed in the IRA REG_EQUIV patch. Thanks. Alexander
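The rounded percentages can be checked against the raw timings. A minimal sketch; percent_change is a hypothetical helper, and the inputs are the figures quoted in the tables.

```c
#include <assert.h>

/* Relative change from base to value, in percent.
   Negative means value is smaller (faster) than base. */
double percent_change(double base, double value)
{
    assert(base > 0.0);
    return (value - base) / base * 100.0;
}
```

On the quoted numbers this gives, for hello world, about +24.4% (32-bit) and +20.7% (64-bit) for no-PLT relative to PLT, and about -9.3% and -10.8% for no-PLT relative to PLT+BIND_NOW. For tramp3d it gives about -11.6% and -11.5%.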
[PATCH i386] Move CLOBBERED_REGS earlier in register class list
On 32-bit x86, register class CLOBBERED_REGS is a proper subset of LEGACY_REGS, which causes IRA not to consider it separately for register allocation, even when it has lower cost than other classes. This patch is useful to fix code generation problem that appears with no-PLT PIC tailcalls. Was there a specific reason for CLOBBERED_REGS class to be listed as late as it is? On 32-bit this class contains only EAX, ECX, EDX. OK? * config/i386/i386.h (enum reg_class): Move CLOBBERED_REGS before Q_REGS. (REG_CLASS_NAMES): Ditto. (REG_CLASS_CONTENTS): Ditto. diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 1e755d3..75071ac 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1300,17 +1300,17 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); enum reg_class { NO_REGS, AREG, DREG, CREG, BREG, SIREG, DIREG, AD_REGS, /* %eax/%edx for DImode */ + CLOBBERED_REGS, /* call-clobbered integer registers */ Q_REGS, /* %eax %ebx %ecx %edx */ NON_Q_REGS, /* %esi %edi %ebp %esp */ INDEX_REGS, /* %eax %ebx %ecx %edx %esi %edi %ebp */ LEGACY_REGS, /* %eax %ebx %ecx %edx %esi %edi %ebp %esp */ - CLOBBERED_REGS, /* call-clobbered integer registers */ GENERAL_REGS,/* %eax %ebx %ecx %edx %esi %edi %ebp %esp %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */ FP_TOP_REG, FP_SECOND_REG, /* %st(0) %st(1) */ FLOAT_REGS, SSE_FIRST_REG, NO_REX_SSE_REGS, @@ -1361,16 +1361,16 @@ enum reg_class #define REG_CLASS_NAMES \ { NO_REGS, \ AREG, DREG, CREG, BREG, \ SIREG, DIREG, \ AD_REGS, \ + CLOBBERED_REGS, \ Q_REGS, NON_Q_REGS, \ INDEX_REGS, \ LEGACY_REGS, \ - CLOBBERED_REGS, \ GENERAL_REGS, \ FP_TOP_REG, FP_SECOND_REG, \ FLOAT_REGS, \ SSE_FIRST_REG,\ NO_REX_SSE_REGS, \ SSE_REGS, \ @@ -1400,17 +1400,17 @@ enum reg_class { 0x02, 0x0,0x0 }, /* DREG */ \ { 0x04, 0x0,0x0 }, /* CREG */ \ { 0x08, 0x0,0x0 }, /* BREG */ \ { 0x10, 0x0,0x0 }, /* SIREG */ \ { 0x20, 0x0,0x0 }, /* DIREG */ \ { 0x03, 0x0,0x0 }, /* AD_REGS */ \ + { 0x07, 0x0,0x0 }, /* CLOBBERED_REGS */\ { 
0x0f, 0x0,0x0 }, /* Q_REGS */\ { 0x1100f0,0x1fe0,0x0 }, /* NON_Q_REGS */\ { 0x7f,0x1fe0,0x0 }, /* INDEX_REGS */\ { 0x1100ff, 0x0,0x0 }, /* LEGACY_REGS */ \ - { 0x07, 0x0,0x0 }, /* CLOBBERED_REGS */\ { 0x1100ff,0x1fe0,0x0 }, /* GENERAL_REGS */ \ { 0x100, 0x0,0x0 }, /* FP_TOP_REG */\ { 0x0200, 0x0,0x0 }, /* FP_SECOND_REG */ \ { 0xff00, 0x0,0x0 }, /* FLOAT_REGS */\ { 0x20, 0x0,0x0 }, /* SSE_FIRST_REG */ \ { 0x1fe0, 0x00,0x0 }, /* NO_REX_SSE_REGS */ \
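The subset relations behind this reordering can be checked directly from the masks in the diff. A minimal sketch using only the first mask word of each class as quoted above; the *_MASK names and helper functions are hypothetical. If I read the internals manual correctly, a class that is contained in another is expected to come earlier in enum reg_class, which the old position of CLOBBERED_REGS did not satisfy.

```c
#include <assert.h>

/* First word of the REG_CLASS_CONTENTS entries quoted in the patch. */
enum {
    AD_REGS_MASK        = 0x03,       /* %eax %edx */
    CLOBBERED_REGS_MASK = 0x07,       /* %eax %edx %ecx */
    Q_REGS_MASK         = 0x0f,       /* adds %ebx */
    LEGACY_REGS_MASK    = 0x1100ff
};

/* Nonzero if every register in a is also in b. */
int is_subset(unsigned a, unsigned b) { return (a & ~b) == 0; }

/* Nonzero if a is contained in b and strictly smaller. */
int is_proper_subset(unsigned a, unsigned b)
{
    return is_subset(a, b) && a != b;
}

/* The nesting the new enum order reflects:
   AD_REGS < CLOBBERED_REGS < Q_REGS < LEGACY_REGS. */
void check_reg_class_nesting(void)
{
    assert(is_proper_subset(AD_REGS_MASK, CLOBBERED_REGS_MASK));
    assert(is_proper_subset(CLOBBERED_REGS_MASK, Q_REGS_MASK));
    assert(is_proper_subset(Q_REGS_MASK, LEGACY_REGS_MASK));
}
```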
[PATCH i386] PR65753: allow PIC tail calls via function pointers
In the i386 backend, tailcalls are incorrectly disallowed in PIC mode for calls via function pointers on the basis that indirect calls, like direct calls, would go via PLT and thus require %ebx to point to GOT -- but that is not true. Quoting Rich Felker who reported the bug, For PLT slots in the non-PIE main executable, %ebx is not required at all. PLT slots in PIE or shared libraries need %ebx, but a function pointer can never evaluate to such a PLT slot; it always evaluates to the nominal address of the function which is the same in all DSOs and therefore fundamentally cannot depend on the address of the GOT in the calling DSO As far as I can see it's simply a mistake that was there from day 1 (comment 4 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 points to original patch). Bootstrapped and regtested on 32-bit x86, OK for trunk? (the comment before the condition will need to be adjusted too, i.e. s/optimize any indirect call, or a direct call/optimize any direct call/ ) PR target/65753 * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow PIC sibcalls via function pointers. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3263656..f29e053 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5448,13 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) /* If we are generating position-independent code, we cannot sibcall optimize any indirect call, or a direct call to a global function, as the PLT requires %ebx be live. (Darwin does not have a PLT.) */ if (!TARGET_MACHO !TARGET_64BIT flag_pic - (!decl || !targetm.binds_local_p (decl))) + (decl !targetm.binds_local_p (decl))) return false; /* If we need to align the outgoing stack, then sibcalling would unalign the stack, which may break the called function. */ if (ix86_minimum_incoming_stack_boundary (true) PREFERRED_STACK_BOUNDARY)
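The effect of changing (!decl || ...) to (decl && ...) is easiest to see as a truth table. A minimal sketch with hypothetical stand-ins for the compiler state: pic32 abbreviates the !TARGET_MACHO, !TARGET_64BIT and flag_pic conjunction, has_decl is false for a call through a function pointer, and binds_local models targetm.binds_local_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the sibcall check looks at. */
struct call_info {
    bool pic32;        /* 32-bit non-Darwin PIC code */
    bool has_decl;     /* false for an indirect call */
    bool binds_local;  /* meaningful only when has_decl is true */
};

/* Condition before the patch: indirect calls are rejected too. */
bool old_rejects(struct call_info c)
{
    return c.pic32 && (!c.has_decl || !c.binds_local);
}

/* Condition after the patch: only direct calls to non-local functions. */
bool new_rejects(struct call_info c)
{
    return c.pic32 && (c.has_decl && !c.binds_local);
}

/* The two conditions differ only for indirect calls in 32-bit PIC code. */
void compare_conditions(void)
{
    struct call_info indirect = { true, false, false };
    assert(old_rejects(indirect) && !new_rejects(indirect));
}
```

Direct calls to global functions stay rejected under both conditions, and direct calls to locally bound functions stay allowed, so the patch changes the outcome for function-pointer calls only.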
[PATCH i386] Allow sibcalls in no-PLT PIC
With -fno-plt, we don't have to reject even direct calls as sibcall candidates. This patch depends on '-fplt' flag that is introduced in another patch. This patch requires that with -fno-plt all sibcall candidates go through prepare_call_address that transforms the call to a GOT lookup. OK? * config/i386/i386.c (ix86_function_ok_for_sibcall): Check flag_plt. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f29e053..b734350 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5448,12 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) /* If we are generating position-independent code, we cannot sibcall optimize any indirect call, or a direct call to a global function, as the PLT requires %ebx be live. (Darwin does not have a PLT.) */ if (!TARGET_MACHO !TARGET_64BIT flag_pic + flag_plt (decl !targetm.binds_local_p (decl))) return false; /* If we need to align the outgoing stack, then sibcalling would unalign the stack, which may break the called function. */ if (ix86_minimum_incoming_stack_boundary (true)
[PATCH i386] Extend sibcall peepholes to allow source in %eax
On i386, peepholes that transform memory load and register-indirect jump into memory-indirect jump are overly restrictive in that they don't allow combining when the jump target is loaded into %eax, and the called function returns a value (also in %eax, so it's not dead after the call). Fix this by checking for same source and output register operands separately. OK? * config/i386/i386.md (sibcall_value_memory): Extend peepholes to allow memory address in %eax. (sibcall_value_pop_memory): Likewise. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 729db75..7f81bcc 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -11872,13 +11872,14 @@ [(set (match_operand:W 0 register_operand) (match_operand:W 1 memory_operand)) (set (match_operand 2) (call (mem:QI (match_dup 0)) (match_operand 3)))] !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (1)) -peep2_reg_dead_p (2, operands[0]) +(REGNO (operands[2]) == REGNO (operands[0]) + || peep2_reg_dead_p (2, operands[0])) [(parallel [(set (match_dup 2) (call (mem:QI (match_dup 1)) (match_dup 3))) (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) (define_peephole2 @@ -11886,13 +11887,14 @@ (match_operand:W 1 memory_operand)) (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) (set (match_operand 2) (call (mem:QI (match_dup 0)) (match_operand 3)))] !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (2)) -peep2_reg_dead_p (3, operands[0]) +(REGNO (operands[2]) == REGNO (operands[0]) + || peep2_reg_dead_p (3, operands[0])) [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) (parallel [(set (match_dup 2) (call (mem:QI (match_dup 1)) (match_dup 3))) (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) @@ -11951,13 +11953,14 @@ (call (mem:QI (match_dup 0)) (match_operand 3))) (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (match_operand:SI 4 immediate_operand)))])] !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (1)) -peep2_reg_dead_p (2, operands[0]) +(REGNO (operands[2]) == REGNO (operands[0]) + || peep2_reg_dead_p (2, 
operands[0])) [(parallel [(set (match_dup 2) (call (mem:QI (match_dup 1)) (match_dup 3))) (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (match_dup 4))) @@ -11971,13 +11974,14 @@ (call (mem:QI (match_dup 0)) (match_operand 3))) (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (match_operand:SI 4 immediate_operand)))])] !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (2)) -peep2_reg_dead_p (3, operands[0]) +(REGNO (operands[2]) == REGNO (operands[0]) + || peep2_reg_dead_p (3, operands[0])) [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) (parallel [(set (match_dup 2) (call (mem:QI (match_dup 1)) (match_dup 3))) (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG)
[PATCH] Expand PIC calls without PLT with -fno-plt
This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. diff --git a/gcc/calls.c b/gcc/calls.c index 970415d..0c3b9aa 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -222,12 +222,18 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value, /* If we are using registers for parameters, force the function address into a register now. */ funexp = ((reg_parm_seen targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) : memory_address (FUNCTION_MODE, funexp)); + else if (flag_pic !flag_plt fndecl_or_type + TREE_CODE (fndecl_or_type) == FUNCTION_DECL + !targetm.binds_local_p (fndecl_or_type)) +{ + funexp = force_reg (Pmode, funexp); +} else if (! sibcallp) { #ifndef NO_FUNCTION_CSE if (optimize ! flag_no_function_cse) funexp = force_reg (Pmode, funexp); #endif diff --git a/gcc/common.opt b/gcc/common.opt index b49ac46..cd8b256 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1773,12 +1773,16 @@ Common Report Var(flag_pic,1) Negative(fpie) Generate position-independent code if possible (small mode) fpie Common Report Var(flag_pie,1) Negative(fPIC) Generate position-independent code for executables if possible (small mode) +fplt +Common Report Var(flag_plt) Init(1) +Use PLT for PIC calls (-fno-plt: load the address from GOT at call site) + fplugin= Common Joined RejectNegative Var(common_deferred_options) Defer Specify a plugin to load fplugin-arg- Common Joined RejectNegative Var(common_deferred_options) Defer
[RFC PATCH] ira: accept loads via argp rtx in validate_equiv_mem
With this patch at hand, I'd like to discuss a code generation problem, which my patch solves only partially. FWIW, it passes bootstrap/regtest on x86-64.

With other patches in series applied, GCC with -fno-plt can generate tail calls in PIC mode more frequently, but sometimes poorer code is generated. I've tried to look for possible causes, and found one issue so far. Consider the following testcase:

void foo1(int a, int b, int c, int d, int e, int f, int g, int h);
int bar(int x);
void foo2(int a, int b, int c, int d, int e, int f, int g, int h)
{
  bar(a);
  foo1(a, b, c, d, e, f, g, h);
}

Comparing x86 code generation with -O2 -m32 and with/without -fPIC, you can see that -fPIC happens to produce smaller code. Without -fPIC, GCC saves/restores all arguments before/after the call to 'bar'.

The reason is that without -fPIC, GCC performs tail call optimization on 'foo1', and that causes it to drop REG_EQUIV notes for incoming arguments in fixup_tail_calls. After that, code generation diverges at the IRA stage, where the lack of equivalences prevents loads of pseudos from being moved to the point of first use.

The patch tries to repair the problem by allowing REG_EQUIV notes to be resynthesized at ira init for loads that happen via `argp' rtx. It helps for the simple testcase above, but not for the problematic Clang/LLVM functions where I noticed the issue.

I hope there's a way around the 'big hammer' approach of fixup_tail_calls. Might it be possible, instead of dropping REG_EQUIV notes, to copy incoming arguments into other pseudos just prior to stack pointer adjustment in preparation for the tailcall?

diff --git a/gcc/ira.c b/gcc/ira.c
index ea2b69f..e6b82e2 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3001,13 +3001,16 @@ validate_equiv_mem (rtx_insn *start, rtx reg, rtx memref)
       /* This used to ignore readonly memory and const/pure calls.  The problem
          is the equivalent form may reference a pseudo which gets assigned a
          call clobbered hard reg.  When we later replace REG with its
          equivalent form, the value in the call-clobbered reg has been
          changed and all hell breaks loose.  */
-      if (CALL_P (insn))
+      rtx addr = XEXP (memref, 0);
+      if (GET_CODE (addr) == PLUS && GET_CODE (XEXP (addr, 1)) == CONST_INT)
+        addr = XEXP (addr, 0);
+      if (CALL_P (insn) && addr != arg_pointer_rtx)
         return 0;
 
       note_stores (PATTERN (insn), validate_equiv_mem_from_store, NULL);
 
       /* If a register mentioned in MEMREF is modified via an
          auto-increment, we lose the equivalence.  Do the same if one
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
The use case proposed by Sri allows the user to selectively eliminate PLT overhead for hot external calls only. In such scenarios, lazy binding won't be something that matters to the user.

David

On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:

Hi,

On Thu, 30 Apr 2015, Sriraman Tallam wrote:

We noticed that one of our benchmarks sped-up by ~1% when we eliminated PLT stubs for some of the hot external library functions like memcmp, pow. The win was from better icache and itlb performance. The main reason was that the PLT stubs had no spatial locality with the call-sites.

I have started looking at ways to tell the compiler to eliminate PLT stubs (in-effect inline them) for specified external functions, for x86_64. I have a proposal and a patch and I would like to hear what you think.

This comes with caveats. This cannot be generally done for all functions marked extern as it is impossible for the compiler to say if a function is truly extern (defined in a shared library). If a function is not truly extern (ends up defined in the final executable), then calling it indirectly is a performance penalty as it could have been a direct call.

This can be fixed by Alan's idea.

Further, the newly created GOT entries are fixed up at start-up and do not get lazily bound.

And this can be fixed by some enhancements in the linker and dynamic linker. The idea is to still generate a PLT stub and make its GOT entry point to it initially (like a normal got.plt slot). Then the first indirect call will use the address of the PLT entry (starting lazy resolution) and update the GOT slot with the real address, so further indirect calls will go directly to the function.

This requires a new asm marker (and hence new reloc) as normally if there's a GOT slot it's filled by the real symbol's address, unlike if there's only a got.plt slot. E.g. a

  call *foo@GOTPLT(%rip)

would generate a GOT slot (and fill its address into above call insn), but generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.

Ciao,
Michael.
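The lazily bound slot described in this message can be modelled in plain C with a function pointer. This is only a sketch of the idea, not linker or dynamic-linker code: got_slot, resolver_stub, real_square and the counter are hypothetical names made up for illustration.

```c
/* Hypothetical stand-in for a function living in a shared library.  */
static int real_square(int x) { return x * x; }

typedef int (*fn_t)(int);

static int resolver_stub(int x);

/* Simulated GOT slot: it starts out pointing at the stub, the way a
   got.plt entry initially points back into its PLT stub.  */
static fn_t got_slot = resolver_stub;
static int resolve_count = 0;

/* Simulated PLT stub plus dynamic linker: on first use, look up the
   real function, patch the slot, then forward the call.  */
static int resolver_stub(int x)
{
    resolve_count++;
    got_slot = real_square;
    return got_slot(x);
}

/* What a call site using an indirect call through the slot would do:
   after the first call, control goes straight to the real function.  */
int call_via_slot(int x) { return got_slot(x); }

/* How many times the (simulated) lazy resolution ran.  */
int times_resolved(void) { return resolve_count; }
```

Calling call_via_slot twice resolves only once; the second call never goes through the stub, which is the property the proposed reloc is meant to preserve.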
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
Hi, On Mon, 4 May 2015, Xinliang David Li wrote: The use case proposed by Sri allows user to selectively eliminate PLT overhead for hot external calls only. Yes, but only _because_ his approach doesn't use lazy binding. With the full solution such restriction to a subset of functions isn't necessary. And we should strive for going the full way, instead of adding hacks, shouldn't we? Ciao, Michael.
Re: [RFA] More type narrowing in match.pd V2
On 05/02/2015 03:17 PM, Bernhard Reutner-Fischer wrote:

I should find time to commit the already approved auto-wipe dump file patch. So let's assume I'll get to it maybe next weekend and nobody will notice the 2 leftover .original dumps in this patch :)

Doh! Not sure how there'd be a .original dump left lying around, but as posted it'll definitely leave a .optimized lying around. I'll fix that before committing. Thanks for pointing it out.

jeff
Re: [PATCH] Remove dead code.
On 05/04/2015 05:50 AM, Dominik Vogt wrote:

This patch removes a write only variable from the C++ code.

ChangeLog:
--
2015-05-04 Dominik Vogt v...@linux.vnet.ibm.com

* call.c (print_z_candidates): Remove dead code.

OK. Please install. FWIW, removing a write-only variable seems like it ought to fall under the obvious rule.

jeff
Re: [PATCH 00/13] further rtx_insn *ification
On 05/02/2015 03:01 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders tbsaunde+...@tbsaunde.org

Hi,

This set of patches changes rtx to rtx_insn * in many places where it's fairly trivial to do so. Each was bootstrapped + regtested on x86_64-linux-gnu, and the series was run through config-list.mk. I believe this all falls under Jeff's preapproval from last year for this sort of thing which I assume is still valid, so committing to trunk.

And just to be explicit, it does fall under that preapproval for such changes.

Jeff
Re: [PATCH 1/4] libcpp: Improvements to comments in line-map.h/c
On 05/01/2015 06:56 PM, David Malcolm wrote: This patch updates and expands some comments in libcpp, adding a big table to try to clarify what an individual source_location value can mean. libcpp/ChangeLog: * include/line-map.h: Fix comment at the top of the file. (source_location): Rewrite and expand the comment for this typedef, adding an ascii-art table to clarify how source_location values are allocated. * line-map.c: Fix comment at the top of the file. OK. jeff
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
yes -- a full solution that supports lazy binding will be nice. David On Mon, May 4, 2015 at 9:58 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 4 May 2015, Xinliang David Li wrote: The use case proposed by Sri allows user to selectively eliminate PLT overhead for hot external calls only. Yes, but only _because_ his approach doesn't use lazy binding. With the full solution such restriction to a subset of functions isn't necessary. And we should strive for going the full way, instead of adding hacks, shouldn't we? Ciao, Michael.
Re: [rfc, stage 1] default to -fno-delete-null-pointer-checks on nios2-elf
On 05/01/2015 02:33 PM, Sandra Loosemore wrote: Re https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01510.html : On 04/15/2015 10:42 PM, Jeff Law wrote: It looks very sane to me. This is probably how the AVR and CR16 should have been handled to begin with IMHO. FWIW, I generally discourage ports overriding default options, but this is a case where I believe it makes some sense. Please move forward with an official submission. I've now bootstrapped and regression-tested the previously posted patch on x86_64-linux-gnu, as well as retesting it on nios2-elf after updating my source tree to current mainline head. Are the target-independent parts OK to commit? Yes. Please install. Thanks, Jeff
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On 05/04/2015 10:37 AM, Alexander Monakov wrote: This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. OK once you cobble together the invoke.texi changes. Jeff
Re: [RFC PATCH] ira: accept loads via argp rtx in validate_equiv_mem
On 05/04/2015 10:37 AM, Alexander Monakov wrote:

With this patch at hand, I'd like to discuss a code generation problem, which my patch solves only partially. FWIW, it passes bootstrap/regtest on x86-64.

With other patches in series applied, GCC with -fno-plt can generate tail calls in PIC mode more frequently, but sometimes poorer code is generated. I've tried to look for possible causes, and found one issue so far. Consider the following testcase:

void foo1(int a, int b, int c, int d, int e, int f, int g, int h);
int bar(int x);
void foo2(int a, int b, int c, int d, int e, int f, int g, int h)
{
  bar(a);
  foo1(a, b, c, d, e, f, g, h);
}

Comparing x86 code generation with -O2 -m32 and with/without -fPIC, you can see that -fPIC happens to produce smaller code. Without -fPIC, GCC saves/restores all arguments before/after the call to 'bar'.

The reason is that without -fPIC, GCC performs tail call optimization on 'foo1', and that causes it to drop REG_EQUIV notes for incoming arguments in fixup_tail_calls. After that, code generation diverges at the IRA stage, where the lack of equivalences prevents loads of pseudos from being moved to the point of first use.

The patch tries to repair the problem by allowing REG_EQUIV notes to be resynthesized at ira init for loads that happen via `argp' rtx. It helps for the simple testcase above, but not for the problematic Clang/LLVM functions where I noticed the issue.

I hope there's a way around the 'big hammer' approach of fixup_tail_calls. Might it be possible, instead of dropping REG_EQUIV notes, to copy incoming arguments into other pseudos just prior to stack pointer adjustment in preparation for the tailcall?

Isn't the whole point of dropping the notes to indicate that those argument slots are no longer guaranteed to hold the value at all points throughout the function?

That can certainly be relaxed, but you'll have to have some kind of code to analyze the data in the argument slots to ensure they haven't changed. You can't just blindly put the notes back if I remember this stuff correctly.

Jeff
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote: On 05/04/2015 10:37 AM, Alexander Monakov wrote: This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. OK once you cobble together the invoke.texi changes. Isn't what Michael/Alan suggested better? I mean as/ld/compiler changes to inline the plt slot's first part, then lazy binding will work fine. Jakub
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On 05/04/2015 11:39 AM, Jakub Jelinek wrote: On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote: On 05/04/2015 10:37 AM, Alexander Monakov wrote: This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. OK once you cobble together the invoke.texi changes. Isn't what Michael/Alan suggested better? I mean as/ld/compiler changes to inline the plt slot's first part, then lazy binding will work fine. I must have missed Alan/Michael's message. ISTM the win here is that by going through the GOT, you can CSE the GOT reference and possibly get some more register allocation freedom. Is that still the case with Alan/Michael's approach? jeff
Re: [PATCH] fixup libobjc usage of PCC_BITFIELD_TYPE_MATTERS
On 05/01/2015 09:30 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders tbsaunde+...@tbsaunde.org

Hi,

This adds a configure check to libobjc to find out if types of bitfields affect their layout, and uses it to replace the rather broken usage of PCC_BITFIELD_TYPE_MATTERS.

bootstrapped + regtested x86_64-linux-gnu, bootstrapped on ppc64le-linux-gnu and ran check-objc there without failures, and checked the correct part of the ifdef is used on a cross to m68k-linux-elf. ok? I'm sure I've gotten something wrong since this is a bunch of auto tools ;-)

Trev

libobjc/ChangeLog:

2015-05-01 Trevor Saunders tbsaunde+...@tbsaunde.org

* acinclude.m4: Include bitfields.m4.
* config.h.in: Regenerate.
* configure: Likewise.
* configure.ac: Invoke gt_BITFIELD_TYPE_MATTERS.
* encoding.c: Check HAVE_BITFIELD_TYPE_MATTERS.

OK with the general direction here. If Jakub's test is better, then go with it as a follow-up.

jeff
[C++ Patch] PR 66007
Hi,

unfortunately we have to return to these few lines of code :( This regression is a more subtle variant of c++/65858: if the user passes -Wno-error=narrowing, the pedwarn doesn't result in an actual error (even if we are forcing -pedantic-errors around it) but still produces a warning, thus returns true, and ok isn't set to true, thus we have a miscompilation in this case too. Jakub suggested simply checking errorcount by hand, which passes all my tests.

Thanks,
Paolo.

/cp
2015-05-04 Paolo Carlini paolo.carl...@oracle.com
	   Jakub Jelinek ja...@redhat.com

PR c++/66007
* typeck2.c (check_narrowing): Check by-hand that the pedwarn didn't result in an actual error.

/testsuite
2015-05-04 Paolo Carlini paolo.carl...@oracle.com
	   Jakub Jelinek ja...@redhat.com

PR c++/66007
* g++.dg/cpp0x/Wnarrowing4.C: New.

Index: cp/typeck2.c
===
--- cp/typeck2.c (revision 222767)
+++ cp/typeck2.c (working copy)
@@ -958,10 +958,12 @@ check_narrowing (tree type, tree init, tsubst_flag
 	}
       else if (complain & tf_error)
 	{
+	  int savederrorcount = errorcount;
 	  global_dc->pedantic_errors = 1;
-	  if (!pedwarn (EXPR_LOC_OR_LOC (init, input_location), OPT_Wnarrowing,
-			"narrowing conversion of %qE from %qT to %qT "
-			"inside { }", init, ftype, type))
+	  pedwarn (EXPR_LOC_OR_LOC (init, input_location), OPT_Wnarrowing,
+		   "narrowing conversion of %qE from %qT to %qT "
+		   "inside { }", init, ftype, type);
+	  if (errorcount == savederrorcount)
 	    ok = true;
 	  global_dc->pedantic_errors = flag_pedantic_errors;
 	}
Index: testsuite/g++.dg/cpp0x/Wnarrowing4.C
===
--- testsuite/g++.dg/cpp0x/Wnarrowing4.C (revision 0)
+++ testsuite/g++.dg/cpp0x/Wnarrowing4.C (working copy)
@@ -0,0 +1,14 @@
+// PR c++/66007
+// { dg-do run { target c++11 } }
+// { dg-options "-Wno-error=narrowing" }
+
+extern "C" void abort();
+
+int main()
+{
+  unsigned foo[] = { 1, -1, 3 };
+  if (foo[0] != 1 || foo[1] != __INT_MAX__ * 2U + 1 || foo[2] != 3)
+    abort();
+}
+
+// { dg-prune-output "narrowing conversion" }
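The difference between trusting the return value and comparing the error count can be shown with a toy model. The sketch below assumes a diagnostic routine that returns true whenever anything was emitted, warning or error; mock_pedwarn, mock_errorcount and the enum are hypothetical stand-ins, not GCC's diagnostic API.

```c
#include <stdbool.h>

/* Hypothetical global error counter, playing the role of errorcount.  */
static int mock_errorcount = 0;

enum mock_kind { MOCK_WARNING, MOCK_ERROR };

/* Returns true whenever *some* diagnostic was emitted, whether it
   ended up as a warning or as an error.  Only errors bump the count.  */
static bool mock_pedwarn(enum mock_kind effective_kind)
{
    if (effective_kind == MOCK_ERROR)
        mock_errorcount++;
    return true;
}

/* Old logic: "ok" only if nothing at all was emitted.  A diagnostic
   downgraded to a warning therefore still counts as a failure.  */
bool ok_by_return_value(enum mock_kind kind)
{
    return !mock_pedwarn(kind);
}

/* New logic: "ok" as long as the error count did not move.  */
bool ok_by_errorcount(enum mock_kind kind)
{
    int saved = mock_errorcount;
    mock_pedwarn(kind);
    return mock_errorcount == saved;
}
```

With a warning-only diagnostic the first variant reports failure while the second reports success, which mirrors the -Wno-error=narrowing case described above.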
[PATCH] Fix ubsan non-call-exceptions ICE (PR tree-optimization/65984)
Hi!

The code I've added in r217755 was assuming that a stmt_could_throw_p memory read will always end a bb, but that is clearly not the case. Thus, the following patch uses stmt_ends_bb_p instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5?

2015-05-04 Jakub Jelinek ja...@redhat.com

PR tree-optimization/65984
* ubsan.c: Include tree-cfg.h.
(instrument_bool_enum_load): Use stmt_ends_bb_p instead of stmt_could_throw_p test, rename can_throw variable to ends_bb.

* c-c++-common/ubsan/pr65984.c: New test.

--- gcc/ubsan.c.jj 2015-04-09 21:49:59.0 +0200
+++ gcc/ubsan.c 2015-05-04 17:17:34.273661884 +0200
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.
 #include "builtins.h"
 #include "tree-object-size.h"
 #include "tree-eh.h"
+#include "tree-cfg.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -1420,7 +1421,7 @@ instrument_bool_enum_load (gimple_stmt_i
       || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
     return;
 
-  bool can_throw = stmt_could_throw_p (stmt);
+  bool ends_bb = stmt_ends_bb_p (stmt);
   location_t loc = gimple_location (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree ptype = build_pointer_type (TREE_TYPE (rhs));
@@ -1432,7 +1433,7 @@ instrument_bool_enum_load (gimple_stmt_i
   tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),
 		     build_int_cst (atype, 0));
   tree urhs = make_ssa_name (utype);
-  if (can_throw)
+  if (ends_bb)
     {
       gimple_assign_set_lhs (stmt, urhs);
       g = gimple_build_assign (lhs, NOP_EXPR, urhs);
@@ -1469,7 +1470,7 @@ instrument_bool_enum_load (gimple_stmt_i
   gimple_set_location (g, loc);
   gsi_insert_after (gsi, g, GSI_NEW_STMT);
 
-  if (!can_throw)
+  if (!ends_bb)
     {
       gimple_assign_set_rhs_with_ops (gsi2, NOP_EXPR, urhs);
       update_stmt (stmt);
--- gcc/testsuite/c-c++-common/ubsan/pr65984.c.jj 2015-05-04 14:16:59.655378975 +0200
+++ gcc/testsuite/c-c++-common/ubsan/pr65984.c 2015-05-04 17:19:55.875447821 +0200
@@ -0,0 +1,23 @@
+/* PR tree-optimization/65984 */
+/* { dg-do compile } */
+/* { dg-options "-fnon-call-exceptions -fsanitize=bool,enum" } */
+
+#ifndef __cplusplus
+#define bool _Bool
+#endif
+
+enum E { E0, E1, E2 };
+enum E e[2];
+bool *b;
+
+int
+foo (int i)
+{
+  return e[i];
+}
+
+int
+bar (int i)
+{
+  return b[i];
+}

Jakub
Re: [PATCH 2/4] libcpp: Replace macro usage with C++ constructs
On 05/01/2015 06:56 PM, David Malcolm wrote: libcpp makes extensive use of the C preprocessor. Whilst this has a pleasingly self-referential quality, I find the code hard-to-read; implementing source location support in my JIT branch was much harder than I felt it should have been. In an attempt at making the code easier to follow, and to build towards a followup patch, this patch converts most of these macros to C++ equivalents: using const for compile-time constants, and inline functions where macros aren't used as lvalues. This effectively documents the expected types of the params, and makes them available from the debugger e.g.: (gdb) p LINEMAP_FILE ($3) $1 = 0x13b8b37 command-line and indeed the constants also: (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION) $2 = false (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION + 1) $3 = true [I didn't mark the inline functions as static; should they be?] [FWIW, I posted a reduced version of this patch about a year ago as: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01092.html which covered a smaller subset of the macros]. libcpp/ChangeLog: * include/line-map.h (MAX_SOURCE_LOCATION): Convert from a macro to a const source_location. (RESERVED_LOCATION_COUNT): Likewise. (linemap_check_ordinary): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MAP_START_LOCATION): Likewise. (ORDINARY_MAP_STARTING_LINE_NUMBER): Likewise. (ORDINARY_MAP_INCLUDER_FILE_INDEX): Likewise. (ORDINARY_MAP_IN_SYSTEM_HEADER_P): Likewise. (ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): Convert from a macro to a pair of inline functions, for const/non-const arguments, where the latter is named... (SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): New function. (ORDINARY_MAP_FILE_NAME): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MACRO_MAP_MACRO): Likewise. (MACRO_MAP_NUM_MACRO_TOKENS): Likewise. (MACRO_MAP_LOCATIONS): Likewise. (MACRO_MAP_EXPANSION_POINT_LOCATION): Likewise. (LINEMAPS_MAP_INFO): Likewise. 
(LINEMAPS_MAPS): Likewise.
(LINEMAPS_ALLOCATED): Likewise.
(LINEMAPS_USED): Likewise.
(LINEMAPS_CACHE): Likewise.
(LINEMAPS_ORDINARY_CACHE): Likewise.
(LINEMAPS_MACRO_CACHE): Likewise.
(LINEMAPS_MAP_AT): Convert from a macro to an inline function.
(LINEMAPS_LAST_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_MAP): Likewise.
(LINEMAPS_ORDINARY_MAPS): Likewise.
(LINEMAPS_ORDINARY_MAP_AT): Likewise.
(LINEMAPS_ORDINARY_ALLOCATED): Likewise.
(LINEMAPS_ORDINARY_USED): Likewise.
(LINEMAPS_LAST_ORDINARY_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Likewise.
(LINEMAPS_MACRO_MAPS): Likewise.
(LINEMAPS_MACRO_MAP_AT): Likewise.
(LINEMAPS_MACRO_ALLOCATED): Likewise.
(LINEMAPS_MACRO_USED): Likewise.
(LINEMAPS_MACRO_LOWEST_LOCATION): Likewise.
(LINEMAPS_LAST_MACRO_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Likewise.
(IS_ADHOC_LOC): Likewise.
(COMBINE_LOCATION_DATA): Likewise.
(SOURCE_LINE): Likewise.
(SOURCE_COLUMN): Likewise.
(LAST_SOURCE_LINE_LOCATION): Likewise.
(LAST_SOURCE_LINE): Likewise.
(LAST_SOURCE_COLUMN): Likewise.
(LAST_SOURCE_LINE_LOCATION)
(INCLUDED_FROM): Likewise.
(MAIN_FILE_P): Likewise.
(LINEMAP_FILE): Likewise.
(LINEMAP_LINE): Likewise.
(LINEMAP_SYSP): Likewise.
(linemap_location_before_p): Likewise.
* line-map.c (linemap_check_files_exited): Make local map const.
(linemap_add): Use SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS.
(linemap_line_start): Likewise.
---

-#define MAP_START_LOCATION(MAP) (MAP)->start_location
+#if defined ENABLE_CHECKING && (GCC_VERSION >= 2007)
+
+/* Assertion macro to be used in line-map code.  */
+#define linemap_assert(EXPR) \
+  do { \
+    if (! (EXPR)) \
+      abort (); \
+  } while (0)
+
+/* Assert that becomes a conditional expression when checking is disabled at
+   compilation time.  Use this for conditions that should not happen but if
+   they happen, it is better to handle them gracefully rather than crash
+   randomly later.
+   Usage:
+
+   if (linemap_assert_fails(EXPR)) handle_error(); */
+#define linemap_assert_fails(EXPR) __extension__ \
+  ({linemap_assert (EXPR); false;})
+
+#else
+/* Include EXPR, so that unused variable warnings do not occur.  */
+#define linemap_assert(EXPR) ((void)(0 && (EXPR)))
+#define linemap_assert_fails(EXPR) (! (EXPR))
+#endif

So if we're generally trying to get away from #define programming, then this part seems like a bit of a step backwards.
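The kind of conversion this patch performs can be sketched in a few lines. The struct and accessor names below are hypothetical, cut-down stand-ins rather than the real line-map.h declarations, and plain C cannot express the const/non-const overload pairs the ChangeLog mentions, so only the read accessor is shown.

```c
/* Hypothetical, cut-down map type; the real struct has more fields.  */
struct mini_line_map {
    unsigned int start_location;
    unsigned int to_line;
};

/* Macro style: nothing states what MAP must be, and a debugger cannot
   call it.  */
#define MINI_MAP_START_LOCATION(MAP) ((MAP)->start_location)

/* Function style: the signature documents the expected parameter and
   result types, and the function is visible to a debugger.  */
static inline unsigned int
mini_map_start_location(const struct mini_line_map *map)
{
    return map->start_location;
}

/* Both accessors read the same field, so they must agree.  */
int accessors_agree(unsigned int start)
{
    struct mini_line_map m = { start, 1u };
    return MINI_MAP_START_LOCATION(&m) == mini_map_start_location(&m);
}

/* Read the field back through the function-style accessor.  */
unsigned int read_start(unsigned int start)
{
    struct mini_line_map m = { start, 1u };
    return mini_map_start_location(&m);
}
```

Passing the wrong pointer type to mini_map_start_location draws a compiler diagnostic, whereas the macro accepts anything that happens to have a start_location member.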
Re: [PATCH 3/4] libcpp/input.c: Add a way to visualize the linemaps
On 05/01/2015 06:56 PM, David Malcolm wrote:

As a relative newcomer to GCC, one of the issues I had was becoming comfortable with the linemap API and its internal representation. To familiarize myself with it, I wrote a dumping routine to try to visualize how the source_location space is carved up between line maps, and what each number can mean.

It struck me that this would benefit others, so this patch adds this visualization, via an undocumented option -fdump-locations, and adds a text file to libcpp's sources documenting a simple example of compiling a small C file, with a header and macro expansions (built using the -fdump-locations option and a little hand-editing).

gcc/ChangeLog:
* common.opt (fdump-locations): New option.
* input.c: Include diagnostic-core.h.
(get_end_location): New function.
(write_digit): New function.
(write_digit_row): New function.
(dump_location_range): New function.
(dump_labelled_location_range): New function.
(dump_location_info): New function.
* input.h (dump_location_info): New prototype.
* toplev.c (compile_file): Handle flag_dump_locations.

libcpp/ChangeLog:
* include/line-map.h (source_location): Add a reference to location-example.txt to the descriptive comment.
* location-example.txt: New file.

Maybe dump-internal-locations? Not sure I want to bikeshed on the name any more than that. If you feel strongly about the option name, then I won't stress about it.

+void
+dump_location_info (FILE *stream)
+{
+  if (0)
+    line_table_dump (stream,
+		     line_table,
+		     LINEMAPS_ORDINARY_USED (line_table),
+		     LINEMAPS_MACRO_USED (line_table));

Should the if (0) code go away?

+
+  /* A brute-force visualization: emit a warning at every location.  */
+  if (0)
+    for (source_location loc = 0; loc < line_table->highest_location; loc++)
+      warning_at (loc, 0, "this is location %i", loc);
+  /* Alternatively, we could use inform (), though this
+     also shows lots of locations in stdc-predef.h */

And again.

So I think with removing the if (0) code and the possible option name change this is good to go.

Jeff
Re: Extend verify_type to check various uses of TYPE_MINVAL
Hi,

if my wifi connection allows, I will commit the following patch, which I tested in the meantime. It also adds sanity checking for TYPE_MAXVAL that does not seem to trigger any issues anymore.

From type_non_common it remains to check values and binfo. I hope to kill all those fields and move them to derived structures where they belong, but it is harder than it seems because of the way obj-c++ shares data structures with the C++ and C FEs and abuses these fields in interesting ways. (I got stuck on these last stage1)

Honza

Index: ChangeLog
===
--- ChangeLog (revision 222791)
+++ ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2015-05-02 Jan Hubicka hubi...@ucw.cz
+
+	* tree.c (verify_type): Check various uses of TYPE_MAXVAL;
+	fix overactive TYPE_MIN_VALUE check and add FIXME for type
+	compatibility problems.
+
 2015-05-04 Ajit Agarwal ajit...@xilinx.com
 
 	* config/microblaze/microblaze.md (cbranchsi4): Added immediate
Index: tree.c
===
--- tree.c (revision 222753)
+++ tree.c (working copy)
@@ -12621,14 +12621,9 @@ verify_type (const_tree t)
     }
   else if (INTEGRAL_TYPE_P (t) || TREE_CODE (t) == REAL_TYPE || TREE_CODE (t) == FIXED_POINT_TYPE)
     {
-      if (!TYPE_MIN_VALUE (t))
-	;
-      else if (!TREE_CONSTANT (TYPE_MIN_VALUE (t)))
-	{
-	  error ("TYPE_MIN_VALUE is not constant");
-	  debug_tree (TYPE_MIN_VALUE (t));
-	  error_found = true;
-	}
+      /* FIXME: The following check should pass:
+	  useless_type_conversion_p (const_cast <tree> (t), TREE_TYPE (TYPE_MIN_VALUE (t))
+	 bud does not for C sizetypes in LTO.  */
     }
   else if (TYPE_MINVAL (t))
     {
@@ -12637,6 +12632,62 @@ verify_type (const_tree t)
       error_found = true;
     }
 
+  /* Check various uses of TYPE_MAXVAL.  */
+  if (RECORD_OR_UNION_TYPE_P (t))
+    {
+      if (TYPE_METHODS (t) && TREE_CODE (TYPE_METHODS (t)) != FUNCTION_DECL
+	  && TREE_CODE (TYPE_METHODS (t)) != TEMPLATE_DECL)
+	{
+	  error ("TYPE_METHODS is not FUNCTION_DECL nor TEMPLATE_DECL");
+	  debug_tree (TYPE_METHODS (t));
+	  error_found = true;
+	}
+    }
+  else if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE)
+    {
+      if (TYPE_METHOD_BASETYPE (t)
+	  && TREE_CODE (TYPE_METHOD_BASETYPE (t)) != RECORD_TYPE
+	  && TREE_CODE (TYPE_METHOD_BASETYPE (t)) != UNION_TYPE)
+	{
+	  error ("TYPE_METHOD_BASETYPE is not record nor union");
+	  debug_tree (TYPE_METHOD_BASETYPE (t));
+	  error_found = true;
+	}
+    }
+  else if (TREE_CODE (t) == OFFSET_TYPE)
+    {
+      if (TYPE_OFFSET_BASETYPE (t)
+	  && TREE_CODE (TYPE_OFFSET_BASETYPE (t)) != RECORD_TYPE
+	  && TREE_CODE (TYPE_OFFSET_BASETYPE (t)) != UNION_TYPE)
+	{
+	  error ("TYPE_OFFSET_BASETYPE is not record nor union");
+	  debug_tree (TYPE_OFFSET_BASETYPE (t));
+	  error_found = true;
+	}
+    }
+  else if (INTEGRAL_TYPE_P (t) || TREE_CODE (t) == REAL_TYPE || TREE_CODE (t) == FIXED_POINT_TYPE)
+    {
+      /* FIXME: The following check should pass:
+	  useless_type_conversion_p (const_cast <tree> (t), TREE_TYPE (TYPE_MAX_VALUE (t))
+	 bud does not for C sizetypes in LTO.  */
+    }
+  else if (TREE_CODE (t) == ARRAY_TYPE)
+    {
+      if (TYPE_ARRAY_MAX_SIZE (t)
+	  && TREE_CODE (TYPE_ARRAY_MAX_SIZE (t)) != INTEGER_CST)
+	{
+	  error ("TYPE_ARRAY_MAX_SIZE not INTEGER_CST");
+	  debug_tree (TYPE_ARRAY_MAX_SIZE (t));
+	  error_found = true;
+	}
+    }
+  else if (TYPE_MAXVAL (t))
+    {
+      error ("TYPE_MAXVAL non-NULL");
+      debug_tree (TYPE_MAXVAL (t));
+      error_found = true;
+    }
+
   if (error_found)
     {
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
From: Jeff Law [mailto:l...@redhat.com]
Sent: Tuesday, April 28, 2015 12:27 AM

OK. No need for heroics -- give it a shot, but don't burn an insane amount of time on it. If we can't get to a reasonable testcase, then so be it.

Ok, I tried but really didn't manage to create a testcase. I did, however, understand the condition under which this patch is helpful. In the function reg_nonzero_bits_for_combine () in combine.c there is a test to check if last_set_nonzero_bits for a given register is still valid. In the case I'm considering, the test evaluates to false because:

(i) the register rX whose nonzero bits are being evaluated was set in an earlier basic block than the one containing the instruction using rX (hence rsp->last_set_label < label_tick)
(ii) the predecessor of the basic block for that same insn is not the previous basic block analyzed by combine_instructions (hence label_tick_ebb_start == label_tick)
(iii) the register rX is set multiple times (hence REG_N_SETS (REGNO (x)) != 1)

Yet, the block being processed is dominated by the SET for rX, so there is a REG_EQUAL available to narrow down the set of nonzero bits.

Based on my understanding of your answer quoted above, I'll commit it as is, despite not having been able to come up with a testcase. I'll wait until tomorrow to do so though, in case you change your mind about it.

Best regards,
Thomas
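The three conditions in this message can be turned into a small predicate to see why the cached value is rejected. This is a deliberately simplified sketch built only from the message's description: the function and parameter names are hypothetical, and the real test in combine.c has further clauses (mode checks, the same-block case) that are left out here.

```c
#include <stdbool.h>

/* Simplified, assumed model: the cached nonzero bits for rX are
   trusted if rX was last set earlier within the current extended
   basic block, or if rX is set exactly once in the function.  */
bool cached_nonzero_bits_usable(int last_set_label,
                                int label_tick,
                                int label_tick_ebb_start,
                                int n_sets)
{
    /* Set in an earlier block that belongs to the same extended block.  */
    bool set_earlier_in_ebb = last_set_label >= label_tick_ebb_start
                              && last_set_label < label_tick;
    /* Only one set anywhere, so the cached value cannot be stale.  */
    bool single_set = n_sets == 1;
    return set_earlier_in_ebb || single_set;
}
```

In the situation described above, last_set_label is below label_tick but also below label_tick_ebb_start (because the two ticks are equal), and n_sets is not 1, so the predicate is false and a REG_EQUAL note becomes the only remaining source of information.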
[PATCH] Improve the test in bitfields.m4
From: Trevor Saunders tbsaunde+...@tbsaunde.org

Hi,

here's what I committed. bootstrapped + regtested x86_64-linux-gnu.

Trev

Using a named bitfield with a width more than 0 means we won't hit weirdness caused by the bitfield not really needing to exist. Changing int to long long means we won't have trouble with some arch where size of int is 1 or 2.

libobjc/ChangeLog:

2015-05-04 Trevor Saunders tbsaunde+...@tbsaunde.org

* configure: Regenerate.

config/ChangeLog:

2015-05-04 Trevor Saunders tbsaunde+...@tbsaunde.org

* bitfields.m4: Change int to long long, and use bitfields of width 1 instead of 0.
---
 config/bitfields.m4 | 7 +++
 libobjc/configure | 7 +++
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/config/bitfields.m4 b/config/bitfields.m4
index ee8f3b5..8185cd3 100644
--- a/config/bitfields.m4
+++ b/config/bitfields.m4
@@ -13,10 +13,9 @@ AC_DEFUN([gt_BITFIELD_TYPE_MATTERS],
   AC_CACHE_CHECK([if the type of bitfields matters], gt_cv_bitfield_type_matters,
   [
     AC_TRY_COMPILE(
-      [struct foo1 { char x; char :0; char y; };
-struct foo2 { char x; int :0; char y; };
-int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
-int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1]; ],
+      [struct foo1 { char x; char y:1; char z; };
+struct foo2 { char x; long long int y:1; char z; };
+int foo1test[ sizeof (struct foo1) < sizeof (struct foo2) ? 1 : -1 ]; ],
       [], gt_cv_bitfield_type_matters=yes, gt_cv_bitfield_type_matters=no)
   ])
   if test $gt_cv_bitfield_type_matters = yes; then
diff --git a/libobjc/configure b/libobjc/configure
index 0547f91..2f71735 100755
--- a/libobjc/configure
+++ b/libobjc/configure
@@ -11539,10 +11539,9 @@ else
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-struct foo1 { char x; char :0; char y; };
-struct foo2 { char x; int :0; char y; };
-int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
-int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1];
+struct foo1 { char x; char y:1; char z; };
+struct foo2 { char x; long long int y:1; char z; };
+int foo1test[ sizeof (struct foo1) < sizeof (struct foo2) ? 1 : -1 ];
 int
 main ()
 {
--
2.4.0
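The revised configure probe boils down to one sizeof comparison, which can also be evaluated at run time. The sketch below reuses the two structs from the patch; the function names are hypothetical, and bit-fields of type char or long long are an implementation-defined extension that GCC accepts, so this is a probe for GCC-like compilers rather than strictly portable ISO C.

```c
#include <stddef.h>

/* Same layouts as the configure test: a one-bit field declared with a
   narrow type and with a wide type, between two chars.  */
struct foo1 { char x; char y:1; char z; };
struct foo2 { char x; long long int y:1; char z; };

/* Nonzero if the declared type of the bit-field changed the layout,
   i.e. the struct with the long long bit-field came out larger.  */
int bitfield_type_matters(void)
{
    return sizeof(struct foo1) < sizeof(struct foo2);
}

/* Sizes exposed for inspection; the values depend on the target ABI.  */
size_t foo1_size(void) { return sizeof(struct foo1); }
size_t foo2_size(void) { return sizeof(struct foo2); }
```

The configure version encodes the same comparison as an array bound, so a negative size makes compilation fail instead of returning zero, which is what lets it work for cross compilers that cannot run the result.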
Re: [Patch,microblaze]: Optimized usage of fint instruction.
On 03/04/2015 08:20 AM, Michael Eager wrote:
On 03/04/15 03:53, Ajit Kumar Agarwal wrote:

-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Thursday, February 26, 2015 4:33 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of fint instruction.

On 02/25/15 02:20, Ajit Kumar Agarwal wrote:

Hello All:

Please find the patch for the optimized usage of fint instruction changes. No regression is seen in the deja GNU tests.

commit ed4dc0b96bf43c200cacad97f73a98ab7048e51b
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date: Wed Feb 25 15:36:29 2015 +0530

[Patch,microblaze]: Optimized usage of fint instruction.

The changes are made in the patch for optimized usage of fint instruction. The sequence of fint/cond_branch is replaced with fcmp/cond_branch. The fint instruction takes 6/7 cycles, whereas the fcmp instruction takes 1 cycle. The conversion from float to int with the fint instruction is not required; the operands can be compared directly with the fcmp instruction.

ChangeLog:
2015-02-25 Ajit Agarwal ajit...@xilinx.com

* config/microblaze/microblaze.md (peephole2): New.

+emit_insn (gen_cstoresf4 (comp_reg, operands[2],
+ gen_rtx_REG(SFmode,REGNO(cmp_op0)),
+ gen_rtx_REG(SFmode,REGNO(cmp_op1;

Spaces before left parens and after comma in last two lines.

Changes are incorporated. Please find the log for the updated patch.

commit 492b0d0b67a5b12d2dc239de3215630c8838edea
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date: Wed Mar 4 17:15:16 2015 +0530

[Patch,microblaze]: Optimized usage of fint instruction.

The changes are made in the patch for optimized usage of fint instruction. The sequence of fint/cond_branch is replaced with fcmp/cond_branch. The fint instruction takes 6/7 cycles, whereas the fcmp instruction takes 1 cycle. The conversion from float to int with the fint instruction is not required; the operands can be compared directly with the fcmp instruction.

ChangeLog:
2015-03-04 Ajit Agarwal ajit...@xilinx.com

* config/microblaze/microblaze.md (peephole2): New.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

OK.

Committed revision 222790.

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [Patch,microblaze]: Optimized usage of pcmp conditional instruction.
On 03/06/2015 07:33 AM, Michael Eager wrote: On 03/05/15 21:12, Ajit Kumar Agarwal wrote: Changes are incorporated. Please find the log of the updated patch. commit 91f275c144165320850ddf18e3a1e059a66c Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none) Date: Fri Mar 6 09:55:11 2015 +0530 [Patch,microblaze]: Optimized usage of pcmp conditional instruction. The changes are made in the patch for optimized usage of pcmpne/pcmpeq instructions. The register-to-register xor is replaced with pcmpeq/pcmpne instructions; for immediate checks xori is still used. The purpose of the change is to achieve aggressive usage of pcmpne/pcmpeq instructions instead of xor for comparison. ChangeLog: 2015-03-06 Ajit Agarwal ajit...@xilinx.com * config/microblaze/microblaze.md (cbranchsi4): Added immediate constraints. (cbranchsi4_reg): New. * config/microblaze/microblaze.c (microblaze_expand_conditional_branch_reg): New. * config/microblaze/microblaze-protos.h (microblaze_expand_conditional_branch_reg): New prototype. Signed-off-by: Ajit Agarwal ajit...@xilinx.com Thanks & Regards Ajit OK. Committed revision 222791. -- Michael Eager ea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
[PATCH, i386]: Fix PR65871, add *bmi_andn_mode_ccno pattern
Hello! Another pattern that seems useful. 2015-05-05 Uros Bizjak ubiz...@gmail.com PR target/65871 * config/i386/i386.md (*bmi_andn_mode_ccno): New pattern. testsuite/ChangeLog: 2015-05-05 Uros Bizjak ubiz...@gmail.com PR target/65871 * gcc.target/i386/pr65871-3.c: New test. Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 222774) +++ config/i386/i386.md (working copy) @@ -12565,11 +12564,25 @@ (set_attr btver2_decode direct, double) (set_attr mode MODE)]) +(define_insn *bmi_andn_mode_ccno + [(set (reg FLAGS_REG) + (compare + (and:SWI48 + (not:SWI48 (match_operand:SWI48 1 register_operand r,r)) + (match_operand:SWI48 2 nonimmediate_operand r,m)) + (const_int 0))) + (clobber (match_scratch:SWI48 0 =r,r))] + TARGET_BMI ix86_match_ccmode (insn, CCNOmode) + andn\t{%2, %1, %0|%0, %1, %2} + [(set_attr type bitmanip) + (set_attr btver2_decode direct, double) + (set_attr mode MODE)]) + (define_insn bmi_bextr_mode [(set (match_operand:SWI48 0 register_operand =r,r) (unspec:SWI48 [(match_operand:SWI48 1 nonimmediate_operand r,m) - (match_operand:SWI48 2 register_operand r,r)] - UNSPEC_BEXTR)) + (unspec:SWI48 [(match_operand:SWI48 1 nonimmediate_operand r,m) + (match_operand:SWI48 2 register_operand r,r)] + UNSPEC_BEXTR)) (clobber (reg:CC FLAGS_REG))] TARGET_BMI bextr\t{%2, %1, %0|%0, %1, %2} Index: testsuite/gcc.target/i386/pr65871-3.c === --- testsuite/gcc.target/i386/pr65871-3.c (revision 0) +++ testsuite/gcc.target/i386/pr65871-3.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -mbmi } */ + +int foo (int x, int y) +{ + if (~x y) +return 1; + + return 0; +} + +int bar (int x, int y) +{ + if ((~x y) 0) +return 1; + + return 0; +} + +/* { dg-final { scan-assembler-not test } } */
Re: [PATCH/libiberty] fix build of gdb/binutils with clang.
There was a similar discussion here: https://gcc.gnu.org/ml/gcc/2005-11/msg01190.html The problem is that at the configure stage _GNU_SOURCE is not defined, so the configure test cannot find the declaration of asprintf, and libiberty.h therefore declares asprintf itself. For the file floatformat.c, _GNU_SOURCE is defined, so it picks up another asprintf from /usr/include/bits/stdio2.h, and it also includes libiberty.h. These two asprintf declarations conflict when __USE_FORTIFY_LEVEL is set. On Sat, May 2, 2015 at 11:58 AM, Ian Lance Taylor i...@google.com wrote: On Fri, May 1, 2015 at 4:45 PM, Yunlian Jiang yunl...@google.com wrote: The test case does not have #define _GNU_SOURCE, so it says error: ‘asprintf’ undeclared (first use in this function) OK, then my next question is: why does the test case (I assume you mean the test case for whether to set HAVE_DECL_ASPRINTF) not have #define _GNU_SOURCE? What is the background here? Ian On Fri, May 1, 2015 at 3:45 PM, Ian Lance Taylor i...@google.com wrote: On Tue, Apr 28, 2015 at 2:59 PM, Yunlian Jiang yunl...@google.com wrote: I believe this is the same problem as https://gcc.gnu.org/ml/gcc-patches/2008-07/msg00292.html The asprintf declaration is messed up when using clang to build gdb. diff --git a/include/libiberty.h b/include/libiberty.h index b33dd65..a294903 100644 --- a/include/libiberty.h +++ b/include/libiberty.h @@ -625,8 +625,10 @@ extern int pwait (int, int *, int); /* Like sprintf but provides a pointer to malloc'd storage, which must be freed by the caller. */ +#ifndef asprintf extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2; #endif +#endif /* Like asprintf but allocates memory without fail. This works like xmalloc. */ Why is HAVE_DECL_ASPRINTF not defined? Ian
[patch committed SH] Fix PR target/65987
I've committed the attached patch to fix PR target/65987 which is a 6 regression. The recent stdarg change reveals the target problem for section crossing jumps. Some SH specific jump optimizations don't take into account such jumps. The attached patch is a minimal fix to solve the above PR. Tested on sh4-unknown-linux-gnu. Regards, kaz -- 2015-05-04 Kaz Kojima kkoj...@gcc.gnu.org PR target/65987 * config/sh/sh.c (output_far_jump): Take into account crossing jumps. (split_branches): Likewise. diff --git a/config/sh/sh.c b/config/sh/sh.c index 1cf6ed0..a4c9c4c 100644 --- a/config/sh/sh.c +++ b/config/sh/sh.c @@ -2747,7 +2747,8 @@ output_far_jump (rtx_insn *insn, rtx op) if (TARGET_SH2 offset = -32764 - offset - get_attr_length (insn) = 32766) + offset - get_attr_length (insn) = 32766 + ! CROSSING_JUMP_P (insn)) { far = 0; jump = mov.w %O0,%1 \n @@ -6753,6 +6754,13 @@ split_branches (rtx_insn *first) if (type == TYPE_JUMP) { + if (CROSSING_JUMP_P (insn)) + { + emit_insn_before (gen_block_branch_redirect (const0_rtx), + insn); + continue; + } + far_label = as_a rtx_insn * ( XEXP (SET_SRC (PATTERN (insn)), 0)); dest_uid = get_dest_uid (far_label, max_uid);
match.pd patch reverted
I've reverted my latest match.pd change. It's causing a bootstrap failure on i686. Jeff
Re: Extend verify_type to check various uses of TYPE_MINVAL
Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at least on i386-pc-solaris2.10. E.g.

FAIL: gnat.dg/lto1.adb (test for excess errors)
WARNING: gnat.dg/lto1.adb compilation failed to produce executable
FAIL: gnat.dg/lto1.adb (test for excess errors)
Excess errors:
/vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1: error: TYPE_MIN_VALUE is not constant

TYPE_MIN_VALUE can be arbitrary in Ada, with or without LTO. For

package Q is
   function LB return Natural;
   function UB return Natural;
end Q;

with Q;
package P is
   type Arr1 is array (Natural range <>) of Boolean;
   subtype Arr2 is Arr1 (Q.LB .. Q.UB);
end P;

the TYPE_DOMAIN of Arr2 is domain integer_type 0x769be000 type integer_type 0x76d0e0a8 sizetype sizes-gimplified visited DI size integer_cst 0x76d0abb8 64 unit size integer_cst 0x76d0abd0 8 align 64 symtab 0 alias set -1 canonical type 0x769be000 precision 64 min nop_expr 0x769bd000 max cond_expr 0x769b9420

Thanks, I just noticed the failures. I will revert that check; it is indeed valid for min values to not be constants (and even in C max values may be variable).

Honza
Re: [PATCH] Fix ubsan non-call-exceptions ICE (PR tree-optimization/65984)
On 05/04/2015 12:16 PM, Jakub Jelinek wrote: Hi! The code I've added in r217755 was assuming that stmt_could_throw_p memory read will always end a bb, but that is clearly not the case. Thus, the following patch uses stmt_ends_bb_p instead. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5? 2015-05-04 Jakub Jelinek ja...@redhat.com PR tree-optimization/65984 * ubsan.c: Include tree-cfg.h. (instrument_bool_enum_load): Use stmt_ends_bb_p instead of stmt_could_throw_p test, rename can_throw variable to ends_bb. * c-c++-common/ubsan/pr65984.c: New test. OK. Jeff
Fix PR48052: loop not vectorized if index is unsigned int
This is an old thread and we are still running into similar issues: code is not being vectorized on 64-bit targets because scev cannot analyze the overflow condition well enough. While the original test case shown here seems to work now, it does not work if the start value is not a constant and the loop index variable is of unsigned type. Example:

void loop2( double const * __restrict__ x_in, double * __restrict__ x_out, double const * __restrict__ c, unsigned int N, unsigned int start)
{
  for(unsigned int i=start; i!=N; ++i)
    x_out[i] = c[i]*x_in[i];
}

Here is our unit test:

int foo(int* A, int* B, unsigned start, unsigned BS)
{
  int s;
  for (unsigned k = start; k < start+BS; k++)
    s += A[k] * B[k];
  return s;
}

Our unit test case is extracted from a matrix multiply of a two-dimensional array in which all loops are blocked by hand by a factor of BS. Even though it is slightly modified, the above loop corresponds to the innermost loop of the blocked matrix multiply. We worked on a patch to solve the problem (see attachment). The attached patch passed bootstrap and make check on x86_64-linux. Ok for trunk? Thanks, Abderrazek Zaafrani

From eedbcd1ef6a81bb9c000e0dba9ff2a6c524576ac Mon Sep 17 00:00:00 2001 From: Abderrazek Zaafrani a.zaafr...@samsung.com Date: Mon, 4 May 2015 11:00:12 -0500 Subject: [PATCH] scev for vectorization PR optimization/48052 * tree-ssa-loop-niter.c (variable_appears_in_loop_exit_condition): New. (scev_probably_wraps_p): Handle unsigned convert expressions to a larger type than the basic induction variable. * gcc.dg/vect/pr48052.c: New.
--- gcc/testsuite/gcc.dg/vect/pr48052.c | 27 gcc/tree-ssa-loop-niter.c | 84 + 2 files changed, 111 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/pr48052.c diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c b/gcc/testsuite/gcc.dg/vect/pr48052.c new file mode 100644 index 000..8e406d7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr48052.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-additional-options -O3 } */ +/* { dg-final { scan-tree-dump-times vectorized 1 loops 2 vect } } */ +/* { dg-final { cleanup-tree-dump vect } } */ + +int foo(int* A, int* B, unsigned start, unsigned BS) +{ + int s; + for (unsigned k = start; k start + BS; k++) +{ + s += A[k] * B[k]; +} + + return s; +} + +int bar(int* A, int* B, unsigned BS) +{ + int s; + for (unsigned k = 0; k BS; k++) +{ + s += A[k] * B[k]; +} + + return s; +} + diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 042f8df..345fb93 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -3773,6 +3773,30 @@ nowrap_type_p (tree type) return false; } +/* Returns true when T appears in the exit condition of LOOP. */ + +static bool +variable_appears_in_loop_exit_condition (tree t, struct loop *loop) +{ + struct nb_iter_bound *bound; + + /* For now, we are only interested in loops with one exit condition. */ + if (loop-bounds == NULL || loop-bounds-next != NULL) + return false; + + for (bound = loop-bounds; bound; bound = bound-next) +{ + if (gimple_code (bound-stmt) != GIMPLE_COND) +return false; + + if (t == gimple_cond_lhs(bound-stmt) + || t == gimple_cond_rhs(bound-stmt)) +return true; +} + + return false; +} + /* Return false only when the induction variable BASE + STEP * I is known to not overflow: i.e. 
when the number of iterations is small enough with respect to the step and initial condition in order to @@ -3879,6 +3903,66 @@ scev_probably_wraps_p (tree base, tree step, fold_undefer_and_ignore_overflow_warnings (); + /* At this point, we could not determine that the current scalar + evolution composed of base and step does not overflow. In order + to improve this analysis, go back to the context of this scev, + i.e., statement and loop, and determine from there if we can + deduce that there is no overflow. + + We are so far interested in convert statement of this form + + _1 = (some cast) I; + + where I is a basic induction variable. This case is common when + computing addresses for 64-bit targets. */ + if (loop != NULL loop-nb_iterations != NULL loop-bounds != NULL + at_stmt != NULL integer_onep (step)) +{ + enum tree_code nbi_code = TREE_CODE (loop-nb_iterations); + enum gimple_code stmt_code = gimple_code (at_stmt); + + if (nbi_code != SCEV_NOT_KNOWN stmt_code == GIMPLE_ASSIGN) +{ + tree rhs1 = gimple_assign_rhs1 (at_stmt); + enum tree_code tree_code = gimple_assign_rhs_code (at_stmt); + tree rhs2 = gimple_assign_rhs2 (at_stmt); + + /* If at_stmt is a convert statement: _1 = (some cast) I; */ + if (rhs1 != NULL rhs2 == NULL + (tree_code == CONVERT_EXPR || tree_code == NOP_EXPR)) +{ +
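For illustration only (this is not part of the patch): a minimal sketch of the property the analysis has to establish for the unit test above. On a 64-bit target the address of A[k] is computed from the zero-extended 32-bit index, so the vectorizer needs to know that k never wraps while the loop runs. The exit test `k < start + BS` is what rules wrapping out. The helper names below (byte_offset, increment_does_not_wrap, checked_trip_count) are invented for this sketch and do not appear in GCC.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of A[k] as a 64-bit target computes it: the 32-bit unsigned
// index is zero-extended to 64 bits and then scaled by the element size.
std::uint64_t byte_offset(std::uint32_t k) {
    return static_cast<std::uint64_t>(k) * sizeof(std::int32_t);
}

// True when k + 1 does not wrap in 32-bit unsigned arithmetic.
bool increment_does_not_wrap(std::uint32_t k) {
    return k != UINT32_MAX;
}

// Runs `for (k = start; k < start + bs; k++)` with 32-bit unsigned
// arithmetic and checks on every iteration that the widened offset
// advances by exactly one element. Returns the trip count.
std::uint32_t checked_trip_count(std::uint32_t start, std::uint32_t bs) {
    std::uint32_t trips = 0;
    for (std::uint32_t k = start; k < start + bs; k++) {
        // Inside the body k < start + bs <= UINT32_MAX, so k + 1 cannot wrap.
        assert(increment_does_not_wrap(k));
        assert(byte_offset(k + 1) == byte_offset(k) + sizeof(std::int32_t));
        trips++;
    }
    return trips;
}
```

If `start + bs` itself wraps, the bound becomes small and the loop simply runs fewer times (possibly zero); the index still never wraps inside the body, which is why the cast of the induction variable can be treated as non-overflowing there.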
Demangle symbols in debug assertion messages
Hi

Here is the patch to demangle symbols in debug messages. I have also simplified the code in formatter.h. Here is an example of an assertion message:

/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug/functions.h:213: error: function requires a valid iterator range [__first, __last).
Objects involved in the operation:
iterator __first @ 0x0x7fff165d68b0 {
type = __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> > > (mutable iterator);
state = dereferenceable;
references sequence with type `std::__debug::vector<int, std::allocator<int> >' @ 0x0x7fff165d69d0
}
iterator __last @ 0x0x7fff165d68e0 {
type = __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<int*, std::__cxx1998::vector<int, std::allocator<int> > >, std::__debug::vector<int, std::allocator<int> > > (mutable iterator);
state = dereferenceable;
references sequence with type `std::__debug::vector<int, std::allocator<int> >' @ 0x0x7fff165d69d0
}

* include/debug/formatter.h (_GLIBCXX_TYPEID): New macro to simplify usage of typeid. (_Error_formatter::_M_print_type): New. * src/c++11/debug.cc (_Error_formatter::_Parameter::_M_print_field): Use latter. (_Error_formatter::_M_print_type): Implement latter using __cxxabiv1::__cxa_demangle to print demangled type name.

I just hope that __cxa_demangle is portable. Ok to commit ?
François diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h index 6767cd9..32dcf92 100644 --- a/libstdc++-v3/include/debug/formatter.h +++ b/libstdc++-v3/include/debug/formatter.h @@ -31,7 +31,17 @@ #include bits/c++config.h #include bits/cpp_type_traits.h -#include typeinfo + +#if __cpp_rtti +# include typeinfo +# define _GLIBCXX_TYPEID(_Type) typeid(_Type) +#else +namespace std +{ + class type_info; +} +# define _GLIBCXX_TYPEID(_Type) 0 +#endif namespace __gnu_debug { @@ -218,21 +228,13 @@ namespace __gnu_debug { _M_variant._M_iterator._M_name = __name; _M_variant._M_iterator._M_address = __it; -#if __cpp_rtti - _M_variant._M_iterator._M_type = typeid(__it); -#else - _M_variant._M_iterator._M_type = 0; -#endif + _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it); _M_variant._M_iterator._M_constness = std::__are_same_Safe_iterator_Iterator, _Sequence, typename _Sequence::iterator:: __value ? __mutable_iterator : __const_iterator; _M_variant._M_iterator._M_sequence = __it._M_get_sequence(); -#if __cpp_rtti - _M_variant._M_iterator._M_seq_type = typeid(_Sequence); -#else - _M_variant._M_iterator._M_seq_type = 0; -#endif + _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence); if (__it._M_singular()) _M_variant._M_iterator._M_state = __singular; @@ -256,21 +258,13 @@ namespace __gnu_debug { _M_variant._M_iterator._M_name = __name; _M_variant._M_iterator._M_address = __it; -#if __cpp_rtti - _M_variant._M_iterator._M_type = typeid(__it); -#else - _M_variant._M_iterator._M_type = 0; -#endif + _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it); _M_variant._M_iterator._M_constness = std::__are_same_Safe_local_iterator_Iterator, _Sequence, typename _Sequence::local_iterator:: __value ? 
__mutable_iterator : __const_iterator; _M_variant._M_iterator._M_sequence = __it._M_get_sequence(); -#if __cpp_rtti - _M_variant._M_iterator._M_seq_type = typeid(_Sequence); -#else - _M_variant._M_iterator._M_seq_type = 0; -#endif + _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence); if (__it._M_singular()) _M_variant._M_iterator._M_state = __singular; @@ -291,11 +285,7 @@ namespace __gnu_debug { _M_variant._M_iterator._M_name = __name; _M_variant._M_iterator._M_address = __it; -#if __cpp_rtti - _M_variant._M_iterator._M_type = typeid(__it); -#else - _M_variant._M_iterator._M_type = 0; -#endif + _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it); _M_variant._M_iterator._M_constness = __mutable_iterator; _M_variant._M_iterator._M_state = __it? __unknown_state : __singular; _M_variant._M_iterator._M_sequence = 0; @@ -308,11 +298,7 @@ namespace __gnu_debug { _M_variant._M_iterator._M_name = __name; _M_variant._M_iterator._M_address = __it; -#if __cpp_rtti - _M_variant._M_iterator._M_type = typeid(__it); -#else - _M_variant._M_iterator._M_type = 0; -#endif + _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it); _M_variant._M_iterator._M_constness = __const_iterator; _M_variant._M_iterator._M_state = __it? __unknown_state : __singular; _M_variant._M_iterator._M_sequence = 0; @@ -325,11 +311,7 @@ namespace __gnu_debug { _M_variant._M_iterator._M_name = __name; _M_variant._M_iterator._M_address = __it; -#if __cpp_rtti - _M_variant._M_iterator._M_type = typeid(__it); -#else - _M_variant._M_iterator._M_type = 0;
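For illustration only (this is not the code from the patch): a minimal sketch of how a mangled type name can be turned into readable text with abi::__cxa_demangle from <cxxabi.h>, falling back to the raw name when demangling fails. The helper name demangle_or_raw is invented for this sketch; the patch's _M_print_type may be structured differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

// Returns the demangled form of `mangled`, or the input unchanged when
// demangling fails (status != 0).
std::string demangle_or_raw(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc; the caller is responsible for calling free() on it.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

A caller would typically pass typeid(T).name(); keeping the fallback path means a message can still be printed on a target where the name is not in the expected mangling.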
Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
OK. Fixed the patch. Rebased and tested on x86_64-linux (fortunately, it did not conflict with Trevor's series of rtx_insn-related patches). good :) fwiw I have another series that'll probably be ready about the end of the week (the punishment for writing small patches is making the testing box spin for days ;-) I'm trying to continue and the next patch (peep_split.patch, peep_split.cl) is addressing the same task in some of the generated code (namely, gen_peephole2_* and gen_split_* series of functions). ok, I've stayed away from the generators and just done more trivial changes of rtx -> rtx_insn * in arguments. Trev If you're going to continue this work, you should probably get write-after-approval access so that you can commit your own approved changes. Is it OK to mention you as a maintainer who can approve my request for write access? -- Regards, Mikhail Maltsev
Re: [PING 2][PATCH] libgcc: Add CFI directives to the soft floating point support code for ARM
Hi Ramana! Sorry to bother, but I looked at the repository and didn't see this committed. As I don't have write access could you please commit this for me? Thanks a lot! On Tue, Apr 28, 2015 at 2:07 PM, Martin Galvan martin.gal...@tallertechnologies.com wrote: Thanks a lot. I don't have write access to the repository, could you commit this for me? On Tue, Apr 28, 2015 at 1:21 PM, Ramana Radhakrishnan ramana@googlemail.com wrote: On Tue, Apr 28, 2015 at 4:19 PM, Martin Galvan martin.gal...@tallertechnologies.com wrote: This patch adds CFI directives to the soft floating point support code for ARM. Previously, if we tried to do a backtrace from that code in a debug session we'd get something like this: (gdb) bt #0 __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1082 #1 0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1158 #2 0xf5c28f5c in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Now we'll get something like this: (gdb) bt #0 __nedf2 () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1156 #1 0x0db6 in __aeabi_cdcmple () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1263 #2 0x0dc8 in __aeabi_dcmpeq () at ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1285 #3 0x0504 in main () I have a company-wide copyright assignment. I don't have commit access, though, so it would be great if anyone could commit this for me. Thanks a lot! this is OK , thanks. Sorry about the delay in reviewing this. Ramana
Re: [patch] Perform anonymous constant propagation during inlining
2015-05-01 Eric Botcazou ebotca...@adacore.com * expr.c (expand_expr_real_1) <SSA_NAME>: Try to substitute constants on the RHS of expressions. * gimple-expr.h (is_gimple_constant): Reorder.

Bummer. This breaks C++ debugging:

+FAIL: gdb.cp/class2.exp: print alpha at marker return 0
+FAIL: gdb.cp/class2.exp: print beta at marker return 0
+FAIL: gdb.cp/class2.exp: print * aap at marker return 0
+FAIL: gdb.cp/class2.exp: print * bbp at marker return 0
+FAIL: gdb.cp/class2.exp: print * abp at marker return 0, s-p-o off
+FAIL: gdb.cp/class2.exp: print * (B *) abp at marker return 0
+FAIL: gdb.cp/class2.exp: p acp
+FAIL: gdb.cp/class2.exp: p acp->c1
+FAIL: gdb.cp/class2.exp: p acp->c2

because C++ is apparently relying on the assignment to the anonymous return object to preserve the debug info attached to a return statement.

Would you be OK with a slight variation of your earlier idea, i.e. calling fold_stmt with a specific valueizer from fold_marked_statements instead of the implicit no_follow_ssa_edges in the inliner? Something like:

tree
follow_anonymous_single_use_edges (tree val)
{
  if (TREE_CODE (val) == SSA_NAME
      && (!SSA_NAME_VAR (val) || DECL_IGNORED_P (SSA_NAME_VAR (val)))
      && has_single_use (val))
    return val;
  return NULL_TREE;
}

-- Eric Botcazou
Re: [C++ Patch] PR 66007
On 05/04/2015 01:17 PM, Paolo Carlini wrote: This regression is a more subtle variant of c++/65858: if the user passes -Wno-error=narrowing the pedwarn didn't result in an actual error (even if we are forcing -pedantic-errors around it) but produces anyway a warning, thus returns true, and ok isn't set to true, thus we have a miscompilation in this case too. Jakub suggested simply checking by hand errorcount, which passes all my tests. OK. Jason
Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
(the original message was bounced by the mailing list, resending with compressed attachment) On 30.04.2015 8:00, Jeff Law wrote: Can you please check the changes to do_jump_1, the indention looked weird in the patch. If it's correct, just say so. It is ok. Probably that's because the surrounding code is indented with spaces. The definition of PEEP2_EOB looks wrong. I don't see how you can safely cast pc_rtx to an rtx_insn * since it's an RTX rather than rtx chain object. Maybe you're getting away with it because it's used as marker. But it still feels wrong. Yes, FWIW, it is only needed for assertions in peep2_regno_dead_p and peep2_reg_dead_p which check it against NULL (they are intended to verify that live_before field in peep2_insn_data struct is valid). At least, when I removed the assertions and changed PEEP2_EOB to NULL (as an experiment), the testsuite passed without regressions. You'd probably be better off creating a unique rtx_insn * object and using that as the marker. OK. Fixed the patch. Rebased and tested on x86_64-linux (fortunately, it did not conflict with Trevor's series of rtx_insn-related patches). I'm trying to continue and the next patch (peep_split.patch, peep_split.cl) is addressing the same task in some of the generated code (namely, gen_peephole2_* and gen_split_* series of functions). If you're going to continue this work, you should probably get write-after-approval access so that you can commit your own approved changes. Is it OK to mention you as a maintainer who can approve my request for write access? -- Regards, Mikhail Maltsev as_insn.tar.gz Description: GNU Zip compressed data
Re: [PATCH 4/4] Replace line_map union with C++ class hierarchy
On 05/01/2015 06:56 PM, David Malcolm wrote:

This patch eliminates the union in struct line_map in favor of a simple class hierarchy, making struct line_map a base class, with line_map_ordinary and line_map_macro subclasses.

The patch eliminates all usage of linemap_check_ordinary and linemap_check_macro from line-map.h, updating return types and signatures throughout libcpp and gcc's usage of it to use the appropriate subclasses. This moves the checking of linemap kind from run-time to compile-time, and also implicitly documents everywhere where the code is expecting an ordinary map vs a macro map vs either kind of map. I believe it makes the code significantly simpler: most of the accessor functions in line-map.h become trivial field-lookups.

I attempted to use templates for maps_info, but was stymied by gengtype, so in the end I simply split it manually into maps_info_ordinary and maps_info_macro. In theory it's just a vec, but vec.h is in gcc, and thus not available for use from libcpp.

In a similar vein, gcc/is-a.h is presumably not usable from within libcpp. If it were, there would be the following rough equivalences:

line-map.h                         is-a.h
linemap_check_ordinary (m)         as_a <line_map_ordinary *> (m)
linemap_check_macro (m)            as_a <line_map_macro *> (m)
linemap_macro_expansion_map_p (m)  (m ? is_a <line_map_macro *> (m) : false)

There are numerous places in libcpp that offset a line_map * using array notation to get the next/prev line_map of the same kind, e.g.:

MAP_START_LOCATION (&cached[1])

which breaks due to the different sizes of line_map vs its subclasses. On x86_64 host, before:

(gdb) p sizeof(line_map)
$1 = 40

after:

(gdb) p sizeof(line_map)
$1 = 8
(gdb) p sizeof(line_map_ordinary)
$2 = 32
(gdb) p sizeof(line_map_macro)
$3 = 40

Tracking down all of these array-based offsets to use a pointer to the appropriate subclass (and thus use the correct offset) was rather involved, but I believe the patch fixes them all now.
(the patch thus also gives a very modest saving of 8 bytes per ordinary line map). I've tried to use the naming convention ord_map and macro_map whenever the type system ensures we're dealing with such a map, wherever this is doable without needing to touch lines of code that would otherwise not need touching by the patch. gcc/ChangeLog: * diagnostic.c (diagnostic_report_current_module): Strengthen local new_map from const line_map * to const line_map_ordinary *. * genmatch.c (error_cb): Likewise for local map. (output_line_directive): Likewise for local map. * input.c (expand_location_1): Likewise for local map. Pass NULL rather than map to linemap_unwind_to_first_non_reserved_loc, since the value is never read from there, and the value written back not read from here. (is_location_from_builtin_token): Strengthen local map from const line_map * to const line_map_ordinary *. (dump_location_info): Strengthen locals map from line_map *, one to const line_map_ordinary *, the other to const line_map_macro *. * tree-diagnostic.c (loc_map_pair): Strengthen field map from const line_map * to const line_map_macro *. (maybe_unwind_expanded_macro_loc): Add a call to linemap_check_macro when writing to the map field of the loc_map_pair. Introduce local const line_map_ordinary * ord_map, using it in place of map in the part of the function where we know we have an ordinary map. Strengthen local m from const line_map * to const line_map_ordinary *. gcc/ada/ChangeLog: * gcc-interface/trans.c (Sloc_to_locus1): Strengthen local map from line_map * to line_map_ordinary *. gcc/c-family/ChangeLog: * c-common.h (fe_file_change): Strengthen param from const line_map * to const line_map_ordinary *. (pp_file_change): Likewise. * c-lex.c (fe_file_change): Likewise. (cb_define): Use linemap_check_ordinary when invoking SOURCE_LINE. (cb_undef): Likewise. * c-opts.c (c_finish_options): Use linemap_check_ordinary when invoking cb_file_change. (c_finish_options): Likewise. 
(push_command_line_include): Likewise. (cb_file_change): Strengthen param new_map from const line_map * to const line_map_ordinary *. * c-ppoutput.c (cb_define): Likewise for local map. (pp_file_change): Likewise for param map and local from. gcc/fortran/ChangeLog: * cpp.c (maybe_print_line): Strengthen local map from const line_map * to const line_map_ordinary *. (cb_file_change): Likewise for param map and local
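For illustration only (these are not the libcpp definitions): a minimal sketch of why array-style offsets must go through a pointer to the subclass once the union is replaced by a class hierarchy. The types map_base, ordinary_map and macro_map and all of their fields are invented stand-ins; only the general shape (a small base class with two larger subclasses) follows the description in the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for line_map and its two subclasses.
struct map_base {
    unsigned start_location;
};

struct ordinary_map : map_base {
    const char* file;
    unsigned line;
};

struct macro_map : map_base {
    unsigned n_tokens;
    unsigned expansion_point;
};

// Correct: the pointer has the element's static type, so adding 1 advances
// by sizeof(ordinary_map). Taking the subclass in the signature also moves
// the "is this an ordinary map?" check from run time to compile time, since
// a macro_map* cannot be passed here.
const ordinary_map* next_map(const ordinary_map* m) {
    return m + 1;
}

// Byte distance between two consecutive elements of an ordinary_map array.
// Indexing the same storage through a map_base* would step by
// sizeof(map_base) instead and land inside the first element.
std::size_t element_stride(const ordinary_map* maps) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(maps + 1)
        - reinterpret_cast<const char*>(maps));
}
```

The sketch only measures strides; it deliberately avoids indexing through the base pointer, which would be undefined behaviour for an array of the derived type.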
[PATCH, i386]: Some trivial const_wide_int/const_double related cleanups
Hello! 2015-05-04 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c: Change GET_CODE (...) == CONST_DOUBLE check to CONST_DOUBLE_P predicate. (standard_sse_constant_p): Return 0 for !TARGET_SSE. (ix86_legitimate_constant_p) case CONST_WIDE_INT: For 32bit targets, allow only operands that satisfy standard_sse_constant_p predicate. * config/i386/i386.md: Change GET_CODE (...) == CONST_DOUBLE check to CONST_DOUBLE_P predicate. Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 222767) +++ config/i386/i386.c (working copy) @@ -9368,7 +9368,7 @@ standard_80387_constant_p (rtx x) REAL_VALUE_TYPE r; - if (!(X87_FLOAT_MODE_P (mode) (GET_CODE (x) == CONST_DOUBLE))) + if (!(CONST_DOUBLE_P (x) X87_FLOAT_MODE_P (mode))) return -1; if (x == CONST0_RTX (mode)) @@ -9469,9 +9469,14 @@ standard_80387_constant_rtx (int idx) int standard_sse_constant_p (rtx x) { - machine_mode mode = GET_MODE (x); + machine_mode mode; - if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x))) + if (!TARGET_SSE) +return 0; + + mode = GET_MODE (x); + + if (x == const0_rtx || x == CONST0_RTX (mode)) return 1; if (vector_all_ones_operand (x, mode)) switch (mode) @@ -13078,9 +13083,7 @@ ix86_legitimate_constant_p (machine_mode, rtx x) break; case CONST_WIDE_INT: - if (GET_MODE (x) == TImode - x != CONST0_RTX (TImode) - !TARGET_64BIT) + if (!TARGET_64BIT !standard_sse_constant_p (x)) return false; break; @@ -15903,7 +15906,7 @@ ix86_print_operand (FILE *file, rtx x, int code) output_address (x); } - else if (GET_CODE (x) == CONST_DOUBLE GET_MODE (x) == SFmode) + else if (CONST_DOUBLE_P (x) GET_MODE (x) == SFmode) { REAL_VALUE_TYPE r; long l; @@ -15921,7 +15924,7 @@ ix86_print_operand (FILE *file, rtx x, int code) fprintf (file, 0x%08x, (unsigned int) l); } - else if (GET_CODE (x) == CONST_DOUBLE GET_MODE (x) == DFmode) + else if (CONST_DOUBLE_P (x) GET_MODE (x) == DFmode) { REAL_VALUE_TYPE r; long l[2]; @@ -15935,7 +15938,7 
@@ ix86_print_operand (FILE *file, rtx x, int code) } /* These float cases don't actually occur as immediate operands. */ - else if (GET_CODE (x) == CONST_DOUBLE GET_MODE (x) == XFmode) + else if (CONST_DOUBLE_P (x) GET_MODE (x) == XFmode) { char dstr[30]; @@ -17364,8 +17367,7 @@ ix86_expand_move (machine_mode mode, rtx operands[ op1 = copy_to_mode_reg (mode, op1); if (can_create_pseudo_p () - FLOAT_MODE_P (mode) - GET_CODE (op1) == CONST_DOUBLE) + CONST_DOUBLE_P (op1)) { /* If we are loading a floating point constant to a register, force the value to memory now, since we'll get better code @@ -19563,7 +19565,7 @@ ix86_expand_copysign (rtx operands[]) else vmode = mode; - if (GET_CODE (op0) == CONST_DOUBLE) + if (CONST_DOUBLE_P (op0)) { rtx (*copysign_insn)(rtx, rtx, rtx, rtx); @@ -22632,7 +22634,7 @@ ix86_split_to_parts (rtx operand, rtx *parts, mach for (i = 1; i size; i++) parts[i] = adjust_address (operand, SImode, 4 * i); } - else if (GET_CODE (operand) == CONST_DOUBLE) + else if (CONST_DOUBLE_P (operand)) { REAL_VALUE_TYPE r; long l[4]; @@ -22683,7 +22685,7 @@ ix86_split_to_parts (rtx operand, rtx *parts, mach parts[0] = operand; parts[1] = adjust_address (operand, upper_mode, 8); } - else if (GET_CODE (operand) == CONST_DOUBLE) + else if (CONST_DOUBLE_P (operand)) { REAL_VALUE_TYPE r; long l[4]; @@ -41208,7 +41210,7 @@ ix86_preferred_reload_class (rtx x, reg_class_t re return SSE_CLASS_P (regclass) ? regclass : NO_REGS; /* Floating-point constants need more complex checks. */ - if (GET_CODE (x) == CONST_DOUBLE GET_MODE (x) != VOIDmode) + if (CONST_DOUBLE_P (x)) { /* General regs can load everything. 
*/ if (reg_class_subset_p (regclass, GENERAL_REGS)) @@ -44551,9 +44553,9 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, for (i = 0; i n_elts; ++i) { x = XVECEXP (vals, 0, i); - if (!(CONST_INT_P (x) - || GET_CODE (x) == CONST_DOUBLE - || GET_CODE (x) == CONST_FIXED)) + if (!(CONST_SCALAR_INT_P (x) + || CONST_DOUBLE_P (x) + || CONST_FIXED_P (x))) n_var++, one_var = i; else if (x != CONST0_RTX (inner_mode)) all_const_zero = false; Index: config/i386/i386.md === --- config/i386/i386.md (revision 222767) +++ config/i386/i386.md (working copy) @@ -2955,7
Re: [RFA] More type narrowing in match.pd V2
I think this caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66009

H.J.

On Mon, May 4, 2015 at 2:02 AM, Richard Biener richard.guent...@gmail.com wrote:

On Sat, May 2, 2015 at 2:36 AM, Jeff Law l...@redhat.com wrote:

Here's an updated patch to add more type narrowing to match.pd.

Changes since the last version:

Slight refactoring of the condition by using types_match as suggested by Richi. I also applied the new types_match to 2 other patterns in match.pd where it seemed clearly appropriate.

Additionally the transformation is restricted by using the new single_use predicate. I didn't change other patterns in match.pd to use the new single_use predicate. But some probably could be changed.

This (of course) continues to pass the bootstrap and regression check for x86-linux-gnu.

There's still a ton of work to do in this space. This is meant to be an incremental stand-alone improvement.

OK now?

Ok with the {gimple,generic}-match-head.c changes mentioned in the ChangeLog.

Thanks,
Richard.

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e006b26..5ee89de 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-05-01  Jeff Law  l...@redhat.com
+
+	* match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New
+	simplifier to narrow arithmetic.
+
 2015-05-01  Rasmus Villemoes  r...@rasmusvillemoes.dk
 
 	* match.pd: New simplification patterns.
diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index daa56aa..303b237 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -70,4 +70,20 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "generic-match.h"
 
+/* Routine to determine if the types T1 and T2 are effectively
+   the same for GENERIC.  */
+inline bool
+types_match (tree t1, tree t2)
+{
+  return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
+}
+
+/* Return if T has a single use.  For GENERIC, we assume this is
+   always true.  */
+
+inline bool
+single_use (tree t)
+{
+  return true;
+}
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index c7b2f95..dc13218 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -861,3 +861,21 @@ do_valueize (tree (*valueize)(tree), tree op)
   return op;
 }
 
+/* Routine to determine if the types T1 and T2 are effectively
+   the same for GIMPLE.  */
+
+inline bool
+types_match (tree t1, tree t2)
+{
+  return types_compatible_p (t1, t2);
+}
+
+/* Return if T has a single use.  For GIMPLE, we also allow any
+   non-SSA_NAME (ie constants) and zero uses to cope with uses
+   that aren't linked up yet.  */
+
+inline bool
+single_use (tree t)
+{
+  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index 87ecaf1..51a950a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -289,8 +289,7 @@ along with GCC; see the file COPYING3.  If not see
   (if (((TREE_CODE (@1) == INTEGER_CST
          && INTEGRAL_TYPE_P (TREE_TYPE (@0))
          && int_fits_type_p (@1, TREE_TYPE (@0)))
-        || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
-        || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
+        || types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
        /* ??? This transform conflicts with fold-const.c doing
           Convert (T)(x & c) into (T)x & (T)c, if c is an integer
           constants (if x has signed type, the sign bit cannot be set
@@ -949,8 +948,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Unordered tests if either argument is a NaN.  */
 (simplify
  (bit_ior (unordered @0 @0) (unordered @1 @1))
- (if ((GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
-      || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
+ (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
   (unordered @0 @1)))
 (simplify
  (bit_ior:c (unordered @0 @0) (unordered:c@2 @0 @1))
@@ -1054,7 +1052,7 @@ along with GCC; see the file COPYING3.  If not see
    operation and convert the result to the desired type.  */
 (for op (plus minus)
   (simplify
-    (convert (op (convert@2 @0) (convert@3 @1)))
+    (convert (op@4 (convert@2 @0) (convert@3 @1)))
     (if (INTEGRAL_TYPE_P (type)
          /* We check for type compatibility between @0 and @1 below,
             so there's no need to check that @1/@3 are integral types.  */
@@ -1070,15 +1068,45 @@ along with GCC; see the file COPYING3.  If not see
          && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
          /* The inner conversion must be a widening conversion.  */
          && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
-         && ((GENERIC
-              && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
-                  == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
-              && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
-                  == TYPE_MAIN_VARIANT (type)))
-
Re: [patch] Perform anonymous constant propagation during inlining
On May 4, 2015 11:38:42 PM GMT+02:00, Eric Botcazou ebotca...@adacore.com wrote:

2015-05-01  Eric Botcazou  ebotca...@adacore.com

	* expr.c (expand_expr_real_1) <SSA_NAME>: Try to substitute constants
	on the RHS of expressions.
	* gimple-expr.h (is_gimple_constant): Reorder.

Bummer. This breaks C++ debugging:

+FAIL: gdb.cp/class2.exp: print alpha at marker return 0
+FAIL: gdb.cp/class2.exp: print beta at marker return 0
+FAIL: gdb.cp/class2.exp: print * aap at marker return 0
+FAIL: gdb.cp/class2.exp: print * bbp at marker return 0
+FAIL: gdb.cp/class2.exp: print * abp at marker return 0, s-p-o off
+FAIL: gdb.cp/class2.exp: print * (B *) abp at marker return 0
+FAIL: gdb.cp/class2.exp: p acp
+FAIL: gdb.cp/class2.exp: p acp->c1
+FAIL: gdb.cp/class2.exp: p acp->c2

because C++ is apparently relying on the assignment to the anonymous return object to preserve the debug info attached to a return statement.

Would you be OK with a slight variation of your earlier idea, i.e. calling fold_stmt with a specific valueizer from fold_marked_statements instead of the implicit no_follow_ssa_edges in the inliner? Something like:

tree
follow_anonymous_single_use_edges (tree val)
{
  if (TREE_CODE (val) == SSA_NAME
      && (!SSA_NAME_VAR (val) || DECL_IGNORED_P (SSA_NAME_VAR (val)))
      && has_single_use (val))
    return val;
  return NULL_TREE;
}

Yes, that works for me as well.

Richard.
Re: [PATCH/libiberty] fix build of gdb/binutils with clang.
On Mon, May 4, 2015 at 3:49 PM, Yunlian Jiang yunl...@google.com wrote:

There was a similar discussion here: https://gcc.gnu.org/ml/gcc/2005-11/msg01190.html

That was a discussion about libiberty. Your subject says you have trouble building gdb. Can you describe the exact problem that you are having? What precisely are you doing? What precisely happens?

The problem is in the configure stage: _GNU_SOURCE is not defined there, so configure cannot find the declaration of asprintf, and libiberty.h therefore declares asprintf itself. For the file floatformat.c, however, _GNU_SOURCE is defined, so it picks up another asprintf from /usr/include/bits/stdio2.h, and it also includes libiberty.h. These two asprintf declarations conflict when __USE_FORTIFY_LEVEL is set.

I think the basic guideline should be that HAVE_DECL_ASPRINTF should be correct. If libiberty is compiled with _GNU_SOURCE defined, then it should test HAVE_DECL_ASPRINTF with _GNU_SOURCE defined. If not, then not. So perhaps the problem is that libiberty is compiling some files with _GNU_SOURCE defined and some not.

Ian
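The guideline above can be sketched as follows. This is only an illustration, not libiberty's actual header: it assumes glibc-style headers that declare asprintf when _GNU_SOURCE is defined, the HAVE_DECL_ASPRINTF value is hard-coded as a stand-in for what configure would write to config.h, and describe_format is a made-up helper. The point is that the fallback declaration is only safe when the macro was computed under the same _GNU_SOURCE setting as the file being compiled.

```cpp
// Assumption: _GNU_SOURCE must be set before the first system header,
// and it must match the setting configure used for its declaration check.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif

#include <cassert>
#include <stdio.h>
#include <stdlib.h>
#include <string>

// Hypothetical stand-in for the result configure would record in config.h.
#ifndef HAVE_DECL_ASPRINTF
#define HAVE_DECL_ASPRINTF 1
#endif

// Fallback declaration, used only when the system headers provide none.
// If HAVE_DECL_ASPRINTF were wrongly 0 here, this would compete with the
// declaration (or fortified inline) coming from <stdio.h>.
#if !HAVE_DECL_ASPRINTF
extern "C" int asprintf (char **, const char *, ...);
#endif

// Made-up helper: formats a short description into a heap buffer via asprintf.
std::string describe_format(const char *name, int bits) {
    assert(name != nullptr);
    char *buf = nullptr;
    if (asprintf(&buf, "%s (%d bits)", name, bits) < 0)
        return std::string();
    std::string out(buf);
    free(buf);
    return out;
}
```

Calling describe_format("ieee_double", 64) yields a string built by the system asprintf; no second declaration is involved as long as the configure-time and compile-time feature macros agree.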
Re: [patch] libstdc++/56117 make std::async launch new threads by default
On 02/05/15 19:56 +0100, Jonathan Wakely wrote: One last patch before I head to Lenexa, this fixes the long standing not-a-bug that our default launch policy is launch::deferred. This way std::async with no explicit policy or with any policy that contains launch::async will run in a new thread. Apparently libc++ does the same and they aren't getting lots of complaints about fork-bombs, so let's try the same thing. If people don't like it we have plenty of time in stage 1 to reconsider. Tested x86_64-linux and powerpc64le-linux, I'm going to commit this to trunk unless someone strongly objects. Committed to trunk.
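For readers unfamiliar with the launch policies being discussed, here is a minimal sketch of the observable difference between them. It is not part of the patch, and runs_on_other_thread is a made-up helper name. It only uses explicit policies, because the behaviour with no explicit policy is exactly what is left to the implementation (and what this change alters for libstdc++).

```cpp
#include <cassert>
#include <future>
#include <thread>

// Made-up helper: reports whether a task launched with `policy`
// executed on a thread other than the calling thread.
bool runs_on_other_thread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> result =
        std::async(policy, [] { return std::this_thread::get_id(); });
    assert(result.valid());
    // For launch::deferred the task runs here, inside get(), on this thread.
    return result.get() != caller;
}
```

With std::launch::async the task runs as if on a new thread, so the helper returns true; with std::launch::deferred it runs lazily on the thread that calls get(), so the helper returns false. Code that depends on one of these behaviours should pass the policy explicitly rather than rely on the default.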
[debug-early] fix problem with template parameter packs
The code handling parameter DIEs needed a little tweaking for variable-length template arguments. I've relaxed the original assert, but this may require tweaking at branch review time, hopefully later this week.

Committing to branch.

Aldy

p.s. Richi/Jason: Winter is coming. Down to 1 GCC regression, which is actually a missed DIE optimization that I hope I can fix post merge.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index c51cea1..a5b155f 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -18018,8 +18018,20 @@ gen_formal_parameter_die (tree node, tree origin, bool emit_name_p,
          DW_AT_abstract_origin.  */
       if (parm_die && parm_die->die_parent != context_die)
         {
-          gcc_assert (!DECL_ABSTRACT_P (node));
-          parm_die = NULL;
+          if (!DECL_ABSTRACT_P (node))
+            {
+              gcc_assert (!DECL_ABSTRACT_P (node));
+              parm_die = NULL;
+            }
+          else
+            {
+              /* Reuse DIE even with a differing context.  This
+                 happens when called through
+                 dwarf2out_abstract_function for
+                 formal parameter packs.  */
+              gcc_assert (parm_die->die_parent->die_tag
+                          == DW_TAG_GNU_formal_parameter_pack);
+            }
         }
 
       if (parm_die && parm_die->die_parent == NULL)
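As a rough illustration of the source construct involved (not taken from the patch or from any testcase), a function template with a function parameter pack looks like the sketch below. The names count_args and demo_total are made up, and the assumption that this particular shape exercises the changed code path is mine; it is simply the kind of declaration for which GCC describes the pack with DW_TAG_GNU_formal_parameter_pack when debug info is enabled.

```cpp
#include <cassert>
#include <cstddef>

// `args` is a function parameter pack: a single declaration in the
// template that expands to zero or more parameters per instantiation.
template <typename... Args>
std::size_t count_args(Args... args) {
    return sizeof...(args);
}

// Made-up driver: instantiates the template with zero and three parameters.
std::size_t demo_total() {
    assert(count_args() == 0);
    return count_args() + count_args(1, 2.5, 'c');
}
```

Compiling something like this with g++ -g and inspecting the debug info is one way to see how the parameters of each instantiation are grouped under the pack.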