Re: [PATCH 01/13] recog: Increased max number of alternatives - v2
On Tue, May 19, 2015 at 10:40:26AM +0200, Andreas Krebbel wrote:
> On 05/18/2015 04:19 PM, Richard Biener wrote:
> > Please use uint64_t instead.
>
> Done.  Ok with that change?

I've applied the following patch.

Bye,

-Andreas-

gcc/
	* recog.h: Increase MAX_RECOG_ALTERNATIVES.  Change type of
	alternative_mask to uint64_t.
---
 gcc/recog.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/recog.h b/gcc/recog.h
index 463c748..3a09304 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -23,8 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Random number that should be large enough for all purposes.  Also define
    a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra
    bit giving an invalid value that can be used to mean uninitialized.  */
-#define MAX_RECOG_ALTERNATIVES 30
-typedef unsigned int alternative_mask;
+#define MAX_RECOG_ALTERNATIVES 35
+typedef uint64_t alternative_mask;
 
 /* A mask of all alternatives.  */
 #define ALL_ALTERNATIVES ((alternative_mask) -1)
-- 
1.7.9.5
Re: Calculate TYPE_CANONICAL only for types that can be accessed in memory
On Thu, 21 May 2015, Jan Hubicka wrote:

Hi,
this is the next part of the series.  It disables canonical type calculation
for incomplete types, with the exception of arrays, based on the claim that we
do not have a good notion of those.  I can bootstrap this with additional
checks in alias.c that canonical types are always present with LTO, but I need
a fix to ICF, which compares alias sets of types it does not need to and trips
on incomplete types otherwise.  I will push out these fixes separately and
incrementally add the fix.  The purpose of those checks is to avoid alias.c
degenerating to the structural equality path for no good reason.

I tried the alternative of disabling it on ARRAY_TYPEs too and avoiding
recursion into those for fields.  This does not fly because we can have
ARRAY_REFs of incomplete types:

 <array_ref 0x769c2968
    type <pointer_type 0x76942150
        type <integer_type 0x76cd50a8 char readonly string-flag QI
            size <integer_cst 0x76ad7ca8 constant 8>
            unit size <integer_cst 0x76ad7cc0 constant 1>
            align 8 symtab -158253232 alias set 0 canonical type 0x76adb498
            precision 8 min <integer_cst 0x76cd2090 -128>
            max <integer_cst 0x76cd2078 127>
            pointer_to_this <pointer_type 0x76cd5150>>
        readonly unsigned DI
        size <integer_cst 0x76ad7bb8 constant 64>
        unit size <integer_cst 0x76ad7bd0 constant 8>
        align 64 symtab 0 alias set -1 canonical type 0x76af37e0
        pointer_to_this <pointer_type 0x769479d8>>
    readonly
    arg 0 <mem_ref 0x769d1028
        type <array_type 0x7694b348 type <pointer_type 0x76942150>
            BLK align 64 symtab 0 alias set -1 structural equality
            pointer_to_this <pointer_type 0x7694b3f0>>
        arg 0 <addr_expr 0x769ce380 type <pointer_type 0x7694b3f0>
            constant arg 0 <var_decl 0x76941480 reg_note_name>>
        arg 1 <integer_cst 0x769b6678 constant 0>>
    arg 1 <ssa_name 0x769c5630
        type <integer_type 0x76adb690 int asm_written public SI
            size <integer_cst 0x76ad7df8 constant 32>
            unit size <integer_cst 0x76ad7e10 constant 4>
            align 32 symtab -158421968 alias set 3 canonical type 0x76adb690
            precision 32 min <integer_cst 0x76ad7db0 -2147483648>
            max <integer_cst 0x76ad7dc8 2147483647>
            pointer_to_this <pointer_type 0x76af37e0>
            reference_to_this <reference_type 0x76942b28>>
        visited
        def_stmt _103 = (int) _101;
        version 103
        ptr-info 0x769f04a0>
    ../../gcc/print-rtl.c:173:4>

and we compute the alias set for it via:

#0  internal_error (gmsgid=0x1b86c8f "in %s, at %s:%d") at ../../gcc/diagnostic.c:1271
#1  0x015e2416 in fancy_abort (file=0x167ea2a "../../gcc/alias.c", line=823,
    function=0x167f7d6 <get_alias_set(tree_node*)::__FUNCTION__> "get_alias_set")
    at ../../gcc/diagnostic.c:1341
#2  0x007109b9 in get_alias_set (t=0x7694b2a0) at ../../gcc/alias.c:823
#3  0x0070fecf in component_uses_parent_alias_set_from (t=0x769c2968) at ../../gcc/alias.c:607
#4  0x00710497 in reference_alias_ptr_type_1 (t=0x7fffe068) at ../../gcc/alias.c:719
#5  0x007107e8 in get_alias_set (t=0x769c2968) at ../../gcc/alias.c:799
#6  0x00ebca97 in vn_reference_lookup (op=0x769c2968, vuse=0x769ca798,
    kind=VN_WALKREWRITE, vnresult=0x0) at ../../gcc/tree-ssa-sccvn.c:2217
#7  0x00ebea99 in visit_reference_op_load (lhs=0x769c5678, op=0x769c2968,
    stmt=0x769cf730) at ../../gcc/tree-ssa-sccvn.c:3030
#8  0x00ec05ec in visit_use (use=0x769c5678) at ../../gcc/tree-ssa-sccvn.c:3685
#9  0x00ec1047 in process_scc (scc=...) at ../../gcc/tree-ssa-sccvn.c:3927
#10 0x00ec1679 in extract_and_process_scc_for_name (name=0x769c5678)
    at ../../gcc/tree-ssa-sccvn.c:4013
#11 0x00ec1848 in DFS (name=0x769c5678) at ../../gcc/tree-ssa-sccvn.c:4065
#12 0x00ec26d1 in cond_dom_walker::before_dom_children (this=0x7fffe5a0,
    bb=0x769b9888) at ../../gcc/tree-ssa-sccvn.c:4345
#13 0x014c05c0 in dom_walker::walk (this=0x7fffe5a0, bb=0x769b9888)
    at ../../gcc/domwalk.c:188
#14 0x00ec2b0e in run_scc_vn (default_vn_walk_kind_=VN_WALKREWRITE)
    at ../../gcc/tree-ssa-sccvn.c:4436
#15 0x00e98d59 in (anonymous namespace)::pass_fre::execute (this=0x1f621b0,
    fun=0x7698db28) at ../../gcc/tree-ssa-pre.c:4972
#16 0x00bb6c8f in execute_one_pass (pass=0x1f621b0) at ../../gcc/passes.c:2317
#17 0x00bb6ede in execute_pass_list_1 (pass=0x1f621b0) at ../../gcc/passes.c:2370
#18 0x00bb6f0f in execute_pass_list_1 (pass=0x1f61d90) at ../../gcc/passes.c:2371
#19 0x00bb6f51 in execute_pass_list (fn=0x7698db28, pass=0x1f61cd0) at ../../gcc/passes.c:2381
#20 0x007bb3f6 in
Re: [C++ Patch] PR 61683
Hi,

On 04/30/2015 01:56 AM, Paolo Carlini wrote:
> Hi, this seems pretty straightforward given the grammar.  Tested x86_64-linux.

... again, given the grammar, I think this is even obvious: if nobody screams,
I'm going to commit the patch in a day or so (but I'm naming the testcase
decltype-mem-initializer1.C instead, which seems more correct to me).

Thanks!
Paolo.
Re: Cleanup and improve canonical type construction in LTO
On Thu, 21 May 2015, Jan Hubicka wrote:

On Wed, 20 May 2015, Jan Hubicka wrote:

Code quality does not seem to be affected too much, which I suppose is partly
thanks to that tree-ssa-alias.c pointer hack.

My main point was to clean up the hack about comparing only the TYPE_CODE of
pointer_type and make more sense of the whole machinery.

Well, TYPE_CODE was supposed to mimic what you do explicitly now.  I'm curious
when the recursion determines incompatibility but the TYPE_CODE is equal.
What cases are those?

How do you handle void * vs. any other pointer type?  For cross-language LTO
I think we need to treat those compatible.  At least void * is equal to
struct Foo * (with incomplete Foo).

Well, if you would force this rule for canonical type calculation, then by
transitivity we could make no difference between pointers at all.

That's kind of my point - and it was the reason to add that get_alias_set ()
hack globbing all pointer types.  I think it was inspired by actual code in
the wild (mostly C code though...) that expects to be able to store all
pointer types to *(void *)x, thus void * being a valid abstraction for all
pointer types.  At the point of introducing the hack I decided not to allow
that by assigning alias-set zero to void * pointers (well, all pointers to
incomplete types) but to glob all pointer types instead.

I think the only way to enforce such non-transitive rules is via alias sets.
Just like alias set 0, which aliases with everything without really
degenerating every canonical type to a single type.

But alias sets also obey transitivity rules, no?  Or do you mean introducing
another alias-set-zero-like set globbing all pointers and using that for
pointers to incomplete types?  (What about void **?  The same abstraction
argument applies to that if I want to store an X **.)

In the end I like the simplicity of the current approach very much (likewise
using element alias sets for vector/array type alias sets).

Bootstrapped/regtested x86_64-linux with LTO and also tested on Firefox.
ppc64 build in progress.  OK?

Not looking at the patch - please split it up into small pieces so we can
bisect in case of problems.

This is a tricky area, esp. considering cross-language LTO.  We might be not
conservative enough with canonical types (given the get_alias_set pointer
handling).

Yep, I looked into the pointer handling hack.

Thanks for looking into this.  I know it is a tricky area.

Indeed.

Richard.
Re: [PATCH 1/3][AArch64][PR target/65697] Strengthen barriers for sync-fetch-op builtins.
[Added PR number and updated patches]

On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers.  This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.

The difference between the __sync and __atomic builtins is that the
restrictions imposed by a __sync operation's barrier apply to all memory
references, while the restrictions of an __atomic operation's barrier only
need to apply to a subset.  This affects AArch64 in particular because,
although its implementation of the __atomic builtins is correct, the barriers
generated are too weak for the __sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set.  This and a following patch modify the code
generated for these functions to weaken initial load-acquires to a simple
load and to add a final fence to prevent code-hoisting.  The last patch will
add tests for the code generated by the AArch64 backend for the __sync
builtins.

- Full barriers: __sync_fetch_and_op, __sync_op_and_fetch,
  __sync_*_compare_and_swap

    [load-acquire; code; store-release]
  becomes
    [load; code; store-release; fence]

- Acquire barriers: __sync_lock_test_and_set

    [load-acquire; code; store]
  becomes
    [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins is
unchanged.

This patch changes the code generated for the __sync_fetch_and_op and
__sync_op_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-22  Matthew Wahab  <matthew.wa...@arm.com>

	PR target/65697
	* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
	(aarch64_split_atomic_op): Check for __sync memory models, emit
	appropriate initial and final barriers.
From 5a2d546359f78cd3f304a62617f0fc385664374e Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wa...@arm.com>
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8c25d75..182dbad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9407,6 +9407,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+      && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+    {
+      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+    }
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9469,12 +9485,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+     load can be relaxed.  */
+  if (is_sync)
+    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
     new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9483,7 +9507,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
     {
@@ -9519,6 +9543,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, label),
			    pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+    aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1
RE: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures
Hi Kyrill,

Sorry for the little delay in responding.

> -----Original Message-----
> From: Kyrill Tkachov [mailto:kyrylo.tkac...@foss.arm.com]
> Sent: Tuesday, May 19, 2015 9:13 PM
> To: Kumar, Venkataramanan; James Greenhalgh; gcc-patches@gcc.gnu.org
> Cc: Ramana Radhakrishnan; seg...@kernel.crashing.org; Marcus Shawcroft
> Subject: Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test
> suite failures
>
> Hi Venkat,
>
> On 19/05/15 16:37, Kumar, Venkataramanan wrote:
> > Hi Maintainers,
> >
> > Please find the attached patch, which fixes the add/extend gcc test suite
> > failures on the AArch64 target.
> >
> > Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
> >
> > These tests started to fail after we prevented the combiner from
> > converting a shift RTX to a mult RTX when the RTX is not inside a memory
> > operation (r222874).  Now I have added new add/extend patterns which are
> > based on shift operations, to fix these cases.
> >
> > Testing status with the patch:
> >
> > (1) GCC bootstrap on AArch64 successful.
> > (2) SPEC2006 INT runs did not show any degradation.
>
> Does that mean there was no performance regression? Or no codegen
> difference?

Yes, there was no performance regression.

> What I'd expect from this patch is that the codegen would be the same as
> before the combine patch (r222874).  A performance difference can sometimes
> be hard to measure even at worse code quality.  Can you please confirm that
> on SPEC2006 INT the adds and shifts are now back to being combined into a
> single instruction?

I used -dp --save-temps to dump pattern names in the assembly files.  I used
the revision before the combiner patch (r222873) and the patch on top of
trunk (slightly older, r223194) for comparison, and quickly compared the
counts generated for the below patterns in the SPEC INT binaries.
  *adds_<optab><mode>_multp2   vs  *adds_<optab><ALLX:mode>_shft_<GPI:mode>
  *subs_<optab><mode>_multp2   vs  *subs_<optab><ALLX:mode>_shft_<GPI:mode>
  *add_uxt<mode>_multp2        vs  *add_uxt<mode>_shift2
  *add_uxtsi_multp2_uxtw       vs  *add_uxtsi_shift2_uxtw
  *sub_uxt<mode>_multp2        vs  *sub_uxt<mode>_shift2
  *sub_uxtsi_multp2_uxtw       vs  *sub_uxtsi_shift2_uxtw
  *adds_mul_imm_<mode>         vs  *adds_shift_imm_<mode>
  *subs_mul_imm_<mode>         vs  *subs_shift_imm_<mode>

The patterns are found in a few benchmarks.  The generated counts match in
the binaries compiled with the two revisions.  I also looked at the generated
assembly, and the adds and shifts are combined properly using the new
patterns.

Please let me know if this is OK.

> Thanks,
> Kyrill
>
> > (3) gcc regression testing passed.
> >
> > (-Snip-)
> > # Comparing 3 common sum files
> > ## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests /tmp/gxx-sum1.24998 /tmp/gxx-sum2.24998
> > Tests that now work, but didn't before:
> > gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
> > gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
> > gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2
> > gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
> > gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
> > gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
> > gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
> > gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
> > gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
> > gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
> > gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2
> > # No differences found in 3 common sum files
> > (-Snip-)
> >
> > The patterns are fixing the regressing tests, so I have not added any new
> > tests.  Regarding removal of the old patterns based on mults, I am
> > planning to do it as separate work.
> >
> > Is this OK for trunk?
gcc/ChangeLog
2015-05-19  Venkataramanan Kumar  <venkataramanan.ku...@amd.com>

	* config/aarch64/aarch64.md (*adds_shift_imm_<mode>): New pattern.
	(*subs_shift_imm_<mode>): Likewise.
	(*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
	(*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
	(*add_uxt<mode>_shift2): Likewise.
	(*add_uxtsi_shift2_uxtw): Likewise.
	(*sub_uxt<mode>_shift2): Likewise.
	(*sub_uxtsi_shift2_uxtw): Likewise.

Regards,
Venkat.
Re: [PATCH] Fix PR65701(?), fix typo in vect_enhance_data_refs_alignment
On Fri, 10 Apr 2015, Richard Biener wrote:

On Fri, 10 Apr 2015, Richard Biener wrote:

The following patch fixes a typo (I think) which has been present since the
original introduction of the code in vect_enhance_data_refs_alignment.

  if (do_peeling
      && all_misalignments_unknown
      && vect_supportable_dr_alignment (dr0, false))
    {
      /* Check if the target requires to prefer stores over loads, i.e., if
         misaligned stores are more expensive than misaligned loads (taking
         drs with same alignment into account).  */
      if (first_store && DR_IS_READ (dr0))
        {
          ...
        }

      /* In case there are only loads with different unknown misalignments, use
         peeling only if it may help to align other accesses in the loop.  */
      if (!first_store
          && !STMT_VINFO_SAME_ALIGN_REFS (
                  vinfo_for_stmt (DR_STMT (dr0))).length ()
          && vect_supportable_dr_alignment (dr0, false)
             != dr_unaligned_supported)
        do_peeling = false;
    }

The last vect_supportable_dr_alignment check doesn't make much sense.  It's
not mentioned in the comment and I see no reason for treating
dr_unaligned_supported any different here compared to
dr_explicit_realign_[optimized].

Ok, as always, a second after you hit send you think of the reason for this
test.  I suppose the idea was that aligning a single load might improve load
bandwidth (the same reason we eventually prefer aligning stores).
dr_explicit_realign_[optimized] performs aligned loads, thus it makes sense
to special-case them.  I suppose the cost model in that case should be more
precise (and note that for bdver2 aligned and unaligned loads have the same
cost).  So something like

  /* In case there are only loads with different unknown misalignments, use
     peeling only if it may help to align other accesses in the loop or
     if it may help improving load bandwidth when we'd end up using
     unaligned loads.  */
  tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
  if (!first_store
      && !STMT_VINFO_SAME_ALIGN_REFS (
              vinfo_for_stmt (DR_STMT (dr0))).length ()
      && (vect_supportable_dr_alignment (dr0, false)
          != dr_unaligned_supported
          || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
              == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
    do_peeling = false;

Looks like all Intel CPUs still have a larger unaligned load cost compared to
the aligned load cost (the same is true for bdver2 in principle - the cost
isn't about the instruction encoding but the actual data alignment).  So it
might be artificially fixing 187.facerec with bdver2 (we don't have a good
idea whether we're going to be store- or load-bandwidth limited in the
vectorizer).

Still, the following is what I have bootstrapped and tested on
x86_64-unknown-linux-gnu and applied to trunk.

Richard.

2015-05-22  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/65701
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	Move peeling cost models into one place.  Peel for alignment
	for single loads only if an aligned load is cheaper than
	an unaligned load.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 223349)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1541,16 +1541,6 @@ vect_enhance_data_refs_alignment (loop_v
 	  || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
 	do_peeling = false;
 
-      /* If we don't know how many times the peeling loop will run
-	 assume it will run VF-1 times and disable peeling if the remaining
-	 iters are less than the vectorization factor.  */
-      if (do_peeling
-	  && all_misalignments_unknown
-	  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-	  && (LOOP_VINFO_INT_NITERS (loop_vinfo)
-	      < 2 * (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1))
-	do_peeling = false;
-
       if (do_peeling
	  && all_misalignments_unknown
	  && vect_supportable_dr_alignment (dr0, false))
@@ -1619,12 +1609,17 @@ vect_enhance_data_refs_alignment (loop_v
 	}
 
       /* In case there are only loads with different unknown misalignments, use
-	 peeling only if it may help to align other accesses in the loop.  */
+	 peeling only if it may help to align other accesses in the loop or
+	 if it may help improving load bandwidth when we'd end up using
+	 unaligned loads.  */
+      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
       if (!first_store
 	  && !STMT_VINFO_SAME_ALIGN_REFS (
 		  vinfo_for_stmt (DR_STMT (dr0))).length ()
-	  && vect_supportable_dr_alignment (dr0, false)
-	       != dr_unaligned_supported)
+	  && (vect_supportable_dr_alignment
Re: [patch 1/10] debug-early merge: Ada front-end
> My apologies for the delay on Ada.  I have reworked the patch to leave the
> first pass on the TYPE_DECLs which are definitely needed.  I also optimized
> things a bit, since we don't need to save all the globals any more.

Thanks, this looks fine modulo a couple of nits, see below.

> There is one regression on which I'd like your and Jason's input before
> proceeding:
>
>   +FAIL: gnat.dg/specs/debug1.ads scan-assembler-times DW_AT_artificial 17
>
> The problem is that the Ada front-end twiddles the DECL_ARTIFICIAL flag
> *after* it has called debug_hooks->early_global_decl().  The function
> gnat_to_gnu_entity() calls create_var_decl_1->rest_of_decl_compilation, but
> then much later twiddles DECL_ARTIFICIAL:
>
>   if (!Comes_From_Source (gnat_entity))
>     DECL_ARTIFICIAL (gnu_decl) = 1;
>
> Twiddling DECL_ARTIFICIAL after we create early dwarf means the DIE does
> not get the DW_AT_artificial.  Would it be possible for you guys (ahem, Ada
> folk) to set DECL_ARTIFICIAL before calling rest_of_decl_compilation?

Sure, just add a ??? comment before the above lines for now.

> @@ -535,8 +535,7 @@ extern tree gnat_type_for_size (unsigned precision, int unsignedp);
>     an unsigned type; otherwise a signed type is returned.  */
>  extern tree gnat_type_for_mode (machine_mode mode, int unsignedp);
>
> -/* Emit debug info for all global variable declarations.  */
> -extern void gnat_write_global_declarations (void);
> +extern void note_types_used_by_globals (void);

The comment needs to be adjusted instead of removed, and should preferably be
the same as the one (adjusted too) above the function in utils.c.

-- 
Eric Botcazou
Re: [PATCH 3/3][Aarch64][PR target/65697] Add tests for __sync_builtins.
[Added PR number and updated patches]

This patch adds tests for the code generated by the AArch64 backend for the
__sync builtins.

Tested aarch64-none-linux-gnu with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite/
2015-05-21  Matthew Wahab  <matthew.wa...@arm.com>

	PR target/65697
	* gcc.target/aarch64/sync-comp-swap.c: New.
	* gcc.target/aarch64/sync-comp-swap.x: New.
	* gcc.target/aarch64/sync-op-acquire.c: New.
	* gcc.target/aarch64/sync-op-acquire.x: New.
	* gcc.target/aarch64/sync-op-full.c: New.
	* gcc.target/aarch64/sync-op-full.x: New.
	* gcc.target/aarch64/sync-op-release.c: New.
	* gcc.target/aarch64/sync-op-release.x: New.

From a3e8df9afce1098c1c616d66a309ce5bc5b95593 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wa...@arm.com>
Date: Fri, 15 May 2015 09:31:42 +0100
Subject: [PATCH 3/3] [Aarch64] Add tests for __sync_builtins.

Change-Id: I9f7cde85613dfe2cb6df55cbc732e683092f14d8
---
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c  |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x  | 13 ++++
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x |  7 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.c    |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.x    | 73 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.c |  6 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.x |  7 +++
 8 files changed, 130 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.x

diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
new file mode 100644
index 000..126b997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
new file mode 100644
index 000..eda52e40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
@@ -0,0 +1,13 @@
+int v = 0;
+
+int
+sync_bool_compare_swap (int a, int b)
+{
+  return __sync_bool_compare_and_swap (&v, a, b);
+}
+
+int
+sync_val_compare_swap (int a, int b)
+{
+  return __sync_val_compare_and_swap (&v, a, b);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
new file mode 100644
index 000..2639f9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
new file mode 100644
index 000..4c4548c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
@@ -0,0 +1,7 @@
+int v;
+
+int
+sync_lock_test_and_set (int a)
+{
+  return __sync_lock_test_and_set (&v, a);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
new file mode 100644
index 000..10fc8fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 12 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.x b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
new file mode 100644
index 000..c24223d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
@@ -0,0 +1,73 @@
+int v = 0;
+
+int
+sync_fetch_and_add (int a)
+{
+  return __sync_fetch_and_add (&v, a);
+}
+
+int
+sync_fetch_and_sub (int a)
+{
+  return __sync_fetch_and_sub (&v, a);
+}
+
+int
Re: C/C++ PATCH to allow deprecating enum values (PR c/47043)
On Thu, May 21, 2015 at 02:00:26PM -0400, Jason Merrill wrote:
> On 05/07/2015 12:22 PM, Marek Polacek wrote:
> > -  mark_used (decl);
> > +  mark_used (decl, 0);
>
> This should use tf_none rather than 0.

Fixed.

> > +  build_enumerator (DECL_NAME (decl), value, newtag,
> > +		    DECL_ATTRIBUTES (decl), DECL_SOURCE_LOCATION (decl));
>
> This is assuming that enumerators can't have dependent attributes.  I guess
> that's currently true, but please add a comment about it.

Done.

> OK with those changes.

Thanks, applied.  Here's the final version.

2015-05-22  Marek Polacek  <pola...@redhat.com>
	    Edward Smith-Rowland  <3dw...@verizon.net>

	PR c/47043
	* c-common.c (handle_deprecated_attribute): Allow CONST_DECL.
	* c-parser.c (c_parser_enum_specifier): Parse and apply enumerator
	attributes.
	* cp-tree.h (build_enumerator): Update declaration.
	* decl.c (build_enumerator): Add attributes parameter.  Call
	cplus_decl_attributes.
	* init.c (constant_value_1): Pass tf_none to mark_used.
	* parser.c (cp_parser_enumerator_definition): Parse attributes and
	pass them down to build_enumerator.
	* pt.c (tsubst_enum): Pass decl attributes to build_enumerator.
	* semantics.c (finish_id_expression): Don't warn_deprecated_use here.
	* doc/extend.texi (Enumerator Attributes): New section.  Document
	syntax of enumerator attributes.
	* c-c++-common/attributes-enum-1.c: New test.
	* c-c++-common/attributes-enum-2.c: New test.
	* g++.dg/cpp0x/attributes-enum-1.C: New test.
	* g++.dg/cpp1y/attributes-enum-1.C: New test.
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index a2b3793..36c984c 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -9088,6 +9088,7 @@ handle_deprecated_attribute (tree *node, tree name,
	   || TREE_CODE (decl) == PARM_DECL
	   || VAR_OR_FUNCTION_DECL_P (decl)
	   || TREE_CODE (decl) == FIELD_DECL
+	   || TREE_CODE (decl) == CONST_DECL
	   || objc_method_decl (TREE_CODE (decl)))
	TREE_DEPRECATED (decl) = 1;
       else
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index f496733..965b4b9 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -2516,6 +2516,13 @@ c_parser_declspecs (c_parser *parser, struct c_declspecs *specs,
    enumerator:
      enumeration-constant
      enumeration-constant = constant-expression
+
+   GNU Extensions:
+
+   enumerator:
+     enumeration-constant attributes[opt]
+     enumeration-constant attributes[opt] = constant-expression
+
 */
 
 static struct c_typespec
@@ -2575,6 +2582,8 @@ c_parser_enum_specifier (c_parser *parser)
       c_parser_set_source_position_from_token (token);
       decl_loc = value_loc = token->location;
       c_parser_consume_token (parser);
+      /* Parse any specified attributes.  */
+      tree enum_attrs = c_parser_attributes (parser);
       if (c_parser_next_token_is (parser, CPP_EQ))
	{
	  c_parser_consume_token (parser);
@@ -2584,7 +2593,9 @@ c_parser_enum_specifier (c_parser *parser)
       else
	enum_value = NULL_TREE;
       enum_decl = build_enumerator (decl_loc, value_loc,
-				    the_enum, enum_id, enum_value);
+				    the_enum, enum_id, enum_value);
+      if (enum_attrs)
+	decl_attributes (&TREE_PURPOSE (enum_decl), enum_attrs, 0);
       TREE_CHAIN (enum_decl) = values;
       values = enum_decl;
       seen_comma = false;
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 4136d98..91619e2 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -5400,7 +5400,7 @@ extern bool xref_basetypes		(tree, tree);
 extern tree start_enum			(tree, tree, tree, bool, bool *);
 extern void finish_enum_value_list	(tree);
 extern void finish_enum			(tree);
-extern void build_enumerator		(tree, tree, tree, location_t);
+extern void build_enumerator		(tree, tree, tree, tree, location_t);
 extern tree lookup_enumerator		(tree, tree);
 extern bool start_preparsed_function	(tree, tree, int);
 extern bool start_function		(cp_decl_specifier_seq *,
diff --git gcc/cp/decl.c gcc/cp/decl.c
index e4d3c1d..5396994 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -13057,11 +13057,12 @@ finish_enum (tree enumtype)
 /* Build and install a CONST_DECL for an enumeration constant of the
    enumeration type ENUMTYPE whose NAME and VALUE (if any) are provided.
-   LOC is the location of NAME.
+   Apply ATTRIBUTES if available.  LOC is the location of NAME.
    Assignment of sequential values by default is handled here.  */
 
 void
-build_enumerator (tree name, tree value, tree enumtype, location_t loc)
+build_enumerator (tree name, tree value, tree enumtype,
Re: [PATCH 02/13] optabs: Fix vec_perm - V16QI middle end lowering.
On Tue, May 19, 2015 at 07:48:29AM -0700, Richard Henderson wrote:
> > Ok to apply with that change?
>
> Yes, thanks.

I've applied the following.

Bye,

-Andreas-

gcc/
	* optabs.c (expand_vec_perm): Don't re-use SEL as target operand.
---
 gcc/optabs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/optabs.c b/gcc/optabs.c
index bd03fc1..bc19029 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6796,11 +6796,11 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
       machine_mode selmode = GET_MODE (sel);
       if (u == 2)
	sel = expand_simple_binop (selmode, PLUS, sel, sel,
-				   sel, 0, OPTAB_DIRECT);
+				   NULL, 0, OPTAB_DIRECT);
       else
	sel = expand_simple_binop (selmode, ASHIFT, sel,
				   GEN_INT (exact_log2 (u)),
-				   sel, 0, OPTAB_DIRECT);
+				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 
       /* Broadcast the low byte each element into each of its bytes.  */
-- 
1.7.9.5
Re: Check canonical types in verify_type
On Thu, 21 May 2015, Jan Hubicka wrote: Hmm, I see, interesting hack. For the first part of the comment, I see that qualifiers need to be ignored, but I do not see why we put short * and int * pointers into the same class. For the reason that people are very lazy. For example GCC has code punning void ** and void * and void * to Type * (and vice versa). People just don't expect pointers to be in different alias sets (and there is little gained by doing that). I think this observation may be out of date. Removing that code (punting all pointers to ptr_type_node) increases the number of TBAA disambiguations for firefox by 20%. It also drops the number of queries, so in reality the number is higher. So it seems that it may be worth tracking this in a more sensible way. This was tested on LTO where most pointer alias sets are the same anyway, so the increase should be bigger in a non-LTO build. I will gather some stats. The issue is that modern C++ code packs everything into instances, puts pointers to instances everywhere, and as we inline everything into one glob, we could track down a lot of data if we did not get lost in pointers. We may want to have a flag disabling that and we may want to throw away some precision without throwing away all. I tried to just throw away the qualifiers from pointers (i.e. make a pointer to a qualified type be globbed to a pointer to the unqualified type). This seems to work, in that GCC bootstraps with one bug (in ipa-icf) and passes the testsuite. Well, see the big comment before the alias.c pointer type handling. This is what the C family frontends did before in their get_alias_set langhook: t1 = build_type_no_quals (t); if (t1 != t) return get_alias_set (t1); where build_type_no_quals strips qualifiers recursively from t and pointed-to types. The complete/incomplete type fun with LTO can be solved for C++ but indeed I see why a pointer to an incomplete type needs to be considered compatible with every other pointer to structure. 
Can't this be dealt with by adding correct edges into the aliasing dag? I don't think so, without the dag degenerating. We've had this mess before (complete vs. incomplete structs and so on). It didn't work very well. Can you point me to the code/issues, please? It was the old type merging code. I can spot it in the 4.5 code, gimple.c:gimple_types_compatible_p where in the POINTER_TYPE case it does compare_type_names_p (TYPE_MAIN_VARIANT (TREE_TYPE (t1)), TYPE_MAIN_VARIANT (TREE_TYPE (t2)), true)) { /* Replace the pointed-to incomplete type with the complete one. ??? This simple name-based merging causes at least some of the ICEs in canonicalizing FIELD_DECLs during stmt read. For example in GCC we have two different struct deps and we mismatch the use in struct cpp_reader in sched-int.h vs. mkdeps.c. Of course the whole exercise is for TBAA with structs which contain pointers to incomplete types in one unit and to complete ones in another. So we probably should merge these types only with more context. */ if (COMPLETE_TYPE_P (TREE_TYPE (t2))) TREE_TYPE (t1) = TREE_TYPE (t2); else TREE_TYPE (t2) = TREE_TYPE (t1); where that of course wrecks the tree graph quite a bit (and it is dependent on the order of TUs). Luckily we now have SCC based streaming and merging, not trying to complete types to get more merging done. I don't remember if I added a testcase, so no. It's just very clear to me that the look of pointer members may be very different (consider reference vs. pointer or pointer to array vs pointer to element). Yep, depending on how much cross-language data structure sharing we want to permit. We could definitely unpeel those differences just like I did with qualifiers. However this trick is kind of weird because it does not propagate up to aggregates built from these. In a way some of this should probably better be handled at canonical type calculation time if we really want to permit mixing up aggregates with those differences. 
For example I think it would make sense to look through ARRAY_TYPEs when determining a pointer's canonical type (so pointer to array and pointer to element are the same). We also may ignore the TREE_CODE match for POINTER_TYPE_P if we really do have languages that mix that. Another case (I've seen this in fortran vs. C) is: int i; vs. common/i/a(1) thus inter-operability of int[1] and int. A similar issue (with alias sets used for accesses) is struct X { int trailing[1]; }; and code doing struct X *p; int (*a)[16] = p->trailing; and (*a)[i] vs. p->trailing[i] (or in the middle-end with frontends that can emit array assignments).
[PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c
This patch fixes FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1 a failure caused by combine simplifying this i2src (plus:DI (plus:DI (reg:DI 165 [ val+8 ]) (reg:DI 169 [+8 ])) (reg:DI 76 ca)) to this (plus:DI (plus:DI (reg:DI 76 ca) (reg:DI 165 [ val+8 ])) (reg:DI 169 [+8 ])) which no longer matches rs6000.md adddi3_carry_in_internal. See https://gcc.gnu.org/ml/gcc/2015-05/msg00206.html for related discussion. Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux and x86_64-linux. OK to apply mainline? * rtlanal.c (commutative_operand_precedence): Correct comments. * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward declaration. Return an int. Distinguish REG,REG return from others. (struct simplify_plus_minus_op_data): Make local to function. (simplify_plus_minus): Rename canonicalized to not_canonical. Don't set not_canonical if merely sorting registers. Avoid packing ops if nothing changes. White space fixes. Some notes: Renaming canonicalized to not_canonical better reflects its usage. At the time the var is set, the expression hasn't been canonicalized. diff -w shown to exclude the whitespace fixes. Index: gcc/rtlanal.c === --- gcc/rtlanal.c (revision 223463) +++ gcc/rtlanal.c (working copy) @@ -3140,10 +3140,9 @@ regno_use_in (unsigned int regno, rtx x) } /* Return a value indicating whether OP, an operand of a commutative - operation, is preferred as the first or second operand. The higher - the value, the stronger the preference for being the first operand. - We use negative values to indicate a preference for the first operand - and positive values for the second operand. */ + operation, is preferred as the first or second operand. The more + positive the value, the stronger the preference for being the first + operand. 
*/ int commutative_operand_precedence (rtx op) @@ -3150,7 +3149,7 @@ commutative_operand_precedence (rtx op) { enum rtx_code code = GET_CODE (op); - /* Constants always come the second operand. Prefer nice constants. */ + /* Constants always become the second operand. Prefer nice constants. */ if (code == CONST_INT) return -8; if (code == CONST_WIDE_INT) Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c (revision 223463) +++ gcc/simplify-rtx.c (working copy) @@ -71,7 +71,6 @@ along with GCC; see the file COPYING3. If not see static rtx neg_const_int (machine_mode, const_rtx); static bool plus_minus_operand_p (const_rtx); -static bool simplify_plus_minus_op_data_cmp (rtx, rtx); static rtx simplify_plus_minus (enum rtx_code, machine_mode, rtx, rtx); static rtx simplify_immed_subreg (machine_mode, rtx, machine_mode, unsigned int); @@ -4064,20 +4063,10 @@ simplify_const_binary_operation (enum rtx_code cod -/* Simplify a PLUS or MINUS, at least one of whose operands may be another - PLUS or MINUS. +/* Return a positive integer if X should sort after Y. The value + returned is 1 if and only if X and Y are both regs. */ - Rather than test for specific case, we do this by a brute-force method - and do all possible simplifications until no more changes occur. Then - we rebuild the operation. */ -struct simplify_plus_minus_op_data -{ - rtx op; - short neg; -}; -static bool +static int simplify_plus_minus_op_data_cmp (rtx x, rtx y) { int result; @@ -4085,23 +4074,36 @@ simplify_plus_minus_op_data_cmp (rtx x, rtx y) result = (commutative_operand_precedence (y) - commutative_operand_precedence (x)); if (result) -return result > 0; +return result + result; /* Group together equal REGs to do more simplification. */ if (REG_P (x) && REG_P (y)) return REGNO (x) > REGNO (y); - else -return false; + + return 0; } +/* Simplify and canonicalize a PLUS or MINUS, at least one of whose + operands may be another PLUS or MINUS. 
+ + Rather than test for specific case, we do this by a brute-force method + and do all possible simplifications until no more changes occur. Then + we rebuild the operation. + + May return NULL_RTX when no changes were made. */ + static rtx simplify_plus_minus (enum rtx_code code, machine_mode mode, rtx op0, rtx op1) { - struct simplify_plus_minus_op_data ops[16]; + struct simplify_plus_minus_op_data + { +rtx op; +short neg; + } ops[16]; rtx result, tem; int n_ops = 2; - int changed, n_constants, canonicalized = 0; + int changed, n_constants, not_canonical = 0; int i, j; memset (ops, 0, sizeof ops); @@ -4139,7 +4141,18 @@ simplify_plus_minus (enum rtx_code code, machine_m ops[i].op = XEXP (this_op, 0); changed = 1; -
[PATCH][RFA] PR rtl-optimization/66237
This patch fixes a bug introduced by refactoring. A cast from rtx_insn to rtx_jump_insn in fix_crossing_conditional_branches was placed before the check, and that caused an ICE if the instruction is actually a call, rather than a jump. Bootstrapped/regtested on x86_64 linux and tested the regressed case (PR34999) on aarch64 crosscompiler using the provided profiling data. OK for trunk? -- Regards, Mikhail Maltsev gcc/ChangeLog: 2015-05-22 Mikhail Maltsev malts...@gmail.com PR rtl-optimization/66237 * bb-reorder.c (fix_crossing_conditional_branches): Fix wrong location of an as_a cast. diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index 20e15fc..cbf6343 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -2044,7 +2044,7 @@ fix_crossing_conditional_branches (void) if (crossing_edge) { - rtx_jump_insn *old_jump = as_a <rtx_jump_insn *> (BB_END (cur_bb)); + rtx_insn *old_jump = BB_END (cur_bb); /* Check to make sure the jump instruction is a conditional jump. */ @@ -2067,6 +2067,9 @@ fix_crossing_conditional_branches (void) if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE)) { + rtx_jump_insn *old_jump_insn = + as_a <rtx_jump_insn *> (old_jump); + if (GET_CODE (XEXP (set_src, 1)) == PC) old_label = XEXP (set_src, 2); else if (GET_CODE (XEXP (set_src, 2)) == PC) @@ -2095,7 +2098,7 @@ fix_crossing_conditional_branches (void) emit_label (new_label); gcc_assert (GET_CODE (old_label) == LABEL_REF); - old_jump_target = old_jump->jump_target (); + old_jump_target = old_jump_insn->jump_target (); new_jump = as_a <rtx_jump_insn *> (emit_jump_insn (gen_jump (old_jump_target))); new_jump->set_jump_target (old_jump_target); @@ -2114,7 +2117,7 @@ /* Make old jump branch to new bb. */ - redirect_jump (old_jump, new_label, 0); + redirect_jump (old_jump_insn, new_label, 0); /* Remove crossing_edge as predecessor of 'dest'. */
Re: [PATCH 2/3][AArch64][PR target/65697] Strengthen barriers for sync-compare-swap builtins.
[Added PR number and updated patches] This patch changes the code generated for __sync_type_compare_and_swap to ldxr reg; cmp; bne label; stlxr; cbnz; label: dmb ish; mov .., reg This removes the acquire-barrier from the load and ends the operation with a fence to prevent memory references appearing after the __sync operation from being moved ahead of the store-release. This also strengthens the acquire barrier generated for __sync_lock_test_and_set (which, like compare-and-swap, is implemented as a form of atomic exchange): ldaxr; stxr; cbnz becomes ldxr; stxr; cbnz; dmb ish Tested with check-gcc for aarch64-none-linux-gnu. Ok for trunk? Matthew 2015-05-22 Matthew Wahab matthew.wa...@arm.com PR target/65697 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check for __sync memory models, emit appropriate initial and final barriers. From 1e5cda95944e7176b8934296b1bb1ec4c9fb1362 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Fri, 15 May 2015 09:31:06 +0100 Subject: [PATCH 2/3] [AArch64] Strengthen barriers for sync-compare-swap builtins. 
Change-Id: I335771f2f42ea951d227f20f6cb9daa07330614d --- gcc/config/aarch64/aarch64.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 182dbad..5b9feee 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9433,14 +9433,19 @@ aarch64_split_compare_and_swap (rtx operands[]) bool is_weak; rtx_code_label *label1, *label2; rtx x, cond; + enum memmodel model; + rtx model_rtx; + rtx load_model_rtx; rval = operands[0]; mem = operands[1]; oldval = operands[2]; newval = operands[3]; is_weak = (operands[4] != const0_rtx); + model_rtx = operands[5]; scratch = operands[7]; mode = GET_MODE (mem); + model = memmodel_from_int (INTVAL (model_rtx)); label1 = NULL; if (!is_weak) @@ -9450,7 +9455,13 @@ aarch64_split_compare_and_swap (rtx operands[]) } label2 = gen_label_rtx (); - aarch64_emit_load_exclusive (mode, rval, mem, operands[5]); + /* A __sync operation will end with a fence so the load can be relaxed. */ + if (is_mm_sync (model)) +load_model_rtx = GEN_INT (MEMMODEL_RELAXED); + else +load_model_rtx = model_rtx; + + aarch64_emit_load_exclusive (mode, rval, mem, load_model_rtx); cond = aarch64_gen_compare_reg (NE, rval, oldval); x = gen_rtx_NE (VOIDmode, cond, const0_rtx); @@ -9458,7 +9469,7 @@ aarch64_split_compare_and_swap (rtx operands[]) gen_rtx_LABEL_REF (Pmode, label2), pc_rtx); aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); - aarch64_emit_store_exclusive (mode, scratch, mem, newval, operands[5]); + aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx); if (!is_weak) { @@ -9475,6 +9486,10 @@ aarch64_split_compare_and_swap (rtx operands[]) } emit_label (label2); + + /* A __sync operation may need a final fence. */ + if (is_mm_sync (model)) +aarch64_emit_post_barrier (model); } /* Split an atomic operation. */ -- 1.9.1
Re: [patch 10/10] debug-early merge: compiler proper
FWIW, Ada is filled with these temporaries and/or types that should really be ignored, and are currently causing grief. It's a little hard to believe that types created in a front-end should be marked ignored. Either they are used by some objects and thus can be needed in the debug info, or they aren't and will be discarded by -feliminate-unused-debug-types. And those not present in the source should be marked artificial. -- Eric Botcazou
[Ada] Correct some anomalies in the handling of Atomic
This update corrects two problems in the handling of Atomic. First, we do not need Atomic_Synchronization for an object renaming declaration. Second, when we do have a renaming of an atomic object, the renaming object should be marked as atomic. Compiling this test: 1. package Renamed_Atomic is 2. 3. I : Integer; 4. pragma Atomic (I); 5. 6. function Get_I return Integer; 7. procedure Set_I (Val : Integer); 8. 9. J : Integer renames I; 10. 11. function Get_J return Integer; 12. procedure Set_J (Val : Integer); 13. 14. end Renamed_Atomic; We do not want Is_Atomic set on the identifier reference in line 9, but we do want Is_Atomic set on the entity for J. So if we use -gnatdt to generate a tree file, it should have two Is_Atomic references, one for the entity I and one for the entity J. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-21 Robert Dewar de...@adacore.com * exp_util.adb (Activate_Atomic_Synchronization): Do not set Atomic_Sync_Required for an object renaming declaration. * sem_ch8.adb (Analyze_Object_Renaming): Copy Is_Atomic and Is_Independent to renaming object. Index: exp_util.adb === --- exp_util.adb (revision 223476) +++ exp_util.adb (working copy) @@ -204,6 +204,13 @@ when others => null; end case; + -- Nothing to do for the identifier in an object renaming declaration; + -- the renaming itself does not need atomic synchronization. + + if Nkind (Parent (N)) = N_Object_Renaming_Declaration then + return; + end if; + -- Go ahead and set the flag Set_Atomic_Sync_Required (N); Index: sem_ch8.adb === --- sem_ch8.adb (revision 223476) +++ sem_ch8.adb (working copy) @@ -1344,6 +1344,13 @@ Set_Is_Volatile (Id, Is_Volatile_Object (Nam)); + -- Also copy settings of Is_Atomic and Is_Independent + + if Is_Entity_Name (Nam) then + Set_Is_Atomic (Id, Is_Atomic (Entity (Nam))); + Set_Is_Independent (Id, Is_Independent (Entity (Nam))); + end if; + -- Treat as volatile if we just set the Volatile flag if Is_Volatile (Id)
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
On 05/21/2015 11:02 PM, Sriraman Tallam wrote: On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote: On 05/21/2015 10:12 PM, Sriraman Tallam wrote: My original proposal, for x86_64 only, was to add -fno-plt=function-name. This lets the user decide for which functions PLT must be avoided. Let the compiler always generate an indirect call using call *func@GOTPCREL(%rip). We could do this for non-PIC code too. No need for linker fixups since this relies on the user to know that func is from a shared object. Having to pass function names on the command line seems like an odd interface. E.g., you'll need to pass the mangled name for C++ functions. Any reason this isn't a function attribute? It is not clear to me where I would stick the attribute. Example usage in foo.cc: #include <string.h> int main() { int n = memcmp(); } I want memcmp to not go through PLT, do you propose explicitly re-declaring it in foo.cc with the attribute? I guess you'd do: #include <string.h> __attribute__((no_plt)) typeof (memcpy) memcpy; int main() { int n = memcmp(); } or even: #include <string.h> int main() { if (hotpath) { __attribute__((no_plt)) typeof (memcpy) memcpy; for (..) { int n = memcmp(); } } else { int n = memcmp(); } } or globally: $ cat no-plt/string.h: #include_next <string.h> __attribute__((no_plt)) typeof (memcpy) memcpy; $ gcc -I no-plt/ ... Thanks, Pedro Alves
Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures
Hi Venkat, On 22/05/15 09:50, Kumar, Venkataramanan wrote: Hi Kyrill, Sorry for little delay in responding. -Original Message- From: Kyrill Tkachov [mailto:kyrylo.tkac...@foss.arm.com] Sent: Tuesday, May 19, 2015 9:13 PM To: Kumar, Venkataramanan; James Greenhalgh; gcc-patches@gcc.gnu.org Cc: Ramana Radhakrishnan; seg...@kernel.crashing.org; Marcus Shawcroft Subject: Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures Hi Venkat, On 19/05/15 16:37, Kumar, Venkataramanan wrote: Hi Maintainers, Please find the attached patch, that fixes add/extend gcc test suite failures in Aarch64 target. Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049 These tests started to fail after we prevented combiner from converting shift RTX to mult RTX, when the RTX is not inside a memory operation (r222874) . Now I have added new add/extend patterns which are based on shift operations, to fix these cases. Testing status with the patch. (1) GCC bootstrap on AArch64 successful. (2) SPEC2006 INT runs did not show any degradation. Does that mean there was no performance regression? Or no codegen difference? Yes there was no performance regression. What I'd expect from this patch is that the codegen would be the same as before the combine patch (r222874). A performance difference can sometimes be hard to measure even at worse code quality. Can you please confirm that on SPEC2006 INT the adds and shifts are now back to being combined into a single instruction? I used -dp --save-temps to dump pattern names in assembly files. I used revision before combiner patch (r222873) and the patch on top of trunk (little older r223194) for comparison. Quickly compared the counts generated for the below patterns from the SPEC INT binaries. 
*adds_<optab><mode>_multp2 vs *adds_<optab><ALLX:mode>_shft_<GPI:mode> *subs_<optab><mode>_multp2 vs *subs_<optab><ALLX:mode>_shft_<GPI:mode> *add_uxt<mode>_multp2 vs *add_uxt<mode>_shift2 *add_uxtsi_multp2_uxtw vs *add_uxtsi_shift2_uxtw *sub_uxt<mode>_multp2 vs *sub_uxt<mode>_shift2 *sub_uxtsi_multp2_uxtw vs *sub_uxtsi_shift2_uxtw *adds_mul_imm_<mode> vs *adds_shift_imm_<mode> *subs_mul_imm_<mode> vs *subs_shift_imm_<mode> Patterns are found in few benchmarks. The generated counts are matching in the binaries compiled with the two revisions. Also looked at the generated assembly; the adds and shifts are combined properly using the new patterns. Please let me know if this is OK. Thanks for investigating this. I've run your patch on aarch64.exp and I see the add+shift/extend failures that we were seeing go away (I'm sure you saw that as well ;). Up to the maintainers to review the patch. Thanks, Kyrill Thanks, Kyrill (3) gcc regression testing passed. (-Snip-) # Comparing 3 common sum files ## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests /tmp/gxx-sum1.24998 /tmp/gxx-sum2.24998 Tests that now work, but didn't before: gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3 gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3 gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2 gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3 gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3 gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3 gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3 gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2 # No differences found in 3 common sum files (-Snip-) The patterns 
are fixing the regressing tests, so I have not added any new tests. Regarding removal of the old patterns based on mults, I am planning to do it as separate work. Is this OK for trunk? gcc/ChangeLog 2015-05-19 Venkataramanan Kumar venkataramanan.ku...@amd.com * config/aarch64/aarch64.md (*adds_shift_imm_<mode>): New pattern. (*subs_shift_imm_<mode>): Likewise. (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise. (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise. (*add_uxt<mode>_shift2): Likewise. (*add_uxtsi_shift2_uxtw): Likewise. (*sub_uxt<mode>_shift2): Likewise. (*sub_uxtsi_shift2_uxtw): Likewise. Regards, Venkat. Regards, Venkat,
Re: [5/9] Create sensible dummy registers
On 05/22/2015 09:39 AM, Richard Sandiford wrote: Eric Botcazou ebotca...@adacore.com writes: Some pieces of code create a temporary REG or MEM and only fill it in later when they're testing the cost of a particular rtx. This patch makes sure that even the dummy REG or MEM is valid, rather than force the gen_* code to handle garbage values. gcc/ * caller-save.c (init_caller_save): Use word_mode and FIRST_PSEUDO_REGISTER when creating temporary rtxes. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * postreload.c (reload_cse_regs_1): Likewise. Isn't LAST_VIRTUAL_REGISTER + 1 the canonical regno to be used in this case? Ah, yeah. Here's a patch to fix all instances of that (I hope). Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * caller-save.c (init_caller_save): Base temporary register numbers on LAST_VIRTUAL_REGISTER + 1 rather than FIRST_PSEUDO_REGISTER. * cfgloopanal.c (init_set_costs): Likewise. * dojump.c (prefer_and_bit_test): Likewise. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * lower-subreg.c (init_lower_subreg): Likewise. * postreload.c (reload_cse_regs_1): Likewise. OK. jeff
Re: [PATCH][RFA] PR rtl-optimization/66237
On 05/22/2015 02:27 AM, Mikhail Maltsev wrote: This patch fixes a bug introduced by refactoring. A cast from rtx_insn to rtx_jump_insn in fix_crossing_conditional_branches was placed before the check, and that caused an ICE if the instruction is actually a call, rather than a jump. Bootstrapped/regtested on x86_64 linux and tested the regressed case (PR34999) on aarch64 crosscompiler using the provided profiling data. OK for trunk? -- Regards, Mikhail Maltsev pr66237.clog gcc/ChangeLog: 2015-05-22 Mikhail Maltsev malts...@gmail.com PR rtl-optimization/66237 * bb-reorder.c (fix_crossing_conditional_branches): Fix wrong location of an as_a cast. OK. jeff
Final patch to cleanup hppa port's handling of shadd/scaled indexed addressing modes
This is the final patch to the PA backend to cleanup its handling of shadd insns and scaled indexed addressing modes. First, it removes the old non-canonical shadd insns. Second, it removes some non-canonical peephole patterns. No idea what I was thinking when I wrote them. Given they're non-canonical and I don't know how to trigger them, best to just make this disappear. Finally, the remaining peepholes which rewrite indexed stores of FP registers in integer modes are updated to generate canonical RTL for the shift-add insns they generate. Like the other recent changes, these don't affect the code generation for the 300+ files I'm testing with. I've also verified that hppa.exp still runs successfully. Installed on the trunk. Jeff * pa.md (non-canonical shift-add insns): Remove. (peepholes with non-canonical RTL sources): Remove. (peepholes for indexed stores of FP regs in integer modes): Match and generate canonical RTL. diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md index 6cc7a3c..2686f38 100644 --- a/gcc/config/pa/pa.md +++ b/gcc/config/pa/pa.md @@ -2270,8 +2270,8 @@ ; computes the address to be deleted if the register it sets is dead. 
(define_peephole2 [(set (match_operand:SI 0 register_operand ) - (plus:SI (mult:SI (match_operand:SI 1 register_operand ) - (const_int 4)) + (plus:SI (ashift:SI (match_operand:SI 1 register_operand ) + (const_int 2)) (match_operand:SI 2 register_operand ))) (set (mem:SI (match_dup 0)) (match_operand:SI 3 register_operand ))] @@ -2281,31 +2281,14 @@ FP_REGNO_P (REGNO (operands[3])) [(set (mem:SI (plus:SI (mult:SI (match_dup 1) (const_int 4)) (match_dup 2))) (match_dup 3)) - (set (match_dup 0) (plus:SI (mult:SI (match_dup 1) (const_int 4)) - (match_dup 2)))] - ) - -(define_peephole2 - [(set (match_operand:SI 0 register_operand ) - (plus:SI (match_operand:SI 2 register_operand ) -(mult:SI (match_operand:SI 1 register_operand ) - (const_int 4 - (set (mem:SI (match_dup 0)) -(match_operand:SI 3 register_operand ))] - !TARGET_SOFT_FLOAT -!TARGET_DISABLE_INDEXING -REG_OK_FOR_BASE_P (operands[2]) -FP_REGNO_P (REGNO (operands[3])) - [(set (mem:SI (plus:SI (mult:SI (match_dup 1) (const_int 4)) (match_dup 2))) - (match_dup 3)) - (set (match_dup 0) (plus:SI (mult:SI (match_dup 1) (const_int 4)) + (set (match_dup 0) (plus:SI (ashift:SI (match_dup 1) (const_int 2)) (match_dup 2)))] ) (define_peephole2 [(set (match_operand:DI 0 register_operand ) - (plus:DI (mult:DI (match_operand:DI 1 register_operand ) - (const_int 4)) + (plus:DI (ashift:DI (match_operand:DI 1 register_operand ) + (const_int 2)) (match_operand:DI 2 register_operand ))) (set (mem:SI (match_dup 0)) (match_operand:SI 3 register_operand ))] @@ -2316,25 +2299,7 @@ FP_REGNO_P (REGNO (operands[3])) [(set (mem:SI (plus:DI (mult:DI (match_dup 1) (const_int 4)) (match_dup 2))) (match_dup 3)) - (set (match_dup 0) (plus:DI (mult:DI (match_dup 1) (const_int 4)) - (match_dup 2)))] - ) - -(define_peephole2 - [(set (match_operand:DI 0 register_operand ) - (plus:DI (match_operand:DI 2 register_operand ) -(mult:DI (match_operand:DI 1 register_operand ) - (const_int 4 - (set (mem:SI (match_dup 0)) -(match_operand:SI 3 
register_operand ))] - !TARGET_SOFT_FLOAT -!TARGET_DISABLE_INDEXING -TARGET_64BIT -REG_OK_FOR_BASE_P (operands[2]) -FP_REGNO_P (REGNO (operands[3])) - [(set (mem:SI (plus:DI (mult:DI (match_dup 1) (const_int 4)) (match_dup 2))) - (match_dup 3)) - (set (match_dup 0) (plus:DI (mult:DI (match_dup 1) (const_int 4)) + (set (match_dup 0) (plus:DI (ashift:DI (match_dup 1) (const_int 2)) (match_dup 2)))] ) @@ -3896,8 +3861,8 @@ (define_peephole2 [(set (match_operand:SI 0 register_operand ) - (plus:SI (mult:SI (match_operand:SI 1 register_operand ) - (const_int 8)) + (plus:SI (ashift:SI (match_operand:SI 1 register_operand ) + (const_int 3)) (match_operand:SI 2 register_operand ))) (set (mem:DF (match_dup 0)) (match_operand:DF 3 register_operand ))] @@ -3907,15 +3872,15 @@ FP_REGNO_P (REGNO (operands[3])) [(set (mem:DF (plus:SI (mult:SI (match_dup 1) (const_int 8)) (match_dup 2))) (match_dup 3)) - (set (match_dup 0) (plus:SI (mult:SI (match_dup 1) (const_int 8)) + (set (match_dup 0) (plus:SI (ashift:SI (match_dup 1) (const_int 3)) (match_dup 2)))] ) (define_peephole2 [(set
Re: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
On 05/20/2015 08:04 PM, Thomas Preud'homme wrote: From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme From: Steven Bosscher [mailto:stevenb@gmail.com] Sent: Tuesday, May 19, 2015 7:21 PM Not OK. This will break in move_invariants() when it looks at REGNO (inv->reg). Indeed. I'm even surprised all tests passed. Ok, I will just prevent moving in such a case. I'm running the tests now and will get back to you tomorrow. Patch is now tested via bootstrap + testsuite run on x86_64-linux-gnu and building an arm-none-eabi cross-compiler + testsuite run. Both testsuite runs show no regression. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 76a009f..4ce3576 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1626,7 +1626,7 @@ move_invariant_reg (struct loop *loop, unsigned invno) if (REG_P (reg)) regno = REGNO (reg); - if (!can_move_invariant_reg (loop, inv, reg)) + if (!can_move_invariant_reg (loop, inv, dest)) Won't this run into the same problem if DEST is a SUBREG? Jeff
Re: [patch] testsuite enable PIE tests on FreeBSD
On 05/21/2015 02:01 PM, Andreas Tobler wrote: On 21.05.15 20:14, Andreas Tobler wrote: On 20.05.15 22:30, Jeff Law wrote: On 05/20/2015 11:04 AM, Andreas Tobler wrote: Hi, the attached patch enables some PIE tests on FreeBSD. Ok for trunk? Thanks, Andreas 2015-05-20 Andreas Tobler andre...@gcc.gnu.org * gcc.target/i386/pr32219-1.c: Enable test on FreeBSD. * gcc.target/i386/pr32219-2.c: Likewise. * gcc.target/i386/pr32219-3.c: Likewise. * gcc.target/i386/pr32219-4.c: Likewise. * gcc.target/i386/pr32219-5.c: Likewise. * gcc.target/i386/pr32219-6.c: Likewise * gcc.target/i386/pr32219-7.c: Likewise. * gcc.target/i386/pr32219-8.c: Likewise. * gcc.target/i386/pr39013-1.c: Likewise. * gcc.target/i386/pr39013-2.c: Likewise. * gcc.target/i386/pr64317.c: Likewise. Wouldn't it be better to remove the target selector and instead add: /* { dg-require-effective-target pie } */ In each of those tests? While the net effect is the same today, it means there's only one place to change if another x86 target gains PIE support in the future. Pre-approved using that style. Thanks! Tested on amd64-freebsd and CentOS. Andreas This is what I committed: 2015-05-21 Andreas Tobler andre...@gcc.gnu.org * gcc.target/i386/pr32219-1.c: Use 'dg-require-effective-target pie' instead of listing several targets on its own. * gcc.target/i386/pr64317.c: Likewise. Yes, I know. The comment and the content for this test case do not match. 
Is this ok: Index: gcc.target/i386/pr64317.c === --- gcc.target/i386/pr64317.c (revision 223498) +++ gcc.target/i386/pr64317.c (working copy) @@ -1,4 +1,5 @@ -/* { dg-do compile { target { { *-*-freebsd* *-*-linux* } && ia32 } } } */ +/* { dg-do compile { target ia32 } } */ +/* { dg-require-effective-target pie } */ /* { dg-options -O2 -fpie } */ /* { dg-final { scan-assembler addl\[ \\t\]+\[$\]_GLOBAL_OFFSET_TABLE_, %ebx } } */ /* { dg-final { scan-assembler movl\[ \\t\]+c@GOTOFF\[(\]%ebx\[)\] } } */ or do you prefer instead of /* { dg-do compile { target ia32 } } */ this one: /* { dg-do compile } */ /* { dg-require-effective-target ia32 } */ Btw, all three variants run on i386. It looks like the dg-require-effective-target is typically on a line by itself. So let's stick with the last version. jeff
Re: [gofrontend-dev] Re: GO tools for gccgo cross
On Fri, May 22, 2015 at 2:01 PM, Andrew Chambers andrewchambe...@gmail.com wrote: For example, I've tested on an x86, built cross compilers for ppc64 and ppc64le and then can invoke the native go tool (built to run on x86) to compile with either target compiler by changing my GOARCH value. I don't want to have a separate cross go tool to use only for ppc64 and another to use only for ppc64le. If I have mips-linux-gcc and mips-linux-gccgo, there is no reason not to have a mips-linux-go which automatically uses the correct gccgo and is installed into the same prefix. It is in line with how gcc works currently. Also, for a Canadian cross compile (--build differs from --host), building a new go tool is necessary anyway. With the gc toolchain, it's very convenient to be able to say GOARCH=arm go build program to get the ARM version of a program. I don't think we should move away from that in order to follow GCC conventions. The Go conventions are better and easier to use. Ian
Re: [PATCH/libiberty] fix build of gdb/binutils with clang.
On Wed, May 20, 2015 at 3:58 PM, Yunlian Jiang yunl...@google.com wrote: GCC bootstraps with this patch. Committed as follows. Ian include/: 2015-05-22 Yunlian Jiang yunl...@google.com * libiberty.h (asprintf): Don't declare if HAVE_DECL_ASPRINTF is not defined. libiberty/: 2015-05-22 Yunlian Jiang yunl...@google.com * configure.ac: Add AC_GNU_SOURCE. * Makefile.in (COMPILE.c): Add -D_GNU_SOURCE. * configure, config.in: Rebuild. * floatformat.c (_GNU_SOURCE): Don't define if already defined. Index: include/libiberty.h === --- include/libiberty.h (revision 223578) +++ include/libiberty.h (working copy) @@ -621,7 +621,7 @@ extern int pexecute (const char *, char extern int pwait (int, int *, int); -#if !HAVE_DECL_ASPRINTF +#if defined(HAVE_DECL_ASPRINTF) && !HAVE_DECL_ASPRINTF /* Like sprintf but provides a pointer to malloc'd storage, which must be freed by the caller. */ Index: libiberty/Makefile.in === --- libiberty/Makefile.in (revision 223578) +++ libiberty/Makefile.in (working copy) @@ -113,7 +113,8 @@ installcheck: installcheck-subdir INCDIR=$(srcdir)/$(MULTISRCTOP)../include -COMPILE.c = $(CC) -c @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) @ac_libiberty_warn_cflags@ +COMPILE.c = $(CC) -c @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) \ + $(HDEFINES) @ac_libiberty_warn_cflags@ -D_GNU_SOURCE # Just to make sure we don't use a built-in rule with VPATH .c.$(objext): Index: libiberty/configure.ac === --- libiberty/configure.ac (revision 223578) +++ libiberty/configure.ac (working copy) @@ -155,6 +155,7 @@ AC_MSG_NOTICE([target_header_dir = $targ GCC_NO_EXECUTABLES AC_PROG_CC +AC_GNU_SOURCE AC_SYS_LARGEFILE AC_PROG_CPP_WERROR Index: libiberty/floatformat.c === --- libiberty/floatformat.c (revision 223578) +++ libiberty/floatformat.c (working copy) @@ -19,7 +19,9 @@ along with this program; if not, write t Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. */ /* This is needed to pick up the NAN macro on some systems. 
*/ +#ifndef _GNU_SOURCE #define _GNU_SOURCE +#endif #ifdef HAVE_CONFIG_H #include config.h
Re: match.pd: (x | y) & ~x -> y & ~x
On Mon, 18 May 2015, Richard Biener wrote: On Fri, May 15, 2015 at 7:22 PM, Marc Glisse marc.gli...@inria.fr wrote: we already have the more complicated: x & ~(x & y) -> x & ~y (which I am reindenting by the way) and the simpler: (~x | y) & x -> x & y, so I am proposing this one for completeness. Regtested on ppc64le-redhat-linux. Ok (doesn't seem to be in fold-const.c). Btw, there are quite some (simple) ones only in fold-const.c which are eligible for moving to match.pd (thus remove them from fold-const.c and implement in match.pd). Mostly canonicalization ones though. Haven't you already done a lot of those in the branch though? -- Marc Glisse
Re: Reuse predicate code analysis for constraints
On 05/22/2015 09:42 AM, Richard Sandiford wrote: This patch adjusts the fix for PR target/65689 along the lines suggested in https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01559.html. The idea is to reuse the existing gensupport.c routine to work out the codes accepted by constraints. I'd originally done this with an eye to using compute_test_codes for the problem that Andreas found on s390. I don't think it's going to be useful for that after all, but it seems worth having for its own sake. Bootstrapped and regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * gensupport.h (compute_test_codes): Declare. * gensupport.c (compute_predicate_codes): Rename to... (compute_test_codes): ...this. Generalize error message. (process_define_predicate): Update accordingly. * genpreds.c (compute_maybe_allows): Delete. (add_constraint): Use compute_test_codes to determine whether something can accept a SUBREG, REG or MEM. OK. jeff
Fix pa.md splitters for recent combine changes
These should have gone in with the first patch in the series. Thankfully the splitters aren't terribly important anymore and thus having them goof'd up for a couple days hasn't been a problem. In fact, much like the hppa_legitimize_address code to handle shift-add/scaled addressing modes, these splitters don't make a bit of difference in my 300+ testfiles -- again, these were probably in place for f2c translated code to avoid losing because the PA has integer indexed loads, but not integer indexed stores. Verified hppa.exp still passes and that the 300+ testfiles get the same code before/after this change. Installed on the trunk. Jeff * config/pa/pa.md (integer_indexed_store splitters): Use mem_shadd_operand. Use ASHIFT rather than MULT in the resulting insns -- adjusting the constant 2nd operand accordingly. diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md index aaec27d..6cc7a3c 100644 --- a/gcc/config/pa/pa.md +++ b/gcc/config/pa/pa.md @@ -2819,42 +2819,54 @@ ;; a 2 insn store with some creative RTL rewriting. 
(define_split [(set (mem:SI (plus:SI (mult:SI (match_operand:SI 0 register_operand ) - (match_operand:SI 1 shadd_operand )) + (match_operand:SI 1 mem_shadd_operand )) (plus:SI (match_operand:SI 2 register_operand ) (match_operand:SI 3 const_int_operand (match_operand:SI 4 register_operand )) (clobber (match_operand:SI 5 register_operand ))] - [(set (match_dup 5) (plus:SI (mult:SI (match_dup 0) (match_dup 1)) + [(set (match_dup 5) (plus:SI (ashift:SI (match_dup 0) (match_dup 1)) (match_dup 2))) (set (mem:SI (plus:SI (match_dup 5) (match_dup 3))) (match_dup 4))] - ) + +{ + operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1]))); + +}) (define_split [(set (mem:HI (plus:SI (mult:SI (match_operand:SI 0 register_operand ) - (match_operand:SI 1 shadd_operand )) + (match_operand:SI 1 mem_shadd_operand )) (plus:SI (match_operand:SI 2 register_operand ) (match_operand:SI 3 const_int_operand (match_operand:HI 4 register_operand )) (clobber (match_operand:SI 5 register_operand ))] - [(set (match_dup 5) (plus:SI (mult:SI (match_dup 0) (match_dup 1)) + [(set (match_dup 5) (plus:SI (ashift:SI (match_dup 0) (match_dup 1)) (match_dup 2))) (set (mem:HI (plus:SI (match_dup 5) (match_dup 3))) (match_dup 4))] - ) + +{ + operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1]))); + +}) (define_split [(set (mem:QI (plus:SI (mult:SI (match_operand:SI 0 register_operand ) - (match_operand:SI 1 shadd_operand )) + (match_operand:SI 1 mem_shadd_operand )) (plus:SI (match_operand:SI 2 register_operand ) (match_operand:SI 3 const_int_operand (match_operand:QI 4 register_operand )) (clobber (match_operand:SI 5 register_operand ))] - [(set (match_dup 5) (plus:SI (mult:SI (match_dup 0) (match_dup 1)) + [(set (match_dup 5) (plus:SI (ashift:SI (match_dup 0) (match_dup 1)) (match_dup 2))) (set (mem:QI (plus:SI (match_dup 5) (match_dup 3))) (match_dup 4))] - ) + +{ + operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1]))); + +}) (define_expand movhi [(set (match_operand:HI 0 general_operand )
[PATCH], Add IEEE 128-bit floating point to PowerPC, patch #1
This patch is the first in a series of patches that will eventually add IEEE 128-bit floating point support to the PowerPC GCC compiler. At the current time, we do not plan to change the default for long double. I added a new type keyword (__float128) to get access to IEEE 128-bit floating point, and another (__ibm128) to get access to the IBM extended double type. Until all of the GCC and LIBGCC patches have been committed, you will not be able to use IEEE 128-bit floating point, and -mfloat128-software will not be turned on by default. This patch adds the new modes (KFmode and IFmode) and the switches (-mfloat128-{none,software}). TFmode in the PowerPC compiler represents either IEEE 128-bit floating point or the IBM extended double (double-double) format; for most PowerPC users, the default is to use IBM extended double for long double. Because TFmode can be either floating point format, I added two new modes: KFmode -- IEEE 128-bit floating point IFmode -- IBM extended double floating point If the default for TFmode is IBM extended double, the port will eventually use KFmode for IEEE 128-bit floating point. Likewise if the default for TFmode is IEEE 128-bit floating point, the port will use TFmode for IEEE 128-bit floating point, and IFmode for IBM extended double. I have bootstrapped these patches on a power7 and compared them to the unpatched compiler. There were no changes when running make check. Are these patches ok to install in the trunk? 2015-05-22 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/rs6000-modes.def (IFmode): Define IFmode to provide access to the IBM extended double floating point mode if long double is IEEE 128-bit floating point. (KFmode): Define KFmode to provide access to IEEE 128-bit floating point if long double is the IBM extended double type. * config/rs6000/rs6000.opt (-mfloat128-none): New switches to enable adding IEEE 128-bit floating point support. (-mfloat128-software): Likewise. 
(-mfloat128-sw): Likewise. * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Do not allow 128-bit floating point types to occupy any register if -mlong-double-64. Do not allow use of IFmode/KFmode unless -mfloat128-software is enabled. (rs6000_debug_reg_global): Add IEEE 128-bit floating point debug support. (rs6000_option_override_internal): Add -mfloat128-* support. (rs6000_init_builtins): Setup __ibm128 and __float128 type modes. * config/rs6000/rs6000.h (rs6000_builtin_type_index): Add ibm128 and float128 type nodes. (ieee128_float_type_node): Likewise. (ibm128_float_type_node): Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000-modes.def === --- gcc/config/rs6000/rs6000-modes.def (revision 223458) +++ gcc/config/rs6000/rs6000-modes.def (working copy) @@ -18,6 +18,13 @@ along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ +/* IBM 128-bit floating point. IFmode and KFmode use the fractional float + support in order to declare 3 128-bit floating point types. */ +FRACTIONAL_FLOAT_MODE (IF, 106, 16, ibm_extended_format); + +/* Explicit IEEE 128-bit floating point. */ +FRACTIONAL_FLOAT_MODE (KF, 113, 16, ieee_quad_format); + /* 128-bit floating point. ABI_V4 uses IEEE quad, AIX/Darwin adjust this in rs6000_option_override_internal. */ FLOAT_MODE (TF, 16, ieee_quad_format); Index: gcc/config/rs6000/rs6000.opt === --- gcc/config/rs6000/rs6000.opt(revision 223458) +++ gcc/config/rs6000/rs6000.opt(working copy) @@ -600,3 +600,19 @@ Allow float/double variables in upper re moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. + +mfloat128- +Target RejectNegative Joined Enum(float128_type_t) Var(TARGET_FLOAT128) Init(FLOAT128_UNSET) Save +-mfloat128-{software,none} - Specify how IEEE 128-bit floating point is used. 
+ +Enum +Name(float128_type_t) Type(enum float128_type_t) + +EnumValue +Enum(float128_type_t) String(none) Value(FLOAT128_NONE) + +EnumValue +Enum(float128_type_t) String(software) Value(FLOAT128_SW) + +EnumValue +Enum(float128_type_t) String(sw) Value(FLOAT128_SW) Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 223458) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1817,6 +1817,16 @@ rs6000_hard_regno_mode_ok (int regno, ma IN_RANGE (last_regno, FIRST_GPR_REGNO, LAST_GPR_REGNO) ((regno 1) ==
Re: C/C++ PATCH to allow deprecating enum values (PR c/47043)
On 22.05.2015 12:10, Marek Polacek wrote: Thanks, applied. Here's the final version. By the way, we have a feature test macro, __cpp_attributes=200809 which can be used to determine, whether C++11 attribute syntax is supported by the compiler. I propose to add something similar for this extension (like __cpp_gnu_enum_attributes=... for C++ and __GCC_HAVE_ENUM_ATTRIBUTES for C). Thoughts? -- Regards, Mikhail Maltsev
Re: [RFA] Fix combine to canonicalize (mult X pow2)) more often
On 05/22/2015 09:45 AM, Segher Boessenkool wrote: On Thu, May 21, 2015 at 09:24:37AM -0600, Jeff Law wrote: When combine needs to split a complex insn, it will canonicalize a simple (mult X (const_int Y)) where Y is a power of 2 into the expected (ashift X (const_int Y')) if the (mult ...) is selected as a split point. However if the split point is (plus (mult X (const_int Y)) Z) combine fails to canonicalize. OK for the trunk? Okay. Something went wrong with your changelog though. Rebasing changelogs is a recipe for disaster. Yup. I still haven't found a workflow that makes sense with ChangeLogs. Ultimately I end up having to check them before each commit/push, so assume it'll end up in the right place :-) The following is just copy-paste, but anyway... + /* Similarly for (plus (mult FOO (const_int pow2))). */ + if (split_code == PLUS + && GET_CODE (XEXP (*split, 0)) == MULT + && CONST_INT_P (XEXP (XEXP (*split, 0), 1)) + && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0 + && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0) INTVAL (..) > 0 here disallows 1<<63 while exact_log2 allows it (with a 64-bit HWI). Most places don't check the > 0 and it doesn't seem necessary in the places that do. This won't ever trigger for a SImode 1<<31 either. None of this matters when we're just looking at scaled indexing, of course. But exact_log2 is much harder to use correctly than to use it incorrectly :-( That test was just copied from the equivalent hunk of code that tests for (mult FOO (const_int pow2)) above. It's certainly ok for the INTVAL check to reject things that might be allowed through the exact_log2 test. But as you say, this isn't going to matter. Jeff
RE: [PATCH] Print Pass Names
Subject: Re: [PATCH] Print Pass Names From: richard.guent...@gmail.com Date: Fri, 22 May 2015 21:32:24 +0200 To: hiradi...@msn.com; gcc-patches@gcc.gnu.org On May 22, 2015 6:32:38 PM GMT+02:00, Aditya K hiradi...@msn.com wrote: Currently, when we print the passes it does not print its name. This becomes confusing when we want to print all the passes at once (e.g., -fdump-tree-all-all=stderr pass.dump). This patch adds functionality to print the pass name. It passes bootstrap (with default configurations). Hope this is useful. Can't you just use current_pass-name? You are right. I have updated the patch. Thanks -Aditya gcc/ChangeLog: 2015-05-22 Aditya Kumar hiradi...@msn.com * statistics.c (statistics_fini_pass): Print pass name. --- gcc/statistics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/statistics.c b/gcc/statistics.c index 8cbe88d..50b41d1 100644 --- a/gcc/statistics.c +++ b/gcc/statistics.c @@ -201,7 +201,7 @@ statistics_fini_pass (void) dump_flags TDF_STATS) { fprintf (dump_file, \n); - fprintf (dump_file, Pass statistics:\n); + fprintf (dump_file, Pass statistics of \%s\: , current_pass-name); fprintf (dump_file, \n); curr_statistics_hash () -traverse_noresize void *, statistics_fini_pass_1 (NULL); -- 2.1.0.243.g30d45f7 Richard. Thanks, -Aditya gcc/ChangeLog: 2015-05-22 Aditya Kumar hiradi...@msn.com * passes.c (execute_todo): Added a parameter to pass the pass name. (execute_one_pass): Likewise * statistics.c (statistics_fini_pass): Likewise * statistics.h: Likewise --- gcc/passes.c | 9 + gcc/statistics.c | 4 ++-- gcc/statistics.h | 2 +- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/gcc/passes.c b/gcc/passes.c index 04ff042..7d41bd8 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -1984,7 +1984,7 @@ execute_function_todo (function *fn, void *data) /* Perform all TODO actions. 
*/ static void -execute_todo (unsigned int flags) +execute_todo (unsigned int flags, const char *pass_name=NULL) { #if defined ENABLE_CHECKING if (cfun @@ -1997,7 +1997,7 @@ execute_todo (unsigned int flags) /* Inform the pass whether it is the first time it is run. */ first_pass_instance = (flags TODO_mark_first_instance) != 0; - statistics_fini_pass (); + statistics_fini_pass (pass_name); if (flags) do_per_function (execute_function_todo, (void *)(size_t) flags); @@ -2302,7 +2302,7 @@ execute_one_pass (opt_pass *pass) pass_init_dump_file (pass); /* Run pre-pass verification. */ - execute_todo (pass-todo_flags_start); + execute_todo (pass-todo_flags_start, pass-name); #ifdef ENABLE_CHECKING do_per_function (verify_curr_properties, @@ -2327,7 +2327,8 @@ execute_one_pass (opt_pass *pass) check_profile_consistency (pass-static_pass_number, 0, true); /* Run post-pass cleanup and verification. */ - execute_todo (todo_after | pass-todo_flags_finish | TODO_verify_il); + execute_todo (todo_after | pass-todo_flags_finish | TODO_verify_il, + pass-name); if (profile_report cfun (cfun-curr_properties PROP_cfg)) check_profile_consistency (pass-static_pass_number, 1, true); diff --git a/gcc/statistics.c b/gcc/statistics.c index 8cbe88d..54f81d5 100644 --- a/gcc/statistics.c +++ b/gcc/statistics.c @@ -192,7 +192,7 @@ statistics_fini_pass_3 (statistics_counter_t **slot, /* Dump the current statistics incrementally. 
*/ void -statistics_fini_pass (void) +statistics_fini_pass (const char *pass_name) { if (current_pass-static_pass_number == -1) return; @@ -201,7 +201,7 @@ statistics_fini_pass (void) dump_flags TDF_STATS) { fprintf (dump_file, \n); - fprintf (dump_file, Pass statistics:\n); + fprintf (dump_file, Pass statistics of \%s\:\n, pass_name); fprintf (dump_file, \n); curr_statistics_hash () -traverse_noresize void *, statistics_fini_pass_1 (NULL); diff --git a/gcc/statistics.h b/gcc/statistics.h index 0b871ec..4348b7a 100644 --- a/gcc/statistics.h +++ b/gcc/statistics.h @@ -64,7 +64,7 @@ struct function; extern void statistics_early_init (void); extern void statistics_init (void); extern void statistics_fini (void); -extern void statistics_fini_pass (void); +extern void statistics_fini_pass (const char *pass_name = NULL); extern void statistics_counter_event (struct function *, const char *, int); extern void statistics_histogram_event (struct function *, const char *, int);
Re: [PATCH] Print Pass Names
On 05/22/2015 02:38 PM, Aditya K wrote: Subject: Re: [PATCH] Print Pass Names From: richard.guent...@gmail.com Date: Fri, 22 May 2015 21:32:24 +0200 To: hiradi...@msn.com; gcc-patches@gcc.gnu.org On May 22, 2015 6:32:38 PM GMT+02:00, Aditya K hiradi...@msn.com wrote: Currently, when we print the passes it does not print its name. This becomes confusing when we want to print all the passes at once (e.g., -fdump-tree-all-all=stderr pass.dump). This patch adds functionality to print the pass name. It passes bootstrap (with default configurations). Hope this is useful. Can't you just use current_pass-name? You are right. I have updated the patch. Thanks -Aditya gcc/ChangeLog: 2015-05-22 Aditya Kumar hiradi...@msn.com * statistics.c (statistics_fini_pass): Print pass name.OK. jeff
Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h
I bootstrapped this on powerpc-ibm-aix7.1.0.0 and my colleagues bootstrapped this on powerpc64-linux and powerpc64le-linux. It works and produces reasonable instruction sequences. We can iterate on the syntax, but the core concept seems to work correctly. Thanks, David
Re: [Patch]: libbacktrace - add support of PE/COFF
On Thu, May 21, 2015 at 5:41 AM, Tristan Gingold ging...@adacore.com wrote: 2015-05-21 Tristan Gingold ging...@adacore.com * pecoff.c: New file. * Makefile.am (FORMAT_FILES): Add pecoff.c and dependencies. * Makefile.in: Regenerate. * filetype.awk: Detect pecoff. * configure.ac: Define BACKTRACE_SUPPORTS_DATA on elf platforms. Add pecoff. * btest.c (test5): Test enabled only if BACKTRACE_SUPPORTS_DATA is true. * backtrace-supported.h.in (BACKTRACE_SUPPORTS_DATA): Define. * configure: Regenerate. * pecoff.c: New file. Thanks for working on this. diff --git a/libbacktrace/backtrace-supported.h.in b/libbacktrace/backtrace-supported.h.in index 5115ce1..4574635 100644 --- a/libbacktrace/backtrace-supported.h.in +++ b/libbacktrace/backtrace-supported.h.in @@ -59,3 +59,8 @@ POSSIBILITY OF SUCH DAMAGE. */ as 0. */ #define BACKTRACE_SUPPORTS_THREADS @BACKTRACE_SUPPORTS_THREADS@ + +/* BACKTRACE_SUPPORTS_DATA will be #defined'd as 1 if the backtrace library + also handles data symbols, 0 if not. */ + +#define BACKTRACE_SUPPORTS_DATA @BACKTRACE_SUPPORTS_DATA@ End users are expected to read and understand this file, so I think this comment is too obscure. I suggest: BACKTRACE_SUPPORTS_DATA will be #define'd as 1 if backtrace_syminfo will work for variables. It will always work for functions. I would have thought you could distinguish relevant symbols using the storage class and type. But perhaps not. diff --git a/libbacktrace/filetype.awk b/libbacktrace/filetype.awk index 0a656f7..37099ad 100644 --- a/libbacktrace/filetype.awk +++ b/libbacktrace/filetype.awk @@ -1,3 +1,4 @@ # An awk script to determine the type of a file. /\177ELF\001/ { if (NR == 1) { print elf32; exit } } /\177ELF\002/ { if (NR == 1) { print elf64; exit } } +/\114\001/{ if (NR == 1) { print pecoff; exit } } That works for 32-bit, but I think not for 64-bit. For 64-bit I would expect \144\206. +#include windows.h Where is windows.h going to come from when building a cross-compiler? 
I think this needs to be removed. I see that you define the structs yourself, as you should, so why do you need windows.h? +/* Read a potentially unaligned 2 byte word at P, using native endianness. */ Is there really ever a case where a 2 byte word might be misaligned? +/* Return true iff SYM is a defined symbol for a function. Data symbols + are discarded because they aren't easily identified. */ + +static int +coff_is_symbol (const b_coff_internal_symbol *isym) +{ + return isym->type == 0x20 && isym->sec > 0; +} Is this really right? This seems to test for DT_FCN set, but won't a function returning, say, int, have type 0x24 ((DT_FCN << N_TBSHFT) | T_INT)? Also, the name isn't right--this is coff_is_function_symbol. + if (coff_expand_symbol (isym, asym, sects_num, strtab, strtab_size) < 0) + { + error_callback (data, "invalid coff symbol", 0); + return 0; + } That's not a very useful error message--can you be more specific? + /* Allocate memory for symbols are strings. */ Comment looks wrong--omit are? Ian
[PATCH, AARCH64] make stdarg functions work with +nofp
The compiler currently ICEs when compiling a stdarg function with +nofp, as reported in PR 66258. The aarch64.md file disables FP instructions using TARGET_FLOAT, which supports both -mgeneral-regs-only and +nofp. But there is code in aarch64.c that checks TARGET_GENERAL_REGS_ONLY, which results in FP instructions when using +nofp. The aarch64.c code needs to use TARGET_FLOAT instead, like the md file already does. I can't meaningfully test this with a bootstrap, since the patch has no effect unless I bootstrap with +nofp, and that will fail as gcc contains floating point code. The testsuite already has multiple stdarg tests, so there is no need for another one. I tested this by verifying I get the same results for some simple testcases with and without the patch, with and without using -mgeneral-regs-only and -mcpu=cortex-a53+nofp. 2015-05-22 Jim Wilson jim.wil...@linaro.org PR target/66258 * config/aarch64/aarch64.c (aarch64_function_value_regno_p): Change !TARGET_GENERAL_REGS_ONLY to TARGET_FLOAT. (aarch64_secondary_reload): Likewise. (aarch64_expand_builtin_va_start): Change TARGET_GENERAL_REGS_ONLY to !TARGET_FLOAT. (aarch64_gimplify_va_arg_expr, aarch64_setup_incoming_varargs): Likewise. Index: config/aarch64/aarch64.c === --- config/aarch64/aarch64.c (revision 223590) +++ config/aarch64/aarch64.c (working copy) @@ -1666,7 +1666,7 @@ aarch64_function_value_regno_p (const un /* Up to four fp/simd registers can return a function value, e.g. a homogeneous floating-point aggregate having four members. */ if (regno >= V0_REGNUM && regno < V0_REGNUM + HA_MAX_NUM_FLDS) -return !TARGET_GENERAL_REGS_ONLY; +return TARGET_FLOAT; return false; } @@ -4783,7 +4783,7 @@ aarch64_secondary_reload (bool in_p ATTR /* A TFmode or TImode memory access should be handled via an FP_REGS because AArch64 has richer addressing modes for LDR/STR instructions than LDP/STP instructions. 
*/ - if (!TARGET_GENERAL_REGS_ONLY && rclass == GENERAL_REGS + if (TARGET_FLOAT && rclass == GENERAL_REGS && GET_MODE_SIZE (mode) == 16 && MEM_P (x)) return FP_REGS; @@ -7571,7 +7571,7 @@ aarch64_expand_builtin_va_start (tree va vr_save_area_size = (NUM_FP_ARG_REGS - cum->aapcs_nvrn) * UNITS_PER_VREG; - if (TARGET_GENERAL_REGS_ONLY) + if (!TARGET_FLOAT) { if (cum->aapcs_nvrn > 0) sorry ("%qs and floating point or vector arguments", @@ -7681,7 +7681,7 @@ aarch64_gimplify_va_arg_expr (tree valis is_ha)) { /* TYPE passed in fp/simd registers. */ - if (TARGET_GENERAL_REGS_ONLY) + if (!TARGET_FLOAT) sorry ("%qs and floating point or vector arguments", "-mgeneral-regs-only"); @@ -7918,7 +7918,7 @@ aarch64_setup_incoming_varargs (cumulati gr_saved = NUM_ARG_REGS - local_cum.aapcs_ncrn; vr_saved = NUM_FP_ARG_REGS - local_cum.aapcs_nvrn; - if (TARGET_GENERAL_REGS_ONLY) + if (!TARGET_FLOAT) { if (local_cum.aapcs_nvrn > 0) sorry ("%qs and floating point or vector arguments",
Re: Fix two more memory leaks in threader
On 05/20/2015 10:41 AM, Jakub Jelinek wrote: On Wed, May 20, 2015 at 10:36:25AM -0600, Jeff Law wrote: These fix the remaining leaks in the threader that I'm aware of. We failed to properly clean-up when we had to cancel certain jump threading opportunities. So thankfully this wasn't a big leak. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fe4dfc4..27435c6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-05-20 Jeff Law l...@redhat.com + + * tree-ssa-threadupdate.c (mark_threaded_blocks): Properly + dispose of the jump thread path when the jump threading + opportunity is cancelled. + 2015-05-20 Manuel López-Ibáñez m...@gcc.gnu.org * diagnostic.c (diagnostic_print_caret_line): Fix off-by-one error diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index c5b78a4..4bccad0 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2159,9 +2159,16 @@ mark_threaded_blocks (bitmap threaded_blocks) { /* Attach the path to the starting edge if none is yet recorded. */ if ((*path)[0]->e->aux == NULL) -(*path)[0]->e->aux = path; - else if (dump_file && (dump_flags & TDF_DETAILS)) - dump_jump_thread_path (dump_file, *path, false); + { + (*path)[0]->e->aux = path; + } Why the braces around single stmt if body? To match what was happening in the else clause. I always find if (fubar) something else { more stuff even more stuff } more painful to parse because of the mismatched indentation than if (fubar) { something } else { more stuff even more stuff } It's purely a style thing and if you'd prefer not to have the extra braces, I wouldn't lose any sleep over removing them. Also, the indentation seems to be weird. Looks like spaces vs tabs issue. I'll do a pass over tree-ssa-threadedge.c and fix them all up at once. jeff
Re: [gofrontend-dev] Re: GO tools for gccgo cross
On Fri, May 22, 2015 at 3:11 PM, Andrew Chambers andrewchambe...@gmail.com wrote: I'm not suggesting breaking go conventions, I just think the default if no GOARCH is specified then it should match --target. Sounds good to me. Perhaps we could check the symlink name for the target triple if no GOARCH is set. We should probably build it into the tool, perhaps in zdefaultcc.go created in gotools/Makefile.am. Ian
Re: PR fortran/44054 Convert all gfc_error_1 calls to gfc_error
PING: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01511.html This only needs approval from Fortran maintainers. On 17 May 2015 at 20:22, Manuel López-Ibáñez lopeziba...@gmail.com wrote: Hi, This patch finishes the conversion of Fortran diagnostics to use the common diagnostics by removing all gfc_error*_1 variants. I noticed that whether some buffered gfc_error_1() end up printed may depend on whether a gfc_error_now is given or not, and not only on whether there is any output buffered. Thus, I reintroduced a new error_buffer of type gfc_buffer_error. The rest is mostly mechanic. I did not make an attempt in this patch to remove all code that has become obsolete now: gfc_get_terminal_width (already implemented in diagnostics.c) error_char (already empty, but used by other obsolete functions) error_string (obsolete, just use %s) error_uinteger (obsolete, just use %lu) error_integer (obsolete, just use %ld) gfc_widechar_display_length, gfc_wide_display_length, print_wide_char_into_buffer, gfc_print_wide_char (I'm not sure how this functionality differs from what the common diagnostics already do, perhaps some of it should be moved to the common code) show_locus (obsolete, except Included at handling should be moved to the common diagnostics, no testcase is testing this). show_loci (obsolete, except During initialization handling should be moved to the common diagnostics, no testcase is testing this) error_print, error_printf (obsolete) Bootstrapped and regression tested on x86_64-linux-gnu. OK? gcc/fortran/ChangeLog: 2015-05-17 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/44054 * gfortran.h (struct gfc_error_buf): Rename as gfc_error_buffer. Move closer to push, pop and free methods. Reimplement using an output_buffer. * error.c (errors, warnings, warning_buffer, cur_error_buffer): Delete everywhere in this file. (error_char): Delete all contents. (gfc_increment_error_count): Delete. (gfc_error_now): Update comment. Set error_buffer.flag. 
(gfc_warning_check): Do not handle warning_buffer. (gfc_error_1): Delete. (gfc_error_now_1): Delete. (gfc_error_check): Simplify. (gfc_move_error_buffer_from_to): Renamed from gfc_move_output_buffer_from_to. (gfc_push_error): Handle only gfc_error_buffer. (gfc_pop_error): Likewise. (gfc_free_error): Likewise. (gfc_get_errors): Remove warnings and errors. (gfc_diagnostics_init): Use static error_buffer. (gfc_error_1,gfc_error_now_1): Delete declarations. * symbol.c, decl.c, trans-common.c, data.c, expr.c, frontend-passes.c, resolve.c, match.c, parse.c: Replace gfc_error_1 with gfc_error and gfc_error_now_1 with gfc_error_now everywhere. * f95-lang.c (gfc_be_parse_file): Do not update errorcount and warningcount here. * primary.c (match_complex_constant): Replace gfc_error_buf and output_buffer with gfc_error_buffer.
Re: [PATCH GCC]Improve how we handle overflow for type conversion in scev/ivopts, part I
On Wed, May 20, 2015 at 11:41 AM, Bin Cheng bin.ch...@arm.com wrote: Hi, As we know, GCC is too conservative when checking overflow behavior in SCEV and loop related optimizers. The result is that some variables can't be recognized as scalar evolutions and thus optimizations are missed. To be specific, optimizers like ivopts and vectorizer are affected. This issue is more severe on 64 bit platforms; for example, PR62173 fails on aarch64, and scev-3.c and scev-4.c were marked as XFAIL on lp64 platforms. As the first part of improving overflow checking in GCC, this patch makes the following improvements: 1) Ideally, chrec_convert should be responsible for converting a scev like (type){base, step} to a scev like {(type)base, (type)step} when the result scev doesn't overflow; chrec_convert_aggressive should do the conversion if the result scev could overflow/wrap. Unfortunately, the current implementation may use chrec_convert_aggressive to return a scev that won't overflow. This is because a) the static parameter fold_conversions for instantiate_scev_convert can only track whether chrec_convert_aggressive may be called, rather than whether it does some overflow conversion or not; b) the implementation of instantiate_scev_convert sometimes shortcuts the call to chrec_convert and misses conversion opportunities. This patch improves this. 2) iv->no_overflow computed in simple_iv is too conservative. With 1) fixed, iv->no_overflow should reflect whether chrec_convert_aggressive does return an overflow scev. This patch improves this. 3) chrec_convert should be able to prove the resulting scev won't overflow with loop niter information. This patch doesn't finish this, but it factored a new interface out of scev_probably_wraps_p for future improvement. And that will be the part II patch. 
With the improvements in SCEV, this patch also improves an optimizer (IVOPT) that uses scev information, as below: For an array reference in the form of arr[IV], GCC tries to derive a new address iv {arr+iv.base, iv.step*elem_size} from IV. If IV overflows wrt a type that is narrower than the address space, this derivation is not true because arr[IV] isn't a scev. The root cause why scev-*.c fail now is that the overflow information of IV is too conservative: IVOPT has to be conservative and reject arr[IV] as a scev. With more accurate overflow information, IVOPT can be improved too. So this patch fixes the mentioned long-standing issues. Bootstrapped and tested on x86_64, x86 and aarch64. BTW, test gcc.target/i386/pr49781-1.c failed on x86_64, but I can confirm it's not this patch's fault. So what's your opinion on this? I may be mixing things up but does +chrec_convert_aggressive (tree type, tree chrec, bool *fold_conversions) { ... + if (evolution_function_is_affine_p (chrec)) +{ + tree base, step; + struct loop *loop; + + loop = get_chrec_loop (chrec); + base = CHREC_LEFT (chrec); + step = CHREC_RIGHT (chrec); + if (convert_affine_scev (loop, type, &base, &step, NULL, true)) + return build_polynomial_chrec (loop->num, base, step); ^^^ not forget to set *fold_conversions to true? Or do we need to use convert_affine_scev (..., false)? +} (bah, and the diff somehow messes up -p context :/ which is why I like context diffs more) Other from the above the patch looks good to me. Thanks, Richard. Thanks, bin 2015-05-20 Bin Cheng bin.ch...@arm.com PR tree-optimization/62173 * tree-ssa-loop-ivopts.c (struct iv): New field. Reorder fields. (alloc_iv, set_iv): New parameter. (determine_biv_step): Delete. (find_bivs): Inline original determine_biv_step. Pass new argument to set_iv. (idx_find_step): Use no_overflow information for conversion. * tree-scalar-evolution.c (analyze_scalar_evolution_in_loop): Let resolve_mixers handle folded_casts. 
(instantiate_scev_name): Change bool parameter to bool pointer. (instantiate_scev_poly, instantiate_scev_binary): Ditto. (instantiate_array_ref, instantiate_scev_not): Ditto. (instantiate_scev_3, instantiate_scev_2): Ditto. (instantiate_scev_1, instantiate_scev_r): Ditto. (instantiate_scev_convert): Change parameter. Pass argument to chrec_convert_aggressive. (instantiate_scev): Change argument. (resolve_mixers): New parameter and set it. (scev_const_prop): New argument. * tree-scalar-evolution.h (resolve_mixers): New parameter. * tree-chrec.c (convert_affine_scev): Call chrec_convert instead of chrec_convert_1. (chrec_convert): New parameter. Move definition below. (chrec_convert_aggressive): New parameter and set it. Call convert_affine_scev. * tree-chrec.h (chrec_convert): New parameter. (chrec_convert_aggressive): Ditto. * tree-ssa-loop-niter.c (loop_exits_before_overflow):
Re: ODR merging and implicit typedefs
I will take a look if I can improve type_in_anonymous_namespace_p somehow. So Ada produces TYPE_DECL with DECL_ABSTRACT that do have TYPE_STUB_DECL with TREE_PUBLIC NULL I suppose. Do you mean DECL_ARTIFICIAL instead of DECL_ABSTRACT? If so, presumably, yes, why wouldn't it do that? That seems the natural description of an artificial private type with file scope. And I don't really understand why DECL_ARTIFICIAL is allowed to make such a difference in semantics here. That looks dangerous to me. Note that the problematic assertions: gcc_assert (!type_with_linkage_p (t1) || !type_in_anonymous_namespace_p (t1)); gcc_assert (!type_with_linkage_p (t2) || !type_in_anonymous_namespace_p (t2)); make the following code unreachable: if ((type_with_linkage_p (t1) && type_in_anonymous_namespace_p (t1)) || (type_with_linkage_p (t2) && type_in_anonymous_namespace_p (t2))) { /* We can not trip this when comparing ODR types, only when trying to match different ODR derivations from different declarations. So WARN should be always false. */ gcc_assert (!warn); return false; } -- Eric Botcazou
Re: Don't dump low gimple functions in gimple dump
On Thu, May 21, 2015 at 5:36 PM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! It's just been a year. ;-P In early March, I (hopefully correctly) adapted Tom's patch to apply to then-current GCC trunk sources; posting this here. Is the general approach OK? On Tue, 20 May 2014 10:16:45 +0200, Tom de Vries tom_devr...@mentor.com wrote: Honza, Consider this program: ... int main(void) { #pragma omp parallel { extern void foo(void); foo (); } return 0; } ... When compiling this program with -fopenmp, the ompexp pass splits off a new function called main._omp_fn.0 containing the call to foo. The new function is then dumped into the gimple dump by analyze_function. There are two problems with this: - the new function is in low gimple, and is dumped next to high gimple functions - since it's already low, the new function is not lowered, and 'goes missing' in the dumps following the gimple dump, until it reappears again after the last lowering dump. [ http://gcc.gnu.org/ml/gcc/2014-03/msg00312.html ] This patch fixes the problems by ensuring that analyze_function only dumps the new function to the gimple dump after gimplification (specifically, by moving the dump_function call into gimplify_function_tree. That makes the call to dump_function in finalize_size_functions superfluous). That also requires us to add a call to dump_function in finalize_task_copyfn, where we split off a new high gimple function. And in expand_omp_taskreg and expand_omp_target, where we split off a new low gimple function, we now dump the new function into the current (ompexp) dump file, which is the last lowering dump. Finally, we dump an information statement at the start of cgraph_add_new_function to give a better idea when and what kind of function is created. Bootstrapped and reg-tested on x86_64. OK for trunk ? 
Thanks, - Tom commit b925b393c3d975a9281789d97aff8a91a8b53be0 Author: Thomas Schwinge tho...@codesourcery.com Date: Sun Mar 1 15:05:15 2015 +0100 Don't dump low gimple functions in gimple dump id:537b0f6d.7060...@mentor.com or id:53734dc5.90...@mentor.com 2014-05-19 Tom de Vries t...@codesourcery.com * cgraphunit.c (cgraph_add_new_function): Dump message on new function. (analyze_function): Don't dump function to gimple dump file. * gimplify.c: Add tree-dump.h include. (gimplify_function_tree): Dump function to gimple dump file. * omp-low.c: Add tree-dump.h include. (finalize_task_copyfn): Dump new function to gimple dump file. (expand_omp_taskreg, expand_omp_target): Dump new function to dump file. * stor-layout.c (finalize_size_functions): Don't dump function to gimple dump file. * gcc.dg/gomp/dump-task.c: New test. --- gcc/cgraphunit.c | 15 ++- gcc/gimplify.c| 3 +++ gcc/omp-low.c | 6 ++ gcc/stor-layout.c | 1 - gcc/testsuite/gcc.dg/gomp/dump-task.c | 33 + 5 files changed, 56 insertions(+), 2 deletions(-) diff --git gcc/cgraphunit.c gcc/cgraphunit.c index 8280fc4..0860c86 100644 --- gcc/cgraphunit.c +++ gcc/cgraphunit.c @@ -501,6 +501,20 @@ cgraph_node::add_new_function (tree fndecl, bool lowered) { gcc::pass_manager *passes = g->get_passes (); cgraph_node *node; + + if (dump_file) +{ + const char *function_type = ((gimple_has_body_p (fndecl)) + ? (lowered + ? "low gimple" + : "high gimple") + : "to-be-gimplified"); + fprintf (dump_file, + "Added new %s function %s to callgraph\n", + function_type, + fndecl_name (fndecl)); +} + switch (symtab->state) { case PARSING: @@ -629,7 +643,6 @@ cgraph_node::analyze (void) body. */ if (!gimple_has_body_p (decl)) gimplify_function_tree (decl); - dump_function (TDI_generic, decl); /* Lower the function. */ if (!lowered) diff --git gcc/gimplify.c gcc/gimplify.c index 9214648..d6c500d 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -87,6 +87,7 @@ along with GCC; see the file COPYING3. 
If not see #include "gimple-low.h" #include "cilk.h" #include "gomp-constants.h" +#include "tree-dump.h" #include "langhooks-def.h" /* FIXME: for lhd_set_decl_assembler_name */ #include "tree-pass.h" /* FIXME: only for PROP_gimple_any */ @@ -9435,6 +9436,8 @@ gimplify_function_tree (tree fndecl) cfun->curr_properties = PROP_gimple_any; pop_cfun (); + + dump_function (TDI_generic, fndecl); Err - that dumps gimple now. I think the dump you removed above was simply bogus and should have used TDI_gimple? Or is TDI_generic just
[committed] Fix warnings in extend.texi
This is to fix two "warning: `.' or `,' must follow @xref, not )" occurrences. Applying to trunk. 2015-05-22 Marek Polacek pola...@redhat.com * doc/extend.texi: Use @pxref instead of @xref. diff --git gcc/doc/extend.texi gcc/doc/extend.texi index 5539199..6c51bc4 100644 --- gcc/doc/extend.texi +++ gcc/doc/extend.texi @@ -6304,7 +6304,7 @@ compilers to match the native Microsoft compiler. GCC allows attributes to be set on C labels. @xref{Attribute Syntax}, for details of the exact syntax for using attributes. Other attributes are available for functions (@pxref{Function Attributes}), variables -(@pxref{Variable Attributes}), enumerators (@xref{Enumerator Attributes}), +(@pxref{Variable Attributes}), enumerators (@pxref{Enumerator Attributes}), and for types (@pxref{Type Attributes}). This example uses the @code{cold} label attribute to indicate the @@ -6358,7 +6358,7 @@ with computed goto or @code{asm goto}. GCC allows attributes to be set on enumerators. @xref{Attribute Syntax}, for details of the exact syntax for using attributes. Other attributes are available for functions (@pxref{Function Attributes}), variables -(@pxref{Variable Attributes}), labels (@xref{Label Attributes}), +(@pxref{Variable Attributes}), labels (@pxref{Label Attributes}), and for types (@pxref{Type Attributes}). This example uses the @code{deprecated} enumerator attribute to indicate the Marek
Ping: [Patch, fortran, PR44672, v6] [F08] ALLOCATE with SOURCE and no array-spec
Hi, the patch (65548) this one depends on is in trunk now. Still bootstraps ok and regtests with the issue in gfortran.dg/alloc_comp_constructor_1.f90 (which is addressed by the patch for pr58586 already) on x86_64-linux-gnu/f21. Ok for trunk? - Andre On Tue, 19 May 2015 12:26:02 +0200 Andre Vehreschild ve...@gmx.de wrote: Hi all, update based on latest 65548 (v5) patch and current trunk. Description and issue addressed unchanged (see cite below). Bootstrapped and regtested on x86_64-linux-gnu/f21. Any volunteers to review? The initial version dates back to March 30, 2015. Not a single comment so far! - Andre On Thu, 30 Apr 2015 16:17:42 +0200 Andre Vehreschild ve...@gmx.de wrote: Hi all, and also for this bug I would like to present an updated patch. It was brought to my attention that the previous patch did not fix statements like: allocate(m, source=[(I, I=1, n)]) where n is a variable and type p class(*), allocatable :: m(:,:) end type real mat(2,3) type(P) :: o allocate(o%m, source=mat) The new version of the patch now also fixes those issues and furthermore addresses some issues (most probably not all) where the rank of the source=-variable and the rank of the array to allocate differ. For example, when one does: real v(:) allocate(v, source= arr(1,2:3)) where arr has a rank of 2 and the source=-expression only a rank of one, which is then compatible with v. Nevertheless this needed addressing when setting up the descriptor of v and during the data copy. Bootstrap ok on x86_64-linux-gnu/f21. Regtests with one regression in gfortran.dg/alloc_comp_constructor_1.f90, which is addressed in the patch for pr58586, whose final version is in preparation. Ok for trunk in combination with 58586 once both are reviewed? Regards, Andre On Wed, 29 Apr 2015 17:23:58 +0200 Andre Vehreschild ve...@gmx.de wrote: Hi all, this is the fourth version of the patch, adapting to the current state of trunk. 
This patch is based on my patch for 65548 version 2 and needs that patch applied beforehand to apply cleanly. The patch for 65548 is available from: https://gcc.gnu.org/ml/fortran/2015-04/msg00121.html Scope: Allow allocate of arrays w/o having to give an array-spec, as specified in F2008:C633. An example is: integer, dimension(:) :: arr allocate(arr, source = [1,2,3]) Solution: While resolving an allocate, the objects to allocate are analyzed as to whether they carry an array-spec; if not, the array-spec of the source=-expression is transferred. Unfortunately some source=-expressions are not easy to handle and have to be assigned to a temporary variable first. Only with the temporary variable is gfc_trans_allocate() able to compute the array descriptor correctly and allocate with correct array bounds. Side notes: This patch creates a regression in alloc_comp_constructor_1.f90 where two free()'s go missing. This will be fixed by the patch for pr58586 and is therefore not repeated here. Bootstraps and regtests ok on x86_64-linux-gnu/f21. Ok for trunk? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
[Ada] Constants without variable input are not hidden state
This patch implements the following rule with respect to constants: SPARK RM 7.1.1(2) - The hidden state of a package P consists of: * any variables, or constants with variable inputs, declared immediately in the private part or body of P. Constants without variable input are not considered part of the hidden state of a package. These constants do not require indicator Part_Of when declared in the private part of a package. -- Source -- -- types.ads package Types is type CA is array (1 .. 3) of Integer; type UA is array (Integer range <>) of Integer; type VR (Discr : Boolean) is record Comp_1 : Integer; case Discr is when True => Comp_2 : Integer; when others => null; end case; end record; subtype CR is VR (True); function Get_CA return CA; function Get_CR return CR; function Get_In return Integer; function Get_UA return UA; function Get_VR return VR; end Types; -- types.adb package body Types is function Get_CA return CA is Result : constant CA := (others => 0); begin return Result; end Get_CA; function Get_CR return CR is Result : constant CR := (Discr => True, Comp_1 => 0, Comp_2 => 0); begin return Result; end Get_CR; function Get_In return Integer is begin return 0; end Get_In; function Get_UA return UA is begin return (0, 0, 0); end Get_UA; function Get_VR return VR is begin return (Discr => False, Comp_1 => 0); end Get_VR; end Types; -- legal_hidden_state.ads with Types; use Types; pragma Elaborate_All (Types); package Legal_Hidden_State with SPARK_Mode, Abstract_State => State is procedure Force_Body; private -- Constrained array C1 : constant CA := Get_CA with Part_Of => State; C2 : constant CA := (others => Get_In) with Part_Of => State; C3 : constant CA := (1, 2, Get_In) with Part_Of => State; -- Constrained record C4 : constant CR := Get_CR with Part_Of => State; C5 : constant CR := (Discr => True, Comp_1 => 1, Comp_2 => Get_In) with Part_Of => State; -- Scalar C6 : constant Integer := Get_In with Part_Of => State; -- Unconstrained array C7 : constant UA := Get_UA with 
Part_Of => State; C8 : constant UA (1 .. 3) := (others => Get_In) with Part_Of => State; C9 : constant UA := (1, 2, Get_In) with Part_Of => State; -- Variant record C10 : constant VR := Get_VR with Part_Of => State; C11 : constant VR := (Discr => False, Comp_1 => Get_In) with Part_Of => State; C12 : constant VR (False) := (Discr => False, Comp_1 => Get_In) with Part_Of => State; end Legal_Hidden_State; -- legal_hidden_state.adb package body Legal_Hidden_State with SPARK_Mode, Refined_State => (State => (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24)) is -- Constrained array C13 : constant CA := Get_CA; C14 : constant CA := (others => Get_In); C15 : constant CA := (1, 2, Get_In); -- Constrained record C16 : constant CR := Get_CR; C17 : constant CR := (Discr => True, Comp_1 => 17, Comp_2 => Get_In); -- Scalar C18 : constant Integer := Get_In; -- Unconstrained array C19 : constant UA := Get_UA; C20 : constant UA (1 .. 3) := (others => Get_In); C21 : constant UA := (1, 2, Get_In); -- Variant record C22 : constant VR := Get_VR; C23 : constant VR := (Discr => False, Comp_1 => Get_In); C24 : constant VR (False) := (Discr => False, Comp_1 => Get_In); procedure Force_Body is begin null; end Force_Body; end Legal_Hidden_State; - -- Compilation -- - $ gcc -c legal_hidden_state.adb Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Hristian Kirtchev kirtc...@adacore.com * sem_prag.adb (Analyze_Pragma): Constants without variable input do not require indicator Part_Of. (Check_Missing_Part_Of): Constants without variable input do not require indicator Part_Of. (Collect_Visible_States): Constants without variable input are not part of the hidden state of a package. * sem_util.ads, sem_util.adb (Has_Variable_Input): New routine. Index: sem_prag.adb === --- sem_prag.adb(revision 223484) +++ sem_prag.adb(working copy) @@ -2710,7 +2710,7 @@ Legal : out Boolean); -- Subsidiary to the analysis of pragmas Abstract_State and Part_Of. 
-- Perform full analysis of indicator Part_Of. Item_Id is the entity of - -- an abstract state, variable or package instantiation. State is the + -- an abstract state, object or package instantiation. State is the -- encapsulating state. Indic is the Part_Of indicator. Flag Legal is -- set when the indicator
Re: Calculate TYPE_CANONICAL only for types that can be accessed in memory
Now we have it spelled out 4 times ... makes sense to create a new macro for it? (though I cannot think of a good name... UNACCESSIBLE_TYPE_P ()?) Yep, actually I already made that version of the patch yesterday but then got hooked by beers. This is a better version (also with more sensible comments). I will commit it in the afternoon if you have no further comments. * lto.c (hash_canonical_type): Be sure we hash only types that need alias set. (gimple_register_canonical_type_1): Do not produce canonical types for types that do not need alias sets. * tree.c (gimple_canonical_types_compatible_p): Sanity check that we do not try to compute canonical type for type that does not need alias set. (verify_type): Drop FIXME for METHOD_TYPE, update FIXME for FUNCTION_TYPE. * tree.h (type_with_alias_set_p): New. Index: lto/lto.c === --- lto/lto.c (revision 223508) +++ lto/lto.c (working copy) @@ -309,6 +309,12 @@ hash_canonical_type (tree type) { inchash::hash hstate; + /* We compute alias sets only for types that need them. + Be sure we do not recurse to something else as we can not hash incomplete + types in a way they would have same hash value as compatible complete + types. */ + gcc_checking_assert (type_with_alias_set_p (type)); + /* Combine a few common features of types so that types are grouped into smaller sets; when searching for existing matching types to merge, only existing types having the same features as the new type will be @@ -493,7 +495,7 @@ gimple_register_canonical_type_1 (tree t static void gimple_register_canonical_type (tree t) { - if (TYPE_CANONICAL (t)) + if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)) return; gimple_register_canonical_type_1 (t, hash_canonical_type (t)); Index: tree.c === --- tree.c (revision 223508) +++ tree.c (working copy) @@ -12720,6 +12720,23 @@ gimple_canonical_types_compatible_p (con if (t1 == NULL_TREE || t2 == NULL_TREE) return false; + /* We consider complete types always compatible with incomplete type. 
+ This does not make sense for canonical type calculation and thus we + need to ensure that we are never called on it. + + FIXME: For more correctness the function probably should have three modes + 1) mode assuming that types are complete matching their structure + 2) mode allowing incomplete types but producing equivalence classes + and thus ignoring all info from complete types + 3) mode allowing incomplete types to match complete but checking + compatibility between complete types. + + 1 and 2 can be used for canonical type calculation. 3 is the real + definition of type compatibility that can be used i.e. for warnings during + declaration merging. */ + + gcc_assert (!trust_type_canonical + || (type_with_alias_set_p (t1) && type_with_alias_set_p (t2))); /* If the types have been previously registered and found equal they still are. */ if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2) @@ -12939,10 +12953,9 @@ verify_type (const_tree t) /* Method and function types can not be used to address memory and thus TYPE_CANONICAL really matters only for determining useless conversions. - FIXME: C++ FE does not agree with gimple_canonical_types_compatible_p - here. gimple_canonical_types_compatible_p calls comp_type_attributes - while for C++ FE the attributes does not make difference. */ - else if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE) + FIXME: C++ FE produces declarations of builtin functions that are not + compatible with main variants. */ + else if (TREE_CODE (t) == FUNCTION_TYPE) ; else if (t != ct /* FIXME: gimple_canonical_types_compatible_p can not compare types Index: tree.h === --- tree.h (revision 223508) +++ tree.h (working copy) @@ -5090,6 +5090,26 @@ int_bit_position (const_tree field) + wi::to_offset (DECL_FIELD_BIT_OFFSET (field))).to_shwi (); } +/* Return true if it makes sense to consider alias set for a type T. 
*/ + +inline bool +type_with_alias_set_p (const_tree t) +{ + /* Function and method types are never accessed as memory locations. */ + if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE) +return false; + + if (COMPLETE_TYPE_P (t)) +return true; + + /* Incomplete types can not be accessed in general except for arrays + where we can fetch its element despite we have no array bounds. */ + if (TREE_CODE (t) == ARRAY_TYPE && COMPLETE_TYPE_P (TREE_TYPE (t))) +return true; + + return false; +} + extern void gt_ggc_mx (tree &); extern void gt_pch_nx (tree &); extern void gt_pch_nx
[RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
All, This patch removes the special casing for targets with relaxed memory ordering and handles guard accesses with equivalent atomic load acquire operations. In this process we change the algorithm to load the guard variable with an atomic load that has ACQUIRE semantics. I'm not terribly familiar with the C++ front-end so I'm not sure I've used the appropriate interfaces for doing something like this. This then means that on targets which have weak memory models, the fast path is inlined and can directly use a load-acquire instruction where available (and yay! one more hook gone). Currently bootstrapping and regression testing on AArch64 and ARM (prior to the commit that caused PR66241). If this goes in then I'm happy to withdraw part of the patches to trunk for AArch64 / ARM that defines TARGET_RELAXED_ORDERING and only propose those hunks to the branches. I'd also request the other target maintainers CC'd to help by testing this on their platforms as I do not have access to all of them. To help folks see the difference, this is the difference in output for a compiler for AArch64 built with TARGET_RELAXED_ORDERING set to true and this patch for the testcase below. int* f(void) { static int* p = new int; return p; } - adrp x19, .LANCHOR0 - add x20, x19, :lo12:.LANCHOR0 - mov x0, x20 - bl __cxa_guard_acquire - cbnz w0, .L2 - ldr x0, [x20, 8] + adrp x20, .LANCHOR0 + add x19, x20, :lo12:.LANCHOR0 + ldar x0, [x19] + tbz x0, 0, .L11 +.L9: + ldr x0, [x19, 8] regards Ramana 2015-05-22 Ramana Radhakrishnan ramana.radhakrish...@arm.com PR c++/66192 * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete. * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise. * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise. * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise. * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise. * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise. * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise. 
* doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete. * target.def (TARGET_RELAXED_ORDERING): Delete. gcc/cp/ChangeLog: 2015-05-22 Ramana Radhakrishnan ramana.radhakrish...@arm.com PR c++/66192 * decl.c (expand_static_init): Remove special casing for targets with weak memory model. * decl2.c (build_atomic_load): New function. (get_guard_cond): Adjust for atomic_load.
[PATCH] PR other/66250: Can't adjust complex nor decimal floating point modes
machmode.def has /* Allow the target to specify additional modes of various kinds. */ /* Complex modes. */ COMPLEX_MODES (INT); COMPLEX_MODES (FLOAT); /* Decimal floating point modes. */ DECIMAL_FLOAT_MODE (SD, 4, decimal_single_format); DECIMAL_FLOAT_MODE (DD, 8, decimal_double_format); DECIMAL_FLOAT_MODE (TD, 16, decimal_quad_format); We can't adjust any complex nor DFP modes in i386-modes.def since they aren't available yet. But we need to include i386-modes.def before COMPLEX_MODES (INT); COMPLEX_MODES (FLOAT); to get extra modes. This patch adds EXTRA_ADJUSTMENTS_FILE containing adjustments to machine modes and includes it after all modes have been created. Tested on Linux/x86-64. The generated mode files are equivalent before and after the change. I enclosed the diffs here. OK for master? Thanks. H.J. --- PR other/66250 * Makefile.in (extra_adjustments_file): New. (build/genmodes.o): Depend on $(extra_adjustments_file). * config.gcc (extra_adjustments): New. Set to ${cpu_type}/${cpu_type}-adjustments.def if available. * configure.ac (extra_adjustments): AC_SUBST and AC_DEFINE_UNQUOTED if set. * genmodes.c (HAVE_EXTRA_ADJUSTMENTS): New. (EXTRA_ADJUSTMENTS_FILE): Likewise. (emit_autogen_header): Likewise. (emit_insn_modes_h): Call emit_autogen_header. (emit_insn_modes_c_header): Likewise. (emit_min_insn_modes_c_header): Likewise. * machmode.def: Include EXTRA_ADJUSTMENTS_FILE if HAVE_EXTRA_ADJUSTMENTS is 1. * config.in: Regenerated. * configure: Likewise. * config/i386/i386-modes.def (ADJUST_FLOAT_FORMAT, ADJUST_BYTESIZE, ADJUST_ALIGNMENT): Moved to ... * config/i386/i386-adjustments.def: Here. New file. 
--- gcc/Makefile.in | 3 ++- gcc/config.gcc | 9 + gcc/config.in| 7 +++ gcc/config/i386/i386-adjustments.def | 28 gcc/config/i386/i386-modes.def | 12 +--- gcc/configure| 16 ++-- gcc/configure.ac | 9 + gcc/genmodes.c | 34 ++ gcc/machmode.def | 5 + 9 files changed, 97 insertions(+), 26 deletions(-) create mode 100644 gcc/config/i386/i386-adjustments.def diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 512e7c8..87e1cb1 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -544,6 +544,7 @@ lang_tree_files=@lang_tree_files@ target_cpu_default=@target_cpu_default@ OBJC_BOEHM_GC=@objc_boehm_gc@ extra_modes_file=@extra_modes_file@ +extra_adjustments_file=@extra_adjustments_file@ extra_opt_files=@extra_opt_files@ host_hook_obj=@out_host_hook_obj@ @@ -2540,7 +2541,7 @@ CFLAGS-errors.o += -DHOST_GENERATOR_FILE build/genmddeps.o: genmddeps.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h\ errors.h $(READ_MD_H) build/genmodes.o : genmodes.c $(BCONFIG_H) $(SYSTEM_H) errors.h \ - $(HASHTAB_H) machmode.def $(extra_modes_file) + $(HASHTAB_H) machmode.def $(extra_modes_file) $(extra_adjustments_file) build/genopinit.o : genopinit.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ coretypes.h $(GTM_H) errors.h gensupport.h optabs.def build/genoutput.o : genoutput.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ diff --git a/gcc/config.gcc b/gcc/config.gcc index 1fcc290..85572d6 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -103,6 +103,10 @@ # machine modes, if necessary and different from # $cpu_type/$cpu_type-modes.def. # +# extra_adjustments  The name of the file containing adjustments to +# machine modes, if necessary and different from +# $cpu_type/$cpu_type-adjustments.def. +# # extra_objs List of extra objects that should be linked into # the compiler proper (cc1, cc1obj, cc1plus) # depending on target. 
@@ -494,6 +498,11 @@ if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def then extra_modes=${cpu_type}/${cpu_type}-modes.def fi +extra_adjustments= +if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-adjustments.def +then + extra_adjustments=${cpu_type}/${cpu_type}-adjustments.def +fi if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt then extra_options=${extra_options} ${cpu_type}/${cpu_type}.opt diff --git a/gcc/config/i386/i386-adjustments.def b/gcc/config/i386/i386-adjustments.def new file mode 100644 index 000..690201e --- /dev/null +++ b/gcc/config/i386/i386-adjustments.def @@ -0,0 +1,28 @@ +/* Definitions of target adjustments for GCC for x86. + Copyright (C) 2015 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as
[Ada] Removal of SPARK RM 6.9 (11)
This patch removes the (incorrect) implementation of the following rule: SPARK RM 6.9 (11) - A non-ghost library unit package or generic package specification shall not require a completion solely because of ghost declarations. [In other words, if a library unit package or generic package specification requires a body, then it must still require a body if all of the ghost declarations therein were to be removed.] The patch also updates references to SPARK RM 6.9 in various compiler files. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Hristian Kirtchev kirtc...@adacore.com * ghost.adb (Check_Ghost_Completion): Update references to SPARK RM 6.9 rules. (Check_Ghost_Policy): Update references to SPARK RM 6.9 rules. * sem_ch3.adb (Analyze_Object_Declaration): Update references to SPARK RM 6.9 rules. (Check_Completion): Ghost entities do not require a special form of completion. * sem_ch6.adb (Analyze_Generic_Subprogram_Body): Update references to SPARK RM 6.9 rules. (Analyze_Subprogram_Body_Helper): Update references to SPARK RM 6.9 rules. * sem_ch7.adb (Analyze_Package_Body_Helper): Update references to SPARK RM 6.9 rules. (Requires_Completion_In_Body): Ghost entities do not require a special form of completion. Index: sem_ch3.adb === --- sem_ch3.adb (revision 223484) +++ sem_ch3.adb (working copy) @@ -4055,7 +4055,7 @@ -- The Ghost policy in effect at the point of declaration -- and at the point of completion must match - -- (SPARK RM 6.9(15)). if Present (Prev_Entity) and then Is_Ghost_Entity (Prev_Entity) @@ -4237,7 +4237,7 @@ Set_Is_Ghost_Entity (Id); -- The Ghost policy in effect at the point of declaration and at the - -- point of completion must match (SPARK RM 6.9(16)). 
if Present (Prev_Entity) and then Is_Ghost_Entity (Prev_Entity) then Check_Ghost_Completion (Prev_Entity, Id); @@ -10937,12 +10937,6 @@ if Is_Intrinsic_Subprogram (E) then null; - -- A Ghost entity declared in a non-Ghost package does not force the - -- need for a body (SPARK RM 6.9(11)). - - elsif not Is_Ghost_Entity (Pack_Id) and then Is_Ghost_Entity (E) then -null; - -- The following situation requires special handling: a child unit -- that appears in the context clause of the body of its parent: @@ -19964,7 +19958,7 @@ Set_Is_Ghost_Entity (Full_T); -- The Ghost policy in effect at the point of declaration and at the - -- point of completion must match (SPARK RM 6.9(15)). + -- point of completion must match (SPARK RM 6.9(14)). Check_Ghost_Completion (Priv_T, Full_T); Index: sem_ch7.adb === --- sem_ch7.adb (revision 223476) +++ sem_ch7.adb (working copy) @@ -750,7 +750,7 @@ Set_Is_Ghost_Entity (Body_Id); -- The Ghost policy in effect at the point of declaration and at the - -- point of completion must match (SPARK RM 6.9(15)). + -- point of completion must match (SPARK RM 6.9(14)). Check_Ghost_Completion (Spec_Id, Body_Id); end if; @@ -2527,12 +2527,6 @@ then return False; - -- A Ghost entity declared in a non-Ghost package does not force the - -- need for a body (SPARK RM 6.9(11)). - - elsif not Is_Ghost_Entity (Pack_Id) and then Is_Ghost_Entity (Id) then - return False; - -- Otherwise test to see if entity requires a completion. Note that -- subprogram entities whose declaration does not come from source are -- ignored here on the basis that we assume the expander will provide an Index: ghost.adb === --- ghost.adb (revision 223476) +++ ghost.adb (working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- ---Copyright (C) 2014-2015, Free Software Foundation, Inc. -- +-- Copyright (C) 2014-2015, Free Software Foundation, Inc. 
-- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -106,7 +106,7 @@ begin -- The Ghost
Re: [patch 10/10] debug-early merge: compiler proper
On Wed, May 20, 2015 at 5:50 PM, Aldy Hernandez al...@redhat.com wrote: On 05/18/2015 06:56 AM, Richard Biener wrote: BTW, thanks for the review. On Fri, May 8, 2015 at 2:40 AM, Aldy Hernandez al...@redhat.com wrote: As seen on TV. +/* FIRST_TIME is set to TRUE for the first time we are called for a + translation unit from finalize_compilation_unit() or false + otherwise. */ + static void -analyze_functions (void) +analyze_functions (bool first_time) { ... + if (first_time) +{ + symtab_node *snode; + FOR_EACH_SYMBOL (snode) + check_global_declaration (snode-decl); +} + I think it is better to split analyze_functions (why does it have it's own copy of unreachable node removal?) into analysis and unused-symbol removal and have the check_global_declaration call be in finalize_compilation_unit directly. Honza? Leaving things as is, as per Honza's comment ?? @@ -1113,6 +1131,19 @@ analyze_functions (void) { if (symtab-dump_file) fprintf (symtab-dump_file, %s, node-name ()); + + /* See if the debugger can use anything before the DECL +passes away. Perhaps it can notice a DECL that is now a +constant and can tag the early DIE with an appropriate +attribute. + +Otherwise, this is the last chance the debug_hooks have +at looking at optimized away DECLs, since +late_global_decl will subsequently be called from the +contents of the now pruned symbol table. */ + if (!decl_function_context (node-decl)) + (*debug_hooks-late_global_decl) (node-decl); + node-remove (); so this applies to VAR_DECLs only - shouldn't this be in the varpool_node::remove function then? You can even register/deregister a hook for this in finalize_compilation_unit. That would IMHO be better. The problem is that varpool_node::remove() may be called before we have finished parsing the DECL, thus before we call early_global_decl() on it. So we would essentially be calling late_global_decl() on a DECL for which we haven't called early_global_decl(). 
To complicate matters, we may call ::remove() before we finish parsing a decl. In the C front-end, for instance, we call ::remove() from duplicate_decls(), before we even call rest_of_decl_compilation (where we call early_global_decl). Calling late_global_decl so early, before we have even finished parsing, seems wrong and obviously causes problems. One example is dwarf2out can put the DECL into the deferred_asm_names list, only to have duplicate_decls() gcc_free it from under us. All debug stuff apart from dwarf2out.c changes (I assume Jason reviews them) are ok. diff --git a/gcc/gengtype.c b/gcc/gengtype.c index 02012d5..15b6dd2 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -4718,7 +4718,8 @@ write_roots (pair_p variables, bool emit_pch) this funcion will have to be adjusted to be more like output_mangled_typename. */ -static void +/* ?? Why are we keeping this? Is this actually used anywhere? */ +static void ATTRIBUTE_UNUSED output_typename (outf_p of, const_type_p t) { switch (t-kind) Just remove the function. Done. The langhooks changes are ok. diff --git a/gcc/passes.c b/gcc/passes.c index 04ff042..4dee8ad 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -293,6 +293,28 @@ rest_of_decl_compilation (tree decl, else if (TREE_CODE (decl) == VAR_DECL !DECL_EXTERNAL (decl) TREE_STATIC (decl)) varpool_node::get_create (decl); + + /* Generate early debug for global variables. Any local variables will + be handled by either handling reachable functions from + finalize_compilation_unit (and by consequence, locally scoped + symbols), or by rest_of_type_compilation below. + + Also, pick up function prototypes, which will be mostly ignored + by the different early_global_decl() hooks, but will at least be + used by Go's hijack of the debug_hooks to implement + -fdump-go-spec. */ + if (!flag_wpa + !in_lto_p Just check !in_lto_p, !flag_wpa is redundant. Done. + !decl_function_context (decl) + !current_function_decl Why that? 
!decl_function_context should catch relevant cases? You'd think, huh? The issue here are extern declarations appearing inside of a function. For this case, decl_function_context is NULL, because the actual context is toplevel, but current_function_decl is set to the function where the extern declaration appears. For example: namespace S { int f() { { int i = 42; { extern int i; // Local extern declaration. return i; } } } } I have added a big fat comment in the code now, since it's clearly not obvious why we need current_function_decl. + !decl_type_context (decl)) +
Re: Calculate TYPE_CANONICAL only for types that can be accessed in memory
+  /* No need for canonical types of functions and methods; those are never
+     accessed as memory locations.  */
+  if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE)
+    return;

Just occurred to me that it might make sense to remove the FUNCTION/METHOD_TYPE case in useless_type_conversion_p (I wonder in which cases we end up in that path...).

Yes, I think that is unreachable (it was used only by that ignored code path in ipa-symtab.c) and I also had a patch for it somewhere. Will look it up in the afternoon (I am teaching in the morning). I think useless_type_conversion_p should also just work on types that do have memory representation.

Honza
Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
Bah ! now with patch attached. Ramana diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c index 1ba99d0..857c9ac 100644 --- a/gcc/config/alpha/alpha.c +++ b/gcc/config/alpha/alpha.c @@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) #undef TARGET_EXPAND_BUILTIN_VA_START #define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start -/* The Alpha architecture does not require sequential consistency. See - http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html - for an example of how it can be violated in practice. */ -#undef TARGET_RELAXED_ORDERING -#define TARGET_RELAXED_ORDERING true - #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE alpha_option_override diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c index c1e2ecd..45ad97a 100644 --- a/gcc/config/ia64/ia64.c +++ b/gcc/config/ia64/ia64.c @@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] = #define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \ ia64_libgcc_floating_mode_supported_p -/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur - in an order different from the specified program order. 
*/ -#undef TARGET_RELAXED_ORDERING -#define TARGET_RELAXED_ORDERING true - #undef TARGET_LEGITIMATE_CONSTANT_P #define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p #undef TARGET_LEGITIMATE_ADDRESS_P diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index a590ef4..ce70ca0 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif -/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors - The PowerPC architecture requires only weak consistency among - processors--that is, memory accesses between processors need not be - sequentially consistent and memory accesses among processors can occur - in any order. The ability to order memory accesses weakly provides - opportunities for more efficient use of the system bus. Unless a - dependency exists, the 604e allows read operations to precede store - operations. */ -#undef TARGET_RELAXED_ORDERING -#define TARGET_RELAXED_ORDERING true - #ifdef HAVE_AS_TLS #undef TARGET_ASM_OUTPUT_DWARF_DTPREL #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h index 56def4b..37f507d 100644 --- a/gcc/config/sparc/linux.h +++ b/gcc/config/sparc/linux.h @@ -139,12 +139,6 @@ do { \ /* Static stack checking is supported by means of probes. */ #define STACK_CHECK_STATIC_BUILTIN 1 -/* Linux currently uses RMO in uniprocessor mode, which is equivalent to - TMO, and TMO in multiprocessor mode. But they reserve the right to - change their minds. 
*/ -#undef SPARC_RELAXED_ORDERING -#define SPARC_RELAXED_ORDERING true - #undef NEED_INDICATE_EXEC_STACK #define NEED_INDICATE_EXEC_STACK 1 diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h index fa805fd..8b4a435 100644 --- a/gcc/config/sparc/linux64.h +++ b/gcc/config/sparc/linux64.h @@ -253,12 +253,6 @@ do { \ /* Static stack checking is supported by means of probes. */ #define STACK_CHECK_STATIC_BUILTIN 1 -/* Linux currently uses RMO in uniprocessor mode, which is equivalent to - TMO, and TMO in multiprocessor mode. But they reserve the right to - change their minds. */ -#undef SPARC_RELAXED_ORDERING -#define SPARC_RELAXED_ORDERING true - #undef NEED_INDICATE_EXEC_STACK #define NEED_INDICATE_EXEC_STACK 1 diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index a1562ad..094287f 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8]; #define TARGET_ATTRIBUTE_TABLE sparc_attribute_table #endif -#undef TARGET_RELAXED_ORDERING -#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING - #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE sparc_option_override diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index ce1b68b..fd24431 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel; #define SPARC_DEFAULT_CMODEL CM_32 -/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO) - which requires the following macro to be true if enabled. Prior to V9, - there are no instructions to even talk about memory synchronization. - Note that the UltraSPARC III processors don't implement RMO, unlike the - UltraSPARC II processors. Niagara, Niagara-2, and Niagara-3 do not - implement RMO either. - - Default to false; for example, Solaris never enables RMO, only ever uses -
[Ada] Constants and hidden state
This patch modifies the treatment of constants within the state space of a package. Constants that appear in the hidden state space may or may not act as constituents or possess indicator Part_Of. This is because the compiler cannot accurately determine whether a constant has variable input which in turn classifies it as a hidden state. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Hristian Kirtchev kirtc...@adacore.com * sem_prag.adb (Analyze_Pragma): Remove the detection of a useless Part_Of indicator when the related item is a constant. (Check_Matching_Constituent): Do not emit an error on a constant. (Check_Missing_Part_Of): Do not check for a missing Part_Of indicator when the related item is a constant. (Collect_Body_States): Code cleanup. (Collect_Visible_States): Code cleanup. (Report_Unused_States): Do not emit an error on a constant. * sem_util.ads, sem_util.adb (Has_Variable_Input): Removed. Index: sem_prag.adb === --- sem_prag.adb(revision 223534) +++ sem_prag.adb(working copy) @@ -17555,20 +17555,6 @@ Legal = Legal); if Legal then - - -- Constants without variable input are not considered part - -- of the hidden state of a package (SPARK RM 7.1.1(2)). As a - -- result such constants do not require a Part_Of indicator. - - if Ekind (Item_Id) = E_Constant - and then not Has_Variable_Input (Item_Id) - then - SPARK_Msg_NE -(useless Part_Of indicator, constant does not have - variable input, N, Item_Id); - return; - end if; - State_Id := Entity (State); -- The Part_Of indicator turns an object into a constituent of @@ -23983,14 +23969,25 @@ end loop; end if; + -- Constants are part of the hidden state of a package, but + -- the compiler cannot determine whether they have variable + -- input (SPARK RM 7.1.1(2)) and cannot classify them as a + -- hidden state. Accept the constant quietly even if it is + -- a visible state or lacks a Part_Of indicator. 
+ + if Ekind (Constit_Id) = E_Constant then + null; + -- If we get here, then the constituent is not a hidden -- state of the related package and may not be used in a -- refinement (SPARK RM 7.2.2(9)). - Error_Msg_Name_1 := Chars (Spec_Id); - SPARK_Msg_NE -(cannot use in refinement, constituent is not a hidden - state of package %, Constit, Constit_Id); + else + Error_Msg_Name_1 := Chars (Spec_Id); + SPARK_Msg_NE + (cannot use in refinement, constituent is not a + hidden state of package %, Constit, Constit_Id); + end if; end if; end Check_Matching_Constituent; @@ -24434,7 +24431,6 @@ procedure Collect_Visible_States (Pack_Id : Entity_Id) is -Decl: Node_Id; Item_Id : Entity_Id; begin @@ -24453,28 +24449,16 @@ elsif Ekind (Item_Id) = E_Abstract_State then Add_Item (Item_Id, Result); - elsif Ekind_In (Item_Id, E_Constant, E_Variable) then - Decl := Declaration_Node (Item_Id); + -- Do not consider constants or variables that map generic + -- formals to their actuals, as the formals cannot be named + -- from the outside and participate in refinement. - -- Do not consider constants or variables that map generic - -- formals to their actuals as the formals cannot be named - -- from the outside and participate in refinement. + elsif Ekind_In (Item_Id, E_Constant, E_Variable) + and then No (Corresponding_Generic_Association +(Declaration_Node (Item_Id))) + then + Add_Item (Item_Id, Result); - if Present (Corresponding_Generic_Association (Decl)) then - null; - - -- Constants without variable input are not considered a - -- hidden state of a package (SPARK RM 7.1.1(2)). - - elsif Ekind (Item_Id) = E_Constant -and then not Has_Variable_Input (Item_Id) - then - null; - - else -
[Ada] Cannot rename component of Volatile_Full_Access object
It is not allowed to rename a component of a composite object to which pragma Volatile_Full_Access has been applied. The following is compiled with -gnatj55:

 1. package RenamVFA is
 2.    type Int8_t is mod 2**8;
 3.    type Rec is record
 4.       A,B,C,D : Int8_t;
 5.    end record;
 6.    for Rec'Size use 32;
 7.    for Rec'Alignment use 4;
 8.    pragma Volatile_Full_Access (Rec);
 9.    R : Rec;
10.    I1 : Int8_t renames R.A; -- illegal for now
                   |
       cannot rename component of Volatile_Full_Access object
11.    type Arr is array (1 .. 4) of Int8_t;
12.    pragma Volatile_Full_Access (Arr);
13.    A : Arr;
14.    I2 : Int8_t renames A (1); -- illegal for now
                   |
       cannot rename component of Volatile_Full_Access object
15. end RenamVFA;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-05-22  Robert Dewar  de...@adacore.com

	* sem_ch8.adb (Analyze_Object_Renaming): Check for renaming
	component of an object to which Volatile_Full_Access applies.

Index: sem_ch8.adb
===
--- sem_ch8.adb	(revision 223541)
+++ sem_ch8.adb	(working copy)
@@ -912,6 +912,25 @@
             ("renaming of conversion only allowed for tagged types", Nam);
       end if;
 
+      --  Reject renaming of component of Volatile_Full_Access object
+
+      if Nkind_In (Nam, N_Selected_Component, N_Indexed_Component) then
+         declare
+            P : constant Node_Id := Prefix (Nam);
+         begin
+            if Is_Entity_Name (P) then
+               if Has_Volatile_Full_Access (Entity (P))
+                 or else
+                   Has_Volatile_Full_Access (Etype (P))
+               then
+                  Error_Msg_N
+                    ("cannot rename component of Volatile_Full_Access "
+                     & "object", Nam);
+               end if;
+            end if;
+         end;
+      end if;
+
       Resolve (Nam, T);
 
       --  If the renamed object is a function call of a limited type,
[Ada] Raise Program_Error on default initialization of references
This patch causes default initialization of objects of types Constant_Reference_Type and Reference_Type in the containers packages to raise Program_Error as required by the RM. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Bob Duff d...@adacore.com * a-cborma.ads, a-cidlli.ads, a-cimutr.ads, a-ciormu.ads, * a-cihase.ads, a-cohama.ads, a-coorse.ads, a-cbhama.ads, * a-cborse.ads, a-comutr.ads, a-ciorma.ads, a-cobove.ads, * a-coormu.ads, a-convec.ads, a-cohase.ads, a-coinho.ads, * a-cbdlli.ads, a-cbmutr.ads, a-cbhase.ads, a-cdlili.ads, * a-cihama.ads, a-coinve.ads, a-ciorse.ads, a-coorma.ads, * a-coinho-shared.ads (Constant_Reference_Type, Reference_Type): Add an initialization expression raise Program_Error. See, for example, RM-A.18.2(148.4). Index: a-cdlili.ads === --- a-cdlili.ads(revision 223476) +++ a-cdlili.ads(working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 2004-2013, Free Software Foundation, Inc. -- +-- Copyright (C) 2004-2015, Free Software Foundation, Inc. -- -- -- -- This specification is derived from the Ada Reference Manual for use with -- -- GNAT. The copyright notice above, and the license provisions that follow -- @@ -319,9 +319,13 @@ pragma Inline (Finalize); type Constant_Reference_Type - (Element : not null access constant Element_Type) is + (Element : not null access constant Element_Type) is record - Control : Reference_Control_Type; + Control : Reference_Control_Type := + raise Program_Error with uninitialized reference; + -- The RM says, The default initialization of an object of + -- type Constant_Reference_Type or Reference_Type propagates + -- Program_Error. 
end record; procedure Write @@ -337,9 +341,13 @@ for Constant_Reference_Type'Read use Read; type Reference_Type - (Element : not null access Element_Type) is + (Element : not null access Element_Type) is record - Control : Reference_Control_Type; + Control : Reference_Control_Type := + raise Program_Error with uninitialized reference; + -- The RM says, The default initialization of an object of + -- type Constant_Reference_Type or Reference_Type propagates + -- Program_Error. end record; procedure Write Index: a-cihama.ads === --- a-cihama.ads(revision 223476) +++ a-cihama.ads(working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 2004-2013, Free Software Foundation, Inc. -- +-- Copyright (C) 2004-2015, Free Software Foundation, Inc. -- -- -- -- This specification is derived from the Ada Reference Manual for use with -- -- GNAT. The copyright notice above, and the license provisions that follow -- @@ -379,9 +379,13 @@ pragma Inline (Finalize); type Constant_Reference_Type - (Element : not null access constant Element_Type) is + (Element : not null access constant Element_Type) is record - Control : Reference_Control_Type; + Control : Reference_Control_Type := + raise Program_Error with uninitialized reference; + -- The RM says, The default initialization of an object of + -- type Constant_Reference_Type or Reference_Type propagates + -- Program_Error. end record; procedure Write @@ -397,9 +401,13 @@ for Constant_Reference_Type'Read use Read; type Reference_Type - (Element : not null access Element_Type) is + (Element : not null access Element_Type) is record - Control : Reference_Control_Type; + Control : Reference_Control_Type := + raise Program_Error with uninitialized reference; + -- The RM says, The default initialization of an object of + -- type Constant_Reference_Type or Reference_Type propagates + -- Program_Error. 
end record; procedure Write Index: a-coinve.ads === --- a-coinve.ads(revision 223476) +++ a-coinve.ads(working copy) @@ -6,7 +6,7 @@ --
Re: [Ada] Correct some anomalies in the handling of Atomic
Hi Arnaud,

Index: exp_util.adb
===
--- exp_util.adb	(revision 223476)
+++ exp_util.adb	(working copy)
@@ -204,6 +204,13 @@
          when others => null;
       end case;
+      --  Nothing to do for the identifier in an object renaming declaration,
+      --  the renaming itself does not need atomic syncrhonization.

syncrhonization -> synchronization

Ciao, Duncan.
Re: [patch 10/10] debug-early merge: compiler proper
On Wed, May 20, 2015 at 11:45 PM, Aldy Hernandez al...@redhat.com wrote:

On 05/20/2015 05:01 PM, Jan Hubicka wrote:

commit 8824b5ecba26cef065e47b34609c72677c3c36fc
Author: Aldy Hernandez al...@redhat.com
Date:   Wed May 20 16:31:14 2015 -0400

    Set DECL_IGNORED_P on temporary arrays created in the switch
    conversion pass.

diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 6b68a16..a4bcdba 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1097,6 +1097,7 @@ build_one_array (gswitch *swtch, int num, tree arr_index_type,
   DECL_ARTIFICIAL (decl) = 1;
   TREE_CONSTANT (decl) = 1;
   TREE_READONLY (decl) = 1;
+  DECL_IGNORED_P (decl) = 1;
   varpool_node::finalize_decl (decl);

This looks obvious enough to me. Technically speaking the array type constructed probably should be TYPE_ARTIFICIAL, but probably it does not matter. Fine to commit to trunk btw.

Yeah, that's what I thought. I ignored the type because it won't make it to the debugging back end if we stop things at the DECL itself. FWIW, Ada is filled with these temporaries and/or types that should really be ignored, and are currently causing grief.

If you grep for finalize_decl, there are several other calls:

asan.c:      varpool_node::finalize_decl (var);
asan.c:      varpool_node::finalize_decl (var);
cgraphbuild.c:      varpool_node::finalize_decl (decl);
cgraphunit.c:-      varpool_finalize_decl
cgraphunit.c:      varpool_node::finalize_decl (decl);
cgraphunit.c:varpool_node::finalize_decl (tree decl)
coverage.c:      varpool_node::finalize_decl (var);
coverage.c:      varpool_node::finalize_decl (var);

Etc etc. Hmmm, I bet mainline is generating dwarf for all this. I don't feel comfortable touching all this (ok, I'm lazy), but it would seem like almost all of these calls would benefit from DECL_IGNORED_P. Perhaps we could add an argument to finalize_decl() and do it in there. The only issue is in passes using build_decl directly.
I guess we'd want a middle-end-ish create new global static similar to what we have for locals (create_tmp_var). Some of the callers above already set DECL_IGNORED_P properly. Richard. Aldy coverage.c: varpool_node::finalize_decl (fn_info_ary); coverage.c: varpool_node::finalize_decl (gcov_info_var); omp-low.c:varpool_node::finalize_decl (t); omp-low.c:varpool_node::finalize_decl (t); omp-low.c:varpool_node::finalize_decl (decl); omp-low.c: varpool_node::finalize_decl (vars_decl); omp-low.c: varpool_node::finalize_decl (funcs_decl); passes.c: varpool_node::finalize_decl (decl); tree-chkp.c: varpool_node::finalize_decl (var); tree-chkp.c: varpool_node::finalize_decl (bnd_var); tree-profile.c: varpool_node::finalize_decl (ic_void_ptr_var); tree-profile.c: varpool_node::finalize_decl (ic_gcov_type_ptr_var); tree-switch-conversion.c: varpool_node::finalize_decl (decl); ubsan.c: varpool_node::finalize_decl (decl); ubsan.c: varpool_node::finalize_decl (var); ubsan.c: varpool_node::finalize_decl (array); varasm.c: varpool_node::finalize_decl (decl); varpool.c: Unlike finalize_decl function is intended to be used varpool.c: varpool_node::finalize_decl (decl); I would say most of them needs similar treatment (I am not 100% sure about OMP ones that may be user visible) Honza fetch = build4 (ARRAY_REF, value_type, decl, tidx, NULL_TREE,
Re: [PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.
Ok for trunk?

I can't approve but do you mind taking care of -march=armv8-a in the arm backend too as that would have the same issues.

Ramana

Matthew

gcc/
2015-05-21  Matthew Wahab  matthew.wa...@arm.com

	* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
	(aarch64_split_atomic_op): Check for __sync memory models, emit
	appropriate initial and final barriers.
[Patch libstdc++] Rewrite cpu/generic/atomic_word.h
Hi,

While writing atomic_word.h for the ARM backend to fix PR target/66200 I thought it would make more sense to write it all up with atomic primitives instead of providing various fragile bits of inline assembler. Thus this patch came about. I intend to ask for a specialized version of this to be applied for the ARM backend in any case after testing completes.

If this goes in as a cleanup I expect all the ports to be deleting their atomic_word.h implementation in the various ports directories and I'll follow up with patches asking port maintainers to test and apply for each of the individual cases if this is deemed to be good enough.

Currently tested on AArch64 (without the other patch for PR target/66200) and tested in my tree as part of testing for the ARM backend on arm-linux-gnueabihf with no regressions.

Ok to apply ?

regards
Ramana

	* config/cpu/generic/atomic_word.h: Rewrite using atomics.

commit a360fdf84683777db764ba313354da92691aeb17
Author: Ramana Radhakrishnan ramana.radhakrish...@arm.com
Date:   Fri May 22 08:00:10 2015 +

    rewrite as atomics.

diff --git a/libstdc++-v3/config/cpu/generic/atomic_word.h b/libstdc++-v3/config/cpu/generic/atomic_word.h
index 19038bb..bedce31 100644
--- a/libstdc++-v3/config/cpu/generic/atomic_word.h
+++ b/libstdc++-v3/config/cpu/generic/atomic_word.h
@@ -29,19 +29,46 @@
 #ifndef _GLIBCXX_ATOMIC_WORD_H
 #define _GLIBCXX_ATOMIC_WORD_H	1
 
+#include <bits/cxxabi_tweaks.h>
+
 typedef int _Atomic_word;
 
-// Define these two macros using the appropriate memory barrier for the target.
-// The commented out versions below are the defaults.
-// See ia64/atomic_word.h for an alternative approach.
+
+namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
+{
+  // Test the first byte of __g and ensure that no loads are hoisted across
+  // the test.
+  inline bool
+  __test_and_acquire (__cxxabiv1::__guard *__g)
+  {
+    unsigned char __c;
+    unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
+    __atomic_load (__p, &__c, __ATOMIC_ACQUIRE);
+    return __c != 0;
+  }
+
+  // Set the first byte of __g to 1 and ensure that no stores are sunk
+  // across the store.
+  inline void
+  __set_and_release (__cxxabiv1::__guard *__g)
+  {
+    unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
+    unsigned char val = 1;
+    __atomic_store (__p, &val, __ATOMIC_RELEASE);
+  }
+
+#define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __gnu_cxx::__test_and_acquire (G)
+#define _GLIBCXX_GUARD_SET_AND_RELEASE(G) __gnu_cxx::__set_and_release (G)
 
 // This one prevents loads from being hoisted across the barrier;
 // in other words, this is a Load-Load acquire barrier.
-// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h.
-// #define _GLIBCXX_READ_MEM_BARRIER __asm __volatile ("":::"memory")
+// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h.
+#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence (__ATOMIC_ACQUIRE)
 
 // This one prevents stores from being sunk across the barrier; in other
 // words, a Store-Store release barrier.
-// #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile ("":::"memory")
+#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence (__ATOMIC_RELEASE)
+
+}
 
 #endif
Re: [PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.
On 22/05/15 12:26, Ramana Radhakrishnan wrote:

  Ok for trunk?

  I can't approve but do you mind taking care of -march=armv8-a in the arm backend too as that would have the same issues.

  Ramana

Will do,
Matthew
Re: [patch, testsuite, ARM] don't try to execute advsimd-intrinsics tests on hardware without NEON
On Thu, May 21, 2015 at 6:06 PM, Sandra Loosemore san...@codesourcery.com wrote: On 05/21/2015 03:48 AM, Christophe Lyon wrote: On 21 May 2015 at 07:33, Sandra Loosemore san...@codesourcery.com wrote: ARM testing shares the AArch64 advsimd-intrinsics execution tests. On ARM, though, the NEON support being tested is optional -- some arches are compatible with the NEON compilation options but hardware available for testing might or might not be able to execute those instructions. In arm-none-eabi testing of a long list of multilibs, I found that this problem caused some of the multilibs to get stuck for days because every one of these execution tests was wandering off into the weeds and timing out. The vect.exp tests already handle this by setting dg-do-what-default to either run or compile, depending on whether we have target hardware execution support (arm_neon_hw) for NEON, or only compilation support (arm_neon_ok). So, I've adapted that logic for advsimd-intrinsics.exp too. Indeed it makes sense. It also appeared that the main loop over the test cases was running them all twice with the torture options -- once using c-torture-execute and once using gcc-dg-runtest. I deleted the former since it appears to ignore dg-do-what-default and always try to execute no matter what. My dejagnu-fu isn't the strongest and this is pretty confusing to me am I missing something here? Otherwise, OK to commit? As noted by Alan in https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01348.html the sets of options covered by gcc-dg-runtest and c-torture-execute are slightly different. That was the reason I kept both. We can probably live with no longer testing -Og -g as Alan says. OTOH, are the 2 option sets supposed to be the same, or are there any plans to make them differ substantially in the future? 
Richard, adding -Og -g was your change: https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01367.html Is it an oversight that the torture option lists in c-torture.exp, objc-torture.exp, and gcc-dg.exp are not consistent? Maybe we should have a separate patch to unify them? I think the various torture flags were never consistent. I simply avoided putting even more load on the various combinations tested (we should remove some of them, like the -finline-functions and -funroll-loops cases which add little or nothing today). But yes, that's separate things. I'm fine with a patch to unify the various torture flag list into one common one and a later patch trimming it down somewhat. Thanks, Richard. -Sandra
Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures
On Fri, May 22, 2015 at 4:58 PM, Kyrill Tkachov kyrylo.tkac...@foss.arm.com wrote: Hi Venkat, On 22/05/15 09:50, Kumar, Venkataramanan wrote: Hi Kyrill, Sorry for little delay in responding. -Original Message- From: Kyrill Tkachov [mailto:kyrylo.tkac...@foss.arm.com] Sent: Tuesday, May 19, 2015 9:13 PM To: Kumar, Venkataramanan; James Greenhalgh; gcc-patches@gcc.gnu.org Cc: Ramana Radhakrishnan; seg...@kernel.crashing.org; Marcus Shawcroft Subject: Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures Hi Venkat, On 19/05/15 16:37, Kumar, Venkataramanan wrote: Hi Maintainers, Please find the attached patch, that fixes add/extend gcc test suite failures in Aarch64 target. Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049 These tests started to fail after we prevented combiner from converting shift RTX to mult RTX, when the RTX is not inside a memory operation (r222874) . Now I have added new add/extend patterns which are based on shift operations, to fix these cases. Testing status with the patch. (1) GCC bootstrap on AArch64 successful. (2) SPEC2006 INT runs did not show any degradation. Does that mean there was no performance regression? Or no codegen difference? Yes there was no performance regression. What I'd expect from this patch is that the codegen would be the same as before the combine patch (r222874). A performance difference can sometimes be hard to measure even at worse code quality. Can you please confirm that on SPEC2006 INT the adds and shifts are now back to being combined into a single instruction? I used -dp --save-temps to dump pattern names in assembly files. I used revision before combiner patch (r222873) and the patch on top of trunk (little older r223194) for comparison. Quickly compared the counts generated for the below patterns from the SPEC INT binaries. 
*adds_optabmode_multp2 vs *adds_optabALLX:mode_shft_GPI:mode *subs_optabmode_multp2 vs *subs_optabALLX:mode_shft_GPI:mode *add_uxtmode_multp2 vs *add_uxtmode_shift2 *add_uxtsi_multp2_uxtw vs *add_uxtsi_shift2_uxtw *sub_uxtmode_multp2 vs *sub_uxtmode_shift2 *sub_uxtsi_multp2_uxtw vs *sub_uxtsi_shift2_uxtw *adds_mul_imm_modevs *adds_shift_imm_mode *subs_mul_imm_modevs *subs_shift_imm_mode Patterns are found in few benchmarks. The generated counts are matching in the binaries compiled with the two revisions. Also Looked at assembly generated and the adds and shifts are combined properly using the new patterns. Please let me know if this is OK. Thanks for investigating this. I've run your patch on aarch64.exp and I see the add+shift/extend failures that we were seeing go away (I'm sure you saw that as well ;). Exact what's expected. This patch and the previous combine one are designed to fix those failures. Thanks, bin Up to the maintainers to review the patch. Thanks, Kyrill Thanks, Kyrill (3) gcc regression testing passed. 
(-Snip-) # Comparing 3 common sum files ## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests /tmp/gxx-sum1.24998 /tmp/gxx-sum2.24998 Tests that now work, but didn't before: gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3 gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3 gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2 gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3 gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1 gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3 gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3 gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3 gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2 # No differences found in 3 common sum files (-Snip-) The patterns are fixing the regressing tests, so I have not added any new tests. Regarding removal of the old patterns based on mults, I am planning to do it as a separate work. Is this OK for trunk? gcc/ChangeLog 2015-05-19 Venkataramanan Kumar venkataramanan.ku...@amd.com * config/aarch64/aarch64.md (*adds_shift_imm_mode): New pattern. (*subs_shift_imm_mode): Likewise. (*adds_optabALLX:mode_shift_GPI:mode): Likewise. (*subs_optabALLX:mode_shift_GPI:mode): Likewise. (*add_uxtmode_shift2): Likewise. (*add_uxtsi_shift2_uxtw): Likewise. (*sub_uxtmode_shift2): Likewise. (*sub_uxtsi_shift2_uxtw): Likewise. Regards, Venkat.
Add few cases to operand_equal_p
Hi, I am working on a patch that makes operand_equal_p replace the logic from ipa-icf-gimple's compare_op via a valueizer hook. Currently the patch however cuts the number of merges on firefox to half (apparently because it gives up on some tree codes too early). The patch below merges code from ipa-icf-gimple.c that is missing in fold-const and is needed. Bootstrapped/regtested x86_64-linux, OK? Honza * fold-const.c (operand_equal_p): Handle OBJ_TYPE_REF, CONSTRUCTOR and be more tolerant about FIELD_DECL. * tree.c (add_expr): Handle FIELD_DECL. (prototype_p, virtual_method_call_p, obj_type_ref_class): Constify. * tree.h (prototype_p, virtual_method_call_p, obj_type_ref_class): Constify. Index: fold-const.c === --- fold-const.c(revision 223500) @@ -3037,6 +3058,40 @@ operand_equal_p (const_tree arg0, const_ case DOT_PROD_EXPR: return OP_SAME (0) && OP_SAME (1) && OP_SAME (2); + /* OBJ_TYPE_REF really is just a transparent wrapper around expression, + but it holds additional type information for devirtualization that + needs to be matched. We may want to introduce an OEP flag if we want + to compare the actual value only, or if we also care about effects + of potentially merging the code. This flag can bypass this check + as well as the alias set matching in MEM_REF. */ + case OBJ_TYPE_REF: + { + if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0), + OBJ_TYPE_REF_EXPR (arg1), flags)) + return false; + if (flag_devirtualize && virtual_method_call_p (arg0)) + { + if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0)) + != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1))) + return false; + if (!types_same_for_odr (obj_type_ref_class (arg0), +obj_type_ref_class (arg1))) + return false; + /* OBJ_TYPE_REF_OBJECT is used to track the instance of + object THIS pointer points to. Checking that both + addresses equal is safe approximation of the fact + that dynamic types are equal. 
+ Do not care about the other flags, because this expression + does not contribute to the actual value of OBJ_TYPE_REF */ + if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0), + OBJ_TYPE_REF_OBJECT (arg1), + OEP_ADDRESS_OF)) + return false; + } + + return true; + } + default: return 0; } @@ -3097,6 +3152,21 @@ operand_equal_p (const_tree arg0, const_ } case tcc_declaration: + /* At LTO time the FIELD_DECL may exist in multiple copies. +We only care about offsets and bit offsets for operands. */ + if (TREE_CODE (arg0) == FIELD_DECL) + { + tree offset1 = DECL_FIELD_OFFSET (arg0); + tree offset2 = DECL_FIELD_OFFSET (arg1); + + tree bit_offset1 = DECL_FIELD_BIT_OFFSET (arg0); + tree bit_offset2 = DECL_FIELD_BIT_OFFSET (arg1); + + flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); + + return operand_equal_p (offset1, offset2, flags) + && operand_equal_p (bit_offset1, bit_offset2, flags); + } /* Consider __builtin_sqrt equal to sqrt. */ return (TREE_CODE (arg0) == FUNCTION_DECL && DECL_BUILT_IN (arg0) && DECL_BUILT_IN (arg1) @@ -3104,12 +3174,50 @@ operand_equal_p (const_tree arg0, const_ && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1)); default: + /* In GIMPLE empty constructors are allowed in initializers of +vector types. */ + if (TREE_CODE (arg0) == CONSTRUCTOR) + { + unsigned length1 = vec_safe_length (CONSTRUCTOR_ELTS (arg0)); + unsigned length2 = vec_safe_length (CONSTRUCTOR_ELTS (arg1)); + + if (length1 != length2) + return false; + + flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); + + for (unsigned i = 0; i < length1; i++) + if (!operand_equal_p (CONSTRUCTOR_ELT (arg0, i)->value, + CONSTRUCTOR_ELT (arg1, i)->value, flags)) + return false; + + return true; + } return 0; } #undef OP_SAME #undef OP_SAME_WITH_NULL } Index: tree.c === --- tree.c (revision 223500) +++ tree.c (working copy) @@ -7647,6 +7647,12 @@ add_expr (const_tree t, inchash::hash &h } return; } +case FIELD_DECL: + /* At LTO time the FIELD_DECL may exist in multiple copies. 
+We only care about offsets and bit offsets for operands. */ +
[PATCH][3/n] Reduction vectorization improvements
This does some more cleanup and refactoring with two fixes: the pure SLP compute in vect_analyze_loop_operations was failing to look at pattern stmts, and the vect_is_slp_reduction hunk makes reduction detection fail because the pattern state changes in between reduction detection and vectorization (which re-calls the early code). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2015-05-22 Richard Biener rguent...@suse.de * tree-vectorizer.h (struct _slp_oprnd_info): Add second_pattern member. * tree-vect-loop.c (vect_analyze_loop_operations): Look at patterns when determining whether SLP is pure. (vect_is_slp_reduction): Remove check for pattern stmts. (vect_is_simple_reduction_1): Remove dead code. * tree-vect-slp.c (vect_create_oprnd_info): Initialize second_pattern. (vect_get_and_check_slp_defs): Pass in the stmt number. Allow the first def in a reduction to be not a pattern stmt when the rest of the stmts def are patterns. (vect_build_slp_tree_1): Allow tcc_expression codes like SAD_EXPR and DOT_PROD_EXPR. (vect_build_slp_tree): Adjust. (vect_analyze_slp): Refactor and move BB vect error message ... (vect_slp_analyze_bb_1): ... here. 
Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 223529) +++ gcc/tree-vect-loop.c(working copy) @@ -1399,7 +1399,12 @@ vect_analyze_loop_operations (loop_vec_i { gimple stmt = gsi_stmt (si); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - gcc_assert (stmt_info); + if (STMT_VINFO_IN_PATTERN_P (stmt_info) + && STMT_VINFO_RELATED_STMT (stmt_info)) + { + stmt = STMT_VINFO_RELATED_STMT (stmt_info); + stmt_info = vinfo_for_stmt (stmt); + } if ((STMT_VINFO_RELEVANT_P (stmt_info) || VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) && !PURE_SLP_STMT (stmt_info)) @@ -2031,12 +2036,8 @@ vect_is_slp_reduction (loop_vec_info loo if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))) { - if (vinfo_for_stmt (use_stmt) - && !STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt))) -{ - loop_use_stmt = use_stmt; - nloop_uses++; -} + loop_use_stmt = use_stmt; + nloop_uses++; } else n_out_of_loop_uses++; @@ -2265,9 +2266,7 @@ vect_is_simple_reduction_1 (loop_vec_inf return NULL; } - if (vinfo_for_stmt (use_stmt) - && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt))) -nloop_uses++; + nloop_uses++; if (nloop_uses > 1) { if (dump_enabled_p ()) @@ -2325,9 +2324,7 @@ vect_is_simple_reduction_1 (loop_vec_inf gimple use_stmt = USE_STMT (use_p); if (is_gimple_debug (use_stmt)) continue; - if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)) - && vinfo_for_stmt (use_stmt) - && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt))) + if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))) nloop_uses++; if (nloop_uses > 1) { Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 223529) +++ gcc/tree-vect-slp.c (working copy) @@ -183,6 +183,7 @@ vect_create_oprnd_info (int nops, int gr oprnd_info->first_dt = vect_uninitialized_def; oprnd_info->first_op_type = NULL_TREE; oprnd_info->first_pattern = false; + oprnd_info->second_pattern = false; oprnds_info.quick_push (oprnd_info); } @@ -242,7 +243,7 @@ vect_get_place_in_interleaving_chain (gi static int vect_get_and_check_slp_defs (loop_vec_info 
loop_vinfo, bb_vec_info bb_vinfo, - gimple stmt, bool first, + gimple stmt, unsigned stmt_num, vec<slp_oprnd_info> *oprnds_info) { tree oprnd; @@ -256,6 +257,8 @@ vect_get_and_check_slp_defs (loop_vec_in int first_op_idx = 1; bool commutative = false; bool first_op_cond = false; + bool first = stmt_num == 0; + bool second = stmt_num == 1; if (loop_vinfo) loop = LOOP_VINFO_LOOP (loop_vinfo); @@ -326,7 +329,11 @@ again: && !STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt))) { pattern = true; - if (!first && !oprnd_info->first_pattern) + if (!first && !oprnd_info->first_pattern + /* Allow different pattern state for the defs of the +first stmt in reduction chains. */ + && (oprnd_info->first_dt != vect_reduction_def + || (!second && !oprnd_info->second_pattern)))
Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
On 22/05/15 14:40, Jason Merrill wrote: On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote: + /* Load the guard value only through an atomic acquire load. */ + guard = build_atomic_load (guard, MEMMODEL_ACQUIRE); + /* Check to see if the GUARD is zero. */ guard = get_guard_bits (guard); I wonder if these calls should be reversed, to express that we're only trying to atomically load a byte (on non-ARM targets)? I'm not sure about the impact on other non-ARM targets without some more investigation. + tree orig_src = src; + tree t, addr, val; + unsigned int size; + int fncode; + + size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (src))); + + fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1; + t = builtin_decl_implicit ((enum built_in_function) fncode); + + addr = build1 (ADDR_EXPR, ptr_type, src); + val = build_call_expr (t, 2, addr, mem_model); + + /* First reinterpret the loaded bits in the original type of the load, + then convert to the expected result type. */ + t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val); + return convert (TREE_TYPE (orig_src), t); I don't see anything that changes src here. Sorry 'bout that. Will fix in the next spin depending on comments from others. I take it that you are happy with the patch otherwise ... regards Ramana Jason
[RFC] operand_equal_p with valueization
Hi, with aliasing sanity checks I got burnt again with ipa-icf-gimple's compare_operand doing alias set checks on all types it ever trips across. I always thought that we do not need two equality testers - operand_equal_p and compare_operand - and given that it turns out to be non-trivial to fix issues in compare_operand, I decided to go ahead with the plan to unify them. I think operand_equal_p is the better code base to start from since it is more tested and knows more special cases. The basic idea is to add a valueize hook, in a similar way as other folders do, that allows doing magic inside the recursive comparison, i.e. declaring two trees equal even if they are not (e.g. SSA_NAMEs from different functions). I think it should be useful for GVN/VRP/CCP too to improve folding of conditionals. After trying out the hook and figuring out that ipa-icf does not have a global context to hold its tables, I decided that maybe the more C++ way is to have a comparison class that can be derived and adjusted for other needs. The following patch is a first step. If it is considered sane, I will continue by moving the code to one place - ipa-icf-gimple or a special ipa-icf-op.c. I will also recover Martin's diagnostics that is useful to debug why things are considered different. Also the code should be C++ized, for example it should use true/false instead of 0/1. I think also the iterative hashing should be together with the operand_equal_p implementation because these two must match, as I found with merging the two cases that ipa-icf-gimple gets right and fold-const wrong - OBJ_TYPE_REF, CONSTRUCTOR and PARM_DECL. Finally I think we want a compare_gimple class that does the same for gimple and is independent of the rest of ipa-icf, which may be better suitable for stuff like tail merging. The patch bootstraps/regtests ppc64-linux, but I think it is not quite mainline ready as it noticeably reduces the number of equivalences found. 
I will need to debug why that happens, but I am sending it as an RFC for the basic concept and interfaces. Honza * fold-const.c (operand_equal_p): Reorg to wrapper for (operand_compare::operand_equal_p): This function; add support for valueization, add missing type matching for OBJ_TYPE_REF, CONSTRUCTOR; relax matching of PARM_DECL. (operand_compare::operand_equal_valueize): New. * fold-const.h (operand_equal_p): Update prototype. (class operand_compare): New class. * ipa-icf-gimple.c (func_checker::operand_equal_valueize): Break ipa-icf specific bits out from ... (func_checker::compare_operand): ... here; remove most of generic handling and use operand_compare class. * ipa-icf-gimple.h (operand_compare): New. * ipa-icf.c (sem_function::equals_private): Arrange CFUN to be up to date so operand_equal_p works well for flag_devirtualize. Index: fold-const.c === --- fold-const.c(revision 223500) +++ fold-const.c(working copy) @@ -2716,8 +2730,9 @@ combine_comparisons (location_t loc, are considered the same. It is used when the caller has other ways to ensure that global memory is unchanged in between. */ -int -operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags) +bool +operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, + unsigned int flags) { /* If either is ERROR_MARK, they aren't equal. */ if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK @@ -2868,6 +2883,12 @@ operand_equal_p (const_tree arg0, const_ if (flags & OEP_ONLY_CONST) return 0; + int val = operand_equal_valueize (arg0, arg1, flags); + if (val == 1) +return 1; + if (val == 0) +return 0; + /* Define macros to test an operand from arg0 and arg1 for equality and a variant that allows null and views null as being different from any non-null value. 
In the latter case, if either is null, the both @@ -3104,12 +3174,50 @@ operand_equal_p (const_tree arg0, const_ && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1)); default: return 0; } #undef OP_SAME #undef OP_SAME_WITH_NULL } + +/* Valueizer is a virtual method that allows to introduce extra equalities + that are not directly visible from the operand. + 1 means values are known to be equal, 0 means values are known to be different, + -1 means that operand_equal_p should continue processing. */ +int +operand_compare::operand_equal_valueize (const_tree, const_tree, unsigned int) +{ + return -1; +} + +/* Convenience wrapper around the operand_compare class because usually we do + not need to play with the valueizer. */ +bool +operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags) +{ + static operand_compare default_compare_instance; + return default_compare_instance.operand_equal_p (arg0, arg1, flags); +} /* Similar to operand_equal_p,
Re: [RFC] operand_equal_p with valueization
On Fri, 22 May 2015, Jan Hubicka wrote: Hi, with aliasing sanity checks I got burnt again with ipa-icf-gimple's compare_operand doing alias set checks on all types it ever trips across. I always tought that we do not need two equality testers - operand_equal_p and compare_operand and given that it turns out to be non-trivial to fix issues in compare_operand I decided to go ahead with plan to unify them. I think operand_equal_p is better code base to start from since it is more tested and knows more special cases. The basic idea is to add valeize hook similar way as other folders do that allows to do magic inside the recursive comparsion. I.e. declare two trees equal if they are not (i.e. SSA_NAMES from differen functions). I think it should be useful for GVN/VRP/CCP too to improve folding of conditionals. After trying out the hook and figuring out that ipa-icf does not have global context to hold its tables, I dedcided that maybe more C++ way is to have comparsion class that can be derived an adjusted for other needs. The following patch is a first step. If it is considered sane, I will continue by moving the code to one place - ipa-icf-gimple or special ipa-icf-op.c. I will also recover Martin's diagnostics that is useful to debug why things are considered different. Also the code should be C++ized, for example it should use true/false instead 0/1. I think also the iterative hashing should be together with operand_equal_p implementation because these two must match as I found with merging the two cases that ipa-icf-gimple gets right and fold-const wrong - OBJ_TYPE_REF, CONSTRUCTOR and PARM_DECL. Finally I think we want compare_gimple class that does the same for gimple and is independent of rest of the ipa-icf that may be better suitable for stuff like tail merging. The patch bootstraps/regtests ppc64-linux, but I think it is not quite mainline ready as it noticeably reduces number of equivalences found. 
I will need to debug why that happens, but I am sending it as an RFC for the basic concept and interfaces. I think for the case of IPA ICF it would be better to address the issue that it cannot do merging of functions with TBAA conflicts. That is, drop that TBAA code from IPA ICF and arrange for the IPA inliner to inline original unmerged copies. We were talking about making the original nodes inline clones of the node that eventually prevails, much similar to speculative inlining ones (if I remember that suggestion from Martin correctly). And no, I'm hesitant to change operand_equal_p too much. It's very much deep-rooted into GENERIC. Richard. Honza * fold-const.c (operand_equal_p): Reorg to wrapper for (operand_compare::operand_equal_p): This function; add support for valueization, add missing type matching for OBJ_TYPE_REF, CONSTRUCTOR; relax matching of PARM_DECL. (operand_compare::operand_equal_valueize): New. * fold-const.h (operand_equal_p): Update prototype. (class operand_compare): New class. * ipa-icf-gimple.c (func_checker::operand_equal_valueize): Break ipa-icf specific bits out from ... (func_checker::compare_operand): ... here; remove most of generic handling and use operand_compare class. * ipa-icf-gimple.h (operand_compare): New. * ipa-icf.c (sem_function::equals_private): Arrange CFUN to be up to date so we operand_equal_p works well for flag_devirtualize. Index: fold-const.c === --- fold-const.c (revision 223500) +++ fold-const.c (working copy) @@ -2716,8 +2730,9 @@ combine_comparisons (location_t loc, are considered the same. It is used when the caller has other ways to ensure that global memory is unchanged in between. */ -int -operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags) +bool +operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, + unsigned int flags) { /* If either is ERROR_MARK, they aren't equal. 
*/ if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK @@ -2868,6 +2883,12 @@ operand_equal_p (const_tree arg0, const_ if (flags OEP_ONLY_CONST) return 0; + int val = operand_equal_valueize (arg0, arg1, flags); + if (val == 1) +return 1; + if (val == 0) +return 0; + /* Define macros to test an operand from arg0 and arg1 for equality and a variant that allows null and views null as being different from any non-null value. In the latter case, if either is null, the both @@ -3104,12 +3174,50 @@ operand_equal_p (const_tree arg0, const_ DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1)); default: return 0; } #undef OP_SAME #undef OP_SAME_WITH_NULL } + +/* Valueizer is a virtual method that allows to introduce extra equalities + that are not
Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote: + /* Load the guard value only through an atomic acquire load. */ + guard = build_atomic_load (guard, MEMMODEL_ACQUIRE); + /* Check to see if the GUARD is zero. */ guard = get_guard_bits (guard); I wonder if these calls should be reversed, to express that we're only trying to atomically load a byte (on non-ARM targets)? + tree orig_src = src; + tree t, addr, val; + unsigned int size; + int fncode; + + size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (src))); + + fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1; + t = builtin_decl_implicit ((enum built_in_function) fncode); + + addr = build1 (ADDR_EXPR, ptr_type, src); + val = build_call_expr (t, 2, addr, mem_model); + + /* First reinterpret the loaded bits in the original type of the load, + then convert to the expected result type. */ + t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val); + return convert (TREE_TYPE (orig_src), t); I don't see anything that changes src here. Jason
Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
On Fri, May 22, 2015 at 9:40 AM, Jason Merrill ja...@redhat.com wrote: On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote: + /* Load the guard value only through an atomic acquire load. */ + guard = build_atomic_load (guard, MEMMODEL_ACQUIRE); + /* Check to see if the GUARD is zero. */ guard = get_guard_bits (guard); I wonder if these calls should be reversed, to express that we're only trying to atomically load a byte (on non-ARM targets)? That expresses the semantics more directly, but will that lead to less efficient code on some RISC architectures? - David
[PATCH] Fix ICE in PR66251
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-05-22 Richard Biener rguent...@suse.de PR tree-optimization/66251 * tree-vect-stmts.c (vectorizable_conversion): Properly set STMT_VINFO_VEC_STMT even for the SLP case. * gfortran.fortran-torture/compile/pr66251.f90: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 223529) +++ gcc/tree-vect-stmts.c (working copy) @@ -3964,14 +3964,12 @@ vectorizable_conversion (gimple stmt, gi if (slp_node) SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); + + if (!prev_stmt_info) + STMT_VINFO_VEC_STMT (stmt_info) = new_stmt; else - { - if (!prev_stmt_info) - STMT_VINFO_VEC_STMT (stmt_info) = new_stmt; - else - STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; - prev_stmt_info = vinfo_for_stmt (new_stmt); - } + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); } } Index: gcc/testsuite/gfortran.fortran-torture/compile/pr66251.f90 === --- gcc/testsuite/gfortran.fortran-torture/compile/pr66251.f90 (revision 0) +++ gcc/testsuite/gfortran.fortran-torture/compile/pr66251.f90 (working copy) @@ -0,0 +1,7 @@ +SUBROUTINE dbcsr_data_convert (n) + COMPLEX(KIND=4), DIMENSION(:), POINTER :: s_data_c + COMPLEX(KIND=8), DIMENSION(:), POINTER :: t_data_z + t_data_z(1:n) = CMPLX(s_data_c(1:n), KIND=8) + CALL foo() +END SUBROUTINE dbcsr_data_convert +
Re: [PATCH][ARM] Handle UNSPEC_VOLATILE in rtx costs and don't recurse inside the unspec
On Mon, Apr 20, 2015 at 5:28 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, A pet project of mine is to get to the point where backend rtx costs functions won't have to handle rtxes that don't match down to any patterns/expanders we have. Or at least limit such cases. A case dealt with in this patch is QImode PLUS. We don't actually generate or handle these anywhere in the arm backend *except* in sync.md where, for example, atomic_sync_optabmode matches: (set (match_operand:QHSD 0 mem_noofs_operand +Ua) (unspec_volatile:QHSD [(syncop:QHSD (match_dup 0) (match_operand:QHSD 1 atomic_op_operand atomic_op_str)) (match_operand:SI 2 const_int_operand)];; model VUNSPEC_ATOMIC_OP)) Here QHSD can contain QImode and HImode while syncop can be PLUS. Now immediately during splitting in arm_split_atomic_op we convert that QImode PLUS into an SImode one, so we never actually generate any kind of QImode add operations (how would we? we don't have define_insns for such things) but the RTL optimisers will get a hold of the UNSPEC_VOLATILE in the meantime and ask for its cost (for example, cse when building libatomic). Currently we don't handle UNSPEC_VOLATILE (VUNSPEC_ATOMIC_OP), so the arm rtx costs function just recurses into the QImode PLUS, which I'd like to avoid. This patch stops that by passing the VUNSPEC_ATOMIC_OP into arm_unspec_cost and handling it there (very straightforwardly just returning COSTS_N_INSNS (2); there's no indication that we want to do anything smarter here) and stopping the recursion. This is a small step in the direction of not having to care about obviously useless rtxes in the backend. 
The astute reader might notice that in sync.md we also have the pattern atomic_fetch_sync_optabmode which expands to/matches this: (set (match_operand:QHSD 0 s_register_operand =r) (match_operand:QHSD 1 mem_noofs_operand +Ua)) (set (match_dup 1) (unspec_volatile:QHSD [(syncop:QHSD (match_dup 1) (match_operand:QHSD 2 atomic_op_operand atomic_op_str)) (match_operand:SI 3 const_int_operand)];; model VUNSPEC_ATOMIC_OP)) Here the QImode PLUS is in a PARALLEL together with the UNSPEC, so it might have rtx costs called on it as well. This will always be a (plus (reg) (mem)) rtx, which is unlike any other normal rtx we generate in the arm backend. I'll try to get a patch to handle that case, but I'm still thinking on how to best do that. Tested arm-none-eabi, I didn't see any codegen differences in some compiled codebases. Ok for trunk? OK Ramana P.S. I know that expmed creates all kinds of irregular rtxes and asks for their costs. I'm hoping to clean that up at some point... 2015-04-20 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (arm_new_rtx_costs): Handle UNSPEC_VOLATILE. (arm_unspec_cost): Allow UNSPEC_VOLATILE. Do not recurse inside unknown unspecs.
[Ada] Small enhancement to unchecked conversion warning in -gnatf mode
In full warning mode, when an unchecked conversion is applied to types of different sizes, the compiler issues a warning describing the effects of the conversion on the additional or missing bits. When the source is smaller than the target, it was issuing a specific warning if the target is of a discrete type but, when the source is larger than the target, it wasn't doing the same if the source is of a discrete type. The compiler must issue the following warnings with -gnatf: ucw.ads:12:03: warning: types for unchecked conversion have different sizes ucw.ads:12:03: warning: size of REC is 16, size of INTEGER is 32 ucw.ads:12:03: warning: target value will include 16 undefined high order bits ucw.ads:13:03: warning: types for unchecked conversion have different sizes ucw.ads:13:03: warning: size of INTEGER is 32, size of REC is 16 ucw.ads:13:03: warning: 16 high order bits of source will be ignored with Unchecked_Conversion; package UCW is type Int8_t is mod 2**8; type Rec is record I1 : Int8_t; I2 : Int8_t; end record; function C1 is new Unchecked_Conversion (Source => Rec, Target => Integer); function C2 is new Unchecked_Conversion (Source => Integer, Target => Rec); end UCW; Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Eric Botcazou ebotca...@adacore.com * sem_ch13.adb (Validate_Unchecked_Conversions): Also issue specific warning for discrete types when the source is larger than the target. 
Index: sem_ch13.adb === --- sem_ch13.adb(revision 223534) +++ sem_ch13.adb(working copy) @@ -13483,9 +13483,22 @@ end if; else pragma Assert (Source_Siz > Target_Siz); -Error_Msg - (\?z?^ trailing bits of source will be ignored!, - Eloc); +if Is_Discrete_Type (Source) then + if Bytes_Big_Endian then + Error_Msg +(\?z?^ low order bits of source will be + ignored!, Eloc); + else + Error_Msg +(\?z?^ high order bits of source will be + ignored!, Eloc); + end if; + +else + Error_Msg + (\?z?^ trailing bits of source will be + ignored!, Eloc); +end if; end if; end if; end if;
[Ada] Make sure Volatile_Full_Access is treated like Atomic
This update makes sure that Volatile_Full_Access is treated like Atomic in all cases except checking specific RM legality rules, and controlling atomic synchronization. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Robert Dewar de...@adacore.com * exp_ch5.adb, layout.adb, einfo.adb, einfo.ads, sem_prag.adb, freeze.adb, freeze.ads, sem_util.adb, sem_util.ads, exp_ch2.adb, exp_ch4.adb, errout.adb, exp_aggr.adb, sem_ch13.adb: This is a general change that deals with the fact that most of the special coding for Atomic should also apply to the case of Volatile_Full_Access. A new attribute Is_Atomic_Or_VFA is introduced, and many of the references to Is_Atomic now use this new attribute. Index: exp_ch5.adb === --- exp_ch5.adb (revision 223476) +++ exp_ch5.adb (working copy) @@ -429,11 +429,11 @@ elsif Has_Controlled_Component (L_Type) then Loop_Required := True; - -- If object is atomic, we cannot tolerate a loop + -- If object is atomic/VFA, we cannot tolerate a loop - elsif Is_Atomic_Object (Act_Lhs) + elsif Is_Atomic_Or_VFA_Object (Act_Lhs) or else -Is_Atomic_Object (Act_Rhs) +Is_Atomic_Or_VFA_Object (Act_Rhs) then return; @@ -442,8 +442,8 @@ elsif Has_Atomic_Components (L_Type) or else Has_Atomic_Components (R_Type) -or else Is_Atomic (Component_Type (L_Type)) -or else Is_Atomic (Component_Type (R_Type)) +or else Is_Atomic_Or_VFA (Component_Type (L_Type)) +or else Is_Atomic_Or_VFA (Component_Type (R_Type)) then Loop_Required := True; @@ -3395,7 +3395,7 @@ Next_Elmt (Prim); end loop; - -- default iterator must exist. + -- Default iterator must exist pragma Assert (False); Index: layout.adb === --- layout.adb (revision 223476) +++ layout.adb (working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 2001-2014, Free Software Foundation, Inc. -- +-- Copyright (C) 2001-2015, Free Software Foundation, Inc. 
-- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -2684,11 +2684,11 @@ elsif Is_Array_Type (E) then --- For arrays that are required to be atomic, we do the same +-- For arrays that are required to be atomic/VFA, we do the same -- processing as described above for short records, since we -- really need to have the alignment set for the whole array. -if Is_Atomic (E) and then not Debug_Flag_Q then +if Is_Atomic_Or_VFA (E) and then not Debug_Flag_Q then Set_Composite_Alignment (E); end if; @@ -2903,11 +2903,19 @@ and then Is_Record_Type (E) and then Is_Packed (E) then - -- No effect for record with atomic components + -- No effect for record with atomic/VFA components - if Is_Atomic (E) then + if Is_Atomic_Or_VFA (E) then Error_Msg_N (Optimize_Alignment has no effect for ??, E); -Error_Msg_N (\pragma ignored for atomic record??, E); + +if Is_Atomic (E) then + Error_Msg_N + (\pragma ignored for atomic record??, E); +else + Error_Msg_N + (\pragma ignored for bolatile full access record??, E); +end if; + return; end if; @@ -2920,20 +2928,30 @@ return; end if; - -- No effect if any component is atomic or is a by reference type + -- No effect if any component is atomic/VFA or is a by reference type declare Ent : Entity_Id; + begin Ent := First_Component_Or_Discriminant (E); while Present (Ent) loop if Is_By_Reference_Type (Etype (Ent)) - or else Is_Atomic (Etype (Ent)) - or else Is_Atomic (Ent) + or else Is_Atomic_Or_VFA (Etype (Ent)) + or else Is_Atomic_Or_VFA (Ent) then Error_Msg_N (Optimize_Alignment has no effect for ??, E); - Error_Msg_N -(\pragma is ignored if atomic components present??, E); + +
Re: Add few cases to operand_equal_p
+ case OBJ_TYPE_REF: + { + if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0), + OBJ_TYPE_REF_EXPR (arg1), flags)) + return false; + if (flag_devirtualize && virtual_method_call_p (arg0)) + { + if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0)) + != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1))) + return false; + if (!types_same_for_odr (obj_type_ref_class (arg0), +obj_type_ref_class (arg1))) + return false; devirt machinery in operand_equal_p? please not. not more places! OBJ_TYPE_REF explicitly says what is the type of the class whose virtual method is called. It is a GIMPLE operand expression like any other, so I think we need to handle it. Actually I think I can just drop the flag_devirtualize check because with !flag_devirtualize we should simply avoid having OBJ_TYPE_REF around. That would make it look less devirt-specific. We can also test types for equivalence of the main variant, but that would just introduce false positives when LTO did not merge two identical ODR types. It would be correct though. + /* OBJ_TYPE_REF_OBJECT is used to track the instance of + object THIS pointer points to. Checking that both + addresses equal is safe approximation of the fact + that dynamic types are equal. + Do not care about the other flags, because this expression + does not contribute to the actual value of OBJ_TYPE_REF */ + if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0), + OBJ_TYPE_REF_OBJECT (arg1), + OEP_ADDRESS_OF)) + return false; + } + + return true; + } + default: return 0; } @@ -3097,6 +3152,21 @@ operand_equal_p (const_tree arg0, const_ } case tcc_declaration: + /* At LTO time the FIELD_DECL may exist in multiple copies. +We only care about offsets and bit offsets for operands. */ Err? Even at LTO time FIELD_DECLs should only appear once. So - testcase? FIELD_DECL has TREE_TYPE and TREE_TYPE may not get merged by LTO.
So if the two expressions originate from two different units, we may have two semantically equivalent FIELD_DECLs (of compatible types and same offsets) that occupy different memory locations because their merging was prevented by something upstream (like a complete wrt incomplete pointer in the type) + /* In GIMPLE empty constructors are allowed in initializers of +vector types. */ Why this comment about GIMPLE? This is about comparing GENERIC trees which of course can have CONSTRUCTORs of various sorts. + if (TREE_CODE (arg0) == CONSTRUCTOR) + { + unsigned length1 = vec_safe_length (CONSTRUCTOR_ELTS (arg0)); + unsigned length2 = vec_safe_length (CONSTRUCTOR_ELTS (arg1)); + + if (length1 != length2) + return false; + + flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); + + for (unsigned i = 0; i < length1; i++) + if (!operand_equal_p (CONSTRUCTOR_ELT (arg0, i)->value, + CONSTRUCTOR_ELT (arg1, i)->value, flags)) You need to compare ->index as well here. I'm not sure constructor values are sorted always so you might get very many false negatives. many false negatives is better than all false negatives :) You are right, either I should punt on non-empty constructors or compare the indexes, will do the second. +case FIELD_DECL: + /* At LTO time the FIELD_DECL may exist in multiple copies. +We only care about offsets and bit offsets for operands. */ So explain that this is the reason we don't want to hash by DECL_UID. I still think this is bogus. Will do if we agree on having this. I know you would like ipa-icf to keep original bodies and use them for inlining, declaring alias sets to be function-local. This is the wrong plan.
Consider: void t(int *ptr) { *ptr=1; } int a(int *ptr1, int *ptr2) { int a = *ptr1; t(ptr2); return a+*ptr1; } long b(long *ptr1, int *ptr2) { int a = *ptr1; t(ptr2); return a+*ptr1; } here aliasing leads to the two functions being optimized differently: a: .LFB1: .cfi_startproc movl 4(%esp), %edx movl 8(%esp), %ecx movl (%edx), %eax movl $1, (%ecx) addl (%edx), %eax ret .cfi_endproc b: .LFB2: .cfi_startproc movl 4(%esp), %eax movl 8(%esp), %edx movl (%eax), %eax movl $1, (%edx) addl %eax, %eax ret .cfi_endproc however with -fno-early-inlining the functions look identical (modulo alias sets) at ipa-icf time. If we merged a/b, we could get wrong code for a even though no inlining of a or b happens. So either we match the alias sets
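The hazard Honza describes can be reproduced in a small standalone program. This is a sketch reusing the function names from the mail; only the well-defined int/int case is exercised, since the long/int variant relies on type-based aliasing and its behavior is undefined at the C level. The point is that the store through ptr2 may alias *ptr1, so the compiler must reload *ptr1 after the call rather than reuse the first load.

```c
/* Reconstruction of the example from the mail above (names t/a taken
   from the mail; the aliasing-sensitive long/int function b is omitted
   because calling it with overlapping pointers is undefined in C). */
static void t(int *ptr)
{
  *ptr = 1;                 /* store that may alias the caller's *ptr1 */
}

static int a(int *ptr1, int *ptr2)
{
  int v = *ptr1;            /* first load */
  t(ptr2);                  /* possible store into *ptr1 */
  return v + *ptr1;         /* second load must not be merged with the first */
}
```

With ptr1 == ptr2 the second load observes the store from t(), so a(&x, &x) with x == 5 returns 5 + 1 = 6, while a call with distinct objects returns twice the first value.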
Re: [RFC] operand_equal_p with valueization
And no, I'm hesitant to change operand_equal_p too much. It's very much deep-rooted into GENERIC. OK, as another option, I can bring the relevant logic from operand_equal_p to ipa-icf and separate it into the compare_operand class like I did. Use it in ipa-icf-gimple now and we can slowly turn other uses of operand_equal_p into compare_operand users in the middle end. I agree that operand_equal_p is a bit crazy code and it does not handle quite a few things we could do at gimple. I have nothing against going this direction. (after all I do not like touching fold-const much because it works on generic, gimple and FE non-generic and it is not well specified what it should do) Honza
Re: [patch 1/10] debug-early merge: Ada front-end
On 05/22/2015 04:31 AM, Eric Botcazou wrote: My apologies for the delay on Ada. I have reworked the patch to leave the first pass on the TYPE_DECLs which are definitely needed. I also optimized things a bit, since we don't need to save all the globals any more. Thanks, this looks fine modulo a couple of nits, see below. There is one regression on which I'd like your and Jason's input before proceeding: +FAIL: gnat.dg/specs/debug1.ads scan-assembler-times DW_AT_artificial 17 The problem is that the Ada front-end twiddles the DECL_ARTIFICIAL flag *after* it has called debug_hooks->early_global_decl(). The function gnat_to_gnu_entity() calls create_var_decl_1 -> rest_of_decl_compilation, but then much later twiddles DECL_ARTIFICIAL: if (!Comes_From_Source (gnat_entity)) DECL_ARTIFICIAL (gnu_decl) = 1; Twiddling DECL_ARTIFICIAL after we create early dwarf means the DIE does not get the DW_AT_artificial. Would it be possible for you guys (ahem, Ada folk) to set DECL_ARTIFICIAL before calling rest_of_decl_compilation? Sure, just add a ??? comment before the above lines for now. Done. @@ -535,8 +535,7 @@ extern tree gnat_type_for_size (unsigned precision, int unsignedp); an unsigned type; otherwise a signed type is returned. */ extern tree gnat_type_for_mode (machine_mode mode, int unsignedp); -/* Emit debug info for all global variable declarations. */ -extern void gnat_write_global_declarations (void); +extern void note_types_used_by_globals (void); The comment needs to be adjusted instead of removed and preferably be the same as the one (adjusted too) above the function in utils.c. Done. I have queued the attached patch for merge into mainline once the rest of the patches are approved. Thank you. gcc/ada/ * gcc-interface/gigi.h (note_types_used_by_globals): Rename from gnat_write_global_declarations. * gcc-interface/misc.c (gnat_parse_file): Call note_types_used_by_globals. Remove LANG_HOOKS_WRITE_GLOBALS. * gcc-interface/utils.c: Rename global_decls to type_decls.
(gnat_write_global_declarations): Rename to note_types_used_by_globals. Remove call to finalize_compilation_unit. Remove debug_hooks->global_decl() call for globals. (gnat_pushdecls): Only insert into type_decls if TYPE_DECL. diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c index d908a1b..e778616 100644 --- a/gcc/ada/gcc-interface/decl.c +++ b/gcc/ada/gcc-interface/decl.c @@ -5241,6 +5241,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, int definition) && !Present (Alias (gnat_entity)) && !(Present (Renamed_Object (gnat_entity)) && saved)) { + /* ??? DECL_ARTIFICIAL, and possibly DECL_IGNORED_P below, should + be set before calling rest_of_decl_compilation above (through + create_var_decl_1). This is because rest_of_decl_compilation + calls the debugging backend and will create a DIE without + DW_AT_artificial. + + This is currently causing gnat.dg/specs/debug1.ads to FAIL. */ if (!Comes_From_Source (gnat_entity)) DECL_ARTIFICIAL (gnu_decl) = 1; diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h index 6d65fc5..ace4ac8 100644 --- a/gcc/ada/gcc-interface/gigi.h +++ b/gcc/ada/gcc-interface/gigi.h @@ -535,8 +535,9 @@ extern tree gnat_type_for_size (unsigned precision, int unsignedp); an unsigned type; otherwise a signed type is returned. */ extern tree gnat_type_for_mode (machine_mode mode, int unsignedp); -/* Emit debug info for all global variable declarations. */ -extern void gnat_write_global_declarations (void); +/* Keep track of types used at the global level and emit debug info + for all global types. */ +extern void note_types_used_by_globals (void); /* Return the unsigned version of a TYPE_NODE, a scalar type. */ extern tree gnat_unsigned_type (tree type_node); diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c index edaab97..2056a61 100644 --- a/gcc/ada/gcc-interface/misc.c +++ b/gcc/ada/gcc-interface/misc.c @@ -122,6 +122,8 @@ gnat_parse_file (void) /* Call the front end.
*/ _ada_gnat1drv (); + + note_types_used_by_globals (); } /* Return language mask for option processing. */ @@ -972,8 +974,6 @@ gnat_init_ts (void) #define LANG_HOOKS_GETDECLS lhd_return_null_tree_v #undef LANG_HOOKS_PUSHDECL #define LANG_HOOKS_PUSHDECL gnat_return_tree -#undef LANG_HOOKS_WRITE_GLOBALS -#define LANG_HOOKS_WRITE_GLOBALS gnat_write_global_declarations #undef LANG_HOOKS_GET_ALIAS_SET #define LANG_HOOKS_GET_ALIAS_SET gnat_get_alias_set #undef LANG_HOOKS_PRINT_DECL diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c index 5968857..915cbe1 100644 --- a/gcc/ada/gcc-interface/utils.c +++ b/gcc/ada/gcc-interface/utils.c @@ -227,8 +227,8 @@ static GTY((deletable)) struct gnat_binding_level *free_binding_level; /* The context to be used for global declarations. */ static GTY(()) tree global_context; -/* An array of global
Re: Do not compute alias sets for types that don't need them
On Fri, 22 May 2015, Jan Hubicka wrote: Hi, this patch fixes a few cases where we compute an alias set and don't need to; these were found by adding a type_with_alias_set_p check to alias.c (I will send this patch separately as there is still one ICE caught by it, I believe originating from ipa-icf-gimple; I have a more involved fix for that). The point of all this is to check that we don't make bogus queries to the alias oracle and don't try to use them somehow - we did in ipa-icf-gimple. This is a bug as the alias oracle gives stupid answers to stupid questions and sometimes thinks the types conflict, sometimes thinks they don't. The other three cases are sort of just pointless computations. The next part I would like to break out is the path computing equivalences considering incomplete types (i.e. where all structures are equivalent and so all unions or all functions of a given return value) Bootstrapped/regtested x86_64-linux, OK? Honza * tree-streamer-out.c (pack_ts_type_common_value_fields): Only consider alias set for types that have them. * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise. * emit-rtl.c (set_mem_attributes_minus_bitpos): Likewise. * ipa-icf-gimple.c (func_checker::compatible_types_p): Likewise. Index: tree-streamer-out.c === --- tree-streamer-out.c (revision 223508) +++ tree-streamer-out.c (working copy) @@ -346,6 +346,7 @@ pack_ts_type_common_value_fields (struct alias-set zero to this type. */ bp_pack_var_len_int (bp, (TYPE_ALIAS_SET (expr) == 0 || (!in_lto_p + && type_with_alias_set_p (expr) && get_alias_set (expr) == 0)) ? 0 : -1); } Index: lto-streamer-out.c === --- lto-streamer-out.c(revision 223508) +++ lto-streamer-out.c(working copy) @@ -1146,6 +1146,7 @@ hash_tree (struct streamer_tree_cache_d hstate.add_int (TYPE_ALIGN (t)); hstate.add_int ((TYPE_ALIAS_SET (t) == 0 || (!in_lto_p + && type_with_alias_set_p (t) && get_alias_set (t) == 0)) ?
0 : -1); } Index: emit-rtl.c === --- emit-rtl.c(revision 223508) +++ emit-rtl.c(working copy) @@ -1787,8 +1787,15 @@ set_mem_attributes_minus_bitpos (rtx ref memset (attrs, 0, sizeof (attrs)); /* Get the alias set from the expression or type (perhaps using a - front-end routine) and use it. */ - attrs.alias = get_alias_set (t); + front-end routine) and use it. + + We may be called to produce MEM RTX for variable of incomplete type. + This MEM RTX will only be used to produce address of a variable, so + we do not need to compute alias set. */ But why do we set mem_attributes for it then? + if (!DECL_P (t) || type_with_alias_set_p (TYPE_MAIN_VARIANT (TREE_TYPE (t)))) +attrs.alias = get_alias_set (t); + else +gcc_assert (DECL_P (t) && TREE_ADDRESSABLE (t)); This assert also looks fishy to me... (also just use 'type' instead of TREE_TYPE (t), not sure why you need to pun to the main variant here either - get_alias_set will do that for you and so should type_with_alias_set_p if that is necessary). The rest of the patch looks ok. Thanks, Richard. MEM_VOLATILE_P (ref) |= TYPE_VOLATILE (type); MEM_POINTER (ref) = POINTER_TYPE_P (type); Index: ipa-icf-gimple.c === --- ipa-icf-gimple.c (revision 223508) +++ ipa-icf-gimple.c (working copy) @@ -274,6 +274,12 @@ func_checker::compatible_types_p (tree t if (!types_compatible_p (t1, t2)) return return_false_with_msg ("types are not compatible"); + /* FIXME: type compatibility checks for types that are never stored + to memory are not very useful. We should update the code to avoid + calling compatible_types_p on types that will never be used. */ + if (!type_with_alias_set_p (t1) || !type_with_alias_set_p (t2)) +return true; + if (get_alias_set (t1) != get_alias_set (t2)) return return_false_with_msg ("alias sets are different"); -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Do not compute alias sets for types that don't need them
Hi, this patch fixes a few cases where we compute an alias set and don't need to; these were found by adding a type_with_alias_set_p check to alias.c (I will send this patch separately as there is still one ICE caught by it, I believe originating from ipa-icf-gimple; I have a more involved fix for that). The point of all this is to check that we don't make bogus queries to the alias oracle and don't try to use them somehow - we did in ipa-icf-gimple. This is a bug as the alias oracle gives stupid answers to stupid questions and sometimes thinks the types conflict, sometimes thinks they don't. The other three cases are sort of just pointless computations. The next part I would like to break out is the path computing equivalences considering incomplete types (i.e. where all structures are equivalent and so all unions or all functions of a given return value) Bootstrapped/regtested x86_64-linux, OK? Honza * tree-streamer-out.c (pack_ts_type_common_value_fields): Only consider alias set for types that have them. * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise. * emit-rtl.c (set_mem_attributes_minus_bitpos): Likewise. * ipa-icf-gimple.c (func_checker::compatible_types_p): Likewise. Index: tree-streamer-out.c === --- tree-streamer-out.c (revision 223508) +++ tree-streamer-out.c (working copy) @@ -346,6 +346,7 @@ pack_ts_type_common_value_fields (struct alias-set zero to this type. */ bp_pack_var_len_int (bp, (TYPE_ALIAS_SET (expr) == 0 || (!in_lto_p + && type_with_alias_set_p (expr) && get_alias_set (expr) == 0)) ? 0 : -1); } Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 223508) +++ lto-streamer-out.c (working copy) @@ -1146,6 +1146,7 @@ hash_tree (struct streamer_tree_cache_d hstate.add_int (TYPE_ALIGN (t)); hstate.add_int ((TYPE_ALIAS_SET (t) == 0 || (!in_lto_p + && type_with_alias_set_p (t) && get_alias_set (t) == 0)) ?
0 : -1); } Index: emit-rtl.c === --- emit-rtl.c (revision 223508) +++ emit-rtl.c (working copy) @@ -1787,8 +1787,15 @@ set_mem_attributes_minus_bitpos (rtx ref memset (attrs, 0, sizeof (attrs)); /* Get the alias set from the expression or type (perhaps using a - front-end routine) and use it. */ - attrs.alias = get_alias_set (t); + front-end routine) and use it. + + We may be called to produce MEM RTX for variable of incomplete type. + This MEM RTX will only be used to produce address of a variable, so + we do not need to compute alias set. */ + if (!DECL_P (t) || type_with_alias_set_p (TYPE_MAIN_VARIANT (TREE_TYPE (t)))) +attrs.alias = get_alias_set (t); + else +gcc_assert (DECL_P (t) && TREE_ADDRESSABLE (t)); MEM_VOLATILE_P (ref) |= TYPE_VOLATILE (type); MEM_POINTER (ref) = POINTER_TYPE_P (type); Index: ipa-icf-gimple.c === --- ipa-icf-gimple.c(revision 223508) +++ ipa-icf-gimple.c(working copy) @@ -274,6 +274,12 @@ func_checker::compatible_types_p (tree t if (!types_compatible_p (t1, t2)) return return_false_with_msg ("types are not compatible"); + /* FIXME: type compatibility checks for types that are never stored + to memory are not very useful. We should update the code to avoid + calling compatible_types_p on types that will never be used. */ + if (!type_with_alias_set_p (t1) || !type_with_alias_set_p (t2)) +return true; + if (get_alias_set (t1) != get_alias_set (t2)) return return_false_with_msg ("alias sets are different");
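The situation the emit-rtl.c comment describes, a DECL_RTL needed only for its address, corresponds to a well-known C-level fact: a variable of a still-incomplete type can legally have its address taken, and only an actual load or store requires the complete type (and hence an alias set). This is a standalone illustration, not GCC code:

```c
/* A variable whose type is incomplete at the point of use can still
   have its address taken; no alias set is needed until the object is
   actually accessed. */
struct S;                    /* incomplete here */
extern struct S s;

struct S *addr_of_s(void)
{
  return &s;                 /* address only: no load or store of s */
}

struct S { int x; };         /* type completed later in the TU */
struct S s = { 42 };
```

Accessing `addr_of_s()->x` only becomes legal once struct S is complete, which is exactly the point where an alias set for it first matters.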
[Ada] Duplicate symbol xxxAM due to anonymous access allocation
This patch reimplements the generation of anonymous finalization masters used in servicing anonymous access-to-controlled type allocations. The modification prevents the generation of a duplicate anonymous master in certain cases. -- Source -- -- gen_pack.ads with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; generic package Gen_Pack is type Rec is tagged record Comp : Unbounded_String; end record; procedure Force_Body; end Gen_Pack; -- gen_pack.adb package body Gen_Pack is Ptr : access Rec := new Rec; procedure Force_Body is begin null; end Force_Body; end Gen_Pack; -- pack.ads with Gen_Pack; package Pack is procedure Force_Body; package Inst_1 is new Gen_Pack; end Pack; -- pack.adb package body Pack is procedure Force_Body is begin null; end Force_Body; package Inst_2 is new Gen_Pack; end Pack; - -- Compilation -- - $ gcc -c pack.adb Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Hristian Kirtchev kirtc...@adacore.com * einfo.adb Node36 is now used as Anonymous_Master. Flag253 is now unused. (Anonymous_Master): New routine. (Has_Anonymous_Master): Removed. (Set_Anonymous_Master): New routine. (Set_Has_Anonymous_Master): Removed. (Write_Entity_Flags): Remove the output for Has_Anonymous_Master. (Write_Field36_Name): Add output for Anonymous_Master. * einfo.ads Add new attribute Anonymous_Master along with occurrences in nodes. Remove attribute Has_Anonymous_Master along with occurrences in nodes. (Anonymous_Master): New routine along with pragma Inline. (Has_Anonymous_Master): Removed along with pragma Inline. (Set_Anonymous_Master): New routine along with pragma Inline. (Set_Has_Anonymous_Master): Removed along with pragma Inline. * exp_ch4.adb (Create_Anonymous_Master): New routine. (Current_Anonymous_Master): Reimplemented.
Index: einfo.adb === --- einfo.adb (revision 223546) +++ einfo.adb (working copy) @@ -264,7 +264,8 @@ --Import_Pragma Node35 - --(unused)Node36 + --Anonymous_MasterNode36 + --(unused)Node38 --(unused)Node39 --(unused)Node40 @@ -556,7 +557,6 @@ --Has_Implicit_DereferenceFlag251 --Is_Processed_Transient Flag252 - --Has_Anonymous_MasterFlag253 --Is_Implementation_Defined Flag254 --Is_Predicate_Function Flag255 --Is_Predicate_Function_M Flag256 @@ -594,6 +594,7 @@ --Has_Volatile_Full_AccessFlag285 --Needs_Typedef Flag286 + --(unused)Flag253 --(unused)Flag287 --(unused)Flag288 --(unused)Flag289 @@ -753,6 +754,12 @@ return Uint14 (Id); end Alignment; + function Anonymous_Master (Id : E) return E is + begin + pragma Assert (Ekind_In (Id, E_Function, E_Package, E_Procedure)); + return Node36 (Id); + end Anonymous_Master; + function Associated_Entity (Id : E) return E is begin return Node37 (Id); @@ -1375,13 +1382,6 @@ return Flag79 (Id); end Has_All_Calls_Remote; - function Has_Anonymous_Master (Id : E) return B is - begin - pragma Assert -(Ekind_In (Id, E_Function, E_Package, E_Package_Body, E_Procedure)); - return Flag253 (Id); - end Has_Anonymous_Master; - function Has_Atomic_Components (Id : E) return B is begin return Flag86 (Implementation_Base_Type (Id)); @@ -3576,6 +3576,12 @@ Set_Elist16 (Id, V); end Set_Access_Disp_Table; + procedure Set_Anonymous_Master (Id : E; V : E) is + begin + pragma Assert (Ekind_In (Id, E_Function, E_Package, E_Procedure)); + Set_Node36 (Id, V); + end Set_Anonymous_Master; + procedure Set_Associated_Entity (Id : E; V : E) is begin Set_Node37 (Id, V); @@ -4246,13 +4252,6 @@ Set_Flag79 (Id, V); end Set_Has_All_Calls_Remote; - procedure Set_Has_Anonymous_Master (Id : E; V : B := True) is - begin - pragma Assert -(Ekind_In (Id, E_Function, E_Package, E_Package_Body, E_Procedure)); - Set_Flag253 (Id, V); - end Set_Has_Anonymous_Master; - procedure Set_Has_Atomic_Components (Id : E; V : B := True) is begin pragma Assert (not Is_Type (Id) or 
else Is_Base_Type (Id)); @@ -8634,7 +8633,6 @@ W (Has_Aliased_Components, Flag135 (Id)); W (Has_Alignment_Clause,Flag46 (Id)); W (Has_All_Calls_Remote,Flag79 (Id)); - W (Has_Anonymous_Master,
[Ada] Size should be zero for null range discrete subtype
The 'Size of a discrete subtype with a null range should be zero. The following test: 1. with Ada.Text_IO; use Ada.Text_IO; 2. procedure Static_Null_Range_Size is 3.subtype Static_Null_Range is 4. Integer range 5 .. 0; 5.Dummy : Static_Null_Range; 6. begin 7.Put_Line ("Static_Null_Range:"); 8.Put_Line ("Type'Size is " & 9.Natural'Image 10.(Static_Null_Range'Size)); 11.Put_Line ("Object'Size is " & 12.Natural'Image (Dummy'Size)); 13.if Static_Null_Range'Size /= 0 then 14. raise Program_Error; 15.end if; 16. end Static_Null_Range_Size; should generate the output: Static_Null_Range: Type'Size is 0 Object'Size is 32 and not raise an exception. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Robert Dewar de...@adacore.com * sem_ch13.adb (Minimum_Size): Size is zero for null range discrete subtype. Index: sem_ch13.adb === --- sem_ch13.adb(revision 223555) +++ sem_ch13.adb(working copy) @@ -11718,11 +11718,20 @@ Lo := Uint_0; end if; + -- Null range case, size is always zero. We only do this in the discrete + -- type case, since that's the odd case that came up. Probably we should + -- also do this in the fixed-point case, but doing so causes peculiar + -- gigi failures, and it is not worth worrying about this incredibly + -- marginal case (explicit null-range fixed-point type declarations)??? + + if Lo > Hi and then Is_Discrete_Type (T) then + S := 0; + -- Signed case. Note that we consider types like range 1 .. -1 to be -- signed for the purpose of computing the size, since the bounds have -- to be accommodated in the base type. - if Lo < 0 or else Hi < 0 then + elsif Lo < 0 or else Hi < 0 then S := 1; B := Uint_1;
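The sizing rule the patch adjusts can be sketched as a small helper (a hypothetical function, not GNAT code): a null range (Lo > Hi) needs 0 bits, a range with a negative bound is sized for a signed representation, and otherwise the size is the number of bits needed to hold the upper bound.

```c
/* Sketch of the Minimum_Size rule discussed above (hypothetical helper,
   not the GNAT implementation). */
static unsigned min_size_bits(long lo, long hi)
{
  if (lo > hi)
    return 0;                       /* null range: zero bits, as in the patch */

  if (lo < 0 || hi < 0)             /* signed case */
    {
      unsigned s = 1;
      long b = 1;                   /* s bits cover -b .. b-1 */
      while (lo < -b || hi > b - 1)
        {
          s++;
          b *= 2;
        }
      return s;
    }

  unsigned s = 0;                   /* unsigned case */
  unsigned long b = 1;              /* s bits cover 0 .. b-1 */
  while ((unsigned long) hi > b - 1)
    {
      s++;
      b *= 2;
    }
  return s;
}
```

For the subtype in the test, Integer range 5 .. 0, this yields 0, matching the expected Type'Size output.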
[Ada] Internal crash on package instantation compilation unit
This patch updates the implementation of anonymous masters that support finalization actions of anonymous access-to-controlled type allocations to handle package instantiations that act as a compilation unit. -- Source -- -- q.ads package Q is type Obj_T is tagged null record; type T is access all Obj_T'Class; end Q; -- r.ads with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; with Q; package R is type Obj_T is new Q.Obj_T with record S : Unbounded_String; end record; function Create return Obj_T; end R; -- h.ads with Q; with R; generic type T is private; package H is type Obj_T is new R.Obj_T with null record; function Func return Q.T; end H; -- h.adb package body H is Temp : constant Obj_T := (R.Create with null record); Temp_Ptr : constant access Obj_T := new Obj_T'(Temp); Data : constant Q.T := Q.T (Temp_Ptr); function Func return Q.T is begin return Data; end; end H; -- g.ads with H; with Q; generic type T is private; package G is package My_H is new H (Boolean); type Obj_T is new My_H.Obj_T with null record; function Func return Q.T; end G; -- g.adb package body G is Data : constant Q.T := new Obj_T'(My_H.Obj_T (My_H.Func.all) with null record); function Func return Q.T is begin return Data; end; end G; -- p.ads with G; package P is new G (Boolean); - -- Compilation -- - $ gcc -c p.ads Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Hristian Kirtchev kirtc...@adacore.com * einfo.adb (Anonymous_Master): This attribute now applies to package and subprogram bodies. (Set_Anonymous_Master): This attribute now applies to package and subprogram bodies. (Write_Field36_Name): Add output for package and subprogram bodies. * einfo.ads Update the documentation on attribute Anonymous_Master along with occurrences in entities. * exp_ch4.adb (Create_Anonymous_Master): Reimplemented to handle spec and body anonymous masters of the same unit. (Current_Anonymous_Master): Reimplemented. Handle a package instantiation that acts as a compilation unit. 
(Insert_And_Analyze): Reimplemented. Index: einfo.adb === --- einfo.adb (revision 223553) +++ einfo.adb (working copy) @@ -757,7 +757,11 @@ function Anonymous_Master (Id : E) return E is begin - pragma Assert (Ekind_In (Id, E_Function, E_Package, E_Procedure)); + pragma Assert (Ekind_In (Id, E_Function, + E_Package, + E_Package_Body, + E_Procedure, + E_Subprogram_Body)); return Node36 (Id); end Anonymous_Master; @@ -3586,7 +3590,11 @@ procedure Set_Anonymous_Master (Id : E; V : E) is begin - pragma Assert (Ekind_In (Id, E_Function, E_Package, E_Procedure)); + pragma Assert (Ekind_In (Id, E_Function, + E_Package, + E_Package_Body, + E_Procedure, + E_Subprogram_Body)); Set_Node36 (Id, V); end Set_Anonymous_Master; @@ -10141,7 +10149,9 @@ when E_Function | E_Operator | E_Package | - E_Procedure => + E_Package_Body | + E_Procedure | + E_Subprogram_Body => Write_Str ("Anonymous_Master"); when others => Index: einfo.ads === --- einfo.ads (revision 223553) +++ einfo.ads (working copy) @@ -437,10 +437,10 @@ -- into an attribute definition clause for this purpose. --Anonymous_Master (Node36) --- Defined in the entities of non-generic subprogram and package units. --- Contains the entity of a special heterogeneous finalization master --- that services most anonymous access-to-controlled allocations that --- occur within the unit. +-- Defined in the entities of non-generic packages, subprograms and their +-- corresponding bodies. Contains the entity of a special heterogeneous +-- finalization master that services most anonymous access-to-controlled +-- allocations that occur within the unit. --Associated_Entity (Node37) -- Defined in all entities. This field is similar to Associated_Node, but @@ -6096,6 +6096,7 @@ --SPARK_Pragma(Node32) --SPARK_Aux_Pragma
Re: Add few cases to operand_equal_p
On Fri, 22 May 2015, Jan Hubicka wrote: Hi, I am working on a patch that makes operand_equal_p replace logic from ipa-icf-gimple's compare_op via a valueizer hook. Currently the patch however cuts the number of merges on firefox to half (apparently because it gives up on some tree codes too early) The patch below merges code from ipa-icf-gimple.c that is missing in fold-const and is needed. Bootstrapped/regtested x86_64-linux, OK? No, I don't like this. Honza * fold-const.c (operand_equal_p): Handle OBJ_TYPE_REF, CONSTRUCTOR and be more tolerant about FIELD_DECL. * tree.c (add_expr): Handle FIELD_DECL. (prototype_p, virtual_method_call_p, obj_type_ref_class): Constify. * tree.h (prototype_p, virtual_method_call_p, obj_type_ref_class): Constify. Index: fold-const.c === --- fold-const.c (revision 223500) @@ -3037,6 +3058,40 @@ operand_equal_p (const_tree arg0, const_ case DOT_PROD_EXPR: return OP_SAME (0) && OP_SAME (1) && OP_SAME (2); + /* OBJ_TYPE_REF really is just a transparent wrapper around expression, +but it holds additional type information for devirtualization that +needs to be matched. We may want to introduce OEP flag if we want +to compare the actual value only, or if we also care about effects +of potentially merging the code. This flag can bypass this check +as well as the alias set matching in MEM_REF. */ + case OBJ_TYPE_REF: + { + if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0), + OBJ_TYPE_REF_EXPR (arg1), flags)) + return false; + if (flag_devirtualize && virtual_method_call_p (arg0)) + { + if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0)) + != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1))) + return false; + if (!types_same_for_odr (obj_type_ref_class (arg0), + obj_type_ref_class (arg1))) + return false; devirt machinery in operand_equal_p? please not. not more places! + /* OBJ_TYPE_REF_OBJECT is used to track the instance of +object THIS pointer points to. Checking that both +addresses equal is safe approximation of the fact +that dynamic types are equal.
+Do not care about the other flags, because this expression +does not contribute to the actual value of OBJ_TYPE_REF */ + if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0), + OBJ_TYPE_REF_OBJECT (arg1), + OEP_ADDRESS_OF)) + return false; + } + + return true; + } + default: return 0; } @@ -3097,6 +3152,21 @@ operand_equal_p (const_tree arg0, const_ } case tcc_declaration: + /* At LTO time the FIELD_DECL may exist in multiple copies. + We only care about offsets and bit offsets for operands. */ Err? Even at LTO time FIELD_DECLs should only appear once. So - testcase? IMHO most of this boils down to ICF being too strict about aliasing because it inlines merged bodies instead of original ones. + if (TREE_CODE (arg0) == FIELD_DECL) + { + tree offset1 = DECL_FIELD_OFFSET (arg0); + tree offset2 = DECL_FIELD_OFFSET (arg1); + + tree bit_offset1 = DECL_FIELD_BIT_OFFSET (arg0); + tree bit_offset2 = DECL_FIELD_BIT_OFFSET (arg1); + + flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); + + return operand_equal_p (offset1, offset2, flags) + && operand_equal_p (bit_offset1, bit_offset2, flags); + } /* Consider __builtin_sqrt equal to sqrt. */ return (TREE_CODE (arg0) == FUNCTION_DECL && DECL_BUILT_IN (arg0) && DECL_BUILT_IN (arg1) @@ -3104,12 +3174,50 @@ operand_equal_p (const_tree arg0, const_ && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1)); default: case tcc_exceptional: + /* In GIMPLE empty constructors are allowed in initializers of + vector types. */ Why this comment about GIMPLE? This is about comparing GENERIC trees which of course can have CONSTRUCTORs of various sorts.
+ if (TREE_CODE (arg0) == CONSTRUCTOR) + { + unsigned length1 = vec_safe_length (CONSTRUCTOR_ELTS (arg0)); + unsigned length2 = vec_safe_length (CONSTRUCTOR_ELTS (arg1)); + + if (length1 != length2) + return false; + + flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); + + for (unsigned i = 0; i < length1; i++) + if (!operand_equal_p (CONSTRUCTOR_ELT (arg0, i)->value, + CONSTRUCTOR_ELT (arg1, i)->value, flags)) You need to compare ->index as well here. I'm not sure constructor values are sorted
Re: Do not compute alias sets for types that don't need them
Index: emit-rtl.c === --- emit-rtl.c (revision 223508) +++ emit-rtl.c (working copy) @@ -1787,8 +1787,15 @@ set_mem_attributes_minus_bitpos (rtx ref memset (attrs, 0, sizeof (attrs)); /* Get the alias set from the expression or type (perhaps using a - front-end routine) and use it. */ - attrs.alias = get_alias_set (t); + front-end routine) and use it. + + We may be called to produce MEM RTX for variable of incomplete type. + This MEM RTX will only be used to produce address of a variable, so + we do not need to compute alias set. */ But why do we set mem_attributes for it then? Because we are stupid. We want to produce &var, where var is of incomplete type, and to do it we need to compute DECL_RTL of var, and that is a MEM RTX, and we assign attributes to every DECL_RTL MEM we produce. I do not see how to cut this down except for having something like DECL_RTL_ADDR and a way to build it without actually making a MEM expr... (which would save some memory for the MEM RTX and for attributes but it would make it difficult to hook that value somewhere) + if (!DECL_P (t) || type_with_alias_set_p (TYPE_MAIN_VARIANT (TREE_TYPE (t)))) +attrs.alias = get_alias_set (t); + else +gcc_assert (DECL_P (t) && TREE_ADDRESSABLE (t)); This assert also looks fishy to me... (also just use 'type' instead of TREE_TYPE (t), not sure why you need to pun to the main variant here either - get_alias_set will do that for you and so should type_with_alias_set_p if that is necessary). I am not sure if TYPE_MAIN_VARIANT is really needed here. What I know is that complete types may have incomplete variants. I was a bit worried that I could disable calculation of the alias set of something that does matter by not checking the main variant, in case we somehow manage to have a variable of an incomplete variant type but still access it because we know its complete main variant (that would be somewhat broken).
That is also why I added the check - the idea is that at least I can check that this particular var is indeed having its address taken. Another option would be to define an invalid alias set, assign MEM RTX this invalid alias set, and make alias.c bomb when it trips across it. I actually noticed it triggers during producing some libjava glue where we forgot to set TREE_ADDRESSABLE for an artificial variable that takes an address. I will look into a fix later today or tomorrow. Honza The rest of the patch looks ok. Thanks, Richard. MEM_VOLATILE_P (ref) |= TYPE_VOLATILE (type); MEM_POINTER (ref) = POINTER_TYPE_P (type); Index: ipa-icf-gimple.c === --- ipa-icf-gimple.c (revision 223508) +++ ipa-icf-gimple.c (working copy) @@ -274,6 +274,12 @@ func_checker::compatible_types_p (tree t if (!types_compatible_p (t1, t2)) return return_false_with_msg ("types are not compatible"); + /* FIXME: type compatibility checks for types that are never stored + to memory are not very useful. We should update the code to avoid + calling compatible_types_p on types that will never be used. */ + if (!type_with_alias_set_p (t1) || !type_with_alias_set_p (t2)) + return true; + if (get_alias_set (t1) != get_alias_set (t2)) return return_false_with_msg ("alias sets are different"); -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
[C++ Patch] PR 65598
Hi, this is also by and large obvious, I think: in order to use the right location for error messages about 'explicit', just use declspecs->locations[ds_explicit]. Tested x86_64-linux. Thanks, Paolo. PS: I'm pretty sure we do have similar issues for other decl-specifiers, which can be likewise fixed very easily. I'll go through them over the next weeks. /cp 2015-05-22 Paolo Carlini paolo.carl...@oracle.com PR c++/65598 * decl.c (grokdeclarator): Use the correct location in error messages about 'explicit'. /testsuite 2015-05-22 Paolo Carlini paolo.carl...@oracle.com PR c++/65598 * g++.dg/cpp0x/explicit9.C: New. * g++.dg/cpp0x/explicit8.C: Check the locations too. Index: cp/decl.c === --- cp/decl.c (revision 223520) +++ cp/decl.c (working copy) @@ -10266,12 +10266,15 @@ grokdeclarator (const cp_declarator *declarator, in the declaration of a constructor or conversion function within a class definition. */ if (!current_class_type) - error ("%<explicit%> outside class declaration"); + error_at (declspecs->locations[ds_explicit], + "%<explicit%> outside class declaration"); else if (friendp) - error ("%<explicit%> in friend declaration"); + error_at (declspecs->locations[ds_explicit], + "%<explicit%> in friend declaration"); else - error ("only declarations of constructors and conversion operators " - "can be %<explicit%>"); + error_at (declspecs->locations[ds_explicit], + "only declarations of constructors and conversion operators " + "can be %<explicit%>"); explicitp = 0; } Index: testsuite/g++.dg/cpp0x/explicit8.C === --- testsuite/g++.dg/cpp0x/explicit8.C (revision 223520) +++ testsuite/g++.dg/cpp0x/explicit8.C (working copy) @@ -5,13 +5,13 @@ struct A { explicit operator int() const; }; -explicit inline A::operator int() const { return 1; } // { dg-error "'explicit' outside class declaration" } +explicit inline A::operator int() const { return 1; } // { dg-error "1:'explicit' outside class declaration" } struct B { - explicit void f(); // { dg-error "only declarations of constructors and conversion operators can be 'explicit'" } + explicit void f(); // { dg-error "3:only declarations of constructors and conversion operators can be 'explicit'" } }; -explicit void B::f() { } // { dg-error "'explicit' outside class declaration" } +explicit void B::f() { } // { dg-error "1:'explicit' outside class declaration" } struct C { explicit C(int); @@ -18,5 +18,5 @@ struct C { }; struct D { - explicit friend C::C(int); // { dg-error "'explicit' in friend declaration" } + explicit friend C::C(int); // { dg-error "3:'explicit' in friend declaration" } }; Index: testsuite/g++.dg/cpp0x/explicit9.C === --- testsuite/g++.dg/cpp0x/explicit9.C (revision 0) +++ testsuite/g++.dg/cpp0x/explicit9.C (working copy) @@ -0,0 +1,12 @@ +// PR c++/65598 +// { dg-do compile { target c++11 } } + +struct ExplicitTest +{ + explicit operator bool() const; +}; + +explicit ExplicitTest::operator bool() const // { dg-error "1:'explicit' outside class declaration" } +{ + return true; +}
Re: [patch 10/10] debug-early merge: compiler proper
On 05/22/2015 07:23 AM, Richard Biener wrote: On Wed, May 20, 2015 at 5:50 PM, Aldy Hernandez al...@redhat.com wrote: On 05/18/2015 06:56 AM, Richard Biener wrote: diff --git a/gcc/tree-core.h b/gcc/tree-core.h index ad1bb23..2a9f417 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1334,6 +1334,9 @@ struct GTY(()) tree_block { tree abstract_origin; tree fragment_origin; tree fragment_chain; + + /* Pointer to the DWARF lexical block. */ + struct die_struct *die; }; struct GTY(()) tree_type_common { Ick - do we need this? dwarf2out.c has a hashtable to map blocks to DIEs (which you don't remove in turn). We need a way to reference the early created DIE from late debugging, and we can't use block_map because it gets clobbered across functions. It's currently being released in late debug (dwarf2out_function_decl), that's why you see it not set to NULL in dwarf2out_c_finalize. Also, it uses BLOCK_NUMBERs, which according to the documentation in tree.h, are not guaranteed to be unique across functions. As Honza mentioned, we're already using a DIE map in types through TYPE_SYMTAB_DIE. See lookup_type_die() in dwarf2out.c. Could we leave this as is? But why then not eliminate block_map in favor of using the new ->die member? Having both looks very odd to me. Oh, I would love to. I just didn't want to rip things apart elsewhere until I was sure you guys were on board with the approach. Can you cook up a patch for trunk adding that field to tree_block and removing the block_map map in favor of sth like what we do for lookup_type_die/equate_type_number_to_die and TYPE_SYMTAB_DIE? Absolutely! The attached patch removes block_map in favor of BLOCK_DIE. I did not add lookup_block_die/equate_block_number_to_die abstractions because I think BLOCK_DIE is pretty straightforward. The attached patch is against mainline. I also ported it to the branch for testing, and neither the branch nor mainline exhibit any regressions.
Tested on x86-64 Linux with --enable-languages=all,go,ada. OK for trunk? Aldy commit 9a82ff7b044a8d17eaaaec5eaec3e73f836224df Author: Aldy Hernandez al...@redhat.com Date: Fri May 22 10:07:17 2015 -0400 * dwarf2out.c: Remove block_map. (gen_call_site_die): Replace block_map use with BLOCK_DIE. (gen_lexical_block_die): Same. (dwarf2out_function_decl): Remove block_map use. (dwarf2out_c_finalize): Same. * tree-core.h (struct tree_block): Add die field. * tree.h (BLOCK_DIE): New. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index cc7ac84..15c545e 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -2908,10 +2908,6 @@ static int call_site_count = -1; /* Number of tail call sites in the current function. */ static int tail_call_site_count = -1; -/* Vector mapping block numbers to DW_TAG_{lexical_block,inlined_subroutine} - DIEs. */ -static vec<dw_die_ref> block_map; - /* A cached location list. */ struct GTY ((for_user)) cached_dw_loc_list_def { /* The DECL_UID of the decl that this entry describes. */ @@ -18368,8 +18364,7 @@ gen_call_site_die (tree decl, dw_die_ref subr_die, block != DECL_INITIAL (decl) && TREE_CODE (block) == BLOCK) { - if (block_map.length () > BLOCK_NUMBER (block)) - stmt_die = block_map[BLOCK_NUMBER (block)]; + stmt_die = BLOCK_DIE (block); if (stmt_die) break; block = BLOCK_SUPERCONTEXT (block); @@ -19469,11 +19464,7 @@ gen_lexical_block_die (tree stmt, dw_die_ref context_die) dw_die_ref stmt_die = new_die (DW_TAG_lexical_block, context_die, stmt); if (call_arg_locations) - { - if (block_map.length () <= BLOCK_NUMBER (stmt)) - block_map.safe_grow_cleared (BLOCK_NUMBER (stmt) + 1); - block_map[BLOCK_NUMBER (stmt)] = stmt_die; - } + BLOCK_DIE (stmt) = stmt_die; if (!
BLOCK_ABSTRACT (stmt) && TREE_ASM_WRITTEN (stmt)) add_high_low_attributes (stmt, stmt_die); @@ -19506,11 +19497,7 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die) = new_die (DW_TAG_inlined_subroutine, context_die, stmt); if (call_arg_locations) - { - if (block_map.length () <= BLOCK_NUMBER (stmt)) - block_map.safe_grow_cleared (BLOCK_NUMBER (stmt) + 1); - block_map[BLOCK_NUMBER (stmt)] = subr_die; - } + BLOCK_DIE (stmt) = subr_die; add_abstract_origin_attribute (subr_die, decl); if (TREE_ASM_WRITTEN (stmt)) add_high_low_attributes (stmt, subr_die); @@ -21407,7 +21394,6 @@ dwarf2out_function_decl (tree decl) call_arg_loc_last = NULL; call_site_count = -1; tail_call_site_count = -1; - block_map.release (); decl_loc_table->empty (); cached_dw_loc_list_table->empty (); } @@ -25008,7 +24994,6 @@ dwarf2out_c_finalize (void) call_arg_loc_last = NULL; call_site_count = -1; tail_call_site_count = -1; - //block_map = NULL; cached_dw_loc_list_table = NULL;
[Ada] Rename Has_Volatile_Full_Access into Is_Volatile_Full_Access
This is an internal change that renames the Has_Volatile_Full_Access flag into Is_Volatile_Full_Access for the sake of consistency with similar flags. No user-visible changes. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-05-22 Eric Botcazou ebotca...@adacore.com * einfo.ads (Has_Volatile_Full_Access): Rename into... (Is_Volatile_Full_Access): ...this. (Set_Has_Volatile_Full_Access): Rename into... (Set_Is_Volatile_Full_Access): ...this. * einfo.adb (Has_Volatile_Full_Access): Rename into... (Is_Volatile_Full_Access): ...this. (Set_Has_Volatile_Full_Access): Rename into... (Set_Is_Volatile_Full_Access): ...this. (Is_Atomic_Or_VFA): Adjust to above renaming. * errout.adb (Special_Msg_Delete): Likewise. * exp_pakd.adb (Install_PAT): Likewise. * freeze.adb (Freeze_Array_Type): Likewise. * sem_ch8.adb (Analyze_Object_Renaming): Likewise. * sem_ch13.adb (Inherit_Delayed_Rep_Aspects): Likewise. (Inherit_Aspects_At_Freeze_Point): Likewise. * sem_prag.adb (Set_Atomic_VFA): Likewise. (Process_Atomic_Independent_Shared_Volatile): Likewise. * sem_util.adb (Is_Atomic_Or_VFA_Object): Likewise. 
Index: einfo.adb === --- einfo.adb (revision 223560) +++ einfo.adb (working copy) @@ -592,7 +592,7 @@ --Has_Nested_Subprogram Flag282 --Is_Uplevel_Referenced_EntityFlag283 --Is_UnimplementedFlag284 - --Has_Volatile_Full_AccessFlag285 + --Is_Volatile_Full_Access Flag285 --Needs_Typedef Flag286 --(unused)Flag253 @@ -1856,11 +1856,6 @@ return Flag87 (Implementation_Base_Type (Id)); end Has_Volatile_Components; - function Has_Volatile_Full_Access (Id : E) return B is - begin - return Flag285 (Id); - end Has_Volatile_Full_Access; - function Has_Xref_Entry (Id : E) return B is begin return Flag182 (Id); @@ -2528,6 +2523,11 @@ end if; end Is_Volatile; + function Is_Volatile_Full_Access (Id : E) return B is + begin + return Flag285 (Id); + end Is_Volatile_Full_Access; + function Itype_Printed (Id : E) return B is begin pragma Assert (Is_Itype (Id)); @@ -4758,11 +4758,6 @@ Set_Flag87 (Id, V); end Set_Has_Volatile_Components; - procedure Set_Has_Volatile_Full_Access (Id : E; V : B := True) is - begin - Set_Flag285 (Id, V); - end Set_Has_Volatile_Full_Access; - procedure Set_Has_Xref_Entry (Id : E; V : B := True) is begin Set_Flag182 (Id, V); @@ -5498,6 +5493,11 @@ Set_Flag16 (Id, V); end Set_Is_Volatile; + procedure Set_Is_Volatile_Full_Access (Id : E; V : B := True) is + begin + Set_Flag285 (Id, V); + end Set_Is_Volatile_Full_Access; + procedure Set_Itype_Printed (Id : E; V : B := True) is begin pragma Assert (Is_Itype (Id)); @@ -7335,7 +7335,7 @@ function Is_Atomic_Or_VFA (Id : E) return B is begin - return Is_Atomic (Id) or else Has_Volatile_Full_Access (Id); + return Is_Atomic (Id) or else Is_Volatile_Full_Access (Id); end Is_Atomic_Or_VFA; -- @@ -8750,7 +8750,6 @@ W (Has_Uplevel_Reference, Flag215 (Id)); W (Has_Visible_Refinement, Flag263 (Id)); W (Has_Volatile_Components, Flag87 (Id)); - W (Has_Volatile_Full_Access,Flag285 (Id)); W (Has_Xref_Entry, Flag182 (Id)); W (In_Package_Body, Flag48 (Id)); W (In_Private_Part, Flag45 (Id)); @@ -8865,6 +8864,7 @@ W 
(Is_Visible_Formal, Flag206 (Id)); W (Is_Visible_Lib_Unit, Flag116 (Id)); W (Is_Volatile, Flag16 (Id)); + W (Is_Volatile_Full_Access, Flag285 (Id)); W (Itype_Printed, Flag202 (Id)); W (Kill_Elaboration_Checks, Flag32 (Id)); W (Kill_Range_Checks, Flag33 (Id)); Index: einfo.ads === --- einfo.ads (revision 223560) +++ einfo.ads (working copy) @@ -2047,12 +2047,6 @@ -- type the pragma will be chained to the rep item chain of the first -- subtype in the usual manner. ---Has_Volatile_Full_Access (Flag285) --- Defined in all type entities, and also in constants, components and --- variables. Set if a pragma Volatile_Full_Access applies to the entity. --- In the case of private and incomplete types, this flag is set in --- both the partial view and the full view. - --Has_Xref_Entry (Flag182) -- Defined in all entities. Set if an entity has an entry in the Xref -- information generated in ali files. This is true for all source @@
Re: [patch 10/10] debug-early merge: compiler proper
On 05/22/2015 07:26 AM, Richard Biener wrote: On Wed, May 20, 2015 at 11:45 PM, Aldy Hernandez al...@redhat.com wrote: On 05/20/2015 05:01 PM, Jan Hubicka wrote: commit 8824b5ecba26cef065e47b34609c72677c3c36fc Author: Aldy Hernandez al...@redhat.com Date: Wed May 20 16:31:14 2015 -0400 Set DECL_IGNORED_P on temporary arrays created in the switch conversion pass. diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c index 6b68a16..a4bcdba 100644 --- a/gcc/tree-switch-conversion.c +++ b/gcc/tree-switch-conversion.c @@ -1097,6 +1097,7 @@ build_one_array (gswitch *swtch, int num, tree arr_index_type, DECL_ARTIFICIAL (decl) = 1; TREE_CONSTANT (decl) = 1; TREE_READONLY (decl) = 1; + DECL_IGNORED_P (decl) = 1; varpool_node::finalize_decl (decl); This looks obvious enough to me. Technically speaking the array type constructed probably should be TREE_ARTIFICIAL, but probably it does not matter. Fine to commit to trunk btw. Tested independently on trunk, and committed there. Thanks.
Re: [C++ Patch] PR 65598
OK. Jason
[PATCH, i386, libgcc]: Split SSE specific part from set_fast_math
Hello! This patch splits SSE specific part of set_fast_math to its own function, decorated with fxsr,sse target attribute. This way, we can avoid compiling the whole file with -msse that implies generation of possibly unsupported CMOVE insns. Additionally, we can now use generic t-crtfm makefile fragment. 2015-05-22 Uros Bizjak ubiz...@gmail.com * config.host (i[34567]-*-*, x86_64-*-*): Add t-crtfm instead of i386/t-crtfm to tmake_file. * config/i386/crtfastmath.c (set_fast_math_sse): New function. (set_fast_math): Use set_fast_math_sse for SSE targets. * config/i386/t-crtfm: Remove. Bootstrapped, regression tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config.host === --- config.host (revision 223519) +++ config.host (working copy) @@ -553,12 +553,12 @@ hppa*-*-openbsd*) tmake_file=$tmake_file pa/t-openbsd ;; i[34567]86-*-darwin*) - tmake_file=$tmake_file i386/t-crtpc i386/t-crtfm + tmake_file=$tmake_file i386/t-crtpc t-crtfm tm_file=$tm_file i386/darwin-lib.h extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o ;; x86_64-*-darwin*) - tmake_file=$tmake_file i386/t-crtpc i386/t-crtfm + tmake_file=$tmake_file i386/t-crtpc t-crtfm tm_file=$tm_file i386/darwin-lib.h extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o ;; @@ -595,24 +595,24 @@ x86_64-*-openbsd*) ;; i[34567]86-*-linux*) extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o - tmake_file=${tmake_file} i386/t-crtpc i386/t-crtfm i386/t-crtstuff t-dfprules + tmake_file=${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules tm_file=${tm_file} i386/elf-lib.h md_unwind_header=i386/linux-unwind.h ;; i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i[34567]86-*-gnu* | i[34567]86-*-kopensolaris*-gnu) extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o - tmake_file=${tmake_file} i386/t-crtpc i386/t-crtfm i386/t-crtstuff t-dfprules + tmake_file=${tmake_file} i386/t-crtpc t-crtfm 
i386/t-crtstuff t-dfprules tm_file=${tm_file} i386/elf-lib.h ;; x86_64-*-linux*) extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o - tmake_file=${tmake_file} i386/t-crtpc i386/t-crtfm i386/t-crtstuff t-dfprules + tmake_file=${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules tm_file=${tm_file} i386/elf-lib.h md_unwind_header=i386/linux-unwind.h ;; x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu) extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o - tmake_file=${tmake_file} i386/t-crtpc i386/t-crtfm i386/t-crtstuff t-dfprules + tmake_file=${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules tm_file=${tm_file} i386/elf-lib.h ;; i[34567]86-pc-msdosdjgpp*) @@ -628,7 +628,7 @@ i[34567]86-*-rtems*) extra_parts=$extra_parts crti.o crtn.o ;; i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*) - tmake_file=$tmake_file i386/t-crtpc i386/t-crtfm + tmake_file=$tmake_file i386/t-crtpc t-crtfm extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o tm_file=${tm_file} i386/elf-lib.h md_unwind_header=i386/sol2-unwind.h @@ -652,7 +652,7 @@ i[34567]86-*-cygwin*) else tmake_dlldir_file=i386/t-dlldir-x fi - tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file} i386/t-slibgcc-cygming i386/t-cygming i386/t-cygwin i386/t-crtfm i386/t-chkstk t-dfprules + tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file} i386/t-slibgcc-cygming i386/t-cygming i386/t-cygwin t-crtfm i386/t-chkstk t-dfprules ;; x86_64-*-cygwin*) extra_parts=crtbegin.o crtbeginS.o crtend.o crtfastmath.o @@ -672,7 +672,7 @@ x86_64-*-cygwin*) tmake_dlldir_file=i386/t-dlldir-x fi # FIXME - dj - t-chkstk used to be in here, need a 64-bit version of that - tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file} i386/t-slibgcc-cygming i386/t-cygming i386/t-cygwin i386/t-crtfm t-dfprules i386/t-chkstk + tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file} i386/t-slibgcc-cygming i386/t-cygming i386/t-cygwin 
t-crtfm t-dfprules i386/t-chkstk ;; i[34567]86-*-mingw*) extra_parts=crtbegin.o crtend.o crtfastmath.o @@ -700,7 +700,7 @@ i[34567]86-*-mingw*) else tmake_dlldir_file=i386/t-dlldir-x fi - tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file} i386/t-slibgcc-cygming i386/t-cygming i386/t-mingw32 i386/t-crtfm i386/t-chkstk t-dfprules + tmake_file=${tmake_file} ${tmake_eh_file} ${tmake_dlldir_file}
Copy TYPE_NO_FORCE_BLK in finalize_type_size
Hi, PR 66181 is about an ICE in verify_type that complains that a type and its variant differ by TYPE_NO_FORCE_BLK. This flag is kind-of internal to stor-layout.c, so the divergence may not matter (I am not sure about it, as the C++ FE finalizes type variants separately and thus may trip to different values) but I think it is cleaner to copy it to all variants like we copy other stuff set by stor-layout. Bootstrapped/regtested x86_64-linux and the patch reportedly fixes the ARM ICE. OK? Honza PR middle-end/66181 * stor-layout.c (finalize_type_size): Also copy TYPE_NO_FORCE_BLK. Index: stor-layout.c === --- stor-layout.c (revision 222869) +++ stor-layout.c (working copy) @@ -1834,6 +1834,7 @@ TYPE_ALIGN (variant) = valign; TYPE_PRECISION (variant) = precision; SET_TYPE_MODE (variant, mode); + TYPE_NO_FORCE_BLK (variant) = TYPE_NO_FORCE_BLK (type); } } }
[patch] libstdc++/66017 Avoid bad casts and fix alignment of _Rb_tree_node<long long>::_M_storage
There are two problems involved in this PR. First, as Clang's ubsan detects, we are using static_cast to convert from _Rb_tree_node_base* to _Rb_tree_node<_Val>* in cases where there is no _Rb_tree_node<_Val> at that address (_M_impl._M_header is just an _Rb_tree_node_base). That's undefined behaviour, and shows up here when alignof(_Rb_tree_node_base) != alignof(_Rb_tree_node<_Val>) because we end up with a misaligned _Rb_tree_node<_Val>* (which we never dereference, because it's the past-the-end iterator, but it's still UB to do the cast). The second problem is that alignof(_Rb_tree_node<long long>) changes depending on whether it's compiled as c++98 or c++11. This is because in C++11 mode I replaced the member _M_value_field with uninitialized storage aligned as alignof(_Val), using __aligned_buffer, but for targets that define the ADJUST_FIELD_ALIGN target macro it's wrong to assume that alignof(_M_value_field) == alignof(_Val), because the type might have weaker alignment when it's a member of a structure. This means the __aligned_buffer in C++11 mode is not aligned the same as the _M_value_field member it was meant to replace. Ouch. The first problem can be fixed by simply removing the casts, I don't see why we need them. They are there because the _Rb_tree_iterator and _Rb_tree_const_iterator constructors take a pointer-to-derived, but then they upcast to pointer-to-base and store that type instead. So if we just make the constructors take pointer-to-base instead then we don't need the possibly-invalid casts. This way we only do the downcast when dereferencing an iterator, which only happens for real nodes where the downcast is valid, not for _M_impl._M_header. The second problem can be fixed by making __aligned_buffer use the alignment of struct _Tp2 { _Tp t; } instead of alignof(_Tp). patch.txt fixes those two problems. That is needed on trunk and the gcc-5 branch. patch2.txt then takes the idea further and gets rid of some more casts that aren't actually necessary.
The pointer returned by _M_end() is never dereferenced, so there's no reason it can't be a _Base_ptr rather than a _Link_type (i.e. _Rb_tree_node_base* rather than _Rb_tree_node<_Val>*). While making that change I realised that the _M_xxx_tr() functions for heterogeneous lookup can be simplified, so the non-const versions use const ones and then call _M_const_cast() on the results. Any objections to this additional change going in to trunk? We could take it even further and have _M_begin() return a _Base_ptr and then only downcast it as needed, but I'm not proposing that for now. commit d57e7058e821664735e0118f8e37b8cbc37e49fb Author: Jonathan Wakely jwak...@redhat.com Date: Thu May 21 14:41:16 2015 +0100 PR libstdc++/66017 * include/bits/stl_tree.h (_Rb_tree_iterator, _Rb_tree_const_iterator): Support construction from _Base_ptr. (_Rb_tree_const_iterator::_M_const_cast): Remove static_cast. (_Rb_tree::begin, _Rb_tree::end): Remove static_cast. * include/ext/aligned_buffer.h (__aligned_buffer): Use alignment of _Tp as a member subobject, not as a complete object.
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h index 5ca8e28..2dc95df 100644 --- a/libstdc++-v3/include/bits/stl_tree.h +++ b/libstdc++-v3/include/bits/stl_tree.h @@ -188,7 +188,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_node() { } explicit - _Rb_tree_iterator(_Link_type __x) _GLIBCXX_NOEXCEPT + _Rb_tree_iterator(_Base_ptr __x) _GLIBCXX_NOEXCEPT : _M_node(__x) { } reference @@ -260,7 +260,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_node() { } explicit - _Rb_tree_const_iterator(_Link_type __x) _GLIBCXX_NOEXCEPT + _Rb_tree_const_iterator(_Base_ptr __x) _GLIBCXX_NOEXCEPT : _M_node(__x) { } _Rb_tree_const_iterator(const iterator& __it) _GLIBCXX_NOEXCEPT @@ -268,8 +268,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION iterator _M_const_cast() const _GLIBCXX_NOEXCEPT - { return iterator(static_cast<typename iterator::_Link_type> - (const_cast<typename iterator::_Base_ptr>(_M_node))); } + { return iterator(const_cast<typename iterator::_Base_ptr>(_M_node)); } reference operator*() const _GLIBCXX_NOEXCEPT @@ -868,28 +867,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION iterator begin() _GLIBCXX_NOEXCEPT - { - return iterator(static_cast<_Link_type> - (this->_M_impl._M_header._M_left)); - } + { return iterator(this->_M_impl._M_header._M_left); } const_iterator begin() const _GLIBCXX_NOEXCEPT - { - return const_iterator(static_cast<_Const_Link_type> - (this->_M_impl._M_header._M_left)); - } + { return const_iterator(this->_M_impl._M_header._M_left); } iterator end() _GLIBCXX_NOEXCEPT - { return iterator(static_cast<_Link_type>(&this->_M_impl._M_header)); } + { return iterator(&this->_M_impl._M_header); } const_iterator
Reuse predicate code analysis for constraints
This patch adjusts the fix for PR target/65689 along the lines suggested in https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01559.html. The idea is to reuse the existing gensupport.c routine to work out the codes accepted by constraints. I'd originally done this with an eye to using compute_test_codes for the problem that Andreas found on s390. I don't think it's going to be useful for that after all, but it seems worth having for its own sake. Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * gensupport.h (compute_test_codes): Declare. * gensupport.c (compute_predicate_codes): Rename to... (compute_test_codes): ...this. Generalize error message. (process_define_predicate): Update accordingly. * genpreds.c (compute_maybe_allows): Delete. (add_constraint): Use compute_test_codes to determine whether something can accept a SUBREG, REG or MEM. Index: gcc/gensupport.h === --- gcc/gensupport.h 2015-05-21 08:45:32.663464769 +0100 +++ gcc/gensupport.h 2015-05-21 08:45:33.015460604 +0100 @@ -109,5 +109,6 @@ struct pattern_stats }; extern void get_pattern_stats (struct pattern_stats *ranges, rtvec vec); +extern void compute_test_codes (rtx, int, char *); #endif /* GCC_GENSUPPORT_H */ Index: gcc/gensupport.c === --- gcc/gensupport.c 2015-05-21 08:45:32.663464769 +0100 +++ gcc/gensupport.c 2015-05-21 08:51:12.667438995 +0100 @@ -204,8 +204,8 @@ #define TRISTATE_NOT(a) \ predicate expression EXP, writing the result to CODES. LINENO is the line number on which the directive containing EXP appeared.
*/ -static void -compute_predicate_codes (rtx exp, int lineno, char codes[NUM_RTX_CODE]) +void +compute_test_codes (rtx exp, int lineno, char *codes) { char op0_codes[NUM_RTX_CODE]; char op1_codes[NUM_RTX_CODE]; @@ -215,29 +215,29 @@ compute_test_codes (rtx exp, int li switch (GET_CODE (exp)) { case AND: - compute_predicate_codes (XEXP (exp, 0), lineno, op0_codes); - compute_predicate_codes (XEXP (exp, 1), lineno, op1_codes); + compute_test_codes (XEXP (exp, 0), lineno, op0_codes); + compute_test_codes (XEXP (exp, 1), lineno, op1_codes); for (i = 0; i < NUM_RTX_CODE; i++) codes[i] = TRISTATE_AND (op0_codes[i], op1_codes[i]); break; case IOR: - compute_predicate_codes (XEXP (exp, 0), lineno, op0_codes); - compute_predicate_codes (XEXP (exp, 1), lineno, op1_codes); + compute_test_codes (XEXP (exp, 0), lineno, op0_codes); + compute_test_codes (XEXP (exp, 1), lineno, op1_codes); for (i = 0; i < NUM_RTX_CODE; i++) codes[i] = TRISTATE_OR (op0_codes[i], op1_codes[i]); break; case NOT: - compute_predicate_codes (XEXP (exp, 0), lineno, op0_codes); + compute_test_codes (XEXP (exp, 0), lineno, op0_codes); for (i = 0; i < NUM_RTX_CODE; i++) codes[i] = TRISTATE_NOT (op0_codes[i]); break; case IF_THEN_ELSE: /* a ? b : c accepts the same codes as (a & b) | (!a & c).
*/ - compute_predicate_codes (XEXP (exp, 0), lineno, op0_codes); - compute_predicate_codes (XEXP (exp, 1), lineno, op1_codes); - compute_predicate_codes (XEXP (exp, 2), lineno, op2_codes); + compute_test_codes (XEXP (exp, 0), lineno, op0_codes); + compute_test_codes (XEXP (exp, 1), lineno, op1_codes); + compute_test_codes (XEXP (exp, 2), lineno, op2_codes); for (i = 0; i < NUM_RTX_CODE; i++) codes[i] = TRISTATE_OR (TRISTATE_AND (op0_codes[i], op1_codes[i]), TRISTATE_AND (TRISTATE_NOT (op0_codes[i]), @@ -321,7 +321,7 @@ compute_test_codes (rtx exp, int li default: error_with_line (lineno, - "'%s' cannot be used in a define_predicate expression", + "'%s' cannot be used in predicates or constraints", GET_RTX_NAME (GET_CODE (exp))); memset (codes, 'I', NUM_RTX_CODE); break; @@ -373,7 +373,7 @@ process_define_predicate (rtx desc, int if (GET_CODE (desc) == DEFINE_SPECIAL_PREDICATE) pred->special = true; - compute_predicate_codes (XEXP (desc, 1), lineno, codes); + compute_test_codes (XEXP (desc, 1), lineno, codes); for (i = 0; i < NUM_RTX_CODE; i++) if (codes[i] != 'N') Index: gcc/genpreds.c === --- gcc/genpreds.c 2015-05-21 08:45:32.663464769 +0100 +++ gcc/genpreds.c 2015-05-21 08:45:33.015460604 +0100 @@ -716,34 +716,6 @@ mangle (const char *name) return XOBFINISH (rtl_obstack, const char *); } -/* Return a bitmask, bit 1 if EXP maybe allows a REG/SUBREG, 2 if EXP - maybe allows a MEM. Bits should be clear only when we are sure it - will not allow a REG/SUBREG or a
[PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion
This example which I wrote to test ifconversion currently fails to if-convert or vectorize: int foo () { for (int i = 0; i < 32; i++) { int m = (a[i] > i) ? 5 : 4; b[i] = a[i] * m; } } ...because jump-threading in dom1 rearranged the loop into a form that neither if-conversion nor vectorization would attempt. Discussion at https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html led to the suggestion that I should rerun loop-header copying (an earlier attempt to fix ifconversion, https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not enable vectorization.) This patch does so (and makes it slightly less conservative, to tackle the example above). I found I had to make this a separate pass, so that the phi nodes were cleaned up at the end of the pass before running tree_if_conversion. Also at this stage in the compiler (inside loop opts) it was not possible to run loop_optimizer_init+finalize, or other loop_optimizer data structures needed later would be deleted; hence, I have two nearly-but-not-quite-identical passes, the new ch_vect avoiding the init/finalize. I tried to tackle this with some C++ subclassing, which removes the duplication, but the result feels a little ugly; suggestions for any neater approach welcome. This patch causes failure of the scan-tree-dump of dom2 in gcc.dg/ssa/pr21417.c. This looks for jump-threading to perform an optimization, but no longer finds the expected line in the log - as the loop-header-copying phase has already done an equivalent transformation *before* dom2. The final CFG is thus in the desired form, but I'm not sure how to determine this (scanning the CFG itself is very difficult, well beyond what we can do with regex, requiring looking at multiple lines and basic blocks). Can anyone advise? [The test issue can be worked around by preserving the old do_while_p logic for the first header-copying pass, and using the new logic only for the second, but this is more awkward inside the compiler, which feels wrong.]
Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of other examples where this patch enables (undesired!) vectorization. I've dealt with these, but for the record: * gcc.dg/vect/slp-perm-7.c: the initialization loop in main contained a check that input[i] > 200; this was already optimized out (because input[i] was set to i%256, where i < N with N #defined to 16), but that loop was not vectorized because: /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: note: not vectorized: latch block not empty. /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: note: bad loop form. * gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has three loops; the first (initialization) has an 'if (y) abort () /* Avoid vectorization. */'. However, the 'volatile int y = 0' this was meant to reference is actually shadowed by a local non-volatile; the test is thus peeled off and absent from the body of the loop. The loop only avoided vectorization because of the non-empty latch and bad loop form, as previous. With this patch, both those loops now have good form, hence I have fixed both to check a global volatile to prevent these extraneous parts from being vectorized. Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted above, this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm unsure how to fix, but no other regressions. gcc/ChangeLog: * tree-pass.h (make_pass_ch_vect): New. * passes.def: Add pass_ch_vect just before pass_if_conversion. * tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops, look for blocks with exit edge and code after them. (pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New. (class pass_ch): Extend pass_ch_vect. (pass_ch::execute): Move all but loop_optimizer_init/finalize to... (pass_ch_vect::execute): ...here. * tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES.
gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-7.c (zero): New.
	(main): Test zero rather than input[i], to avoid vectorization.
	* gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x, y, z, w.
	* gcc.dg/vect/vect-ifcvt-11.c: New.

diff --git a/gcc/passes.def b/gcc/passes.def
index 1d598b2..87cfe2a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,6 +247,7 @@ along with GCC; see the file COPYING3. If not see
       PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
	  NEXT_PASS (pass_expand_omp_ssa);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_ch_vect);
       NEXT_PASS (pass_if_conversion);
       /* pass_vectorize must immediately follow pass_if_conversion.
	  Please do not add any other passes in between.  */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
Re: [RFA] Fix combine to canonicalize (mult X pow2) more often
On Thu, May 21, 2015 at 09:24:37AM -0600, Jeff Law wrote: When combine needs to split a complex insn, it will canonicalize a simple (mult X (const_int Y)) where Y is a power of 2 into the expected (ashift X (const_int Y')) if the (mult ...) is selected as a split point. However, if the split point is (plus (mult X (const_int Y)) Z), combine fails to canonicalize. OK for the trunk?

Okay. Something went wrong with your changelog though. Rebasing changelogs is a recipe for disaster. The following is just copy-paste, but anyway...

+	  /* Similarly for (plus (mult FOO (const_int pow2))).  */
+	  if (split_code == PLUS
+	      && GET_CODE (XEXP (*split, 0)) == MULT
+	      && CONST_INT_P (XEXP (XEXP (*split, 0), 1))
+	      && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0
+	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)

INTVAL (..) > 0 here disallows 1<<63 while exact_log2 allows it (with a 64-bit HWI). Most places don't check the > 0 and it doesn't seem necessary in the places that do. This won't ever trigger for a SImode 1<<31 either. None of this matters when we're just looking at scaled indexing, of course. But exact_log2 is much harder to use correctly than to use it incorrectly :-( Segher
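Segher's point about the signed guard can be demonstrated with a small self-contained program. Here `my_exact_log2` is a stand-in for GCC's `exact_log2` (an assumption, not the actual implementation); the important part is that it operates on the *unsigned* value, while an `INTVAL (..) > 0` guard tests the *signed* one:

```c
#include <assert.h>
#include <stdint.h>

/* Return log2(x) if x is a power of two, else -1.
   Works on the unsigned value, like GCC's exact_log2.  */
static int
my_exact_log2 (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;                  /* zero or not a power of two */
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}
```

With a 64-bit host wide int, the constant 1<<63 is negative when read as signed, so a `> 0` guard rejects it even though it is a perfectly good power of two: `my_exact_log2 ((uint64_t) INT64_MIN)` is 63, yet `INT64_MIN > 0` is false.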
Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
On 22/05/15 15:28, Jason Merrill wrote: On 05/22/2015 09:55 AM, David Edelsohn wrote: On Fri, May 22, 2015 at 9:40 AM, Jason Merrill ja...@redhat.com wrote: On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:

+  /* Load the guard value only through an atomic acquire load.  */
+  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
+
   /* Check to see if the GUARD is zero.  */
   guard = get_guard_bits (guard);

I wonder if these calls should be reversed, to express that we're only trying to atomically load a byte (on non-ARM targets)? That expresses the semantics more directly, but will that lead to less efficient code on some RISC architectures? I'm not sure. I would expect that the target would use a larger load and mask it if that is better for performance, but I don't know. I do notice that get_guard_bits after build_atomic_load just won't work on non-ARM targets, as it ends up trying to take the address of a value. Jason

So on powerpc, where targetm.guard_mask_bit is false, this is what I see:

{
  static int * p;
  static int * p;
  if (cleanup_point (unsigned char) *(char *) (long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2) == 0)
    {
      if (cleanup_point __cxa_guard_acquire (&_ZGVZ1fvE1p) != 0)
        {
          cleanup_point Unknown tree: expr_stmt
            TARGET_EXPR <D.2846, 0>;, p = (int *) operator new (4);;, D.2846 = 1;;, __cxa_guard_release (&_ZGVZ1fvE1p); ;
        }
    }
  return <retval> = p;
}

with the following output - David, is this what you would expect?
0:	addis 2,12,.TOC.-0b@ha
	addi 2,2,.TOC.-0b@l
	.localentry	_Z1fv,.-_Z1fv
	mflr %r0
	std %r30,-16(%r1)
	std %r31,-8(%r1)
	.cfi_register 65, 0
	.cfi_offset 30, -16
	.cfi_offset 31, -8
	addis %r30,%r2,_ZGVZ1fvE1p@toc@ha
	addi %r30,%r30,_ZGVZ1fvE1p@toc@l
	std %r0,16(%r1)
	stdu %r1,-64(%r1)
	.cfi_def_cfa_offset 64
	.cfi_offset 65, 16
	addis %r10,%r2,_ZGVZ1fvE1p@toc@ha	# gpr load fusion, type long
	ld %r10,_ZGVZ1fvE1p@toc@l(%r10)
	cmpw %cr7,%r10,%r10
	bne- %cr7,$+4
	isync
	rlwinm %r9,%r10,0,0xff
	std %r10,32(%r1)
	cmpwi %cr7,%r9,0
	beq %cr7,.L11
.L9:
	addi %r1,%r1,64
	.cfi_remember_state
	.cfi_def_cfa_offset 0
	addis %r3,%r2,_ZZ1fvE1p@toc@ha	# gpr load fusion, type long
	ld %r3,_ZZ1fvE1p@toc@l(%r3)
	ld %r0,16(%r1)
	ld %r30,-16(%r1)
	ld %r31,-8(%r1)
	mtlr %r0
	.cfi_restore 65
	.cfi_restore 31
	.cfi_restore 30
	blr
	.p2align 4,,15
.L11:
	.cfi_restore_state
	mr %r3,%r30

And on AArch64, which is where guard_mask_bit is true:

static int * p;
static int * p;
if (cleanup_point ((long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2) & 1) == 0)
  {
    if (cleanup_point __cxa_guard_acquire (&_ZGVZ1fvE1p) != 0)
      {
        cleanup_point Unknown tree: expr_stmt
          TARGET_EXPR <D.3168, 0>;, p = (int *) operator new (4);;, D.3168 = 1;;, __cxa_guard_release (&_ZGVZ1fvE1p); ;
      }
  }
return <retval> = p;

Alternatively, I can change this to an atomic load of just the byte required, but I'm not sure that's supported on all targets. I'm going to be running away shortly for the long weekend here, so I will resume discussions/experiments on Tuesday. regards, Ramana
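For reference, the guard protocol under discussion can be sketched with portable C11 atomics. This is a single-threaded illustration, not the libsupc++ implementation: the inlined slow path stands in for __cxa_guard_acquire/__cxa_guard_release, and only the memory orders are the point — the acquire load on the fast path pairs with the release store made by whichever thread completed the initialization:

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic unsigned char guard;   /* models the first byte of the 64-bit guard */
static int *p;

static int *
get_p (void)
{
  /* Fast path: acquire load of the guard byte.  If it is set, the
     release store below guarantees the initialization of *p is
     visible to this thread.  */
  if (atomic_load_explicit (&guard, memory_order_acquire) == 0)
    {
      /* Slow path: in the real runtime this region is serialized by
         __cxa_guard_acquire/__cxa_guard_release.  */
      static int storage;
      storage = 5;                    /* the "constructor" */
      p = &storage;
      atomic_store_explicit (&guard, 1, memory_order_release);
    }
  return p;
}
```

The thread question is whether the fast-path load should be of the whole guard word (then masked, as on PowerPC above) or of just the byte the ABI defines, which is what `_Atomic unsigned char` models here.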
Re: [PATCH][AArch64] PR target/65491: Classify V1TF vectors as AAPCS64 short vectors rather than composite types
Hi James,

On 19/05/15 12:18, James Greenhalgh wrote: On Mon, Apr 20, 2015 at 11:16:02AM +0100, Kyrylo Tkachov wrote: Hi all, The ICE in the PR happens when we pass a 1x(128-bit float) vector as an argument. The aarch64 backend erroneously classifies it as a composite type, when in fact it is a short vector according to AAPCS64 (section 4.1.2 of http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf). Agreed. The solution in this patch is to check aarch64_composite_type_p for a short vector with aarch64_short_vector_p, rather than the other way around (check for aarch64_short_vector_p in aarch64_composite_type_p). I think I understand what you are saying, but your patch does the opposite (ADDS a check for aarch64_short_vector_p in aarch64_composite_type_p, REMOVES a check for aarch64_composite_type_p in aarch64_short_vector_p)... Yeah, I just worded it wrong in the cover letter, sorry about that. As you say, the logic is pretty hairy. This logic is pretty hairy, and I'm struggling to convince myself that your change only hits the bug you described above. I think I've worked it through and it does, but if you can find any additional ABI tests which stress the vector/floating-point passing rules, that would help settle my nerves. The aapcs64.exp stuff seems to test the existing rules quite well... The patch is OK. I wouldn't think we would want to backport it to release branches, as there is no regression to fix. OK, I've committed it as r223577. I agree that it's not a regression fix, so messing with ABI code in the release branches is not desirable. Thanks for the review. Kyrill Thanks, James

2015-04-20  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/65491
	* config/aarch64/aarch64.c (aarch64_short_vector_p): Move above aarch64_composite_type_p. Remove check for aarch64_composite_type_p.
	(aarch64_composite_type_p): Return false if given type and mode are for a short vector.
2015-04-20 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/65491 * gcc.target/aarch64/pr65491_1.c: New test. * gcc.target/aarch64/aapcs64/type-def.h (vlf1_t): New typedef. * gcc.target/aarch64/aapcs64/func-ret-1.c: Add test for vlf1_t.
Re: Add few cases to operand_equal_p
+    case OBJ_TYPE_REF:
+      {
+	if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0),
+			      OBJ_TYPE_REF_EXPR (arg1), flags))
+	  return false;
+	if (flag_devirtualize && virtual_method_call_p (arg0))
+	  {
+	    if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0))
+		!= tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1)))
+	      return false;
+	    if (!types_same_for_odr (obj_type_ref_class (arg0),
+				     obj_type_ref_class (arg1)))
+	      return false;

devirt machinery in operand_equal_p? please not. not more places!

OBJ_TYPE_REF explicitly says what the type of the class whose virtual method is called is. It is a GIMPLE operand expression like the others, so I think we need to handle it. Actually, I think I can just drop the flag_devirtualize check, because with !flag_devirtualize we should simply avoid having OBJ_TYPE_REF around. That would make it look less devirt-specific. We can also test the types for equivalence of the main variant, but that would just introduce false positives when LTO did not merge two identical ODR types. It would be correct, though. After more thought, I indeed like the idea of having gimple-specific matching code better (I managed to talk myself into the idea of unifying the code rather than duplicating everything, but handling GENERIC is a pain). If we go this path, I think I can just withdraw the OBJ_TYPE_REF change here. operand_equal_p conservatively considers every pair of two OBJ_TYPE_REFs different, which is safe. Concerning the rest of the patch, I leave it up to your decision whether we want to handle these in operand_equal_p or only at the gimple level. thanks, Honza

+	    /* OBJ_TYPE_REF_OBJECT is used to track the instance of
+	       object THIS pointer points to.  Checking that both
+	       addresses equal is safe approximation of the fact
+	       that dynamic types are equal.
+	       Do not care about the other flags, because this expression
+	       does not contribute to the actual value of OBJ_TYPE_REF.  */
+	    if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
+				  OBJ_TYPE_REF_OBJECT (arg1),
+				  OEP_ADDRESS_OF))
+	      return false;
+	  }
+
+	return true;
+      }
+
     default:
       return 0;
     }

@@ -3097,6 +3152,21 @@ operand_equal_p (const_tree arg0, const_
     }

     case tcc_declaration:
+      /* At LTO time the FIELD_DECL may exist in multiple copies.
+	 We only care about offsets and bit offsets for operands.  */

Err? Even at LTO time FIELD_DECLs should only appear once. So - testcase?

FIELD_DECL has TREE_TYPE, and TREE_TYPE may not get merged by LTO. So if the two expressions originate from two different units, we may have two semantically equivalent FIELD_DECLs (of compatible types and the same offsets) that occupy different memory locations, because their merging was prevented by something upstream (like a complete vs. incomplete pointer in the type).

+      /* In GIMPLE empty constructors are allowed in initializers of
+	 vector types.  */

Why this comment about GIMPLE? This is about comparing GENERIC trees which of course can have CONSTRUCTORs of various sorts.

+      if (TREE_CODE (arg0) == CONSTRUCTOR)
+	{
+	  unsigned length1 = vec_safe_length (CONSTRUCTOR_ELTS (arg0));
+	  unsigned length2 = vec_safe_length (CONSTRUCTOR_ELTS (arg1));
+
+	  if (length1 != length2)
+	    return false;
+
+	  flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF);
+
+	  for (unsigned i = 0; i < length1; i++)
+	    if (!operand_equal_p (CONSTRUCTOR_ELT (arg0, i)->value,
+				  CONSTRUCTOR_ELT (arg1, i)->value, flags))

You need to compare ->index as well here. I'm not sure constructor values are always sorted, so you might get very many false negatives.

Many false negatives is better than all false negatives :) You are right; either I should punt on non-empty constructors or compare the indexes. I will do the second.

+    case FIELD_DECL:
+      /* At LTO time the FIELD_DECL may exist in multiple copies.
+	 We only care about offsets and bit offsets for operands.
*/

So explain that this is the reason we don't want to hash by DECL_UID. I still think this is bogus.

Will do, if we agree on having this. I know you would like ipa-icf to keep the original bodies and use them for inlining, declaring alias sets to be function-local. This is the wrong plan. Consider:

void t (int *ptr) { *ptr = 1; }

int a (int *ptr1, int *ptr2) { int a = *ptr1; t (ptr2); return a + *ptr1; }

long b (long *ptr1, int *ptr2) { int a = *ptr1; t (ptr2); return a + *ptr1; }

Here aliasing leads to the two functions being optimized differently:

a: .LFB1: .cfi_startproc
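Spelling out why a and b must be optimized differently, here is a runnable sketch of the example above (the concrete values and comments are mine):

```c
#include <assert.h>

/* In a(), ptr1 and ptr2 have the same type, so the call to t() may
   clobber *ptr1 and the second load cannot be removed.  In b(), long*
   and int* cannot alias under the C type rules, so *ptr1 may be kept
   in a register across the call.  A single ICF-merged body would have
   to use the conservative (a-style) alias assumptions for both.  */

static void
t (int *ptr)
{
  *ptr = 1;
}

static int
a (int *ptr1, int *ptr2)
{
  int v = *ptr1;
  t (ptr2);
  return v + *ptr1;   /* must reload: ptr2 may equal ptr1 */
}

static long
b (long *ptr1, int *ptr2)
{
  long v = *ptr1;
  t (ptr2);
  return v + *ptr1;   /* may keep *ptr1 cached: no aliasing possible */
}
```

Calling a() with both arguments pointing at the same int shows the reload is observable, which is exactly what a shared body would have to account for.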
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
On Fri, May 22, 2015 at 2:00 AM, Pedro Alves pal...@redhat.com wrote: On 05/21/2015 11:02 PM, Sriraman Tallam wrote: On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote: On 05/21/2015 10:12 PM, Sriraman Tallam wrote: My original proposal, for x86_64 only, was to add -fno-plt=function-name. This lets the user decide for which functions the PLT must be avoided. Let the compiler always generate an indirect call using call *func@GOTPCREL(%rip). We could do this for non-PIC code too. No need for linker fixups, since this relies on the user to know that func is from a shared object. Having to pass function names on the command line seems like an odd interface. E.g., you'll need to pass the mangled name for C++ functions. Any reason this isn't a function attribute? It is not clear to me where I would stick the attribute. Example usage in foo.cc:

#include <string.h>
int main() { int n = memcmp(); }

I want memcmp to not go through the PLT; do you propose explicitly re-declaring it in foo.cc with the attribute? I guess you'd do:

#include <string.h>
__attribute__((no_plt)) typeof (memcpy) memcpy;
int main() { int n = memcmp(); }

or even:

#include <string.h>
int main() {
  if (hotpath) {
    __attribute__((no_plt)) typeof (memcpy) memcpy;
    for (..) { int n = memcmp(); }
  } else {
    int n = memcmp();
  }
}

or globally:

$ cat no-plt/string.h
#include_next <string.h>
__attribute__((no_plt)) typeof (memcpy) memcpy;

$ gcc -I no-plt/ ...

That looks good, thanks. Sri Thanks, Pedro Alves