Re: RFA (diagnostics): PATCH to move deprecated source location to separate note
Jason Merrill ja...@redhat.com writes: When I was fixing attribute deprecated for C++ templates, I found it odd that the source location for the deprecated decl was embedded in the warning rather than in a separate inform. This patch moves it out. Tested x86_64-pc-linux-gnu. OK for trunk? Yes. Thanks! -- Dodji
Re: [patch] fix expand_builtin_init_dwarf_reg_sizes wrt register spans
On 09/30/2014 12:47 PM, Olivier Hainque wrote: 2014-09-30 Olivier Hainque hain...@adacore.com libgcc/ * unwind-dw2.c (DWARF_REG_TO_UNWIND_COLUMN): Move default def to ... gcc/ * defaults.h: ... here. * dwarf2cfi.c (init_one_dwarf_reg_size): New helper, processing one particular reg for expand_builtin_init_dwarf_reg_sizes. Apply DWARF_REG_TO_UNWIND_COLUMN. (expand_builtin_init_dwarf_reg_sizes): Rework to use helper and account for dwarf register spans. Ok. r~
[PATCH AARCH64]load store pair optimization using sched_fusion pass.
Hi, This is the patch implementing ldp/stp optimization for aarch64. It consists of two parts. The first one is the peephole part, which further includes ldp/stp patterns (both peephole patterns and the insn match patterns) and auxiliary functions (both checking the validity and merging). The second part implements the aarch64 backend hook for the sched-fusion pass, which calculates appropriate priorities for different kinds of load/store instructions. With these priorities, the sched-fusion pass can schedule as many load/store instructions together as possible, so that the following peephole2 pass can merge them. I collected data for miscellaneous benchmarks. Some cases are improved; most of the rest are not regressed; only a couple of them regress slightly, by 2-3%. After looking into the regressions I can confirm that the code transformation is generally good, with many loads/stores paired. These regressions are most probably false alarms caused by other issues. The conclusion is that this patch can pair lots of consecutive load/store instructions into ldp/stp, which is borne out by the code-size improvement of benchmarks: e.g., it cuts the text size of spec2k6 binaries (O3 level, not statically linked in my build) by 1.68%. Bootstrapped and tested on aarch64. Is it OK? 2014-11-18 Bin Cheng bin.ch...@arm.com * config/aarch64/aarch64.md (load_pair<mode>): Split to load_pairsi, load_pairdi, load_pairsf and load_pairdf. (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split from load_pair<mode>. New alternative to support int/fp registers in fp/int mode patterns. (store_pair<mode>): Split to store_pairsi, store_pairdi, store_pairsf and store_pairdf. (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split from store_pair<mode>. New alternative to support int/fp registers in fp/int mode patterns. (*load_pair_extendsidi2_aarch64): New pattern. (*load_pair_zero_extendsidi2_aarch64): New pattern. (aarch64-ldpstp.md): Include. * config/aarch64/aarch64-ldpstp.md: New file. 
* config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp): New. (extract_base_offset_in_addr): New. (aarch64_operands_ok_for_ldpstp): New. (aarch64_operands_adjust_ok_for_ldpstp): New. * config/aarch64/aarch64.c (enum sched_fusion_type): New enum. (TARGET_SCHED_FUSION_PRIORITY): New hook. (fusion_load_store): New function. (extract_base_offset_in_addr): New function. (aarch64_gen_adjusted_ldpstp): New function. (aarch64_sched_fusion_priority): New function. (aarch64_operands_ok_for_ldpstp): New function. (aarch64_operands_adjust_ok_for_ldpstp): New function. 2014-11-18 Bin Cheng bin.ch...@arm.com * gcc.target/aarch64/ldp-stp-1.c: New test. * gcc.target/aarch64/ldp-stp-2.c: New test. * gcc.target/aarch64/ldp-stp-3.c: New test. * gcc.target/aarch64/ldp-stp-4.c: New test. * gcc.target/aarch64/ldp-stp-5.c: New test. * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion and peephole2 pass.
Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217687) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -1054,64 +1054,141 @@ ;; Operands 1 and 3 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. 
-(define_insn load_pairmode - [(set (match_operand:GPI 0 register_operand =r) - (match_operand:GPI 1 aarch64_mem_pair_operand Ump)) - (set (match_operand:GPI 2 register_operand =r) -(match_operand:GPI 3 memory_operand m))] +(define_insn load_pairsi + [(set (match_operand:SI 0 register_operand =r,*w) + (match_operand:SI 1 aarch64_mem_pair_operand Ump,Ump)) + (set (match_operand:SI 2 register_operand =r,*w) + (match_operand:SI 3 memory_operand m,m))] rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, XEXP (operands[1], 0), - GET_MODE_SIZE (MODEmode))) - ldp\\t%w0, %w2, %1 - [(set_attr type load2)] + GET_MODE_SIZE (SImode))) + @ + ldp\\t%w0, %w2, %1 + ldp\\t%s0, %s2, %1 + [(set_attr type load2,neon_load1_2reg) + (set_attr fp *,yes)] ) +(define_insn load_pairdi + [(set (match_operand:DI 0 register_operand =r,*w) + (match_operand:DI 1 aarch64_mem_pair_operand Ump,Ump)) + (set (match_operand:DI 2 register_operand =r,*w) + (match_operand:DI 3 memory_operand m,m))] + rtx_equal_p (XEXP (operands[3], 0), + plus_constant (Pmode, + XEXP (operands[1], 0), + GET_MODE_SIZE (DImode))) + @ + ldp\\t%x0, %x2, %1 + ldp\\t%d0, %d2, %1 + [(set_attr type load2,neon_load1_2reg) +
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
On 06/11/14 08:35, Yangfei (Felix) wrote: The idea is simple: Use movw for certain const source operands instead of ldrh. And exclude the const values which cannot be handled by mov/mvn/movw. I am doing regression testing for this patch. Assuming no issue pops up, OK for trunk? So, doesn't that make the bug latent for architectures older than armv6t2 and big-endian, and only fix this in ARM state? I'd prefer a complete solution please. What about *thumb2_movhi_insn in thumb2.md ? Actually, we fix the bug by excluding the const values which cannot be handled. The patch still works even without the addition of movw here. The new movw alternative here is just a small code optimization for certain architectures. We can handle consts by movw instead of ldrh, and this is better for performance. We find that this bug is not reproducible for *thumb2_movhi_insn. The reason is that this pattern can always move consts using movw. Please fix the PR number in your final commit with PR 59593. Index: gcc/config/arm/predicates.md === --- gcc/config/arm/predicates.md (revision 216838) +++ gcc/config/arm/predicates.md (working copy) @@ -144,6 +144,11 @@ (and (match_code const_int) (match_test INTVAL (op) == 0))) +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (and (match_code const_int) + (match_test (INTVAL (op) 0x) == 0 + ;; Something valid on the RHS of an ARM data-processing instruction (define_predicate arm_rhs_operand (ior (match_operand 0 s_register_operand) @@ -211,6 +216,11 @@ (ior (match_operand 0 arm_rhs_operand) (match_operand 0 arm_not_immediate_operand))) +(define_predicate arm_hi_operand + (ior (match_operand 0 arm_rhsm_operand) + (ior (match_operand 0 arm_not_immediate_operand) +(match_operand 0 arm_movw_immediate_operand + I think you don't need any of these predicates. 
(define_predicate arm_di_operand (ior (match_operand 0 s_register_operand) (match_operand 0 arm_immediate_di_operand))) Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 216838) +++ gcc/config/arm/arm.md (working copy) @@ -6285,8 +6285,8 @@ ;; Pattern to recognize insn generated default case above (define_insn *movhi_insn_arch4 - [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r) - (match_operand:HI 1 general_operand rIk,K,r,mi))] + [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r) + (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))] Use `n' instead of `j' - movw can handle all of the numerical constants for HImode values. And the predicate can remain general_operand. TARGET_ARM arm_arch4 (register_operand (operands[0], HImode) @@ -6294,16 +6294,18 @@ @ mov%?\\t%0, %1\\t%@ movhi mvn%?\\t%0, #%B1\\t%@ movhi + movw%?\\t%0, %1\\t%@ movhi str%(h%)\\t%1, %0\\t%@ movhi ldr%(h%)\\t%0, %1\\t%@ movhi [(set_attr predicable yes) - (set_attr pool_range *,*,*,256) - (set_attr neg_pool_range *,*,*,244) + (set_attr pool_range *,*,*,*,256) + (set_attr neg_pool_range *,*,*,*,244) (set_attr_alternative type [(if_then_else (match_operand 1 const_int_operand ) (const_string mov_imm ) (const_string mov_reg)) (const_string mvn_imm) + (const_string mov_imm) (const_string store1) (const_string load1)])] ) Look at the set_attr arch alternative and set this to t2 for the movw alternative. I would expect that to be enough to sort this out instead of adding all this code. I don't think your patch or the one with my modifications fixes the issue completely for architectures that do not have movw / movt instructions and I would expect that one would still need Thomas's patch to create const table entries of the right size. Otherwise this is OK. 
regards Ramana --- gcc/ChangeLog (revision 216838) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,12 @@ +2014-11-05 Felix Yang felix.y...@huawei.com + Shanyao Chen chenshan...@huawei.com + I'm assuming you have copyright assignments sorted. Yes, both my employer (Huawei) and I have signed copyright assignments with the FSF. +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (ior (match_code high) I don't see why you need to check for high here ? Yeah, I double-checked and found that it's not necessary here. Thanks for pointing this out. Attached please find the updated patch. Reg-tested for armeb-none-eabi. OK for the trunk? Index: gcc/ChangeLog
[PATCH] [AArch64, NEON] More NEON intrinsics improvement
Hi, This patch converts more intrinsics to use builtin functions instead of the previous inline assembly syntax. Passed the glorious testsuite of Christophe Lyon. Three testcases are added for the testing of intrinsics which are not covered by the testsuite: gcc.target/aarch64/vfma.c gcc.target/aarch64/vfma_n.c gcc.target/aarch64/vfms.c Regtested with aarch64-linux-gnu on QEMU. OK for the trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,26 @@ +2014-11-18 Felix Yang felix.y...@huawei.com + Haijian Zhang z.zhanghaij...@huawei.com + Jiji Jiang jiangj...@huawei.com + Pengfei Sui suipeng...@huawei.com + + * config/aarch64/arm_neon.h (vrecpe_u32, vrecpeq_u32): Rewrite using + builtin functions. + (vfma_f32, vfmaq_f32, vfmaq_f64, vfma_n_f32, vfmaq_n_f32, vfmaq_n_f64, + vfms_f32, vfmsq_f32, vfmsq_f64): Likewise. + (vhsub_s8, vhsub_u8, vhsub_s16, vhsub_u16, vhsub_s32, vhsub_u32, + vhsubq_s8, vhsubq_u8, vhsubq_s16, vhsubq_u16, vhsubq_s32, vhsubq_u32, + vsubhn_s16, vsubhn_u16, vsubhn_s32, vsubhn_u32, vsubhn_s64, vsubhn_u64, + vrsubhn_s16, vrsubhn_u16, vrsubhn_s32, vrsubhn_u32, vrsubhn_s64, + vrsubhn_u64, vsubhn_high_s16, vsubhn_high_u16, vsubhn_high_s32, + vsubhn_high_u32, vsubhn_high_s64, vsubhn_high_u64, vrsubhn_high_s16, + vrsubhn_high_u16, vrsubhn_high_s32, vrsubhn_high_u32, vrsubhn_high_s64, + vrsubhn_high_u64): Likewise. + * config/aarch64/iterators.md (VDQ_SI): New mode iterator. + * config/aarch64/aarch64.md (define_c_enum unspec): Add UNSPEC_URECPE. + * config/aarch64/aarch64-simd.md (aarch64_urecpe<mode>): New pattern. + * config/aarch64/aarch64-simd-builtins.def (shsub, uhsub, subhn, rsubhn, + subhn2, rsubhn2, urecpe): New builtins. 
+ 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c === --- gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c (revision 217394) +++ gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c (working copy) @@ -107,9 +107,9 @@ ONE (vmovn_high, uint16x8_t, uint16x4_t, uint32x4_ ONE (vmovn_high, uint32x4_t, uint32x2_t, uint64x2_t, u64) -/* { dg-final { scan-assembler-times \\tsubhn2 v 6} } */ +/* { dg-final { scan-assembler-times \\tsubhn2\\tv 6} } */ /* { dg-final { scan-assembler-times \\taddhn2\\tv 6} } */ -/* { dg-final { scan-assembler-times rsubhn2 v 6} } */ +/* { dg-final { scan-assembler-times rsubhn2\\tv 6} } */ /* { dg-final { scan-assembler-times raddhn2\\tv 6} } */ /* { dg-final { scan-assembler-times \\trshrn2 v 6} } */ /* { dg-final { scan-assembler-times \\tshrn2 v 6} } */ Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c === --- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c (revision 0) @@ -0,0 +1,69 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 0x4486feb8 }; +VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520, 0x40890ee1532b8520 }; + +#define VECT_VAR_ASSIGN(S,Q,T1,W) S##Q##_##T1##W +#define ASSIGN(S, Q, T, W, V) T##W##_t S##Q##_##T##W = V +#define TEST_MSG VFMA/VFMAQ +void exec_vfma_n (void) +{ + /* Basic test: v4=vfma_n(v1,v2), then store the result. 
*/ +#define TEST_VFMA(Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +vfma##Q##_n_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N), \ + VECT_VAR_ASSIGN(Scalar, Q, T1, W)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define CHECK_VFMA_RESULTS(test_name,comment) \ + {\ +CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment); \ +CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment); \ + CHECK_FP(test_name, float, 64, 2, PRIx64, expected, comment); \ + } + +#define DECL_VABD_VAR(VAR) \ + DECL_VARIABLE(VAR, float, 32, 2);\ + DECL_VARIABLE(VAR, float, 32, 4);\ + DECL_VARIABLE(VAR, float, 64, 2); + + DECL_VABD_VAR(vector1); + DECL_VABD_VAR(vector2); + DECL_VABD_VAR(vector3); +
Re: [PATCH][ARM] Handle simple SImode PLUS and MINUS cases in rtx costs
On Wed, Nov 12, 2014 at 4:38 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is a much-delayed respin of the patch in response to Richard's feedback at: http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00068.html We now let recursion do its magic and just add the cost of the arithmetic operation on top. Tested on arm-none-eabi. Ok for trunk? Ok. Ramana 2014-11-12 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (arm_new_rtx_costs, case PLUS, MINUS): Add cost of alu.arith in simple SImode case.
Re: [PATCH 0/3][AArch64]More intrinsics/builtins improvements
Yeah, I agree that your approach is better. I missed this point. Thanks. Ah, sorry for the duplication of effort. And thanks for the heads-up about upcoming work! I don't think I have any plans for any of those others at the moment. In the case of vld1_dup, however, I'm going to argue that my approach (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01718.html) is better, in that a builtin is opaque (blocks optimization) for the midend, whereas gcc vector extensions (as in vdup_n_...) allow the midend to perform constant-folding, etc. Does that make sense? --Alan Yangfei (Felix) wrote: These three are logically independent, but all on a common theme, and I've tested them all together by bootstrapped + check-gcc on aarch64-none-elf cross-tested check-gcc on aarch64_be-none-elf Ok for trunk? Hi Alan, It seems that we are duplicating the work for the vld1_dup part. (Refer to my message: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01462.html) I have a plan to improve these intrinsics/builtins: vrsubhnX, vrsqrtX, vqrdmulX, vqmovX, vqdmulhqX, vqdmulhX, vpminX, vpmaxX, vpaddX, vpadaX vmvnX, vmulxX, vmovnX, vmlsX, vhsubX, vcvtX, vcopyX, vaddlvX, vabX, vfmX, vrecpeX, vcntX, vclsX And work for these intrinsics is in progress: vfmX, vrecpeX, vhsubX, vcntX, vclsX Please let me know if you guys want to work on any of them. Thanks.
Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
Now - I can see how that is easily confused by the static chain being address-taken. But I also remember that Eric did some preparatory work to fix that, for nested functions, that is, possibly setting DECL_NONADDRESSABLE_P? Don't remember exactly. The preparatory work is DECL_NONLOCAL_FRAME. The complete patch which does something along these lines is attached to PR tree-optimization/54779 (latest version, for a 4.9-based compiler). -- Eric Botcazou
Re: [ARM] Refactor Neon Builtins infrastructure
On Wed, Nov 12, 2014 at 5:08 PM, James Greenhalgh james.greenha...@arm.com wrote: Hi, I was taking a look at fixing the issues in the ARM back-end exposed by Marc Glisse's patch in [1], and hoped to fix them by adapting the patch recently committed by Tejas ([2]). As I looked, I realised that the ARM target and the AArch64 target now differ drastically in how their Advanced SIMD builtin initialisation and expansion logic works. This is a growing maintenance burden. This patch series is an attempt to start fixing the problem. From a high level, I see five problems with the ARM Neon builtin code. First is the magic number interface, which gives builtins with signed and unsigned, or saturating and non-saturating, variants an extra parameter used to control which instruction is ultimately emitted. This is problematic as it enforces that these intrinsics be implemented with an UNSPEC pattern; we would like the flexibility to try to do a better job of modeling these patterns. Second, is that all the code lives in arm.c. This file is huge and frightening. The least we could do is start to split it up! Third, is the complicated builtin initialisation code. If we collect common cases together from the large switch in the initialisation function, it is clear we can eliminate much of the existing code. In fact, we have already solved the same problem in AArch64 ([3]), and we don't gain anything from having these interfaces separate. Fourth, is that we don't have infrastructure to strongly type the functions in arm_neon.h - instead casting around between signed and unsigned vector arguments as required. We need this to avoid special casing some builtins we may want to vectorize (bswap and friends). Again we've solved this in AArch64 ([4]). Finally, there are the issues with type mangling Marc has seen. 
This patch-set tries to fix those issues in order, and progresses as so: First the magic words: [Refactor Builtins: 1/8] Remove arm_neon.h's Magic Words Then moving code out to arm-builtins.c: [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file And then making the ARM backend look like the AArch64 backend and fixing Marc's issue. [Patch ARM Refactor Builtins 4/8] Refactor VARn Macros [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM. [Patch ARM Refactor Builtins 6/8] Add some tests for poly mangling [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure Clearly there is more we could do to start sharing code between the two targets rather than duplicating it. For now, the benefit did not seem worth the substantial churn that this would cause both back-ends. I've bootstrapped each patch in this series in turn for both arm and thumb on arm-none-linux-gnueabihf. OK for trunk? This is OK thanks. regards Ramana Thanks, James --- [1]: [c++] typeinfo for target types https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html [2]: [AArch64, Patch] Restructure arm_neon.h vector types's implementation https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00264.html [3]: [AArch64] Refactor Advanced SIMD builtin initialisation. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00532.html [4]: [AArch64] AArch64 SIMD Builtins Better Type Correctness. 
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html --- gcc/config.gcc |3 +- gcc/config/arm/arm-builtins.c| 2925 gcc/config/arm/arm-protos.h | 173 +- gcc/config/arm/arm-simd-builtin-types.def| 48 + gcc/config/arm/arm.c | 3149 +- gcc/config/arm/arm_neon.h| 1743 +++--- gcc/config/arm/arm_neon_builtins.def | 435 ++-- gcc/config/arm/iterators.md | 167 ++ gcc/config/arm/neon.md | 893 gcc/config/arm/t-arm | 11 + gcc/config/arm/unspecs.md| 109 +- gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C | 16 + gcc/testsuite/g++.dg/abi/mangle-neon.C |5 + gcc/testsuite/gcc.target/arm/pr51968.c |2 +- create mode 100644 gcc/config/arm/arm-builtins.c create mode 100644 gcc/config/arm/arm-simd-builtin-types.def create mode 100644 gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C 14 files changed, 4992 insertions(+), 4687 deletions(-)
Re: [Refactor Builtins: 1/8] Remove arm_neon.h's Magic Words
On 12/11/14 17:10, James Greenhalgh wrote: Hi, As part of some wider cleanup I'd like to do to ARM's Neon Builtin infrastructure, my first step will be to remove the Magic Words used to decide which variant of an instruction should be emitted. The Magic Words interface allows a single builtin (say, __builtin_neon_shr_nv4hi) to cover signed, unsigned and rounding variants through the use of an extra control parameter. This patch removes that interface, defining individual builtins for each variant and dropping the extra parameter. There are several benefits to cleaning this up: * We can start to drop some of the UNSPEC operations without having to add additional expand patterns to map them. * The interface is confusing on first glance at the file. * Having such a different interface to AArch64 doubles the amount of time it takes to grok the Neon Builtins infrastructure. The drawbacks of changing this interface are: * Another big churn change for the ARM backend. * A series of new iterators, UNSPECs and builtin functions to cover the variants which were previously controlled by a Magic Word. * Lots more patterns for genrecog to think about, potentially slowing down compilation, increasing bootstrap time, and increasing compiler binary size. On balance, I think we should accept these drawbacks in return for the future clean-ups we enable, but I expect this to be controversial. This patch is naive and conservative. I don't make any effort to merge patterns across iterators, nor any attempt to change UNSPECs to specified tree codes. Future improvements in this area would be useful. I've bootstrapped the patch for arm-none-linux-gnueabihf in isolation, and in series. OK for trunk? Ok, Ramana Thanks, James --- gcc/testsuite/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * gcc.target/arm/pr51968.c (foo): Do not try to pass Magic Word. 
gcc/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * config/arm/arm.c (arm_expand_neon_builtin): Remove Magic Word parameter, rearrange switch statement accordingly. (arm_evpc_neon_vrev): Remove Magic Word. * config/arm/unspecs.md (unspec): Split many UNSPECs to rounding, or signed/unsigned variants. * config/arm/neon.md (vcondmodemode): Remove Magic Word code. (vcondumodemode): Likewise. (neon_vadd): Remove Magic Word operand. (neon_vaddl): Remove Magic Word operand, convert to use signed/unsigned iterator. (neon_vaddw): Likewise. (neon_vhadd): Likewise, also iterate over rounding forms. (neon_vqadd): Remove Magic Word operand, convert to use signed/unsigned iterator. (neon_vraddhn): Remove Magic Word operand, convert to iterate over rounding forms. (neon_vmul): Remove Magic Word operand, iterate over polynomial/float instruction forms. (neon_vmla): Remove Magic Word operand. (neon_vfma): Likewise. (neon_vfms): Likewise. (neon_vmls): Likewise. (neon_vmlal): Remove Magic Word operand, iterate over signed/unsigned forms. (neon_vmlsl): Likewise. (neon_vqdmulh): Remove Magic Word operand, iterate over rounding forms. (neon_vqdmlal): Remove Magic Word operand, iterate over signed/unsigned forms. (neon_vqdmlsl): Likewise. (neon_vmull): Likewise. (neon_vqdmull): Remove Magic Word operand. (neon_vsub): Remove Magic Word operand. (neon_vsubl): Remove Magic Word operand, convert to use signed/unsigned iterator. (neon_vsubw): Likewise. (neon_vhsub): Likewise. (neon_vqsub): Likewise. (neon_vrsubhn): Remove Magic Word operand, convert to iterate over rounding forms. (neon_vceq): Remove Magic Word operand. (neon_vcge): Likewise. (neon_vcgeu): Likewise. (neon_vcgt): Likewise. (neon_vcgtu): Likewise. (neon_vcle): Likewise. (neon_vclt): Likewise. (neon_vcage): Likewise. (neon_vcagt): Likewise. (neon_vabd): Remove Magic Word operand, iterate over signed/unsigned forms, and split out... (neon_vabdf): ...this as new. 
(neon_vabdl): Remove Magic Word operand, iterate over signed/unsigned forms. (neon_vaba): Likewise. (neon_vmax): Remove Magic Word operand, iterate over signed/unsigned and max/min forms, and split out... (neon_vmaxminf): ...this as new. (neon_vmin): Delete. (neon_vpadd): Remove Magic Word operand. (neon_vpaddl): Remove Magic Word operand, iterate over signed/unsigned variants. (neon_vpadal): Likewise. (neon_vpmax): Remove Magic Word operand, iterate over signed/unsigned and max/min forms, and split out...
Re: [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
On 12/11/14 17:10, James Greenhalgh wrote: Hi, If we want to move all the code relating to builtin initialisation and expansion to a common file, we must share the processor flags with that common file. This patch pulls those definitions out to config/arm/arm-protos.h Bootstrapped and regression tested in series, and in isolation with no issues. OK? Thanks, James Ok. Ramana --- 2014-11-12 James Greenhalgh james.greenha...@arm.com * config/arm/t-arm (arm.o): Include arm-protos.h in the recipe. * config/arm/arm.c (FL_CO_PROC): Move to arm-protos.h. (FL_ARCH3M): Likewise. (FL_MODE26): Likewise. (FL_MODE32): Likewise. (FL_ARCH4): Likewise. (FL_ARCH5): Likewise. (FL_THUMB): Likewise. (FL_LDSCHED): Likewise. (FL_STRONG): Likewise. (FL_ARCH5E): Likewise. (FL_XSCALE): Likewise. (FL_ARCH6): Likewise. (FL_VFPV2): Likewise. (FL_WBUF): Likewise. (FL_ARCH6K): Likewise. (FL_THUMB2): Likewise. (FL_NOTM): Likewise. (FL_THUMB_DIV): Likewise. (FL_VFPV3): Likewise. (FL_NEON): Likewise. (FL_ARCH7EM): Likewise. (FL_ARCH7): Likewise. (FL_ARM_DIV): Likewise. (FL_ARCH8): Likewise. (FL_CRC32): Likewise. (FL_SMALLMUL): Likewise. (FL_IWMMXT): Likewise. (FL_IWMMXT2): Likewise. (FL_TUNE): Likewise. (FL_FOR_ARCH2): Likewise. (FL_FOR_ARCH3): Likewise. (FL_FOR_ARCH3M): Likewise. (FL_FOR_ARCH4): Likewise. (FL_FOR_ARCH4T): Likewise. (FL_FOR_ARCH5): Likewise. (FL_FOR_ARCH5T): Likewise. (FL_FOR_ARCH5E): Likewise. (FL_FOR_ARCH5TE): Likewise. (FL_FOR_ARCH5TEJ): Likewise. (FL_FOR_ARCH6): Likewise. (FL_FOR_ARCH6J): Likewise. (FL_FOR_ARCH6K): Likewise. (FL_FOR_ARCH6Z): Likewise. (FL_FOR_ARCH6ZK): Likewise. (FL_FOR_ARCH6T2): Likewise. (FL_FOR_ARCH6M): Likewise. (FL_FOR_ARCH7): Likewise. (FL_FOR_ARCH7A): Likewise. (FL_FOR_ARCH7VE): Likewise. (FL_FOR_ARCH7R): Likewise. (FL_FOR_ARCH7M): Likewise. (FL_FOR_ARCH7EM): Likewise. (FL_FOR_ARCH8A): Likewise. * config/arm/arm-protos.h: Take definitions moved from arm.c.
RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
Hi Christophe, Ah sorry. My mistake - it fixes this in bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 This change is needed in order to remove the CANNOT_CHANGE_MODE_CLASS #define, which will be committed as a separate patch. Regards, David Sherwood. -Original Message- From: Christophe Lyon [mailto:christophe.l...@linaro.org] Sent: 17 November 2014 21:09 To: David Sherwood Cc: gcc-patches@gcc.gnu.org; Alan Hayward Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. On 13 November 2014 15:32, David Sherwood david.sherw...@arm.com wrote: Hi, I think that's because Alan Hayward's original patch had a bug in it, which he has fixed and should be submitting to gcc patches fairly soon. So it's probably best waiting until he gets a new patch up I think. I've applied both Alan's patches and the advsimd-intrinsics tests now all pass on aarch64_be, but this doesn't need your patch. Which testcase does your patch actually fix? Regards, David. -Original Message- From: Christophe Lyon [mailto:christophe.l...@linaro.org] Sent: 13 November 2014 14:22 To: David Sherwood Cc: gcc-patches@gcc.gnu.org Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. On 13 November 2014 11:09, David Sherwood david.sherw...@arm.com wrote: Hi All, I have successfully rebased this and tested in conjunction with a patch from Alan Hayward ([AArch64] [BE] Fix vector load/stores to not use ld1/st1), who should be submitting a new version shortly. Built and tested on: aarch64-none-elf aarch64_be-none-elf x86_64-linux-gnu I've applied both patches to recent trunk (r217483), and I still see ICEs on aarch64_be, when running the advsimd-intrinsics/vldX.c tests. I have them with Alan Hayward's patch alone, too. BTW, I thought that patch had been approved by Marcus, but it's not in trunk yet. Maybe I missed something. Christophe. Regards, David Sherwood. 
-Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 28 October 2014 08:55 To: 'gcc-patches@gcc.gnu.org' Subject: RE: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, Sorry to bother you again. Could someone take a look at this change please if they have time? Thanks! David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 10 October 2014 15:48 To: gcc-patches@gcc.gnu.org Subject: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-11-13 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added.
Re: [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file
On 12/11/14 17:10, James Greenhalgh wrote: Hi, The config/arm/arm.c file has always seemed a worrying size to me. This patch pulls out the builtin related code to its own file. I think this will be a good idea as we move forward. It seems a more sensible separation of concerns. There are no functional changes here. Bootstrapped and regression tested on arm-none-linux-gnueabi, with no issues. OK? Thanks, James --- 2014-11-12 James Greenhalgh james.greenha...@arm.com * config.gcc (extra_objs): Add arm-builtins.o for arm*-*-*. (target_gtfiles): Add config/arm/arm-builtins.c for arm*-*-*. * config/arm/arm-builtins.c: New. * config/arm/t-arm (arm_builtins.o): New. * config/arm/arm-protos.h (arm_expand_builtin): New. (arm_builtin_decl): Likewise. (arm_init_builtins): Likewise. (arm_atomic_assign_expand_fenv): Likewise. * config/arm/arm.c (arm_atomic_assign_expand_fenv): Remove prototype. (arm_init_builtins): Likewise. (arm_init_iwmmxt_builtins): Likewise (safe_vector_operand): Likewise (arm_expand_binop_builtin): Likewise (arm_expand_unop_builtin): Likewise (arm_expand_builtin): Likewise (arm_builtin_decl): Likewise (insn_flags): Remove static. (tune_flags): Likewise. (enum arm_builtins): Move to config/arm/arm-builtins.c. (arm_init_neon_builtins): Likewise. (struct builtin_description): Likewise. (arm_init_iwmmxt_builtins): Likewise. (arm_init_fp16_builtins): Likewise. (arm_init_crc32_builtins): Likewise. (arm_init_builtins): Likewise. (arm_builtin_decl): Likewise. (safe_vector_operand): Likewise. (arm_expand_ternop_builtin): Likewise. (arm_expand_binop_builtin): Likewise. (arm_expand_unop_builtin): Likewise. (neon_dereference_pointer): Likewise. (arm_expand_neon_args): Likewise. (arm_expand_neon_builtin): Likewise. (neon_split_vcombine): Likewise. (arm_expand_builtin): Likewise. (arm_builtin_vectorized_function): Likewise. (arm_atomic_assign_expand_fenv): Likewise. Ok. Ramana
Re: [Patch ARM Refactor Builtins 4/8] Refactor VARn Macros
On 12/11/14 17:10, James Greenhalgh wrote: Hi, These macros can always be defined as a base case of VAR1 and a recursive case of VARn-1. At the moment, the body of VAR1 is duplicated to each macro. This patch makes that change. Regression tested on arm-none-linux-gnueabihf with no issues. OK? Thanks, James --- gcc/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * config/arm/arm-builtins.c (VAR1): Add a comma. (VAR2): Rewrite in terms of VAR1. (VAR3-10): Likewise. (arm_builtins): Remove leading comma before ARM_BUILTIN_MAX. * config/arm/arm_neon_builtins.def: Remove trailing commas. OK. Ramana
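The recursive scheme James describes — VAR1 as the base case and each VARn expanding to VAR(n-1) plus one more VAR1 — can be sketched as below. The table-building payload is invented for illustration; the real macros emit builtin-table entries:

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> table;

// Base case: does the real work once.
#define VAR1(NAME, A)       table.push_back (#NAME "_" #A);
// Recursive cases: peel one argument, delegate the rest.
#define VAR2(NAME, A, B)    VAR1 (NAME, A) VAR1 (NAME, B)
#define VAR3(NAME, A, B, C) VAR2 (NAME, A, B) VAR1 (NAME, C)

static void build_table ()
{
  // One invocation registers an entry per mode, with no duplicated
  // macro bodies to keep in sync.
  VAR3 (vadd, v8qi, v4hi, v2si)
}
```

The benefit mirrors the patch: a change to the expansion only ever needs to be made in VAR1.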
Re: [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM.
On 12/11/14 17:10, James Greenhalgh wrote: Hi, Now we have everything we need to start keeping track of the correct qualifiers for each Neon builtin class in the arm back-end. Some of the ARM Neon itypes are redundant when mapped to the qualifiers framework. For now, don't change these, we will clean them up in patch 8 of the series. Bootstrapped on arm-none-gnueabihf with no issues. OK? OK. Ramana Thanks, James --- gcc/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * gcc/config/arm/arm-builtins.c (arm_type_qualifiers): New. (neon_itype): Add new types corresponding to the types used in qualifiers names. (arm_unop_qualifiers): New. (arm_bswap_qualifiers): Likewise. (arm_binop_qualifiers): Likewise. (arm_ternop_qualifiers): Likewise. (arm_getlane_qualifiers): Likewise. (arm_lanemac_qualifiers): Likewise. (arm_setlane_qualifiers): Likewise. (arm_combine_qualifiers): Likewise. (arm_load1_qualifiers): Likewise. (arm_load1_lane_qualifiers): Likewise. (arm_store1_qualifiers): Likewise. (arm_storestruct_lane_qualifiers): Likewise. (UNOP_QUALIFIERS): Likewise. (DUP_QUALIFIERS): Likewise. (SPLIT_QUALIFIERS): Likewise. (CONVERT_QUALIFIERS): Likewise. (FLOAT_WIDEN_QUALIFIERS): Likewise. (FLOAT_NARROW_QUALIFIERS): Likewise. (RINT_QUALIFIERS): Likewise. (COPYSIGNF_QUALIFIERS): Likewise. (CREATE_QUALIFIERS): Likewise. (REINTERP_QUALIFIERS): Likewise. (BSWAP_QUALIFIERS): Likewise. (BINOP_QUALIFIERS): Likewise. (FIXCONV_QUALIFIERS): Likewise. (SCALARMUL_QUALIFIERS): Likewise. (SCALARMULL_QUALIFIERS): Likewise. (SCALARMULH_QUALIFIERS): Likewise. (TERNOP_QUALIFIERS): Likewise. (SELECT_QUALIFIERS): Likewise. (VTBX_QUALIFIERS): Likewise. (GETLANE_QUALIFIERS): Likewise. (SHIFTIMM_QUALIFIERS): Likewise. (LANEMAC_QUALIFIERS): Likewise. (SCALARMAC_QUALIFIERS): Likewise. (SETLANE_QUALIFIERS): Likewise. (SHIFTINSERT_QUALIFIERS): Likewise. (SHIFTACC_QUALIFIERS): Likewise. (LANEMUL_QUALIFIERS): Likewise. (LANEMULL_QUALIFIERS): Likewise. (LANEMULH_QUALIFIERS): Likewise. 
(COMBINE_QUALIFIERS): Likewise. (VTBL_QUALIFIERS): Likewise. (LOAD1_QUALIFIERS): Likewise. (LOADSTRUCT_QUALIFIERS): Likewise. (LOAD1LANE_QUALIFIERS): Likewise. (LOADSTRUCTLANE_QUALIFIERS): Likewise. (STORE1_QUALIFIERS): Likewise. (STORESTRUCT_QUALIFIERS): Likewise. (STORE1LANE_QUALIFIERS): Likewise. (STORESTRUCTLANE_QUALIFIERS): Likewise. (neon_builtin_datum): Keep track of qualifiers. (VAR1): Likewise.
Re: [Patch ARM Refactor Builtins 6/8] Add some tests for poly mangling
On 12/11/14 17:10, James Greenhalgh wrote: Hi, The poly types end up going through the default mangler, but only sometimes. We don't want to change the mangling for poly types with the next patch in this series, so add a test which should pass before and after. I've checked that the new tests pass at this stage of the patch series, and bootstrapped on arm-none-linux-gnueabihf for good luck. OK? OK. Ramana Thanks, James --- gcc/testsuite/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * g++.dg/abi/mangle-arm-crypto.C: New. * g++.dg/abi/mangle-neon.C (f19): New. (f20): Likewise.
Re: [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling
On 12/11/14 17:10, James Greenhalgh wrote: Hi, This patch wires up builtin initialisation similar to the AArch64 backend, making use of the qualifiers arrays to decide on types for each builtin we hope to initialise. We could take an old snapshot of the qualifiers code from AArch64, but as our end-goal is to pull in the type mangling changes, we are as well to do that now. In order to preserve the old mangling rules after this patch, we must wire all of these types up. Together, this becomes a fairly simple side-port of the logic for Advanced SIMD builtins from the AArch64 target. Bootstrapped on arm-none-linux-gnueabihf with no issues. OK? Thanks, James --- OK. Ramana gcc/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * config/arm/arm-builtins.c (arm_scalar_builtin_types): New. (enum arm_simd_type): Likewise. (struct arm_simd_type_info): Likewise (arm_mangle_builtin_scalar_type): Likewise. (arm_mangle_builtin_vector_type): Likewise. (arm_mangle_builtin_type): Likewise. (arm_simd_builtin_std_type): Likewise. (arm_lookup_simd_builtin_type): Likewise. (arm_simd_builtin_type): Likewise. (arm_init_simd_builtin_types): Likewise. (arm_init_simd_builtin_scalar_types): Likewise. (arm_init_neon_builtins): Rewrite using qualifiers. * config/arm/arm-protos.h (arm_mangle_builtin_type): New. * config/arm/arm-simd-builtin-types.def: New file. * config/arm/t-arm (arm-builtins.o): Depend on it. * config/arm/arm.c (arm_mangle_type): Call arm_mangle_builtin_type. * config/arm/arm_neon.h (int8x8_t): Use new internal type. (int16x4_t): Likewise. (int32x2_t): Likewise. (float16x4_t): Likewise. (float32x2_t): Likewise. (poly8x8_t): Likewise. (poly16x4_t): Likewise. (uint8x8_t): Likewise. (uint16x4_t): Likewise. (uint32x2_t): Likewise. (int8x16_t): Likewise. (int16x8_t): Likewise. (int32x4_t): Likewise. (int64x2_t): Likewise. (float32x4_t): Likewise. (poly8x16_t): Likewise. (poly16x8_t): Likewise. (uint8x16_t): Likewise. (uint16x8_t): Likewise. (uint32x4_t): Likewise. 
(uint64x2_t): Likewise. Conflicts: gcc/config/arm/arm.c
Re: [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure
On 12/11/14 17:10, James Greenhalgh wrote: Hi, This final patch clears up the remaining data structures which we no longer have any use for... * _QUALIFIERS macros which do not name a distinct pattern of arguments/return types. * The neon_builtin_type_mode enum is not needed, we can map directly to the machine_mode. * The neon_itype enum is not needed, the builtin expand functions can be rewritten to use the qualifiers data. This gives us reasonable parity between the builtin infrastructure for the ARM and AArch64 targets. We could go further and start sharing some of the logic between the two back-ends (and after that the builtin definitions, and some of arm_neon.h, etc.), but I haven't done that here as the immediate benefit is minimal. Bootstrapped and regression tested with no issues. OK? OK. Ramana Thanks, James --- gcc/ 2014-11-12 James Greenhalgh james.greenha...@arm.com * config/arm/arm-builtins.c (CONVERT_QUALIFIERS): Delete. (COPYSIGNF_QUALIFIERS): Likewise. (CREATE_QUALIFIERS): Likewise. (DUP_QUALIFIERS): Likewise. (FLOAT_WIDEN_QUALIFIERS): Likewise. (FLOAT_NARROW_QUALIFIERS): Likewise. (REINTERP_QUALIFIERS): Likewise. (RINT_QUALIFIERS): Likewise. (SPLIT_QUALIFIERS): Likewise. (FIXCONV_QUALIFIERS): Likewise. (SCALARMUL_QUALIFIERS): Likewise. (SCALARMULL_QUALIFIERS): Likewise. (SCALARMULH_QUALIFIERS): Likewise. (SELECT_QUALIFIERS): Likewise. (VTBX_QUALIFIERS): Likewise. (SHIFTIMM_QUALIFIERS): Likewise. (SCALARMAC_QUALIFIERS): Likewise. (LANEMUL_QUALIFIERS): Likewise. (LANEMULH_QUALIFIERS): Likewise. (LANEMULL_QUALIFIERS): Likewise. (SHIFTACC_QUALIFIERS): Likewise. (SHIFTINSERT_QUALIFIERS): Likewise. (VTBL_QUALIFIERS): Likewise. (LOADSTRUCT_QUALIFIERS): Likewise. (LOADSTRUCTLANE_QUALIFIERS): Likewise. (STORESTRUCT_QUALIFIERS): Likewise. (STORESTRUCTLANE_QUALIFIERS): Likewise. (neon_builtin_type_mode): Delete. (v8qi_UP): Map to V8QImode. (v8qi_UP): Map to V8QImode. (v4hi_UP): Map to V4HImode. (v4hf_UP): Map to V4HFmode. (v2si_UP): Map to V2SImode. 
(v2sf_UP): Map to V2SFmode. (di_UP): Map to DImode. (v16qi_UP): Map to V16QImode. (v8hi_UP): Map to V8HImode. (v4si_UP): Map to V4SImode. (v4sf_UP): Map to V4SFmode. (v2di_UP): Map to V2DImode. (ti_UP): Map to TImode. (ei_UP): Map to EImode. (oi_UP): Map to OImode. (neon_itype): Delete. (neon_builtin_datum): Remove itype, make mode a machine_mode. (VAR1): Update accordingly. (arm_init_neon_builtins): Use machine_mode directly. (neon_dereference_pointer): Likewise. (arm_expand_neon_args): Use qualifiers to decide operand types. (arm_expand_neon_builtin): Likewise. * config/arm/arm_neon_builtins.def: Remap operation type for many builtins.
Re: [PATCH, ARM] Constrain the small multiply test cases to be more restrictive.
On Mon, Nov 17, 2014 at 5:06 AM, Hale Wang hale.w...@arm.com wrote: Hi, Refer to the previous small multiply patch (r217175). The conditions in the small multiply test cases are not restrictive enough. If -march=armv4t/armv5t is forced, these cases will fail. These cases can be used only if we define -mcpu=cortex-m0/m1/m0plus.small-multiply. This patch is used to fix this issue. These cases will be skipped if we don't define -mcpu=cortex-m0/m1/m0plus.small-multiply. So there is no influence on other targets. Build gcc passed. Is it OK for trunk? Ok, thanks. Ramana Thanks and Best Regards, Hale Wang gcc/testsuite/ChangeLog: 2014-11-13 Hale Wang hale.w...@arm.com * gcc.target/arm/small-multiply-m0-1.c: Only apply when -mcpu=cortex-m0/m1/m0plus.small-multiply. * gcc.target/arm/small-multiply-m0-2.c: Likewise. * gcc.target/arm/small-multiply-m0-3.c: Likewise. * gcc.target/arm/small-multiply-m0plus-1.c: Likewise. * gcc.target/arm/small-multiply-m0plus-2.c: Likewise. * gcc.target/arm/small-multiply-m0plus-3.c: Likewise. * gcc.target/arm/small-multiply-m1-1.c: Likewise. * gcc.target/arm/small-multiply-m1-2.c: Likewise. * gcc.target/arm/small-multiply-m1-3.c: Likewise. 
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c index 77ec603..49132e3 100644 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb1_ok } */ -/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */ +/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0.small-multiply" } } */ /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -O2" } */ int diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c index c89b3ba..7f1bf7b 100644 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb1_ok } */ -/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */ +/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0.small-multiply" } } */ /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -Os" } */ int diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c index b2df109..aca39d7 100644 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb1_ok } */ -/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */ +/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0.small-multiply" } } */ /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -Os" } 
*/ int diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c index 08a450b..12e8839 100644 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb1_ok } */ -/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" { arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0plus.small-multiply" } } */ +/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0plus.small-multiply" } } */ /* { dg-options "-mcpu=cortex-m0plus.small-multiply -mthumb -O2" } */ int diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c index 17b52d3..3e3c9b2 100644 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb1_ok } */ -/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" { arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0plus.small-multiply" } } */ +/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0plus.small-multiply" } } */ /* { dg-options "-mcpu=cortex-m0plus.small-multiply -mthumb -Os" } */ int diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
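The directive change above hinges on the three option-list arguments of dg-skip-if: selector, include-opts, exclude-opts. A hedged annotated sketch of a test file in this style (the dg- lines are harness comments; the function is just something for the harness to compile):

```cpp
// { dg-do compile }
// { dg-skip-if "specific to cortex-m0.small-multiply" { arm*-*-* } { "*" } { "-mcpu=cortex-m0.small-multiply" } }
//
// With include-opts "*" the test is skipped for ANY flag combination
// unless the exclude-opts entry (-mcpu=cortex-m0.small-multiply) is
// present -- i.e. the test only runs when that exact CPU is selected.
// The old form, { "-mcpu=*" }, only skipped when some other -mcpu was
// given, so a forced -march=armv4t still ran (and failed) the test.

// Payload: a multiply by a large constant, the kind of expression whose
// generated code the real tests go on to inspect.
int test (int a)
{
  return a * 0x1234567;
}
```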
Re: [PING][PATCH] [AARCH64, NEON] Improve vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics
On 17 November 2014 06:58, Yangfei (Felix) felix.y...@huawei.com wrote: PING? BTW: It seems that Alan's way of improving vld1(q?)_dup intrinsic is more elegant. So is the improvement of vcls(q?) vcnt(q?) OK for trunk? Thanks. Please rebase over Alan's patch and repost, thank you /Marcus
Re: [PATCH][AArch64] Remove crypto extension from default for cortex-a53, cortex-a57
On 17/11/14 16:59, Ramana Radhakrishnan wrote: On Mon, Nov 17, 2014 at 2:48 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some configurations of Cortex-A53 and Cortex-A57 don't ship with crypto, so enabling it by default for -mcpu=cortex-a53 and cortex-a57 is inappropriate. Tested aarch64-none-elf. Reminder that at the moment all the crypto extension does is enable the use of the ACLE crypto intrinsics in arm_neon.h Ok for trunk? I can't ok this but ... Since we've changed behaviour from 4.9 I think it warrants an entry in changes.html for 5.0 Makes sense. Here's what I propose. Ok? Kyrill Ramana Thanks, Kyrill 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-cores.def (cortex-a53): Remove AARCH64_FL_CRYPTO from feature flags. (cortex-a57): Likewise. (cortex-a57.cortex-a53): Likewise. Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.23 diff -U 3 -r1.23 changes.html --- htdocs/gcc-5/changes.html 14 Nov 2014 10:49:51 - 1.23 +++ htdocs/gcc-5/changes.html 17 Nov 2014 17:46:41 - @@ -257,6 +257,13 @@ Alternatively it can be enabled by default by configuring GCC with the <code>--enable-fix-cortex-a53-835769</code> option. </li> + <li> The cryptographic extensions to the ARMv8-A architecture are no + longer enabled by default when specifying the + <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or + <code>-mcpu=cortex-a57.cortex-a53</code> options. To enable these + extensions add the <code>+crypto</code> extension to your given + <code>-mcpu</code> or <code>-march</code> options' value. + </li> </ul>
Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
On Tue, 18 Nov 2014, Eric Botcazou wrote: Now - I can see how that is easily confused by the static chain being address-taken. But I also remember that Eric did some preparatory work to fix that, for nested functions, that is, possibly setting DECL_NONADDRESSABLE_P? Don't remember exactly. The preparatory work is DECL_NONLOCAL_FRAME. The complete patch which does something along these lines is attached to PR tree-optimization/54779 (latest version, for a 4.9-based compiler). Ah, now I remember - this was to be able to optimize away the frame variable in case the nested function was inlined. Tom's case is somewhat different as I understand it, as somehow LIM store motion doesn't handle indirect frame accesses well enough(?) So he intends to load register vars in the frame into registers at the beginning of the nested function and restore them to the frame on function exit (this will probably break for recursive calls, but OMP offloading might be special enough that this is a non-issue there). So marking the frame decl won't help him here (I thought we might mark the FIELD_DECLs corresponding to individual vars). OTOH inside the nested function accesses to the static chain should be easy to identify. Richard.
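The lowering Richard describes can be mocked up by hand. A hedged sketch of the transformation (the frame struct and names are invented; the real work happens on GIMPLE, not source):

```cpp
#include <cassert>

// Invented stand-in for the nested function's frame object, reached
// through the static chain.
struct frame { int counter; };

// Naive form: every iteration goes through the frame pointer, the
// indirect access LIM store motion reportedly fails to promote.
static void bump_naive (frame *fp, int n)
{
  for (int i = 0; i < n; ++i)
    fp->counter += i;
}

// Lowered form: shadow-copy the field into a local (register) at entry,
// loop on the local, write it back at exit.  As the thread notes, this
// is unsafe if recursion aliases the same frame, which is why it is
// pitched at OMP offloading only.
static void bump_lowered (frame *fp, int n)
{
  int local = fp->counter;   // load at function entry
  for (int i = 0; i < n; ++i)
    local += i;
  fp->counter = local;       // store at function exit
}
```

Both forms compute the same result; only the second keeps the hot value out of memory for the duration of the loop.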
Re: [patch] New std::string implementation
On Mon, Nov 17, 2014 at 5:43 PM, Jonathan Wakely jwak...@redhat.com wrote: On 17/11/14 13:06 +0100, Markus Trippelsdorf wrote: On 2014.11.14 at 15:43 +, Jonathan Wakely wrote: Tested on x86_64-linux and powerpc64-linux, also with --disable-libstdcxx11-abi to verify all the incompatible changes can be disabled if needed. On ppc64 I get: FAIL: libstdc++-abi/abi_check FAIL: 27_io/basic_ios/copyfmt/char/1.cc execution test FAIL: 27_io/basic_ios/exceptions/char/1.cc execution test FAIL: 27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc execution test FAIL: 27_io/basic_istream/extractors_arithmetic/wchar_t/exceptions_failbit.cc execution test FAIL: 27_io/basic_istream/extractors_other/char/exceptions_null.cc execution test FAIL: 27_io/basic_istream/extractors_other/wchar_t/exceptions_null.cc execution test FAIL: 27_io/basic_istream/sentry/char/12297.cc execution test FAIL: 27_io/basic_istream/sentry/wchar_t/12297.cc execution test FAIL: 27_io/basic_ostream/inserters_other/char/exceptions_null.cc execution test FAIL: 27_io/basic_ostream/inserters_other/wchar_t/exceptions_null.cc execution test FAIL: 27_io/ios_base/storage/2.cc execution test I think I've fixed those, I'll post an updated patch soon, but I'm still working on some fixes for 22_locale/ test FAILs. The problem I have is that std::basic_ios<C, T> objects (the base class of all iostreams) have a pointer to an instance of std::num_get<C, istreambuf_iterator<C, T> > and that type uses std::string, so needs to be tagged, which would require the entire iostreams hierarchy to be tagged. I want to avoid that. Updated patch with fixes asap ... Looking at all these issues that just pop up inside libstdc++ I wonder if this whole business will not blow up in our face once out in the wild... Richard.
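The "tagging" under discussion works by giving the new types a different mangled name while source-level names stay the same, so old-ABI and new-ABI symbols can coexist in one library. A hedged sketch of the inline-namespace mechanism (namespace and class names invented; libstdc++ combines this with an abi_tag attribute):

```cpp
#include <cassert>
#include <type_traits>

namespace mylib {
inline namespace __v2 {
struct widget { int payload = 42; };
}
}

// Both spellings name the same type; the inline namespace only shows up
// in the mangled name, which is what separates the two ABIs at link
// time while user source stays unchanged.
static_assert (std::is_same<mylib::widget, mylib::__v2::widget>::value,
               "unqualified and tagged names denote one type");

static int use_unqualified () { mylib::widget w; return w.payload; }
static int use_qualified () { mylib::__v2::widget w; return w.payload; }
```

The pain point in the thread follows directly: any type with a tagged type embedded in its mangled name (like num_get over std::string) picks up the tag transitively.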
Re: [PATCH][AArch64] Remove crypto extension from default for cortex-a53, cortex-a57
On 18/11/14 09:38, Kyrill Tkachov wrote: On 17/11/14 16:59, Ramana Radhakrishnan wrote: On Mon, Nov 17, 2014 at 2:48 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Some configurations of Cortex-A53 and Cortex-A57 don't ship with crypto, so enabling it by default for -mcpu=cortex-a53 and cortex-a57 is inappropriate. Tested aarch64-none-elf. Reminder that at the moment all the crypto extension does is enable the use of the ACLE crypto intrinsics in arm_neon.h Ok for trunk? I can't ok this but ... Since we've changed behaviour from 4.9 I think it warrants an entry in changes.html for 5.0 Makes sense. Here's what I propose. Ok? Kyrill Ramana Thanks, Kyrill 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-cores.def (cortex-a53): Remove AARCH64_FL_CRYPTO from feature flags. (cortex-a57): Likewise. (cortex-a57.cortex-a53): Likewise. www-docs-a50-crypto.patch Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.23 diff -U 3 -r1.23 changes.html --- htdocs/gcc-5/changes.html 14 Nov 2014 10:49:51 - 1.23 +++ htdocs/gcc-5/changes.html 17 Nov 2014 17:46:41 - @@ -257,6 +257,13 @@ Alternatively it can be enabled by default by configuring GCC with the <code>--enable-fix-cortex-a53-835769</code> option. </li> + <li> The cryptographic extensions to the ARMv8-A architecture are no + longer enabled by default when specifying the + <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or + <code>-mcpu=cortex-a57.cortex-a53</code> options. To enable these + extensions add the <code>+crypto</code> extension to your given + <code>-mcpu</code> or <code>-march</code> options' value. + </li> </ul> I'd suggest: The optional cryptographic extensions ... But otherwise it looks fine to me. R.
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Tue, Nov 18, 2014 at 2:59 AM, David Malcolm dmalc...@redhat.com wrote: On Mon, 2014-11-17 at 11:06 +0100, Richard Biener wrote: On Sat, Nov 15, 2014 at 12:00 PM, David Malcolm dmalc...@redhat.com wrote: On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote: On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm dmalc...@redhat.com wrote: On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: To be constructive here - the above case is from within a GIMPLE_ASSIGN case label and thus I'd have expected case GIMPLE_ASSIGN: { gassign *a1 = as_a <gassign *> (s1); gassign *a2 = as_a <gassign *> (s2); lhs1 = gimple_assign_lhs (a1); lhs2 = gimple_assign_lhs (a2); if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) return (operand_equal_p (lhs1, lhs2, 0) && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1), gimple_assign_rhs1 (a2))); else if (TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME) return vn_valueize (lhs1) == vn_valueize (lhs2); return false; } instead. That's the kind of changes I have expected and have approved of. But even that looks like just adding extra work for all developers, with no gain. You only have to add extra code and extra temporaries, in switches typically also have to add {} because of the temporaries and thus extra indentation level, and it doesn't simplify anything in the code. The branch attempts to use the C++ typesystem to capture information about the kinds of gimple statement we expect, both: (A) so that the compiler can detect type errors, and (B) as a comprehension aid to the human reader of the code. The ideal here is when function params and struct field can be strengthened from gimple to a subclass ptr. 
This captures the knowledge that every use of a function or within a struct has a given gimple code. I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who have to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this. Can you e.g. compare the size of - lines in your patchset combined, and size of + lines in your patchset? As in, if your changes lead to less typing or more. I see two ways out here. One is to add overloads to all the functions taking the special types like tree gimple_assign_rhs1 (gimple *); or simply add gassign *operator ()(gimple *g) { return as_a <gassign *> (g); } into a gimple-compat.h header which you include in places that are not converted nicely. Thanks for the suggestions. Am I missing something, or is the gimple-compat.h idea above not valid C++? Note that gimple is still a typedef to gimple_statement_base * (as noted before, the gimple -> gimple * change would break everyone else's patches, so we talked about that as a followup patch for early stage3). Given that, if I try to create an operator () outside of a class, I get this error: ‘gassign* operator()(gimple)’ must be a nonstatic member function which is emitted from cp/decl.c's grok_op_properties: /* An operator function must either be a non-static member function or have at least one parameter of a class, a reference to a class, an enumeration, or a reference to an enumeration. 
13.4.0.6 */ I tried making it a member function of gimple_statement_base, but that doesn't work either: we want a conversion from a gimple_statement_base * to a gassign *, not from a gimple_statement_base to a gassign *. Is there some syntactic trick here that I'm missing? Sorry if I'm being dumb (I can imagine there's a way of doing it by making gimple become some kind of wrapped ptr class, but that way lies madness, surely). Hmm. struct assign; struct base { operator assign *() const { return (assign *)this; }
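Richard's fragment above cuts off mid-definition; the idea it sketches can be expanded into a self-contained form — a conversion operator on the statement base class that performs the checked downcast, so a dereferenced base pointer converts to the derived pointer and un-converted call sites keep compiling. Class and enum names here are invented stand-ins for the GIMPLE types:

```cpp
#include <cassert>

// Invented stand-ins for gimple_statement_base / gassign.
enum stmt_code { CODE_ASSIGN, CODE_CALL };

struct assign_stmt;  // forward declaration, as in the fragment

struct stmt_base
{
  stmt_code code;
  explicit stmt_base (stmt_code c) : code (c) {}
  // The gimple-compat idea: an implicit conversion on the base class
  // (legal, unlike a free operator), with a runtime check analogous
  // to as_a <gassign *>.
  operator assign_stmt *();
};

struct assign_stmt : stmt_base
{
  int rhs;
  explicit assign_stmt (int r) : stmt_base (CODE_ASSIGN), rhs (r) {}
};

stmt_base::operator assign_stmt *()
{
  assert (code == CODE_ASSIGN);  // cheap check, like gimple checking
  return static_cast<assign_stmt *> (this);
}

// An accessor strengthened to take the subclass pointer; passing *base
// triggers the conversion, so legacy-style call sites stay terse.
static int assign_rhs (assign_stmt *a) { return a->rhs; }
```

With this in scope, `assign_rhs (*s)` compiles for a `stmt_base *s` that really points at an assignment, which is how a compat header could keep not-yet-converted code building while the accessors migrate to subclass pointers.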
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 3:59 AM, Jeff Law l...@redhat.com wrote: On 11/17/14 13:05, Ilya Enkovich wrote: How come you emit debug info for functions that do not exist and thus are never used? Is the problem caused by builtins going after END_CHKP_BUILTINS? Or is some info generated for all builtins regardless of whether they exist? IIRC, stabs aren't pruned based on whether or not they're used. So every file that includes tree-core.h is going to have every name/value pair for the enum. Hence the major heartburn for AIX. I wonder how AIX deals with the ever increasing number of target builtins then. Well. Why can't we just drop enum values that exceed the stabs format? At least I don't understand why we cannot fix this in the stabs output. After all what happens to non-GCC code with such large enums? Richard. jeff
Re: [testsuite,ARM] PR61153 Fix vbic and vorn tests
On Mon, Nov 10, 2014 at 3:02 PM, Christophe Lyon christophe.l...@linaro.org wrote: On 30 October 2014 23:02, Christophe Lyon christophe.l...@linaro.org wrote: On 29 October 2014 16:28, Ramana Radhakrishnan ramana@googlemail.com wrote: On Wed, Oct 29, 2014 at 3:26 PM, Christophe Lyon christophe.l...@linaro.org wrote: Hi, In PR61153, the vbic and vorn tests fail because when compiled at -O0 the expected Neon instructions are not generated, making scan-assembler fail. This patch: - replaces -O0 by -O2 - moves the declaration of local variables used as intrinsics parameters and results to global declarations, to prevent the compiler from optimizing the whole test away. OK? If you really want to do it , do it in neon-testgen.ml and do it for the whole lot. I thought it wasn't used anymore. At -O2 I have many more failures :-( (vdup, vget_lane, vget_low, vmov, vset_lane) And -O1 doesn't do the trick either... Christophe. Hi Ramana, Based on your request and on my above comment, I have modified neon-testgen.ml so that it sets -O2 in some testcases and -O0 by default. I've achieved this by adding a new value in the 'features' list in neon.ml, and set it to -O2 for vbic and vorn cases. neon-testgen.ml parses this value, and defaults to -O0. When not -O0, it also moves the input/output variables to global scope to avoid the whole test to be optimized out. I also fixed 3 warnings about 'or' being deprecated in neon.ml. OK? Ok , thanks. Ramana Christophe. 2014-11-10 Christophe Lyon christophe.l...@linaro.org gcc/ * config/arm/neon-testgen.ml (emit_prologue): Handle new compile_test_optim argument. (emit_automatics): Rename to emit_variables. Support variable indentation of its output. (compile_test_optim): New function. (test_intrinsic): Call compile_test_optim. * config/arm/neon.ml (features): Add Compiler_optim. (ops): Add Compiler_optim feature to Vbic and Vorn. (type_in_crypto_only): Replace 'or' by '||'. (reinterp): Likewise. (reinterpq): Likewise. 
testsuite/ * gcc.target/arm/neon/vbicQs16.c: Regenerate. * gcc.target/arm/neon/vbicQs32.c: Likewise. * gcc.target/arm/neon/vbicQs64.c: Likewise. * gcc.target/arm/neon/vbicQs8.c: Likewise. * gcc.target/arm/neon/vbicQu16.c: Likewise. * gcc.target/arm/neon/vbicQu32.c: Likewise. * gcc.target/arm/neon/vbicQu64.c: Likewise. * gcc.target/arm/neon/vbicQu8.c: Likewise. * gcc.target/arm/neon/vbics16.c: Likewise. * gcc.target/arm/neon/vbics32.c: Likewise. * gcc.target/arm/neon/vbics64.c: Likewise. * gcc.target/arm/neon/vbics8.c: Likewise. * gcc.target/arm/neon/vbicu16.c: Likewise. * gcc.target/arm/neon/vbicu32.c: Likewise. * gcc.target/arm/neon/vbicu64.c: Likewise. * gcc.target/arm/neon/vbicu8.c: Likewise. * gcc.target/arm/neon/vornQs16.c: Likewise. * gcc.target/arm/neon/vornQs32.c: Likewise. * gcc.target/arm/neon/vornQs64.c: Likewise. * gcc.target/arm/neon/vornQs8.c: Likewise. * gcc.target/arm/neon/vornQu16.c: Likewise. * gcc.target/arm/neon/vornQu32.c: Likewise. * gcc.target/arm/neon/vornQu64.c: Likewise. * gcc.target/arm/neon/vornQu8.c: Likewise. * gcc.target/arm/neon/vorns16.c: Likewise. * gcc.target/arm/neon/vorns32.c: Likewise. * gcc.target/arm/neon/vorns64.c: Likewise. * gcc.target/arm/neon/vorns8.c: Likewise. * gcc.target/arm/neon/vornu16.c: Likewise. * gcc.target/arm/neon/vornu32.c: Likewise. * gcc.target/arm/neon/vornu64.c: Likewise. * gcc.target/arm/neon/vornu8.c: Likewise. regards Ramana Christophe. 2014-10-29 Christophe Lyon christophe.l...@linaro.org PR target/61153 * gcc.target/arm/neon/vbicQs16.c: Compile at O2 and move variables declarations from local to global. * gcc.target/arm/neon/vbicQs16.c: Likewise. * gcc.target/arm/neon/vbicQs32.c: Likewise. * gcc.target/arm/neon/vbicQs64.c: Likewise. * gcc.target/arm/neon/vbicQs8.c: Likewise. * gcc.target/arm/neon/vbicQu16.c: Likewise. * gcc.target/arm/neon/vbicQu32.c: Likewise. * gcc.target/arm/neon/vbicQu64.c: Likewise. * gcc.target/arm/neon/vbicQu8.c: Likewise. * gcc.target/arm/neon/vbics16.c: Likewise. 
* gcc.target/arm/neon/vbics32.c: Likewise. * gcc.target/arm/neon/vbics64.c: Likewise. * gcc.target/arm/neon/vbics8.c: Likewise. * gcc.target/arm/neon/vbicu16.c: Likewise. * gcc.target/arm/neon/vbicu32.c: Likewise. * gcc.target/arm/neon/vbicu64.c: Likewise. * gcc.target/arm/neon/vbicu8.c: Likewise. * gcc.target/arm/neon/vornQs16.c: Likewise. * gcc.target/arm/neon/vornQs32.c: Likewise. * gcc.target/arm/neon/vornQs64.c: Likewise. * gcc.target/arm/neon/vornQs8.c: Likewise. * gcc.target/arm/neon/vornQu16.c: Likewise. * gcc.target/arm/neon/vornQu32.c: Likewise. * gcc.target/arm/neon/vornQu64.c: Likewise.
Re: [PATCH][ARM] Use std::swap instead of manually swapping
On Thu, Nov 13, 2014 at 9:42 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Following the trend in i386 and alpha, this patch uses std::swap to perform swapping of values in the arm backend instead of declaring temporaries. Tested and bootstrapped on arm-none-linux-gnueabihf. Ok for trunk? OK - Thanks for the cleanup. Thanks, Ramana Thanks, Kyrill 2014-11-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (unaligned_loaddi): Use std::swap instead of manual swapping implementation. (movcond_addsi): Likewise. * config/arm/arm.c (arm_canonicalize_comparison): Likewise. (arm_select_dominance_cc_mode): Likewise. (arm_reload_out_hi): Likewise. (gen_operands_ldrd_strd): Likewise. (output_move_double): Likewise. (arm_print_operand_address): Likewise. (thumb_output_move_mem_multiple): Likewise. (SWAP_RTX): Delete.
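The shape of the cleanup being approved here is simple enough to show in miniature: a hand-rolled three-statement swap versus std::swap. Plain ints stand in for the rtx operands the backend actually swaps:

```cpp
#include <cassert>
#include <utility>

// The pattern the patch removes: a named temporary per swap site.
static void swap_manual (int &a, int &b)
{
  int tmp = a;
  a = b;
  b = tmp;
}

// The replacement: identical semantics, no temporary to declare, and
// one fewer spot for a copy-paste slip.
static void swap_std (int &a, int &b)
{
  std::swap (a, b);
}
```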
Re: [PATCH] Add memory barriers to xbegin/xend/xabort
On 10/29/2014 04:31 AM, Andi Kleen wrote: 2014-10-28 Andi Kleen a...@linux.intel.com PR target/63672 * config/i386/i386.c (ix86_expand_builtin): Generate memory barrier after abort. * config/i386/i386.md (xbegin): Add memory barrier. (xend): Rename to ... (xend_1): New. Generate memory barrier and emit xend. Richi's comment is spot on. The insns themselves should hold the barrier, not being separate like -(define_insn "xend" +(define_insn "xend_1" [(unspec_volatile [(const_int 0)] UNSPECV_XEND)] "TARGET_RTM" "xend" [(set_attr "type" "other") (set_attr "length" "3")]) +(define_expand "xend" + [(set (match_dup 0) + (unspec:BLK [(const_int 0)] UNSPECV_XEND))] /* or match_dup 0 ? */ + "TARGET_RTM" +{ + emit_insn (gen_xend_1 ()); + + operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); + MEM_VOLATILE_P (operands[0]) = 1; + + DONE; +}) this, which generates two separate insns. C.f. sse2_lfence. r~
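The point of the review is that the barrier must live on the xend insn itself so the optimizers can never move memory accesses across it. The user-level analogue of such a barrier is an asm with a "memory" clobber; this sketch only illustrates the compiler-barrier concept, it is not the i386.md pattern:

```cpp
#include <cassert>

// A pure compiler barrier: no instruction is emitted, but the "memory"
// clobber tells the compiler memory may have changed, so values cached
// in registers cannot be carried across it and pending stores cannot
// float past it -- the property the xend pattern needs built in.
static inline void compiler_barrier ()
{
  asm volatile ("" ::: "memory");
}

static int g_flag = 0;

static int read_around_barrier ()
{
  int before = g_flag;   // may be held in a register
  g_flag = 1;            // a store the barrier pins in place
  compiler_barrier ();   // store completes here; later reads reload
  int after = g_flag;    // must observe the store
  return before * 10 + after;
}
```

In the patch the same effect is achieved structurally: the expander attaches a volatile BLKmode memory operand to the insn rather than emitting a separate barrier that could be scheduled apart from it.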
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + This looks like a straight copy of the ARM code, but without the TARGET_CAN_USE_DOLOOP_P definition. If the reason for including this code is for the benefit of modulo-sched then should the hook be defined to limit the use of this pattern to innermost loops only? Cheers /Marcus
Re: [patch] fix expand_builtin_init_dwarf_reg_sizes wrt register spans
On Nov 18, 2014, at 04:01 , Jeff Law l...@redhat.com wrote: Best for Jason, Richard or Jakub. My knowledge of dwarf2 and our implementation in dwarf*out.c is minimal at best. Thanks for your answer Jeff. Richard and Jason have provided feedback (thanks for this as well :) on which I'll followup. Olivier
Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
On 18 November 2014 10:14, David Sherwood david.sherw...@arm.com wrote: Hi Christophe, Ah sorry. My mistake - it fixes this in bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 I did look at that PR, but since it has no testcase attached, I was unsure. And I am still not :-) PR 59810 is [AArch64] LDn/STn implementations are not ABI-conformant for bigendian. but the advsimd-intrinsics/vldX.c and vldX_lane.c now PASS with Alan's patches on aarch64_be, so I thought Alan's patches solve PR59810. What am I missing? This change is needed in order to remove the CANNOT_CHANGE_MODE_CLASS #define, which will be committed as a separate patch. Regards, David Sherwood. -Original Message- From: Christophe Lyon [mailto:christophe.l...@linaro.org] Sent: 17 November 2014 21:09 To: David Sherwood Cc: gcc-patches@gcc.gnu.org; Alan Hayward Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. On 13 November 2014 15:32, David Sherwood david.sherw...@arm.com wrote: Hi, I think that's because Alan Hayward's original patch had a bug in it, which he has fixed and should be submitting to gcc patches fairly soon. So it's probably best waiting until he gets a new patch up I think. I've applied both Alan's patches and the advsimd-intrinsics tests now all pass on aarch64_be, but this doesn't need your patch. Which testcase does your patch actually fix? Regards, David. -Original Message- From: Christophe Lyon [mailto:christophe.l...@linaro.org] Sent: 13 November 2014 14:22 To: David Sherwood Cc: gcc-patches@gcc.gnu.org Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. On 13 November 2014 11:09, David Sherwood david.sherw...@arm.com wrote: Hi All, I have successfully rebased this and tested in conjunction with a patch from Alan Hayward ([AArch64] [BE] Fix vector load/stores to not use ld1/st1), who should be submitting a new version shortly. 
Built and tested on: aarch64-none-elf aarch64_be-none-elf x86_64-linux-gnu I've applied both patches to recent trunk (r217483), and I still see ICEs on aarch64_be, when running the advsimd-intrinsics/vldX.c tests. I have them with Alan Hayward's patch alone, too. BTW, I thought that patch had been approved by Marcus, but it's not in trunk yet. Maybe I missed something. Christophe. Regards, David Sherwood. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 28 October 2014 08:55 To: 'gcc-patches@gcc.gnu.org' Subject: RE: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, Sorry to bother you again. Could someone take a look at this change please if they have time? Thanks! David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 10 October 2014 15:48 To: gcc-patches@gcc.gnu.org Subject: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. Hi, I have a fix (originally written by Tejas Belagod) for the following bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 Could someone take a look please? Thanks! David Sherwood. ChangeLog: gcc/: 2014-11-13 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added.
Re: [AArch64, Patch] Restructure arm_neon.h vector types's implementation(Take 2).
On Wed, Nov 05, 2014 at 11:31:24PM +, James Greenhalgh wrote: On Wed, Nov 05, 2014 at 09:50:52PM +, Marc Glisse wrote: Thanks. Do you know if anyone is planning to port this patch to the arm target (which IIRC has the same issue)? No pressure, this is just so I know if I should ping the alternate __float128 patch or wait. Yes, you're right, the arm target also has this issue. I have a port of Tejas' patch based on top of some other refactoring of the arm target's builtin infrastructure I've been working on - I'm hoping to get that all rebased and proposed upstream in the next couple of weeks. This is all done now (r217700), so your patch should be safe to go in. Thanks James
[PATCH][AArch64][1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Hi all, This is another iteration of this patch posted earlier with a bug fixed pointed out by Andrew It implements the target hooks for macro fusion and implements fusion of MOV+MOVK instructions and enables them for cores that support these fusions. Bootstrapped and tested aarch64-none-elf. Ok for trunk? Thanks, Kyrill 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-protos.h (struct tune_params): Add fuseable_ops field. * config/aarch64/aarch64.c (generic_tunings): Specify fuseable_ops. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. (aarch64_macro_fusion_p): New function. (aarch_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (AARCH64_FUSE_MOV_MOVK): Likewise. (AARCH64_FUSE_NOTHING): Likewise.commit e9d650bc71fe7549765f750fcdb47a759658c21f Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Tue Oct 21 10:36:48 2014 +0100 [AArch64] Implement TARGET_MACRO_FUSION diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 470b9eb..9e0ff8c 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -170,6 +170,7 @@ struct tune_params const struct cpu_vector_cost *const vec_costs; const int memmov_cost; const int issue_rate; + const unsigned int fuseable_ops; }; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 07f75e1..5ece23a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -302,6 +302,9 @@ static const struct cpu_vector_cost cortexa57_vector_cost = NAMED_PARAM (cond_not_taken_branch_cost, 1) }; +#define AARCH64_FUSE_NOTHING (0) +#define AARCH64_FUSE_MOV_MOVK (1 0) + #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ #endif @@ -312,7 +315,8 @@ static const struct tune_params generic_tunings = generic_regmove_cost, 
generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) }; static const struct tune_params cortexa53_tunings = @@ -322,7 +326,8 @@ static const struct tune_params cortexa53_tunings = cortexa53_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) }; static const struct tune_params cortexa57_tunings = @@ -332,7 +337,8 @@ static const struct tune_params cortexa57_tunings = cortexa57_regmove_cost, cortexa57_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 3) + NAMED_PARAM (issue_rate, 3), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) }; static const struct tune_params thunderx_tunings = @@ -342,7 +348,8 @@ static const struct tune_params thunderx_tunings = thunderx_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 6), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) }; /* A processor implementing AArch64. */ @@ -9979,6 +9986,59 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size, return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); } +/* Implement TARGET_SCHED_MACRO_FUSION_P. Return true if target supports + instruction fusion of some sort. */ + +static bool +aarch64_macro_fusion_p (void) +{ + return aarch64_tune_params-fuseable_ops != AARCH64_FUSE_NOTHING; +} + + +/* Implement TARGET_SCHED_MACRO_FUSION_PAIR_P. Return true if PREV and CURR + should be kept together during scheduling. */ + +static bool +aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) +{ + rtx set_dest; + rtx prev_set = single_set (prev); + rtx curr_set = single_set (curr); + /* prev and curr are simple SET insns i.e. no flag setting or branching. 
*/ + bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr); + + if (!aarch64_macro_fusion_p ()) +return false; + + if (simple_sets_p + && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_MOV_MOVK)) +{ + /* We are trying to match: + prev (mov) == (set (reg r0) (const_int imm16)) + curr (movk) == (set (zero_extract (reg r0) + (const_int 16) + (const_int 16)) + (const_int imm16_1)) */ + + set_dest = SET_DEST (curr_set); + + if (GET_CODE (set_dest) == ZERO_EXTRACT + && CONST_INT_P (SET_SRC (curr_set)) + && CONST_INT_P (SET_SRC (prev_set)) + && CONST_INT_P (XEXP (set_dest, 2)) + && INTVAL (XEXP (set_dest, 2)) == 16 + && REG_P (XEXP (set_dest, 0)) + && REG_P (SET_DEST (prev_set)) + && REGNO (XEXP
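The mov/movk pair that this hook tries to keep adjacent synthesises a wide immediate 16 bits at a time: mov writes one 16-bit chunk and zeroes the rest, and each movk overwrites only one 16-bit field (the ZERO_EXTRACT in the RTL above) while preserving the others. A standalone sketch of those semantics (hypothetical helper names, not GCC code):

```cpp
#include <cstdint>

// mov  Xn, #imm16 -- writes imm16 into the low 16 bits, zeroing the rest.
uint64_t mov_imm16(uint16_t imm)
{
  return imm;
}

// movk Xn, #imm16, lsl #shift -- replaces only bits [shift, shift+15],
// keeping every other bit of the register (the ZERO_EXTRACT destination).
uint64_t movk_imm16(uint64_t reg, uint16_t imm, unsigned shift)
{
  uint64_t mask = (uint64_t)0xffff << shift;
  return (reg & ~mask) | ((uint64_t)imm << shift);
}
```

A full 64-bit constant such as 0x123456789abcdef0 is thus built as one mov followed by three movk instructions, which is why cores that fuse these pairs want them scheduled back to back.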
[PATCH][AArch64][0/5] Implement macro fusion hooks
Hi all, I've decided to organise the aarch64 macro fusion patches into a series. Some of them have been posted last week and gotten review feedback and have been reworked: - [1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P: This implements the hook itself and adds logic for fusing MOV+MOVK instructions as used when synthesising immediates. - [2/5] Implement adrp+add fusion: Implements fusion of ADRP+ADD instructions when the appropriate register dependencies are met. - [3/5] Implement fusion of MOVK+MOVK: Implements fusion of MOVK+MOVK instructions as used when synthesising immediates. - [4/5] Implement fusion of ADRP+LDR: Implements fusion of ADRP+LDR instructions when the appropriate register constraints are met. - [5/5] Add macro fusion support for cmp/b.X for ThunderX: This is a rebase of Andrew's patch on top of the previous four and implements fusion of flag-setting instructions together with a conditional branch. These have been bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill
[PATCH][AArch64][2/5] Implement adrp+add fusion
Hi all, This patch is just rebased on top of the changes from the previous patch in the series. Otherwise it's the same as https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01263.html with some style cleanup. There can be cases where we miss fusion of adrp+add because although they are generated together (in aarch64_load_symref_appropriately), combine can sometimes combine the lo_sum part with the instruction after it and we end up with an instruction stream where there is an insn between the two, preventing the fusion in sched1. We still catch enough cases to make this approach worthwhile and the above-mentioned exceptions can be mitigated in the future (for example, by somehow delaying the generation of the adrp,add RTL after combine but before sched1). Tested and bootstrapped on aarch64-none-linux-gnu. Ok for trunk? 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c: Include tm-constrs.h (AARCH64_FUSE_ADRP_ADD): Define. (cortexa57_tunings): Add AARCH64_FUSE_ADRP_ADD to fuseable_ops. (cortexa53_tunings): Likewise. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_ADD.commit 248ec70cfac6cb552a427b4336a3340bb25a5e53 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Thu Nov 6 12:05:26 2014 + [AArch64] Fuse ADRP+ADD diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 502ba6d..03ae7c4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -77,6 +77,7 @@ #include "dumpfile.h" #include "builtins.h" #include "rtl-iter.h" +#include "tm-constrs.h" /* Defined for convenience.
*/ #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT) @@ -304,6 +305,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost = #define AARCH64_FUSE_NOTHING (0) #define AARCH64_FUSE_MOV_MOVK (1 << 0) +#define AARCH64_FUSE_ADRP_ADD (1 << 1) #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007 __extension__ @@ -327,7 +329,7 @@ static const struct tune_params cortexa53_tunings = generic_vector_cost, NAMED_PARAM (memmov_cost, 4), NAMED_PARAM (issue_rate, 2), - NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) + NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD)) }; static const struct tune_params cortexa57_tunings = @@ -338,7 +340,7 @@ static const struct tune_params cortexa57_tunings = cortexa57_vector_cost, NAMED_PARAM (memmov_cost, 4), NAMED_PARAM (issue_rate, 3), - NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) + NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD)) }; static const struct tune_params thunderx_tunings = @@ -10037,6 +10039,32 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) } } + if (simple_sets_p + && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_ADRP_ADD)) +{ + + /* We're trying to match: + prev (adrp) == (set (reg r1) + (high (symbol_ref ("SYM")))) + curr (add) == (set (reg r0) + (lo_sum (reg r1) + (symbol_ref ("SYM")))) + Note that r0 need not necessarily be the same as r1, especially + during pre-regalloc scheduling. */ + + if (satisfies_constraint_Ush (SET_SRC (prev_set)) + && REG_P (SET_DEST (prev_set)) && REG_P (SET_DEST (curr_set))) +{ + if (GET_CODE (SET_SRC (curr_set)) == LO_SUM + && REG_P (XEXP (SET_SRC (curr_set), 0)) + && REGNO (XEXP (SET_SRC (curr_set), 0)) + == REGNO (SET_DEST (prev_set)) + && rtx_equal_p (XEXP (SET_SRC (prev_set), 0), + XEXP (SET_SRC (curr_set), 1))) +return true; +} +} + return false; }
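The adrp/add pair being matched materialises a symbol's address in two steps: adrp computes the 4 KiB page containing the symbol (the HIGH part of the RTL) and the add supplies the offset within that page (the LO_SUM part). A simplified standalone model of the arithmetic (not GCC code):

```cpp
#include <cstdint>

// adrp Xn, sym -- forms the base of the 4 KiB page containing the
// address; this corresponds to the (high (symbol_ref ...)) RTL.
uint64_t adrp_page(uint64_t addr)
{
  return addr & ~(uint64_t)0xfff;
}

// add Xn, Xn, :lo12:sym -- adds the low 12 bits of the address to the
// page base; this corresponds to the (lo_sum ...) RTL.
uint64_t add_lo12(uint64_t page_base, uint64_t addr)
{
  return page_base + (addr & 0xfff);
}
```

Because the add consumes the register the adrp just produced, cores that fuse the pair want the scheduler to keep them adjacent, which is what the hook above arranges.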
[PATCH][AArch64][3/5] Implement fusion of MOVK+MOVK
Hi all, Following up on the previous instruction fusion patches this one implements the fusion of instruction pairs of the form: movk Xn, imm16_1 lsl 32 movk Xn, imm16_2 lsl 48 which is usually generated as part of the immediate synthesis code. For some cores we don't want to schedule them apart. These insns are represented in RTL as a SET to a ZERO_EXTRACT so we match for that case. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (AARCH64_FUSE_MOVK_MOVK): Define. (cortexa53_tunings): Specify AARCH64_FUSE_MOVK_MOVK in fuseable_ops. (cortexa57_tunings): Likewise. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_MOVK_MOVK.commit 161e7901d387fa2daf0ea34dd5df4703916435e0 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed Nov 12 17:36:56 2014 + [AArch64] Implement fusion of MOVK+MOVK diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1eb066c..c3c29ed 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -306,6 +306,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost = #define AARCH64_FUSE_NOTHING (0) #define AARCH64_FUSE_MOV_MOVK (1 0) #define AARCH64_FUSE_ADRP_ADD (1 1) +#define AARCH64_FUSE_MOVK_MOVK (1 2) #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ @@ -329,7 +330,7 @@ static const struct tune_params cortexa53_tunings = generic_vector_cost, NAMED_PARAM (memmov_cost, 4), NAMED_PARAM (issue_rate, 2), - NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD)) + NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK)) }; static const struct tune_params cortexa57_tunings = @@ -340,7 +341,7 @@ static const struct tune_params cortexa57_tunings = cortexa57_vector_cost, NAMED_PARAM (memmov_cost, 4), NAMED_PARAM (issue_rate, 3), - NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD)) + NAMED_PARAM 
(fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK)) }; static const struct tune_params thunderx_tunings = @@ -10430,6 +10431,36 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) } } + if (simple_sets_p + && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_MOVK_MOVK)) +{ + + /* We're trying to match: + prev (movk) == (set (zero_extract (reg r0) + (const_int 16) + (const_int 32)) + (const_int imm16_1)) + curr (movk) == (set (zero_extract (reg r0) + (const_int 16) + (const_int 48)) + (const_int imm16_2)) */ + + if (GET_CODE (SET_DEST (prev_set)) == ZERO_EXTRACT + && GET_CODE (SET_DEST (curr_set)) == ZERO_EXTRACT + && REG_P (XEXP (SET_DEST (prev_set), 0)) + && REG_P (XEXP (SET_DEST (curr_set), 0)) + && REGNO (XEXP (SET_DEST (prev_set), 0)) + == REGNO (XEXP (SET_DEST (curr_set), 0)) + && CONST_INT_P (XEXP (SET_DEST (prev_set), 2)) + && CONST_INT_P (XEXP (SET_DEST (curr_set), 2)) + && INTVAL (XEXP (SET_DEST (prev_set), 2)) == 32 + && INTVAL (XEXP (SET_DEST (curr_set), 2)) == 48 + && CONST_INT_P (SET_SRC (prev_set)) + && CONST_INT_P (SET_SRC (curr_set))) +return true; + +} + return false; }
Re: [PATCH 4/4] Data structure is used for inline_summary struct.
On 11/14/2014 05:06 PM, Jan Hubicka wrote: In a way I would like to see these to be methods of the underlying type rather than virtual methods of the summary, because these are operations on the data themselves. I was thinking to model these by special constructor and copy constructor (taking the extra node pointer parameters) and standard destructor. I am not sure this would be more understandable this way? Motivation for this implementation is: a) it's useful to have access to the cgraph_node that is associated with a summary Yep, one would have node addition ctor (symtab_node *); (or cgraph/varpool nodes for cgraph/varpool annotations) that would default to ctor for implementations that do not care about node. And a node duplication ctor (summary &, symtab_node *, symtab_node *) that would default to copy constructor for data that do not need to be copied. Hello. I have no problem with such construction and destruction, we can also provide base implementation. I would say that main advantage (in addition to have a way to provide reasonable defaults) is to make ctors/dtors of the embedded classes working well, so one can for example embed pointer_map and not care about its construction/destruction. b) with GTY, we cannot call destructor Everything in symbol table is explicitly memory managed (i.e. never left to be freed by garbage collector). It resides in GTY only to allow linking garbage collected object from them and to get PCH working. However GTY types need to be allocated by ggc_alloc and one can't call dtor. This was main motivation for providing hooks instead of ctor/dtor API. Maybe I miss something? Thanks, Martin This is however quite cosmetic issue I would prefer our C++ guys to comment on. We can tweak this incrementally.
+void +inline_summary_t::duplicate (cgraph_node *src, +cgraph_node *dst, +inline_summary *, +inline_summary *info) Also we should have a way to say that the annotation do not need to be duplicated (for example when we do not want to annotate inline clones). Probably by adding duplicate_p predicate that is called before the actual duplication happens? The updated patch is OK, I will take a look on the main patch. Honza { - struct inline_summary *info; inline_summary_alloc (); - info = inline_summary (dst); - memcpy (info, inline_summary (src), sizeof (struct inline_summary)); + memcpy (info, inline_summary_d-get (src), sizeof (inline_summary)); /* TODO: as an optimization, we may avoid copying conditions that are known to be false or true. */ info-conds = vec_safe_copy (info-conds); @@ -1328,7 +1309,7 @@ free_growth_caches (void) static void dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node, - struct inline_summary *info) + inline_summary *info) { struct cgraph_edge *edge; for (edge = node-callees; edge; edge = edge-next_callee) @@ -1345,8 +1326,8 @@ dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node, ? 
inlined : cgraph_inline_failed_string (edge- inline_failed), indent, , es-loop_depth, edge-frequency, es-call_stmt_size, es-call_stmt_time, - (int) inline_summary (callee)-size / INLINE_SIZE_SCALE, - (int) inline_summary (callee)-estimated_stack_size); + (int) inline_summary_d-get (callee)-size / INLINE_SIZE_SCALE, + (int) inline_summary_d-get (callee)-estimated_stack_size); if (es-predicate) { @@ -1372,9 +1353,9 @@ dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node, fprintf (f, %*sStack frame offset %i, callee self size %i, callee size %i\n, indent + 2, , - (int) inline_summary (callee)-stack_frame_offset, - (int) inline_summary (callee)-estimated_self_stack_size, - (int) inline_summary (callee)-estimated_stack_size); + (int) inline_summary_d-get (callee)-stack_frame_offset, + (int) inline_summary_d-get (callee)-estimated_self_stack_size, + (int) inline_summary_d-get (callee)-estimated_stack_size); dump_inline_edge_summary (f, indent + 2, callee, info); } } @@ -1402,7 +1383,7 @@ dump_inline_summary (FILE *f, struct cgraph_node *node) { if (node-definition) { - struct inline_summary *s = inline_summary (node); + inline_summary *s = inline_summary_d-get (node); size_time_entry *e; int i; fprintf (f, Inline summary for %s/%i, node-name (), @@ -1725,7 +1706,7 @@ eliminated_by_inlining_prob (gimple stmt) static void set_cond_stmt_execution_predicate (struct ipa_node_params *info, - struct inline_summary *summary, +
[PATCH][AArch64][4/5] Implement fusion of ADRP+LDR
Hi all, Sometimes we want to fuse adrp+ldr pairs as described in the comment in the patch. This is enabled by default for the Cortex-A53. Bootstrapped and tested on aarch64-none-linux. Ok for trunk? Thanks, Kyrill 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (AARCH64_FUSE_ADRP_LDR): Define. (cortexa53_tunings): Specify AARCH64_FUSE_ADRP_LDR in fuseable_ops. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_LDR.commit ad175271f82e0330894bfe894e86f7ad8a4b6cce Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Thu Nov 13 16:48:18 2014 + [AArch64][tmp] Fuse ADRP+LDR diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 85e08d0..132535c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -307,6 +307,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost = #define AARCH64_FUSE_MOV_MOVK (1 0) #define AARCH64_FUSE_ADRP_ADD (1 1) #define AARCH64_FUSE_MOVK_MOVK (1 2) +#define AARCH64_FUSE_ADRP_LDR (1 3) #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ @@ -330,7 +331,8 @@ static const struct tune_params cortexa53_tunings = generic_vector_cost, NAMED_PARAM (memmov_cost, 4), NAMED_PARAM (issue_rate, 2), - NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK)) + NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR)) }; static const struct tune_params cortexa57_tunings = @@ -10467,6 +10469,37 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) return true; } + if (simple_sets_p + (aarch64_tune_params-fuseable_ops AARCH64_FUSE_ADRP_LDR)) +{ + /* We're trying to match: + prev (adrp) == (set (reg r0) + (high (symbol_ref (SYM + curr (ldr) == (set (reg r1) + (mem (lo_sum (reg r0) + (symbol_ref (SYM) + or + curr (ldr) == (set (reg r1) + (zero_extend (mem + (lo_sum (reg r0) + (symbol_ref (SYM)) */ + if (satisfies_constraint_Ush (SET_SRC (prev_set)) 
+ && REG_P (SET_DEST (prev_set)) && REG_P (SET_DEST (curr_set))) +{ + rtx curr_src = SET_SRC (curr_set); + + if (GET_CODE (curr_src) == ZERO_EXTEND) +curr_src = XEXP (curr_src, 0); + + if (MEM_P (curr_src) && GET_CODE (XEXP (curr_src, 0)) == LO_SUM + && REG_P (XEXP (XEXP (curr_src, 0), 0)) + && REGNO (XEXP (XEXP (curr_src, 0), 0)) + == REGNO (SET_DEST (prev_set)) + && rtx_equal_p (XEXP (XEXP (curr_src, 0), 1), + XEXP (SET_SRC (prev_set), 0))) + return true; +} +} return false; }
[PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX
Hi all, This is a rebase of Andrew's CMP+BRANCH fusion patch on top of my macro fusion patches. I've assigned the number 14 to AARCH64_FUSE_CMP_BRANCH. I've given it a test on top of my fusion patches. Ok for trunk together with the rest? 2014-11-14 Andrew Pinski apin...@cavium.com * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define. (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.commit 619f6c056e80f284a834d2b24e6a9c1f933a2dd5 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Fri Nov 14 09:16:08 2014 + [AArch64][apinski] CMP+branch macro fusion diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2e63269..d0e52b0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -308,6 +308,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost = #define AARCH64_FUSE_ADRP_ADD (1 << 1) #define AARCH64_FUSE_MOVK_MOVK (1 << 2) #define AARCH64_FUSE_ADRP_LDR (1 << 3) +#define AARCH64_FUSE_CMP_BRANCH (1 << 4) #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007 __extension__ @@ -353,7 +354,7 @@ static const struct tune_params thunderx_tunings = generic_vector_cost, NAMED_PARAM (memmov_cost, 6), NAMED_PARAM (issue_rate, 2), - NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH) }; /* A processor implementing AArch64. */ @@ -10133,6 +10134,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) } } + if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH) + && any_condjump_p (curr)) +{ + /* FIXME: this misses some which are considered simple arthematic + instructions for ThunderX. Simple shifts are missed here. */ + if (get_attr_type (prev) == TYPE_ALUS_SREG + || get_attr_type (prev) == TYPE_ALUS_IMM + || get_attr_type (prev) == TYPE_LOGICS_REG + || get_attr_type (prev) == TYPE_LOGICS_IMM) + return true; +} + return false; }
Re: match.pd tweaks for vectors and issues with HONOR_NANS
On Mon, 17 Nov 2014, Richard Biener wrote: On Sun, Nov 16, 2014 at 6:53 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sun, 16 Nov 2014, Richard Biener wrote: I think the element_mode is the way to go. The following passed bootstrap+testsuite. 2014-11-16 Marc Glisse marc.gli...@inria.fr * tree.c (element_mode, integer_truep): New functions. * tree.h (element_mode, integer_truep): Declare them. * fold-const.c (negate_expr_p, fold_negate_expr, combine_comparisons, fold_cond_expr_with_comparison, fold_real_zero_addition_p, fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p, fold_strip_sign_ops): Use element_mode. (fold_binary_loc): Use element_mode and element_precision. * match.pd: Use integer_truep, element_mode, element_precision, VECTOR_TYPE_P and build_one_cst. Extend some transformations to vectors. Simplify A/-A. -! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type)) - TYPE_MODE (type) == TYPE_MODE (inter_type)) +! (final_prec != element_precision (type) + element_mode (type) == element_mode (inter_type)) isn't a 1:1 conversion - please use final_prec != GET_MODE_PRECISION (element_mode (type)) your version is final_prec != final_prec. Good catch, I was doing those replacements too fast to be really safe :-( The tree.c:element_mode function lacks a function comment. Ok with those fixed. I am attaching the version I committed. I'll try to replace some more TYPE_MODE during stage3... Thanks, -- Marc GlisseIndex: gcc/tree.c === --- gcc/tree.c (revision 217701) +++ gcc/tree.c (revision 217702) @@ -2268,20 +2268,34 @@ integer_nonzerop (const_tree expr) { STRIP_NOPS (expr); return ((TREE_CODE (expr) == INTEGER_CST !wi::eq_p (expr, 0)) || (TREE_CODE (expr) == COMPLEX_CST (integer_nonzerop (TREE_REALPART (expr)) || integer_nonzerop (TREE_IMAGPART (expr); } +/* Return 1 if EXPR is the integer constant one. For vector, + return 1 if every piece is the integer constant minus one + (representing the value TRUE). 
*/ + +int +integer_truep (const_tree expr) +{ + STRIP_NOPS (expr); + + if (TREE_CODE (expr) == VECTOR_CST) +return integer_all_onesp (expr); + return integer_onep (expr); +} + /* Return 1 if EXPR is the fixed-point constant zero. */ int fixed_zerop (const_tree expr) { return (TREE_CODE (expr) == FIXED_CST TREE_FIXED_CST (expr).data.is_zero ()); } /* Return the power of two represented by a tree node known to be a @@ -12303,11 +12317,25 @@ get_base_address (tree t) t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); /* ??? Either the alias oracle or all callers need to properly deal with WITH_SIZE_EXPRs before we can look through those. */ if (TREE_CODE (t) == WITH_SIZE_EXPR) return NULL_TREE; return t; } +/* Return the machine mode of T. For vectors, returns the mode of the + inner type. The main use case is to feed the result to HONOR_NANS, + avoiding the BLKmode that a direct TYPE_MODE (T) might return. */ + +machine_mode +element_mode (const_tree t) +{ + if (!TYPE_P (t)) +t = TREE_TYPE (t); + if (VECTOR_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE) +t = TREE_TYPE (t); + return TYPE_MODE (t); +} + #include gt-tree.h Index: gcc/tree.h === --- gcc/tree.h (revision 217701) +++ gcc/tree.h (revision 217702) @@ -1557,20 +1557,22 @@ extern void protected_set_expr_location #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)-type_common.next_variant) #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)-type_common.main_variant) #define TYPE_CONTEXT(NODE) (TYPE_CHECK (NODE)-type_common.context) #define TYPE_MODE(NODE) \ (VECTOR_TYPE_P (TYPE_CHECK (NODE)) \ ? vector_type_mode (NODE) : (NODE)-type_common.mode) #define SET_TYPE_MODE(NODE, MODE) \ (TYPE_CHECK (NODE)-type_common.mode = (MODE)) +extern machine_mode element_mode (const_tree t); + /* The canonical type for this type node, which is used by frontends to compare the type for equality with another type. If two types are equal (based on the semantics of the language), then they will have equivalent TYPE_CANONICAL entries. 
As a special case, if TYPE_CANONICAL is NULL_TREE, and thus TYPE_STRUCTURAL_EQUALITY_P is true, then it cannot be used for comparison against other types. Instead, the type is said to require structural equality checks, described in TYPE_STRUCTURAL_EQUALITY_P. @@ -3992,20 +3994,25 @@ extern int integer_minus_onep (const_tre /* integer_pow2p (tree x) is nonzero is X is an integer constant with exactly one bit 1. */ extern int integer_pow2p (const_tree); /* integer_nonzerop (tree x) is nonzero if X is an integer constant with a
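The motivation for element_mode in the patch above is that predicates like HONOR_NANS want a scalar machine mode, while TYPE_MODE of a generic vector type can be BLKmode. The lookup it performs can be modelled with a simplified standalone sketch (hypothetical type_node struct, not GCC's actual tree representation):

```cpp
// A toy stand-in for GCC's machine modes and type nodes.
enum machine_mode { SFmode, DFmode, BLKmode };

struct type_node {
  bool is_vector_or_complex;   // models VECTOR_TYPE_P / COMPLEX_TYPE
  machine_mode mode;           // models TYPE_MODE of this type
  const type_node *inner;      // models TREE_TYPE: the element type, if any
};

// Mirrors the shape of tree.c:element_mode -- for vector and complex
// types, return the mode of the inner (element) type instead of the
// possibly-BLKmode aggregate mode.
machine_mode element_mode_model(const type_node *t)
{
  if (t->is_vector_or_complex)
    t = t->inner;
  return t->mode;
}
```

With this, a caller checking float semantics of a vector of floats gets SFmode rather than the unusable BLKmode of the vector as a whole, which is exactly why the match.pd transformations feed element_mode into HONOR_NANS.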
[PATCH][ARM] Add -mcpu=cortex-a17.cortex-a7
Hi all, Following up from adding Cortex-A17 support this patch adds a big.LITTLE option cortex-a17.cortex-a7. Similar to the existing cortex-a15.cortex-a7 support we schedule for Cortex-A7 and make the other tuning decisions as for Cortex-A17. Tested arm-none-eabi. Ok for trunk? Thanks, Kyrill 2014-11-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-cores.def (cortex-a17.cortex-a7): New entry. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Add mcpu=cortex-a17.cortex-a7. * config/arm/t-aprofile: Add cortex-a17.cortex-a7 entry to MULTILIB_MATCHES. commit aa691f7d5d474f3e7721dfbedefea4b22502b5c0 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Mon Nov 3 15:07:58 2014 + [ARM] Add cortex-a17.cortex-a7 big.LITTLE support diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index f8003ce..423ee9e 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -162,6 +162,7 @@ ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) +ARM_CORE(cortex-a17.cortex-a7, cortexa17cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a12) /* V8 Architecture Processors */ ARM_CORE(cortex-a53, cortexa53, cortexa53, 8A, FL_LDSCHED | FL_CRC32, cortex_a53) diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 9d8159f..9b1886e 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -301,6 +301,9 @@ EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue +Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7) + +EnumValue Enum(processor_type) String(cortex-a53) Value(cortexa53) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 7218542..d300c51 100644 
--- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -31,6 +31,6 @@ (define_attr tune cortexa15,cortexa17,cortexr4,cortexr4f, cortexr5,cortexr7,cortexm7, cortexm4,cortexm3,marvell_pj4, - cortexa15cortexa7,cortexa53,cortexa57, - cortexa57cortexa53 + cortexa15cortexa7,cortexa17cortexa7,cortexa53, + cortexa57,cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h index 22a37ae..aa93aa4 100644 --- a/gcc/config/arm/bpabi.h +++ b/gcc/config/arm/bpabi.h @@ -66,6 +66,7 @@ |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15 \ |mcpu=cortex-a12|mcpu=cortex-a17 \ |mcpu=cortex-a15.cortex-a7\ + |mcpu=cortex-a17.cortex-a7\ |mcpu=marvell-pj4 \ |mcpu=cortex-a53 \ |mcpu=cortex-a57 \ @@ -87,6 +88,7 @@ |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15 \ |mcpu=cortex-a12|mcpu=cortex-a17 \ |mcpu=cortex-a15.cortex-a7\ + |mcpu=cortex-a17.cortex-a7\ |mcpu=cortex-a53 \ |mcpu=cortex-a57 \ |mcpu=cortex-a57.cortex-a53\ diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile index 475aed1..9e2637b 100644 --- a/gcc/config/arm/t-aprofile +++ b/gcc/config/arm/t-aprofile @@ -85,6 +85,7 @@ MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15 MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a12 MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a17 MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15.cortex-a7 +MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a17.cortex-a7 MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a53 MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a57 MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a57.cortex-a53
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
Yes, major code is borrowed from ARM port with the expected mode of loop pseudo changed to DImode. The function loop_canon_p called in sms_schedule can make sure that SMS is only performed for innermost loops. But I think it's a good idea to implement the TARGET_CAN_USE_DOLOOP_P hook here. See my updated patch below. How about this one? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,47 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. 
*/ + if (optimize 0 flag_modulo_sched) +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + if (GET_MODE (operands[0]) != DImode) + FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +} + else +FAIL; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + This looks like a straight copy of the ARM code, but without the TARGET_CAN_USE_DOLOOP_P definition. If the reason for including this code is for the benefit of module-sched then should the hook be defined to limit the use of this pattern to inner most loops only? Cheers /Marcus aarch64-doloop-v4.diff Description: aarch64-doloop-v4.diff
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:

On 11/17/14 13:43, Ilya Enkovich wrote:

I don't fully understand how this problem appears. Is it fully AIX-specific, or does it affect other targets too? May we put all _CHKP codes at the end of the enum and have AIX ignore them? Limiting the number of codes in the enum due to restrictions of a specific debug info format seems weird; if something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions on demand would allow us to avoid this kind of problem :( I'll try to make a patch reducing the number of builtin codes to the required minimum in case that turns out to be the best option we have.

It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First, if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second, if you could look at not duplicating every builtin in the enumeration, since it's the combination of the number of enums and the length of the debug strings representing them that's causing AIX heartburn.

jeff

I see. I can reduce the number of _CHKP builtin codes; an experimental patch shows a resulting END_BUILTINS of 1242. But we should expect a similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such a build is tested): the current IX86_BUILTIN_MAX is 2467, which is greater than the current END_BUILTINS.

Ilya
Re: [PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX
On 18/11/14 10:34, Kyrill Tkachov wrote: Hi all, This is a rebase of Andrews' CMP+BRANCH fusion patch on top of my macro fusion patches. I've assigned the number 14 to AARCH64_FUSE_CMP_BRANCH. I've given it a test on top of my fusion patches. Ok for trunk together with the rest? 2014-11-14 Andrew Pinski apin...@cavium.com * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define. (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops. (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH. aarch64-cmp-branch.patch commit 619f6c056e80f284a834d2b24e6a9c1f933a2dd5 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Fri Nov 14 09:16:08 2014 + [AArch64][apinski] CMP+branch macro fusion diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2e63269..d0e52b0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -308,6 +308,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost = #define AARCH64_FUSE_ADRP_ADD(1 1) #define AARCH64_FUSE_MOVK_MOVK (1 2) #define AARCH64_FUSE_ADRP_LDR(1 3) +#define AARCH64_FUSE_CMP_BRANCH (1 4) #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ @@ -353,7 +354,7 @@ static const struct tune_params thunderx_tunings = generic_vector_cost, NAMED_PARAM (memmov_cost, 6), NAMED_PARAM (issue_rate, 2), - NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH) }; /* A processor implementing AArch64. */ @@ -10133,6 +10134,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) } } + if ((aarch64_tune_params-fuseable_ops AARCH64_FUSE_CMP_BRANCH) + any_condjump_p (curr)) +{ + /* FIXME: this misses some which are considered simple arthematic + instructions for ThunderX. Simple shifts are missed here. 
     */
+  if (get_attr_type (prev) == TYPE_ALUS_SREG
+      || get_attr_type (prev) == TYPE_ALUS_IMM
+      || get_attr_type (prev) == TYPE_LOGICS_REG
+      || get_attr_type (prev) == TYPE_LOGICS_IMM)
+    return true;
+    }
+

Repeatedly calling get_attr_type on the same insn is a bit wasteful, despite the caching that's done. It would be better to call it once and save the value in a variable. Also, calling that function may cause prev to become the extracted insn. Is it safe to assume that the caller(s) of this code understand that this might have happened, or do we need to re-extract curr before returning?

   return false;
 }
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:

+(define_expand "doloop_end"
+  [(use (match_operand 0 ""))  ; loop pseudo
+   (use (match_operand 1 ""))] ; label
+  ""
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+     where (1) the control part consists of all insns defining and/or
+     using a certain 'count' register and (2) the loop count can be
+     adjusted by modifying this register prior to the loop.
+     ??? The possible introduction of a new block to initialize the
+     new IV can potentially affect branch optimizations.  */
+  if (optimize > 0 && flag_modulo_sched)

You'd be better off moving this condition into the expansion predicate (which is currently ""). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate.

r~
[PR63762] GCC generates UNPREDICTABLE STR with Rn = Rt for arm
Hi all,

This patch should fix PR63762. A hard register allocated during the IRA pass conflicts with one allocated in the LRA pass because of inconsistent information between the allocno and reg_pref. The stale reg_class information in reg_pref is used by LRA while doing hard register assignment, so LRA makes the wrong decision and chooses the same hard register number for pseudo registers with overlapping liveness.

For this particular case, manifested on the arm target:

(insn 180 183 184 15 (set (mem:SF (post_inc:SI (reg/v/f:SI 223 [ in ])) [2 MEM[base: in_68, offset: 4294967292B]+0 S4 A32]) (reg/v:SF 290 [orig:224 val ] [224])) unpreditable-str.c:33 618 {*movsf_vfp} (expr_list:REG_INC (reg/v/f:SI 223 [ in ]) (nil)))

IRA: r290(2, SF, GENERAL_REGS) - r278(2, SF, GENERAL_REGS) - r224(16, SF, VFP_LO_REGS); reg_pref[290] is set following its original reg's (r224) info, which is VFP_LO_REGS.

LRA: r302(SI, GENERAL_REGS) - r223(SI, GENERAL_REGS); while selecting a hard register for r302, the register class is checked first: r290 is VFP_LO_REGS (from reg_pref[290]) and r302 is GENERAL_REGS, so they will never conflict. Thus hard register number 2 is chosen for r302 without complaint.

A testcase is added for all targets, as I think it's a middle-end issue; sorry for not being able to simplify it. arm-none-eabi has been tested on the model, no new issues. Bootstrapped and regression tested on x86, no new issues. Is it OK to commit?

gcc/ChangeLog:

2014-11-18 Renlin Li renlin...@arm.com

PR middle-end/63762
* ira.c (ira): Update preferred class.
gcc/testsuite/ChangeLog: 2014-11-18 Renlin Li renlin...@arm.com PR middle-end/63762 * gcc.dg/pr63762.c: New.diff --git a/gcc/ira.c b/gcc/ira.c index 9c9e71d..e610d35 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -5263,7 +5263,18 @@ ira (FILE *f) ira_allocno_iterator ai; FOR_EACH_ALLOCNO (a, ai) - ALLOCNO_REGNO (a) = REGNO (ALLOCNO_EMIT_DATA (a)-reg); +{ + int old_regno = ALLOCNO_REGNO (a); + int new_regno = REGNO (ALLOCNO_EMIT_DATA (a)-reg); + + ALLOCNO_REGNO (a) = new_regno; + + if (old_regno != new_regno) +setup_reg_classes (new_regno, reg_preferred_class (old_regno), + reg_alternate_class (old_regno), + reg_allocno_class (old_regno)); +} + } else { diff --git a/gcc/testsuite/gcc.dg/pr63762.c b/gcc/testsuite/gcc.dg/pr63762.c new file mode 100644 index 000..df11067 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr63762.c @@ -0,0 +1,77 @@ +/* PR middle-end/63762 */ +/* { dg-do assemble } */ +/* { dg-options -O2 } */ + +#include stdlib.h + +void *astFree (); +void *astMalloc (); +void astNegate (void *); +int astGetNegated (void *); +void astGetRegionBounds (void *, double *, double *); +int astResampleF (void *, ...); + +extern int astOK; + +int +MaskF (int inside, int ndim, const int lbnd[], const int ubnd[], + float in[], float val) +{ + + void *used_region; + float *c, *d, *out, *tmp_out; + double *lbndgd, *ubndgd; + int *lbndg, *ubndg, idim, ipix, nax, nin, nout, npix, npixg, result = 0; + if (!astOK) return result; + lbndg = astMalloc (sizeof (int)*(size_t) ndim); + ubndg = astMalloc (sizeof (int)*(size_t) ndim); + lbndgd = astMalloc (sizeof (double)*(size_t) ndim); + ubndgd = astMalloc (sizeof (double)*(size_t) ndim); + if (astOK) +{ + astGetRegionBounds (used_region, lbndgd, ubndgd); + npix = 1; + npixg = 1; + for (idim = 0; idim ndim; idim++) +{ + lbndg[ idim ] = lbnd[ idim ]; + ubndg[ idim ] = ubnd[ idim ]; + npix *= (ubnd[ idim ] - lbnd[ idim ] + 1); + if (npixg = 0) npixg *= (ubndg[ idim ] - lbndg[ idim ] + 1); +} + if (npixg = 0 astOK) +{ + if ((inside != 0) == 
(astGetNegated( used_region ) != 0)) +{ + c = in; + for (ipix = 0; ipix npix; ipix++) *(c++) = val; + result = npix; +} +} + else if (npixg 0 astOK) +{ + if ((inside != 0) == (astGetNegated (used_region) != 0)) +{ + tmp_out = astMalloc (sizeof (float)*(size_t) npix); + if (tmp_out) +{ + c = tmp_out; + for (ipix = 0; ipix npix; ipix++) *(c++) = val; + result = npix - npixg; +} + out = tmp_out; +} + else +{ + tmp_out = NULL; + out = in; +} + if (inside) astNegate (used_region); + result += astResampleF (used_region, ndim, lbnd, ubnd, in, NULL, + NULL, NULL, 0, 0.0, 100, val, ndim, + lbnd, ubnd, lbndg, ubndg, out, NULL); + if (inside) astNegate
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
On 06/11/14 08:35, Yangfei (Felix) wrote: The idea is simple: Use movw for certain const source operand instead of ldrh. And exclude the const values which cannot be handled by mov/mvn/movw. I am doing regression test for this patch. Assuming no issue pops up, OK for trunk? So, doesn't that makes the bug latent for architectures older than armv6t2 and big endian and only fixed this in ARM state ? I'd prefer a complete solution please. What about *thumb2_movhi_insn in thumb2.md ? Actually, we fix the bug by excluding the const values which cannot be handled. The patch still works even without the adding of movw here. The new movw alternative here is just an small code optimization for certain arch. We can handle consts by movw instead of ldrh and this better for performance. We find that this bug is not reproducible for *thumb2_movhi_insn. The reason is that this pattern can always move consts using movw. Please fix the PR number in your final commit with PR 59593. Index: gcc/config/arm/predicates.md = == --- gcc/config/arm/predicates.md(revision 216838) +++ gcc/config/arm/predicates.md(working copy) @@ -144,6 +144,11 @@ (and (match_code const_int) (match_test INTVAL (op) == 0))) +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (and (match_code const_int) + (match_test (INTVAL (op) 0x) == 0 + ;; Something valid on the RHS of an ARM data-processing instruction (define_predicate arm_rhs_operand (ior (match_operand 0 s_register_operand) @@ -211,6 +216,11 @@ (ior (match_operand 0 arm_rhs_operand) (match_operand 0 arm_not_immediate_operand))) +(define_predicate arm_hi_operand + (ior (match_operand 0 arm_rhsm_operand) + (ior (match_operand 0 arm_not_immediate_operand) +(match_operand 0 arm_movw_immediate_operand + I think you don't need any of these predicates. 
(define_predicate arm_di_operand (ior (match_operand 0 s_register_operand) (match_operand 0 arm_immediate_di_operand))) Index: gcc/config/arm/arm.md = == --- gcc/config/arm/arm.md (revision 216838) +++ gcc/config/arm/arm.md (working copy) @@ -6285,8 +6285,8 @@ ;; Pattern to recognize insn generated default case above (define_insn *movhi_insn_arch4 - [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r) - (match_operand:HI 1 general_operand rIk,K,r,mi))] + [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r) + (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))] Use `n' instead of `j' - movw can handle all of the numerical constants for HImode values. And the predicate can remain general_operand. Hello Ramana, We need to make sure that movw is only used for architectures which satisfy arm_arch_thumb2 as indicated in the following predicate added: +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (and (match_code const_int) + (match_test (INTVAL (op) 0x) == 0 I am modifying the predicate in order to fix issue for other older architectures. It seems we can't achieve this by simply using 'n' instead of 'j' here, right? Thanks.
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) You'd be better off moving this condition into the expansion predicate (which is currently ). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. r~ Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,43 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + optimize 0 flag_modulo_sched +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. 
*/ + + if (GET_MODE (operands[0]) != DImode) +FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h aarch64-doloop-v5.diff Description: aarch64-doloop-v5.diff
[PATCH][ARM] __ARM_FP __ARM_NEON_FP defined when -march=armv7-m
Incorrect predefinitions for certain target architectures: e.g. armv7-m does not contain NEON, but the definition __ARM_NEON_FP was switched on. Similarly with armv6 and even armv2. This patch fixes the predefines for each of the different chips containing certain types of FPU implementations.

Tests: tested on arm-none-linux-gnueabi and arm-none-linux-gnueabihf without any new regression. Manually compiled for various targets and all correct definitions were present.

Is this patch ok for trunk?

Mantas

gcc/Changelog:

* config/arm/arm.h (TARGET_NEON_FP): Removed conditional definition, define to zero if !TARGET_NEON.
(TARGET_CPU_CPP_BUILTINS): Added second condition before defining __ARM_FP macro.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index ff4ddac..325fea9 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -118,7 +118,7 @@ extern char arm_arch_name[];
       if (TARGET_VFP)                                   \
         builtin_define ("__VFP_FP__");                  \
                                                         \
-      if (TARGET_ARM_FP)                                \
+      if (TARGET_ARM_FP && !TARGET_SOFT_FLOAT)          \
         builtin_define_with_int_value (                 \
           "__ARM_FP", TARGET_ARM_FP);                   \
       if (arm_fp16_format == ARM_FP16_FORMAT_IEEE)      \
@@ -2350,10 +2350,9 @@ extern int making_const_table;
 /* Set as a bit mask indicating the available widths of floating point
    types for hardware NEON floating point.  This is the same as
    TARGET_ARM_FP without the 64-bit bit set.  */
-#ifdef TARGET_NEON
-#define TARGET_NEON_FP \
-  (TARGET_ARM_FP & (0xff ^ 0x08))
-#endif
+#define TARGET_NEON_FP \
+  (TARGET_NEON ? (TARGET_ARM_FP & (0xff ^ 0x08)) \
+               : 0)
 
 /* The maximum number of parallel loads or stores we support in an
    ldm/stm instruction.  */
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 12:28 PM, Yangfei (Felix) wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. Looks good to me. I'll leave it for aarch64 maintainers for final approval. r~
[PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option
Ping again? Any comment please? Ping? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks. Hi Felix, Sorry for the delay responding, I've been out of the office recently and I'm only just catching up on a backlog of GCC related emails. I'm in two minds about this; I can potentially see the need for attributes to enable long calls for specific calls, and maybe also for pragmas that can be used to efficiently mark a group of functions in that way; but I don't really see the value in adding a -mlong-calls option to do this globally. The reasoning is as follows: long calls are generally very expensive and relatively few functions should need them in most applications (since code that needs to span more than a single block of 128Mbytes - the span of a BL or B instruction - will be very rare in reality). The best way to handle very large branches for those rare cases where you do have a very large contiguous block of code more than 128MB is by having the linker insert veneers when needed; the code will branch to the veneer which will insert an indirect branch at that point (the ABI guarantees that at function call boundaries IP0 and IP1 will not contain live values, making them available for such purposes). In a very small number of cases it might be desirable to mark specific functions as being too far away to reach; in those cases the attributes and pragma methods can be used to mark such calls as being far calls. Aside: The reason -mlong-calls was added to GCC for ARM is that the code there pre-dates the EABI, which introduced the concept of link-time veneering of calls - the option should be unnecessary now that almost everyone uses the EABI as the basis for their platform ABI. We don't have such a legacy for AArch64 and I'd need to see strong justification for its use before adding the option there as well. So please can you rework the patch to remove the -mlong-calls option and just leave the attribute and pragma interfaces. R. 
Hello Richard, Thanks for the comments. I agree with the idea. And I updated the patch with the -mlong-calls option removed and use short call by default. Reg-tested for aarch64-linux-gnu with qemu. Is it OK for trunk? Index: gcc/ChangeLog = == --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,25 @@ +2014-11-12 Felix Yang felix.y...@huawei.com + Haijian Zhang z.zhanghaij...@huawei.com + + * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define. + * config/aarch64/aarch64.c (aarch64_set_default_type_attributes, + aarch64_attribute_table, aarch64_comp_type_attributes, + aarch64_decl_is_long_call_p, aarch64_function_in_section_p, + aarch64_pr_long_calls, aarch64_pr_no_long_calls, + aarch64_pr_long_calls_off): New functions. + (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as + aarch64_set_default_type_attributes. + (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table. + (TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute. + (aarch64_pragma_enum): New enum. + (aarch64_attribute_table): New attribute table. + * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls, + aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations. + * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to + generate indirect call for sibling call when needed. + * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to + exclude a symbol_ref for an indirect call. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c = == --- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0) @@ -0,0 +1,133 @@ +/* Check that long calls to different sections are not optimized to +bl. */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* This test expects that short calls are the default. 
*/ +/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls } +{ } } */ + +#define section(S) __attribute__((section(S))) #define weak +__attribute__((weak)) #define noinline __attribute__((noinline)) +#define long_call __attribute__((long_call)) #define short_call +__attribute__((short_call)) + +#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS) \ + const char *TARGET_ATTRS ID (void); \ + const char *CALL_ATTRS call_##ID (void) { return ID () + 1; } + +#define EXTERN_CALL(ID, TARGET_ATTRS, CALL_ATTRS) \ + const char *TARGET_ATTRS noinline ID (void) { return
Re: [PATCH][ARM] __ARM_FP __ARM_NEON_FP defined when -march=armv7-m
On 18/11/14 11:30, Mantas Mikaitis wrote: Incorrect predefinitions for certain target architectures. E.g. arm7-m does not contain NEON but the defintion __ARM_NEON_FP was switched on. Similarly with armv6 and even armv2. This patch fixes the predefines for each of the different chips containing certain types of the FPU implementations. Tests: Tested on arm-none-linux-gnueabi and arm-none-linux-gnueabihf without any new regression. Manually compiled for various targets and all correct definitions were present. Is this patch ok for trunk? Mantas gcc/Changelog: * config/arm/arm.h (TARGET_NEON_FP): Removed conditional definition, define to zero if !TARGET_NEON. (TARGET_CPU_CPP_BUILTINS): Added second condition before defining __ARM_FP macro. ARM_DEFS.patch diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index ff4ddac..325fea9 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -118,7 +118,7 @@ extern char arm_arch_name[]; if (TARGET_VFP) \ builtin_define (__VFP_FP__);\ \ - if (TARGET_ARM_FP) \ + if (TARGET_ARM_FP !TARGET_SOFT_FLOAT)\ Wouldn't it be better to factor this into TARGET_ARM_FP? It seems odd that that macro returns a set of values based on something completely unavailable for the current compilation. That would also then mirror the behaviour of TARGET_NEON_FP (see below) and make the internal macros more consistent. builtin_define_with_int_value ( \ __ARM_FP, TARGET_ARM_FP); \ if (arm_fp16_format == ARM_FP16_FORMAT_IEEE)\ @@ -2350,10 +2350,9 @@ extern int making_const_table; /* Set as a bit mask indicating the available widths of floating point types for hardware NEON floating point. This is the same as TARGET_ARM_FP without the 64-bit bit set. */ -#ifdef TARGET_NEON -#define TARGET_NEON_FP \ - (TARGET_ARM_FP (0xff ^ 0x08)) -#endif +#define TARGET_NEON_FP\ + (TARGET_NEON ? (TARGET_ARM_FP (0xff ^ 0x08)) \ + : 0) /* The maximum number of parallel loads or stores we support in an ldm/stm instruction. */ R.
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 18 November 2014 11:28, Yangfei (Felix) felix.y...@huawei.com wrote: Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + OK with me /Marcus
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
On 18/11/14 11:02, Yangfei (Felix) wrote: On 06/11/14 08:35, Yangfei (Felix) wrote: The idea is simple: Use movw for certain const source operand instead of ldrh. And exclude the const values which cannot be handled by mov/mvn/movw. I am doing regression test for this patch. Assuming no issue pops up, OK for trunk? So, doesn't that makes the bug latent for architectures older than armv6t2 and big endian and only fixed this in ARM state ? I'd prefer a complete solution please. What about *thumb2_movhi_insn in thumb2.md ? Actually, we fix the bug by excluding the const values which cannot be handled. The patch still works even without the adding of movw here. The new movw alternative here is just an small code optimization for certain arch. We can handle consts by movw instead of ldrh and this better for performance. We find that this bug is not reproducible for *thumb2_movhi_insn. The reason is that this pattern can always move consts using movw. Please fix the PR number in your final commit with PR 59593. Index: gcc/config/arm/predicates.md = == --- gcc/config/arm/predicates.md(revision 216838) +++ gcc/config/arm/predicates.md(working copy) @@ -144,6 +144,11 @@ (and (match_code const_int) (match_test INTVAL (op) == 0))) +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (and (match_code const_int) + (match_test (INTVAL (op) 0x) == 0 + ;; Something valid on the RHS of an ARM data-processing instruction (define_predicate arm_rhs_operand (ior (match_operand 0 s_register_operand) @@ -211,6 +216,11 @@ (ior (match_operand 0 arm_rhs_operand) (match_operand 0 arm_not_immediate_operand))) +(define_predicate arm_hi_operand + (ior (match_operand 0 arm_rhsm_operand) + (ior (match_operand 0 arm_not_immediate_operand) +(match_operand 0 arm_movw_immediate_operand + I think you don't need any of these predicates. 
(define_predicate arm_di_operand (ior (match_operand 0 s_register_operand) (match_operand 0 arm_immediate_di_operand))) Index: gcc/config/arm/arm.md = == --- gcc/config/arm/arm.md (revision 216838) +++ gcc/config/arm/arm.md (working copy) @@ -6285,8 +6285,8 @@ ;; Pattern to recognize insn generated default case above (define_insn *movhi_insn_arch4 - [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r) - (match_operand:HI 1 general_operand rIk,K,r,mi))] + [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r) + (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))] Use `n' instead of `j' - movw can handle all of the numerical constants for HImode values. And the predicate can remain general_operand. Did you read my comment about set_attr arch further down in the thread ? Look at the set_attr arch alternative and set this to t2 for the movw alternative. I would expect that to be enough to sort this out instead of adding all this code. Ramana Hello Ramana, We need to make sure that movw is only used for architectures which satisfy arm_arch_thumb2 as indicated in the following predicate added: +(define_predicate arm_movw_immediate_operand + (and (match_test TARGET_32BIT arm_arch_thumb2) + (and (match_code const_int) + (match_test (INTVAL (op) 0x) == 0 I am modifying the predicate in order to fix issue for other older architectures. It seems we can't achieve this by simply using 'n' instead of 'j' here, right? Thanks.
Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option
On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com wrote: Ping again? Any comment please? Pinging daily is only going to irritate people. Please desist from doing so. Ramana Ping? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks. Hi Felix, Sorry for the delay responding, I've been out of the office recently and I'm only just catching up on a backlog of GCC related emails. I'm in two minds about this; I can potentially see the need for attributes to enable long calls for specific calls, and maybe also for pragmas that can be used to efficiently mark a group of functions in that way; but I don't really see the value in adding a -mlong-calls option to do this globally. The reasoning is as follows: long calls are generally very expensive and relatively few functions should need them in most applications (since code that needs to span more than a single block of 128Mbytes - the span of a BL or B instruction - will be very rare in reality). The best way to handle very large branches for those rare cases where you do have a very large contiguous block of code more than 128MB is by having the linker insert veneers when needed; the code will branch to the veneer which will insert an indirect branch at that point (the ABI guarantees that at function call boundaries IP0 and IP1 will not contain live values, making them available for such purposes). In a very small number of cases it might be desirable to mark specific functions as being too far away to reach; in those cases the attributes and pragma methods can be used to mark such calls as being far calls. Aside: The reason -mlong-calls was added to GCC for ARM is that the code there pre-dates the EABI, which introduced the concept of link-time veneering of calls - the option should be unnecessary now that almost everyone uses the EABI as the basis for their platform ABI. 
We don't have such a legacy for AArch64, and I'd need to see strong justification for its use before adding the option there as well. So please can you rework the patch to remove the -mlong-calls option and just leave the attribute and pragma interfaces.

R.

Hello Richard, thanks for the comments. I agree with the idea, and I updated the patch with the -mlong-calls option removed, using short calls by default. Reg-tested for aarch64-linux-gnu with qemu. Is it OK for trunk?

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 217394)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,25 @@
+2014-11-12  Felix Yang  felix.y...@huawei.com
+	    Haijian Zhang  z.zhanghaij...@huawei.com
+
+	* config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
+	* config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
+	aarch64_attribute_table, aarch64_comp_type_attributes,
+	aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
+	aarch64_pr_long_calls, aarch64_pr_no_long_calls,
+	aarch64_pr_long_calls_off): New functions.
+	(TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
+	aarch64_set_default_type_attributes.
+	(TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
+	(TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute.
+	(aarch64_pragma_enum): New enum.
+	(aarch64_attribute_table): New attribute table.
+	* config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
+	aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations.
+	* config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
+	generate indirect call for sibling call when needed.
+	* config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to
+	exclude a symbol_ref for an indirect call.
+ 2014-11-11  Andrew Pinski  apin...@cavium.com

	Bug target/61997

Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/long-calls-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c	(revision 0)
@@ -0,0 +1,133 @@
+/* Check that long calls to different sections are not optimized to bl.  */
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+/* This test expects that short calls are the default.  */
+/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls } { } } */
+
+#define section(S) __attribute__((section(S)))
+#define weak __attribute__((weak))
+#define noinline __attribute__((noinline))
+#define long_call __attribute__((long_call))
+#define short_call __attribute__((short_call))
+
+#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS) \
+  const char *TARGET_ATTRS ID (void); \
+  const char *CALL_ATTRS call_##ID
Re: match.pd tweaks for vectors and issues with HONOR_NANS
On Tue, Nov 18, 2014 at 11:36 AM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 17 Nov 2014, Richard Biener wrote: On Sun, Nov 16, 2014 at 6:53 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sun, 16 Nov 2014, Richard Biener wrote: I think the element_mode is the way to go.

The following passed bootstrap+testsuite.

2014-11-16  Marc Glisse  marc.gli...@inria.fr

	* tree.c (element_mode, integer_truep): New functions.
	* tree.h (element_mode, integer_truep): Declare them.
	* fold-const.c (negate_expr_p, fold_negate_expr, combine_comparisons,
	fold_cond_expr_with_comparison, fold_real_zero_addition_p,
	fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p,
	fold_strip_sign_ops): Use element_mode.
	(fold_binary_loc): Use element_mode and element_precision.
	* match.pd: Use integer_truep, element_mode, element_precision,
	VECTOR_TYPE_P and build_one_cst.  Extend some transformations to
	vectors.  Simplify A/-A.

-	! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
-	   && TYPE_MODE (type) == TYPE_MODE (inter_type))
+	! (final_prec != element_precision (type)
+	   && element_mode (type) == element_mode (inter_type))

isn't a 1:1 conversion - please use final_prec != GET_MODE_PRECISION (element_mode (type)); your version is final_prec != final_prec.

Good catch, I was doing those replacements too fast to be really safe :-(

The tree.c:element_mode function lacks a function comment. Ok with those fixed.

I am attaching the version I committed. I'll try to replace some more TYPE_MODE during stage3...

Btw, a convenience would be to be able to write HONOR_NANS (type), thus effectively making HONOR_* inline functions with a machine_mode and a type overload (and the type overload properly looking at element types).

Richard.
Thanks,

-- Marc Glisse

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 217701)
+++ gcc/tree.c	(revision 217702)
@@ -2268,20 +2268,34 @@ integer_nonzerop (const_tree expr)
 {
   STRIP_NOPS (expr);

   return ((TREE_CODE (expr) == INTEGER_CST
	   && !wi::eq_p (expr, 0))
	  || (TREE_CODE (expr) == COMPLEX_CST
	      && (integer_nonzerop (TREE_REALPART (expr))
		  || integer_nonzerop (TREE_IMAGPART (expr)))));
 }

+/* Return 1 if EXPR is the integer constant one.  For vector,
+   return 1 if every piece is the integer constant minus one
+   (representing the value TRUE).  */
+
+int
+integer_truep (const_tree expr)
+{
+  STRIP_NOPS (expr);
+
+  if (TREE_CODE (expr) == VECTOR_CST)
+    return integer_all_onesp (expr);
+  return integer_onep (expr);
+}
+
 /* Return 1 if EXPR is the fixed-point constant zero.  */

 int
 fixed_zerop (const_tree expr)
 {
   return (TREE_CODE (expr) == FIXED_CST
	  && TREE_FIXED_CST (expr).data.is_zero ());
 }

 /* Return the power of two represented by a tree node known to be a
@@ -12303,11 +12317,25 @@ get_base_address (tree t)
     t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);

   /* ??? Either the alias oracle or all callers need to properly deal
     with WITH_SIZE_EXPRs before we can look through those.  */
   if (TREE_CODE (t) == WITH_SIZE_EXPR)
     return NULL_TREE;

   return t;
 }

+/* Return the machine mode of T.  For vectors, returns the mode of the
+   inner type.  The main use case is to feed the result to HONOR_NANS,
+   avoiding the BLKmode that a direct TYPE_MODE (T) might return.  */
+
+machine_mode
+element_mode (const_tree t)
+{
+  if (!TYPE_P (t))
+    t = TREE_TYPE (t);
+  if (VECTOR_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+    t = TREE_TYPE (t);
+  return TYPE_MODE (t);
+}
+
 #include gt-tree.h

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 217701)
+++ gcc/tree.h	(revision 217702)
@@ -1557,20 +1557,22 @@ extern void protected_set_expr_location
 #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
 #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
 #define TYPE_CONTEXT(NODE) (TYPE_CHECK (NODE)->type_common.context)

 #define TYPE_MODE(NODE) \
   (VECTOR_TYPE_P (TYPE_CHECK (NODE)) \
   ? vector_type_mode (NODE) : (NODE)->type_common.mode)
 #define SET_TYPE_MODE(NODE, MODE) \
   (TYPE_CHECK (NODE)->type_common.mode = (MODE))

+extern machine_mode element_mode (const_tree t);
+
 /* The canonical type for this type node, which is used by frontends to
   compare the type for equality with another type.  If two types are
   equal (based on the semantics of the language), then they will have
   equivalent TYPE_CANONICAL entries.

   As a special case, if TYPE_CANONICAL is NULL_TREE, and thus
   TYPE_STRUCTURAL_EQUALITY_P is true, then it cannot be used for
   comparison against other types.
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote:

I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes at the end of the enum and have AIX ignore them? Limiting the number of codes in an enum due to restrictions of a specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow us to avoid this kind of problem :( I'll try to make a patch reducing the amount of builtin codes to a required minimum in case it appears to be the best option we have.

It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First, if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second, if you could look at not duplicating every builtin in the enumeration, since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn.

jeff

I see. I can reduce the number of _CHKP builtin codes. An experimental patch shows a resulting END_BUILTINS = 1242. But we should expect a similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such a build is tested). The current IX86_BUILTIN_MAX is 2467, which is greater than the current END_BUILTINS.

I think it's better to fix dbxout.c to do something sensible with too-long lines for enum types. It should be easy to cook up a testcase that is not GCC:

enum { entry1, entry2, entry3 };

not sure how exactly the issue breaks AIX bootstrap though. I suppose the assembler complains?

Richard.

Ilya
Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
On Tue, 18 Nov 2014, Richard Biener wrote: On Tue, 18 Nov 2014, Eric Botcazou wrote:

Now - I can see how that is easily confused by the static chain being address-taken. But I also remember that Eric did some preparatory work to fix that, for nested functions, that is, possibly setting DECL_NONADDRESSABLE_P? Don't remember exactly.

The preparatory work is DECL_NONLOCAL_FRAME. The complete patch which does something along these lines is attached to PR tree-optimization/54779 (latest version, for a 4.9-based compiler).

Ah, now I remember - this was to be able to optimize away the frame variable in case the nested function was inlined. Tom's case is somewhat different, as I understand it: somehow LIM store motion doesn't handle indirect frame accesses well enough(?). So he intends to load register vars in the frame into registers at the beginning of the nested function and restore them to the frame on function exit (this will probably break for recursive calls, but OMP offloading might be special enough that this is a non-issue there). So marking the frame decl won't help him here (I thought we might mark the FIELD_DECLs corresponding to individual vars). OTOH, inside the nested function, accesses to the static chain should be easy to identify.

Tom - does the following patch help?

Thanks, Richard.

Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 217692)
+++ gcc/omp-low.c	(working copy)
@@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
       layout_type (type);
     }

-  TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
+  TREE_TYPE (ctx->receiver_decl)
+    = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }

 /* Instantiate decls as necessary in CTX to satisfy the data sharing
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote:

I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes at the end of the enum and have AIX ignore them? Limiting the number of codes in an enum due to restrictions of a specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow us to avoid this kind of problem :( I'll try to make a patch reducing the amount of builtin codes to a required minimum in case it appears to be the best option we have.

It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First, if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second, if you could look at not duplicating every builtin in the enumeration, since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn.

jeff

I see. I can reduce the number of _CHKP builtin codes. An experimental patch shows a resulting END_BUILTINS = 1242. But we should expect a similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such a build is tested). The current IX86_BUILTIN_MAX is 2467, which is greater than the current END_BUILTINS.

I think it's better to fix dbxout.c to do something sensible with too-long lines for enum types. It should be easy to cook up a testcase that is not GCC:

enum { entry1, entry2, entry3 };

As I understand it, the main problem is that fixing dbxout in trunk won't help to build the stage1 compiler.

not sure how exactly the issue breaks AIX bootstrap though. I suppose the assembler complains?

It is the linker that complains. Here is a build log example: http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584

Ilya

Richard.

Ilya
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote:

I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes at the end of the enum and have AIX ignore them? Limiting the number of codes in an enum due to restrictions of a specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow us to avoid this kind of problem :( I'll try to make a patch reducing the amount of builtin codes to a required minimum in case it appears to be the best option we have.

It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First, if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second, if you could look at not duplicating every builtin in the enumeration, since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn.

jeff

I see. I can reduce the number of _CHKP builtin codes. An experimental patch shows a resulting END_BUILTINS = 1242. But we should expect a similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such a build is tested). The current IX86_BUILTIN_MAX is 2467, which is greater than the current END_BUILTINS.

I think it's better to fix dbxout.c to do something sensible with too-long lines for enum types. It should be easy to cook up a testcase that is not GCC:

enum { entry1, entry2, entry3 };

As I understand it, the main problem is that fixing dbxout in trunk won't help to build the stage1 compiler.

Simply build stage1 without debug info on AIX then. Btw, you have to start fixing the bug at some point ... (we can backport to 4.8 and 4.9). Of course dbxout.c is in a kind of deep-freeze, unmaintained state. I don't think we ever encouraged work-arounds for broken host compilers if that requires substantial work.

not sure how exactly the issue breaks AIX bootstrap though. I suppose the assembler complains?

It is the linker that complains. Here is a build log example: http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584

I see.

Thanks, Richard.

Ilya
Re: [PATCH][AArch64][1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On 18/11/14 10:33, Kyrill Tkachov wrote:

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 264bf01..ad7ec43c 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -36,7 +36,6 @@ extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
 extern int arm_no_early_mul_dep (rtx, rtx);
 extern int arm_no_early_store_addr_dep (rtx, rtx);
 extern bool arm_rtx_shift_left_p (rtx);
-
 /* RTX cost table definitions.  These are used when tuning for speed rather
   than for size and should reflect the _additional_ cost over the cost of the
   fastest instruction in the machine, which is COSTS_N_INSNS (1).

This hunk should not be here. I'll remove it when I commit if approved... Sorry for that.

Kyrill
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
On 18/11/14 11:02, Yangfei (Felix) wrote: On 06/11/14 08:35, Yangfei (Felix) wrote: The idea is simple: use movw for certain const source operands instead of ldrh, and exclude the const values which cannot be handled by mov/mvn/movw. I am running regression tests for this patch. Assuming no issue pops up, OK for trunk?

So, doesn't that make the bug latent for architectures older than armv6t2 and big-endian, and only fix it in ARM state? I'd prefer a complete solution, please. What about *thumb2_movhi_insn in thumb2.md?

Actually, we fix the bug by excluding the const values which cannot be handled. The patch still works even without the addition of movw here. The new movw alternative is just a small code optimization for certain architectures: we can handle consts with movw instead of ldrh, and this is better for performance. We find that this bug is not reproducible for *thumb2_movhi_insn; the reason is that this pattern can always move consts using movw.

Please fix the PR number in your final commit with PR 59593.

Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md	(revision 216838)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -144,6 +144,11 @@
   (and (match_code const_int)
        (match_test INTVAL (op) == 0)))

+(define_predicate arm_movw_immediate_operand
+  (and (match_test TARGET_32BIT && arm_arch_thumb2)
+       (and (match_code const_int)
+            (match_test (INTVAL (op) 0x) == 0
+
 ;; Something valid on the RHS of an ARM data-processing instruction
 (define_predicate arm_rhs_operand
   (ior (match_operand 0 s_register_operand)
@@ -211,6 +216,11 @@
   (ior (match_operand 0 arm_rhs_operand)
        (match_operand 0 arm_not_immediate_operand)))

+(define_predicate arm_hi_operand
+  (ior (match_operand 0 arm_rhsm_operand)
+       (ior (match_operand 0 arm_not_immediate_operand)
+            (match_operand 0 arm_movw_immediate_operand
+
I think you don't need any of these predicates.

(define_predicate arm_di_operand
  (ior (match_operand 0 s_register_operand)
       (match_operand 0 arm_immediate_di_operand)))

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 216838)
+++ gcc/config/arm/arm.md	(working copy)
@@ -6285,8 +6285,8 @@
 ;; Pattern to recognize insn generated default case above
 (define_insn *movhi_insn_arch4
-  [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r)
-        (match_operand:HI 1 general_operand rIk,K,r,mi))]
+  [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r)
+        (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))]

Use `n' instead of `j' - movw can handle all of the numerical constants for HImode values, and the predicate can remain general_operand. Did you read my comment about set_attr arch further down in the thread? Look at the set_attr arch alternative and set this to t2 for the movw alternative. I would expect that to be enough to sort this out instead of adding all this code.

Sorry for missing the point. It seems to me that 't2' here will conflict with the condition of the pattern *movhi_insn_arch4:

  TARGET_ARM && arm_arch4
  && (register_operand (operands[0], HImode)
      || register_operand (operands[1], HImode))

#define TARGET_ARM (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2 (TARGET_THUMB && arm_arch_thumb2)

Right? Thanks.
Re: [PATCH 5/5] combine: preferably delete dead SETs in PARALLELs
On Mon, Nov 17, 2014 at 02:13:22PM -0700, Jeff Law wrote: On 11/14/14 12:19, Segher Boessenkool wrote: If the result of combining some insns is a PARALLEL of two SETs, and that is not a recognised insn, and one of the SETs is dead, combine tries to use the remaining SET as the insn. This patch changes things around so that the one SET is preferably used, not the PARALLEL.

2014-11-14  Segher Boessenkool  seg...@kernel.crashing.org

gcc/
	* combine.c (try_combine): Prefer to delete dead SETs inside
	a PARALLEL over keeping them.

OK. Can this go forward independent of the other changes? Seems like it should.

Yes, only 4 depends on 3; the rest are independent patches.

Does it help pr52714, where we'd like to rip apart a PARALLEL with two sets, one of which is dead? If so, it might allow us to optimize that code better.

It does not seem to fix the testcase. I'll investigate why not. You're talking about the (parallel [(set (pc) (pc)) (set (X) (sp))]), right? I guess the set of pc to pc is not marked as unused...

Granted, it originally was an m68k issue, but presumably it's happening elsewhere or you wouldn't be poking at it :-)

The case that made me do this is PowerPC, where (with more patches) for

  long long subfM1(long long a) { return 0x1 - a; }

we generated (-m32)

  subfic 4,4,-1 ; subfic 3,3,1

where that first subfic is

  (parallel [(set (reg 4) (not (reg 4)))
             (set (ca) (const_int 1))])

with that second set dead, so we can just do

  not 4,4 ; subfic 3,3,1

which is cheaper.

Segher
Re: [PATCH 4/4] Data structure is used for inline_summary struct.
On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote: b) with GTY, we cannot call destructor

Everything in the symbol table is explicitly memory managed (i.e. never left to be freed by the garbage collector). It resides in GTY only to allow linking garbage-collected objects from it and to get PCH working.

Well, if I understand the intent correctly, summaries are for stuff that is not in the symbol table. For example, jump functions are a

Correct.

vector of structures possibly containing trees, so everything has to be in garbage-collected memory. When an edge is removed, it is necessary to be notified about it immediately, for example to decrement rdesc_refcount (you might argue that that should be done in a separate hook and not from within a summary class, but then you start to rely on hook invocation ordering, so I think it is better to eventually use the summaries for it too).

I do not see why ctors/dtors cannot do the reference counting. In fact, this is how refcounting is usually done anyway?

Well, when there is no garbage collection involved then yes, that is how you normally do it, but in the GC case there is the question of what is the appropriate time to call the destructor on garbage-collected data (like jump functions)?

Martin
Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian
Sorry for missing the point. It seems to me that 't2' here will conflict with the condition of the pattern *movhi_insn_arch4:

  TARGET_ARM && arm_arch4
  && (register_operand (operands[0], HImode)
      || register_operand (operands[1], HImode))

#define TARGET_ARM (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2 (TARGET_THUMB && arm_arch_thumb2)

Bah, indeed! - I misremembered the t2 there, my mistake. Yes, you are right there, but what I'd like you to do is to use that mechanism rather than putting all this logic in the predicate. So I'd prefer you to add a v6t2 to the values for the arch attribute (don't forget to update the comments above), and in arch_enabled you need to enforce this with

  (and (eq_attr arch v6t2)
       (match_test TARGET_32BIT && arm_arch6 && arm_arch_thumb2))
  (const_string yes)

And in the pattern use v6t2 ... arm_arch_thumb2 implies that this is at the architecture level of v6t2. Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state.

regards Ramana

Right? Thanks.
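Pulling the suggested mechanism together, the shape of the change in arm.md would look roughly like this (a sketch based on the existing arch-attribute machinery; the exact value list and alternative ordering are assumptions, not the final patch):

```
;; 1. Add v6t2 to the list of values for the arch attribute
;;    (and update the comment above it accordingly).
(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2" (const_string "any"))

;; 2. Teach arch_enabled to enable the alternative only when the
;;    target really supports movw in ARM state.
(and (eq_attr "arch" "v6t2")
     (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
(const_string "yes")

;; 3. In *movhi_insn_arch4, keep general_operand and gate only the
;;    new movw alternative on the new arch value.
(set_attr "arch" "*,*,v6t2,*,*")
```

The benefit of this design is that the per-alternative gating stays in the pattern's attributes, where the rest of the ARM port already expresses such constraints, instead of being duplicated in new predicates.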
Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option
On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com wrote: Ping again? Any comment please? Pinging daily is only going to irritate people. Please desist from doing so. Ramana Oh, thanks for reminding me. And sorry if this bothers you guys. The end of stage1 of GCC 5.0 causes me to push this a little bit :-) Ping? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks. Hi Felix, Sorry for the delay responding, I've been out of the office recently and I'm only just catching up on a backlog of GCC related emails. I'm in two minds about this; I can potentially see the need for attributes to enable long calls for specific calls, and maybe also for pragmas that can be used to efficiently mark a group of functions in that way; but I don't really see the value in adding a -mlong-calls option to do this globally. The reasoning is as follows: long calls are generally very expensive and relatively few functions should need them in most applications (since code that needs to span more than a single block of 128Mbytes - the span of a BL or B instruction - will be very rare in reality). The best way to handle very large branches for those rare cases where you do have a very large contiguous block of code more than 128MB is by having the linker insert veneers when needed; the code will branch to the veneer which will insert an indirect branch at that point (the ABI guarantees that at function call boundaries IP0 and IP1 will not contain live values, making them available for such purposes). In a very small number of cases it might be desirable to mark specific functions as being too far away to reach; in those cases the attributes and pragma methods can be used to mark such calls as being far calls. 
Aside: The reason -mlong-calls was added to GCC for ARM is that the code there pre-dates the EABI, which introduced the concept of link-time veneering of calls - the option should be unnecessary now that almost everyone uses the EABI as the basis for their platform ABI. We don't have such a legacy for AArch64 and I'd need to see strong justification for its use before adding the option there as well. So please can you rework the patch to remove the -mlong-calls option and just leave the attribute and pragma interfaces. R. Hello Richard, Thanks for the comments. I agree with the idea. And I updated the patch with the -mlong-calls option removed and use short call by default. Reg-tested for aarch64-linux-gnu with qemu. Is it OK for trunk? Index: gcc/ChangeLog = == --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,25 @@ +2014-11-12 Felix Yang felix.y...@huawei.com + Haijian Zhang z.zhanghaij...@huawei.com + + * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define. + * config/aarch64/aarch64.c (aarch64_set_default_type_attributes, + aarch64_attribute_table, aarch64_comp_type_attributes, + aarch64_decl_is_long_call_p, aarch64_function_in_section_p, + aarch64_pr_long_calls, aarch64_pr_no_long_calls, + aarch64_pr_long_calls_off): New functions. + (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as + aarch64_set_default_type_attributes. + (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table. + (TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute. + (aarch64_pragma_enum): New enum. + (aarch64_attribute_table): New attribute table. + * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls, + aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations. + * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to + generate indirect call for sibling call when needed. + * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to + exclude a symbol_ref for an indirect call. 
+ 2014-11-11  Andrew Pinski  apin...@cavium.com

	Bug target/61997

Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/long-calls-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c	(revision 0)
@@ -0,0 +1,133 @@
+/* Check that long calls to different sections are not optimized to bl.  */
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+/* This test expects that short calls are the default.  */
+/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls } { } } */
+
+#define section(S) __attribute__((section(S)))
+#define weak __attribute__((weak))
+#define noinline __attribute__((noinline))
+#define
Re: [PATCH] plugin event for C/C++ function definitions
2014-11-14 7:08 GMT-03:00 Richard Biener richard.guent...@gmail.com: On Thu, Nov 13, 2014 at 6:42 PM, Andres Tiraboschi andres.tirabos...@tallertechnologies.com wrote: Hi, this patch adds a new plugin event PLUGIN_START_FUNCTION and PLUGIN_FINISH_FUNCTION that are invoked at start_function and finish_function respectively in the C and C++ frontends. PLUGIN_START_FUNCTION is called before parsing the function body. PLUGIN_FINISH_FUNCTION is called after parsing a function definition. Can you name them more specifically, like PLUGIN_START/FINISH_PARSE_FUNCTION please? Thanks, Richard. Ok, I will do it
Re: [PATCH] Fix optimize_range_tests_diff
Jakub Jelinek ja...@redhat.com writes: On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote: On 10/14/14 10:02, Jakub Jelinek wrote: When hacking on the range reassoc opt, I've noticed we can emit code with undefined behavior even when there wasn't any originally, in particular for:

  (X - 43U) <= 3U || (X - 75U) <= 3U
  and this loop can transform that into
  ((X - 43U) & ~(75U - 43U)) <= 3U.  */

we actually don't transform it to what the comment says, but to

  ((X - 43) & ~(75U - 43U)) <= 3U

i.e. the initial subtraction can be performed in a signed type. If X here is e.g. INT_MIN or INT_MIN + 42, the subtraction at the gimple level would be UB (not caught by -fsanitize=undefined, because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14  Jakub Jelinek  ja...@redhat.com

	* tree-ssa-reassoc.c (optimize_range_tests_diff): Perform MINUS_EXPR
	in unsigned type to avoid undefined behavior.

Any chance this fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

No. For that I have right now:

-  if (tree_log2 (lowxor) < 0)
+  if (wi::popcount (wi::to_widest (lowxor)) != 1)

in my tree, though supposedly:

  if (wi::popcount (wi::zext (wi::to_widest (lowxor),
			      TYPE_PRECISION (TREE_TYPE (lowxor)))) != 1)

might be better, as without the zext it will supposedly not say popcount is 1 for smaller-precision signed minimum values. My wide-int-fu is limited, so if there is a better way to do this, I'm all ears.

Sorry for the very late reply, but just:

  if (wi::popcount (lowxor) != 1)

should do what you want.

Thanks, Richard
[PATCH] Fix PR63914
The following fixes an ICE in CCP's set_lattice_value which is confused when transitioning from { NaN, 0.0 } to { 0.0, 0.0 }. The comment before canonicalize_value nicely explains what happens here (with -ffinite-math-only) but the function fails to handle float vectors or complex float constants. Now -- we cannot simply canonicalize { NaN, 0.0 } to UNDEFINED as the value is still partly defined. Thus instead of doing the other possible thing (drop to VARYING here) I drop the canonicalization we do for the !HONOR_NANS case (as well as the !HONOR_SIGNED_ZEROS one, as operand_equal_p deals nicely with that). Instead of canonicalizing I allow this kind of lattice transition (from NaN to !NaN if !HONOR_NANS). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk so far. I've chosen to change the assert to a gcc_checking_assert which should be safe to backport and would hide the bug. Richard. 2014-11-18 Richard Biener rguent...@suse.de PR tree-optimization/63914 * tree-ssa-ccp.c (canonicalize_value): Remove float value canonicalization. (valid_lattice_transition): Allow (partial) transition from NaN to non-NaN if !HONOR_NANS. (set_lattice_value): Check for valid lattice transitions only when checking is enabled. * gcc.dg/pr63914.c: New testcase. Index: gcc/tree-ssa-ccp.c === --- gcc/tree-ssa-ccp.c (revision 217653) +++ gcc/tree-ssa-ccp.c (working copy) @@ -402,58 +402,16 @@ set_value_varying (tree var) val->mask = -1; } -/* For float types, modify the value of VAL to make ccp work correctly - for non-standard values (-0, NaN): - - If HONOR_SIGNED_ZEROS is false, and VAL = -0, we canonicalize it to 0. - If HONOR_NANS is false, and VAL is NaN, we canonicalize it to UNDEFINED. - This is to fix the following problem (see PR 29921): Suppose we have - - x = 0.0 * y - - and we set value of y to NaN. This causes value of x to be set to NaN.
- When we later determine that y is in fact VARYING, fold uses the fact - that HONOR_NANS is false, and we try to change the value of x to 0, - causing an ICE. With HONOR_NANS being false, the real appearance of - NaN would cause undefined behavior, though, so claiming that y (and x) - are UNDEFINED initially is correct. - - For other constants, make sure to drop TREE_OVERFLOW. */ +/* For integer constants, make sure to drop TREE_OVERFLOW. */ static void canonicalize_value (ccp_prop_value_t *val) { - machine_mode mode; - tree type; - REAL_VALUE_TYPE d; - if (val->lattice_val != CONSTANT) return; if (TREE_OVERFLOW_P (val->value)) val->value = drop_tree_overflow (val->value); - - if (TREE_CODE (val->value) != REAL_CST) -return; - - d = TREE_REAL_CST (val->value); - type = TREE_TYPE (val->value); - mode = TYPE_MODE (type); - - if (!HONOR_SIGNED_ZEROS (mode) && REAL_VALUE_MINUS_ZERO (d)) -{ - val->value = build_real (type, dconst0); - return; -} - - if (!HONOR_NANS (mode) && REAL_VALUE_ISNAN (d)) -{ - val->lattice_val = UNDEFINED; - val->value = NULL; - return; -} } /* Return whether the lattice transition is valid. */ @@ -487,7 +445,49 @@ valid_lattice_transition (ccp_prop_value == wi::bit_and_not (wi::to_widest (new_val.value), new_val.mask)); /* Otherwise constant values have to agree. */ - return operand_equal_p (old_val.value, new_val.value, 0); + if (operand_equal_p (old_val.value, new_val.value, 0)) +return true; + + /* At least the kinds and types should agree now. */ + if (TREE_CODE (old_val.value) != TREE_CODE (new_val.value) + || !types_compatible_p (TREE_TYPE (old_val.value), + TREE_TYPE (new_val.value))) +return false; + + /* For floats and !HONOR_NANS allow transitions from (partial) NaN + to non-NaN.
*/ + tree type = TREE_TYPE (new_val.value); + if (SCALAR_FLOAT_TYPE_P (type) + && !HONOR_NANS (TYPE_MODE (type))) +{ + if (REAL_VALUE_ISNAN (TREE_REAL_CST (old_val.value))) + return true; +} + else if (VECTOR_FLOAT_TYPE_P (type) + && !HONOR_NANS (TYPE_MODE (TREE_TYPE (type)))) +{ + for (unsigned i = 0; i < VECTOR_CST_NELTS (old_val.value); ++i) + if (!REAL_VALUE_ISNAN + (TREE_REAL_CST (VECTOR_CST_ELT (old_val.value, i))) + && !operand_equal_p (VECTOR_CST_ELT (old_val.value, i), + VECTOR_CST_ELT (new_val.value, i), 0)) + return false; + return true; +} + else if (COMPLEX_FLOAT_TYPE_P (type) + && !HONOR_NANS (TYPE_MODE (TREE_TYPE (type)))) +{ + if (!REAL_VALUE_ISNAN (TREE_REAL_CST (TREE_REALPART (old_val.value))) + && !operand_equal_p (TREE_REALPART (old_val.value), + TREE_REALPART (new_val.value), 0)) + return false; + if (!REAL_VALUE_ISNAN (TREE_REAL_CST
Re: [PATCH][AARCH64]Add ACLE more predefined macros
On 17 November 2014 17:33, Renlin Li renlin...@arm.com wrote: Hi all, This is a simple patch to add more conditional macros defined ACLE 2.0. aarch64-none-elf target is tested on the model, no new issues. Is this Okay for trunk? gcc/ChangeLog: 2014-11-17 Renlin Li renlin...@arm.com * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FP_FAST, __ARM_FEATURE_FMA, __ARM_FP, __ARM_FEATURE_NUMERIC_MAXMIN, __ARM_NEON_FP. OK /Marcus
Re: [PATCH][AArch64] Remove/merge redundant iterators
On 13 November 2014 10:38, Alan Lawrence alan.lawre...@arm.com wrote: Hi, gcc/config/aarch64/iterators.md contains numerous duplicates - not always obvious as they are not always sorted the same. Sometimes, one copy is used is aarch64-simd-builtins.def and another in aarch64-simd.md; othertimes there is no obvious pattern ;). This patch just removes all the duplicates; I'm willing to hear arguments that some of the duplication serves a useful purpose! But in the meantime, the complete list of changes is: * VSDQ_I_BHSI has no uses, remove * SDQ_I (duplicate of ALLI) has no uses, remove; * VDQQHS (duplicate of VDQ_BHSI) has no uses, remove; * VDQM duplicates VDQ_BHSI, use the latter; * VDIC and VDW duplicate VD_BHSI, use the latter; and vdq - vdq_i. OK /Marcus
Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option
On 18/11/14 12:51, Yangfei (Felix) wrote: On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com wrote: Ping again? Any comment please? Pinging daily is only going to irritate people. Please desist from doing so. Ramana Oh, thanks for reminding me. And sorry if this bothers you guys. The end of stage1 of GCC 5.0 causes me to push this a little bit :-) You've posted this patch before stage1 closed. In any case IIRC the policy in the contribute pages used to suggest a weekly ping. Ramana Ping? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks. Hi Felix, Sorry for the delay responding, I've been out of the office recently and I'm only just catching up on a backlog of GCC related emails. I'm in two minds about this; I can potentially see the need for attributes to enable long calls for specific calls, and maybe also for pragmas that can be used to efficiently mark a group of functions in that way; but I don't really see the value in adding a -mlong-calls option to do this globally. The reasoning is as follows: long calls are generally very expensive and relatively few functions should need them in most applications (since code that needs to span more than a single block of 128Mbytes - the span of a BL or B instruction - will be very rare in reality). The best way to handle very large branches for those rare cases where you do have a very large contiguous block of code more than 128MB is by having the linker insert veneers when needed; the code will branch to the veneer which will insert an indirect branch at that point (the ABI guarantees that at function call boundaries IP0 and IP1 will not contain live values, making them available for such purposes). In a very small number of cases it might be desirable to mark specific functions as being too far away to reach; in those cases the attributes and pragma methods can be used to mark such calls as being far calls. 
Aside: The reason -mlong-calls was added to GCC for ARM is that the code there pre-dates the EABI, which introduced the concept of link-time veneering of calls - the option should be unnecessary now that almost everyone uses the EABI as the basis for their platform ABI. We don't have such a legacy for AArch64 and I'd need to see strong justification for its use before adding the option there as well. So please can you rework the patch to remove the -mlong-calls option and just leave the attribute and pragma interfaces. R. Hello Richard, Thanks for the comments. I agree with the idea. And I updated the patch with the -mlong-calls option removed and use short call by default. Reg-tested for aarch64-linux-gnu with qemu. Is it OK for trunk? Index: gcc/ChangeLog = == --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,25 @@ +2014-11-12 Felix Yang felix.y...@huawei.com + Haijian Zhang z.zhanghaij...@huawei.com + + * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define. + * config/aarch64/aarch64.c (aarch64_set_default_type_attributes, + aarch64_attribute_table, aarch64_comp_type_attributes, + aarch64_decl_is_long_call_p, aarch64_function_in_section_p, + aarch64_pr_long_calls, aarch64_pr_no_long_calls, + aarch64_pr_long_calls_off): New functions. + (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as + aarch64_set_default_type_attributes. + (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table. + (TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute. + (aarch64_pragma_enum): New enum. + (aarch64_attribute_table): New attribute table. + * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls, + aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations. + * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to + generate indirect call for sibling call when needed. + * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to + exclude a symbol_ref for an indirect call. 
+ 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c = == --- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0) @@ -0,0 +1,133 @@ +/* Check that long calls to different sections are not optimized to bl. */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* This test expects that short calls are the default. */ +/* { dg-skip-if "-mlong-calls in use" { *-*-* } { "-mlong-calls" } { "" } } */ + +#define section(S) __attribute__((section(S))) +#define weak __attribute__((weak)) +#define noinline __attribute__((noinline)) +#define long_call __attribute__((long_call)) +#define short_call __attribute__((short_call)) + +#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS) \ + const char
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote: I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes to the end of enum and ignore them by AIX? Limiting number of codes in enum due to restrictions of specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow to avoid such kind of problems :( I'll try to make a patch reducing amound of builtin codes to a required minimum in case it appears to be the best option we have. It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second if you could look at now duplicating every builtin in the enumeration since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn. jeff I see. I can reduce the number of _CHKP builtin codes. Experimental patch shows resulting END_BUILTINS = 1242. But we should expect similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such build is tested). Current IX86_BUILTIN_MAX is 2467 which is greater than current END_BUILTINS. I think it's better to fix dbxout.c to do something sensible with too long lines for enum types. It should be easy to cook up a testcase that is not GCC. 
enum { entry1, entry2, entry3 }; As I understand the main problem is that fixing dbxout in trunk won't help to build stage1 compiler. Simply build stage1 without debug info on AIX then. Btw, you have to start fixing the bug at some point ... (we can backport to 4.8 and 4.9). Of course dbxout.c is in kind of deep-freeze unmaintained state. Seems with no continuation lines support there is no easy way to limit stabstring length. But it should be easy to fix this particular problem (too long stabstring due to many enum values) by discarding the rest of values when some string length threshold is hit. Possible patch: diff --git a/gcc/dbxout.c b/gcc/dbxout.c index aa15a39..9b0b82d 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -168,6 +168,10 @@ along with GCC; see the file COPYING3. If not see #define DBX_CONTIN_CHAR '\\' #endif +#ifndef DBX_MAX_LENGTH +#define DBX_MAX_LENGTH 0 +#endif + enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED}; /* Structure recording information about a C data type. @@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full) stabstr_C (','); if (TREE_CHAIN (tem) != 0) CONTIN; + + /* Prevent too long strings rejected by linker. */ + if (DBX_CONTIN_LENGTH == 0 + && DBX_MAX_LENGTH != 0 + && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH) + break; } stabstr_C (';'); With this we should be able to set limit in target config files. Tried on linux with a small test with limit set to 20: cat test.c enum e1 { val1, val2, val3 }; gcc test.c -S -gstabs; grep e1 test.s .stabs "e1:T(0,21)=eval1:0,val2:1,;",128,0,0,0 The last enum value was discarded. I don't know which target should set this limit and what limit values should be used. Thanks, Ilya I don't think we ever encouraged work-arounds for broken host compilers if that requires substantial work. Not sure how exactly the issue breaks AIX bootstrap though. I suppose the assembler complains? It is the linker that complains.
Here is a build log example: http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584 I see. Thanks, Richard. Ilya Richard. Ilya
C++ PATCH for c++/58102 (constexpr and mutable)
We need to allow copy-list-initialization of constexpr variables with mutable members, too. The thing we need to avoid is not so much a full-expression with mutable-containing type as assuming that a mutable member of a variable hasn't changed since the variable was initialized. Tested x86_64-pc-linux-gnu, applied to trunk. commit 58e6e22ca70f9efe534f9d038f0cf9a2fe1a7504 Author: Jason Merrill ja...@redhat.com Date: Mon Nov 17 23:44:41 2014 -0500 PR c++/58102 * typeck2.c (store_init_value): Set it. * cp-tree.h (CONSTRUCTOR_MUTABLE_POISON): New. * constexpr.c (cxx_eval_outermost_constant_expr): Check it. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 5b25654..2f0708b 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -3315,15 +3315,15 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant, verify_constant (r, allow_non_constant, non_constant_p, overflow_p); - if (TREE_CODE (t) != CONSTRUCTOR - && cp_has_mutable_p (TREE_TYPE (t))) + /* Mutable logic is a bit tricky: we want to allow initialization of + constexpr variables with mutable members, but we can't copy those + members to another constexpr variable. */ + if (TREE_CODE (r) == CONSTRUCTOR + && CONSTRUCTOR_MUTABLE_POISON (r)) { - /* We allow a mutable type if the original expression was a - CONSTRUCTOR so that we can do aggregate initialization of - constexpr variables. */ if (!allow_non_constant) - error ("%qT cannot be the type of a complete constant expression " - "because it has mutable sub-objects", type); + error ("%qE is not a constant expression because it refers to " + "mutable subobjects of %qT", t, type); non_constant_p = true; } diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index d3722d7..3542344 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -109,6 +109,7 @@ c-common.h, not after.
DECLTYPE_FOR_LAMBDA_PROXY (in DECLTYPE_TYPE) REF_PARENTHESIZED_P (in COMPONENT_REF, INDIRECT_REF) AGGR_INIT_ZERO_FIRST (in AGGR_INIT_EXPR) + CONSTRUCTOR_MUTABLE_POISON (in CONSTRUCTOR) 3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out). ICS_BAD_FLAG (in _CONV) FN_TRY_BLOCK_P (in TRY_BLOCK) @@ -3497,6 +3498,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter) #define CONSTRUCTOR_NO_IMPLICIT_ZERO(NODE) \ (TREE_LANG_FLAG_1 (CONSTRUCTOR_CHECK (NODE))) +/* True if this CONSTRUCTOR should not be used as a variable initializer + because it was loaded from a constexpr variable with mutable fields. */ +#define CONSTRUCTOR_MUTABLE_POISON(NODE) \ + (TREE_LANG_FLAG_2 (CONSTRUCTOR_CHECK (NODE))) + #define DIRECT_LIST_INIT_P(NODE) \ (BRACE_ENCLOSED_INITIALIZER_P (NODE) && CONSTRUCTOR_IS_DIRECT_INIT (NODE)) diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index 01a0671..5748650 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -809,6 +809,10 @@ store_init_value (tree decl, tree init, vec<tree, va_gc> **cleanups, int flags) value = cxx_constant_value (value, decl); } value = maybe_constant_init (value, decl); + if (TREE_CODE (value) == CONSTRUCTOR && cp_has_mutable_p (type)) + /* Poison this CONSTRUCTOR so it can't be copied to another + constexpr variable. */ + CONSTRUCTOR_MUTABLE_POISON (value) = true; const_init = (reduced_constant_expression_p (value) || error_operand_p (value)); DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C new file mode 100644 index 000..c449c3a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C @@ -0,0 +1,10 @@ +// PR c++/58102 +// { dg-do compile { target c++11 } } + +struct S { + mutable int n; + constexpr S() : n() {} +}; + +constexpr S s = {}; +constexpr S s2 = s; // { dg-error "mutable" }
[PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
Hi, The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when using the -mfix-r10000 option. This is due to the fact that the delay slot of the branch instruction that checks whether the atomic operation was unsuccessful can be filled with an operation that returns the output result. In the branch-likely case this operation cannot go in the delay slot, because it won't execute when the atomic operation was successful, and therefore the wrong result would be returned. This patch forces a nop to be placed in the delay slot if a branch likely is used. The fix is as simple as possible; it does cause a second nop to be introduced after the branch instruction in the branch-likely case. As this option is rarely used, it makes more sense to keep the code readable than to try to optimise it. The atomic tests now pass successfully. The ChangeLog and patch are below. Ok to commit? Many thanks, Andrew gcc/ * config/mips/mips.c (mips_process_sync_loop): Place a nop in the delay slot of the branch likely instruction. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 02268f3..6dd3728 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands) This will sometimes be a delayed branch; see the write code below for details. */ mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL); - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); + + /* We cannot put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot + of the branch if it is a branch likely because they will not execute when + the atomic operation is successful, so place a NOP in there instead. */ + + mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL); /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot]. */ if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)
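For readers without the surrounding mips.c context, the emitted sequence looks roughly like this; a hand-written sketch, not actual compiler output, with illustrative register names:

```
1:  ll      $1, 0($4)      # load linked from the memory operand
    ...                    # compute the new value into $1
    sc      $1, 0($4)      # store conditional; $1 := 1 on success
    beql    $1, $0, 1b     # branch-likely: retry when SC failed ($1 == 0)
    nop                    # delay slot: a branch-likely annuls the slot
                           # when the branch is NOT taken, i.e. exactly
                           # when the atomic operation succeeds -- so the
                           # result-producing "NEWVAL = $TMP3" move must
                           # not live here; a nop is forced instead
```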
Re: [PATCH 5/5] combine: preferably delete dead SETs in PARALLELs
On Tue, Nov 18, 2014 at 06:38:49AM -0600, Segher Boessenkool wrote: Does it help pr52714, where we'd like to rip apart a PARALLEL with two sets, one of which is dead? If so, it might allow us to optimize that code better. It does not seem to fix the testcase. I'll investigate why not. You're talking about the (parallel [(set (pc) (pc)) (set (X) (sp))]) right? I guess the set pc pc is not marked as unused... The very first thing that is checked for is !(added_sets_2 && i1 == 0) which matches in this case. I wonder what that condition is supposed to accomplish -- optimisation I suppose? Hacking that out, it still won't match because the set pc pc is not considered dead. It's a no-op, not the same thing. I'll try to widen the condition... Segher
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 8:33 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote: I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes to the end of enum and ignore them by AIX? Limiting number of codes in enum due to restrictions of specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow to avoid such kind of problems :( I'll try to make a patch reducing amound of builtin codes to a required minimum in case it appears to be the best option we have. It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second if you could look at now duplicating every builtin in the enumeration since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn. jeff I see. I can reduce the number of _CHKP builtin codes. Experimental patch shows resulting END_BUILTINS = 1242. But we should expect similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such build is tested). Current IX86_BUILTIN_MAX is 2467 which is greater than current END_BUILTINS. I think it's better to fix dbxout.c to do something sensible with too long lines for enum types. 
It should be easy to cook up a testcase that is not GCC. enum { entry1, entry2, entry3 }; As I understand the main problem is that fixing dbxout in trunk won't help to build stage1 compiler. Simply build stage1 without debug info on AIX then. Btw, you have to start fixing the bug at some point ... (we can backport to 4.8 and 4.9). Of course dbxout.c is in kind of deep-freeze unmaintained state. Seems with no continuation lines support there is no easy way to limit stabstring length. But it should be easy to fix this particular problem (too long stabstring due to many enum values) by discarding the rest of values when some string length threshold is hit. Possible patch: diff --git a/gcc/dbxout.c b/gcc/dbxout.c index aa15a39..9b0b82d 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -168,6 +168,10 @@ along with GCC; see the file COPYING3. If not see #define DBX_CONTIN_CHAR '\\' #endif +#ifndef DBX_MAX_LENGTH +#define DBX_MAX_LENGTH 0 +#endif + enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED}; /* Structure recording information about a C data type. @@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full) stabstr_C (','); if (TREE_CHAIN (tem) != 0) CONTIN; + + /* Prevent too long strings rejected by linker. */ + if (DBX_CONTIN_LENGTH == 0 + && DBX_MAX_LENGTH != 0 + && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH) + break; } stabstr_C (';'); With this we should be able to set limit in target config files. Tried on linux with a small test with limit set to 20: cat test.c enum e1 { val1, val2, val3 }; gcc test.c -S -gstabs; grep e1 test.s .stabs "e1:T(0,21)=eval1:0,val2:1,;",128,0,0,0 The last enum value was discarded. I don't know which target should set this limit and what limit values should be used. Hi, Ilya Thanks for the preliminary patch to dbxout.c. I was planning to work on something similar today. Continuations would have been a nice solution, but, as I wrote earlier, AIX assembler does not honor stabstring continuations. It's one line, period.
As much as can fit on one line. I believe that AIX allows up to 64K. My work-around was not meant as a solution. I was trying to allow GCC to build on AIX while we work on a solution and trying to help everyone understand the source of the problem. We definitely need a fix for dbxout in GCC and backport it to GCC 4.8 and GCC 4.9 branches. I still think that it is rather unfortunate that the infrastructure for CHKP introduces so much bloat into GCC on every target and every architecture even when it is not enabled and cannot function. Thanks, David
Re: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
On Tue, 18 Nov 2014, Andrew Bennett wrote: The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when using the -mfix-r10000 option. This is due to the fact that the delay slot of the branch instruction that checks whether the atomic operation was unsuccessful can be filled with an operation that returns the output result. In the branch-likely case this operation cannot go in the delay slot because it won't execute when the atomic operation was successful and therefore the wrong result would be returned. This patch forces a nop to be placed in the delay slot if a branch likely is used. The fix is as simple as possible; it does cause a second nop to be introduced after the branch instruction in the branch-likely case. As this option is rarely used, it makes more sense to keep the code readable than to try to optimise it. Can you please be a bit more elaborate on what the wrong code sequence looks like, why it is produced like that and how your fix changes it? Without seeing examples of code generated it is very hard to imagine what is really going on here. Maciej
Re: C++ PATCH for c++/58102 (constexpr and mutable)
On 11/18/2014 02:35 PM, Jason Merrill wrote: We need to allow copy-list-initialization of constexpr variables with mutable members, too. The thing we need to avoid is not so much an full-expression with mutable-containing type as assuming that a mutable member of a variable hasn't changed since the variable was initialized. Thanks. You suggest a similar fix a while ago but I failed to pursue it, sorry. Paolo.
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 2:44 PM, David Edelsohn dje@gmail.com wrote: On Tue, Nov 18, 2014 at 8:33 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com: On 11/17/14 13:43, Ilya Enkovich wrote: I don't fully understand how this problem appears. Is it fully AIX specific and doesn't affect any other target? May we put all _CHKP codes to the end of enum and ignore them by AIX? Limiting number of codes in enum due to restrictions of specific debug info format seems weird. If something cannot be encoded for debug info then it should be ignored. I really hoped that creation of functions by demand would allow to avoid such kind of problems :( I'll try to make a patch reducing amound of builtin codes to a required minimum in case it appears to be the best option we have. It's a problem with the AIX native tools. Presumably they don't like really long lines -- it was a relatively common issue in the past with vendor tools. I think we should proceed on two fronts. First if David could investigate the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to resolve the issue. Second if you could look at now duplicating every builtin in the enumeration since it's a combination of the number of enums and the length of the debug strings to represent them that's causing AIX heartburn. jeff I see. I can reduce the number of _CHKP builtin codes. Experimental patch shows resulting END_BUILTINS = 1242. But we should expect similar problem for i386 target builds hosted on AIX (here http://toolchain.lug-owl.de/buildbot/ I see such build is tested). Current IX86_BUILTIN_MAX is 2467 which is greater than current END_BUILTINS. 
I think it's better to fix dbxout.c to do something sensible with too long lines for enum types. It should be easy to cook up a testcase that is not GCC. enum { entry1, entry2, entry3 }; As I understand the main problem is that fixing dbxout in trunk won't help to build stage1 compiler. Simply build stage1 without debug info on AIX then. Btw, you have to start fixing the bug at some point ... (we can backport to 4.8 and 4.9). Of course dbxout.c is in kind of deep-freeze unmaintained state. Seems with no continuation lines support there is no easy way to limit stabstring length. But it should be easy to fix this particular problem (too long stabstring due to many enum values) by discarding the rest of values when some string length threshold is hit. Possible patch: diff --git a/gcc/dbxout.c b/gcc/dbxout.c index aa15a39..9b0b82d 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -168,6 +168,10 @@ along with GCC; see the file COPYING3. If not see #define DBX_CONTIN_CHAR '\\' #endif +#ifndef DBX_MAX_LENGTH +#define DBX_MAX_LENGTH 0 +#endif + enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED}; /* Structure recording information about a C data type. @@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full) stabstr_C (','); if (TREE_CHAIN (tem) != 0) CONTIN; + + /* Prevent too long strings rejected by linker. */ + if (DBX_CONTIN_LENGTH == 0 + && DBX_MAX_LENGTH != 0 + && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH) + break; } stabstr_C (';'); With this we should be able to set limit in target config files. Tried on linux with a small test with limit set to 20: cat test.c enum e1 { val1, val2, val3 }; gcc test.c -S -gstabs; grep e1 test.s .stabs "e1:T(0,21)=eval1:0,val2:1,;",128,0,0,0 The last enum value was discarded. I don't know which target should set this limit and what limit values should be used. Hi, Ilya Thanks for the preliminary patch to dbxout.c. I was planning to work on something similar today.
Continuations would have been a nice solution, but, as I wrote earlier, AIX assembler does not honor stabstring continuations. Can we emit this particular STABS piece in raw form then? Thus does it work to have sth like .stabs ... .string ... .stabs ... or whatever way to emit assembled data there? It's one line, period. As much as can fit on one line. I believe that AIX allows up to 64K. Can we get the assembler fixed? (sorry to ask this stupid question again ...) My work-around was not meant as a solution. I was trying to allow GCC to build on AIX while we work on a solution and trying to help everyone understand the source of the problem. We definitely need a fix for dbxout in GCC and backport it to GCC 4.8 and GCC 4.9 branches. I still think that it is rather unfortunate that the
Re: [PATCH] Check 'fd' neither -1 nor 0, before close it
On 11/18/14 0:38, Joseph Myers wrote:
> On Sat, 15 Nov 2014, Chen Gang wrote:
> > Also in c_common_read_pch(), when failure occurs, also need be sure the 'fd' is not '-1' for the next close operation.
>
> Please clarify how c_common_read_pch gets called with fd == -1. Certainly checking after an error is too late; we shouldn't call fdopen in that case at all, and I think something further up the call chain should have avoided calling c_common_read_pch with fd == -1 at all (the question is just exactly what point on the call chain is missing such a check and needs it added).

According to the current source code, the author intends that 'fd' should never be '-1' in c_common_read_pch(). Running "grep -rn read_pch *" in the root directory, then "grep -rn _cpp_stack_file *", shows it is used in 3 areas:

- c_common_pch_pragma() in gcc/c-family/c-pch.c; it has already checked 'fd' before calling c_common_read_pch().
- _cpp_stack_include() in libcpp/files.c; before _cpp_stack_file(), it has called and checked _cpp_find_file().
- cpp_read_main_file() in libcpp/init.c; before _cpp_stack_file(), it has called and checked _cpp_find_file().

In c_common_read_pch(), even if 'fd' is '-1', we should check it first, before calling fdopen(), instead of checking for '-1' in the failure code block.

But for me, it contains another 2 related issues:

- _cpp_find_file() always returns a valid pointer, so the related check "file == NULL" is of no use for _cpp_find_file() in _cpp_stack_include().
- According to its comments, _cpp_find_file() cannot guarantee that 'file->fd' is not '-1'; even when file->err_no == 0, we need to reopen it if necessary.

Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
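The defensive pattern under discussion — validating the descriptor before it ever reaches fdopen, rather than testing for -1 only in the error path — can be sketched as follows. This is a hypothetical helper for illustration, not GCC's actual c_common_read_pch code:

```c
#include <stdio.h>

/* Hypothetical sketch of the pattern discussed above: reject an
   invalid descriptor up front, before fdopen is ever called, instead
   of testing for -1 only after a failure has occurred.  */
static FILE *
open_pch_stream (int fd, const char *orig_name)
{
  if (fd < 0)
    {
      /* The caller should have caught this already; fail cleanly anyway.  */
      fprintf (stderr, "%s: invalid file descriptor\n", orig_name);
      return NULL;
    }
  return fdopen (fd, "rb");
}
```

The point Joseph makes still stands: the real fix belongs further up the call chain, so that a helper like this never sees -1 in the first place.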
Re: RFA (tree-inline): PATCH for more C++14 constexpr support
Jason Merrill ja...@redhat.com writes:

> commit e52e82e56507d1de1932abcafd80683c4dc00d1e
> Author: Jason Merrill ja...@redhat.com
> Date:   Sun Nov 16 17:14:12 2014 -0500
>
>     * constexpr.c (use_new_call): Always use new call handling.
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index 57d0c46..8881271 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -1021,8 +1021,8 @@ adjust_temp_type (tree type, tree temp)
>  }
>
>  /* True if we want to use the new handling of constexpr calls based on
> -   DECL_SAVED_TREE.  Currently only active for C++14 mode.  */
> -#define use_new_call (cxx_dialect >= cxx14)
> +   DECL_SAVED_TREE.  */
> +#define use_new_call true
>
>  /* Subroutine of cxx_eval_call_expression.
>     We are processing a call expression (either CALL_EXPR or

FAIL: 18_support/numeric_limits/requirements/constexpr_functions.cc (test for excess errors)

$ gcc/xg++ -shared-libgcc -Bgcc -nostdinc++ -Lm68k-linux/libstdc++-v3/src -Lm68k-linux/libstdc++-v3/src/.libs -Lm68k-linux/libstdc++-v3/libsupc++/.libs -B/daten/cross/m68k-linux/m68k-linux/bin/ -B/daten/cross/m68k-linux/m68k-linux/lib/ -isystem /daten/cross/m68k-linux/m68k-linux/include -isystem /daten/cross/m68k-linux/m68k-linux/sys-include -Bm68k-linux/./libstdc++-v3/src/.libs -D_GLIBCXX_ASSERT -fmessage-length=0 -ffunction-sections -fdata-sections -O2 -g -D_GNU_SOURCE -DLOCALEDIR=.
-nostdinc++ -Im68k-linux/libstdc++-v3/include/m68k-linux -Im68k-linux/libstdc++-v3/include -I../libstdc++-v3/libsupc++ -I../libstdc++-v3/include/backward -I../libstdc++-v3/testsuite/util ../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc -std=gnu++14 -S -o constexpr_functions.s In file included from ../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc:21:0: ../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc: In instantiation of ‘void __gnu_test::constexpr_member_functions::operator()()::_Concept::__constraint() [with _Ttesttype = std::numeric_limitschar; _Tbasetype = char]’: ../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc:55:2: required from ‘void __gnu_test::constexpr_member_functions::operator()() [with _Ttesttype = std::numeric_limitschar; _Tbasetype = char]’ m68k-linux/libstdc++-v3/include/ext/typelist.h:197:2: required from ‘void __gnu_cxx::typelist::detail::apply_generator2_Gn, __gnu_cxx::typelist::chainHd, Tl, __gnu_cxx::typelist::chainHd2, TlV ::operator()(Gn) [with Gn = __gnu_test::constexpr_member_functions; Hd1 = std::numeric_limitschar; TlT = __gnu_cxx::typelist::chainstd::numeric_limitssigned char, __gnu_cxx::typelist::chainstd::numeric_limitsunsigned char, __gnu_cxx::typelist::chainstd::numeric_limitsshort int, __gnu_cxx::typelist::chainstd::numeric_limitsshort unsigned int, __gnu_cxx::typelist::chainstd::numeric_limitsint, __gnu_cxx::typelist::chainstd::numeric_limitsunsigned int, __gnu_cxx::typelist::chainstd::numeric_limitslong int, __gnu_cxx::typelist::chainstd::numeric_limitslong unsigned int, __gnu_cxx::typelist::chainstd::numeric_limitslong long int, __gnu_cxx::typelist::chainstd::numeric_limitslong long unsigned int, __gnu_cxx::typelist::chainstd::numeric_limitswchar_t, __gnu_cxx::typelist::chainstd::numeric_limitschar16_t, __gnu_cxx::typelist::chainstd::numeric_limitschar32_t, 
__gnu_cxx::typelist::null_type; Hd2 = char; TlV = __gnu_cxx::typelist::chainsigned char, __gnu_cxx::typelist::chainunsigned char, __gnu_cxx::typelist::chainshort int, __gnu_cxx::typelist::chainshort unsigned int, __gnu_cxx::typelist::chainint, __gnu_cxx::typelist::chainunsigned int, __gnu_cxx::typelist::chainlong int, __gnu_cxx::typelist::chainlong unsigned int, __gnu_cxx::typelist::chainlong long int, __gnu_cxx::typelist::chainlong long unsigned int, __gnu_cxx::typelist::chainwchar_t, __gnu_cxx::typelist::chainchar16_t, __gnu_cxx::typelist::chainchar32_t, __gnu_cxx::typelist::null_type ]’ m68k-linux/libstdc++-v3/include/ext/typelist.h:199:6: required from ‘void __gnu_cxx::typelist::detail::apply_generator2_Gn, __gnu_cxx::typelist::chainHd, Tl, __gnu_cxx::typelist::chainHd2, TlV ::operator()(Gn) [with Gn = __gnu_test::constexpr_member_functions; Hd1 = std::numeric_limitsbool; TlT = __gnu_cxx::typelist::chainstd::numeric_limitschar, __gnu_cxx::typelist::chainstd::numeric_limitssigned char, __gnu_cxx::typelist::chainstd::numeric_limitsunsigned char, __gnu_cxx::typelist::chainstd::numeric_limitsshort int, __gnu_cxx::typelist::chainstd::numeric_limitsshort unsigned int, __gnu_cxx::typelist::chainstd::numeric_limitsint, __gnu_cxx::typelist::chainstd::numeric_limitsunsigned int, __gnu_cxx::typelist::chainstd::numeric_limitslong int, __gnu_cxx::typelist::chainstd::numeric_limitslong unsigned int, __gnu_cxx::typelist::chainstd::numeric_limitslong long int, __gnu_cxx::typelist::chainstd::numeric_limitslong long unsigned int,
Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls
On Tue, Nov 18, 2014 at 9:07 AM, Richard Biener richard.guent...@gmail.com wrote:
> Can we emit this particular STABS piece in raw form then? Thus does it work to have sth like .stabs ... .string ... .stabs ... or whatever way to emit assembled data there?

The stabstring debug information is saved as a length and a string. For 32-bit AIX, the length is an unsigned short. Longer entries apparently can be stored in multiple symbol table entries, but it's not clear how to emit assembly to produce that effect.

For 64-bit AIX, the length is an unsigned 32-bit int, so one may be able to work around this problem by compiling GCC on AIX in 64-bit mode. A conversion to a default 64-bit build will take a lot of work.

Thanks, David
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
-----Original Message-----
From: Maciej W. Rozycki [mailto:ma...@codesourcery.com]
Sent: 18 November 2014 13:48
To: Andrew Bennett
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

> On Tue, 18 Nov 2014, Andrew Bennett wrote:
> > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when using the -mfix-r10000 option. This is due to the fact that the delay slot of the branch instruction that checks whether the atomic operation was unsuccessful can be filled with an operation that returns the output result. For the branch-likely case this operation cannot go in the delay slot because it won't execute when the atomic operation is successful, and therefore the wrong result will be returned. This patch forces a nop to be placed in the delay slot if a branch likely is used. The fix is as simple as possible; it does cause a second nop to be introduced after the branch instruction in the branch-likely case. As this option is rarely used, it makes more sense to keep the code readable than to try to optimise it.
>
> Can you please be a bit more elaborate on how the wrong code sequence looks like, why it is produced like that and how your fix changes it? Without seeing examples of code generated it is very hard to imagine what is really going on here.

Ok, if we compile the following cut-down atomic-op-3.c test case with -mfix-r10000:

extern void abort(void);
int v, count, res;

int
main ()
{
  v = 0;
  count = 1;
  if (__atomic_add_fetch (&v, count, __ATOMIC_RELAXED) != 1)
    abort ();
  return 0;
}

the following assembly code is produced for the atomic operation:

	.set	noat
	sync
 # 15	sync_new_addsi/2	[length = 24]
1:
	ll	$3,0($4)
	addu	$1,$3,$2
	sc	$1,0($4)
	beql	$1,$0,1b
	addu	$3,$3,$2
	sync
	.set	at

Notice here that the addu is in the delay slot of the branch-likely instruction. This is computing the value that exists in the memory location after the atomic operation has completed. Because it is a branch-likely instruction, the delay slot will only execute when the atomic operation fails, and hence when it is successful the correct value will not be returned.

The offending code is in mips_process_sync_loop:

  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);

  /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
  if (insn1 != SYNC_INSN1_MOVE
      && insn1 != SYNC_INSN1_LI
      && tmp3 != newval)
    {
      mips_multi_copy_insn (tmp3_insn);
      mips_multi_set_operand (mips_multi_last_index (), 0, newval);
    }
  else if (!(required_oldval && cmp))
    mips_multi_add_insn ("nop", NULL);

  /* CMP = 1 -- either standalone or in a delay slot.  */
  if (required_oldval && cmp)
    mips_multi_add_insn ("li\t%0,1", cmp, NULL);

For the branch-likely case the delay slot could be filled either with the operation to compute the value that exists in memory after the atomic operation has completed; with an operation to return whether the atomic operation was successful or not; or with a nop. The first two operations must not be in the delay slot of the branch if it is a branch likely, because they will only run if the atomic operation was unsuccessful. My fix places a nop in the delay slot of the branch-likely instruction by using the %~ output operation. This then causes the sync code for the previous example to be correct:

	.set	noat
	sync
 # 15	sync_new_addsi/2	[length = 24]
1:
	ll	$3,0($4)
	addu	$1,$3,$2
	sc	$1,0($4)
	beql	$1,$0,1b
	nop
	addu	$3,$3,$2
	sync
	.set	at

Regards,
Andrew
small C++ PATCH to instantiate_template_1
While looking at 55992 I noticed that we weren't pushing into dependent scope properly; we need to pass the entering scope flag to tsubst_aggr_type so that we get the version of the type with the members. This isn't enough to fix 55992, but may well fix other bugs. Tested x86_64-pc-linux-gnu, applying to trunk. commit c12c225ae102f77c6b4b1f98a69adbee474cb3a3 Author: Jason Merrill ja...@redhat.com Date: Mon Nov 17 16:26:45 2014 -0500 * pt.c (instantiate_template_1): Use tsubst_aggr_type for context. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index c0f3b8c..6661325 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -15856,8 +15856,8 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain) ++processing_template_decl; if (DECL_CLASS_SCOPE_P (gen_tmpl)) { - tree ctx = tsubst (DECL_CONTEXT (gen_tmpl), targ_ptr, - complain, gen_tmpl); + tree ctx = tsubst_aggr_type (DECL_CONTEXT (gen_tmpl), targ_ptr, + complain, gen_tmpl, true); push_nested_class (ctx); } /* Substitute template parameters to obtain the specialization. */
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
> The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when using the -mfix-r10000 option. This is due to the fact that the delay slot of the branch instruction that checks whether the atomic operation was unsuccessful can be filled with an operation that returns the output result. For the branch-likely case this operation cannot go in the delay slot because it won't execute when the atomic operation is successful, and therefore the wrong result will be returned. This patch forces a nop to be placed in the delay slot if a branch likely is used. The fix is as simple as possible; it does cause a second nop to be introduced after the branch instruction in the branch-likely case. As this option is rarely used, it makes more sense to keep the code readable than to try to optimise it. The atomic tests now pass successfully. The ChangeLog and patch are below. Ok to commit?

I'm OK with just making the fix-r10000 case safe rather than also complicating the normal code path to remove the normal delay slot NOP in this special case. I'd like to see what Catherine thinks though. To remove the second NOP I believe we would have to add a !TARGET_FIX_R10000 to the condition of the normal delay slot NOP. This seems a bit convoluted when the real reason is that branch likely is in use. To make use of the mips_branch_likely flag it would have to be set earlier in mips_output_sync_loop and also get set in mips_sync_loop_insns. If Catherine thinks getting rid of the second NOP is important enough then please base it on mips_branch_likely and fix the callers.

> gcc/
> 	* config/mips/mips.c (mips_process_sync_loop): Place a nop in the
> 	delay slot of the branch likely instruction.
>
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 02268f3..6dd3728 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
>       This will sometimes be a delayed branch; see the write code below for
>       details.  */
>    mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL);
> -  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
> +
> +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot
> +     of the branch if it is a branch likely because they will not execute when
> +     the atomic operation is successful, so place a NOP in there instead.  */

I found the comment hard to digest; perhaps:

  /* When using branch likely (-mfix-r10000), the delay slot instruction
     will be annulled on false.  The normal delay slot instructions
     calculate the overall result of the atomic operation and must not
     be annulled.  Unconditionally use a NOP instead for the branch
     likely case.  */

> +  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
>    /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
>    if (insn1 != SYNC_INSN1_MOVE
>        && insn1 != SYNC_INSN1_LI
>        && tmp3 != newval)

Matthew
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
On Tue, 18 Nov 2014, Andrew Bennett wrote:

> My fix places a nop in the delay slot of the branch likely instruction by using the %~ output operation. This then causes the sync code for the previous example to be correct:
>
> 	.set	noat
> 	sync
>  # 15	sync_new_addsi/2	[length = 24]
> 1:
> 	ll	$3,0($4)
> 	addu	$1,$3,$2
> 	sc	$1,0($4)
> 	beql	$1,$0,1b
> 	nop
> 	addu	$3,$3,$2
> 	sync
> 	.set	at

OK, this does look to me like the correct way to address the issue, but where is the second NOP that you previously mentioned? I fail to see it here, and this code can't be made any better; there isn't anything you could possibly schedule into the delay slot, as there is nothing else to do in this loop.

  Maciej
Re: [PATCH][wwwdocs] Update 5.0 changes.html with Thumb1 UAL
On 18/11/14 02:48, Terry Guo wrote: + ul + li The Thumb-1 assembly code are now generated in unified syntax. The new option +code-masm-syntax-unified/code can be used to specify whether inline assembly +code are using unified syntax. By default the option is off which means +non-unified syntax is used. However this is subject to change in future releases. +Eventually the non-unified syntax will be deprecated. + /li + /ul Hi Terry, Sorry for the late comment, I see this has already been committed. I think it should be assembly code is now generated. Also whether inline assembly code is using unified syntax. Kyrill
Re: [PATCH, MPX wrappers 2/3] Replace some function calls with wrapper calls during instrumentation
On 15 Nov 00:16, Jeff Law wrote: On 11/14/14 10:29, Ilya Enkovich wrote: Hi, This patch adds wrapper calls. It's simply achieved by using proper name for builtin clones. Patches apply over builtins instrumentation series. Thanks, Ilya -- 2014-11-14 Ilya Enkovich ilya.enkov...@intel.com * c-family/c.opt (fchkp-use-wrappers): New. * ipa-chkp.c (CHKP_WRAPPER_SYMBOL_PREFIX): New. (chkp_wrap_function): New. (chkp_build_instrumented_fndecl): Support wrapped functions. invoke.texi documentation please. OK with the invoke.texi doc work. Patch #3 is OK. Jeff invoke.texi misses all chkp options description. I fixed it. Here is a new patch. Thanks, Ilya -- 2014-11-18 Ilya Enkovich ilya.enkov...@intel.com * c-family/c.opt (fchkp-use-wrappers): New. * ipa-chkp.c (CHKP_WRAPPER_SYMBOL_PREFIX): New. (chkp_wrap_function): New. (chkp_build_instrumented_fndecl): Support wrapped functions. * doc/invoke.texi (-fcheck-pointer-bounds): New. (-fchkp-check-incomplete-type): New. (-fchkp-first-field-has-own-bounds): New. (-fchkp-narrow-bounds): New. (-fchkp-narrow-to-innermost-array): New. (-fchkp-optimize): New. (-fchkp-use-fast-string-functions): New. (-fchkp-use-nochk-string-functions): New. (-fchkp-use-static-bounds): New. (-fchkp-use-static-const-bounds): New. (-fchkp-treat-zero-dynamic-size-as-infinite): New. (-fchkp-check-read): New. (-fchkp-check-write): New. (-fchkp-store-bounds): New. (-fchkp-instrument-calls): New. (-fchkp-instrument-marked-only): New. (-fchkp-use-wrappers): New. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 85dcb98..a9f8f99 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1040,6 +1040,10 @@ fchkp-instrument-marked-only C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0) Instrument only functions marked with bnd_instrument attribute. +fchkp-use-wrappers +C ObjC C++ ObjC++ LTO Report Var(flag_chkp_use_wrappers) Init(1) +Transform instrumented builtin calls into calls to wrappers. 
+ fcilkplus C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0) Enable Cilk Plus diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..09a0683 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -299,6 +299,15 @@ Objective-C and Objective-C++ Dialects}. @gccoptlist{-d@var{letters} -dumpspecs -dumpmachine -dumpversion @gol -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol -fasan-shadow-offset=@var{number} -fsanitize-undefined-trap-on-error @gol +-fcheck-pointer-bounds -fchkp-check-incomplete-type @gol +-fchkp-first-field-has-own-bounds -fchkp-narrow-bounds @gol +-fchkp-narrow-to-innermost-array -fchkp-optimize @gol +-fchkp-use-fast-string-functions -fchkp-use-nochk-string-functions @gol +-fchkp-use-static-bounds -fchkp-use-static-const-bounds @gol +-fchkp-treat-zero-dynamic-size-as-infinite -fchkp-check-read @gol +-fchkp-check-read -fchkp-check-write -fchkp-store-bounds @gol +-fchkp-instrument-calls -fchkp-instrument-marked-only @gol +-fchkp-use-wrappers @gol -fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol -fdisable-ipa-@var{pass_name} @gol -fdisable-rtl-@var{pass_name} @gol @@ -5693,6 +5702,117 @@ a @code{libubsan} library routine. The advantage of this is that the @code{libubsan} library is not needed and will not be linked in, so this is usable even for use in freestanding environments. +@item -fcheck-pointer-bounds +@opindex fcheck-pointer-bounds +@opindex fno-check-pointer-bounds +Enable Pointer Bounds Checker instrumentation. Each memory reference +is instrumented with checks of pointer used for memory access against +bounds associated with that pointer. Generated instrumentation may +be controlled by various @option{-fchkp-*} options. + +@item -fchkp-check-incomplete-type +@opindex fchkp-check-incomplete-type +@opindex fno-chkp-check-incomplete-type +Generate pointer bounds checks for variables with incomplete type. 
+Enabled by default + +@item -fchkp-narrow-bounds +@opindex fchkp-narrow-bounds +@opindex fno-chkp-narrow-bounds +Controls bounds used by Pointer Bounds Checker for pointers to object +fields. If narrowing is enabled then field bounds are used. Otherwise +object bounds are used. See also @option{-fchkp-narrow-to-innermost-array} +and @option{-fchkp-first-field-has-own-bounds}. Enabled by default. + +@item -fchkp-first-field-has-own-bounds +@opindex fchkp-first-field-has-own-bounds +@opindex fno-chkp-first-field-has-own-bounds +Forces Pointer Bounds Checker to use narrowed bounds for address of the +first field in the structure. By default pointer to the first field has +the same bounds as pointer to the whole structure. + +@item -fchkp-narrow-to-innermost-array +@opindex fchkp-narrow-to-innermost-array +@opindex
RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop
> OK, this does look to me like the correct way to address the issue, but where is the second NOP that you previously mentioned? I fail to see it here and this code can't be made any better, there isn't anything you could possibly schedule into the delay slot as there is nothing else to do in this loop.

The following testcase shows this occurring:

short v, count, ret;

int
main ()
{
  v = 0;
  count = 0;
  __atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED);
  return 0;
}

It produces (for the atomic operation):

	.set	noat
	sync
1:
	ll	$3,0($5)
	and	$1,$3,$4
	bne	$1,$7,2f
	and	$1,$3,$6
	or	$1,$1,$8
	sc	$1,0($5)
	beql	$1,$0,1b
	nop
	nop
	sync
2:
	.set	at

Regards,
Andrew
Re: [ping] account for register spans in expand_builtin_init_dwarf_reg_sizes
On Nov 18, 2014, at 03:29 , Jason Merrill ja...@redhat.com wrote: What happens when the outer loop hits a register that we've already seen as part of a span? Hmm, I was under the impression that this was supposed never to happen. I can see that this is not so clear though. What would happen with the current code is the production of extraneous instructions initializing the same slot, presumably with the same values. I have a candidate improvement to prevent processing the same regno multiple times. Will followup as soon as I have some test results. Thanks for your feedback, Olivier
Re: RFA (tree-inline): PATCH for more C++14 constexpr support
On 17/11/14 18:15, Jason Merrill wrote: On 11/17/2014 05:29 AM, Richard Biener wrote: can you rename it to copy_fn please? It really copies parameter and result and then the body. Ok with that change. Done. Here's what I'm checking in, along with a second patch to enable the new code for C++11 as well: Hi Jason, These broke building libstdc++ on arm-none-eabi. I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63936 for it. Cheers, Kyrill
[PATCH] Fix PR63868 - Guard offloading with ifdef ENABLE_OFFLOADING
Hi,

This patch disables the generation of sections with offload IR and offload tables by default, when the compiler is configured without --enable-offload-targets= . Bootstrap and regtesting in progress; OK after it finishes?

  -- Ilya

PR regression/63868
gcc/
	* cgraph.c (cgraph_node::create): Guard g->have_offload with ifdef
	ENABLE_OFFLOADING.
	* omp-low.c (create_omp_child_function): Likewise.
	(expand_omp_target): Guard node and offload_funcs with ifdef
	ENABLE_OFFLOADING.
	* varpool.c (varpool_node::get_create): Guard g->have_offload and
	offload_vars with ifdef ENABLE_OFFLOADING.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index cc04744..18ae6a8 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -500,7 +500,9 @@ cgraph_node::create (tree decl)
       lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
     {
       node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
       g->have_offload = true;
+#endif
     }

   node->register_symbol ();
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 915d55f..44ce479 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1961,7 +1961,9 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	if (is_targetreg_ctx (octx))
 	  {
 	    cgraph_node::get_create (decl)->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
 	    g->have_offload = true;
+#endif
 	    break;
 	  }
     }
@@ -8287,7 +8289,9 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+#ifdef ENABLE_OFFLOADING
       struct cgraph_node *node;
+#endif

       /* If the target region needs data sent from the parent
	  function, then the very first statement (except possible
@@ -8414,18 +8418,22 @@ expand_omp_target (struct omp_region *region)
       DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
       cgraph_node::add_new_function (child_fn, true);

+#ifdef ENABLE_OFFLOADING
       /* Add the new function to the offload table.  */
       vec_safe_push (offload_funcs, child_fn);
+#endif

       /* Fix the callgraph edges for child_cfun.  Those for cfun will be
	  fixed in a following pass.  */
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();

+#ifdef ENABLE_OFFLOADING
       /* Prevent IPA from removing child_fn as unreachable, since there are no
	  refs from the parent function to child_fn in offload LTO mode.  */
       node = cgraph_node::get (child_fn);
       node->mark_force_output ();
+#endif

       /* Some EH regions might become dead, see PR34608.  If
	  pass_cleanup_cfg isn't the first pass to happen with the
@@ -12502,7 +12510,7 @@ omp_finish_file (void)
       varpool_node::finalize_decl (vars_decl);
       varpool_node::finalize_decl (funcs_decl);
-    }
+}
   else
     {
       for (unsigned i = 0; i < num_funcs; i++)
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 80dd496..50f2e6e 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -171,9 +171,11 @@ varpool_node::get_create (tree decl)
       lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
     {
       node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
       g->have_offload = true;
       if (!in_lto_p)
	 vec_safe_push (offload_vars, decl);
+#endif
     }

   node->register_symbol ();
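The guard pattern the patch applies can be illustrated with a minimal self-contained sketch; here ENABLE_OFFLOADING_SKETCH is an illustrative stand-in for the real configure-time ENABLE_OFFLOADING macro, and the point is that in a default build the offload bookkeeping is compiled out entirely, so it can never run:

```c
#include <stdbool.h>

/* Sketch of the build-time guard used in the patch.  In a build where
   ENABLE_OFFLOADING_SKETCH is not defined, the guarded statement does
   not exist at all, so have_offload can never become true.  */
static bool have_offload = false;

static void
mark_offloadable (void)
{
#ifdef ENABLE_OFFLOADING_SKETCH
  have_offload = true;  /* only reachable in an offloading-enabled build */
#endif
}
```

This is why guarding with #ifdef, rather than a runtime flag, fixes the regression: the non-offloading compiler no longer even references offload_funcs/offload_vars.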
Re: [PATCH] Fix PR63868 - Guard offloading with ifdef ENABLE_OFFLOADING
On Tue, Nov 18, 2014 at 06:31:59PM +0300, Ilya Verbin wrote:
> @@ -8287,7 +8289,9 @@ expand_omp_target (struct omp_region *region)
>    if (kind == GF_OMP_TARGET_KIND_REGION)
>      {
>        unsigned srcidx, dstidx, num;
> +#ifdef ENABLE_OFFLOADING
>        struct cgraph_node *node;
> +#endif

Please instead move the struct cgraph_node *node; declaration right above where it is used for the first time. There is no goto involved there, and it isn't in a switch either, so you probably also can do just:

  struct cgraph_node *node = cgraph_node::get (child_fn);

instead. Ok with that change.

> @@ -8414,18 +8418,22 @@ expand_omp_target (struct omp_region *region)
>        DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
>        cgraph_node::add_new_function (child_fn, true);
>
> +#ifdef ENABLE_OFFLOADING
>        /* Add the new function to the offload table.  */
>        vec_safe_push (offload_funcs, child_fn);
> +#endif
>
>        /* Fix the callgraph edges for child_cfun.  Those for cfun will be
> 	   fixed in a following pass.  */
>        push_cfun (child_cfun);
>        cgraph_edge::rebuild_edges ();
>
> +#ifdef ENABLE_OFFLOADING
>        /* Prevent IPA from removing child_fn as unreachable, since there are no
> 	   refs from the parent function to child_fn in offload LTO mode.  */
>        node = cgraph_node::get (child_fn);
>        node->mark_force_output ();
> +#endif
>
>        /* Some EH regions might become dead, see PR34608.  If
> 	   pass_cleanup_cfg isn't the first pass to happen with the

	Jakub
Re: [PATCH 4/4] Data structure is used for inline_summary struct.
On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote:
> > > b) with GTY, we cannot call destructor
> > Everything in the symbol table is explicitly memory managed (i.e. never left to be freed by the garbage collector). It resides in GTY only to allow linking garbage-collected objects from them and to get PCH working.
> Well, if I understand the intent correctly, summaries are for stuff that is not in the symbol table. For example jump functions are a

Correct.

> vector of structures possibly containing trees, so everything has to be in garbage collected memory. When an edge is removed, it is necessary to be notified about it immediately, for example to decrement rdesc_refcount (you might argue that that should be done in a separate hook and not from within a summary class, but then you start to rely on hook invocation ordering, so I think it is better to eventually use the summaries for it too).
> > I do not see why ctors/dtors can not do the reference counting. In fact this is how refcounting is done usually anyway?
> Well, when there is no garbage collection involved then yes, that is how you normally do it, but in the GC case there is the question of what is the appropriate time to call the destructor on garbage-collected data (like jump functions)?

I still fail to see the problem here. Summaries are explicitly managed: they are constructed at summary construction time or when a new callgraph node is introduced/duplicated. They are destroyed when the callgraph node is destroyed or the whole summary is destroyed. It is the job of the summary datastructure to call the proper ctors/dtors, not the job of the garbage collector that provides the underlying memory management. If you have a datastructure that points to something that is not explicitly managed (i.e. a tree expression), you just cannot have a non-trivial destructor on that datastructure, because that is freed transparently by GTY, which doesn't do destruction...

Honza
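The point about ctor/dtor-based reference counting can be sketched in a few lines. The names here are illustrative (not GCC's actual jump-function or summary types); the idea is that an explicitly managed summary object can do its refcounting in its constructor and destructor regardless of where its memory comes from, as long as the summary machinery calls them at create/destroy time:

```cpp
// Hypothetical sketch of refcounting via ctor/dtor in an explicitly
// managed summary object.  "refdesc" stands in for something like the
// rdesc mentioned above; "edge_summary" for a per-edge summary entry.
struct refdesc
{
  int refcount = 0;
};

struct edge_summary
{
  refdesc *desc;
  explicit edge_summary (refdesc *d) : desc (d) { ++desc->refcount; }
  edge_summary (const edge_summary &o) : desc (o.desc) { ++desc->refcount; }
  ~edge_summary () { --desc->refcount; }  // runs when the summary entry is destroyed
};
```

This only works, as Honza notes, when destruction is explicit; if the object were reclaimed transparently by a collector, the destructor would never run and the count would leak.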
[PATCH V2] plugin event for C/C++ function definitions
Hi, this patch adds a new plugin event PLUGIN_START_PARSE_FUNCTION and PLUGIN_FINISH_PARSE_FUNCTION that are invoked at start_function and finish_function respectively in the C and C++ frontends. PLUGIN_START_PARSE_FUNCTION is called before parsing the function body. PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition. changelog: gcc/c/c-decl.c: Invoke callbacks in start_function and finish_function. gcc/cp/decl.c: Invoke callbacks in start_function and finish_function. gcc/doc/plugins.texi: Add documentation about PLUGIN_START_FUNCTION and PLUGIN_FINISH_FUNCTION gcc/plugin.def: Add events for start_function and finish_function. gcc/plugin.c (register_callback, invoke_plugin_callbacks): Same. gcc/testsuite/g++.dg/plugin/def_plugin.c: New test plugin. gcc/testsuite/g++.dg/plugin/def-plugin-test.C: Testcase for above plugin. gcc/testsuite/g++.dg/plugin/plugin.exp diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index e23284a..fea3334 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -8073,6 +8073,7 @@ start_function (struct c_declspecs *declspecs, struct c_declarator *declarator, decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL, attributes, NULL, NULL, DEPRECATED_NORMAL); + invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1); /* If the declarator is not suitable for a function definition, cause a syntax error. */ @@ -8886,6 +8887,7 @@ finish_function (void) It's still in DECL_STRUCT_FUNCTION, and we'll restore it in tree_rest_of_compilation. 
 */
   set_cfun (NULL);
+  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, current_function_decl);
   current_function_decl = NULL;
 }
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index d4adbeb..ce2f832 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13631,6 +13631,7 @@ start_function (cp_decl_specifier_seq *declspecs,
   tree decl1;

   decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, attrs);
+  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
   if (decl1 == error_mark_node)
     return false;
   /* If the declarator is not suitable for a function definition,
@@ -14260,6 +14261,7 @@ finish_function (int flags)
       vec_free (deferred_mark_used_calls);
     }

+  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl);
   return fndecl;
 }
diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
index 4a839b8..1c9e074 100644
--- a/gcc/doc/plugins.texi
+++ b/gcc/doc/plugins.texi
@@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined events:
 @smallexample
 enum plugin_event
 @{
+  PLUGIN_START_PARSE_FUNCTION,  /* Called before parsing the body of a function.  */
+  PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function.  */
   PLUGIN_PASS_MANAGER_SETUP,    /* To hook into pass manager.  */
   PLUGIN_FINISH_TYPE,           /* After finishing parsing a type.  */
   PLUGIN_FINISH_DECL,           /* After finishing parsing a declaration.  */
diff --git a/gcc/plugin.c b/gcc/plugin.c
index 8debc09..f7a8b64 100644
--- a/gcc/plugin.c
+++ b/gcc/plugin.c
@@ -433,6 +433,8 @@ register_callback (const char *plugin_name,
	    return;
	  }
       /* Fall through.  */
+      case PLUGIN_START_PARSE_FUNCTION:
+      case PLUGIN_FINISH_PARSE_FUNCTION:
       case PLUGIN_FINISH_TYPE:
       case PLUGIN_FINISH_DECL:
       case PLUGIN_START_UNIT:
@@ -511,6 +513,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
       gcc_assert (event >= PLUGIN_EVENT_FIRST_DYNAMIC);
       gcc_assert (event < event_last);
       /* Fall through.  */
+      case PLUGIN_START_PARSE_FUNCTION:
+      case PLUGIN_FINISH_PARSE_FUNCTION:
       case PLUGIN_FINISH_TYPE:
       case PLUGIN_FINISH_DECL:
       case PLUGIN_START_UNIT:
diff --git a/gcc/plugin.def b/gcc/plugin.def
index df5d383..04faec9 100644
--- a/gcc/plugin.def
+++ b/gcc/plugin.def
@@ -18,6 +18,12 @@ along with GCC; see the file COPYING3.  If not see
 http://www.gnu.org/licenses/.  */

+/* Called before parsing the body of a function.  */
+DEFEVENT (PLUGIN_START_PARSE_FUNCTION)
+
+/* After finishing parsing a function.  */
+DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION)
+
 /* To hook into pass manager.  */
 DEFEVENT (PLUGIN_PASS_MANAGER_SETUP)
diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
new file mode 100644
index 000..b7f2d3d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
@@ -0,0 +1,13 @@
+int global = 12;
+
+int function1(void);
+
+int function2(int a) // { dg-warning "Start fndef function2" }
+{
+  return function1() + a;
+} // { dg-warning "Finish fndef function2" }
+
+int function1(void) // { dg-warning "Start fndef function1" }
+{
+  return global + 1;
+} // { dg-warning "Finish fndef function1" }
diff --git
Re: [PATCH][AArch64] Adjust generic move costs
On Mon, Nov 17, 2014 at 5:13 PM, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:
On 14 November 2014 14:35, Wilco Dijkstra wdijk...@arm.com wrote:

2014-11-14  Wilco Dijkstra  wdijk...@arm.com

	* gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost.

OK /Marcus

Changelog should probably indicate PR target/61915 when committing?

Ramana
Re: Stream out default optimization nodes
On Mon, Nov 17, 2014 at 10:38 AM, Jan Hubicka hubi...@ucw.cz wrote:

Hi, this patch makes us store the default optimization node the same way we stream the target node. This means that command-line options given at compile time prevail over those given at link time. Previously we sort of combined both. We still have a lot of flags that are global (i.e. not marked as Optimization) but affect the way the unit is output. Since I would like to replace the old option merging, I would like to add a Global attribute to each of them in the .opt file and generate streaming code for them the same way we do for optimization/target nodes. This patch was regtested/bootstrapped on x86_64-linux, and in an earlier tree also on ppc64-linux/ppc64-aix, which do not work for me at the moment. I have also used it in my tree for some time and tested firefox/libreoffice builds.

OK? Honza

	* tree.c (free_lang_data_in_decl): Store default optimization node.

Index: tree.c
===
--- tree.c	(revision 217659)
+++ tree.c	(working copy)
@@ -5118,6 +5118,9 @@ free_lang_data_in_decl (tree decl)
       if (!DECL_FUNCTION_SPECIFIC_TARGET (decl))
 	DECL_FUNCTION_SPECIFIC_TARGET (decl)
 	  = target_option_default_node;
+      if (!DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl))
+	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+	  = optimization_default_node;
     }

 /* DECL_SAVED_TREE holds the GENERIC representation for DECL.
I think one of your LTO streaming changes breaks the GCC LTO build: https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00473.html

/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/xg++ -B/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/ -B/usr/5.0.0/x86_64-unknown-linux-gnu/bin/ -nostdinc++ -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include -I/export/gnu/import/git/gcc-test-profiled/src-trunk/libstdc++-v3/libsupc++ -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs -g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -static-libstdc++ -static-libgcc -o build/genmatch \
	build/genmatch.o ../libcpp/libcpp.a ../libiberty/libiberty.a build/errors.o build/vec.o build/hash-table.o .././libiberty/libiberty.a
../../src-trunk/libcpp/lex.c: In function 'end_directive':
../../src-trunk/libcpp/lex.c:442:43: error: '__builtin_ia32_pcmpestri128' needs isa option -m32 -msse4.2
	index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
	                                    ^
make[7]: *** [/tmp/ccTC6Hk9.ltrans9.ltrans.o] Error 1

-- H.J.
Re: PATCH: PR bootstrap/63784: [5 Regression] profiledbootstrap failure with bootstrap-lto
On Fri, Nov 14, 2014 at 2:19 AM, Richard Biener richard.guent...@gmail.com wrote:
On Fri, Nov 14, 2014 at 12:15 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Tue, Nov 11, 2014 at 8:02 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Nov 10, 2014 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Nov 10, 2014 at 5:44 AM, Richard Biener richard.guent...@gmail.com wrote:
On Mon, Nov 10, 2014 at 2:43 PM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Nov 10, 2014 at 05:32:32AM -0800, H.J. Lu wrote:
On Mon, Nov 10, 2014 at 4:05 AM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Nov 10, 2014 at 12:50:44PM +0100, Richard Biener wrote:
On Sun, Nov 9, 2014 at 5:46 PM, H.J. Lu hongjiu...@intel.com wrote:

Hi, r216964 disables bootstrap for libcc1, which exposed two things:

1. libcc1 isn't compiled with LTO even when GCC is configured with --with-build-config=bootstrap-lto. It may be intentional since libcc1 is disabled for bootstrap.

2. -fPIC isn't used to create libcc1.so, which is OK if libcc1 is compiled with LTO, which remembers the PIC option.

Why is this any special to LTO? If it is, then it looks like an LTO (driver) issue to me?

Why are we linking the PIC libiberty into a non-PIC libcc1?

I admit I haven't tried LTO bootstrap, but from normal bootstrap logs, libcc1 is built normally using libtool using -fPIC only, and linked into libcc1.so.0.0.0 and libcc1plugin.so.0.0.0, and of course against the pic/libiberty.a, because we need PIC code in the shared libraries. So I don't understand the change at all.

Jakub

This is the command line to build libcc1.la:

Sure, but there was -fPIC used to compile all the *.o files that are being linked into libcc1.so, so LTO should know that. And it does. If not, please file a bug with a smaller testcase than libcc1 and libiberty.

There is nothing wrong with the linker. It is a slim-lto bug in libtool.
I uploaded a testcase at https://gcc.gnu.org/bugzilla/attachment.cgi?id=33931

My patch is a backport of libtool LTO support:

commit b81fd4ef009c24a86a7e64727ea09efb410ea149
Author: Ralf Wildenhues ralf.wildenh...@gmx.de
Date:   Sun Aug 29 17:31:29 2010 +0200

    Support GCC LTO on GNU/Linux.

    * libltdl/config/ltmain.m4sh (func_mode_link): Allow through flags matching -O*, -flto*, -fwhopr, -fuse-linker-plugin.
    * libltdl/m4/libtool.m4 (_LT_CMD_GLOBAL_SYMBOLS): Drop symbols starting with __gnu_lto.
    (_LT_LINKER_SHLIBS) [linux] <archive_cmds, archive_expsyms_cmds>: Add $pic_flag for GCC.
    (_LT_LANG_CXX_CONFIG) [linux] <archive_cmds, archive_expsyms_cmds>: Likewise.
    (_LT_SYS_HIDDEN_LIBDEPS): Ignore files matching *.lto.o.
    * NEWS: Update.

    Signed-off-by: Ralf Wildenhues ralf.wildenh...@gmx.de

OK to install?

Ping. Stage 1 will be closed tomorrow. I'd like to restore LTO bootstrap.

Bugfixing is still possible after that date. I suppose you don't call LTO bootstrap a new feature ;)

New LTO bugs have been introduced since LTO bootstrap was broken a few weeks ago. I'd like to check in this one so that we have a chance to prevent further damage. Any objections to backporting this from libtool upstream?

-- H.J.