Re: RFA (diagnostics): PATCH to move deprecated source location to separate note

2014-11-18 Thread Dodji Seketeli
Jason Merrill ja...@redhat.com writes:

 When I was fixing attribute deprecated for C++ templates, I found it
 odd that the source location for the deprecated decl was embedded in
 the warning rather than in a separate inform.  This patch moves it
 out.

 Tested x86_64-pc-linux-gnu.  OK for trunk?

Yes.  Thanks!

-- 
Dodji


Re: [patch] fix expand_builtin_init_dwarf_reg_sizes wrt register spans

2014-11-18 Thread Richard Henderson
On 09/30/2014 12:47 PM, Olivier Hainque wrote:
 2014-09-30  Olivier Hainque  hain...@adacore.com
 
 libgcc/
 * unwind-dw2.c (DWARF_REG_TO_UNWIND_COLUMN): Move default def to ...
 
 gcc/
 * defaults.h: ... here.
 * dwarf2cfi.c (init_one_dwarf_reg_size): New helper, processing
 one particular reg for expand_builtin_init_dwarf_reg_sizes. Apply
 DWARF_REG_TO_UNWIND_COLUMN.
 (expand_builtin_init_dwarf_reg_sizes): Rework to use helper and
 account for dwarf register spans.

Ok.


r~


[PATCH AARCH64] Load/store pair optimization using sched_fusion pass.

2014-11-18 Thread Bin Cheng
Hi,
This is the patch implementing ldp/stp optimization for aarch64.  It
consists of two parts.  The first is the peephole part, which includes
the ldp/stp patterns (both the peephole patterns and the insn match
patterns) and auxiliary functions (for validity checking and merging).
The second part implements the aarch64 backend hook for the sched-fusion
pass, which computes appropriate priorities for different kinds of
load/store instructions.  With these priorities, the sched-fusion pass can
schedule as many load/store instructions together as possible, so that the
subsequent peephole2 pass can merge them.

I collected data for miscellaneous benchmarks.  Some cases improved;
most of the rest did not regress; only a couple regressed slightly, by
2-3%.  After looking into the regressions I can confirm that the code
transformation is generally good, with many loads/stores paired.  These
regressions are most probably false alarms caused by other issues.

In conclusion, this patch pairs many consecutive load/store
instructions into ldp/stp, which is borne out by the code size
improvement of benchmarks.  E.g., in general it cuts the text size of
spec2k6 binaries (O3 level, not statically linked in my build) by 1.68%.

Bootstrapped and tested on aarch64.  Is it OK?

2014-11-18  Bin Cheng  bin.ch...@arm.com

* config/aarch64/aarch64.md (load_pair<mode>): Split to
load_pairsi, load_pairdi, load_pairsf and load_pairdf.
(load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
from load_pair<mode>.  New alternative to support int/fp
registers in fp/int mode patterns.
(store_pair<mode>): Split to store_pairsi, store_pairdi,
store_pairsf and store_pairdf.
(store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
from store_pair<mode>.  New alternative to support int/fp
registers in fp/int mode patterns.
(*load_pair_extendsidi2_aarch64): New pattern.
(*load_pair_zero_extendsidi2_aarch64): New pattern.
(aarch64-ldpstp.md): Include.
* config/aarch64/aarch64-ldpstp.md: New file.
* config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
New.
(extract_base_offset_in_addr): New.
(aarch64_operands_ok_for_ldpstp): New.
(aarch64_operands_adjust_ok_for_ldpstp): New.
* config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
(TARGET_SCHED_FUSION_PRIORITY): New hook.
(fusion_load_store): New function.
(extract_base_offset_in_addr): New function.
(aarch64_gen_adjusted_ldpstp): New function.
(aarch64_sched_fusion_priority): New function.
(aarch64_operands_ok_for_ldpstp): New function.
(aarch64_operands_adjust_ok_for_ldpstp): New function.

2014-11-18  Bin Cheng  bin.ch...@arm.com

* gcc.target/aarch64/ldp-stp-1.c: New test.
* gcc.target/aarch64/ldp-stp-2.c: New test.
* gcc.target/aarch64/ldp-stp-3.c: New test.
* gcc.target/aarch64/ldp-stp-4.c: New test.
* gcc.target/aarch64/ldp-stp-5.c: New test.
* gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
and peephole2 pass.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217687)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -1054,64 +1054,141 @@
 
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
-(define_insn "load_pair<mode>"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-	(match_operand:GPI 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:GPI 2 "register_operand" "=r")
-	(match_operand:GPI 3 "memory_operand" "m"))]
+(define_insn "load_pairsi"
+  [(set (match_operand:SI 0 "register_operand" "=r,*w")
+	(match_operand:SI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
+   (set (match_operand:SI 2 "register_operand" "=r,*w")
+	(match_operand:SI 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
		plus_constant (Pmode,
			       XEXP (operands[1], 0),
-			       GET_MODE_SIZE (<MODE>mode)))"
-  "ldp\\t%w0, %w2, %1"
-  [(set_attr "type" "load2")]
+			       GET_MODE_SIZE (SImode)))"
+  "@
+   ldp\\t%w0, %w2, %1
+   ldp\\t%s0, %s2, %1"
+  [(set_attr "type" "load2,neon_load1_2reg")
+   (set_attr "fp" "*,yes")]
 )
 
+(define_insn "load_pairdi"
+  [(set (match_operand:DI 0 "register_operand" "=r,*w")
+	(match_operand:DI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
+   (set (match_operand:DI 2 "register_operand" "=r,*w")
+	(match_operand:DI 3 "memory_operand" "m,m"))]
+  "rtx_equal_p (XEXP (operands[3], 0),
+	       plus_constant (Pmode,
+			      XEXP (operands[1], 0),
+			      GET_MODE_SIZE (DImode)))"
+  "@
+   ldp\\t%x0, %x2, %1
+   ldp\\t%d0, %d2, %1"
+  [(set_attr "type" "load2,neon_load1_2reg")
+  

Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-18 Thread Ramana Radhakrishnan



On 06/11/14 08:35, Yangfei (Felix) wrote:

  The idea is simple: use movw for certain const source operands instead of
ldrh, and exclude the const values which cannot be handled by
mov/mvn/movw.

  I am running regression tests for this patch.  Assuming no issue pops up,
OK for trunk?

So, doesn't that make the bug latent for architectures older than
armv6t2 in big-endian, and only fix this in ARM state?  I'd prefer a complete
solution please.  What about *thumb2_movhi_insn in thumb2.md?



Actually, we fix the bug by excluding the const values that cannot be handled;
the patch still works even without the added movw alternative.
The new movw alternative here is just a small code optimization for certain
architectures: handling consts with movw instead of ldrh is better for performance.
We found that the bug is not reproducible with *thumb2_movhi_insn, because
that pattern can always move consts using movw.


Please fix the PR number in your final commit; it should be PR 59593.


Index: gcc/config/arm/predicates.md
===
--- gcc/config/arm/predicates.md(revision 216838)
+++ gcc/config/arm/predicates.md(working copy)
@@ -144,6 +144,11 @@
   (and (match_code "const_int")
	(match_test "INTVAL (op) == 0")))

+(define_predicate "arm_movw_immediate_operand"
+  (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+       (and (match_code "const_int")
+	    (match_test "(INTVAL (op) & 0xffff0000) == 0"))))
+
 ;; Something valid on the RHS of an ARM data-processing instruction
(define_predicate "arm_rhs_operand"
  (ior (match_operand 0 "s_register_operand")
@@ -211,6 +216,11 @@
   (ior (match_operand 0 "arm_rhs_operand")
	(match_operand 0 "arm_not_immediate_operand")))

+(define_predicate "arm_hi_operand"
+  (ior (match_operand 0 "arm_rhsm_operand")
+       (ior (match_operand 0 "arm_not_immediate_operand")
+	    (match_operand 0 "arm_movw_immediate_operand"))))
+


I think you don't need any of these predicates.



 (define_predicate "arm_di_operand"
   (ior (match_operand 0 "s_register_operand")
	(match_operand 0 "arm_immediate_di_operand")))
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 216838)
+++ gcc/config/arm/arm.md   (working copy)
@@ -6285,8 +6285,8 @@

 ;; Pattern to recognize insn generated default case above
 (define_insn "*movhi_insn_arch4"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r")
-	(match_operand:HI 1 "general_operand"      "rIk,K,r,mi"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
+	(match_operand:HI 1 "arm_hi_operand" "rIk,K,j,r,mi"))]


Use `n' instead of `j' - movw can handle all of the numerical constants 
for HImode values. And the predicate can remain general_operand.



   "TARGET_ARM
    && arm_arch4
    && (register_operand (operands[0], HImode)
@@ -6294,16 +6294,18 @@
   "@
    mov%?\\t%0, %1\\t%@ movhi
    mvn%?\\t%0, #%B1\\t%@ movhi
+   movw%?\\t%0, %1\\t%@ movhi
    str%(h%)\\t%1, %0\\t%@ movhi
    ldr%(h%)\\t%0, %1\\t%@ movhi"
   [(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,*,*,256")
-   (set_attr "neg_pool_range" "*,*,*,244")
+   (set_attr "pool_range" "*,*,*,*,256")
+   (set_attr "neg_pool_range" "*,*,*,*,244")
    (set_attr_alternative "type"
			 [(if_then_else (match_operand 1 "const_int_operand" "")
					(const_string "mov_imm")
					(const_string "mov_reg"))
			  (const_string "mvn_imm")
+			  (const_string "mov_imm")
			  (const_string "store1")
			  (const_string "load1")])]
 )


Look at the set_attr arch alternative and set this to t2 for the movw 
alternative. I would expect that to be enough to sort this out instead 
of adding all this code.


I don't think your patch or the one with my modifications fixes the 
issue completely for architectures that do not have movw / movt 
instructions and I would expect that one would still need Thomas's patch 
to create const table entries of the right size.


Otherwise this is OK.


regards
Ramana





--- gcc/ChangeLog   (revision 216838)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,12 @@
+2014-11-05  Felix Yang  felix.y...@huawei.com
+   Shanyao Chen  chenshan...@huawei.com
+


I'm assuming you have copyright assignments sorted.


Yes, both my employer (Huawei) and I have signed copyright assignments with the FSF.



+(define_predicate "arm_movw_immediate_operand"
+  (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+       (ior (match_code "high")


I don't see why you need to check for high here ?



Yeah, I double checked and found that it's not necessary here. Thanks for 
pointing this out.
Attached please find the updated patch. Reg-tested for armeb-none-eabi.
OK for the trunk?


Index: gcc/ChangeLog

[PATCH] [AArch64, NEON] More NEON intrinsics improvement

2014-11-18 Thread Yangfei (Felix)
Hi,
 This patch converts more intrinsics to use builtin functions instead of 
the previous inline assembly syntax. 
 Passed the glorious testsuite of Christophe Lyon. 

 Three testcases are added for the testing of intrinsics which are not 
covered by the testsuite: 
 gcc.target/aarch64/vfma.c
 gcc.target/aarch64/vfma_n.c
 gcc.target/aarch64/vfms.c

 Regtested with aarch64-linux-gnu on QEMU.  OK for the trunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,26 @@
+2014-11-18  Felix Yang  felix.y...@huawei.com
+   Haijian Zhang  z.zhanghaij...@huawei.com
+   Jiji Jiang  jiangj...@huawei.com
+   Pengfei Sui  suipeng...@huawei.com
+
+   * config/aarch64/arm_neon.h (vrecpe_u32, vrecpeq_u32): Rewrite using
+   builtin functions.
+   (vfma_f32, vfmaq_f32, vfmaq_f64, vfma_n_f32, vfmaq_n_f32, vfmaq_n_f64,
+   vfms_f32, vfmsq_f32, vfmsq_f64): Likewise.
+   (vhsub_s8, vhsub_u8, vhsub_s16, vhsub_u16, vhsub_s32, vhsub_u32,
+   vhsubq_s8, vhsubq_u8, vhsubq_s16, vhsubq_u16, vhsubq_s32, vhsubq_u32,
+   vsubhn_s16, vsubhn_u16, vsubhn_s32, vsubhn_u32, vsubhn_s64, vsubhn_u64,
+   vrsubhn_s16, vrsubhn_u16, vrsubhn_s32, vrsubhn_u32, vrsubhn_s64,
+   vrsubhn_u64, vsubhn_high_s16, vsubhn_high_u16, vsubhn_high_s32,
+   vsubhn_high_u32, vsubhn_high_s64, vsubhn_high_u64, vrsubhn_high_s16,
+   vrsubhn_high_u16, vrsubhn_high_s32, vrsubhn_high_u32, vrsubhn_high_s64,
+   vrsubhn_high_u64): Likewise.
+   * config/aarch64/iterators.md (VDQ_SI): New mode iterator.
+   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_URECPE.
+   * config/aarch64/aarch64-simd.md (aarch64_urecpe<mode>): New pattern.
+   * config/aarch64/aarch64-simd-builtins.def (shsub, uhsub, subhn, rsubhn,
+   subhn2, rsubhn2, urecpe): New builtins.
+
 2014-11-11  Andrew Pinski  apin...@cavium.com
 
Bug target/61997
Index: gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c
===
--- gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c   (revision 217394)
+++ gcc/testsuite/gcc.target/aarch64/narrow_high-intrinsics.c   (working copy)
@@ -107,9 +107,9 @@ ONE (vmovn_high, uint16x8_t, uint16x4_t, uint32x4_
 ONE (vmovn_high, uint32x4_t, uint32x2_t, uint64x2_t, u64)
 
 
-/* { dg-final { scan-assembler-times "\\tsubhn2 v" 6} }  */
+/* { dg-final { scan-assembler-times "\\tsubhn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\taddhn2\\tv" 6} }  */
-/* { dg-final { scan-assembler-times "rsubhn2 v" 6} }  */
+/* { dg-final { scan-assembler-times "rsubhn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "raddhn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\trshrn2 v" 6} }  */
 /* { dg-final { scan-assembler-times "\\tshrn2 v" 6} }  */
Index: gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c
===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma_n.c	(revision 0)
@@ -0,0 +1,69 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 0x4486feb8 };
+VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520, 0x40890ee1532b8520 };
+
+#define VECT_VAR_ASSIGN(S,Q,T1,W) S##Q##_##T1##W
+#define ASSIGN(S, Q, T, W, V) T##W##_t S##Q##_##T##W = V
+#define TEST_MSG "VFMA/VFMAQ"
+void exec_vfma_n (void)
+{
+  /* Basic test: v4=vfma_n(v1,v2), then store the result.  */
+#define TEST_VFMA(Q, T1, T2, W, N) \
+  VECT_VAR(vector_res, T1, W, N) = \
+vfma##Q##_n_##T2##W(VECT_VAR(vector1, T1, W, N),   \
+ VECT_VAR(vector2, T1, W, N),  \
+ VECT_VAR_ASSIGN(Scalar, Q, T1, W));	\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define CHECK_VFMA_RESULTS(test_name,comment)  \
+  {\
+CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);  \
+CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);  \
+   CHECK_FP(test_name, float, 64, 2, PRIx64, expected, comment);   \
+  }
+
+#define DECL_VABD_VAR(VAR) \
+  DECL_VARIABLE(VAR, float, 32, 2);\
+  DECL_VARIABLE(VAR, float, 32, 4);\
+  DECL_VARIABLE(VAR, float, 64, 2);
+
+  DECL_VABD_VAR(vector1);
+  DECL_VABD_VAR(vector2);
+  DECL_VABD_VAR(vector3);
+  

Re: [PATCH][ARM] Handle simple SImode PLUS and MINUS cases in rtx costs

2014-11-18 Thread Ramana Radhakrishnan
On Wed, Nov 12, 2014 at 4:38 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This is a much-delayed respin of the patch in response to Richard's feedback
 at:
 http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00068.html

 We now let recursion do its magic and just add the cost of the arithmetic
 operation on top.

 Tested on arm-none-eabi

 Ok for trunk?

Ok.

Ramana

 2014-11-12  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/arm/arm.c (arm_new_rtx_costs, case PLUS, MINUS):
  Add cost of alu.arith in simple SImode case.


Re: [PATCH 0/3][AArch64]More intrinsics/builtins improvements

2014-11-18 Thread Yangfei (Felix)
Yeah, I agree that your approach is better.  I missed this point.  Thanks.


 
 Ah, sorry for the duplication of effort. And thanks for the heads-up about
 upcoming work! I don't think I have any plans for any of those others at the
 moment.
 
 In the case of vld1_dup, however, I'm going to argue that my approach
 (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01718.html) is better, in that a
 builtin is opaque (blocks optimization) for the midend, whereas gcc vector
 extensions (as in vdup_n_...) allow the midend to perform constant-folding, etc.
 Does that make sense?
 
 --Alan
 
 Yangfei (Felix) wrote:
  These three are logically independent, but all on a common theme, and
  I've tested them all together by
 
  bootstrapped + check-gcc on aarch64-none-elf cross-tested check-gcc
  on aarch64_be-none-elf
 
  Ok for trunk?
 
 
  Hi Alan,
 
  It seems that we are duplicating the work for the vld1_dup part.  (Refer to
 my message: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01462.html)
  I have a plan to improve these intrinsics/builtins:  vrsubhnX, vrsqrtX,
 vqrdmulX, vqmovX, vqdmulhqX, vqdmulhX, vpminX, vpmaxX, vpaddX, vpadaX
  vmvnX, vmulxX,
 vmovnX, vmlsX, vhsubX, vcvtX, vcopyX, vaddlvX, vabX, vfmX, vrecpeX, vcntX,
 vclsX
  And work for these intrinsics is in progress:  vfmX, vrecpeX, vhsubX,
 vcntX, vclsX
  Please let me know if you guys want to work on any of them.  Thanks.
 
 



Re: [PATCH, 8/8] Do simple omp lowering for no address taken var

2014-11-18 Thread Eric Botcazou
 Now - I can see how that is easily confused by the static chain
 being address-taken.  But I also remember that Eric did some
 preparatory work to fix that, for nested functions, that is,
 possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.

The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
something along these lines is attached to PR tree-optimization/54779 (latest 
version, for a 4.9-based compiler).

-- 
Eric Botcazou


Re: [ARM] Refactor Neon Builtins infrastructure

2014-11-18 Thread Ramana Radhakrishnan
On Wed, Nov 12, 2014 at 5:08 PM, James Greenhalgh
james.greenha...@arm.com wrote:
 Hi,

 I was taking a look at fixing the issues in the ARM back-end exposed
 by Marc Glisse's patch in [1], and hoped to fix them by adapting the
 patch recently commited by Tejas ([2]).

 As I looked, I realised that the ARM target and the AArch64 target
 now differ drastically in how their Advanced SIMD builtin
 initialisation and expansion logic works. This is a growing
 maintenance burden. This patch series is an attempt to start fixing
 the problem.

 From a high level, I see five problems with the ARM Neon builtin code.

 First is the magic number interface, which gives builtins with signed
 and unsigned, or saturating and non-saturating, variants an extra
 parameter used to control which instruction is ultimately emitted. This
 is problematic, as it forces these intrinsics to be implemented with an
 UNSPEC pattern; we would like the flexibility to try to do a better job
 of modeling these patterns.

 Second, is that all the code lives in arm.c. This file is huge and
 frightening. The least we could do is start to split it up!

 Third, is the complicated builtin initialisation code. If we collect
 common cases together from the large switch in the initialisation function,
 it is clear we can eliminate much of the existing code. In fact, we have
 already solved the same problem in AArch64 ([3]), and we don't gain
 anything from having these interfaces separate.

 Fourth, is that we don't have infrastructure to strongly type the functions
 in arm_neon.h -  instead casting around between signed and unsigned vector
 arguments as required. We need this to avoid special casing some builtins
 we may want to vectorize (bswap and friends). Again we've solved this
 in AArch64 ([4]).

 Finally, there are the issues with type mangling Marc has seen.

 This patch-set tries to fix those issues in order, and progresses as so:

 First the magic words:

   [Refactor Builtins: 1/8] Remove arm_neon.h's Magic Words

 Then moving code out to arm-builtins.c:

   [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h
   [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file

 And then making the ARM backend look like the AArch64 backend and fixing
 Marc's issue.

   [Patch ARM Refactor Builtins 4/8]  Refactor VARn Macros
   [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in
 ARM.
   [Patch ARM Refactor Builtins 6/8] Add some tests for poly mangling
   [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when
 initialising builtins and fix type mangling
   [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin
 infrastructure

 Clearly there is more we could do to start sharing code between the two
 targets rather than duplicating it. For now, the benefit did not seem worth
 the substantial churn that this would cause both back-ends.

 I've bootstrapped each patch in this series in turn for both arm and
 thumb on arm-none-linux-gnueabihf.

 OK for trunk?

This is OK thanks.



regards
Ramana


 Thanks,
 James

 ---
 [1]: [c++] typeinfo for target types
  https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html
 [2]: [AArch64, Patch] Restructure arm_neon.h vector types's implementation
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00264.html
 [3]: [AArch64] Refactor Advanced SIMD builtin initialisation.
  https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00532.html
 [4]: [AArch64] AArch64 SIMD Builtins Better Type Correctness.
  https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html

 ---
  gcc/config.gcc   |3 +-
  gcc/config/arm/arm-builtins.c| 2925 
  gcc/config/arm/arm-protos.h  |  173 +-
  gcc/config/arm/arm-simd-builtin-types.def|   48 +
  gcc/config/arm/arm.c | 3149 
 +-
  gcc/config/arm/arm_neon.h| 1743 +++---
  gcc/config/arm/arm_neon_builtins.def |  435 ++--
  gcc/config/arm/iterators.md  |  167 ++
  gcc/config/arm/neon.md   |  893 
  gcc/config/arm/t-arm |   11 +
  gcc/config/arm/unspecs.md|  109 +-
  gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C |   16 +
  gcc/testsuite/g++.dg/abi/mangle-neon.C   |5 +
  gcc/testsuite/gcc.target/arm/pr51968.c   |2 +-
  create mode 100644 gcc/config/arm/arm-builtins.c
  create mode 100644 gcc/config/arm/arm-simd-builtin-types.def
  create mode 100644 gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
  14 files changed, 4992 insertions(+), 4687 deletions(-)


Re: [Refactor Builtins: 1/8] Remove arm_neon.h's Magic Words

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

As part of some wider cleanup I'd like to do to ARM's Neon Builtin
infrastructure, my first step will be to remove the Magic Words used
to decide which variant of an instruction should be emitted.

The Magic Words interface allows a single builtin
(say, __builtin_neon_shr_nv4hi) to cover signed, unsigned and rounding
variants through the use of an extra control parameter.

This patch removes that interface, defining individual builtins for each
variant and dropping the extra parameter.

There are several benefits to cleaning this up:

   * We can start to drop some of the UNSPEC operations without having to
 add additional expand patterns to map them.
   * The interface is confusing on first glance at the file.
   * Having such a different interface to AArch64 doubles the amount of
 time it takes to grok the Neon Builtins infrastructure.

The drawbacks of changing this interface are:

   * Another big churn change for the ARM backend.
   * A series of new iterators, UNSPECs and builtin functions to cover the
 variants which were previously controlled by a Magic Word.
   * Lots more patterns for genrecog to think about, potentially slowing
 down compilation, increasing bootstrap time, and increasing compiler
 binary size.

On balance, I think we should deal with drawbacks in return for the future
clean-ups we enable, but I expect this to be controversial.

This patch is naive and conservative. I don't make any effort to merge
patterns across iterators, nor any attempt to change UNSPECs to specified
tree codes. Future improvements in this area would be useful.

I've bootstrapped the patch for arm-none-linux-gnueabihf in isolation, and
in series.

OK for trunk?


Ok,

Ramana


Thanks,
James

---
gcc/testsuite/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

 * gcc.target/arm/pr51968.c (foo): Do not try to pass Magic Word.

gcc/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

 * config/arm/arm.c (arm_expand_neon_builtin): Remove Magic Word
 parameter, rearrange switch statement accordingly.
 (arm_evpc_neon_vrev): Remove Magic Word.
 * config/arm/unspecs.md (unspec): Split many UNSPECs to
 rounding, or signed/unsigned variants.
 * config/arm/neon.md (vcond<mode><mode>): Remove Magic Word code.
 (vcondu<mode><mode>): Likewise.
 (neon_vadd): Remove Magic Word operand.
 (neon_vaddl): Remove Magic Word operand, convert to use
 signed/unsigned iterator.
 (neon_vaddw): Likewise.
 (neon_vhadd): Likewise, also iterate over rounding forms.
 (neon_vqadd): Remove Magic Word operand, convert to use
 signed/unsigned iterator.
 (neon_vraddhn): Remove Magic Word operand, convert to iterate
 over rounding forms.
 (neon_vmul): Remove Magic Word operand, iterate over
 polynomial/float instruction forms.
 (neon_vmla): Remove Magic Word operand.
 (neon_vfma): Likewise.
 (neon_vfms): Likewise.
 (neon_vmls): Likewise.
 (neon_vmlal): Remove Magic Word operand, iterate over
 signed/unsigned forms.
 (neon_vmlsl): Likewise.
 (neon_vqdmulh): Remove Magic Word operand, iterate over rounding
 forms.
 (neon_vqdmlal): Remove Magic Word operand, iterate over
 signed/unsigned forms.
 (neon_vqdmlsl): Likewise.
 (neon_vmull): Likewise.
 (neon_vqdmull): Remove Magic Word operand.
 (neon_vsub): Remove Magic Word operand.
 (neon_vsubl): Remove Magic Word operand, convert to use
 signed/unsigned iterator.
 (neon_vsubw): Likewise.
 (neon_vhsub): Likewise.
 (neon_vqsub): Likewise.
 (neon_vrsubhn): Remove Magic Word operand, convert to iterate
 over rounding forms.
 (neon_vceq): Remove Magic Word operand.
 (neon_vcge): Likewise.
 (neon_vcgeu): Likewise.
 (neon_vcgt): Likewise.
 (neon_vcgtu): Likewise.
 (neon_vcle): Likewise.
 (neon_vclt): Likewise.
 (neon_vcage): Likewise.
 (neon_vcagt): Likewise.
 (neon_vabd): Remove Magic Word operand, iterate over
 signed/unsigned forms, and split out...
 (neon_vabdf): ...this as new.
 (neon_vabdl): Remove Magic Word operand, iterate over
 signed/unsigned forms.
 (neon_vaba): Likewise.
 (neon_vmax): Remove Magic Word operand, iterate over
 signed/unsigned and max/min forms, and split out...
 (neon_vmaxminf): ...this as new.
 (neon_vmin): Delete.
 (neon_vpadd): Remove Magic Word operand.
 (neon_vpaddl): Remove Magic Word operand, iterate over
 signed/unsigned variants.
 (neon_vpadal): Likewise.
 (neon_vpmax): Remove Magic Word operand, iterate over
 signed/unsigned and max/min forms, and split out...
 

Re: [Patch ARM Refactor Builtins 2/8] Move Processor flags to arm-protos.h

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

If we want to move all the code relating to builtin initialisation and
expansion to a common file, we must share the processor flags with that
common file.

This patch pulls those definitions out to config/arm/arm-protos.h

Bootstrapped and regression tested in series, and in isolation with no
issues.

OK?

Thanks,
James


Ok.

Ramana

---
2014-11-12  James Greenhalgh  james.greenha...@arm.com

* config/arm/t-arm (arm.o): Include arm-protos.h in the recipe.
* config/arm/arm.c (FL_CO_PROC): Move to arm-protos.h.
(FL_ARCH3M): Likewise.
(FL_MODE26): Likewise.
(FL_MODE32): Likewise.
(FL_ARCH4): Likewise.
(FL_ARCH5): Likewise.
(FL_THUMB): Likewise.
(FL_LDSCHED): Likewise.
(FL_STRONG): Likewise.
(FL_ARCH5E): Likewise.
(FL_XSCALE): Likewise.
(FL_ARCH6): Likewise.
(FL_VFPV2): Likewise.
(FL_WBUF): Likewise.
(FL_ARCH6K): Likewise.
(FL_THUMB2): Likewise.
(FL_NOTM): Likewise.
(FL_THUMB_DIV): Likewise.
(FL_VFPV3): Likewise.
(FL_NEON): Likewise.
(FL_ARCH7EM): Likewise.
(FL_ARCH7): Likewise.
(FL_ARM_DIV): Likewise.
(FL_ARCH8): Likewise.
(FL_CRC32): Likewise.
(FL_SMALLMUL): Likewise.
(FL_IWMMXT): Likewise.
(FL_IWMMXT2): Likewise.
(FL_TUNE): Likewise.
(FL_FOR_ARCH2): Likewise.
(FL_FOR_ARCH3): Likewise.
(FL_FOR_ARCH3M): Likewise.
(FL_FOR_ARCH4): Likewise.
(FL_FOR_ARCH4T): Likewise.
(FL_FOR_ARCH5): Likewise.
(FL_FOR_ARCH5T): Likewise.
(FL_FOR_ARCH5E): Likewise.
(FL_FOR_ARCH5TE): Likewise.
(FL_FOR_ARCH5TEJ): Likewise.
(FL_FOR_ARCH6): Likewise.
(FL_FOR_ARCH6J): Likewise.
(FL_FOR_ARCH6K): Likewise.
(FL_FOR_ARCH6Z): Likewise.
(FL_FOR_ARCH6ZK): Likewise.
(FL_FOR_ARCH6T2): Likewise.
(FL_FOR_ARCH6M): Likewise.
(FL_FOR_ARCH7): Likewise.
(FL_FOR_ARCH7A): Likewise.
(FL_FOR_ARCH7VE): Likewise.
(FL_FOR_ARCH7R): Likewise.
(FL_FOR_ARCH7M): Likewise.
(FL_FOR_ARCH7EM): Likewise.
(FL_FOR_ARCH8A): Likewise.
* config/arm/arm-protos.h: Take definitions moved from arm.c.



RE: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.

2014-11-18 Thread David Sherwood
Hi Christophe,

Ah sorry. My mistake - it fixes this in bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810

This change is needed in order to remove the CANNOT_CHANGE_MODE_CLASS
#define, which will be committed as a separate patch.

Regards,
David Sherwood.

-Original Message-
From: Christophe Lyon [mailto:christophe.l...@linaro.org] 
Sent: 17 November 2014 21:09
To: David Sherwood
Cc: gcc-patches@gcc.gnu.org; Alan Hayward
Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes 
endianness-safe.

On 13 November 2014 15:32, David Sherwood david.sherw...@arm.com wrote:
 Hi,

 I think that's because Alan Hayward's original patch had a bug in it, which he
 has fixed and should be submitting to gcc-patches fairly soon.  So it's
 probably best to wait until he gets a new patch up, I think.


I've applied both Alan's patches and the advsimd-intrinsics tests now
all pass on aarch64_be, but this doesn't need your patch.

Which testcase does your patch actually fix?


 Regards,
 David.

 -Original Message-
 From: Christophe Lyon [mailto:christophe.l...@linaro.org]
 Sent: 13 November 2014 14:22
 To: David Sherwood
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 On 13 November 2014 11:09, David Sherwood david.sherw...@arm.com wrote:
 Hi All,

 I have successfully rebased this and tested in conjunction with a patch from
 Alan Hayward ([AArch64] [BE] Fix vector load/stores to not use ld1/st1), who
 should be submitting a new version shortly. Built and tested on:

 aarch64-none-elf
 aarch64_be-none-elf
 x86_64-linux-gnu

 I've applied both patches to recent trunk (r217483), and I still see
 ICEs on aarch64_be, when running the advsimd-intrinsics/vldX.c tests.

 I have them with Alan Hayward's patch alone, too. BTW, I thought that
 patch had been approved by Marcus, but it's not in trunk yet. Maybe I
 missed something.

 Christophe.


 Regards,
 David Sherwood.

 -Original Message-
 From: David Sherwood [mailto:david.sherw...@arm.com]
 Sent: 28 October 2014 08:55
 To: 'gcc-patches@gcc.gnu.org'
 Subject: RE: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 Hi,

 Sorry to bother you again. Could someone take a look at this change
 please if they have time?

 Thanks!
 David.

 -Original Message-
 From: David Sherwood [mailto:david.sherw...@arm.com]
 Sent: 10 October 2014 15:48
 To: gcc-patches@gcc.gnu.org
 Subject: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 Hi,

 I have a fix (originally written by Tejas Belagod) for the following bug:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810

 Could someone take a look please?

 Thanks!
 David Sherwood.

 ChangeLog:

 gcc/:
 2014-11-13  David Sherwood  david.sherw...@arm.com

 * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist,
 aarch64_reverse_mask): New decls.
 * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum.
 * config/aarch64/iterators.md (insn_count): New mode_attr.
* config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i,
 vec_load_lanes(o/c/x)i): Fixed to work for Big Endian.
 * config/aarch64/aarch64-simd.md (aarch64_rev_reglist,
 aarch64_simd_(ld/st)(2/3/4)): Added.
 * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist,
 aarch64_reverse_mask): Added.










Re: [Patch ARM Refactor Builtins 3/8] Pull builtins code to its own file

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

The config/arm/arm.c file has always seemed a worrying size to me.

This patch pulls out the builtin related code to its own file. I think
this will be a good idea as we move forward. It seems a more sensible
separation of concerns. There are no functional changes here.

Bootstrapped and regression tested on arm-none-linux-gnueabi, with
no issues.

OK?

Thanks,
James

---
2014-11-12  James Greenhalgh  james.greenha...@arm.com

* config.gcc (extra_objs): Add arm-builtins.o for arm*-*-*.
(target_gtfiles): Add config/arm/arm-builtins.c for arm*-*-*.
* config/arm/arm-builtins.c: New.
* config/arm/t-arm (arm-builtins.o): New.
* config/arm/arm-protos.h (arm_expand_builtin): New.
(arm_builtin_decl): Likewise.
(arm_init_builtins): Likewise.
(arm_atomic_assign_expand_fenv): Likewise.
* config/arm/arm.c (arm_atomic_assign_expand_fenv): Remove prototype.
(arm_init_builtins): Likewise.
(arm_init_iwmmxt_builtins): Likewise.
(safe_vector_operand): Likewise.
(arm_expand_binop_builtin): Likewise.
(arm_expand_unop_builtin): Likewise.
(arm_expand_builtin): Likewise.
(arm_builtin_decl): Likewise.
(insn_flags): Remove static.
(tune_flags): Likewise.
(enum arm_builtins): Move to config/arm/arm-builtins.c.
(arm_init_neon_builtins): Likewise.
(struct builtin_description): Likewise.
(arm_init_iwmmxt_builtins): Likewise.
(arm_init_fp16_builtins): Likewise.
(arm_init_crc32_builtins): Likewise.
(arm_init_builtins): Likewise.
(arm_builtin_decl): Likewise.
(safe_vector_operand): Likewise.
(arm_expand_ternop_builtin): Likewise.
(arm_expand_binop_builtin): Likewise.
(arm_expand_unop_builtin): Likewise.
(neon_dereference_pointer): Likewise.
(arm_expand_neon_args): Likewise.
(arm_expand_neon_builtin): Likewise.
(neon_split_vcombine): Likewise.
(arm_expand_builtin): Likewise.
(arm_builtin_vectorized_function): Likewise.
(arm_atomic_assign_expand_fenv): Likewise.



Ok.

Ramana


Re: [Patch ARM Refactor Builtins 4/8] Refactor VARn Macros

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

These macros can always be defined as a base case of VAR1 and a recursive
case of VARn-1. At the moment, the body of VAR1 is duplicated to each
macro.

This patch makes that change.

Regression tested on arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

* config/arm/arm-builtins.c (VAR1): Add a comma.
(VAR2): Rewrite in terms of VAR1.
(VAR3-10): Likewise.
(arm_builtins): Remove leading comma before ARM_BUILTIN_MAX.
* config/arm/arm_neon_builtins.def: Remove trailing commas.



OK.

Ramana


Re: [Patch ARM Refactor Builtins 5/8] Start keeping track of qualifiers in ARM.

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

Now we have everything we need to start keeping track of the correct
qualifiers for each Neon builtin class in the arm back-end.

Some of the ARM Neon itypes are redundant when mapped to the qualifiers
framework. For now, don't change these, we will clean them up in patch
8 of the series.

Bootstrapped on arm-none-gnueabihf with no issues.

OK?



OK.

Ramana

Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

* gcc/config/arm/arm-builtins.c (arm_type_qualifiers): New.
(neon_itype): Add new types corresponding to the types used in
qualifiers names.
(arm_unop_qualifiers): New.
(arm_bswap_qualifiers): Likewise.
(arm_binop_qualifiers): Likewise.
(arm_ternop_qualifiers): Likewise.
(arm_getlane_qualifiers): Likewise.
(arm_lanemac_qualifiers): Likewise.
(arm_setlane_qualifiers): Likewise.
(arm_combine_qualifiers): Likewise.
(arm_load1_qualifiers): Likewise.
(arm_load1_lane_qualifiers): Likewise.
(arm_store1_qualifiers): Likewise.
(arm_storestruct_lane_qualifiers): Likewise.
(UNOP_QUALIFIERS): Likewise.
(DUP_QUALIFIERS): Likewise.
(SPLIT_QUALIFIERS): Likewise.
(CONVERT_QUALIFIERS): Likewise.
(FLOAT_WIDEN_QUALIFIERS): Likewise.
(FLOAT_NARROW_QUALIFIERS): Likewise.
(RINT_QUALIFIERS): Likewise.
(COPYSIGNF_QUALIFIERS): Likewise.
(CREATE_QUALIFIERS): Likewise.
(REINTERP_QUALIFIERS): Likewise.
(BSWAP_QUALIFIERS): Likewise.
(BINOP_QUALIFIERS): Likewise.
(FIXCONV_QUALIFIERS): Likewise.
(SCALARMUL_QUALIFIERS): Likewise.
(SCALARMULL_QUALIFIERS): Likewise.
(SCALARMULH_QUALIFIERS): Likewise.
(TERNOP_QUALIFIERS): Likewise.
(SELECT_QUALIFIERS): Likewise.
(VTBX_QUALIFIERS): Likewise.
(GETLANE_QUALIFIERS): Likewise.
(SHIFTIMM_QUALIFIERS): Likewise.
(LANEMAC_QUALIFIERS): Likewise.
(SCALARMAC_QUALIFIERS): Likewise.
(SETLANE_QUALIFIERS): Likewise.
(SHIFTINSERT_QUALIFIERS): Likewise.
(SHIFTACC_QUALIFIERS): Likewise.
(LANEMUL_QUALIFIERS): Likewise.
(LANEMULL_QUALIFIERS): Likewise.
(LANEMULH_QUALIFIERS): Likewise.
(COMBINE_QUALIFIERS): Likewise.
(VTBL_QUALIFIERS): Likewise.
(LOAD1_QUALIFIERS): Likewise.
(LOADSTRUCT_QUALIFIERS): Likewise.
(LOAD1LANE_QUALIFIERS): Likewise.
(LOADSTRUCTLANE_QUALIFIERS): Likewise.
(STORE1_QUALIFIERS): Likewise.
(STORESTRUCT_QUALIFIERS): Likewise.
(STORE1LANE_QUALIFIERS): Likewise.
(STORESTRUCTLANE_QUALIFIERS): Likewise.
(neon_builtin_datum): Keep track of qualifiers.
(VAR1): Likewise.



Re: [Patch ARM Refactor Builtins 6/8] Add some tests for poly mangling

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

The poly types end up going through the default mangler, but only
sometimes.

We don't want to change the mangling for poly types with the next patch in
this series, so add a test which should pass before and after.

I've checked that the new tests pass at this stage of the patch series,
and bootstrapped on arm-none-linux-gnueabihf for good luck.

OK?



OK.

Ramana

Thanks,
James

---
gcc/testsuite/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

* g++.dg/abi/mangle-arm-crypto.C: New.
* g++.dg/abi/mangle-neon.C (f19): New.
(f20): Likewise.



Re: [Patch ARM Refactor Builtins 7/8] Use qualifiers arrays when initialising builtins and fix type mangling

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

This patch wires up builtin initialisation similar to the AArch64 backend,
making use of the qualifiers arrays to decide on types for each builtin
we hope to initialise.

We could take an old snapshot of the qualifiers code from AArch64, but as
our end-goal is to pull in the type mangling changes, we are as well to do
that now. In order to preserve the old mangling rules after this patch, we
must wire all of these types up.

Together, this becomes a fairly simple side-port of the logic for
Advanced SIMD builtins from the AArch64 target.

Bootstrapped on arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---


OK.

Ramana


gcc/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

* config/arm/arm-builtins.c (arm_scalar_builtin_types): New.
(enum arm_simd_type): Likewise.
(struct arm_simd_type_info): Likewise
(arm_mangle_builtin_scalar_type): Likewise.
(arm_mangle_builtin_vector_type): Likewise.
(arm_mangle_builtin_type): Likewise.
(arm_simd_builtin_std_type): Likewise.
(arm_lookup_simd_builtin_type): Likewise.
(arm_simd_builtin_type): Likewise.
(arm_init_simd_builtin_types): Likewise.
(arm_init_simd_builtin_scalar_types): Likewise.
(arm_init_neon_builtins): Rewrite using qualifiers.
* config/arm/arm-protos.h (arm_mangle_builtin_type): New.
* config/arm/arm-simd-builtin-types.def: New file.
* config/arm/t-arm (arm-builtins.o): Depend on it.
* config/arm/arm.c (arm_mangle_type): Call arm_mangle_builtin_type.
* config/arm/arm_neon.h (int8x8_t): Use new internal type.
(int16x4_t): Likewise.
(int32x2_t): Likewise.
(float16x4_t): Likewise.
(float32x2_t): Likewise.
(poly8x8_t): Likewise.
(poly16x4_t): Likewise.
(uint8x8_t): Likewise.
(uint16x4_t): Likewise.
(uint32x2_t): Likewise.
(int8x16_t): Likewise.
(int16x8_t): Likewise.
(int32x4_t): Likewise.
(int64x2_t): Likewise.
(float32x4_t): Likewise.
(poly8x16_t): Likewise.
(poly16x8_t): Likewise.
(uint8x16_t): Likewise.
(uint16x8_t): Likewise.
(uint32x4_t): Likewise.
(uint64x2_t): Likewise.

Conflicts:
gcc/config/arm/arm.c



Re: [Patch ARM Refactor Builtins 8/8] Neaten up the ARM Neon builtin infrastructure

2014-11-18 Thread Ramana Radhakrishnan



On 12/11/14 17:10, James Greenhalgh wrote:


Hi,

This final patch clears up the remaining data structures which we no
longer have any use for...

  * _QUALIFIERS macros which do not name a distinct pattern of
arguments/return types.
  * The neon_builtin_type_mode enum is not needed, we can map directly to
the machine_mode.
  * The neon_itype enum is not needed, the builtin expand functions can
be rewritten to use the qualifiers data.

This gives us reasonable parity between the builtin infrastructure for
the ARM and AArch64 targets. We could go further and start sharing some
of the logic between the two back-ends (and after that the builtin
definitions, and some of arm_neon.h, etc.), but I haven't done that here
as the immediate benefit is minimal.

Bootstrapped and regression tested with no issues.

OK?


OK.

Ramana


Thanks,
James

---
gcc/

2014-11-12  James Greenhalgh  james.greenha...@arm.com

* config/arm/arm-builtins.c (CONVERT_QUALIFIERS): Delete.
(COPYSIGNF_QUALIFIERS): Likewise.
(CREATE_QUALIFIERS): Likewise.
(DUP_QUALIFIERS): Likewise.
(FLOAT_WIDEN_QUALIFIERS): Likewise.
(FLOAT_NARROW_QUALIFIERS): Likewise.
(REINTERP_QUALIFIERS): Likewise.
(RINT_QUALIFIERS): Likewise.
(SPLIT_QUALIFIERS): Likewise.
(FIXCONV_QUALIFIERS): Likewise.
(SCALARMUL_QUALIFIERS): Likewise.
(SCALARMULL_QUALIFIERS): Likewise.
(SCALARMULH_QUALIFIERS): Likewise.
(SELECT_QUALIFIERS): Likewise.
(VTBX_QUALIFIERS): Likewise.
(SHIFTIMM_QUALIFIERS): Likewise.
(SCALARMAC_QUALIFIERS): Likewise.
(LANEMUL_QUALIFIERS): Likewise.
(LANEMULH_QUALIFIERS): Likewise.
(LANEMULL_QUALIFIERS): Likewise.
(SHIFTACC_QUALIFIERS): Likewise.
(SHIFTINSERT_QUALIFIERS): Likewise.
(VTBL_QUALIFIERS): Likewise.
(LOADSTRUCT_QUALIFIERS): Likewise.
(LOADSTRUCTLANE_QUALIFIERS): Likewise.
(STORESTRUCT_QUALIFIERS): Likewise.
(STORESTRUCTLANE_QUALIFIERS): Likewise.
(neon_builtin_type_mode): Delete.
(v8qi_UP): Map to V8QImode.
(v4hi_UP): Map to V4HImode.
(v4hf_UP): Map to V4HFmode.
(v2si_UP): Map to V2SImode.
(v2sf_UP): Map to V2SFmode.
(di_UP): Map to DImode.
(v16qi_UP): Map to V16QImode.
(v8hi_UP): Map to V8HImode.
(v4si_UP): Map to V4SImode.
(v4sf_UP): Map to V4SFmode.
(v2di_UP): Map to V2DImode.
(ti_UP): Map to TImode.
(ei_UP): Map to EImode.
(oi_UP): Map to OImode.
(neon_itype): Delete.
(neon_builtin_datum): Remove itype, make mode a machine_mode.
(VAR1): Update accordingly.
(arm_init_neon_builtins): Use machine_mode directly.
(neon_dereference_pointer): Likewise.
(arm_expand_neon_args): Use qualifiers to decide operand types.
(arm_expand_neon_builtin): Likewise.
* config/arm/arm_neon_builtins.def: Remap operation type for
many builtins.



Re: [PATCH, ARM] Constrain the small multiply test cases to be more restrictive.

2014-11-18 Thread Ramana Radhakrishnan
On Mon, Nov 17, 2014 at 5:06 AM, Hale Wang hale.w...@arm.com wrote:
 Hi,

 Refer to the previous small multiply patch (r217175).

 The conditions in the small multiply test cases are not restrictive enough.
 If we force -march=armv4t/armv5t, these cases will fail.
 These cases can be used only if we define
 -mcpu=cortex-m0/m1/m0plus.small-multiply.

 This patch is used to fix this issue.

 These cases will be skipped if we don't define
 -mcpu=cortex-m0/m1/m0plus.small-multiply, so there is no influence on
 other targets.

 Build gcc passed. Is it OK for trunk?

Ok , thanks.

Ramana

 Thanks and Best Regards,
 Hale Wang

 gcc/testsuite/ChangeLog:

 2014-11-13  Hale Wang  hale.w...@arm.com

 * gcc.target/arm/small-multiply-m0-1.c: Only apply when
  -mcpu=cortex-m0/m1/m0plus.small-multiply.
 * gcc.target/arm/small-multiply-m0-2.c: Likewise.
 * gcc.target/arm/small-multiply-m0-3.c: Likewise.
 * gcc.target/arm/small-multiply-m0plus-1.c: Likewise.
 * gcc.target/arm/small-multiply-m0plus-2.c: Likewise.
 * gcc.target/arm/small-multiply-m0plus-3.c: Likewise.
 * gcc.target/arm/small-multiply-m1-1.c: Likewise.
 * gcc.target/arm/small-multiply-m1-2.c: Likewise.
 * gcc.target/arm/small-multiply-m1-3.c: Likewise.
 
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
 index 77ec603..49132e3 100644
 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
 +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_thumb1_ok } */
 -/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { -mcpu=* } { -mcpu=cortex-m0.small-multiply } } */
 +/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { * } { -mcpu=cortex-m0.small-multiply } } */
  /* { dg-options -mcpu=cortex-m0.small-multiply -mthumb -O2 } */

  int
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
 index c89b3ba..7f1bf7b 100644
 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
 +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_thumb1_ok } */
 -/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { -mcpu=* } { -mcpu=cortex-m0.small-multiply } } */
 +/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { * } { -mcpu=cortex-m0.small-multiply } } */
  /* { dg-options -mcpu=cortex-m0.small-multiply -mthumb -Os } */

  int
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
 index b2df109..aca39d7 100644
 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
 +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_thumb1_ok } */
 -/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { -mcpu=* } { -mcpu=cortex-m0.small-multiply } } */
 +/* { dg-skip-if Test is specific to cortex-m0.small-multiply { arm*-*-* }
 { * } { -mcpu=cortex-m0.small-multiply } } */
  /* { dg-options -mcpu=cortex-m0.small-multiply -mthumb -Os } */

  int
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
 index 08a450b..12e8839 100644
 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
 +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_thumb1_ok } */
 -/* { dg-skip-if Test is specific to cortex-m0plus.small-multiply {
 arm*-*-* } { -mcpu=* } { -mcpu=cortex-m0plus.small-multiply } } */
 +/* { dg-skip-if Test is specific to cortex-m0plus.small-multiply {
 arm*-*-* } { * } { -mcpu=cortex-m0plus.small-multiply } } */
  /* { dg-options -mcpu=cortex-m0plus.small-multiply -mthumb -O2 } */

  int
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
 index 17b52d3..3e3c9b2 100644
 --- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
 +++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_thumb1_ok } */
 -/* { dg-skip-if Test is specific to cortex-m0plus.small-multiply {
 arm*-*-* } { -mcpu=* } { -mcpu=cortex-m0plus.small-multiply } } */
 +/* { dg-skip-if Test is specific to cortex-m0plus.small-multiply {
 arm*-*-* } { * } { -mcpu=cortex-m0plus.small-multiply } } */
  /* { dg-options -mcpu=cortex-m0plus.small-multiply -mthumb -Os } */

  int
 diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
 b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
 

Re: [PING][PATCH] [AARCH64, NEON] Improve vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics

2014-11-18 Thread Marcus Shawcroft
On 17 November 2014 06:58, Yangfei (Felix) felix.y...@huawei.com wrote:
 PING?
 BTW: It seems that Alan's way of improving vld1(q?)_dup intrinsic is more 
 elegant.
 So is the improvement of vcls(q?) vcnt(q?) OK for trunk?  Thanks.

Please rebase over Alan's patch and repost, thank you /Marcus


Re: [PATCH][AArch64] Remove crypto extension from default for cortex-a53, cortex-a57

2014-11-18 Thread Kyrill Tkachov


On 17/11/14 16:59, Ramana Radhakrishnan wrote:

On Mon, Nov 17, 2014 at 2:48 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

Some configurations of Cortex-A53 and Cortex-A57 don't ship with crypto,
so enabling it by default for -mcpu=cortex-a53 and cortex-a57 is
inappropriate.

Tested aarch64-none-elf. Reminder that at the moment all the crypto
extension does is enable the use of the ACLE crypto intrinsics in arm_neon.h

Ok for trunk?

I can't ok this but ...

Since we've changed behaviour from 4.9 I think it warrants an entry in
changes.html for 5.0


Makes sense.
Here's what I propose.

Ok?

Kyrill



Ramana


Thanks,
Kyrill

2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64-cores.def (cortex-a53): Remove
 AARCH64_FL_CRYPTO from feature flags.
 (cortex-a57): Likewise.
 (cortex-a57.cortex-a53): Likewise.
Index: htdocs/gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.23
diff -U 3 -r1.23 changes.html
--- htdocs/gcc-5/changes.html	14 Nov 2014 10:49:51 -	1.23
+++ htdocs/gcc-5/changes.html	17 Nov 2014 17:46:41 -
@@ -257,6 +257,13 @@
Alternatively it can be enabled by default by configuring GCC with the
<code>--enable-fix-cortex-a53-835769</code> option.
 </li>
+ <li> The cryptographic extensions to the ARMv8-A architecture are no
+   longer enabled by default when specifying the
+   <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or
+   <code>-mcpu=cortex-a57.cortex-a53</code> options.  To enable these
+   extensions add the <code>+crypto</code> extension to your given
+   <code>-mcpu</code> or <code>-march</code> options' value.
+ </li>
 
</ul>
 

Re: [PATCH, 8/8] Do simple omp lowering for no address taken var

2014-11-18 Thread Richard Biener
On Tue, 18 Nov 2014, Eric Botcazou wrote:

  Now - I can see how that is easily confused by the static chain
  being address-taken.  But I also remember that Eric did some
  preparatory work to fix that, for nested functions, that is,
  possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.
 
 The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
 something along these lines is attached to PR tree-optimization/54779 (latest 
 version, for a 4.9-based compiler).

Ah, now I remember - this was to be able to optimize away the frame
variable in case the nested function was inlined.

Tom's case is somewhat different, as I understand it: somehow LIM store
motion doesn't handle indirect frame accesses well enough(?)  So
he intends to load register vars in the frame into registers at the
beginning of the nested function and restore them to the frame on
function exit (this will probably break for recursive calls, but
OMP offloading might be special enough that this is a non-issue there).

So marking the frame decl won't help him here (I thought we might
mark the FIELD_DECLs corresponding to individual vars).  OTOH inside
the nested function accesses to the static chain should be easy to
identify.

Richard.


Re: [patch] New std::string implementation

2014-11-18 Thread Richard Biener
On Mon, Nov 17, 2014 at 5:43 PM, Jonathan Wakely jwak...@redhat.com wrote:
 On 17/11/14 13:06 +0100, Markus Trippelsdorf wrote:

 On 2014.11.14 at 15:43 +, Jonathan Wakely wrote:

 Tested on x86_64-linux and powerpc64-linux, also with
 --disable-libstdcxx11-abi to verify all the incompatible changes can
 be disabled if needed.


 On ppc64 I get:

 FAIL: libstdc++-abi/abi_check
 FAIL: 27_io/basic_ios/copyfmt/char/1.cc execution test
 FAIL: 27_io/basic_ios/exceptions/char/1.cc execution test
 FAIL: 27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc
 execution test
 FAIL:
 27_io/basic_istream/extractors_arithmetic/wchar_t/exceptions_failbit.cc
 execution test
 FAIL: 27_io/basic_istream/extractors_other/char/exceptions_null.cc
 execution test
 FAIL: 27_io/basic_istream/extractors_other/wchar_t/exceptions_null.cc
 execution test
 FAIL: 27_io/basic_istream/sentry/char/12297.cc execution test
 FAIL: 27_io/basic_istream/sentry/wchar_t/12297.cc execution test
 FAIL: 27_io/basic_ostream/inserters_other/char/exceptions_null.cc
 execution test
 FAIL: 27_io/basic_ostream/inserters_other/wchar_t/exceptions_null.cc
 execution test
 FAIL: 27_io/ios_base/storage/2.cc execution test


 I think I've fixed those, I'll post an updated patch soon, but I'm
 still working on some fixes for 22_locale/ test FAILs.

 The problem I have is that std::basic_iosC, T objects (the base
 class of all iostreams) have a pointer to an instance of
 std::num_getC, istreambuf_iteratorC, T and that type uses
 std::string, so needs to be tagged, which would require the entire
 iostreams hierarchy to be tagged. I want to avoid that.

 Updated patch with fixes asap ...

Looking at all these issues that just pop up inside libstdc++ I wonder
if this whole business will not blow up in our face once out in the wild...

Richard.


Re: [PATCH][AArch64] Remove crypto extension from default for cortex-a53, cortex-a57

2014-11-18 Thread Richard Earnshaw
On 18/11/14 09:38, Kyrill Tkachov wrote:
 
 On 17/11/14 16:59, Ramana Radhakrishnan wrote:
 On Mon, Nov 17, 2014 at 2:48 PM, Kyrill Tkachov kyrylo.tkac...@arm.com 
 wrote:
 Hi all,

 Some configurations of Cortex-A53 and Cortex-A57 don't ship with crypto,
 so enabling it by default for -mcpu=cortex-a53 and cortex-a57 is
 inappropriate.

 Tested aarch64-none-elf. Reminder that at the moment all the crypto
 extension does is enable the use of the ACLE crypto intrinsics in arm_neon.h

 Ok for trunk?
 I can't ok this but ...

 Since we've changed behaviour from 4.9 I think it warrants an entry in
 changes.html for 5.0
 
 Makes sense.
 Here's what I propose.
 
 Ok?
 
 Kyrill
 

 Ramana

 Thanks,
 Kyrill

 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/aarch64/aarch64-cores.def (cortex-a53): Remove
  AARCH64_FL_CRYPTO from feature flags.
  (cortex-a57): Likewise.
  (cortex-a57.cortex-a53): Likewise.
 
 
 www-docs-a50-crypto.patch
 
 
 Index: htdocs/gcc-5/changes.html
 ===
 RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
 retrieving revision 1.23
 diff -U 3 -r1.23 changes.html
 --- htdocs/gcc-5/changes.html 14 Nov 2014 10:49:51 -  1.23
 +++ htdocs/gcc-5/changes.html 17 Nov 2014 17:46:41 -
 @@ -257,6 +257,13 @@
 Alternatively it can be enabled by default by configuring GCC with the
 <code>--enable-fix-cortex-a53-835769</code> option.
   </li>
 + <li> The cryptographic extensions to the ARMv8-A architecture are no
 +   longer enabled by default when specifying the
 +   <code>-mcpu=cortex-a53</code>, <code>-mcpu=cortex-a57</code> or
 +   <code>-mcpu=cortex-a57.cortex-a53</code> options.  To enable these
 +   extensions add the <code>+crypto</code> extension to your given
 +   <code>-mcpu</code> or <code>-march</code> options' value.
 + </li>
  
 </ul>
  
 

I'd suggest:

The optional cryptographic extensions ...

But otherwise it looks fine to me.

R.



Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 2:59 AM, David Malcolm dmalc...@redhat.com wrote:
 On Mon, 2014-11-17 at 11:06 +0100, Richard Biener wrote:
 On Sat, Nov 15, 2014 at 12:00 PM, David Malcolm dmalc...@redhat.com wrote:
  On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote:
  On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm dmalc...@redhat.com 
  wrote:
   On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
   On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com 
   wrote:
On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
 On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
  To be constructive here - the above case is from within a
  GIMPLE_ASSIGN case label
  and thus I'd have expected
 
  case GIMPLE_ASSIGN:
{
  gassign *a1 = as_a gassign * (s1);
  gassign *a2 = as_a gassign * (s2);
lhs1 = gimple_assign_lhs (a1);
lhs2 = gimple_assign_lhs (a2);
if (TREE_CODE (lhs1) != SSA_NAME
 TREE_CODE (lhs2) != SSA_NAME)
  return (operand_equal_p (lhs1, lhs2, 0)
   gimple_operand_equal_value_p 
  (gimple_assign_rhs1 (a1),
   
  gimple_assign_rhs1 (a2)));
else if (TREE_CODE (lhs1) == SSA_NAME
  TREE_CODE (lhs2) == SSA_NAME)
  return vn_valueize (lhs1) == vn_valueize (lhs2);
return false;
}
 
  instead.  That's the kind of changes I have expected and have 
  approved of.

 But even that looks like just adding extra work for all 
 developers, with no
 gain.  You only have to add extra code and extra temporaries, in 
 switches
 typically also have to add {} because of the temporaries and thus 
 extra
 indentation level, and it doesn't simplify anything in the code.
   
The branch attempts to use the C++ typesystem to capture information
about the kinds of gimple statement we expect, both:
  (A) so that the compiler can detect type errors, and
  (B) as a comprehension aid to the human reader of the code
   
The ideal here is when function params and struct field can be
strengthened from gimple to a subclass ptr.  This captures the
knowledge that every use of a function or within a struct has a 
given
gimple code.
   
I just don't like all the as_a/is_a stuff enforced everywhere,
it means more typing, more temporaries, more indentation.
So, as I view it, instead of the checks being done cheaply (yes, I 
think
the gimple checking as we have right now is very cheap) under the
hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
put the burden on the developers, who has to check that manually 
through
the as_a/is_a stuff everywhere, more typing and uglier syntax.
I just don't see that as a step forward, instead a huge step 
backwards.
But perhaps I'm alone with this.
Can you e.g. compare the size of - lines in your patchset combined, 
and
size of + lines in your patchset?  As in, if your changes lead to 
less
typing or more.
  
   I see two ways out here.  One is to add overloads to all the functions
   taking the special types like
  
   tree
   gimple_assign_rhs1 (gimple *);
  
   or simply add
  
   gassign *operator ()(gimple *g) { return as_a gassign * (g); }
  
   into a gimple-compat.h header which you include in places that
   are not converted nicely.
  
   Thanks for the suggestions.
  
    Am I missing something, or is the gimple-compat.h idea above not
    valid C++?
  
   Note that gimple is still a typedef to
 gimple_statement_base *
   (as noted before, the gimple - gimple * change would break everyone
   else's patches, so we talked about that as a followup patch for early
   stage3).
  
   Given that, if I try to create an operator () outside of a class, I
   get this error:
  
   ‘gassign* operator()(gimple)’ must be a nonstatic member function
  
   which is emitted from cp/decl.c's grok_op_properties:
 /* An operator function must either be a non-static member 
   function
or have at least one parameter of a class, a reference to a 
   class,
an enumeration, or a reference to an enumeration.  13.4.0.6 */
  
   I tried making it a member function of gimple_statement_base, but that
   doesn't work either: we want a conversion
 from a gimple_statement_base * to a gassign *, not
 from a gimple_statement_base   to a gassign *.
  
   Is there some syntactic trick here that I'm missing?  Sorry if I'm being
   dumb (I can imagine there's a way of doing it by making gimple become
   some kind of wrapped ptr class, but that way lies madness, surely).
 
  Hmm.
 
  struct assign;
  struct base {
operator assign *() const { return (assign *)this; }

Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 3:59 AM, Jeff Law l...@redhat.com wrote:
 On 11/17/14 13:05, Ilya Enkovich wrote:


  How come you emit debug info for functions that do not exist and thus
  are never used?  Is the problem caused by builtins going after
  END_CHKP_BUILTINS?  Or is some info generated for all builtins
  regardless of whether they exist?

 IIRC, stabs aren't pruned based on whether or not they're used.  So every
 file that includes tree-core.h is going to have every name/value pair for
 the enum.  Hence the major heartburn for AIX.

I wonder how AIX deals with the ever increasing number of target builtins
then.

Well.  Why can't we just drop enum values that exceed the stabs format?
At least I don't understand why we cannot fix this in the stabs output.
After all what happens to non-GCC code with such large enums?

Richard.

 jeff


Re: [testsuite,ARM] PR61153 Fix vbic and vorn tests

2014-11-18 Thread Ramana Radhakrishnan
On Mon, Nov 10, 2014 at 3:02 PM, Christophe Lyon
christophe.l...@linaro.org wrote:
 On 30 October 2014 23:02, Christophe Lyon christophe.l...@linaro.org wrote:
 On 29 October 2014 16:28, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Wed, Oct 29, 2014 at 3:26 PM, Christophe Lyon
 christophe.l...@linaro.org wrote:
 Hi,

 In PR61153, the vbic and vorn tests fail because when compiled at -O0
 the expected Neon instructions are not generated, making
 scan-assembler fail.

 This patch:
 - replaces -O0 by -O2
 - moves the declaration of local variables used as intrinsics
 parameters and results to global declarations, to prevent the compiler
 from optimizing the whole test away.

 OK?


 If you really want to do it , do it in neon-testgen.ml and do it for
 the whole lot.


 I thought it wasn't used anymore.

 At -O2 I have many more failures :-(

 (vdup, vget_lane, vget_low, vmov, vset_lane)

 And -O1 doesn't do the trick either...

 Christophe.

 Hi Ramana,

 Based on your request and on my above comment, I have modified
 neon-testgen.ml so that it sets -O2 in some testcases and -O0 by
 default.

 I've achieved this by adding a new value in the 'features' list in
 neon.ml, and set it to -O2 for vbic and vorn cases. neon-testgen.ml
 parses this value, and defaults to -O0.
 When not -O0, it also moves the input/output variables to global scope
  to prevent the whole test from being optimized out.

 I also fixed 3 warnings about 'or' being deprecated in neon.ml.

 OK?

Ok , thanks.

Ramana

 Christophe.

 2014-11-10  Christophe Lyon  christophe.l...@linaro.org

 gcc/
 * config/arm/neon-testgen.ml (emit_prologue): Handle new
 compile_test_optim argument.
 (emit_automatics): Rename to emit_variables. Support variable
 indentation of its output.
 (compile_test_optim): New function.
 (test_intrinsic): Call compile_test_optim.
 * config/arm/neon.ml (features): Add Compiler_optim.
 (ops): Add Compiler_optim feature to Vbic and Vorn.
 (type_in_crypto_only): Replace 'or' by '||'.
 (reinterp): Likewise.
 (reinterpq): Likewise.

 testsuite/
 * gcc.target/arm/neon/vbicQs16.c: Regenerate.
 * gcc.target/arm/neon/vbicQs32.c: Likewise.
 * gcc.target/arm/neon/vbicQs64.c: Likewise.
 * gcc.target/arm/neon/vbicQs8.c: Likewise.
 * gcc.target/arm/neon/vbicQu16.c: Likewise.
 * gcc.target/arm/neon/vbicQu32.c: Likewise.
 * gcc.target/arm/neon/vbicQu64.c: Likewise.
 * gcc.target/arm/neon/vbicQu8.c: Likewise.
 * gcc.target/arm/neon/vbics16.c: Likewise.
 * gcc.target/arm/neon/vbics32.c: Likewise.
 * gcc.target/arm/neon/vbics64.c: Likewise.
 * gcc.target/arm/neon/vbics8.c: Likewise.
 * gcc.target/arm/neon/vbicu16.c: Likewise.
 * gcc.target/arm/neon/vbicu32.c: Likewise.
 * gcc.target/arm/neon/vbicu64.c: Likewise.
 * gcc.target/arm/neon/vbicu8.c: Likewise.
 * gcc.target/arm/neon/vornQs16.c: Likewise.
 * gcc.target/arm/neon/vornQs32.c: Likewise.
 * gcc.target/arm/neon/vornQs64.c: Likewise.
 * gcc.target/arm/neon/vornQs8.c: Likewise.
 * gcc.target/arm/neon/vornQu16.c: Likewise.
 * gcc.target/arm/neon/vornQu32.c: Likewise.
 * gcc.target/arm/neon/vornQu64.c: Likewise.
 * gcc.target/arm/neon/vornQu8.c: Likewise.
 * gcc.target/arm/neon/vorns16.c: Likewise.
 * gcc.target/arm/neon/vorns32.c: Likewise.
 * gcc.target/arm/neon/vorns64.c: Likewise.
 * gcc.target/arm/neon/vorns8.c: Likewise.
 * gcc.target/arm/neon/vornu16.c: Likewise.
 * gcc.target/arm/neon/vornu32.c: Likewise.
 * gcc.target/arm/neon/vornu64.c: Likewise.
 * gcc.target/arm/neon/vornu8.c: Likewise.

 regards
 Ramana
 Christophe.

 2014-10-29  Christophe Lyon  christophe.l...@linaro.org

 PR target/61153
 * gcc.target/arm/neon/vbicQs16.c: Compile at -O2 and move variable
 declarations from local to global.
 * gcc.target/arm/neon/vbicQs16.c: Likewise.
 * gcc.target/arm/neon/vbicQs32.c: Likewise.
 * gcc.target/arm/neon/vbicQs64.c: Likewise.
 * gcc.target/arm/neon/vbicQs8.c: Likewise.
 * gcc.target/arm/neon/vbicQu16.c: Likewise.
 * gcc.target/arm/neon/vbicQu32.c: Likewise.
 * gcc.target/arm/neon/vbicQu64.c: Likewise.
 * gcc.target/arm/neon/vbicQu8.c: Likewise.
 * gcc.target/arm/neon/vbics16.c: Likewise.
 * gcc.target/arm/neon/vbics32.c: Likewise.
 * gcc.target/arm/neon/vbics64.c: Likewise.
 * gcc.target/arm/neon/vbics8.c: Likewise.
 * gcc.target/arm/neon/vbicu16.c: Likewise.
 * gcc.target/arm/neon/vbicu32.c: Likewise.
 * gcc.target/arm/neon/vbicu64.c: Likewise.
 * gcc.target/arm/neon/vbicu8.c: Likewise.
 * gcc.target/arm/neon/vornQs16.c: Likewise.
 * gcc.target/arm/neon/vornQs32.c: Likewise.
 * gcc.target/arm/neon/vornQs64.c: Likewise.
 * gcc.target/arm/neon/vornQs8.c: Likewise.
 * gcc.target/arm/neon/vornQu16.c: Likewise.
 * gcc.target/arm/neon/vornQu32.c: Likewise.
 * gcc.target/arm/neon/vornQu64.c: Likewise.
 

Re: [PATCH][ARM] Use std::swap instead of manually swapping

2014-11-18 Thread Ramana Radhakrishnan
On Thu, Nov 13, 2014 at 9:42 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 Following the trend in i386 and alpha, this patch uses std::swap to perform
 swapping of values in the arm backend instead of declaring temporaries.
 Tested and bootstrapped on arm-none-linux-gnueabihf.

 Ok for trunk?

OK - Thanks for the cleanup.

Thanks,
Ramana


 Thanks,
 Kyrill

 2014-11-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/arm.md (unaligned_loaddi): Use std::swap instead of
 manual swapping implementation.
 (movcond_addsi): Likewise.
 * config/arm/arm.c (arm_canonicalize_comparison): Likewise.
 (arm_select_dominance_cc_mode): Likewise.
 (arm_reload_out_hi): Likewise.
 (gen_operands_ldrd_strd): Likewise.
 (output_move_double): Likewise.
 (arm_print_operand_address): Likewise.
 (thumb_output_move_mem_multiple): Likewise.
 (SWAP_RTX): Delete.


Re: [PATCH] Add memory barriers to xbegin/xend/xabort

2014-11-18 Thread Richard Henderson
On 10/29/2014 04:31 AM, Andi Kleen wrote:
 2014-10-28  Andi Kleen  a...@linux.intel.com
 
   PR target/63672
   * config/i386/i386.c (ix86_expand_builtin): Generate memory
   barrier after abort.
   * config/i386/i386.md (xbegin): Add memory barrier.
   (xend): Rename to ...
   (xend_1): New. Generate memory barrier and emit xend.

Richi's comment is spot on.

The insns themselves should hold the barrier, not being separate like

 
 -(define_insn "xend"
 +(define_insn "xend_1"
    [(unspec_volatile [(const_int 0)] UNSPECV_XEND)]
    "TARGET_RTM"
    "xend"
    [(set_attr "type" "other")
     (set_attr "length" "3")])
  
 +(define_expand "xend"
 +  [(set (match_dup 0)
 +	(unspec:BLK [(const_int 0)] UNSPECV_XEND))] /* or match_dup 0 ? */
 +  "TARGET_RTM"
 +{
 +  emit_insn (gen_xend_1 ());
 +
 +  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
 +  MEM_VOLATILE_P (operands[0]) = 1;
 +
 +  DONE;
 +})

this, which generates two separate insns.  C.f. sse2_lfence.


r~


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Marcus Shawcroft
On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote:

 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64.md (doloop_end): New pattern.
 +

This looks like a straight copy of the ARM code, but without the
TARGET_CAN_USE_DOLOOP_P definition.  If the reason for including this
code is for the benefit of modulo-sched then should the hook be
defined to limit the use of this pattern to inner most loops only?

Cheers
/Marcus


Re: [patch] fix expand_builtin_init_dwarf_reg_sizes wrt register spans

2014-11-18 Thread Olivier Hainque

On Nov 18, 2014, at 04:01 , Jeff Law l...@redhat.com wrote:
 Best for Jason, Richard or Jakub.  My knowledge of dwarf2 and our 
 implementation in dwarf*out.c is minimal at best.

 Thanks for your answer Jeff. Richard  Jason have provided feedback
 (thanks for this as well :) on which I'll followup.

 Olivier




Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.

2014-11-18 Thread Christophe Lyon
On 18 November 2014 10:14, David Sherwood david.sherw...@arm.com wrote:
 Hi Christophe,

 Ah sorry. My mistake - it fixes this in bugzilla:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810

I did look at that PR, but since it has no testcase attached, I was unsure.
And I am still not :-)
PR 59810 is [AArch64] LDn/STn implementations are not ABI-conformant
for bigendian.
but the advsimd-intrinsics/vldX.c and vldX_lane.c now PASS with Alan's
patches on aarch64_be, so I thought Alan's patches solve PR59810.

What am I missing?




 This change is needed in order to remove the CANNOT_CHANGE_MODE_CLASS
 #define, which will be committed as a separate patch.

 Regards,
 David Sherwood.

 -Original Message-
 From: Christophe Lyon [mailto:christophe.l...@linaro.org]
 Sent: 17 November 2014 21:09
 To: David Sherwood
 Cc: gcc-patches@gcc.gnu.org; Alan Hayward
 Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 On 13 November 2014 15:32, David Sherwood david.sherw...@arm.com wrote:
 Hi,

 I think that's because Alan Hayward's original patch had a bug in it, which 
 he
 has fixed and should be submitting to gcc patches fairly soon. So it's 
 probably
 best waiting until he gets a new patch up I think.


 I've applied both Alan's patches and the advsimd-intrinsics tests now
 all pass on aarch64_be, but this doesn't need your patch.

 Which testcase does your patch actually fix?


 Regards,
 David.

 -Original Message-
 From: Christophe Lyon [mailto:christophe.l...@linaro.org]
 Sent: 13 November 2014 14:22
 To: David Sherwood
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: New patch: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 On 13 November 2014 11:09, David Sherwood david.sherw...@arm.com wrote:
 Hi All,

 I have successfully rebased this and tested in conjunction with a patch from
 Alan Hayward ([AArch64] [BE] Fix vector load/stores to not use ld1/st1), who
 should be submitting a new version shortly. Built and tested on:

 aarch64-none-elf
 aarch64_be-none-elf
 x86_64-linux-gnu

 I've applied both patches to recent trunk (r217483), and I still see
 ICEs on aarch64_be, when running the advsimd-intrinsics/vldX.c tests.

 I have them with Alan Hayward's patch alone, too. BTW, I thought that
 patch had been approved by Marcus, but it's not in trunk yet. Maybe I
 missed something.

 Christophe.


 Regards,
 David Sherwood.

 -Original Message-
 From: David Sherwood [mailto:david.sherw...@arm.com]
 Sent: 28 October 2014 08:55
 To: 'gcc-patches@gcc.gnu.org'
 Subject: RE: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 Hi,

 Sorry to bother you again. Could someone take a look at this change
 please if they have time?

 Thanks!
 David.

 -Original Message-
 From: David Sherwood [mailto:david.sherw...@arm.com]
 Sent: 10 October 2014 15:48
 To: gcc-patches@gcc.gnu.org
 Subject: [AArch64] [BE] [1/2] Make large opaque integer modes 
 endianness-safe.

 Hi,

 I have a fix (originally written by Tejas Belagod) for the following bug:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810

 Could someone take a look please?

 Thanks!
 David Sherwood.

 ChangeLog:

 gcc/:
 2014-11-13  David Sherwood  david.sherw...@arm.com

 * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist,
 aarch64_reverse_mask): New decls.
 * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum.
 * config/aarch64/iterators.md (insn_count): New mode_attr.
* config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i,
 vec_load_lanes(o/c/x)i): Fixed to work for Big Endian.
 * config/aarch64/aarch64-simd.md (aarch64_rev_reglist,
 aarch64_simd_(ld/st)(2/3/4)): Added.
 * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist,
 aarch64_reverse_mask): Added.










Re: [AArch64, Patch] Restructure arm_neon.h vector types's implementation(Take 2).

2014-11-18 Thread James Greenhalgh
On Wed, Nov 05, 2014 at 11:31:24PM +, James Greenhalgh wrote:
 On Wed, Nov 05, 2014 at 09:50:52PM +, Marc Glisse wrote:
  Thanks. Do you know if anyone is planning to port this patch to the arm 
  target (which IIRC has the same issue)? No pressure, this is just so I 
  know if I should ping the alternate __float128 patch or wait.
 
 Yes, you're right, the arm target also has this issue. I have a port of
 Tejas' patch based on top of some other refactoring of the arm
 target's builtin infrastructure I've been working on - I'm hoping to get
 that all rebased and proposed upstream in the next couple of weeks.

This is all done now (r217700), so your patch should be safe to go in.

Thanks
James



[PATCH][AArch64][1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-18 Thread Kyrill Tkachov

Hi all,

This is another iteration of the patch posted earlier, with a fix for the
bug pointed out by Andrew.
It implements the target hooks for macro fusion, implements fusion of
MOV+MOVK instructions,

and enables them for cores that support these fusions.
Bootstrapped and tested aarch64-none-elf.

Ok for trunk?

Thanks,
Kyrill

2014-11-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64-protos.h (struct tune_params): Add
fuseable_ops field.
* config/aarch64/aarch64.c (generic_tunings): Specify fuseable_ops.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(thunderx_tunings): Likewise.
(aarch64_macro_fusion_p): New function.
(aarch_macro_fusion_pair_p): Likewise.
(TARGET_SCHED_MACRO_FUSION_P): Define.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
(AARCH64_FUSE_MOV_MOVK): Likewise.
(AARCH64_FUSE_NOTHING): Likewise.

commit e9d650bc71fe7549765f750fcdb47a759658c21f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Tue Oct 21 10:36:48 2014 +0100

[AArch64] Implement TARGET_MACRO_FUSION

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 470b9eb..9e0ff8c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -170,6 +170,7 @@ struct tune_params
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
   const int issue_rate;
+  const unsigned int fuseable_ops;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 07f75e1..5ece23a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -302,6 +302,9 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   NAMED_PARAM (cond_not_taken_branch_cost, 1)
 };
 
+#define AARCH64_FUSE_NOTHING	(0)
+#define AARCH64_FUSE_MOV_MOVK	(1 << 0)
+
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
 #endif
@@ -312,7 +315,8 @@ static const struct tune_params generic_tunings =
   generic_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -322,7 +326,8 @@ static const struct tune_params cortexa53_tunings =
   cortexa53_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -332,7 +337,8 @@ static const struct tune_params cortexa57_tunings =
   cortexa57_regmove_cost,
   cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
-  NAMED_PARAM (issue_rate, 3)
+  NAMED_PARAM (issue_rate, 3),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
 };
 
 static const struct tune_params thunderx_tunings =
@@ -342,7 +348,8 @@ static const struct tune_params thunderx_tunings =
   thunderx_regmove_cost,
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
-  NAMED_PARAM (issue_rate, 2)
+  NAMED_PARAM (issue_rate, 2),
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
 };
 
 /* A processor implementing AArch64.  */
@@ -9979,6 +9986,59 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size,
   return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
 }
 
+/* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if target supports
+   instruction fusion of some sort.  */
+
+static bool
+aarch64_macro_fusion_p (void)
+{
  return aarch64_tune_params->fuseable_ops != AARCH64_FUSE_NOTHING;
+}
+
+
+/* Implement TARGET_SCHED_MACRO_FUSION_PAIR_P.  Return true if PREV and CURR
+   should be kept together during scheduling.  */
+
+static bool
+aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+{
+  rtx set_dest;
+  rtx prev_set = single_set (prev);
+  rtx curr_set = single_set (curr);
+  /* prev and curr are simple SET insns i.e. no flag setting or branching.  */
+  bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr);
+
+  if (!aarch64_macro_fusion_p ())
+return false;
+
+  if (simple_sets_p
+      && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_MOV_MOVK))
+{
+  /* We are trying to match:
+ prev (mov)  == (set (reg r0) (const_int imm16))
+ curr (movk) == (set (zero_extract (reg r0)
+   (const_int 16)
+   (const_int 16))
+ (const_int imm16_1))  */
+
+  set_dest = SET_DEST (curr_set);
+
+  if (GET_CODE (set_dest) == ZERO_EXTRACT
+      && CONST_INT_P (SET_SRC (curr_set))
+      && CONST_INT_P (SET_SRC (prev_set))
+      && CONST_INT_P (XEXP (set_dest, 2))
+      && INTVAL (XEXP (set_dest, 2)) == 16
+      && REG_P (XEXP (set_dest, 0))
+      && REG_P (SET_DEST (prev_set))
+      && REGNO (XEXP 

[PATCH][AArch64][0/5] Implement macro fusion hooks

2014-11-18 Thread Kyrill Tkachov

Hi all,

I've decided to organise the aarch64 macro fusion patches into a series.
Some of them have been posted last week and gotten review feedback
and have been reworked:

- [1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P:
This implements the hook itself and adds logic for fusing
MOV+MOVK instructions as used when synthesising immediates.

- [2/5] Implement adrp+add fusion:
Implements fusion of ADRP+ADD instructions when the
appropriate register dependencies are met.

- [3/5] Implement fusion of MOVK+MOVK:
Implements fusion of MOVK+MOVK instructions as used when
synthesising immediates.

- [4/5] Implement fusion of ARDP+LDR:
Implements fusion of ADRP+LDR instructions when the appropriate
register constraints are met.

- [5/5] Add macro fusion support for cmp/b.X for ThunderX:
This is a rebase of Andrew's patch on top of the previous four
and implements fusion of flag-setting instructions together
with a conditional branch.

These have been bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill



[PATCH][AArch64][2/5] Implement adrp+add fusion

2014-11-18 Thread Kyrill Tkachov

Hi all,

This patch is just rebased on top of the changes from the previous patch 
in the series.
Otherwise it's the same as 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01263.html with some style
cleanup.


There can be cases where we miss fusion of adrp+add because although 
they are generated together (in aarch64_load_symref_appropriately),
combine can sometimes combine the losym part with the instruction after 
it and we end up with an instruction stream where there is an insn between 
the two, preventing the fusion in sched1.
We still catch enough cases to make this approach worthwhile and the 
above-mentioned exceptions can be mitigated in the future (for example, 
by somehow delaying the generation of the adrp,add RTL after combine but 
before sched1)


Tested and bootstrapped on aarch64-none-linux-gnu.
Ok for trunk?

2014-11-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64.c: Include tm-constrs.h
(AARCH64_FUSE_ADRP_ADD): Define.
(cortexa57_tunings): Add AARCH64_FUSE_ADRP_ADD to fuseable_ops.
(cortexa53_tunings): Likewise.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_ADD.

commit 248ec70cfac6cb552a427b4336a3340bb25a5e53
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Thu Nov 6 12:05:26 2014 +

[AArch64] Fuse ADRP+ADD

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 502ba6d..03ae7c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -77,6 +77,7 @@
 #include dumpfile.h
 #include builtins.h
 #include rtl-iter.h
+#include tm-constrs.h
 
 /* Defined for convenience.  */
 #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
@@ -304,6 +305,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 
 #define AARCH64_FUSE_NOTHING	(0)
 #define AARCH64_FUSE_MOV_MOVK	(1 << 0)
+#define AARCH64_FUSE_ADRP_ADD	(1 << 1)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -327,7 +329,7 @@ static const struct tune_params cortexa53_tunings =
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
+  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD))
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -338,7 +340,7 @@ static const struct tune_params cortexa57_tunings =
   cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 3),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK)
+  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD))
 };
 
 static const struct tune_params thunderx_tunings =
@@ -10037,6 +10039,32 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
+  if (simple_sets_p
+      && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_ADRP_ADD))
+{
+
+  /*  We're trying to match:
+      prev (adrp) == (set (reg r1)
+                          (high (symbol_ref ("SYM"))))
+      curr (add) == (set (reg r0)
+                         (lo_sum (reg r1)
+                                 (symbol_ref ("SYM"))))
+      Note that r0 need not necessarily be the same as r1, especially
+      during pre-regalloc scheduling.  */
+
+  if (satisfies_constraint_Ush (SET_SRC (prev_set))
+      && REG_P (SET_DEST (prev_set)) && REG_P (SET_DEST (curr_set)))
+    {
+      if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+          && REG_P (XEXP (SET_SRC (curr_set), 0))
+          && REGNO (XEXP (SET_SRC (curr_set), 0))
+             == REGNO (SET_DEST (prev_set))
+          && rtx_equal_p (XEXP (SET_SRC (prev_set), 0),
+                          XEXP (SET_SRC (curr_set), 1)))
+        return true;
+    }
+}
+
   return false;
 }
 

[PATCH][AArch64][3/5] Implement fusion of MOVK+MOVK

2014-11-18 Thread Kyrill Tkachov

Hi all,

Following up on the previous instruction fusion patches, this one 
implements the fusion of instruction pairs of the form:

movk Xn, imm16_1 lsl 32
movk Xn, imm16_2 lsl 48

which is usually generated as part of the immediate synthesis code. For 
some cores we don't want to schedule them apart.
These insns are represented in RTL as a SET to a ZERO_EXTRACT so we 
match for that case.


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2014-11-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64.c (AARCH64_FUSE_MOVK_MOVK): Define.
(cortexa53_tunings): Specify AARCH64_FUSE_MOVK_MOVK in fuseable_ops.
(cortexa57_tunings): Likewise.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_MOVK_MOVK.

commit 161e7901d387fa2daf0ea34dd5df4703916435e0
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Wed Nov 12 17:36:56 2014 +

[AArch64] Implement fusion of MOVK+MOVK

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1eb066c..c3c29ed 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -306,6 +306,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 #define AARCH64_FUSE_NOTHING	(0)
 #define AARCH64_FUSE_MOV_MOVK	(1 << 0)
 #define AARCH64_FUSE_ADRP_ADD	(1 << 1)
+#define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -329,7 +330,7 @@ static const struct tune_params cortexa53_tunings =
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD))
+  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK))
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -340,7 +341,7 @@ static const struct tune_params cortexa57_tunings =
   cortexa57_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 3),
-  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD))
+  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK))
 };
 
 static const struct tune_params thunderx_tunings =
@@ -10430,6 +10431,36 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
+  if (simple_sets_p
+      && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_MOVK_MOVK))
+{
+
+  /* We're trying to match:
+ prev (movk) == (set (zero_extract (reg r0)
+   (const_int 16)
+   (const_int 32))
+ (const_int imm16_1))
+ curr (movk) == (set (zero_extract (reg r0)
+   (const_int 16)
+   (const_int 48))
+ (const_int imm16_2))  */
+
+      if (GET_CODE (SET_DEST (prev_set)) == ZERO_EXTRACT
+          && GET_CODE (SET_DEST (curr_set)) == ZERO_EXTRACT
+          && REG_P (XEXP (SET_DEST (prev_set), 0))
+          && REG_P (XEXP (SET_DEST (curr_set), 0))
+          && REGNO (XEXP (SET_DEST (prev_set), 0))
+             == REGNO (XEXP (SET_DEST (curr_set), 0))
+          && CONST_INT_P (XEXP (SET_DEST (prev_set), 2))
+          && CONST_INT_P (XEXP (SET_DEST (curr_set), 2))
+          && INTVAL (XEXP (SET_DEST (prev_set), 2)) == 32
+          && INTVAL (XEXP (SET_DEST (curr_set), 2)) == 48
+          && CONST_INT_P (SET_SRC (prev_set))
+          && CONST_INT_P (SET_SRC (curr_set)))
+        return true;
+
+}
+
   return false;
 }
 

Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-18 Thread Martin Liška

On 11/14/2014 05:06 PM, Jan Hubicka wrote:


In a way I would like to see these to be methods of the underlying type rather 
than
virtual methods of the summary, because these are operations on the data 
themselves.
I was thinking to model these by special constructor and copy constructor
(taking the extra node pointer parameters) and standard destructor.  I am not 
sure this
would be more understandable this way?


Motivation for this implementation is:
a) it's useful to have access to the cgraph_node that is associated with a summary


Yep, one would have node addition
  ctor (symtab_node *); (or cgraph/varpool nodes for cgraph/varpool annotations)
that would default to ctor for implementations that do not care about node.
And node duplication ctor
  ctor (summary &, symtab_node *, symtab_node *)
that would default to copy constructor for data that do not need to be copied.


Hello.

I have no problem with such construction and destruction, we can also provide
base implementation.


I would say that the main advantage (in addition to having a way to provide
reasonable defaults) is to make ctors/dtors of the embedded classes work well,
so one can
for example embed pointer_map and not care about its construction/destruction.


b) with GTY, we cannot call destructor


Everything in the symbol table is explicitly memory managed (i.e. never left
to be freed by the garbage collector). It resides in GTY only to allow linking
garbage collected objects from it and to get PCH working.


However GTY types need to be allocated by ggc_alloc and one can't call dtor.
This was main motivation for providing hooks instead of ctor/dtor API.
Maybe I miss something?

Thanks,
Martin



This is however quite a cosmetic issue I would prefer our C++ guys to comment
on.  We can
tweak this incrementally.

+void
+inline_summary_t::duplicate (cgraph_node *src,
+cgraph_node *dst,
+inline_summary *,
+inline_summary *info)


Also we should have a way to say that the annotation does not need to be 
duplicated (for example
when we do not want to annotate inline clones). Probably by adding duplicate_p 
predicate that
is called before the actual duplication happens?

The updated patch is OK, I will take a look on the main patch.

Honza

  {
-  struct inline_summary *info;
inline_summary_alloc ();
-  info = inline_summary (dst);
-  memcpy (info, inline_summary (src), sizeof (struct inline_summary));
+  memcpy (info, inline_summary_d->get (src), sizeof (inline_summary));
    /* TODO: as an optimization, we may avoid copying conditions
       that are known to be false or true.  */
    info->conds = vec_safe_copy (info->conds);
@@ -1328,7 +1309,7 @@ free_growth_caches (void)

  static void
  dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node,
- struct inline_summary *info)
+ inline_summary *info)
  {
struct cgraph_edge *edge;
 for (edge = node->callees; edge; edge = edge->next_callee)
@@ -1345,8 +1326,8 @@ dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node,
    ? "inlined" : cgraph_inline_failed_string (edge->inline_failed),
    indent, "", es->loop_depth, edge->frequency,
    es->call_stmt_size, es->call_stmt_time,
-  (int) inline_summary (callee)->size / INLINE_SIZE_SCALE,
-  (int) inline_summary (callee)->estimated_stack_size);
+  (int) inline_summary_d->get (callee)->size / INLINE_SIZE_SCALE,
+  (int) inline_summary_d->get (callee)->estimated_stack_size);

 if (es->predicate)
 {
@@ -1372,9 +1353,9 @@ dump_inline_edge_summary (FILE *f, int indent, struct cgraph_node *node,
   fprintf (f, "%*sStack frame offset %i, callee self size %i, callee size %i\n",
    indent + 2, "",
-  (int) inline_summary (callee)->stack_frame_offset,
-  (int) inline_summary (callee)->estimated_self_stack_size,
-  (int) inline_summary (callee)->estimated_stack_size);
+  (int) inline_summary_d->get (callee)->stack_frame_offset,
+  (int) inline_summary_d->get (callee)->estimated_self_stack_size,
+  (int) inline_summary_d->get (callee)->estimated_stack_size);
  dump_inline_edge_summary (f, indent + 2, callee, info);
}
  }
@@ -1402,7 +1383,7 @@ dump_inline_summary (FILE *f, struct cgraph_node *node)
  {
if (node-definition)
  {
-  struct inline_summary *s = inline_summary (node);
+  inline_summary *s = inline_summary_d->get (node);
    size_time_entry *e;
    int i;
    fprintf (f, "Inline summary for %s/%i", node->name (),
@@ -1725,7 +1706,7 @@ eliminated_by_inlining_prob (gimple stmt)

  static void
  set_cond_stmt_execution_predicate (struct ipa_node_params *info,
-  struct inline_summary *summary,
+

[PATCH][AArch64][4/5] Implement fusion of ARDP+LDR

2014-11-18 Thread Kyrill Tkachov

Hi all,

Sometimes we want to fuse adrp+ldr pairs as described in the comment in 
the patch.

This is enabled by default for the Cortex-A53.

Bootstrapped and tested on aarch64-none-linux.

Ok for trunk?

Thanks,
Kyrill

2014-11-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64.c (AARCH64_FUSE_ADRP_LDR): Define.
(cortexa53_tunings): Specify AARCH64_FUSE_ADRP_LDR in fuseable_ops.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_ADRP_LDR.

commit ad175271f82e0330894bfe894e86f7ad8a4b6cce
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Thu Nov 13 16:48:18 2014 +

[AArch64][tmp] Fuse ADRP+LDR

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 85e08d0..132535c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -307,6 +307,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 #define AARCH64_FUSE_MOV_MOVK	(1 << 0)
 #define AARCH64_FUSE_ADRP_ADD	(1 << 1)
 #define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
+#define AARCH64_FUSE_ADRP_LDR	(1 << 3)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -330,7 +331,8 @@ static const struct tune_params cortexa53_tunings =
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 4),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK))
+  NAMED_PARAM (fuseable_ops, (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+ | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR))
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -10467,6 +10469,37 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 
 }
+  if (simple_sets_p
+      && (aarch64_tune_params->fuseable_ops & AARCH64_FUSE_ADRP_LDR))
+{
+  /* We're trying to match:
+      prev (adrp) == (set (reg r0)
+                          (high (symbol_ref ("SYM"))))
+      curr (ldr) == (set (reg r1)
+                         (mem (lo_sum (reg r0)
+                                      (symbol_ref ("SYM")))))
+     or
+      curr (ldr) == (set (reg r1)
+                         (zero_extend (mem
+                                       (lo_sum (reg r0)
+                                               (symbol_ref ("SYM"))))))  */
+  if (satisfies_constraint_Ush (SET_SRC (prev_set))
+      && REG_P (SET_DEST (prev_set)) && REG_P (SET_DEST (curr_set)))
+    {
+      rtx curr_src = SET_SRC (curr_set);
+
+      if (GET_CODE (curr_src) == ZERO_EXTEND)
+        curr_src = XEXP (curr_src, 0);
+
+      if (MEM_P (curr_src) && GET_CODE (XEXP (curr_src, 0)) == LO_SUM
+          && REG_P (XEXP (XEXP (curr_src, 0), 0))
+          && REGNO (XEXP (XEXP (curr_src, 0), 0))
+             == REGNO (SET_DEST (prev_set))
+          && rtx_equal_p (XEXP (XEXP (curr_src, 0), 1),
+                          XEXP (SET_SRC (prev_set), 0)))
+        return true;
+    }
+}
 
   return false;
 }

[PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX

2014-11-18 Thread Kyrill Tkachov

Hi all,

This is a rebase of Andrew's CMP+BRANCH fusion patch on top of my macro 
fusion patches.

I've assigned the value (1 << 4) to AARCH64_FUSE_CMP_BRANCH.

I've given it a test on top of my fusion patches.

Ok for trunk together with the rest?

2014-11-14  Andrew Pinski  apin...@cavium.com

* config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
(thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
(aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.

commit 619f6c056e80f284a834d2b24e6a9c1f933a2dd5
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Nov 14 09:16:08 2014 +

[AArch64][apinski] CMP+branch macro fusion

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e63269..d0e52b0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -308,6 +308,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 #define AARCH64_FUSE_ADRP_ADD	(1 << 1)
 #define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
 #define AARCH64_FUSE_ADRP_LDR	(1 << 3)
+#define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -353,7 +354,7 @@ static const struct tune_params thunderx_tunings =
   generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
 };
 
 /* A processor implementing AArch64.  */
@@ -10133,6 +10134,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
+  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
+      && any_condjump_p (curr))
+    {
+      /* FIXME: this misses some which are considered simple arithmetic
+         instructions for ThunderX.  Simple shifts are missed here.  */
+      if (get_attr_type (prev) == TYPE_ALUS_SREG
+          || get_attr_type (prev) == TYPE_ALUS_IMM
+          || get_attr_type (prev) == TYPE_LOGICS_REG
+          || get_attr_type (prev) == TYPE_LOGICS_IMM)
+	return true;
+    }
+
   return false;
 }
 

Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-18 Thread Marc Glisse

On Mon, 17 Nov 2014, Richard Biener wrote:


On Sun, Nov 16, 2014 at 6:53 PM, Marc Glisse marc.gli...@inria.fr wrote:

On Sun, 16 Nov 2014, Richard Biener wrote:


I think the element_mode is the way to go.



The following passed bootstrap+testsuite.

2014-11-16  Marc Glisse  marc.gli...@inria.fr

* tree.c (element_mode, integer_truep): New functions.
* tree.h (element_mode, integer_truep): Declare them.
* fold-const.c (negate_expr_p, fold_negate_expr, combine_comparisons,
fold_cond_expr_with_comparison, fold_real_zero_addition_p,
fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p,
fold_strip_sign_ops): Use element_mode.
(fold_binary_loc): Use element_mode and element_precision.
* match.pd: Use integer_truep, element_mode, element_precision,
VECTOR_TYPE_P and build_one_cst. Extend some transformations to
vectors. Simplify A/-A.


-	   && ! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
-		 && TYPE_MODE (type) == TYPE_MODE (inter_type))
+	   && ! (final_prec != element_precision (type)
+		 && element_mode (type) == element_mode (inter_type))

isn't a 1:1 conversion - please use

   final_prec != GET_MODE_PRECISION (element_mode (type))

your version is final_prec != final_prec.


Good catch, I was doing those replacements too fast to be really safe :-(


The tree.c:element_mode function lacks a function comment.

Ok with those fixed.


I am attaching the version I committed.

I'll try to replace some more TYPE_MODE during stage3...

Thanks,

--
Marc Glisse

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 217701)
+++ gcc/tree.c  (revision 217702)
@@ -2268,20 +2268,34 @@ integer_nonzerop (const_tree expr)
 {
   STRIP_NOPS (expr);
 
   return ((TREE_CODE (expr) == INTEGER_CST
	   && !wi::eq_p (expr, 0))
	  || (TREE_CODE (expr) == COMPLEX_CST
	      && (integer_nonzerop (TREE_REALPART (expr))
		  || integer_nonzerop (TREE_IMAGPART (expr)))));
 }
 
+/* Return 1 if EXPR is the integer constant one.  For vector,
+   return 1 if every piece is the integer constant minus one
+   (representing the value TRUE).  */
+
+int
+integer_truep (const_tree expr)
+{
+  STRIP_NOPS (expr);
+
+  if (TREE_CODE (expr) == VECTOR_CST)
+return integer_all_onesp (expr);
+  return integer_onep (expr);
+}
+
 /* Return 1 if EXPR is the fixed-point constant zero.  */
 
 int
 fixed_zerop (const_tree expr)
 {
   return (TREE_CODE (expr) == FIXED_CST
	  && TREE_FIXED_CST (expr).data.is_zero ());
 }
 
 /* Return the power of two represented by a tree node known to be a
@@ -12303,11 +12317,25 @@ get_base_address (tree t)
 t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
 
   /* ???  Either the alias oracle or all callers need to properly deal
  with WITH_SIZE_EXPRs before we can look through those.  */
   if (TREE_CODE (t) == WITH_SIZE_EXPR)
 return NULL_TREE;
 
   return t;
 }
 
+/* Return the machine mode of T.  For vectors, returns the mode of the
+   inner type.  The main use case is to feed the result to HONOR_NANS,
+   avoiding the BLKmode that a direct TYPE_MODE (T) might return.  */
+
+machine_mode
+element_mode (const_tree t)
+{
+  if (!TYPE_P (t))
+t = TREE_TYPE (t);
+  if (VECTOR_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+t = TREE_TYPE (t);
+  return TYPE_MODE (t);
+}
+
 #include "gt-tree.h"
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 217701)
+++ gcc/tree.h  (revision 217702)
@@ -1557,20 +1557,22 @@ extern void protected_set_expr_location
 #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
 #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
 #define TYPE_CONTEXT(NODE) (TYPE_CHECK (NODE)->type_common.context)
 
 #define TYPE_MODE(NODE) \
   (VECTOR_TYPE_P (TYPE_CHECK (NODE)) \
    ? vector_type_mode (NODE) : (NODE)->type_common.mode)
 #define SET_TYPE_MODE(NODE, MODE) \
   (TYPE_CHECK (NODE)->type_common.mode = (MODE))
 
+extern machine_mode element_mode (const_tree t);
+
 /* The canonical type for this type node, which is used by frontends to
compare the type for equality with another type.  If two types are
equal (based on the semantics of the language), then they will have
equivalent TYPE_CANONICAL entries.
 
As a special case, if TYPE_CANONICAL is NULL_TREE, and thus
TYPE_STRUCTURAL_EQUALITY_P is true, then it cannot
be used for comparison against other types.  Instead, the type is
said to require structural equality checks, described in
TYPE_STRUCTURAL_EQUALITY_P.
@@ -3992,20 +3994,25 @@ extern int integer_minus_onep (const_tre
 /* integer_pow2p (tree x) is nonzero is X is an integer constant with
exactly one bit 1.  */
 
 extern int integer_pow2p (const_tree);
 
 /* integer_nonzerop (tree x) is nonzero if X is an integer constant
    with a nonzero value.  */

[PATCH][ARM] Add -mcpu=cortex-a17.cortex-a7

2014-11-18 Thread Kyrill Tkachov

Hi all,

Following up from adding Cortex-A17 support this patch adds a big.LITTLE 
option cortex-a17.cortex-a7.
Similar to the existing cortex-a15.cortex-a7 support we schedule for 
Cortex-A7 and make the other tuning

decisions as for Cortex-A17.

Tested arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill

2014-11-18  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm-cores.def (cortex-a17.cortex-a7): New entry.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add mcpu=cortex-a17.cortex-a7.
* config/arm/t-aprofile: Add cortex-a17.cortex-a7 entry to
	MULTILIB_MATCHES.

commit aa691f7d5d474f3e7721dfbedefea4b22502b5c0
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Mon Nov 3 15:07:58 2014 +

[ARM] Add cortex-a17.cortex-a7 big.LITTLE support

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index f8003ce..423ee9e 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -162,6 +162,7 @@ ARM_CORE(marvell-pj4,		marvell_pj4, marvell_pj4,	7A,  FL_LDSCHED, 9e)
 
 /* V7 big.LITTLE implementations */
 ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7,	7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
+ARM_CORE(cortex-a17.cortex-a7, cortexa17cortexa7, cortexa7,	7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a12)
 
 /* V8 Architecture Processors */
 ARM_CORE(cortex-a53,	cortexa53, cortexa53,	8A, FL_LDSCHED | FL_CRC32, cortex_a53)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 9d8159f..9b1886e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -301,6 +301,9 @@ EnumValue
 Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7)
 
 EnumValue
+Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7)
+
+EnumValue
 Enum(processor_type) String(cortex-a53) Value(cortexa53)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 7218542..d300c51 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -31,6 +31,6 @@ (define_attr tune
 	cortexa15,cortexa17,cortexr4,cortexr4f,
 	cortexr5,cortexr7,cortexm7,
 	cortexm4,cortexm3,marvell_pj4,
-	cortexa15cortexa7,cortexa53,cortexa57,
-	cortexa57cortexa53
+	cortexa15cortexa7,cortexa17cortexa7,cortexa53,
+	cortexa57,cortexa57cortexa53
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 22a37ae..aa93aa4 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -66,6 +66,7 @@
|mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15   \
|mcpu=cortex-a12|mcpu=cortex-a17			\
|mcpu=cortex-a15.cortex-a7\
+   |mcpu=cortex-a17.cortex-a7\
|mcpu=marvell-pj4	\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
@@ -87,6 +88,7 @@
|mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15   \
|mcpu=cortex-a12|mcpu=cortex-a17			\
|mcpu=cortex-a15.cortex-a7\
+   |mcpu=cortex-a17.cortex-a7\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
|mcpu=cortex-a57.cortex-a53\
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 475aed1..9e2637b 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -85,6 +85,7 @@ MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a15
 MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a12
 MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a17
 MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a15.cortex-a7
+MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a17.cortex-a7
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a53
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57.cortex-a53

Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Yangfei (Felix)
Yes, the major part of the code is borrowed from the ARM port, with the expected mode of the 
loop pseudo changed to DImode. 
The function loop_canon_p called in sms_schedule already makes sure that SMS is only 
performed for innermost loops. 
But I think it's a good idea to implement the TARGET_CAN_USE_DOLOOP_P hook 
here. 
See my updated patch below.  How about this one? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.md (doloop_end): New pattern.
+   * config/aarch64/aarch64.c (TARGET_CAN_USE_DOLOOP_P): Implement.
+
 2014-11-11  Andrew Pinski  apin...@cavium.com
 
Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,47 @@
   [(set_attr type mrs)])
 
 
+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
(define_expand "doloop_end"
  [(use (match_operand 0 "" ""))  ; loop pseudo
   (use (match_operand 1 "" ""))] ; label
  ""
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+  if (optimize > 0 && flag_modulo_sched)
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  if (GET_MODE (operands[0]) != DImode)
+   FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+}
+  else
+FAIL;
+})
+
 ;; AdvSIMD Stuff
 (include aarch64-simd.md)
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include gt-aarch64.h


 
 On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote:
 
  +2014-11-13  Felix Yang  felix.y...@huawei.com
  +
  + * config/aarch64/aarch64.md (doloop_end): New pattern.
  +
 
 This looks like a straight copy of the ARM code, but without the
 TARGET_CAN_USE_DOLOOP_P definition.  If the reason for including this code
 is for the benefit of module-sched then should the hook be defined to limit 
 the
 use of this pattern to inner most loops only?
 
 Cheers
 /Marcus


aarch64-doloop-v4.diff
Description: aarch64-doloop-v4.diff


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Ilya Enkovich
2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the number of builtin codes to the
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at not duplicating every
 builtin in the enumeration since it's a combination of the number of enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

I see.  I can reduce the number of _CHKP builtin codes. Experimental
patch shows resulting END_BUILTINS = 1242.  But we should expect
similar problem for i386 target builds hosted on AIX (here
http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
Current IX86_BUILTIN_MAX is 2467 which is greater than current
END_BUILTINS.

Ilya


Re: [PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX

2014-11-18 Thread Richard Earnshaw
On 18/11/14 10:34, Kyrill Tkachov wrote:
 Hi all,
 
 This is a rebase of Andrew's CMP+BRANCH fusion patch on top of my macro 
 fusion patches.
 I've assigned the number 14 to AARCH64_FUSE_CMP_BRANCH.
 
 I've given it a test on top of my fusion patches.
 
 Ok for trunk together with the rest?
 
 2014-11-14  Andrew Pinski  apin...@cavium.com
 
  * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
  (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
  (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.
 
 
 aarch64-cmp-branch.patch
 
 
 commit 619f6c056e80f284a834d2b24e6a9c1f933a2dd5
 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
 Date:   Fri Nov 14 09:16:08 2014 +
 
 [AArch64][apinski] CMP+branch macro fusion
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 2e63269..d0e52b0 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -308,6 +308,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost 
 =
  #define AARCH64_FUSE_ADRP_ADD	(1 << 1)
  #define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
  #define AARCH64_FUSE_ADRP_LDR	(1 << 3)
 +#define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
  
  #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
  __extension__
 @@ -353,7 +354,7 @@ static const struct tune_params thunderx_tunings =
generic_vector_cost,
NAMED_PARAM (memmov_cost, 6),
NAMED_PARAM (issue_rate, 2),
 -  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
 +  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
  };
  
  /* A processor implementing AArch64.  */
 @@ -10133,6 +10134,18 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
 *curr)
  }
  }
  
 +  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
 +      && any_condjump_p (curr))
 +    {
 +      /* FIXME: this misses some which are considered simple arithmetic
 +         instructions for ThunderX.  Simple shifts are missed here.  */
 +      if (get_attr_type (prev) == TYPE_ALUS_SREG
 +          || get_attr_type (prev) == TYPE_ALUS_IMM
 +          || get_attr_type (prev) == TYPE_LOGICS_REG
 +          || get_attr_type (prev) == TYPE_LOGICS_IMM)
 +	return true;
 +    }
 +

Repeatedly calling get_attr_type on the same insn is a bit wasteful,
despite the caching that's done.  It would be better to call it once and
save the value in a variable.

Also, calling that function may cause prev to become the extracted insn.
 Is it safe to assume that the caller(s) of this code understand that
this might have happened, or do we need to re-extract curr before returning?

return false;
  }
  
 




Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Richard Henderson
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:
 +(define_expand "doloop_end"
 +  [(use (match_operand 0 "" ""))  ; loop pseudo
 +   (use (match_operand 1 "" ""))] ; label
 +  ""
 +{
 +  /* Currently SMS relies on the do-loop pattern to recognize loops
 + where (1) the control part consists of all insns defining and/or
 + using a certain 'count' register and (2) the loop count can be
 + adjusted by modifying this register prior to the loop.
 + ??? The possible introduction of a new block to initialize the
 + new IV can potentially affect branch optimizations.  */
 +  if (optimize > 0 && flag_modulo_sched)

You'd be better off moving this condition into the expansion predicate (which
is currently "").

This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.


r~


[PR63762]GCC generates UNPREDICTABLE STR with Rn = Rt for arm

2014-11-18 Thread Renlin Li

Hi all,

This patch should fix PR63762.

A hard reg allocated during the IRA pass conflicts with a hard reg allocated 
in the LRA pass because of inconsistent information between allocno and reg_pref.


The stale reg_class information in reg_pref is used by LRA while doing 
hard register assignment.  LRA makes the wrong decision and chooses the
same hard reg number for pseudo registers with overlapping 
liveness.



For this particular case manifested on arm target:

(insn 180 183 184 15 (set (mem:SF (post_inc:SI (reg/v/f:SI 223 [ in ])) 
[2 MEM[base: in_68, offset: 4294967292B]+0 S4 A32])
(reg/v:SF 290 [orig:224 val ] [224])) unpreditable-str.c:33 618 
{*movsf_vfp}

 (expr_list:REG_INC (reg/v/f:SI 223 [ in ])
(nil)))

IRA:
r290 (2, SF, GENERAL_REGS) <- r278 (2, SF, GENERAL_REGS) <- r224 (16, SF, 
VFP_LO_REGS)
reg_pref[290] is set following its original reg's (r224) info, which is 
VFP_LO_REGS.


LRA:
r302 (SI, GENERAL_REGS) <- r223 (SI, GENERAL_REGS)

While selecting a hard register for r302, the register class is checked first: 
r290 is VFP_LO_REGS (from reg_pref[290]) and r302 is GENERAL_REGS, so LRA 
concludes they will never conflict.  Thus, hard register number 2 is chosen 
for r302 without a problem.




A testcase is added for all targets, as I think it's a middle-end issue. 
Sorry for not being able to simplify it further.
arm-none-eabi has been tested on the model with no new issues; bootstrap 
and regression testing on x86 also show no new issues.

Is it Okay to commit?

gcc/ChangeLog:

2014-11-18  Renlin Li  renlin...@arm.com

PR middle-end/63762
* ira.c (ira): Update preferred class.

gcc/testsuite/ChangeLog:

2014-11-18  Renlin Li  renlin...@arm.com

PR middle-end/63762
	* gcc.dg/pr63762.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index 9c9e71d..e610d35 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5263,7 +5263,18 @@ ira (FILE *f)
 	  ira_allocno_iterator ai;
 
 	  FOR_EACH_ALLOCNO (a, ai)
-		ALLOCNO_REGNO (a) = REGNO (ALLOCNO_EMIT_DATA (a)->reg);
+		{
+		  int old_regno = ALLOCNO_REGNO (a);
+		  int new_regno = REGNO (ALLOCNO_EMIT_DATA (a)->reg);
+
+		  ALLOCNO_REGNO (a) = new_regno;
+
+		  if (old_regno != new_regno)
+		    setup_reg_classes (new_regno, reg_preferred_class (old_regno),
+				       reg_alternate_class (old_regno),
+				       reg_allocno_class (old_regno));
+		}
+
 	}
 	  else
 	{
diff --git a/gcc/testsuite/gcc.dg/pr63762.c b/gcc/testsuite/gcc.dg/pr63762.c
new file mode 100644
index 000..df11067
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr63762.c
@@ -0,0 +1,77 @@
+/* PR middle-end/63762 */
+/* { dg-do assemble } */
+/* { dg-options -O2 } */
+
+#include <stdlib.h>
+
+void *astFree ();
+void *astMalloc ();
+void astNegate (void *);
+int astGetNegated (void *);
+void astGetRegionBounds (void *, double *, double *);
+int astResampleF (void *, ...);
+
+extern int astOK;
+
+int
+MaskF (int inside, int ndim, const int lbnd[], const int ubnd[],
+   float in[], float val)
+{
+
+  void *used_region;
+  float *c, *d, *out, *tmp_out;
+  double *lbndgd, *ubndgd;
+  int *lbndg, *ubndg, idim, ipix, nax, nin, nout, npix, npixg, result = 0;
+  if (!astOK) return result;
+  lbndg = astMalloc (sizeof (int)*(size_t) ndim);
+  ubndg = astMalloc (sizeof (int)*(size_t) ndim);
+  lbndgd = astMalloc (sizeof (double)*(size_t) ndim);
+  ubndgd = astMalloc (sizeof (double)*(size_t) ndim);
+  if (astOK)
+{
+  astGetRegionBounds (used_region, lbndgd, ubndgd);
+  npix = 1;
+  npixg = 1;
+  for (idim = 0; idim < ndim; idim++)
+    {
+      lbndg[ idim ] = lbnd[ idim ];
+      ubndg[ idim ] = ubnd[ idim ];
+      npix *= (ubnd[ idim ] - lbnd[ idim ] + 1);
+      if (npixg >= 0) npixg *= (ubndg[ idim ] - lbndg[ idim ] + 1);
+    }
+  if (npixg <= 0 && astOK)
+    {
+      if ((inside != 0) == (astGetNegated (used_region) != 0))
+        {
+          c = in;
+          for (ipix = 0; ipix < npix; ipix++) *(c++) = val;
+          result = npix;
+        }
+    }
+  else if (npixg > 0 && astOK)
+    {
+      if ((inside != 0) == (astGetNegated (used_region) != 0))
+        {
+          tmp_out = astMalloc (sizeof (float)*(size_t) npix);
+          if (tmp_out)
+            {
+              c = tmp_out;
+              for (ipix = 0; ipix < npix; ipix++) *(c++) = val;
+              result = npix - npixg;
+            }
+          out = tmp_out;
+        }
+      else
+        {
+          tmp_out = NULL;
+          out = in;
+        }
+      if (inside) astNegate (used_region);
+      result += astResampleF (used_region, ndim, lbnd, ubnd, in, NULL,
+                              NULL, NULL, 0, 0.0, 100, val, ndim,
+                              lbnd, ubnd, lbndg, ubndg, out, NULL);
+  if (inside) astNegate 

Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-18 Thread Yangfei (Felix)
 On 06/11/14 08:35, Yangfei (Felix) wrote:
The idea is simple: Use movw for certain const source operand
  instead of
  ldrh.  And exclude the const values which cannot be handled by
  mov/mvn/movw.
I am doing regression test for this patch.  Assuming no issue
  pops up,
  OK for trunk?
 
  So, doesn't that makes the bug latent for architectures older than
  armv6t2 and big endian and only fixed this in ARM state ? I'd prefer
  a complete solution please. What about *thumb2_movhi_insn in
 thumb2.md ?
 
 
  Actually, we fix the bug by excluding the const values which cannot be 
  handled.
  The patch still works even without the addition of movw here.
   The new movw alternative here is just a small code optimization for 
   certain
  arch. We can handle consts by movw instead of ldrh and this is better for
  performance.
  We find that this bug is not reproducible for *thumb2_movhi_insn. The reason
 is that this pattern can always move consts using movw.
 
 Please fix the PR number in your final commit with PR 59593.
 
  Index: gcc/config/arm/predicates.md
 
 =
 ==
  --- gcc/config/arm/predicates.md(revision 216838)
  +++ gcc/config/arm/predicates.md(working copy)
  @@ -144,6 +144,11 @@
    (and (match_code "const_int")
	 (match_test "INTVAL (op) == 0")))
 
  +(define_predicate "arm_movw_immediate_operand"
  +  (and (match_test "TARGET_32BIT && arm_arch_thumb2")
  +       (and (match_code "const_int")
  +	    (match_test "(INTVAL (op) & 0xffff0000) == 0"))))
  +
   ;; Something valid on the RHS of an ARM data-processing instruction
  (define_predicate "arm_rhs_operand"
    (ior (match_operand 0 "s_register_operand") @@ -211,6 +216,11 @@
    (ior (match_operand 0 "arm_rhs_operand")
	 (match_operand 0 "arm_not_immediate_operand")))
 
  +(define_predicate "arm_hi_operand"
  +  (ior (match_operand 0 "arm_rhsm_operand")
  +       (ior (match_operand 0 "arm_not_immediate_operand")
  +	    (match_operand 0 "arm_movw_immediate_operand"))))
  +
 
 I think you don't need any of these predicates.
 
 
   (define_predicate "arm_di_operand"
     (ior (match_operand 0 "s_register_operand")
	  (match_operand 0 "arm_immediate_di_operand")))
  Index: gcc/config/arm/arm.md
 
 =
 ==
  --- gcc/config/arm/arm.md   (revision 216838)
  +++ gcc/config/arm/arm.md   (working copy)
  @@ -6285,8 +6285,8 @@
 
   ;; Pattern to recognize insn generated default case above
  (define_insn "*movhi_insn_arch4"
  -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r")
  -	(match_operand:HI 1 "general_operand"      "rIk,K,r,mi"))]
  +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
  +	(match_operand:HI 1 "arm_hi_operand" "rIk,K,j,r,mi"))]
 
 Use `n' instead of `j' - movw can handle all of the numerical constants for 
 HImode
 values. And the predicate can remain general_operand.
 


Hello Ramana,

  We need to make sure that movw is only used for architectures which satisfy 
arm_arch_thumb2, as indicated in the following predicate added:

+(define_predicate "arm_movw_immediate_operand"
+  (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+       (and (match_code "const_int")
+	    (match_test "(INTVAL (op) & 0xffff0000) == 0"))))

  I am modifying the predicate in order to fix the issue for other, older 
architectures.
  It seems we can't achieve this by simply using 'n' instead of 'j' here, right?
  Thanks.



Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Yangfei (Felix)
 On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:
  +(define_expand "doloop_end"
  +  [(use (match_operand 0 "" ""))  ; loop pseudo
  +   (use (match_operand 1 "" ""))] ; label
  +  ""
  +{
  +  /* Currently SMS relies on the do-loop pattern to recognize loops
  + where (1) the control part consists of all insns defining and/or
  + using a certain 'count' register and (2) the loop count can be
  + adjusted by modifying this register prior to the loop.
  + ??? The possible introduction of a new block to initialize the
  + new IV can potentially affect branch optimizations.  */
  +  if (optimize > 0 && flag_modulo_sched)
 
 You'd be better off moving this condition into the expansion predicate (which 
 is
 currently "").
 
 This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.
 
 
 r~


Yeah, that's a good idea.  See my updated patch :-)


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.md (doloop_end): New pattern.
+   * config/aarch64/aarch64.c (TARGET_CAN_USE_DOLOOP_P): Implement.
+
 2014-11-11  Andrew Pinski  apin...@cavium.com
 
Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,43 @@
   [(set_attr type mrs)])
 
 
+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand "doloop_end"
+  [(use (match_operand 0 "" ""))  ; loop pseudo
+   (use (match_operand 1 "" ""))] ; label
+  "optimize > 0 && flag_modulo_sched"
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+
+  if (GET_MODE (operands[0]) != DImode)
+FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+})
+
 ;; AdvSIMD Stuff
 (include aarch64-simd.md)
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include gt-aarch64.h



aarch64-doloop-v5.diff
Description: aarch64-doloop-v5.diff


[PATCH][ARM] __ARM_FP __ARM_NEON_FP defined when -march=armv7-m

2014-11-18 Thread Mantas Mikaitis

Incorrect predefinitions were emitted for certain target architectures. E.g. armv7-m
does not contain NEON, but the definition __ARM_NEON_FP was switched on.
Similarly with armv6 and even armv2.

This patch fixes the predefines for each of the different chips
containing certain types of FPU implementations.

Tests:

Tested on arm-none-linux-gnueabi and arm-none-linux-gnueabihf without
any new regression.

Manually compiled for various targets and all correct definitions were
present.

Is this patch ok for trunk?

Mantas

gcc/Changelog:

 * config/arm/arm.h (TARGET_NEON_FP): Remove conditional definition,
define to zero if !TARGET_NEON.
   (TARGET_CPU_CPP_BUILTINS): Add second condition before defining
__ARM_FP macro.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index ff4ddac..325fea9 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -118,7 +118,7 @@ extern char arm_arch_name[];
 	if (TARGET_VFP)	\
 	  builtin_define ("__VFP_FP__");		\
 			\
-	if (TARGET_ARM_FP)\
+	if (TARGET_ARM_FP && !TARGET_SOFT_FLOAT)	\
 	  builtin_define_with_int_value (		\
 	    "__ARM_FP", TARGET_ARM_FP);			\
 	if (arm_fp16_format == ARM_FP16_FORMAT_IEEE)		\
@@ -2350,10 +2350,9 @@ extern int making_const_table;
 /* Set as a bit mask indicating the available widths of floating point
types for hardware NEON floating point.  This is the same as
TARGET_ARM_FP without the 64-bit bit set.  */
-#ifdef TARGET_NEON
-#define TARGET_NEON_FP		\
-  (TARGET_ARM_FP & (0xff ^ 0x08))
-#endif
+#define TARGET_NEON_FP \
+  (TARGET_NEON ? (TARGET_ARM_FP & (0xff ^ 0x08)) \
+   : 0)
 
 /* The maximum number of parallel loads or stores we support in an ldm/stm
instruction.  */


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Richard Henderson
On 11/18/2014 12:28 PM, Yangfei (Felix) wrote:
 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64.md (doloop_end): New pattern.
 + * config/aarch64/aarch64.c (TARGET_CAN_USE_DOLOOP_P): Implement.

Looks good to me.  I'll leave it for aarch64 maintainers for final approval.


r~


[PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-11-18 Thread Yangfei (Felix)
Ping again?  Any comment please? 


 
 Ping?  I hope this patch can catch up with stage 1 of GCC-5.0.  Thanks.
 
 
 
 
   Hi Felix,
  
   Sorry for the delay responding, I've been out of the office recently
   and I'm only just catching up on a backlog of GCC related emails.
  
   I'm in two minds about this; I can potentially see the need for
   attributes to enable long calls for specific calls, and maybe also
   for pragmas that can be used to efficiently mark a group of
   functions in that way; but I don't really see the value in adding a
   -mlong-calls option to do
  this globally.
  
   The reasoning is as follows: long calls are generally very expensive
   and relatively few functions should need them in most applications
   (since code that needs to span more than a single block of 128Mbytes
   - the span of a BL or B instruction - will be very rare in reality).
  
   The best way to handle very large branches for those rare cases
   where you do have a very large contiguous block of code more than
   128MB is by having the linker insert veneers when needed; the code
   will branch to the veneer which will insert an indirect branch at
   that point (the ABI guarantees that at function call boundaries IP0
   and IP1 will not contain live values, making them available for such 
   purposes).
  
   In a very small number of cases it might be desirable to mark
   specific functions as being too far away to reach; in those cases
   the attributes and pragma methods can be used to mark such calls as being
 far calls.
  
   Aside: The reason -mlong-calls was added to GCC for ARM is that the
   code there pre-dates the EABI, which introduced the concept of
   link-time veneering of calls - the option should be unnecessary now
   that almost everyone uses the EABI as the basis for their platform
   ABI.  We don't have such a legacy for AArch64 and I'd need to see
   strong
  justification for its use before adding the option there as well.
  
   So please can you rework the patch to remove the -mlong-calls option
   and just leave the attribute and pragma interfaces.
  
   R.
 
 
  Hello Richard,
 
   Thanks for the comments.  I agree with the idea.
   I updated the patch with the -mlong-calls option removed; short calls are
  used by default.
Reg-tested for aarch64-linux-gnu with qemu.  Is it OK for trunk?
 
 
  Index: gcc/ChangeLog
 
  ===
  --- gcc/ChangeLog   (revision 217394)
  +++ gcc/ChangeLog   (working copy)
  @@ -1,3 +1,25 @@
  +2014-11-12  Felix Yang  felix.y...@huawei.com
  +   Haijian Zhang  z.zhanghaij...@huawei.com
  +
  +   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
  +   * config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
  +   aarch64_attribute_table, aarch64_comp_type_attributes,
  +   aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
  +   aarch64_pr_long_calls, aarch64_pr_no_long_calls,
  +   aarch64_pr_long_calls_off): New functions.
  +   (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
  +   aarch64_set_default_type_attributes.
  +   (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
  +   (TARGET_COMP_TYPE_ATTRIBUTES): Define as
  aarch64_comp_type_attribute.
  +   (aarch64_pragma_enum): New enum.
  +   (aarch64_attribute_table): New attribute table.
  +   * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
  +   aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations.
  +   * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
  +   generate indirect call for sibling call when needed.
  +   * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to
  +   exclude a symbol_ref for an indirect call.
  +
   2014-11-11  Andrew Pinski  apin...@cavium.com
 
  Bug target/61997
  Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c
 
  ===
  --- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
  +++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
  @@ -0,0 +1,133 @@
  +/* Check that long calls to different sections are not optimized to
  +bl.  */
  +/* { dg-do compile } */
  +/* { dg-options -O2 } */
  +/* This test expects that short calls are the default.  */
  +/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls }
  +{  } } */
  +
   +#define section(S) __attribute__((section(S)))
   +#define weak __attribute__((weak))
   +#define noinline __attribute__((noinline))
   +#define long_call __attribute__((long_call))
   +#define short_call __attribute__((short_call))
  +
  +#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS)  \
  +  const char *TARGET_ATTRS ID (void);  
  \
  +  const char *CALL_ATTRS call_##ID (void) { return ID () + 1; }
  +
  +#define EXTERN_CALL(ID, TARGET_ATTRS, CALL_ATTRS)  \
  +  const char *TARGET_ATTRS noinline ID (void) { return 

Re: [PATCH][ARM] __ARM_FP __ARM_NEON_FP defined when -march=armv7-m

2014-11-18 Thread Richard Earnshaw
On 18/11/14 11:30, Mantas Mikaitis wrote:
 Incorrect predefinitions for certain target architectures. E.g. armv7-m
 does not contain NEON but the definition __ARM_NEON_FP was switched on.
 Similarly with armv6 and even armv2.
 
 This patch fixes the predefines for each of the different chips
 containing certain types of the FPU implementations.
 
 Tests:
 
 Tested on arm-none-linux-gnueabi and arm-none-linux-gnueabihf without
 any new regression.
 
 Manually compiled for various targets and all correct definitions were
 present.
 
 Is this patch ok for trunk?
 
 Mantas
 
 gcc/Changelog:
 
   * config/arm/arm.h (TARGET_NEON_FP): Removed conditional definition, 
 define to zero if !TARGET_NEON.
 (TARGET_CPU_CPP_BUILTINS): Added second condition before defining 
 __ARM_FP macro.
 
 
 ARM_DEFS.patch
 
 
 diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
 index ff4ddac..325fea9 100644
 --- a/gcc/config/arm/arm.h
 +++ b/gcc/config/arm/arm.h
 @@ -118,7 +118,7 @@ extern char arm_arch_name[];
   if (TARGET_VFP) \
 builtin_define (__VFP_FP__);\
   \
 - if (TARGET_ARM_FP)  \
  + if (TARGET_ARM_FP && !TARGET_SOFT_FLOAT)\

Wouldn't it be better to factor this into TARGET_ARM_FP?  It seems odd
that that macro returns a set of values based on something completely
unavailable for the current compilation.  That would also then mirror
the behaviour of TARGET_NEON_FP (see below) and make the internal macros
more consistent.


 builtin_define_with_int_value (   \
   __ARM_FP, TARGET_ARM_FP); \
   if (arm_fp16_format == ARM_FP16_FORMAT_IEEE)\
 @@ -2350,10 +2350,9 @@ extern int making_const_table;
  /* Set as a bit mask indicating the available widths of floating point
 types for hardware NEON floating point.  This is the same as
 TARGET_ARM_FP without the 64-bit bit set.  */
 -#ifdef TARGET_NEON
 -#define TARGET_NEON_FP   \
  -  (TARGET_ARM_FP & (0xff ^ 0x08))
 -#endif
 +#define TARGET_NEON_FP\
  +  (TARGET_NEON ? (TARGET_ARM_FP & (0xff ^ 0x08)) \
 +   : 0)
  
  /* The maximum number of parallel loads or stores we support in an ldm/stm
 instruction.  */
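For readers unfamiliar with the convention being discussed: TARGET_ARM_FP is a bit mask of supported floating-point widths, with 0x08 marking 64-bit support, so masking with (0xff ^ 0x08) clears exactly that bit. A hedged sketch of the arithmetic follows; the individual width-bit values are illustrative assumptions, not taken from arm.h:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative width bits (assumed): 0x02 = 16-bit, 0x04 = 32-bit,
// 0x08 = 64-bit floating point support.
constexpr uint32_t kFp16 = 0x02, kFp32 = 0x04, kFp64 = 0x08;

// Mirrors the shape of the patched TARGET_NEON_FP: the same mask as the
// FP unit, minus the 64-bit bit, and zero when NEON is absent.
constexpr uint32_t neon_fp_mask (bool has_neon, uint32_t arm_fp)
{
  return has_neon ? (arm_fp & (0xff ^ 0x08)) : 0;
}
```

This matches Richard's point: with the TARGET_NEON test folded into the macro itself, the value is well defined for every compilation, rather than being conditionally undefined.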
 

R.



Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Marcus Shawcroft
On 18 November 2014 11:28, Yangfei (Felix) felix.y...@huawei.com wrote:

 Yeah, that's a good idea.  See my updated patch :-)


 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog   (revision 217394)
 +++ gcc/ChangeLog   (working copy)
 @@ -1,3 +1,8 @@
 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 +   * config/aarch64/aarch64.c (doloop_end): New pattern.
 +   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
 +


OK with me /Marcus


Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-18 Thread Ramana Radhakrishnan



On 18/11/14 11:02, Yangfei (Felix) wrote:

On 06/11/14 08:35, Yangfei (Felix) wrote:

   The idea is simple: Use movw for certain const source operand
instead of

ldrh.  And exclude the const values which cannot be handled by
mov/mvn/movw.

   I am doing regression test for this patch.  Assuming no issue
pops up,

OK for trunk?

So, doesn't that make the bug latent for architectures older than
armv6t2 and big endian and only fixed this in ARM state ? I'd prefer
a complete solution please. What about *thumb2_movhi_insn in

thumb2.md ?




Actually, we fix the bug by excluding the const values which cannot be handled.

The patch still works even without the adding of movw here.

The new movw alternative here is just a small code optimization for certain
arches. We can handle consts by movw instead of ldrh, and this is better for
performance.

We find that this bug is not reproducible for *thumb2_movhi_insn. The reason

is that this pattern can always move consts using movw.

Please fix the PR number in your final commit with PR 59593.


Index: gcc/config/arm/predicates.md


===

--- gcc/config/arm/predicates.md(revision 216838)
+++ gcc/config/arm/predicates.md(working copy)
@@ -144,6 +144,11 @@
(and (match_code const_int)
 (match_test INTVAL (op) == 0)))

+(define_predicate arm_movw_immediate_operand
+  (and (match_test TARGET_32BIT && arm_arch_thumb2)
+   (and (match_code const_int)
+   (match_test (INTVAL (op)  0x) == 0
+
  ;; Something valid on the RHS of an ARM data-processing instruction
(define_predicate arm_rhs_operand
(ior (match_operand 0 s_register_operand) @@ -211,6 +216,11 @@
(ior (match_operand 0 arm_rhs_operand)
 (match_operand 0 arm_not_immediate_operand)))

+(define_predicate arm_hi_operand
+  (ior (match_operand 0 arm_rhsm_operand)
+   (ior (match_operand 0 arm_not_immediate_operand)
+(match_operand 0 arm_movw_immediate_operand
+


I think you don't need any of these predicates.



  (define_predicate arm_di_operand
(ior (match_operand 0 s_register_operand)
 (match_operand 0 arm_immediate_di_operand)))
Index: gcc/config/arm/arm.md


===

--- gcc/config/arm/arm.md   (revision 216838)
+++ gcc/config/arm/arm.md   (working copy)
@@ -6285,8 +6285,8 @@

  ;; Pattern to recognize insn generated default case above
(define_insn *movhi_insn_arch4
-  [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r)
-   (match_operand:HI 1 general_operand  rIk,K,r,mi))]
+  [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r)
+   (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))]


Use `n' instead of `j' - movw can handle all of the numerical constants for 
HImode
values. And the predicate can remain general_operand.



Did you read my comment about set_attr arch further down in the thread ?



Look at the set_attr arch alternative and set this to t2 for the movw 
alternative. I would expect that to be enough to sort this out instead of adding all this 
code.





Ramana




Hello Ramana,

   We need to make sure that movw is only used for architectures which satisfy 
arm_arch_thumb2 as indicated in the following predicate added:

+(define_predicate arm_movw_immediate_operand
+  (and (match_test TARGET_32BIT  arm_arch_thumb2)
+   (and (match_code const_int)
+   (match_test (INTVAL (op)  0x) == 0

   I am modifying the predicate in order to fix the issue for other older 
architectures.
   It seems we can't achieve this by simply using 'n' instead of 'j' here, 
right?
   Thanks.



Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-11-18 Thread Ramana Radhakrishnan
On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com wrote:
 Ping again?  Any comment please?


Pinging daily is only going to irritate people. Please desist from doing so.

Ramana



 Ping?  I hope this patch can catch up with stage 1 of GCC-5.0.  Thanks.




   Hi Felix,
  
   Sorry for the delay responding, I've been out of the office recently
   and I'm only just catching up on a backlog of GCC related emails.
  
   I'm in two minds about this; I can potentially see the need for
   attributes to enable long calls for specific calls, and maybe also
   for pragmas that can be used to efficiently mark a group of
   functions in that way; but I don't really see the value in adding a
   -mlong-calls option to do
  this globally.
  
   The reasoning is as follows: long calls are generally very expensive
   and relatively few functions should need them in most applications
   (since code that needs to span more than a single block of 128Mbytes
   - the span of a BL or B instruction - will be very rare in reality).
  
   The best way to handle very large branches for those rare cases
   where you do have a very large contiguous block of code more than
   128MB is by having the linker insert veneers when needed; the code
   will branch to the veneer which will insert an indirect branch at
   that point (the ABI guarantees that at function call boundaries IP0
   and IP1 will not contain live values, making them available for such 
   purposes).
  
   In a very small number of cases it might be desirable to mark
   specific functions as being too far away to reach; in those cases
   the attributes and pragma methods can be used to mark such calls as being
 far calls.
  
   Aside: The reason -mlong-calls was added to GCC for ARM is that the
   code there pre-dates the EABI, which introduced the concept of
   link-time veneering of calls - the option should be unnecessary now
   that almost everyone uses the EABI as the basis for their platform
   ABI.  We don't have such a legacy for AArch64 and I'd need to see
   strong
  justification for its use before adding the option there as well.
  
   So please can you rework the patch to remove the -mlong-calls option
   and just leave the attribute and pragma interfaces.
  
   R.
 
 
  Hello Richard,
 
Thanks for the comments.  I agree with the idea.
And I updated the patch with the -mlong-calls option removed and use
  short call by default.
Reg-tested for aarch64-linux-gnu with qemu.  Is it OK for trunk?
 
 
  Index: gcc/ChangeLog
 
  ===
  --- gcc/ChangeLog   (revision 217394)
  +++ gcc/ChangeLog   (working copy)
  @@ -1,3 +1,25 @@
  +2014-11-12  Felix Yang  felix.y...@huawei.com
  +   Haijian Zhang  z.zhanghaij...@huawei.com
  +
  +   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
  +   * config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
  +   aarch64_attribute_table, aarch64_comp_type_attributes,
  +   aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
  +   aarch64_pr_long_calls, aarch64_pr_no_long_calls,
  +   aarch64_pr_long_calls_off): New functions.
  +   (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
  +   aarch64_set_default_type_attributes.
  +   (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
  +   (TARGET_COMP_TYPE_ATTRIBUTES): Define as
  aarch64_comp_type_attribute.
  +   (aarch64_pragma_enum): New enum.
  +   (aarch64_attribute_table): New attribute table.
  +   * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
  +   aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations.
  +   * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
  +   generate indirect call for sibling call when needed.
  +   * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to
  +   exclude a symbol_ref for an indirect call.
  +
   2014-11-11  Andrew Pinski  apin...@cavium.com
 
  Bug target/61997
  Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c
 
  ===
  --- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
  +++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
  @@ -0,0 +1,133 @@
  +/* Check that long calls to different sections are not optimized to
  +bl.  */
  +/* { dg-do compile } */
  +/* { dg-options -O2 } */
  +/* This test expects that short calls are the default.  */
  +/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls }
  +{  } } */
  +
   +#define section(S) __attribute__((section(S)))
   +#define weak __attribute__((weak))
   +#define noinline __attribute__((noinline))
   +#define long_call __attribute__((long_call))
   +#define short_call __attribute__((short_call))
  +
  +#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS)  \
  +  const char *TARGET_ATTRS ID (void); 
   \
  +  const char *CALL_ATTRS call_##ID 

Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 11:36 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 17 Nov 2014, Richard Biener wrote:

 On Sun, Nov 16, 2014 at 6:53 PM, Marc Glisse marc.gli...@inria.fr wrote:

 On Sun, 16 Nov 2014, Richard Biener wrote:

 I think the element_mode is the way to go.



 The following passed bootstrap+testsuite.

 2014-11-16  Marc Glisse  marc.gli...@inria.fr

 * tree.c (element_mode, integer_truep): New functions.
 * tree.h (element_mode, integer_truep): Declare them.
 * fold-const.c (negate_expr_p, fold_negate_expr,
 combine_comparisons,
 fold_cond_expr_with_comparison, fold_real_zero_addition_p,
 fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p,
 fold_strip_sign_ops): Use element_mode.
 (fold_binary_loc): Use element_mode and element_precision.
 * match.pd: Use integer_truep, element_mode, element_precision,
 VECTOR_TYPE_P and build_one_cst. Extend some transformations to
 vectors. Simplify A/-A.


  -! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
  -     && TYPE_MODE (type) == TYPE_MODE (inter_type))
  +! (final_prec != element_precision (type)
  +     && element_mode (type) == element_mode (inter_type))

 isn't a 1:1 conversion - please use

final_prec != GET_MODE_PRECISION (element_mode (type))

 your version is final_prec != final_prec.


 Good catch, I was doing those replacements too fast to be really safe :-(

 The tree.c:element_mode function lacks a function comment.

 Ok with those fixed.


 I am attaching the version I committed.

 I'll try to replace some more TYPE_MODE during stage3...

Btw, a convenience would be to be able to write

 HONOR_NANS (type)

thus effectively make HONOR_* inline functions with a machine_mode
and a type overload (and the type overload properly looking at
element types).
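The overload pair suggested above could look roughly like the following. The GCC types are stubbed out here, so this is only a sketch of the interface shape, not the real implementation:

```cpp
#include <cassert>

// Stand-ins for GCC's types, just to show the overload shape.
enum machine_mode { SFmode, DFmode, V4SFmode, BLKmode };
struct tree_stub { machine_mode element; };
using const_tree = const tree_stub *;

// Stubbed element_mode: the real one looks through vector/complex types.
machine_mode element_mode (const_tree t) { return t->element; }

// Mode overload: the existing HONOR_NANS semantics (stubbed predicate).
bool HONOR_NANS (machine_mode m) { return m == SFmode || m == DFmode; }

// Type overload: routes through element_mode, so callers can write
// HONOR_NANS (type) directly, even for vector types.
bool HONOR_NANS (const_tree t) { return HONOR_NANS (element_mode (t)); }
```

The point of the second overload is that callers no longer need to remember to write HONOR_NANS (element_mode (type)) themselves.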

Richard.

 Thanks,

 --
 Marc Glisse
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  (revision 217701)
 +++ gcc/tree.c  (revision 217702)
 @@ -2268,20 +2268,34 @@ integer_nonzerop (const_tree expr)
  {
STRIP_NOPS (expr);

    return ((TREE_CODE (expr) == INTEGER_CST
	     && !wi::eq_p (expr, 0))
	    || (TREE_CODE (expr) == COMPLEX_CST
		&& (integer_nonzerop (TREE_REALPART (expr))
		    || integer_nonzerop (TREE_IMAGPART (expr)))));
  }

 +/* Return 1 if EXPR is the integer constant one.  For vector,
 +   return 1 if every piece is the integer constant minus one
 +   (representing the value TRUE).  */
 +
 +int
 +integer_truep (const_tree expr)
 +{
 +  STRIP_NOPS (expr);
 +
 +  if (TREE_CODE (expr) == VECTOR_CST)
 +return integer_all_onesp (expr);
 +  return integer_onep (expr);
 +}
 +
  /* Return 1 if EXPR is the fixed-point constant zero.  */

  int
  fixed_zerop (const_tree expr)
  {
    return (TREE_CODE (expr) == FIXED_CST
	    && TREE_FIXED_CST (expr).data.is_zero ());
  }

  /* Return the power of two represented by a tree node known to be a
 @@ -12303,11 +12317,25 @@ get_base_address (tree t)
  t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);

/* ???  Either the alias oracle or all callers need to properly deal
   with WITH_SIZE_EXPRs before we can look through those.  */
if (TREE_CODE (t) == WITH_SIZE_EXPR)
  return NULL_TREE;

return t;
  }

 +/* Return the machine mode of T.  For vectors, returns the mode of the
 +   inner type.  The main use case is to feed the result to HONOR_NANS,
 +   avoiding the BLKmode that a direct TYPE_MODE (T) might return.  */
 +
 +machine_mode
 +element_mode (const_tree t)
 +{
 +  if (!TYPE_P (t))
 +t = TREE_TYPE (t);
 +  if (VECTOR_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
 +t = TREE_TYPE (t);
 +  return TYPE_MODE (t);
 +}
 +
  #include gt-tree.h
 Index: gcc/tree.h
 ===
 --- gcc/tree.h  (revision 217701)
 +++ gcc/tree.h  (revision 217702)
 @@ -1557,20 +1557,22 @@ extern void protected_set_expr_location
   #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
   #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
   #define TYPE_CONTEXT(NODE) (TYPE_CHECK (NODE)->type_common.context)

   #define TYPE_MODE(NODE) \
 (VECTOR_TYPE_P (TYPE_CHECK (NODE)) \
  ? vector_type_mode (NODE) : (NODE)->type_common.mode)
   #define SET_TYPE_MODE(NODE, MODE) \
 (TYPE_CHECK (NODE)->type_common.mode = (MODE))

 +extern machine_mode element_mode (const_tree t);
 +
  /* The canonical type for this type node, which is used by frontends to
 compare the type for equality with another type.  If two types are
 equal (based on the semantics of the language), then they will have
 equivalent TYPE_CANONICAL entries.

 As a special case, if TYPE_CANONICAL is NULL_TREE, and thus
 TYPE_STRUCTURAL_EQUALITY_P is true, then it cannot
 be used for comparison against other types.  

Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the amount of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at now duplicating every
 builtin in the enumeration since it's a combination of the number of enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

I think it's better to fix dbxout.c to do something sensible with
too long lines for enum types.  It should be easy to cook up a
testcase that is not GCC.

enum { entry1, entry2, entry3 };

not sure how exactly the issue breaks AIX bootstrap though.  I suppose
the assembler complains?

Richard.

 Ilya
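
The DBX_CONTIN_LEN idea mentioned earlier amounts to emitting an over-long stabs string in bounded chunks, each terminated with a continuation character. A rough standalone sketch of that splitting logic (the helper name and shape are invented for illustration; this is not the dbxout.c code):

```cpp
#include <string>
#include <vector>

// Split a long stabs string into chunks of at most max_len payload
// characters; every chunk except the last gets a trailing continuation
// marker, in the spirit of DBX_CONTIN_CHAR.
std::vector<std::string> split_stab (const std::string &s, size_t max_len,
                                     char contin = '\\')
{
  std::vector<std::string> out;
  for (size_t pos = 0; pos < s.size (); pos += max_len)
    {
      std::string chunk = s.substr (pos, max_len);
      if (pos + max_len < s.size ())
        chunk.push_back (contin);   // more payload follows
      out.push_back (chunk);
    }
  if (out.empty ())
    out.push_back ("");
  return out;
}
```

Whether this helps depends on the consumer: if the AIX linker objects to the total symbol length rather than the line length, continuation alone would not be enough.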


Re: [PATCH, 8/8] Do simple omp lowering for no address taken var

2014-11-18 Thread Richard Biener
On Tue, 18 Nov 2014, Richard Biener wrote:

 On Tue, 18 Nov 2014, Eric Botcazou wrote:
 
   Now - I can see how that is easily confused by the static chain
   being address-taken.  But I also remember that Eric did some
   preparatory work to fix that, for nested functions, that is,
   possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.
  
  The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
  something along these lines is attached to PR tree-optimization/54779 
  (latest 
  version, for a 4.9-based compiler).
 
 Ah, now I remember - this was to be able to optimize away the frame
 variable in case the nested function was inlined.
 
 Tom's case is somewhat different as I understand: somehow LIM store
 motion doesn't handle indirect frame accesses well enough(?)  So
 he intends to load register vars in the frame into registers at the
 beginning of the nested function and restore them to the frame on
 function exit (this will probably break for recursive calls, but
 OMP offloading might be special enough that this is a non-issue there).
 
 So marking the frame decl won't help him here (I thought we might
 mark the FIELD_DECLs corresponding to individual vars).  OTOH inside
 the nested function accesses to the static chain should be easy to
 identify.

Tom - does the following patch help?

Thanks,
Richard.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c   (revision 217692)
+++ gcc/omp-low.c   (working copy)
@@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
   layout_type (type);
 }
 
-  TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
+  TREE_TYPE (ctx->receiver_decl)
+= build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing

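For context on why the TYPE_QUAL_RESTRICT in the patch above can help: a restrict-qualified pointer tells the compiler that no other pointer aliases the pointed-to object, which is exactly what lets an optimizer keep frame fields in registers instead of reloading them around every store. A small standalone C++ illustration of the semantics (nothing GCC-specific; function and struct names are made up):

```cpp
// With 'frame' reached only through a restrict pointer, a compiler may
// assume stores through 'out' never modify f->n, so f->n can stay in a
// register across the loop instead of being reloaded each iteration.
struct frame { int n; };

long sum_n (frame *__restrict f, int *__restrict out, int count)
{
  long total = 0;
  for (int i = 0; i < count; ++i)
    {
      out[i] = f->n;   // no aliasing with out: f->n need not be reloaded
      total += f->n;
    }
  return total;
}
```

This is the same guarantee the restrict-qualified receiver_decl is meant to convey for accesses through the OMP child function's data-sharing record.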

Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Ilya Enkovich
2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the amount of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at now duplicating every
 builtin in the enumeration since it's a combination of the number of enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

 I think it's better to fix dbxout.c to do something sensible with
 too long lines for enum types.  It should be easy to cook up a
 testcase that is not GCC.

 enum { entry1, entry2, entry3 };


As I understand it, the main problem is that fixing dbxout in trunk won't
help to build the stage1 compiler.

 not sure how exactly the issue breaks AIX bootstrap though.  I suppose
 the assembler complains?

It is the linker that complains. Here is a build log example:
http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584

Ilya


 Richard.

 Ilya


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the amount of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like 
 really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at not duplicating every
 builtin in the enumeration since it's a combination of the number of enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

 I think it's better to fix dbxout.c to do something sensible with
 too long lines for enum types.  It should be easy to cook up a
 testcase that is not GCC.

 enum { entry1, entry2, entry3 };


 As I understand it, the main problem is that fixing dbxout in trunk won't
 help to build the stage1 compiler.

Simply build stage1 without debug info on AIX then.

Btw, you have to start fixing the bug at some point ... (we can
backport to 4.8 and 4.9).  Of course dbxout.c is in kind of
deep-freeze unmaintained state.

I don't think we ever encouraged work-arounds for broken host compilers
if that requires substantial work.

 not sure how exactly the issue breaks AIX bootstrap though.  I suppose
 the assembler complains?

 It is the linker that complains. Here is a build log example:
 http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584

I see.

Thanks,
Richard.

 Ilya


 Richard.

 Ilya


Re: [PATCH][AArch64][1/5] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-18 Thread Kyrill Tkachov


On 18/11/14 10:33, Kyrill Tkachov wrote:

diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index 264bf01..ad7ec43c 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -36,7 +36,6 @@ extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
  extern int arm_no_early_mul_dep (rtx, rtx);
  extern int arm_no_early_store_addr_dep (rtx, rtx);
  extern bool arm_rtx_shift_left_p (rtx);
-
  /* RTX cost table definitions.  These are used when tuning for speed rather
 than for size and should reflect the_additional_  cost over the cost
 of the fastest instruction in the machine, which is COSTS_N_INSNS (1).


This hunk should not be here. I'll remove it when I commit if approved...
Sorry for that.

Kyrill



Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-18 Thread Yangfei (Felix)
 On 18/11/14 11:02, Yangfei (Felix) wrote:
  On 06/11/14 08:35, Yangfei (Felix) wrote:
 The idea is simple: Use movw for certain const source
  operand instead of
  ldrh.  And exclude the const values which cannot be handled by
  mov/mvn/movw.
 I am doing regression test for this patch.  Assuming no
  issue pops up,
  OK for trunk?
 
  So, doesn't that make the bug latent for architectures older than
  armv6t2 and big endian and only fixed this in ARM state ? I'd
  prefer a complete solution please. What about *thumb2_movhi_insn in
  thumb2.md ?
 
 
  Actually, we fix the bug by excluding the const values which cannot be
 handled.
  The patch still works even without the adding of movw here.
   The new movw alternative here is just a small code optimization
   for certain arches. We can handle consts by movw instead of ldrh,
   and this is better for performance.
  We find that this bug is not reproducible for *thumb2_movhi_insn.
  The reason
  is that this pattern can always move consts using movw.
 
  Please fix the PR number in your final commit with PR 59593.
 
  Index: gcc/config/arm/predicates.md
 
 
  ===
  --- gcc/config/arm/predicates.md  (revision 216838)
  +++ gcc/config/arm/predicates.md  (working copy)
  @@ -144,6 +144,11 @@
  (and (match_code const_int)
   (match_test INTVAL (op) == 0)))
 
  +(define_predicate arm_movw_immediate_operand
   +  (and (match_test TARGET_32BIT && arm_arch_thumb2)
  +   (and (match_code const_int)
  + (match_test (INTVAL (op)  0x) == 0
  +
;; Something valid on the RHS of an ARM data-processing
  instruction (define_predicate arm_rhs_operand
  (ior (match_operand 0 s_register_operand) @@ -211,6 +216,11 @@
  (ior (match_operand 0 arm_rhs_operand)
   (match_operand 0 arm_not_immediate_operand)))
 
  +(define_predicate arm_hi_operand
  +  (ior (match_operand 0 arm_rhsm_operand)
  +   (ior (match_operand 0 arm_not_immediate_operand)
  +(match_operand 0 arm_movw_immediate_operand
  +
 
  I think you don't need any of these predicates.
 
 
(define_predicate arm_di_operand
  (ior (match_operand 0 s_register_operand)
   (match_operand 0 arm_immediate_di_operand)))
  Index: gcc/config/arm/arm.md
 
 
  ===
  --- gcc/config/arm/arm.md (revision 216838)
  +++ gcc/config/arm/arm.md (working copy)
  @@ -6285,8 +6285,8 @@
 
;; Pattern to recognize insn generated default case above
  (define_insn *movhi_insn_arch4
  -  [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r)
  - (match_operand:HI 1 general_operand  rIk,K,r,mi))]
  +  [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r)
  + (match_operand:HI 1 arm_hi_operand rIk,K,j,r,mi))]
 
  Use `n' instead of `j' - movw can handle all of the numerical
  constants for HImode values. And the predicate can remain general_operand.
 
 
 Did you read my comment about set_attr arch further down in the thread ?
 
  Look at the set_attr arch alternative and set this to t2 for the movw
 alternative. I would expect that to be enough to sort this out instead of 
 adding all
 this code.
 

Sorry for missing the point.  It seems to me that 't2' here will conflict with 
the condition of the pattern *movhi_insn_arch4: 
  TARGET_ARM && arm_arch4
  && (register_operand (operands[0], HImode)
      || register_operand (operands[1], HImode))

#define TARGET_ARM  (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2   (TARGET_THUMB && arm_arch_thumb2)

Right? Thanks.
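
As background to the predicate being debated: the arm_movw_immediate_operand test reduces to "does the constant fit in the 16-bit immediate that movw can load". A hedged sketch of that check (the helper name is invented, and since the masked constant in the quoted patch is truncated in this archive, the 16-bit assumption is mine):

```cpp
#include <cstdint>

// movw loads a zero-extended 16-bit immediate into a register, so a
// constant qualifies exactly when its upper bits are all zero.
bool fits_movw (uint32_t value)
{
  return (value & ~UINT32_C (0xffff)) == 0;
}
```

This is also why 'n' alone is not equivalent on older architectures: the constraint must additionally be gated on an architecture that actually has movw, which is what the set_attr arch/t2 suggestion accomplishes.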



Re: [PATCH 5/5] combine: preferably delete dead SETs in PARALLELs

2014-11-18 Thread Segher Boessenkool
On Mon, Nov 17, 2014 at 02:13:22PM -0700, Jeff Law wrote:
 On 11/14/14 12:19, Segher Boessenkool wrote:
 If the result of combining some insns is a PARALLEL of two SETs, and that
 is not a recognised insn, and one of the SETs is dead, combine tries to
 use the remaining SET as insn.
 
 This patch changes things around so that the one SET is preferably used,
 not the PARALLEL.
 
 
 2014-11-14  Segher Boessenkool  seg...@kernel.crashing.org
 
 gcc/
  * combine.c (try_combine): Prefer to delete dead SETs inside
  a PARALLEL over keeping them.
 OK.  Can this go forward independent of the other changes?  Seems like 
 it should.

Yes, only 4 depends on 3, the rest are independent patches.

 Does it help pr52714 where we'd like to rip apart a PARALLEL with two 
 sets, one of which is dead.  If so, it might allow us to  optimize that 
 code better.

It does not seem to fix the testcase.  I'll investigate why not.
You're talking about the

  (parallel [(set (pc) (pc))
 (set (X) (sp))])

right?  I guess the set pc pc is not marked as unused...

 Granted, it originally was an m68k issue, but presumably 
 it's happening eleswhere or you wouldn't be poking at it :-)

The case that made me do this is PowerPC, where (with more patches) for

  long long subfM1(long long a) { return 0x1 - a; }

we generated (-m32)

  subfic 4,4,-1 ; subfic 3,3,1

where that first subfic is

  (parallel [(set (reg 4) (not (reg 4)))
 (set (ca) (const_int 1))])

with that second set dead, so we can just do

  not 4,4 ; subfic 3,3,1

which is cheaper.


Segher


Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-18 Thread Martin Jambor
On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote:
   
b) with GTY, we cannot call destructor
   
   Everything in the symbol table is explicitly memory managed (i.e. never left
   to be freed by the garbage collector). It resides in GTY only to allow linking
   garbage collected objects from it and to get PCH working.
   
  
  Well, if I understand the intent correctly, summaries are for stuff
  that is not in the symbol table.  For example jump functions are a
 Correct.
  vector of structures possibly containing trees, so everything has to
  be in garbage collected memory.
  
  When an edge is removed, it is necessary to be notified about it
  immediately, for example to decrement rdesc_refcount (you might argue
  that that should be done in a separate hook and not from within a
  summary class but then you start to rely on hook invocation ordering
  so I think it is better to eventually use the summaries for it too).
 
 I do not see why ctors/dtors can not do the reference counting. In fact
 this is how refcounting is done usually anyway?
 

Well, when there is no garbage collection involved then yes, that is
how you normally do it but in the GC case, there is the question of
what is the appropriate time to call destructor on garbage collected
data (like jump functions)?

Martin


Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-18 Thread Ramana Radhakrishnan




Sorry for missing the point.  It seems to me that 't2' here will conflict with 
condition of the pattern *movhi_insn_arch4:
   TARGET_ARM
   && arm_arch4
   && (register_operand (operands[0], HImode)
       || register_operand (operands[1], HImode))

#define TARGET_ARM  (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2   (TARGET_THUMB && arm_arch_thumb2)



Bah, Indeed ! - I misremembered the t2 there, my mistake.

Yes you are right there, but what I'd like you to do is to use that 
mechanism rather than putting all this logic in the predicate.


So, I'd prefer you to add a v6t2 to the values for the arch attribute, 
don't forget to update the comments above.


and in arch_enabled you need to enforce this with

 (and (eq_attr arch v6t2)
  (match_test TARGET_32BIT && arm_arch6 && arm_arch_thumb2))
 (const_string yes)

And in the pattern use v6t2 ...

arm_arch_thumb2 implies that this is at the architecture level of v6t2. 
Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state.




regards
Ramana






Right? Thanks.



Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-11-18 Thread Yangfei (Felix)
 On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com
 wrote:
  Ping again?  Any comment please?
 
 
 Pinging daily is only going to irritate people. Please desist from doing so.
 
 Ramana


Oh, thanks for reminding me.  And sorry if this bothers you guys.  
The end of stage1 of GCC 5.0 causes me to push this a little bit :-)  


 
 
 
  Ping?  I hope this patch can catch up with stage 1 of GCC-5.0.  Thanks.
 
 
 
 
Hi Felix,
   
Sorry for the delay responding, I've been out of the office
recently and I'm only just catching up on a backlog of GCC related 
emails.
   
I'm in two minds about this; I can potentially see the need for
attributes to enable long calls for specific calls, and maybe
also for pragmas that can be used to efficiently mark a group of
functions in that way; but I don't really see the value in adding
a -mlong-calls option to do
   this globally.
   
The reasoning is as follows: long calls are generally very
expensive and relatively few functions should need them in most
applications (since code that needs to span more than a single
block of 128Mbytes
- the span of a BL or B instruction - will be very rare in reality).
   
The best way to handle very large branches for those rare cases
where you do have a very large contiguous block of code more than
128MB is by having the linker insert veneers when needed; the
code will branch to the veneer which will insert an indirect
branch at that point (the ABI guarantees that at function call
boundaries IP0 and IP1 will not contain live values, making them 
available
 for such purposes).
   
In a very small number of cases it might be desirable to mark
specific functions as being too far away to reach; in those cases
the attributes and pragma methods can be used to mark such calls
as being
  far calls.
   
Aside: The reason -mlong-calls was added to GCC for ARM is that
the code there pre-dates the EABI, which introduced the concept
of link-time veneering of calls - the option should be
unnecessary now that almost everyone uses the EABI as the basis
for their platform ABI.  We don't have such a legacy for AArch64
and I'd need to see strong
   justification for its use before adding the option there as well.
   
So please can you rework the patch to remove the -mlong-calls
option and just leave the attribute and pragma interfaces.
   
R.
  
  
   Hello Richard,
  
 Thanks for the comments.  I agree with the idea.
 And I updated the patch with the -mlong-calls option removed and
   use short call by default.
 Reg-tested for aarch64-linux-gnu with qemu.  Is it OK for trunk?
  
  
   Index: gcc/ChangeLog
  
 
    ===================================================================
   --- gcc/ChangeLog   (revision 217394)
   +++ gcc/ChangeLog   (working copy)
   @@ -1,3 +1,25 @@
   +2014-11-12  Felix Yang  felix.y...@huawei.com
   +   Haijian Zhang  z.zhanghaij...@huawei.com
   +
   +   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
   +   * config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
   +   aarch64_attribute_table, aarch64_comp_type_attributes,
   +   aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
   +   aarch64_pr_long_calls, aarch64_pr_no_long_calls,
   +   aarch64_pr_long_calls_off): New functions.
   +   (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
   +   aarch64_set_default_type_attributes.
   +   (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
   +   (TARGET_COMP_TYPE_ATTRIBUTES): Define as
   aarch64_comp_type_attribute.
   +   (aarch64_pragma_enum): New enum.
   +   (aarch64_attribute_table): New attribute table.
   +   * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
   +   aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New
 declarations.
   +   * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
   +   generate indirect call for sibling call when needed.
   +   * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified
 to
   +   exclude a symbol_ref for an indirect call.
   +
2014-11-11  Andrew Pinski  apin...@cavium.com
  
   Bug target/61997
   Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c
  
 
    ===================================================================
   --- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
   +++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
   @@ -0,0 +1,133 @@
   +/* Check that long calls to different sections are not optimized
   +to bl.  */
   +/* { dg-do compile } */
   +/* { dg-options -O2 } */
   +/* This test expects that short calls are the default.  */
   +/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls
   +} {  } } */
   +
    +#define section(S) __attribute__((section(S)))
    +#define weak __attribute__((weak))
    +#define noinline __attribute__((noinline))
    +#define 

Re: [PATCH] plugin event for C/C++ function definitions

2014-11-18 Thread Andres Tiraboschi
2014-11-14 7:08 GMT-03:00 Richard Biener richard.guent...@gmail.com:
 On Thu, Nov 13, 2014 at 6:42 PM, Andres Tiraboschi
 andres.tirabos...@tallertechnologies.com wrote:
 Hi, this patch adds a new plugin event PLUGIN_START_FUNCTION and 
 PLUGIN_FINISH_FUNCTION that are invoked at start_function and 
 finish_function respectively in the C and C++ frontends.
 PLUGIN_START_FUNCTION is called before parsing the function body.
 PLUGIN_FINISH_FUNCTION is called after parsing a function definition.

 Can you name them more specifically, like
 PLUGIN_START/FINISH_PARSE_FUNCTION please?

 Thanks,
 Richard.

Ok, I will do it


Re: [PATCH] Fix optimize_range_tests_diff

2014-11-18 Thread Richard Sandiford
Jakub Jelinek ja...@redhat.com writes:
 On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote:
 On 10/14/14 10:02, Jakub Jelinek wrote:
 When hacking on range reassoc opt, I've noticed we can emit
 code with undefined behavior even when there wasn't one originally,
 in particular for:
 (X - 43U) <= 3U || (X - 75U) <= 3U
 and this loop can transform that into
 ((X - 43U) & ~(75U - 43U)) <= 3U.  */
 we actually don't transform it to what the comment says, but
 ((X - 43) & ~(75U - 43U)) <= 3U
 i.e. the initial subtraction can be performed in signed type,
 if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction
 at gimple level would be UB (not caught by -fsanitize=undefined,
 because that is handled much earlier).
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 2014-10-14  Jakub Jelinek  ja...@redhat.com
 
 * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
 MINUS_EXPR in unsigned type to avoid undefined behavior.
 Any chance this fixes:
 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

 No.  For that I have right now:
 -  if (tree_log2 (lowxor) < 0)
 +  if (wi::popcount (wi::to_widest (lowxor)) != 1)
 in my tree, though supposedly:
    if (wi::popcount (wi::zext (wi::to_widest (lowxor), TYPE_PRECISION
 			(TREE_TYPE (lowxor)))) != 1)
 might be better, as without zext it will supposedly
 not say popcount is 1 for smaller precision signed minimum values.
 My wide-int-fu is limited, so if there is a better way to do this, I'm all
 ears.

Sorry for the very late reply, but just:

  if (wi::popcount (lowxor) != 1)

should do what you want.

Thanks,
Richard



[PATCH] Fix PR63914

2014-11-18 Thread Richard Biener

The following fixes an ICE in CCPs set_lattice_value which is
confused when transitioning from { NaN, 0.0 } to { 0.0, 0.0 }.
The comment before canonicalize_value nicely explains what
happens here (with -ffinite-math-only) but the function fails
to handle float vectors or complex float constants.  Now -- we
cannot simply canonicalize { NaN, 0.0 } to UNDEFINED as the
value is still partly defined.  Thus instead of doing the
other possible thing (drop to VARYING here) I drop the
canonicalization we do for the !HONOR_NANS case (as well
as the !HONOR_SIGNED_ZEROS as operand_equal_p deals nicely
with that).  Instead of canonicalizing I allow this kind of
lattice transitions (from NaN to !NaN if !HONOR_NANS).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
sofar.

I've chosen to change the assert to a gcc_checking_assert which
should be safe to backport and would hide the bug.

Richard.

2014-11-18  Richard Biener  rguent...@suse.de

PR tree-optimization/63914
* tree-ssa-ccp.c (canonicalize_value): Remove float value
canonicalization.
(valid_lattice_transition): Allow (partial) transition
from NaN to non-NaN if !HONOR_NANS.
(set_lattice_value): Check for valid lattice transitions
only when checking is enabled.

* gcc.dg/pr63914.c: New testcase.

Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  (revision 217653)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -402,58 +402,16 @@ set_value_varying (tree var)
   val->mask = -1;
 }
 
-/* For float types, modify the value of VAL to make ccp work correctly
-   for non-standard values (-0, NaN):
-
-   If HONOR_SIGNED_ZEROS is false, and VAL = -0, we canonicalize it to 0.
-   If HONOR_NANS is false, and VAL is NaN, we canonicalize it to UNDEFINED.
- This is to fix the following problem (see PR 29921): Suppose we have
-
- x = 0.0 * y
-
- and we set value of y to NaN.  This causes value of x to be set to NaN.
- When we later determine that y is in fact VARYING, fold uses the fact
- that HONOR_NANS is false, and we try to change the value of x to 0,
- causing an ICE.  With HONOR_NANS being false, the real appearance of
- NaN would cause undefined behavior, though, so claiming that y (and x)
- are UNDEFINED initially is correct.
-
-  For other constants, make sure to drop TREE_OVERFLOW.  */
+/* For integer constants, make sure to drop TREE_OVERFLOW.  */
 
 static void
 canonicalize_value (ccp_prop_value_t *val)
 {
-  machine_mode mode;
-  tree type;
-  REAL_VALUE_TYPE d;
-
   if (val->lattice_val != CONSTANT)
     return;
 
   if (TREE_OVERFLOW_P (val->value))
     val->value = drop_tree_overflow (val->value);
-
-  if (TREE_CODE (val->value) != REAL_CST)
-    return;
-
-  d = TREE_REAL_CST (val->value);
-  type = TREE_TYPE (val->value);
-  mode = TYPE_MODE (type);
-
-  if (!HONOR_SIGNED_ZEROS (mode)
-      && REAL_VALUE_MINUS_ZERO (d))
-    {
-      val->value = build_real (type, dconst0);
-      return;
-    }
-
-  if (!HONOR_NANS (mode)
-      && REAL_VALUE_ISNAN (d))
-    {
-      val->lattice_val = UNDEFINED;
-      val->value = NULL;
-      return;
-    }
 }
 
 /* Return whether the lattice transition is valid.  */
@@ -487,7 +445,49 @@ valid_lattice_transition (ccp_prop_value
== wi::bit_and_not (wi::to_widest (new_val.value), new_val.mask));
 
   /* Otherwise constant values have to agree.  */
-  return operand_equal_p (old_val.value, new_val.value, 0);
+  if (operand_equal_p (old_val.value, new_val.value, 0))
+return true;
+
+  /* At least the kinds and types should agree now.  */
+  if (TREE_CODE (old_val.value) != TREE_CODE (new_val.value)
+  || !types_compatible_p (TREE_TYPE (old_val.value),
+ TREE_TYPE (new_val.value)))
+return false;
+
+  /* For floats and !HONOR_NANS allow transitions from (partial) NaN
+ to non-NaN.  */
+  tree type = TREE_TYPE (new_val.value);
+  if (SCALAR_FLOAT_TYPE_P (type)
+      && !HONOR_NANS (TYPE_MODE (type)))
+    {
+      if (REAL_VALUE_ISNAN (TREE_REAL_CST (old_val.value)))
+	return true;
+    }
+  else if (VECTOR_FLOAT_TYPE_P (type)
+	   && !HONOR_NANS (TYPE_MODE (TREE_TYPE (type))))
+    {
+      for (unsigned i = 0; i < VECTOR_CST_NELTS (old_val.value); ++i)
+	if (!REAL_VALUE_ISNAN
+	       (TREE_REAL_CST (VECTOR_CST_ELT (old_val.value, i)))
+	    && !operand_equal_p (VECTOR_CST_ELT (old_val.value, i),
+				 VECTOR_CST_ELT (new_val.value, i), 0))
+	  return false;
+      return true;
+    }
+  else if (COMPLEX_FLOAT_TYPE_P (type)
+	   && !HONOR_NANS (TYPE_MODE (TREE_TYPE (type))))
+    {
+      if (!REAL_VALUE_ISNAN (TREE_REAL_CST (TREE_REALPART (old_val.value)))
+	  && !operand_equal_p (TREE_REALPART (old_val.value),
+			       TREE_REALPART (new_val.value), 0))
+	return false;
+  if (!REAL_VALUE_ISNAN (TREE_REAL_CST 

Re: [PATCH][AARCH64]Add ACLE more predefined macros

2014-11-18 Thread Marcus Shawcroft
On 17 November 2014 17:33, Renlin Li renlin...@arm.com wrote:
 Hi all,

 This is a simple patch to add more conditional macros defined ACLE 2.0.

 aarch64-none-elf target is tested on the model, no new issues.

 Is this Okay for trunk?


 gcc/ChangeLog:

 2014-11-17  Renlin Li  renlin...@arm.com

 * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
 __ARM_FP_FAST,
 __ARM_FEATURE_FMA, __ARM_FP, __ARM_FEATURE_NUMERIC_MAXMIN,
 __ARM_NEON_FP.

OK /Marcus


Re: [PATCH][AArch64] Remove/merge redundant iterators

2014-11-18 Thread Marcus Shawcroft
On 13 November 2014 10:38, Alan Lawrence alan.lawre...@arm.com wrote:
 Hi,

 gcc/config/aarch64/iterators.md contains numerous duplicates - not always
 obvious as they are not always sorted the same. Sometimes, one copy is used
 is aarch64-simd-builtins.def and another in aarch64-simd.md; othertimes
 there is no obvious pattern ;).

 This patch just removes all the duplicates; I'm willing to hear arguments
 that some of the duplication serves a useful purpose! But in the meantime,
 the complete list of changes is:
   * VSDQ_I_BHSI has no uses, remove
   * SDQ_I (duplicate of ALLI) has no uses, remove;
   * VDQQHS (duplicate of VDQ_BHSI) has no uses, remove;
   * VDQM duplicates VDQ_BHSI, use the latter;
   * VDIC and VDW duplicate VD_BHSI, use the latter;

and vdq -> vdq_i.

OK /Marcus


Re: [PING ^ 2][RFC PATCH, AARCH64] Add support for -mlong-calls option

2014-11-18 Thread Ramana Radhakrishnan



On 18/11/14 12:51, Yangfei (Felix) wrote:

On Tue, Nov 18, 2014 at 11:51 AM, Yangfei (Felix) felix.y...@huawei.com
wrote:

Ping again?  Any comment please?



Pinging daily is only going to irritate people. Please desist from doing so.

Ramana



Oh, thanks for reminding me.  And sorry if this bothers you guys.
The end of stage1 of GCC 5.0 causes me to push this a little bit :-)


You've posted this patch before stage1 closed. In any case IIRC the 
policy in the contribute pages used to suggest a weekly ping.



Ramana









Ping?  I hope this patch can catch up with stage 1 of GCC-5.0.  Thanks.





Hi Felix,

Sorry for the delay responding, I've been out of the office
recently and I'm only just catching up on a backlog of GCC related emails.

I'm in two minds about this; I can potentially see the need for
attributes to enable long calls for specific calls, and maybe
also for pragmas that can be used to efficiently mark a group of
functions in that way; but I don't really see the value in adding
a -mlong-calls option to do

this globally.


The reasoning is as follows: long calls are generally very
expensive and relatively few functions should need them in most
applications (since code that needs to span more than a single
block of 128Mbytes
- the span of a BL or B instruction - will be very rare in reality).

The best way to handle very large branches for those rare cases
where you do have a very large contiguous block of code more than
128MB is by having the linker insert veneers when needed; the
code will branch to the veneer which will insert an indirect
branch at that point (the ABI guarantees that at function call
boundaries IP0 and IP1 will not contain live values, making them available

for such purposes).


In a very small number of cases it might be desirable to mark
specific functions as being too far away to reach; in those cases
the attributes and pragma methods can be used to mark such calls
as being

far calls.


Aside: The reason -mlong-calls was added to GCC for ARM is that
the code there pre-dates the EABI, which introduced the concept
of link-time veneering of calls - the option should be
unnecessary now that almost everyone uses the EABI as the basis
for their platform ABI.  We don't have such a legacy for AArch64
and I'd need to see strong

justification for its use before adding the option there as well.


So please can you rework the patch to remove the -mlong-calls
option and just leave the attribute and pragma interfaces.

R.



Hello Richard,

   Thanks for the comments.  I agree with the idea.
   And I updated the patch with the -mlong-calls option removed and
use short call by default.
   Reg-tested for aarch64-linux-gnu with qemu.  Is it OK for trunk?


Index: gcc/ChangeLog




===================================================================
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,25 @@
+2014-11-12  Felix Yang  felix.y...@huawei.com
+   Haijian Zhang  z.zhanghaij...@huawei.com
+
+   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
+   * config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
+   aarch64_attribute_table, aarch64_comp_type_attributes,
+   aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
+   aarch64_pr_long_calls, aarch64_pr_no_long_calls,
+   aarch64_pr_long_calls_off): New functions.
+   (TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
+   aarch64_set_default_type_attributes.
+   (TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
+   (TARGET_COMP_TYPE_ATTRIBUTES): Define as
aarch64_comp_type_attribute.
+   (aarch64_pragma_enum): New enum.
+   (aarch64_attribute_table): New attribute table.
+   * config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
+   aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New

declarations.

+   * config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
+   generate indirect call for sibling call when needed.
+   * config/aarch64/predicate.md (aarch64_call_insn_operand): Modified

to

+   exclude a symbol_ref for an indirect call.
+
  2014-11-11  Andrew Pinski  apin...@cavium.com

 Bug target/61997
Index: gcc/testsuite/gcc.target/aarch64/long-calls-1.c




===================================================================
--- gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/long-calls-1.c (revision 0)
@@ -0,0 +1,133 @@
+/* Check that long calls to different sections are not optimized
+to bl.  */
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+/* This test expects that short calls are the default.  */
+/* { dg-skip-if -mlong-calls in use { *-*-* } { -mlong-calls
+} {  } } */
+
+#define section(S) __attribute__((section(S)))
+#define weak __attribute__((weak))
+#define noinline __attribute__((noinline))
+#define long_call __attribute__((long_call))
+#define short_call __attribute__((short_call))
+
+#define REMOTE_CALL(ID, TARGET_ATTRS, CALL_ATTRS)

\

+  const char 

Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Ilya Enkovich
2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing amound of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like 
 really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at now duplicating every
 builtin in the enumeration since it's a combination of the number of enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

 I think it's better to fix dbxout.c to do something sensible with
 too long lines for enum types.  It should be easy to cook up a
 testcase that is not GCC.

 enum { entry1, entry2, entry3 };


 As I understand the main problem is that fixing dbxout in trunk won't
 help to build stage1 compiler.

 Simply build stage1 without debug info on AIX then.

 Btw, you have to start fixing the bug at some point ... (we can
 backport to 4.8 and 4.9).  Of course dbxout.c is in kind of
 deep-freeze unmaintained state.

It seems that with no continuation-line support there is no easy way to limit
the stabstring length.  But it should be easy to fix this particular
problem (a too-long stabstring due to many enum values) by discarding
the rest of the values when some string-length threshold is hit.  Possible
patch:

diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index aa15a39..9b0b82d 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -168,6 +168,10 @@ along with GCC; see the file COPYING3.  If not see
 #define DBX_CONTIN_CHAR '\\'
 #endif

+#ifndef DBX_MAX_LENGTH
+#define DBX_MAX_LENGTH 0
+#endif
+
 enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED};

 /* Structure recording information about a C data type.
@@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full)
  stabstr_C (',');
  if (TREE_CHAIN (tem) != 0)
CONTIN;
+
+ /* Prevent too long strings rejected by linker.  */
+ if (DBX_CONTIN_LENGTH == 0
+     && DBX_MAX_LENGTH != 0
+     && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH)
+   break;
}

   stabstr_C (';');

With this we should be able to set limit in target config files.
Tried on linux with a small test with limit set to 20:

cat test.c
enum e1 {
  val1, val2, val3
};
gcc test.c -S -gstabs
grep e1 test.s
.stabs  e1:T(0,21)=eval1:0,val2:1,;,128,0,0,0


The last enum value was discarded.

I don't know which target should set this limit and what limit values
should be used.

Thanks,
Ilya


 I don't think we ever encouraged work-arounds for broken host compilers
 if that requires substantial work.

 not sure how exactly the issue breaks AIX bootstrap though.  I suppose
 the assembler complains?

 It is the linker that complains. Here is a build log example:
 http://toolchain.lug-owl.de/buildbot/deliver_artifact.php?mode=viewid=3073584

 I see.

 Thanks,
 Richard.

 Ilya


 Richard.

 Ilya


C++ PATCH for c++/58102 (constexpr and mutable)

2014-11-18 Thread Jason Merrill
We need to allow copy-list-initialization of constexpr variables with 
mutable members, too.  The thing we need to avoid is not so much a 
full-expression with a mutable-containing type as assuming that a mutable 
member of a variable hasn't changed since the variable was initialized.


Tested x86_64-pc-linux-gnu, applied to trunk.
commit 58e6e22ca70f9efe534f9d038f0cf9a2fe1a7504
Author: Jason Merrill ja...@redhat.com
Date:   Mon Nov 17 23:44:41 2014 -0500

	PR c++/58102
	* typeck2.c (store_init_value): Set it.
	* cp-tree.h (CONSTRUCTOR_MUTABLE_POISON): New.
	* constexpr.c (cxx_eval_outermost_constant_expr): Check it.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 5b25654..2f0708b 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3315,15 +3315,15 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
 
   verify_constant (r, allow_non_constant, non_constant_p, overflow_p);
 
-  if (TREE_CODE (t) != CONSTRUCTOR
-      && cp_has_mutable_p (TREE_TYPE (t)))
+  /* Mutable logic is a bit tricky: we want to allow initialization of
+ constexpr variables with mutable members, but we can't copy those
+ members to another constexpr variable.  */
+  if (TREE_CODE (r) == CONSTRUCTOR
+      && CONSTRUCTOR_MUTABLE_POISON (r))
 {
-  /* We allow a mutable type if the original expression was a
-	 CONSTRUCTOR so that we can do aggregate initialization of
-	 constexpr variables.  */
   if (!allow_non_constant)
-	error (%qT cannot be the type of a complete constant expression 
-	   because it has mutable sub-objects, type);
+	error (%qE is not a constant expression because it refers to 
+	   mutable subobjects of %qT, t, type);
   non_constant_p = true;
 }
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d3722d7..3542344 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -109,6 +109,7 @@ c-common.h, not after.
   DECLTYPE_FOR_LAMBDA_PROXY (in DECLTYPE_TYPE)
   REF_PARENTHESIZED_P (in COMPONENT_REF, INDIRECT_REF)
   AGGR_INIT_ZERO_FIRST (in AGGR_INIT_EXPR)
+  CONSTRUCTOR_MUTABLE_POISON (in CONSTRUCTOR)
3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out).
   ICS_BAD_FLAG (in _CONV)
   FN_TRY_BLOCK_P (in TRY_BLOCK)
@@ -3497,6 +3498,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define CONSTRUCTOR_NO_IMPLICIT_ZERO(NODE) \
   (TREE_LANG_FLAG_1 (CONSTRUCTOR_CHECK (NODE)))
 
+/* True if this CONSTRUCTOR should not be used as a variable initializer
+   because it was loaded from a constexpr variable with mutable fields.  */
+#define CONSTRUCTOR_MUTABLE_POISON(NODE) \
+  (TREE_LANG_FLAG_2 (CONSTRUCTOR_CHECK (NODE)))
+
 #define DIRECT_LIST_INIT_P(NODE) \
 (BRACE_ENCLOSED_INITIALIZER_P (NODE) && CONSTRUCTOR_IS_DIRECT_INIT (NODE))
 
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 01a0671..5748650 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -809,6 +809,10 @@ store_init_value (tree decl, tree init, vectree, va_gc** cleanups, int flags)
 	value = cxx_constant_value (value, decl);
 	}
   value = maybe_constant_init (value, decl);
+  if (TREE_CODE (value) == CONSTRUCTOR && cp_has_mutable_p (type))
+	/* Poison this CONSTRUCTOR so it can't be copied to another
+	   constexpr variable.  */
+	CONSTRUCTOR_MUTABLE_POISON (value) = true;
   const_init = (reduced_constant_expression_p (value)
 		|| error_operand_p (value));
   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C
new file mode 100644
index 000..c449c3a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable2.C
@@ -0,0 +1,10 @@
+// PR c++/58102
+// { dg-do compile { target c++11 } }
+
+struct S {
+  mutable int n;
+  constexpr S() : n() {}
+};
+
+constexpr S s = {};
+constexpr S s2 = s;		// { dg-error mutable }


[PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Andrew Bennett
Hi,

The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
using the -mfix-r1 option.  This is due to the fact that the delay 
slot of the branch instruction that checks if the atomic operation was
not successful can be filled with an operation that returns the output 
result.  For the branch-likely case this operation cannot go in the 
delay slot because it won't execute when the atomic operation was 
successful, and therefore the wrong result will be returned.  This patch 
forces a nop to be placed in the delay slot if a branch likely is used.  
The fix is as simple as possible; it does cause a second nop to be introduced 
after the branch instruction in the branch likely case.  As this option is 
rarely used, it makes more sense to keep the code readable than trying to 
optimise it.

The atomic tests now pass successfully.  The ChangeLog and patch are below.

Ok to commit?

Many thanks,



Andrew



gcc/
* config/mips/mips.c (mips_process_sync_loop): Place a nop in the 
delay slot of the branch likely instruction.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 02268f3..6dd3728 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
  This will sometimes be a delayed branch; see the write code below
  for details.  */
   mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL);
-  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
+
+  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot
+ of the branch if it is a branch likely because they will not execute when
+ the atomic operation is successful, so place a NOP in there instead.  */
+
+  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
 
   /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
   if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)



Re: [PATCH 5/5] combine: preferably delete dead SETs in PARALLELs

2014-11-18 Thread Segher Boessenkool
On Tue, Nov 18, 2014 at 06:38:49AM -0600, Segher Boessenkool wrote:
  Does it help pr52714 where we'd like to rip apart a PARALLEL with two 
  sets, one of which is dead.  If so, it might allow us to  optimize that 
  code better.
 
 It does not seem to fix the testcase.  I'll investigate why not.
 You're talking about the
 
   (parallel [(set (pc) (pc))
  (set (X) (sp))])
 
 right?  I guess the set pc pc is not marked as unused...

The very first thing that is checked for is !(added_sets_2 && i1 == 0),
which matches in this case.  I wonder what that condition is supposed
to accomplish -- optimisation I suppose?

Hacking that out, it still won't match because the set pc pc is not
considered dead.  It's a no-op, not the same thing.  I'll try to widen
the condition...


Segher


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread David Edelsohn
On Tue, Nov 18, 2014 at 8:33 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the amount of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like 
 really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could 
 investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at not duplicating every
 builtin in the enumeration since it's a combination of the number of 
 enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

 I think it's better to fix dbxout.c to do something sensible with
 too long lines for enum types.  It should be easy to cook up a
 testcase that is not GCC.

 enum { entry1, entry2, entry3 };


 As I understand the main problem is that fixing dbxout in trunk won't
 help to build stage1 compiler.

 Simply build stage1 without debug info on AIX then.

 Btw, you have to start fixing the bug at some point ... (we can
 backport to 4.8 and 4.9).  Of course dbxout.c is in kind of
 deep-freeze unmaintained state.

 Seems with no continuation lines support there is no easy way to limit
 stabstring length.  But it should be easy to fix this particular
 problem (too long stabstring due to many enum values) by discarding
 the rest of values when some string length threshold is hit.  Possible
 patch:

 diff --git a/gcc/dbxout.c b/gcc/dbxout.c
 index aa15a39..9b0b82d 100644
 --- a/gcc/dbxout.c
 +++ b/gcc/dbxout.c
 @@ -168,6 +168,10 @@ along with GCC; see the file COPYING3.  If not see
  #define DBX_CONTIN_CHAR '\\'
  #endif

 +#ifndef DBX_MAX_LENGTH
 +#define DBX_MAX_LENGTH 0
 +#endif
 +
  enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED};

  /* Structure recording information about a C data type.
 @@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full)
   stabstr_C (',');
   if (TREE_CHAIN (tem) != 0)
 CONTIN;
 +
 + /* Prevent too long strings rejected by linker.  */
 + if (DBX_CONTIN_LENGTH == 0
 +     && DBX_MAX_LENGTH != 0
 +     && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH)
 +   break;
 }

stabstr_C (';');

 With this we should be able to set limit in target config files.
 Tried on linux with a small test with limit set to 20:

cat test.c
 enum e1 {
   val1, val2, val3
 };
gcc test.c -S -gstabs
grep e1 test.s
 .stabs  "e1:T(0,21)=eval1:0,val2:1,;",128,0,0,0


 The last enum value was discarded.

 I don't know which target should set this limit and what limit values
 should be used.

Hi, Ilya

Thanks for the preliminary patch to dbxout.c.  I was planning to work
on something similar today.

Continuations would have been a nice solution, but, as I wrote
earlier, AIX assembler does not honor stabstring continuations.  It's
one line, period. As much as can fit on one line.  I believe that AIX
allows up to 64K.

My work-around was not meant as a solution.  I was trying to allow GCC
to build on AIX while we work on a solution and trying to help
everyone understand the source of the problem.

We definitely need a fix for dbxout in GCC and backport it to GCC 4.8
and GCC 4.9 branches.

I still think that it is rather unfortunate that the infrastructure
for CHKP introduces so much bloat into GCC on every target and every
architecture even when it is not enabled and cannot function.

Thanks, David


Re: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Maciej W. Rozycki
On Tue, 18 Nov 2014, Andrew Bennett wrote:

 The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
 using the -mfix-r10000 option.  This is due to the fact that the delay 
 slot of the branch instruction that checks if the atomic operation was
 not successful can be filled with an operation that returns the output 
 result.  For the branch likely case this operation can not go in the 
 delay slot because it won't execute when the atomic operation was 
 successful and therefore the wrong result will be returned.  This patch 
 forces a nop to be placed in the delay slot if a branch likely is used.  
 The fix is as simple as possible; it does cause a second nop to be introduced 
 after the branch instruction in the branch likely case.  As this option is 
 rarely used, it makes more sense to keep the code readable than trying to 
 optimise it.

 Can you please be a bit more elaborate on what the wrong code sequence 
looks like, why it is produced that way and how your fix changes it?  
Without seeing examples of the generated code it is very hard to imagine 
what is really going on here.

  Maciej


Re: C++ PATCH for c++/58102 (constexpr and mutable)

2014-11-18 Thread Paolo Carlini

On 11/18/2014 02:35 PM, Jason Merrill wrote:
We need to allow copy-list-initialization of constexpr variables with 
mutable members, too.  The thing we need to avoid is not so much an 
full-expression with mutable-containing type as assuming that a 
mutable member of a variable hasn't changed since the variable was 
initialized.
Thanks. You suggest a similar fix a while ago but I failed to pursue it, 
sorry.


Paolo.


Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread Richard Biener
On Tue, Nov 18, 2014 at 2:44 PM, David Edelsohn dje@gmail.com wrote:
 On Tue, Nov 18, 2014 at 8:33 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-11-18 15:18 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 1:13 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 15:04 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Nov 18, 2014 at 11:51 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-11-18 5:56 GMT+03:00 Jeff Law l...@redhat.com:
 On 11/17/14 13:43, Ilya Enkovich wrote:


 I don't fully understand how this problem appears.  Is it fully AIX
 specific and doesn't affect any other target?  May we put all _CHKP
 codes to the end of enum and ignore them by AIX? Limiting number of
 codes in enum due to restrictions of specific debug info format seems
 weird.  If something cannot be encoded for debug info then it should
 be ignored.  I really hoped that creation of functions by demand would
 allow to avoid such kind of problems :(

 I'll try to make a patch reducing the amount of builtin codes to a
 required minimum in case it appears to be the best option we have.

 It's a problem with the AIX native tools.  Presumably they don't like 
 really
 long lines -- it was a relatively common issue in the past with vendor
 tools.

 I think we should proceed on two fronts.  First if David could 
 investigate
 the stabs-in-xcoff bits a bit to see if DBX_CONTIN_LEN can be used to
 resolve the issue.  Second if you could look at not duplicating every
 builtin in the enumeration since it's a combination of the number of 
 enums
 and the length of the debug strings to represent them that's causing AIX
 heartburn.



 jeff

 I see.  I can reduce the number of _CHKP builtin codes. Experimental
 patch shows resulting END_BUILTINS = 1242.  But we should expect
 similar problem for i386 target builds hosted on AIX (here
 http://toolchain.lug-owl.de/buildbot/ I see such build is tested).
 Current IX86_BUILTIN_MAX is 2467 which is greater than current
 END_BUILTINS.

 I think it's better to fix dbxout.c to do something sensible with
 too long lines for enum types.  It should be easy to cook up a
 testcase that is not GCC.

 enum { entry1, entry2, entry3 };


 As I understand the main problem is that fixing dbxout in trunk won't
 help to build stage1 compiler.

 Simply build stage1 without debug info on AIX then.

 Btw, you have to start fixing the bug at some point ... (we can
 backport to 4.8 and 4.9).  Of course dbxout.c is in kind of
 deep-freeze unmaintained state.

 Seems with no continuation lines support there is no easy way to limit
 stabstring length.  But it should be easy to fix this particular
 problem (too long stabstring due to many enum values) by discarding
 the rest of values when some string length threshold is hit.  Possible
 patch:

 diff --git a/gcc/dbxout.c b/gcc/dbxout.c
 index aa15a39..9b0b82d 100644
 --- a/gcc/dbxout.c
 +++ b/gcc/dbxout.c
 @@ -168,6 +168,10 @@ along with GCC; see the file COPYING3.  If not see
  #define DBX_CONTIN_CHAR '\\'
  #endif

 +#ifndef DBX_MAX_LENGTH
 +#define DBX_MAX_LENGTH 0
 +#endif
 +
  enum typestatus {TYPE_UNSEEN, TYPE_XREF, TYPE_DEFINED};

  /* Structure recording information about a C data type.
 @@ -2270,6 +2274,12 @@ dbxout_type (tree type, int full)
   stabstr_C (',');
   if (TREE_CHAIN (tem) != 0)
 CONTIN;
 +
 + /* Prevent too long strings rejected by linker.  */
 + if (DBX_CONTIN_LENGTH == 0
 +     && DBX_MAX_LENGTH != 0
 +     && obstack_object_size (stabstr_ob) > DBX_MAX_LENGTH)
 +   break;
 }

stabstr_C (';');

 With this we should be able to set limit in target config files.
 Tried on linux with a small test with limit set to 20:

cat test.c
 enum e1 {
   val1, val2, val3
 };
gcc test.c -S -gstabs
grep e1 test.s
 .stabs  "e1:T(0,21)=eval1:0,val2:1,;",128,0,0,0


 The last enum value was discarded.

 I don't know which target should set this limit and what limit values
 should be used.

 Hi, Ilya

 Thanks for the preliminary patch to dbxout.c.  I was planning to work
 on something similar today.

 Continuations would have been a nice solution, but, as I wrote
 earlier, AIX assembler does not honor stabstring continuations.

Can we emit this particular STABS piece in raw form then?  Thus
does it work to have sth like

   .stabs ...
   .string ...
   .stabs ...

or whatever way to emit assembled data there?

  It's
 one line, period. As much as can fit on one line.  I believe that AIX
 allows up to 64K.

Can we get the assembler fixed? (sorry to ask this stupid question
again ...)

 My work-around was not meant as a solution.  I was trying to allow GCC
 to build on AIX while we work on a solution and trying to help
 everyone understand the source of the problem.

 We definitely need a fix for dbxout in GCC and backport it to GCC 4.8
 and GCC 4.9 branches.

 I still think that it is rather unfortunate that the infrastructure
 for CHKP introduces so much bloat into GCC on every target and every
 architecture even when it is not enabled and cannot function.

Re: [PATCH] Check 'fd' neither -1 nor 0, before close it

2014-11-18 Thread Chen Gang
On 11/18/14 0:38, Joseph Myers wrote:
 On Sat, 15 Nov 2014, Chen Gang wrote:
 
 Also in c_common_read_pch(), when failure occurs, also need be sure the
 'fd' is not '-1' for the next close operation.
 
 Please clarify how c_common_read_pch gets called with fd == -1.  
 Certainly checking after an error is too late; we shouldn't call fdopen in 
 that case at all, and I think something further up the call chain should 
 have avoided calling c_common_read_pch with fd == -1 at all (the question 
 is just exactly what point on the call chain is missing such a check and 
 needs it added).
 

According to the current source code, the author intends that 'fd' should
never be '-1' in c_common_read_pch().  Running grep -rn read_pch * in the
root directory, then grep -rn _cpp_stack_file *, shows it is used in 3 areas:

 - c_common_pch_pragma() in gcc/c-family/c-pch.c, it has already
   checked 'fd' before call c_common_read_pch()

 - _cpp_stack_include() in libcpp/files.c, before _cpp_stack_file(),
   has called and checked _cpp_find_file().

 - cpp_read_main_file() in libcpp/init.c, before _cpp_stack_file(),
   has called and checked _cpp_find_file().

In c_common_read_pch(), even if 'fd' is '-1', we should check it first
before calling fdopen(), instead of checking for '-1' in the failure code
block.

But for me, it contains another 2 related issues:

 - _cpp_find_file() always returns a valid pointer, so the related check
   file == NULL in _cpp_stack_include() is of no use for _cpp_find_file().

 - According to its comments, _cpp_find_file() cannot guarantee that
   'file->fd' is not '-1'; even when file->err_no == 0, we may need to
   reopen it if necessary.

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


Re: RFA (tree-inline): PATCH for more C++14 constexpr support

2014-11-18 Thread Andreas Schwab
Jason Merrill ja...@redhat.com writes:

 commit e52e82e56507d1de1932abcafd80683c4dc00d1e
 Author: Jason Merrill ja...@redhat.com
 Date:   Sun Nov 16 17:14:12 2014 -0500

   * constexpr.c (use_new_call): Always use new call handling.

 diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
 index 57d0c46..8881271 100644
 --- a/gcc/cp/constexpr.c
 +++ b/gcc/cp/constexpr.c
 @@ -1021,8 +1021,8 @@ adjust_temp_type (tree type, tree temp)
  }
  
  /* True if we want to use the new handling of constexpr calls based on
 -   DECL_SAVED_TREE.  Currently only active for C++14 mode.  */
-#define use_new_call (cxx_dialect >= cxx14)
 +   DECL_SAVED_TREE.  */
 +#define use_new_call true
  
  /* Subroutine of cxx_eval_call_expression.
 We are processing a call expression (either CALL_EXPR or

FAIL: 18_support/numeric_limits/requirements/constexpr_functions.cc (test for 
excess errors)

$ gcc/xg++ -shared-libgcc -Bgcc -nostdinc++ -Lm68k-linux/libstdc++-v3/src 
-Lm68k-linux/libstdc++-v3/src/.libs -Lm68k-linux/libstdc++-v3/libsupc++/.libs 
-B/daten/cross/m68k-linux/m68k-linux/bin/ 
-B/daten/cross/m68k-linux/m68k-linux/lib/ -isystem 
/daten/cross/m68k-linux/m68k-linux/include -isystem 
/daten/cross/m68k-linux/m68k-linux/sys-include 
-Bm68k-linux/./libstdc++-v3/src/.libs -D_GLIBCXX_ASSERT -fmessage-length=0 
-ffunction-sections -fdata-sections -O2 -g -D_GNU_SOURCE -DLOCALEDIR=. 
-nostdinc++ -Im68k-linux/libstdc++-v3/include/m68k-linux 
-Im68k-linux/libstdc++-v3/include -I../libstdc++-v3/libsupc++ 
-I../libstdc++-v3/include/backward -I../libstdc++-v3/testsuite/util 
../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc
 -std=gnu++14 -S -o constexpr_functions.s
In file included from 
../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc:21:0:
../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc:
 In instantiation of ‘void 
__gnu_test::constexpr_member_functions::operator()()::_Concept::__constraint() 
[with _Ttesttype = std::numeric_limitschar; _Tbasetype = char]’:
../libstdc++-v3/testsuite/18_support/numeric_limits/requirements/constexpr_functions.cc:55:2:
   required from ‘void __gnu_test::constexpr_member_functions::operator()() 
[with _Ttesttype = std::numeric_limitschar; _Tbasetype = char]’
m68k-linux/libstdc++-v3/include/ext/typelist.h:197:2:   required from ‘void 
__gnu_cxx::typelist::detail::apply_generator2_Gn, 
__gnu_cxx::typelist::chainHd, Tl, __gnu_cxx::typelist::chainHd2, TlV 
::operator()(Gn) [with Gn = __gnu_test::constexpr_member_functions; Hd1 = 
std::numeric_limitschar; TlT = 
__gnu_cxx::typelist::chainstd::numeric_limitssigned char, 
__gnu_cxx::typelist::chainstd::numeric_limitsunsigned char, 
__gnu_cxx::typelist::chainstd::numeric_limitsshort int, 
__gnu_cxx::typelist::chainstd::numeric_limitsshort unsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitsint, 
__gnu_cxx::typelist::chainstd::numeric_limitsunsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong unsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong long int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong long unsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitswchar_t, 
__gnu_cxx::typelist::chainstd::numeric_limitschar16_t, 
__gnu_cxx::typelist::chainstd::numeric_limitschar32_t, 
__gnu_cxx::typelist::null_type; Hd2 = char; TlV = 
__gnu_cxx::typelist::chainsigned char, __gnu_cxx::typelist::chainunsigned 
char, __gnu_cxx::typelist::chainshort int, __gnu_cxx::typelist::chainshort 
unsigned int, __gnu_cxx::typelist::chainint, 
__gnu_cxx::typelist::chainunsigned int, __gnu_cxx::typelist::chainlong int, 
__gnu_cxx::typelist::chainlong unsigned int, __gnu_cxx::typelist::chainlong 
long int, __gnu_cxx::typelist::chainlong long unsigned int, 
__gnu_cxx::typelist::chainwchar_t, __gnu_cxx::typelist::chainchar16_t, 
__gnu_cxx::typelist::chainchar32_t, __gnu_cxx::typelist::null_type  
  ]’
m68k-linux/libstdc++-v3/include/ext/typelist.h:199:6:   required from ‘void 
__gnu_cxx::typelist::detail::apply_generator2_Gn, 
__gnu_cxx::typelist::chainHd, Tl, __gnu_cxx::typelist::chainHd2, TlV 
::operator()(Gn) [with Gn = __gnu_test::constexpr_member_functions; Hd1 = 
std::numeric_limitsbool; TlT = 
__gnu_cxx::typelist::chainstd::numeric_limitschar, 
__gnu_cxx::typelist::chainstd::numeric_limitssigned char, 
__gnu_cxx::typelist::chainstd::numeric_limitsunsigned char, 
__gnu_cxx::typelist::chainstd::numeric_limitsshort int, 
__gnu_cxx::typelist::chainstd::numeric_limitsshort unsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitsint, 
__gnu_cxx::typelist::chainstd::numeric_limitsunsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong unsigned int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong long int, 
__gnu_cxx::typelist::chainstd::numeric_limitslong long unsigned int, 

Re: [PATCH, Pointer Bounds Checker, Builtins instrumentation 3/5] Expand instrumented builtin calls

2014-11-18 Thread David Edelsohn
On Tue, Nov 18, 2014 at 9:07 AM, Richard Biener
richard.guent...@gmail.com wrote:

 Can we emit this particular STABS piece in raw form then?  Thus
 does it work to have sth like

.stabs ...
.string ...
.stabs ...

 or whatever way to emit assembled data there?

The stabstring debug information is saved as a length and a string.
For 32-bit AIX, the length is an unsigned short.  Longer entries
apparently can be stored in multiple symbol table entries, but it's
not clear how to emit assembly to produce that effect.  For 64-bit
AIX, the length is an unsigned 32-bit int, so one may be able to work
around this problem by compiling GCC on AIX in 64-bit mode.  A
conversion to a default 64-bit build will take a lot of work.

Thanks, David


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Andrew Bennett
 -Original Message-
 From: Maciej W. Rozycki [mailto:ma...@codesourcery.com]
 Sent: 18 November 2014 13:48
 To: Andrew Bennett
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] If using branch likelies in MIPS sync code fill the delay
 slot with a nop
 
 On Tue, 18 Nov 2014, Andrew Bennett wrote:
 
  The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
  using the -mfix-r10000 option.  This is due to the fact that the delay
  slot of the branch instruction that checks if the atomic operation was
  not successful can be filled with an operation that returns the output
  result.  For the branch likely case this operation can not go in the
  delay slot because it won't execute when the atomic operation was
  successful and therefore the wrong result will be returned.  This patch
  forces a nop to be placed in the delay slot if a branch likely is used.
  The fix is as simple as possible; it does cause a second nop to be
 introduced
  after the branch instruction in the branch likely case.  As this option is
  rarely used, it makes more sense to keep the code readable than trying to
  optimise it.
 
  Can you please be a bit more elaborate on how the wrong code sequence
 looks like, why it is produced like that and how your fix changes it?
 Without seeing examples of code generated it is very hard to imagine
 what is really going on here.

Ok if we compile the following cut-down atomic-op-3.c test case with
-mfix-r10000.

extern void abort(void);

int v, count, res;
int
main ()
{
  v = 0;
  count = 1;

  if (__atomic_add_fetch (&v, count, __ATOMIC_RELAXED) != 1)
abort ();

  return 0;
}

The following assembly code is produced for the atomic operation:

.setnoat
sync # 15   sync_new_addsi/2[length = 24]
1:
ll  $3,0($4)
addu$1,$3,$2
sc  $1,0($4)
beql$1,$0,1b
addu$3,$3,$2
sync
.setat


Notice here that the addu is in the delay slot of the branch-likely
instruction.  It computes the value that exists in the memory location
after the atomic operation has completed.  Because it is a branch-likely
instruction, the delay slot only executes when the branch is taken, i.e.
when the atomic operation fails; hence when it succeeds the correct value
is never computed and the wrong result is returned.


The offending code is in mips_process_sync_loop:

  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);

  /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
  if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)
    {
      mips_multi_copy_insn (tmp3_insn);
      mips_multi_set_operand (mips_multi_last_index (), 0, newval);
    }
  else if (!(required_oldval && cmp))
    mips_multi_add_insn ("nop", NULL);

  /* CMP = 1 -- either standalone or in a delay slot.  */
  if (required_oldval && cmp)
    mips_multi_add_insn ("li\t%0,1", cmp, NULL);


For the branch-likely case the delay slot could be filled with one of three
things: the operation that computes the value present in memory after the
atomic operation has completed; an operation that records whether the atomic
operation succeeded; or a nop.  The first two must not be placed in the
delay slot of a branch-likely instruction because they will only run if the
atomic operation was unsuccessful.

My fix places a nop in the delay slot of the branch likely instruction
by using the %~ output operation.  This then causes the sync code for the 
previous example to be correct:

.setnoat
sync # 15   sync_new_addsi/2[length = 24]
1:
ll  $3,0($4)
addu$1,$3,$2
sc  $1,0($4)
beql$1,$0,1b
nop
addu$3,$3,$2
sync
.setat



Regards,



Andrew




small C++ PATCH to instantiate_template_1

2014-11-18 Thread Jason Merrill
While looking at 55992 I noticed that we weren't pushing into dependent 
scope properly; we need to pass the entering_scope flag to 
tsubst_aggr_type so that we get the version of the type with the 
members.  This isn't enough to fix 55992, but may well fix other bugs.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit c12c225ae102f77c6b4b1f98a69adbee474cb3a3
Author: Jason Merrill ja...@redhat.com
Date:   Mon Nov 17 16:26:45 2014 -0500

	* pt.c (instantiate_template_1): Use tsubst_aggr_type for context.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c0f3b8c..6661325 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15856,8 +15856,8 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
 ++processing_template_decl;
   if (DECL_CLASS_SCOPE_P (gen_tmpl))
 {
-  tree ctx = tsubst (DECL_CONTEXT (gen_tmpl), targ_ptr,
-			 complain, gen_tmpl);
+  tree ctx = tsubst_aggr_type (DECL_CONTEXT (gen_tmpl), targ_ptr,
+   complain, gen_tmpl, true);
   push_nested_class (ctx);
 }
   /* Substitute template parameters to obtain the specialization.  */


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Matthew Fortune
 The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
 using the -mfix-r10000 option.  This is due to the fact that the delay
 slot of the branch instruction that checks if the atomic operation was
 not successful can be filled with an operation that returns the output
 result.  For the branch likely case this operation can not go in the
 delay slot because it won't execute when the atomic operation was
 successful and therefore the wrong result will be returned.  This patch
 forces a nop to be placed in the delay slot if a branch likely is used.
 The fix is as simple as possible; it does cause a second nop to be
 introduced
 after the branch instruction in the branch likely case.  As this option is
 rarely used, it makes more sense to keep the code readable than trying to
 optimise it.
 
 The atomic tests now pass successfully.  The ChangeLog and patch are
 below.
 
 Ok to commit?

I'm OK with just making the -mfix-r10000 case safe rather than also
complicating the normal code path to remove the normal delay slot NOP in
this special case.  I'd like to see what Catherine thinks though.  To
remove the second NOP I believe we would have to add a !TARGET_FIX_R10000
to the condition of the normal delay slot NOP. This seems a bit convoluted
when the real reason is because branch likely is in use.  To make use of
the mips_branch_likely flag it would have to be set earlier in:
mips_output_sync_loop and also get set in mips_sync_loop_insns.

If Catherine thinks getting rid of the second NOP is important enough then
please base it on mips_branch_likely and fix the callers.

 gcc/
   * config/mips/mips.c (mips_process_sync_loop): Place a nop in the
   delay slot of the branch likely instruction.
 
 
 diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
 index 02268f3..6dd3728 100644
 --- a/gcc/config/mips/mips.c
 +++ b/gcc/config/mips/mips.c
 @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
 *operands)
   This will sometimes be a delayed branch; see the write code below
   for details.  */
mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem,
 NULL);
 -  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
 +
 +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay
 slot
 + of the branch if it is a branch likely because they will not execute
 when
 + the atomic operation is successful, so place a NOP in there instead.
 */
 +

I found the comment hard to digest, perhaps:

/* When using branch likely (-mfix-r10000), the delay slot instruction will
   be annulled on false.  The normal delay slot instructions calculate the
   overall result of the atomic operation and must not be annulled.
   Unconditionally use a NOP instead for the branch likely case.  */

 +  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
 
/* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 !=
 newval)

Matthew


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Maciej W. Rozycki
On Tue, 18 Nov 2014, Andrew Bennett wrote:

 My fix places a nop in the delay slot of the branch likely instruction
 by using the %~ output operation.  This then causes the sync code for the 
 previous example to be correct:
 
 .setnoat
 sync # 15   sync_new_addsi/2[length = 24]
 1:
 ll  $3,0($4)
 addu$1,$3,$2
 sc  $1,0($4)
 beql$1,$0,1b
 nop
 addu$3,$3,$2
 sync
 .setat

 OK, this does look to me like the correct way to address the issue, but 
where is the second NOP that you previously mentioned?  I fail to see it 
here and this code can't be made any better, there isn't anything you 
could possibly schedule into the delay slot as there is nothing else to 
do in this loop.

  Maciej


Re: [PATCH][wwwdocs] Update 5.0 changes.html with Thumb1 UAL

2014-11-18 Thread Kyrill Tkachov


On 18/11/14 02:48, Terry Guo wrote:

+ ul
+  li The Thumb-1 assembly code are now generated in unified syntax. The 
new option
+code-masm-syntax-unified/code can be used to specify whether 
inline assembly
+code are using unified syntax. By default the option is off which means
+non-unified syntax is used. However this is subject to change in 
future releases.
+Eventually the non-unified syntax will be deprecated.
+  /li
+ /ul

Hi Terry,

Sorry for the late comment, I see this has already been committed.

I think it should be assembly code is now generated.
Also whether inline assembly code is using unified syntax.

Kyrill



Re: [PATCH, MPX wrappers 2/3] Replace some function calls with wrapper calls during instrumentation

2014-11-18 Thread Ilya Enkovich
On 15 Nov 00:16, Jeff Law wrote:
 On 11/14/14 10:29, Ilya Enkovich wrote:
 Hi,
 
 This patch adds wrapper calls.  It's simply achieved by using proper name 
 for builtin clones.  Patches apply over builtins instrumentation series.
 
 Thanks,
 Ilya
 --
 2014-11-14  Ilya Enkovich  ilya.enkov...@intel.com
 
  * c-family/c.opt (fchkp-use-wrappers): New.
  * ipa-chkp.c (CHKP_WRAPPER_SYMBOL_PREFIX): New.
  (chkp_wrap_function): New.
  (chkp_build_instrumented_fndecl): Support wrapped
  functions.
 invoke.texi documentation please.
 
 OK with the invoke.texi doc work.  Patch #3 is OK.
 
 Jeff
 

invoke.texi was missing descriptions for all the chkp options.  I have
added them.  Here is a new patch.

Thanks,
Ilya
--
2014-11-18  Ilya Enkovich  ilya.enkov...@intel.com

* c-family/c.opt (fchkp-use-wrappers): New.
* ipa-chkp.c (CHKP_WRAPPER_SYMBOL_PREFIX): New.
(chkp_wrap_function): New.
(chkp_build_instrumented_fndecl): Support wrapped
functions.
* doc/invoke.texi (-fcheck-pointer-bounds): New.
(-fchkp-check-incomplete-type): New.
(-fchkp-first-field-has-own-bounds): New.
(-fchkp-narrow-bounds): New.
(-fchkp-narrow-to-innermost-array): New.
(-fchkp-optimize): New.
(-fchkp-use-fast-string-functions): New.
(-fchkp-use-nochk-string-functions): New.
(-fchkp-use-static-bounds): New.
(-fchkp-use-static-const-bounds): New.
(-fchkp-treat-zero-dynamic-size-as-infinite): New.
(-fchkp-check-read): New.
(-fchkp-check-write): New.
(-fchkp-store-bounds): New.
(-fchkp-instrument-calls): New.
(-fchkp-instrument-marked-only): New.
(-fchkp-use-wrappers): New.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 85dcb98..a9f8f99 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1040,6 +1040,10 @@ fchkp-instrument-marked-only
 C ObjC C++ ObjC++ LTO Report Var(flag_chkp_instrument_marked_only) Init(0)
 Instrument only functions marked with bnd_instrument attribute.
 
+fchkp-use-wrappers
+C ObjC C++ ObjC++ LTO Report Var(flag_chkp_use_wrappers) Init(1)
+Transform instrumented builtin calls into calls to wrappers.
+
 fcilkplus
 C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
 Enable Cilk Plus
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89edddb..09a0683 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -299,6 +299,15 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion @gol
 -fsanitize=@var{style} -fsanitize-recover -fsanitize-recover=@var{style} @gol
 -fasan-shadow-offset=@var{number} -fsanitize-undefined-trap-on-error @gol
+-fcheck-pointer-bounds -fchkp-check-incomplete-type @gol
+-fchkp-first-field-has-own-bounds -fchkp-narrow-bounds @gol
+-fchkp-narrow-to-innermost-array -fchkp-optimize @gol
+-fchkp-use-fast-string-functions -fchkp-use-nochk-string-functions @gol
+-fchkp-use-static-bounds -fchkp-use-static-const-bounds @gol
+-fchkp-treat-zero-dynamic-size-as-infinite -fchkp-check-read @gol
+-fchkp-check-read -fchkp-check-write -fchkp-store-bounds @gol
+-fchkp-instrument-calls -fchkp-instrument-marked-only @gol
+-fchkp-use-wrappers @gol
 -fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol
 -fdisable-ipa-@var{pass_name} @gol
 -fdisable-rtl-@var{pass_name} @gol
@@ -5693,6 +5702,117 @@ a @code{libubsan} library routine.  The advantage of 
this is that the
 @code{libubsan} library is not needed and will not be linked in, so this
 is usable even for use in freestanding environments.
 
+@item -fcheck-pointer-bounds
+@opindex fcheck-pointer-bounds
+@opindex fno-check-pointer-bounds
+Enable Pointer Bounds Checker instrumentation.  Each memory reference
+is instrumented with checks of pointer used for memory access against
+bounds associated with that pointer.  Generated instrumentation may
+be controlled by various @option{-fchkp-*} options.
+
+@item -fchkp-check-incomplete-type
+@opindex fchkp-check-incomplete-type
+@opindex fno-chkp-check-incomplete-type
+Generate pointer bounds checks for variables with incomplete type.
+Enabled by default.
+
+@item -fchkp-narrow-bounds
+@opindex fchkp-narrow-bounds
+@opindex fno-chkp-narrow-bounds
+Controls bounds used by Pointer Bounds Checker for pointers to object
+fields.  If narrowing is enabled then field bounds are used.  Otherwise
+object bounds are used.  See also @option{-fchkp-narrow-to-innermost-array}
+and @option{-fchkp-first-field-has-own-bounds}.  Enabled by default.
+
+@item -fchkp-first-field-has-own-bounds
+@opindex fchkp-first-field-has-own-bounds
+@opindex fno-chkp-first-field-has-own-bounds
+Forces Pointer Bounds Checker to use narrowed bounds for address of the
+first field in the structure.  By default pointer to the first field has
+the same bounds as pointer to the whole structure.
+
+@item -fchkp-narrow-to-innermost-array
+@opindex fchkp-narrow-to-innermost-array
+@opindex 

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-18 Thread Andrew Bennett
  OK, this does look to me like the correct way to address the issue, but
 where is the second NOP that you previously mentioned?  I fail to see it
 here and this code can't be made any better, there isn't anything you
 could possibly schedule into the delay slot as there is nothing else to
 do in this loop.

The following testcase shows this occurring.

short v, count, ret;

int
main ()
{
  v = 0;
  count = 0;

  __atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED);

  return 0;
}

Produces (for the atomic operation):

        .set    noat
        sync
1:
        ll      $3,0($5)
        and     $1,$3,$4
        bne     $1,$7,2f
        and     $1,$3,$6
        or      $1,$1,$8
        sc      $1,0($5)
        beql    $1,$0,1b
        nop
        nop
        sync
2:
        .set    at


Regards,



Andrew


Re: [ping] account for register spans in expand_builtin_init_dwarf_reg_sizes

2014-11-18 Thread Olivier Hainque

On Nov 18, 2014, at 03:29 , Jason Merrill ja...@redhat.com wrote:
 What happens when the outer loop hits a register that we've already seen as 
 part of a span?

 Hmm, I was under the impression that this was supposed never to happen. I can
 see that this is not so clear though.

 What would happen with the current code is the production of extraneous
 instructions initializing the same slot, presumably with the same values.

 I have a candidate improvement to prevent processing the same regno
 multiple times.
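A candidate shape for that guard can be sketched as a "seen" set keyed by the unwind column, so a register already covered by an earlier span member is skipped by the outer loop.  The name mirrors the patch's init_one_dwarf_reg_size helper, but the set-based bookkeeping below is only an illustration, not the real implementation:

```cpp
#include <set>

// Illustrative model of the de-duplication: several hardware regnos in
// a span can map to the same unwind column, so remember which columns
// have already been initialized and skip repeats in the outer loop.
static std::set<int> initialized_columns;

// Returns true if this call actually emitted the initialization, false
// if the column was already covered by an earlier span member.
bool init_one_dwarf_reg_size_once (int unwind_column)
{
  if (!initialized_columns.insert (unwind_column).second)
    return false;
  // ... here the real code would store this column's register size
  // into the table slot ...
  return true;
}
```

This avoids the extraneous (if harmless) duplicate stores into the same slot that the current code would produce.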

 Will followup as soon as I have some test results.

 Thanks for your feedback,

 Olivier

 

Re: RFA (tree-inline): PATCH for more C++14 constexpr support

2014-11-18 Thread Kyrill Tkachov


On 17/11/14 18:15, Jason Merrill wrote:

On 11/17/2014 05:29 AM, Richard Biener wrote:

can you rename it to copy_fn please?  It really copies parameter and
result and then the body.

Ok with that change.

Done.  Here's what I'm checking in, along with a second patch to enable
the new code for C++11 as well:


Hi Jason,

These broke building libstdc++ on arm-none-eabi.
I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63936 for it.

Cheers,
Kyrill





[PATCH] Fix PR63868 - Guard offloading with ifdef ENABLE_OFFLOADING

2014-11-18 Thread Ilya Verbin
Hi,

This patch disables the generation of sections with offload IR and offload
tables by default, when the compiler is configured without
--enable-offload-targets=.
Bootstrap and regression testing are in progress.  OK to commit once they pass?

  -- Ilya


PR regression/63868
gcc/
* cgraph.c (cgraph_node::create): Guard g->have_offload with
ifdef ENABLE_OFFLOADING.
* omp-low.c (create_omp_child_function): Likewise.
(expand_omp_target): Guard node and offload_funcs
with ifdef ENABLE_OFFLOADING.
* varpool.c (varpool_node::get_create): Guard g->have_offload and
offload_vars with ifdef ENABLE_OFFLOADING.


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index cc04744..18ae6a8 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -500,7 +500,9 @@ cgraph_node::create (tree decl)
lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
 {
   node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
   g->have_offload = true;
+#endif
 }
 
   node->register_symbol ();
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 915d55f..44ce479 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1961,7 +1961,9 @@ create_omp_child_function (omp_context *ctx, bool 
task_copy)
if (is_targetreg_ctx (octx))
  {
cgraph_node::get_create (decl)->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
g->have_offload = true;
+#endif
break;
  }
 }
@@ -8287,7 +8289,9 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
 {
   unsigned srcidx, dstidx, num;
+#ifdef ENABLE_OFFLOADING
   struct cgraph_node *node;
+#endif
 
   /* If the target region needs data sent from the parent
 function, then the very first statement (except possible
@@ -8414,18 +8418,22 @@ expand_omp_target (struct omp_region *region)
   DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
   cgraph_node::add_new_function (child_fn, true);
 
+#ifdef ENABLE_OFFLOADING
   /* Add the new function to the offload table.  */
   vec_safe_push (offload_funcs, child_fn);
+#endif
 
   /* Fix the callgraph edges for child_cfun.  Those for cfun will be
 fixed in a following pass.  */
   push_cfun (child_cfun);
   cgraph_edge::rebuild_edges ();
 
+#ifdef ENABLE_OFFLOADING
   /* Prevent IPA from removing child_fn as unreachable, since there are no
 refs from the parent function to child_fn in offload LTO mode.  */
   node = cgraph_node::get (child_fn);
   node->mark_force_output ();
+#endif
 
   /* Some EH regions might become dead, see PR34608.  If
 pass_cleanup_cfg isn't the first pass to happen with the
@@ -12502,7 +12510,7 @@ omp_finish_file (void)
 
   varpool_node::finalize_decl (vars_decl);
   varpool_node::finalize_decl (funcs_decl);
-   }
+}
   else
 {
   for (unsigned i = 0; i  num_funcs; i++)
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 80dd496..50f2e6e 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -171,9 +171,11 @@ varpool_node::get_create (tree decl)
lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
 {
   node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
   g->have_offload = true;
   if (!in_lto_p)
vec_safe_push (offload_vars, decl);
+#endif
 }
 
   node->register_symbol ();


Re: [PATCH] Fix PR63868 - Guard offloading with ifdef ENABLE_OFFLOADING

2014-11-18 Thread Jakub Jelinek
On Tue, Nov 18, 2014 at 06:31:59PM +0300, Ilya Verbin wrote:
 @@ -8287,7 +8289,9 @@ expand_omp_target (struct omp_region *region)
if (kind == GF_OMP_TARGET_KIND_REGION)
  {
unsigned srcidx, dstidx, num;
 +#ifdef ENABLE_OFFLOADING
struct cgraph_node *node;
 +#endif

Please instead move the struct cgraph_node *node; declaration
right above where it is used for the first time.

There is no goto involved there, and it isn't in a switch either,
so you probably also can do just:
struct cgraph_node *node = cgraph_node::get (child_fn);
instead.

Ok with that change.

 @@ -8414,18 +8418,22 @@ expand_omp_target (struct omp_region *region)
DECL_STRUCT_FUNCTION (child_fn)->curr_properties = 
 cfun->curr_properties;
cgraph_node::add_new_function (child_fn, true);
  
 +#ifdef ENABLE_OFFLOADING
/* Add the new function to the offload table.  */
vec_safe_push (offload_funcs, child_fn);
 +#endif
  
/* Fix the callgraph edges for child_cfun.  Those for cfun will be
fixed in a following pass.  */
push_cfun (child_cfun);
cgraph_edge::rebuild_edges ();
  
 +#ifdef ENABLE_OFFLOADING
/* Prevent IPA from removing child_fn as unreachable, since there are 
 no
refs from the parent function to child_fn in offload LTO mode.  */
node = cgraph_node::get (child_fn);
 node->mark_force_output ();
 +#endif
  
/* Some EH regions might become dead, see PR34608.  If
pass_cleanup_cfg isn't the first pass to happen with the

Jakub


Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-18 Thread Jan Hubicka
 On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote:

 b) with GTY, we cannot call destructor

Everything in the symbol table is explicitly memory managed (i.e. never left
to be freed by the garbage collector). It resides in GTY only to allow linking
garbage collected objects from them and to get PCH working.

   
   Well, if I understand the intent correctly, summaries are for stuff
   that is not in the symbol table.  For example jump functions are a
  Correct.
   vector of structures possibly containing trees, so everything has to
   be in garbage collected memory.
   
   When an edge is removed, it is necessary to be notified about it
   immediately, for example to decrement rdesc_refcount (you might argue
   that that should be done in a separate hook and not from within a
   summary class but then you start to rely on hook invocation ordering
   so I think it is better to eventually use the summaries for it too).
  
  I do not see why ctors/dtors can not do the reference counting. In fact
  this is how refcounting is done usually anyway?
  
 
 Well, when there is no garbage collection involved then yes, that is
 how you normally do it but in the GC case, there is the question of
 what is the appropriate time to call destructor on garbage collected
 data (like jump functions)?

I still fail to see the problem here.  Summaries are explicitly managed: they are
constructed at summary construction time or when a new callgraph node is
introduced/duplicated.  They are destroyed when the callgraph node is destroyed or
the whole summary is destroyed.  It is the job of the summary data structure to call
the proper ctors/dtors, not the job of the garbage collector, which merely provides
the underlying memory management.

If you have a data structure that points to something that is not explicitly
managed (i.e. a tree expression), you just cannot have a non-trivial destructor
on that data structure, because it is freed transparently by GTY, which does not
run destructors...
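The division of labor Honza describes can be sketched in miniature: the summary object owns the refcount transitions in its constructor/destructor, and the summary container — not the garbage collector — is what runs them at node creation, duplication and removal.  The types below are illustrative, modeled loosely on the rdesc_refcount discussion, not the actual ipa-prop code:

```cpp
// Illustrative refcounted descriptor shared by several edge summaries.
struct rdesc
{
  int refcount = 0;
};

// The summary takes a reference in its constructor and drops it in its
// destructor.  Crucially, the container invokes these explicitly; GTY
// only supplies the raw memory and never calls the destructor itself.
struct edge_summary
{
  rdesc *desc;
  explicit edge_summary (rdesc *d) : desc (d) { ++desc->refcount; }
  ~edge_summary () { --desc->refcount; }
};
```

As long as the container pairs every construction with an explicit destruction, the refcount stays balanced without any help from the collector.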

Honza



[PATCH V2] plugin event for C/C++ function definitions

2014-11-18 Thread Andres Tiraboschi
Hi, this patch adds two new plugin events, PLUGIN_START_PARSE_FUNCTION and
PLUGIN_FINISH_PARSE_FUNCTION, which are invoked in start_function and
finish_function respectively in the C and C++ front ends.
PLUGIN_START_PARSE_FUNCTION is called before parsing the function body.
PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition.
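The registration/dispatch pattern these hooks rely on can be modeled as a small self-contained mock — this is not GCC's actual plugin.c (real plugins register handlers through register_callback from gcc-plugin.h); it only illustrates how invoke_plugin_callbacks fans an event out to every registered handler:

```cpp
#include <map>
#include <vector>

// Mock of the plugin event table.  Event names mirror the patch but the
// machinery here is illustrative only.
enum plugin_event_mock { START_PARSE_FUNCTION, FINISH_PARSE_FUNCTION };

typedef void (*parse_callback) (void *decl);

static std::map<int, std::vector<parse_callback> > callbacks;
static int start_hits = 0;  // used by the sample handler below

static void count_start (void *) { ++start_hits; }

// Mirrors register_callback: remember the handler for this event.
void register_callback_mock (plugin_event_mock ev, parse_callback cb)
{
  callbacks[ev].push_back (cb);
}

// Mirrors invoke_plugin_callbacks: run every handler registered for the
// event, handing it the decl being parsed.
void invoke_plugin_callbacks_mock (plugin_event_mock ev, void *decl)
{
  for (parse_callback cb : callbacks[ev])
    cb (decl);
}
```

In the real patch the decl passed to the handlers is the FUNCTION_DECL produced by grokdeclarator (at start) or current_function_decl/fndecl (at finish).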

changelog:

 gcc/c/c-decl.c: Invoke callbacks in start_function and finish_function.

 gcc/cp/decl.c: Invoke callbacks in start_function and finish_function.

 gcc/doc/plugins.texi: Add documentation about PLUGIN_START_FUNCTION and 
PLUGIN_FINISH_FUNCTION
 gcc/plugin.def: Add events for start_function and finish_function. 
 
 gcc/plugin.c (register_callback, invoke_plugin_callbacks): Same.

 gcc/testsuite/g++.dg/plugin/def_plugin.c: New test plugin. 
 
 gcc/testsuite/g++.dg/plugin/def-plugin-test.C: Testcase for above plugin.
 gcc/testsuite/g++.dg/plugin/plugin.exp 

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index e23284a..fea3334 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -8073,6 +8073,7 @@ start_function (struct c_declspecs *declspecs, struct 
c_declarator *declarator,
 
   decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL,
  attributes, NULL, NULL, DEPRECATED_NORMAL);
+  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
 
   /* If the declarator is not suitable for a function definition,
  cause a syntax error.  */
@@ -8886,6 +8887,7 @@ finish_function (void)
  It's still in DECL_STRUCT_FUNCTION, and we'll restore it in
  tree_rest_of_compilation.  */
   set_cfun (NULL);
+  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, 
current_function_decl);
   current_function_decl = NULL;
 }
 
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index d4adbeb..ce2f832 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13631,6 +13631,7 @@ start_function (cp_decl_specifier_seq *declspecs,
   tree decl1;
 
   decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, attrs);
+  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
   if (decl1 == error_mark_node)
 return false;
   /* If the declarator is not suitable for a function definition,
@@ -14260,6 +14261,7 @@ finish_function (int flags)
   vec_free (deferred_mark_used_calls);
 }
 
+  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl);
   return fndecl;
 }
 
diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
index 4a839b8..1c9e074 100644
--- a/gcc/doc/plugins.texi
+++ b/gcc/doc/plugins.texi
@@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined 
events:
 @smallexample
 enum plugin_event
 @{
+  PLUGIN_START_PARSE_FUNCTION,  /* Called before parsing the body of a 
function. */
+  PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function. */
   PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager.  */
   PLUGIN_FINISH_TYPE,   /* After finishing parsing a type.  */
   PLUGIN_FINISH_DECL,   /* After finishing parsing a declaration. */
diff --git a/gcc/plugin.c b/gcc/plugin.c
index 8debc09..f7a8b64 100644
--- a/gcc/plugin.c
+++ b/gcc/plugin.c
@@ -433,6 +433,8 @@ register_callback (const char *plugin_name,
return;
  }
   /* Fall through.  */
+  case PLUGIN_START_PARSE_FUNCTION:
+  case PLUGIN_FINISH_PARSE_FUNCTION:
   case PLUGIN_FINISH_TYPE:
   case PLUGIN_FINISH_DECL:
   case PLUGIN_START_UNIT:
@@ -511,6 +513,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
gcc_assert (event = PLUGIN_EVENT_FIRST_DYNAMIC);
gcc_assert (event  event_last);
   /* Fall through.  */
+  case PLUGIN_START_PARSE_FUNCTION:
+  case PLUGIN_FINISH_PARSE_FUNCTION:
   case PLUGIN_FINISH_TYPE:
   case PLUGIN_FINISH_DECL:
   case PLUGIN_START_UNIT:
diff --git a/gcc/plugin.def b/gcc/plugin.def
index df5d383..04faec9 100644
--- a/gcc/plugin.def
+++ b/gcc/plugin.def
@@ -18,6 +18,12 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 
+/* Called before parsing the body of a function.  */
+DEFEVENT (PLUGIN_START_PARSE_FUNCTION)
+
+/* After finishing parsing a function. */
+DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION)
+
 /* To hook into pass manager.  */
 DEFEVENT (PLUGIN_PASS_MANAGER_SETUP)
 
diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C 
b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
new file mode 100644
index 000..b7f2d3d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
@@ -0,0 +1,13 @@
+int global = 12;
+
+int function1(void);
+
+int function2(int a) // { dg-warning "Start fndef function2" }
+{
+  return function1() + a;
+} //  { dg-warning "Finish fndef function2" }
+
+int function1(void) // { dg-warning "Start fndef function1" }
+{
+  return global + 1;
+} //  { dg-warning "Finish fndef function1" }
diff --git 

Re: [PATCH][AArch64] Adjust generic move costs

2014-11-18 Thread Ramana Radhakrishnan
On Mon, Nov 17, 2014 at 5:13 PM, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:
 On 14 November 2014 14:35, Wilco Dijkstra wdijk...@arm.com wrote:

 2014-11-14  Wilco Dijkstra  wdijk...@arm.com

 * gcc/config/aarch64/aarch64.c (generic_regmove_cost):
 Increase FP move cost.

 OK /Marcus

Changelog should probably indicate PR target/61915 when committing ?

Ramana


Re: Stream out default optimization nodes

2014-11-18 Thread H.J. Lu
On Mon, Nov 17, 2014 at 10:38 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this patch makes us store the default optimization node the same way as we stream
 the target node.  This means that command line options given at compile time
 prevail over those given at link time.  Previously we sort of combined both.

 We still have a lot of flags that are global (i.e. not marked as Optimization)
 but affect the way the unit is output.  Since I would like to replace the old
 option merging, I would like to add a Global attribute to each of them in the .opt
 file and generate streaming code for them the same way as we do for
 optimization/target nodes.

 This patch was regtested/bootstrapped on x86_64-linux and in an earlier tree also on
 ppc64-linux/ppc64-aix, which do not work for me at the moment.
 I have also used it in my tree for some time and tested firefox/libreoffice builds.

 OK?
 Honza

 * tree.c (free_lang_data_in_decl): Store default optimization node.
 Index: tree.c
 ===
 --- tree.c  (revision 217659)
 +++ tree.c  (working copy)
 @@ -5118,6 +5118,9 @@ free_lang_data_in_decl (tree decl)
   if (!DECL_FUNCTION_SPECIFIC_TARGET (decl))
 DECL_FUNCTION_SPECIFIC_TARGET (decl)
   = target_option_default_node;
 + if (!DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl))
 +   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
 + = optimization_default_node;
 }

/* DECL_SAVED_TREE holds the GENERIC representation for DECL.
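The intent of the hunk can be sketched with a small mock: when a function decl carries no function-specific optimization node of its own at free-lang-data time, it is pinned to the compile-time default, so link-time options cannot later substitute for it.  The types below (`opt_node`, `decl_mock`) are illustrative stand-ins, not GCC's tree structures:

```cpp
// Illustrative mock of the free_lang_data_in_decl defaulting logic:
// a decl either carries its own optimization node or falls back to the
// unit-wide compile-time default, which then gets streamed with it.
struct opt_node { int level; };

static opt_node optimization_default_node = { 2 };  // e.g. unit built at -O2

struct decl_mock { opt_node *specific_opt = nullptr; };

void record_default_optimization (decl_mock &d)
{
  if (!d.specific_opt)
    d.specific_opt = &optimization_default_node;
}
```

Decls that already have an explicit node (e.g. via attribute optimize) keep it; only the unset ones are defaulted.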

I think one of your LTO streaming changes breaks the GCC LTO build:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00473.html

/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/xg++
-B/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/
-B/usr/5.0.0/x86_64-unknown-linux-gnu/bin/ -nostdinc++
-B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 
-I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
 
-I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include
 -I/export/gnu/import/git/gcc-test-profiled/src-trunk/libstdc++-v3/libsupc++
-L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
  -g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE
-static-libstdc++ -static-libgcc  -o build/genmatch \
build/genmatch.o ../libcpp/libcpp.a ../libiberty/libiberty.a
build/errors.o build/vec.o build/hash-table.o
.././libiberty/libiberty.a
../../src-trunk/libcpp/lex.c: In function 'end_directive':
../../src-trunk/libcpp/lex.c:442:43: error:
'__builtin_ia32_pcmpestri128' needs isa option -m32 -msse4.2
   index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
   ^
make[7]: *** [/tmp/ccTC6Hk9.ltrans9.ltrans.o] Error 1


-- 
H.J.


Re: PATCH: PR bootstrap/63784: [5 Regression] profiledbootstrap failure with bootstrap-lto

2014-11-18 Thread H.J. Lu
On Fri, Nov 14, 2014 at 2:19 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Nov 14, 2014 at 12:15 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 8:02 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 5:44 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 2:43 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Nov 10, 2014 at 05:32:32AM -0800, H.J. Lu wrote:
 On Mon, Nov 10, 2014 at 4:05 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Nov 10, 2014 at 12:50:44PM +0100, Richard Biener wrote:
  On Sun, Nov 9, 2014 at 5:46 PM, H.J. Lu hongjiu...@intel.com wrote:
   Hi,
  
   r216964 disables bootstrap for libcc1 which exposed 2 things:
  
   1. libcc1 isn't compiled with LTO even when GCC is configured with
   --with-build-config=bootstrap-lto.  It may be intentional since
   libcc1 is disabled for bootstrap.
   2. -fPIC isn't used to created libcc1.so, which is OK if libcc1 is
   compiled with LTO which remembers PIC option.
 
  Why is this any special to LTO?  If it is then it looks like a LTO
  (driver) issue to me?  Why are we linking the pic libibterty into
  a non-pic libcc1?
 
  I admit I haven't tried LTO bootstrap, but from normal bootstrap logs,
  libcc1 is built normally using libtool using -fPIC only, and linked 
  into
  libcc1.so.0.0.0 and libcc1plugin.so.0.0.0, and of course against the
  pic/libiberty.a, because we need PIC code in the shared libraries.
  So, I don't understand the change at all.
 
  Jakub

 This is the command line to build libcc1.la:

 Sure, but there was -fPIC used to compile all the *.o files that are 
 being
 linked into libcc1.so, so LTO should know that.

 And it does.  If not please file a bug with a smaller testcase than libcc1
 and libiberty.


There is nothing wrong with the linker.  It is a slim-LTO bug in libtool.  I uploaded
a testcase at

 https://gcc.gnu.org/bugzilla/attachment.cgi?id=33931


 My patch is a backport of libtool LTO support:

 commit b81fd4ef009c24a86a7e64727ea09efb410ea149
 Author: Ralf Wildenhues ralf.wildenh...@gmx.de
 Date:   Sun Aug 29 17:31:29 2010 +0200

 Support GCC LTO on GNU/Linux.

 * libltdl/config/ltmain.m4sh (func_mode_link): Allow through
 flags matching -O*, -flto*, -fwhopr, -fuse-linker-plugin.
 * libltdl/m4/libtool.m4 (_LT_CMD_GLOBAL_SYMBOLS): Drop symbols
 starting with __gnu_lto.
 (_LT_LINKER_SHLIBS) [linux] archive_cmds, archive_expsyms_cmds:
 Add $pic_flag for GCC.
 (_LT_LANG_CXX_CONFIG) [linux] archive_cmds, archive_expsyms_cmds:
 Likewise.
 (_LT_SYS_HIDDEN_LIBDEPS): Ignore files matching *.lto.o.
 * NEWS: Update.

 Signed-off-by: Ralf Wildenhues ralf.wildenh...@gmx.de

 OK to install?


 Ping.

 Stage 1 will be closed tomorrow.  I'd like to restore LTO bootstrap.

 Bugfixing is still possible after that date.  I suppose you don't call
 LTO bootstrap a new feature ;)


New LTO bugs have been introduced since LTO bootstrap was broken
a few weeks ago.  I'd like to check in this one so that we have a chance
to prevent further damage.  Any objects to backport this from libtool
upstream?

-- 
H.J.

