[PATCH] Fix latent PHI-opt bug
This fixes a miscompile that can happen when PHI-opt is entered with a
not cleaned up CFG and we have an always true/false condition like
if (a_1 != a_1) or if (0 != 0). In this case the check guarding

      /* If the middle basic block was empty or is defining the
         PHI arguments and this is a single phi where the args are different
         for the edges e0 and e1 then we can remove the middle basic block. */
      if (emtpy_or_with_defined_p
          && single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)),
                                                 e0, e1))
        {
          replace_phi_edge_with_variable (cond_bb, e1, phi, arg);
          /* Note that we optimized this PHI.  */
          return 2;

can run into PHI being _not_ the single non-singleton PHI as both
values in the conditional are equal.

I remember running into this issue before, so this time I'll fix it
properly instead of just making sure to clean up the CFG properly ...

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-10-21  Richard Biener  rguent...@suse.de

        * tree-ssa-phiopt.c (value_replacement): Properly verify we
        are the non-singleton PHI.

Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c (revision 216396)
+++ gcc/tree-ssa-phiopt.c (working copy)
@@ -814,7 +814,7 @@ value_replacement (basic_block cond_bb,
          for the edges e0 and e1 then we can remove the middle basic block. */
       if (emtpy_or_with_defined_p
           && single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)),
-                                                 e0, e1))
+                                                 e0, e1) == phi)
         {
           replace_phi_edge_with_variable (cond_bb, e1, phi, arg);
           /* Note that we optimized this PHI.  */
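To make the difference between the old and the new guard concrete, here is a small stand-alone model. It is only a sketch: `toy_phi`, `toy_single_non_singleton_phi`, `toy_guard_before` and `toy_guard_after` are hypothetical names, and the lookup is a simplified stand-in for what single_non_singleton_phi_for_edges does, not the GCC implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHI node: one argument per incoming edge.  */
struct toy_phi
{
  int arg_e0;
  int arg_e1;
};

/* Simplified model of the lookup: return the only PHI whose two
   arguments differ, or NULL if there is none or more than one.  */
static const struct toy_phi *
toy_single_non_singleton_phi (const struct toy_phi *phis, size_t n)
{
  const struct toy_phi *found = NULL;
  for (size_t i = 0; i < n; i++)
    if (phis[i].arg_e0 != phis[i].arg_e1)
      {
        if (found != NULL)
          return NULL;
        found = &phis[i];
      }
  return found;
}

/* Old guard: only asks whether some single non-singleton PHI exists.  */
static bool
toy_guard_before (const struct toy_phi *phis, size_t n)
{
  return toy_single_non_singleton_phi (phis, n) != NULL;
}

/* New guard: additionally requires that it is the PHI being optimized.  */
static bool
toy_guard_after (const struct toy_phi *phis, size_t n,
                 const struct toy_phi *phi)
{
  return toy_single_non_singleton_phi (phis, n) == phi;
}
```

With one PHI whose arguments are equal and another whose arguments differ, the old guard passes no matter which PHI is being optimized, while the new guard passes only for the PHI whose arguments actually differ.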
Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use
On Mon, Oct 20, 2014 at 5:53 PM, Xinliang David Li davi...@google.com wrote:
On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener richard.guent...@gmail.com wrote:
On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li davi...@google.com wrote:
On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li davi...@google.com wrote:
On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:

The difference in instrumentation runtime is huge -- as topn profiler is
pretty expensive to run. With FDO, it is probably better to make early
inlining more aggressive in order to get more context sensitive profiling.

I agree with that, I just would like to understand where increasing the
iterations helps and if we can handle it without iterating (because Richi
originally requested to drop the iteration for correctness issues)

Well, I requested to do any iteration with an IPA view in mind. That is,
iterate for cgraph cycles for example where currently we face the situation
that at least one function is inlined unoptimized. For this we'd like to
first optimize without inlining (well, maybe inlining doesn't hurt)

yes -- inlining decision made without callee cleanup is more conservative
and should not hurt.

and then inline (and re-optimize if we inlined). Indirect edges are more
interesting, but basically you'd want to re-inline once you discover new
direct calls during early opts (but then make sure to do that only after
the direct callee was early-optimized first).

It would be interesting to inline the newly introduced direct calls if the
callsites also have function pointer arguments that are known in the call
context.

Thus it would be nice if somebody could improve on the currently very
simple function ordering we apply early opts, integrating iteration in a
better way (not iterating over all functions but only where it might make
a difference, focused on inlining).

Do you have some examples?

We can do FDO experiment by shutting down einline.
(Note that increasing iteration to 2 did not actually improve performance
with our benchmarks).

Early inlining itself has large performance impact for FDO (the runtime of
the profile-use build). With it disabled, the FDO performance drops by 2%
on average. The degradation is seen across all benchmarks except for one.

Only 2%? You are lucky ;)

2% average is considered pretty significant for optimized build runtime
performance.

For tramp3d introducing early inlining made a difference of 10% ;) (yes,
statistically for tramp3d we have for each assembler instruction generated
100 calls in the initial code ... wheee C++ template metaprogramming!)

Is this 10% difference from instrumentation build or optimized build
runtime?

It's from instrumentation build. I don't remember any numbers for the
improvement on optimized build with FDO vs. non-FDO.

Richard.

So indeed early inlining was absolutely required to make FDO usable at all.

thanks,
David

Richard.
David
David
Honza
David

On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka hubi...@ucw.cz wrote:

Increasing the number of early inliner iterations from 1 to 2 enables more
indirect calls to be promoted/inlined before instrumentation. This in turn
reduces the instrumentation overhead, particularly for more expensive
indirect call topn profiling.

How much difference do you get here? One possibility would be also to run
specialized ipa-cp before profile instrumentation.

Honza

Passes internal testing and regression tests. Ok for google/4_9?

2014-10-18  Teresa Johnson  tejohn...@google.com

        Google ref b/17934523
        * opts.c (finish_options): Increase max-early-inliner-iterations to 2
        for profile-gen and profile-use builds.
Index: opts.c
===================================================================
--- opts.c (revision 216286)
+++ opts.c (working copy)
@@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
                              opts->x_param_values, opts_set->x_param_values);
     }
 
+  if (opts->x_profile_arc_flag
+      || opts->x_flag_branch_probabilities)
+    {
+      maybe_set_param_value
+        (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
+         opts->x_param_values, opts_set->x_param_values);
+    }
+
   if (!(opts->x_flag_auto_profile
         || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
     {

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [patch] Second basic-block.h restructuring patch.
On Mon, Oct 20, 2014 at 8:21 PM, Andrew MacLeod amacl...@redhat.com wrote: creates cfg.h, cfganal.h, lcm.h, and loop-unroll.h to house the prototypes for those .c files. cfganal.h also gets struct edge_list and class control_dependences definitions since that is where all the routines and manipulators are declared. loop-unroll.h only exports 2 routines, so rather than including that in basic-block.h I simply included it from the 2 .c files which consume those routines. Again, the other includes will be flattened out of basic-block.h to just their consumers later. loop-unroll.c also had one function I marked as static since it wasn't actually used anywhere else. bootstraps on x86_64-unknown-linux-gnu, and regressions are running... I expect no regressions because of the nature of the changes. OK to check in assuming everything is OK? Ok. Thanks, Richard. Andrew
Re: [PATCH 6/8] Handle SCRATCH in decompose_address
Maxim Kuvyrkov maxim.kuvyr...@linaro.org writes: This patch is a simple fix to allow decompose_address to handle SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite for a scheduler improvement that relies on decompose_address to parse insns. Bootstrapped and regtested on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and aarch64-linux-gnu. Can't approve it, but FWIW, as the author of the original code it looks good to me. I agree (mem (scratch)) as an idiom for a mem with an unknown address should be handled here. Thanks, Richard
Re: [PATCH PR63530] Fix the pointer alignment in vectorization
On Mon, Oct 20, 2014 at 10:10 PM, Carrot Wei car...@google.com wrote:

Hi Richard

An arm testcase that can reproduce this bug is attached.

2014-10-20  Guozhi Wei  car...@google.com

        PR tree-optimization/63530
        gcc.target/arm/pr63530.c: New testcase.

Index: pr63530.c
===================================================================
--- pr63530.c (revision 0)
+++ pr63530.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon } */
+/* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2 -ftree-vectorize -funroll-loops --param \"max-completely-peeled-insns=400\"" } */
+
+typedef struct {
+  unsigned char map[256];
+  int i;
+} A, *AP;
+
+void* calloc(int, int);
+
+AP foo (int n)
+{
+  AP b = (AP)calloc (1, sizeof (A));
+  int i;
+  for (i = n; i < 256; i++)
+    b->map[i] = i;
+  return b;
+}
+
+/* { dg-final { scan-assembler-not "vst1.64" } } */

Can you make it a runtime testcase that fails? This way it would be less
target specific.

On Mon, Oct 20, 2014 at 1:19 AM, Richard Biener richard.guent...@gmail.com wrote:
On Fri, Oct 17, 2014 at 7:58 PM, Carrot Wei car...@google.com wrote:

I miss a testcase. I also miss a comment before this code explaining why
DR_MISALIGNMENT if not -1 is valid and why it is not valid if

DR_MISALIGNMENT (dr) == -1 means some unknown misalignment, otherwise it
means some known misalignment. See the usage in file tree-vect-stmts.c.

I know that.

'offset' is supplied (what about 'byte_offset' btw?). Also if peeling

It is to be conservative, so it doesn't change the logic when offset is
supplied. I've checked that most of the passed in offsets are caused by
negative step; its impact on DR_MISALIGNMENT should have already been
considered in function vect_update_misalignment_for_peel, but the comments
of vect_create_addr_base_for_vector_ref do not guarantee this usage of
offset. The usage of byte_offset is quite broken, many direct or indirect
callers don't provide the parameters. So only the author can comment on
this.
Well - please make it consistent at least, (offset || byte_offset).

for alignment aligned this ref (misalign == 0) you don't set the alignment.

I assume if no misalignment is specified, the natural alignment of the
vector type is used, and caused the wrong code in our case, is it right?

No, DR_MISALIGNMENT == 0 means aligned. OTOH it's quite unnecessary to do
all the dance with the alignment part of the SSA name info (unnecessary for
the actual memory references created by the vectorizer). The type of the
access ultimately provides the larger alignment - the SSA name info only
may enlarge it even further (thus it's dangerous to specify larger than
valid there).

So if you don't want to make it really optimal wrt offset/byte_offset
please do

  if (offset || byte_offset
      || misalign == -1)
    mark_ptr_info_alignment_unknown (...)
  else
    set_ptr_info_alignment (..., align, misalign);

The patch is ok with this change and the testcase turned into a runtime one
and moved to gcc.dg/vect/

Thanks,
Richard.

Thus you may fix a bug (not sure without a testcase) but the new code
certainly doesn't look 100% correct. That said, I would have expected that
we can unconditionally do set_ptr_info_alignment (..., align, misalign) if
misalign is != -1 and if we adjust misalign by offset * step + byte_offset
(usually both are constants). Also we can still trust the alignment copied
from addr_base modulo vector element size even if DR_MISALIGN is -1. This
may matter for targets that require element-alignment for vector accesses.
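The conservative rule requested above can be modelled outside of GCC as follows. This is only a sketch under stated assumptions: `toy_ptr_info` and `toy_record_alignment` are hypothetical names standing in for the SSA name pointer info and for the mark_ptr_info_alignment_unknown / set_ptr_info_alignment pair, whose real argument lists are elided in the mail and are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the alignment part of SSA name pointer info.  */
struct toy_ptr_info
{
  bool known;          /* false means "alignment unknown" */
  unsigned align;      /* alignment in bytes, valid when known */
  unsigned misalign;   /* byte offset from ALIGN, valid when known */
};

/* Record alignment only when nothing can invalidate it: no offset,
   no byte offset, and a known misalignment (-1 means unknown).  */
static void
toy_record_alignment (struct toy_ptr_info *pi, bool has_offset,
                      bool has_byte_offset, unsigned align, int misalign)
{
  if (has_offset || has_byte_offset || misalign == -1)
    {
      /* Conservative path: claim nothing about the pointer.  */
      pi->known = false;
      pi->align = 0;
      pi->misalign = 0;
    }
  else
    {
      pi->known = true;
      pi->align = align;
      pi->misalign = (unsigned) misalign;
    }
}
```

The point of the design is that claiming too little alignment only costs performance, while claiming too much can produce wrong code such as an aligned vector store to an unaligned address.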
Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p
Maxim Kuvyrkov maxim.kuvyr...@linaro.org writes: This patch improves model_order_p to use non-reg-pressure version of rank_for_schedule when it needs to break the tie. At the moment it is comparing INSN_PRIORITY by itself, and it seems prudent to outsource that to rank_for_schedule. Do you have an example of where this helps? A possible danger is that rank_for_schedule might (indirectly) depend on state that isn't maintained or updated in the same way during the model schedule phase. Thanks, Richard
Re: [PATCH, PR63307] Fix generation of new declarations in random order
On Thu, Oct 16, 2014 at 11:06:34AM -0600, Jeff Law wrote:

We really prefer fully specified sorts. For a qsort callback, this doesn't
look fully specified. With that fixed, this should be OK.

jeff

Thanks for the review. Here is the updated version. Is it ok?

Yes, this is good for the trunk.

This broke bootstrap everywhere unfortunately, has it been tested at all?

I already wrote during the initial comment that BLOCKs aren't decls and you
can't push them into the vectors, they can't be sorted easily (BLOCK_NUMBER
isn't assigned at that point e.g. and the comparison function looks at
DECL_UID unconditionally anyway).

I've bootstrapped/regtested on i686-linux the following quick fix,
bootstrapped on x86_64-linux too, in the middle of regtesting there. If it
succeeds, I'll commit as obvious, so that people can continue working on
the trunk.

2014-10-21  Jakub Jelinek  ja...@redhat.com

        * cilk.c (fill_decls_vec): Only put decls into vector v.
        (compare_decls): Fix up formatting.

--- gcc/c-family/cilk.c.jj 2014-10-20 19:24:54.0 +0200
+++ gcc/c-family/cilk.c 2014-10-21 08:46:24.727790990 +0200
@@ -347,9 +347,12 @@ fill_decls_vec (tree const key0, tree *
   tree t1 = key0;
   struct cilk_decls dp;
 
-  dp.key = t1;
-  dp.val = val0;
-  v->safe_push (dp);
+  if (DECL_P (t1))
+    {
+      dp.key = t1;
+      dp.val = val0;
+      v->safe_push (dp);
+    }
   return true;
 }
 
@@ -400,8 +403,8 @@ create_parm_list (struct wrapper_data *w
 static int
 compare_decls (const void *a, const void *b)
 {
-  const struct cilk_decls* t1 = (const struct cilk_decls*) a;
-  const struct cilk_decls* t2 = (const struct cilk_decls*) b;
+  const struct cilk_decls *t1 = (const struct cilk_decls *) a;
+  const struct cilk_decls *t2 = (const struct cilk_decls *) b;
 
   if (DECL_UID (t1->key) > DECL_UID (t2->key))
     return 1;

        Jakub
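For readers unfamiliar with the "fully specified sort" requirement: qsort is not stable, so a comparison callback should only return 0 for elements that really are interchangeable, otherwise the resulting order can differ between hosts. A minimal sketch follows; `toy_decl`, `toy_compare_decls` and `toy_sort_decls` are hypothetical names, and the uid field is assumed to be unique per element, as DECL_UID is for decls.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a decl with a unique id.  */
struct toy_decl
{
  unsigned uid;
  const char *name;
};

/* Three-way comparison on the unique id.  Explicit comparisons avoid the
   wrap-around that "return d1->uid - d2->uid" would risk.  */
static int
toy_compare_decls (const void *a, const void *b)
{
  const struct toy_decl *d1 = (const struct toy_decl *) a;
  const struct toy_decl *d2 = (const struct toy_decl *) b;

  if (d1->uid > d2->uid)
    return 1;
  if (d1->uid < d2->uid)
    return -1;
  return 0;
}

/* Sort N elements of V into increasing uid order.  */
static void
toy_sort_decls (struct toy_decl *v, size_t n)
{
  qsort (v, n, sizeof v[0], toy_compare_decls);
}
```

This also illustrates why the key must be meaningful for every element pushed into the vector: an element without a valid id (a BLOCK in the discussion above) has nothing sensible to compare.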
Re: The nvptx port [0/11+]
On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt ber...@codesourcery.com wrote:

This is a patch kit that adds the nvptx port to gcc. It contains
preliminary patches to add needed functionality, the target files, and one
somewhat optional patch with additional target tools. There'll be more
patch series, one for the testsuite, and one to make the offload
functionality work with this port. Also required are the previous four rtl
patches, two of which weren't entirely approved yet.

For the moment, I've stripped out all the address space support that got
bogged down in review by brokenness in our representation of address
spaces. The ptx address spaces are of course still defined and used inside
the backend.

Ptx really isn't a usual target - it is a virtual target which is then
translated by another compiler (ptxas) to the final code that runs on the
GPU. There are many restrictions, some imposed by the GPU hardware, and
some by the fact that not everything you'd want can be represented in ptx.
Here are some of the highlights:

* Everything is typed - variables, functions, registers. This can cause
  problems with K&R style C or anything else that doesn't have a proper
  type internally.
* Declarations are needed, even for undefined variables.
* Can't emit initializers referring to their variable's address since you
  can't write forward declarations for variables.
* Variables can be declared only as scalars or arrays, not structures.
  Initializers must be in the variable's declared type, which requires some
  code in the backend, and it means that packed pointer values are not
  representable.
* Since it's a virtual target, we skip register allocation - no good can
  probably come from doing that twice. This means asm statements aren't
  fixed up and will fail if they use matching constraints.

So with this restriction I wonder why it didn't make sense to go the HSA
backend route emitting PTX from a GIMPLE SSA pass. This would have avoided
the LTO dance as well ...

That is, what is the advantage of expanding to RTL here - what main
benefits do you get from that which you thought would be different to
handle if doing code generation from GIMPLE SSA?

For HSA we even do register allocation (to a fixed virtual register set),
something simple enough on SSA. We of course also have to do instruction
selection but luckily virtual ISAs are easy to target.

So were you worried about duplicating instruction selection and/or doing it
manually instead of with well-known machine descriptions?

I'm just curious - I am not asking you to rewrite the beast ;)

Thanks,
Richard.

* No support for indirect jumps, label values, nonlocal gotos.
* No alloca - ptx defines it, but it's not implemented.
* No trampolines.
* No debugging (at all, for now - we may add line number directives).
* Limited C library support - I have a hacked up copy of newlib that
  provides a reasonable subset.
* malloc and free are defined by ptx (these appear to be undocumented), but
  there isn't a realloc. I have one patch for Fortran to use a
  malloc/memcpy helper function in cases where we know the old size.

All in all, this is not intended to be used as a C (or any other source
language) compiler. I've gone through a lot of effort to make it work
reasonably well, but only in order to get sufficient test coverage from the
testsuites. The intended use for this is only to build it as an offload
compiler, and use it through OpenACC by way of lto1. That leaves the
question of how we should document it - does it need the usual constraint
and option documentation, given that users aren't expected to use any of
it?

A slightly earlier version of the entire patch kit was bootstrapped and
tested on x86_64-linux. Ok for trunk?

Bernd
RE: [PATCH, PR63307] Fix generation of new declarations in random order
For some reason it passed bootstrap locally...

-----Original Message-----
From: Jakub Jelinek [mailto:ja...@redhat.com]
Sent: Tuesday, October 21, 2014 12:15 PM
To: Zamyatin, Igor; Jeff Law
Cc: GCC Patches (gcc-patches@gcc.gnu.org)
Subject: Re: [PATCH, PR63307] Fix generation of new declarations in random order

On Thu, Oct 16, 2014 at 11:06:34AM -0600, Jeff Law wrote:

We really prefer fully specified sorts. For a qsort callback, this doesn't
look fully specified. With that fixed, this should be OK.

jeff

Thanks for the review. Here is the updated version. Is it ok?

Yes, this is good for the trunk.

This broke bootstrap everywhere unfortunately, has it been tested at all?

I already wrote during the initial comment that BLOCKs aren't decls and you
can't push them into the vectors, they can't be sorted easily (BLOCK_NUMBER
isn't assigned at that point e.g. and the comparison function looks at
DECL_UID unconditionally anyway).

I've bootstrapped/regtested on i686-linux the following quick fix,
bootstrapped on x86_64-linux too, in the middle of regtesting there. If it
succeeds, I'll commit as obvious, so that people can continue working on
the trunk.

2014-10-21  Jakub Jelinek  ja...@redhat.com

        * cilk.c (fill_decls_vec): Only put decls into vector v.
        (compare_decls): Fix up formatting.
Re: [PATCH, PR63307] Fix generation of new declarations in random order
On Tue, Oct 21, 2014 at 10:14:56AM +0200, Jakub Jelinek wrote: I've bootstrapped/regtested on i686-linux the following quick fix, bootstrapped on x86_64-linux too, in the middle of regtesting there. If it succeeds, I'll commit as obvious, so that people can continue working on the trunk. Ah, Kyrill has reverted the commit in the mean time, so there is no rush for this, so I'm not going to commit it now. The question remains, are the decls all you need from the traversal (i.e. what you need to act upon)? From my earlier skim of the original code that wasn't that obvious. You can have in decl_map at least also BLOCKs, perhaps types too, what else? 2014-10-21 Jakub Jelinek ja...@redhat.com * cilk.c (fill_decls_vec): Only put decls into vector v. (compare_decls): Fix up formatting. Jakub
Re: The nvptx port [0/11+]
On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:

* Can't emit initializers referring to their variable's address since you
  can't write forward declarations for variables.

Can't that be handled by emitting the initializer without the address and
some constructor that fixes up the initializer at runtime?

* Variables can be declared only as scalars or arrays, not structures.
  Initializers must be in the variable's declared type, which requires some
  code in the backend, and it means that packed pointer values are not
  representable.

Can't you represent structures and unions as arrays of chars? For constant
initializers that don't need relocations the compiler can surely turn them
into arrays of char initializers (e.g. fold-const.c
native_encode_expr/native_interpret_expr could be used for that).
Supposedly it would mean slower than perhaps necessary loads/stores of
aligned larger fields from the structure, but if it is an alternative to
not supporting structures/unions at all, that sounds like so severe
limitation that it can be pretty fatal for the target.

* No support for indirect jumps, label values, nonlocal gotos.

Not even indirect calls? How do you implement C++ or Fortran vtables?

        Jakub
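The "structures as arrays of chars" suggestion can be illustrated at the source level. The sketch below is not what the nvptx backend or native_encode_expr does internally; it only shows, with the hypothetical names `toy_pair`, `toy_encode` and `toy_decode`, that an object's value survives a round trip through a flat byte array of the same size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example type that a scalar/array-only target could not
   declare directly.  */
struct toy_pair
{
  int32_t id;
  uint8_t tag;
};

/* Flatten the object into a byte array of the same size.  */
static void
toy_encode (const struct toy_pair *p,
            unsigned char out[sizeof (struct toy_pair)])
{
  memcpy (out, p, sizeof *p);
}

/* Rebuild the object from its byte representation.  */
static struct toy_pair
toy_decode (const unsigned char in[sizeof (struct toy_pair)])
{
  struct toy_pair p;
  memcpy (&p, in, sizeof p);
  return p;
}
```

As noted in the mail, the cost of such a representation is that accesses to aligned larger fields may be slower than necessary, and initializers containing addresses would still need relocations or a runtime fix-up.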
Re: [PATCH i386 AVX512] [81/n] Add new built-ins.
On Mon, Oct 20, 2014 at 3:50 PM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Oct 20, 2014 at 05:41:25PM +0400, Kirill Yukhin wrote:

Hello,
This patch adds (almost) all built-ins needed by AVX-512VL,BW,DQ
intrinsics.

Main questionable hunk is:

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index b69312b..a639487 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1539,7 +1539,7 @@ struct GTY(()) tree_function_decl {
      DECL_FUNCTION_CODE.  Otherwise unused.
      ???  The bitfield needs to be able to hold all target function
           codes as well.  */
-  ENUM_BITFIELD(built_in_function) function_code : 11;
+  ENUM_BITFIELD(built_in_function) function_code : 12;
   ENUM_BITFIELD(built_in_class) built_in_class : 2;
 
   unsigned static_ctor_flag : 1;

Well, decl_with_vis has 15 unused bits, so instead of growing FUNCTION_DECL
significantly, might be better to move one of the flags to decl_with_vis
and just document that it applies to FUNCTION_DECLs only. Or move some flag
to cgraph if possible.

But seeing e.g. IX86_BUILTIN_FIXUPIMMPD256,
IX86_BUILTIN_FIXUPIMMPD256_MASK, IX86_BUILTIN_FIXUPIMMPD256_MASKZ etc. I
wonder if you really need that many builtins, weren't we adding for avx512f
just single builtin instead of 3 different ones, always providing mask
argument and depending on whether it is all ones, etc. figuring out what
kind of masking should be performed?

If only we had no lang-specific flags in tree_base we could use the same
place as we use for internal function code ...

But yes, not using that many builtins in the first place is preferred, for
example by making them type-generic and/or variadic.

Richard.

        Jakub
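The reason the field width matters can be shown with a plain C bit-field. This is a sketch with hypothetical names (`toy_function_decl_bits`, `toy_max_code`, `toy_store_code_11`); it uses an unsigned bit-field rather than GCC's ENUM_BITFIELD macro.

```c
/* Hypothetical miniature of the two fields touched by the hunk.  */
struct toy_function_decl_bits
{
  unsigned function_code : 11;
  unsigned built_in_class : 2;
};

/* Largest value representable in an unsigned field of BITS bits
   (BITS is assumed to be smaller than the width of unsigned).  */
static unsigned
toy_max_code (unsigned bits)
{
  return (1u << bits) - 1u;
}

/* Store CODE into the 11-bit field and read it back.  An unsigned
   bit-field keeps only the low 11 bits, so larger codes are silently
   reduced modulo 2048.  */
static unsigned
toy_store_code_11 (unsigned code)
{
  struct toy_function_decl_bits d = { 0, 0 };
  d.function_code = code;
  return d.function_code;
}
```

An 11-bit field holds codes 0 to 2047 and a 12-bit field holds 0 to 4095, which is why adding enough target built-ins forces either a wider field or fewer built-ins.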
[1/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target
Hi There,

This is the first patch to enable GCC to generate UAL assembly code for the
Thumb-1 target. This new option lets the user specify which syntax is used
in their inline assembly code. If the inline assembly code uses UAL format,
then gcc does nothing because gcc generates UAL code as well. If the inline
assembly code uses non-UAL syntax, then gcc will insert some directives in
the final assembly code.

Is it OK for trunk?

BR,
Terry

2014-10-21  Terry Guo  terry@arm.com

        * config/arm/arm.h (TARGET_UNIFIED_ASM): Also include thumb1.
        (ASM_APP_ON): Redefined.
        * config/arm/arm.c (arm_option_override): Thumb2 always uses UAL
        for inline assembly code.
        * config/arm/arm.opt (masm-syntax-unified): New option.
        * doc/invoke.texi (-masm-syntax-unified): Document new option.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3623c70..e654e22 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -165,6 +165,8 @@ extern char arm_arch_name[];
           }                                             \
         if (TARGET_IDIV)                                \
           builtin_define ("__ARM_ARCH_EXT_IDIV__");     \
+        if (inline_asm_unified)                         \
+          builtin_define ("__ARM_ASM_SYNTAX_UNIFIED__");\
     } while (0)
 
 #include "config/arm/arm-opts.h"
@@ -348,8 +350,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
                          || (!optimize_size && !current_tune->prefer_constant_pool)))
 
 /* We could use unified syntax for arm mode, but for now we just use it
-   for Thumb-2.  */
-#define TARGET_UNIFIED_ASM TARGET_THUMB2
+   for thumb mode.  */
+#define TARGET_UNIFIED_ASM (TARGET_THUMB)
 
 /* Nonzero if this chip provides the DMB instruction.  */
 #define TARGET_HAVE_DMB         (arm_arch6m || arm_arch7)
@@ -2144,8 +2146,13 @@ extern int making_const_table;
 #define CC_STATUS_INIT \
   do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)
 
+#undef ASM_APP_ON
+#define ASM_APP_ON (inline_asm_unified ? "\t.syntax unified" : \
+                    "\t.syntax divided\n")
+
 #undef  ASM_APP_OFF
-#define ASM_APP_OFF (TARGET_ARM ? "" : "\t.thumb\n")
+#define ASM_APP_OFF (TARGET_ARM ? "\t.arm\n\t.syntax divided\n" : \
+                     "\t.thumb\n\t.syntax unified\n")
 
 /* Output a push or a pop instruction (only used when profiling).
    We can't push STATIC_CHAIN_REGNUM (r12) directly with Thumb-1.  We know
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1ee0eb3..9ccf73c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3121,6 +3121,11 @@ arm_option_override (void)
   if (target_slow_flash_data)
     arm_disable_literal_pool = true;
 
+  /* Thumb2 inline assembly code should always use unified syntax.
+     This will apply to ARM and Thumb1 eventually.  */
+  if (TARGET_THUMB2)
+    inline_asm_unified = 1;
+
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
 }
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 0a80513..50f4c7d 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -271,3 +271,7 @@ Use Neon to perform 64-bits operations rather than core registers.
 mslow-flash-data
 Target Report Var(target_slow_flash_data) Init(0)
 Assume loading data from flash is slower than fetching instructions.
+
+masm-syntax-unified
+Target Report Var(inline_asm_unified) Init(0)
+Assume unified syntax for Thumb inline assembly code.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23f272f..c30c858 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -545,6 +545,7 @@ Objective-C and Objective-C++ Dialects}.
 -munaligned-access @gol
 -mneon-for-64bits @gol
 -mslow-flash-data @gol
+-masm-syntax-unified @gol
 -mrestrict-it}
 
 @emph{AVR Options}
@@ -12954,6 +12955,14 @@ Therefore literal load is minimized for better performance.
 This option is only supported when compiling for ARMv7 M-profile and
 off by default.
 
+@item -masm-syntax-unified
+@opindex masm-syntax-unified
+Assume the Thumb1 inline assembly code are using unified syntax.
+The default is currently off, which means divided syntax is assumed.
+However, this may change in future releases of GCC.  Divided syntax
+should be considered deprecated.  This option has no effect when
+generating Thumb2 code.  Thumb2 assembly code always uses unified syntax.
+
 @item -mrestrict-it
 @opindex mrestrict-it
 Restricts generation of IT blocks to conform to the rules of ARMv8.
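On the user side, the __ARM_ASM_SYNTAX_UNIFIED__ macro defined by this patch could be used to pick the right spelling of an instruction for inline assembly. The sketch below only selects a mnemonic string and contains no real asm statement, so it builds on any host; `toy_flag_setting_add_mnemonic` is a hypothetical helper, and the mnemonic difference it encodes (the flag-setting Thumb-1 add is written "add" in divided syntax and "adds" in unified syntax) is stated from the assembler syntax rules, not from the patch.

```c
/* Return the mnemonic for a flag-setting three-register add as it would
   have to be written in Thumb-1 inline assembly.  The macro is the one
   the patch defines when unified inline asm syntax is in effect.  */
static const char *
toy_flag_setting_add_mnemonic (void)
{
#ifdef __ARM_ASM_SYNTAX_UNIFIED__
  return "adds";   /* unified (UAL) syntax */
#else
  return "add";    /* divided (pre-UAL) syntax */
#endif
}
```

Code that must build both with and without -masm-syntax-unified could key its inline assembly strings on the macro in this way.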
RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
Hi Richard,

I realized thanks to Christophe Lyon that a shift was not right: the shift
count is a number of bytes instead of a number of bits.

This extra patch fixes the problem.

ChangeLog are as follows:

*** gcc/ChangeLog ***

2014-09-26  Thomas Preud'homme  thomas.preudho...@arm.com

        * tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of
        MARKER_BYTE_UNKNOWN markers when handling casts.

*** gcc/testsuite/ChangeLog ***

2014-10-08  Thomas Preud'homme  thomas.preudho...@arm.com

        * gcc.dg/optimize-bswaphi-1.c: New bswap pass test.

diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index 3e51f04..18aba28 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
 }
 
+typedef int SItype __attribute__ ((mode (SI)));
+typedef int HItype __attribute__ ((mode (HI)));
+
+/* Test that detection of significant sign extension works correctly. This
+   checks that unknown byte marker are set correctly in cast of cast.  */
+
+HItype
+swap16 (HItype in)
+{
+  return (HItype) (((in >> 0) & 0xFF) << 8)
+                | (((in >> 8) & 0xFF) << 0);
+}
+
 /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 3c6e935..2ef2333 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
               if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
                   && HEAD_MARKER (n->n, old_type_size))
                 for (i = 0; i < type_size - old_type_size; i++)
-                  n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i);
+                  n->n |= MARKER_BYTE_UNKNOWN
+                          << ((type_size - 1 - i) * BITS_PER_MARKER);
 
               if (type_size < 64 / BITS_PER_MARKER)
                 {

regression testsuite run without regression on x86_64-linux-gnu and bswap
tests all pass on arm-none-eabi target

Is it ok for trunk?

Best regards,

Thomas

-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Wednesday, September 24, 2014 4:01 PM
To: Thomas Preud'homme
Cc: GCC Patches
Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote:

Hi all,

The fix for PR61306 disabled bswap when a sign extension is detected.
However this led to a test case regression (and potential performance
regression) in case where a sign extension happens but its effect is
canceled by other bit manipulation. This patch aims to fix that by having a
special marker to track bytes whose value is unpredictable due to sign
extension. If the final result of a bit manipulation doesn't contain any
such marker then the bswap optimization can proceed.

Nice and simple idea.

Ok.

Thanks,
Richard.

*** gcc/ChangeLog ***

2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com

        PR tree-optimization/63266
        * tree-ssa-math-opts.c (struct symbolic_number): Add comment about
        marker for unknown byte value.
        (MARKER_MASK): New macro.
        (MARKER_BYTE_UNKNOWN): New macro.
        (HEAD_MARKER): New macro.
        (do_shift_rotate): Mark bytes with unknown values due to sign
        extension when doing an arithmetic right shift. Replace hardcoded
        mask for marker by new MARKER_MASK macro.
        (find_bswap_or_nop_1): Likewise and adjust ORing of two symbolic
        numbers accordingly.

*** gcc/testsuite/ChangeLog ***

2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com

        PR tree-optimization/63266
        * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.

Testing:

* Built an arm-none-eabi-gcc cross-compiler and used it to run the
  testsuite on QEMU emulating Cortex-M3 without any regression
* Bootstrapped on x86_64-linux-gnu target and testsuite was run without
  regression

Ok for trunk?
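The bytes-versus-bits mistake fixed by the follow-up patch can be reproduced in isolation. The sketch below assumes one marker per byte, 8 bits per marker and 0xff as the "unknown" marker; the `TOY_` macros and both helper functions are hypothetical stand-ins, not the definitions from tree-ssa-math-opts.c.

```c
#include <stdint.h>

/* Assumed marker layout: one 8-bit marker per byte of the value.  */
#define TOY_BITS_PER_MARKER 8
#define TOY_MARKER_BYTE_UNKNOWN UINT64_C (0xff)

/* Fixed variant: when widening from OLD_TYPE_SIZE to TYPE_SIZE bytes with
   sign extension, mark each new high byte as unknown.  The byte index is
   scaled to a bit count before shifting.  */
static uint64_t
toy_mark_unknown (uint64_t n, int type_size, int old_type_size)
{
  for (int i = 0; i < type_size - old_type_size; i++)
    n |= TOY_MARKER_BYTE_UNKNOWN
         << ((type_size - 1 - i) * TOY_BITS_PER_MARKER);
  return n;
}

/* Buggy variant: shifts by the byte index itself, so the unknown marker
   lands in the low bits and clobbers the existing markers.  */
static uint64_t
toy_mark_unknown_buggy (uint64_t n, int type_size, int old_type_size)
{
  for (int i = 0; i < type_size - old_type_size; i++)
    n |= TOY_MARKER_BYTE_UNKNOWN << (type_size - 1 - i);
  return n;
}
```

Starting from the two-byte symbolic number 0x0201 and widening to four bytes, the fixed variant yields 0xffff0201 (bytes 3 and 2 unknown, bytes 1 and 0 untouched), whereas the buggy variant corrupts the markers of the low bytes instead.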
[PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix
On Thu, Oct 16, 2014 at 2:05 PM, Jakub Jelinek ja...@redhat.com wrote:

Recent change caused bootstrap failure on CentOS 5.11:

/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information.
unwind-dw2-fde-dip_s.o: In function `__pthread_cleanup_routine':
unwind-dw2-fde-dip.c:(.text+0x1590): multiple definition of `__pthread_cleanup_routine'
/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information.
unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information.
unwind-sjlj_s.o: In function `__pthread_cleanup_routine':
unwind-sjlj.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only handles version 2 information.
emutls_s.o: In function `__pthread_cleanup_routine':
emutls.c:(.text+0x170): multiple definition of `__pthread_cleanup_routine'
unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
collect2: error: ld returned 1 exit status
gmake[5]: *** [libgcc_s.so] Error 1

$ ld --version
GNU ld version 2.17.50.0.6-26.el5 20061020

It looks like a switch-to-c11 fallout. Older glibc versions have issues
with c99 (and c11) conformance [1].

Changing

  extern __inline void
  __pthread_cleanup_routine (...)

in system /usr/include/pthread.h to

  #if __STDC_VERSION__ < 199901L
  extern
  #endif
  __inline__ void
  __pthread_cleanup_routine (...)

fixes this issue and allows bootstrap to proceed.

However, fixincludes is not yet built in stage1 bootstrap. Is there a way
to fix this issue without changing system headers?

[1] https://gcc.gnu.org/ml/gcc-patches/2006-11/msg01030.html

Yeah, old glibcs are totally incompatible with -fno-gnu89-inline. Not sure
if it is easily fixincludable, if yes, then -fgnu89-inline should be used
for code like libgcc which is built with the newly built compiler before it
is fixincluded. Or we need -fgnu89-inline by default for old glibcs (that
is pretty much what we do e.g. in Developer Toolset for RHEL5).

At the end of the day, adding pthread.h to the glibc_c99_inline_4 fix fixes
the bootstrap. The fix applies __attribute__((__gnu_inline__)) to the
declaration:

  extern __inline __attribute__ ((__gnu_inline__)) void
  __pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame)

2014-10-21  Uros Bizjak  ubiz...@gmail.com

        * inclhack.def (glibc_c99_inline_4): Add pthread.h to files.
        * fixincl.x: Regenerate.

Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.

Index: fixincl.x
===================================================================
--- fixincl.x (revision 216501)
+++ fixincl.x (working copy)
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE (fixincl.x)
  *
- * It has been AutoGen-ed August 12, 2014 at 02:09:58 PM by AutoGen 5.12
+ * It has been AutoGen-ed October 21, 2014 at 10:18:16 AM by AutoGen 5.16.2
  * From the definitions inclhack.def
  * and the template file fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Aug 12 14:09:58 MSK 2014
+/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Oct 21 10:18:17 CEST 2014
  *
  * You must regenerate it. Use the ./genfixes script.
  *
@@ -3173,7 +3173,7 @@
  * File name selection pattern
  */
 tSCC zGlibc_C99_Inline_4List[] =
-  "sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0";
+  "sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0pthread.h\0*/pthread.h\0";
 /*
  * Machine/OS name selection pattern
  */
Index: inclhack.def
===================================================================
--- inclhack.def (revision 216501)
+++ inclhack.def (working copy)
@@ -1687,7 +1687,8 @@
  */
 fix = {
     hackname = glibc_c99_inline_4;
-    files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h';
+    files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h',
+            pthread.h, '*/pthread.h';
     bypass = "__extern_inline|__gnu_inline__";
     select = "(^| )extern __inline";
     c_fix = format;
Re: [PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix
On Tue, Oct 21, 2014 at 11:30:49AM +0200, Uros Bizjak wrote: At the end of the day, adding pthread.h to glibc_c99_inline_4 fix fixes the bootstrap. The fix applies __attribute__((__gnu_inline__)) to the declaration: extern __inline __attribute__ ((__gnu_inline__)) void __pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame) 2014-10-21 Uros Bizjak ubiz...@gmail.com * inclhack.def (glibc_c99_inline_4): Add pthread.h to files. * fixincl.x: Regenerate. Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}. OK for mainline? Ok, thanks. --- inclhack.def (revision 216501) +++ inclhack.def (working copy) @@ -1687,7 +1687,8 @@ */ fix = { hackname = glibc_c99_inline_4; -files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h'; +files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h', +pthread.h, '*/pthread.h'; bypass= __extern_inline|__gnu_inline__; select= (^| )extern __inline; c_fix = format; Jakub
Re: [PATCH/AARCH64] Add ThunderX -mcpu support
On 20 October 2014 21:45, Andrew Pinski apin...@cavium.com wrote: Hi, This adds simple -mcpu=thunderx support. Right now we use the schedule model of cortex-a53 but we will submit a schedule model for ThunderX later on. Note ThunderX is an AARCH64 only processor so I created a new file to hold the cost tables for it rather than adding it to aarch-cost-tables.h. OK? Built and tested for aarch64-elf. OK, thanks! Couple of minor nits: +/* RTX cost tables for aarch64. s/aarch64/AArch64/ +/* ThunderX does not have implement AARCH32. */ s/AARCH32/AArch32/ Cheers /Marcus Thanks, Andrew Pinski PS The corresponding binutils patch is located at https://sourceware.org/ml/binutils/2014-10/msg00170.html . ChangeLog: * doc/invoke.texi (AARCH64/mtune): Document thunderx as an available option also. * config/aarch64/aarch64-cost-tables.h: New file. * config/aarch64/aarch64-cores.def (thunderx): New core. * config/aarch64/aarch64-tune.md: Regenerate. * config/aarch64/aarch64.c: Include aarch64-cost-tables.h instead of config/arm/aarch-cost-tables.h. (thunderx_regmove_cost): New variable. (thunderx_tunings): New variable.
[PATCH] Add arm_cortex_m7_tune.
Hi, This patch is used to tune the gcc for Cortex-M7. The performance of Dhrystone can be improved by 1%. The performance of Coremark can be improved by 2.3%. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang gcc/ChangeLog 2014-10-11 Hale Wang hale.w...@arm.com * config/arm/arm.c: Add cortex-m7 tune. * config/arm/arm-cores.def: Use cortex-m7 tune. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 56ec7fd..3b34173 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -149,7 +149,7 @@ ARM_CORE(cortex-r4, cortexr4, cortexr4, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r4f,cortexr4f, cortexr4f, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r5, cortexr5, cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex) ARM_CORE(cortex-r7, cortexr7, cortexr7, 7R, FL_LDSCHED | FL_ARM_DIV, cortex) -ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, v7m) +ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 93b989d..834b13a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2003,6 +2003,27 @@ const struct tune_params arm_v7m_tune = 8 /* Maximum insns to inline memset. */ }; +/* Cortex-M7 tuning. */ + +const struct tune_params arm_cortex_m7_tune = +{ + arm_9e_rtx_costs, + v7m_extra_costs, + NULL, /* Sched adj cost. */ + 0, /* Constant limit. */ + 0, /* Max cond insns. */ + ARM_PREFETCH_NOT_BENEFICIAL, + true, /* Prefer constant pool. */ + arm_cortex_m_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false, /* Prefer Neon for stringops. 
*/ + 8 /* Maximum insns to inline memset. */ +}; + /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus. */ const struct tune_params arm_v6m_tune =
[PATCH] Don't put conditional loads/stores into interleaved chains (PR tree-optimization/63563)
Hi! This patch prevents conditional loads/stores from being added to interleaved groups (where it ICEs later on). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-10-21 Jakub Jelinek ja...@redhat.com PR tree-optimization/63563 * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Bail out if either dra or drb stmts are not normal loads/stores. * gcc.target/i386/pr63563.c: New test. --- gcc/tree-vect-data-refs.c.jj 2014-10-03 10:10:42.0 +0200 +++ gcc/tree-vect-data-refs.c 2014-10-20 15:21:47.938679992 +0200 @@ -2551,11 +2551,14 @@ vect_analyze_data_ref_accesses (loop_vec over them. The we can just skip ahead to the next DR here. */ /* Check that the data-refs have same first location (except init) -and they are both either store or load (not load and store). */ +and they are both either store or load (not load and store, +not masked loads or stores). */ if (DR_IS_READ (dra) != DR_IS_READ (drb) || !operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !dr_equal_offsets_p (dra, drb)) + || !dr_equal_offsets_p (dra, drb) + || !gimple_assign_single_p (DR_STMT (dra)) + || !gimple_assign_single_p (DR_STMT (drb))) break; /* Check that the data-refs have the same constant size and step. */ --- gcc/testsuite/gcc.target/i386/pr63563.c.jj 2014-10-20 15:27:17.713745577 +0200 +++ gcc/testsuite/gcc.target/i386/pr63563.c 2014-10-20 15:27:57.637023020 +0200 @@ -0,0 +1,17 @@ +/* PR tree-optimization/63563 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx2" } */ + +struct A { unsigned long a, b, c, d; } a[1024] = { { 0, 1, 2, 3 } }, b; + +void +foo (void) +{ + int i; + for (i = 0; i < 1024; i++) +{ + a[i].a = a[i].b = a[i].c = b.c; + if (a[i].d) + a[i].d = b.d; +} +} Jakub
RE: [PATCH] Add arm_cortex_m7_tune.
Attach the patch. -----Original Message----- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Tuesday, October 21, 2014 5:49 PM To: gcc-patches@gcc.gnu.org Subject: [PATCH] Add arm_cortex_m7_tune. cortex-m7-tune-2.patch Description: Binary data
Small multiplier support in Cortex-M0/1/+
Hi, Some configurations of the Cortex-M0 and Cortex-M1 come with a high latency multiplier. This patch adds support for such configurations. Small multiplier support means using add/sub/shift instructions to replace the mul instruction on MCUs that have no fast multiplier. The following strategies are adopted in this patch: 1. Define new CPUs, -mcpu=cortex-m0.small-multiply, cortex-m0plus.small-multiply and cortex-m1.small-multiply, to support the small multiplier. 2. With -Os, size is preferred: a threshold of 5 is set, so a multiplication is not split if that would end up with more than 5 instructions. Without -Os there is no such limit. Some test cases are also added to the testsuite to verify this. Is it ok for trunk? Thanks and Best Regards, Hale Wang gcc/ChangeLog: 2014-08-29 Hale Wang hale.w...@arm.com * config/arm/arm-cores.def: Add support for -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply, cortex-m1.small-multiply. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/arm.c: Update the rtx-costs for MUL. * config/arm/bpabi.h: Handle -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply, cortex-m1.small-multiply. * doc/invoke.texi: Document -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply, cortex-m1.small-multiply. * testsuite/gcc.target/arm/small-multiply-m0-1.c: New test case. * testsuite/gcc.target/arm/small-multiply-m0-2.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m0-3.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m0plus-1.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m0plus-2.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m0plus-3.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m1-1.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m1-2.c: Likewise. * testsuite/gcc.target/arm/small-multiply-m1-3.c: Likewise. 
=== diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index a830a83..af4b373 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -137,6 +137,11 @@ ARM_CORE(cortex-m1, cortexm1, cortexm1, 6M, FL_LDSCHED, v6m) ARM_CORE(cortex-m0, cortexm0, cortexm0, 6M, FL_LDSCHED, v6m) ARM_CORE(cortex-m0plus, cortexm0plus, cortexm0plus, 6M, FL_LDSCHED, v6m) +/* V6M Architecture Processors for small-multiply implementations. */ +ARM_CORE(cortex-m1.small-multiply, cortexm1smallmultiply, cortexm1, 6M, FL_LDSCHED | FL_SMALLMUL, v6m) +ARM_CORE(cortex-m0.small-multiply, cortexm0smallmultiply, cortexm0, 6M, FL_LDSCHED | FL_SMALLMUL, v6m) +ARM_CORE(cortex-m0plus.small-multiply,cortexm0plussmallmultiply, cortexm0plus,6M, FL_LDSCHED | FL_SMALLMUL, v6m) + /* V7 Architecture Processors */ ARM_CORE(generic-armv7-a,genericv7a, genericv7a, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a5, cortexa5, cortexa5, 7A, FL_LDSCHED, cortex_a5) diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index bc046a0..bd65bd2 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -241,6 +241,15 @@ EnumValue Enum(processor_type) String(cortex-m0plus) Value(cortexm0plus) EnumValue +Enum(processor_type) String(cortex-m1.small-multiply) Value(cortexm1smallmultiply) + +EnumValue +Enum(processor_type) String(cortex-m0.small-multiply) Value(cortexm0smallmultiply) + +EnumValue +Enum(processor_type) String(cortex-m0plus.small-multiply) Value(cortexm0plussmallmultiply) + +EnumValue Enum(processor_type) String(generic-armv7-a) Value(genericv7a) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 954cab8..8b5c778 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -25,6 +25,7 @@ arm1176jzs,arm1176jzfs,mpcorenovfp, mpcore,arm1156t2s,arm1156t2fs, cortexm1,cortexm0,cortexm0plus, + cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply, genericv7a,cortexa5,cortexa7, 
cortexa8,cortexa9,cortexa12, cortexa15,cortexr4,cortexr4f, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 93b989d..5062c85 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -751,6 +751,8 @@ static int thumb_call_reg_needed; #define FL_ARCH8 (1 << 24) /* Architecture 8. */ #define FL_CRC32 (1 << 25) /* ARMv8 CRC32 instructions. */ +#define FL_SMALLMUL (1 << 26) /* Small multiply supported. */ + #define FL_IWMMXT (1 << 29) /* XScale v2 or Intel Wireless MMX technology. */ #define FL_IWMMXT2(1 << 30) /* Intel Wireless MMX2 technology. */ @@ -914,6 +916,9 @@ int arm_condexec_masklen = 0; /* Nonzero if chip supports the ARMv8 CRC
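To illustrate the idea behind the small-multiply post above: a multiplication by a known constant can be decomposed into shifts and adds, and the number of such operations is the kind of quantity a size threshold like the one described for -Os would be compared against. The following is only a sketch in plain C under that reading. The function names are hypothetical, the step count is a naive estimate, and none of this is the code the patch adds to arm.c.

```c
#include <stdint.h>

/* Multiply X by the constant C using only shifts and adds, following
   the binary decomposition of C.  Hypothetical helper, not GCC code.  */
uint32_t
mul_by_shift_add (uint32_t x, uint32_t c)
{
  uint32_t result = 0;

  while (c != 0)
    {
      if (c & 1)
        result += x;    /* add the current shifted copy of x */
      x <<= 1;          /* the next bit of c weighs twice as much */
      c >>= 1;
    }
  return result;
}

/* Naive estimate of how many add/shift steps the loop above performs
   for C: one add per set bit plus one shift per bit position above
   bit 0.  An assumption for illustration only; a real compiler picks
   cheaper sequences (for example using subtraction).  */
unsigned
shift_add_steps (uint32_t c)
{
  unsigned adds = 0, shifts = 0;

  while (c != 0)
    {
      adds += c & 1;
      c >>= 1;
      if (c != 0)
        shifts++;
    }
  return adds + shifts;
}
```

For instance, multiplying by 10 (binary 1010) comes out as two adds and three shifts in this naive scheme, which would sit right at a limit of 5.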
Re: [PATCHv4][Kasan] Allow to override Asan shadow offset from command line
On 10/17/2014 04:32 PM, Ian Lance Taylor wrote: Jakub Jelinek ja...@redhat.com writes: Not sure if there aren't extra steps to make strtoull prototype available in system.h, libiberty.h etc. for systems that don't have strtoull in their headers. See the #if defined(HAVE_DECL_XXX) && !HAVE_DECL_XXX lines in include/libiberty.h. Although strtol is missing there as well. Thanks, here is a new version of the patch which adds support for strtoll and strtoull and also a small test for the strtol family. There are some open questions though: 1) how am I to test libiberty patches like this? Glibc obviously has strtoll and strtoull so I won't get them in my libiberty.a by default. For now I manually embed strtoll.o/strtoull.o into libiberty.a and run tests by hand but that doesn't sound quite robust, and testing just x64 is probably not enough either. 2) I had to use the __extension__ keyword to hide warnings from -Wlong-long. Unfortunately it does not fix spurious warnings for long long constants in older versions of GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7263). I guess that's fine. -Y commit d78d8daeec19531014c703894dc67db1430f6cbb Author: Yury Gribov y.gri...@samsung.com Date: Thu Oct 16 18:31:10 2014 +0400 Add strtoll and strtoull to libiberty. 2014-10-20 Yury Gribov y.gri...@samsung.com include/ * libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes. libiberty/ * strtoll.c: New file. * strtoull.c: New file. * configure.ac: Add long long checks. Add harness for strtoll and strtoull. Check decls for strtol, strtoul, strtoll, strtoull. * Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull. * config.in: Regenerate. * configure: Regenerate. * functions.texi: Regenerate. * testsuite/Makefile.in (check-strtol): New rule. (test-strtol): Likewise. (mostlyclean): Clean up strtol test. * testsuite/test-strtol.c: New test. 
diff --git a/include/libiberty.h b/include/libiberty.h index d09c9a5..26355a9 100644 --- a/include/libiberty.h +++ b/include/libiberty.h @@ -655,6 +655,33 @@ extern size_t strnlen (const char *, size_t); extern int strverscmp (const char *, const char *); #endif +#if defined(HAVE_DECL_STRTOL) && !HAVE_DECL_STRTOL +extern long int strtol (const char *nptr, +char **endptr, int base); +#endif + +#if defined(HAVE_DECL_STRTOUL) && !HAVE_DECL_STRTOUL +extern unsigned long int strtoul (const char *nptr, + char **endptr, int base); +#endif + +#if defined(HAVE_DECL_STRTOLL) && !HAVE_DECL_STRTOLL +__extension__ +extern long long int strtoll (const char *nptr, + char **endptr, int base); +#endif + +#if defined(HAVE_DECL_STRTOULL) && !HAVE_DECL_STRTOULL +__extension__ +extern unsigned long long int strtoull (const char *nptr, +char **endptr, int base); +#endif + +#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP +/* Compare version strings. */ +extern int strverscmp (const char *, const char *); +#endif + /* Set the title of a process */ extern void setproctitle (const char *name, ...); diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in index 9b87720..1b0d8ae 100644 --- a/libiberty/Makefile.in +++ b/libiberty/Makefile.in @@ -152,8 +152,8 @@ CFILES = alloca.c argv.c asprintf.c atexit.c\ spaces.c splay-tree.c stack-limit.c stpcpy.c stpncpy.c \ strcasecmp.c strchr.c strdup.c strerror.c strncasecmp.c \ strncmp.c strrchr.c strsignal.c strstr.c strtod.c strtol.c \ - strtoul.c strndup.c strnlen.c strverscmp.c \ - timeval-utils.c tmpnam.c \ + strtoll.c strtoul.c strtoull.c strndup.c strnlen.c \ + strverscmp.c timeval-utils.c tmpnam.c\ unlink-if-ordinary.c \ vasprintf.c vfork.c vfprintf.c vprintf.c vsnprintf.c vsprintf.c \ waitpid.c \ @@ -219,8 +219,8 @@ CONFIGURED_OFILES = ./asprintf.$(objext) ./atexit.$(objext) \ ./strchr.$(objext) ./strdup.$(objext) ./strncasecmp.$(objext) \ ./strncmp.$(objext) ./strndup.$(objext) ./strnlen.$(objext) \ ./strrchr.$(objext) ./strstr.$(objext) 
./strtod.$(objext) \ - ./strtol.$(objext) ./strtoul.$(objext) ./strverscmp.$(objext) \ - ./tmpnam.$(objext) \ + ./strtol.$(objext) ./strtoul.$(objext) strtoll.$(objext) \ + ./strtoull.$(objext) ./tmpnam.$(objext) ./strverscmp.$(objext) \ ./vasprintf.$(objext) ./vfork.$(objext) ./vfprintf.$(objext) \ ./vprintf.$(objext) ./vsnprintf.$(objext) ./vsprintf.$(objext) \ ./waitpid.$(objext) @@ -694,6 +694,17 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir else true; fi $(COMPILE.c) $(srcdir)/crc32.c $(OUTPUT_OPTION) +./d-demangle.$(objext): $(srcdir)/d-demangle.c config.h $(INCDIR)/ansidecl.h \ + $(INCDIR)/demangle.h $(INCDIR)/libiberty.h \ + $(INCDIR)/safe-ctype.h + if [ x$(PICFLAG) != x ]; then \ + $(COMPILE.c) $(PICFLAG) $(srcdir)/d-demangle.c -o pic/$@; \ + else true; fi + if [
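Since the thread above is about making strtoull available so that an offset can be taken from the command line, here is a small sketch of how such a value might be parsed and validated with the standard strtoull interface. The helper name parse_offset is hypothetical and this is not the option-handling code from the Kasan patch; it only shows the usual end-pointer and errno checks.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse STR as an unsigned integer.  Base 0 lets strtoull accept
   decimal, 0x-prefixed hex and 0-prefixed octal input.  Returns 1 and
   stores the value in *OUT on success, 0 on failure.  Hypothetical
   helper for illustration.  */
int
parse_offset (const char *str, unsigned long long *out)
{
  char *end;
  unsigned long long val;

  errno = 0;
  val = strtoull (str, &end, 0);

  /* Reject empty input, trailing garbage and out-of-range values.  */
  if (end == str || *end != '\0' || errno == ERANGE)
    return 0;

  *out = val;
  return 1;
}
```

Note that strtoull itself accepts a leading minus sign and negates the result in the unsigned type, so a caller that must reject negative input would need an extra check that this sketch omits.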
Re: [Patch, libstdc++/63497] Avoid dereferencing invalid iterator in regex_executor
On 20/10/14 10:23 -0700, Tim Shen wrote: Bootstrapped and tested. Did you manage to produce a testcase that crashed on trunk? @@ -407,25 +409,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template<typename _BiIter, typename _Alloc, typename _TraitsT, bool __dfs_mode> bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>:: -_M_word_boundary(_State<_TraitsT>) const +_M_word_boundary(_State<_TraitsT>) { - // By definition. - bool __ans = false; - auto __pre = _M_current; - --__pre; - if (!(_M_at_begin() && _M_at_end())) + bool __left_is_word = false; + if (_M_current != _M_begin + || (_M_flags & regex_constants::match_prev_avail)) { - if (_M_at_begin()) - __ans = _M_is_word(*_M_current) - && !(_M_flags & regex_constants::match_not_bow); - else if (_M_at_end()) - __ans = _M_is_word(*__pre) - && !(_M_flags & regex_constants::match_not_eow); - else - __ans = _M_is_word(*_M_current) - != _M_is_word(*__pre); + --_M_current; + if (_M_is_word(*_M_current)) + __left_is_word = true; + ++_M_current; Is it really necessary to modify _M_current here? Couldn't you do: auto __pre = _M_current; if (_M_is_word(*--__pre)) __left_is_word = true; Then the function could remain const, couldn't it?
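The point under review, looking at the character to the left only when that position is valid and doing so through a local copy rather than by moving the current position, can be sketched in plain C with pointers. This is a hypothetical rendering for illustration, not the libstdc++ code: the names are made up, prev_avail stands in for the match_prev_avail flag, and the match_not_bow/match_not_eow handling is left out.

```c
#include <ctype.h>
#include <stdbool.h>

/* A word character here is alphanumeric or an underscore.  */
static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Return true if CUR sits on a word boundary inside [BEGIN, END).
   PREV_AVAIL says whether BEGIN[-1] is a valid character to inspect.
   Hypothetical helper; CUR itself is never modified.  */
bool
at_word_boundary (const char *begin, const char *end,
                  const char *cur, bool prev_avail)
{
  bool left_is_word = false;
  bool right_is_word = false;

  /* Only look left when there is something there: never read before
     BEGIN unless the caller vouched for it.  */
  if (cur != begin || prev_avail)
    left_is_word = is_word_char (cur[-1]);

  /* Only look right when CUR can be dereferenced.  */
  if (cur != end)
    right_is_word = is_word_char (*cur);

  return left_is_word != right_is_word;
}
```

Because the left neighbour is read through an offset from cur rather than by decrementing and re-incrementing it, the function has no side effects on its arguments, which is the C analogue of keeping the member function const.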
Re: The nvptx port [0/11+]
On 10/21/2014 10:18 AM, Richard Biener wrote: So with this restriction I wonder why it didn't make sense to go the HSA backend route emitting PTX from a GIMPLE SSA pass. This would have avoided the LTO dance as well ... Quite simple - there isn't an established way to do this. If I'd known you were doing something like this when I started the work I might have looked into that approach. Bernd
Re: The nvptx port [0/11+]
On 10/21/2014 10:42 AM, Jakub Jelinek wrote: On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote: * Can't emit initializers referring to their variable's address since you can't write forward declarations for variables. Can't that be handled by emitting the initializer without the address and some constructor that fixes up the initializer at runtime? That reminds me that constructors are something I forgot to add to the list. I'm thinking about making these work with some trickery in the linker, but at the moment they are unsupported. Can't you represent structures and unions as arrays of chars? For constant initializers that don't need relocations the compiler can surely turn them into arrays of char initializers (e.g. fold-const.c native_encode_expr/native_interpret_expr could be used for that). Supposedly it would mean slower than perhaps necessary loads/stores of aligned larger fields from the structure, but if it is an alternative to not supporting structures/unions at all, that sounds like so severe limitation that it can be pretty fatal for the target. Oh, structs and unions are supported, and essentially that's what I'm doing - I choose a base integer type to represent them. That happens to be the size of a pointer, so properly aligned symbol refs can be emitted. It's just the packed ones that can't be done. * No support for indirect jumps, label values, nonlocal gotos. Not even indirect calls? How do you implement C++ or Fortran vtables? Indirect calls do exist. Bernd
[2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target
Hi there, The attached patch intends to enable GCC to generate UAL format code for the Thumb1 target. Tested with the regression tests; no regressions. Is it OK for trunk? BR, Terry 2014-10-21 Terry Guo terry@arm.com * config/arm/arm.c (arm_output_mi_thunk): Use UAL for Thumb1 target. * config/arm/thumb1.md: Likewise. gcc/testsuite 2014-10-21 Terry Guo terry@arm.com * gcc.target/arm/anddi_notdi-1.c: Match with UAL format. * gcc.target/arm/pr40956.c: Likewise. * gcc.target/arm/thumb1-Os-mult.c: Likewise. * gcc.target/arm/thumb1-load-64bit-constant-3.c: Likewise. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9ccf73c..dc73244 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28615,12 +28615,14 @@ arm_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, fputs (\tldr\tr3, , file); assemble_name (file, label); fputs (+4\n, file); - asm_fprintf (file, \t%s\t%r, %r, r3\n, + asm_fprintf (file, \t%ss\t%r, %r, r3\n, mi_op, this_regno, this_regno); } else if (mi_delta != 0) { - asm_fprintf (file, \t%s\t%r, %r, #%d\n, + /* Thumb1 unified syntax requires s suffix in instruction name when +one of the operands is immediate. 
*/ + asm_fprintf (file, \t%ss\t%r, %r, #%d\n, mi_op, this_regno, this_regno, mi_delta); } diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 020d83b..8a2abe9 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -29,7 +29,7 @@ (clobber (reg:CC CC_REGNUM)) ] TARGET_THUMB1 - add\\t%Q0, %Q0, %Q2\;adc\\t%R0, %R0, %R2 + adds\\t%Q0, %Q0, %Q2\;adcs\\t%R0, %R0, %R2 [(set_attr length 4) (set_attr type multiple)] ) @@ -42,9 +42,9 @@ * static const char * const asms[] = { - \add\\t%0, %0, %2\, - \sub\\t%0, %0, #%n2\, - \add\\t%0, %1, %2\, + \adds\\t%0, %0, %2\, + \subs\\t%0, %0, #%n2\, + \adds\\t%0, %1, %2\, \add\\t%0, %0, %2\, \add\\t%0, %0, %2\, \add\\t%0, %1, %2\, @@ -56,7 +56,7 @@ if ((which_alternative == 2 || which_alternative == 6) CONST_INT_P (operands[2]) INTVAL (operands[2]) 0) - return \sub\\t%0, %1, #%n2\; + return (which_alternative == 2) ? \subs\\t%0, %1, #%n2\ : \sub\\t%0, %1, #%n2\; return asms[which_alternative]; reload_completed CONST_INT_P (operands[2]) @@ -105,7 +105,7 @@ (match_operand:DI 2 register_operand l))) (clobber (reg:CC CC_REGNUM))] TARGET_THUMB1 - sub\\t%Q0, %Q0, %Q2\;sbc\\t%R0, %R0, %R2 + subs\\t%Q0, %Q0, %Q2\;sbcs\\t%R0, %R0, %R2 [(set_attr length 4) (set_attr type multiple)] ) @@ -115,7 +115,7 @@ (minus:SI (match_operand:SI 1 register_operand l) (match_operand:SI 2 reg_or_int_operand lPd)))] TARGET_THUMB1 - sub\\t%0, %1, %2 + subs\\t%0, %1, %2 [(set_attr length 2) (set_attr conds set) (set_attr type alus_sreg)] @@ -133,9 +133,9 @@ TARGET_THUMB1 !arm_arch6 * if (which_alternative 2) -return \mov\\t%0, %1\;mul\\t%0, %2\; +return \mov\\t%0, %1\;muls\\t%0, %2\; else -return \mul\\t%0, %2\; +return \muls\\t%0, %2\; [(set_attr length 4,4,2) (set_attr type muls)] @@ -147,9 +147,9 @@ (match_operand:SI 2 register_operand l,0,0)))] TARGET_THUMB1 arm_arch6 @ - mul\\t%0, %2 - mul\\t%0, %1 - mul\\t%0, %1 + muls\\t%0, %2 + muls\\t%0, %1 + muls\\t%0, %1 [(set_attr length 2) (set_attr type muls)] ) @@ -159,7 +159,7 @@ 
(and:SI (match_operand:SI 1 register_operand %0) (match_operand:SI 2 register_operand l)))] TARGET_THUMB1 - and\\t%0, %2 + ands\\t%0, %2 [(set_attr length 2) (set_attr type logic_imm) (set_attr conds set)]) @@ -202,7 +202,7 @@ (and:SI (not:SI (match_operand:SI 1 register_operand l)) (match_operand:SI 2 register_operand 0)))] TARGET_THUMB1 - bic\\t%0, %1 + bics\\t%0, %1 [(set_attr length 2) (set_attr conds set) (set_attr type logics_reg)] @@ -213,7 +213,7 @@ (ior:SI (match_operand:SI 1 register_operand %0) (match_operand:SI 2 register_operand l)))] TARGET_THUMB1 - orr\\t%0, %2 + orrs\\t%0, %2 [(set_attr length 2) (set_attr conds set) (set_attr type logics_reg)]) @@ -223,7 +223,7 @@ (xor:SI (match_operand:SI 1 register_operand %0) (match_operand:SI 2 register_operand l)))] TARGET_THUMB1 - eor\\t%0, %2 + eors\\t%0, %2 [(set_attr length 2) (set_attr conds set) (set_attr type logics_reg)] @@ -234,7 +234,7 @@ (ashift:SI (match_operand:SI 1 register_operand l,0) (match_operand:SI 2 nonmemory_operand N,l)))] TARGET_THUMB1 - lsl\\t%0, %1, %2 + lsls\\t%0, %1, %2 [(set_attr length 2)
Re: [PATCH] Don't put conditional loads/stores into interleaved chains (PR tree-optimization/63563)
On Tue, 21 Oct 2014, Jakub Jelinek wrote: Hi! This patch prevents conditional loads/stores to be added into interleaved groups (where it ICEs later on). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? Ok. Thanks, Richard. 2014-10-21 Jakub Jelinek ja...@redhat.com PR tree-optimization/63563 * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Bail out if either dra or drb stmts are not normal loads/stores. * gcc.target/i386/pr63563.c: New test. --- gcc/tree-vect-data-refs.c.jj 2014-10-03 10:10:42.0 +0200 +++ gcc/tree-vect-data-refs.c 2014-10-20 15:21:47.938679992 +0200 @@ -2551,11 +2551,14 @@ vect_analyze_data_ref_accesses (loop_vec over them. The we can just skip ahead to the next DR here. */ /* Check that the data-refs have same first location (except init) - and they are both either store or load (not load and store). */ + and they are both either store or load (not load and store, + not masked loads or stores). */ if (DR_IS_READ (dra) != DR_IS_READ (drb) || !operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) - || !dr_equal_offsets_p (dra, drb)) + || !dr_equal_offsets_p (dra, drb) + || !gimple_assign_single_p (DR_STMT (dra)) + || !gimple_assign_single_p (DR_STMT (drb))) break; /* Check that the data-refs have the same constant size and step. */ --- gcc/testsuite/gcc.target/i386/pr63563.c.jj2014-10-20 15:27:17.713745577 +0200 +++ gcc/testsuite/gcc.target/i386/pr63563.c 2014-10-20 15:27:57.637023020 +0200 @@ -0,0 +1,17 @@ +/* PR tree-optimization/63563 */ +/* { dg-do compile } */ +/* { dg-options -O3 -mavx2 } */ + +struct A { unsigned long a, b, c, d; } a[1024] = { { 0, 1, 2, 3 } }, b; + +void +foo (void) +{ + int i; + for (i = 0; i 1024; i++) +{ + a[i].a = a[i].b = a[i].c = b.c; + if (a[i].d) + a[i].d = b.d; +} +} Jakub
Re: The nvptx port [0/11+]
On Tue, Oct 21, 2014 at 12:53 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 10/21/2014 10:18 AM, Richard Biener wrote: So with this restriction I wonder why it didn't make sense to go the HSA backend route emitting PTX from a GIMPLE SSA pass. This would have avoided the LTO dance as well ... Quite simple - there isn't an established way to do this. If I'd known you were doing something like this when I started the work I might have looked into that approach. Ah, I see. I think having both ways now is good so we can compare pros and cons in practice (and make further targets follow the better approach if there is one). Richard. Bernd
Re: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
On Tue, Oct 21, 2014 at 11:28 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi Richard, I realized thanks to Christophe Lyon that a shift was not right: the shift count is a number of bytes instead of a number of bits. This extra patch fixes the problem. Ok. Thanks, Richard. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2014-09-26 Thomas Preud'homme thomas.preudho...@arm.com * tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of MARKER_BYTE_UNKNOWN markers when handling casts. *** gcc/testsuite/ChangeLog *** 2014-10-08 Thomas Preud'homme thomas.preudho...@arm.com * gcc.dg/optimize-bswaphi-1.c: New bswap pass test. diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c index 3e51f04..18aba28 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c +++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c @@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data) return *(data + 1) | (*data << 8); } +typedef int SItype __attribute__ ((mode (SI))); +typedef int HItype __attribute__ ((mode (HI))); + +/* Test that detection of significant sign extension works correctly. This + checks that unknown byte marker are set correctly in cast of cast. 
*/ + +HItype +swap16 (HItype in) +{ + return (HItype) (((in >> 0) & 0xFF) << 8) + | (((in >> 8) & 0xFF) << 0); +} + /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */ -/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ /* { dg-final { cleanup-tree-dump "bswap" } } */ diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 3c6e935..2ef2333 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size && HEAD_MARKER (n->n, old_type_size)) for (i = 0; i < type_size - old_type_size; i++) - n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i); + n->n |= MARKER_BYTE_UNKNOWN +<< ((type_size - 1 - i) * BITS_PER_MARKER); if (type_size < 64 / BITS_PER_MARKER) { regression testsuite run without regression on x86_64-linux-gnu and bswap tests all pass on arm-none-eabi target Is it ok for trunk? Best regards, Thomas -----Original Message----- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 24, 2014 4:01 PM To: Thomas Preud'homme Cc: GCC Patches Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi all, The fix for PR61306 disabled bswap when a sign extension is detected. However this led to a test case regression (and potential performance regression) in case where a sign extension happens but its effect is canceled by other bit manipulation. 
This patch aims to fix that by having a special marker to track bytes whose value is unpredictable due to sign extension. 
If the final result of a bit manipulation doesn't contain any such marker then the bswap optimization can proceed. Nice and simple idea. Ok. Thanks, Richard. *** gcc/ChangeLog *** 2014-09-15 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/63266 * tree-ssa-math-opts.c (struct symbolic_number): Add comment about marker for unknown byte value. (MARKER_MASK): New macro. (MARKER_BYTE_UNKNOWN): New macro. (HEAD_MARKER): New macro. (do_shift_rotate): Mark bytes with unknown values due to sign extension when doing an arithmetic right shift. Replace hardcoded mask for marker by new MARKER_MASK macro. (find_bswap_or_nop_1): Likewise and adjust ORing of two symbolic numbers accordingly. *** gcc/testsuite/ChangeLog *** 2014-09-15 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/63266 * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test. Testing: * Built an arm-none-eabi-gcc cross-compiler and used it to run the testsuite on QEMU emulating Cortex-M3 without any regression * Bootstrapped on x86_64-linux-gnu target and testsuite was run without regression Ok for trunk?
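The marker idea described above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the code in tree-ssa-math-opts.c: the names `identity_markers`, `shift_right_arithmetic`, `keep_bytes` and `has_unknown_byte` are made up for the example, shifts are restricted to whole bytes, and a marker value of 0xFF stands for "unknown due to sign extension". Note that every marker position is scaled by the number of bits per marker, which is the scaling the hunk above adds.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model, not the GCC implementation: one 8-bit marker per byte.
// Marker k (1..8) means "this byte holds source byte k", 0 means "known
// zero", and kMarkerUnknown means "value depends on the sign bit".
constexpr int kBitsPerMarker = 8;
constexpr std::uint64_t kMarkerMask = 0xFF;
constexpr std::uint64_t kMarkerUnknown = 0xFF;

// Symbolic number of an untouched value of `size` bytes (1 <= size <= 8):
// byte i (0 = least significant) carries marker i + 1.
std::uint64_t identity_markers(int size) {
  std::uint64_t n = 0;
  for (int i = 0; i < size; ++i)
    n |= static_cast<std::uint64_t>(i + 1) << (i * kBitsPerMarker);
  return n;
}

// Arithmetic right shift by `bytes` whole bytes (0 <= bytes < size).  If the
// most significant byte is not known to be zero, the vacated high bytes
// become unknown because they replicate the sign bit.
std::uint64_t shift_right_arithmetic(std::uint64_t n, int size, int bytes) {
  const bool head_set =
      ((n >> ((size - 1) * kBitsPerMarker)) & kMarkerMask) != 0;
  n >>= bytes * kBitsPerMarker;
  if (head_set)
    for (int i = 0; i < bytes; ++i)
      n |= kMarkerUnknown << ((size - 1 - i) * kBitsPerMarker);
  return n;
}

// Models a bitwise AND with a constant whose bytes are all 0x00 or 0xFF:
// bit i of `keep` set means byte i survives, otherwise it becomes zero.
std::uint64_t keep_bytes(std::uint64_t n, int size, unsigned keep) {
  for (int i = 0; i < size; ++i)
    if (!((keep >> i) & 1u))
      n &= ~(kMarkerMask << (i * kBitsPerMarker));
  return n;
}

// True if any byte of the symbolic number carries the unknown marker.
bool has_unknown_byte(std::uint64_t n, int size) {
  for (int i = 0; i < size; ++i)
    if (((n >> (i * kBitsPerMarker)) & kMarkerMask) == kMarkerUnknown)
      return true;
  return false;
}

// A sign extension whose effect is cancelled by a later mask.
void demo_sign_extension_cancelled() {
  const int size = 4;
  std::uint64_t n = shift_right_arithmetic(identity_markers(size), size, 1);
  assert(has_unknown_byte(n, size));   // top byte depends on the sign bit
  n = keep_bytes(n, size, 0x7);        // like `& 0x00FFFFFF`
  assert(!has_unknown_byte(n, size));  // no unknown marker is left
}
```

In this model the optimization would be allowed to proceed only when `has_unknown_byte` returns false for the final symbolic number.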
[C++ Patch] Add default arguments to cp_parser_unary_expression
Hi,

another patchlet along the lines of the others I proposed over the last weeks: this one should be really uncontroversial, because it turns out that in all but one case we are passing all NULL / false arguments. Tested x86_64-linux.

Thanks,
Paolo.
Re: [C++ Patch] Add default arguments to cp_parser_unary_expression
... the patch.

Paolo.

//
2014-10-21  Paolo Carlini  paolo.carl...@oracle.com

        * parser.c (cp_parser_unary_expression): Add default arguments.
        (cp_parser_cast_expression, cp_parser_sizeof_operand,
        cp_parser_omp_atomic): Adjust.

Index: parser.c
===
--- parser.c    (revision 216502)
+++ parser.c    (working copy)
@@ -1968,7 +1968,7 @@ enum { non_attr = 0, normal_attr = 1, id_attr = 2
 static void cp_parser_pseudo_destructor_name
   (cp_parser *, tree, tree *, tree *);
 static tree cp_parser_unary_expression
-  (cp_parser *, bool, bool, cp_id_kind *);
+  (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false, bool = false);
 static enum tree_code cp_parser_unary_operator
   (cp_token *);
 static tree cp_parser_new_expression
@@ -7104,8 +7104,8 @@ cp_parser_pseudo_destructor_name (cp_parser* parse
    Returns a representation of the expression.  */
 
 static tree
-cp_parser_unary_expression (cp_parser *parser, bool address_p, bool cast_p,
-                            bool decltype_p, cp_id_kind * pidk)
+cp_parser_unary_expression (cp_parser *parser, cp_id_kind * pidk,
+                            bool address_p, bool cast_p, bool decltype_p)
 {
   cp_token *token;
   enum tree_code unary_operator;
@@ -7381,14 +7381,6 @@ static tree
                                    pidk);
 }
 
-static inline tree
-cp_parser_unary_expression (cp_parser *parser, bool address_p, bool cast_p,
-                            cp_id_kind * pidk)
-{
-  return cp_parser_unary_expression (parser, address_p, cast_p,
-                                     /*decltype*/false, pidk);
-}
-
 /* Returns ERROR_MARK if TOKEN is not a unary-operator.  If TOKEN is a
    unary-operator, the corresponding tree code is returned.  */
 
@@ -8018,8 +8010,8 @@ cp_parser_cast_expression (cp_parser *parser, bool
 
   /* If we get here, then it's not a cast, so it must be a
      unary-expression.  */
-  return cp_parser_unary_expression (parser, address_p, cast_p,
-                                     decltype_p, pidk);
+  return cp_parser_unary_expression (parser, pidk, address_p,
+                                     cast_p, decltype_p);
 }
 
 /* Parse a binary expression of the general form:
@@ -24374,8 +24366,7 @@ cp_parser_sizeof_operand (cp_parser* parser, enum
   /* If the type-id production did not work out, then we must be
      looking at the unary-expression production.  */
   if (!expr)
-    expr = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                       /*cast_p=*/false, NULL);
+    expr = cp_parser_unary_expression (parser);
 
   /* Go back to evaluating expressions.  */
   --cp_unevaluated_operand;
@@ -29039,8 +29030,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
     {
     case OMP_ATOMIC_READ:
     case NOP_EXPR: /* atomic write */
-      v = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                      /*cast_p=*/false, NULL);
+      v = cp_parser_unary_expression (parser);
       if (v == error_mark_node)
         goto saw_error;
       if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
@@ -29048,8 +29038,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
       if (code == NOP_EXPR)
         lhs = cp_parser_expression (parser);
       else
-        lhs = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                          /*cast_p=*/false, NULL);
+        lhs = cp_parser_unary_expression (parser);
       if (lhs == error_mark_node)
         goto saw_error;
       if (code == NOP_EXPR)
@@ -29070,8 +29059,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
         }
       else
         {
-          v = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                          /*cast_p=*/false, NULL);
+          v = cp_parser_unary_expression (parser);
           if (v == error_mark_node)
             goto saw_error;
           if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
@@ -29082,8 +29070,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
         }
 
 restart:
-  lhs = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                    /*cast_p=*/false, NULL);
+  lhs = cp_parser_unary_expression (parser);
   orig_lhs = lhs;
   switch (TREE_CODE (lhs))
     {
@@ -29322,14 +29309,12 @@ stmt_done:
         {
           if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON))
             goto saw_error;
-          v = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                          /*cast_p=*/false, NULL);
+          v = cp_parser_unary_expression (parser);
           if (v == error_mark_node)
             goto saw_error;
           if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
             goto saw_error;
-          lhs1 = cp_parser_unary_expression (parser, /*address_p=*/false,
-                                             /*cast_p=*/false, NULL);
+          lhs1 = cp_parser_unary_expression
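The shape of this change, replacing a forwarding overload with default arguments and moving the one commonly supplied pointer ahead of the flags, can be sketched outside the parser. Everything below is hypothetical (`toy_parser`, `toy_id_kind`, `toy_unary_expression`); it only shows the language mechanics, in particular that the defaults sit on the forward declaration and are not repeated on the definition.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; these are not the GCC parser types.
struct toy_parser {};
enum toy_id_kind { TOY_ID_NONE, TOY_ID_UNQUALIFIED };

// The forward declaration carries the defaults: the out-parameter comes
// first and the rarely-set flags follow it.
std::string toy_unary_expression(toy_parser *, toy_id_kind * = nullptr,
                                 bool = false, bool = false, bool = false);

// The definition names the parameters without repeating the defaults.
std::string toy_unary_expression(toy_parser *, toy_id_kind *pidk,
                                 bool address_p, bool cast_p,
                                 bool decltype_p) {
  if (pidk)
    *pidk = TOY_ID_UNQUALIFIED;
  // Encode which arguments were supplied so callers can observe them.
  std::string flags;
  flags += pidk ? 'P' : '-';
  flags += address_p ? 'A' : '-';
  flags += cast_p ? 'C' : '-';
  flags += decltype_p ? 'D' : '-';
  return flags;
}

void demo_default_arguments() {
  toy_parser parser;
  toy_id_kind kind = TOY_ID_NONE;
  // Most call sites pass nothing but the parser.
  assert(toy_unary_expression(&parser) == "----");
  // A call site that needs more spells out only the leading arguments.
  assert(toy_unary_expression(&parser, &kind, true) == "PA--");
  assert(kind == TOY_ID_UNQUALIFIED);
}
```

Because default arguments can only be omitted from the right, the order of the parameters decides which call sites get shorter; that is presumably why the pointer parameter moved to the front.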
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard,

I made some changes to the patch and ChangeLog to note that support for if-conversion of blocks with only critical incoming edges will be added in the future (more precisely, in patch.4).

Could you please review it.

Thanks.

ChangeLog:

2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

        (flag_force_vectorize): New variable.
        (edge_predicate): New function.
        (set_edge_predicate): New function.
        (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
        if destination block of edge is not always executed.  Set-up predicate
        for critical edge.
        (if_convertible_phi_p): Accept phi nodes with more than two args
        if FLAG_FORCE_VECTORIZE was set-up.
        (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
        (if_convertible_stmt_p): Fix up pre-function comments.
        (all_preds_critical_p): New function.
        (if_convertible_bb_p): Use call of all_preds_critical_p
        to reject temporarily block if-conversion with incoming critical edges
        if FLAG_FORCE_VECTORIZE was not set-up.  This restriction will be
        deleted after adding support for extended predication.
        (predicate_bbs): Skip loop exit block also.  Invoke build2_loc
        to compute predicate instead of fold_build2_loc.
        Add zeroing of edge 'aux' field.
        (find_phi_replacement_condition): Extend function interface:
        it returns NULL if given phi node must be handled by means of
        extended phi node predication.  If number of predecessors of phi-block
        is equal 2 and at least one incoming edge is not critical original
        algorithm is used.
        (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
        Nullify 'aux' field of edges for blocks with two successors.

2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:

Richard,

Thanks for your answer!

In the current implementation, phi node conversion assumes that the bb containing the given phi has at least one non-critical incoming edge, and chooses it to insert predicated code. But if we choose a critical edge we need to determine the insertion point and insertion direction (before/after), since otherwise we can get invalid SSA form (use before def). This is done by my new function, which is not in the current patch (I will present this patch later). So I assume that we need to leave this patch as it is to not introduce new bugs.

Thanks.
Yuri.

2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:

On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:

Richard,

I reworked the patch as you proposed, but I didn't understand what you meant by:

So please rework the patch so critical edges are always handled correctly.

In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to reject basic blocks with only critical incoming edges since support for extended predication of phi nodes will be in next patch.

I mean that (2) should not be rejected dependent on flag_force_vectorize. It was rejected because if-cvt couldn't handle it correctly before but with this patch this is fixed. I see no reason to still reject this then even for !flag_force_vectorize.

Rejecting PHIs with more than two arguments with flag_force_vectorize is ok.

Richard.

Could you please clarify your statement.

I attached modified patch.

ChangeLog:

2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

        (flag_force_vectorize): New variable.
        (edge_predicate): New function.
        (set_edge_predicate): New function.
        (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
        if destination block of edge is not always executed.  Set-up predicate
        for critical edge.
        (if_convertible_phi_p): Accept phi nodes with more than two args
        if FLAG_FORCE_VECTORIZE was set-up.
        (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
        (if_convertible_stmt_p): Fix up pre-function comments.
        (all_edges_are_critical): New function.
        (if_convertible_bb_p): Use call of all_preds_critical_p
        to reject block if-conversion with incoming critical edges only if
        FLAG_FORCE_VECTORIZE was not set-up.
        (predicate_bbs): Skip loop exit block also.  Invoke build2_loc
        to compute predicate instead of fold_build2_loc.
        Add zeroing of edge 'aux' field.
        (find_phi_replacement_condition): Extend function interface:
        it returns NULL if given phi node must be handled by means of
        extended phi node predication.  If number of predecessors of phi-block
        is equal 2 and at least one incoming edge is not critical original
        algorithm is used.
        (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
        Nullify 'aux' field of edges for blocks with two successors.

2014-10-17 13:09 GMT+04:00 Richard Biener richard.guent...@gmail.com:

On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:

Richard,

Here is reduced patch as you requested. All your remarks have been fixed. Could you please look at it (I have already sent the patch with changes in add_to_predicate_list for review).

+ if (dump_file && (dump_flags & TDF_DETAILS)) +
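Since the discussion turns on blocks whose incoming edges are all critical, a toy model may help. An edge is critical when its source block has more than one successor and its destination block has more than one predecessor. The sketch below is an illustration on a hand-rolled adjacency list; `toy_cfg`, `pred_count`, `edge_critical` and `all_preds_critical` are hypothetical names and do not reproduce the helper added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG for illustration only: cfg[b] lists the successors of block b.
using toy_cfg = std::vector<std::vector<int>>;

// Number of edges entering block bb.
std::size_t pred_count(const toy_cfg &cfg, int bb) {
  std::size_t count = 0;
  for (const auto &succs : cfg)
    for (int s : succs)
      if (s == bb)
        ++count;
  return count;
}

// An edge is critical when its source has several successors and its
// destination has several predecessors.
bool edge_critical(const toy_cfg &cfg, int src, int dst) {
  return cfg[static_cast<std::size_t>(src)].size() >= 2 &&
         pred_count(cfg, dst) >= 2;
}

// True when bb has at least one incoming edge and all of them are critical,
// i.e. there is no predecessor that branches only to bb.
bool all_preds_critical(const toy_cfg &cfg, int bb) {
  bool any = false;
  for (std::size_t src = 0; src < cfg.size(); ++src)
    for (int s : cfg[src])
      if (s == bb) {
        if (!edge_critical(cfg, static_cast<int>(src), bb))
          return false;
        any = true;
      }
  return any;
}

void demo_critical_edges() {
  // Block 3 is reached from 0 and 1 (both branch) and from 2 (falls through).
  const toy_cfg a = {{1, 3}, {2, 3}, {3}, {}};
  assert(edge_critical(a, 0, 3));
  assert(!edge_critical(a, 2, 3));
  assert(!all_preds_critical(a, 3));

  // Block 2 is reached only from blocks that also branch elsewhere.
  const toy_cfg b = {{1, 2}, {2, 3}, {}, {}};
  assert(all_preds_critical(b, 2));
}
```

In the second graph there is no predecessor of block 2 that ends in an unconditional jump, so code cannot be placed at the end of a predecessor without also affecting its other successor; that is the situation in which the insertion point discussed in the thread becomes a problem.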
[PATCH] Fix obvious errors in IPA devirt and tree-prof testcases
They still FAIL though:

FAIL: g++.dg/tree-prof/pr35545.C execution: file pr35545.gcda does not exist, -fprofile-generate -D_PROFILE_GENERATE
FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-tree-dump-times optimized "return 2" 2
FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized "return 2" 2
FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-tree-dump-times optimized "return 2" 2

Honza - please test your patches better.

Richard.

2014-10-21  Richard Biener  rguent...@suse.de

        * g++.dg/ipa/devirt-42.C: Fix dump scanning routines.
        * g++.dg/ipa/devirt-46.C: Likewise.
        * g++.dg/ipa/devirt-47.C: Likewise.
        * g++.dg/tree-prof/pr35545.C: Likewise.

Index: gcc/testsuite/g++.dg/ipa/devirt-42.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-42.C        (revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-42.C        (working copy)
@@ -31,8 +31,8 @@ main()
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 2 "inline" } } */
 
 /* Verify that speculation is optimized by late optimizers.  */
-/* { dg-final { scan-ipa-dump-times "return 2" 2 "optimized" } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "return 2" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized" } } */
 
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/ipa/devirt-46.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-46.C        (revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-46.C        (working copy)
@@ -21,7 +21,7 @@ m()
 }
 
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*B::foo" 1 "inline" } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized" } } */
-/* { dg-final { scan-ipa-dump-not "abort" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "abort" "optimized" } } */
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/ipa/devirt-47.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-47.C        (revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-47.C        (working copy)
@@ -24,8 +24,8 @@ m()
 }
 
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*C::_ZTh" 1 "inline" } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized" } } */
 /* FIXME: We ought to inline thunk.  */
-/* { dg-final { scan-ipa-dump "C::_ZThn" "optimized" } } */
+/* { dg-final { scan-tree-dump "C::_ZThn" "optimized" } } */
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/tree-prof/pr35545.C
===
--- gcc/testsuite/g++.dg/tree-prof/pr35545.C    (revision 216506)
+++ gcc/testsuite/g++.dg/tree-prof/pr35545.C    (working copy)
@@ -48,5 +48,5 @@ int main()
 }
 /* { dg-final-use { scan-ipa-dump "Indirect call -> direct call" "profile_estimate" } } */
 /* { dg-final-use { cleanup-ipa-dump "profile" } } */
-/* { dg-final-use { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized" } } */
+/* { dg-final-use { scan-tree-dump-not "OBJ_TYPE_REF" "optimized" } } */
 /* { dg-final-use { cleanup-tree-dump "optimized" } } */
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:

Richard,

I made some changes to the patch and ChangeLog to note that support for if-conversion of blocks with only critical incoming edges will be added in the future (more precisely, in patch.4).

But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form)

Which means the patch doesn't make sense in isolation?

Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block).

Richard.

Could you please review it.

Thanks.

ChangeLog:

2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

        (flag_force_vectorize): New variable.
        (edge_predicate): New function.
        (set_edge_predicate): New function.
        (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
        if destination block of edge is not always executed.  Set-up predicate
        for critical edge.
        (if_convertible_phi_p): Accept phi nodes with more than two args
        if FLAG_FORCE_VECTORIZE was set-up.
        (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
        (if_convertible_stmt_p): Fix up pre-function comments.
        (all_preds_critical_p): New function.
        (if_convertible_bb_p): Use call of all_preds_critical_p
        to reject temporarily block if-conversion with incoming critical edges
        if FLAG_FORCE_VECTORIZE was not set-up.  This restriction will be
        deleted after adding support for extended predication.
        (predicate_bbs): Skip loop exit block also.  Invoke build2_loc
        to compute predicate instead of fold_build2_loc.
        Add zeroing of edge 'aux' field.
        (find_phi_replacement_condition): Extend function interface:
        it returns NULL if given phi node must be handled by means of
        extended phi node predication.  If number of predecessors of phi-block
        is equal 2 and at least one incoming edge is not critical original
        algorithm is used.
        (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
        Nullify 'aux' field of edges for blocks with two successors.

2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:

Richard,

Thanks for your answer!

In the current implementation, phi node conversion assumes that the bb containing the given phi has at least one non-critical incoming edge, and chooses it to insert predicated code. But if we choose a critical edge we need to determine the insertion point and insertion direction (before/after), since otherwise we can get invalid SSA form (use before def). This is done by my new function, which is not in the current patch (I will present this patch later). So I assume that we need to leave this patch as it is to not introduce new bugs.

Thanks.
Yuri.

2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:

On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:

Richard,

I reworked the patch as you proposed, but I didn't understand what you meant by:

So please rework the patch so critical edges are always handled correctly.

In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to reject basic blocks with only critical incoming edges since support for extended predication of phi nodes will be in next patch.

I mean that (2) should not be rejected dependent on flag_force_vectorize. It was rejected because if-cvt couldn't handle it correctly before but with this patch this is fixed. I see no reason to still reject this then even for !flag_force_vectorize.

Rejecting PHIs with more than two arguments with flag_force_vectorize is ok.

Richard.

Could you please clarify your statement.

I attached modified patch.

ChangeLog:

2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

        (flag_force_vectorize): New variable.
        (edge_predicate): New function.
        (set_edge_predicate): New function.
        (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
        if destination block of edge is not always executed.  Set-up predicate
        for critical edge.
        (if_convertible_phi_p): Accept phi nodes with more than two args
        if FLAG_FORCE_VECTORIZE was set-up.
        (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
        (if_convertible_stmt_p): Fix up pre-function comments.
        (all_edges_are_critical): New function.
        (if_convertible_bb_p): Use call of all_preds_critical_p
        to reject block if-conversion with incoming critical edges only if
        FLAG_FORCE_VECTORIZE was not set-up.
        (predicate_bbs): Skip loop exit block also.  Invoke build2_loc
        to compute predicate instead of fold_build2_loc.
        Add zeroing of edge 'aux' field.
        (find_phi_replacement_condition): Extend function interface:
        it returns NULL if given phi node must be handled by means of
        extended phi node predication.  If number of predecessors of phi-block
        is equal 2 and at least one incoming edge is not critical original
        algorithm is used.
        (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE
gnu11 fallout
Tested on m68k-suse-linux, installed as obvious.

Andreas.

        * gcc.dg/bf-spl1.c (main): Fix implicit int.

diff --git a/gcc/testsuite/gcc.dg/bf-spl1.c b/gcc/testsuite/gcc.dg/bf-spl1.c
index b28130d..1cba005 100644
--- a/gcc/testsuite/gcc.dg/bf-spl1.c
+++ b/gcc/testsuite/gcc.dg/bf-spl1.c
@@ -44,6 +44,7 @@ pack_d ()
   x = dst.bits.fraction;
 }
 
+int
 main ()
 {
   pack_d ();
-- 
2.1.2

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
And now for something completely different.
[Patch ARM-AArch64/testsuite v3 00/21] Neon intrinsics executable tests
This patch series is an updated version of the series I sent here:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html

I addressed comments from Marcus and Richard, and decided to skip support for half-precision variants for the time being. I'll post dedicated patches later.

Compared to v2:
- the directory containing the new tests is named gcc.target/aarch64/adv-simd instead of gcc.target/aarch64/neon-intrinsics.
- the driver is named adv-simd.exp instead of neon-intrinsics.exp
- the driver is guarded against the new test parallelization framework
- the README file uses 'Advanced SIMD (Neon)' instead of 'Neon'

Christophe Lyon (21):
  Advanced SIMD (Neon) intrinsics execution tests initial framework.
    vaba, vld1 and vshl tests.
  Add unary operators: vabs and vneg.
  Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.
  Add comparison operators: vceq, vcge, vcgt, vcle and vclt.
  Add comparison operators with floating-point operands: vcage, vcagt,
    vcale and vcalt.
  Add unary saturating operators: vqabs and vqneg.
  Add binary saturating operators: vqadd, vqsub.
  Add vabal tests.
  Add vabd tests.
  Add vabdl tests.
  Add vaddhn tests.
  Add vaddl tests.
  Add vaddw tests.
  Add vbsl tests.
  Add vclz tests.
  Add vdup and vmov tests.
  Add vld1_dup tests.
  Add vld2/vld3/vld4 tests.
  Add vld2_lane, vld3_lane and vld4_lane tests.
  Add vmul tests.
  Add vuzp and vzip tests.
[Patch ARM-AArch64/testsuite v3 02/21] Add unary operators: vabs and vneg.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/unary_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vneg.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc new file mode 100644 index 000..33f9b5f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc @@ -0,0 +1,72 @@ +/* Template file for unary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other unary + operators, these can be tested by providing a definition for + EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=OP(x), then store the result. */ +#define TEST_UNARY_OP1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define TEST_UNARY_OP(INSN, Q, T1, T2, W, N) \ + TEST_UNARY_OP1(INSN, Q, T1, T2, W, N) \ + + /* No need for 64 bits variants in the general case. 
*/ + DECL_VARIABLE(vector, int, 8, 8); + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, int, 8, 16); + DECL_VARIABLE(vector, int, 16, 8); + DECL_VARIABLE(vector, int, 32, 4); + + DECL_VARIABLE(vector_res, int, 8, 8); + DECL_VARIABLE(vector_res, int, 16, 4); + DECL_VARIABLE(vector_res, int, 32, 2); + DECL_VARIABLE(vector_res, int, 8, 16); + DECL_VARIABLE(vector_res, int, 16, 8); + DECL_VARIABLE(vector_res, int, 32, 4); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, , int, s, 8, 8); + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector, buffer, q, int, s, 8, 16); + VLOAD(vector, buffer, q, int, s, 16, 8); + VLOAD(vector, buffer, q, int, s, 32, 4); + + /* Apply a unary operator named INSN_NAME. */ + TEST_UNARY_OP(INSN_NAME, , int, s, 8, 8); + TEST_UNARY_OP(INSN_NAME, , int, s, 16, 4); + TEST_UNARY_OP(INSN_NAME, , int, s, 32, 2); + TEST_UNARY_OP(INSN_NAME, q, int, s, 8, 16); + TEST_UNARY_OP(INSN_NAME, q, int, s, 16, 8); + TEST_UNARY_OP(INSN_NAME, q, int, s, 32, 4); + + CHECK_RESULTS (TEST_MSG, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME)(); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c new file mode 100644 index 000..ca3901a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c @@ -0,0 +1,74 @@ +#define INSN_NAME vabs +#define TEST_MSG VABS/VABSQ + +/* Extra tests for functions requiring floating-point types. */ +void exec_vabs_f32(void); +#define EXTRA_TESTS exec_vabs_f32 + +#include unary_op.inc + +/* Expected results. 
*/ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x10, 0xf, 0xe, 0xd, + 0xc, 0xb, 0xa, 0x9 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x10, 0xf, 0xe, 0xd }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x10, 0xf }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9, + 0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x10, 0xf, 0xe, 0xd, + 0xc, 0xb, 0xa, 0x9 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33,
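The template relies on the test file defining INSN_NAME before including it, and on the two-level FNNAME/FNNAME1 pair to turn that macro into a function name. The indirection matters because a macro argument that is an operand of `##` is pasted without being expanded first. The following sketch shows the difference; `FNNAME_DIRECT` is a hypothetical one-level variant added here only for contrast, and the function bodies are placeholders.

```cpp
#include <cassert>

#define INSN_NAME vabs

// Same two-level shape as in the template: FNNAME expands its argument
// before FNNAME1 pastes it.
#define FNNAME1(NAME) exec_##NAME
#define FNNAME(NAME) FNNAME1(NAME)

// Hypothetical one-level variant, for contrast only: the argument is an
// operand of ## and is therefore pasted unexpanded.
#define FNNAME_DIRECT(NAME) exec_##NAME

int FNNAME(INSN_NAME)(void) { return 1; }         // defines exec_vabs
int FNNAME_DIRECT(INSN_NAME)(void) { return 2; }  // defines exec_INSN_NAME

void demo_macro_expansion() {
  assert(exec_vabs() == 1);
  assert(exec_INSN_NAME() == 2);
}
```

With only the one-level macro, every test including the template would define the same `exec_INSN_NAME` function regardless of the intrinsic under test.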
[Patch ARM-AArch64/testsuite v3 01/21] Advanced SIMD (Neon) intrinsics execution tests initial framework. vaba, vld1 and vshl tests.
* documentation (README)
* dejagnu driver (advsimd-intrinsics.exp)
* support macros (arm-neon-ref.h, compute-ref-data.h)
* Tests for 3 intrinsics: vaba, vld1, vshl

2014-10-21  Christophe Lyon  christophe.l...@linaro.org

        * gcc.target/arm/README.advsimd-intrinsics: New file.
        * gcc.target/aarch64/advsimd-intrinsics/README: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/vaba.c: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise.
        * gcc.target/aarch64/advsimd-intrinsics/vshl.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README
new file mode 100644
index 000..52c374c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README
@@ -0,0 +1,132 @@
+This directory contains executable tests for ARM/AArch64 Advanced SIMD
+(Neon) intrinsics.
+
+It is meant to cover execution cases of all the Advanced SIMD
+intrinsics, but does not scan the generated assembler code.
+
+The general framework is composed as follows:
+- advsimd-intrinsics.exp: main dejagnu driver
+- *.c: actual tests, generally one per intrinsinc family
+- arm-neon-ref.h: contains macro definitions to save typing in actual
+  test files
+- compute-ref-data.h: contains input vectors definitions
+- *.inc: generic tests, shared by several families of intrinsics.  For
+  instance, unary or binary operators
+
+A typical .c test file starts with the following contents (look at
+vld1.c and vaba.c for sample cases):
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+Then, definitions of expected results, based on common input values,
+as defined in compute-ref-data.h.
+For example:
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
+defines the expected results of an operator generating int16x4 values.
+
+The common input values defined in compute-ref-data.h have been chosen
+to avoid corner-case values for most operators, yet exposing negative
+values for signed operators.  For this reason, their range is also
+limited.  For instance, the initialization of buffer_int16x4 will be
+{ -16, -15, -14, -13 }.
+
+The initialization of floating-point values is done via hex notation,
+to avoid potential rounding problems.
+
+To test special values and corner cases, specific initialization
+values should be used in dedicated tests, to ensure proper coverage.
+An example of this is vshl.
+
+When a variant of an intrinsic is not available, its expected result
+should be defined to the value of CLEAN_PATTERN_8 as defined in
+arm-neon-ref.h.  For example:
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+if the given intrinsic has no variant producing an int64x1 result,
+like the vcmp family (eg. vclt).
+
+This is because the helper function (check_results(), defined in
+arm-neon-ref.h), iterates over all the possible variants, to save
+typing in each individual test file.  Alternatively, one can directly
+call the CHECK/CHECK_FP macros to check only a few expected results
+(see vabs.c for an example).
+
+Then, define the TEST_MSG string, which will be used when reporting errors.
+
+Next, define the function performing the actual tests, in general
+relying on the helpers provided by arm-neon-ref.h, which means:
+
+* declare necessary vectors of suitable types: using
+  DECL_VARIABLE_ALL_VARIANTS when all variants are supported, or the
+  relevant of subset calls to DECL_VARIABLE.
+
+* call clean_results() to initialize the 'results' buffers.
+
+* initialize the input vectors, using VLOAD, VDUP or VSET_LANE (vld*
+  tests do not need this step, since their actual purpose is to
+  initialize vectors).
+
+* execute the intrinsic on relevant variants, for instance using
+  TEST_MACRO_ALL_VARIANTS_2_5.
+
+* call check_results() to check that the results match the expected
+  values.
+
+A template test file could be:
+=
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
+                                       0xfa, 0xfb, 0xfc, 0xfd };
+/* and as many others as necessary.  */
+
+#define TEST_MSG "VMYINTRINSIC"
+void exec_myintrinsic (void)
+{
+  /* my test: v4=vmyintrinsic(v1,v2,v3), then store the result.  */
+#define TEST_VMYINTR(Q, T1, T2, W, N) \
+  VECT_VAR(vector_res, T1, W, N) = \
+    vmyintr##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+                         VECT_VAR(vector2, T1, W, N), \
+
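The README's naming convention (for instance `buffer_int16x4`, declared through `VECT_VAR_DECL(..., int, 16, 4)`) can be mimicked with token pasting. The macros below are a hypothetical re-creation written for this note and prefixed with `SKETCH_`; the real definitions live in arm-neon-ref.h and may differ. A scalar loop stands in for the intrinsic so that the example builds without arm_neon.h. The input values are the ones the README quotes for `buffer_int16x4`, and the expected values are the int16x4 row from vabs.c.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical re-creation of the naming scheme; the real macros are in
// arm-neon-ref.h and may be defined differently.
#define SKETCH_VECT_VAR(V, T, W, N) V##_##T##W##x##N
#define SKETCH_VECT_VAR_DECL(V, T, W, N) T##W##_t SKETCH_VECT_VAR(V, T, W, N)

// Expands to: int16_t buffer_int16x4[] = { ... };
SKETCH_VECT_VAR_DECL(buffer, int, 16, 4)[] = { -16, -15, -14, -13 };
// Expands to: int16_t expected_int16x4[] = { ... };
SKETCH_VECT_VAR_DECL(expected, int, 16, 4)[] = { 0x10, 0xf, 0xe, 0xd };
// Expands to: int16_t result_int16x4[4];
SKETCH_VECT_VAR_DECL(result, int, 16, 4)[4];

// Scalar stand-in for an absolute-value intrinsic applied lane by lane.
void exec_abs_sketch() {
  for (int lane = 0; lane < 4; ++lane) {
    const int16_t v = SKETCH_VECT_VAR(buffer, int, 16, 4)[lane];
    SKETCH_VECT_VAR(result, int, 16, 4)[lane] =
        static_cast<int16_t>(v < 0 ? -v : v);
  }
}

// Compares every lane of the result with the expected values.
bool check_results_sketch() {
  for (int lane = 0; lane < 4; ++lane)
    if (SKETCH_VECT_VAR(result, int, 16, 4)[lane] !=
        SKETCH_VECT_VAR(expected, int, 16, 4)[lane])
      return false;
  return true;
}
```

Calling `exec_abs_sketch()` followed by `check_results_sketch()` mirrors the "execute, then check" steps listed in the README, reduced to a single variant.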
[Patch ARM-AArch64/testsuite v3 03/21] Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/binary_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vand.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vbic.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/veor.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vorn.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vorr.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vsub.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc new file mode 100644 index 000..3483e0e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc @@ -0,0 +1,70 @@ +/* Template file for binary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other binary + operators, these can be tested by providing a definition for + EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=OP(x1,x2), then store the result. */ +#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N) \ + TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + + DECL_VARIABLE_ALL_VARIANTS(vector); + DECL_VARIABLE_ALL_VARIANTS(vector2); + DECL_VARIABLE_ALL_VARIANTS(vector_res); + + clean_results (); + + /* Initialize input vector from buffer. 
*/ + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer); + + /* Fill input vector2 with arbitrary values. */ + VDUP(vector2, , int, s, 8, 8, 2); + VDUP(vector2, , int, s, 16, 4, -4); + VDUP(vector2, , int, s, 32, 2, 3); + VDUP(vector2, , int, s, 64, 1, 100); + VDUP(vector2, , uint, u, 8, 8, 20); + VDUP(vector2, , uint, u, 16, 4, 30); + VDUP(vector2, , uint, u, 32, 2, 40); + VDUP(vector2, , uint, u, 64, 1, 2); + VDUP(vector2, q, int, s, 8, 16, -10); + VDUP(vector2, q, int, s, 16, 8, -20); + VDUP(vector2, q, int, s, 32, 4, -30); + VDUP(vector2, q, int, s, 64, 2, 24); + VDUP(vector2, q, uint, u, 8, 16, 12); + VDUP(vector2, q, uint, u, 16, 8, 3); + VDUP(vector2, q, uint, u, 32, 4, 55); + VDUP(vector2, q, uint, u, 64, 2, 3); + + /* Apply a binary operator named INSN_NAME. */ + TEST_MACRO_ALL_VARIANTS_1_5(TEST_BINARY_OP, INSN_NAME); + + CHECK_RESULTS (TEST_MSG, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c new file mode 100644 index 000..f08c620 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c @@ -0,0 +1,81 @@ +#define INSN_NAME vadd +#define TEST_MSG VADD/VADDQ + +/* Extra tests for functions requiring floating-point types. */ +void exec_vadd_f32(void); +#define EXTRA_TESTS exec_vadd_f32 + +#include binary_op.inc + +/* Expected results. 
*/ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf3, 0xf4, 0xf5, + 0xf6, 0xf7, 0xf8, 0xf9 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xffec, 0xffed, 0xffee, 0xffef }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff3, 0xfff4 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x54 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x4, 0x5, 0x6, 0x7, + 0x8, 0x9, 0xa, 0xb }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xe, 0xf, 0x10, 0x11 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x18, 0x19 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfff2 }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xe6, 0xe7, 0xe8, 0xe9, + 0xea, 0xeb, 0xec, 0xed, + 0xee, 0xef, 0xf0, 0xf1, + 0xf2, 0xf3, 0xf4, 0xf5 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xffdc, 0xffdd, 0xffde, 0xffdf, +
[Patch ARM-AArch64/testsuite v3 05/21] Add comparison operators with floating-point operands: vcage, vcagt, vcale and vcalt.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vcage.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcagt.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcale.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcalt.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc new file mode 100644 index 000..33451d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc @@ -0,0 +1,75 @@ +/* Template file for the validation of comparison operator with + floating-point support. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other + operators, these can be tested by providing a definition for + EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Additional expected results declaration, they are initialized in + each test file. */ +extern ARRAY(expected2, uint, 32, 2); +extern ARRAY(expected2, uint, 32, 4); + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=vcomp(x1,x2), then store the result. 
*/ +#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \ + VECT_VAR(vector_res, T3, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N)) + +#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N) \ + TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) + + DECL_VARIABLE(vector, float, 32, 2); + DECL_VARIABLE(vector, float, 32, 4); + DECL_VARIABLE(vector2, float, 32, 2); + DECL_VARIABLE(vector2, float, 32, 4); + DECL_VARIABLE(vector_res, uint, 32, 2); + DECL_VARIABLE(vector_res, uint, 32, 4); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, , float, f, 32, 2); + VLOAD(vector, buffer, q, float, f, 32, 4); + + /* Choose init value arbitrarily, will be used for vector + comparison. */ + VDUP(vector2, , float, f, 32, 2, -16.0f); + VDUP(vector2, q, float, f, 32, 4, -14.0f); + + /* Apply operator named INSN_NAME. */ + TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2); + CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, ); + + TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4); + CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, ); + + /* Test again, with different input values. */ + VDUP(vector2, , float, f, 32, 2, -10.0f); + VDUP(vector2, q, float, f, 32, 4, 10.0f); + + TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2); + CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected2, ); + + TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4); + CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected2,); +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c new file mode 100644 index 000..219d03f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c @@ -0,0 +1,52 @@ +#define INSN_NAME vcage +#define TEST_MSG VCAGE/VCAGEQ + +#include cmp_fp_op.inc + +/* Expected results. 
*/ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x333, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x333, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x0 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x333, 0x, 0x, 0x, + 0x333, 0x, 0x, 0x };
[Patch ARM-AArch64/testsuite v3 06/21] Add unary saturating operators: vqabs and vqneg.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vqabs.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vqneg.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc new file mode 100644 index 000..3f6d984 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc @@ -0,0 +1,80 @@ +/* Template file for saturating unary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other + saturating unary operators, these can be tested by providing a + definition for EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* y=OP(x), then store the result. */ +#define TEST_UNARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + Set_Neon_Cumulative_Sat(0); \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \ +vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \ + VECT_VAR(vector_res, T1, W, N)); \ + CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + +#define TEST_UNARY_SAT_OP(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_UNARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + + /* No need for 64 bits variants. 
*/ + DECL_VARIABLE(vector, int, 8, 8); + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, int, 8, 16); + DECL_VARIABLE(vector, int, 16, 8); + DECL_VARIABLE(vector, int, 32, 4); + + DECL_VARIABLE(vector_res, int, 8, 8); + DECL_VARIABLE(vector_res, int, 16, 4); + DECL_VARIABLE(vector_res, int, 32, 2); + DECL_VARIABLE(vector_res, int, 8, 16); + DECL_VARIABLE(vector_res, int, 16, 8); + DECL_VARIABLE(vector_res, int, 32, 4); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, , int, s, 8, 8); + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector, buffer, q, int, s, 8, 16); + VLOAD(vector, buffer, q, int, s, 16, 8); + VLOAD(vector, buffer, q, int, s, 32, 4); + + /* Apply a saturating unary operator named INSN_NAME. */ + TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 8, 8, expected_cumulative_sat, ); + TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 16, 4, expected_cumulative_sat, ); + TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 32, 2, expected_cumulative_sat, ); + TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 8, 16, expected_cumulative_sat, ); + TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 16, 8, expected_cumulative_sat, ); + TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 32, 4, expected_cumulative_sat, ); + + CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, ); + CHECK(TEST_MSG, int, 16, 4, PRIx8, expected, ); + CHECK(TEST_MSG, int, 32, 2, PRIx8, expected, ); + CHECK(TEST_MSG, int, 8, 16, PRIx8, expected, ); + CHECK(TEST_MSG, int, 16, 8, PRIx8, expected, ); + CHECK(TEST_MSG, int, 32, 4, PRIx8, expected, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c new file mode 100644 index 000..f2be790 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c @@ -0,0 +1,127 
@@
+#define INSN_NAME vqabs
+#define TEST_MSG VQABS/VQABSQ
+
+/* Extra tests for functions requiring corner cases tests. */
+void vqabs_extra(void);
+#define EXTRA_TESTS vqabs_extra
+
+#include unary_sat_op.inc
+
+/* Expected results. */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x10, 0xf, 0xe, 0xd };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x10, 0xf };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+                                        0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+                                        0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x,
[Patch ARM-AArch64/testsuite v3 08/21] Add vabal tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vabal.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c new file mode 100644 index 000..cd31062 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c @@ -0,0 +1,161 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff6, 0xfff7, 0xfff8, 0xfff9, + 0xfffa, 0xfffb, 0xfffc, 0xfffd }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x16, 0x17, 0x18, 0x19 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x20, 0x21 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0x53, 0x54, 0x55, 0x56, +0x57, 0x58, 0x59, 0x5a }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x907, 0x908, 0x909, 0x90a }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe7, +0xffe8 }; +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 
0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +/* Expected results for cases with input values chosen to test + possible intermediate overflow. */ +VECT_VAR_DECL(expected2,int,16,8) [] = { 0xef, 0xf0, 0xf1, 0xf2, +0xf3, 0xf4, 0xf5, 0xf6 }; +VECT_VAR_DECL(expected2,int,32,4) [] = { 0xffef, 0xfff0, 0xfff1, 0xfff2 }; +VECT_VAR_DECL(expected2,int,64,2) [] = { 0xffef, 0xfff0 }; +VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xee, 0xef, 0xf0, 0xf1, + 0xf2, 0xf3, 0xf4, 0xf5 }; +VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffe2, 0xffe3, 0xffe4, 0xffe5 }; +VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xffe7, 0xffe8 }; + +#define TEST_MSG VABAL +void exec_vabal (void) +{ + /* Basic test: v4=vabal(v1,v2,v3), then store the result. */ +#define TEST_VABAL(T1, T2, W, W2, N) \ + VECT_VAR(vector_res, T1, W2, N) =\ +vabal_##T2##W(VECT_VAR(vector1, T1, W2, N), \ + VECT_VAR(vector2, T1, W, N), \ + VECT_VAR(vector3, T1, W, N)); \ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N)) + +#define DECL_VABAL_VAR_LONG(VAR) \ + DECL_VARIABLE(VAR, int, 16, 8); \ + DECL_VARIABLE(VAR, int, 32, 4); \ + DECL_VARIABLE(VAR, int, 64, 2); \ + DECL_VARIABLE(VAR, uint, 16, 8); \ + DECL_VARIABLE(VAR, uint, 32, 4); \ + DECL_VARIABLE(VAR, uint, 64, 2) + +#define DECL_VABAL_VAR_SHORT(VAR) \ + DECL_VARIABLE(VAR, int, 8, 8); \ + DECL_VARIABLE(VAR, int, 16, 4); \ + DECL_VARIABLE(VAR, int, 32, 2); \ + DECL_VARIABLE(VAR, uint, 8, 8); \ + DECL_VARIABLE(VAR, uint, 16, 4); \ +
[Patch ARM-AArch64/testsuite v3 04/21] Add comparison operators: vceq, vcge, vcgt, vcle and vclt.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vceq.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcge.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcgt.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vcle.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vclt.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc new file mode 100644 index 000..a09c5f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc @@ -0,0 +1,224 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h +#include math.h + +/* Additional expected results declaration, they are initialized in + each test file. */ +extern ARRAY(expected_uint, uint, 8, 8); +extern ARRAY(expected_uint, uint, 16, 4); +extern ARRAY(expected_uint, uint, 32, 2); +extern ARRAY(expected_q_uint, uint, 8, 16); +extern ARRAY(expected_q_uint, uint, 16, 8); +extern ARRAY(expected_q_uint, uint, 32, 4); +extern ARRAY(expected_float, uint, 32, 2); +extern ARRAY(expected_q_float, uint, 32, 4); +extern ARRAY(expected_uint2, uint, 32, 2); +extern ARRAY(expected_uint3, uint, 32, 2); +extern ARRAY(expected_uint4, uint, 32, 2); +extern ARRAY(expected_nan, uint, 32, 2); +extern ARRAY(expected_mnan, uint, 32, 2); +extern ARRAY(expected_nan2, uint, 32, 2); +extern ARRAY(expected_inf, uint, 32, 2); +extern ARRAY(expected_minf, uint, 32, 2); +extern ARRAY(expected_inf2, uint, 32, 2); +extern ARRAY(expected_mzero, uint, 32, 2); +extern ARRAY(expected_p8, uint, 8, 8); +extern ARRAY(expected_q_p8, uint, 8, 16); + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=vcomp(x1,x2), then store the result. 
*/ +#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \ + VECT_VAR(vector_res, T3, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N)) + +#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N) \ + TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) + + /* No need for 64 bits elements. */ + DECL_VARIABLE(vector, int, 8, 8); + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, uint, 8, 8); + DECL_VARIABLE(vector, uint, 16, 4); + DECL_VARIABLE(vector, uint, 32, 2); + DECL_VARIABLE(vector, float, 32, 2); + DECL_VARIABLE(vector, int, 8, 16); + DECL_VARIABLE(vector, int, 16, 8); + DECL_VARIABLE(vector, int, 32, 4); + DECL_VARIABLE(vector, uint, 8, 16); + DECL_VARIABLE(vector, uint, 16, 8); + DECL_VARIABLE(vector, uint, 32, 4); + DECL_VARIABLE(vector, float, 32, 4); + + DECL_VARIABLE(vector2, int, 8, 8); + DECL_VARIABLE(vector2, int, 16, 4); + DECL_VARIABLE(vector2, int, 32, 2); + DECL_VARIABLE(vector2, uint, 8, 8); + DECL_VARIABLE(vector2, uint, 16, 4); + DECL_VARIABLE(vector2, uint, 32, 2); + DECL_VARIABLE(vector2, float, 32, 2); + DECL_VARIABLE(vector2, int, 8, 16); + DECL_VARIABLE(vector2, int, 16, 8); + DECL_VARIABLE(vector2, int, 32, 4); + DECL_VARIABLE(vector2, uint, 8, 16); + DECL_VARIABLE(vector2, uint, 16, 8); + DECL_VARIABLE(vector2, uint, 32, 4); + DECL_VARIABLE(vector2, float, 32, 4); + + DECL_VARIABLE(vector_res, uint, 8, 8); + DECL_VARIABLE(vector_res, uint, 16, 4); + DECL_VARIABLE(vector_res, uint, 32, 2); + DECL_VARIABLE(vector_res, uint, 8, 16); + DECL_VARIABLE(vector_res, uint, 16, 8); + DECL_VARIABLE(vector_res, uint, 32, 4); + + clean_results (); + + /* There is no 64 bits variant, don't use the generic initializer. 
*/ + VLOAD(vector, buffer, , int, s, 8, 8); + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector, buffer, , uint, u, 8, 8); + VLOAD(vector, buffer, , uint, u, 16, 4); + VLOAD(vector, buffer, , uint, u, 32, 2); + VLOAD(vector, buffer, , float, f, 32, 2); + + VLOAD(vector, buffer, q, int, s, 8, 16); + VLOAD(vector, buffer, q, int, s, 16, 8); + VLOAD(vector, buffer, q, int, s, 32, 4); + VLOAD(vector, buffer, q, uint, u, 8, 16); + VLOAD(vector, buffer, q, uint, u, 16, 8); + VLOAD(vector, buffer, q, uint, u, 32, 4); + VLOAD(vector, buffer, q, float, f, 32, 4); + + /* Choose init value arbitrarily, will be used for vector + comparison. */ + VDUP(vector2, , int, s, 8, 8, -10); + VDUP(vector2, , int, s, 16, 4, -14); + VDUP(vector2, , int, s, 32, 2, -16); + VDUP(vector2, , uint, u, 8, 8, 0xF3); + VDUP(vector2, , uint, u, 16, 4, 0xFFF2); + VDUP(vector2, , uint, u, 32, 2, 0xFFF1); + VDUP(vector2, , float, f, 32, 2, -15.0f); + + VDUP(vector2, q, int, s, 8,
[Patch ARM-AArch64/testsuite v3 07/21] Add binary saturating operators: vqadd, vqsub.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vqadd.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vqsub.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc new file mode 100644 index 000..35d7701 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc @@ -0,0 +1,91 @@ +/* Template file for saturating binary operator validation. + + This file is meant to be included by the relevant test files, which + have to define the intrinsic family to test. If a given intrinsic + supports variants which are not supported by all the other + saturating binary operators, these can be tested by providing a + definition for EXTRA_TESTS. */ + +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* vector_res = OP(vector1,vector2), then store the result. */ + +#define TEST_BINARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + Set_Neon_Cumulative_Sat(0); \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ +vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \ + VECT_VAR(vector_res, T1, W, N)); \ + CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + +#define TEST_BINARY_SAT_OP(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_BINARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + + DECL_VARIABLE_ALL_VARIANTS(vector1); + DECL_VARIABLE_ALL_VARIANTS(vector2); + DECL_VARIABLE_ALL_VARIANTS(vector_res); + + clean_results (); + + /* Initialize input vector1 from buffer. 
*/ + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer); + + /* Choose arbitrary initialization values. */ + VDUP(vector2, , int, s, 8, 8, 0x11); + VDUP(vector2, , int, s, 16, 4, 0x22); + VDUP(vector2, , int, s, 32, 2, 0x33); + VDUP(vector2, , int, s, 64, 1, 0x44); + VDUP(vector2, , uint, u, 8, 8, 0x55); + VDUP(vector2, , uint, u, 16, 4, 0x66); + VDUP(vector2, , uint, u, 32, 2, 0x77); + VDUP(vector2, , uint, u, 64, 1, 0x88); + + VDUP(vector2, q, int, s, 8, 16, 0x11); + VDUP(vector2, q, int, s, 16, 8, 0x22); + VDUP(vector2, q, int, s, 32, 4, 0x33); + VDUP(vector2, q, int, s, 64, 2, 0x44); + VDUP(vector2, q, uint, u, 8, 16, 0x55); + VDUP(vector2, q, uint, u, 16, 8, 0x66); + VDUP(vector2, q, uint, u, 32, 4, 0x77); + VDUP(vector2, q, uint, u, 64, 2, 0x88); + + /* Apply a saturating binary operator named INSN_NAME. */ + TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 8, 8, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 16, 4, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 32, 2, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 64, 1, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 8, 8, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 16, 4, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 32, 2, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 64, 1, expected_cumulative_sat, ); + + TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 8, 16, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 16, 8, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 32, 4, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 64, 2, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 8, 16, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 16, 8, expected_cumulative_sat, ); + TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 32, 4, expected_cumulative_sat, ); + 
TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 64, 2, expected_cumulative_sat, ); + + CHECK_RESULTS (TEST_MSG, ); + +#ifdef EXTRA_TESTS + EXTRA_TESTS(); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c new file mode 100644 index 000..c07f5ff --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c @@ -0,0 +1,278 @@ +#define INSN_NAME vqadd +#define TEST_MSG VQADD/VQADDQ + +/* Extra tests for special cases: + - some requiring intermediate types larger than 64 bits to + compute saturation flag. + - corner case saturations with types smaller than 64 bits. +*/ +void
[Patch ARM-AArch64/testsuite v3 10/21] Add vabdl tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vabdl.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c new file mode 100644 index 000..28018ab --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c @@ -0,0 +1,109 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x11, 0x10, 0xf, 0xe, + 0xd, 0xc, 0xb, 0xa }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x3, 0x2, 0x1, 0x0 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x18, 0x17 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xef, 0xf0, 0xf1, 0xf2, +0xf3, 0xf4, 0xf5, 0xf6 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe8, +0xffe9 }; +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 
0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +#define TEST_MSG VABDL +void exec_vabdl (void) +{ + /* Basic test: v4=vabdl(v1,v2), then store the result. */ +#define TEST_VABDL(T1, T2, W, W2, N) \ + VECT_VAR(vector_res, T1, W2, N) =\ +vabdl_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N)) + +#define DECL_VABDL_VAR_LONG(VAR) \ + DECL_VARIABLE(VAR, int, 16, 8); \ + DECL_VARIABLE(VAR, int, 32, 4); \ + DECL_VARIABLE(VAR, int, 64, 2); \ + DECL_VARIABLE(VAR, uint, 16, 8); \ + DECL_VARIABLE(VAR, uint, 32, 4); \ + DECL_VARIABLE(VAR, uint, 64, 2) + +#define DECL_VABDL_VAR_SHORT(VAR) \ + DECL_VARIABLE(VAR, int, 8, 8); \ + DECL_VARIABLE(VAR, int, 16, 4); \ + DECL_VARIABLE(VAR, int, 32, 2); \ + DECL_VARIABLE(VAR, uint, 8, 8); \ + DECL_VARIABLE(VAR, uint, 16, 4); \ + DECL_VARIABLE(VAR, uint, 32, 2) + + DECL_VABDL_VAR_SHORT(vector1); + DECL_VABDL_VAR_SHORT(vector2); + DECL_VABDL_VAR_LONG(vector_res); + + clean_results (); + + /* Initialize input vector1 from buffer. */ + VLOAD(vector1, buffer, , int, s, 8, 8); + VLOAD(vector1, buffer, , int, s, 16, 4); + VLOAD(vector1, buffer, , int, s, 32, 2); + VLOAD(vector1, buffer, , uint, u, 8, 8); + VLOAD(vector1, buffer, , uint, u, 16, 4); + VLOAD(vector1, buffer, , uint, u, 32, 2); + + /* Choose init value arbitrarily. */ + VDUP(vector2, , int, s, 8, 8, 1); + VDUP(vector2, , int, s, 16, 4, -13); + VDUP(vector2, , int, s, 32, 2, 8); + VDUP(vector2, , uint, u, 8, 8, 1); + VDUP(vector2, , uint, u, 16, 4, 13); + VDUP(vector2, , uint, u, 32, 2, 8); + + /* Execute the
[Patch ARM-AArch64/testsuite v3 16/21] Add vdup and vmov tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c new file mode 100644 index 000..b5132f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c @@ -0,0 +1,253 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* We test vdup and vmov in the same place since they are aliases. */ + +/* Expected results. */ +/* Chunk 0. */ +VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc180 }; +VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0, +0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff0, +0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,64,2) [] = { 0xfff0, +0xfff0 }; +VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 
0xfff0, + 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,32,4) [] = { 0xfff0, 0xfff0, + 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,64,2) [] = { 0xfff0, + 0xfff0 }; +VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0, + 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc180, + 0xc180, 0xc180 }; + +/* Chunk 1. */ +VECT_VAR_DECL(expected1,int,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, + 0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,int,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,32,2) [] = { 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected1,uint,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,uint,32,2) [] = { 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc170, 0xc170 }; +VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,int,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1, +0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,32,4) [] = { 0xfff1, 0xfff1, +
[Patch ARM-AArch64/testsuite v3 12/21] Add vaddl tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vaddl.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c new file mode 100644 index 000..861abec --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c @@ -0,0 +1,122 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3, + 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6, +0xffe7, 0xffe8, 0xffe9, 0xffea }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xffe2, 0xffe3, + 0xffe4, 0xffe5 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0xffe0, + 0xffe1 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0x1e3, 0x1e4, 0x1e5, 0x1e6, +0x1e7, 0x1e8, 0x1e9, 0x1ea }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x1ffe1, 0x1ffe2, +0x1ffe3, 0x1ffe4 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x1ffe0, 0x1ffe1 }; 
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +#ifndef INSN_NAME +#define INSN_NAME vaddl +#define TEST_MSG VADDL +#endif + +#define FNNAME1(NAME) void exec_ ## NAME (void) +#define FNNAME(NAME) FNNAME1(NAME) + +FNNAME (INSN_NAME) +{ + /* Basic test: y=vaddl(x1,x2), then store the result. */ +#define TEST_VADDL1(INSN, T1, T2, W, W2, N)\ + VECT_VAR(vector_res, T1, W2, N) =\ +INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N));\ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N)) + +#define TEST_VADDL(INSN, T1, T2, W, W2, N) \ + TEST_VADDL1(INSN, T1, T2, W, W2, N) + + DECL_VARIABLE(vector, int, 8, 8); + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, uint, 8, 8); + DECL_VARIABLE(vector, uint, 16, 4); + DECL_VARIABLE(vector, uint, 32, 2); + + DECL_VARIABLE(vector2, int, 8, 8); + DECL_VARIABLE(vector2, int, 16, 4); + DECL_VARIABLE(vector2, int, 32, 2); + DECL_VARIABLE(vector2, uint, 8, 8); + DECL_VARIABLE(vector2, uint, 16, 4); + DECL_VARIABLE(vector2, uint, 32, 2); + + DECL_VARIABLE(vector_res, int, 16, 8); + DECL_VARIABLE(vector_res, int, 32, 4); + DECL_VARIABLE(vector_res, int, 64, 2); + DECL_VARIABLE(vector_res, uint, 16, 8); + DECL_VARIABLE(vector_res, uint, 32, 4); + DECL_VARIABLE(vector_res, uint, 64, 2); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, , int, s, 8, 8); + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector, buffer, , uint, u, 8, 8); + VLOAD(vector, buffer, , uint, u, 16, 4); + VLOAD(vector, buffer, , uint, u, 32, 2); + + /*
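As a reading aid for the "y=vaddl(x1,x2)" comment above: vaddl is a widening add, so both operands have W-bit lanes and the result has W2-bit lanes. Below is a rough scalar sketch for the u8 to u16 case. model_vaddl_u8 is a hypothetical helper, not something from the patch. The inputs 0xf0.. and 0xf3 used in the usage note are an assumption: they are simply values that reproduce the 0x1e3..0x1ea entries in the expected uint16x8 table, as the VDUP of vector2 is not visible in the truncated message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vaddl_u8: widen each 8-bit lane to 16 bits,
   then add, so the carry out of bit 7 is kept instead of being dropped.  */
static void model_vaddl_u8(uint16_t *out, const uint8_t *a, const uint8_t *b,
                           size_t lane_count)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < lane_count; i++)
        out[i] = (uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
```

With a = { 0xf0, ..., 0xf7 } and b filled with 0xf3, lane 0 becomes 0x1e3 and lane 7 becomes 0x1ea, which does not fit in 8 bits and shows why the result type is wider than the inputs.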
[Patch ARM-AArch64/testsuite v3 13/21] Add vaddw tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vaddw.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c new file mode 100644 index 000..5804cd7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c @@ -0,0 +1,122 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3, + 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6, +0xffe7, 0xffe8, 0xffe9, 0xffea }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xffe2, 0xffe3, + 0xffe4, 0xffe5 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0xffe0, + 0xffe1 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xe3, 0xe4, 0xe5, 0xe6, +0xe7, 0xe8, 0xe9, 0xea }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffe1, 0xffe2, +0xffe3, 0xffe4 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe0, 0xffe1 }; 
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +#ifndef INSN_NAME +#define INSN_NAME vaddw +#define TEST_MSG VADDW +#endif + +#define FNNAME1(NAME) void exec_ ## NAME (void) +#define FNNAME(NAME) FNNAME1(NAME) + +FNNAME (INSN_NAME) +{ + /* Basic test: y=vaddw(x1,x2), then store the result. */ +#define TEST_VADDW1(INSN, T1, T2, W, W2, N)\ + VECT_VAR(vector_res, T1, W2, N) =\ +INSN##_##T2##W(VECT_VAR(vector, T1, W2, N), \ + VECT_VAR(vector2, T1, W, N));\ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N)) + +#define TEST_VADDW(INSN, T1, T2, W, W2, N) \ + TEST_VADDW1(INSN, T1, T2, W, W2, N) + + DECL_VARIABLE(vector, int, 16, 8); + DECL_VARIABLE(vector, int, 32, 4); + DECL_VARIABLE(vector, int, 64, 2); + DECL_VARIABLE(vector, uint, 16, 8); + DECL_VARIABLE(vector, uint, 32, 4); + DECL_VARIABLE(vector, uint, 64, 2); + + DECL_VARIABLE(vector2, int, 8, 8); + DECL_VARIABLE(vector2, int, 16, 4); + DECL_VARIABLE(vector2, int, 32, 2); + DECL_VARIABLE(vector2, uint, 8, 8); + DECL_VARIABLE(vector2, uint, 16, 4); + DECL_VARIABLE(vector2, uint, 32, 2); + + DECL_VARIABLE(vector_res, int, 16, 8); + DECL_VARIABLE(vector_res, int, 32, 4); + DECL_VARIABLE(vector_res, int, 64, 2); + DECL_VARIABLE(vector_res, uint, 16, 8); + DECL_VARIABLE(vector_res, uint, 32, 4); + DECL_VARIABLE(vector_res, uint, 64, 2); + + clean_results (); + + /* Initialize input vector from buffer. */ + VLOAD(vector, buffer, q, int, s, 16, 8); + VLOAD(vector, buffer, q, int, s, 32, 4); + VLOAD(vector, buffer, q, int, s, 64, 2); + VLOAD(vector, buffer, q, uint, u, 16, 8); + VLOAD(vector, buffer, q, uint, u, 32, 4); + VLOAD(vector, buffer, q, uint, u, 64, 2); + +
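Compared with vaddl, the TEST_VADDW1 macro above passes a W2-wide first operand and a W-wide second operand, i.e. only the second input is widened before the add. Below is a minimal scalar sketch for the u16 + u8 case. model_vaddw_u8 is a hypothetical name, and the input values in the usage note (0xfff0.. for the wide vector, 0xf3 for the narrow one) are an assumption chosen to match the 0xe3..0xea entries in the expected uint16x8 table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vaddw_u8: the first operand already has
   16-bit lanes, the second has 8-bit lanes and is widened before the add.
   The sum is reduced modulo 2^16, as the result lanes are 16 bits wide.  */
static void model_vaddw_u8(uint16_t *out, const uint16_t *wide,
                           const uint8_t *narrow, size_t lane_count)
{
    assert(out != NULL && wide != NULL && narrow != NULL);
    for (size_t i = 0; i < lane_count; i++)
        out[i] = (uint16_t)(wide[i] + narrow[i]);
}
```

With wide[0] = 0xfff0 and narrow[0] = 0xf3 the exact sum is 0x100e3, and the stored lane is 0xe3 because the result wraps at 16 bits.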
[Patch ARM-AArch64/testsuite v3 19/21] Add vld2_lane, vld3_lane and vld4_lane tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c new file mode 100644 index 000..1991033 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c @@ -0,0 +1,610 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ + +/* vld2/chunk 0. */ +VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa }; +VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa }; +VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0x, 0x, + 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa }; +VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0x, 0x, + 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170 }; +VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0x, 0x, 0x, 0x, + 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0x, 0x, + 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0x, 0x, 0x, 0x, + 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0xfff0, 0xfff1, 
+ 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0x, 0x, 0x, 0x, + 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +/* vld2/chunk 1. */ +VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xf0, 0xf1 }; +VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff0, 0xfff1, 0x, 0x }; +VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf0, 0xf1, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa }; +VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0x, 0x, 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa }; +VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0x, 0x, 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, +
[Patch ARM-AArch64/testsuite v3 14/21] Add vbsl tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vbsl.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c new file mode 100644 index 000..bb17f0a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c @@ -0,0 +1,124 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf2, 0xf2, 0xf2, + 0xf6, 0xf6, 0xf6, 0xf6 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffd }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3, + 0xf7, 0xf7, 0xf7, 0xf7 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3, + 0xf7, 0xf7, 0xf7, 0xf7 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc184, 0xc174 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2, + 0xf6, 0xf6, 0xf6, 0xf6, + 0xf2, 0xf2, 0xf2, 0xf2, + 0xf6, 0xf6, 0xf6, 0xf6 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2, + 0xfff4, 0xfff4, 0xfff6, 0xfff6 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfff0, 0xfff0, + 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffd, + 0xfffd }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3, +0xf7, 0xf7, 0xf7, 0xf7, +0xf3, 0xf3, 0xf3, 0xf3, +0xf7, 0xf7, 0xf7, 0xf7 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2, +0xfff4, 0xfff4, 0xfff6, 0xfff6 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfff0, 0xfff0, +0xfff2, 0xfff2 }; 
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfff1, +0xfff1 }; +VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3, +0xf7, 0xf7, 0xf7, 0xf7, +0xf3, 0xf3, 0xf3, 0xf3, +0xf7, 0xf7, 0xf7, 0xf7 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2, +0xfff4, 0xfff4, 0xfff6, 0xfff6 }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc181, 0xc171, + 0xc161, 0xc151 }; + +#define TEST_MSG VBSL/VBSLQ +void exec_vbsl (void) +{ + /* Basic test: y=vbsl(unsigned_vec,x1,x2), then store the result. */ +#define TEST_VBSL(T3, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +vbsl##Q##_##T2##W(VECT_VAR(vector_first, T3, W, N), \ + VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + + DECL_VARIABLE_ALL_VARIANTS(vector); + DECL_VARIABLE_ALL_VARIANTS(vector2); + DECL_VARIABLE_ALL_VARIANTS(vector_res); + + DECL_VARIABLE_UNSIGNED_VARIANTS(vector_first); + + clean_results (); + + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer); + VLOAD(vector, buffer, , float, f, 32, 2); + VLOAD(vector, buffer, q, float, f, 32, 4); + + /* Choose init value arbitrarily, will be used for vector + comparison. As we want different values for each type variant, we + can't use generic initialization macros. */ + VDUP(vector2, , int, s, 8, 8, -10); + VDUP(vector2, , int, s, 16, 4, -14); + VDUP(vector2, , int, s, 32, 2, -30); + VDUP(vector2, , int, s, 64, 1, -33); + VDUP(vector2, , uint, u, 8, 8, 0xF3); + VDUP(vector2, , uint, u, 16, 4, 0xFFF2); + VDUP(vector2, , uint, u, 32, 2, 0xFFF0); + VDUP(vector2, , uint, u, 64, 1, 0xFFF3); + VDUP(vector2, , float, f, 32, 2, -30.3f); + VDUP(vector2, , poly, p, 8, 8, 0xF3); + VDUP(vector2, , poly, p, 16, 4, 0xFFF2); + + VDUP(vector2, q, int, s, 8, 16, -10); + VDUP(vector2, q, int, s, 16, 8, -14); +
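For the "y=vbsl(unsigned_vec,x1,x2)" comment above, it may help to spell out the bitwise select: each result bit comes from x1 where the mask bit is 1 and from x2 where it is 0. Below is a small scalar sketch for 8-bit lanes. model_vbsl_u8 is a hypothetical helper, and the sample values in the usage note are arbitrary, not taken from the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vbsl_u8: per bit, take x1 where the mask
   bit is set and x2 where it is clear.  */
static void model_vbsl_u8(uint8_t *out, const uint8_t *mask,
                          const uint8_t *x1, const uint8_t *x2,
                          size_t lane_count)
{
    assert(out != NULL && mask != NULL && x1 != NULL && x2 != NULL);
    for (size_t i = 0; i < lane_count; i++)
        out[i] = (uint8_t)((mask[i] & x1[i]) | (~mask[i] & x2[i]));
}
```

For example, mask 0xf0 with x1 = 0xab and x2 = 0xcd gives 0xad: the high nibble comes from x1 and the low nibble from x2. This is also why the mask operand is always an unsigned vector, whatever the type of x1 and x2.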
[Patch ARM-AArch64/testsuite v3 17/21] Add vld1_dup tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c new file mode 100644 index 000..0e05274 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c @@ -0,0 +1,180 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +/* Chunk 0. */ +VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc180 }; +VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0, +0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0, +0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff0, +0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,int,64,2) [] = { 0xfff0, +0xfff0 }; +VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0, + 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,32,4) 
[] = { 0xfff0, 0xfff0, + 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,uint,64,2) [] = { 0xfff0, + 0xfff0 }; +VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0, + 0xf0, 0xf0, 0xf0, 0xf0 }; +VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0, + 0xfff0, 0xfff0, 0xfff0, 0xfff0 }; +VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc180, + 0xc180, 0xc180 }; + +/* Chunk 1. */ +VECT_VAR_DECL(expected1,int,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, + 0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,int,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,32,2) [] = { 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected1,uint,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,uint,32,2) [] = { 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc170, 0xc170 }; +VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1, +0xf1, 0xf1, 0xf1, 0xf1 }; +VECT_VAR_DECL(expected1,int,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1, +0xfff1, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,32,4) [] = { 0xfff1, 0xfff1, +0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected1,int,64,2) [] = { 0xfff1,
[Patch ARM-AArch64/testsuite v3 20/21] Add vmul tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vmul.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c new file mode 100644 index 000..7527861 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c @@ -0,0 +1,156 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x1, 0x12, 0x23, + 0x34, 0x45, 0x56, 0x67 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfde0, 0xfe02, 0xfe24, 0xfe46 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfcd0, 0xfd03 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xc0, 0x4, 0x48, 0x8c, + 0xd0, 0x14, 0x58, 0x9c }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xf9a0, 0xfa06 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0xc0, 0x84, 0x48, 0xc, + 0xd0, 0x94, 0x58, 0x1c }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc405, 0xc3f9c000 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x90, 0x7, 0x7e, 0xf5, + 0x6c, 0xe3, 0x5a, 0xd1, + 0x48, 0xbf, 0x36, 0xad, + 0x24, 0x9b, 0x12, 0x89 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xf780, 0xf808, 0xf890, 0xf918, + 0xf9a0, 0xfa28, 0xfab0, 0xfb38 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xf670, 0xf709, + 0xf7a2, 0xf83b }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x60, 0xa, 0xb4, 0x5e, +0x8, 0xb2, 0x5c, 0x6, +0xb0, 0x5a, 0x4, 0xae, +0x58, 0x2, 0xac, 0x56 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf450, 0xf50b, 0xf5c6, 0xf681, +0xf73c, 0xf7f7, 0xf8b2, 0xf96d }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xf340, 0xf40c, +0xf4d8, 0xf5a4 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 
0x, +0x }; +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 0x9e, +0xc8, 0x62, 0x9c, 0x36, +0x30, 0x9a, 0x64, 0xce, +0x98, 0x32, 0xcc, 0x66 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c7, 0xc4bac000, + 0xc4ae4ccd, 0xc4a1d999 }; + +#ifndef INSN_NAME +#define INSN_NAME vmul +#define TEST_MSG VMUL +#endif + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ +#define DECL_VMUL(T, W, N) \ + DECL_VARIABLE(vector1, T, W, N); \ + DECL_VARIABLE(vector2, T, W, N); \ + DECL_VARIABLE(vector_res, T, W, N) + + /* vector_res = OP(vector1, vector2), then store the result. */ +#define TEST_VMUL1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \ + VECT_VAR(vector_res, T1, W, N)) + +#define TEST_VMUL(INSN, Q, T1, T2, W, N) \ + TEST_VMUL1(INSN, Q, T1, T2, W, N) + + DECL_VMUL(int, 8, 8); + DECL_VMUL(int, 16, 4); + DECL_VMUL(int, 32, 2); + DECL_VMUL(uint, 8, 8); + DECL_VMUL(uint, 16, 4); + DECL_VMUL(uint, 32, 2); + DECL_VMUL(poly, 8, 8); + DECL_VMUL(float, 32, 2); + DECL_VMUL(int, 8, 16); + DECL_VMUL(int, 16, 8); + DECL_VMUL(int, 32, 4); + DECL_VMUL(uint, 8, 16); + DECL_VMUL(uint, 16, 8); + DECL_VMUL(uint, 32, 4); + DECL_VMUL(poly, 8, 16); + DECL_VMUL(float, 32, 4); + + clean_results (); + + /* Initialize input vector1 from buffer. */ + VLOAD(vector1, buffer, , int, s, 8, 8); + VLOAD(vector1, buffer, , int, s, 16, 4); + VLOAD(vector1, buffer, , int, s, 32, 2); + VLOAD(vector1, buffer, , uint, u, 8, 8); + VLOAD(vector1, buffer, , uint,
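A note on the FNNAME1/FNNAME pair used above (and in the vaddl/vaddw tests): the extra level of indirection is there so that INSN_NAME is macro-expanded before it is pasted onto exec_. Below is a small self-contained sketch of the effect. PASTE_DIRECT, STR1, STR and the two *_name helpers are hypothetical names added for illustration only, and the return value 42 is arbitrary.

```c
#include <assert.h>
#include <string.h>

#define INSN_NAME vmul

/* One level only: NAME is an operand of ##, so it is pasted unexpanded.  */
#define PASTE_DIRECT(NAME) exec_ ## NAME

/* Two levels, as in the patch: FNNAME expands its argument first, then
   FNNAME1 does the pasting on the expanded token.  */
#define FNNAME1(NAME) exec_ ## NAME
#define FNNAME(NAME) FNNAME1(NAME)

/* Stringify helpers (illustration only) to make the results visible.  */
#define STR1(X) #X
#define STR(X) STR1(X)

static const char *direct_name(void)   { return STR(PASTE_DIRECT(INSN_NAME)); }
static const char *indirect_name(void) { return STR(FNNAME(INSN_NAME)); }

/* Defines a function named exec_vmul, the same way the test does.  */
static int FNNAME (INSN_NAME) (void)
{
    assert(strcmp(indirect_name(), "exec_vmul") == 0);
    return 42;
}
```

Here direct_name() yields "exec_INSN_NAME" while indirect_name() yields "exec_vmul", so a file that defines INSN_NAME before including the common body gets its own exec_ function name.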
[Patch ARM-AArch64/testsuite v3 15/21] Add vclz tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vclz.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c new file mode 100644 index 000..ad28d2d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c @@ -0,0 +1,194 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x11, 0x11 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x5, 0x5 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, + 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, +0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xd, 0xd, 0xd, 0xd, +0xd, 0xd, 0xd, 0xd }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x1f, 0x1f, 0x1f, 0x1f }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x, +0x }; +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; 
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + + +/* Expected results with input=0. */ +VECT_VAR_DECL(expected_with_0,int,8,8) [] = { 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8 }; +VECT_VAR_DECL(expected_with_0,int,16,4) [] = { 0x10, 0x10, 0x10, 0x10 }; +VECT_VAR_DECL(expected_with_0,int,32,2) [] = { 0x20, 0x20 }; +VECT_VAR_DECL(expected_with_0,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_with_0,uint,8,8) [] = { 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8 }; +VECT_VAR_DECL(expected_with_0,uint,16,4) [] = { 0x10, 0x10, 0x10, 0x10 }; +VECT_VAR_DECL(expected_with_0,uint,32,2) [] = { 0x20, 0x20 }; +VECT_VAR_DECL(expected_with_0,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected_with_0,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected_with_0,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected_with_0,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected_with_0,int,8,16) [] = { 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8 }; +VECT_VAR_DECL(expected_with_0,int,16,8) [] = { 0x10, 0x10, 0x10, 0x10, + 0x10, 0x10, 0x10, 0x10 }; +VECT_VAR_DECL(expected_with_0,int,32,4) [] = { 0x20, 0x20, 0x20, 0x20 }; +VECT_VAR_DECL(expected_with_0,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected_with_0,uint,8,16) [] = { 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8, + 0x8, 0x8, 0x8, 0x8 }; +VECT_VAR_DECL(expected_with_0,uint,16,8) [] = { 0x10, 0x10, 0x10, 0x10, + 0x10, 0x10, 0x10, 0x10 }; +VECT_VAR_DECL(expected_with_0,uint,32,4) [] = { 0x20, 0x20, 0x20, 0x20 }; +VECT_VAR_DECL(expected_with_0,uint,64,2) [] = {
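The expected_with_0 tables above (0x8, 0x10 and 0x20 for 8-, 16- and 32-bit lanes) follow from the usual definition of count-leading-zeros: for a zero input the count equals the lane width. Below is a minimal scalar sketch. model_clz is a hypothetical helper that takes the lane width as a parameter, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one vclz lane: count zero bits starting
   from the most significant bit of a lane that is `width` bits wide.
   A zero input yields `width`.  */
static unsigned model_clz(uint64_t value, unsigned width)
{
    assert(width >= 1 && width <= 64);
    unsigned count = 0;
    for (unsigned bit = width; bit-- > 0; ) {
        if ((value >> bit) & 1u)
            break;          /* first set bit found, stop counting */
        count++;
    }
    return count;
}
```

For instance model_clz(0, 16) is 16 and model_clz(1, 32) is 31. The latter matches the 0x1f entries in the expected uint32x4 table if the corresponding input was 1, which is an assumption since the input setup is not visible in the truncated message.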
[Patch ARM-AArch64/testsuite v3 18/21] Add vld2/vld3/vld4 tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vldX.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c new file mode 100644 index 000..fe00640 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c @@ -0,0 +1,692 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results. */ + +/* vld2/chunk 0. */ +VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfff0 }; +VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170 }; +VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3, + 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0xfff0, 
0xfff1, 0xfff2, 0xfff3, + 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3, + 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0xc180, 0xc170, + 0xc160, 0xc150 }; + +/* vld2/chunk 1. */ +VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0xfff1 }; +VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc160, 0xc150 }; +VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x0, 0x1, 0x2, 0x3, + 0x4, 0x5, 0x6, 0x7, + 0x8, 0x9, 0xa, 0xb, + 0xc, 0xd, 0xe, 0xf
[Patch ARM-AArch64/testsuite v3 21/21] Add vuzp and vzip tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: New file. * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c new file mode 100644 index 000..53f875e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c @@ -0,0 +1,245 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +/* Expected results split into several chunks. */ +/* Chunk 0. */ +VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff1, +0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff1 }; +VECT_VAR_DECL(expected0,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, +0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0, + 0xfff1 }; +VECT_VAR_DECL(expected0,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, +0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc170 }; +VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, +0xf4, 0xf5, 0xf6, 0xf7, +0xf8, 0xf9, 0xfa, 0xfb, +0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff1, +0xfff2, 0xfff3, +0xfff4, 0xfff5, +0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff1, +0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected0,int,64,2) [] = { 0x, +0x }; +VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3, + 0xfff4, 
0xfff5, + 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected0,uint,32,4) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected0,uint,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1, + 0xfff2, 0xfff3, + 0xfff4, 0xfff5, + 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc170, + 0xc160, 0xc150 }; + +/* Chunk 1. */ +VECT_VAR_DECL(expected1,int,8,8) [] = { 0x11, 0x11, 0x11, 0x11, + 0x11, 0x11, 0x11, 0x11 }; +VECT_VAR_DECL(expected1,int,16,4) [] = { 0x22, 0x22, 0x22, 0x22 }; +VECT_VAR_DECL(expected1,int,32,2) [] = { 0x33, 0x33 }; +VECT_VAR_DECL(expected1,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected1,uint,8,8) [] = { 0x55, 0x55, 0x55, 0x55, +0x55, 0x55, 0x55, 0x55 }; +VECT_VAR_DECL(expected1,uint,16,4) [] = { 0x66, 0x66, 0x66, 0x66 }; +VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 }; +VECT_VAR_DECL(expected1,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected1,poly,8,8) [] = { 0x55, 0x55, 0x55, 0x55, +0x55, 0x55, 0x55, 0x55 }; +VECT_VAR_DECL(expected1,poly,16,4) [] = { 0x66, 0x66, 0x66, 0x66 }; +VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x4206, 0x4206 }; +VECT_VAR_DECL(expected1,int,8,16) [] = { 0x11, 0x11, 0x11, 0x11, +0x11, 0x11, 0x11, 0x11, +
[Patch ARM-AArch64/testsuite v3 11/21] Add vaddhn tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vaddhn.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c new file mode 100644 index 000..74b4b4d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c @@ -0,0 +1,109 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h + +#if defined(__cplusplus) +#include cstdint +#else +#include stdint.h +#endif + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x32, 0x32, 0x32, 0x32, + 0x32, 0x32, 0x32, 0x32 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x32, 0x32, 0x32, 0x32 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x18, 0x18 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3, + 0x3, 0x3, 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x, 0x, + 0x, 0x }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x, 0x, +0x, 0x }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x, +0x }; +VECT_VAR_DECL(expected,poly,8,16) 
[] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x, + 0x, 0x }; + +#ifndef INSN_NAME +#define INSN_NAME vaddhn +#define TEST_MSG VADDHN +#endif + +#define FNNAME1(NAME) void exec_ ## NAME (void) +#define FNNAME(NAME) FNNAME1(NAME) + +FNNAME (INSN_NAME) +{ + /* Basic test: vec64=vaddhn(vec128_a, vec128_b), then store the result. */ +#define TEST_VADDHN1(INSN, T1, T2, W, W2, N) \ + VECT_VAR(vector64, T1, W2, N) = INSN##_##T2##W(VECT_VAR(vector1, T1, W, N), \ +VECT_VAR(vector2, T1, W, N)); \ + vst1_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector64, T1, W2, N)) + +#define TEST_VADDHN(INSN, T1, T2, W, W2, N)\ + TEST_VADDHN1(INSN, T1, T2, W, W2, N) + + DECL_VARIABLE_64BITS_VARIANTS(vector64); + DECL_VARIABLE_128BITS_VARIANTS(vector1); + DECL_VARIABLE_128BITS_VARIANTS(vector2); + + clean_results (); + + /* Fill input vector1 and vector2 with arbitrary values */ + VDUP(vector1, q, int, s, 16, 8, 50*(UINT8_MAX+1)); + VDUP(vector1, q, int, s, 32, 4, 50*(UINT16_MAX+1)); + VDUP(vector1, q, int, s, 64, 2, 24*((uint64_t)UINT32_MAX+1)); + VDUP(vector1, q, uint, u, 16, 8, 3*(UINT8_MAX+1)); + VDUP(vector1, q, uint, u, 32, 4, 55*(UINT16_MAX+1)); + VDUP(vector1, q, uint, u, 64, 2, 3*((uint64_t)UINT32_MAX+1)); + + VDUP(vector2, q, int, s, 16, 8, (uint16_t)UINT8_MAX); + VDUP(vector2, q, int, s, 32, 4, (uint32_t)UINT16_MAX); + VDUP(vector2, q, int, s, 64, 2, (uint64_t)UINT32_MAX); + VDUP(vector2, q, uint, u, 16, 8, (uint16_t)UINT8_MAX); + VDUP(vector2, q, uint, u, 32, 4, (uint32_t)UINT16_MAX); + VDUP(vector2, q, uint, u, 64, 2, (uint64_t)UINT32_MAX); + + TEST_VADDHN(INSN_NAME,
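Editorial note: the expected values at the top of this test follow from the "add, keep the high half, narrow" behaviour of vaddhn applied to the VDUP inputs above. The sketch below is a scalar, one-lane model of that behaviour, not code from the patch; the names addhn_u16, addhn_u32 and addhn_selfcheck are hypothetical, and only the unsigned 16-bit and 32-bit lane widths are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* One-lane model: add two 16-bit values (wrapping modulo 2^16) and keep
   only the high 8 bits of the sum.  Hypothetical helper, not an intrinsic.  */
uint8_t
addhn_u16 (uint16_t a, uint16_t b)
{
  uint16_t sum = (uint16_t) (a + b);
  return (uint8_t) (sum >> 8);
}

/* Same idea for 32-bit lanes narrowed to 16 bits.  */
uint16_t
addhn_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return (uint16_t) (sum >> 16);
}

/* Check the model against inputs used in the test:
   vector1 = K * (UINT8_MAX + 1), vector2 = UINT8_MAX, and the 32-bit
   equivalents.  */
void
addhn_selfcheck (void)
{
  assert (addhn_u16 (50 * (UINT8_MAX + 1), UINT8_MAX) == 0x32);
  assert (addhn_u16 (3 * (UINT8_MAX + 1), UINT8_MAX) == 0x3);
  assert (addhn_u32 (55u * (UINT16_MAX + 1u), UINT16_MAX) == 0x37);
}
```

Adding UINT8_MAX fills the low half with ones without carrying into the high half, so the narrowed result is just the multiplier K.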
[Patch ARM-AArch64/testsuite v3 09/21] Add vabd tests.
2014-10-21 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vabd.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c new file mode 100644 index 000..e95404f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c @@ -0,0 +1,153 @@ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h +#include math.h + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0x11, 0x10, 0xf, 0xe, + 0xd, 0xc, 0xb, 0xa }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0x3, 0x2, 0x1, 0x0 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x18, 0x17 }; +VECT_VAR_DECL(expected,int,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xef, 0xf0, 0xf1, 0xf2, + 0xf3, 0xf4, 0xf5, 0xf6 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffe8, 0xffe9 }; +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x }; +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33, + 0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x41c2, 0x41ba }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0x1a, 0x19, 0x18, 0x17, + 0x16, 0x15, 0x14, 0x13, + 0x12, 0x11, 0x10, 0xf, + 0xe, 0xd, 0xc, 0xb }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0x4, 0x3, 0x2, 0x1, + 0x0, 0x1, 0x2, 0x3 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0x30, 0x2f, 0x2e, 0x2d }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x, + 0x }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xe6, 0xe7, 0xe8, 0xe9, +0xea, 0xeb, 0xec, 0xed, +0xee, 0xef, 0xf0, 0xf1, +0xf2, 0xf3, 0xf4, 0xf5 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xffe4, 0xffe5, 0xffe6, 0xffe7, +0xffe8, 0xffe9, 0xffea, 0xffeb }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffd0, 0xffd1, +0xffd2, 0xffd3 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x, +0x }; 
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33, +0x33, 0x33, 0x33, 0x33 }; +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x, +0x, 0x, 0x, 0x }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x42407ae1, 0x423c7ae1, + 0x42387ae1, 0x42347ae1 }; + +/* Additional expected results for float32 variants with specially + chosen input values. */ +VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 }; + +#define TEST_MSG VABD/VABDQ +void exec_vabd (void) +{ + /* Basic test: v4=vabd(v1,v2), then store the result. */ +#define TEST_VABD(Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +vabd##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define DECL_VABD_VAR(VAR) \ + DECL_VARIABLE(VAR, int, 8, 8); \ + DECL_VARIABLE(VAR, int, 16, 4); \ + DECL_VARIABLE(VAR, int, 32, 2); \ + DECL_VARIABLE(VAR, uint, 8, 8); \ + DECL_VARIABLE(VAR, uint, 16, 4); \ + DECL_VARIABLE(VAR, uint, 32, 2); \ + DECL_VARIABLE(VAR, float, 32, 2);\ + DECL_VARIABLE(VAR, int, 8, 16); \ + DECL_VARIABLE(VAR, int, 16, 8); \ + DECL_VARIABLE(VAR, int, 32, 4); \ + DECL_VARIABLE(VAR, uint, 8, 16); \ + DECL_VARIABLE(VAR, uint, 16, 8); \ + DECL_VARIABLE(VAR, uint, 32, 4); \ + DECL_VARIABLE(VAR, float, 32, 4) + + DECL_VABD_VAR(vector1); + DECL_VABD_VAR(vector2); + DECL_VABD_VAR(vector_res); + + clean_results (); + + /* Initialize input vector1 from buffer. */ + VLOAD(vector1, buffer, , int, s, 8, 8); + VLOAD(vector1, buffer, , int, s, 16, 4); + VLOAD(vector1, buffer, , int, s, 32, 2); + VLOAD(vector1, buffer, , uint, u, 8, 8); + VLOAD(vector1, buffer, ,
Re: [libstdc++ PATCH] More Fundamentals v1 variable templates
On 21/10/14 07:19 +0300, Ville Voutilainen wrote:
No, no, no! Contributors must be punished, otherwise they will never learn! ;) Revised patch attached.

Thanks, committed to trunk.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 1ee8ddc..c797246 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -644,7 +644,10 @@ experimental_headers = \
  ${experimental_srcdir}/string_view \
  ${experimental_srcdir}/string_view.tcc \
  ${experimental_srcdir}/tuple \
- ${experimental_srcdir}/type_traits
+ ${experimental_srcdir}/type_traits \
+ ${experimental_srcdir}/ratio \
+ ${experimental_srcdir}/chrono \
+ ${experimental_srcdir}/system_error \

I rearranged these files to keep them in alphabetical order (and removed the trailing backslash).
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard,

Yes, this patch does not make sense in isolation, since phi node predication for a bb with only critical incoming edges is performed by another function, which is absent from this patch (predicate_extended_scalar_phi).

BTW, I see that commit_edge_insertions () is used for rtx instructions only, but you propose to use it for trees as well. Did I miss something?

Thanks in advance.

2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4).

But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block). Richard.

Could you please review it. Thanks.

ChangeLog:
2014-10-21 Yuri Rumyantsev ysrum...@gmail.com
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_preds_critical_p): New function.
(if_convertible_bb_p): Use call of all_preds_critical_p to reject temporarily block if-conversion with incoming critical edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted after adding support for extended predication.
(predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and at least one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, Thanks for your answer! In current implementation phi node conversion assume that one of incoming edge to bb containing given phi has at least one non-critical edge and choose it to insert predicated code. But if we choose critical edge we need to determine insert point and insertion direction (before/after) since in other case we can get invalid ssa form (use before def). This is done by my new function which is not in current patch ( I will present this patch later). SO I assume that we need to leave this patch as it is to not introduce new bugs. Thanks. Yuri. 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I reworked the patch as you proposed, but I didn't understand what did you mean by: So please rework the patch so critical edges are always handled correctly. In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to reject basic blocks with only critical incoming edges since support for extended predication of phi nodes will be in next patch. I mean that (2) should not be rejected dependent on flag_force_vectorize. It was rejected because if-cvt couldn't handle it correctly before but with this patch this is fixed. I see no reason to still reject this then even for !flag_force_vectorize. 
Rejecting PHIs with more than two arguments with flag_force_vectorize is ok. Richard. Could you please clarify your statement. I attached modified patch. ChangeLog: 2014-10-17 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Use call of all_preds_critical_p to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to
[PATCHv2] Don't expand string/memory builtins if ASan is enabled.
Hi, this is the second version of the patch. Here are the major changes from the previous one:

1) Added a new intercepted_p parameter in get_mem_refs_of_builtin_call to decide whether a builtin function should or shouldn't be instrumented.
2) Changed the instrument_mem_region_access function. Now we update asan_mem_ref_ht with (base, size_in_bytes) if we can determine the access size at compile time.
3) Removed ASAN_CHECK_START_INSTRUMENTED and ASAN_CHECK_END_INSTRUMENTED from asan_check_flags, since we no longer instrument the base and end of a memory region with access size 1.
4) Explicitly specified the builtins that shouldn't be expanded in gcc/builtins.c.

Regtested / bootstrapped on x86_64-unknown-linux-gnu.

-Maxim

On 10/17/2014 05:03 PM, Jakub Jelinek wrote:
On Fri, Oct 17, 2014 at 05:01:33PM +0400, Yury Gribov wrote:
On 10/17/2014 04:24 PM, Jakub Jelinek wrote:
+/* Returns TRUE if given FCODE corresponds to string or memory builtin function.
+ */
+
+static inline bool
+is_memory_builtin (enum built_in_function fcode)
+{
+  return fcode >= BUILT_IN_STRSTR && fcode <= BUILT_IN_BCMP;

This is too fragile and ugly. IMHO you should list (supposedly not in a special inline, but directly where you use it) in a switch all the builtins you don't want to expand.

We already do this for BUILT_IN_ASAN_REPORT_LOAD1 ... BUILT_IN_ASAN_STOREN

I know, but it is still a coherent set of builtins for very similar purposes, many of them sorted by increasing size number.

but I agree that this one is more ugly.

The memops builtins are just random bag of them, it is expected many people will add builtins into that range and outside of that range.

Jakub

gcc/ChangeLog:
2014-10-21 Max Ostapenko m.ostape...@partner.samsung.com
* asan.c (asan_mem_ref_hasher::hash): Remove MEM_REF access size from hash value construction. Call iterative_hash_expr instead of explicit hash building.
(asan_mem_ref_hasher::equal): Change condition.
(has_mem_ref_been_instrumented): Likewise.
(update_mem_ref_hash_table): Likewise.
(maybe_update_mem_ref_hash_table): New function. (instrument_strlen_call): Removed. (get_mem_refs_of_builtin_call): Handle new parameter. (instrument_builtin_call): Call maybe_update_mem_ref_hash_table instead of instrument_mem_region_access if intercepted_p is true. (instrument_mem_region_access): Instrument only base with len instead of base and end with 1. (build_check_stmt): Remove start_instrumented and end_instrumented parameters. (enum asan_check_flags): Remove ASAN_CHECK_START_INSTRUMENTED and ASAN_CHECK_END_INSTRUMENTED. Change ASAN_CHECK_LAST. (asan_expand_check_ifn): Remove start_instrumented and end_instrumented. * builtins.c (expand_builtin): Don't expand string/memory builtin functions that have interceptors in libsanitizer if ASan is enabled. gcc/testsuite/ChangeLog: 2014-10-21 Max Ostapenko m.ostape...@partner.samsung.com * c-c++-common/asan/no-redundant-instrumentation-1.c: Updated test. * c-c++-common/asan/no-redundant-instrumentation-4.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-5.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-6.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-7.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-8.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-2.c: Removed. * c-c++-common/asan/no-redundant-instrumentation-9.c: Likewise. * c-c++-common/asan/no-redundant-instrumentation-10.c: New test. * c-c++-common/asan/no-redundant-instrumentation-11.c: Likewise. 
diff --git a/gcc/asan.c b/gcc/asan.c index 2a61a82..a9eb9aa 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -253,9 +253,7 @@ enum asan_check_flags ASAN_CHECK_STORE = 1 0, ASAN_CHECK_SCALAR_ACCESS = 1 1, ASAN_CHECK_NON_ZERO_LEN = 1 2, - ASAN_CHECK_START_INSTRUMENTED = 1 3, - ASAN_CHECK_END_INSTRUMENTED = 1 4, - ASAN_CHECK_LAST + ASAN_CHECK_LAST = 1 3 }; /* Hashtable support for memory references used by gimple @@ -352,10 +350,7 @@ struct asan_mem_ref_hasher inline hashval_t asan_mem_ref_hasher::hash (const asan_mem_ref *mem_ref) { - inchash::hash hstate; - inchash::add_expr (mem_ref-start, hstate); - hstate.add_wide_int (mem_ref-access_size); - return hstate.end (); + return iterative_hash_expr (mem_ref-start, 0); } /* Compare two memory references. We accept the length of either @@ -365,8 +360,7 @@ inline bool asan_mem_ref_hasher::equal (const asan_mem_ref *m1, const asan_mem_ref *m2) { - return (m1-access_size == m2-access_size - operand_equal_p (m1-start, m2-start, 0)); + return operand_equal_p (m1-start, m2-start, 0); } static hash_tableasan_mem_ref_hasher *asan_mem_ref_ht; @@ -417,7 +411,8 @@ has_mem_ref_been_instrumented (tree ref, HOST_WIDE_INT access_size) asan_mem_ref r; asan_mem_ref_init (r, ref, access_size); - return (get_mem_ref_hash_table ()-find (r) != NULL); + asan_mem_ref *saved_ref = get_mem_ref_hash_table ()-find (r); +
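Editorial note: the review comment quoted above asks for an explicit switch over the builtins that must not be expanded, rather than a range test on the enumeration. The sketch below shows that shape in isolation. It is a toy under stated assumptions: enum demo_builtin and demo_keep_as_call_p are hypothetical stand-ins, not GCC's enum built_in_function or anything in builtins.c, and the choice of which codes count as "intercepted" is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a builtin enumeration.  The string/memory
   codes are deliberately not contiguous, which is the situation in which
   a range test such as "first <= code && code <= last" becomes fragile.  */
enum demo_builtin
{
  DEMO_BUILT_IN_MEMCPY,
  DEMO_BUILT_IN_SQRT,
  DEMO_BUILT_IN_STRLEN,
  DEMO_BUILT_IN_ABS,
  DEMO_BUILT_IN_MEMSET
};

/* Return true if FCODE should be left as a library call (so that a
   run-time interceptor can see it) when ASAN_ENABLED is set.  Every code
   is listed explicitly; adding a new enumerator elsewhere cannot change
   the answer by accident.  */
bool
demo_keep_as_call_p (enum demo_builtin fcode, bool asan_enabled)
{
  if (!asan_enabled)
    return false;

  switch (fcode)
    {
    case DEMO_BUILT_IN_MEMCPY:
    case DEMO_BUILT_IN_MEMSET:
    case DEMO_BUILT_IN_STRLEN:
      return true;
    default:
      return false;
    }
}

/* Small self-check of the behaviour described above.  */
void
demo_keep_as_call_selfcheck (void)
{
  assert (demo_keep_as_call_p (DEMO_BUILT_IN_STRLEN, true));
  assert (!demo_keep_as_call_p (DEMO_BUILT_IN_SQRT, true));
  assert (!demo_keep_as_call_p (DEMO_BUILT_IN_MEMCPY, false));
}
```

The explicit list costs a few more lines than a range comparison, but it does not depend on the order in which enumerators happen to be declared.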
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi). BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see gsi_commit_edge_inserts implementation). Richard. Thanks ahead. 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4). But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block). Richard. Could you please review it. Thanks. ChangeLog: 2014-10-21 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_preds_critical_p): New function. 
(if_convertible_bb_p): Use call of all_preds_critical_p to reject temporarily block if-conversion with incoming critical edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted after adding support for extended predication. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and at least one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, Thanks for your answer! In current implementation phi node conversion assume that one of incoming edge to bb containing given phi has at least one non-critical edge and choose it to insert predicated code. But if we choose critical edge we need to determine insert point and insertion direction (before/after) since in other case we can get invalid ssa form (use before def). This is done by my new function which is not in current patch ( I will present this patch later). SO I assume that we need to leave this patch as it is to not introduce new bugs. Thanks. Yuri. 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I reworked the patch as you proposed, but I didn't understand what did you mean by: So please rework the patch so critical edges are always handled correctly. In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to reject basic blocks with only critical incoming edges since support for extended predication of phi nodes will be in next patch. 
I mean that (2) should not be rejected dependent on flag_force_vectorize. It was rejected because if-cvt couldn't handle it correctly before but with this patch this is fixed. I see no reason to still reject this then even for !flag_force_vectorize. Rejecting PHIs with more than two arguments with flag_force_vectorize is ok. Richard. Could you please clarify your statement. I attached modified patch. ChangeLog: 2014-10-17 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up
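Editorial note: the thread turns on critical edges, so a small model may help readers who do not have the definition at hand. An edge is critical when its source block has more than one successor and its destination block has more than one predecessor; code that must run only when that edge is taken fits in neither block, which is why the reply suggests inserting on the edge and committing the insertion so that a new block is created. The sketch below is a toy model of that idea only. struct toy_edge, toy_edge_critical_p and toy_split_edge are hypothetical names and do not use or imitate the GCC CFG API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy edge: only the successor count of the source block and the
   predecessor count of the destination block matter for this check.  */
struct toy_edge
{
  int src_succs;
  int dest_preds;
};

/* An edge is critical when the source has several successors and the
   destination has several predecessors.  */
bool
toy_edge_critical_p (const struct toy_edge *e)
{
  return e->src_succs > 1 && e->dest_preds > 1;
}

/* Model splitting E by placing a new block on it.  IN describes the edge
   from the old source to the new block, OUT the edge from the new block
   to the old destination.  */
void
toy_split_edge (const struct toy_edge *e, struct toy_edge *in,
                struct toy_edge *out)
{
  /* The new block has exactly one predecessor ...  */
  in->src_succs = e->src_succs;
  in->dest_preds = 1;
  /* ... and exactly one successor.  */
  out->src_succs = 1;
  out->dest_preds = e->dest_preds;
}

/* After splitting, neither resulting edge is critical, so the new block
   is a safe home for code tied to the original edge.  */
void
toy_split_selfcheck (void)
{
  struct toy_edge e = { 2, 2 }, in, out;
  assert (toy_edge_critical_p (&e));
  toy_split_edge (&e, &in, &out);
  assert (!toy_edge_critical_p (&in));
  assert (!toy_edge_critical_p (&out));
}
```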
Re: Add polymorphic call context propagation to ipa-prop
Hi,

On Thu, Oct 02, 2014 at 09:00:12AM +0200, Jan Hubicka wrote:
Hi, this patch makes ipa-prop use the ipa-polymorphic-call-context infrastructure for forward propagation (in a very minimal and simple way). At the moment only static type info is propagated and it is used only speculatively, because I will need to update the dynamic type change code to deal with a more general setting than in the old binfo propagation code. I want to do it incrementally. The basic problem is that the old binfo code was mostly built around the idea that all bases are primary bases and one class may contain another only as a base (not as a field).

Martin, I get out of range ICEs in controlled uses code and thus I added an extra check, see FIXME below. Could you, please, help me to fix that correctly?

Is there a simple testcase? And sorry for not reviewing this in time but I've only recently noticed that...

The patch also does not add the necessary propagation into ipa-cp and thus devirtualization happens only during inlining and is not appropriately hinted.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* ipa-prop.h (ipa_get_controlled_uses): Add hack to avoid ICE when speculation is added.
(ipa_edge_args): Add polymorphic_call_contexts.
(ipa_get_ith_polymorhic_call_context): New accessor.
(ipa_make_edge_direct_to_target): Add SPECULATIVE parameter.
(ipa_print_node_jump_functions_for_edge): Print contexts.
(ipa_compute_jump_functions_for_edge): Compute contexts.
(update_jump_functions_after_inlining): Update contexts.
(ipa_make_edge_direct_to_target): Add SPECULATIVE argument; update dumping; add speculative edge creation.
(try_make_edge_direct_virtual_call): Add CTX_PTR parameter; handle context updating.
(update_indirect_edges_after_inlining): Pass down context.
(ipa_edge_duplication_hook): Duplicate contexts.
(ipa_write_node_info): Stream out contexts.
(ipa_read_node_info): Stream in contexts.
* ipa-devirt.c (type_all_derivations_known_p): Avoid ICE on non-ODR types.
(try_speculative_devirtualization): New function. * ipa-utils.h (try_speculative_devirtualization): Declare. Index: ipa-prop.h === --- ipa-prop.h(revision 215792) +++ ipa-prop.h(working copy) @@ -432,7 +432,10 @@ ipa_set_param_used (struct ipa_node_para static inline int ipa_get_controlled_uses (struct ipa_node_params *info, int i) { - return info-descriptors[i].controlled_uses; + /* FIXME: introducing speuclation causes out of bounds access here. */ + if (info-descriptors.length () (unsigned)i) +return info-descriptors[i].controlled_uses; + return IPA_UNDESCRIBED_USE; } /* Set the controlled counter of a given parameter. */ @@ -479,6 +482,7 @@ struct GTY(()) ipa_edge_args { /* Vector of the callsite's jump function of each parameter. */ Index: ipa-prop.c === --- ipa-prop.c(revision 215792) +++ ipa-prop.c(working copy) @@ -2608,11 +2625,15 @@ update_jump_functions_after_inlining (st for (i = 0; i count; i++) { struct ipa_jump_func *dst = ipa_get_ith_jump_func (args, i); + struct ipa_polymorphic_call_context *dst_ctx + = ipa_get_ith_polymorhic_call_context (args, i); if (dst-type == IPA_JF_ANCESTOR) { struct ipa_jump_func *src; int dst_fid = dst-value.ancestor.formal_id; + struct ipa_polymorphic_call_context *src_ctx + = ipa_get_ith_polymorhic_call_context (top, dst_fid); This should be moved down below the check that there is not a mismatch between number of formal parameters and actual arguments, to the same place where we initialize src. /* Variable number of arguments can cause havoc if we try to access one that does not exist in the inlined edge. So make sure we @@ -2625,6 +2646,22 @@ update_jump_functions_after_inlining (st src = ipa_get_ith_jump_func (top, dst_fid); + if (src_ctx !src_ctx-useless_p ()) + { + struct ipa_polymorphic_call_context ctx = *src_ctx; + + /* TODO: Make type preserved safe WRT contexts. 
*/ + if (!dst-value.ancestor.agg_preserved) + ctx.make_speculative (); + ctx.offset_by (dst-value.ancestor.offset); + if (!ctx.useless_p ()) + { + vec_safe_grow_cleared (args-polymorphic_call_contexts, + count); + dst_ctx = ipa_get_ith_polymorhic_call_context (args, i); + } + } I believe that dst_ctx-combine_with (ctx) is missing here? Thanks, Martin
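Editorial note: the FIXME hunk quoted earlier in this message guards an indexed read and falls back to a sentinel when the index is out of range (the comparison operator in that hunk was lost in this archive copy). The sketch below shows the general pattern in plain C, assuming the intended condition is "index is within the current length". All names here (toy_param_descr, toy_node_params, TOY_UNDESCRIBED_USE, toy_get_controlled_uses) are hypothetical and the sentinel value is chosen for the example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "nothing is known about this parameter".  */
#define TOY_UNDESCRIBED_USE (-1)

struct toy_param_descr
{
  int controlled_uses;
};

struct toy_node_params
{
  const struct toy_param_descr *descriptors;
  size_t length;
};

/* Return the controlled-uses count of parameter I, or the sentinel when I
   does not name an existing descriptor.  The bounds check happens before
   the array is touched.  */
int
toy_get_controlled_uses (const struct toy_node_params *info, int i)
{
  if (i >= 0 && (size_t) i < info->length)
    return info->descriptors[i].controlled_uses;
  return TOY_UNDESCRIBED_USE;
}

/* Self-check: in-range reads return the stored value, out-of-range reads
   return the sentinel instead of reading past the array.  */
void
toy_controlled_uses_selfcheck (void)
{
  static const struct toy_param_descr d[] = { { 3 }, { 0 } };
  struct toy_node_params info = { d, 2 };
  assert (toy_get_controlled_uses (&info, 0) == 3);
  assert (toy_get_controlled_uses (&info, 2) == TOY_UNDESCRIBED_USE);
}
```

The same ordering concern is behind the review comment about src_ctx: a lookup keyed by an index is only safe once the index has been validated.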
[PATCH][ARM] Update target testcases for gnu11
This patch updates the ARM testcases for the recent gnu11 change. OK for trunk? Thanks.

gcc/testsuite/
* gcc.target/arm/20031108-1.c: Add explicit declaration.
* gcc.target/arm/cold-lc.c: Likewise.
* gcc.target/arm/neon-modes-2.c: Likewise.
* gcc.target/arm/pr43920-2.c: Likewise.
* gcc.target/arm/pr44788.c: Likewise.
* gcc.target/arm/pr55642.c: Likewise.
* gcc.target/arm/pr58784.c: Likewise.
* gcc.target/arm/pr60650.c: Likewise.
* gcc.target/arm/pr60650-2.c: Likewise.
* gcc.target/arm/vfp-ldmdbs.c: Likewise.
* gcc.target/arm/vfp-ldmias.c: Likewise.
* lib/target-supports.exp: Likewise.
* gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration.

diff --git a/gcc/testsuite/gcc.target/arm/20031108-1.c b/gcc/testsuite/gcc.target/arm/20031108-1.c index d9b6006..7923e11 100644 --- a/gcc/testsuite/gcc.target/arm/20031108-1.c +++ b/gcc/testsuite/gcc.target/arm/20031108-1.c @@ -20,6 +20,9 @@ typedef struct record Rec_Pointer Ptr_Glob; +extern int Proc_7 (int, int, int *); + +void Proc_1 (Ptr_Val_Par) Rec_Pointer Ptr_Val_Par; {
diff --git a/gcc/testsuite/gcc.target/arm/cold-lc.c b/gcc/testsuite/gcc.target/arm/cold-lc.c index 295c29f..467a696 100644 --- a/gcc/testsuite/gcc.target/arm/cold-lc.c +++ b/gcc/testsuite/gcc.target/arm/cold-lc.c @@ -7,6 +7,7 @@ struct thread_info { struct task_struct *task; }; extern struct thread_info *current_thread_info (void); +extern int show_stack (struct task_struct *, unsigned long *); void dump_stack (void) {
diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-2.c b/gcc/testsuite/gcc.target/arm/neon-modes-2.c index 40f1bba..16319bb 100644 --- a/gcc/testsuite/gcc.target/arm/neon-modes-2.c +++ b/gcc/testsuite/gcc.target/arm/neon-modes-2.c @@ -11,6 +11,8 @@ #define MANY(A) A (0), A (1), A (2), A (3), A (4), A (5) +extern void foo (int *, int *); + void bar (uint32_t *ptr, int y) {
diff --git a/gcc/testsuite/gcc.target/arm/pr43920-2.c b/gcc/testsuite/gcc.target/arm/pr43920-2.c index f647165..f5e8f48 100644 ---
a/gcc/testsuite/gcc.target/arm/pr43920-2.c +++ b/gcc/testsuite/gcc.target/arm/pr43920-2.c @@ -4,6 +4,8 @@ #include stdio.h +extern int lseek(int, long, int); + int getFileStartAndLength (int fd, int *start_, size_t *length_) { int start, end; diff --git a/gcc/testsuite/gcc.target/arm/pr44788.c b/gcc/testsuite/gcc.target/arm/pr44788.c index eb4bc11..9ce44a8 100644 --- a/gcc/testsuite/gcc.target/arm/pr44788.c +++ b/gcc/testsuite/gcc.target/arm/pr44788.c @@ -2,6 +2,8 @@ /* { dg-require-effective-target arm_thumb2_ok } */ /* { dg-options -Os -fno-strict-aliasing -fPIC -mthumb -march=armv7-a -mfpu=vfp3 -mfloat-abi=softfp } */ +extern void foo (float *); + void joint_decode(float* mlt_buffer1, int t) { int i; float decode_buffer[1060]; diff --git a/gcc/testsuite/gcc.target/arm/pr51968.c b/gcc/testsuite/gcc.target/arm/pr51968.c index f0506c2..6cf802b 100644 --- a/gcc/testsuite/gcc.target/arm/pr51968.c +++ b/gcc/testsuite/gcc.target/arm/pr51968.c @@ -1,6 +1,6 @@ /* PR target/51968 */ /* { dg-do compile } */ -/* { dg-options -O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon } */ +/* { dg-options -O2 -Wno-implicit-function-declaration -march=armv7-a -mfloat-abi=softfp -mfpu=neon } */ /* { dg-require-effective-target arm_neon_ok } */ typedef __builtin_neon_qi int8x8_t __attribute__ ((__vector_size__ (8))); diff --git a/gcc/testsuite/gcc.target/arm/pr55642.c b/gcc/testsuite/gcc.target/arm/pr55642.c index 10f2daa..a7defa7 100644 --- a/gcc/testsuite/gcc.target/arm/pr55642.c +++ b/gcc/testsuite/gcc.target/arm/pr55642.c @@ -2,6 +2,8 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb2_ok } */ +extern int abs (int); + int foo (int v) { diff --git a/gcc/testsuite/gcc.target/arm/pr58784.c b/gcc/testsuite/gcc.target/arm/pr58784.c index 9a1fcff..4ee3ef5 100644 --- a/gcc/testsuite/gcc.target/arm/pr58784.c +++ b/gcc/testsuite/gcc.target/arm/pr58784.c @@ -11,6 +11,9 @@ typedef struct __attribute__ ((__packed__)) char stepsRemoved; ptp_tlv_t tlv[1]; } 
ptp_message_announce_t; + +extern void f (ptp_message_announce_t *); + int ptplib_send_announce(int sequenceId, int i) { ptp_message_announce_t tx_packet; diff --git a/gcc/testsuite/gcc.target/arm/pr60650-2.c b/gcc/testsuite/gcc.target/arm/pr60650-2.c index 1946760..c8d4615 100644 --- a/gcc/testsuite/gcc.target/arm/pr60650-2.c +++ b/gcc/testsuite/gcc.target/arm/pr60650-2.c @@ -4,17 +4,19 @@ int a, h, j; long long d, e, i; int f; +int fn1 (void *p1, int p2) { switch (p2) case 8: { -register b = *(long long *) p1, c asm (r2); +register int b = *(long long *) p1, c asm (r2); asm (%0: =r (a), =r (c):r (b), r (0)); *(long long *) p1 = c; } } +int fn2 () { int k; @@ -27,8 +29,8 @@ fn2 () case 0: ( { -register l asm (r4); -register m asm (r0); +register int l asm (r4); +register int m asm (r0);
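For readers unfamiliar with the gnu11 fallout, here is a minimal sketch of the two kinds of fix this patch applies: an explicit prototype before the first call, and a spelled-out `int` where the old code relied on implicit int. The names `scale_by_two` and `fn1_sketch` are hypothetical and only loosely modelled on pr60650-2.c; they are not taken from the testsuite.

```c
/* Sketch only: scale_by_two and fn1_sketch are illustrative names,
   not functions from the GCC testsuite.  */

/* Fix 1: declare the callee explicitly.  With -std=gnu11 a call to an
   undeclared function draws -Wimplicit-function-declaration, which a
   dg test then reports as an excess diagnostic.  */
extern int scale_by_two (int);

/* Fix 2: give the function an explicit return type instead of
   relying on implicit int.  */
int
fn1_sketch (long long p1, int p2)
{
  /* "register b = ..." relied on implicit int; spell the type out.  */
  register int b = (int) p1;
  return scale_by_two (b) + p2;
}

int
scale_by_two (int x)
{
  return 2 * x;
}
```

Where a test deliberately exercises an undeclared call, the alternative used for pr51968.c is to pass -Wno-implicit-function-declaration in dg-options instead of editing the source.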
[PATCH][AArch64]Update target testcases for gnu11
Update testcases for recent gnu11 changes. ok for trunk? thanks. gcc/testsuite/ * gcc.target/aarch64/pic-constantpool1.c: Add explicit declaration. * gcc.target/aarch64/pic-symrefplus.c: Likewise. * gcc.target/aarch64/reload-valid-spoff.c: Likewise. * gcc.target/aarch64/vect.x: Likewise. * gcc.target/aarch64/vect-ld1r.x: Add return type. * gcc.target/aarch64/vect-fmax-fmin.c: Likewise. * gcc.target/aarch64/vect-fp.c: Likewise. diff --git a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c index 3109d9d..043f1ee 100644 --- a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c +++ b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c @@ -2,10 +2,13 @@ /* { dg-do compile } */ extern int __finite (double __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__)); +extern int __finitef (float __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__)); +extern int __signbit (double __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__)); +extern int __signbitf (float __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__)); int __ecvt_r (value, ndigit, decpt, sign, buf, len) double value; - int ndigit, *decpt, *sign; + int ndigit, *decpt, *sign, len; char *buf; { if ((sizeof (value) == sizeof (float) ? 
__finitef (value) : __finite (value)) value != 0.0) diff --git a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c index f277a52..406568c 100644 --- a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c +++ b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c @@ -34,12 +34,16 @@ struct locale_data values []; }; extern const struct locale_data _nl_C_LC_TIME __attribute__ ((visibility (hidden))); +extern void *memset (void *s, int c, size_t n); +extern size_t strlen (const char *s); +extern int __strncasecmp_l (const char *s1, const char *s2, size_t n, __locale_t locale); char * __strptime_internal (rp, fmt, tmp, statep , locale) const char *rp; const char *fmt; __locale_t locale; void *statep; + int tmp; { struct locale_data *const current = locale-__locales[__LC_TIME]; const char *rp_backup; @@ -124,5 +128,9 @@ __strptime_internal (rp, fmt, tmp, statep , locale) } char * __strptime_l (buf, format, tm , locale) + int buf; + int format; + int tm; + int locale; { } diff --git a/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c b/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c index b44e560..c2b5464 100644 --- a/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c +++ b/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c @@ -17,6 +17,11 @@ struct arpreq }; typedef struct _IO_FILE FILE; extern char *fgets (char *__restrict __s, int __n, FILE *__restrict __stream); +extern void *memset (void *s, int c, size_t n); +extern void *memcpy (void *dest, const void *src, size_t n); +extern int fprintf (FILE *stream, const char *format, ...); +extern char * safe_strncpy (char *dst, const char *src, size_t size); +extern size_t strlen (const char *s); extern struct _IO_FILE *stderr; extern int optind; struct aftype { diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c b/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c index 42600b7..33a9444 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c +++ 
b/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c @@ -8,11 +8,11 @@ extern void abort (void); #include vect-fmaxv-fminv.x #define DEFN_SETV(type) \ - set_vector_##type (pR##type a, type n) \ - { \ - int i; \ - for (i=0; i16; i++) \ - a[i] = n; \ + void set_vector_##type (pR##type a, type n) \ + { \ + int i; \ + for (i=0; i16; i++) \ + a[i] = n; \ } #define DEFN_CHECKV(type) \ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fp.c b/gcc/testsuite/gcc.target/aarch64/vect-fp.c index bcf9d9d..af0c524 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-fp.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-fp.c @@ -8,11 +8,11 @@ extern void abort (void); #define DEFN_SETV(type) \ - set_vector_##type (pR##type a, type n) \ - { \ - int i; \ - for (i=0; i16; i++) \ - a[i] = n; \ + void set_vector_##type (pR##type a, type n) \ + { \ + int i; \ + for (i=0; i16; i++) \ + a[i] = n; \ } #define DEFN_CHECKV(type) \ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x b/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x index 680ce43..db83036 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x +++ b/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x @@ -7,7 +7,7 @@ for (i = 0; i 8 / sizeof (TYPE); i++) \ output[i] = *a; \ } \ - foo_ ## TYPE ## _q (TYPE *a, TYPE *output) \ + void foo_ ## TYPE ## _q (TYPE *a, TYPE *output)
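As an illustration of the "Add return type" entries, the following sketch shows a DEFN_SETV-style macro that generates functions with an explicit `void` return type. It is a simplified assumption of the testsuite macro: the `pR##type` restrict-pointer typedefs are replaced by a plain `type *`, and `all_lanes_equal_float` is a hypothetical checker added only for this example.

```c
/* Sketch only: simplified from the DEFN_SETV pattern in the patch.
   The original uses pR##type typedefs; a plain pointer is assumed here.  */
#define DEFN_SETV(type)                       \
  void set_vector_##type (type *a, type n)    \
  {                                           \
    int i;                                    \
    for (i = 0; i < 16; i++)                  \
      a[i] = n;                               \
  }

DEFN_SETV (float)
DEFN_SETV (double)

/* Hypothetical helper: fill a 16-lane buffer and report whether every
   lane holds the requested value (1) or not (0).  */
int
all_lanes_equal_float (float n)
{
  float buf[16];
  int i;

  set_vector_float (buf, n);
  for (i = 0; i < 16; i++)
    if (buf[i] != n)
      return 0;
  return 1;
}
```

Without the `void`, each expansion would define a function with an implicit `int` return type and no return statement, which is what gnu11 now diagnoses.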
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I should re-design my extensions? Thanks. Yuri. BTW Jeff did initial review of my changes related to predicate computation for join blocks. I presented him updated patch with test-case and some minor changes in patch. But still did not get any feedback on it. Could you please take a look also on it? 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi). BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see gsi_commit_edge_inserts implementation). Richard. Thanks ahead. 
2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4). But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block). Richard. Could you please review it. Thanks. ChangeLog: 2014-10-21 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_preds_critical_p): New function. (if_convertible_bb_p): Use call of all_preds_critical_p to reject temporarily block if-conversion with incoming critical edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted after adding support for extended predication. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. 
If number of predecessors of phi-block is equal 2 and at least one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, Thanks for your answer! In current implementation phi node conversion assume that one of incoming edge to bb containing given phi has at least one non-critical edge and choose it to insert predicated code. But if we choose critical edge we need to determine insert point and insertion direction (before/after) since in other case we can get invalid ssa form (use before def). This is done by my new function which is not in current patch ( I will present this patch later). SO I assume that we need to leave this patch as it is to not introduce new bugs. Thanks. Yuri. 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I reworked the patch as you proposed, but I didn't understand what did you mean by: So please rework the patch so critical edges are always handled correctly. In current patch flag_force_vectorize is used (1) to reject phi nodes with more than 2 arguments; (2) to
[PATCH][match-and-simplify] Re-factor code in fold_stmt_1
This refactors the code I added to fold_stmt to dispatch to pattern-based folding to avoid long lines and make error handling easier (no goto). It also uses the newly introduced gimple_seq_discard to properly discard an unused simplification result. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. This piece will land in trunk with the next merge piece. Richard. 2014-10-21 Richard Biener rguent...@suse.de * gimple-fold.c (replace_stmt_with_simplification): New helper split out from ... (fold_stmt_1): ... here. Discard the simplified sequence if replacement failed. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 216505) +++ gcc/gimple-fold.c (working copy) @@ -2794,6 +2794,121 @@ gimple_fold_call (gimple_stmt_iterator * return changed; } + +/* Worker for fold_stmt_1 dispatch to pattern based folding with + gimple_simplify. + + Replaces *GSI with the simplification result in RCODE and OPS + and the associated statements in *SEQ. Does the replacement + according to INPLACE and returns true if the operation succeeded. */ + +static bool +replace_stmt_with_simplification (gimple_stmt_iterator *gsi, + code_helper rcode, tree *ops, + gimple_seq *seq, bool inplace) +{ + gimple stmt = gsi_stmt (*gsi); + + /* Play safe and do not allow abnormals to be mentioned in + newly created statements. See also maybe_push_res_to_seq. */ + if ((TREE_CODE (ops[0]) == SSA_NAME +SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[0])) + || (ops[1] + TREE_CODE (ops[1]) == SSA_NAME + SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[1])) + || (ops[2] + TREE_CODE (ops[2]) == SSA_NAME + SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[2]))) +return false; + + if (gimple_code (stmt) == GIMPLE_COND) +{ + gcc_assert (rcode.is_tree_code ()); + if (TREE_CODE_CLASS ((enum tree_code)rcode) == tcc_comparison + /* GIMPLE_CONDs condition may not throw. 
*/ + (!flag_exceptions + || !cfun-can_throw_non_call_exceptions + || !operation_could_trap_p (rcode, + FLOAT_TYPE_P (TREE_TYPE (ops[0])), + false, NULL_TREE))) + gimple_cond_set_condition (stmt, rcode, ops[0], ops[1]); + else if (rcode == SSA_NAME) + gimple_cond_set_condition (stmt, NE_EXPR, ops[0], + build_zero_cst (TREE_TYPE (ops[0]))); + else if (rcode == INTEGER_CST) + { + if (integer_zerop (ops[0])) + gimple_cond_make_false (stmt); + else + gimple_cond_make_true (stmt); + } + else if (!inplace) + { + tree res = maybe_push_res_to_seq (rcode, boolean_type_node, + ops, seq); + if (!res) + return false; + gimple_cond_set_condition (stmt, NE_EXPR, res, +build_zero_cst (TREE_TYPE (res))); + } + else + return false; + if (dump_file (dump_flags TDF_DETAILS)) + { + fprintf (dump_file, gimple_simplified to ); + if (!gimple_seq_empty_p (*seq)) + print_gimple_seq (dump_file, *seq, 0, TDF_SLIM); + print_gimple_stmt (dump_file, gsi_stmt (*gsi), +0, TDF_SLIM); + } + gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT); + return true; +} + else if (is_gimple_assign (stmt) + rcode.is_tree_code ()) +{ + if (!inplace + || gimple_num_ops (stmt) = get_gimple_rhs_num_ops (rcode)) + { + maybe_build_generic_op (rcode, + TREE_TYPE (gimple_assign_lhs (stmt)), + ops[0], ops[1], ops[2]); + gimple_assign_set_rhs_with_ops_1 (gsi, rcode, + ops[0], ops[1], ops[2]); + if (dump_file (dump_flags TDF_DETAILS)) + { + fprintf (dump_file, gimple_simplified to ); + if (!gimple_seq_empty_p (*seq)) + print_gimple_seq (dump_file, *seq, 0, TDF_SLIM); + print_gimple_stmt (dump_file, gsi_stmt (*gsi), +0, TDF_SLIM); + } + gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT); + return true; + } +} + else if (!inplace) +{ + if (gimple_has_lhs (stmt)) + { + tree lhs = gimple_get_lhs (stmt); + maybe_push_res_to_seq (rcode, TREE_TYPE (lhs), +ops, seq, lhs); + if (dump_file (dump_flags TDF_DETAILS)) + { + fprintf (dump_file, gimple_simplified to ); + print_gimple_seq (dump_file, *seq, 0, TDF_SLIM); + } + 
gsi_replace_with_seq_vops
Re: [C++ Patch] Add default arguments to cp_parser_unary_expression
OK. Jason
[PATCH][dejagnu] gcc-dg-prune glitch when filtering relocation truncation error
On 19/08/14 17:30, Mike Stump wrote: On Aug 19, 2014, at 6:12 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: So how about this? Ok. Thanks.

It looks like this patch only fixed one invocation path.

Currently, gcc-dg-prune may be invoked directly *or* via ${tool}_check_compile, and gcc-dg-prune is implemented to return ::unsupported::memory full if the input message contains the relocation truncated error pattern.

This return message is OK if gcc-dg-prune is invoked directly, but it is wrong if it is invoked via ${tool}_check_compile, because ${tool}_check_compile has a duplicated check for unsupported testcases later, via ${tool}_check_unsupported_p, which only works on the original output message by matching the relocation truncation keyword. So our early hijack of the error in gcc-dg-prune replaces those keywords with ::unsupported::memory, which confuses the later check.

This patch does the following cleanup:

* Modify the expected output in ${tool}_check_compile. If gcc-dg-prune is invoked, then we expect the ::unsupported:: keyword for an unsupported testcase.
* Remove the duplicated unresolved report in compat.exp. Whenever ${tool}_check_compile returns 0, the issue has been handled already, so there is no need to report a redundant status.

OK for trunk?

gcc/testsuite/
	* lib/compat.exp (compat-run): Remove unresolved.
	* lib/gcc-defs.exp (${tools}_check_compile): Update code logic for unsupported testcase.

diff --git a/gcc/testsuite/lib/compat.exp b/gcc/testsuite/lib/compat.exp
index 7ab85aa..45cf0e0 100644
--- a/gcc/testsuite/lib/compat.exp
+++ b/gcc/testsuite/lib/compat.exp
@@ -134,7 +134,6 @@ proc compat-run { testname objlist dest optall optfile optstr } {
 		     $options]
     if ![${tool}_check_compile "$testcase $testname link" "" \
 	 $dest $comp_output] then {
-	unresolved "$testcase $testname execute $optstr"
 	return
     }
 
diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
index cb93238..d479667 100644
--- a/gcc/testsuite/lib/gcc-defs.exp
+++ b/gcc/testsuite/lib/gcc-defs.exp
@@ -54,12 +54,17 @@ proc ${tool}_check_compile {testcase option objname gcc_output} {
     if { [info proc ${tool}-dg-prune] != "" } {
 	global target_triplet
 	set gcc_output [${tool}-dg-prune $target_triplet $gcc_output]
-    }
-
-    set unsupported_message [${tool}_check_unsupported_p $gcc_output]
-    if { $unsupported_message != "" } {
-	unsupported "$testcase: $unsupported_message"
-	return 0
+	if [string match "*::unsupported::*" $gcc_output] then {
+	    regsub -- "::unsupported::" $gcc_output "" gcc_output
+	    unsupported "$testcase: $gcc_output"
+	    return 0
+	}
+    } else {
+	set unsupported_message [${tool}_check_unsupported_p $gcc_output]
+	if { $unsupported_message != "" } {
+	    unsupported "$testcase: $unsupported_message"
+	    return 0
+	}
     }
 
     # remove any leftover LF/CR to make sure any output is legit
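The control flow of the new check can be modelled outside Tcl. The sketch below is a C rendering of the decision only, written under stated assumptions: `is_unsupported` and `have_prune_hook` are hypothetical names, and the literal "relocation truncated to fit" stands in for whatever pattern ${tool}_check_unsupported_p really matches.

```c
#include <string.h>

/* Marker that the prune hook is described as prepending.  */
#define UNSUPPORTED_MARK "::unsupported::"

/* Sketch only: returns 1 if the compiler output should be reported as
   unsupported, 0 otherwise.  have_prune_hook models the
   "[info proc ${tool}-dg-prune] != """ test.  */
int
is_unsupported (const char *output, int have_prune_hook)
{
  if (have_prune_hook)
    /* The prune hook has already rewritten the raw linker text, so
       only the marker can be relied on.  */
    return strstr (output, UNSUPPORTED_MARK) != NULL;

  /* No prune hook: fall back to matching the raw message (assumed
     pattern, see the lead-in).  */
  return strstr (output, "relocation truncated to fit") != NULL;
}
```

The point of the split is visible in the second branch: once the output has been pruned, searching for the raw linker message can no longer succeed, which is the confusion the patch description refers to.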
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I should re-design my extensions? Well, we have infrastructure for inserting code on edges and you've made critical edges predicated correctly. So why re-invent the wheel? I realize this is very similar to my initial suggestion to simply split critical edges in loops you want to if-convert but delays splitting until it turns out to be necessary (which might be good for the !force_vect case). For edge predicates you simply can emit their computation on the edge, no? Btw, I very originally suggested to rework if-conversion to only record edge predicates - having both block and edge predicates somewhat complicates the code and makes it harder to maintain (thus also the suggestion to simply split critical edges if necessary to make BB predicates work always). Your patches add a lot of code and to me it seems we can avoid doing so much special casing. Richard. Thanks. Yuri. BTW Jeff did initial review of my changes related to predicate computation for join blocks. I presented him updated patch with test-case and some minor changes in patch. But still did not get any feedback on it. Could you please take a look also on it? 
2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi). BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see gsi_commit_edge_inserts implementation). Richard. Thanks ahead. 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4). But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block). Richard. Could you please review it. Thanks. ChangeLog: 2014-10-21 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. 
(all_preds_critical_p): New function. (if_convertible_bb_p): Use call of all_preds_critical_p to reject temporarily block if-conversion with incoming critical edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted after adding support for extended predication. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and at least one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, Thanks for your answer! In current implementation phi node conversion assume
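Since the whole thread turns on critical edges, a small self-contained sketch may help. An edge is critical when its source block has more than one successor and its destination block has more than one predecessor. The types and names below (`struct block`, `struct cfg_edge`, `every_pred_critical`) are a toy model made up for this example; they are not GCC's basic_block or edge types, and this is not the patch's all_preds_critical_p.

```c
/* Toy CFG model, hypothetical types only.  */
struct block { int n_preds; int n_succs; };
struct cfg_edge { const struct block *src; const struct block *dest; };

/* Shape of "if (c) x = 1;": cond -> then -> join, plus cond -> join.  */
const struct block cond_bb = { 1, 2 };
const struct block then_bb = { 1, 1 };
const struct block join_bb = { 2, 1 };

const struct cfg_edge cond_to_then = { &cond_bb, &then_bb };
const struct cfg_edge cond_to_join = { &cond_bb, &join_bb };
const struct cfg_edge then_to_join = { &then_bb, &join_bb };

/* Incoming edges of the join block, where the PHI node would live.  */
const struct cfg_edge *const join_preds[] = { &cond_to_join, &then_to_join };

/* An edge is critical if code cannot be placed at either end without
   also affecting another path.  */
int
edge_critical_p (const struct cfg_edge *e)
{
  return e->src->n_succs >= 2 && e->dest->n_preds >= 2;
}

/* Returns 1 only if every incoming edge in PREDS is critical.  */
int
every_pred_critical (const struct cfg_edge *const *preds, int n)
{
  int i;

  for (i = 0; i < n; i++)
    if (!edge_critical_p (preds[i]))
      return 0;
  return 1;
}
```

In this toy diamond, cond_to_join is critical while then_to_join is not, so the join block still has one non-critical incoming edge; that is the case the original PHI predication is described as handling. Splitting cond_to_join would insert a new empty block on it, which is the alternative Richard suggests.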
[0/6] nvptx testsuite patches
This series modifies a large number of tests in order to clean up testsuite results on nvptx. The goal here was never really to get an entirely clean run - the target is just too different from conventional ones - but to be able to test the compiler sufficiently to be sure that it's in good shape for use in offloading.

Most of the patches here add annotations for use of features like alloca or indirect jumps that are unsupported on the target. Things that still cause failures include dots in identifiers, use of constructors (which is something I want to look into), certain constructs that trigger bugs in the ptxas tool, and lots of undefined C library functions.

Bernd
[1/6] nvptx testsuite patches: alloca
This deals with uses of alloca in the testsuite. Some tests require it outright, others only at -O0, and others require it implicitly by requiring an alignment for stack variables bigger than the target's STACK_BOUNDARY. For the latter I've added explicit xfails. Bernd gcc/testsuite/ * lib/target-supports.exp (check_effective_target_alloca): New function. * gcc.c-torture/execute/20010209-1.c: Require alloca. * gcc.c-torture/execute/20020314-1.c: Likewise. * gcc.c-torture/execute/20020412-1.c: Likewise. * gcc.c-torture/execute/20021113-1.c: Likewise. * gcc.c-torture/execute/20040223-1.c: Likewise. * gcc.c-torture/execute/20040308-1.c: Likewise. * gcc.c-torture/execute/20040811-1.c: Likewise. * gcc.c-torture/execute/20070824-1.c: Likewise. * gcc.c-torture/execute/20070919-1.c: Likewise. * gcc.c-torture/execute/built-in-setjmp.c: Likewise. * gcc.c-torture/execute/pr22061-1.c: Likewise. * gcc.c-torture/execute/pr22061-4.c: Likewise. * gcc.c-torture/execute/pr43220.c: Likewise. * gcc.c-torture/execute/vla-dealloc-1.c: Likewise. * gcc.dg/torture/stackalign/alloca-1.c: Likewise. * gcc.dg/torture/stackalign/vararg-1.c: Likewise. * gcc.dg/torture/stackalign/vararg-2.c: Likewise. * gcc.c-torture/compile/2923-1.c: Likewise. * gcc.c-torture/compile/20030224-1.c: Likewise. * gcc.c-torture/compile/20071108-1.c: Likewise. * gcc.c-torture/compile/20071117-1.c: Likewise. * gcc.c-torture/compile/900313-1.c: Likewise. * gcc.c-torture/compile/pr17397.c: Likewise. * gcc.c-torture/compile/pr35006.c: Likewise. * gcc.c-torture/compile/pr42956.c: Likewise. * gcc.c-torture/compile/pr51354.c: Likewise. * gcc.c-torture/compile/pr55851.c: Likewise. * gcc.c-torture/compile/vla-const-1.c: Likewise. * gcc.c-torture/compile/vla-const-2.c: Likewise. * gcc.c-torture/compile/pr31507-1.c: Likewise. * gcc.c-torture/compile/pr52714.c: Likewise. * gcc.dg/20001012-2.c: Likewise. * gcc.dg/auto-type-1.c: Likewise. * gcc.dg/builtin-object-size-1.c: Likewise. * gcc.dg/builtin-object-size-2.c: Likewise. 
* gcc.dg/builtin-object-size-3.c: Likewise. * gcc.dg/builtin-object-size-4.c: Likewise. * gcc.dg/packed-vla.c: Likewise. * gcc.c-torture/compile/parms.c: Likewise. * gcc.c-torture/execute/920721-2.c: Skip -O0 unless alloca is available. * gcc.c-torture/execute/920929-1.c: Likewise. * gcc.c-torture/execute/921017-1.c: Likewise. * gcc.c-torture/execute/941202-1.c: Likewise. * gcc.c-torture/execute/align-nest.c: Likewise. * gcc.c-torture/execute/alloca-1.c: Likewise. * gcc.c-torture/execute/pr36321.c: Likewise. * gcc.c-torture/compile/20001221-1.c: Likewise. * gcc.c-torture/compile/20020807-1.c: Likewise. * gcc.c-torture/compile/20050801-2.c: Likewise. * gcc.c-torture/compile/920428-4.c: Likewise. * gcc.c-torture/compile/debugvlafunction-1.c.c: Likewise. * gcc.c-torture/compile/pr41469.c: Likewise. * gcc.dg/torture/pr48953.c: Likewise. * gcc.dg/torture/pr8081.c: Likewise. * gcc.dg/torture/stackalign/inline-1.c: Skip if nvptx-*-*. * gcc.dg/torture/stackalign/inline-2.c: Likewise. * gcc.dg/torture/stackalign/nested-1.c: Likewise. * gcc.dg/torture/stackalign/nested-2.c: Likewise. * gcc.dg/torture/stackalign/nested-3.c: Likewise. * gcc.dg/torture/stackalign/nested-4.c: Likewise. * gcc.dg/torture/stackalign/nested-1.c: Likewise. * gcc.dg/torture/stackalign/global-1.c: Likewise. * gcc.dg/torture/stackalign/pr16660-1.c: Likewise. * gcc.dg/torture/stackalign/pr16660-2.c: Likewise. * gcc.dg/torture/stackalign/pr16660-3.c: Likewise. * gcc.dg/torture/stackalign/ret-struct-1.c: Likewise. * gcc.dg/torture/stackalign/struct-1.c: Likewise. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp.orig +++ gcc/testsuite/lib/target-supports.exp @@ -604,6 +606,15 @@ proc add_options_for_tls { flags } { return $flags } +# Return 1 if alloca is supported, 0 otherwise. + +proc check_effective_target_alloca {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return 1 +} + # Return 1 if thread local storage (TLS) is supported, 0 otherwise. 
proc check_effective_target_tls {} { Index: gcc/testsuite/gcc.c-torture/execute/20010209-1.c === --- gcc/testsuite/gcc.c-torture/execute/20010209-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20010209-1.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target alloca } */ int b; int foo (void) { Index: gcc/testsuite/gcc.c-torture/execute/20020314-1.c === --- gcc/testsuite/gcc.c-torture/execute/20020314-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20020314-1.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target alloca } */ void f(void * a, double y) { } Index: gcc/testsuite/gcc.c-torture/execute/20020412-1.c
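To make the annotation concrete, here is a sketch of the kind of test body that needs dynamic stack allocation and would therefore carry the new directive. The function `sum_first` is a hypothetical example, not one of the listed tests; the dg comment is inert outside the DejaGnu harness.

```c
/* { dg-require-effective-target alloca } */

/* Sketch only: a variable-length array forces the compiler to
   allocate stack space dynamically, much like an explicit alloca
   call.  N is assumed to be positive.  */
int
sum_first (int n)
{
  int buf[n];
  int i, sum = 0;

  for (i = 0; i < n; i++)
    buf[i] = i;
  for (i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}
```

With optimization enabled, a compiler may be able to remove an allocation like this entirely, which is presumably why some of the listed tests are only skipped at -O0.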
[2/6] nvptx testsuite patches: typed assembly
Since everything in ptx assembly is typed, K&R C is problematic. There are a number of testcases that call functions with the wrong number of arguments, or arguments of the wrong type. I've added a new feature, untyped_assembly, which these tests now require. I've also used this for tests using builtin_apply/builtin_return.

Bernd

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_untyped_assembly): New function.
	* gcc.c-torture/compile/20091215-1.c: Require untyped_assembly.
	* gcc.c-torture/compile/920917-1.c: Likewise.
	* gcc.c-torture/compile/930120-1.c: Likewise.
	* gcc.c-torture/compile/930411-1.c: Likewise.
	* gcc.c-torture/compile/930529-1.c: Likewise.
	* gcc.c-torture/compile/930623-1.c: Likewise.
	* gcc.c-torture/compile/950329-1.c: Likewise.
	* gcc.c-torture/compile/calls.c: Likewise.
	* gcc.c-torture/compile/pr37258.c: Likewise.
	* gcc.c-torture/compile/pr37327.c: Likewise.
	* gcc.c-torture/compile/pr38360.c: Likewise.
	* gcc.c-torture/compile/pr43635.c: Likewise.
	* gcc.c-torture/compile/pr47428.c: Likewise.
	* gcc.c-torture/compile/pr47967.c: Likewise.
	* gcc.c-torture/compile/pr49145.c: Likewise.
	* gcc.c-torture/compile/pr51694.c: Likewise.
	* gcc.c-torture/compile/pr53411.c: Likewise.
	* gcc.c-torture/execute/20001101.c: Likewise.
	* gcc.c-torture/execute/20051012-1.c: Likewise.
	* gcc.c-torture/execute/920501-1.c: Likewise.
	* gcc.c-torture/execute/921202-1.c: Likewise.
	* gcc.c-torture/execute/921208-2.c: Likewise.
	* gcc.c-torture/execute/call-trap-1.c: Likewise.
	* gcc.c-torture/compile/20010525-1.c: Likewise.
	* gcc.c-torture/compile/20021015-2.c: Likewise.
	* gcc.c-torture/compile/20031023-1.c: Likewise.
	* gcc.c-torture/compile/20031023-2.c: Likewise.
	* gcc.c-torture/compile/pr49206.c: Likewise.
	* gcc.c-torture/execute/pr47237.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-1.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-3.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-4.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-return-1.c: Likewise.
	* gcc.dg/builtin-apply1.c: Likewise.
	* gcc.dg/builtin-apply2.c: Likewise.
	* gcc.dg/builtin-apply3.c: Likewise.
	* gcc.dg/builtin-apply4.c: Likewise.
	* gcc.dg/pr38338.c: Likewise.
	* gcc.dg/torture/pr41993.c: Likewise.
	* gcc.c-torture/compile/386.c: Likewise.
	* gcc.c-torture/compile/cmpsi386.c: Likewise.
	* gcc.c-torture/compile/consec.c: Likewise.
	* gcc.c-torture/compile/ex.c: Likewise.
	* gcc.c-torture/compile/pass.c: Likewise.
	* gcc.c-torture/compile/scal.c: Likewise.
	* gcc.c-torture/compile/uuarg.c: Likewise.
	* gcc.c-torture/compile/conv_tst.c: Likewise.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,6 +606,17 @@ proc add_options_for_tls { flags } {
     return $flags
 }
 
+# Return 1 if the assembler does not verify function types against
+# calls, 0 otherwise.  Such verification will typically show up problems
+# with K&R C function declarations.
+
+proc check_effective_target_untyped_assembly {} {
+    if { [istarget nvptx-*-*] } {
+	return 0
+    }
+    return 1
+}
+
 # Return 1 if alloca is supported, 0 otherwise.
 
 proc check_effective_target_alloca {} {
Index: gcc/testsuite/gcc.c-torture/compile/20091215-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/20091215-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/20091215-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 void bar ();
 
 void
Index: gcc/testsuite/gcc.c-torture/compile/920917-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/920917-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/920917-1.c
@@ -1,2 +1,4 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 inline f(x){switch(x){case 6:case 4:case 3:case 1:;}return x;}
 g(){f(sizeof(xx));}
Index: gcc/testsuite/gcc.c-torture/compile/930120-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/930120-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/930120-1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target untyped_assembly } */
 union {
   short I[2];
   long int L;
Index: gcc/testsuite/gcc.c-torture/compile/930411-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/930411-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/930411-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 int heap;
 
 g(){}
Index: gcc/testsuite/gcc.c-torture/compile/930529-1.c
===
---
Re: [PATCH][ARM] Update target testcases for gnu11
On 21/10/14 14:48, Jiong Wang wrote: this patch updates the ARM testcases for the recent gnu11 change. ok for trunk? This is OK bar the minor nit in the ChangeLog below - as a follow up it would be nice to see if we can use the ACLE feature macros instead of hard-coding some of the functions into the target_neon supports (especially the ones for vcvt16 and vfma). thanks. gcc/testsuite/ * gcc.target/arm/20031108-1.c: Add explicit declaration. * gcc.target/arm/cold-lc.c: Likewise. * gcc.target/arm/neon-modes-2.c: Likewise. * gcc.target/arm/pr43920-2.c: Likewise. * gcc.target/arm/pr44788.c: Likewise. * gcc.target/arm/pr55642.c: Likewise. * gcc.target/arm/pr58784.c: Likewise. * gcc.target/arm/pr60650.c: Likewise. * gcc.target/arm/pr60650-2.c: Likewise. * gcc.target/arm/vfp-ldmdbs.c: Likewise. * gcc.target/arm/vfp-ldmias.c: Likewise. * lib/target-supports.exp: Likewise. Can you mention the specific target-supports functions changed here, please? * gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration. regards Ramana
[3/6] nvptx testsuite patches: stdio
Some tests use stdio functions which are unavailable with the cut-down newlib I'm using for ptx testing. I'm somewhat uncertain what to do with these; they are by no means the only unavailable library functions the testsuite tries to use (signal is another example). Here's a patch which deals with parts of the problem, but I wouldn't mind leaving this one out if it doesn't seem worthwhile.

Bernd

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_stdio): New function.
* gcc.c-torture/execute/gofast.c (fail): Don't fprintf on nvptx.
* gcc.dg/torture/matrix-1.c: Require stdio.
* gcc.dg/torture/matrix-2.c: Require stdio.
* gcc.dg/torture/matrix-5.c: Require stdio.
* gcc.dg/torture/matrix-6.c: Require stdio.
* gcc.dg/torture/transpose-1.c: Require stdio.
* gcc.dg/torture/transpose-2.c: Require stdio.
* gcc.dg/torture/transpose-3.c: Require stdio.
* gcc.dg/torture/transpose-4.c: Require stdio.
* gcc.dg/torture/transpose-5.c: Require stdio.
* gcc.dg/torture/transpose-6.c: Require stdio.

Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp.orig +++ gcc/testsuite/lib/target-supports.exp @@ -604,6 +606,15 @@ proc add_options_for_tls { flags } { return $flags } +# Return 1 if the C library on this target has stdio support, 0 otherwise. + +proc check_effective_target_stdio {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return 1 +} + # Return 1 if the assembler does not verify function types against # calls, 0 otherwise. Such verification will typically show up problems # with K&R C function declarations. 
Index: gcc/testsuite/gcc.c-torture/execute/gofast.c === --- gcc/testsuite/gcc.c-torture/execute/gofast.c.orig +++ gcc/testsuite/gcc.c-torture/execute/gofast.c @@ -48,7 +48,9 @@ int fail (char *msg) { fail_count++; +#ifndef __nvptx__ fprintf (stderr, Test failed: %s\n, msg); +#endif } int Index: gcc/testsuite/gcc.dg/torture/matrix-1.c === --- gcc/testsuite/gcc.dg/torture/matrix-1.c.orig +++ gcc/testsuite/gcc.dg/torture/matrix-1.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/matrix-2.c === --- gcc/testsuite/gcc.dg/torture/matrix-2.c.orig +++ gcc/testsuite/gcc.dg/torture/matrix-2.c @@ -1,6 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ - +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/matrix-5.c === --- gcc/testsuite/gcc.dg/torture/matrix-5.c.orig +++ gcc/testsuite/gcc.dg/torture/matrix-5.c @@ -1,6 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ - +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/matrix-6.c === --- gcc/testsuite/gcc.dg/torture/matrix-6.c.orig +++ gcc/testsuite/gcc.dg/torture/matrix-6.c @@ -1,6 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ - +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/transpose-1.c === --- gcc/testsuite/gcc.dg/torture/transpose-1.c.orig +++ gcc/testsuite/gcc.dg/torture/transpose-1.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/transpose-2.c === --- gcc/testsuite/gcc.dg/torture/transpose-2.c.orig +++ gcc/testsuite/gcc.dg/torture/transpose-2.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options 
-fwhole-program } */ +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/transpose-3.c === --- gcc/testsuite/gcc.dg/torture/transpose-3.c.orig +++ gcc/testsuite/gcc.dg/torture/transpose-3.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -fwhole-program } */ +/* { dg-require-effective-target stdio } */ #include stdio.h #include stdlib.h Index: gcc/testsuite/gcc.dg/torture/transpose-4.c === --- gcc/testsuite/gcc.dg/torture/transpose-4.c.orig +++ gcc/testsuite/gcc.dg/torture/transpose-4.c @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options
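As an aside on the gofast.c hunk above: the patch keeps the failure counting and only compiles out the fprintf call on nvptx. A minimal sketch of that pattern follows. SKETCH_NO_STDIO is a hypothetical macro introduced only for this illustration (the patch itself tests __nvptx__ directly), and returning the running count is also an addition for the sketch.

```c
/* Hypothetical switch for this sketch; the actual patch tests
   __nvptx__ directly at the call site.  */
#if defined(__nvptx__)
#define SKETCH_NO_STDIO 1
#endif

#ifndef SKETCH_NO_STDIO
#include <stdio.h>
#endif

static int fail_count;

/* Record a failure.  The message is printed only where stdio is
   available; the count is kept either way so the test can still
   decide whether to abort.  Returns the running count.  */
int
fail (const char *msg)
{
  fail_count++;
#ifndef SKETCH_NO_STDIO
  fprintf (stderr, "Test failed: %s\n", msg);
#else
  (void) msg;
#endif
  return fail_count;
}
```

The point of guarding only the reporting is that the test keeps its pass/fail behaviour on the restricted target, which is presumably why gofast.c is patched in place rather than given a stdio requirement.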
Re: [PATCH i386 AVX512] [81/n] Add new built-ins.
Hello, On 21 Oct 11:17, Richard Biener wrote: On Mon, Oct 20, 2014 at 3:50 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Oct 20, 2014 at 05:41:25PM +0400, Kirill Yukhin wrote: Hello, This patch adds (almost) all built-ins needed by AVX-512VL,BW,DQ intrinsics. Main questionable hunk is: diff --git a/gcc/tree-core.h b/gcc/tree-core.h index b69312b..a639487 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1539,7 +1539,7 @@ struct GTY(()) tree_function_decl { DECL_FUNCTION_CODE. Otherwise unused. ??? The bitfield needs to be able to hold all target function codes as well. */ - ENUM_BITFIELD(built_in_function) function_code : 11; + ENUM_BITFIELD(built_in_function) function_code : 12; ENUM_BITFIELD(built_in_class) built_in_class : 2; unsigned static_ctor_flag : 1; Well, decl_with_vis has 15 unused bits, so instead of growing FUNCTION_DECL significantly, might be better to move one of the flags to decl_with_vis and just document that it applies to FUNCTION_DECLs only. Or move some flag to cgraph if possible. But seeing e.g. IX86_BUILTIN_FIXUPIMMPD256, IX86_BUILTIN_FIXUPIMMPD256_MASK, IX86_BUILTIN_FIXUPIMMPD256_MASKZ etc. I wonder if you really need that many builtins, weren't we adding for avx512f just single builtin instead of 3 different ones, always providing mask argument and depending on whether it is all ones, etc. figuring out what kind of masking should be performed? If only we had no lang-specific flags in tree_base we could use the same place as we use for internal function code ... But yes, not using that many builtins in the first place is preferred for example by making them type-generic and/or variadic. We might try to refactor x86 built-ins toward type-generic approach, but I think it can be postponed to 6.x release series. -- Thanks, K Richard. Jakub
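To make the size argument in the questionable hunk concrete: an 11-bit field can distinguish 2048 function codes and a 12-bit field 4096. The sketch below uses a toy struct with plain unsigned bit-fields; it is not the real tree_function_decl layout (which uses ENUM_BITFIELD), and the names toy_function_decl, max_code and round_trip are made up for the illustration.

```c
/* Toy stand-in for the tail of tree_function_decl; NOT the real
   layout, just enough to show the effect of the field width.  */
struct toy_function_decl
{
  unsigned function_code : 12;   /* 11 bits before the patch */
  unsigned built_in_class : 2;
  unsigned static_ctor_flag : 1;
};

/* Largest code an unsigned bit-field of WIDTH bits can hold
   (WIDTH is assumed to be smaller than the width of unsigned).  */
unsigned
max_code (unsigned width)
{
  return (1u << width) - 1u;
}

/* Store CODE in the bit-field and read it back.  Values that do
   not fit are reduced modulo 2^12, i.e. silently truncated.  */
unsigned
round_trip (unsigned code)
{
  struct toy_function_decl d = { 0, 0, 0 };
  d.function_code = code;
  return d.function_code;
}
```

With the toy layout, code 2048 survives the round trip only because the field is 12 bits wide; with 11 bits it would wrap to 0, which is the kind of silent truncation the widening avoids.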
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I should re-design my extensions? Well, we have infrastructure for inserting code on edges and you've made critical edges predicated correctly. So why re-invent the wheel? I realize this is very similar to my initial suggestion to simply split critical edges in loops you want to if-convert but delays splitting until it turns out to be necessary (which might be good for the !force_vect case). For edge predicates you simply can emit their computation on the edge, no? Btw, I very originally suggested to rework if-conversion to only record edge predicates - having both block and edge predicates somewhat complicates the code and makes it harder to maintain (thus also the suggestion to simply split critical edges if necessary to make BB predicates work always). Your patches add a lot of code and to me it seems we can avoid doing so much special casing. 
For example attacking the critical edge issue by a simple Index: tree-if-conv.c === --- tree-if-conv.c (revision 216508) +++ tree-if-conv.c (working copy) @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop, if (EDGE_COUNT (e->src->succs) == 1) found = true; if (!found) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "only critical predecessors\n"); - return false; - } + split_edge (EDGE_PRED (bb, 0)); } return true; it changes the number of blocks in the loop, so get_loop_body_in_if_conv_order should probably be re-done with the above eventually signalling that it created a new block. Or the above should populate a vector of edges to split and do that after the loop calling if_convertible_bb_p. Richard. Richard. Thanks. Yuri. BTW Jeff did initial review of my changes related to predicate computation for join blocks. I presented him updated patch with test-case and some minor changes in patch. But still did not get any feedback on it. Could you please take a look also on it? 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi). BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see gsi_commit_edge_inserts implementation). Richard. Thanks ahead. 
2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4). But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation? Btw, I think for the case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block). Richard. Could you please review it. Thanks. ChangeLog: 2014-10-21 Yuri Rumyantsev ysrum...@gmail.com (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments.
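For readers following the critical-edge discussion: an edge is critical when its source has more than one successor and its destination has more than one predecessor, and the hunk Richard proposes splits one such edge when a block has only critical predecessors. The sketch below models that on a toy adjacency-matrix CFG. Every name in it (toy_cfg, edge_critical_p, only_critical_preds_p, split_edge_toy, sample_cfg) is hypothetical; none of this is GCC's CFG API.

```c
#include <string.h>

#define MAX_BLOCKS 8

/* Toy CFG: edge[s][d] is 1 when there is an edge s -> d.  */
struct toy_cfg
{
  int n_blocks;
  unsigned char edge[MAX_BLOCKS][MAX_BLOCKS];
};

static int
succ_count (const struct toy_cfg *g, int b)
{
  int n = 0;
  for (int d = 0; d < g->n_blocks; d++)
    n += g->edge[b][d];
  return n;
}

static int
pred_count (const struct toy_cfg *g, int b)
{
  int n = 0;
  for (int s = 0; s < g->n_blocks; s++)
    n += g->edge[s][b];
  return n;
}

/* An edge is critical if its source has several successors and
   its destination has several predecessors.  */
int
edge_critical_p (const struct toy_cfg *g, int s, int d)
{
  return g->edge[s][d] && succ_count (g, s) > 1 && pred_count (g, d) > 1;
}

/* Loosely mirrors the "found" loop in the quoted hunk: BB is fine
   as soon as one predecessor has a single successor.  */
int
only_critical_preds_p (const struct toy_cfg *g, int bb)
{
  int found = 0;
  for (int s = 0; s < g->n_blocks; s++)
    if (g->edge[s][bb] && succ_count (g, s) == 1)
      found = 1;
  return pred_count (g, bb) > 0 && !found;
}

/* Split S -> D by routing it through a fresh block.  Returns the
   new block's index, or -1 if the edge is absent or the toy CFG
   is full.  */
int
split_edge_toy (struct toy_cfg *g, int s, int d)
{
  if (!g->edge[s][d] || g->n_blocks >= MAX_BLOCKS)
    return -1;
  int nb = g->n_blocks++;
  g->edge[s][d] = 0;
  g->edge[s][nb] = 1;
  g->edge[nb][d] = 1;
  return nb;
}

/* Blocks 0 and 1 both branch; block 3 is reached only through
   critical edges 0 -> 3 and 1 -> 3.  */
struct toy_cfg
sample_cfg (void)
{
  struct toy_cfg g;
  memset (&g, 0, sizeof g);
  g.n_blocks = 4;
  g.edge[0][1] = 1;
  g.edge[0][3] = 1;
  g.edge[1][2] = 1;
  g.edge[1][3] = 1;
  return g;
}
```

Splitting 0 -> 3 adds a block, which is the toy analogue of Richard's remark that the block count changes and get_loop_body_in_if_conv_order would need to be redone afterwards.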
Re: [PATCH i386 AVX512] [81/n] Add new built-ins.
On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote: --- a/gcc/tree.h +++ b/gcc/tree.h @@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_COMDAT(NODE) \ (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag) + /* In a FUNCTION_DECL indicates that a static chain is needed. */ +#define DECL_STATIC_CHAIN(NODE) \ + (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag) + I would say that you should still keep it together with the FUNCTION_DECL macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want the macro to be used on VAR_DECLs etc. So just s/function_decl/decl_with_vis/ in the definition IMHO. Also, with so many added builtins, how does it affect int i; compilation time at -O0? If it is significant, maybe it is high time to make the md builtin decl building more lazy. Jakub
[4/6] nvptx testsuite patches: xfails and skips
Some things don't fit into nice categories that apply to a larger set of tests, or are somewhat random, like ptxas tool failures. For these I've added xfails and skips.

Bernd

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_trampolines, check_profiling_available, check_effective_target_lto, check_effective_target_vect_natural): False for nvptx-*-*.
* gcc.c-torture/compile/limits-fndefn.c: Skip for nvptx-*-*.
* gcc.c-torture/compile/pr34334.c: Likewise.
* gcc.c-torture/compile/pr37056.c: Likewise.
* gcc.c-torture/compile/pr39423-1.c: Likewise.
* gcc.c-torture/compile/pr46534.c: Likewise.
* gcc.c-torture/compile/pr49049.c: Likewise.
* gcc.c-torture/compile/pr59417.c: Likewise.
* gcc.c-torture/compile/20080721-1.c: Likewise.
* gcc.c-torture/compile/920501-4.c: Likewise.
* gcc.c-torture/compile/921011-1.c: Likewise.
* gcc.dg/20040813-1.c: Likewise.
* gcc.dg/pr28755.c: Likewise.
* gcc.dg/pr44194-1.c: Likewise.
* gcc.c-torture/compile/pr42717.c: Xfail for nvptx-*-*.
* gcc.c-torture/compile/pr61684.c: Likewise.
* gcc.c-torture/compile/pr20601-1.c: Likewise.
* gcc.c-torture/compile/pr59221.c: Likewise.
* gcc.c-torture/compile/20060208-1.c: Likewise.
* gcc.c-torture/execute/pr52129.c: Likewise.
* gcc.c-torture/execute/20020310-1.c: Likewise.
* gcc.c-torture/execute/20101011-1.c: Define DO_TEST to 0 for nvptx.
* gcc.c-torture/execute/20020312-2.c: Add case for nvptx.
* gcc.c-torture/compile/pr60655-1.c: Don't add -fdata-sections for nvptx-*-*.
* gcc.dg/pr36400.c: Xfail scan-assembler test on nvptx-*-*.
* gcc.dg/const-elim-2.c: Likewise.
Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp.orig +++ gcc/testsuite/lib/target-supports.exp @@ -436,6 +436,7 @@ proc check_effective_target_trampolines } if { [istarget avr-*-*] || [istarget msp430-*-*] + || [istarget nvptx-*-*] || [istarget hppa2.0w-hp-hpux11.23] || [istarget hppa64-hp-hpux11.23] } { return 0; @@ -532,6 +533,7 @@ proc check_profiling_available { test_wh || [istarget msp430-*-*] || [istarget nds32*-*-elf] || [istarget nios2-*-elf] + || [istarget nvptx-*-*] || [istarget powerpc-*-eabi*] || [istarget powerpc-*-elf] || [istarget rx-*-*] @@ -4216,7 +4218,8 @@ proc check_effective_target_vect_natural verbose check_effective_target_vect_natural_alignment: using cached result 2 } else { set et_vect_natural_alignment_saved 1 -if { [check_effective_target_arm_eabi] } { +if { [check_effective_target_arm_eabi] + || [istarget nvptx-*-*] } { set et_vect_natural_alignment_saved 0 } } @@ -5691,6 +5694,9 @@ proc check_effective_target_gld { } { proc check_effective_target_lto { } { global ENABLE_LTO +if { [istarget nvptx-*-*] } { + return 0; +} return [info exists ENABLE_LTO] } Index: gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c === --- gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c.orig +++ gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c @@ -1,4 +1,5 @@ /* { dg-skip-if too complex for avr { avr-*-* } { * } { } } */ +/* { dg-skip-if ptxas times out { nvptx-*-* } { * } { } } */ /* { dg-timeout-factor 4.0 } */ #define LIM1(x) x##0, x##1, x##2, x##3, x##4, x##5, x##6, x##7, x##8, x##9, #define LIM2(x) LIM1(x##0) LIM1(x##1) LIM1(x##2) LIM1(x##3) LIM1(x##4) \ Index: gcc/testsuite/gcc.c-torture/compile/pr60655-1.c === --- gcc/testsuite/gcc.c-torture/compile/pr60655-1.c.orig +++ gcc/testsuite/gcc.c-torture/compile/pr60655-1.c @@ -1,4 +1,4 @@ -/* { dg-options -fdata-sections { target { ! { { hppa*-*-hpux* } { ! lp64 } } } } } */ +/* { dg-options -fdata-sections { target { { ! { { hppa*-*-hpux* } { ! 
lp64 } } } { ! nvptx-*-* } } } } */ typedef unsigned char unit; typedef unit *unitptr; Index: gcc/testsuite/gcc.c-torture/compile/pr34334.c === --- gcc/testsuite/gcc.c-torture/compile/pr34334.c.orig +++ gcc/testsuite/gcc.c-torture/compile/pr34334.c @@ -1,3 +1,4 @@ +/* { dg-skip-if ptxas times out { nvptx-*-* } { * } { -O0 } } */ __extension__ typedef __SIZE_TYPE__ size_t; __extension__ typedef long long int __quad_t; __extension__ typedef unsigned int __mode_t; Index: gcc/testsuite/gcc.c-torture/compile/pr37056.c === --- gcc/testsuite/gcc.c-torture/compile/pr37056.c.orig +++ gcc/testsuite/gcc.c-torture/compile/pr37056.c @@ -1,3 +1,4 @@ +/* { dg-skip-if ptxas times out { nvptx-*-* } { -O2 -Os } { } } */ extern void abort (void); static union { Index: gcc/testsuite/gcc.c-torture/compile/pr39423-1.c
[5/6] nvptx testsuite patches: jumps and labels
This deals with tests requiring indirect jumps (including tests using setjmp), label values, and nonlocal goto. A subset of these tests uses the NO_LABEL_VALUES macro, but it's not consistent across the testsuite. The feature test I wrote tests whether that is defined and returns false for label_values if so. Bernd gcc/testsuite/ * lib/target-supports.exp (check_effective_target_indirect_jumps): New function. (check_effective_target_nonlocal_goto): New function. (check_effective_target_label_values): New function. * gcc.c-torture/execute/20071220-2.c: Require label_values. * gcc.c-torture/compile/labels-2.c: Likewise. * gcc.c-torture/compile/2518-1.c: Likewise. * gcc.c-torture/compile/20021108-1.c: Likewise. * gcc.c-torture/compile/981006-1.c: Likewise. * gcc.c-torture/execute/20040302-1.c: Likewise. * gcc.dg/torture/pr33848.c: Likewise. * gcc.c-torture/compile/pr46107.c: Require indirect jumps and label values. * gcc.c-torture/compile/pr32919.c: Likewise. * gcc.c-torture/compile/pr17913.c: Likewise. * gcc.c-torture/compile/pr51495.c: Likewise. * gcc.c-torture/compile/pr25224.c: Likewise. * gcc.c-torture/compile/labels-3.c: Likewise. * gcc.c-torture/compile/pr27863.c: Likewise. * gcc.c-torture/compile/20050510-1.c: Likewise. * gcc.c-torture/compile/pr28489.c: Likewise. * gcc.c-torture/compile/pr29128.c: Likewise. * gcc.c-torture/compile/pr21356: Likewise. * gcc.c-torture/execute/20071210-1.c: Likewise. * gcc.c-torture/execute/200701220-1.c: Likewise. * gcc.c-torture/execute/pr51447.c: Likewise. * gcc.c-torture/execute/comp-goto-1.c: Likewise. * gcc.c-torture/execute/comp-goto-2.c: Likewise. * gcc.dg/20021029-1.c: Likewise. * gcc.dg/pr43379.c: Likewise. * gcc.dg/pr45259.c: Likewise. * gcc.dg/torture/pr53695.c: Likewise. * gcc.dg/torture/pr57584.c: Likewise. * gcc.c-torture/execute/980526-1.c: Skip if -O0 and neither label_values or indirect_jumps are available. * gcc.c-torture/compile/920415-1.c: Likewise. Remove NO_LABEL_VALUES test. 
* gcc.c-torture/compile/920428-3.c: Likewise. * gcc.c-torture/compile/950613-1.c: Likewise. * gcc.c-torture/compile/pr30984.c: Require indirect jumps. * gcc.c-torture/compile/991213-3.c: Likewise. * gcc.c-torture/compile/920825-1.c: Likewise. * gcc.c-torture/compile/20011029-1.c: Likewise. * gcc.c-torture/compile/complex-6.c: Likewise. * gcc.c-torture/compile/pr27127.c: Likewise. * gcc.c-torture/compile/pr58164.c: Likewise. * gcc.c-torture/compile/20041214-1.c: Likewise. * gcc.c-torture/execute/built-in-setjmp.c: Likewise. * gcc.c-torture/execute/pr56982.c: Likewise. * gcc.c-torture/execute/pr60003.c: Likewise. * gcc.c-torture/execute/pr26983.c: Likewise. * gcc.dg/pr57287-2.c: Likewise. * gcc.dg/pr59920-1.c: Likewise. * gcc.dg/pr59920-2.c: Likewise. * gcc.dg/pr59920-3.c: Likewise. * gcc.dg/setjmp-3.c: Likewise. * gcc.dg/setjmp-4.c: Likewise. * gcc.dg/setjmp-5.c: Likewise. * gcc.dg/torture/pr48542.c: Likewise. * gcc.dg/torture/pr57147-2.c: Likewise. * gcc.dg/torture/pr59993.c: Likewise. * gcc.dg/torture/stackalign/non-local-goto-1.c: Require nonlocal_goto. * gcc.dg/torture/stackalign/non-local-goto-2.c: Likewise. * gcc.dg/torture/stackalign/non-local-goto-3.c: Likewise. * gcc.dg/torture/stackalign/non-local-goto-4.c: Likewise. * gcc.dg/torture/stackalign/non-local-goto-5.c: Likewise. * gcc.dg/torture/stackalign/setjmp-1.c: Likewise. * gcc.dg/torture/stackalign/setjmp-3.c: Likewise. * gcc.dg/torture/stackalign/setjmp-4.c: Likewise. * gcc.dg/non-local-goto-1.c: Likewise. * gcc.dg/non-local-goto-2.c: Likewise. * gcc.dg/pr49994-1.c: Likewise. * gcc.dg/torture/pr57036-2.c: Likewise. * gcc.c-torture/compile/20040614-1.c: Require label_values. Remove NO_LABEL_VALUES test. * gcc.c-torture/compile/920831-1.c: Likewise. * gcc.c-torture/compile/920502-1.c: Likewise. * gcc.c-torture/compile/920501-7.c: Likewise. * gcc.dg/pr52139.c: Likewise. 
Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp.orig +++ gcc/testsuite/lib/target-supports.exp @@ -604,7 +606,38 @@ proc add_options_for_tls { flags } { return 1 } +# Return 1 if indirect jumps are supported, 0 otherwise. + +proc check_effective_target_indirect_jumps {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return 1 +} + +# Return 1 if nonlocal goto is supported, 0 otherwise. + +proc check_effective_target_nonlocal_goto {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return 1 +} + +# Return 1 if taking label values is supported, 0 otherwise. + +proc check_effective_target_label_values {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return [check_no_compiler_messages label_values assembly { + #ifdef NO_LABEL_VALUES + #error NO + #endif +}] +} + # Return 1 if the assembler does not verify function types against
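For context on what the label_values check gates: the affected tests take the address of a label with the GNU C `&&label` extension and jump through it with `goto *`, which needs indirect jumps. The sketch below shows the feature together with a NO_LABEL_VALUES fallback in the style some testsuite files use. The function name dispatch and the return values are made up for the illustration.

```c
/* Map OP to a result using a jump table of label addresses where
   the target supports them, and a plain switch otherwise.
   Out-of-range values of OP take the "other" path.  */
int
dispatch (int op)
{
#ifndef NO_LABEL_VALUES
  /* GNU C extension: &&label yields the label's address as void *.  */
  static void *const table[] = { &&op_zero, &&op_one, &&op_other };

  if (op < 0 || op > 2)
    op = 2;
  goto *table[op];   /* computed goto, i.e. an indirect jump */

op_zero:
  return 10;
op_one:
  return 20;
op_other:
  return -1;
#else
  switch (op)
    {
    case 0:
      return 10;
    case 1:
      return 20;
    default:
      return -1;
    }
#endif
}
```

A test written this way builds on both kinds of target, whereas tests that use `&&label` unconditionally need the dg-require-effective-target annotations the patch adds.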
[6/7] Random tweaks
This tweaks a few tests so that we don't have to skip them. This is mostly concerned with declaring main properly, or changing other declarations where the test does not seem to rely on the type mismatches. I've also included one example of changing a function name so that it is not "call"; ptxas seems to have a bug that makes it reject this function name. If that doesn't seem too awful I'll have a few more tests to fix up in this way. There'll be a 7th patch, not because I can't count, but because I didn't follow a consistent naming scheme for the patches.

Bernd

* gcc.c-torture/compile/920625-2.c: Add return type to freeReturnStruct.
* gcc.c-torture/execute/20091229-1.c: Declare main properly.
* gcc.c-torture/execute/pr61375.c: Likewise.
* gcc.c-torture/execute/20111208-1.c: Use __SIZE_TYPE__ for size_t.
* gcc.dg/pr30904.c: Remove extern from declaration of t.
* gcc.c-torture/compile/callind.c (bar): Renamed from call.

Index: gcc/testsuite/gcc.c-torture/compile/920625-2.c === --- gcc/testsuite/gcc.c-torture/compile/920625-2.c.orig +++ gcc/testsuite/gcc.c-torture/compile/920625-2.c @@ -100,4 +100,4 @@ copyQueryResult(Widget w, Boolean copy, freeReturnStruct(); } -freeReturnStruct(){} +void freeReturnStruct(){} Index: gcc/testsuite/gcc.c-torture/execute/20091229-1.c === --- gcc/testsuite/gcc.c-torture/execute/20091229-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20091229-1.c @@ -1,2 +1,2 @@ long long foo(long long v) { return v / -0x08000LL; } -void main() { if (foo(0x08000LL) != -1) abort(); exit (0); } +int main(int argc, char **argv) { if (foo(0x08000LL) != -1) abort(); exit (0); } Index: gcc/testsuite/gcc.c-torture/execute/20111208-1.c === --- gcc/testsuite/gcc.c-torture/execute/20111208-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20111208-1.c @@ -1,7 +1,7 @@ /* PR tree-optimization/51315 */ /* Reported by Jurij Smakov ju...@wooyd.org */ -typedef unsigned int size_t; +typedef __SIZE_TYPE__ size_t; extern void *memcpy (void *__restrict __dest, __const void 
*__restrict __src, size_t __n) Index: gcc/testsuite/gcc.c-torture/execute/pr61375.c === --- gcc/testsuite/gcc.c-torture/execute/pr61375.c.orig +++ gcc/testsuite/gcc.c-torture/execute/pr61375.c @@ -19,7 +19,7 @@ uint128_central_bitsi_ior (unsigned __in } int -main(int argc) +main(int argc, char **argv) { __int128 in = 1; #ifdef __SIZEOF_INT128__ Index: gcc/testsuite/gcc.dg/pr30904.c === --- gcc/testsuite/gcc.dg/pr30904.c.orig +++ gcc/testsuite/gcc.dg/pr30904.c @@ -1,7 +1,7 @@ /* { dg-do link } */ /* { dg-options -O2 -fdump-tree-optimized } */ -extern int t; +int t; extern void link_error(void); int main (void) { Index: gcc/testsuite/gcc.c-torture/compile/callind.c === --- gcc/testsuite/gcc.c-torture/compile/callind.c.orig +++ gcc/testsuite/gcc.c-torture/compile/callind.c @@ -1,8 +1,8 @@ -call (foo, a) +bar (foo, a) int (**foo) (); { - (foo)[1] = call; + (foo)[1] = bar; foo[a] (1); }
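On the 20111208-1.c change above: `typedef unsigned int size_t;` only matches the compiler's idea of size_t on targets where size_t happens to be unsigned int, while the predefined __SIZE_TYPE__ macro expands to whatever type the compiler actually uses. A small sketch of the difference follows. The typedef name toy_size_t and the two helper functions are hypothetical, chosen to avoid clashing with a real size_t.

```c
/* __SIZE_TYPE__ is predefined by GCC (and compatible compilers)
   as the type that sizeof yields on the current target.  */
typedef __SIZE_TYPE__ toy_size_t;

/* 1 if toy_size_t has the same width as the result of sizeof.  */
int
size_type_matches_sizeof (void)
{
  return sizeof (toy_size_t) == sizeof (sizeof (int));
}

/* 1 if toy_size_t is unsigned: -1 converts to the maximum value.  */
int
size_type_is_unsigned (void)
{
  return (toy_size_t) -1 > 0;
}
```

A hand-written `unsigned int` typedef would fail the first check on typical LP64 targets, where size_t is 64 bits wide, which is the kind of mismatch the patch removes from the memcpy declaration.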
[7/7] nvptx testsuite patches: Return addresses
This tests for availability of return addresses in a number of tests. Bernd gcc/testsuite/ * lib/target-supports.exp (check_effective_target_return_address): New function. * gcc.c-torture/execute/20010122-1.c: Require return_address. * gcc.c-torture/execute/20030323-1.c: Likewise. * gcc.c-torture/execute/20030811-1.c: Likewise. * gcc.c-torture/execute/eeprof-1.c: Likewise. * gcc.c-torture/execute/frame-address.c: Likewise. * gcc.c-torture/execute/pr17377.c: Likewise. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp.orig +++ gcc/testsuite/lib/target-supports.exp @@ -604,7 +606,17 @@ proc add_options_for_tls { flags } { return 1 } +# Return 1 if builtin_return_address and builtin_frame_address are +# supported, 0 otherwise. + +proc check_effective_target_return_address {} { +if { [istarget nvptx-*-*] } { + return 0 +} +return 1 +} + # Return 1 if the assembler does not verify function types against # calls, 0 otherwise. Such verification will typically show up problems # with K&R C function declarations. Index: gcc/testsuite/gcc.c-torture/execute/20010122-1.c === --- gcc/testsuite/gcc.c-torture/execute/20010122-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20010122-1.c @@ -1,4 +1,5 @@ /* { dg-skip-if requires frame pointers { *-*-* } -fomit-frame-pointer } */ +/* { dg-require-effective-target return_address } */ extern void exit (int); extern void abort (void); Index: gcc/testsuite/gcc.c-torture/execute/20030323-1.c === --- gcc/testsuite/gcc.c-torture/execute/20030323-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20030323-1.c @@ -1,4 +1,5 @@ /* PR opt/10116 */ +/* { dg-require-effective-target return_address } */ /* Removed tablejump while label still in use; this is really a link test. 
*/ void *NSReturnAddress(int offset) Index: gcc/testsuite/gcc.c-torture/execute/20030811-1.c === --- gcc/testsuite/gcc.c-torture/execute/20030811-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/20030811-1.c @@ -1,4 +1,5 @@ /* Origin: PR target/11535 from H. J. Lu h...@lucon.org */ +/* { dg-require-effective-target return_address } */ void vararg (int i, ...) { Index: gcc/testsuite/gcc.c-torture/execute/eeprof-1.c === --- gcc/testsuite/gcc.c-torture/execute/eeprof-1.c.orig +++ gcc/testsuite/gcc.c-torture/execute/eeprof-1.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target return_address } */ /* { dg-options -finstrument-functions } */ /* { dg-xfail-if { powerpc-ibm-aix* } * } */ Index: gcc/testsuite/gcc.c-torture/execute/frame-address.c === --- gcc/testsuite/gcc.c-torture/execute/frame-address.c.orig +++ gcc/testsuite/gcc.c-torture/execute/frame-address.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target return_address } */ int check_fa_work (const char *, const char *) __attribute__((noinline)); int check_fa_mid (const char *) __attribute__((noinline)); int check_fa (char *) __attribute__((noinline)); Index: gcc/testsuite/gcc.c-torture/execute/pr17377.c === --- gcc/testsuite/gcc.c-torture/execute/pr17377.c.orig +++ gcc/testsuite/gcc.c-torture/execute/pr17377.c @@ -1,6 +1,7 @@ /* PR target/17377 Bug in code emitted by return pattern on CRIS: missing pop of forced return address on stack. */ +/* { dg-require-effective-target return_address } */ int calls = 0; void *f (int) __attribute__ ((__noinline__));
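The tests annotated above all rely on __builtin_return_address or __builtin_frame_address. A minimal sketch of the former is shown below. SKETCH_NO_RETURN_ADDRESS is a hypothetical macro for this illustration only (the patch expresses the restriction through the return_address effective target, not through a macro), and both function names are made up.

```c
/* Hypothetical switch for this sketch only.  */
#if defined(__nvptx__)
#define SKETCH_NO_RETURN_ADDRESS 1
#endif

/* Return the address the current call will return to, or a null
   pointer where the builtin is treated as unavailable.  noinline
   keeps this a real call so that level 0 refers to its caller.  */
__attribute__ ((noinline)) void *
caller_address (void)
{
#ifndef SKETCH_NO_RETURN_ADDRESS
  return __builtin_return_address (0);
#else
  return (void *) 0;
#endif
}

/* 1 if a usable (non-null) return address was obtained.  */
int
have_return_address (void)
{
  return caller_address () != (void *) 0;
}
```

Level 0 asks for the return address of the current function; the affected tests use it (and the frame-address counterpart) in ways that a target without return addresses cannot honour, hence the new requirement.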
Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.
Richard, In my initial design I did such splitting before starting the real if-conversion, but I decided not to perform it since the code size of the if-converted loop grows (the number of phi nodes increases). It is also worth noting that for a phi with #nodes > 2 we need to get all predicates (except one) to do phi-predication, which means that a block containing such a phi can have only 1 critical edge. Thanks. Yuri. 2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, I saw the sources of these functions, but I can't understand why I should use something else? Note that all predicate computations are located in basic blocks ( by design of if-conv) and there is special function that put these computations in bb (insert_gimplified_predicates). Edge contains only predicate not its computations. New function - find_insertion_point() does very simple search - it finds out the latest (in current bb) operand def-stmt of predicates taken from all incoming edges. In original algorithm the predicate of non-critical edge is taken to perform phi-node predication since for critical edge it does not work properly. My question is: does your comments mean that I should re-design my extensions? Well, we have infrastructure for inserting code on edges and you've made critical edges predicated correctly. So why re-invent the wheel? I realize this is very similar to my initial suggestion to simply split critical edges in loops you want to if-convert but delays splitting until it turns out to be necessary (which might be good for the !force_vect case). For edge predicates you simply can emit their computation on the edge, no? 
Btw, I very originally suggested to rework if-conversion to only record edge predicates - having both block and edge predicates somewhat complicates the code and makes it harder to maintain (thus also the suggestion to simply split critical edges if necessary to make BB predicates work always). Your patches add a lot of code and to me it seems we can avoid doing so much special casing. For example attacking the critical edge issue by a simple Index: tree-if-conv.c === --- tree-if-conv.c (revision 216508) +++ tree-if-conv.c (working copy) @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop, if (EDGE_COUNT (e->src->succs) == 1) found = true; if (!found) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "only critical predecessors\n"); - return false; - } + split_edge (EDGE_PRED (bb, 0)); } return true; it changes the number of blocks in the loop, so get_loop_body_in_if_conv_order should probably be re-done with the above eventually signalling that it created a new block. Or the above should populate a vector of edges to split and do that after the loop calling if_convertible_bb_p. Richard. Richard. Thanks. Yuri. BTW Jeff did initial review of my changes related to predicate computation for join blocks. I presented him updated patch with test-case and some minor changes in patch. But still did not get any feedback on it. Could you please take a look also on it? 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com: On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi). BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? 
Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert if you want easy access to the newly created basic block to push the predicate to - see the gsi_commit_edge_inserts implementation).

Richard.

Thanks ahead.

2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:

Richard,

I made some changes in the patch and ChangeLog to mark that support for if-conversion of blocks with only critical incoming edges will be added in the future (more precisely, in patch.4).

But the same reasoning applies to this version of the patch when flag_force_vectorize is true!? (insertion point and invalid SSA form) Which means the patch doesn't make sense in isolation?

Btw, I think for that case you should simply do gsi_insert_on_edge () and commit_edge_insertions () before the call to combine_blocks (pushing the edge predicate to the newly created block).

Richard.

Could you please review it?

Thanks.

ChangeLog: 2014-10-21 Yuri Rumyantsev
Re: [PATCH][ARM] Add ACLE 2.0 predefined macro __ARM_FEATURE_IDIV
On Mon, Oct 13, 2014 at 3:15 PM, Renlin Li renlin...@arm.com wrote:

Hi all,

This is a simple patch to add the missing __ARM_FEATURE_IDIV__ predefined macro (ACLE 2.0) to TARGET_CPU_CPP_BUILTINS.

Is it okay to commit?

gcc/ChangeLog:

2014-10-13 Renlin Li renlin...@arm.com

* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Add ACLE 2.0 predefined macro __ARM_FEATURE_IDIV__.

Replace this with "Define __ARM_FEATURE_IDIV__." in the ChangeLog.

Ok with that change.

Ramana
Re: [0/6] nvptx testsuite patches
On 10/21/14 14:10, Bernd Schmidt wrote:

This series modifies a large number of tests in order to clean up testsuite results on nvptx. The goal here was never really to get an entirely clean run - the target is just too different from conventional ones - but to be able to test the compiler sufficiently to be sure that it's in good shape for use in offloading.

Most of the patches here add annotations for use of features like alloca or indirect jumps that are unsupported on the target. Examples of things that still cause failures are dots in identifiers, use of constructors (which is something I want to look into), certain constructs that trigger bugs in the ptxas tool, and lots of undefined C library functions.

Yeah. When I first looked at PTX, my thought was to use the existing testsuite, to the extent possible, to shake out the initial code generation issues. There are just some things that would require heroic effort to make work and they aren't really a priority for PTX. So I've got no problem conceptually with the direction this work is taking.

jeff
Re: [PATCH i386 AVX512] [81/n] Add new built-ins.
On 21 Oct 16:20, Jakub Jelinek wrote:
On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote:

--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_COMDAT(NODE) \
   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag)

+/* In a FUNCTION_DECL indicates that a static chain is needed. */
+#define DECL_STATIC_CHAIN(NODE) \
+  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag)
+

I would say that you should still keep it together with the FUNCTION_DECL macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want the macro to be used on VAR_DECLs etc. So just s/function_decl/decl_with_vis/ in the definition IMHO.

Yeah, sure.

Also, with so many added builtins, how does it affect int i; compilation time at -O0? If it is significant, maybe it is high time to make the md builtin decl building more lazy.

I've tried this:

$ echo "int i;" > test.c
$ time for i in `seq 1` ; do ./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc -O0 -S test.c ; done

for trunk with and without the patch applied. I got 106.86 vs. 106.85 secs, which looks equal. So I think we may say that this patch does not affect compile time.

--
Thanks, K

Jakub
[COMMITTED][PATCH][ARM] Update target testcases for gnu11
On 21/10/14 15:13, Ramana Radhakrishnan wrote:
On 21/10/14 14:48, Jiong Wang wrote:

This patch updates the ARM testcases for the recent gnu11 change. OK for trunk?

This is OK bar the minor nit in the ChangeLog below. As a follow-up it would be nice to see if we can use the ACLE feature macros instead of hard-coding some of the functions into the target_neon supports (especially the ones for vcvt16 and vfma).

thanks.

gcc/testsuite/

* gcc.target/arm/20031108-1.c: Add explicit declaration.
* gcc.target/arm/cold-lc.c: Likewise.
* gcc.target/arm/neon-modes-2.c: Likewise.
* gcc.target/arm/pr43920-2.c: Likewise.
* gcc.target/arm/pr44788.c: Likewise.
* gcc.target/arm/pr55642.c: Likewise.
* gcc.target/arm/pr58784.c: Likewise.
* gcc.target/arm/pr60650.c: Likewise.
* gcc.target/arm/pr60650-2.c: Likewise.
* gcc.target/arm/vfp-ldmdbs.c: Likewise.
* gcc.target/arm/vfp-ldmias.c: Likewise.
* lib/target-supports.exp: Likewise.

Can you mention the specific target-supports functions changed here, please?

OK, thanks, committed.

URL: https://gcc.gnu.org/viewcvs?rev=216517&root=gcc&view=rev
Log: [ARM] Update testcases for GNU11

2014-10-21 Jiong Wang jiong.w...@arm.com

* gcc.target/arm/20031108-1.c (Proc_7): Add explicit declaration.
(Proc_1): Add return type.
* gcc.target/arm/cold-lc.c (show_stack): Add explicit declaration.
* gcc.target/arm/neon-modes-2.c (foo): Likewise.
* gcc.target/arm/pr43920-2.c (lseek): Likewise.
* gcc.target/arm/pr44788.c (foo): Likewise.
* gcc.target/arm/pr55642.c (abs): Likewise.
* gcc.target/arm/pr58784.c (f): Likewise.
* gcc.target/arm/pr60650.c (foo1, foo2): Likewise.
* gcc.target/arm/vfp-ldmdbs.c (bar): Likewise.
* gcc.target/arm/vfp-ldmias.c (bar): Likewise.
* gcc.target/arm/pr60650-2.c (fn1, fn2): Add return type and add type for local variables.
* lib/target-supports.exp (check_effective_target_arm_crypto_ok_nocache): Add declaration for vaeseq_u8.
(check_effective_target_arm_neon_fp16_ok_nocache): Add declaration for vcvt_f16_f32.
(check_effective_target_arm_neonv2_ok_nocache): Add declaration for vfma_f32.
* gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration.
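
For readers unfamiliar with this kind of fix, here is a minimal sketch of what "Add explicit declaration" and "Add return type" typically mean once -std=gnu11 is the default. The function names `helper` and `caller` are hypothetical and are not taken from any of the testcases listed above.

```c
/* With -std=gnu11, calling an undeclared function is diagnosed by
   -Wimplicit-function-declaration and omitting a return type by
   -Wimplicit-int.  Both names below are illustrative only.  */

/* Explicit declaration, added before the first use.  */
int helper (int x);

/* Return type spelled out instead of relying on implicit int.  */
int
caller (void)
{
  return helper (20) + 1;
}

int
helper (int x)
{
  return x * 2;
}
```

A test that deliberately exercises the old behaviour can instead silence the diagnostic, which is what the pr51968.c entry does with -Wno-implicit-function-declaration.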
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/25/2014 08:12 AM, Chen Gang wrote:

OK, thanks. Next month I shall try Qemu for microblaze (I also focus on Qemu, and try to make patches for it).

Excuse me, after trying upstream qemu, it can't run microblaze correctly; even with the Xilinx qemu branch, I can't run it correctly either. I tried to consult the relevant members on the qemu mailing list, but got no result.

After comparing the upstream branch and the Xilinx branch, I am sure upstream microblaze qemu lacks several main microblaze features. For qemu, I have to focus only on upstream and on bug fix patches; new features and version merging are beyond my current scope. So sorry, I have to stop trying qemu for the microblaze gcc tests at present.

So I guess the root cause is: I only use cross-compiling environments under fedora x86_64, with no real or virtual target for testing.

Yes, if you want to test on a target, you will need a target. You can either have a simulator (see binutils and sim/* for an example of how to write one) or target hardware in some form.

After trying sim, for me it is a really useful way to test, although I also met issues: for a hello world C program, microblaze gcc built it successfully, and gdb can load it, display the source code, and disassemble the code successfully, but sim reported a failure. The related output is below:

[root@localhost test]# /upstream/release/bin/microblaze-gchen-linux-run ./test
Loading section .interp, size 0xd vma 0x10f4
Loading section .note.ABI-tag, size 0x20 vma 0x1104
Loading section .hash, size 0x24 vma 0x1124
Loading section .dynsym, size 0x40 vma 0x1148
Loading section .dynstr, size 0x3c vma 0x1188
Loading section .gnu.version, size 0x8 vma 0x11c4
Loading section .gnu.version_r, size 0x20 vma 0x11cc
Loading section .rela.dyn, size 0x24 vma 0x11ec
Loading section .rela.plt, size 0x24 vma 0x1210
Loading section .init, size 0x58 vma 0x1234
Loading section .plt, size 0x44 vma 0x128c
Loading section .text, size 0x3d0 vma 0x12d0
Loading section .fini, size 0x34 vma 0x16a0
Loading section .rodata, size 0x12 vma 0x16d4
Loading section .eh_frame, size 0x4 vma 0x16e8
Loading section .ctors, size 0x8 vma 0x100016ec
Loading section .dtors, size 0x8 vma 0x100016f4
Loading section .jcr, size 0x4 vma 0x100016fc
Loading section .dynamic, size 0xd0 vma 0x10001700
Loading section .got, size 0xc vma 0x100017d0
Loading section .got.plt, size 0x18 vma 0x100017dc
Loading section .data, size 0x10 vma 0x100017f4
Start address 0x12d0
Transfer rate: 14424 bits in 1 sec.
ERROR: Unknown opcode
program stopped with signal 4.

I guess it is sim's issue, and I shall try to fix it in the next month. So sorry, I cannot finish the emulator for microblaze within this month. :-(

Any ideas, suggestions or completions are welcome.

Thanks.
--
Chen Gang

Open share and attitude like air water and life which God blessed
Re: [PATCH 1/5] Add recog_constrain_insn
On 10/17/2014 10:47 AM, Richard Sandiford wrote: This patch just adds a new utility function called recog_constrain_insn, to go alongside the existing recog_constrain_insn_cached. Note that the extract_insn in lra.c wasn't used when checking is disabled. The function just moved on to the next instruction straight away. The RA parts are ok for me. Thanks, Richard. gcc/ * recog.h (extract_constrain_insn): Declare. * recog.c (extract_constrain_insn): New function. * lra.c (check_rtl): Use it. * postreload.c (reload_cse_simplify_operands): Likewise. * reg-stack.c (check_asm_stack_operands): Likewise. (subst_asm_stack_regs): Likewise. * regcprop.c (copyprop_hardreg_forward_1): Likewise. * regrename.c (build_def_use): Likewise. * sel-sched.c (get_reg_class): Likewise. * config/arm/arm.c (note_invalid_constants): Likewise. * config/s390/predicates.md (execute_operation): Likewise.
Re: [PATCH] Add zero-overhead looping for xtensa backend
On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) felix.y...@huawei.com wrote:

Hi Sterling,

Since the patch has been delayed for a long time, I'm kind of pushing it. Sorry for that.

Yeah, you are right. We have a performance issue here, as GCC may use one more general register in some cases with this patch. Take the following arraysum testcase for example. In the doloop optimization, GCC figures out that the number of iterations is 1024 and creates a new pseudo 79 as the new trip count register. Pseudo 79 is live throughout the loop, which makes the register pressure in the loop higher. And it's possible that this new pseudo is spilled by reload when the register pressure is very high.

I know that the xtensa loop instruction copies the trip count register into the LCOUNT special register. And we need to describe this hardware feature in GCC in order to free the trip count register. But I find it difficult to do. Do you have any good suggestions on this?

There are two issues related to the trip count, one I would like you to solve now, one later.

1. Later: The trip count doesn't need to be updated at all inside these loops, once the loop instruction executes. The code below relates to this case.

2. Now: You should be able to use a loop instruction regardless of whether the trip count is spilled. If you have an example where that wouldn't work, I would love to see it.

arraysum.c:

int g[1024];
int g_sum;

void test_entry ()
{
  int i, Sum = 0;

  for (i = 0; i < 1024; i++)
    Sum = Sum + g[i];

  g_sum = Sum;
}

1.
RTL before the doloop optimization pass (arraysum.c.193r.loop2_invariant):

(note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
(insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ]) (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2 S4 A32])) 29 {movsi_internal} (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) (nil)))
(insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ]) (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2 S4 A32])) 29 {movsi_internal} (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) (const_int 4096 [0x1000]))) (nil)))
(insn 33 37 42 2 (set (reg/v:SI 74 [ Sum ]) (const_int 0 [0])) arraysum.c:6 29 {movsi_internal} (nil))
(code_label 42 33 38 3 2 "" [0 uses])
(note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 A32])) arraysum.c:9 29 {movsi_internal} (nil))
(insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ]) (plus:SI (reg/v:SI 74 [ Sum ]) (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 {addsi3} (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ]) (nil)))
(insn 41 40 43 3 (set (reg:SI 72 [ ivtmp$8 ]) (plus:SI (reg:SI 72 [ ivtmp$8 ]) (const_int 4 [0x4]))) 1 {addsi3} (nil))
(jump_insn 43 41 52 3 (set (pc) (if_then_else (ne (reg:SI 72 [ ivtmp$8 ]) (reg/f:SI 76 [ D.1393 ])) (label_ref:SI 52) (pc))) arraysum.c:8 39 {*btrue} (int_list:REG_BR_PROB 9899 (nil)) -> 52)
(code_label 52 43 51 5 3 "" [1 uses])
(note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 45 44 46 4 (set (reg/f:SI 78) (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2 S4 A32])) arraysum.c:11 29 {movsi_internal} (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum") <var_decl 0x7f6eef5d6360 g_sum>) (nil)))
(insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32]) (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal} (expr_list:REG_DEAD (reg/f:SI 78) (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ]) (nil))))

2. RTL after the doloop optimization pass (arraysum.c.195r.loop2_doloop):

(note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
(insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ]) (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2 S4 A32])) 29 {movsi_internal} (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) (nil)))
(insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ]) (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2 S4 A32])) 29 {movsi_internal} (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>) (const_int 4096 [0x1000]))) (nil)))
(insn 33 37 54 2 (set (reg/v:SI 74 [ Sum ]) (const_int 0 [0])) arraysum.c:6 29 {movsi_internal} (nil))
(insn 54 33 42 2 (set (reg:SI 79) (const_int 1024 [0x400])) arraysum.c:6 -1 (nil))
(code_label 42 54 38 3 2 "" [0 uses])
(note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
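
As a rough source-level illustration only (plain C, not what the doloop pass actually emits), the difference between the two dumps can be pictured as follows. Before the pass, the loop exit compares the induction pointer (reg 72) against the end address (reg 76). After it, a separate down-counting trip counter exists, which is the extra pseudo (reg 79, initialised to 1024) that raises register pressure. The names `test_entry_doloop` and `trip_count` are made up for this sketch.

```c
#define N 1024

int g[N];
int g_sum;

/* Original testcase: an ordinary counted loop.  */
void
test_entry (void)
{
  int i, Sum = 0;

  for (i = 0; i < N; i++)
    Sum = Sum + g[i];

  g_sum = Sum;
}

/* Hand-written approximation of the doloop form: the body walks the
   array through a pointer while a dedicated counter, standing in for
   pseudo 79, is decremented until it reaches zero.  */
void
test_entry_doloop (void)
{
  int *p = g;
  int Sum = 0;
  int trip_count = N;   /* plays the role of (reg:SI 79) */

  do
    {
      Sum = Sum + *p++;
    }
  while (--trip_count != 0);

  g_sum = Sum;
}
```

On a target whose loop instruction copies the count into a special register, as described above for LCOUNT, the counter would ideally not occupy a general register inside the loop at all; the sketch only shows why the compiler sees one more live value.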
Re: [PATCH 2/5] Add preferred_for_{size,speed} attributes
On 10/17/2014 10:48 AM, Richard Sandiford wrote:

This is the main patch, to add new preferred_for_size and preferred_for_speed attributes that can be used to selectively disable alternatives when optimising for size or speed. As explained in the docs, the new attributes are just optimisation hints and it is possible that size-only alternatives will sometimes end up in a block that's optimised for speed, or vice versa.

The patch deals with code that directly accesses the enabled_attributes mask and that ought to take size/speed choices into account. The next patch deals with indirect uses.

Note that I'm not making reload support these attributes for hopefully obvious reasons :-)

Richard

gcc/
* doc/md.texi: Document preferred_for_size and preferred_for_speed attributes.
* genattr.c (main): Handle preferred_for_size and preferred_for_speed in the same way as enabled.
* recog.h (bool_attr): New enum.
(target_recog): Replace x_enabled_alternatives with x_bool_attr_masks.
(get_preferred_alternatives, check_bool_attrs): Declare.
* recog.c (have_bool_attr, get_bool_attr, get_bool_attr_mask_uncached)
(get_bool_attr_mask, get_preferred_alternatives, check_bool_attrs): New functions.
(get_enabled_alternatives): Use get_bool_attr_mask.
* ira-costs.c (record_reg_classes): Use get_preferred_alternatives instead of recog_data.enabled_alternatives.
* ira.c (ira_setup_alts): Likewise.
* postreload.c (reload_cse_simplify_operands): Likewise.
* config/i386/i386.c (ix86_legitimate_combined_insn): Likewise.
* ira-lives.c (preferred_alternatives): New variable.
(process_bb_node_lives): Set it.
(check_and_make_def_conflict, make_early_clobber_and_input_conflicts)
(single_reg_class, ira_implicitly_set_insn_hard_regs): Use it instead of recog_data.enabled_alternatives.
* lra-int.h (lra_insn_recog_data): Replace enabled_alternatives with preferred_alternatives.
* lra-constraints.c (process_alt_operands): Update accordingly.
* lra.c (lra_set_insn_recog_data): Likewise.
(lra_update_insn_recog_data): Assert check_bool_attrs.

Thanks for picking this up and making a systematic solution, Richard. All RA-related changes are ok for me. I guess the other changes (genattr.c, recog.[ch], md.texi and i386.c) are obvious, but I have no power to approve them.
Re: [PATCH 4/5] Remove recog_data.enabled_alternatives
On 10/17/2014 10:52 AM, Richard Sandiford wrote: After the previous patches, this one gets rid of recog_data.enabled_alternatives and its one remaining use. Ok for me, too. Pretty obvious patch although I have no power to approve it all.
Re: [PATCH 5/5] Use preferred_for_speed in i386.md
On 10/17/2014 10:54 AM, Richard Sandiford wrote:

Undo the original fix for 61630 and use preferred_for_speed in the problematic pattern. I've not written many gcc.target/i386 tests so the markup might need some work.

Final lra.c change is ok for me too.

gcc/
* lra.c (lra): Remove call to recog_init.
* config/i386/i386.md (preferred_for_speed): New attribute.
(*float<SWI48:mode><MODEF:mode>2_sse): Override it instead of enabled.

gcc/testsuite/
* gcc.target/i386/conversion-2.c: New test.
Re: [PATCH i386 AVX512] [81/n] Add new built-ins.
On 21 Oct 18:47, Kirill Yukhin wrote:
On 21 Oct 16:20, Jakub Jelinek wrote:
On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote:

--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_COMDAT(NODE) \
   (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag)

+/* In a FUNCTION_DECL indicates that a static chain is needed. */
+#define DECL_STATIC_CHAIN(NODE) \
+  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag)
+

I would say that you should still keep it together with the FUNCTION_DECL macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want the macro to be used on VAR_DECLs etc. So just s/function_decl/decl_with_vis/ in the definition IMHO.

Yeah, sure.

Also, with so many added builtins, how does it affect int i; compilation time at -O0? If it is significant, maybe it is high time to make the md builtin decl building more lazy.

I've tried this:

$ echo "int i;" > test.c
$ time for i in `seq 1` ; do ./build-x86_64-linux/gcc/xgcc -B./build-x86_64-linux/gcc -O0 -S test.c ; done

for trunk with and without the patch applied. I got 106.86 vs. 106.85 secs, which looks equal.

Retested on a clean machine (SandyBridge). Got 189 vs. 192 secs, i.e. ~1%.
Re: [1/6] nvptx testsuite patches: alloca
On 10/21/14 14:12, Bernd Schmidt wrote: This deals with uses of alloca in the testsuite. Some tests require it outright, others only at -O0, and others require it implicitly by requiring an alignment for stack variables bigger than the target's STACK_BOUNDARY. For the latter I've added explicit xfails. OK. Jeff
Re: [2/6] nvptx testsuite patches: typed assembly
On 10/21/14 14:15, Bernd Schmidt wrote:

Since everything in ptx assembly is typed, K&R C is problematic. There are a number of testcases that call functions with the wrong number of arguments, or arguments of the wrong type. I've added a new feature, untyped_assembly, which these tests now require. I've also used this for tests using builtin_apply/builtin_return.

I'd kind of prefer to see the tests fixed, but I can live with this.

FWIW, the PA (32-bit SOM) is very sensitive to this stuff as well, though the linker will detect and correct most of these problems. The PTX model doesn't give you the option to correct this stuff during the link phase.

jeff
Re: [3/6] nvptx testsuite patches: stdio
On 10/21/14 14:17, Bernd Schmidt wrote:

Some tests use stdio functions which are unavailable with the cut-down newlib I'm using for ptx testing. I'm somewhat uncertain what to do with these; they are by no means the only unavailable library functions the testsuite tries to use (signal is another example). Here's a patch which deals with parts of the problem, but I wouldn't mind leaving this one out if it doesn't seem worthwhile.

Tests probably shouldn't be using stdio anyway, except perhaps for the wrapper used when we run remotes and such to print the PASS/FAIL message.

One could argue a better direction would be to change calls into stdio to instead call some other function defined in the same .c file. That other function would be marked as noinline. That would help minimize the possibility of compromising the test.

Jeff
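
A minimal sketch of the direction suggested above, assuming a test whose body only used printf as an opaque sink for a computed value. The names `record_value`, `last_value`, `call_count` and `run_test` are hypothetical; no existing testcase is being quoted.

```c
/* Instead of calling printf from the test body, route the value
   through a function defined in the same file.  Marking it noinline
   keeps the call (and the computation feeding it) from being
   optimized away, without depending on the C library.  */

static volatile int last_value;
static int call_count;

__attribute__ ((noinline)) void
record_value (int v)
{
  last_value = v;
  ++call_count;
}

int
run_test (void)
{
  int i, sum = 0;

  for (i = 0; i < 4; i++)
    {
      sum += i;
      record_value (sum);   /* was: printf ("%d\n", sum); */
    }

  return sum;
}
```

The design point is that the test keeps an observable side effect per iteration while no longer requiring stdio from a cut-down library.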
[PATCHv5][PING^2] Vimrc config with GNU formatting
On 10/13/2014 02:26 PM, Yury Gribov wrote:
On 10/02/2014 09:14 PM, Yury Gribov wrote:
On 09/17/2014 09:08 PM, Yury Gribov wrote:
On 09/16/2014 08:38 PM, Yury Gribov wrote:

Hi all,

This is the third version of the patch. A list of changes since the last version:
* move config to contrib so that it's _not_ enabled by default (current score is 2/1 in favor of no Vim config by default)
* update Makefile.in to make .local.vimrc if developer asks for it
* disable autoformatting for flex files
* fix filtering of non-GNU sources (libsanitizer)
* added some small fixes in cinoptions based on feedback from community

As noted by Richard, the config does not do a good job of formatting unbound {} blocks e.g.

void foo ()
{
  int x;
  {
    // I'm an example of bad bad formatting
  }
}

but it seems to be the best we can get with Vim's cindent (and I don't think anyone seriously considers writing a custom indentexpr).

Ok to commit?

New version with support for another popular local .vimrc plugin.

Hi all,

Here is a new version of the vimrc patch. Hope I got the email settings right this time.

Changes since v4:
* fixed and enhanced docs
* added support for .lvimrc in Makefile
* minor fixes in cinoptions and formatoptions (reported by Segher)
* removed shiftwidth settings (as it does not really relate to code formatting)

-Y

commit 3f560e9dd16a5e914b6f2ba82edffe13dfde944c
Author: Yury Gribov y.gri...@samsung.com
Date: Thu Oct 2 15:50:52 2014 +0400

2014-10-02 Laurynas Biveinis laurynas.bivei...@gmail.com
Yury Gribov y.gri...@samsung.com

Vim config with GNU formatting.

contrib/
* vimrc: New file.

/
* .gitignore: Added .local.vimrc and .lvimrc.
* Makefile.tpl (vimrc, .lvimrc, .local.vimrc): New targets.
* Makefile.in: Regenerate.

diff --git a/.gitignore b/.gitignore
index e9b56be..ab97ac6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -32,6 +32,9 @@ POTFILES
 TAGS
 TAGS.sub

+.local.vimrc
+.lvimrc
+
 .gdbinit
 .gdb_history

diff --git a/Makefile.in b/Makefile.in
index d6105b3..f3a34af 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2384,6 +2384,18 @@ mail-report-with-warnings.log: warning.log
 	chmod +x $@
 	echo "If you really want to send e-mail, run ./$@ now"

+# Local Vim config
+
+$(srcdir)/.local.vimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+$(srcdir)/.lvimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc
+
+.PHONY: vimrc
+
 # Installation targets.

 .PHONY: install uninstall
diff --git a/Makefile.tpl b/Makefile.tpl
index f7c7e38..b98930c 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -867,6 +867,18 @@ mail-report-with-warnings.log: warning.log
 	chmod +x $@
 	echo "If you really want to send e-mail, run ./$@ now"

+# Local Vim config
+
+$(srcdir)/.local.vimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+$(srcdir)/.lvimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc
+
+.PHONY: vimrc
+
 # Installation targets.

 .PHONY: install uninstall
diff --git a/contrib/vimrc b/contrib/vimrc
new file mode 100644
index 000..34e8f35
--- /dev/null
+++ b/contrib/vimrc
@@ -0,0 +1,45 @@
+" Code formatting settings for Vim.
+"
+" To enable this for GCC files by default, you can either source this file
+" in your .vimrc via autocmd:
+"   :au BufNewFile,BufReadPost path/to/gcc/* :so path/to/gcc/contrib/vimrc
+" or source the script manually for each newly opened file:
+"   :so contrib/vimrc
+" You could also use numerous plugins that enable local vimrc e.g.
+" mbr's localvimrc or thinca's vim-localrc (but note that the latter
+" is much less secure). To install local vimrc config, run
+"   $ make vimrc
+" from GCC build folder.
+"
+" Copyright (C) 2014 Free Software Foundation, Inc.
+"
+" This program is free software; you can redistribute it and/or modify
+" it under the terms of the GNU General Public License as published by
+" the Free Software Foundation; either version 3 of the License, or
+" (at your option) any later version.
+"
+" This program is distributed in the hope that it will be useful,
+" but WITHOUT ANY WARRANTY; without even the implied warranty of
+" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+" GNU General Public License for more details.
+"
+" You should have received a copy of the GNU General Public License
+" along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+function! SetStyle()
+  let l:fname = expand("%:p")
+  if stridx(l:fname, 'libsanitizer') != -1
+    return
+  endif
+  let l:ext = fnamemodify(l:fname, ":e")
+  let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java']
+  if index(l:c_exts, l:ext) != -1
+    setlocal cindent
+    setlocal softtabstop=2
+    setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
+    setlocal textwidth=80
+    setlocal formatoptions-=ro formatoptions+=cqlt
+  endif
+endfunction
+
+call SetStyle()
Re: [PATCH] Account for prologue spills in reg_pressure scheduling
On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:

Hi,

This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate the number of available registers.

At the moment the scheduler does not account for spills in the prologue and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for the case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have the case when the function is just a single non-cyclic basic block, which executes just as often as the prologue/epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle down the number of available registers and try not to go beyond call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.

The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).

Thanks go to Richard for a good discussion of the problem and suggestions on the approach to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercise the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

It is a pretty interesting idea for a heuristic, Maxim. But I don't understand the following loop:

+  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+    if (call_used_regs[i])
+      for (c = 0; c < ira_pressure_classes_num; ++c)
+        {
+          enum reg_class cl = ira_pressure_classes[c];
+          if (ira_class_hard_regs[cl][i])
+            ++call_used_regs_num[cl];

ira_class_hard_regs[cl] is an array containing the hard registers belonging to class CL. So if GENERAL_REGS consists of hard regs 0..3, 12..15, the array will contain 8 elements: 0..3, 12..15. The array size is given by ira_class_hard_regs_num[cl]. So the index is the ordinal of the hard reg within the class (starting from 0), not the hard register number itself.

Also, the pressure classes never intersect, so you can stop the inner loop once you find the class to which the hard reg belongs.

I believe you should rewrite the code and get performance results again to get an approval.

You also missed the changelog.
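
To make the indexing point concrete, here is a small self-contained sketch that uses local stand-in tables rather than the real IRA data. The arrays `class_hard_regs`, `class_hard_regs_num` and `call_used` only imitate the layout described above for ira_class_hard_regs, ira_class_hard_regs_num and call_used_regs, and their contents are invented for the example. This is one possible shape for the counting loop, not the revised patch.

```c
enum { NUM_HARD_REGS = 16, NUM_CLASSES = 2, MAX_CLASS_REGS = 8 };

/* Stand-in for ira_class_hard_regs: entry [cl][k] is the hard register
   NUMBER of the k-th register in class cl.  Class 0 mirrors the
   GENERAL_REGS example from the review (0..3, 12..15).  */
static const int class_hard_regs[NUM_CLASSES][MAX_CLASS_REGS] = {
  { 0, 1, 2, 3, 12, 13, 14, 15 },
  { 4, 5, 6, 7, 8, 9, 10, 11 }
};

/* Stand-in for ira_class_hard_regs_num: valid entries per class.  */
static const int class_hard_regs_num[NUM_CLASSES] = { 8, 8 };

/* Stand-in for call_used_regs, indexed by hard register number.
   The values are made up.  */
static const int call_used[NUM_HARD_REGS] = {
  1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0,  1, 0, 0, 1
};

/* Count call-used registers per class by walking each class's own
   register list, so the second index is always an ordinal within the
   class and never a hard register number.  */
void
count_call_used_regs (int counts[NUM_CLASSES])
{
  for (int cl = 0; cl < NUM_CLASSES; cl++)
    {
      counts[cl] = 0;
      for (int k = 0; k < class_hard_regs_num[cl]; k++)
        if (call_used[class_hard_regs[cl][k]])
          counts[cl]++;
    }
}
```

Iterating class by class also sidesteps the question of when to stop the inner loop, since each register is visited exactly once through the class it belongs to.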
Re: [4/6] nvptx testsuite patches: xfails and skips
On 10/21/14 14:19, Bernd Schmidt wrote:

Some things don't fit into nice categories that apply to a larger set of tests, or are somewhat random, like ptxas tool failures. For these I've added xfails and skips.

Bernd

ts-xfails.diff

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_trampolines, check_profiling_available, check_effective_target_lto, check_effective_target_vect_natural): False for nvptx-*-*.
* gcc.c-torture/compile/limits-fndefn.c: Skip for nvptx-*-*.
* gcc.c-torture/compile/pr34334.c: Likewise.
* gcc.c-torture/compile/pr37056.c: Likewise.
* gcc.c-torture/compile/pr39423-1.c: Likewise.
* gcc.c-torture/compile/pr46534.c: Likewise.
* gcc.c-torture/compile/pr49049.c: Likewise.
* gcc.c-torture/compile/pr59417.c: Likewise.
* gcc.c-torture/compile/20080721-1.c: Likewise.
* gcc.c-torture/compile/920501-4.c: Likewise.
* gcc.c-torture/compile/921011-1.c: Likewise.
* gcc.dg/20040813-1.c: Likewise.
* gcc.dg/pr28755.c: Likewise.
* gcc.dg/pr44194-1.c: Likewise.
* gcc.c-torture/compile/pr42717.c: Xfail for nvptx-*-*.
* gcc.c-torture/compile/pr61684.c: Likewise.
* gcc.c-torture/compile/pr20601-1.c: Likewise.
* gcc.c-torture/compile/pr59221.c: Likewise.
* gcc.c-torture/compile/20060208-1.c: Likewise.
* gcc.c-torture/execute/pr52129.c: Likewise.
* gcc.c-torture/execute/20020310-1.c: Likewise.
* gcc.c-torture/execute/20101011-1.c: Define DO_TEST to 0 for nvptx.
* gcc.c-torture/execute/20020312-2.c: Add case for nvptx.
* gcc.c-torture/compile/pr60655-1.c: Don't add -fdata-sections for nvptx-*-*.
* gcc.dg/pr36400.c: Xfail scan-assembler test on nvptx-*-*.
* gcc.dg/const-elim-2.c: Likewise.

More ptx tooling failures than I'd expect. I'll leave it up to you whether or not to push on NVidia to fix some of those failures. The timeouts seem particularly troublesome.

I think this is fine.

jeff
Re: [5/6] nvptx testsuite patches: jumps and labels
On 10/21/14 14:23, Bernd Schmidt wrote:

This deals with tests requiring indirect jumps (including tests using setjmp), label values, and nonlocal goto. A subset of these tests uses the NO_LABEL_VALUES macro, but it's not consistent across the testsuite. The feature test I wrote tests whether that is defined and returns false for label_values if so.

Bernd

ts-jumps-labels.diff

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_indirect_jumps): New function.
	(check_effective_target_nonlocal_goto): New function.
	(check_effective_target_label_values): New function.
	* gcc.c-torture/execute/20071220-2.c: Require label_values.
	* gcc.c-torture/compile/labels-2.c: Likewise.
	* gcc.c-torture/compile/2518-1.c: Likewise.
	* gcc.c-torture/compile/20021108-1.c: Likewise.
	* gcc.c-torture/compile/981006-1.c: Likewise.
	* gcc.c-torture/execute/20040302-1.c: Likewise.
	* gcc.dg/torture/pr33848.c: Likewise.
	* gcc.c-torture/compile/pr46107.c: Require indirect jumps and label values.
	* gcc.c-torture/compile/pr32919.c: Likewise.
	* gcc.c-torture/compile/pr17913.c: Likewise.
	* gcc.c-torture/compile/pr51495.c: Likewise.
	* gcc.c-torture/compile/pr25224.c: Likewise.
	* gcc.c-torture/compile/labels-3.c: Likewise.
	* gcc.c-torture/compile/pr27863.c: Likewise.
	* gcc.c-torture/compile/20050510-1.c: Likewise.
	* gcc.c-torture/compile/pr28489.c: Likewise.
	* gcc.c-torture/compile/pr29128.c: Likewise.
	* gcc.c-torture/compile/pr21356: Likewise.
	* gcc.c-torture/execute/20071210-1.c: Likewise.
	* gcc.c-torture/execute/200701220-1.c: Likewise.
	* gcc.c-torture/execute/pr51447.c: Likewise.
	* gcc.c-torture/execute/comp-goto-1.c: Likewise.
	* gcc.c-torture/execute/comp-goto-2.c: Likewise.
	* gcc.dg/20021029-1.c: Likewise.
	* gcc.dg/pr43379.c: Likewise.
	* gcc.dg/pr45259.c: Likewise.
	* gcc.dg/torture/pr53695.c: Likewise.
	* gcc.dg/torture/pr57584.c: Likewise.
	* gcc.c-torture/execute/980526-1.c: Skip if -O0 and neither label_values or indirect_jumps are available.
	* gcc.c-torture/compile/920415-1.c: Likewise. Remove NO_LABEL_VALUES test.
	* gcc.c-torture/compile/920428-3.c: Likewise.
	* gcc.c-torture/compile/950613-1.c: Likewise.
	* gcc.c-torture/compile/pr30984.c: Require indirect jumps.
	* gcc.c-torture/compile/991213-3.c: Likewise.
	* gcc.c-torture/compile/920825-1.c: Likewise.
	* gcc.c-torture/compile/20011029-1.c: Likewise.
	* gcc.c-torture/compile/complex-6.c: Likewise.
	* gcc.c-torture/compile/pr27127.c: Likewise.
	* gcc.c-torture/compile/pr58164.c: Likewise.
	* gcc.c-torture/compile/20041214-1.c: Likewise.
	* gcc.c-torture/execute/built-in-setjmp.c: Likewise.
	* gcc.c-torture/execute/pr56982.c: Likewise.
	* gcc.c-torture/execute/pr60003.c: Likewise.
	* gcc.c-torture/execute/pr26983.c: Likewise.
	* gcc.dg/pr57287-2.c: Likewise.
	* gcc.dg/pr59920-1.c: Likewise.
	* gcc.dg/pr59920-2.c: Likewise.
	* gcc.dg/pr59920-3.c: Likewise.
	* gcc.dg/setjmp-3.c: Likewise.
	* gcc.dg/setjmp-4.c: Likewise.
	* gcc.dg/setjmp-5.c: Likewise.
	* gcc.dg/torture/pr48542.c: Likewise.
	* gcc.dg/torture/pr57147-2.c: Likewise.
	* gcc.dg/torture/pr59993.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-1.c: Require nonlocal_goto.
	* gcc.dg/torture/stackalign/non-local-goto-2.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-3.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-4.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-5.c: Likewise.
	* gcc.dg/torture/stackalign/setjmp-1.c: Likewise.
	* gcc.dg/torture/stackalign/setjmp-3.c: Likewise.
	* gcc.dg/torture/stackalign/setjmp-4.c: Likewise.
	* gcc.dg/non-local-goto-1.c: Likewise.
	* gcc.dg/non-local-goto-2.c: Likewise.
	* gcc.dg/pr49994-1.c: Likewise.
	* gcc.dg/torture/pr57036-2.c: Likewise.
	* gcc.c-torture/compile/20040614-1.c: Require label_values. Remove NO_LABEL_VALUES test.
	* gcc.c-torture/compile/920831-1.c: Likewise.
	* gcc.c-torture/compile/920502-1.c: Likewise.
	* gcc.c-torture/compile/920501-7.c: Likewise.
	* gcc.dg/pr52139.c: Likewise.
NO_LABEL_VALUES probably hasn't been consistently kept up-to-date as the focus of the project has moved a bit away from embedded. That code also predates the push for check_effective_target_*.

OK for the trunk.

jef
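For readers unfamiliar with how such requirements look inside a test file, here is a minimal sketch of a compile-only test that depends on both label values and indirect jumps. It is a hypothetical illustration, not one of the files in the patch: the function name dispatch and its body are made up, and the effective-target keywords label_values and indirect_jumps are assumed to be the ones provided by the check_effective_target_* functions named in the ChangeLog above.

```c
/* { dg-do compile } */
/* { dg-require-effective-target label_values } */
/* { dg-require-effective-target indirect_jumps } */

/* Hypothetical test body (not taken from the patch).  It relies on the
   GNU C "labels as values" extension: taking a label's address with &&
   needs label_values, and jumping through it with "goto *" needs
   indirect_jumps.  */

/* Return 10, 20 or 30 depending on sel modulo 3, selected through a
   computed goto.  */
int
dispatch (unsigned int sel)
{
  /* Table of label addresses; the index is reduced modulo 3 below so
     it always stays in bounds.  */
  static void *const table[] = { &&first, &&second, &&third };

  goto *table[sel % 3];

 first:
  return 10;
 second:
  return 20;
 third:
  return 30;
}
```

On a target where either keyword evaluates to false, the harness would be expected to report the test as unsupported rather than as a failure, which is the effect the per-file "Require ..." entries are aiming for.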