[PATCH] Fix latent PHI-opt bug

2014-10-21 Thread Richard Biener

This fixes a miscompile that can happen when PHI-opt is entered
with a CFG that has not been cleaned up and we have an always
true/false condition like

 if (a_1 != a_1)

or

 if (0 != 0)

in this case the check guarding

  /* If the middle basic block was empty or is defining the
     PHI arguments and this is a single phi where the args are different
     for the edges e0 and e1 then we can remove the middle basic block. */
  if (emtpy_or_with_defined_p
      && single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)),
					     e0, e1))
    {
      replace_phi_edge_with_variable (cond_bb, e1, phi, arg);
      /* Note that we optimized this PHI.  */
      return 2;

can run into PHI _not_ being the single non-singleton PHI, because
both values in the conditional are equal.
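
For illustration only (this fragment is made up, not taken from the bug
report), source like the following can leave such a degenerate condition
around until CFG cleanup folds it away:

  int
  f (int a, int b)
  {
    int r = b;
    if (a != a)      /* always false, only removed by CFG cleanup */
      r = b;
    return r;
  }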

I remembered running into this issue before, so this time I'll
fix it properly instead of just making sure to clean up the CFG
properly ...

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-10-21  Richard Biener  rguent...@suse.de

* tree-ssa-phiopt.c (value_replacement): Properly verify we
are the non-singleton PHI.

Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 216396)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -814,7 +814,7 @@ value_replacement (basic_block cond_bb,
 for the edges e0 and e1 then we can remove the middle basic block. */
   if (emtpy_or_with_defined_p
       && single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)),
-					      e0, e1))
+					      e0, e1) == phi)
{
   replace_phi_edge_with_variable (cond_bb, e1, phi, arg);
  /* Note that we optimized this PHI.  */


Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use

2014-10-21 Thread Richard Biener
On Mon, Oct 20, 2014 at 5:53 PM, Xinliang David Li davi...@google.com wrote:
 On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 The difference in instrumentation runtime is huge -- as topn profiler
 is pretty expensive to run.

 With FDO, it is probably better to make early inlining more aggressive
 in order to get more context sensitive profiling.

 I agree with that, I just would like to understand where increasing the
 iterations helps and if we can handle it without iterating (because Richi
 originally requested to drop the iteration for correctness issues)

 Well, I requested to do any iteration with an IPA view in mind.  That is,
 iterate for cgraph cycles for example where currently we face the situation
 that at least one function is inlined unoptimized.  For this we'd like to
 first optimize without inlining (well, maybe inlining doesn't hurt)

 yes -- inlining decision made without callee cleanup is more
 conservative and should not hurt.

and then
 inline (and re-optimize if we inlined).

 Indirect edges are more interesting, but basically you'd want to re-inline
 once you discover new direct calls during early opts (but then make
 sure to do that only after the direct callee was early-optimized first).


 It would be interesting to inline the newly introduced direct calls if
 the callsites also have function pointer arguments that are known in
 the call context.

 Thus it would be nice if somebody could improve on the currently very
 simple function ordering we apply early opts, integrating iteration
 in a better way (not iterating over all functions but only where it
 might make a difference, focused on inlining).

 Do you have some examples?

 We can do FDO experiment by shutting down einline. (Note that
 increasing iteration to 2 did not actually improve performance with
 our benchmarks).

 Early inlining itself has large performance impact for FDO (the
 runtime of the profile-use build). With it disabled, the FDO
 performance drops by 2% on average. The degradation is seen across
 all benchmarks except for one.

 Only 2%?  You are lucky ;)

 2% average is considered pretty significant for optimized build
 runtime performance.


 For tramp3d introducing early inlining
 made a difference of 10% ;)  (yes, statistically for tramp3d
 we have for each assembler instruction generated 100 calls in the
 initial code ... wheee C++ template metaprogramming!)

 Is this 10% difference from instrumentation build or optimized
 build runtime?

It's from instrumentation build.  I don't remember any numbers for the
improvement on optimized build with FDO vs. non-FDO.

Richard.


 So indeed early inlining was absolutely required to make FDO usable at all.

 thanks,

 David

 Richard.

 David



 David

 Honza

 David

 On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Increasing the number of early inliner iterations from 1 to 2 enables 
  more
  indirect calls to be promoted/inlined before instrumentation. This in 
  turn
  reduces the instrumentation overhead, particularly for more expensive 
  indirect
  call topn profiling.
 
  How much difference do you get here? One possibility would be to also run a
  specialized ipa-cp before profile instrumentation.
 
  Honza
 
  Passes internal testing and regression tests. Ok for google/4_9?
 
  2014-10-18  Teresa Johnson  tejohn...@google.com
 
  Google ref b/17934523
  * opts.c (finish_options): Increase 
  max-early-inliner-iterations to 2
  for profile-gen and profile-use builds.
 
  Index: opts.c
  ===
  --- opts.c  (revision 216286)
  +++ opts.c  (working copy)
  @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
 			   opts->x_param_values, opts_set->x_param_values);
     }
 
  +  if (opts->x_profile_arc_flag
  +      || opts->x_flag_branch_probabilities)
  +    {
  +      maybe_set_param_value
  +	(PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
  +	 opts->x_param_values, opts_set->x_param_values);
  +    }
  +
     if (!(opts->x_flag_auto_profile
 	  || (opts->x_profile_arc_flag ||
 	      opts->x_flag_branch_probabilities)))
       {
 
 
  --
  Teresa Johnson | Software Engineer | tejohn...@google.com | 
  408-460-2413


Re: [patch] Second basic-block.h restructuring patch.

2014-10-21 Thread Richard Biener
On Mon, Oct 20, 2014 at 8:21 PM, Andrew MacLeod amacl...@redhat.com wrote:
 creates cfg.h, cfganal.h, lcm.h, and loop-unroll.h to house the prototypes
 for those .c files.

 cfganal.h also gets struct edge_list  and class control_dependences
 definitions since that is where all the routines and manipulators are
 declared.

  loop-unroll.h only exports 2 routines, so rather than including that in
 basic-block.h I simply included it from the 2 .c files which consume those
 routines.  Again, the other includes will be flattened out of basic-block.h
 to just their consumers later.

 loop-unroll.c also had one function I marked as static since it wasn't
 actually used anywhere else.

 bootstraps on x86_64-unknown-linux-gnu, and regressions are running... I
 expect no regressions because of the nature of the changes.   OK to check in
 assuming everything is OK?

Ok.

Thanks,
Richard.

 Andrew


Re: [PATCH 6/8] Handle SCRATCH in decompose_address

2014-10-21 Thread Richard Sandiford
Maxim Kuvyrkov maxim.kuvyr...@linaro.org writes:
 This patch is a simple fix to allow decompose_address to handle
 SCRATCH'es during 2nd scheduler pass. This patch is a prerequisite for a
 scheduler improvement that relies on decompose_address to parse insns.

 Bootstrapped and regtested on x86_64-linux-gnu and regtested on
 arm-linux-gnueabihf and aarch64-linux-gnu.

Can't approve it, but FWIW, as the author of the original code it looks
good to me.  I agree (mem (scratch)) as an idiom for a mem with an
unknown address should be handled here.

Thanks,
Richard



Re: [PATCH PR63530] Fix the pointer alignment in vectorization

2014-10-21 Thread Richard Biener
On Mon, Oct 20, 2014 at 10:10 PM, Carrot Wei car...@google.com wrote:
 Hi Richard

 An arm testcase that can reproduce this bug is attached.

 2014-10-20  Guozhi Wei  car...@google.com

 PR tree-optimization/63530
 gcc.target/arm/pr63530.c: New testcase.


 Index: pr63530.c
 ===
 --- pr63530.c (revision 0)
 +++ pr63530.c (revision 0)
 @@ -0,0 +1,21 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target arm_neon } */
 +/* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2
 -ftree-vectorize -funroll-loops --param
 \"max-completely-peeled-insns=400\"" } */
 +
 +typedef struct {
 +  unsigned char map[256];
 +  int i;
 +} A, *AP;
 +
 +void* calloc(int, int);
 +
 +AP foo (int n)
 +{
 +  AP b = (AP)calloc (1, sizeof (A));
 +  int i;
 +  for (i = n; i < 256; i++)
 +    b->map[i] = i;
 +  return b;
 +}
 +
 +/* { dg-final { scan-assembler-not vst1.64 } } */

Can you make it a runtime testcase that fails?  This way it would be
less target specific.
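
Something along these lines would do (a hypothetical sketch, not the
testcase that was actually committed -- it runs foo with a start index
that skews the alignment and then checks every byte):

/* { dg-do run } */
/* { dg-options "-O2 -ftree-vectorize -funroll-loops" } */

extern void abort (void);
extern void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);

typedef struct {
  unsigned char map[256];
  int i;
} A, *AP;

__attribute__ ((noinline)) AP
foo (int n)
{
  AP b = (AP) calloc (1, sizeof (A));
  int i;
  for (i = n; i < 256; i++)
    b->map[i] = i;
  return b;
}

int
main (void)
{
  int i;
  AP p = foo (3);
  for (i = 0; i < 256; i++)
    if (p->map[i] != (i < 3 ? 0 : i))
      abort ();
  return 0;
}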

 On Mon, Oct 20, 2014 at 1:19 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Oct 17, 2014 at 7:58 PM, Carrot Wei car...@google.com wrote:

 I miss a testcase.  I also miss a comment before this code explaining
 why DR_MISALIGNMENT if not -1 is valid and why it is not valid if

 DR_MISALIGNMENT (dr) == -1 means some unknown misalignment, otherwise
 it means some known misalignment.
 See the usage in file tree-vect-stmts.c.

I know that.

 'offset' is supplied (what about 'byte_offset' btw?).  Also if peeling

  It is to be conservative, so it doesn't change the logic when offset is
  supplied.
  I've checked that most of the passed-in offsets are caused by a negative
  step; their impact on DR_MISALIGNMENT should have already been considered
  in function vect_update_misalignment_for_peel, but the comments of
  vect_create_addr_base_for_vector_ref do not guarantee this usage of
  offset.

  The usage of byte_offset is quite broken: many direct or indirect
  callers don't provide the parameters. So only the author can comment on
  this.

Well - please make it consistent at least, (offset || byte_offset).

 for alignment aligned this ref (misalign == 0) you don't set the alignment.

  I assume that if no misalignment is specified, the natural alignment of the
  vector type is used, which caused the wrong code in our case, is that
  right?

No, DR_MISALIGNMENT == 0 means aligned.

OTOH it's quite unnecessary to do all the dance with the alignment
part of the SSA name info (unnecessary for the actual memory references
created by the vectorizer).  The type of the access ultimately provides
the larger alignment - the SSA name info only may enlarge it even
further (thus it's dangerous to specify larger than valid there).

So if you don't want to make it really optimal wrt offset/byte_offset please
do

   if (offset || byte_offset || misalign == -1)
mark_ptr_info_alignment_unknown (...)
   else
set_ptr_info_alignment (..., align, misalign);

The patch is ok with this change and the testcase turned into a runtime one
and moved to gcc.dg/vect/

Thanks,
Richard.

 Thus you may fix a bug (not sure without a testcase) but the new code
 certainly doesn't look 100% correct.

 That said, I would have expected that we can unconditionally do

  set_ptr_info_alignment (..., align, misalign)

 if misalign is != -1 and if we adjust misalign by offset * step + byte_offset
 (usually both are constants).

 Also we can still trust the alignment copied from addr_base modulo
 vector element size even if DR_MISALIGN is -1.  This may matter
 for targets that require element-alignment for vector accesses.



Re: [PATCH 8/8] Use rank_for_schedule to as tie-breaker in model_order_p

2014-10-21 Thread Richard Sandiford
Maxim Kuvyrkov maxim.kuvyr...@linaro.org writes:
 This patch improves model_order_p to use non-reg-pressure version of
 rank_for_schedule when it needs to break the tie.  At the moment it is
 comparing INSN_PRIORITY by itself, and it seems prudent to outsource
 that to rank_for_schedule.

Do you have an example of where this helps?  A possible danger is that
rank_for_schedule might (indirectly) depend on state that isn't maintained
or updated in the same way during the model schedule phase.

Thanks,
Richard



Re: [PATCH, PR63307] Fix generation of new declarations in random order

2014-10-21 Thread Jakub Jelinek
On Thu, Oct 16, 2014 at 11:06:34AM -0600, Jeff Law wrote:
 We really prefer fully specified sorts.   For a qsort callback, this
 doesn't look fully specified.
 
 
 With that fixed, this should be OK.
 
 jeff
 
 Thanks for the review. Here is the updated version.
 Is it ok?
 Yes, this is good for the trunk.

This broke bootstrap everywhere unfortunately, has it been tested at all?

I already wrote during the initial comment that BLOCKs aren't decls and
you can't push them into the vectors, they can't be sorted easily
(BLOCK_NUMBER isn't assigned at that point e.g. and the comparison function
looks at DECL_UID unconditionally anyway).

I've bootstrapped/regtested on i686-linux the following quick fix,
bootstrapped on x86_64-linux too, in the middle of regtesting there.
If it succeeds, I'll commit as obvious, so that people can continue working
on the trunk.

2014-10-21  Jakub Jelinek  ja...@redhat.com

* cilk.c (fill_decls_vec): Only put decls into vector v.
(compare_decls): Fix up formatting.

--- gcc/c-family/cilk.c.jj  2014-10-20 19:24:54.0 +0200
+++ gcc/c-family/cilk.c 2014-10-21 08:46:24.727790990 +0200
@@ -347,9 +347,12 @@ fill_decls_vec (tree const key0, tree *
   tree t1 = key0;
   struct cilk_decls dp;
 
-  dp.key = t1;
-  dp.val = val0;
-  v->safe_push (dp);
+  if (DECL_P (t1))
+    {
+      dp.key = t1;
+      dp.val = val0;
+      v->safe_push (dp);
+    }
   return true;
 }
 
@@ -400,8 +403,8 @@ create_parm_list (struct wrapper_data *w
 static int
 compare_decls (const void *a, const void *b)
 {
-  const struct cilk_decls* t1 = (const struct cilk_decls*) a;
-  const struct cilk_decls* t2 = (const struct cilk_decls*) b;
+  const struct cilk_decls *t1 = (const struct cilk_decls *) a;
+  const struct cilk_decls *t2 = (const struct cilk_decls *) b;
 
   if (DECL_UID (t1->key) > DECL_UID (t2->key))
 return 1;


Jakub


Re: The nvptx port [0/11+]

2014-10-21 Thread Richard Biener
On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 This is a patch kit that adds the nvptx port to gcc. It contains preliminary
 patches to add needed functionality, the target files, and one somewhat
 optional patch with additional target tools. There'll be more patch series,
 one for the testsuite, and one to make the offload functionality work with
 this port. Also required are the previous four rtl patches, two of which
 weren't entirely approved yet.

 For the moment, I've stripped out all the address space support that got
 bogged down in review by brokenness in our representation of address spaces.
 The ptx address spaces are of course still defined and used inside the
 backend.

 Ptx really isn't a usual target - it is a virtual target which is then
 translated by another compiler (ptxas) to the final code that runs on the
 GPU. There are many restrictions, some imposed by the GPU hardware, and some
 by the fact that not everything you'd want can be represented in ptx. Here
 are some of the highlights:
  * Everything is typed - variables, functions, registers. This can
cause problems with KR style C or anything else that doesn't
have a proper type internally.
  * Declarations are needed, even for undefined variables.
  * Can't emit initializers referring to their variable's address since
you can't write forward declarations for variables.
  * Variables can be declared only as scalars or arrays, not
structures. Initializers must be in the variable's declared type,
which requires some code in the backend, and it means that packed
pointer values are not representable.
  * Since it's a virtual target, we skip register allocation - no good
can probably come from doing that twice. This means asm statements
aren't fixed up and will fail if they use matching constraints.

So with this restriction I wonder why it didn't make sense to go the
HSA backend route emitting PTX from a GIMPLE SSA pass.  This
would have avoided the LTO dance as well ...

That is, what is the advantage of expanding to RTL here - what
main benefits do you get from that which you thought would be
different to handle if doing code generation from GIMPLE SSA?

For HSA we even do register allocation (to a fixed virtual register
set), sth simple enough on SSA.  We of course also have to do
instruction selection but luckily virtual ISAs are easy to target.

So were you worried about duplicating instruction selection
and or doing it manually instead of with well-known machine
descriptions?

I'm just curious - I am not asking you to rewrite the beast ;)

Thanks,
Richard.

  * No support for indirect jumps, label values, nonlocal gotos.
  * No alloca - ptx defines it, but it's not implemented.
  * No trampolines.
  * No debugging (at all, for now - we may add line number directives).
  * Limited C library support - I have a hacked up copy of newlib
that provides a reasonable subset.
  * malloc and free are defined by ptx (these appear to be
undocumented), but there isn't a realloc. I have one patch for
Fortran to use a malloc/memcpy helper function in cases where we
know the old size.

 All in all, this is not intended to be used as a C (or any other source
 language) compiler. I've gone through a lot of effort to make it work
 reasonably well, but only in order to get sufficient test coverage from the
 testsuites. The intended use for this is only to build it as an offload
 compiler, and use it through OpenACC by way of lto1. That leaves the
 question of how we should document it - does it need the usual constraint
 and option documentation, given that users aren't expected to use any of
 it?

 A slightly earlier version of the entire patch kit was bootstrapped and
 tested on x86_64-linux. Ok for trunk?


 Bernd


RE: [PATCH, PR63307] Fix generation of new declarations in random order

2014-10-21 Thread Zamyatin, Igor
For some reasons it passed bootstrap locally...

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Tuesday, October 21, 2014 12:15 PM
 To: Zamyatin, Igor; Jeff Law
 Cc: GCC Patches (gcc-patches@gcc.gnu.org)
 Subject: Re: [PATCH, PR63307] Fix generation of new declarations in random
 order
 
 On Thu, Oct 16, 2014 at 11:06:34AM -0600, Jeff Law wrote:
  We really prefer fully specified sorts.   For a qsort callback, this
  doesn't look fully specified.
  
  
  With that fixed, this should be OK.
  
  jeff
  
  Thanks for the review. Here is the updated version.
  Is it ok?
  Yes, this is good for the trunk.
 
 This broke bootstrap everywhere unfortunately, has it been tested at all?
 
 I already wrote during the initial comment that BLOCKs aren't decls and you
 can't push them into the vectors, they can't be sorted easily
 (BLOCK_NUMBER isn't assigned at that point e.g. and the comparison
 function looks at DECL_UID unconditionally anyway).
 
 I've bootstrapped/regtested on i686-linux the following quick fix,
 bootstrapped on x86_64-linux too, in the middle of regtesting there.
 If it succeeds, I'll commit as obvious, so that people can continue working on
 the trunk.
 
 2014-10-21  Jakub Jelinek  ja...@redhat.com
 
   * cilk.c (fill_decls_vec): Only put decls into vector v.
   (compare_decls): Fix up formatting.
 
 --- gcc/c-family/cilk.c.jj2014-10-20 19:24:54.0 +0200
 +++ gcc/c-family/cilk.c   2014-10-21 08:46:24.727790990 +0200
 @@ -347,9 +347,12 @@ fill_decls_vec (tree const key0, tree *
tree t1 = key0;
struct cilk_decls dp;
 
  -  dp.key = t1;
  -  dp.val = val0;
  -  v->safe_push (dp);
  +  if (DECL_P (t1))
  +    {
  +      dp.key = t1;
  +      dp.val = val0;
  +      v->safe_push (dp);
  +    }
return true;
  }
 
  @@ -400,8 +403,8 @@ create_parm_list (struct wrapper_data *w
   static int
   compare_decls (const void *a, const void *b)
   {
  -  const struct cilk_decls* t1 = (const struct cilk_decls*) a;
  -  const struct cilk_decls* t2 = (const struct cilk_decls*) b;
  +  const struct cilk_decls *t1 = (const struct cilk_decls *) a;
  +  const struct cilk_decls *t2 = (const struct cilk_decls *) b;
  
     if (DECL_UID (t1->key) > DECL_UID (t2->key))
  return 1;
 
 
   Jakub


Re: [PATCH, PR63307] Fix generation of new declarations in random order

2014-10-21 Thread Jakub Jelinek
On Tue, Oct 21, 2014 at 10:14:56AM +0200, Jakub Jelinek wrote:
 I've bootstrapped/regtested on i686-linux the following quick fix,
 bootstrapped on x86_64-linux too, in the middle of regtesting there.
 If it succeeds, I'll commit as obvious, so that people can continue working
 on the trunk.

Ah, Kyrill has reverted the commit in the mean time, so there is no rush for
this, so I'm not going to commit it now.

The question remains, are the decls all you need from the traversal (i.e.
what you need to act upon)?  From my earlier skim of the original code that
wasn't that obvious.
You can have in decl_map at least also BLOCKs, perhaps types too, what else?

 2014-10-21  Jakub Jelinek  ja...@redhat.com
 
   * cilk.c (fill_decls_vec): Only put decls into vector v.
   (compare_decls): Fix up formatting.

Jakub


Re: The nvptx port [0/11+]

2014-10-21 Thread Jakub Jelinek
On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
  * Can't emit initializers referring to their variable's address since
you can't write forward declarations for variables.

Can't that be handled by emitting the initializer without the address and
some constructor that fixes up the initializer at runtime?
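
As a rough C-level sketch of that idea (names invented for illustration),
the self-referential field could be left zero in the static initializer
and patched before main runs:

struct node { struct node *self; int value; };

static struct node n = { 0, 42 };	/* address of n left out */

__attribute__ ((constructor)) static void
fixup_n (void)
{
  n.self = &n;				/* fix up the address at runtime */
}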

  * Variables can be declared only as scalars or arrays, not
structures. Initializers must be in the variable's declared type,
which requires some code in the backend, and it means that packed
pointer values are not representable.

Can't you represent structures and unions as arrays of chars?
For constant initializers that don't need relocations the compiler can
surely turn them into arrays of char initializers (e.g. fold-const.c
native_encode_expr/native_interpret_expr could be used for that).
Supposedly it would mean slower than perhaps necessary loads/stores of
aligned larger fields from the structure, but if it is an alternative to
not supporting structures/unions at all, that sounds like such a severe
limitation that it can be pretty fatal for the target.
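
A hand-written illustration of that lowering (not the output of
native_encode_expr, and assuming a little-endian target with 2-byte
shorts and no padding):

struct S { short a; short b; };
/* struct S s = { 1, 2 }; could instead be emitted as:  */
unsigned char s_bytes[sizeof (struct S)] = { 0x01, 0x00, 0x02, 0x00 };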

  * No support for indirect jumps, label values, nonlocal gotos.

Not even indirect calls?  How do you implement C++ or Fortran vtables?

Jakub


Re: [PATCH i386 AVX512] [81/n] Add new built-ins.

2014-10-21 Thread Richard Biener
On Mon, Oct 20, 2014 at 3:50 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Oct 20, 2014 at 05:41:25PM +0400, Kirill Yukhin wrote:
 Hello,
 This patch adds (almost) all built-ins needed by
 AVX-512VL,BW,DQ intrinsics.

 Main questionable hunk is:

 diff --git a/gcc/tree-core.h b/gcc/tree-core.h
 index b69312b..a639487 100644
 --- a/gcc/tree-core.h
 +++ b/gcc/tree-core.h
 @@ -1539,7 +1539,7 @@ struct GTY(()) tree_function_decl {
   DECL_FUNCTION_CODE.  Otherwise unused.
   ???  The bitfield needs to be able to hold all target function
 codes as well.  */
 -  ENUM_BITFIELD(built_in_function) function_code : 11;
 +  ENUM_BITFIELD(built_in_function) function_code : 12;
ENUM_BITFIELD(built_in_class) built_in_class : 2;

unsigned static_ctor_flag : 1;

 Well, decl_with_vis has 15 unused bits, so instead of growing
 FUNCTION_DECL significantly, might be better to move one of the
 flags to decl_with_vis and just document that it applies to FUNCTION_DECLs
 only.  Or move some flag to cgraph if possible.

 But seeing e.g.
IX86_BUILTIN_FIXUPIMMPD256, IX86_BUILTIN_FIXUPIMMPD256_MASK,
IX86_BUILTIN_FIXUPIMMPD256_MASKZ
 etc. I wonder if you really need that many builtins, weren't we adding
 for avx512f just single builtin instead of 3 different ones, always
 providing mask argument and depending on whether it is all ones, etc.
 figuring out what kind of masking should be performed?

If only we had no lang-specific flags in tree_base we could use
the same place as we use for internal function code ...

But yes, not using that many builtins in the first place is preferred
for example by making them type-generic and/or variadic.

Richard.

 Jakub


[1/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-10-21 Thread Terry Guo
Hi There,

This is the first patch to enable GCC to generate UAL assembly code for the
Thumb-1 target. The new option enables users to specify which syntax is used
in their inline assembly code.  If the inline assembly code uses the UAL
format, then gcc does nothing because gcc generates UAL code as well. If the
inline assembly code uses non-UAL syntax, then gcc will insert some directives
in the final assembly code. Is it ok for trunk?
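
To illustrate the intent (this example is written for this note, not
taken from the patch): divided-syntax Thumb-1 inline assembly like the
following keeps assembling because, without -masm-syntax-unified, the
compiler brackets it with the .syntax divided / .syntax unified
directives added below.

int
add_one (int x)
{
  int r;
  __asm__ ("add %0, %1, #1"	/* divided syntax, no explicit 's' suffix */
	   : "=l" (r) : "l" (x));
  return r;
}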

BR,
Terry

2014-10-21  Terry Guo  terry@arm.com

* config/arm/arm.h (TARGET_UNIFIED_ASM): Also include thumb1.
(ASM_APP_ON): Redefined.
* config/arm/arm.c (arm_option_override): Thumb2 always uses UAL
for inline assembly code.
* config/arm/arm.opt (masm-syntax-unified): New option.
* doc/invoke.texi (-masm-syntax-unified): Document new option.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3623c70..e654e22 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -165,6 +165,8 @@ extern char arm_arch_name[];
 	  }						\
 	if (TARGET_IDIV)				\
 	  builtin_define ("__ARM_ARCH_EXT_IDIV__");	\
+	if (inline_asm_unified)				\
+	  builtin_define ("__ARM_ASM_SYNTAX_UNIFIED__");\
     } while (0)
 
 #include config/arm/arm-opts.h
@@ -348,8 +350,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
|| (!optimize_size  !current_tune-prefer_constant_pool)))
 
 /* We could use unified syntax for arm mode, but for now we just use it
-   for Thumb-2.  */
-#define TARGET_UNIFIED_ASM TARGET_THUMB2
+   for thumb mode.  */
+#define TARGET_UNIFIED_ASM (TARGET_THUMB)
 
 /* Nonzero if this chip provides the DMB instruction.  */
 #define TARGET_HAVE_DMB(arm_arch6m || arm_arch7)
@@ -2144,8 +2146,13 @@ extern int making_const_table;
 #define CC_STATUS_INIT \
   do { cfun-machine-thumb1_cc_insn = NULL_RTX; } while (0)
 
+#undef ASM_APP_ON
+#define ASM_APP_ON (inline_asm_unified ? "\t.syntax unified" : \
+		    "\t.syntax divided\n")
+
 #undef  ASM_APP_OFF
-#define ASM_APP_OFF (TARGET_ARM ? "" : "\t.thumb\n")
+#define ASM_APP_OFF (TARGET_ARM ? "\t.arm\n\t.syntax divided\n" : \
+		     "\t.thumb\n\t.syntax unified\n")
 
 /* Output a push or a pop instruction (only used when profiling).
We can't push STATIC_CHAIN_REGNUM (r12) directly with Thumb-1.  We know
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1ee0eb3..9ccf73c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3121,6 +3121,11 @@ arm_option_override (void)
   if (target_slow_flash_data)
 arm_disable_literal_pool = true;
 
+  /* Thumb2 inline assembly code should always use unified syntax.
+ This will apply to ARM and Thumb1 eventually.  */
+  if (TARGET_THUMB2)
+inline_asm_unified = 1;
+
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
 }
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 0a80513..50f4c7d 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -271,3 +271,7 @@ Use Neon to perform 64-bits operations rather than core 
registers.
 mslow-flash-data
 Target Report Var(target_slow_flash_data) Init(0)
 Assume loading data from flash is slower than fetching instructions.
+
+masm-syntax-unified
+Target Report Var(inline_asm_unified) Init(0)
+Assume unified syntax for Thumb inline assembly code.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23f272f..c30c858 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -545,6 +545,7 @@ Objective-C and Objective-C++ Dialects}.
 -munaligned-access @gol
 -mneon-for-64bits @gol
 -mslow-flash-data @gol
+-masm-syntax-unified @gol
 -mrestrict-it}
 
 @emph{AVR Options}
@@ -12954,6 +12955,14 @@ Therefore literal load is minimized for better 
performance.
 This option is only supported when compiling for ARMv7 M-profile and
 off by default.
 
+@item -masm-syntax-unified
+@opindex masm-syntax-unified
+Assume the Thumb1 inline assembly code are using unified syntax.
+The default is currently off, which means divided syntax is assumed.
+However, this may change in future releases of GCC.  Divided syntax
+should be considered deprecated.  This option has no effect when
+generating Thumb2 code.  Thumb2 assembly code always uses unified syntax.
+
 @item -mrestrict-it
 @opindex mrestrict-it
 Restricts generation of IT blocks to conform to the rules of ARMv8.


RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

2014-10-21 Thread Thomas Preud'homme
Hi Richard,

I realized thanks to Christophe Lyon that a shift was not right: the shift count
is a number of bytes instead of a number of bits.

This extra patch fixes the problem.
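
A small worked example of the difference (assuming BITS_PER_MARKER is 8
and MARKER_BYTE_UNKNOWN is 0xff, which is how I read the pass's macros):
marking byte index 3 of the symbolic number needs a shift by
3 * BITS_PER_MARKER = 24 bits, not by 3 bits.

#define BITS_PER_MARKER 8		/* assumed value, for illustration */
#define MARKER_BYTE_UNKNOWN 0xff	/* assumed value, for illustration */

unsigned long long
mark_unknown_byte (unsigned long long n, int byte_index)
{
  return n | ((unsigned long long) MARKER_BYTE_UNKNOWN
	      << (byte_index * BITS_PER_MARKER));
}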

ChangeLog are as follows:

*** gcc/ChangeLog ***

2014-09-26  Thomas Preud'homme  thomas.preudho...@arm.com

* tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of
MARKER_BYTE_UNKNOWN markers when handling casts.

*** gcc/testsuite/ChangeLog ***

2014-10-08  Thomas Preud'homme  thomas.preudho...@arm.com

* gcc.dg/optimize-bswaphi-1.c: New bswap pass test.

diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index 3e51f04..18aba28 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
 }
 
+typedef int SItype __attribute__ ((mode (SI)));
+typedef int HItype __attribute__ ((mode (HI)));
+
+/* Test that detection of significant sign extension works correctly. This
+   checks that unknown byte markers are set correctly in cast of cast.  */
+
+HItype
+swap16 (HItype in)
+{
+  return (HItype) (((in >> 0) & 0xFF) << 8)
+	 | (((in >> 8) & 0xFF) << 0);
+}
+
 /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 3c6e935..2ef2333 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 	if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
 	    && HEAD_MARKER (n->n, old_type_size))
 	  for (i = 0; i < type_size - old_type_size; i++)
-	    n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i);
+	    n->n |= MARKER_BYTE_UNKNOWN
+		    << ((type_size - 1 - i) * BITS_PER_MARKER);
 
 	if (type_size < 64 / BITS_PER_MARKER)
 	  {

regression testsuite run without regression on x86_64-linux-gnu and bswap tests 
all pass on arm-none-eabi target

Is it ok for trunk?

Best regards,

Thomas

 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, September 24, 2014 4:01 PM
 To: Thomas Preud'homme
 Cc: GCC Patches
 Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension
 in bswap
 
 On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
  Hi all,
 
  The fix for PR61306 disabled bswap when a sign extension is detected.
 However this led to a test case regression (and potential performance
 regression) in case where a sign extension happens but its effect is
 canceled by other bit manipulation. This patch aims to fix that by having a
 special marker to track bytes whose value is unpredictable due to sign
 extension. If the final result of a bit manipulation doesn't contain any
 such marker then the bswap optimization can proceed.
 
 Nice and simple idea.
 
 Ok.
 
 Thanks,
 Richard.
 
  *** gcc/ChangeLog ***
 
  2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/63266
  * tree-ssa-math-opts.c (struct symbolic_number): Add comment
 about
  marker for unknown byte value.
  (MARKER_MASK): New macro.
  (MARKER_BYTE_UNKNOWN): New macro.
  (HEAD_MARKER): New macro.
  (do_shift_rotate): Mark bytes with unknown values due to sign
  extension when doing an arithmetic right shift. Replace hardcoded
  mask for marker by new MARKER_MASK macro.
  (find_bswap_or_nop_1): Likewise and adjust ORing of two
 symbolic
  numbers accordingly.
 
  *** gcc/testsuite/ChangeLog ***
 
  2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/63266
  * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.
 
 
  Testing:
 
  * Built an arm-none-eabi-gcc cross-compiler and used it to run the
 testsuite on QEMU emulating Cortex-M3 without any regression
  * Bootstrapped on x86_64-linux-gnu target and testsuite was run
 without regression
 
 
  Ok for trunk?






[PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix

2014-10-21 Thread Uros Bizjak
On Thu, Oct 16, 2014 at 2:05 PM, Jakub Jelinek ja...@redhat.com wrote:

  Recent change caused bootstrap failure on CentOS 5.11:
 
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-dw2-fde-dip_s.o: In function `__pthread_cleanup_routine':
  unwind-dw2-fde-dip.c:(.text+0x1590): multiple definition of
  `__pthread_cleanup_routine'
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  unwind-sjlj_s.o: In function `__pthread_cleanup_routine':
  unwind-sjlj.c:(.text+0x0): multiple definition of 
  `__pthread_cleanup_routine'
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  /usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
  handles version 2 information.
  emutls_s.o: In function `__pthread_cleanup_routine':
  emutls.c:(.text+0x170): multiple definition of `__pthread_cleanup_routine'
  unwind-dw2_s.o:unwind-dw2.c:(.text+0x270): first defined here
  collect2: error: ld returned 1 exit status
  gmake[5]: *** [libgcc_s.so] Error 1
 
  $ ld --version
  GNU ld version 2.17.50.0.6-26.el5 20061020

 It looks like a switch-to-c11 fallout. Older glibc versions have
 issues with c99 (and c11) conformance [1].

 Changing extern __inline void __pthread_cleanup_routine (...) in
 system /usr/include/pthread.h to

  #if __STDC_VERSION__ < 199901L
  extern
  #endif
  __inline__ void __pthread_cleanup_routine (...)

 fixes this issue and allows bootstrap to proceed.

 However, fixincludes is not yet built in stage1 bootstrap. Is there a
 way to fix this issue without changing system headers?

 [1] https://gcc.gnu.org/ml/gcc-patches/2006-11/msg01030.html

 Yeah, old glibcs are totally incompatible with -fno-gnu89-inline.
 Not sure if it is easily fixincludable, if yes, then -fgnu89-inline should
 be used for code like libgcc which is built with the newly built compiler
 before it is fixincluded.
 Or we need -fgnu89-inline by default for old glibcs (that is pretty
 much what we do e.g. in Developer Toolset for RHEL5).

At the end of the day, adding pthread.h to the glibc_c99_inline_4 fix
fixes the bootstrap. The fix applies __attribute__((__gnu_inline__))
to the declaration:

extern __inline __attribute__ ((__gnu_inline__)) void
__pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame)

2014-10-21  Uros Bizjak  ubiz...@gmail.com

* inclhack.def (glibc_c99_inline_4): Add pthread.h to files.
* fixincl.x: Regenerate.

Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
Index: fixincl.x
===
--- fixincl.x   (revision 216501)
+++ fixincl.x   (working copy)
@@ -2,11 +2,11 @@
  * 
  * DO NOT EDIT THIS FILE   (fixincl.x)
  * 
- * It has been AutoGen-ed  August 12, 2014 at 02:09:58 PM by AutoGen 5.12
+ * It has been AutoGen-ed  October 21, 2014 at 10:18:16 AM by AutoGen 5.16.2
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Aug 12 14:09:58 MSK 2014
+/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Oct 21 10:18:17 CEST 2014
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
@@ -3173,7 +3173,7 @@
  *  File name selection pattern
  */
 tSCC zGlibc_C99_Inline_4List[] =
-  "sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0";
+  "sys/sysmacros.h\0*/sys/sysmacros.h\0wchar.h\0*/wchar.h\0pthread.h\0*/pthread.h\0";
 /*
  *  Machine/OS name selection pattern
  */
Index: inclhack.def
===
--- inclhack.def(revision 216501)
+++ inclhack.def(working copy)
@@ -1687,7 +1687,8 @@
  */
 fix = {
 hackname  = glibc_c99_inline_4;
-files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h';
+files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h',
+pthread.h, '*/pthread.h';
 bypass= __extern_inline|__gnu_inline__;
 select= (^| )extern __inline;
 c_fix = format;


Re: [PATCH, fixincludes]: Add pthread.h to glibc_c99_inline_4 fix

2014-10-21 Thread Jakub Jelinek
On Tue, Oct 21, 2014 at 11:30:49AM +0200, Uros Bizjak wrote:
 At the end of the day, adding pthread.h to glibc_c99_inline_4 fix
 fixes the bootstrap. The fix applies __attribute__((__gnu_inline__))
 to the declaration:
 
 extern __inline __attribute__ ((__gnu_inline__)) void
 __pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame)
 
 2014-10-21  Uros Bizjak  ubiz...@gmail.com
 
 * inclhack.def (glibc_c99_inline_4): Add pthread.h to files.
 * fixincl.x: Regenerate.
 
 Bootstrapped and regression tested on CentOS 5.11 x86_64-linux-gnu {,-m32}.
 
 OK for mainline?

Ok, thanks.

 --- inclhack.def  (revision 216501)
 +++ inclhack.def  (working copy)
 @@ -1687,7 +1687,8 @@
   */
  fix = {
  hackname  = glibc_c99_inline_4;
 -files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h';
 +files = sys/sysmacros.h, '*/sys/sysmacros.h', wchar.h, '*/wchar.h',
 +pthread.h, '*/pthread.h';
  bypass= __extern_inline|__gnu_inline__;
  select= (^| )extern __inline;
  c_fix = format;


Jakub


Re: [PATCH/AARCH64] Add ThunderX -mcpu support

2014-10-21 Thread Marcus Shawcroft
On 20 October 2014 21:45, Andrew Pinski apin...@cavium.com wrote:
 Hi,
   This adds simple -mcpu=thunderx support.  Right now we use the
 schedule model of cortex-a53 but we will submit a schedule model for
 ThunderX later on.  Note ThunderX is an AARCH64 only processor so I
 created a new file to hold the cost tables for it rather than adding
 it to aarch-cost-tables.h.

 OK?  Built and tested for aarch64-elf.


OK, thanks!

Couple of minor nits:

+/* RTX cost tables for aarch64.

s/aarch64/AArch64/

+/* ThunderX does not have implement AARCH32.  */

s/AARCH32/AArch32/

Cheers
/Marcus


 Thanks,
 Andrew Pinski

 PS The corresponding binutils patch is located at
 https://sourceware.org/ml/binutils/2014-10/msg00170.html .


 ChangeLog:
 * doc/invoke.texi (AARCH64/mtune): Document thunderx as an available
 option also.
 * config/aarch64/aarch64-cost-tables.h: New file.
 * config/aarch64/aarch64-cores.def (thunderx): New core.
 * config/aarch64/aarch64-tune.md: Regenerate.
 * config/aarch64/aarch64.c: Include aarch64-cost-tables.h instead of
 config/arm/aarch-cost-tables.h.
 (thunderx_regmove_cost): New variable.
 (thunderx_tunings): New variable.


[PATCH] Add arm_cortex_m7_tune.

2014-10-21 Thread Hale Wang
Hi,

This patch tunes GCC for the Cortex-M7.

The performance of Dhrystone can be improved by 1%.
The performance of Coremark can be improved by 2.3%.

Patch also attached for convenience.  

Is it ok for trunk?

Thanks and Best Regards,
Hale Wang

gcc/ChangeLog
2014-10-11  Hale Wang  hale.w...@arm.com

* config/arm/arm.c: Add cortex-m7 tune.
* config/arm/arm-cores.def: Use cortex-m7 tune.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 56ec7fd..3b34173 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -149,7 +149,7 @@
 ARM_CORE("cortex-r4",		cortexr4,	cortexr4,	7R,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",		cortexr4f,	cortexr4f,	7R,  FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",		cortexr5,	cortexr5,	7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
 ARM_CORE("cortex-r7",		cortexr7,	cortexr7,	7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
-ARM_CORE("cortex-m7",		cortexm7,	cortexm7,	7EM, FL_LDSCHED, v7m)
+ARM_CORE("cortex-m7",		cortexm7,	cortexm7,	7EM, FL_LDSCHED, cortex_m7)
 ARM_CORE("cortex-m4",		cortexm4,	cortexm4,	7EM, FL_LDSCHED, v7m)
 ARM_CORE("cortex-m3",		cortexm3,	cortexm3,	7M,  FL_LDSCHED, v7m)
 ARM_CORE("marvell-pj4",		marvell_pj4,	marvell_pj4,	7A,  FL_LDSCHED, 9e)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 93b989d..834b13a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2003,6 +2003,27 @@ const struct tune_params arm_v7m_tune =
   8						/* Maximum insns to inline memset.  */
 };
 
+/* Cortex-M7 tuning.  */
+
+const struct tune_params arm_cortex_m7_tune =
+{
+  arm_9e_rtx_costs,
+  v7m_extra_costs,
+  NULL,						/* Sched adj cost.  */
+  0,						/* Constant limit.  */
+  0,						/* Max cond insns.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  true,						/* Prefer constant pool.  */
+  arm_cortex_m_branch_cost,
+  false,					/* Prefer LDRD/STRD.  */
+  {true, true},					/* Prefer non short circuit.  */
+  arm_default_vec_cost,				/* Vectorizer costs.  */
+  false,					/* Prefer Neon for 64-bits bitops.  */
+  false, false,					/* Prefer 32-bit encodings.  */
+  false,					/* Prefer Neon for stringops.  */
+  8						/* Maximum insns to inline memset.  */
+};
+
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
    arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus.  */
 const struct tune_params arm_v6m_tune =





[PATCH] Don't put conditional loads/stores into interleaved chains (PR tree-optimization/63563)

2014-10-21 Thread Jakub Jelinek
Hi!

This patch prevents conditional loads/stores to be added into interleaved
groups (where it ICEs later on).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9?

2014-10-21  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/63563
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Bail out
if either dra or drb stmts are not normal loads/stores.

* gcc.target/i386/pr63563.c: New test.

--- gcc/tree-vect-data-refs.c.jj2014-10-03 10:10:42.0 +0200
+++ gcc/tree-vect-data-refs.c   2014-10-20 15:21:47.938679992 +0200
@@ -2551,11 +2551,14 @@ vect_analyze_data_ref_accesses (loop_vec
 over them.  The we can just skip ahead to the next DR here.  */
 
  /* Check that the data-refs have same first location (except init)
-and they are both either store or load (not load and store).  */
+and they are both either store or load (not load and store,
+not masked loads or stores).  */
  if (DR_IS_READ (dra) != DR_IS_READ (drb)
  || !operand_equal_p (DR_BASE_ADDRESS (dra),
   DR_BASE_ADDRESS (drb), 0)
- || !dr_equal_offsets_p (dra, drb))
+ || !dr_equal_offsets_p (dra, drb)
+ || !gimple_assign_single_p (DR_STMT (dra))
+ || !gimple_assign_single_p (DR_STMT (drb)))
break;
 
  /* Check that the data-refs have the same constant size and step.  */
--- gcc/testsuite/gcc.target/i386/pr63563.c.jj  2014-10-20 15:27:17.713745577 
+0200
+++ gcc/testsuite/gcc.target/i386/pr63563.c 2014-10-20 15:27:57.637023020 
+0200
@@ -0,0 +1,17 @@
+/* PR tree-optimization/63563 */
+/* { dg-do compile } */
+/* { dg-options -O3 -mavx2 } */
+
+struct A { unsigned long a, b, c, d; } a[1024] = { { 0, 1, 2, 3 } }, b;
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+{
+  a[i].a = a[i].b = a[i].c = b.c;
+  if (a[i].d)
+   a[i].d = b.d;
+}
+}

Jakub


RE: [PATCH] Add arm_cortex_m7_tune.

2014-10-21 Thread Hale Wang
Attach the patch.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Hale Wang
 Sent: Tuesday, October 21, 2014 5:49 PM
 To: gcc-patches@gcc.gnu.org
 Subject: [PATCH] Add arm_cortex_m7_tune.
 
 Hi,
 
 This patch is used to tune the gcc for Cortex-M7.
 
 The performance of Dhrystone can be improved by 1%.
 The performance of Coremark can be improved by 2.3%.
 
 Patch also attached for convenience.
 
 Is it ok for trunk?
 
 Thanks and Best Regards,
 Hale Wang
 
 gcc/ChangeLog
 2014-10-11  Hale Wang  hale.w...@arm.com
 
 * config/arm/arm.c: Add cortex-m7 tune.
 * config/arm/arm-cores.def: Use cortex-m7 tune.
 
 diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
 index 56ec7fd..3b34173 100644
 --- a/gcc/config/arm/arm-cores.def
 +++ b/gcc/config/arm/arm-cores.def
 @@ -149,7 +149,7 @@ ARM_CORE(cortex-r4,
 cortexr4, cortexr4,  7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f,cortexr4f, cortexr4f,
 7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5,  cortexr5, cortexr5,
 7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
 ARM_CORE(cortex-r7,  cortexr7, cortexr7,
 7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
 -ARM_CORE(cortex-m7,   cortexm7, cortexm7,
 7EM, FL_LDSCHED, v7m)
 +ARM_CORE(cortex-m7,  cortexm7, cortexm7,
 7EM, FL_LDSCHED, cortex_m7)
 ARM_CORE(cortex-m4,   cortexm4, cortexm4,
 7EM, FL_LDSCHED, v7m)
 ARM_CORE(cortex-m3,   cortexm3, cortexm3,
 7M,  FL_LDSCHED, v7m)
 ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4,
 7A,  FL_LDSCHED, 9e)
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index
 93b989d..834b13a 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -2003,6 +2003,27 @@ const struct tune_params arm_v7m_tune =
8
 /* Maximum insns to inline memset.  */
 };
 
 +/* Cortex-M7 tuning.  */
 +
 +const struct tune_params arm_cortex_m7_tune = {
 +  arm_9e_rtx_costs,
 +  v7m_extra_costs,
 +  NULL,
 /* Sched adj cost.  */
 +  0,
 /* Constant limit.  */
 +  0,
 /* Max cond insns.  */
 +  ARM_PREFETCH_NOT_BENEFICIAL,
 +  true,
 /* Prefer constant pool.  */
 +  arm_cortex_m_branch_cost,
 +  false,
 /* Prefer LDRD/STRD.  */
 +  {true, true},
 /* Prefer non short circuit.  */
 +  arm_default_vec_cost,/* Vectorizer costs.  */
 +  false,/* Prefer Neon for
64-bits
 bitops.  */
 +  false, false, /* Prefer 32-bit
encodings.
 */
 +  false,
 /* Prefer Neon for stringops.  */
 +  8
 /* Maximum insns to inline memset.  */
 +};
 +
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
 arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus.
 */
 const struct tune_params arm_v6m_tune =
 
 
 


cortex-m7-tune-2.patch
Description: Binary data


Small multiplier support in Cortex-M0/1/+

2014-10-21 Thread Hale Wang
Hi,

Some configurations of the Cortex-M0 and Cortex-M1 come with a high latency
multiplier. This patch adds support for such configurations.

A small multiplier means using add/sub/shift instructions to replace the mul
instruction on MCUs that have no fast multiplier.
The following strategies are adopted in this patch:
1. Define new CPUs as
-mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,cortex-m1.small-
multiply to support small multiplier.
2. -Os means size is preferred. A threshold of 5 is set, which means splitting
is prevented if it would end up with more than 5 instructions. Without -Os,
there is no such limit.

Some test cases are also added in the testsuite to verify this function.

Is it ok for trunk?

Thanks and Best Regards,
Hale Wang

gcc/ChangeLog:

2014-08-29  Hale Wang  hale.w...@arm.com

* config/arm/arm-cores.def: Add support for
-mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
cortex-m1.small-multiply.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/arm.c: Update the rtx-costs for MUL.
* config/arm/bpabi.h: Handle
-mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
cortex-m1.small-multiply.
* doc/invoke.texi: Document
-mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
cortex-m1.small-multiply.
* testsuite/gcc.target/arm/small-multiply-m0-1.c: New test case.
* testsuite/gcc.target/arm/small-multiply-m0-2.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m0-3.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m0plus-1.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m0plus-2.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m0plus-3.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m1-1.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m1-2.c: Likewise.
* testsuite/gcc.target/arm/small-multiply-m1-3.c: Likewise.

===
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index a830a83..af4b373 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -137,6 +137,11 @@
 ARM_CORE("cortex-m1",			cortexm1,	cortexm1,	6M, FL_LDSCHED, v6m)
 ARM_CORE("cortex-m0",			cortexm0,	cortexm0,	6M, FL_LDSCHED, v6m)
 ARM_CORE("cortex-m0plus",		cortexm0plus,	cortexm0plus,	6M, FL_LDSCHED, v6m)
 
+/* V6M Architecture Processors for small-multiply implementations.  */
+ARM_CORE("cortex-m1.small-multiply",	cortexm1smallmultiply,	cortexm1,	6M, FL_LDSCHED | FL_SMALLMUL, v6m)
+ARM_CORE("cortex-m0.small-multiply",	cortexm0smallmultiply,	cortexm0,	6M, FL_LDSCHED | FL_SMALLMUL, v6m)
+ARM_CORE("cortex-m0plus.small-multiply", cortexm0plussmallmultiply, cortexm0plus,	6M, FL_LDSCHED | FL_SMALLMUL, v6m)
+
 /* V7 Architecture Processors */
 ARM_CORE("generic-armv7-a",		genericv7a,	genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",			cortexa5,	cortexa5,	7A, FL_LDSCHED, cortex_a5)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index bc046a0..bd65bd2 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -241,6 +241,15 @@ EnumValue
 Enum(processor_type) String(cortex-m0plus) Value(cortexm0plus)
 
 EnumValue
+Enum(processor_type) String(cortex-m1.small-multiply) Value(cortexm1smallmultiply)
+
+EnumValue
+Enum(processor_type) String(cortex-m0.small-multiply) Value(cortexm0smallmultiply)
+
+EnumValue
+Enum(processor_type) String(cortex-m0plus.small-multiply) Value(cortexm0plussmallmultiply)
+
+EnumValue
 Enum(processor_type) String(generic-armv7-a) Value(genericv7a)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 954cab8..8b5c778 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -25,6 +25,7 @@
arm1176jzs,arm1176jzfs,mpcorenovfp,
mpcore,arm1156t2s,arm1156t2fs,
cortexm1,cortexm0,cortexm0plus,
+	cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply,
genericv7a,cortexa5,cortexa7,
cortexa8,cortexa9,cortexa12,
cortexa15,cortexr4,cortexr4f,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 93b989d..5062c85 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -751,6 +751,8 @@ static int thumb_call_reg_needed;
 #define FL_ARCH8      (1 << 24)	/* Architecture 8.  */
 #define FL_CRC32      (1 << 25)	/* ARMv8 CRC32 instructions.  */
 
+#define FL_SMALLMUL   (1 << 26)	/* Small multiply supported.  */
+
 #define FL_IWMMXT     (1 << 29)	/* XScale v2 or Intel Wireless MMX technology.  */
 #define FL_IWMMXT2    (1 << 30)	/* Intel Wireless MMX2 technology.  */
 
@@ -914,6 +916,9 @@ int arm_condexec_masklen = 0;
 /* Nonzero if chip supports the ARMv8 CRC 

Re: [PATCHv4][Kasan] Allow to override Asan shadow offset from command line

2014-10-21 Thread Yury Gribov

On 10/17/2014 04:32 PM, Ian Lance Taylor wrote:

Jakub Jelinek ja...@redhat.com writes:


Not sure if there aren't extra steps to make strtoull prototype available
in system.h, libiberty.h etc. for systems that don't have strtoull in their
headers.


See the

#if defined(HAVE_DECL_XXX)  !HAVE_DECL_XXX

lines in include/libiberty.h.  Although strtol is missing there as well.


Thanks, here is a new version of the patch which adds support for strtoll
and strtoull, and also a small test for the strtol family.


There some open questions though:

1) how am I to test libiberty patches like this? Glibc obviously has
strtoll and strtoull, so I won't get them in my libiberty.a by default. For
now I manually embed strtoll.o/strtoull.o into libiberty.a and run the tests
by hand, but that doesn't sound very robust, and testing just x64 is
probably not enough either.


2) I had to use __extension__ keyword to hide warnings from -Wlong-long. 
Unfortunately it does not fix spurious warnings for long long constants 
in older versions of GCC 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7263). I guess that's fine.


-Y

commit d78d8daeec19531014c703894dc67db1430f6cbb
Author: Yury Gribov y.gri...@samsung.com
Date:   Thu Oct 16 18:31:10 2014 +0400

Add strtoll and strtoull to libiberty.

2014-10-20  Yury Gribov  y.gri...@samsung.com

include/
	* libiberty.h (strtol, strtoul, strtoll, strtoull): New prototypes.

libiberty/
	* strtoll.c: New file.
	* strtoull.c: New file.
	* configure.ac: Add long long checks. Add harness for strtoll and
	strtoull. Check decls for strtol, strtoul, strtoll, strtoull.
	* Makefile.in (CFILES, CONFIGURED_OFILES): Added strtoll and strtoull.
	* config.in: Regenerate.
	* configure: Regenerate.
	* functions.texi: Regenerate.
	* testsuite/Makefile.in (check-strtol): New rule.
	(test-strtol): Likewise.
	(mostlyclean): Clean up strtol test.
	* testsuite/test-strtol.c: New test.

diff --git a/include/libiberty.h b/include/libiberty.h
index d09c9a5..26355a9 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -655,6 +655,33 @@ extern size_t strnlen (const char *, size_t);
 extern int strverscmp (const char *, const char *);
 #endif
 
+#if defined(HAVE_DECL_STRTOL) && !HAVE_DECL_STRTOL
+extern long int strtol (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOUL) && !HAVE_DECL_STRTOUL
+extern unsigned long int strtoul (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOLL) && !HAVE_DECL_STRTOLL
+__extension__
+extern long long int strtoll (const char *nptr,
+  char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRTOULL) && !HAVE_DECL_STRTOULL
+__extension__
+extern unsigned long long int strtoull (const char *nptr,
+char **endptr, int base);
+#endif
+
+#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP
+/* Compare version strings.  */
+extern int strverscmp (const char *, const char *);
+#endif
+
 /* Set the title of a process */
 extern void setproctitle (const char *name, ...);
 
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index 9b87720..1b0d8ae 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -152,8 +152,8 @@ CFILES = alloca.c argv.c asprintf.c atexit.c\
 	 spaces.c splay-tree.c stack-limit.c stpcpy.c stpncpy.c		\
 	 strcasecmp.c strchr.c strdup.c strerror.c strncasecmp.c	\
 	 strncmp.c strrchr.c strsignal.c strstr.c strtod.c strtol.c	\
-	 strtoul.c strndup.c strnlen.c strverscmp.c			\
-	timeval-utils.c tmpnam.c	\
+	 strtoll.c strtoul.c strtoull.c strndup.c strnlen.c \
+	 strverscmp.c timeval-utils.c tmpnam.c\
 	unlink-if-ordinary.c		\
 	vasprintf.c vfork.c vfprintf.c vprintf.c vsnprintf.c vsprintf.c	\
 	waitpid.c			\
@@ -219,8 +219,8 @@ CONFIGURED_OFILES = ./asprintf.$(objext) ./atexit.$(objext)		\
 	 ./strchr.$(objext) ./strdup.$(objext) ./strncasecmp.$(objext)	\
 	 ./strncmp.$(objext) ./strndup.$(objext) ./strnlen.$(objext)	\
 	 ./strrchr.$(objext) ./strstr.$(objext) ./strtod.$(objext)	\
-	 ./strtol.$(objext) ./strtoul.$(objext) ./strverscmp.$(objext)	\
-	./tmpnam.$(objext)		\
+	 ./strtol.$(objext) ./strtoul.$(objext) strtoll.$(objext)	\
+	./strtoull.$(objext) ./tmpnam.$(objext) ./strverscmp.$(objext)	\
 	./vasprintf.$(objext) ./vfork.$(objext) ./vfprintf.$(objext)	\
 	 ./vprintf.$(objext) ./vsnprintf.$(objext) ./vsprintf.$(objext)	\
 	./waitpid.$(objext)
@@ -694,6 +694,17 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	else true; fi
 	$(COMPILE.c) $(srcdir)/crc32.c $(OUTPUT_OPTION)
 
+./d-demangle.$(objext): $(srcdir)/d-demangle.c config.h $(INCDIR)/ansidecl.h \
+	$(INCDIR)/demangle.h $(INCDIR)/libiberty.h \
+	$(INCDIR)/safe-ctype.h
+	if [ x$(PICFLAG) != x ]; then \
+	  $(COMPILE.c) $(PICFLAG) $(srcdir)/d-demangle.c -o pic/$@; \
+	else true; fi
+	if [ 

Re: [Patch, libstdc++/63497] Avoid dereferencing invalid iterator in regex_executor

2014-10-21 Thread Jonathan Wakely

On 20/10/14 10:23 -0700, Tim Shen wrote:

Bootstrapped and tested.


Did you manage to produce a testcase that crashed on trunk?


@@ -407,25 +409,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
	   bool __dfs_mode>
    bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-    _M_word_boundary(_State<_TraitsT>) const
+    _M_word_boundary(_State<_TraitsT>)
    {
-      // By definition.
-      bool __ans = false;
-      auto __pre = _M_current;
-      --__pre;
-      if (!(_M_at_begin() && _M_at_end()))
+      bool __left_is_word = false;
+      if (_M_current != _M_begin
+	  || (_M_flags & regex_constants::match_prev_avail))
	{
-	  if (_M_at_begin())
-	    __ans = _M_is_word(*_M_current)
-	      && !(_M_flags & regex_constants::match_not_bow);
-	  else if (_M_at_end())
-	    __ans = _M_is_word(*__pre)
-	      && !(_M_flags & regex_constants::match_not_eow);
-	  else
-	    __ans = _M_is_word(*_M_current)
-	      != _M_is_word(*__pre);
+	  --_M_current;
+	  if (_M_is_word(*_M_current))
+	    __left_is_word = true;
+	  ++_M_current;


Is it really necessary to modify _M_current here?
Couldn't you do:

  auto __pre = _M_current;
  if (_M_is_word(*--__pre))
__left_is_word = true;

Then the function could remain const, couldn't it?
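
For reference, here is a standalone sketch of that approach (my own
illustration, not the libstdc++ code; the word-character test below is just
a placeholder for _M_is_word):

#include <cctype>
#include <string>

// Check whether the character just before 'cur' is a word character,
// using a local copy of the iterator so the caller's iterator is left
// untouched and the enclosing member function can stay const.
template<typename BiIter>
bool
left_is_word (BiIter begin, BiIter cur)
{
  if (cur == begin)
    return false;                       // nothing to the left of the start
  BiIter pre = cur;                     // local copy only
  --pre;
  unsigned char c = static_cast<unsigned char> (*pre);
  return std::isalnum (c) || c == '_';
}

int
main ()
{
  std::string s = "ab cd";
  // 'b' precedes position 2 and is a word character, so this returns 0.
  return left_is_word (s.begin (), s.begin () + 2) ? 0 : 1;
}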


Re: The nvptx port [0/11+]

2014-10-21 Thread Bernd Schmidt

On 10/21/2014 10:18 AM, Richard Biener wrote:

So with this restriction I wonder why it didn't make sense to go the
HSA backend route emitting PTX from a GIMPLE SSA pass.  This
would have avoided the LTO dance as well ...


Quite simple - there isn't an established way to do this. If I'd known 
you were doing something like this when I started the work I might have 
looked into that approach.



Bernd



Re: The nvptx port [0/11+]

2014-10-21 Thread Bernd Schmidt

On 10/21/2014 10:42 AM, Jakub Jelinek wrote:

On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:

  * Can't emit initializers referring to their variable's address since
you can't write forward declarations for variables.


Can't that be handled by emitting the initializer without the address and
some constructor that fixes up the initializer at runtime?


That reminds me that constructors are something I forgot to add to the 
list. I'm thinking about making these work with some trickery in the 
linker, but at the moment they are unsupported.
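
As an aside, a minimal source-level sketch of the constructor-fixup idea
(my own illustration of what Jakub describes, not something the port does
today):

#include <cstddef>

struct node { node *self; int v; };

// What a host target can emit as a static initializer:
node a = { &a, 1 };

// The workaround sketch: emit the self-referential field as null and
// patch it at runtime, before main() runs.
node b = { nullptr, 2 };

__attribute__((constructor))
static void
fixup_b (void)
{
  b.self = &b;
}

int
main ()
{
  return (a.self == &a && b.self == &b) ? 0 : 1;
}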



Can't you represent structures and unions as arrays of chars?
For constant initializers that don't need relocations the compiler can
surely turn them into arrays of char initializers (e.g. fold-const.c
native_encode_expr/native_interpret_expr could be used for that).
Supposedly it would mean slower than perhaps necessary loads/stores of
aligned larger fields from the structure, but if it is an alternative to
not supporting structures/unions at all, that sounds like so severe
limitation that it can be pretty fatal for the target.


Oh, structs and unions are supported, and essentially that's what I'm 
doing - I choose a base integer type to represent them. That happens to 
be the size of a pointer, so properly aligned symbol refs can be 
emitted. It's just the packed ones that can't be done.



  * No support for indirect jumps, label values, nonlocal gotos.


Not even indirect calls?  How do you implement C++ or Fortran vtables?


Indirect calls do exist.


Bernd



[2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-10-21 Thread Terry Guo
Hi there,

The attached patch intends to enable GCC to generate UAL format code for the
Thumb-1 target. Tested with the regression suite with no regressions. Is it OK for trunk?

BR,
Terry

2014-10-21  Terry Guo  terry@arm.com

   * config/arm/arm.c (arm_output_mi_thunk): Use UAL for Thumb1
target.
   * config/arm/thumb1.md: Likewise.

gcc/testsuite
2014-10-21  Terry Guo  terry@arm.com

* gcc.target/arm/anddi_notdi-1.c: Match with UAL format.
* gcc.target/arm/pr40956.c: Likewise.
* gcc.target/arm/thumb1-Os-mult.c: Likewise.
 * gcc.target/arm/thumb1-load-64bit-constant-3.c: Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9ccf73c..dc73244 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28615,12 +28615,14 @@ arm_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 	  fputs ("\tldr\tr3, ", file);
 	  assemble_name (file, label);
 	  fputs ("+4\n", file);
-	  asm_fprintf (file, "\t%s\t%r, %r, r3\n",
+	  asm_fprintf (file, "\t%ss\t%r, %r, r3\n",
 		       mi_op, this_regno, this_regno);
 	}
       else if (mi_delta != 0)
 	{
-	  asm_fprintf (file, "\t%s\t%r, %r, #%d\n",
+	  /* Thumb1 unified syntax requires s suffix in instruction name when
+	     one of the operands is immediate.  */
+	  asm_fprintf (file, "\t%ss\t%r, %r, #%d\n",
 		       mi_op, this_regno, this_regno,
 		       mi_delta);
 	}
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 020d83b..8a2abe9 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -29,7 +29,7 @@
(clobber (reg:CC CC_REGNUM))
   ]
   TARGET_THUMB1
-  add\\t%Q0, %Q0, %Q2\;adc\\t%R0, %R0, %R2
+  adds\\t%Q0, %Q0, %Q2\;adcs\\t%R0, %R0, %R2
   [(set_attr length 4)
(set_attr type multiple)]
 )
@@ -42,9 +42,9 @@
   *
static const char * const asms[] =
{
- \add\\t%0, %0, %2\,
- \sub\\t%0, %0, #%n2\,
- \add\\t%0, %1, %2\,
+ \adds\\t%0, %0, %2\,
+ \subs\\t%0, %0, #%n2\,
+ \adds\\t%0, %1, %2\,
  \add\\t%0, %0, %2\,
  \add\\t%0, %0, %2\,
  \add\\t%0, %1, %2\,
@@ -56,7 +56,7 @@
     if ((which_alternative == 2 || which_alternative == 6)
	 && CONST_INT_P (operands[2])
	 && INTVAL (operands[2]) < 0)
-      return \"sub\\t%0, %1, #%n2\";
+      return (which_alternative == 2) ? \"subs\\t%0, %1, #%n2\" : \"sub\\t%0, %1, #%n2\";
return asms[which_alternative];
   
reload_completed  CONST_INT_P (operands[2])
@@ -105,7 +105,7 @@
  (match_operand:DI 2 register_operand  l)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB1
-  sub\\t%Q0, %Q0, %Q2\;sbc\\t%R0, %R0, %R2
+  subs\\t%Q0, %Q0, %Q2\;sbcs\\t%R0, %R0, %R2
   [(set_attr length 4)
(set_attr type multiple)]
 )
@@ -115,7 +115,7 @@
(minus:SI (match_operand:SI 1 register_operand l)
  (match_operand:SI 2 reg_or_int_operand lPd)))]
   TARGET_THUMB1
-  sub\\t%0, %1, %2
+  subs\\t%0, %1, %2
   [(set_attr length 2)
(set_attr conds set)
(set_attr type alus_sreg)]
@@ -133,9 +133,9 @@
  TARGET_THUMB1  !arm_arch6
   *
   if (which_alternative  2)
-return \mov\\t%0, %1\;mul\\t%0, %2\;
+return \mov\\t%0, %1\;muls\\t%0, %2\;
   else
-return \mul\\t%0, %2\;
+return \muls\\t%0, %2\;
   
   [(set_attr length 4,4,2)
(set_attr type muls)]
@@ -147,9 +147,9 @@
 (match_operand:SI 2 register_operand l,0,0)))]
   TARGET_THUMB1  arm_arch6
   @
-   mul\\t%0, %2
-   mul\\t%0, %1
-   mul\\t%0, %1
+   muls\\t%0, %2
+   muls\\t%0, %1
+   muls\\t%0, %1
   [(set_attr length 2)
(set_attr type muls)]
 )
@@ -159,7 +159,7 @@
(and:SI (match_operand:SI 1 register_operand %0)
(match_operand:SI 2 register_operand l)))]
   TARGET_THUMB1
-  and\\t%0, %2
+  ands\\t%0, %2
   [(set_attr length 2)
(set_attr type  logic_imm)
(set_attr conds set)])
@@ -202,7 +202,7 @@
(and:SI (not:SI (match_operand:SI 1 register_operand l))
(match_operand:SI 2 register_operand 0)))]
   TARGET_THUMB1
-  bic\\t%0, %1
+  bics\\t%0, %1
   [(set_attr length 2)
(set_attr conds set)
(set_attr type logics_reg)]
@@ -213,7 +213,7 @@
(ior:SI (match_operand:SI 1 register_operand %0)
(match_operand:SI 2 register_operand l)))]
   TARGET_THUMB1
-  orr\\t%0, %2
+  orrs\\t%0, %2
   [(set_attr length 2)
(set_attr conds set)
(set_attr type logics_reg)])
@@ -223,7 +223,7 @@
(xor:SI (match_operand:SI 1 register_operand %0)
(match_operand:SI 2 register_operand l)))]
   TARGET_THUMB1
-  eor\\t%0, %2
+  eors\\t%0, %2
   [(set_attr length 2)
(set_attr conds set)
(set_attr type logics_reg)]
@@ -234,7 +234,7 @@
(ashift:SI (match_operand:SI 1 register_operand l,0)
   (match_operand:SI 2 nonmemory_operand N,l)))]
   TARGET_THUMB1
-  lsl\\t%0, %1, %2
+  lsls\\t%0, %1, %2
   [(set_attr length 2)

Re: [PATCH] Don't put conditional loads/stores into interleaved chains (PR tree-optimization/63563)

2014-10-21 Thread Richard Biener
On Tue, 21 Oct 2014, Jakub Jelinek wrote:

 Hi!
 
 This patch prevents conditional loads/stores to be added into interleaved
 groups (where it ICEs later on).
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9?

Ok.

Thanks,
Richard.

 2014-10-21  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/63563
   * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Bail out
   if either dra or drb stmts are not normal loads/stores.
 
   * gcc.target/i386/pr63563.c: New test.
 
 --- gcc/tree-vect-data-refs.c.jj  2014-10-03 10:10:42.0 +0200
 +++ gcc/tree-vect-data-refs.c 2014-10-20 15:21:47.938679992 +0200
 @@ -2551,11 +2551,14 @@ vect_analyze_data_ref_accesses (loop_vec
over them.  The we can just skip ahead to the next DR here.  */
  
 /* Check that the data-refs have same first location (except init)
 -  and they are both either store or load (not load and store).  */
 +  and they are both either store or load (not load and store,
 +  not masked loads or stores).  */
 if (DR_IS_READ (dra) != DR_IS_READ (drb)
 || !operand_equal_p (DR_BASE_ADDRESS (dra),
  DR_BASE_ADDRESS (drb), 0)
 -   || !dr_equal_offsets_p (dra, drb))
 +   || !dr_equal_offsets_p (dra, drb)
 +   || !gimple_assign_single_p (DR_STMT (dra))
 +   || !gimple_assign_single_p (DR_STMT (drb)))
   break;
  
 /* Check that the data-refs have the same constant size and step.  */
 --- gcc/testsuite/gcc.target/i386/pr63563.c.jj2014-10-20 
 15:27:17.713745577 +0200
 +++ gcc/testsuite/gcc.target/i386/pr63563.c   2014-10-20 15:27:57.637023020 
 +0200
 @@ -0,0 +1,17 @@
 +/* PR tree-optimization/63563 */
 +/* { dg-do compile } */
 +/* { dg-options "-O3 -mavx2" } */
 +
 +struct A { unsigned long a, b, c, d; } a[1024] = { { 0, 1, 2, 3 } }, b;
 +
 +void
 +foo (void)
 +{
 +  int i;
 +  for (i = 0; i < 1024; i++)
 +{
 +  a[i].a = a[i].b = a[i].c = b.c;
 +  if (a[i].d)
 + a[i].d = b.d;
 +}
 +}
 
   Jakub
 


Re: The nvptx port [0/11+]

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 12:53 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 10/21/2014 10:18 AM, Richard Biener wrote:

 So with this restriction I wonder why it didn't make sense to go the
 HSA backend route emitting PTX from a GIMPLE SSA pass.  This
 would have avoided the LTO dance as well ...


 Quite simple - there isn't an established way to do this. If I'd known you
 were doing something like this when I started the work I might have looked
 into that approach.

Ah, I see.  I think having both ways now is good so we can compare
pros and cons in practice (and make further targets follow the better
approach if there is one).

Richard.


 Bernd



Re: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 11:28 AM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 Hi Richard,

 I realized thanks to Christophe Lyon that a shift was not right: the shift 
 count
 is a number of bytes instead of a number of bits.

 This extra patch fixes the problem.

Ok.

Thanks,
Richard.

 ChangeLog are as follows:

 *** gcc/ChangeLog ***

 2014-09-26  Thomas Preud'homme  thomas.preudho...@arm.com

 * tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of
 MARKER_BYTE_UNKNOWN markers when handling casts.

 *** gcc/testsuite/ChangeLog ***

 2014-10-08  Thomas Preud'homme  thomas.preudho...@arm.com

 * gcc.dg/optimize-bswaphi-1.c: New bswap pass test.

 diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c 
 b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
 index 3e51f04..18aba28 100644
 --- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
 +++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
 @@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
  }

 +typedef int SItype __attribute__ ((mode (SI)));
 +typedef int HItype __attribute__ ((mode (HI)));
 +
 +/* Test that detection of significant sign extension works correctly. This
 +   checks that unknown byte marker are set correctly in cast of cast.  */
 +
 +HItype
 +swap16 (HItype in)
 +{
 +  return (HItype) (((in >> 0) & 0xFF) << 8)
 +	   | (((in >> 8) & 0xFF) << 0);
 +}
 +
  /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */
 -/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
 +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */
 +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
  /* { dg-final { cleanup-tree-dump "bswap" } } */
 diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
 index 3c6e935..2ef2333 100644
 --- a/gcc/tree-ssa-math-opts.c
 +++ b/gcc/tree-ssa-math-opts.c
 @@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 	  if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
 	      && HEAD_MARKER (n->n, old_type_size))
 	    for (i = 0; i < type_size - old_type_size; i++)
 -	      n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i);
 +	      n->n |= MARKER_BYTE_UNKNOWN
 +		      << ((type_size - 1 - i) * BITS_PER_MARKER);
 
 	  if (type_size < 64 / BITS_PER_MARKER)
 	    {

 regression testsuite run without regression on x86_64-linux-gnu and bswap 
 tests all pass on arm-none-eabi target
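
 To make the byte-versus-bit distinction concrete, here is a tiny standalone
 illustration (my own, with assumed values for BITS_PER_MARKER and
 MARKER_BYTE_UNKNOWN; it is not the pass's code): each tracked byte occupies
 BITS_PER_MARKER bits of the symbolic number, so placing a marker at byte
 position i needs a shift of i * BITS_PER_MARKER bits, not i.

 #include <cstdint>
 #include <cstdio>

 // Assumed values, for illustration only.
 static const int BITS_PER_MARKER = 8;
 static const std::uint64_t MARKER_BYTE_UNKNOWN = 0xff;

 int
 main ()
 {
   std::uint64_t n = 0;
   int byte_pos = 3;                                   // mark byte 3 as unknown
   n |= MARKER_BYTE_UNKNOWN << (byte_pos * BITS_PER_MARKER);
   std::printf ("%016llx\n", (unsigned long long) n);  // 00000000ff000000
   return 0;
 }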

 Is it ok for trunk?

 Best regards,

 Thomas

 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, September 24, 2014 4:01 PM
 To: Thomas Preud'homme
 Cc: GCC Patches
 Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension
 in bswap

 On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
  Hi all,
 
  The fix for PR61306 disabled bswap when a sign extension is detected.
 However this led to a test case regression (and potential performance
 regression) in case where a sign extension happens but its effect is
 canceled by other bit manipulation. This patch aims to fix that by having a
 special marker to track bytes whose value is unpredictable due to sign
 extension. If the final result of a bit manipulation doesn't contain any
 such marker then the bswap optimization can proceed.

 Nice and simple idea.

 Ok.

 Thanks,
 Richard.

  *** gcc/ChangeLog ***
 
  2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/63266
  * tree-ssa-math-opts.c (struct symbolic_number): Add comment
 about
  marker for unknown byte value.
  (MARKER_MASK): New macro.
  (MARKER_BYTE_UNKNOWN): New macro.
  (HEAD_MARKER): New macro.
  (do_shift_rotate): Mark bytes with unknown values due to sign
  extension when doing an arithmetic right shift. Replace hardcoded
  mask for marker by new MARKER_MASK macro.
  (find_bswap_or_nop_1): Likewise and adjust ORing of two
 symbolic
  numbers accordingly.
 
  *** gcc/testsuite/ChangeLog ***
 
  2014-09-15  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/63266
  * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.
 
 
  Testing:
 
  * Built an arm-none-eabi-gcc cross-compiler and used it to run the
 testsuite on QEMU emulating Cortex-M3 without any regression
  * Bootstrapped on x86_64-linux-gnu target and testsuite was run
 without regression
 
 
  Ok for trunk?






[C++ Patch] Add default arguments to cp_parser_unary_expression

2014-10-21 Thread Paolo Carlini

Hi,

another patchlet along the lines of the other I proposed over the last 
weeks: this one should be really uncontroversial, because it turns out that 
in all but one case we are passing all NULL / false arguments.


Tested x86_64-linux.

Thanks,
Paolo.

/


Re: [C++ Patch] Add default arguments to cp_parser_unary_expression

2014-10-21 Thread Paolo Carlini

... the patch.

Paolo.

//
2014-10-21  Paolo Carlini  paolo.carl...@oracle.com

* parser.c (cp_parser_unary_expression): Add default arguments.
(cp_parser_cast_expression, cp_parser_sizeof_operand,
cp_parser_omp_atomic): Adjust.
Index: parser.c
===
--- parser.c(revision 216502)
+++ parser.c(working copy)
@@ -1968,7 +1968,7 @@ enum { non_attr = 0, normal_attr = 1, id_attr = 2
 static void cp_parser_pseudo_destructor_name
   (cp_parser *, tree, tree *, tree *);
 static tree cp_parser_unary_expression
-  (cp_parser *, bool, bool, cp_id_kind *);
+  (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false, bool = false);
 static enum tree_code cp_parser_unary_operator
   (cp_token *);
 static tree cp_parser_new_expression
@@ -7104,8 +7104,8 @@ cp_parser_pseudo_destructor_name (cp_parser* parse
Returns a representation of the expression.  */
 
 static tree
-cp_parser_unary_expression (cp_parser *parser, bool address_p, bool cast_p,
-   bool decltype_p, cp_id_kind * pidk)
+cp_parser_unary_expression (cp_parser *parser, cp_id_kind * pidk,
+   bool address_p, bool cast_p, bool decltype_p)
 {
   cp_token *token;
   enum tree_code unary_operator;
@@ -7381,14 +7381,6 @@ static tree
   pidk);
 }
 
-static inline tree
-cp_parser_unary_expression (cp_parser *parser, bool address_p, bool cast_p,
-   cp_id_kind * pidk)
-{
-  return cp_parser_unary_expression (parser, address_p, cast_p,
-/*decltype*/false, pidk);
-}
-
 /* Returns ERROR_MARK if TOKEN is not a unary-operator.  If TOKEN is a
unary-operator, the corresponding tree code is returned.  */
 
@@ -8018,8 +8010,8 @@ cp_parser_cast_expression (cp_parser *parser, bool
 
   /* If we get here, then it's not a cast, so it must be a
  unary-expression.  */
-  return cp_parser_unary_expression (parser, address_p, cast_p,
-decltype_p, pidk);
+  return cp_parser_unary_expression (parser, pidk, address_p,
+cast_p, decltype_p);
 }
 
 /* Parse a binary expression of the general form:
@@ -24374,8 +24366,7 @@ cp_parser_sizeof_operand (cp_parser* parser, enum
   /* If the type-id production did not work out, then we must be
  looking at the unary-expression production.  */
   if (!expr)
-expr = cp_parser_unary_expression (parser, /*address_p=*/false,
-  /*cast_p=*/false, NULL);
+expr = cp_parser_unary_expression (parser);
 
   /* Go back to evaluating expressions.  */
   --cp_unevaluated_operand;
@@ -29039,8 +29030,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
 {
 case OMP_ATOMIC_READ:
 case NOP_EXPR: /* atomic write */
-  v = cp_parser_unary_expression (parser, /*address_p=*/false,
- /*cast_p=*/false, NULL);
+  v = cp_parser_unary_expression (parser);
   if (v == error_mark_node)
goto saw_error;
   if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
@@ -29048,8 +29038,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
   if (code == NOP_EXPR)
lhs = cp_parser_expression (parser);
   else
-   lhs = cp_parser_unary_expression (parser, /*address_p=*/false,
- /*cast_p=*/false, NULL);
+   lhs = cp_parser_unary_expression (parser);
   if (lhs == error_mark_node)
goto saw_error;
   if (code == NOP_EXPR)
@@ -29070,8 +29059,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
}
   else
{
- v = cp_parser_unary_expression (parser, /*address_p=*/false,
- /*cast_p=*/false, NULL);
+ v = cp_parser_unary_expression (parser);
  if (v == error_mark_node)
goto saw_error;
  if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
@@ -29082,8 +29070,7 @@ cp_parser_omp_atomic (cp_parser *parser, cp_token
 }
 
 restart:
-  lhs = cp_parser_unary_expression (parser, /*address_p=*/false,
-   /*cast_p=*/false, NULL);
+  lhs = cp_parser_unary_expression (parser);
   orig_lhs = lhs;
   switch (TREE_CODE (lhs))
 {
@@ -29322,14 +29309,12 @@ stmt_done:
 {
   if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON))
goto saw_error;
-  v = cp_parser_unary_expression (parser, /*address_p=*/false,
- /*cast_p=*/false, NULL);
+  v = cp_parser_unary_expression (parser);
   if (v == error_mark_node)
goto saw_error;
   if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
goto saw_error;
-  lhs1 = cp_parser_unary_expression (parser, /*address_p=*/false,
-/*cast_p=*/false, NULL);
+  lhs1 = cp_parser_unary_expression 

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Yuri Rumyantsev
Richard,

I did some changes in patch and ChangeLog to mark that support for
if-convert of blocks with only critical incoming edges will be added
in the future (more precise in patch.4).

Could you please review it.

Thanks.

ChangeLog:

2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
if destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_preds_critical_p): New function.
(if_convertible_bb_p): Use call of all_preds_critical_p
to reject temporarily block if-conversion with incoming critical edges
if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
after adding support for extended predication.
(predicate_bbs): Skip loop exit block also.  Invoke build2_loc
to compute predicate instead of fold_build2_loc.
Add zeroing of edge 'aux' field.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and at least one incoming edge is not critical original
algorithm is used.
(tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
Nullify 'aux' field of edges for blocks with two successors.
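
For context, a minimal loop of the shape this series targets (my own
illustration, not taken from the patch): under #pragma omp simd, the
multi-way conditional below produces a join-block PHI with more than two
arguments, one of the cases the ChangeLog above is about.

void
foo (int *a, const int *b, const int *c, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    {
      int t;
      if (b[i] > 0)
        t = c[i];
      else if (c[i] > 0)
        t = b[i];
      else
        t = 0;
      a[i] = t;   /* The PHI for t at the join block has three arguments.  */
    }
}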

2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume that one of
 incoming edge to bb containing given phi has at least one non-critical
 edge and choose it to insert predicated code. But if we choose
 critical edge we need to determine insert point and insertion
 direction (before/after) since in other case we can get invalid ssa
 form (use before def). This is done by my new function which is not in
 current patch ( I will present this patch later). SO I assume that we
 need to leave this patch as it is to not introduce new bugs.

 Thanks.
 Yuri.

 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I reworked the patch as you proposed, but I didn't understand what
 did you mean by:

So please rework the patch so critical edges are always handled
correctly.

 In current patch flag_force_vectorize is used (1) to reject phi nodes
 with more than 2 arguments; (2) to reject basic blocks with only
 critical incoming edges since support for extended predication of phi
 nodes will be in next patch.

 I mean that (2) should not be rejected dependent on flag_force_vectorize.
 It was rejected because if-cvt couldn't handle it correctly before but with
 this patch this is fixed.  I see no reason to still reject this then even
 for !flag_force_vectorize.

 Rejecting PHIs with more than two arguments with flag_force_vectorize
 is ok.

 Richard.

 Could you please clarify your statement.

 I attached modified patch.

 ChangeLog:

 2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_edges_are_critical): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject block if-conversion with incoming critical edges only if
 FLAG_FORCE_VECTORIZE was not set-up.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and atleast one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.




 2014-10-17 13:09 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 Here is reduced patch as you requested. All your remarks have been fixed.
 Could you please look at it ( I have already sent the patch with
 changes in add_to_predicate_list for review).

 + if (dump_file  (dump_flags  TDF_DETAILS))
 +   

[PATCH] Fix obvious errors in IPA devirt and tree-prof testcases

2014-10-21 Thread Richard Biener

They still FAIL though:

FAIL: g++.dg/tree-prof/pr35545.C execution: file pr35545.gcda does not exist,  -fprofile-generate -D_PROFILE_GENERATE
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++98  scan-tree-dump-times optimized "return 2" 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++11  scan-tree-dump-times optimized "return 2" 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++1y  scan-tree-dump-times optimized "return 2" 2

Honza - please test your patches better.

Richard.

2014-10-21  Richard Biener  rguent...@suse.de

* g++.dg/ipa/devirt-42.C: Fix dump scanning routines.
* g++.dg/ipa/devirt-46.C: Likewise.
* g++.dg/ipa/devirt-47.C: Likewise.
* g++.dg/tree-prof/pr35545.C: Likewise.

Index: gcc/testsuite/g++.dg/ipa/devirt-42.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-42.C(revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-42.C(working copy)
@@ -31,8 +31,8 @@ main()
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 2 "inline"  } } */
 
 /* Verify that speculation is optimized by late optimizers.  */
-/* { dg-final { scan-ipa-dump-times "return 2" 2 "optimized"  } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized"  } } */
+/* { dg-final { scan-tree-dump-times "return 2" 2 "optimized"  } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized"  } } */
 
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/ipa/devirt-46.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-46.C(revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-46.C(working copy)
@@ -21,7 +21,7 @@ m()
 }
 
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*B::foo" 1 "inline"  } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized"  } } */
-/* { dg-final { scan-ipa-dump-not "abort" "optimized"  } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized"  } } */
+/* { dg-final { scan-tree-dump-not "abort" "optimized"  } } */
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/ipa/devirt-47.C
===
--- gcc/testsuite/g++.dg/ipa/devirt-47.C(revision 216506)
+++ gcc/testsuite/g++.dg/ipa/devirt-47.C(working copy)
@@ -24,8 +24,8 @@ m()
 }
 
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*C::_ZTh" 1 "inline"  } } */
-/* { dg-final { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized"  } } */
+/* { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "optimized"  } } */
 /* FIXME: We ought to inline thunk.  */
-/* { dg-final { scan-ipa-dump "C::_ZThn" "optimized"  } } */
+/* { dg-final { scan-tree-dump "C::_ZThn" "optimized"  } } */
 /* { dg-final { cleanup-ipa-dump "inline" } } */
-/* { dg-final { cleanup-ipa-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/g++.dg/tree-prof/pr35545.C
===
--- gcc/testsuite/g++.dg/tree-prof/pr35545.C(revision 216506)
+++ gcc/testsuite/g++.dg/tree-prof/pr35545.C(working copy)
@@ -48,5 +48,5 @@ int main()
 }
 /* { dg-final-use { scan-ipa-dump "Indirect call -> direct call" "profile_estimate" } } */
 /* { dg-final-use { cleanup-ipa-dump "profile" } } */
-/* { dg-final-use { scan-ipa-dump-not "OBJ_TYPE_REF" "optimized" } } */
+/* { dg-final-use { scan-tree-dump-not "OBJ_TYPE_REF" "optimized" } } */
 /* { dg-final-use { cleanup-tree-dump "optimized" } } */


Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

But the same reasoning applies to this version of the patch when
flag_force_vectorize is true!?  (insertion point and invalid SSA form)

Which means the patch doesn't make sense in isolation?

Btw, I think for the case you should simply do gsi_insert_on_edge ()
and commit_edge_insertions () before the call to combine_blocks
(pushing the edge predicate to the newly created block).

Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_preds_critical_p): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject temporarily block if-conversion with incoming critical edges
 if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
 after adding support for extended predication.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and at least one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.

 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume that one of
 incoming edge to bb containing given phi has at least one non-critical
 edge and choose it to insert predicated code. But if we choose
 critical edge we need to determine insert point and insertion
 direction (before/after) since in other case we can get invalid ssa
 form (use before def). This is done by my new function which is not in
 current patch ( I will present this patch later). SO I assume that we
 need to leave this patch as it is to not introduce new bugs.

 Thanks.
 Yuri.

 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I reworked the patch as you proposed, but I didn't understand what
 did you mean by:

So please rework the patch so critical edges are always handled
correctly.

 In current patch flag_force_vectorize is used (1) to reject phi nodes
 with more than 2 arguments; (2) to reject basic blocks with only
 critical incoming edges since support for extended predication of phi
 nodes will be in next patch.

 I mean that (2) should not be rejected dependent on flag_force_vectorize.
 It was rejected because if-cvt couldn't handle it correctly before but with
 this patch this is fixed.  I see no reason to still reject this then even
 for !flag_force_vectorize.

 Rejecting PHIs with more than two arguments with flag_force_vectorize
 is ok.

 Richard.

 Could you please clarify your statement.

 I attached modified patch.

 ChangeLog:

 2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_edges_are_critical): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject block if-conversion with incoming critical edges only if
 FLAG_FORCE_VECTORIZE was not set-up.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and atleast one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE 

gnu11 fallout

2014-10-21 Thread Andreas Schwab
Tested on m68k-suse-linux, installed as obvious.

Andreas.

* gcc.dg/bf-spl1.c (main): Fix implicit int.

diff --git a/gcc/testsuite/gcc.dg/bf-spl1.c b/gcc/testsuite/gcc.dg/bf-spl1.c
index b28130d..1cba005 100644
--- a/gcc/testsuite/gcc.dg/bf-spl1.c
+++ b/gcc/testsuite/gcc.dg/bf-spl1.c
@@ -44,6 +44,7 @@ pack_d ()
   x = dst.bits.fraction;
 }
 
+int
 main ()
 {
   pack_d ();
-- 
2.1.2

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


[Patch ARM-AArch64/testsuite v3 00/21] Neon intrinsics executable tests

2014-10-21 Thread Christophe Lyon
This patch series is an updated version of the series I sent here:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html

I addressed comments from Marcus and Richard, and decided to skip
support for half-precision variants for the time being. I'll post
dedicated patches later.

Compared to v2:
- the directory containing the new tests is named
  gcc.target/aarch64/adv-simd instead of
  gcc.target/aarch64/neon-intrinsics.
- the driver is named adv-simd.exp instead of neon-intrinsics.exp
- the driver is guarded against the new test parallelization framework
- the README file uses 'Advanced SIMD (Neon)' instead of 'Neon'

Christophe Lyon (21):
  Advanced SIMD (Neon) intrinsics execution tests initial framework.
vaba, vld1 and vshl tests.
  Add unary operators: vabs and vneg.
  Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.
  Add comparison operators: vceq, vcge, vcgt, vcle and vclt.
  Add comparison operators with floating-point operands: vcage, vcagt,  
   vcale and vcalt.
  Add unary saturating operators: vqabs and vqneg.
  Add binary saturating operators: vqadd, vqsub.
  Add vabal tests.
  Add vabd tests.
  Add vabdl tests.
  Add vaddhn tests.
  Add vaddl tests.
  Add vaddw tests.
  Add vbsl tests.
  Add vclz tests.
  Add vdup and vmov tests.
  Add vld1_dup tests.
  Add vld2/vld3/vld4 tests.
  Add vld2_lane, vld3_lane and vld4_lane tests.
  Add vmul tests.
  Add vuzp and vzip tests.


[Patch ARM-AArch64/testsuite v3 02/21] Add unary operators: vabs and vneg.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/unary_op.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vabs.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vneg.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc
new file mode 100644
index 000..33f9b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_op.inc
@@ -0,0 +1,72 @@
+/* Template file for unary operator validation.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test. If a given intrinsic
+   supports variants which are not supported by all the other unary
+   operators, these can be tested by providing a definition for
+   EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=OP(x), then store the result.  */
+#define TEST_UNARY_OP1(INSN, Q, T1, T2, W, N)  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_UNARY_OP(INSN, Q, T1, T2, W, N)   \
+  TEST_UNARY_OP1(INSN, Q, T1, T2, W, N)
\
+
+  /* No need for 64 bits variants in the general case.  */
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, int, 8, 16);
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+
+  DECL_VARIABLE(vector_res, int, 8, 8);
+  DECL_VARIABLE(vector_res, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 8, 16);
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, q, int, s, 8, 16);
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+
+  /* Apply a unary operator named INSN_NAME.  */
+  TEST_UNARY_OP(INSN_NAME, , int, s, 8, 8);
+  TEST_UNARY_OP(INSN_NAME, , int, s, 16, 4);
+  TEST_UNARY_OP(INSN_NAME, , int, s, 32, 2);
+  TEST_UNARY_OP(INSN_NAME, q, int, s, 8, 16);
+  TEST_UNARY_OP(INSN_NAME, q, int, s, 16, 8);
+  TEST_UNARY_OP(INSN_NAME, q, int, s, 32, 4);
+
+  CHECK_RESULTS (TEST_MSG, );
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME)();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
new file mode 100644
index 000..ca3901a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
@@ -0,0 +1,74 @@
+#define INSN_NAME vabs
+#define TEST_MSG "VABS/VABSQ"
+
+/* Extra tests for functions requiring floating-point types.  */
+void exec_vabs_f32(void);
+#define EXTRA_TESTS exec_vabs_f32
+
+#include unary_op.inc
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x10, 0xf, 0xe, 0xd,
+  0xc, 0xb, 0xa, 0x9 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x10, 0xf, 0xe, 0xd };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x10, 0xf };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 
0x9,
+   0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1 
};
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x10, 0xf, 0xe, 0xd,
+   0xc, 0xb, 0xa, 0x9 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,

[Patch ARM-AArch64/testsuite v3 01/21] Advanced SIMD (Neon) intrinsics execution tests initial framework. vaba, vld1 and vshl tests.

2014-10-21 Thread Christophe Lyon
* documentation (README)
* dejagnu driver (advsimd-intrinsics.exp)
* support macros (arm-neon-ref.h, compute-ref-data.h)
* Tests for 3 intrinsics: vaba, vld1, vshl

2014-10-21  Christophe Lyon  christophe.l...@linaro.org

* gcc.target/arm/README.advsimd-intrinsics: New file.
* gcc.target/aarch64/advsimd-intrinsics/README: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vaba.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vshl.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README
new file mode 100644
index 000..52c374c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/README
@@ -0,0 +1,132 @@
+This directory contains executable tests for ARM/AArch64 Advanced SIMD
+(Neon) intrinsics.
+
+It is meant to cover execution cases of all the Advanced SIMD
+intrinsics, but does not scan the generated assembler code.
+
+The general framework is composed as follows:
+- advsimd-intrinsics.exp: main dejagnu driver
+- *.c: actual tests, generally one per intrinsic family
+- arm-neon-ref.h: contains macro definitions to save typing in actual
+  test files
+- compute-ref-data.h: contains input vectors definitions
+- *.inc: generic tests, shared by several families of intrinsics. For
+   instance, unary or binary operators
+
+A typical .c test file starts with the following contents (look at
+vld1.c and vaba.c for sample cases):
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+Then, definitions of expected results, based on common input values,
+as defined in compute-ref-data.h.
+For example:
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
+defines the expected results of an operator generating int16x4 values.
+
+The common input values defined in compute-ref-data.h have been chosen
+to avoid corner-case values for most operators, yet exposing negative
+values for signed operators. For this reason, their range is also
+limited. For instance, the initialization of buffer_int16x4 will be
+{ -16, -15, -14, -13 }.
+
+The initialization of floating-point values is done via hex notation,
+to avoid potential rounding problems.
+
+To test special values and corner cases, specific initialization
+values should be used in dedicated tests, to ensure proper coverage.
+An example of this is vshl.
+
+When a variant of an intrinsic is not available, its expected result
+should be defined to the value of CLEAN_PATTERN_8 as defined in
+arm-neon-ref.h. For example:
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
+if the given intrinsic has no variant producing an int64x1 result,
+like the vcmp family (eg. vclt).
+
+This is because the helper function (check_results(), defined in
+arm-neon-ref.h), iterates over all the possible variants, to save
+typing in each individual test file. Alternatively, one can directly
+call the CHECK/CHECK_FP macros to check only a few expected results
+(see vabs.c for an example).
+
+Then, define the TEST_MSG string, which will be used when reporting errors.
+
+Next, define the function performing the actual tests, in general
+relying on the helpers provided by arm-neon-ref.h, which means:
+
+* declare necessary vectors of suitable types: using
+  DECL_VARIABLE_ALL_VARIANTS when all variants are supported, or the
+  relevant of subset calls to DECL_VARIABLE.
+
+* call clean_results() to initialize the 'results' buffers.
+
+* initialize the input vectors, using VLOAD, VDUP or VSET_LANE (vld*
+  tests do not need this step, since their actual purpose is to
+  initialize vectors).
+
+* execute the intrinsic on relevant variants, for instance using
+  TEST_MACRO_ALL_VARIANTS_2_5.
+
+* call check_results() to check that the results match the expected
+  values.
+
+A template test file could be:
+=
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
+  0xfa, 0xfb, 0xfc, 0xfd };
+/* and as many others as necessary.  */
+
+#define TEST_MSG "VMYINTRINSIC"
+void exec_myintrinsic (void)
+{
+  /* my test: v4=vmyintrinsic(v1,v2,v3), then store the result.  */
+#define TEST_VMYINTR(Q, T1, T2, W, N)  \
+  VECT_VAR(vector_res, T1, W, N) = \
+vmyintr##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),  \
+VECT_VAR(vector2, T1, W, N),   \
+

[Patch ARM-AArch64/testsuite v3 03/21] Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/binary_op.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vadd.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vand.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vbic.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/veor.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vorn.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vorr.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vsub.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc
new file mode 100644
index 000..3483e0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op.inc
@@ -0,0 +1,70 @@
+/* Template file for binary operator validation.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test. If a given intrinsic
+   supports variants which are not supported by all the other binary
+   operators, these can be tested by providing a definition for
+   EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=OP(x1,x2), then store the result.  */
+#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N),  \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N)  \
+  TEST_BINARY_OP1(INSN, Q, T1, T2, W, N)   \
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+
+  /* Fill input vector2 with arbitrary values.  */
+  VDUP(vector2, , int, s, 8, 8, 2);
+  VDUP(vector2, , int, s, 16, 4, -4);
+  VDUP(vector2, , int, s, 32, 2, 3);
+  VDUP(vector2, , int, s, 64, 1, 100);
+  VDUP(vector2, , uint, u, 8, 8, 20);
+  VDUP(vector2, , uint, u, 16, 4, 30);
+  VDUP(vector2, , uint, u, 32, 2, 40);
+  VDUP(vector2, , uint, u, 64, 1, 2);
+  VDUP(vector2, q, int, s, 8, 16, -10);
+  VDUP(vector2, q, int, s, 16, 8, -20);
+  VDUP(vector2, q, int, s, 32, 4, -30);
+  VDUP(vector2, q, int, s, 64, 2, 24);
+  VDUP(vector2, q, uint, u, 8, 16, 12);
+  VDUP(vector2, q, uint, u, 16, 8, 3);
+  VDUP(vector2, q, uint, u, 32, 4, 55);
+  VDUP(vector2, q, uint, u, 64, 2, 3);
+
+  /* Apply a binary operator named INSN_NAME.  */
+  TEST_MACRO_ALL_VARIANTS_1_5(TEST_BINARY_OP, INSN_NAME);
+
+  CHECK_RESULTS (TEST_MSG, );
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
new file mode 100644
index 000..f08c620
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
@@ -0,0 +1,81 @@
+#define INSN_NAME vadd
+#define TEST_MSG "VADD/VADDQ"
+
+/* Extra tests for functions requiring floating-point types.  */
+void exec_vadd_f32(void);
+#define EXTRA_TESTS exec_vadd_f32
+
+#include binary_op.inc
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf3, 0xf4, 0xf5,
+  0xf6, 0xf7, 0xf8, 0xf9 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xffec, 0xffed, 0xffee, 0xffef };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff3, 0xfff4 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x54 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x4, 0x5, 0x6, 0x7,
+   0x8, 0x9, 0xa, 0xb };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xe, 0xf, 0x10, 0x11 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x18, 0x19 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfff2 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xe6, 0xe7, 0xe8, 0xe9,
+   0xea, 0xeb, 0xec, 0xed,
+   0xee, 0xef, 0xf0, 0xf1,
+   0xf2, 0xf3, 0xf4, 0xf5 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xffdc, 0xffdd, 0xffde, 0xffdf,
+   

[Patch ARM-AArch64/testsuite v3 05/21] Add comparison operators with floating-point operands: vcage, vcagt, vcale and vcalt.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vcage.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcagt.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcale.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcalt.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
new file mode 100644
index 000..33451d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
@@ -0,0 +1,75 @@
+/* Template file for the validation of comparison operator with
+   floating-point support.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test. If a given intrinsic
+   supports variants which are not supported by all the other
+   operators, these can be tested by providing a definition for
+   EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Additional expected results declaration, they are initialized in
+   each test file.  */
+extern ARRAY(expected2, uint, 32, 2);
+extern ARRAY(expected2, uint, 32, 4);
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vcomp(x1,x2), then store the result.  */
+#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \
+  VECT_VAR(vector_res, T3, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N),  \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N))
+
+#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N)  \
+  TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
+
+  DECL_VARIABLE(vector, float, 32, 2);
+  DECL_VARIABLE(vector, float, 32, 4);
+  DECL_VARIABLE(vector2, float, 32, 2);
+  DECL_VARIABLE(vector2, float, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Choose init value arbitrarily, will be used for vector
+ comparison.  */
+  VDUP(vector2, , float, f, 32, 2, -16.0f);
+  VDUP(vector2, q, float, f, 32, 4, -14.0f);
+
+  /* Apply operator named INSN_NAME.  */
+  TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, );
+
+  TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, );
+
+  /* Test again, with different input values.  */
+  VDUP(vector2, , float, f, 32, 2, -10.0f);
+  VDUP(vector2, q, float, f, 32, 4, 10.0f);
+
+  TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected2, );
+
+  TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected2,);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
new file mode 100644
index 000..219d03f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
@@ -0,0 +1,52 @@
+#define INSN_NAME vcage
+#define TEST_MSG "VCAGE/VCAGEQ"
+
+#include cmp_fp_op.inc
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x333, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x333, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x0 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x333, 0x, 0x, 0x,
+   0x333, 0x, 0x, 0x };

[Patch ARM-AArch64/testsuite v3 06/21] Add unary saturating operators: vqabs and vqneg.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc: New
file.
* gcc.target/aarch64/advsimd-intrinsics/vqabs.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqneg.c: Likewise.

diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
new file mode 100644
index 000..3f6d984
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
@@ -0,0 +1,80 @@
+/* Template file for saturating unary operator validation.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test. If a given intrinsic
+   supports variants which are not supported by all the other
+   saturating unary operators, these can be tested by providing a
+   definition for EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* y=OP(x), then store the result.  */
+#define TEST_UNARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, 
CMT) \
+  Set_Neon_Cumulative_Sat(0);  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
+vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N),  \
+ VECT_VAR(vector_res, T1, W, N));  \
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_UNARY_SAT_OP(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) 
\
+  TEST_UNARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* No need for 64 bits variants.  */
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, int, 8, 16);
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+
+  DECL_VARIABLE(vector_res, int, 8, 8);
+  DECL_VARIABLE(vector_res, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 8, 16);
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, q, int, s, 8, 16);
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+
+  /* Apply a saturating unary operator named INSN_NAME.  */
+  TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 8, 8, expected_cumulative_sat, );
+  TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 16, 4, expected_cumulative_sat, );
+  TEST_UNARY_SAT_OP(INSN_NAME, , int, s, 32, 2, expected_cumulative_sat, );
+  TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 8, 16, expected_cumulative_sat, );
+  TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 16, 8, expected_cumulative_sat, );
+  TEST_UNARY_SAT_OP(INSN_NAME, q, int, s, 32, 4, expected_cumulative_sat, );
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, );
+  CHECK(TEST_MSG, int, 16, 4, PRIx8, expected, );
+  CHECK(TEST_MSG, int, 32, 2, PRIx8, expected, );
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected, );
+  CHECK(TEST_MSG, int, 16, 8, PRIx8, expected, );
+  CHECK(TEST_MSG, int, 32, 4, PRIx8, expected, );
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c
new file mode 100644
index 000..f2be790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqabs.c
@@ -0,0 +1,127 @@
+#define INSN_NAME vqabs
+#define TEST_MSG "VQABS/VQABSQ"
+
+/* Extra tests for functions requiring corner-case tests.  */
+void vqabs_extra(void);
+#define EXTRA_TESTS vqabs_extra
+
+#include unary_sat_op.inc
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9 
};
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x10, 0xf, 0xe, 0xd };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x10, 0xf };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 
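
The non-0x33 expected values above follow from the reference input buffer,
which these tests initialize to 0xf0, 0xf1, ... (i.e. -16, -15, ... for the
signed variants): the saturating absolute value of -16..-9 is 0x10..0x9.
A scalar model of one int8 lane, for illustration only (the helper name is
not part of the patch):

  #include <stdint.h>

  /* Saturating absolute value of one int8 lane, as vqabs computes it.  */
  int8_t sat_abs8 (int8_t x)
  {
    if (x == INT8_MIN)          /* |-128| does not fit in int8...   */
      return INT8_MAX;          /* ...saturate to 127 and set QC.   */
    return x < 0 ? (int8_t) -x : x;
  }

  /* sat_abs8 (-16) == 0x10, the first element of expected,int,8,8.  */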

[Patch ARM-AArch64/testsuite v3 08/21] Add vabal tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vabal.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c
new file mode 100644
index 000..cd31062
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabal.c
@@ -0,0 +1,161 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff6, 0xfff7, 0xfff8, 0xfff9,
+   0xfffa, 0xfffb, 0xfffc, 0xfffd };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x16, 0x17, 0x18, 0x19 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x20, 0x21 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x53, 0x54, 0x55, 0x56,
+0x57, 0x58, 0x59, 0x5a };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x907, 0x908, 0x909, 0x90a };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe7,
+0xffe8 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+/* Expected results for cases with input values chosen to test
+   possible intermediate overflow.  */
+VECT_VAR_DECL(expected2,int,16,8) [] = { 0xef, 0xf0, 0xf1, 0xf2,
+0xf3, 0xf4, 0xf5, 0xf6 };
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0xffef, 0xfff0, 0xfff1, 0xfff2 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0xffef, 0xfff0 };
+VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xee, 0xef, 0xf0, 0xf1,
+ 0xf2, 0xf3, 0xf4, 0xf5 };
+VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffe2, 0xffe3, 0xffe4, 0xffe5 };
+VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xffe7, 0xffe8 };
+
+#define TEST_MSG "VABAL"
+void exec_vabal (void)
+{
+  /* Basic test: v4=vabal(v1,v2,v3), then store the result.  */
+#define TEST_VABAL(T1, T2, W, W2, N)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+vabal_##T2##W(VECT_VAR(vector1, T1, W2, N),
\
+ VECT_VAR(vector2, T1, W, N),  \
+ VECT_VAR(vector3, T1, W, N)); \
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define DECL_VABAL_VAR_LONG(VAR)   \
+  DECL_VARIABLE(VAR, int, 16, 8);  \
+  DECL_VARIABLE(VAR, int, 32, 4);  \
+  DECL_VARIABLE(VAR, int, 64, 2);  \
+  DECL_VARIABLE(VAR, uint, 16, 8); \
+  DECL_VARIABLE(VAR, uint, 32, 4); \
+  DECL_VARIABLE(VAR, uint, 64, 2)
+
+#define DECL_VABAL_VAR_SHORT(VAR)  \
+  DECL_VARIABLE(VAR, int, 8, 8);   \
+  DECL_VARIABLE(VAR, int, 16, 4);  \
+  DECL_VARIABLE(VAR, int, 32, 2);  \
+  DECL_VARIABLE(VAR, uint, 8, 8);  \
+  DECL_VARIABLE(VAR, uint, 16, 4); \
+  
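
For reference, vabal accumulates a widened absolute difference: each output
lane is acc + |a - b|, with the difference computed in the wider type so it
cannot wrap, which is what the expected2 arrays with "intermediate overflow"
inputs above exercise. A scalar model of one int8 lane (illustrative helper,
not part of the patch):

  #include <stdint.h>

  /* One lane of vabal_s8: int16 accumulator, int8 inputs.  */
  int16_t vabal_lane_model (int16_t acc, int8_t a, int8_t b)
  {
    int16_t diff = (int16_t) a - (int16_t) b;   /* widen before subtracting */
    return acc + (diff < 0 ? -diff : diff);     /* accumulate |a - b|       */
  }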

[Patch ARM-AArch64/testsuite v3 04/21] Add comparison operators: vceq, vcge, vcgt, vcle and vclt.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vceq.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcge.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcgt.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcle.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vclt.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
new file mode 100644
index 000..a09c5f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
@@ -0,0 +1,224 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Additional expected results declarations; they are initialized in
+   each test file.  */
+extern ARRAY(expected_uint, uint, 8, 8);
+extern ARRAY(expected_uint, uint, 16, 4);
+extern ARRAY(expected_uint, uint, 32, 2);
+extern ARRAY(expected_q_uint, uint, 8, 16);
+extern ARRAY(expected_q_uint, uint, 16, 8);
+extern ARRAY(expected_q_uint, uint, 32, 4);
+extern ARRAY(expected_float, uint, 32, 2);
+extern ARRAY(expected_q_float, uint, 32, 4);
+extern ARRAY(expected_uint2, uint, 32, 2);
+extern ARRAY(expected_uint3, uint, 32, 2);
+extern ARRAY(expected_uint4, uint, 32, 2);
+extern ARRAY(expected_nan, uint, 32, 2);
+extern ARRAY(expected_mnan, uint, 32, 2);
+extern ARRAY(expected_nan2, uint, 32, 2);
+extern ARRAY(expected_inf, uint, 32, 2);
+extern ARRAY(expected_minf, uint, 32, 2);
+extern ARRAY(expected_inf2, uint, 32, 2);
+extern ARRAY(expected_mzero, uint, 32, 2);
+extern ARRAY(expected_p8, uint, 8, 8);
+extern ARRAY(expected_q_p8, uint, 8, 16);
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vcomp(x1,x2), then store the result.  */
+#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \
+  VECT_VAR(vector_res, T3, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N),  \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N))
+
+#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N)  \
+  TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
+
+  /* No need for 64 bits elements.  */
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+  DECL_VARIABLE(vector, float, 32, 2);
+  DECL_VARIABLE(vector, int, 8, 16);
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector, uint, 8, 16);
+  DECL_VARIABLE(vector, uint, 16, 8);
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector, float, 32, 4);
+
+  DECL_VARIABLE(vector2, int, 8, 8);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 8, 8);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+  DECL_VARIABLE(vector2, float, 32, 2);
+  DECL_VARIABLE(vector2, int, 8, 16);
+  DECL_VARIABLE(vector2, int, 16, 8);
+  DECL_VARIABLE(vector2, int, 32, 4);
+  DECL_VARIABLE(vector2, uint, 8, 16);
+  DECL_VARIABLE(vector2, uint, 16, 8);
+  DECL_VARIABLE(vector2, uint, 32, 4);
+  DECL_VARIABLE(vector2, float, 32, 4);
+
+  DECL_VARIABLE(vector_res, uint, 8, 8);
+  DECL_VARIABLE(vector_res, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 8, 16);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+
+  clean_results ();
+
+  /* There is no 64-bit variant, so don't use the generic initializer.  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 8, 8);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+  VLOAD(vector, buffer, , float, f, 32, 2);
+
+  VLOAD(vector, buffer, q, int, s, 8, 16);
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 8, 16);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Choose init values arbitrarily; they will be used for vector
+     comparison.  */
+  VDUP(vector2, , int, s, 8, 8, -10);
+  VDUP(vector2, , int, s, 16, 4, -14);
+  VDUP(vector2, , int, s, 32, 2, -16);
+  VDUP(vector2, , uint, u, 8, 8, 0xF3);
+  VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
+  VDUP(vector2, , uint, u, 32, 2, 0xFFF1);
+  VDUP(vector2, , float, f, 32, 2, -15.0f);
+
+  VDUP(vector2, q, int, s, 8, 
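
As the TEST_VCOMP macro above stores its result with vst1*_u*, the comparison
intrinsics return element-wise masks in an unsigned vector of the same lane
width: all ones where the comparison holds, all zeros elsewhere, which is why
the expected arrays are declared as uint. A sketch (values illustrative):

  #include <arm_neon.h>

  void cmp_mask_example (uint32_t *out)
  {
    int32x2_t a = vdup_n_s32 (-16);
    int32x2_t b = vdup_n_s32 (-15);
    uint32x2_t m = vcge_s32 (a, b);   /* -16 >= -15 is false: both lanes 0 */
    vst1_u32 (out, m);
  }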

[Patch ARM-AArch64/testsuite v3 07/21] Add binary saturating operators: vqadd, vqsub.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc: New
file.
* gcc.target/aarch64/advsimd-intrinsics/vqadd.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqsub.c: Likewise.

diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
new file mode 100644
index 000..35d7701
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
@@ -0,0 +1,91 @@
+/* Template file for saturating binary operator validation.
+
+   This file is meant to be included by the relevant test files, which
+   have to define the intrinsic family to test. If a given intrinsic
+   supports variants which are not supported by all the other
+   saturating binary operators, these can be tested by providing a
+   definition for EXTRA_TESTS.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = OP(vector1,vector2), then store the result.  */
+
+#define TEST_BINARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, 
CMT) \
+  Set_Neon_Cumulative_Sat(0);  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N),  \
+ VECT_VAR(vector_res, T1, W, N));  \
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_BINARY_SAT_OP(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, 
CMT) \
+  TEST_BINARY_SAT_OP1(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE_ALL_VARIANTS(vector1);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+
+  clean_results ();
+
+  /* Initialize input vector1 from buffer.  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
+
+  /* Choose arbitrary initialization values.  */
+  VDUP(vector2, , int, s, 8, 8, 0x11);
+  VDUP(vector2, , int, s, 16, 4, 0x22);
+  VDUP(vector2, , int, s, 32, 2, 0x33);
+  VDUP(vector2, , int, s, 64, 1, 0x44);
+  VDUP(vector2, , uint, u, 8, 8, 0x55);
+  VDUP(vector2, , uint, u, 16, 4, 0x66);
+  VDUP(vector2, , uint, u, 32, 2, 0x77);
+  VDUP(vector2, , uint, u, 64, 1, 0x88);
+
+  VDUP(vector2, q, int, s, 8, 16, 0x11);
+  VDUP(vector2, q, int, s, 16, 8, 0x22);
+  VDUP(vector2, q, int, s, 32, 4, 0x33);
+  VDUP(vector2, q, int, s, 64, 2, 0x44);
+  VDUP(vector2, q, uint, u, 8, 16, 0x55);
+  VDUP(vector2, q, uint, u, 16, 8, 0x66);
+  VDUP(vector2, q, uint, u, 32, 4, 0x77);
+  VDUP(vector2, q, uint, u, 64, 2, 0x88);
+
+  /* Apply a saturating binary operator named INSN_NAME.  */
+  TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 8, 8, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 16, 4, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 32, 2, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , int, s, 64, 1, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 8, 8, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 16, 4, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 32, 2, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, , uint, u, 64, 1, expected_cumulative_sat, "");
+
+  TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 8, 16, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 16, 8, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 32, 4, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, int, s, 64, 2, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 8, 16, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 16, 8, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 32, 4, expected_cumulative_sat, "");
+  TEST_BINARY_SAT_OP(INSN_NAME, q, uint, u, 64, 2, expected_cumulative_sat, "");
+
+  CHECK_RESULTS (TEST_MSG, "");
+
+#ifdef EXTRA_TESTS
+  EXTRA_TESTS();
+#endif
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
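
For context, a test file is expected to consume this template roughly as
follows; vqadd.c below (and vqsub.c) are the real users, and the file name and
macro values in this sketch are only illustrative:

  /* hypothetical-sat-binary-test.c */
  #define INSN_NAME vqadd
  #define TEST_MSG "VQADD/VQADDQ"
  /* Optionally: #define EXTRA_TESTS some_extra_tests  */
  #include "binary_sat_op.inc"
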
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c
new file mode 100644
index 000..c07f5ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqadd.c
@@ -0,0 +1,278 @@
+#define INSN_NAME vqadd
+#define TEST_MSG "VQADD/VQADDQ"
+
+/* Extra tests for special cases:
+   - some requiring intermediate types larger than 64 bits to
+     compute the saturation flag.
+   - corner-case saturations with types smaller than 64 bits.
+*/
+void 

[Patch ARM-AArch64/testsuite v3 10/21] Add vabdl tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vabdl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c
new file mode 100644
index 000..28018ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdl.c
@@ -0,0 +1,109 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x11, 0x10, 0xf, 0xe,
+   0xd, 0xc, 0xb, 0xa };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x3, 0x2, 0x1, 0x0 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x18, 0x17 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xef, 0xf0, 0xf1, 0xf2,
+0xf3, 0xf4, 0xf5, 0xf6 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe8,
+0xffe9 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+#define TEST_MSG "VABDL"
+void exec_vabdl (void)
+{
+  /* Basic test: v4=vabdl(v1,v2), then store the result.  */
+#define TEST_VABDL(T1, T2, W, W2, N)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+vabdl_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define DECL_VABDL_VAR_LONG(VAR)   \
+  DECL_VARIABLE(VAR, int, 16, 8);  \
+  DECL_VARIABLE(VAR, int, 32, 4);  \
+  DECL_VARIABLE(VAR, int, 64, 2);  \
+  DECL_VARIABLE(VAR, uint, 16, 8); \
+  DECL_VARIABLE(VAR, uint, 32, 4); \
+  DECL_VARIABLE(VAR, uint, 64, 2)
+
+#define DECL_VABDL_VAR_SHORT(VAR)  \
+  DECL_VARIABLE(VAR, int, 8, 8);   \
+  DECL_VARIABLE(VAR, int, 16, 4);  \
+  DECL_VARIABLE(VAR, int, 32, 2);  \
+  DECL_VARIABLE(VAR, uint, 8, 8);  \
+  DECL_VARIABLE(VAR, uint, 16, 4); \
+  DECL_VARIABLE(VAR, uint, 32, 2)
+
+  DECL_VABDL_VAR_SHORT(vector1);
+  DECL_VABDL_VAR_SHORT(vector2);
+  DECL_VABDL_VAR_LONG(vector_res);
+
+  clean_results ();
+
+  /* Initialize input vector1 from buffer.  */
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+  VLOAD(vector1, buffer, , int, s, 16, 4);
+  VLOAD(vector1, buffer, , int, s, 32, 2);
+  VLOAD(vector1, buffer, , uint, u, 8, 8);
+  VLOAD(vector1, buffer, , uint, u, 16, 4);
+  VLOAD(vector1, buffer, , uint, u, 32, 2);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , int, s, 8, 8, 1);
+  VDUP(vector2, , int, s, 16, 4, -13);
+  VDUP(vector2, , int, s, 32, 2, 8);
+  VDUP(vector2, , uint, u, 8, 8, 1);
+  VDUP(vector2, , uint, u, 16, 4, 13);
+  VDUP(vector2, , uint, u, 32, 2, 8);
+
+  /* Execute the 
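
The non-0x33 expected values can be checked by hand: the signed int8 input
buffer holds -16, -15, ..., and vector2 is set to 1, so vabdl_s8 yields
|-16 - 1| = 17 = 0x11, |-15 - 1| = 16 = 0x10, and so on, as in
expected,int,16,8 above. A scalar model of one lane (illustrative helper):

  #include <stdint.h>

  /* One lane of vabdl_s8: absolute difference of two int8 values,
     widened to int16 so it cannot wrap.  */
  int16_t vabdl_lane_model (int8_t a, int8_t b)
  {
    int16_t diff = (int16_t) a - (int16_t) b;
    return diff < 0 ? -diff : diff;
  }

  /* vabdl_lane_model (-16, 1) == 17 == 0x11.  */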

[Patch ARM-AArch64/testsuite v3 16/21] Add vdup and vmov tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
new file mode 100644
index 000..b5132f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
@@ -0,0 +1,253 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* We test vdup and vmov in the same place since they are aliases.  */
+
+/* Expected results.  */
+/* Chunk 0.  */
+VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+   0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc180 };
+VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff0,
+0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,64,2) [] = { 0xfff0,
+0xfff0 };
+VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,32,4) [] = { 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,64,2) [] = { 0xfff0,
+ 0xfff0 };
+VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc180,
+   0xc180, 0xc180 };
+
+/* Chunk 1.  */
+VECT_VAR_DECL(expected1,int,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+   0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,int,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,32,2) [] = { 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected1,uint,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,uint,32,2) [] = { 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc170, 0xc170 };
+VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,int,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,32,4) [] = { 0xfff1, 0xfff1,
+
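
Each expected chunk appears to correspond to duplicating one element of the
reference buffer (0xf0 for chunk 0, 0xf1 for chunk 1), since vdup_n/vmov_n
replicate a single scalar into every lane. A sketch:

  #include <arm_neon.h>

  void dup_example (int8_t *out)
  {
    int8x8_t v = vdup_n_s8 ((int8_t) 0xf0);  /* every lane == 0xf0 */
    vst1_s8 (out, v);                        /* matches expected0,int,8,8 */
  }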

[Patch ARM-AArch64/testsuite v3 12/21] Add vaddl tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vaddl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
new file mode 100644
index 000..861abec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
@@ -0,0 +1,122 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3,
+   0x3, 0x3, 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = {  0xffe3, 0xffe4, 0xffe5, 0xffe6,
+0xffe7, 0xffe8, 0xffe9, 0xffea };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xffe2, 0xffe3,
+   0xffe4, 0xffe5 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xffe0,
+   0xffe1 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x1e3, 0x1e4, 0x1e5, 0x1e6,
+0x1e7, 0x1e8, 0x1e9, 0x1ea };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x1ffe1, 0x1ffe2,
+0x1ffe3, 0x1ffe4 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x1ffe0, 0x1ffe1 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+#ifndef INSN_NAME
+#define INSN_NAME vaddl
+#define TEST_MSG "VADDL"
+#endif
+
+#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME(NAME) FNNAME1(NAME)
+
+FNNAME (INSN_NAME)
+{
+  /* Basic test: y=vaddl(x1,x2), then store the result.  */
+#define TEST_VADDL1(INSN, T1, T2, W, W2, N)\
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
+  VECT_VAR(vector2, T1, W, N));\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VADDL(INSN, T1, T2, W, W2, N) \
+  TEST_VADDL1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+
+  DECL_VARIABLE(vector2, int, 8, 8);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 8, 8);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 8, 8);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+
+  /* 

[Patch ARM-AArch64/testsuite v3 13/21] Add vaddw tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vaddw.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
new file mode 100644
index 000..5804cd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
@@ -0,0 +1,122 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3,
+   0x3, 0x3, 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = {  0xffe3, 0xffe4, 0xffe5, 0xffe6,
+0xffe7, 0xffe8, 0xffe9, 0xffea };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xffe2, 0xffe3,
+   0xffe4, 0xffe5 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xffe0,
+   0xffe1 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xe3, 0xe4, 0xe5, 0xe6,
+0xe7, 0xe8, 0xe9, 0xea };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffe1, 0xffe2,
+0xffe3, 0xffe4 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe0, 0xffe1 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+#ifndef INSN_NAME
+#define INSN_NAME vaddw
+#define TEST_MSG "VADDW"
+#endif
+
+#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME(NAME) FNNAME1(NAME)
+
+FNNAME (INSN_NAME)
+{
+  /* Basic test: y=vaddw(x1,x2), then store the result.  */
+#define TEST_VADDW1(INSN, T1, T2, W, W2, N)\
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_##T2##W(VECT_VAR(vector, T1, W2, N),
\
+  VECT_VAR(vector2, T1, W, N));\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VADDW(INSN, T1, T2, W, W2, N) \
+  TEST_VADDW1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector, uint, 16, 8);
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector, uint, 64, 2);
+
+  DECL_VARIABLE(vector2, int, 8, 8);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 8, 8);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize input vector from buffer.  */
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 64, 2);
+
+  

[Patch ARM-AArch64/testsuite v3 19/21] Add vld2_lane, vld3_lane and vld4_lane tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c
new file mode 100644
index 000..1991033
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c
@@ -0,0 +1,610 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+
+/* vld2/chunk 0.  */
+VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0x, 0x, 0x, 0x 
};
+VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+  0xaa, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0x, 0x,
+   0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+  0xaa, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0x, 0x,
+   0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0x, 0x, 0x, 0x,
+  0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0x, 0x,
+  0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x,
+  0x };
+VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0x, 0x, 0x, 0x,
+   0x, 0x, 0x, 0x 
};
+VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0xfff0, 0xfff1,
+   0x, 0x };
+VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+   0x, 0x, 0x, 0x 
};
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0x, 0x,
+ 0x, 0x };
+
+/* vld2/chunk 1.  */
+VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xf0, 0xf1 };
+VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff0, 0xfff1, 0x, 0x 
};
+VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf0, 0xf1, 0xaa, 0xaa,
+  0xaa, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0x, 0x, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xaa, 0xaa,
+  0xaa, 0xaa, 0xaa, 0xaa };
+VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0x, 0x, 0xfff0, 0xfff1 
};
+VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33,
+  

[Patch ARM-AArch64/testsuite v3 14/21] Add vbsl tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
new file mode 100644
index 000..bb17f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
@@ -0,0 +1,124 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+  0xf6, 0xf6, 0xf6, 0xf6 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffd };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+   0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+   0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc184, 0xc174 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+   0xf6, 0xf6, 0xf6, 0xf6,
+   0xf2, 0xf2, 0xf2, 0xf2,
+   0xf6, 0xf6, 0xf6, 0xf6 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2,
+   0xfff4, 0xfff4, 0xfff6, 0xfff6 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfff0, 0xfff0,
+   0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffd,
+   0xfffd };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+0xf7, 0xf7, 0xf7, 0xf7,
+0xf3, 0xf3, 0xf3, 0xf3,
+0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2,
+0xfff4, 0xfff4, 0xfff6, 0xfff6 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfff0, 0xfff0,
+0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfff1,
+0xfff1 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+0xf7, 0xf7, 0xf7, 0xf7,
+0xf3, 0xf3, 0xf3, 0xf3,
+0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2,
+0xfff4, 0xfff4, 0xfff6, 0xfff6 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc181, 0xc171,
+  0xc161, 0xc151 };
+
+#define TEST_MSG "VBSL/VBSLQ"
+void exec_vbsl (void)
+{
+  /* Basic test: y=vbsl(unsigned_vec,x1,x2), then store the result.  */
+#define TEST_VBSL(T3, Q, T1, T2, W, N) \
+  VECT_VAR(vector_res, T1, W, N) = \
+vbsl##Q##_##T2##W(VECT_VAR(vector_first, T3, W, N),
\
+ VECT_VAR(vector, T1, W, N),   \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector2);
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+
+  DECL_VARIABLE_UNSIGNED_VARIANTS(vector_first);
+
+  clean_results ();
+
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Choose init values arbitrarily; they will be used for vector
+     comparison.  As we want different values for each type variant, we
+     can't use the generic initialization macros.  */
+  VDUP(vector2, , int, s, 8, 8, -10);
+  VDUP(vector2, , int, s, 16, 4, -14);
+  VDUP(vector2, , int, s, 32, 2, -30);
+  VDUP(vector2, , int, s, 64, 1, -33);
+  VDUP(vector2, , uint, u, 8, 8, 0xF3);
+  VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
+  VDUP(vector2, , uint, u, 32, 2, 0xFFF0);
+  VDUP(vector2, , uint, u, 64, 1, 0xFFF3);
+  VDUP(vector2, , float, f, 32, 2, -30.3f);
+  VDUP(vector2, , poly, p, 8, 8, 0xF3);
+  VDUP(vector2, , poly, p, 16, 4, 0xFFF2);
+
+  VDUP(vector2, q, int, s, 8, 16, -10);
+  VDUP(vector2, q, int, s, 16, 8, -14);
+  
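
vbsl is a bitwise select: each result bit comes from the second operand where
the corresponding bit of the (unsigned) first operand is set, and from the
third operand otherwise, which is how the expected values above mix the buffer
contents with the VDUP'ed constants. A scalar model of one 8-bit lane
(illustrative helper):

  #include <stdint.h>

  /* One 8-bit lane of vbsl: mask selects bits from a, otherwise from b.  */
  uint8_t vbsl_lane_model (uint8_t mask, uint8_t a, uint8_t b)
  {
    return (uint8_t) ((mask & a) | (~mask & b));
  }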

[Patch ARM-AArch64/testsuite v3 17/21] Add vld1_dup tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c
new file mode 100644
index 000..0e05274
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c
@@ -0,0 +1,180 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+/* Chunk 0.  */
+VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+   0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc180 };
+VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0,
+0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff0,
+0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,int,64,2) [] = { 0xfff0,
+0xfff0 };
+VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,32,4) [] = { 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,uint,64,2) [] = { 0xfff0,
+ 0xfff0 };
+VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc180,
+   0xc180, 0xc180 };
+
+/* Chunk 1.  */
+VECT_VAR_DECL(expected1,int,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+   0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,int,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,32,2) [] = { 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected1,uint,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,uint,32,2) [] = { 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc170, 0xc170 };
+VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1,
+0xf1, 0xf1, 0xf1, 0xf1 };
+VECT_VAR_DECL(expected1,int,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,32,4) [] = { 0xfff1, 0xfff1,
+0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,int,64,2) [] = { 0xfff1,

[Patch ARM-AArch64/testsuite v3 20/21] Add vmul tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vmul.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
new file mode 100644
index 000..7527861
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
@@ -0,0 +1,156 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x1, 0x12, 0x23,
+  0x34, 0x45, 0x56, 0x67 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfde0, 0xfe02, 0xfe24, 0xfe46 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfcd0, 0xfd03 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xc0, 0x4, 0x48, 0x8c,
+   0xd0, 0x14, 0x58, 0x9c };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xf9a0, 0xfa06 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xc0, 0x84, 0x48, 0xc,
+   0xd0, 0x94, 0x58, 0x1c };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc405, 0xc3f9c000 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x90, 0x7, 0x7e, 0xf5,
+   0x6c, 0xe3, 0x5a, 0xd1,
+   0x48, 0xbf, 0x36, 0xad,
+   0x24, 0x9b, 0x12, 0x89 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xf780, 0xf808, 0xf890, 0xf918,
+   0xf9a0, 0xfa28, 0xfab0, 0xfb38 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xf670, 0xf709,
+   0xf7a2, 0xf83b };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x60, 0xa, 0xb4, 0x5e,
+0x8, 0xb2, 0x5c, 0x6,
+0xb0, 0x5a, 0x4, 0xae,
+0x58, 0x2, 0xac, 0x56 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf450, 0xf50b, 0xf5c6, 0xf681,
+0xf73c, 0xf7f7, 0xf8b2, 0xf96d };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xf340, 0xf40c,
+0xf4d8, 0xf5a4 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 0x9e,
+0xc8, 0x62, 0x9c, 0x36,
+0x30, 0x9a, 0x64, 0xce,
+0x98, 0x32, 0xcc, 0x66 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c7, 0xc4bac000,
+  0xc4ae4ccd, 0xc4a1d999 };
+
+#ifndef INSN_NAME
+#define INSN_NAME vmul
+#define TEST_MSG "VMUL"
+#endif
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+#define DECL_VMUL(T, W, N) \
+  DECL_VARIABLE(vector1, T, W, N); \
+  DECL_VARIABLE(vector2, T, W, N); \
+  DECL_VARIABLE(vector_res, T, W, N)
+
+  /* vector_res = OP(vector1, vector2), then store the result.  */
+#define TEST_VMUL1(INSN, Q, T1, T2, W, N)  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N),
\
+   VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VMUL(INSN, Q, T1, T2, W, N)   \
+  TEST_VMUL1(INSN, Q, T1, T2, W, N)
+
+  DECL_VMUL(int, 8, 8);
+  DECL_VMUL(int, 16, 4);
+  DECL_VMUL(int, 32, 2);
+  DECL_VMUL(uint, 8, 8);
+  DECL_VMUL(uint, 16, 4);
+  DECL_VMUL(uint, 32, 2);
+  DECL_VMUL(poly, 8, 8);
+  DECL_VMUL(float, 32, 2);
+  DECL_VMUL(int, 8, 16);
+  DECL_VMUL(int, 16, 8);
+  DECL_VMUL(int, 32, 4);
+  DECL_VMUL(uint, 8, 16);
+  DECL_VMUL(uint, 16, 8);
+  DECL_VMUL(uint, 32, 4);
+  DECL_VMUL(poly, 8, 16);
+  DECL_VMUL(float, 32, 4);
+
+  clean_results ();
+
+  /* Initialize input vector1 from buffer.  */
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+  VLOAD(vector1, buffer, , int, s, 16, 4);
+  VLOAD(vector1, buffer, , int, s, 32, 2);
+  VLOAD(vector1, buffer, , uint, u, 8, 8);
+  VLOAD(vector1, buffer, , uint, 
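
Note that the poly8 variant is not an ordinary integer multiply: vmul_p8 is a
carry-less (polynomial, GF(2)) multiply keeping the low 8 bits of the product,
which is why the poly expected arrays above differ from the uint ones. A
scalar model (illustrative helper):

  #include <stdint.h>

  /* Carry-less 8-bit multiply, low 8 bits of the product,
     modelling one lane of vmul_p8.  */
  uint8_t poly8_mul_model (uint8_t a, uint8_t b)
  {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
      if (b & (1u << i))
        r ^= (uint8_t) (a << i);   /* XOR instead of ADD: no carries */
    return r;
  }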

[Patch ARM-AArch64/testsuite v3 15/21] Add vclz tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vclz.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c
new file mode 100644
index 000..ad28d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclz.c
@@ -0,0 +1,194 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0 
};
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x3, 0x3, 0x3, 0x3 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x11, 0x11 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2 
};
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x5, 0x5 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2,
+   0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x2 
};
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3 
};
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x3, 0x3, 0x3, 0x3 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 
0x3,
+0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3 
};
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xd, 0xd, 0xd, 0xd,
+0xd, 0xd, 0xd, 0xd };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x1f, 0x1f, 0x1f, 0x1f };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+
+/* Expected results with input=0.  */
+VECT_VAR_DECL(expected_with_0,int,8,8) [] = { 0x8, 0x8, 0x8, 0x8,
+ 0x8, 0x8, 0x8, 0x8 };
+VECT_VAR_DECL(expected_with_0,int,16,4) [] = { 0x10, 0x10, 0x10, 0x10 };
+VECT_VAR_DECL(expected_with_0,int,32,2) [] = { 0x20, 0x20 };
+VECT_VAR_DECL(expected_with_0,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_with_0,uint,8,8) [] = { 0x8, 0x8, 0x8, 0x8,
+  0x8, 0x8, 0x8, 0x8 };
+VECT_VAR_DECL(expected_with_0,uint,16,4) [] = { 0x10, 0x10, 0x10, 0x10 };
+VECT_VAR_DECL(expected_with_0,uint,32,2) [] = { 0x20, 0x20 };
+VECT_VAR_DECL(expected_with_0,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected_with_0,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_with_0,poly,16,4) [] = { 0x, 0x, 0x, 0x 
};
+VECT_VAR_DECL(expected_with_0,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected_with_0,int,8,16) [] = { 0x8, 0x8, 0x8, 0x8,
+  0x8, 0x8, 0x8, 0x8,
+  0x8, 0x8, 0x8, 0x8,
+  0x8, 0x8, 0x8, 0x8 };
+VECT_VAR_DECL(expected_with_0,int,16,8) [] = { 0x10, 0x10, 0x10, 0x10,
+  0x10, 0x10, 0x10, 0x10 };
+VECT_VAR_DECL(expected_with_0,int,32,4) [] = { 0x20, 0x20, 0x20, 0x20 };
+VECT_VAR_DECL(expected_with_0,int,64,2) [] = { 0x,
+  0x };
+VECT_VAR_DECL(expected_with_0,uint,8,16) [] = { 0x8, 0x8, 0x8, 0x8,
+   0x8, 0x8, 0x8, 0x8,
+   0x8, 0x8, 0x8, 0x8,
+   0x8, 0x8, 0x8, 0x8 };
+VECT_VAR_DECL(expected_with_0,uint,16,8) [] = { 0x10, 0x10, 0x10, 0x10,
+   0x10, 0x10, 0x10, 0x10 };
+VECT_VAR_DECL(expected_with_0,uint,32,4) [] = { 0x20, 0x20, 0x20, 0x20 };
+VECT_VAR_DECL(expected_with_0,uint,64,2) [] = { 

[Patch ARM-AArch64/testsuite v3 18/21] Add vld2/vld3/vld4 tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vldX.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
new file mode 100644
index 000..fe00640
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
@@ -0,0 +1,692 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+
+/* vld2/chunk 0.  */
+VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+  0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfff0 };
+VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+  0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+  0xf4, 0xf5, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb,
+  0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+  0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0xfff0, 0xfff1,
+  0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x,
+  0x };
+VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7,
+   0xf8, 0xf9, 0xfa, 0xfb,
+   0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+   0xfff4, 0xfff5, 0xfff6, 0xfff7 
};
+VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0xfff0, 0xfff1,
+   0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7,
+   0xf8, 0xf9, 0xfa, 0xfb,
+   0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+   0xfff4, 0xfff5, 0xfff6, 0xfff7 
};
+VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0xc180, 0xc170,
+ 0xc160, 0xc150 };
+
+/* vld2/chunk 1.  */
+VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
+ 0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 
};
+VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
+  0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 
};
+VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
+  0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 
};
+VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc160, 0xc150 };
+VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
+  0x4, 0x5, 0x6, 0x7,
+  0x8, 0x9, 0xa, 0xb,
+  0xc, 0xd, 0xe, 0xf 

[Patch ARM-AArch64/testsuite v3 21/21] Add vuzp and vzip tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.


diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
new file mode 100644
index 000..53f875e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
@@ -0,0 +1,245 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results split into several chunks.  */
+/* Chunk 0.  */
+VECT_VAR_DECL(expected0,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected0,int,16,4) [] = { 0xfff0, 0xfff1,
+0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected0,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected0,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected0,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected0,uint,16,4) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfff0,
+ 0xfff1 };
+VECT_VAR_DECL(expected0,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+0xf4, 0xf5, 0xf6, 0xf7,
+0xf8, 0xf9, 0xfa, 0xfb,
+0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected0,int,16,8) [] = { 0xfff0, 0xfff1,
+0xfff2, 0xfff3,
+0xfff4, 0xfff5,
+0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected0,int,32,4) [] = { 0xfff0, 0xfff1,
+0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected0,int,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected0,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb,
+ 0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected0,uint,16,8) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5,
+ 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected0,uint,32,4) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected0,uint,64,2) [] = { 0x,
+ 0x };
+VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb,
+ 0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5,
+ 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc180, 0xc170,
+   0xc160, 0xc150 };
+
+/* Chunk 1.  */
+VECT_VAR_DECL(expected1,int,8,8) [] = { 0x11, 0x11, 0x11, 0x11,
+   0x11, 0x11, 0x11, 0x11 };
+VECT_VAR_DECL(expected1,int,16,4) [] = { 0x22, 0x22, 0x22, 0x22 };
+VECT_VAR_DECL(expected1,int,32,2) [] = { 0x33, 0x33 };
+VECT_VAR_DECL(expected1,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected1,uint,8,8) [] = { 0x55, 0x55, 0x55, 0x55,
+0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected1,uint,16,4) [] = { 0x66, 0x66, 0x66, 0x66 };
+VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 };
+VECT_VAR_DECL(expected1,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected1,poly,8,8) [] = { 0x55, 0x55, 0x55, 0x55,
+0x55, 0x55, 0x55, 0x55 };
+VECT_VAR_DECL(expected1,poly,16,4) [] = { 0x66, 0x66, 0x66, 0x66 };
+VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x4206, 0x4206 };
+VECT_VAR_DECL(expected1,int,8,16) [] = { 0x11, 0x11, 0x11, 0x11,
+0x11, 0x11, 0x11, 0x11,
+ 

[Patch ARM-AArch64/testsuite v3 11/21] Add vaddhn tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vaddhn.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c
new file mode 100644
index 000..74b4b4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c
@@ -0,0 +1,109 @@
+#include arm_neon.h
+#include arm-neon-ref.h
+#include compute-ref-data.h
+
+#if defined(__cplusplus)
+#include cstdint
+#else
+#include stdint.h
+#endif
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x32, 0x32, 0x32, 0x32,
+  0x32, 0x32, 0x32, 0x32 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x32, 0x32, 0x32, 0x32 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x18, 0x18 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3,
+   0x3, 0x3, 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = {  0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x, 0x,
+   0x, 0x };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x, 0x,
+0x, 0x };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+
+#ifndef INSN_NAME
+#define INSN_NAME vaddhn
+#define TEST_MSG VADDHN
+#endif
+
+#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME(NAME) FNNAME1(NAME)
+
+FNNAME (INSN_NAME)
+{
+  /* Basic test: vec64=vaddhn(vec128_a, vec128_b), then store the result.  */
+#define TEST_VADDHN1(INSN, T1, T2, W, W2, N)   \
+  VECT_VAR(vector64, T1, W2, N) = INSN##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+VECT_VAR(vector2, T1, W, N)); \
+  vst1_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector64, T1, W2, N))
+
+#define TEST_VADDHN(INSN, T1, T2, W, W2, N)\
+  TEST_VADDHN1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE_64BITS_VARIANTS(vector64);
+  DECL_VARIABLE_128BITS_VARIANTS(vector1);
+  DECL_VARIABLE_128BITS_VARIANTS(vector2);
+
+  clean_results ();
+
+  /* Fill input vector1 and vector2 with arbitrary values */
+  VDUP(vector1, q, int, s, 16, 8, 50*(UINT8_MAX+1));
+  VDUP(vector1, q, int, s, 32, 4, 50*(UINT16_MAX+1));
+  VDUP(vector1, q, int, s, 64, 2, 24*((uint64_t)UINT32_MAX+1));
+  VDUP(vector1, q, uint, u, 16, 8, 3*(UINT8_MAX+1));
+  VDUP(vector1, q, uint, u, 32, 4, 55*(UINT16_MAX+1));
+  VDUP(vector1, q, uint, u, 64, 2, 3*((uint64_t)UINT32_MAX+1));
+
+  VDUP(vector2, q, int, s, 16, 8, (uint16_t)UINT8_MAX);
+  VDUP(vector2, q, int, s, 32, 4, (uint32_t)UINT16_MAX);
+  VDUP(vector2, q, int, s, 64, 2, (uint64_t)UINT32_MAX);
+  VDUP(vector2, q, uint, u, 16, 8, (uint16_t)UINT8_MAX);
+  VDUP(vector2, q, uint, u, 32, 4, (uint32_t)UINT16_MAX);
+  VDUP(vector2, q, uint, u, 64, 2, (uint64_t)UINT32_MAX);
+
+  TEST_VADDHN(INSN_NAME, 

[Patch ARM-AArch64/testsuite v3 09/21] Add vabd tests.

2014-10-21 Thread Christophe Lyon

2014-10-21  Christophe Lyon  christophe.l...@linaro.org
 
* gcc.target/aarch64/advsimd-intrinsics/vabd.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
new file mode 100644
index 000..e95404f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
@@ -0,0 +1,153 @@
+#include arm_neon.h
+#include arm-neon-ref.h
+#include compute-ref-data.h
+#include math.h
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x11, 0x10, 0xf, 0xe,
+  0xd, 0xc, 0xb, 0xa };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x3, 0x2, 0x1, 0x0 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x18, 0x17 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xef, 0xf0, 0xf1, 0xf2,
+   0xf3, 0xf4, 0xf5, 0xf6 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe3, 0xffe4, 0xffe5, 0xffe6 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffe8, 0xffe9 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x41c2, 0x41ba };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x1a, 0x19, 0x18, 0x17,
+   0x16, 0x15, 0x14, 0x13,
+   0x12, 0x11, 0x10, 0xf,
+   0xe, 0xd, 0xc, 0xb };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x4, 0x3, 0x2, 0x1,
+   0x0, 0x1, 0x2, 0x3 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x30, 0x2f, 0x2e, 0x2d };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x,
+   0x };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xe6, 0xe7, 0xe8, 0xe9,
+0xea, 0xeb, 0xec, 0xed,
+0xee, 0xef, 0xf0, 0xf1,
+0xf2, 0xf3, 0xf4, 0xf5 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xffe4, 0xffe5, 0xffe6, 0xffe7,
+0xffe8, 0xffe9, 0xffea, 0xffeb };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffd0, 0xffd1,
+0xffd2, 0xffd3 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x,
+0x };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x42407ae1, 0x423c7ae1,
+  0x42387ae1, 0x42347ae1 };
+
+/* Additional expected results for float32 variants with specially
+   chosen input values.  */
+VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+#define TEST_MSG VABD/VABDQ
+void exec_vabd (void)
+{
+  /* Basic test: v4=vabd(v1,v2), then store the result.  */
+#define TEST_VABD(Q, T1, T2, W, N) \
+  VECT_VAR(vector_res, T1, W, N) = \
+vabd##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
+ VECT_VAR(vector2, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define DECL_VABD_VAR(VAR) \
+  DECL_VARIABLE(VAR, int, 8, 8);   \
+  DECL_VARIABLE(VAR, int, 16, 4);  \
+  DECL_VARIABLE(VAR, int, 32, 2);  \
+  DECL_VARIABLE(VAR, uint, 8, 8);  \
+  DECL_VARIABLE(VAR, uint, 16, 4); \
+  DECL_VARIABLE(VAR, uint, 32, 2); \
+  DECL_VARIABLE(VAR, float, 32, 2);\
+  DECL_VARIABLE(VAR, int, 8, 16);  \
+  DECL_VARIABLE(VAR, int, 16, 8);  \
+  DECL_VARIABLE(VAR, int, 32, 4);  \
+  DECL_VARIABLE(VAR, uint, 8, 16); \
+  DECL_VARIABLE(VAR, uint, 16, 8); \
+  DECL_VARIABLE(VAR, uint, 32, 4); \
+  DECL_VARIABLE(VAR, float, 32, 4)
+
+  DECL_VABD_VAR(vector1);
+  DECL_VABD_VAR(vector2);
+  DECL_VABD_VAR(vector_res);
+
+  clean_results ();
+
+  /* Initialize input vector1 from buffer.  */
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+  VLOAD(vector1, buffer, , int, s, 16, 4);
+  VLOAD(vector1, buffer, , int, s, 32, 2);
+  VLOAD(vector1, buffer, , uint, u, 8, 8);
+  VLOAD(vector1, buffer, , 

Re: [libstdc++ PATCH] More Fundamentals v1 variable templates

2014-10-21 Thread Jonathan Wakely

On 21/10/14 07:19 +0300, Ville Voutilainen wrote:

No, no, no! Contributors must be punished, otherwise they will never learn! ;)
Revised patch attached.


Thanks, committed to trunk.


diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 1ee8ddc..c797246 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -644,7 +644,10 @@ experimental_headers = \
${experimental_srcdir}/string_view \
${experimental_srcdir}/string_view.tcc \
${experimental_srcdir}/tuple \
-   ${experimental_srcdir}/type_traits
+   ${experimental_srcdir}/type_traits \
+   ${experimental_srcdir}/ratio \
+   ${experimental_srcdir}/chrono \
+   ${experimental_srcdir}/system_error \


I rearranged these files to keep them in alphabetical order (and
remove the trailing backslash).



Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Yuri Rumyantsev
Richard,

Yes, this patch does not make sense in isolation, since phi node predication
for a bb whose incoming edges are all critical is performed by another
function which is absent here (predicate_extended_scalar_phi).

BTW I see that commit_edge_insertions() is used for rtx instructions
only but you propose to use it for tree also.
Did I miss something?

Thanks ahead.


2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_preds_critical_p): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject temporarily block if-conversion with incoming critical edges
 if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
 after adding support for extended predication.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and at least one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.

 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume that one of
 incoming edge to bb containing given phi has at least one non-critical
 edge and choose it to insert predicated code. But if we choose
 critical edge we need to determine insert point and insertion
 direction (before/after) since in other case we can get invalid ssa
 form (use before def). This is done by my new function which is not in
 current patch ( I will present this patch later). SO I assume that we
 need to leave this patch as it is to not introduce new bugs.

 Thanks.
 Yuri.

 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I reworked the patch as you proposed, but I didn't understand what
 did you mean by:

So please rework the patch so critical edges are always handled
correctly.

 In current patch flag_force_vectorize is used (1) to reject phi nodes
 with more than 2 arguments; (2) to reject basic blocks with only
 critical incoming edges since support for extended predication of phi
 nodes will be in next patch.

 I mean that (2) should not be rejected dependent on flag_force_vectorize.
 It was rejected because if-cvt couldn't handle it correctly before but with
 this patch this is fixed.  I see no reason to still reject this then even
 for !flag_force_vectorize.

 Rejecting PHIs with more than two arguments with flag_force_vectorize
 is ok.

 Richard.

 Could you please clarify your statement.

 I attached modified patch.

 ChangeLog:

 2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_edges_are_critical): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject block if-conversion with incoming critical edges only if
 FLAG_FORCE_VECTORIZE was not set-up.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to 

[PATCHv2] Don't expand string/memory builtins if ASan is enabled.

2014-10-21 Thread Maxim Ostapenko

Hi,

this is the second version of the patch. Here are the major changes from the 
previous one:


1) Added a new intercepted_p parameter in get_mem_refs_of_builtin_call 
to decide whether builtin function should/shouldn't be instrumented.


2) Changed instrument_mem_region_access function. Now, we update 
asan_mem_ref_ht with (base, size_in_bytes), if we can determine access 
size during compile time.


3) Removed ASAN_CHECK_START_INSTRUMENTED and ASAN_CHECK_END_INSTRUMENTED 
from asan_check_flags since we don't instrument the base and end of a
memory region with access size 1 anymore.

4) Specified builtins that shouldn't be expanded explicitly in 
gcc/builtins.c.


Regtested / bootstrapped on x86_64-unknown-linux-gnu.

-Maxim
On 10/17/2014 05:03 PM, Jakub Jelinek wrote:

On Fri, Oct 17, 2014 at 05:01:33PM +0400, Yury Gribov wrote:

On 10/17/2014 04:24 PM, Jakub Jelinek wrote:

+/* Returns TRUE if given FCODE corresponds to string or memory builtin 
function.
+ */
+
+static inline bool
+is_memory_builtin (enum built_in_function fcode)
+{
+  return fcode >= BUILT_IN_STRSTR && fcode <= BUILT_IN_BCMP;

This is too fragile and ugly.
IMHO you should list (supposedly not in a special inline, but directly
where you use it) in a switch all the builtins you don't want to expand.

We already do this for BUILT_IN_ASAN_REPORT_LOAD1 ... BUILT_IN_ASAN_STOREN

I know, but it is still a coherent set of builtins for very similar
purposes, many of them sorted by increasing size number.


but I agree that this one is more ugly.

The memops builtins are just a random bag of them; it is expected that many
people will add builtins both into that range and outside of it.

Jakub
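
For illustration, a minimal sketch of the switch-based check Jakub suggests
(the helper name asan_intercepted_p and the builtin list below are
illustrative assumptions, not necessarily what the final patch uses):

/* Sketch: return true if libsanitizer intercepts FCODE, so the builtin
   should not be expanded inline when ASan is enabled.  */
static bool
asan_intercepted_p (enum built_in_function fcode)
{
  switch (fcode)
    {
    case BUILT_IN_MEMCMP:
    case BUILT_IN_MEMCPY:
    case BUILT_IN_MEMMOVE:
    case BUILT_IN_MEMSET:
    case BUILT_IN_STRLEN:
      return true;
    default:
      return false;
    }
}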



gcc/ChangeLog:

2014-10-21  Max Ostapenko  m.ostape...@partner.samsung.com

	* asan.c (asan_mem_ref_hasher::hash): Remove MEM_REF access size from
	hash value construction. Call iterative_hash_expr instead of explicit
	hash building.
	(asan_mem_ref_hasher::equal): Change condition.
	(has_mem_ref_been_instrumented): Likewise.
	(update_mem_ref_hash_table): Likewise.
	(maybe_update_mem_ref_hash_table): New function.
	(instrument_strlen_call): Removed.
	(get_mem_refs_of_builtin_call): Handle new parameter.
	(instrument_builtin_call): Call maybe_update_mem_ref_hash_table instead
	of instrument_mem_region_access if intercepted_p is true.
	(instrument_mem_region_access): Instrument only base with len instead of
	base and end with 1.
	(build_check_stmt): Remove start_instrumented and end_instrumented
	parameters.
	(enum asan_check_flags): Remove ASAN_CHECK_START_INSTRUMENTED and
	ASAN_CHECK_END_INSTRUMENTED. Change ASAN_CHECK_LAST.
	(asan_expand_check_ifn): Remove start_instrumented and end_instrumented.
	* builtins.c (expand_builtin): Don't expand string/memory builtin functions
	that have interceptors in libsanitizer if ASan is enabled.

gcc/testsuite/ChangeLog:

2014-10-21  Max Ostapenko  m.ostape...@partner.samsung.com

	* c-c++-common/asan/no-redundant-instrumentation-1.c: Updated test.
	* c-c++-common/asan/no-redundant-instrumentation-4.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-5.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-6.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-7.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-8.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-2.c: Removed.
	* c-c++-common/asan/no-redundant-instrumentation-9.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-10.c: New test.
	* c-c++-common/asan/no-redundant-instrumentation-11.c: Likewise.


diff --git a/gcc/asan.c b/gcc/asan.c
index 2a61a82..a9eb9aa 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -253,9 +253,7 @@ enum asan_check_flags
   ASAN_CHECK_STORE = 1 << 0,
   ASAN_CHECK_SCALAR_ACCESS = 1 << 1,
   ASAN_CHECK_NON_ZERO_LEN = 1 << 2,
-  ASAN_CHECK_START_INSTRUMENTED = 1 << 3,
-  ASAN_CHECK_END_INSTRUMENTED = 1 << 4,
-  ASAN_CHECK_LAST
+  ASAN_CHECK_LAST = 1 << 3
 };
 
 /* Hashtable support for memory references used by gimple
@@ -352,10 +350,7 @@ struct asan_mem_ref_hasher
 inline hashval_t
 asan_mem_ref_hasher::hash (const asan_mem_ref *mem_ref)
 {
-  inchash::hash hstate;
-  inchash::add_expr (mem_ref->start, hstate);
-  hstate.add_wide_int (mem_ref->access_size);
-  return hstate.end ();
+  return iterative_hash_expr (mem_ref->start, 0);
 }
 
 /* Compare two memory references.  We accept the length of either
@@ -365,8 +360,7 @@ inline bool
 asan_mem_ref_hasher::equal (const asan_mem_ref *m1,
 			const asan_mem_ref *m2)
 {
-  return (m1->access_size == m2->access_size
-	  && operand_equal_p (m1->start, m2->start, 0));
+  return operand_equal_p (m1->start, m2->start, 0);
 }
 
 static hash_table<asan_mem_ref_hasher> *asan_mem_ref_ht;
@@ -417,7 +411,8 @@ has_mem_ref_been_instrumented (tree ref, HOST_WIDE_INT access_size)
   asan_mem_ref r;
   asan_mem_ref_init (r, ref, access_size);
 
-  return (get_mem_ref_hash_table ()->find (r) != NULL);
+  asan_mem_ref *saved_ref = get_mem_ref_hash_table ()->find (r);
+  

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
if you want easy access to the newly created basic block to push
the predicate to - see gsi_commit_edge_inserts implementation).

Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_preds_critical_p): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject temporarily block if-conversion with incoming critical edges
 if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
 after adding support for extended predication.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and at least one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.

 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume that one of
 incoming edge to bb containing given phi has at least one non-critical
 edge and choose it to insert predicated code. But if we choose
 critical edge we need to determine insert point and insertion
 direction (before/after) since in other case we can get invalid ssa
 form (use before def). This is done by my new function which is not in
 current patch ( I will present this patch later). SO I assume that we
 need to leave this patch as it is to not introduce new bugs.

 Thanks.
 Yuri.

 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I reworked the patch as you proposed, but I didn't understand what
 did you mean by:

So please rework the patch so critical edges are always handled
correctly.

 In current patch flag_force_vectorize is used (1) to reject phi nodes
 with more than 2 arguments; (2) to reject basic blocks with only
 critical incoming edges since support for extended predication of phi
 nodes will be in next patch.

 I mean that (2) should not be rejected dependent on flag_force_vectorize.
 It was rejected because if-cvt couldn't handle it correctly before but 
 with
 this patch this is fixed.  I see no reason to still reject this then even
 for !flag_force_vectorize.

 Rejecting PHIs with more than two arguments with flag_force_vectorize
 is ok.

 Richard.

 Could you please clarify your statement.

 I attached modified patch.

 ChangeLog:

 2014-10-17  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up 

Re: Add polymorphic call context propagation to ipa-prop

2014-10-21 Thread Martin Jambor
Hi,


On Thu, Oct 02, 2014 at 09:00:12AM +0200, Jan Hubicka wrote:
 Hi,
 this patch makes ipa-prop to use ipa-polymorphic-call-context infrastructure
 for forward propagation (in a very minimal and simple way). 
 
 At the moment only static type info is propagated and it is used only
 speculatively because I will need to update dynamic type change code to deal
 with more general setting than in the old binfo propagation code. I want to do
 it incrementally.  The basic problem is that the old binfo code was mostly built
 around idea that all bases are primary bases and one class may contain other
 only as a base (not as field).
 
 Martin, I get out of range ICEs in controlled uses code and thus
 I added an extra check, see FIXME below. Could you, please, help
 me to fix that correctly?

Is there a simple testcase?

And sorry for not reviewing this in time but I've only recently
noticed that...

 
 THe patch also does not add necessary propagation into ipa-cp and thus
 devirtualizatoin happens only during inlining and is not appropriately
 hinted.
 
 Bootstrapped/regtested x86_64-linux, will commit it shortly.
 
 Honza
 
   * ipa-prop.h (ipa_get_controlled_uses): Add hack to avoid ICE
   when speculation is added.
   (ipa_edge_args): Add polymorphic_call_contexts.
   (ipa_get_ith_polymorhic_call_context): New accessor.
   (ipa_make_edge_direct_to_target): Add SPECULATIVE parameter.
   (ipa_print_node_jump_functions_for_edge): Print contexts.
   (ipa_compute_jump_functions_for_edge): Compute contexts.
   (update_jump_functions_after_inlining): Update contexts.
   (ipa_make_edge_direct_to_target): Add SPECULATIVE argument;
   update dumping; add speculative edge creation.
   (try_make_edge_direct_virtual_call): Add CTX_PTR parameter; handle
   context updating.
   (update_indirect_edges_after_inlining): Pass down context.
   (ipa_edge_duplication_hook): Duplicate contexts.
   (ipa_write_node_info): Stream out contexts.
   (ipa_read_node_info): Stream in contexts.
   * ipa-devirt.c (type_all_derivations_known_p): Avoid ICE on non-ODR
   types.
   (try_speculative_devirtualization): New function.
   * ipa-utils.h (try_speculative_devirtualization): Declare.
 Index: ipa-prop.h
 ===
 --- ipa-prop.h(revision 215792)
 +++ ipa-prop.h(working copy)
 @@ -432,7 +432,10 @@ ipa_set_param_used (struct ipa_node_para
  static inline int
  ipa_get_controlled_uses (struct ipa_node_params *info, int i)
  {
 -  return info->descriptors[i].controlled_uses;
 +  /* FIXME: introducing speculation causes out of bounds access here.  */
 +  if (info->descriptors.length () > (unsigned) i)
 +    return info->descriptors[i].controlled_uses;
 +  return IPA_UNDESCRIBED_USE;
  }
  
  /* Set the controlled counter of a given parameter.  */
 @@ -479,6 +482,7 @@ struct GTY(()) ipa_edge_args
  {
/* Vector of the callsite's jump function of each parameter.  */
 Index: ipa-prop.c
 ===
 --- ipa-prop.c(revision 215792)
 +++ ipa-prop.c(working copy)
 @@ -2608,11 +2625,15 @@ update_jump_functions_after_inlining (st
for (i = 0; i  count; i++)
  {
struct ipa_jump_func *dst = ipa_get_ith_jump_func (args, i);
 +  struct ipa_polymorphic_call_context *dst_ctx
 + = ipa_get_ith_polymorhic_call_context (args, i);
  
   if (dst->type == IPA_JF_ANCESTOR)
    {
  struct ipa_jump_func *src;
  int dst_fid = dst->value.ancestor.formal_id;
 +   struct ipa_polymorphic_call_context *src_ctx
 + = ipa_get_ith_polymorhic_call_context (top, dst_fid);

This should be moved down below the check that there is not a mismatch
between number of formal parameters and actual arguments, to the same
place where we initialize src.

  
 /* Variable number of arguments can cause havoc if we try to access
one that does not exist in the inlined edge.  So make sure we
 @@ -2625,6 +2646,22 @@ update_jump_functions_after_inlining (st
  
 src = ipa_get_ith_jump_func (top, dst_fid);
  
 +   if (src_ctx && !src_ctx->useless_p ())
 + {
 +   struct ipa_polymorphic_call_context ctx = *src_ctx;
 +
 +   /* TODO: Make type preserved safe WRT contexts.  */
 +   if (!dst->value.ancestor.agg_preserved)
 + ctx.make_speculative ();
 +   ctx.offset_by (dst->value.ancestor.offset);
 +   if (!ctx.useless_p ())
 + {
 +   vec_safe_grow_cleared (args->polymorphic_call_contexts,
 +  count);
 +   dst_ctx = ipa_get_ith_polymorhic_call_context (args, i);
 + }
 + }

I believe that dst_ctx->combine_with (ctx) is missing here?
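
A minimal sketch of how the tail of that hunk might then look with the
suggested call added (placement assumed, not taken from the posted patch):

+	   if (!ctx.useless_p ())
+	     {
+	       vec_safe_grow_cleared (args->polymorphic_call_contexts,
+				      count);
+	       dst_ctx = ipa_get_ith_polymorhic_call_context (args, i);
+	       dst_ctx->combine_with (ctx);
+	     }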

Thanks,

Martin


[PATCH][ARM] Update target testcases for gnu11

2014-10-21 Thread Jiong Wang

This patch updates the ARM testcases for the recent gnu11 change.

ok for trunk?
thanks.
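
For context, a minimal sketch of the kind of breakage behind these edits
(foo/bar are made-up names for illustration): with the gnu11 default, an
implicitly declared function now draws a warning, which shows up as excess
errors in dejagnu, so the tests need explicit declarations:

/* Previously accepted silently; with gnu11 this warns:
   implicit declaration of function 'foo'.  */
void bar (float *p) { foo (p); }

/* Fixed: declare the callee before use.  */
extern void foo (float *);
void bar (float *p) { foo (p); }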

gcc/testsuite/
  * gcc.target/arm/20031108-1.c: Add explicit declaration.
  * gcc.target/arm/cold-lc.c: Likewise.
  * gcc.target/arm/neon-modes-2.c: Likewise.
  * gcc.target/arm/pr43920-2.c: Likewise.
  * gcc.target/arm/pr44788.c: Likewise.
  * gcc.target/arm/pr55642.c: Likewise.
  * gcc.target/arm/pr58784.c: Likewise.
  * gcc.target/arm/pr60650.c: Likewise.
  * gcc.target/arm/pr60650-2.c: Likewise.
  * gcc.target/arm/vfp-ldmdbs.c: Likewise.
  * gcc.target/arm/vfp-ldmias.c: Likewise.
  * lib/target-supports.exp: Likewise.
  * gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration.

diff --git a/gcc/testsuite/gcc.target/arm/20031108-1.c b/gcc/testsuite/gcc.target/arm/20031108-1.c
index d9b6006..7923e11 100644
--- a/gcc/testsuite/gcc.target/arm/20031108-1.c
+++ b/gcc/testsuite/gcc.target/arm/20031108-1.c
@@ -20,6 +20,9 @@ typedef struct record
 
 Rec_Pointer Ptr_Glob;
 
+extern int Proc_7 (int, int, int *);
+
+void
 Proc_1 (Ptr_Val_Par)
 Rec_Pointer Ptr_Val_Par;
 {
diff --git a/gcc/testsuite/gcc.target/arm/cold-lc.c b/gcc/testsuite/gcc.target/arm/cold-lc.c
index 295c29f..467a696 100644
--- a/gcc/testsuite/gcc.target/arm/cold-lc.c
+++ b/gcc/testsuite/gcc.target/arm/cold-lc.c
@@ -7,6 +7,7 @@ struct thread_info {
 struct task_struct *task;
 };
 extern struct thread_info *current_thread_info (void);
+extern int show_stack (struct task_struct *, unsigned long *);
 
 void dump_stack (void)
 {
diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-2.c b/gcc/testsuite/gcc.target/arm/neon-modes-2.c
index 40f1bba..16319bb 100644
--- a/gcc/testsuite/gcc.target/arm/neon-modes-2.c
+++ b/gcc/testsuite/gcc.target/arm/neon-modes-2.c
@@ -11,6 +11,8 @@
 
 #define MANY(A) A (0), A (1), A (2), A (3), A (4), A (5)
 
+extern void foo (int *, int *);
+
 void
 bar (uint32_t *ptr, int y)
 {
diff --git a/gcc/testsuite/gcc.target/arm/pr43920-2.c b/gcc/testsuite/gcc.target/arm/pr43920-2.c
index f647165..f5e8f48 100644
--- a/gcc/testsuite/gcc.target/arm/pr43920-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr43920-2.c
@@ -4,6 +4,8 @@
 
 #include stdio.h
 
+extern int lseek(int, long, int);
+
 int getFileStartAndLength (int fd, int *start_, size_t *length_)
 {
   int start, end;
diff --git a/gcc/testsuite/gcc.target/arm/pr44788.c b/gcc/testsuite/gcc.target/arm/pr44788.c
index eb4bc11..9ce44a8 100644
--- a/gcc/testsuite/gcc.target/arm/pr44788.c
+++ b/gcc/testsuite/gcc.target/arm/pr44788.c
@@ -2,6 +2,8 @@
 /* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options -Os -fno-strict-aliasing -fPIC -mthumb -march=armv7-a -mfpu=vfp3 -mfloat-abi=softfp } */
 
+extern void foo (float *);
+
 void joint_decode(float* mlt_buffer1, int t) {
 int i;
 float decode_buffer[1060];
diff --git a/gcc/testsuite/gcc.target/arm/pr51968.c b/gcc/testsuite/gcc.target/arm/pr51968.c
index f0506c2..6cf802b 100644
--- a/gcc/testsuite/gcc.target/arm/pr51968.c
+++ b/gcc/testsuite/gcc.target/arm/pr51968.c
@@ -1,6 +1,6 @@
 /* PR target/51968 */
 /* { dg-do compile } */
-/* { dg-options -O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon } */
+/* { dg-options -O2 -Wno-implicit-function-declaration -march=armv7-a -mfloat-abi=softfp -mfpu=neon } */
 /* { dg-require-effective-target arm_neon_ok } */
 
 typedef __builtin_neon_qi int8x8_t __attribute__ ((__vector_size__ (8)));
diff --git a/gcc/testsuite/gcc.target/arm/pr55642.c b/gcc/testsuite/gcc.target/arm/pr55642.c
index 10f2daa..a7defa7 100644
--- a/gcc/testsuite/gcc.target/arm/pr55642.c
+++ b/gcc/testsuite/gcc.target/arm/pr55642.c
@@ -2,6 +2,8 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb2_ok } */
 
+extern int abs (int);
+
 int
 foo (int v)
 {
diff --git a/gcc/testsuite/gcc.target/arm/pr58784.c b/gcc/testsuite/gcc.target/arm/pr58784.c
index 9a1fcff..4ee3ef5 100644
--- a/gcc/testsuite/gcc.target/arm/pr58784.c
+++ b/gcc/testsuite/gcc.target/arm/pr58784.c
@@ -11,6 +11,9 @@ typedef struct __attribute__ ((__packed__))
 char stepsRemoved;
 ptp_tlv_t tlv[1];
 } ptp_message_announce_t;
+
+extern void f (ptp_message_announce_t *);
+
 int ptplib_send_announce(int sequenceId, int i)
 {
 ptp_message_announce_t tx_packet;
diff --git a/gcc/testsuite/gcc.target/arm/pr60650-2.c b/gcc/testsuite/gcc.target/arm/pr60650-2.c
index 1946760..c8d4615 100644
--- a/gcc/testsuite/gcc.target/arm/pr60650-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr60650-2.c
@@ -4,17 +4,19 @@
 int a, h, j;
 long long d, e, i;
 int f;
+int
 fn1 (void *p1, int p2)
 {
 switch (p2)
 case 8:
 {
-register b = *(long long *) p1, c asm (r2);
+register int b = *(long long *) p1, c asm (r2);
 asm (%0: =r (a), =r (c):r (b), r (0));
 *(long long *) p1 = c;
 }
 }
 
+int
 fn2 ()
 {
 int k;
@@ -27,8 +29,8 @@ fn2 ()
 case 0:
 (
 {
-register l asm (r4);
-register m asm (r0);
+register int l asm (r4);
+register int m asm (r0);

[PATCH][AArch64]Update target testcases for gnu11

2014-10-21 Thread Jiong Wang

Update testcases for recent gnu11 changes.

ok for trunk?

thanks.

gcc/testsuite/
  * gcc.target/aarch64/pic-constantpool1.c: Add explicit declaration.
  * gcc.target/aarch64/pic-symrefplus.c: Likewise.
  * gcc.target/aarch64/reload-valid-spoff.c: Likewise.
  * gcc.target/aarch64/vect.x: Likewise.
  * gcc.target/aarch64/vect-ld1r.x: Add return type.
  * gcc.target/aarch64/vect-fmax-fmin.c: Likewise.
  * gcc.target/aarch64/vect-fp.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
index 3109d9d..043f1ee 100644
--- a/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
+++ b/gcc/testsuite/gcc.target/aarch64/pic-constantpool1.c
@@ -2,10 +2,13 @@
 /* { dg-do compile } */
 
 extern int __finite (double __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__));
+extern int __finitef (float __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__));
+extern int __signbit (double __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__));
+extern int __signbitf (float __value) __attribute__ ((__nothrow__)) __attribute__ ((__const__));
 int
 __ecvt_r (value, ndigit, decpt, sign, buf, len)
  double value;
- int ndigit, *decpt, *sign;
+ int ndigit, *decpt, *sign, len;
  char *buf;
 {
   if ((sizeof (value) == sizeof (float) ? __finitef (value) : __finite (value)) && value != 0.0)
diff --git a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
index f277a52..406568c 100644
--- a/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
+++ b/gcc/testsuite/gcc.target/aarch64/pic-symrefplus.c
@@ -34,12 +34,16 @@ struct locale_data
   values [];
 };
 extern const struct locale_data _nl_C_LC_TIME __attribute__ ((visibility (hidden)));
+extern void *memset (void *s, int c, size_t n);
+extern size_t strlen (const char *s);
+extern int __strncasecmp_l (const char *s1, const char *s2, size_t n, __locale_t locale);
 char *
 __strptime_internal (rp, fmt, tmp, statep , locale)
  const char *rp;
  const char *fmt;
  __locale_t locale;
  void *statep;
+ int tmp;
 {
   struct locale_data *const current = locale-__locales[__LC_TIME];
   const char *rp_backup;
@@ -124,5 +128,9 @@ __strptime_internal (rp, fmt, tmp, statep , locale)
 }
 char *
 __strptime_l (buf, format, tm , locale)
+ int buf;
+ int format;
+ int tm;
+ int locale;
 {
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c b/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c
index b44e560..c2b5464 100644
--- a/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c
+++ b/gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c
@@ -17,6 +17,11 @@ struct arpreq
 };
 typedef struct _IO_FILE FILE;
 extern char *fgets (char *__restrict __s, int __n, FILE *__restrict __stream);
+extern void *memset (void *s, int c, size_t n);
+extern void *memcpy (void *dest, const void *src, size_t n);
+extern int fprintf (FILE *stream, const char *format, ...);
+extern char * safe_strncpy (char *dst, const char *src, size_t size);
+extern size_t strlen (const char *s);
 extern struct _IO_FILE *stderr;
 extern int optind;
 struct aftype {
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c b/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c
index 42600b7..33a9444 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fmax-fmin.c
@@ -8,11 +8,11 @@ extern void abort (void);
 #include vect-fmaxv-fminv.x
 
 #define DEFN_SETV(type) \
-		set_vector_##type (pR##type a, type n)   \
-		{  \
-		  int i;			 \
-		  for (i=0; i16; i++)		 \
-		a[i] = n; \
+		void set_vector_##type (pR##type a, type n)   \
+		{	  \
+		  int i;  \
+		  for (i=0; i16; i++)			  \
+		a[i] = n;  \
 		}
 
 #define DEFN_CHECKV(type) \
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fp.c b/gcc/testsuite/gcc.target/aarch64/vect-fp.c
index bcf9d9d..af0c524 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fp.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fp.c
@@ -8,11 +8,11 @@ extern void abort (void);
 
 
 #define DEFN_SETV(type) \
-		set_vector_##type (pR##type a, type n)   \
-		{  \
-		  int i;			 \
-		  for (i=0; i16; i++)		 \
-		a[i] = n; \
+		void set_vector_##type (pR##type a, type n)   \
+		{	  \
+		  int i;  \
+		  for (i=0; i16; i++)			  \
+		a[i] = n;  \
 		}
 
 #define DEFN_CHECKV(type) \
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x b/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x
index 680ce43..db83036 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x
+++ b/gcc/testsuite/gcc.target/aarch64/vect-ld1r.x
@@ -7,7 +7,7 @@
 for (i = 0; i  8 / sizeof (TYPE); i++) \
   output[i] = *a; \
   } \
-  foo_ ## TYPE ## _q (TYPE *a, TYPE *output) \
+  void foo_ ## TYPE ## _q (TYPE *a, TYPE *output) 

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Yuri Rumyantsev
Richard,

I saw the sources of these functions, but I can't understand why I
should use something else. Note that all predicate computations are
located in basic blocks (by design of if-conv), and there is a special
function that puts these computations into a bb
(insert_gimplified_predicates). An edge carries only the predicate, not
its computations. The new function - find_insertion_point() - does a very
simple search: it finds the latest (in the current bb) operand def-stmt of
the predicates taken from all incoming edges.
In the original algorithm the predicate of a non-critical edge is taken to
perform phi-node predication, since for a critical edge it does not work
properly.

My question is: do your comments mean that I should re-design my extensions?

Thanks.
Yuri.

BTW Jeff did an initial review of my changes related to predicate
computation for join blocks. I presented him with an updated patch
containing a test-case and some minor changes, but still did not get any
feedback on it. Could you please take a look at it as well?


2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

 Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
 if you want easy access to the newly created basic block to push
 the predicate to - see gsi_commit_edge_inserts implementation).

 Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_preds_critical_p): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject temporarily block if-conversion with incoming critical edges
 if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
 after adding support for extended predication.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and at least one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.

 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume that one of
 incoming edge to bb containing given phi has at least one non-critical
 edge and choose it to insert predicated code. But if we choose
 critical edge we need to determine insert point and insertion
 direction (before/after) since in other case we can get invalid ssa
 form (use before def). This is done by my new function which is not in
 current patch ( I will present this patch later). SO I assume that we
 need to leave this patch as it is to not introduce new bugs.

 Thanks.
 Yuri.

 2014-10-20 12:00 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I reworked the patch as you proposed, but I didn't understand what
 did you mean by:

So please rework the patch so critical edges are always handled
correctly.

 In current patch flag_force_vectorize is used (1) to reject phi nodes
 with more than 2 arguments; (2) to 

[PATCH][match-and-simplify] Re-factor code in fold_stmt_1

2014-10-21 Thread Richard Biener

This refactors the code I added to fold_stmt to dispatch to
pattern-based folding to avoid long lines and make error
handling easier (no goto).  It also uses the newly introduced
gimple_seq_discard to properly discard an unused simplification
result.
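
The intended call pattern in fold_stmt_1 then becomes roughly the following
(sketch only, the call-site hunk is not part of this mail):

  gimple_seq seq = NULL;
  code_helper rcode;
  tree ops[3] = {};
  /* ... gimple_simplify fills in RCODE/OPS and possibly SEQ ...  */
  if (replace_stmt_with_simplification (gsi, rcode, ops, &seq, inplace))
    changed = true;
  else
    gimple_seq_discard (seq);   /* throw away the unused simplification */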

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This piece will land in trunk with the next merge piece.

Richard.

2014-10-21  Richard Biener  rguent...@suse.de

* gimple-fold.c (replace_stmt_with_simplification): New helper
split out from ...
(fold_stmt_1): ... here.  Discard the simplified sequence if
replacement failed.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 216505)
+++ gcc/gimple-fold.c   (working copy)
@@ -2794,6 +2794,121 @@ gimple_fold_call (gimple_stmt_iterator *
   return changed;
 }
 
+
+/* Worker for fold_stmt_1 dispatch to pattern based folding with
+   gimple_simplify.
+
+   Replaces *GSI with the simplification result in RCODE and OPS
+   and the associated statements in *SEQ.  Does the replacement
+   according to INPLACE and returns true if the operation succeeded.  */
+
+static bool
+replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
+ code_helper rcode, tree *ops,
+ gimple_seq *seq, bool inplace)
+{
+  gimple stmt = gsi_stmt (*gsi);
+
+  /* Play safe and do not allow abnormals to be mentioned in
+ newly created statements.  See also maybe_push_res_to_seq.  */
+  if ((TREE_CODE (ops[0]) == SSA_NAME
+       && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[0]))
+      || (ops[1]
+	  && TREE_CODE (ops[1]) == SSA_NAME
+	  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[1]))
+      || (ops[2]
+	  && TREE_CODE (ops[2]) == SSA_NAME
+	  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[2])))
+return false;
+
+  if (gimple_code (stmt) == GIMPLE_COND)
+{
+  gcc_assert (rcode.is_tree_code ());
+  if (TREE_CODE_CLASS ((enum tree_code)rcode) == tcc_comparison
+ /* GIMPLE_CONDs condition may not throw.  */
+	  && (!flag_exceptions
+	      || !cfun->can_throw_non_call_exceptions
+ || !operation_could_trap_p (rcode,
+ FLOAT_TYPE_P (TREE_TYPE (ops[0])),
+ false, NULL_TREE)))
+   gimple_cond_set_condition (stmt, rcode, ops[0], ops[1]);
+  else if (rcode == SSA_NAME)
+   gimple_cond_set_condition (stmt, NE_EXPR, ops[0],
+  build_zero_cst (TREE_TYPE (ops[0])));
+  else if (rcode == INTEGER_CST)
+   {
+ if (integer_zerop (ops[0]))
+   gimple_cond_make_false (stmt);
+ else
+   gimple_cond_make_true (stmt);
+   }
+  else if (!inplace)
+   {
+ tree res = maybe_push_res_to_seq (rcode, boolean_type_node,
+   ops, seq);
+ if (!res)
+   return false;
+ gimple_cond_set_condition (stmt, NE_EXPR, res,
+build_zero_cst (TREE_TYPE (res)));
+   }
+  else
+   return false;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, gimple_simplified to );
+ if (!gimple_seq_empty_p (*seq))
+   print_gimple_seq (dump_file, *seq, 0, TDF_SLIM);
+ print_gimple_stmt (dump_file, gsi_stmt (*gsi),
+0, TDF_SLIM);
+   }
+  gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT);
+  return true;
+}
+  else if (is_gimple_assign (stmt)
+	   && rcode.is_tree_code ())
+{
+  if (!inplace
+	  || gimple_num_ops (stmt) >= get_gimple_rhs_num_ops (rcode))
+   {
+ maybe_build_generic_op (rcode,
+ TREE_TYPE (gimple_assign_lhs (stmt)),
+ ops[0], ops[1], ops[2]);
+ gimple_assign_set_rhs_with_ops_1 (gsi, rcode,
+   ops[0], ops[1], ops[2]);
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, gimple_simplified to );
+ if (!gimple_seq_empty_p (*seq))
+   print_gimple_seq (dump_file, *seq, 0, TDF_SLIM);
+ print_gimple_stmt (dump_file, gsi_stmt (*gsi),
+0, TDF_SLIM);
+   }
+ gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT);
+ return true;
+   }
+}
+  else if (!inplace)
+{
+  if (gimple_has_lhs (stmt))
+   {
+ tree lhs = gimple_get_lhs (stmt);
+ maybe_push_res_to_seq (rcode, TREE_TYPE (lhs),
+ops, seq, lhs);
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, gimple_simplified to );
+ print_gimple_seq (dump_file, *seq, 0, TDF_SLIM);
+   }
+ gsi_replace_with_seq_vops 

Re: [C++ Patch] Add default arguments to cp_parser_unary_expression

2014-10-21 Thread Jason Merrill

OK.

Jason


[PATCH][dejagnu] gcc-dg-prune glitch when filtering relocation truncation error

2014-10-21 Thread Jiong Wang


On 19/08/14 17:30, Mike Stump wrote:

On Aug 19, 2014, at 6:12 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

So how about this?

Ok.  Thanks.


It looks like this patch only fixed one invocation path.

Currently, gcc-dg-prune may be invoked directly *or* via
${tool}_check_compile, and gcc-dg-prune is implemented to return
::unsupported::memory full if the input message contains the relocation
truncation error pattern.
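
Roughly, that early check in gcc-dg-prune looks like the sketch below
(illustrative only, the exact pattern in the prune code may differ):

    # Sketch of the existing check, not the literal prune code.
    if [regexp -- {relocation truncated to fit} $text] {
	# The ::unsupported:: prefix is what dg.exp keys on.
	return "::unsupported::memory full"
    }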

This return message is OK when gcc-dg-prune is invoked directly, but it is
wrong when it is invoked via ${tool}_check_compile, because
${tool}_check_compile does a duplicated check for unsupported testcases
later via ${tool}_check_unsupported_p, which only works on the original
output message by matching the relocation truncation keyword. So our early
hijack of the error in gcc-dg-prune replaces those keywords with
::unsupported::memory full, which confuses the later check.

This patch does the following cleanup:

* Modify the expected output in ${tool}_check_compile:
  if gcc-dg-prune was invoked, we expect the ::unsupported:: keyword for an
  unsupported testcase.

* Remove the duplicated unresolved report in compat.exp:
  whenever ${tool}_check_compile returns 0 the issue has already been
  handled, so there is no need to report a redundant status.

ok for trunk?

gcc/testsuite/
  * lib/compat.exp (compat-run): Remove unresolved.
  * lib/gcc-defs.exp (${tools}_check_compile): Update code logic for 
unsupported testcase.
diff --git a/gcc/testsuite/lib/compat.exp b/gcc/testsuite/lib/compat.exp
index 7ab85aa..45cf0e0 100644
--- a/gcc/testsuite/lib/compat.exp
+++ b/gcc/testsuite/lib/compat.exp
@@ -134,7 +134,6 @@ proc compat-run { testname objlist dest optall optfile optstr } {
 		 $options]
 if ![${tool}_check_compile $testcase $testname link  \
 	 $dest $comp_output] then {
-	unresolved $testcase $testname execute $optstr
 	return
 }
 
diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
index cb93238..d479667 100644
--- a/gcc/testsuite/lib/gcc-defs.exp
+++ b/gcc/testsuite/lib/gcc-defs.exp
@@ -54,12 +54,17 @@ proc ${tool}_check_compile {testcase option objname gcc_output} {
 if { [info proc ${tool}-dg-prune] !=  } {
 	global target_triplet
 	set gcc_output [${tool}-dg-prune $target_triplet $gcc_output]
-}
-
-set unsupported_message [${tool}_check_unsupported_p $gcc_output]
-if { $unsupported_message !=  } {
-	unsupported $testcase: $unsupported_message
-	return 0
+	if [string match *::unsupported::* $gcc_output] then {
+	regsub -- ::unsupported:: $gcc_output  gcc_output
+	unsupported $testcase: $gcc_output
+	return 0
+	}
+} else {
+	set unsupported_message [${tool}_check_unsupported_p $gcc_output]
+	if { $unsupported_message !=  } {
+	unsupported $testcase: $unsupported_message
+	return 0
+	}
 }

 # remove any leftover LF/CR to make sure any output is legit

Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I saw the sources of these functions, but I can't understand why I
 should use something else? Note that all predicate computations are
 located in basic blocks ( by design of if-conv) and there is special
 function that put these computations in bb
 (insert_gimplified_predicates). Edge contains only predicate not its
 computations. New function - find_insertion_point() does very simple
 search - it finds out the latest (in current bb) operand def-stmt of
 predicates taken from all incoming edges.
 In original algorithm the predicate of non-critical edge is taken to
 perform phi-node predication since for critical edge it does not work
 properly.

 My question is: does your comments mean that I should re-design my extensions?

Well, we have infrastructure for inserting code on edges and you've
made critical edges predicated correctly.  So why re-invent the wheel?
I realize this is very similar to my initial suggestion to simply split
critical edges in loops you want to if-convert but delays splitting
until it turns out to be necessary (which might be good for the
!force_vect case).

For edge predicates you simply can emit their computation on the
edge, no?

Btw, I very originally suggested to rework if-conversion to only
record edge predicates - having both block and edge predicates
somewhat complicates the code and makes it harder to
maintain (thus also the suggestion to simply split critical edges
if necessary to make BB predicates work always).
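
A minimal sketch of what pushing an edge predicate through that
infrastructure could look like (assumed usage only, neither fragment below
is from the posted patches; E and PRED_EXPR are illustrative names):

  /* Gimplify PRED_EXPR and queue the resulting statements on edge E;
     the returned value is the predicate to use for phi predication.  */
  static tree
  queue_edge_predicate (edge e, tree pred_expr)
  {
    gimple_seq stmts = NULL;
    tree pred = force_gimple_operand (pred_expr, &stmts, true, NULL_TREE);
    if (stmts)
      gsi_insert_seq_on_edge (e, stmts);
    return pred;
  }

and then, once all predicates are queued and before combine_blocks:

  /* Materializes the queued code, splitting (critical) edges as needed.  */
  gsi_commit_edge_inserts ();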

Your patches add a lot of code and to me it seems we can avoid
doing so much special casing.

Richard.

 Thanks.
 Yuri.

 BTW Jeff did initial review of my changes related to predicate
 computation for join blocks. I presented him updated patch with
 test-case and some minor changes in patch. But still did not get any
 feedback on it. Could you please take a look also on it?


 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

 Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
 if you want easy access to the newly created basic block to push
 the predicate to - see gsi_commit_edge_inserts implementation).

 Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 (all_preds_critical_p): New function.
 (if_convertible_bb_p): Use call of all_preds_critical_p
 to reject temporarily block if-conversion with incoming critical edges
 if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
 after adding support for extended predication.
 (predicate_bbs): Skip loop exit block also.Invoke build2_loc
 to compute predicate instead of fold_build2_loc.
 Add zeroing of edge 'aux' field.
 (find_phi_replacement_condition): Extend function interface:
 it returns NULL if given phi node must be handled by means of
 extended phi node predication. If number of predecessors of phi-block
 is equal 2 and at least one incoming edge is not critical original
 algorithm is used.
 (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
 Nullify 'aux' field of edges for blocks with two successors.

 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev ysrum...@gmail.com:
 Richard,

 Thanks for your answer!

 In current implementation phi node conversion assume 

[0/6] nvptx testsuite patches

2014-10-21 Thread Bernd Schmidt
This series modifies a large number of tests in order to clean up 
testsuite results on nvptx. The goal here was never really to get an 
entirely clean run - the target is just too different from conventional 
ones - but to be able to test the compiler sufficiently to be sure that 
it's in good shape for use in offloading. Most of the patches here add 
annotations for use of features like alloca or indirect jumps that are 
unsupported on the target.


Examples of things that still cause failures are things like dots in 
identifiers, use of constructors (which is something I want to look 
into), certain constructs that trigger bugs in the ptxas tool, and lots 
of undefined C library functions.



Bernd


[1/6] nvptx testsuite patches: alloca

2014-10-21 Thread Bernd Schmidt
This deals with uses of alloca in the testsuite. Some tests require it 
outright, others only at -O0, and others require it implicitly by 
requiring an alignment for stack variables bigger than the target's 
STACK_BOUNDARY. For the latter I've added explicit xfails.



Bernd
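
With the new keyword the per-test annotation is the usual dejagnu idiom,
for example (illustrative only, the exact directives vary per test):

/* { dg-require-effective-target alloca } */

and a test that only needs alloca at -O0 can instead be skipped for that
level, e.g.

/* { dg-skip-if "" { ! alloca } { "-O0" } { "" } } */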

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_alloca): New function.
	* gcc.c-torture/execute/20010209-1.c: Require alloca.
	* gcc.c-torture/execute/20020314-1.c: Likewise.
	* gcc.c-torture/execute/20020412-1.c: Likewise.
	* gcc.c-torture/execute/20021113-1.c: Likewise.
	* gcc.c-torture/execute/20040223-1.c: Likewise.
	* gcc.c-torture/execute/20040308-1.c: Likewise.
	* gcc.c-torture/execute/20040811-1.c: Likewise.
	* gcc.c-torture/execute/20070824-1.c: Likewise.
	* gcc.c-torture/execute/20070919-1.c: Likewise.
	* gcc.c-torture/execute/built-in-setjmp.c: Likewise.
	* gcc.c-torture/execute/pr22061-1.c: Likewise.
	* gcc.c-torture/execute/pr22061-4.c: Likewise.
	* gcc.c-torture/execute/pr43220.c: Likewise.
	* gcc.c-torture/execute/vla-dealloc-1.c: Likewise.
	* gcc.dg/torture/stackalign/alloca-1.c: Likewise.
	* gcc.dg/torture/stackalign/vararg-1.c: Likewise.
	* gcc.dg/torture/stackalign/vararg-2.c: Likewise.
	* gcc.c-torture/compile/2923-1.c: Likewise.
	* gcc.c-torture/compile/20030224-1.c: Likewise.
	* gcc.c-torture/compile/20071108-1.c: Likewise.
	* gcc.c-torture/compile/20071117-1.c: Likewise.
	* gcc.c-torture/compile/900313-1.c: Likewise.
	* gcc.c-torture/compile/pr17397.c: Likewise.
	* gcc.c-torture/compile/pr35006.c: Likewise.
	* gcc.c-torture/compile/pr42956.c: Likewise.
	* gcc.c-torture/compile/pr51354.c: Likewise.
	* gcc.c-torture/compile/pr55851.c: Likewise.
	* gcc.c-torture/compile/vla-const-1.c: Likewise.
	* gcc.c-torture/compile/vla-const-2.c: Likewise.
	* gcc.c-torture/compile/pr31507-1.c: Likewise.
	* gcc.c-torture/compile/pr52714.c: Likewise.
	* gcc.dg/20001012-2.c: Likewise.
	* gcc.dg/auto-type-1.c: Likewise.
	* gcc.dg/builtin-object-size-1.c: Likewise.
	* gcc.dg/builtin-object-size-2.c: Likewise.
	* gcc.dg/builtin-object-size-3.c: Likewise.
	* gcc.dg/builtin-object-size-4.c: Likewise.
	* gcc.dg/packed-vla.c: Likewise.
	* gcc.c-torture/compile/parms.c: Likewise.
	* gcc.c-torture/execute/920721-2.c: Skip -O0 unless alloca is available.
	* gcc.c-torture/execute/920929-1.c: Likewise.
	* gcc.c-torture/execute/921017-1.c: Likewise.
	* gcc.c-torture/execute/941202-1.c: Likewise.
	* gcc.c-torture/execute/align-nest.c: Likewise.
	* gcc.c-torture/execute/alloca-1.c: Likewise.
	* gcc.c-torture/execute/pr36321.c: Likewise.
	* gcc.c-torture/compile/20001221-1.c: Likewise.
	* gcc.c-torture/compile/20020807-1.c: Likewise.
	* gcc.c-torture/compile/20050801-2.c: Likewise.
	* gcc.c-torture/compile/920428-4.c: Likewise.
	* gcc.c-torture/compile/debugvlafunction-1.c: Likewise.
	* gcc.c-torture/compile/pr41469.c: Likewise.
	* gcc.dg/torture/pr48953.c: Likewise.
	* gcc.dg/torture/pr8081.c: Likewise.
	* gcc.dg/torture/stackalign/inline-1.c: Skip if nvptx-*-*.
	* gcc.dg/torture/stackalign/inline-2.c: Likewise.
	* gcc.dg/torture/stackalign/nested-1.c: Likewise.
	* gcc.dg/torture/stackalign/nested-2.c: Likewise.
	* gcc.dg/torture/stackalign/nested-3.c: Likewise.
	* gcc.dg/torture/stackalign/nested-4.c: Likewise.
	* gcc.dg/torture/stackalign/nested-1.c: Likewise.
	* gcc.dg/torture/stackalign/global-1.c: Likewise.
	* gcc.dg/torture/stackalign/pr16660-1.c: Likewise.
	* gcc.dg/torture/stackalign/pr16660-2.c: Likewise.
	* gcc.dg/torture/stackalign/pr16660-3.c: Likewise.
	* gcc.dg/torture/stackalign/ret-struct-1.c: Likewise.
	* gcc.dg/torture/stackalign/struct-1.c: Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,6 +606,15 @@ proc add_options_for_tls { flags } {
 return $flags
 }
 
+# Return 1 if alloca is supported, 0 otherwise.
+
+proc check_effective_target_alloca {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
 # Return 1 if thread local storage (TLS) is supported, 0 otherwise.
 
 proc check_effective_target_tls {} {
Index: gcc/testsuite/gcc.c-torture/execute/20010209-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20010209-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20010209-1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target alloca } */
 int b;
 int foo (void)
 {
Index: gcc/testsuite/gcc.c-torture/execute/20020314-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20020314-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20020314-1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target alloca } */
 void f(void * a, double y)
 {
 }
Index: gcc/testsuite/gcc.c-torture/execute/20020412-1.c

[2/6] nvptx testsuite patches: typed assembly

2014-10-21 Thread Bernd Schmidt
Since everything in ptx assembly is typed, KR C is problematic. There 
are a number of testcases that call functions with the wrong number of 
arguments, or arguments of the wrong type. I've added a new feature, 
untyped_assembly, which these tests now require. I've also used this for 
tests using builtin_apply/builtin_return.
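
As a hedged illustration (not an actual testsuite file), this is the kind
of K&R-style mismatch that typed assembly cannot represent: the call site
and the definition disagree about the arguments, which C accepts when no
prototype is in scope:

void foo ();            /* unprototyped declaration: arguments unchecked */

void
bar (void)
{
  foo (1, 2.0);         /* called with two arguments */
}

void
foo (x)                 /* K&R definition with a single int parameter */
     int x;
{
}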



Bernd

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_untyped_assembly): New function.
	* gcc.c-torture/compile/20091215-1.c: Require untyped_assembly.
	* gcc.c-torture/compile/920917-1.c: Likewise.
	* gcc.c-torture/compile/930120-1.c: Likewise.
	* gcc.c-torture/compile/930411-1.c: Likewise.
	* gcc.c-torture/compile/930529-1.c: Likewise.
	* gcc.c-torture/compile/930623-1.c: Likewise.
	* gcc.c-torture/compile/950329-1.c: Likewise.
	* gcc.c-torture/compile/calls.c: Likewise.
	* gcc.c-torture/compile/pr37258.c: Likewise.
	* gcc.c-torture/compile/pr37327.c: Likewise.
	* gcc.c-torture/compile/pr38360.c: Likewise.
	* gcc.c-torture/compile/pr43635.c: Likewise.
	* gcc.c-torture/compile/pr47428.c: Likewise.
	* gcc.c-torture/compile/pr47967.c: Likewise.
	* gcc.c-torture/compile/pr49145.c: Likewise.
	* gcc.c-torture/compile/pr51694.c: Likewise.
	* gcc.c-torture/compile/pr53411.c: Likewise.
	* gcc.c-torture/execute/20001101.c: Likewise.
	* gcc.c-torture/execute/20051012-1.c: Likewise.
	* gcc.c-torture/execute/920501-1.c: Likewise.
	* gcc.c-torture/execute/921202-1.c: Likewise.
	* gcc.c-torture/execute/921208-2.c: Likewise.
	* gcc.c-torture/execute/call-trap-1.c: Likewise.
	* gcc.c-torture/compile/20010525-1.c: Likewise.
	* gcc.c-torture/compile/20021015-2.c: Likewise.
	* gcc.c-torture/compile/20031023-1.c: Likewise.
	* gcc.c-torture/compile/20031023-2.c: Likewise.
	* gcc.c-torture/compile/pr49206.c: Likewise.
	* gcc.c-torture/execute/pr47237.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-1.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-3.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-4.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-return-1.c: Likewise.
	* gcc.dg/builtin-apply1.c: Likewise.
	* gcc.dg/builtin-apply2.c: Likewise.
	* gcc.dg/builtin-apply3.c: Likewise.
	* gcc.dg/builtin-apply4.c: Likewise.
	* gcc.dg/pr38338.c: Likewise.
	* gcc.dg/torture/pr41993.c: Likewise.
	* gcc.c-torture/compile/386.c: Likewise.
	* gcc.c-torture/compile/cmpsi386.c: Likewise.
	* gcc.c-torture/compile/consec.c: Likewise.
	* gcc.c-torture/compile/ex.c: Likewise.
	* gcc.c-torture/compile/pass.c: Likewise.
	* gcc.c-torture/compile/scal.c: Likewise.
	* gcc.c-torture/compile/uuarg.c: Likewise.
	* gcc.c-torture/compile/conv_tst.c: Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,6 +606,17 @@ proc add_options_for_tls { flags } {
 return $flags
 }
 
+# Return 1 if the assembler does not verify function types against
+# calls, 0 otherwise.  Such verification will typically show up problems
+# with KR C function declarations.
+
+proc check_effective_target_untyped_assembly {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
 # Return 1 if alloca is supported, 0 otherwise.
 
 proc check_effective_target_alloca {} {
Index: gcc/testsuite/gcc.c-torture/compile/20091215-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/20091215-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/20091215-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 void bar ();
 
 void
Index: gcc/testsuite/gcc.c-torture/compile/920917-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/920917-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/920917-1.c
@@ -1,2 +1,4 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 inline f(x){switch(x){case 6:case 4:case 3:case 1:;}return x;}
g(){f(sizeof("xx"));}
Index: gcc/testsuite/gcc.c-torture/compile/930120-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/930120-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/930120-1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target untyped_assembly } */
 union {
   short I[2];
   long int L;
Index: gcc/testsuite/gcc.c-torture/compile/930411-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/930411-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/930411-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target untyped_assembly } */
+
 int heap;
 
 g(){}
Index: gcc/testsuite/gcc.c-torture/compile/930529-1.c
===
--- 

Re: [PATCH][ARM] Update target testcases for gnu11

2014-10-21 Thread Ramana Radhakrishnan



On 21/10/14 14:48, Jiong Wang wrote:

this patch update arm testcases for recently gnu11 change.

ok for trunk?


This is OK  bar the minor nit in the ChangeLog below - as a follow up it 
would be nice to see if we can use the ACLE feature macros instead of 
hard-coding some of the functions into the target_neon supports 
(especially the ones for vcvt16 and vfma).



thanks.

gcc/testsuite/
* gcc.target/arm/20031108-1.c: Add explicit declaration.
* gcc.target/arm/cold-lc.c: Likewise.
* gcc.target/arm/neon-modes-2.c: Likewise.
* gcc.target/arm/pr43920-2.c: Likewise.
* gcc.target/arm/pr44788.c: Likewise.
* gcc.target/arm/pr55642.c: Likewise.
* gcc.target/arm/pr58784.c: Likewise.
* gcc.target/arm/pr60650.c: Likewise.
* gcc.target/arm/pr60650-2.c: Likewise.
* gcc.target/arm/vfp-ldmdbs.c: Likewise.
* gcc.target/arm/vfp-ldmias.c: Likewise.



* lib/target-supports.exp: Likewise.


Can you mention the specific target-supports functions changed here, 
please ?



* gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration.



regards
Ramana


[3/6] nvptx testsuite patches: stdio

2014-10-21 Thread Bernd Schmidt
Some tests use stdio functions which are unavailable with the cut-down 
newlib I'm using for ptx testing. I'm somewhat uncertain what to do with 
these; they are by no means the only unavailable library functions the 
testsuite tries to use (signal is another example). Here's a patch which 
deals with parts of the problem, but I wouldn't mind leaving this one 
out if it doesn't seem worthwhile.



Bernd

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_stdio): New
	function.
	* gcc.c-torture/execute/gofast.c (fail): Don't fprintf on nvptx.
	* gcc.dg/torture/matrix-1.c: Require stdio.
	* gcc.dg/torture/matrix-2.c: Require stdio.
	* gcc.dg/torture/matrix-5.c: Require stdio.
	* gcc.dg/torture/matrix-6.c: Require stdio.
	* gcc.dg/torture/transpose-1.c: Require stdio.
	* gcc.dg/torture/transpose-2.c: Require stdio.
	* gcc.dg/torture/transpose-3.c: Require stdio.
	* gcc.dg/torture/transpose-4.c: Require stdio.
	* gcc.dg/torture/transpose-5.c: Require stdio.
	* gcc.dg/torture/transpose-6.c: Require stdio.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,6 +606,15 @@ proc add_options_for_tls { flags } {
 return $flags
 }
 
+# Return 1 if the C library on this target has stdio support, 0 otherwise.
+
+proc check_effective_target_stdio {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
 # Return 1 if the assembler does not verify function types against
 # calls, 0 otherwise.  Such verification will typically show up problems
 # with KR C function declarations.
Index: gcc/testsuite/gcc.c-torture/execute/gofast.c
===
--- gcc/testsuite/gcc.c-torture/execute/gofast.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/gofast.c
@@ -48,7 +48,9 @@ int
 fail (char *msg)
 {
   fail_count++;
+#ifndef __nvptx__
   fprintf (stderr, "Test failed: %s\n", msg);
+#endif
 }
 
 int
Index: gcc/testsuite/gcc.dg/torture/matrix-1.c
===
--- gcc/testsuite/gcc.dg/torture/matrix-1.c.orig
+++ gcc/testsuite/gcc.dg/torture/matrix-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/matrix-2.c
===
--- gcc/testsuite/gcc.dg/torture/matrix-2.c.orig
+++ gcc/testsuite/gcc.dg/torture/matrix-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
-
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/matrix-5.c
===
--- gcc/testsuite/gcc.dg/torture/matrix-5.c.orig
+++ gcc/testsuite/gcc.dg/torture/matrix-5.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
-
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/matrix-6.c
===
--- gcc/testsuite/gcc.dg/torture/matrix-6.c.orig
+++ gcc/testsuite/gcc.dg/torture/matrix-6.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
-
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/transpose-1.c
===
--- gcc/testsuite/gcc.dg/torture/transpose-1.c.orig
+++ gcc/testsuite/gcc.dg/torture/transpose-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/transpose-2.c
===
--- gcc/testsuite/gcc.dg/torture/transpose-2.c.orig
+++ gcc/testsuite/gcc.dg/torture/transpose-2.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/transpose-3.c
===
--- gcc/testsuite/gcc.dg/torture/transpose-3.c.orig
+++ gcc/testsuite/gcc.dg/torture/transpose-3.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-fwhole-program" } */
+/* { dg-require-effective-target stdio } */
 
 #include <stdio.h>
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/torture/transpose-4.c
===
--- gcc/testsuite/gcc.dg/torture/transpose-4.c.orig
+++ gcc/testsuite/gcc.dg/torture/transpose-4.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options 

Re: [PATCH i386 AVX512] [81/n] Add new built-ins.

2014-10-21 Thread Kirill Yukhin
Hello,
On 21 Oct 11:17, Richard Biener wrote:
 On Mon, Oct 20, 2014 at 3:50 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Oct 20, 2014 at 05:41:25PM +0400, Kirill Yukhin wrote:
  Hello,
  This patch adds (almost) all built-ins needed by
  AVX-512VL,BW,DQ intrinsics.
 
  Main questionable hunk is:
 
  diff --git a/gcc/tree-core.h b/gcc/tree-core.h
  index b69312b..a639487 100644
  --- a/gcc/tree-core.h
  +++ b/gcc/tree-core.h
  @@ -1539,7 +1539,7 @@ struct GTY(()) tree_function_decl {
DECL_FUNCTION_CODE.  Otherwise unused.
???  The bitfield needs to be able to hold all target function
  codes as well.  */
  -  ENUM_BITFIELD(built_in_function) function_code : 11;
  +  ENUM_BITFIELD(built_in_function) function_code : 12;
 ENUM_BITFIELD(built_in_class) built_in_class : 2;
 
 unsigned static_ctor_flag : 1;
 
  Well, decl_with_vis has 15 unused bits, so instead of growing
  FUNCTION_DECL significantly, might be better to move one of the
  flags to decl_with_vis and just document that it applies to FUNCTION_DECLs
  only.  Or move some flag to cgraph if possible.
 
  But seeing e.g.
 IX86_BUILTIN_FIXUPIMMPD256, IX86_BUILTIN_FIXUPIMMPD256_MASK,
 IX86_BUILTIN_FIXUPIMMPD256_MASKZ
  etc. I wonder if you really need that many builtins, weren't we adding
  for avx512f just single builtin instead of 3 different ones, always
  providing mask argument and depending on whether it is all ones, etc.
  figuring out what kind of masking should be performed?
 
 If only we had no lang-specific flags in tree_base we could use
 the same place as we use for internal function code ...
 
 But yes, not using that many builtins in the first place is preferred
 for example by making them type-generic and/or variadic.
We might try to refactor x86 built-ins toward type-generic approach, but
I think it can be postponed to 6.x release series.
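
For illustration only, a hedged sketch of the single-builtin scheme
discussed above (this is not the actual i386 expander, and the emit_*
helpers are made-up names): one builtin always carries a mask, and the
expander chooses the unmasked, merge-masking or zero-masking form itself.

static rtx
expand_one_masked_builtin (rtx target, rtx src, rtx merge, rtx mask)
{
  if (CONST_INT_P (mask) && INTVAL (mask) == -1)
    /* All mask bits set: no masking needed.  */
    return emit_unmasked_form (target, src);
  if (merge == CONST0_RTX (GET_MODE (target)))
    /* Zero destination: the _MASKZ flavour.  */
    return emit_zero_masked_form (target, src, mask);
  /* Otherwise merge-masking, the _MASK flavour.  */
  return emit_merge_masked_form (target, src, merge, mask);
}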

--
Thanks, K
 
 Richard.
 
  Jakub


Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Richard Biener
On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I saw the sources of these functions, but I can't understand why I
 should use something else? Note that all predicate computations are
 located in basic blocks ( by design of if-conv) and there is special
 function that put these computations in bb
 (insert_gimplified_predicates). Edge contains only predicate not its
 computations. New function - find_insertion_point() does very simple
 search - it finds out the latest (in current bb) operand def-stmt of
 predicates taken from all incoming edges.
 In original algorithm the predicate of non-critical edge is taken to
 perform phi-node predication since for critical edge it does not work
 properly.

 My question is: does your comments mean that I should re-design my 
 extensions?

 Well, we have infrastructure for inserting code on edges and you've
 made critical edges predicated correctly.  So why re-invent the wheel?
 I realize this is very similar to my initial suggestion to simply split
 critical edges in loops you want to if-convert but delays splitting
 until it turns out to be necessary (which might be good for the
 !force_vect case).

 For edge predicates you simply can emit their computation on the
 edge, no?

 Btw, I very originally suggested to rework if-conversion to only
 record edge predicates - having both block and edge predicates
 somewhat complicates the code and makes it harder to
 maintain (thus also the suggestion to simply split critical edges
 if necessary to make BB predicates work always).

 Your patches add a lot of code and to me it seems we can avoid
 doing so much special casing.

For example attacking the critical edge issue by a simple

Index: tree-if-conv.c
===
--- tree-if-conv.c  (revision 216508)
+++ tree-if-conv.c  (working copy)
@@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
if (EDGE_COUNT (e->src->succs) == 1)
  found = true;
   if (!found)
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "only critical predecessors\n");
- return false;
-   }
+   split_edge (EDGE_PRED (bb, 0));
 }

   return true;

it changes the number of blocks in the loop, so
get_loop_body_in_if_conv_order should probably be re-done with the
above eventually signalling that it created a new block.  Or the above
should populate a vector of edges to split and do that after the
loop calling if_convertible_bb_p.
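
For concreteness, a hedged sketch (not part of any posted patch) of that
"collect, then split" variant; the pending vector and the return convention
are illustrative only:

static bool
split_pending_critical_edges (vec<edge> pending)
{
  bool changed = false;
  unsigned i;
  edge e;
  FOR_EACH_VEC_ELT (pending, i, e)
    {
      split_edge (e);   /* inserts a forwarder block on the critical edge */
      changed = true;
    }
  /* When CHANGED, the caller must redo get_loop_body_in_if_conv_order.  */
  return changed;
}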

Richard.

 Richard.

 Thanks.
 Yuri.

 BTW Jeff did initial review of my changes related to predicate
 computation for join blocks. I presented him updated patch with
 test-case and some minor changes in patch. But still did not get any
 feedback on it. Could you please take a look also on it?


 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

 Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
 if you want easy access to the newly created basic block to push
 the predicate to - see gsi_commit_edge_inserts implementation).

 Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  ysrum...@gmail.com

 (flag_force_vectorize): New variable.
 (edge_predicate): New function.
 (set_edge_predicate): New function.
 (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
 if destination block of edge is not always executed. Set-up predicate
 for critical edge.
 (if_convertible_phi_p): Accept phi nodes with more than two args
 if FLAG_FORCE_VECTORIZE was set-up.
 (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
 (if_convertible_stmt_p): Fix up pre-function comments.
 

Re: [PATCH i386 AVX512] [81/n] Add new built-ins.

2014-10-21 Thread Jakub Jelinek
On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote:
 --- a/gcc/tree.h
 +++ b/gcc/tree.h
 @@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree);
  #define DECL_COMDAT(NODE) \
(DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag)
  
 + /* In a FUNCTION_DECL indicates that a static chain is needed.  */
 +#define DECL_STATIC_CHAIN(NODE) \
 +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag)
 +

I would say that you should still keep it together with the FUNCTION_DECL
macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want
the macro to be used on VAR_DECLs etc.
So just s/function_decl/decl_with_vis/ in the definition IMHO.

Also, with so many added builtins, how does it affect
int i;
compilation time at -O0?  If it is significant, maybe it is high time to
make the md builtin decl building more lazy.

Jakub


[4/6] nvptx testsuite patches: xfails and skips

2014-10-21 Thread Bernd Schmidt
Some things don't fit into nice categories that apply to a larger set of 
tests, or which are somewhat random like ptxas tool failures. For these 
I've added xfails and skips.



Bernd

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_trampolines,
	check_profiling_available, check_effective_target_lto,
	check_effective_target_vect_natural): False for nvptx-*-*.
	* gcc.c-torture/compile/limits-fndefn.c: Skip for nvptx-*-*.
	* gcc.c-torture/compile/pr34334.c: Likewise.
	* gcc.c-torture/compile/pr37056.c: Likewise.
	* gcc.c-torture/compile/pr39423-1.c: Likewise.
	* gcc.c-torture/compile/pr46534.c: Likewise.
	* gcc.c-torture/compile/pr49049.c: Likewise.
	* gcc.c-torture/compile/pr59417.c: Likewise.
	* gcc.c-torture/compile/20080721-1.c: Likewise.
	* gcc.c-torture/compile/920501-4.c: Likewise.
	* gcc.c-torture/compile/921011-1.c: Likewise.	
	* gcc.dg/20040813-1.c: Likewise.
	* gcc.dg/pr28755.c: Likewise.
	* gcc.dg/pr44194-1.c: Likewise.
	* gcc.c-torture/compile/pr42717.c: Xfail for nvptx-*-*.
	* gcc.c-torture/compile/pr61684.c: Likewise.
	* gcc.c-torture/compile/pr20601-1.c: Likewise.
	* gcc.c-torture/compile/pr59221.c: Likewise.
	* gcc.c-torture/compile/20060208-1.c: Likewise.
	* gcc.c-torture/execute/pr52129.c: Likewise.
	* gcc.c-torture/execute/20020310-1.c: Likewise.
	* gcc.c-torture/execute/20101011-1.c: Define DO_TEST to 0 for nvptx.
	* gcc.c-torture/execute/20020312-2.c: Add case for nvptx.
	* gcc.c-torture/compile/pr60655-1.c: Don't add -fdata-sections
	for nvptx-*-*.
	* gcc.dg/pr36400.c: Xfail scan-assembler test on nvptx-*-*.
	* gcc.dg/const-elim-2.c: Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -436,6 +436,7 @@ proc check_effective_target_trampolines
 }
 if { [istarget avr-*-*]
 	 || [istarget msp430-*-*]
+	 || [istarget nvptx-*-*]
 	 || [istarget hppa2.0w-hp-hpux11.23]
 	 || [istarget hppa64-hp-hpux11.23] } {
 	return 0;
@@ -532,6 +533,7 @@ proc check_profiling_available { test_wh
 	 || [istarget msp430-*-*]
 	 || [istarget nds32*-*-elf]
 	 || [istarget nios2-*-elf]
+	 || [istarget nvptx-*-*]
 	 || [istarget powerpc-*-eabi*]
 	 || [istarget powerpc-*-elf]
 	 || [istarget rx-*-*]	
@@ -4216,7 +4218,8 @@ proc check_effective_target_vect_natural
 verbose "check_effective_target_vect_natural_alignment: using cached result" 2
 } else {
 set et_vect_natural_alignment_saved 1
-if { [check_effective_target_arm_eabi] } {
+if { [check_effective_target_arm_eabi]
+	 || [istarget nvptx-*-*] } {
 set et_vect_natural_alignment_saved 0
 }
 }
@@ -5691,6 +5694,9 @@ proc check_effective_target_gld { } {
 
 proc check_effective_target_lto { } {
 global ENABLE_LTO
+if { [istarget nvptx-*-*] } {
+	return 0;
+}
 return [info exists ENABLE_LTO]
 }
 
Index: gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c
===
--- gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/limits-fndefn.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "too complex for avr" { avr-*-* } { "*" } { "" } } */
+/* { dg-skip-if "ptxas times out" { nvptx-*-* } { "*" } { "" } } */
 /* { dg-timeout-factor 4.0 } */
 #define LIM1(x) x##0, x##1, x##2, x##3, x##4, x##5, x##6, x##7, x##8, x##9,
 #define LIM2(x) LIM1(x##0) LIM1(x##1) LIM1(x##2) LIM1(x##3) LIM1(x##4) \
Index: gcc/testsuite/gcc.c-torture/compile/pr60655-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr60655-1.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/pr60655-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-fdata-sections" { target { ! { { hppa*-*-hpux* } && { ! lp64 } } } } } */
+/* { dg-options "-fdata-sections" { target { { ! { { hppa*-*-hpux* } && { ! lp64 } } } && { ! nvptx-*-* } } } } */
 
 typedef unsigned char unit;
 typedef unit *unitptr;
Index: gcc/testsuite/gcc.c-torture/compile/pr34334.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr34334.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/pr34334.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "ptxas times out" { nvptx-*-* } { "*" } { "-O0" } } */
 __extension__ typedef __SIZE_TYPE__ size_t;
 __extension__ typedef long long int __quad_t;
 __extension__ typedef unsigned int __mode_t;
Index: gcc/testsuite/gcc.c-torture/compile/pr37056.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr37056.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/pr37056.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "ptxas times out" { nvptx-*-* } { "-O2" "-Os" } { "" } } */
 extern void abort (void);
 
 static union {
Index: gcc/testsuite/gcc.c-torture/compile/pr39423-1.c

[5/6] nvptx testsuite patches: jumps and labels

2014-10-21 Thread Bernd Schmidt
This deals with tests requiring indirect jumps (including tests using 
setjmp), label values, and nonlocal goto.


A subset of these tests uses the NO_LABEL_VALUES macro, but it's not 
consistent across the testsuite. The feature test I wrote tests whether 
that is defined and returns false for label_values if so.
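
As a hedged illustration (again not an actual testsuite file), the GNU
address-of-label extension that the new label_values keyword guards looks
like this; previously such tests were wrapped in #ifndef NO_LABEL_VALUES:

/* { dg-require-effective-target label_values } */
void *
get_label (void)
{
 lab:
  return &&lab;   /* GNU C "address of label" extension */
}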



Bernd

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_indirect_jumps):
	New function.
	(check_effective_target_nonlocal_goto): New function.
	(check_effective_target_label_values): New function.
	* gcc.c-torture/execute/20071220-2.c: Require label_values.
	* gcc.c-torture/compile/labels-2.c: Likewise.
	* gcc.c-torture/compile/2518-1.c: Likewise.
	* gcc.c-torture/compile/20021108-1.c: Likewise.
	* gcc.c-torture/compile/981006-1.c: Likewise.
	* gcc.c-torture/execute/20040302-1.c: Likewise.
	* gcc.dg/torture/pr33848.c: Likewise.

	* gcc.c-torture/compile/pr46107.c: Require indirect jumps and
	label values.
	* gcc.c-torture/compile/pr32919.c: Likewise.
	* gcc.c-torture/compile/pr17913.c: Likewise.
	* gcc.c-torture/compile/pr51495.c: Likewise.
	* gcc.c-torture/compile/pr25224.c: Likewise.
	* gcc.c-torture/compile/labels-3.c: Likewise.
	* gcc.c-torture/compile/pr27863.c: Likewise.
	* gcc.c-torture/compile/20050510-1.c: Likewise.
	* gcc.c-torture/compile/pr28489.c: Likewise.
	* gcc.c-torture/compile/pr29128.c: Likewise.
	* gcc.c-torture/compile/pr21356.c: Likewise.
	* gcc.c-torture/execute/20071210-1.c: Likewise.
	* gcc.c-torture/execute/20071220-1.c: Likewise.
	* gcc.c-torture/execute/pr51447.c: Likewise.
	* gcc.c-torture/execute/comp-goto-1.c: Likewise.
	* gcc.c-torture/execute/comp-goto-2.c: Likewise.
	* gcc.dg/20021029-1.c: Likewise.
	* gcc.dg/pr43379.c: Likewise.
	* gcc.dg/pr45259.c: Likewise.
	* gcc.dg/torture/pr53695.c: Likewise.
	* gcc.dg/torture/pr57584.c: Likewise.

	* gcc.c-torture/execute/980526-1.c: Skip if -O0 and neither label_values
	or indirect_jumps are available.
	* gcc.c-torture/compile/920415-1.c: Likewise.  Remove NO_LABEL_VALUES
	test.
	* gcc.c-torture/compile/920428-3.c: Likewise.
	* gcc.c-torture/compile/950613-1.c: Likewise.

	* gcc.c-torture/compile/pr30984.c: Require indirect jumps.
	* gcc.c-torture/compile/991213-3.c: Likewise.
	* gcc.c-torture/compile/920825-1.c: Likewise.
	* gcc.c-torture/compile/20011029-1.c: Likewise.
	* gcc.c-torture/compile/complex-6.c: Likewise.
	* gcc.c-torture/compile/pr27127.c: Likewise.
	* gcc.c-torture/compile/pr58164.c: Likewise.
	* gcc.c-torture/compile/20041214-1.c: Likewise.
	* gcc.c-torture/execute/built-in-setjmp.c: Likewise.
	* gcc.c-torture/execute/pr56982.c: Likewise.
	* gcc.c-torture/execute/pr60003.c: Likewise.
	* gcc.c-torture/execute/pr26983.c: Likewise.
	* gcc.dg/pr57287-2.c: Likewise.
	* gcc.dg/pr59920-1.c: Likewise.
	* gcc.dg/pr59920-2.c: Likewise.
	* gcc.dg/pr59920-3.c: Likewise.
	* gcc.dg/setjmp-3.c: Likewise.
	* gcc.dg/setjmp-4.c: Likewise.
	* gcc.dg/setjmp-5.c: Likewise.
	* gcc.dg/torture/pr48542.c: Likewise.
	* gcc.dg/torture/pr57147-2.c: Likewise.
	* gcc.dg/torture/pr59993.c: Likewise.

	* gcc.dg/torture/stackalign/non-local-goto-1.c: Require nonlocal_goto.
	* gcc.dg/torture/stackalign/non-local-goto-2.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-3.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-4.c: Likewise.
	* gcc.dg/torture/stackalign/non-local-goto-5.c: Likewise.
 	* gcc.dg/torture/stackalign/setjmp-1.c: Likewise.
 	* gcc.dg/torture/stackalign/setjmp-3.c: Likewise.
 	* gcc.dg/torture/stackalign/setjmp-4.c: Likewise.
	* gcc.dg/non-local-goto-1.c: Likewise.
	* gcc.dg/non-local-goto-2.c: Likewise.
	* gcc.dg/pr49994-1.c: Likewise.
	* gcc.dg/torture/pr57036-2.c: Likewise.

	* gcc.c-torture/compile/20040614-1.c: Require label_values.  Remove
	NO_LABEL_VALUES test.
	* gcc.c-torture/compile/920831-1.c: Likewise.
	* gcc.c-torture/compile/920502-1.c: Likewise.
	* gcc.c-torture/compile/920501-7.c: Likewise.
	* gcc.dg/pr52139.c: Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,7 +606,38 @@ proc add_options_for_tls { flags } {
 return 1
 }
 
+# Return 1 if indirect jumps are supported, 0 otherwise.
+
+proc check_effective_target_indirect_jumps {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
+# Return 1 if nonlocal goto is supported, 0 otherwise.
+
+proc check_effective_target_nonlocal_goto {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
+# Return 1 if taking label values is supported, 0 otherwise.
+
+proc check_effective_target_label_values {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return [check_no_compiler_messages label_values assembly {
+	#ifdef NO_LABEL_VALUES
+	#error NO
+	#endif
+}]
+}
+
 # Return 1 if the assembler does not verify function types against

[6/7] Random tweaks

2014-10-21 Thread Bernd Schmidt
This tweaks a few tests so that we don't have to skip them.  This is 
mostly concerned with declaring main properly, or changing other 
declarations where the test does not seem to rely on the type mismatches.


I've also included one example of changing a function name to not be 
call, ptxas seems to have a bug that makes it not allow this function 
name. If that doesn't seem too awful I'll have a few more tests to fix 
up in this way.


There'll be a 7th patch, not because I can't count, but because I didn't 
follow a consistent naming scheme for the patches.



Bernd

	* gcc.c-torture/compile/920625-2.c: Add return type to
	freeReturnStruct.
	* gcc.c-torture/execute/20091229-1.c: Declare main properly.
	* gcc.c-torture/execute/pr61375.c: Likewise.
	* gcc.c-torture/execute/20111208-1.c: Use __SIZE_TYPE__ for size_t.
	* gcc.dg/pr30904.c: Remove extern from declaration of t.
	* gcc.c-torture/compile/callind.c (bar): Renamed from call.

Index: gcc/testsuite/gcc.c-torture/compile/920625-2.c
===
--- gcc/testsuite/gcc.c-torture/compile/920625-2.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/920625-2.c
@@ -100,4 +100,4 @@ copyQueryResult(Widget w, Boolean copy,
   freeReturnStruct();
 }
 
-freeReturnStruct(){}
+void freeReturnStruct(){}
Index: gcc/testsuite/gcc.c-torture/execute/20091229-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20091229-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20091229-1.c
@@ -1,2 +1,2 @@
 long long foo(long long v) { return v / -0x08000LL; }
-void main() { if (foo(0x08000LL) != -1) abort(); exit (0); }
+int main(int argc, char **argv) { if (foo(0x08000LL) != -1) abort(); exit (0); }
Index: gcc/testsuite/gcc.c-torture/execute/20111208-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20111208-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20111208-1.c
@@ -1,7 +1,7 @@
 /* PR tree-optimization/51315 */
 /* Reported by Jurij Smakov ju...@wooyd.org */
 
-typedef unsigned int size_t;
+typedef __SIZE_TYPE__ size_t;
 
 extern void *memcpy (void *__restrict __dest,
__const void *__restrict __src, size_t __n)
Index: gcc/testsuite/gcc.c-torture/execute/pr61375.c
===
--- gcc/testsuite/gcc.c-torture/execute/pr61375.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/pr61375.c
@@ -19,7 +19,7 @@ uint128_central_bitsi_ior (unsigned __in
 }
 
 int
-main(int argc)
+main(int argc, char **argv)
 {
   __int128 in = 1;
 #ifdef __SIZEOF_INT128__
Index: gcc/testsuite/gcc.dg/pr30904.c
===
--- gcc/testsuite/gcc.dg/pr30904.c.orig
+++ gcc/testsuite/gcc.dg/pr30904.c
@@ -1,7 +1,7 @@
 /* { dg-do link } */
 /* { dg-options -O2 -fdump-tree-optimized } */
 
-extern int t;
+int t;
 extern void link_error(void);
 int main (void)
 {
Index: gcc/testsuite/gcc.c-torture/compile/callind.c
===
--- gcc/testsuite/gcc.c-torture/compile/callind.c.orig
+++ gcc/testsuite/gcc.c-torture/compile/callind.c
@@ -1,8 +1,8 @@
-call (foo, a)
+bar (foo, a)
  int (**foo) ();
 {
 
-  (foo)[1] = call;
+  (foo)[1] = bar;
 
   foo[a] (1);
 }


[7/7] nvptx testsuite patches: Return addresses

2014-10-21 Thread Bernd Schmidt

This tests for availability of return addresses in a number of tests.
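
A minimal illustration (not taken from the testsuite) of the construct the
new return_address keyword is meant to guard:

/* { dg-require-effective-target return_address } */
void *
caller_pc (void)
{
  return __builtin_return_address (0);
}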


Bernd

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_return_address):
	New function.
	* gcc.c-torture/execute/20010122-1.c: Require return_address.
	* gcc.c-torture/execute/20030323-1.c: Likewise.
	* gcc.c-torture/execute/20030811-1.c: Likewise.
	* gcc.c-torture/execute/eeprof-1.c: Likewise.
	* gcc.c-torture/execute/frame-address.c: Likewise.
	* gcc.c-torture/execute/pr17377.c: Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp.orig
+++ gcc/testsuite/lib/target-supports.exp
@@ -604,7 +606,17 @@ proc add_options_for_tls { flags } {
 return 1
 }
 
+# Return 1 if builtin_return_address and builtin_frame_address are
+# supported, 0 otherwise.
+
+proc check_effective_target_return_address {} {
+if { [istarget nvptx-*-*] } {
+	return 0
+}
+return 1
+}
+
 # Return 1 if the assembler does not verify function types against
 # calls, 0 otherwise.  Such verification will typically show up problems
 # with KR C function declarations.
 
Index: gcc/testsuite/gcc.c-torture/execute/20010122-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20010122-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20010122-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires frame pointers" { *-*-* } "-fomit-frame-pointer" "" } */
+/* { dg-require-effective-target return_address } */
 
 extern void exit (int);
 extern void abort (void);
Index: gcc/testsuite/gcc.c-torture/execute/20030323-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20030323-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20030323-1.c
@@ -1,4 +1,5 @@
 /* PR opt/10116 */
+/* { dg-require-effective-target return_address } */
 /* Removed tablejump while label still in use; this is really a link test.  */
 
 void *NSReturnAddress(int offset)
Index: gcc/testsuite/gcc.c-torture/execute/20030811-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20030811-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/20030811-1.c
@@ -1,4 +1,5 @@
 /* Origin: PR target/11535 from H. J. Lu h...@lucon.org */
+/* { dg-require-effective-target return_address } */
 
 void vararg (int i, ...)
 {
Index: gcc/testsuite/gcc.c-torture/execute/eeprof-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/eeprof-1.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/eeprof-1.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target return_address } */
 /* { dg-options "-finstrument-functions" } */
 /* { dg-xfail-if "" { powerpc-ibm-aix* } "*" "" } */
 
Index: gcc/testsuite/gcc.c-torture/execute/frame-address.c
===
--- gcc/testsuite/gcc.c-torture/execute/frame-address.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/frame-address.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target return_address } */
 int check_fa_work (const char *, const char *) __attribute__((noinline));
 int check_fa_mid (const char *) __attribute__((noinline));
 int check_fa (char *) __attribute__((noinline));
Index: gcc/testsuite/gcc.c-torture/execute/pr17377.c
===
--- gcc/testsuite/gcc.c-torture/execute/pr17377.c.orig
+++ gcc/testsuite/gcc.c-torture/execute/pr17377.c
@@ -1,6 +1,7 @@
 /* PR target/17377
Bug in code emitted by return pattern on CRIS: missing pop of
forced return address on stack.  */
+/* { dg-require-effective-target return_address } */
 int calls = 0;
 
 void *f (int) __attribute__ ((__noinline__));


Re: [PATCH,1/2] Extended if-conversion for loops marked with pragma omp simd.

2014-10-21 Thread Yuri Rumyantsev
Richard,

In my initial design I did such splitting but before start real
if-conversion but I decided to not perform it since code size for
if-converted loop is growing (number of phi nodes is increased). It is
worth noting also that for phi with #nodes > 2 we need to get all
predicates (except one) to do phi-predication and it means that block
containing such phi can have only 1 critical edge.
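
As a purely illustrative C analogue (not GIMPLE from the patch) of why all
but one of the predicates is needed: predicating a three-argument phi
amounts to a chain of selects, one per extra predecessor:

int
predicated_phi (int p0, int p1, int a, int b, int c)
{
  int tmp = p1 ? b : c;   /* second predicate picks between b and c */
  return p0 ? a : tmp;    /* first predicate picks a, else the rest */
}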

Thanks.
Yuri.

2014-10-21 18:19 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Richard,

 I saw the sources of these functions, but I can't understand why I
 should use something else? Note that all predicate computations are
 located in basic blocks ( by design of if-conv) and there is special
 function that put these computations in bb
 (insert_gimplified_predicates). Edge contains only predicate not its
 computations. New function - find_insertion_point() does very simple
 search - it finds out the latest (in current bb) operand def-stmt of
 predicates taken from all incoming edges.
 In original algorithm the predicate of non-critical edge is taken to
 perform phi-node predication since for critical edge it does not work
 properly.

 My question is: does your comments mean that I should re-design my 
 extensions?

 Well, we have infrastructure for inserting code on edges and you've
 made critical edges predicated correctly.  So why re-invent the wheel?
 I realize this is very similar to my initial suggestion to simply split
 critical edges in loops you want to if-convert but delays splitting
 until it turns out to be necessary (which might be good for the
 !force_vect case).

 For edge predicates you simply can emit their computation on the
 edge, no?

 Btw, I very originally suggested to rework if-conversion to only
 record edge predicates - having both block and edge predicates
 somewhat complicates the code and makes it harder to
 maintain (thus also the suggestion to simply split critical edges
 if necessary to make BB predicates work always).

 Your patches add a lot of code and to me it seems we can avoid
 doing so much special casing.

 For example attacking the critical edge issue by a simple

 Index: tree-if-conv.c
 ===
 --- tree-if-conv.c  (revision 216508)
 +++ tree-if-conv.c  (working copy)
 @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
 if (EDGE_COUNT (e->src->succs) == 1)
   found = true;
if (!found)
 -   {
 - if (dump_file && (dump_flags & TDF_DETAILS))
 -   fprintf (dump_file, "only critical predecessors\n");
 - return false;
 -   }
 +   split_edge (EDGE_PRED (bb, 0));
  }

return true;

 it changes the number of blocks in the loop, so
 get_loop_body_in_if_conv_order should probably be re-done with the
 above eventually signalling that it created a new block.  Or the above
 should populate a vector of edges to split and do that after the
 loop calling if_convertible_bb_p.

 Richard.

 Richard.

 Thanks.
 Yuri.

 BTW Jeff did initial review of my changes related to predicate
 computation for join blocks. I presented him updated patch with
 test-case and some minor changes in patch. But still did not get any
 feedback on it. Could you please take a look also on it?


 2014-10-21 17:38 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 Yes, This patch does not make sense since phi node predication for bb
 with critical incoming edges only performs another function which is
 absent (predicate_extended_scalar_phi).

 BTW I see that commit_edge_insertions() is used for rtx instructions
 only but you propose to use it for tree also.
 Did I miss something?

 Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
 if you want easy access to the newly created basic block to push
 the predicate to - see gsi_commit_edge_inserts implementation).

 Richard.

 Thanks ahead.


 2014-10-21 16:44 GMT+04:00 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Richard,

 I did some changes in patch and ChangeLog to mark that support for
 if-convert of blocks with only critical incoming edges will be added
 in the future (more precise in patch.4).

 But the same reasoning applies to this version of the patch when
 flag_force_vectorize is true!?  (insertion point and invalid SSA form)

 Which means the patch doesn't make sense in isolation?

 Btw, I think for the case you should simply do gsi_insert_on_edge ()
 and commit_edge_insertions () before the call to combine_blocks
 (pushing the edge predicate to the newly created block).

 Richard.

 Could you please review it.

 Thanks.

 ChangeLog:

 2014-10-21  Yuri Rumyantsev  

Re: [PATCH][ARM]Add ACLE 2.0 predefined marco __ARM_FEATURE_IDIV

2014-10-21 Thread Ramana Radhakrishnan
On Mon, Oct 13, 2014 at 3:15 PM, Renlin Li renlin...@arm.com wrote:
 Hi all,

 This is a simple patch to add missing __ARM_FEATURE_IDIV__ predefined
 macro (ACLE 2.0) into TARGET_CPU_CPP_BUILTINS.
 Is it Okay to commit?


 gcc/ChangeLog:

 2014-10-13  Renlin Li  renlin...@arm.com

 * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Add ACLE 2.0 predefined
 macro __ARM_FEATURE_IDIV__.

Replace this with "Define __ARM_FEATURE_IDIV__." in the Changelog.

Ok with that change.

Ramana


Re: [0/6] nvptx testsuite patches

2014-10-21 Thread Jeff Law

On 10/21/14 14:10, Bernd Schmidt wrote:

This series modifies a large number of tests in order to clean up
testsuite results on nvptx. The goal here was never really to get an
entirely clean run - the target is just too different from conventional
ones - but to be able to test the compiler sufficiently to be sure that
it's in good shape for use in offloading. Most of the patches here add
annotations for use of features like alloca or indirect jumps that are
unsupported on the target.

Examples of things that still cause failures are things like dots in
identifiers, use of constructors (which is something I want to look
into), certain constructs that trigger bugs in the ptxas tool, and lots
of undefined C library functions.
Yea. When I first looked at PTX, my thought was to use the existing 
testsuite, to the extent possible, to shake out the initial code 
generation issues.  There are just some things that would require heroic 
effort to make work and they aren't really a priority for PTX.


So I've got no problem conceptually with the direction this work is taking.

jeff



Re: [PATCH i386 AVX512] [81/n] Add new built-ins.

2014-10-21 Thread Kirill Yukhin
On 21 Oct 16:20, Jakub Jelinek wrote:
 On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote:
  --- a/gcc/tree.h
  +++ b/gcc/tree.h
  @@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree);
   #define DECL_COMDAT(NODE) \
 (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag)
   
  + /* In a FUNCTION_DECL indicates that a static chain is needed.  */
  +#define DECL_STATIC_CHAIN(NODE) \
  +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag)
  +
 
 I would say that you should still keep it together with the FUNCTION_DECL
 macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want
 the macro to be used on VAR_DECLs etc.
 So just s/function_decl/decl_with_vis/ in the definition IMHO.
Yeah, sure.

 Also, with so many added builtins, how does it affect
 int i;
 compilation time at -O0?  If it is significant, maybe it is high time to
 make the md builtin decl building more lazy.
I've tried this:
$ echo int i;  test.c
$ time for i in `seq 1` ; do ./build-x86_64-linux/gcc/xgcc 
-B./build-x86_64-linux/gcc -O0 -S test.c ; done

For trunk w/ and w/o the patch applied.
Got 106.86 vs. 106.85 secs., which looks equal.
So, I think we may say that this patch does not affect compile time. 

--
Thanks, K
 
   Jakub


[COMMITTED][PATCH][ARM] Update target testcases for gnu11

2014-10-21 Thread Jiong Wang


On 21/10/14 15:13, Ramana Radhakrishnan wrote:


On 21/10/14 14:48, Jiong Wang wrote:

this patch update arm testcases for recently gnu11 change.

ok for trunk?

This is OK  bar the minor nit in the ChangeLog below - as a follow up it
would be nice to see if we can use the ACLE feature macros instead of
hard-coding some of the functions into the target_neon supports
(especially the ones for vcvt16 and vfma).


thanks.

gcc/testsuite/
 * gcc.target/arm/20031108-1.c: Add explicit declaration.
 * gcc.target/arm/cold-lc.c: Likewise.
 * gcc.target/arm/neon-modes-2.c: Likewise.
 * gcc.target/arm/pr43920-2.c: Likewise.
 * gcc.target/arm/pr44788.c: Likewise.
 * gcc.target/arm/pr55642.c: Likewise.
 * gcc.target/arm/pr58784.c: Likewise.
 * gcc.target/arm/pr60650.c: Likewise.
 * gcc.target/arm/pr60650-2.c: Likewise.
 * gcc.target/arm/vfp-ldmdbs.c: Likewise.
 * gcc.target/arm/vfp-ldmias.c: Likewise.
 * lib/target-supports.exp: Likewise.

Can you mention the specific target-supports functions changed here,
please ?


OK, thanks, committed.

URL: https://gcc.gnu.org/viewcvs?rev=216517&root=gcc&view=rev
Log:
[ARM] Update testcases for GNU11

2014-10-21  Jiong Wang  jiong.w...@arm.com

* gcc.target/arm/20031108-1.c (Proc_7): Add explicit declaration.
(Proc_1): Add return type.
* gcc.target/arm/cold-lc.c (show_stack): Add explicit declaration.
* gcc.target/arm/neon-modes-2.c (foo): Likewise.
* gcc.target/arm/pr43920-2.c (lseek): Likewise.
* gcc.target/arm/pr44788.c (foo): Likewise.
* gcc.target/arm/pr55642.c (abs): Likewise.
* gcc.target/arm/pr58784.c (f): Likewise.
* gcc.target/arm/pr60650.c (foo1, foo2): Likewise.
* gcc.target/arm/vfp-ldmdbs.c (bar): Likewise.
* gcc.target/arm/vfp-ldmias.c (bar): Likewise.
* gcc.target/arm/pr60650-2.c (fn1, fn2): Add return type and add type
for local variables.
* lib/target-supports.exp
(check_effective_target_arm_crypto_ok_nocache): Add declaration for
vaeseq_u8.
(check_effective_target_arm_neon_fp16_ok_nocache): Add declaration for
vcvt_f16_f32.
(check_effective_target_arm_neonv2_ok_nocache): Add declaration for
vfma_f32.
* gcc.target/arm/pr51968.c: Add -Wno-implicit-function-declaration.




Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'

2014-10-21 Thread Chen Gang
On 09/25/2014 08:12 AM, Chen Gang wrote:
 OK, thanks, next month, I shall try Qemu for microblaze (I also focus on 
 Qemu, and try to make patches for it).
 

Excuse me, after trying upstream qemu, I found it can't run microblaze
correctly, and I can't run the Xilinx qemu branch correctly either. I tried
to consult the related members on the qemu mailing list, but got no result.

After comparing the upstream branch and the Xilinx branch, I am sure the
upstream microblaze qemu lacks several of the main features related to
microblaze.

For qemu, I have to focus only on upstream and try bug fix patches; new
features and version merging are outside my current scope, so sorry, I
have to stop trying qemu for the microblaze gcc tests, at present.

 So I guess the root cause is: I only use cross-compiling environments
 under fedora x86_64, no any real or virtual target for test.
 
 Yes, if you want to test on a target, you will need a target.  You can either 
 have a simulator (see binutils and sim/* for an example of how to write one) 
 or target hardware in some form.
 

After trying sim, for me, it is a really useful way to test, although I
also met issues:

For a hello world C program, microblaze gcc succeeded in building it, gdb can
load and display the source code and disassemble the code successfully, but
sim reported a failure; the related issue is below:

  [root@localhost test]# /upstream/release/bin/microblaze-gchen-linux-run ./test
  Loading section .interp, size 0xd vma 0x10f4
  Loading section .note.ABI-tag, size 0x20 vma 0x1104
  Loading section .hash, size 0x24 vma 0x1124
  Loading section .dynsym, size 0x40 vma 0x1148
  Loading section .dynstr, size 0x3c vma 0x1188
  Loading section .gnu.version, size 0x8 vma 0x11c4
  Loading section .gnu.version_r, size 0x20 vma 0x11cc
  Loading section .rela.dyn, size 0x24 vma 0x11ec
  Loading section .rela.plt, size 0x24 vma 0x1210
  Loading section .init, size 0x58 vma 0x1234
  Loading section .plt, size 0x44 vma 0x128c
  Loading section .text, size 0x3d0 vma 0x12d0
  Loading section .fini, size 0x34 vma 0x16a0
  Loading section .rodata, size 0x12 vma 0x16d4
  Loading section .eh_frame, size 0x4 vma 0x16e8
  Loading section .ctors, size 0x8 vma 0x100016ec
  Loading section .dtors, size 0x8 vma 0x100016f4
  Loading section .jcr, size 0x4 vma 0x100016fc
  Loading section .dynamic, size 0xd0 vma 0x10001700
  Loading section .got, size 0xc vma 0x100017d0
  Loading section .got.plt, size 0x18 vma 0x100017dc
  Loading section .data, size 0x10 vma 0x100017f4
  Start address 0x12d0
  Transfer rate: 14424 bits in 1 sec.
  ERROR: Unknown opcode
  program stopped with signal 4.

For me, I guess it is sim's issue, and I shall try to fix it next
month; so sorry, I cannot finish the emulator work for microblaze within this
month. :-(


Welcome any ideas, suggestions or completions.

Thanks.
-- 
Chen Gang

Open share and attitude like air water and life which God blessed


Re: [PATCH 1/5] Add recog_constrain_insn

2014-10-21 Thread Vladimir Makarov
On 10/17/2014 10:47 AM, Richard Sandiford wrote:
 This patch just adds a new utility function called recog_constrain_insn,
 to go alongside the existing recog_constrain_insn_cached.

 Note that the extract_insn in lra.c wasn't used when checking is disabled.
 The function just moved on to the next instruction straight away.
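
For readers of the digest, a hedged guess at the helper's shape, mirroring
the existing extract_constrain_insn_cached (the real body is in the recog.c
hunk of the patch):

void
extract_constrain_insn (rtx insn)
{
  extract_insn (insn);
  if (!constrain_operands (reload_completed))
    fatal_insn_not_found (insn);
}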


The RA parts are ok for me.   Thanks, Richard.

 gcc/
   * recog.h (extract_constrain_insn): Declare.
   * recog.c (extract_constrain_insn): New function.
   * lra.c (check_rtl): Use it.
   * postreload.c (reload_cse_simplify_operands): Likewise.
   * reg-stack.c (check_asm_stack_operands): Likewise.
   (subst_asm_stack_regs): Likewise.
   * regcprop.c (copyprop_hardreg_forward_1): Likewise.
   * regrename.c (build_def_use): Likewise.
   * sel-sched.c (get_reg_class): Likewise.
   * config/arm/arm.c (note_invalid_constants): Likewise.
   * config/s390/predicates.md (execute_operation): Likewise.






Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-21 Thread augustine.sterl...@gmail.com
On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) felix.y...@huawei.com wrote:
 Hi Sterling,

 Since the patch is delayed for a long time, I'm kind of pushing it. Sorry 
 for that.
 Yeah, you are right. We have some performance issue here as GCC may use 
 one more general register in some cases with this patch.
 Take the following arraysum testcase for example. In doloop optimization, 
 GCC figures out that the number of iterations is 1024 and creates a new 
 pseudo 79 as the new trip count register.
 The pseudo 79 is live throughout the loop, this makes the register 
 pressure in the loop higher. And it's possible that this new pseudo is 
 spilled by reload when the register pressure is very high.
 I know that the xtensa loop instruction copies the trip count register 
 into the LCOUNT special register. And we need describe this hardware feature 
 in GCC in order to free the trip count register.
 But I find it difficult to do. Do you have any good suggestions on this?

There are two issues related to the trip count, one I would like you
to solve now, one later.

1. Later: The trip count doesn't need to be updated at all inside
these loops, once the loop instruction executes. The code below
relates to this case.

2. Now: You should be able to use a loop instruction regardless of
whether the trip count is spilled. If you have an example where that
wouldn't work, I would love to see it.


 arraysum.c:
 int g[1024];
 int g_sum;

 void test_entry ()
 {
 int i, Sum = 0;

 for (i = 0; i  1024; i++)
   Sum = Sum + g[i];

 g_sum = Sum;
 }


 1. RTL before the doloop optimization pass(arraysum.c.193r.loop2_invariant):
 (note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
 (note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
 (insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ])
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2  S4 A32])) 29 
 {movsi_internal}
  (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>)
 (nil)))
 (insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ])
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2  S4 A32])) 29 
 {movsi_internal}
  (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl 
 0x7f6eef5d62d0 g>)
 (const_int 4096 [0x1000])))
 (nil)))
 (insn 33 37 42 2 (set (reg/v:SI 74 [ Sum ])
 (const_int 0 [0])) arraysum.c:6 29 {movsi_internal}
  (nil))
 (code_label 42 33 38 3 2  [0 uses])
 (note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
 (insn 39 38 40 3 (set (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
 (mem:SI (reg:SI 72 [ ivtmp$8 ]) [2 MEM[base: _14, offset: 0B]+0 S4 
 A32])) arraysum.c:9 29 {movsi_internal}
  (nil))
 (insn 40 39 41 3 (set (reg/v:SI 74 [ Sum ])
 (plus:SI (reg/v:SI 74 [ Sum ])
 (reg:SI 77 [ MEM[base: _14, offset: 0B] ]))) arraysum.c:9 1 
 {addsi3}
  (expr_list:REG_DEAD (reg:SI 77 [ MEM[base: _14, offset: 0B] ])
 (nil)))
 (insn 41 40 43 3 (set (reg:SI 72 [ ivtmp$8 ])
 (plus:SI (reg:SI 72 [ ivtmp$8 ])
 (const_int 4 [0x4]))) 1 {addsi3}
  (nil))
 (jump_insn 43 41 52 3 (set (pc)
 (if_then_else (ne (reg:SI 72 [ ivtmp$8 ])
 (reg/f:SI 76 [ D.1393 ]))
 (label_ref:SI 52)
 (pc))) arraysum.c:8 39 {*btrue}
  (int_list:REG_BR_PROB 9899 (nil))
  -> 52)
 (code_label 52 43 51 5 3  [1 uses])
 (note 51 52 44 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
 (note 44 51 45 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
 (insn 45 44 46 4 (set (reg/f:SI 78)
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC4") [flags 0x2]) [2  S4 A32])) 
 arraysum.c:11 29 {movsi_internal}
  (expr_list:REG_EQUAL (symbol_ref:SI ("g_sum") <var_decl 0x7f6eef5d6360 
 g_sum>)
 (nil)))
 (insn 46 45 0 4 (set (mem/c:SI (reg/f:SI 78) [2 g_sum+0 S4 A32])
 (reg/v:SI 74 [ Sum ])) arraysum.c:11 29 {movsi_internal}
  (expr_list:REG_DEAD (reg/f:SI 78)
 (expr_list:REG_DEAD (reg/v:SI 74 [ Sum ])
 (nil


 2. RTL after the doloop optimization pass(arraysum.c.195r.loop2_doloop):
 (note 34 0 32 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
 (note 32 34 36 2 NOTE_INSN_FUNCTION_BEG)
 (insn 36 32 37 2 (set (reg:SI 72 [ ivtmp$8 ])
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [2  S4 A32])) 29 
 {movsi_internal}
  (expr_list:REG_EQUAL (symbol_ref:SI ("g") <var_decl 0x7f6eef5d62d0 g>)
 (nil)))
 (insn 37 36 33 2 (set (reg/f:SI 76 [ D.1393 ])
 (mem/u/c:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2]) [2  S4 A32])) 29 
 {movsi_internal}
  (expr_list:REG_EQUAL (const:SI (plus:SI (symbol_ref:SI ("g") <var_decl 
 0x7f6eef5d62d0 g>)
 (const_int 4096 [0x1000])))
 (nil)))
 (insn 33 37 54 2 (set (reg/v:SI 74 [ Sum ])
 (const_int 0 [0])) arraysum.c:6 29 {movsi_internal}
  (nil))
 (insn 54 33 42 2 (set (reg:SI 79)
 (const_int 1024 [0x400])) arraysum.c:6 -1
  (nil))
 (code_label 42 54 38 3 2  [0 uses])
 (note 38 42 39 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
 

Re: [PATCH 2/5] Add preferred_for_{size,speed} attributes

2014-10-21 Thread Vladimir Makarov
On 10/17/2014 10:48 AM, Richard Sandiford wrote:
 This is the main patch, to add new preferred_for_size and
 preferred_for_speed attributes that can be used to selectively disable
 alternatives when optimising for size or speed.  As explained in the
 docs, the new attributes are just optimisation hints and it is possible
 that size-only alternatives will sometimes end up in a block that's
 optimised for speed, or vice versa.

 The patch deals with code that directly accesses the enabled_attributes
 mask and that ought to take size/speed choices into account.  The next
 patch deals with indirect uses.  Note that I'm not making reload support
 these attributes for hopefully obvious reasons :-)
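For illustration only (not part of the patch), a minimal sketch of the
consumer-side pattern, using the get_preferred_alternatives interface the
ChangeLog below adds to recog.h; the loop body here is hypothetical:

  /* Sketch: a pass that used to read recog_data.enabled_alternatives asks
     for the preferred set instead, so size-only alternatives drop out of
     blocks optimised for speed (and vice versa).  recog_data is assumed
     to be valid for INSN.  */
  static void
  walk_preferred_alternatives (rtx_insn *insn)
  {
    alternative_mask preferred = get_preferred_alternatives (insn);
    for (int alt = 0; alt < recog_data.n_alternatives; alt++)
      if ((preferred >> alt) & 1)
        {
          /* Consider/cost alternative ALT as before.  */
        }
  }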

 Richard


 gcc/
   * doc/md.texi: Document preferred_for_size and preferred_for_speed
   attributes.
   * genattr.c (main): Handle preferred_for_size and
   preferred_for_speed in the same way as enabled.
   * recog.h (bool_attr): New enum.
   (target_recog): Replace x_enabled_alternatives with x_bool_attr_masks.
   (get_preferred_alternatives, check_bool_attrs): Declare.
   * recog.c (have_bool_attr, get_bool_attr, get_bool_attr_mask_uncached)
   (get_bool_attr_mask, get_preferred_alternatives, check_bool_attrs):
   New functions.
   (get_enabled_alternatives): Use get_bool_attr_mask.
   * ira-costs.c (record_reg_classes): Use get_preferred_alternatives
   instead of recog_data.enabled_alternatives.
   * ira.c (ira_setup_alts): Likewise.
   * postreload.c (reload_cse_simplify_operands): Likewise.
   * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise.
   * ira-lives.c (preferred_alternatives): New variable.
   (process_bb_node_lives): Set it.
   (check_and_make_def_conflict, make_early_clobber_and_input_conflicts)
   (single_reg_class, ira_implicitly_set_insn_hard_regs): Use it instead
   of recog_data.enabled_alternatives.
   * lra-int.h (lra_insn_recog_data): Replace enabled_alternatives
   to preferred_alternatives.
   * lra-constraints.c (process_alt_operands): Update accordingly.
   * lra.c (lra_set_insn_recog_data): Likewise.
   (lra_update_insn_recog_data): Assert check_bool_attrs.


Thanks for picking this up and making a systematic solution, Richard.
All RA-related changes are ok for me.  I guess the other changes
(genattr.c, recog.[ch], md.texi and i386.c) are obvious but I have no
power to approve them.




Re: [PATCH 4/5] Remove recog_data.enabled_alternatives

2014-10-21 Thread Vladimir Makarov
On 10/17/2014 10:52 AM, Richard Sandiford wrote:
 After the previous patches, this one gets rid of
 recog_data.enabled_alternatives and its one remaining use.


Ok for me, too.  Pretty obvious patch although I have no power to
approve it all.



Re: [PATCH 5/5] Use preferred_for_speed in i386.md

2014-10-21 Thread Vladimir Makarov
On 10/17/2014 10:54 AM, Richard Sandiford wrote:
 Undo the original fix for 61630 and use preferred_for_speed in the
 problematic pattern.

 I've not written many gcc.target/i386 tests so the markup might need
 some work.

Final lra.c change is ok for me too.


 gcc/
   * lra.c (lra): Remove call to recog_init.
   * config/i386/i386.md (preferred_for_speed): New attribute.
   (*float<SWI48:mode><MODEF:mode>2_sse): Override it instead of
   enabled.

 gcc/testsuite/
   * gcc.target/i386/conversion-2.c: New test.





Re: [PATCH i386 AVX512] [81/n] Add new built-ins.

2014-10-21 Thread Kirill Yukhin
On 21 Oct 18:47, Kirill Yukhin wrote:
 On 21 Oct 16:20, Jakub Jelinek wrote:
  On Tue, Oct 21, 2014 at 06:08:15PM +0400, Kirill Yukhin wrote:
   --- a/gcc/tree.h
   +++ b/gcc/tree.h
   @@ -2334,6 +2334,10 @@ extern void decl_value_expr_insert (tree, tree);
#define DECL_COMDAT(NODE) \
  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.comdat_flag)

   + /* In a FUNCTION_DECL indicates that a static chain is needed.  */
   +#define DECL_STATIC_CHAIN(NODE) \
   +  (DECL_WITH_VIS_CHECK (NODE)->decl_with_vis.regdecl_flag)
   +
  
  I would say that you should still keep it together with the FUNCTION_DECL
  macros and use FUNCTION_DECL_CHECK there, to make it clear we don't want
  the macro to be used on VAR_DECLs etc.
  So just s/function_decl/decl_with_vis/ in the definition IMHO.
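For illustration, the definition Jakub is suggesting would look roughly like
this (a sketch, not the committed form):

  /* Keep the FUNCTION_DECL check so the macro cannot be used on other
     decls; only the underlying field moves to decl_with_vis.  */
  #define DECL_STATIC_CHAIN(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->decl_with_vis.regdecl_flag)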
 Yeah, sure.
 
  Also, with so many added builtins, how does it affect
  int i;
  compilation time at -O0?  If it is significant, maybe it is highest time to
  make the md builtin decl building more lazy.
 I've tried this:
 $ echo "int i;" > test.c
 $ time for i in `seq 1` ; do ./build-x86_64-linux/gcc/xgcc 
 -B./build-x86_64-linux/gcc -O0 -S test.c ; done
 
 For trunk w/ and w/o the patch applied.
 Got 106.86 vs. 106.85 secs. which looks equal.
Retested on a clean machine (SandyBridge). Got 189 vs. 192 secs., i.e. ~1%.


Re: [1/6] nvptx testsuite patches: alloca

2014-10-21 Thread Jeff Law

On 10/21/14 14:12, Bernd Schmidt wrote:

This deals with uses of alloca in the testsuite. Some tests require it
outright, others only at -O0, and others require it implicitly by
requiring an alignment for stack variables bigger than the target's
STACK_BOUNDARY. For the latter I've added explicit xfails.
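As a hedged illustration of the implicit case (not one of the affected tests):
a local whose requested alignment exceeds the target's STACK_BOUNDARY
typically needs a dynamically realigned frame, which nvptx cannot provide.

  extern void consume (void *);

  void
  f (void)
  {
    /* Alignment larger than STACK_BOUNDARY on the target.  */
    char buf[32] __attribute__ ((aligned (64)));
    consume (buf);
  }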

OK.
Jeff



Re: [2/6] nvptx testsuite patches: typed assembly

2014-10-21 Thread Jeff Law

On 10/21/14 14:15, Bernd Schmidt wrote:

Since everything in ptx assembly is typed, K&R C is problematic. There
are a number of testcases that call functions with the wrong number of
arguments, or arguments of the wrong type. I've added a new feature,
untyped_assembly, which these tests now require. I've also used this for
tests using builtin_apply/builtin_return.
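A hedged example of the kind of mismatch that now needs the untyped_assembly
effective target (illustrative, not taken from the testsuite):

  int f ();              /* old-style declaration, no prototype */

  int
  caller (void)
  {
    return f (1, 2);     /* call site passes two ints ...  */
  }

  int
  f (long x)             /* ... but the definition takes one long */
  {
    return (int) x;
  }

This is accepted as K&R-style C, but the call and the definition disagree,
and typed PTX assembly has no way to paper over that at link time.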

I'd kind of prefer to see the tests fixed, but I can live with this.

FWIW, the PA (32-bit SOM) is very sensitive to this stuff as well, 
though the linker will detect and correct most of these problems.  The 
PTX model doesn't give you the option to correct this stuff during the 
link phase.


jeff





Re: [3/6] nvptx testsuite patches: stdio

2014-10-21 Thread Jeff Law

On 10/21/14 14:17, Bernd Schmidt wrote:

Some tests use stdio functions which are unavailable with the cut-down
newlib I'm using for ptx testing. I'm somewhat uncertain what to do with
these; they are by no means the only unavailable library functions the
testsuite tries to use (signal is another example). Here's a patch which
deals with parts of the problem, but I wouldn't mind leaving this one
out if it doesn't seem worthwhile.
Tests probably shouldn't be using stdio anyway, except perhaps for the 
wrapper used when we run remotes and such to print the PASS/FAIL message.


One could argue a better direction would be to change calls into stdio 
to instead call some other function defined in the same .c file.  That 
other function would be marked as noinline.  That would help minimize 
the possibility of compromising the test.
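Something along these lines, as a sketch of that suggestion (names are
hypothetical):

  volatile int sink_v;

  /* Local, noinline replacement for a printf call: keeps the computed
     value observable without pulling in any of stdio.  */
  __attribute__ ((noinline)) void
  sink (int v)
  {
    sink_v = v;
  }

  int
  main (void)
  {
    int i, x = 0;
    for (i = 0; i < 10; i++)
      x += i;
    sink (x);          /* was: printf ("%d\n", x);  */
    return 0;
  }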


Jeff



[PATCHv5][PING^2] Vimrc config with GNU formatting

2014-10-21 Thread Yury Gribov

On 10/13/2014 02:26 PM, Yury Gribov wrote:

On 10/02/2014 09:14 PM, Yury Gribov wrote:

On 09/17/2014 09:08 PM, Yury Gribov wrote:
  On 09/16/2014 08:38 PM, Yury Gribov wrote:
  Hi all,
 
  This is the third version of the patch. A list of changes since last
  version:
  * move config to contrib so that it's _not_ enabled by default
(current
  score is 2/1 in favor of no Vim config by default)
  * update Makefile.in to make .local.vimrc if developer asks for it
  * disable autoformatting for flex files
  * fix filtering of non-GNU sources (libsanitizer)
  * added some small fixes in cinoptions based on feedback from
community
 
  As noted by Richard, the config does not do a good job of formatting
  unbound {} blocks e.g.
  void
  foo ()
  {
 int x;
   {
 // I'm an example of bad bad formatting
   }
  }
  but it seems to be the best we can get with Vim's cindent
  (and I don't think anyone seriously considers writing a custom
  indentexpr).
 
  Ok to commit?
 
  New version with support for another popular local .vimrc plugin.

Hi all,

Here is a new version of the vimrc patch. Hope I got email settings right
this time.

Changes since v4:
* fixed and enhanced docs
* added support for .lvimrc in Makefile
* minor fixes in cinoptions and formatoptions (reported by Segher)
* removed shiftwidth settings (as it does not really relate to code
formatting)

-Y





commit 3f560e9dd16a5e914b6f2ba82edffe13dfde944c
Author: Yury Gribov y.gri...@samsung.com
Date:   Thu Oct 2 15:50:52 2014 +0400

2014-10-02  Laurynas Biveinis  laurynas.bivei...@gmail.com
	Yury Gribov  y.gri...@samsung.com

Vim config with GNU formatting.

contrib/
	* vimrc: New file.

/
	* .gitignore: Added .local.vimrc and .lvimrc.
	* Makefile.tpl (vimrc, .lvimrc, .local.vimrc): New targets.
	* Makefile.in: Regenerate.

diff --git a/.gitignore b/.gitignore
index e9b56be..ab97ac6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -32,6 +32,9 @@ POTFILES
 TAGS
 TAGS.sub
 
+.local.vimrc
+.lvimrc
+
 .gdbinit
 .gdb_history
 
diff --git a/Makefile.in b/Makefile.in
index d6105b3..f3a34af 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2384,6 +2384,18 @@ mail-report-with-warnings.log: warning.log
 	chmod +x $@
 	echo If you really want to send e-mail, run ./$@ now
 
+# Local Vim config
+
+$(srcdir)/.local.vimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+$(srcdir)/.lvimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc
+
+.PHONY: vimrc
+
 # Installation targets.
 
 .PHONY: install uninstall
diff --git a/Makefile.tpl b/Makefile.tpl
index f7c7e38..b98930c 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -867,6 +867,18 @@ mail-report-with-warnings.log: warning.log
 	chmod +x $@
 	echo If you really want to send e-mail, run ./$@ now
 
+# Local Vim config
+
+$(srcdir)/.local.vimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+$(srcdir)/.lvimrc:
+	$(LN_S) $(srcdir)/contrib/vimrc $@
+
+vimrc: $(srcdir)/.local.vimrc $(srcdir)/.lvimrc
+
+.PHONY: vimrc
+
 # Installation targets.
 
 .PHONY: install uninstall
diff --git a/contrib/vimrc b/contrib/vimrc
new file mode 100644
index 000..34e8f35
--- /dev/null
+++ b/contrib/vimrc
@@ -0,0 +1,45 @@
+" Code formatting settings for Vim.
+
+" To enable this for GCC files by default, you can either source this file
+" in your .vimrc via autocmd:
+"   :au BufNewFile,BufReadPost path/to/gcc/* :so path/to/gcc/contrib/vimrc
+" or source the script manually for each newly opened file:
+"   :so contrib/vimrc
+" You could also use numerous plugins that enable local vimrc e.g.
+" mbr's localvimrc or thinca's vim-localrc (but note that the latter
+" is much less secure). To install local vimrc config, run
+"   $ make vimrc
+" from GCC build folder.
+
+" Copyright (C) 2014 Free Software Foundation, Inc.
+
+" This program is free software; you can redistribute it and/or modify
+" it under the terms of the GNU General Public License as published by
+" the Free Software Foundation; either version 3 of the License, or
+" (at your option) any later version.
+
+" This program is distributed in the hope that it will be useful,
+" but WITHOUT ANY WARRANTY; without even the implied warranty of
+" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+" GNU General Public License for more details.
+
+" You should have received a copy of the GNU General Public License
+" along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+function! SetStyle()
+  let l:fname = expand("%:p")
+  if stridx(l:fname, 'libsanitizer') != -1
+    return
+  endif
+  let l:ext = fnamemodify(l:fname, ":e")
+  let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java']
+  if index(l:c_exts, l:ext) != -1
+    setlocal cindent
+    setlocal softtabstop=2
+    setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
+    setlocal textwidth=80
+    setlocal formatoptions-=ro formatoptions+=cqlt
+  endif
+endfunction
+
+call SetStyle()


Re: [PATCH] Account for prologue spills in reg_pressure scheduling

2014-10-21 Thread Vladimir Makarov
On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
 Hi,

 This patch improves register pressure scheduling (both 
 SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number 
 of available registers.

 At the moment the scheduler does not account for spills in the prologues and 
 restores in the epilogue, which occur from use of call-used registers.  The 
 current state is, essentially, optimized for case when there is a hot loop 
 inside the function, and the loop executes significantly more often than the 
 prologue/epilogue.  However, on the opposite end, we have a case when the 
 function is just a single non-cyclic basic block, which executes just as 
 often as prologue / epilogue, so spills in the prologue hurt performance as 
 much as spills in the basic block itself.  In such a case the scheduler 
 should throttle-down on the number of available registers and try to not go 
 beyond call-clobbered registers.

 The patch uses basic block frequencies to balance the cost of using call-used 
 registers for intermediate cases between the two above extremes.
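A minimal sketch of the idea (illustrative only, not the patch's actual code
or formula): the closer a block's frequency is to the function entry's, the
fewer call-saved registers the scheduler should treat as free, since using
them implies prologue/epilogue spills that execute about as often.

  /* CLASS_SIZE is the number of registers in a pressure class,
     CALL_SAVED the call-saved (callee-saved) subset of it.  */
  static int
  available_regs_estimate (basic_block bb, int class_size, int call_saved)
  {
    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
    int bb_freq = MAX (bb->frequency, 1);
    if (entry_freq <= 0)
      return class_size;
    /* A block executed about as often as the prologue gets no "free"
       call-saved registers; a much hotter block gets them all.  */
    return class_size - call_saved * entry_freq / MAX (bb_freq, entry_freq);
  }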

 The motivation for this patch was a floating-point testcase on 
 arm-linux-gnueabihf (ARM is one of the few targets that use register pressure 
 scheduling by default).

 A thanks goes to Richard for good discussion of the problem and suggestions on 
 the approach to fix it.

 The patch was bootstrapped on x86_64-linux-gnu (which doesn't really 
 exercise the patch), and cross-tested on arm-linux-gnueabihf and 
 aarch64-linux-gnu.

 OK to apply?

It is a pretty interesting idea for a heuristic, Maxim.

But I don't understand the following loop:

+  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+   if (call_used_regs[i])
+ for (c = 0; c < ira_pressure_classes_num; ++c)
+   {
+ enum reg_class cl = ira_pressure_classes[c];
+ if (ira_class_hard_regs[cl][i])
+   ++call_used_regs_num[cl];


ira_class_hard_regs[cl] is an array containing the hard registers belonging to
class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15, the
array will contain 8 elements: 0..3, 12..15.  The array size is given
by ira_class_hard_regs_num[cl].  So the index is the ordinal number of the
hard reg within the class (starting from 0), not the hard register number
itself.  Also, the pressure classes never intersect, so you can stop the
inner loop once you find the class to which the hard reg belongs.
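For example, a corrected count along those lines might look like this (a
sketch only; call_used_regs_num is the array from the patch):

  /* Count the call-used (call-clobbered) hard registers of each register
     pressure class by walking the members of the class, rather than by
     indexing the class array with a hard register number.  */
  static void
  count_call_used_per_pressure_class (void)
  {
    for (int c = 0; c < ira_pressure_classes_num; ++c)
      {
        enum reg_class cl = ira_pressure_classes[c];
        call_used_regs_num[cl] = 0;
        for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
          if (call_used_regs[ira_class_hard_regs[cl][i]])
            ++call_used_regs_num[cl];
      }
  }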

I believe you should rewrite the code and get performance results again
to get an approval.  You also missed the ChangeLog.



Re: [4/6] nvptx testsuite patches: xfails and skips

2014-10-21 Thread Jeff Law

On 10/21/14 14:19, Bernd Schmidt wrote:

Some things don't fit into nice categories that apply to a larger set of
tests, or which are somewhat random like ptxas tool failures. For these
I've added xfails and skips.


Bernd

ts-xfails.diff


gcc/testsuite/
* lib/target-supports.exp (check_effective_target_trampolines,
check_profiling_available, check_effective_target_lto,
check_effective_target_vect_natural): False for nvptx-*-*.
* gcc.c-torture/compile/limits-fndefn.c: Skip for nvptx-*-*.
* gcc.c-torture/compile/pr34334.c: Likewise.
* gcc.c-torture/compile/pr37056.c: Likewise.
* gcc.c-torture/compile/pr39423-1.c: Likewise.
* gcc.c-torture/compile/pr46534.c: Likewise.
* gcc.c-torture/compile/pr49049.c: Likewise.
* gcc.c-torture/compile/pr59417.c: Likewise.
* gcc.c-torture/compile/20080721-1.c: Likewise.
* gcc.c-torture/compile/920501-4.c: Likewise.
* gcc.c-torture/compile/921011-1.c: Likewise.   
* gcc.dg/20040813-1.c: Likewise.
* gcc.dg/pr28755.c: Likewise.
* gcc.dg/pr44194-1.c: Likewise.
* gcc.c-torture/compile/pr42717.c: Xfail for nvptx-*-*.
* gcc.c-torture/compile/pr61684.c: Likewise.
* gcc.c-torture/compile/pr20601-1.c: Likewise.
* gcc.c-torture/compile/pr59221.c: Likewise.
* gcc.c-torture/compile/20060208-1.c: Likewise.
* gcc.c-torture/execute/pr52129.c: Likewise.
* gcc.c-torture/execute/20020310-1.c: Likewise.
* gcc.c-torture/execute/20101011-1.c: Define DO_TEST to 0 for nvptx.
* gcc.c-torture/execute/20020312-2.c: Add case for nvptx.
* gcc.c-torture/compile/pr60655-1.c: Don't add -fdata-sections
for nvptx-*-*.
* gcc.dg/pr36400.c: Xfail scan-assembler test on nvptx-*-*.
* gcc.dg/const-elim-2.c: Likewise.
More ptx tooling failures than I'd expect.  I'll leave it up to you 
whether or not to push on NVidia to fix some of those failures.  The 
timeouts seem particularly troublesome.


I think this is fine.

jeff




Re: [5/6] nvptx testsuite patches: jumps and labels

2014-10-21 Thread Jeff Law

On 10/21/14 14:23, Bernd Schmidt wrote:

This deals with tests requiring indirect jumps (including tests using
setjmp), label values, and nonlocal goto.

A subset of these tests uses the NO_LABEL_VALUES macro, but it's not
consistent across the testsuite. The feature test I wrote tests whether
that is defined and returns false for label_values if so.
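For reference, a minimal (illustrative) example of what label_values gates:
GNU "labels as values", i.e. taking the address of a label with && and
jumping through it.

  int
  dispatch (int i)
  {
    static void *tab[] = { &&case0, &&case1 };
    goto *tab[i];
   case0:
    return 1;
   case1:
    return 2;
  }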


Bernd


ts-jumps-labels.diff


gcc/testsuite/
* lib/target-supports.exp (check_effective_target_indirect_jumps):
New function.
(check_effective_target_nonlocal_goto): New function.
(check_effective_target_label_values): New function.
* gcc.c-torture/execute/20071220-2.c: Require label_values.
* gcc.c-torture/compile/labels-2.c: Likewise.
* gcc.c-torture/compile/2518-1.c: Likewise.
* gcc.c-torture/compile/20021108-1.c: Likewise.
* gcc.c-torture/compile/981006-1.c: Likewise.
* gcc.c-torture/execute/20040302-1.c: Likewise.
* gcc.dg/torture/pr33848.c: Likewise.

* gcc.c-torture/compile/pr46107.c: Require indirect jumps and
label values.
* gcc.c-torture/compile/pr32919.c: Likewise.
* gcc.c-torture/compile/pr17913.c: Likewise.
* gcc.c-torture/compile/pr51495.c: Likewise.
* gcc.c-torture/compile/pr25224.c: Likewise.
* gcc.c-torture/compile/labels-3.c: Likewise.
* gcc.c-torture/compile/pr27863.c: Likewise.
* gcc.c-torture/compile/20050510-1.c: Likewise.
* gcc.c-torture/compile/pr28489.c: Likewise.
* gcc.c-torture/compile/pr29128.c: Likewise.
* gcc.c-torture/compile/pr21356: Likewise.
* gcc.c-torture/execute/20071210-1.c: Likewise.
* gcc.c-torture/execute/200701220-1.c: Likewise.
* gcc.c-torture/execute/pr51447.c: Likewise.
* gcc.c-torture/execute/comp-goto-1.c: Likewise.
* gcc.c-torture/execute/comp-goto-2.c: Likewise.
* gcc.dg/20021029-1.c: Likewise.
* gcc.dg/pr43379.c: Likewise.
* gcc.dg/pr45259.c: Likewise.
* gcc.dg/torture/pr53695.c: Likewise.
* gcc.dg/torture/pr57584.c: Likewise.

* gcc.c-torture/execute/980526-1.c: Skip if -O0 and neither label_values
or indirect_jumps are available.
* gcc.c-torture/compile/920415-1.c: Likewise.  Remove NO_LABEL_VALUES
test.
* gcc.c-torture/compile/920428-3.c: Likewise.
* gcc.c-torture/compile/950613-1.c: Likewise.

* gcc.c-torture/compile/pr30984.c: Require indirect jumps.
* gcc.c-torture/compile/991213-3.c: Likewise.
* gcc.c-torture/compile/920825-1.c: Likewise.
* gcc.c-torture/compile/20011029-1.c: Likewise.
* gcc.c-torture/compile/complex-6.c: Likewise.
* gcc.c-torture/compile/pr27127.c: Likewise.
* gcc.c-torture/compile/pr58164.c: Likewise.
* gcc.c-torture/compile/20041214-1.c: Likewise.
* gcc.c-torture/execute/built-in-setjmp.c: Likewise.
* gcc.c-torture/execute/pr56982.c: Likewise.
* gcc.c-torture/execute/pr60003.c: Likewise.
* gcc.c-torture/execute/pr26983.c: Likewise.
* gcc.dg/pr57287-2.c: Likewise.
* gcc.dg/pr59920-1.c: Likewise.
* gcc.dg/pr59920-2.c: Likewise.
* gcc.dg/pr59920-3.c: Likewise.
* gcc.dg/setjmp-3.c: Likewise.
* gcc.dg/setjmp-4.c: Likewise.
* gcc.dg/setjmp-5.c: Likewise.
* gcc.dg/torture/pr48542.c: Likewise.
* gcc.dg/torture/pr57147-2.c: Likewise.
* gcc.dg/torture/pr59993.c: Likewise.

* gcc.dg/torture/stackalign/non-local-goto-1.c: Require nonlocal_goto.
* gcc.dg/torture/stackalign/non-local-goto-2.c: Likewise.
* gcc.dg/torture/stackalign/non-local-goto-3.c: Likewise.
* gcc.dg/torture/stackalign/non-local-goto-4.c: Likewise.
* gcc.dg/torture/stackalign/non-local-goto-5.c: Likewise.
* gcc.dg/torture/stackalign/setjmp-1.c: Likewise.
* gcc.dg/torture/stackalign/setjmp-3.c: Likewise.
* gcc.dg/torture/stackalign/setjmp-4.c: Likewise.
* gcc.dg/non-local-goto-1.c: Likewise.
* gcc.dg/non-local-goto-2.c: Likewise.
* gcc.dg/pr49994-1.c: Likewise.
* gcc.dg/torture/pr57036-2.c: Likewise.

* gcc.c-torture/compile/20040614-1.c: Require label_values.  Remove
NO_LABEL_VALUES test.
* gcc.c-torture/compile/920831-1.c: Likewise.
* gcc.c-torture/compile/920502-1.c: Likewise.
* gcc.c-torture/compile/920501-7.c: Likewise.
* gcc.dg/pr52139.c: Likewise.
NO_LABEL_VALUES probably hasn't been consistently kept up-to-date as the 
focus of the project has moved a bit away from embedded.  That code also 
predates the push for check_effective_target_*.


OK for the trunk.

jeff

