Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On August 26, 2015 6:08:55 PM GMT+02:00, Alan Lawrence alan.lawre...@arm.com 
wrote:
Richard Biener wrote:

 One extra question is does the way we limit total scalarization
work
 well
 for arrays?  I suppose we have either sth like the maximum size of
an
 aggregate we scalarize or the maximum number of component accesses
 we create?

 Only the former and that would be kept intact.  It is in fact
visible
 in the context of the last hunk of the patch.
 
 OK.  IIRC the gimplification code also has the latter and also
considers zeroing the whole aggregate before initializing non-zero
fields.  IMHO it makes sense to reuse some of the analysis and
classification routines it has.

Do you mean gimplify_init_constructor? Yes, there's quite a lot of
logic there 
;). That feels like a separate patch - and belonging to the
constant-handling 
subseries of this series

Yes.

 - as gimplify_init_constructor already deals
with both 
record and array types, and I don't see anything there that's
specifically good 
for total-scalarization of arrays?

IOW, do you mean that to block this patch, or can it be separate (I can
address 
Martin + Jeff's comments fairly quickly and independently) ?

No, but I'd like this to be explored with the init subseries.  We don't want 
two places doing total scalarization of initializers, gimplification and SRA, 
with different/conflicting heuristics.  IMHO the gimplification total 
scalarization happens too early.
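
An illustrative C sketch (not from the thread) of the kind of initializer
both heuristics apply to: gimplify_init_constructor may zero the whole
object and store only the non-zero fields, while SRA-driven total
scalarization would replace the aggregate with scalars.

struct S { int a[16]; int flag; };

struct S
make_s (void)
{
  struct S s = { .flag = 1 };  /* mostly zero: a block clear plus one
                                  store beats 17 element-wise inits */
  return s;
}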

Richard.

Cheers, Alan




Go patch committed: don't crash on invalid numeric type

2015-08-26 Thread Ian Lance Taylor
This patch by Chris Manghane fixes a compiler crash on an invalid
program when the compiler tries to set a numeric constant to an
invalid type.  This fixes https://golang.org/issue/11537.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227201)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d5e6af4e6dd456075a1ec1c03d0dc41cbea5eb36
+cd5362c7bb0b207f484a8dfb8db229fd2bffef09
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc (revision 227201)
+++ gcc/go/gofrontend/expressions.cc (working copy)
@@ -15150,7 +15150,11 @@ Numeric_constant::set_type(Type* type, b
   else if (type->complex_type() != NULL)
 ret = this->check_complex_type(type->complex_type(), issue_error, loc);
   else
-go_unreachable();
+{
+  ret = false;
+  if (issue_error)
+go_assert(saw_errors());
+}
   if (ret)
 this->type_ = type;
   return ret;


Re: [libvtv] Fix formatting errors

2015-08-26 Thread Jeff Law

On 08/26/2015 07:30 AM, Rainer Orth wrote:

While looking at libvtv for the Solaris port, I noticed all sorts of GNU
Coding Standard violations:

* ChangeLog entries attributed to the committer instead of the author
   and with misformatted PR references, entries only giving a vague
   rationale instead of what changed

* overlong lines

* tons of whitespace errors (though I may be wrong in some cases: C++
   code might have other rules)

* code formatting that seems to have been done to be visually pleasing,
   completely different from what Emacs does

* commented code fragments (#if 0 equivalent)

* configure.tgt target list in no recognizable order

* the Cygwin/MinGW port is done in the worst possible way: tons of
   target-specific ifdefs instead of feature-specific conditionals or an
   interface that can wrap both Cygwin and Linux variants of the code

The following patch (as yet not even compiled) fixes some of the most
glaring errors.  The Solaris port will fix a few of the latter ones.

Do you think this is the right direction or did I get something wrong?

Thanks.
 Rainer


2015-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

Fix formatting errors.
I'm more interested in the current state of vtv as I keep getting 
dragged into discussions about what we can/should be doing in the 
compiler world to close more security stuff.


Vtables are an obvious candidate given we've got vtv.

Jeff


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jeff Law

On 08/26/2015 05:13 AM, Ilya Enkovich wrote:

2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com:

On 08/21/2015 04:49 AM, Ilya Enkovich wrote:



I want work with bitmasks to be expressed in a natural way using
regular integer operations. Currently all mask manipulations are
emulated via vector statements (mostly using a bunch of vec_cond). For
complex predicates it may be nontrivial to transform them back to scalar
masks and get efficient code. Also the same vector may be used as
both a mask and an integer vector. Things become more complex if you
additionally have broadcasts and vector pack/unpack code. It also
should be transformed into scalar mask manipulations somehow.
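
A minimal C sketch (not from the thread) of the scalar-mask view described
here, assuming a hypothetical 16-lane predicate: combining predicates needs
only ordinary integer '&'/'|' operations, where the vec_cond emulation
keeps everything in vector form.

#include <stdint.h>

/* Fold the complex predicate (a[i] > 0) && (b[i] < c[i]) into one
   16-bit mask, one bit per lane.  */
uint16_t
build_mask (const int *a, const int *b, const int *c)
{
  uint16_t m = 0;
  for (int i = 0; i < 16; i++)
    if (a[i] > 0 && b[i] < c[i])
      m |= (uint16_t) 1 << i;
  return m;
}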


Or why not model the conversion at the gimple level using a CONVERT_EXPR?
In fact, the more I think about it, that seems to make more sense to me.

We pick a canonical form for the mask, whatever it may be.  We use that
canonical form and model conversions between it and the other form via
CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
If it's not up to the task, we should really look into why and resolve.

Yes, that does mean we have two forms which I'm not terribly happy about and
it means some target dependencies on what the masked vector operation looks
like (ie, does it accept a simple integer or vector mask), but I'm starting
to wonder if, as distasteful as I find it, it's the right thing to do.


If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for mask usage then your previous suggestion to
transform masks into a target-specific form at the GIMPLE-RTL phase
should work fine. This would allow supporting only a single mask
representation in GIMPLE.
Possibly, but you mentioned that you may need to use the masks in both 
forms depending on the exact context.  If so, then I think we need to 
model a conversion between the two forms.



Jeff



Fwd: [libvtv] Fix formatting errors

2015-08-26 Thread Caroline Tice
-- Forwarded message --
From: Caroline Tice cmt...@google.com
Date: Wed, Aug 26, 2015 at 12:50 PM
Subject: Re: [libvtv] Fix formatting errors
To: Jeff Law l...@redhat.com
Cc: Rainer Orth r...@cebitec.uni-bielefeld.de, GCC Patches
gcc-patches@gcc.gnu.org


As far as I know vtv is working just fine...is there something I don't
know about?

-- Caroline
cmt...@google.com

On Wed, Aug 26, 2015 at 12:47 PM, Jeff Law l...@redhat.com wrote:

 On 08/26/2015 07:30 AM, Rainer Orth wrote:

 While looking at libvtv for the Solaris port, I noticed all sorts of GNU
 Coding Standard violations:

 * ChangeLog entries attributed to the committer instead of the author
and with misformatted PR references, entries only giving a vague
    rationale instead of what changed

 * overlong lines

 * tons of whitespace errors (though I may be wrong in some cases: C++
code might have other rules)

 * code formatting that seems to have been done to be visually pleasing,
completely different from what Emacs does

 * commented code fragments (#if 0 equivalent)

 * configure.tgt target list in no recognizable order

 * the Cygwin/MinGW port is done in the worst possible way: tons of
target-specific ifdefs instead of feature-specific conditionals or an
interface that can wrap both Cygwin and Linux variants of the code

 The following patch (as yet not even compiled) fixes some of the most
 glaring errors.  The Solaris port will fix a few of the latter ones.

 Do you think this is the right direction or did I get something wrong?

 Thanks.
  Rainer


 2015-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

 Fix formatting errors.

 I'm more interested in the current state of vtv as I keep getting dragged 
 into discussions about what we can/should be doing in the compiler world to 
 close more security stuff.

 Vtables are an obvious candidate given we've got vtv.

 Jeff


[Patch, fortran] F2008 - implement pointer function assignment

2015-08-26 Thread Paul Richard Thomas
Dear All,

The attached patch more or less implements the assignment of
expressions to the result of a pointer function. To wit:

my_ptr_fcn (arg1, arg2...) = expr

arg1 would usually be the target, pointed to by the function. The
patch parses these statements and resolves them into:

temp_ptr = my_ptr_fcn (arg1, arg2...)
temp_ptr = expr
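
In C terms the resolution is analogous to this sketch (hypothetical names,
for illustration only):

int *
my_ptr_fcn (int *arg1, int i)   /* returns a pointer into arg1 */
{
  return &arg1[i];
}

void
example (int *buf)
{
  int *temp_ptr = my_ptr_fcn (buf, 3);  /* temp_ptr = my_ptr_fcn (...) */
  *temp_ptr = 42;                       /* assignment through the pointer */
}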

I say more or less implemented because I have ducked one of the
headaches here. At the end of the specification block, there is an
ambiguity between statement functions and pointer function
assignments. I do not even try to resolve this ambiguity and require
that there be at least one other type of executable statement before
these beasts. This can undoubtedly be fixed but the effort seems to me
to be unwarranted at the present time.

I had a stupid amount of trouble with the test fmt_tab_1.f90. I have
no idea why but the gfc_warning no longer showed the offending line,
although the line number in the error message was OK. Changing to
gfc_warning_now fixed the problem. Also, I can see no reason why this
should be dg-run and so changed to dg-compile. Finally, I set
-std=legacy to stop the generic error associated with tabs.

Bootstraps and regtests on x86_64/FC21 - OK for trunk?

Now back to trying to get my head round parameterized derived types!

Cheers

Paul

2015-08-26  Paul Thomas  pa...@gcc.gnu.org

* decl.c (get_proc_name): Return if statement function is
found.
* io.c (next_char_not_space): Change tab warning to warning now
to prevent locus being lost.
* match.c (gfc_match_ptr_fcn_assign): New function.
* match.h : Add prototype for gfc_match_ptr_fcn_assign.
* parse.c : Add static flag 'in_specification_block'.
(decode_statement): If in specification block match a statement
function, otherwise if standard embraces F2008 try to match a
pointer function assignment.
(parse_interface): Set 'in_specification_block' on exiting from
parse_spec.
(parse_spec): Set and then reset 'in_specification_block'.
(gfc_parse_file): Set 'in_specification_block'.
* resolve.c (get_temp_from_expr): Extend to include functions
and array constructors as rvalues.
(resolve_ptr_fcn_assign): New function.
(gfc_resolve_code): Call it on finding a pointer function as an
lvalue.
* symbol.c (gfc_add_procedure): Add a sentence to the error to
flag up the ambiguity between a statement function and pointer
function assignment at the end of the specification block.

2015-08-26  Paul Thomas  pa...@gcc.gnu.org

* gfortran.dg/fmt_tab_1.f90: Change from run to compile and set
standard as legacy.
* gfortran.dg/ptr_func_assign_1.f08: New test.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c  (revision 227118)
--- gcc/fortran/decl.c  (working copy)
*** get_proc_name (const char *name, gfc_sym
*** 901,906 
--- 901,908 
  return rc;
  
sym = *result;
+   if (sym->attr.proc == PROC_ST_FUNCTION)
+ return rc;
  
if (sym->attr.module_procedure
&& sym->attr.if_source == IFSRC_IFBODY)
Index: gcc/fortran/io.c
===
*** gcc/fortran/io.c(revision 227118)
--- gcc/fortran/io.c(working copy)
*** next_char_not_space (bool *error)
*** 200,206 
if (c == '\t')
{
  if (gfc_option.allow_std & GFC_STD_GNU)
!   gfc_warning (0, "Extension: Tab character in format at %C");
  else
{
  gfc_error ("Extension: Tab character in format at %C");
--- 200,206 
if (c == '\t')
{
  if (gfc_option.allow_std & GFC_STD_GNU)
!   gfc_warning_now (0, "Extension: Tab character in format at %C");
  else
{
  gfc_error ("Extension: Tab character in format at %C");
Index: gcc/fortran/match.c
===
*** gcc/fortran/match.c (revision 227118)
--- gcc/fortran/match.c (working copy)
*** match
*** 4886,4892 
  gfc_match_st_function (void)
  {
gfc_error_buffer old_error;
- 
gfc_symbol *sym;
gfc_expr *expr;
match m;
--- 4886,4891 
*** gfc_match_st_function (void)
*** 4926,4931 
--- 4925,5000 
return MATCH_YES;
  
  undo_error:
+   gfc_pop_error (old_error);
+   return MATCH_NO;
+ }
+ 
+ 
+ /* Match an assignment to a pointer function (F2008). This could, in
+general be ambiguous with a statement function. In this implementation
+it remains so if it is the first statement after the specification
+block.  */
+ 
+ match
+ gfc_match_ptr_fcn_assign (void)
+ {
+   gfc_error_buffer old_error;
+   locus old_loc;
+   gfc_symbol *sym;
+   gfc_expr *expr;
+   match m;
+   char name[GFC_MAX_SYMBOL_LEN + 1];
+ 
+   old_loc = gfc_current_locus;
+   m = gfc_match_name (name);
+   if (m != MATCH_YES)
+ return m;
+ 
+   

Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote:
 On 08/25/2015 03:42 PM, Martin Jambor wrote:

 Hi,

 On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:

 This changes the completely_scalarize_record path to also work on arrays
 (thus
 allowing records containing arrays, etc.). This just required extending
 the
 existing type_consists_of_records_p and completely_scalarize_record
 methods
 to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed
 both
 methods so as not to mention 'record'.


 thanks for working on this.  I see Jeff has already approved the
 patch, but I have two comments nevertheless.  First, I would be much
 happier if you added a proper comment to scalarize_elem function which
 you forgot completely.  The name is not very descriptive and it has
 quite a few parameters too.

 Right.  I mentioned that I missed the lack of function comments when looking
 at #3 and asked Alan to go back and fix them in #1 and #2.


 Second, this patch should also fix PR 67283.  It would be great if you
 could verify that and add it to the changelog when committing if that
 is indeed the case.

 Excellent.  Yes, definitely mention the BZ.

One extra question is does the way we limit total scalarization work well
for arrays?  I suppose we have either sth like the maximum size of an
aggregate we scalarize or the maximum number of component accesses
we create?

Thanks,
Richard.

 jeff



Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 9:50 PM, Jeff Law l...@redhat.com wrote:
 On 08/25/2015 05:06 AM, Alan Lawrence wrote:

 When SRA completely scalarizes an array, this patch changes the
 generated accesses from e.g.

 MEM[(int[8] *)a + 4B] = 1;

 to

 a[1] = 1;

 This overcomes a limitation in dom2, that accesses to equivalent
 chunks of e.g. MEM[(int[8] *)a] are not hashable_expr_equal_p with
 accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
 propagation in the ssa-dom-cse-2.c testcase (after the next patch
 that makes SRA handle constant-pool loads).

 I tried to work around this by making dom2's hashable_expr_equal_p
 less conservative, but found that on platforms without AArch64's
 vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
 mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
 *)&a] equivalent to a[0], etc.; a complete overhaul of
 hashable_expr_equal_p seems like a larger task than this patch
 series.

 I can't see how to write a testcase for this in C though as direct
 assignment to an array is not possible; such assignments occur only
 with constant pool data, which is dealt with in the next patch.

 It's a general issue that if there's > 1 common way to represent an
 expression, then DOM will often miss discovery of the CSE opportunity
 because of the way it hashes expressions.

 Ideally we'd be moving to a canonical form, but I also realize that in
 the case of memory references like this, that may not be feasible.

 It does make me wonder how many CSEs we're really missing due to the two
 ways to represent array accesses.


 Bootstrap + check-gcc on x86-none-linux-gnu,
 arm-none-linux-gnueabihf, aarch64-none-linux-gnu.

 gcc/ChangeLog:

 * tree-sra.c (completely_scalarize): Move some code into:
 (get_elem_size): New.
 (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
 ---
  gcc/tree-sra.c | 110 -
  1 file changed, 69 insertions(+), 41 deletions(-)

 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index 08fa8dc..af35fcc 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
 }
 }

 +static bool
 +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)

 Function comment needed.

 I may have missed it in the earlier patches, but can you please make
 sure any new functions you created have comments in those as well.  Such
 patches are pre-approved.

 With the added function comment, this patch is fine.

Err ... you generally _cannot_ create ARRAY_REFs out of thin air because
of correctness issues with data-ref and data dependence analysis.  You can
of course keep ARRAY_REFs if the original access was an ARRAY_REF.

But I'm not convinced this is what the pass does.

We've gone to great lengths removing all the code from gimplification
and folding that tried to be clever in producing array refs from accesses to
sth with an ARRAY_TYPE - this all eventually led to wrong-code issues
later.

So I'd rather _not_ have this patch.  (as always I'm too slow responding
and Jeff is too fast ;))

Thanks,
Richard.

 jeff




Re: [libgfortran,patch] Remove never-used debugging code

2015-08-26 Thread FX
 OK. Just checking.  Thanks for the code cleanup.

Thanks for the review. Committed as rev. 227208.

FX


Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Alan Lawrence

Richard Biener wrote:


One extra question is does the way we limit total scalarization work

well

for arrays?  I suppose we have either sth like the maximum size of an
aggregate we scalarize or the maximum number of component accesses
we create?


Only the former and that would be kept intact.  It is in fact visible
in the context of the last hunk of the patch.


OK.  IIRC the gimplification code also has the latter and also considers 
zeroing the whole aggregate before initializing non-zero fields.  IMHO it makes 
sense to reuse some of the analysis and classification routines it has.


Do you mean gimplify_init_constructor? Yes, there's quite a lot of logic there 
;). That feels like a separate patch - and belonging to the constant-handling 
subseries of this series - as gimplify_init_constructor already deals with both 
record and array types, and I don't see anything there that's specifically good 
for total-scalarization of arrays?


IOW, do you mean that to block this patch, or can it be separate (I can address 
Martin + Jeff's comments fairly quickly and independently) ?


Cheers, Alan



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 17:56 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com:

 Hmm, I don't see how vector masks are more difficult to operate with.

 There are just no instructions for that but you have to pretend you
 have them to get code vectorized.

 Huh?  Bitwise ops should be readily available.

 Right, bitwise ops are available, but there is no comparison into a
 vector and no masked loads and stores using vector masks (when we
 speak about 512-bit vectors).



 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.

 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?

 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.

 How do you declare those?

 Something like this:

 #pragma omp declare simd inbranch
 int foo(int*);

 The 'inbranch' is the thing that matters?  And all of foo is then
 implicitly predicated?

That's right. And a vector version of foo gets a mask as an additional arg.
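
A scalar model of the 'inbranch' semantics, as a hedged C sketch
(illustration only; the real vector variant receives the mask as an extra
argument per the vector ABI discussed above):

int foo (int *p) { return *p + 1; }  /* stand-in body for the simd function */

void
masked_calls (int *r, int *x, const unsigned char *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])                     /* only active lanes invoke foo */
      r[i] = foo (&x[i]);
}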



 Well, you are missing the case of

bool b = a < b;
int x = (int)b;

 This case seems to require no changes and just be transformed into vec_cond.

 Ok, the example was too simple but I meant that a bool has a non-conditional
 use.

Right. In such cases I think it's reasonable to replace it with a
select similar to what we now have, but without transforming the whole
bool tree.
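
Concretely, the kind of non-conditional mask use under discussion, in a
short C sketch: the comparison result escapes into integer code, so the
vectorizer materializes it with a select rather than keeping it purely as
a mask.

void
bool_to_int (int *x, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = (int) (a[i] < b[i]);  /* becomes mask ? 1 : 0, a vec_cond */
}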


 Ok, so I still believe we don't want two ways to express things on GIMPLE if
 possible.  Yes, the vectorizer already creates only vector stmts that
 are supported
 by the hardware.  So it's a matter of deciding on the GIMPLE representation
 for the mask.  I'd rather use vector<bool> (and the target assigning
 an integer
 mode to it) than an 'int' in GIMPLE statements.  Because that makes the
 type constraints on GIMPLE very weak and exposes those 'ints' to all kind
 of optimization passes.

 Thus if we change the result type requirement of vector comparisons from
 signed integer vectors to bool vectors the vectorizer can still go for
 promoting that bool vector to a vector of ints via a VEC_COND_EXPR
 and the expander can special-case that if the target has a vector comparison
 producing a vector mask.

 So, can you give that vector<bool> some thought?

Yes, I want to try it. But getting rid of bool patterns would mean
support for all targets currently supporting vec_cond. Would it be OK
to have vector<bool> mask co-exist with bool patterns for some time?
Thus first step would be to require vector<bool> for MASK_LOAD and
MASK_STORE and support it for i386 (the only user of MASK_LOAD and
MASK_STORE).

Note that to assign
 sth else than a vector mode to it needs adjustments in stor-layout.c.
 I'm pretty sure we don't want vector BImodes.

I can directly build a vector type with specified mode to avoid it. Smth. like:

mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
mask_type = make_vector_type (bool_type_node, nunits, mask_mode);

Thanks,
Ilya


 Richard.



Re: [PATCH] 2015-07-31 Benedikt Huber benedikt.hu...@theobroma-systems.com Philipp Tomsich philipp.toms...@theobroma-systems.com

2015-08-26 Thread Benedikt Huber
ping

[PATCH v4][aarch64] Implemented reciprocal square root (rsqrt) estimation in 
-ffast-math

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02698.html
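
For context, estimation code of this kind refines an initial hardware
estimate (frsqrte) with Newton-Raphson steps; AArch64's frsqrts computes
the (3 - d*x*x)/2 factor of one such step. A sketch of the math only, not
the patch:

/* One Newton-Raphson refinement of x ~= 1/sqrt(d); typically a couple of
   these applied to the frsqrte estimate reach float precision.  */
static inline float
rsqrt_step (float d, float x)
{
  return x * (3.0f - d * x * x) * 0.5f;
}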

 On 31 Jul 2015, at 19:05, Benedikt Huber 
 benedikt.hu...@theobroma-systems.com wrote:
 
   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
   rsqrtf.
   * config/aarch64/aarch64-opts.h: -mrecip has a default value
   depending on the core.
   * config/aarch64/aarch64-protos.h: Declare.
   * config/aarch64/aarch64-simd.md: Matching expressions for
   frsqrte and frsqrts.
   * config/aarch64/aarch64-tuning-flags.def: Added
   MRECIP_DEFAULT_ENABLED.
   * config/aarch64/aarch64.c: New functions. Emit rsqrt
   estimation code in fast math mode.
   * config/aarch64/aarch64.md: Added enum entries.
   * config/aarch64/aarch64.opt: Added options -mrecip and
   -mlow-precision-recip-sqrt.
   * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
   for frsqrte and frsqrts
   * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
 
 Signed-off-by: Philipp Tomsich philipp.toms...@theobroma-systems.com
 ---
 gcc/ChangeLog  |  21 
 gcc/config/aarch64/aarch64-builtins.c  | 104 
 gcc/config/aarch64/aarch64-opts.h  |   7 ++
 gcc/config/aarch64/aarch64-protos.h|   2 +
 gcc/config/aarch64/aarch64-simd.md |  27 ++
 gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
 gcc/config/aarch64/aarch64.c   | 106 +++-
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   8 ++
 gcc/doc/invoke.texi|  19 
 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
 gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107 +
 12 files changed, 463 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 3432adb..3bf3098 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,24 @@
 +2015-07-31  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 + Philipp Tomsich  philipp.toms...@theobroma-systems.com
 +
 + * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
 + rsqrtf.
 + * config/aarch64/aarch64-opts.h: -mrecip has a default value
 + depending on the core.
 + * config/aarch64/aarch64-protos.h: Declare.
 + * config/aarch64/aarch64-simd.md: Matching expressions for
 + frsqrte and frsqrts.
 + * config/aarch64/aarch64-tuning-flags.def: Added
 + MRECIP_DEFAULT_ENABLED.
 + * config/aarch64/aarch64.c: New functions. Emit rsqrt
 + estimation code in fast math mode.
 + * config/aarch64/aarch64.md: Added enum entries.
 + * config/aarch64/aarch64.opt: Added options -mrecip and
 + -mlow-precision-recip-sqrt.
 + * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
 + for frsqrte and frsqrts
 + * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
 +
 2015-07-08  Jiong Wang  jiong.w...@arm.com
 
   * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
 diff --git a/gcc/config/aarch64/aarch64-builtins.c 
 b/gcc/config/aarch64/aarch64-builtins.c
 index b6c89b9..b4f443c 100644
 --- a/gcc/config/aarch64/aarch64-builtins.c
 +++ b/gcc/config/aarch64/aarch64-builtins.c
 @@ -335,6 +335,11 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
 +  AARCH64_BUILTIN_RSQRT_DF,
 +  AARCH64_BUILTIN_RSQRT_SF,
 +  AARCH64_BUILTIN_RSQRT_V2DF,
 +  AARCH64_BUILTIN_RSQRT_V2SF,
 +  AARCH64_BUILTIN_RSQRT_V4SF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include aarch64-simd-builtins.def
 @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
 }
 
 void
 +aarch64_add_builtin_rsqrt (void)
 +{
 +  tree fndecl = NULL;
 +  tree ftype = NULL;
 +
 +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
 +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
 +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
 +
 +  ftype = build_function_type_list (double_type_node, double_type_node, 
 NULL_TREE);
 +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
 +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
 +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
 +
 +  ftype = build_function_type_list (float_type_node, float_type_node, 
 NULL_TREE);
 +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_sf",
 +ftype, AARCH64_BUILTIN_RSQRT_SF, BUILT_IN_MD, NULL, NULL_TREE);
 +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_SF] = fndecl;
 +
 +  ftype = build_function_type_list (V2DF_type_node, V2DF_type_node, 
 NULL_TREE);
 +  fndecl 

Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Petr Murzin
On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener
richard.guent...@gmail.com wrote:
 @@ -3763,32 +3776,46 @@ again:
   if (vf < *min_vf)
 *min_vf = vf;

 -  if (gather)
 +  if (gatherscatter != SG_NONE)
 {
   tree off;
 + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, off,
 NULL, true) != 0)
 +   gatherscatter = GATHER;
 + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
 off, NULL, false)
 + != 0)
 +   gatherscatter = SCATTER;
 + else
 +   gatherscatter = SG_NONE;

 as I said vect_check_gather_scatter already knows whether the DR is a read or
 a write and thus whether it needs to check for gather or scatter.  Remove
 the new argument.  And simply do

if (!vect_check_gather_scatter (stmt))
  gatherscatter = SG_NONE;

 - STMT_VINFO_GATHER_P (stmt_info) = true;
 + if (gatherscatter == GATHER)
 +   STMT_VINFO_GATHER_P (stmt_info) = true;
 + else
 +   STMT_VINFO_SCATTER_P (stmt_info) = true;
 }

 and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
 using the enum so you can simply do

  STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter;
 Otherwise the patch looks ok to me.

Fixed.
Uros, could you please have a look at target part of patch?

Thanks,
Petr

2015-08-26  Andrey Turetskiy  andrey.turets...@intel.com
Petr Murzin  petr.mur...@intel.com

gcc/

* config/i386/i386-builtin-types.def
(VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
(VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
(VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
(VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
* config/i386/i386.c
(ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
__builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
__builtin_ia32_scatteraltdiv8si.
(ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_vectorize_builtin_scatter): New.
(TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
ix86_vectorize_builtin_scatter.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
* doc/tm.texi: Regenerate.
* target.def: Add scatter builtin.
* tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
for loads/stores in case of gather/scatter accordingly.
(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
(vect_check_gather): Rename to ...
(vect_check_gather_scatter): this.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use
STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P.
(vect_check_gather_scatter): Use it instead of vect_check_gather.
(vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
and new checkings for it accordingly.
* tree-vect-stmts.c
(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
(vect_check_gather_scatter): Use it instead of vect_check_gather.
(vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P.

gcc/testsuite/

* gcc.target/i386/avx512f-scatter-1.c: New.
* gcc.target/i386/avx512f-scatter-2.c: Ditto.
* gcc.target/i386/avx512f-scatter-3.c: Ditto.


scatter
Description: Binary data


tests
Description: Binary data


Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Uros Bizjak
On Wed, Aug 26, 2015 at 7:39 PM, Petr Murzin petrmurz...@gmail.com wrote:
 On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 @@ -3763,32 +3776,46 @@ again:
    if (vf < *min_vf)
 *min_vf = vf;

 -  if (gather)
 +  if (gatherscatter != SG_NONE)
 {
   tree off;
 + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, off,
 NULL, true) != 0)
 +   gatherscatter = GATHER;
 + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
 off, NULL, false)
 + != 0)
 +   gatherscatter = SCATTER;
 + else
 +   gatherscatter = SG_NONE;

 as I said vect_check_gather_scatter already knows whether the DR is a read or
 a write and thus whether it needs to check for gather or scatter.  Remove
 the new argument.  And simply do

if (!vect_check_gather_scatter (stmt))
  gatherscatter = SG_NONE;

 - STMT_VINFO_GATHER_P (stmt_info) = true;
 + if (gatherscatter == GATHER)
 +   STMT_VINFO_GATHER_P (stmt_info) = true;
 + else
 +   STMT_VINFO_SCATTER_P (stmt_info) = true;
 }

 and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
 using the enum so you can simply do

   STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter;
 Otherwise the patch looks ok to me.

 Fixed.
 Uros, could you please have a look at target part of patch?

 2015-08-26  Andrey Turetskiy  andrey.turets...@intel.com
 Petr Murzin  petr.mur...@intel.com

 gcc/

 * config/i386/i386-builtin-types.def
 (VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
 (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
 (VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
 (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
 * config/i386/i386.c
 (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
 IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
 IX86_BUILTIN_SCATTERALTDIV16SI.
 (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
 __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
 __builtin_ia32_scatteraltdiv8si.
 (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
 IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
 IX86_BUILTIN_SCATTERALTDIV16SI.
 (ix86_vectorize_builtin_scatter): New.
 (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
 ix86_vectorize_builtin_scatter.
 * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
 * doc/tm.texi: Regenerate.
 * target.def: Add scatter builtin.
 * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
 for loads/stores in case of gather/scatter accordingly.
 (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
 (vect_check_gather): Rename to ...
 (vect_check_gather_scatter): this.
 * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use
 STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P.
 (vect_check_gather_scatter): Use it instead of vect_check_gather.
 (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
 and new checkings for it accordingly.
 * tree-vect-stmts.c
 (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
 (vect_check_gather_scatter): Use it instead of vect_check_gather.
 (vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P.

 gcc/testsuite/

 * gcc.target/i386/avx512f-scatter-1.c: New.
 * gcc.target/i386/avx512f-scatter-2.c: Ditto.
 * gcc.target/i386/avx512f-scatter-3.c: Ditto.

x86 target part and testsuite are OK with the following change to the testcases:

 +/* { dg-do run } */
 +/* { dg-require-effective-target avx512f } */
 +/* { dg-options "-O3 -mavx512f -DAVX512F" } */
 +
 +#include "avx512f-check.h"
 +
 +#define N 1024

We don't want -D in the options, please move these to the source:

/* { dg-do run } */
/* { dg-require-effective-target avx512f } */
/* { dg-options "-O3 -mavx512f" } */

#define AVX512F

#include "avx512f-check.h"

#define N 1024

Thanks,
Uros.


[PATCH 0/2] Final cleanup in move to ISL

2015-08-26 Thread Sebastian Pop
Hi,

Richi suggested at the Cauldron that it would be good to have graphite more
automatic and with fewer flags.  The first patch removes the -floop-unroll-and-jam
pass, which does not seem very stable or useful for now.  The second patch removes
the other -floop-* flags that were part of the old graphite's middle-end (these
were the first transforms implemented on the polyhedral representation
(matrices, etc.) when we had no ISL scheduler.)  The transition to ISL that
removed GCC's dependence on PPL and Cloog has not removed all graphite's
middle-end for loop transforms.  We now can remove that code as it is replaced
by ISL's scheduler.
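
For illustration, the kind of loop nest the removed flags targeted, now
left to the ISL scheduler under -floop-nest-optimize (sketch only):

/* Interchanging i and j turns the column-major walk of both arrays
   into a row-major one, the classic locality transform.  */
void
nest (double a[256][256], const double b[256][256])
{
  for (int j = 0; j < 256; j++)
    for (int i = 0; i < 256; i++)
      a[i][j] += b[i][j];
}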

The patches pass make check and bootstrap (in progress) with 
-fgraphite-identity.
Ok to commit?

Thanks,
Sebastian


Sebastian Pop (2):
  remove -floop-unroll-and-jam
  remove -floop-* flags

 gcc/Makefile.in|2 -
 gcc/common.opt |   20 +-
 gcc/doc/invoke.texi|  108 +-
 gcc/graphite-blocking.c|  270 -
 gcc/graphite-interchange.c |  656 
 gcc/graphite-isl-ast-to-gimple.c   |  102 +-
 gcc/graphite-optimize-isl.c|  193 +---
 gcc/graphite-poly.c|  492 +
 gcc/graphite-poly.h| 1085 
 gcc/graphite-sese-to-poly.c|   22 +-
 gcc/graphite.c |   13 +-
 gcc/params.def |   15 -
 gcc/testsuite/g++.dg/graphite/graphite.exp |   10 +-
 gcc/testsuite/gcc.dg/graphite/block-0.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-3.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-4.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-7.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-8.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/graphite.exp |   14 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-1.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/interchange-3.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-4.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +-
 gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  |2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |4 +-
 gcc/testsuite/gfortran.dg/graphite/graphite.exp|   10 +-
 gcc/toplev.c   |3 +-
 48 files changed, 123 insertions(+), 2973 deletions(-)
 delete mode 100644 gcc/graphite-blocking.c
 delete mode 100644 gcc/graphite-interchange.c

-- 
2.1.0.243.g30d45f7



[PATCH 2/2] remove -floop-* flags

2015-08-26 Thread Sebastian Pop
---
 gcc/Makefile.in|2 -
 gcc/common.opt |   16 +-
 gcc/doc/invoke.texi|  108 +-
 gcc/graphite-blocking.c|  270 -
 gcc/graphite-interchange.c |  656 
 gcc/graphite-optimize-isl.c|   14 +-
 gcc/graphite-poly.c|  489 +
 gcc/graphite-poly.h| 1082 
 gcc/graphite-sese-to-poly.c|   22 +-
 gcc/graphite.c |   10 +-
 gcc/testsuite/g++.dg/graphite/graphite.exp |   10 +-
 gcc/testsuite/gcc.dg/graphite/block-0.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-3.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-4.c|4 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-7.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-8.c|2 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/graphite.exp |   14 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-1.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/interchange-3.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-4.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +-
 gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  |2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |4 +-
 gcc/testsuite/gfortran.dg/graphite/graphite.exp|   10 +-
 45 files changed, 98 insertions(+), 2686 deletions(-)
 delete mode 100644 gcc/graphite-blocking.c
 delete mode 100644 gcc/graphite-interchange.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e298ecc..3d1c1e5 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1277,10 +1277,8 @@ OBJS = \
graph.o \
graphds.o \
graphite.o \
-   graphite-blocking.o \
graphite-isl-ast-to-gimple.o \
graphite-dependences.o \
-   graphite-interchange.o \
graphite-optimize-isl.o \
graphite-poly.o \
graphite-scop-detection.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 0964ae4..94d1d88 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1341,16 +1341,16 @@ Common Report Var(flag_loop_parallelize_all) 
Optimization
 Mark all loops as parallel
 
 floop-strip-mine
-Common Report Var(flag_loop_strip_mine) Optimization
-Enable Loop Strip Mining transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-interchange
-Common Report Var(flag_loop_interchange) Optimization
-Enable Loop Interchange transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-block
-Common Report Var(flag_loop_block) Optimization
-Enable Loop Blocking transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 floop-unroll-and-jam
 Common Alias(floop-nest-optimize)
@@ -2315,8 +2315,8 @@ Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
 
 ftree-loop-linear
-Common Alias(floop-interchange)
-Enable loop interchange transforms.  Same as -floop-interchange
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
 
 ftree-loop-ivcanon
 Common Report Var(flag_tree_loop_ivcanon) Init(1) Optimization
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c33cc27..8710ff8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8733,102 +8733,19 @@ Perform loop optimizations on trees.  This flag is 
enabled by default
 at @option{-O} and 

[PATCH 1/2] remove -floop-unroll-and-jam

2015-08-26 Thread Sebastian Pop
---
 gcc/common.opt   |   4 +-
 gcc/doc/invoke.texi  |   8 +-
 gcc/graphite-isl-ast-to-gimple.c | 102 +-
 gcc/graphite-optimize-isl.c  | 179 ---
 gcc/graphite-poly.c  |   3 +-
 gcc/graphite-poly.h  |   3 -
 gcc/graphite.c   |   3 +-
 gcc/params.def   |  15 
 gcc/toplev.c |   3 +-
 9 files changed, 29 insertions(+), 291 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4dcd518..0964ae4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1353,8 +1353,8 @@ Common Report Var(flag_loop_block) Optimization
 Enable Loop Blocking transformation
 
 floop-unroll-and-jam
-Common Report Var(flag_loop_unroll_jam) Optimization
-Enable Loop Unroll Jam transformation
+Common Alias(floop-nest-optimize)
+Enable loop nest transforms.  Same as -floop-nest-optimize
  
 fgnu-tm
 Common Report Var(flag_tm)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27be317..c33cc27 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8848,10 +8848,10 @@ is experimental.
 
 @item -floop-unroll-and-jam
 @opindex floop-unroll-and-jam
-Enable unroll and jam for the ISL based loop nest optimizer.  The unroll 
-factor can be changed using the @option{loop-unroll-jam-size} parameter.
-The unrolled dimension (counting from the most inner one) can be changed 
-using the @option{loop-unroll-jam-depth} parameter. .
+Perform loop nest transformations.  Same as
+@option{-floop-nest-optimize}.  To use this code transformation, GCC has
+to be configured with @option{--with-isl} to enable the Graphite loop
+transformation infrastructure.
 
 @item -floop-parallelize-all
 @opindex floop-parallelize-all
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index dfb012f..5434bfd 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -968,92 +968,6 @@ extend_schedule (__isl_take isl_map *schedule, int 
nb_schedule_dims)
   return schedule;
 }
 
-/* Set the separation_class option for unroll and jam. */
-
-static __isl_give isl_union_map *
-generate_luj_sepclass_opt (scop_p scop, __isl_take isl_union_set *domain, 
-   int dim, int cl)
-{
-  isl_map  *map;
-  isl_space *space, *space_sep;
-  isl_ctx *ctx;
-  isl_union_map *mapu;
-  int nsched = get_max_schedule_dimensions (scop);
- 
-  ctx = scop->ctx;
-  space_sep = isl_space_alloc (ctx, 0, 1, 1);
-  space_sep = isl_space_wrap (space_sep);
-  space_sep = isl_space_set_tuple_name (space_sep, isl_dim_set,
-   "separation_class");
-  space = isl_set_get_space (scop->context);
-  space_sep = isl_space_align_params (space_sep, isl_space_copy(space));
-  space = isl_space_map_from_domain_and_range (space, space_sep);
-  space = isl_space_add_dims (space,isl_dim_in, nsched);
-  map = isl_map_universe (space);
-  isl_map_fix_si (map,isl_dim_out,0,dim);
-  isl_map_fix_si (map,isl_dim_out,1,cl);
-
-  mapu = isl_union_map_intersect_domain (isl_union_map_from_map (map), 
-domain);
-  return (mapu);
-}
-
-/* Compute the separation class for loop unroll and jam.  */
-
-static __isl_give isl_union_set *
-generate_luj_sepclass (scop_p scop)
-{
-  int i;
-  poly_bb_p pbb;
-  isl_union_set *domain_isl;
-
-  domain_isl = isl_union_set_empty (isl_set_get_space (scop->context));
-
-  FOR_EACH_VEC_ELT (SCOP_BBS (scop), i, pbb)
-{
-  isl_set *bb_domain;
-  isl_set *bb_domain_s;
-
-  if (pbb->map_sepclass == NULL)
-   continue;
-
-  if (isl_set_is_empty (pbb->domain))
-   continue;
-
-  bb_domain = isl_set_copy (pbb->domain);
-  bb_domain_s = isl_set_apply (bb_domain, pbb->map_sepclass);
-  pbb->map_sepclass = NULL;
-
-  domain_isl =
-   isl_union_set_union (domain_isl, isl_union_set_from_set (bb_domain_s));
-}
-
-  return domain_isl;
-}
-
-/* Set the AST built options for loop unroll and jam. */
- 
-static __isl_give isl_union_map *
-generate_luj_options (scop_p scop)
-{
-  isl_union_set *domain_isl;
-  isl_union_map *options_isl_ss;
-  isl_union_map *options_isl =
-isl_union_map_empty (isl_set_get_space (scop->context));
-  int dim = get_max_schedule_dimensions (scop) - 1;
-  int dim1 = dim - PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH);
-
-  if (!flag_loop_unroll_jam)
-return options_isl;
-
-  domain_isl = generate_luj_sepclass (scop);
-
-  options_isl_ss = generate_luj_sepclass_opt (scop, domain_isl, dim1, 0);
-  options_isl = isl_union_map_union (options_isl, options_isl_ss);
-
-  return options_isl;
-}
-
 /* Generates a schedule, which specifies an order used to
visit elements in a domain.  */
 
@@ -1102,13 +1016,11 @@ ast_build_before_for (__isl_keep isl_ast_build *build, 
void *user)
 }
 
 /* Set the separate option for all dimensions.
-   This helps to reduce control overhead.
-   Set the options for 

Re: [libvtv] Fix formatting errors

2015-08-26 Thread Jeff Law

On 08/26/2015 01:50 PM, Caroline Tice wrote:

As far as I know vtv is working just fine...is there something I don't
know about?
I'm not aware of anything that isn't working, but I'm also not aware of 
vtv in widespread use, typical performance hit experienced, etc.


jeff



Re: [PATCH] [AVX512F] Add scatter support for vectorizer

2015-08-26 Thread Richard Biener
On Fri, Aug 21, 2015 at 2:18 PM, Petr Murzin petrmurz...@gmail.com wrote:
 Hello,
 Please have a look at updated patch.

 On Tue, Aug 4, 2015 at 3:15 PM, Richard Biener rguent...@suse.de wrote:
 On Fri, 31 Jul 2015, Petr Murzin wrote:
 @@ -5586,8 +5770,6 @@ vectorizable_store (gimple stmt,
 gimple_stmt_iterator *gsi, gimple *vec_stmt,
prev_stmt_info = NULL;
for (j = 0; j  ncopies; j++)
  {
 -  gimple new_stmt;
 -
if (j == 0)
 {
if (slp)

 spurious change?

 I have increased the scope of this variable to use it in checking for
 STMT_VINFO_SCATTER_P (stmt_info).

@@ -3763,32 +3776,46 @@ again:
   if (vf < *min_vf)
*min_vf = vf;

-  if (gather)
+  if (gatherscatter != SG_NONE)
{
  tree off;
+ if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, off,
NULL, true) != 0)
+   gatherscatter = GATHER;
+ else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL,
off, NULL, false)
+ != 0)
+   gatherscatter = SCATTER;
+ else
+   gatherscatter = SG_NONE;

as I said vect_check_gather_scatter already knows whether the DR is a read or
a write and thus whether it needs to check for gather or scatter.  Remove
the new argument.  And simply do

   if (!vect_check_gather_scatter (stmt))
 gatherscatter = SG_NONE;

- STMT_VINFO_GATHER_P (stmt_info) = true;
+ if (gatherscatter == GATHER)
+   STMT_VINFO_GATHER_P (stmt_info) = true;
+ else
+   STMT_VINFO_SCATTER_P (stmt_info) = true;
}

and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P
using the enum so you can simply do

 STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter;

I miss a few testcases that exercise scatter vectorization.  And as Uros
said, the i386 specific parts should be split out.
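
For reference, the shape of loop such a testcase would exercise, as a
sketch (not the committed avx512f-scatter-*.c tests):

#define N 1024

/* The indexed store a[idx[i]] = b[i] is what scatter vectorization
   targets, e.g. as vscatterdps under -O3 -mavx512f.  */
void
scatter_store (float *restrict a, const int *restrict idx,
               const float *restrict b)
{
  for (int i = 0; i < N; i++)
    a[idx[i]] = b[i];
}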

Otherwise the patch looks ok to me.

Thanks,
Richard.


 Thanks,
 Petr

 2015-08-21  Andrey Turetskiy  andrey.turets...@intel.com
 Petr Murzin  petr.mur...@intel.com

 gcc/

 * config/i386/i386-builtin-types.def
 (VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
 (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
 (VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
 (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
 * config/i386/i386.c
 (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
 IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
 IX86_BUILTIN_SCATTERALTDIV16SI.
 (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
 __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
 __builtin_ia32_scatteraltdiv8si.
 (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
 IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
 IX86_BUILTIN_SCATTERALTDIV16SI.
 (ix86_vectorize_builtin_scatter): New.
 (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
 ix86_vectorize_builtin_scatter.
 * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
 * doc/tm.texi: Regenerate.
 * target.def: Add scatter builtin.
 * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new
 checkings for STMT_VINFO_SCATTER_P.
 (vect_check_gather): Rename to ...
 (vect_check_gather_scatter): this and enhance number of arguments.
 (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
 and new checkings for it accordingly.
 * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
 for loads/stores
  in case of gather/scatter accordingly.
 (STMT_VINFO_SCATTER_P(S)): Define.
 (vect_check_gather): Rename to ...
 (vect_check_gather_scatter): this.
 * tree-vect-stmts.c (vectorizable_mask_load_store): Ditto.
 (vectorizable_store): Add checkings for STMT_VINFO_SCATTER_P.
 (vect_mark_stmts_to_be_vectorized): Ditto.


[PATCH] Remove reference to undefined documentation node.

2015-08-26 Thread Dominik Vogt
This patch removes a menu entry that points to an undefined node
in the documentation.  The faulty entry has been introduced with
git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96.  It
looks like the entry is a remnant of an earlier version of the
documentation introduced with that change.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* doc/extend.texi: Remove reference to undefined node.
From 55b9c29f73d8da1881ce5a3f65d0c7f40623e161 Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Wed, 26 Aug 2015 10:59:29 +0100
Subject: [PATCH] Remove reference to undefined documentation node.

---
 gcc/doc/extend.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 018b5d8..f5f90e6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7245,7 +7245,6 @@ for a C symbol, or to place a C variable in a specific register.
 @menu
 * Basic Asm::  Inline assembler without operands.
 * Extended Asm::   Inline assembler with operands.
-* Constraints::Constraints for @code{asm} operands
 * Asm Labels:: Specifying the assembler name to use for a C symbol.
 * Explicit Reg Vars::  Defining variables residing in specified registers.
 * Size of an asm:: How GCC calculates the size of an @code{asm} block.
-- 
2.3.0



Re: [RFC 5/5] Always completely replace constant pool entries

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 9:54 PM, Jeff Law l...@redhat.com wrote:
 On 08/25/2015 05:06 AM, Alan Lawrence wrote:

 I used this as a means of better-testing the previous changes, as it
 exercises
 the constant replacement code a whole lot more. Indeed, quite a few tests
 are
 now optimized away to nothing on AArch64...

 Always pulling in constants is almost certainly not what we want, but we
 may
 nonetheless want something more aggressive than the usual --param, e.g.
 for the
 ssa-dom-cse-2.c test. Thoughts welcomed?

 I'm of the opinion that we have too many knobs already.  So I'd perhaps ask
 whether or not this option is likely to be useful to end users?

 As for the patch itself, any thoughts on reasonable heuristics for when to
 pull in the constants?  Clearly we don't want the patch as-is, but are there
 cases we can identify when we want to be more aggressive?

Well - I still think that we need to enhance those followup passes to directly
handle the constant pool entry.  Expanding the assignment piecewise for
arbitrarily large initializers is certainly a no-go.  IIRC I enhanced FRE to do
this at some point.  For DOM it's much harder due to the way it is structured
and I'd like to keep DOM simple.

Note that we still want SRA to partly scalarize the initializer if
only few elements
remain accessed (so we can optimize the initializer away).  Of course
that requires
catching most followup optimization opportunities before the 2nd SRA run.
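
The case sketched above, in illustrative C: only two elements of the
constant-pool-backed aggregate are read, so partial scalarization lets the
initializer be optimized away.

int
two_elems (void)
{
  const int tbl[16] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  return tbl[0] + tbl[3];  /* foldable to 5 once scalarized */
}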

Richard.

 jeff




Re: [PATCH][ARM]Tighten the conditions for arm_movw, arm_movt

2015-08-26 Thread Ramana Radhakrishnan

 I have tested that, arm-none-linux-gnueabi bootstraps Okay on trunk code.

 JFTR, this is ok to backport to gcc-5 in case there are no regressions.

regards
Ramana





 Thanks,
 Kyrill





Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 10:13 PM, Jeff Law l...@redhat.com wrote:
 On 08/25/2015 05:06 AM, Alan Lawrence wrote:

 This makes SRA replace loads of records/arrays from constant pool entries
 with elementwise assignments of the constant values, hence overcoming the
 fundamental problem in PR/63679.
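
A paraphrase of the ssa-dom-cse-2.c shape this targets (illustrative; the
committed test may differ): once the constant-pool load is replaced by
elementwise constants, the whole function should fold to a constant.

int
sum_const (void)
{
  const int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += a[i];
  return sum;  /* expected to fold to 36 */
}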

 As a first pass, the approach I took was to look for constant-pool loads
 as
 we scanned through other accesses, and add them as candidates there; to
 build a
 constant replacement_decl for any such accesses in completely_scalarize;
 and to
 use any existing replacement_decl rather than creating a variable in
 create_access_replacement. (I did try using CONSTANT_CLASS_P in the
 latter, but
 that does not allow addresses of labels, which can still end up in the
 constant
 pool.)

 Feedback as to the approach or how it might be better structured / fitted
 into
 SRA, is solicited ;).

 Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
 arm-none-linux-gnueabihf, including with the next patch (rfc), which
 greatly increases the number of testcases in which this code is exercised!

 Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes
 (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc,
 sparc, avr, and sh.

 gcc/ChangeLog:

 * tree-sra.c (create_access): Scan for uses of constant pool and
 add
 to candidates.
 (subst_initial): New.
 (scalarize_elem): Build replacement_decl using subst_initial.
 (create_access_replacement): Use replacement_decl if set.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
 sra-max-scalarization-size-Ospeed.
 ---
   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
   gcc/tree-sra.c| 56
 +--
   2 files changed, 55 insertions(+), 8 deletions(-)

 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index af35fcc..a3ff2df 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
 else
   ptr = false;

 +  /* FORNOW: scan for uses of constant pool as we go along.  */

 I'm not sure why you have this marked as FORNOW.  If I'm reading all this
 code correctly, you're lazily adding items from the constant pool into the
 candidates table when you find they're used.  That seems better than walking
 the entire constant pool adding them all to the candidates.

 I don't see this as fundamentally wrong or unclean.

 The question I have is why this differs from the effects of patch #5. That
 would seem to indicate that there's things we're not getting into the
 candidate tables with this approach?!?



 @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type,
 HOST_WIDE_INT offset, tree ref)
   }
   }

 +static tree
 +subst_initial (tree expr, tree var)

 Function comment.

 I think this patch is fine with the function comment added and removing the
 FORNOW part of the comment in create_access.  It may be worth noting in
 create_access's comment that it can add new items to the candidates tables
 for constant pool entries.

I'm happy seeing this code in SRA as I never liked that we already decide
at gimplification time which initializers to expand and which to init from
a constant pool entry.  So ... can we now remove gimplify_init_constructor
by _always_ emitting a constant pool entry and an assignment from it
(obviously only if the constructor can be put into the constant pool)?  Deferring
the expansion decision to SRA makes it possible to better estimate whether
the code is hot/cold or whether the initialized variable can be replaced by
the constant pool entry completely (variable ends up readonly).

Oh, and we'd no longer create the awful split code at -O0 ...
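
For concreteness, a minimal case where that decision is currently taken at
gimplification time (my illustration, not from the patch):

  struct S { int v[8]; };

  int
  f (int i)
  {
    struct S s = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    return s.v[i];
  }

Always emitting the constant pool entry plus an aggregate copy here, and
letting SRA decide whether to expand it, is what the above suggests.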

So can you explore that a bit once this series is settled?  This is probably
also related to 5/5 as this makes all the target dependent decisions in SRA
now and thus the initial IL from gimplification should be the same for all
targets (that's always a nice thing to have IMHO).

Thanks,
Richard.


 Jeff


Re: [PING^2][PATCH, PR46193] Handle min/max pointer reductions in parloops

2015-08-26 Thread Richard Biener
On Mon, Aug 24, 2015 at 5:10 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 22-07-15 20:15, Tom de Vries wrote:

 On 13/07/15 13:02, Tom de Vries wrote:

 Hi,

 this patch fixes PR46193.

 It handles min and max reductions of pointer type in parloops.

 Bootstrapped and reg-tested on x86_64.

 OK for trunk?



 Ping^2.

 Original submission at
 https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01018.html .

Please don't use lower_bound_in_type with two identical types.
Instead use wi::max_value and wide_int_to_tree.
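
(My reading of that suggestion, as an untested sketch rather than the
reviewed change, would be, for the MAX_EXPR case:

  else if (POINTER_TYPE_P (type))
    return wide_int_to_tree (type,
                             wi::min_value (TYPE_PRECISION (type), UNSIGNED));

and correspondingly wi::max_value for the MIN_EXPR case.)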

Ok with that change.

Thanks,
Richard.


 Thanks,
 - Tom


 0001-Handle-mix-max-pointer-reductions-in-parloops.patch


 Handle min/max pointer reductions in parloops

 2015-07-13  Tom de Vriest...@codesourcery.com

 PR tree-optimization/46193
 * omp-low.c (omp_reduction_init): Handle pointer type for min or max
 clause.

 * gcc.dg/autopar/pr46193.c: New test.

 * testsuite/libgomp.c/pr46193.c: New test.
 ---
   gcc/omp-low.c  |  4 ++
   gcc/testsuite/gcc.dg/autopar/pr46193.c | 38 +++
   libgomp/testsuite/libgomp.c/pr46193.c  | 67
 ++
   3 files changed, 109 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46193.c
   create mode 100644 libgomp/testsuite/libgomp.c/pr46193.c

 diff --git a/gcc/omp-low.c b/gcc/omp-low.c
 index 2e2070a..20d0010 100644
 --- a/gcc/omp-low.c
 +++ b/gcc/omp-low.c
 @@ -3423,6 +3423,8 @@ omp_reduction_init (tree clause, tree type)
   real_maxval (min, 1, TYPE_MODE (type));
 return build_real (type, min);
   }
 +  else if (POINTER_TYPE_P (type))
 +return lower_bound_in_type (type, type);
 else
   {
 gcc_assert (INTEGRAL_TYPE_P (type));
 @@ -3439,6 +3441,8 @@ omp_reduction_init (tree clause, tree type)
   real_maxval (max, 0, TYPE_MODE (type));
 return build_real (type, max);
   }
 +  else if (POINTER_TYPE_P (type))
 +return upper_bound_in_type (type, type);
 else
   {
 gcc_assert (INTEGRAL_TYPE_P (type));
 diff --git a/gcc/testsuite/gcc.dg/autopar/pr46193.c
 b/gcc/testsuite/gcc.dg/autopar/pr46193.c
 new file mode 100644
 index 000..544a5da
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/autopar/pr46193.c
 @@ -0,0 +1,38 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 +
 +extern void abort (void);
 +
 +char *
 +foo (int count, char **list)
 +{
 +  char *minaddr = list[0];
 +  int i;
 +
 +  for (i = 0; i < count; i++)
 +{
 +  char *addr = list[i];
 +  if (addr < minaddr)
 +minaddr = addr;
 +}
 +
 +  return minaddr;
 +}
 +
 +char *
 +foo2 (int count, char **list)
 +{
 +  char *maxaddr = list[0];
 +  int i;
 +
 +  for (i = 0; i < count; i++)
 +{
 +  char *addr = list[i];
 +  if (addr > maxaddr)
 +maxaddr = addr;
 +}
 +
 +  return maxaddr;
 +}
 +
 +/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 2 "parloops" } } */
 diff --git a/libgomp/testsuite/libgomp.c/pr46193.c
 b/libgomp/testsuite/libgomp.c/pr46193.c
 new file mode 100644
 index 000..1e27faf
 --- /dev/null
 +++ b/libgomp/testsuite/libgomp.c/pr46193.c
 @@ -0,0 +1,67 @@
 +/* { dg-do run } */
 +/* { dg-additional-options "-ftree-parallelize-loops=2" } */
 +
 +extern void abort (void);
 +
 +char *
 +foo (int count, char **list)
 +{
 +  char *minaddr = list[0];
 +  int i;
 +
 +  for (i = 0; i < count; i++)
 +{
 +  char *addr = list[i];
 +  if (addr < minaddr)
 +minaddr = addr;
 +}
 +
 +  return minaddr;
 +}
 +
 +char *
 +foo2 (int count, char **list)
 +{
 +  char *maxaddr = list[0];
 +  int i;
 +
 +  for (i = 0; i < count; i++)
 +{
 +  char *addr = list[i];
 +  if (addr > maxaddr)
 +maxaddr = addr;
 +}
 +
 +  return maxaddr;
 +}
 +
 +#define N 5
 +
 +static void
 +init (char **list)
 +{
 +  int i;
 +  for (i = 0; i < N; ++i)
 +list[i] = (char *) &list[i];
 +}
 +
 +int
 +main (void)
 +{
 +  char *list[N];
 +  char * res;
 +
 +  init (list);
 +
 +  res = foo (N, list);
 +
 +  if (res != (char *) &list[0])
 +abort ();
 +
 +  res = foo2 (N, list);
 +
 +  if (res != (char *) &list[N-1])
 +abort ();
 +
 +  return 0;
 +}
 -- 1.9.1





RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2015-08-26 Thread Matthew Fortune
Moore, Catherine catherine_mo...@mentor.com writes:
  The recent changes to the MIPS GCC Linux sysroot
  (https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01014.html) have meant
  that the include directory is now not global and is provided only for
  each multi-lib configuration.  This means that for any test in the
  MIPS GCC Testsuite that requires standard library support we need to
  check if there is a multi-lib support for the test options, otherwise
 it might fail to compile.
 
  This patch adds this support to the testsuite and mips.exp files.
  Firstly any test that requires standard library support has the
  implicit option (REQUIRES_STDLIB) added to its dg-options.  Secondly
  in mips.exp a pre-processor check is performed to ensure that when
  expanding a testcase containing a #include <stdlib.h> using the
  current set of test options we do not get file-not-found errors.  If
  this happens we mark the testcase as unsupported.
 
  The patch has been tested on the mti/img elf/linux-gnu toolchains, and
  there have been no new regressions.
 
  The patch and ChangeLog are below.
 
  Ok to commit?
 
 
 Yes.  This looks good.

I had some comments on this that I hadn't got round to posting. The fix in
this patch is not general enough as the missing header problem comes in
two (related) forms:

1) Using the new MTI and IMG sysroot layout we can end up with GCC looking
   for headers in a sysroot that simply does not exist. The current patch
   handles this.
2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
   for the stdlib.h header to be found but the ABI dependent gnu-stubs
   header may not be installed depending on soft/hard nan1985/nan2008.

The test for stdlib.h needs to therefore verify that preprocessing succeeds
rather than just testing for an error relating to stdlib.h. This could be
done by adding a further option to mips_preprocess to indicate the
preprocessed output should go to a file and that the caller wants the
messages emitted by the compiler instead.

A second issue is that you have added (REQUIRES_STDLIB) to too many tests.
You only need to add it to tests that request a compiler option (via
dg-options) that could potentially lead to forcing soft/hard nan1985/nan2008
directly or indirectly. So -mips32r6 implies nan2008, so you need it;
-mips32r5 implies nan1985, so you need it. There are at least two tests which don't
need the option but you need to check them all so we don't run the check
needlessly.
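
(To illustrate the mechanism, a hypothetical test fragment; the implicit
option spelling is taken from the patch under discussion:

  /* { dg-options "-mips32r6 (REQUIRES_STDLIB)" } */
  #include <stdlib.h>

Here -mips32r6 implies nan2008, so mips.exp must verify that the selected
multilib can actually preprocess stdlib.h before running the test.)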

Thanks,
Matthew


Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Richard Biener
On August 26, 2015 11:30:26 AM GMT+02:00, Martin Jambor mjam...@suse.cz wrote:
Hi,

On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote:
 On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote:
  On 08/25/2015 03:42 PM, Martin Jambor wrote:
 
  Hi,
 
  On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
 
   This changes the completely_scalarize_record path to also work on arrays
   (thus allowing records containing arrays, etc.). This just required
   extending the existing type_consists_of_records_p and
   completely_scalarize_record methods to handle things of ARRAY_TYPE as
   well as RECORD_TYPE. Hence, I renamed both methods so as not to mention
   'record'.
 
 
   thanks for working on this.  I see Jeff has already approved the
   patch, but I have two comments nevertheless.  First, I would be much
   happier if you added a proper comment to scalarize_elem function which
   you forgot completely.  The name is not very descriptive and it has
   quite a few parameters too.
 
   Right.  I mentioned that I missed the lack of function comments when
   looking at #3 and asked Alan to go back and fix them in #1 and #2.
 
 
   Second, this patch should also fix PR 67283.  It would be great if you
   could verify that and add it to the changelog when committing if that
   is indeed the case.
 
  Excellent.  Yes, definitely mention the BZ.
 
 One extra question is does the way we limit total scalarization work well
 for arrays?  I suppose we have either sth like the maximum size of an
 aggregate we scalarize or the maximum number of component accesses
 we create?
 

Only the former and that would be kept intact.  It is in fact visible
in the context of the last hunk of the patch.

OK.  IIRC the gimplification code also has the latter and also considers 
zeroing the whole aggregate before initializing non-zero fields.  IMHO it makes 
sense to reuse some of the analysis and classification routines it has.
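
(The analysis routine in question is presumably categorize_ctor_elements;
a rough sketch of its use, from memory rather than from this thread:

  HOST_WIDE_INT num_nonzero, num_init;
  bool complete;
  categorize_ctor_elements (ctor, &num_nonzero, &num_init, &complete);

gimplify_init_constructor uses these counts to decide between clearing
the whole aggregate first and storing the non-zero fields individually.)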

Richard.

Martin




Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Richard Biener
On Wed, 26 Aug 2015, Bin.Cheng wrote:

 On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law l...@redhat.com wrote:
  On 08/25/2015 05:06 AM, Alan Lawrence wrote:
 
  When SRA completely scalarizes an array, this patch changes the
  generated accesses from e.g.
 
  MEM[(int[8] *)&a + 4B] = 1;
 
  to
 
  a[1] = 1;
 
  This overcomes a limitation in dom2, that accesses to equivalent
  chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
  accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
  propagation in the ssa-dom-cse-2.c testcase (after the next patch
  that makes SRA handle constant-pool loads).
 
  I tried to work around this by making dom2's hashable_expr_equal_p
  less conservative, but found that on platforms without AArch64's
  vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
  mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
  *)a] equivalent to a[0], etc.; a complete overhaul of
  hashable_expr_equal_p seems like a larger task than this patch
  series.
 
  I can't see how to write a testcase for this in C though as direct
  assignment to an array is not possible; such assignments occur only
  with constant pool data, which is dealt with in the next patch.
 
  It's a general issue that if there's > 1 common way to represent an
  expression, then DOM will often miss discovery of the CSE opportunity
  because of the way it hashes expressions.
 
  Ideally we'd be moving to a canonical form, but I also realize that in
  the case of memory references like this, that may not be feasible.
 IIRC, there were talks about lowering all memory references on GIMPLE?
 Which is the reverse approach.  Since SRA is in quite early
 compilation stage, don't know if lowered memory reference has impact
 on other optimizers.

Yeah, I'd only do the lowering after loop opts.  Which also may make
the DOM issue moot as the array refs would be lowered as well and thus
DOM would see a consistent set of references again.  The lowering should
also simplify SLSR and expose address computation redundancies to DOM.

I'd place such lowering before the late reassoc (any takers?  I suppose
you can pick up one of the bitfield lowering passes posted in the
previous years as this should also handle bitfield accesses correctly).

Thanks,
Richard.

 Thanks,
 bin
 
  It does make me wonder how many CSEs we're really missing due to the two
  ways to represent array accesses.
 
 
  Bootstrap + check-gcc on x86-none-linux-gnu,
  arm-none-linux-gnueabihf, aarch64-none-linux-gnu.
 
  gcc/ChangeLog:
 
  * tree-sra.c (completely_scalarize): Move some code into:
  (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base
  is aligned array.
  ---
   gcc/tree-sra.c | 110 ++++++++++++++++++++++++++++++----------------
   1 file changed, 69 insertions(+), 41 deletions(-)
 
  diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
  index 08fa8dc..af35fcc 100644
  --- a/gcc/tree-sra.c
  +++ b/gcc/tree-sra.c
  @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
     }
  }
 
  +static bool
  +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
 
  Function comment needed.
 
  I may have missed it in the earlier patches, but can you please make
  sure any new functions you created have comments in those as well.  Such
  patches are pre-approved.
 
  With the added function comment, this patch is fine.
 
  jeff
 
 
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-26 Thread Bin.Cheng
On Wed, Aug 26, 2015 at 3:29 PM, Richard Biener rguent...@suse.de wrote:
 On Wed, 26 Aug 2015, Bin.Cheng wrote:

 On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law l...@redhat.com wrote:
  On 08/25/2015 05:06 AM, Alan Lawrence wrote:
 
  When SRA completely scalarizes an array, this patch changes the
  generated accesses from e.g.
 
   MEM[(int[8] *)&a + 4B] = 1;
 
  to
 
  a[1] = 1;
 
  This overcomes a limitation in dom2, that accesses to equivalent
   chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with
   accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant
  propagation in the ssa-dom-cse-2.c testcase (after the next patch
  that makes SRA handle constant-pool loads).
 
  I tried to work around this by making dom2's hashable_expr_equal_p
  less conservative, but found that on platforms without AArch64's
  vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC,
  mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8]
  *)a] equivalent to a[0], etc.; a complete overhaul of
  hashable_expr_equal_p seems like a larger task than this patch
  series.
 
  I can't see how to write a testcase for this in C though as direct
  assignment to an array is not possible; such assignments occur only
  with constant pool data, which is dealt with in the next patch.
 
   It's a general issue that if there's > 1 common way to represent an
  expression, then DOM will often miss discovery of the CSE opportunity
  because of the way it hashes expressions.
 
  Ideally we'd be moving to a canonical form, but I also realize that in
  the case of memory references like this, that may not be feasible.
  IIRC, there were talks about lowering all memory references on GIMPLE?
 Which is the reverse approach.  Since SRA is in quite early
 compilation stage, don't know if lowered memory reference has impact
 on other optimizers.

 Yeah, I'd only do the lowering after loop opts.  Which also may make
 the DOM issue moot as the array refs would be lowered as well and thus
 DOM would see a consistent set of references again.  The lowering should
 also simplify SLSR and expose address computation redundancies to DOM.

 I'd place such lowering before the late reassoc (any takers?  I suppose
 you can pick up one of the bitfield lowering passes posted in the
 previous years as this should also handle bitfield accesses correctly).
I ran into several issues related to lowered memory references (some
of them are about slsr), and want to have a look at this.  But only
after finishing major issues in IVO...

As for slsr, I think the problem is more about we need to prove
equality of expressions by diving into definition chain of ssa_var,
just like tree_to_affine_expand.  I think this has already been
discussed too.  Anyway, lowering memory reference provides a canonical
form and should benefit other optimizers.

Thanks,
bin

 Thanks,
 Richard.

 Thanks,
 bin
 
  It does make me wonder how many CSEs we're really missing due to the two
  ways to represent array accesses.
 
 
  Bootstrap + check-gcc on x86-none-linux-gnu,
  arm-none-linux-gnueabihf, aarch64-none-linux-gnu.
 
  gcc/ChangeLog:
 
  * tree-sra.c (completely_scalarize): Move some code into:
  (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base
   is aligned array.
   ---
    gcc/tree-sra.c | 110 ++++++++++++++++++++++++++++++----------------
    1 file changed, 69 insertions(+), 41 deletions(-)
 
   diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
   index 08fa8dc..af35fcc 100644
   --- a/gcc/tree-sra.c
   +++ b/gcc/tree-sra.c
   @@ -957,6 +957,20 @@ scalarizable_type_p (tree type)
      }
   }
 
   +static bool
   +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out)
 
  Function comment needed.
 
  I may have missed it in the earlier patches, but can you please make
  sure any new functions you created have comments in those as well.  Such
  patches are pre-approved.
 
  With the added function comment, this patch is fine.
 
  jeff
 
 



 --
 Richard Biener rguent...@suse.de
 SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
 21284 (AG Nuernberg)


Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug

2015-08-26 Thread Richard Biener
On Wed, 19 Aug 2015, Richard Biener wrote:

 On Tue, 18 Aug 2015, Aldy Hernandez wrote:
 
  On 08/18/2015 07:20 AM, Richard Biener wrote:
   
   This starts a series of patches (still in development) to refactor
   dwarf2out.c to better cope with early debug (and LTO debug).
  
  Awesome!  Thanks.
  
   Aldyh, what other testing did you usually do for changes?  Run
   the gdb testsuite against the new compiler?  Anything else?
  
  gdb testsuite, and make sure you test GCC with 
  --enable-languages=all,go,ada,
  though the latter is mostly useful while you iron out bugs initially.  I 
  found
  that ultimately, the best test was C++.
 
 I see.
 
  Pre merge I also bootstrapped the compiler and compared .debug* section 
  sizes
  in object files to make sure things were within reason.
  
   +
   +static void
   +vmsdbgout_early_finish (const char *filename ATTRIBUTE_UNUSED)
   +{
   +  if (write_symbols == VMS_AND_DWARF2_DEBUG)
   +(*dwarf2_debug_hooks.early_finish) (filename);
   +}
  
  You can get rid of ATTRIBUTE_UNUSED now.
 
 Done.  I've also refrained from moving
 
   gen_scheduled_generic_parms_dies ();
   gen_remaining_tmpl_value_param_die_attribute ();
 
 for now as that causes regressions I have to investigate.
 
 The patch below has passed bootstrap  regtest on x86_64-unknown-linux-gnu
 as well as gdb testing.  Twice unpatched, twice patched - results seem
 to be somewhat unstable!?  I even refrained from using any -j with
 make check-gdb...  maybe it's just contrib/test_summary not coping well
 with gdb?  any hints?  Difference between unpatched run 1  2 is
 for example
 
 --- results.unpatched   2015-08-19 15:08:36.152899926 +0200
 +++ results.unpatched2  2015-08-19 15:29:46.902060797 +0200
 @@ -209,7 +209,6 @@
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
 -FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, fc4)
  FAIL: gdb.cp/inherit.exp: print g_vD
  FAIL: gdb.cp/inherit.exp: print g_vE
  FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)'
 @@ -238,6 +237,7 @@
  UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings
  FAIL: gdb.fortran/whatis_type.exp: run to MAIN__
  WARNING: remote_expect statement without a default case?!
 +FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
  WARNING: remote_expect statement without a default case?!
 @@ -362,12 +362,12 @@
 === gdb Summary ===
  
 -# of expected passes   30881
 +# of expected passes   30884
  # of unexpected failures   284
  # of unexpected successes  2
 -# of expected failures 85
 +# of expected failures 83
  # of unknown successes 2
 -# of known failures60
 +# of known failures59
  # of unresolved testcases  6
  # of untested testcases32
  # of unsupported tests 165
 
 the sames changes randomly appear/disappear in the patched case.  
 Otherwise patched/unpatched agree.
 
 Ok?

Jason, are you willing to review these refactoring patches or can
I invoke my middle-end maintainer powers to remove some of this noise
from the LTO parts?

Thanks,
Richard.

 Thanks,
 Richard.
 
 2015-08-18  Richard Biener  rguent...@suse.de
 
   * debug.h (gcc_debug_hooks::early_finish): Add filename argument.
   * dbxout.c (dbx_debug_hooks): Adjust.
   * debug.c (do_nothing_hooks): Likewise.
   * sdbout.c (sdb_debug_hooks): Likewise.
   * vmsdbgout.c (vmsdbgout_early_finish): New function dispatching
   to dwarf2out variant if needed.
   (vmsdbg_debug_hooks): Adjust.
   * dwarf2out.c (dwarf2_line_hooks): Adjust.
   (flush_limbo_die_list): New function.
   (dwarf2out_finish): Call flush_limbo_die_list instead of
   dwarf2out_early_finish.  Assert there are no deferred asm-names.
   Move early stuff ...
   (dwarf2out_early_finish): ... here.
   * cgraphunit.c (symbol_table::finalize_compilation_unit):
   Call early_finish with main_input_filename argument.
 
 
 Index: gcc/cgraphunit.c
 ===
 --- gcc/cgraphunit.c  (revision 226966)
 +++ gcc/cgraphunit.c  (working copy)
 @@ -2490,7 +2490,7 @@ symbol_table::finalize_compilation_unit
  
/* Clean up anything that needs cleaning up after initial debug
   generation.  */
 -  (*debug_hooks->early_finish) ();
 +  (*debug_hooks->early_finish) (main_input_filename);
  
/* Finally drive the pass manager.  */
compile ();
 Index: gcc/dbxout.c
 ===
 --- gcc/dbxout.c  (revision 226966)
 +++ gcc/dbxout.c  (working copy)
 @@ -354,7 +354,7 @@ const struct gcc_debug_hooks dbx_debug_h
  {

[PATCH][3/n] dwarf2out refactoring for early (LTO) debug

2015-08-26 Thread Richard Biener

The following fixes a GC issue I run into when doing 
prune_unused_types_prune early.  The issue is that the DIE struct
has a chain_circular marked field (die_sib) which cannot tolerate
spurious extra entries from old removed entries into the circular
chain.  Otherwise we fail to properly mark parts of the chain.
Those stray entries are kept live referenced from TYPE_SYMTAB_DIE.

So the following patch makes sure to clear ->die_sib for
nodes we remove.  (these DIEs remaining in TYPE_SYMTAB_DIE also
means we may end up re-using them which is probably not what we
want ... in the original LTO experiment I had a ->removed flag
in the DIE struct and removed DIEs from the cache at cache lookup
time if I hit a removed DIE)

Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb tested there
as well.

Ok for trunk?

Thanks,
Richard.

2015-08-26  Richard Biener  rguent...@suse.de

* dwarf2out.c (remove_child_with_prev): Clear child->die_sib.
(replace_child): Likewise.
(remove_child_TAG): Adjust.
(move_marked_base_types): Likewise.
(prune_unused_types_prune): Clear die_sib of removed children.

Index: trunk/gcc/dwarf2out.c
===
--- trunk.orig/gcc/dwarf2out.c  2015-08-26 09:30:54.679185817 +0200
+++ trunk/gcc/dwarf2out.c   2015-08-25 16:54:09.150506037 +0200
@@ -4827,6 +4827,7 @@ remove_child_with_prev (dw_die_ref child
 prev->die_sib = child->die_sib;
   if (child->die_parent->die_child == child)
 child->die_parent->die_child = prev;
+  child->die_sib = NULL;
 }
 
 /* Replace OLD_CHILD with NEW_CHILD.  PREV must have the property that
@@ -4853,6 +4854,7 @@ replace_child (dw_die_ref old_child, dw_
 }
   if (old_child->die_parent->die_child == old_child)
 old_child->die_parent->die_child = new_child;
+  old_child->die_sib = NULL;
 }
 
 /* Move all children from OLD_PARENT to NEW_PARENT.  */
@@ -4883,9 +4885,9 @@ remove_child_TAG (dw_die_ref die, enum d
remove_child_with_prev (c, prev);
c->die_parent = NULL;
/* Might have removed every child.  */
-   if (c == c->die_sib)
+   if (die->die_child == NULL)
  return;
-   c = c->die_sib;
+   c = prev->die_sib;
   }
   } while (c != die->die_child);
 }
@@ -24565,8 +24590,8 @@ prune_unused_types_prune (dw_die_ref die
 
   c = die->die_child;
   do {
-dw_die_ref prev = c;
-for (c = c->die_sib; ! c->die_mark; c = c->die_sib)
+dw_die_ref prev = c, next;
+for (c = c->die_sib; ! c->die_mark; c = next)
   if (c == die->die_child)
{
  /* No marked children between 'prev' and the end of the list.  */
@@ -24578,8 +24603,14 @@ prune_unused_types_prune (dw_die_ref die
 prev->die_sib = c->die_sib;
 die->die_child = prev;
}
+ c->die_sib = NULL;
  return;
}
+  else
+   {
+ next = c->die_sib;
+ c->die_sib = NULL;
+   }
 
 if (c != prev->die_sib)
   prev->die_sib = c;
@@ -24824,8 +24855,8 @@ move_marked_base_types (void)
  remove_child_with_prev (c, prev);
  /* As base types got marked, there must be at least
 one node other than DW_TAG_base_type.  */
- gcc_assert (c != c->die_sib);
- c = c->die_sib;
+ gcc_assert (die->die_child != NULL);
+ c = prev->die_sib;
}
 }
  while (c != die->die_child);


[build] Use __cxa_atexit on Solaris 12+

2015-08-26 Thread Rainer Orth
Solaris 12 introduced __cxa_atexit in libc.  The following patch makes
use of it, and also removes the strange failures seen with gld reported
in PR c++/51923.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12], will install on mainline.  Will backport to
the gcc 5 branch after some soak time.

Rainer


2015-02-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

* config.gcc (*-*-solaris2*): Enable default_use_cxa_atexit on
Solaris 12+.

Use __cxa_atexit on Solaris 12+

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -820,6 +820,12 @@ case ${target} in
   sol2_tm_file_head=dbxelf.h elfos.h ${cpu_type}/sysv4.h
   sol2_tm_file_tail=${cpu_type}/sol2.h sol2.h
   sol2_tm_file=${sol2_tm_file_head} ${sol2_tm_file_tail}
+  case ${target} in
+*-*-solaris2.1[2-9]*)
+  # __cxa_atexit was introduced in Solaris 12.
+  default_use_cxa_atexit=yes
+  ;;
+  esac
   use_gcc_stdint=wrap
   if test x$gnu_ld = xyes; then
 tm_file=usegld.h ${tm_file}

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[gomp4] loop partition optimization

2015-08-26 Thread Nathan Sidwell
I've committed this patch, which implements a simple partitioned execution 
optimization.  A loop over both worker and vector dimensions emits separate 
FORK and JOIN markers for the two dimensions -- there may be reduction pieces 
between them, as Cesar will shortly be committing.


However, if there aren't reductions, then we end up with one partitioned region 
sitting entirely inside another region.  This is inefficient, as it 
causes us to add separate worker and vector partitioning startup.


This optimization looks for regions of this form, and if found consumes the 
inner region into the outer region.  Then we only emit a single setup block of code.


nathan
2015-08-26  Nathan Sidwell  nat...@codesourcery.com

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Default
	nvptx_optimize.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it.
	* doc/invoke.texi (Nvptx options): Document moptimize.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227180)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -178,6 +178,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+    nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   declared_libfuncs_htab
@@ -3005,6 +3008,64 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par-forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+ the inner par.  */
+  if (par->blocks.length () != 2)
+return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+ worker partitioning, this is sufficient to guarantee the pars
+ have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+/* This indicates malformed code generation.  */
+return;
+
+  /* The outer forked insn should be the only one in its block.  */
+  rtx_insn *probe;
+  rtx_insn *forked = par->forked_insn;
+  for (probe = BB_END (par->forked_block);
+   probe != forked; probe = PREV_INSN (probe))
+if (INSN_P (probe))
+  return;
+
+  /* The outer joining insn, if any, must be in the same block as the inner
+ joined instruction, which must otherwise be empty of insns.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  for (probe = BB_END (inner->join_block);
+   probe != join; probe = PREV_INSN (probe))
+if (probe != joining && INSN_P (probe))
+  return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
parallels.  We do everything but the neutering.  Return mask of
partitioned modes used within this parallel.  */
@@ -3012,8 +3073,11 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
-  unsigned inner_mask = par->mask;
+  if (nvptx_optimize)
+nvptx_optimize_inner (par);
   
+  unsigned inner_mask = par->mask;
+
   /* Do the inner parallels first.  */
   if (par->inner)
 {
Index: gcc/config/nvptx/nvptx.opt
===
--- gcc/config/nvptx/nvptx.opt	(revision 227180)
+++ gcc/config/nvptx/nvptx.opt	(working copy)
@@ -29,6 +29,10 @@ mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
 
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
+
 Enum
 Name(ptx_isa) Type(int)
 Known PTX ISA versions (for use with the -misa= option):
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 227180)
+++ gcc/doc/invoke.texi	(working copy)
@@ -18814,6 +18814,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 
+@item -moptimize
+@opindex moptimize
+Apply partitioned execution optimizations.  This is the default when any
+level of optimization is selected.
+
 @end table
 
 @node PDP-11 Options


[libgfortran,committed] Fix SHAPE intrinsic with KIND values 1 and 2

2015-08-26 Thread FX
Attached patch fixes the SHAPE intrinsic with optional argument KIND values of 1 
and 2. While we already accept and emit code for SHAPE with KIND values, the 
runtime versions with integer kinds 1 and 2 are missing (while values of 4, 8 
and 16 are present).

The patch adds the necessary generated files, and symbols into gfortran.map, as 
well as a testcase.

I also took the opportunity to fix an error in the type of the SHAPE argument, 
which is a generic array (array_t) and not a specifically-typed version. This 
changes nothing for the generated code, because only the shape of the array 
descriptor is accessed. But it’s cleaner that way.

Committed as revision 227210, after bootstrapping and regtesting on 
x86_64-apple-darwin15.

FX




shape.ChangeLog
Description: Binary data


shape.diff
Description: Binary data


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 0:26 GMT+03:00 Jeff Law l...@redhat.com:
 On 08/21/2015 06:17 AM, Ilya Enkovich wrote:


 Hmm, I don't see how vector masks are more difficult to operate with.


 There are just no instructions for that but you have to pretend you
  have them to get code vectorized.


 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.


 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?


 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.

 That's what I assumed -- you can pass in a mask as an argument and it's
 supposed to be a simple integer, right?

Depending on the target, the ABI requires either a vector mask or a simple integer value.





 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?


 No idea - we'll revisit when another targets adds a similar capability.


 AVX-512 is such target. Current representation forces multiple scalar
 mask - vector mask and back transformations which are artificially
 introduced by current bool patterns and are hard to optimize out.

 I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE
 ought to handle this kind of thing with relative ease.

Most of vector comparisons are UNSPEC. And I doubt PRE may actually
help much even if get rid of UNSPEC somehow. Is there really a
redundancy in:

if ((v1 cmp v2) && (v3 cmp v4))
  load

v1 cmp v2 -> mask1
select mask1 vec_cst_-1 vec_cst_0 -> vec_mask1
v3 cmp v4 -> mask2
select mask2 vec_mask1 vec_cst_0 -> vec_mask2
vec_mask2 NE vec_cst_0 -> mask3
load by mask3

It looks to me more like an i386-specific instruction selection problem.

Ilya



 Fact is GCC already copes with vector masks generated by vector compares
 just fine everywhere and I'd rather leave it as that.


  Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
  0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
 additional vec_cond. I don't think vectorizer ever generates vector
 comparison.

 And I wouldn't say it's fine 'everywhere' because there is a single
 target utilizing them. Masked loads and stored for AVX-512 just don't
 work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
 512-bit vector then we get an ugly inefficient code. The question is
 where to fight with this inefficiency: in RTL or in GIMPLE. I want to
  fight with it where it appears, i.e. in GIMPLE by preventing bool ->
 int conversions applied everywhere even if target doesn't need it.

 You should expect pushback anytime target dependencies are added to gimple,
 even if it's stuff in the vectorizer, which is infested with target
 dependencies.


 If we don't want to support both types of masks in GIMPLE then it's
 more reasonable to make bool - int conversion in expand for targets
 requiring it, rather than do it for everyone and then leave it to
 target to transform it back and try to get rid of all those redundant
  transformations. I'd give vector<bool> a chance to become a canonical
 mask representation for that.

 Might be worth some experimentation.

 Jeff


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com:
 On 08/21/2015 04:49 AM, Ilya Enkovich wrote:


 I want a work with bitmasks to be expressed in a natural way using
 regular integer operations. Currently all masks manipulations are
 emulated via vector statements (mostly using a bunch of vec_cond). For
 complex predicates it may be nontrivial to transform it back to scalar
 masks and get an efficient code. Also the same vector may be used as
 both a mask and an integer vector. Things become more complex if you
 additionally have broadcasts and vector pack/unpack code. It also
 should be transformed into a scalar masks manipulations somehow.

 Or why not model the conversion at the gimple level using a CONVERT_EXPR?
 In fact, the more I think about it, that seems to make more sense to me.

 We pick a canonical form for the mask, whatever it may be.  We use that
 canonical form and model conversions between it and the other form via
 CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
 If it's not up to the task, we should really look into why and resolve.

 Yes, that does mean we have two forms which I'm not terribly happy about and
 it means some target dependencies on what the masked vector operation looks
 like (ie, does it accept a simple integer or vector mask), but I'm starting
 to wonder if, as distasteful as I find it, it's the right thing to do.

If we have some special representation for masks in GIMPLE then we
might not need any conversions. We could ask a target to define a MODE
for this type and use it directly everywhere: directly compare into
it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
that type is reserved for masks usage then you previous suggestion to
transform masks into target-specific form at the GIMPLE->RTL phase should
work fine. This would allow to support only a single masks
representation in GIMPLE.

Thanks,
Ilya



 But I don't like changing our IL so much as to allow 'integer' masks
 everywhere.

 I'm warming up to that idea...

 jeff



Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-08-26 Thread Marcus Shawcroft
On 25 August 2015 at 14:12, Andre Vieira andre.simoesdiasvie...@arm.com wrote:

 gcc/ChangeLog:
 2015-08-07  Ramana Radhakrishnan  ramana.radhakrish...@arm.com
 Andre Vieira  andre.simoesdiasvie...@arm.com

 * config/aarch64/aarch64.md (*condjump): Handle functions  1 Mib.
 (*cboptabmode1): Likewise.
 (*tboptabmode1): Likewise.
 (*cboptabmode1): Likewise.
 * config/aarch64/iterators.md (inv_cb): New code attribute.
 (inv_tb): Likewise.
 * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
 * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.

 gcc/testsuite/ChangeLog:
 2015-08-07  Andre Vieira  andre.simoesdiasvie...@arm.com

 * gcc.target/aarch64/long_branch_1.c: New test.

OK /Marcus


[libgo] Use stat_atim.go on Solaris 12+

2015-08-26 Thread Rainer Orth
Solaris 12 changes the stat_[amc]tim members of struct stat from
timestruc_t to timespec_t for XPG7 compatibility, thus breaking the libgo
build.  The following patch checks for this change and uses the common
stat_atim.go if appropriate.

Btw., I noticed that go/os/stat_atim.go and stat_dragonfly.go are identical;
no idea why that would be useful.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12].

I had to regenerate aclocal.m4 since for some reason it had been built
with automake 1.11.1 instead of the common 1.11.6, thus inhibiting
Makefile.in regeneration.

Ok for mainline now and the gcc 5 branch after some soak time?

Rainer


2015-02-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

* configure.ac (have_stat_timespec): Check for timespec_t st_atim
in sys/stat.h.
(HAVE_STAT_TIMESPEC): New conditional.
* configure: Regenerate.
* Makefile.am [LIBGO_IS_SOLARIS  HAVE_STAT_TIMESPEC]
(go_os_stat_file): Use go/os/stat_atim.go.
* aclocal.m4: Regenerate.
* Makefile.in: Regenerate.

# HG changeset patch
# Parent b83d7b91430fc3d2c2f34df34aaf648b178d2cad
Use stat_atim.go on Solaris 12+

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -880,7 +880,11 @@ endif
 endif
 
 if LIBGO_IS_SOLARIS
+if HAVE_STAT_TIMESPEC
+go_os_stat_file = go/os/stat_atim.go
+else
 go_os_stat_file = go/os/stat_solaris.go
+endif
 else
 if LIBGO_IS_LINUX
 go_os_stat_file = go/os/stat_atim.go
diff --git a/libgo/configure.ac b/libgo/configure.ac
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -654,6 +654,12 @@ AC_CACHE_CHECK([epoll_event data.fd offs
 STRUCT_EPOLL_EVENT_FD_OFFSET=${libgo_cv_c_epoll_event_fd_offset}
 AC_SUBST(STRUCT_EPOLL_EVENT_FD_OFFSET)
 
+dnl Check if sys/stat.h uses timespec_t for st_?tim members.  Introduced
+dnl in Solaris 12 for XPG7 compatibility.
+AC_EGREP_HEADER([timespec_t.*st_atim], [sys/stat.h],
+		[have_stat_timespec=yes], [have_stat_timespec=no])
+AM_CONDITIONAL(HAVE_STAT_TIMESPEC, test $have_stat_timespec = yes)
+
 dnl See if struct exception is defined in math.h.
 AC_CHECK_TYPE([struct exception],
 [libgo_has_struct_exception=yes],

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[libvtv] Update copyrights

2015-08-26 Thread Rainer Orth
While working on the Solaris libvtv port, I noticed that many of the
libvtv copyright years hadn't been updated, were misformatted, or both.
It turns out that libvtv isn't listed in contrib/update-copyright.py at
all.  This patch fixes this and includes the result of running
update-copyright.py --this-year libvtv.

I've neither added libvtv to self.default_dirs in the script nor added
copyrights to the numerous files in libvtv that currently lack one.

Ok for mainline once it has survived regtesting?

Thanks.
Rainer


2015-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

libvtv:
Update copyrights.

contrib:
* update-copyright.py (GCCCmdLine): Add libvtv.

# HG changeset patch
# Parent 322129613b3dfc80c06f5f87dae9f2fa962a3496
Update copyrights

diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -745,6 +745,7 @@ class GCCCmdLine (CmdLine):
 # libsanitiser is imported from upstream.
 self.add_dir ('libssp')
 self.add_dir ('libstdc++-v3', LibStdCxxFilter())
+self.add_dir ('libvtv')
 self.add_dir ('lto-plugin')
 # zlib is imported from upstream.
 
diff --git a/libvtv/Makefile.am b/libvtv/Makefile.am
--- a/libvtv/Makefile.am
+++ b/libvtv/Makefile.am
@@ -1,6 +1,6 @@
 ## Makefile for the VTV library.
 ##
-## Copyright (C) 2013 Free Software Foundation, Inc.
+## Copyright (C) 2013-2015 Free Software Foundation, Inc.
 ##
 ## Process this file with automake to produce Makefile.in.
 ##
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -1,5 +1,5 @@
 # -*- shell-script -*-
-#   Copyright (C) 2013 Free Software Foundation, Inc.
+#   Copyright (C) 2013-2015 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/config/default.exp b/libvtv/testsuite/config/default.exp
--- a/libvtv/testsuite/config/default.exp
+++ b/libvtv/testsuite/config/default.exp
@@ -1,4 +1,4 @@
-#   Copyright (C) 2013 Free Software Foundation, Inc.
+#   Copyright (C) 2013-2015 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
--- a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
+++ b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc
@@ -2,8 +2,7 @@
 
 /* This test script is part of GDB, the GNU debugger.
 
-   Copyright 1993, 1994, 1997, 1998, 1999, 2003, 2004,
-   Free Software Foundation, Inc.
+   Copyright (C) 1993-2015 Free Software Foundation, Inc.
 
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
diff --git a/libvtv/testsuite/other-tests/Makefile.am b/libvtv/testsuite/other-tests/Makefile.am
--- a/libvtv/testsuite/other-tests/Makefile.am
+++ b/libvtv/testsuite/other-tests/Makefile.am
@@ -1,6 +1,6 @@
 ## Makefile for the testsuite subdirectory of the VTV library.
 ##
-## Copyright (C) 2013 Free Software Foundation, Inc.
+## Copyright (C) 2013-2015 Free Software Foundation, Inc.
 ##
 ## Process this file with automake to produce Makefile.in.
 ##
diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc
--- a/libvtv/vtv_fail.cc
+++ b/libvtv/vtv_fail.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
- Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
  This file is part of GCC.
 
diff --git a/libvtv/vtv_fail.h b/libvtv/vtv_fail.h
--- a/libvtv/vtv_fail.h
+++ b/libvtv/vtv_fail.h
@@ -1,5 +1,4 @@
-// Copyright (C) 2012-2013
-// Free Software Foundation
+// Copyright (C) 2012-2015 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
diff --git a/libvtv/vtv_malloc.cc b/libvtv/vtv_malloc.cc
--- a/libvtv/vtv_malloc.cc
+++ b/libvtv/vtv_malloc.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
-   Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
This file is part of GCC.
 
diff --git a/libvtv/vtv_malloc.h b/libvtv/vtv_malloc.h
--- a/libvtv/vtv_malloc.h
+++ b/libvtv/vtv_malloc.h
@@ -1,5 +1,4 @@
-// Copyright (C) 2012-2013
-// Free Software Foundation
+// Copyright (C) 2012-2015 Free Software Foundation, Inc.
 //
 // This file is part of GCC.
 //
diff --git a/libvtv/vtv_map.h b/libvtv/vtv_map.h
--- a/libvtv/vtv_map.h
+++ b/libvtv/vtv_map.h
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
-   Free Software Foundation
+/* Copyright (C) 2012-2015 Free Software Foundation, Inc.
 
This file is part of GCC.
 
diff --git a/libvtv/vtv_rts.cc b/libvtv/vtv_rts.cc
--- a/libvtv/vtv_rts.cc
+++ b/libvtv/vtv_rts.cc
@@ -1,5 +1,4 @@
-/* Copyright (C) 2012-2013
- Free Software Foundation
+/* 

Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran

2015-08-26 Thread Hans-Peter Nilsson
 From: Ulrich Weigand uweig...@de.ibm.com
 Date: Wed, 26 Aug 2015 13:45:35 +0200

 Hans-Peter Nilsson wrote:
   From: Ulrich Weigand uweig...@de.ibm.com
   Date: Tue, 25 Aug 2015 19:45:06 +0200
  
   However, neither works for the SPU, because in both cases libtool
   will only do the test whether the target supports the -fPIC option.
   It will not test whether the target supports dynamic libraries.
   
   [ It will do that test; and default to --disable-shared on SPU.
   That is a no-op for libbacktrace however, since it calls LT_INIT
   with the disable-shared option anyway.
  
  Maybe it shouldn't?
 
 Huh?  We do want libbacktrace solely as static library, that's the
 whole point ...

I meant that as a *suggestion for a possible workaround* to stop
libtool from refusing to compile with PIC, but then I take it
you don't need hints to try another angle than adjusting
compilation flags.

When adding back the -fPIC
   flag due to either the pic-only LT_INIT option or the -prefer-pic
   libtool command line option, it does not check for that again.  ]
  
  Sounds like a bug somewhere, in libtool or its current use:
  there *should* be a way to specify I'd prefer PIC code in these
  static libraries.
 
 But that's what the option *does*.
 
 Let me try again, maybe we can reduce confusion a bit :-)

I don't feel very confused, but I understand you've investigated
things down to a point where we can conclude that libtool can't
do what SPU needs without also at least fiddling with
compilation options.

 I guess we can always fall back to just hard-coding SPU once
 more; that's certainly the simplest solution right now.

Maybe.

brgds, H-P


[boehm-gc] Avoid unstructured procfs on Solaris

2015-08-26 Thread Rainer Orth
boehm-gc doesn't currently build on Solaris 12 since that release
finally removed the old unstructured /proc, thus the PIOCOPENPD ioctl.
This is already mentioned in the Solaris 11 EOF list:


http://www.oracle.com/technetwork/systems/end-of-notices/eonsolaris11-392732.html

Since the replacement (using /proc/pid/pagedata directly) has been
available since Solaris 2.6 in 1997, there's no need to retain the old
code, especially given that mainline only supports Solaris 10 and up.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and
sparc-sun-solaris2.1[12], will install on mainline.  Will backport to
the gcc 5 branch after some soak time.

Rainer


2015-02-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

* os_dep.c [GC_SOLARIS_THREADS] (GC_dirty_init): Use
/proc/pid/pagedata instead of PIOCOPENPD.

# HG changeset patch
# Parent 819be80e1b9c7e840fe5d232d64cf106869a933d
Avoid unstructured procfs on Solaris 12+

diff --git a/boehm-gc/os_dep.c b/boehm-gc/os_dep.c
--- a/boehm-gc/os_dep.c
+++ b/boehm-gc/os_dep.c
@@ -3184,13 +3184,11 @@ void GC_dirty_init()
 		  		(GC_words_allocd + GC_words_allocd_before_gc));
 #	endif   
 }
-sprintf(buf, /proc/%d, getpid());
-fd = open(buf, O_RDONLY);
-if (fd < 0) {
+sprintf(buf, /proc/%d/pagedata, getpid());
+GC_proc_fd = open(buf, O_RDONLY);
+if (GC_proc_fd < 0) {
 	ABORT(/proc open failed);
 }
-GC_proc_fd = syscall(SYS_ioctl, fd, PIOCOPENPD, 0);
-close(fd);
 syscall(SYS_fcntl, GC_proc_fd, F_SETFD, FD_CLOEXEC);
 if (GC_proc_fd < 0) {
 	ABORT(/proc ioctl failed);


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Remove reference to undefined documentation node.

2015-08-26 Thread Dominik Vogt
On Wed, Aug 26, 2015 at 11:05:09AM +0100, Dominik Vogt wrote:
 This patch removes a menu entry that points to an undefined node
 in the documentation.  The faulty entry has been introduced with
 git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96.  It
 looks like the entry is a remnant of an earlier version of the
 documentation introduced with that change.

Sorry, this patch is not good.  Please ignore; I'll look for a
different way to fix the warning.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran

2015-08-26 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:
  From: Ulrich Weigand uweig...@de.ibm.com
  Date: Tue, 25 Aug 2015 19:45:06 +0200
 
  However, neither works for the SPU, because in both cases libtool
  will only do the test whether the target supports the -fPIC option.
  It will not test whether the target supports dynamic libraries.
  
  [ It will do that test; and default to --disable-shared on SPU.
  That is a no-op for libbacktrace however, since it calls LT_INIT
  with the disable-shared option anyway.
 
 Maybe it shouldn't?

Huh?  We do want libbacktrace solely as static library, that's the
whole point ...

   When adding back the -fPIC
  flag due to either the pic-only LT_INIT option or the -prefer-pic
  libtool command line option, it does not check for that again.  ]
 
 Sounds like a bug somewhere, in libtool or its current use:
 there *should* be a way to specify I'd prefer PIC code in these
 static libraries.

But that's what the option *does*.

Let me try again, maybe we can reduce confusion a bit :-)

We've been discussing three potential sets of options to use with
the LT_INIT call here.   Those are:

A) LT_INIT# no options
   Build both a static and a shared library.  If the target does not
   support shared libraries, build the static library only.  The code
   landing in the static library is built without -fPIC; code for the
   shared library is built with -fPIC (or the appropriate target flag).

B) LT_INIT([disable-shared])
   Build *solely* a static library.  Code is compiled without -fPIC.

C) LT_INIT([disable-shared,pic-only])
   Build solely a static library, but compile code with -fPIC or the
   appropriate target flag (may be none if the target does not support
   -fPIC).

[Note that in all cases, behaviour can be overridden via configure
options like --enable/disable-shared and --enable/disable-static.]

As I understand it, we deliberately do not use option A.  As the comment
in the libbacktrace configure.ac says:
 # When building as a target library, shared libraries may want to link
 # this in.  We don't want to provide another shared library to
 # complicate dependencies.  Instead, we just compile with -fPIC.

That's why libbacktrace currently uses option B and manually adds a
-fPIC flag.  Now, after the latest check-in, the behaviour is mostly
equivalent to using option C (and not manually changing PIC flags).

However, none of the options do exactly what would be right for
the SPU, which would be:

  Build solely a static library, using code that is compiled so that
  it can be linked as part of a second library (static or shared).

This is equivalent to:

  Build solely a static library, but compile code with -fPIC or the
  appropriate target flag *if the target supports shared libraries*.

This again is *mostly* equivalent to option C, *except* on targets
that support -fPIC but do not support shared libraries.

I'm not sure if it is worthwhile to try and change libtool to
support targets with that property (e.g. adding a new LT_INIT
option), if this in practice only affects SPU.

 But, I'll have to leave solving this PIC-failing-at-linkage
 problem to you; I committed the current approved fix for
 PIC-failing-at-compilation.

I guess we can always fall back to just hard-coding SPU once
more; that's certainly the simplest solution right now.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [AArch64][TLSLE][1/3] Add the option -mtls-size for AArch64

2015-08-26 Thread Marcus Shawcroft
On 25 August 2015 at 15:15, Jiong Wang jiong.w...@arm.com wrote:

 2015-08-25  Jiong Wang  jiong.w...@arm.com

 gcc/
   * config/aarch64/aarch64.opt (mtls-size): New entry.
   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
   (aarch64_override_options_internal): Call initialize_aarch64_tls_size.
   * doc/invoke.texi (AArch64 Options): Document -mtls-size.


OK Thanks /Marcus


Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-26 Thread Oleg Endo

On 19 Aug 2015, at 22:35, Jeff Law l...@redhat.com wrote:

 On 08/19/2015 06:29 AM, David Sherwood wrote:
 I asked Richard S. to give this a once-over which he did.  However, he
 technically can't approve due to the way his maintainership position was
 worded.
 
 The one request would be a function comment for emit_mode_unit_size and
 emit_mode_unit_precision.  OK with that change.
 Thanks. Here's a new patch with the comments added.
 
 Good to go?
 David.
 
 ChangeLog:
 
 2015-08-19  David Sherwood  david.sherw...@arm.com
 
  gcc/
  * genmodes.c (emit_mode_unit_size_inline): New function.
  (emit_mode_unit_precision_inline): New function.
  (emit_insn_modes_h): Emit new #define.  Emit new functions.
  (emit_mode_unit_size): New function.
  (emit_mode_unit_precision): New function.
  (emit_mode_adjustments): Add mode_unit_size adjustments.
  (emit_insn_modes_c): Emit new arrays.
  * machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
  use new inline methods.
 
 Thanks, this is OK for the trunk.

It seems this broke sh-elf, at least when compiling on OSX with its native 
clang.

../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 'mode_unit_size' 
with a different type:
  'const unsigned char [56]' vs 'unsigned char [56]'
extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
  ^
./insn-modes.h:417:24: note: previous definition is here
  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];
   ^
Cheers,
Oleg

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 3:35 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
 On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
   AVX-512 is such target. Current representation forces multiple scalar
   mask - vector mask and back transformations which are artificially
   introduced by current bool patterns and are hard to optimize out.
 
  I dislike the bool patterns anyway and we should try to remove those
  and make the vectorizer handle them in other ways (they have single-use
  issues anyway).  I don't remember exactly what caused us to add them
  but one reason was there wasn't a vector type for 'bool' (but I don't see 
  how
  it should be necessary to ask get me a vector type for 'bool').
 
  That was just one of the reasons.  The other reason is that even if we 
  would
  choose some vector of integer type as vector of bool, the question is what
  type.  E.g. if you use vector of chars, you almost always get terrible
  vectorized code, except for the AVX-512 you really want an integral type
  that has the size of the types you are comparing.

 Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
 first compute the vector type for the comparison itself (which is fixed) and
 thus we can compute the vector type of any bitwise op on it as well.

 Sure, but if you then immediately vector narrow it to a V*QI vector because
 it is stored originally into a bool/_Bool variable, and then again when it
 is used in say a COND_EXPR widen it again, you get really poor code.
 So, what the bool pattern code does is kind of poor man's type
 promotion/demotion pass for bool only, at least for the common cases.

Yeah, I just looked at the code but in the end everything should be fixable
in the place we compute STMT_VINFO_VECTYPE.  The code just
looks at the LHS type plus at the narrowest type (for vectorization factor).
It should get re-structured to get the vector types from the operands
(much like code-generation will eventually fall back to).

 PR50596 has been the primary reason to introduce the bool patterns.
 If there is a better type promotion/demotion pass on a copy of the loop,
 sure, we can get rid of it (but figure out also what to do for SLP).

Yeah, of course.  Basic-block SLP just asks for the vectype during SLP
analysis AFAIK.

I suppose we want sth like get_result_vectype (gimple) which can look
at operands as well and can be used from both places.

After all we do want to fix the non-single-use issue somehow and getting
rid of the patterns sounds good to me anyway...

Not sure if I can get to the above for GCC 6, but at least putting it on my
TODO...

Richard.

 Jakub


[PATCH][AArch64 array_mode 8/8] Add d-registers to TARGET_ARRAY_MODE_SUPPORTED_P

2015-08-26 Thread Alan Lawrence
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro, and as a driveby fixes mode->(MODE) in the latter.

The new test now compiles (at -O3) to:

test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
add v0.2s, v0.2s, v4.2s
ret

Whereas prior to this patch we got:

test_1:
add v0.2s, v0.2s, v4.2s
sub sp, sp, #160
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
str d0, [sp, 96]
str d1, [sp, 104]
str d2, [sp, 112]
str d3, [sp, 120]
ldp x2, x3, [sp, 96]
stp x2, x3, [sp, 128]
ldp x0, x1, [sp, 112]
stp x0, x1, [sp, 144]
ldr d1, [sp, 136]
ldr d0, [sp, 128]
ldr d2, [sp, 144]
ldr d3, [sp, 152]
add sp, sp, 160
ret

I've tried to look for (the absence of) this extra code in a number of ways,
all 3 scan...not's were previously failing (i.e. regex's were matching) but now
pass.

bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.h (AARCH64_VALID_SIMD_DREG_MODE): New.
(AARCH64_VALID_SIMD_QREG_MODE): Correct mode->(MODE).

* config/aarch64/aarch64.c (aarch64_array_mode_supported_p): Add
AARCH64_VALID_SIMD_DREG_MODE.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-int32x2x4_1.c: New.
---
 gcc/config/aarch64/aarch64.c   |  3 ++-
 gcc/config/aarch64/aarch64.h   |  7 ++-
 .../gcc.target/aarch64/vect-int32x2x4_1.c  | 22 ++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a923b55..d2ea7f6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -650,7 +650,8 @@ aarch64_array_mode_supported_p (machine_mode mode,
unsigned HOST_WIDE_INT nelems)
 {
   if (TARGET_SIMD
-       && AARCH64_VALID_SIMD_QREG_MODE (mode)
+       && (AARCH64_VALID_SIMD_QREG_MODE (mode)
+	   || AARCH64_VALID_SIMD_DREG_MODE (mode))
        && (nelems >= 2 && nelems <= 4))
 return true;
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3851564..d1ba00b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -915,10 +915,15 @@ extern enum aarch64_code_model aarch64_cmodel;
   (aarch64_cmodel == AARCH64_CMODEL_TINY   \
|| aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
 
+/* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register.  */
+#define AARCH64_VALID_SIMD_DREG_MODE(MODE) \
+  ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
+   || (MODE) == V2SFmode || (MODE) == DImode || (MODE) == DFmode)
+
 /* Modes valid for AdvSIMD Q registers.  */
 #define AARCH64_VALID_SIMD_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode || mode == V2DFmode)
+   || (MODE) == V4SFmode || (MODE) == V2DImode || (MODE) == V2DFmode)
 
 #define ENDIAN_LANE_N(mode, n)  \
   (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c 
b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
new file mode 100644
index 000..734cfd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-expand" } */
+
+#include <arm_neon.h>
+
+uint32x2x4_t
+test_1 (uint32x2x4_t a, uint32x2x4_t b)
+{
+   uint32x2x4_t result;
+
+   for (unsigned index = 0; index < 4; ++index)
+ result.val[index] = a.val[index] + b.val[index];
+
+   return result;
+}
+
+/* Should not use the stack in expand.  */
+/* { dg-final { scan-rtl-dump-not "virtual-stack-vars" "expand" } } */
+/* Should not have to modify the stack pointer.  */
+/* { dg-final { scan-assembler-not "\t(add|sub).*sp" } } */
+/* Should not have to store or load anything.  */
+/* { dg-final { scan-assembler-not "\t(ld|st)\[rp\]" } } */
-- 
1.8.3



Re: [libvtv] Update copyrights

2015-08-26 Thread Joseph Myers
On Wed, 26 Aug 2015, Rainer Orth wrote:

 While working on the Solaris libvtv port, I noticed that many of the
 libvtv copyright years hadn't been updated, were misformatted, or both.
 It turns out that libvtv isn't listed in contrib/update-copyright.py at
 all.  This patch fixes this and includes the result of running
 update-copyright.py --this-year libvtv.
 
 I've neither added libvtv to self.default_dirs in the script nor added
 copyrights to the numerous files in libvtv that currently lack one.
 
 Ok for mainline once it has survived regtesting?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Ilya Enkovich
2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com:

 Hmm, I don't see how vector masks are more difficult to operate with.

 There are just no instructions for that but you have to pretend you
  have them to get code vectorized.

 Huh?  Bitwise ops should be readily available.

Right, bitwise ops are available, but there is no comparison into a
vector and no masked loads and stores using vector masks (when we
speak about 512-bit vectors).
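
To make this concrete, a minimal AVX-512F intrinsics sketch (a hand-written
illustration, not code from the patch series): the compare produces a scalar
mask register value, and the masked load/store consume it directly.

#include <immintrin.h>

/* Compile with -mavx512f.  The compare yields a scalar __mmask16, not a
   vector of -1/0 lanes; bitwise ops on masks are plain integer ops.  */
void
add_if_positive (int *a, const int *b)
{
  __m512i vb = _mm512_loadu_si512 (b);
  __mmask16 m = _mm512_cmpgt_epi32_mask (vb, _mm512_setzero_si512 ());
  __m512i va = _mm512_mask_loadu_epi32 (_mm512_setzero_si512 (), m, a);
  _mm512_mask_storeu_epi32 (a, m, _mm512_add_epi32 (va, vb));
}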



 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.

 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?

 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.

 How do you declare those?

Something like this:

#pragma omp declare simd inbranch
int foo(int*);
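
Expanded slightly (a hand-written sketch, names hypothetical): the inbranch
clause means the vector variant of foo takes an extra mask argument, fed by
the lanes' predicate; for 512-bit vectors the ABI passes it as an integer.

/* Compile with -fopenmp (or -fopenmp-simd).  */
#pragma omp declare simd inbranch
int foo (int *p);

void
bar (int *out, int *in, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    if (in[i] > 0)            /* the lane predicate becomes the mask arg */
      out[i] = foo (&in[i]);  /* calls the masked (inbranch) variant */
}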



 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?

 No idea - we'll revisit when another targets adds a similar capability.

 AVX-512 is such target. Current representation forces multiple scalar
 mask -> vector mask and back transformations which are artificially
 introduced by current bool patterns and are hard to optimize out.

 I dislike the bool patterns anyway and we should try to remove those
 and make the vectorizer handle them in other ways (they have single-use
 issues anyway).  I don't remember exactly what caused us to add them
 but one reason was there wasn't a vector type for 'bool' (but I don't see how
 it should be necessary to ask "get me a vector type for 'bool'").


 Using scalar masks everywhere should probably cause the same conversion
 problem for SSE I listed above though.

 Talking about a canonical representation, shouldn't we use some
 special masks representation and not mixing it with integer and vector
 of integers then? Only in this case target would be able to
 efficiently expand it into a corresponding rtl.

 That was my idea of vectorbool ... but I didn't explore it and see where
 it will cause issues.

 Fact is GCC already copes with vector masks generated by vector compares
 just fine everywhere and I'd rather leave it as that.

 Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
 additional vec_cond. I don't think vectorizer ever generates vector
 comparison.

 Ok, well that's an implementation detail then.  Are you sure about AND and 
 IOR?
 The comment above vect_recog_bool_pattern says

 Assuming size of TYPE is the same as size of all comparisons
 (otherwise some casts would be added where needed), the above
 sequence we create related pattern stmts:
 S1'  a_T = x1 CMP1 y1 ? 1 : 0;
 S3'  c_T = x2 CMP2 y2 ? a_T : 0;
 S4'  d_T = x3 CMP3 y3 ? 1 : 0;
 S5'  e_T = c_T | d_T;
 S6'  f_T = e_T;

 thus has vector mask |

I think in practice it would look like:

S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

Thus everything is usually hidden in vec_cond. But my concern is
mostly about types used for that.


 And I wouldn't say it's fine 'everywhere' because there is a single
 target utilizing them. Masked loads and stored for AVX-512 just don't
 work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
 512-bit vector then we get an ugly inefficient code. The question is
 where to fight with this inefficiency: in RTL or in GIMPLE. I want to
 fight with it where it appears, i.e. in GIMPLE by preventing bool -
 int conversions applied everywhere even if target doesn't need it.

 If we don't want to support both types of masks in GIMPLE then it's
 more reasonable to make bool - int conversion in expand for targets
 requiring it, rather than do it for everyone and then leave it to
 target to transform it back and try to get rid of all those redundant
 transformations. I'd give vectorbool a chance to become a canonical
 mask representation for that.

 Well, you are missing the case of

bool b = a < b;
int x = (int)b;

This case seems to require no changes and would just be transformed into a
vec_cond.

Thanks,
Ilya


 where the bool is used as integer (and thus an integer mask would have to be
 expanded).  When the bool is a mask in itself the integer use is either free
 or a matter of a widening/shortening operation.

 Richard.



[v3 patch] Only set std::enable_shared_from_this member once.

2015-08-26 Thread Jonathan Wakely

This adds a check to weak_ptr::_M_assign() so that calling
__enable_shared_from_this_helper twice with the same pointer won't
change which shared_ptr object the weak_ptr shares ownership with.

On the lib reflector Peter Dimov convinced me that the
boost::enable_shared_from_this behaviour is preferable to what we do
now. I'm writing a proposal to specify this in the standard, but am
changing it now in our implementation.
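
A minimal sketch of the new semantics (hand-written, not from the testsuite):
once a shared_ptr owns the object, a later owner group created from the same
raw pointer no longer rebinds the embedded weak_ptr.

#include <memory>
#include <cassert>

struct X : std::enable_shared_from_this<X> { };

int main()
{
  X* x = new X;
  std::shared_ptr<X> p1(x, [](X*) { });  // first owner group, no-op deleter
  std::shared_ptr<X> p2(x);              // second owner group, same object
  // shared_from_this() still tracks the first group (p1); previously it
  // would have been rebound to p2.
  std::shared_ptr<X> q = x->shared_from_this();
  assert(!q.owner_before(p1) && !p1.owner_before(q));
}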

Tested powerpc64le-linux, committing to trunk.
commit a1cd60820fb1af7f3396ff4b28e0e1d3449bfacb
Author: Jonathan Wakely jwak...@redhat.com
Date:   Tue Aug 25 17:10:36 2015 +0100

Only set std::enable_shared_from_this member once.

	* include/bits/shared_ptr.h (__enable_shared_from_this_helper): Use
	nullptr.
	* include/bits/shared_ptr_base.h (weak_ptr::_M_assign): Don't assign
	if ownership is already shared with a shared_ptr object.
	(__enable_shared_from_this_helper): Use nullptr.
	* testsuite/20_util/enable_shared_from_this/members/const.cc: New.
	* testsuite/20_util/enable_shared_from_this/members/reinit.cc: New.
	* testsuite/20_util/enable_shared_from_this/requirements/
	explicit_instantiation.cc: Instantiate with const and incomplete types.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index f96c078..2413b1b 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -588,7 +588,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	 const enable_shared_from_this* __pe,
 	 const _Tp1* __px) noexcept
 	{
-	  if (__pe != 0)
+	  if (__pe != nullptr)
 	    __pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
 	}
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index aec10fe..820edcb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1468,8 +1468,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
    _M_assign(_Tp* __ptr, const __shared_count<_Lp>& __refcount) noexcept
   {
-	_M_ptr = __ptr;
-	_M_refcount = __refcount;
+	if (use_count() == 0)
+	  {
+	_M_ptr = __ptr;
+	_M_refcount = __refcount;
+	  }
   }
 
    template<typename _Tp1, _Lock_policy _Lp1> friend class __shared_ptr;
@@ -1549,7 +1552,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	 const __enable_shared_from_this* __pe,
 	 const _Tp1* __px) noexcept
 	{
-	  if (__pe != 0)
+	  if (__pe != nullptr)
 	    __pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
 	}
 
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
new file mode 100644
index 000..fdf39c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
@@ -0,0 +1,60 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+template<typename T, typename U>
+  bool
+  share_ownership(const std::shared_ptr<T>& p1, const std::shared_ptr<U>& p2)
+  {
+    return !p1.owner_before(p2) && !p2.owner_before(p1);
+  }
+
+void
+test01()
+{
+  struct X : public std::enable_shared_from_this<X> { };
+  using CX = const X;
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  auto p2 = std::const_pointer_cast<X>(p)->shared_from_this();
+  VERIFY( share_ownership(p2, p) );
+}
+
+void
+test02()
+{
+  struct X;
+  using CX = const X;
+  struct X : public std::enable_shared_from_this<CX> { };
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
new file mode 100644
index 000..3740db8
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This 

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2015-08-21 11:15 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law l...@redhat.com wrote:
 On 08/17/2015 10:25 AM, Ilya Enkovich wrote:

 Hi,

 This patch intoriduces a new vectorizer hook use_scalar_mask_p which
 affects code generated by if-conversion pass (and affects patterns in 
 later
 patches).

 Thanks,
 Ilya
 --
 2015-08-17  Ilya Enkovich  enkovich@gmail.com

 * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New.
 * doc/tm.texi.in: Regenerated.
 * target.def (use_scalar_mask_p): New.
 * tree-if-conv.c: Include target.h.
 (predicate_mem_writes): Don't convert boolean predicates into
 integer when scalar masks are used.

 Presumably this is how you prevent the generation of scalar masks rather
 than boolean masks on targets which don't have the former?

 I hate to ask, but how painful would it be to go from a boolean to integer
 masks later such as during expansion?  Or vice-versa.

  Without a deep knowledge of the entire patchkit, it feels like we're
  introducing target stuff in a place where we don't want it and that we'd
  be better served with a canonical representation through gimple, then
  dropping into something more target specific during gimple-rtl expansion.

 I want a work with bitmasks to be expressed in a natural way using
 regular integer operations. Currently all masks manipulations are
 emulated via vector statements (mostly using a bunch of vec_cond). For
 complex predicates it may be nontrivial to transform it back to scalar
 masks and get an efficient code. Also the same vector may be used as
 both a mask and an integer vector. Things become more complex if you
 additionally have broadcasts and vector pack/unpack code. It also
 should be transformed into a scalar masks manipulations somehow.

 Hmm, I don't see how vector masks are more difficult to operate with.

 There are just no instructions for that but you have to pretend you
 have them to get code vectorized.

Huh?  Bitwise ops should be readily available.


 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.

 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?

 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.

How do you declare those?


 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?

 No idea - we'll revisit when another targets adds a similar capability.

 AVX-512 is such target. Current representation forces multiple scalar
 mask -> vector mask and back transformations which are artificially
 introduced by current bool patterns and are hard to optimize out.

I dislike the bool patterns anyway and we should try to remove those
and make the vectorizer handle them in other ways (they have single-use
issues anyway).  I don't remember exactly what caused us to add them
but one reason was there wasn't a vector type for 'bool' (but I don't see how
it should be necessary to ask "get me a vector type for 'bool'").


 Using scalar masks everywhere should probably cause the same conversion
 problem for SSE I listed above though.

 Talking about a canonical representation, shouldn't we use some
 special masks representation and not mixing it with integer and vector
 of integers then? Only in this case target would be able to
 efficiently expand it into a corresponding rtl.

 That was my idea of vectorbool ... but I didn't explore it and see where
 it will cause issues.

 Fact is GCC already copes with vector masks generated by vector compares
 just fine everywhere and I'd rather leave it as that.

 Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
 additional vec_cond. I don't think vectorizer ever generates vector
 comparison.

Ok, well that's an implementation detail then.  Are you sure about AND and IOR?
The comment above vect_recog_bool_pattern says

Assuming size of TYPE is the same as size of all comparisons
(otherwise some casts would be added where needed), the above
sequence we create related pattern stmts:
S1'  a_T = x1 CMP1 y1 ? 1 : 0;
S3'  c_T = x2 CMP2 y2 ? a_T : 0;
S4'  d_T = x3 CMP3 y3 ? 1 : 0;
S5'  e_T = c_T | d_T;
S6'  f_T = e_T;

thus has vector mask |

 And I wouldn't say it's fine 'everywhere' because there is a single
 target utilizing them. Masked loads and stored for AVX-512 just don't
 work now. And if we extend existing MASK_LOAD and MASK_STORE 

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com:
 On 08/21/2015 04:49 AM, Ilya Enkovich wrote:


 I want a work with bitmasks to be expressed in a natural way using
 regular integer operations. Currently all masks manipulations are
 emulated via vector statements (mostly using a bunch of vec_cond). For
 complex predicates it may be nontrivial to transform it back to scalar
 masks and get an efficient code. Also the same vector may be used as
 both a mask and an integer vector. Things become more complex if you
 additionally have broadcasts and vector pack/unpack code. It also
 should be transformed into a scalar masks manipulations somehow.

 Or why not model the conversion at the gimple level using a CONVERT_EXPR?
 In fact, the more I think about it, that seems to make more sense to me.

 We pick a canonical form for the mask, whatever it may be.  We use that
 canonical form and model conversions between it and the other form via
 CONVERT_EXPR.  We then let DOM/PRE find/eliminate the redundant conversions.
 If it's not up to the task, we should really look into why and resolve.

 Yes, that does mean we have two forms which I'm not terribly happy about and
 it means some target dependencies on what the masked vector operation looks
 like (ie, does it accept a simple integer or vector mask), but I'm starting
 to wonder if, as distasteful as I find it, it's the right thing to do.

 If we have some special representation for masks in GIMPLE then we
 might not need any conversions. We could ask a target to define a MODE
 for this type and use it directly everywhere: directly compare into
 it, use it directly for masked loads and stores, AND, IOR, EQ etc. If
 that type is reserved for masks usage then you previous suggestion to
 transform masks into target specific form at GIMPLE-RTL phase should
 work fine. This would allow to support only a single masks
 representation in GIMPLE.

But we can already do all this with the integer vector masks we have.
If you think that the vectorizer generated

  mask = VEC_COND <v1 < v2 ? { -1,...} : { 0, ...}>

is ugly then we can remove that implementation detail and use

  mask = v1 < v2;

directly.  Note that the VEC_COND form was invented to avoid
the need to touch RTL expansion for vector compares (IIRC).
Or it pre-dated specifying what compares generate on GIMPLE.

Richard.

 Thanks,
 Ilya



 But I don't like changing our IL so much as to allow 'integer' masks
 everywhere.

 I'm warming up to that idea...

 jeff



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
  AVX-512 is such target. Current representation forces multiple scalar
  mask -> vector mask and back transformations which are artificially
  introduced by current bool patterns and are hard to optimize out.
 
 I dislike the bool patterns anyway and we should try to remove those
 and make the vectorizer handle them in other ways (they have single-use
 issues anyway).  I don't remember exactly what caused us to add them
 but one reason was there wasn't a vector type for 'bool' (but I don't see how
 it should be necessary to ask "get me a vector type for 'bool'").

That was just one of the reasons.  The other reason is that even if we would
choose some vector of integer type as vector of bool, the question is what
type.  E.g. if you use vector of chars, you almost always get terrible
vectorized code, except for the AVX-512 you really want an integral type
that has the size of the types you are comparing.
And I'd say this is very much related to the need to do some type promotions
or demotions on the scalar code meant to be vectorized (but only the copy
for vectorizations), so that we have as few different scalar type sizes in
the loop as possible, because widening / narrowing vector conversions aren't
exactly cheap and a single char operation in a loop otherwise full of long
long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
vf=16 (or 32 or 64), increasing it a lot.
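
For illustration, a hand-written sketch of the situation described:

void
f (long long *a, const long long *b, char *c, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + 1;  /* wants vf = 2 with 128-bit vectors */
      c[i] += 1;        /* alone would want vf = 16 */
    }
}

Without promoting the char operation, the whole loop ends up vectorized at
the factor the narrow type forces.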

Jakub


[libvtv] Fix formatting errors

2015-08-26 Thread Rainer Orth
While looking at libvtv for the Solaris port, I noticed all sorts of GNU
Coding Standard violations:

* ChangeLog entries attributed to the committer instead of the author
  and with misformatted PR references, entries only giving a vague
  rationale instead of what changed

* overlong lines

* tons of whitespace errors (though I may be wrong in some cases: C++
  code might have other rules)

* code formatting that seems to have been done to be visually pleasing,
  completely different from what Emacs does

* commented code fragments (#if 0 equivalent)

* configure.tgt target list in no recognizable order

* the Cygwin/MinGW port is done in the worst possible way: tons of
  target-specific ifdefs instead of feature-specific conditionals or an
  interface that can wrap both Cygwin and Linux variants of the code

The following patch (as yet not even compiled) fixes some of the most
glaring errors.  The Solaris port will fix a few of the latter ones.

Do you think this is the right direction or did I get something wrong?

Thanks.
Rainer


2015-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

Fix formatting errors.

# HG changeset patch
# Parent 6459822b8e6fa7647ad0d12ffb6f3da7bd0c5db2
Fix formatting errors

diff --git a/libvtv/ChangeLog b/libvtv/ChangeLog
--- a/libvtv/ChangeLog
+++ b/libvtv/ChangeLog
@@ -1,6 +1,6 @@
-2015-08-01  Caroline Tice  cmt...@google.com
+2015-08-01  Eric Gallager  eg...@gwmail.gwu.edu
 
-	PR 66521
+	PR bootstrap/66521
 	* Makefile.am:  Update to match latest tree.
 	* Makefile.in: Regenerate.
 	* testsuite/lib/libvtv: Brought up to date.
@@ -24,15 +24,13 @@ 2015-02-09  Thomas Schwinge  thomas@cod
 	* configure: Likewise.
 	* testsuite/Makefile.in: Likewise.
 
-2015-01-29  Caroline Tice  cmt...@google.com
+2015-01-29  Patrick Wollgast  patrick.wollg...@rub.de
 
-	Committing VTV Cywin/Ming patch for Patrick Wollgast
 	* libvtv/Makefile.in : Regenerate.
 	* libvtv/configure : Regenerate.
 
-2015-01-28  Caroline Tice  cmt...@google.com
+2015-01-28  Patrick Wollgast  patrick.wollg...@rub.de
 
-	Committing VTV Cywin/Ming patch for Patrick Wollgast
 	* libvtv/Makefile.am : Add libvtv.la to toolexeclib_LTLIBRARIES, if
 	VTV_CYGMIN is set. Define libvtv_la_LIBADD, libvtv_la_LDFLAGS,
 	libvtv_stubs_la_LDFLAGS and libvtv_stubs_la_SOURCES if VTV_CYGMIN is
diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc
--- a/libvtv/vtv_fail.cc
+++ b/libvtv/vtv_fail.cc
@@ -38,9 +38,7 @@
desired.  This may be the case if the programmer has to deal wtih
unverified third party software, for example.  __vtv_really_fail is
available for the programmer to call from his version of
-   __vtv_verify_fail, if he decides the failure is real.
-
-*/
+   __vtv_verify_fail, if he decides the failure is real.  */
 
 #include stdlib.h
 #include stdio.h
@@ -80,8 +78,8 @@ const unsigned long SET_HANDLE_HANDLE_BI
 
 /* Instantiate the template classes (in vtv_set.h) for our particular
hash table needs.  */
-typedef void * vtv_set_handle;
-typedef vtv_set_handle * vtv_set_handle_handle; 
+typedef void *vtv_set_handle;
+typedef vtv_set_handle *vtv_set_handle_handle; 
 
 static int vtv_failures_log_fd = -1;
 
@@ -121,17 +119,16 @@ log_error_message (const char *log_msg, 
variable.  */
 
 static inline bool
-is_set_handle_handle (void * ptr)
+is_set_handle_handle (void *ptr)
 {
-  return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT)
-  == SET_HANDLE_HANDLE_BIT;
+  return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT) == SET_HANDLE_HANDLE_BIT;
 }
 
 /* Returns the actual pointer value of a vtable map variable, PTR (see
comments for is_set_handle_handle for more details).  */
 
 static inline vtv_set_handle * 
-ptr_from_set_handle_handle (void * ptr)
+ptr_from_set_handle_handle (void *ptr)
 {
   return (vtv_set_handle *) ((unsigned long) ptr & ~SET_HANDLE_HANDLE_BIT);
 }
@@ -141,7 +138,7 @@ ptr_from_set_handle_handle (void * ptr)
variable.  */
 
 static inline vtv_set_handle_handle
-set_handle_handle (vtv_set_handle * ptr)
+set_handle_handle (vtv_set_handle *ptr)
 {
   return (vtv_set_handle_handle) ((unsigned long) ptr | SET_HANDLE_HANDLE_BIT);
 }
@@ -151,7 +148,7 @@ set_handle_handle (vtv_set_handle * ptr)
file, then calls __vtv_verify_fail.  SET_HANDLE_PTR is the pointer
to the set of valid vtable pointers, VTBL_PTR is the pointer that
was not found in the set, and DEBUG_MSG is the message to be
-   written to the log file before failing. n */
+   written to the log file before failing.  */
 
 void
 __vtv_verify_fail_debug (void **set_handle_ptr, const void *vtbl_ptr, 
@@ -197,9 +194,9 @@ vtv_fail (const char *msg, void **data_s
    "*** Unable to verify vtable pointer (%p) in set (%p) *** \n";
 
   snprintf (buffer, sizeof (buffer), format_str, vtbl_ptr,
-is_set_handle_handle(*data_set_ptr) ?
-  ptr_from_set_handle_handle (*data_set_ptr) :
-	  *data_set_ptr);
+

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote:
 On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
   AVX-512 is such target. Current representation forces multiple scalar
   mask -> vector mask and back transformations which are artificially
   introduced by current bool patterns and are hard to optimize out.
 
  I dislike the bool patterns anyway and we should try to remove those
  and make the vectorizer handle them in other ways (they have single-use
  issues anyway).  I don't remember exactly what caused us to add them
  but one reason was there wasn't a vector type for 'bool' (but I don't see
  how it should be necessary to ask "get me a vector type for 'bool'").
 
  That was just one of the reasons.  The other reason is that even if we would
  choose some vector of integer type as vector of bool, the question is what
  type.  E.g. if you use vector of chars, you almost always get terrible
  vectorized code, except for the AVX-512 you really want an integral type
  that has the size of the types you are comparing.
 
 Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
 first compute the vector type for the comparison itself (which is fixed) and
 thus we can compute the vector type of any bitwise op on it as well.

Sure, but if you then immediately vector narrow it to a V*QI vector because
it is stored originally into a bool/_Bool variable, and then again when it
is used in say a COND_EXPR widen it again, you get really poor code.
So, what the bool pattern code does is kind of poor man's type
promotion/demotion pass for bool only, at least for the common cases.

PR50596 has been the primary reason to introduce the bool patterns.
If there is a better type promotion/demotion pass on a copy of the loop,
sure, we can get rid of it (but figure out also what to do for SLP).

Jakub


[PATCH][AArch64 array_mode 3/8] Stop using EImode in aarch64-simd.md and iterators.md

2015-08-26 Thread Alan Lawrence
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).

The patterns affected are only for intrinsics: the aarch64_ld3r<mode>
expanders and aarch64_simd_ld3r<mode> insns, and the
aarch64_vec_{load,store}_lanesci_lane<mode> insns used by the
aarch64_{ld,st}3_lane<mode> expanders.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld3r<mode>,
aarch64_vec_load_lanesci_lane<mode>,
aarch64_vec_store_lanesci_lane<mode>): Change operand mode
from V_THREE_ELEM to BLK.

(aarch64_ld3r<mode>, aarch64_ld3_lane<mode>,
aarch64_st3_lane<VQ:mode>): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_THREE_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 27 ++-
 gcc/config/aarch64/iterators.md|  8 
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 7b7a1b8..156fc4f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4001,7 +4001,7 @@
 
(define_insn aarch64_simd_ld3r<mode>
   [(set (match_operand:CI 0 register_operand =w)
-	(unspec:CI [(match_operand:V_THREE_ELEM 1 aarch64_simd_struct_operand Utv)
+   (unspec:CI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD3_DUP))]
   TARGET_SIMD
@@ -4011,7 +4011,7 @@
 
(define_insn aarch64_vec_load_lanesci_lane<mode>
   [(set (match_operand:CI 0 register_operand =w)
-	(unspec:CI [(match_operand:V_THREE_ELEM 1 aarch64_simd_struct_operand Utv)
+   (unspec:CI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(match_operand:CI 2 register_operand 0)
(match_operand:SI 3 immediate_operand i)
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
@@ -4052,11 +4052,11 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
(define_insn aarch64_vec_store_lanesci_lane<mode>
-  [(set (match_operand:V_THREE_ELEM 0 aarch64_simd_struct_operand =Utv)
-   (unspec:V_THREE_ELEM [(match_operand:CI 1 register_operand w)
-(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
-   (match_operand:SI 2 immediate_operand i)]
-   UNSPEC_ST3_LANE))]
+  [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv)
+   (unspec:BLK [(match_operand:CI 1 register_operand w)
+(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
+(match_operand:SI 2 immediate_operand i)]
+   UNSPEC_ST3_LANE))]
   TARGET_SIMD
   {
    operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
@@ -4368,8 +4368,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_THREE_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3);
 
   emit_insn (gen_aarch64_simd_ld3r<mode> (operands[0], mem));
   DONE;
@@ -4589,8 +4589,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_THREE_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode),
			     NULL);
@@ -4874,8 +4874,9 @@
   (match_operand:SI 2 immediate_operand)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_THREE_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesci_lane<VQ:mode> (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 98b6714..ae0be0b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -568,14 +568,6 @@
   (V2SF V2SF) (V4SF V2SF)
   (DF V2DI)   (V2DF V2DI)])
 
-;; Similar, for three elements.
-(define_mode_attr V_THREE_ELEM [(V8QI BLK) (V16QI BLK)
-(V4HI BLK) (V8HI BLK)
-(V2SI BLK) (V4SI BLK)
-(DI EI)(V2DI EI)
-(V2SF BLK) (V4SF BLK)
-(DF EI)(V2DF EI)])
-
 ;; Similar, for four elements.
 

[PATCH][AArch64 array_mode 5/8] Remove V_FOUR_ELEM, again using BLKmode + set_mem_size.

2015-08-26 Thread Alan Lawrence
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld4r<mode>,
aarch64_vec_load_lanesxi_lane<mode>,
aarch64_vec_store_lanesxi_lane<mode>): Change operand mode
from V_FOUR_ELEM to BLK.
(aarch64_ld4r<mode>, aarch64_ld4_lane<mode>,
aarch64_st4_lane<VQ:mode>): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_FOUR_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 25 +
 gcc/config/aarch64/iterators.md|  9 -
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 156fc4f..68182d6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4096,7 +4096,7 @@
 
(define_insn aarch64_simd_ld4r<mode>
   [(set (match_operand:XI 0 register_operand =w)
-	(unspec:XI [(match_operand:V_FOUR_ELEM 1 aarch64_simd_struct_operand Utv)
+   (unspec:XI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD4_DUP))]
   TARGET_SIMD
@@ -4106,7 +4106,7 @@
 
(define_insn aarch64_vec_load_lanesxi_lane<mode>
   [(set (match_operand:XI 0 register_operand =w)
-	(unspec:XI [(match_operand:V_FOUR_ELEM 1 aarch64_simd_struct_operand Utv)
+   (unspec:XI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(match_operand:XI 2 register_operand 0)
(match_operand:SI 3 immediate_operand i)
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
@@ -4147,10 +4147,10 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
(define_insn aarch64_vec_store_lanesxi_lane<mode>
-  [(set (match_operand:V_FOUR_ELEM 0 aarch64_simd_struct_operand =Utv)
-   (unspec:V_FOUR_ELEM [(match_operand:XI 1 register_operand w)
-(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
-   (match_operand:SI 2 immediate_operand i)]
+  [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv)
+   (unspec:BLK [(match_operand:XI 1 register_operand w)
+(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
+(match_operand:SI 2 immediate_operand i)]
UNSPEC_ST4_LANE))]
   TARGET_SIMD
   {
@@ -4381,8 +4381,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_FOUR_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4);
 
   emit_insn (gen_aarch64_simd_ld4r<mode> (operands[0],mem));
   DONE;
@@ -4609,8 +4609,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_FOUR_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode),
			     NULL);
@@ -4892,8 +4892,9 @@
   (match_operand:SI 2 immediate_operand)]
   TARGET_SIMD
 {
-  machine_mode mode = <V_FOUR_ELEM>mode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesxi_lane<VQ:mode> (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index ae0be0b..9535b7f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -568,15 +568,6 @@
   (V2SF V2SF) (V4SF V2SF)
   (DF V2DI)   (V2DF V2DI)])
 
-;; Similar, for four elements.
-(define_mode_attr V_FOUR_ELEM [(V8QI SI)   (V16QI SI)
-   (V4HI V4HI) (V8HI V4HI)
-   (V2SI V4SI) (V4SI V4SI)
-   (DI OI) (V2DI OI)
-   (V2SF V4SF) (V4SF V4SF)
-   (DF OI) (V2DF OI)])
-
-
 ;; Mode for atomic operation suffixes
 (define_mode_attr atomic_sfx
   [(QI b) (HI h) (SI ) (DI )])
-- 
1.8.3



Re: [PATCH], PowerPC IEEE 128-bit patch #6

2015-08-26 Thread David Edelsohn
On Fri, Aug 14, 2015 at 11:47 AM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:

 This is patch #6:

 2015-08-13  Michael Meissner  meiss...@linux.vnet.ibm.com

 * config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert):
 Add declaration.

 * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a
 comment.
 (rs6000_cannot_change_mode_class): Add support for IEEE 128-bit
 floating point in VSX registers.
 (rs6000_output_move_128bit): Always print out the set insn if we
 can't generate an appropriate 128-bit move.
 (rs6000_generate_compare): Add support for IEEE 128-bit floating
 point in VSX registers comparisons.
 (rs6000_expand_float128_convert): Likewise.

 * config/rs6000/rs6000.md (extenddftf2): Add support for IEEE
 128-bit floating point in VSX registers.
 (extenddftf2_internal): Likewise.
 (trunctfdf2): Likewise.
 (trunctfdf2_internal2): Likewise.
 (fix_trunc_helper): Likewise.
 (fix_trunctfdi2): Likewise.
 (floatditf2): Likewise.
(floatuns<mode>tf2): Likewise.
(extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise.
(trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise.
(fix_trunc<IFKF:mode><SDI:mode>2): Likewise.
(fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise.
(float<SDI:mode><IFKF:mode>2): Likewise.
(floatuns<SDI:mode><IFKF:mode>2): Likewise.

This patch is okay.

Thanks, David


Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:16PM +0100, Alan Lawrence wrote:
 This makes SRA replace loads of records/arrays from constant pool entries,
 with elementwise assignments of the constant values, hence, overcoming the
 fundamental problem in PR/63679.
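
 To make that concrete — a hand-written sketch, not compiler output
 (".LC0" is an illustrative constant-pool label):

 int
 foo (void)
 {
   /* On targets that spill the initializer to the constant pool, gimple
      sees roughly a block copy:  a = *.LC0;  */
   int a[4] = { 1, 2, 3, 4 };
   /* With the patch, SRA scalarizes that load elementwise, roughly:
        a$0 = 1;  a$1 = 2;  a$2 = 3;  a$3 = 4;
      which DOM/CSE can then fold away completely.  */
   return a[0] + a[3];
 }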
 
 As a first pass, the approach I took was to look for constant-pool loads as
 we scanned through other accesses, and add them as candidates there; to build
 a constant replacement_decl for any such accesses in completely_scalarize;
 and to use any existing replacement_decl rather than creating a variable in
 create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter,
 but that does not allow addresses of labels, which can still end up in the
 constant pool.)
 
 Feedback as to the approach or how it might be better structured / fitted into
 SRA, is solicited ;).

I'm not familiar with constant pools very much, but I'll try:

 
 Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
 arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
 increases the number of testcases in which this code is exercised!
 
 Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using 
 a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, 
 avr, and sh.
 
 gcc/ChangeLog:
 
   * tree-sra.c (create_access): Scan for uses of constant pool and add
   to candidates.
   (subst_initial): New.
   (scalarize_elem): Build replacement_decl using subst_initial.
   (create_access_replacement): Use replacement_decl if set.
 
 gcc/testsuite/ChangeLog:
 
   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
   sra-max-scalarization-size-Ospeed.
 ---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
  gcc/tree-sra.c| 56 
 +--
  2 files changed, 55 insertions(+), 8 deletions(-)
 
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 index 9eccdc9..b13d583 100644
 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 @@ -1,5 +1,5 @@
  /* { dg-do compile } */
 -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
 +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
  
  int
  foo ()
 @@ -17,7 +17,4 @@ foo ()
  /* After late unrolling the above loop completely DOM should be
 able to optimize this to return 28.  */
  
 -/* See PR63679 and PR64159, if the target forces the initializer to memory then
 -   DOM is not able to perform this optimization.  */
 
 -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
 +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index af35fcc..a3ff2df 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
else
  ptr = false;
  
 +  /* FORNOW: scan for uses of constant pool as we go along.  */
 +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
 +      && !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
 +{
 +  gcc_assert (!write);
 +  bitmap_set_bit (candidate_bitmap, DECL_UID (base));
 +      tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base),
 +							    INSERT);
 +  *slot = base;
 +}
 +

I believe you only want to do this if (sra_mode ==
SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA).

The idea of candidates is that we gather them in find_var_candidates
and then we only eliminate them; this has the benefit of not worrying
about disqualifying a candidate and then erroneously re-adding it
later.  So if you could find a way to structure your code this way, I'd
be much happier.  If it is impossible without traversing the whole
function just for that purpose, we may need some mechanism to prevent
us from making a disqualified decl a candidate again.  Or, if we come
to the conclusion that constant pool decls do not ever get
disqualified, a gcc_assert making sure it actually does not happen in
disqualify_candidate.

And of course at find_var_candidates time we check that all candidates
pass simple checks in maybe_add_sra_candidate.  I suppose many of them
do not make sense for constant pool decls but at least please have a
look whether that is the case for all of them or not.

if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
  return NULL;
  
 @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
 HOST_WIDE_INT offset, tree ref)
  }
  }
  
 +static tree
 +subst_initial (tree expr, tree var)

This needs a comment and a better name.  A name that would make it
clear this is for constant 

Re: [PATCH 2/5] completely_scalarize arrays as well as records

2015-08-26 Thread Martin Jambor
Hi,

On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote:
 On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote:
  On 08/25/2015 03:42 PM, Martin Jambor wrote:
 
  Hi,
 
  On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote:
 
  This changes the completely_scalarize_record path to also work on arrays
  (thus
  allowing records containing arrays, etc.). This just required extending
  the
  existing type_consists_of_records_p and completely_scalarize_record
  methods
  to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed
  both
  methods so as not to mention 'record'.
 
 
  thanks for working on this.  I see Jeff has already approved the
  patch, but I have two comments nevertheless.  First, I would be much
  happier if you added a proper comment to scalarize_elem function which
  you forgot completely.  The name is not very descriptive and it has
  quite few parameters too.
 
  Right.  I mentioned that I missed the lack of function comments when looking
  at #3 and asked Alan to go back and fix them in #1 and #2.
 
 
  Second, this patch should also fix PR 67283.  It would be great if you
  could verify that and add it to the changelog when committing if that
  is indeed the case.
 
  Excellent.  Yes, definitely mention the BZ.
 
 One extra question is does the way we limit total scalarization work well
 for arrays?  I suppose we have either sth like the maximum size of an
 aggregate we scalarize or the maximum number of component accesses
 we create?
 

Only the former and that would be kept intact.  It is in fact visible
in the context of the last hunk of the patch.

Martin


Re: [PATCH], PowerPC IEEE 128-bit patch #5

2015-08-26 Thread David Edelsohn
On Tue, Aug 25, 2015 at 7:20 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:

 Here is the revised patch. Is it ok to install?

 2015-08-25  Michael Meissner  meiss...@linux.vnet.ibm.com

 * config/rs6000/predicates.md (int_reg_operand_not_pseudo): New
 predicate for only GPR hard registers.

 * config/rs6000/rs6000.md (FP): Add IEEE 128-bit floating point
 modes to iterators. Add new iterators for moving 128-bit values in
 scalar FPR registers and VSX registers.
 (FMOVE128): Likewise.
 (FMOVE128_FPR): Likewise.
 (FMOVE128_GPR): Likewise.
 (FMOVE128_VSX): Likewise.
 (FLOAT128_SFDFTF): New iterators for IEEE 128-bit floating point
 in VSX registers.
 (IFKF): Likewise.
 (IBM128): Likewise.
 (TFIFKF): Likewise.
 (RELOAD): Add IEEE 128-bit floating point modes.
 (signbittf2): Convert TF insns to add support for new IEEE 128-bit
 floating point in VSX registers modes.
(signbit<mode>2, IBM128 iterator): Likewise.
(mov<mode>_64bit_dm, FMOVE128_FPR iterator): Likewise.
(mov<mode>_32bit, FMOVE128_FPR iterator): Likewise.
(negtf2): Likewise.
(neg<mode>2, TFIFKF iterator): Likewise.
(negtf2_internal): Likewise.
(abstf2): Likewise.
(abs<mode>2, TFIFKF iterator): Likewise.
 (ieee_128bit_negative_zero): New IEEE 128-bit floating point in
 VSX insn support for negate, absolute value, and negative absolute
 value.
(ieee_128bit_vsx_neg<mode>2): Likewise.
(ieee_128bit_vsx_neg<mode>2_internal): Likewise.
(ieee_128bit_vsx_abs<mode>2): Likewise.
(ieee_128bit_vsx_abs<mode>2_internal): Likewise.
(ieee_128bit_vsx_nabs<mode>2): Likewise.
(ieee_128bit_vsx_nabs<mode>2_internal): Likewise.
 (FP128_64): Update pack/unpack 128-bit insns for IEEE 128-bit
 floating point in VSX registers.
(unpack<mode>_dm): Likewise.
(unpack<mode>_nodm): Likewise.
(pack<mode>): Likewise.
(unpackv1ti): Likewise.
(unpack<mode>, FMOVE128_VSX iterator): Likewise.
(packv1ti): Likewise.
(pack<mode>, FMOVE128_VSX iterator): Likewise.

The revised patch is okay.

Thanks, David


Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote:
  AVX-512 is such target. Current representation forces multiple scalar
  mask -> vector mask and back transformations which are artificially
  introduced by current bool patterns and are hard to optimize out.

 I dislike the bool patterns anyway and we should try to remove those
 and make the vectorizer handle them in other ways (they have single-use
 issues anyway).  I don't remember exactly what caused us to add them
 but one reason was there wasn't a vector type for 'bool' (but I don't see how
 it should be necessary to ask "get me a vector type for 'bool'").

 That was just one of the reasons.  The other reason is that even if we would
 choose some vector of integer type as vector of bool, the question is what
 type.  E.g. if you use vector of chars, you almost always get terrible
 vectorized code, except for the AVX-512 you really want an integral type
 that has the size of the types you are comparing.

Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always
first compute the vector type for the comparison itself (which is fixed) and
thus we can compute the vector type of any bitwise op on it as well.

 And I'd say this is very much related to the need to do some type promotions
 or demotions on the scalar code meant to be vectorized (but only the copy
 for vectorizations), so that we have as few different scalar type sizes in
 the loop as possible, because widening / narrowing vector conversions aren't
 exactly cheap and a single char operation in a loop otherwise full of long
 long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into
 vf=16 (or 32 or 64), increasing it a lot.

That's true but unrelated.  With conditions this gets to optimizing where
the promotion/demotion happens (which depends on how the result is used).

The current pattern approach has the issue that it doesn't work for multiple
uses in the condition bitops which is bad as well.

But it couldn't have been _only_ the vector type computation that made us
invent the patterns, no?  Do you remember anything else?

Thanks,
Richard.



 Jakub


[PATCH][AArch64 array_mode 4/8] Remove EImode

2015-08-26 Thread Alan Lawrence
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.

* config/aarch64/aarch64-builtins.c (ei_UP,
aarch64_simd_intEI_type_node): Remove.
(aarch64_simd_builtin_std_type): Remove EImode case.
(aarch64_init_simd_builtin_types): Don't create/add intEI_type_node.

* config/aarch64/aarch64-modes.def: Remove EImode.
---
 gcc/config/aarch64/aarch64-builtins.c | 8 
 gcc/config/aarch64/aarch64-modes.def  | 5 ++---
 gcc/config/aarch64/aarch64.c  | 2 +-
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 294bf9d..9c8ca3b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -73,7 +73,6 @@
 #define v2di_UP  V2DImode
 #define v2df_UP  V2DFmode
 #define ti_UP   TImode
-#define ei_UP   EImode
 #define oi_UP   OImode
 #define ci_UP   CImode
 #define xi_UP   XImode
@@ -435,7 +434,6 @@ static struct aarch64_simd_type_info aarch64_simd_types [] 
= {
 #undef ENTRY
 
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
-static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
 static tree aarch64_simd_intXI_type_node = NULL_TREE;
 
@@ -509,8 +507,6 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
   return QUAL_TYPE (TI);
 case OImode:
   return aarch64_simd_intOI_type_node;
-case EImode:
-  return aarch64_simd_intEI_type_node;
 case CImode:
   return aarch64_simd_intCI_type_node;
 case XImode:
@@ -623,15 +619,11 @@ aarch64_init_simd_builtin_types (void)
 #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
   make_signed_type (GET_MODE_PRECISION (mode));
   aarch64_simd_intOI_type_node = AARCH64_BUILD_SIGNED_TYPE (OImode);
-  aarch64_simd_intEI_type_node = AARCH64_BUILD_SIGNED_TYPE (EImode);
   aarch64_simd_intCI_type_node = AARCH64_BUILD_SIGNED_TYPE (CImode);
   aarch64_simd_intXI_type_node = AARCH64_BUILD_SIGNED_TYPE (XImode);
 #undef AARCH64_BUILD_SIGNED_TYPE
 
   tdecl = add_builtin_type
-   (__builtin_aarch64_simd_ei , aarch64_simd_intEI_type_node);
-  TYPE_NAME (aarch64_simd_intEI_type_node) = tdecl;
-  tdecl = add_builtin_type
(__builtin_aarch64_simd_oi , aarch64_simd_intOI_type_node);
   TYPE_NAME (aarch64_simd_intOI_type_node) = tdecl;
   tdecl = add_builtin_type
diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index b17b90d..653bd00 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -46,9 +46,8 @@ VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
 INT_MODE (OI, 32);
 
-/* Opaque integer modes for 3, 6 or 8 Neon double registers (2 is
-   TImode).  */
-INT_MODE (EI, 24);
+/* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
+   (2 d-regs = 1 q-reg = TImode).  */
 INT_MODE (CI, 48);
 INT_MODE (XI, 64);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 020f63c..a923b55 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9305,7 +9305,7 @@ aarch64_simd_attr_length_move (rtx_insn *insn)
 }
 
 /* Compute and return the length of aarch64_simd_reglistmode, where mode is
-   one of VSTRUCT modes: OI, CI, EI, or XI.  */
+   one of VSTRUCT modes: OI, CI, or XI.  */
 int
 aarch64_simd_attr_length_rglist (enum machine_mode mode)
 {
-- 
1.8.3



[PATCH][AArch64 array_mode 2/8] Remove VSTRUCT_DREG, use BLKmode for d-reg aarch64_st/ld expands

2015-08-26 Thread Alan Lawrence
The aarch64_st<VSTRUCT:nregs><VDC:mode> and
aarch64_ld<VSTRUCT:nregs><VDC:mode> expanders map onto 12 insns
aarch64_{ld,st}{2,3,4}<mode>_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers, explicitly setting
mem_size.

Bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_ld2<mode>_dreg VD & DX, aarch64_st2<mode>_dreg VD & DX):
Change all TImode operands to BLKmode.
(aarch64_ld3<mode>_dreg VD & DX, aarch64_st3<mode>_dreg VD & DX):
Change all EImode operands to BLKmode.
(aarch64_ld4<mode>_dreg VD & DX, aarch64_st4<mode>_dreg VD & DX):
Change all OImode operands to BLKmode.

(aarch64_ld<VSTRUCT:nregs><VDC:mode>,
aarch64_st<VSTRUCT:nregs><VDC:mode>): Generate MEM rtx with BLKmode
and call set_mem_size.

* config/aarch64/iterators.md (VSTRUCT_DREG): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 44 +++---
 gcc/config/aarch64/iterators.md|  2 --
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3796386..7b7a1b8 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4393,7 +4393,7 @@
(subreg:OI
  (vec_concat:VRL2
(vec_concat:VDBL
-(unspec:VD [(match_operand:TI 1 aarch64_simd_struct_operand 
Utv)]
+(unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
UNSPEC_LD2)
 (vec_duplicate:VD (const_int 0)))
(vec_concat:VDBL
@@ -4410,7 +4410,7 @@
(subreg:OI
  (vec_concat:VRL2
(vec_concat:VDBL
-(unspec:DX [(match_operand:TI 1 aarch64_simd_struct_operand 
Utv)]
+(unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
UNSPEC_LD2)
 (const_int 0))
(vec_concat:VDBL
@@ -4428,7 +4428,7 @@
 (vec_concat:VRL3
  (vec_concat:VRL2
(vec_concat:VDBL
-(unspec:VD [(match_operand:EI 1 aarch64_simd_struct_operand 
Utv)]
+(unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
UNSPEC_LD3)
 (vec_duplicate:VD (const_int 0)))
(vec_concat:VDBL
@@ -4450,7 +4450,7 @@
 (vec_concat:VRL3
  (vec_concat:VRL2
(vec_concat:VDBL
-(unspec:DX [(match_operand:EI 1 aarch64_simd_struct_operand 
Utv)]
+(unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
UNSPEC_LD3)
 (const_int 0))
(vec_concat:VDBL
@@ -4472,7 +4472,7 @@
 (vec_concat:VRL4
   (vec_concat:VRL2
 (vec_concat:VDBL
-  (unspec:VD [(match_operand:OI 1 aarch64_simd_struct_operand 
Utv)]
+  (unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
  UNSPEC_LD4)
   (vec_duplicate:VD (const_int 0)))
  (vec_concat:VDBL
@@ -4499,7 +4499,7 @@
 (vec_concat:VRL4
   (vec_concat:VRL2
 (vec_concat:VDBL
-  (unspec:DX [(match_operand:OI 1 aarch64_simd_struct_operand 
Utv)]
+  (unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand 
Utv)]
  UNSPEC_LD4)
   (const_int 0))
  (vec_concat:VDBL
@@ -4526,8 +4526,8 @@
   (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = VSTRUCT:VSTRUCT_DREGmode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, VSTRUCT:nregs * 8);
 
   emit_insn (gen_aarch64_ldVSTRUCT:nregsVDC:mode_dreg (operands[0], mem));
   DONE;
@@ -4765,8 +4765,8 @@
 )
 
 (define_insn aarch64_st2mode_dreg
-  [(set (match_operand:TI 0 aarch64_simd_struct_operand =Utv)
-   (unspec:TI [(match_operand:OI 1 register_operand w)
+  [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv)
+   (unspec:BLK [(match_operand:OI 1 register_operand w)
 (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_ST2))]
   TARGET_SIMD
@@ -4775,8 +4775,8 @@
 )
 
 (define_insn aarch64_st2mode_dreg
-  [(set (match_operand:TI 0 aarch64_simd_struct_operand =Utv)
-   (unspec:TI [(match_operand:OI 1 register_operand w)
+  [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv)
+   (unspec:BLK [(match_operand:OI 1 register_operand w)
 (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
UNSPEC_ST2))]
   TARGET_SIMD
@@ -4785,8 +4785,8 @@
 )
 
 (define_insn aarch64_st3mode_dreg
-  [(set (match_operand:EI 0 aarch64_simd_struct_operand =Utv)
-   (unspec:EI [(match_operand:CI 1 register_operand w)
+ 

[PATCH][AArch64 array_mode 6/8] Remove V_TWO_ELEM, again using BLKmode + set_mem_size.

2015-08-26 Thread Alan Lawrence
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow 
the same pattern.

bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_ld2r<mode>,
aarch64_vec_load_lanesoi_lane<mode>,
aarch64_vec_store_lanesoi_lane<mode>): Change operand mode
from V_TWO_ELEM to BLK.
(aarch64_ld2r<mode>, aarch64_ld2_lane<mode>,
aarch64_st2_lane<VQ:mode>): Generate MEM rtx with BLKmode, call
set_mem_size.

* config/aarch64/iterators.md (V_TWO_ELEM): Remove.
---
 gcc/config/aarch64/aarch64-simd.md | 21 +++--
 gcc/config/aarch64/iterators.md|  9 -
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 68182d6..f938754 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3906,7 +3906,7 @@
 
 (define_insn aarch64_simd_ld2rmode
   [(set (match_operand:OI 0 register_operand =w)
-   (unspec:OI [(match_operand:V_TWO_ELEM 1 aarch64_simd_struct_operand 
Utv)
+   (unspec:OI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
   UNSPEC_LD2_DUP))]
   TARGET_SIMD
@@ -3916,7 +3916,7 @@
 
 (define_insn aarch64_vec_load_lanesoi_lanemode
   [(set (match_operand:OI 0 register_operand =w)
-   (unspec:OI [(match_operand:V_TWO_ELEM 1 aarch64_simd_struct_operand 
Utv)
+   (unspec:OI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)
(match_operand:OI 2 register_operand 0)
(match_operand:SI 3 immediate_operand i)
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
@@ -3957,8 +3957,8 @@
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
 (define_insn aarch64_vec_store_lanesoi_lanemode
-  [(set (match_operand:V_TWO_ELEM 0 aarch64_simd_struct_operand =Utv)
-   (unspec:V_TWO_ELEM [(match_operand:OI 1 register_operand w)
+  [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv)
+   (unspec:BLK [(match_operand:OI 1 register_operand w)
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
(match_operand:SI 2 immediate_operand i)]
UNSPEC_ST2_LANE))]
@@ -4355,8 +4355,8 @@
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = V_TWO_ELEMmode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2);
 
   emit_insn (gen_aarch64_simd_ld2rmode (operands[0], mem));
   DONE;
@@ -4569,8 +4569,8 @@
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
-  machine_mode mode = V_TWO_ELEMmode;
-  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2);
 
   aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode),
NULL);
@@ -4857,8 +4857,9 @@
   (match_operand:SI 2 immediate_operand)]
   TARGET_SIMD
 {
-  machine_mode mode = V_TWO_ELEMmode;
-  rtx mem = gen_rtx_MEM (mode, operands[0]);
+  rtx mem = gen_rtx_MEM (BLKmode, operands[0]);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2);
+
   operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2])));
 
   emit_insn (gen_aarch64_vec_store_lanesoi_laneVQ:mode (mem,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9535b7f..2a99e10 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -559,15 +559,6 @@
(V4SI V16SI)  (V4SF V16SF)
(V2DI V8DI)  (V2DF V8DF)])
 
-;; Mode of pair of elements for each vector mode, to define transfer
-;; size for structure lane/dup loads and stores.
-(define_mode_attr V_TWO_ELEM [(V8QI HI)   (V16QI HI)
-  (V4HI SI)   (V8HI SI)
-  (V2SI V2SI) (V4SI V2SI)
-  (DI V2DI)   (V2DI V2DI)
-  (V2SF V2SF) (V4SF V2SF)
-  (DF V2DI)   (V2DF V2DI)])
-
 ;; Mode for atomic operation suffixes
 (define_mode_attr atomic_sfx
   [(QI b) (HI h) (SI ) (DI )])
-- 
1.8.3



[PATCH][AArch64 0/8] Add D-registers to TARGET_ARRAY_MODE_SUPPORTED_P

2015-08-26 Thread Alan Lawrence
The end goal of this series of patches is to enable 64bit vector modes for
TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so
causes ICEs with illegal subregs (e.g. returning the middle bits from a large
int mode covering 3 vectors); the patchset avoids these by first removing EImode
(192 bits = 24 bytes = 1.5 vector registers), which is currently used for
24-byte quantities transferred to/from memory by some {ld,st}3_lane instrinsics.
There is no real need to use EImode here, it's only real purpose is that it has
size 24 bytes, so we can use BLKmode instead as long as we explicitly set the
size.

Patches 5-6 extend the same BLKmode treatment to {ld,st}{2,4}, allowing all the
expander patterns to be combined in patch 7; these are not essential to the end
goal but it seemed good to be consistent. Patch 1 is a driveby, and stands in
its own right.
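
As a sketch of the idiom the series relies on (GCC-internals C; the helper
name here is invented, but gen_rtx_MEM and set_mem_size are the real
interfaces):

  /* Build a memory reference whose extent is stated explicitly rather
     than implied by an opaque integer mode such as EImode.  */
  static rtx
  make_sized_blk_mem (rtx addr, HOST_WIDE_INT nbytes)
  {
    rtx mem = gen_rtx_MEM (BLKmode, addr);  /* BLKmode carries no size...  */
    set_mem_size (mem, nbytes);             /* ...so record it explicitly. */
    return mem;
  }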



[PATCH][AArch64 array_mode 1/8] Rename vec_store_lanesmode_lane to aarch64_vec_store_lanesmode_lane

2015-08-26 Thread Alan Lawrence
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in 
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern 
names, paralleling aarch64_vec_load_lanes<mode>_lane.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane<mode>): Rename
to...
(aarch64_vec_store_lanesoi_lane<mode>): ...this.

(vec_store_lanesci_lane<mode>): Rename to...
(aarch64_vec_store_lanesci_lane<mode>): ...this.

(vec_store_lanesxi_lane<mode>): Rename to...
(aarch64_vec_store_lanesxi_lane<mode>): ...this.

(aarch64_st2_lane<VQ:mode>, aarch64_st3_lane<VQ:mode>,
aarch64_st4_lane<VQ:mode>): Follow renaming.
---
 gcc/config/aarch64/aarch64-simd.md | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index b90f938..3796386 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3956,7 +3956,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn vec_store_lanesoi_lanemode
+(define_insn aarch64_vec_store_lanesoi_lanemode
   [(set (match_operand:V_TWO_ELEM 0 aarch64_simd_struct_operand =Utv)
(unspec:V_TWO_ELEM [(match_operand:OI 1 register_operand w)
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4051,7 +4051,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn vec_store_lanesci_lanemode
+(define_insn aarch64_vec_store_lanesci_lanemode
   [(set (match_operand:V_THREE_ELEM 0 aarch64_simd_struct_operand =Utv)
(unspec:V_THREE_ELEM [(match_operand:CI 1 register_operand w)
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4146,7 +4146,7 @@
 )
 
 ;; RTL uses GCC vector extension indices, so flip only for assembly.
-(define_insn vec_store_lanesxi_lanemode
+(define_insn aarch64_vec_store_lanesxi_lanemode
   [(set (match_operand:V_FOUR_ELEM 0 aarch64_simd_struct_operand =Utv)
(unspec:V_FOUR_ELEM [(match_operand:XI 1 register_operand w)
 (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)
@@ -4861,9 +4861,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesoi_laneVQ:mode (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesoi_laneVQ:mode (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
@@ -4878,9 +4878,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesci_laneVQ:mode (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesci_laneVQ:mode (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
@@ -4895,9 +4895,9 @@
   rtx mem = gen_rtx_MEM (mode, operands[0]);
   operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2])));
 
-  emit_insn (gen_vec_store_lanesxi_laneVQ:mode (mem,
- operands[1],
- operands[2]));
+  emit_insn (gen_aarch64_vec_store_lanesxi_laneVQ:mode (mem,
+ operands[1],
+ operands[2]));
   DONE;
 })
 
-- 
1.8.3



[PATCH][AArch64 array_mode 7/8] Combine the expanders using VSTRUCT:nregs

2015-08-26 Thread Alan Lawrence
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders 
all nearly identical, so we can easily parameterize across the number of lanes 
and combine them.

For the ld<VSTRUCT:nregs>_lane pattern, I switched from the VCONQ attribute to 
just using the MODE attribute; this is identical for all the Q-register modes 
over which we iterate.

bootstrapped and check-gcc on aarch64-none-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_ld2r<mode>,
aarch64_ld3r<mode>, aarch64_ld4r<mode>): Combine together, making...
(aarch64_simd_ld<VSTRUCT:nregs>r<VALLDIF:mode>): ...this.

(aarch64_ld2_lane<mode>, aarch64_ld3_lane<mode>,
aarch64_ld4_lane<mode>): Combine together, making...
(aarch64_ld<VSTRUCT:nregs>_lane<VQ:mode>): ...this.

(aarch64_st2_lane<VQ:mode>, aarch64_st3_lane<VQ:mode>,
aarch64_st4_lane<VQ:mode>): Combine together, making...
(aarch64_st<VSTRUCT:nregs>_lane<VQ:mode>): ...this.
---
 gcc/config/aarch64/aarch64-simd.md | 144 ++---
 1 file changed, 21 insertions(+), 123 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index f938754..38c4210 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4349,42 +4349,18 @@
 FAIL;
 })
 
-(define_expand aarch64_ld2rmode
-  [(match_operand:OI 0 register_operand =w)
+(define_expand aarch64_ldVSTRUCT:nregsrVALLDIF:mode
+  [(match_operand:VSTRUCT 0 register_operand =w)
(match_operand:DI 1 register_operand w)
(unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
   rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (VALLDIF:MODEmode))
+* VSTRUCT:nregs);
 
-  emit_insn (gen_aarch64_simd_ld2rmode (operands[0], mem));
-  DONE;
-})
-
-(define_expand aarch64_ld3rmode
-  [(match_operand:CI 0 register_operand =w)
-   (match_operand:DI 1 register_operand w)
-   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  TARGET_SIMD
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 3);
-
-  emit_insn (gen_aarch64_simd_ld3rmode (operands[0], mem));
-  DONE;
-})
-
-(define_expand aarch64_ld4rmode
-  [(match_operand:XI 0 register_operand =w)
-   (match_operand:DI 1 register_operand w)
-   (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  TARGET_SIMD
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 4);
-
-  emit_insn (gen_aarch64_simd_ld4rmode (operands[0],mem));
+  emit_insn (gen_aarch64_simd_ldVSTRUCT:nregsrVALLDIF:mode (operands[0],
+   mem));
   DONE;
 })
 
@@ -4561,67 +4537,25 @@
   DONE;
 })
 
-(define_expand aarch64_ld2_lanemode
-  [(match_operand:OI 0 register_operand =w)
+(define_expand aarch64_ldVSTRUCT:nregs_laneVQ:mode
+  [(match_operand:VSTRUCT 0 register_operand =w)
(match_operand:DI 1 register_operand w)
-   (match_operand:OI 2 register_operand 0)
+   (match_operand:VSTRUCT 2 register_operand 0)
(match_operand:SI 3 immediate_operand i)
(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   TARGET_SIMD
 {
   rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2);
+  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (VQ:MODEmode))
+* VSTRUCT:nregs);
 
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode),
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VQ:MODEmode),
NULL);
-  emit_insn (gen_aarch64_vec_load_lanesoi_lanemode (operands[0],
- mem,
- operands[2],
- operands[3]));
+  emit_insn (gen_aarch64_vec_load_lanesVSTRUCT:mode_laneVQ:mode (
+   operands[0], mem, operands[2], operands[3]));
   DONE;
 })
 
-(define_expand aarch64_ld3_lanemode
-  [(match_operand:CI 0 register_operand =w)
-   (match_operand:DI 1 register_operand w)
-   (match_operand:CI 2 register_operand 0)
-   (match_operand:SI 3 immediate_operand i)
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  TARGET_SIMD
-{
-  rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
-  set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 3);
-
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode),
-   NULL);
-  emit_insn (gen_aarch64_vec_load_lanesci_lanemode (operands[0],
- mem,
- operands[2],
- operands[3]));
-  DONE;

[gomp4] teach the tracer pass to ignore more blocks for OpenACC

2015-08-26 Thread Cesar Philippidis
I hit a problem in one of my reduction test cases where the
GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be
single-entry, single-exit regions, or some form of thread divergence may
occur. When that happens, we cannot use the shfl instruction for
reductions or broadcasting (if the warp is divergent), and it may cause
problems with synchronization in general.

Nathan ran into a similar problem in one of the ssa passes when he added
support for predication in the nvptx backend. Part of his solution was
to add a gimple_call_internal_unique_p function to determine if internal
functions are safe to be cloned. This patch teaches the tracer to scan
each basic block for internal function calls using
gimple_call_internal_unique_p, and mark the blocks that contain certain
OpenACC internal functions calls as ignored. It is a shame that
gimple_statement_iterators do not play nicely with const_basic_block.

Is this patch ok for gomp-4_0-branch?

Cesar
2015-08-25  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* tracer.c (ignore_bb_p): Change bb argument from const_basic_block
	to basic_block.  Check for non-clonable calls to internal functions.


diff --git a/gcc/tracer.c b/gcc/tracer.c
index cad7ab1..f20c158 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -58,7 +58,7 @@
 #include fibonacci_heap.h
 
 static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
+static bool ignore_bb_p (basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -91,8 +91,9 @@ bb_seen_p (basic_block bb)
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
 static bool
-ignore_bb_p (const_basic_block bb)
+ignore_bb_p (basic_block bb)
 {
+  gimple_stmt_iterator gsi;
   gimple g;
 
   if (bb->index < NUM_FIXED_BLOCKS)
@@ -106,6 +107,16 @@ ignore_bb_p (const_basic_block bb)
   if (g && gimple_code (g) == GIMPLE_TRANSACTION)
 return true;
 
+  /* Ignore blocks containing non-clonable function calls.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
+{
+  g = gsi_stmt (gsi);
+
+  if (is_gimple_call (g) && gimple_call_internal_p (g)
+      && gimple_call_internal_unique_p (g))
+	return true;
+}
+
   return false;
 }
 


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-08-26 Thread Lynn A. Boger

I am working on a new patch to address some of the previous concerns
and plan to post it soon after some final testing.

On 08/25/2015 05:51 PM, Ian Lance Taylor wrote:

On Tue, Aug 18, 2015 at 1:36 PM, Lynn A. Boger
labo...@linux.vnet.ibm.com wrote:

libgo/
 PR target/66870
 configure.ac:  When gccgo for building libgo uses the gold version
 containing split stack support on ppc64, ppc64le, define
 LINKER_SUPPORTS_SPLIT_STACK.
 configure:  Regenerate.

Your version test for gold isn't robust: if the major version is >= 3,
then presumably split stack is supported.  And since you have numbers,
I suggest not trying to use switch, but instead writing something like
 if expr $gold_minor == 25; then
 ...
 elif expr $gold_minor > 25; then
 ...
 fi

If that is fixed, I'm fine with the libgo part of this patch.

Ian






[AArch64/testsuite] Add more TLS local executable testcases

2015-08-26 Thread Jiong Wang

This patch covers tlsle tiny model tests; TLS size truncation tests for the 
tiny and small models are included as well.

All testcases pass native testing.

OK for trunk?

2015-08-26  Jiong Wang  jiong.w...@arm.com

gcc/testsuite/
  * gcc.target/aarch64/tlsle12_tiny_1.c: New testcase for tiny model.
  * gcc.target/aarch64/tlsle24_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsle_sizeadj_tiny_1.c: TLS size truncation test
  for tiny model.
  * gcc.target/aarch64/tlsle_sizeadj_small_1.c: TLS size truncation test
  for small model.
  
-- 
Regards,
Jiong

Index: gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=12 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=24 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12_nc" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_hi12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c	(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target aarch64_tlsle32 } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=48 --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_g1" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_g0_nc" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c
===
--- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options "-O2 -fpic -ftls-model=local-exec -mtls-size=32 -mcmodel=tiny --save-temps" } */
+
+#include "tls_1.x"
+
+/* { dg-final { scan-assembler-times "#:tprel_lo12_nc" 2 } } */
+/* { dg-final { scan-assembler-times "#:tprel_hi12" 2 } } */
+/* { dg-final { cleanup-saved-temps } } */


[patch] libstdc++/64351 Ensure std::generate_canonical doesn't return 1.

2015-08-26 Thread Jonathan Wakely

Ed posted this patch to https://gcc.gnu.org/PR64351 in January, I've
tested it and am committing it to trunk with a test.
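
The underlying failure mode is floating-point rounding: the accumulated
__sum can round up to __tmp itself, so __sum / __tmp is exactly 1.0 even
though the mathematical value is below 1.  A self-contained illustration
(values invented for the sketch, not taken from the PR):

  #include <iostream>
  int main()
  {
    float sum = 16777215.99f;  // nearest float is 16777216.0f (2^24)
    float tmp = 16777216.0f;
    std::cout << (sum / tmp == 1.0f) << '\n';  // prints 1: quotient hit 1.0
  }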

commit 45f154a5f9172a17f6226b99b41cb9c0bd8d15ec
Author: Jonathan Wakely jwak...@redhat.com
Date:   Wed Aug 26 12:53:08 2015 +0100

Ensure std::generate_canonical doesn't return 1.

2015-08-26  Edward Smith-Rowland  3dw...@verizon.net
	Jonathan Wakely  jwak...@redhat.com

	PR libstdc++/64351
	PR libstdc++/63176
	* include/bits/random.tcc (generate_canonical): Loop until we get a
	result less than one.
	* testsuite/26_numerics/random/uniform_real_distribution/operators/
	64351.cc: New.

diff --git a/libstdc++-v3/include/bits/random.tcc b/libstdc++-v3/include/bits/random.tcc
index 4fdbcfc..a6d966b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -3472,15 +3472,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const long double __r = static_cast<long double>(__urng.max())
 			- static_cast<long double>(__urng.min()) + 1.0L;
   const size_t __log2r = std::log(__r) / std::log(2.0L);
-  size_t __k = std::max<size_t>(1UL, (__b + __log2r - 1UL) / __log2r);
-  _RealType __sum = _RealType(0);
-  _RealType __tmp = _RealType(1);
-  for (; __k != 0; --__k)
+  const size_t __m = std::max<size_t>(1UL,
+	  (__b + __log2r - 1UL) / __log2r);
+  _RealType __ret;
+  do
 	{
-	  __sum += _RealType(__urng() - __urng.min()) * __tmp;
-	  __tmp *= __r;
+	  _RealType __sum = _RealType(0);
+	  _RealType __tmp = _RealType(1);
+	  for (size_t __k = __m; __k != 0; --__k)
+	{
+	  __sum += _RealType(__urng() - __urng.min()) * __tmp;
+	  __tmp *= __r;
+	}
+	  __ret = __sum / __tmp;
 	}
-  return __sum / __tmp;
+  while (__builtin_expect(__ret >= _RealType(1), 0));
+  return __ret;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc
new file mode 100644
index 000..3de4412
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc
@@ -0,0 +1,57 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do run { target { ! simulator } } }
+
+#include <random>
+#include <testsuite_hooks.h>
+
+// libstdc++/64351
+void
+test01()
+{
+  std::mt19937 rng(8890);
+  std::uniform_real_distribution<float> dist;
+
+  rng.discard(30e6);
+  for (long i = 0; i < 10e6; ++i)
+{
+  auto n = dist(rng);
+  VERIFY( n != 1.f );
+}
+}
+
+// libstdc++/63176
+void
+test02()
+{
+  std::mt19937 rng(8890);
+  std::seed_seq sequence{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+  rng.seed(sequence);
+  rng.discard(12 * 629143 + 6);
+  float n =
+    std::generate_canonical<float, std::numeric_limits<float>::digits>(rng);
+  VERIFY( n != 1.f );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}


Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION

2015-08-26 Thread Oleg Endo

On 26 Aug 2015, at 23:27, Oleg Endo oleg.e...@t-online.de wrote:

 
 On 19 Aug 2015, at 22:35, Jeff Law l...@redhat.com wrote:
 
 On 08/19/2015 06:29 AM, David Sherwood wrote:
 I asked Richard S. to give this a once-over which he did.  However, he
 technically can't approve due to the way his maintainership position was
 worded.
 
 The one request would be a function comment for emit_mode_unit_size and
 emit_mode_unit_precision.  OK with that change.
 Thanks. Here's a new patch with the comments added.
 
 Good to go?
 David.
 
 ChangeLog:
 
 2015-08-19  David Sherwood  david.sherw...@arm.com
 
 gcc/
 * genmodes.c (emit_mode_unit_size_inline): New function.
 (emit_mode_unit_precision_inline): New function.
 (emit_insn_modes_h): Emit new #define.  Emit new functions.
 (emit_mode_unit_size): New function.
 (emit_mode_unit_precision): New function.
 (emit_mode_adjustments): Add mode_unit_size adjustments.
 (emit_insn_modes_c): Emit new arrays.
 * machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
 use new inline methods.
 
 Thanks, this is OK for the trunk.
 
 It seems this broke sh-elf, at least when compiling on OSX with its native 
 clang.
 
 ../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 
 'mode_unit_size' with a different type:
  'const unsigned char [56]' vs 'unsigned char [56]'
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
  ^
 ./insn-modes.h:417:24: note: previous definition is here
  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];
   ^

This following fixes the problem for me:

Index: gcc/genmodes.c
===
--- gcc/genmodes.c  (revision 227221)
+++ gcc/genmodes.c  (working copy)
@@ -1063,7 +1063,7 @@
 unsigned char\n\
 mode_unit_size_inline (machine_mode mode)\n\
 {\n\
-  extern unsigned char mode_unit_size[NUM_MACHINE_MODES];\n\
+  extern CONST_MODE_UNIT_SIZE unsigned char 
mode_unit_size[NUM_MACHINE_MODES];\n\
   switch (mode)\n\
 {);


Cheers,
Oleg

[PATCH] s390: Add emit_barrier() after trap.

2015-08-26 Thread Dominik Vogt
This patch fixes an ICE on S390 when a trap is generated because
the given -mstack-size is too small.  A barrier was missing after
the trap, so at higher optimization levels a NULL pointer from an
uninitialized basic block was used.  The patch also contains a
test case.
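
The idiom, mirroring the one-line change in the diff below:

  /* An unconditional trap never falls through, so the RTL stream needs a
     barrier after it; otherwise later passes assume an edge into a
     following (possibly uninitialized) basic block.  */
  emit_insn (gen_trap ());
  emit_barrier ();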

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog
 
* config/s390/s390.c (s390_emit_prologue): Add emit_barrier() after
trap to fix ICE.

gcc/testsuite/ChangeLog
 
* gcc.target/s390/20150826-1.c: New test.
From ec6b88cd51234d138bd559271def086156fcae07 Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Wed, 26 Aug 2015 14:37:00 +0100
Subject: [PATCH] s390: Add emit_barrier() after trap.

---
 gcc/config/s390/s390.c |  1 +
 gcc/testsuite/gcc.target/s390/20150826-1.c | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/20150826-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6366691..5951598 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10491,6 +10491,7 @@ s390_emit_prologue (void)
 		   current_function_name(), cfun_frame_layout.frame_size,
 		   s390_stack_size);
 	  emit_insn (gen_trap ());
+	  emit_barrier ();
 	}
 	  else
 	{
diff --git a/gcc/testsuite/gcc.target/s390/20150826-1.c b/gcc/testsuite/gcc.target/s390/20150826-1.c
new file mode 100644
index 000..830772f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/20150826-1.c
@@ -0,0 +1,11 @@
+/* Check that -mstack-size=32 does not cause an ICE.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mstack-size=32 -Wno-pointer-to-int-cast" } */
+
+extern char* bar(char *);
+int foo(void)
+{
+  char b[100];
+  return (int)bar(b);
+} /* { dg-warning "An unconditional trap is added" } */
-- 
2.3.0



Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Richard Biener
On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com:

 Hmm, I don't see how vector masks are more difficult to operate with.

 There are just no instructions for that but you have to pretend you
 have to get code vectorized.

 Huh?  Bitwise ops should be readily available.

 Right bitwise ops are available, but there is no comparison into a
 vector and no masked loads and stores using vector masks (when we
 speak about 512-bit vectors).



 Also according to vector ABI integer mask should be used for mask
 operand in case of masked vector call.

 What ABI?  The function signature of the intrinsics?  How would that
 come into play here?

 Not intrinsics. I mean OpenMP vector functions which require integer
 arg for a mask in case of 512-bit vector.

 How do you declare those?

 Something like this:

 #pragma omp declare simd inbranch
 int foo(int*);

The 'inbranch' is the thing that matters?  And all of foo is then
implicitly predicated?



 Current implementation of masked loads, masked stores and bool
 patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we
 really call it a canonical representation for all targets?

 No idea - we'll revisit when another targets adds a similar capability.

 AVX-512 is such target. Current representation forces multiple scalar
 mask -> vector mask and back transformations which are artificially
 introduced by current bool patterns and are hard to optimize out.

 I dislike the bool patterns anyway and we should try to remove those
 and make the vectorizer handle them in other ways (they have single-use
 issues anyway).  I don't remember exactly what caused us to add them
 but one reason was there wasn't a vector type for 'bool' (but I don't see how
 it should be necessary to ask "get me a vector type for 'bool'").


 Using scalar masks everywhere should probably cause the same conversion
 problem for SSE I listed above though.

 Talking about a canonical representation, shouldn't we use some
 special masks representation and not mixing it with integer and vector
 of integers then? Only in this case target would be able to
 efficiently expand it into a corresponding rtl.

 That was my idea of vector<bool> ... but I didn't explore it and see where
 it will cause issues.

 Fact is GCC already copes with vector masks generated by vector compares
 just fine everywhere and I'd rather leave it as that.

 Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 ..
 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via
 additional vec_cond. I don't think vectorizer ever generates vector
 comparison.

 Ok, well that's an implementation detail then.  Are you sure about AND and 
 IOR?
 The comment above vect_recog_bool_pattern says

 Assuming size of TYPE is the same as size of all comparisons
 (otherwise some casts would be added where needed), the above
 sequence we create related pattern stmts:
 S1'  a_T = x1 CMP1 y1 ? 1 : 0;
 S3'  c_T = x2 CMP2 y2 ? a_T : 0;
 S4'  d_T = x3 CMP3 y3 ? 1 : 0;
 S5'  e_T = c_T | d_T;
 S6'  f_T = e_T;

 thus has vector mask |

 I think in practice it would look like:

 S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

 Thus everything is usually hidden in vec_cond. But my concern is
 mostly about types used for that.


 And I wouldn't say it's fine 'everywhere' because there is a single
 target utilizing them. Masked loads and stored for AVX-512 just don't
 work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to
 512-bit vector then we get an ugly inefficient code. The question is
 where to fight with this inefficiency: in RTL or in GIMPLE. I want to
 fight with it where it appears, i.e. in GIMPLE by preventing bool ->
 int conversions applied everywhere even if target doesn't need it.

 If we don't want to support both types of masks in GIMPLE then it's
 more reasonable to make bool -> int conversion in expand for targets
 requiring it, rather than do it for everyone and then leave it to
 target to transform it back and try to get rid of all those redundant
 transformations. I'd give vector<bool> a chance to become a canonical
 mask representation for that.

 Well, you are missing the case of

bool b = a < b;
int x = (int)b;

 This case seems to require no changes and just be transformed into vec_cond.

Ok, the example was too simple but I meant that a bool has a non-conditional
use.

Ok, so I still believe we don't want two ways to express things on GIMPLE if
possible.  Yes, the vectorizer already creates only vector stmts that
are supported
by the hardware.  So it's a matter of deciding on the GIMPLE representation
for the mask.  I'd rather use vectorbool (and the target assigning
an integer
mode to it) than an 'int' 

Re: [Scalar masks 2/x] Use bool masks in if-conversion

2015-08-26 Thread Jakub Jelinek
On Wed, Aug 26, 2015 at 04:56:23PM +0200, Richard Biener wrote:
  How do you declare those?
 
  Something like this:
 
  #pragma omp declare simd inbranch
  int foo(int*);
 
 The 'inbranch' is the thing that matters?  And all of foo is then
  implicitly predicated?

If it is
#pragma omp declare simd notinbranch,
then only the non-predicated version is emitted and thus it is usable only
in vectorized loops inside of non-conditional contexts.
If it is
#pragma omp declare simd inbranch,
then only the predicated version is emitted, there is an extra argument
(either V*QI if I remember well, or for AVX-512 short/int/long bitmask),
if the caller wants to use it in non-conditional contexts, it just passes
all ones mask.  For
#pragma omp declare simd
(neither inbranch nor notinbranch), two versions are emitted, one predicated
and one non-predicated.
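
As a concrete illustration (a hand-written sketch, not taken from the
thread; names are invented), the predicated variant comes into play when the
call sits under a condition inside a simd loop:

  #pragma omp declare simd inbranch
  int foo (int *p);

  void
  bar (int *a, int *b, int n)
  {
    #pragma omp simd
    for (int i = 0; i < n; i++)
      if (b[i] > 0)           /* conditional context: the compiler calls  */
        a[i] = foo (&b[i]);   /* the inbranch variant with a lane mask.   */
  }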

Jakub


[PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran)

2015-08-26 Thread Ulrich Weigand
Hans-Peter Nilsson wrote:

 I don't feel very confused, but I understand you've investigated
 things down to a point where we can conclude that libtool can't
 do what SPU needs without also at least fiddling with
 compilation options.

Well, looks like I was confused after all.  I missed one extra
feature of libtool that does indeed just make everything work
automatically: if a library is set up using the noinst flag,
libtool considers it a convenience library and will never
create a shared library in any case; but it will create two
sets of object files, one suitable for linking into a static
library and one suitable for linking into a shared library,
and will automatically use the correct set when linking any
other library against the convenince library.

This is exactly what we want to happen for libbacktrace.  And
in fact, it is *already* set up as convenience library:
noinst_LTLIBRARIES = libbacktrace.la

This means the only thing we need to do is simply remove all
the special code: no more disable-shared and no more fiddling
with -fPIC (except for the --enable-host-shared case, which
remains special just like it does in all other libraries).

I've verified that this works on x86_64: the resulting
libgfortran.so uses the -fPIC version of the libbacktrace
object, while libgfortran.a uses the non-PIC versions.

On SPU, libtool will now automatically only generate the
non-PIC versions since the target does not support shared
library.  So everything works as expected.

OK for mainline?

Bye,
Ulrich

Index: libbacktrace/configure.ac
===
--- libbacktrace/configure.ac   (revision 227217)
+++ libbacktrace/configure.ac   (working copy)
@@ -79,7 +79,7 @@ case "$AWK" in
 "") AC_MSG_ERROR([can't build without awk]) ;;
 esac
 
-LT_INIT([disable-shared])
+LT_INIT
 AM_PROG_LIBTOOL
 
 backtrace_supported=yes
@@ -161,22 +161,11 @@ else
   fi
 fi
 
-# When building as a target library, shared libraries may want to link
-# this in.  We don't want to provide another shared library to
-# complicate dependencies.  Instead, we just compile with -fPIC, if
-# the target supports compiling with that option.
-PIC_FLAG=
-if test -n ${with_target_subdir}; then
-  ac_save_CFLAGS="$CFLAGS"
-  CFLAGS="$CFLAGS -fPIC"
-  AC_TRY_COMPILE([], [], [PIC_FLAG=-fPIC])
-  CFLAGS="$ac_save_CFLAGS"
-fi
-# Similarly, use -fPIC with --enable-host-shared:
+# Enable --enable-host-shared.
 AC_ARG_ENABLE(host-shared,
 [AS_HELP_STRING([--enable-host-shared],
[build host code as shared libraries])],
-[PIC_FLAG=-fPIC], [])
+[PIC_FLAG=-fPIC], [PIC_FLAG=])
 AC_SUBST(PIC_FLAG)
 
 # Test for __sync support.

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [gomp4.1] comment some stuff

2015-08-26 Thread Jakub Jelinek
On Tue, Aug 25, 2015 at 10:35:37AM -0700, Aldy Hernandez wrote:
 diff --git a/libgomp/env.c b/libgomp/env.c
 index 65a6851..0569521 100644
 --- a/libgomp/env.c
 +++ b/libgomp/env.c
 @@ -69,7 +69,7 @@ struct gomp_task_icv gomp_global_icv = {
  
  unsigned long gomp_max_active_levels_var = INT_MAX;
  bool gomp_cancel_var = false;
 -int gomp_max_task_priority_var = 0;
 +static int gomp_max_task_priority_var = 0;
  #ifndef HAVE_SYNC_BUILTINS
  gomp_mutex_t gomp_managed_threads_lock;
  #endif

Please remove this hunk.  The variable is meant to be used in task.c,
where
  (void) priority;
is present right now (like:
  if (priority > gomp_max_task_priority_var)
priority = gomp_max_task_priority_var;
or so.

 @@ -110,7 +112,12 @@ static void gomp_task_maybe_wait_for_dependencies (void 
 **depend);
  
  /* Called when encountering an explicit task directive.  If IF_CLAUSE is
 false, then we must not delay in executing the task.  If UNTIED is true,
 -   then the task may be executed by any member of the team.  */
 +   then the task may be executed by any member of the team.
 +
 +   DEPEND is an array containing:
 + depend[0]: number of depend elements.
 + depend[1]: number of depend elements of type out.
 + depend[N+2]: address of [0..N]th depend element.  */

Either [1..N]th, or [0..N-1]th.  And depend[N+2] should better be
depend[2..N+1].

Otherwise LGTM.

Jakub
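
For clarity, a hypothetical C sketch of walking the corrected layout (helper
name and body invented for illustration):

  #include <stdint.h>

  /* depend[0] holds N, depend[1] the number of 'out' entries, and
     depend[2..N+1] the dependence addresses.  */
  static void
  walk_depend_array (void **depend)
  {
    uintptr_t n = (uintptr_t) depend[0];
    uintptr_t nout = (uintptr_t) depend[1];
    for (uintptr_t i = 0; i < n; i++)
      {
        void *addr = depend[i + 2];
        /* Entries 0..nout-1 are the 'out' dependences; the rest are 'in'. */
        (void) addr;
      }
    (void) nout;
  }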


Re: [gomp4] teach the tracer pass to ignore more blocks for OpenACC

2015-08-26 Thread Nathan Sidwell

On 08/26/15 09:57, Cesar Philippidis wrote:

I hit a problem in one of my reduction test cases where the
GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be
single-entry, single-exit regions, or some form of thread divergence may
occur. When that happens, we cannot use the shfl instruction for
reductions or broadcasting (if the warp is divergent), and it may cause
problems with synchronization in general.

Nathan ran into a similar problem in one of the ssa passes when he added
support for predication in the nvptx backend. Part of his solution was
to add a gimple_call_internal_unique_p function to determine if internal
functions are safe to be cloned. This patch teaches the tracer to scan
each basic block for internal function calls using
gimple_call_internal_unique_p, and mark the blocks that contain certain
OpenACC internal functions calls as ignored. It is a shame that
gimple_statement_iterators do not play nicely with const_basic_block.

Is this patch ok for gomp-4_0-branch?


ok by me.  (I idly wonder if tracer should be using the routine that 
jump-threading has for scanning a block to determine duplicability.)


nathan

--
Nathan Sidwell - Director, Sourcery Services - Mentor Embedded


[patch] libstdc++/66902 Make _S_debug_messages static.

2015-08-26 Thread Jonathan Wakely

This patch removes a public symbol from the .so, which is generally a
bad thing, but there should be no users of this anywhere (it's never
declared in any public header).

For targets using symbol versioning this isn't exported at all, as it
isn't in the linker script, so this really just makes other targets
consistent with the ones using versioned symbols.

Tested powerpc64le-linux and dragonfly-4.2, committed to trunk
commit d35fbf8937930554af62a7320806abecf7381175
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri Jul 17 10:15:03 2015 +0100

libstdc++/66902 Make _S_debug_messages static.

	PR libstdc++/66902
	* src/c++11/debug.cc (_S_debug_messages): Give internal linkage.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 997c0f3..c435de7 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -103,7 +103,7 @@ namespace
 
 namespace __gnu_debug
 {
-  const char* _S_debug_messages[] =
+  static const char* _S_debug_messages[] =
   {
 // General Checks
    "function requires a valid iterator range [%1.name;, %2.name;)",


[v3 patch] try_emplace and insert_or_assign for Debug Mode.

2015-08-26 Thread Jonathan Wakely

These new members need to be defined in Debug Mode, because the
iterators passed in as hints and returned as results need to be safe
iterators.

No new tests, because we already have tests for these members, and
they're failing in debug mode.

Tested powerpc64le-linux, committed to trunk.
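
A small usage sketch (not part of the commit): under _GLIBCXX_DEBUG both the
returned iterator and the hint are now the safe debug iterators:

  #include <map>
  #include <string>

  int main()
  {
    std::map<int, std::string> m;
    auto res = m.try_emplace(1, "one");       // C++17; res.first is a safe iterator
    m.insert_or_assign(res.first, 1, "uno");  // hint overload, checked in debug mode
  }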


commit ae899df9056ff8a58d658ef42125935856503f96
Author: Jonathan Wakely jwak...@redhat.com
Date:   Wed Aug 26 21:24:30 2015 +0100

try_emplace and insert_or_assign for Debug Mode.

	* include/debug/map.h (map::try_emplace, map::insert_or_assign):
	Define.
	* include/debug/unordered_map (unordered_map::try_emplace,
	unordered_map::insert_or_assign): Define.

diff --git a/libstdc++-v3/include/debug/map.h b/libstdc++-v3/include/debug/map.h
index d45cf79..914d721 100644
--- a/libstdc++-v3/include/debug/map.h
+++ b/libstdc++-v3/include/debug/map.h
@@ -317,6 +317,89 @@ namespace __debug
 	_Base::insert(__first, __last);
 	}
 
+
+#if __cplusplus > 201402L
+      template <typename... _Args>
+        pair<iterator, bool>
+        try_emplace(const key_type& __k, _Args&&... __args)
+        {
+	  auto __res = _Base::try_emplace(__k,
+					  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename... _Args>
+        pair<iterator, bool>
+        try_emplace(key_type&& __k, _Args&&... __args)
+        {
+	  auto __res = _Base::try_emplace(std::move(__k),
+					  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename... _Args>
+        iterator
+        try_emplace(const_iterator __hint, const key_type& __k,
+                    _Args&&... __args)
+        {
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), __k,
+					     std::forward<_Args>(__args)...),
+			  this);
+	}
+
+      template <typename... _Args>
+        iterator
+        try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args)
+        {
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), std::move(__k),
+					     std::forward<_Args>(__args)...),
+			  this);
+	}
+
+      template <typename _Obj>
+        std::pair<iterator, bool>
+        insert_or_assign(const key_type& __k, _Obj&& __obj)
+	{
+	  auto __res = _Base::insert_or_assign(__k,
+					       std::forward<_Obj>(__obj));
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename _Obj>
+        std::pair<iterator, bool>
+        insert_or_assign(key_type&& __k, _Obj&& __obj)
+	{
+	  auto __res = _Base::insert_or_assign(std::move(__k),
+					       std::forward<_Obj>(__obj));
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename _Obj>
+        iterator
+        insert_or_assign(const_iterator __hint,
+                         const key_type& __k, _Obj&& __obj)
+	{
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::insert_or_assign(__hint.base(), __k,
+						  std::forward<_Obj>(__obj)),
+			  this);
+	}
+
+      template <typename _Obj>
+        iterator
+        insert_or_assign(const_iterator __hint, key_type&& __k, _Obj&& __obj)
+        {
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::insert_or_assign(__hint.base(),
+						  std::move(__k),
+						  std::forward<_Obj>(__obj)),
+			  this);
+	}
+#endif
+
+
 #if __cplusplus >= 201103L
   iterator
   erase(const_iterator __position)
diff --git a/libstdc++-v3/include/debug/unordered_map b/libstdc++-v3/include/debug/unordered_map
index cc3bc3f..1bbdb61 100644
--- a/libstdc++-v3/include/debug/unordered_map
+++ b/libstdc++-v3/include/debug/unordered_map
@@ -377,6 +377,88 @@ namespace __debug
 	  _M_check_rehashed(__bucket_count);
 	}
 
+#if __cplusplus > 201402L
+      template <typename... _Args>
+        pair<iterator, bool>
+        try_emplace(const key_type& __k, _Args&&... __args)
+        {
+	  auto __res = _Base::try_emplace(__k,
+					  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename... _Args>
+        pair<iterator, bool>
+        try_emplace(key_type&& __k, _Args&&... __args)
+        {
+	  auto __res = _Base::try_emplace(std::move(__k),
+					  std::forward<_Args>(__args)...);
+	  return { iterator(__res.first, this), __res.second };
+	}
+
+      template <typename... _Args>
+        iterator
+        try_emplace(const_iterator __hint, const key_type& __k,
+                    _Args&&... __args)
+        {
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), __k,
+					     std::forward<_Args>(__args)...),
+			  this);
+	}
+
+      template <typename... _Args>
+        iterator
+        try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args)
+        {
+	  __glibcxx_check_insert(__hint);
+	  return iterator(_Base::try_emplace(__hint.base(), std::move(__k),
+					     std::forward<_Args>(__args)...),
+			  this);
+	}
+
+      template <typename _Obj>
+        pair<iterator, bool>
+   

Re: [gomp4] lowering OpenACC reductions

2015-08-26 Thread Cesar Philippidis
On 08/21/2015 02:00 PM, Cesar Philippidis wrote:

 This patch teaches omplower how to utilize the new OpenACC reduction
 framework described in Nathan's document, which was posted here
 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01248.html. Here is the
 infrastructure patch
 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01130.html, and here's
 the nvptx backend changes
 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01334.html. The updated
 reduction tests have been posted here
 https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01561.html.

All of these patches have been committed to gomp-4_0-branch.

Cesar


Go patch committed: Don't crash on invalid builtin calls

2015-08-26 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go compiler to not crash when
it sees invalid builtin calls.  This fixes
https://golang.org/issue/11544 .  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227227)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cd5362c7bb0b207f484a8dfb8db229fd2bffef09
+5ee78e7d52a4cad0b23f5bc62e5b452489243c70
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227227)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -6588,7 +6588,11 @@ Builtin_call_expression::Builtin_call_ex
 recover_arg_is_set_(false)
 {
   Func_expression* fnexp = this->fn()->func_expression();
-  go_assert(fnexp != NULL);
+  if (fnexp == NULL)
+    {
+      this->code_ = BUILTIN_INVALID;
+      return;
+    }
   const std::string name(fnexp->named_object()->name());
   if (name == "append")
     this->code_ = BUILTIN_APPEND;
@@ -6661,7 +6665,7 @@ Expression*
 Builtin_call_expression::do_lower(Gogo* gogo, Named_object* function,
  Statement_inserter* inserter, int)
 {
-  if (this->classification() == EXPRESSION_ERROR)
+  if (this->is_error_expression())
 return this;
 
   Location loc = this-location();
@@ -7500,11 +7504,13 @@ Builtin_call_expression::do_discarding_v
 Type*
 Builtin_call_expression::do_type()
 {
+  if (this->is_error_expression())
+    return Type::make_error_type();
   switch (this->code_)
 {
 case BUILTIN_INVALID:
 default:
-  go_unreachable();
+  return Type::make_error_type();
 
 case BUILTIN_NEW:
 case BUILTIN_MAKE:


[gomp4] initialize worker reduction locks

2015-08-26 Thread Cesar Philippidis
This patch teaches omplower how to emit function calls to
IFN_GOACC_LOCK_INIT so that the worker mutex has a proper initial value.
On nvptx targets, shared memory isn't initialized (and that's where the
lock is located for OpenACC workers), so this makes it explicit. Nathan
added the internal function used in the patch a couple of days ago.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-08-26  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* omp-low.c (lower_oacc_reductions): Call GOACC_REDUCTION_INIT
	to initialize the gang and worker mutex.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 955a098..ee92141 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4795,10 +4795,20 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   if (ctx->reductions == 0)
 return;
 
+  dim = build_int_cst (integer_type_node, loop_dim);
+
+  /* Call GOACC_LOCK_INIT.  */
+  if (ifn == IFN_GOACC_REDUCTION_SETUP)
+{
+  call = build_call_expr_internal_loc (UNKNOWN_LOCATION,
+	   IFN_GOACC_LOCK_INIT,
+	   void_type_node, 2, dim, lid);
+  gimplify_and_add (call, ilist);
+}
+
   /* Call GOACC_LOCK.  */
   if (ifn == IFN_GOACC_REDUCTION_FINI  write_back)
 {
-  dim = build_int_cst (integer_type_node, loop_dim);
   call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_LOCK,
 	   void_type_node, 2, dim, lid);
   gimplify_and_add (call, ilist);


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-26 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/20/2015 09:38 AM, Ajit Kumar Agarwal wrote:


 Bootstrapping with i386 and Microblaze target works fine. No 
 regression is seen in Deja GNU tests for Microblaze. There are lesser 
 failures. Mibench/EEMBC benchmarks were run for Microblaze target and 
 the gain of 9.3% is seen in rgbcmy_lite the EEMBC benchmarks.
 What do you mean by there are lesser failures?  Are you saying there are 
 cases where path splitting generates incorrect code, or cases where path 
 splitting produces code that is less efficient, or something else?

 I meant there are more DejaGNU testcases passing with the path splitting 
 changes.
Ah, in that case, that's definitely good news!

Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c

void
dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
{
  int i, level, qmul, qadd;

  qadd = (qscale - 1) | 1;
  qmul = qscale << 1;

  for (i = 0; i = nCoeffs; i++)
{
  level = block[i];
      if (level < 0)
level = level * qmul - qadd;
  else
level = level * qmul + qadd;
  block[i] = level;
}
}

The above loop is a candidate for path splitting, as the IF blocks merge at the 
latch of the loop; path splitting duplicates the latch of the loop, which is 
the statement block[i] = level, into the predecessor THEN and ELSE blocks.
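
Concretely, the duplicated-latch form looks roughly like this (a hand-written
sketch of the transformed loop, not compiler output):

  for (i = 0; i <= nCoeffs; i++)
    {
      level = block[i];
      if (level < 0)
        {
          level = level * qmul - qadd;
          block[i] = level;   /* latch duplicated into the THEN block */
        }
      else
        {
          level = level * qmul + qadd;
          block[i] = level;   /* latch duplicated into the ELSE block */
        }
    }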

Due to the above path splitting, if-conversion is disabled: the above 
IF-THEN-ELSE is not if-converted, and the test case fails.

There were following review comments from the above patch.

+/* This function performs the feasibility tests for path splitting
 +   to perform. Return false if the feasibility for path splitting
 +   is not done and returns true if the feasibility for path splitting
 +   is done. Following feasibility tests are performed.
 +
 +   1. Return false if the join block has rhs casting for assign
 +  gimple statements.

Comments from Jeff:

These seem totally arbitrary.  What's the reason behind each of these 
restrictions?  None should be a correctness requirement AFAICT.  

In the above patch I made the check given in point 1 on the loop latch; with 
it, path splitting is disabled, if-conversion happens, and the test case 
passes.

I have incorporated the above review comments by dropping the feasibility 
check of point 1, and the above testcase then goes through path splitting; 
because of that, if-conversion does not happen and the test case fails (it 
expects the pattern "Applying if conversion" to be present). With the patch 
given for review and the feasibility check for cast assigns in the latch of 
the loop as given in point 1, path splitting is disabled, if-conversion 
happens, and the above test case passes.

Please let me know whether to keep the feasibility check given in point 1, or 
what changes would be more appropriate for this path-splitting vs. 
if-conversion scenario.

Thanks & Regards
Ajit


jeff



Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Alan Lawrence

Jeff Law wrote:


The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?


I'll answer this first, as I think (Richard and) Martin have identified enough 
other issues with this patch that will take longer to address. But if you 
look at the context of the hunk in patch 5: it iterates through the 
candidates (from patch 4) and then filters out any candidates bigger than 
max-scalarization-size, and patch 5 removes that filtering.


--Alan



[Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex

2015-08-26 Thread Tim Shen
Bootstrapped and tested on x86_64-pc-linux-gnu.

Thanks!


-- 
Regards,
Tim Shen
commit e134e1a835ad15900686351cade36774593b91ea
Author: Tim Shen tims...@google.com
Date:   Wed Aug 26 17:51:29 2015 -0700

PR libstdc++/67362
* include/bits/regex_scanner.tcc (_Scanner::_M_scan_normal):
Always return an ordinary char token if the char isn't
considered a special char.
* testsuite/28_regex/regression.cc: New test file for collecting
regression testcases from, typically, bugzilla.

diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc 
b/libstdc++-v3/include/bits/regex_scanner.tcc
index 3bcbd0f..1555669 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -99,6 +99,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   auto __c = *_M_current++;
   const char* __pos;
 
+  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr)
+   {
+ _M_token = _S_token_ord_char;
+ _M_value.assign(1, __c);
+ return;
+   }
   if (__c == '\\')
{
  if (_M_current == _M_end)
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc
new file mode 100644
index 0000000..71d82d5
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -0,0 +1,42 @@
+// { dg-options "-std=gnu++11" }
+
+//
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <testsuite_hooks.h>
+#include <testsuite_regex.h>
+
+using namespace __gnu_test;
+using namespace std;
+
+// PR libstdc++/67362
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  regex re("((.)", regex_constants::basic);
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}
+
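
For context, here is a standalone sketch of the behavior the fix restores (my
own example, not part of the patch): in POSIX basic syntax an unescaped '(' or
')' is an ordinary character, so the pattern below must be accepted and must
match the parentheses literally.

#include <cassert>
#include <regex>

int main()
{
  // In POSIX basic regexes, groups are written \( \); bare parentheses
  // are literals.  Before the fix, the scanner treated them as special
  // and rejected this pattern.
  std::regex re("((.)", std::regex_constants::basic);
  assert(std::regex_match("((x)", re));
  return 0;
}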


Re: [Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex

2015-08-26 Thread Tim Shen
On Wed, Aug 26, 2015 at 6:41 PM, Tim Shen tims...@google.com wrote:
 Bootstrapped and tested on x86_64-pc-linux-gnu.

Also plan to backport to 4.9 and 5.


-- 
Regards,
Tim Shen


[c++-delayed-folding] fold_simple

2015-08-26 Thread Jason Merrill
Why does fold_simple fold so many patterns?  I thought we wanted 
something that would just fold conversions and negations of constant values.
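
For comparison, at the source level that minimal set would amount to folds
like these (an illustrative sketch only; fold_simple itself works on trees,
not C++ source):

// Illustration only: conversions and negations of constant values.
int main()
{
  constexpr double d = static_cast<double>(3);  // conversion of a constant
  constexpr int n = -(2);                       // negation of a constant
  return static_cast<int>(d) + n - 1;           // 3 + (-2) - 1 == 0
}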


Jason



[gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-08-26 Thread Cesar Philippidis
This patch strips out all of the references to ganglocal memory in gcc.
Unfortunately, the runtime API still takes a shared-memory parameter, so
I haven't made any changes there yet. Perhaps we could keep the
shared-memory argument to GOACC_parallel but remove all of the support
for ganglocal mappings. Then again, maybe we still need to support
ganglocal mappings for legacy purposes.

With the ganglocal mapping aside, I'm in favor of leaving the shared
memory argument to GOACC_parallel, just in case we find another use for
shared memory in the future.

Nathan, what do you want to do here?

Cesar
2015-08-26  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* builtins.c (expand_oacc_ganglocal_ptr): Delete.
	(expand_builtin): Remove stale GOACC_GET_GANGLOCAL_PTR builtin.
	* config/nvptx/nvptx.md (ganglocal_ptr): Delete.
	* gimple.h (struct gimple_statement_omp_parallel_layout): Remove
	ganglocal_size member.
	(gimple_omp_target_ganglocal_size): Delete.
	(gimple_omp_target_set_ganglocal_size): Delete.
	* omp-builtins.def (BUILT_IN_GOACC_GET_GANGLOCAL_PTR): Delete.
	* omp-low.c (struct omp_context): Remove ganglocal_init, ganglocal_ptr,
	ganglocal_size, ganglocal_size_host, worker_var, worker_count and
	worker_sync_elt.
	(alloc_var_ganglocal): Delete.
	(install_var_ganglocal): Delete.
	(new_omp_context): Don't use ganglocal memory.
	(expand_omp_target): Likewise.
	(lower_omp_taskreg): Likewise.
	(lower_omp_target): Likewise.
	* tree-parloops.c (create_parallel_loop): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Remove support for
	GOMP_MAP_FORCE_TO_GANGLOCAL.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7c3ead1..f465716 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5913,25 +5913,6 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   return target;
 }
 
-static rtx
-expand_oacc_ganglocal_ptr (rtx target ATTRIBUTE_UNUSED)
-{
-#ifdef HAVE_ganglocal_ptr
-  enum insn_code icode;
-  icode = CODE_FOR_ganglocal_ptr;
-  rtx tmp = target;
-  if (!REG_P (tmp) || GET_MODE (tmp) != Pmode)
-    tmp = gen_reg_rtx (Pmode);
-  rtx insn = GEN_FCN (icode) (tmp);
-  if (insn != NULL_RTX)
-    {
-      emit_insn (insn);
-      return tmp;
-    }
-#endif
-  return NULL_RTX;
-}
-
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
(and in mode MODE if that's convenient).
@@ -7074,12 +7055,6 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	return target;
   break;
 
-case BUILT_IN_GOACC_GET_GANGLOCAL_PTR:
-  target = expand_oacc_ganglocal_ptr (target);
-  if (target)
-	return target;
-  break;
-
 default:	/* just do library call, if unknown builtin */
   break;
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 3d734a8..d0d6564 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1485,23 +1485,6 @@
   ""
   "%.\\tst.shared%u1\\t%1,%0;")
 
-(define_insn "ganglocal_ptr<mode>"
-  [(set (match_operand:P 0 "nvptx_register_operand" "")
-	(unspec:P [(const_int 0)] UNSPEC_SHARED_DATA))]
-  ""
-  "%.\\tcvta.shared%t0\\t%0, sdata;")
-
-(define_expand "ganglocal_ptr"
-  [(match_operand 0 "nvptx_register_operand" "")]
-  ""
-{
-  if (Pmode == DImode)
-    emit_insn (gen_ganglocal_ptrdi (operands[0]));
-  else
-    emit_insn (gen_ganglocal_ptrsi (operands[0]));
-  DONE;
-})
-
 ;; Atomic insns.
 
 (define_expand "atomic_compare_and_swap<mode>"
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d8d8742..278b49f 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -580,10 +580,6 @@ struct GTY((tag(GSS_OMP_PARALLEL_LAYOUT)))
   /* [ WORD 10 ]
      Shared data argument.  */
   tree data_arg;
-
-  /* [ WORD 11 ]
-     Size of the gang-local memory to allocate.  */
-  tree ganglocal_size;
 };
 
 /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */
@@ -5232,25 +5228,6 @@ gimple_omp_target_set_data_arg (gomp_target *omp_target_stmt,
 }
 
 
-/* Return the size of gang-local data associated with OMP_TARGET GS.  */
-
-static inline tree
-gimple_omp_target_ganglocal_size (const gomp_target *omp_target_stmt)
-{
-  return omp_target_stmt->ganglocal_size;
-}
-
-
-/* Set SIZE to be the size of gang-local memory associated with OMP_TARGET
-   GS.  */
-
-static inline void
-gimple_omp_target_set_ganglocal_size (gomp_target *omp_target_stmt, tree size)
-{
-  omp_target_stmt->ganglocal_size = size;
-}
-
-
 /* Return the clauses associated with OMP_TEAMS GS.  */
 
 static inline tree
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 0d9f386..615c4e0 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -58,8 +58,6 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, "GOACC_get_ganglocal_ptr",
-		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DEVICEPTR, "GOACC_deviceptr",
 		   BT_FN_PTR_PTR, 

Re: C++ delayed folding branch review

2015-08-26 Thread Jason Merrill

On 08/24/2015 03:15 AM, Kai Tietz wrote:

2015-08-03 17:39 GMT+02:00 Jason Merrill ja...@redhat.com:

On 08/03/2015 05:42 AM, Kai Tietz wrote:

2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com:

On 07/31/2015 05:54 PM, Kai Tietz wrote:


I could remove the STRIP_NOPS requirement in
'reduced_constant_expression_p', except for one case in constexpr.  Without
folding we don't do type sinking/raising.


Right.


So binary/unary operations might contain casts, which were
unexpected in the past.


Why aren't the casts folded away?


For such cast constructs, as in this vector sample, we can't fold away


Which testcase is this?


It is the g++.dg/ext/vector20.C testcase.  IIRC I mentioned this
testcase already earlier as reference, but I might be wrong here.


I don't see any casts in that testcase.  So the compiler is introducing
conversions back and forth between const and non-const, then?  I suppose it
doesn't so much matter where they come from; they should be folded away
regardless.



the cast chain.  The difference here to the non-delayed-folding branch is
that the cast isn't moved out of the plus-expr.  What we see now is
(plus ((vec) (const vector ...) { ... }), ...).  Before we had (vec)
(plus (const vector ...) { ... }).


How could a PLUS_EXPR be considered a reduced constant, regardless of where
the cast is?


Of course it is only possible to sink a cast out of a PLUS_EXPR in
pretty few circumstances (e.g. on constants, if both types differ just in
const-ness and the conversion is not a view-convert).


I don't understand how this is an answer to my question.


In verify_constant we check via reduced_constant_expression_p whether a value
is a constant.  We don't handle the fact that NOP_EXPRs are something we
want to look through here, as they don't change whether the operand is a
constant or not.


NOPs around constants should have been folded away by the time we get
there.


Not in this case, as we actually have a switch from const to
non-const here.  So there is an attribute change, which we can't ignore in
general.


I wasn't suggesting we ignore it; we should be able to change the type of
the vector_cst.


Well, we can change the type of the vector_cst, but this wouldn't help
AFAICS, as there is still one cast surviving within the PLUS_EXPR for the
other operand.


Isn't the other operand also constant?  In constexpr evaluation, either 
we're dealing with a bunch of constants, in which case we should be 
folding things fully, including conversions between const and non-const, 
or we don't care.



So the way to solve it would be to move such conversions out of the
expression.  For integer scalars we do this, and for some
floating-point types too.  So it might be something we don't handle for
operations with vector type.


We don't need to worry about that in constexpr evaluation, since we only 
care about constant operands.



But I agree that for constexprs we could special-case casts
from const to non-const (as required in expressions like const vec v
= v + 1).


Right.  But really this should happen in convert.c, it shouldn't be specific
to C++.


Hmm, maybe.  But isn't one of our other goals to move such
implicit code modification to match.pd instead?


Folding const into a constant is hardly code modification.  But perhaps 
it should go into fold_unary_loc:VIEW_CONVERT_EXPR rather than into 
convert.c.


Jason
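
For reference, a reduced sketch in the spirit of g++.dg/ext/vector20.C
(reconstructed from the discussion above, not the verbatim testcase), showing
the kind of const/non-const vector conversion that has to fold away before
constexpr evaluation can see a reduced constant:

// Reconstructed illustration: the initializer of b is a PLUS_EXPR whose
// operands carry (vec) casts around const VECTOR_CSTs; those casts must
// fold away for reduced_constant_expression_p to accept the result.
typedef int vec __attribute__ ((vector_size (4 * sizeof (int))));

constexpr vec a = { 1, 2, 3, 4 };
constexpr vec b = a + 1;   // requires folding the const -> non-const cast

int main () { return 0; }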



Re: [PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran)

2015-08-26 Thread Ian Lance Taylor
Ulrich Weigand uweig...@de.ibm.com writes:

 I've verified that this works on x86_64: the resulting
 libgfortran.so uses the -fPIC version of the libbacktrace
 object, while libgfortran.a uses the non-PIC versions.

 On SPU, libtool will now automatically only generate the
 non-PIC versions since the target does not support shared
 libraries.  So everything works as expected.

 OK for mainline?

Can you verify that libgo works as expected?

Ian


[PATCH] fix --with-cpu for sh targets

2015-08-26 Thread Rich Felker
A missing * in the pattern for sh targets prevents the --with-cpu
configure option from being accepted for certain targets (e.g. ones
with explicit endianness, like sh2eb).

The latest config.sub should also be pulled from upstream since it has
a fix for related issues.

Rich
--- gcc-5.2.0.orig/gcc/config.gcc
+++ gcc-5.2.0/gcc/config.gcc
@@ -4096,7 +4099,7 @@
esac
;;
 
-   sh[123456ble]-*-* | sh-*-*)
+   sh[123456ble]*-*-* | sh-*-*)
supported_defaults=cpu
	case "`echo $with_cpu | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ_ abcdefghijklmnopqrstuvwxyz- | sed s/sh/m/`" in
	"" | m1 | m2 | m2e | m3 | m3e | m4 | m4-single | m4-single-only | m4-nofpu )