[C++, ping] Fix PR bootstrap/81926
The analysis and original patch:
  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00101.html
and the amended patch:
  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00146.html

Thanks in advance.

-- 
Eric Botcazou
C++ PATCH for c++/82053, ICE with default argument in lambda in template
When we regenerate a lambda, the resulting op() doesn't have any template
information, so we can't delay instantiating default arguments like we do
for a normal template function.  I believe this is also the direction of
the core working group for default arguments in local extern function
declarations, so I don't think we need to invent a mechanism to remember
those template arguments for later use.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 17d672e6eaf10f96174b00207c60b5467693877f
Author: Jason Merrill
Date:   Thu Aug 31 13:03:31 2017 -0400

            PR c++/82053 - ICE with default argument in lambda in template

            * pt.c (tsubst_arg_types): Substitute default arguments for
            lambdas in templates.
            (retrieve_specialization): Use lambda_fn_in_template_p.
            * cp-tree.h: Declare it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 20fa039..a0e31d3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6821,6 +6821,7 @@
 extern tree current_nonlambda_function		(void);
 extern tree nonlambda_method_basetype		(void);
 extern tree current_nonlambda_scope		(void);
 extern bool generic_lambda_fn_p			(tree);
+extern bool lambda_fn_in_template_p		(tree);
 extern void maybe_add_lambda_conv_op		(tree);
 extern bool is_lambda_ignored_entity		(tree);
 extern bool lambda_static_thunk_p		(tree);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4a65e31..ec7bbc8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1193,16 +1193,8 @@ retrieve_specialization (tree tmpl, tree args, hashval_t hash)
   /* Lambda functions in templates aren't instantiated normally,
      but through tsubst_lambda_expr.  */
-  if (LAMBDA_FUNCTION_P (tmpl))
-    {
-      bool generic = PRIMARY_TEMPLATE_P (tmpl);
-      if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl)) > generic)
-	return NULL_TREE;
-
-      /* But generic lambda functions are instantiated normally, once their
-	 containing context is fully instantiated.  */
-      gcc_assert (generic);
-    }
+  if (lambda_fn_in_template_p (tmpl))
+    return NULL_TREE;
 
   if (optimize_specialization_lookup_p (tmpl))
     {
@@ -12579,7 +12571,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
 bool
 lambda_fn_in_template_p (tree fn)
 {
-  if (!LAMBDA_FUNCTION_P (fn))
+  if (!fn || !LAMBDA_FUNCTION_P (fn))
     return false;
   tree closure = DECL_CONTEXT (fn);
   return CLASSTYPE_TEMPLATE_INFO (closure) != NULL_TREE;
@@ -13248,6 +13240,13 @@ tsubst_arg_types (tree arg_types,
	 done in build_over_call.  */
      default_arg = TREE_PURPOSE (arg_types);
 
+      /* Except that we do substitute default arguments under
+	 tsubst_lambda_expr, since the new op() won't have any associated
+	 template arguments for us to refer to later.  */
+      if (lambda_fn_in_template_p (in_decl))
+	default_arg = tsubst_copy_and_build (default_arg, args, complain,
+					     in_decl, false/*fn*/,
+					     false/*constexpr*/);
+
      if (default_arg && TREE_CODE (default_arg) == DEFAULT_ARG)
	{
	  /* We've instantiated a template before its default arguments
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C b/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C
new file mode 100644
index 000..f67dfee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C
@@ -0,0 +1,13 @@
+// PR c++/82053
+// { dg-do compile { target c++14 } }
+
+template <class T>
+int fn() { return 42; }
+
+template <class T>
+auto lam = [](int = fn<T>()){};
+
+int main()
+{
+  lam<int>();
+}
Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)
On Thu, 17 Aug 2017, Martin Sebor wrote:

> +/* Check LAST_DECL and NODE of the same symbol for attributes that are
> +   recorded in EXCL to be mutually exclusive with ATTRNAME, diagnose
> +   them, and return true if any have been found.  NODE can be a DECL
> +   or a TYPE.  */
> +
> +static bool
> +diag_attr_exclusions (tree last_decl, tree node, tree attrname,
> +		      const attribute_spec *spec)

EXCL is not an argument to this function, so the comment above it should
not refer to EXCL (presumably it should refer to SPEC instead).

> +	      note &= warning (OPT_Wattributes,
> +			       "ignoring attribute %qE in declaration of "
> +			       "a built-in function qD because it conflicts "
> +			       "with attribute %qs",
> +			       attrname, node, excl->name);

%qD not qD, presumably.  (Generically, warning_at would be preferred to
warning, but that may best be kept separate if you don't already have a
location available here.)

> +static const struct attribute_spec::exclusions attr_gnu_inline_exclusions[] =
> +{
> +  ATTR_EXCL ("gnu_inline", true, true, true),
> +  ATTR_EXCL ("noinline", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};

This says gnu_inline is incompatible with noinline, and is listed as the
EXCL field for the gnu_inline attribute.

> +static const struct attribute_spec::exclusions attr_inline_exclusions[] =
> +{
> +  ATTR_EXCL ("always_inline", true, true, true),
> +  ATTR_EXCL ("noinline", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};

This is listed as the EXCL field for the noinline attribute, but does
not mention gnu_inline.  Does this mean some asymmetry in when that pair
is diagnosed?  I don't see tests for that pair added by the patch.  (Of
course, gnu_inline + always_inline is OK, and attr_inline_exclusions is
also used for the always_inline attribute in this patch.)

In general, the data structures where you need to ensure manually that
if attribute A is listed in EXCL for B, then attribute B is also listed
in EXCL for A, seem concerning.
I'd expect either data structures that make such asymmetry impossible,
or a self-test that verifies that the tables in use are in fact
symmetric (unless there is some reason the symmetry is not in fact
required and symmetric diagnostics still result from asymmetric tables -
in which case the various combinations and orderings of gnu_inline and
noinline definitely need tests to show that the diagnostics work).

> +both the @code{const} and the @code{pure} attribute is diagnnosed.

s/diagnnosed/diagnosed/

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH] correct documentation of attribute ifunc (PR 81882)
This patch is OK with the spacing in the function prototype fixed as
noted to follow normal GNU standards.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
On Wed, 2017-09-06 at 16:13 -0500, Pat Haugen wrote:
> On 09/06/2017 11:24 AM, Carl Love wrote:
> > +  "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0"
> > +  [(set_attr "type" "integer")
> > +   (set_attr "length" "4")])
>
> Should be type "three" and length "12".
>
> -Pat

Pat:

Yes, that is wrong in more ways than one.  Looks like I posted the wrong
version of the patch.  This was the first version, which unfortunately
results in generating extra extsw instructions.  I withdraw this patch
from consideration.

                    Carl Love
Re: [RFA] [PATCH 4/4] Ignore reads of "dead" memory locations in DSE
Another old patch getting resurrected...

On 01/04/2017 06:50 AM, Richard Biener wrote:
> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law wrote:
>> This is the final patch in the kit to improve our DSE implementation.
>>
>> It's based on a observation by Richi.  Namely that a read from bytes
>> of memory that are dead can be ignored.  By ignoring such reads we can
>> sometimes find additional stores that allow us to either eliminate or
>> trim an earlier store more aggressively.
>>
>> This only hit (by hit I mean the ability to ignore resulted in finding
>> a full or partially dead store that we didn't otherwise find) once
>> during a bootstrap, but does hit often in the libstdc++ testsuite.
>> I've added a test derived from the conversation between myself and
>> Richi last year.
>>
>> There's nothing in the BZ database on this issue and I can't
>> reasonably call it a bugfix.  I wouldn't lose sleep if this deferred
>> to gcc-8.
>>
>> Bootstrapped and regression tested on x86-64-linux-gnu.  OK for the
>> trunk or defer to gcc-8?
>>
>> 	* tree-ssa-dse.c (live_bytes_read): New function.
>> 	(dse_classify_store): Ignore reads of dead bytes.
>>
>> 	* testsuite/gcc.dg/tree-ssa/ssa-dse-26.c: New test.
>> 	* testsuite/gcc.dg/tree-ssa/ssa-dse-26.c: Likewise.
>>
>> [ snip ]
>>
>> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
>> index a807d6d..f5b53fc 100644
>> --- a/gcc/tree-ssa-dse.c
>> +++ b/gcc/tree-ssa-dse.c
>> @@ -475,6 +475,41 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
>>      }
>>  }
>>
>> +/* Return TRUE if USE_REF reads bytes from LIVE where live is
>> +   derived from REF, a write reference.
>> +
>> +   While this routine may modify USE_REF, it's passed by value, not
>> +   location.  So callers do not see those modifications.  */
>> +
>> +static bool
>> +live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
>> +{
>> +  /* We have already verified that USE_REF and REF hit the same object.
>> +     Now verify that there's actually an overlap between USE_REF and REF.  */
>> +  if ((use_ref.offset < ref->offset
>> +       && use_ref.offset + use_ref.size > ref->offset)
>> +      || (use_ref.offset >= ref->offset
>> +	  && use_ref.offset < ref->offset + ref->size))
>
> can you use ranges_overlap_p?  (tree-ssa-alias.h)

Yes.  Didn't know about it.  Done.

>
>> +    {
>> +      normalize_ref (&use_ref, ref);
>> +
>> +      /* If USE_REF covers all of REF, then it will hit one or more
>> +	 live bytes.  This avoids useless iteration over the bitmap
>> +	 below.  */
>> +      if (use_ref.offset == ref->offset && use_ref.size == ref->size)
>> +	return true;
>> +
>> +      /* Now iterate over what's left in USE_REF and see if any of
>> +	 those bits are i LIVE.  */
>> +      for (int i = (use_ref.offset - ref->offset) / BITS_PER_UNIT;
>> +	   i < (use_ref.offset + use_ref.size) / BITS_PER_UNIT; i++)
>> +	if (bitmap_bit_p (live, i))
>
> a bitmap_bit_in_range_p () would be nice to have.  And it can be more
> efficient than this loop...

Yea.  That likely would help here.  I'm testing with a
bitmap_bit_in_range_p implementation (only for sbitmaps since that's
what we're using here).  That implementation does the reasonably
efficient things and is modeled after the sbitmap implementation of
bitmap_set_range.

>> @@ -554,6 +589,41 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>>  	  /* If the statement is a use the store is not dead.  */
>>  	  else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
>>  	    {
>> +	      /* Handle common cases where we can easily build a ao_ref
>> +		 structure for USE_STMT and in doing so we find that the
>> +		 references hit non-live bytes and thus can be ignored.  */
>> +	      if (live_bytes)
>> +		{
>> +		  if (is_gimple_assign (use_stmt))
>> +		    {
>> +		      /* Other cases were noted as non-aliasing by
>> +			 the call to ref_maybe_used_by_stmt_p.  */
>> +		      ao_ref use_ref;
>> +		      ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt));
>> +		      if (valid_ao_ref_for_dse (&use_ref)
>> +			  && use_ref.base == ref->base
>> +			  && use_ref.size == use_ref.max_size
>> +			  && !live_bytes_read (use_ref, ref, live_bytes))
>> +			{
>> +			  if (gimple_vdef (use_stmt))
>> +			    {
>> +			      /* If we have already seen a store and
>> +				 this is also a store, then we have to
>> +				 fail.  */
>> +			      if (temp)
>> +				{
>> +				  fail = true;
>> +				  BREAK_FROM_IMM_
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 10:08:01PM +0200, David Edelsohn wrote:
> This change broke bootstrap on AIX because sancov.c now references a
> macro that is defined as a function on AIX.  sancov.c needs to include
> tm_p.h to pull in the target-dependent prototypes.  The following
> patch works for me.  Is this okay?
>
> 	* sancov.c: Include tm_p.h.

Ok, thanks.  And sorry for the breakage.

> Index: sancov.c
> ===
> --- sancov.c	(revision 251817)
> +++ sancov.c	(working copy)
> @@ -28,6 +28,7 @@
>  #include "basic-block.h"
>  #include "options.h"
>  #include "flags.h"
> +#include "tm_p.h"
>  #include "stmt.h"
>  #include "gimple-iterator.h"
>  #include "gimple-builder.h"

	Jakub
Re: [PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
On 09/06/2017 11:24 AM, Carl Love wrote:
> +  "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0"
> +  [(set_attr "type" "integer")
> +   (set_attr "length" "4")])

Should be type "three" and length "12".

-Pat
RFC: Representation of runtime offsets and sizes
The next main step in the SVE submission is to add support for offsets
and sizes that are a runtime invariant rather than a compile time
constant.  This is an RFC about our approach for doing that.  It's an
update of https://gcc.gnu.org/ml/gcc/2016-11/msg00031.html (which
covered more topics than this message).

The size of an SVE register in bits can be any multiple of 128 between
128 and 2048 inclusive.  The way we chose to represent this was to have
a runtime indeterminate that counts the number of 128 bit blocks above
the minimum of 128.  If we call the indeterminate X then:

* an SVE register has 128 + 128 * X bits (16 + 16 * X bytes)
* the last int in an SVE vector is at byte offset 12 + 16 * X
* etc.

Although the maximum value of X is 15, we don't want to take advantage
of that, since there's nothing particularly magical about the value.

So we have two types of target: those for which there are no runtime
indeterminates, and those for which there is one runtime indeterminate.
We decided to generalise the interface slightly by allowing any number
of indeterminates, although the underlying implementation is still
limited to 0 and 1 for now.

The main class for working with these runtime offsets and sizes is
"poly_int".  It represents a value of the form:

  C0 + C1 * X1 + ... + Cn * Xn

where each coefficient Ci is a compile-time constant and where each
indeterminate Xi is a nonnegative runtime value.  The class takes two
template parameters, one giving the number of coefficients and one
giving the type of the coefficients.  There are then typedefs for the
common cases, with the number of coefficients being controlled by the
target.

poly_int is used for things like:

- the number of elements in a VECTOR_TYPE
- the size and number of units in a general machine_mode
- the offset of something in the stack frame
- SUBREG_BYTE
- MEM_SIZE and MEM_OFFSET
- mem_ref_offset

(only a selective list).
There are also rtx and tree representations of poly_int, although I've
left those out of this RFC.

The patch has detailed documentation -- which I've also attached as a
PDF -- but the main points are:

* there's no total ordering between poly_ints, so the best we can do
  when comparing them is to ask whether two values *might* or *must* be
  related in a particular way.  E.g. if mode A has size 2 + 2X and mode
  B has size 4, the condition:

    GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)

  is true for X<=1 and false for X>=2.  This translates to:

    may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
    must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false

  Of course, the may/must distinction already exists in things like
  alias analysis.

* some poly_int arithmetic operations (notably division) are only
  possible for certain values.  These operations therefore become
  conditional.

* target-independent code is exposed to these restrictions even if the
  current target has no indeterminates.  But:

  * we've tried to provide enough operations that poly_ints are easy to
    work with.

  * it means that developers working with non-SVE targets don't need to
    test SVE.  If the code compiles on a non-SVE target, and if it
    doesn't use any asserting operations, it's reasonable to assume
    that it will work on SVE too.

* for target-specific code, poly_int degenerates to a scalar if there
  are no runtime invariants for that target.  Only very minor changes
  are needed to non-AArch64 targets.

* poly_int operations should be (and in practice seem to be) as
  efficient as normal scalar operations on non-AArch64 targets.

The patch really needs some self-tests (which weren't supported when we
did the work originally), but otherwise it's what I'd like to submit.

Thanks,
Richard


10 Sizes and offsets as runtime invariants
******************************************

GCC allows the size of a hardware register to be a runtime invariant
rather than a compile-time constant.
This in turn means that various sizes and offsets must also be runtime
invariants rather than compile-time constants, such as:

   * the size of a general 'machine_mode' (*note Machine Modes::);
   * the size of a spill slot;
   * the offset of something within a stack frame;
   * the number of elements in a vector;
   * the size and offset of a 'mem' rtx (*note Regs and Memory::); and
   * the byte offset in a 'subreg' rtx (*note Regs and Memory::).

The motivating example is the Arm SVE ISA, whose vector registers can
be any multiple of 128 bits between 128 and 2048 inclusive.  The
compiler normally produces code that works for all SVE register sizes,
with the actual size only being known at runtime.

GCC's main representation of such runtime invariants is the 'poly_int'
class.  This chapter describes what 'poly_int' does, lists the
available operations, and gives some general usage guidelines.

* Menu:

* Overview of poly_int::
* Consequences of using poly_int::
* Comparisons involving poly_int::
* Arithmetic on poly_ints::
* A
Re: Add support to trace comparison instructions and switch statements
This change broke bootstrap on AIX because sancov.c now references a
macro that is defined as a function on AIX.  sancov.c needs to include
tm_p.h to pull in the target-dependent prototypes.  The following patch
works for me.  Is this okay?

	* sancov.c: Include tm_p.h.

Index: sancov.c
===
--- sancov.c	(revision 251817)
+++ sancov.c	(working copy)
@@ -28,6 +28,7 @@
 #include "basic-block.h"
 #include "options.h"
 #include "flags.h"
+#include "tm_p.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
 #include "gimple-builder.h"
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
Just a followup on this patch.

We did some run-time performance testing internally on this set of
changes on a sparc M8 machine with -mmisalign and -mno-misalign, based
on the latest upstream gcc, for a CPU2017 C/C++ SPEED run:

***without -O, -mmisalign slows down the run-time performance about 4%
on average

This is mainly due to the following workaround to misaligned support in
M8 (config/sparc/sparc.c):

+/* for misaligned ld/st provided by M8, the IMM field is 10-bit wide
+   other than the 13-bit for regular ld/st.
+   The best solution for this problem is to distinguish each ld/st
+   whether it's aligned or misaligned.  However, due to the current
+   design of the common routine TARGET_LEGITIMATE_ADDRESS_P, only
+   the ADDR of a ld/st is passed to the routine, the align info
+   carried by the corresponding MEM is NOT passed in.  without changing
+   the prototype of TARGET_LEGITIMATE_ADDRESS_P, we cannot use this
+   best solution.
+   as a workaround, we have to conservatively treat ALL IMM fields of
+   a ld/st insn on a MISALIGNED target as 10-bit wide.
+   the side-effect of this workaround is: there will be additional
+   REG<-IMM insns generated for regular ld/st when -mmisalign is ON.
+   However, such additional reload insns should be very easily removed
+   by a set of optimizations whenever -O is specified.
+*/
+#define RTX_OK_FOR_OFFSET_P(X, MODE)			\
+  (CONST_INT_P (X)					\
+   && ((!TARGET_MISALIGN				\
+	&& INTVAL (X) >= -0x1000			\
+	&& INTVAL (X) <= (0x1000 - GET_MODE_SIZE (MODE)))\
+       || (TARGET_MISALIGN				\
+	   && INTVAL (X) >= -0x0400			\
+	   && INTVAL (X) <= (0x0400 - GET_MODE_SIZE (MODE)))))

Since the run-time regression introduced by this workaround is not
trivial, we decided to hold on this set of changes at this time.

Thanks.
Qing

> This set of changes is to provide a way to use misaligned load/store
> insns to implement compile-time-known unaligned memory accesses;
> -mno-misalign can be used to disable such behavior very easily if our
> performance data shows that misaligned load/store insns are slower
> than the current software emulation.
>
> Qing
Re: [PATCH, rs6000] Add support for vec_xst_len_r() and vec_xl_len_r() builtins
Hi Carl,

On Wed, Sep 06, 2017 at 08:22:03AM -0700, Carl Love wrote:
> 	(define_insn "*stxvl"): add missing argument to the sldi instruction.

s/add/Add/ .  This one-liner fix is approved right now, please commit it
as a separate patch.

> +(define_insn "addi_neg16"
> +  [(set (match_operand:DI 0 "vsx_register_operand" "=r")
> +	(unspec:DI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "r")]
> +	 UNSPEC_ADDI_NEG16))]
> +  ""
> +  "addi %0,%1,-16"
> +)

You don't need a separate insn (or unspec) for this at all afaics...
Where you do

  emit_insn (gen_addi_neg16 (tmp, operands[2]));

you could just do

  emit_insn (gen_adddi3 (tmp, operands[2], GEN_INT (-16)));

> +;; Load VSX Vector with Length, right justified
> +(define_expand "lxvll"
> +  [(set (match_dup 3)
> +	(match_operand:DI 2 "register_operand"))
> +   (set (match_operand:V16QI 0 "vsx_register_operand")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand")
> +	  (match_dup 3)]
> +	 UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})

Hrm, so you make a reg 3 only because the lxvll pattern will clobber it?

> +(define_insn "*lxvll"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "b")
> +	  (match_operand:DI 2 "register_operand" "+r")]
> +	 UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +;;  "lxvll %x0,%1,%2;"
> +  "sldi %2,%2, 56\; lxvll %x0,%1,%2;"
> +  [(set_attr "length" "8")
> +   (set_attr "type" "vecload")])

It is nicer to just have a match_scratch in here then, like

(define_insn "*lxvll"
  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
	(unspec:V16QI
	 [(match_operand:DI 1 "gpc_reg_operand" "b")
	  (match_operand:DI 2 "register_operand" "r")]
	 UNSPEC_LXVLL))
   (clobber (match_scratch:DI 3 "=&r"))]
  "TARGET_P9_VECTOR && TARGET_64BIT"
  "sldi %3,%2,56\;lxvll %x0,%1,%3"
  [(set_attr "length" "8")
   (set_attr "type" "vecload")])

(Note spacing, comment, ";" stuff, and the earlyclobber).

Ideally you split the sldi off in the expand though, so that the *lxvll
pattern is really just that single insn.

> +(define_insn "altivec_lvsl_reg"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "b")]
> +	 UNSPEC_LVSL_REG))]
> +  "TARGET_ALTIVEC"
> +  "lvsl %0,0,%1"
> +  [(set_attr "type" "vecload")])

vecload isn't really the correct type for this, but I see we have the
same on the existing lvsl patterns (it's permute unit on p9; I expect
the same on p8 and older, but please check).  Please move this next to
the existing lvsl pattern.

> +;; Expand for builtin xl_len_r
> +(define_expand "xl_len_r"
> +  [(match_operand:V16QI 0 "vsx_register_operand" "=v")
> +   (match_operand:DI 1 "register_operand" "r")
> +   (match_operand:DI 2 "register_operand" "r")]
> +  "UNSPEC_XL_LEN_R"
> +{
> +  rtx shift_mask = gen_reg_rtx (V16QImode);
> +  rtx rtx_vtmp = gen_reg_rtx (V16QImode);
> +  rtx tmp = gen_reg_rtx (DImode);
> +
> +/* Setup permute vector to shift right by operands[2] bytes.
> +   Note: addi operands[2], -16 is negative so we actually need to
> +   shift left to get a right shift.  */

Indent the comment with the code, so that's 2 spaces more here.

The comment isn't clear to me...  Neither is the code though: lvsl
looks at just the low 4 bits of its arg, so the addi does nothing
useful?  Maybe I am missing something.

> +  emit_insn (gen_addi_neg16 (tmp, operands[2]));
> +  emit_insn (gen_altivec_lvsl_reg (shift_mask, tmp));
> +  emit_insn (gen_lxvll (rtx_vtmp, operands[1], operands[2]));
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], rtx_vtmp,
> +					  rtx_vtmp, shift_mask));

> +;; Store VSX Vector with Length, right justified

_left_ justified?

> +(define_expand "stxvll"
> +  [(set (match_dup 3)
> +	(match_operand:DI 2 "register_operand"))
> +   (set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand"))
> +	(unspec:V16QI
> +	 [(match_operand:V16QI 0 "vsx_register_operand")
> +	  (match_dup 3)]
> +	 UNSPEC_STXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-5-p9-runnable.c
> @@ -0,0 +1,309 @@
> +/* { dg-do run { target { powerpc64*-*-* && { p9vector_hw } } } } */

This should be powerpc*-*-* I think?  Does it need braces around
p9vector_hw?


Segher
C++ PATCH for c++/82070, error with nested lambda capture
I was expecting that references to capture proxies would be resolved in
the reconstructed lambda by normal name lookup, but that doesn't work in
decltype, and processing the nested lambda really wants to find the new
capture proxy, not the captured variable.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit f9a1fe6d129418e72c68d0d1d9d35089ba7817b2
Author: Jason Merrill
Date:   Wed Sep 6 13:41:58 2017 -0400

            PR c++/82070 - error with nested lambda capture

            * pt.c (tsubst_expr) [DECL_EXPR]: Register capture proxies with
            register_local_specialization.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index eb27f6a..4a65e31 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15985,8 +15985,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       else if (is_capture_proxy (decl)
	       && !DECL_TEMPLATE_INSTANTIATION (current_function_decl))
	{
-	  /* We're in tsubst_lambda_expr, we've already inserted new capture
-	     proxies, and uses will find them with lookup_name.  */
+	  /* We're in tsubst_lambda_expr, we've already inserted a new
+	     capture proxy, so look it up and register it.  */
+	  tree inst = lookup_name (DECL_NAME (decl));
+	  gcc_assert (inst != decl && is_capture_proxy (inst));
+	  register_local_specialization (inst, decl);
	  break;
	}
       else if (DECL_IMPLICIT_TYPEDEF_P (decl)
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C
new file mode 100644
index 000..7403315
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C
@@ -0,0 +1,17 @@
+// PR c++/82070
+// { dg-do compile { target c++11 } }
+
+namespace a {
+template <class b>
+void
+c (int, int, b d)
+{
+  [d] { [d] {}; };
+}
+}
+void
+e ()
+{
+  int f;
+  a::c (f, 3, [] {});
+}
Re: [Patch, fortran] Parameterized Derived Types
Thanks for your tireless efforts on this, Paul!  I look forward to
trying this out after it hits the trunk.

Your phrase “last unimplemented F2003 feature” bolsters my suspicion
that it might be ok to switch the features listed as “Partial” on the
Fortran wiki to “Yes."  I suppose the difference depends on developer
intent.  If the developer(s) intended to leave some aspect of a feature
unimplemented (as might be evidenced by an appropriate compiler
message), then “Partial” seems best.  Otherwise, “Yes” seems appropriate
even in the presence of bugs.  I’ll send a separate email to the list
with further thoughts on this.

Best Regards,

___
Damian Rouson, Ph.D., P.E.
President, Sourcery Institute
www.sourceryinstitute.org
+1-510-600-2992 (mobile)

On September 6, 2017 at 6:04:47 AM, Paul Richard Thomas
(paul.richard.tho...@gmail.com) wrote:

> Dear All,
>
> Since my message to the list of 16 August 2017 I have put in another
> intense period of activity to develop a patch to implement PDTs in
> gfortran. I have now temporarily run out of time to develop it
> further; partly because of a backlog of other patches and PRs to deal
> with but also pressure from daytime work.
>
> The patch adds the last unimplemented F2003 feature to gfortran.
>
> As in the provisional patch, I have attached some notes on the
> implementation. This indicates some of the weaknesses, problem areas
> and TODOs.
>
> Suggest that a good read of Mark Leair's excellent PGInsider article
> on PDTs - http://www.pgroup.com/lit/articles/insider/v5n2a4.htm is a
> worthwhile exercise.
>
> To judge by the complete silence following my previous message, I will
> have a problem getting this patch reviewed. I would welcome any
> remarks or reviews but intend to commit, warts and all, on Saturday
> unless something fundamentally wrong comes out of the woodwork.
> > Note that the PDT parts in the compiler are rather well insulated from > the rest of fortran and that I do not believe that any regressions > will result. > > I hope that a month or two of testing in other hands will add to the > list of TODOs and that when I return to PDTs a greatly improved > version will result. > > Bootstrapped and regtested on FC23/x86_4 - OK for trunk? (Note above > remark about committing on Saturday in the absence of a review.) > > Best regards > > Paul > > 2017-09-05 Paul Thomas > > * decl.c : Add decl_type_param_list, type_param_spec_list as > static variables to hold PDT spec lists. > (build_sym): Copy 'type_param_spec_list' to symbol spec_list. > (build_struct): Copy the 'saved_kind_expr' to the component > 'kind_expr'. Check that KIND or LEN components appear in the > decl_type_param_list. These should appear as symbols in the > f2k_derived namespace. If the component is itself a PDT type, > copy the decl_type_param_list to the component param_list. > (gfc_match_kind_spec): If the KIND expression is parameterized > set KIND to zero and store the expression in 'saved_kind_expr'. > (insert_parameter_exprs): New function. > (gfc_insert_kind_parameter_exprs): New function. > (gfc_insert_parameter_exprs): New function. > (gfc_get_pdt_instance): New function. > (gfc_match_decl_type_spec): Match the decl_type_spec_list if it > is present. If it is, call 'gfc_get_pdt_instance' to obtain the > specific instance of the PDT. > (match_attr_spec): Match KIND and LEN attributes. Check for the > standard and for type/kind of the parameter. They are also not > allowed outside a derived type definition. > (gfc_match_data_decl): Null the decl_type_param_list and the > type_param_spec_list on entry and free them on exit. > (gfc_match_formal_arglist): If 'typeparam' is true, add the > formal symbol to the f2k_derived namespace. > (gfc_match_derived_decl): Register the decl_type_param_list > if this is a PDT. 
If this is a type extension, gather up all > the type parameters and put them in the right order. > *dump-parse-tree.c (show_attr): Signal PDT templates and the > parameter attributes. > (show_components): Output parameter atrributes and component > parameter list. > (show_symbol): Show variable parameter lists. > * expr.c (expr.c): Copy the expression parameter list. > (gfc_is_constant_expr): Pass on symbols representing PDT > parameters. > (gfc_check_init_expr): Break on PDT KIND parameters and > PDT parameter expressions. > (gfc_check_assign): Assigning to KIND or LEN components is an > error. > (derived_parameter_expr): New function. > (gfc_derived_parameter_expr): New function. > (gfc_spec_list_type): New function. > * gfortran.h : Add enum gfc_param_spec_type. Add the PDT attrs > to the structure symbol_attr. Add the 'kind_expr' and > 'param_list' field to the gfc_component structure. Comment on > the reuse of the gfc_actual_arglist structure as storage for > type parameter spec lists. Add the new field 'spec_type' to > this structure.
Re: [Patch, fortran] Parameterized Derived Types
Hi Paul, thanks for your patch! It's really great to finally see PDTs come to gfortran. You're a hero, man ;) Also: Sorry about the silence. It's certainly not due to lack of interest, but rather lack of time (day job and private life taking up all of mine at the moment). In my current situation I can not promise a complete review of this beast of a patch, but I will try to do some testing and at least skim over the diff. I will probably not get to it before the weekend, though. Cheers, Janus 2017-09-06 15:04 GMT+02:00 Paul Richard Thomas : > Dear All, > > Since my message to the list of 16 August 2017 I have put in another > intense period of activity to develop a patch to implement PDTs in > gfortran. I have now temporarily run out of time to develop it > further; partly because of a backlog of other patches and PRs to deal > with but also pressure from daytime work. > > The patch adds the last unimplemented F2003 feature to gfortran. > > As in the provisional patch, I have attached some notes on the > implementation. This indicates some of the weaknesses, problem areas > and TODOs. > > Suggest that a good read of Mark Leair's excellent PGInsider article > on PDTs - http://www.pgroup.com/lit/articles/insider/v5n2a4.htm is a > worthwhile exercise. > > To judge by the complete silence following my previous message, I will > have a problem getting this patch reviewed. I would welcome any > remarks or reviews but intend to commit, warts and all, on Saturday > unless something fundamentally wrong comes out of the woodwork. > > Note that the PDT parts in the compiler are rather well insulated from > the rest of fortran and that I do not believe that any regressions > will result. > > I hope that a month or two of testing in other hands will add to the > list of TODOs and that when I return to PDTs a greatly improved > version will result. > > Bootstrapped and regtested on FC23/x86_4 - OK for trunk? 
(Note above > remark about committing on Saturday in the absence of a review.) > > Best regards > > Paul > > 2017-09-05 Paul Thomas > > * decl.c : Add decl_type_param_list, type_param_spec_list as > static variables to hold PDT spec lists. > (build_sym): Copy 'type_param_spec_list' to symbol spec_list. > (build_struct): Copy the 'saved_kind_expr' to the component > 'kind_expr'. Check that KIND or LEN components appear in the > decl_type_param_list. These should appear as symbols in the > f2k_derived namespace. If the component is itself a PDT type, > copy the decl_type_param_list to the component param_list. > (gfc_match_kind_spec): If the KIND expression is parameterized > set KIND to zero and store the expression in 'saved_kind_expr'. > (insert_parameter_exprs): New function. > (gfc_insert_kind_parameter_exprs): New function. > (gfc_insert_parameter_exprs): New function. > (gfc_get_pdt_instance): New function. > (gfc_match_decl_type_spec): Match the decl_type_spec_list if it > is present. If it is, call 'gfc_get_pdt_instance' to obtain the > specific instance of the PDT. > (match_attr_spec): Match KIND and LEN attributes. Check for the > standard and for type/kind of the parameter. They are also not > allowed outside a derived type definition. > (gfc_match_data_decl): Null the decl_type_param_list and the > type_param_spec_list on entry and free them on exit. > (gfc_match_formal_arglist): If 'typeparam' is true, add the > formal symbol to the f2k_derived namespace. > (gfc_match_derived_decl): Register the decl_type_param_list > if this is a PDT. If this is a type extension, gather up all > the type parameters and put them in the right order. > *dump-parse-tree.c (show_attr): Signal PDT templates and the > parameter attributes. > (show_components): Output parameter atrributes and component > parameter list. > (show_symbol): Show variable parameter lists. > * expr.c (expr.c): Copy the expression parameter list. 
> (gfc_is_constant_expr): Pass on symbols representing PDT > parameters. > (gfc_check_init_expr): Break on PDT KIND parameters and > PDT parameter expressions. > (gfc_check_assign): Assigning to KIND or LEN components is an > error. > (derived_parameter_expr): New function. > (gfc_derived_parameter_expr): New function. > (gfc_spec_list_type): New function. > * gfortran.h : Add enum gfc_param_spec_type. Add the PDT attrs > to the structure symbol_attr. Add the 'kind_expr' and > 'param_list' field to the gfc_component structure. Comment on > the reuse of the gfc_actual_arglist structure as storage for > type parameter spec lists. Add the new field 'spec_type' to > this structure. Add 'param_list' fields to gfc_symbol and > gfc_expr. Add prototypes for gfc_insert_kind_parameter_exprs, > gfc_insert_parameter_exprs, gfc_add_kind, gfc_add_len, > gfc_derived_parameter_expr and gfc_spec_list_type. > *
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford writes: > Richard Sandiford writes: >> Michael Collison writes: >>> Richard, >>> >>> The problem with this approach for Aarch64 is that >>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is >>> normally 0 as it based on the TARGET_SIMD flag. >> >> Maybe I'm wrong, but that seems like a missed optimisation in itself. Sorry to follow up on myself yet again, but I'd forgotten this was because we allow the SIMD unit to do scalar shifts. So I guess we have no choice, even though it seems unfortunate. > +(define_insn_and_split "*aarch64_reg__minus3" > + [(set (match_operand:GPI 0 "register_operand" "=&r") > + (ASHIFT:GPI > + (match_operand:GPI 1 "register_operand" "r") > + (minus:QI (match_operand 2 "const_int_operand" "n") > + (match_operand:QI 3 "register_operand" "r"] > + "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)" > + "#" > + "&& true" > + [(const_int 0)] > + { > +/* Handle cases where operand 3 is a plain QI register, or > + a subreg with either a SImode or DImode register. */ > + > +rtx subreg_tmp = (REG_P (operands[3]) > + ? gen_lowpart_SUBREG (SImode, operands[3]) > + : SUBREG_REG (operands[3])); > + > +if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode) > + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp); I think this all simplifies to: rtx subreg_tmp = gen_lowpart (SImode, operands[3]); (or it would be worth having a comment that explains why not). As well as being shorter, it will properly simplify hard REGs to new hard REGs. > +rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) > +: operands[0]); > + > +if (mode == DImode && !can_create_pseudo_p ()) > + tmp = gen_lowpart_SUBREG (SImode, operands[0]); I think this too would be simpler with gen_lowpart: rtx tmp = (can_create_pseudo_p () ? 
gen_reg_rtx (SImode) : gen_lowpart (SImode, operands[0])); > + > +emit_insn (gen_negsi2 (tmp, subreg_tmp)); > + > +rtx and_op = gen_rtx_AND (SImode, tmp, > + GEN_INT (GET_MODE_BITSIZE (mode) - 1)); > + > +rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op); > + > +emit_insn (gen_3 (operands[0], operands[1], subreg_tmp2)); > +DONE; > + } > +) The pattern should probably set the "length" attribute to 8. Looks good to me with those changes FWIW. Thanks, Richard
Re: [PATCH], Enable -mfloat128 by default on PowerPC VSX systems
On Wed, Sep 06, 2017 at 01:48:38AM -0400, Michael Meissner wrote: > Here is a respin of the patch to enable -mfloat128 on PowerPC Linux systems > now > that the libquadmath patch has been applied. I rebased the patches against > the > top of the trunk on Tuesday (subversion id 251609). > > I tweaked the documentation a bit based on your comments. > > I built the patch on the following systems. There are no regressions, and the > tests float128-type-{1,2}.c now pass (previously they had regressed due to > other float128 changes). > > * Power7, bootstrap, big endian, --with-cpu=power7 > * Power7, bootstrap, big endian, --with-cpu=power5 > * Power8, bootstrap, little endian, --with-cpu=power8 > * Power9 prototype bootstrap, little endian, --with-cpu=power9 > > Can I check these patches into the trunk? It looks fine, please commit. Thanks! Segher
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford writes: > Michael Collison writes: >> Richard, >> >> The problem with this approach for Aarch64 is that >> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is >> normally 0 as it based on the TARGET_SIMD flag. > > Maybe I'm wrong, but that seems like a missed optimisation in itself. > Like you say, the definition is: > > static unsigned HOST_WIDE_INT > aarch64_shift_truncation_mask (machine_mode mode) > { > return > (!SHIFT_COUNT_TRUNCATED >|| aarch64_vector_mode_supported_p (mode) >|| aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE >|| (mode) - 1); > } er, aarch64_shift_truncation_mask (machine_mode mode) { return (!SHIFT_COUNT_TRUNCATED || aarch64_vector_mode_supported_p (mode) || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1); } > SHIFT_COUNT_TRUNCATED is: > > #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD) > > and aarch64_vector_mode_supported_p always returns false for > !TARGET_SIMD: > > static bool > aarch64_vector_mode_supported_p (machine_mode mode) > { > if (TARGET_SIMD > && (mode == V4SImode || mode == V8HImode > || mode == V16QImode || mode == V2DImode > || mode == V2SImode || mode == V4HImode > || mode == V8QImode || mode == V2SFmode > || mode == V4SFmode || mode == V2DFmode > || mode == V4HFmode || mode == V8HFmode > || mode == V1DFmode)) > return true; > > return false; > } > > So when does the second || condition fire? > > I'm surprised the aarch64_vect_struct_mode_p part is needed, since > this hook describes the shift optabs, and AArch64 don't provide any > shift optabs for OI, CI or XI. 
> > Thanks, > Richard > >> -Original Message- >> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >> Sent: Wednesday, September 6, 2017 11:32 AM >> To: Michael Collison >> Cc: Richard Biener ; Richard Kenner >> ; GCC Patches ; nd >> ; Andrew Pinski >> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >> >> Michael Collison writes: >>> Richard Sandiford do you have any objections to the patch as it stands? >>> It doesn't appear as if anything is going to change in the mid-end >>> anytime soon. >> >> I think one of the suggestions was to do it in expand, taking >> advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would >> be like the current FMA_EXPR handling in expand_expr_real_2. >> >> I know there was talk about cleaner approaches, but at least doing the >> above seems cleaner than doing in the backend. It should also be a >> nicely-contained piece of work. >> >> Thanks, >> Richard >> >>> -Original Message- >>> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >>> Sent: Tuesday, August 22, 2017 9:11 AM >>> To: Richard Biener >>> Cc: Richard Kenner ; Michael Collison >>> ; GCC Patches ; nd >>> ; Andrew Pinski >>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >>> >>> Richard Biener writes: On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford wrote: > Richard Biener writes: >> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >> wrote: >>>Richard Biener writes: On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner wrote: >> Correct. It is truncated for integer shift, but not simd shift >> instructions. We generate a pattern in the split that only >>>generates >> the integer shift instructions. > > That's unfortunate, because it would be nice to do this in >>>simplify_rtx, > since it's machine-independent, but that has to be conditioned > on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. SHIFT_COUNT_TRUNCATED should go ... 
you should express this in the patterns, like for example with (define_insn ashlSI3 [(set (match_operand 0 "") (ashl:SI (match_operand ... ) (subreg:QI (match_operand:SI ...)))] or an explicit and:SI and combine / simplify_rtx should apply the >>>magic optimization we expect. >>> >>>The problem with the explicit AND is that you'd end up with either >>>an AND of two constants for constant shifts, or with two separate >>>patterns, one for constant shifts and one for variable shifts. >>>(And the problem in theory with two patterns is that it reduces the >>>RA's freedom, although in practice I guess we'd always want a >>>constant shift where possible for cost reasons, and so the RA would >>>never need to replace pseudos with constants itself.) >>> >>>I think all useful instances of this optimisation will be exposed >>>by the gimple optimisers, so maybe expand could to do it based on >>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Michael Collison writes: > Richard, > > The problem with this approach for Aarch64 is that > TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is > normally 0 as it based on the TARGET_SIMD flag. Maybe I'm wrong, but that seems like a missed optimisation in itself. Like you say, the definition is: static unsigned HOST_WIDE_INT aarch64_shift_truncation_mask (machine_mode mode) { return (!SHIFT_COUNT_TRUNCATED || aarch64_vector_mode_supported_p (mode) || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1); } SHIFT_COUNT_TRUNCATED is: #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD) and aarch64_vector_mode_supported_p always returns false for !TARGET_SIMD: static bool aarch64_vector_mode_supported_p (machine_mode mode) { if (TARGET_SIMD && (mode == V4SImode || mode == V8HImode || mode == V16QImode || mode == V2DImode || mode == V2SImode || mode == V4HImode || mode == V8QImode || mode == V2SFmode || mode == V4SFmode || mode == V2DFmode || mode == V4HFmode || mode == V8HFmode || mode == V1DFmode)) return true; return false; } So when does the second || condition fire? I'm surprised the aarch64_vect_struct_mode_p part is needed, since this hook describes the shift optabs, and AArch64 don't provide any shift optabs for OI, CI or XI. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Wednesday, September 6, 2017 11:32 AM > To: Michael Collison > Cc: Richard Biener ; Richard Kenner > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Michael Collison writes: >> Richard Sandiford do you have any objections to the patch as it stands? >> It doesn't appear as if anything is going to change in the mid-end >> anytime soon. > > I think one of the suggestions was to do it in expand, taking advantage of > range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current > FMA_EXPR handling in expand_expr_real_2. 
> > I know there was talk about cleaner approaches, but at least doing the above > seems cleaner than doing in the backend. It should also be a > nicely-contained piece of work. > > Thanks, > Richard > >> -Original Message- >> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >> Sent: Tuesday, August 22, 2017 9:11 AM >> To: Richard Biener >> Cc: Richard Kenner ; Michael Collison >> ; GCC Patches ; nd >> ; Andrew Pinski >> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >> >> Richard Biener writes: >>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >>> wrote: Richard Biener writes: > On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford > wrote: >>Richard Biener writes: >>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >>> wrote: > Correct. It is truncated for integer shift, but not simd shift > instructions. We generate a pattern in the split that only >>generates > the integer shift instructions. That's unfortunate, because it would be nice to do this in >>simplify_rtx, since it's machine-independent, but that has to be conditioned on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >>> >>> SHIFT_COUNT_TRUNCATED should go ... you should express this in >>> the patterns, like for example with >>> >>> (define_insn ashlSI3 >>> [(set (match_operand 0 "") >>> (ashl:SI (match_operand ... ) >>> (subreg:QI (match_operand:SI ...)))] >>> >>> or an explicit and:SI and combine / simplify_rtx should apply the >>magic >>> optimization we expect. >> >>The problem with the explicit AND is that you'd end up with either >>an AND of two constants for constant shifts, or with two separate >>patterns, one for constant shifts and one for variable shifts. >>(And the problem in theory with two patterns is that it reduces the >>RA's freedom, although in practice I guess we'd always want a >>constant shift where possible for cost reasons, and so the RA would >>never need to replace pseudos with constants itself.) 
>> >>I think all useful instances of this optimisation will be exposed >>by the gimple optimisers, so maybe expand could to do it based on >>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >>the rtx code and it does take the mode into account. > > Sure, that could work as well and also take into account range info. > But we'd then need named expanders and the result would still have > the explicit and or need to be an unspec or a different RTL operation. Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have target-dependent rather than undefine
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
On Wed, Sep 06, 2017 at 06:26:10PM +0200, Jakub Jelinek wrote: > > Maybe this "switch to the other section" thing should be abstracted out? > > Messing with in_cold_section_p is a bit dirty. > > But it reflects the reality, and is what final.c and varasm.c also do. Yes, but those aren't target code :-) I'm suggesting adding a generic switch_from_hot_to_cold_or_the_other_way_around function (but with a better name ;-) ) that just does these same two lines, only not in target code. Seems cleaner to me, less surprising. But, okay either way. Segher
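For reference, the helper being suggested would only need to wrap the two lines that the rs6000 patch open-codes; something like the following sketch (GCC-internal code, not standalone — and the name is a placeholder, since Segher deliberately left the naming open):

```c
/* Sketch of the suggested abstraction: switch output from the hot
   partition to the cold one, or vice versa, keeping the
   in_cold_section_p flag consistent with the section actually in
   use, as final.c and varasm.c currently do by hand.  */
static void
switch_to_other_text_partition (void)
{
  in_cold_section_p = !in_cold_section_p;
  switch_to_section (current_function_section ());
}
```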
[committed][Testsuite] PR78468 - add alloca alignment test
Add an alignment test to check that aligned alloca's really do get correctly aligned. Some targets may not ensure SP is always a multiple of STACK_BOUNDARY (particularly with outgoing arguments), which means aligned alloca does not get correctly aligned. This can be fixed either by aligning the outgoing arguments or setting STACK_BOUNDARY correctly. Committed as obvious. ChangeLog: 2017-09-06 Wilco Dijkstra PR middle-end/78468 * gcc.dg/pr78468.c: Add alignment test. -- diff --git a/gcc/testsuite/gcc.dg/pr78468.c b/gcc/testsuite/gcc.dg/pr78468.c new file mode 100644 index ..68eb83a0868c16327e36055aae4eea34fc2ba35e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr78468.c @@ -0,0 +1,102 @@ +/* { dg-do run } */ +/* { dg-require-effective-target alloca } */ +/* { dg-options "-O2 -fno-inline" } */ + +/* Test that targets correctly round the size of the outgoing arguments + to a multiple of STACK_BOUNDARY. There is a serious alignment bug if + aligned alloca does not get aligned! */ + +__extension__ typedef __UINTPTR_TYPE__ uintptr_t; +extern void abort (void); + +volatile int xx; +volatile int x = 16; + +void +t1 (int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, +void *p, int align) +{ + xx = x0 + x1 + x2 + x3 + x4 + x4 + x6 + x7; + if ((int)(uintptr_t)p & (align-1)) +abort (); +} + +void +t2 (int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, +void *p, int align, int dummy) +{ + xx = x0 + x1 + x2 + x3 + x4 + x4 + x6 + x7; + if ((int)(uintptr_t)p & (align-1)) +abort (); +} + +void +t1_a4 (int size) +{ + void *p = __builtin_alloca_with_align (size, 32); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 4); +} + +void +t2_a4 (int size) +{ + void *p = __builtin_alloca_with_align (size, 32); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 4, 0); +} + +void +t1_a8 (int size) +{ + void *p = __builtin_alloca_with_align (size, 64); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 8); +} + +void +t2_a8 (int size) +{ + void *p = __builtin_alloca_with_align (size, 64); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 8, 
0); +} + +void +t1_a16 (int size) +{ + void *p = __builtin_alloca_with_align (size, 128); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 16); +} + +void +t2_a16 (int size) +{ + void *p = __builtin_alloca_with_align (size, 128); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 16, 0); +} + +void +t1_a32 (int size) +{ + void *p = __builtin_alloca_with_align (size, 256); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 32); +} + +void +t2_a32 (int size) +{ + void *p = __builtin_alloca_with_align (size, 256); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 32, 0); +} + + +int +main () +{ + t1_a4 (x); + t2_a4 (x); + t1_a8 (x); + t2_a8 (x); + t1_a16 (x); + t2_a16 (x); + t1_a32 (x); + t2_a32 (x); + return 0; +}
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
On Wed, Sep 06, 2017 at 11:10:07AM -0500, Segher Boessenkool wrote: > >for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > > { > > > if (INSN_P (insn)) > > @@ -25270,10 +25273,14 @@ uses_TOC (void) > > sub = XEXP (sub, 0); > > if (GET_CODE (sub) == UNSPEC > > && XINT (sub, 1) == UNSPEC_TOC) > > - return 1; > > + return ret; > > } > > } > >} > > +else if (crtl->has_bb_partition > > +&& NOTE_P (insn) > > +&& NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > > + ret = 2; > > } Ok. > > + if (uses_toc == 2) I could repeat the crtl->has_bb_partition test here if it made things clearer, but it is redundant with the above. > > + { > > + in_cold_section_p = !in_cold_section_p; > > + switch_to_section (current_function_section ()); > > + } > >(*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno); > > > >fprintf (file, "\t.long "); > > @@ -33321,6 +4,11 @@ rs6000_elf_declare_function_name (FILE * > >ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno); > >assemble_name (file, buf); > >putc ('\n', file); > > + if (uses_toc == 2) > > + { > > + in_cold_section_p = !in_cold_section_p; > > + switch_to_section (current_function_section ()); > > + } > > } > > Hrm, does that work if not hot/cold partitioning? Oh, that cannot happen > because uses_toc==2. Tricky. > > Maybe this "switch to the other section" thing should be abstracted out? > Messing with in_cold_section_p is a bit dirty. But it reflects the reality, and is what final.c and varasm.c also do. Without changing in_cold_section_p, that flag will be incorrect while inside of the other section. There are no switch_to_* functions except to switch_to_section, and as argument that can use current_function_section which uses the in_cold_section_p flag, or unlikely_text_section which hardcodes true for in cold, or function_section which uses first_function_block_is_cold. 
Even if we introduced function_other_section that used !first_function_block_is_cold the in_cold_section_p flag would be incorrect there. Jakub
[PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
GCC Maintainers: The following patch adds support for a couple of requested builtins that convert from float/double to int / long using the current rounding mode. The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE). Please let me know if the following patch is acceptable. Thanks. Carl Love --- gcc/ChangeLog: 2017-09-06 Carl Love * config/rs6000/rs6000-builtin.def (FCTID, FCTIW): Add BU_P7_MISC_1 macro expansion for builtins. * config/rs6000/rs6000.md (fctid, fctiw): Add define_insn for the fctid and fctiw instructions. gcc/testsuite/ChangeLog: 2017-09-06 Carl Love * gcc.target/powerpc/builtin-fctid-fctiw-runnable.c: New test file for the __builtin_fctid and __builtin_fctiw builtins. --- gcc/config/rs6000/rs6000-builtin.def | 2 + gcc/config/rs6000/rs6000.md| 18 +++ .../powerpc/builtin-fctid-fctiw-runnable.c | 138 + 3 files changed, 158 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 850164a..7affa30 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2231,6 +2231,8 @@ BU_DFP_MISC_2 (DSCRIQ,"dscriq", CONST, dfp_dscri_td) /* 1 argument BCD functions added in ISA 2.06. */ BU_P7_MISC_1 (CDTBCD, "cdtbcd", CONST, cdtbcd) BU_P7_MISC_1 (CBCDTD, "cbcdtd", CONST, cbcdtd) +BU_P7_MISC_1 (FCTID, "fctid",CONST, fctid) +BU_P7_MISC_1 (FCTIW, "fctiw",CONST, fctiw) /* 2 argument BCD functions added in ISA 2.06. 
*/ BU_P7_MISC_2 (ADDG6S, "addg6s", CONST, addg6s) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 20873ac..a5cbef5 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -14054,6 +14054,24 @@ [(set_attr "type" "integer") (set_attr "length" "4")]) +(define_insn "fctid" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(match_operand:DF 1 "register_operand" "f")] + UNSPEC_FCTID))] + "" + "fctid %1,%1; mfvsrd %0,%1" + [(set_attr "type" "two") + (set_attr "length" "8")]) + +(define_insn "fctiw" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI [(match_operand:DF 1 "register_operand" "f")] + UNSPEC_FCTIW))] + "" + "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0" + [(set_attr "type" "integer") + (set_attr "length" "4")]) + (define_int_iterator UNSPEC_DIV_EXTEND [UNSPEC_DIVE UNSPEC_DIVEO UNSPEC_DIVEU diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c new file mode 100644 index 000..79c5341 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c @@ -0,0 +1,138 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +#ifdef DEBUG +#include +#endif + +void abort (void); + +long +test_bi_lrint_1 (float __A) +{ + return (__builtin_fctid (__A)); +} +long +test_bi_lrint_2 (double __A) +{ + return (__builtin_fctid (__A)); +} + +int +test_bi_rint_1 (float __A) +{ + return (__builtin_fctiw (__A)); +} + +int +test_bi_rint_2 (double __A) +{ + return (__builtin_fctiw (__A)); +} + + +int main( void) +{ + signed long lx, expected_l; + double dy; + + signed int x, expected_i; + float y; + + dy = 1.45; + expected_l = 1; + lx = __builtin_fctid (dy); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: __builtin_fctid(dy= %f) = %ld, expected %ld\n", + dy, lx, expected_l); +#else +abort(); +#endif 
+ + dy = 3.51; + expected_l = 4; + lx = __builtin_fctid (dy); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: __builtin_fctid(dy= %f) = %ld, expected %ld\n", + dy, lx, expected_l); +#else +abort(); +#endif + + dy = 5.57; + expected_i = 6; + x = __builtin_fctiw (dy); + + if( x != expected_i) +#ifdef DEBUG +printf("ERROR: __builtin_fctiw(dy= %f) = %d, expected %d\n", + dy, x, expected_i); +#else +abort(); +#endif + + y = 11.47; + expected_i = 11; + x = __builtin_fctiw (y); + + if( x != expected_i) +#ifdef DEBUG +printf("ERROR: __builtin_fctiw(y = %f) = %d, expected %d\n", + y, x, expected_i); +#else +abort(); +#endif + + y = 17.77; + expected_l = 18; + lx = test_bi_lrint_1 (y); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: function call test_bi_lrint_1 (y = %f) = %ld, expected %ld\n", + y, lx, expected_l); +#else
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
Hi, On Tue, Sep 05, 2017 at 11:27:25PM +0200, Jakub Jelinek wrote: > On powerpc with sysv4 -fPIC we emit something like > .LCL0: > .long .LCTOC1-.LCF0 > before we start emitting the function, and in the prologue we emit > .LCF0: > and some code. This fails to assemble if the prologue is emitted in a > different partition from the start of the function, as e.g. the following > testcase, where the start of the function is hot, i.e. in .text section, > but the shrink-wrapped prologue is cold, emitted in .text.unlikely section. > .LCL0 is still emitted in the section the function starts, thus .text, and > there is no relocation for subtraction of two symbols in other sections > (the second - operand has to be in the current section so that a PC-relative > relocation can be used). This probably never worked, but is now more > severe, as we enable hot/cold partitioning in GCC 8, where it > has been previously only enabled for -fprofile-use. I wonder if that helps performance at all, for rs6000 anyway... It's is a never-ending source of ICEs though :-( > --- gcc/config/rs6000/rs6000.c.jj 2017-09-04 09:55:28.0 +0200 > +++ gcc/config/rs6000/rs6000.c2017-09-04 16:36:49.033213325 +0200 > @@ -25248,12 +25248,15 @@ get_TOC_alias_set (void) > > /* This returns nonzero if the current function uses the TOC. This is > determined by the presence of (use (unspec ... UNSPEC_TOC)), which > - is generated by the ABI_V4 load_toc_* patterns. */ > + is generated by the ABI_V4 load_toc_* patterns. > + Return 2 instead of 1 if the load_toc_* pattern is in the function > + partition that doesn't start the function. 
*/ > #if TARGET_ELF > static int > uses_TOC (void) > { >rtx_insn *insn; > + int ret = 1; > >for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) { > if (INSN_P (insn)) > @@ -25270,10 +25273,14 @@ uses_TOC (void) > sub = XEXP (sub, 0); > if (GET_CODE (sub) == UNSPEC > && XINT (sub, 1) == UNSPEC_TOC) > - return 1; > + return ret; > } > } >} > +else if (crtl->has_bb_partition > + && NOTE_P (insn) > + && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > + ret = 2; } >return 0; > } > #endif > @@ -33304,14 +33311,20 @@ rs6000_elf_declare_function_name (FILE * >return; > } > > + int uses_toc; >if (DEFAULT_ABI == ABI_V4 >&& (TARGET_RELOCATABLE || flag_pic > 1) >&& !TARGET_SECURE_PLT >&& (!constant_pool_empty_p () || crtl->profile) > - && uses_TOC ()) > + && (uses_toc = uses_TOC ())) > { >char buf[256]; > > + if (uses_toc == 2) > + { > + in_cold_section_p = !in_cold_section_p; > + switch_to_section (current_function_section ()); > + } >(*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno); > >fprintf (file, "\t.long "); > @@ -33321,6 +4,11 @@ rs6000_elf_declare_function_name (FILE * >ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno); >assemble_name (file, buf); >putc ('\n', file); > + if (uses_toc == 2) > + { > + in_cold_section_p = !in_cold_section_p; > + switch_to_section (current_function_section ()); > + } > } Hrm, does that work if not hot/cold partitioning? Oh, that cannot happen because uses_toc==2. Tricky. Maybe this "switch to the other section" thing should be abstracted out? Messing with in_cold_section_p is a bit dirty. Otherwise looks okay; please add the {} in the first fragment. Thanks, Segher
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Patch updated with all relevant comments and suggestions. Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu and x86_64. Ok for trunk? 2017-08-05 Kyrylo Tkachov Michael Collison * compare-elim.c: Include emit-rtl.h. (can_merge_compare_into_arith): New function. (try_validate_parallel): Likewise. (try_merge_compare): Likewise. (try_eliminate_compare): Call the above when no previous clobber is available. (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN dataflow problems. 2017-08-05 Kyrylo Tkachov Michael Collison * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. -Original Message- From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] Sent: Saturday, September 2, 2017 12:07 AM To: Kyrill Tkachov Cc: Jeff Law ; Michael Collison ; gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops Hi! On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote: > On 28/08/17 19:26, Jeff Law wrote: > >On 08/10/2017 03:14 PM, Michael Collison wrote: > >>One issue that we keep encountering on aarch64 is GCC not making > >>good use of the flag-setting arithmetic instructions like ADDS, > >>SUBS, ANDS etc. that perform an arithmetic operation and compare the > >>result against zero. > >>They are represented in a fairly standard way in the backend as > >>PARALLEL > >>patterns: > >>(parallel [(set (reg x1) (op (reg x2) (reg x3))) > >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int > >>0)))]) That is incorrect: the compare has to come first. From md.texi: @cindex @code{compare}, canonicalization of [ ... ] @item For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern. For example, [ ... ] aarch64.md seems to do this correctly, fwiw. > >>GCC isn't forming these from separate arithmetic and comparison > >>instructions as aggressively as it could. 
> >>A particular pain point is when the result of the arithmetic insn is > >>used before the comparison instruction. > >>The testcase in this patch is one such example where we have: > >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (plus:SI (reg:SI 0 x0 [ x ]) > >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > >> (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 34 33 17 2 (set (reg:CC 66 cc) > >> (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > >> (nil)) > >> > >>This scares combine away as x0 is used in insn 33 as well as the > >>comparison in insn 34. > >>I think the compare-elim pass can help us here. > >Is it the multiple use or the hard register that combine doesn't > >appreciate. The latter would definitely steer us towards compare-elim. > > It's the multiple use IIRC. Multiple use, and multiple set (of x1), and more complications... 7+33 won't combine to an existing insn. 7+34 will not even be tried (insn 33 is the first use of x0, not insn 34). But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so 7 cannot be merged into 34. 7+33+34 results in a parallel of a compare with the same invalid insn as in the 7+33 case. Combine would try to split it to two insns again, except it already has two insns (the arith and the compare). It does not see that when it splits the insn it can combine the first half with the compare. What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine. But combine tries to be linear complexity, and it really cannot change insns around anyway. Segher pr5198v2.patch Description: pr5198v2.patch
RE: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard, The problem with this approach for Aarch64 is that TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is normally 0 as it based on the TARGET_SIMD flag. -Original Message- From: Richard Sandiford [mailto:richard.sandif...@linaro.org] Sent: Wednesday, September 6, 2017 11:32 AM To: Michael Collison Cc: Richard Biener ; Richard Kenner ; GCC Patches ; nd ; Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts Michael Collison writes: > Richard Sandiford do you have any objections to the patch as it stands? > It doesn't appear as if anything is going to change in the mid-end > anytime soon. I think one of the suggestions was to do it in expand, taking advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current FMA_EXPR handling in expand_expr_real_2. I know there was talk about cleaner approaches, but at least doing the above seems cleaner than doing in the backend. It should also be a nicely-contained piece of work. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Tuesday, August 22, 2017 9:11 AM > To: Richard Biener > Cc: Richard Kenner ; Michael Collison > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Richard Biener writes: >> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford wrote: >Richard Biener writes: >> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >> wrote: Correct. It is truncated for integer shift, but not simd shift instructions. We generate a pattern in the split that only >generates the integer shift instructions. >>> >>> That's unfortunate, because it would be nice to do this in >simplify_rtx, >>> since it's machine-independent, but that has to be conditioned >>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. 
>> >> SHIFT_COUNT_TRUNCATED should go ... you should express this in >> the patterns, like for example with >> >> (define_insn ashlSI3 >> [(set (match_operand 0 "") >> (ashl:SI (match_operand ... ) >> (subreg:QI (match_operand:SI ...)))] >> >> or an explicit and:SI and combine / simplify_rtx should apply the >magic >> optimization we expect. > >The problem with the explicit AND is that you'd end up with either >an AND of two constants for constant shifts, or with two separate >patterns, one for constant shifts and one for variable shifts. >(And the problem in theory with two patterns is that it reduces the >RA's freedom, although in practice I guess we'd always want a >constant shift where possible for cost reasons, and so the RA would >never need to replace pseudos with constants itself.) > >I think all useful instances of this optimisation will be exposed >by the gimple optimisers, so maybe expand could to do it based on >TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >the rtx code and it does take the mode into account. Sure, that could work as well and also take into account range info. But we'd then need named expanders and the result would still have the explicit and or need to be an unspec or a different RTL operation. >>> >>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>> target-dependent rather than undefined behaviour, so it's OK for a >>> target to use shift codes with out-of-range values. >> >> Hmm, but that means simplify-rtx can't do anything with them because >> we need to preserve target dependent behavior. > > Yeah, it needs to punt. In practice that shouldn't matter much. > >> I think the RTL IL should be always well-defined and its semantics >> shouldn't have any target dependences (ideally, and if, then they >> should be well specified via extra target hooks/macros). 
> > That would be nice :-) I think the problem has traditionally been that >> shifts can be used in quite a few define_insn patterns besides those >> for shift instructions. So if your target defines shifts to have >> 256-bit precision (say) then you need to make sure that every >> define_insn with a shift rtx will honour that. > > It's more natural for target guarantees to apply to instructions than > to >> rtx codes. > >>> And >>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about >>> how the normal shift optabs behave, so I don't think we'd need new >>> optabs or new unspecs. >>> >>> E.g. it already works this way when expanding double-word shifts, >>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There >>> it's possible to use a shorter sequence if you know that the shift >>> optab truncates the count, so we can do that ev
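The transformation under discussion — folding a subtract that feeds a shift count into the shift itself — rests on a simple identity that holds whenever the target truncates shift counts. A portable C sketch of the idea (not the patch itself):

```c
#include <stdint.h>

/* The form GCC sees in the source: a subtract feeding a shift count.
   Assumes 1 <= n <= 31.  */
uint32_t
shift_sub (uint32_t x, unsigned n)
{
  return x >> (32 - n);
}

/* The form the optimization wants: for 1 <= n <= 31,
   (32 - n) == (-n & 31), so a target whose shift instructions
   truncate the count to 5 bits (TARGET_SHIFT_TRUNCATION_MASK == 31)
   can use the negated count directly and drop the subtract and the
   mask entirely.  */
uint32_t
shift_neg (uint32_t x, unsigned n)
{
  return x >> (-n & 31);
}
```

Doing this at expand time, as suggested above, lets range information prove the 1..31 precondition while TARGET_SHIFT_TRUNCATION_MASK vouches for the optab's behaviour, without touching SHIFT_COUNT_TRUNCATED.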
Re: [PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)
On Wed, Sep 06, 2017 at 09:29:25AM -0600, Jeff Law wrote: > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > > > 2017-09-05 Jakub Jelinek > > > > PR middle-end/82095 > > * varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars > > with > > NULL DECL_INITIAL. > > > > * gcc.dg/tls/pr82095.c: New test. > THanks. Sorry about the breakage. TLS didn't even cross my mind. > Presumably the TLS initialization sections are readonly and copied into > the actual thread specific locations. .tbss section is just in headers (it implies zeroing the corresponding thread private chunk) and .tdata is in relro part (the image; it might contain relocations and we don't support separate images for parts without and with relocations), copied to the thread private chunk. Jakub
[C++ PATCH] Merge fn and non-fn lookup interface
This patch merges function and non-function member lookup into get_class_binding_direct. lookup_field_1 becomes an internal detail. We grow a tri-valued argument to get_class_binding_direct: <0 -- caller wants functions =0 -- caller wants whatever is bound >0 -- caller wants type_decl binding. This has the nice property that lookup_field_1's want_type argument maps onto the latter two values. The default is the first, which matches the existing get_class_binding usage. The two places where lookup_field_1 was being called directly are converted and were: 1) hierarchy searching. This functionality is swallowed by get_class_binding_direct, and it passes in the want_type argument. 2) named initializers. This now passes in 0. You'll notice this case is with the type being complete, so we now might get a binary search of METHOD_VEC that we didn't before. This is going to be a short-lived performance regression. Applied to trunk. I'm going to hold off the next patch as (a) it's more invasive, but (b) it steals the punch line from my name-lookup cauldron talk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.h (lookup_field_1): Delete. (get_class_binding_direct, get_class_binding): Add type_or_fns arg. * name-lookup.c (lookup_field_1): make static (method_vec_binary_search, method_vec_linear_search): New. Broken out of ... (get_class_binding_direct): ... here. Add TYPE_OR_FNS argument. Do complete search of this level. (get_class_binding): Adjust. * decl.c (reshape_init_class): Call get_class_binding. * search.c (lookup_field_r): Move field searching into get_class_binding_direct. Index: decl.c === --- decl.c (revision 251782) +++ decl.c (working copy) @@ -5746,7 +5746,7 @@ reshape_init_class (tree type, reshape_i /* We already reshaped this. 
*/ gcc_assert (d->cur->index == field); else if (TREE_CODE (d->cur->index) == IDENTIFIER_NODE) - field = lookup_field_1 (type, d->cur->index, /*want_type=*/false); + field = get_class_binding (type, d->cur->index, false); else { if (complain & tf_error) Index: name-lookup.c === --- name-lookup.c (revision 251794) +++ name-lookup.c (working copy) @@ -1113,79 +1113,54 @@ extract_conversion_operator (tree fns, t return convs; } -/* TYPE is a class type. Return the member functions in the method - vector with name NAME. Does not lazily declare implicitly-declared - member functions. */ +/* Binary search of (ordered) METHOD_VEC for NAME. */ -tree -get_class_binding_direct (tree type, tree name) +static tree +method_vec_binary_search (vec *method_vec, tree name) { - vec *method_vec = CLASSTYPE_METHOD_VEC (type); - if (!method_vec) -return NULL_TREE; - - /* Conversion operators can only be found by the marker conversion - operator name. */ - bool conv_op = IDENTIFIER_CONV_OP_P (name); - tree lookup = conv_op ? conv_op_identifier : name; - tree val = NULL_TREE; - tree fns; - - /* If the type is complete, use binary search. */ - if (COMPLETE_TYPE_P (type)) + for (unsigned lo = 0, hi = method_vec->length (); lo < hi;) { - int lo = 0; - int hi = method_vec->length (); - while (lo < hi) - { - int i = (lo + hi) / 2; - - fns = (*method_vec)[i]; - tree fn_name = OVL_NAME (fns); - if (fn_name > lookup) - hi = i; - else if (fn_name < lookup) - lo = i + 1; - else - { - val = fns; - break; - } - } + unsigned mid = (lo + hi) / 2; + tree binding = (*method_vec)[mid]; + tree binding_name = OVL_NAME (binding); + + if (binding_name > name) + hi = mid; + else if (binding_name < name) + lo = mid + 1; + else + return binding; } - else -for (int i = 0; vec_safe_iterate (method_vec, i, &fns); ++i) - /* We can get a NULL binding during insertion of a new - method name, because the identifier_binding machinery - performs a lookup. 
If we find such a NULL slot, that's - the thing we were looking for, so we might as well bail - out immediately. */ - if (!fns) - break; - else if (OVL_NAME (fns) == lookup) - { - val = fns; - break; - } - /* Extract the conversion operators asked for, unless the general - conversion operator was requested. */ - if (val && conv_op) -{ - gcc_checking_assert (OVL_FUNCTION (val) == conv_op_marker); - val = OVL_CHAIN (val); - if (tree type = TREE_TYPE (name)) - val = extract_conversion_operator (val, type); -} + return NULL_TREE; +} - return val; +/* Linear search of (unordered) METHOD_VEC for NAME. */ + +static tree +method_vec_linear_search (vec *method_vec, tree name) +{ + for (int ix = method_vec->length (); ix--;) +/* We can get a NULL binding during insertion of a new method + name, because the identifier_binding machinery perfo
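The split between the two searches in the patch can be illustrated with a generic sketch — plain C with strcmp standing in for GCC's IDENTIFIER-pointer ordering, not the patch itself:

```c
#include <string.h>
#include <stddef.h>

struct member_binding { const char *name; const char *binding; };

/* Sketch of method_vec_binary_search: once a class is complete its
   member vector is kept sorted by name, so lookup is a plain binary
   search.  (GCC orders entries by IDENTIFIER node pointer; strcmp
   here merely stands in for that total order.)  */
static const char *
find_binding (const struct member_binding *vec, size_t len,
	      const char *name)
{
  size_t lo = 0, hi = len;
  while (lo < hi)
    {
      size_t mid = (lo + hi) / 2;
      int cmp = strcmp (vec[mid].name, name);
      if (cmp > 0)
	hi = mid;
      else if (cmp < 0)
	lo = mid + 1;
      else
	return vec[mid].binding;
    }
  /* As in GCC, absence of a binding is reported as NULL.  */
  return NULL;
}
```

For an incomplete type the vector is not yet sorted, which is why method_vec_linear_search keeps the linear walk for that case.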
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 06/09/17 14:17, Bernd Edlinger wrote: On 09/06/17 14:51, Richard Earnshaw (lists) wrote: On 06/09/17 13:44, Bernd Edlinger wrote: On 09/04/17 21:54, Bernd Edlinger wrote: Hi Kyrill, Thanks for your review! On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: On 01/13/17 19:28, Bernd Edlinger wrote: On 01/13/17 17:10, Bernd Edlinger wrote: On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: Hi, this is related to PR77308, the follow-up patch will depend on this one. When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned before reload, a mis-compilation in libgcc function __gnu_satfractdasq was discovered, see [1] for more details. The reason seems to be that when the *arm_cmpdi_insn is directly followed by a *arm_cmpdi_unsigned instruction, both are split up into this: [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1))) (parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 3) (match_dup 4))) (set (match_dup 2) (minus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3))) (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) (set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1] The problem is that the reg:CC from the *subsi3_carryin_compare is not mentioning that the reg:CC is also dependent on the reg:CC from before. Therefore the *arm_cmpsi_insn appears to be redundant and thus got removed, because the data values are identical. I think that applies to a number of similar pattern where data flow is happening through the CC reg. So this is a kind of correctness issue, and should be fixed independently from the optimization issue PR77308. Therefore I think the patterns need to specify the true value that will be in the CC reg, in order for cse to know what the instructions are really doing. Bootstrapped and reg-tested on arm-linux-gnueabihf. Is it OK for trunk? 
I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). Yes, this is still a bit awkward... The N and V bit will be the correct result for the subdi3_compare1 a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) only gets the C bit correct, the expression for N and V is a different one. It probably works, because the subsi3_carryin_compare instruction sets more CC bits than the pattern does explicitly specify the value. We know the subsi3_carryin_compare also computes the NV bits, but it is hard to write down the correct rtl expression for it. 
In theory the pattern should describe everything correctly, maybe, like: set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) set (reg:CC_NV CC_REGNUM) (compare:CC_NV (match_dup 4)) (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))) set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0) But I doubt that wil
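What these patterns compute can be seen in plain C: a 64-bit compare is done as a low-half subtract that produces a borrow, followed by a high-half subtract-with-borrow whose flags decide the result — so the second instruction's CC value genuinely depends on the first's, which is exactly the CC-register data flow the patch makes explicit. A sketch (illustration only, not the patch):

```c
#include <stdint.h>

/* 64-bit unsigned "a < b" built from 32-bit halves, mirroring the
   SUBS/SBCS split of *arm_cmpdi_insn: the low-half subtract sets the
   carry/borrow, and the high-half subtract consumes it.  The flags
   of the second step are only meaningful given the first one's
   carry-out -- losing that dependency is the mis-compilation
   described above.  */
static int
ltu_di_via_si (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);
  unsigned borrow = al < bl;                    /* SUBS: low-half borrow   */
  uint64_t high = (uint64_t) ah - bh - borrow;  /* SBCS: borrow-in applied */
  return (high >> 32) & 1;                      /* borrow-out == unsigned < */
}
```

If CSE deletes the first compare because two CC-setting insns look textually identical, the borrow consumed by the second step is the wrong one, even though each insn in isolation looks correct.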
Re: [PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)
On 09/05/2017 03:16 PM, Jakub Jelinek wrote: > Hi! > > If a DECL_THREAD_LOCAL_P decl has NULL DECL_INITIAL and > -fzero-initialized-in-bss (the default), we ICE starting with > r251602, which changed bss_initializer_p: > + /* Do not put constants into the .bss section, they belong in a readonly > + section. */ > + return (!TREE_READONLY (decl) > + && > to: > (DECL_INITIAL (decl) == NULL > /* In LTO we have no errors in program; error_mark_node is used > to mark offlined constructors. */ > || (DECL_INITIAL (decl) == error_mark_node > && !in_lto_p) > || (flag_zero_initialized_in_bss > && initializer_zerop (DECL_INITIAL (decl > Previously because bss_initializer_p for these returned true, ret was > SECCAT_BSS and therefore we set it to SECCAT_TBSS as intended, but now ret > is not SECCAT_BSS, but as TLS has only tbss and tdata possibilities, we > still want to use tbss. DECL_INITIAL NULL for a decl means implicit zero > initialization. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? > > 2017-09-05 Jakub Jelinek > > PR middle-end/82095 > * varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars > with > NULL DECL_INITIAL. > > * gcc.dg/tls/pr82095.c: New test. THanks. Sorry about the breakage. TLS didn't even cross my mind. Presumably the TLS initialization sections are readonly and copied into the actual thread specific locations. Jeff
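The ICE triggered on the simplest possible TLS declaration — an uninitialized __thread variable, which has a NULL DECL_INITIAL and must be placed in .tbss. A minimal example along these lines (a sketch; the committed reproducer is gcc.dg/tls/pr82095.c):

```c
/* An uninitialized TLS variable has NULL DECL_INITIAL, i.e. implicit
   zero initialization, so it belongs in .tbss (the section exists
   only in the headers and implies zeroing each thread's chunk).
   An initialized one goes to .tdata, copied per thread.  */
__thread int zeroed;       /* .tbss: guaranteed zero in every thread */
__thread int valued = 42;  /* .tdata */
```

Before r251602, bss_initializer_p returned true for the first declaration, steering categorize_decl_for_section to SECCAT_TBSS; the fix restores that outcome directly for TLS variables with NULL DECL_INITIAL.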
[PATCH, config.gcc] fix case filter for powerpc-*-vxworkspe
To match on vxworks*spe so it applies to VxWorks 7 as well. Committing to mainline after verifying for e500v2-wrs-vxworks7 that we now include config/powerpcspe/vxworks.h instead of config/rs6000/vxworks.h. Olivier 2017-09-06 Olivier Hainque * config.gcc (powerpc-wrs-vxworksspe): Now match as vxworks*spe. config-vx7spe.diff
[PATCH, rs6000] Add support for vec_xst_len_r() and vec_xl_len_r() builtins
GCC Maintainers: The following patch adds support for the vec_xst_len_r() and vec_xl_len_r() Powerr 9 builtins. The patch has been run on powerpc64le-unknown-linux-gnu (Power 9 LE). No regressions were found but it does seem to "fix" a couple of existing tests. 136a137 > FAIL: TestCgoCallbackGC 139c140,141 < # of expected passes 350 --- > # of expected passes 349 > # of unexpected failures 1 141c143 < /home/carll/GCC/build/gcc-builtin-pre-commit/./gcc/gccgo version 8.0.0 20170905 (experimental) (GCC) --- > /home/carll/GCC/build/gcc-base/./gcc/gccgo version 8.0.0 20170905 > (experimental) (GCC) 163a166 > FAIL: html/template 167,168c170,172 < # of expected passes 146 < /home/carll/GCC/build/gcc-builtin-pre-commit/./gcc/gccgo version 8.0.0 20170905 (experimental) (GCC) --- > # of expected passes 145 > # of unexpected failures 1 > /home/carll/GCC/build/gcc-base/./gcc/gccgo version 8.0.0 20170905 > (experimental) (GCC) Please let me know if the following patch is acceptable. Thanks. Carl Love gcc/ChangeLog: 2017-09-06 Carl Love * config/rs6000/rs6000-c.c (P9V_BUILTIN_VEC_XL_LEN_R, P9V_BUILTIN_VEC_XST_LEN_R): Add support for builtins vector unsigned char vec_xl_len_r (unsigned char *, size_t); void vec_xst_len_r (vector unsigned char, unsigned char *, size_t); * config/rs6000/altivec.h (vec_xl_len_r, vec_xst_len_r): Add defines. * config/rs6000/rs6000-builtin.def (XL_LEN_R, XST_LEN_R): Add definitions and overloading. * config/rs6000/rs6000.c (altivec_expand_builtin): Add case statement for P9V_BUILTIN_XST_LEN_R. (altivec_init_builtins): Add def_builtin for P9V_BUILTIN_STXVLL. * config/rs6000/vsx.md (addi_neg16, lxvll, stxvll, altivec_lvsl_reg, altivec_lvsr_reg, xl_len_r, xst_len_r): Add define_expand and define_insn for the instructions and builtins. (define_insn "*stxvl"): add missing argument to the sldi instruction. * doc/extend.texi: Update the built-in documenation file for the new built-in functions. 
gcc/testsuite/ChangeLog: 2017-09-06 Carl Love * gcc.target/powerpc/builtins-5-p9-runnable.c: Add new runable test file for the new built-ins and the existing built-ins. --- gcc/config/rs6000/altivec.h| 2 + gcc/config/rs6000/rs6000-builtin.def | 4 + gcc/config/rs6000/rs6000-c.c | 8 + gcc/config/rs6000/rs6000.c | 7 +- gcc/config/rs6000/vsx.md | 133 - gcc/doc/extend.texi| 4 + .../gcc.target/powerpc/builtins-5-p9-runnable.c| 309 + 7 files changed, 465 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-5-p9-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index c8e508c..94a4db2 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -467,6 +467,8 @@ #ifdef _ARCH_PPC64 #define vec_xl_len __builtin_vec_lxvl #define vec_xst_len __builtin_vec_stxvl +#define vec_xl_len_r __builtin_vec_xl_len_r +#define vec_xst_len_r __builtin_vec_xst_len_r #endif #define vec_cmpnez __builtin_vec_vcmpnez diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 850164a..8f87cce 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2125,6 +2125,7 @@ BU_P9V_OVERLOAD_2 (VIESP, "insert_exp_sp") /* 2 argument vector functions added in ISA 3.0 (power9). */ BU_P9V_64BIT_VSX_2 (LXVL, "lxvl", CONST, lxvl) +BU_P9V_64BIT_VSX_2 (XL_LEN_R, "xl_len_r", CONST, xl_len_r) BU_P9V_AV_2 (VEXTUBLX, "vextublx", CONST, vextublx) BU_P9V_AV_2 (VEXTUBRX, "vextubrx", CONST, vextubrx) @@ -2141,6 +2142,7 @@ BU_P9V_VSX_3 (VINSERT4B_DI, "vinsert4b_di", CONST, vinsert4b_di) /* 3 argument vector functions returning void, treated as SPECIAL, added in ISA 3.0 (power9). */ BU_P9V_64BIT_AV_X (STXVL, "stxvl",MISC) +BU_P9V_64BIT_AV_X (XST_LEN_R, "xst_len_r",MISC) /* 1 argument vector functions added in ISA 3.0 (power9). 
*/ BU_P9V_AV_1 (VCLZLSBB, "vclzlsbb", CONST, vclzlsbb) @@ -2182,12 +2184,14 @@ BU_P9V_AV_P (VCMPNEZW_P,"vcmpnezw_p", CONST, vector_nez_v4si_p) /* ISA 3.0 Vector scalar overloaded 2 argument functions */ BU_P9V_OVERLOAD_2 (LXVL, "lxvl") +BU_P9V_OVERLOAD_2 (XL_LEN_R, "xl_len_r") BU_P9V_OVERLOAD_2 (VEXTULX,"vextulx") BU_P9V_OVERLOAD_2 (VEXTURX,"vexturx") BU_P9V_OVERLOAD_2 (VEXTRACT4B, "vextract4b") /* ISA 3.0 Vector scalar overloaded 3 argument functions */ BU_P9V_OVERLOAD_3 (STXVL, "stxvl") +BU_P9V_OVERLOAD_3 (XST_LEN_R, "xst_len_r") B
[arm-embedded] [PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor
Hi, We have decided to apply the following patch to the embedded-7-branch to enable Arm Cortex-R52 support. *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-14 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. (armv8-r): Set ARM Cortex-R52 as default CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52 and availability of fp.dp extension for -mcpu=cortex-r52. Best regards, Thomas --- Begin Message --- Hi, On 29/06/17 16:13, Thomas Preudhomme wrote: Please ignore this patch. I'll respin the patch on a more recent GCC. Please find an updated patch in attachment. This patch adds support for the ARM Cortex-R52 processor recently announced. [1] https://developer.arm.com/products/processors/cortex-r/cortex-r52 ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-07-14 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. (armv8-r): Set ARM Cortex-R52 as default CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52 and availability of fp.dp extension for -mcpu=cortex-r52. Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52 and building a hello world with it. Also checked that the .fpu option created by GCC for -mcpu=cortex-r52 and -mcpu=cortex-r52+nofp.dp is as expected (respectively .fpu neon-fp-armv8 and .fpu fpv5-sp-d16). Is this ok for trunk? 
Best regards, Thomas diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index e2ff297aed7514073dbb3bf5ee86964f202e5a14..d009a9e18acb093aefe0f9d8d6de49489fc2325c 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -381,7 +381,7 @@ begin arch armv8-m.main end arch armv8-m.main begin arch armv8-r - tune for cortex-r4 + tune for cortex-r52 tune flags CO_PROC base 8R profile R @@ -1315,6 +1315,16 @@ begin cpu cortex-m33 costs v7m end cpu cortex-m33 +# V8 R-profile implementations. +begin cpu cortex-r52 + cname cortexr52 + tune flags LDSCHED + architecture armv8-r+crc+simd + fpu neon-fp-armv8 + option nofp.dp remove FP_DBL ALL_SIMD + costs cortex +end cpu cortex-r52 + # FPU entries # format: # begin fpu diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 51678c2566e841894c5c0e9c613c8c0f832e9988..4e508b1555a77628ff6e7cfea39c98b87caa840a 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -357,6 +357,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23) EnumValue Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33) +EnumValue +Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52) + Enum Name(arm_arch) Type(int) Known ARM architectures (for use with the -march= option): diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index ba2c7d8ecfdbf6966ebf04b680d587a0e057b161..1b3f7a94cc78fac8abf1042ef60c81a74eaf24eb 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -57,5 +57,6 @@ cortexa73,exynosm1,xgene1, cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35, cortexa73cortexa53,cortexa55,cortexa75, - cortexa75cortexa55,cortexm23,cortexm33" + cortexa75cortexa55,cortexm23,cortexm33, + cortexr52" (const (symbol_ref "((enum attr_tune) arm_tune)"))) diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index 
16171d4e801af46ad549314d1f376e90d5bff57c..5c29b94caaba4ff6f89a191f1d8edcf10431c0b3 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -58,6 +58,7 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xc15", "armv7-r", "cortex-r5"}, {"0xc17", "armv7-r", "cortex-r7"}, {"0xc18", "armv7-r", "cortex-r8"}, +{"0xd13", "armv8-r+crc", "cortex-r52"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"}, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e60edcae53ef3c995054b9b0229b5f0fccbb8462..a093b9bcf77b1f4b40992516e853826bb7d528d4 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15538,7 +15538,7 @@ Permissible names are: @samp{arm2}, @samp{arm250}, @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55}, @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75}, @samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5}, @samp{cortex-r7}, -@samp{cortex-r8}, +@samp{cortex-r8}, @samp{cortex-r52}, @samp{cortex-m33}, @samp{cortex-m23}, @samp{cortex-m7}, @@ -15628,7 +15628,7 @@ Disables the floating-point and SIMD instructions on @item +nofp.dp
[arm-embedded] [PATCH, GCC/ARM] Rewire -mfpu=fp-armv8 as VFPv5 + D32 + DP
Hi, We have decided to apply the following patch to the embedded-7-branch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-14 Thomas Preud'homme * config/arm/arm-isa.h (isa_bit_FP_ARMv8): Delete enumerator. (ISA_FP_ARMv8): Define as ISA_FPv5 and ISA_FP_D32. * config/arm/arm-cpus.in (armv8-r): Define fp.sp as enabling FPv5. (fp-armv8): Define it as FP_ARMv8 only. config/arm/arm.h (TARGET_FPU_ARMV8): Delete. (TARGET_VFP_FP16INST): Define using TARGET_VFP5 rather than TARGET_FPU_ARMV8. config/arm/arm.c (arm_rtx_costs_internal): Replace checks against TARGET_FPU_ARMV8 by checks against TARGET_VFP5. * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Define first ARM_CHECK_BUILTIN_MODE definition using TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/arm-c.c (arm_cpu_builtins): Likewise for __ARM_FEATURE_NUMERIC_MAXMIN macro definition. * config/arm/arm.md (cmov): Condition on TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/neon.md (neon_vrint): Likewise. (neon_vcvt): Likewise. (neon_): Likewise. (3): Likewise. * config/arm/vfp.md (lsi2): Likewise. * config/arm/predicates.md (arm_cond_move_operator): Check against TARGET_VFP5 rather than TARGET_FPU_ARMV8 and fix spacing. Best regards, Thomas --- Begin Message --- Hi, fp-armv8 is currently defined as a double precision FPv5 with 32 D registers *and* a special FP_ARMv8 bit. However FP for ARMv8 should only bring 32 D registers on top of FPv5-D16 so this FP_ARMv8 bit is spurious. As a consequence, many instruction patterns which are guarded by TARGET_FPU_ARMV8 are unavailable to FPv5-D16 and FPv5-SP-D16. This patch gets rid of TARGET_FPU_ARMV8 and rewire all uses to expressions based on TARGET_VFP5, TARGET_VFPD32 and TARGET_VFP_DOUBLE. It also redefine ISA_FP_ARMv8 to include the D32 capability to distinguish it from FPv5-D16. 
At last, it sets the +fp.sp for ARMv8-R to enable FPv5-SP-D16 (ie FP for ARMv8 with single precision only and 16 D registers). ChangeLog entry is as follows: 2017-07-07 Thomas Preud'homme * config/arm/arm-isa.h (isa_bit_FP_ARMv8): Delete enumerator. (ISA_FP_ARMv8): Define as ISA_FPv5 and ISA_FP_D32. * config/arm/arm-cpus.in (armv8-r): Define fp.sp as enabling FPv5. (fp-armv8): Define it as FP_ARMv8 only. config/arm/arm.h (TARGET_FPU_ARMV8): Delete. (TARGET_VFP_FP16INST): Define using TARGET_VFP5 rather than TARGET_FPU_ARMV8. config/arm/arm.c (arm_rtx_costs_internal): Replace checks against TARGET_FPU_ARMV8 by checks against TARGET_VFP5. * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Define first ARM_CHECK_BUILTIN_MODE definition using TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/arm-c.c (arm_cpu_builtins): Likewise for __ARM_FEATURE_NUMERIC_MAXMIN macro definition. * config/arm/arm.md (cmov): Condition on TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/neon.md (neon_vrint): Likewise. (neon_vcvt): Likewise. (neon_): Likewise. (3): Likewise. * config/arm/vfp.md (lsi2): Likewise. * config/arm/predicates.md (arm_cond_move_operator): Check against TARGET_VFP5 rather than TARGET_FPU_ARMV8 and fix spacing. Testing: * Bootstrapped under ARMv8-A Thumb state and ran testsuite -> no regression * built Spec2000 and Spec2006 with -march=armv8-a+fp16 and compared objdump -> no code generation difference Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 63ee880822c17eda55dd58438d61cbbba333b2c6..7504ed581c63a657a0dff48442633704bd252b2e 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -3098,7 +3098,7 @@ arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in) NULL_TREE is returned if no such builtin is available. 
*/ #undef ARM_CHECK_BUILTIN_MODE #define ARM_CHECK_BUILTIN_MODE(C)\ - (TARGET_FPU_ARMV8 \ + (TARGET_VFP5 \ && flag_unsafe_math_optimizations \ && ARM_CHECK_BUILTIN_MODE_1 (C)) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index a3daa3220a2bc4220dffdb7ca08ca9419bdac425..9178937b6d9e0fe5d0948701390c4cf01f4f8c7d 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -96,7 +96,7 @@ arm_cpu_builtins (struct cpp_reader* pfile) || TARGET_ARM_ARCH_ISA_THUMB >=2)); def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", - TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_FPU_ARMV8); + TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_VFP5); def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in inde
[arm-embedded] [PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture
Hi, We have decided to apply the following patch to the embedded-7-branch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r): Add new entry. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * doc/invoke.texi: Mention -march=armv8-r and its extensions. *** gcc/testsuite/ChangeLog *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. --- Begin Message --- Please find an updated patch in attachment. ChangeLog entry are now as follows: *** gcc/ChangeLog *** 2017-07-06 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r): Add new entry. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * doc/invoke.texi: Mention -march=armv8-r and its extensions. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targetting ARMv8-R. Is this ok for stage1? Best regards, Thomas Best regards, Thomas On 29/06/17 16:13, Thomas Preudhomme wrote: Please ignore this patch. I'll respin the patch on a more recent GCC. 
Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, This patch adds support for the ARMv8-R architecture [1], which was recently announced. User-level instructions for ARMv8-R are the same as those in ARMv8-A AArch32 mode, so this patch defines ARMv8-R to have the same features as ARMv8-A in the ARM backend. [1] https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r, armv8-r+rcr): Add new entry. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and ARMv8-R with CRC extensions. * doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc options. Document meaning of -march=armv8-r+rcr. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targeting ARMv8-R. Is this ok for stage1? 
Best regards, Thomas diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 946d543ebb29416da9b4928161607cccacaa78a7..f35128acb7d68c6a0592355b9d3d56ee8f826aca 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -380,6 +380,22 @@ begin arch armv8-m.main option nodsp remove bit_ARMv7em end arch armv8-m.main +begin arch armv8-r + tune for cortex-r4 + tune flags CO_PROC + base 8R + profile R + isa ARMv8r + option crc add bit_crc32 +# fp.sp => fp-armv8 (d16); simd => simd + fp-armv8 + d32 + double precision +# note: no fp option for fp-armv8 (d16) + double precision at the moment + option fp.sp add FP_ARMv8 + option simd add FP_ARMv8 NEON + option crypto add FP_ARMv8 CRYPTO + option nocrypto remove ALL_CRYPTO + option nofp remove ALL_FP +end arch armv8-r + begin arch iwmmxt tune for iwmmxt tune flags LDSCHED STRONG XSCALE diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h index c0c2ccee330f2313951e980c5d399ae5d21005d6..0d66a0400c517668db023fc66ff43e26d43add51 100644 --- a/gcc/config/arm/arm-isa.h +++ b/gcc/config/arm/arm-isa.h @@ -127,6 +127,7 @@ enum isa_feature #define IS
[arm-embedded] [PATCH 1/3, GCC/ARM, ping] Add MIDR info for ARM Cortex-R7 and Cortex-R8
Hi, We have decided to apply the following patch to the embedded-7-branch as a dependency patch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-04 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Best regards, Thomas --- Begin Message --- Ping? Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, The driver is missing MIDR information for processors ARM Cortex-R7 and Cortex-R8 to support -march/-mcpu/-mtune=native on the command line. This patch adds the missing information. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xd09", "armv8-a+crc", "cortex-a73"}, {"0xc14", "armv7-r", "cortex-r4"}, {"0xc15", "armv7-r", "cortex-r5"}, +{"0xc17", "armv7-r", "cortex-r7"}, +{"0xc18", "armv7-r", "cortex-r8"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"}, --- End Message ---
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 04:37:18PM +0200, Jakub Jelinek wrote: > Ok. Please make sure those entrypoints make it into the various example > __sanitizer_cov_trace* fuzzer implementations though, so that people using > -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves. > At least it should be added to sanitizer_common (both in LLVM and GCC). Forgot to say that I've committed the patch to GCC trunk today. Jakub
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote: > Hi Jakub > I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall > -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address". > And run my fuzzer with pc and cmp feedbacks for hours. It works fine. > About __sanitizer_cov_trace_cmp{f,d} , yes, it isn't provided by llvm. But > once we trace integer comparisons, why not real type comparisons. > I remember Dmitry said it is not enough useful to trace real type comparisons > because it is rare to see them in programs. > But libdng_sdk really has real type comparisons. So I want to keep them and > implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary. Ok. Please make sure those entrypoints make it into the various example __sanitizer_cov_trace* fuzzer implementations though, so that people using -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves. At least it should be added to sanitizer_common (both in LLVM and GCC). BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep using some weirdo LLVM IL acronym instead of being named by what it really traces (trace-array-idx or something similar)). Any plans to implement some or all of those? Jakub
[PATCH 2/2] [arm] Improve error checking in parsecpu.awk
This patch adds a bit more error checking to parsecpu.awk to ensure that statements are not missing arguments or have excess arguments beyond those permitted. It also slightly improves the handling of errors so that we terminate properly if parsing fails and be as helpful as we can while in the parsing phase. * config/arm/parsecpu.awk (fatal): Note that we've encountered an error. Only quit immediately if parsing is complete. (BEGIN): Initialize fatal_err and parse_done. (begin fpu, end fpu): Check number of arguments. (begin arch, end arch): Likewise. (begin cpu, end cpu): Likewise. (cname, tune for, tune flags, architecture, fpu, option): Likewise. (optalias): Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251800 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 11 +++ gcc/config/arm/parsecpu.awk | 26 +- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index cab5166..69713c1 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,16 @@ 2017-09-06 Richard Earnshaw + * config/arm/parsecpu.awk (fatal): Note that we've encountered an + error. Only quit immediately if parsing is complete. + (BEGIN): Initialize fatal_err and parse_done. + (begin fpu, end fpu): Check number of arguments. + (begin arch, end arch): Likewise. + (begin cpu, end cpu): Likewise. + (cname, tune for, tune flags, architecture, fpu, option): Likewise. + (optalias): Likewise. + +2017-09-06 Richard Earnshaw + * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. 
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk index d07d3fc..0b4fc68 100644 --- a/gcc/config/arm/parsecpu.awk +++ b/gcc/config/arm/parsecpu.awk @@ -32,7 +32,8 @@ function fatal (m) { print "error ("lineno"): " m > "/dev/stderr" -exit 1 +fatal_err = 1 +if (parse_done) exit 1 } function toplevel () { @@ -502,14 +503,18 @@ BEGIN { arch_name = "" fpu_name = "" lineno = 0 +fatal_err = 0 +parse_done = 0 if (cmd == "") fatal("Usage parsecpu.awk -v cmd=") } +# New line. Reset parse status and increment line count for error messages // { lineno++ parse_ok = 0 } +# Comments must be on a line on their own. /^#/ { parse_ok = 1 } @@ -552,12 +557,14 @@ BEGIN { } /^begin fpu / { +if (NF != 3) fatal("syntax: begin fpu ") toplevel() fpu_name = $3 parse_ok = 1 } /^end fpu / { +if (NF != 3) fatal("syntax: end fpu ") if (fpu_name != $3) fatal("mimatched end fpu") if (! (fpu_name in fpu_isa)) { fatal("fpu definition \"" fpu_name "\" lacks an \"isa\" statement") @@ -570,24 +577,28 @@ BEGIN { } /^begin arch / { +if (NF != 3) fatal("syntax: begin arch ") toplevel() arch_name = $3 parse_ok = 1 } /^[ ]*base / { +if (NF != 2) fatal("syntax: base ") if (arch_name == "") fatal("\"base\" statement outside of arch block") arch_base[arch_name] = $2 parse_ok = 1 } /^[ ]*profile / { +if (NF != 2) fatal("syntax: profile ") if (arch_name == "") fatal("\"profile\" statement outside of arch block") arch_prof[arch_name] = $2 parse_ok = 1 } /^end arch / { +if (NF != 3) fatal("syntax: end arch ") if (arch_name != $3) fatal("mimatched end arch") if (! 
arch_name in arch_tune_for) { fatal("arch definition lacks a \"tune for\" statement") @@ -603,18 +614,21 @@ BEGIN { } /^begin cpu / { +if (NF != 3) fatal("syntax: begin cpu ") toplevel() cpu_name = $3 parse_ok = 1 } /^[ ]*cname / { +if (NF != 2) fatal("syntax: cname ") if (cpu_name == "") fatal("\"cname\" outside of cpu block") cpu_cnames[cpu_name] = $2 parse_ok = 1 } /^[ ]*tune for / { +if (NF != 3) fatal("syntax: tune for ") if (cpu_name != "") { cpu_tune_for[cpu_name] = $3 } else if (arch_name != "") { @@ -624,6 +638,7 @@ BEGIN { } /^[ ]*tune flags / { +if (NF < 3) fatal("syntax: tune flags []*") flags="" flag_count = NF for (n = 3; n <= flag_count; n++) { @@ -640,18 +655,21 @@ BEGIN { } /^[ ]*architecture / { +if (NF != 2) fatal("syntax: architecture ") if (cpu_name == "") fatal("\"architecture\" outside of cpu block") cpu_arch[cpu_name] = $2 parse_ok = 1 } /^[ ]*fpu / { +if (NF != 2) fatal("syntax: fpu ") if (cpu_name == "") fatal("\"fpu\" outside of cpu block") cpu_fpu[cpu_name] = $2 parse_ok = 1 } /^[ ]*isa / { +if (NF < 2) fatal("syntax: isa []*") flags="" flag_count = NF for (n = 2; n <= flag_count; n++) { @@ -670,6 +688,7 @@ BEGIN { } /^[ ]*option / { +if (NF < 4) fatal("syntax: option add|remove +") name=$2 if ($3 == "add") { remove
[arm] auto-generate arm-isa.h from CPU descriptions
This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251799 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 21 +++ gcc/common/config/arm/arm-common.c | 10 +- gcc/config.gcc | 2 +- gcc/config/arm/arm-cpus.in | 262 + gcc/config/arm/arm-isa.h | 172 gcc/config/arm/arm.c | 32 ++--- gcc/config/arm/arm.h | 8 +- gcc/config/arm/parsecpu.awk| 187 ++ gcc/config/arm/t-arm | 9 ++ 9 files changed, 418 insertions(+), 285 deletions(-) delete mode 100644 gcc/config/arm/arm-isa.h diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 5df398c..cab5166 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,24 @@ +2017-09-06 Richard Earnshaw + + * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. + * config/arm/arm-isa.h: Delete. Move definitions to ... + * arm-cpus.in: ... here. Use new feature and fgroup values. + * config/arm/arm.c (arm_option_override): Use lower case for feature + bit names. + * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. + (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. + * config/arm/parsecpu.awk (END): Add new command 'isa'. + (isa_pfx): Delete. + (print_isa_bits_for): New function. + (gen_isa): New function. + (gen_comm_data): Use print_isa_bits_for. + (define feature): New keyword. + (define fgroup): New keyword. + * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h + (arm-isa.h): Add rule to generate file. + * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower + case for feature bit names. + 2017-09-06 Richard Biener * tree-ssa-pre.c (NECESSARY): Remove. diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index 38bd3a7..7cb99ec 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* The easiest and safest way to remove the default fpu capabilities is to look for a '+no..' option that removes - the base FPU bit (isa_bit_VFPv2). If that doesn't exist + the base FPU bit (isa_bit_vfpv2). 
If that doesn't exist then the best we can do is strip out all the bits that might be part of the most capable FPU we know about, which is "crypto-neon-fp-armv8". */ @@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv) ++ext) { if (ext->remove - && check_isa_bits_for (ext->isa_bits, isa_bit_VFPv2)) + && check_isa_bits_for (ext->isa_bits, isa_bit_vfpv2)) { arm_initialize_isa (fpu_isa, ext->isa_bits); bitmap_and_compl (target_isa, target_isa, fpu_isa); @@ -620,7 +620,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* Clearing the VFPv2 bit is sufficient to stop any extention that builds on the FPU from matching. */ - bitmap_clear_bit (target_isa, isa_bit_VFPv2); + bitmap_clear_bit (target_isa, isa_bit_vfpv2); } /* If we don't have a selected architecture by now, something's @@ -692,8 +692,8 @@ arm_canon_arch_option (int argc, const char **argv) capable FPU variant that we do support. This is sufficient for multilib selection. */ - if (bitmap_bit_p (target_isa_unsatisfie
[C++ PATCH] method vec
This preparatory patch fixes up a couple of places where a non-function could start appearing in the METHOD_VEC. The warn_hidden change looks bigger than necessary, because of indentation change. I noticed check_classfn could check a template mismatch earlier, and avoid doing some work. applied to trunk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * class.c (warn_hidden): Don't barf on non-functions. * decl2.c (check_classfn): Likewise. Check template match earlier. Index: class.c === --- class.c (revision 251782) +++ class.c (working copy) @@ -2818,63 +2818,64 @@ check_for_override (tree decl, tree ctyp static void warn_hidden (tree t) { - vec *method_vec = CLASSTYPE_METHOD_VEC (t); - tree fns; - - /* We go through each separately named virtual function. */ - for (int i = 0; vec_safe_iterate (method_vec, i, &fns); ++i) -{ - tree name = OVL_NAME (fns); - auto_vec base_fndecls; - tree base_binfo; - tree binfo; - int j; + if (vec *method_vec = CLASSTYPE_METHOD_VEC (t)) +for (unsigned ix = method_vec->length (); ix--;) + { + tree fns = (*method_vec)[ix]; - /* Iterate through all of the base classes looking for possibly - hidden functions. */ - for (binfo = TYPE_BINFO (t), j = 0; - BINFO_BASE_ITERATE (binfo, j, base_binfo); j++) - { - tree basetype = BINFO_TYPE (base_binfo); - get_basefndecls (name, basetype, &base_fndecls); - } + if (!OVL_P (fns)) + continue; - /* If there are no functions to hide, continue. */ - if (base_fndecls.is_empty ()) - continue; + tree name = OVL_NAME (fns); + auto_vec base_fndecls; + tree base_binfo; + tree binfo; + unsigned j; + + /* Iterate through all of the base classes looking for possibly + hidden functions. */ + for (binfo = TYPE_BINFO (t), j = 0; + BINFO_BASE_ITERATE (binfo, j, base_binfo); j++) + { + tree basetype = BINFO_TYPE (base_binfo); + get_basefndecls (name, basetype, &base_fndecls); + } - /* Remove any overridden functions. 
*/ - for (ovl_iterator iter (fns); iter; ++iter) - { - tree fndecl = *iter; - if (TREE_CODE (fndecl) == FUNCTION_DECL - && DECL_VINDEX (fndecl)) - { - /* If the method from the base class has the same - signature as the method from the derived class, it - has been overridden. */ - for (size_t k = 0; k < base_fndecls.length (); k++) - if (base_fndecls[k] - && same_signature_p (fndecl, base_fndecls[k])) - base_fndecls[k] = NULL_TREE; - } - } + /* If there are no functions to hide, continue. */ + if (base_fndecls.is_empty ()) + continue; - /* Now give a warning for all base functions without overriders, - as they are hidden. */ - size_t k; - tree base_fndecl; - FOR_EACH_VEC_ELT (base_fndecls, k, base_fndecl) - if (base_fndecl) + /* Remove any overridden functions. */ + for (ovl_iterator iter (fns); iter; ++iter) { - /* Here we know it is a hider, and no overrider exists. */ - warning_at (location_of (base_fndecl), - OPT_Woverloaded_virtual, - "%qD was hidden", base_fndecl); - warning_at (location_of (fns), - OPT_Woverloaded_virtual, " by %qD", fns); + tree fndecl = *iter; + if (TREE_CODE (fndecl) == FUNCTION_DECL + && DECL_VINDEX (fndecl)) + { + /* If the method from the base class has the same + signature as the method from the derived class, it + has been overridden. */ + for (size_t k = 0; k < base_fndecls.length (); k++) + if (base_fndecls[k] + && same_signature_p (fndecl, base_fndecls[k])) + base_fndecls[k] = NULL_TREE; + } } -} + + /* Now give a warning for all base functions without overriders, + as they are hidden. */ + tree base_fndecl; + FOR_EACH_VEC_ELT (base_fndecls, j, base_fndecl) + if (base_fndecl) + { + /* Here we know it is a hider, and no overrider exists. */ + warning_at (location_of (base_fndecl), + OPT_Woverloaded_virtual, + "%qD was hidden", base_fndecl); + warning_at (location_of (fns), + OPT_Woverloaded_virtual, " by %qD", fns); + } + } } /* Recursive helper for finish_struct_anon. 
*/ @@ -6981,7 +6982,7 @@ unreverse_member_declarations (tree t) /* For the TYPE_FIELDS, only the non TYPE_DECLs are in reverse order, so we can't just use nreverse. Due to stat_hack - chicanery in finish_member_declarations. */ + chicanery in finish_member_declaration. */ prev = NULL_TREE; for (x = TYPE_FIELDS (t); x && TREE_CODE (x) != TYPE_DECL; Index: decl2.c === --- decl2.c (revision 251782) +++ decl2.c (working copy) @@ -611,6 +611,15 @@ check_classfn (tree ctype, tree function for (ovl_iterator iter (fns); !matched && iter; ++iter) { tree fndecl = *iter; + + /* A member template definition only matches a member template
[Ada] Wrong code on assignment of conditional expression to a mutable object
This patch fixes an error in an assignment statement to an entity of a mutable type (variable or in-out parameter) when the right-hand side of the assignment is a conditional expression, some of whose alternatives are aggregates. Prior to this patch, not all components of the mutable object were properly assigned the corresponding values of the aggregate. Executing: gnatmake -q bug ./bug must yield: local var 72 local var 42 in_out parameter 72 in_out parameter 42 --- with Ada.Text_IO; procedure Bug is type Yoyo (Exists : Boolean := False) is record case Exists is when False => null; when True => Value : Integer := 5; end case; end record; Var1 : Yoyo; Var2 : Yoyo; procedure Test (Condition : in Boolean; Value : in Integer; Yo: in out Yoyo) is Var3 : Yoyo; begin Yo := (if Condition then (Exists => True, Value => Value) else (Exists => False)); Var3 := (case condition is when True => (Exists => True, Value => Value), when False => (Exists => False)); if Condition and then Yo.Value /= Value then Ada.Text_IO.Put_Line ("Compiler bug exposed"); end if; if Condition then Ada.Text_IO.Put_Line ("local var " & Integer'Image (Var3.Value)); end if; end; begin Test (True, 72, Var1); Test (True, 42, Var2); Ada.Text_IO.Put_Line ("in_out parameter " & Var1.Value'Img); Ada.Text_IO.Put_Line ("in_out parameter " & Var2.Value'Img); Test (False, 1000, Var1); end Bug; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch5.adb (Analyze_Assignment): If the left-hand side is an entity of a mutable type and the right-hand side is a conditional expression, resolve the alternatives of the conditional using the base type of the target entity, because the alternatives may have distinct subtypes. This is particularly relevant if the alternatives are aggregates. 
Index: sem_ch5.adb === --- sem_ch5.adb (revision 251789) +++ sem_ch5.adb (working copy) @@ -580,8 +580,27 @@ Set_Assignment_Type (Lhs, T1); - Resolve (Rhs, T1); + -- If the target of the assignment is an entity of a mutable type + -- and the expression is a conditional expression, its alternatives + -- can be of different subtypes of the nominal type of the LHS, so + -- they must be resolved with the base type, given that their subtype + -- may differ frok that of the target mutable object. + if Is_Entity_Name (Lhs) +and then Ekind_In (Entity (Lhs), + E_Variable, + E_Out_Parameter, + E_In_Out_Parameter) +and then Is_Composite_Type (T1) +and then not Is_Constrained (Etype (Entity (Lhs))) +and then Nkind_In (Rhs, N_If_Expression, N_Case_Expression) + then + Resolve (Rhs, Base_Type (T1)); + + else + Resolve (Rhs, T1); + end if; + -- This is the point at which we check for an unset reference Check_Unset_Reference (Rhs);
[C++ PATCH] class FIELD_VEC initialization
Here's some cleanup of the SORTED_FIELDS vector initialization. Some function renaming, to be more specific. The functionality change fixes a minor bug with late enums. We only add them to the field vec if there's already a field vec. But of course, their addition could have caused the class's TYPE_FIELDS length to cross the threshold for wanting a field vector. Applied to trunk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.c (count_fields): Rename to ... (count_class_fields): ... here. Take a class, don't count NULL-named fields. (add_fields_to_record_type): Rename to ... (field_vec_append_class_fields): ... here. Take a class, don't add NULL-named fields. (add_enum_fields_to_record_type): Rename to ... (field_vec_append_enum_values): ... here. (set_class_bindings): Adjust, assert we added expected number. (insert_late_enum_def_bindings): Reimplement. Create vector if there are now sufficient entries. Index: name-lookup.c === --- name-lookup.c (revision 251782) +++ name-lookup.c (working copy) @@ -1452,59 +1452,57 @@ sorted_fields_type_new (int n) return sft; } -/* Subroutine of insert_into_classtype_sorted_fields. Recursively - count the number of fields in TYPE, including anonymous union - members. */ +/* Recursively count the number of fields in KLASS, including anonymous + union members. */ -static int -count_fields (tree fields) +static unsigned +count_class_fields (tree klass) { - tree x; - int n_fields = 0; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are dealt with separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - n_fields += count_fields (TYPE_FIELDS (TREE_TYPE (x))); - else - n_fields += 1; -} + unsigned n_fields = 0; + + for (tree fields = TYPE_FIELDS (klass); fields; fields = DECL_CHAIN (fields)) +if (DECL_DECLARES_FUNCTION_P (fields)) + /* Functions are dealt with separately. 
*/; +else if (TREE_CODE (fields) == FIELD_DECL + && ANON_AGGR_TYPE_P (TREE_TYPE (fields))) + n_fields += count_class_fields (TREE_TYPE (fields)); +else if (DECL_NAME (fields)) + n_fields += 1; + return n_fields; } -/* Subroutine of insert_into_classtype_sorted_fields. Recursively add - all the fields in the TREE_LIST FIELDS to the SORTED_FIELDS_TYPE - elts, starting at offset IDX. */ - -static int -add_fields_to_record_type (tree fields, struct sorted_fields_type *field_vec, - int idx) +/* Append all the nonfunction members fields of KLASS to FIELD_VEC + starting at IDX. Recurse for anonymous members. The array must + have space. Returns the next available index. */ + +static unsigned +field_vec_append_class_fields (struct sorted_fields_type *field_vec, + tree klass, unsigned idx) { - tree x; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are handled separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - idx = add_fields_to_record_type (TYPE_FIELDS (TREE_TYPE (x)), field_vec, idx); - else - field_vec->elts[idx++] = x; -} + for (tree fields = TYPE_FIELDS (klass); fields; fields = DECL_CHAIN (fields)) +if (DECL_DECLARES_FUNCTION_P (fields)) + /* Functions are handled separately. */; +else if (TREE_CODE (fields) == FIELD_DECL + && ANON_AGGR_TYPE_P (TREE_TYPE (fields))) + idx = field_vec_append_class_fields (field_vec, TREE_TYPE (fields), idx); +else if (DECL_NAME (fields)) + field_vec->elts[idx++] = fields; + return idx; } -/* Add all of the enum values of ENUMTYPE, to the FIELD_VEC elts, - starting at offset IDX. */ +/* Append all of the enum values of ENUMTYPE to FIELD_VEC starting at IDX. + FIELD_VEC must have space. 
*/ -static int -add_enum_fields_to_record_type (tree enumtype, -struct sorted_fields_type *field_vec, -int idx) +static unsigned +field_vec_append_enum_values (struct sorted_fields_type *field_vec, + tree enumtype, unsigned idx) { - tree values; - for (values = TYPE_VALUES (enumtype); values; values = TREE_CHAIN (values)) + for (tree values = TYPE_VALUES (enumtype); + values; values = TREE_CHAIN (values)) field_vec->elts[idx++] = TREE_VALUE (values); + return idx; } @@ -1518,12 +1516,12 @@ set_class_bindings (tree klass) qsort (method_vec->address (), method_vec->length (), sizeof (tree), method_name_cmp); - tree fields = TYPE_FIELDS (klass); - int n_fields = count_fields (fields); + int n_fields = count_class_fields (klass); if (n_fields >= 8) { struct sorted_fields_type *field_vec = sorted_fields_type_new (n_fields); - add_fields_to_record_type (fields, field_vec, 0); + unsigned idx = field_vec_append_class_fields (field_vec, klass, 0); + gcc_assert (idx =
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/06/17 14:51, Richard Earnshaw (lists) wrote: > On 06/09/17 13:44, Bernd Edlinger wrote: >> On 09/04/17 21:54, Bernd Edlinger wrote: >>> Hi Kyrill, >>> >>> Thanks for your review! >>> >>> >>> On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: > On 01/13/17 19:28, Bernd Edlinger wrote: >> On 01/13/17 17:10, Bernd Edlinger wrote: >>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: > Hi, > > this is related to PR77308, the follow-up patch will depend on this > one. > > When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned > before reload, a mis-compilation in libgcc function > __gnu_satfractdasq > was discovered, see [1] for more details. > > The reason seems to be that when the *arm_cmpdi_insn is directly > followed by a *arm_cmpdi_unsigned instruction, both are split > up into this: > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 0) (match_dup 1))) > (parallel [(set (reg:CC CC_REGNUM) > (compare:CC (match_dup 3) (match_dup 4))) > (set (match_dup 2) > (minus:SI (match_dup 5) >(ltu:SI (reg:CC_C CC_REGNUM) > (const_int > 0])] > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 2) (match_dup 3))) > (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) > (set (reg:CC CC_REGNUM) > (compare:CC (match_dup 0) (match_dup 1] > > The problem is that the reg:CC from the *subsi3_carryin_compare > is not mentioning that the reg:CC is also dependent on the reg:CC > from before. Therefore the *arm_cmpsi_insn appears to be > redundant and thus got removed, because the data values are > identical. > > I think that applies to a number of similar pattern where data > flow is happening through the CC reg. > > So this is a kind of correctness issue, and should be fixed > independently from the optimization issue PR77308. > > Therefore I think the patterns need to specify the true > value that will be in the CC reg, in order for cse to > know what the instructions are really doing. 
> > > Bootstrapped and reg-tested on arm-linux-gnueabihf. > Is it OK for trunk? > I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). >>> Yes, this is still a bit awkward... >>> >>> The N and V bit will be the correct result for the subdi3_compare1 >>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI
[PATCH, e500v2-vxworks] correct CPU name designation for 8548 targets on VxWorks7
Compared to prior versions of regular VxWorks (not AE/653), the VxWorks 7 header files expect the e500v2 family of CPUs to be designated in a slightly different fashion. With this on top of previously posted patches, a build for e500v2-wrs-vxworks proceeds to completion. Committing to mainline. Olivier 2017-09-06 Olivier Hainque * config/powerpcspe/vxworks.h (VXCPU_FOR_8548): Correct definition for VxWorks 7. Adjust surrounding comments. vx7-cpu-8548.diff
[PATCH] Replace PRE "DCE"
The following replaces the weird PRE "DCE" algorithm by a simple work-list based one seeded by inserted_exprs. This makes it possible to get rid of the error-prone marking of stmts necessary and allows re-ordering of elimination dead stmt removal and DCE again (I'm in the process of developing a RPO based VN and want to keep elimination common but move it out of PRE). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2017-09-06 Richard Biener * tree-ssa-pre.c (NECESSARY): Remove. (create_expression_by_pieces): Do not touch pass-local flags. (insert_into_preds_of_block): Likewise. (do_pre_regular_insertion): Likewise. (eliminate_insert): Likewise. (eliminate_dom_walker::before_dom_children): Likewise. (fini_eliminate): Do not look at inserted_exprs. (mark_operand_necessary): Remove. (remove_dead_inserted_code): Replace with simple work-list algorithm based on inserted_exprs and SSA uses. (pass_pre::execute): Re-order fini_eliminate and remove_dead_inserted_code. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 251790) +++ gcc/tree-ssa-pre.c (working copy) @@ -2753,8 +2753,6 @@ find_or_generate_expression (basic_block return NULL_TREE; } -#define NECESSARY GF_PLF_1 - /* Create an expression in pieces, so that we can handle very complex expressions that may be ANTIC, but not necessary GIMPLE. 
BLOCK is the basic block the expression will be inserted into, @@ -2972,7 +2970,6 @@ create_expression_by_pieces (basic_block } bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (forcedname)); - gimple_set_plf (stmt, NECESSARY, false); } gimple_seq_add_seq (stmts, forced_stmts); } @@ -3095,7 +3092,6 @@ insert_into_preds_of_block (basic_block temp = make_temp_ssa_name (type, NULL, "prephitmp"); phi = create_phi_node (temp, block); - gimple_set_plf (phi, NECESSARY, false); VN_INFO_GET (temp)->value_id = val; VN_INFO (temp)->valnum = sccvn_valnum_from_value_id (val); if (VN_INFO (temp)->valnum == NULL_TREE) @@ -3342,7 +3338,6 @@ do_pre_regular_insertion (basic_block bl gimple_stmt_iterator gsi = gsi_after_labels (block); gsi_insert_before (&gsi, assign, GSI_NEW_STMT); - gimple_set_plf (assign, NECESSARY, false); VN_INFO_GET (temp)->value_id = val; VN_INFO (temp)->valnum = sccvn_valnum_from_value_id (val); if (VN_INFO (temp)->valnum == NULL_TREE) @@ -4204,9 +4199,6 @@ eliminate_insert (gimple_stmt_iterator * { gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); VN_INFO_GET (res)->valnum = val; - - if (TREE_CODE (leader) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (leader), NECESSARY, true); } pre_stats.insertions++; @@ -4291,17 +4283,9 @@ eliminate_dom_walker::before_dom_childre remove_phi_node (&gsi, false); - if (inserted_exprs - && !bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (res)) - && TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), NECESSARY, true); - if (!useless_type_conversion_p (TREE_TYPE (res), TREE_TYPE (sprime))) sprime = fold_convert (TREE_TYPE (res), sprime); gimple *stmt = gimple_build_assign (res, sprime); - /* ??? It cannot yet be necessary (DOM walk). 
*/ - gimple_set_plf (stmt, NECESSARY, gimple_plf (phi, NECESSARY)); - gimple_stmt_iterator gsi2 = gsi_after_labels (b); gsi_insert_before (&gsi2, stmt, GSI_NEW_STMT); continue; @@ -4478,10 +4462,6 @@ eliminate_dom_walker::before_dom_childre print_gimple_stmt (dump_file, stmt, 0); } - if (TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), - NECESSARY, true); - pre_stats.eliminations++; gimple *orig_stmt = stmt; if (!useless_type_conversion_p (TREE_TYPE (lhs), @@ -4615,10 +4595,6 @@ eliminate_dom_walker::before_dom_childre { propagate_value (use_p, sprime); modified = true; - if (TREE_CODE (sprime) == SSA_NAME - && !is_gimple_debug (stmt)) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), - NECESSARY, true); } } @@ -4787,11 +4763,7 @@ eliminate_dom_walker::before_dom_childre continue; tree sprime = eliminate_avail (arg); if (sprime && may_propagate_copy (arg, sprime)) - { - propagate_value (use_p, sprime); - if (TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), NECESSARY, true); -
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/06/17 14:51, Richard Earnshaw (lists) wrote: > On 06/09/17 13:44, Bernd Edlinger wrote: >> On 09/04/17 21:54, Bernd Edlinger wrote: >>> Hi Kyrill, >>> >>> Thanks for your review! >>> >>> >>> On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: > On 01/13/17 19:28, Bernd Edlinger wrote: >> On 01/13/17 17:10, Bernd Edlinger wrote: >>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: > Hi, > > this is related to PR77308, the follow-up patch will depend on this > one. > > When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned > before reload, a mis-compilation in libgcc function > __gnu_satfractdasq > was discovered, see [1] for more details. > > The reason seems to be that when the *arm_cmpdi_insn is directly > followed by a *arm_cmpdi_unsigned instruction, both are split > up into this: > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 0) (match_dup 1))) > (parallel [(set (reg:CC CC_REGNUM) > (compare:CC (match_dup 3) (match_dup 4))) > (set (match_dup 2) > (minus:SI (match_dup 5) >(ltu:SI (reg:CC_C CC_REGNUM) > (const_int > 0])] > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 2) (match_dup 3))) > (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) > (set (reg:CC CC_REGNUM) > (compare:CC (match_dup 0) (match_dup 1] > > The problem is that the reg:CC from the *subsi3_carryin_compare > is not mentioning that the reg:CC is also dependent on the reg:CC > from before. Therefore the *arm_cmpsi_insn appears to be > redundant and thus got removed, because the data values are > identical. > > I think that applies to a number of similar pattern where data > flow is happening through the CC reg. > > So this is a kind of correctness issue, and should be fixed > independently from the optimization issue PR77308. > > Therefore I think the patterns need to specify the true > value that will be in the CC reg, in order for cse to > know what the instructions are really doing. 
> > > Bootstrapped and reg-tested on arm-linux-gnueabihf. > Is it OK for trunk? > I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). >>> Yes, this is still a bit awkward... >>> >>> The N and V bit will be the correct result for the subdi3_compare1 >>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI
[Ada] Spurious warning in formal package when use clause is present.
This patch removes a spurious style warning on an operator declared in a generic package when the package is used as a formal of a generic subprogram, and the subprogram body includes a use clause on that package. The following must compile quietly: gcc -c -gnatyO generic_test.adb --- with Generic_2; procedure Generic_Test is generic with package P_1 is new Generic_2 (<>); procedure S_1_G; procedure S_1_G is use P_1; begin null; end S_1_G; pragma Unreferenced (S_1_G); begin null; end Generic_Test; --- with Dummy; pragma Unreferenced (Dummy); with Generic_1; generic package Generic_2 is package P_1 is new Generic_1 (T_1 => Natural); end Generic_2; --- generic type T_1 is limited private; package Generic_1 is private type T_2 is record X : T_1; end record; function "=" (Left, Right : T_2) return Boolean is (True); end Generic_1; -- package Dummy is generic type T is range <>; package Dummy is function Foo (Of_Image : String) return T renames T'Value; end Dummy; end Dummy; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_aux.adb (Is_Generic_Formal): Handle properly formal packages. * sem_ch3.adb (Analyze_Declarations): In a generic subprogram body, do not freeze the formals of the generic unit. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251789) +++ sem_ch3.adb (working copy) @@ -2649,9 +2649,27 @@ -- in order to perform visibility checks on delayed aspects. Adjust_Decl; - Freeze_All (First_Entity (Current_Scope), Decl); - Freeze_From := Last_Entity (Current_Scope); + -- If the current scope is a generic subprogram body. skip + -- the generic formal parameters that are not frozen here. 
+ + if Is_Subprogram (Current_Scope) + and then Nkind (Unit_Declaration_Node (Current_Scope)) + = N_Generic_Subprogram_Declaration + and then Present (First_Entity (Current_Scope)) + then + while Is_Generic_Formal (Freeze_From) loop + Freeze_From := Next_Entity (Freeze_From); + end loop; + + Freeze_All (Freeze_From, Decl); + Freeze_From := Last_Entity (Current_Scope); + + else + Freeze_All (First_Entity (Current_Scope), Decl); + Freeze_From := Last_Entity (Current_Scope); + end if; + -- Current scope is a package specification elsif Scope (Current_Scope) /= Standard_Standard Index: sem_aux.adb === --- sem_aux.adb (revision 251753) +++ sem_aux.adb (working copy) @@ -1053,9 +1053,13 @@ return Nkind_In (Kind, N_Formal_Object_Declaration, - N_Formal_Package_Declaration, N_Formal_Type_Declaration) - or else Is_Formal_Subprogram (E); + or else Is_Formal_Subprogram (E) + + or else + (Ekind (E) = E_Package + and then Nkind (Original_Node (Unit_Declaration_Node (E))) = +N_Formal_Package_Declaration); end if; end Is_Generic_Formal;
[Ada] Volatile component not treated as such
This patch corrects an issue where attributes applied to records were not propagated to components within the records - causing incorrect code to be generated by the backend. Additionally, this ticket fixes another issue with pragma Volatile_Full_Access that allowed the attribute to be applied to a type with aliased components. -- Source -- -- p.ads with System; use System; package P is type Int8_t is mod 2**8; type Rec is record A,B,C,D : aliased Int8_t; end record; type VFA_Rec is new Rec with Volatile_Full_Access; -- ERROR R : Rec with Volatile_Full_Access; -- ERROR type Arr is array (1 .. 4) of aliased Int8_t; type VFA_Arr is new Arr with Volatile_Full_Access; -- ERROR A : Arr with Volatile_Full_Access; -- ERROR type Priv_VFA_Rec is private with Volatile_Full_Access; -- ERROR type Priv_Ind_Rec is private with Independent; -- ERROR type Priv_Vol_Rec is private with Volatile; -- ERROR type Priv_Atomic_Rec is private with Atomic; -- ERROR type Aliased_Rec is tagged record X : aliased Integer; end record with Volatile_Full_Access; -- OK type Atomic_And_VFA_Int is new Integer with Atomic, Volatile_Full_Access; -- ERROR type Atomic_And_VFA_Rec is record X : Integer with Atomic; end record with Volatile_Full_Access; -- ERROR type Atomic_T is tagged record X : Integer with Atomic; -- OK end record; type Atomic_And_VFA_T is new Atomic_T with record Y : Integer; end record with Volatile_Full_Access; -- ERROR type Aliased_And_VFA_T is new Aliased_Rec with record Y : Integer; end record with Volatile_Full_Access; -- ERROR Aliased_And_VFA_Obj : aliased Integer with Volatile_Full_Access; -- ERROR Atomic_And_VFA_Obj: Integer with Atomic, Volatile_Full_Access; -- ERROR Aliased_And_VFA_Obj_B : Aliased_Rec with Volatile_Full_Access; -- ERROR Atomic_And_VFA_Obj_B : Atomic_T with Volatile_Full_Access;-- ERROR private type Priv_VFA_Rec is record X : Integer; end record; type Priv_Ind_Rec is record X : Integer; end record; type Priv_Vol_Rec is record X : Integer; end record; type 
Priv_Atomic_Rec is record X : Integer; end record; end; -- p2.adb with System; procedure P2 is type Type1_T is record Field_1 : Integer; Field_2 : Integer; Field_3 : Integer; Field_4 : Short_Integer; end record; for Type1_T use record Field_1 at 0 range 0 .. 31; Field_2 at 4 range 0 .. 31; Field_3 at 8 range 0 .. 31; Field_4 at 12 range 0 .. 15; end record; for Type1_T'Size use (14) * System.Storage_Unit; pragma Volatile(Type1_T); type Type2_T is record Type1 : Type1_T; Field_1 : Integer; Field_2 : Integer; Field_3 : Integer; Field_4 : Short_Integer; end record; for Type2_T use record Type1 at 0 range 0 .. 111; Field_1 at 14 range 0 .. 31; Field_2 at 18 range 0 .. 31; Field_3 at 22 range 0 .. 31; Field_4 at 26 range 0 .. 15; end record; for Type2_T'Size use (28) * System.Storage_Unit; pragma Volatile(Type2_T); -- ERROR Type1 : Type1_T := (0,0,0,0); Type2 : Type2_T:= ((0,0,0,0),0,0,0,0); begin Type1.Field_1 := Type1.Field_1 +1; Type2.Field_1 := Type2.Field_1 +1; end; -- Compilation and output -- & gcc -c p.ads & gnatmake -q p2.adb p.ads:8:33: cannot apply Volatile_Full_Access (aliased component present) p.ads:10:17: cannot apply Volatile_Full_Access (aliased component present) p.ads:13:33: cannot apply Volatile_Full_Access (aliased component present) p.ads:15:17: cannot apply Volatile_Full_Access (aliased component present) p.ads:18:11: representation item must be after full type declaration p.ads:21:11: representation item must be after full type declaration p.ads:24:11: representation item must be after full type declaration p.ads:27:11: representation item must be after full type declaration p.ads:31:20: cannot apply Volatile_Full_Access (aliased component present) p.ads:34:19: cannot have Volatile_Full_Access and Atomic for same entity p.ads:38:20: cannot have Volatile_Full_Access and Atomic for same entity p.ads:46:20: cannot have Volatile_Full_Access and Atomic for same entity p.ads:50:20: cannot apply Volatile_Full_Access (aliased component present) 
p.ads:53:49: cannot have Volatile_Full_Access and Atomic for same entity p.ads:54:45: cannot apply Volatile_Full_Access (aliased component present) p.ads:55:42: cannot have Volatile_Full_Access and Atomic for same entity p2.adb:30:31: size of volatile field "Type1" must be at least 128 bits p2.adb:31:27: position of volatile field "Field_1" must be multiple of 32 bits p2.adb:32:27: position of volatile field "Field_2" must be multiple of 32 bits p2.adb:33:27: position of volatile field "Field_3" must be multiple of 32 bits
[PATCH] Adjust gcc.c-torture/execute/20050604-1.c
When fiddling around with vector lowering I found the following adjusted testcase helpful testing proper vector lowering of word_mode vector plus. Tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-09-06 Richard Biener * gcc.c-torture/execute/20050604-1.c: Adjust to be a better test for correctness of vector lowering. Index: gcc/testsuite/gcc.c-torture/execute/20050604-1.c === --- gcc/testsuite/gcc.c-torture/execute/20050604-1.c(revision 251790) +++ gcc/testsuite/gcc.c-torture/execute/20050604-1.c(working copy) @@ -6,7 +6,7 @@ extern void abort (void); -typedef short v4hi __attribute__ ((vector_size (8))); +typedef unsigned short v4hi __attribute__ ((vector_size (8))); typedef float v4sf __attribute__ ((vector_size (16))); union @@ -26,7 +26,7 @@ foo (void) { unsigned int i; for (i = 0; i < 2; i++) -u.v += (v4hi) { 12, 14 }; +u.v += (v4hi) { 12, 32768 }; for (i = 0; i < 2; i++) v.v += (v4sf) { 18.0, 20.0, 22 }; } @@ -35,7 +35,7 @@ int main (void) { foo (); - if (u.s[0] != 24 || u.s[1] != 28 || u.s[2] || u.s[3]) + if (u.s[0] != 24 || u.s[1] != 0 || u.s[2] || u.s[3]) abort (); if (v.f[0] != 36.0 || v.f[1] != 40.0 || v.f[2] != 44.0 || v.f[3] != 0.0) abort ();
Re: [PATCH] Fix SLSR issue
On Wed, 6 Sep 2017, Richard Biener wrote: > > This fixes a bogus check for a mode when the type matters. The > test can get fooled by vector ops with integral mode and thus we > later ICE trying to use wide-ints operating on vector constants. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Promptly overlooked some regressions, fixed as follows. Richard. 2017-09-06 Richard Biener * gimple-ssa-strength-reduction.c (find_candidates_dom_walker::before_dom_children): Also allow pointer types. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 251753) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -1742,7 +1742,8 @@ find_candidates_dom_walker::before_dom_c slsr_process_ref (gs); else if (is_gimple_assign (gs) - && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs + && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))) + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs) { tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 06/09/17 13:44, Bernd Edlinger wrote: > On 09/04/17 21:54, Bernd Edlinger wrote: >> Hi Kyrill, >> >> Thanks for your review! >> >> >> On 09/04/17 15:55, Kyrill Tkachov wrote: >>> Hi Bernd, >>> >>> On 18/01/17 15:36, Bernd Edlinger wrote: On 01/13/17 19:28, Bernd Edlinger wrote: > On 01/13/17 17:10, Bernd Edlinger wrote: >> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >>> On 18/12/16 12:58, Bernd Edlinger wrote: Hi, this is related to PR77308, the follow-up patch will depend on this one. When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned before reload, a mis-compilation in libgcc function __gnu_satfractdasq was discovered, see [1] for more details. The reason seems to be that when the *arm_cmpdi_insn is directly followed by a *arm_cmpdi_unsigned instruction, both are split up into this: [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1))) (parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 3) (match_dup 4))) (set (match_dup 2) (minus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3))) (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) (set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1] The problem is that the reg:CC from the *subsi3_carryin_compare is not mentioning that the reg:CC is also dependent on the reg:CC from before. Therefore the *arm_cmpsi_insn appears to be redundant and thus got removed, because the data values are identical. I think that applies to a number of similar pattern where data flow is happening through the CC reg. So this is a kind of correctness issue, and should be fixed independently from the optimization issue PR77308. Therefore I think the patterns need to specify the true value that will be in the CC reg, in order for cse to know what the instructions are really doing. Bootstrapped and reg-tested on arm-linux-gnueabihf. Is it OK for trunk? 
>>> I agree you've found a valid problem here, but I have some issues >>> with >>> the patch itself. >>> >>> >>> (define_insn_and_split "subdi3_compare1" >>>[(set (reg:CC_NCV CC_REGNUM) >>> (compare:CC_NCV >>>(match_operand:DI 1 "register_operand" "r") >>>(match_operand:DI 2 "register_operand" "r"))) >>> (set (match_operand:DI 0 "register_operand" "=&r") >>> (minus:DI (match_dup 1) (match_dup 2)))] >>>"TARGET_32BIT" >>>"#" >>>"&& reload_completed" >>>[(parallel [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 1) (match_dup 2))) >>>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup >>> 2)))]) >>> (parallel [(set (reg:CC_C CC_REGNUM) >>> (compare:CC_C >>> (zero_extend:DI (match_dup 4)) >>> (plus:DI (zero_extend:DI (match_dup 5)) >>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) >>>(set (match_dup 3) >>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] >>> >>> >>> This pattern is now no-longer self consistent in that before the >>> split >>> the overall result for the condition register is in mode CC_NCV, but >>> afterwards it is just CC_C. >>> >>> I think CC_NCV is correct mode (the N, C and V bits all correctly >>> reflect the result of the 64-bit comparison), but that then >>> implies that >>> the cc mode of subsi3_carryin_compare is incorrect as well and >>> should in >>> fact also be CC_NCV. Thinking about this pattern, I'm inclined to >>> agree >>> that CC_NCV is the correct mode for this operation >>> >>> I'm not sure if there are other consequences that will fall out from >>> fixing this (it's possible that we might need a change to >>> select_cc_mode >>> as well). >>> >> Yes, this is still a bit awkward... >> >> The N and V bit will be the correct result for the subdi3_compare1 >> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) >> only gets the C bit correct, the expression for N and V is a different >> one. >> >> It probably works, because the subsi3_carryin_compare instruction sets >> mor
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/04/17 21:54, Bernd Edlinger wrote: > Hi Kyrill, > > Thanks for your review! > > > On 09/04/17 15:55, Kyrill Tkachov wrote: >> Hi Bernd, >> >> On 18/01/17 15:36, Bernd Edlinger wrote: >>> On 01/13/17 19:28, Bernd Edlinger wrote: On 01/13/17 17:10, Bernd Edlinger wrote: > On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >> On 18/12/16 12:58, Bernd Edlinger wrote: >>> Hi, >>> >>> this is related to PR77308, the follow-up patch will depend on this >>> one. >>> >>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned >>> before reload, a mis-compilation in libgcc function >>> __gnu_satfractdasq >>> was discovered, see [1] for more details. >>> >>> The reason seems to be that when the *arm_cmpdi_insn is directly >>> followed by a *arm_cmpdi_unsigned instruction, both are split >>> up into this: >>> >>> [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 0) (match_dup 1))) >>> (parallel [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 3) (match_dup 4))) >>> (set (match_dup 2) >>> (minus:SI (match_dup 5) >>> (ltu:SI (reg:CC_C CC_REGNUM) >>> (const_int >>> 0])] >>> >>> [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 2) (match_dup 3))) >>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) >>> (set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 0) (match_dup 1] >>> >>> The problem is that the reg:CC from the *subsi3_carryin_compare >>> is not mentioning that the reg:CC is also dependent on the reg:CC >>> from before. Therefore the *arm_cmpsi_insn appears to be >>> redundant and thus got removed, because the data values are >>> identical. >>> >>> I think that applies to a number of similar pattern where data >>> flow is happening through the CC reg. >>> >>> So this is a kind of correctness issue, and should be fixed >>> independently from the optimization issue PR77308. >>> >>> Therefore I think the patterns need to specify the true >>> value that will be in the CC reg, in order for cse to >>> know what the instructions are really doing. 
>>> >>> >>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>> Is it OK for trunk? >>> >> I agree you've found a valid problem here, but I have some issues >> with >> the patch itself. >> >> >> (define_insn_and_split "subdi3_compare1" >>[(set (reg:CC_NCV CC_REGNUM) >> (compare:CC_NCV >>(match_operand:DI 1 "register_operand" "r") >>(match_operand:DI 2 "register_operand" "r"))) >> (set (match_operand:DI 0 "register_operand" "=&r") >> (minus:DI (match_dup 1) (match_dup 2)))] >>"TARGET_32BIT" >>"#" >>"&& reload_completed" >>[(parallel [(set (reg:CC CC_REGNUM) >> (compare:CC (match_dup 1) (match_dup 2))) >>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup >> 2)))]) >> (parallel [(set (reg:CC_C CC_REGNUM) >> (compare:CC_C >> (zero_extend:DI (match_dup 4)) >> (plus:DI (zero_extend:DI (match_dup 5)) >>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) >>(set (match_dup 3) >> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] >> >> >> This pattern is now no-longer self consistent in that before the >> split >> the overall result for the condition register is in mode CC_NCV, but >> afterwards it is just CC_C. >> >> I think CC_NCV is correct mode (the N, C and V bits all correctly >> reflect the result of the 64-bit comparison), but that then >> implies that >> the cc mode of subsi3_carryin_compare is incorrect as well and >> should in >> fact also be CC_NCV. Thinking about this pattern, I'm inclined to >> agree >> that CC_NCV is the correct mode for this operation >> >> I'm not sure if there are other consequences that will fall out from >> fixing this (it's possible that we might need a change to >> select_cc_mode >> as well). >> > Yes, this is still a bit awkward... > > The N and V bit will be the correct result for the subdi3_compare1 > a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) > only gets the C bit correct, the expression for N and V is a different > one. 
> > It probably works, because the subsi3_carryin_compare instruction sets > more CC bits than the pattern does explicitly specify the value. > We know the subsi3_carryin_compare also computes the NV bits, but > it is > hard to
replace libiberty with gnulib (was: Re: [PATCH 0/2] add unique_ptr class)
On 05/09/17 18:40, Pedro Alves wrote: On 09/05/2017 05:52 PM, Manuel López-Ibáñez wrote: Yeah, ISTR it was close, though there were a couple things that needed addressing still. The wiki seems to miss a pointer to following iterations/review of that patch (mailing list archives don't cross month boundaries...). You can find it starting here: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01208.html I think this was the latest version posted: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01554.html Thanks, I have updated the hyperlinks in the wiki. Unfortunately, Ayush left and there is no one else to finish the work. While converting individual functions from libiberty to gnulib is more or less straightforward, the build system of GCC is far too complex for any new or less-experienced contributor to finish the job. I have also updated https://gcc.gnu.org/wiki/SummerOfCode I don't believe that this was the only project accepted in 2016, but I cannot remember the others. Didn't GCC apply this year? Cheers, Manuel.
[PATCH] Fix PR82108
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk sofar. Richard. 2017-09-06 Richard Biener PR tree-optimization/82108 * tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment for gap in the non-permutation SLP case. * gcc.dg/vect/pr82108.c: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 251642) +++ gcc/tree-vect-stmts.c (working copy) @@ -7203,7 +7203,6 @@ vectorizable_load (gimple *stmt, gimple_ { first_stmt = GROUP_FIRST_ELEMENT (stmt_info); group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt)); - int group_gap = GROUP_GAP (vinfo_for_stmt (first_stmt)); /* For SLP vectorization we directly vectorize a subchain without permutation. */ if (slp && ! SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()) @@ -7246,7 +7245,8 @@ vectorizable_load (gimple *stmt, gimple_ else { vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); - group_gap_adj = group_gap; + group_gap_adj + = group_size - SLP_INSTANCE_GROUP_SIZE (slp_node_instance); } } else Index: gcc/testsuite/gcc.dg/vect/pr82108.c === --- gcc/testsuite/gcc.dg/vect/pr82108.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/pr82108.c (working copy) @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_float } */ + +#include "tree-vect.h" + +void __attribute__((noinline,noclone)) +downscale_2 (const float* src, int src_n, float* dst) +{ + int i; + + for (i = 0; i < src_n; i += 2) { + const float* a = src; + const float* b = src + 4; + + dst[0] = (a[0] + b[0]) / 2; + dst[1] = (a[1] + b[1]) / 2; + dst[2] = (a[2] + b[2]) / 2; + dst[3] = (a[3] + b[3]) / 2; + + src += 2 * 4; + dst += 4; + } +} + +int main () +{ + const float in[4 * 4] = { + 1, 2, 3, 4, + 5, 6, 7, 8, + + 1, 2, 3, 4, + 5, 6, 7, 8 + }; + float out[2 * 4]; + + check_vect (); + + downscale_2 (in, 4, out); + + if (out[0] != 3 || out[1] != 4 || out[2] != 5 || out[3] != 6 + || out[4] != 3 || out[5] != 4 || out[6] != 5 || out[7] != 6) +__builtin_abort (); + + return 0; +} + +/* { dg-final { 
scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
[Ada] Issue error message on invalid representation clause for extension
This makes the compiler generate an error message also in the case where one of the specified components overlaps the parent field because its size has been explicitly set by a size clause. The compiler must issue an error on 32-bit platforms for the package: 1. package P is 2. 3. type Byte is mod 2**8; 4. for Byte'Size use 8; 5. 6. type Root is tagged record 7. Status : Byte; 8. end record; 9. for Root use record 10. Status at 4 range 0 .. 7; 11. end record; 12. for Root'Size use 64; 13. 14. type Ext is new Root with record 15. Thread_Status : Byte; 16. end record; 17. for Ext use record 18. Thread_Status at 5 range 0 .. 7; | >>> component overlaps parent field of "Ext" 19. end record; 20. 21. end P; 21 lines: 1 error Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * sem_ch13.adb (Check_Record_Representation_Clause): Give an error as soon as one of the specified components overlaps the parent field. Index: sem_ch13.adb === --- sem_ch13.adb(revision 251784) +++ sem_ch13.adb(working copy) @@ -9806,12 +9806,12 @@ -- checking for overlap, since no overlap is possible. Tagged_Parent : Entity_Id := Empty; - -- This is set in the case of a derived tagged type for which we have - -- Is_Fully_Repped_Tagged_Type True (indicating that all components are - -- positioned by record representation clauses). In this case we must - -- check for overlap between components of this tagged type, and the - -- components of its parent. Tagged_Parent will point to this parent - -- type. For all other cases Tagged_Parent is left set to Empty. + -- This is set in the case of an extension for which we have either a + -- size clause or Is_Fully_Repped_Tagged_Type True (indicating that all + -- components are positioned by record representation clauses) on the + -- parent type. In this case we check for overlap between components of + -- this tagged type and the parent component. Tagged_Parent will point + -- to this parent type. 
For all other cases, Tagged_Parent is Empty. Parent_Last_Bit : Uint; -- Relevant only if Tagged_Parent is set, Parent_Last_Bit indicates the @@ -9959,19 +9959,23 @@ if Rectype = Any_Type then return; - else - Rectype := Underlying_Type (Rectype); end if; + Rectype := Underlying_Type (Rectype); + -- See if we have a fully repped derived tagged type declare PS : constant Entity_Id := Parent_Subtype (Rectype); begin - if Present (PS) and then Is_Fully_Repped_Tagged_Type (PS) then + if Present (PS) and then Known_Static_RM_Size (PS) then Tagged_Parent := PS; +Parent_Last_Bit := RM_Size (PS) - 1; + elsif Present (PS) and then Is_Fully_Repped_Tagged_Type (PS) then +Tagged_Parent := PS; + -- Find maximum bit of any component of the parent type Parent_Last_Bit := UI_From_Int (System_Address_Size - 1); @@ -10063,7 +10067,7 @@ ("bit number out of range of specified size", Last_Bit (CC)); - -- Check for overlap with tag component + -- Check for overlap with tag or parent component else if Is_Tagged_Type (Rectype) @@ -10073,27 +10077,20 @@ ("component overlaps tag field of&", Component_Name (CC), Rectype); Overlap_Detected := True; + + elsif Present (Tagged_Parent) + and then Fbit <= Parent_Last_Bit + then + Error_Msg_NE +("component overlaps parent field of&", + Component_Name (CC), Rectype); + Overlap_Detected := True; end if; if Hbit < Lbit then Hbit := Lbit; end if; end if; - --- Check parent overlap if component might overlap parent field - -if Present (Tagged_Parent) and then Fbit <= Parent_Last_Bit then - Pcomp := First_Component_Or_Discriminant (Tagged_Parent); - while Present (Pcomp) loop - if not Is_Tag (Pcomp) -and then Chars (Pcomp) /= Name_uParent - then - Check_Component_Overlap (Comp, Pcomp); - end if; - - Next_Component_Or_Discriminant (Pcomp); - end loop; -end if; end if; Next (CC);
[Ada] Reject invalid use of Global/Depends on object declaration
GNAT failed to issue an error on a Global/Depends aspect put on an object declaration, which is only allowed for a task object. Instead it crashed. Now fixed. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Yannick Moy * sem_prag.adb (Analyze_Depends_Global): Reinforce test on object declarations to only consider valid uses of Global/Depends those on single concurrent objects. Index: sem_prag.adb === --- sem_prag.adb(revision 251778) +++ sem_prag.adb(working copy) @@ -4080,7 +4080,10 @@ -- Object declaration of a single concurrent type - elsif Nkind (Subp_Decl) = N_Object_Declaration then + elsif Nkind (Subp_Decl) = N_Object_Declaration + and then Is_Single_Concurrent_Object + (Unique_Defining_Entity (Subp_Decl)) + then null; -- Single task type
[Ada] Missing finalization of generalized indexed element
This patch modifies the finalization mechanism to recognize a heavily expanded generalized indexing where the element type requires finalization actions. -- Source -- -- types.ads with Ada.Finalization; use Ada.Finalization; package Types is type Element is new Controlled with record Id : Natural := 0; end record; procedure Adjust (Obj : in out Element); procedure Finalize (Obj : in out Element); procedure Initialize (Obj : In out Element); subtype Index is Integer range 1 .. 3; type Collection is array (Index) of Element; type Vector is new Controlled with record Id : Natural := 0; Elements : Collection; end record with Constant_Indexing => Element_At; procedure Adjust (Obj : in out Vector); procedure Finalize (Obj : in out Vector); procedure Initialize (Obj : In out Vector); function Element_At (Obj : Vector; Pos : Index) return Element'Class; function Make_Vector return Vector'Class; end Types; -- types.adb with Ada.Text_IO; use Ada.Text_IO; package body Types is Id_Gen : Natural := 10; procedure Adjust (Obj : in out Element) is Old_Id : constant Natural := Obj.Id; New_Id : constant Natural := Old_Id + 1; begin if Old_Id = 0 then Put_Line (" Element adj ERROR"); else Put_Line (" Element adj" & Old_Id'Img & " ->" & New_Id'Img); Obj.Id := New_Id; end if; end Adjust; procedure Adjust (Obj : in out Vector) is Old_Id : constant Natural := Obj.Id; New_Id : constant Natural := Old_Id + 1; begin if Old_Id = 0 then Put_Line (" Vector adj ERROR"); else Put_Line (" Vector adj" & Old_Id'Img & " ->" & New_Id'Img); Obj.Id := New_Id; end if; end Adjust; function Element_At (Obj : Vector; Pos : Index) return Element'Class is begin return Obj.Elements (Pos); end Element_At; procedure Finalize (Obj : in out Element) is begin if Obj.Id = 0 then Put_Line (" Element fin ERROR"); else Put_Line (" Element fin" & Obj.Id'Img); Obj.Id := 0; end if; end Finalize; procedure Finalize (Obj : in out Vector) is begin if Obj.Id = 0 then Put_Line (" Vector fin ERROR"); else Put_Line (" Vector 
fin" & Obj.Id'Img); Obj.Id := 0; end if; end Finalize; procedure Initialize (Obj : In out Element) is begin Obj.Id := Id_Gen; Id_Gen := Id_Gen + 10; Put_Line (" Element ini" & Obj.Id'Img); end Initialize; procedure Initialize (Obj : In out Vector) is begin Obj.Id := Id_Gen; Id_Gen := Id_Gen + 10; Put_Line (" Vector ini" & Obj.Id'Img); end Initialize; function Make_Vector return Vector'Class is Result : Vector; begin return Result; end Make_Vector; end Types; -- main.adb with Ada.Text_IO; use Ada.Text_IO; with Types; use Types; procedure Main is begin Put_Line ("Main"); declare Vec : Vector'Class := Make_Vector; Elem : Element'Class := Vec (1); begin Put_Line ("Main middle"); end; Put_Line ("Main end"); end Main; -- Compilation and output -- $ gnatmake -q main.adb $ ./main.adb Main Element ini 10 Element ini 20 Element ini 30 Vector ini 40 Element adj 10 -> 11 Element adj 20 -> 21 Element adj 30 -> 31 Vector adj 40 -> 41 Vector fin 40 Element fin 30 Element fin 20 Element fin 10 Element adj 11 -> 12 Element adj 21 -> 22 Element adj 31 -> 32 Vector adj 41 -> 42 Vector fin 41 Element fin 31 Element fin 21 Element fin 11 Element adj 12 -> 13 Element adj 13 -> 14 Element fin 13 Main middle Element fin 14 Vector fin 42 Element fin 32 Element fin 22 Element fin 12 Main end Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Hristian Kirtchev * exp_util.adb (Is_Controlled_Indexing): New routine. (Is_Displace_Call): Use routine Strip to remove indirections. (Is_Displacement_Of_Object_Or_Function_Result): Code clean up. Add a missing case of controlled generalized indexing. (Is_Source_Object): Use routine Strip to remove indirections. (Strip): New routine. Index: exp_util.adb === --- exp_util.adb(revision 251784) +++ exp_util.adb(working copy) @@ -7590,22 +7590,28 @@ (Obj_Id : Entity_Id) return Boolean is function Is_Controlled_Function_Call (N : Node_Id) return Boolean; - -- Determine if particular node denotes a controlled function call. 
The - -- call may have been heavily expanded. + -- Determine whether node N denotes a controlled function call + function Is_Controlled_Indexing (N : Node_Id) return Boolean; + -- Det
[Ada] Handling of inherited and explicit postconditions
This patch fixes the handling of overriding operations that have both an explicit postcondition and an inherited classwide one.

Executing:

   gnatmake -q -gnata post_class.adb
   post_class

must yield:

   raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : failed inherited postcondition from the_package.ads:4

---
with The_Package; use The_Package;
procedure Post_Class is
   X : D;
begin
   Proc (X);
end Post_Class;
---
package The_Package is
   type T is tagged null record;
   function F (X : T) return Boolean is (True);
   procedure Proc (X : in out T)
     with Post => True, Post'Class => F (X);
   type D is new T with null record;
   overriding function F (X : D) return Boolean is (False);
end The_Package;
---
package body The_Package is
   procedure Proc (X : in out T) is
   begin
      null;
   end Proc;
end The_Package;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Ed Schonberg

	* einfo.ads, einfo.adb (Get_Classwide_Pragma): New utility, to
	retrieve the inherited classwide precondition/postcondition of a
	subprogram.
	* freeze.adb (Freeze_Entity): Use Get_Classwide_Pragma when
	freezing a subprogram, to complete the generation of the
	corresponding checking code.
Index: einfo.adb === --- einfo.adb (revision 251783) +++ einfo.adb (working copy) @@ -7481,6 +7481,39 @@ return Empty; end Get_Pragma; + -- + -- Get_Classwide_Pragma -- + -- + + function Get_Classwide_Pragma + (E : Entity_Id; + Id : Pragma_Id) return Node_Id +is + Item : Node_Id; + Items : Node_Id; + + begin + Items := Contract (E); + if No (Items) then + return Empty; + end if; + + Item := Pre_Post_Conditions (Items); + + while Present (Item) loop + if Nkind (Item) = N_Pragma + and then Get_Pragma_Id (Pragma_Name_Unmapped (Item)) = Id + and then Class_Present (Item) + then +return Item; + else +Item := Next_Pragma (Item); + end if; + end loop; + + return Empty; + end Get_Classwide_Pragma; + -- -- Get_Record_Representation_Clause -- -- Index: einfo.ads === --- einfo.ads (revision 251783) +++ einfo.ads (working copy) @@ -8295,6 +8295,12 @@ --Test_Case --Volatile_Function + function Get_Classwide_Pragma + (E : Entity_Id; + Id : Pragma_Id) return Node_Id; + -- Examine Rep_Item chain to locate a classwide pre- or postcondition + -- of a primitive operation. Returns Empty if not present. + function Get_Record_Representation_Clause (E : Entity_Id) return Node_Id; -- Searches the Rep_Item chain for a given entity E, for a record -- representation clause, and if found, returns it. 
Returns Empty Index: freeze.adb === --- freeze.adb (revision 251781) +++ freeze.adb (working copy) @@ -1418,8 +1418,8 @@ New_Prag : Node_Id; begin - A_Pre := Get_Pragma (Par_Prim, Pragma_Precondition); - if Present (A_Pre) and then Class_Present (A_Pre) then + A_Pre := Get_Classwide_Pragma (Par_Prim, Pragma_Precondition); + if Present (A_Pre) then New_Prag := New_Copy_Tree (A_Pre); Build_Class_Wide_Expression (Prag => New_Prag, @@ -1436,9 +1436,9 @@ end if; end if; - A_Post := Get_Pragma (Par_Prim, Pragma_Postcondition); + A_Post := Get_Classwide_Pragma (Par_Prim, Pragma_Postcondition); - if Present (A_Post) and then Class_Present (A_Post) then + if Present (A_Post) then New_Prag := New_Copy_Tree (A_Post); Build_Class_Wide_Expression (Prag => New_Prag,
[Ada] Dimensional checking and generic subprograms
This patch enhances dimensionality checking to cover generic subprograms that are intended to apply to types of different dimensions, such as an integration function. Dimensionality checking is performed in each instance, and relies on a special handling of conversion operations to prevent spurious dimensional errors in the generic unit itself.

The following must compile quietly:

   gcc -c -gnatws integrate.adb

---
package Dims with SPARK_Mode is

   -- Setup Dimension System

   type Unit_Type is new Float with
     Dimension_System =>
       ((Unit_Name => Meter,    Unit_Symbol => 'm',   Dim_Symbol => 'L'),
        (Unit_Name => Kilogram, Unit_Symbol => "kg",  Dim_Symbol => 'M'),
        (Unit_Name => Second,   Unit_Symbol => 's',   Dim_Symbol => 'T'),
        (Unit_Name => Ampere,   Unit_Symbol => 'A',   Dim_Symbol => 'I'),
        (Unit_Name => Kelvin,   Unit_Symbol => 'K',   Dim_Symbol => "Theta"),
        (Unit_Name => Radian,   Unit_Symbol => "Rad", Dim_Symbol => "A")),
     Default_Value => 0.0;

   -- Base Dimensions

   subtype Length_Type is Unit_Type with
     Dimension => (Symbol => 'm', Meter => 1, others => 0);

   subtype Time_Type is Unit_Type with
     Dimension => (Symbol => 's', Second => 1, others => 0);

   subtype Linear_Velocity_Type is Unit_Type with
     Dimension => (Meter => 1, Second => -1, others => 0);

   -- Base Units

   Meter  : constant Length_Type := Length_Type (1.0);
   Second : constant Time_Type := Time_Type (1.0);

end Dims;
---
with Dims; use Dims;
procedure Integrate is
   generic
      type Op1 is new Unit_Type;
      type Op2 is new Unit_Type;
      type Res is new Unit_Type;
   function I (X : Op1; Y : Op2) return Res;

   function I (X : Op1; Y : Op2) return Res is
   begin
      return Res (Unit_Type (X) * Unit_Type (Y));
   end I;

   function Distance is new I (Time_Type, Linear_Velocity_Type, Length_Type);

   Secs    : Time_Type := 5.0;
   Speed   : Linear_Velocity_Type := 10.0;
   Covered : Length_Type;

begin
   Covered := Distance (Secs, Speed);

   declare
      subtype Area is Unit_Type with
        Dimension => (Meter => 2, others => 0);

      My_Little_Acre : Area;

      function Acres is new I (Length_Type, Length_Type,
Area); begin My_Little_Acre := Covered * Covered; My_Little_Acre := Acres (Covered, Covered); end; end Integrate; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_dim.adb (Analyze_Dimension): In an instance, a type conversion takes its dimensions from the expression, not from the context type. (Dimensions_Of_Operand): Ditto. Index: sem_dim.adb === --- sem_dim.adb (revision 251753) +++ sem_dim.adb (working copy) @@ -1161,7 +1161,6 @@ | N_Qualified_Expression | N_Selected_Component | N_Slice -| N_Type_Conversion | N_Unchecked_Type_Conversion => Analyze_Dimension_Has_Etype (N); @@ -1191,7 +1190,17 @@ when N_Subtype_Declaration => Analyze_Dimension_Subtype_Declaration (N); + when N_Type_Conversion => +if In_Instance + and then Exists (Dimensions_Of (Expression (N))) +then + Set_Dimensions (N, Dimensions_Of (Expression (N))); +else + Analyze_Dimension_Has_Etype (N); +end if; + when N_Unary_Op => + Analyze_Dimension_Unary_Op (N); when others => @@ -1378,11 +1387,24 @@ -- A type conversion may have been inserted to rewrite other -- expressions, e.g. function returns. Dimensions are those of - -- the target type. + -- the target type, unless this is a conversion in an instance, + -- in which case the proper dimensions are those of the operand, elsif Nkind (N) = N_Type_Conversion then -return Dimensions_Of (Etype (N)); +if In_Instance + and then Is_Generic_Actual_Type (Etype (Expression (N))) +then + return Dimensions_Of (Etype (Expression (N))); +elsif In_Instance + and then Exists (Dimensions_Of (Expression (N))) +then + return Dimensions_Of (Expression (N)); + +else + return Dimensions_Of (Etype (N)); +end if; + -- Otherwise return the default dimensions else
[Ada] Time_IO.Value enhanced to parse ISO-8861 UTC date and time
The function Value of package GNAT.Calendar.Time_IO has been enhanced to parse strings containing UTC date and time. After this patch the following test works fine. with Ada.Calendar; use Ada.Calendar; with Ada.Text_IO; use Ada.Text_IO; with GNAT.Calendar.Time_IO; use GNAT.Calendar.Time_IO; procedure Do_Test is Picture : Picture_String := "%Y-%m-%dT%H:%M:%S,%i"; T1 : Time; T2 : Time; T3 : Time; T4 : Time; T5 : Time; begin T1 := Value ("2017-04-14T14:47:06"); pragma Assert (Image (T1, Picture) = "2017-04-14T14:47:06,000"); T2 := Value ("2017-04-14T14:47:06Z"); pragma Assert (Image (T2, Picture) = "2017-04-14T14:47:06,000"); T3 := Value ("2017-04-14T14:47:06,999"); pragma Assert (Image (T3, Picture) = "2017-04-14T14:47:06,999"); T4 := Value ("2017-04-14T19:47:06+05"); pragma Assert (Image (T4, Picture) = "2017-04-14T14:47:06,000"); T5 := Value ("2017-04-14T09:00:06-05:47"); pragma Assert (Image (T5, Picture) = "2017-04-14T14:47:06,000"); end; Command: gnatmake -gnata do_test.adb; ./do_test Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Javier Miranda * g-catiio.ads, g-catiio.adb (Value): Extended to parse an UTC time following ISO-8861. Index: g-catiio.adb === --- g-catiio.adb(revision 251753) +++ g-catiio.adb(working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1999-2016, AdaCore -- +-- Copyright (C) 1999-2017, AdaCore -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -93,6 +93,26 @@ Length : Natural := 0) return String; -- As above with N provided in Integer format + procedure Parse_ISO_8861_UTC + (Date: String; + Time: out Ada.Calendar.Time; + Success : out Boolean); + -- Subsidiary of function Value. It parses the string Date, interpreted as + -- an ISO 8861 time representation, and returns corresponding Time value. + -- Success is set to False when the string is not a supported ISO 8861 + -- date. 
The following regular expression defines the supported format: + -- + --(mmdd | '-'mm'-'dd)'T'(hhmmss | hh':'mm':'ss) + -- [ ('Z' | ('.' | ',') s{s} | ('+'|'-')hh':'mm) ] + -- + -- Trailing characters (in particular spaces) are not allowed. + -- + -- Examples: + -- + --2017-04-14T14:47:0620170414T14:47:0620170414T144706 + --2017-04-14T14:47:06,12 20170414T14:47:06.12 + --2017-04-14T19:47:06+05 20170414T09:00:06-05:47 + --- -- Am_Pm -- --- @@ -531,7 +551,7 @@ "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"); -- Short version of the month names, used when parsing date strings - S : String := Str; + S : String := Str; begin GNAT.Case_Util.To_Upper (S); @@ -545,6 +565,390 @@ return Abbrev_Upper_Month_Names'First; end Month_Name_To_Number; + + -- Parse_ISO_8861_UTC -- + + + procedure Parse_ISO_8861_UTC + (Date: String; + Time: out Ada.Calendar.Time; + Success : out Boolean) + is + Index : Positive := Date'First; + -- The current character scan index. After a call to Advance, Index + -- points to the next character. + + End_Of_Source_Reached : exception; + -- An exception used to signal that the scan pointer has reached the + -- end of the source string. + + Wrong_Syntax : exception; + -- An exception used to signal that the scan pointer has reached an + -- unexpected character in the source string. + + procedure Advance; + pragma Inline (Advance); + -- Past the current character of Date + + procedure Advance_Digits (Num_Digits : Positive); + pragma Inline (Advance_Digits); + -- Past the given number of digit characters + + function Scan_Day return Day_Number; + pragma Inline (Scan_Day); + -- Scan the two digits of a day number and return its value + + function Scan_Hour return Hour_Number; + pragma Inline (Scan_Hour); + -- Scan the two digits of an hour number and return its value + + function Scan_Minute return Minute_Number; + pragma Inline (Scan_Minute); + -- Scan the two digits of a minute number and return its value + +
[Ada] Eliminate out-of-line body of local inlined subprograms
This improves a little the algorithm used to compute the set of externally visible entities in package bodies to make it less conservative in the presence of local inlined subprograms. The typical effect is to eliminate the out-of-line body if the subprogram is inlined at every call site: package Q3 is procedure Caller; end Q3; package body Q3 is I : Integer := 0; procedure Inner is begin I := 1; end; procedure Proc; pragma Inline (Proc); procedure Proc is begin Inner; end; procedure Caller is begin Proc; end; end Q3; The out-of-line body of Proc is now eliminated at -O1 and above. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * inline.adb (Split_Unconstrained_Function): Also set Is_Inlined on the procedure created to encapsulate the body. * sem_ch7.adb: Add with clause for GNAT.HTable. (Entity_Table_Size): New constant. (Entity_Hash): New function. (Subprogram_Table): New instantiation of GNAT.Htable.Simple_HTable. (Is_Subprogram_Ref): Rename into... (Scan_Subprogram_Ref): ...this. Record references to subprograms in the table instead of bailing out on them. Scan the value of constants if it is not known at compile time. (Contains_Subprograms_Refs): Rename into... (Scan_Subprogram_Refs): ...this. (Has_Referencer): Scan the body of all inlined subprograms. Reset the Is_Public flag on subprograms if they are not actually referenced. (Hide_Public_Entities): Beef up comment on the algorithm. Reset the table of subprograms on entry. Index: inline.adb === --- inline.adb (revision 251779) +++ inline.adb (working copy) @@ -1607,7 +1607,7 @@ -- N is an inlined function body that returns an unconstrained type and -- has a single extended return statement. Split N in two subprograms: -- a procedure P' and a function F'. 
The formals of P' duplicate the - -- formals of N plus an extra formal which is used return a value; + -- formals of N plus an extra formal which is used to return a value; -- its body is composed by the declarations and list of statements -- of the extended return statement of N. @@ -1915,6 +1915,7 @@ Pop_Scope; Build_Procedure (Proc_Id, Decl_List); Insert_Actions (N, Decl_List); +Set_Is_Inlined (Proc_Id); Push_Scope (Scope); end; Index: sem_ch7.adb === --- sem_ch7.adb (revision 251763) +++ sem_ch7.adb (working copy) @@ -70,6 +70,8 @@ with Style; with Uintp; use Uintp; +with GNAT.HTable; + package body Sem_Ch7 is --- @@ -187,6 +189,38 @@ end if; end Analyze_Package_Body; + -- + -- Analyze_Package_Body_Helper Data and Subprograms -- + -- + + Entity_Table_Size : constant := 4096; + -- Number of headers in hash table + + subtype Entity_Header_Num is Integer range 0 .. Entity_Table_Size - 1; + -- Range of headers in hash table + + function Entity_Hash (Id : Entity_Id) return Entity_Header_Num; + -- Simple hash function for Entity_Ids + + package Subprogram_Table is new GNAT.Htable.Simple_HTable + (Header_Num => Entity_Header_Num, + Element=> Boolean, + No_Element => False, + Key=> Entity_Id, + Hash => Entity_Hash, + Equal => "="); + -- Hash table to record which subprograms are referenced. It is declared + -- at library level to avoid elaborating it for every call to Analyze. + + - + -- Entity_Hash -- + - + + function Entity_Hash (Id : Entity_Id) return Entity_Header_Num is + begin + return Entity_Header_Num (Id mod Entity_Table_Size); + end Entity_Hash; + - -- Analyze_Package_Body_Helper -- - @@ -200,8 +234,8 @@ -- Attempt to hide all public entities found in declarative list Decls -- by resetting their Is_Public flag to False depending on whether the -- entities are not referenced by inlined or generic bodies. This kind - -- of processing is a conservative approximation and may still leave - -- certain entities externally visible. 
+ -- of processing is a conservative approximation and will still leave + -- entities externally visible if the package is not simple enough. procedure Install_Composite_Operations (P : Entity_Id); -- Composite types declared in the current scope may depend on types @@ -214,11 +248,6 @@ -- procedure Hide_Public_Entities (Decls : List_Id) is -
[Ada] Crash when issuing warning on uninitialized value
When issuing a warning on a read of an uninitialized variable through reading an attribute such as Loop_Entry, GNAT could crash. Now fixed. GNAT issues a warning as expected on the following code: $ gcc -c s.adb 1. package S is 2. 3.type Array_Range is range 1 .. 10; 4. 5.type IntArray is array (Array_Range) of Integer; 6. 7.procedure Move (Dest, Src : aliased out IntArray); 8. 9. end S; 1. package body S is 2. 3.procedure Move (Dest, Src : aliased out IntArray) is 4.begin 5. for Index in Dest'Range loop 6. pragma Assert (for all J in Dest'First .. Index - 1 => 7. Dest (J) = Src'Loop_Entry (J)); 1 2 >>> warning: "Dest" may be referenced before it has a value >>> warning: "Src" may be referenced before it has a value 8. 9. Dest (Index) := Src (Index); 10. Src (Index) := 0; 11. end loop; 12.end Move; 13. 14. end S; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Yannick Moy * sem_warn.adb (Check_References): Take into account possibility of attribute reference as original node. Index: sem_warn.adb === --- sem_warn.adb(revision 251773) +++ sem_warn.adb(working copy) @@ -1382,16 +1382,22 @@ -- deal with case where original unset reference has been -- rewritten during expansion. - -- In some cases, the original node may be a type conversion - -- or qualification, and in this case we want the object - -- entity inside. + -- In some cases, the original node may be a type + -- conversion, a qualification or an attribute reference and + -- in this case we want the object entity inside. Same for + -- an expression with actions. UR := Original_Node (UR); while Nkind (UR) = N_Type_Conversion or else Nkind (UR) = N_Qualified_Expression or else Nkind (UR) = N_Expression_With_Actions +or else Nkind (UR) = N_Attribute_Reference loop - UR := Expression (UR); + if Nkind (UR) = N_Attribute_Reference then +UR := Prefix (UR); + else +UR := Expression (UR); + end if; end loop; -- Don't issue warning if appearing inside Initial_Condition
Re: Add support to trace comparison instructions and switch statements
Hi Jakub I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address". And run my fuzzer with pc and cmp feedbacks for hours. It works fine. About __sanitizer_cov_trace_cmp{f,d} , yes, it isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons. I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs. But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary. And thanks again for your professional help. Wish Wu -- From:Jakub Jelinek Time:2017 Sep 6 (Wed) 05:44 To:Wish Wu Cc:Dmitry Vyukov ; gcc-patches ; Jeff Law ; wishwu007 Subject:Re: Add support to trace comparison instructions and switch statements On Tue, Sep 05, 2017 at 09:03:52PM +0800, 吴潍浠(此彼) wrote: > Attachment is my updated path. > The implementation of parse_sanitizer_options is not elegance enough. Mixing > handling flags of fsanitize is easy to make mistakes. To avoid too many further iterations, I took the liberty to tweak your patch. From https://clang.llvm.org/docs/SanitizerCoverage.html I've noticed that since 2017-08-11 clang/llvm wants to emit __sanitizer_cov_trace_const_cmpN with the first argument a constant if one of the comparison operands is a constant, so the patch implements that too. I wonder about the __sanitizer_cov_trace_cmp{f,d} entry-points, because I can't find them on that page nor in llvm sources. I've also added handling of COND_EXPRs and added some documentation. I've bootstrapped/regtested the patch on x86_64-linux and i686-linux. Can you test it on whatever you want to use the patch for? 
2017-09-05 Wish Wu Jakub Jelinek * asan.c (initialize_sanitizer_builtins): Add BT_FN_VOID_UINT8_UINT8, BT_FN_VOID_UINT16_UINT16, BT_FN_VOID_UINT32_UINT32, BT_FN_VOID_UINT64_UINT64, BT_FN_VOID_FLOAT_FLOAT, BT_FN_VOID_DOUBLE_DOUBLE and BT_FN_VOID_UINT64_PTR variables. * builtin-types.def (BT_FN_VOID_UINT8_UINT8): New fn type. (BT_FN_VOID_UINT16_UINT16): Likewise. (BT_FN_VOID_UINT32_UINT32): Likewise. (BT_FN_VOID_FLOAT_FLOAT): Likewise. (BT_FN_VOID_DOUBLE_DOUBLE): Likewise. (BT_FN_VOID_UINT64_PTR): Likewise. * common.opt (flag_sanitize_coverage): New variable. (fsanitize-coverage=trace-pc): Remove. (fsanitize-coverage=): Add. * flag-types.h (enum sanitize_coverage_code): New enum. * fold-const.c (fold_range_test): Disable non-short-circuit optimization if flag_sanitize_coverage. (fold_truth_andor): Likewise. * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. * opts.c (COVERAGE_SANITIZER_OPT): Define. (coverage_sanitizer_opts): New array. (get_closest_sanitizer_option): Add OPTS argument, handle also OPT_fsanitize_coverage_. (parse_sanitizer_options): Adjusted to also handle OPT_fsanitize_coverage_. (common_handle_option): Add OPT_fsanitize_coverage_. * sancov.c (instrument_comparison, instrument_switch): New function. (sancov_pass): Add trace-cmp support. * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1, BUILT_IN_SANITIZER_COV_TRACE_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CMP4, BUILT_IN_SANITIZER_COV_TRACE_CMP8, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8, BUILT_IN_SANITIZER_COV_TRACE_CMPF, BUILT_IN_SANITIZER_COV_TRACE_CMPD, BUILT_IN_SANITIZER_COV_TRACE_SWITCH): New builtins. * doc/invoke.texi: Document -fsanitize-coverage=trace-cmp. * gcc.dg/sancov/cmp0.c: New test. 
--- gcc/asan.c.jj 2017-09-04 09:55:26.600687479 +0200 +++ gcc/asan.c 2017-09-05 15:39:32.452612728 +0200 @@ -2709,6 +2709,29 @@ initialize_sanitizer_builtins (void) tree BT_FN_SIZE_CONST_PTR_INT = build_function_type_list (size_type_node, const_ptr_type_node, integer_type_node, NULL_TREE); + + tree BT_FN_VOID_UINT8_UINT8 += build_function_type_list (void_type_node, unsigned_char_type_node, +unsigned_char_type_node, NULL_TREE); + tree BT_FN_VOID_UINT16_UINT16 += build_function_type_list (void_type_node, uint16_type_node, +uint16_type_node, NULL_TREE); + tree BT_FN_VOID_UINT32_UINT32 += build_function_type_list (void_type_node, uint32_type_node, +uint32_type_node, NULL_TREE); + tree BT_FN_VOID_UINT64_UINT64 += build_function_type_list (void_type_node, uint64_type_node, +uint64_type_node, NULL_TREE); + tree BT_FN_VOID_FLOAT_FLOAT += build_function_type_list (void_type_node, float_type_node, +float_type_node, NULL_TREE); + tree BT_FN_VOID_DOUBLE_DOUBLE += build_function_type_list (void_type_node, double_type_node, +double_type_node, NULL_TREE); + tree BT_FN_VOID_UINT64_PTR += build_function_type_list (void_type_node, uint64_type_node, +ptr_type_node, NULL_TREE); + tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
[C++ PATCH] rename lookup_fnfields_slot
This patch renames lookup_fnfields_slot{,_nolazy} to get_class_binding{,_direct}. It also removes a few now-unneeded checks for CLASSTYPE_METHOD_VEC being non-null. You may notice that the new names mention nothing about the kind of member looked for. That's intentional. These functions will absorb the non-function member lookup functionality. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.h (lookup_fnfields_slot_nolazy, lookup_fnfields_slot): Rename to ... (get_class_binding_direct, get_class_binding): ... here. * name-lookup.c (lookup_fnfields_slot_nolazy, lookup_fnfields_slot): Rename to ... (get_class_binding_direct, get_class_binding): ... here. * cp-tree.h (CLASSTYPE_CONSTRUCTORS, CLASSTYPE_DESTRUCTOR): Adjust. * call.c (build_user_type_conversion_1): Adjust. (has_trivial_copy_assign_p): Adjust. (has_trivial_copy_p): Adjust. * class.c (get_basefndecls) Adjust. (vbase_has_user_provided_move_assign) Adjust. (classtype_has_move_assign_or_move_ctor_p): Adjust. (type_build_ctor_call, type_build_dtor_call): Adjust. * decl.c (register_dtor_fn): Adjust. * decl2.c (check_classfn): Adjust. * pt.c (retrieve_specialization): Adjust. (check_explicit_specialization): Adjust. (do_class_deduction): Adjust. * search.c (lookup_field_r): Adjust. (look_for_overrides_here, lookup_conversions_r): Adjust. * semantics.c (classtype_has_nothrow_assign_or_copy_p): Adjust. * tree.c (type_has_nontrivial_copy_init): Adjust. * method.c (lazily_declare_fn): Adjust comment. Index: call.c === --- call.c (revision 251779) +++ call.c (working copy) @@ -3738,7 +3738,7 @@ build_user_type_conversion_1 (tree totyp if (CLASS_TYPE_P (totype)) /* Use lookup_fnfields_slot instead of lookup_fnfields to avoid creating a garbage BASELINK; constructors can't be inherited. 
*/ -ctors = lookup_fnfields_slot (totype, complete_ctor_identifier); +ctors = get_class_binding (totype, complete_ctor_identifier); /* FIXME P0135 doesn't say what to do in C++17 about list-initialization from a single element. For now, let's handle constructors as before and also @@ -8243,9 +8243,7 @@ first_non_public_field (tree type) static bool has_trivial_copy_assign_p (tree type, bool access, bool *hasassign) { - tree fns = cp_assignment_operator_id (NOP_EXPR); - fns = lookup_fnfields_slot (type, fns); - + tree fns = get_class_binding (type, cp_assignment_operator_id (NOP_EXPR)); bool all_trivial = true; /* Iterate over overloads of the assignment operator, checking @@ -8294,8 +8292,7 @@ has_trivial_copy_assign_p (tree type, bo static bool has_trivial_copy_p (tree type, bool access, bool hasctor[2]) { - tree fns = lookup_fnfields_slot (type, complete_ctor_identifier); - + tree fns = get_class_binding (type, complete_ctor_identifier); bool all_trivial = true; for (ovl_iterator oi (fns); oi; ++oi) Index: class.c === --- class.c (revision 251779) +++ class.c (working copy) @@ -2745,7 +2745,7 @@ get_basefndecls (tree name, tree t, vec< bool found_decls = false; /* Find virtual functions in T with the indicated NAME. */ - for (ovl_iterator iter (lookup_fnfields_slot (t, name)); iter; ++iter) + for (ovl_iterator iter (get_class_binding (t, name)); iter; ++iter) { tree method = *iter; @@ -5034,14 +5034,12 @@ bool vbase_has_user_provided_move_assign (tree type) { /* Does the type itself have a user-provided move assignment operator? */ - for (ovl_iterator iter (lookup_fnfields_slot_nolazy - (type, cp_assignment_operator_id (NOP_EXPR))); - iter; ++iter) -{ - tree fn = *iter; - if (move_fn_p (fn) && user_provided_p (fn)) + if (!CLASSTYPE_LAZY_MOVE_ASSIGN (type)) +for (ovl_iterator iter (get_class_binding_direct + (type, cp_assignment_operator_id (NOP_EXPR))); + iter; ++iter) + if (!DECL_ARTIFICIAL (*iter) && move_fn_p (*iter)) return true; -} /* Do any of its bases? 
*/ tree binfo = TYPE_BINFO (type); @@ -5180,13 +5178,12 @@ classtype_has_move_assign_or_move_ctor_p && !CLASSTYPE_LAZY_MOVE_ASSIGN (t))); if (!CLASSTYPE_LAZY_MOVE_CTOR (t)) -for (ovl_iterator iter (lookup_fnfields_slot_nolazy (t, ctor_identifier)); - iter; ++iter) +for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter) if ((!user_p || !DECL_ARTIFICIAL (*iter)) && move_fn_p (*iter)) return true; if (!CLASSTYPE_LAZY_MOVE_ASSIGN (t)) -for (ovl_iterator iter (lookup_fnfields_slot_nolazy +for (ovl_iterator iter (get_class_binding_direct (t, cp_assignment_operator_id (NOP_EXPR))); iter; ++iter) if ((!user_p || !DECL_ARTIFICIAL (*iter)) && move_fn_p (*iter)) @@ -5220,8 +5217,7 @@ type_build_ctor_call (tree t) return false; /* A user-declared constructor might be private, and a constructor might be trivial but dele
[Ada] Extension of 'Image in Ada 2020
Refactor of all 'Image attributes for better error diagnostics and clarity. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Justin Squirek * exp_imgv.adb (Expand_Image_Attribute), (Expand_Wide_Image_Attribute), (Expand_Wide_Wide_Image_Attribute): Added case to handle new-style 'Image expansion (Rewrite_Object_Image): Moved from exp_attr.adb * exp_attr.adb (Expand_N_Attribute_Reference): Modified Image attribute cases so that the relevant subprograms in exp_imgv.adb handle all expansion. (Rewrite_Object_Reference_Image): Moved to exp_imgv.adb * sem_attr.adb (Analyze_Attribute): Modified Image attribute cases to call common function Analyze_Image_Attribute. (Analyze_Image_Attribute): Created as a common path for all image attributes (Check_Object_Reference_Image): Removed * sem_util.ads, sem_util.adb (Is_Image_Applied_To_Object): Removed and refactored into Is_Object_Image (Is_Object_Image): Created as a replacement for Is_Image_Applied_To_Object Index: exp_imgv.adb === --- exp_imgv.adb(revision 251753) +++ exp_imgv.adb(working copy) @@ -36,6 +36,7 @@ with Rtsfind; use Rtsfind; with Sem_Aux; use Sem_Aux; with Sem_Res; use Sem_Res; +with Sem_Util; use Sem_Util; with Sinfo;use Sinfo; with Snames; use Snames; with Stand;use Stand; @@ -52,6 +53,17 @@ -- Ordinary_Fixed_Point_Type with a small that is a negative power of ten. -- Shouldn't this be in einfo.adb or sem_aux.adb??? + procedure Rewrite_Object_Image + (N : Node_Id; + Pref : Entity_Id; + Attr_Name : Name_Id; + Str_Typ : Entity_Id); + -- AI12-00124: Rewrite attribute 'Image when it is applied to an object + -- reference as an attribute applied to a type. N denotes the node to be + -- rewritten, Pref denotes the prefix of the 'Image attribute, and Name + -- and Str_Typ specify which specific string type and 'Image attribute to + -- apply (e.g. Name_Wide_Image and Standard_Wide_String). 
+ -- Build_Enumeration_Image_Tables -- @@ -254,10 +266,10 @@ Loc : constant Source_Ptr := Sloc (N); Exprs : constant List_Id:= Expressions (N); Pref : constant Node_Id:= Prefix (N); - Ptyp : constant Entity_Id := Entity (Pref); - Rtyp : constant Entity_Id := Root_Type (Ptyp); Expr : constant Node_Id:= Relocate_Node (First (Exprs)); Imid : RE_Id; + Ptyp : Entity_Id; + Rtyp : Entity_Id; Tent : Entity_Id; Ttyp : Entity_Id; Proc_Ent : Entity_Id; @@ -273,6 +285,14 @@ Pnn : constant Entity_Id := Make_Temporary (Loc, 'P'); begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image (N, Pref, Name_Image, Standard_String); + return; + end if; + + Ptyp := Entity (Pref); + Rtyp := Root_Type (Ptyp); + -- Build declarations of Snn and Pnn to be inserted Ins_List := New_List ( @@ -791,11 +811,19 @@ procedure Expand_Wide_Image_Attribute (N : Node_Id) is Loc : constant Source_Ptr := Sloc (N); - Rtyp : constant Entity_Id := Root_Type (Entity (Prefix (N))); - Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); - Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Pref : constant Entity_Id := Prefix (N); + Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); + Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Rtyp : Entity_Id; begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image (N, Pref, Name_Wide_Image, Standard_Wide_String); + return; + end if; + + Rtyp := Root_Type (Entity (Pref)); + Insert_Actions (N, New_List ( -- Rnn : Wide_String (1 .. 
base_typ'Width); @@ -882,12 +910,20 @@ procedure Expand_Wide_Wide_Image_Attribute (N : Node_Id) is Loc : constant Source_Ptr := Sloc (N); - Rtyp : constant Entity_Id := Root_Type (Entity (Prefix (N))); + Pref : constant Entity_Id := Prefix (N); + Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); + Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Rtyp : Entity_Id; - Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); - Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image + (N, Pref, Name_Wide_Wide_Image, Standard_Wide_Wide_String); + return; + end if; - begin + Rtyp := Root_Type (Entity (Pref)); + Insert_Actions (N, New_List ( -- Rnn : Wide_Wide_String (1 .. rt'Wide_Wide_Width); @@ -1373,4 +1409,23 @@ and then Ur
[AArch64] Merge stores of D register values of different modes
Hi all,

This patch merges loads and stores from D-registers that are of different modes. Code like this:

    typedef int __attribute__ ((vector_size (8))) vec;
    struct pair { vec v; double d; };

    void assign (struct pair *p, vec v)
    {
      p->v = v;
      p->d = 1.0;
    }

now generates a stp instruction, whereas previously it generated two `str` instructions. Likewise for loads.

I have taken the opportunity to merge some of the patterns into a single pattern. Previously, we had different patterns for DI, DF, SI, SF modes. The patch uses the new iterators to reduce these to two patterns.

This patch also merges stores of double zero values with long integer values:

    struct pair { long long l; double d; };

    void foo (struct pair *p)
    {
      p->l = 10;
      p->d = 0.0;
    }

now generates a single store-pair instruction rather than two `str` instructions.

Bootstrap and testsuite run OK. OK for trunk?

Jackson

gcc/

2017-07-21  Jackson Woodruff

        * config/aarch64/aarch64.md: New patterns to generate stp and ldp.
        * config/aarch64/aarch64-ldpstp.md: Modified peephole for different
        mode ldpstp and added peephole for merging zero stores. Likewise
        for loads.
        * config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp): Added
        size check.
        (aarch64_gen_store_pair): Rename calls to match new patterns.
        (aarch64_gen_load_pair): Rename calls to match new patterns.
        * config/aarch64/aarch64-simd.md (store_pair): Updated pattern to
        match two modes.
        (store_pair_sw, store_pair_dw): New patterns to generate stp for
        single words and double words.
        (load_pair_sw, load_pair_dw): Likewise.
        (store_pair_sf, store_pair_df, store_pair_si, store_pair_di):
        Removed.
        (load_pair_sf, load_pair_df, load_pair_si, load_pair_di): Removed.
        * config/aarch64/iterators.md: New mode iterators for types in d
        registers and duplicate DX and SX modes. New iterator for DI, DF,
        SI, SF.
        * config/aarch64/predicates.md (aarch64_reg_zero_or_fp_zero): New.

gcc/testsuite/

2017-07-21  Jackson Woodruff

        * gcc.target/aarch64/ldp_stp_6.c: New.
* gcc.target/aarch64/ldp_stp_7.c: New. * gcc.target/aarch64/ldp_stp_8.c: New. diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md index e8dda42c2dd1e30c4607c67a2156ff7813bd89ea..14e860d258e548d4118d957675f8bdbb74615337 100644 --- a/gcc/config/aarch64/aarch64-ldpstp.md +++ b/gcc/config/aarch64/aarch64-ldpstp.md @@ -99,10 +99,10 @@ }) (define_peephole2 - [(set (match_operand:VD 0 "register_operand" "") - (match_operand:VD 1 "aarch64_mem_pair_operand" "")) - (set (match_operand:VD 2 "register_operand" "") - (match_operand:VD 3 "memory_operand" ""))] + [(set (match_operand:DREG 0 "register_operand" "") + (match_operand:DREG 1 "aarch64_mem_pair_operand" "")) + (set (match_operand:DREG2 2 "register_operand" "") + (match_operand:DREG2 3 "memory_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, true, mode)" [(parallel [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (match_dup 3))])] @@ -119,11 +119,12 @@ }) (define_peephole2 - [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "") - (match_operand:VD 1 "register_operand" "")) - (set (match_operand:VD 2 "memory_operand" "") - (match_operand:VD 3 "register_operand" ""))] - "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, mode)" + [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "") + (match_operand:DREG 1 "register_operand" "")) + (set (match_operand:DREG2 2 "memory_operand" "") + (match_operand:DREG2 3 "register_operand" ""))] + "TARGET_SIMD + && aarch64_operands_ok_for_ldpstp (operands, false, mode)" [(parallel [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (match_dup 3))])] { @@ -138,7 +139,6 @@ } }) - ;; Handle sign/zero extended consecutive load/store. (define_peephole2 @@ -181,6 +181,30 @@ } }) +;; Handle storing of a floating point zero. +;; We can match modes that won't work for a stp instruction +;; as aarch64_operands_ok_for_ldpstp checks that the modes are +;; compatible. 
+(define_peephole2 + [(set (match_operand:DSX 0 "aarch64_mem_pair_operand" "") + (match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" "")) + (set (match_operand: 2 "memory_operand" "") + (match_operand: 3 "aarch64_reg_zero_or_fp_zero" ""))] + "aarch64_operands_ok_for_ldpstp (operands, false, DImode)" + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +{ + rtx base, offset_1, offset_2; + + extract_base_offset_in_addr (operands[0], &base, &offset_1); + extract_base_offset_in_addr (operands[2], &base, &offset_2); + if (INTVAL (offset_1) > INT
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Michael Collison writes: > Richard Sandiford do you have any objections to the patch as it stands? > It doesn't appear as if anything is going to change in the mid-end > anytime soon. I think one of the suggestions was to do it in expand, taking advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current FMA_EXPR handling in expand_expr_real_2. I know there was talk about cleaner approaches, but at least doing the above seems cleaner than doing in the backend. It should also be a nicely-contained piece of work. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Tuesday, August 22, 2017 9:11 AM > To: Richard Biener > Cc: Richard Kenner ; Michael Collison > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Richard Biener writes: >> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford wrote: >Richard Biener writes: >> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >> wrote: Correct. It is truncated for integer shift, but not simd shift instructions. We generate a pattern in the split that only >generates the integer shift instructions. >>> >>> That's unfortunate, because it would be nice to do this in >simplify_rtx, >>> since it's machine-independent, but that has to be conditioned on >>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >> >> SHIFT_COUNT_TRUNCATED should go ... you should express this in the >> patterns, like for example with >> >> (define_insn ashlSI3 >> [(set (match_operand 0 "") >> (ashl:SI (match_operand ... ) >> (subreg:QI (match_operand:SI ...)))] >> >> or an explicit and:SI and combine / simplify_rtx should apply the >magic >> optimization we expect. 
> >The problem with the explicit AND is that you'd end up with either >an AND of two constants for constant shifts, or with two separate >patterns, one for constant shifts and one for variable shifts. (And >the problem in theory with two patterns is that it reduces the RA's >freedom, although in practice I guess we'd always want a constant >shift where possible for cost reasons, and so the RA would never >need to replace pseudos with constants itself.) > >I think all useful instances of this optimisation will be exposed by >the gimple optimisers, so maybe expand could to do it based on >TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >the rtx code and it does take the mode into account. Sure, that could work as well and also take into account range info. But we'd then need named expanders and the result would still have the explicit and or need to be an unspec or a different RTL operation. >>> >>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>> target-dependent rather than undefined behaviour, so it's OK for a >>> target to use shift codes with out-of-range values. >> >> Hmm, but that means simplify-rtx can't do anything with them because >> we need to preserve target dependent behavior. > > Yeah, it needs to punt. In practice that shouldn't matter much. > >> I think the RTL IL should be always well-defined and its semantics >> shouldn't have any target dependences (ideally, and if, then they >> should be well specified via extra target hooks/macros). > > That would be nice :-) I think the problem has traditionally been that >> shifts can be used in quite a few define_insn patterns besides those >> for shift instructions. So if your target defines shifts to have >> 256-bit precision (say) then you need to make sure that every >> define_insn with a shift rtx will honour that. > > It's more natural for target guarantees to apply to instructions than to >> rtx codes. 
> >>> And >>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how >>> the normal shift optabs behave, so I don't think we'd need new optabs >>> or new unspecs. >>> >>> E.g. it already works this way when expanding double-word shifts, >>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's >>> possible to use a shorter sequence if you know that the shift optab >>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED >>> isn't defined. >> >> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK >> applies to the instructions generated by the named shift patterns but >> _not_ general shift RTXen. But the generated pattern contains shift >> RTXen and how can we figure whether they were generated by the named >> expanders or by other means? Don't define_expand also serve as >> define_insn for things like combine? > > Yeah, you can't (and aren't supposed to
[Ada] Derived iterable types with noniterable parent
This patch fixes a bug in which if a derived type has a Default_Iterator specified, and the parent type does not, then a "for ... of" loop causes the compiler to crash. No small test case available. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Bob Duff * exp_ch5.adb (Get_Default_Iterator): Replace "Assert(False)" with "return Iter", because if an iterable type is derived from a noniterable one, then we won't find an overriding or inherited default iterator. Index: exp_ch5.adb === --- exp_ch5.adb (revision 251767) +++ exp_ch5.adb (working copy) @@ -3934,9 +3934,9 @@ function Get_Default_Iterator (T : Entity_Id) return Entity_Id; --- If the container is a derived type, the aspect holds the parent --- operation. The required one is a primitive of the derived type --- and is either inherited or overridden. Also sets Container_Arg. +-- Return the default iterator for a specific type. If the type is +-- derived, we return the inherited or overridden one if +-- appropriate. -- -- Get_Default_Iterator -- @@ -3953,11 +3953,11 @@ begin Container_Arg := New_Copy_Tree (Container); - -- A previous version of GNAT allowed indexing aspects to - -- be redefined on derived container types, while the - -- default iterator was inherited from the parent type. - -- This non-standard extension is preserved temporarily for - -- use by the modelling project under debug flag d.X. + -- A previous version of GNAT allowed indexing aspects to be + -- redefined on derived container types, while the default + -- iterator was inherited from the parent type. This + -- nonstandard extension is preserved for use by the + -- modelling project under debug flag -gnatd.X. if Debug_Flag_Dot_XX then if Base_Type (Etype (Container)) /= @@ -3995,9 +3995,11 @@ Next_Elmt (Prim); end loop; - -- Default iterator must exist + -- If we didn't find it, then our parent type is not + -- iterable, so we return the Default_Iterator aspect of + -- this type. 
- pragma Assert (False); + return Iter; -- Otherwise not a derived type
[Ada] Missing finalization of cursor in "of" iterator loop
This patch modifies the finalization machinery to ensure that the cursor of an "of" iterator loop is properly finalized at the end of the loop. Previously it was incorrectly assumed that such a cursor will never need finalization actions.

-- Source --

--  leak.adb

pragma Warnings (Off);

with Ada.Unchecked_Deallocation;
with Ada.Finalization;
with Ada.Iterator_Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Leak is
   type El is tagged null record;

   type Integer_Access is access all Integer;

   procedure Unchecked_Free is new Ada.Unchecked_Deallocation
     (Integer, Integer_Access);

   type Cursor is new Ada.Finalization.Controlled with record
      Count : Integer_Access := new Integer'(1);
   end record;

   overriding procedure Adjust   (C : in out Cursor);
   overriding procedure Finalize (C : in out Cursor);

   overriding procedure Adjust (C : in out Cursor) is
   begin
      C.Count.all := C.Count.all + 1;
      Put_Line ("Adjust Cursor. Count = " & C.Count.all'Img);
   end Adjust;

   overriding procedure Finalize (C : in out Cursor) is
   begin
      C.Count.all := C.Count.all - 1;
      Put_Line ("Finalize Cursor. Count = " & C.Count.all'Img);

      if C.Count.all = 0 then
         Unchecked_Free (C.Count);
      end if;
   end Finalize;

   function Has_Element (C : Cursor) return Boolean is (False);

   package Child is
      package Iterators is new Ada.Iterator_Interfaces
        (Cursor => Cursor, Has_Element => Has_Element);

      type Iterator is new Ada.Finalization.Controlled
        and Iterators.Forward_Iterator with record
         Count : Integer_Access := new Integer'(1);
      end record;

      overriding function First (I : Iterator) return Cursor is
        (Ada.Finalization.Controlled with others => <>);

      overriding function Next (I : Iterator; C : Cursor) return Cursor is
        (Ada.Finalization.Controlled with others => <>);

      overriding procedure Adjust (I : in out Iterator);
   end Child;

   package body Child is
      overriding procedure Adjust (I : in out Iterator) is
      begin
         I.Count.all := I.Count.all + 1;
         Put_Line ("Adjust Iterator. Count = " & I.Count.all'Img);
      end Adjust;

      overriding procedure Finalize (I : in out Iterator) is
      begin
         I.Count.all := I.Count.all - 1;
         Put_Line ("Finalize Iterator. Count = " & I.Count.all'Img);

         if I.Count.all = 0 then
            Unchecked_Free (I.Count);
         end if;
      end Finalize;
   end Child;

   type Iterable is tagged null record with
     Default_Iterator  => Iterate,
     Iterator_Element  => El'Class,
     Constant_Indexing => El_At;

   function Iterate (O : Iterable)
     return Child.Iterators.Forward_Iterator'Class is
       (Child.Iterator'(Ada.Finalization.Controlled with others => <>));

   function El_At (Self : Iterable; Pos : Cursor'Class) return El'Class is
     (El'(others => <>));

   Seq : Iterable;

begin
   Put_Line ("START");

   for V of Seq loop
      null;
   end loop;

   Put_Line ("END");
end Leak;

-- Compilation and output --

$ gnatmake -q leak.adb -largs -lgmem
$ ./leak
$ gnatmem ./leak > leaks.txt
$ grep -c "Number of non freed allocations" leaks.txt
START
Adjust Iterator. Count = 2
Finalize Iterator. Count = 1
Adjust Cursor. Count = 2
Finalize Cursor. Count = 1
Adjust Cursor. Count = 2
Finalize Cursor. Count = 1
Finalize Cursor. Count = 0
Finalize Iterator. Count = 0
END
0

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Hristian Kirtchev

        * einfo.adb (Status_Flag_Or_Transient_Decl): The attribute is now
        allowed on loop parameters.
        (Set_Status_Flag_Or_Transient_Decl): The attribute is now allowed
        on loop parameters.
        (Write_Field15_Name): Update the output for
        Status_Flag_Or_Transient_Decl.
        * einfo.ads: Attribute Status_Flag_Or_Transient_Decl now applies
        to loop parameters. Update the documentation of the attribute and
        the E_Loop_Parameter entity.
        * exp_ch7.adb (Process_Declarations): Remove the bogus guard which
        assumes that cursors can never be controlled.
        * exp_util.adb (Requires_Cleanup_Actions): Remove the bogus guard
        which assumes that cursors can never be controlled.
Index: exp_ch7.adb === --- exp_ch7.adb (revision 251753) +++ exp_ch7.adb (working copy) @@ -2100,15 +2100,6 @@ elsif Is_Ignored_Ghost_Entity (Obj_Id) then null; - -- The expansion of iterator loops generates an object - -- declaration where the Ekind is explicitly set to loop - -- parameter. This is to ensure that the loop parameter behaves - -- as a constant from user code point of view. Such object are
[Ada] Better warning on access to string at negative or null index
The warning issued when accessing a string at a negative or null index was misleading, suggesting to use S'First - 1 as correct index, which it is obviously not. Add a detection for negative or null index when accessing a standard string, so that an appropriate warning is issued. Also add a corresponding warning for other arrays, which is currently not triggered by this detection mechanism under -gnatww.

The following compilation shows the new warning:

$ gcc -c cstr.adb
1. procedure Cstr (X : in out String; J : Integer := -1) is
2. begin
3.    X(0 .. J) := "";
      |
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
4.    X(0) := 'c';
      |
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
5.    X(0 .. 4) := "hello";
      13
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
   >>> warning: index for "X" may assume lower bound of 1
   >>> warning: suggested replacement: "X'First + 3"
6. end Cstr;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Yannick Moy

        * sem_warn.adb (Warn_On_Suspicious_Index): Improve warning when
        the literal index used to access a string is null or negative.

Index: sem_warn.adb
===
--- sem_warn.adb (revision 251772)
+++ sem_warn.adb (working copy)
@@ -46,6 +46,7 @@
 with Snames;  use Snames;
 with Stand;   use Stand;
 with Stringt; use Stringt;
+with Tbuild;  use Tbuild;
 with Uintp;   use Uintp;
 
 package body Sem_Warn is
@@ -3878,6 +3879,13 @@
       procedure Warn1;
       --  Generate first warning line
 
+      procedure Warn_On_Index_Below_Lower_Bound;
+      --  Generate a warning on indexing the array with a literal value
+      --  below the lower bound of the index type.
+ + procedure Warn_On_Literal_Index; + -- Generate a warning on indexing the array with a literal value + -- -- Length_Reference -- -- @@ -3903,21 +3911,31 @@ ("?w?index for& may assume lower bound of^", X, Ent); end Warn1; - -- Start of processing for Test_Suspicious_Index + - + -- Warn_On_Index_Below_Lower_Bound -- + - - begin - -- Nothing to do if subscript does not come from source (we don't - -- want to give garbage warnings on compiler expanded code, e.g. the - -- loops generated for slice assignments. Such junk warnings would - -- be placed on source constructs with no subscript in sight). + procedure Warn_On_Index_Below_Lower_Bound is + begin +if Is_Standard_String_Type (Typ) then + Discard_Node + (Compile_Time_Constraint_Error + (N => X, +Msg => "?w?string index should be positive")); +else + Discard_Node + (Compile_Time_Constraint_Error + (N => X, +Msg => "?w?index out of the allowed range")); +end if; + end Warn_On_Index_Below_Lower_Bound; - if not Comes_From_Source (Original_Node (X)) then -return; - end if; + --- + -- Warn_On_Literal_Index -- + --- - -- Case where subscript is a constant integer - - if Nkind (X) = N_Integer_Literal then + procedure Warn_On_Literal_Index is + begin Warn1; -- Case where original form of subscript is an integer literal @@ -4037,7 +4055,35 @@ Error_Msg_FE -- CODEFIX ("\?w?suggested replacement: `&~`", Original_Node (X), Ent); end if; + end Warn_On_Literal_Index; + -- Start of processing for Test_Suspicious_Index + + begin + -- Nothing to do if subscript does not come from source (we don't + -- want to give garbage warnings on compiler expanded code, e.g. the + -- loops generated for slice assignments. Such junk warnings would + -- be placed on source constructs with no subscript in sight). + + if not Comes_From_Source (Original_Node (X)) then +return; + end if; + + -- Case where subscript is a constant integer + + if Nkind (X) = N_Integer_Literal then + +-- Case where subscript is lower than the lowest possible bound. 
+-- This might be the case for example when programmers try to +-- access a string at index 0, as they are used to in other +-- programming
[Ada] Improve error message when function is used in a call statement
A typical error for new users of Ada is to call functions in a call statement. Improve the error message for these users, to better indicate what the error is in that case.

The following compilation shows the new message:

$ gcc -c main.adb
1. procedure Main is
2.    function Lol return Integer is (0);
3. begin
4.    Lol;
      |
   >>> cannot use call to function "Lol" as a statement
   >>> return value of a function call cannot be ignored
5. end Main;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Yannick Moy

        * sem_res.adb (Resolve): Update message for function call as
        statement.

Index: sem_res.adb
===
--- sem_res.adb (revision 251755)
+++ sem_res.adb (working copy)
@@ -2533,8 +2533,11 @@
           and then Ekind (Entity (Name (N))) = E_Function
          then
             Error_Msg_NE
-              ("cannot use function & in a procedure call",
+              ("cannot use call to function & as a statement",
                Name (N), Entity (Name (N)));
+            Error_Msg_N
+              ("\return value of a function call cannot be ignored",
+               Name (N));
 
          -- Otherwise give general message (not clear what cases this
          -- covers, but no harm in providing for them).
[Ada] No_Return procedures in renaming declarations.
This patch implements legality rule in 6.5.1 (7/2): if a renaming as body completes a nonreturning procedure declaration, the renamed procedure must be nonreturning as well. Previously GNAT only produced a warning in such cases. Tested in ACATS test B651002. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch6.adb (Check_Returns): Clean up warnings coming from generated bodies for renamings that are completions, when renamed procedure is No_Return. * sem_ch8.adb (Analyze_Subprogram_Renaming): Implement legality rule in 6.5.1 (7/2): if a renaming is a completion of a subprogram with No_Return, the renamed entity must be No_Return as well. Index: sem_ch6.adb === --- sem_ch6.adb (revision 251762) +++ sem_ch6.adb (working copy) @@ -6693,7 +6693,11 @@ Error_Msg_N ("implied return after this statement " & "would have raised Program_Error", Last_Stm); - else + + -- In normal compilation mode, do not warn on a generated + -- call (e.g. in the body of a renaming as completion). + + elsif Comes_From_Source (Last_Stm) then Error_Msg_N ("implied return after this statement " & "will raise Program_Error??", Last_Stm); Index: sem_ch8.adb === --- sem_ch8.adb (revision 251762) +++ sem_ch8.adb (working copy) @@ -2946,6 +2946,14 @@ Check_Fully_Conformant (New_S, Rename_Spec); Set_Public_Status (New_S); + if No_Return (Rename_Spec) +and then not No_Return (Entity (Nam)) + then +Error_Msg_N ("renaming completes a No_Return procedure", N); +Error_Msg_N + ("\renamed procedure must be nonreturning (RM 6.5.1 (7/2))", N); + end if; + -- The specification does not introduce new formals, but only -- repeats the formals of the original subprogram declaration. -- For cross-reference purposes, and for refactoring tools, we
[Ada] Crash on generic subprogram with aspect No_Return.
This patch fixes a compiler abort on a generic unit to which the aspect No_Return applies. Tested in ACATS 4.1D C651002. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * freeze.adb (Freeze_Entity): Do not generate a freeze node for a generic unit, even if it includes delayed aspect specifications. Freeze nodes for generic entities must never appear in the tree that reaches the back-end of the compiler. Index: freeze.adb === --- freeze.adb (revision 251765) +++ freeze.adb (working copy) @@ -5489,6 +5489,13 @@ then Explode_Initialization_Compound_Statement (E); end if; + +-- Do not generate a freeze node for a generic unit. + +if Is_Generic_Unit (E) then + Result := No_List; + goto Leave; +end if; end if; -- Case of a type or subtype being frozen
[Ada] Pragma No_Return on generic units
This patch ensures that if a pragma No_Return applies to a generic subprogram, all its instantiations are treated as No_Return subprograms as well. Tested in ACATS 4.1D C651001.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Ed Schonberg

        * sem_ch12.adb (Analyze_Subprogram_Instantiation): Propagate
        No_Return flag to instance if pragma applies to generic unit.
        This must be done explicitly because the pragma does not appear
        directly in the generic declaration (unlike the corresponding
        aspect specification).

Index: sem_ch12.adb
===
--- sem_ch12.adb (revision 251753)
+++ sem_ch12.adb (working copy)
@@ -5382,6 +5382,15 @@
       Set_Has_Pragma_Inline (Act_Decl_Id, Has_Pragma_Inline (Gen_Unit));
       Set_Has_Pragma_Inline (Anon_Id,     Has_Pragma_Inline (Gen_Unit));
 
+      -- Propagate No_Return if pragma applied to generic unit. This must
+      -- be done explicitly because pragma does not appear in generic
+      -- declaration (unlike the aspect case).
+
+      if No_Return (Gen_Unit) then
+         Set_No_Return (Act_Decl_Id);
+         Set_No_Return (Anon_Id);
+      end if;
+
       Set_Has_Pragma_Inline_Always
         (Act_Decl_Id, Has_Pragma_Inline_Always (Gen_Unit));
       Set_Has_Pragma_Inline_Always
[Ada] Inherited aspects that may be delayed in a parent type
This patch fixes an omission in the handling of delayed aspects on derived types. The type may inherit a representation aspect from its parent, but have no explicit aspect specifications. At the point it is frozen, the parent is frozen as well and its explicit aspects have been analyzed. The inherited aspects of the derived type can then be captured properly. Tested in ACATS test C35A001. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * freeze.adb (Freeze_Entity): For a derived type that has no explicit delayed aspects but may inherit delayed aspects from its parent type, analyze aspect at freeze point for proper capture of an inherited aspect. Index: freeze.adb === --- freeze.adb (revision 251760) +++ freeze.adb (working copy) @@ -5266,8 +5266,12 @@ -- pragma or attribute definition clause in the tree at this point. We -- also analyze the aspect specification node at the freeze point when -- the aspect doesn't correspond to pragma/attribute definition clause. + -- In addition, a derived type may have inherited aspects that were + -- delayed in the parent, so these must also be captured now. - if Has_Delayed_Aspects (E) then + if Has_Delayed_Aspects (E) + or else May_Inherit_Delayed_Rep_Aspects (E) + then Analyze_Aspects_At_Freeze_Point (E); end if;
[Ada] Restore original implementation of internal Table package
This wasn't explicitly mentioned but the previous changes also replaced the internal Table package used in the compiler by GNAT.Tables, resulting in a large performance hit for the compiler because the memory management scheme of the latter is very inefficient. This restores the original implementation, which brings about a 10% speedup in clock time on a typical compilation at -O0.

In addition, also use Table instead of GNAT.Table consistently in compiler units: most compiler units instantiate the Table package when they need a resizable array, but a few of them were instantiating GNAT.Table instead, which is less efficient and creates an additional dependency on the runtime. This changes these units to use the Table package, which is immediate since the interface is (essentially) the same. No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Eric Botcazou

        * table.ads, table.adb: Restore original implementation.
        * namet.h (Names_Ptr): Adjust back.
        (Name_Chars_Ptr): Likewise.
        * uintp.h (Uints_Ptr): Likewise.
        (Udigits_Ptr): Likewise.
        * g-table.ads: Remove pragma Compiler_Unit_Warning.
        * par_sco.adb: Do not with GNAT.Table and use Table consistently.
        * scos.ads: Replace GNAT.Table with Table and adjust
        instantiations.
        * spark_xrefs.ads: Likewise.
        * scos.h: Undo latest changes.
        * gcc-interface/trans.c (gigi): Likewise.

Index: g-table.ads
===
--- g-table.ads (revision 251753)
+++ g-table.ads (working copy)
@@ -41,8 +41,6 @@
 --    GNAT.Table
 --    Table (the compiler unit)
 
-pragma Compiler_Unit_Warning;
-
 with GNAT.Dynamic_Tables;
 
 generic

Index: namet.h
===
--- namet.h (revision 251753)
+++ namet.h (working copy)
@@ -45,11 +45,11 @@
 };
 
 /* Pointer to names table vector.  */
-#define Names_Ptr namet__name_entries__tab__the_instance
+#define Names_Ptr namet__name_entries__table
 extern struct Name_Entry *Names_Ptr;
 
 /* Pointer to name characters table.
*/ -#define Name_Chars_Ptr namet__name_chars__tab__the_instance +#define Name_Chars_Ptr namet__name_chars__table extern char *Name_Chars_Ptr; /* This is Hostparm.Max_Line_Length. */ Index: par_sco.adb === --- par_sco.adb (revision 251753) +++ par_sco.adb (working copy) @@ -44,7 +44,6 @@ with GNAT.HTable; use GNAT.HTable; with GNAT.Heap_Sort_G; -with GNAT.Table; package body Par_SCO is @@ -76,12 +75,13 @@ -- running some steps multiple times (the second pass has to be started -- from multiple places). - package SCO_Raw_Table is new GNAT.Table + package SCO_Raw_Table is new Table.Table (Table_Component_Type => SCO_Table_Entry, Table_Index_Type => Nat, Table_Low_Bound => 1, Table_Initial=> 500, - Table_Increment => 300); + Table_Increment => 300, + Table_Name => "Raw_Table"); --- -- Unit Number Table -- Index: scos.ads === --- scos.ads(revision 251753) +++ scos.ads(working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 2009-2016, Free Software Foundation, Inc. -- +-- Copyright (C) 2009-2017, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -29,10 +29,9 @@ -- is used in the ALI file. with Namet; use Namet; +with Table; with Types; use Types; -with GNAT.Table; - package SCOs is -- SCO information can exist in one of two forms. 
In the ALI file, it is @@ -383,12 +382,13 @@ -- For the SCO for a pragma/aspect, gives the pragma/apsect name end record; - package SCO_Table is new GNAT.Table ( + package SCO_Table is new Table.Table ( Table_Component_Type => SCO_Table_Entry, Table_Index_Type => Nat, Table_Low_Bound => 1, Table_Initial=> 500, - Table_Increment => 300); + Table_Increment => 300, + Table_Name => "Table"); Is_Decision : constant array (Character) of Boolean := ('E' | 'G' | 'I' | 'P' | 'a' | 'A' | 'W' | 'X' => True, @@ -530,12 +530,13 @@ end record; - package SCO_Unit_Table is new GNAT.Table ( + package SCO_Unit_Table is new Table.Table ( Table_Component_Type => SCO_Unit_Table_Entry, Table_I
[Ada] Primitive functions that require one formal and return an array
Primitive functions whose first formal is a controlling parameter, whose other formals have defaults and whose result is an array type can lead to ambiguities when the result of such a call is the prefix of an indexed component. The interpretation that analyzes Obj.F (X, Y) into F (Obj)(X, Y) is only legal if the first parameter of F is a controlling parameter. This additional guard was previously missing from the predicate, leading to malformed trees and a compiler crash. Compiling huckel.adb must yield: huckel.adb:135:27: expected type "Real" defined at huckel.ads:9 huckel.adb:135:27: found type "Ada.Numerics.Generic_Real_Arrays.Real_Matrix" from instance at huckel.ads:16 -- Huckel package -- This is a translation from Fortran II code documented in the -- book "Computing Methods for Quantum Organic Chemistry" with Ada.Numerics.Generic_Real_Arrays; package Huckel is type Real is digits 15; type Molecule (Atoms : Positive) is tagged private; function Input return Molecule; procedure Compute_Energies(Item : in out Molecule); procedure Output(Item : in Molecule); private package Matrices is new Ada.Numerics.Generic_Real_Arrays(Real); use Matrices; type Molecule (Atoms : Positive) is tagged record Orbitals: Positive; Atomic_Matrix : Real_Matrix(1..Atoms, 1..Atoms); Atomic_Diagonal : Real_Vector(1..Atoms); Unit_Matrix : Real_Matrix(1..Atoms, 1..Atoms); Bond_Orders : Real_Matrix(1..Atoms, 1..Atoms); Free_Valences : Real_vector(1..Atoms); end record; end Huckel; --- with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; with Ada.Text_IO; with Ada.Numerics.Generic_Elementary_Functions; package body Huckel is package Real_IO is new Ada.Text_IO.Float_IO(Real); use Real_Io; --- -- Input -- --- function Input return Molecule is Num_Atoms : Positive; Num_Orbs : Positive; begin Get(Item => Num_Atoms); Get(Item => Num_Orbs); declare Temp : Molecule(Atoms => Num_Atoms); begin Temp.Orbitals := Num_Orbs; -- Read the atomic matrix into the upper 
semi-matrix of Atomic_Matrix for I in 1..Num_Atoms loop for J in 1..I loop Get(Item => Temp.Atomic_Matrix(J, I)); -- Print the input matrix in lower semi-matrix format Put(Item => Temp.Atomic_Matrix(J,I), Aft => 0, Fore => 2, Exp => 0); -- Make all bonding terms negative Temp.Atomic_Matrix(I, J) := -Temp.Atomic_Matrix(I,J); end loop; New_Line; end loop; return Temp; end; end Input; -- Modify -- procedure Modify(Item : in out Molecule) is Num_Mods : natural; I, J : Positive; Modification : Real; begin Get(Item => Num_Mods); if Num_Mods > 0 then New_Line(3); Put_Line("Modifications"); for Num in 1..Num_Mods loop Get(Item => I); Get(Item => J); Get(Item => Modification); Put(Item => I, Width => 3); Put(Item => J, Width => 6); Put(Item => Modification, Aft => 3, Fore => 7, Exp => 0); New_Line; if I = J then Item.Atomic_Diagonal(J) := Modification; elsif I < J then Item.Atomic_Matrix(I, J) := Modification; else Item.Atomic_Matrix(J, I) := Modification; end if; end loop; end if; end Modify; -- -- Pahy -- -- procedure Pahy(Item : in out Molecule) is begin for J in 1..Item.Atoms loop for I in 1..J loop Item.Atomic_Matrix(I, J) := Item.Atomic_Matrix(J, I); Item.Atomic_Diagonal(J) := Item.Atomic_Matrix(J,J); end loop; end loop; end Pahy; -- Scofi1 -- procedure Scofi1(Item : in out Molecule) is package elem_funcs is new Ada.Numerics.Generic_Elementary_Functions(real); use elem_funcs; Max : Real := 0.0; J_up : Natural; Aii : Real; Ajj : Real; Aod : Real; Asq : Real; Eps : constant Real := 1.0e-16; diffr : Real; sign : Real; tden : Real; Tank : Real; C: Real; S : Real; xj : Real; begin -- initialize unit matrix Item.Unit_Matrix := (Others => (Others => 0.0)); for I in 1..Item.Atoms loop Item.Unit_Matrix(I, I) := 1.0; end loop; for I in 2..Item.Atoms loop J_Up := I - 1; for J in 1..J_Up loop Aii := Item.Atomic_Diagonal(I); Ajj := Item.Atomic_Diagonal(J); Aod := Item.Atomic_Matrix(J, I); Asq := Aod * Aod; if Asq > Max then
Re: [PATCH] Factor out division by squares and remove division around comparisons (2/2)
Hi all,

A minor improvement came to mind while updating other parts of this patch. I've updated a testcase to make it clearer, and a condition now uses a call to is_division_by rather than manually checking those conditions.

Jackson

On 08/30/2017 05:32 PM, Jackson Woodruff wrote:

Hi all, I've attached a new version of the patch in response to a few of Wilco's comments in person. The end product of the pass is still the same, but I have fixed several bugs. Now tested independently of the other patches.

On 08/15/2017 03:07 PM, Richard Biener wrote: On Thu, Aug 10, 2017 at 4:10 PM, Jackson Woodruff wrote:

Hi all, The patch implements some of the division optimizations discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 . We now reassociate (as discussed in the bug report): x / (y * y) -> x * (1 / y) * (1 / y) if it is reasonable to do so. This is done with -funsafe-math-optimizations. Bootstrapped and regtested with part (1/2). OK for trunk?

I believe your enhancement shows the inherent weakness of CSE of reciprocals in that it works from the defs. It will handle x / (y * y) but not x / (y * y * y). I think a rewrite of this mini-pass is warranted.

I suspect that there might be more to gain by handling the case of x / (y * z) rather than the case of x / (y**n), but I agree that this pass could do more. Richard.

Jackson

gcc/ 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * tree-ssa-math-opts (is_division_by_square, is_square_of, insert_square_reciprocals): New. (insert_reciprocals): Change to insert reciprocals before a division by a square. (execute_cse_reciprocals_1): Change to consider division by a square.

gcc/testsuite 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * gcc.dg/associate_division_1.c: New.

Thanks, Jackson. Updated ChangeLog:

gcc/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * tree-ssa-math-opts (is_division_by_square, is_square_of): New.
(insert_reciprocals): Change to insert reciprocals before a division by a square and to insert the square of a reciprocal. (execute_cse_reciprocals_1): Change to consider division by a square. (register_division_in): Add importance parameter. gcc/testsuite 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * gcc.dg/extract_recip_3.c: New. * gcc.dg/extract_recip_4.c: New. * gfortran.dg/extract_recip_1.f: New. diff --git a/gcc/testsuite/gcc.dg/extract_recip_3.c b/gcc/testsuite/gcc.dg/extract_recip_3.c new file mode 100644 index ..ad9f2dc36f1e695ceca1f50bc78f4ac4fbb2e787 --- /dev/null +++ b/gcc/testsuite/gcc.dg/extract_recip_3.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +float +extract_square (float *a, float *b, float x, float y) +{ + *a = 3 / (y * y); + *b = 5 / (y * y); + + return x / (y * y); +} + +/* Don't expect the 'powmult' (calculation of y * y) + to be deleted until a later pass, so look for one + more multiplication than strictly necessary. */ +float +extract_recip (float *a, float *b, float x, float y, float z) +{ + *a = 7 / y; + *b = x / (y * y); + + return z / y; +} + +/* 4 For the pointers to a, b, 4 multiplications in 'extract_square', + 4 multiplications in 'extract_recip' expected. */ +/* { dg-final { scan-tree-dump-times " \\* " 12 "optimized" } } */ + +/* 1 division in 'extract_square', 1 division in 'extract_recip'. */ +/* { dg-final { scan-tree-dump-times " / " 2 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/extract_recip_4.c b/gcc/testsuite/gcc.dg/extract_recip_4.c new file mode 100644 index ..83105c60ced5c2671f3793d76482c35502712a2c --- /dev/null +++ b/gcc/testsuite/gcc.dg/extract_recip_4.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +/* Don't expect any of these divisions to be extracted. 
*/ +double f (double x, int p) +{ + if (p > 0) +{ + return 1.0/(x * x); +} + + if (p > -1) +{ + return x * x * x; +} + return 1.0 /(x); +} + +/* Expect a reciprocal to be extracted here. */ +double g (double *a, double x, double y) +{ + *a = 3 / y; + double k = x / (y * y); + + if (y * y == 2.0) +return k + 1 / y; + else +return k - 1 / y; +} + +/* Expect 2 divisions in 'f' and 1 in 'g'. */ +/* { dg-final { scan-tree-dump-times " / " 3 "optimized" } } */ +/* Expect 3 multiplications in 'f' and 4 in 'g'. Also + expect one for the point to a. */ +/* { dg-final { scan-tree-dump-times " \\* " 8 "optimized" } } */ diff --git a/gcc/testsuite/gfortran.dg/extract_recip_1.f b/gcc/testsuite/gfortran.dg/extract_recip_1.f new file mode 100644 index ..ecf05189773b6c2f46222857fd88fd010bfdf348 --- /dev/null ++
Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
On 08/30/2017 01:46 PM, Richard Biener wrote: On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff wrote: On 08/29/2017 01:13 PM, Richard Biener wrote: On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff wrote:

Hi all, Apologies again to those CC'ed, who (again) received this twice. Joseph: Yes, you are correct. I misread the original thread, now fixed. Richard: I've moved the optimizations out of fold-const.c. One has been replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) I've deleted as it only introduced a new optimization when running with the flags '-O0 -funsafe-math-optimizations'.

Hmm, how did you verify that, that it only adds sth with -O0 -funsafe-math-optimizations?

By checking with various flags, although not exhaustively. I looked for reasons for the behavior in match.pd (explained below). I have also since discovered that the combinations of '-funsafe-math-optimizations -frounding-math' (at all levels) and '-fno-reciprocal-math -funsafe-math-optimizations' mean this pattern adds something.

Is that because in GIMPLE the reassoc pass should do this transform?

It's because the pattern that changes (X / C) -> X * (1 / C) is gated with O1: (for cst (REAL_CST COMPLEX_CST VECTOR_CST) (simplify (rdiv @0 cst@1) ->(if (optimize) -> (if (flag_reciprocal_math && !real_zerop (@1)) (with { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1); } (if (tem) (mult @0 { tem; } ))) (if (cst != COMPLEX_CST) (with { tree inverse = exact_inverse (type, @1); } (if (inverse) (mult @0 { inverse; } I've flagged the two lines that are particularly relevant to this.

So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y why's that in any way preferable? I suppose this is again to enable the recip pass to detect / y (as opposed to / (C * y))? What's the reason to believe that / y is more "frequent"?

Removing this pattern, as I would expect, means that the divisions in the above optimization (and the one further down) are not removed.
So then there is the question of edge cases. This pattern is (ignoring the second case) going to fail when const_binop returns null. Looking through that function says that it will fail (for reals) when: - Either argument is null (not the case) - The operation is not in the list (PLUS_EXPR, MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR) (again not the case) - We honor Signalling NaNs and one of the operands is a sNaN. - The operation is a division, and the second argument is zero and dividing by 0.0 raises an exception. - The result is infinity and neither of the operands were infinity and flag_trapping_math is set. - The result isn't exact and flag_rounding_math is set. For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these so that the pattern is never executed if one of these would be the case. Why not transform this directly to (x * (1/C)) / y then (and only then)? That makes it obvious not two divisions prevail. Done. That said, I'm questioning this canonicalization. I can come up with a testcase where it makes things worse: tem = x / (y * C); tem2 = z / (y * C); should generate rdivtmp = 1 / (y*C); tem = x *rdivtmp; tem2= z * rdivtmp; instead of rdivtmp = 1/y; tem = x * 1/C * rdivtmp; tem2 = z * 1/C * rdivtmp; Ideally we would be able to CSE that into rdivtmp = 1/y * 1/C; tem = x * rdivtmp; tem2 = z * rdivtmp; Although we currently do not. An equally (perhaps more?) problematic case is something like: tem = x / (y * C) tem2 = y * C Which becomes: tem = x * (1 / C) / y tem2 = y * C Instead of K = y * C tem = x / K tem2 = K Which ultimately requires context awareness to avoid. This does seem to be a general problem with a large number of match.pd patterns rather than anything specific to this one. For example, a similar example can be constructed for (say) (A / B) / C -> (A / (B * C)). 
The additional cases where this isn't converted to a multiplication by the reciprocal appear to be when -freciprocal-math is used, but we don't have -fno-rounding-math or -funsafe-math-optimizations.

On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)' is enabled and then the pattern is covered by that and (x * C +- y * C -> C * (x +- y)) (which is already present in match.pd). I have also updated the testcase for those optimizations to use 'O1' to avoid that case.

On 08/24/2017 10:06 PM, Jeff Law wrote: On 08/17/2017 03:55 AM, Wilco Dijkstra wrote: Richard Biener wrote: On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra wrote: Richard Biener wrote:

We also change the association of x / (y * C) -> (x / C) / y if C is a constant. Why's that profitable?

It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example. Also 1/y is now available to the reciprocal optimization, see https://gcc.g
[Ada] Spurious errors on derived untagged types with partial constraints
This patch fixes the handling of untagged discriminated derived types that constrain some parent discriminants and rename others. The compiler failed to handle a change of representation on the derived type, and generated faulty code for the initialization procedure of such a derived type.

Executing: --- gnatmake -q p p -- must yield: -- 1234 TRUE 20 discriminant rules!!

--- with Q; use Q; with Text_IO; use Text_IO; procedure P is procedure Inner (B : Base) is begin null; -- Put_Line (B.S); Put_Line (Integer'Image (B.I)); Put_Line (Boolean'Image (B.B)); Put_Line (Integer'Image (B.D)); Put_Line (B.S); end; D1 : Derived (True); begin D1.S := "discriminant rules!!"; Inner (Base (D1)); end; --- package Q is type Base (D : Positive; B : Boolean) is record I : Integer := 1234; S : String (1 .. D); -- := (1 .. D => 'Q'); end record; type Derived (B : Boolean) is new Base (D => 20, B => B); for Derived use record I at 0 range 0 .. 31; end record; Thing : Derived (False); end Q;

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * exp_ch4.adb (Handle_Changed_Representation): For an untagged derived type with a mixture of renamed and constrained parent discriminants, the constraint for the target must obtain the discriminant values from both the operand and from the stored constraint for it, given that the constrained discriminants are not visible in the object. * exp_ch5.adb (Make_Field_Assign): The type of the right-hand side may be derived from that of the left-hand side (as in the case of an assignment with a change of representation) so the discriminant to be used in the retrieval of the value of the component must be the entity in the type of the right-hand side.

Index: exp_ch5.adb === --- exp_ch5.adb (revision 251753) +++ exp_ch5.adb (working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1992-2016, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2017, Free Software Foundation, Inc.
-- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -1448,9 +1448,21 @@ U_U : Boolean := False) return Node_Id is A: Node_Id; +Disc : Entity_Id; Expr : Node_Id; begin + +-- The discriminant entity to be used in the retrieval below must +-- be one in the corresponding type, given that the assignment +-- may be between derived and parent types. + +if Is_Derived_Type (Etype (Rhs)) then + Disc := Find_Component (R_Typ, C); +else + Disc := C; +end if; + -- In the case of an Unchecked_Union, use the discriminant -- constraint value as on the right-hand side of the assignment. @@ -1463,7 +1475,7 @@ Expr := Make_Selected_Component (Loc, Prefix=> Duplicate_Subexpr (Rhs), - Selector_Name => New_Occurrence_Of (C, Loc)); + Selector_Name => New_Occurrence_Of (Disc, Loc)); end if; A := Index: exp_ch4.adb === --- exp_ch4.adb (revision 251758) +++ exp_ch4.adb (working copy) @@ -10627,7 +10627,6 @@ Temp : Entity_Id; Decl : Node_Id; Odef : Node_Id; - Disc : Node_Id; N_Ix : Node_Id; Cons : List_Id; @@ -10657,23 +10656,70 @@ if not Is_Constrained (Target_Type) then if Has_Discriminants (Operand_Type) then - Disc := First_Discriminant (Operand_Type); - if Disc /= First_Stored_Discriminant (Operand_Type) then - Disc := First_Stored_Discriminant (Operand_Type); - end if; + -- A change of representation can only apply to untagged + -- types. We need to build the constraint that applies to + -- the target type, using the constraints of the operand. + -- The analysis is complicated if there are both inherited + -- discriminants and constrained discriminants. + -- We iterate over the discriminants of the target, and + -- find the discriminant of the same name
[Ada] Minor cleanup in support machinery for inter-unit inlining
The inter-unit inlining done by the compiler requires dedicated machinery to deal with the public status of library-level entities, since it breaks the private/public semantic barrier of the language. This is a minor cleanup to this machinery, no functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * sem_ch7.adb (Has_Referencer): Move up and expand comment explaining the test used to detect inlining. Use same test in second occurrence. (Analyze_Package_Body_Helper): Minor formatting fixes.

Index: sem_ch7.adb === --- sem_ch7.adb (revision 251762) +++ sem_ch7.adb (working copy) @@ -392,6 +392,13 @@ -- An inlined subprogram body acts as a referencer + -- Note that we test Has_Pragma_Inline here in addition + -- to Is_Inlined. We are doing this for a client, since + -- we are computing which entities should be public, and + -- it is the client who will decide if actual inlining + -- should occur, so we need to catch all cases where the + -- subprogram may be inlined by the client. + if Is_Inlined (Decl_Id) or else Has_Pragma_Inline (Decl_Id) then @@ -413,18 +420,13 @@ else Decl_Id := Defining_Entity (Decl); - -- An inlined body acts as a referencer. Note that an - -- inlined subprogram remains Is_Public as gigi requires - -- the flag to be set. + -- An inlined body acts as a referencer, see above. Note + -- that an inlined subprogram remains Is_Public as gigi + -- requires the flag to be set. - -- Note that we test Has_Pragma_Inline here rather than - -- Is_Inlined. We are compiling this for a client, and - -- it is the client who will decide if actual inlining - -- should occur, so we need to assume that the procedure - -- could be inlined for the purpose of accessing global - -- entities.
- - if Has_Pragma_Inline (Decl_Id) then + if Is_Inlined (Decl_Id) + or else Has_Pragma_Inline (Decl_Id) + then if Top_Level and then not Contains_Subprograms_Refs (Decl) then @@ -915,11 +917,11 @@ -- down the number of global symbols that do not neet public visibility -- as this has two beneficial effects: --(1) It makes the compilation process more efficient. - --(2) It gives the code generatormore freedom to optimize within each + --(2) It gives the code generator more leeway to optimize within each --unit, especially subprograms. - -- This is done only for top level library packages or child units as - -- the algorithm does a top down traversal of the package body. + -- This is done only for top-level library packages or child units as + -- the algorithm does a top-down traversal of the package body. if (Scope (Spec_Id) = Standard_Standard or else Is_Child_Unit (Spec_Id)) and then not Is_Generic_Unit (Spec_Id)
[PATCH] Factor out division by squares and remove division around comparisons (0/2)
Hi all,

This patch is split from part (1/2). It includes the patterns that have been moved out of fold-const.c. It also removes an (almost entirely) redundant pattern: (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2) which was only used in special cases, either with combinations of flags like -fno-reciprocal-math -funsafe-math-optimizations and cases where C was sNaN, or small enough to result in infinity. This pattern is covered by: (A / C1) +- (A / C2) -> (with O1 and reciprocal math) A * (1 / C1) +- A * (1 / C2) -> A * (1 / C1 +- 1 / C2) The previous pattern required -funsafe-math-optimizations. To adjust for this case, the testcase has been updated to require O1 so that the optimization is still performed. This pattern is moved verbatim into match.pd: (A / C) +- (B / C) -> (A +- B) / C. OK for trunk?

Jackson

gcc/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * match.pd: Move RDIV patterns from fold-const.c. * fold-const.c (distribute_real_division): Removed. (fold_binary_loc): Remove calls to distribute_real_division.

gcc/testsuite/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * gcc/testsuite/gcc.dg/fold-div-1.c: Use O1.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c index de60f681514aacedb993d5c83c081354fa3b342b..9de1728fb27b7749aaca1ab318b88c4c9b237317 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -3794,47 +3794,6 @@ invert_truthvalue_loc (location_t loc, tree arg) : TRUTH_NOT_EXPR, type, arg); } - -/* Knowing that ARG0 and ARG1 are both RDIV_EXPRs, simplify a binary operation - with code CODE. This optimization is unsafe. */ -static tree -distribute_real_division (location_t loc, enum tree_code code, tree type, - tree arg0, tree arg1) -{ - bool mul0 = TREE_CODE (arg0) == MULT_EXPR; - bool mul1 = TREE_CODE (arg1) == MULT_EXPR; - - /* (A / C) +- (B / C) -> (A +- B) / C. */ - if (mul0 == mul1 - && operand_equal_p (TREE_OPERAND (arg0, 1), - TREE_OPERAND (arg1, 1), 0)) -return fold_build2_loc (loc, mul0 ?
MULT_EXPR : RDIV_EXPR, type, - fold_build2_loc (loc, code, type, -TREE_OPERAND (arg0, 0), -TREE_OPERAND (arg1, 0)), - TREE_OPERAND (arg0, 1)); - - /* (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2). */ - if (operand_equal_p (TREE_OPERAND (arg0, 0), - TREE_OPERAND (arg1, 0), 0) - && TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST - && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST) -{ - REAL_VALUE_TYPE r0, r1; - r0 = TREE_REAL_CST (TREE_OPERAND (arg0, 1)); - r1 = TREE_REAL_CST (TREE_OPERAND (arg1, 1)); - if (!mul0) - real_arithmetic (&r0, RDIV_EXPR, &dconst1, &r0); - if (!mul1) -real_arithmetic (&r1, RDIV_EXPR, &dconst1, &r1); - real_arithmetic (&r0, code, &r0, &r1); - return fold_build2_loc (loc, MULT_EXPR, type, - TREE_OPERAND (arg0, 0), - build_real (type, r0)); -} - - return NULL_TREE; -} /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER starting at BITPOS. The field is unsigned if UNSIGNEDP is nonzero @@ -9378,12 +9337,6 @@ fold_binary_loc (location_t loc, } } - if (flag_unsafe_math_optimizations - && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == MULT_EXPR) - && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == MULT_EXPR) - && (tem = distribute_real_division (loc, code, type, arg0, arg1))) - return tem; - /* Convert a + (b*c + d*e) into (a + b*c) + d*e. We associate floats only if the user has specified -fassociative-math. */ @@ -9775,13 +9728,6 @@ fold_binary_loc (location_t loc, return tem; } - if (FLOAT_TYPE_P (type) - && flag_unsafe_math_optimizations - && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == MULT_EXPR) - && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == MULT_EXPR) - && (tem = distribute_real_division (loc, code, type, arg0, arg1))) - return tem; - /* Handle (A1 * C1) - (A2 * C2) with A1, A2 or C1, C2 being the same or one. Make sure the type is not saturating and has the signedness of the stripped operands, as fold_plusminus_mult_expr will re-associate. 
diff --git a/gcc/match.pd b/gcc/match.pd index 69dd8193cd0524d99fba8be8da8183230b8d621a..ab3f133f443a02e423abfbd635947ecaa8024a74 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3517,6 +3517,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (!HONOR_SNANS (type)) @0)) + (for op (plus minus) + /* Simplify (A / C) +- (B / C) -> (A +- B) / C. */ + (simplify + (op (rdi
[Ada] Resolution of set membership operations with overloaded alternatives
This patch fixes a bug in the resolution of set membership operations when the expression and/or the alternatives on the right-hand side are overloaded. If a given overloaded alternative is resolved to a unique type by intersection with the types of previous alternatives, the type is used subsequently to resolve the expression itself. If the alternative is an enumeration literal, it must be replaced by the literal corresponding to the selected interpretation, because subsequent resolution will not replace the entity itself.

The following must compile and run quietly: gnatmake -q -gnatws c45 c45

--- with Text_IO; use Text_IO; procedure C45 is procedure Failed (Msg : String) is begin Put_Line (Msg); end; type Month is (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec); type Radix is (Bin, Oct, Dec, Hex); type Shape is (Tri, Sqr, Pnt, Hex, Oct); -- Oct is defined for all three types; Dec for all but Shape; and Hex for -- all but Month. -- Three identical functions, one for each type. These provide no -- overloading information at all. function Item return Month is begin return Aug; end Item; function Item return Radix is begin return Dec; end Item; function Item return Shape is begin return Hex; end Item; begin -- No overloading in the choices: if Item in Jan .. Mar then -- type Month Failed ("Wrong result - no choice overloading (1)"); end if; if Item in Tri | Sqr | Pnt then -- type Radix Failed ("Wrong result - no choice overloading (2)"); end if; -- A single overloaded choice: if Item not in May .. Oct then -- type Month Failed ("Wrong result - single overloaded choice (3)"); end if; if Item not in Bin | Dec then -- type Radix Failed ("Wrong result - single overloaded choice (4)"); end if; if Item not in Tri | Sqr | Hex then -- type Shape Failed ("Wrong result - single overloaded choice (5)"); end if; -- At least one choice without overloading: if Item in Jan | Oct ..
Dec then -- type Month Failed ("Wrong result - a non-overloaded choice (6)"); end if; if Item not in Oct .. Hex | Bin then -- type Radix Failed ("Wrong result - a non-overloaded choice (7)"); end if; if Item not in Oct | Sqr | Hex then -- type Shape Failed ("Wrong result - a non-overloaded choice (8)"); end if; if Item not in Oct | Sqr | Hex | Tri then -- type Shape Failed ("Wrong result - a non-overloaded choice (9)"); end if; if Item not in Dec | Hex | Oct | Bin then -- type Radix Failed ("Wrong result - a non-overloaded choice (10"); end if; -- The ultimate: everything is overloaded, but there still is only -- one possible solution. if Item not in Oct | Dec | Hex then -- type Radix Failed ("Wrong result - everything overloaded (11)"); end if; end C45; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch4.adb (Analyze_Set_Membership): If an alternative in a set membership is an overloaded enumeration literal, and the type of the alternative is resolved from a previous one, replace the entity of the alternative as well as the type, to prevent inconsistencies between the entity and the type. Index: sem_ch4.adb === --- sem_ch4.adb (revision 251753) +++ sem_ch4.adb (working copy) @@ -2935,11 +2935,20 @@ -- for all of them. Set_Etype (Alt, It.Typ); + + -- If the alternative is an enumeration literal, use + -- the one for this interpretation. + + if Is_Entity_Name (Alt) then + Set_Entity (Alt, It.Nam); + end if; + Get_Next_Interp (Index, It); if No (It.Typ) then Set_Is_Overloaded (Alt, False); Common_Type := Etype (Alt); + end if; Candidate_Interps := Alt;
[Ada] Enable automatic reordering of components in record types
This activates the reordering of components in record types with convention Ada that was implemented some time ago in the compiler. The idea is to get rid of blatant inefficiencies that the layout in textual order of the source code can bring about, typically when the offset of components is not fixed or not a multiple of the storage unit. The reordering is automatic and silent by default, but both aspects can be toggled: pragma No_Component_Reordering disables it either on a per-record-type or on a global basis, while -gnatw.q gives a warning for each affected component in record types. When pragma No_Component_Reordering is used as a configuration pragma to disable it, there is a requirement that the pragma be used consistently within a partition.

The typical example is a discriminated record type with an array component, which yields with -gnatw.q -gnatl:

 1. package P is
 2.
 3.    type R (D : Positive) is record
 4.       S : String (1 .. D);
          |
    >>> warning: record layout may cause performance issues
    >>> warning: component "S" whose length depends on a discriminant
    >>> warning: comes too early and was moved down
 5.       I : Integer;
 6.    end record;
 7.
 8. end P;

In this case, the compiler moves component S to the last position in the record so that every component is at a fixed offset from the start.

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * ali.ads (ALIs_Record): Add No_Component_Reordering component. (No_Component_Reordering_Specified): New switch. * ali.adb (Initialize_ALI): Set No_Component_Reordering_Specified. (Scan_ALI): Set No_Component_Reordering and deal with NC marker. * bcheck.adb (Check_Consistent_No_Component_Reordering): New check. (Check_Configuration_Consistency): Invoke it. * debug.adb (d.r): Toggle the effect of the switch. (d.v): Change to no-op. * einfo.ads (Has_Complex_Representation): Restrict to record types. (No_Reordering): New alias for Flag239. (OK_To_Reorder_Components): Delete. (No_Reordering): Declare.
(Set_No_Reordering): Likewise. (OK_To_Reorder_Components): Delete. (Set_OK_To_Reorder_Components): Likewise. * einfo.adb (Has_Complex_Representation): Expect record types. (No_Reordering): New function. (OK_To_Reorder_Components): Delete. (Set_Has_Complex_Representation): Expect base record types. (Set_No_Reordering): New procedure. (Set_OK_To_Reorder_Components): Delete. (Write_Entity_Flags): Adjust to above change. * fe.h (Debug_Flag_Dot_R): New macro and declaration. * freeze.adb (Freeze_Record_Type): Remove conditional code setting OK_To_Reorder_Components on record types with convention Ada. * lib-writ.adb (Write_ALI): Deal with NC marker. * opt.ads (No_Component_Reordering): New flag. (No_Component_Reordering_Config): Likewise. (Config_Switches_Type): Add No_Component_Reordering component. * opt.adb (Register_Opt_Config_Switches): Copy No_Component_Reordering onto No_Component_Reordering_Config. (Restore_Opt_Config_Switches): Restore No_Component_Reordering. (Save_Opt_Config_Switches): Save No_Component_Reordering. (Set_Opt_Config_Switches): Set No_Component_Reordering. * par-prag.adb (Prag): Deal with Pragma_No_Component_Reordering. * sem_ch3.adb (Analyze_Private_Extension_Declaration): Also set the No_Reordering flag from the default. (Build_Derived_Private_Type): Likewise. (Build_Derived_Record_Type): Likewise. Then inherit it for untagged types and clean up handling of similar flags. (Record_Type_Declaration): Likewise. * sem_ch13.adb (Same_Representation): Deal with No_Reordering and remove redundant test on Is_Tagged_Type. * sem_prag.adb (Analyze_Pragma): Handle No_Component_Reordering. (Sig_Flags): Likewise. * snames.ads-tmpl (Name_No_Component_Reordering): New name. (Pragma_Id): Add Pragma_No_Component_Reordering value. * warnsw.adb (Set_GNAT_Mode_Warnings): Enable -gnatw.q as well. * gcc-interface/decl.c (gnat_to_gnu_entity) : Copy the layout of the parent type only if the No_Reordering settings match. 
(components_to_record): Reorder record types with convention Ada by default unless No_Reordering is set or -gnatd.r is specified and do not warn if No_Reordering is set in GNAT mode. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251759) +++ sem_ch3.adb (working copy) @@ -5015,6 +5015,7 @@ Set_Ekind(T, E_Record_Type_With_Private); Init_Size_Align (T); Set_Default_SSO (T); + Set_No_Reordering
[Ada] Spurious error with formal incomplete types
This patch fixes a spurious error on the use of a generic unit with formal incomplete types, as a formal package in another generic unit, when the actuals for the incomplete types are themselves formal incomplete types. The treatment of incomplete subtypes that are created for such formals is now more consistent with the handling of other subtypes, given their increased use in Ada 2012. The following must compile quietly:

---
gcc -c promote_2_streams.ads

generic
   type Data_Stream_Type;
   type Data_Type;
   with function Has_Data (Stream : not null access Data_Stream_Type) return Boolean;
   with function Consume (Stream : not null access Data_Stream_Type) return Data_Type;
package Data_Streams is
end;
---
with Data_Streams;
generic
   type Data1_Type is private;
   type Data2_Type is private;
   with package DS1 is new Data_Streams (Data_Type => Data1_Type, others => <>);
   with package DS2 is new Data_Streams (Data_Type => Data2_Type, others => <>);
package Promote_2_Streams is
   type Which_Type is range 1 .. 2;
   type Data_Type (Which : Which_Type := 1) is record
      case Which is
         when 1 => Data1 : Data1_Type;
         when 2 => Data2 : Data2_Type;
      end case;
   end record;
   function Consume1 (Stream : not null access DS1.Data_Stream_Type) return Data_Type
     is ((Which => 1, Data1 => DS1.Consume (Stream)));
   function Consume2 (Stream : not null access DS2.Data_Stream_Type) return Data_Type
     is ((Which => 2, Data2 => DS2.Consume (Stream)));
   package PS1 is new Data_Streams (DS1.Data_Stream_Type, Data_Type, DS1.Has_Data, Consume1);
   package PS2 is new Data_Streams (DS2.Data_Stream_Type, Data_Type, DS2.Has_Data, Consume2);
end Promote_2_Streams;

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * einfo.adb (Designated_Type): Use Is_Incomplete_Type to handle properly incomplete subtypes that may be created by explicit or implicit declarations. (Is_Base_Type): Take E_Incomplete_Subtype into account. (Subtype_Kind): Ditto.
* sem_ch3.adb (Build_Discriminated_Subtype): Set properly the Ekind of a subtype of a discriminated incomplete type. (Fixup_Bad_Constraint): Use Subtype_Kind in all cases, including incomplete types, to preserve error reporting. (Process_Incomplete_Dependents): Do not create a subtype declaration for an incomplete subtype that is created internally. * sem_ch7.adb (Analyze_Package_Specification): Handle properly incomplete subtypes that do not require a completion, either because they are limited views, or they are generic actuals. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251753) +++ sem_ch3.adb (working copy) @@ -10094,7 +10094,11 @@ -- elaboration, because only the access type is needed in the -- initialization procedure. - Set_Ekind (Def_Id, Ekind (T)); + if Ekind (T) = E_Incomplete_Type then +Set_Ekind (Def_Id, E_Incomplete_Subtype); + else +Set_Ekind (Def_Id, Ekind (T)); + end if; if For_Access and then Within_Init_Proc then null; @@ -13629,15 +13633,9 @@ procedure Fixup_Bad_Constraint is begin - -- Set a reasonable Ekind for the entity. For an incomplete type, - -- we can't do much, but for other types, we can set the proper - -- corresponding subtype kind. + -- Set a reasonable Ekind for the entity, including incomplete types. - if Ekind (T) = E_Incomplete_Type then -Set_Ekind (Def_Id, Ekind (T)); - else -Set_Ekind (Def_Id, Subtype_Kind (Ekind (T))); - end if; + Set_Ekind (Def_Id, Subtype_Kind (Ekind (T))); -- Set Etype to the known type, to reduce chances of cascaded errors @@ -20802,7 +20800,9 @@ -- Ada 2005 (AI-412): Transform a regular incomplete subtype into a -- corresponding subtype of the full view. 
- elsif Ekind (Priv_Dep) = E_Incomplete_Subtype then + elsif Ekind (Priv_Dep) = E_Incomplete_Subtype +and then Comes_From_Source (Priv_Dep) + then Set_Subtype_Indication (Parent (Priv_Dep), New_Occurrence_Of (Full_T, Sloc (Priv_Dep))); Set_Etype (Priv_Dep, Full_T); Index: sem_ch7.adb === --- sem_ch7.adb (revision 251753) +++ sem_ch7.adb (working copy) @@ -1441,11 +1441,14 @@ -- Check on incomplete types - -- AI05-0213: A formal incomplete type has no completion + -- AI05-0213: A formal incomplete type has no completion, + -- and neither does the corresponding subtype in an instance. - if Ekind (E) = E_Incomplete_Type + if Is_Incomplete_Type (E)
[Ada] Extension of 'Image in Ada2020.
AI12-0124 adds the notation Object'Image to the language, following the semantics of GNAT-defined attribute 'Img. This patch fixes an omission in the characterization of objects, which must include function calls and thus attribute references for attributes that are functions, as well as predefined operators. The following must compile and execute quietly: gnatmake -q img img --- procedure Img is type Enum is (A, BC, ABC, A_B_C, abcd, 'd'); type New_Enum is new Enum; function Ident (X : Enum) return Enum is begin return X; end Ident; E1 : New_Enum := New_Enum (Ident (BC)); type Int is new Long_Integer; type Der is new Int; function Ident (X : Der) return Der is begin return X; end Ident; V : Der := Ident (123); begin if New_Enum'Pred (E1)'Img /= "A" then raise Program_Error; end if; if New_Enum'Pred (E1)'Image /= "A" then raise Program_Error; end if; if Der'(V - 23)'Image /= "100" then raise Program_Error; end if; end; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_util.adb (Is_Object_Reference): A function call is an object reference, and thus attribute references for attributes that are functions (such as Pred and Succ) as well as predefined operators are legal in contexts that require an object, such as the prefix of attribute Img and the Ada2020 version of 'Image. Index: sem_util.adb === --- sem_util.adb(revision 251753) +++ sem_util.adb(working copy) @@ -14153,18 +14153,21 @@ -- In Ada 95, a function call is a constant object; a procedure -- call is not. -when N_Function_Call => +-- Note that predefined operators are functions as well, and so +-- are attributes that are (can be renamed as) functions. + +when N_Function_Call | N_Binary_Op | N_Unary_Op => return Etype (N) /= Standard_Void_Type; --- Attributes 'Input, 'Loop_Entry, 'Old, and 'Result produce --- objects. +-- Attributes references 'Loop_Entry, 'Old, and 'Result yield +-- objects, even though they are not functions. 
when N_Attribute_Reference => return - Nam_In (Attribute_Name (N), Name_Input, - Name_Loop_Entry, + Nam_In (Attribute_Name (N), Name_Loop_Entry, Name_Old, - Name_Result); + Name_Result) + or else Is_Function_Attribute_Name (Attribute_Name (N)); when N_Selected_Component => return
Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On 09/05/17 23:27, Wilco Dijkstra wrote: > Bernd Edlinger wrote: >> No, the split condition does not begin with "&& TARGET_32BIT...". >> Therefore the split is enabled in TARGET_NEON after reload_completed. >> And it is invoked from adddi3_neon for all alternatives without vfp >> registers: > > Hmm that's a huge mess. I'd argue that any insn_and_split should only split > its own instruction, never other instructions (especially if they are from > different md files, which is extremely confusing). Otherwise we should use > a separate split and explicitly list which instructions it splits. > This is literally a mine-field. > So then the next question is whether the neon_adddi3 still needs the > arm_adddi3 splitter in some odd corner cases? > Yes, I think so. While the *arm_adddi3 and adddi3_neon insns are mutually exclusive, they share the splitter. I don't know with which iwmmxt pattern the *arm_adddi3 splitter might interfere, therefore I don't know if the !TARGET_IWMMXT can be removed from the splitter condition. Other patterns have an iwmmxt twin that is not mutually exclusive, for instance *anddi_notdi_di, bicdi3_neon and iwmmxt_nanddi3. The bicdi3_neon insn duplicates the alternatives while iwmmxt does not. And nobody is able to test iwmmxt. >>> Also there are more cases, a quick grep suggests *anddi_notdi_di has the >>> same issue. > >> Yes, that pattern can be cleaned up in a follow-up patch. > > And there are a lot more instructions that need the same treatment and split > early (probably best at expand time). I noticed none of the zero/sign extends > split before regalloc for example. > I did use the test cases as benchmarks to decide which way to go. It is quite easy to try different combinations of cpu and inspect the stack usage of neon, iwmmxt, and vfp etc. It may be due to interaction with other patterns, but the split at expand time did not produce the best results in every case. >> Note this splitter is invoked from bicdi3_neon as well. 
>> However I think anddi_notdi_di should be safe as long as it is enabled >> after reload_completed (which is probably a bug). > > Since we should be splitting and/bic early now I don't think you can get > anddi_notdi > anymore. So it could be removed completely assuming Neon already does the > right > thing. > > It looks like we need to do a full pass over all DI mode instructions and clean up > all the mess. > Yes, but in small steps, and using some benchmarks to make sure that each step does improve at least something. Bernd. > Wilco >
Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands
On Tue, Sep 05, 2017 at 11:21:48PM -0600, Jeff Law wrote: > --- a/gcc/tree-ssa-reassoc.c > +++ b/gcc/tree-ssa-reassoc.c > @@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb) >"Width = %d was chosen for reassociation\n", > width); > > > - /* For binary bit operations, if the last operand in > - OPS is a constant, move it to the front. This > - helps ensure that we generate (X & Y) & C rather > - than (X & C) & Y. The former will often match > - a canonical bit test when we get to RTL. */ > - if ((rhs_code == BIT_AND_EXPR > -|| rhs_code == BIT_IOR_EXPR > -|| rhs_code == BIT_XOR_EXPR) > + /* For binary bit operations, if there are at least 3 > + operands and the last last operand in OPS is a constant, > + move it to the front. This helps ensure that we generate > + (X & Y) & C rather than (X & C) & Y. The former will > + often match a canonical bit test when we get to RTL. */ > + if (ops.length () != 2 So wouldn't it be clearer to write ops.length () > 2? The if (ops.length () == 0) and else if (ops.length () == 1) cases are handled earlier, so it is the same thing, but it might help the reader. > + && (rhs_code == BIT_AND_EXPR > + || rhs_code == BIT_IOR_EXPR > + || rhs_code == BIT_XOR_EXPR) > && TREE_CODE (ops.last ()->op) == INTEGER_CST) > std::swap (*ops[0], *ops[ops_num - 1]); Don't you then want to put the constant as second operand rather than first, i.e. swap with *ops[1]? And doesn't swap_ops_for_binary_stmt undo it again? Jakub
[Ada] Compiler crash on call to eliminated protected operation.
This patch fixes an omission in the handling of pragma Eliminate when applied to a protected operation. The pragma was properly processed, but a call to an eliminated protected operation was not flagged as an error, and the code generator aborted on a call to an undefined operation. Compiling: gcc -c -gnatec=gnat.adc data.adb must yield: data.adb:12:14: cannot reference subprogram "Some_Protected_Data" eliminated at Global_Pragmas.adc:4 data.adb:20:21: cannot reference subprogram "Some_Protected_Data" eliminated at Global_Pragmas.adc:4 --- -- List of unused entities to be placed in gnat.adc. -- pragma Eliminate (Data, Some_Protected_Data, Source_Location => "data.ads:12"); --- package Data is type Data_Type_T is new Natural; function Get_Private_Data return Data_Type_T; private protected type Some_Type is function Some_Protected_Data return Data_Type_T; private Data : Data_Type_T := 0; end Some_Type; end Data; --- package body Data is protected body Some_Type is function Some_Protected_Data return Data_Type_T is begin return Data; end Some_Protected_Data; function Redundant return Data_Type_T is begin return Some_Protected_Data; end; end Some_Type; My_Data : Some_Type; function Get_Private_Data return Data_Type_T is begin return My_Data.Some_Protected_Data; end Get_Private_Data; end Data; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_res.adb (Resolve_Entry_Call): Check whether a protected operation is subject to a pragma Eliminate. 
Index: sem_res.adb === --- sem_res.adb (revision 251753) +++ sem_res.adb (working copy) @@ -7519,10 +7519,15 @@ if Nkind (Entry_Name) = N_Selected_Component then - -- Simple entry call + -- Simple entry or protected operation call Nam := Entity (Selector_Name (Entry_Name)); Obj := Prefix (Entry_Name); + + if Is_Subprogram (Nam) then +Check_For_Eliminated_Subprogram (Entry_Name, Nam); + end if; + Was_Over := Is_Overloaded (Selector_Name (Entry_Name)); else pragma Assert (Nkind (Entry_Name) = N_Indexed_Component);
Re: [PR 82078] Enqueue all SRA links for write flag propagation
On Wed, 6 Sep 2017, Martin Jambor wrote: > Hi, > > PR 82078 is another fallout from lazy setting of written flag in SRA. > The problem here is that we do not enqueue assignment links going out > of accesses of candidates that were disqualified before we start the > loop with sort_and_splice_var_accesses. > > Given that the propagation is now a correctness necessity, we need to > enqueue all links for processing, so this patch does it when they > are created. There should be very little extra work done because of > this because propagate_all_subaccesses starts with checking the > candidate-status of both sides of each link and acts accordingly. > > Bootstrapped and tested on x86_64-linux without any issues. OK for > trunk? Ok. Thanks, Richard. > Thanks, > > Martin > > > > 2017-09-05 Martin Jambor > > PR tree-optimization/82078 > gcc/ > * tree-sra.c (sort_and_splice_var_accesses): Move call to > add_access_to_work_queue... > (build_accesses_from_assign): ...here. > (propagate_all_subaccesses): Make sure racc is the group > representative, if there is one. > > gcc/testsuite/ > * gcc.dg/tree-ssa/pr82078.c: New test. 
> --- > gcc/testsuite/gcc.dg/tree-ssa/pr82078.c | 27 +++ > gcc/tree-sra.c | 5 - > 2 files changed, 31 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > new file mode 100644 > index 000..3774986324b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > @@ -0,0 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +struct S0 { > + signed f4; > + signed f9 : 5; > +} a[6][5], b = {2} > + > +; > +int c, d; > +int fn1() { > + struct S0 e[5][6]; > + struct S0 f; > + b = f = e[2][5] = a[5][0]; > + if (d) > +; > + else > +return f.f9; > + e[c][45] = a[4][4]; > +} > + > +int main() { > + fn1(); > + if (b.f4 != 0) > +__builtin_abort (); > + return 0; > +} > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 68edbce21b3..163b7a2d03b 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -1359,6 +1359,8 @@ build_accesses_from_assign (gimple *stmt) >link->lacc = lacc; >link->racc = racc; >add_link_to_rhs (racc, link); > + add_access_to_work_queue (racc); > + >/* Let's delay marking the areas as written until propagation of > accesses >across link, unless the nature of rhs tells us that its data comes >from elsewhere. 
*/ > @@ -2118,7 +2120,6 @@ sort_and_splice_var_accesses (tree var) >access->grp_total_scalarization = total_scalarization; >access->grp_partial_lhs = grp_partial_lhs; >access->grp_unscalarizable_region = unscalarizable_region; > - add_access_to_work_queue (access); > >*prev_acc_ptr = access; >prev_acc_ptr = &access->next_grp; > @@ -2712,6 +2713,8 @@ propagate_all_subaccesses (void) >struct access *racc = pop_access_from_work_queue (); >struct assign_link *link; > > + if (racc->group_representative) > + racc= racc->group_representative; >gcc_assert (racc->first_link); > >for (link = racc->first_link; link; link = link->next) > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PR 82078] Enqueue all SRA links for write flag propagation
Hi, PR 82078 is another fallout from lazy setting of written flag in SRA. The problem here is that we do not enqueue assignment links going out of accesses of candidates that were disqualified before we start the loop with sort_and_splice_var_accesses. Given that the propagation is now a correctness necessity, we need to enqueue all links for processing, so this patch does it when they are created. There should be very little extra work done because of this because propagate_all_subaccesses starts with checking the candidate-status of both sides of each link and acts accordingly. Bootstrapped and tested on x86_64-linux without any issues. OK for trunk? Thanks, Martin 2017-09-05 Martin Jambor PR tree-optimization/82078 gcc/ * tree-sra.c (sort_and_splice_var_accesses): Move call to add_access_to_work_queue... (build_accesses_from_assign): ...here. (propagate_all_subaccesses): Make sure racc is the group representative, if there is one. gcc/testsuite/ * gcc.dg/tree-ssa/pr82078.c: New test. 
--- gcc/testsuite/gcc.dg/tree-ssa/pr82078.c | 27 +++ gcc/tree-sra.c | 5 - 2 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82078.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c new file mode 100644 index 000..3774986324b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O" } */ + +struct S0 { + signed f4; + signed f9 : 5; +} a[6][5], b = {2} + +; +int c, d; +int fn1() { + struct S0 e[5][6]; + struct S0 f; + b = f = e[2][5] = a[5][0]; + if (d) +; + else +return f.f9; + e[c][45] = a[4][4]; +} + +int main() { + fn1(); + if (b.f4 != 0) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 68edbce21b3..163b7a2d03b 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1359,6 +1359,8 @@ build_accesses_from_assign (gimple *stmt) link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); + add_access_to_work_queue (racc); + /* Let's delay marking the areas as written until propagation of accesses across link, unless the nature of rhs tells us that its data comes from elsewhere. */ @@ -2118,7 +2120,6 @@ sort_and_splice_var_accesses (tree var) access->grp_total_scalarization = total_scalarization; access->grp_partial_lhs = grp_partial_lhs; access->grp_unscalarizable_region = unscalarizable_region; - add_access_to_work_queue (access); *prev_acc_ptr = access; prev_acc_ptr = &access->next_grp; @@ -2712,6 +2713,8 @@ propagate_all_subaccesses (void) struct access *racc = pop_access_from_work_queue (); struct assign_link *link; + if (racc->group_representative) + racc= racc->group_representative; gcc_assert (racc->first_link); for (link = racc->first_link; link; link = link->next) -- 2.14.1
RE: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford do you have any objections to the patch as it stands? It doesn't appear as if anything is going to change in the mid-end anytime soon. -Original Message- From: Richard Sandiford [mailto:richard.sandif...@linaro.org] Sent: Tuesday, August 22, 2017 9:11 AM To: Richard Biener Cc: Richard Kenner ; Michael Collison ; GCC Patches ; nd ; Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts Richard Biener writes: > On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >>> wrote: Richard Biener writes: > On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner > wrote: >>> Correct. It is truncated for integer shift, but not simd shift >>> instructions. We generate a pattern in the split that only generates >>> the integer shift instructions. >> >> That's unfortunate, because it would be nice to do this in simplify_rtx, >> since it's machine-independent, but that has to be conditioned on >> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. > > SHIFT_COUNT_TRUNCATED should go ... you should express this in the > patterns, like for example with > > (define_insn ashlSI3 > [(set (match_operand 0 "") > (ashl:SI (match_operand ... ) > (subreg:QI (match_operand:SI ...)))] > > or an explicit and:SI and combine / simplify_rtx should apply the magic > optimization we expect. The problem with the explicit AND is that you'd end up with either an AND of two constants for constant shifts, or with two separate patterns, one for constant shifts and one for variable shifts. (And the problem in theory with two patterns is that it reduces the RA's freedom, although in practice I guess we'd always want a constant shift where possible for cost reasons, and so the RA would never need to replace pseudos with constants itself.) 
I think all useful instances of this optimisation will be exposed by the gimple optimisers, so maybe expand could to do it based on TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than the rtx code and it does take the mode into account. >>> >>> Sure, that could work as well and also take into account range info. >>> But we'd then need named expanders and the result would still have >>> the explicit and or need to be an unspec or a different RTL operation. >> >> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >> target-dependent rather than undefined behaviour, so it's OK for a >> target to use shift codes with out-of-range values. > > Hmm, but that means simplify-rtx can't do anything with them because > we need to preserve target dependent behavior. Yeah, it needs to punt. In practice that shouldn't matter much. > I think the RTL IL should be always well-defined and its semantics > shouldn't have any target dependences (ideally, and if, then they > should be well specified via extra target hooks/macros). That would be nice :-) I think the problem has traditionally been that shifts can be used in quite a few define_insn patterns besides those for shift instructions. So if your target defines shifts to have 256-bit precision (say) then you need to make sure that every define_insn with a shift rtx will honour that. It's more natural for target guarantees to apply to instructions than to rtx codes. >> And >> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how >> the normal shift optabs behave, so I don't think we'd need new optabs >> or new unspecs. >> >> E.g. it already works this way when expanding double-word shifts, >> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's >> possible to use a shorter sequence if you know that the shift optab >> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED >> isn't defined. 
> > I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK > applies to the instructions generated by the named shift patterns but > _not_ general shift RTXen. But the generated pattern contains shift > RTXen and how can we figure whether they were generated by the named > expanders or by other means? Don't define_expand also serve as > define_insn for things like combine? Yeah, you can't (and aren't supposed to try to) reverse-engineer the expander from the generated instructions. TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding a shift optab. > That said, from looking at the docs and your description above it > seems that semantics are not fully reflected in the RTL instruction stream? Yeah, the semantics go from being well-defined in the optab interface to being target-dependent in the rtl stream. Thanks, Richard > > Richard. > >> Thanks, >> Richard >> >>> >>> Richard. >>> Thanks, Richard
Re: [AArch64, PATCH] Improve Neon store of zero
Hi all, I've attached a new patch that addresses some of the issues raised with my original patch. On 08/23/2017 03:35 PM, Wilco Dijkstra wrote: Richard Sandiford wrote: Sorry for only noticing now, but the call to aarch64_legitimate_address_p is asking whether the MEM itself is a legitimate LDP/STP address. Also, it might be better to pass false for strict_p, since this can be called before RA. So maybe: if (GET_CODE (operands[0]) == MEM && !(aarch64_simd_imm_zero (operands[1], mode) && aarch64_mem_pair_operand (operands[0], mode))) There were also some issues with the choice of mode for the call to aarch64_mem_pair_operand. For a 128-bit wide mode, we want to check `aarch64_mem_pair_operand (operands[0], DImode)` since that's what the stp will be. For a 64-bit wide mode, we don't need to do that check because a normal `str` can be issued. I've updated the condition as such. Is there any reason for doing this check at all (or at least this early during expand)? Not doing this check means that the zero is forced into a register, so we then carry around a bit more RTL and rely on combine to merge things. There is a similar issue with this part: (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, m, w, ?r, ?w, ?r, w") + "=w, Ump, m, w, ?r, ?w, ?r, w") The Ump causes the instruction to always split off the address offset. Ump cannot be used in patterns that are generated before register allocation as it also calls aarch64_legitimate_address_p with strict_p set to true. I've changed the constraint to a new constraint 'Umq', which acts the same as Ump, but calls aarch64_legitimate_address_p with strict_p set to false and uses DImode for the mode to pass. OK for trunk? Jackson Wilco ChangeLog: gcc/ 2017-08-29 Jackson Woodruff * config/aarch64/constraints.md (Umq): New constraint. * config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Change to use Umq. (mov): Update condition. 
gcc/testsuite 2017-08-29 Jackson Woodruff * gcc.target/aarch64/simd/vect_str_zero.c: Update testcase. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f3e084f8778d70c82823b92fa80ff96021ad26db..a044a1306a897b169ff3bfa06532c692aaf023c8 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -23,10 +23,11 @@ (match_operand:VALL_F16 1 "general_operand" ""))] "TARGET_SIMD" " -if (GET_CODE (operands[0]) == MEM - && !(aarch64_simd_imm_zero (operands[1], mode) -&& aarch64_legitimate_address_p (mode, operands[0], - PARALLEL, 1))) + if (GET_CODE (operands[0]) == MEM + && !(aarch64_simd_imm_zero (operands[1], mode) + && ((GET_MODE_SIZE (mode) == 16 + && aarch64_mem_pair_operand (operands[0], DImode)) + || GET_MODE_SIZE (mode) == 8))) operands[1] = force_reg (mode, operands[1]); " ) @@ -126,7 +127,7 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Ump, m, w, ?r, ?w, ?r, w") + "=w, Umq, m, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" "m, Dz, w, w, w, r, r, Dn"))] "TARGET_SIMD diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 9ce3d4efaf31a301dfb7c1772a6b685fb2cbd2ee..4b926bf80558532e87a1dc4cacc85ff008dd80aa 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -156,6 +156,14 @@ (and (match_code "mem") (match_test "REG_P (XEXP (op, 0))"))) +(define_memory_constraint "Umq" + "@internal + A memory address which uses a base register with an offset small enough for + a load/store pair operation in DI mode." + (and (match_code "mem") + (match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0), + PARALLEL, 0)"))) + (define_memory_constraint "Ump" "@internal A memory address suitable for a load/store pair operation." 
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c index 07198de109432b530745cc540790303ae0245efb..00cbf20a0b293e71ed713f0c08d89d8a525fa785 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c @@ -7,7 +7,7 @@ void f (uint32x4_t *p) { uint32x4_t x = { 0, 0, 0, 0}; - p[1] = x; + p[4] = x; /* { dg-final { scan-assembler "stp\txzr, xzr," } } */ } @@ -16,7 +16,9 @@ void g (float32x2_t *p) { float32x2_t x = {0.0, 0.0}; - p[0] = x; + p[400] = x; /* { dg-final { scan-assembler "str\txzr, " } } */ } + +/* { dg-final { scan-assembler-not "add\tx\[0-9\]\+, x0, \[0-9\]+" } } */
Re: [PATCH v2] Python testcases to check DWARF output
On 09/05/2017 09:46 PM, Mike Stump wrote: I've included the dwarf people on the cc list. Seems like they may have an opinion on the direction or the patch itself. I was fine with the patch from the larger testsuite perspective. Good idea, thank you! And thank you for your feedback. :-) -- Pierre-Marie de Rodat
Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31
Hi Thomas, On 05/09/17 10:04, Thomas Preudhomme wrote: Ping? This is ok if a bootstrap and test run on arm-none-linux-gnueabihf shows no problems. Thanks, Kyrill Best regards, Thomas On 25/08/17 12:18, Thomas Preudhomme wrote: Hi, I've now also added a couple more changes: * size to_clear_bitmap according to maxregno to be consistent with its use * use directly TARGET_HARD_FLOAT instead of clear_vfpregs Original message below (ChangeLog unchanged): Function cmse_nonsecure_entry_clear_before_return has code to deal with high VFP registers (D16-D31) while ARMv8-M Baseline and Mainline both do not support more than 16 double VFP registers (D0-D15). This makes this security-sensitive code harder to read for not much benefit since libcalls for cmse_nonsecure_call functions do not deal with those high VFP registers anyway. This commit gets rid of this code for simplicity and fixes 2 issues in the same function: - stop the first loop when reaching maxregno to avoid dealing with VFP registers if targeting Thumb-1 or using -mfloat-abi=soft - include maxregno in that loop ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. Testing: Testsuite shows no regression when run for ARMv8-M Baseline and ARMv8-M Mainline. Is this ok for trunk? Best regards, Thomas On 23/08/17 11:56, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 17/07/17 17:25, Thomas Preudhomme wrote: My bad, found an off-by-one error in the sizing of bitmaps. Please find fixed patch in attachment. 
ChangeLog entry is unchanged: *** gcc/ChangeLog *** 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. Best regards, Thomas On 17/07/17 09:52, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 12/07/17 09:59, Thomas Preudhomme wrote: Hi Richard, On 07/07/17 15:19, Richard Earnshaw (lists) wrote: Hmm, I think that's because really this is a partial conversion. It looks like doing this properly would involve moving that existing code to use sbitmaps as well. I think doing that would be better for long-term maintenance perspectives, but I'm not going to insist that you do it now. There's also the assert later but I've found a way to improve it slightly. While switching to auto_sbitmap I also changed the code slightly to allocate directly bitmaps to the right size. Since the change is probably bigger than what you had in mind I'd appreciate if you can give me an OK again. See updated patch in attachment. ChangeLog entry is unchanged: 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. As a result I'll let you take the call as to whether you keep this version or go back to your earlier patch. If you do decide to keep this version, then see the comment below. Given the changes I'm more happy with how the patch looks now and making it go in can be a nice incentive to change other ARMv8-M Security Extension related code later on. Best regards, Thomas
Re: [PATCH] Improve alloca alignment
Jeff Law writes:

> On 08/22/2017 08:15 AM, Wilco Dijkstra wrote:
>> Jeff Law wrote:
>>> On 07/26/2017 05:29 PM, Wilco Dijkstra wrote:
>>>> But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0
>>>> seems wrong too, given that round_push uses a different alignment to
>>>> align to.
>>> I had started to dig into the history of this code, but just didn't
>>> have the time to do so fully before needing to leave for the day.  To
>>> some degree I was hoping you knew the rationale behind the test
>>> against MAX_SUPPORTED_STACK_ALIGNMENT and I wouldn't have to do a ton
>>> of digging :-)
>>
>> I looked further into this - it appears this works correctly since it
>> is only bypassed if size_align is already maximally aligned.
>> round_push aligns to the preferred alignment, which may be lower than
>> or equal to MAX_SUPPORTED_STACK_ALIGNMENT (which is at least
>> STACK_BOUNDARY).
>>
>> Here is the updated version:
>>
>> ChangeLog:
>> 2017-08-22  Wilco Dijkstra
>>
>> 	* explow.c (get_dynamic_stack_size): Improve dynamic alignment.
>
> OK.  I wonder if this will make it easier to write stack-clash tests of
> the dynamic space for boundary conditions :-)  I was always annoyed that
> I had to fiddle around with magic adjustments to the sizes of objects to
> tickle boundary cases.

This patch brought back PR libgomp/78468, which had caused its predecessor to be backed out of gcc-7.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
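The rounding being discussed - aligning a dynamic allocation size up to a power-of-two stack boundary - amounts to the standard round-up idiom. A minimal sketch of the arithmetic (the function name is illustrative, not the actual explow.c interface):

```c
#include <stdint.h>

/* Round SIZE up to the next multiple of ALIGN, where ALIGN is a power
   of two -- the kind of rounding applied to dynamic stack allocations.
   Works because ~(align - 1) clears the low bits after the bump.  */
static uintptr_t round_up_to (uintptr_t size, uintptr_t align)
{
  return (size + align - 1) & ~(align - 1);
}
```

The boundary cases Jeff mentions are the interesting ones for stack-clash testing: sizes already at a multiple of the alignment stay put, while a size one byte past a boundary jumps a whole alignment unit.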
[PATCH] Fix SLSR issue
This fixes a bogus check for a mode where the type matters.  The test can get fooled by vector ops with an integral mode, and we then ICE later trying to use wide-ints operating on vector constants.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-09-06  Richard Biener

	* gimple-ssa-strength-reduction.c
	(find_candidates_dom_walker::before_dom_children): Use a type and
	not a mode check.

Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 251710)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -1742,8 +1742,7 @@ find_candidates_dom_walker::before_dom_children
 	slsr_process_ref (gs);

       else if (is_gimple_assign (gs)
-	       && SCALAR_INT_MODE_P
-		    (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
+	       && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))))
 	{
 	  tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
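For context, the transformation this pass performs - straight-line strength reduction - replaces a family of related multiplies with cheaper additions. A hedged source-level illustration of the idea (this is what the optimizer does internally, not code from the pass itself):

```c
/* Before strength reduction: each iteration pays for a multiply.  */
int sum_mul (int c, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += c * i;
  return sum;
}

/* After strength reduction: the candidate c * i becomes a running
   add, since consecutive values differ by the constant stride c.  */
int sum_add (int c, int n)
{
  int sum = 0, t = 0;          /* t tracks c * i incrementally */
  for (int i = 0; i < n; i++)
    {
      sum += t;
      t += c;
    }
  return sum;
}
```

The fixed check matters because the replacement arithmetic is done on integer (wide-int) values; admitting a vector-typed LHS just because its mode happens to be an integer mode feeds a vector constant into that integer arithmetic.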
Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On 5 September 2017 at 20:20, Christophe Lyon wrote:
> On 5 September 2017 at 19:53, Kyrill Tkachov wrote:
>>
>> On 05/09/17 18:48, Bernd Edlinger wrote:
>>>
>>> On 09/05/17 17:02, Wilco Dijkstra wrote:
>>>> Bernd Edlinger wrote:
>>>>> Combine creates an invalid insn out of these two insns:
>>>>
>>>> Yes, it looks like a latent bug.  We need to use
>>>> arm_general_register_operand, as arm_adddi3/subdi3 only allow integer
>>>> registers.  You don't need a new predicate s_register_operand_nv.
>>>> Also I'd prefer something like arm_general_adddi_operand.
>>>
>>> Thanks, attached is a patch following your suggestion.
>>>
>>>> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
>>>>
>>>> The split condition for adddi3 now looks more accurate indeed,
>>>> although we could remove the !TARGET_NEON from the split condition,
>>>> as this is always true given arm_adddi3 uses
>>>> "TARGET_32BIT && !TARGET_NEON".
>>>
>>> No, the split condition does not begin with "&& TARGET_32BIT...".
>>> Therefore the split is enabled in TARGET_NEON after reload_completed,
>>> and it is invoked from adddi3_neon for all alternatives without VFP
>>> registers:
>>>
>>>    switch (which_alternative)
>>>      {
>>>      case 0: /* fall through */
>>>      case 3: return "vadd.i64\t%P0, %P1, %P2";
>>>      case 1: return "#";
>>>      case 2: return "#";
>>>      case 4: return "#";
>>>      case 5: return "#";
>>>      case 6: return "#";
>>>
>>>> Also there are more cases; a quick grep suggests *anddi_notdi_di has
>>>> the same issue.
>>>
>>> Yes, that pattern can be cleaned up in a follow-up patch.  Note this
>>> splitter is invoked from bicdi3_neon as well.  However I think
>>> anddi_notdi_di should be safe as long as it is enabled after
>>> reload_completed (which is probably a bug).
>>
>> Thanks, that's what I had in mind in my other reply.
>> This is ok if testing comes back ok.
>
> I've submitted the patch for testing, I'll let you know about the results.

I can confirm the last patch does fix the regression I reported, and causes no other regression.
(The previous version of the fix worked, too.)  Thanks for the prompt fix.

Christophe

> Christophe
>
>> Kyrill
>>
>>> Bernd.
>>>> Wilco
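The pattern under discussion, adddi3 on a 32-bit ARM target, gets split after reload into two core-register adds with carry propagation (adds/adc) when it is not handled as a NEON vadd.i64. A C-level sketch of what that split computes (illustrative only, not the machine-description code):

```c
#include <stdint.h>

/* A 64-bit add performed as two 32-bit adds with explicit carry,
   mirroring the adds/adc pair the adddi3 splitter emits for ARM core
   registers.  */
static uint64_t adddi_split (uint64_t a, uint64_t b)
{
  uint32_t alo = (uint32_t) a, ahi = (uint32_t) (a >> 32);
  uint32_t blo = (uint32_t) b, bhi = (uint32_t) (b >> 32);

  uint32_t lo = alo + blo;          /* adds: low words, sets carry */
  uint32_t carry = lo < alo;        /* carry out of the low word */
  uint32_t hi = ahi + bhi + carry;  /* adc: high words plus carry in */

  return ((uint64_t) hi << 32) | lo;
}
```

This is also why the operand predicates matter: the split form only makes sense for integer (core) register pairs, which is what restricting the pattern to arm_general_register_operand enforces.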