date:20151006

Two more cases where we compare addresses

2015-10-06 Thread Jan Hubicka

Hi,
this patch fixes use of operand_equal_p in fold_comparison where
we compare two addresses for equivalence and in 
fold_addr_of_array_ref_difference.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (fold_comparison, fold_addr_of_array_ref_difference):
Pass OEP_ADDRESS_OF flag to operand_equal_p.
Index: fold-const.c
===
--- fold-const.c(revision 228131)
+++ fold-const.c(working copy)
@@ -8386,7 +8443,7 @@ fold_comparison (location_t loc, enum tr
 
   /* If we have equivalent bases we might be able to simplify.  */
   if (indirect_base0 == indirect_base1
- && operand_equal_p (base0, base1, 0))
+ && operand_equal_p (base0, base1, OEP_ADDRESS_OF))
{
  /* We can fold this expression to a constant if the non-constant
 offset parts are equal.  */
@@ -8806,7 +8863,7 @@ fold_addr_of_array_ref_difference (locat
  && (base_offset = fold_binary_loc (loc, MINUS_EXPR, type,
 TREE_OPERAND (base0, 0),
 TREE_OPERAND (base1, 0))))
-  || operand_equal_p (base0, base1, 0))
+  || operand_equal_p (base0, base1, OEP_ADDRESS_OF))
 {
   tree op0 = fold_convert_loc (loc, type, TREE_OPERAND (aref0, 1));
   tree op1 = fold_convert_loc (loc, type, TREE_OPERAND (aref1, 1));
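
To illustrate the difference between comparing values and comparing
addresses, here is a toy sketch in C.  It is not GCC code: toy_ref,
toy_operand_equal_p and TOY_OEP_ADDRESS_OF are made-up stand-ins, and
treating the access type as the only thing that separates the two modes is
a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these names exist in GCC.  */
enum toy_flags { TOY_OEP_ADDRESS_OF = 1 };

struct toy_ref
{
  const void *base;   /* object being referenced */
  size_t offset;      /* byte offset into the object */
  bool is_unsigned;   /* signedness of the type used for the access */
};

/* Return true if A and B are equal.  Without TOY_OEP_ADDRESS_OF the
   access types must agree as well; with it, only the location matters.  */
static bool
toy_operand_equal_p (const struct toy_ref *a, const struct toy_ref *b,
                     unsigned int flags)
{
  assert (a != NULL && b != NULL);
  if (!(flags & TOY_OEP_ADDRESS_OF) && a->is_unsigned != b->is_unsigned)
    return false;
  return a->base == b->base && a->offset == b->offset;
}
```

Two references to the same location through differently signed types
compare unequal as values in this model, but equal once the caller states
that only the address is of interest.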

Use OEP_ADDRESS_OF in emit-rtl.c

2015-10-06 Thread Jan Hubicka

Hi,
adding some extra sanity checks to operand_equal_p made me notice that the uses
of operand_equal_p in mem attrs really care about addresses only.  The
expression is the tree of the original memory access the MEM RTX was created
from, and thus the comparisons should be done with OEP_ADDRESS_OF.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* emit-rtl.c (mem_attrs_eq_p, mem_expr_equal_p): Pass OEP_ADDRESS_OF
to operand_equal_p.

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 228131)
+++ emit-rtl.c  (working copy)
@@ -334,7 +334,7 @@ mem_attrs_eq_p (const struct mem_attrs *
  && p->addrspace == q->addrspace
  && (p->expr == q->expr
  || (p->expr != NULL_TREE && q->expr != NULL_TREE
- && operand_equal_p (p->expr, q->expr, 0))));
+ && operand_equal_p (p->expr, q->expr, OEP_ADDRESS_OF))));
 }
 
 /* Set MEM's memory attributes so that they are the same as ATTRS.  */
@@ -1657,7 +1657,7 @@ mem_expr_equal_p (const_tree expr1, cons
   if (TREE_CODE (expr1) != TREE_CODE (expr2))
 return 0;
 
-  return operand_equal_p (expr1, expr2, 0);
+  return operand_equal_p (expr1, expr2, OEP_ADDRESS_OF);
 }
 
 /* Return OFFSET if XEXP (MEM, 0) - OFFSET is known to be ALIGN

Re: Do not compare types in operands_equal_p if OEP_ADDRESS_OF is set

2015-10-06 Thread Jan Hubicka

> > I also disabled type matching done by operand_equal_p and cleaned up the
> > conditional of MEM_REF into multiple ones - for example it was passing
> > OEP_ADDRESS_OF when comparing TYPE_SIZE which is quite a nonsense.
> > 
> > I wonder what to do about OEP_CONSTANT_ADDRESS_OF.  This flag does not seem
> > to be used at all in current tree nor documented somehow.
> 
> It is used and (un-)documented as OEP_ADDRESS_OF, see the ADDR_EXPR case:
> 
>  case ADDR_EXPR:
>  return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0),
>   TREE_CONSTANT (arg0) && TREE_CONSTANT (arg1)
>   ? OEP_CONSTANT_ADDRESS_OF | OEP_ADDRESS_OF : 0);
> 
> So it's OEP_ADDRESS_OF but for constant addresses.

Yep, I noticed it, but only after reading the sources a few times. The
use is well hidden :)

Here is an updated patch incorporating Richard's feedback and also making most
of the OEP_ADDRESS_OF special cases handle OEP_CONSTANT_ADDRESS_OF as well.
OEP_CONSTANT_ADDRESS_OF means we care about the address and we know the whole
expr is constant, so it is stronger than OEP_ADDRESS_OF.

I also added documentation and cleaned up the handling of ADDR_EXPR.  There are
two cases handling ADDR_EXPR.
One handles TREE_CONSTANT (arg0) && TREE_CONSTANT (arg1) and the other the
rest of the cases, so we do not need the conditional in the code quoted above.
This will hopefully make it more obvious when OEP_CONSTANT_ADDRESS_OF
is set and used.

I also added a sanity check that OEP_ADDRESS_OF|OEP_CONSTANT_ADDRESS_OF is not
used for INTEGER_CST and NOP_EXPR.  There are many other cases where we can't
take the address, but this seems strong enough to catch a wrong recursion which
forgets to clear the flag, and it forced me to fix the OEP_CONSTANT_ADDRESS_OF
handling.
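
The kind of mistake this check is meant to catch can be sketched with a toy
recursive comparison.  Everything below (toy_expr, toy_equal_p, the TOY_*
names) is hypothetical and far simpler than operand_equal_p; it only shows
where an address-of flag has to be dropped while recursing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these names exist in GCC.  */
enum toy_flags { TOY_OEP_ADDRESS_OF = 1 };
enum toy_code { TOY_INTEGER_CST, TOY_VAR, TOY_ARRAY_REF };

struct toy_expr
{
  enum toy_code code;
  long value;                    /* constant value, or variable id */
  const struct toy_expr *op[2];  /* TOY_ARRAY_REF: base and index */
};

static bool
toy_equal_p (const struct toy_expr *a, const struct toy_expr *b,
             unsigned int flags)
{
  if (a->code != b->code)
    return false;

  switch (a->code)
    {
    case TOY_INTEGER_CST:
      /* A constant has no address.  If the flag is still set here, some
         caller forgot to clear it before recursing.  */
      assert (!(flags & TOY_OEP_ADDRESS_OF));
      return a->value == b->value;

    case TOY_VAR:
      return a->value == b->value;

    case TOY_ARRAY_REF:
      /* The base is still compared as an address; the index is a value,
         so the flag is cleared for it.  */
      return (toy_equal_p (a->op[0], b->op[0], flags)
              && toy_equal_p (a->op[1], b->op[1],
                              flags & ~(unsigned int) TOY_OEP_ADDRESS_OF));
    }
  return false;
}
```

Passing flags unchanged for the index operand would trip the assertion as
soon as two constant indexes are compared, which is the failure mode the
sanity check is intended to expose.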

I wonder if the INDIRECT_REF also needs the TBAA check that we do for MEM_REF?
While we lower that early, I think we can still unify the code like in case
of
  cond ? ref_alias_set_1 : ref_alias_set_2

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Document OEP_ADDRESS_OF
and OEP_CONSTANT_ADDRESS_OF; make most of OEP_ADDRESS_OF
special cases to also handle OEP_CONSTANT_ADDRESS_OF; skip
type checking for OEP_*ADDRESS_OF.
Index: fold-const.c
===
--- fold-const.c(revision 228131)
+++ fold-const.c(working copy)
@@ -2691,7 +2691,12 @@ combine_comparisons (location_t loc,
 
If OEP_PURE_SAME is set, then pure functions with identical arguments
are considered the same.  It is used when the caller has other ways
-   to ensure that global memory is unchanged in between.  */
+   to ensure that global memory is unchanged in between.
+
+   If OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF is set, we are actually comparing
+   addresses of objects, not values of expressions.  OEP_CONSTANT_ADDRESS_OF
+   is used for ADDR_EXPR with TREE_CONSTANT flag set and we further ignore
+   any side effects on SAVE_EXPRs then.  */
 
 int
 operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
@@ -2710,31 +2715,48 @@ operand_equal_p (const_tree arg0, const_
   /* Check equality of integer constants before bailing out due to
  precision differences.  */
   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
-return tree_int_cst_equal (arg0, arg1);
+{
+  /* Address of INTEGER_CST is not defined; check that we did not forget
+to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
+  gcc_checking_assert (!(flags
+& (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
+  return tree_int_cst_equal (arg0, arg1);
+}
 
-  /* If both types don't have the same signedness, then we can't consider
- them equal.  We must check this before the STRIP_NOPS calls
- because they may change the signedness of the arguments.  As pointers
- strictly don't have a signedness, require either two pointers or
- two non-pointers as well.  */
-  if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
-  || POINTER_TYPE_P (TREE_TYPE (arg0)) != POINTER_TYPE_P (TREE_TYPE (arg1)))
-return 0;
+  if (!(flags & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)))
+{
+  /* If both types don't have the same signedness, then we can't consider
+them equal.  We must check this before the STRIP_NOPS calls
+because they may change the signedness of the arguments.  As pointers
+strictly don't have a signedness, require either two pointers or
+two non-pointers as well.  */
+  if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
+ || POINTER_TYPE_P (TREE_TYPE (arg0))
+!= POINTER_TYPE_P (TREE_TYPE (arg1)))
+   return 0;
 
-  /* We cannot consider pointers to different address space equal.  */
-  if (POINTER_TYPE_P (TREE_TYPE (arg0)) && POINTER_TYPE_P

Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-06 Thread Marcus Shawcroft

On 6 October 2015 at 12:29, Ramana Radhakrishnan
 wrote:

> Thanks for the explanation Eric, by that explanation I do not see the need to 
> adjust for TARGET_EXPR or mark_addressable in the backends.
>
> Here are the patches that I'm testing - I will apply the ARM one after 
> testing finishes - my previous testing broke because of some other reasons.
>
> The AArch64 patch cleared testing - ok to apply ?
>

> PR c/65345
>
> * config/aarch64/aarch64-builtins.c 
> (aarch64_atomic_assign_expand_fenv): Use create_tmp_var_raw instead of 
> create_tmp_var.

OK /Marcus

Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2015-10-06 Thread Ulrich Weigand

Hans-Peter Nilsson wrote:
> > From: Ulrich Weigand 
> > Date: Tue, 6 Oct 2015 16:04:35 +0200
> 
> > I'm using the build procedure: build initial GCC (--without-headers),
> > use it to build newlib, install newlib into prefix, build final GCC
> > (--with-headers).  Using this procedure, inhibit_libc used to *not*
> > be set in the final GCC build, but now it is.
> 
> And not using --with-newlib I think.  Somewhat of a borderline
> case, FWIW.

Well, --with-newlib doesn't really matter here, since the only use in GCC
itself is for this check:

if { { test x$host != x$target && test "x$with_sysroot" = x ; } ||
   test x$with_newlib = xyes ; } &&
 { test "x$with_headers" = xno || test ! -f "$target_header_dir/stdio.h"; } ; then
   inhibit_libc=true
fi

and since for me host != target and I don't use a sysroot, the first
condition in the || is already true.

(I don't like using --with-newlib because it causes the configure scripts
in the various target libraries to stop doing cross-compile checks and
default to hard-coded assumptions on which functions are and are not
present.  Those hard-coded checks tend to be outdated and/or wrong for
SPU; while the usual cross-compile checks work just fine if newlib has
been installed into the prefix before.)
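
The condition quoted above can be restated as a small C function to make
the truth table easier to follow.  This is only a model of the shell test:
struct toy_config and its field names are invented for the sketch, and the
file lookup is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the configure inputs.  */
struct toy_config
{
  const char *host;
  const char *target;
  const char *with_sysroot;  /* "" when --with-sysroot is not given */
  const char *with_newlib;   /* "yes" when --with-newlib is given */
  const char *with_headers;  /* "no", "yes", a path, or "" */
  bool stdio_h_found;        /* test -f "$target_header_dir/stdio.h" */
};

/* Mirror of: (cross without sysroot || newlib) && (no headers).  */
static bool
toy_inhibit_libc (const struct toy_config *c)
{
  assert (c != NULL);
  bool cross_without_sysroot
    = strcmp (c->host, c->target) != 0 && c->with_sysroot[0] == '\0';
  bool newlib = strcmp (c->with_newlib, "yes") == 0;
  bool no_headers
    = strcmp (c->with_headers, "no") == 0 || !c->stdio_h_found;
  return (cross_without_sysroot || newlib) && no_headers;
}
```

For a cross build without a sysroot, the result depends only on whether
stdio.h is found under target_header_dir, regardless of --with-newlib.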

> > > +if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
> > > +  if test "x$with_headers" != x; then
> > > +target_header_dir=$with_headers
> > 
> > In the common case of just using --with-headers, this now sets
> > target_header_dir to "yes", which is not particularly useful.
> 
> --with-headers without a path to an argument?
> Odd but that *should* work.  I see old lines here and there
> including *toplevel* configure.ac that refers to that.

Yes, pretty much the only use for --with-headers without argument was to
short-circuit the inhibit_libc test.

> > Now I understand that you didn't introduce those lines, and they were already
> > wrong before your patch; but after the patch the problem now actually matters.
> > Before the patch, I could always use --with-headers to say: just assume headers
> > are present in the prefix, and everything worked.
> 
> At a quick glance and my initial guess there's a missing two or
> four lines checking for with_headers=yes.
> 
> > This is not a critical problem since I have a work-around: remove --with-headers
> > and also manually create a symlink from sys-include to include in the prefix.
> > But it would still be nice to avoid having to do the symlink ...
> 
> I'd recommend writing a patch handling that "yes".
> I know I'm the one "exposing a latent bug" but you're in a much
> better position to test it.

So the question is what this should do then?  Should I simply add back the
behavior that when using --with-headers, we never get inhibit_libc set?

Or should we simply ignore --with-headers and check for the presence of
headers installed in the prefix?  This would work too, once we solve the
sys-include vs. include problem.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-06 Thread Uros Bizjak

On Thu, Oct 1, 2015 at 4:49 PM, Marek Polacek  wrote:
> Joseph reminded me that I had forgotten about this patch.  As mentioned
> here , I'm
> removing the XFAILs in the tests so people are likely to see new FAILs.

2015-10-06  Uros Bizjak  

PR c/65345
* config/alpha/alpha.c (alpha_atomic_assign_expand_fenv): Use
create_tmp_var_raw instead of create_tmp_var.

Tested with alpha-linux-gnu crosscompiler on the attached testcase,
since native alpha bootstrap currently fails due to PR 67766.

Uros.
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 228525)
+++ config/alpha/alpha.c(working copy)
@@ -9765,7 +9765,7 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree
 
__ieee_set_fp_control (masked_fenv);  */
 
-  fenv_var = create_tmp_var (long_unsigned_type_node);
+  fenv_var = create_tmp_var_raw (long_unsigned_type_node);
   get_fpscr
 = build_fn_decl ("__ieee_get_fp_control",
 build_function_type_list (long_unsigned_type_node, NULL));
@@ -9794,7 +9794,7 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree
 
__atomic_feraiseexcept (new_fenv_var);  */
 
-  new_fenv_var = create_tmp_var (long_unsigned_type_node);
+  new_fenv_var = create_tmp_var_raw (long_unsigned_type_node);
   reload_fenv = build2 (MODIFY_EXPR, long_unsigned_type_node, new_fenv_var,
build_call_expr (get_fpscr, 0));
   restore_fnenv = build_call_expr (set_fpscr, 1, fenv_var);

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Bernd Schmidt


On 10/06/2015 04:04 PM, Andrew MacLeod wrote:
> I primarily submitted it early because you wanted to look at the tools
> before the code patch, which is the one I care about since the longer it
> goes, the more effort it is to update the patch to mainline.


The problem is that the generated patch is impossible to review on its 
own. It's just a half a megabyte dump of changes that can't 
realistically be verified for correctness. Reading it can throw up some 
interesting questions which can then (hopefully) be answered by 
reference to the tools, such as "why does timevar.h move?" For that to 
work, the tools need at least to have a minimum level of readability. 
They are the important part here, not the generated patch. (Unless you 
find a reviewer who's less risk-averse than me and is willing to approve 
the whole set and hope for the best.)


I suspect you'll have to regenerate the includes patch anyway, because 
of the missing #undef tracking I mentioned.


Let's consider the timevar.h example a bit more. Does the include have 
to move? I don't see anything in that file that looks like a dependency, 
and include files that need it are already including it. Is the fact 
that df.h includes it in any way material for generating an order of 
headers? IMO, no, it's an unnecessary change indicating a bug in the 
script, and any kind of unnecessary change in a patch like this makes it 
so much harder to verify. I think the canonical order that's produced 
should probably ignore files included from other headers so that these 
are left alone in their original order.


I'd still like more explanations of special cases in the tools like the 
diagnostic.h area as well as

# seed tm.h with options.h since its a build file and won't be seen.
and I think we need to understand what makes them special in a way that 
makes the rest of the algorithm not handle them correctly (so that we 
don't overlook any other such cases).



Bernd

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Mon, 2015-10-05 at 09:57 -0700, H.J. Lu wrote:

> You need to update dwarf2cfi.c to generate proper unwind info for
> whatever frame instructions MIPS generates, like what we did for
> x86 dynamic stack realignment.

The problem is understanding what the 'proper' unwind info is and
figuring out what is wrong about it when it doesn't work.

I used Bernd's comment about Rule #6 to handle the sp = sp AND reg
issue, but I have a couple more dwarf/cfi questions.

One, can you explicitly make a copy of the stack pointer to another
register and not make that register the new stack pointer?  I want to
save the old stack pointer before aligning it but when I do that I think
that dwarf2cfi.c automatically assumes that the new register is now the
stack pointer and that is not what I want.  I want the stack pointer to
remain as the original register.

My other question is about 'set_unexpected' and how that affects
the generated unwind info.  I noticed that a lot of my failing tests use
'set_unexpected' and I don't know what this function does or how it
affects the generated code that would cause these tests in particular to
fail.

Steve Ellcey
sell...@imgtec.com

Re: [PATCH] remove dead code used by the old cloog scheduler

2015-10-06 Thread Tobias Grosser


On 10/06/2015 05:45 PM, Sebastian Pop wrote:

2015-10-05  Aditya Kumar  
 Sebastian Pop  

 * graphite-dependences.c (scop_get_transformed_schedule): Remove.
 (no_violations): Remove.
 (subtract_commutative_associative_deps): Remove.
 (compute_deps): Do not call subtract_commutative_associative_deps.
 (transform_is_safe): Remove.
 (graphite_legal_transform): Remove.
 * graphite-poly.h (graphite_legal_transform): Remove.


LGTM.

Tobias

---
  gcc/graphite-dependences.c | 255 +
  gcc/graphite-poly.h|   3 -
  2 files changed, 1 insertion(+), 257 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index e39394a..2c4f92c 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -154,26 +154,6 @@ scop_get_original_schedule (scop_p scop, vec<poly_bb_p> pbbs)
return res;
  }

-/* Returns all the transformed schedules in SCOP.  */
-
-static isl_union_map *
-scop_get_transformed_schedule (scop_p scop, vec<poly_bb_p> pbbs)
-{
-  int i;
-  poly_bb_p pbb;
-  isl_space *space = isl_set_get_space (scop->context);
-  isl_union_map *res = isl_union_map_empty (space);
-
-  FOR_EACH_VEC_ELT (pbbs, i, pbb)
-{
-  res = isl_union_map_add_map
-   (res, constrain_domain (isl_map_copy (pbb->transformed),
-   isl_set_copy (pbb->domain)));
-}
-
-  return res;
-}
-
  /* Helper function used on each MAP of a isl_union_map.  Computes the
 maximal output dimension.  */

@@ -262,33 +242,6 @@ apply_schedule_on_deps (__isl_keep isl_union_map *schedule,
return x;
  }

-/* Return true when SCHEDULE does not violate the data DEPS: that is
-   when the intersection of LEX with the DEPS transformed by SCHEDULE
-   is empty.  LEX is the relation in which the outputs occur before
-   the inputs.  */
-
-static bool
-no_violations (__isl_keep isl_union_map *schedule,
-  __isl_keep isl_union_map *deps)
-{
-  bool res;
-  isl_space *space;
-  isl_map *lex, *x;
-
-  if (isl_union_map_is_empty (deps))
-return true;
-
-  x = apply_schedule_on_deps (schedule, deps);
-  space = isl_map_get_space (x);
-  space = isl_space_range (space);
-  lex = isl_map_lex_ge (space);
-  x = isl_map_intersect (x, lex);
-  res = isl_map_is_empty (x);
-
-  isl_map_free (x);
-  return res;
-}
-
  /* Return true when DEPS is non empty and the intersection of LEX with
 the DEPS transformed by SCHEDULE is non empty.  LEX is the relation
 in which all the inputs before DEPTH occur at the same time as the
@@ -332,161 +285,6 @@ carries_deps (__isl_keep isl_union_map *schedule,
return res;
  }

-/* Subtract from the RAW, WAR, and WAW dependences those relations
-   that have been marked as belonging to an associative commutative
-   reduction.  */
-
-static void
-subtract_commutative_associative_deps (scop_p scop,
-  vec<poly_bb_p> pbbs,
-  isl_union_map *original,
-  isl_union_map **must_raw,
-  isl_union_map **may_raw,
-  isl_union_map **must_raw_no_source,
-  isl_union_map **may_raw_no_source,
-  isl_union_map **must_war,
-  isl_union_map **may_war,
-  isl_union_map **must_war_no_source,
-  isl_union_map **may_war_no_source,
-  isl_union_map **must_waw,
-  isl_union_map **may_waw,
-  isl_union_map **must_waw_no_source,
-  isl_union_map **may_waw_no_source)
-{
-  int i, j;
-  poly_bb_p pbb;
-  poly_dr_p pdr;
-  isl_space *space = isl_set_get_space (scop->context);
-
-  FOR_EACH_VEC_ELT (pbbs, i, pbb)
-if (PBB_IS_REDUCTION (pbb))
-  {
-   isl_union_map *r = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *must_w = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *may_w = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *all_w;
-   isl_union_map *empty;
-   isl_union_map *x_must_raw;
-   isl_union_map *x_may_raw;
-   isl_union_map *x_must_raw_no_source;
-   isl_union_map *x_may_raw_no_source;
-   isl_union_map *x_must_war;
-   isl_union_map *x_may_war;
-   isl_union_map *x_must_war_no_source;
-   isl_union_map *x_may_war_no_source;
-   isl_union_map *x_must_waw;
-   isl_union_map *x_may_waw;
-   isl_union_map *x_must_waw_no_source;
-   isl_union_map *x_may_waw_no_source;
-
-   FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
- if (pdr_read_p

Re: [PATCH, rs6000] Fix PR target/67808, LRA ICE on double to long double conversion

2015-10-06 Thread David Edelsohn

On Mon, Oct 5, 2015 at 6:36 PM, Michael Meissner
 wrote:
> Ok, after spending the day on going down the rabbit hole of trying to optimize
> just about every, here are my patches.
>
> Note, I simplified the constraints to eliminate some rare possibilities, like
> optimizing converting from double to long double if the double happened to be
> in a GPR register and the long double value was stored in memory (but there
> never was an optimization for having the double value be in a GPR and the long
> double value to also be a GPR).
>
> I also separated the VSX case from the non-VSX case. This is to simplify things
> at the RTL level (non-VSX must load up 0.0 to put into the lower word, while
> VSX can use the XXLXOR instruction to clear the register).
>
> I dropped support in the insns for extending the DFmode value to TFmode that is
> located in memory directly. Now, the compiler builds the whole value in FPRs
> and then does the store. This simplifies the code somewhat, and SPE/ieeequad
> paths require the value to be in registers, which might lead to other lra
> bugs. In the case of just doing one conversion in straight line code, it just
> changes the register allocation somewhat (allocate 1 TFmode pseudo instead of
> 1-2 DFmode pseudos).
>
> I have bootstrapped the compiler on little endian power8 with no regressions. I
> have built the test case with various options (-mlra vs. no -mlra, 32-bit,
> 64-bit, power5/power6/power7/power8), and it all builds correctly.
>
> Is this patch ok to apply to the trunk?
>
> I would like to apply this patch to GCC 5.x as well. However, in doing the
> patch, this patch touches areas that I've been working on for IEEE 128-bit
> floating point support, and so the patch will need to be reworked for GCC
> 5.x.  Is it ok to install in the trunk?
>
> In addition, I will need to modify this area again with the next IEEE 128-bit
> floating point support patch, but I wanted to separate this patch out so that
> it could be considered by itself, and back ported to GCC 5.x.
>
> [gcc]
> 2015-10-05  Michael Meissner  
> Peter Bergner  
>
> PR target/67808
> * config/rs6000/rs6000.md (extenddftf2): In the expander, only
> allow registers, but provide insns for the combiner to create for
> loads from memory. Separate VSX code from non-VSX code. For
> non-VSX code, combine extenddftf2_fprs into extenddftf2 and rename
> externaldftf2_internal to externaldftf2_fprs. Reorder constraints
> so that registers come before memory operations. Drop support from
> converting DFmode to TFmode, if the DFmode value is in a GPR
> register, and the TFmode is in memory.
> (extenddftf2_fprs): Likewise.
> (extenddftf2_internal): Likewise.
> (extenddftf2_vsx): Likewise.
> (extendsftf2): In the expander, only allow registers, but provide
> insns for the combiner to create for stores and loads.
>
> [gcc/testsuite]
> 2015-10-05  Michael Meissner  
> Peter Bergner 
>
> PR target/67808
> * gcc.target/powerpc/pr67808.c: New test.

This is okay for trunk, but can we hold off for GCC 5 and allow things
to settle?

Thanks, David
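
As background for the quoted remark about loading 0.0 into the lower word,
the following sketch models a double to long double extension under the
assumption that long double is the pair-of-doubles format (value = high part
plus low part).  The names toy_tf, toy_extenddftf2 and toy_tf_value are
invented for illustration and say nothing about how the rs6000.md patterns
are actually written.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a pair-of-doubles long double.  */
struct toy_tf
{
  double hi;  /* high-order part, carries the double value */
  double lo;  /* low-order part, zero after a plain extension */
};

/* Extend D: copy it into the high part and clear the low part.  */
static struct toy_tf
toy_extenddftf2 (double d)
{
  struct toy_tf r;
  r.hi = d;
  r.lo = 0.0;
  return r;
}

/* The represented value is the sum of both parts.  */
static double
toy_tf_value (const struct toy_tf *x)
{
  assert (x != NULL);
  return x->hi + x->lo;
}
```

In this model the only work beyond a register copy is producing the zero
for the low part, which is where the VSX and non-VSX paths described in the
quoted mail differ.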

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread H.J. Lu

On Tue, Oct 6, 2015 at 8:30 AM, Steve Ellcey  wrote:
> On Mon, 2015-10-05 at 09:57 -0700, H.J. Lu wrote:
>
>> You need to update dwarf2cfi.c to generate proper unwind info for
>> whatever frame instructions MIPS generates, like what we did for
>> x86 dynamic stack realignment.
>
> The problem is understanding what the 'proper' unwind info is and
> figuring out what is wrong about it when it doesn't work.
>
> I used Bernd's comment about Rule #6 to handle the sp = sp AND reg
> issue, but I have a couple more dwarf/cfi questions.
>
> One, can you explicitly make a copy of the stack pointer to another
> register and not make that register the new stack pointer?  I want to
> save the old stack pointer before aligning it but when I do that I think
> that dwarf2cfi.c automatically assumes that the new register is now the
> stack pointer and that is not what I want.  I want the stack pointer to
> remain as the original register.
>
> My other question is about 'set_unexpected' and how that affects
> the generated unwind info.  I noticed that a lot of my failing tests use
> 'set_unexpected' and I don't know what this function does or how it
> affects the generated code that would cause these tests in particular to
> fail.

You can start by writing dynamic stack alignment in assembly
with CFI directives and verify you can unwind it under gdb.  You
will know what CFI directives you should generate for GDB.

BTW, you should first fix readelf to dump MIPS DWARF unwind info.

-- 
H.J.

Re: [Patch, fortran, pr65889, v1] [6 Regressions] [OOP] ICE with sizeof a polymorphic variable

2015-10-06 Thread Mikael Morin


On 06/10/2015 16:22, Andre Vehreschild wrote:
> Hi all,
>
> the attached patch fixes a 6 regression when the argument of sizeof()
> is a pointer to a class object, e.g., when the class object is
> intent(out). The patch improves the check if the parameter is a
> class object by previously checking whether the argument is the plain
> object or a pointer to one and using TREE_OPERAND() once or twice,
> respectively.
>
> Bootstraps and regtests ok on x86_64-linux-gnu/f21.
>
> Ok for trunk?


Ok. Thanks.

Mikael

Re: [PATCH] reorg.c: use vec instead of rtx_insn_list for the delay insn list

2015-10-06 Thread Oleg Endo

On Tue, 2015-10-06 at 15:59 +0200, Bernd Schmidt wrote:
> On 10/06/2015 03:37 PM, tbsaunde+...@tbsaunde.org wrote:
> > This seems a bit cleaner, and should involve less allocation.
> 
> I agree this is good. rtx_insn_list should die.
> 
> > I tested there was no regressions for sh-sim with all languages except
> > ada,lto,fortran, ok?
> 
> Could you please also build a few large source files and compare 
> before/after code generation to ensure there are no changes? I always 
> try to do this for patches like this one to avoid accidents, and this 
> change looks sufficiently non-mechanical to warrant it. Could also test 
> mips-elf.

I've run a CSiBE comparison for "-O2 -m4-single -ml" on sh-elf.  There
are no code changes with Trev's patch.

Cheers,
Oleg

Re: [PATCH] reorg.c: use vec instead of rtx_insn_list for the delay insn list

2015-10-06 Thread Bernd Schmidt


On 10/06/2015 04:50 PM, Oleg Endo wrote:
> On Tue, 2015-10-06 at 15:59 +0200, Bernd Schmidt wrote:
> > On 10/06/2015 03:37 PM, tbsaunde+...@tbsaunde.org wrote:
> > > This seems a bit cleaner, and should involve less allocation.
> >
> > I agree this is good. rtx_insn_list should die.
> >
> > > I tested there was no regressions for sh-sim with all languages except
> > > ada,lto,fortran, ok?
> >
> > Could you please also build a few large source files and compare
> > before/after code generation to ensure there are no changes? I always
> > try to do this for patches like this one to avoid accidents, and this
> > change looks sufficiently non-mechanical to warrant it. Could also test
> > mips-elf.
>
> I've run a CSiBE comparison for "-O2 -m4-single -ml" on sh-elf.  There
> are no code changes with Trev's patch.


Thanks! Good enough.


Bernd

Re: [PATCH, i386] Add missing entries to cpuinfo.

2015-10-06 Thread Uros Bizjak

On Tue, Oct 6, 2015 at 3:36 PM, Kirill Yukhin  wrote:
> Hello,
> Patch in the bottom adds missing options to libgcc/config/i386/cpuinfo.c
> It also updates documentation.
> As far as number of entries exceeded 32, I've extended
> type of features array to INT64 (most suspicious part of the patch).
>
> Bootstrapped and regtested. `make pdf' seems to be working properly.
>
> Comments?

Er, this is on purpose. Multiversioning is intended to depend on ISA
extensions such as SSE, AVX, AES to some degree, and similar
extensions that do make a difference when used in compilation. This
was discussed some years ago, but I can't find the URL of the
discussion.

So, from the list below, I'd take only SHA. Plus fixes/additions in
the documentation and testsuite, of course.

Uros.
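
On the 32-entry limit mentioned in the quoted patch description: once a
feature number reaches 32, a mask built from a 32-bit word can no longer hold
it.  The sketch below shows the 64-bit variant in isolation; toy_set_feature
and toy_has_feature are hypothetical helpers, not the cpuinfo.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set feature number BIT in MASK.  The shift is done on a 64-bit
   constant so that bit numbers from 32 to 63 are well defined.  */
static uint64_t
toy_set_feature (uint64_t mask, unsigned int bit)
{
  assert (bit < 64);
  return mask | (UINT64_C (1) << bit);
}

/* Return true if feature number BIT is set in MASK.  */
static bool
toy_has_feature (uint64_t mask, unsigned int bit)
{
  assert (bit < 64);
  return (mask >> bit) & 1;
}
```

Truncating such a mask to 32 bits silently drops every feature numbered 32
or above, which is why the element type has to grow together with the enum.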

> gcc/
> * config/i386/i386.c (build_processor_model_struct): Replace
> array type to intDI.
> (old_builtin_cpu): Handle SHA, PREFETCHWT1, FSGSBASE, HLE,
> RTM, RDSEED, ADX, MPX, CLFLUSHOPT, PCOMMIT, CLWB, XSAVEOPT,
> XSAVEC, and XSAVES features. Update built-in expansion.
> * doc/extend.texi (ivybridge): New.
> (haswell): Ditto.
> (broadwell): Ditto.
> (skylake): Ditto.
> (skylake-avx512): Ditto.
> (aes): Ditto.
> (pclmul): Ditto.
> (fma): Ditto.
> (bmi): Ditto.
> (bmi2): Ditto.
> (avx512bw): Ditto.
> (avx512dq): Ditto.
> (avx512pf): Ditto.
> (avx512er): Ditto.
> (avx512ifma): Ditto.
> (avx512vbmi): Ditto.
> (sha): Ditto.
> (prefetchwt1): Ditto.
>
> gcc/testsuite/
> * gcc.target/i386/builtin_target.c: Check SHA, PREFETCHWT1,
> FSGSBASE, HLE, RTM, RDSEED, ADX, MPX, CLFLUSHOPT, PCOMMIT,
> CLWB, XSAVEOPT, XSAVEC, and XSAVES features.
>
> libgcc/
> * config/i386/cpuinfo.c (processor_features): Add SHA, PREFETCHWT1,
> FSGSBASE, HLE, RTM, RDSEED, ADX, MPX, CLFLUSHOPT, PCOMMIT, CLWB,
> XSAVEOPT, XSAVEC, and XSAVES features.
> (struct __processor_model): Set type of __cpu_features array to
> uint64_t.
> (get_available_features): Add new features. Reorder according
> to bit number.
>
> --
> Thanks, K
>
> commit 839916b8d017fac166f76c317bdf5c38d5d15ea4
> Author: Kirill Yukhin 
> Date:   Tue Oct 6 16:20:09 2015 +0300
>
> Add missing entries to libgcc/config/i386/cpuinfo.c.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d59b59b..13241f2 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -36507,10 +36507,10 @@ build_processor_model_struct (void)
>field_chain = field;
>  }
>
> -  /* The last field is an array of unsigned integers of size one.  */
> +  /* The last field is an array of unsigned long long integers of size one.  */
>field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>   get_identifier (field_name[3]),
> - build_array_type (unsigned_type_node,
> + build_array_type (unsigned_intDI_type_node,
> build_index_type (size_one_node)));
>if (field_chain != NULL_TREE)
>  DECL_CHAIN (field) = field_chain;
> @@ -36587,6 +36587,20 @@ fold_builtin_cpu (tree fndecl, tree *args)
>  F_AVX512PF,
>  F_AVX512VBMI,
>  F_AVX512IFMA,
> +F_SHA,
> +F_PREFETCHWT1,
> +F_FSGSBASE,
> +F_HLE,
> +F_RTM,
> +F_RDSEED,
> +F_ADX,
> +F_MPX,
> +F_CLFLUSHOPT,
> +F_PCOMMIT,
> +F_CLWB,
> +F_XSAVEOPT,
> +F_XSAVEC,
> +F_XSAVES,
>  F_MAX
>};
>
> @@ -36697,6 +36711,20 @@ fold_builtin_cpu (tree fndecl, tree *args)
>{"avx512pf",F_AVX512PF},
>{"avx512vbmi",F_AVX512VBMI},
>{"avx512ifma",F_AVX512IFMA},
> +  {"prefetchwt1", F_PREFETCHWT1},
> +  {"sha", F_SHA},
> +  {"fsgsbase",F_FSGSBASE},
> +  {"hle", F_HLE},
> +  {"rtm", F_RTM},
> +  {"rdseed",  F_RDSEED},
> +  {"adx", F_ADX},
> +  {"mpx", F_MPX},
> +  {"clflushopt",F_CLFLUSHOPT},
> +  {"pcommit", F_PCOMMIT},
> +  {"clwb",F_CLWB},
> +  {"xsaveopt",F_XSAVEOPT},
> +  {"xsavec",  F_XSAVEC},
> +  {"xsaves",  F_XSAVES},
>  };
>
>tree __processor_model_type = build_processor_model_struct ();
> @@ -36780,7 +36808,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
>tree field;
>tree final;
>
> -  unsigned int field_val = 0;
> +  unsigned HOST_WIDE_INT field_val = 0;
>unsigned int NUM_ISA_NAMES
> = sizeof (isa_names_table) / sizeof (struct _isa_names_table);
>
> @@ -36806,13 +36834,13 @@ fold_builtin_cpu (tree fndecl, tree *args)
>  field, NULL_TREE);
>
>/* Access the 0th element of __cpu_features array.  */
> -  array_elt = build4 (ARRAY_REF, unsigned_type_node, ref,
> +  array_elt = build4 (ARRAY_REF,

[Patch, fortran, pr65889, v1] [6 Regressions] [OOP] ICE with sizeof a polymorphic variable

2015-10-06 Thread Andre Vehreschild

Hi all,

the attached patch fixes a GCC 6 regression that occurs when the argument
of sizeof() is a pointer to a class object, e.g., when the class object is
intent(out). The patch improves the check for whether the parameter is a
class object: it first determines whether the argument is the plain
object or a pointer to one, and then applies TREE_OPERAND() once or twice,
respectively.
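
A minimal sketch of that check, using a toy node type rather than GCC's
real tree representation.  The struct, the is_class_decl flag (standing
in for GFC_DECL_CLASS) and the helper class_decl_of are illustrative
assumptions, not GCC API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for GCC trees; not the real representation.  */
enum code { VAR_DECL, INDIRECT_REF, COMPONENT_REF };

struct node
{
  enum code code;
  bool is_class_decl;   /* Models GFC_DECL_CLASS (hypothetical flag).  */
  struct node *op0;     /* Models TREE_OPERAND (n, 0).  */
};

/* EXPR models the component_ref to the _data component.  Return the
   class object it belongs to, or NULL if there is none.  */
static struct node *
class_decl_of (struct node *expr)
{
  assert (expr != NULL);
  struct node *base = expr->op0;
  if (base == NULL)
    return NULL;

  /* Pointer case: strip one more level, i.e. take operand 0 twice.  */
  if (base->code == INDIRECT_REF && base->op0 && base->op0->is_class_decl)
    return base->op0;

  /* Plain object, e.g. located on the stack: operand 0 once.  */
  if (base->is_class_decl)
    return base;

  return NULL;
}
```

Both shapes resolve to the same class object, which is roughly what the
extended condition in the patch has to establish.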

Bootstraps and regtests ok on x86_64-linux-gnu/f21.

Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2015-10-06  Andre Vehreschild  

* trans-intrinsic.c (gfc_conv_intrinsic_sizeof): Handle pointer to and
on stack class objects as sizeof parameter.

gcc/testsuite/ChangeLog:

2015-10-06  Andre Vehreschild  

* gfortran.dg/sizeof_5.f90: New test.


diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 35052be..ac61c09 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -5937,11 +5937,16 @@ gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *expr)
 }
   else if (arg->ts.type == BT_CLASS)
 {
-  /* For deferred length arrays, conv_expr_descriptor returns an
-	 indirect_ref to the component.  */
+  /* Conv_expr_descriptor returns a component_ref to _data component of the
+	 class object.  The class object may be a non-pointer object, e.g.
+	 located on the stack, or a memory location pointed to, e.g. a
+	 parameter, i.e., an indirect_ref.  */
   if (arg->rank < 0
 	  || (arg->rank > 0 && !VAR_P (argse.expr)
-	  && GFC_DECL_CLASS (TREE_OPERAND (argse.expr, 0
+	  && ((INDIRECT_REF_P (TREE_OPERAND (argse.expr, 0))
+		   && GFC_DECL_CLASS (TREE_OPERAND (
+	TREE_OPERAND (argse.expr, 0), 0)))
+		  || GFC_DECL_CLASS (TREE_OPERAND (argse.expr, 0)
 	byte_size = gfc_class_vtab_size_get (TREE_OPERAND (argse.expr, 0));
   else if (arg->rank > 0)
 	/* The scalarizer added an additional temp.  To get the class' vptr
diff --git a/gcc/testsuite/gfortran.dg/sizeof_5.f90 b/gcc/testsuite/gfortran.dg/sizeof_5.f90
new file mode 100644
index 000..0e1496a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/sizeof_5.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+!
+! PR fortran/65889
+!
+!
+module m
+  type n
+  end type n
+contains
+  subroutine g(ns)
+class(n), intent(out), allocatable, dimension(:) :: ns
+class(n), allocatable, dimension(:) :: tmp
+write (0,*) sizeof(ns), sizeof(tmp)
+  end subroutine g
+end module m

Re: [PATCH] PR target/67850: Wrong call_used_regs used in aggregate_value_p

2015-10-06 Thread H.J. Lu

On Tue, Oct 6, 2015 at 6:39 AM, Richard Biener  wrote:
> On Tue, 6 Oct 2015, H.J. Lu wrote:
>
>> On Tue, Oct 06, 2015 at 02:30:59PM +0200, Richard Biener wrote:
>> > On Tue, Oct 6, 2015 at 1:43 PM, H.J. Lu  wrote:
>> > > Since targetm.expand_to_rtl_hook may be called to switch ABI, it should
>> > > be called for each function before expanding to RTL.  Otherwise, we may
>> > > use the stale information from compilation of the previous function.
>> > > aggregate_value_p uses call_used_regs.  aggregate_value_p is used by
>> > > IPA and return value optimization, which are called before
>> > > pass_expand::execute after RTL expansion starts.  We need to call
>> > > targetm.expand_to_rtl_hook early enough in cgraph_node::expand to make
>> > > sure that everything is in sync when RTL expansion starts.
>> > >
>> > > Tested on Linux/x86-64.  OK for trunk?
>> >
>> > Hmm, I think set_cfun hook should handle this.  expand_to_rtl_hook
>> > shouldn't mess with per-function stuff.
>> >
>> > Richard.
>> >
>>
>> I am testing this patch.  OK for trunk if there is no regression?
>>
>>
>> H.J.
>> --
>> ix86_maybe_switch_abi is called too late during RTL expansion and we
>> use the stale information from compilation of the previous function.
>> aggregate_value_p uses call_used_regs.  aggregate_value_p is used by
>> IPA and return value optimization, which are called before
>> pass_expand::execute after RTL expansion starts.  Instead,
>> ix86_maybe_switch_abi should be merged with ix86_set_current_function.
>>
>>   PR target/67850
>>   * config/i386/i386.c (ix86_set_current_function): Renamed
>>   to ...
>>   (ix86_set_current_function_1): This.
>>   (ix86_set_current_function): New. incorporate old
>>   ix86_set_current_function and ix86_maybe_switch_abi.
>>   (ix86_maybe_switch_abi): Removed.
>>   (TARGET_EXPAND_TO_RTL_HOOK): Likewise.
>> ---
>>  gcc/config/i386/i386.c | 33 ++---
>>  1 file changed, 18 insertions(+), 15 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index d59b59b..a0adf3d 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -6222,7 +6222,7 @@ ix86_reset_previous_fndecl (void)
>> FNDECL.  The argument might be NULL to indicate processing at top
>> level, outside of any function scope.  */
>>  static void
>> -ix86_set_current_function (tree fndecl)
>> +ix86_set_current_function_1 (tree fndecl)
>>  {
>>/* Only change the context if the function changes.  This hook is called
>>   several times in the course of compiling a function, and we don't want to
>> @@ -6262,6 +6262,23 @@ ix86_set_current_function (tree fndecl)
>>ix86_previous_fndecl = fndecl;
>>  }
>>
>> +static void
>> +ix86_set_current_function (tree fndecl)
>> +{
>> +  ix86_set_current_function_1 (fndecl);
>> +
>> +  if (!cfun)
>> +return;
>
> I think you want to test !fndecl here.  Why split this out at all?
> The ix86_previous_fndecl caching should still work, no?
>
>> +  /* 64-bit MS and SYSV ABI have different set of call used registers.
>> + Avoid expensive re-initialization of init_regs each time we switch
>> + function context since this is needed only during RTL expansion.  */
>
> The comment is now wrong (and your bug shows it was wrong previously).
>

Here is the updated patch.  OK for master if there is no
regression on Linux/x86-64?

-- 
H.J.
From e5c618c9f951d6b9e2c534d4ace9225c6c991c75 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 6 Oct 2015 05:50:37 -0700
Subject: [PATCH] Merge ix86_maybe_switch_abi with ix86_set_current_function

ix86_maybe_switch_abi is called too late during RTL expansion and we
use the stale information from compilation of the previous function.
aggregate_value_p uses call_used_regs.  aggregate_value_p is used by
IPA and return value optimization, which are called before
ix86_maybe_switch_abi is called.  This patch merges ix86_maybe_switch_abi
with ix86_set_current_function.
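
For illustration, the staleness test used in the patch below can be
modelled in isolation.  This is a toy sketch, not the i386 backend:
si_call_used stands in for call_used_regs[SI_REG], and
toy_reinit_regs / toy_set_current_function are hypothetical names.  It
relies on %rsi being call-used under the SysV ABI and callee-saved
under the MS ABI:

```c
#include <assert.h>
#include <stdbool.h>

enum abi { SYSV_ABI, MS_ABI };

/* Toy model of the single register-table entry the patch inspects.  */
static bool si_call_used = true;   /* Table starts out in SysV state.  */
static int reinit_count;           /* How often we rebuilt the table.  */

static void
toy_reinit_regs (enum abi abi)
{
  si_call_used = (abi != MS_ABI);
  reinit_count++;
}

static void
toy_set_current_function (enum abi abi)
{
  assert (abi == SYSV_ABI || abi == MS_ABI);

  /* The table is stale exactly when "SI is call-used" and "function
     uses the MS ABI" have the same truth value.  */
  if (si_call_used == (abi == MS_ABI))
    toy_reinit_regs (abi);
}
```

Switching between two functions with the same ABI leaves the table
alone; only an actual ABI change triggers the expensive rebuild.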

	PR target/67850
	* config/i386/i386.c (ix86_maybe_switch_abi): Merged with ...
	(ix86_set_current_function): This.
	(TARGET_EXPAND_TO_RTL_HOOK): Removed.
---
 gcc/config/i386/i386.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 38953dd..c44f0af 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6367,6 +6367,14 @@ ix86_set_current_function (tree fndecl)
 	TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
 }
   ix86_previous_fndecl = fndecl;
+
+  /* 64-bit MS and SYSV ABI have different set of call used registers.
+ Avoid expensive re-initialization of init_regs each time we switch
+ function context.  */
+  if (TARGET_64BIT
+  && (call_used_regs[SI_REG]
+	  == (cfun->machine->call_abi == MS_ABI)))
+reinit_regs ();
 }
 
 
@@ -7502,17 +7510,6 @@ ix86_call_abi_override

[PATCH] remove dead code used by the old cloog scheduler

2015-10-06 Thread Sebastian Pop

2015-10-05  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (scop_get_transformed_schedule): Remove.
(no_violations): Remove.
(subtract_commutative_associative_deps): Remove.
(compute_deps): Do not call subtract_commutative_associative_deps.
(transform_is_safe): Remove.
(graphite_legal_transform): Remove.
* graphite-poly.h (graphite_legal_transform): Remove.
---
 gcc/graphite-dependences.c | 255 +
 gcc/graphite-poly.h|   3 -
 2 files changed, 1 insertion(+), 257 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index e39394a..2c4f92c 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -154,26 +154,6 @@ scop_get_original_schedule (scop_p scop, vec 
pbbs)
   return res;
 }
 
-/* Returns all the transformed schedules in SCOP.  */
-
-static isl_union_map *
-scop_get_transformed_schedule (scop_p scop, vec pbbs)
-{
-  int i;
-  poly_bb_p pbb;
-  isl_space *space = isl_set_get_space (scop->context);
-  isl_union_map *res = isl_union_map_empty (space);
-
-  FOR_EACH_VEC_ELT (pbbs, i, pbb)
-{
-  res = isl_union_map_add_map
-   (res, constrain_domain (isl_map_copy (pbb->transformed),
-   isl_set_copy (pbb->domain)));
-}
-
-  return res;
-}
-
 /* Helper function used on each MAP of a isl_union_map.  Computes the
maximal output dimension.  */
 
@@ -262,33 +242,6 @@ apply_schedule_on_deps (__isl_keep isl_union_map *schedule,
   return x;
 }
 
-/* Return true when SCHEDULE does not violate the data DEPS: that is
-   when the intersection of LEX with the DEPS transformed by SCHEDULE
-   is empty.  LEX is the relation in which the outputs occur before
-   the inputs.  */
-
-static bool
-no_violations (__isl_keep isl_union_map *schedule,
-  __isl_keep isl_union_map *deps)
-{
-  bool res;
-  isl_space *space;
-  isl_map *lex, *x;
-
-  if (isl_union_map_is_empty (deps))
-return true;
-
-  x = apply_schedule_on_deps (schedule, deps);
-  space = isl_map_get_space (x);
-  space = isl_space_range (space);
-  lex = isl_map_lex_ge (space);
-  x = isl_map_intersect (x, lex);
-  res = isl_map_is_empty (x);
-
-  isl_map_free (x);
-  return res;
-}
-
 /* Return true when DEPS is non empty and the intersection of LEX with
the DEPS transformed by SCHEDULE is non empty.  LEX is the relation
in which all the inputs before DEPTH occur at the same time as the
@@ -332,161 +285,6 @@ carries_deps (__isl_keep isl_union_map *schedule,
   return res;
 }
 
-/* Subtract from the RAW, WAR, and WAW dependences those relations
-   that have been marked as belonging to an associative commutative
-   reduction.  */
-
-static void
-subtract_commutative_associative_deps (scop_p scop,
-  vec pbbs,
-  isl_union_map *original,
-  isl_union_map **must_raw,
-  isl_union_map **may_raw,
-  isl_union_map **must_raw_no_source,
-  isl_union_map **may_raw_no_source,
-  isl_union_map **must_war,
-  isl_union_map **may_war,
-  isl_union_map **must_war_no_source,
-  isl_union_map **may_war_no_source,
-  isl_union_map **must_waw,
-  isl_union_map **may_waw,
-  isl_union_map **must_waw_no_source,
-  isl_union_map **may_waw_no_source)
-{
-  int i, j;
-  poly_bb_p pbb;
-  poly_dr_p pdr;
-  isl_space *space = isl_set_get_space (scop->context);
-
-  FOR_EACH_VEC_ELT (pbbs, i, pbb)
-if (PBB_IS_REDUCTION (pbb))
-  {
-   isl_union_map *r = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *must_w = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *may_w = isl_union_map_empty (isl_space_copy (space));
-   isl_union_map *all_w;
-   isl_union_map *empty;
-   isl_union_map *x_must_raw;
-   isl_union_map *x_may_raw;
-   isl_union_map *x_must_raw_no_source;
-   isl_union_map *x_may_raw_no_source;
-   isl_union_map *x_must_war;
-   isl_union_map *x_may_war;
-   isl_union_map *x_must_war_no_source;
-   isl_union_map *x_may_war_no_source;
-   isl_union_map *x_must_waw;
-   isl_union_map *x_may_waw;
-   isl_union_map *x_must_waw_no_source;
-   isl_union_map *x_may_waw_no_source;
-
-   FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
- if (pdr_read_p (pdr))
-   r = isl_union_map_add_map (r, add_pdr_constraints (pdr, pbb));

[Patch, fortran] COMMON block error recovery: PR 67758 (second pass)

2015-10-06 Thread Mikael Morin


Hello,

Dominique noticed that the test coming with the preceding PR67758 patch 
[1] was failing if compiled as free form.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00301.html

The problem is again an inconsistent state, but this time between the 
in_common attribute and the common_block pointer.

So, here is another iteration, hopefully fixing the remaining problems.
The changes are:
   - adding a symbol to a common block list in gfc_match_common is 
delayed after the call to gfc_add_in_common.
   - gfc_restore_last_undo_checkpoint is changed to check the 
common_block pointer directly instead of the in_common attribute.
Both of these changes fix the testcase independently, but with some 
regressions, so there is additionally:
   - restore_old_symbol is changed to also restore the 
common-related pointers.  This is done using a new function created to 
factor the related memory management.
   - In gfc_restore_last_undo_checkpoint, when a symbol has been 
removed from the common block linked list, its common_next pointer is 
cleared.
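
The factored-out memory management can be sketched on its own with toy
types.  The structs and the name set_common_block below are simplified
stand-ins, not the gfortran definitions; as in the patch, the helper
only drops the old reference, and taking a reference on the new block
is assumed to be the caller's job:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for gfc_common_head and gfc_symbol.  */
struct common_head { int refs; char name[16]; };
struct symbol { struct common_head *common_block; };

/* Point SYM at BLK, releasing SYM's reference on its previous block.  */
static void
set_common_block (struct symbol *sym, struct common_head *blk)
{
  assert (sym != NULL);

  if (sym->common_block == blk)
    return;

  /* Blocks with an empty name are not reference counted here.  */
  if (sym->common_block && sym->common_block->name[0] != '\0')
    {
      sym->common_block->refs--;
      if (sym->common_block->refs == 0)
        free (sym->common_block);
    }
  sym->common_block = blk;
}
```

Freeing a symbol then reduces to set_common_block (sym, NULL), and
restoring an old symbol state reuses the same path.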


Regression tested on x86_64-linux.  OK for trunk?

Mikael


2015-10-06  Mikael Morin  

	PR fortran/67758
	* gfortran.h (gfc_symbol): Expand comment.
	* match.c (gfc_match_common): Delay adding the symbol to
	the common_block after the gfc_add_in_common call.
	* symbol.c (gfc_free_symbol): Move common block memory handling...
	(set_symbol_common_block): ... here as a new function.
	(restore_old_symbol): Restore common block fields.
	(gfc_restore_last_undo_checkpoint):
	Check the common_block pointer instead of the in_common attribute.
	When a symbol has been removed from the common block linked list,
	clear its common_next pointer.

2015-10-06  Mikael Morin  

	PR fortran/67758
	* gfortran.dg/common_25.f90: New file.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 9c0084b..b2894cc 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1411,8 +1411,12 @@ typedef struct gfc_symbol
 
   struct gfc_symbol *common_next;	/* Links for COMMON syms */
 
-  /* This is in fact a gfc_common_head but it is only used for pointer
- comparisons to check if symbols are in the same common block.  */
+  /* This is only used for pointer comparisons to check if symbols
+ are in the same common block.
+ In opposition to common_block, the common_head pointer takes into account
+ equivalences: if A is in a common block C and A and B are in equivalence,
+ then both A and B have common_head pointing to C, while A's common_block
+ points to C and B's is NULL.  */
   struct gfc_common_head* common_head;
 
   /* Make sure setup code for dummy arguments is generated in the correct
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 29437c3..74f26b7 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -4365,16 +4365,6 @@ gfc_match_common (void)
 		goto cleanup;
 	}
 
-	  sym->common_block = t;
-	  sym->common_block->refs++;
-
-	  if (tail != NULL)
-	tail->common_next = sym;
-	  else
-	*head = sym;
-
-	  tail = sym;
-
 	  /* Deal with an optional array specification after the
 	 symbol name.  */
 	  m = gfc_match_array_spec (, true, true);
@@ -4409,6 +4399,16 @@ gfc_match_common (void)
 	 if any, and continue matching.  */
 	  gfc_add_in_common (>attr, sym->name, NULL);
 
+	  sym->common_block = t;
+	  sym->common_block->refs++;
+
+	  if (tail != NULL)
+	tail->common_next = sym;
+	  else
+	*head = sym;
+
+	  tail = sym;
+
 	  sym->common_head = t;
 
 	  /* Check to see if the symbol is already in an equivalence group.
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 35a3496..a9a0dc0 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -2585,6 +2585,25 @@ gfc_find_uop (const char *name, gfc_namespace *ns)
 }
 
 
+/* Update a symbol's common_block field, and take care of the associated
+   memory management.  */
+
+static void
+set_symbol_common_block (gfc_symbol *sym, gfc_common_head *common_block)
+{
+  if (sym->common_block == common_block)
+return;
+
+  if (sym->common_block && sym->common_block->name[0] != '\0')
+{
+  sym->common_block->refs--;
+  if (sym->common_block->refs == 0)
+	free (sym->common_block);
+}
+  sym->common_block = common_block;
+}
+
+
 /* Remove a gfc_symbol structure and everything it points to.  */
 
 void
@@ -2612,12 +2631,7 @@ gfc_free_symbol (gfc_symbol *sym)
 
   gfc_free_namespace (sym->f2k_derived);
 
-  if (sym->common_block && sym->common_block->name[0] != '\0')
-{ 
-  sym->common_block->refs--; 
-  if (sym->common_block->refs == 0)
-	free (sym->common_block);
-}
+  set_symbol_common_block (sym, NULL);
 
   free (sym);
 }
@@ -3090,6 +3104,9 @@ restore_old_symbol (gfc_symbol *p)
   p->formal = old->formal;
 }
 
+  set_symbol_common_block (p, old->common_block);
+  p->common_head = old->common_head;
+
   p->old_symbol = old->old_symbol;
   free

Re: [google][gcc-4_9] encode and compress cc1 option strings in gcov_module_info

2015-10-06 Thread Rong Xu

It's 1:3 to 1:4 in the programs I tested. But it really depends on how
the options are used. I think your idea of using combined strlen works
better.
It just makes the code a little clumsy, but it does not cause any
performance issue.
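
For reference, a rough sketch of the encoding discussed below, with a
single '[' opening each option kind and ']' closing each string.  This
is not the code from the patch: encode_cc1_strings, NUM_KINDS and the
0-based kind index are assumptions made for the example.  It appends
through a cursor, which avoids passing the destination buffer to
sprintf as one of its own sources (overlapping copies are undefined in
C):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_KINDS 6

/* Encode STRINGS as "[<flag>s1]s2]...", one group per option kind.
   NUM[j] is the number of strings of kind j; the strings of all kinds
   are stored consecutively.  The caller frees the result.  */
static char *
encode_cc1_strings (const unsigned num[NUM_KINDS], char *const *strings)
{
  static const char flags[NUM_KINDS] = { 'Q', 'B', 'S', 'D', 'I', 'C' };
  size_t len = 0;
  unsigned k = 0;

  /* First pass: compute the total length.  */
  for (unsigned j = 0; j < NUM_KINDS; j++)
    {
      if (num[j] == 0)
        continue;
      len += 2;                             /* '[' plus the flag.  */
      for (unsigned end = k + num[j]; k < end; k++)
        len += strlen (strings[k]) + 1;     /* 1 for the ']' delimiter.  */
    }

  char *out = malloc (len + 1);
  assert (out != NULL);

  /* Second pass: append through a cursor.  */
  char *p = out;
  k = 0;
  for (unsigned j = 0; j < NUM_KINDS; j++)
    {
      if (num[j] == 0)
        continue;
      *p++ = '[';
      *p++ = flags[j];
      for (unsigned end = k + num[j]; k < end; k++)
        {
          size_t n = strlen (strings[k]);
          memcpy (p, strings[k], n);
          p += n;
          *p++ = ']';
        }
    }
  *p = '\0';
  return out;
}
```

Because ']' rather than a space ends each string, an option such as
"-DFOO=a b" survives the round trip unchanged.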

On Tue, Oct 6, 2015 at 10:21 AM, Xinliang David Li  wrote:
> On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu  wrote:
>> On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li  wrote:
>>>unsigned ggc_memory = gcov_read_unsigned ();
>>> +  unsigned marker = 0, len = 0, k;
>>> +  char **string_array, *saved_cc1_strings;
>>> +
>>>for (unsigned j = 0; j < 7; j++)
>>>
>>>
>>> Do not use hard coded number. Use the enum defined in coverage.c.
>>
>> OK.
>>
>>>
>>>
>>> +string_array[j] = xstrdup (gcov_read_string ());
>>> +
>>> +  k = 0;
>>> +  for (unsigned j = 1; j < 7; j++)
>>>
>>> Do not use hard coded number.
>>
>> OK.
>>
>>>
>>>
>>> +{
>>> +  if (num_array[j] == 0)
>>> +continue;
>>> +  marker += num_array[j];
>>>
>>> It would read better if the variable 'marker' were renamed to
>>> 'j_end' or something similar.
>>>
>>> For all the substrings of 'j' kind, there should be just one marker,
>>> right? It looks like here you introduce one marker per string, not one
>>> marker per string kind.
>>
>> I don't understand what you meant here. "marker" is fixed for each j
>> substring (one option kind) -- it is the end index of the sub-string
>> array. k-loop is for each string.
>>
>
> That was a wrong comment from me. Discard it.
>
>>>
>>> +  len += 3; /* [[  */
>>>
>>> Same here for hard coded value.
>>>
>>> +  for (; k < marker; k++)
>>> +len += strlen (string_array[k]) + 1; /* 1 for delimiter of ']'  */
>>>
>>> Why do we need one ']' per string?
>>
>> This is because the option strings can contain spaces (' '). I cannot
>> use a space as the delimiter, nor \0, as it marks the end of the
>> encoded string.
>
> Ok -- this allows you to avoid string copy during parsing.
>>
>>>
>>>
>>> +}
>>> +  saved_cc1_strings = (char *) xmalloc (len + 1);
>>> +  saved_cc1_strings[0] = 0;
>>> +
>>> +  marker = 0;
>>> +  k = 0;
>>> +  for (unsigned j = 1; j < 7; j++)
>>>
>>> Same here for 7.
>>
>> will fix in the new patch.
>>
>>>
>>> +{
>>> +  static const char lipo_string_flags[6] = {'Q', 'B', 'S',
>>> 'D','I', 'C'};
>>> +  if (num_array[j] == 0)
>>> +continue;
>>> +  marker += num_array[j];
>>>
>>> Suggest changing marker to j_end
>> OK.
>>
>>>
>>> +  sprintf (saved_cc1_strings, "%s[[%c", saved_cc1_strings,
>>> +   lipo_string_flags[j - 1]);
>>> +  for (; k < marker; k++)
>>> +{
>>> +  sprintf (saved_cc1_strings, "%s%s]", saved_cc1_strings,
>>> +   string_array[k]);
>>>
>>> +#define DELIMTER"[["
>>>
>>> Why double '[' ?
>> I will change to single '['.
>>
>>>
>>> +#define DELIMTER2   "]"
>>> +#define QUOTE_PATH_FLAG 'Q'
>>> +#define BRACKET_PATH_FLAG   'B'
>>> +#define SYSTEM_PATH_FLAG'S'
>>> +#define D_U_OPTION_FLAG 'D'
>>> +#define INCLUDE_OPTION_FLAG 'I'
>>> +#define COMMAND_ARG_FLAG'C'
>>> +
>>> +enum lipo_cc1_string_kind {
>>> +  k_quote_paths = 0,
>>> +  k_bracket_paths,
>>> +  k_system_paths,
>>> +  k_cpp_defines,
>>> +  k_cpp_includes,
>>> +  k_lipo_cl_args,
>>> +  num_lipo_cc1_string_kind
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string {
>>> +  const char* source_filename;
>>> +  unsigned num[num_lipo_cc1_string_kind];
>>> +  char **strings[num_lipo_cc1_string_kind];
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string *
>>> +lipo_parse_saved_cc1_string (const char *src, char *str,
>>> +bool parse_cl_args_only);
>>> +void free_parsed_string (struct lipo_parsed_cc1_string *string);
>>> +
>>>
>>> Declare above in a header file.
>>
>> OK.
>>
>>>
>>>
>>>  /* Returns true if the command-line arguments stored in the given module-infos
>>> are incompatible.  */
>>>  bool
>>> -incompatible_cl_args (struct gcov_module_info* mod_info1,
>>> -  struct gcov_module_info* mod_info2)
>>> +incompatible_cl_args (struct lipo_parsed_cc1_string* mod_info1,
>>> +  struct lipo_parsed_cc1_string* mod_info2)
>>>
>>> Fix formatting.
>> OK.
>>>
>>>  {
>>>  {
>>> @@ -1647,7 +1679,7 @@ build_var (tree fn_decl, tree type, int counter)
>>>  /* Creates the gcov_fn_info RECORD_TYPE.  */
>>>
>>>NULL_TREE, get_gcov_unsigned_t ());
>>>DECL_CHAIN (field) = fields;
>>>fields = field;
>>>
>>> -  /* Num bracket paths  */
>>> +  /* cc1_uncompressed_strlen field */
>>>field = build_decl (BUILTINS_LOCATION, FIELD_DECL,
>>>NULL_TREE, get_gcov_unsigned_t ());
>>>DECL_CHAIN (field) = fields;
>>>fields = field;
>>>
>>>
>>> Why do we need to store uncompressed string length? If there is need

Re: [C++ PATCH] Build COND_EXPRs with location (PR c++/67863)

2015-10-06 Thread Marek Polacek

On Tue, Oct 06, 2015 at 11:46:33AM -0600, Jeff Law wrote:
> >2015-10-06  Marek Polacek  
> >
> > PR c++/67863
> > * call.c (build_conditional_expr_1): Build the COND_EXPR with
> > a location.
> >
> > * c-c++-common/Wtautological-compare-4.c: New test.
> OK.
 
Committed.

> Related, I'll drop my proposed patch to builtins.c that worked around this
> problem.

Yeah, thanks.

Marek

[PATCH v3, i386]: Enable -mstackrealign and 'force_align_arg_pointer' attribute for x86_64

2015-10-06 Thread Uros Bizjak

On Mon, Oct 5, 2015 at 7:33 PM, Uros Bizjak  wrote:

>> As shown in PR 66697 [1] and WineHQ bug [2], an application can
>> misalign incoming stack to less than ABI mandated 16 bytes. While it
>> is possible to use -mincoming-stack-boundary=2  (= 4 bytes) for 32 bit
>> targets to emit stack realignment code, this option is artificially
>> limited to 4 (= 16  bytes) for 64bit targets.
>
> Attached v2 patch goes all the way to enable -mstackrealign and
> 'force_align_arg_pointer' attribute for x86_64. In addition to
> -mincoming-stack-boundary changes, the patch changes
> MIN_STACK_BOUNDARY definition to 8 bytes on 64-bit targets, as this is
> really the minimum supported stack boundary.
>
> This patch is also needed to allow stack realignment in the interrupt handler.

V3 adds support for unaligned SSE moves outside stack realignment area.
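
As a quick illustration of the numbers above: the argument of
-mincoming-stack-boundary is a log2 value, so 2 means 4 bytes, 3 means
8 bytes and 4 means 16 bytes.  The sketch below models the range check
as it looks after this patch (minimum 3 in 64-bit mode, 2 otherwise,
maximum 12); boundary_bytes and incoming_boundary_ok are hypothetical
helper names, not functions in i386.c:

```c
#include <assert.h>
#include <stdbool.h>

/* -mincoming-stack-boundary=N requests a boundary of 2**N bytes.  */
static unsigned
boundary_bytes (int n)
{
  assert (n >= 0 && n < 31);
  return 1u << n;
}

/* Range check modelled on the patched option handling: the minimum is
   3 (8 bytes) in 64-bit mode and 2 (4 bytes) otherwise; the maximum
   stays at 12.  */
static bool
incoming_boundary_ok (int n, bool target_64bit)
{
  int min = target_64bit ? 3 : 2;
  return n >= min && n <= 12;
}
```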

2015-10-06  Uros Bizjak  

PR target/66697
* config/i386/i386.c (ix86_option_override_internal): Always use
8-byte minimum stack boundary in 64-bit mode.
(ix86_compute_frame_layout): Remove assert on INCOMING_STACK_BOUNDARY.
(ix86_emit_save_reg_using_mov): Support unaligned SSE store.
Add a REG_CFA_EXPRESSION note if needed.
(ix86_emit_restore_sse_regs_using_mov): Support unaligned SSE load.
(ix86_handle_force_align_arg_pointer_attribute): New.
(ix86_minimum_incoming_stack_boundary): Remove TARGET_64BIT check.
(ix86_attribute_table): Set ix86_force_align_arg_pointer_string
with ix86_handle_force_align_arg_pointer_attribute.
* config/i386/i386.h (MIN_STACK_BOUNDARY): Set to BITS_PER_WORD.

testsuite/ChangeLog:

2015-10-06  Uros Bizjak  

PR target/66697
* gcc.target/i386/20060512-1.c: Remove ia32 requirement.
(PUSH, POP): New defines.
(sse2_test): Use PUSH and POP to misalign runtime stack.
* gcc.target/i386/20060512-2.c: Remove ia32 requirement.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

An earlier version of this patch was also used to compile 64bit Wine,
where it reportedly fixes all unaligned stack failures for x86_64
target. Please see WineHQ bugreport.

I'll wait for a reconfirmation that this slightly changed patch also
works OK for Wine people, before it is committed to mainline and later
backported to gcc-5 branch.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 228530)
+++ config/i386/i386.c  (working copy)
@@ -5209,8 +5209,7 @@ ix86_option_override_internal (bool main_args_p,
   ix86_incoming_stack_boundary = ix86_default_incoming_stack_boundary;
   if (opts_set->x_ix86_incoming_stack_boundary_arg)
 {
-  int min = (TARGET_64BIT_P (opts->x_ix86_isa_flags)
-? (TARGET_SSE_P (opts->x_ix86_isa_flags) ? 4 : 3) : 2);
+  int min = TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 3 : 2;

   if (opts->x_ix86_incoming_stack_boundary_arg < min
  || opts->x_ix86_incoming_stack_boundary_arg > 12)
@@ -11391,7 +11390,6 @@ ix86_compute_frame_layout (struct ix86_frame *fram
   /* The only ABI that has saved SSE registers (Win64) also has a
  16-byte aligned default stack, and thus we don't need to be
 within the re-aligned local stack frame to save them.  */
-  gcc_assert (INCOMING_STACK_BOUNDARY >= 128);
   offset = ROUND_UP (offset, 16);
   offset += frame->nsseregs * 16;
 }
@@ -11616,14 +11614,26 @@ ix86_emit_save_reg_using_mov (machine_mode mode, u
   struct machine_function *m = cfun->machine;
   rtx reg = gen_rtx_REG (mode, regno);
   rtx mem, addr, base, insn;
+  unsigned int align;

   addr = choose_baseaddr (cfa_offset);
   mem = gen_frame_mem (mode, addr);

-  /* For SSE saves, we need to indicate the 128-bit alignment.  */
-  set_mem_align (mem, GET_MODE_ALIGNMENT (mode));
+  /* The location is aligned up to INCOMING_STACK_BOUNDARY.  */
+  align = MIN (GET_MODE_ALIGNMENT (mode), INCOMING_STACK_BOUNDARY);
+  set_mem_align (mem, align);

-  insn = emit_move_insn (mem, reg);
+  /* SSE saves are not within re-aligned local stack frame.
+ In case INCOMING_STACK_BOUNDARY is misaligned, we have
+ to emit unaligned store.  */
+  if (mode == V4SFmode && align < 128)
+{
+  rtx unspec = gen_rtx_UNSPEC (mode, gen_rtvec (1, reg), UNSPEC_STOREU);
+  insn = emit_insn (gen_rtx_SET (mem, unspec));
+}
+  else
+insn = emit_insn (gen_rtx_SET (mem, reg));
+
   RTX_FRAME_RELATED_P (insn) = 1;

   base = addr;
@@ -11670,6 +11680,8 @@ ix86_emit_save_reg_using_mov (machine_mode mode, u
   mem = gen_rtx_MEM (mode, addr);
   add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (mem, reg));
 }
+  else
+add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
 }

 /* Emit code to save registers using MOV insns.
@@ -11886,6 +11898,25 @@ find_drap_reg (void)
 }
 }

+/* Handle a "force_align_arg_pointer" attribute.  */
+
+static tree

Re: [C++ PATCH] Build COND_EXPRs with location (PR c++/67863)

2015-10-06 Thread Jeff Law


On 10/06/2015 11:29 AM, Marek Polacek wrote:

I've been chasing a bogus -Wtautological-compare warning that only
occurred in the C++ FE.  Turned out that we were building COND_EXPRs
without a location; that means that this code in warn_tautological_cmp
didn't work as planned:

   /* Don't warn for various macro expansions.  */
   if (from_macro_expansion_at (loc)
   || from_macro_expansion_at (EXPR_LOCATION (lhs))
   || from_macro_expansion_at (EXPR_LOCATION (rhs)))
 return;

If we set the location properly, we're able to detect that either LHS
or RHS comes from a macro expansion.
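
A small illustration of the kind of code involved (a sketch only, not
necessarily the testcase added by the patch; flag, SIZE_A, SIZE_B and
the function names are made up).  Both operands of the comparison
expand from macros to the same conditional expression, which is the
situation the macro-expansion check is meant to recognize:

```c
#include <assert.h>

static int flag;

/* Two macros that happen to expand to the same conditional.  */
#define SIZE_A (flag ? 4 : 8)
#define SIZE_B (flag ? 4 : 8)

static void
set_flag (int value)
{
  assert (value == 0 || value == 1);
  flag = value;
}

/* After preprocessing, both sides of <= are identical expressions, but
   each one originates from a macro expansion.  */
static int
sizes_ordered (void)
{
  return SIZE_A <= SIZE_B;
}
```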

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-06  Marek Polacek  

PR c++/67863
* call.c (build_conditional_expr_1): Build the COND_EXPR with
a location.

* c-c++-common/Wtautological-compare-4.c: New test.

OK.

Related, I'll drop my proposed patch to builtins.c that worked around 
this problem.


Jeff

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote:

> Does it pass all tests under g++.dg/torture/stackalign?  You need
> to implement -mstackrealign and -mpreferred-stack-boundary=
> as well as update check_effective_target_automatic_stack_alignment
> to run all stack alignment tests.

No, those tests were not run.  I have not implemented the -mstackrealign
or -mpreferred-stack-boundary options.  I do not think those are the
options I want for stack realignment on MIPS.  The only reason
for implementing stack realignment on MIPS is to align MSA (vector)
register spills.  Yes, stack realignment can also be used with aligned
local variables but those could also be handled with the existing alloca
method without implementing stack realignment.

So I want an option (turned on by default when using -mmsa) that only
aligns a function if that function contains MSA register usage.  I don't
want an option to align every function because that would slow things
down for no reason.  I should and will try these test cases with my
option.  Right now my testing forces all functions to be aligned even if
they don't use MSA because that gives me maximum testing, but the final
code will only align functions that use MSA.

Steve Ellcey
sell...@imgtec.com

Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-06 Thread Jan Hubicka

> 
> The patch works for me but I'm not sure about the store_expr_with_bounds
> change.  Where does the actual copying take place?  adjust_address_nv
> just adjusts the mode ...

Yep, the mode was supposed to happen in the later path where we emit block
moves, but I missed an else there.  I will update the patch.  Thanks for
catching this.

Honza
> 
> Index: gcc/expr.c
> ===
> --- gcc/expr.c  (revision 228514)
> +++ gcc/expr.c  (working copy)
> @@ -5462,7 +5462,7 @@ store_expr_with_bounds (tree exp, rtx ta
>  {
>if (GET_MODE (temp) != GET_MODE (target) && GET_MODE (temp) != VOIDmode)
> {
> - if (GET_MODE (target) == BLKmode)
> + if (GET_MODE (target) == BLKmode || MEM_P (target))
> {
>   /* Handle calls that return BLKmode values in registers.  */
>   if (REG_P (temp) && TREE_CODE (exp) == CALL_EXPR)
> 
> works for me as well (for the testcase that ICEd for you) and it
> definitely will emit the copy.  Basically if it is a MEM the mode
> should be irrelevant (do we have BLKmode things that are _not_ MEMs?).
> 
> Anyway, your expr.c hunk looks wrong (no copy) and I'll let somebody
> more familiar with that area do the sort-out.  Unfortunately
> that ICEing testcase isn't a runtime one ;)
> 
> Richard.

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 17:39 +0200, Bernd Schmidt wrote:

> 
> Did you tag that copy as RTX_FRAME_RELATED? I'd expect dwarf2cfi would 
> ignore instructions with that bit unset. There's even a comment 
> indicating arm does something like this:

Yes, I had marked it as RTX_FRAME_RELATED.  When I took that out things
looked better (well, maybe just different).

If I remove that and I change Rule #16 to handle an AND with a register
I get odd looking .cfi stuff.  The AND instruction (which is marked with
RTX_FRAME_RELATED) seems to generate these cfi_escape macros:

.cfi_escape 0x10,0x1f,0x2,0x8e,0x7c
.cfi_escape 0x10,0x1e,0x2,0x8e,0x78
.cfi_escape 0x10,0xc,0x2,0x8e,0x74

which are meaningless to me.  What I found works better is to skip the
dwarf2cfi.c changes and explicitly attach this note to the AND
instruction:

  cfi_note = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, NULL_RTX);
  REG_NOTES (insn) = cfi_note;

When I do this the only unexpected test suite execution failures I see
are ones that call the C++ set_unexpected function.

FYI: The reason I was copying the stack pointer to another register was
to use that other register as my argument pointer (needed after the
stack pointer got aligned) and to use it to restore the stack pointer at
the end of the function for normal returns.
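
In plain C terms, the scheme amounts to something like the following
sketch, assuming a downward-growing stack and a power-of-two
alignment.  struct frame, align_down and enter_frame are illustrative
names only and do not correspond to anything in the MIPS backend:

```c
#include <assert.h>
#include <stdint.h>

/* Round ADDR down to a multiple of ALIGN, which must be a power of
   two.  This is what an AND on the stack pointer achieves.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return addr & ~(align - 1);
}

struct frame
{
  uintptr_t saved_sp;   /* Copy of the incoming stack pointer.  */
  uintptr_t sp;         /* Realigned stack pointer.  */
};

/* Model of the prologue: keep a copy of the incoming stack pointer for
   argument access and for the restore in the epilogue, then realign.  */
static struct frame
enter_frame (uintptr_t incoming_sp, uintptr_t align)
{
  struct frame f;
  f.saved_sp = incoming_sp;
  f.sp = align_down (incoming_sp, align);
  return f;
}
```

Once the AND has been applied, the original value cannot be recovered
from the aligned one, which is why the copy is needed.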

Steve Ellcey
sell...@imgtec.com

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread H.J. Lu

On Tue, Oct 6, 2015 at 11:32 AM, Steve Ellcey  wrote:
> On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote:
>
>> Does it pass all tests under g++.dg/torture/stackalign?  You need
>> to implement -mstackrealign and -mpreferred-stack-boundary=
>> as well as update check_effective_target_automatic_stack_alignment
>> to run all stack alignment tests.
>
> No, those tests were not run.  I have not implemented the -mstackrealign
> or -mpreferred-stack-boundary options.  I do not think those are the
> options I want for stack realignment on MIPS.  The only reason
> for implementing stack realignment on MIPS is to align MSA (vector)
> register spills.  Yes, stack realignment can also be used with aligned
> local variables but those could also be handled with the existing alloca
> method without implementing stack realignment.
>

We added tests to cover all issues we found in dynamic stack alignment.
It is the minimum requirement of dynamic stack alignment implementation
to pass all those tests.  If you don't run those tests, you don't know what
you missed.  Please let me know if there are any additional issues beyond
those tests.

-- 
H.J.

RE: [PATCH 1/2] [Refactoring graphite] Move declarations, assign types, renaming.

2015-10-06 Thread Sebastian Paul Pop

Thanks for the quick fix.
Sorry for breaking bootstrap.


-----Original Message-----
From: H.J. Lu [mailto:hjl.to...@gmail.com] 
Sent: Tuesday, October 06, 2015 11:43 AM
To: Aditya Kumar
Cc: GCC Patches; Tobias Grosser; Richard Biener; aditya...@samsung.com; 
Sebastian Pop; Sebastian Pop
Subject: Re: [PATCH 1/2] [Refactoring graphite] Move declarations, assign 
types, renaming.

On Tue, Oct 6, 2015 at 9:37 AM, H.J. Lu  wrote:
> On Tue, Oct 6, 2015 at 9:34 AM, H.J. Lu  wrote:
>> On Mon, Oct 5, 2015 at 7:53 PM, Aditya Kumar  wrote:
>>> 1. Move declarations near the assignment/usage.
>>> 2. Assign type to members which were void*.
>>> 3. Rename scop->context to scop::param_context, and scop::ctx to
>>> scop::isl_context
>>>
>>> No functional changes intended. Passes regtest and bootstrap.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2015-10-05  Aditya Kumar  
>>>
>>> * graphite-dependences.c (scop_get_reads): Renamed scop->context to 
>>> scop->param_context.
>>> (scop_get_must_writes): Same.
>>> (scop_get_may_writes): Same.
>>> (scop_get_original_schedule): Same.
>>> (scop_get_transformed_schedule): Same.
>>> (subtract_commutative_associative_deps): Same.
>>> * graphite-isl-ast-to-gimple.c (add_parameters_to_ivs_params): Same.
>>> (generate_isl_context): Same.
>>> (generate_isl_schedule): Same.
>>> (scop_to_isl_ast): Same.
>>> (graphite_regenerate_ast_isl): Same.
>>> * graphite-optimize-isl.c (scop_get_domains): Same.
>>> (optimize_isl): Renamed scop->context to scop->param_context.
>>> * graphite-poly.c (new_poly_bb): Change the type of argument to 
>>> gimple_poly_bb_p.
>>> (new_scop): Renamed scop->context to scop->param_context.
>>> (free_scop): Same.
>>> (print_scop_context): Same.
>>> * graphite-poly.h (new_poly_dr): Change the type of argument from 
>>> void* to data_reference_p.
>>> (struct poly_bb): Change the type of black_box to gimple_poly_bb_p.
>>> (new_poly_bb): Change the type of argument from void* to 
>>> gimple_poly_bb_p.
>>> (pbb_set_black_box): Same.
>>> (struct scop): Rename context to param_context, ctx to isl_context.
>>> * graphite-scop-detection.c (scop_detection::build_scop_bbs_1): 
>>> Move declarations closer to assignment.
>>> (find_params_in_bb): Same.
>>> (find_scop_parameters): Same.
>>> * graphite-sese-to-poly.c (unsigned ssa_name_version_typesize): 
>>> Global to be used for statement IDs.
>>> (isl_id_for_pbb): Use ssa_name_version_typesize.
>>> (simple_copy_phi_p): Move declarations closer to assignment.
>>> (build_pbb_scattering_polyhedrons): Same.
>>> (build_scop_scattering): Same.
>>> (isl_id_for_ssa_name): Same.
>>> (extract_affine_name): Same.
>>> (extract_affine_int): Same.
>>> (extract_affine): Same.
>>> (set_scop_parameter_dim): Use renamed member.
>>> (build_loop_iteration_domains): Same.
>>> (add_param_constraints): Same.
>>> (build_scop_iteration_domain): Same.
>>> (pdr_add_data_dimensions): Same.
>>> (build_poly_dr): Same.
>>> (build_scop_drs): Move declarations closer to assignment.
>>> (analyze_drs_in_stmts): Same.
>>> (insert_out_of_ssa_copy): Same.
>>> (insert_out_of_ssa_copy_on_edge): Same.
>>> (propagate_expr_outside_region): Same.
>>> (rewrite_phi_out_of_ssa): Same.
>>> (rewrite_degenerate_phi): Same.
>>> (rewrite_reductions_out_of_ssa): Same.
>>> (rewrite_cross_bb_scalar_dependence): Same.
>>> (handle_scalar_deps_crossing_scop_limits): Same.
>>> (rewrite_cross_bb_scalar_deps): Same.
>>> * graphite.c (graphite_transform_loops): Use renamed member.
>>>
>>>
>>
>> It breaks bootstrap with isl-0.14:
>>
>> /export/gnu/import/git/sources/gcc/gcc/graphite-optimize-isl.c: In
>> function ‘bool optimize_isl(scop_p)’:
>> /export/gnu/import/git/sources/gcc/gcc/graphite-optimize-isl.c:333:40:
>> error: ‘struct scop’ has no member named ‘ctx’
>>isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
>> ^
>> Makefile:1077: recipe for target 'graphite-optimize-isl.o' failed
>> make[3]: *** [graphite-optimize-isl.o] Error 1
>> make[3]: *** Waiting for unfinished jobs
>>
>>
>
> I am testing this patch.
>
> --
> H.J.
> ---
> diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
> index 3fe3133..2bae417 100644
> --- a/gcc/graphite-optimize-isl.c
> +++ b/gcc/graphite-optimize-isl.c
> @@ -330,7 +330,7 @@ optimize_isl (scop_p scop)
>/* ISL-0.15 or later.  */
>isl_options_set_schedule_serialize_sccs (scop->isl_context, 1);
>  #else
> -  isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
> +

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Andrew MacLeod


On 10/06/2015 10:56 AM, Bernd Schmidt wrote:
> On 10/06/2015 04:04 PM, Andrew MacLeod wrote:
>> I primarily submitted it early because you wanted to look at the tools
>> before the code patch, which is the one I care about since the longer it
>> goes, the more effort it is to update the patch to mainline.
>
> The problem is that the generated patch is impossible to review on its
> own. It's just a half a megabyte dump of changes that can't
> realistically be verified for correctness. Reading it can throw up
> some interesting questions which can then (hopefully) be answered by
> reference to the tools, such as "why does timevar.h move?" For that to
> work, the tools need at least to have a minimum level of readability.
> They are the important part here, not the generated patch. (Unless you
> find a reviewer who's less risk-averse than me and is willing to
> approve the whole set and hope for the best.)


I don't get your fear.  I could have created that patch by hand; it would
just take a long time, and would likely be less complete, but just as large.

I'm not changing functionality.  ALL the tool is doing is removing
header files which aren't needed to compile.  It goes to great pains to
make sure it doesn't remove a silent dependency that conditional
compilation might introduce.  Other than that, the sanity check is that
everything compiles on every target and regression tests show nothing.
Since we're doing this with just include files, and not changing
functionality, I'm not sure what your primary concern is.  You are
unlikely to ever be able to read the patch and decide for yourself
whether removing expr.h from the header list is correct or not.  Much
like if I proposed the same thing by hand.

Yes, I added in the other tool, which reorders the headers and removes
duplicates, and perhaps that is what is causing you the angst.  The
canonical ordering was developed by taking current practice and adding
in other core files which had ordering issues that showed up during the
reduction process.  Reordering all files to this order should actually
resolve more issues than it causes.  I can generate and provide that as
a patch if you want to look at it separately...  I don't know what that
buys you.  You could match the includes to the master list to make sure
the tool did its job by itself, I guess.

The tools are unlikely to ever be used again... Jeff suggested I provide
them to contrib so that, just in case someone decided to do something
with them someday, they wouldn't be lost, or at least they wouldn't have
to track me down to get them.

If we discover that one or more of the tools does continue to have some
life, well then maybe at that point it's worth putting some time into
refining it a bit better.



> I suspect you'll have to regenerate the includes patch anyway, because
> of the missing #undef tracking I mentioned.


I don't see that #undef is relevant at all.  All the conditional
dependencies care about is "MAY DEFINE".  It's conservative in that if
something could be defined, we'll assume it is and not remove any file
which may depend on it.  To undefine something in a MAY DEFINE world
doesn't mean anything.
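
As a rough illustration of the "MAY DEFINE" idea (a sketch only, not the
code in the actual tool): each header is summarized by the macros it
could define on any path and the macros it tests, and a header is kept
whenever anything that remains tests one of its possible macros.  An
#undef never removes a name from the may-define set, which is why it does
not enter into the decision.  All names below are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Per-file summary: macros the file may define on any conditional path
// (ignoring any #undef), and macros it tests in #if / #ifdef.
struct header_info
{
  std::set<std::string> may_define;
  std::set<std::string> tests;
};

using info_map = std::map<std::string, header_info>;

// Return true only if no file in REMAINING tests a macro that CANDIDATE
// may define.  Unknown files are treated conservatively.
bool
removal_is_safe (const std::string &candidate,
                 const std::vector<std::string> &remaining,
                 const info_map &info)
{
  auto cand = info.find (candidate);
  if (cand == info.end ())
    return false;

  for (const std::string &file : remaining)
    {
      auto it = info.find (file);
      if (it == info.end ())
        return false;
      for (const std::string &macro : it->second.tests)
        if (cand->second.may_define.count (macro) != 0)
          return false;
    }
  return true;
}

// Small made-up data set for experimenting with the check.
info_map
example_info ()
{
  info_map info;
  info["a.h"].may_define.insert ("HAVE_FOO");
  info["b.c"].tests.insert ("HAVE_FOO");
  info["c.c"].tests.insert ("OTHER");
  return info;
}
```

With this data, a.h must stay if b.c is still in play, because b.c tests
HAVE_FOO, but it could be dropped when only c.c remains.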






> Let's consider the timevar.h example a bit more. Does the include have
> to move? I don't see anything in that file that looks like a
> dependency, and include files that need it are already including it.
> Is the fact that df.h includes it in any way material for generating
> an order of headers? IMO, no, it's an unnecessary change indicating a
> bug in the script, and any kind of unnecessary change in a patch like
> this makes it so much harder to verify. I think the canonical order
> that's produced should probably ignore files included from other
> headers so that these are left alone in their original order.


I covered this in the last note.  Pretty much every file is going to
have a "core" of up to 95 files reordered into the canonical form, which
taken as a snapshot of any given file, may look arbitrary but is in fact
a specific subset of the canonical ordering.  You can't order only some
parts of it because there are subtle dependencies between the files
which force you to look at them all.  Trust me, I didn't start by
reordering all of them this way... it developed over time.



> I'd still like more explanations of special cases in the tools like
> the diagnostic.h area as well as
>
> # seed tm.h with options.h since its a build file and won't be seen.
>
> and I think we need to understand what makes them special in a way
> that makes the rest of the algorithm not handle them correctly (so
> that we don't overlook any other such cases).


See the other note; it's because of the front end files/diagnostic
dependencies or irreconcilable cycles because of what a header
includes. Any other case would have shown up the way those did
during development.


Andrew

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Andrew MacLeod


On 10/06/2015 08:02 AM, Bernd Schmidt wrote:



> This sounds like the intention is to move recognized core files (I
> assume these are the ones in the "order" array in the tool) to the
> start, and leave everything else alone? I was a bit confused about this
> at first; I see for example "timevar.h" moving around without being
> present in the list, but it looks like it gets added implicitly
> through being included by df.h. (Incidentally, this looks like another
> case like the obstack one - a file using timevars should include
> timevar.h IMO, even if it also includes df.h).




Ordering the includes is perhaps more complex than you realize. It is more
complex than I realized when I first started it. It took a long and very
frustrating period to get it working properly.


There are implicit dependencies between some include files.  The primary
ordering list is to provide a canonical order for key files so that
those dependencies are automatically taken care of.  Until now we've
managed it by hand.  The problem is that the dependencies do not
necessarily always come from the main header file; they may come from one
of the headers that were included in it. There are lots of dependencies on
symtab.h for instance, which comes from tree.h.  Some other source files
don't need tree.h, but they do need symtab.h.  If symtab.h isn't in the
ordering list and the header which uses it is (like cgraph.h), the tool
would move cgraph.h above symtab.h and the result doesn't work.


The solution is to take that initial canonical list, and fully expand it
to include everything that those headers include. This gives a linear
canonical list of close to 100 files.  It means things like timevar.h
(which is included by df.h) are in this "ordering":

<...>
regset.h
alloc-pool.h
timevar.h
df.h
tm_p.h
gimple-iterator.h
<...>

A source file which does not include df.h but includes timevar.h must
keep it in this same relative ordering, or some other header from the
ordering list which uses timevar.h may no longer compile. (timevar.h
would end up after everything in the canonical list instead of in front
of the other file.)


This means that any of those 100 header files which occur in a source
file should occur in this order.  The original version of the tool tried
to spell out this exact order, but I realized that was not maintainable
as headers change, and it was actually far simpler to specify the core
ones in the tool, and let it do the expansion based on what is in the
current tree.
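
The expansion step described above can be pictured as a depth-first walk
over the include graph that emits each header after everything it
includes.  The following is a minimal sketch under that assumption, not
the tool's actual code; the names and the include map are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a header to the headers it includes directly, in order.
using include_map = std::map<std::string, std::vector<std::string>>;

// Emit everything HEADER includes (recursively) before HEADER itself.
// SEEN guards against duplicates and include cycles.
static void
expand_one (const std::string &header, const include_map &includes,
            std::set<std::string> &seen, std::vector<std::string> &out)
{
  if (!seen.insert (header).second)
    return;
  auto it = includes.find (header);
  if (it != includes.end ())
    for (const std::string &dep : it->second)
      expand_one (dep, includes, seen, out);
  out.push_back (header);
}

// Expand the short hand-written CORE list into a full linear order.
std::vector<std::string>
expand_canonical (const std::vector<std::string> &core,
                  const include_map &includes)
{
  std::set<std::string> seen;
  std::vector<std::string> out;
  for (const std::string &header : core)
    expand_one (header, includes, seen, out);
  return out;
}
```

For example, if the core list is df.h followed by tm_p.h and df.h is
recorded as including regset.h, alloc-pool.h and timevar.h, the expanded
order places those three ahead of df.h, matching the excerpt shown below.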


This also means that taken as a snapshot, you are going to see things 
like timevar.h move around in apparently random fashion... but it is not 
random.  It will be in front of any and all headers listed after it in 
the ordering.  Any headers which don't appear in the canonical list will 
simply retain their current order in the source file, but AFTER all the 
ones in the canonical list.
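
The reordering rule in the paragraph above can be sketched as follows:
headers that appear in the canonical list come first, in canonical order,
and everything else keeps its original relative order after them.  This
is only an illustration with hypothetical names; it drops literal
duplicate includes but does not model headers made redundant through
other headers.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Reorder the includes of one source file against CANONICAL.
std::vector<std::string>
reorder_includes (const std::vector<std::string> &file_includes,
                  const std::vector<std::string> &canonical)
{
  std::set<std::string> present (file_includes.begin (),
                                 file_includes.end ());
  std::set<std::string> emitted;
  std::vector<std::string> out;

  // First, the canonical headers this file actually uses, in
  // canonical order.
  for (const std::string &header : canonical)
    if (present.count (header) != 0 && emitted.insert (header).second)
      out.push_back (header);

  // Then everything else, in its original order, skipping repeats.
  for (const std::string &header : file_includes)
    if (emitted.insert (header).second)
      out.push_back (header);

  return out;
}
```

A file that includes foo.h, tm_p.h, timevar.h and bar.h would come out as
timevar.h, tm_p.h, foo.h, bar.h: timevar.h moves even though df.h is not
included, which is the "apparently random" movement described above.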


This also made it fairly easy to remove redundant includes that have
been seen already by including some other header... I just build the
list of headers that have been seen already.


There are a couple of specialty cases that are handled.
The 'exclude processing' list are headers which shouldn't be expanded
like above. They can cause irreconcilable problems when expanded,
especially the front end files.  They do need to be ordered since
diagnostics require them to be included first in order to satisfy the
requirement that GCC_DIAG_STYLE be defined before diagnostic.h is
included.  Plus most of them include tree.h and/or diagnostic.h
themselves, but we don't want them to impact the ordering for the
backend files.


That list puts those core files in an appropriate place canonically, but
doesn't expand into the file because the order we get for the different
front ends would be different.  Finally diagnostic*.h and friends are
removed from the list and put at the end to ensure everything that might
be needed by them is available.  Again, the front end files would have
made it much earlier than we wanted for the backend files.


I also disagree with the assertion that "a file using timevars should
include timevar.h IMO, even if it also includes df.h".  It could, but I
don't see the value, and I doubt anyone really cares much.  If someone
ever removes the only thing that does bring in timevar.h, you simply add
it then. That is just part of updating headers.  I'm sure before I run
this patch not every file which uses timevar.h actually physically
includes it.  This process will set us to a somewhat consistent state.


It's simple enough to remove the ones that are redundant in an
automated way, and very difficult to determine whether headers that are
not required nonetheless contain content that is used.


The fully expanded canonical list looks something like this:

safe-ctype.h
filenames.h
libiberty.h
hwint.h
system.h
insn-modes.h
machmode.h
signop.h
wide-int.h
double-int.h
real.h
fixed-value.h
statistics.h
gtype-desc.h
ggc.h
vec.h

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread H.J. Lu

On Tue, Oct 6, 2015 at 11:02 AM, Steve Ellcey  wrote:
> On Tue, 2015-10-06 at 17:39 +0200, Bernd Schmidt wrote:
>
>>
>> Did you tag that copy as RTX_FRAME_RELATED? I'd expect dwarf2cfi would
>> ignore instructions with that bit unset. There's even a comment
>> indicating arm does something like this:
>
> Yes, I had marked it as RTX_FRAME_RELATED.  When I took that out things
> looked better (well, maybe just different).
>
> If I remove that and I change Rule #16 to handle an AND with a register
> I get odd looking .cfi stuff.  The AND instruction (which is marked with
> RTX_FRAME_RELATED) seems to generate these cfi_escape macros:
>
> .cfi_escape 0x10,0x1f,0x2,0x8e,0x7c
> .cfi_escape 0x10,0x1e,0x2,0x8e,0x78
> .cfi_escape 0x10,0xc,0x2,0x8e,0x74
>
> which are meaningless to me.  What I found works better is to skip the
> dwarf2cfi.c changes and explicitly attach this note to the AND
> instruction:
>
>   cfi_note = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, NULL_RTX);
>   REG_NOTES (insn) = cfi_note;
>
> When I do this the only unexpected test suite execution failures I see
> are ones that call the C++ set_unexpected function.
>
> FYI: The reason I was copying the stack pointer to another register was
> to use that other register as my argument pointer (needed after the
> stack pointer got aligned) and to use it to restore the stack pointer at
> the end of the function for normal returns.

Does it pass all tests under g++.dg/torture/stackalign?  You need
to implement -mstackrealign and -mpreferred-stack-boundary=
as well as update check_effective_target_automatic_stack_alignment
to run all stack alignment tests.


-- 
H.J.

Re: [PATCH, i386] Introduce switch for Skylake Server CPU.

2015-10-06 Thread Kirill Yukhin

Hello Uroš,

I've merged the two patches together and rebased the result
on top of gcc-5-branch. The only change I made compared
to the trunk version is that scheduling is set to CPU_NEHALEM, since
CPU_HASWELL is not supported in gcc-5.

Bootstrapped.

Is it ok for gcc-5-branch?

gcc/
* config.gcc: Support "skylake-avx512".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_SKYLAKE_AVX512.
* config/i386/i386.c (m_SKYLAKE_AVX512): Define.
(processor_target_table): Add "skylake-avx512".
(PTA_SKYLAKE): Define.
(ix86_option_override_internal): Add "skylake-avx512".
(fold_builtin_cpu): Handle "skylake-avx512".
* config/i386/i386.h (TARGET_SKYLAKE_AVX512): Define.
(processor_type): Add PROCESSOR_SKYLAKE_AVX512.
* doc/invoke.texi (skylake-avx512): New.

libgcc/
* libgcc/config/i386/cpuinfo.c (get_intel_cpu): Detect "skylake-avx512",
AES, PCLMUL, AVX-512VL, AVX-512BW, AVX-512DQ, AVX-512PF,
AVX-512ER, AVX-512CD.

gcc/testsuite/
* gcc.target/i386/builtin_target.c: Add check for "skylake-avx512",
"aes" and "pclmul".
* gcc.target/i386/funcspec-5.c: Test avx512vl, avx512bw,
avx512dq, avx512cd, avx512er, avx512pf and skylake-avx512.

--
Thanks, K

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c835734..207fc65 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -589,8 +589,8 @@ pentium4 pentium4m pentiumpro prescott"
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
 bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
 core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
-sandybridge ivybridge haswell broadwell bonnell silvermont knl x86-64 \
-native"
+sandybridge ivybridge haswell broadwell bonnell silvermont knl \
+skylake-avx512 x86-64 native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index f3f90df..4f20e14 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -185,6 +185,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   def_or_undef (parse_in, "__knl");
   def_or_undef (parse_in, "__knl__");
   break;
+case PROCESSOR_SKYLAKE_AVX512:
+  def_or_undef (parse_in, "__skylake_avx512");
+  def_or_undef (parse_in, "__skylake_avx512__");
+  break;
 /* use PROCESSOR_max to not set/unset the arch macro.  */
 case PROCESSOR_max:
   break;
@@ -294,6 +298,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 case PROCESSOR_KNL:
   def_or_undef (parse_in, "__tune_knl__");
   break;
+case PROCESSOR_SKYLAKE_AVX512:
+  def_or_undef (parse_in, "__tune_skylake_avx512__");
+  break;
 case PROCESSOR_INTEL:
 case PROCESSOR_GENERIC:
   break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9b17256..43e6f91 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2054,6 +2054,7 @@ const struct processor_costs *ix86_cost = _cost;
 #define m_BONNELL (1<

Re: [PATCH, i386] Introduce switch for Skylake Server CPU.

2015-10-06 Thread Kirill Yukhin

One more missed hunk:

diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 9eb397e..cbca6b4 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -173,6 +173,10 @@ check_features (unsigned int ecx, unsigned int edx,
 assert (__builtin_cpu_supports ("sse2"));
   if (ecx & bit_POPCNT)
 assert (__builtin_cpu_supports ("popcnt"));
+  if (ecx & bit_AES)
+assert (__builtin_cpu_supports ("aes"));
+  if (ecx & bit_PCLMUL)
+assert (__builtin_cpu_supports ("pclmul"));
   if (ecx & bit_SSE3)
 assert (__builtin_cpu_supports ("sse3"));
   if (ecx & bit_SSSE3)

Re: [PATCH, obvious, AVX-512] Add missing AVX-512 features detection.

2015-10-06 Thread Richard Biener

On Fri, Oct 2, 2015 at 5:25 PM, Kirill Yukhin  wrote:
> Hello,
> The patch at the bottom adds the missing AVX-512VBMI and AVX-512IFMA
> features to libgcc/config/i386/cpuinfo.c, the built-in expansion,
> and the test.
>
> Committed to main trunk as obvious.

The test now execute FAILs for me:

FAIL: gcc.target/i386/builtin_target.c execution test

I have family 6, model 94

flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1
hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap


> gcc/
> * config/i386/i386.c (processor_features): Add F_AVX512VBMI,
> F_AVX512IFMA.
> (isa_names_table): Handle F_AVX512VBMI and F_AVX512IFMA.
> libgcc/
> * config/i386/cpuinfo.c (processor_features): Add
> FEATURE_AVX512VBMI and FEATURE_AVX512IFMA.
> testsuite/
> * gcc.target/i386/builtin_target.c: Handle "avx512ifma"
> and "avx512vbmi".
>
> --
> Thanks, K
>
> commit 39d9d882ed654e8b40095a24cb05baf661b81f3f
> Author: Kirill Yukhin 
> Date:   Fri Oct 2 18:08:33 2015 +0300
>
> AVX-512. Add missing features to cpuinfo.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1ccc33e..1719175 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -36591,6 +36591,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
>  F_AVX512CD,
>  F_AVX512ER,
>  F_AVX512PF,
> +F_AVX512VBMI,
> +F_AVX512IFMA,
>  F_MAX
>};
>
> @@ -36699,6 +36701,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
>{"avx512cd",F_AVX512CD},
>{"avx512er",F_AVX512ER},
>{"avx512pf",F_AVX512PF},
> +  {"avx512vbmi",F_AVX512VBMI},
> +  {"avx512ifma",F_AVX512IFMA},
>  };
>
>tree __processor_model_type = build_processor_model_struct ();
> diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
> index aff4559..a9a8753 100644
> --- a/gcc/testsuite/gcc.target/i386/builtin_target.c
> +++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
> @@ -200,6 +200,10 @@ check_features (unsigned int ecx, unsigned int edx,
> assert (__builtin_cpu_supports ("avx512bw"));
>if (ebx & bit_AVX512DQ)
> assert (__builtin_cpu_supports ("avx512dq"));
> +  if (ebx & bit_AVX512IFMA)
> +   assert (__builtin_cpu_supports ("avx512ifma"));
> +  if (ebx & bit_AVX512VBMI)
> +   assert (__builtin_cpu_supports ("avx512vbmi"));
>  }
>  }
>
> diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
> index ddb49e3..40ed84c 100644
> --- a/libgcc/config/i386/cpuinfo.c
> +++ b/libgcc/config/i386/cpuinfo.c
> @@ -110,7 +110,9 @@ enum processor_features
>FEATURE_AVX512DQ,
>FEATURE_AVX512CD,
>FEATURE_AVX512ER,
> -  FEATURE_AVX512PF
> +  FEATURE_AVX512PF,
> +  FEATURE_AVX512VBMI,
> +  FEATURE_AVX512IFMA
>  };
>
>  struct __processor_model
> @@ -336,6 +338,10 @@ get_available_features (unsigned int ecx, unsigned int edx,
> features |= (1 << FEATURE_AVX512PF);
>if (ebx & bit_AVX512ER)
> features |= (1 << FEATURE_AVX512ER);
> +  if (ebx & bit_AVX512IFMA)
> +   features |= (1 << FEATURE_AVX512IFMA);
> +  if (ecx & bit_AVX512VBMI)
> +   features |= (1 << FEATURE_AVX512VBMI);
>  }
>
>unsigned int ext_level;

Re: Fold acc_on_device

2015-10-06 Thread Segher Boessenkool

On Thu, Oct 01, 2015 at 08:33:07AM -0400, Nathan Sidwell wrote:
> 2015-10-01  Nathan Sidwell  
> 
>   * builtins.c: Don't include gomp-constants.h.
>   (fold_builtin_1): Don't fold acc_on_device here.
>   * gimple-fold.c: Include gomp-constants.h.
>   (gimple_fold_builtin_acc_on_device): New.
>   (gimple_fold_builtin): Call it.
> 
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c (revision 228288)
> +++ gcc/gimple-fold.c (working copy)
> @@ -2848,6 +2890,9 @@ gimple_fold_builtin (gimple_stmt_iterato
>  n == 3
>  ? gimple_call_arg (stmt, 2)
>  : NULL_TREE, fcode);

Missing break statement here.  This is PR67861.

> +case BUILT_IN_ACC_ON_DEVICE:
> +  return gimple_fold_builtin_acc_on_device (gsi,
> + gimple_call_arg (stmt, 0));
>  default:;
>  }
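
To illustrate the kind of problem a missing break causes (a generic
sketch with made-up names, not the GCC code itself): when the case ahead
of a newly added one only returns conditionally, control drops into the
new case whenever the condition is false.

```cpp
#include <cassert>

enum builtin_code { BUILTIN_PRINTF_LIKE, BUILTIN_ACC_ON_DEVICE };

// Return values: 0 = nothing folded, 1 = printf-like folder ran,
// 2 = acc_on_device folder ran.

int
fold_missing_break (builtin_code code, int nargs)
{
  switch (code)
    {
    case BUILTIN_PRINTF_LIKE:
      if (nargs == 2 || nargs == 3)
        return 1;
      // Bug: no break here, so other argument counts reach the next case.
    case BUILTIN_ACC_ON_DEVICE:
      return 2;
    default:;
    }
  return 0;
}

int
fold_with_break (builtin_code code, int nargs)
{
  switch (code)
    {
    case BUILTIN_PRINTF_LIKE:
      if (nargs == 2 || nargs == 3)
        return 1;
      break;
    case BUILTIN_ACC_ON_DEVICE:
      return 2;
    default:;
    }
  return 0;
}
```

With one argument, the first version runs the acc_on_device folder on a
printf-like call, while the second correctly folds nothing.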


Segher

[PATCH, i386] Checked AES and PCLMUL in builtin_target.c.

2015-10-06 Thread Kirill Yukhin

Hello,
This obvious patch adds a check for the AES and PCLMUL CPUID bits.

gcc/testsuite/
* gcc.target/i386/builtin_target.c: Add check for AES and PCLMUL.

Updated test passes. Checked into main trunk.

--
Thanks, K

commit 6b4c0a8204ec5d311e4fef740ad8834cc4f5f5ff
Author: Kirill Yukhin 
Date:   Tue Oct 6 10:27:09 2015 +0300

Check AES and PCLMUL in builtin_target.c.

diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 9eb397e..cbca6b4 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -173,6 +173,10 @@ check_features (unsigned int ecx, unsigned int edx,
 assert (__builtin_cpu_supports ("sse2"));
   if (ecx & bit_POPCNT)
 assert (__builtin_cpu_supports ("popcnt"));
+  if (ecx & bit_AES)
+assert (__builtin_cpu_supports ("aes"));
+  if (ecx & bit_PCLMUL)
+assert (__builtin_cpu_supports ("pclmul"));
   if (ecx & bit_SSE3)
 assert (__builtin_cpu_supports ("sse3"));
   if (ecx & bit_SSSE3)

[PATCH 2/3] remove unused struct base_alias_pair

2015-10-06 Thread Sebastian Pop

2015-10-06  Aditya Kumar  
Sebastian Pop  

* graphite-poly.c (free_data_refs_aux): Remove.
(free_gimple_poly_bb): Do not call free_data_refs_aux.
* graphite-poly.h (struct base_alias_pair): Remove.
* graphite-sese-to-poly.c (pdr_add_alias_set): Remove all uses of
base_alias_pair and dr->aux.
(build_alias_set): Same.
* tree-data-ref.c (create_data_ref): Initialize alias_set.
* tree-data-ref.h (data_reference): Add alias_set.
---
 gcc/graphite-poly.c | 18 --
 gcc/graphite-poly.h |  6 --
 gcc/graphite-sese-to-poly.c | 29 -
 gcc/tree-data-ref.c |  1 +
 gcc/tree-data-ref.h |  3 +++
 5 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 7de0e81..52d0765 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -273,29 +273,11 @@ new_gimple_poly_bb (basic_block bb, vec<data_reference_p> drs)
   return gbb;
 }
 
-static void
-free_data_refs_aux (vec<data_reference_p> datarefs)
-{
-  unsigned int i;
-  data_reference_p dr;
-
-  FOR_EACH_VEC_ELT (datarefs, i, dr)
-if (dr->aux)
-  {
-   base_alias_pair_p bap = (base_alias_pair_p)(dr->aux);
-
-   free (bap->alias_set);
-
-   free (bap);
-   dr->aux = NULL;
-  }
-}
 /* Frees GBB.  */
 
 void
 free_gimple_poly_bb (gimple_poly_bb_p gbb)
 {
-  free_data_refs_aux (GBB_DATA_REFS (gbb));
   free_data_refs (GBB_DATA_REFS (gbb));
 
   GBB_CONDITIONS (gbb).release ();
diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 3c4353d..418af6e 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -417,12 +417,6 @@ struct scop
 #define SCOP_CONTEXT(S) (NULL)
 #define POLY_SCOP_P(S) (S->poly_scop_p)
 
-typedef struct base_alias_pair
-{
-  int base_obj_set;
-  int *alias_set;
-} *base_alias_pair_p;
-
 extern scop_p new_scop (edge, edge);
 extern void free_scop (scop_p);
 extern gimple_poly_bb_p new_gimple_poly_bb (basic_block, vec<data_reference_p>);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 235c911..40b598d 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -932,16 +932,9 @@ build_scop_iteration_domain (scop_p scop)
 static isl_map *
 pdr_add_alias_set (isl_map *acc, data_reference_p dr)
 {
-  isl_constraint *c;
-  int alias_set_num = 0;
-  base_alias_pair *bap = (base_alias_pair *)(dr->aux);
-
-  if (bap && bap->alias_set)
-alias_set_num = *(bap->alias_set);
-
-  c = isl_equality_alloc
+  isl_constraint *c = isl_equality_alloc
   (isl_local_space_from_space (isl_map_get_space (acc)));
-  c = isl_constraint_set_constant_si (c, -alias_set_num);
+  c = isl_constraint_set_constant_si (c, -dr->alias_set);
   c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1);
 
   return isl_map_add_constraint (acc, c);
@@ -1086,11 +1079,7 @@ build_poly_dr (data_reference_p dr, poly_bb_p pbb)
 isl_id *id = isl_id_for_dr (scop, dr);
 int nb = 1 + DR_NUM_DIMENSIONS (dr);
 isl_space *space = isl_space_set_alloc (scop->isl_context, 0, nb);
-int alias_set_num = 0;
-base_alias_pair *bap = (base_alias_pair *)(dr->aux);
-
-if (bap && bap->alias_set)
-  alias_set_num = *(bap->alias_set);
+int alias_set_num = dr->alias_set;
 
 space = isl_space_set_tuple_id (space, isl_dim_set, id);
 subscript_sizes = isl_set_nat_universe (space);
@@ -1130,18 +1119,8 @@ build_alias_set (vec<data_reference_p> drs)
   graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL);
   free (all_vertices);
 
-  data_reference_p dr;
-  FOR_EACH_VEC_ELT (drs, i, dr)
-dr->aux = XNEW (base_alias_pair);
-
   for (i = 0; i < g->n_vertices; i++)
-{
-  data_reference_p dr = drs[i];
-  base_alias_pair *bap = (base_alias_pair *)(dr->aux);
-  bap->alias_set = XNEW (int);
-  int c = g->vertices[i].component + 1;
-  *(bap->alias_set) = c;
-}
+drs[i]->alias_set = g->vertices[i].component + 1;
 
   free_graph (g);
 }
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7087d7..0ffa1db 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1080,6 +1080,7 @@ create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt,
   DR_STMT (dr) = stmt;
   DR_REF (dr) = memref;
   DR_IS_READ (dr) = is_read;
+  dr->alias_set = 0;
 
   dr_analyze_innermost (dr, nest);
   dr_analyze_indices (dr, nest, loop);
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 4c9e357..e6f82ff 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -127,6 +127,9 @@ struct data_reference
 
   /* Alias information for the data reference.  */
   struct dr_alias alias;
+
+  /* The alias set for this data reference.  */
+  int alias_set;
 };
 
 #define DR_STMT(DR)(DR)->stmt
-- 
1.9.1

[PATCH 3/3] move dr->alias_set to a helper structure

2015-10-06 Thread Sebastian Pop

2015-10-06  Aditya Kumar  
Sebastian Pop  

* graphite-poly.c (new_scop): Initialize drs.
* graphite-poly.h (struct dr_info): New.
(struct scop): Add drs.
* graphite-sese-to-poly.c (pdr_add_alias_set): Use dr_info.
(pdr_add_memory_accesses): Same.
(build_poly_dr): Same.
(build_alias_set): Same.
(build_scop_drs): Same.
(build_pbb_drs): Remove.
* tree-data-ref.c (create_data_ref): Do not initialize alias_set.
* tree-data-ref.h (data_reference): Remove alias_set.
---
 gcc/graphite-poly.c |  1 +
 gcc/graphite-poly.h | 39 
 gcc/graphite-sese-to-poly.c | 63 +++--
 gcc/tree-data-ref.c |  1 -
 gcc/tree-data-ref.h |  3 ---
 5 files changed, 67 insertions(+), 40 deletions(-)

diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 52d0765..ab28b7a 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -322,6 +322,7 @@ new_scop (edge entry, edge exit)
   scop_set_region (scop, region);
   SCOP_BBS (scop).create (3);
   POLY_SCOP_P (scop) = false;
+  scop->drs.create (3);
 
   return scop;
 }
diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 418af6e..37a1755 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -366,6 +366,42 @@ pbb_set_black_box (poly_bb_p pbb, gimple_poly_bb_p 
black_box)
   pbb->black_box = black_box;
 }
 
+/* A helper structure to keep track of data references, polyhedral BBs, and
+   alias sets.  */
+
+struct dr_info
+{
+  /* The data reference.  */
+  data_reference_p dr;
+
+  /* ALIAS_SET is the SCC number assigned by a graph_dfs of the alias graph.  -1
+     is an invalid alias set.  */
+  int alias_set;
+
+  /* The polyhedral BB containing this DR.  */
+  poly_bb_p pbb;
+
+  /* Construct a DR_INFO from a data reference DR, an ALIAS_SET, and a PBB.  */
+  dr_info (data_reference_p dr, int alias_set, poly_bb_p pbb)
+: dr (dr), alias_set (alias_set), pbb (pbb) {}
+
+  /* A simpler constructor to be able to push these objects in a vec.  */
+  dr_info (int i) : dr (NULL), alias_set (-1), pbb (NULL)
+  {
+gcc_assert (i == 0);
+  }
+
+  /* Assignment operator, to be able to iterate over a vec of these objects.  */
+  const dr_info &
+  operator= (const dr_info &p)
+  {
+dr = p.dr;
+alias_set = p.alias_set;
+pbb = p.pbb;
+return *this;
+  }
+};
+
 /* A SCOP is a Static Control Part of the program, simple enough to be
represented in polyhedral form.  */
 struct scop
@@ -381,6 +417,9 @@ struct scop
  representation.  */
   vec bbs;
 
+  /* All the data references in this scop.  */
+  vec<dr_info> drs;
+
   /* The context describes known restrictions concerning the parameters
  and relations in between the parameters.
 
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 40b598d..e61e0bf 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -930,11 +930,11 @@ build_scop_iteration_domain (scop_p scop)
domain.  */
 
 static isl_map *
-pdr_add_alias_set (isl_map *acc, data_reference_p dr)
+pdr_add_alias_set (isl_map *acc, dr_info &dri)
 {
   isl_constraint *c = isl_equality_alloc
   (isl_local_space_from_space (isl_map_get_space (acc)));
-  c = isl_constraint_set_constant_si (c, -dr->alias_set);
+  c = isl_constraint_set_constant_si (c, -dri.alias_set);
   c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1);
 
   return isl_map_add_constraint (acc, c);
@@ -968,8 +968,10 @@ set_index (isl_map *map, int pos, isl_pw_aff *index)
PBB is the poly_bb_p that contains the data reference DR.  */
 
 static isl_map *
-pdr_add_memory_accesses (isl_map *acc, data_reference_p dr, poly_bb_p pbb)
+pdr_add_memory_accesses (isl_map *acc, dr_info &dri)
 {
+  data_reference_p dr = dri.dr;
+  poly_bb_p pbb = dri.pbb;
   int i, nb_subscripts = DR_NUM_DIMENSIONS (dr);
   scop_p scop = PBB_SCOP (pbb);
 
@@ -1056,10 +1058,12 @@ pdr_add_data_dimensions (isl_set *subscript_sizes, 
scop_p scop,
 /* Build data accesses for DR in PBB.  */
 
 static void
-build_poly_dr (data_reference_p dr, poly_bb_p pbb)
+build_poly_dr (dr_info &dri)
 {
   isl_map *acc;
   isl_set *subscript_sizes;
+  poly_bb_p pbb = dri.pbb;
+  data_reference_p dr = dri.dr;
   scop_p scop = PBB_SCOP (pbb);
 
   {
@@ -1072,19 +1076,18 @@ build_poly_dr (data_reference_p dr, poly_bb_p pbb)
 acc = isl_map_set_tuple_id (acc, isl_dim_out, isl_id_for_dr (scop, dr));
   }
 
-  acc = pdr_add_alias_set (acc, dr);
-  acc = pdr_add_memory_accesses (acc, dr, pbb);
+  acc = pdr_add_alias_set (acc, dri);
+  acc = pdr_add_memory_accesses (acc, dri);
 
   {
 isl_id *id = isl_id_for_dr (scop, dr);
 int nb = 1 + DR_NUM_DIMENSIONS (dr);
 isl_space *space = isl_space_set_alloc (scop->isl_context, 0, nb);
-int alias_set_num =

[PATCH 1/3] remove dead code in computation of alias sets

2015-10-06 Thread Sebastian Pop

2015-10-06  Aditya Kumar  
Sebastian Pop  

* graphite-poly.c (new_poly_dr): Remove dr_base_object_set.
Do not set PDR_BASE_OBJECT_SET.
* graphite-poly.h (poly_dr): Same.
(PDR_BASE_OBJECT_SET): Remove.
(new_poly_dr): Update decl.
* graphite-sese-to-poly.c (build_poly_dr): Update call to
new_poly_dr.
(write_alias_graph_to_ascii_dimacs): Remove.
(write_alias_graph_to_ascii_dot): Remove.
(write_alias_graph_to_ascii_ecc): Remove.
(dr_same_base_object_p): Remove.
(build_alias_set_optimal_p): Rename build_alias_set.  Remove dead
code.
(build_base_obj_set_for_drs): Remove.
(dump_alias_graphs): Remove.
(build_scop_drs): Remove dead code.
---
 gcc/graphite-poly.c |   5 +-
 gcc/graphite-poly.h |   7 +-
 gcc/graphite-sese-to-poly.c | 275 +++-
 3 files changed, 20 insertions(+), 267 deletions(-)

diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 5d6a669..7de0e81 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -136,15 +136,14 @@ apply_poly_transforms (scop_p scop)
NB_SUBSCRIPTS.  */
 
 void
-new_poly_dr (poly_bb_p pbb, int dr_base_object_set,
-enum poly_dr_type type, data_reference_p cdr, graphite_dim_t nb_subscripts,
+new_poly_dr (poly_bb_p pbb, enum poly_dr_type type, data_reference_p cdr,
+graphite_dim_t nb_subscripts,
 isl_map *acc, isl_set *subscript_sizes)
 {
   static int id = 0;
   poly_dr_p pdr = XNEW (struct poly_dr);
 
   PDR_ID (pdr) = id++;
-  PDR_BASE_OBJECT_SET (pdr) = dr_base_object_set;
   PDR_NB_REFS (pdr) = 1;
   PDR_PBB (pdr) = pbb;
   pdr->accesses = acc;
diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 982fa94..3c4353d 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -182,10 +182,6 @@ struct poly_dr
   isl_map *accesses;
   isl_set *subscript_sizes;
 
-  /* Data reference's base object set number, we must assure 2 pdrs are in the
- same base object set before dependency checking.  */
-  int dr_base_object_set;
-
   /* The number of subscripts.  */
   graphite_dim_t nb_subscripts;
 };
@@ -196,10 +192,9 @@ struct poly_dr
 #define PDR_PBB(PDR) (PDR->pbb)
 #define PDR_TYPE(PDR) (PDR->type)
 #define PDR_ACCESSES(PDR) (NULL)
-#define PDR_BASE_OBJECT_SET(PDR) (PDR->dr_base_object_set)
 #define PDR_NB_SUBSCRIPTS(PDR) (PDR->nb_subscripts)
 
-void new_poly_dr (poly_bb_p, int, enum poly_dr_type, data_reference_p,
+void new_poly_dr (poly_bb_p, enum poly_dr_type, data_reference_p,
  graphite_dim_t, isl_map *, isl_set *);
 void free_poly_dr (poly_dr_p);
 void debug_pdr (poly_dr_p, int);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index d0c7eb4..235c911 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -1065,7 +1065,6 @@ pdr_add_data_dimensions (isl_set *subscript_sizes, scop_p 
scop,
 static void
 build_poly_dr (data_reference_p dr, poly_bb_p pbb)
 {
-  int dr_base_object_set;
   isl_map *acc;
   isl_set *subscript_sizes;
   scop_p scop = PBB_SCOP (pbb);
@@ -1100,131 +1099,21 @@ build_poly_dr (data_reference_p dr, poly_bb_p pbb)
 subscript_sizes = pdr_add_data_dimensions (subscript_sizes, scop, dr);
   }
 
-  gcc_assert (dr->aux);
-  dr_base_object_set = ((base_alias_pair *)(dr->aux))->base_obj_set;
-
-  new_poly_dr (pbb, dr_base_object_set,
+  new_poly_dr (pbb,
   DR_IS_READ (dr) ? PDR_READ : PDR_WRITE,
   dr, DR_NUM_DIMENSIONS (dr), acc, subscript_sizes);
 }
 
-/* Write to FILE the alias graph of data references in DIMACS format.  */
-
-static inline bool
-write_alias_graph_to_ascii_dimacs (FILE *file, char *comment,
-  vec drs)
-{
-  int num_vertex = drs.length ();
-  int edge_num = 0;
-  data_reference_p dr1, dr2;
-  int i, j;
-
-  if (num_vertex == 0)
-return true;
-
-  FOR_EACH_VEC_ELT (drs, i, dr1)
-for (j = i + 1; drs.iterate (j, &dr2); j++)
-  if (dr_may_alias_p (dr1, dr2, true))
-   edge_num++;
-
-  fprintf (file, "$\n");
-
-  if (comment)
-fprintf (file, "c %s\n", comment);
-
-  fprintf (file, "p edge %d %d\n", num_vertex, edge_num);
-
-  FOR_EACH_VEC_ELT (drs, i, dr1)
-for (j = i + 1; drs.iterate (j, &dr2); j++)
-  if (dr_may_alias_p (dr1, dr2, true))
-   fprintf (file, "e %d %d\n", i + 1, j + 1);
+/* Compute alias-sets for all data references in DRS.  */
 
-  return true;
-}
-
-/* Write to FILE the alias graph of data references in DOT format.  */
-
-static inline bool
-write_alias_graph_to_ascii_dot (FILE *file, char *comment,
-   vec drs)
-{
-  int num_vertex = drs.length ();
-  data_reference_p dr1, dr2;
-  int i, j;
-
-  if (num_vertex == 0)
-return

RE: [PATCH 1/2] [Refactoring graphite] Move declarations, assign types, renaming.

2015-10-06 Thread Sebastian Paul Pop

We just realized why this error got introduced:
the code that fails is in an #ifdef of ISL-0.14 or earlier.
We only bootstrapped and regtested the change with ISL-0.15.

Sorry again for the breakage,
Sebastian

-Original Message-
From: Sebastian Paul Pop [mailto:s@samsung.com] 
Sent: Tuesday, October 06, 2015 2:10 PM
To: 'H.J. Lu'; 'Aditya Kumar'
Cc: 'GCC Patches'; 'Tobias Grosser'; 'Richard Biener'; 'aditya...@samsung.com'; 
'Sebastian Pop'
Subject: RE: [PATCH 1/2] [Refactoring graphite] Move declarations, assign 
types, renaming.

Thanks for the quick fix.
Sorry for breaking bootstrap.


-Original Message-
From: H.J. Lu [mailto:hjl.to...@gmail.com] 
Sent: Tuesday, October 06, 2015 11:43 AM
To: Aditya Kumar
Cc: GCC Patches; Tobias Grosser; Richard Biener; aditya...@samsung.com; 
Sebastian Pop; Sebastian Pop
Subject: Re: [PATCH 1/2] [Refactoring graphite] Move declarations, assign 
types, renaming.

On Tue, Oct 6, 2015 at 9:37 AM, H.J. Lu  wrote:
> On Tue, Oct 6, 2015 at 9:34 AM, H.J. Lu  wrote:
>> On Mon, Oct 5, 2015 at 7:53 PM, Aditya Kumar  wrote:
>>> 1. Move declarations near the assignment/usage.
>>> 2. Assign type to members which were void*.
>>> 3. Rename scop->context to scop::param_context, and scop::ctx to
>>> scop::isl_context
>>>
>>> No functional changes intended. Passes regtest and bootstrap.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2015-10-05  Aditya Kumar  
>>>
>>> * graphite-dependences.c (scop_get_reads): Renamed scop->context to 
>>> scop->param_context.
>>> (scop_get_must_writes): Same.
>>> (scop_get_may_writes): Same.
>>> (scop_get_original_schedule): Same.
>>> (scop_get_transformed_schedule): Same.
>>> (subtract_commutative_associative_deps): Same.
>>> * graphite-isl-ast-to-gimple.c (add_parameters_to_ivs_params): Same.
>>> (generate_isl_context): Same.
>>> (generate_isl_schedule): Same.
>>> (scop_to_isl_ast): Same.
>>> (graphite_regenerate_ast_isl): Same.
>>> * graphite-optimize-isl.c (scop_get_domains): Same.
>>> (optimize_isl): Renamed scop->context to scop->param_context.
>>> * graphite-poly.c (new_poly_bb): Change the type of argument to 
>>> gimple_poly_bb_p.
>>> (new_scop): Renamed scop->context to scop->param_context.
>>> (free_scop): Same.
>>> (print_scop_context): Same.
>>> * graphite-poly.h (new_poly_dr): Change the type of argument from 
>>> void* to data_reference_p.
>>> (struct poly_bb): Change the type of black_box to gimple_poly_bb_p.
>>> (new_poly_bb): Change the type of argument from void* to 
>>> gimple_poly_bb_p.
>>> (pbb_set_black_box): Same.
>>> (struct scop): Rename context to param_context, ctx to isl_context.
>>> * graphite-scop-detection.c (scop_detection::build_scop_bbs_1): 
>>> Move declarations closer to assignment.
>>> (find_params_in_bb): Same.
>>> (find_scop_parameters): Same.
>>> * graphite-sese-to-poly.c (unsigned ssa_name_version_typesize): 
>>> Global to be used for statement IDs.
>>> (isl_id_for_pbb): Use ssa_name_version_typesize.
>>> (simple_copy_phi_p): Move declarations closer to assignment.
>>> (build_pbb_scattering_polyhedrons): Same.
>>> (build_scop_scattering): Same.
>>> (isl_id_for_ssa_name): Same.
>>> (extract_affine_name): Same.
>>> (extract_affine_int): Same.
>>> (extract_affine): Same.
>>> (set_scop_parameter_dim): Use renamed member.
>>> (build_loop_iteration_domains): Same.
>>> (add_param_constraints): Same.
>>> (build_scop_iteration_domain): Same.
>>> (pdr_add_data_dimensions): Same.
>>> (build_poly_dr): Same.
>>> (build_scop_drs): Move declarations closer to assignment.
>>> (analyze_drs_in_stmts): Same.
>>> (insert_out_of_ssa_copy): Same.
>>> (insert_out_of_ssa_copy_on_edge): Same.
>>> (propagate_expr_outside_region): Same.
>>> (rewrite_phi_out_of_ssa): Same.
>>> (rewrite_degenerate_phi): Same.
>>> (rewrite_reductions_out_of_ssa): Same.
>>> (rewrite_cross_bb_scalar_dependence): Same.
>>> (handle_scalar_deps_crossing_scop_limits): Same.
>>> (rewrite_cross_bb_scalar_deps): Same.
>>> * graphite.c (graphite_transform_loops): Use renamed member.
>>>
>>>
>>
>> It breaks bootstrap with isl-0.14:
>>
>> /export/gnu/import/git/sources/gcc/gcc/graphite-optimize-isl.c: In
>> function ‘bool optimize_isl(scop_p)’:
>> /export/gnu/import/git/sources/gcc/gcc/graphite-optimize-isl.c:333:40:
>> error: ‘struct scop’ has no member named ‘ctx’
>>isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
>> ^
>> Makefile:1077: recipe for target 'graphite-optimize-isl.o' failed
>> make[3]: ***

Re: [PATCH 1/3] remove dead code in computation of alias sets

2015-10-06 Thread Tobias Grosser


On 10/06/2015 10:45 PM, Sebastian Pop wrote:

2015-10-06  Aditya Kumar  
 Sebastian Pop  

 * graphite-poly.c (new_poly_dr): Remove dr_base_object_set.
 Do not set PDR_BASE_OBJECT_SET.
 * graphite-poly.h (poly_dr): Same.
 (PDR_BASE_OBJECT_SET): Remove.
 (new_poly_dr): Update decl.
 * graphite-sese-to-poly.c (build_poly_dr): Update call to
 new_poly_dr.
 (write_alias_graph_to_ascii_dimacs): Remove.
 (write_alias_graph_to_ascii_dot): Remove.
 (write_alias_graph_to_ascii_ecc): Remove.
 (dr_same_base_object_p): Remove.
 (build_alias_set_optimal_p): Rename build_alias_set.  Remove dead
 code.
 (build_base_obj_set_for_drs): Remove.
 (dump_alias_graphs): Remove.
 (build_scop_drs): Remove dead code.


All LGTM.

Tobias

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Bernd Schmidt


On 10/06/2015 09:19 PM, Andrew MacLeod wrote:

I don't get your fear.  I could have created that patch by hand, it would
just take a long time, and would likely be less complete, but just as
large.

I'm not changing functionality.  ALL the tool is doing is removing
header files which aren't needed to compile.  It goes to great pains to
make sure it doesn't remove a silent dependency that conditional
compilation might introduce.  Other than that, the sanity check is that
everything compiles on every target and regression tests show nothing.
Since we're doing this with just include files, and not changing
functionality, I'm not sure what your primary concern is?


My concern is that I've seen occasions in the past where "harmless 
cleanups" that were not intended to alter functionality introduced 
severe and subtle bugs that went unnoticed for a significant amount of 
time. If a change does not alter functionality, then there is a valid 
question of "why apply it then?", and the question of correctness 
becomes very important (to me anyway). The patch was produced by a 
fairly complex process, and I'd want to at least be able to convince 
myself that the process is correct.


Anyhow, I'll step back from this, you're probably better served by 
someone else reviewing the patch.



Bernd

Re: RFA: LM32: Configure with newlib-stdint.h

2015-10-06 Thread Jeff Law


On 10/06/2015 07:36 AM, Nick Clifton wrote:

Hi Sebastien,

   I recently found that I could not build libstdc++ for the lm32-elf
   target because the port lacked a definition of __INTPTR_TYPE__.  I
   tracked this down to the fact that INTPTR_TYPE was not being defined
   when building gcc, and this in turn was due to the fact that the
   newlib stdint types were not being defined.  So please may I apply the
   patch below to correct this problem ?

Cheers
   Nick

gcc/ChangeLog
2015-10-06  Nick Clifton  

* config.gcc (lm32-elf): Add newlib-stdint.h to tm_file.

OK.
jeff

Re: [C++ PATCH] Use protected_set_expr_location more

2015-10-06 Thread Jeff Law


On 10/06/2015 11:30 AM, Marek Polacek wrote:

Similarly to what I've just done for the C FE, this makes the C++ FE
use the protected_set_expr_location helper where applicable.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-06  Marek Polacek  

* cp-gimplify.c (genericize_if_stmt): Use protected_set_expr_location.
(genericize_cp_loop): Likewise.
* decl.c (cxx_maybe_build_cleanup): Likewise.
* parser.c (cp_parser_binary_expression): Likewise.
(cp_parser_omp_for_loop): Likewise.
(cp_parser_omp_construct): Likewise.
* semantics.c (finish_transaction_stmt): Likewise.
(build_transaction_expr): Likewise.

OK.
jeff

[committed, nios2] fix link error from gcc.dg/pr65658.c

2015-10-06 Thread Sandra Loosemore

On nios2-linux-gnu, the testcase gcc.dg/pr65658.c was failing with a 
link error.  The problem was that the nios2 back end was emitting 
GP-relative addressing for an uninitialized common symbol (C tentative 
declaration) that had a strong definition in a shared library.  That's a 
dumb thing to do -- it should be treating uninitialized common symbols 
similarly to weak definitions.  I've checked in the attached patch to 
fix it.
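
To make the symbol kinds named above concrete, here is a minimal C sketch.
It is not taken from the patch or from gcc.dg/pr65658.c, and every name in
it is hypothetical.  It assumes a compiler running with -fcommon (GCC's
default when this patch was written), so that the tentative definition is
emitted as an uninitialized common symbol.

```c
/* Hypothetical illustration; none of these names come from the patch.  */

/* Tentative definition: no initializer.  Under -fcommon this becomes an
   uninitialized common symbol, which a strong definition elsewhere
   (for example in a shared library) may take precedence over.  */
int tentative_counter;

/* Initialized definition: always a strong definition in this object.  */
int initialized_counter = 1;

/* Weak definition: may likewise be overridden by a strong definition.  */
int __attribute__ ((weak)) weak_counter = 2;

/* Sum the three objects so that each one is actually referenced.  */
int
sum_counters (void)
{
  return tentative_counter + initialized_counter + weak_counter;
}
```

As described above, the intent of the change is that with -mgpopt=local
an object like tentative_counter is treated like weak_counter rather than
like initialized_counter when deciding on GP-relative addressing.  Whether
any of them counts as small data still depends on the -G setting.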


-Sandra
2015-10-06  Sandra Loosemore  

	gcc/
	* config/nios2/nios2.c (nios2_symbol_ref_in_small_data_p):
	For -mgpopt=local, also exclude uninitialized common symbols.
	* doc/invoke.texi (Nios II Options): Document the change.
Index: gcc/config/nios2/nios2.c
===
--- gcc/config/nios2/nios2.c	(revision 228499)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -2099,13 +2099,17 @@ nios2_symbol_ref_in_small_data_p (rtx sy
 
 case gpopt_local:
   /* Use GP-relative addressing for small data symbols that are
-	 not external or weak, plus any symbols that have explicitly
-	 been placed in a small data section.  */
+	 not external or weak or uninitialized common, plus any symbols
+	 that have explicitly been placed in a small data section.  */
   if (decl && DECL_SECTION_NAME (decl))
 	return nios2_small_section_name_p (DECL_SECTION_NAME (decl));
   return (SYMBOL_REF_SMALL_P (sym)
 	  && !SYMBOL_REF_EXTERNAL_P (sym)
-	  && !(decl && DECL_WEAK (decl)));
+	  && !(decl && DECL_WEAK (decl))
+	  && !(decl && DECL_COMMON (decl)
+		   && (DECL_INITIAL (decl) == NULL
+		   || (!in_lto_p
+			   && DECL_INITIAL (decl) == error_mark_node))));
 
 case gpopt_global:
   /* Use GP-relative addressing for small data symbols, even if
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 228499)
+++ gcc/doc/invoke.texi	(working copy)
@@ -18535,13 +18535,14 @@ Do not generate GP-relative accesses.
 
 @item local
 Generate GP-relative accesses for small data objects that are not 
-external or weak.  Also use GP-relative addressing for objects that
+external, weak, or uninitialized common symbols.  
+Also use GP-relative addressing for objects that
 have been explicitly placed in a small data section via a @code{section}
 attribute.
 
 @item global
 As for @samp{local}, but also generate GP-relative accesses for
-small data objects that are external or weak.  If you use this option,
+small data objects that are external, weak, or common.  If you use this option,
 you must ensure that all parts of your program (including libraries) are
 compiled with the same @option{-G} setting.

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Jeff Law


On 10/06/2015 02:37 PM, Bernd Schmidt wrote:

On 10/06/2015 09:19 PM, Andrew MacLeod wrote:

I don't get your fear.  I could have created that patch by hand, it would
just take a long time, and would likely be less complete, but just as
large.

I'm not changing functionality.  ALL the tool is doing is removing
header files which aren't needed to compile.  It goes to great pains to
make sure it doesn't remove a silent dependency that conditional
compilation might introduce.  Other than that, the sanity check is that
everything compiles on every target and regression tests show nothing.
Since we're doing this with just include files, and not changing
functionality, I'm not sure what your primary concern is?


My concern is that I've seen occasions in the past where "harmless
cleanups" that were not intended to alter functionality introduced
severe and subtle bugs that went unnoticed for a significant amount of
time. If a change does not alter functionality, then there is a valid
question of "why apply it then?", and the question of correctness
becomes very important (to me anyway). The patch was produced by a
fairly complex process, and I'd want to at least be able to convince
myself that the process is correct.
A very valid concern.  In fact, one could argue that one of the long 
term problems we're likely to face as a project is the inability to do 
this kind of refactoring with high degrees of confidence that we're not 
breaking things.






Anyhow, I'll step back from this, you're probably better served by
someone else reviewing the patch.

That's fine.  I don't mind covering this.

jeff

Re: [patch 0/3] Header file reduction.

2015-10-06 Thread Jeff Law


On 10/05/2015 03:11 PM, Andrew MacLeod wrote:


In any case, a direct include of obstack.h in coretypes.h was considered
earlier in the aggregation process and it didn't show up as something
that would be a win.  It is included in a couple of common places that we
have no control over; in particular, libcpp/include/symtab.h includes
obstack.h and is included by tree-core.h.  A very significant number of
files bring that in.  If we included obstack.h in coretypes.h then those
files would be including it a second time for no particularly
good reason.  So I made the judgement call to not put it in coretypes.h.
And just as important, we can revisit the aggregators, and when we do so
we ought to be able to answer the question "if obstack.h is put into
coretypes.h, does that clean things up elsewhere?" and re-run the tools
to clean things up.






And it's one example, but it does point out a problem with this sort
of automated approach: realistically no one is going to check the
whole patch, and it may contain changes that could be done better.


The point being that the aggregation *wasn't* automated... and has
nothing to do with this patch set.  I analyzed and performed all that
sort of thing earlier.  Sure, judgment calls were made, but it wasn't
automated in the slightest.  There are certainly further aggregation
improvements that could be made... and either I or someone else could do
more down the road.  The heavy lifting has all been done now.

Agreed.



So  the *only* thing that is automated is removing include files which
are not needed so that we can get an idea of what the true dependencies
in the source base are.

Also agreed.


the reduction on all those files will take the better part of a week.


That's a little concerning due to the possibility of intervening
commits. I'd like to make one requirement for checkin, that you take
the revision at which you're committing and then run the script again,
verifying that the process produces the same changes as the patch you
committed.  (Or do things in smaller chunks.)



Well, sure there are intervening commits.  The only ones that actually
matter are the ones which fail to compile because someone made a code
change which now requires a header that wasn't needed before, which is
really a state we're looking for, I think.  I fix those up before
committing.  It's *possible* a conditional compilation issue could creep
in, but highly unlikely.
More likely is conditional compilation will be removed :-)  We're trying 
to get away from conditional compilation as a general direction.


Intervening commits are always a problem with this kind of large patch 
that hits many places.   But IMHO, they're an easily managed problem.


jeff

RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 12:02 +, Moore, Catherine wrote:

> > Moore, Catherine  writes:
> > > The patch itself looks good, but the tests that you added need a little
> > > more work.

Here is a new patch for just the tests.  I added NOMIPS16 and the
-mabi=32 flag as Matthew suggested and I also added -mno-abicalls.

Without the -mno-abicalls the MIPS linux compiler defaulted to
-mabicalls and allocated 32 bytes without the optimization and the MIPS
elf compiler defaulted to -mno-abicalls and allocated 24 bytes without
the optimization.  This synchronizes them so they both allocate 24 bytes
without the optimization and 8 bytes with it.  Everything should pass
now, it passed for me using mips-mti-linux-gnu and mips-mti-elf.

Steve Ellcey
sell...@imgtec.com


2015-10-06  Steve Ellcey  

* gcc.target/mips/mips.exp (mips_option_groups): Add -mframe-header-opt
and -mno-frame-header-opt options.
* gcc.target/mips/frame-header-1.c: New file.
* gcc.target/mips/frame-header-2.c: New file.
* gcc.target/mips/frame-header-3.c: New file.


diff --git a/gcc/testsuite/gcc.target/mips/frame-header-1.c 
b/gcc/testsuite/gcc.target/mips/frame-header-1.c
new file mode 100644
index 000..971656d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-1.c
@@ -0,0 +1,21 @@
+/* Verify that we do not optimize away the frame header in foo when using
+   -mno-frame-header-opt by checking the stack pointer increment done in
+   that function.  Without the optimization foo should increment the stack
+   by 24 bytes, with the optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mno-frame-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */
+
+NOMIPS16 void __attribute__((noinline))
+bar (int* a)
+{
+  *a = 1;
+}
+
+NOMIPS16 void
+foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/frame-header-2.c 
b/gcc/testsuite/gcc.target/mips/frame-header-2.c
new file mode 100644
index 000..0e86bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-2.c
@@ -0,0 +1,21 @@
+/* Verify that we do optimize away the frame header in foo when using
+   -mframe-header-opt by checking the stack pointer increment done in
+   that function.  Without the optimization foo should increment the
+   stack by 24 bytes, with the optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-8" } } */
+
+NOMIPS16 void __attribute__((noinline))
+bar (int* a)
+{
+  *a = 1;
+}
+
+NOMIPS16 void
+foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/frame-header-3.c 
b/gcc/testsuite/gcc.target/mips/frame-header-3.c
new file mode 100644
index 000..2a8c515
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-3.c
@@ -0,0 +1,22 @@
+/* Verify that we do not optimize away the frame header in foo when using
+   -mframe-header-opt but are calling a weak function that may be overridden
+   by a different function that does need the frame header.  Without the
+   optimization foo should increment the stack by 24 bytes, with the
+   optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */
+
+NOMIPS16 void __attribute__((noinline, weak))
+bar (int* a)
+{
+  *a = 1;
+}
+
+void
+NOMIPS16 foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 42e7fff..0f2d6a2 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -256,6 +256,7 @@ set mips_option_groups {
 maddps "HAS_MADDPS"
 lsa "(|!)HAS_LSA"
 section_start "-Wl,--section-start=.*"
+frame-header "-mframe-header-opt|-mno-frame-header-opt"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {

Re: [PATCH v2] SH FDPIC backend support

2015-10-06 Thread Oleg Endo

On Tue, 2015-10-06 at 12:52 -0400, Rich Felker wrote:
> > > +  if (TARGET_FDPIC)
> > > +{
> > > +  rtx a = force_reg (Pmode, plus_constant (Pmode, XEXP (tramp_mem, 
> > > 0), 8));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 0), a);
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4), 
> > > OUR_FDPIC_REG);
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8),
> > > +   gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd203d302 : 
> > > 0xd302d203,
> > > + SImode));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12),
> > > +   gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x5c216122 : 
> > > 0x61225c21,
> > > + SImode));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 16),
> > > +   gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009412b : 
> > > 0x412b0009,
> > > + SImode));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 20), cxt);
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 24), fnaddr);
> > > +}
> > > +  else
> > > +{
> > > +  emit_move_insn (change_address (tramp_mem, SImode, NULL_RTX),
> > > +   gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd301d202 : 
> > > 0xd202d301,
> > > + SImode));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4),
> > > +   gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009422b : 
> > > 0x422b0009,
> > > + SImode));
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8), cxt);
> > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12), fnaddr);
> > > +}
> > 
> > I think this hunk really needs a comment.  It copies machine code from
> > somewhere to somewhere via constant loads... but what exactly are the
> > instructions ...
> 
> This is generating trampolines for nested functions. This portion of
> the patch applied without modification from the old patch, so I didn't
> read into it in any more detail; it seems to be the following, which
> makes sense:
> 
> 0:.long 1f
>   .long gotval
> 1:mov.l 3f,r3
>   mov.l 2f,r2
>   mov.l @r2,r1
>   mov.l @(4,r2),r12
>   jmp @r1
>   nop
> 3:.long cxt
> 2:.long fnaddr
> 
> The corresponding non-FDPIC version is:
> 
>   mov.l 3f,r3
>   mov.l 2f,r2
>   jmp @r2
>   nop
> 3:.long cxt
> 2:.long fnaddr
> 
> Should these go into the source as comments?

Yes, please.  And of course some of the descriptive text as above.
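
As a cross-check of the listings quoted above, here is a small C sketch
that decodes the PC-relative loads out of the 32-bit constants in the
hunk.  It is only an illustration, not part of the patch: the helper
names are hypothetical, and it assumes the usual SH encoding of
"mov.l @(disp,PC),Rn" as 0xDndd with a load address of
((insn_addr + 4) & ~3) + disp * 4.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are not from the SH back end.  */

/* Return the 16-bit instruction at halfword INDEX (0 or 1) of WORD,
   as it is laid out in memory on a little-endian target.  */
uint16_t
le_halfword (uint32_t word, unsigned index)
{
  return (uint16_t) (word >> (16 * index));
}

/* Return the destination register number of a "mov.l @(disp,PC),Rn"
   instruction (assumed encoding 0xDndd).  */
unsigned
pc_rel_dest_reg (uint16_t insn)
{
  return (insn >> 8) & 0xf;
}

/* Return the byte offset, within the trampoline, of the literal loaded
   by the PC-relative mov.l INSN located at byte offset INSN_OFFSET.
   This assumes the trampoline itself starts on a 4-byte boundary.  */
unsigned
pc_rel_literal_offset (unsigned insn_offset, uint16_t insn)
{
  unsigned disp = insn & 0xff;
  return ((insn_offset + 4) & ~3u) + disp * 4;
}
```

Under these assumptions, the little-endian non-FDPIC word 0xd301d202 at
offset 0 decodes to a load into r2 from offset 12 and a load into r3 from
offset 8, which is where the quoted code stores fnaddr and cxt.  The FDPIC
word 0xd203d302 at offset 8 decodes to loads from offsets 20 and 24.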

> I would think it does, but I've found in the RTL files sometimes extra
> escaping is silently accepted, and I'm not sure if omitting it would
> visibly break. Can I rely on it producing a visible error right away
> if removing it is wrong, or do I need to search the gccint
> documentation to figure out what the right way is?

Just compile some code and look at the generated asm.

> I don't want to turn this into a political battle so we can go with
> whatever is appropriate for upstream gcc. Note however that, at
> present, the only targets this code is useful on are completely
> non-GNU Linux (musl-based and not using any GNU userspace on the
> target). uClibc may also work if someone digs up the old (untouched
> since 2011) superh_fdpic branch.

In this case leave as just "Linux".

> By "better" you mean leaving the self-specs approach in-place but
> explicitly initializing it to 0 with Init(0)? That sounds good to me.

Yes.

> It can't generate the same code either way because, with the patch as
> submitted, there's an extra load inside the asm. I would prefer
> switching to an approach that avoids that (mainly to avoid the ugly
> near-duplication of the asm block, but also to save a couple
> instructions) but short of feedback on acceptable ways to do the
> punning in the C++ I'll just leave it in the asm for now.

Do you have some alternatives to what's currently in the patch?  It's
difficult to judge without seeing them...

Cheers,
Oleg

Go patch committed: Track each package import separately

2015-10-06 Thread Ian Lance Taylor

This patch by Chris Manghane changes the Go frontend to track each
package import separately, so that we can report whether each import is
used, keyed by the alias it was imported under.  This fixes
https://golang.org/issue/12326 .  This requires adjusting a couple of
test cases to match errors previously only emitted by the gc
compiler, now also emitted by gccgo.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 228497)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d8150af96de991fb79b1bf65ae982a860552c492
+3039d79149901d25d89c2412bdd8684f3cbcd09e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 228306)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -516,7 +516,6 @@ Gogo::import_package(const std::string&
 {
   Package* package = p->second;
   package->set_location(location);
-  package->set_is_imported();
   std::string ln = local_name;
   bool is_ln_exported = is_local_name_exported;
   if (ln.empty())
@@ -525,7 +524,9 @@ Gogo::import_package(const std::string&
  go_assert(!ln.empty());
  is_ln_exported = Lex::is_exported_name(ln);
}
-  if (ln == ".")
+  if (ln == "_")
+;
+  else if (ln == ".")
{
  Bindings* bindings = package->bindings();
  for (Bindings::const_declarations_iterator p =
@@ -533,11 +534,12 @@ Gogo::import_package(const std::string&
   p != bindings->end_declarations();
   ++p)
this->add_dot_import_object(p->second);
+  std::string dot_alias = "." + package->package_name();
+  package->add_alias(dot_alias, location);
}
-  else if (ln == "_")
-   package->set_uses_sink_alias();
   else
{
+  package->add_alias(ln, location);
  ln = this->pack_hidden_name(ln, is_ln_exported);
  this->package_->bindings()->add_package(ln, package);
}
@@ -563,7 +565,6 @@ Gogo::import_package(const std::string&
  "being compiled (see -fgo-pkgpath option)"));
 
   this->imports_.insert(std::make_pair(filename, package));
-  package->set_is_imported();
 }
 
   delete stream;
@@ -1544,7 +1545,10 @@ Gogo::lookup(const std::string& name, Na
   if (ret != NULL)
{
  if (ret->package() != NULL)
-   ret->package()->note_usage();
+{
+  std::string dot_alias = "." + ret->package()->package_name();
+  ret->package()->note_usage(dot_alias);
+}
  return ret;
}
 }
@@ -1594,10 +1598,14 @@ Gogo::add_imported_package(const std::st
 
   *padd_to_globals = false;
 
-  if (alias_arg == ".")
-*padd_to_globals = true;
-  else if (alias_arg == "_")
-ret->set_uses_sink_alias();
+  if (alias_arg == "_")
+;
+  else if (alias_arg == ".")
+{
+  *padd_to_globals = true;
+  std::string dot_alias = "." + real_name;
+  ret->add_alias(dot_alias, location);
+}
   else
 {
   std::string alias = alias_arg;
@@ -1606,6 +1614,7 @@ Gogo::add_imported_package(const std::st
  alias = real_name;
  is_alias_exported = Lex::is_exported_name(alias);
}
+  ret->add_alias(alias, location);
   alias = this->pack_hidden_name(alias, is_alias_exported);
   Named_object* no = this->package_->bindings()->add_package(alias, ret);
   if (!no->is_package())
@@ -2356,15 +2365,30 @@ Gogo::clear_file_scope()
++p)
 {
   Package* package = p->second;
-  if (package != this->package_
- && package->is_imported()
- && !package->used()
- && !package->uses_sink_alias()
- && !quiet)
-   error_at(package->location(), "imported and not used: %s",
-Gogo::message_name(package->package_name()).c_str());
-  package->clear_is_imported();
-  package->clear_uses_sink_alias();
+  if (package != this->package_ && !quiet)
+{
+  for (Package::Aliases::const_iterator p1 = package->aliases().begin();
+   p1 != package->aliases().end();
+   ++p1)
+{
+  if (!p1->second->used())
+{
+  // Give a more refined error message if the alias name is known.
+  std::string pkg_name = package->package_name();
+  if (p1->first != pkg_name && p1->first[0] != '.')
+{
+  error_at(p1->second->location(),
+   "imported and not used: %s as %s",
+   Gogo::message_name(pkg_name).c_str(),
+

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Jeff Law


On 10/06/2015 08:04 AM, Andrew MacLeod wrote:

No commenting on the quality of python code... :-) I was
learning python on the fly.  I'm sure some things are QUITE awful.


Yeah, the general impression is of fairly ad-hoc code. Not sure how
much can be done about this.

they were never intended as general purpose tools, they were developed
over multiple iterations and bugfixing and never properly designed..
they were never originally intended for public submission, so they
suffer...  and I'm not interested in rewriting them yet again

So a little background for Bernd.

The tangled mess that our header files have become makes it extremely 
difficult to do something like introduce new classes/interfaces to improve 
the separation of various parts of GCC.  Consider the case where we 
wanted to drop trees from gimple onward by initially wrapping trees in a 
trivially compatible class, then converting files one by one to use the 
new representation.


We'd want to be able to do the conversion, then ensure ourselves that 
the old interfaces couldn't sneak in.  Getting there required some 
significant header file deconstruction, then reconstruction.


So Andrew set forth to try and untangle the mess of dependencies, remove 
unnecessary includes, etc etc.  He had the good sense to write some 
scripts to help :-0


A few months ago, as this stage of refactoring header files was nearing 
completion, I asked Andrew how we were going to prevent things from 
getting into the sorry shape we were in last year.  From that discussion 
came the suggestion that he should polish up his scripts and submit them 
for inclusion into the contrib/ subdirectory for future reference/use.


Ideally we'd occasionally run those scripts to ensure that we don't muck 
things up too badly again in the future.


Anyway, that's how we got here.  The scripts are just helper tools, but 
I wouldn't consider them a core part of GCC.  Obviously the cleaner and 
easier to run, the better.


It's interesting that a lot of work done by Andrew has ended up 
mirroring stuff I'm reading these days in Feathers' book.



Jeff

Re: [patch 0/3] Header file reduction.

2015-10-06 Thread Jeff Law


On 10/05/2015 02:10 PM, Andrew MacLeod wrote:


Is the bitmap/obstack example really one of a change that is
desirable? I think if a file uses obstacks then an include of
obstack.h is perfectly fine, giving us freedom to e.g. change bitmaps
not to use obstacks. Given that multiple headers include obstack.h,
and pretty much everything seems to indirectly include bitmap.h
anyway, maybe a better change would be to just include it always in
system.h.


It's just an example of the sort of redundant includes the tool removes.
It may not be the best example.  The tools don't treat obstack specially 
(nor should they IMHO).  So let's pretend it's not obstack.h which has 
been arguably a core part of GCC for a long time.




   I don't see the point in leaving redundant #includes in the source
code because of direct uses from that header in the source.  I'm not
even sure how I could automate detecting that accurately.  Going
forward, if anyone ever makes a change which removes a header from an
include file, they just have to correct the fallout. heh. That's kinda
all I've done for 4 months :-)   At least we'll have a grasp of the
ramifications.
And the last sentence is the key here.  We're trying to get to a point 
where we can make certain kinds of changes, then have the compiler spit 
out errors, fix the errors and have a high degree of confidence that the 
final change is correct and that we've found all the places that need to 
change.


The change could be as simple as moving a function declaration to its 
natural place, collecting interfaces & data into classes, or something 
more ambitious like removing trees from the backend.  Folks will note 
that these are all refactorings that we don't want to change any 
observable behaviour.







 * diff -c is somewhat unusual and I find diff -u much more readable.


unusual? I've been using -cp for the past 2 decades and no one has ever
mentioned it before...   poking around  the wiki I see it mentions you
can use either -up or -cp.

I guess I could repackage things using -up...  I don't even know where
my script is to change it :-).   is -u what everyone uses now? no one
has mentioned it before that I am aware of.
I'm probably the last person in the world that still generally prefers 
-cp :-)  I'm getting to the point where I can tolerate -u.







 * Maybe the patches for reordering and removing should be split, also
   for readability and for easier future identification of problems.


I was trying to avoid too much churn on 550ish files...  I didn't think
each one needed 2 sets of check-ins.  It could be done, but it will
take a while.  The reordering patch can be quickly generated, but the
reduction on all those files will take the better part of a week.

My theory is that it is perfectly safe to back out any single file from the
patch set if we discover it has an issue and then examine what the root
of the problem is.

tool patch coming shortly... probably tomorrow now.
I haven't looked at the 3 patches in detail yet.  Given my familiarity 
with the overall process/goal, I can probably handle them as-is. 
They're just big mechanical changes.


jeff

[PATCH] Introduce base class for bb_vec_info and loop_vec_info

2015-10-06 Thread Richard Biener


This is sth I long wanted to have done and now it is required to not make
pushing/popping of the vectorizer state awkward (which is what I'm going
to work on).  It's mostly a 1:1 conversion everywhere so followup TLC
is certainly possible.  At least it shows that the current way of having
one of loop_vec_info or bb_vec_info is quite awkward already:

   else
-{
-  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  void *target_cost_data;
-
-  if (loop_vinfo)
-   target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
-  else
-   target_cost_data = BB_VINFO_TARGET_COST_DATA (bb_vinfo);
-
-  return add_stmt_cost (target_cost_data, count, kind, stmt_info,
-   misalign, where);
-}
+return add_stmt_cost (stmt_info->vinfo->target_cost_data,
+ count, kind, stmt_info, misalign, where);

You may notice I didn't bother to introduce all kinds of macros to
access the base class field.  Instead I'll be working towards removing
most of the screaming in the vectorizers code.

The patch moves common members into the base but doesn't bother yet
to do further semantic unification (like vectorization factor could
be in both, being always one in the BB variant).
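
As a rough illustration of the shape described above, the sketch below puts the shared members in a base class and lets helpers take a single base pointer.  All names carry a _sketch suffix because they are simplified stand-ins, not the declarations in tree-vectorizer.h; the kind tag and is_bb_sketch only imitate what an is_a-style check would do.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the common base: members that both the loop
// and the basic-block variant need live here.
struct vec_info_sketch
{
  enum kind_t { bb, loop };

  explicit vec_info_sketch(kind_t k) : kind(k) {}

  kind_t kind;                // which derived variant this is
  std::vector<int> datarefs;  // placeholder for the data reference list
  int target_cost_data = 0;   // placeholder for the target cost state
};

struct loop_vec_info_sketch : vec_info_sketch
{
  loop_vec_info_sketch() : vec_info_sketch(loop) {}
  int vectorization_factor = 1;  // loop-only state stays in the derived class
};

struct bb_vec_info_sketch : vec_info_sketch
{
  bb_vec_info_sketch() : vec_info_sketch(bb) {}
};

// Imitates an is_a-style query on the base pointer.
inline bool is_bb_sketch(const vec_info_sketch* vinfo)
{
  return vinfo->kind == vec_info_sketch::bb;
}

// Helpers take one base pointer instead of a (loop, bb) pair of which
// exactly one is expected to be non-null.
inline int count_datarefs_sketch(const vec_info_sketch* vinfo)
{
  return static_cast<int>(vinfo->datarefs.size());
}

inline int add_stmt_cost_sketch(vec_info_sketch* vinfo, int count)
{
  vinfo->target_cost_data += count;
  return vinfo->target_cost_data;
}
```

Callers no longer need the "if loop_vinfo ... else bb_vinfo" dance; the variant is only queried where behaviour genuinely differs.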

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-06  Richard Biener  

* tree-vectorizer.h (vec_info): New base class for...
(_loop_vec_info): ... this and ...
(_bb_vec_info): ... this.
(vect_is_simple_use, vect_is_simple_use_1, new_stmt_vec_info,
vect_analyze_data_refs_alignment, vect_verify_datarefs_alignment,
vect_analyze_data_ref_accesses, vect_analyze_data_refs,
vect_schedule_slp, vect_analyze_slp, vect_pattern_recog,
vect_destroy_datarefs): Adjust interface to take a vec_info *
rather than both a loop_vec_info and a bb_vec_info argument.
* tree-vect-data-refs.c (vect_compute_data_refs_alignment,
vect_verify_datarefs_alignment, vect_enhance_data_refs_alignment,
vect_analyze_data_refs_alignment, vect_analyze_data_ref_accesses,
vect_analyze_data_refs, vect_create_data_ref_ptr): Adjust
accordingly.
* tree-vect-loop.c (new_loop_vec_info): Initialize base class.
(destroy_loop_vec_info, vect_analyze_loop_2,
vect_is_simple_reduction_1, get_initial_def_for_induction,
vect_create_epilog_for_reduction, vectorizable_reduction,
vectorizable_live_operation, vect_transform_loop): Adjust.
* tree-vect-patterns.c (type_conversion_p,
vect_recog_widen_mult_pattern, vect_recog_widen_shift_pattern,
vect_recog_rotate_pattern, vect_recog_vector_vector_shift_pattern,
vect_recog_divmod_pattern, vect_recog_mixed_size_cond_pattern,
check_bool_pattern, vect_recog_bool_pattern,
vect_mark_pattern_stmts, vect_pattern_recog): Likewise.
* tree-vect-slp.c (vect_get_and_check_slp_defs,
vect_build_slp_tree_1, vect_build_slp_tree, vect_analyze_slp_cost_1,
vect_analyze_slp_instance, vect_analyze_slp, destroy_bb_vec_info,
vect_slp_analyze_bb_1, vect_schedule_slp): Likewise.
(new_bb_vec_info): Initialize base class.
* tree-vect-stmts.c (record_stmt_cost, process_use,
vect_get_vec_def_for_operand, vect_finish_stmt_generation,
vectorizable_mask_load_store, vectorizable_call,
vectorizable_simd_clone_call, vectorizable_conversion,
vectorizable_assignment, vectorizable_shift,
vectorizable_operation, vectorizable_store,
vectorizable_load, vect_is_simple_cond, vectorizable_condition,
new_stmt_vec_info, vect_is_simple_use, vect_is_simple_use_1): Likewise.
* tree-vectorizer.c (vect_destroy_datarefs): Likewise.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 228482)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -784,23 +784,17 @@ vect_compute_data_ref_alignment (struct
Return FALSE if a data reference is found that cannot be vectorized.  */
 
 static bool
-vect_compute_data_refs_alignment (loop_vec_info loop_vinfo,
-  bb_vec_info bb_vinfo)
+vect_compute_data_refs_alignment (vec_info *vinfo)
 {
-  vec<data_reference_p> datarefs;
+  vec<data_reference_p> datarefs = vinfo->datarefs;
   struct data_reference *dr;
   unsigned int i;
 
-  if (loop_vinfo)
-datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  else
-datarefs = BB_VINFO_DATAREFS (bb_vinfo);
-
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 if (STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (dr)))
 && !vect_compute_data_ref_alignment (dr))
   {
-if (bb_vinfo)
+if (is_a <bb_vec_info> (vinfo))
   {
 /* Mark unsupported statement as unvectorizable.  */
 STMT_VINFO_VECTORIZABLE

Re: [PATCH] Unswitching outer loops.

2015-10-06 Thread Richard Biener

On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev  wrote:
> Thanks Richard.
> I'd like to answer on your last comment related to using of exit edge
> argument for edge that skips loop.
> Let's consider the following test-case:
>
> #include <stdlib.h>
> #define N 32
> float *foo(int ustride, int size, float *src)
> {
>float *buffer, *p;
>int i, k;
>
>if (!src)
> return NULL;
>
>buffer = (float *) malloc(N * size * sizeof(float));
>
>if(buffer)
>   for(i=0, p=buffer; i for(k=0; k  *p++ = src[k];
>
>return buffer;
> }
>
> Before adding new edge we have in post-header bb:
>   :
>   # _6 = PHI <0B(8), buffer_20(16)>
>   return _6;
>
> It is clear that we must preserve function semantic and transform it to
> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)>

Ah, yeah.  I was confusing the loop exit of the inner vs. the outer loop.
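
To make the point about the skipped loop concrete, here is a source-level sketch of the transformation.  The loop shape (bounds N and size, src advanced by ustride) is an assumption about the test case, and foo_original / foo_unswitched are hypothetical names; the real transformation happens on GIMPLE, not on source.

```cpp
#include <cassert>
#include <cstdlib>

const int N = 32;

// Assumed shape of the test case: the inner loop carries an implicit
// "size > 0" guard that does not depend on the outer loop.
float *foo_original(int ustride, int size, float *src)
{
  if (!src)
    return nullptr;

  float *buffer = static_cast<float *>(std::malloc(N * size * sizeof(float)));

  if (buffer)
    {
      float *p = buffer;
      for (int i = 0; i < N; i++, src += ustride)
        for (int k = 0; k < size; k++)
          *p++ = src[k];
    }

  return buffer;
}

// Hand-written equivalent after hoisting the invariant guard out of the
// outer loop.  On the path that skips the loop nest the function must
// still return BUFFER, which is why the post-header PHI needs the same
// buffer value on the new edge rather than a null pointer.
float *foo_unswitched(int ustride, int size, float *src)
{
  if (!src)
    return nullptr;

  float *buffer = static_cast<float *>(std::malloc(N * size * sizeof(float)));

  if (buffer && size > 0)  // hoisted guard
    {
      float *p = buffer;
      for (int i = 0; i < N; i++, src += ustride)
        {
          int k = 0;
          do
            *p++ = src[k];
          while (++k < size);
        }
    }

  return buffer;  // same value whether or not the loop nest ran
}
```

Both functions fill the buffer identically for positive sizes, and both hand back the allocated buffer when the loop nest is skipped.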

Richard.

>
> 2015-10-05 13:57 GMT+03:00 Richard Biener :
>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard,
>>>
>>> I re-designed outer loop unswitching using basic idea of 23855 patch -
>>> hoist invariant guard if loop is empty without guard. Note that this
>>> was added to loop unswitching pass with simple modifications - using
>>> another loop iterator etc.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>> What is your opinion?
>>
>> Overall it looks good.  Some comments below - a few more testcases would
>> be nice as well.
>>
>> +  /* Loop must not be infinite.  */
>> +  if (!finite_loop_p (loop))
>> +return false;
>>
>> why's that?
>>
>> +  body = get_loop_body_in_dom_order (loop);
>> +  for (i = 0; i < loop->num_nodes; i++)
>> +{
>> +  if (body[i]->loop_father != loop)
>> +   continue;
>> +  if (!empty_bb_without_guard_p (loop, body[i]))
>>
>> I wonder if there is a better way to iterate over the interesting
>> blocks and PHIs
>> we need to check for side-effects (and thus we maybe can avoid gathering
>> the loop in DOM order).
>>
>> +  FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
>> +   {
>> + if (may_be_used_outside
>>
>> may_be_used_outside can be hoisted above the loop.  I wonder if we can take
>> advantage of loop-closed SSA form here (and the fact we have a single exit
>> from the loop).  Iterating over exit dest PHIs and determining whether the
>> exit edge DEF is inside the loop part it may not be should be enough.
>>
>> +  gcc_assert (single_succ_p (pre_header));
>>
>> that should be always true.
>>
>> +  gsi_remove (, false);
>> +  bb = guard->dest;
>> +  remove_edge (guard);
>> +  /* Update dominance for destination of GUARD.  */
>> +  if (EDGE_COUNT (bb->preds) == 0)
>> +{
>> +  basic_block s_bb;
>> +  gcc_assert (single_succ_p (bb));
>> +  s_bb = single_succ (bb);
>> +  delete_basic_block (bb);
>> +  if (single_pred_p (s_bb))
>> +   set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
>>
>> all this massaging should be simplified by leaving it to CFG cleanup by
>> simply adjusting the CONDs condition to always true/false.  There is
>> gimple_cond_make_{true,false} () for this (would be nice to have a variant
>> taking a bool).
>>
>> +  new_edge = make_edge (pre_header, exit->dest, flags);
>> +  if (fix_dom_of_exit)
>> +set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
>> +  update_stmt (gsi_stmt (gsi));
>>
>> the update_stmt should not be necessary, it's done by gsi_insert_after already.
>>
>> +  /* Add NEW_ADGE argument for all phi in post-header block.  */
>> +  bb = exit->dest;
>> +  for (gphi_iterator gsi = gsi_start_phis (bb);
>> +   !gsi_end_p (gsi); gsi_next ())
>> +{
>> +  gphi *phi = gsi.phi ();
>> +  /* edge_iterator ei; */
>> +  tree arg;
>> +  if (virtual_operand_p (gimple_phi_result (phi)))
>> +   {
>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>> +   }
>> +  else
>> +   {
>> + /* Use exit edge argument.  */
>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
>>
>> Hum.  How is it ok to use the exit edge argument for the edge that skips
>> the loop?  Why can't you always use the pre-header edge value?
>> That is, if we have
>>
>>  for(i=0;i>{
>>  if (n > 0)
>> {
>>  for (;;)
>>{
>>}
>>  }
>>}
>>   ... = i;
>>
>> then i is used after the loop and the correct value to use if
>> n > 0 is false is '0'.  Maybe this way we can also relax
>> what check_exit_phi does?  IMHO the only restriction is
>> if sth defined inside the loop before the header check for
>> the inner loop is used after the loop.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>>
>>> ChangeLog:
>>> 2015-09-30  Yuri Rumyantsev  
>>>

Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-06 Thread Eric Botcazou

> The attached patch is to fix ICEs for new pr65345-[45].c tests
> on sh4-unknown-linux-gnu.  It's a mechanical change referring
> to the original i386 patch, though I'm not sure the change
> for TARGET_EXPR is really needed for SH targets.

No, it is not if you don't need to make the variable addressable.  In any 
case, the failure mode is another ICE in make_decl_rtl so easy to spot.

-- 
Eric Botcazou

Re: Cache reals for 1/4, 1/6 and 1/9

2015-10-06 Thread Richard Biener

On Mon, Oct 5, 2015 at 4:47 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 1, 2015 at 3:59 PM, Bernd Schmidt  wrote:
>>> On 10/01/2015 03:51 PM, Richard Sandiford wrote:

 We have a global 1/2 and a cached 1/3, but recalculate 1/4, 1/6 and 1/9
 each time we need them.  That seems a bit arbitrary and makes the folding
 code more noisy (especially once it's moved to match.pd).

 This patch caches the other three constants too.  Bootstrapped &
 regression-tested on x86_64-linux-gnu.  OK to install?
>>>
>>>
>>> Looks reasonable enough.
>>
>> Given
>>
>> /* Returns the special REAL_VALUE_TYPE corresponding to 1/3.  */
>>
>> const REAL_VALUE_TYPE *
>> dconst_third_ptr (void)
>> {
>>   static REAL_VALUE_TYPE value;
>>
>>   /* Initialize mathematical constants for constant folding builtins.
>>  These constants need to be given to at least 160 bits precision.  */
>>   if (value.cl == rvc_zero)
>> {
>>   real_arithmetic (&value, RDIV_EXPR, &dconst1, real_digit (3));
>> }
>>   return &value;
>> }
>>
>> I wonder if it makes sense to have
>>
>> template <int a, int b>
>> const REAL_VALUE_TYPE &
>> dconst (void)
>> {
>>   static REAL_VALUE_TYPE value;
>>   if (value.cl == rvc_zero)
>> real_arithmetic (&value, RDIV_EXPR, real_digit (a), real_digit (b));
>>   return value;
>> }
>>
>> instead which allows us to use
>>
>>   dconst<1,2>()
>>
>> in place of dconst_half () and allows arbitrary extra cached constants to be
>> added (well, double-check that, but I think the function static should be
>> a .comdat).
>
> You suggested on IRC that we do the same for the integral constants,
> so e.g. dconst0 becomes dconst<0> ().  Here's the result.  Like I said,
> I think this may be a case of "be careful what you wish for".
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.  Also tested by
> building one target per CPU directory and checking that there were no
> new warnings and no changes in testsuite output at -O2.  OK to install?

Ok.  Note that if people don't like the dconst<1,2> () syntax we could wrap
it in a macro

#define DCONST(a, args...) dconst<a, ## args> ()

and use DCONST(1, 2) (well, if I got the variadic macro syntax right and if
that's a required feature in C++04, it's in C++11 at least).
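
For illustration, the template-plus-macro idea can be sketched outside GCC like this.  Everything here is a stand-in: real_sketch replaces REAL_VALUE_TYPE, long double division replaces real_arithmetic, the counter exists only to show the caching, and the macro uses the standard C++11 __VA_ARGS__ form rather than the GNU named-variadic form.

```cpp
#include <cassert>

// Stand-in for REAL_VALUE_TYPE.  The 'initialized' flag plays the role
// of the "value.cl == rvc_zero" test in the quoted code.
struct real_sketch
{
  bool initialized = false;
  long double value = 0;
};

// Counts how often a constant is actually computed (illustration only).
static int dconst_computations = 0;

// One function-local static per (a, b) instantiation, filled in on
// first use and returned by reference afterwards.
template <int a, int b = 1>
const real_sketch &
dconst_sketch ()
{
  static real_sketch value;
  if (!value.initialized)
    {
      value.value = static_cast<long double> (a) / b;
      value.initialized = true;
      ++dconst_computations;
    }
  return value;
}

// Wrapper for people who dislike the template syntax at call sites.
#define DCONST_SKETCH(...) dconst_sketch<__VA_ARGS__> ()
```

DCONST_SKETCH(1, 2) and dconst_sketch<1, 2> () refer to the same cached object, and a new fraction needs no new declaration, only a new instantiation.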

I wonder if C++23 will finally get sth like Scheme's define-syntax ;)

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/ada/
> * gcc-interface/trans.c (convert_with_check): Use dconst template
> instead of static variables.
>
> gcc/c-family/
> * c-common.c (c_common_truthvalue_conversion): Use dconst template
> instead of static variables.
> * c-lex.c (interpret_float): Likewise.
> * c-ubsan.c (ubsan_instrument_division): Likewise.
>
> gcc/java/
> * decl.c (java_init_decl_processing): Use dconst template instead
> of static variables.
>
> gcc/
> * real.h (dconst0, dconst1, dconst2, dconstm1, dconsthalf): Delete.
> (dconst_third, dconst_third_ptr): Delete.
> (real_from_fraction): Declare.
> (dconst): New function.
> * real.c (real_from_fraction): New function.
> (real_digit, dconst_third_ptr): Delete.
> (exact_real_inverse, real_to_decimal_for_mode, decimal_integer_string)
> (ten_to_mptwo, times_pten): Use dconst instead of real_digit.
> (real_powi, real_floor, real_ceil, real_round): Use dconst
> instead of static variables.
> * emit-rtl.c (dconst0, dconst1, dconst2, dconstm1, dconsthalf): Delete.
> (init_emit_once): Don't initialize them.
> * builtins.c (fold_builtin_sqrt, fold_builtin_cbrt): Use dconst
> instead of static variables.  Also use dconst<1, 6> and dconst<1, 9>
> instead of deriving them from dconst_third.
> (expand_builtin_cexpi, expand_builtin_signbit, fold_builtin_cabs)
> (fold_builtin_pow, fold_builtin_powi, fold_builtin_signbit)
> (fold_builtin_modf, fold_builtin_classify, fold_builtin_fpclassify)
> (fold_builtin_1, fold_builtin_2): Use dconst instead of static
> variables.
> * doc/match-and-simplify.texi: Likewise (in examples).
> * config/aarch64/aarch64.c (aarch64_float_const_zero_rtx_p): Likewise.
> * config/c6x/c6x.md (divsf3, divdf3): Likewise.
> * config/fr30/fr30.c (fr30_const_double_is_zero): Likewise.
> * config/i386/i386.c (standard_80387_constant_p): Likewise.
> (ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse)
> (ix86_expand_convert_sign_didf_sse, ix86_expand_convert_uns_sisf_sse)
> (ix86_expand_vector_convert_uns_vsivsf): Likewise.
> (ix86_expand_adjust_ufix_to_sfix_si, ix86_emit_i387_round): Likewise.
> (ix86_emit_swsqrtsf, ix86_gen_TWO52, ix86_expand_lround): Likewise.
> (ix86_expand_floorceildf_32, ix86_expand_floorceil): Likewise.
>

Re: [PATCH, i386] Introduce switch for Skylake Server CPU.

2015-10-06 Thread Uros Bizjak

On Tue, Oct 6, 2015 at 9:09 AM, Kirill Yukhin  wrote:
> Hello Uroš,
>
> I've merged two patches together and rebased it
> on top of gcc-5-branch. The only change I made compared
> to trunk version is scheduling set to CPU_NEHALEM since
> CPU_HASWELL is not supported in gcc-5.
>
> Bootstrapped.
>
> Is it ok for gcc-5-branch?
>
> gcc/
> * config.gcc: Support "skylake-avx512".
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> PROCESSOR_SKYLAKE_AVX512.
> * config/i386/i386.c (m_SKYLAKE_AVX512): Define.
> (processor_target_table): Add "skylake-avx512".
> (PTA_SKYLAKE): Define.
> (ix86_option_override_internal): Add "skylake-avx512".
> (fold_builtin_cpu): Handle "skylake-avx512".
> * config/i386/i386.h (TARGET_SKYLAKE_AVX512): Define.
> (processor_type): Add PROCESSOR_SKYLAKE_AVX512.
> * doc/invoke.texi (skylake-avx512): New.
>
> libgcc/
> * libgcc/config/i386/cpuinfo.c (get_intel_cpu): Detect
> "skylake-avx512", AES, PCLMUL, AVX-512VL, AVX-512BW, AVX-512DQ,
> AVX-512PF, AVX-512ER, AVX-512CD.
>
> gcc/testsuite/
> * gcc.target/i386/builtin_target.c: Add check for "skylake-avx512",
> "aes" and "pclmul".
> * gcc.target/i386/funcspec-5.c: Test avx512vl, avx512bw,
> avx512dq, avx512cd, avx512er, avx512pf and skylake-avx512.

OK together with a followup testsuite patch.

Thanks,
Uros.

> --
> Thanks, K
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index c835734..207fc65 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -589,8 +589,8 @@ pentium4 pentium4m pentiumpro prescott"
>  x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
>  bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
>  core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
> -sandybridge ivybridge haswell broadwell bonnell silvermont knl x86-64 \
> -native"
> +sandybridge ivybridge haswell broadwell bonnell silvermont knl \
> +skylake-avx512 x86-64 native"
>
>  # Additional x86 processors supported by --with-cpu=.  Each processor
>  # MUST be separated by exactly one space.
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index f3f90df..4f20e14 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -185,6 +185,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>def_or_undef (parse_in, "__knl");
>def_or_undef (parse_in, "__knl__");
>break;
> +case PROCESSOR_SKYLAKE_AVX512:
> +  def_or_undef (parse_in, "__skylake_avx512");
> +  def_or_undef (parse_in, "__skylake_avx512__");
> +  break;
>  /* use PROCESSOR_max to not set/unset the arch macro.  */
>  case PROCESSOR_max:
>break;
> @@ -294,6 +298,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>  case PROCESSOR_KNL:
>def_or_undef (parse_in, "__tune_knl__");
>break;
> +case PROCESSOR_SKYLAKE_AVX512:
> +  def_or_undef (parse_in, "__tune_skylake_avx512__");
> +  break;
>  case PROCESSOR_INTEL:
>  case PROCESSOR_GENERIC:
>break;
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9b17256..43e6f91 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2054,6 +2054,7 @@ const struct processor_costs *ix86_cost = _cost;
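>  #define m_BONNELL (1<<PROCESSOR_BONNELL)
>  #define m_SILVERMONT (1<<PROCESSOR_SILVERMONT)
>  #define m_KNL (1<<PROCESSOR_KNL)
> +#define m_SKYLAKE_AVX512 (1<<PROCESSOR_SKYLAKE_AVX512)
>  #define m_INTEL (1<<PROCESSOR_INTEL)
>
>  #define m_GEODE (1<<PROCESSOR_GEODE)
> @@ -2522,6 +2523,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] =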
>{"bonnell", _cost, 16, 15, 16, 7, 16},
>{"silvermont", _cost, 16, 15, 16, 7, 16},
>{"knl", _cost, 16, 15, 16, 7, 16},
> +  {"skylake-avx512", _cost, 16, 10, 16, 10, 16},
>{"intel", _cost, 16, 15, 16, 7, 16},
>{"geode", _cost, 0, 0, 0, 0, 0},
>{"k6", _cost, 32, 7, 32, 7, 32},
> @@ -3210,6 +3212,10 @@ ix86_option_override_internal (bool main_args_p,
> | PTA_FMA | PTA_MOVBE | PTA_HLE)
>  #define PTA_BROADWELL \
>(PTA_HASWELL | PTA_ADX | PTA_PRFCHW | PTA_RDSEED)
> +#define PTA_SKYLAKE_AVX512 \
> +  (PTA_BROADWELL | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES \
> +   | PTA_AVX512F | PTA_AVX512CD | PTA_AVX512VL \
> +   | PTA_AVX512BW | PTA_AVX512DQ)
>  #define PTA_KNL \
>(PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD)
>  #define PTA_BONNELL \
> @@ -3271,6 +3277,7 @@ ix86_option_override_internal (bool main_args_p,
>{"haswell", PROCESSOR_HASWELL, CPU_NEHALEM, PTA_HASWELL},
>{"core-avx2", PROCESSOR_HASWELL, CPU_NEHALEM, PTA_HASWELL},
>{"broadwell", PROCESSOR_HASWELL, CPU_NEHALEM, PTA_BROADWELL},
> +  {"skylake-avx512", PROCESSOR_SKYLAKE_AVX512, CPU_NEHALEM, PTA_SKYLAKE_AVX512},
>

Re: [PATCH, obvious, AVX-512] Add missing AVX-512 features detection.

2015-10-06 Thread Kirill Yukhin

Hi Richard,
On 06 Oct 09:36, Richard Biener wrote:
> The test now execute FAILs for me:
> 
> FAIL: gcc.target/i386/builtin_target.c execution test
> 
> I have family 6, model 94
Wow, Skylake!

Fixed.  The AVX-512VBMI bit lives in ecx, not in ebx like the rest of the AVX-512 bits.

gcc/testsuite/
* gcc.target/i386/builtin_target.c: Fix AVX-512VBMI detection.

I've checked the test on a Skylake machine and it passes now.
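
The register mix-up can be shown without executing CPUID at all.  The sketch below works on register values passed in by the caller; cpuid7_regs and the kBit* constants are names local to this example (the test itself uses the bit_* macros), and the bit positions are the ones documented for CPUID leaf 7, subleaf 0.

```cpp
#include <cassert>
#include <cstdint>

// Register values as returned by CPUID leaf 7, subleaf 0.
struct cpuid7_regs
{
  std::uint32_t ebx;
  std::uint32_t ecx;
};

// Bit positions for leaf 7, subleaf 0 (names are local to this sketch).
constexpr std::uint32_t kBitAVX512DQ = 1u << 17;    // reported in EBX
constexpr std::uint32_t kBitAVX512IFMA = 1u << 21;  // reported in EBX
constexpr std::uint32_t kBitAVX512VBMI = 1u << 1;   // reported in ECX

inline bool has_avx512dq(const cpuid7_regs& r)
{
  return (r.ebx & kBitAVX512DQ) != 0;
}

inline bool has_avx512ifma(const cpuid7_regs& r)
{
  return (r.ebx & kBitAVX512IFMA) != 0;
}

// Correct check: VBMI is looked up in ECX.
inline bool has_avx512vbmi(const cpuid7_regs& r)
{
  return (r.ecx & kBitAVX512VBMI) != 0;
}

// The mistaken form: the same mask applied to EBX tests an unrelated
// feature bit, so it can fire on CPUs without VBMI.
inline bool has_avx512vbmi_wrong_register(const cpuid7_regs& r)
{
  return (r.ebx & kBitAVX512VBMI) != 0;
}
```

A CPU that sets bit 1 of EBX but not bit 1 of ECX passes the wrong-register check while __builtin_cpu_supports ("avx512vbmi") correctly reports false, which matches the execution failure reported above.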

--
Thanks, K

Index: gcc/testsuite/gcc.target/i386/builtin_target.c
===
--- gcc/testsuite/gcc.target/i386/builtin_target.c  (revision 228513)
+++ gcc/testsuite/gcc.target/i386/builtin_target.c  (working copy)
@@ -211,7 +211,7 @@
assert (__builtin_cpu_supports ("avx512dq"));
   if (ebx & bit_AVX512IFMA)
assert (__builtin_cpu_supports ("avx512ifma"));
-  if (ebx & bit_AVX512VBMI)
+  if (ecx & bit_AVX512VBMI)
assert (__builtin_cpu_supports ("avx512vbmi"));
 }
 }

Re: Generalize gimple_val_nonnegative_real_p

2015-10-06 Thread Richard Biener

On Mon, Oct 5, 2015 at 5:02 PM, Richard Sandiford
 wrote:
> The upcoming patch to move sqrt and cbrt simplifications to match.pd
> caused a regression because the (abs @0)->@0 simplification didn't
> trigger for:
>
> (abs (convert (abs X)))
>
> The simplification is based on tree_expr_nonnegative_p, which is
> pretty weak for gimple (it gives up if it sees an SSA_NAME).
>
> We have the stronger gimple_val_nonnegative_real_p, but (a) as its
> name implies, it's specific to reals and (b) in its current form it
> doesn't handle converts.  This patch:
>
> - generalises the routine to all types
> - reuses tree_{unary,binary,call}_nonnegative_warnv_p for the leaf cases
> - makes the routine handle CONVERT_EXPR
> - allows a nesting depth of 1 for CONVERT_EXPR
> - uses the routine instead of tree_expr_nonnegative_p for gimple.
>
> Limiting the depth to 1 is a little arbitrary but adding a param seemed
> over the top.
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.  I didn't write
> a specific test because this is already covered by the testsuite if
> the follow-on patch is also applied.  OK to install?

Hmm.  I don't like having essentially two copies of the same machinery.
Can you instead fold gimple_val_nonnegative_real_p into a
tree_ssa_name_nonnegative_warnv_p used by tree_expr_nonnegative_warnv_p?
For integers it's also possible to look at SSA name range info.
You'd still limit recursion appropriately (by passing down a depth arg
everywhere, defaulted to 0 I guess).

Note that the comment in gimple_val_nonnegative_real_p is correct in that
we really shouldn't recurse (but maybe handle fixed patterns - like you do here)
as the appropriate way is to have a "nonnegative" lattice.  SSA name range info
may already provide enough info here (well, not for reals - time to add basic
real range support to VRP!).
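
A toy version of the depth-limited walk might look like the following.  This is only a sketch on a made-up expression type: expr_sketch, op_kind and nonnegative_p_sketch are hypothetical, the conversion is assumed to be sign-preserving, and the x * x rule is meant for floating-point style reasoning where overflow is not a concern.

```cpp
#include <cassert>

// Made-up expression kinds standing in for tree codes.
enum class op_kind
{
  abs_op,            // ABS_EXPR-like: always non-negative
  convert_op,        // CONVERT_EXPR-like: assumed sign-preserving here
  mult_op,           // MULT_EXPR-like
  leaf_nonneg_const, // a constant known to be >= 0
  leaf_unknown       // anything we know nothing about
};

struct expr_sketch
{
  op_kind kind;
  const expr_sketch *op0;
  const expr_sketch *op1;
};

// Only conversions are allowed to recurse, and only this many levels.
const unsigned max_convert_depth = 1;

// DEPTH is passed down by every recursive call and defaults to 0 at the
// top level, so callers do not have to care about it.
inline bool
nonnegative_p_sketch (const expr_sketch *e, unsigned depth = 0)
{
  switch (e->kind)
    {
    case op_kind::abs_op:
    case op_kind::leaf_nonneg_const:
      return true;

    case op_kind::mult_op:
      // x * x is non-negative when both operands are the same value.
      return e->op0 == e->op1;

    case op_kind::convert_op:
      if (depth >= max_convert_depth)
        return false;
      return nonnegative_p_sketch (e->op0, depth + 1);

    case op_kind::leaf_unknown:
      return false;
    }
  return false;
}
```

With a limit of 1, (convert (abs X)) is recognised but (convert (convert (abs X))) is not, which bounds the work per query at the cost of missing deeper chains.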

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * gimple-fold.h (gimple_val_nonnegative_real_p): Replace with...
> (gimple_val_nonnegative_p): ...this new function.
> * gimple-fold.c (gimple_val_nonnegative_real_p): Replace with...
> (gimple_val_nonnegative_p): ...this new function.  Add a nesting
> depth.  Handle conversions and allow them to be nested to a depth
> of 1.  Generalize to non-reals.  Use tree_binary_nonnegative_warnv_p,
> tree_unary_nonnegative_warnv_p and tree_call_nonnegative_warnv_p.
> * tree-ssa-math-opts.c (gimple_expand_builtin_pow): Update accordingly.
> * match.pd (nonnegative_p): New predicate.  Use it instead of
> tree_expr_nonnegative_p to detect redundant abs expressions.
>
> Index: a/gcc/gimple-fold.c
> ===
> *** a/gcc/gimple-fold.c
> --- b/gcc/gimple-fold.c
> *** gimple_get_virt_method_for_binfo (HOST_WIDE_INT token, tree known_binfo,
> *** 5773,5787 
>   }
>
>   /* Return true iff VAL is a gimple expression that is known to be
> !non-negative.  Restricted to floating-point inputs.  */
>
>   bool
> ! gimple_val_nonnegative_real_p (tree val)
>   {
> gimple *def_stmt;
>
> -   gcc_assert (val && SCALAR_FLOAT_TYPE_P (TREE_TYPE (val)));
> -
> /* Use existing logic for non-gimple trees.  */
> if (tree_expr_nonnegative_p (val))
>   return true;
> --- 5773,5785 
>   }
>
>   /* Return true iff VAL is a gimple expression that is known to be
> !non-negative.  DEPTH is the nesting depth.  */
>
>   bool
> ! gimple_val_nonnegative_p (tree val, unsigned int depth)
>   {
> gimple *def_stmt;
>
> /* Use existing logic for non-gimple trees.  */
> if (tree_expr_nonnegative_p (val))
>   return true;
> *** gimple_val_nonnegative_real_p (tree val)
> *** 5789,5906 
> if (TREE_CODE (val) != SSA_NAME)
>   return false;
>
> !   /* Currently we look only at the immediately defining statement
> !  to make this determination, since recursion on defining
> !  statements of operands can lead to quadratic behavior in the
> !  worst case.  This is expected to catch almost all occurrences
> !  in practice.  It would be possible to implement limited-depth
> !  recursion if important cases are lost.  Alternatively, passes
> !  that need this information (such as the pow/powi lowering code
> !  in the cse_sincos pass) could be revised to provide it through
>dataflow propagation.  */
>
> def_stmt = SSA_NAME_DEF_STMT (val);
>
> if (is_gimple_assign (def_stmt))
>   {
> !   tree op0, op1;
> !
> !   /* See fold-const.c:tree_expr_nonnegative_p for additional
> !cases that could be handled with recursion.  */
> !
> !   switch (gimple_assign_rhs_code (def_stmt))
> {
> !   case ABS_EXPR:
> ! /* Always true for floating-point operands.  */
> ! return true;
> !
> !   case MULT_EXPR:
> ! /* True if the two operands are identical (since we are
> !

Re: [AARCH64] Add missing entries in iterator vwcore

2015-10-06 Thread James Greenhalgh

On Tue, Oct 06, 2015 at 02:40:26AM +0100, Kugan wrote:
> 
> 
> On 05/10/15 21:33, James Greenhalgh wrote:
> > On Thu, Oct 01, 2015 at 09:41:20PM +0100, Kugan wrote:
> >> Hi,
> >>
> >> In "aarch64_get_lane" operand 0 is VEL, so  for %0,
> >> iterator vwcore should (?) support all the modes in VEL.
> >>
> >> Ran into following error with a local patch for an existing test case.
> >> However it can also be reproduced with the attached test case.
> >>
> >> function 'fn1':
> >> t.c:25:1: internal compiler error: output_operand: invalid %-code
> >>  }
> >>  ^
> >> 0x8198fb output_operand_lossage(char const*, ...)
> >>../../base/gcc/final.c:3417
> >> 0x81a45b output_asm_insn(char const*, rtx_def**)
> >>../../base/gcc/final.c:3782
> >> 0x81b9d3 output_asm_insn(char const*, rtx_def**)
> >>../../base/gcc/final.c:2364
> >> 0x81b9d3 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
> >>../../base/gcc/final.c:3029
> >> 0x81be2b final(rtx_insn*, _IO_FILE*, int)
> >>../../base/gcc/final.c:2058
> >> 0x81c6e7 rest_of_handle_final
> >>../../base/gcc/final.c:4449
> >> 0x81c6e7 execute
> >>../../base/gcc/final.c:4524
> >>
> >>
> >> Attached patch fixes this. Bootstrapped and regression tested for
> >> aarch64-none-linux-gnu with no new regression. Is this OK for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >> 2015-10-02  Kugan Vivekanandarajah  
> >>
> >>* config/aarch64/iterators.md: Add missing core element mode for
> >> mode.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2015-10-02  Kugan Vivekanandarajah  
> >>
> >>* gcc.target/aarch64/foo.c: New test.
> >>
> > 
> > "foo.c" is not OK, please give this testcase a meaningful name.
> > 
> Renamed the test case.
> 
> Is this OK now?

It still doesn't quite look right. For one, the attribute causing the ICE
is "vwcore" not "vcore".

How about calling the test gcc.target/aarch64/get_lane_f16_1.c ?

> gcc/ChangeLog:
> 
> 2015-10-06  Kugan Vivekanandarajah  
> 
>   * config/aarch64/iterators.md: Add missing core element mode for
>mode.

This ChangeLog entry is incomplete:

* config/aarch64/iterators.md (vwcore): Add missing cases for
 V4HF/V8HF modes.

> 
> gcc/testsuite/ChangeLog:
> 
> 2015-10-06  Kugan Vivekanandarajah  
> 
>   * gcc.target/aarch64/vcore_ice_test.c: New test.
> 

Please remember to also update this with the new test name I suggested
above.

OK with those changes.

Thanks,
James

Re: Move sqrt and cbrt simplifications to match.pd

2015-10-06 Thread Marc Glisse


On Mon, 5 Oct 2015, Richard Sandiford wrote:


+  /* cbrt(sqrt(x)) -> pow(x,1/6).  */
+  (simplify
+   (sqrts (cbrts @0))
+   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))
+  /* sqrt(cbrt(x)) -> pow(x,1/6).  */
+  (simplify
+   (cbrts (sqrts @0))
+   (pows @0 { build_real_truncate (type, dconst<1, 6> ()); }))


I think you swapped the comments (not that it matters).

--
Marc Glisse

Re: Do not compare types in operands_equal_p if OEP_ADDRESS_OF is set

2015-10-06 Thread Eric Botcazou

> I also disabled type matching done by operand_equal_p and cleaned up the
> conditional of MEM_REF into multiple ones - for example it was passing
> OEP_ADDRESS_OF when comparing TYPE_SIZE which is quite a nonsense.
> 
> I wonder what to do about OEP_CONSTANT_ADDRESS_OF.  This flag does not seem
> to be used at all in current tree nor documented somehow.

It is used and (un-)documented as OEP_ADDRESS_OF, see the ADDR_EXPR case:

 case ADDR_EXPR:
 return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0),
TREE_CONSTANT (arg0) && TREE_CONSTANT (arg1)
? OEP_CONSTANT_ADDRESS_OF | OEP_ADDRESS_OF : 0);

So it's OEP_ADDRESS_OF but for constant addresses.
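
The flag selection in that ADDR_EXPR case can be isolated as below.  This
is a standalone sketch: the enumerator values and the helper name
addr_expr_operand_flags are assumptions made for illustration and are not
taken from GCC's headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only.  */
enum oep_sketch_flags
{
  OEP_CONSTANT_ADDRESS_OF = 4,
  OEP_ADDRESS_OF = 8
};

/* Mirror the conditional quoted above: both flags are passed down when
   both ADDR_EXPRs are constant, otherwise no flags at all.  */
unsigned int
addr_expr_operand_flags (bool arg0_constant, bool arg1_constant)
{
  return (arg0_constant && arg1_constant
          ? OEP_CONSTANT_ADDRESS_OF | OEP_ADDRESS_OF
          : 0);
}
```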

-- 
Eric Botcazou

Re: Move sqrt and cbrt simplifications to match.pd

2015-10-06 Thread Richard Biener

On Mon, Oct 5, 2015 at 5:17 PM, Richard Sandiford
 wrote:
> This patch moves the sqrt and cbrt simplification rules to match.pd.
> builtins.c now only does the constant folding.
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Ok (once prerequisites are approved).

People may notice that on GENERIC we no longer simplify these - this is because
genmatch doesn't output a toplevel entry for simplifying calls (well, I was
lazy; it would need to have hooked into builtins.c folding somewhere, but IMHO
GENERIC call folding should go).

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * builtins.c (fold_builtin_sqrt, fold_builtin_cbrt): Delete.
> (fold_builtin_1): Update accordingly.  Handle constant arguments here.
> * match.pd: Add rules previously handled by fold_builtin_sqrt
> and fold_builtin_cbrt.
>
> gcc/testsuite/
> * gcc.dg/builtins-47.c: Test the optimized dump instead.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 85ba6dd..3df60e8 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -158,8 +158,6 @@ static bool integer_valued_real_p (tree);
>  static tree fold_trunc_transparent_mathfn (location_t, tree, tree);
>  static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
> -static tree fold_builtin_sqrt (location_t, tree, tree);
> -static tree fold_builtin_cbrt (location_t, tree, tree);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_cos (location_t, tree, tree, tree);
> @@ -7706,145 +7704,6 @@ fold_builtin_cproj (location_t loc, tree arg, tree 
> type)
>return NULL_TREE;
>  }
>
> -/* Fold a builtin function call to sqrt, sqrtf, or sqrtl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_sqrt (location_t loc, tree arg, tree type)
> -{
> -
> -  enum built_in_function fcode;
> -  tree res;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_sqrt, <0> (), NULL, true)))
> -return res;
> -
> -  /* Optimize sqrt(expN(x)) = expN(x*0.5).  */
> -  fcode = builtin_mathfn_code (arg);
> -  if (flag_unsafe_math_optimizations && BUILTIN_EXPONENT_P (fcode))
> -{
> -  tree expfn = TREE_OPERAND (CALL_EXPR_FN (arg), 0);
> -  arg = fold_build2_loc (loc, MULT_EXPR, type,
> -CALL_EXPR_ARG (arg, 0),
> -build_real (type, dconst<1, 2> ()));
> -  return build_call_expr_loc (loc, expfn, 1, arg);
> -}
> -
> -  /* Optimize sqrt(Nroot(x)) -> pow(x,1/(2*N)).  */
> -  if (flag_unsafe_math_optimizations && BUILTIN_ROOT_P (fcode))
> -{
> -  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
> -
> -  if (powfn)
> -   {
> - tree arg0 = CALL_EXPR_ARG (arg, 0);
> - tree arg1 = (BUILTIN_SQRT_P (fcode)
> -  ? build_real (type, dconst<1, 4> ())
> -  : build_real_truncate (type, dconst<1, 6> ()));
> - return build_call_expr_loc (loc, powfn, 2, arg0, arg1);
> -   }
> -}
> -
> -  /* Optimize sqrt(pow(x,y)) = pow(|x|,y*0.5).  */
> -  if (flag_unsafe_math_optimizations
> -  && (fcode == BUILT_IN_POW
> - || fcode == BUILT_IN_POWF
> - || fcode == BUILT_IN_POWL))
> -{
> -  tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg), 0);
> -  tree arg0 = CALL_EXPR_ARG (arg, 0);
> -  tree arg1 = CALL_EXPR_ARG (arg, 1);
> -  tree narg1;
> -  if (!tree_expr_nonnegative_p (arg0))
> -   arg0 = build1 (ABS_EXPR, type, arg0);
> -  narg1 = fold_build2_loc (loc, MULT_EXPR, type, arg1,
> -  build_real (type, dconst<1, 2> ()));
> -  return build_call_expr_loc (loc, powfn, 2, arg0, narg1);
> -}
> -
> -  return NULL_TREE;
> -}
> -
> -/* Fold a builtin function call to cbrt, cbrtf, or cbrtl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_cbrt (location_t loc, tree arg, tree type)
> -{
> -  const enum built_in_function fcode = builtin_mathfn_code (arg);
> -  tree res;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_cbrt, NULL, NULL, 0)))
> -return res;
> -
> -  if (flag_unsafe_math_optimizations)
> -{
> -  /* Optimize cbrt(expN(x)) -> expN(x/3).  */
> -  if (BUILTIN_EXPONENT_P (fcode))
> -   {
> - tree expfn = TREE_OPERAND (CALL_EXPR_FN (arg), 0);
> - arg = fold_build2_loc (loc, MULT_EXPR, type,
> -CALL_EXPR_ARG (arg, 0),
> -build_real_truncate (type, dconst<1, 3> ()));
> -

RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-10-06 Thread Matthew Fortune

Moore, Catherine  writes:
> The patch itself looks good, but the tests that you added need a little more 
> work.
> 
> I tested with the mips-sde-elf-lite configuration and I'm seeing failures for
> many options.  The main failure mode seems to be that the stack is incremented
> by 24 instead of 32.
> I tried this change in frame-header-1.c and frame-header-3.c:
> 
> /* { dg-final { scan-assembler-not "\taddiu\t\\\$sp,\\\$sp,-8" } } */
> 
> instead of:
> 
> /* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-32" } } */

I'd quite like to be specific about the frame layout we expect as the testcase
is so simple that I think it should be predictable over time. Did you happen to
see a pattern to the failure? i.e. Could it be non-o32 ABIs? I'm not a fan of
scan-assembler-nots in general as it is so easy to change exact output text and
never match them anyway even if the offending instruction is generated.

> There are still errors in frame-header-2.c when compiling with -mips16 and
> -mabi=64 (this one uses a daddiu).
> Also, the tests fail for -flto variants.

Let's just add NOMIPS16 (I think that is the macro) to the functions and lock
the tests down to -mabi=32 which is the only ABI they are valid for anyway.
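
Something along these lines for frame-header-1.c, as an untested sketch.
It assumes NOMIPS16 is supplied by the gcc.target/mips harness (the
fallback definition is only there so the file compiles standalone), it
passes &a to bar, which differs from the posted test, and it does nothing
about the -flto failures.

```c
/* { dg-do compile } */
/* { dg-options "-mabi=32 -mno-frame-header-opt" } */
/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-32" } } */

/* Assumption: the MIPS test harness normally defines NOMIPS16.  This
   empty fallback only keeps the sketch compilable outside it.  */
#ifndef NOMIPS16
#define NOMIPS16
#endif

NOMIPS16 void __attribute__((noinline))
bar (int *a)
{
  *a = 1;
}

NOMIPS16 void
foo (int a)
{
  /* Pass the address of the parameter so the call matches bar's
     prototype.  */
  bar (&a);
}
```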

Matthew

> Would you please fix this up and resubmit?
> 
> Thanks,
> Catherine
> 
> >
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/frame-header-1.c
> > b/gcc/testsuite/gcc.target/mips/frame-header-1.c
> > new file mode 100644
> > index 000..8495e0f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/mips/frame-header-1.c
> > @@ -0,0 +1,21 @@
> > +/* Verify that we do not optimize away the frame header in foo when using
> > +   -mno-frame-header-opt by checking the stack pointer increment done in
> > +   that function.  Without the optimization foo should increment the stack
> > +   by 32 bytes, with the optimization it would only be 8 bytes.  */
> > +
> > +/* { dg-do compile } */
> > +/* { dg-options "-mno-frame-header-opt" } */
> > +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
> > +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-32" } } */
> > +
> > +void __attribute__((noinline))
> > +bar (int* a)
> > +{
> > +  *a = 1;
> > +}
> > +
> > +void
> > +foo (int a)
> > +{
> > +  bar ();
> > +}
> > diff --git a/gcc/testsuite/gcc.target/mips/frame-header-2.c
> > b/gcc/testsuite/gcc.target/mips/frame-header-2.c
> > new file mode 100644
> > index 000..37ea2d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/mips/frame-header-2.c
> > @@ -0,0 +1,21 @@
> > +/* Verify that we do optimize away the frame header in foo when using
> > +   -mframe-header-opt by checking the stack pointer increment done in
> > +   that function.  Without the optimization foo should increment the
> > +   stack by 32 bytes, with the optimization it would only be 8 bytes.
> > +*/
> > +
> > +/* { dg-do compile } */
> > +/* { dg-options "-mframe-header-opt" } */
> > +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
> > +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-8" } } */
> > +
> > +void __attribute__((noinline))
> > +bar (int* a)
> > +{
> > +  *a = 1;
> > +}
> > +
> > +void
> > +foo (int a)
> > +{
> > +  bar ();
> > +}
> > diff --git a/gcc/testsuite/gcc.target/mips/frame-header-3.c
> > b/gcc/testsuite/gcc.target/mips/frame-header-3.c
> > new file mode 100644
> > index 000..1cb1547
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/mips/frame-header-3.c
> > @@ -0,0 +1,22 @@
> > +/* Verify that we do not optimize away the frame header in foo when using
> > +   -mframe-header-opt but are calling a weak function that may be
> > overridden
> > +   by a different function that does need the frame header.  Without the
> > +   optimization foo should increment the stack by 32 bytes, with the
> > +   optimization it would only be 8 bytes.  */
> > +
> > +/* { dg-do compile } */
> > +/* { dg-options "-mframe-header-opt" } */
> > +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
> > +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-32" } } */
> > +
> > +void __attribute__((noinline, weak))
> > +bar (int* a)
> > +{
> > +  *a = 1;
> > +}
> > +
> > +void
> > +foo (int a)
> > +{
> > +  bar ();
> > +}
> > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> > b/gcc/testsuite/gcc.target/mips/mips.exp
> > index 42e7fff..0f2d6a2 100644
> > --- a/gcc/testsuite/gcc.target/mips/mips.exp
> > +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> > @@ -256,6 +256,7 @@ set mips_option_groups {
> >  maddps "HAS_MADDPS"
> >  lsa "(|!)HAS_LSA"
> >  section_start "-Wl,--section-start=.*"
> > +frame-header "-mframe-header-opt|-mno-frame-header-opt"
> >  }
> >
> >  for { set option 0 } { $option < 32 } { incr option } {
> >

Re: Do not compare types in operands_equal_p if OEP_ADDRESS_OF is set

2015-10-06 Thread Richard Biener

On Tue, 6 Oct 2015, Jan Hubicka wrote:

> Hi,
> While looking for uses of useless_type_conversion on non-gimple register types
> I ran across a few that seem to be completely unnecessary and I would like to
> get rid of them in the hope of getting rid of code comparing function/method
> types and possibly more.
> 
> useless_type_conversion is about operations on the types in gimple expressions,
> not about memory accesses nor about function calls.
> 
> The first one is in fold-const.c, which may be used on a MEM_REF of aggregate
> type.  As discussed earlier, the type compare is unnecessary when we only care
> about the address, which seems to be the most common case in which we get into
> this path.
> 
> I also disabled type matching done by operand_equal_p and cleaned up the
> conditional of MEM_REF into multiple ones - for example it was passing
> OEP_ADDRESS_OF when comparing TYPE_SIZE which is quite a nonsense.
> 
> I wonder what to do about OEP_CONSTANT_ADDRESS_OF.  This flag does not seem
> to be used at all in current tree nor documented somehow.

Eric added that.  It's set when seeing ADDR_EXPRs and has an extra
special handling when TREE_SIDE_EFFECTS are tested.  It matters for
Ada I guess.

> I also made operand_equal_p to skip AA compare when -fno-strict-aliasing
> is used.
>
> Bootstrapped/regtested x86_64-linux, OK?

See comments below - otherwise it looks good.

Richard.

> Honza
> 
>   * fold-const.c (operand_equal_p): When in OEP_ADDRESS_OF
>   do not require types to match; also relax checking of MEM_REF.
> Index: fold-const.c
> ===
> --- fold-const.c  (revision 228131)
> +++ fold-const.c  (working copy)
> @@ -2712,26 +2712,31 @@ operand_equal_p (const_tree arg0, const_
>if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
>  return tree_int_cst_equal (arg0, arg1);
>  
> -  /* If both types don't have the same signedness, then we can't consider
> - them equal.  We must check this before the STRIP_NOPS calls
> - because they may change the signedness of the arguments.  As pointers
> - strictly don't have a signedness, require either two pointers or
> - two non-pointers as well.  */
> -  if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
> -  || POINTER_TYPE_P (TREE_TYPE (arg0)) != POINTER_TYPE_P (TREE_TYPE 
> (arg1)))
> -return 0;
> +  if (!(flags & OEP_ADDRESS_OF))
> +{
> +  /* If both types don't have the same signedness, then we can't consider
> +  them equal.  We must check this before the STRIP_NOPS calls
> +  because they may change the signedness of the arguments.  As pointers
> +  strictly don't have a signedness, require either two pointers or
> +  two non-pointers as well.  */
> +  if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE 
> (arg1))
> +   || POINTER_TYPE_P (TREE_TYPE (arg0))
> +  != POINTER_TYPE_P (TREE_TYPE (arg1)))
> + return 0;
>  
> -  /* We cannot consider pointers to different address space equal.  */
> -  if (POINTER_TYPE_P (TREE_TYPE (arg0)) && POINTER_TYPE_P (TREE_TYPE (arg1))
> -  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
> -   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
> -return 0;
> +  /* We cannot consider pointers to different address space equal.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (arg0))
> +   && POINTER_TYPE_P (TREE_TYPE (arg1))
> +   && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
> +   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
> + return 0;
>  
> -  /* If both types don't have the same precision, then it is not safe
> - to strip NOPs.  */
> -  if (element_precision (TREE_TYPE (arg0))
> -  != element_precision (TREE_TYPE (arg1)))
> -return 0;
> +  /* If both types don't have the same precision, then it is not safe
> +  to strip NOPs.  */
> +  if (element_precision (TREE_TYPE (arg0))
> +   != element_precision (TREE_TYPE (arg1)))
> + return 0;

It's odd that you move this under the !OEP_ADDRESS_OF case but
not the STRIP_NOPS itself.

> +}
>  
>STRIP_NOPS (arg0);
>STRIP_NOPS (arg1);
> @@ -2935,27 +2940,34 @@ operand_equal_p (const_tree arg0, const_
>  
>   case TARGET_MEM_REF:
>   case MEM_REF:
> -   /* Require equal access sizes, and similar pointer types.
> -  We can have incomplete types for array references of
> -  variable-sized arrays from the Fortran frontend
> -  though.  Also verify the types are compatible.  */
> -   if (!((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
> -|| (TYPE_SIZE (TREE_TYPE (arg0))
> -&& TYPE_SIZE (TREE_TYPE (arg1))
> -&& operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
> -TYPE_SIZE (TREE_TYPE (arg1)), 
> flags)))
> -   && types_compatible_p

[PATCH] Fix PR67859

2015-10-06 Thread Richard Biener


This fixes an ICE in SCCVN - not sure how we got away with not clearing
new slots...  possibly SSA name recycling triggered.
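
To illustrate the difference outside of GCC: growing a table without
clearing leaves the new slots with indeterminate contents, so a later
"is this slot set?" check can see garbage.  The sketch below uses a plain
malloc-based table; ptr_table and ptr_table_grow_cleared are hypothetical
names, not GCC's vec API, and zero-filling is assumed to produce null
pointers (true on the usual hosts).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable table of pointers for this sketch.  */
struct ptr_table
{
  void **slots;
  size_t length;
};

/* Grow T to at least NEW_LEN slots and zero only the newly added ones,
   leaving existing entries untouched.  Returns 0 on success and -1 if
   the allocation fails (in which case T is unchanged).  */
int
ptr_table_grow_cleared (struct ptr_table *t, size_t new_len)
{
  if (new_len <= t->length)
    return 0;

  void **p = realloc (t->slots, new_len * sizeof *p);
  if (p == NULL)
    return -1;

  memset (p + t->length, 0, (new_len - t->length) * sizeof *p);
  t->slots = p;
  t->length = new_len;
  return 0;
}
```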

Bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-06  Richard Biener  

PR tree-optimization/67859
* tree-ssa-sccvn.c (VN_INFO_GET): Clear new entries.

* gcc.dg/torture/pr67859.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 228514)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -407,7 +407,7 @@ VN_INFO_GET (tree name)
   newinfo = XOBNEW (&vn_ssa_aux_obstack, struct vn_ssa_aux);
   memset (newinfo, 0, sizeof (struct vn_ssa_aux));
   if (SSA_NAME_VERSION (name) >= vn_ssa_aux_table.length ())
-vn_ssa_aux_table.safe_grow (SSA_NAME_VERSION (name) + 1);
+vn_ssa_aux_table.safe_grow_cleared (SSA_NAME_VERSION (name) + 1);
   vn_ssa_aux_table[SSA_NAME_VERSION (name)] = newinfo;
   return newinfo;
 }

Index: gcc/testsuite/gcc.dg/torture/pr67859.c
===
--- gcc/testsuite/gcc.dg/torture/pr67859.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr67859.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int a, b, c;
+
+void
+fn1 ()
+{
+  b = c ? 0 : 1 << a;
+  b |= 0x9D7A5FD9;
+  for (;;)
+{
+  int d = 1;
+  b &= (unsigned) d;
+}
+}

Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-06 Thread Eric Botcazou

> Joseph reminded me that I had forgotten about this patch.  As mentioned
> here , I'm
> removing the XFAILs in the tests so people are likely to see new FAILs.
> 
> I think the following targets will need similar fix as the one below:
> * MIPS
> * rs6000
> * alpha
> * sparc
> * s390
> * arm
> * sh
> * aarch64
> 
> I'm CCing the respective maintainers.  You might want to XFAIL those tests.

Thanks, here are the SPARC bits with an explanation for the other maintainers: 
create_tmp_var_raw must be used instead of create_tmp_var because the hook can 
be invoked outside of a function context; likewise for TREE_ADDRESSABLE vs 
mark_addressable; TARGET_EXPR is needed for variables that are addressable 
(because their address is taken) to force proper gimplification.

Tested on SPARC/Solaris, applied on the mainline.


PR c/65345
* config/sparc/sparc.c (sparc_atomic_assign_expand_fenv): Adjust to
use create_tmp_var_raw rather than create_tmp_var.

-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 228512)
+++ config/sparc/sparc.c	(working copy)
@@ -12540,20 +12540,23 @@ sparc_atomic_assign_expand_fenv (tree *h
 
__builtin_load_fsr (_var);  */
 
-  tree fenv_var = create_tmp_var (unsigned_type_node);
-  mark_addressable (fenv_var);
+  tree fenv_var = create_tmp_var_raw (unsigned_type_node);
+  TREE_ADDRESSABLE (fenv_var) = 1;
   tree fenv_addr = build_fold_addr_expr (fenv_var);
   tree stfsr = sparc_builtins[SPARC_BUILTIN_STFSR];
-  tree hold_stfsr = build_call_expr (stfsr, 1, fenv_addr);
+  tree hold_stfsr
+= build4 (TARGET_EXPR, unsigned_type_node, fenv_var,
+	  build_call_expr (stfsr, 1, fenv_addr), NULL_TREE, NULL_TREE);
 
-  tree tmp1_var = create_tmp_var (unsigned_type_node);
-  mark_addressable (tmp1_var);
+  tree tmp1_var = create_tmp_var_raw (unsigned_type_node);
+  TREE_ADDRESSABLE (tmp1_var) = 1;
   tree masked_fenv_var
 = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_var,
 	  build_int_cst (unsigned_type_node,
 			 ~(accrued_exception_mask | trap_enable_mask)));
   tree hold_mask
-= build2 (MODIFY_EXPR, void_type_node, tmp1_var, masked_fenv_var);
+= build4 (TARGET_EXPR, unsigned_type_node, tmp1_var, masked_fenv_var,
+	  NULL_TREE, NULL_TREE);
 
   tree tmp1_addr = build_fold_addr_expr (tmp1_var);
   tree ldfsr = sparc_builtins[SPARC_BUILTIN_LDFSR];
@@ -12578,10 +12581,12 @@ sparc_atomic_assign_expand_fenv (tree *h
  tmp2_var >>= 5;
__atomic_feraiseexcept ((int) tmp2_var);  */
 
-  tree tmp2_var = create_tmp_var (unsigned_type_node);
-  mark_addressable (tmp2_var);
-  tree tmp3_addr = build_fold_addr_expr (tmp2_var);
-  tree update_stfsr = build_call_expr (stfsr, 1, tmp3_addr);
+  tree tmp2_var = create_tmp_var_raw (unsigned_type_node);
+  TREE_ADDRESSABLE (tmp2_var) = 1;
+  tree tmp2_addr = build_fold_addr_expr (tmp2_var);
+  tree update_stfsr
+= build4 (TARGET_EXPR, unsigned_type_node, tmp2_var,
+	  build_call_expr (stfsr, 1, tmp2_addr), NULL_TREE, NULL_TREE);
 
   tree update_ldfsr = build_call_expr (ldfsr, 1, fenv_addr);

Re: Do not compare types in operands_equal_p if OEP_ADDRESS_OF is set

2015-10-06 Thread Jan Hubicka

Hi,

I see, OEP_CONSTANT_ADDRESS_OF is set in:
return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0), 
TREE_CONSTANT (arg0) && TREE_CONSTANT (arg1)
? OEP_CONSTANT_ADDRESS_OF | OEP_ADDRESS_OF : 0);
so it is not additive to OEP_ADDRESS_OF.  I suppose the existing checks for
OEP_ADDRESS_OF in MEM_REF and INDIRECT_REF should also check for
OEP_CONSTANT_ADDRESS_OF.  I will send a separate patch for that.
> > -  if (element_precision (TREE_TYPE (arg0))
> > -  != element_precision (TREE_TYPE (arg1)))
> > -return 0;
> > +  /* If both types don't have the same precision, then it is not safe
> > +to strip NOPs.  */
> > +  if (element_precision (TREE_TYPE (arg0))
> > + != element_precision (TREE_TYPE (arg1)))
> > +   return 0;
> 
> It's odd that you move this under the !OEP_ADDRESS_OF case but
> not the STRIP_NOPS itself.

Hmm, I suppose NOP_EXPR should not happen here as it does not have an address
defined.  I will try to assert there and move the statement around.
> 
> > +}
> >  
> >STRIP_NOPS (arg0);
> >STRIP_NOPS (arg1);
> > @@ -2935,27 +2940,34 @@ operand_equal_p (const_tree arg0, const_
> >  
> > case TARGET_MEM_REF:
> > case MEM_REF:
> > - /* Require equal access sizes, and similar pointer types.
> > -We can have incomplete types for array references of
> > -variable-sized arrays from the Fortran frontend
> > -though.  Also verify the types are compatible.  */
> > - if (!((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
> > -  || (TYPE_SIZE (TREE_TYPE (arg0))
> > -  && TYPE_SIZE (TREE_TYPE (arg1))
> > -  && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
> > -  TYPE_SIZE (TREE_TYPE (arg1)), 
> > flags)))
> > - && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
> > - && ((flags & OEP_ADDRESS_OF)
> > - || (alias_ptr_types_compatible_p
> > -   (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> > -TREE_TYPE (TREE_OPERAND (arg1, 1)))
> > - && (MR_DEPENDENCE_CLIQUE (arg0)
> > - == MR_DEPENDENCE_CLIQUE (arg1))
> > - && (MR_DEPENDENCE_BASE (arg0)
> > - == MR_DEPENDENCE_BASE (arg1))
> > - && (TYPE_ALIGN (TREE_TYPE (arg0))
> > -   == TYPE_ALIGN (TREE_TYPE (arg1)))
> > -   return 0;
> > + if (!(flags & OEP_ADDRESS_OF))
> > +   {
> > + /* Require equal access sizes */
> > + if (TYPE_SIZE (TREE_TYPE (arg0)) != TYPE_SIZE (TREE_TYPE (arg1))
> > + && (!TYPE_SIZE (TREE_TYPE (arg0))
> > + || !TYPE_SIZE (TREE_TYPE (arg1))
> > + || !operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
> > +  TYPE_SIZE (TREE_TYPE (arg1)),
> > +  flags & ~OEP_CONSTANT_ADDRESS_OF)))
> 
> so you still pass OEP_ADDRESS_OF ...

I don't because it is tested earlier 
  if (!(flags & OEP_ADDRESS_OF))
> 
> > +   return 0;
> > + /* Verify that access happens in similar types.  */
> > + if (!types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
> > +   return 0;
> > + /* Verify that accesses are TBAA compatible.  */
> > + if ((flag_strict_aliasing
> > +  && !alias_ptr_types_compatible_p
> > +   (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> > +TREE_TYPE (TREE_OPERAND (arg1, 1
> > + || MR_DEPENDENCE_CLIQUE (arg0)
> > +!= MR_DEPENDENCE_CLIQUE (arg1)
> > + || MR_DEPENDENCE_BASE (arg0)
> > +!= MR_DEPENDENCE_BASE (arg1))
> > +   return 0;
> > +   }
> > +  /* Verify that alignment is compatible.  */
> > +  if (TYPE_ALIGN (TREE_TYPE (arg0))
> > +  != TYPE_ALIGN (TREE_TYPE (arg1)))
> > + return 0;
> 
> why's compatible alignment required for OEP_ADDRESS_OF?  We only
> look at type alignment on memory references (see get_pointer_alignment
> vs. get_object_alignment).

I actually tested it with the TYPE_ALIGN test in the conditional, too, so I know
it works.  Later I decided to play safe in case get_pointer_alignment
may want to do that.  Will move it back to "if (!(flags & OEP_ADDRESS_OF))".

Honza
> 
> >   flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF);
> >   return (OP_SAME (0) && OP_SAME (1)
> >   /* TARGET_MEM_REF require equal extra operands.  */
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

Re: [google][gcc-4_9] encode and compress cc1 option strings in gcov_module_info

2015-10-06 Thread Rong Xu

Here is the patch set 2 that integrates David's comments. Note that
this uses the combined strlen (i.e. encoding compressed and
uncompressed strlen into one gcov_unsigned_t).
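
For readers following along, the combined-strlen idea can be sketched as
below.  The 16/16 bit split, the use of uint32_t as a stand-in for
gcov_unsigned_t, and the function names are all assumptions made for this
illustration; the patch may lay the bits out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split for this sketch: low 16 bits hold the compressed
   length, high 16 bits hold the uncompressed length.  */
#define LEN_BITS 16u
#define LEN_MASK ((1u << LEN_BITS) - 1u)

/* Pack both lengths into *OUT.  Returns 0 on success and -1 if either
   length does not fit in LEN_BITS bits.  */
int
pack_lengths (uint32_t compressed_len, uint32_t uncompressed_len,
              uint32_t *out)
{
  if (compressed_len > LEN_MASK || uncompressed_len > LEN_MASK)
    return -1;
  *out = (uncompressed_len << LEN_BITS) | compressed_len;
  return 0;
}

/* Recover the compressed length from a packed word.  */
uint32_t
unpack_compressed_len (uint32_t packed)
{
  return packed & LEN_MASK;
}

/* Recover the uncompressed length from a packed word.  */
uint32_t
unpack_uncompressed_len (uint32_t packed)
{
  return packed >> LEN_BITS;
}
```

The point of combining them is that the record needs one word instead of
two; the cost is an upper bound on each length, which the packer checks.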

Testing is ongoing.

-Rong

On Tue, Oct 6, 2015 at 11:30 AM, Rong Xu  wrote:
> It's 1:3 to 1:4 in the programs I tested. But it really depends on how
> the options are used. I think your idea of using combined strlen works
> better.
> It just makes the code a little clumsy but it does not cause any
> performance issue.
>
> On Tue, Oct 6, 2015 at 10:21 AM, Xinliang David Li  wrote:
>> On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu  wrote:
>>> On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li  
>>> wrote:
unsigned ggc_memory = gcov_read_unsigned ();
 +  unsigned marker = 0, len = 0, k;
 +  char **string_array, *saved_cc1_strings;
 +
for (unsigned j = 0; j < 7; j++)


 Do not use hard coded number. Use the enum defined in coverage.c.
>>>
>>> OK.
>>>


 +string_array[j] = xstrdup (gcov_read_string ());
 +
 +  k = 0;
 +  for (unsigned j = 1; j < 7; j++)

 Do not use hard coded number.
>>>
>>> OK.
>>>


 +{
 +  if (num_array[j] == 0)
 +continue;
 +  marker += num_array[j];

 It is better to read if the name of variable 'marker' is changed to
 'j_end' or something similar

 For all the substrings of 'j' kind, there should be just one marker,
 right? It looks like here you introduce one marker per string, not one
 marker per string kind.
>>>
>>> I don't understand what you meant here. "marker" is fixed for each j
>>> substring (one option kind) -- it the end index of the sub-string
>>> array. k-loop is for each string.
>>>
>>
>> That was a wrong comment from me. Discard it.
>>

 +  len += 3; /* [[  */

 Same here for hard coded value.

 +  for (; k < marker; k++)
 +len += strlen (string_array[k]) + 1; /* 1 for delimter of ']' 
  */

 Why do we need one ']' per string?
>>>
>>> This is because the options strings can contain space '  '. I cannot
>>> use space as the delimiter, neither is \0 as it is the end of the
>>> string of the encoded string.
>>
>> Ok -- this allows you to avoid string copy during parsing.
>>>


 +}
 +  saved_cc1_strings = (char *) xmalloc (len + 1);
 +  saved_cc1_strings[0] = 0;
 +
 +  marker = 0;
 +  k = 0;
 +  for (unsigned j = 1; j < 7; j++)

 Same here for 7.
>>>
>>> will fix in the new patch.
>>>

 +{
 +  static const char lipo_string_flags[6] = {'Q', 'B', 'S',
 'D','I', 'C'};
 +  if (num_array[j] == 0)
 +continue;
 +  marker += num_array[j];

 Suggest changing marker to j_end
>>> OK.
>>>

 +  sprintf (saved_cc1_strings, "%s[[%c", saved_cc1_strings,
 +   lipo_string_flags[j - 1]);
 +  for (; k < marker; k++)
 +{
 +  sprintf (saved_cc1_strings, "%s%s]", saved_cc1_strings,
 +   string_array[k]);

 +#define DELIMTER"[["

 Why double '[' ?
>>> I will change to single '['.
>>>

 +#define DELIMTER2   "]"
 +#define QUOTE_PATH_FLAG 'Q'
 +#define BRACKET_PATH_FLAG   'B'
 +#define SYSTEM_PATH_FLAG'S'
 +#define D_U_OPTION_FLAG 'D'
 +#define INCLUDE_OPTION_FLAG 'I'
 +#define COMMAND_ARG_FLAG'C'
 +
 +enum lipo_cc1_string_kind {
 +  k_quote_paths = 0,
 +  k_bracket_paths,
 +  k_system_paths,
 +  k_cpp_defines,
 +  k_cpp_includes,
 +  k_lipo_cl_args,
 +  num_lipo_cc1_string_kind
 +};
 +
 +struct lipo_parsed_cc1_string {
 +  const char* source_filename;
 +  unsigned num[num_lipo_cc1_string_kind];
 +  char **strings[num_lipo_cc1_string_kind];
 +};
 +
 +struct lipo_parsed_cc1_string *
 +lipo_parse_saved_cc1_string (const char *src, char *str,
 +bool parse_cl_args_only);
 +void free_parsed_string (struct lipo_parsed_cc1_string *string);
 +

 Declare above in a header file.
>>>
>>> OK.
>>>


  /* Returns true if the command-line arguments stored in the given 
 module-infos
 are incompatible.  */
  bool
 -incompatible_cl_args (struct gcov_module_info* mod_info1,
 -  struct gcov_module_info* mod_info2)
 +incompatible_cl_args (struct lipo_parsed_cc1_string* mod_info1,
 +  struct lipo_parsed_cc1_string* mod_info2)

 Fix formatting.
>>> OK.

  {
  {
 @@ -1647,7 +1679,7 @@ build_var (tree fn_decl, tree type, int counter)
  /* Creates the gcov_fn_info RECORD_TYPE.  */

C++ PATCH for c++/67810 (wrong fold-expression error)

2015-10-06 Thread Jason Merrill

It seems that my approach of scanning the tokens looking for an ellipsis 
with an operator next to it wasn't good enough; it got confused by a 
reference pack expansion in a template argument list.  So this patch 
changes the parsing approach so that within parentheses we always start 
by parsing an expression, and then pass that expression into 
cp_parser_fold_expression if what follows the expression looks like part 
of a fold-expression.  At that point we need to complain if the initial 
expression is a binary or trinary expression, since the operands of 
fold-expressions can only be cast-expressions.
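
For illustration, here is a minimal C++17 sketch of the rule described above. It is not taken from the patch or from the PR testcase, and the function names are made up. It shows that an operand of a fold-expression has to be a cast-expression, so a binary expression must be parenthesized before it can be folded.

```cpp
#include <cassert>

// Unary right fold: (args + ...) expands to a1 + (a2 + (... + aN)).
template <typename... Ts>
constexpr auto sum (Ts... args)
{
  return (args + ...);
}

// Binary left fold with an initial value: (0 + ... + args).
// Unlike the unary form, this is also valid for an empty pack.
template <typename... Ts>
constexpr auto sum_from_zero (Ts... args)
{
  return (0 + ... + args);
}

// Each pack element is doubled before folding.  The operand "(args * 2)"
// must be parenthesized: writing (args * 2 + ...) is ill-formed because
// "args * 2" is a binary expression, not a cast-expression.
template <typename... Ts>
constexpr auto sum_doubled (Ts... args)
{
  return ((args * 2) + ...);
}
```

With these definitions, `sum (1, 2, 3)` folds to 6 and `sum_doubled (1, 2, 3)` folds to 12; the unparenthesized form is the kind of input for which the parser now has to give an error.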


Tested x86_64-pc-linux-gnu, applying to trunk.
commit fb86de89602221b589f91b1cfd0fbfb7b6d7e130
Author: Jason Merrill 
Date:   Tue Oct 6 17:31:10 2015 -0400

	PR c++/67810
	* parser.c (cp_parser_fold_expr_p): Remove.
	(is_binary_op): New.
	(cp_parser_fold_expression): Take LHS as parameter.
	(cp_parser_primary_expression): Call it after parsing an expression.
	(cp_parser_binary_expression, cp_parser_assignment_operator_opt)
	(cp_parser_expression): Ignore an operator followed by '...'.
	(is_binary_op): New.
	* pt.c (tsubst_unary_left_fold, tsubst_binary_left_fold)
	(tsubst_unary_right_fold, tsubst_binary_right_fold): Handle errors.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ffed595..1f36b25 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -4339,6 +4339,49 @@ cp_parser_fold_operator (cp_token *token)
 }
 }
 
+/* Returns true if CODE indicates a binary expression, which is not allowed in
+   the LHS of a fold-expression.  More codes will need to be added to use this
+   function in other contexts.  */
+
+static bool
+is_binary_op (tree_code code)
+{
+  switch (code)
+{
+case PLUS_EXPR:
+case POINTER_PLUS_EXPR:
+case MINUS_EXPR:
+case MULT_EXPR:
+case TRUNC_DIV_EXPR:
+case TRUNC_MOD_EXPR:
+case BIT_XOR_EXPR:
+case BIT_AND_EXPR:
+case BIT_IOR_EXPR:
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+
+case MODOP_EXPR:
+
+case EQ_EXPR:
+case NE_EXPR:
+case LE_EXPR:
+case GE_EXPR:
+case LT_EXPR:
+case GT_EXPR:
+
+case TRUTH_ANDIF_EXPR:
+case TRUTH_ORIF_EXPR:
+
+case COMPOUND_EXPR:
+
+case DOTSTAR_EXPR:
+case MEMBER_REF:
+  return true;
+default:
+  return false;
+}
+}
+
 /* If the next token is a suitable fold operator, consume it and return as
the function above.  */
 
@@ -4352,41 +4395,6 @@ cp_parser_fold_operator (cp_parser *parser)
   return code;
 }
 
-/* Returns true iff we're at the beginning of an N4191 fold-expression, after
-   the left parenthesis.  Rather than do tentative parsing, we scan the tokens
-   up to the matching right paren for an ellipsis next to a binary
-   operator.  */
-
-static bool
-cp_parser_fold_expr_p (cp_parser *parser)
-{
-  /* An ellipsis right after the left paren always indicates a
- fold-expression.  */
-  if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
-{
-  /* But if there isn't a fold operator after the ellipsis,
- give a different error.  */
-  cp_token *token = cp_lexer_peek_nth_token (parser->lexer, 2);
-  return (cp_parser_fold_operator (token) != ERROR_MARK);
-}
-
-  /* Otherwise, look for an ellipsis.  */
-  cp_lexer_save_tokens (parser->lexer);
-  int ret = cp_parser_skip_to_closing_parenthesis_1 (parser, false,
-		 CPP_ELLIPSIS, false);
-  bool found = (ret == -1);
-  if (found)
-{
-  /* We found an ellipsis, is the previous token an operator?  */
-  cp_token *token = cp_lexer_peek_token (parser->lexer);
-  --token;
-  if (cp_parser_fold_operator (token) == ERROR_MARK)
-	found = false;
-}
-  cp_lexer_rollback_tokens (parser->lexer);
-  return found;
-}
-
 /* Parse a fold-expression.
 
  fold-expression:
@@ -4397,14 +4405,10 @@ cp_parser_fold_expr_p (cp_parser *parser)
Note that the '(' and ')' are matched in primary expression. */
 
 static tree
-cp_parser_fold_expression (cp_parser *parser)
+cp_parser_fold_expression (cp_parser *parser, tree expr1)
 {
   cp_id_kind pidk;
 
-  if (cxx_dialect < cxx1z && !in_system_header_at (input_location))
-pedwarn (input_location, 0, "fold-expressions only available with "
-	 "-std=c++1z or -std=gnu++1z");
-
   // Left fold.
   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
 {
@@ -4423,10 +4427,6 @@ cp_parser_fold_expression (cp_parser *parser)
   return finish_left_unary_fold_expr (expr, op);
 }
 
-  tree expr1 = cp_parser_cast_expression (parser, false, false, false, &pidk);
-  if (expr1 == error_mark_node)
-return error_mark_node;
-
   const cp_token* token = cp_lexer_peek_token (parser->lexer);
   int op = cp_parser_fold_operator (parser);
   if (op == ERROR_MARK)
@@ -4442,6 +4442,16 @@ cp_parser_fold_expression (cp_parser *parser)
 }
   cp_lexer_consume_token (parser->lexer);
 
+  /* The operands of a fold-expression are

[PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-10-06 Thread Jeff Law



As touched on in the BZ, we record jump threads as a list of edges to 
traverse.  A jump thread may be recorded through a block which hasn't 
been optimized by DOM yet.


If DOM is able to optimize a control flow statement in such a block, 
then it will remove one or more outgoing edges from the block.


Removal of an edge triggers releasing the edge back to the GC system, so 
naturally bad things happen if the threader then looks at the content of 
those edges.


After some instrumentation I found this was happening for both jump 
threads with joiner blocks as well as FSM jump threads.  The former are 
actually more common than the latter.


Given the sequencing (recording the jump thread, DOM optimizing 
the COND_EXPR, subsequently releasing the edge structure, and finally 
examining the jump thread to modify the CFG), I saw only one good 
solution and one bad solution.


The bad solution involved removing jump thread paths at the time in 
which DOM optimizes the COND_EXPR.  The problem is we have to walk the 
entire path of every recorded jump thread each time DOM performs this 
optimization.  Obviously not good.


So instead we record the affected edge pointers into a hash table and 
use those later to prune the jump thread paths.  It's a single walk over 
each recorded jump threading path.  Since we're not looking at the 
contents of the edge, this works reasonably well.
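
The idea can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: `Edge`, `Path`, `note_removed_edge` and `prune_paths` are hypothetical stand-ins, and a standard `std::unordered_set` replaces GCC's own hash table. The point it shows is that only edge addresses are stored and compared, so a freed edge is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

struct Edge { int src; int dest; };          // stand-in for a CFG edge
using Path = std::vector<const Edge *>;      // one recorded jump thread

// Addresses of edges that were removed from the CFG.  Only the pointer
// value is used as a key; the pointed-to edge is never dereferenced.
static std::unordered_set<const Edge *> removed_edges;

// Called when an edge is about to be deleted.
void note_removed_edge (const Edge *e)
{
  removed_edges.insert (e);
}

// Single walk over all recorded paths; drops every path that mentions a
// removed edge.  Returns the number of paths dropped.
std::size_t prune_paths (std::vector<Path> &paths)
{
  std::size_t dropped = 0;
  std::vector<Path> kept;
  for (Path &p : paths)
    {
      bool stale = false;
      for (const Edge *e : p)
        if (removed_edges.count (e) != 0)
          {
            stale = true;
            break;
          }
      if (stale)
        ++dropped;
      else
        kept.push_back (std::move (p));
    }
  paths.swap (kept);
  return dropped;
}
```

Noting a removal is a constant-time insert, and the cost of matching paths against removed edges is paid once, at the point where the paths are about to be used.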


One more reason to push harder for jump threading to occur independently 
of DOM using Sebastian's backward walking FSM bits.


Anyway, bootstrapped and regression tested on x86_64-linux-gnu.  Also 
bootstrapped and testcase tested on x86_64-linux-gnu with just release 
checking enabled.


Installed on the trunk.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 732b3d1..db6f1b6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@
+2015-10-06  Jeff Law  
+
+   PR tree-optimization/67816
+   * tree-ssa-threadupdate.h (remove_jump_threads_including): Renamed
+   from remove_jump_threads_starting_at.  Accept an edge rather than
+   a basic block.
+   * tree-ssa-threadupdate.c (removed_edges): New hash table.
+   (remove_jump_threads_including): Note edges that get removed from
+   the CFG for later pruning of jump threading paths including them.
+   (thread_through_all_blocks): Remove paths which include edges that
+   have been removed.
+   * tree-ssa-dom.c (optimize_stmt): Call remove_jump_threads_including
+   on each outgoing edge when optimizing away a control statement.
+
 2015-10-06  Trevor Saunders  
 
* reorg.c (emit_delay_sequence): Store list of delay slot insns
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4ec4743..1882fbd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-10-06  Jeff Law  
+
+   * gcc.c-torture/compile/pr67816.c: New test.
+
 2015-10-07  Kugan Vivekanandarajah  
 
* gcc.target/aarch64/get_lane_f16_1.c: New test.
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr67816.c 
b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
new file mode 100644
index 0000000..c3db3a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
@@ -0,0 +1,19 @@
+int a, c, d, e;
+int b[10];
+void fn1() {
+  int i, f = 0;
+  for (;;) {
+i = 0;
+for (; i < a; i++)
+  if (c) {
+if (b[i])
+  f = 1;
+  } else if (b[i])
+f = 0;
+if (f)
+  d++;
+while (e)
+  ;
+  }
+}
+
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index c226da5..941087d 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1840,8 +1840,13 @@ optimize_stmt (basic_block bb, gimple_stmt_iterator si,
  edge taken_edge = find_taken_edge (bb, val);
  if (taken_edge)
{
- /* Delete threads that start at BB.  */
- remove_jump_threads_starting_at (bb);
+
+ /* We need to remove any queued jump threads that
+reference outgoing edges from this block.  */
+ edge_iterator ei;
+ edge e;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   remove_jump_threads_including (e);
 
  /* If BB is in a loop, then removing an outgoing edge from BB
 may cause BB to move outside the loop, changes in the
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 4a147bb..26b199b 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -215,6 +215,18 @@ redirection_data::equal (const redirection_data *p1, const 
redirection_data *p2)
   return true;
 }
 
+/* Rather than search all the edges in jump thread paths each time
+   DOM is able to simplify a control statement, we build a hash table
+   with the deleted edges.  We only care about the address of the edge,
+   not its contents.  */
+struct removed_edges :

[PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins

2015-10-06 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog:

  Charles Baylis  

PR target/63870
* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator
qualifier_struct_load_store_lane_index.
(builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
argument qualifiers.
(arm_expand_neon_builtin): Handle new NEON argument qualifier.
* config/arm/arm.h (ENDIAN_LANE_N): New macro.

Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647
---
 gcc/config/arm/arm-builtins.c | 46 ---
 gcc/config/arm/arm.c  |  1 +
 gcc/config/arm/arm.h  |  3 +++
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 0f5a1f1..a29f8d6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -77,7 +77,9 @@ enum arm_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be within range of previous argument = a vector.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores.  */
+  qualifier_struct_load_store_lane_index = 0x400
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1973,6 +1975,7 @@ typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
   NEON_ARG_LANE_INDEX,
+  NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -2030,9 +2033,9 @@ neon_dereference_pointer (tree exp, tree type, 
machine_mode mem_mode,
 /* Expand a Neon builtin.  */
 static rtx
 arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
- int icode, int have_retval, tree exp, ...)
+ int icode, int have_retval, tree exp,
+ builtin_arg *args)
 {
-  va_list ap;
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
   rtx op[SIMD_MAX_BUILTIN_ARGS];
@@ -2047,13 +2050,11 @@ arm_expand_neon_args (rtx target, machine_mode 
map_mode, int fcode,
  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
 target = gen_reg_rtx (tmode);
 
-  va_start (ap, exp);
-
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
 
   for (;;)
 {
-  builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+  builtin_arg thisarg = args[argc];
 
   if (thisarg == NEON_ARG_STOP)
break;
@@ -2089,6 +2090,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, 
int fcode,
op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
  break;
 
+   case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+ gcc_assert (argc > 1);
+ if (CONST_INT_P (op[argc]))
+   {
+ neon_lane_bounds (op[argc], 0,
+   GET_MODE_NUNITS (map_mode), exp);
+ /* Keep to GCC-vector-extension lane indices in the RTL.  */
+ op[argc] =
+   GEN_INT (ENDIAN_LANE_N (map_mode, INTVAL (op[argc])));
+   }
+ goto constant_arg;
+
case NEON_ARG_LANE_INDEX:
  /* Previous argument must be a vector, which this indexes.  */
  gcc_assert (argc > 0);
@@ -2099,17 +2112,22 @@ arm_expand_neon_args (rtx target, machine_mode 
map_mode, int fcode,
}
  /* Fall through - if the lane index isn't a constant then
 the next case will error.  */
+
case NEON_ARG_CONSTANT:
+constant_arg:
  if (!(*insn_data[icode].operand[opno].predicate)
  (op[argc], mode[argc]))
-   error_at (EXPR_LOCATION (exp), "incompatible type for argument 
%d, "
-  "expected %", argc + 1);
+   {
+ error ("%Kargument %d must be a constant immediate",
+exp, argc + 1);
+ return const0_rtx;
+   }
  break;
+
 case NEON_ARG_MEMORY:
  /* Check if expand failed.  */
  if (op[argc] == const0_rtx)
  {
-   va_end (ap);
return 0;
  }
  gcc_assert (MEM_P (op[argc]));
@@ -2132,8 +2150,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, 
int fcode,
}
 }
 
-  va_end (ap);
-
   if (have_retval)
 switch (argc)
   {
@@ -2245,6 +2261,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 
   if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
args[k] = NEON_ARG_LANE_INDEX;
+  else if (d->qualifiers[qualifiers_k] & 
qualifier_struct_load_store_lane_index)
+   args[k] = NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX;
   else if (d->qualifiers[qualifiers_k] & qualifier_immediate)

[PATCH v2 0/3] [ARM] PR63870 vldN_lane/vstN_lane error messages

2015-10-06 Thread charles . baylis

From: Charles Baylis 

This patch series fixes up the error messages for single lane vector
load/stores, similarly to AArch64.

make check on arm-linux-gnueabihf/qemu completes with no new regressions.

Changes since the last version:
. removed the duplicate arm_neon_lane_bounds function
. resolved conflicts with other NEON work
. whitespace clean up

Charles Baylis (3):
  [ARM] PR63870 Add qualifiers for NEON builtins
  [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate
qualifier
  [ARM] PR63870 Enable test cases for ARM

 gcc/config/arm/arm-builtins.c  | 50 ++
 gcc/config/arm/arm.c   |  1 +
 gcc/config/arm/arm.h   |  3 ++
 gcc/config/arm/neon.md | 49 +++--
 .../advsimd-intrinsics/vld2_lane_f16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_f32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_f64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_p8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld2_lane_s16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_s32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_s64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_s8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld2_lane_u16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_u32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_u64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2_lane_u8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld2q_lane_f16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_f32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_f64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_p8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2q_lane_s16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_s32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_s64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_s8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld2q_lane_u16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_u32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_u64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld2q_lane_u8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_f16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_f32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_f64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_p8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld3_lane_s16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_s32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_s64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_s8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld3_lane_u16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_u32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_u64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3_lane_u8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld3q_lane_f16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_f32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_f64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_p8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3q_lane_s16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_s32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_s64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_s8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld3q_lane_u16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_u32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_u64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld3q_lane_u8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_f16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_f32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_f64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_p8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld4_lane_s16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_s32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_s64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_s8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld4_lane_u16_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_u32_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_u64_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4_lane_u8_indices_1.c|  5 +--
 .../advsimd-intrinsics/vld4q_lane_f16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_f32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_f64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_p8_indices_1.c   |  5 +--
 .../advsimd-intrinsics/vld4q_lane_s16_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_s32_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_s64_indices_1.c  |  5 +--
 .../advsimd-intrinsics/vld4q_lane_s8_indices_1.c   |  5 +--

[PATCH 2/3] [ARM] PR63870 Mark lane indices of vldN/vstN with appropriate qualifier

2015-10-06 Thread charles . baylis

From: Charles Baylis 

gcc/ChangeLog:

  Charles Baylis  

PR target/63870
* config/arm/arm-builtins.c: (arm_load1_qualifiers) Use
qualifier_struct_load_store_lane_index.
(arm_storestruct_lane_qualifiers) Likewise.
* config/arm/neon.md: (neon_vld1_lane) Reverse lane numbers for
big-endian.
(neon_vst1_lane) Likewise.
(neon_vld2_lane) Likewise.
(neon_vst2_lane) Likewise.
(neon_vld3_lane) Likewise.
(neon_vst3_lane) Likewise.
(neon_vld4_lane) Likewise.
(neon_vst4_lane) Likewise.
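
As a rough illustration of "reverse lane numbers for big-endian", here is a small C++ sketch. It assumes the remap is the plain reversal `nunits - 1 - lane`; the definition of ENDIAN_LANE_N is not shown in this message, so treat that formula as an assumption. The helper names are hypothetical and are not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the lane remap: on big-endian the lane order
// is assumed to be reversed, on little-endian the index is unchanged.
constexpr int endian_lane_n (int nunits, int lane, bool big_endian)
{
  return big_endian ? nunits - 1 - lane : lane;
}

// Range check in the spirit of the "lane out of range" diagnostics:
// valid lanes are 0 .. nunits - 1.
constexpr bool lane_in_range (int nunits, int lane)
{
  return lane >= 0 && lane < nunits;
}
```

Under that assumption, a valid lane stays valid after the remap, so the range check gives the same answer whether it runs before or after the reversal.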

Change-Id: Ic39898d288701bc5b712490265be688f5620c4e2
---
 gcc/config/arm/arm-builtins.c |  4 ++--
 gcc/config/arm/neon.md| 49 +++
 2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index a29f8d6..cbe96e4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -162,7 +162,7 @@ arm_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode,
-  qualifier_none, qualifier_immediate };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -181,7 +181,7 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode,
-  qualifier_none, qualifier_immediate };
+  qualifier_none, qualifier_struct_load_store_lane_index };
 #define STORE1LANE_QUALIFIERS (arm_storestruct_lane_qualifiers)
 
 #define v8qi_UP  V8QImode
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..251afdc 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN)
 UNSPEC_VLD1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[3] = GEN_INT (lane);
   if (lane < 0 || lane >= max)
 error ("lane out of range");
   if (max == 1)
@@ -4281,8 +4282,9 @@ if (BYTES_BIG_ENDIAN)
 UNSPEC_VLD1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[3] = GEN_INT (lane);
   int regno = REGNO (operands[0]);
   if (lane < 0 || lane >= max)
 error ("lane out of range");
@@ -4367,8 +4369,9 @@ if (BYTES_BIG_ENDIAN)
  UNSPEC_VST1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[2]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
+  operands[2] = GEN_INT (lane);
   if (lane < 0 || lane >= max)
 error ("lane out of range");
   if (max == 1)
@@ -4387,7 +4390,7 @@ if (BYTES_BIG_ENDIAN)
  UNSPEC_VST1_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[2]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   if (lane < 0 || lane >= max)
@@ -4396,8 +4399,8 @@ if (BYTES_BIG_ENDIAN)
 {
   lane -= max / 2;
   regno += 2;
-  operands[2] = GEN_INT (lane);
 }
+  operands[2] = GEN_INT (lane);
   operands[1] = gen_rtx_REG (mode, regno);
   if (max == 2)
 return "vst1.\t{%P1}, %A0";
@@ -4457,7 +4460,7 @@ if (BYTES_BIG_ENDIAN)
UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
@@ -4466,7 +4469,7 @@ if (BYTES_BIG_ENDIAN)
   ops[0] = gen_rtx_REG (DImode, regno);
   ops[1] = gen_rtx_REG (DImode, regno + 2);
   ops[2] = operands[1];
-  ops[3] = operands[3];
+  ops[3] = GEN_INT (lane);
   output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops);
   return "";
 }
@@ -4482,7 +4485,7 @@ if (BYTES_BIG_ENDIAN)
UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[3]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
@@ -4572,7 +4575,7 @@ if (BYTES_BIG_ENDIAN)
  UNSPEC_VST2_LANE))]
   "TARGET_NEON"
 {
-  HOST_WIDE_INT lane = INTVAL (operands[2]);
+  HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
   HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);

Re: [4/7] Use correct promoted mode sign for result of GIMPLE_CALL

2015-10-06 Thread kugan




On 15/09/15 22:47, Richard Biener wrote:

On Tue, Sep 8, 2015 at 11:50 PM, Jim Wilson  wrote:

On 09/08/2015 08:39 AM, Jeff Law wrote:

Is this another instance of the PROMOTE_MODE issue that was raised by
Jim Wilson a couple months ago?


It looks like a closely related problem.  The one I am looking at has
confusion with a function arg and a local variable as they have
different sign extension promotion rules.  Kugan's is with a function
return value and a local variable as they have different sign extension
promotion rules.

The bug report is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932

The gcc-patches thread spans a month end boundary, so it has multiple heads
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00112.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00524.html

Function args and function return values get the same sign extension
treatment when promoted, this is handled by
TARGET_PROMOTE_FUNCTION_MODE. Local variables are treated differently,
via PROMOTE_MODE. I think the function arg/return treatment is wrong,
but changing that is an ABI change which is undesirable.  I suppose we
could change local variables to match function args and return values,
but I think that is moving in the wrong direction.  Though Kugan's new
optimization pass will remove some of the extra unnecessary sign/zero
extensions added by the arm TARGET_PROMOTE_FUNCTION_MODE definition, so
maybe it won't matter enough to worry about any more.

If we can't fix this in the arm backend, then we may need different
middle fixes for these two cases.  I was looking at ways to fix this in
the tree-out-of-ssa pass.  I don't know if this will work for Kugan's
testcase, I'd need time to look at it.


As you said, I don't think the fix in the tree-out-of-ssa pass will fix 
this case. Kyrill also saw the same problem with the trunk, as in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67714


I think the function return value should have been "promoted" according to
the ABI by the lowering pass.  Thus the call stmt return type be changed,
exposing the "mismatch" and compensating the IL with a sign-conversion.



Function return value is promoted as per ABI.
In the example from PR67714

 _8 = fn1D.5055 ();
  e_9 = (charD.4) _8;
  f_13 = _8;
...

_8 is sign extended correctly. But in f_13 = _8, it is promoted to 
unsigned and zero extended due to the backend PROMOTE_MODE. We thus have:


The zero-extension during expand:
;; f_13 = _8;

(insn 15 14 0 (set (reg/v:SI 110 [ f ])
(zero_extend:SI (subreg/u:QI (reg/v:SI 110 [ f ]) 0))) 
arm-zext.c:18 -1

 (nil))

This is wrong.
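
To make the effect concrete, here is a small C++ sketch of why zero-extending a value that was already sign-extended is wrong for negative chars. It is an illustration only; the helper names are made up and it does not model the RTL above.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 8 bits of X to 32 bits, as for a signed char
// value promoted with sign extension.
constexpr std::int32_t sign_extend8 (std::uint32_t x)
{
  const std::int32_t low = static_cast<std::int32_t> (x & 0xffu);
  return (low & 0x80) ? low - 256 : low;
}

// Zero-extend the low 8 bits of X to 32 bits, as for an unsigned
// promotion of the same 8-bit value.
constexpr std::int32_t zero_extend8 (std::uint32_t x)
{
  return static_cast<std::int32_t> (x & 0xffu);
}
```

For the byte 0xff the two promotions give -1 and 255 respectively, while for values below 0x80 they agree, which is why the problem only shows up for negative values.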


As for your original issue with function arguments they should really
get similar
treatment, eventually in function arg gimplification already, by making
the PARM_DECLs promoted and using a local variable for further uses
with the "local" type.  Eventually one can use DECL_VALUE_EXPR to fixup
the IL, not sure.  Or we can do this in the promotion pass as well.



I will try this and see if I can do it.

Thanks,
Kugan


Richard.


Jim

Re: [PATCH] Fix PR67859

2015-10-06 Thread Richard Biener

On Tue, 6 Oct 2015, Richard Biener wrote:

> 
> This fixes an ICE in SCCVN - not sure how we got away with not clearing
> new slots...  possibly SSA name recycling triggered.

Ah, it's not supposed to happen.  Bisection shows the cause, fixed
with the following instead.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2015-10-06  Richard Biener  

PR tree-optimization/67859
* tree-ssa-pre.c (create_expression_by_pieces): Properly
discard not inserted stmts.

* gcc.dg/torture/pr67859.c: New testcase.

Index: gcc/testsuite/gcc.dg/torture/pr67859.c
===
--- gcc/testsuite/gcc.dg/torture/pr67859.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr67859.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int a, b, c;
+
+void
+fn1 ()
+{
+  b = c ? 0 : 1 << a;
+  b |= 0x9D7A5FD9;
+  for (;;)
+{
+  int d = 1;
+  b &= (unsigned) d;
+}
+}
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 228514)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_expression_by_pieces (basic_block
*** 2897,2907 

folded = gimple_convert (&forced_stmts, exprtype, folded);

!   /* If everything simplified to an exisiting SSA name or constant just
!  return that.  */
!   if (gimple_seq_empty_p (forced_stmts)
!   || is_gimple_min_invariant (folded))
  return folded;

gcc_assert (TREE_CODE (folded) == SSA_NAME);

--- 2897,2912 

folded = gimple_convert (&forced_stmts, exprtype, folded);

!   /* If there is nothing to insert, return the simplified result.  */
!   if (gimple_seq_empty_p (forced_stmts))
  return folded;
+   /* If we simplified to a constant return it and discard eventually
+  built stmts.  */
+   if (is_gimple_min_invariant (folded))
+ {
+   gimple_seq_discard (forced_stmts);
+   return folded;
+ }

gcc_assert (TREE_CODE (folded) == SSA_NAME);

Re: [PING][PR67476] Add param parloops-schedule

2015-10-06 Thread Bernd Schmidt


On 10/04/2015 05:36 PM, Tom de Vries wrote:

I'll try to give a bit of context:

The omp-expand machinery is used in two contexts:
1. when omp annotations are added to the source. In that case,
omp-expand is used in non-ssa gimple context.
2. when parloops annotates a loop with omp annotations. In that case,
omp-expand is used in ssa gimple context.


This much I remembered. The rest is at least useful background information.


In addition, I've recently (r226427) fixed parloops such that it no
longer invalidates the loops structure and cancels the loop tree. At the
parloops side, that involved adding an empty latch block, in order not
to break the LOOPS_HAVE_SIMPLE_LATCHES property. At the omp-expand side,
that meant handling the empty latch block, as well as updating the loops
structure. In r227435, I've applied a similar fix for
expand_omp_for_static_chunk.


Ok, I see similar pieces in that commit. Probably this means we're 
looking at some changes that are independent from each other and should 
be submitted as such. Is there a way to make these functions share code?



For example - this thing is entirely unexplained:

[...]

This bit is adding missing ssa support. In expand_omp_for_generic, we
add a loop around the loop we're expanding. Since we're in ssa, that
means we need to:
- add phis to the outer loop that connect to the phis in the inner,
   original loop, and
- move the loop entry value of the inner phi to the loop entry value of
   the outer phi.


Explanations like this should go into a comment.

Also, you're using l2_bb in that block, but as far as I can tell it 
could be NULL (if broken_loop). Is there a reason why this can't happen?



Bernd

Re: [ARM] Fix PR middle-end/65958

2015-10-06 Thread Eric Botcazou

> Thanks - I have no further comments on this patch. We probably need to
> implement the same on AArch64 too in order to avoid similar problems.

Here's the implementation for aarch64, very similar but simpler since there is 
no shortage of scratch registers; the only thing to note is the new blockage 
pattern.  This was tested on real hardware but not with Linux, instead with 
Darwin (experimental port of the toolchain to iOS) and makes it possible to 
pass ACATS (Ada conformance testsuite which requires stack checking).
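
As a rough picture of what probing a stack range means, here is a C++ sketch that lists the offsets that would be touched for a range FIRST .. FIRST+SIZE. The stepping scheme (one probe per full interval, then a final probe at the end of the range) is an assumption made for illustration, and `probe_offsets` is a hypothetical helper, not a function from the patch. The offsets are distances from the stack pointer; the direction of stack growth is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the offsets (from the stack pointer) that
// would be probed for the range FIRST .. FIRST+SIZE with the given
// interval.  One probe per full interval, plus a final probe at the end
// of the range.
std::vector<long> probe_offsets (long first, long size, long interval)
{
  std::vector<long> offsets;
  if (size <= 0 || interval <= 0)
    return offsets;
  for (long i = interval; i < size; i += interval)
    offsets.push_back (first + i);
  offsets.push_back (first + size);
  return offsets;
}
```

With a 4096-byte interval, a 10000-byte range starting at offset 0 would be probed at 4096, 8192 and 10000 under this scheme, so no gap between two consecutive probes exceeds the interval.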

There are also a couple of tweaks for the ARM implementation: a cosmetic one 
for the probe_stack pattern and one for the output_probe_stack_range loop.


2015-10-06  Tristan Gingold  
Eric Botcazou  

PR middle-end/65958
* config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
Declare.
* config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
UNSPEC_PROBE_STACK_RANGE.
(blockage): New instruction.
(probe_stack_range): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
function.
(aarch64_output_probe_stack_range): Likewise.
(aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
static builtin stack checking is enabled.
* config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
Define.

* config/arm/arm.c (arm_emit_probe_stack_range): Adjust comment.
(output_probe_stack_range): Rotate the loop and simplify.
(thumb1_expand_prologue): Tweak sorry message.
* config/arm/arm.md (probe_stack): Use bare string.


2015-10-06  Eric Botcazou  

* gcc.target/aarch64/stack-checking.c: New test.

-- 
Eric Botcazou
Index: config/aarch64/aarch64-linux.h
===
--- config/aarch64/aarch64-linux.h	(revision 228512)
+++ config/aarch64/aarch64-linux.h	(working copy)
@@ -88,4 +88,7 @@
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
 #endif  /* GCC_AARCH64_LINUX_H */
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h	(revision 228512)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -316,6 +316,7 @@ void aarch64_asm_output_labelref (FILE *
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 228512)
+++ config/aarch64/aarch64.c	(working copy)
@@ -76,6 +76,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "common/common-target.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2144,6 +2145,167 @@ aarch64_libgcc_cmp_return_mode (void)
   return SImode;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
+#if (PROBE_INTERVAL % 4096) != 0
+#error Cannot use indexed address calculation for stack probing
+#endif
+
+#if PROBE_INTERVAL > 4096
+#error Cannot use indexed addressing mode for stack probing
+#endif
+
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+   inclusive.  These are offsets from the current stack pointer.  */
+
+static void
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+{
+  rtx reg9 = gen_rtx_REG (Pmode, 9);
+
+  /* The following code uses indexed address calculation on FIRST.  */
+  gcc_assert ((first % 4096) == 0);
+
+  /* See if we have a constant small number of probes to generate.  If so,
+ that's the easy case.  */
+  if (size <= PROBE_INTERVAL)
+{
+  emit_set_insn (reg9,
+		 plus_constant (Pmode, stack_pointer_rtx,
+	   -(first + PROBE_INTERVAL)));
+  emit_stack_probe (plus_constant (Pmode, reg9, PROBE_INTERVAL - size));
+}
+
+  /* The run-time loop is made up of 8 insns in the generic case while the
+ compile-time loop is made up of 4+2*(n-2) insns for n # of intervals.  */
+  else if (size <= 4 * PROBE_INTERVAL)
+{
+  HOST_WIDE_INT i, rem;
+
+  emit_set_insn (reg9,
+		 plus_constant (Pmode, stack_pointer_rtx,
+	   -(first + PROBE_INTERVAL)));
+  emit_stack_probe (reg9);
+
+  /* Probe at FIRST + N * PROBE_INTERVAL for values of N from 2 until
+	 it exceeds SIZE.  If only two probes are needed, this will not
+	 generate any

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-06 Thread Richard Biener

On Thu, Oct 1, 2015 at 10:20 AM, Michael Collison
 wrote:
> Marc,
>
> Ah I did misunderstand you. Patch with match.pd formatting fix.

Ok if bootstrap/regtest passes.

Thanks,
Richard.

> On 10/01/2015 01:05 AM, Marc Glisse wrote:
>>
>> On Thu, 1 Oct 2015, Michael Collison wrote:
>>
>>> ChangeLog formatting and test case fixed.
>>
>>
>> Oups, sorry for the lack of precision, but I meant indenting the code in
>> match.pd, I hadn't even looked at the ChangeLog.
>>
>
> --
> Michael Collison
> Linaro Toolchain Working Group
> michael.colli...@linaro.org
>

Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-06 Thread Kaz Kojima

Eric Botcazou  wrote:
> No, it is not if you don't need to make the variable addressable.  In any 
> case, the failure mode is another ICE in make_decl_rtl so easy to spot.

Thanks for your explanation!  It clarifies the intent of
the original i386 patch.

Regards,
kaz

Re: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-10-06 Thread Alan Lawrence


Thanks for working on this, Simon!

On 01/10/15 15:43, Simon Dardis wrote:

-(define_expand "reduc_smax_"
-  [(match_operand:VWHB 0 "register_operand" "")
-   (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smax_scal_"
+  [(match_operand:HI 0 "register_operand" "")
+   (match_operand:VH 1 "register_operand" "")]




-(define_expand "reduc_smin_"
-  [(match_operand:VWHB 0 "register_operand" "")
-   (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smin_scal_"
+  [(match_operand:HI 0 "register_operand" "")
+   (match_operand:VH 1 "register_operand" "")]


I note these two change from VWHB to VH; the latter is just V4HI, so this loses 
you smin/smax for V2SI and V8QI...is that intentional? (It looks like you define 
vec_loongson_extract_lo for all relevant modes so I would expect you to use 
 as you do for reduc_plus_scal.)


(In contrast umax/umin only had VB = V8QI variants before.)

Also a minor stylistic point:


+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));


(Five instances) spurious space after (.


Cheers, Alan

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Andrew MacLeod


On 10/06/2015 04:37 PM, Bernd Schmidt wrote:

On 10/06/2015 09:19 PM, Andrew MacLeod wrote:

I don't get your fear.  I could have created that patch by hand, it would
just take a long time, and would likely be less complete, but just as
large.

I'm not  changing functionality.  ALL the tool is doing is removing
header files which aren't needed to compile.  It goes to great pains to
make sure it doesn't remove a silent dependency that conditional
compilation might introduce.  Other than that, the sanity check is that
everything compiles on every target and regression tests show nothing.
Since we're doing this with just include files, and not changing
functionality, I'm not sure what your primary concern is?


My concern is that I've seen occasions in the past where "harmless 
cleanups" that were not intended to alter functionality introduced 
severe and subtle bugs that went unnoticed for a significant amount of 
time. If a change does not alter functionality, then there is a valid 
question of "why apply it then?", and the question of correctness 
becomes very important (to me anyway). The patch was produced by a 
fairly complex process, and I'd want to at least be able to convince 
myself that the process is correct.


Anyhow, I'll step back from this, you're probably better served by 
someone else reviewing the patch.



Bernd

I do get it.  And I have spent a lot of time trying to make sure none of 
those sorts of bugs come in, and ultimately have tried to be 
conservative.  After all, it's better to have the tool leave an include 
than remove one that may be required.


Ultimately, these changes are unlikely to introduce an issue, but there 
is a very slight possibility.  Any issues that do surface should be of 
the "not using a pattern" kind because a conditional compilation code 
case was somehow missed. I'm hoping for none of those obviously.  
Anyway, the tool does seem to work on all the tests I have looked at.  
If any bugs are uncovered by this, then they are also latent issues we 
didn't know about that should be exposed and fixed anyway.


I am fine if we'd like to separate the patches into the reordering, and 
the deleting.   It's not a lot of effort on my part, just a lot of time 
compiling for the reducer in the background.. and we can do them as 2 
commits if that is helpful.


What I don't want to do is spend a lot more time massaging the tools for 
contrib because I am sick of looking at them right now, and no one is in 
a hurry to use them anyway...  if anyone ever does. :-)  The documentation 
grammar should certainly be fixed up and I will add some comments around 
the questions you had.


We could also do a small scale submission on half a dozen files, provide 
the reorder patch, and then the reduction patch with the logs.  If that 
helps whoever is reviewing get comfortable with what the tool is doing, 
then it's easier to simply acknowledge the mechanical nature of the large 
commit.


Perhaps it would be educational anyway.

I'll do it however you guys want...  I just want to get it done :-)

Andrew

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-10-06 Thread Steve Ellcey

On Thu, 2015-09-24 at 21:24 -0400, Trevor Saunders wrote:
> On Thu, Sep 24, 2015 at 06:55:11PM +0200, Bernd Schmidt wrote:
> > On 09/24/2015 06:11 PM, Steve Ellcey wrote:
> > >At least one of the warnings in glibc is not justified (in my opinion).
> > >The header file timezone/private.h defines time_t_min and time_t_max.
> > >These are not used in any of the timezone files built by glibc but if
> > >you look at the complete tz package they are used when building other
> > >objects that are not part of the glibc tz component and that include
> > >private.h.
> > 
> > The standard C way of writing this would be to declare time_t_min in the
> > header and have its definition in another file, or use a TIME_T_MIN macro as
> > glibc does in mktime.c. That file even has a local redefinition:
> >   time_t time_t_min = TIME_T_MIN;
> > So at the very least the warning points at code that has some oddities.
> 
> I can believe it's an odd way to write C, but is it actually a bad one?
> I expect if I got warnings for code like that I'd be pretty unhappy
> about either moving the constant out where the compiler can't always see
> it, or making it a macro.
> 
> > >I would make two arguments about why I don't think we should warn.
> > >
> > >One is that 'static int const foo = 1' seems a lot like '#define foo 1'
> > >and we don't complain about the macro foo not being used.  If we
> > >complain about the unused const, why not complain about the unused
> > >macro?  We don't complain because we know it would result in too many
> > >warnings in existing code.  If we want people to move away from macros,
> > >and I think we do, then we should not make it harder to do so by
> > >introducing new warnings when they change.
> > >
> > >The other is that C++ does not complain about this.  I know that C and
> > >C++ are different languages with different rules but it seems like this
> > >difference is a difference that doesn't have to exist.  Either both
> > >should complain or neither should complain.  I can't think of any valid
> > >reason for one to complain and the other not to.
> > 
> > Well, they _are_ different languages, and handling of const is one place
> > where they differ. For example, C++ consts can be used in places where
> > constant expressions are required. The following is a valid C++ program but
> > not a C program:
> > 
> > const int v = 200;
> > int t[v];
> > 
> > The result is that the typical programming style for C is to have constants
> > #defined, while for C++ you can find more examples like the above; I recall
> > Stroustrup explicitly advocating that in the introductory books I read 20
> > years ago, and using it as a selling point for C++. Existing practice is
> > important when deciding what to warn about, and for the moment I remain
> > convinced that C practice is sufficiently different from C++.
> 
> existing practice is certainly important, but I would say that what is
> good practice is also very important.  It seems to me that warning for
> these constants is basically making it hard to follow a better practice
> than the existing one.  That seems pretty unfortunate.
> 
> On the other hand I've become much more of a C++ programmer than a C one
> so, I'm probably not the best judge.
> 
> Trev
> 
> > 
> > 
> > Bernd

So, is there any consensus on this issue?  I cannot build top-of-tree
glibc with top-of-tree GCC due to this warning coming from
timezone/private.h.  If GCC is going to keep this warning then I should
talk to the owners of the header file (tz group, not glibc) and see if
they would be willing to add an unused attribute to it.  Personally, I
would rather have GCC not issue the warning, but that is just based on
my opinion that the definition of a (possibly) unused static initialized
variable in a C header is a reasonable practice.
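
For reference, a minimal sketch of the pattern under discussion, with the attribute workaround applied.  The names demo_time_t_min, demo_time_t_max and demo_saturating_next are hypothetical stand-ins (the real header is timezone/private.h and uses time_t), and `long` is used only to keep the example self-contained.

```c
#include <limits.h>

/* Stand-in for the header: two file-scope constants, each marked so that
   a translation unit that never references one of them stays quiet under
   -Wunused-variable.  */
static const long demo_time_t_min __attribute__ ((__unused__)) = LONG_MIN;
static const long demo_time_t_max __attribute__ ((__unused__)) = LONG_MAX;

/* Stand-in for a source file that happens to use only one of the two
   constants; demo_time_t_min is never referenced here.  */
static long
demo_saturating_next (long t)
{
  return t == demo_time_t_max ? t : t + 1;
}
```

Without the attribute, this is the shape of code that the new C warning would flag for demo_time_t_min.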

Steve Ellcey
sell...@imgtec.com

Re: [3/7] Optimize ZEXT_EXPR with tree-vrp

2015-10-06 Thread kugan



Hi Richard,

Thanks for the review.

On 15/09/15 23:08, Richard Biener wrote:

On Mon, Sep 7, 2015 at 4:58 AM, Kugan  wrote:

This patch adds tree-vrp handling and optimization for ZEXT_EXPR.


+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p ());
+  unsigned int prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int type_min, type_max;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  gcc_assert (!TYPE_UNSIGNED (expr_type));

hmm, I don't think we should restrict SEXT_EXPR this way.  SEXT_EXPR
should operate on both signed and unsigned types and the result type
should be the same as the type of operand 0.

+  type_min = wi::shwi (1 << (prec - 1),
+  TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  type_max = wi::shwi (((1 << (prec - 1)) - 1),
+  TYPE_PRECISION (TREE_TYPE (vr0.max)));

there is wi::min_value and max_value for this.


As of now, SEXT_EXPR in gimple is of the form: x = y sext 8 and types of 
all the operands and results are of the wider type. Therefore we can't use 
wi::min_value directly. Or do you want to convert this precision (in this 
case 8) to a type and use wi::min_value?
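
To make the "x = y sext 8" semantics concrete, here is a small model in plain C, independent of wide_int.  All names (demo_sext, demo_sext_min, demo_sext_max) are hypothetical; the point is only that the value range of the result is determined by the precision operand, not by the wider operand type.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of X to 64 bits (1 <= PREC <= 64).  */
static int64_t
demo_sext (uint64_t x, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  if (prec == 64)
    return (int64_t) x;

  uint64_t sign = (uint64_t) 1 << (prec - 1);
  uint64_t low = x & ((sign << 1) - 1);   /* Keep only the low PREC bits.  */
  return (int64_t) ((low ^ sign) - sign); /* Flip and subtract the sign bit.  */
}

/* Smallest value a PREC-bit sign extension can produce.  */
static int64_t
demo_sext_min (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  return prec == 64 ? INT64_MIN : -((int64_t) 1 << (prec - 1));
}

/* Largest value a PREC-bit sign extension can produce.  */
static int64_t
demo_sext_max (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  return prec == 64 ? INT64_MAX : ((int64_t) 1 << (prec - 1)) - 1;
}
```

For a precision of 8 the bounds are -128 and 127, whatever the width of the operand type.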


Please find the patch that addresses the other comments.

Thanks,
Kugan



+ HOST_WIDE_INT int_may_be_nonzero = may_be_nonzero.to_uhwi ();
+ HOST_WIDE_INT int_must_be_nonzero = must_be_nonzero.to_uhwi ();

this doesn't need to fit a HOST_WIDE_INT, please use wi::bit_and (can't
find a test_bit with a quick search).

+  tmin = wi::sext (tmin, prec - 1);
+  tmax = wi::sext (tmax, prec - 1);
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);

not sure why you need the extra sign-extensions here.

+case SEXT_EXPR:
+   {
+ gcc_assert (is_gimple_min_invariant (op1));
+ unsigned int prec = tree_to_uhwi (op1);

no need to assert, tree_to_uhwi will do that for you.

+ HOST_WIDE_INT may_be_nonzero = may_be_nonzero0.to_uhwi ();
+ HOST_WIDE_INT must_be_nonzero = must_be_nonzero0.to_uhwi ();

likewise with the HOST_WIDE_INT issue.

Otherwise looks ok to me.  Btw, this and adding of SEXT_EXPR could be
accompanied with a match.pd pattern detecting sign-extension patterns,
that would give some extra test coverage.

Thanks,
Richard.




gcc/ChangeLog:

2015-09-07  Kugan Vivekanandarajah  

 * tree-vrp.c (extract_range_from_binary_expr_1): Handle SEXT_EXPR.
 (simplify_bit_ops_using_ranges): Likewise.
 (simplify_stmt_using_ranges): Likewise.
From 75fb9b8bcacd36a1409bf94c38048de83a5eab62 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 17 Aug 2015 13:45:52 +1000
Subject: [PATCH 3/7] Optimize ZEXT_EXPR with tree-vrp

---
 gcc/tree-vrp.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2cd71a2..9c7d8d8 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2317,6 +2317,7 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   && code != LSHIFT_EXPR
   && code != MIN_EXPR
   && code != MAX_EXPR
+  && code != SEXT_EXPR
   && code != BIT_AND_EXPR
   && code != BIT_IOR_EXPR
   && code != BIT_XOR_EXPR)
@@ -2877,6 +2878,53 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
   return;
 }
+  else if (code == SEXT_EXPR)
+{
+  gcc_assert (range_int_cst_p (&vr1));
+  unsigned int prec = tree_to_uhwi (vr1.min);
+  type = vr0.type;
+  wide_int tmin, tmax;
+  wide_int sign_bit;
+  wide_int type_min, type_max;
+  wide_int may_be_nonzero, must_be_nonzero;
+
+  type_min = wi::shwi (1 << (prec - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  type_max = wi::shwi (((1 << (prec - 1)) - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.max)));
+  sign_bit = wi::shwi (1 << (prec - 1),
+			   TYPE_PRECISION (TREE_TYPE (vr0.min)));
+  if (zero_nonzero_bits_from_vr (expr_type, &vr0,
+ &may_be_nonzero,
+ &must_be_nonzero))
+	{
+	  if (wi::bit_and (must_be_nonzero, sign_bit) == sign_bit)
+	{
+	  /* If to-be-extended sign bit is one.  */
+	  tmin = type_min;
+	  tmax = may_be_nonzero;
+	}
+	  else if (wi::bit_and (may_be_nonzero, sign_bit)
+		   != sign_bit)
+	{
+	  /* If to-be-extended sign bit is zero.  */
+	  tmin = must_be_nonzero;
+	  tmax = may_be_nonzero;
+	}
+	  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+	}
+  else
+	{
+	  tmin = type_min;
+	  tmax = type_max;
+	}
+  min = wide_int_to_tree (expr_type, tmin);
+  max = wide_int_to_tree (expr_type, tmax);
+}
   else if (code == RSHIFT_EXPR
 	   || code == LSHIFT_EXPR)
 {
@@ -9244,6 +9292,30 @@ simplify_bit_ops_using_ranges

Re: [google][gcc-4_9] encode and compress cc1 option strings in gcov_module_info

2015-10-06 Thread Xinliang David Li

ok with the following changes (and after testing).

+   * sizeof (char*));
   for (unsigned j = 0; j < total_num - num_array[0]; j++)
-module->string_array[j] = xstrdup (gcov_read_string ());
+string_array[j] = xstrdup (gcov_read_string ());
+
+  k = 0;
+  for (unsigned j = 1; j < (unsigned)num_lipo_cc1_string_kind; j++)

missing space.

+{
+  if (num_array[j] == 0)
+continue;
+  j_end += num_array[j];
+  len += strlen (DELIMITER) + 1; /* [  */
+  for (; k < j_end; k++)
+len += strlen (string_array[k]) + 1; /* 1 for delimiter of ']'  */
+}
+  saved_cc1_strings = (char *) xmalloc (len + 1);
+  saved_cc1_strings[0] = 0;
+
+  j_end = 0;
+  k = 0;
+  for (unsigned j = 1; j < (unsigned)num_lipo_cc1_string_kind; j++)

missing space.

+
+struct lipo_parsed_cc1_string *
+lipo_parse_saved_cc1_string (const char *src, char *str, bool
parse_cl_args_only)
+{
+  char *substr_begin;
+  unsigned size = sizeof (struct lipo_parsed_cc1_string);
+
+  struct lipo_parsed_cc1_string *ret = (struct lipo_parsed_cc1_string *)
+ xmalloc (size);
+  memset (ret, 0, size);

use XCNEW instead to allocate and initialize.

+  ret->source_filename = src;
+
+  substr_begin = find_substr (str);
+  if (substr_begin == str + strlen(str))

  if (*substr_begin == '\0') is faster.

+k = k_cpp_includes;
+break;
+  case COMMAND_ARG_FLAG:
+k = k_lipo_cl_args;
+break;
+  default:
+gcc_unreachable ();
 }
-  for (pdir = system_paths; pdir; pdir = pdir->next)
-num_system_paths++;
+substr_end = find_substr (substr_begin + 2);

 2 --> strlen(DELIMITER) + 1

+/* The following defines are for the saved_cc1_strings encoding.  */
+
+#define DELIMITER   "["
+#define DELIMITER2  "]"

Suggest changing DELIMITER to LIPO_STR_DELIM
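
For readers trying to picture the encoding being reviewed, here is a rough sketch of how one "kind" of option strings could be packed into a single buffer.  It is an assumption pieced together from the length computation in the patch (one "[" plus a flag character per kind, one "]" after each string); the exact layout in the real code may differ, and demo_encode_kind and the DEMO_ macros are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STR_DELIM  "["
#define DEMO_STR_DELIM2 "]"

/* Hypothetical encoder: pack COUNT strings of one kind as
   "[" FLAG s0 "]" s1 "]" ...  The caller frees the result.  */
static char *
demo_encode_kind (char flag, const char *const *strings, size_t count)
{
  /* Opening delimiter plus the one-character kind flag.  */
  size_t len = strlen (DEMO_STR_DELIM) + 1;
  for (size_t k = 0; k < count; k++)
    len += strlen (strings[k]) + strlen (DEMO_STR_DELIM2);

  char *out = malloc (len + 1);
  assert (out != NULL);

  char *p = out;
  memcpy (p, DEMO_STR_DELIM, strlen (DEMO_STR_DELIM));
  p += strlen (DEMO_STR_DELIM);
  *p++ = flag;
  for (size_t k = 0; k < count; k++)
    {
      size_t n = strlen (strings[k]);
      memcpy (p, strings[k], n);
      p += n;
      memcpy (p, DEMO_STR_DELIM2, strlen (DEMO_STR_DELIM2));
      p += strlen (DEMO_STR_DELIM2);
    }
  *p = '\0';
  return out;
}
```

Because "]" terminates every string, options that contain spaces survive intact, which is the reason given in the thread for not using a space as the separator.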

On Tue, Oct 6, 2015 at 11:40 AM, Rong Xu  wrote:
> Here is the patch set 2 that integrates David's comments. Note that
> this uses the combined strlen (i.e. encoding compressed and
> uncompressed strlen into one gcov_unsigned_t).
>
> Testing is ongoing.
>
> -Rong
>
> On Tue, Oct 6, 2015 at 11:30 AM, Rong Xu  wrote:
>> It's 1:3 to 1:4 in the programs I tested. But it really depends on how
>> the options are used. I think your idea of using combined strlen works
>> better.
>> It just makes the code a little clumsy but it does not cause any
>> performance issue.
>>
>> On Tue, Oct 6, 2015 at 10:21 AM, Xinliang David Li  
>> wrote:
>>> On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu  wrote:
 On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li  
 wrote:
>unsigned ggc_memory = gcov_read_unsigned ();
> +  unsigned marker = 0, len = 0, k;
> +  char **string_array, *saved_cc1_strings;
> +
>for (unsigned j = 0; j < 7; j++)
>
>
> Do not use hard coded number. Use the enum defined in coverage.c.

 OK.

>
>
> +string_array[j] = xstrdup (gcov_read_string ());
> +
> +  k = 0;
> +  for (unsigned j = 1; j < 7; j++)
>
> Do not use hard coded number.

 OK.

>
>
> +{
> +  if (num_array[j] == 0)
> +continue;
> +  marker += num_array[j];
>
> It is better to read if the name of variable 'marker' is changed to
> 'j_end' or something similar
>
> For all the substrings of 'j' kind, there should be just one marker,
> right? It looks like here you introduce one marker per string, not one
> marker per string kind.

 I don't understand what you meant here. "marker" is fixed for each j
 substring (one option kind) -- it is the end index of the sub-string
 array. k-loop is for each string.

>>>
>>> That was a wrong comment from me. Discard it.
>>>
>
> +  len += 3; /* [[  */
>
> Same here for hard coded value.
>
> +  for (; k < marker; k++)
> +len += strlen (string_array[k]) + 1; /* 1 for delimter of 
> ']'  */
>
> Why do we need one ']' per string?

 This is because the options strings can contain space '  '. I cannot
 use space as the delimiter, neither is \0 as it is the end of the
 string of the encoded string.
>>>
>>> Ok -- this allows you to avoid string copy during parsing.

>
>
> +}
> +  saved_cc1_strings = (char *) xmalloc (len + 1);
> +  saved_cc1_strings[0] = 0;
> +
> +  marker = 0;
> +  k = 0;
> +  for (unsigned j = 1; j < 7; j++)
>
> Same here for 7.

 will fix in the new patch.

>
> +{
> +  static const char lipo_string_flags[6] = {'Q', 'B', 'S',
> 'D','I', 'C'};
> +

Re: [PATCH v2] SH FDPIC backend support

2015-10-06 Thread Rich Felker

On Wed, Oct 07, 2015 at 07:22:59AM +0900, Oleg Endo wrote:
> On Tue, 2015-10-06 at 12:52 -0400, Rich Felker wrote:
> > > > +  if (TARGET_FDPIC)
> > > > +{
> > > > +  rtx a = force_reg (Pmode, plus_constant (Pmode, XEXP (tramp_mem, 
> > > > 0), 8));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 0), a);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4), 
> > > > OUR_FDPIC_REG);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd203d302 : 
> > > > 0xd302d203,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x5c216122 : 
> > > > 0x61225c21,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 16),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009412b : 
> > > > 0x412b0009,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 20), cxt);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 24), fnaddr);
> > > > +}
> > > > +  else
> > > > +{
> > > > +  emit_move_insn (change_address (tramp_mem, SImode, NULL_RTX),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd301d202 : 
> > > > 0xd202d301,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009422b : 
> > > > 0x422b0009,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8), cxt);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12), fnaddr);
> > > > +}
> > > 
> > > I think this hunk really needs a comment.  It copies machine code from
> > > somewhere to somewhere via constant loads... but what exactly are the
> > > instructions ...
> > 
> > This is generating trampolines for nested functions. This portion of
> > the patch applied without modification from the old patch, so I didn't
> > read into it in any more detail; it seems to be the following, which
> > makes sense:
> > 
> > 0:  .long 1f
> > .long gotval
> > 1:  mov.l 3f,r3
> > mov.l 2f,r2
> > mov.l @r2,r1
> > mov.l @(4,r2),r12
> > jmp @r1
> > nop
> > 3:  .long cxt
> > 2:  .long fnaddr
> > 
> > The corresponding non-FDPIC version is:
> > 
> > mov.l 3f,r3
> > mov.l 2f,r2
> > jmp @r2
> > nop
> > 3:  .long cxt
> > 2:  .long fnaddr
> > 
> > Should these go into the source as comments?
> 
> Yes, please.  And of course some of the descriptive text as above.

OK.
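
As a starting point for that comment, the FDPIC trampoline can be pictured as seven 32-bit words.  The sketch below fills them in the same order as the hunk quoted above; demo_fill_fdpic_trampoline and its parameters are hypothetical, and plain integers stand in for the RTL operands (the trampoline address, the FDPIC register value, cxt and fnaddr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_TRAMP_WORDS = 7 };

/* Hypothetical model of the FDPIC trampoline layout:
     word 0: address of the code, i.e. trampoline address + 8
     word 1: GOT value
     word 2: mov.l 3f,r3 ; mov.l 2f,r2
     word 3: mov.l @r2,r1 ; mov.l @(4,r2),r12
     word 4: jmp @r1 ; nop
     word 5: static chain (cxt)
     word 6: address of the callee's function descriptor (fnaddr)  */
static void
demo_fill_fdpic_trampoline (uint32_t tramp[DEMO_TRAMP_WORDS],
                            uint32_t tramp_addr, uint32_t got_value,
                            uint32_t cxt, uint32_t fnaddr,
                            int little_endian)
{
  assert (tramp != NULL);
  tramp[0] = tramp_addr + 8;
  tramp[1] = got_value;
  tramp[2] = little_endian ? 0xd203d302u : 0xd302d203u;
  tramp[3] = little_endian ? 0x5c216122u : 0x61225c21u;
  tramp[4] = little_endian ? 0x0009412bu : 0x412b0009u;
  tramp[5] = cxt;
  tramp[6] = fnaddr;
}
```

Words 0 and 1 form the trampoline's own function descriptor, which is why the code proper starts at offset 8.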

> > I would think it does, but I've found in the RTL files sometimes extra
> > escaping is silently accepted, and I'm not sure if omitting it would
> > visibly break. Can I rely on it producing a visible error right away
> > if removing it is wrong, or do I need to search the gccint
> > documentation to figure out what the right way is?
> 
> Just compile some code and look at the generated asm.

OK, I'll try this.

> > It can't generate the same code either way because, with the patch as
> > submitted, there's an extra load inside the asm. I would prefer
> > switching to an approach that avoids that (mainly to avoid the ugly
> > near-duplication of the asm block, but also to save a couple
> > instructions) but short of feedback on acceptable ways to do the
> > punning in the C++ I'll just leave it in the asm for now.
> 
> Do you have some alternatives to what's currently in the patch?  It's
> difficult to judge without seeing them...

Perhaps something like the following:

#ifdef __SH_FDPIC__
typedef __attribute__((__may_alias__)) uintptr_t sh_aliased_uintptr_t;
#define SH_CODE_ADDR(x) (*(sh_aliased_uintptr_t *)(x))
#else
#define SH_CODE_ADDR(x) x
#endif

And then just passing SH_CODE_ADDR(__udiv_qrnnd_16) rather than just
__udiv_qrnnd_16 as the input to the asm.
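
To show what the macro is meant to do, here is a self-contained model.  It assumes, as the trampoline code suggests, that an FDPIC function pointer designates a descriptor whose first word is the code address and whose second word is the GOT value.  The struct and all demo_ names are hypothetical; the may_alias attribute is the GCC extension mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an FDPIC function descriptor.  */
struct demo_fdpic_desc
{
  uintptr_t code_addr;  /* Entry point of the function.  */
  uintptr_t got_value;  /* GOT pointer to load before the call.  */
};

/* An aliasing-safe way to read the first word through a pointer of
   another type, in the spirit of SH_CODE_ADDR.  */
typedef __attribute__ ((__may_alias__)) uintptr_t demo_aliased_uintptr_t;
#define DEMO_CODE_ADDR(x) (*(const demo_aliased_uintptr_t *) (x))

/* Return the code address stored in DESC.  */
static uintptr_t
demo_entry_point (const struct demo_fdpic_desc *desc)
{
  assert (desc != NULL);
  return DEMO_CODE_ADDR (desc);
}
```

Doing the load in C this way would let the asm block take the code address directly as an input operand.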

Rich

[Fortran, committed] Fix segfault on substring of derived type component (pr 65766)

2015-10-06 Thread Louis Krupp

Revision 228551...

Louis

[PATCH] move graphite bookkeeping from sese to sese_info

2015-10-06 Thread Sebastian Pop

2015-10-06  Aditya Kumar  
Sebastian Pop  

* graphite-isl-ast-to-gimple.c (translate_isl_ast_to_gimple): 
Use
an sese_info_p.
(copy_def): Same.
(copy_internal_parameters): Same.
(translate_isl_ast_to_gimple): Use an sese_l.
(build_iv_mapping): Same.
* graphite-poly.c (new_sese): Rename new_sese_info.
(free_sese): Rename free_sese_info.
* graphite-poly.h (struct scop): Use an sese_info_p.
(scop_set_region): Same.
* graphite-scop-detection.c (struct sese_l): Moved...
(get_entry_bb): Moved...
(get_exit_bb): Moved...
(parameter_index_in_region_1): Use an sese_info_p.
(parameter_index_in_region): Same.
(scan_tree_for_params): Same.
(find_params_in_bb): Same.
(sese_dom_walker): Use an sese_l.
* graphite-sese-to-poly.c (remove_invariant_phi): Same.
(reduction_phi_p): Same.
(parameter_index_in_region_1): Use an sese_info_p.
(propagate_expr_outside_region): Use an sese_l.
* graphite.c: Replace uses of SCOP_REGION.
* sese.c (sese_record_loop): Use an sese_info_p.
(build_sese_loop_nests): Same.
(sese_build_liveouts_use): Same.
(sese_build_liveouts_bb): Same.
(sese_build_liveouts_bb): Same.
(sese_bad_liveouts_use): Same.
(sese_reset_debug_liveouts_bb): Same.
(sese_build_liveouts): Same.
(new_sese): Renamed new_sese_info.
(free_sese): Renamed free_sese_info.
(set_rename): Use an sese_info_p.
(graphite_copy_stmts_from_block): Same.
(copy_bb_and_scalar_dependences): Same.
(outermost_loop_in_sese_1): Use an sese_l.
(outermost_loop_in_sese): Same.
(if_region_set_false_region): Use an sese_info_p.
(move_sese_in_condition): Same.
(scalar_evolution_in_region): Use an sese_l.
* sese.h (struct sese_l): ... here.
(SESE_ENTRY): Remove.
(SESE_ENTRY_BB): Remove.
(SESE_EXIT): Remove.
(SESE_EXIT_BB): Remove.
(sese_contains_loop): Use an sese_info_p.
(sese_nb_params): Same.
(bb_in_sese_p): Use an sese_l.
(stmt_in_sese_p): Same.
(defined_in_sese_p): Same.
(loop_in_sese_p): Same.
(sese_loop_depth): Same.
(struct ifsese_s): Use an sese_info_p.
(gbb_loop_at_index): Use an sese_l.
(nb_common_loops): Same.
(scev_analyzable_p): Same.
---
 gcc/graphite-isl-ast-to-gimple.c |  38 +--
 gcc/graphite-poly.c  |   4 +-
 gcc/graphite-poly.h  |   4 +-
 gcc/graphite-scop-detection.c| 134 ++-
 gcc/graphite-sese-to-poly.c  |  46 +++---
 gcc/graphite.c   |   8 +--
 gcc/sese.c   | 103 +++---
 gcc/sese.h   | 124 +---
 8 files changed, 222 insertions(+), 239 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 43cb7dd..f4e7edf 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -132,7 +132,7 @@ void ivs_params_clear (ivs_params )
 class translate_isl_ast_to_gimple
 {
  public:
-  translate_isl_ast_to_gimple (sese r)
+  translate_isl_ast_to_gimple (sese_info_p r)
 : region (r)
   { }
 
@@ -261,9 +261,9 @@ class translate_isl_ast_to_gimple
  corresponding tree expressions.  */
   void build_iv_mapping (vec iv_map, gimple_poly_bb_p gbb,
 __isl_keep isl_ast_expr *user_expr, ivs_params ,
-sese region);
+sese_l );
 private:
-  sese region;
+  sese_info_p region;
 };
 
 /* Return the tree variable that corresponds to the given isl ast identifier
@@ -741,7 +741,7 @@ void
 translate_isl_ast_to_gimple::
 build_iv_mapping (vec iv_map, gimple_poly_bb_p gbb,
  __isl_keep isl_ast_expr *user_expr, ivs_params ,
- sese region)
+ sese_l )
 {
   gcc_assert (isl_ast_expr_get_type (user_expr) == isl_ast_expr_op &&
  isl_ast_expr_get_op_type (user_expr) == isl_ast_op_call);
@@ -756,7 +756,6 @@ build_iv_mapping (vec iv_map, gimple_poly_bb_p gbb,
   loop_p old_loop = gbb_loop_at_index (gbb, region, i - 1);
   iv_map[old_loop->num] = t;
 }
-
 }
 
 /* Translates an isl_ast_node_user to Gimple.
@@ -787,10

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Bernd Schmidt


On 10/06/2015 08:02 PM, Steve Ellcey wrote:

If I remove that and I change Rule #16 to handle an AND with a register
I get odd looking .cfi stuff.  The AND instruction (which is marked with
RTX_FRAME_RELATED) seems to generate these cfi_escape macros:

.cfi_escape 0x10,0x1f,0x2,0x8e,0x7c
.cfi_escape 0x10,0x1e,0x2,0x8e,0x78
.cfi_escape 0x10,0xc,0x2,0x8e,0x74

which are meaningless to me.


Possibly something generating a DW_CFA_expression, which then gets 
expanded as .cfi_escape sequences. Have a look at reg_save, there's code 
conditional on stack_realign that does that. Or put a breakpoint 
anywhere in dwarf2cfi that makes a .cfi_escape to see where it's coming 
from.
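
For what it's worth, the escapes quoted above can be decoded by hand: 0x10 is DW_CFA_expression, followed by a ULEB128 register number, a block length, and a DWARF expression, here a single DW_OP_breg<N> (0x70 + N) with an SLEB128 offset.  The sketch below handles only this exact five-byte shape; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_cfa_expr
{
  unsigned reg;       /* Register whose save slot is described.  */
  unsigned base_reg;  /* Base register of the DW_OP_breg<N> operation.  */
  int offset;         /* Signed offset from the base register.  */
};

/* Decode "DW_CFA_expression reg, 2, DW_OP_breg<N> off" when every
   LEB128 operand fits in one byte.  Return 1 on success, 0 otherwise.  */
static int
demo_decode_cfa_expression (const uint8_t *b, size_t n,
                            struct demo_cfa_expr *out)
{
  assert (b != NULL && out != NULL);

  if (n != 5 || b[0] != 0x10 || b[2] != 2)
    return 0;
  if (b[3] < 0x70 || b[3] > 0x8f)      /* Not DW_OP_breg0..breg31.  */
    return 0;
  if ((b[1] & 0x80) || (b[4] & 0x80))  /* Multi-byte LEB128: not handled.  */
    return 0;

  out->reg = b[1];
  out->base_reg = b[3] - 0x70;
  /* One-byte SLEB128: bit 6 is the sign bit.  */
  out->offset = (b[4] & 0x40) ? (int) b[4] - 0x80 : (int) b[4];
  return 1;
}
```

Read this way, the three escapes say that registers 31, 30 and 12 are saved at offsets -4, -8 and -12 from register 30.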


As HJ says, fix readelf first to help you dump the unwind info.


Bernd

Re: [RFA][PATCH] Fix building cr16-elf with trunk compiler

2015-10-06 Thread Marek Polacek

On Wed, Sep 30, 2015 at 07:10:11PM +0200, Marek Polacek wrote:
> On Wed, Sep 30, 2015 at 11:01:14AM -0600, Jeff Law wrote:
> > On 09/30/2015 06:45 AM, Marek Polacek wrote:
> > >On Wed, Sep 30, 2015 at 02:41:36PM +0200, Bernd Schmidt wrote:
> > >>On 09/29/2015 11:49 PM, Jeff Law wrote:
> > >>>
> > >>>This code from builtins.c:
> > >>>
> > >>>   /* If we don't need too much alignment, we'll have been guaranteed
> > >>>  proper alignment by get_trampoline_type.  */
> > >>>   if (TRAMPOLINE_ALIGNMENT <= STACK_BOUNDARY)
> > >>> return tramp;
> > >>>
> > >>>
> > >>>It's entirely conceivable that TRAMPOLINE_ALIGNMENT will be the same as
> > >>>STACK_BOUNDARY.  And if they are, then -Wtautological-compare will
> > >>>complain bitterly.
> > >>
> > >>Eww. Can we fix the warning not to complain when the comparison involves
> > >>macros?
> > >
> > >It already has
> > >
> > >   /* Don't warn for various macro expansions.  */
> > >   if (from_macro_expansion_at (loc)
> > >   || from_macro_expansion_at (EXPR_LOCATION (lhs))
> > >   || from_macro_expansion_at (EXPR_LOCATION (rhs)))
> > > return;
> > >
> > >and also
> > >
> > >   /* We do not warn for constants because they are typical of macro
> > >  expansions that test for features, sizeof, and similar.  */
> > >   if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs))
> > > return;
> > >
> > >so why does it warn? :(
> > If you want to dive into it, be my guest :-0  Attached is a suitable .ii
> > file.
> 
> All right, let me take a look.

The problem here is that COND_EXPR in cc1plus don't have a location.  I've
opened  to track this.
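
For context, the shape of the code in question is roughly the following.  This is only a sketch: the DEMO_ macros and demo_round_trampoline_addr are hypothetical, and plain constants are used here, whereas on cr16 the target macros presumably expand to something that is not a simple constant (otherwise the CONSTANT_CLASS_P check quoted above would already suppress the warning).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, both in bits; on some targets the two target macros
   happen to expand to the same thing.  */
#define DEMO_TRAMPOLINE_ALIGNMENT 64
#define DEMO_STACK_BOUNDARY 64

/* Round TRAMP up to the trampoline alignment, unless the stack boundary
   already guarantees it.  */
static uintptr_t
demo_round_trampoline_addr (uintptr_t tramp)
{
  /* When both macros expand identically, this comparison is always
     true, which is what -Wtautological-compare would notice if the
     macro-expansion check did not apply.  */
  if (DEMO_TRAMPOLINE_ALIGNMENT <= DEMO_STACK_BOUNDARY)
    return tramp;

  uintptr_t align = DEMO_TRAMPOLINE_ALIGNMENT / 8;
  assert ((align & (align - 1)) == 0);  /* Must be a power of two.  */
  return (tramp + align - 1) & ~(align - 1);
}
```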

Marek

Re: [patch 4/3] Header file reduction - Tools for contrib

2015-10-06 Thread Bernd Schmidt


On 10/05/2015 11:18 PM, Andrew MacLeod wrote:

Here's the patch to add all the tools to contrib/headers.


Small patches should not be sent in compressed form, it makes reading 
and quoting them harder. This message is only intended to contain the 
patch in plain text so that I can quote it in further replies.



There are 9 tools I used over the run of the project.  They were
developed in various stages and iterations, but I tried to at least have
some common interface things, and I tried some cleaning up and
documentation.  No commenting on the quality of Python code... :-) I was
learning Python on the fly.  I'm sure some things are QUITE awful.

There is a README file which gives common use cases for each tool.

Some of the tools are for analysis, aggregation, or flattening, some for
visualization, and some are for the include reduction. I would have just
filed them away somewhere, but  Jeff suggested I contribute them in case
someone wants to do something with them down the road... which
presumably also includes me :-)   Less chance of losing them this way.

They need more polishing, but I'm tired of looking at them. I will
return to them down the road and see about cleaning them up a bit more.
They still aren't perfect by any means, but should do their job safely
when used properly.   Comments in the code vary from good to absent,
depending on how irritable I was at the time I was working on it.

I will soon also provide a modified config-list.mk which still works
like the current one, but allows for easy overrides of certain things
the include reducer requires.  Until now I've just made a copy of
config-list.mk and modified it for my own means.

The 2 tools for include reduction are  gcc-order-headers   and
reduce-headers

What are the process/conditions for checking things into contrib?  I've
never had to do it before :-)

Andrew



Index: contrib/headers/ChangeLog
===
*** contrib/headers/ChangeLog	(revision 0)
--- contrib/headers/ChangeLog	(working copy)
***
*** 0 
--- 1,12 
+ 2015-10-06  Andrew MacLeod  
+ 
+ 	* README : New File.
+ 	* count-headers : New File.
+ 	* gcc-order-headers : New File.
+ 	* graph-header-logs : New File.
+ 	* graph-include-web : New File.
+ 	* headerutils.py : New File.
+ 	* included-by : New File.
+ 	* reduce-headers : New File.
+ 	* replace-header : New File.
+ 	* show-headers : New File.
Index: contrib/headers/README
===
*** contrib/headers/README	(revision 0)
--- contrib/headers/README	(working copy)
***
*** 0 
--- 1,282 
+ Quick start documentation for the header file utilities.  
+ 
+ This isn't a full breakdown of the tools, just their typical use scenarios.
+ 
+ - Each tool accepts -h to show its usage.  Usually no parameters will also
+ trigger the help message.  Help may specify additional functionality to what is
+ listed here.
+ 
+ - For *all* tools, option format for specifying filenames must have no spaces
+ between the option and filename.
+ ie.: tool -lfilename.h  target.h
+ 
+ - Many of the tools are required to be run from the core gcc source directory
+ containing coretypes.h.  Typically that is in gcc/gcc from a source checkout.
+ For these tools to work on files not in this directory, their path needs to be
+ specified on the command line, 
+ ie.: tool c/c-decl.c  lto/lto.c
+ 
+ - options can be intermixed with filenames anywhere on the command line
+ ie.   tool ssa.h rtl.h -a   is equivalent to 
+   tool ssa.h -a rtl.h
+ 
+ 
+ 
+ 
+ 
+ gcc-order-headers
+ -
+   This will reorder any primary backend header files into a canonical order
+   which will resolve any hidden dependencies they may have.  Any unknown
+   headers will simply occur after the recognized core files, and retain the
+   same relative ordering they had.
+  
+   Must be run in the core gcc source directory
+ 
+   simply execute the command listing any files you wish to process on the
+   command line.
+ 
+   Any files which are changed are output, and the original is saved with a
+   .bak extension.
+ 
+   ex.: gcc-order-headers tree-ssa.c c/c-decl.c
+ 
+   -s will list all of the known headers in their canonical order. It does not
+   show which of those headers include other headers, just the final canonical
+   ordering.
+ 
+   If any header files are included within a conditional code block, the tool
+   will issue a message and not change the file.  When this happens, you can
+   manually inspect the file, and if reordering it will be fine, rerun the command
+   with -i on the files.  This will ignore the conditional error condition
+   and perform the re-ordering anyway.
+   
+   If any #include line has the beginning of a multi-line comment, it will also
+   refuse to process the file until that is resolved. 
+  
+ 
+ 
+ 
+ show-headers
+ 
+   This

[wwwdocs] Suggest UBsan in https://gcc.gnu.org/bugs/

2015-10-06 Thread Jonathan Wakely


For many non-bugs UBsan is at least as likely to reveal it as
-fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations so we
should tell people to try that before wasting time in Bugzilla.
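
A small illustration of the kind of non-bug meant here, for anyone reading along.  This is a sketch of my own, not text proposed for the web page, and both function names are made up:

```c
#include <limits.h>

/* Looks like an overflow test, but signed overflow is undefined, so the
   compiler is allowed to assume x + 1 > x always holds and fold the
   whole function to "return 1".  */
static int
next_is_larger (int x)
{
  return x + 1 > x;
}

/* A well-defined way to ask the same question.  */
static int
next_is_larger_checked (int x)
{
  return x < INT_MAX;
}
```

Calling `next_is_larger (INT_MAX)` is the problematic case: the result can differ between a plain optimized build and one using -fwrapv, and a build with -fsanitize=undefined should flag the signed overflow at run time, which points at the user's code rather than at the compiler.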

OK for wwwdocs?
Index: htdocs/bugs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/bugs/index.html,v
retrieving revision 1.116
diff -u -r1.116 index.html
--- htdocs/bugs/index.html	5 Jul 2014 21:52:32 -	1.116
+++ htdocs/bugs/index.html	6 Oct 2015 10:50:32 -
@@ -50,7 +50,11 @@
 with gcc -Wall -Wextra and see whether this shows anything
 wrong with your code.  Similarly, if compiling with
 -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations
-makes a difference, your code probably is not correct.
+makes a difference, your code probably is not correct.
+If compiling with -fsanitize=undefined is supported by your
+version of GCC and produces any run-time errors your code is definitely
+not correct.
+
 
 Summarized bug reporting instructions

[patch 0/6] scalar-storage-order merge (2)

2015-10-06 Thread Eric Botcazou

Hi,

this is a repost of the diff of the scalar-storage-order branch vs mainline.
It contains the fixes suggested by Joseph for the C front-end and the doc, 
fixes for the handling of complex types, the new pragma scalar_storage_order 
and associated -fsso-struct switch for the C family of languages, and the 
adjustment for vector types suggested by Ramana, plus assorted fixes:
  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00024.html
  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00138.html
  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01500.html

See https://gcc.gnu.org/ml/gcc/2015-06/msg00126.html for the proposal.

I have split the diff into 6 pieces, which are interdependent and thus cannot 
be applied independently: 3 for the Ada, C and C++ front-ends, 1 for the bulk 
of the implementation, 1 for the rest and 1 for the testsuite.

It has been bootstrapped/regtested on x86_64-linux and powerpc-linux.
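
For readers who have not followed the branch, here is a minimal sketch of what the type attribute does, using the syntax from the doc/extend.texi part of patch 2/6.  The struct and function names are hypothetical, and the version guard is only my assumption about which compilers will accept the attribute; elsewhere the struct falls back to the native storage order.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: only GCC proper, from some release after this merge,
   knows the attribute.  The exact version test is a guess.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6
# define HAVE_SSO 1
struct __attribute__ ((scalar_storage_order ("big-endian"))) be_header
{
  uint32_t length;
};
#else
# define HAVE_SSO 0
struct be_header
{
  uint32_t length;
};
#endif

/* Ordinary field accesses are unaffected: the compiler flips the bytes
   on both the store and the load.  */
static uint32_t
read_back (uint32_t v)
{
  struct be_header h;
  h.length = v;
  return h.length;
}

/* Return the first byte of the in-memory image.  Taking the address of
   the whole struct is permitted; taking the address of h.length itself
   would be rejected for a reverse storage order.  */
static unsigned char
first_byte (uint32_t v)
{
  struct be_header h;
  unsigned char raw[sizeof h];
  h.length = v;
  memcpy (raw, &h, sizeof h);
  return raw[0];
}
```

With the attribute in effect, the most significant byte is stored first whatever the endianness of the target.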

-- 
Eric Botcazou

[patch 1/6] scalar-storage-order merge: Ada front-end

2015-10-06 Thread Eric Botcazou

This is the Ada front-end (in fact mostly gigi) part.

ada/
* freeze.adb (Check_Component_Storage_Order): Skip a record component
if it has Complex_Representation.
(Freeze_Record_Type): If the type has Complex_Representation, skip
the regular treatment of Scalar_Storage_Order attribute and instead
issue a warning if it is present.
* gcc-interface/gigi.h (set_reverse_storage_order_on_pad_type):
Declare.
* gcc-interface/decl.c (gnat_to_gnu_entity) : Set the
storage order on the enclosing record for a packed array type.
: Set the storage order.
: Likewise.
: Likewise.
: Likewise.
(gnat_to_gnu_component_type): Set the reverse storage order on a
padded type built for a non-bit-packed array.
(gnat_to_gnu_field): Likewise.
(components_to_record): Deal with TYPE_REVERSE_STORAGE_ORDER.
* gcc-interface/utils.c (make_packable_type): Likewise.
(pad_type_hasher::equal): Likewise.
(gnat_types_compatible_p): Likewise.
(unchecked_convert): Likewise.
(set_reverse_storage_order_on_pad_type): New public function.
* gcc-interface/trans.c (Attribute_to_gnu): Adjust call to
get_inner_reference.
* gcc-interface/utils2.c (build_unary_op): Likewise.
(gnat_build_constructor): Deal with TYPE_REVERSE_STORAGE_ORDER.
(gnat_rewrite_reference): Propagate REF_REVERSE_STORAGE_ORDER.

 freeze.adb |  111 ---
 gcc-interface/decl.c   |   49 +++--
 gcc-interface/gigi.h   |3 +
 gcc-interface/trans.c  |4 -
 gcc-interface/utils.c  |   84 +++--
 gcc-interface/utils2.c |   11 +++-
 6 files changed, 192 insertions(+), 70 deletions(-)

-- 
Eric Botcazou

Index: ada/freeze.adb
===
--- ada/freeze.adb	(.../trunk/gcc)	(revision 228112)
+++ ada/freeze.adb	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -1196,9 +1196,14 @@ package body Freeze is
  Attribute_Scalar_Storage_Order);
   Comp_ADC_Present := Present (Comp_ADC);
 
-  --  Case of record or array component: check storage order compatibility
-
-  if Is_Record_Type (Comp_Type) or else Is_Array_Type (Comp_Type) then
+  --  Case of record or array component: check storage order compatibility.
+  --  But, if the record has Complex_Representation, then it is treated as
+  --  a scalar in the back end so the storage order is irrelevant.
+
+  if (Is_Record_Type (Comp_Type)
+and then not Has_Complex_Representation (Comp_Type))
+or else Is_Array_Type (Comp_Type)
+  then
  Comp_SSO_Differs :=
Reverse_Storage_Order (Encl_Type)
  /=
@@ -3958,61 +3963,73 @@ package body Freeze is
 Next_Entity (Comp);
  end loop;
 
- --  Deal with default setting of reverse storage order
+ SSO_ADC := Get_Attribute_Definition_Clause
+  (Rec, Attribute_Scalar_Storage_Order);
 
- Set_SSO_From_Default (Rec);
+ --  If the record type has Complex_Representation, then it is treated
+ --  as a scalar in the back end so the storage order is irrelevant.
 
- --  Check consistent attribute setting on component types
+ if Has_Complex_Representation (Rec) then
+if Present (SSO_ADC) then
+   Error_Msg_N
+  ("??storage order has no effect with "
+   & "Complex_Representation", SSO_ADC);
+end if;
 
- SSO_ADC := Get_Attribute_Definition_Clause
-  (Rec, Attribute_Scalar_Storage_Order);
+ else
+--  Deal with default setting of reverse storage order
 
- declare
-Comp_ADC_Present : Boolean;
- begin
-Comp := First_Component (Rec);
-while Present (Comp) loop
-   Check_Component_Storage_Order
- (Encl_Type=> Rec,
-  Comp => Comp,
-  ADC  => SSO_ADC,
-  Comp_ADC_Present => Comp_ADC_Present);
-   SSO_ADC_Component := SSO_ADC_Component or Comp_ADC_Present;
-   Next_Component (Comp);
-end loop;
- end;
+Set_SSO_From_Default (Rec);
+
+--  Check consistent attribute setting on component types
 
- --  Now deal with reverse storage order/bit order issues
+declare
+   Comp_ADC_Present : Boolean;
+begin
+   Comp := First_Component (Rec);
+   while Present (Comp) loop
+  Check_Component_Storage_Order
+(Encl_Type=> Rec,
+ Comp => Comp,
+ ADC  => SSO_ADC,
+

[patch 2/6] scalar-storage-order merge: C front-end

2015-10-06 Thread Eric Botcazou

This is the C front-end + C family part.

* doc/extend.texi (type attributes): Document scalar_storage_order.
(Structure-Packing Pragmas): Rename into...
(Structure-Layout Pragmas): ...this.  Document scalar_storage_order.
* doc/invoke.texi (C Dialect Options): Document -fsso-struct
(Warnings): Document -Wno-scalar-storage-order.
* flag-types.h (enum scalar_storage_order_kind): New enumeration.
c-family/
* c-common.c (c_common_attributes): Add scalar_storage_order.
(handle_scalar_storage_order_attribute): New function.
* c-pragma.c (global_sso): New variable.
(maybe_apply_pragma_scalar_storage_order): New function.
(handle_pragma_scalar_storage_order): Likewise.
(init_pragma): Register scalar_storage_order.
* c-pragma.h (maybe_apply_pragma_scalar_storage_order): Declare.
* c.opt (Wscalar-storage-order): New warning.
(fsso-struct=): New option.
c/
* c-decl.c (finish_struct): If the structure has reverse storage
order, rewrite the type of array fields with scalar component.  Call
maybe_apply_pragma_scalar_storage_order on entry.
* c-typeck.c (build_unary_op) : Remove left-overs.  Issue
errors on bit-fields and reverse SSO here and not...
(c_mark_addressable): ...here.
(output_init_element): Adjust call to initializer_constant_valid_p.
(c_build_qualified_type): Propagate TYPE_REVERSE_STORAGE_ORDER.

 doc/extend.texi |   69 ++
 doc/invoke.texi |   22 +++-
 flag-types.h|9 +-
 c-family/c.opt  |   17 
 c-family/c-common.c |   47 ++-
 c-family/c-pragma.c |   50 +
 c-family/c-pragma.h |1 
 c/c-typeck.c|   66 ++---
 c/c-decl.c  |   48 +---
 8 files changed, 273 insertions(+), 47 deletions(-)

-- 
Eric Botcazou

Index: doc/extend.texi
===
--- doc/extend.texi	(.../trunk/gcc)	(revision 228112)
+++ doc/extend.texi	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -6310,6 +6310,42 @@ of the structure or union is placed to m
 attached to an @code{enum} definition, it indicates that the smallest
 integral type should be used.
 
+@item scalar_storage_order ("@var{endianness}")
+@cindex @code{scalar_storage_order} type attribute
+When attached to a @code{union} or a @code{struct}, this attribute sets
+the storage order, aka endianness, of the scalar fields of the type, as
+well as the array fields whose component is scalar.  The supported
+endianness are @code{big-endian} and @code{little-endian}.  The attribute
+has no effects on fields which are themselves a @code{union}, a @code{struct}
+or an array whose component is a @code{union} or a @code{struct}, and it is
+possible to have fields with a different scalar storage order than the
+enclosing type.
+
+This attribute is supported only for targets that use a uniform default
+scalar storage order (fortunately, most of them), i.e. targets that store
+the scalars either all in big-endian or all in little-endian.
+
+Additional restrictions are enforced for types with the reverse scalar
+storage order with regard to the scalar storage order of the target:
+
+@itemize
+@item Taking the address of a scalar field of a @code{union} or a
+@code{struct} with reverse scalar storage order is not permitted and will
+yield an error
+@item Taking the address of an array field, whose component is scalar, of
+a @code{union} or a @code{struct} with reverse scalar storage order is
+permitted but will yield a warning, unless @option{-Wno-scalar-storage-order}
+is specified
+@item Taking the address of a @code{union} or a @code{struct} with reverse
+scalar storage order is permitted
+@end itemize
+
+These restrictions exist because the storage order attribute is lost when
+the address of a scalar or the address of an array with scalar component
+is taken, so storing indirectly through this address will generally not work.
+The second case is nevertheless allowed to be able to perform a block copy
+from or to the array.
+
 @item transparent_union
 @cindex @code{transparent_union} type attribute
 
@@ -18326,7 +18362,7 @@ for further explanation.
 * Darwin Pragmas::
 * Solaris Pragmas::
 * Symbol-Renaming Pragmas::
-* Structure-Packing Pragmas::
+* Structure-Layout Pragmas::
 * Weak Pragmas::
 * Diagnostic Pragmas::
 * Visibility Pragmas::
@@ -18602,8 +18638,8 @@ the name does not change.
 always the C-language name.
 @end enumerate
 
-@node Structure-Packing Pragmas
-@subsection Structure-Packing Pragmas
+@node Structure-Layout Pragmas
+@subsection Structure-Layout Pragmas
 
 For compatibility with Microsoft Windows compilers, GCC supports a
 set of @code{#pragma} directives that

[patch 3/6] scalar-storage-order merge: C++ front-end

2015-10-06 Thread Eric Botcazou

This is the C++ front-end part, probably incomplete but passes the testsuite.

cp/
* class.c: Add c-family/c-pragma.h.
(finish_struct_1): If structure has reverse scalar storage order,
rewrite the type of array fields with scalar component.  Call
maybe_apply_pragma_scalar_storage_order on entry.
* constexpr.c (reduced_constant_expression_p): Unfold recursion and
deal with TYPE_REVERSE_STORAGE_ORDER.
* typeck.c (structural_comptypes): Return false if two aggregate
types have different scalar storage order.
(cp_build_addr_expr_1) : New case.  Issue the
error for bit-fields here and not later.
: Issue error and warning for reverse scalar storage
order.
* typeck2.c (split_nonconstant_init_1) : Adjust call to
initializer_constant_valid_p.

 class.c |   24 +++-
 constexpr.c |   19 +--
 typeck.c|   45 ++---
 typeck2.c   |5 -
 4 files changed, 82 insertions(+), 11 deletions(-)

-- 
Eric Botcazou

Index: cp/typeck.c
===
--- cp/typeck.c	(.../trunk/gcc)	(revision 228112)
+++ cp/typeck.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -1217,6 +1217,9 @@ structural_comptypes (tree t1, tree t2,
 return false;
   if (TYPE_FOR_JAVA (t1) != TYPE_FOR_JAVA (t2))
 return false;
+  if (AGGREGATE_TYPE_P (t1)
+  && TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER (t2))
+return false;
 
   /* Allow for two different type nodes which have essentially the same
  definition.  Note that we already checked for equality of the type
@@ -5584,6 +5587,41 @@ cp_build_addr_expr_1 (tree arg, bool str
 	}
   break;
 
+case COMPONENT_REF:
+  if (BASELINK_P (TREE_OPERAND (arg, 1)))
+	break;
+
+  if (DECL_C_BIT_FIELD (TREE_OPERAND (arg, 1)))
+	{
+	  if (complain & tf_error)
+	error ("attempt to take address of bit-field structure member %qD",
+		   TREE_OPERAND (arg, 1));
+	  return error_mark_node;
+	}
+  /* Fall through.  */
+
+case ARRAY_REF:
+  if (TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (arg, 0))))
+	{
+	  if (!AGGREGATE_TYPE_P (TREE_TYPE (arg))
+	  && !VECTOR_TYPE_P (TREE_TYPE (arg)))
+	{
+	  if (complain & tf_error)
+		error ("attempt to take address of scalar with reverse "
+		   "storage order");
+	  return error_mark_node;
+	}
+
+	   if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE
+	   && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (arg)))
+	{
+	  if (complain & tf_warning)
+		warning (OPT_Wscalar_storage_order, "address of array with "
+			 "reverse scalar storage order requested");
+	}
+	}
+  break;
+
 case BASELINK:
   arg = BASELINK_FUNCTIONS (arg);
   /* Fall through.  */
@@ -5647,13 +5685,6 @@ cp_build_addr_expr_1 (tree arg, bool str
 	val = build2 (COMPOUND_EXPR, TREE_TYPE (val),
 		  TREE_OPERAND (arg, 0), val);
 }
-  else if (DECL_C_BIT_FIELD (TREE_OPERAND (arg, 1)))
-{
-  if (complain & tf_error)
-	error ("attempt to take address of bit-field structure member %qD",
-	   TREE_OPERAND (arg, 1));
-  return error_mark_node;
-}
   else
 {
   tree object = TREE_OPERAND (arg, 0);
Index: cp/class.c
===
--- cp/class.c	(.../trunk/gcc)	(revision 228112)
+++ cp/class.c	(.../branches/scalar-storage-order/gcc)	(revision 228133)
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
 #include "stor-layout.h"
 #include "attribs.h"
 #include "cp-tree.h"
+#include "c-family/c-pragma.h"
 #include "flags.h"
 #include "toplev.h"
 #include "target.h"
@@ -6625,6 +6626,7 @@ finish_struct_1 (tree t)
 }
 
   /* Layout the class itself.  */
+  maybe_apply_pragma_scalar_storage_order (t);
   layout_class_type (t, );
   if (CLASSTYPE_AS_BASE (t) != t)
 /* We use the base type for trivial assignments, and hence it
@@ -6688,12 +6690,32 @@ finish_struct_1 (tree t)
   set_method_tm_attributes (t);
 
   /* Complete the rtl for any static member objects of the type we're
- working on.  */
+ working on and rewrite the type of array fields with scalar
+ component if the enclosing type has reverse storage order.  */
   for (x = TYPE_FIELDS (t); x; x = DECL_CHAIN (x))
 if (VAR_P (x) && TREE_STATIC (x)
 && TREE_TYPE (x) != error_mark_node
 	&& same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
   DECL_MODE (x) = TYPE_MODE (t);
+else if (TYPE_REVERSE_STORAGE_ORDER (t)
+	 && TREE_CODE (x) == FIELD_DECL
+	 && TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+  {
+	tree ftype = TREE_TYPE (x);
+	tree ctype = strip_array_types (ftype);
+	if (!RECORD_OR_UNION_TYPE_P (ctype) && TYPE_MODE (ctype) != QImode)
+	  {
+	tree fmain_type = TYPE_MAIN_VARIANT (ftype);
+	tree *typep = _type;
+	do {
+	  *typep =

[patch 4/6] scalar-storage-order merge: bulk

2015-10-06 Thread Eric Botcazou

This is the bulk of the implementation.
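
As a reading aid for the ChangeLog below: for a scalar, "flip the storage order of the value" amounts to reversing its bytes.  The following is a plain C sketch of that idea for a 32-bit value.  It is not the patch's flip_storage_order, which works on RTL and is not shown here; `flip_u32` is a hypothetical name.

```c
#include <stdint.h>

/* Reverse the four bytes of V, i.e. convert between the big-endian and
   little-endian representations of the same 32-bit scalar.  */
static uint32_t
flip_u32 (uint32_t v)
{
  return ((v & 0x000000ffu) << 24)
         | ((v & 0x0000ff00u) << 8)
         | ((v & 0x00ff0000u) >> 8)
         | ((v & 0xff000000u) >> 24);
}
```

Applying the flip twice gives back the original value, which is why a load from and a store to a reverse-order field can use the same operation.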

* calls.c (store_unaligned_arguments_into_pseudos): Adjust calls to
extract_bit_field and store_bit_field.
(initialize_argument_information): Adjust call to store_expr.
(load_register_parameters): Adjust call to extract_bit_field.
* expmed.c (check_reverse_storage_order_support): New function.
(check_reverse_float_storage_order_support): Likewise.
(flip_storage_order): Likewise.
(store_bit_field_1): Add REVERSE parameter.  Flip the storage order
of the value if it is true.  Pass REVERSE to recursive call after
adjusting the target offset.
Do not use extraction or movstrict instruction if REVERSE is true.
Pass REVERSE to store_fixed_bit_field.
(store_bit_field): Add REVERSE parameter and pass it to above.
(store_fixed_bit_field): Add REVERSE parameter and pass it to
store_split_bit_field and store_fixed_bit_field_1.
(store_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
order of the value if it is true and adjust the target offset.
(store_split_bit_field): Add REVERSE parameter and pass it to
store_fixed_bit_field.  Adjust the target offset if it is true.
(extract_bit_field_1): Add REVERSE parameter.  Flip the storage order
of the value if it is true.  Pass REVERSE to recursive call after
adjusting the target offset.
Do not use extraction or subreg instruction if REVERSE is true.
Pass REVERSE to extract_fixed_bit_field.
(extract_bit_field): Add REVERSE parameter and pass it to above.
(extract_fixed_bit_field): Add REVERSE parameter and pass it to
extract_split_bit_field and extract_fixed_bit_field_1.
(extract_fixed_bit_field_1): Add REVERSE parameter.  Flip the storage
order of the value if it is true and adjust the target offset.
(extract_split_bit_field): Add REVERSE parameter and pass it to
extract_fixed_bit_field.  Adjust the target offset if it is true.
* expmed.h (flip_storage_order): Declare.
(store_bit_field): Adjust prototype.
(extract_bit_field): Likewise.
* expr.c (emit_group_load_1): Adjust calls to extract_bit_field.
(emit_group_store): Adjust call to store_bit_field.
(copy_blkmode_from_reg): Likewise.
(copy_blkmode_to_reg): Likewise.
(write_complex_part): Likewise.
(read_complex_part): Likewise.
(optimize_bitfield_assignment_op): Add REVERSE parameter.  Assert
that it isn't true if the target is a register.
: If it is, do not optimize unless bitsize is equal to 1,
and flip the storage order of the value.
: Flip the storage order of the value.
(get_bit_range): Adjust call to get_inner_reference.
(expand_assignment): Adjust calls to get_inner_reference, store_expr,
optimize_bitfield_assignment_op and store_field.  Handle MEM_EXPRs
with reverse storage order.
(store_expr_with_bounds): Add REVERSE parameter and pass it to
recursive calls and call to store_bit_field.  Force the value into a
register if it is true and then flip the storage order of the value.
(store_expr): Add REVERSE parameter and pass it to above.
(categorize_ctor_elements_1): Adjust call to
initializer_constant_valid_p.
(store_constructor_field): Add REVERSE parameter and pass it to
recursive calls and call to store_field.
(store_constructor): Add REVERSE parameter and pass it to calls to
store_constructor_field and store_expr.  Set it to true for an
aggregate type with TYPE_REVERSE_STORAGE_ORDER.
(store_field): Add REVERSE parameter and pass it to recursive calls
and calls to store_expr and store_bit_field.  Temporarily flip the
storage order of the value with record type and integral mode and
adjust the shift if it is true.
(get_inner_reference): Add PREVERSEP parameter and set it to true
upon encountering a reference with reverse storage order.
(expand_expr_addr_expr_1): Adjust call to get_inner_reference.
(expand_constructor): Adjust call to store_constructor.
(expand_expr_real_2) : Pass TYPE_REVERSE_STORAGE_ORDER
of the union type to store_expr in the MEM case and assert that it
isn't set in the REG case.  Adjust call to store_field.
(expand_expr_real_1) : Handle reverse storage order.
: Add REVERSEP variable and adjust calls to
get_inner_reference and extract_bit_field. Temporarily flip the
storage order of the value with record type and integral mode and
adjust the shift if it is true.  Flip the storage order of the value
at the end if it is true.
: Add REVERSEP variable and adjust call to
get_inner_reference.  Do not fetch an inner

Re: [PATCH, i386, AVX-512] Update extract_even_odd w/ AVX-512BW insns.

2015-10-06 Thread H.J. Lu

On Fri, Oct 2, 2015 at 7:37 AM, Kirill Yukhin  wrote:
> On 01 Oct 14:11, Kirill Yukhin wrote:
>> Bootstrapped. New tests pass (fail w/o the change). Regtesting is in 
>> progress.
>>
>> Is it ok for trunk?
>>
>> gcc/
>>   * config/i386/i386.c (expand_vec_perm_even_odd_trunc): New.
>>   (expand_vec_perm_even_odd_1): Handle V64QImode.
>>   (ix86_expand_vec_perm_const_1): Try expansion with
>>   expand_vec_perm_even_odd_trunc as well.
>>   * config/i386/sse.md (VI124_AVX512F): Rename to ...
>>   (define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW): This. Extend
>>   to V64QI.
>>   (define_mode_iterator VI248_AVX2_8_AVX512F): Rename to ...
>>   (define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW): This. Extend
>>   to V32HI and V16SI.
>>   (define_insn "avx512bw_v32hiv32qi2"): Unhide pattern name.
>>   (define_expand "vec_pack_trunc_"): Update iterator name.
>>   (define_expand "vec_unpacks_lo_"): Ditto.
>>   (define_expand "vec_unpacks_hi_"): Ditto.
>>   (define_expand "vec_unpacku_lo_"): Ditto.
>>   (define_expand "vec_unpacku_hi_"): Ditto.
>>
>> gcc/testsuite/
>>   * gcc.target/i386/vect-pack-trunc-1.c: New test.
>>   * gcc.target/i386/vect-pack-trunc-2.c: Ditto.
>>   * gcc.target/i386/vect-perm-even-1.c: Ditto.
>>   * gcc.target/i386/vect-perm-odd-1.c: Ditto.
>>   * gcc.target/i386/vect-unpack-1.c: Ditto.
>>   * gcc.target/i386/vect-unpack-2.c: Ditto.
> Checked into main trunk. I'll also check it into gcc-5-branch
> if no objections from RMs next ww.
>

This caused:

FAIL: gcc.target/i386/vect-perm-odd-1.c (test for excess errors)

on gcc-5-branch.

-- 
H.J.

[patch 5/6] scalar-storage-order merge: rest

2015-10-06 Thread Eric Botcazou

This is the rest of the implementation.

* asan.c (instrument_derefs): Adjust call to get_inner_reference.
* builtins.c (get_object_alignment_2): Likewise.
* cfgexpand.c (expand_debug_expr): Adjust call to get_inner_reference
and get_ref_base_and_extent.
* dbxout.c (dbxout_expand_expr): Likewise.
* dwarf2out.c (add_var_loc_to_decl): Likewise.
(loc_list_for_address_of_addr_expr_of_indirect_ref): Likewise.
(loc_list_from_tree): Likewise.
(fortran_common): Likewise.
* gimple-fold.c (gimple_fold_builtin_memory_op): Adjust calls to
get_ref_base_and_extent.
(get_base_constructor): Likewise.
(fold_const_aggregate_ref_1): Likewise.
* gimple-laddress.c (pass_laddress::execute): Adjust call to
get_inner_reference.
* gimple-ssa-strength-reduction.c (slsr_process_ref): Adjust call to
get_inner_reference and bail out on reverse storage order.
* ifcvt.c (noce_emit_move_insn): Adjust calls to store_bit_field.
* ipa-cp.c (ipa_get_jf_ancestor_result): Adjust call to
build_ref_for_offset.
* ipa-polymorphic-call.c (set_by_invariant): Adjust call to
get_ref_base_and_extent.
(ipa_polymorphic_call_context): Likewise.
(extr_type_from_vtbl_ptr_store): Likewise.
(check_stmt_for_type_change): Likewise.
(get_dynamic_type): Likewise.
* ipa-prop.c (ipa_load_from_parm_agg_1): Adjust call to
get_ref_base_and_extent.
(compute_complex_assign_jump_func): Likewise.
(get_ancestor_addr_info): Likewise.
(compute_known_type_jump_func): Likewise.
(determine_known_aggregate_parts): Likewise.
(ipa_get_adjustment_candidate): Likewise.
(ipa_modify_call_arguments): Set REF_REVERSE_STORAGE_ORDER on
MEM_REF.
* ipa-prop.h (ipa_parm_adjustment): Add REVERSE field.
(build_ref_for_offset): Adjust prototype.
* simplify-rtx.c (delegitimize_mem_from_attrs): Adjust call to
get_inner_reference.
* tree-affine.c (tree_to_aff_combination): Adjust call to
get_inner_reference.
(get_inner_reference_aff): Likewise.
* tree-data-ref.c (split_constant_offset_1): Likewise.
(dr_analyze_innermost): Likewise.  Bail out if reverse storage order.
* tree-scalar-evolution.c (interpret_rhs_expr): Adjust call to
get_inner_reference.
* tree-sra.c (struct access): Add REVERSE and move WRITE around.
(dump_access): Print new fields.
(create_access): Adjust call to get_ref_base_and_extent and set the
REVERSE flag according to the result.
(completely_scalarize_record): Set the REVERSE flag.
(scalarize_elem): Add REVERSE parameter.
(build_access_from_expr_1): Preserve storage order barriers.
(build_accesses_from_assign): Likewise.
(build_ref_for_offset): Add REVERSE parameter and set the
REF_REVERSE_STORAGE_ORDER flag accordingly.
(build_ref_for_model): Adjust call to build_ref_for_offset and clear
the REF_REVERSE_STORAGE_ORDER flag if there are components.
(analyze_access_subtree): Likewise.
(create_artificial_child_access): Set the REVERSE flag.
(get_access_for_expr): Adjust call to get_ref_base_and_extent.
(turn_representatives_into_adjustments): Propagate REVERSE flag.
(ipa_sra_check_caller): Adjust call to get_inner_reference.
* tree-ssa-alias.c (ao_ref_base): Adjust call to
get_ref_base_and_extent.
(aliasing_component_refs_p): Likewise.
(stmt_kills_ref_p_1): Likewise.
* tree-ssa-dce.c (mark_aliased_reaching_defs_necessary_1): Likewise.
* tree-ssa-loop-ivopts.c (may_be_nonaddressable_p) : New.
Return true if reverse storage order.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
(split_address_cost): Likewise.  Bail out if reverse storage order.
* tree-ssa-math-opts.c (find_bswap_or_nop_load): Adjust call to
get_inner_reference.  Bail out if reverse storage order.
(bswap_replace): Adjust call to get_inner_reference.
* tree-ssa-pre.c (create_component_ref_by_pieces_1) : Set
the REF_REVERSE_STORAGE_ORDER flag.
: Likewise.
* tree-ssa-sccvn.c (vn_reference_eq): Return false on storage order
barriers.
(copy_reference_ops_from_ref) : Set REVERSE field according
to the REF_REVERSE_STORAGE_ORDER flag.
: Likewise.
: Set it for storage order barriers.
(contains_storage_order_barrier_p): New predicate.
(vn_reference_lookup_3): Adjust calls to get_ref_base_and_extent.
Punt on storage order barriers if necessary.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Add REVERSE.
* tree-ssa-structalias.c (get_constraint_for_component_ref): Adjust
call to

[wwwdocs] Update C++ conformance status

2015-10-06 Thread Jonathan Wakely


People are being scared off by the experimental status on
https://gcc.gnu.org/projects/cxx0x.html

e.g. https://gcc.gnu.org/ml/gcc/2015-10/msg00025.html

This makes it clear C++11 in 5.1 is no longer experimental.

We also have a "Standard Conformance" section for G++ in
https://gcc.gnu.org/bugs/ which says "Two milestones in standard
conformance are GCC 3.0 (including a major overhaul of the standard
library) and the 3.4.0 version (with its new C++ parser)."  I've
added some more recent milestones, although maybe std::lib conformance
doesn't need to be mentioned in this context?

Index: htdocs/projects/cxx0x.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx0x.html,v
retrieving revision 1.67
diff -u -r1.67 cxx0x.html
--- htdocs/projects/cxx0x.html	26 Jan 2015 11:12:43 -	1.67
+++ htdocs/projects/cxx0x.html	6 Oct 2015 10:58:01 -
@@ -28,10 +28,10 @@
   line.  GCC 4.7 and later support -std=c++11 and
   -std=gnu++11 as well.
 
-  Important: GCC's support for C++11 is still
+  Important: Before GCC 5.1 support for C++11 was
   experimental.  Some features were implemented based on
-  early proposals, and no attempt will be made to maintain backward
-  compatibility when they are updated to match the final C++11
+  early proposals, and no attempt was made to maintain backward
+  compatibility when they were updated to match the final C++11
   standard.
 
 C++11 Language Features
Index: htdocs/bugs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/bugs/index.html,v
retrieving revision 1.116
diff -u -r1.116 index.html
--- htdocs/bugs/index.html	5 Jul 2014 21:52:32 -	1.116
+++ htdocs/bugs/index.html	6 Oct 2015 10:58:01 -
@@ -696,9 +700,12 @@
 However, some non-conforming constructs are allowed when the command-line
 option -fpermissive is used.
 
-Two milestones in standard conformance are GCC 3.0 (including a major
-overhaul of the standard library) and the 3.4.0 version (with its new C++
-parser).
+Significant milestones in standard conformance are
+GCC 3.0 (including a major overhaul of the standard library),
+the 3.4.0 version (with its new C++ parser),
+4.8.1 (complete C++11 language support),
+5.1 (complete C++14 language support),
+and 5.1 (complete C++11 and C++14 standard library support).
 
 New in GCC 3.0
