Re: [PATCH] do not take mutex in _Unwind_Find_registered_FDE if there is no registered objects

2016-07-27 Thread Gleb Natapov
On Wed, Jul 27, 2016 at 05:12:18PM -0600, Jeff Law wrote:
> On 07/25/2016 07:44 AM, Gleb Natapov wrote:
> > _Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
> > when there is no registered objects. As far as I see only statically
> > linked applications call __register_frame_info* functions, so for
> > dynamically linked executables taking the lock to check unseen_objects
> > and seen_objects is a pessimization. Since the function is called on
> > each thrown exception this is a lot of unneeded locking.  This patch
> > checks unseen_objects and seen_objects outside of the lock and returns
> > earlier if both are NULL.
> > 
> > diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> > index 5b16a1f..41de746 100644
> > --- a/libgcc/unwind-dw2-fde.c
> > +++ b/libgcc/unwind-dw2-fde.c
> > @@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases 
> > *bases)
> >struct object *ob;
> >const fde *f = NULL;
> > 
> > +  /* __atomic_write is not used to modify unseen_objects and seen_objects
> > + since they are modified in locked sections only and unlock provides
> > + release semantics. */
> > +  if (!__atomic_load_n(_objects, __ATOMIC_ACQUIRE)
> > +  && !__atomic_load_n(_objects, __ATOMIC_ACQUIRE))
> > +  return NULL;
> As as Andrew noted, this might be bad on targets that don't have atomics.
> For those we could easily end up inside a libfunc which would be
> unfortunate.  Certain mips/arm variants come to mind here.
> 
> For targets that don't have atomics or any kind of synchronization libfunc,
> we'll emit nop-asm-barriers to at least stop the optimizers from munging
> things too badly.
> 
> It's been a couple years since I've really thought about these kinds of
> synchronization issues -- is it really safe in a weakly ordered processor to
> rely on the mutex lock/unlock of the "object_mutex" to order the
> loads/stores of "unseen_objects" and "seen_objects"?
> 
I am pretty sure it is. After mutex unlock another cpu will not see
old value (provided it uses acquire semantics to read value).

But when I wrote the patch I did not notice that Jakub already wrote one
that address the same issue and avoids both of your concerns. It can be
found here: https://gcc.gnu.org/bugzilla/attachment.cgi?id=38852. Can we
apply his version?


--
Gleb.


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-27 Thread Bernd Edlinger
Am 27.07.2016 um 23:31 schrieb Jeff Law:
> On 07/26/2016 11:32 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> As described in PR 71779 and PR 70903 we have a wrong code issue on
>> aarch64
>> which is triggered by a paradoxical subreg that can be created in
>> store_bit_field_using_insv when the target has an EP_insv bitfield
>> instruction.
>>
>> The attached patch changes this insn...
>>
>> (insn 1047 1046 1048 (set (reg:DI 481)
>> (subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
>>  (nil))
>>
>> ... into a more simple insn:
>>
>> (insn 1047 1046 1048 (set (reg:DI 479)
>> (zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
>>  (nil))
>>
>> which manages to fix two bugs at the same time.
>>
>> The patch does not include a test case, as I was looking at the PR 71779
>> just by curiosity, and I have only a theoretical interest in aarch64.
>> However,
>> it should be easy to extract one from PR 70903 at least, but I can't
>> do that by
>> myself.
>>
>> Therefore I would like to leave the test case(s) to Cristina Tamar
>> from ARM,
>> and/or Andreas Schwab, who have also helped out with bootstrapping and
>> reg-testing on aarch64 and aarch64-ilp32.
>>
>> Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
>> Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
>> Also successfully bootstrapped on an aarch64 juno board and no
>> regressions.
>>
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> patch-pr71779.diff
>>
>>
>> 2016-07-25  Bernd Edlinger  
>>
>> PR rtl-optimization/71779
>> PR rtl-optimization/70903
>> * expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
>> subreg.
> So you're stumbling into another really interesting area.
>
> I can hazard a guess that the reason we create the paradoxical SUBREG
> rather than a zero or sign extension is because various optimizers know
> that the upper bits in a paradoxical are don't care bits and may make
> transformations with that knowledge.
>
> Once you turn that into a zero/sign extend, then the rest of the
> optimizers must preserve the zero/sign extension property -- they have
> no way of knowing the bits were don't cares.
>
> So it's not necessarily clear that your change results in generally
> better code or not.
>

Yes, I understand, on x86 the stage 2/3 did not change but x86 has
probably not a very sophisticated insv code pattern.

> More importantly, it really feels like you're papering over a real bug
> elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the
> real bug ought to be downstream of this point.  Several folks have
> pointed at reload, which would certainly be a point ripe for a
> paradoxical subreg problem.
>

I debugged lra already, because I was initially sure it did something
wrong with the subreg, but it turned out that it follows exactly what
you say here, it spills the subreg twice, once as a 32 bit value,
into a 64 bit slot with stack image in the upper word and then again
as a zero extended 64 bit value.  Later the garbage extended 64 bit
value is used when it should not.

So lra seems not to be the reason, but it is apparently spilling the
paradoxical subreg twice.

So at least lra does not generate better code from paradoxical subreg.

> Sooo. I don't think we can go with this patch as-is today.  We need to
> find the root problem which I believe is later in the pipeline.
>
> This patch might make sense from an optimization standpoint -- I'm
> generally of the opinion that while paradoxical subregs give the
> optimziers more freedom, the optimizers rarely are able to use it and
> they generally know much more about the semantics of and how to optimize
> around zero/sign extensions.  But we really should fix the bug first,
> then come back to improving the code generation.
>

Yes.  And paradoxical subreg have non-intuitive semantics.


Thanks
Bernd.

> Now if someone wanted to look for an interesting optimization project --
> ree.c knows how to do redundant extension eliminations.  A paradoxical
> is a form of extension that isn't currently understood by REE. Similarly
> REE doesn't know how to handle zero-extension as an "and" or sign
> extension as a pair of shifts.  Both of which can occur in practice due
> to costs or ISA limitations.  I wouldn't be surprise if adding those
> capabilities to REE would significantly improve its ability to eliminate
> unnecessary extensions.
>
> Jeff
>


Re: Init df for split pass since some target use REG_NOTE in split pattern

2016-07-27 Thread Kito Cheng
Hi Jeff:
> This seems better suited as a generic optimization than hidden away in a
> backend.
>
> AFAICT you're just noticing a word of the output operand is dead and eliding
> the load/store for that word.
>
> In fact, I'm a bit surprised nothing has optimized this away by the time
> reload/LRA is done.   You might spend some time walking through lower-subreg
> to see if it can be easily extended to handle your case.

Agree, I will take a look for lower-subreg and LRA.

However this patch still need for other target, this patch is ok for trunk?


[gomp4] encode acc routine clauses inside fortran module files

2016-07-27 Thread Cesar Philippidis
This patch contains the following changes:

* Enhance support for OpenACC routine clauses inside fortran module
  files. Also, allow the routine directive to be applied to intrinsic
  procedures. The trunk patch can be found here:

  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00063.html

* Change an LTO wrapper assert failure to an error when it detects
  missing symbols. This situation can arise in offloading code, e.g.
  when the user forgets to declare a global variable as offloadable.
  The trunk patch can be found here. Part of this patch was already
  present in gomp-4_0-branch.

  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00043.html

I've applied this patch gomp-4_0-branch.

Cesar
2016-07-27  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (enum oacc_function): Define.
	(oacc_function_type): Declare.
	(symbol_attribute): Change the type of oacc_function from unsigned
	to an ENUM_BITFIELD.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set the oacc_function attribute.
	* openmp.c (gfc_oacc_routine_dims): Change the return type from
	int to oacc_function.
	(gfc_match_oacc_routine): Handle intrinsic procedures.
	* symbol.c (oacc_function_types): Define.
	* trans-decl.c (add_attributes_to_decl): Update to handle the
	retyped oacc_function attribute.

	gcc/
	* lto-cgraph.c (input_overwrite_node): Change the assertion to an
	error for missing symbols.

	gcc/testsuite/
	* gfortran.dg/goacc/fixed-1.f: Add test coverage.
	* gfortran.dg/goacc/routine-7.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/abort-1.f90:
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Add test
	coverage.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 2327b13..7784e93 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -292,6 +292,15 @@ enum save_state
 { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
 };
 
+/* Flags to keep track of ACC routine states.  */
+enum oacc_function
+{ OACC_FUNCTION_NONE = 0,
+  OACC_FUNCTION_SEQ,
+  OACC_FUNCTION_GANG,
+  OACC_FUNCTION_WORKER,
+  OACC_FUNCTION_VECTOR
+};
+
 /* Strings for all symbol attributes.  We use these for dumping the
parse tree, in error messages, and also when reading and writing
modules.  In symbol.c.  */
@@ -301,6 +310,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_function_types[];
 
 /* Enumeration of all the generic intrinsic functions.  Used by the
backend for identification of a function.  */
@@ -851,7 +861,7 @@ typedef struct
   unsigned oacc_declare_link:1;
 
   /* This is an OpenACC acclerator function at level N - 1  */
-  unsigned oacc_function:3;
+  ENUM_BITFIELD (oacc_function) oacc_function:3;
   unsigned oacc_function_nohost:1;
 
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 32ee526..6ee81c3 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2095,6 +2095,7 @@ DECL_MIO_NAME (procedure_type)
 DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
+DECL_MIO_NAME (oacc_function)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2116,6 +2117,8 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_function = MIO_NAME (oacc_function) (attr->oacc_function,
+		  oacc_function_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) _attr);
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 52c0309..c20a0a3 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1879,21 +1879,31 @@ gfc_match_oacc_cache (void)
 
 /* Determine the loop level for a routine.   */
 
-static int
+static oacc_function
 gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
 {
   int level = -1;
+  oacc_function ret = OACC_FUNCTION_SEQ;
 
   if (clauses)
 {
   unsigned mask = 0;
 
   if (clauses->gang)
-	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_GANG;
+	}
   if (clauses->worker)
-	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_WORKER;
+	}
   if (clauses->vector)
-	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_VECTOR;
+	}
   if (clauses->seq)
 	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
 
@@ -1901,10 +1911,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
 	gfc_error ("Multiple loop axes specified for routine");
 }
 
-  

Re: [PATCH 1/3] make pattern_regs a vec

2016-07-27 Thread Trevor Saunders
On Wed, Jul 27, 2016 at 04:15:02PM -0600, Jeff Law wrote:
> On 07/24/2016 03:10 PM, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > gcc/ChangeLog:
> > 
> > 2016-07-24  Trevor Saunders  
> > 
> > * store-motion.c (struct st_expr): Make pattern_regs a vector.
> > (extract_mentioned_regs): Append to a vector instead of
> > returning a rtx_expr_list.
> > (st_expr_entry): Adjust.
> > (store_ops_ok): Likewise.
> > (store_killed_in_insn): Likewise.
> > (find_moveable_store): Likewise.
> This is fine.  Though one has to wonder if a different representation than
> lists/vecs would work better given the walks.  A bitmap for example might
> work well since I think we just need to track register #s.

yeah, I just noticed there's actually a comment suggesting a regset
should be used.

I also noticed free_st_expr_entry needs to release the vec, I'll be
happy when we use auto_vec and stuff more and this sort of thing is
impossible.  I guess this used to work because the gc just cleaned up
after us being sloppy.  Anyway that seems pretty obvious so I'm tempted
to commit this as is in furtherance of killing rtl lists, and then we
can worry about making it a bitmap some other time.

Trev



PR fortran/71799 -- Patch

2016-07-27 Thread Steve Kargl
Probably committable under trivially correct.  OK?

2016-07-22  Steven G. Kargl  

PR fortran/71799
* resolve.c(gfc_resolve_iterator): Failure of type conversion need
not ICE.

2016-07-22  Steven G. Kargl  

PR fortran/71799
* gfortran.dg/pr71799.f90: New test.

Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 238665)
+++ gcc/fortran/resolve.c   (working copy)
@@ -6515,15 +6515,15 @@ gfc_resolve_iterator (gfc_iterator *iter
   /* Convert start, end, and step to the same type as var.  */
   if (iter->start->ts.kind != iter->var->ts.kind
   || iter->start->ts.type != iter->var->ts.type)
-gfc_convert_type (iter->start, >var->ts, 2);
+gfc_convert_type (iter->start, >var->ts, 1);
 
   if (iter->end->ts.kind != iter->var->ts.kind
   || iter->end->ts.type != iter->var->ts.type)
-gfc_convert_type (iter->end, >var->ts, 2);
+gfc_convert_type (iter->end, >var->ts, 1);
 
   if (iter->step->ts.kind != iter->var->ts.kind
   || iter->step->ts.type != iter->var->ts.type)
-gfc_convert_type (iter->step, >var->ts, 2);
+gfc_convert_type (iter->step, >var->ts, 1);
 
   if (iter->start->expr_type == EXPR_CONSTANT
   && iter->end->expr_type == EXPR_CONSTANT
Index: gcc/testsuite/gfortran.dg/pr71799.f90
===
--- gcc/testsuite/gfortran.dg/pr71799.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr71799.f90   (working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+subroutine test2(s)
+integer(1) :: i
+integer (8) :: s
+
+do i = 10, HUGE(i) - 10, 222 ! { dg-error "overflow converting" }
+  s = s + 1
+end do
+
+end subroutine test2

-- 
Steve


PR fortran/71859 -- Patch

2016-07-27 Thread Steve Kargl
Patch is self-explanatory.  OK?

2016-07-26  Steven G. Kargl  

PR fortran/71859
* check.c(numeric_check): Prevent ICE.  Issue error for invalid
subroutine as an actual argument when numeric argument is expected.

2016-07-26  Steven G. Kargl  

PR fortran/71859
* gfortran.dg/pr71859.f90: New test.
* gfortran.dg/intrinsic_numeric_arg.f: Update error message.
* gfortran.dg/coarray_collectives_1.f90: Ditto.

Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (revision 238665)
+++ gcc/fortran/check.c (working copy)
@@ -72,6 +72,11 @@ type_check (gfc_expr *e, int n, bt type)
 static bool
 numeric_check (gfc_expr *e, int n)
 {
+  /* Users sometime use a subroutine designator as an actual argument to
+ an intrinsic subprogram that expects an argument with a numeric type.  */
+  if (e->symtree && e->symtree->n.sym->attr.subroutine)
+goto bandaid;
+
   if (gfc_numeric_ts (>ts))
 return true;
 
@@ -86,7 +91,9 @@ numeric_check (gfc_expr *e, int n)
   return true;
 }
 
-  gfc_error ("%qs argument of %qs intrinsic at %L must be a numeric type",
+bandaid:
+
+  gfc_error ("%qs argument of %qs intrinsic at %L must have a numeric type",
 gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic,
 >where);
 
Index: gcc/testsuite/gfortran.dg/pr71859.f90
===
--- gcc/testsuite/gfortran.dg/pr71859.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr71859.f90   (working copy)
@@ -0,0 +1,8 @@
+! { dg-do compile }
+program p
+   call s(1)
+   x = abs(s)  ! { dg-error "must have a numeric type" }
+end
+subroutine s(n)
+   print *, n
+end
Index: gcc/testsuite/gfortran.dg/intrinsic_numeric_arg.f
===
--- gcc/testsuite/gfortran.dg/intrinsic_numeric_arg.f   (revision 238665)
+++ gcc/testsuite/gfortran.dg/intrinsic_numeric_arg.f   (working copy)
@@ -4,6 +4,6 @@
LOGICAL Z
CHARACTER A
REAL R
-   R = ABS(Z) !  { dg-error " must be a numeric type" }
-   R = ABS(A) !  { dg-error " must be a numeric type" }
+   R = ABS(Z) !  { dg-error " must have a numeric type" }
+   R = ABS(A) !  { dg-error " must have a numeric type" }
END
Index: gcc/testsuite/gfortran.dg/coarray_collectives_1.f90
===
--- gcc/testsuite/gfortran.dg/coarray_collectives_1.f90 (revision 238665)
+++ gcc/testsuite/gfortran.dg/coarray_collectives_1.f90 (working copy)
@@ -14,7 +14,7 @@ program test
   integer(8) :: i8
   character(len=19, kind=4) :: msg4
 
-  call co_sum("abc") ! { dg-error "must be a numeric type" }
+  call co_sum("abc") ! { dg-error "must have a numeric type" }
   call co_max(cmplx(1.0,0.0)) ! { dg-error "shall be of type integer, real or 
character" }
   call co_min(cmplx(0.0,1.0)) ! { dg-error "shall be of type integer, real or 
character" }
 


-- 
Steve


Re: [PATCH] do not take mutex in _Unwind_Find_registered_FDE if there is no registered objects

2016-07-27 Thread Jeff Law

On 07/25/2016 07:44 AM, Gleb Natapov wrote:

_Unwind_Find_FDE calls _Unwind_Find_registered_FDE and it takes lock even
when there is no registered objects. As far as I see only statically
linked applications call __register_frame_info* functions, so for
dynamically linked executables taking the lock to check unseen_objects
and seen_objects is a pessimization. Since the function is called on
each thrown exception this is a lot of unneeded locking.  This patch
checks unseen_objects and seen_objects outside of the lock and returns
earlier if both are NULL.

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 5b16a1f..41de746 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -1001,6 +1001,13 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   struct object *ob;
   const fde *f = NULL;

+  /* __atomic_write is not used to modify unseen_objects and seen_objects
+ since they are modified in locked sections only and unlock provides
+ release semantics. */
+  if (!__atomic_load_n(_objects, __ATOMIC_ACQUIRE)
+  && !__atomic_load_n(_objects, __ATOMIC_ACQUIRE))
+  return NULL;
As as Andrew noted, this might be bad on targets that don't have 
atomics.  For those we could easily end up inside a libfunc which would 
be unfortunate.  Certain mips/arm variants come to mind here.


For targets that don't have atomics or any kind of synchronization 
libfunc, we'll emit nop-asm-barriers to at least stop the optimizers 
from munging things too badly.


It's been a couple years since I've really thought about these kinds of 
synchronization issues -- is it really safe in a weakly ordered 
processor to rely on the mutex lock/unlock of the "object_mutex" to 
order the loads/stores of "unseen_objects" and "seen_objects"?




Jeff






Re: [PATCH 1/3] (v2) On-demand locations within string-literals

2016-07-27 Thread Manuel López-Ibáñez
On 27 July 2016 at 15:30, David Malcolm  wrote:
>> Perhaps it could live for now in c-format.c, since it is the only
>> place using it?
>
> Martin Sebor [CC-ed] wants to use it from the middle-end:
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> so it's unclear to me that c-format.c would be a better location.

Fine. He will have to figure out how to get a cpp_reader from the
middle-end, though.

> There are various places it could live; but getting it working took a
> lot of effort to achieve - the currently proposed mixture of libcpp,
> input.c and c-format.c for the locations of the various pieces works
> (for example, auto_vec isn't available in libcpp).

I don't doubt it. I tried to do something similar in the past and I
failed, this is why I ended up with the poor approximation that was in
place until now. This is a significant step forward.

Is libcpp still C? When would be the time to move it to C++ already
and start using common utilities?

Also, moving vec.h, sbitmap, etc to their own directory/library so
that they can be used by other parts of the compiler (hey! maybe even
by other parts of the toolchain?) is desirable. Richard has said in
the past that he supports such moves. Did I understand correctly
Richard?

> Given that both Martin and I have candidate patches that are touching
> the same area, I'd prefer to focus on getting this code in to trunk,
> rather than rewrite it out-of-tree, so that we can at least have the
> improvement to location-handling for Wformat.  Once the code is in the
> tree, it should be easier to figure out how to access it from the
> middle-end.

Sure, I think this version is fine. I'm a big proponent of
step-by-step, even if the steps are only approximations to the optimal
solution :)
It may be enough to motivate someone else more capable to improve over
my poor approximations ;-)


>> [*] In an ideal world, we would have a language-agnostic diagnostics
>> library
>> that would include line-map and that would be used by libcpp and the
>> rest of
>> GCC, so that we can remove all the error-routines in libcpp and the
>> awkward
>> glue code that ties it into diagnostics.c.,
>
> Agreed, though that may have to wait until gcc 8 at this point.
> (Given that the proposed diagnostics library would use line maps, and
> would be used by libcpp, would it make sense to move the diagnostics
> into libcpp itself?  Diagnostics would seem to be intimately related to
> location-tracking)

I don't think so. There is nothing in diagnostic.* pretty-print.*
input.* line-map.* that requires libcpp (and only two mentions of tree
that could be easily abstracted out). This was a deliberate design
goal of Gabriel and followed by most of us later working on
diagnostics. Of course, cpp may make use of the new library, but also
other parts of the toolchain (GAS?). The main obstacle I faced when
trying to do this move was the build machinery to make both libcpp and
gcc build and statically link with this new library.

Once that move is done, the main abstraction challenge to remove the
glue is that libcpp has its own flags for options and diagnostics that
are independent from those of gcc (see c_cpp_error in c-common.c). It
would be great if libcpp used the common flags, but then one would
have to figure out a way to reorder things so that the diagnostic
library, libcpp and gcc can use (or avoid being dependent on) the same
flags.

Cheers,

Manuel.


Re: [PATCH 3/3] merge adjust_cost and adjust_cost_2 target hooks

2016-07-27 Thread Jeff Law

On 07/24/2016 03:10 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-07-24  Trevor Saunders  

* config/alpha/alpha.c (alpha_adjust_cost): Adjust.
* config/arm/arm-protos.h (struct tune_params): Likewise.
* config/arm/arm.c (xscale_sched_adjust_cost): Likewise.
(cortex_a9_sched_adjust_cost): Likewise.
(fa726te_sched_adjust_cost): Likewise.
(arm_adjust_cost): Likewise.
* config/bfin/bfin.c (bfin_adjust_cost): Likewise.
* config/c6x/c6x.c (c6x_adjust_cost): Likewise.
* config/epiphany/epiphany.c (epiphany_adjust_cost): Likewise.
* config/i386/i386.c (ix86_adjust_cost): Likewise.
* config/ia64/ia64.c: Likewise.
* config/m68k/m68k.c: Likewise.
* config/mep/mep.c (mep_adjust_cost): Likewise.
* config/microblaze/microblaze.c (microblaze_adjust_cost):
* Likewise.
* config/mips/mips.c (mips_adjust_cost): Likewise.
* config/mn10300/mn10300.c (mn10300_adjust_sched_cost):
* Likewise.
* config/pa/pa.c (pa_adjust_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_adjust_cost): Likewise.
(rs6000_debug_adjust_cost): Likewise.
* config/sh/sh.c (sh_adjust_cost): Likewise.
* config/sparc/sparc.c (supersparc_adjust_cost): Likewise.
(hypersparc_adjust_cost): Likewise.
(sparc_adjust_cost): Likewise.
* config/spu/spu.c (spu_sched_adjust_cost): Likewise.
* config/tilegx/tilegx.c (tilegx_sched_adjust_cost): Likewise.
* config/tilepro/tilepro.c (tilepro_sched_adjust_cost):
* Likewise.
* config/visium/visium.c (visium_adjust_cost): Likewise.
* doc/tm.texi: Regenerate.
* haifa-sched.c (dep_cost_1): Adjust.
* target.def: Merge adjust_cost and adjust_cost_2.

OK.
jeff




Re: [PATCH 1/3] make pattern_regs a vec

2016-07-27 Thread Jeff Law

On 07/24/2016 03:10 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-07-24  Trevor Saunders  

* store-motion.c (struct st_expr): Make pattern_regs a vector.
(extract_mentioned_regs): Append to a vector instead of
returning a rtx_expr_list.
(st_expr_entry): Adjust.
(store_ops_ok): Likewise.
(store_killed_in_insn): Likewise.
(find_moveable_store): Likewise.
This is fine.  Though one has to wonder if a different representation 
than lists/vecs would work better given the walks.  A bitmap for example 
might work well since I think we just need to track register #s.


jeff



Re: [PATCH 2/3] haifa-sched.c: make twins a auto_vec

2016-07-27 Thread Jeff Law

On 07/24/2016 03:10 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-07-24  Trevor Saunders  

* haifa-sched.c (add_to_speculative_block): Make twins a vector.

OK.
jeff



Re: RFC: Make diagnostics for C++ reference binding more consistent

2016-07-27 Thread Jason Merrill
On Wed, Jul 27, 2016 at 8:05 AM, Jonathan Wakely  wrote:
> Consider:
>
> template T declval();
>
> int& r1 = declval();
> int&& r2 = declval();
> int& r3 = declval();
>
>
> This produces three quite different errors:
>
>
> refs.cc:3:25: error: invalid initialization of non-const reference of type
> 'int&' from an rvalue of type 'int'
> int& r1 = declval();
>   ~~^~
> refs.cc:4:25: error: cannot bind 'int' lvalue to 'int&&'
> int&& r2 = declval();
>~^~
> refs.cc:5:30: error: binding 'const int' to reference of type 'int&'
> discards qualifiers
> int& r3 = declval();
>   ~~~^~
>
>
> The first one uses the order to/from, but the other two are from/to.
>
> The first and second are saying the same thing (wrong value category)
> but very differently.
>
> The first and third mention references but the second doesn't.
>
> The standard talks about binding a reference to something, but the
> second and third diagnostics talk about binding something to a
> reference (and the first doesn't mention binding at all).
>
> The first diagnostic is specific to lvalue references (it wouldn't be
> invalid if it was binding a non-const _rvalue_ reference to an
> rvalue), but doesn't use the word "lvalue".
>
> (FWIW Clang and EDG both have their own inconsistencies for the same
> example, but that shouldn't stop us trying to improve.)
>
>
> IMHO the first should use the words "bind" and "lvalue reference":
>
> refs.cc:3:25: error: cannot bind non-const lvalue reference of type 'int&'
> to an rvalue of type 'int'
> int& r1 = declval();
>   ~~^~
>
> The second should use the phrasing "bind ref to value" instead of
> "bind value to ref", and mention "rvalue reference":
>
> refs.cc:4:25: error: cannot bind rvalue reference of type 'int&&' to an
> lvalue of type 'int'
> int&& r2 = declval();
>~^~
>
> The third should use the same phrasing (but doesn't need to mention
> lvalue/rvalue because the problem is related to cv-qualifiers not
> value categories):
>
> refs.cc:5:30: error: binding reference of type 'int&' to 'const int'
> discards qualifiers
> int& r3 = declval();
>   ~~~^~
>
>
> I've only considered trivial examples here, is there some reason to
> prefer to current diagnostics for more complex cases?
>
> The attached patch makes that change, but there are probably lots of
> tests that would need updating which I haven't done until I know if
> the change is worthwhile.

Sounds good to me.

Jason


Re: [PATCH] Add mark_spam.py script

2016-07-27 Thread Jeff Law

On 07/26/2016 06:39 AM, Martin Liška wrote:

Hello.

This is python script that utilizes bugzilla API and marks PRs as spam:

$ ./mark_spam.py --help
usage: mark_spam.py [-h] [--verbose] api_key range

Mark Bugzilla issues as spam.

positional arguments:
  api_key API key
  range   Range of IDs, e.g. 10-23,24,25,27

optional arguments:
  -h, --help  show this help message and exit
  --verbose   Verbose logging

Sample usage:
$ ./mark_spam.py my_api_key 72634-72636
Marking as spam: PR72634
Marking as spam: PR72635
Marking as spam: PR72636

API key can be set up here:
https://gcc.gnu.org/bugzilla/userprefs.cgi?tab=apikey

Sample PR marked by the script: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72635

Ready to install?
Martin


0001-Add-mark_spam.py-script.patch


From 467dc2cf8f0c549f5d7ee190efe59c841a9acad9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 26 Jul 2016 14:34:55 +0200
Subject: [PATCH] Add mark_spam.py script

contrib/ChangeLog:

2016-07-26  Martin Liska  

* mark_spam.py: New file.

OK.
jeff



Re: [PATCH] Do not allow make_compound_operation for vector mode, (PR70944)

2016-07-27 Thread Jeff Law

On 07/27/2016 07:46 AM, Martin Liška wrote:

Hello.

Following patch rejects compound operation manipulation for vector mode.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


0001-Do-not-allow-make_compound_operation-for-vector-mode.patch


From 5f7ae66453a1f7a1a2c44414b22c742d69670177 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 27 Jul 2016 13:44:51 +0200
Subject: [PATCH] Do not allow make_compound_operation for vector mode
 (PR70944)

gcc/testsuite/ChangeLog:

2016-07-27  Martin Liska  

* g++.dg/vect/pr70944.cc: New test.

gcc/ChangeLog:

2016-07-27  Martin Liska  

PR rtl-optimization/70944
* combine.c (make_compound_operation):
Do not allow make_compound_operation for vector mode

OK.

Jeff


Re: Init df for split pass since some target use REG_NOTE in split pattern

2016-07-27 Thread Jeff Law

On 07/25/2016 08:31 PM, Kito Cheng wrote:

Hi Jeff:

Oop, patch in attachment, and I hit this bug in gcc.dg/torture/vshuf-v2di.c
with our nds32 internal branch.


Hi Richard:
I think we really need reg dead note for some optimization, and btw,
here is our split pattern:

(define_split
  [(set (match_operand:DI 0 "nds32_general_register_operand" "")
(match_operand:DI 1 "nds32_general_register_operand" ""))]
  "find_regno_note (insn, REG_UNUSED, REGNO (operands[0])) != NULL
   || find_regno_note (insn, REG_UNUSED, REGNO (operands[0]) + 1) != NULL"
  [(set (match_dup 0) (match_dup 1))]
{
  rtx dead_note = find_regno_note (curr_insn, REG_UNUSED, REGNO (operands[0]));
  HOST_WIDE_INT offset;
  if (dead_note == NULL_RTX)
offset = 0;
  else
offset = 4;
  operands[0] = simplify_gen_subreg (
  SImode, operands[0],
  DImode, offset);
  operands[1] = simplify_gen_subreg (
  SImode, operands[1],
  DImode, offset);
})
This seems better suited as a generic optimization than hidden away in a 
backend.


AFAICT you're just noticing a word of the output operand is dead and 
eliding the load/store for that word.


In fact, I'm a bit surprised nothing has optimized this away by the time 
reload/LRA is done.   You might spend some time walking through 
lower-subreg to see if it can be easily extended to handle your case.


Jeff




C PATCH for c/71853 (ICE with uninitialized memory on invalid)

2016-07-27 Thread Marek Polacek
This testcase was breaking because we were using uninitialized memory
coming from c_expr in c_parser_switch_statement.  There, in case we hadn't
seen '(' after switch, we called c_finish_case with uninitialized CE.
Fixed thus.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-07-27  Marek Polacek  

PR c/71853
* c-parser.c (c_parser_switch_statement): Initialize ce.original_type
to error node for invalid code.

* gcc.dg/noncompile/pr71853.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 8952bca..3679654 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -5661,6 +5661,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 {
   switch_cond_loc = UNKNOWN_LOCATION;
   expr = error_mark_node;
+  ce.original_type = error_mark_node;
 }
   c_start_case (switch_loc, switch_cond_loc, expr, explicit_cast_p);
   save_break = c_break_label;
diff --git gcc/testsuite/gcc.dg/noncompile/pr71853.c 
gcc/testsuite/gcc.dg/noncompile/pr71853.c
index e69de29..e049c8e 100644
--- gcc/testsuite/gcc.dg/noncompile/pr71853.c
+++ gcc/testsuite/gcc.dg/noncompile/pr71853.c
@@ -0,0 +1,9 @@
+/* PR c/71426 */
+/* { dg-do compile } */
+
+void f (void)
+{
+  case (0) { /* { dg-error "expected" } */
+switch 0: { } /* { dg-error "expected" } */
+  }
+}

Marek


Re: [PATCH] Introduce no_profile_instrument_function attribute (PR gcov-profile/68025)

2016-07-27 Thread Jeff Law

On 07/27/2016 02:27 AM, Martin Liška wrote:

Hi.

As mentioned in the PR gcov-profile/68025, there's a request not to instrument
some functions (e.g. a in linux kernel). Thus, I come with a new attribute 
no_profile_instrument_function
which skips any profiling instrumentation.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
Conceptually fine.  I think you need an update to extensions.texi to 
document the new attribute.  With that missing bit added, this will be 
fine for the trunk.


jeff


Re: [PATCH] Fix wrong code on aarch64 due to paradoxical subreg

2016-07-27 Thread Jeff Law

On 07/26/2016 11:32 AM, Bernd Edlinger wrote:

Hi!

As described in PR 71779 and PR 70903 we have a wrong code issue on aarch64
which is triggered by a paradoxical subreg that can be created in
store_bit_field_using_insv when the target has an EP_insv bitfield instruction.

The attached patch changes this insn...

(insn 1047 1046 1048 (set (reg:DI 481)
(subreg:DI (reg/f:SI 479) 0)) y.c:12702 -1
 (nil))

... into a more simple insn:

(insn 1047 1046 1048 (set (reg:DI 479)
(zero_extend:DI (reg:SI 480))) isl_input.c:2496 -1
 (nil))

which manages to fix two bugs at the same time.

The patch does not include a test case, as I was looking at the PR 71779
just by curiosity, and I have only a theoretical interest in aarch64. However,
it should be easy to extract one from PR 70903 at least, but I can't do that by
myself.

Therefore I would like to leave the test case(s) to Cristina Tamar from ARM,
and/or Andreas Schwab, who have also helped out with bootstrapping and
reg-testing on aarch64 and aarch64-ilp32.

Bootstrap and reg-test on x86_64-linux-gnu with no regressions.
Successfully bootstrapped on aarch64_ilp32 and the ISL testsuite passes.
Also successfully bootstrapped on an aarch64 juno board and no regressions.


Is it OK for trunk?


Thanks
Bernd.


patch-pr71779.diff


2016-07-25  Bernd Edlinger  

PR rtl-optimization/71779
PR rtl-optimization/70903
* expmed.c (store_bit_field_using_insv): Don't generate a paradoxical
subreg.

So you're stumbling into another really interesting area.

I can hazard a guess that the reason we create the paradoxical SUBREG 
rather than a zero or sign extension is because various optimizers know 
that the upper bits in a paradoxical are don't care bits and may make 
transformations with that knowledge.


Once you turn that into a zero/sign extend, then the rest of the 
optimizers must preserve the zero/sign extension property -- they have 
no way of knowing the bits were don't cares.


So it's not necessarily clear that your change results in generally 
better code or not.


More importantly, it really feels like you're papering over a real bug 
elsewhere.  AFAICT the use of a paradoxical subreg here is safe.  So the 
real bug ought to be downstream of this point.  Several folks have 
pointed at reload, which would certainly be a point ripe for a 
paradoxical subreg problem.


Sooo. I don't think we can go with this patch as-is today.  We need to 
find the root problem which I believe is later in the pipeline.


This patch might make sense from an optimization standpoint -- I'm 
generally of the opinion that while paradoxical subregs give the 
optimziers more freedom, the optimizers rarely are able to use it and 
they generally know much more about the semantics of and how to optimize 
around zero/sign extensions.  But we really should fix the bug first, 
then come back to improving the code generation.


Now if someone wanted to look for an interesting optimization project -- 
ree.c knows how to do redundant extension eliminations.  A paradoxical 
is a form of extension that isn't currently understood by REE. 
Similarly REE doesn't know how to handle zero-extension as an "and" or 
sign extension as a pair of shifts.  Both of which can occur in practice 
due to costs or ISA limitations.  I wouldn't be surprise if adding those 
capabilities to REE would significantly improve its ability to eliminate 
unnecessary extensions.


Jeff




Re: [C++ PATCH] for PR72457

2016-07-27 Thread Jason Merrill
On Wed, Jul 27, 2016 at 2:50 AM, Markus Trippelsdorf
 wrote:
> On 2016.07.23 at 22:55 -0400, Jason Merrill wrote:
>> Using build_value_init in a base initialization is wrong, because it
>> calls the complete object constructor and misses protected access.  So
>> let's handle list-value-initialization in expand_aggr_init_1.
>>
>> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72457.
> And because it was backported, the gcc-6 branch is also affected.
>
> The following fix was tested on ppc64le. OK for trunk and gcc-6?
>
> (Unfortunately the reduced testcase is much too big.)
>
>  PR c++/72457
>  *constexpr.c (cx_check_missing_mem_inits): Handle potential
>  NULL_TREE.
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index 6bcb41ae8254..83fd9a4896ac 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -734,7 +734,7 @@ cx_check_missing_mem_inits (tree fun, tree body, bool 
> complain)
>   || DECL_ARTIFICIAL (index))
> continue;
> }
> -  for (; field != index; field = DECL_CHAIN (field))
> +  for (; field != NULL_TREE && field != index; field = DECL_CHAIN 
> (field))

This is wrong; it ends up just skipping over the rest of the fields,
so we don't check whether they were initialized.  Rather, we need to
handle seeing two initializers in a row for the same field.

Jason


Re: [PATCH, 2 of 4], Enhance PowerPC vec_extract support for power8/power9 machines

2016-07-27 Thread Michael Meissner
Next patch in the vec_extract series.

This patch adds support for vec_extract with a variable argument element number
for vector double or vector long on 64-bit ISA 2.07 (power8) or ISA 3.0
(power9) systems.  It needs 64-bit ISA 2.07 for the direct move support.

I have tested this on a little endian 64-bit power8 system and a big endian
power7 system with both 32-bit and 64-bit targets.  Note, the power7 system
uses the existing method of saving the vector and addressing the vector
elements in memory because it doesn't have the direct move instructions.  There
were no regressions in the test, can I install these patches on the trunk?

[gcc, patch #2]
2016-07-27  Michael Meissner  

* config/rs6000/rs6000-protos.h (rs6000_split_vec_extract_var):
New declaration.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Add support for vec_extract of vector double or vector long having
a variable element number on 64-bit ISA 2.07 systems or newer.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract):
Likewise.
(rs6000_split_vec_extract_var): New function to split a
vec_extract built-in function with variable element number.
(rtx_is_swappable_p): Variable vec_extracts and shifts are not
swappable.
* config/rs6000/vsx.md (UNSPEC_VSX_VSLO): New unspecs.
(UNSPEC_VSX_EXTRACT): Likewise.
(vsx_extract_, VSX_D iterator): Fix constraints to allow
direct move instructions to be generated on 64-bit ISA 2.07
systems and newer, and to take advantage of the ISA 3.0 MFVSRLD
instruction.
(vsx_vslo_): New insn to do VSLO on V2DFmode and V2DImode
arguments for vec_extract variable element.
(vsx_extract__var, VSX_D iterator): New insn to support
vec_extract with variable element on V2DFmode and V2DImode
vectors.
* config/rs6000/rs6000.h (TARGET_VEXTRACTUB): Remove
-mupper-regs-df requirement, since it isn't needed.
(VEC_EXTRACT_OPTIMIZE_P): New macro to say whether we can optmize
vec_extract on 64-bit ISA 2.07 systems and newer.

[gcc/testsuite, patch #2]
2016-07-27  Michael Meissner  

* gcc.target/powerpc/vec-extract-1.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 238775)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -62,6 +62,7 @@ extern void rs6000_expand_vector_init (r
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
+extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 238772)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -5105,29 +5105,61 @@ altivec_resolve_overloaded_builtin (loca
  arg2);
}
 
-  /* If we can use the VSX xxpermdi instruction, use that for extract.  */
+  /* See if we can optimize vec_extracts with the current VSX instruction
+set.  */
   mode = TYPE_MODE (arg1_type);
-  if ((mode == V2DFmode || mode == V2DImode) && VECTOR_MEM_VSX_P (mode)
- && TREE_CODE (arg2) == INTEGER_CST
- && wi::ltu_p (arg2, 2))
+  if (VECTOR_MEM_VSX_P (mode))
+
{
  tree call = NULL_TREE;
+ int nunits = GET_MODE_NUNITS (mode);
 
- if (mode == V2DFmode)
-   call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DF];
- else if (mode == V2DImode)
-   call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DI];
+ /* If the second argument is an integer constant, if the value is in
+the expected range, generate the built-in code if we can.  We need
+64-bit and direct move to extract the small integer vectors.  */
+ if (TREE_CODE (arg2) == INTEGER_CST && wi::ltu_p (arg2, nunits))
+   {
+ switch (mode)
+   {
+   default:
+ break;
+
+   case V1TImode:
+ call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V1TI];
+ break;
+
+   case V2DFmode:
+ call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DF];
+ break;
+
+   case V2DImode:
+ call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DI];
+ 

Re: [Bug tree-optimization] Fix for PR71994

2016-07-27 Thread Jeff Law

On 07/26/2016 06:10 PM, kugan wrote:

Hi Jeff,

Thanks for your comments.




* tree-ssa-reassoc.c (maybe_optimize_range_tests): Check type
compatibility.

I'd kind of like to see some analysis of how we've got a bool here --
ISTM it ought to have been converted it to the type of the LHS of the
PHI when propagated.


You are right. The problem was with the order of checking tcc_compare
and calling get_ops. We ended up calling get_ops where we should not.

Bootstrap and regression testing is ongoing. Is this OK for trunk if no
regressions?

Thanks,
Kugan


gcc/testsuite/ChangeLog:

2016-07-27  Kugan Vivekanandarajah  

* gcc.dg/torture/pr71994.c: New test.

gcc/ChangeLog:

2016-07-27  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (maybe_optimize_range_tests): Check tcc_comparison
 before calling get_ops.

OK if no regressions.

jeff


Re: Use correct location information for OpenACC shape and simple clauses in C/C++

2016-07-27 Thread David Malcolm
On Wed, 2016-07-27 at 17:17 +0200, Thomas Schwinge wrote:
> Hi!
> 
> I found that for a lot of OpenACC (and potentially also OpenMP)
> clauses
> (in C/C++ front ends; didn't look at Fortran), we use wrong location
> information.  The problem is that
> c_parser_oacc_all_clauses/c_parser_omp_all_clauses calls
> cp_parser_omp_clause_name to determine the pragma_omp_clause c_kind,
> and
> that function (as documented) consumes the clause token before
> returning.
> So, when we then do "c_parser_peek_token (parser)->location" or
> similar
> in some clause parsing function, that will return the location
> information of the token _after_ the clause token, which -- at least
> very
> often -- is not desirable, in particular if that location information
> is
> used then in a build_omp_clause call, which should point to the
> clause
> token itself, and not whatever follows after that.
> 
> Probably that all went unnoticed for so long, because the GCC
> testsuite
> largely is running with -fno-diagnostics-show-caret, so we don't
> visually
> see the wrong location information (and nobody pays attention to the
> colum information as given, for example, as line 2, column 32 in
> "[...]/c-c++-common/goacc/routine-2.c:2:32: error: [...]".

> There seems to be a lot of inconsistency in that in all the clause
> parsing; here is just a first patch to fix the immediate problem I've
> been observing.  OK for trunk already, or need to clean this all up
> in
> one go?  Do we need this on release branches, as a "quality of
> implementation" fix (wrong diagnostic locations)?

[...]

I'm not a reviewer for the C/C++ FEs so I can't really review this
patch, but it might be nice in this (or in a followup) to add some test
cases for this that explicitly test the caret information for some of
these errors.

Put this at the top of a new test file:
/* { dg-options "-fdiagnostics-show-caret" } */
to override the default -fno-diagnostics-show-caret, and use this to
quote what we expect to see after each dg-error:

 /* { dg-begin-multiline-output "" }
EXPECTED CARET TEXT GOES HERE, with trailing dg- stuff trimmed off
{ dg-end-multiline-output "" } */


Hope this is constructive
Dave


Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-27 Thread Jeff Law

On 07/25/2016 10:28 PM, Andi Kleen wrote:

But it's not expected that it is not deterministic, so that it changes
run to run. Do you see that? Or some other problem? Please describe
it exactly.

It definitely changes run to run for me.


And do you have autofdo installed? (create_gcov)

No.

jeff


Re: Implement -Wimplicit-fallthrough (take 2): fix missing breaks

2016-07-27 Thread Jeff Law

On 07/27/2016 10:52 AM, Marek Polacek wrote:

This is what the new warning pointed out.  I think all these are bugs.

This patch has been tested on powerpc64le-unknown-linux-gnu, aarch64-linux-gnu,
and x86_64-redhat-linux.

2016-07-27  Marek Polacek  

PR c/7652
gcc/
* config/i386/i386.c (ix86_expand_args_builtin): Add break.
(ix86_expand_round_builtin): Likewise.
* config/rs6000/rs6000.c (altivec_expand_ld_builtin): Add break.
(altivec_expand_st_builtin): Likewise.
* gengtype.c (dbgprint_count_type_at): Add break.
gcc/java/
* jcf-dump.c (print_constant): Add break.
libgo/
* runtime/heapdump.c (dumpefacetypes): Add break.
Note that I think libgo's runtime is shared with golang and that Ian 
imports the runtime from that project.


jeff



Re: Implement -Wimplicit-fallthrough: core

2016-07-27 Thread Jeff Law

On 07/27/2016 11:59 AM, Marek Polacek wrote:

On Wed, Jul 27, 2016 at 11:04:04AM -0600, Martin Sebor wrote:

Just a couple of minor things:


+@cindex @code{fallthrough} statement attribute
+The @code{fallthrough} attribute with a null statement serves as a
+fallthrough statement.  It hints to the compiler that a statement
+that falls through to another case label, or user-defined label
+in a switch statement is intentional and thus the
+@option{-Wimplicit-fallthrough} warning must not trigger.  The
+fallthrough attribute might appear at most once in each attribute
+list, and might not be mixed with other attributes.


"Might appear" means that it may or may not appear (we don't know),
but the intent is to say that it's allowed to appear at most once.
Such a constraint is more correctly phrased in terms of either may
or or must, for example:

  The fallthrough attribute may appear at most once.

or

 The fallthrough attribute must not appear more than once.

Same for "might not be mixed with."


Oops, that's right.  Fixed.


In (very light) testing I noticed that in the following, the switch
in h() is diagnosed but the one in g() is not.

Martin

  int f (void);

  #define foo() for ( ; ; ) { if (f ()) break; }
  #define bar() do { if (f ()) break; } while (0)

  int g (int i)
  {
switch (i) {
case 0: foo ();
case 1: return 1;
}
return 0;
  }

  int h (int i)
  {
switch (i) {
case 0: bar ();
case 1: return 1;
}
return 0;
  }


Yeah, that is true.  I'm not sure if the warning can reasonably be expected to
handle all such cases and really see through loops like that, most likely not.
It's really just a heuristics.
Fortunately I didn't see any false positives due to that while bootstrapping.

Can you add it as an xfail'd test?

jeff



Re: [PATCH] Remove BITS_PER_UNIT_LOG

2016-07-27 Thread Jeff Law

On 07/27/2016 01:38 PM, Bernd Edlinger wrote:

Hi!

I've noticed that there are two defines for log2 of BITS_PER_UNIT:
BITS_PER_UNIT_LOG and LOG2_BITS_PER_UNIT.
At least this wheel was apparently invented more than once :)

Thus I'd say the most simple wheel will do for us as well.

This patch removes BITS_PER_UNIT_LOG as this is
less often used, and moves LOG2_BITS_PER_UNIT
to tree.h because default.h is obviously not always included
before tree.h.


Bootstrap and reg-test on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.


patch-unit-size.diff


gcc:
2016-07-27  Bernd Edlinger  

* defaults.h (LOG2_BITS_PER_UNIT): Move from here...
* tree.h (LOG2_BITS_PER_UNIT): ...to here.
(BITS_PER_UNIT_LOG): Remove.
(int_bit_position): Use LOG2_BITS_PER_UNIT instead of BITS_PER_UNIT_LOG.
* expr.c (expand_assignment): Likewise.
* stor-layout.c (initialize_sizetypes): Likewise.

c-family:
2016-07-27  Bernd Edlinger  

* c-common.c (check_user_alignment): Use LOG2_BITS_PER_UNIT instead of
BITS_PER_UNIT_LOG.

OK.
jeff



Re: [PATCH, 1 of 4 or 5], Enhance PowerPC vec_extract support for power8/power9 machines

2016-07-27 Thread Michael Meissner
On Wed, Jul 27, 2016 at 02:35:18PM -0500, Segher Boessenkool wrote:
> Later patches have some testcases?

Yes, as I said, since the first patch just changed the internal infrastructure
to allow for variable vec_extracts (in patch #2), it didn't change any code.

There will be new test cases for the successive patches.

> This is okay for trunk, with or without the cosmetic change.  Thanks,

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH] Remove BITS_PER_UNIT_LOG

2016-07-27 Thread Bernd Edlinger
Hi!

I've noticed that there are two defines for log2 of BITS_PER_UNIT:
BITS_PER_UNIT_LOG and LOG2_BITS_PER_UNIT.
At least this wheel was apparently invented more than once :)

Thus I'd say the most simple wheel will do for us as well.

This patch removes BITS_PER_UNIT_LOG as this is
less often used, and moves LOG2_BITS_PER_UNIT
to tree.h because default.h is obviously not always included
before tree.h.


Bootstrap and reg-test on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.gcc:
2016-07-27  Bernd Edlinger  

	* defaults.h (LOG2_BITS_PER_UNIT): Move from here...
	* tree.h (LOG2_BITS_PER_UNIT): ...to here.
	(BITS_PER_UNIT_LOG): Remove.
	(int_bit_position): Use LOG2_BITS_PER_UNIT instead of BITS_PER_UNIT_LOG.
	* expr.c (expand_assignment): Likewise.
	* stor-layout.c (initialize_sizetypes): Likewise.

c-family:
2016-07-27  Bernd Edlinger  

	* c-common.c (check_user_alignment): Use LOG2_BITS_PER_UNIT instead of
	BITS_PER_UNIT_LOG.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 238784)
+++ gcc/c-family/c-common.c	(working copy)
@@ -7679,7 +7679,7 @@ check_user_alignment (const_tree align, bool allow
   error ("requested alignment is not a positive power of 2");
   return -1;
 }
-  else if (i >= HOST_BITS_PER_INT - BITS_PER_UNIT_LOG)
+  else if (i >= HOST_BITS_PER_INT - LOG2_BITS_PER_UNIT)
 {
   error ("requested alignment is too large");
   return -1;
Index: gcc/defaults.h
===
--- gcc/defaults.h	(revision 238784)
+++ gcc/defaults.h	(working copy)
@@ -491,14 +491,6 @@ see the files COPYING3 and COPYING.RUNTIME respect
your target, you should override these values by defining the
appropriate symbols in your tm.h file.  */
 
-#if BITS_PER_UNIT == 8
-#define LOG2_BITS_PER_UNIT 3
-#elif BITS_PER_UNIT == 16
-#define LOG2_BITS_PER_UNIT 4
-#else
-#error Unknown BITS_PER_UNIT
-#endif
-
 #ifndef BITS_PER_WORD
 #define BITS_PER_WORD (BITS_PER_UNIT * UNITS_PER_WORD)
 #endif
Index: gcc/expr.c
===
--- gcc/expr.c	(revision 238784)
+++ gcc/expr.c	(working copy)
@@ -4990,8 +4990,7 @@ expand_assignment (tree to, tree from, bool nontem
   if (bitpos < 0)
 	{
 	  gcc_assert (offset == NULL_TREE);
-	  offset = size_int (bitpos >> (BITS_PER_UNIT == 8
-	? 3 : exact_log2 (BITS_PER_UNIT)));
+	  offset = size_int (bitpos >> LOG2_BITS_PER_UNIT);
 	  bitpos &= BITS_PER_UNIT - 1;
 	}
 
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c	(revision 238784)
+++ gcc/stor-layout.c	(working copy)
@@ -2572,7 +2572,7 @@ initialize_sizetypes (void)
 }
 
   bprecision
-= MIN (precision + BITS_PER_UNIT_LOG + 1, MAX_FIXED_MODE_SIZE);
+= MIN (precision + LOG2_BITS_PER_UNIT + 1, MAX_FIXED_MODE_SIZE);
   bprecision
 = GET_MODE_PRECISION (smallest_mode_for_size (bprecision, MODE_INT));
   if (bprecision > HOST_BITS_PER_DOUBLE_INT)
Index: gcc/tree.h
===
--- gcc/tree.h	(revision 238784)
+++ gcc/tree.h	(working copy)
@@ -4235,6 +4235,8 @@ extern tree bit_position (const_tree);
 extern tree byte_position (const_tree);
 extern HOST_WIDE_INT int_byte_position (const_tree);
 
+/* Type for sizes of data-type.  */
+
 #define sizetype sizetype_tab[(int) stk_sizetype]
 #define bitsizetype sizetype_tab[(int) stk_bitsizetype]
 #define ssizetype sizetype_tab[(int) stk_ssizetype]
@@ -4244,12 +4246,15 @@ extern HOST_WIDE_INT int_byte_position (const_tree
 #define bitsize_int(L) size_int_kind (L, stk_bitsizetype)
 #define sbitsize_int(L) size_int_kind (L, stk_sbitsizetype)
 
-/* Type for sizes of data-type.  */
+/* Log2 of BITS_PER_UNIT.  */
 
-#define BITS_PER_UNIT_LOG \
-  ((BITS_PER_UNIT > 1) + (BITS_PER_UNIT > 2) + (BITS_PER_UNIT > 4) \
-   + (BITS_PER_UNIT > 8) + (BITS_PER_UNIT > 16) + (BITS_PER_UNIT > 32) \
-   + (BITS_PER_UNIT > 64) + (BITS_PER_UNIT > 128) + (BITS_PER_UNIT > 256))
+#if BITS_PER_UNIT == 8
+#define LOG2_BITS_PER_UNIT 3
+#elif BITS_PER_UNIT == 16
+#define LOG2_BITS_PER_UNIT 4
+#else
+#error Unknown BITS_PER_UNIT
+#endif
 
 /* Concatenate two lists (chains of TREE_LIST nodes) X and Y
by making the last node in X point to Y.
@@ -5400,8 +5405,8 @@ extern GTY(()) struct int_n_trees_t int_n_trees[NU
 
 inline HOST_WIDE_INT
 int_bit_position (const_tree field)
-{ 
-  return ((wi::to_offset (DECL_FIELD_OFFSET (field)) << BITS_PER_UNIT_LOG)
+{
+  return ((wi::to_offset (DECL_FIELD_OFFSET (field)) << LOG2_BITS_PER_UNIT)
 	  + wi::to_offset (DECL_FIELD_BIT_OFFSET (field))).to_shwi ();
 }
 


Re: [PATCH, 1 of 4 or 5], Enhance PowerPC vec_extract support for power8/power9 machines

2016-07-27 Thread Segher Boessenkool
On Wed, Jul 27, 2016 at 10:32:21AM -0400, Michael Meissner wrote:
> 2016-07-27  Michael Meissner  
> 
>   * config/rs6000/vector.md (vec_extract): Change the calling
>   signature of rs6000_expand_vector_extract so that the element
>   number is a RTX instead of a constant integer.
>   * config/rs6000/rs6000-protos.h (rs6000_expand_vector_extract):
>   Likewise.
>   * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Likewise.
>   (altivec_expand_vec_ext_builtin): Likewise.
>   * config/rs6000/altivec.md (reduc_plus_scal_): Likewise.
>   * config/rs6000/vsx.md (vsx_extract_): Fix spelling of the
>   MFVSRLD instruction.

> @@ -14658,14 +14661,18 @@ altivec_expand_vec_ext_builtin (tree exp
>  {
>machine_mode tmode, mode0;
>tree arg0, arg1;
> -  int elt;
>rtx op0;
> +  rtx op1;

You could put op0, op1 on one line, or better yet, declare them where
they are first initialised.

> --- gcc/config/rs6000/vsx.md  (revision 238772)
> +++ gcc/config/rs6000/vsx.md  (working copy)
> @@ -2159,7 +2159,7 @@ (define_insn "vsx_extract_"
>  
>else if (element == VECTOR_ELEMENT_MFVSRLD_64BIT && INT_REGNO_P (op0_regno)
>  && TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
> -return "mfvsrdl %0,%x1";
> +return "mfvsrld %0,%x1";

Later patches have some testcases?

This is okay for trunk, with or without the cosmetic change.  Thanks,


Segher


Re: Implement -Wimplicit-fallthrough: core

2016-07-27 Thread Marek Polacek
On Wed, Jul 27, 2016 at 11:04:04AM -0600, Martin Sebor wrote:
> Just a couple of minor things:
> 
> > +@cindex @code{fallthrough} statement attribute
> > +The @code{fallthrough} attribute with a null statement serves as a
> > +fallthrough statement.  It hints to the compiler that a statement
> > +that falls through to another case label, or user-defined label
> > +in a switch statement is intentional and thus the
> > +@option{-Wimplicit-fallthrough} warning must not trigger.  The
> > +fallthrough attribute might appear at most once in each attribute
> > +list, and might not be mixed with other attributes.
> 
> "Might appear" means that it may or may not appear (we don't know),
> but the intent is to say that it's allowed to appear at most once.
> Such a constraint is more correctly phrased in terms of either may
> or or must, for example:
> 
>   The fallthrough attribute may appear at most once.
> 
> or
> 
>  The fallthrough attribute must not appear more than once.
> 
> Same for "might not be mixed with."
 
Oops, that's right.  Fixed.

> In (very light) testing I noticed that in the following, the switch
> in h() is diagnosed but the one in g() is not.
> 
> Martin
> 
>   int f (void);
> 
>   #define foo() for ( ; ; ) { if (f ()) break; }
>   #define bar() do { if (f ()) break; } while (0)
> 
>   int g (int i)
>   {
> switch (i) {
> case 0: foo ();
> case 1: return 1;
> }
> return 0;
>   }
> 
>   int h (int i)
>   {
> switch (i) {
> case 0: bar ();
> case 1: return 1;
> }
> return 0;
>   }

Yeah, that is true.  I'm not sure if the warning can reasonably be expected to
handle all such cases and really see through loops like that, most likely not.
It's really just a heuristics.
Fortunately I didn't see any false positives due to that while bootstrapping.

Thanks,

Marek


Re: Implement -Wimplicit-fallthrough: core (take 2)

2016-07-27 Thread Marek Polacek
On Wed, Jul 27, 2016 at 06:46:33PM +0200, Marek Polacek wrote:
> +static tree
> +warn_implicit_fallthrough_r (gimple_stmt_iterator *gsi_p, bool 
> *handled_ops_p,
> +struct walk_stmt_info *)

Yes, bad formatting here.  I'll fix.

Marek


Re: Implement -Wimplicit-fallthrough (take 2): fix missing breaks

2016-07-27 Thread Tom Tromey
> "Marek" == Marek Polacek  writes:

Marek> gcc/java/
Marek>  * jcf-dump.c (print_constant): Add break.

This bit is ok.

Tom


[PATCH] testsuite/20_util/forward/1_neg.cc: Move dg-error to right line.

2016-07-27 Thread Jonathan Wakely

The dg-error was on the wrong line in this test. It happened to
pass by chance because it was on line 31 and the error happened at
location 30:31 so the column number matched the dg-error line number!

Running with -fno-show-column caused it to fail.

Splitting the expression into three lines allows us to put the
dg-error with the right sub-expression, and make sure it matches
what's intended.

Tested powerpc64le-linux, committed to trunk.

commit 203176bafe5b215f44bd82dd4e5cacd06a1cb6ab
Author: redi 
Date:   Wed Jul 27 17:32:45 2016 +

* testsuite/20_util/forward/1_neg.cc: Move dg-error to right line.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238793 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/testsuite/20_util/forward/1_neg.cc 
b/libstdc++-v3/testsuite/20_util/forward/1_neg.cc
index d2f3477..97851f2 100644
--- a/libstdc++-v3/testsuite/20_util/forward/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/forward/1_neg.cc
@@ -27,8 +27,9 @@ template
   std::shared_ptr
   factory(A1&& a1, A2&& a2)
   {
-return std::shared_ptr(new T(std::forward(a1),
-   std::forward(a2))); // { dg-error 
"rvalue" }
+return std::shared_ptr(
+   new T(std::forward(a1), // { dg-error "rvalue" }
+  std::forward(a2)));
   }
 
 struct A


[committed] Move make_location from tree.h/c to input.h/c

2016-07-27 Thread David Malcolm
For some reason I added make_location and some related functions to
tree.h/c, rather than to input.h/c.  Move them there, so we can use them
without requiring tree, and add some selftest coverage.

Bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r238792.

gcc/ChangeLog:
* input.c (get_pure_location): Move here from tree.c.
(make_location): Likewise.  Add header comment.
(selftest::test_accessing_ordinary_linemaps): Verify
pure_location_p, make_location, get_location_from_adhoc_loc and
get_range_from_loc.
* input.h (get_pure_location): Move declaration here from tree.h.
(get_finish): Likewise for inline function.
(make_location): Likewise for declaration.
* tree.c (get_pure_location): Move to input.c.
(make_location): Likewise.
* tree.h (get_pure_location): Move declaration to tree.h.
(get_finish): Likewise for inline function.
(make_location): Likewise for declaration.

libcpp/ChangeLog:
* include/line-map.h (source_location): Fix line numbers in
comment.
---
 gcc/input.c   | 60 +++
 gcc/input.h   | 13 ++
 gcc/tree.c| 36 
 gcc/tree.h| 13 --
 libcpp/include/line-map.h |  4 ++--
 5 files changed, 75 insertions(+), 51 deletions(-)

diff --git a/gcc/input.c b/gcc/input.c
index 5033824..a7aced2 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -801,6 +801,56 @@ expansion_point_location (source_location location)
   LRK_MACRO_EXPANSION_POINT, NULL);
 }
 
+/* Given location LOC, strip away any packed range information
+   or ad-hoc information.  */
+
+location_t
+get_pure_location (location_t loc)
+{
+  if (IS_ADHOC_LOC (loc))
+loc
+  = line_table->location_adhoc_data_map.data[loc & 
MAX_SOURCE_LOCATION].locus;
+
+  if (loc >= LINEMAPS_MACRO_LOWEST_LOCATION (line_table))
+return loc;
+
+  if (loc < RESERVED_LOCATION_COUNT)
+return loc;
+
+  const line_map *map = linemap_lookup (line_table, loc);
+  const line_map_ordinary *ordmap = linemap_check_ordinary (map);
+
+  return loc & ~((1 << ordmap->m_range_bits) - 1);
+}
+
+/* Construct a location with caret at CARET, ranging from START to
+   finish e.g.
+
+ 112
+12345678901234567890
+ 522
+ 523   return foo + bar;
+  ^
+ 524
+
+   The location's caret is at the "+", line 523 column 15, but starts
+   earlier, at the "f" of "foo" at column 11.  The finish is at the "r"
+   of "bar" at column 19.  */
+
+location_t
+make_location (location_t caret, location_t start, location_t finish)
+{
+  location_t pure_loc = get_pure_location (caret);
+  source_range src_range;
+  src_range.m_start = start;
+  src_range.m_finish = finish;
+  location_t combined_loc = COMBINE_LOCATION_DATA (line_table,
+  pure_loc,
+  src_range,
+  NULL);
+  return combined_loc;
+}
+
 #define ONE_K 1024
 #define ONE_M (ONE_K * ONE_K)
 
@@ -1603,6 +1653,16 @@ test_accessing_ordinary_linemaps (const line_table_case 
_)
   assert_loceq ("bar.c", 1, 150, loc_f);
 
   ASSERT_FALSE (is_location_from_builtin_token (loc_a));
+  ASSERT_TRUE (pure_location_p (line_table, loc_a));
+
+  /* Verify using make_location to build a range, and extracting data
+ back from it.  */
+  location_t range_c_b_d = make_location (loc_c, loc_b, loc_d);
+  ASSERT_FALSE (pure_location_p (line_table, range_c_b_d));
+  ASSERT_EQ (loc_c, get_location_from_adhoc_loc (line_table, range_c_b_d));
+  source_range src_range = get_range_from_loc (line_table, range_c_b_d);
+  ASSERT_EQ (loc_b, src_range.m_start);
+  ASSERT_EQ (loc_d, src_range.m_finish);
 }
 
 /* Verify various properties of UNKNOWN_LOCATION.  */
diff --git a/gcc/input.h b/gcc/input.h
index b61cf19..24d9115 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -77,6 +77,19 @@ extern location_t input_location;
 #define from_macro_expansion_at(LOC) \
   ((linemap_location_from_macro_expansion_p (line_table, LOC)))
 
+extern location_t get_pure_location (location_t loc);
+
+/* Get the endpoint of any range encoded within location LOC.  */
+
+static inline location_t
+get_finish (location_t loc)
+{
+  return get_range_from_loc (line_table, loc).m_finish;
+}
+
+extern location_t make_location (location_t caret,
+location_t start, location_t finish);
+
 void dump_line_table_statistics (void);
 
 void dump_location_info (FILE *stream);
diff --git a/gcc/tree.c b/gcc/tree.c
index 661d385..11d3b51 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14112,28 +14112,6 @@ nonnull_arg_p (const_tree arg)
   return false;
 }
 
-/* Given location LOC, strip away any packed range information
-   or ad-hoc information.  */
-
-location_t

Re: Implement -Wimplicit-fallthrough (take 2): fix missing breaks

2016-07-27 Thread Mike Stump
On Jul 27, 2016, at 9:52 AM, Marek Polacek  wrote:
> 
> This is what the new warning pointed out.  I think all these are bugs.
> 
> --- gcc/libgo/runtime/heapdump.c
> +++ gcc/libgo/runtime/heapdump.c
> @@ -766,6 +766,7 @@ dumpefacetypes(void *obj __attribute__ ((unused)), 
> uintptr size, const Type *typ
>   for(i = 0; i <= size - type->__size; i += type->__size)
>   //playgcprog(i, (uintptr*)type->gc + 1, 
> dumpeface_callback, obj);
>   break;
> + break;
>   case TypeInfo_Chan:
>   if(type->__size == 0) // channels may have zero-sized objects 
> in them
>   break;

I disagree that's the best fix.  Better would be to uncomment out the 
playgcprog calls, and #if 0 the entire contents of the function.

Re: Implement -Wimplicit-fallthrough: core

2016-07-27 Thread Martin Sebor

Just a couple of minor things:


+@cindex @code{fallthrough} statement attribute
+The @code{fallthrough} attribute with a null statement serves as a
+fallthrough statement.  It hints to the compiler that a statement
+that falls through to another case label, or user-defined label
+in a switch statement is intentional and thus the
+@option{-Wimplicit-fallthrough} warning must not trigger.  The
+fallthrough attribute might appear at most once in each attribute
+list, and might not be mixed with other attributes.


"Might appear" means that it may or may not appear (we don't know),
but the intent is to say that it's allowed to appear at most once.
Such a constraint is more correctly phrased in terms of either may
or or must, for example:

  The fallthrough attribute may appear at most once.

or

 The fallthrough attribute must not appear more than once.

Same for "might not be mixed with."

In (very light) testing I noticed that in the following, the switch
in h() is diagnosed but the one in g() is not.

Martin

  int f (void);

  #define foo() for ( ; ; ) { if (f ()) break; }
  #define bar() do { if (f ()) break; } while (0)

  int g (int i)
  {
switch (i) {
case 0: foo ();
case 1: return 1;
}
return 0;
  }

  int h (int i)
  {
switch (i) {
case 0: bar ();
case 1: return 1;
}
return 0;
  }



Re: Implement -Wimplicit-fallthrough: the rest

2016-07-27 Thread Marek Polacek
On Fri, Jul 22, 2016 at 01:27:51PM +0200, Bernd Schmidt wrote:
> On 07/22/2016 01:15 PM, Jakub Jelinek wrote:
> > > @@ -32335,6 +32341,7 @@ rs6000_handle_altivec_attribute (tree *node,
> > >   case V4SImode: case V8HImode: case V16QImode: case V4SFmode:
> > >   case V2DImode: case V2DFmode:
> > > result = type;
> > > +   gcc_fallthrough ();
> > >   default: break;
> > >   }
> > >break;
> > > @@ -32345,6 +32352,7 @@ rs6000_handle_altivec_attribute (tree *node,
> > >   case SImode: case V4SImode: result = bool_V4SI_type_node; break;
> > >   case HImode: case V8HImode: result = bool_V8HI_type_node; break;
> > >   case QImode: case V16QImode: result = bool_V16QI_type_node;
> > > +   gcc_fallthrough ();
> > >   default: break;
> > >   }
> > >break;
> > > @@ -32352,6 +32360,7 @@ rs6000_handle_altivec_attribute (tree *node,
> > >switch (mode)
> > >   {
> > >   case V8HImode: result = pixel_V8HI_type_node;
> > > +   gcc_fallthrough ();
> > >   default: break;
> > >   }
> > >  default: break;
> > 
> > I thought we don't warn on these anymore?
> 
> Also, please just add a break for cases like this. The last one ought to be
> an if statement in any case.

With the current version code like that shouldn't need any adjustments since
we shouldn't warn on it.

Marek


Implement -Wimplicit-fallthrough (take 2): the rest

2016-07-27 Thread Marek Polacek
And this is the rest.  Either I just adjusted a falls through comment, or
I added __builtin_fallthrough ().  These were the cases where I was fairly
sure that the fall through is intentional.

This patch has been tested on powerpc64le-unknown-linux-gnu, aarch64-linux-gnu,
and x86_64-redhat-linux.

2016-07-27  Marek Polacek  

PR c/7652
gcc/
* Makefile.in (insn-attrtab.o-warn, insn-dfatab.o-warn,
insn-latencytab.o-warn, insn-output.o-warn, insn-emit.o-warn): Add
-Wno-switch-fallthrough.
* alias.c (find_base_value): Adjust falls through comment.
* builtins.c (expand_builtin_int_roundingfn_2): Add gcc_fallthrough.
(expand_builtin): Likewise.
* cfgexpand.c (expand_debug_expr): Adjust falls through comment.
* combine.c (find_split_point): Likewise.
(expand_compound_operation): Likewise.  Add gcc_fallthrough.
(make_compound_operation): Adjust falls through comment.
(canon_reg_for_combine): Add gcc_fallthrough.
(force_to_mode): Adjust falls through comment.  Add gcc_fallthrough.
(simplify_shift_const_1): Adjust falls through comment.
(simplify_comparison): Likewise.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Add
gcc_fallthrough.
* config/aarch64/predicates.md: Likewise.
* config/i386/i386.c (ix86_gimplify_va_arg): Add gcc_fallthrough.
(print_reg): Likewise.
(ix86_print_operand): Likewise.
(ix86_build_const_vector): Likewise.
(ix86_expand_branch): Likewise.
(ix86_sched_init_global): Adjust falls through comment.
(ix86_expand_multi_arg_builtin): Add gcc_fallthrough.
(ix86_expand_builtin): Add gcc_fallthrough.
(ix86_expand_vector_init_one_var): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): Likewise.
(rs6000_emit_vector_compare_inner): Add gcc_fallthrough.
(rs6000_adjust_cost): Likewise.
(insn_must_be_first_in_group): Likewise.
* config/rs6000/rs6000.md: Likewise.  Adjust falls through comment.
* convert.c (convert_to_real_1): Add gcc_fallthrough.
(convert_to_integer_1): Likewise.
* dbxout.c (dbxout_symbol): Adjust falls through comment.
* gcc/df-scan.c (df_uses_record): Likewise.
* dojump.c (do_jump): Add gcc_fallthrough.
* dwarf2out.c (mem_loc_descriptor): Likewise.  Adjust falls through
comment.
(resolve_args_picking_1): Likewise.
(loc_list_from_tree_1): Likewise.
(gen_formal_parameter_die): Likewise.
* expmed.c (expand_divmod): Add gcc_fallthrough.
(make_tree): Adjust falls through comment.
* expr.c (expand_expr_real_2): Add gcc_fallthrough.
(expand_expr_real_1): Likewise.  Adjust falls through comment.
* final.c (output_alternate_entry_point): Add gcc_fallthrough.
* fold-const.c (const_binop): Adjust falls through comment.
(fold_truth_not_expr): Likewise.
(make_range_step): Add gcc_fallthrough.
(fold_cond_expr_with_comparison): Likewise.
(fold_binary_loc): Likewise.
(contains_label_1): Adjust falls through comment.
(multiple_of_p): Likewise.
* gcc.c (driver_handle_option): Add gcc_fallthrough.
* gcov-tool.c (process_args): Likewise.
* genattrtab.c (check_attr_test): Likewise.
(make_canonical): Likewise.
(write_test_expr): Likewise.
* genconfig.c (walk_insn_part): Likewise.
* genpreds.c (validate_exp): Adjust falls through comment.  Add
gcc_fallthrough.
(needs_variable): Adjust falls through comment.
* gimple-pretty-print.c (dump_gimple_assign): Add gcc_fallthrough.
* gimple-ssa-strength-reduction.c
(find_candidates_dom_walker::before_dom_children): Likewise.
* gimplify.c (gimplify_addr_expr): Adjust falls through comment.
(gimplify_scan_omp_clauses): Add gcc_fallthrough.
* godump.c (go_format_type): Add gcc_fallthrough.
* graphite-isl-ast-to-gimple.c (substitute_ssa_name): Adjust falls
through comment.
* hsa-gen.c (get_address_from_value): Likewise.
* ipa-icf.c (sem_function::hash_stmt): Adjust falls through comment.
* ira.c (ira_setup_alts): Add gcc_fallthrough.
* lra-eliminations.c (lra_eliminate_regs_1): Adjust falls through
comment.
* lto-streamer-out.c (lto_output_tree_ref): Add gcc_fallthrough.
* omp-low.c (scan_sharing_clauses): Add gcc_fallthrough.
(lower_rec_input_clauses): Likewise.
* opts.c (common_handle_option): Likewise.
* print-rtl.c (print_rtx): Likewise.
* read-rtl.c (read_rtx_code): Likewise.
* real.c (round_for_format): Likewise.
* recog.c (asm_operand_ok): Likewise.
* reginfo.c (reg_scan_mark_refs): Adjust falls through comment.
* reload1.c 

Implement -Wimplicit-fallthrough (take 2): questionable code

2016-07-27 Thread Marek Polacek
These are the cases where I wasn't sure if the falls through were intentional
or not.

This patch has been tested on powerpc64le-unknown-linux-gnu, aarch64-linux-gnu,
and x86_64-redhat-linux.

2016-07-27  Marek Polacek  

PR c/7652
gcc/
* config/i386/i386.c (ix86_expand_branch): Add gcc_fallthrough..
* cselib.c (cselib_expand_value_rtx_1): Likewise. 
* gensupport.c (get_alternatives_number): Likewise.
(subst_dup): Likewise.
* gimplify.c (goa_stabilize_expr): Likewise.
* hsa-gen.c (gen_hsa_insn_for_internal_fn_call): Likewise.
* reg-stack.c (get_true_reg): Likewise.
* tree-complex.c (expand_complex_division): Likewise.
* tree-data-ref.c (get_references_in_stmt): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* var-tracking.c (adjust_mems): Likewise.
gcc/cp/
* call.c (add_builtin_candidate): Add gcc_fallthrough.
* cxx-pretty-print.c (pp_cxx_unqualified_id): Likewise.
* parser.c (cp_parser_skip_to_end_of_statement): Likewise.
(cp_parser_cache_defarg): Likewise.
gcc/c-family/
* c-ada-spec.c (dump_generic_ada_node): Add gcc_fallthrough.
libcpp/
* pch.c (write_macdef): Add CPP_FALLTHRU.
* lex.c (_cpp_lex_direct): Likewise.

diff --git gcc/gcc/c-family/c-ada-spec.c gcc/gcc/c-family/c-ada-spec.c
index e33fdff..659ffa0 100644
--- gcc/gcc/c-family/c-ada-spec.c
+++ gcc/gcc/c-family/c-ada-spec.c
@@ -1862,6 +1862,8 @@ dump_generic_ada_node (pretty_printer *buffer, tree node, 
tree type, int spc,
 case TREE_BINFO:
   dump_generic_ada_node
(buffer, BINFO_TYPE (node), type, spc, limited_access, name_only);
+  /* XXX Really fallthru?  */
+  gcc_fallthrough ();
 
 case TREE_VEC:
   pp_string (buffer, "--- unexpected node: TREE_VEC");
--- gcc/gcc/config/i386/i386.c
+++ gcc/gcc/config/i386/i386.c
@@ -22484,6 +22495,8 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
op1, rtx label)
  op0 = force_reg (mode, gen_rtx_XOR (mode, op0, op1));
  op1 = const0_rtx;
}
+  /* XXX Really fallthru?  */
+  gcc_fallthrough ();
 case TImode:
   /* Expand DImode branch into multiple compare+branch.  */
   {
--- gcc/gcc/cp/call.c
+++ gcc/gcc/cp/call.c
@@ -2542,6 +2544,8 @@ add_builtin_candidate (struct z_candidate **candidates, 
enum tree_code code,
  type2 = ptrdiff_type_node;
  break;
}
+  /* XXX Really fallthru?  */
+  gcc_fallthrough ();
 case MULT_EXPR:
 case TRUNC_DIV_EXPR:
   if (ARITHMETIC_TYPE_P (type1) && ARITHMETIC_TYPE_P (type2))
--- gcc/gcc/cp/cxx-pretty-print.c
+++ gcc/gcc/cp/cxx-pretty-print.c
@@ -142,6 +142,8 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t)
 
 case OVERLOAD:
   t = OVL_CURRENT (t);
+  /* XXX Really fallthru?  */
+  gcc_fallthrough ();
 case VAR_DECL:
 case PARM_DECL:
 case CONST_DECL:
--- gcc/gcc/cp/parser.c
+++ gcc/gcc/cp/parser.c
@@ -3488,6 +3488,8 @@ cp_parser_skip_to_end_of_statement (cp_parser* parser)
  cp_lexer_consume_token (parser->lexer);
  return;
}
+ /* XXX Really fallthru?  */
+ gcc_fallthrough ();
 
case CPP_OPEN_BRACE:
  ++nesting_depth;
@@ -27640,6 +27646,8 @@ cp_parser_cache_defarg (cp_parser *parser, bool nsdmi)
  parser->in_template_argument_list_p = saved_italp;
  break;
}
+ /* XXX Really fallthru?  */
+ gcc_fallthrough ();
case CPP_CLOSE_PAREN:
case CPP_ELLIPSIS:
  /* If we run into a non-nested `;', `}', or `]',
--- gcc/gcc/cselib.c
+++ gcc/gcc/cselib.c
@@ -1619,6 +1619,8 @@ cselib_expand_value_rtx_1 (rtx orig, struct 
expand_value_data *evd,
return orig;
}
   }
+  /* XXX Really fallthru?  */
+  gcc_fallthrough ();
 
 CASE_CONST_ANY:
 case SYMBOL_REF:
diff --git gcc/gcc/gensupport.c gcc/gcc/gensupport.c
index 0eb4591..795f88c 100644
--- gcc/gcc/gensupport.c
+++ gcc/gcc/gensupport.c
@@ -1038,6 +1038,8 @@ get_alternatives_number (rtx pattern, int *n_alt, 
file_location loc)
case 'V':
  if (XVEC (pattern, i) == NULL)
break;
+ /* XXX Really fallthru?  */
+ gcc_fallthrough ();
 
case 'E':
  for (j = XVECLEN (pattern, i) - 1; j >= 0; --j)
@@ -2156,6 +2158,8 @@ subst_dup (rtx pattern, int n_alt, int n_subst_alt)
case 'V':
  if (XVEC (pattern, i) == NULL)
break;
+ /* XXX Really fallthru?  */
+ gcc_fallthrough ();
case 'E':
  if (code != MATCH_DUP && code != MATCH_OP_DUP)
for (j = XVECLEN (pattern, i) - 1; j >= 0; --j)
--- gcc/gcc/gimplify.c
+++ gcc/gcc/gimplify.c
@@ -10320,6 +10324,8 @@ goa_stabilize_expr (tree *expr_p, gimple_seq *pre_p, 
tree lhs_addr,
 case tcc_comparison:
   saw_lhs |= goa_stabilize_expr (_OPERAND (expr, 

Re: [v3 PATCH] Implement std::string_view and P0254r2, Integrating std::string_view and std::string.

2016-07-27 Thread Jonathan Wakely

On 25/07/16 00:41 +0300, Ville Voutilainen wrote:

Changelog as it was before, tested on Linux-x64.

I haven't fixed all section references of string tests, and I haven't
added section references
to string_view tests, because they didn't have any. I also haven't
added tests that test the various
exception-throwing cases for the various operations. Doing those changes is
a) going to change many many files
b) probably not urgent
so here's a modest suggestion:
let's put this patch in, and tweak the section references in the other
tests later with separate patches,
as well as add more comprehensive exception-tests.

Ok for trunk?


+#if __cplusplus > 201402L
+  /**
+   *  @brief  Append a string_view.
+   *  @param __sv  The string_view to be appended.
+   *  @return  Reference to this string.
+   */
+  basic_string& append(__sv_type __sv)
+  {
+   return this->append(__sv.data(), __sv.size());
+  }

One-liners like this don't need braces on a separate line:

 { return this->append(__sv.data(), __sv.size()); }

But they do need a blank line after them before the next function:

+  /**
+   *  @brief  Append a range of characters from a string_view.
+   *  @param __sv  The string_view to be appended from.
+   *  @param __pos The position in the string_view to append
from.
+   *  @param __n   The number of characters to append from the
string_view.
+   *  @return  Reference to this string.
+   */
+  basic_string& append(__sv_type __sv,
+  size_type __pos, size_type __n = npos)
+  {
+   return _M_append(__sv.data()
++ __sv._M_check(__pos,
"basic_string::append"),
+__sv._M_limit(__pos, __n));
+  }
+#endif // C++17
+


The Doxygen @file comment in  refers to
"experimental" still.

We need to close the versioned namespace before this:

+  namespace __detail
+  {

And reopen it afterwards. i.e.

_GLIBCXX_END_NAMESPACE_VERSION
 namespace __detail
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION

 ...

 _GLIBCXX_END_NAMESPACE_VERSION
 }
_GLIBCXX_BEGIN_NAMESPACE_VERSION

Otherwise for the versioned namespace config we would have
std::__7::__detail and also std::__detail::__7 and thigns would get
ambiguous.

I don't think we want the non-standard literals in namespace std:

+  // I added these EMSR.
+  inline namespace literals
+  {
+  inline namespace string_view_literals
+  {
+


I've recently discovered that the order of dejagnu directives matters,
and this is wrong:

+// { dg-do run { xfail *-*-* } }
+// { dg-options "-std=gnu++17 -O0" }
+// { dg-require-debug-mode "" }

It should be dg-options first, then dg-do, then any dg-require-xxx
last. It doesn't actually make any difference in any of the tests in
your patch though, so don't worry about changing it.

The case where it matters is where the dg-do directive has a target
which might be dependent on the dg-options:

// { dg-do compile { target c++1z } }
// { dg-options "-std=gnu++1z" }

This wouldn't get run, because the effective target is tested before
-std=gnu++1z is added to the options.

And this would always run, ignoring the dg-require:

// { dg-require-effective-target c++1z }
// { dg-do compile }

But as I said, don't worry about changing them.

OK for trunk with the changes to formatting, namespaces and literals.

Thanks for doing this!



Implement -Wimplicit-fallthrough (take 2): fix missing breaks

2016-07-27 Thread Marek Polacek
This is what the new warning pointed out.  I think all these are bugs.

This patch has been tested on powerpc64le-unknown-linux-gnu, aarch64-linux-gnu,
and x86_64-redhat-linux.

2016-07-27  Marek Polacek  

PR c/7652
gcc/
* config/i386/i386.c (ix86_expand_args_builtin): Add break.
(ix86_expand_round_builtin): Likewise.
* config/rs6000/rs6000.c (altivec_expand_ld_builtin): Add break.
(altivec_expand_st_builtin): Likewise.
* gengtype.c (dbgprint_count_type_at): Add break.
gcc/java/
* jcf-dump.c (print_constant): Add break.
libgo/
* runtime/heapdump.c (dumpefacetypes): Add break.

--- gcc/gcc/config/i386/i386.c
+++ gcc/gcc/config/i386/i386.c
@@ -40171,6 +40187,7 @@ ix86_expand_args_builtin (const struct 
builtin_description *d,
 case 5:
   pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
 args[2].op, args[3].op, args[4].op);
+  break;
 case 6:
   pat = GEN_FCN (icode) (real_target, args[0].op, args[1].op,
 args[2].op, args[3].op, args[4].op,
@@ -40545,6 +40562,7 @@ ix86_expand_round_builtin (const struct 
builtin_description *d,
 case 5:
   pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
 args[2].op, args[3].op, args[4].op);
+  break;
 case 6:
   pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
 args[2].op, args[3].op, args[4].op,
--- gcc/gcc/config/rs6000/rs6000.c
+++ gcc/gcc/config/rs6000/rs6000.c
@@ -14401,6 +14401,7 @@ altivec_expand_ld_builtin (tree exp, rtx target, bool 
*expandedp)
   break;
 case ALTIVEC_BUILTIN_LD_INTERNAL_2di:
   icode = CODE_FOR_vector_altivec_load_v2di;
+  break;
 case ALTIVEC_BUILTIN_LD_INTERNAL_1ti:
   icode = CODE_FOR_vector_altivec_load_v1ti;
   break;
@@ -14462,6 +14463,7 @@ altivec_expand_st_builtin (tree exp, rtx target 
ATTRIBUTE_UNUSED,
   break;
 case ALTIVEC_BUILTIN_ST_INTERNAL_2di:
   icode = CODE_FOR_vector_altivec_store_v2di;
+  break;
 case ALTIVEC_BUILTIN_ST_INTERNAL_1ti:
   icode = CODE_FOR_vector_altivec_store_v1ti;
   break;
--- gcc/gcc/gengtype.c
+++ gcc/gcc/gengtype.c
@@ -175,6 +175,7 @@ dbgprint_count_type_at (const char *fil, int lin, const 
char *msg, type_p t)
{
case TYPE_UNDEFINED:
  nb_undefined++;
+ break;
case TYPE_SCALAR:
  nb_scalar++;
  break;
--- gcc/gcc/java/jcf-dump.c
+++ gcc/gcc/java/jcf-dump.c
@@ -926,6 +926,7 @@ print_constant (FILE *out, JCF *jcf, int index, int 
verbosity)
  if (verbosity > 0)
fprintf (out, "Fieldref: %ld=", (long) JPOOL_USHORT2 (jcf, index));
  print_constant (out, jcf, JPOOL_USHORT2 (jcf, index), 0);
+ break;
case 5:
case 6:
case 7:
--- gcc/libgo/runtime/heapdump.c
+++ gcc/libgo/runtime/heapdump.c
@@ -766,6 +766,7 @@ dumpefacetypes(void *obj __attribute__ ((unused)), uintptr 
size, const Type *typ
for(i = 0; i <= size - type->__size; i += type->__size)
//playgcprog(i, (uintptr*)type->gc + 1, 
dumpeface_callback, obj);
break;
+   break;
case TypeInfo_Chan:
if(type->__size == 0) // channels may have zero-sized objects 
in them
break;

Marek


Re: Implement -Wimplicit-fallthrough: the rest

2016-07-27 Thread Marek Polacek
On Fri, Jul 22, 2016 at 01:15:41PM +0200, Jakub Jelinek wrote:
> > @@ -32352,6 +32360,7 @@ rs6000_handle_altivec_attribute (tree *node,
> >switch (mode)
> > {
> > case V8HImode: result = pixel_V8HI_type_node;
> > + gcc_fallthrough ();
> > default: break;
> > }
> >  default: break;
> 
> I thought we don't warn on these anymore?
 
It warned, but not anymore.

> Also, as others said, I think it would be best to split this patch into:
> 1) bugfixes part (where you've added break; that was missing and it changes
>(fixes) behavior)
> 2) questionable cases (your XX), with request for the corresponding
>maintainers to decide
> 3) the actual addition of the attribute/comments or tweaking their wording,
>so that for intentional fallthrus we don't warn

Ok, I'll do so momentarily.

Marek


[arm-embedded] Backport of resolution handling fix WRT incremental linking

2016-07-27 Thread Thomas Preudhomme
The following backport has been made from gcc-6-branch to ARM/embedded-5-
branch in order to fix an ICE in LTO observed when running g++.dg/lto/20081219 
testcase:

2016-07-27  Thomas Preud'homme  

Backport from mainline
2015-11-29  Jan Hubicka  

* cgraph.c (cgraph_node::make_local): No name is unique during
incremental linking.
* cgraph.h (can_be_discarded_p): Update comment; also common and
WEAK in named sections can be discarded; when doing incremental
link do not rely on resolution being the final one.
* varasm.c (default_binds_local_p_3, decl_binds_to_current_def_p):
When symbol can be discarded, do not rely on resolution info.
* symtab.c (symtab_node::nonzero_address): Take into account that
symbol can be discarded.
* ipa-visibility.c (update_visibility_by_resolution_info): Handle
definition correctly.
(function_and_variable_visibility): Do not set unique_name when
incrementally linking.


Best regards,

Thomas


Re: Implement -Wimplicit-fallthrough: core (take 2)

2016-07-27 Thread Marek Polacek
On Fri, Jul 22, 2016 at 01:06:41PM +0200, Jakub Jelinek wrote:
> On Fri, Jul 22, 2016 at 12:44:07PM +0200, Marek Polacek wrote:
> > --- gcc/gcc/cp/parser.h
> > +++ gcc/gcc/cp/parser.h
> > @@ -46,7 +46,7 @@ struct GTY (()) cp_token {
> >   Otherwise, this value is RID_MAX.  */
> >ENUM_BITFIELD (rid) keyword : 8;
> >/* Token flags.  */
> > -  unsigned char flags;
> > +  unsigned short flags;
> >/* True if this token is from a context where it is implicitly extern 
> > "C" */
> >BOOL_BITFIELD implicit_extern_c : 1;
> >/* True if an error has already been reported for this token, such as a
> 
> I'm afraid this is really bad.
> Right now, there are 8 and 8 bit bitfields, then 8-bit char, 3 individual
> bits, 5 unused bits and 32-bit int, nicely packed into 64-bit word before a
> union with pointer members, and the C++ FE lexes everything first, so there
> are possibly millions of tokens in memory.
> Can't you just make it unsigned int flags : 11; instead?  Or instead
> reshuffle the cpplib.h flags?  E.g. I don't see the C++ FE to use the
> NO_EXPAND flag, so moving it to the upper byte of the short and moving the
> new flag to its bit?  Perhaps that is even better for now.

Ouch, sorry, that is embarassing.  I had been meaning to get back to this, but
apparently I never did.  This version should be better.  Another change is that
we don't warn for { something; default: return; } even in nested switches.
Another change is that I made the FEs less permissive wrt invalid code.

CCing Jason and Joseph to see what they think of the FEs part.

2016-07-27  Marek Polacek  
Jakub Jelinek  

PR c/7652
gcc/
* common.opt (Wimplicit-fallthrough): New option.
* doc/extend.texi: Document statement attributes and the fallthrough
attribute.
* doc/invoke.texi: Document -Wimplicit-fallthrough.
* gimple.h (gimple_call_internal_p): New function.
* gimplify.c (struct gimplify_ctx): Add in_switch_expr.
(struct label_entry): New struct.
(find_label_entry): New function.
(last_stmt_in_scope): New function.
(warn_implicit_fallthrough_r): New function.
(maybe_warn_implicit_fallthrough): New function.
(expand_FALLTHROUGH_r): New function.
(expand_FALLTHROUGH): New function.
(gimplify_switch_expr): Call maybe_warn_implicit_fallthrough and
expand_FALLTHROUGH for the innermost GIMPLE_SWITCH.
(gimplify_label_expr): New function.
(gimplify_case_label_expr): Set location.
(gimplify_expr): Call gimplify_label_expr.
* internal-fn.c (expand_FALLTHROUGH): New function.
* internal-fn.def (FALLTHROUGH): New internal function.
* system.h (gcc_fallthrough): Define.
* tree-core.h: Add FALLTHROUGH_LABEL_P comment.
* tree.h (FALLTHROUGH_LABEL_P): Define.
gcc/c-family/
* c-common.c (c_common_attribute_table): Add fallthrough attribute.
(handle_fallthrough_attribute): New function.
gcc/c/
* c-parser.c (struct c_token): Add flags field.
(c_lex_one_token): Pass it to c_lex_with_flags.
(c_parser_declaration_or_fndef): Turn __attribute__((fallthrough));
into IFN_FALLTHROUGH.
(c_parser_label): Set FALLTHROUGH_LABEL_P on labels.
(c_parser_statement_after_labels): Handle RID_ATTRIBUTE.
gcc/cp/
* constexpr.c (cxx_eval_internal_function): Handle IFN_FALLTHROUGH.
(potential_constant_expression_1): Likewise.
* parser.c (cp_parser_primary_expression): Handle RID_ATTRIBUTE.
(cp_parser_statement): Handle fallthrough attribute.
(cp_parser_label_for_labeled_statement): Set FALLTHROUGH_LABEL_P on
labels.
(cp_parser_std_attribute): Handle fallthrough attribute.
(cp_parser_check_std_attribute): Detect duplicated fallthrough
attribute.
gcc/testsuite/
* c-c++-common/Wimplicit-fallthrough-1.c: New test.
* c-c++-common/Wimplicit-fallthrough-10.c: New test.
* c-c++-common/Wimplicit-fallthrough-11.c: New test.
* c-c++-common/Wimplicit-fallthrough-12.c: New test.
* c-c++-common/Wimplicit-fallthrough-13.c: New test.
* c-c++-common/Wimplicit-fallthrough-14.c: New test.
* c-c++-common/Wimplicit-fallthrough-15.c: New test.
* c-c++-common/Wimplicit-fallthrough-16.c: New test.
* c-c++-common/Wimplicit-fallthrough-17.c: New test.
* c-c++-common/Wimplicit-fallthrough-2.c: New test.
* c-c++-common/Wimplicit-fallthrough-3.c: New test.
* c-c++-common/Wimplicit-fallthrough-4.c: New test.
* c-c++-common/Wimplicit-fallthrough-5.c: New test.
* c-c++-common/Wimplicit-fallthrough-6.c: New test.
* c-c++-common/Wimplicit-fallthrough-7.c: New test.
* c-c++-common/Wimplicit-fallthrough-8.c: New test.
* c-c++-common/Wimplicit-fallthrough-9.c: New test.
* 

[PING] [PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test

2016-07-27 Thread Thomas Schwinge
Hi!

Ping.

On Wed, 20 Jul 2016 13:52:20 +0200, I wrote:
> Ping.
> 
> On Wed, 13 Jul 2016 12:37:07 +0200, I wrote:
> > As discussed before, "offloading compilation is slow; I suppose because
> > of having to invoke several tools (LTO streaming -> mkoffload -> offload
> > compilers, assemblers, linkers -> combine the resulting images; but I
> > have not done a detailed analysis on that)".  For this reason it is
> > beneficial (that is, it is measurable in libgomp testing wall time) to
> > limit offload compilation to the one (in the OpenACC case) offload target
> > that we're actually going to test (that is, execute).  Another reason is
> > that -foffload=-fdump-tree-[...] produces clashes (that is,
> > unpredicatable outcome) in the file names of offload compilations' dump
> > files' names.  Here is a patch to implement that, to specify
> > -foffload=[...] during libgomp OpenACC testing.  As that has been
> > challenged before:
> > 
> > | [...] there actually is a difference between offload_plugins and
> > | offload_targets (for example, "intelmic"
> > | vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables --
> > | to avoid having to translate the more specific
> > | "x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
> > | into the less specific "intelmic" (for plugin loading) in
> > | libgomp/target.c.  I can do that, so that we can continue to use just a
> > | single offload_targets variable, but I consider that a less elegant
> > | solution.
> > 
> > OK for trunk?
> > 
> > commit 5fdb515826769ebb36bc5c49a3ffac4d17a8a589
> > Author: Thomas Schwinge 
> > Date:   Wed Jul 13 11:37:16 2016 +0200
> > 
> > libgomp: In OpenACC testing, cycle though $offload_targets, and by 
> > default only build for the offload target that we're actually going to test
> > 
> > libgomp/
> > * plugin/configfrag.ac: Enumerate both offload plugins and 
> > offload
> > targets.
> > (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
> > * target.c (gomp_target_init): Adjust to that.
> > * testsuite/lib/libgomp.exp: Likewise.
> > (offload_targets_s, offload_targets_s_openacc): Remove 
> > variables.
> > (offload_target_to_openacc_device_type): New proc.
> > (check_effective_target_openacc_nvidia_accel_selected)
> > (check_effective_target_openacc_host_selected): Examine
> > $openacc_device_type instead of $offload_target_openacc.
> > * Makefile.in: Regenerate.
> > * config.h.in: Likewise.
> > * configure: Likewise.
> > * testsuite/Makefile.in: Likewise.
> > * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
> > $offload_targets (plus "disable") instead of
> > $offload_targets_s_openacc, and add "-foffload=$offload_target" 
> > to
> > tagopt.
> > * testsuite/libgomp.oacc-c/c.exp: Likewise.
> > * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> > ---
> >  libgomp/Makefile.in|  1 +
> >  libgomp/config.h.in|  4 +-
> >  libgomp/configure  | 44 +++--
> >  libgomp/plugin/configfrag.ac   | 39 +++-
> >  libgomp/target.c   |  8 +--
> >  libgomp/testsuite/Makefile.in  |  1 +
> >  libgomp/testsuite/lib/libgomp.exp  | 72 
> > ++
> >  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
> >  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
> >  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
> >  10 files changed, 142 insertions(+), 109 deletions(-)
> > 
> > diff --git libgomp/Makefile.in libgomp/Makefile.in
> > index 88c8517..33be8c7 100644
> > --- libgomp/Makefile.in
> > +++ libgomp/Makefile.in
> > @@ -380,6 +380,7 @@ mkdir_p = @mkdir_p@
> >  multi_basedir = @multi_basedir@
> >  offload_additional_lib_paths = @offload_additional_lib_paths@
> >  offload_additional_options = @offload_additional_options@
> > +offload_plugins = @offload_plugins@
> >  offload_targets = @offload_targets@
> >  oldincludedir = @oldincludedir@
> >  pdfdir = @pdfdir@
> > diff --git libgomp/config.h.in libgomp/config.h.in
> > index 226ac53..28f7b2d 100644
> > --- libgomp/config.h.in
> > +++ libgomp/config.h.in
> > @@ -98,8 +98,8 @@
> > */
> >  #undef LT_OBJDIR
> >  
> > -/* Define to offload targets, separated by commas. */
> > -#undef OFFLOAD_TARGETS
> > +/* Define to offload plugins, separated by commas. */
> > +#undef OFFLOAD_PLUGINS
> >  
> >  /* Name of package */
> >  #undef PACKAGE
> > diff --git libgomp/configure libgomp/configure
> > index 8d03eb6..4baab20 100755
> > --- libgomp/configure
> > +++ libgomp/configure
> > @@ -633,6 +633,8 @@ PLUGIN_NVPTX_FALSE
> >  PLUGIN_NVPTX_TRUE

Use correct location information for OpenACC shape and simple clauses in C/C++

2016-07-27 Thread Thomas Schwinge
Hi!

I found that for a lot of OpenACC (and potentially also OpenMP) clauses
(in C/C++ front ends; didn't look at Fortran), we use wrong location
information.  The problem is that
c_parser_oacc_all_clauses/c_parser_omp_all_clauses calls
cp_parser_omp_clause_name to determine the pragma_omp_clause c_kind, and
that function (as documented) consumes the clause token before returning.
So, when we then do "c_parser_peek_token (parser)->location" or similar
in some clause parsing function, that will return the location
information of the token _after_ the clause token, which -- at least very
often -- is not desirable, in particular if that location information is
used then in a build_omp_clause call, which should point to the clause
token itself, and not whatever follows after that.

Probably that all went unnoticed for so long, because the GCC testsuite
largely is running with -fno-diagnostics-show-caret, so we don't visually
see the wrong location information (and nobody pays attention to the
colum information as given, for example, as line 2, column 32 in
"[...]/c-c++-common/goacc/routine-2.c:2:32: error: [...]".

There seems to be a lot of inconsistency in that in all the clause
parsing; here is just a first patch to fix the immediate problem I've
been observing.  OK for trunk already, or need to clean this all up in
one go?  Do we need this on release branches, as a "quality of
implementation" fix (wrong diagnostic locations)?

commit bac4c04ca1d52c56a3583f5958e116c62b889d5a
Author: Thomas Schwinge 
Date:   Wed Jul 27 16:55:56 2016 +0200

Use correct location information for OpenACC shape and simple clauses in 
C/C++

gcc/c/
* c-parser.c (c_parser_oacc_shape_clause)
(c_parser_oacc_simple_clause): Add loc formal parameter.  Adjust
all users.
gcc/cp/
* parser.c (cp_parser_oacc_shape_clause): Add loc formal
parameter.  Adjust all users.
---
 gcc/c/c-parser.c | 25 +
 gcc/cp/parser.c  | 12 +++-
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 0031481..82ac855 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -11623,12 +11623,12 @@ c_parser_omp_clause_num_workers (c_parser *parser, 
tree list)
 */
 
 static tree
-c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+c_parser_oacc_shape_clause (c_parser *parser, location_t loc,
+   omp_clause_code kind,
const char *str, tree list)
 {
   const char *id = "num";
   tree ops[2] = { NULL_TREE, NULL_TREE }, c;
-  location_t loc = c_parser_peek_token (parser)->location;
 
   if (kind == OMP_CLAUSE_VECTOR)
 id = "length";
@@ -11758,12 +11758,12 @@ c_parser_oacc_shape_clause (c_parser *parser, 
omp_clause_code kind,
seq */
 
 static tree
-c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
-tree list)
+c_parser_oacc_simple_clause (c_parser * /* parser */, location_t loc,
+enum omp_clause_code code, tree list)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
 
-  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+  tree c = build_omp_clause (loc, code);
   OMP_CLAUSE_CHAIN (c) = list;
 
   return c;
@@ -13089,7 +13089,7 @@ c_parser_oacc_all_clauses (c_parser *parser, 
omp_clause_mask mask,
  c_name = "async";
  break;
case PRAGMA_OACC_CLAUSE_AUTO:
- clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_AUTO,
+ clauses = c_parser_oacc_simple_clause (parser, here, OMP_CLAUSE_AUTO,
clauses);
  c_name = "auto";
  break;
@@ -13139,7 +13139,7 @@ c_parser_oacc_all_clauses (c_parser *parser, 
omp_clause_mask mask,
  break;
case PRAGMA_OACC_CLAUSE_GANG:
  c_name = "gang";
- clauses = c_parser_oacc_shape_clause (parser, OMP_CLAUSE_GANG,
+ clauses = c_parser_oacc_shape_clause (parser, here, OMP_CLAUSE_GANG,
c_name, clauses);
  break;
case PRAGMA_OACC_CLAUSE_HOST:
@@ -13151,8 +13151,9 @@ c_parser_oacc_all_clauses (c_parser *parser, 
omp_clause_mask mask,
  c_name = "if";
  break;
case PRAGMA_OACC_CLAUSE_INDEPENDENT:
- clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_INDEPENDENT,
-   clauses);
+ clauses = c_parser_oacc_simple_clause (parser, here,
+OMP_CLAUSE_INDEPENDENT,
+clauses);
  c_name = "independent";
  break;
case PRAGMA_OACC_CLAUSE_LINK:
@@ -13200,7 +13201,7 @@ c_parser_oacc_all_clauses (c_parser *parser, 
omp_clause_mask mask,
  c_name = "self";
  break;
case 

[PATCH, 1 of 4 or 5], Enhance PowerPC vec_extract support for power8/power9 machines

2016-07-27 Thread Michael Meissner
These patches enhance the vec_extract built-in on modern PowerPC server
systems.  Currently, vec_extract is optimized for constant element numbers for
vector double/vector long on any VSX system, and constant element numbers for
vector char/vector short/vector int on ISA 3.0 (power9) systems.

If the vec_extract is not handled, the compiler currently stores the vector
into memory, and then indexes the element via normal memory addressing.  This
creates a store-load hit.

This patch and the successive patches will enable better code generation of
vec_extract on 64-bit systems with direct move (power8) and above.

This particular patch changes the infrastructure so that in the next patch, I
can add support for extracting a variable element of vector double or vector
long.  This particular patch is just infrastructure, and does not change the
code generation.

In addition, I discovered a previous change for ISA 3.0 extraction spelled an
instruction wrong, and it is fixed here.  It turns out that I messed up the
constraints, so the register allocator would never generate this instruction.
This patch just uses the correct name, but it won't be until the next patch
that the constraints will be fixed so it can be generated.

I have tested this patch and there are no regressions.  Can I apply this to the
trunk?  These sets of patches depend on the DImode in Altivec registers patches
that have not been back ported to GCC 6.2, so it is for trunk only.

The next patch will enhance vec_extract to to allow the element number to be
variable for vec_extract of vector double/vector long on 64-bit systems with
direct move using the VSLO instruction.

The third patch will enhance vec_extract to better optimize extracting elements
if the vector is in memory.  Right now, the code only optimizes extracting
element 0.  The new patch will allow both constant and variable element
numbers.

The fourth patch will enhance vec_extract for the other vector types.  I might
split it up into two patches, one for vector float, and the other for vector
char, vector short, and vector long.

I built spec 2006 with all of the patches, and there were some benchmarks that
generated a few changes, and a few benchmarks that generated a lot (gamess had
over 500 places that were optimized).  I ran a comparison between the old
compiler and one with the patches installed on several of the benchmarks that
showed the most changes.  I did not see any performance changes on the
benchmarks that I ran.  I believe this is because vec_extract is typically
generated at the end of vector reductions, and it does not account for much
time in the whole benchmark.  User written code that uses vec_extract would
hopefully see speed improvements.

2016-07-27  Michael Meissner  

* config/rs6000/vector.md (vec_extract): Change the calling
signature of rs6000_expand_vector_extract so that the element
number is a RTX instead of a constant integer.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_extract):
Likewise.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Likewise.
(altivec_expand_vec_ext_builtin): Likewise.
* config/rs6000/altivec.md (reduc_plus_scal_): Likewise.
* config/rs6000/vsx.md (vsx_extract_): Fix spelling of the
MFVSRLD instruction.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 238772)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -858,8 +858,7 @@ (define_expand "vec_extract"
(match_operand 2 "const_int_operand" "")]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
 {
-  rs6000_expand_vector_extract (operands[0], operands[1],
-   INTVAL (operands[2]));
+  rs6000_expand_vector_extract (operands[0], operands[1], operands[2]);
   DONE;
 })
 
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 238772)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -61,7 +61,7 @@ extern void convert_int_to_float128 (rtx
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
-extern void rs6000_expand_vector_extract (rtx, rtx, int);
+extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 238772)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6911,35 

Re: [PATCH 1/3] (v2) On-demand locations within string-literals

2016-07-27 Thread David Malcolm
On Tue, 2016-07-26 at 19:05 +0100, Manuel López-Ibáñez wrote:
> On 26/07/16 18:11, David Malcolm wrote:
> 
> > gcc/ChangeLog:
> > * gcc.c (cpp_options): Rename string to...
> > (cpp_options_): ...this, to avoid clashing with struct in
> > cpplib.h.
> 
> It seems to me that you need this because  now gcc.c includes
> cpplib.h via 
> input.h, which seems wrong.
> 
> input.h was FE-independent (it depends on line-map.h but it is an
> accident of 
> history that line-map.h is in libcpp since it doesn't depend on
> anything from 
> libcpp [*]). Note that input.h is included in coretypes.h, so this
> means that 
> now cpplib.h is included almost everywhere! [**]
> 
> There is the following in coretypes.h:
> 
> /* Provide forward struct declaration so that we don't have to
> include
> all of cpplib.h whenever a random prototype includes a pointer.
> Note that the cpp_reader and cpp_token typedefs remain part of
> cpplib.h.  */
> 
> struct cpp_reader;
> struct cpp_token;
> 
> precisely to avoid including cpplib.h.
> 
> 
> If I understand correctly, cpplib.h is needed in input.h because of
> this 
> declaration:
> 
> +extern const char *get_source_range_for_substring (cpp_reader
> *pfile,
> +string_concat_db
> *concats,
> +location_t
> strloc,
> +enum cpp_ttype
> type,
> +int start_idx,
> int end_idx,
> +source_range
> *out_range);
> 
> 
> Does this really need to be in input.h ?  It seems something that
> only C-family 
> languages will be able to use. Note that you need a reader to use
> this 
> function, and for that, you need to already include cpplib.h.

Fair point; the attached modification to patch 1 compiles cleanly, and
moves it to a new header.

> Perhaps it could live for now in c-format.c, since it is the only
> place using it?

Martin Sebor [CC-ed] wants to use it from the middle-end:
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
so it's unclear to me that c-format.c would be a better location.

There are various places it could live; but getting it working took a
lot of effort to achieve - the currently proposed mixture of libcpp,
input.c and c-format.c for the locations of the various pieces works
(for example, auto_vec isn't available in libcpp).

Given that both Martin and I have candidate patches that are touching
the same area, I'd prefer to focus on getting this code in to trunk,
rather than rewrite it out-of-tree, so that we can at least have the
improvement to location-handling for Wformat.  Once the code is in the
tree, it should be easier to figure out how to access it from the
middle-end.

> Cheers,
> 
>   Manuel.
> 
> [*] In an ideal world, we would have a language-agnostic diagnostics
> library 
> that would include line-map and that would be used by libcpp and the
> rest of 
> GCC, so that we can remove all the error-routines in libcpp and the
> awkward 
> glue code that ties it into diagnostics.c.,

Agreed, though that may have to wait until gcc 8 at this point.
(Given that the proposed diagnostics library would use line maps, and
would be used by libcpp, would it make sense to move the diagnostics
into libcpp itself?  Diagnostics would seem to be intimately related to
location-tracking)

> [**] And it seems that we are slowly undoing all the work that was
> done by 
> Andrew MacLeod to clean up the .h web and remove dependencies 
> (https://gcc.gnu.org/wiki/rearch).
> 
> From 09824cb27c0e817b29de1c7eb9b53c603116f13e Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Wed, 27 Jul 2016 10:33:52 -0400
Subject: [PATCH] Avoid including cpplib.h from input.h

gcc/c-family/ChangeLog:
	* c-common.c: Include "substring-locations.h".

gcc/ChangeLog:
	* input.h: Don't include cpplib.h.
	(get_source_range_for_substring): Move to...
	* substring-locations.h: New header.
---
 gcc/c-family/c-common.c   |  1 +
 gcc/input.h   |  8 
 gcc/substring-locations.h | 30 ++
 3 files changed, 31 insertions(+), 8 deletions(-)
 create mode 100644 gcc/substring-locations.h

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f4ffc0e..c4843db 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "substring-locations.h"
 
 cpp_reader *parse_in;		/* Declared in c-pragma.h.  */
 
diff --git a/gcc/input.h b/gcc/input.h
index 24d9115..c17e440 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -22,7 +22,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_INPUT_H
 
 #include "line-map.h"
-#include 
 
 extern GTY(()) struct line_maps *line_table;
 
@@ -131,11 +130,4 @@ class GTY(()) 

C++ PATCH for c++/71747 (ICE with self-referential partial specialization)

2016-07-27 Thread Jason Merrill
In this testcase, trying to match the partial specialization led to 
trying to match the partial specialization again, without hitting any 
instantiation that would have called push_tinst_level.  So we should 
call it during this substitution, too.  This requires passing the 
partial specialization template into get_partial_spec_bindings, and 
provides an opportunity to tidy the associated code a bit.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 334888fdae630a4593a5b3cf0c23d016b0c81e1b
Author: Jason Merrill 
Date:   Tue Jul 26 17:25:21 2016 -0400

	PR c++/71747 - ICE with self-referential partial spec

	* pt.c (get_partial_spec_bindings): Replace tparms and spec_args
	parameters with spec_tmpl.  Call push_tinst_level.
	(most_specialized_partial_spec): Adjust.
	(more_specialized_partial_spec): Adjust.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 97d5000..a23a05a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -140,7 +140,7 @@ static int unify (tree, tree, tree, tree, int, bool);
 static void add_pending_template (tree);
 static tree reopen_tinst_level (struct tinst_level *);
 static tree tsubst_initializer_list (tree, tree);
-static tree get_partial_spec_bindings (tree, tree, tree, tree);
+static tree get_partial_spec_bindings (tree, tree, tree);
 static tree coerce_template_parms (tree, tree, tree, tsubst_flags_t,
    bool, bool);
 static tree coerce_innermost_template_parms (tree, tree, tree, tsubst_flags_t,
@@ -20689,8 +20689,6 @@ more_specialized_partial_spec (tree tmpl, tree pat1, tree pat2)
 
   tree tmpl1 = TREE_VALUE (pat1);
   tree tmpl2 = TREE_VALUE (pat2);
-  tree parms1 = DECL_INNERMOST_TEMPLATE_PARMS (tmpl1);
-  tree parms2 = DECL_INNERMOST_TEMPLATE_PARMS (tmpl2);
   tree specargs1 = TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT (tmpl1)));
   tree specargs2 = TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT (tmpl2)));
 
@@ -20699,14 +20697,14 @@ more_specialized_partial_spec (tree tmpl, tree pat1, tree pat2)
  types in the arguments, and we need our dependency check functions
  to behave correctly.  */
   ++processing_template_decl;
-  targs = get_partial_spec_bindings (tmpl, parms1, specargs1, specargs2);
+  targs = get_partial_spec_bindings (tmpl, tmpl1, specargs2);
   if (targs)
 {
   --winner;
   any_deductions = true;
 }
 
-  targs = get_partial_spec_bindings (tmpl, parms2, specargs2, specargs1);
+  targs = get_partial_spec_bindings (tmpl, tmpl2, specargs1);
   if (targs)
 {
   ++winner;
@@ -20790,23 +20788,23 @@ get_bindings (tree fn, tree decl, tree explicit_args, bool check_rettype)
 }
 
 /* Return the innermost template arguments that, when applied to a partial
-   specialization of TMPL whose innermost template parameters are
-   TPARMS, and whose specialization arguments are SPEC_ARGS, yield the
-   ARGS.
+   specialization SPEC_TMPL of TMPL, yield the ARGS.
 
For example, suppose we have:
 
  template  struct S {};
  template  struct S {};
 
-   Then, suppose we want to get `S'.  The TPARMS will be
-   {T}, the SPEC_ARGS will be {T*, int} and the ARGS will be {double*,
-   int}.  The resulting vector will be {double}, indicating that `T'
-   is bound to `double'.  */
+   Then, suppose we want to get `S'.  SPEC_TMPL will be the
+   partial specialization and the ARGS will be {double*, int}.  The resulting
+   vector will be {double}, indicating that `T' is bound to `double'.  */
 
 static tree
-get_partial_spec_bindings (tree tmpl, tree tparms, tree spec_args, tree args)
+get_partial_spec_bindings (tree tmpl, tree spec_tmpl, tree args)
 {
+  tree tparms = DECL_INNERMOST_TEMPLATE_PARMS (spec_tmpl);
+  tree spec_args
+= TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT (spec_tmpl)));
   int i, ntparms = TREE_VEC_LENGTH (tparms);
   tree deduced_args;
   tree innermost_deduced_args;
@@ -20832,6 +20830,13 @@ get_partial_spec_bindings (tree tmpl, tree tparms, tree spec_args, tree args)
 if (! TREE_VEC_ELT (innermost_deduced_args, i))
   return NULL_TREE;
 
+  tree tinst = build_tree_list (spec_tmpl, deduced_args);
+  if (!push_tinst_level (tinst))
+{
+  excessive_deduction_depth = true;
+  return NULL_TREE;
+}
+
   /* Verify that nondeduced template arguments agree with the type
  obtained from argument deduction.
 
@@ -20848,6 +20853,9 @@ get_partial_spec_bindings (tree tmpl, tree tparms, tree spec_args, tree args)
   spec_args = coerce_template_parms (DECL_INNERMOST_TEMPLATE_PARMS (tmpl),
  spec_args, tmpl,
  tf_none, false, false);
+
+  pop_tinst_level ();
+
   if (spec_args == error_mark_node
   /* We only need to check the innermost arguments; the other
 	 arguments will always agree.  */
@@ -21057,44 +21065,21 @@ most_specialized_partial_spec (tree target, tsubst_flags_t complain)
 
   for (t = DECL_TEMPLATE_SPECIALIZATIONS (main_tmpl); t; t = TREE_CHAIN (t))
 {
-  tree 

Re: [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-07-27 Thread Ramana Radhakrishnan
On Mon, Jul 4, 2016 at 2:57 PM, Matthew Wahab
 wrote:
> On 17/05/16 15:34, Matthew Wahab wrote:
>> The ARMv8.2-A FP16 extension adds a number of instructions to support
>> data movement for FP16 values. This patch adds these instructions to the
>> backend, making them available to the compiler code generator.
>
> This updates the expected output for the test added by the patch since
> gcc now generates ldrh/strh for some indexed loads/stores which were
> previously done with vld1/vstr1.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> 2016-07-04  Matthew Wahab  
> Jiong Wang 
>
> * config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
> available when FP16 instructions are available.
> (output_move_vfp): Add support for 16-bit data moves.
> (arm_validize_comparison): Fix some white-space.  Support HFmode
> by conversion to SFmode.
> * config/arm/arm.md (truncdfhf2): Fix a comment.
> (extendhfdf2): Likewise.
> (cstorehf4): New.
> (movsicc): Fix some white-space.
> (movhfcc): New.
> (movsfcc): Fix some white-space.
> (*cmovhf): New.
> * config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
> instructions are available.
> (*thumb2_movhi_vfp): Likewise.
> (*arm_movhi_fp16): New.
> (*thumb2_movhi_fp16): New.
> (*movhf_vfp_fp16): New.
> (*movhf_vfp_neon): Disable when VFP FP16 instructions are
> available.
> (*movhf_vfp): Likewise.
> (extendhfsf2): Enable when VFP FP16 instructions are available.
> (truncsfhf2):  Enable when VFP FP16 instructions are available.
>
> testsuite/
> 2016-07-04  Matthew Wahab  
>
> * gcc.target/arm/armv8_2_fp16-move-1.c: New.
>

OK.

Thanks,
Ramana


Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.

2016-07-27 Thread Ramana Radhakrishnan
On Tue, May 17, 2016 at 3:31 PM, Matthew Wahab
 wrote:
> The ACLE specifies a number of intrinsics for manipulating vectors
> holding values in most of the integer and floating point type. These
> include 16-bit integer types but not 16-bit floating point even though
> the same instruction is used for both.
>
> A future version of the ACLE extends the data processing intrinscs to
> the 16-bit floating point types, making the intrinsics available
> under the same conditions as the ARM __fp16 type.
>
> This patch adds the new intrinsics:
>  vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
>  vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
>  vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
>  vzip_f16, vzipq_f16.
>
> This patch also updates the advsimd-intrinsics testsuite to test the f16
> variants for ARM targets. These intrinsics are only implemented in the
> ARM target so the tests are disabled for AArch64 using an extra
> condition on a new convenience macro FP16_SUPPORTED. This patch also
> disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
> it is no longer needed.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
> advsimd-intrinsics testsuite using an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  
>
> * config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
> V4HF modes.
> (arm_evpc_neon_vzip): Likewise.
> (arm_evpc_neon_vrev): Likewise.
> (arm_evpc_neon_vtrn): Likewise.
> (arm_evpc_neon_vext): Likewise.
> * config/arm/arm_neon.h (vbsl_f16): New.
> (vbslq_f16): New.
> (vdup_n_f16): New.
> (vdupq_n_f16): New.
> (vdup_lane_f16): New.
> (vdupq_lane_f16): New.
> (vext_f16): New.
> (vextq_f16): New.
> (vmov_n_f16): New.
> (vmovq_n_f16): New.
> (vrev64_f16): New.
> (vrev64q_f16): New.
> (vtrn_f16): New.
> (vtrnq_f16): New.
> (vuzp_f16): New.
> (vuzpq_f16): New.
> (vzip_f16): New.
> (vzipq_f16): New.
> * config/arm/arm_neon_buillins.def (vdup_n): New (v8hf, v4hf
> variants).
> (vdup_lane): New (v8hf, v4hf variants).
> (vext): New (v8hf, v4hf variants).
> (vbsl): New (v8hf, v4hf variants).
> * config/arm/iterators.md (VDQWH): New.
> (VH): New.
> (V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
> (Scalar_mul_8_16): Fix white-space.
> (Is_d_reg): Add V4HF and V8HF.
> * config/arm/neon.md (neon_vdup_lane_internal): New.
> (neon_vdup_lane): New.
> (neon_vtrn_internal): Replace VDQW with VDQWH.
> (*neon_vtrn_insn): Likewise.
> (neon_vzip_internal): Likewise. Also fix white-space.
> (*neon_vzip_insn): Likewise
> (neon_vuzp_internal): Likewise.
> (*neon_vuzp_insn): Likewise
> * config/arm/vec-common.md (vec_perm_const): New.
>
> testsuite/
> 2016-05-17  Matthew Wahab  
>
> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
> (FP16_SUPPORTED): New
> (vdup_n_f16): Disable for non-AArch64 targets.
> * gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
> conditional on FP16_SUPPORTED.
> * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
> for testing __fp16.
> * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
> conditional on FP16_SUPPORTED.
> * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.
>

OK.


Ramana


Re: [PATCH 5/17][ARM] Enable HI mode moves for floating point values.

2016-07-27 Thread Ramana Radhakrishnan
On Tue, May 17, 2016 at 3:29 PM, Matthew Wahab
 wrote:
> The handling of 16-bit integer data-movement in the ARM backend doesn't
> make full use of the VFP instructions when they are available, even when
> the values are for use in VFP operations.
>
> This patch adds support for using the VFP instructions and registers
> when moving 16-bit integer and floating point data between registers and
> between registers and memory.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
> with native bootstrap and make check and for arm-none-eabi with
> check-gcc on an ARMv8.2-A emulator.
>
> Ok for trunk?


OK. ( the test function where this will make a difference is testhisf
for the record) ...

Ramana
> Matthew
>
> 2016-05-17  Jiong Wang  
> Matthew Wahab  
>
> * config/arm/arm.c (output_move_vfp): Weaken assert to allow
> HImode.
> (arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
> * config/arm/arm.md (*movhi_insn_arch4) Disable when VFP registers
> are
> available.
> (*movhi_bytes): Likewise.
> * config/arm/vfp.md (*arm_movhi_vfp): New.
> (*thumb2_movhi_vfp): New.
>
> testsuite/
> 2016-05-17  Matthew Wahab  
>
> * gcc.target/arm/short-vfp-1.c: New.
>


[PATCH] Do not allow make_compound_operation for vector mode, (PR70944)

2016-07-27 Thread Martin Liška
Hello.

Following patch rejects compound operation manipulation for vector mode.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 5f7ae66453a1f7a1a2c44414b22c742d69670177 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 27 Jul 2016 13:44:51 +0200
Subject: [PATCH] Do not allow make_compound_operation for vector mode
 (PR70944)

gcc/testsuite/ChangeLog:

2016-07-27  Martin Liska  

	* g++.dg/vect/pr70944.cc: New test.

gcc/ChangeLog:

2016-07-27  Martin Liska  

	PR rtl-optimization/70944
	* combine.c (make_compound_operation):
	Do not allow make_compound_operation for vector mode
---
 gcc/combine.c|  4 
 gcc/testsuite/g++.dg/vect/pr70944.cc | 13 +
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr70944.cc

diff --git a/gcc/combine.c b/gcc/combine.c
index 1becc3c..93c1710 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7770,6 +7770,10 @@ make_compound_operation (rtx x, enum rtx_code in_code)
   rtx tem;
   const char *fmt;
 
+  /* PR rtl-optimization/70944.  */
+  if (VECTOR_MODE_P (mode))
+return x;
+
   /* Select the code to be used in recursive calls.  Once we are inside an
  address, we stay there.  If we have a comparison, set to COMPARE,
  but once inside, go back to our default of SET.  */
diff --git a/gcc/testsuite/g++.dg/vect/pr70944.cc b/gcc/testsuite/g++.dg/vect/pr70944.cc
new file mode 100644
index 000..f8973aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr70944.cc
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-additional-options "-O3 -march=core-avx2" } */
+
+unsigned *a;
+void
+fn1 ()
+{
+  for (int i; i; ++i)
+{
+  unsigned g (a[i] << 8 >> 24);
+  a[i] = g;
+}
+}
-- 
2.9.2



Re: [PATCH 4/17][ARM] Define feature macros for FP16.

2016-07-27 Thread Ramana Radhakrishnan
On Tue, May 17, 2016 at 3:28 PM, Matthew Wahab
 wrote:
> The FP16 extension introduced with the ARMv8.2-A architecture adds
> instructions operating on FP16 values to the VFP and NEON instruction
> sets.
>
> The patch adds the feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
> which is defined to be 1 if the VFP FP16 instructions are available; it
> is otherwise undefined.
>
> The patch also adds the feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
> which is defined to be 1 if the NEON FP16 instructions are available; it
> is otherwise undefined.
>
> These two macros will appear in a future version of the ACLE.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  
>
> * config/arm/arm-c.c (arm_cpu_builtins): Define
> "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
> "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".
>
> testsuite/
> 2016-05-17  Matthew Wahab  
>
> * gcc.target/arm/attr-fp16-arith-1.c: New.
>

OK.

Ramana


Re: [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.

2016-07-27 Thread Ramana Radhakrishnan
On Mon, Jul 4, 2016 at 2:49 PM, Matthew Wahab
 wrote:
> On 17/05/16 15:26, Matthew Wahab wrote:
>> The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
>> instruction sets. This patch adds support to the testsuite to select
>> targets and set options for tests that make use of these
>> instructions. It also adds documentation for ARMv8.1-A selectors.
>
> This is a rebase of the patch to take account of changes in
> sourcebuild.texi.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> 2016-07-04  Matthew Wahab  
>
> * doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
> arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
> arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
> arm_v8_2a_fp16_neon_hw.
> (Add options): Add entries for arm_v8_1a_neon,
> arm_v8_2a_fp16_scalar,
> arm_v8_2a_fp16_neon.
> * lib/target-supports.exp
> (add_options_for_arm_v8_2a_fp16_scalar): New.
>
> (add_options_for_arm_v8_2a_fp16_neon): New.
> (check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
> (add_options_for_arm_arch_v8_2a): Auto-generate.
> (check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
> (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
> (check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
> (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
> (check_effective_target_arm_v8_2a_fp16_neon_ok): New.
> (check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
> (check_effective_target_arm_v8_2a_fp16_neon_hw): New.
>


OK.

Ramana


Re: [PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support.

2016-07-27 Thread Ramana Radhakrishnan
On Tue, May 17, 2016 at 3:24 PM, Matthew Wahab
 wrote:
> The ARMv8.2-A FP16 extension only supports the IEEE format for FP16
> data. It is not compatible with the option -mfp16-format=none nor with
> the option -mfp16-format=alternative (selecting the ARM alternative FP16
> format). Using either with the FP16 extension will trigger a compiler
> error.
>
> This patch adds the selector arm_fp16_alternative_ok to the testsuite's
> target-support code to allow tests to require support for the
> alternative format. It also adds selector arm_fp16_none_ok to check
> whether -mfp16-format=none is a valid option for the target.  The patch
> also updates existing tests to make use of the new selectors.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on an
> ARMv8.2-A emulator.
>
> Ok for trunk?
> Matthew
>
> 2016-05-17  Matthew Wahab  
>
> * doc/sourcebuild.texi (ARM-specific attributes): Add entries for
> arm_fp16_alternative_ok and arm_fp16_none_ok.
>
> testsuite/
> 2016-05-17  Matthew Wahab  
>
> * g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
> arm_fp16_alternative_ok.
> * g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
> * gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
> * gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
> * gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-1.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-10.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-11.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-12.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-2.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-3.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-4.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-5.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-6.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-7.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-8.c: Likewise.
> * gcc.target/arm/fp16-compile-alt-9.c: Likewise.
> * gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
> * gcc.target/arm/fp16-compile-none-2.c: Likewise.
> * gcc.target/arm/fp16-rounding-alt-1.c: Use
> arm_fp16_alternative_ok.
> * lib/target-supports.exp
> (check_effective_target_arm_fp16_alternative_ok_nocache): New.
> (check_effective_target_arm_fp16_alternative_ok): New.
> (check_effective_target_arm_fp16_none_ok_nocache): New.
> (check_effective_target_arm_fp16_none_ok): New.
>


OK.

Thanks,
ramana


Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases

2016-07-27 Thread Ramana Radhakrishnan
On Wed, May 25, 2016 at 1:49 PM, Prathamesh Kulkarni
 wrote:
> On 23 May 2016 at 14:28, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> This patch overrides expand_divmod_libfunc for ARM port and adds test-cases.
>> I separated the SImode tests into separate file from DImode tests
>> because certain arm configs (cortex-15) have hardware div insn for
>> SImode but not for DImode,
>> and for that config we want SImode tests to be disabled but not DImode tests.
>> The patch therefore has two target-effective checks: divmod and 
>> divmod_simode.
>> Cross-tested on arm*-*-*.
>> Bootstrap+test on arm-linux-gnueabihf in progress.
>> Does this patch look OK ?
> Hi,
> This version adds couple of more test-cases and fixes typo in
> divmod-3-simode.c, divmod-4-simode.c
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh

>From the patch (snipped out unnecessary parts)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 201aeb4..3bbf11b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c



+  gcc_assert (quotient);
+  gcc_assert (remainder);
+

There's a trailing white space here.

+  *quot_p = quotient;
+  *rem_p = remainder;
+}



+# For ARM configs defining __ARM_ARCH_EXT_IDIV__, disable
divmod_simode test-cases

Very unhelpful comment ...

For versions of the architecture where there exists a DIV instruction,
the divmod helper function is not used, disable the software divmod
optimization.


+
+proc check_effective_target_arm_divmod_simode { } {
+return [check_no_compiler_messages arm_divmod assembly {
+   #ifdef __ARM_ARCH_EXT_IDIV__
+   #error has div insn
+   #endif
+   int i;
+}]
+}
+
+proc check_effective_target_divmod { } {

Missing comment above.

+#TODO: Add checks for all targets that have either hardware divmod insn
+# or define libfunc for divmod.
+if { [istarget arm*-*-*]
+|| [istarget x86_64-*-*] } {
+   return 1
+}
+return 0
+}





The new helper functions need documentation in doc/sourcebuild.texi

Please repost with the doc changes, otherwise this is OK from my side.

Thanks,
Ramana


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-27 Thread Richard Biener
On Tue, Jul 26, 2016 at 5:49 PM, Yuri Rumyantsev  wrote:
> Hi Richard,
>
> It turned out that the patch proposed by you does not work properly
> for nested loop. If we slightly change pr70729.cc to
> (non-essential part is omitted
> void inline Ss::foo (float w)
> {
> #pragma omp simd
>   for (int i=0; i {
>   float w1 = C2[S_n + i] * w;
>   v1.v_i[i] += (int)w1;
>   C1[S_n + i] += w1;
> }
> }
> void Ss::boo (int n)
> {
>   for (int i = 0; i foo(ww[i]);
> }
> loop in foo won't be vectorized since REF_LOOP is outer loop which is
> not marked with omp simd pragma:
>  t1.cc:73:25: note: not vectorized: not suitable for scatter store *_31 = _33;
> t1.cc:73:25: note: bad data references.
>
> Note that the check I proposed before works fine.

The attached works for me (current trunk doesn't work because of caching
we do - I suppose we should try again to avoid the quadraticness in other
ways when ref_indep_loop_p is called from outermost_indep_loop).

Richard.

> 2016-07-20 12:35 GMT+03:00 Yuri Rumyantsev :
>> Richard,
>>
>> Jakub has already fixed it.
>> Sorry for troubles.
>> Yuri.
>>
>> 2016-07-19 18:29 GMT+03:00 Renlin Li :
>>> Hi Yuri,
>>>
>>> I saw this test case runs on arm platforms, and maybe other platforms as
>>> well.
>>>
>>> testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such
>>> file or directory
>>>
>>> Before the change here, it's gated by vect_simd_clones target selector,
>>> which limit it to i?86/x86_64 platform only.
>>>
>>> Regards,
>>> Renlin Li
>>>
>>>
>>>
>>>
>>> On 08/07/16 15:07, Yuri Rumyantsev wrote:

 Hi Richard,

 Thanks for your help - your patch looks much better.
 Here is new patch in which additional argument was added to determine
 source loop of reference.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?
 ChangeLog:
 2016-07-08  Yuri Rumyantsev  

 PR tree-optimization/71734
 * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
 contains REF, use it to check safelen, assume that safelen value
 must be greater 1, fix style.
 (ref_indep_loop_p_2): Add REF_LOOP argument.
 (ref_indep_loop_p): Pass LOOP as additional argument to
 ref_indep_loop_p_2.
 gcc/testsuite/ChangeLog:
  * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.

 2016-07-08 11:18 GMT+03:00 Richard Biener :
>
> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev 
> wrote:
>>
>> I checked simd3.f90 and found out that my additional check reject
>> independence of references
>>
>> REF is independent in loop#3
>> .istart0.19, .iend0.20
>> which are defined in loop#1 which is outer for loop#3.
>> Note that these references are defined by
>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
>> which is in loop#1.
>> It is clear that both these references can not be independent for
>> loop#3.
>
>
> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner
> loops
> of LOOP to catch memory references in those as well.  So the issue is
> really
> that we look at the wrong loop for safelen and we _do_ want to apply
> safelen
> to inner loops as well.
>
> So better track the loop we are ultimately asking the question for, like
> in the
> attached patch (fixes the testcase for me).
>
> Richard.
>
>
>
>> 2016-07-07 17:11 GMT+03:00 Richard Biener :
>>>
>>> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev 
>>> wrote:

 I Added this check because of new failures in libgomp.fortran suite.
 Here is copy of Jakub message:
 --- Comment #29 from Jakub Jelinek  ---
 The #c27 r237844 change looks bogus to me.
 First of all, IMNSHO you can argue this way only if ref is a reference
 seen in
 loop LOOP,
>>>
>>>
>>> or inner loops of LOOP I guess.  I _think_ we never call
>>> ref_indep_loop_p_1 with
>>> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would
>>> not make
>>> sense to do that, it would be a waste of time).
>>>
>>> So only if "or inner loops of LOOP" is not correct the check would be
>>> needed
>>> but then my issue with unrolling an inner loop and turning a ref that
>>> safelen
>>> does not apply to into a ref that it now applies to arises.
>>>
>>> I don't fully get what Jakub is hinting at.
>>>
>>> Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can
>>> you
>>> explain that bitmap check with a simple testcase?
>>>
>>> Thanks,
>>> Richard.
>>>
 

RFC: Make diagnostics for C++ reference binding more consistent

2016-07-27 Thread Jonathan Wakely

Consider:

template T declval();

int& r1 = declval();
int&& r2 = declval();
int& r3 = declval();


This produces three quite different errors:


refs.cc:3:25: error: invalid initialization of non-const reference of type 
'int&' from an rvalue of type 'int'
int& r1 = declval();
  ~~^~
refs.cc:4:25: error: cannot bind 'int' lvalue to 'int&&'
int&& r2 = declval();
   ~^~
refs.cc:5:30: error: binding 'const int' to reference of type 'int&' discards 
qualifiers
int& r3 = declval();
  ~~~^~


The first one uses the order to/from, but the other two are from/to.

The first and second are saying the same thing (wrong value category)
but very differently.

The first and third mention references but the second doesn't.

The standard talks about binding a reference to something, but the
second and third diagnostics talk about binding something to a
reference (and the first doesn't mention binding at all).

The first diagnostic is specific to lvalue references (it wouldn't be
invalid if it was binding a non-const _rvalue_ reference to an
rvalue), but doesn't use the word "lvalue".

(FWIW Clang and EDG both have their own inconsistencies for the same
example, but that shouldn't stop us trying to improve.)


IMHO the first should use the words "bind" and "lvalue reference":

refs.cc:3:25: error: cannot bind non-const lvalue reference of type 'int&' to 
an rvalue of type 'int'
int& r1 = declval();
  ~~^~

The second should use the phrasing "bind ref to value" instead of
"bind value to ref", and mention "rvalue reference":

refs.cc:4:25: error: cannot bind rvalue reference of type 'int&&' to an lvalue 
of type 'int'
int&& r2 = declval();
   ~^~

The third should use the same phrasing (but doesn't need to mention
lvalue/rvalue because the problem is related to cv-qualifiers not
value categories):

refs.cc:5:30: error: binding reference of type 'int&' to 'const int' discards 
qualifiers
int& r3 = declval();
  ~~~^~


I've only considered trivial examples here, is there some reason to
prefer to current diagnostics for more complex cases?

The attached patch makes that change, but there are probably lots of
tests that would need updating which I haven't done until I know if
the change is worthwhile.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 802c325..918661a 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6710,15 +6710,15 @@ convert_like_real (conversion *convs, tree expr, tree 
fn, int argnum,
tree extype = TREE_TYPE (expr);
if (TYPE_REF_IS_RVALUE (ref_type)
&& lvalue_p (expr))
- error_at (loc, "cannot bind %qT lvalue to %qT",
-   extype, totype);
+ error_at (loc, "cannot bind rvalue reference of type %qT to "
+"lvalue of type %qT", totype, extype);
else if (!TYPE_REF_IS_RVALUE (ref_type) && !lvalue_p (expr)
 && !CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (ref_type)))
- error_at (loc, "invalid initialization of non-const reference of "
-   "type %qT from an rvalue of type %qT", totype, extype);
+ error_at (loc, "cannot bind non-const lvalue reference of "
+   "type %qT to an rvalue of type %qT", totype, extype);
else if (!reference_compatible_p (TREE_TYPE (totype), extype))
- error_at (loc, "binding %qT to reference of type %qT "
-   "discards qualifiers", extype, totype);
+ error_at (loc, "binding reference of type %qT to %qT "
+   "discards qualifiers", totype, extype);
else
  gcc_unreachable ();
maybe_print_user_conv_context (convs);


Re: [PR71078] x / abs(x) -> copysign (1.0, x)

2016-07-27 Thread Richard Biener
On Wed, 27 Jul 2016, Prathamesh Kulkarni wrote:

> On 26 July 2016 at 17:41, Richard Biener  wrote:
> > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> The attached patch tries to fix PR71078.
> >> I am not sure if I have got the converts right.
> >> I put (convert? @0) and (convert1? (abs @1))
> >> to match for cases when operands's types may
> >> be different from outermost type like in pr71078-3.c
> >
> > Types of RDIV_EXPR have to be the same so as you have a
> > match on @0 the converts need to be either both present
> > or not present.
> >
> > +  (if (FLOAT_TYPE_P (type)
> >
> > as you special-case several types below please use SCALAR_FLOAT_TYPE_P
> > here.
> >
> > +   && ! HONOR_NANS (type)
> > +   && ! HONOR_INFINITIES (type))
> > +   (switch
> > +(if (type == float_type_node)
> > + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0)))
> >
> > please use if (types_match (type, float_type_node)) instead of
> > pointer equality.  I _think_ you can do better here by using
> > IFN_COPYSIGN but possibly only so if the target supports it.
> > Richard - this seems to be the first pattern in need of
> > generating a builtin where no other was there to match the type
> > to - any idea how we can safely use the internal function here?
> > I see those do not have an expander that would fall back to
> > expanding the regular builtin, correct?
> >
> > Please place the pattern next to
> >
> > /* Optimize -A / A to -1.0 if we don't care about
> >NaNs or Infinities.  */
> > (simplify
> >  (rdiv:C @0 (negate @0))
> >  (if (FLOAT_TYPE_P (type)
> >   && ! HONOR_NANS (type)
> >   && ! HONOR_INFINITIES (type))
> >   { build_minus_one_cst (type); }))
> >
> > where it logically belongs.
> Hi,
> Is this  version OK ?
> Bootstrap + test in progress on x86_64-unknown-linux-gnu

Ok.

Thanks,
Richard.

> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >> test-case (included in patch).
> >> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >>
> >> Thanks,
> >> Prathamesh
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Remove gcc.dg/vect/costmodel/x86_64/costmodel-pr68961.c

2016-07-27 Thread Richard Biener

I forgot to remove this (now FAILing) testcase after the 
fwprop/simplify-rtx fix.

Committed.

2016-07-27  Richard Biener  

* gcc.dg/vect/costmodel/x86_64/costmodel-pr68961.c: Remove.


Re: [PR71078] x / abs(x) -> copysign (1.0, x)

2016-07-27 Thread Prathamesh Kulkarni
On 26 July 2016 at 17:41, Richard Biener  wrote:
> On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote:
>
>> Hi,
>> The attached patch tries to fix PR71078.
>> I am not sure if I have got the converts right.
>> I put (convert? @0) and (convert1? (abs @1))
>> to match for cases when operands's types may
>> be different from outermost type like in pr71078-3.c
>
> Types of RDIV_EXPR have to be the same so as you have a
> match on @0 the converts need to be either both present
> or not present.
>
> +  (if (FLOAT_TYPE_P (type)
>
> as you special-case several types below please use SCALAR_FLOAT_TYPE_P
> here.
>
> +   && ! HONOR_NANS (type)
> +   && ! HONOR_INFINITIES (type))
> +   (switch
> +(if (type == float_type_node)
> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0)))
>
> please use if (types_match (type, float_type_node)) instead of
> pointer equality.  I _think_ you can do better here by using
> IFN_COPYSIGN but possibly only so if the target supports it.
> Richard - this seems to be the first pattern in need of
> generating a builtin where no other was there to match the type
> to - any idea how we can safely use the internal function here?
> I see those do not have an expander that would fall back to
> expanding the regular builtin, correct?
>
> Please place the pattern next to
>
> /* Optimize -A / A to -1.0 if we don't care about
>NaNs or Infinities.  */
> (simplify
>  (rdiv:C @0 (negate @0))
>  (if (FLOAT_TYPE_P (type)
>   && ! HONOR_NANS (type)
>   && ! HONOR_INFINITIES (type))
>   { build_minus_one_cst (type); }))
>
> where it logically belongs.
Hi,
Is this  version OK ?
Bootstrap + test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> test-case (included in patch).
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>>
>> Thanks,
>> Prathamesh
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
2016-07-27  Prathamesh Kulkarni  

PR middle-end/71078
* match.pd (x / abs(x) -> copysign(1.0, x)): New pattern.

testsuite/
* gcc.dg/tree-ssa/pr71078-1.c: New test-case.
* gcc.dg/tree-ssa/pr71078-2.c: Likewise.
* gcc.dg/tree-ssa/pr71078-3.c: Likewise.

diff --git a/gcc/match.pd b/gcc/match.pd
index 21bf617..2fd898a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -195,6 +195,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && ! HONOR_INFINITIES (type))
   { build_minus_one_cst (type); }))
 
+/* PR71078: x / abs(x) -> copysign (1.0, x) */
+(simplify
+ (rdiv:C (convert? @0) (convert? (abs @0)))
+  (if (SCALAR_FLOAT_TYPE_P (type)
+   && ! HONOR_NANS (type)
+   && ! HONOR_INFINITIES (type))
+   (switch
+(if (types_match (type, float_type_node))
+ (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0)))
+(if (types_match (type, double_type_node))
+ (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0)))
+(if (types_match (type, long_double_type_node))
+ (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0))
+
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c
new file mode 100644
index 000..6204c14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
+
+#include 
+
+float f1(float x)
+{
+  float t1 = fabsf (x);
+  float t2 = x / t1;
+  return t2;
+}
+ 
+double f2(double x)
+{
+  double t1 = fabs (x);
+  double t2 = x / t1;
+  return t2;
+}
+
+long double f3 (long double x)
+{
+  long double t1 = fabsl (x);
+  long double t2 = x / t1;
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c
new file mode 100644
index 000..96485af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */
+
+#include 
+
+float f1(float x)
+{
+  float t1 = fabsf (x);
+  float t2 = t1 / x; 
+  return t2;
+}
+ 
+double f2(double x)
+{
+  double t1 = fabs (x);
+  double t2 = t1 / x; 
+  return t2;
+}
+
+long double f3 (long double x)
+{
+  long double t1 = fabsl (x);
+  long double t2 = t1 / x; 
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */
diff --git 

[PATCH] Fix PR72517

2016-07-27 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-07-27  Richard Biener  

PR tree-optimization/72517
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences):
Revert change to not compute read-read dependences.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 238781)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -473,9 +473,10 @@ vect_analyze_data_ref_dependences (loop_
 .create (LOOP_VINFO_DATAREFS (loop_vinfo).length ()
 * LOOP_VINFO_DATAREFS (loop_vinfo).length ());
   LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = true;
+  /* We need read-read dependences to compute STMT_VINFO_SAME_ALIGN_REFS.  */
   if (!compute_all_dependences (LOOP_VINFO_DATAREFS (loop_vinfo),
_VINFO_DDRS (loop_vinfo),
-   LOOP_VINFO_LOOP_NEST (loop_vinfo), false))
+   LOOP_VINFO_LOOP_NEST (loop_vinfo), true))
 return false;
 
   FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr)


[PATCH] Fix signed/unsigned warning in predict.c

2016-07-27 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-27  Richard Biener  

* predict.c (set_even_probabilities): Make nedges unsigned.

Index: gcc/predict.c
===
--- gcc/predict.c   (revision 238781)
+++ gcc/predict.c   (working copy)
@@ -793,7 +793,7 @@ static void
 set_even_probabilities (basic_block bb,
hash_set *unlikely_edges = NULL)
 {
-  int nedges = 0;
+  unsigned nedges = 0;
   edge e;
   edge_iterator ei;
 


Re: [PATCH GCC]Vectorize possible infinite loop by loop versioning.

2016-07-27 Thread Richard Biener
On Tue, Jul 26, 2016 at 7:11 PM, Bin Cheng  wrote:
> Hi,
> This patch vectorizes possible infinite loops by versioning.  It analyzes 
> loops considered for vectorization using loop constraint facility which was 
> introduced by previous patch; then vectorizes the loop with respect to 
> assumptions of loop niters information; at last, it sets constraint flags for 
> versioned loop so that niter analyzer in following optimizers can take 
> advantage of it.  The patch also adds two tests.
>
> Bootstrap and test on x86_64.  Any comments?

+  else if (integer_nonzerop (may_be_zero))
+   {
+ niter = build_int_cst (TREE_TYPE (niter), 0);
+ *number_of_iterationsm1 = niter;
+ *number_of_iterations = niter;

*number_of_iterations should be 1 here.

Otherwise the patch looks ok to me.

Thanks,
RIchard.





> Thanks,
> bin
>
> 2016-07-25  Bin Cheng  
>
> PR tree-optimization/57558
> * tree-vect-loop-manip.c (vect_create_cond_for_niters_checks): New
> function.
> (vect_loop_versioning): Support versioning with niter assumptions.
> * tree-vect-loop.c (tree-ssa-loop.h): Include header file.
> (vect_get_loop_niters): New parameter.  Reimplement to support
> assumptions in loop niter info.
> (vect_analyze_loop_form_1, vect_analyze_loop_form): Ditto.
> (new_loop_vec_info): Init LOOP_VINFO_NITERS_ASSUMPTIONS.
> (vect_estimate_min_profitable_iters): Use LOOP_REQUIRES_VERSIONING.
> Support loop versioning for niters.
> * tree-vectorizer.c (tree-ssa-loop-niter.h): Include header file.
> (vect_free_loop_info_assumptions): New function.
> (vectorize_loops): Free loop niter info for loops with flag
> LOOP_F_ASSUMPTIONS set if vectorization failed.
> * tree-vectorizer.h (struct _loop_vec_info): New field
> num_iters_assumptions.
> (LOOP_VINFO_NITERS_ASSUMPTIONS): New macro.
> (LOOP_REQUIRES_VERSIONING_FOR_NITERS): New macro.
> (LOOP_REQUIRES_VERSIONING): New macro.
> (vect_free_loop_info_assumptions): New decl.
>
> gcc/testsuite/ChangeLog
> 2016-07-25  Bin Cheng  
>
> PR tree-optimization/57558
> * gcc.dg/vect/pr57558-1.c: New test.
> * gcc.dg/vect/pr57558-2.c: New test.


Re: [PATCH] Remove special streaming of builtins

2016-07-27 Thread Richard Biener
On Wed, 27 Jul 2016, Ilya Enkovich wrote:

> 2016-07-26 22:52 GMT+03:00 Richard Biener :
> > On July 26, 2016 7:26:46 PM GMT+02:00, "H.J. Lu"  
> > wrote:
> >>On Mon, Jul 25, 2016 at 4:35 AM, Richard Biener 
> >>wrote:
> >>>
> >>> So I needed to fix that builtins appearing in BLOCK_VARs and the
> >>solution
> >>> I came up with accidentially disabled streaming via the special path.
> >>> Thus the following patch removes the special-casing completely and
> >>makes
> >>> the BLOCK_VARs handling work the same way as for regular externs (by
> >>> streaming a local copy).  We stream each builtin decl once and then
> >>> refer to it via the decl index (which is cheaper than the special
> >>> casing).
> >>>
> >>> I'm not 100% this solves for example the -fno-math-errno inlining
> >>> across TUs (it certainly doesn't if you use attribute optimize with
> >>> -fno-math-errno), but it eventually should by means of having two
> >>> different BUILT_IN_XXX if they have different behavior.  At least
> >>> if all relevant bits are set on the function _type_ rather than
> >>> the decl which I think we still lto-symtab replace with one
> >>> entity during WPA(?)
> >>>
> >>> Well.
> >>>
> >>> LTO bootstrapped and tested on x86_64-unknown-linux-gnu
> >>(c,c++,fortran),
> >>> bootstrapped on x86_64-unknown-linux-gnu (all), testing in progress.
> >>>
> >>> I might have not catched all fndecl compares.
> >>>
> >>> Will apply to trunk if testing completes.  As said, maybe followup
> >>> cleanups possible, at least to lto-opts.c / lto-wrapper.
> >>>
> >>> Richard.
> >>>
> >>> 2016-07-25  Richard Biener  
> >>>
> >>> * cgraph.c (cgraph_node::verify_node): Compare against
> >>builtin
> >>> by using DECL_BUILT_IN_CLASS and DECL_FUNCTION_CODE.
> >>> * tree-chkp.c (chkp_gimple_call_builtin_p): Likewise.
> >>> * tree-streamer.h (streamer_handle_as_builtin_p): Remove.
> >>> (streamer_get_builtin_tree): Likewise.
> >>> (streamer_write_builtin): Likewise.
> >>> * lto-streamer.h (LTO_builtin_decl): Remove.
> >>> * lto-streamer-in.c (lto_read_tree_1): Remove assert.
> >>> (lto_input_scc): Remove LTO_builtin_decl handling.
> >>> (lto_input_tree_1): Liekwise.
> >>> * lto-streamer-out.c (lto_output_tree_1): Remove special
> >>> handling of builtins.
> >>> (DFS::DFS): Likewise.
> >>> * tree-streamer-in.c (streamer_get_builtin_tree): Remove.
> >>> * tree-streamer-out.c (pack_ts_function_decl_value_fields):
> >>Remove
> >>> assert.
> >>> (streamer_write_builtin): Remove.
> >>>
> >>> lto/
> >>> * lto.c (compare_tree_sccs_1): Remove
> >>streamer_handle_as_builtin_p uses.
> >>> (unify_scc): Likewise.
> >>> (lto_read_decls): Likewise.
> >>>
> >>
> >>This caused:
> >>
> >>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72683
> >
> > Probably another by-decl built-in function compare in the mpx support.  I 
> > have fixed the one that triggered on a not mpx capable machine.  Ilya 
> > possibly knows where the other one(s) are lurking off his head?
> 
> I found a couple more of such comparisons.  Their replacement fixes
> PR72683 and PR72657.

Great!  Thanks for tracking this down.

Richard.


Re: [Patch GCC]Support constraint flags in loop structure.

2016-07-27 Thread Richard Biener
On Tue, Jul 26, 2016 at 7:10 PM, Bin Cheng  wrote:
> Hi,
> This patch adds support for constraint flags in loop structure.  Different to 
> existing boolean flags which are set by niter analyzer, constraint flag is 
> mainly set by consumers and affects certain semantics of niter analyzer APIs. 
>  Currently niter APIs affected are number_of_iterations_exit* and their 
> callers.  Constraint flags are added to support handling of possible infinite 
> loops in GCC.  One typical usecase of constraint is in vectorizer, as 
> described in patch's comment:
>
>   /* ...
>1) Compute niter->assumptions by calling niter analyzer API and
>   record it as possible condition for loop versioning.
>2) Clear buffered result of niter/scev analyzer.
>3) Set constraint LOOP_C_FINITE assuming the loop is finite.
>4) Analyze data references.  Since data reference analysis depends
>   on niter/scev analyzer, the point is that niter/scev analysis
>   is done under circumstance of LOOP_C_FINITE constraint.
>5) Version the loop with assumptions computed in step 1).
>6) Vectorize the versioned loop in which assumptions is guarded to
>   be true.
>7) Update constraints in versioned loops so that niter analyzer
>   in following passes can use it.
>
>  Note consumers are usually the loop optimizers and it is consumers'
>  responsibility to set/clear constraints correctly.  Failing to do
>  that might result in hard to track bugs in niter/scev analyzer.  */
>
> Next patch will use constraint to vectorize possible infinite loop by 
> versioning, I would also expect possible infinite loops (loops with 
> assumptions) can be handled by more optimizers.  This patch itself doesn't 
> change GCC behavior, bootstrap and test on x86_64.  Any comments?

+ Note consumers are usually the loop optimizers and it is consumers'
+ responsibility to set/clear constraints correctly.  Failing to do
+ that might result in hard to track bugs in niter/scev analyzer.  */

"in hard to track down bugs in niter/scev consumers"

+static inline void
+loop_constraint_clr (struct loop *loop, unsigned c)
+{

use _clear (similar to loops_state_clear).  Function comments missing.

I think we want to copy constraints in copy_loop_info.

Please place the 'constraints' field in struct loop somewhere better,
between two pointers we have 4 bytes wasted on a 64bit arch.
Maybe you can group all bools and put constraints next to safelen
(yeah, we'd still waste some padding, not sure if we want to turn
the bools and estimate_state into a bitfield).

It would be nice to document the constraints in doc/loop.texi.

Ok with that changes.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-07-25  Bin Cheng  
>
> * cfgloop.h (struct loop): New field constraints.
> (LOOP_C_INFINITE, LOOP_C_FINITE): New macros.
> (loop_constraint_set, loop_constraint_clr, loop_constraint_set_p): New
> functions.
> * cfgloop.c (alloc_loop): Initialize new field.
> * tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
> Adjust niter analysis wrto loop constraints.


Use "oacc kernels" attribute for OpenACC kernels (was: Mark oacc kernels fns)

2016-07-27 Thread Thomas Schwinge
Hi!

On Mon, 25 Jan 2016 16:09:14 +0100, Jakub Jelinek  wrote:
> On Mon, Jan 25, 2016 at 10:06:50AM -0500, Nathan Sidwell wrote:
> > On 01/04/16 10:39, Nathan Sidwell wrote:
> > >There's currently no robust predicate to determine whether an oacc offload
> > >function is for a kernels region (as opposed to a parallel region).
> > >[...]
> > >
> > >This patch marks TREE_PUBLIC on the offload attribute values, to note 
> > >kernels
> > >regions,  and adds a predicate to check that.  [...]
> > >
> > >Using these predicates improves the dump output of the openacc device 
> > >lowering
> > >pass too.

I just submitted a patch adding "Test cases to check OpenACC offloaded
function's attributes and classification",
,
to actually check the dump output of "oaccdevlow" -- it works.  ;-)

> > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00092.html
> > ping?
> 
> Ok, thanks.

It's conceptually and code-wise simpler to just use a "oacc kernels"
attribute for that.  (And, that will make another patch I'm working on
less convoluted.)

I'm open to suggestions if there is a better place to set the "oacc
kernels" attribute -- I put it into expand_omp_target, where another
special thing for GF_OMP_TARGET_KIND_OACC_KERNELS is already being done,
and before "rewriting" GF_OMP_TARGET_KIND_OACC_KERNELS (and
GF_OMP_TARGET_KIND_OACC_PARALLEL) into BUILT_IN_GOACC_PARALLEL.  My
reasoning for not setting the attribute earlier (like, in the front
ends), is that at that point in/before expand_omp_target, we still have
the distrinction between OACC_PARALLEL/OACC_KERNELS (tree codes), and
later GF_OMP_TARGET_KIND_OACC_PARALLEL/GF_OMP_TARGET_KIND_OACC_KERNELS
(GIMPLE_OMP_TARGET subcodes).  Another question/possibly cleanup of
course might be to actually do set the "oacc kernels" attribute in the
front end and merge OACC_KERNELS into OACC_PARALLEL, and
GF_OMP_TARGET_KIND_OACC_KERNELS into GF_OMP_TARGET_KIND_OACC_PARALLEL?

But anyway, as a first step: OK for trunk?

commit 2e6dc8dfd679d8dae814e325afa2547b502827ef
Author: Thomas Schwinge 
Date:   Tue Jul 26 17:44:31 2016 +0200

Use "oacc kernels" attribute for OpenACC kernels

gcc/
* omp-low.c (expand_omp_target) :
Set "oacc kernels" attribute.
(set_oacc_fn_attrib): Remove is_kernel formal parameter.  Adjust
all users.
(oacc_fn_attrib_kernels_p): Remove function.
(execute_oacc_device_lower): Look for "oacc kernels" attribute
instead of calling oacc_fn_attrib_kernels_p.
* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
* tree-parloops.c (create_parallel_loop): If oacc_kernels_p,
assert "oacc kernels" attribute is set.
---
 gcc/omp-low.c  | 53 --
 gcc/omp-low.h  |  3 +-
 gcc/tree-parloops.c|  5 +-
 gcc/tree-ssa-loop.c|  5 +-
 10 files changed, 34 insertions(+), 48 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index c75452c..a35556d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -12552,11 +12552,10 @@ replace_oacc_fn_attrib (tree fn, tree dims)
 
 /* Scan CLAUSES for launch dimensions and attach them to the oacc
function attribute.  Push any that are non-constant onto the ARGS
-   list, along with an appropriate GOMP_LAUNCH_DIM tag.  IS_KERNEL is
-   true, if these are for a kernels region offload function.  */
+   list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
 
 void
-set_oacc_fn_attrib (tree fn, tree clauses, bool is_kernel, vec *args)
+set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
 {
   /* Must match GOMP_DIM ordering.  */
   static const omp_clause_code ids[]
@@ -12581,9 +12580,6 @@ set_oacc_fn_attrib (tree fn, tree clauses, bool 
is_kernel, vec *args)
  non_const |= GOMP_DIM_MASK (ix);
}
   attr = tree_cons (NULL_TREE, dim, attr);
-  /* Note kernelness with TREE_PUBLIC.  */
-  if (is_kernel)
-   TREE_PUBLIC (attr) = 1;
 }
 
   replace_oacc_fn_attrib (fn, attr);
@@ -12652,16 +12648,6 @@ get_oacc_fn_attrib (tree fn)
   return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
 }
 
-/* Return true if this oacc fn attrib is for a kernels offload
-   region.  We use the TREE_PUBLIC flag of each dimension -- only
-   need to check the first one.  */
-
-bool
-oacc_fn_attrib_kernels_p (tree attr)
-{
-  return TREE_PUBLIC (TREE_VALUE (attr));
-}
-
 /* Return level at which oacc routine may spawn a partitioned loop, or
-1 if it is not a routine (i.e. is an offload fn).  */
 
@@ -13044,7 +13030,12 @@ expand_omp_target (struct omp_region *region)
   exit_bb = region->exit;
 
   if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
-mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+{
+  

Re: [PATCH] Remove special streaming of builtins

2016-07-27 Thread Ilya Enkovich
2016-07-26 22:52 GMT+03:00 Richard Biener :
> On July 26, 2016 7:26:46 PM GMT+02:00, "H.J. Lu"  wrote:
>>On Mon, Jul 25, 2016 at 4:35 AM, Richard Biener 
>>wrote:
>>>
>>> So I needed to fix that builtins appearing in BLOCK_VARs and the
>>solution
>>> I came up with accidentially disabled streaming via the special path.
>>> Thus the following patch removes the special-casing completely and
>>makes
>>> the BLOCK_VARs handling work the same way as for regular externs (by
>>> streaming a local copy).  We stream each builtin decl once and then
>>> refer to it via the decl index (which is cheaper than the special
>>> casing).
>>>
>>> I'm not 100% this solves for example the -fno-math-errno inlining
>>> across TUs (it certainly doesn't if you use attribute optimize with
>>> -fno-math-errno), but it eventually should by means of having two
>>> different BUILT_IN_XXX if they have different behavior.  At least
>>> if all relevant bits are set on the function _type_ rather than
>>> the decl which I think we still lto-symtab replace with one
>>> entity during WPA(?)
>>>
>>> Well.
>>>
>>> LTO bootstrapped and tested on x86_64-unknown-linux-gnu
>>(c,c++,fortran),
>>> bootstrapped on x86_64-unknown-linux-gnu (all), testing in progress.
>>>
>>> I might have not catched all fndecl compares.
>>>
>>> Will apply to trunk if testing completes.  As said, maybe followup
>>> cleanups possible, at least to lto-opts.c / lto-wrapper.
>>>
>>> Richard.
>>>
>>> 2016-07-25  Richard Biener  
>>>
>>> * cgraph.c (cgraph_node::verify_node): Compare against
>>builtin
>>> by using DECL_BUILT_IN_CLASS and DECL_FUNCTION_CODE.
>>> * tree-chkp.c (chkp_gimple_call_builtin_p): Likewise.
>>> * tree-streamer.h (streamer_handle_as_builtin_p): Remove.
>>> (streamer_get_builtin_tree): Likewise.
>>> (streamer_write_builtin): Likewise.
>>> * lto-streamer.h (LTO_builtin_decl): Remove.
>>> * lto-streamer-in.c (lto_read_tree_1): Remove assert.
>>> (lto_input_scc): Remove LTO_builtin_decl handling.
>>> (lto_input_tree_1): Liekwise.
>>> * lto-streamer-out.c (lto_output_tree_1): Remove special
>>> handling of builtins.
>>> (DFS::DFS): Likewise.
>>> * tree-streamer-in.c (streamer_get_builtin_tree): Remove.
>>> * tree-streamer-out.c (pack_ts_function_decl_value_fields):
>>Remove
>>> assert.
>>> (streamer_write_builtin): Remove.
>>>
>>> lto/
>>> * lto.c (compare_tree_sccs_1): Remove
>>streamer_handle_as_builtin_p uses.
>>> (unify_scc): Likewise.
>>> (lto_read_decls): Likewise.
>>>
>>
>>This caused:
>>
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72683
>
> Probably another by-decl built-in function compare in the mpx support.  I 
> have fixed the one that triggered on a not mpx capable machine.  Ilya 
> possibly knows where the other one(s) are lurking off his head?

I found a couple more of such comparisons.  Their replacement fixes
PR72683 and PR72657.

Thanks,
Ilya

>
> Richard.
>
>


Re: PING: new pass to warn on questionable uses of alloca() and VLAs

2016-07-27 Thread Rainer Orth
Hi Aldy,

> Just in case this got lost in noise, since I know there was a lot of back
> and forth between Martin Sebor and I.
>
> This is the last iteration.
>
> Tested on x86-64 Linux.
>
> OK for trunk?
>
> gcc/
>
>   * Makefile.in (OBJS): Add gimple-ssa-warn-walloca.o.
>   * passes.def: Add two instances of pass_walloca.
>   * tree-pass.h (make_pass_walloca): New.
>   * gimple-ssa-warn-walloca.c: New file.

just a nit: the files is called gimple-ssa-warn-alloca.[co].

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


PING: new pass to warn on questionable uses of alloca() and VLAs

2016-07-27 Thread Aldy Hernandez
Just in case this got lost in noise, since I know there was a lot of 
back and forth between Martin Sebor and I.


This is the last iteration.

Tested on x86-64 Linux.

OK for trunk?
gcc/

* Makefile.in (OBJS): Add gimple-ssa-warn-walloca.o.
* passes.def: Add two instances of pass_walloca.
* tree-pass.h (make_pass_walloca): New.
* gimple-ssa-warn-walloca.c: New file.
* doc/invoke.texi: Document -Walloca, -Walloca-larger-than=, and
-Wvla-larger-than= options.

gcc/c-family/

* c.opt (Walloca): New.
(Walloca-larger-than=): New.
(Wvla-larger-than=): New.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0786fa3..8411bed 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1284,6 +1284,7 @@ OBJS = \
gimple-ssa-nonnull-compare.o \
gimple-ssa-split-paths.o \
gimple-ssa-strength-reduction.o \
+   gimple-ssa-warn-alloca.o \
gimple-streamer-in.o \
gimple-streamer-out.o \
gimple-walk.o \
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index c11e7e7..6e82fc8 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -376,6 +376,16 @@ c_common_handle_option (size_t scode, const char *arg, int 
value,
   cpp_opts->warn_num_sign_change = value;
   break;
 
+case OPT_Walloca_larger_than_:
+  if (!value)
+   inform (loc, "-Walloca-larger-than=0 is meaningless");
+  break;
+
+case OPT_Wvla_larger_than_:
+  if (!value)
+   inform (loc, "-Wvla-larger-than=0 is meaningless");
+  break;
+
 case OPT_Wunknown_pragmas:
   /* Set to greater than 1, so that even unknown pragmas in
 system headers will be warned about.  */
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 8c70152..017a8f9 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -275,6 +275,16 @@ Wall
 C ObjC C++ ObjC++ Warning
 Enable most warning messages.
 
+Walloca
+C ObjC C++ ObjC++ Var(warn_alloca) Warning
+Warn on any use of alloca.
+
+Walloca-larger-than=
+C ObjC C++ ObjC++ Var(warn_alloca_limit) Warning Joined RejectNegative UInteger
+-Walloca-larger-than= Warn on unbounded uses of
+alloca, and on bounded uses of alloca whose bound can be larger than
+ bytes.
+
 Warray-bounds
 LangEnabledBy(C ObjC C++ ObjC++,Wall)
 ; in common.opt
@@ -980,6 +990,12 @@ Wvla
 C ObjC C++ ObjC++ Var(warn_vla) Init(-1) Warning
 Warn if a variable length array is used.
 
+Wvla-larger-than=
+C ObjC C++ ObjC++ Var(warn_vla_limit) Warning Joined RejectNegative UInteger
+-Wvla-larger-than= Warn on unbounded uses of variable-length arrays, 
and
+on bounded uses of variable-length arrays whose bound can be
+larger than  bytes.
+
 Wvolatile-register-var
 C ObjC C++ ObjC++ Var(warn_volatile_register_var) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
 Warn when a register variable is declared volatile.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a4db38..341ebb8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -254,6 +254,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
 -w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
+-Walloca -Walloca-larger-than=@var{n} @gol
 -Wno-aggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
 -Wno-attributes -Wbool-compare -Wno-builtin-macro-redefined @gol
 -Wc90-c99-compat -Wc99-c11-compat @gol
@@ -310,7 +311,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wunused-const-variable -Wunused-const-variable=@var{n} @gol
 -Wunused-but-set-parameter -Wunused-but-set-variable @gol
 -Wuseless-cast -Wvariadic-macros -Wvector-operation-performance @gol
--Wvla -Wvolatile-register-var  -Wwrite-strings @gol
+-Wvla -Wvla-larger-than=@var{n} -Wvolatile-register-var  -Wwrite-strings @gol
 -Wzero-as-null-pointer-constant -Whsa}
 
 @item C and Objective-C-only Warning Options
@@ -4660,6 +4661,61 @@ annotations.
 Warn about overriding virtual functions that are not marked with the override
 keyword.
 
+@item -Walloca
+@opindex Wno-alloca
+@opindex Walloca
+This option warns on all uses of @code{alloca} in the source.
+
+@item -Walloca-larger-than=@var{n}
+This option warns on calls to @code{alloca} that are not bounded by a
+controlling predicate limiting its size to @var{n} bytes, or calls to
+@code{alloca} where the bound is unknown.
+
+For example, a bounded case of @code{alloca} could be:
+
+@smallexample
+unsigned int n;
+...
+if (n <= 1000)
+  alloca (n);
+@end smallexample
+
+In the above example, passing @code{-Walloca=1000} would not issue a
+warning because the call to @code{alloca} is known to be at most 1000
+bytes.  However, if @code{-Walloca=500} was passed, the compiler would
+have emitted a warning.
+
+Unbounded uses, on the other hand, are uses of @code{alloca} with no
+controlling predicate verifying its size.  For example:
+
+@smallexample
+stuff ();
+alloca (n);
+@end smallexample
+
+If @code{-Walloca=500} 

Test cases to check OpenACC offloaded function's attributes and classification

2016-07-27 Thread Thomas Schwinge
Hi!

OK for trunk?

commit 8200af082db5438be18bc60f721fcf21641c0d86
Author: Thomas Schwinge 
Date:   Tue Jul 26 17:18:21 2016 +0200

Test cases to check OpenACC offloaded function's attributes and 
classification

gcc/testsuite/
* c-c++-common/goacc/oaccdevlow-kernels.c: New file.
* c-c++-common/goacc/oaccdevlow-parallel.c: Likewise.
* c-c++-common/goacc/oaccdevlow-routine.c: Likewise.
* gfortran.dg/goacc/oaccdevlow-kernels.f95: Likewise.
* gfortran.dg/goacc/oaccdevlow-parallel.f95: Likewise.
* gfortran.dg/goacc/oaccdevlow-routine.f95: Likewise.
---
 .../c-c++-common/goacc/oaccdevlow-kernels.c| 34 
 .../c-c++-common/goacc/oaccdevlow-parallel.c   | 27 
 .../c-c++-common/goacc/oaccdevlow-routine.c| 29 +
 .../gfortran.dg/goacc/oaccdevlow-kernels.f95   | 36 ++
 .../gfortran.dg/goacc/oaccdevlow-parallel.f95  | 29 +
 .../gfortran.dg/goacc/oaccdevlow-routine.f95   | 28 +
 6 files changed, 183 insertions(+)

diff --git gcc/testsuite/c-c++-common/goacc/oaccdevlow-kernels.c 
gcc/testsuite/c-c++-common/goacc/oaccdevlow-kernels.c
new file mode 100644
index 000..14d650a
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/oaccdevlow-kernels.c
@@ -0,0 +1,34 @@
+/* Check offloaded function's attributes and classification for OpenACC
+   kernels.  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-parloops1-all" }
+   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+
+#define N (1024 * 512)
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  for (unsigned int i = 0; i < N; i++)
+c[i] = a[i] + b[i];
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(omp target 
entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that exactly one OpenACC kernels loop is analyzed, and that it can be
+   parallelized.
+   { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 
"parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } 
}
+   { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+   always be [1, 1, 1] for target compilation).
+   { dg-final { scan-tree-dump-times "(?n)Function is kernels offload" 1 
"oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccdevlow" } } */
diff --git gcc/testsuite/c-c++-common/goacc/oaccdevlow-parallel.c 
gcc/testsuite/c-c++-common/goacc/oaccdevlow-parallel.c
new file mode 100644
index 000..63c372a
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/oaccdevlow-parallel.c
@@ -0,0 +1,27 @@
+/* Check offloaded function's attributes and classification for OpenACC
+   parallel.  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+
+#define N (1024 * 512)
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+void PARALLEL ()
+{
+#pragma acc parallel loop copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  for (unsigned int i = 0; i < N; i++)
+c[i] = a[i] + b[i];
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(omp target 
entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+   always be [1, 1, 1] for target compilation).
+   { dg-final { scan-tree-dump-times "(?n)Function is parallel offload" 1 
"oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccdevlow" } } */
diff --git gcc/testsuite/c-c++-common/goacc/oaccdevlow-routine.c 
gcc/testsuite/c-c++-common/goacc/oaccdevlow-routine.c
new file mode 100644
index 000..fa2eae7
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/oaccdevlow-routine.c
@@ -0,0 +1,29 @@
+/* Check offloaded function's attributes and classification for OpenACC
+   routine.  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+
+#define N (1024 * 512)
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+#pragma acc declare copyin (a, b) create (c)
+
+#pragma acc routine worker
+void ROUTINE ()
+{
+#pragma acc loop
+  for (unsigned int i = 0; i < N; i++)
+c[i] = a[i] + b[i];
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(omp declare 

Re: [PATCH] predict.c: merge multi-edges

2016-07-27 Thread Martin Liška
On 07/25/2016 07:52 PM, Jeff Law wrote:
> Can you turn this into a test as well?
> 
> With that change this patch is OK for the trunk.
> 
> jeff

Sure, thanks for the review.
Installed as r238781.

Martin
>From e88a89a85f2d7b4d37d44397d2abe9775cb1cbfb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 21 Jul 2016 16:20:42 +0200
Subject: [PATCH] predict.c: merge multi-edges

gcc/testsuite/ChangeLog:

2016-07-21  Martin Liska  

	* gcc.dg/predict-13.c: New test.
	* gcc.dg/predict-14.c: New test.

gcc/ChangeLog:

2016-07-21  Martin Liska  

	* predict.c (set_even_probabilities): Handle unlikely edges.
	(combine_predictions_for_bb): Likewise.
---
 gcc/predict.c | 73 +--
 gcc/testsuite/gcc.dg/predict-13.c | 24 +
 gcc/testsuite/gcc.dg/predict-14.c | 19 ++
 3 files changed, 106 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/predict-13.c
 create mode 100644 gcc/testsuite/gcc.dg/predict-14.c

diff --git a/gcc/predict.c b/gcc/predict.c
index 7fbd404..ca320cd 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -785,9 +785,13 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
 }
 
 /* We can not predict the probabilities of outgoing edges of bb.  Set them
-   evenly and hope for the best.  */
+   evenly and hope for the best.  If UNLIKELY_EDGES is not null, distribute
+   even probability for all edges not mentioned in the set.  These edges
+   are given PROB_VERY_UNLIKELY probability.  */
+
 static void
-set_even_probabilities (basic_block bb)
+set_even_probabilities (basic_block bb,
+			hash_set *unlikely_edges = NULL)
 {
   int nedges = 0;
   edge e;
@@ -796,9 +800,25 @@ set_even_probabilities (basic_block bb)
   FOR_EACH_EDGE (e, ei, bb->succs)
 if (!(e->flags & (EDGE_EH | EDGE_FAKE)))
   nedges ++;
+
+  /* Make the distribution even if all edges are unlikely.  */
+  unsigned unlikely_count = unlikely_edges ? unlikely_edges->elements () : 0;
+  if (unlikely_count == nedges)
+{
+  unlikely_edges = NULL;
+  unlikely_count = 0;
+}
+
+  unsigned c = nedges - unlikely_count;
+
   FOR_EACH_EDGE (e, ei, bb->succs)
 if (!(e->flags & (EDGE_EH | EDGE_FAKE)))
-  e->probability = (REG_BR_PROB_BASE + nedges / 2) / nedges;
+  {
+	if (unlikely_edges != NULL && unlikely_edges->contains (e))
+	  e->probability = PROB_VERY_UNLIKELY;
+	else
+	  e->probability = (REG_BR_PROB_BASE + c / 2) / c;
+  }
 else
   e->probability = 0;
 }
@@ -1068,18 +1088,51 @@ combine_predictions_for_bb (basic_block bb, bool dry_run)
 
   /* When there is no successor or only one choice, prediction is easy.
 
- We are lazy for now and predict only basic blocks with two outgoing
- edges.  It is possible to predict generic case too, but we have to
- ignore first match heuristics and do more involved combining.  Implement
- this later.  */
+ When we have a basic block with more than 2 successors, the situation
+ is more complicated as DS theory cannot be used literally.
+ More precisely, let's assume we predicted edge e1 with probability p1,
+ thus: m1({b1}) = p1.  As we're going to combine more than 2 edges, we
+ need to find probability of e.g. m1({b2}), which we don't know.
+ The only approximation is to equally distribute 1-p1 to all edges
+ different from b1.
+
+ According to numbers we've got from SPEC2006 benchark, there's only
+ one interesting reliable predictor (noreturn call), which can be
+ handled with a bit easier approach.  */
   if (nedges != 2)
 {
+  hash_set unlikely_edges (4);
+
+  /* Identify all edges that have a probability close to very unlikely.
+	 Doing the approach for very unlikely doesn't worth for doing as
+	 there's no such probability in SPEC2006 benchmark.  */
+  edge_prediction **preds = bb_predictions->get (bb);
+  if (preds)
+	for (pred = *preds; pred; pred = pred->ep_next)
+	  if (pred->ep_probability <= PROB_VERY_UNLIKELY)
+	unlikely_edges.add (pred->ep_edge);
+
   if (!bb->count && !dry_run)
-	set_even_probabilities (bb);
+	set_even_probabilities (bb, _edges);
   clear_bb_predictions (bb);
   if (dump_file)
-	fprintf (dump_file, "%i edges in bb %i predicted to even probabilities\n",
-		 nedges, bb->index);
+	{
+	  fprintf (dump_file, "Predictions for bb %i\n", bb->index);
+	  if (unlikely_edges.elements () == 0)
+	fprintf (dump_file,
+		 "%i edges in bb %i predicted to even probabilities\n",
+		 nedges, bb->index);
+	  else
+	{
+	  fprintf (dump_file,
+		   "%i edges in bb %i predicted with some unlikely edges\n",
+		   nedges, bb->index);
+	  FOR_EACH_EDGE (e, ei, bb->succs)
+		if (!(e->flags & (EDGE_EH | EDGE_FAKE)))
+		  dump_prediction (dump_file, PRED_COMBINED, e->probability,
+		   bb, REASON_NONE, e);
+	}
+	}
   return;
 }
 
diff --git 

Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-27 Thread Martin Liška
On 07/26/2016 06:28 AM, Andi Kleen wrote:
> And do you have autofdo installed? (create_gcov)
> 
> -Andi

Ah, sorry for the false alarm, create_gcov is really missing on my distribution.

Martin


[PATCH] Introduce no_profile_instrument_function attribute (PR gcov-profile/68025)

2016-07-27 Thread Martin Liška
Hi.

As mentioned in the PR gcov-profile/68025, there's a request not to instrument
some functions (e.g. a in linux kernel). Thus, I come with a new attribute 
no_profile_instrument_function
which skips any profiling instrumentation.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thoughts?
Martin
>From be1a81dc949480c24c0e173085bdad193c0579ce Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 26 Jul 2016 15:03:29 +0200
Subject: [PATCH] Introduce no_profile_instrument_function attribute

gcc/ChangeLog:

2016-07-26  Martin Liska  

	PR gcov-profile/68025
	* tree-profile.c (tree_profiling): Respect
	no_profile_instrument_function attribute.

gcc/c-family/ChangeLog:

2016-07-26  Martin Liska  

	PR gcov-profile/68025
	* c-common.c (handle_no_profile_instrument_function_attribute):

gcc/testsuite/ChangeLog:

2016-07-26  Martin Liska  

	PR gcov-profile/68025
	* gcc.dg/no_profile_instrument_function-attr-1.c: New test.
---
 gcc/c-family/c-common.c| 21 
 .../gcc.dg/no_profile_instrument_function-attr-1.c | 23 ++
 gcc/tree-profile.c |  3 +++
 3 files changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1c5974a..35b5e5d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -353,6 +353,8 @@ static tree handle_tls_model_attribute (tree *, tree, tree, int,
 	bool *);
 static tree handle_no_instrument_function_attribute (tree *, tree,
 		 tree, int, bool *);
+static tree handle_no_profile_instrument_function_attribute (tree *, tree,
+			 tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_limit_stack_attribute (tree *, tree, tree, int,
@@ -717,6 +719,9 @@ const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false,
 			  handle_no_instrument_function_attribute,
 			  false },
+  { "no_profile_instrument_function",  0, 0, true, false, false,
+			  handle_no_profile_instrument_function_attribute,
+			  false },
   { "malloc", 0, 0, true,  false, false,
 			  handle_malloc_attribute, false },
   { "returns_twice",  0, 0, true,  false, false,
@@ -8293,6 +8298,22 @@ handle_no_instrument_function_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a "no_profile_instrument_function" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_profile_instrument_function_attribute (tree *node, tree name, tree,
+		 int, bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Handle a "malloc" attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
new file mode 100644
index 000..c93d171
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -fprofile-generate -fdump-tree-optimized" } */
+
+__attribute__ ((no_profile_instrument_function))
+int foo()
+{
+  return 0;
+}
+
+__attribute__ ((no_profile_instrument_function))
+int bar()
+{
+  return 1;
+}
+
+int main ()
+{
+  return foo ();
+}
+
+/* { dg-final { scan-tree-dump-times "__gcov0\\.main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v2" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__gcov_time_profiler" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 1f3a726..39fe15f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -524,6 +524,9 @@ tree_profiling (void)
   if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
 	continue;
 
+  if (lookup_attribute ("no_profile_instrument_function",
+			DECL_ATTRIBUTES (node->decl)))
+	continue;
   /* Do not instrument extern inline functions when testing coverage.
 	 While this is not perfectly consistent (early inlined extern inlines
 	 will get acocunted), testsuite expects that.  */
-- 
2.9.2



Re: Gimple loop splitting v2

2016-07-27 Thread Richard Biener
On Wed, Jul 27, 2016 at 8:17 AM, Andrew Pinski  wrote:
> On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
>  wrote:
>> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski  wrote:
>>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
 Hi,

 On Tue, 1 Dec 2015, Jeff Law wrote:

> > So, okay for trunk?
> -ENOPATCH

 Sigh :)
 Here it is.
>>>
>>>
>>> I found one problem with it.
>>> Take:
>>> void f(int *a, int M, int *b)
>>> {
>>>   for(int i = 0; i <= M; i++)
>>> {
>>>if (i < M)
>>> a[i] = i;
>>> }
>>> }
>>>  CUT ---
>>> There are two issues with the code as below.  The outer most loop's
>>> aux is still set which causes the vectorizer not to vector the loop.
>>> The other issue is I need to run pass_scev_cprop after pass_loop_split
>>> to get the induction variable usage after the loop gone so the
>>> vectorizer will work.
>>
>> I think scev_cprop needs to be re-written to an utility so that the 
>> vectorizer
>> itself can (within its own cost-model) eliminate an induction using it.
>>
>> Richard.
>>
>>> Something like (note this is copy and paste from a terminal):
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index c327900..e8d6ea6 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>>   NEXT_PASS (pass_copy_prop);
>>>   NEXT_PASS (pass_dce);
>>>   NEXT_PASS (pass_tree_unswitch);
>>> - NEXT_PASS (pass_scev_cprop);
>>>   NEXT_PASS (pass_loop_split);
>>> + NEXT_PASS (pass_scev_cprop);
>>>   NEXT_PASS (pass_record_bounds);
>>>   NEXT_PASS (pass_loop_distribution);
>>>   NEXT_PASS (pass_copy_prop);
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 5411530..e72ef19 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>>
>>>gcc_assert (scev_initialized_p ());
>>>FOR_EACH_LOOP (loop, 0)
>>> -loop->aux = NULL;
>>> +{
>>> +  loop->aux = NULL;
>>> +  if (loop_outer (loop))
>>> +   loop_outer (loop)->aux = NULL;
>>> +}
>>
>> How does the iterator not visit loop_outer (loop)?!
>
> The iterator with flags of 0 does not visit the the root.  So the way
> to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so
> we zero out the root too.

Or not set ->aux on the root in the first place.

Richard.

> Thanks,
> Andrew
>
>>
>>>
>>>/* Go through all loops starting from innermost.  */
>>>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>>  }
>>>
>>>FOR_EACH_LOOP (loop, 0)
>>> -loop->aux = NULL;
>>> +{
>>> +  loop->aux = NULL;
>>> +  if (loop_outer (loop))
>>> +   loop_outer (loop)->aux = NULL;
>>> +}
>>>
>>>if (changed)
>>>  return TODO_cleanup_cfg;
>>> -  CUT -
>>>
>>> Thanks,
>>> Andrew
>>>
>>>


 Ciao,
 Michael.
 * common.opt (-fsplit-loops): New flag.
 * passes.def (pass_loop_split): Add.
 * opts.c (default_options_table): Add OPT_fsplit_loops entry at 
 -O3.
 (enable_fdo_optimizations): Add loop splitting.
 * timevar.def (TV_LOOP_SPLIT): Add.
 * tree-pass.h (make_pass_loop_split): Declare.
 * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
 * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
 * tree-ssa-loop-split.c: New file.
 * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
 * doc/invoke.texi (fsplit-loops): Document.
 * doc/passes.texi (Loop optimization): Add paragraph about loop
 splitting.

 testsuite/
 * gcc.dg/loop-split.c: New test.

 Index: common.opt
 ===
 --- common.opt  (revision 231115)
 +++ common.opt  (working copy)
 @@ -2453,6 +2457,10 @@ funswitch-loops
  Common Report Var(flag_unswitch_loops) Optimization
  Perform loop unswitching.

 +fsplit-loops
 +Common Report Var(flag_split_loops) Optimization
 +Perform loop splitting.
 +
  funwind-tables
  Common Report Var(flag_unwind_tables) Optimization
  Just generate unwind tables for exception handling.
 Index: passes.def
 ===
 --- passes.def  (revision 231115)
 +++ passes.def  (working copy)
 @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_tree_unswitch);
   NEXT_PASS (pass_scev_cprop);
 + NEXT_PASS (pass_loop_split);
   NEXT_PASS (pass_record_bounds);
   

Re: [PATCH 2/2] Use static_assert for STATIC_ASSERT for C++11 onwards

2016-07-27 Thread Richard Biener
On Wed, Jul 27, 2016 at 1:19 AM, David Malcolm  wrote:
> C++11 has a
>   static_assert (COND, MESSAGE)
> which gives more readable error messages for STATIC_ASSERT than our
> current implementation.
>
> This patch makes us use it if __cplusplus >= 201103L
>
> There's also a provisional static_assert (COND) in C++1z, but presumably
> we should wait until that one is fully standardized before using it.
>
> Bootstrapped on x86_64-pc-linux-gnu (in conjunction
> with the previous patch)
>
> OK for trunk?

Ok.

Richard.

> gcc/ChangeLog:
> * system.h (STATIC_ASSERT): Use static_assert if building
> with C++11 onwards.
> ---
>  gcc/system.h | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/system.h b/gcc/system.h
> index 78a7da6..8a17197 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -752,9 +752,14 @@ extern void fancy_abort (const char *, int, const char 
> *) ATTRIBUTE_NORETURN;
>  #define STATIC_CONSTANT_P(X) (false && (X))
>  #endif
>
> -/* Until we can use C++11's static_assert.  */
> +/* static_assert (COND, MESSAGE) is available in C++11 onwards.  */
> +#if __cplusplus >= 201103L
> +#define STATIC_ASSERT(X) \
> +  static_assert ((X), #X)
> +#else
>  #define STATIC_ASSERT(X) \
>typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
> +#endif
>
>  /* Provide a fake boolean type.  We make no attempt to use the
> C99 _Bool, as it may not be available in the bootstrap compiler,
> --
> 1.8.5.3
>


Re: [PATCH], Fix PR 71869, Correctly implement isgreater, etc. on PowerPC __float128

2016-07-27 Thread Richard Biener
On Tue, 26 Jul 2016, Segher Boessenkool wrote:

> On Tue, Jul 26, 2016 at 08:04:32PM -0400, Michael Meissner wrote:
> > > dg-do compile?  That's not testing much then, as an executable test!
> > 
> > Good catch.  Hopefully, third time is a charm.  I verified that changing the
> > dg-do compile to dg-do run did run properly on a big endian power7 (both 
> > 32-bit
> > and 64-bit) and a little endian power8.
> > 
> > [gcc]
> > 2016-07-26  Michael Meissner  
> > 
> > PR target/71869
> > * config/rs6000/rs6000.c (rs6000_generate_compare): Rework
> > __float128 support when we don't have hardware support, so that
> > the IEEE built-in functions like isgreater, first call __unordkf3
> > to make sure neither operand is a NaN, and if both operands are
> > ordered, do the normal comparison.
> > 
> > [gcc/testsuite]
> > 2016-07-26  Michael Meissner  
> > 
> > PR target/71869
> > * gcc.target/powerpc/float128-cmp.c: New test to make sure that
> > IEEE built-in functions handle quiet and signalling NaNs
> > correctly.
> 
> Okay for trunk.  Okay for 6 once there is independent testing (on
> gcc-testresults, say) for all affected targets and you aren't comfortable
> waiting any longer ;-)

You have at least two more weeks until GCC 6.2 RC1.

Richard.


Re: [PATCH 1/2] Add a STATIC_ASSERT on sizeof (struct cp_token)

2016-07-27 Thread Andreas Schwab
On Mi, Jul 27 2016, David Malcolm  wrote:

> +/* The C++ frontend lexes everything first, and keeps the tokens
> +   in memory, so there are possibly millions of tokens in memory.
> +
> +   Use a STATIC_ASSERT to ensure that we don't accidentally grow
> +   the structure.
> +
> +   To avoid introducing too many assumptions on the host data layout,
> +   only enable the assertion when compiling with GCC for a
> +   known-good host.  */
> +#if defined (__GNUC__) && defined (__x86_64__)
> +STATIC_ASSERT (sizeof (cp_token) ==

If you make that <= then you can enable it on more hosts.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[C++ PATCH] for PR72457

2016-07-27 Thread Markus Trippelsdorf
On 2016.07.23 at 22:55 -0400, Jason Merrill wrote:
> Using build_value_init in a base initialization is wrong, because it
> calls the complete object constructor and misses protected access.  So
> let's handle list-value-initialization in expand_aggr_init_1.
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.

This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72457.
And because it was backported, the gcc-6 branch is also affected.

The following fix was tested on ppc64le. OK for trunk and gcc-6?

(Unfortunately the reduced testcase is much too big.)

 PR c++/72457
 *constexpr.c (cx_check_missing_mem_inits): Handle potential
 NULL_TREE.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6bcb41ae8254..83fd9a4896ac 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -734,7 +734,7 @@ cx_check_missing_mem_inits (tree fun, tree body, bool 
complain)
  || DECL_ARTIFICIAL (index))
continue;
}
-  for (; field != index; field = DECL_CHAIN (field))
+  for (; field != NULL_TREE && field != index; field = DECL_CHAIN (field))
{
  tree ftype;
  if (TREE_CODE (field) != FIELD_DECL

-- 
Markus


Re: Gimple loop splitting v2

2016-07-27 Thread Andrew Pinski
On Tue, Jul 26, 2016 at 4:32 AM, Richard Biener
 wrote:
> On Mon, Jul 25, 2016 at 10:57 PM, Andrew Pinski  wrote:
>> On Wed, Dec 2, 2015 at 5:23 AM, Michael Matz  wrote:
>>> Hi,
>>>
>>> On Tue, 1 Dec 2015, Jeff Law wrote:
>>>
 > So, okay for trunk?
 -ENOPATCH
>>>
>>> Sigh :)
>>> Here it is.
>>
>>
>> I found one problem with it.
>> Take:
>> void f(int *a, int M, int *b)
>> {
>>   for(int i = 0; i <= M; i++)
>> {
>>if (i < M)
>> a[i] = i;
>> }
>> }
>>  CUT ---
>> There are two issues with the code as below.  The outer most loop's
>> aux is still set which causes the vectorizer not to vector the loop.
>> The other issue is I need to run pass_scev_cprop after pass_loop_split
>> to get the induction variable usage after the loop gone so the
>> vectorizer will work.
>
> I think scev_cprop needs to be re-written to an utility so that the vectorizer
> itself can (within its own cost-model) eliminate an induction using it.
>
> Richard.
>
>> Something like (note this is copy and paste from a terminal):
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index c327900..e8d6ea6 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -262,8 +262,8 @@ along with GCC; see the file COPYING3.  If not see
>>   NEXT_PASS (pass_copy_prop);
>>   NEXT_PASS (pass_dce);
>>   NEXT_PASS (pass_tree_unswitch);
>> - NEXT_PASS (pass_scev_cprop);
>>   NEXT_PASS (pass_loop_split);
>> + NEXT_PASS (pass_scev_cprop);
>>   NEXT_PASS (pass_record_bounds);
>>   NEXT_PASS (pass_loop_distribution);
>>   NEXT_PASS (pass_copy_prop);
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index 5411530..e72ef19 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -592,7 +592,11 @@ tree_ssa_split_loops (void)
>>
>>gcc_assert (scev_initialized_p ());
>>FOR_EACH_LOOP (loop, 0)
>> -loop->aux = NULL;
>> +{
>> +  loop->aux = NULL;
>> +  if (loop_outer (loop))
>> +   loop_outer (loop)->aux = NULL;
>> +}
>
> How does the iterator not visit loop_outer (loop)?!

The iterator with flags of 0 does not visit the the root.  So the way
to fix this is change 0 (which is the flags) with LI_INCLUDE_ROOT so
we zero out the root too.

Thanks,
Andrew

>
>>
>>/* Go through all loops starting from innermost.  */
>>FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
>> @@ -631,7 +635,11 @@ tree_ssa_split_loops (void)
>>  }
>>
>>FOR_EACH_LOOP (loop, 0)
>> -loop->aux = NULL;
>> +{
>> +  loop->aux = NULL;
>> +  if (loop_outer (loop))
>> +   loop_outer (loop)->aux = NULL;
>> +}
>>
>>if (changed)
>>  return TODO_cleanup_cfg;
>> -  CUT -
>>
>> Thanks,
>> Andrew
>>
>>
>>>
>>>
>>> Ciao,
>>> Michael.
>>> * common.opt (-fsplit-loops): New flag.
>>> * passes.def (pass_loop_split): Add.
>>> * opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
>>> (enable_fdo_optimizations): Add loop splitting.
>>> * timevar.def (TV_LOOP_SPLIT): Add.
>>> * tree-pass.h (make_pass_loop_split): Declare.
>>> * tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
>>> * tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
>>> * tree-ssa-loop-split.c: New file.
>>> * Makefile.in (OBJS): Add tree-ssa-loop-split.o.
>>> * doc/invoke.texi (fsplit-loops): Document.
>>> * doc/passes.texi (Loop optimization): Add paragraph about loop
>>> splitting.
>>>
>>> testsuite/
>>> * gcc.dg/loop-split.c: New test.
>>>
>>> Index: common.opt
>>> ===
>>> --- common.opt  (revision 231115)
>>> +++ common.opt  (working copy)
>>> @@ -2453,6 +2457,10 @@ funswitch-loops
>>>  Common Report Var(flag_unswitch_loops) Optimization
>>>  Perform loop unswitching.
>>>
>>> +fsplit-loops
>>> +Common Report Var(flag_split_loops) Optimization
>>> +Perform loop splitting.
>>> +
>>>  funwind-tables
>>>  Common Report Var(flag_unwind_tables) Optimization
>>>  Just generate unwind tables for exception handling.
>>> Index: passes.def
>>> ===
>>> --- passes.def  (revision 231115)
>>> +++ passes.def  (working copy)
>>> @@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
>>>   NEXT_PASS (pass_dce);
>>>   NEXT_PASS (pass_tree_unswitch);
>>>   NEXT_PASS (pass_scev_cprop);
>>> + NEXT_PASS (pass_loop_split);
>>>   NEXT_PASS (pass_record_bounds);
>>>   NEXT_PASS (pass_loop_distribution);
>>>   NEXT_PASS (pass_copy_prop);
>>> Index: opts.c
>>> ===
>>> --- opts.c  (revision 231115)
>>> +++ opts.c  (working copy)
>>> @@ -532,6 +532,7 @@ static const struct default_options