Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Jakub Jelinek
On Thu, Nov 12, 2015 at 11:51:33PM +0300, Ilya Verbin wrote:
> I'm unable to reproduce the hang (have tried various values of 
> OMP_NUM_THREADS).
> The testcase just aborts at (a != 50 || b != 4 || c != 20), because
> a == 37, b == 12, c == 40.

The hang occurred with an fprintf (stderr, "...\n"); inside of the parallel
regions.  Anyway, I still can't reproduce the target-32.c crash; the
target-33.c abort is due to a thinko (a, b and c were firstprivate in the
explicit tasks, so no wonder they saw the previous values).  See the
following incremental patch.
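For reference, a minimal standalone illustration of the data-sharing
difference (not taken from the testcase):

  #include <stdlib.h>

  int
  main ()
  {
    int a = 1;
    #pragma omp task firstprivate (a)  /* the task updates its own copy of a */
    a += 10;
    #pragma omp task shared (a)        /* the task updates the original a */
    a += 10;
    #pragma omp taskwait
    if (a != 11)                       /* only the shared update is visible */
      abort ();
    return 0;
  }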

> BTW, I don't know if this is a bug or not:
> Conditional jump or move depends on uninitialised value(s)
>at 0x4C2083D: priority_queue_insert (priority_queue.h:347)
>by 0x4C24DF9: GOMP_PLUGIN_target_task_completion (task.c:678)

This is due to uninitialized task->priority for the target task.  See below.

Now I'm fighting target-34.c test hangs; strangely, it hangs even with the
host fallback.

For the offloading case, I actually see a problematic spot, namely that
GOMP_PLUGIN_target_task_completion could finish too early and take the
task_lock before the thread that ran gomp_target_task_fn has done map_vars
+ async_run for it.  I bet I need to add further ttask state kinds and deal
with that case (so GOMP_PLUGIN_target_task_completion would just take the
task lock and tweak the ttask state if the task has not been added to the
queues yet).
Plus I think I want to improve the case where we are not waiting: in
gomp_create_target_task, if we are not waiting for dependencies, schedule
gomp_target_task_fn manually.

--- libgomp/testsuite/libgomp.c/target-33.c 2015-11-12 16:20:14.0 +0100
+++ libgomp/testsuite/libgomp.c/target-33.c 2015-11-13 09:45:27.174427034 +0100
@@ -61,10 +61,10 @@
 a = a + 4;
 c >>= 1;
   }
-  #pragma omp task if (0) depend (in: d[3], d[4])
+  #pragma omp task if (0) depend (in: d[3], d[4]) shared (a, b, c)
   if (a != 50 || b != 4 || c != 20)
 abort ();
-  #pragma omp task
+  #pragma omp task shared (a)
   a += 50;
   #pragma omp target nowait map (tofrom: b)
   b++;
--- libgomp/task.c  2015-11-12 16:24:19.127548800 +0100
+++ libgomp/task.c  2015-11-13 10:53:19.525519366 +0100
@@ -538,6 +538,7 @@
  + sizeof (unsigned short))
  + tgt_size);
   gomp_init_task (task, parent, gomp_icv (false));
+  task->priority = 0;
   task->kind = GOMP_TASK_WAITING;
   task->in_tied_task = parent->in_tied_task;
   task->taskgroup = taskgroup;
--- libgomp/testsuite/libgomp.c/target-34.c.jj  2015-11-13 08:54:42.607799433 +0100
+++ libgomp/testsuite/libgomp.c/target-34.c 2015-11-13 08:54:37.865866795 +0100
@@ -0,0 +1,12 @@
+#define main do_main
+#include "target-33.c"
+#undef main
+
+int
+main ()
+{
+  #pragma omp parallel
+  #pragma omp single
+  do_main ();
+  return 0;
+}


Jakub


Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-13 Thread Richard Biener
On Fri, Nov 13, 2015 at 7:31 AM, Bin.Cheng  wrote:
> On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law  wrote:
>> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>>
>>>
>>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>>
>>>
>>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>>> From: Ajit Kumar Agarwal
>>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>>> inside
>>>   loop for LICM and IVOPTS.
>>>
>>> Changes are made in loop invariant motion (LICM) at the RTL level and also
>>> in the induction variable optimization based on the SSA representation.
>>> The logic used in LICM for registers used inside the loops is changed: the
>>> live-out set of the loop latch node and the live-in set of the destinations
>>> of the exit edges are used to set the loop's liveness at the exit of the
>>> loop.  The number of registers used is the number of live variables at the
>>> exit of the loop, calculated as above.
>>>
>>> For the induction variable optimization on the tree SSA representation, the
>>> registers-used logic is based on the number of PHI nodes at the loop header
>>> to represent the liveness in the loop.  The current logic used only the
>>> number of PHI nodes at the loop header.  Changes are made to treat the PHI
>>> operands as also live in the loop; thus the number of PHI operands is also
>>> added to the number of registers used.
>>>
>>> ChangeLog:
>>> 2015-10-09  Ajit Agarwal
>>>
>>> * loop-invariant.c (compute_loop_liveness): New.
>>> (determine_regs_used): New.
>>> (find_invariants_to_move): Use of determine_regs_used.
>>> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>> arguments for register used.
>>
>> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
>> loop-invariant change is still pending, right?
> Ah, "reject" is a strong word; I am just being dumb and don't understand
> yet why it's a generally better estimation.
> Maybe Richard has some input here?

Not really.  I agree with Bin that the change doesn't look like an improvement
by design (might be one by accident for some benchmarks).

Richard.

> Thanks,
> bin
>>
>>
>>>
>>> Signed-off-by: Ajit Agarwal ajit...@xilinx.com
>>> ---
>>>   gcc/loop-invariant.c   | 72
>>> +-
>>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 52c8ae8..e4291c9 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>>   }
>>>   }
>>>
>>> +static int
>>> +determine_regs_used()
>>> +{
>>> +  unsigned int j;
>>> +  unsigned int reg_used = 2;
>>> +  bitmap_iterator bi;
>>> +
>>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>>> +    (reg_used)++;
>>> +
>>> +  return reg_used;
>>> +}
>>
>> Isn't this just bitmap_count_bits (regs_live) + 2?
>>
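(For illustration, the suggestion amounts to roughly the following, assuming
the LOOP_DATA accessor and regs_live bitmap from the patch above:)

   static int
   determine_regs_used (void)
   {
     /* The two registers the original code starts from, plus the
        population count of the live set.  */
     return bitmap_count_bits (&LOOP_DATA (curr_loop)->regs_live) + 2;
   }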
>>
>>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>>   }
>>>   }
>>>
>>> -
>>> +static void
>>> +calculate_loop_liveness (void)
>>
>> Needs a function comment.
>>
>>
>>> +{
>>> +  basic_block bb;
>>> +  struct loop *loop;
>>>
>>> -/* Move the invariants out of the loops.  */
>>> +  FOR_EACH_LOOP (loop, 0)
>>> +if (loop->aux == NULL)
>>> +  {
>>> +    loop->aux = xcalloc (1, sizeof (struct loop_data));
>>> +    bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>>> + }
>>> +
>>> +  FOR_EACH_BB_FN (bb, cfun)
>>
>> Why loop over blocks here?  Why not just iterate through all the loops in
>> the loop structure.  Order isn't particularly important AFAICT for this
>> code.
>>
>>
>>
>>> +   {
>>> + int  i;
>>> + edge e;
>>> + vec<edge> edges;
>>> + edges = get_loop_exit_edges (loop);
>>> + FOR_EACH_VEC_ELT (edges, i, e)
>>> + {
>>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
>>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>
>> Space before the open-paren in the previous two lines
>> DF_LR_OUT (e->src) and DF_LR_IN (e->dest))
>>
>>
>>> + }
>>> +  }
>>> +  }
>>> +}
>>> +
>>> +/* Move the invariants  ut of the loops.  */
>>
>> Looks like you introduced a typo.
>>
>> I'd like to see testcases which show the change in # regs used computation
>> helping generate better code.
>>
>> And I'd also like to see some background information on why you think this
>> is a more accurate measure for the number of registers used in the loop.
>> regs_used AFAICT is supposed to be an estimate of the registers live around
>> the loop.  So ISTM that you get that value from the live-out set on the
>> backedge of the loop.  I guess you get something similar by looking at the
>> exit edge's 

Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-13 Thread Richard Biener
On Thu, Nov 12, 2015 at 4:33 PM, David Malcolm  wrote:
> On Thu, 2015-11-12 at 15:06 +0100, Richard Biener wrote:
>> On Thu, Nov 12, 2015 at 3:04 PM, Richard Biener
>>  wrote:
>> > On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  
>> > wrote:
>> >> On 12/11/15 13:26, Richard Biener wrote:
>> >>>
>> >>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
>> >>> wrote:
>> 
>>  Hi,
>> 
>>  [ See also related discussion at
>>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
>> 
>>  this patch removes the usage of first_pass_instance from pass_vrp.
>> 
>>  the patch:
>>  - limits itself to pass_vrp, but my intention is to remove all
>> usage of first_pass_instance
>>  - lacks an update to gdbhooks.py
>> 
 Modifying the pass behaviour depending on the instance number, as
 first_pass_instance does, breaks compositionality of the pass list.  In
 other words, adding a pass instance to a pass list may change the behaviour
 of another instance of that pass in the pass list, which obviously makes it
 harder to understand and change the pass list.  [ I've filed this issue as
 PR68247 - Remove pass_first_instance ]
>> 
>>  The solution is to make the difference in behaviour explicit in the pass
>>  list, and no longer change behaviour depending on instance number.
>> 
>>  One obvious possible fix is to create a duplicate pass with a different
>>  name, say 'pass_vrp_warn_array_bounds':
>>  ...
>> NEXT_PASS (pass_vrp_warn_array_bounds);
>> ...
>> NEXT_PASS (pass_vrp);
>>  ...
>> 
>>  But, AFAIU that requires us to choose a different dump-file name for 
>>  each
>>  pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>>  -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
>> 
>>  This patch instead makes pass creation parameterizable. So in the pass
>>  list,
>>  we use:
>>  ...
>> NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
>> ...
>> NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>>  ...
>> 
>>  This approach gives us clarity in the pass list, similar to using a
>>  duplicate pass 'pass_vrp_warn_array_bounds'.
>> 
>>  But it also means -fdump-tree-vrp still works as before.
>> 
>>  Good idea? Other comments?
>> >>>
>> >>>
>> >>> It's good to get rid of the first_pass_instance hack.
>> >>>
>> >>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd 
>> >>> hoped
>> >>> we can just use NEXT_PASS with the extra argument being optional...
>> >>
>> >>
>> >> I suppose I could use NEXT_PASS in the pass list, and expand into
>> >> NEXT_PASS_WITH_ARG in pass-instances.def.
>> >>
>> >> An alternative would be to change the NEXT_PASS macro definitions into
>> >> vararg variants. But the last time I submitted something with a vararg 
>> >> macro
>> >> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
>> >> question about it ( 
>> >> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
>> >> ), so I tend to avoid using vararg macros.
>> >>
>> >>> I don't see the need for giving clone_with_args a new name, just use an
>> >>> overload
>> >>> of clone ()?
>> >>
>> >>
>> >> That's what I tried initially, but I ran into:
>> >> ...
>> >> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
>> >> was hidden [-Woverloaded-virtual]
>> >>virtual opt_pass *clone ();
>> >>  ^
>> >> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
>> >> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
>> >>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
>> >> (m_ctxt, warn_array_bounds_p); }
>> >> ...
>> >>
>> >> Googling the error message gives this discussion: (
>> >> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
>> >> ), and indeed adding
>> >>   "using gimple_opt_pass::clone;"
>> >> in class pass_vrp makes the warning disappear.
>> >>
>> >> I'll submit an updated version.
>> >
>> > Hmm, but actually the above means the pass does not expose the
>> > non-argument clone
>> > which is good!
>> >
>> > Or did you forget to add the virtual-with-arg variant to opt_pass?
>> > That is, why's it
>> > a virtual function in the first place?  (clone_with_arg)
>>
>> That said,
>>
>> --- a/gcc/tree-pass.h
>> +++ b/gcc/tree-pass.h
>> @@ -83,6 +83,7 @@ public:
>>
>>   The default implementation prints an error message and aborts.  */
>>virtual opt_pass *clone ();
>> +  virtual opt_pass *clone_with_arg (bool);
>>
>>
>> means the arg type is fixed at 'bool' (yeah, mimicking
>> first_pass_instance).  
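For reference, a minimal standalone sketch of the -Woverloaded-virtual
situation discussed above (generic names, not the actual GCC classes):

  struct opt_pass_like
  {
    virtual ~opt_pass_like () {}
    virtual opt_pass_like *clone () { return 0; }
  };

  struct pass_vrp_like : opt_pass_like
  {
    /* Without this using-declaration, the one-argument overload below
       hides opt_pass_like::clone () and g++ warns with
       -Woverloaded-virtual.  */
    using opt_pass_like::clone;

    opt_pass_like *clone (bool) { return new pass_vrp_like (); }
  };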

Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-13 Thread Richard Biener
On Thu, Nov 12, 2015 at 4:07 PM, Andre Vieira
 wrote:
> Hi,
>
>   This patch changes this testcase to make sure LTO will not optimize away
> the assignment of the local array to a global variable which was introduced
> to make sure stack space was made available for the test to work.
>
>   This is correct because LTO is supposed to optimize this global away as at
> link time it knows this global will never be read. By adding a read of the
> global, LTO will no longer optimize it away.

But that's only because we can't see that bar doesn't clobber it, else
we would optimize away the check and get here again.  Much better
to mark 'dummy' with __attribute__((used)) and do away with 'g' entirely.

Richard.

>   Tested by running regressions for this testcase for various ARM targets.
>
>   Is this OK to commit?
>
>   Thanks,
>   Andre Vieira
>
> gcc/testsuite/ChangeLog:
> 2015-11-06  Andre Vieira  
>
> * gcc.dg/torture/stackalign/builtin-return-1.c: Added read
>   to global such that a write is not optimized away by LTO.


PR68264: Use unordered comparisons for tree-call-cdce.c

2015-11-13 Thread Richard Sandiford
As reported in PR 68264, tree-call-cdce.c should be using unordered
comparisons for the range checks, in order to avoid raising FE_INVALID
for quiet NaNs.
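A minimal standalone illustration of the difference (not part of the patch;
link with -lm on glibc):

  #include <fenv.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double x = __builtin_nan ("");
    volatile int r;

    feclearexcept (FE_ALL_EXCEPT);
    r = x <= 0;                        /* ordered: raises FE_INVALID on a quiet NaN */
    printf ("ordered:   FE_INVALID %s\n",
            fetestexcept (FE_INVALID) ? "set" : "clear");

    feclearexcept (FE_ALL_EXCEPT);
    r = __builtin_islessequal (x, 0);  /* unordered: no exception */
    printf ("unordered: FE_INVALID %s\n",
            fetestexcept (FE_INVALID) ? "set" : "clear");
    return 0;
  }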

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  The test failed on
aarch64-linux-gnu before the patch, but not on x86_64-linux-gnu, because
x86_64 already used unordered comparison instructions for the previous
(ordered) tree codes.

OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68264
* tree-call-cdce.c (gen_one_condition): Update commentary.
(gen_conditions_for_pow_int_base): Invert the sense of the tests
passed to gen_one_condition.
(gen_conditions_for_domain): Likewise.  Use unordered comparisons.
(shrink_wrap_one_built_in_call): Invert the sense of the tests,
using EDGE_FALSE_VALUE for edges to the call block and
EDGE_TRUE_VALUE for the others.

gcc/testsuite/
PR tree-optimization/68264
* gcc.dg/torture/pr68264.c: New test.

diff --git a/gcc/testsuite/gcc.dg/torture/pr68264.c b/gcc/testsuite/gcc.dg/torture/pr68264.c
new file mode 100644
index 000..c4e85e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr68264.c
@@ -0,0 +1,102 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fenv_exceptions } */
+
+#include <errno.h>
+#include <fenv.h>
+#include <math.h>
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define LARGE_NEG_MAYBE_ERANGE 0x01
+#define LARGE_NEG_ERANGE   0x02
+#define LARGE_POS_ERANGE   0x04
+#define LARGE_NEG_EDOM 0x08
+#define LARGE_POS_EDOM 0x10
+
+#define LARGE_ERANGE (LARGE_NEG_ERANGE | LARGE_POS_ERANGE)
+#define LARGE_EDOM (LARGE_NEG_EDOM | LARGE_POS_EDOM)
+#define POWER_ERANGE (LARGE_NEG_MAYBE_ERANGE | LARGE_POS_ERANGE)
+
+#define TEST(CALL, FLAGS) (CALL, tester (FLAGS))
+
+volatile double d;
+volatile int i;
+
+static void (*tester) (int);
+
+void
+check_quiet_nan (int flags __attribute__ ((unused)))
+{
+  if (fetestexcept (FE_ALL_EXCEPT))
+abort ();
+  if (errno)
+abort ();
+}
+
+void
+check_large_neg (int flags)
+{
+  if (flags & LARGE_NEG_MAYBE_ERANGE)
+return;
+  int expected_errno = (flags & LARGE_NEG_ERANGE ? ERANGE
+   : flags & LARGE_NEG_EDOM ? EDOM
+   : 0);
+  if (expected_errno != errno)
+abort ();
+  errno = 0;
+}
+
+void
+check_large_pos (int flags)
+{
+  int expected_errno = (flags & LARGE_POS_ERANGE ? ERANGE
+   : flags & LARGE_POS_EDOM ? EDOM
+   : 0);
+  if (expected_errno != errno)
+abort ();
+  errno = 0;
+}
+
+void
+test (void)
+{
+  TEST (acos (d), LARGE_EDOM);
+  TEST (asin (d), LARGE_EDOM);
+  TEST (acosh (d), LARGE_NEG_EDOM);
+  TEST (atanh (d), LARGE_EDOM);
+  TEST (cosh (d), LARGE_ERANGE);
+  TEST (sinh (d), LARGE_ERANGE);
+  TEST (log (d), LARGE_NEG_EDOM);
+  TEST (log2 (d), LARGE_NEG_EDOM);
+  TEST (log10 (d), LARGE_NEG_EDOM);
+  /* Disabled due to glibc PR 6792, fixed in Apr 2015.  */
+  if (0)
+TEST (log1p (d), LARGE_NEG_EDOM);
+  TEST (exp (d), POWER_ERANGE);
+  TEST (exp2 (d), POWER_ERANGE);
+  TEST (expm1 (d), POWER_ERANGE);
+  TEST (sqrt (d), LARGE_NEG_EDOM);
+  TEST (pow (100.0, d), POWER_ERANGE);
+  TEST (pow (i, d), POWER_ERANGE);
+}
+
+int
+main (void)
+{
+  errno = 0;
+  i = 100;
+  d = __builtin_nan ("");
+  tester = check_quiet_nan;
+  feclearexcept (FE_ALL_EXCEPT);
+  test ();
+
+  d = -1.0e80;
+  tester = check_large_neg;
+  errno = 0;
+  test ();
+
+  d = 1.0e80;
+  tester = check_large_pos;
+  errno = 0;
+  test ();
+}
diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
index a5f38ce..fbcc70b 100644
--- a/gcc/tree-call-cdce.c
+++ b/gcc/tree-call-cdce.c
@@ -51,10 +51,11 @@ along with GCC; see the file COPYING3.  If not see
  built_in_call (args)
 
 An actual simple example is :
- log (x);   // Mostly dead call
+log (x);   // Mostly dead call
  ==>
- if (x <= 0)
- log (x);
+if (__builtin_islessequal (x, 0))
+log (x);
+
  With this change, call to log (x) is effectively eliminated, as
  in majority of the cases, log won't be called with x out of
  range.  The branch is totally predictable, so the branch cost
@@ -306,15 +307,13 @@ is_call_dce_candidate (gcall *call)
 }
 
 
-/* A helper function to generate gimple statements for
-   one bound comparison.  ARG is the call argument to
-   be compared with the bound, LBUB is the bound value
-   in integer, TCODE is the tree_code of the comparison,
-   TEMP_NAME1/TEMP_NAME2 are names of the temporaries,
-   CONDS is a vector holding the produced GIMPLE statements,
-   and NCONDS points to the variable holding the number
-   of logical comparisons.  CONDS is either empty or
-   a list ended with a null tree.  */
+/* A helper function to generate gimple statements for one bound
+   comparison, so that the built-in function is called whenever
+   TCODE <ARG, LBUB> is *false*.  TEMP_NAME1/TEMP_NAME2 are names
+   of the temporaries, CONDS is a vector 

Re: Use combined_fn in tree-vrp.c

2015-11-13 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 10, 2015 at 1:09 AM, Bernd Schmidt  wrote:
>> On 11/07/2015 01:46 PM, Richard Sandiford wrote:
>>>
>>> @@ -3814,8 +3817,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
>>>   break;
>>>   /* Both __builtin_ffs* and __builtin_popcount return
>>>  [0, prec].  */
>>> -   CASE_INT_FN (BUILT_IN_FFS):
>>> -   CASE_INT_FN (BUILT_IN_POPCOUNT):
>>> +   CASE_CFN_FFS:
>>> +   CASE_CFN_POPCOUNT:
>>>   arg = gimple_call_arg (stmt, 0);
>>>   prec = TYPE_PRECISION (TREE_TYPE (arg));
>>>   mini = 0;
>>
>>
>> So let me see if I understood this. From what we discussed the purpose of
>> these new internal functions is that they can have vector types. If so,
>> isn't this code (here and elsewhere) which expects integers potentially
>> going to be confused?
>
> We indeed need to add additional checks to most users of CASE_CFN_* to cover
> the bigger freedom that exists with respect to types.

The code above is OK because it's only handling integral types.
A vector popcount or vector ffs must return a vector result.

> Richard, please audit all the cases you change for that.

I had another look and the only problematical uses I could see are the
match.pd ones.  E.g.:

 /* Optimize logN(func()) for various exponential functions.  We
want to determine the value "x" and the power "exponent" in
order to transform logN(x**exponent) into exponent*logN(x).  */
 (for logs (LOG  LOG   LOG   LOG2 LOG2  LOG2  LOG10 LOG10)
  exps (EXP2 EXP10 POW10 EXP  EXP10 POW10 EXP   EXP2)
  (simplify
   (logs (exps @0))
   (with {
 tree x;
 switch (exps)
   {
   CASE_CFN_EXP:
 /* Prepare to do logN(exp(exponent)) -> exponent*logN(e).  */
 x = build_real_truncate (type, dconst_e ());
 break;
   CASE_CFN_EXP2:
 /* Prepare to do logN(exp2(exponent)) -> exponent*logN(2).  */
 x = build_real (type, dconst2);
 break;
   CASE_CFN_EXP10:
   CASE_CFN_POW10:
 /* Prepare to do logN(exp10(exponent)) -> exponent*logN(10).  */
 {
   REAL_VALUE_TYPE dconst10;
    real_from_integer (&dconst10, VOIDmode, 10, SIGNED);
   x = build_real (type, dconst10);
 }
 break;
   default:
 gcc_unreachable ();
   }
 }
(mult (logs { x; }) @0

Here we could either add a SCALAR_FLOAT_TYPE_P check or extend
build_real to handle vector types.  Which do you think would be best?

Thanks,
Richard



Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-13 Thread Richard Biener
On Fri, 13 Nov 2015, Tom de Vries wrote:

> On 13/11/15 09:46, Richard Biener wrote:
> > On Thu, 12 Nov 2015, Tom de Vries wrote:
> > 
> > > On 11/11/15 12:00, Jakub Jelinek wrote:
> > > > On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:
> > > > > > The option -foffload-alias=pointer instructs the compiler to assume
> > > > > > that
> > > > > > objects referenced in an offload region do not alias.
> > > > > > 
> > > > > > The option -foffload-alias=all instructs the compiler to make no
> > > > > > assumptions about aliasing in offload regions.
> > > > > > 
> > > > > > The default value is -foffload-alias=none.
> > > > > 
> > > > > I think global options for this is nonsense.  Please follow what
> > > > > we do for #pragma GCC ivdep for example, thus allow the alias
> > > > > behavior to be specified per "region" (whatever makes sense here
> > > > > in the context of offloading).
> > > 
> > > So, IIUC, instead of a global option foffload-alias, you're saying
> > > something
> > > like the following would be acceptable:
> > > ...
> > > #pragma GCC offload-alias=
> > > #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
> > >{
> > >  #pragma acc loop
> > >  for (COUNTERTYPE ii = 0; ii < N; ii++)
> > >c[ii] = a[ii] + b[ii];
> > >}
> > > ...
> > > ?
> > > 
> > > I suppose that would work (though a global option would allow us to easily
> > > switch between none/pointer/all values in a large number of files,
> > > something
> > > that might be useful when f.i. running an openacc  test suite).
> > > 
> > > > Yeah, completely agreed.  I don't see why the offloaded region would be
> > > > in
> > > > any way special, they are C/C++/Fortran code as any other.
> > > > What we can and should improve is teach IPA aliasing/points to analysis
> > > > about the way we lower the host vs. offloading region boundary, so that
> > > > if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
> > > > determines something it can be used on the offloaded function side and
> > > > vice
> > > > versa,
> > > 
> > > I agree this would be a nice way to solve the aliasing info problem, but
> > > considering the remark of Richard at
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46032#c19 :
> > > ...
> > > Not that I think IPA PTA is anywhere near production ready
> > 
> > Just to clarify on that sentence:
> >   1) we lack good testing coverage for IPA PTA so wrong-code bugs might
> > still exist
> >   2) IPA PTA can use a _lot_ of memory and compile-time
> >   3) for existing wrong-code issues I have merely dumbed down the
> > use of the analysis result resulting in weaker alias analysis compared to
> > the local PTA (for some cases)
> > 
> > Because of 2) and no good way to avoid this I decided to not make
> > fixing 3) a priority (and 1) still holds).
> > 
> 
> Hi,
> 
> thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta issues'.
> 
> Any feedback on the '#pragma GCC offload-alias=' bit above?
> Is that sort of what you had in mind?

Yes.  Whether that makes sense is another question of course.  You can
annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
as well if you know the dependences without the user's intervention.

Richard.

> Thanks,
> - Tom
> 
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] RFC: Enable graphite at -O3 -fprofile_use

2015-11-13 Thread Richard Biener
On Fri, 13 Nov 2015, VandeVondele  Joost wrote:

> I'm all in favour of requiring isl and enabling graphite by default, but 
> would suggest to enable it with -Ofast instead.
> 
> One reason is that certainly extracting testcases from a PGO build is 
> more difficult, and initially there will certainly be miscompiles with 
> graphite (CP2K is right now).
> 
> Furthermore, unless graphite is particularly effective with PGO (does it 
> use average loop trip counts already?), I don't see a particular 
> connection.

The reason to choose FDO was so GRAPHITE can concentrate its computing
budget on the hot parts of a program (which profile estimation isn't
good enough at identifying), reducing its compile-time cost.

-Ofast isn't supposed to enable passes over -O3, so you're effectively
suggesting to enable it at -O3, which I think is a bit premature.  But we
can try doing that and revert at the end of stage3 if the problems turn
out to be too big.

Richard.


Re: [patch] GSoC: Implement std::experimental::shared_ptr

2015-11-13 Thread Jonathan Wakely

On 13/11/15 11:09 +, Jonathan Wakely wrote:

This is the other piece of work done by Fan You for the Google Summer
of Code (and mentored by Tim).

This implements experimental::shared_ptr from the Library Fundamentals
TS, which differs from std::shared_ptr by supporting arrays, i.e.
shared_ptr<T[]> and shared_ptr<T[N]> behave correctly, using
delete[] to free the managed memory, and providing operator[] instead
of operator* and operator->.
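A minimal usage sketch (assuming the <experimental/memory> header):

  #include <experimental/memory>

  int
  main ()
  {
    std::experimental::shared_ptr<int[]> p (new int[4]{1, 2, 3, 4});
    int x = p[2];   // operator[] instead of operator* / operator->
    (void) x;
  }                 // the array is freed with delete[]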

I made a few changes to Fan You's patch:

- moved __libfund_v1 tag type and new partial specializations to
<experimental/bits/shared_ptr.h>, so all changes are in the experimental dir.

- added a second template parameter to the tag type to distinguish
arrays from non-arrays, so we can have two partial specializations,
__shared_ptr<__libfund_v1> and __shared_ptr<__libfund_v1>, with slightly different interfaces.

- added std::hash specialization.

- used remove_extent_t, enable_if_t etc alias templates.

- removed the _Weak_friend helper class.

- fixed some tests that used operator-> on shared_ptr.


Thanks very much to Fan You, and to Tim Shen and Google.

Tested powerpc64le-linux, committed to trunk.


Oops, I made a small error in the doxygen @file comment, fixed with
this patch. Committed to trunk.

commit 5fb8a8b7401e0374d34574db0398639b84c36b6e
Author: Jonathan Wakely 
Date:   Fri Nov 13 11:16:51 2015 +

	* include/experimental/bits/shared_ptr.h: Tweak comments.

diff --git a/libstdc++-v3/include/experimental/bits/shared_ptr.h b/libstdc++-v3/include/experimental/bits/shared_ptr.h
index feba7d7..413652d 100644
--- a/libstdc++-v3/include/experimental/bits/shared_ptr.h
+++ b/libstdc++-v3/include/experimental/bits/shared_ptr.h
@@ -24,7 +24,7 @@
 
 /** @file experimental/bits/shared_ptr.h
  *  This is an internal header file, included by other library headers.
- *  Do not attempt to use it directly. @headername{memory}
+ *  Do not attempt to use it directly. @headername{experimental/memory}
  */
 
 #ifndef _GLIBCXX_EXPERIMENTAL_SHARED_PTR_H
@@ -57,7 +57,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /*
* The specification of std::experimental::shared_ptr is slightly different
-   * to std::shared_ptr (specifically in terms of pointer "compatibility") so
+   * to std::shared_ptr (specifically in terms of "compatible" pointers) so
* to implement std::experimental::shared_ptr without too much duplication
* we make it derive from a partial specialization of std::__shared_ptr
* using a special tag type, __libfund_v1.


Re: [PATCH] Fix ICE for boolean comparison

2015-11-13 Thread Ilya Enkovich
2015-11-13 14:28 GMT+03:00 Richard Biener :
> On Fri, Nov 13, 2015 at 11:52 AM, Ilya Enkovich  
> wrote:
>> 2015-11-13 13:38 GMT+03:00 Richard Biener :
>>> On Thu, Nov 12, 2015 at 4:44 PM, Ilya Enkovich  
>>> wrote:
 Hi,

 Currently the compiler may ICE when a loaded boolean is compared with a
 vector invariant or another boolean value.  This is because we don't detect
 a mix of bool and non-bool vectypes and incorrectly determine the vectype
 for a boolean loop invariant used in a comparison.  This was fixed for
 COND_EXPR before but also needs to be fixed for comparisons.  This patch
 was bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>>
>>> Hmm, so this disables vectorization in these cases.  Isn't this a
>>> regression?  Shouldn't we simply "materialize"
>>> the non-bool vector from the boolean one say, with
>>>
>>>  vec = boolvec ? {-1, -1 ... } : {0, 0, 0 ...}
>>
>> We may do this using patterns, but we should still catch the cases the
>> patterns don't handle.  Patterns don't have vectypes computed and
>> therefore may miss such cases, so the stability fix is still valid.
>>
>> I don't think we have a compiler version which can vectorize
>> simd-bool-comparison-2.cc, thus technically it is not a regression.
>> There are also other similar cases, e.g. a store of a comparison result
>> or the use of a loaded boolean as a predicate.  I was going to support
>> vectorization for such cases later (it seems I won't hit stage1 for them
>> and I am not sure if it will be OK for stage3).
>
> I still think those checks show that there is an issue we should fix
> differently.  We're accumulating more mess into the already messy
> vectorizer :(

Right.  Earlier vectype computation would let us reveal such cases more easily.

>
> Ok.

Thanks!
Ilya

>
> Thanks,
> Richard.
>
>> Ilya
>>
>>>
>>> ?
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks,
 Ilya


[Ada] Iterable aspect for an integer type

2015-11-13 Thread Arnaud Charlet
This patch fixes a spurious error on an iterator loop over an integer type
on which the Iterable aspect has been specified. Analysis of the loop uses
the base type to find the required primitive operations, but the signature
of the First primitive and others uses the first subtype instead.


Executing

gnatmake -q vect1
vect1

must yield:

iteration over cursors
 0
 1
 2
 3
 4
iteration over elements
 0
 1
 2
 3
 4

---
with Text_IO; use Text_IO;
procedure Vect1 is
  package Iter is
  type Vector_Id is new Natural with
Iterable => (First => First,
 Next  => Next,
 Has_Element => Has_Element,
 Element => Element);

  type Cursor is new Natural;

  function First (Vector : Vector_Id) return Cursor;
  function Next (Vector : Vector_Id; Position : Cursor) return Cursor;
  function Has_Element (Vector : Vector_Id; Position : Cursor) return Boolean;
  function Element (Vector : Vector_Id; Position : Cursor) return Natural;
  end Iter;

  package body Iter is
  function First (Vector : Vector_Id) return Cursor is
  begin
 return 0;
  end First;

  function Next (Vector : Vector_Id; Position : Cursor) return Cursor is
  begin
 return Position + 1;
  end Next;

  function Has_Element (Vector : Vector_Id; Position : Cursor) return Boolean
  is
  begin
 return Position < Cursor (Vector);
  end Has_Element;

  function Element (Vector : Vector_Id; Position : Cursor) return Natural is
  begin
 return Natural (Position);
  end Element;
  end Iter;
  use Iter;

  V : Vector_Id;
begin
   V := 5;
   Put_Line ("iteration over cursors");
   for I in V loop
  put_line (integer'image (Integer (I)));
  null;
   end loop;

   Put_Line ("iteration over elements");
   for I of V loop
  put_line (integer'image (Integer (I)));
  null;
   end loop;
end Vect1;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-13  Ed Schonberg  

* sem_util.adb (Get_Cursor_Type): To determine whether a function
First is the proper Iterable primitive, use the base type of the
first formal rather than the type. This is needed in the unusual
case where the Iterable aspect is specified for an integer type.

Index: sem_util.adb
===
--- sem_util.adb(revision 230301)
+++ sem_util.adb(working copy)
@@ -7553,13 +7553,16 @@
   Cursor := Any_Type;
 
   --  Locate function with desired name and profile in scope of type
+  --  In the rare case where the type is an integer type, a base type
+  --  is created for it, check that the base type of the first formal
+  --  of First matches the base type of the domain.
 
   Func := First_Entity (Scope (Typ));
   while Present (Func) loop
  if Chars (Func) = Chars (First_Op)
and then Ekind (Func) = E_Function
and then Present (First_Formal (Func))
-   and then Etype (First_Formal (Func)) = Typ
+   and then Base_Type (Etype (First_Formal (Func))) = Base_Type (Typ)
and then No (Next_Formal (First_Formal (Func)))
  then
 if Cursor /= Any_Type then


Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:
> > thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta issues'.
> > 
> > Any feedback on the '#pragma GCC offload-alias=' bit 
> > above?
> > Is that sort of what you had in mind?
> 
> Yes.  Whether that makes sense is another question of course.  You can
> annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
> as well if you know the dependences without the user's intervention.

I really don't like even the GCC offload-alias, I just don't see anything
special on the offload code.  Not to mention that the same issue is already
with other outlined functions, like OpenMP tasks or parallel regions, those
aren't offloaded, yet they can suffer from worse alias/points-to analysis
too.

We simply have some compiler internal interface between the caller and
callee of the outlined regions, each interface in between those has
its own structure type used to communicate the info;
we can attach attributes on the fields, or some flags to indicate some
properties interesting from aliasing POV.  We don't really need to perform
full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
the relationship in between such callers and callees (for offloading regions
we already have "omp target entrypoint" attribute on the callee and a
singler caller), tell LTO if possible not to split those into different
partitions if easily possible, and then just for these pairs perform
aliasing/points-to analysis in the caller and the result record using
cliques/special attributes/whatever to the callee side, so that the callee
(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.

Jakub


Re: [PATCH][ARM] PR 49526: Add support for smmul,smmla,smmls instructions

2015-11-13 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00686.html

Thanks,
Kyrill

On 06/11/15 17:05, Kyrill Tkachov wrote:

Hi all,

This patch introduces support for the smmul, smmla and smmls instructions that 
appear in armv6 architecture levels
and higher. To quote the SMMUL description from the ARMARM:
"Signed Most Significant Word Multiply multiplies two signed 32-bit values, 
extracts the most significant 32 bits of
the result, and writes those bits to the destination register."

The smmla and smmls are the multiply-accumulate and multiply-subtract 
extensions of that multiply.
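For illustration, the kind of C idiom these patterns are expected to match is
roughly the following (a sketch, not taken from the new tests):

  int
  most_significant_mul (int a, int b)
  {
    /* smmul: most significant 32 bits of the 64-bit signed product.  */
    return ((long long) a * b) >> 32;
  }

  int
  most_significant_mla (int a, int b, int acc)
  {
    /* smmla: the same product accumulated into acc.  */
    return acc + (int) (((long long) a * b) >> 32);
  }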
There also exists an smmulr variant that rounds the result rather than 
truncating it.
However, when I tried adding patterns for those forms I got an LRA ICE that I 
was not able to figure out.
I'll try to find a testcase for that, but in the meantime there's no reason to 
not have patterns for the
non-rounding variants.

Bootstrapped and tested on arm-none-linux-gnueabihf.
I've seen this trigger in quite a few places in SPEC2006 where it always made 
the code better.

Ok for trunk?

Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

PR target/49526
* config/arm/arm.md (*mulsidi3si_v6): New pattern.
(*mulsidi3siaddsi_v6): Likewise.
(*mulsidi3sisubsi_v6): Likewise.
* config/arm/predicates.md (subreg_highpart_operator):
New predicate.

2015-11-06  Kyrylo Tkachov  

PR target/49526
* gcc.target/arm/pr49526_1.c: New test.
* gcc.target/arm/pr49526_2.c: Likewise.
* gcc.target/arm/pr49526_3.c: Likewise.




Re: [PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-13 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00581.html

Thanks,
Kyrill
On 06/11/15 10:46, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the vector setmem expansion and 
arm_block_set_aligned_vect in particular
use the wrong offset when calling adjust_automodify_address. In the attached 
testcase during the
initial zeroing out we get two V16QI stores, but they both are recorded by 
adjust_automodify_address
as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28).

This led to the scheduling pass moving the store from "x.g = 2;" to before the 
zeroing stores.

This patch fixes the problem by keeping track of the offset to which stores are 
emitted and
passing it to adjust_automodify_address as appropriate.
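In outline, the fix keeps a running byte offset alongside the emitted stores
and hands it to adjust_automodify_address, roughly like this (a sketch of the
idea with assumed variable names, not the actual patch):

  unsigned HOST_WIDE_INT offset = 0;
  ...
  mem = adjust_automodify_address (dstbase, mode, dst, offset);
  emit_insn (gen_rtx_SET (mem, reg));
  offset += GET_MODE_SIZE (mode);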

From inspection I see arm_block_set_unaligned_vect also has this issue so I 
performed the same
fix in that function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

This bug appears on GCC 5 too and I'm currently testing this patch there.
Ok to backport to GCC 5 as well?

Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

PR target/68143
* config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
offset from dstbase and use it appropriately in
adjust_automodify_address.
(arm_block_set_aligned_vect): Likewise.

2015-11-06  Kyrylo Tkachov  

PR target/68143
* gcc.target/arm/pr68143_1.c: New test.




[PATCH] Fix PR68306

2015-11-13 Thread Richard Biener

This fixes more of PR68306.  Digging into the details reveals
that we only need to check a single (but the correct one) DR
and that cost modeling also gets this wrong.

Thus fixed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-13  Richard Biener  

PR tree-optimization/68306
* tree-vect-data-refs.c (verify_data_ref_alignment): Move
loop related checks ...
(vect_verify_datarefs_alignment): ... here.
(vect_slp_analyze_and_verify_node_alignment): Compute and
verify alignment of the single DR that it matters.
* tree-vect-stmts.c (vectorizable_store): Add an assert.
(vectorizable_load): Add a comment.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Fix DR used
for determining load cost.

* gcc.dg/pr68306.c: Adjust.
* gcc.dg/pr68306-2.c: New testcase.
* gcc.dg/pr68306-3.c: Likewise.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230293)
--- gcc/tree-vect-data-refs.c   (working copy)
*** verify_data_ref_alignment (data_referenc
*** 920,936 
gimple *stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  
-   /* For interleaving, only the alignment of the first access matters.   */
-   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
-   && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
- return true;
- 
-   /* Strided accesses perform only component accesses, alignment is
-  irrelevant for them.  */
-   if (STMT_VINFO_STRIDED_P (stmt_info)
-   && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
- return true;
- 
supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
if (!supportable_dr_alignment)
  {
--- 920,925 
*** vect_verify_datarefs_alignment (loop_vec
*** 977,982 
--- 966,983 
  
if (!STMT_VINFO_RELEVANT_P (stmt_info))
continue;
+ 
+   /* For interleaving, only the alignment of the first access matters.  */
+   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
+   return true;
+ 
+   /* Strided accesses perform only component accesses, alignment is
+irrelevant for them.  */
+   if (STMT_VINFO_STRIDED_P (stmt_info)
+ && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   return true;
+ 
if (! verify_data_ref_alignment (dr))
return false;
  }
*** vect_analyze_data_refs_alignment (loop_v
*** 2100,2127 
  static bool
  vect_slp_analyze_and_verify_node_alignment (slp_tree node)
  {
!   unsigned i;
!   gimple *stmt;
!   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
  {
!   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
! 
!   /* Strided accesses perform only component accesses, misalignment
!information is irrelevant for them.  */
!   if (STMT_VINFO_STRIDED_P (stmt_info)
! && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
!   continue;
! 
!   data_reference_p dr = STMT_VINFO_DATA_REF (stmt_info);
!   if (! vect_compute_data_ref_alignment (dr)
! || ! verify_data_ref_alignment (dr))
!   {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"not vectorized: bad data alignment in basic "
!"block.\n");
! return false;
!   }
  }
  
return true;
--- 2101,2122 
  static bool
  vect_slp_analyze_and_verify_node_alignment (slp_tree node)
  {
!   /* We vectorize from the first scalar stmt in the node unless
!  the node is permuted in which case we start from the first
!  element in the group.  */
!   gimple *first_stmt = SLP_TREE_SCALAR_STMTS (node)[0];
!   if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
! first_stmt = GROUP_FIRST_ELEMENT (vinfo_for_stmt (first_stmt));
! 
!   data_reference_p dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
!   if (! vect_compute_data_ref_alignment (dr)
!   || ! verify_data_ref_alignment (dr))
  {
!   if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"not vectorized: bad data alignment in basic "
!"block.\n");
!   return false;
  }
  
return true;
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 230293)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_store (gimple *stmt, gimple
*** 5464,5469 
--- 5464,5470 
   group.  */
vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0]; 
+ gcc_assert (GROUP_FIRST_ELEMENT (vinfo_for_stmt (first_stmt)) == first_stmt);
first_dr = 

[PATCH] Fix PR ipa/68311

2015-11-13 Thread Martin Liška
Hello.

Following patch fixes PR68311, can regbootstrap on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin
>From bc07c0709f0601e18b7ea7dfad867a5296378640 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Nov 2015 16:17:52 +0100
Subject: [PATCH] Fix PR ipa/68311

gcc/ChangeLog:

2015-11-12  Martin Liska  

	PR ipa/68311
	* ipa-icf.c (sem_item_optimizer::traverse_congruence_split):
	Replace ctor with auto_vec and initialization in a loop.
---
 gcc/ipa-icf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 7bb3af5..97702c9 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -3038,7 +3038,9 @@ sem_item_optimizer::traverse_congruence_split (congruence_class * const ,
 
   if (popcount > 0 && popcount < cls->members.length ())
 {
-  congruence_class* newclasses[2] = { new congruence_class (class_id++), new congruence_class (class_id++) };
+  auto_vec <congruence_class *> newclasses (2);
+  newclasses.safe_push (new congruence_class (class_id++));
+  newclasses.safe_push (new congruence_class (class_id++));
 
   for (unsigned int i = 0; i < cls->members.length (); i++)
 	{
-- 
2.6.2



Re: [PATCH] Fix PR ipa/68035

2015-11-13 Thread Martin Liška
On 11/12/2015 07:40 PM, Jan Hubicka wrote:
 +
 +  /* Initialize hash values if we are not in LTO mode.  */
 +  if (!in_lto_p)
 +  item->get_hash ();
  }
>>>
>>> Hmm, what is the difference to the LTO mode here?  I would have expected
>>> that all the items were analyzed in both paths.
>>
>> The difference is that in LTO mode the hash value is read from the
>> streamed LTO file.
>> In classic compilation mode, on the other hand, we have to force the
>> calculation, as the hash value is computed lazily.
> 
> In this case we also need to handle cases where a function/variable is born
> during WPA (i.e. produced by an earlier pass), so the in_lto_p check looks
> wrong.
> I will look at the updated patch.
> 
> Honza
> 

Hi Honza.

Currently we just register {cgraph,varpool}_removal_hooks in WPA.
Probably a place for enhancement?

Thanks,
Martin


Re: [PATCH] Fix PR ipa/68311

2015-11-13 Thread Richard Biener
On Fri, Nov 13, 2015 at 1:04 PM, Martin Liška  wrote:
> Hello.
>
> Following patch fixes PR68311, can regbootstrap on x86_64-linux-gnu.
>
> Ready for trunk?

Please use

 auto_vec <congruence_class *, 2> newclasses;

as it gets you a stack allocation.  Also use quick_push as you know the vector
is large enough.
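For illustration, the suggested form would be something like:

  auto_vec <congruence_class *, 2> newclasses;
  newclasses.quick_push (new congruence_class (class_id++));
  newclasses.quick_push (new congruence_class (class_id++));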

Ok with those changes.

Richard.

> Thanks,
> Martin


Re: [PATCH] [ARM] neon-testgen.ml typo

2015-11-13 Thread Kyrill Tkachov


On 13/11/15 11:18, Ramana Radhakrishnan wrote:

Hmm. I hadn't noticed that the crypto intrinsics tests were generated by
neon-testgen.ml; I thought they were hand-written.
The tests I added do not cover the crypto intrinsics, so I'm going
to revert r230274 and restore all the tests generated by neon-testgen.ml
until we have better coverage in advsimd-intrinsics.

 From what I remember from a few days back, I thought it was generally
ok to get rid of the lot, as we had test coverage for everything else
in gcc.target/arm.

Thus don't bother reverting.


+1. I'll also add that you can now remove neon.ml from config/arm.
And also, I think we can move the remaining hand-written tests from 
gcc.target/arm/neon/
into gcc.target/arm/ and remove the neon/ directory altogether.

Kyrill



Ramana


Sorry for the oversight.

Christophe.





Re: [PATCH] Fix ICE for boolean comparison

2015-11-13 Thread Richard Biener
On Fri, Nov 13, 2015 at 11:52 AM, Ilya Enkovich  wrote:
> 2015-11-13 13:38 GMT+03:00 Richard Biener :
>> On Thu, Nov 12, 2015 at 4:44 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> Currently the compiler may ICE when a loaded boolean is compared with a
>>> vector invariant or another boolean value.  This is because we don't detect
>>> a mix of bool and non-bool vectypes and incorrectly determine the vectype
>>> for a boolean loop invariant used in a comparison.  This was fixed for
>>> COND_EXPR before but also needs to be fixed for comparisons.  This patch
>>> was bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Hmm, so this disables vectorization in these cases.  Isn't this a
>> regression?  Shouldn't we simply "materialize"
>> the non-bool vector from the boolean one say, with
>>
>>  vec = boolvec ? {-1, -1 ... } : {0, 0, 0 ...}
>
> We may do this using patterns, but we should still catch the cases the
> patterns don't handle.  Patterns don't have vectypes computed and
> therefore may miss such cases, so the stability fix is still valid.
>
> I don't think we have a compiler version which can vectorize
> simd-bool-comparison-2.cc, thus technically it is not a regression.
> There are also other similar cases, e.g. a store of a comparison result
> or the use of a loaded boolean as a predicate.  I was going to support
> vectorization for such cases later (it seems I won't hit stage1 for them
> and I am not sure if it will be OK for stage3).

I still think those checks show that there is an issue we should fix
differently.  We're accumulating more mess into the already messy
vectorizer :(

Ok.

Thanks,
Richard.

> Ilya
>
>>
>> ?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Ilya


[PATCH 05/N] Fix memory leaks in graphite

2015-11-13 Thread Martin Liška
Hello.

Patch survives regbootstrap on x86_64-linux-gnu.
Ready for trunk?

Thanks,
Martin
>From 3f84b19e0ea7eacf26a566d3ef796397dafe76ce Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Nov 2015 15:45:38 +0100
Subject: [PATCH] Fix memory leaks in graphite

---
 gcc/graphite-poly.c   |  1 +
 gcc/graphite-scop-detection.c | 27 ++-
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 5928b4c..809670a 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -328,6 +328,7 @@ free_scop (scop_p scop)
 free_poly_bb (pbb);
 
   scop->pbbs.release ();
+  scop->drs.release ();
 
   isl_set_free (scop->param_context);
   isl_union_map_free (scop->must_raw);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index a7179d9..b5298d7 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -522,6 +522,11 @@ class scop_detection
 public:
   scop_detection () : scops (vNULL) {}
 
+  ~scop_detection ()
+  {
+scops.release ();
+  }
+
   /* A marker for invalid sese_l.  */
   static sese_l invalid_sese;
 
@@ -1065,13 +1070,20 @@ scop_detection::harmful_stmt_in_region (sese_l scop) const
 
   /* The basic block should not be part of an irreducible loop.  */
   if (bb->flags & BB_IRREDUCIBLE_LOOP)
-return true;
+	{
+	  dom.release ();
+	  return true;
+	}
 
   if (harmful_stmt_in_bb (scop, bb))
-	return true;
+	{
+	  dom.release ();
+	  return true;
+	}
 }
 
-return false;
+  dom.release ();
+  return false;
 }
 
 /* Returns true if S1 subsumes/surrounds S2.  */
@@ -1749,12 +1761,9 @@ graphite_find_cross_bb_scalar_vars (scop_p scop, gimple *stmt,
 static gimple_poly_bb_p
 try_generate_gimple_bb (scop_p scop, basic_block bb)
 {
-  vec drs;
-  drs.create (3);
-  vec writes;
-  writes.create (3);
-  vec reads;
-  reads.create (3);
+  vec drs = vNULL;
+  vec writes = vNULL;
+  vec reads = vNULL;
 
   sese_l region = scop->scop_info->region;
   loop_p nest = outermost_loop_in_sese (region, bb);
-- 
2.6.2



Re: [PATCH 4/4] [ARM] Add attribute/pragma target fpu=

2015-11-13 Thread Kyrill Tkachov

Hi Christian,

On 12/11/15 14:54, Christian Bruel wrote:

Hi Kyril,


...
The parts in this patch look ok to me.
However, I think we need some more functionality
In aarch64 we support compiling a file with no simd, including arm_neon.h and 
using arm_neon.h intrinsics
within functions tagged with simd support.
We want to support such functionality on arm i.e. compile a file with -mfpu=vfp 
and use arm_neon.h intrinsics
in a function tagged with an fpu=neon attribute.
For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, 
like in the aarch64 version of arm_neon.h


As discussed, here is arm_neon.h for aarch32/neon with the same programming
model as aarch64/simd.  As you said, let's use one of the fpu=neon attributes
even if the file is compiled with -mfpu=vfp.
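For illustration, usage would then look roughly like this (a sketch, assuming
the target ("fpu=neon") attribute spelling introduced by this series):

  #include <arm_neon.h>

  /* File compiled with -mfpu=vfp; NEON is enabled for this function only.  */
  __attribute__ ((target ("fpu=neon")))
  int32x4_t
  add_q (int32x4_t a, int32x4_t b)
  {
    return vaddq_s32 (a, b);
  }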

The drawback of this is that now we unconditionally make available every neon
intrinsic, introducing a small change in legacy error checking behaviour
(that you didn't have with aarch64).  So it's worth stressing that:

 - One cannot check #include "arm_neon.h" to see whether the compiler can use
neon instructions.  Instead use #ifndef __ARM_NEON__.  (Found in
target-supports.exp)


Checking the macro is the 'canonical' way to check for NEON support,
so I reckon we can live with that.




 - Types cannot be checked. For instance:

#include <arm_neon.h>

poly128_t
foo (poly128_t* ptr)
{
  return vldrq_p128 (ptr);
}

compiled with -mfpu=neon used to be rejected with

   error: unknown type name 'poly128_t' ...

 Now the error, as a side effect from the inlining rules between incompatible 
modes, becomes

  error: inlining failed in call to always_inline 'vldrq_p128': target specific 
option mismatch ...


Well, the previous message is misleading anyway since the user error there is 
not a type issue
but failure to specify the correct -mfpu option.



I found this more confusing, so I was a little bit reluctant to implement this, 
but the code is correctly rejected and the message makes sense, after all. Just 
a different check.

This patch applies on top of the preceding attribute/pragma target fpu= series. 
Tested with arm-none-eabi configured with default and --with-cpu=cortex-a9 
--with-fp --with-float=hard


Do you mean --with-fpu=?



Also fixes a few macros that depend on fpu= that I forgot to redefine.


Can you please split those changes into a separate patch and ChangeLog and
commit them separately?
That part is preapproved.


This patch is ok then with above comment about splitting the arm-c.c changes 
separately.
Thanks for doing this!
I believe all patches in this series are approved then
so you can go ahead and start committing.

Kyrill



Christian





Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-13 Thread Kirill Yukhin
Hello Jakub,
I've fixed all the long lines, thanks!
I've also fixed max_len for the "simd" attribute.

Tests are fixed with a scan for SIMD-mangled routines; the routines are
made `extern'.

ChangeLog entry was updated.
gcc/
* omp-low.c (pass_omp_simd_clone::gate): If target allows - call
without additional conditions.
* doc/extend.texi (@item simd): New.
gcc/c-family/
* c-common.c (handle_simd_attribute): New.
(struct attribute_spec): Add entry for "simd".
gcc/c/
* c-parser.c (c_finish_omp_declare_simd): Look for
"simd" attribute as well. Update error message.
gcc/cp/
* parser.c (cp_parser_late_parsing_cilk_simd_fn_info): Look for
"simd" attribute as well. Update error message.
gcc/testsuite/
* c-c++-common/attr-simd.c: New test.
* c-c++-common/attr-simd-2.c: New test.
* c-c++-common/attr-simd-3.c: New test.

Bootstrapped and regtested.

If no more objections - I'll check it into main trunk next ww.
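For reference, the intended usage looks roughly like this (a sketch; the
function name is made up):

  /* Equivalent to #pragma omp declare simd: vector variants of the
     function are created according to the target's vector ABI.  */
  __attribute__ ((simd))
  extern double my_func (double x);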

> Are you going to update VectorABI.txt based on the latest changes in
> the Intel ABI document?  I mean especially using L, R or U for linear
> %val(), %ref() or %uval() (if references), not using s for linear with
> uniform parameter stride, but instead using ls, Ls, Rs or Us for those?

Yes, we'll update it.

--
Thanks, K

On 10 Nov 09:58, Jakub Jelinek wrote:
> On Tue, Nov 10, 2015 at 11:44:18AM +0300, Kirill Yukhin wrote:
> >bool *);
> > +static tree handle_simd_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_omp_declare_target_attribute (tree *, tree, tree, int,
> >  bool *);
> >  static tree handle_designated_init_attribute (tree *, tree, tree, int, 
> > bool *);
> > @@ -818,6 +819,8 @@ const struct attribute_spec c_common_attribute_table[] =
> >   handle_omp_declare_simd_attribute, false },
> >{ "cilk simd function", 0, -1, true,  false, false,
> >   handle_omp_declare_simd_attribute, false },
> > +  { "simd",  0, -1, true,  false, false,
> > + handle_simd_attribute, false },
> 
> Why the -1 in there?  I thought the simd attribute has exactly zero
> arguments, so 0, 0.
> 
> > +static tree
> > +handle_simd_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> > +  int ARG_UNUSED (flags), bool *no_add_attrs)
> 
> As per recent discussion, please leave ARG_UNUSED (args) etc. out, just
> use unnamed arguments.
> > +{
> > +  if (TREE_CODE (*node) == FUNCTION_DECL)
> > +{
> > +  if (lookup_attribute ("cilk simd function", DECL_ATTRIBUTES (*node)) 
> > != NULL)
> 
> Too long line.
> > +   {
> > + error_at (DECL_SOURCE_LOCATION (*node),
> > +   "%<__simd__%> attribute cannot be "
> > +   "used in the same function marked as a Cilk Plus 
> > SIMD-enabled function");
> 
> Too long line.  You should just move "use in the same " one line above.
> 
> > + *no_add_attrs = true;
> > +   }
> > +  else
> > +   {
> > + DECL_ATTRIBUTES (*node)
> > +   = tree_cons (get_identifier ("omp declare simd"),
> > +NULL_TREE, DECL_ATTRIBUTES (*node)); 
> > +   }
> 
> Please avoid {}s around single statement in the body.
> >  {
> > -  error ("%<#pragma omp declare simd%> cannot be used in the same "
> > +  error ("%<#pragma omp declare simd%> or % attribute cannot be 
> > used in the same "
> >  "function marked as a Cilk Plus SIMD-enabled function");
> 
> Too long line.
> 
> > +  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) != NULL)
> > +   {
> > + error_at (DECL_SOURCE_LOCATION (fndecl),
> > +   "%<__simd__%> attribute cannot be "
> > +   "used in the same function marked as a Cilk Plus 
> > SIMD-enabled function");
> 
> Too long line, see above.
> 
> >c = build_tree_list (get_identifier ("omp declare simd"), c);
> >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl);
> >DECL_ATTRIBUTES (fndecl) = c;
> > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > index 7555bf3..f3831b9 100644
> > --- a/gcc/cp/parser.c
> > +++ b/gcc/cp/parser.c
> > @@ -34534,10 +34534,11 @@ cp_parser_late_parsing_cilk_simd_fn_info 
> > (cp_parser *parser, tree attrs)
> >cp_omp_declare_simd_data *info = parser->cilk_simd_fn_info;
> >int ii = 0;
> >  
> > -  if (parser->omp_declare_simd != NULL)
> > +  if (parser->omp_declare_simd != NULL
> > +  || lookup_attribute ("simd", attrs))
> >  {
> > -  error ("%<#pragma omp declare simd%> cannot be used in the same 
> > function"
> > -" marked as a Cilk Plus SIMD-enabled function");
> > +  error ("%<#pragma omp declare simd%> of % attribute cannot be 
> > used "
> > +"in the same function marked as a Cilk Plus SIMD-enabled 
> > function");
> 
> Too long lines.
> 
> >

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread David Malcolm
On Fri, 2015-11-13 at 07:57 +0100, Marek Polacek wrote:
> Probably coming too late, sorry.

> On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> > index 4335a87..eb4e1fc 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "c-family/c-ubsan.h"
> >  #include "cilk.h"
> >  #include "gomp-constants.h"
> > +#include "spellcheck.h"
> >  
> >  /* Possible cases of implicit bad conversions.  Used to select
> > diagnostic messages in convert_for_assignment.  */
> > @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
> >return tree_cons (NULL_TREE, field, NULL_TREE);
> >  }
> >  
> > +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> > +
> > +static void
> > +lookup_field_fuzzy_find_candidates (tree type, tree component,
> > +   vec<tree> *candidates)
> > +{
> > +  tree field;
> > +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> 
> I'd prefer declaring field in the for loop, so
>   for (tree field = TYPE_FIELDS...
> 
> > + && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > + || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> 
> This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

I based this code on the code in lookup_field right above it;
I copied-and-pasted that conditional, so presumably it should also be
changed in lookup_field (which has the condition twice)?

FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.

/* Nonzero if TYPE is a record or union type.  */
#define RECORD_OR_UNION_TYPE_P(TYPE)\
  (TREE_CODE (TYPE) == RECORD_TYPE  \
   || TREE_CODE (TYPE) == UNION_TYPE\
   || TREE_CODE (TYPE) == QUAL_UNION_TYPE)

FWIW I've made the change in the attached patch (both to the new
function, and to lookup_field).

> > +   {
> > + lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> > + component,
> > + candidates);
> > +   }
> 
> Lose the brackets around a single statement.

Done.

> > +  if (DECL_NAME (field))
> > +   candidates->safe_push (DECL_NAME (field));
> > +}
> > +}
> > +
> > +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> > +   rather than returning a TREE_LIST for an exact match.  */
> > +
> > +static tree
> > +lookup_field_fuzzy (tree type, tree component)
> > +{
> > +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> > +
> > +  /* First, gather a list of candidates.  */
> > +  auto_vec<tree> candidates;
> > +
> > +  lookup_field_fuzzy_find_candidates (type, component,
> > + &candidates);
> > +
> > +  /* Now determine which is closest.  */
> > +  int i;
> > +  tree identifier;
> > +  tree best_identifier = NULL;
> 
> NULL_TREE

Fixed.

> > +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> > +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> > +{
> > +  gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> > +  edit_distance_t dist = levenshtein_distance (component, identifier);
> > +  if (dist < best_distance)
> > +   {
> > + best_distance = dist;
> > + best_identifier = identifier;
> > +   }
> > +}
> > +
> > +  /* If more than half of the letters were misspelled, the suggestion is
> > + likely to be meaningless.  */
> > +  if (best_identifier)
> > +{
> > +  unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> > +IDENTIFIER_LENGTH (best_identifier)) / 2;
> > +  if (best_distance > cutoff)
> > +   return NULL;
> 
> NULL_TREE

Fixed.

> > +/* The Levenshtein distance is an "edit-distance": the minimal
> > +   number of one-character insertions, removals or substitutions
> > +   that are needed to change one string into another.
> > +
> > +   This implementation uses the Wagner-Fischer algorithm.  */
> > +
> > +static edit_distance_t
> > +levenshtein_distance (const char *s, int len_s,
> > + const char *t, int len_t)
> > +{
> > +  const bool debug = false;
> > +
> > +  if (debug)
> > +{
> > +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > +}
> 
> Did you leave this debug stuff here intentionally?

I find it useful, but I believe it's against our policy, so I've deleted
it in the attached patch.

> > +  /* Build the rest of the row by considering neighbours to
> > +the north, west and northwest.  */
> > +  for (int j = 0; j < len_s; j++)
> > +   {
> > + edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > + edit_distance_t deletion = v1[j] + 1;
> > + edit_distance_t insertion= v0[j + 1] + 1;
> 
> The formatting doesn't look right here.

It's correct; it's "diff" inserting two spaces before a tab combined
with our mixed spaces+tab convention: the "for" is at column 6 (6
spaces), whereas the other lines are at column 8 (1 tab), which looks weird in a diff.
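
For readers following the algorithm discussion above, here is a self-contained
two-row sketch of the Wagner-Fischer recurrence in plain C++ with standard
containers (illustrative only; the patch itself works on IDENTIFIER_NODEs and
GCC's own vec/auto_vec types):

#include <algorithm>
#include <cstring>
#include <vector>

static unsigned
edit_distance_sketch (const char *s, const char *t)
{
  std::size_t len_s = std::strlen (s), len_t = std::strlen (t);
  std::vector<unsigned> v0 (len_s + 1), v1 (len_s + 1);
  /* Row for the empty prefix of t: j deletions from s.  */
  for (std::size_t j = 0; j <= len_s; j++)
    v0[j] = j;
  for (std::size_t i = 0; i < len_t; i++)
    {
      v1[0] = i + 1;
      /* Build the rest of the row from the north, west and northwest
         neighbours, as in the quoted loop.  */
      for (std::size_t j = 0; j < len_s; j++)
        {
          unsigned cost = (s[j] == t[i] ? 0 : 1);
          unsigned deletion = v1[j] + 1;
          unsigned insertion = v0[j + 1] + 1;
          unsigned substitution = v0[j] + cost;
          v1[j + 1] = std::min ({deletion, insertion, substitution});
        }
      std::swap (v0, v1);
    }
  return v0[len_s];
}

With the half-the-letters cutoff discussed in the patch, "colour" vs. "color"
has distance 1 against a cutoff of 3, so a suggestion is offered, while two
unrelated six-letter identifiers will usually exceed the cutoff and produce no
suggestion.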

Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 02:54:33PM +0300, Kirill Yukhin wrote:
> @@ -9013,6 +9016,35 @@ handle_omp_declare_simd_attribute (tree *, tree, tree, 
> int, bool *)
>return NULL_TREE;
>  }
>  
> +/* Handle an "simd" attribute.  */

/* Handle a "simd" attribute.  */
instead?

> +static tree
> +handle_simd_attribute (tree *node, tree name, tree, int , bool *no_add_attrs)

No space after int.

> +@item simd
> +@cindex @code{simd} function attribute.
> +This attribute enables creation of one or more function versions that
> +can process multiple arguments using SIMD instructions from a
> +single invocation.  Specifying this attribute allows compiler to
> +assume that such a versions are available at link time (provided

Not a native english speaker, but I'd leave the "a " out.

> +in the same or another translation unit).  Generated versions are
> +target dependent and described in corresponding Vector ABI document.  For
> +x86_64 target this document can be found
> +@w{@uref{https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt,here}}.
> +It is prohibited to use the attribute along with Cilk Plus's @code{vector}

I think we usually don't say prohibited in the docs, perhaps
"The attribute should not be used together with Cilk Plus @code{vector}
attribute on the same function."?

> +attribute. If the attribute is specified and @code{#pragma omp declare simd}
> +presented on a declaration and @code{-fopenmp} or @code{-fopenmp-simd}

is present on ?

> diff --git a/gcc/testsuite/c-c++-common/attr-simd-2.c 
> b/gcc/testsuite/c-c++-common/attr-simd-2.c
> new file mode 100644
> index 000..bc91ccf
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-simd-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fdump-tree-optimized -fopenmp-simd" } */
> +
> +#pragma omp declare simd
> +__attribute__((__simd__))
> +extern

Maybe just add
#ifdef __cplusplus
"C"
#endif
and remove the C++ mangling cruft from the scan-assembler lines?

> +int simd_attr (void)
> +{
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "omp declare simd" "optimized" } } */
> +/* { dg-final { scan-assembler-times "_ZGVbN4_(?:_Z9)?simd_attr(?:v)?:" 1 { 
> target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "_ZGVbM4_(?:_Z9)?simd_attr(?:v)?:" 1 { 
> target { i?86-*-* x86_64-*-* } } } } */

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-simd.c

Similarly.

Ok for trunk with those changes.

Jakub
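
For readers not following the whole series, the attribute under review is used
roughly like this (illustrative function name; assumes -fopenmp or
-fopenmp-simd so the "omp declare simd" lowering described above actually
triggers):

__attribute__ ((__simd__))
double
scale_sketch (double x, double y)
{
  return x * y;
}

The front end then treats the declaration as if it also carried
#pragma omp declare simd, so Vector-ABI-mangled variants (names of the
_ZGVbN... form, as in the attr-simd-2.c test quoted above) may be assumed to
exist at link time.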


Re: [PATCH 05/N] Fix memory leaks in graphite

2015-11-13 Thread Richard Biener
On Fri, Nov 13, 2015 at 12:43 PM, Martin Liška  wrote:
> Hello.
>
> Patch survives regbootstrap on x86_64-linux-gnu.
> Ready for trunk?

Ok.

Richard.

> Thanks,
> Martin


Re: [patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-13 Thread Jonathan Wakely

On 12/11/15 11:09 -0700, Martin Sebor wrote:

On 11/12/2015 10:08 AM, Jonathan Wakely wrote:

On 12/11/15 08:48 -0700, Martin Sebor wrote:

On 11/11/2015 02:48 AM, Jonathan Wakely wrote:

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?


Just a minor nit that the C-style cast in the below triggers
a -Wold-style-cast warning in Clang, in case libstdc++ tries
to be Clang-warning free. Since the type of __INT_MAX__ is
int it shouldn't be necessary.

+  _S_ios_fmtflags_min = ~(int)__INT_MAX__


That's worth fixing, thanks.



Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?


Use a case/switch statement with -Werror=switch-enum to make sure
all the cases are handled and none is duplicated or outside of the
valid values of the enumeration:

void foo (ios::iostate s) {
switch (s) {
case badbit:
case eofbit:
case failbit:
case goodbit:
case __INT_MAX__:
case ~__INT_MAX__: ;
  }
}


I thought this was a great idea at first ... but -Wswitch-enum will
complain that the end, min and max enumerators are not handled (even
though __INT_MAX__ and ~__INT_MAX__ have the same values as the max
and min ones, respectively).


Hmm, I didn't see any warnings for the small test case I wrote and
still don't. Just out of curiosity, what did I miss?

enum iostate {
   goodbit = 0,
   eofbit,
   failbit,
   badbit,
   max = __INT_MAX__,
   min = ~__INT_MAX__
};

void foo (iostate s) {
   switch (s) {
   case badbit:
   case eofbit:
   case failbit:
   case goodbit:
   case __INT_MAX__:
   case ~__INT_MAX__: ;
   ;
   }
}


I must have messed up my test, I agree this is warning clean.

So I could add tests for those minimum and maximum values, and also
static_assert that the enumerations have an underlying type with the
same representation as int (because if it's actually bigger than int
the operator~ could still produce values outside the range of valid
values).
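
A stand-alone illustration of the underlying problem and of the proposed fix
(not libstdc++ code; the enumerator names are made up):

enum fmtflags_sketch
{
  boolalpha_s = 1,
  dec_s       = 2,
  /* Without the next two enumerators the C++03 range of values is only
     0..3, so the result of operator~ below need not be a valid value.  */
  min_s = ~__INT_MAX__,   /* INT_MIN, spelled without <climits>.  */
  max_s =  __INT_MAX__
};

inline fmtflags_sketch
operator~ (fmtflags_sketch f)
{
  return fmtflags_sketch (~static_cast<int> (f));
}

With the min/max enumerators present every int is a valid value of the bitmask
type, so the conversion back from int is always in range, which is what the
case labels and the static_assert in the follow-up patch check.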



[gomp4] Merge trunk r230255 (2015-11-12) into gomp-4_0-branch

2015-11-13 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r230293:

commit 679edb57a2d0826d2965ba5d61ef11df0e3b23bf
Merge: 6ec2634 0ebb8b2
Author: tschwinge 
Date:   Fri Nov 13 09:21:42 2015 +

svn merge -r 230169:230255 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230293 
138bc75d-0d04-0410-961f-82ee72b054a4

Something in r230169..r230255 causes the libgomp.oacc-fortran/pset-1.f90
tests to regress for -O3 with nvptx offloading:

PASS: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
PASS: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
PASS: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
[-PASS:-]{+FAIL: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler 
error)+}
{+FAIL:+} libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-fortran/pset-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
[-execution test-]
[-PASS:-]{+compilation failed to produce executable+}
{+FAIL: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  (internal compiler error)+}
{+FAIL:+} libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-fortran/pset-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  
[-execution test-]{+compilation failed to produce executable+}
PASS: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  (test for excess errors)
PASS: libgomp.oacc-fortran/pset-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test

[...]/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90: In function 
'MAIN__._omp_fn.8':
[...]/libgomp/testsuite/libgomp.oacc-fortran/pset-1.f90:218:0: internal 
compiler error: in vectorizable_store, at tree-vect-stmts.c:5651
0xc0732c vectorizable_store
[...]/gcc/tree-vect-stmts.c:5651
0xc0f26c vect_transform_stmt(gimple*, gimple_stmt_iterator*, bool*, 
_slp_tree*, _slp_instance*)
[...]/gcc/tree-vect-stmts.c:8003
0xc2799d vect_schedule_slp_instance
[...]/gcc/tree-vect-slp.c:3486
0xc29483 vect_schedule_slp(vec_info*)
[...]/gcc/tree-vect-slp.c:3551
0xc2ca88 vect_slp_bb(basic_block_def*)
[...]/gcc/tree-vect-slp.c:2543
0xc2ea65 execute
[...]/gcc/tree-vectorizer.c:734

That is to say, the same error is also present on trunk, not just on
gomp-4_0-branch.  As there is discussion about this (or, at least, what
looks like very similar backtraces) on the mailing lists and in Bugzilla,
I suppose this will be cured soon.


Grüße
 Thomas




Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-13 Thread Richard Biener
On Thu, 12 Nov 2015, Tom de Vries wrote:

> On 11/11/15 12:00, Jakub Jelinek wrote:
> > On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:
> > > > The option -foffload-alias=pointer instructs the compiler to assume that
> > > > objects references in an offload region do not alias.
> > > > 
> > > > The option -foffload-alias=all instructs the compiler to make no
> > > > assumptions about aliasing in offload regions.
> > > > 
> > > > The default value is -foffload-alias=none.
> > > 
> > > I think global options for this is nonsense.  Please follow what
> > > we do for #pragma GCC ivdep for example, thus allow the alias
> > > behavior to be specified per "region" (whatever makes sense here
> > > in the context of offloading).
> 
> So, IIUC, instead of a global option foffload-alias, you're saying something
> like the following would be acceptable:
> ...
> #pragma GCC offload-alias=
> #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
>   {
> #pragma acc loop
> for (COUNTERTYPE ii = 0; ii < N; ii++)
>   c[ii] = a[ii] + b[ii];
>   }
> ...
> ?
> 
> I suppose that would work (though a global option would allow us to easily
> switch between none/pointer/all values in a large number of files, something
> that might be useful when f.i. running an openacc  test suite).
> 
> > Yeah, completely agreed.  I don't see why the offloaded region would be in
> > any way special, they are C/C++/Fortran code as any other.
> > What we can and should improve is teach IPA aliasing/points to analysis
> > about the way we lower the host vs. offloading region boundary, so that
> > if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
> > determines something it can be used on the offloaded function side and vice
> > versa,
> 
> I agree this would be a nice way to solve the aliasing info problem, but
> considering the remark of Richard at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46032#c19 :
> ...
> Not that I think IPA PTA is anywhere near production ready

Just to clarify on that sentence:
 1) we lack good testing coverage for IPA PTA so wrong-code bugs might 
still exist
 2) IPA PTA can use a _lot_ of memory and compile-time
 3) for existing wrong-code issues I have merely dumbed down the
use of the analysis result resulting in weaker alias analysis compared to
the local PTA (for some cases)

Because of 2) and no good way to avoid this I decided to not make
fixing 3) a priority (and 1) still holds).

Richard.

> ...
> I haven't considered proceeding in that direction.
> 
> Thanks,
> - Tom
> 
> > but a switch like the above is just wrong.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [committed] gen-pass-instances.awk: Simplify match regexp in handle_line

2015-11-13 Thread Thomas Schwinge
Hi Tom!

As I happened to see this, and have a number of older build trees around:

On Thu, 12 Nov 2015 11:09:04 +0100, Tom de Vries  wrote:
> this patch [and another dozen of such patches before] simplifies [...] in 
> gen-pass-instances.awk.
> 
> Committed to trunk as trivial.

Supposing this was the intention, just to confirm: the generated
gcc/pass-instances.def files that I looked at did not change after your
recent gcc/gen-pass-instances.awk changes.


Grüße
 Thomas




Re: [patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-13 Thread Jonathan Wakely

On 13/11/15 08:39 +, Jonathan Wakely wrote:

On 12/11/15 11:09 -0700, Martin Sebor wrote:

On 11/12/2015 10:08 AM, Jonathan Wakely wrote:

On 12/11/15 08:48 -0700, Martin Sebor wrote:

On 11/11/2015 02:48 AM, Jonathan Wakely wrote:

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?


Just a minor nit that the C-style cast in the below triggers
a -Wold-style-cast warning in Clang, in case libstdc++ tries
to be Clang-warning free. Since the type of __INT_MAX__ is
int it shouldn't be necessary.

+  _S_ios_fmtflags_min = ~(int)__INT_MAX__


That's worth fixing, thanks.



Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?


Use a case/switch statement with -Werror=switch-enum to make sure
all the cases are handled and none is duplicated or outside of the
valid values of the enumeration:

void foo (ios::iostate s) {
   switch (s) {
   case badbit:
   case eofbit:
   case failbit:
   case goodbit:
   case __INT_MAX__:
   case ~__INT_MAX__: ;
 }
}


I thought this was a great idea at first ... but -Wswitch-enum will
complain that the end, min and max enumerators are not handled (even
though __INT_MAX__ and ~__INT_MAX__ have the same values as the max
and min ones, respectively).


Hmm, I didn't see any warnings for the small test case I wrote and
still don't. Just out of curiosity, what did I miss?

enum iostate {
  goodbit = 0,
  eofbit,
  failbit,
  badbit,
  max = __INT_MAX__,
  min = ~__INT_MAX__
};

void foo (iostate s) {
  switch (s) {
  case badbit:
  case eofbit:
  case failbit:
  case goodbit:
  case __INT_MAX__:
  case ~__INT_MAX__: ;
  ;
  }
}


I must have messed up my test, I agree this is warning clean.

So I could add tests for those minimum and maximum values, and also
static_assert that the enumerations have an underlying type with the
same representation as int (because if it's actually bigger than int
the operator~ could still produce values outside the range of valid
values).


How's this?



commit c9fc6b83d0ec4bbf73d7d0f410e0b0e2799d8adf
Author: Jonathan Wakely 
Date:   Fri Nov 13 08:56:22 2015 +

Improve tests for valid values of iostream bitmask types

	* testsuite/27_io/ios_base/types/fmtflags/case_label.cc: Explicitly
	check minimum and maximum values, and size of underlying type.
	* testsuite/27_io/ios_base/types/iostate/case_label.cc: Likewise.
	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
index e8820c5..3475fd3 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
@@ -70,9 +70,11 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_fmtflags_end:
   break;
-case std::_S_ios_fmtflags_min:
+case __INT_MAX__:
   break;
-case std::_S_ios_fmtflags_max:
+case ~__INT_MAX__:
   break;
 }
+  static_assert( sizeof(std::underlying_type_t<bitmask_type>) == sizeof(int),
+  "underlying type has same range of values as int");
 }
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
index 4e4e4f5..a72a774 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
@@ -42,9 +42,11 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_iostate_end:
   break;
-case std::_S_ios_iostate_min:
+case __INT_MAX__:
   break;
-case std::_S_ios_iostate_max:
+case ~__INT_MAX__:
   break;
 }
+  static_assert( sizeof(std::underlying_type_t<bitmask_type>) == sizeof(int),
+  "underlying type has same range of values as int");
 }
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
index 8c6672f6..f621d21 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
@@ -46,9 +46,11 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_openmode_end:
   break;
-case std::_S_ios_openmode_min:
+case __INT_MAX__:
   break;
-case std::_S_ios_openmode_max:
+case ~__INT_MAX__:
   break;
 }
+  static_assert( 

Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-13 Thread Uros Bizjak
Hello!

> 2015-11-09  Segher Boessenkool  
>
> * gcc/simplify-rtx.c (simplify_truncation): Simplify TRUNCATE
> of AND of [LA]SHIFTRT.

Revision r230164 (the above patch) regressed:

FAIL: gcc.target/alpha/pr42269-1.c scan-assembler-not addl

on alpha-linux-gnu.

The difference starts in combine, where before the patch, we were able
to combine insns:

(insn 7 6 8 2 (set (reg:DI 82)
(lshiftrt:DI (reg:DI 81 [ x ])
(const_int 16 [0x10]))) pr42269-1.c:8 66 {lshrdi3}
 (expr_list:REG_DEAD (reg:DI 81 [ x ])
(nil)))
(insn 8 7 11 2 (set (reg:DI 70 [ _2 ])
(sign_extend:DI (subreg:SI (reg:DI 82) 0))) pr42269-1.c:8 2
{*extendsidi2_1}
 (expr_list:REG_DEAD (reg:DI 82)
(nil)))

to:

Trying 7 -> 8:
Successfully matched this instruction:
(set (reg:DI 70 [ _2 ])
(zero_extract:DI (reg/v:DI 80 [ x ])
(const_int 16 [0x10])
(const_int 16 [0x10])))
allowing combination of insns 7 and 8
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 7.
modifying insn i3 8: r70:DI=zero_extract(r80:DI,0x10,0x10)
deferring rescan insn with uid = 8.

After the patch, the combination fails:

Trying 7 -> 8:
Failed to match this instruction:
(set (reg:DI 70 [ _2 ])
(sign_extend:DI (lshiftrt:SI (subreg:SI (reg/v:DI 80 [ x ]) 0)
(const_int 16 [0x10]))))

Uros.


Re: [PATCH] More compile-time saving in BB vectorization

2015-11-13 Thread Christophe Lyon
On 12 November 2015 at 21:04, Christophe Lyon
 wrote:
> On 12 November 2015 at 16:49, Andreas Schwab  wrote:
>> Richard Biener  writes:
>>
>>>   * tree-vectorizer.h (vect_slp_analyze_and_verify_instance_alignment):
>>>   Declare.
>>>   (vect_analyze_data_refs_alignment): Make loop vect specific.
>>>   (vect_verify_datarefs_alignment): Likewise.
>>>   * tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependences):
>>>   Add missing continue.
>>>   (vect_compute_data_ref_alignment): Export.
>>>   (vect_compute_data_refs_alignment): Merge into...
>>>   (vect_analyze_data_refs_alignment): ... this.
>>>   (verify_data_ref_alignment): Split out from ...
>>>   (vect_verify_datarefs_alignment): ... here.
>>>   (vect_slp_analyze_and_verify_node_alignment): New function.
>>>   (vect_slp_analyze_and_verify_instance_alignment): Likewise.
>>>   * tree-vect-slp.c (vect_supported_load_permutation_p): Remove
>>>   misplaced checks on alignment.
>>>   (vect_slp_analyze_bb_1): Add fatal output parameter.  Do
>>>   alignment analysis after SLP discovery and do it per instance.
>>>   (vect_slp_bb): When vect_slp_analyze_bb_1 fatally failed do not
>>>   bother to re-try using different vector sizes.
>>
>> This breaks libgfortran on ia64:
>>
>> ../../../libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
>> ../../../libgfortran/generated/matmul_c4.c:79:1: internal compiler error: in 
>> vectorizable_store, at tree-vect-stmts.c:5651
>>  matmul_c4 (gfc_array_c4 * const restrict retarray,
>>  ^
>> 0x410ff01f vectorizable_store
>> ../../gcc/tree-vect-stmts.c:5651
>> 0x41115b5f vect_transform_stmt(gimple*, gimple_stmt_iterator*, 
>> bool*, _slp_tree*, _slp_instance*)
>> ../../gcc/tree-vect-stmts.c:8003
>> 0x4114df1f vect_schedule_slp_instance
>> ../../gcc/tree-vect-slp.c:3484
>> 0x41154d6f vect_schedule_slp(vec_info*)
>> ../../gcc/tree-vect-slp.c:3549
>> 0x411562bf vect_slp_bb(basic_block_def*)
>> ../../gcc/tree-vect-slp.c:2543
>> 0x41159f2f execute
>> ../../gcc/tree-vectorizer.c:734
>>
>
> Same problem on armeb.
>
Now fixed at r230260 (pr68308).

Thanks.


>
>> Andreas.
>>
>> --
>> Andreas Schwab, SUSE Labs, sch...@suse.de
>> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
>> "And now for something completely different."


[PATCH] Avoid useless work in loop vectorization

2015-11-13 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-13  Richard Biener  

* tree-vect-loop.c (vect_analyze_loop_2): Add fatal parameter.
Signal fatal failure if early checks fail.
(vect_analyze_loop): If vect_analyze_loop_2 fails fatally
do not bother testing further vector sizes.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 230260)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1709,13 +1709,16 @@ vect_analyze_loop_operations (loop_vec_i
for it.  The different analyses will record information in the
loop_vec_info struct.  */
 static bool
-vect_analyze_loop_2 (loop_vec_info loop_vinfo)
+vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal)
 {
   bool ok;
   int max_vf = MAX_VECTORIZATION_FACTOR;
   int min_vf = 2;
   unsigned int n_stmts = 0;
 
+  /* The first group of checks is independent of the vector size.  */
+  fatal = true;
+
   /* Find all data references in the loop (which correspond to vdefs/vuses)
  and analyze their evolution in the loop.  */
 
@@ -1795,7 +1798,6 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   /* Classify all cross-iteration scalar data-flow cycles.
  Cross-iteration cycles caused by virtual phis are analyzed separately.  */
-
   vect_analyze_scalar_cycles (loop_vinfo);
 
   vect_pattern_recog (loop_vinfo);
@@ -1825,6 +1827,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
   return false;
 }
 
+  /* While the rest of the analysis below depends on it in some way.  */
+  fatal = false;
+
   /* Analyze data dependences between the data-refs in the loop
  and adjust the maximum vectorization factor according to
  the dependences.
@@ -2118,7 +2169,8 @@ vect_analyze_loop (struct loop *loop)
  return NULL;
}
 
-  if (vect_analyze_loop_2 (loop_vinfo))
+  bool fatal = false;
+  if (vect_analyze_loop_2 (loop_vinfo, fatal))
{
  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
 
@@ -2128,7 +2180,8 @@ vect_analyze_loop (struct loop *loop)
   destroy_loop_vec_info (loop_vinfo, true);
 
   vector_sizes &= ~current_vector_size;
-  if (vector_sizes == 0
+  if (fatal
+ || vector_sizes == 0
  || current_vector_size == 0)
return NULL;
 


Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-13 Thread Richard Biener
On Thu, 12 Nov 2015, David Malcolm wrote:

> On Thu, 2015-11-12 at 15:43 +0100, Richard Biener wrote:
> > On Thu, Nov 12, 2015 at 3:31 PM, Tom de Vries  
> > wrote:
> > > On 11/11/15 12:03, Richard Biener wrote:
> > >>
> > >> On Mon, 9 Nov 2015, Tom de Vries wrote:
> > >>
> > >>> On 09/11/15 16:35, Tom de Vries wrote:
> > 
> >  Hi,
> > 
> >  this patch series for stage1 trunk adds support to:
> >  - parallelize oacc kernels regions using parloops, and
> >  - map the loops onto the oacc gang dimension.
> > 
> >  The patch series contains these patches:
> > 
> > 1Insert new exit block only when needed in
> >    transform_to_exit_first_loop_alt
> > 2Make create_parallel_loop return void
> > 3Ignore reduction clause on kernels directive
> > 4Implement -foffload-alias
> > 5Add in_oacc_kernels_region in struct loop
> > 6Add pass_oacc_kernels
> > 7Add pass_dominator_oacc_kernels
> > 8Add pass_ch_oacc_kernels
> > 9Add pass_parallelize_loops_oacc_kernels
> >    10Add pass_oacc_kernels pass group in passes.def
> >    11Update testcases after adding kernels pass group
> >    12Handle acc loop directive
> >    13Add c-c++-common/goacc/kernels-*.c
> >    14Add gfortran.dg/goacc/kernels-*.f95
> >    15Add libgomp.oacc-c-c++-common/kernels-*.c
> >    16Add libgomp.oacc-fortran/kernels-*.f95
> > 
> >  The first 9 patches are more or less independent, but patches 10-16 are
> >  intended to be committed at the same time.
> > 
> >  Bootstrapped and reg-tested on x86_64.
> > 
> >  Build and reg-tested with nvidia accelerator, in combination with a
> >  patch that enables accelerator testing (which is submitted at
> >  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> >  I'll post the individual patches in reply to this message.
> > >>>
> > >>>
> > >>> This patch updates existing testcases with new pass numbers, given the
> > >>> passes
> > >>> that were added in the pass list in patch 10.
> > >>
> > >>
> > >> I think it would be nice to be able to specify the number in the .def
> > >> file instead so we can avoid this kind of churn everytime we do this.
> > >
> > >
> > > How about something along the lines of:
> > > ...
> > >   /* pass_build_ealias is a dummy pass that ensures that we
> > >  execute TODO_rebuild_alias at this point.  */
> > >   NEXT_PASS (pass_build_ealias);
> > >   /* Pass group that runs when there are oacc kernels in the
> > >   function.  */
> > >   NEXT_PASS (pass_oacc_kernels);
> > >   PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> > >   PUSH_ID ("oacc_kernels")
> > > ...
> > >   POP_ID ()
> > >   POP_INSERT_PASSES ()
> > >   NEXT_PASS (pass_fre);
> > > ...
> > >
> > > where the PUSH_ID/POP_ID pair has the functionality that all the contained
> > > passes:
> > > - have the id prefixed to the dump file, so the dump file of pass_ch
> > >   which normally is "ch" becomes "oacc_kernels_ch", and
> > > - the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
> > >   that it doesn't count as numbered instance of pass_ch
> > > ?
> > 
> > Hmm.  I'd like to have sth that allows me to add "slp" to both
> > pass_slp_vectorize
> > instances, having them share the suffix (as no two functions are in both 
> > dumps).
> > 
> > We similarly have "duplicates" across the -Og vs. the -O[0-3] pipeline.
> > 
> > Basically make all dump file name suffixes manually specified which means 
> > moving
> > them from the class definition to the actual instance.
> > 
> > Well, just an idea.  In a distant future I like our pass pipeline to become 
> > more
> > dynamic, getting away from a static passes.def towards, say, a pass "script"
> > (to be able to say "if inlining did nothing skip this group" or similar).
> 
> Can't that be done by having a parent pass to hold them, with a gate
> function?

Sure, that's how we do it for the loop sub-pipeline for example.

> Or are you thinking of having another domain-specific language?

Kind of.  I'm thinking of the pass pipeline being dynamic in the
sense of a program controlling execution of passes.  Basically
"scripting" the pass manager itself (yes, also with the idea to
give users and us more control).

Of course specific features can be implemented in the pass manager
itself (it's a "script" with static configuration).

> Thinking aloud, I've sometimes wondered if it would be helpful to be
> able to subclass pass_manager, so that multiple passes.def files could
> generate alternative pass_manager subclasses, with the precise choice of
> pass_manager subclass being determined by options+target.  I don't know
> if that latter idea is useful though.

I think the "use" of passes.def is simply too static.  We 

Re: [PATCH] Avoid false vector mask conversion

2015-11-13 Thread Richard Biener
On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  wrote:
> Hi,
>
> When we use LTO for Fortran we may have a mix of 32-bit and 1-bit scalar booleans. 
> It means we may have a conversion from one scalar type to another, which confuses 
> the vectorizer because values with different scalar boolean types may get the same 
> vectype.

Confuses aka fails to vectorize?

>  This patch transforms such conversions into comparison.
>
> I managed to make a small fortran test which gets vectorized with this patch 
> but I didn't find how I can run fortran test with LTO and then scan tree dump 
> to check it is vectorized.  BTW here is a loop from the test:
>
>   real*8 a(18)
>   logical b(18)
>   integer i
>
>   do i=1,18
>  if(a(i).gt.0.d0) then
> b(i)=.true.
>  else
> b(i)=.false.
>  endif
>   enddo

This looks like the "error" comes from if-conversion - can't we do
better there then?

Richard.

> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-12  Ilya Enkovich  
>
> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
> Transform useless boolean conversion into assignment.
>
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index b9d900c..62070da 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec<gimple *> 
> *stmts, tree *type_in,
>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>  return NULL;
>
> +  /* Check conversion between boolean types of different sizes.
> + If no vectype is specified, then we have a regular mask
> + assignment with no actual conversion.  */
> +  if (rhs_code == CONVERT_EXPR
> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
> +{
> +  if (TREE_CODE (rhs1) != SSA_NAME)
> +   return NULL;
> +
> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
> +  if (!rhs1_type)
> +   return NULL;
> +
> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
> +
> +  if (!vectype1)
> +   return NULL;
> +
> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +  pattern_stmt = gimple_build_assign (lhs, rhs1);
> +
> +  *type_out = vectype1;
> +  *type_in = vectype1;
> +  stmts->safe_push (last_stmt);
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> + "vect_recog_mask_conversion_pattern: detected:\n");
> +
> +  return pattern_stmt;
> +}
> +
>if (rhs_code != BIT_IOR_EXPR
>&& rhs_code != BIT_XOR_EXPR
>&& rhs_code != BIT_AND_EXPR)


Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-13 Thread Uros Bizjak
On Fri, Nov 13, 2015 at 3:36 PM, Segher Boessenkool
 wrote:
> Hi!
>
> On Fri, Nov 13, 2015 at 11:02:55AM +0100, Uros Bizjak wrote:
>> on alpha-linux-gnu.
>>
>> The difference starts in combine, where before the patch, we were able
>> to combine insns:
>>
>> (insn 7 6 8 2 (set (reg:DI 82)
>> (lshiftrt:DI (reg:DI 81 [ x ])
>> (const_int 16 [0x10]))) pr42269-1.c:8 66 {lshrdi3}
>>  (expr_list:REG_DEAD (reg:DI 81 [ x ])
>> (nil)))
>> (insn 8 7 11 2 (set (reg:DI 70 [ _2 ])
>> (sign_extend:DI (subreg:SI (reg:DI 82) 0))) pr42269-1.c:8 2
>> {*extendsidi2_1}
>>  (expr_list:REG_DEAD (reg:DI 82)
>> (nil)))
>>
>> to:
>>
>> Trying 7 -> 8:
>> Successfully matched this instruction:
>> (set (reg:DI 70 [ _2 ])
>> (zero_extract:DI (reg/v:DI 80 [ x ])
>> (const_int 16 [0x10])
>> (const_int 16 [0x10])))
>> allowing combination of insns 7 and 8
>> original costs 4 + 4 = 8
>> replacement cost 4
>> deferring deletion of insn with uid = 7.
>> modifying insn i3 8: r70:DI=zero_extract(r80:DI,0x10,0x10)
>> deferring rescan insn with uid = 8.
>>
>> After the patch, the combination fails:
>>
>> Trying 7 -> 8:
>> Failed to match this instruction:
>> (set (reg:DI 70 [ _2 ])
>> (sign_extend:DI (lshiftrt:SI (subreg:SI (reg/v:DI 80 [ x ]) 0)
>> (const_int 16 [0x10]))))
>
> Somehow, before the patch, it decided to do a zero-extension (where the
> combined insns had a sign extension).  Was that even correct?  Maybe
> many bits of reg 80 (or, hrm, 81 in the orig?!) are known zero?

Oops, this analysis is wrong. I'll re-do the analysis in reported PR 68330 [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68330

Uros.


[PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-11-13 Thread Wilco Dijkstra
This patch adds CCMP selection based on rtx costs. This is based on Jiong's
already approved patch
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01434.html with some minor
refactoring and the tests updated.

OK for commit?

ChangeLog:
2015-11-13  Jiong Wang  

gcc/
* ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
generated from different expand order.
  
gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: Update test.

---
 gcc/ccmp.c| 47
+++
 gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 15 --
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index cbdbd6d..95a41a6 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-outof-ssa.h"
 #include "cfgexpand.h"
 #include "ccmp.h"
+#include "predict.h"
 
 /* The following functions expand conditional compare (CCMP) instructions.
Here is a short description about the over all algorithm:
@@ -159,6 +160,8 @@ expand_ccmp_next (gimple *g, enum tree_code code, rtx
prev,
 static rtx
 expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
 {
+  rtx prep_seq_1, gen_seq_1;
+  rtx prep_seq_2, gen_seq_2;
   tree exp = gimple_assign_rhs_to_tree (g);
   enum tree_code code = TREE_CODE (exp);
   gimple *gs0 = get_gimple_for_ssa_name (TREE_OPERAND (exp, 0));
@@ -174,19 +177,53 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx
*gen_seq)
 {
   if (TREE_CODE_CLASS (code1) == tcc_comparison)
{
- int unsignedp0;
- enum rtx_code rcode0;
+ int unsignedp0, unsignedp1;
+ enum rtx_code rcode0, rcode1;
+ int speed_p = optimize_insn_for_speed_p ();
+ rtx tmp2, ret, ret2;
+ unsigned cost1 = MAX_COST;
+ unsigned cost2 = MAX_COST;
 
  unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs0)));
+ unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs1)));
  rcode0 = get_rtx_code (code0, unsignedp0);
+ rcode1 = get_rtx_code (code1, unsignedp1);
 
- tmp = targetm.gen_ccmp_first (prep_seq, gen_seq, rcode0,
+ tmp = targetm.gen_ccmp_first (&prep_seq_1, &gen_seq_1, rcode0,
gimple_assign_rhs1 (gs0),
gimple_assign_rhs2 (gs0));
- if (!tmp)
+
+ tmp2 = targetm.gen_ccmp_first (&prep_seq_2, &gen_seq_2, rcode1,
+gimple_assign_rhs1 (gs1),
+gimple_assign_rhs2 (gs1));
+
+ if (!tmp && !tmp2)
return NULL_RTX;
 
- return expand_ccmp_next (gs1, code, tmp, prep_seq, gen_seq);
+ if (tmp != NULL)
+   {
+ ret = expand_ccmp_next (gs1, code, tmp, &prep_seq_1,
&gen_seq_1);
+ cost1 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_1),
speed_p);
+ cost1 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_1),
speed_p);
+   }
+ if (tmp2 != NULL)
+   {
+ ret2 = expand_ccmp_next (gs0, code, tmp2, &prep_seq_2,
+  &gen_seq_2);
+ cost2 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_2),
speed_p);
+ cost2 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_2),
speed_p);
+   }
+
+ if (cost2 < cost1)
+   {
+ *prep_seq = prep_seq_2;
+ *gen_seq = gen_seq_2;
+ return ret2;
+   }
+
+ *prep_seq = prep_seq_1;
+ *gen_seq = gen_seq_1;
+ return ret;
}
   else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
index ef077e0..7c39b61 100644
--- a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
@@ -80,5 +80,16 @@ f13 (int a, int b)
   return a == 3 || a == 0;
 }
 
-/* { dg-final { scan-assembler "fccmp\t" } } */
-/* { dg-final { scan-assembler "fccmpe\t" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+32" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+33" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+34" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+35" } } */
+
+/* { dg-final { scan-assembler-times "\tcmp\tw\[0-9\]+, 0" 4 } } */
+/* { dg-final { scan-assembler-times "fcmpe\t(.)+0\\.0" 2 } } */
+/* { dg-final { scan-assembler-times "fcmp\t(.)+0\\.0" 2 } } */
+
+/* { dg-final { scan-assembler "adds\t" } } */
+/* { dg-final { scan-assembler-times "\tccmp\t" 11 } } */
+/* { dg-final { scan-assembler-times "fccmp\t.*0\\.0" 1 } } */
+/* { dg-final { scan-assembler-times "fccmpe\t.*0\\.0" 1 } } */
-- 
1.9.1





[PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-11-13 Thread Wilco Dijkstra
This patch adds support for rtx costing of CCMP. The cost is the same as
an int/FP compare; however, comparisons with zero get a slightly larger cost.
This means we prefer emitting compares with zero so they can be merged with
ALU operations.

OK for commit?

ChangeLog:
2015-11-13  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (aarch64_if_then_else_costs):
Add support for CCMP costing.

---
 gcc/config/aarch64/aarch64.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a224982..b789841 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5638,6 +5638,26 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx
op2, int *cost, bool speed)
 }
   else if (GET_MODE_CLASS (GET_MODE (inner)) == MODE_CC)
 {
+  /* CCMP.  */
+  if ((GET_CODE (op1) == COMPARE) && CONST_INT_P (op2))
+   {
+ /* Increase cost of CCMP reg, 0, imm, CC to prefer CMP reg, 0.  */
+ if (XEXP (op1, 1) == const0_rtx)
+   *cost += 1;
+ if (speed)
+   {
+ machine_mode mode = GET_MODE (XEXP (op1, 0));
+ const struct cpu_cost_table *extra_cost
+   = aarch64_tune_params.insn_extra_cost;
+
+ if (GET_MODE_CLASS (mode) == MODE_INT)
+   *cost += extra_cost->alu.arith;
+ else
+   *cost += extra_cost->fp[mode == DFmode].compare;
+   }
+ return true;
+   }
+
   /* It's a conditional operation based on the status flags,
 so it must be some flavor of CSEL.  */
 
-- 
1.9.1





[PATCH] Fix SH/FDPIC bad codegen with ssp enabled

2015-11-13 Thread Rich Felker
The "chk_guard_add" pattern used for loading the GOT slot address for
__stack_chk_guard hard-codes use of r12 as a fixed GOT register and
thus is not suitable for FDPIC, where the saved initial value of r12
from function entry is what we need.

I would actually prefer removing this hack entirely if possible. I
tried non-FDPIC with it disabled and did not experience any problems;
I suspect it was written to work around a bug that no longer exists.

2015-11-13  Rich Felker 

gcc/
* config/sh/sh.md (symGOT_load): Suppress __stack_chk_guard
address loading hack for FDPIC targets.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 7a40d0f..45c9995 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -11082,7 +11082,7 @@ (define_expand "symGOT_load"
   operands[2] = !can_create_pseudo_p () ? operands[0] : gen_reg_rtx (Pmode);
   operands[3] = !can_create_pseudo_p () ? operands[0] : gen_reg_rtx (Pmode);
 
-  if (!TARGET_SHMEDIA
+  if (!TARGET_SHMEDIA && !TARGET_FDPIC
   && flag_stack_protect
   && GET_CODE (operands[1]) == CONST
   && GET_CODE (XEXP (operands[1], 0)) == UNSPEC


Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-13 Thread Segher Boessenkool
On Fri, Nov 13, 2015 at 11:17:38AM +0100, Uros Bizjak wrote:
> IMO, this is such a small code-size regression, that it should not
> block the patch submission.

In that case: Kyrill, the patch is okay for trunk.  Thanks!

> It would be nice to know, what causes the
> increase (in case, this is some systematic oversight), but we can live
> with it.

After the patch it will no longer combine an imul reg,reg (+ mov) into an
imul mem,reg.  _Most_ cases that end up as mem,reg are already expanded
as such, but not all.  It's hard to make a smallish testcase.


Segher


[nvptx] complex vector reductions

2015-11-13 Thread Nathan Sidwell
I noticed that we weren't supporting reductions of complex type, particularly 
complex double.


I've committed this patch to add support for vector reductions.  We split the 
complex type apart and process each half before sticking it back together again. 
As it happens, only 32-bit shuffles exist in hardware, so splitting (say) 
complex float this way doesn't result in worse code than punning the whole 
thing to a 64-bit int.  I suspect it might play better, by keeping the real and 
imaginary parts separate.


Worker and gang-level reductions of such types will be a different patch, which 
I need to think about.
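
The shape of the transformation, as a host-compilable C++ sketch (the real
code emits the PTX shuffle builtin; shuffle_down_lane below is only a stand-in
so the snippet compiles):

#include <complex>

/* Stand-in for the per-lane 32/64-bit shuffle the patch actually emits.  */
static double
shuffle_down_lane (double v, unsigned shift)
{
  (void) shift;
  return v;
}

static std::complex<double>
shuffle_down_complex (std::complex<double> v, unsigned shift)
{
  /* The real and imaginary halves are shuffled independently and then
     recombined, mirroring the REALPART_EXPR/IMAGPART_EXPR split done in
     nvptx_generate_vector_shuffle.  */
  return { shuffle_down_lane (v.real (), shift),
           shuffle_down_lane (v.imag (), shift) };
}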


nathan
2015-11-13  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (nvptx_generate_vector_shuffle): Deal with
	complex types.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: New.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: New.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 230320)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -3634,26 +3634,51 @@ nvptx_generate_vector_shuffle (location_
 {
   unsigned fn = NVPTX_BUILTIN_SHUFFLE;
   tree_code code = NOP_EXPR;
-  tree type = unsigned_type_node;
-  enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+  tree arg_type = unsigned_type_node;
+  tree var_type = TREE_TYPE (var);
+  tree dest_type = var_type;
 
-  if (!INTEGRAL_MODE_P (mode))
+  if (TREE_CODE (var_type) == COMPLEX_TYPE)
+var_type = TREE_TYPE (var_type);
+
+  if (TREE_CODE (var_type) == REAL_TYPE)
 code = VIEW_CONVERT_EXPR;
-  if (GET_MODE_SIZE (mode) == GET_MODE_SIZE (DImode))
+
+  if (TYPE_SIZE (var_type)
+  == TYPE_SIZE (long_long_unsigned_type_node))
 {
   fn = NVPTX_BUILTIN_SHUFFLELL;
-  type = long_long_unsigned_type_node;
+  arg_type = long_long_unsigned_type_node;
 }
-
+  
   tree call = nvptx_builtin_decl (fn, true);
-  call = build_call_expr_loc
-(loc, call, 3, fold_build1 (code, type, var),
- build_int_cst (unsigned_type_node, shift),
- build_int_cst (unsigned_type_node, SHUFFLE_DOWN));
+  tree bits = build_int_cst (unsigned_type_node, shift);
+  tree kind = build_int_cst (unsigned_type_node, SHUFFLE_DOWN);
+  tree expr;
+
+  if (var_type != dest_type)
+{
+  /* Do real and imaginary parts separately.  */
+  tree real = fold_build1 (REALPART_EXPR, var_type, var);
+  real = fold_build1 (code, arg_type, real);
+  real = build_call_expr_loc (loc, call, 3, real, bits, kind);
+  real = fold_build1 (code, var_type, real);
 
-  call = fold_build1 (code, TREE_TYPE (dest_var), call);
+  tree imag = fold_build1 (IMAGPART_EXPR, var_type, var);
+  imag = fold_build1 (code, arg_type, imag);
+  imag = build_call_expr_loc (loc, call, 3, imag, bits, kind);
+  imag = fold_build1 (code, var_type, imag);
+
+  expr = fold_build2 (COMPLEX_EXPR, dest_type, real, imag);
+}
+  else
+{
+  expr = fold_build1 (code, arg_type, var);
+  expr = build_call_expr_loc (loc, call, 3, expr, bits, kind);
+  expr = fold_build1 (code, dest_type, expr);
+}
 
-  gimplify_assign (dest_var, call, seq);
+  gimplify_assign (dest_var, expr, seq);
 }
 
 /* Insert code to locklessly update  *PTR with *PTR OP VAR just before
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c	(working copy)
@@ -0,0 +1,52 @@
+
+#include <complex.h>
+
+/* Double float has 53 bits of fraction. */
+#define FRAC (1.0 / (1LL << 48))
+
+int close_enough (double _Complex a, double _Complex b)
+{
+  double _Complex diff = a - b;
+  double mag2_a = __real__(a) * __real__ (a) + __imag__ (a) * __imag__ (a);
+  double mag2_diff = (__real__(diff) * __real__ (diff)
+		 + __imag__ (diff) * __imag__ (diff));
+
+  return mag2_diff / mag2_a < (FRAC * FRAC);
+}
+
+int main (void)
+{
+#define N 100
+  double _Complex ary[N], sum, prod, tsum, tprod;
+  int ix;
+
+  sum = tsum = 0;
+  prod = tprod = 1;
+  
+  for (ix = 0; ix < N;  ix++)
+{
+  double frac = ix * (1.0 / 1024) + 1.0;
+  
+  ary[ix] = frac + frac * 2.0i - 1.0i;
+  sum += ary[ix];
+  prod *= ary[ix];
+}
+
+#pragma acc parallel vector_length(32) copyin(ary) copy (tsum, tprod)
+  {
+#pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
+for (ix = 0; ix < N; ix++)
+  {
+	tsum += ary[ix];
+	tprod *= ary[ix];
+  }
+  }
+
+  if (!close_enough (sum, tsum))
+return 1;
+
+  if (!close_enough (prod, tprod))
+return 1;
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c	(revision 0)
+++ 

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 07:16:08AM -0500, David Malcolm wrote:
> > > +   && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > > +   || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> > 
> > This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).
> 
> I based this code on the code in lookup_field right above it;
> I copied-and-pasted that conditional, so presumably it should also be
> changed in lookup_field (which has the condition twice)?
> 
> FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.
> 
> /* Nonzero if TYPE is a record or union type.  */
> #define RECORD_OR_UNION_TYPE_P(TYPE)  \
>   (TREE_CODE (TYPE) == RECORD_TYPE\
>|| TREE_CODE (TYPE) == UNION_TYPE  \
>|| TREE_CODE (TYPE) == QUAL_UNION_TYPE)
> 
> FWIW I've made the change in the attached patch (both to the new
> function, and to lookup_field).

Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
things down.  I think we should have a C FE-only macro, maybe called
RECORD_OR_UNION_TYPE_P that only checks for those two types, but this is
something that I can deal with later on.

So I think please just drop these changes for now.  Sorry again.

> > > +  const bool debug = false;
> > > +
> > > +  if (debug)
> > > +{
> > > +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > > +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > > +}
> > 
> > Did you leave this debug stuff here intentionally?
> 
> I find it useful, but I believe it's against our policy, so I've deleted
> it in the attached patch.

Probably.  But you could surely have a separate DEBUG_FUNCTION that can be
called from gdb.
 
> > > +  /* Build the rest of the row by considering neighbours to
> > > +  the north, west and northwest.  */
> > > +  for (int j = 0; j < len_s; j++)
> > > + {
> > > +   edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > > +   edit_distance_t deletion = v1[j] + 1;
> > > +   edit_distance_t insertion= v0[j + 1] + 1;
> > 
> > The formatting doesn't look right here.
> 
> It's correct; it's "diff" inserting two spaces before a tab combined
> with our mixed spaces+tab convention: the "for" is at column 6 (6
> spaces), whereas the other lines are at column 8 (1 tab), which looks
> weird in a diff.

Sorry, what I had in mind were the spaces after "deletion" and "insertion"
before "=".  Not a big deal, of course.
 
> Patch attached; only tested lightly so far (compiles, and passes
> spellcheck subset of tests).
> 
> OK for trunk if it passes bootstrap?

Ok modulo the RECORD_OR_UNION_TYPE_P changes, thanks.

Marek
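
For context, the user-visible behaviour the series aims at looks roughly like
this (made-up test case; the exact wording of the note may differ from what
the committed patch prints):

struct color
{
  int red;
  int green;
  int blue;
};

int
get_red (struct color *c)
{
  return c->redd;  /* error: no member named 'redd'; lookup_field_fuzzy
                      walks the fields, computes Levenshtein distances and,
                      since the distance 1 is within the cutoff of 2,
                      suggests 'red'.  */
}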


Re: [patch] Implement std::experimental::source_location (sort of)

2015-11-13 Thread Martin Sebor

On 11/13/2015 08:21 AM, Martin Sebor wrote:

On 11/13/2015 05:56 AM, Jonathan Wakely wrote:

This is a non-conforming implementation of
https://rawgit.com/cplusplus/fundamentals-ts/v2/fundamentals-ts.html#reflection.src_loc



It doesn't provide any column numbers, and fails to meet the
requirement that using current() in a NSDMI will refer to the location
of the constructor ... but maybe it's good enough until we get the
necessary front-end support. It is experimental, after all.

Thoughts?


Lars (CC'd) mentioned he was interested in implementing the front
end support (http://gcc.gnu.org/PR66561).


(Hopefully with the right email for Lars now.)



Martin
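
For anyone wondering how a library-only source_location can work at all
without new front-end support, the usual trick (a sketch, not the exact patch)
is to evaluate the existing __builtin_FILE/__builtin_FUNCTION/__builtin_LINE
builtins in default arguments of current(), which is also why no column is
available and why current() used in an NSDMI reports the member's location
rather than the constructor's:

struct source_location_sketch
{
  static constexpr source_location_sketch
  current (const char *file = __builtin_FILE (),
           const char *func = __builtin_FUNCTION (),
           int line = __builtin_LINE ())
  {
    source_location_sketch loc;
    loc._M_file = file;
    loc._M_func = func;
    loc._M_line = line;
    return loc;         /* Column stays 0: no builtin provides it.  */
  }

  const char *_M_file = "";
  const char *_M_func = "";
  unsigned _M_line = 0;
  unsigned _M_column = 0;
};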




Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > On 11/13/2015 04:11 PM, Marek Polacek wrote:
> > >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
> > >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> > >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> > >things down.
> > 
> > I don't think so, the three codes are adjacent so we should be generating
> > "(unsigned)(code - RECORD_TYPE) < 3".
> 
> Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> form, then we don't need a separate version for the C FE.

Why?  The compiler should do that already, or do you care about
-O0 builds or host compilers other than gcc that aren't able to do this?
The disadvantage of writing it manually that way is that you need to assert
somewhere that the 3 values indeed are consecutive, while
when the (host?) compiler performs this optimization, it does that only if
they are consecutive, if they are not, the code will be just less efficient.

Jakub
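
A tiny self-contained illustration of the point (the enum and its values are
made up; the interesting bit is that the compiler derives the range check by
itself when the codes happen to be consecutive):

enum code_sketch { RECORD_S = 10, UNION_S = 11, QUAL_UNION_S = 12 };

int
is_record_or_union_sketch (enum code_sketch c)
{
  /* At -O1 and above GCC typically turns this into a single
     (unsigned) (c - RECORD_S) <= 2 range test, so writing the subtraction
     by hand buys nothing and loses the "are they really consecutive?"
     safety net mentioned above.  */
  return c == RECORD_S || c == UNION_S || c == QUAL_UNION_S;
}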


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-13 Thread Jim Wilson
On Thu, Nov 12, 2015 at 8:15 AM, Ramana Radhakrishnan
 wrote:
> This is OK to go in with a follow up to handle this cpu in t-aprofile
> similar to the other cpus in there - for bonus points please deal with
> the exynos core at the same time if not already done.

This was tested with a arm-eabi cross compiler build configured
--with-multilib-list=aprofile, and then using
  ./xgcc -B./ -mcpu=X --print-libgcc
to verify that processor names map to the correct libgcc multilib.

Jim
2015-11-12  Jim Wilson  

	* gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add lines for exynos-m1
	and qdf24xx to match -march=armv8-a.

Index: gcc/config/arm/t-aprofile
===
--- gcc/config/arm/t-aprofile	(revision 230283)
+++ gcc/config/arm/t-aprofile	(working copy)
@@ -91,6 +91,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?corte
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57.cortex-a53
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72.cortex-a53
+MULTILIB_MATCHES   += march?armv8-a=mcpu?exynos-m1
+MULTILIB_MATCHES   += march?armv8-a=mcpu?qdf24xx
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc


Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-13 Thread Martin Liška
On 11/13/2015 05:32 PM, Jeff Law wrote:
> On 11/13/2015 05:50 AM, Martin Liška wrote:
>> Hello.
>>
>> Patch survives regbootstrap on x86_64-linux-gnu.
>> Ready for trunk?
>>
>> Thanks,
>> Martin
>>
>>
>> 0001-Fix-memory-leaks-in-tree-ssa-uninit.c.patch
>>
>>
>>  From 54851503251dee7a8bd074485db262715e628728 Mon Sep 17 00:00:00 2001
>> From: marxin
>> Date: Fri, 13 Nov 2015 12:23:22 +0100
>> Subject: [PATCH] Fix memory leaks in tree-ssa-uninit.c
>>
>> gcc/ChangeLog:
>>
>> 2015-11-13  Martin Liska
>>
>> * tree-ssa-uninit.c (convert_control_dep_chain_into_preds):
>> Fix GNU coding style.
>> (find_def_preds): Use auto_vec.
>> (destroy_predicate_vecs): Change signature of the function.
>> (prune_uninit_phi_opnds_in_unrealizable_paths): Use the
>> new signature.
>> (simplify_preds_4): Use destroy_predicate_vecs instread of
>> just releasing preds vector.
>> (normalize_preds): Likewise.
>> (is_use_properly_guarded): Use new signature of
>> destroy_predicate_vecs.
>> (find_uninit_use): Likewise.
> OK.
> 
> FWIW, there's all kinds of spaces vs tabs issues in this file.  I'm curious 
> why you chose to fix convert_control_dep_chain_into_preds, but didn't change 
> any others.

Hi Jeff.

Thanks for the confirmation, you are right, it's full of coding style issues. I can 
change these if that would be desired?

Thanks,
Martin

> 
> Jeff



Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-11-13 Thread Bernd Schmidt

On 11/06/2015 08:51 PM, Jeff Law wrote:

I think the change is fine for the trunk, though I'm still curious about
how the code as-is resulted in a comparison failure.


I've been retesting and I think this was a case of something else 
triggering a random failure - the patch made it go away on the testcase 
I looked at, but random compare-debug failures persisted. I think I have 
those fixed now. I'll leave this patch out for now.



Bernd



Re: [patch] update locale support for FreeBSD

2015-11-13 Thread Jonathan Wakely

On 12/11/15 23:32 +0100, Andreas Tobler wrote:

All,

with the work from Jennifer Yao and John Marino we can now update the 
locale support on FreeBSD to the level of DragonFly.


Results of this work can be found on the results list.

Here my small addendum to make it work on FreeBSD.

Is this ok for trunk? (Given that the work from Jennifer and John is 
committed before stage3?)


Those patches are now in.

I'm OK with this change for gcc6. It is an ABI change for FreeBSD, as
it changes the std::locale definitions, but as the target maintainer
you can decide if that's OK.



[PATCH 1/4][AArch64] Generalize CCMP support

2015-11-13 Thread Wilco Dijkstra
This patch series generalizes CCMP by adding FCCMP support and enabling more
optimizations. The first patch simplifies the representation of CCMP
patterns by using if-then-else, which closely matches the real instruction
semantics. As a result, the existing special CC modes and functions are no
longer required. The condition of the CCMP is the "if" condition, which
compares the previously set CC register. The "then" part does the compare like
a normal compare. The "else" part contains the integer value of the AArch64
condition that must be false if the "if" condition is false.
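
Roughly, in C terms (an illustrative sketch of the instruction semantics
only, not GCC code; the helper and flag encodings are made up):

static unsigned int
nzcv_of_compare (long a, long b)
{
  /* The flags a normal compare of a and b would set (simplified:
     C and V omitted).  */
  unsigned int flags = 0;
  if (a == b)
    flags |= 4;   /* Z */
  if (a < b)
    flags |= 8;   /* N */
  return flags;
}

static unsigned int
ccmp_semantics (int cond_holds_on_old_flags, long a, long b,
                unsigned int else_nzcv)
{
  /* "if": test the condition on the previously set flags;
     "then": a normal compare;
     "else": the immediate NZCV value, which the pattern derives from
     the condition that must be false.  */
  return cond_holds_on_old_flags ? nzcv_of_compare (a, b) : else_nzcv;
}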

As a result of this patch a compare with zero can be merged into an ALU
operation:

int
f (int a, int b)
{
  a += b;
  return a == 0 || a == 3;
}

f:
        adds    w0, w0, w1
        ccmp    w0, 3, 4, ne
        cset    w0, eq
        ret

Passes GCC regression tests. OK for commit?

ChangeLog:
2015-11-13  Wilco Dijkstra  

* gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from return value of
expand_ccmp_expr_1.
* gcc/config/aarch64/aarch64.md (ccmp_and): Use if_then_else for
ccmp.
(ccmp_ior): Remove pattern.
(cmp): Remove expand.
(cmp): Globalize pattern.
(cstorecc4): Use cc_register.
(movcc): Remove ccmp_cc_register check.
* gcc/config/aarch64/aarch64.c (aarch64_get_condition_code_1):
Simplify after removal of CC_DNE/* modes.
(aarch64_ccmp_mode_to_code): Remove.
(aarch64_print_operand): Remove 'K' case.  Merge 'm' and 'M' cases.
In 'k' case use integer as condition.
(aarch64_nzcv_codes): Remove inverted cases.
(aarch64_code_to_ccmode): Remove.
(aarch64_gen_ccmp_first): Use cmp pattern directly.  Return the correct
comparison with CC register to be used in following CCMP/branch/CSEL.
(aarch64_gen_ccmp_next): Use previous comparison and mode in CCMP
pattern.  Return the comparison with CC register.  Invert conditions
when bitcode is OR.
* gcc/config/aarch64/aarch64-modes.def: Remove CC_DNE/* modes.
* gcc/config/aarch64/predicates.md (ccmp_cc_register): Remove.


---
 gcc/ccmp.c   |   7 +-
 gcc/config/aarch64/aarch64-modes.def |  10 --
 gcc/config/aarch64/aarch64.c | 305
---
 gcc/config/aarch64/aarch64.md|  68 ++--
 gcc/config/aarch64/predicates.md |  17 --
 5 files changed, 83 insertions(+), 324 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 20348d9..ef60a6d 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -249,9 +249,10 @@ expand_ccmp_expr (gimple *g)
   enum insn_code icode;
   enum machine_mode cc_mode = CCmode;
   tree lhs = gimple_assign_lhs (g);
+  enum rtx_code cmp_code = GET_CODE (tmp);
 
 #ifdef SELECT_CC_MODE
-  cc_mode = SELECT_CC_MODE (NE, tmp, const0_rtx);
+  cc_mode = SELECT_CC_MODE (cmp_code, XEXP (tmp, 0), const0_rtx);
 #endif
   icode = optab_handler (cstore_optab, cc_mode);
   if (icode != CODE_FOR_nothing)
@@ -262,8 +263,8 @@ expand_ccmp_expr (gimple *g)
  emit_insn (prep_seq);
  emit_insn (gen_seq);
 
- tmp = emit_cstore (target, icode, NE, cc_mode, cc_mode,
-0, tmp, const0_rtx, 1, mode);
+ tmp = emit_cstore (target, icode, cmp_code, cc_mode, cc_mode,
+0, XEXP (tmp, 0), const0_rtx, 1, mode);
  if (tmp)
return tmp;
}
diff --git a/gcc/config/aarch64/aarch64-modes.def
b/gcc/config/aarch64/aarch64-modes.def
index 3bf3b2d..0c529e9 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -25,16 +25,6 @@ CC_MODE (CC_ZESWP); /* zero-extend LHS (but swap to make
it RHS).  */
 CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS).  */
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
-CC_MODE (CC_DNE);
-CC_MODE (CC_DEQ);
-CC_MODE (CC_DLE);
-CC_MODE (CC_DLT);
-CC_MODE (CC_DGE);
-CC_MODE (CC_DGT);
-CC_MODE (CC_DLEU);
-CC_MODE (CC_DLTU);
-CC_MODE (CC_DGEU);
-CC_MODE (CC_DGTU);
 
 /* Half-precision floating point for __fp16.  */
 FLOAT_MODE (HF, 2, 0);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9ae8c69..adb222a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3907,7 +3907,6 @@ aarch64_get_condition_code (rtx x)
 static int
 aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code
comp_code)
 {
-  int ne = -1, eq = -1;
   switch (mode)
 {
 case CCFPmode:
@@ -3930,56 +3929,6 @@ aarch64_get_condition_code_1 (enum machine_mode mode,
enum rtx_code comp_code)
}
   break;
 
-case CC_DNEmode:
-  ne = AARCH64_NE;
-  eq = AARCH64_EQ;
-  break;
-
-case CC_DEQmode:
-  ne = AARCH64_EQ;
-  eq = AARCH64_NE;
-  break;
-
-case CC_DGEmode:
-  ne = 

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 04:56:30PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> > On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > > I don't think so, the three codes are adjacent so we should be generating
> > > "(unsigned)(code - RECORD_TYPE) < 3".
> > 
> > Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> > form, then we don't need a separate version for the C FE.
> 
> Why?  The compiler should do that already, or do you care about
> -O0 builds or host compilers other than gcc that aren't able to do this?

I don't.

> The disadvantage of writing it manually that way is that you need to assert
> somewhere that the 3 values indeed are consecutive, while
> when the (host?) compiler performs this optimization, it does that only if
> they are consecutive; if they are not, the code will just be less efficient.

Ok, I understand now what Bernd meant.  I didn't realize the compiler already
does such optimization with those _TYPEs...

Marek


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-13 Thread Jim Wilson
On Fri, Nov 13, 2015 at 9:02 AM, Kyrill Tkachov  wrote:
> Sorry to chime in late on this, but while you're at it could
> you please add an xgene1 entry?

Yes, I just realized that xgene1 is missing too; I rushed the patch a
little too much.  I will revise it to add xgene1 also.

Jim


Re: [patch] Implement std::experimental::source_location (sort of)

2015-11-13 Thread Martin Sebor

On 11/13/2015 05:56 AM, Jonathan Wakely wrote:

This is a non-conforming implementation of
https://rawgit.com/cplusplus/fundamentals-ts/v2/fundamentals-ts.html#reflection.src_loc


It doesn't provide any column numbers, and fails to meet the
requirement that using current() in a NSDMI will refer to the location
of the constructor ... but maybe it's good enough until we get the
necessary front-end support. It is experimental, after all.

Thoughts?


Lars (CC'd) mentioned he was interested in implementing the front
end support (http://gcc.gnu.org/PR66561).

Martin


Re: [Bulk] [OpenACC 0/7] host_data construct

2015-11-13 Thread Jakub Jelinek
On Mon, Nov 02, 2015 at 06:33:39PM +, Julian Brown wrote:
> Firstly, on trunk at least, use_device_ptr variables are restricted to
> pointer or array types: that restriction doesn't exist in OpenACC, nor
> actually could I find it in the OpenMP 4.1 document (my guess is the
> standards are supposed to match in this regard). I think that a program
> such as this should work:

So, after talking about this on omp-lang, it seems there is agreement
that only arrays and pointer types (or references to arrays or pointers)
should be allowed in the use_device_ptr clause, and that for pointers/references
to pointers it should probably act the way I've coded it up, i.e. that
it translates the pointer so that it points to the device object corresponding
to the one it points to on the host.  It is too late to change the standard
now, but it will be changed soon, and hopefully clarified in examples.

> void target_fn (int *targ_data);
> 
> int
> main (int argc, char *argv[])
> {
>   char out;
>   int myvar;
> #pragma omp target enter data map(to: myvar)
> 
> #pragma omp target data use_device_ptr(myvar) map(from:out)
>   {
> target_fn (&myvar);
> out = 5;
>   }
> 
>   return 0;
> }

That would make the above non-conforming for OpenMP.

> Secondly, attempts to use use_device_ptr on (e.g.
> dynamically-allocated) arrays accessed through a pointer cause an ICE
> with the existing trunk OpenMP code:
> 
> #include 
> 
> void target_fn (char *targ_data);
> 
> int
> main (int argc, char *argv[])
> {
>   char *myarr, out;
> 
>   myarr = malloc (1024);
> 
> #pragma omp target data map(to: myarr[0:1024])
>   {
> #pragma omp target data use_device_ptr(myarr) map(from:out)
> {
>   target_fn (myarr);
>   out = 5;
> }
>   }
> 
>   return 0;
> }

Can't reproduce this ICE (at least not on gomp-4_5-branch, but there
aren't significant changes from the trunk there).

> Furthermore, this looks strange to me (006t.omplower):
> 
>   .omp_data_arr.5.out = &out;
>   myarr.8 = myarr;
>   .omp_data_arr.5.myarr = myarr.8;
>   #pragma omp target data map(from:out [len: 1]) use_device_ptr(myarr)
> {
>   D.2436 = .omp_data_arr.5.myarr;
>   myarr = D.2436;
> 
> That's clobbering the original myarr variable, right?

Just use -fdump-tree-omplower-uid to see that it is a different variable.
Basically, for OpenMP use_device_ptr creates a private copy of the
pointer for the body of the target data construct, and that pointer is
assigned the target device's address.  For arrays the implementation
creates an artificial pointer variable (holding the start of the array
initially) and replaces all references to the array in the target data
body with dereference of the pointer.
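
For the array case, a compilable sketch of what that means at the source
level (illustrative names; assumes -fopenmp):

extern void use_device_buffer (char *);

void
array_case (void)
{
  char arr[1024];
  #pragma omp target data map(to: arr)
  {
    #pragma omp target data use_device_ptr(arr)
    {
      /* Every reference to arr in this body goes through an artificial
         pointer that initially holds &arr[0] and is then assigned the
         device address of the mapped array; the host array itself is
         not clobbered.  */
      use_device_buffer (arr);
    }
  }
}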

Jakub


Re: TR1 Special Math

2015-11-13 Thread Jonathan Wakely
On 25 October 2015 at 20:48, Jonathan Wakely  wrote:
> On 25 October 2015 at 17:46, Ed Smith-Rowland <3dw...@verizon.net> wrote:
>> On 10/24/2015 11:38 PM, Jonathan Wakely wrote:
>>>
>>> On 8 May 2015 at 15:05, Ed Smith-Rowland <3dw...@verizon.net> wrote:

 On 05/07/2015 12:06 PM, Jonathan Wakely wrote:
>
> Hi Ed,
>
> The C++ committee is considering the
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4437.pdf
> proposal to make C++17 include the contents of ISO 29124:2010 (the
> special math functions from TR1 that went into a separate standard,
> not into C++11).
>
> What is the status of our TR1 implementation? Is it complete? Good
> enough quality to move out of the tr1 sub-dir?
>
> Even if N4437 isn't accepted for C++17 we could move things around to
> turn the TR1 code into an iso29124 implementation, do you think that
> would make sense?
>
 That would make absolute sense.
 I actually have a tree where I've done that.
 All the functions are in there (29124 removed the hypergeometric
 functions.)
 I'd like to keep those as extensions.
 I have some bugfixes also.

 I have a better version of the Carlson elliptic functions (which are used
 in
 the 29124 elliptic functions).

 Ed

>>> Hi Ed, Florian,
>>>
>>> Here's a patch to re-use the TR1 math functions to implement IS 29124,
>>> what do you think of this approach? Ed, were you just going to copy
>>> the files and have duplicated code?
>>>
>>> We should probably uglify the names of the hypergeometric functions if
>>> they are not in the final standard.
>>>
>>> This doesn't include Florian's patch, which should be applied.
>>>
>>> (I want to get this done before stage 1 ends in a couple of weeks, so
>>> am posting this for review now, but I'll be unavailable for the next
>>> week or two and might not be able to actually commit anything until
>>> stage 3).
>>
>> Hi all!
>>
>> I am actually very aware of the stage 1 deadline and am working furiously!
>>
>> This patch adds the hypergeometric and confluent hypergeometric functions
>> that were actually stricken from TR29124.
>> I actually had a mind to add those back, especially since the confluent one
>> is actually pretty stable in its realm and is used in some statistics
>> tests.
>> I expect that some people have ventured to use both, and so TR29124 would not
>> be a full replacement for TR1 without them.
>>
>> I intend to post within the next few days.  I have to realize that some of
>> my hopes and dreams would be better done with these in tree! ;-)
>>
>> Thank you for lighting a fire Jonathan!
>
> Excellent, glad to hear you're on this, as you know the code and the
> specs, whereas I'm poking around blindly :-)

Hi Ed,

Have you been able to find enough time to work on this?

Will you be able to make the stage 1 deadline tomorrow, and if not, do
you think I should apply my patch to re-use the TR1 stuff?  (We can
apply Florian's bug fix to that as well).


Re: [nvptx] complex vector reductions

2015-11-13 Thread Bernd Schmidt

On 11/13/2015 04:11 PM, Nathan Sidwell wrote:

I noticed that we weren't supporting reductions of complex type,
particularly complex double.

I've committed this patch to add support for vector reductions.  We
split the complex type apart and process each half before sticking it
back together again.  As it happens only 32-bit shuffles exist on
hardware, so splitting (say) complex float this way doesn't result in
worse code than punning the whole thing to a 64-bit int.  I suspect it
might play better, by keeping the real and imaginary parts separate.

Worker and gang-level reductions of such types will be a different
patch, which I need to think about.


LGTM.


Bernd



Re: [libstdc++ testsuite][patch] many locale tests only SUPPORTED on linux, start making these portable

2015-11-13 Thread Jonathan Wakely

On 11/11/15 22:12 +, Jonathan Wakely wrote:

On 11/11/15 22:53 +0100, John Marino wrote:

On 11/11/2015 10:51 PM, Jonathan Wakely wrote:

On 16/10/15 11:21 +0200, John Marino wrote:

The only significant comment was:

e.g. `"de_DE" => "de_DE@ISO8859-15` should be `e.g. "de_DE" =>
"de_DE.ISO8859-15"` ? Since it's an encoding, @ is used for a modifier.
FYI, https://www.ietf.org/rfc/rfc4646.txt



It's a good catch; that looks like a typo.  Did that appear in more than
one place?


Dunno, I'd have to grep for it.


What are the next steps then?


I think we can go ahead and make this change, I'll try to get it
committed tomorrow or Friday.


I'm committing the patch from
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01546.html
with a tiny whitespace change (just removing some spaces at the end of
a line, which Git complained about).

Tested powerpc64le-linux, committed to trunk.  Changelog attached.

2015-11-13  John Marino  

* testsuite/22_locale/codecvt/always_noconv/char/wrapped_env.cc:
Use portable locale name
* testsuite/22_locale/codecvt/always_noconv/char/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/always_noconv/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/always_noconv/wchar_t/3.cc: Likewise.
* testsuite/22_locale/codecvt/always_noconv/wchar_t/wrapped_env.cc:
Likewise.
* testsuite/22_locale/codecvt/always_noconv/wchar_t/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/encoding/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/encoding/char/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/encoding/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/encoding/wchar_t/3.cc: Likewise.
* testsuite/22_locale/codecvt/encoding/wchar_t/wrapped_env.cc:
Likewise.
* testsuite/22_locale/codecvt/encoding/wchar_t/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/in/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/in/char/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/in/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/in/wchar_t/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/in/wchar_t/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/length/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/length/char/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/3.cc: Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/max_length/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/max_length/char/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/max_length/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/max_length/wchar_t/3.cc: Likewise.
* testsuite/22_locale/codecvt/max_length/wchar_t/wrapped_env.cc:
Likewise.
* testsuite/22_locale/codecvt/max_length/wchar_t/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt/out/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/out/char/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/out/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/out/wchar_t/7.cc: Likewise.
* testsuite/22_locale/codecvt/out/wchar_t/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/out/wchar_t/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/char/wrapped_locale.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/wchar_t/2.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/wchar_t/3.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/wchar_t/wrapped_env.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/wchar_t/wrapped_locale.cc:
Likewise.
* testsuite/22_locale/codecvt_byname/50714.cc: Likewise.
* testsuite/22_locale/collate/compare/char/1.cc: Likewise.
* testsuite/22_locale/collate/compare/char/2.cc: Likewise.
* testsuite/22_locale/collate/compare/char/3.cc: Likewise.
* testsuite/22_locale/collate/compare/char/wrapped_env.cc: Likewise.
* testsuite/22_locale/collate/compare/char/wrapped_locale.cc: Likewise.
* testsuite/22_locale/collate/compare/wchar_t/1.cc: Likewise.
* testsuite/22_locale/collate/compare/wchar_t/2.cc: Likewise.
* testsuite/22_locale/collate/compare/wchar_t/3.cc: Likewise.
* testsuite/22_locale/collate/compare/wchar_t/wrapped_env.cc: Likewise.
* 

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Bernd Schmidt

On 11/13/2015 04:11 PM, Marek Polacek wrote:

Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
things down.


I don't think so, the three codes are adjacent so we should be 
generating "(unsigned)(code - RECORD_TYPE) < 3".
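
A standalone sketch of what I mean (the enum and its start value here are
made up; only the adjacency matters):

enum type_code { RECORD_TYPE = 10, UNION_TYPE, QUAL_UNION_TYPE, OTHER_TYPE };

int
record_or_union_a (enum type_code code)
{
  return code == RECORD_TYPE || code == UNION_TYPE
         || code == QUAL_UNION_TYPE;
}

int
record_or_union_b (enum type_code code)
{
  /* What the host compiler should turn the above into at -O2.  */
  return (unsigned) (code - RECORD_TYPE) < 3;
}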



Bernd


Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-13 Thread Jeff Law

On 11/13/2015 05:50 AM, Martin Liška wrote:

Hello.

Patch survives regbootstrap on x86_64-linux-gnu.
Ready for trunk?

Thanks,
Martin


0001-Fix-memory-leaks-in-tree-ssa-uninit.c.patch


 From 54851503251dee7a8bd074485db262715e628728 Mon Sep 17 00:00:00 2001
From: marxin
Date: Fri, 13 Nov 2015 12:23:22 +0100
Subject: [PATCH] Fix memory leaks in tree-ssa-uninit.c

gcc/ChangeLog:

2015-11-13  Martin Liska

* tree-ssa-uninit.c (convert_control_dep_chain_into_preds):
Fix GNU coding style.
(find_def_preds): Use auto_vec.
(destroy_predicate_vecs): Change signature of the function.
(prune_uninit_phi_opnds_in_unrealizable_paths): Use the
new signature.
(simplify_preds_4): Use destroy_predicate_vecs instead of
just releasing preds vector.
(normalize_preds): Likewise.
(is_use_properly_guarded): Use new signature of
destroy_predicate_vecs.
(find_uninit_use): Likewise.

OK.

FWIW, there's all kinds of spaces vs tabs issues in this file.  I'm 
curious why you chose to fix convert_control_dep_chain_into_preds, but 
didn't change any others.


Jeff


Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 07:37:17PM +0300, Ilya Verbin wrote:
> I don't know which interface to implement to maintain compatibility in the
> future.
> Anyway, currently it's impossible that a process will use the same 
> liboffloadmic
> for 2 different offloading paths (say GCC's in exec and ICC's in a dso), 
> because
> in fact GCC's and ICC's libraries are not the same.  First of all, they have
> different names: liboffloadmic in GCC and just liboffload in ICC.  And most
> importantly, ICC's version contains some references to libiomp5, which were
> removed from GCC's version.  In theory, we want to use one library with all
> compilers, but I'm not sure when it will be possible.

Ok, in that case it is less of a problem.

> > Do you still get crashes on any of the testcases with this?
> 
> No, all tests now pass using emul.  I'll report when I have any results on HW.

Perfect, I'll commit it to gomp-4_5-branch then.

Thanks.

Jakub


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-13 Thread Kyrill Tkachov

Hi Jim,

On 13/11/15 16:59, Jim Wilson wrote:

On Thu, Nov 12, 2015 at 8:15 AM, Ramana Radhakrishnan
 wrote:

This is OK to go in with a follow up to handle this cpu in t-aprofile
similar to the other cpus in there - for bonus points please deal with
the exynos core at the same time if not already done.

This was tested with an arm-eabi cross compiler build configured
--with-multilib-list=aprofile, and then using
   ./xgcc -B./ -mcpu=X --print-libgcc
to verify that processor names map to the correct libgcc multilib.

Jim


Index: gcc/config/arm/t-aprofile
===
--- gcc/config/arm/t-aprofile   (revision 230283)
+++ gcc/config/arm/t-aprofile   (working copy)
@@ -91,6 +91,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?corte
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57.cortex-a53
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72.cortex-a53
+MULTILIB_MATCHES   += march?armv8-a=mcpu?exynos-m1
+MULTILIB_MATCHES   += march?armv8-a=mcpu?qdf24xx


Sorry to chime in late on this, but while you're at it could
you please add an xgene1 entry?

You can either do it as part of this patch or as a separate
pre-approved patch if you don't want to re-test this one.

This patch is ok if you take the second route.

Thanks,
Kyrill



Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-11-13 Thread David Malcolm
On Wed, 2015-10-14 at 11:00 +0200, Richard Biener wrote:
> On Tue, Oct 13, 2015 at 5:32 PM, David Malcolm  wrote:
> > On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
> >> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm  wrote:
> >> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
> >> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
> >> >> > Hi,
> >> >> >
> >> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
> >> >> >
> >> >> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
> >> >> >> table ever get smaller, or does it only ever get inserted into?
> >> >> >
> >> >> > It only ever grows.
> >> >> >
> >> >> >> An idea I had is that we could stash short ranges directly into the 
> >> >> >> 32
> >> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
> >> >> >
> >> >> > It's certainly worth an experiment: let's say you restrict yourself to
> >> >> > tokens less than 8 characters, you need an additional 3 bits (using 
> >> >> > one
> >> >> > value, e.g. zero, as the escape value).  That leaves 20 bits for the 
> >> >> > line
> >> >> > numbers (for the normal 8 bit columns), which might be enough for most
> >> >> > single-file compilations.  For LTO compilation this often won't be 
> >> >> > enough.
> >> >> >
> >> >> >> My plan is to investigate the impact these patches have on the time 
> >> >> >> and
> >> >> >> memory consumption of the compiler,
> >> >> >
> >> >> > When you do so, make sure you're also measuring an LTO compilation 
> >> >> > with
> >> >> > debug info of something big (firefox).  I know that we already had 
> >> >> > issues
> >> >> > with the size of the linemap data in the past for these cases 
> >> >> > (probably
> >> >> > when we added columns).
> >> >>
> >> >> The issue we have with LTO is that the linemap gets populated in quite
> >> >> random order and thus we repeatedly switch files (we've mitigated this
> >> >> somewhat for GCC 5).  We also considered dropping column info
> >> >> (and would drop range info) as diagnostics are from optimizers only
> >> >> with LTO and we keep locations merely for debug info.
> >> >
> >> > Thanks.  Presumably the mitigation you're referring to is the
> >> > lto_location_cache class in lto-streamer-in.c?
> >> >
> >> > Am I right in thinking that, right now, the LTO code doesn't support
> >> > ad-hoc locations? (presumably the block pointers only need to exist
> >> > during optimization, which happens after the serialization)
> >>
> >> LTO code does support ad-hoc locations but they are "restored" only
> >> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
> >>
> >> > The obvious simplification would be, as you suggest, to not bother
> >> > storing range information with LTO, falling back to just the existing
> >> > representation.  Then there's no need to extend LTO to serialize ad-hoc
> >> > data; simply store the underlying locus into the bit stream.  I think
> >> > that this happens already: lto-streamer-out.c calls expand_location and
> >> > stores the result, so presumably any ad-hoc location_t values made by
> >> > the v2 patches would have dropped their range data there when I ran the
> >> > test suite.
> >>
> >> Yep.  We only preserve BLOCKs, so if you don't add extra code to
> >> preserve ranges they'll be "dropped".
> >>
> >> > If it's acceptable to not bother with ranges for LTO, one way to do the
> >> > "stashing short ranges into the location_t" idea might be for the
> >> > bits-per-range of location_t values to be a property of the line_table
> >> > (or possibly the line map), set up when the struct line_maps is created.
> >> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
> >> > it could be zero, so that we have as many bits as before for line/column
> >> > data.
> >>
> >> That could be a possibility (likewise for column info?)
> >>
> >> Richard.
> >>
> >> > Hope this sounds sane
> >> > Dave
> >
> > I did some crude benchmarking of the patchkit, using these scripts:
> >   https://github.com/davidmalcolm/gcc-benchmarking
> > (specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
> > using the "big-code.c" and "kdecore.cc" files Michael posted as:
> >   https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
> > and "influence.i", a preprocessed version of SPEC2006's 445.gobmk
> > engine/influence.c (as an example of a moderate-sized pure C source
> > file).
> >
> > This doesn't yet cover very large autogenerated C files, and the .cc
> > file is only being measured to see the effect on the ad-hoc table (and
> > tokenization).
> >
> > "control" was r227977.
> > "experiment" was the same revision with the v2 patchkit applied.
> >
> > Recall that this patchkit captures ranges for tokens as an extra field
> > within tokens within libcpp and the C FE, and adds ranges to the ad-hoc
> > location lookaside, storing them for all tree nodes within the C FE that
> > have a 

[PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
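
For example (illustrative; not necessarily one of the functions in the
testcase), a chain of FP comparisons such as:

int
f (double a, double b)
{
  return a > 1.0 && b > 2.0;
}

should now be combinable into an fcmp(e) of a, an fccmp(e) of b and a cset,
rather than two separate compares and a branch (the exact condition codes
and quiet/signalling variants depend on the comparisons).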

OK for commit?

ChangeLog:
2015-11-13  Wilco Dijkstra  

* gcc/ccmp.c (ccmp_candidate_p): Remove integer-only restriction.
* gcc/config/aarch64/aarch64.md (fccmp): New pattern.
(fccmpe): Likewise.
(fcmp): Rename to fcmp and globalize pattern.
(fcmpe): Likewise.
* gcc/config/aarch64/aarch64.c (aarch64_gen_ccmp_first): Add FP
support.
(aarch64_gen_ccmp_next): Add FP support.

gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: New testcase.


---
 gcc/ccmp.c|  6 ---
 gcc/config/aarch64/aarch64.c  | 24 +
 gcc/config/aarch64/aarch64.md | 34 -
 gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 84
+++
 4 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_1.c

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index ef60a6d..cbdbd6d 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -108,12 +108,6 @@ ccmp_candidate_p (gimple *g)
   || gimple_bb (gs0) != gimple_bb (g))
 return false;
 
-  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
-	|| POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0))))
-      || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
-	   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))))
-    return false;
-
   tcode0 = gimple_assign_rhs_code (gs0);
   tcode1 = gimple_assign_rhs_code (gs1);
   if (TREE_CODE_CLASS (tcode0) == tcc_comparison
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index adb222a..a224982 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12400,6 +12400,18 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx
*gen_seq,
   icode = CODE_FOR_cmpdi;
   break;
 
+case SFmode:
+  cmp_mode = SFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) code, op0, op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpesf : CODE_FOR_fcmpsf;
+  break;
+
+case DFmode:
+  cmp_mode = DFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) code, op0, op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpedf : CODE_FOR_fcmpdf;
+  break;
+
 default:
   end_sequence ();
   return NULL_RTX;
@@ -12463,6 +12475,18 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx *gen_seq,
rtx prev, int cmp_code,
   icode = CODE_FOR_ccmpdi;
   break;
 
+case SFmode:
+  cmp_mode = SFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) cmp_code, op0,
op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpesf : CODE_FOR_fccmpsf;
+  break;
+
+case DFmode:
+  cmp_mode = DFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) cmp_code, op0,
op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpedf : CODE_FOR_fccmpdf;
+  break;
+
 default:
   end_sequence ();
   return NULL_RTX;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 18a4808e..04b4ddb 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -279,6 +279,36 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
+(define_insn "fccmp"
+  [(set (match_operand:CCFP 1 "cc_register" "")
+   (if_then_else:CCFP
+ (match_operator 4 "aarch64_comparison_operator"
+  [(match_operand 0 "cc_register" "")
+   (const_int 0)])
+ (compare:CCFP
+   (match_operand:GPF 2 "register_operand" "w")
+   (match_operand:GPF 3 "register_operand" "w"))
+ (match_operand 5 "immediate_operand")))]
+  "TARGET_FLOAT"
+  "fccmp\\t%2, %3, %k5, %m4"
+  [(set_attr "type" "fcmp")]
+)
+
+(define_insn "fccmpe"
+  [(set (match_operand:CCFPE 1 "cc_register" "")
+(if_then_else:CCFPE
+ (match_operator 4 "aarch64_comparison_operator"
+  [(match_operand 0 "cc_register" "")
+ (const_int 0)])
+  (compare:CCFPE
+   (match_operand:GPF 2 "register_operand" "w")
+   (match_operand:GPF 3 "register_operand" "w"))
+ (match_operand 5 "immediate_operand")))]
+  "TARGET_FLOAT"
+  "fccmpe\\t%2, %3, %k5, %m4"
+  [(set_attr "type" "fcmp")]
+)
+
 ;; Expansion of signed mod by a power of 2 using CSNEG.
 ;; For x0 % n where n is a power of 2 produce:
 ;; negs   x1, x0
@@ -2794,7 +2824,7 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
-(define_insn "*cmp"
+(define_insn "fcmp"
   [(set (reg:CCFP CC_REGNUM)
 (compare:CCFP (match_operand:GPF 0 "register_operand" "w,w")
  (match_operand:GPF 1 "aarch64_fp_compare_operand"
"Y,w")))]
@@ -2805,7 +2835,7 @@
   [(set_attr "type" "fcmp")]
 )
 
-(define_insn "*cmpe"
+(define_insn "fcmpe"
   [(set (reg:CCFPE CC_REGNUM)
 

Re: TR1 Special Math

2015-11-13 Thread Ed Smith-Rowland

On 11/13/2015 10:32 AM, Jonathan Wakely wrote:

On 25 October 2015 at 20:48, Jonathan Wakely  wrote:

On 25 October 2015 at 17:46, Ed Smith-Rowland <3dw...@verizon.net> wrote:

On 10/24/2015 11:38 PM, Jonathan Wakely wrote:

On 8 May 2015 at 15:05, Ed Smith-Rowland <3dw...@verizon.net> wrote:

On 05/07/2015 12:06 PM, Jonathan Wakely wrote:

Hi Ed,

The C++ committee is considering the
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4437.pdf
proposal to make C++17 include the contents of ISO 29124:2010 (the
special math functions from TR1 that went into a separate standard,
not into C++11).

What is the status of our TR1 implementation? Is it complete? Good
enough quality to move out of the tr1 sub-dir?

Even if N4437 isn't accepted for C++17 we could move things around to
turn the TR1 code into an iso29124 implementation, do you think that
would make sense?


That would make absolute sense.
I actually have a tree where I've done that.
All the functions are in there (29124 removed the hypergeometric
functions.)
I'd like to keep those as extensions.
I have some bugfixes also.

I have a better version of the Carlson elliptic functions (which are used
in
the 29124 elliptic functions).

Ed


Hi Ed, Florian,

Here's a patch to re-use the TR1 math functions to implement IS 29124,
what do you think of this approach? Ed, were you just going to copy
the files and have duplicated code?

We should probably uglify the names of the hypergeometric functions if
they are not in the final standard.

This doesn't include Florian's patch, which should be applied.

(I want to get this done before stage 1 ends in a couple of weeks, so
am posting this for review now, but I'll be unavailable for the next
week or two and might not be able to actually commit anything until
stage 3).

Hi all!

I am actually very aware of the stage 1 deadline and am working furiously!

This patch adds the hypergeometric and confluent hypergeometric functions
that were actually stricken from TR29124.
I actually had a mind to add those back, especially since the confluent one
is actually pretty stable in its realm and is used in some statistics
tests.
I expect that some people have ventured to use both, and so TR29124 would not
be a full replacement for TR1 without them.

I intend to post within the next few days.  I have to realize that some of
my hopes and dreams would be better done with these in tree! ;-)

Thank you for lighting a fire Jonathan!

Excellent, glad to hear you're on this, as you know the code and the
specs, whereas I'm poking around blindly :-)

Hi Ed,

Have you been able to find enough time to work on this?

Will you be able to make the stage 1 deadline tomorrow, and if not, do
you think I should apply my patch to re-use the TR1 stuff?  (We can
apply Florian's bug fix to that as well).


I'm going to post something in a few hours.



Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 11:18:41AM +0100, Jakub Jelinek wrote:
> For the offloading case, I actually see a problematic spot, namely that
> GOMP_PLUGIN_target_task_completion could finish too early, and get the
> task_lock before the thread that run the gomp_target_task_fn doing map_vars
> + async_run for it.  Bet I need to add further ttask state kinds and deal
> with that case (so GOMP_PLUGIN_target_task_completion would just take the
> task lock and tweak ttask state if it has not been added to the queues
> yet).
> Plus I think I want to improve the case where we are not waiting, in
> gomp_create_target_task if not waiting for dependencies actually schedule
> manually the gomp_target_task_fn.

These two have been resolved, plus the target-34.c issue is resolved too (the bug
was that I had been too lazy and just put the target-33.c test into #pragma omp
parallel #pragma omp single, but that is invalid OpenMP, as single is a
worksharing region and #pragma omp barrier may not be encountered in such a
region; fixed by rewriting the testcase).

So here is a full patch that passes for me both non-offloading and
offloading, OMP_NUM_THREADS=16 (implicit on my box) as well as
OMP_NUM_THREADS=1 (explicit).  I've incorporated your incremental patch.

One option to avoid the static variable would be to pass two pointers
instead of one (async_data): one would be the callback function pointer,
the other the argument to it.  Another possibility would be to say that
the async_data argument the plugin passes to liboffloadmic is a
pointer to a structure holding a function pointer (completion callback)
and the data pointer to pass to it; the plugin would then just
GOMP_PLUGIN_malloc 2 * sizeof (void *) for it, fill it in and
register some function in itself that would call
GOMP_PLUGIN_target_task_completion with the second structure element
as argument and then free the structure pointer.
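
I.e. roughly (just a sketch; everything except the GOMP_PLUGIN_* names is
made up, and the exact prototypes are whatever we end up settling on):

#include <stdlib.h>

/* Normally these come from libgomp-plugin.h.  */
extern void *GOMP_PLUGIN_malloc (size_t);
extern void GOMP_PLUGIN_target_task_completion (void *);

struct async_cb_info
{
  void (*completion_fn) (void *);  /* GOMP_PLUGIN_target_task_completion  */
  void *data;                      /* the target task pointer  */
};

/* The function the plugin registers with liboffloadmic instead of
   handing it GOMP_PLUGIN_target_task_completion directly.  */
static void
plugin_async_completion (void *async_data)
{
  struct async_cb_info *info = (struct async_cb_info *) async_data;
  info->completion_fn (info->data);
  free (info);
}

static void *
make_async_data (void *ttask)
{
  struct async_cb_info *info = GOMP_PLUGIN_malloc (sizeof *info);
  info->completion_fn = GOMP_PLUGIN_target_task_completion;
  info->data = ttask;
  return info;
}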

Do you still get crashes on any of the testcases with this?

2015-11-13  Jakub Jelinek  
Ilya Verbin  

* parallel.c (gomp_resolve_num_threads): Don't assume that
if thr->ts.team is non-NULL, then pool must be non-NULL.
* libgomp-plugin.h (GOMP_PLUGIN_target_task_completion): Declare.
* team.c (gomp_free_thread): Call gomp_team_end if thr->ts.team
is artificial team created for target nowait in implicit parallel
region.
(gomp_team_start): For nested check, test thr->ts.level instead of
thr->ts.team != NULL.
* target.c (GOMP_target): Don't adjust *thr in any way around
running offloaded task.
(GOMP_target_ext): Likewise.  Handle target nowait.
(GOMP_target_update_ext, GOMP_target_enter_exit_data): Check
return value from gomp_create_target_task, if false, fallthrough
as if no dependencies exist.
(gomp_target_task_fn): Change return type to bool, return true
if the task should have another part scheduled later.  Handle
target nowait.
(gomp_load_plugin_for_device): Initialize async_run.
* libgomp.map (GOMP_PLUGIN_1.1): New symbol version, export
GOMP_PLUGIN_target_task_completion.
* task.c (priority_queue_move_task_first,
gomp_target_task_completion, GOMP_PLUGIN_target_task_completion):
New functions.
(gomp_create_target_task): Change return type to bool, add
state argument, return false if for async {{enter,exit} data,update}
constructs no dependencies need to be waited for, handle target
nowait.  Set task->fn to NULL instead of gomp_target_task_fn.
(gomp_barrier_handle_tasks, GOMP_taskwait,
gomp_task_maybe_wait_for_dependencies): Handle target nowait target
tasks specially.
(GOMP_taskgroup_end): Likewise.  If taskgroup is NULL, and
thr->ts.level is 0, act as a barrier.  Handle
target nowait tasks specially.
* priority_queue.c (priority_queue_task_in_queue_p,
priority_list_verify): Adjust for addition of
GOMP_TASK_ASYNC_RUNNING kind.
* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_ASYNC_RUNNING.
(enum gomp_target_task_state): New enum.
(struct gomp_target_task): Add state, tgt, task and team fields.
(gomp_create_target_task): Change return type to bool, add
state argument.
(gomp_target_task_fn): Change return type to bool.
(struct gomp_device_descr): Add async_run_func.
* testsuite/libgomp.c/target-32.c: New test.
* testsuite/libgomp.c/target-34.c: New test.
* testsuite/libgomp.c/target-33.c: New test.

2015-11-13  Ilya Verbin  

* runtime/offload_host.cpp (task_completion_callback): New
variable.
(offload_proxy_task_completed_ooo): Call task_completion_callback.
(__offload_register_task_callback): New function.
* runtime/offload_host.h (__offload_register_task_callback): 

Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-13 Thread Marek Polacek
On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> On 11/13/2015 04:11 PM, Marek Polacek wrote:
> >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
> >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> >things down.
> 
> I don't think so, the three codes are adjacent so we should be generating
> "(unsigned)(code - RECORD_TYPE) < 3".

Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
form, then we don't need a separate version for the C FE.

I'll look at this cleanup in the next week.

Marek


Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-13 Thread Alan Lawrence
On 10/11/15 12:51, Richard Biener wrote:
>>
>> Just noticing this... if we have a vectorization factor of 4 and matches
>> is 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0, 0 then this will split into 1, 1, 1, 1 
>> and
>> 1, 1, 0, 0, 0, ... where we know from the matches that it will again fail?
>>
>> Thus shouldn't we split either only if i % vectorization_factor is 0 or
>> if not, split "twice", dropping the intermediate surely non-matching
>> group of vectorization_factor size?  After all if we split like with the
>> patch then the upper half will _not_ be splitted again with the
>> simplified patch (result will be 1, 1, 0, 0, 0, 0, 0, 0 again).
>>
>> So I expect that the statistics will be the same if we restrict splitting
>> to the i % vectorization_factor == 0 case, or rather split where we do
>> now but only re-analyze group2 if i % vectorization_factor == 0 holds?
>>
>> Ok with that change.  Improvements on that incrementally.
>
> Btw, it just occurs to me that the whole thing is equivalent to splitting
> the store-group into vector-size pieces up-front?  That way we do
> the maximum splitting up-frond and avoid any redundant work?
>
> The patch is still ok as said, just the above may be a simple thing
> to explore.

I'd refrained from splitting in vect_analyze_group_access_1, as my understanding
was that we only did that once, whereas we would retry the
vect_analyze_slp_instance path each time we decreased the
vectorization_factor.  However, I did try putting code at the beginning of
vect_analyze_slp_instance to split up any groups > vf.  Unfortunately this loses
us some previously-successful SLPs, as some bigger groups cannot be SLPed if we
split them, as they require 'unrolling'; so I'm not addressing that here.

However, your suggestion of splitting twice when we know the boundary is in the
middle of a vector is a nice compromise; it nets us a good number more
successes in SPEC2000 and SPEC2006, about 7% more than without the patch.
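
To spell out Richard's numbers: with vf == 4 and matches == {1,1,1,1, 1,1,0,0, 0,...},
the mismatch falls in the middle of the second vector, so a single split gives
{1,1,1,1} plus a group starting {1,1,0,0,...} that is already known to fail again.
Splitting twice also peels off the vf-sized chunk containing the mismatch, so the
{1,1,1,1} prefix still gets its SLP chance and we don't re-analyse the chunk we
already know is bad.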

Hence, here's the patch I've committed, as r230330, after regstrap on x86_64
and AArch64. (I dropped the previous bb-slp-subgroups-2 and renamed the others
up as we don't do that one anymore.)

Cheers, Alan

gcc/ChangeLog:

PR tree-optimization/67682
* tree-vect-slp.c (vect_split_slp_store_group): New.
(vect_analyze_slp_instance): During basic block SLP, recurse on
subgroups if vect_build_slp_tree fails after 1st vector.

gcc/testsuite/ChangeLog:

PR tree-optimization/67682
* gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic.
* gcc.dg/vect/bb-slp-subgroups-1.c: New.
* gcc.dg/vect/bb-slp-subgroups-2.c: New.
* gcc.dg/vect/bb-slp-subgroups-3.c: New.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-7.c   | 10 +--
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 41 +
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 +
 gcc/tree-vect-slp.c| 85 +-
 5 files changed, 215 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
index ab54a48..b8bef8c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
@@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y)
   unsigned int *pout = &out[0];
   unsigned int a0, a1, a2, a3;
 
-  /* Non isomorphic.  */
+  /* Non isomorphic, even 64-bit subgroups.  */
   a0 = *pin++ + 23;
-  a1 = *pin++ + 142;
+  a1 = *pin++ * 142;
   a2 = *pin++ + 2;
   a3 = *pin++ * 31;
-  
+
   *pout++ = a0 * x;
   *pout++ = a1 * y;
   *pout++ = a2 * x;
@@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y)
 
   /* Check results.  */
   if (out[0] != (in[0] + 23) * x
-  || out[1] != (in[1] + 142) * y
+  || out[1] != (in[1] * 142) * y
   || out[2] != (in[2] + 2) * x
   || out[3] != (in[3] * 31) * y)
 abort();
@@ -47,4 +47,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
-  
+
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
new file mode 100644
index 000..39c23c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_int } */
+/* PR tree-optimization/67682.  */
+
+#include "tree-vect.h"
+
+int __attribute__((__aligned__(8))) a[8];
+int __attribute__((__aligned__(8))) b[4];
+
+__attribute__ ((noinline)) void
+test ()
+{
+a[0] = b[0];
+a[1] = b[1];
+a[2] = b[2];
+a[3] = b[3];
+a[4] = 0;
+a[5] = 0;
+a[6] = 0;
+a[7] = 0;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+
+  for (int i = 0; i < 8; i++)
+a[i] 

Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 16:11:50 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 11:18:41AM +0100, Jakub Jelinek wrote:
> > For the offloading case, I actually see a problematic spot, namely that
> > GOMP_PLUGIN_target_task_completion could finish too early, and get the
> > task_lock before the thread that run the gomp_target_task_fn doing map_vars
> > + async_run for it.  Bet I need to add further ttask state kinds and deal
> > with that case (so GOMP_PLUGIN_target_task_completion would just take the
> > task lock and tweak ttask state if it has not been added to the queues
> > yet).
> > Plus I think I want to improve the case where we are not waiting, in
> > gomp_create_target_task if not waiting for dependencies actually schedule
> > manually the gomp_target_task_fn.
> 
> These two have been resolved, plus the target-34.c issue is resolved too (the bug
> was that I had been too lazy and just put the target-33.c test into #pragma omp
> parallel #pragma omp single, but that is invalid OpenMP, as single is a
> worksharing region and #pragma omp barrier may not be encountered in such a
> region; fixed by rewriting the testcase).
> 
> So here is a full patch that passes for me both non-offloading and
> offloading, OMP_NUM_THREADS=16 (implicit on my box) as well as
> OMP_NUM_THREADS=1 (explicit).  I've incorporated your incremental patch.
> 
> One option to avoid the static variable would be to pass two pointers
> instead of one (async_data): one would be the callback function pointer,
> the other the argument to it.  Another possibility would be to say that
> the async_data argument the plugin passes to liboffloadmic is a
> pointer to a structure holding a function pointer (completion callback)
> and the data pointer to pass to it; the plugin would then just
> GOMP_PLUGIN_malloc 2 * sizeof (void *) for it, fill it in and
> register some function in itself that would call
> GOMP_PLUGIN_target_task_completion with the second structure element
> as argument and then free the structure pointer.

I don't know which interface to implement to maintain compatibility in the
future.
Anyway, currently it's impossible that a process will use the same liboffloadmic
for 2 different offloading paths (say GCC's in exec and ICC's in a dso), because
in fact GCC's and ICC's libraries are not the same.  First of all, they have
different names: liboffloadmic in GCC and just liboffload in ICC.  And most
importantly, ICC's version contains some references to libiomp5, which were
removed from GCC's version.  In theory, we want to use one library with all
compilers, but I'm not sure when it will be possible.

> Do you still get crashes on any of the testcases with this?

No, all tests now pass using emul.  I'll report when I have any results on HW.

Thanks,
  -- Ilya


Re: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Evandro Menezes

Hi, Wilco.

It looks good to me, but FCMP is quite different from FCCMP on Exynos 
M1, so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}" 
and "fccmp{s,d}".  Would it be acceptable to add this with this patch or 
later?


Thank you.

--
Evandro Menezes

On 11/13/2015 10:02 AM, Wilco Dijkstra wrote:

This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.

OK for commit?

ChangeLog:
2015-11-13  Wilco Dijkstra  

* gcc/ccmp.c (ccmp_candidate_p): Remove integer-only restriction.
* gcc/config/aarch64/aarch64.md (fccmp): New pattern.
(fccmpe): Likewise.
(fcmp): Rename to fcmp and globalize pattern.
(fcmpe): Likewise.
* gcc/config/aarch64/aarch64.c (aarch64_gen_ccmp_first): Add FP
support.
(aarch64_gen_ccmp_next): Add FP support.

gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: New testcase.


---
  gcc/ccmp.c|  6 ---
  gcc/config/aarch64/aarch64.c  | 24 +
  gcc/config/aarch64/aarch64.md | 34 -
  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 84
+++
  4 files changed, 140 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_1.c

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index ef60a6d..cbdbd6d 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -108,12 +108,6 @@ ccmp_candidate_p (gimple *g)
|| gimple_bb (gs0) != gimple_bb (g))
  return false;
  
-  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))

-   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0
-  || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
-  || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)
-return false;
-
tcode0 = gimple_assign_rhs_code (gs0);
tcode1 = gimple_assign_rhs_code (gs1);
if (TREE_CODE_CLASS (tcode0) == tcc_comparison
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index adb222a..a224982 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12400,6 +12400,18 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx
*gen_seq,
icode = CODE_FOR_cmpdi;
break;
  
+case SFmode:

+  cmp_mode = SFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) code, op0, op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpesf : CODE_FOR_fcmpsf;
+  break;
+
+case DFmode:
+  cmp_mode = DFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) code, op0, op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpedf : CODE_FOR_fcmpdf;
+  break;
+
  default:
end_sequence ();
return NULL_RTX;
@@ -12463,6 +12475,18 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx *gen_seq,
rtx prev, int cmp_code,
icode = CODE_FOR_ccmpdi;
break;
  
+case SFmode:

+  cmp_mode = SFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) cmp_code, op0,
op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpesf : CODE_FOR_fccmpsf;
+  break;
+
+case DFmode:
+  cmp_mode = DFmode;
+  cc_mode = aarch64_select_cc_mode ((enum rtx_code) cmp_code, op0,
op1);
+  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpedf : CODE_FOR_fccmpdf;
+  break;
+
  default:
end_sequence ();
return NULL_RTX;
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 18a4808e..04b4ddb 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -279,6 +279,36 @@
[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
  )
  
+(define_insn "fccmp"

+  [(set (match_operand:CCFP 1 "cc_register" "")
+   (if_then_else:CCFP
+ (match_operator 4 "aarch64_comparison_operator"
+  [(match_operand 0 "cc_register" "")
+   (const_int 0)])
+ (compare:CCFP
+   (match_operand:GPF 2 "register_operand" "w")
+   (match_operand:GPF 3 "register_operand" "w"))
+ (match_operand 5 "immediate_operand")))]
+  "TARGET_FLOAT"
+  "fccmp\\t%2, %3, %k5, %m4"
+  [(set_attr "type" "fcmp")]
+)
+
+(define_insn "fccmpe"
+  [(set (match_operand:CCFPE 1 "cc_register" "")
+(if_then_else:CCFPE
+ (match_operator 4 "aarch64_comparison_operator"
+  [(match_operand 0 "cc_register" "")
+ (const_int 0)])
+  (compare:CCFPE
+   (match_operand:GPF 2 "register_operand" "w")
+   (match_operand:GPF 3 "register_operand" "w"))
+ (match_operand 5 "immediate_operand")))]
+  "TARGET_FLOAT"
+  "fccmpe\\t%2, %3, %k5, %m4"
+  [(set_attr "type" "fcmp")]
+)
+
  ;; Expansion of signed mod by a power of 2 using CSNEG.
  ;; For x0 % n where n is a power of 2 produce:
  ;; negs   x1, x0
@@ -2794,7 +2824,7 @@
[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
  )
  
-(define_insn 

[PATCH, x86] Fix posix_memalign declaration in mm_malloc.h

2015-11-13 Thread Szabolcs Nagy

Followup to https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01433.html

The posix_memalign declaration is incompatible with musl libc in C++,
because of the exception specification (matters with -std=c++11
-pedantic-errors).  It also pollutes the namespace and lacks protection
against a potential macro definition that is allowed by POSIX.  The
fix avoids source level namespace pollution but retains the dependency
on the posix_memalign extern libc symbol.
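
The mechanism, shown standalone for clarity, is the GNU asm-label
extension: the C-level name changes, but the reference still binds to
libc's posix_memalign symbol (this is the same thing the hunk below does
inside the #ifdef):

#include <stddef.h>

extern int _mm_posix_memalign (void **, size_t, size_t)
  __asm__ ("posix_memalign");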

Added a test case for the namespace issue.

OK for trunk?

gcc/ChangeLog:

2015-11-13  Szabolcs Nagy  

* config/i386/pmm_malloc.h (posix_memalign): Renamed to ...
(_mm_posix_memalign): This.  Use posix_memalign as extern
symbol only.

gcc/testsuite/ChangeLog:

2015-11-13  Szabolcs Nagy  

* g++.dg/other/mm_malloc.C: New.
diff --git a/gcc/config/i386/pmm_malloc.h b/gcc/config/i386/pmm_malloc.h
index 901001b..0696c20 100644
--- a/gcc/config/i386/pmm_malloc.h
+++ b/gcc/config/i386/pmm_malloc.h
@@ -27,12 +27,13 @@
 #include 
 
 /* We can't depend on  since the prototype of posix_memalign
-   may not be visible.  */
+   may not be visible and we can't pollute the namespace either.  */
 #ifndef __cplusplus
-extern int posix_memalign (void **, size_t, size_t);
+extern int _mm_posix_memalign (void **, size_t, size_t)
 #else
-extern "C" int posix_memalign (void **, size_t, size_t) throw ();
+extern "C" int _mm_posix_memalign (void **, size_t, size_t) throw ()
 #endif
+__asm__("posix_memalign");
 
 static __inline void *
 _mm_malloc (size_t size, size_t alignment)
@@ -42,7 +43,7 @@ _mm_malloc (size_t size, size_t alignment)
 return malloc (size);
   if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4))
 alignment = sizeof (void *);
-  if (posix_memalign (&ptr, alignment, size) == 0)
+  if (_mm_posix_memalign (&ptr, alignment, size) == 0)
 return ptr;
   else
 return NULL;
diff --git a/gcc/testsuite/g++.dg/other/mm_malloc.C b/gcc/testsuite/g++.dg/other/mm_malloc.C
new file mode 100644
index 000..00582cc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/mm_malloc.C
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-std=c++11" } */
+
+/* Suppress POSIX declarations in libc headers in standard C++ mode.  */
+#undef _GNU_SOURCE
+
+#define posix_memalign user_can_do_this
+
+#include 
+
+void *
+foo ()
+{
+	return _mm_malloc (16, 16);
+}
+
+/* { dg-final { scan-assembler "call\\tposix_memalign" } } */


Re: [PATCH 07/N] Fix memory leaks in haifa-sched

2015-11-13 Thread Jeff Law

On 11/13/2015 07:05 AM, Martin Liška wrote:

Hello.

Patch can bootstrap on x86_64-linux-pc and regression tests are running.

Ready for trunk?
Thanks,
Martin


0001-Release-memory-in-haifa-sched.patch


 From 630eba9465d6502b49bac163f985d25aee982e03 Mon Sep 17 00:00:00 2001
From: marxin
Date: Fri, 13 Nov 2015 10:37:21 +0100
Subject: [PATCH] Release memory in haifa-sched

gcc/ChangeLog:

2015-11-13  Martin Liska

* haifa-sched.c (haifa_finish_h_i_d): Release reg_set_list.

OK.  Thanks for taking care of this.

Jeff


libcpp/C FE source range patch committed (r230331).

2015-11-13 Thread David Malcolm
On Mon, 2015-11-02 at 14:14 -0500, David Malcolm wrote:
> On Fri, 2015-10-30 at 00:15 -0600, Jeff Law wrote:
> > On 10/23/2015 02:41 PM, David Malcolm wrote:
> > > As in the previous version of this patch
> > >   "Implement tree expression tracking in C FE (v2)"
> > > the patch now captures ranges for all C expressions during parsing within
> > > a new field of c_expr, and for all tree nodes with a location_t, it stores
> > > them in ad-hoc locations for later use.
> > >
> > > Hence compound expressions get ranges; see:
> > >
> > > https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-expressions-1.html
> > >
> > > and for this example:
> > >
> > >int test (int foo)
> > >{
> > >  return foo * 100;
> > > ^^^   ^^^
> > >}
> > >
> > > we have access to the ranges of "foo" and "100" during C parsing via
> > > the c_expr, but once we have GENERIC, all we have is a VAR_DECL and an
> > > INTEGER_CST (the former's location is in at the top of the
> > > function, and the latter has no location).
> > >
> > > gcc/ChangeLog:
> > >   * Makefile.in (OBJS): Add gcc-rich-location.o.
> > >   * gcc-rich-location.c: New file.
> > >   * gcc-rich-location.h: New file.
> > >   * print-tree.c (print_node): Print any source range information.
> > >   * tree.c (set_source_range): New functions.
> > >   * tree.h (CAN_HAVE_RANGE_P): New.
> > >   (EXPR_LOCATION_RANGE): New.
> > >   (EXPR_HAS_RANGE): New.
> > >   (get_expr_source_range): New inline function.
> > >   (DECL_LOCATION_RANGE): New.
> > >   (set_source_range): New decls.
> > >   (get_decl_source_range): New inline function.
> > >
> > > gcc/c-family/ChangeLog:
> > >   * c-common.c (c_fully_fold_internal): Capture existing souce_range,
> > >   and store it on the result.
> > >
> > > gcc/c/ChangeLog:
> > >   * c-parser.c (set_c_expr_source_range): New functions.
> > >   (c_token::get_range): New method.
> > >   (c_token::get_finish): New method.
> > >   (c_parser_expr_no_commas): Call set_c_expr_source_range on the ret
> > >   based on the range from the start of the LHS to the end of the
> > >   RHS.
> > >   (c_parser_conditional_expression): Likewise, based on the range
> > >   from the start of the cond.value to the end of exp2.value.
> > >   (c_parser_binary_expression): Call set_c_expr_source_range on
> > >   the stack values for TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR.
> > >   (c_parser_cast_expression): Call set_c_expr_source_range on ret
> > >   based on the cast_loc through to the end of the expr.
> > >   (c_parser_unary_expression): Likewise, based on the
> > >   op_loc through to the end of op.
> > >   (c_parser_sizeof_expression) Likewise, based on the start of the
> > >   sizeof token through to either the closing paren or the end of
> > >   expr.
> > >   (c_parser_postfix_expression): Likewise, using the token range,
> > >   or from the open paren through to the close paren for
> > >   parenthesized expressions.
> > >   (c_parser_postfix_expression_after_primary): Likewise, for
> > >   various kinds of expression.
> > >   * c-tree.h (struct c_expr): Add field "src_range".
> > >   (c_expr::get_start): New method.
> > >   (c_expr::get_finish): New method.
> > >   (set_c_expr_source_range): New decls.
> > >   * c-typeck.c (parser_build_unary_op): Call set_c_expr_source_range
> > >   on ret for prefix unary ops.
> > >   (parser_build_binary_op): Likewise, running from the start of
> > >   arg1.value through to the end of arg2.value.
> > >
> > > gcc/testsuite/ChangeLog:
> > >   * gcc.dg/plugin/diagnostic-test-expressions-1.c: New file.
> > >   * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
> > >   New file.
> > >   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add
> > >   diagnostic_plugin_test_tree_expression_range.c and
> > >   diagnostic-test-expressions-1.c.
> > 
> > >   /* Initialization routine for this file.  */
> > >
> > > @@ -6085,6 +6112,9 @@ c_parser_expr_no_commas (c_parser *parser, struct 
> > > c_expr *after,
> > > ret.value = build_modify_expr (op_location, lhs.value, 
> > > lhs.original_type,
> > >code, exp_location, rhs.value,
> > >rhs.original_type);
> > > +  set_c_expr_source_range (&ret,
> > > +lhs.get_start (),
> > > +rhs.get_finish ());
> > One line if it fits.
> > 
> > 
> > > @@ -6198,6 +6232,9 @@ c_parser_conditional_expression (c_parser *parser, 
> > > struct c_expr *after,
> > >  ? t1
> > >  : NULL);
> > >   }
> > > +  set_c_expr_source_range (&ret,
> > > +start,
> > > +exp2.get_finish ());
> > Here too.
> > 
> > > @@ -6522,6 +6564,10 @@ c_parser_cast_expression (c_parser *parser, struct 
> > > c_expr *after)
> > >   expr = convert_lvalue_to_rvalue (expr_loc, expr, true, true);
> > > }
> > > ret.value = c_cast_expr (cast_loc, type_name, expr.value);
> > > + 

Fix openacc testcase

2015-11-13 Thread Nathan Sidwell
I've committed this trunk patch.  This test case's loop is not parallelizable, 
as it increments a (non-reduction) variable.  It thus must be marked 'seq'.
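
For reference, the problematic shape is roughly the following (an
illustrative sketch, not the actual collapse-2.c loop body):

  /* Illustrative only.  */
  static void
  count_iterations (int *a, int n)
  {
    int k = 0;
    for (int i = 0; i < n; i++)
      {
        k++;      /* 'k' is not declared as a reduction; iteration i depends
                     on iteration i-1, so the iterations cannot run in
                     parallel and the loop has to be marked 'seq'.  */
        a[i] = k;
      }
  }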


nathan
2015-11-13  Nathan Sidwell  

	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Sequential
	loop is sequential.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c	(revision 230324)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c	(working copy)
@@ -9,7 +9,7 @@ main (void)
   int m1 = 4, m2 = -5, m3 = 17;
 
 #pragma acc parallel copy(l)
-  #pragma acc loop collapse(3) reduction(+:l)
+  #pragma acc loop seq collapse(3) reduction(+:l)
 for (i = -2; i < m1; i++)
   for (j = m2; j < -2; j++)
 	{


Re: [AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model

2015-11-13 Thread Jiong Wang


On 05/11/15 14:57, Jiong Wang wrote:

Marcus Shawcroft writes:


+#ifdef HAVE_AS_TINY_TLSGD_RELOCS
+  return SYMBOL_TINY_TLSGD;
+#else
+  return SYMBOL_SMALL_TLSGD;
+#endif

Rather than introduce blocks of conditional compilation it is better
to gate different behaviours with a test on a constant expression. In
this case add something like this:

#if defined(HAVE_AS_TINY_TLSGD_RELOCS)
#define USE_TINY_TLSGD 1
#else
#define USE_TINY_TLSGD 0
#endif

up near the definition of TARGET_HAVE_TLS then write the above
fragment without using the preprocessor:

return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD;


Done.


- aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ if (type == SYMBOL_SMALL_TLSGD)
+  aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver));
+ else
+  aarch64_emit_call_insn (gen_tlsgd_tiny (result, imm, resolver));
  insns = get_insns ();
  end_sequence ();

Add a separate case statment for SYMBOL_TINY_TLSGD rather than reusing
the case statement for SYMBOL_SMALL_TLSGD and then needing to add
another test against symbol type within the body of the case
statement.


Done.



+(define_insn "tlsgd_tiny"
+  [(set (match_operand 0 "register_operand" "")
+ (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
+   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
UNSPEC_GOTTINYTLS)
+   (clobber (reg:DI LR_REGNUM))
+  ]
+  ""
+  "adr\tx0, %A1;bl\t%2;nop";
+  [(set_attr "type" "multiple")
+   (set_attr "length" "12")])

I don't think the explicit clobber LR_REGNUM is required since your
change last September:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html


The explicit clobber of LR_REGNUM would only be unnecessary if operand 0
happened to be allocated to LR_REGNUM, which became allocable after my
patch above.

However, we still need the explicit clobber here, because in all other
cases LR_REGNUM is not allocated and GCC's data flow analysis cannot
deduce that LR_REGNUM is still clobbered implicitly by the call
instruction.

Without this "clobber" tag, a direct impact is that df_regs_ever_live is
calculated incorrectly for x30; then, for the following simple testcase:

__thread int t0 = 0x10;
__thread int t1 = 0x10;

int
main (int argc, char **argv)
{
  if (t0 != t1)
return 1;
  return  0;
}


if you compile with

 "-O2 -ftls-model=global-dynamic -fpic -mtls-dialect=trad t.c 
-mcmodel=tiny -fomit-frame-pointer",

wrong code will be generated:

 main:
str x19, [sp, -16]!  <--- x30 is not saved.
adr x0, :tlsgd:t0
bl __tls_get_addr
nop

Patch updated.  TLS regression tests OK.

OK for trunk?

2015-11-05  Jiong Wang  

gcc/
  * configure.ac: Add check for binutils global dynamic tiny code model
  relocation support.
  * configure: Regenerate.
  * config.in: Regenerate.
  * config/aarch64/aarch64.md (tlsgd_tiny): New define_insn.
  * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New
  enumeration SYMBOL_TINY_TLSGD.
  (aarch64_symbol_context): New comment on SYMBOL_TINY_TLSGD.
  * config/aarch64/aarch64.c (aarch64_classify_tls_symbol): Support
  SYMBOL_TINY_TLSGD.
  (aarch64_print_operand): Likewise.
  (aarch64_expand_mov_immediate): Likewise.
  (aarch64_load_symref_appropriately): Likewise.

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_aarch64_tlsgdtiny):
  New effective target check.
  * gcc.target/aarch64/tlsgd_small_1.c: New testcase.
  * gcc.target/aarch64/tlsgd_small_ilp32_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_1.c: Likewise.
  * gcc.target/aarch64/tlsgd_tiny_ilp32_1.c: Likewise.

Ping ~


[PATCH v2] [ARM] PR61551 RFC: Improve costs for NEON addressing modes

2015-11-13 Thread Charles Baylis
Hi

Following on from previous discussion:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03464.html
and IRC.

I'm going to try once more to make the case for fixing the worst
problem for GCC 6, pending a rewrite of the address_cost
infrastructure for GCC 7. I think the rewrite you're describing is
overkill for this problem. There is one specific problem which I would
like to fix for GCC6, and that is the failure of the ARM backend to
allow use of post-indexed addressing for some vector modes.

Test program:
 #include <arm_neon.h>
char *f(char *p, int8x8x4_t v, int r)
{ vst4_s8(p, v); p+=32; return p; }

Desired code:
f:
vst4.8  {d0-d3}, [r0]!
bx  lr

Currently generated code:
f:
mov r3, r0
addsr0, r0, #32
vst4.8  {d0-d3}, [r3]
bx  lr

The auto-inc-dec phase does not apply in this case, because the costs
for RTXs which use POST_INC are wrong. Using gdb to poke at this, we
can see:

$ arm-unknown-linux-gnueabihf-gcc -mfpu=neon -O3 -S /tmp/foo.c
-wrapper gdb,--args
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9

Reading symbols from
/home/charles.baylis/tools/tools-arm-unknown-linux-gnueabihf-git/bin/../libexec/gcc/arm-unknown-linux-gnueabihf/6.0.0/cc1...done.
(gdb) b auto-inc-dec.c:473
Breakpoint 1 at 0x102c253: file
/home/charles.baylis/srcarea/gcc/gcc-git/gcc/auto-inc-dec.c, line 473.
(gdb) r

(gdb) print debug_rtx(mem)
(mem:OI (reg/v/f:SI 112 [ p ]) [0 MEM[(signed char[32] *)p_2(D)]+0 S32 A8])
$1 = void
(gdb) print rtx_cost(mem, V16QImode, SET, 1, false)
$2 = 4
(gdb) print debug_rtx(mem_tmp)
(mem:OI (post_inc:SI (reg/f:SI 115 [ p ])) [0  S32 A64])
$3 = void
(gdb) print rtx_cost(mem_tmp, V16QImode, SET, 1, false)
$4 = 32

So, the cost of
 (mem:OI (reg/v/f:SI 112 [ p ]))
is 4, while the cost of
(mem:OI (post_inc:SI (reg/f:SI 115 [ p ])))
is 32.

That is a difference equivalent to 7 insns, which has no basis in
reality. It is just a bug.

Addressing some specific review points from the previous version.

> > +{
> > +  0,
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15),
> > +  COSTS_N_INSNS (15)
> > +} /* vec512 */
> >}
> >  };
>
> I'm curious as to the numbers here - The costs should reflect the relative 
> costs of the
> addressing modes not the costs of the loads and stores - thus having high 
> numbers
> here for vector modes may just prevent this from even triggering in 
> auto-inc-dec
> code ? In my experience with GCC I've never satisfactorily answered the 
> question
> whether these should be comparable to rtx_costs or not. In an ideal world 
> they should
> be but I'm never sure. IOW I'm not sure if using COSTS_N_INSNS or plain 
> numbers
> here is appropriate.

That's the point of the patch. These numbers give the same behaviour
as the current arm_rtx_costs code, and they are obviously wrong.

> 17:45 < ramana> My problem is that the mid-end in a number of other places
> compares the cost coming out of rtx_cost and address_cost and if the 2
> are not in sync we get funny values.

There is already no correspondence at all between the two at present.
My patch doesn't address this, but I think it must at least make it
better.

However, I don't really understand this comment - as you point out
above, address_cost and rtx_cost return values measured in different
units. I don't see how they can be made to correspond, given that.

> Right, but this does not change arm_address_costs - so how is this going to 
> work?
> I would like this moved into a new function aarch_address_costs and that 
> replacing
> arm_address_costs only to be called from here.

I could do that, but if I did, I would have to resubmit the patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00387.html along with a
reimplemention of arm_address_costs which used a table without
changing its numerical results (pending subsequent tuning). Since the
former would already solve my problem, and the latter would then be a
pure code clean up of a separate function, why not accept the '387
patch as is, and leave the clean up until GCC 7?

Alternatively, this is an updated patch series which changes the costs
for MEMs in arm_rtx_costs using the table. Passes make check with no
regressions for arm-unknown-linux-gnueabihf on qemu.
From d8110f141a449c62f1ba2c4f47832ee2633d3998 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 28 Oct 2015 18:48:16 +
Subject: [PATCH 1/4] Add table-driven implementation of "case MEM:" in
 arm_rtx_costs_new.

This patch replicates the existing cost calculation using a table, so that
the costs can be tuned cleanly. The old implementation is retained for
comparison, and a check is made that the same result is obtained from both
methods.

Change-Id: If349ffd7dbbe13a814be4a0d022382ddc8270973
---
 gcc/config/arm/aarch-common-protos.h |  28 
 gcc/config/arm/aarch-cost-tables.h   |  95 

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-13 Thread Jeff Law

On 11/13/2015 03:13 AM, Richard Biener wrote:


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 34d2356..6613e83 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1474,6 +1474,7 @@ OBJS = \
 tree-ssa-loop.o \
 tree-ssa-math-opts.o \
 tree-ssa-operands.o \
+   tree-ssa-path-split.o \


gimple-ssa-path-split please.

Agreed.   I'll make that change for Ajit.





 tree-ssa-phionlycprop.o \
 tree-ssa-phiopt.o \
 tree-ssa-phiprop.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 757ce85..3e946ca 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2403,6 +2403,10 @@ ftree-vrp
  Common Report Var(flag_tree_vrp) Init(0) Optimization
  Perform Value Range Propagation on trees.

+ftree-path-split


fsplit-paths

And this plus related variable name fixes and such.




+@item -ftree-path-split
+@opindex ftree-path-split
+Perform Path Splitting on trees.  When the two execution paths of an
+if-then-else merge at the loop latch node, try to duplicate the
+merge node into two paths. This is enabled by default at @option{-O2}
+and above.
+


I think if we go into the detail of the transform we should mention the
effective result (creating a loop nest with disambiguation figuring out
which is the "better" inner loop).
It no longer creates a loop nest.  The overall shape of the CFG is 
maintained, i.e., we still have a single simple latch for the loop.  The 
blocks we create are internal to the loop.
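
For illustration, the kind of loop the pass targets looks roughly like this
(a made-up example, not from the testsuite):

  /* The two arms of the if-then-else merge right at the bottom of the
     loop body; path splitting duplicates that merge block into both arms,
     while the loop keeps a single simple latch.  */
  int
  sum_abs (const int *a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      {
        int v;
        if (a[i] < 0)
          v = -a[i];
        else
          v = a[i];
        s += v;   /* merge-block work that gets duplicated into both arms */
      }
    return s;
  }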


I always struggle with the right level at which to document these 
options.   I'll take a look at this for Ajit.


BTW Do we have an API for indicating that new blocks have been added to 
a loop?  If so, then we can likely drop the LOOPS_NEED_FIXUP.






  @item -fsplit-ivs-in-unroller
  @opindex fsplit-ivs-in-unroller
  Enables expression of values of induction variables in later iterations
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..9a0b27c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -509,6 +509,7 @@ static const struct default_options
default_options_table[] =
  { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1
},
  { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
+{ OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },


Is this transform a good idea for -Os?

In general, no because of the block duplication.

jeff


[patch] Define std::experimental::randint etc.

2015-11-13 Thread Jonathan Wakely

Another piece of the Library Fundamentals v2 TS, as specified by
https://rawgit.com/cplusplus/fundamentals-ts/v2/fundamentals-ts.html#rand.util.randint

Tested powerpc64le-linux, committed to trunk.
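
A quick usage sketch (not part of the committed test):

#include <experimental/random>

int main()
{
  // Uniformly distributed int in the closed range [1, 6].
  int die = std::experimental::randint(1, 6);

  // Reseed the per-thread engine, either non-deterministically or with a
  // fixed value.
  std::experimental::reseed();
  std::experimental::reseed(42u);

  return die < 1 || die > 6;
}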

commit 3a59dc1453e5d273e6943a0928fc7722488ae654
Author: Jonathan Wakely 
Date:   Fri Nov 13 16:26:34 2015 +

Define std::experimental::randint etc.

	* include/Makefile.am: Add new header.
	* include/Makefile.in: Regenerate.
	* include/experimental/random: New.
	* testsuite/experimental/random/randint.cc: New.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 4e2ae18..5560b9e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -660,6 +660,7 @@ experimental_headers = \
 	${experimental_srcdir}/numeric \
 	${experimental_srcdir}/optional \
 	${experimental_srcdir}/propagate_const \
+	${experimental_srcdir}/random \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
diff --git a/libstdc++-v3/include/experimental/random b/libstdc++-v3/include/experimental/random
new file mode 100644
index 000..9be1d31
--- /dev/null
+++ b/libstdc++-v3/include/experimental/random
@@ -0,0 +1,77 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file experimental/random
+ *  This is a TS C++ Library header.
+ */
+
+#ifndef _GLIBCXX_EXPERIMENTAL_RANDOM
+#define _GLIBCXX_EXPERIMENTAL_RANDOM 1
+
+#include <random>
+
+namespace std {
+namespace experimental {
+inline namespace fundamentals_v2 {
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#define __cpp_lib_experimental_randint 201511
+
+  inline std::default_random_engine&
+  _S_randint_engine()
+  {
+static thread_local default_random_engine __eng{random_device{}()};
+return __eng;
+  }
+
+  // 13.2.2.1, Function template randint
+  template
+inline _IntType
+randint(_IntType __a, _IntType __b)
+{
+  static_assert(is_integral<_IntType>::value && sizeof(_IntType) > 1,
+		"argument must be an integer type");
+  using _Dist = std::uniform_int_distribution<_IntType>;
+  static thread_local _Dist __dist;
+  return __dist(_S_randint_engine(), typename _Dist::param_type{__a, __b});
+}
+
+  inline void
+  reseed()
+  {
+_S_randint_engine().seed(random_device{}());
+  }
+
+  inline void
+  reseed(default_random_engine::result_type __value)
+  {
+_S_randint_engine().seed(__value);
+  }
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace fundamentals_v2
+} // namespace experimental
+} // namespace std
+
+#endif
diff --git a/libstdc++-v3/testsuite/experimental/random/randint.cc b/libstdc++-v3/testsuite/experimental/random/randint.cc
new file mode 100644
index 000..d523836
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/random/randint.cc
@@ -0,0 +1,84 @@
+// { dg-options "-std=gnu++14" }
+// { dg-require-effective-target tls_runtime }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <experimental/random>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  for (int i = 0; i < 100; ++i)
+  {
+const int n = std::experimental::randint(-10, i);
+VERIFY( -10 <= n && n <= i );
+  }
+
+  std::experimental::reseed(99u);
+  const long n1[] = {
+std::experimental::randint(0, 

Re: [PATCH] Fix PR56118

2015-11-13 Thread Alan Lawrence

On 10/11/15 09:34, Richard Biener wrote:


The following fixes PR56118 by adjusting the cost model handling of
basic-block vectorization to favor the vectorized version in case
estimated cost is the same as the estimated cost of the scalar
version.  This makes sense because we over-estimate the vectorized
cost in several places.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-10  Richard Biener  

PR tree-optimization/56118
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
cost favor vectorized version.

* gcc.target/i386/pr56118.c: New testcase.


On AArch64 and ARM targets, this causes PASS->FAIL of

gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2 
"vectorization is not profitable"


that sounds like a good thing ;), so I imagine the xfail directive may just 
need updating. The test also looks to be failing on powerpc64 (according to 
https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).


Regards, Alan



[hsa] Pass kernel launch attributes through a device-specific argument

2015-11-13 Thread Martin Jambor
Hi,

this HSA patch is analogous to the for-trunk RFC I sent a while
ago and implements passing HSA-specific grid sizes through a
device-specific argument.  Committed to the branch.

Thanks,

Martin


2015-11-13  Martin Jambor  

include/
* gomp-constants.h (GOMP_TARGET_ARG_FIRST_DEVICE_SPECIFIC): New
constant.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

gcc/
* builtin-types.def
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT_PTR): Turned
into BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR.
* omp-builtins.def (BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c (get_target_arguments): New function.
(expand_omp_target): Call it, do not calculate num_teams and
thread_limit.

gcc/fortran
* types.def:
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT_PTR): Turned
into BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR.

libgomp/
* libgomp.h (gomp_device_descr): Update type of run_func.
* libgomp_g.h (GOMP_target_ext): Update type.
* oacc-host.c (host_run): Likewise.
* target.c (GOMP_target_ext): Change type, pass arguments to plugins.
* plugin/plugin-hsa.c (parse_launch_attributes): Parse arguments.
(GOMP_OFFLOAD_run): Update type.

liboffloadmic/plugin/
* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_run): Update type.
---
 gcc/builtin-types.def|  6 +-
 gcc/fortran/types.def|  4 +-
 gcc/omp-builtins.def |  2 +-
 gcc/omp-low.c| 85 
 include/gomp-constants.h | 10 +++
 libgomp/libgomp.h|  2 +-
 libgomp/libgomp_g.h  |  3 +-
 libgomp/oacc-host.c  |  2 +-
 libgomp/plugin/plugin-hsa.c  | 31 +++--
 libgomp/target.c | 13 ++--
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |  3 +-
 11 files changed, 109 insertions(+), 52 deletions(-)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index ef854c4..251c980 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -557,9 +557,9 @@ DEF_FUNCTION_TYPE_9 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT_PTR,
- BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT, BT_PTR)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 14e6970..d5f44ab 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -222,9 +222,9 @@ DEF_FUNCTION_TYPE_9 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT_PTR,
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT, BT_PTR)
+ BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index c75da11..20c06b7 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -343,7 +343,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_REGISTER, 
"GOMP_offload_register",
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_UNREGISTER, "GOMP_offload_unregister",
  BT_FN_VOID_PTR_INT_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
- BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT_PTR,
+ BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c7c9c3b..d10636b 100644
--- a/gcc/omp-low.c
+++ 

Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-13 Thread Jeff Law

On 11/13/2015 09:58 AM, Martin Liška wrote:

On 11/13/2015 05:32 PM, Jeff Law wrote:

On 11/13/2015 05:50 AM, Martin Liška wrote:

Hello.

Patch survives regbootstrap on x86_64-linux-gnu.
Ready for trunk?

Thanks,
Martin


0001-Fix-memory-leaks-in-tree-ssa-uninit.c.patch


  From 54851503251dee7a8bd074485db262715e628728 Mon Sep 17 00:00:00 2001
From: marxin
Date: Fri, 13 Nov 2015 12:23:22 +0100
Subject: [PATCH] Fix memory leaks in tree-ssa-uninit.c

gcc/ChangeLog:

2015-11-13  Martin Liska

 * tree-ssa-uninit.c (convert_control_dep_chain_into_preds):
 Fix GNU coding style.
 (find_def_preds): Use auto_vec.
 (destroy_predicate_vecs): Change signature of the function.
 (prune_uninit_phi_opnds_in_unrealizable_paths): Use the
 new signature.
 (simplify_preds_4): Use destroy_predicate_vecs instread of
 just releasing preds vector.
 (normalize_preds): Likewise.
 (is_use_properly_guarded): Use new signature of
 destroy_predicate_vecs.
 (find_uninit_use): Likewise.

OK.

FWIW, there's all kinds of spaces vs tabs issues in this file.  I'm curious why 
you chose to fix convert_control_dep_chain_into_preds, but didn't change any 
others.


Hi Jeff.

Thanks for the confirmation; you are right, it's full of coding style issues.
I can change these if that would be desired.
It's probably a good thing to do.  Given it'd strictly be formatting, 
I'd even consider it post-stage1.


jeff


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-13 Thread Jim Wilson
Revised patch with the also missing xgene1 part added.

Jim
2015-11-13  Jim Wilson  

	* gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add lines for exynos-m1
	and qdf24xx and xgene1 to match -march=armv8-a.

Index: gcc/config/arm/t-aprofile
===
--- gcc/config/arm/t-aprofile	(revision 230283)
+++ gcc/config/arm/t-aprofile	(working copy)
@@ -91,6 +91,9 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?corte
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57.cortex-a53
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a72.cortex-a53
+MULTILIB_MATCHES   += march?armv8-a=mcpu?exynos-m1
+MULTILIB_MATCHES   += march?armv8-a=mcpu?qdf24xx
+MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-13 Thread Richard Biener
On November 13, 2015 5:26:01 PM GMT+01:00, Jeff Law  wrote:
>On 11/13/2015 03:13 AM, Richard Biener wrote:
>
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>>> index 34d2356..6613e83 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -1474,6 +1474,7 @@ OBJS = \
>>>  tree-ssa-loop.o \
>>>  tree-ssa-math-opts.o \
>>>  tree-ssa-operands.o \
>>> +   tree-ssa-path-split.o \
>>
>> gimple-ssa-path-split please.
>Agreed.   I'll make that change for Ajit.
>
>
>>
>>>  tree-ssa-phionlycprop.o \
>>>  tree-ssa-phiopt.o \
>>>  tree-ssa-phiprop.o \
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 757ce85..3e946ca 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -2403,6 +2403,10 @@ ftree-vrp
>>>   Common Report Var(flag_tree_vrp) Init(0) Optimization
>>>   Perform Value Range Propagation on trees.
>>>
>>> +ftree-path-split
>>
>> fsplit-paths
>And this plus related variable name fixes and such.
>
>
>>>
>>> +@item -ftree-path-split
>>> +@opindex ftree-path-split
>>> +Perform Path Splitting on trees.  When the two execution paths of an
>>> +if-then-else merge at the loop latch node, try to duplicate the
>>> +merge node into two paths. This is enabled by default at
>@option{-O2}
>>> +and above.
>>> +
>>
>> I think if we go into the detail of the transform we should mention
>the
>> effective result (creating a loop nest with disambiguation figuring
>out
>> which is the "better" inner loop).
>It no longer creates a loop nest.  The overall shape of the CFG is 
>maintained.  ie, we still have a single simple latch for the loop.  The
>
>blocks we create are internal to the loop.
>
>I always struggle with the right level at which to document these 
>options.   I'll take a look at this for Ajit.
>
>BTW Do we have an API for indicating that new blocks have been added to
>
>a loop?  If so, then we can likely drop the LOOPS_NEED_FIXUP.

Please. It's called add_to_loop or so.

Richard.

>
>>
>>>   @item -fsplit-ivs-in-unroller
>>>   @opindex fsplit-ivs-in-unroller
>>>   Enables expression of values of induction variables in later
>iterations
>>> diff --git a/gcc/opts.c b/gcc/opts.c
>>> index 9a3fbb3..9a0b27c 100644
>>> --- a/gcc/opts.c
>>> +++ b/gcc/opts.c
>>> @@ -509,6 +509,7 @@ static const struct default_options
>>> default_options_table[] =
>>>   { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference,
>NULL, 1
>>> },
>>>   { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
>>>   { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
>>> +{ OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
>>
>> Is this transform a good idea for -Os?
>In general, no because of the block duplication.
>
>jeff




[PATCH] 21_strings/basic_string/capacity/wchar_t/18654.cc

2015-11-13 Thread David Edelsohn
http://www.cplusplus.com/reference/string/basic_string/reserve/

"Note that the resulting string capacity may be equal or greater than n."

The current testcase verifies that the capacity is exactly equal to
the length of the string or reserve value, but the standard allows the
capacity to be larger.  On AIX, the capacity is larger and the
testcase incorrectly fails.

Linux x86-64:
i: 4
str.length: 4
str.capacity: 4
str.capacity: 12
str.capacity: 8
str.capacity: 4

AIX:
i: 4
str.length: 4
str.capacity: 7   <-- i
str.capacity: 14  <-- i*3
str.capacity: 8   <-- i*2
str.capacity: 7   <-- default
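
A minimal stand-alone illustration of the guarantee being tested (a sketch
only, not the committed change):

#include <string>
#include <cassert>

int main()
{
  std::wstring str(4, L'x');
  str.reserve(8);
  // The standard only guarantees a lower bound; on AIX the resulting
  // capacity is larger than the requested value.
  assert( str.capacity() >= 8 );
}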

* 21_strings/basic_string/capacity/wchar_t/18654.cc: Verify the
capacity is greater than or equal to the requested amount.

Index: 18654.cc
===
--- 18654.cc(revision 230322)
+++ 18654.cc(working copy)
@@ -50,10 +50,10 @@
   str.reserve(3 * i);

   str.reserve(2 * i);
-  VERIFY( str.capacity() == 2 * i );
+  VERIFY( str.capacity() >= 2 * i );

   str.reserve();
-  VERIFY( str.capacity() == i );
+  VERIFY( str.capacity() >= i );
 }
 }

Thanks, David


Re: [PATCH] PR fortran/68319 -- Implement checks for F2008:C1206

2015-11-13 Thread Paul Richard Thomas
Hi Steve,

I saw the thread on clf. That's a pretty quick turn around!

OK for trunk.

Thanks for the patch

Paul

On 13 November 2015 at 19:38, Steve Kargl
 wrote:
> The attached patch implements the checks required by
> constraint C1206 from the Fortran 2008 standard.  Built
> and regression tested on x86_64-*-freebsd.  OK to
> commit?
>
> 2015-11-13  Steven G. Kargl  
>
> PR fortran/68319
> * decl.c (gfc_match_data, gfc_match_entry): Enforce F2008:C1206.
> * io.c (gfc_match_format): Ditto.
> * match.c (gfc_match_st_function): Ditto.
>
> 2015-11-13  Steven G. Kargl  
>
> PR fortran/68319
> * gfortran.dg/pr68319.f90: New test.
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


[gomp4.5] Fix up doacross with dynamic scheduling

2015-11-13 Thread Jakub Jelinek
Hi!

Guided scheduling doesn't ensure that all assigned chunks have sizes that
are multiples of chunk_size; for guided scheduling, chunk_size actually
means just the minimum size of a chunk.  That means we can't divide the IV
or the number of iterations by the chunk size, because the precondition for
that is that all chunks assigned to each thread are multiples of the chunk
size.
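
As a concrete illustration (the numbers are made up): with chunk_size 3 and
four threads, guided scheduling may hand out chunks of e.g. 25, 19, 14, ...
iterations, never smaller than 3 but not multiples of 3, so an index
computed as counts[0] / chunk_size no longer identifies a unique chunk.
The hunks below therefore switch GFS_GUIDED to one entry per iteration,
roughly:

  if (ws->sched == GFS_GUIDED)
    ent = counts[0];                        /* one entry per iteration */
  else
    ent = counts[0] / doacross->chunk_size; /* valid only when every chunk
                                               is a multiple of chunk_size */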

Fixed thusly, tested on x86_64-linux, committed to gomp-4_5-branch.

2015-11-13  Jakub Jelinek  

* ordered.c (gomp_doacross_init, GOMP_doacross_post,
GOMP_doacross_wait, gomp_doacross_ull_init, GOMP_doacross_ull_post,
GOMP_doacross_ull_wait): For GFS_GUIDED don't divide number of
iterators or IV by chunk size.
* testsuite/libgomp.c/doacross-3.c: New test.

--- libgomp/ordered.c.jj2015-10-14 10:24:10.0 +0200
+++ libgomp/ordered.c   2015-11-13 19:14:11.877005602 +0100
@@ -297,6 +297,8 @@ gomp_doacross_init (unsigned ncounts, lo
 
   if (ws->sched == GFS_STATIC)
 num_ents = team->nthreads;
+  else if (ws->sched == GFS_GUIDED)
+num_ents = counts[0];
   else
 num_ents = (counts[0] - 1) / chunk_size + 1;
   if (num_bits <= MAX_COLLAPSED_BITS)
@@ -366,6 +368,8 @@ GOMP_doacross_post (long *counts)
 
   if (__builtin_expect (ws->sched == GFS_STATIC, 1))
 ent = thr->ts.team_id;
+  else if (ws->sched == GFS_GUIDED)
+ent = counts[0];
   else
 ent = counts[0] / doacross->chunk_size;
   unsigned long *array = (unsigned long *) (doacross->array
@@ -426,6 +430,8 @@ GOMP_doacross_wait (long first, ...)
   else
ent = first / ws->chunk_size % thr->ts.team->nthreads;
 }
+  else if (ws->sched == GFS_GUIDED)
+ent = first;
   else
 ent = first / doacross->chunk_size;
   unsigned long *array = (unsigned long *) (doacross->array
@@ -520,6 +526,8 @@ gomp_doacross_ull_init (unsigned ncounts
 
   if (ws->sched == GFS_STATIC)
 num_ents = team->nthreads;
+  else if (ws->sched == GFS_GUIDED)
+num_ents = counts[0];
   else
 num_ents = (counts[0] - 1) / chunk_size + 1;
   if (num_bits <= MAX_COLLAPSED_BITS)
@@ -595,6 +603,8 @@ GOMP_doacross_ull_post (gomp_ull *counts
 
   if (__builtin_expect (ws->sched == GFS_STATIC, 1))
 ent = thr->ts.team_id;
+  else if (ws->sched == GFS_GUIDED)
+ent = counts[0];
   else
 ent = counts[0] / doacross->chunk_size_ull;
 
@@ -676,6 +686,8 @@ GOMP_doacross_ull_wait (gomp_ull first,
   else
ent = first / ws->chunk_size_ull % thr->ts.team->nthreads;
 }
+  else if (ws->sched == GFS_GUIDED)
+ent = first;
   else
 ent = first / doacross->chunk_size_ull;
 
--- libgomp/testsuite/libgomp.c/doacross-3.c.jj 2015-11-13 19:08:22.191960410 
+0100
+++ libgomp/testsuite/libgomp.c/doacross-3.c2015-11-13 19:09:01.626401650 
+0100
@@ -0,0 +1,225 @@
+extern void abort (void);
+
+#define N 256
+int a[N], b[N / 16][8][4], c[N / 32][8][8], g[N / 16][8][6];
+volatile int d, e;
+volatile unsigned long long f;
+
+int
+main ()
+{
+  unsigned long long i;
+  int j, k, l, m;
+  #pragma omp parallel private (l)
+  {
+#pragma omp for schedule(guided, 3) ordered (1) nowait
+for (i = 1; i < N + f; i++)
+  {
+   #pragma omp atomic write
+   a[i] = 1;
+   #pragma omp ordered depend(sink: i - 1)
+   if (i > 1)
+ {
+   #pragma omp atomic read
+   l = a[i - 1];
+   if (l < 2)
+ abort ();
+ }
+   #pragma omp atomic write
+   a[i] = 2;
+   if (i < N - 1)
+ {
+   #pragma omp atomic read
+   l = a[i + 1];
+   if (l == 3)
+ abort ();
+ }
+   #pragma omp ordered depend(source)
+   #pragma omp atomic write
+   a[i] = 3;
+  }
+#pragma omp for schedule(guided) ordered (3) nowait
+for (i = 3; i < N / 16 - 1 + f; i++)
+  for (j = 0; j < 8; j += 2)
+   for (k = 1; k <= 3; k++)
+ {
+   #pragma omp atomic write
+   b[i][j][k] = 1;
+   #pragma omp ordered depend(sink: i, j - 2, k - 1) \
+   depend(sink: i - 2, j - 2, k + 1)
+   #pragma omp ordered depend(sink: i - 3, j + 2, k - 2)
+   if (j >= 2 && k > 1)
+ {
+   #pragma omp atomic read
+   l = b[i][j - 2][k - 1];
+   if (l < 2)
+ abort ();
+ }
+   #pragma omp atomic write
+   b[i][j][k] = 2;
+   if (i >= 5 && j >= 2 && k < 3)
+ {
+   #pragma omp atomic read
+   l = b[i - 2][j - 2][k + 1];
+   if (l < 2)
+ abort ();
+ }
+   if (i >= 6 && j < N / 16 - 3 && k == 3)
+ {
+   #pragma omp atomic read
+   l = b[i - 3][j + 2][k - 2];
+   if (l < 2)
+ abort ();
+ }
+   #pragma omp ordered depend(source)
+   #pragma omp atomic write
+   b[i][j][k] = 3;
+  

Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 17:41:53 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 07:37:17PM +0300, Ilya Verbin wrote:
> > I don't know which interface to implement to maintain compatibility in the
> > future.
> > Anyway, currently it's impossible that a process will use the same 
> > liboffloadmic
> > for 2 different offloading paths (say GCC's in exec and ICC's in a dso), 
> > because
> > in fact GCC's and ICC's libraries are not the same.  First of all, they have
> > different names: liboffloadmic in GCC and just liboffload in ICC.  And most
> > importantly, ICC's version contains some references to libiomp5, which were
> > removed form GCC's version.  In theory, we want to use one library with all
> > compilers, but I'm not sure when it will be possible.
> 
> Ok, in that case it is less of a problem.
> 
> > > Do you get still crashes on any of the testcases with this?
> > 
> > No, all tests now pass using emul.  I'll report when I have any results on 
> > HW.
> 
> Perfect, I'll commit it to gomp-4_5-branch then.

make check-target-libgomp with offloading to HW also passed :)

And this:

+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -3,6 +3,7 @@
 
 int main ()
 {
+  int x = 1;
   int a = 0, b = 0, c = 0, d[7];
 
   #pragma omp parallel
@@ -18,6 +19,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
+  while (x);
   usleep (1000);
   #pragma omp atomic update
   b |= 4;
@@ -25,6 +27,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
+  while (x);
   usleep (5000);
   #pragma omp atomic update
   b |= 1;

demonstrates 200% CPU usage both using emul and HW, so 2 target tasks really run
concurrently.

  -- Ilya


[PATCH] PR fortran/68319 -- Implement checks for F2008:C1206

2015-11-13 Thread Steve Kargl
The attached patch implements the checks required by
constraint C1206 from the Fortran 2008 standard.  Built
and regression tested on x86_64-*-freebsd.  OK to
commit?

2015-11-13  Steven G. Kargl  

PR fortran/68319
* decl.c (gfc_match_data, gfc_match_entry): Enforce F2008:C1206.
* io.c (gfc_match_format): Ditto.
* match.c (gfc_match_st_function): Ditto.

2015-11-13  Steven G. Kargl  

PR fortran/68319
* gfortran.dg/pr68319.f90: New test.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 230336)
+++ gcc/fortran/decl.c	(working copy)
@@ -552,6 +552,15 @@ gfc_match_data (void)
   gfc_data *new_data;
   match m;
 
+  /* Before parsing the rest of a DATA statement, check F2008:c1206.  */
+  if ((gfc_current_state () == COMP_FUNCTION
+   || gfc_current_state () == COMP_SUBROUTINE)
+  && gfc_state_stack->previous->state == COMP_INTERFACE)
+{
+  gfc_error ("DATA statement at %C cannot appear within an INTERFACE");
+  return MATCH_ERROR;
+}
+
   set_in_match_data (true);
 
   for (;;)
@@ -5767,6 +5776,13 @@ gfc_match_entry (void)
   return MATCH_ERROR;
 }
 
+  if ((state == COMP_SUBROUTINE || state == COMP_FUNCTION)
+  && gfc_state_stack->previous->state == COMP_INTERFACE)
+{
+  gfc_error ("ENTRY statement at %C cannot appear within an INTERFACE");
+  return MATCH_ERROR;
+}
+
   module_procedure = gfc_current_ns->parent != NULL
 		   && gfc_current_ns->parent->proc_name
 		   && gfc_current_ns->parent->proc_name->attr.flavor
Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c	(revision 230336)
+++ gcc/fortran/io.c	(working copy)
@@ -1199,6 +1199,15 @@ gfc_match_format (void)
   return MATCH_ERROR;
 }
 
+  /* Before parsing the rest of a FORMAT statement, check F2008:c1206.  */
+  if ((gfc_current_state () == COMP_FUNCTION
+   || gfc_current_state () == COMP_SUBROUTINE)
+  && gfc_state_stack->previous->state == COMP_INTERFACE)
+{
+  gfc_error ("FORMAT statement at %C cannot appear within an INTERFACE");
+  return MATCH_ERROR;
+}
+
   if (gfc_statement_label == NULL)
 {
   gfc_error ("Missing format label at %C");
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 230336)
+++ gcc/fortran/match.c	(working copy)
@@ -4913,6 +4913,15 @@ gfc_match_st_function (void)
 
   sym->value = expr;
 
+  if ((gfc_current_state () == COMP_FUNCTION
+   || gfc_current_state () == COMP_SUBROUTINE)
+  && gfc_state_stack->previous->state == COMP_INTERFACE)
+{
+  gfc_error ("Statement function at %L cannot appear within an INTERFACE",
+		 >where);
+  return MATCH_ERROR;
+}
+
   if (!gfc_notify_std (GFC_STD_F95_OBS, "Statement function at %C"))
 return MATCH_ERROR;
 
Index: gcc/testsuite/gfortran.dg/pr68319.f90
===
--- gcc/testsuite/gfortran.dg/pr68319.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr68319.f90	(working copy)
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! PR fortran/68319
+!
+subroutine foo
+
+   interface
+
+  real function bar(i)
+ f(i) = 2 * i ! { dg-error "cannot appear within" }
+  end function bar
+
+  real function bah(j)
+ entry boo(j) ! { dg-error "cannot appear within" }
+  end function bah
+
+  real function fu(j)
+ data i /1/   ! { dg-error "cannot appear within" }
+  end function fu
+
+  real function fee(j)
+10   format('(A)')! { dg-error "cannot appear within" }
+  end function fee
+
+   end interface
+
+end subroutine foo


RE: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
> Evandro Menezes wrote:
> Hi, Wilco.
> 
> It looks good to me, but FCMP is quite different from FCCMP on Exynos M1,
> so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}"
> and "fccmp{s,d}".  Would it be acceptable to add this with this patch or 
> later?

It would be easy to add fccmps/d as new attributes; I prefer to do that as a
separate patch.  Are there any other attributes that you think are missing
besides the ones I know about (extr, 64-bit mul/mla, mulh, bfi)?

Also would we need a new entry in the cost tables?

Wilco




[gomp4.5] Fix up ordered threads simd

2015-11-13 Thread Jakub Jelinek
Hi!

This fixes #pragma omp ordered threads simd expansion.
In that case we want GOMP_ordered_start () / GOMP_ordered_end () calls
around the block, but those calls really should be done just once, while
the other stuff in between GOMP_SIMD_ORDERED_{START,END} internal calls
should be expanded in a loop from 0 to vf-1 with the iterator and linear
vars being adjusted there.  Therefore, I'm not emitting those calls in
between the internal calls (that is what eventually should be a loop), but
not outside either (because everything there should be vectorized).
Thus, it is handled as an argument to the internal calls (for now not a big
difference, as the vectorizer always gives up on this, but we should teach
it to handle that case eventually).

Regtested on x86_64-linux, committed to gomp-4_5-branch.

2015-11-13  Jakub Jelinek  

* omp-low.c (lower_omp_ordered): Add argument to GOMP_SIMD_ORDERED_*
internal calls - 0 if ordered simd and 1 for ordered threads simd.
* tree-vectorizer.c (adjust_simduid_builtins): If GOMP_SIMD_ORDERED_*
argument is 1, replace it with GOMP_ordered_* call instead of removing
it.

* testsuite/libgomp.c/ordered-5.c: New test.

--- gcc/omp-low.c.jj2015-11-09 11:17:31.0 +0100
+++ gcc/omp-low.c   2015-11-13 17:20:18.701832932 +0100
@@ -13924,8 +13924,10 @@ lower_omp_ordered (gimple_stmt_iterator
   gomp_ordered *ord_stmt = as_a <gomp_ordered *> (stmt);
   gcall *x;
   gbind *bind;
-  bool simd
-= find_omp_clause (gimple_omp_ordered_clauses (ord_stmt), OMP_CLAUSE_SIMD);
+  bool simd = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
+  OMP_CLAUSE_SIMD);
+  bool threads = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
+ OMP_CLAUSE_THREADS);
 
   if (find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
   OMP_CLAUSE_DEPEND))
@@ -13948,7 +13950,8 @@ lower_omp_ordered (gimple_stmt_iterator
 
   if (simd)
 {
-  x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_START, 0);
+  x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_START, 1,
+ build_int_cst (NULL_TREE, threads));
   cfun->has_simduid_loops = true;
 }
   else
@@ -13962,7 +13965,8 @@ lower_omp_ordered (gimple_stmt_iterator
   gimple_omp_set_body (stmt, NULL);
 
   if (simd)
-x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_END, 0);
+x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_END, 1,
+   build_int_cst (NULL_TREE, threads));
   else
 x = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ORDERED_END),
   0);
--- gcc/tree-vectorizer.c.jj2015-11-09 11:17:56.0 +0100
+++ gcc/tree-vectorizer.c   2015-11-13 17:51:32.793269963 +0100
@@ -177,6 +177,21 @@ adjust_simduid_builtins (hash_table

[C PATCH] Use RECORD_OR_UNION_TYPE_P macro

2015-11-13 Thread Marek Polacek
As promised & discussed in the thread here:
.

diffstat shows:
 c/c-decl.c   |   38 +--
 c/c-typeck.c |   81 +++
 2 files changed, 40 insertions(+), 79 deletions(-)
so this patch simplifies the code quite a bit.

I suppose I could commit this right away, but maybe someone will be kind
enough to glance over this.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-11-13  Marek Polacek  

* c-decl.c: Use RECORD_OR_UNION_TYPE_P throughout.
* c-typeck.c: Likewise.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 9a222d8..7b9ab8a 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -3048,8 +3048,7 @@ pushdecl (tree x)
element = TREE_TYPE (element);
   element = TYPE_MAIN_VARIANT (element);
 
-  if ((TREE_CODE (element) == RECORD_TYPE
-  || TREE_CODE (element) == UNION_TYPE)
+  if (RECORD_OR_UNION_TYPE_P (element)
  && (TREE_CODE (x) != TYPE_DECL
  || TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
  && !COMPLETE_TYPE_P (element))
@@ -4643,8 +4642,7 @@ diagnose_uninitialized_cst_member (tree decl, tree type)
  inform (DECL_SOURCE_LOCATION (field), "%qD should be initialized", 
field);
}
 
-  if (TREE_CODE (field_type) == RECORD_TYPE
- || TREE_CODE (field_type) == UNION_TYPE)
+  if (RECORD_OR_UNION_TYPE_P (field_type))
diagnose_uninitialized_cst_member (decl, field_type);
 }
 }
@@ -4966,8 +4964,7 @@ finish_decl (tree decl, location_t init_loc, tree init,
   if (TREE_READONLY (decl))
warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc___compat,
"uninitialized const %qD is invalid in C++", decl);
-  else if ((TREE_CODE (type) == RECORD_TYPE
-   || TREE_CODE (type) == UNION_TYPE)
+  else if (RECORD_OR_UNION_TYPE_P (type)
   && C_TYPE_FIELDS_READONLY (type))
diagnose_uninitialized_cst_member (decl, type);
 }
@@ -6726,8 +6723,7 @@ grokdeclarator (const struct c_declarator *declarator,
&& VAR_P (decl)
&& TREE_PUBLIC (decl)
&& TREE_STATIC (decl)
-   && (TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
-   || TREE_CODE (TREE_TYPE (decl)) == UNION_TYPE
+   && (RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
|| TREE_CODE (TREE_TYPE (decl)) == ENUMERAL_TYPE)
&& TYPE_NAME (TREE_TYPE (decl)) == NULL_TREE)
   warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc___compat,
@@ -7282,8 +7278,7 @@ grokfield (location_t loc,
 that took root before someone noticed the bug...  */
 
   tree type = declspecs->type;
-  bool type_ok = (TREE_CODE (type) == RECORD_TYPE
- || TREE_CODE (type) == UNION_TYPE);
+  bool type_ok = RECORD_OR_UNION_TYPE_P (type);
   bool ok = false;
 
   if (type_ok
@@ -7359,7 +7354,7 @@ is_duplicate_field (tree x, tree y)
   xt = TREE_TYPE (x);
   if (DECL_NAME (x) != NULL_TREE)
xn = DECL_NAME (x);
-  else if ((TREE_CODE (xt) == RECORD_TYPE || TREE_CODE (xt) == UNION_TYPE)
+  else if (RECORD_OR_UNION_TYPE_P (xt)
   && TYPE_NAME (xt) != NULL_TREE
   && TREE_CODE (TYPE_NAME (xt)) == TYPE_DECL)
xn = DECL_NAME (TYPE_NAME (xt));
@@ -7369,7 +7364,7 @@ is_duplicate_field (tree x, tree y)
   yt = TREE_TYPE (y);
   if (DECL_NAME (y) != NULL_TREE)
yn = DECL_NAME (y);
-  else if ((TREE_CODE (yt) == RECORD_TYPE || TREE_CODE (yt) == UNION_TYPE)
+  else if (RECORD_OR_UNION_TYPE_P (yt)
   && TYPE_NAME (yt) != NULL_TREE
   && TREE_CODE (TYPE_NAME (yt)) == TYPE_DECL)
yn = DECL_NAME (TYPE_NAME (yt));
@@ -7404,8 +7399,7 @@ detect_field_duplicates_hash (tree fieldlist,
  }
*slot = y;
   }
-else if (TREE_CODE (TREE_TYPE (x)) == RECORD_TYPE
-|| TREE_CODE (TREE_TYPE (x)) == UNION_TYPE)
+else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
   {
detect_field_duplicates_hash (TYPE_FIELDS (TREE_TYPE (x)), htab);
 
@@ -7456,8 +7450,7 @@ detect_field_duplicates (tree fieldlist)
   do {
 timeout--;
 if (DECL_NAME (x) == NULL_TREE
-   && (TREE_CODE (TREE_TYPE (x)) == RECORD_TYPE
-   || TREE_CODE (TREE_TYPE (x)) == UNION_TYPE))
+   && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
   timeout = 0;
 x = DECL_CHAIN (x);
   } while (timeout > 0 && x);
@@ -7473,8 +7466,7 @@ detect_field_duplicates (tree fieldlist)
if (DECL_NAME (x)
|| (flag_plan9_extensions
&& DECL_NAME (x) == NULL_TREE
-   && (TREE_CODE (TREE_TYPE (x)) == RECORD_TYPE
-   || TREE_CODE (TREE_TYPE (x)) == UNION_TYPE)
+   && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x))
&& TYPE_NAME (TREE_TYPE (x)) != NULL_TREE
&& TREE_CODE (TYPE_NAME (TREE_TYPE (x))) == TYPE_DECL))
  

Re: [PATCH] Fix PR68306

2015-11-13 Thread Uros Bizjak
On Fri, Nov 13, 2015 at 2:15 PM, Uros Bizjak  wrote:

>> 2015-11-13  Richard Biener  
>>
>> PR tree-optimization/68306
>> * tree-vect-data-refs.c (verify_data_ref_alignment): Move
>> loop related checks ...
>> (vect_verify_datarefs_alignment): ... here.
>> (vect_slp_analyze_and_verify_node_alignment): Compute and
>> verify alignment of the single DR that it matters.
>> * tree-vect-stmts.c (vectorizable_store): Add an assert.
>> (vectorizable_load): Add a comment.
>> * tree-vect-slp.c (vect_analyze_slp_cost_1): Fix DR used
>> for determining load cost.
>>
>> * gcc.dg/pr68306.c: Adjust.
>> * gcc.dg/pr68306-2.c: New testcase.
>> * gcc.dg/pr68306-3.c: Likewise.
>
> + /* { dg-additional-options "-mno-sse -mno-mmx" { target x86_64-*-* } } */
>
> You should use  { target i?86-*-* x86_64-*-* } here and in a couple of
> other places.

Added by attached patch.

2015-11-13  Uros Bizjak  

* gcc.dg/pr68306.c (dg-additional-options): Add i?86-*-* target.
* gcc.dg/pr68306-2.c (dg-additional-options): Ditto.
* gcc.dg/pr68306-3.c (dg-additional-options): Ditto.

Tested on x86_64-linux-gnu {,-m32}  and committed to mainline SVN.

Uros.
Index: ChangeLog
===
--- ChangeLog   (revision 230338)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2015-11-13  Uros Bizjak  
+
+   * gcc.dg/pr68306.c (dg-additional-options): Add i?86-*-* target.
+   * gcc.dg/pr68306-2.c (dg-additional-options): Ditto.
+   * gcc.dg/pr68306-3.c (dg-additional-options): Ditto.
+
 2015-11-13  David Malcolm  
 
* gcc.dg/diagnostic-token-ranges.c: New file.
Index: gcc.dg/pr68306-2.c
===
--- gcc.dg/pr68306-2.c  (revision 230338)
+++ gcc.dg/pr68306-2.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3" } */
-/* { dg-additional-options "-mno-sse -mno-mmx" { target x86_64-*-* } } */
+/* { dg-additional-options "-mno-sse -mno-mmx" { target i?86-*-* x86_64-*-* } 
} */
 
 struct {
 int tz_minuteswest;
Index: gcc.dg/pr68306-3.c
===
--- gcc.dg/pr68306-3.c  (revision 230338)
+++ gcc.dg/pr68306-3.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3" } */
-/* { dg-additional-options "-mno-sse -mno-mmx" { target x86_64-*-* } } */
+/* { dg-additional-options "-mno-sse -mno-mmx" { target i?86-*-* x86_64-*-* } 
} */
 /* { dg-additional-options "-mno-altivec -mno-vsx" { target powerpc*-*-* } } */
 
 extern void fn2();
Index: gcc.dg/pr68306.c
===
--- gcc.dg/pr68306.c(revision 230338)
+++ gcc.dg/pr68306.c(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3" } */
-/* { dg-additional-options "-mno-sse -mno-mmx" { target x86_64-*-* } } */
+/* { dg-additional-options "-mno-sse -mno-mmx" { target i?86-*-* x86_64-*-* } 
} */
 
 enum powerpc_pmc_type { PPC_PMC_IBM };
 struct {


[RFC] Device-specific OpenMP target arguments

2015-11-13 Thread Martin Jambor
Hello,

the patch below is an untested trunk-only implementation of device-specific
GOMP_target_ext arguments, which was proposed to me by Jakub today on IRC.
I'm sending this patch to make sure I understood the details well.
Nevertheless, I will be committing a tested and working version to the hsa
branch shortly.

Trunk-only means there is not much that is device-specific in the patch
itself; the idea is that the device-specific stuff will be added to
the args vector at the place specified by the comment in omp-low.c.
Each such argument will take two array elements: the first one will be
an identifier and the second one the value itself.

As suggested by Jakub, the first two elements will be the common NUM_TEAMS
and THREAD_LIMIT from the teams construct, if present.
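
Under that scheme the argument array would look roughly like this (an
illustrative sketch only; the variable names and the NULL terminator are
assumptions on my side, the identifiers come from gomp-constants.h below):

  void *args[] =
    {
      (void *) (uintptr_t) GOMP_TARGET_ARG_NUM_TEAMS,
      (void *) (uintptr_t) num_teams,
      (void *) (uintptr_t) GOMP_TARGET_ARG_THREAD_LIMIT,
      (void *) (uintptr_t) thread_limit,
      NULL
    };
  GOMP_target_ext (device, fn, mapnum, hostaddrs, sizes, kinds, flags,
		   depend, args);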

Any comments welcome,

Thanks,

Martin


2015-11-13  Martin Jambor  

include/
* gomp-constants.h (GOMP_TARGET_ARG_FIRST_DEVICE_SPECIFIC): New
constant.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.

gcc/
* builtin-types.def
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Turned
into BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR.
* omp-builtins.def (BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c (get_target_arguments): New function.
(expand_omp_target): Call it, do not calculate num_teams and
thread_limit.

gcc/fortran
* types.def:
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Turned
into BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR.

libgomp/
* libgomp.h (gomp_device_descr): Update type of run_func.
* libgomp_g.h (GOMP_target_ext): Update type.
* oacc-host.c (host_run): Likewise.
* target.c (GOMP_target_ext): Change type, pass arguments to plugins.

liboffloadmic/plugin/
* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_run): Update type.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c68fb19..b0c7704 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -556,9 +556,9 @@ DEF_FUNCTION_TYPE_9 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
- BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index a37e856..279d055 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -221,9 +221,9 @@ DEF_FUNCTION_TYPE_9 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
- BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 0b6bd58..17f0610 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -339,7 +339,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, 
"GOMP_single_copy_start",
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
- BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+ BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 51b471c..b9f3ac3 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12426,6 +12426,54 @@ get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }
 
+/* Create an array of arguments that is then passed to GOMP_target.   */
+
+static tree
+get_target_arguments (gimple_stmt_iterator *gsi, gomp_target 

Re: [PATCH] Preserve the original program while using graphite

2015-11-13 Thread Tom de Vries

On 11/11/15 23:54, Aditya Kumar wrote:

Earlier, graphite used to translate portions of the original program after
scop-detection in order to represent the SCoP in the polyhedral model.  This was
required because each basic block was represented as an independent basic block
in the polyhedral model, so all cross-basic-block dependencies had to be
translated out of SSA.

With this patch those dependencies are also exposed to ISL, so there is no
need to modify the original structure of the program.

After this patch we should be able to enable graphite at some default
optimization level.


Highlights:
Remove cross bb scalar to array translation
For reductions, add support for more than just INT_CST
Early bailout on codegen.
Verify loop-closed ssa structure during copy of renames
The uses of exprs should come from bb which dominates the bb
Collect the init value of close phi in loop-guard
Do not follow vuses for close-phi, postpone loop-close phi until the
 corresponding loop-phi is processed
Bail out if no bb found to place cond/loop -phis
Move insertion of liveouts at the end of codegen
Insert loop-phis in the loop-header.


This patch passes regtest and bootstrap with BOOT_CFLAGS='-O2 
-fgraphite-identity -floop-nest-optimize'




This patch has been committed, and caused PR68341 - 'FAIL: 
gcc.dg/graphite/interchange-{1,11,13}.c (internal compiler error)'


Thanks,
- Tom


[PATCH][AArch64] Handle function literal pools according to function size

2015-11-13 Thread Evandro Menezes

   [AArch64] Handle function literal pools according to function size

   gcc/

   PR target/63304
   * config/aarch64/aarch64-protos.h
   (aarch64_nopcrelative_literal_loads):
   Move to module scope in "aarch64.c".
   (aarch64_may_load_literal_pcrel): New function.
   * config/aarch64/aarch64.c (aarch64_nopcrelative_literal_loads):
   Change
   scope to module.
   (aarch64_may_load_literal_pcrel): New function that replaces the
   global
   variable "aarch64_nopcrelative_literal_loads" in most cases.
   (aarch64_current_func_size): New function.
   * config/aarch64/aarch64.h (machine_function): Add new member
   "size".
   * config/aarch64/aarch64.md
   (aarch64_reload_movcp):
   Use "aarch64_may_load_literal_pcrel".
   (aarch64_reload_movcp): Likewise.

Since defaulting to always using a global literal pool may degrade
performance on targets that do not fuse the resulting pair of insns, this
tentative patch reverts to a per-function literal pool when the function
size allows it, and falls back to the global literal pool otherwise.


Though the global literal pool promotes reuse of constants, with a positive
impact on text size, it comes at the cost of increased I-cache pressure,
since it then takes a pair of insns to access a literal.  Conversely,
per-function literal pools limit reuse of constants, but reduce I-cache
pressure, since a single PC-relative load is then enough to access a
literal.  I hope to have data quantifying this trade-off soon.
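
To make the trade-off concrete, here is a rough illustration of the two
access sequences (sketch only; the exact instructions and relocations
depend on the code model, and the label name is made up):

  /* A function that needs a floating-point constant from memory.  */
  double
  scale (double x)
  {
    return x * 1.2345678;
  }

  /* Per-function literal pool: a single PC-relative literal load,
     as long as the pool stays within the +/-1MB LDR-literal range:
         ldr     d1, .LC0
     Global literal pool: the address is formed first, then loaded:
         adrp    x0, .LC0
         ldr     d1, [x0, #:lo12:.LC0]  */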


Bootstrapped in aarch64 and arm.

Feedback is welcome.

--
Evandro Menezes

From d0fa78c4c29a15964467276493280efa091fbd64 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Fri, 13 Nov 2015 15:55:45 -0600
Subject: [PATCH] [AArch64] Handle function literal pools according to function
 size

gcc/
	PR target/63304
	* config/aarch64/aarch64-protos.h (aarch64_nopcrelative_literal_loads):
	Move to module scope in "aarch64.c".
	(aarch64_may_load_literal_pcrel): New function.
	* config/aarch64/aarch64.c (aarch64_nopcrelative_literal_loads): Change
	scope to module.
	(aarch64_may_load_literal_pcrel): New function that replaces the global
	variable "aarch64_nopcrelative_literal_loads" in most cases.
	(aarch64_current_func_size): New function.
	* config/aarch64/aarch64.h (machine_function): Add new member "size".
	* config/aarch64/aarch64.md (aarch64_reload_movcp):
	Use "aarch64_may_load_literal_pcrel".
	(aarch64_reload_movcp): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  4 ++-
 gcc/config/aarch64/aarch64.c| 49 -
 gcc/config/aarch64/aarch64.h|  7 +-
 gcc/config/aarch64/aarch64.md   |  4 +--
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9000d67..57868b7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -426,5 +426,7 @@ int aarch64_ccmp_mode_to_code (enum machine_mode mode);
 bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
 bool aarch64_operands_ok_for_ldpstp (rtx *, bool, enum machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, enum machine_mode);
-extern bool aarch64_nopcrelative_literal_loads;
+
+extern bool aarch64_may_load_literal_pcrel (void);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5ec7f08..71f8331 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -139,6 +139,7 @@ static bool aarch64_vector_mode_supported_p (machine_mode);
 static bool aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
 		 const unsigned char *sel);
 static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
+static unsigned long aarch64_current_func_size (void);
 
 /* Major revision number of the ARM Architecture implemented by the target.  */
 unsigned aarch64_architecture_version;
@@ -150,7 +151,7 @@ enum aarch64_processor aarch64_tune = cortexa53;
 unsigned long aarch64_tune_flags = 0;
 
 /* Global flag for PC relative loads.  */
-bool aarch64_nopcrelative_literal_loads;
+static bool aarch64_nopcrelative_literal_loads;
 
 /* Support for command line parsing of boolean flags in the tuning
structures.  */
@@ -1558,7 +1559,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	 we need to expand the literal pool access carefully.
 	 This is something that needs to be done in a number
 	 of places, so could well live as a separate function.  */
-	  if (aarch64_nopcrelative_literal_loads)
+	  if (!aarch64_may_load_literal_pcrel ())
 	{
 	  gcc_assert (can_create_pseudo_p ());
 	  base = gen_reg_rtx (ptr_mode);
@@ -3698,7 +3699,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	  return ((GET_CODE (sym) == LABEL_REF
 		   || (GET_CODE (sym) == SYMBOL_REF
 		   

Re: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Evandro Menezes

On 11/13/2015 11:36 AM, Wilco Dijkstra wrote:

Evandro Menezes wrote:
Hi, Wilco.

It looks good to me, but FCMP is quite different from FCCMP on Exynos M1,
so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}"
and "fccmp{s,d}".  Would it be acceptable to add this with this patch or later?

It would be easy to add fccmps/d as new attributes, I prefer to do that as a 
separate
patch. Are there any other attributes that you think are missing besides the 
ones I know
about (extr, 64-bit mul/mla, mulh, bfi)?

Also would we need a new entry in the cost tables?

Hi, Wilco.

I don't think that additional entries in the cost tables are needed.

However, I'm afraid that, other than the ones above, no additional
attributes, or perhaps only a handful, will be needed.  The reason is
that, so far, I have found few attributes that would make a noticeable
difference.  As more performance analysis is done (a never-ending task),
I expect some important attributes to pop up, but I cannot anticipate
them conclusively right now.
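
For context, the kind of chained floating-point comparison the FCCMP
support targets looks roughly like this (hypothetical codegen; the exact
condition codes depend on the comparison):

  /* With FCCMP the second compare is made conditional on the first,
     avoiding a branch:
         fcmp    d0, d1
         fccmp   d2, d3, #0, <cond>
         cset    w0, <cond>  */
  int
  both_less (double a, double b, double c, double d)
  {
    return a < b && c < d;
  }

A separate scheduling type for fccmp would then let the cost/latency
tables distinguish it from a plain fcmp on cores where the two differ.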


Thank you,

--
Evandro Menezes



[PATCH] C++ FE: offer suggestions for misspelled field names

2015-11-13 Thread David Malcolm
This is analogous to:
  "[PATCH 2/2] C FE: suggest corrections for misspelled field names"
 https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03380.html
but for the C++ frontend.

OK for trunk if it passes bootstrap?

gcc/c/ChangeLog:
* c-typeck.c (lookup_field_fuzzy): Move determination of closest
candidate into a new function, find_closest_identifier.

gcc/cp/ChangeLog:
* cp-tree.h (lookup_member_fuzzy): New decl.
* search.c: Include spellcheck.h.
(class lookup_field_fuzzy_info): New class.
(lookup_field_fuzzy_info::fuzzy_lookup_fnfields): New.
(lookup_field_fuzzy_info::fuzzy_lookup_field): New.
(lookup_field_fuzzy_r): New.
(lookup_member_fuzzy): New.
* typeck.c (finish_class_member_access_expr): When issuing
a "has no member named" error, call lookup_member_fuzzy, and
offer any result as a suggestion.

gcc/ChangeLog:
* spellcheck-tree.c (find_closest_identifier): New function, taken
from c/c-typeck.c:lookup_field_fuzzy, with NULL corrected to
NULL_TREE in two places.
* spellcheck.h (find_closest_identifier): New decl.

gcc/testsuite/ChangeLog:
* g++.dg/spellcheck-fields.C: New file.
---
 gcc/c/c-typeck.c |  28 +--
 gcc/cp/cp-tree.h |   1 +
 gcc/cp/search.c  | 138 +++
 gcc/cp/typeck.c  |  15 +++-
 gcc/spellcheck-tree.c|  41 +
 gcc/spellcheck.h |   6 ++
 gcc/testsuite/g++.dg/spellcheck-fields.C |  89 
 7 files changed, 288 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-fields.C
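
To illustrate the intended behaviour, a made-up example (in the spirit of
the new testcase, not copied from it; the exact diagnostic wording comes
from the patch below):

  struct colour { int red; int grene; };   /* note the typo in the field */

  int
  get_green (struct colour *c)
  {
    return c->green;   /* previously just "has no member named 'green'";
                          with this patch the C++ FE should also suggest
                          the close match 'grene'.  */
  }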

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index eb4e1fc..9a23ba2 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2280,33 +2280,7 @@ lookup_field_fuzzy (tree type, tree component)
   lookup_field_fuzzy_find_candidates (type, component,
 				      &candidates);
 
-  /* Now determine which is closest.  */
-  int i;
-  tree identifier;
-  tree best_identifier = NULL;
-  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
-  FOR_EACH_VEC_ELT (candidates, i, identifier)
-{
-  gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
-  edit_distance_t dist = levenshtein_distance (component, identifier);
-  if (dist < best_distance)
-   {
- best_distance = dist;
- best_identifier = identifier;
-   }
-}
-
-  /* If more than half of the letters were misspelled, the suggestion is
- likely to be meaningless.  */
-  if (best_identifier)
-{
-  unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
-IDENTIFIER_LENGTH (best_identifier)) / 2;
-  if (best_distance > cutoff)
-   return NULL;
-}
-
-  return best_identifier;
+  return find_closest_identifier (component, &candidates);
 }
 
 /* Make an expression to refer to the COMPONENT field of structure or
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 828f268..9dc0e44 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6116,6 +6116,7 @@ extern int class_method_index_for_fn  (tree, tree);
 extern tree lookup_fnfields(tree, tree, int);
 extern tree lookup_member  (tree, tree, int, bool,
 tsubst_flags_t);
+extern tree lookup_member_fuzzy(tree, tree, bool);
 extern int look_for_overrides  (tree, tree);
 extern void get_pure_virtuals  (tree);
 extern void maybe_suppress_debug_info  (tree);
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 94502f6..fe794a7 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cp-tree.h"
 #include "intl.h"
 #include "toplev.h"
+#include "spellcheck.h"
 
 static int is_subobject_of_p (tree, tree);
 static tree dfs_lookup_base (tree, void *);
@@ -1352,6 +1353,143 @@ lookup_member (tree xbasetype, tree name, int protect, bool want_type,
   return rval;
 }
 
+/* Helper class for lookup_member_fuzzy.  */
+
+class lookup_field_fuzzy_info
+{
+ public:
+  lookup_field_fuzzy_info (bool want_type_p) :
+m_want_type_p (want_type_p), m_candidates () {}
+
+  void fuzzy_lookup_fnfields (tree type);
+  void fuzzy_lookup_field (tree type);
+
+  /* If true, we are looking for types, not data members.  */
+  bool m_want_type_p;
+  /* The result: a vec of identifiers.  */
+  auto_vec<tree> m_candidates;
+};
+
+/* Locate all methods within TYPE, append them to m_candidates.  */
+
+void
+lookup_field_fuzzy_info::fuzzy_lookup_fnfields (tree type)
+{
+  vec<tree, va_gc> *method_vec;
+  tree fn;
+  size_t i;
+
+  if (!CLASS_TYPE_P (type))
+return;
+
+  method_vec = CLASSTYPE_METHOD_VEC (type);
+  if (!method_vec)
+return;
+
+  for (; vec_safe_iterate 

Re: Automatic openacc loop partitioning

2015-11-13 Thread Nathan Sidwell

On 11/13/15 16:40, Bernd Schmidt wrote:

+  this_mask = (this_mask & -this_mask);


Unnecessary parens.


+  if (!this_mask && noisy)
+warning_at (loop->loc, 0,
+"insufficient partitioning available to parallelize loop");


Should this really be an unconditional warning? Isn't sequential execution a
valid result of "auto"? (The spec implies that this is a possible outcome for
loops inside kernels that can't be determined to be independent.)


This piece of code is only active when loops have the INDEPENDENT clause. 
That's implicit for (marked) loops inside parallel.  But for kernels you need to 
say so explicitly.  IMO the user should be told that such loops fail to be 
parallelized.



Speaking of kernels, the testcase doesn't cover it, but maybe that's because
that needs something else before it works?


Going to wait for Tom's kernels patch set to land.


nathan


Re: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Andrew Pinski
On Sat, Nov 14, 2015 at 1:36 AM, Wilco Dijkstra  wrote:
>> Evandro Menezes wrote:
>> Hi, Wilco.
>>
>> It looks good to me, but FCMP is quite different from FCCMP on Exynos M1,
>> so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}"
>> and "fccmp{s,d}".  Would it be acceptable to add this with this patch or 
>> later?
>
> It would be easy to add fccmps/d as new attributes, I prefer to do that as a 
> separate
> patch. Are there any other attributes that you think are missing besides the 
> ones I know
> about (extr, 64-bit mul/mla, mulh, bfi)?
>
> Also would we need a new entry in the cost tables?


For ThunderX they can be treated as the same.


Thanks,
Andrew

>
> Wilco
>
>


Re: [PATCH] PR fortran/67803 -- Check CHARACTER array constructor element types

2015-11-13 Thread Steve Kargl
On Fri, Nov 13, 2015 at 01:57:13PM -0800, Steve Kargl wrote:
> The attached patch fixes an ICE that occurs in arith.c(gfc_arith_concat)
> because op1 and op2 have incompatible typespecs.  The fix is actually
> implemented in array.c(gfc_match_array_constructor) where the types
> of the elements in a constructor are compared to the typespec that was
> specified in the constructor.  See testcase for examples.  Built
> and regression tested on x86_64-*-freebsd.  OK to commit?
> 
> 2015-11-13  Steven G. Kargl  
> 
>   PR fortran/67803
>   * array.c (gfc_match_array_constructor): If array constructor included
>   a CHARACTER typespec, check array elements for compatible type.
> 
> 2015-11-13  Steven G. Kargl  
>  
>   PR fortran/67803
>   * gfortran.dg/pr67803.f90: New test.
> 

Now with a patch attached.

-- 
Steve
Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c	(revision 230351)
+++ gcc/fortran/array.c	(working copy)
@@ -1162,6 +1162,35 @@ done:
 {
   expr = gfc_get_array_expr (ts.type, ts.kind, &where);
   expr->ts = ts;
+
+  /* If the typespec is CHARACTER, check that array elements can
+	 be converted.  See PR fortran/67803.  */
+  if (ts.type == BT_CHARACTER)
+	{
+	  gfc_constructor *c;
+
+	  c = gfc_constructor_first (head);
+	  for (; c; c = gfc_constructor_next (c))
+	{
+	      if (gfc_numeric_ts (&c->expr->ts)
+		  || c->expr->ts.type == BT_LOGICAL)
+		{
+		  gfc_error ("Incompatible typespec for array element at %L",
+			     &c->expr->where);
+		  return MATCH_ERROR;
+		}
+
+	  /* Special case null().  */
+	  if (c->expr->expr_type == EXPR_FUNCTION
+		  && c->expr->ts.type == BT_UNKNOWN
+		  && strcmp (c->expr->symtree->name, "null") == 0)
+		{
+		  gfc_error ("Incompatible typespec for array element at %L",
+			     &c->expr->where);
+		  return MATCH_ERROR;
+		}
+	}
+	}
 }
   else
 expr = gfc_get_array_expr (BT_UNKNOWN, 0, &where);
@@ -1171,6 +1200,7 @@ done:
 expr->ts.u.cl->length_from_typespec = seen_ts;
 
   *result = expr;
+
   return MATCH_YES;
 
 syntax:
Index: gcc/testsuite/gfortran.dg/pr67803.f90
===
--- gcc/testsuite/gfortran.dg/pr67803.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr67803.f90	(working copy)
@@ -0,0 +1,14 @@
+! { dg-do compile }
+! PR fortran/67803
+! Original code submitted by Gerhard Steinmetz
+! 
+!
+program p
+  character(2) :: x(1)
+  x = '0' // [character :: 1]   ! { dg-error "Incompatible typespec for" }
+  x = '0' // [character :: 1.]  ! { dg-error "Incompatible typespec for" }
+  x = '0' // [character :: 1d1] ! { dg-error "Incompatible typespec for" }
+  x = '0' // [character :: (0.,1.)] ! { dg-error "Incompatible typespec for" }
+  x = '0' // [character :: .true.]  ! { dg-error "Incompatible typespec for" }
+  x = '0' // [character :: null()]  ! { dg-error "Incompatible typespec for" }
+end


Re: [PATCH applied], Power9 patches #6-8 (IEEE 128-bit h/w, 128-bit direct move, integer mult/add)

2015-11-13 Thread Michael Meissner
Here is the combo patch for patches #6, #7, and #8 that was applied.  I redid
the code attributes to have a  attribute for both fix and float patterns
David requested.

[gcc]
2015-11-13  Michael Meissner  

* config/rs6000/constraints.md (we constraint): New constraint for
64-bit power9 vector support.
(wL constraint): New constraint for the element in a vector that
can be addressed by the MFVSRLD instruction.

* config/rs6000/rs6000-protos.h (convert_float128_to_int): Add
declaration.
(convert_int_to_float128): Likewise.
(rs6000_generate_compare): Add support for ISA 3.0 (power9)
hardware support for IEEE 128-bit floating point.
(rs6000_expand_float128_convert): Likewise.
(convert_float128_to_int): Likewise.
(convert_int_to_float128): Likewise.

* config/rs6000/rs6000.md (UNSPEC_ROUND_TO_ODD): New unspecs for
ISA 3.0 hardware IEEE 128-bit floating point.
(UNSPEC_IEEE128_MOVE): Likewise.
(UNSPEC_IEEE128_CONVERT): Likewise.
(FMA_F): Add support for IEEE 128-bit floating point hardware
support.
(Ff): Add support for DImode.
(Fv): Likewise.
(any_fix code iterator): New and updated iterators for IEEE
128-bit floating point hardware support.
(any_float code iterator): Likewise.
(s code attribute): Likewise.
(su code attribute): Likewise.
(az code attribute): Likewise.
(uns code attribute): Likewise.
(neg2, FLOAT128 iterator): Add support for IEEE 128-bit
floating point hardware support.
(abs2, FLOAT128 iterator): Likewise.
(add3, IEEE128 iterator): New insns for IEEE 128-bit
floating point hardware.
(sub3, IEEE128 iterator): Likewise.
(mul3, IEEE128 iterator): Likewise.
(div3, IEEE128 iterator): Likewise.
(copysign3, IEEE128 iterator): Likewise.
(sqrt2, IEEE128 iterator): Likewise.
(neg2, IEEE128 iterator): Likewise.
(abs2, IEEE128 iterator): Likewise.
(nabs2, IEEE128 iterator): Likewise.
(fma4_hw, IEEE128 iterator): Likewise.
(fms4_hw, IEEE128 iterator): Likewise.
(nfma4_hw, IEEE128 iterator): Likewise.
(nfms4_hw, IEEE128 iterator): Likewise.
(extend2_hw): Likewise.
(truncdf2_hw, IEEE128 iterator): Likewise.
(truncsf2_hw, IEEE128 iterator): Likewise.
(fix_fixuns code attribute): Likewise.
(float_floatuns code attribute): Likewise.
(fix_si2_hw): Likewise.
(fix_di2_hw): Likewise.
(float_si2_hw): Likewise.
(float_di2_hw): Likewise.
(xscvqpwz_): Likewise.
(xscvqpdz_): Likewise.
(xscvdqp_GPR direct move helpers if we have
the MFVSRLD and MTVSRDD instructions.
(rs6000_secondary_reload_simple_move): Add support for doing
vector direct moves directly without additional scratch registers
if we have ISA 3.0 instructions.
(rs6000_secondary_reload_direct_move): Update comments.
(rs6000_output_move_128bit): Add support for ISA 3.0 vector
instructions.

* config/rs6000/vsx.md (vsx_mov): Add support for ISA 3.0
direct move instructions.
(vsx_movti_64bit): Likewise.
(vsx_extract_): Likewise.

* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
macros for ISA 3.0 direct move instructions.
(TARGET_DIRECT_MOVE_128): Likewise.
(TARGET_MADDLD): Add support for the ISA 3.0 integer multiply-add
instruction.

* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
constraints.  Update wa documentation to say not to use %x on
instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-13  Michael Meissner  

* gcc.target/powerpc/float128-hw.c: New test for IEEE 128-bit
hardware floating point support.

* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
vector direct move instructions.

* gcc.target/powerpc/maddld.c: New test.
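
As a rough illustration of what the new ISA 3.0 support means for user code
(a sketch assuming a powerpc64le compiler with these patches and
-mcpu=power9 -mfloat128; the instruction and routine names in the comments
are indicative only):

  /* With hardware IEEE 128-bit support these compile to single
     instructions (e.g. xsaddqp and a truncating convert) instead of
     calls into soft-float support routines such as __addkf3.  */
  __float128
  qadd (__float128 a, __float128 b)
  {
    return a + b;
  }

  long long
  qfix (__float128 x)
  {
    return (long long) x;
  }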

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: 

  1   2   >