date:20160609

[PR71478] Fix ICE in tree-ssa-reassoc.c

2016-06-09 Thread kugan


Hi,

In PR71478, for vector negation of ssa produced by call stmt, we add 
vector (-1) and ssa to the ops list. However, in the place where we 
remove the (-1) from ops list, we failed to do this for vector integer. 
As a result, rewrite_expr_tree wrongly assumes that it is working with 
gimple_assign.


Attached patch fixes the place where we remove the vector (-1).

Regression tested on x86-64-linux-gnu with no new regressions. 
Regression testing on aarc64-linux-gnu is ongoing. Is this OK for trunk?


Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-06-10  Kugan Vivekanandarajah  

* gcc.dg/pr71478.c: New test.

gcc/ChangeLog:

2016-06-10  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (reassociate_bb): Remove (-1) from ops list for
vector integer type.
diff --git a/gcc/testsuite/gcc.dg/pr71478.c b/gcc/testsuite/gcc.dg/pr71478.c
index e69de29..7b9cce6 100644
--- a/gcc/testsuite/gcc.dg/pr71478.c
+++ b/gcc/testsuite/gcc.dg/pr71478.c
@@ -0,0 +1,21 @@
+/* PR target/71478 */
+/* { dg-require-effective-target vect_int } */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+typedef unsigned int __attribute__ ((vector_size (8))) uv2si;
+typedef int __attribute__ ((vector_size (8))) v2si;
+
+uv2si bar (v2si);
+
+uv2si
+foo (void)
+{
+  v2si x = (v2si) (0x7fff80008000UL);
+  v2si y = (v2si) (0x8f997fffUL);
+  uv2si z = x >= y;
+  uv2si k = bar (x);
+  uv2si j = k * __builtin_shuffle (z, z, (uv2si) {1, 3});
+  return k * j;
+}
+
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 36b34d3..e32d503 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5303,8 +5303,7 @@ reassociate_bb (basic_block bb)
  && rhs_code == MULT_EXPR)
{
  last = ops.last ();
- if (((TREE_CODE (last->op) == INTEGER_CST
-   && integer_minus_onep (last->op))
+ if ((integer_minus_onep (last->op)
   || real_minus_onep (last->op))
  && !HONOR_SNANS (TREE_TYPE (lhs))
  && (!HONOR_SIGNED_ZEROS (TREE_TYPE (lhs))

Re: [C++ RFC / Patch] Again about PR 70202

2016-06-09 Thread Paolo Carlini


... and this version passes testing. For real.

Paolo.


Index: cp/decl.c
===
--- cp/decl.c   (revision 237285)
+++ cp/decl.c   (working copy)
@@ -6002,6 +6002,13 @@ check_initializer (tree decl, tree init, int flags
   tree init_code = NULL;
   tree core_type;
 
+  if (init && init != error_mark_node
+  && TREE_TYPE (init) == error_mark_node)
+{
+  TREE_TYPE (decl) = error_mark_node;
+  return NULL_TREE;
+}
+
   /* Things that are going to be initialized need to have complete
  type.  */
   TREE_TYPE (decl) = type = complete_type (TREE_TYPE (decl));
Index: testsuite/g++.dg/inherit/crash6.C
===
--- testsuite/g++.dg/inherit/crash6.C   (revision 0)
+++ testsuite/g++.dg/inherit/crash6.C   (working copy)
@@ -0,0 +1,10 @@
+// PR c++/70202
+
+class A
+{
+  virtual void foo () { }
+};
+class B : public A, A { };  // { dg-error "duplicate base type" }
+
+B b1,  = b1;  // { dg-error "incomplete type" }
+A a = b2;

Go patch committed: fix error message quoting

2016-06-09 Thread Ian Lance Taylor

This patch to the Go frontend fixes the quoting in an error message.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 236804)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a87af72757d9a2e4479062a459a41d4540398005
+054ff1ece3dd5888a445efeaf3ae197b16d4186f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 236804)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -11463,7 +11463,7 @@ Selector_expression::lower_method_expres
 
   if (method != NULL && !is_pointer && !method->is_value_method())
 {
-  error_at(location, "method requires pointer (use %<(*%s).%s)%>",
+  error_at(location, "method requires pointer (use %<(*%s).%s%>)",
   nt->message_name().c_str(),
   Gogo::message_name(name).c_str());
   return Expression::make_error(location);

[wwwdocs] Follow up on usability improvements patch

2016-06-09 Thread aquaglow

I submitted a patch for the website a few weeks ago, but received no
response in the negative or positive. Wanted to check if it had been
overlooked.

Original patch can be found here:
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00767.html

Thanks,
Stuart

[PATCH, applied] Backport PowerPC ISA 3.0 splat changes to GCC 6.2

2016-06-09 Thread Michael Meissner

I have applied the following patches for ISA 3.0 (power9) that were installed
on the trunk in May to the GCC 6.2 branch after doing a bootstrap and
regression testing.  This patch combines the original patch, and the two
followup patches that fixed some bugs.  It does remove the all_ones_predicate
that had also been used by the ISA 3.0 min/max patch, and it was installed when
that patch was back ported.

[gcc]
2016-06-09  Michael Meissner  

Back port from trunk
2016-05-31  Michael Meissner  

* config/rs6000/vsx.md (vsx_splat_, V2DI/V2DF): Simplify
alternatives, eliminating preferred register class.  Add support
for the MTVSRDD instruction in ISA 3.0.
(vsx_splat_v4si_internal): Use splat_input_operand instead of
reg_or_indexed_operand.
(vsx_splat_v4sf_internal): Likewise.

Back port from trunk
2016-05-31  Michael Meissner  

PR target/71186
* config/rs6000/vsx.md (xxspltib__nosplit): Add alternatives
for loading up all 0's or all 1's.

Back port from trunk
2016-05-18  Michael Meissner  

PR target/70915
* config/rs6000/constraints.md (wE constraint): New constraint
for a vector constant that can be loaded with XXSPLTIB.
(wM constraint): New constraint for a vector constant of a 1's.
(wS constraint): New constraint for a vector constant that can be
loaded with XXSPLTIB and a vector sign extend instruction.
* config/rs6000/predicates.md (xxspltib_constant_split): New
predicates for wE/wS constraints.
(xxspltib_constant_nosplit): Likewise.
(easy_vector_constant): Add support for constants that can be
loaded via XXSPLTIB.
(splat_input_operand): Add support for ISA 3.0 word splat operations.
* config/rs6000/rs6000.c (xxspltib_constant_p): New function to
return if a constant can be loaded with the ISA 3.0 XXSPLTIB
instruction and possibly with a sign extension.
(output_vec_const_move): Add support for XXSPLTIB. If we are
loading up 0/-1 into Altivec registers, prefer using VSPLTISW
instead of XXLXOR/XXLORC.
(rs6000_expand_vector_init): Add support for ISA 3.0 word splat
operations.
(rs6000_legitimize_reload_address): Likewise.
(rs6000_output_move_128bit): Use output_vec_const_move to emit
constants.
* config/rs6000/vsx.md (VSX_M): Add TImode (if -mvsx-timode) and
combine VSX_M and VSX_M2 into one iterator.
(VSX_M2): Likewise.
(VSINT_84): New iterators for loading constants with XXSPLTIB.
(VSINT_842): Likewise.
(UNSPEC_VSX_SIGN_EXTEND): New UNSPEC.
(xxspltib_v16qi): New insns to load up constants with the ISA 3.0
XXSPLTIB instruction.
(xxspltib__nosplit): Likewise.
(xxspltib__split): New insn to load up constants with
XXSPLTIB and a sign extend instruction.
(vsx_mov): Replace single move that handled all vector types
with separate 32-bit and 64-bit moves.  Combine the movti_
moves (when -mvsx-timode is in effect) into the main vector
moves.  Eliminate separate moves for  , where the
preferred register class () is listed first, and the
secondary register class () is listed second with a '?' to
discourage use.  Prefer loading 0/-1 in any VSX register for ISA
3.0, and Altivec registers for ISA 2.06/2.07 (PR target/70915) so
that if the register was involved in a slow operation, the
clear/set operation does not wait for the slow operation to
finish.  Adjust the length attributes for 32-bit mode.  Use
rs6000_output_move_128bit and drop the use of the string
instructions for 32-bit movti when -mvsx-timode is in effect.  Use
spacing so that the alternatives and attributes don't generate
long lines, and put things in columns, so that it is easier to
match up the operands and attributes with the insn alternatives.
(vsx_mov_64bit): Likewise.
(vsx_mov_32bit): Likewise.
(vsx_movti_64bit): Fold movti into normal vector moves.
(vsx_movti_32bit): Likewise.
(vsx_splat_, V4SI/V4SF modes): Add support for ISA 3.0 word
splat instructions.
(vsx_splat_v4si_internal): Likewise.
(vsx_splat_v4sf_internal): Likewise.
(vector fusion peepholes): Use VSX_M instead of VSX_M2.
(vsx_sign_extend_qi_): New ISA 3.0 instructions to sign
extend vector elements.
(vsx_sign_extend_hi_): Likewise.
(vsx_sign_extend_si_v2di): Likewise.
* config/rs6000/rs6000-protos.h (xxspltib_constant_p): Add
declaration.
* doc/md.texi (PowerPC constraints): Document the wE, wM, and wS
constraints.  Add

Re: [C++ RFC / Patch] Again about PR 70202

2016-06-09 Thread Paolo Carlini


Hi,

On 09/06/2016 23:38, Jason Merrill wrote:

I would think that when we see a duplicated base we should just drop
the duplicates and continue.

If the type of b1 is error_mark_node, why isn't the type of b2 also
error_mark_node?
Thanks Jason. Normally check_initializer massages the decl basing on the 
init and so to speak transforms an error_mark_node as TREE_TYPE of the 
init into an error_mark_node as DECL_INITIAL of a reference. Thus we 
don't see error_mark_node as TREE_TYPE of b2. Now if I do something as 
trivial as the attached the testcase passes.


Next step, analyze the huge breakage caused by that change, I don't dare 
running the testsuite ;)


Thanks,
Paolo.

///


Index: decl.c
===
--- decl.c  (revision 237280)
+++ decl.c  (working copy)
@@ -6002,6 +6002,12 @@ check_initializer (tree decl, tree init, int flags
   tree init_code = NULL;
   tree core_type;
 
+  if (TREE_TYPE (init) == error_mark_node)
+{
+  TREE_TYPE (decl) = error_mark_node;
+  return NULL_TREE;
+}
+
   /* Things that are going to be initialized need to have complete
  type.  */
   TREE_TYPE (decl) = type = complete_type (TREE_TYPE (decl));

Re: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-09 Thread Jim Wilson

On Tue, May 31, 2016 at 2:56 AM, James Greenhalgh
 wrote:
> As you're proposing to have this on by default, I'd like to give a chance
> to hear whether there is consensus as to this being the right choice for
> the thunderx, xgene1, exynos-m1 and qdf24xx subtargets.

I tested this on qdf24xx using SPEC 2006.  I'm seeing a very small
performance decrease on the int benchmarks which may just be noise,
and an even smaller decrease on the FP side which is definitely in the
noise.  This is an out-of-order part with hardware register renaming,
so this optimization probably doesn't do much for this target.  I may
also be seeing secondary issues, e.g. the pipeline description isn't
as good as I would like yet.

Jim

Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc

2016-06-09 Thread David Edelsohn

Bruce,

The current AIX stdlib.h header file uses #define.  This is exactly
what I am trying to remove with the fixincludes patch.

#define malloc __linux_malloc

needs to be replaced with

extern void *malloc(size_t) __asm__("__linux_malloc");

and so forth.

Thanks, David

On Thu, Jun 9, 2016 at 5:18 PM, Bruce Korb  wrote:
> It ought to work:
>
> $ for f in m re c v ; do printf '#define %salloc __linux_%salloc\n' $f $f
>> done > foo.h
> $ grep -E '[ \t](m|re|c|v)alloc +__linux_\1alloc' foo.h
> #define malloc __linux_malloc
> #define realloc __linux_realloc
> #define calloc __linux_calloc
> #define valloc __linux_valloc
>
> and your "%1" is then "m" or "re" or "c" or "v".
> I can't test right now tho.
>
> On Thu, Jun 9, 2016 at 10:49 AM, David Edelsohn  wrote:
>>
>> Hi, Bruce!
>>
>> I thought about a regex, but the aliases require a full function
>> signature and the original, narrow context does not provide a function
>> signature.  If it was just alias XXXalloc as __linux_XXXalloc, it
>> would be more straight forward.  If there's a convenient way to add
>> the other information, I'm eager to learn.
>>
>> Thanks, David
>>
>> On Thu, Jun 9, 2016 at 1:44 PM, Bruce Korb  wrote:
>> > He's retired, but he ain't dead.
>> > I think these could be accomplished with a single fix.
>> > Please try a regex expression in the selection and utilize the selection
>> > in
>> > the replacement.
>> > I'll look at it when I have time (give me a few days)
>> >
>> > On Thu, Jun 9, 2016 at 10:29 AM, Jeff Law  wrote:
>> >>
>> >> On 06/09/2016 11:25 AM, David Edelsohn wrote:
>> >>>
>> >>> AIX has added variants of malloc, realloc, calloc and valloc with
>> >>> greater compatibility with Linux semantics, especially for NULL
>> >>> addresses.  The variants are declared in stdlib.h and use #define to
>> >>> override the normal definition if _LINUX_SOURCE_COMPAT is defined,
>> >>> e.g.,
>> >>>
>> >>> #define malloc __linux_malloc
>> >>> #define calloc __linux_calloc
>> >>> #define realloc __linux_realloc
>> >>> #define valloc __linux_valloc
>> >>>
>> >>> libstdc++-v3 cstdlib specifically undefines a number of stdlib.h
>> >>> macros,
>> >>> e.g.,
>> >>>
>> >>> // Get rid of those macros defined in  in lieu of real
>> >>> functions.
>> >>> ...
>> >>> #undef malloc
>> >>> #undef realloc
>> >>>
>> >>> C++ applications on AIX, especially users of BOOST that include
>> >>> cstdlib, encounter unexpected behavior when the definition of malloc
>> >>> changes from the expected / requested version.
>> >>>
>> >>> The following patch updates fixincludes to correct the AIX stdlib.h
>> >>> header by converting the #define to GCC asm aliases.  I created a
>> >>> separate fix for each definition because the order is not guaranteed.
>> >>>
>> >>> Bootstrapped on powerpc-ibm-aix7.1.0.0.  This fixes a recent node.js
>> >>> build failure on AIX due to additional dependence on BOOST.
>> >>>
>> >>> Okay for trunk, GCC 6 and GCC 5?
>> >>>
>> >>> Thanks, David
>> >>>
>> >>> * inclhack.def (aix_stdlib_malloc): New fix.
>> >>> (aix_stdlib_realloc): New fix.
>> >>> (aix_stdlib_calloc): New fix.
>> >>> (aix_stdlib_valloc): New fix.
>> >>> * fixincl.x: Regenerate.
>> >>> * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test.
>> >>> [AIX_STDLIB_REALLOC]: New test.
>> >>> [AIX_STDLIB_CALLOC]: New test.
>> >>> [AIX_STDLIB_VALLOC]: New test.
>> >>
>> >> Wow, fixincludesI'm not even sure if Bruce is around anymore...
>> >>
>> >> GIven these are conditional on mach= *-*-aix*, I think you can
>> >> self-approve them.
>> >>
>> >> jeff
>> >>
>> >
>
>

Re: [PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently

2016-06-09 Thread Jason Merrill

On Thu, Jun 9, 2016 at 3:39 AM, Richard Biener  wrote:
> On Thu, 9 Jun 2016, Jakub Jelinek wrote:
>
>> On Thu, Jun 09, 2016 at 08:50:15AM +0200, Richard Biener wrote:
>> > On Wed, 8 Jun 2016, Jason Merrill wrote:
>> >
>> > > On Wed, Jun 8, 2016 at 11:16 AM, Marc Glisse  
>> > > wrote:
>> > > > On Wed, 8 Jun 2016, Richard Biener wrote:
>> > > >
>> > > >> The following works around PR70992 but the issue came up repeatedly
>> > > >> that we are not very consistent in preserving the undefined behavior
>> > > >> of division or modulo by zero.  Ok - the only inconsistency is
>> > > >> that we fold 0 % x to 0 but not 0 % 0 (with literal zero).
>> > > >>
>> > > >> After folding is now no longer done early in the C family FEs the
>> > > >> number of diagnostic regressions with the patch below is two.
>> > > >>
>> > > >> FAIL: g++.dg/cpp1y/constexpr-sfinae.C  -std=c++14 (test for excess 
>> > > >> errors)
>> > >
>> > > Yep. We don't want to fold away undefined behavior in a constexpr
>> > > function, since constexpr evaluation wants to detect undefined
>> > > behavior and treat the expression as non-constant in that case.
>> >
>> > Hmm.  So 0 / x is not constant because x might be zero but 0 * x is
>> > constant because it can never invoke undefined behavior?  Does this mean
>> > that 0 << n is not const because n might be too large (I suppose
>> > 0 << 12000 is not const already)?  Is 0 * (-x) const?  x might be INT_MIN.
>>
>> E.g. for the shifts the C++ FE has cxx_eval_check_shift_p which should
>> optionally warn and/or set *non_constant_p.  0 * (-INT_MIN) would be
>> non-constant too, etc.
>> constexpr int foo (int x) { return -x; }
>> constexpr int bar (int x) { return 0 * (-x); }
>> constexpr int a = foo (-__INT_MAX__ - 1);
>> constexpr int b = bar (-__INT_MAX__ - 1);
>> shows that we don't diagnose the latter though, most likely because
>> constexpr evaluation is done on the folded tree.

I don't think there's any folding before constexpr evaluation in this
case, since this doesn't involve a call.

>> So, either whatever cp_fold does (which uses fold* underneath) should avoid
>> folding such cases, or the constexpr evaluation should be done on a copy
>> made before folding.  Then cp_fold doesn't have to prohibit optimizations
>> and it is a matter of constexpr.c routines to detect all the undefined
>> behavior.  After all, I think doing constexpr evaluation on unfolded trees
>> will have also the advantage of better diagnostic locations.
>
> Yes, I think constexpr diagnostic / detection should be done on
> unfolded trees (not sure why we'd need to a copy here, just do folding
> later?).  If cp_fold already avoids folding 0 * (-x) then it should
> simply avoid folding 0 / x or 0 % x as well (for the particular issue
> this thread started on).

The issue is with constexpr functions, which get delayed folding like
any other function before they are used in constant expression
evaluation.  The copy would be to preserve the unfolded trees through
cp_fold_function.  I've been thinking about doing this anyway; this
may be the motivation to go ahead with it.

Jason

Re: [PATCH] Add ggc-tests.c

2016-06-09 Thread Jeff Law


On 06/06/2016 03:31 PM, David Malcolm wrote:

Jeff approved an earlier version of this (as
unittests/test-ggc.c):
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03306.html

Not terribly happy with that counter to used to create a big list
to detect recursion. But I'm not offhand sure how to avoid without
exposing more of the ggc system that is wise.

OK if/when prereqs are approved. Minor twiddling if we end up
moving it elsewhere or standardizing/reducing header files is
pre-approved.


This version moves it to gcc/ggc-tests.c and ports it to the new
-fself-test approach.

For now, I also reduced the count within
  TEST_F (ggc_test, chain_next)
from 2 million to 10, to avoid slowing down the test (though the
former takes only about 0.5s on my box).

I've also fixed things so that it works with both checked and unchecked
builds.  Note that the GTY roots within ggc-tests.c are wrapped with
  #if CHECKING_P
which implies that PCH files would be incompatible between release vs
checked builds (I'm not sure whether or not that's already the case).
I wouldn't expect PCH files to be compatible across release vs checked 
builds.  It might work, but that'd likely be more accident than design 
(though I am trying to squash out cases where structures change their 
lengths based on things like ENABLE_CHECKING).





I've also added various new tests since Jeff's review, for:
  * GTY((length)),
  * unions, and
  * GTY((user))

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add ggc-tests.o.
(GTFILES): Add ggc-tests.c.
* ggc-tests.c: New file.



+
+static void
+test_chain_next ()
+{
+  /* 2 million nodes (and thus the same number of stack frames) ought
+ to be deep enough to crash if gengtype has created something
+ that recurses.
+
+ This length reliably causes the test to segfault without the
+ chain_next/prev optimization on this box (Fedora 20 x86_64 with 128GB
+ of RAM), but causes this test to take about 0.5s, dominating the time
+ taken by the overall testsuite.
+
+ We could perhaps lower this by not increasing the stack size so much
+ in toplev.c, or perhaps reducing the stack size when running this
+ testcase.  */
+  const int count = 10; // 200;
ISTM like you need to update the comment.  And the trailing // 200 
should be eliminated.


I'm just not sure how to really test for this properly.  I'd be tempted 
to remove it.


OK with the comment fixes for test_chain_next.

jeff

Minor tweak to df_note_bb_compute

2016-06-09 Thread Eric Botcazou

This simply prevents valgrind from complaining about an invalid read when 
pass_free_cfg::execute calls df_analyze for targets with delay slots, because 
var_location instructions are present in the RTL stream at this point.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2016-06-09  Eric Botcazou  

* df-problems.c (df_note_bb_compute): Guard use of DF_INSN_INFO_GET.

-- 
Eric BotcazouIndex: df-problems.c
===
--- df-problems.c	(revision 237165)
+++ df-problems.c	(working copy)
@@ -3498,13 +3498,13 @@ df_note_bb_compute (unsigned int bb_inde
 
   FOR_BB_INSNS_REVERSE (bb, insn)
 {
+  if (!INSN_P (insn))
+	continue;
+
   df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   df_mw_hardreg *mw;
   int debug_insn;
 
-  if (!INSN_P (insn))
-	continue;
-
   debug_insn = DEBUG_INSN_P (insn);
 
   bitmap_clear (do_not_gen);

Re: [PATCH] Fix PR 71407 : Use correct types for live usage of live operations

2016-06-09 Thread Jeff Law


On 06/09/2016 10:03 AM, Alan Hayward wrote:

This patch fixes PR 71407 by ensuring the BIT_FIELD_REF is created using
the
given vectype and then casted to the result type.

Tested on x86 and aarch64.
Note that the test vectorizes on x86 but does not vectorize on aarch64 or
power (due to a != statement failing to vectorize)

Ok to commit?



gcc/
PR tree-optimization/71407
PR tree-optimization/71416
* tree-vect-loop.c (vectorizable_live_operation): Use vectype for
BIT_FIELD_REF type.

Testsuite/
PR tree-optimization/71407
PR tree-optimization/71416
* gcc.dg/vect/pr71407.c: New
* gcc.dg/vect/pr71416-1.c: New
* gcc.dg/vect/pr71416-2.c: New

OK.
jeff

Re: [C++ RFC / Patch] Again about PR 70202

2016-06-09 Thread Jason Merrill

I would think that when we see a duplicated base we should just drop
the duplicates and continue.

If the type of b1 is error_mark_node, why isn't the type of b2 also
error_mark_node?

Jason


On Thu, Jun 9, 2016 at 1:01 PM, Paolo Carlini  wrote:
> Hi again,
>
> thus today I had to revert my first try at resolving this error recovery
> issue because it caused c++/71465. In retrospect it was a bit heavy handed
> anyway, the zeroing of the type served us well for many years...
>
> Today I tried to investigate the issue a bit more. I remind you that we are
> crashing on the gcc_unreachable () at the end of build_simple_base_path
> while, during error recovery, we are processing the initialization of the
> final 'a' in:
>
> class A
> {
>   virtual void foo () { }
> };
> class B : public A, A { };  // { dg-error "duplicate base type" }
>
> B b1,  = b1;
> A a = b2;
>
> Now the first comment I have about this, about which I'd like to have some
> feedback, is that the gcc_unreachable () at the end of
> build_simple_base_path without a return seems a bit weird to me, because the
> function is *supposed to return a tree*, not void. I guess some front-ends
> would even warn on this?!? Anyway, if we change:
>
>
> /* */
>
> gcc_unreachable ();
> }
>
> to:
>
> /* */
>
> gcc_assert (seen_error ());
> return error_mark_node;
> }
>
> we get something which, by and large is equally safe in terms of
> miscompilations and helps a lot error recovery, ie, we would not have
> c++/70202. It's also true however that we would not make much progress in
> trying to understand if we could solve the specific problem at issue in a
> more insightful way.
>
> Then my next observation is the following: if we change the testcase to:
>
> class A
> {
>   virtual void foo () { }
> };
> class B : public A, A { };  // { dg-error "duplicate base type" }
>
> B b1,  = b1;
> A a = b1;
>
> thus everything is the same but at the end we have 'b1', not 'b2', we also
> don't crash. We are saved not by something particularly smart ;) but by this
> check at the beginning of ocp_convert:
>
>   if (error_operand_p (e) || type == error_mark_node)
>  return error_mark_node;
>
> that is, the TREE_TYPE of 'e', the VAR_DECL for 'b1', is error_mark_node and
> we return early the error_mark_node, we don't reach the problematic
> build_base_path. With 'b2' we have a more complicated REFERENCE_REF which we
> don't catch, but otherwise everything is the same, we reach build_base_path
> and we crash on the gcc_unreachable. In practice, if I add at the beginning
> of ocp_convert:
>
> +  if (REFERENCE_REF_P (expr)
> +  && VAR_P (TREE_OPERAND (expr, 0))
> +  && DECL_INITIAL (TREE_OPERAND (expr, 0)) == error_mark_node)
> +return error_mark_node;
>
> it also passes testing. Then the final observation is that we also don't
> crash if the type of 'A' is simpler, thus we don't have the declaration of
>
> virtual void foo () { }
>
> Then I gather that when the type is "complex" enough we are not cleaning
> things up completely enough when we encounter it as a duplicated base. Maybe
> as optimal error recovery when we see a duplicated base we should clean up
> everything having to do with bases end up with something equivalent to an
> incomplete class B; or something like that?!? I don't know what we want to
> try short / medium /long term...
>
> Thanks!
> Paolo.
>
>

[PATCH] PR bootstrap/71481: fix input.c selftest

2016-06-09 Thread David Malcolm

input.c's selftest::test_reading_source_line attempted to read from
__FILE__, which doesn't work if the binary is run from a different
location than the build dir.

Fix it by rewriting the test to write out a tempfile, and read from
that, rather than from __FILE__.

I used make_temp_file to create the name for the temporary file, on
the grounds that that's what the driver uses for that purpose.

This is on top of the patch kit posted as:
  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00735.html

Successfully bootstrapped on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0

OK for trunk?

gcc/ChangeLog:
PR bootstrap/71481
* input.c (selftest::test_reading_source_line): Avoid reading from
__FILE__ by creating a tempfile with known content and reading
from that instead.
---
 gcc/input.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/input.c b/gcc/input.c
index 704ee75..08019c9 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1219,23 +1219,33 @@ test_builtins ()
 static void
 test_reading_source_line ()
 {
-  /* We will read *this* source file, using __FILE__.
- Here is some specific text to read and test for:
- The quick brown fox jumps over the lazy dog.  */
-  const int linenum_after_test_message = __LINE__;
-  const int linenum = linenum_after_test_message - 1;
-
+  /* Create a tempfile and write some text to it.  */
+  char *filename = make_temp_file (".txt");
+  ASSERT_NE (filename, NULL);
+  FILE *out = fopen (filename, "w");
+  if (!out)
+::selftest::fail_formatted (SELFTEST_LOCATION,
+   "unable to open tempfile: %s", filename);
+  fprintf (out,
+  "01234567890123456789\n"
+  "This is the test text\n"
+  "This is the 3rd line\n");
+  fclose (out);
+
+  /* Read back a specific line from the tempfile.  */
   int line_size;
-  const char *source_line = location_get_source_line (__FILE__, linenum, 
_size);
+  const char *source_line = location_get_source_line (filename, 2, _size);
   ASSERT_TRUE (source_line != NULL);
-  ASSERT_EQ (53, line_size);
-  if (!strncmp (" The quick brown fox jumps over the lazy dog.  */",
-  source_line, line_size))
+  ASSERT_EQ (21, line_size);
+  if (!strncmp ("This is the test text",
+   source_line, line_size))
 ::selftest::pass (SELFTEST_LOCATION,
  "source_line matched expected value");
   else
 ::selftest::fail (SELFTEST_LOCATION,
  "source_line did not match expected value");
+
+  free (filename);
 }
 
 /* Run all of the selftests within this file.  */
-- 
1.8.5.3

Re: [PING] [PATCH] c/71392 - SEGV calling integer overflow built-ins with a null pointer

2016-06-09 Thread Jeff Law


On 06/09/2016 12:34 PM, Martin Sebor wrote:

Attached is an updated version of the original patch described
below to annotate with the nonnull attribute the Built-In Functions
to Perform Arithmetic with Overflow Checking.

Since the machinery that's in place doesn't handle the attribute
on type-generic built-ins changes to the (handle_nonnull_attribute
function in Ada and LTO besides C were required so I CC the
maintainers for these areas.

Thanks
Martin

On 06/02/2016 05:22 PM, Martin Sebor wrote:

In a discussion of a patch in a this area (c/68120 and c++/70507)
Jakub noticed that the integer arithmetic built-ins with overflow
checking that expect a pointer to an integer as the last argument
silently (i.e., without a warning) accept a null pointer.  As the
test case in the bug referenced in in subject shows, such calls
then crash at runtime.

The attached patch follows the same approach used by other built
ins that take a pointer to an object (such as __built_strlen) to
issue a -Wnonnull warning for such invalid calls.

Martin



gcc-71392.diff


PR c/71392 - SEGV calling integer overflow built-ins with a null pointer

gcc/ChangeLog:

PR c/71392
* builtin-attrs.def (ATTR_NOTHROW_NONNULL_LEAF_LIST): New macro.
(ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF): Same.
* builtins.def (BUILT_IN_SADD_OVERFLOW, BUILT_IN_SADDL_OVERFLOW): Use
them.
(BUILT_IN_SADDLL_OVERFLOW, BUILT_IN_SSUB_OVERFLOW): Same.
(BUILT_IN_SSUBL_OVERFLOW, BUILT_IN_SSUBLL_OVERFLOW): Same.
(BUILT_IN_SMUL_OVERFLOW, BUILT_IN_SMULL_OVERFLOW): Same.
(BUILT_IN_SMULLL_OVERFLOW, BUILT_IN_UADD_OVERFLOW): Same.
(BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW): Same.
(BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW): Same.
(BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW): Same.
(BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW):

gcc/ada/ChangeLog:

PR c/71392
* gcc/ada/gcc-interface/utils.c (handle_nonnull_attribute): Accept
the nonnull attribute in type-generic builtins.

gcc/c-family/ChangeLog:

PR c/71392
* gcc/c-family/c-common.c (handle_nonnull_attribute): Accept
the nonnull attribute in type-generic builtins.

gcc/lto/ChangeLog:

PR c/71392
* gcc/lto/lto-lang.c (handle_nonnull_attribute): Accept the nonnull
attribute in type-generic builtins.

gcc/testsuite/ChangeLog:

PR c/71392
* c-c++-common/builtin-arith-overflow-1.c: Add test cases.

OK for the trunk.

THanks,
Jeff

Re: [PING] [PATCH] Fix ICE with x87 asm operands (PR inline-asm/68843)

2016-06-09 Thread Jeff Law


On 05/29/2016 08:37 AM, Bernd Edlinger wrote:




gcc:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* reg-stack.c (check_asm_stack_operands): Explicit input arguments
must be grouped on top of stack.  Don't force early clobber
on ordinary reg outputs.

testsuite:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* gcc.target/i386/pr68843-1.c: New test.
* gcc.target/i386/pr68843-2.c: New test.

OK for the trunk.

Sorry for the wait,
Jeff

Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-09 Thread Segher Boessenkool

On Thu, Jun 09, 2016 at 04:06:53PM -0400, Michael Meissner wrote:
> I'm including the global reviewers on the list.  I just want to be sure that
> there is no problem installing these patches on the GCC 6.2 branch.  While it
> is technically an enchancement, it is needed to be able to install the glibc
> support that is needed to complete the work to add IEEE 128-bit floating 
> point.
> 
> The issue being fixed is that when we are creating the complex type, we used 
> to
> do a lookup for the size, and that fails on the PowerPC which has 2 128-bit
> floating point types (__ibm128 and __float128, with long double currently
> defaulting to __ibm128).

The rs6000 parts are okay to backport to 6.

Thanks,


Segher

Re: [PATCH] Fixes to must-tail-call tests

2016-06-09 Thread Jeff Law


On 05/27/2016 11:44 AM, David Malcolm wrote:

On Fri, 2016-05-27 at 13:29 +0100, Thomas Preudhomme wrote:

Hi Rainer,

On Wednesday 25 May 2016 11:31:12 Rainer Orth wrote:

David Malcolm  writes:

The following fixes the known failures of the must-tail-call
tests.

Tested with --target=
* aarch64-unknown-linux-gnu
* ia64-unknown-linux-gnu
* m68k-unknown-linux-gnu
* x86_64-pc-linux-gnu


Even with this patch, there are still failures on sparc-sun
-solaris2.12:

FAIL: gcc.dg/plugin/must-tail-call-1.c
-fplugin=./must_tail_call_plugin.so
(test for excess errors)

Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail
-call-1.c:1
2:10: error: cannot tail-call: target is not able to optimize the
call into
a sibling call

FAIL: gcc.dg/plugin/must-tail-call-2.c
-fplugin=./must_tail_call_plugin.so
(test for excess errors)

Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail
-call-2.c:3
2:10: error: cannot tail-call: target is not able to optimize the
call into
a sibling call



My aim with these tests was to try to cover the various ways in which
mandatory tail-call optimization can fail.

However, this is very target-dependent, and, as written, the test over
-specifies the output.

Sorry about this.

I've run the test on all of the configurations in contrib/config
-list.mk (using the patch kit in
  https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02100.html )

Collated output can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2016-05-27/must-tail-call-logs.txt
showing all the different error messages across every configuration.

It's not clear to me what the best way forward here is.
We could simply check for
   error: cannot tail-call:
and leave the precise messages we're checking for unspecified.

If we care about checking for the precise messages, one of more
duplicate copy(s) of the test could be provided, filtering by target to
specify precise targets, giving more precise output, where this is
known.

I'm attaching a patch (sans ChangeLog) which strips the precise
messages in the manner described above.  Retesting on all targets with
this patch, the only remaining failures are of the form:
  must-tail-call-1.c:12:10: error: cannot tail-call: machine
description does not have a sibcall_epilogue instruction pattern
on targets lacking the pattern.

Thoughts?
I probably should have caught this -- there's all kinds of oddball 
reasons why we might not be able to optimize a tail call, some of which 
are target dependent.


I think trying to give highly precise messages is doomed to long term 
maintenance headaches because they're going to be target dependent.


So I'd go with just stripping the precise messages like you've done.

jeff

Re: Revert gcc r227962

2016-06-09 Thread Jeff Law


On 05/29/2016 08:33 AM, JonY wrote:

On 5/23/2016 16:56, JonY wrote:

On 5/20/2016 06:36, JonY wrote:

On 5/20/2016 02:11, Jeff Law wrote:

So if we make this change (revert 227962), my understanding is that
cygwin bootstraps will fail because they won't find kernel32 and perhaps
other libraries.

Jeff



I'll need to double check with trunk but gcc-5.3.0 built OK without it.
The other alternative is to search /usr/lib before w32api.




yep it reached stage 3 but failed from another error building target
libraries (libcilkrts), meaning it was able to find the w32api libraries
even with this patch reverted.



Has it been reverted? I managed to bootstrap after disabling the failing
libraries.

I've reverted the patch on the trunk.  THanks for your patience.

jeff>

Re: [PATCH] Remove old diagnostic macros.

2016-06-09 Thread Jeff Law


On 06/09/2016 12:35 PM, Marcin Baczyński wrote:

2016-06-09 18:35 GMT+02:00 Jeff Law :

On 05/30/2016 05:19 PM, Marcin Baczyński wrote:


Hi,
this is my first GCC patch, so please bear with me if something is wrong
with
it in an obvious way.

I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is?
Bootstrapped on x86_64-pc-linux-gnu.

2016-05-31 Marcin Baczyński 

   * gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer):
delete.


This is fine for the trunk.  Please install.



Thanks for the approval. As for installing, I don't have write access
to the repository, so could someone with the access do it, please?

OK.  I took care of committing to the repository.  Thanks,
jeff

Re: move increase_alignment from simple to regular ipa pass

2016-06-09 Thread Prathamesh Kulkarni

On 8 June 2016 at 20:38, Jan Hubicka  wrote:
>> I think it would be nice to work towards transitioning
>> flag_section_anchors to a flag on varpool nodes, thereby removing
>> the Optimization flag from common.opt:fsection-anchors
>>
>> That would simplify the walk over varpool candidates.
>
> Makes sense to me, too. There are more candidates for sutff that should be
> variable specific in common.opt (such as variable alignment, -fdata-sctions,
> -fmerge-constants) and targets.  We may try to do it in an easy to extend way
> so incrementally we can get rid of those global flags, too.
In this version I removed Optimization from fsection-anchors entry in
common.opt,
and gated the increase_alignment pass on flag_section_anchors != 0.
Cross tested on arm*-*-*, aarch64*-*-*.
Does it look OK ?
>
> One thing that needs to be done for LTO is sane merging, I guess in this case
> it is clear that the variable should be anchored when its previaling 
> definition
> is.
Um could we determine during WPA if symbol is a section anchor for merging ?
Seems to me SYMBOL_REF_ANCHOR_P is defined only on DECL_RTL and not at
tree level.
Do we have DECL_RTL info available during WPA ?

Thanks,
Prathamesh
>
> Honza
>>
>> Richard.
>>
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Honza
>> > >>
>> > >> Richard.
>> >
>> >
>>
>> --
>> Richard Biener 
>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
>> 21284 (AG Nuernberg)
diff --git a/gcc/common.opt b/gcc/common.opt
index f0d7196..f93f26c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2133,7 +2133,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) 
Optimization
 Enable the dependent count heuristic in the scheduler.
 
 fsection-anchors
-Common Report Var(flag_section_anchors) Optimization
+Common Report Var(flag_section_anchors)
 Access data in the same section from shared anchor points.
 
 fsee
diff --git a/gcc/passes.def b/gcc/passes.def
index 3647e90..3a8063c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -138,12 +138,12 @@ along with GCC; see the file COPYING3.  If not see
   PUSH_INSERT_PASSES_WITHIN (pass_ipa_tree_profile)
   NEXT_PASS (pass_feedback_split_functions);
   POP_INSERT_PASSES ()
-  NEXT_PASS (pass_ipa_increase_alignment);
   NEXT_PASS (pass_ipa_tm);
   NEXT_PASS (pass_ipa_lower_emutls);
   TERMINATE_PASS_LIST (all_small_ipa_passes)
 
   INSERT_PASSES_AFTER (all_regular_ipa_passes)
+  NEXT_PASS (pass_ipa_increase_alignment);
   NEXT_PASS (pass_ipa_whole_program_visibility);
   NEXT_PASS (pass_ipa_profile);
   NEXT_PASS (pass_ipa_icf);
diff --git a/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c 
b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c
new file mode 100644
index 000..74eaed8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target section_anchors } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 32
+
+/* Clone of section-anchors-vect-70.c with foo() having 
-fno-tree-loop-vectorize.  */ 
+
+static struct A {
+  int p1, p2;
+  int e[N];
+} a, b, c;
+
+__attribute__((optimize("-fno-tree-loop-vectorize")))
+int foo(void)
+{
+  for (int i = 0; i < N; i++)
+a.e[i] = b.e[i] + c.e[i];
+
+   return a.e[0];
+}
+
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
"increase_alignment" { target aarch64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
"increase_alignment" { target powerpc64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
"increase_alignment" { target arm*-*-* } } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 36299a6..d36aa1d 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -483,7 +483,7 @@ extern simple_ipa_opt_pass 
*make_pass_local_optimization_passes (gcc::context *c
 
 extern ipa_opt_pass_d *make_pass_ipa_whole_program_visibility (gcc::context
   *ctxt);
-extern simple_ipa_opt_pass *make_pass_ipa_increase_alignment (gcc::context
+extern ipa_opt_pass_d *make_pass_ipa_increase_alignment (gcc::context
  *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt);
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2669813..d34e560 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -899,6 +899,34 @@ get_vec_alignment_for_type (tree type)
   return (alignment > TYPE_ALIGN (type)) ? alignment : 0;
 }
 
+/* Return true if alignment should be increased for this vnode.
+   This is done if every function that references/referring to vnode
+   has flag_tree_loop_vectorize and flag_section_anchors set.  */
+
+static bool
+increase_alignment_p (varpool_node *vnode)
+{
+  ipa_ref *ref;
+

Re: move increase_alignment from simple to regular ipa pass

2016-06-09 Thread Jan Hubicka

> On 8 June 2016 at 20:38, Jan Hubicka  wrote:
> >> I think it would be nice to work towards transitioning
> >> flag_section_anchors to a flag on varpool nodes, thereby removing
> >> the Optimization flag from common.opt:fsection-anchors
> >>
> >> That would simplify the walk over varpool candidates.
> >
> > Makes sense to me, too. There are more candidates for sutff that should be
> > variable specific in common.opt (such as variable alignment, -fdata-sctions,
> > -fmerge-constants) and targets.  We may try to do it in an easy to extend 
> > way
> > so incrementally we can get rid of those global flags, too.
> In this version I removed Optimization from fsection-anchors entry in
> common.opt,
> and gated the increase_alignment pass on flag_section_anchors != 0.
> Cross tested on arm*-*-*, aarch64*-*-*.
> Does it look OK ?

If you go this way you will need to do something sane for LTO.  Here one can 
compile
some object files with -fsection-anchors and other without and link with random 
setting
(because in traditional compilation linktime flags does not matter).

For global flags we have magic in merge_and_complain that determines flags to 
pass
to the LTO compiler.
It is not very robust though.
> >
> > One thing that needs to be done for LTO is sane merging, I guess in this 
> > case
> > it is clear that the variable should be anchored when its previaling 
> > definition
> > is.
> Um could we determine during WPA if symbol is a section anchor for merging ?
> Seems to me SYMBOL_REF_ANCHOR_P is defined only on DECL_RTL and not at
> tree level.
> Do we have DECL_RTL info available during WPA ?

We don't have anchros computed, but we can decide whether we want to potentially
anchor the variable if we can.

I would say all you need is to have section_anchor flag in varpool node itself
which controls RTL production. At varpool_finalize_decl you will set it
according to flag_varpool and stream it to LTO objects. At WPA when doing
linking, the section_anchor flag of the previaling decl wins..

Honza
> 
> Thanks,
> Prathamesh
> >
> > Honza
> >>
> >> Richard.
> >>
> >> > Thanks,
> >> > Prathamesh
> >> > >
> >> > > Honza
> >> > >>
> >> > >> Richard.
> >> >
> >> >
> >>
> >> --
> >> Richard Biener 
> >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> >> 21284 (AG Nuernberg)

> diff --git a/gcc/common.opt b/gcc/common.opt
> index f0d7196..f93f26c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2133,7 +2133,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
> Init(1) Optimization
>  Enable the dependent count heuristic in the scheduler.
>  
>  fsection-anchors
> -Common Report Var(flag_section_anchors) Optimization
> +Common Report Var(flag_section_anchors)
>  Access data in the same section from shared anchor points.
>  
>  fsee
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 3647e90..3a8063c 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -138,12 +138,12 @@ along with GCC; see the file COPYING3.  If not see
>PUSH_INSERT_PASSES_WITHIN (pass_ipa_tree_profile)
>NEXT_PASS (pass_feedback_split_functions);
>POP_INSERT_PASSES ()
> -  NEXT_PASS (pass_ipa_increase_alignment);
>NEXT_PASS (pass_ipa_tm);
>NEXT_PASS (pass_ipa_lower_emutls);
>TERMINATE_PASS_LIST (all_small_ipa_passes)
>  
>INSERT_PASSES_AFTER (all_regular_ipa_passes)
> +  NEXT_PASS (pass_ipa_increase_alignment);
>NEXT_PASS (pass_ipa_whole_program_visibility);
>NEXT_PASS (pass_ipa_profile);
>NEXT_PASS (pass_ipa_icf);
> diff --git a/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c 
> b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c
> new file mode 100644
> index 000..74eaed8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-73.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target section_anchors } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#define N 32
> +
> +/* Clone of section-anchors-vect-70.c with foo() having 
> -fno-tree-loop-vectorize.  */ 
> +
> +static struct A {
> +  int p1, p2;
> +  int e[N];
> +} a, b, c;
> +
> +__attribute__((optimize("-fno-tree-loop-vectorize")))
> +int foo(void)
> +{
> +  for (int i = 0; i < N; i++)
> +a.e[i] = b.e[i] + c.e[i];
> +
> +   return a.e[0];
> +}
> +
> +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
> "increase_alignment" { target aarch64*-*-* } } } */
> +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
> "increase_alignment" { target powerpc64*-*-* } } } */
> +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 
> "increase_alignment" { target arm*-*-* } } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 36299a6..d36aa1d 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -483,7 +483,7 @@ extern simple_ipa_opt_pass 
> *make_pass_local_optimization_passes (gcc::context *c
>  
>  extern

Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-06-09 Thread Alexander Monakov

On Thu, 9 Jun 2016, Nathan Sidwell wrote:
> >  (define_expand "restore_stack_block"
> >[(match_operand 0 "register_operand" "")
> > (match_operand 1 "register_operand" "")]
> 
> you've not addressed my previous comments about this.

To be clear -- do you mean that "restore_stack_block" should have a comment
mentioning why "save_stack_block" is not overridden?

> > @@ -0,0 +1,25 @@
> 
> > +char *__nvptx_stacks[32] __attribute__((shared,nocommon));
> 
> Is there a reason this can't be in crt0?

It's also needed for offloading compilation, which doesn't link crt0.

> It should be 'void *'  Also, why '32' when only slot zero is initialized?
> ISTM that this should be size 1.

Offloading will need the full width.

Thanks.
Alexander

Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-09 Thread Michael Meissner

I'm including the global reviewers on the list.  I just want to be sure that
there is no problem installing these patches on the GCC 6.2 branch.  While it
is technically an enchancement, it is needed to be able to install the glibc
support that is needed to complete the work to add IEEE 128-bit floating point.

The issue being fixed is that when we are creating the complex type, we used to
do a lookup for the size, and that fails on the PowerPC which has 2 128-bit
floating point types (__ibm128 and __float128, with long double currently
defaulting to __ibm128).

On Fri, Jun 03, 2016 at 09:33:35AM -0400, Michael Meissner wrote:
> These patches were installed on the trunk on May 2nd, with a fix from Alan
> Modra on May 11th.  Unless I here objections in the next few days, I will
> commit these changes to the GCC 6.x branch.  These changes will allow people 
> to
> use complex __float128 types (via an attribute) on the PowerPC.
> 
> Note, we will need patches to libgcc to fully enable complex __float128 
> support
> on the PowerPC.  These patches enable the compiler support, so that the libgcc
> changes can be coded.
> 
> In addition to bootstrapping and regtesting on the PowerPC (little endian
> power8), I also bootstrapped and regested the changes on x86_64 running RHEL
> 6.2.  There were no regressions in either case.
> 
> [gcc]
> 2016-06-02  Michael Meissner  
> 
>   Back port from trunk
>   2016-05-11  Alan Modra  
> 
>   * config/rs6000/rs6000.c (is_complex_IBM_long_double,
>   abi_v4_pass_in_fpr): New functions.
>   (rs6000_function_arg_boundary): Exclude complex IBM long double
>   from 64-bit alignment when ABI_V4.
>   (rs6000_function_arg, rs6000_function_arg_advance_1,
>   rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.
> 
>   Back port from trunk
>   2016-05-02  Michael Meissner  
> 
>   * machmode.h (mode_complex): Add support to give the complex mode
>   for a given mode.
>   (GET_MODE_COMPLEX_MODE): Likewise.
>   * stor-layout.c (layout_type): For COMPLEX_TYPE, use the mode
>   stored by build_complex_type and gfc_build_complex_type instead of
>   trying to figure out the appropriate mode based on the size. Raise
>   an assertion error, if the type was not set.
>   * genmodes.c (struct mode_data): Add field for the complex type of
>   the given type.
>   (blank_mode): Likewise.
>   (make_complex_modes): Remember the complex mode created in the
>   base type.
>   (emit_mode_complex): Write out the mode_complex array to map a
>   type mode to the complex version.
>   (emit_insn_modes_c): Likewise.
>   * tree.c (build_complex_type): Set the complex type to use before
>   calling layout_type.
>   * config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Add
>   support for __float128 complex datatypes.
>   (rs6000_hard_regno_mode_ok): Likewise.
>   (rs6000_setup_reg_addr_masks): Likewise.
>   (rs6000_complex_function_value): Likewise.
>   * config/rs6000/rs6000.h (FLOAT128_IEEE_P): Likewise.
>   __float128 and __ibm128 complex.
>   (FLOAT128_IBM_P): Likewise.
>   (ALTIVEC_ARG_MAX_RETURN): Likewise.
>   * doc/extend.texi (Additional Floating Types): Document that
>   -mfloat128 must be used to enable __float128.  Document complex
>   __float128 and __ibm128 support.
> 
> [gcc/fortran]
> 2016-06-02  Michael Meissner  
> 
>   Back port from trunk
>   2016-05-02  Michael Meissner  
> 
>   * trans-types.c (gfc_build_complex_type):
> 
> [gcc/testsuite]
> 2016-06-02  Michael Meissner  
> 
>   Back port from trunk
>   2016-05-02  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-complex-1.c: New tests for complex
>   __float128.
>   * gcc.target/powerpc/float128-complex-2.c: Likewise.
> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

> Index: gcc/machmode.h
> ===
> --- gcc/machmode.h(revision 237045)
> +++ gcc/machmode.h(working copy)
> @@ -269,6 +269,10 @@ extern const unsigned char mode_wider[NU
>  extern const unsigned char mode_2xwider[NUM_MACHINE_MODES];
>  #define GET_MODE_2XWIDER_MODE(MODE) ((machine_mode) mode_2xwider[MODE])
> 
> +/* Get the complex mode from the component mode.  */
> +extern const unsigned char mode_complex[NUM_MACHINE_MODES];
> +#define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
> +
>  /* Return the mode for data of a given size SIZE and mode class CLASS.
> If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
> The value is BLKmode if no other mode is found.  */
>

Re: [PATCH, RS6000] Add RS6000_BTM_MODULO to set of RS6000_BTM_COMMON flags

2016-06-09 Thread Segher Boessenkool

[ Sorry I missed this ]

On Thu, Jun 02, 2016 at 02:51:37PM -0600, Kelvin Nilsen wrote:
> This patch adds the RS6000_BTM_MODULO flag to to the set of flags 
> associated with the RS6000_BTM_COMMON variable.
> 
> This patch has bootstrapped with the trunk and the gcc-6-branch on
> powerpc64le-unknown-linux-gnu and there were no regressions.
> 
> Is it ok to merge this with the trunk?  Can I merge with the 
> gcc-6-branch after waiting a few days following the trunk integration?

Okay, thank you!


Segher

Re: [PATCH 0/9] separate shrink-wrapping

2016-06-09 Thread Segher Boessenkool

On Thu, Jun 09, 2016 at 10:12:53AM -0600, Jeff Law wrote:
> I'm going to largely let Bernd own the review on this.  Just a few comments.
> 
> I certainly like the concept.  My mental model is that parts of the 
> prologue might sink further than other parts of the prologue.  It's not 
> an exact match, but I think it's a "close enough" mental model.

The "normal", "main" shrink-wrapping sinks the prologue, duplicating
blocks that can be run both with and without prologue.  It always
keeps one prologue, and any path through the function will execute the
prologue at most once.

Separate shrink-wrapping can put the prologue pieces in more than one
spot (each), and also execute them more than once.  But it does not
copy blocks, that can explode the code size too much if there are many
concerns.  The standard example:

  |
  1
 / \
2   3
 \ /
  4
 / \
5   6
 \ /
  7
  |

where 3 and 6 have some concern.  If 3 and 6 together have a lower
execution frequency than 1 does, it is better to put prologues and
epilogues around 3 and 6; otherwise, it is better to have a prologue
before 1 and an epilogue after 7.

> It looks like the "concerns" are all target defined and its the target's 
> responsibility to deal with dependencies between the "concerns".  Right? 

Exactly.

There are not many dependencies in general (except for the obvious
"setting up a stack frame has to come before almost everything else").

I have found no reasonably simple way yet to have the target inform the
generic code about the dependencies, and nothing uses it so far anyway.
So I just punt the responsibility it to the target code, for now anyway.

>  (BTW, I think "concerns" is a particularly poor name, perhaps 
> "reasons" would be better?).

I think that's even worse.  "pieces", maybe?

> Right now it looks like shrink_wrapped_separate essentially says "we did 
> something special here" -- while I don't think it's necessary for this 
> patch, describing what we did might be better.  Essentially different 
> paths would have a set of prologue/epilogue attributes that apply to 
> that path.

It is used by a few later passes to say "don't touch some stuff, you do
not have all the info you need".  This is analogous to the existing
"shrink_wrapped" flag.

> This would be useful for things like cfgcleanup where you could join 
> noreturn calls more aggressively.   Though this may not matter in any 
> significant way in practice.

Oh, it already tried to join such paths more aggressively.  Which then
blows up big time in the usual dwarf2cfi spot.

> There may be enough commonality that we can promote some "concerns" from 
> target dependent to target independent -- I've often wondered if we 
> could handle a fair amount of prologue/epilogue generation in target 
> independent ways.For simple targets, I wouldn't be terribly 
> surprised if all prologue/epilogue generation could be handled 
> generically and they'd get separate shrink-wrapping "for free".  But 
> that's all blue-sky.

I don't think that will work to well, unfortunately.

Segher

Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-06-09 Thread Nathan Sidwell


On 06/02/16 17:22, Alexander Monakov wrote:

On Wed, 25 May 2016, Nathan Sidwell wrote:

It seems like we should reject the combination of -msoft-stack -fopenacc?


Possibly; the doc text makes it explicit that the option is exposed only for
the purpose of testing the compiler, anyway.


It is always best to prevent the user doing something you don't recommend,
rather than presume they'll be sensible.


No change for that yet in the respin; do you want something like

  if (flag_openacc && TARGET_SOFT_STACK)
sorry ("-fopenacc and -msoft-stack are mutually exclusive");

in nvptx_override_options?


Yes, but it's not a sorry, just a regular error.



+static void
+init_softstack_frame (FILE *file, unsigned alignment, unsigned size)
+{
+  /* Maintain 64-bit stack alignment.  */


This still needs blank lines to aid readibility.


@@ -1027,6 +1186,21 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
   REGNO (cfun->machine->axis_predicate[1]), "x");
 }

+/* Output instruction that sets soft stack pointer in shared memory to the
+   value in register given by SRC_REGNO.  */
+
+const char *
+nvptx_output_set_softstack (unsigned src_regno)
+{
+  if (cfun->machine->has_softstack && !crtl->is_leaf)
+{
+  fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ",
+  POINTER_SIZE, reg_names[SOFTSTACK_SLOT_REGNUM]);
+  output_reg (asm_out_file, src_regno, VOIDmode);
+  fprintf (asm_out_file, ";\n");
+}
+  return "";


I think this would  be clearer by having the softstack insn's pattern mention 
both the SRC and softstack reg and then simply return either "" or 
"st.shared%t\\t%0, %1"  (and do that directly in the insn's emitter section 
rather than an fn call?)  Then also have an epilogue  expander emit the insn, 
rather than embed it in the return emitter.




--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md





 (define_expand "restore_stack_block"
   [(match_operand 0 "register_operand" "")
(match_operand 1 "register_operand" "")]


you've not addressed my previous comments about this.



@@ -0,0 +1,25 @@



+char *__nvptx_stacks[32] __attribute__((shared,nocommon));


Is there a reason this can't be in crt0?
It should be 'void *'  Also, why '32' when only slot zero is initialized?  ISTM 
that this should be size 1.

Re: [PATCH] Remove old diagnostic macros.

2016-06-09 Thread Marcin Baczyński

2016-06-09 18:35 GMT+02:00 Jeff Law :
> On 05/30/2016 05:19 PM, Marcin Baczyński wrote:
>>
>> Hi,
>> this is my first GCC patch, so please bear with me if something is wrong
>> with
>> it in an obvious way.
>>
>> I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is?
>> Bootstrapped on x86_64-pc-linux-gnu.
>>
>> 2016-05-31 Marcin Baczyński 
>>
>>* gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer):
>> delete.
>
> This is fine for the trunk.  Please install.
>

Thanks for the approval. As for installing, I don't have write access
to the repository, so could someone with the access do it, please?

> THanks,
> jeff

[PING] [PATCH] c/71392 - SEGV calling integer overflow built-ins with a null pointer

2016-06-09 Thread Martin Sebor


Attached is an updated version of the original patch described
below to annotate with the nonnull attribute the Built-In Functions
to Perform Arithmetic with Overflow Checking.

Since the machinery that's in place doesn't handle the attribute
on type-generic built-ins changes to the (handle_nonnull_attribute
function in Ada and LTO besides C were required so I CC the
maintainers for these areas.

Thanks
Martin

On 06/02/2016 05:22 PM, Martin Sebor wrote:

In a discussion of a patch in a this area (c/68120 and c++/70507)
Jakub noticed that the integer arithmetic built-ins with overflow
checking that expect a pointer to an integer as the last argument
silently (i.e., without a warning) accept a null pointer.  As the
test case in the bug referenced in in subject shows, such calls
then crash at runtime.

The attached patch follows the same approach used by other built
ins that take a pointer to an object (such as __built_strlen) to
issue a -Wnonnull warning for such invalid calls.

Martin


PR c/71392 - SEGV calling integer overflow built-ins with a null pointer

gcc/ChangeLog:

	PR c/71392
	* builtin-attrs.def (ATTR_NOTHROW_NONNULL_LEAF_LIST): New macro.
	(ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF): Same.
	* builtins.def (BUILT_IN_SADD_OVERFLOW, BUILT_IN_SADDL_OVERFLOW): Use
	them.
	(BUILT_IN_SADDLL_OVERFLOW, BUILT_IN_SSUB_OVERFLOW): Same.
	(BUILT_IN_SSUBL_OVERFLOW, BUILT_IN_SSUBLL_OVERFLOW): Same.
	(BUILT_IN_SMUL_OVERFLOW, BUILT_IN_SMULL_OVERFLOW): Same.
	(BUILT_IN_SMULLL_OVERFLOW, BUILT_IN_UADD_OVERFLOW): Same.
	(BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW): Same.
	(BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW): Same.
	(BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW): Same.
	(BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW):

gcc/ada/ChangeLog:

	PR c/71392
	* gcc/ada/gcc-interface/utils.c (handle_nonnull_attribute): Accept
	the nonnull attribute in type-generic builtins.

gcc/c-family/ChangeLog:

	PR c/71392
	* gcc/c-family/c-common.c (handle_nonnull_attribute): Accept
	the nonnull attribute in type-generic builtins.

gcc/lto/ChangeLog:

	PR c/71392
	* gcc/lto/lto-lang.c (handle_nonnull_attribute): Accept the nonnull
	attribute in type-generic builtins.

gcc/testsuite/ChangeLog:

	PR c/71392
	* c-c++-common/builtin-arith-overflow-1.c: Add test cases.

Index: gcc/ada/gcc-interface/utils.c
===
--- gcc/ada/gcc-interface/utils.c	(revision 237267)
+++ gcc/ada/gcc-interface/utils.c	(working copy)
@@ -5833,10 +5833,14 @@
 
   /* If no arguments are specified, all pointer arguments should be
  non-null.  Verify a full prototype is given so that the arguments
- will have the correct types when we actually check them later.  */
+ will have the correct types when we actually check them later.
+ Avoid diagnosing type-generic built-ins since those have no
+ prototype.  */
   if (!args)
 {
-  if (!prototype_p (type))
+  if (!prototype_p (type)
+	  && TYPE_ATTRIBUTES (type)
+	  && !lookup_attribute ("type generic", TYPE_ATTRIBUTES (type)))
 	{
 	  error ("nonnull attribute without arguments on a non-prototype");
 	  *no_add_attrs = true;
Index: gcc/builtin-attrs.def
===
--- gcc/builtin-attrs.def	(revision 237267)
+++ gcc/builtin-attrs.def	(working copy)
@@ -165,6 +165,7 @@
 /* Nothrow leaf functions whose pointer parameter(s) are all nonnull.  */
 DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_LEAF, ATTR_NONNULL, ATTR_NULL, \
 			ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_LEAF_LIST, ATTR_LEAF, ATTR_NULL, ATTR_NOTHROW_NONNULL_LEAF)
 /* Nothrow functions whose first parameter is a nonnull pointer.  */
 DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_1, ATTR_NONNULL, ATTR_LIST_1, \
 			ATTR_NOTHROW_LIST)
@@ -183,6 +184,10 @@
 /* Nothrow leaf functions which are type-generic.  */
 DEF_ATTR_TREE_LIST (ATTR_NOTHROW_TYPEGENERIC_LEAF, ATTR_TYPEGENERIC, ATTR_NULL, \
 			ATTR_NOTHROW_LEAF_LIST)
+/* Nothrow nonnull leaf functions that are type-generic.  */
+DEF_ATTR_TREE_LIST (ATTR_NOTHROW_NONNULL_TYPEGENERIC_LEAF,
+		ATTR_TYPEGENERIC, ATTR_NULL,
+		ATTR_NOTHROW_NONNULL_LEAF)
 /* Nothrow const functions whose pointer parameter(s) are all nonnull.  */
 DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \
 			ATTR_NOTHROW_NONNULL)
Index: gcc/builtins.def
===
--- gcc/builtins.def	(revision 237267)
+++ gcc/builtins.def	(working copy)
@@ -707,31 +707,31 @@
 DEF_C94_BUILTIN(BUILT_IN_TOWUPPER, "towupper", BT_FN_WINT_WINT, ATTR_PURE_NOTHROW_LEAF_LIST)
 
 /* Category: integer overflow checking builtins.  */
-DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW, "add_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF)
-DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW, "sub_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF)
-DEF_GCC_BUILTIN

Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN

2016-06-09 Thread Trevor Saunders

On Thu, Jun 09, 2016 at 04:25:32PM +, Joseph Myers wrote:
> On Thu, 9 Jun 2016, Jeff Law wrote:
> 
> > On 05/26/2016 09:04 AM, David Malcolm wrote:
> > > When using config-list.mk to build all configurations, it's useful
> > > to filter out the configurations that are known to be broken.
> > > 
> > > This patch does so, adding a KNOWN_BROKEN variable.
> > > 
> > > contrib/ChangeLog:
> > >   * config-list.mk (LIST): Rename to...
> > >   (FULL_LIST): ...this.
> > >   (KNOWN_BROKEN): New variable.
> > >   (LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN.
> > I'd rather fix, deprecate and/or remove these kinds of targets.
> > 
> > Given that interix has been broken for a long time, I'll approve a patch 
> > that
> > removes it.
> 
> The rule was meant to be that a target is deprecated in one release, then 
> removed in the next - the error message given without --enable-obsolete 
> says so.  So all the targets that require --enable-obsolete in GCC 6 
> should be removed for GCC 7 in the absence of interest in reviving them.

yeah, I have patches to do that (including interix which was obsoleted),
but moving and a hard drive failure got in the way of sending them.
I'll try and get those out shortly.

Trev

[PATCH 3/3] pretty-print.c: skip color selftests if GCC_COLORS is set

2016-06-09 Thread David Malcolm

gcc/ChangeLog:
* pretty-print.c (assert_pp_format_colored): Skip the test if
GCC_COLORS is set.
(test_pp_format): Remove comment about GCC_COLORS.
---
 gcc/pretty-print.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 86ae3a5..3713953 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -1266,6 +1266,12 @@ static void
 assert_pp_format_colored (const location , const char *expected,
  const char *fmt, ...)
 {
+  /* The tests of colorization assume that the default color scheme.
+ If GCC_COLORS is set, then the colors have potentially been
+ overridden; skip the test.  */
+  if (getenv ("GCC_COLORS"))
+return;
+
   va_list ap;
 
   va_start (ap, fmt);
@@ -1347,7 +1353,6 @@ test_pp_format ()
   ASSERT_PP_FORMAT_2 ("normal colored normal 12345678",
  "normal %rcolored%R normal %x",
  "error", 0x12345678);
-  /* The following assumes an empty value for GCC_COLORS.  */
   assert_pp_format_colored
 (SELFTEST_LOCATION,
  "normal \33[01;31m\33[Kcolored\33[m\33[K normal 12345678",
-- 
1.8.5.3

[PATCH 2/3] selftests: improve reported failure locations

2016-06-09 Thread David Malcolm

This patch introduce a selftest::location struct to wrap up __FILE__
and __LINE__ information (and __FUNCTION__) throughout the selftests,
allowing location information to be passed around.

It updates the helper functions in pretty-print.c to pass through
the precise location of each test, so that if a failure occurs, the
correct line number is printed, rather than a line within a helper
function.

gcc/ChangeLog:
* input.c (test_reading_source_line): Use SELFTEST_LOCATION.
* pretty-print.c (assert_pp_format_va): Add location param and use
it with ASSERT_STREQ_AT.
(assert_pp_format): Add location param and pass it to
assert_pp_format_va.
(assert_pp_format_colored): Likewise.
(ASSERT_PP_FORMAT_1): New.
(ASSERT_PP_FORMAT_2): New.
(ASSERT_PP_FORMAT_3): New.
(test_pp_format): Provide SELFTEST_LOCATION throughout, either
explicitly, or implicitly via the above macros.
* selftest.c (selftest::pass): Use a selftest::location rather
than file and line.
(selftest::fail): Likewise.  Print the function name.
(selftest::fail_formatted): Likewise.
(selftest::assert_streq): Use a selftest::location rather than
file and line.
* selftest.h (selftest::location): New struct.
(SELFTEST_LOCATION): New macro.
(selftest::pass): Accept a const location & rather than file
and line.
(selftest::fail): Likewise.
(selftest::fail_formatted): Likewise.
(selftest::assert_streq): Likewise.
(ASSERT_TRUE): Update for above changes, using SELFTEST_LOCATION.
(ASSERT_FALSE): Likewise.
(ASSERT_EQ): Likewise.
(ASSERT_NE): Likewise.
(ASSERT_STREQ): Likewise.
(ASSERT_PRED1): Likewise.
(ASSERT_STREQ_AT): New macro.
---
 gcc/input.c|   4 +-
 gcc/pretty-print.c | 128 -
 gcc/selftest.c |  20 -
 gcc/selftest.h |  60 ++---
 4 files changed, 134 insertions(+), 78 deletions(-)

diff --git a/gcc/input.c b/gcc/input.c
index 0b340a8..704ee75 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1231,10 +1231,10 @@ test_reading_source_line ()
   ASSERT_EQ (53, line_size);
   if (!strncmp (" The quick brown fox jumps over the lazy dog.  */",
   source_line, line_size))
-::selftest::pass (__FILE__, __LINE__,
+::selftest::pass (SELFTEST_LOCATION,
  "source_line matched expected value");
   else
-::selftest::fail (__FILE__, __LINE__,
+::selftest::fail (SELFTEST_LOCATION,
  "source_line did not match expected value");
 }
 
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 8febda0..86ae3a5 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -1227,8 +1227,8 @@ test_basic_printing ()
prints EXPECTED, assuming that pp_show_color is SHOW_COLOR.  */
 
 static void
-assert_pp_format_va (const char *expected, bool show_color, const char *fmt,
-va_list *ap)
+assert_pp_format_va (const location , const char *expected,
+bool show_color, const char *fmt, va_list *ap)
 {
   pretty_printer pp;
   text_info ti;
@@ -1243,34 +1243,59 @@ assert_pp_format_va (const char *expected, bool 
show_color, const char *fmt,
   pp_show_color () = show_color;
   pp_format (, );
   pp_output_formatted_text ();
-  ASSERT_STREQ (expected, pp_formatted_text ());
+  ASSERT_STREQ_AT (loc, expected, pp_formatted_text ());
 }
 
 /* Verify that pp_format (FMT, ...) followed by pp_output_formatted_text
prints EXPECTED, with show_color disabled.  */
 
 static void
-assert_pp_format (const char *expected, const char *fmt, ...)
+assert_pp_format (const location , const char *expected,
+ const char *fmt, ...)
 {
   va_list ap;
 
   va_start (ap, fmt);
-  assert_pp_format_va (expected, false, fmt, );
+  assert_pp_format_va (loc, expected, false, fmt, );
   va_end (ap);
 }
 
 /* As above, but with colorization enabled.  */
 
 static void
-assert_pp_format_colored (const char *expected, const char *fmt, ...)
+assert_pp_format_colored (const location , const char *expected,
+ const char *fmt, ...)
 {
   va_list ap;
 
   va_start (ap, fmt);
-  assert_pp_format_va (expected, true, fmt, );
+  assert_pp_format_va (loc, expected, true, fmt, );
   va_end (ap);
 }
 
+/* Helper function for calling testing pp_format,
+   by calling assert_pp_format with various numbers of arguments.
+   These exist mostly to avoid having to write SELFTEST_LOCATION
+   throughout test_pp_format.  */
+
+#define ASSERT_PP_FORMAT_1(EXPECTED, FMT, ARG1)  \
+  SELFTEST_BEGIN_STMT\
+assert_pp_format ((SELFTEST_LOCATION), (EXPECTED), (FMT), \
+ (ARG1));\
+  SELFTEST_END_STMT
+
+#define ASSERT_PP_FORMAT_2(EXPECTED, FMT,

[PATCH 1/3] selftest: show values when ASSERT_STREQ fails

2016-06-09 Thread David Malcolm

Rework ASSERT_STREQ so that it prints the actual and expected values
to stderr when it fails (by moving it to a helper function).

gcc/ChangeLog:
* selftest.c (selftest::fail_formatted): New function.
(selftest::assert_streq): New function.
* selftest.h (selftests::fail_formatted): New decl.
(selftest::assert_streq): New decl.
(ASSERT_STREQ): Reimplement in terms of selftest::assert_streq.
---
 gcc/selftest.c | 32 
 gcc/selftest.h | 24 +++-
 2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/gcc/selftest.c b/gcc/selftest.c
index de804df..e5332db 100644
--- a/gcc/selftest.c
+++ b/gcc/selftest.c
@@ -44,4 +44,36 @@ selftest::fail (const char *file, int line, const char *msg)
   abort ();
 }
 
+/* As "fail", but using printf-style formatted output.  */
+
+void
+selftest::fail_formatted (const char *file, int line, const char *fmt, ...)
+{
+  va_list ap;
+
+  fprintf (stderr, "%s:%i: FAIL: ", file, line);
+  /* TODO: add calling function name as well?  */
+  va_start (ap, fmt);
+  vfprintf (stderr, fmt, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+  abort ();
+}
+
+/* Implementation detail of ASSERT_STREQ.  */
+
+void
+selftest::assert_streq (const char *file, int line,
+   const char *desc_expected, const char *desc_actual,
+   const char *val_expected, const char *val_actual)
+{
+  if (0 == strcmp (val_expected, val_actual))
+::selftest::pass (file, line, "ASSERT_STREQ");
+  else
+::selftest::fail_formatted
+   (file, line, "ASSERT_STREQ (%s, %s) expected=\"%s\" actual=\"%s\"",
+desc_expected, desc_actual, val_expected, val_actual);
+}
+
+
 #endif /* #if CHECKING_P */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index d1f8acc..6759734 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -39,6 +39,17 @@ extern void pass (const char *file, int line, const char 
*msg);
 
 extern void fail (const char *file, int line, const char *msg);
 
+/* As "fail", but using printf-style formatted output.  */
+
+extern void fail_formatted (const char *file, int line, const char *fmt, ...)
+ ATTRIBUTE_PRINTF_3;
+
+/* Implementation detail of ASSERT_STREQ.  */
+
+extern void assert_streq (const char *file, int line,
+ const char *desc_expected, const char *desc_actual,
+ const char *val_expected, const char *val_actual);
+
 /* Declarations for specific families of tests (by source file), in
alphabetical order.  */
 extern void bitmap_c_tests ();
@@ -123,15 +134,10 @@ extern int num_passes;
::selftest::pass if they are equal,
::selftest::fail if they are non-equal.  */
 
-#define ASSERT_STREQ(EXPECTED, ACTUAL)\
-  SELFTEST_BEGIN_STMT \
-  const char *desc = "ASSERT_STREQ (" #EXPECTED ", " #ACTUAL ")"; \
-  const char *expected_ = (EXPECTED);\
-  const char *actual_ = (ACTUAL);\
-  if (0 == strcmp (expected_, actual_))  \
-::selftest::pass (__FILE__, __LINE__, desc);  \
-  else\
-::selftest::fail (__FILE__, __LINE__, desc);  \
+#define ASSERT_STREQ(EXPECTED, ACTUAL) \
+  SELFTEST_BEGIN_STMT  \
+  ::selftest::assert_streq (__FILE__, __LINE__, #EXPECTED, #ACTUAL, \
+   (EXPECTED), (ACTUAL));  \
   SELFTEST_END_STMT
 
 /* Evaluate PRED1 (VAL1), calling ::selftest::pass if it is true,
-- 
1.8.5.3

[PATCH 0/3] selftest improvements

2016-06-09 Thread David Malcolm

PR bootstrap/71471 highlighted the need for selftests to provide more
information when they fail.  The report contained:
  src/gcc/pretty-print.c:1246: FAIL: ASSERT_STREQ (expected, pp_formatted_text 
())

This showed a string-equality failure within the helper function
assert_pp_format_va, but didn't show which actual comparison was
failing.

Patch 1 updates ASSERT_STREQ so that the output includes the actual
and expected strings, which would have allowed us to identify the
failing test by searching the source.

Patch 2 introduces a way to pass location information to helper
functions so that the file/line that's printed for a failure points to
the specific case, rather than to the inside of the helper function,
and uses it within pretty-print.c's selftests.  This would have
allowed us to directly identify the failing test.

Patch 3 fixes a bug that turned out not to be the cause of the PR.

I've tested the combination of these patches:
  * successful bootstrap on x86_64-pc-linux-gnu
  * successful build/-fselftest of stage 1 on powerpc-ibm-aix7.1.3.0

OK for trunk?

David Malcolm (3):
  selftest: show values when ASSERT_STREQ fails
  selftests: improve reported failure locations
  pretty-print.c: skip color selftests if GCC_COLORS is set

 gcc/input.c|   4 +-
 gcc/pretty-print.c | 135 +
 gcc/selftest.c |  40 ++--
 gcc/selftest.h |  76 +-
 4 files changed, 177 insertions(+), 78 deletions(-)

-- 
1.8.5.3

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread David Edelsohn

On Thu, Jun 9, 2016 at 1:53 PM, David Malcolm  wrote:
> On Thu, 2016-06-09 at 11:22 -0600, Jeff Law wrote:
>> On 06/09/2016 07:30 AM, David Edelsohn wrote:
>> >
>> > The self-tests specifically abort the build and break bootstrap
>> > upon
>> > failure.  Most other changes that inadvertently have bugs or tickle
>> > a
>> > latent issue in a target will introduce some additional testsuite
>> > failures, not a bootstrap failure.  x86 developers seem to get
>> > quite
>> > annoyed when a patch causes a bootstrap failure for an x86
>> > configuration.
>> >
>> > Second, all of the large changes that may have unknown effects on
>> > various targets have been tested extensively on multiple
>> > architectures, as have most global optimization changes.  It may
>> > not
>> > be required, but it generally has been considered "good form" and
>> > has
>> > been a stipulation of patch approval by some reviewers.  It would
>> > be
>> > very unfortunate for GCC to lower the bar for patches by some
>> > developers and not others.
>> Let's all calm down a bit here.  Everyone here just wants to make a
>> better compiler and mistakes happen.
>
>> What I see in David Malcolm's change is a fairly minor bug.  I don't
>> think David (or anyone) could have really expected that %p is printed
>> differently across different hosts and thus his patch would need
>> wider
>> host testing.  And AFAICT David addressed this issue as soon as he
>> started his day.
>>
>> So let's all take a deep breath and get back to improving GCC rather
>> than taking jabs at each other.
>>
>
> Sorry about the breakage.  I've committed a fix as r237271, which I've
> tested on PPC AIX (and on x86_64 linux).
>
> The selftest code is very new.  I tested both it and the pretty-print.c
> tests for every known-good *target* in config-list.mk; the issue here
> was a *host*-specific issue.
>
> Maybe the current "fail the build on any selftest failures" is too
> aggressive.  That said, note that if one knows which file the failing
> test is in (which we did), it's trivial to disable the tests in that
> file by hacking gcc/selftests-run-tests.c and commenting out/deleting
> the call:
>
> diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
> index 934e700..1c8128b 100644
> --- a/gcc/selftest-run-tests.c
> +++ b/gcc/selftest-run-tests.c
> @@ -46,7 +46,7 @@ selftest::run_tests ()
>hash_map_tests_c_tests ();
>hash_set_tests_c_tests ();
>vec_c_tests ();
> -  pretty_print_c_tests ();
> +  //pretty_print_c_tests ();
>wide_int_cc_tests ();
>
>
> whilst the underlying failure is investigated, so adding a new selftest
> is presumably not as risky an event as, say, changing an optimizer: the
> change is localized and can be readily disabled if it turns out to have
> a config-specific assumption.
>
> The selftests currently in trunk aren't the most exciting; I'm much
> more interested in the ggc-tests.c patch (awaiting review), since this
> would finally give us self-testing of gengtype and ggc, which AFAIK we
> haven't been able to test directly before.  I hate gengtype, and it's
> been a goal of mine to try to tame it since I started working on gcc.
> (FWIW I've successfully tested the ggc patch on AIX PPC now, for stage
> 1 at least, and for all targets in config-list.mk).
>
> Sorry again about the breakage.

Thanks for fixing this so quickly.

Maybe we need to consider some sort of "warn on failure" beta testing
period for new self-tests before they cause errors.  If self-tests can
trigger host-dependent behavior and cause bootstrap failure as a
consequence, we need to think about how this interacts with other GCC
development policies.

Thanks, David

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread David Malcolm

On Thu, 2016-06-09 at 11:22 -0600, Jeff Law wrote:
> On 06/09/2016 07:30 AM, David Edelsohn wrote:
> > 
> > The self-tests specifically abort the build and break bootstrap
> > upon
> > failure.  Most other changes that inadvertently have bugs or tickle
> > a
> > latent issue in a target will introduce some additional testsuite
> > failures, not a bootstrap failure.  x86 developers seem to get
> > quite
> > annoyed when a patch causes a bootstrap failure for an x86
> > configuration.
> > 
> > Second, all of the large changes that may have unknown effects on
> > various targets have been tested extensively on multiple
> > architectures, as have most global optimization changes.  It may
> > not
> > be required, but it generally has been considered "good form" and
> > has
> > been a stipulation of patch approval by some reviewers.  It would
> > be
> > very unfortunate for GCC to lower the bar for patches by some
> > developers and not others.
> Let's all calm down a bit here.  Everyone here just wants to make a 
> better compiler and mistakes happen.

> What I see in David Malcolm's change is a fairly minor bug.  I don't 
> think David (or anyone) could have really expected that %p is printed
> differently across different hosts and thus his patch would need
> wider 
> host testing.  And AFAICT David addressed this issue as soon as he 
> started his day.
> 
> So let's all take a deep breath and get back to improving GCC rather 
> than taking jabs at each other.
> 

Sorry about the breakage.  I've committed a fix as r237271, which I've
tested on PPC AIX (and on x86_64 linux).

The selftest code is very new.  I tested both it and the pretty-print.c
tests for every known-good *target* in config-list.mk; the issue here
was a *host*-specific issue.

Maybe the current "fail the build on any selftest failures" is too
aggressive.  That said, note that if one knows which file the failing
test is in (which we did), it's trivial to disable the tests in that
file by hacking gcc/selftests-run-tests.c and commenting out/deleting
the call:

diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 934e700..1c8128b 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -46,7 +46,7 @@ selftest::run_tests ()
   hash_map_tests_c_tests ();
   hash_set_tests_c_tests ();
   vec_c_tests ();
-  pretty_print_c_tests ();
+  //pretty_print_c_tests ();
   wide_int_cc_tests ();

whilst the underlying failure is investigated, so adding a new selftest
is presumably not as risky an event as, say, changing an optimizer: the
change is localized and can be readily disabled if it turns out to have
a config-specific assumption.

The selftests currently in trunk aren't the most exciting; I'm much
more interested in the ggc-tests.c patch (awaiting review), since this
would finally give us self-testing of gengtype and ggc, which AFAIK we
haven't been able to test directly before.  I hate gengtype, and it's
been a goal of mine to try to tame it since I started working on gcc.
(FWIW I've successfully tested the ggc patch on AIX PPC now, for stage
1 at least, and for all targets in config-list.mk).

Sorry again about the breakage.

Dave

Re: [PATCH] Fix tests for x86 interrupt for -fpic and -march=corei7 targets

2016-06-09 Thread Uros Bizjak

On Thu, Jun 9, 2016 at 4:01 PM, Koval, Julia  wrote:
> Hi,
>
> Here is a trivial patch, that fixes tests for these targets. Ok for trunk?
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/interrupt-12.c: Fix test for -fpic and corei7.
> * gcc.target/i386/interrupt-13.c: Likewise.
> * gcc.target/i386/interrupt-15.c: Likewise.
> * gcc.target/i386/interrupt-14.c: Fix test for -fpic.
> * gcc.target/i386/interrupt-24.c: Likewise.
> * gcc.target/i386/interrupt-3.c: Fix test for corei7.
> * gcc.target/i386/interrupt-9.c: Likewise.
> * gcc.target/i386/interrupt-redzone-2.c: Likewise.

OK.

Thanks,
Uros.

Re: Moving backwards/FSM threader into its own pass

2016-06-09 Thread Jeff Law


On 06/09/2016 05:18 AM, Martin Liška wrote:

Hello.

The patch caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71466
THanks.  I've got a few issues to address related to that change -- I've 
just been swamped with some personal stuff the last week.


I'm seriously considering reverting the change at this point, addressing 
the issues, then re-installing as I expect to stay pretty busy in the 
immediate future.


jeff

Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc

2016-06-09 Thread David Edelsohn

Hi, Bruce!

I thought about a regex, but the aliases require a full function
signature and the original, narrow context does not provide a function
signature.  If it was just alias XXXalloc as __linux_XXXalloc, it
would be more straight forward.  If there's a convenient way to add
the other information, I'm eager to learn.

Thanks, David

On Thu, Jun 9, 2016 at 1:44 PM, Bruce Korb  wrote:
> He's retired, but he ain't dead.
> I think these could be accomplished with a single fix.
> Please try a regex expression in the selection and utilize the selection in
> the replacement.
> I'll look at it when I have time (give me a few days)
>
> On Thu, Jun 9, 2016 at 10:29 AM, Jeff Law  wrote:
>>
>> On 06/09/2016 11:25 AM, David Edelsohn wrote:
>>>
>>> AIX has added variants of malloc, realloc, calloc and valloc with
>>> greater compatibility with Linux semantics, especially for NULL
>>> addresses.  The variants are declared in stdlib.h and use #define to
>>> override the normal definition if _LINUX_SOURCE_COMPAT is defined,
>>> e.g.,
>>>
>>> #define malloc __linux_malloc
>>> #define calloc __linux_calloc
>>> #define realloc __linux_realloc
>>> #define valloc __linux_valloc
>>>
>>> libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros,
>>> e.g.,
>>>
>>> // Get rid of those macros defined in  in lieu of real
>>> functions.
>>> ...
>>> #undef malloc
>>> #undef realloc
>>>
>>> C++ applications on AIX, especially users of BOOST that include
>>> cstdlib, encounter unexpected behavior when the definition of malloc
>>> changes from the expected / requested version.
>>>
>>> The following patch updates fixincludes to correct the AIX stdlib.h
>>> header by converting the #define to GCC asm aliases.  I created a
>>> separate fix for each definition because the order is not guaranteed.
>>>
>>> Bootstrapped on powerpc-ibm-aix7.1.0.0.  This fixes a recent node.js
>>> build failure on AIX due to additional dependence on BOOST.
>>>
>>> Okay for trunk, GCC 6 and GCC 5?
>>>
>>> Thanks, David
>>>
>>> * inclhack.def (aix_stdlib_malloc): New fix.
>>> (aix_stdlib_realloc): New fix.
>>> (aix_stdlib_calloc): New fix.
>>> (aix_stdlib_valloc): New fix.
>>> * fixincl.x: Regenerate.
>>> * test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test.
>>> [AIX_STDLIB_REALLOC]: New test.
>>> [AIX_STDLIB_CALLOC]: New test.
>>> [AIX_STDLIB_VALLOC]: New test.
>>
>> Wow, fixincludesI'm not even sure if Bruce is around anymore...
>>
>> GIven these are conditional on mach= *-*-aix*, I think you can
>> self-approve them.
>>
>> jeff
>>
>

Re: [PATCH,FIXINCLUDES] AIX stdlib.h #define malloc

2016-06-09 Thread Jeff Law


On 06/09/2016 11:25 AM, David Edelsohn wrote:

AIX has added variants of malloc, realloc, calloc and valloc with
greater compatibility with Linux semantics, especially for NULL
addresses.  The variants are declared in stdlib.h and use #define to
override the normal definition if _LINUX_SOURCE_COMPAT is defined,
e.g.,

#define malloc __linux_malloc
#define calloc __linux_calloc
#define realloc __linux_realloc
#define valloc __linux_valloc

libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros, e.g.,

// Get rid of those macros defined in  in lieu of real functions.
...
#undef malloc
#undef realloc

C++ applications on AIX, especially users of BOOST that include
cstdlib, encounter unexpected behavior when the definition of malloc
changes from the expected / requested version.

The following patch updates fixincludes to correct the AIX stdlib.h
header by converting the #define to GCC asm aliases.  I created a
separate fix for each definition because the order is not guaranteed.

Bootstrapped on powerpc-ibm-aix7.1.0.0.  This fixes a recent node.js
build failure on AIX due to additional dependence on BOOST.

Okay for trunk, GCC 6 and GCC 5?

Thanks, David

* inclhack.def (aix_stdlib_malloc): New fix.
(aix_stdlib_realloc): New fix.
(aix_stdlib_calloc): New fix.
(aix_stdlib_valloc): New fix.
* fixincl.x: Regenerate.
* test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test.
[AIX_STDLIB_REALLOC]: New test.
[AIX_STDLIB_CALLOC]: New test.
[AIX_STDLIB_VALLOC]: New test.

Wow, fixincludesI'm not even sure if Bruce is around anymore...

GIven these are conditional on mach= *-*-aix*, I think you can 
self-approve them.


jeff

[PATCH,FIXINCLUDES] AIX stdlib.h #define malloc

2016-06-09 Thread David Edelsohn

AIX has added variants of malloc, realloc, calloc and valloc with
greater compatibility with Linux semantics, especially for NULL
addresses.  The variants are declared in stdlib.h and use #define to
override the normal definition if _LINUX_SOURCE_COMPAT is defined,
e.g.,

#define malloc __linux_malloc
#define calloc __linux_calloc
#define realloc __linux_realloc
#define valloc __linux_valloc

libstdc++-v3 cstdlib specifically undefines a number of stdlib.h macros, e.g.,

// Get rid of those macros defined in  in lieu of real functions.
...
#undef malloc
#undef realloc

C++ applications on AIX, especially users of BOOST that include
cstdlib, encounter unexpected behavior when the definition of malloc
changes from the expected / requested version.

The following patch updates fixincludes to correct the AIX stdlib.h
header by converting the #define to GCC asm aliases.  I created a
separate fix for each definition because the order is not guaranteed.

Bootstrapped on powerpc-ibm-aix7.1.0.0.  This fixes a recent node.js
build failure on AIX due to additional dependence on BOOST.

Okay for trunk, GCC 6 and GCC 5?

Thanks, David

* inclhack.def (aix_stdlib_malloc): New fix.
(aix_stdlib_realloc): New fix.
(aix_stdlib_calloc): New fix.
(aix_stdlib_valloc): New fix.
* fixincl.x: Regenerate.
* test/base/stdlib.h [AIX_STDLIB_MALLOC]: New test.
[AIX_STDLIB_REALLOC]: New test.
[AIX_STDLIB_CALLOC]: New test.
[AIX_STDLIB_VALLOC]: New test.

Index: inclhack.def
===
--- inclhack.def(revision 237258)
+++ inclhack.def(working copy)
@@ -911,7 +911,49 @@
 test_text = "#ifdef __cplusplus\n}\n\n#ifdef ferror";
 };

+/*
+ * stdlib.h on AIX uses #define on malloc and friends.
+ */
+fix = {
+hackname  = aix_stdlib_malloc;
+mach  = "*-*-aix*";
+files = stdlib.h;
+select= "#define[ \t]+malloc[ \t]+__linux_malloc";
+c_fix = format;
+c_fix_arg = "extern void malloc(size_t) __asm__(\"__linux_malloc\");";
+test_text = "#define malloc __linux_malloc";
+};

+fix = {
+hackname  = aix_stdlib_realloc;
+mach  = "*-*-aix*";
+files = stdlib.h;
+select= "#define[ \t]+realloc[ \t]+__linux_realloc";
+c_fix = format;
+c_fix_arg = "extern void realloc(void *, size_t)
__asm__(\"__linux_realloc\");";
+test_text = "#define realloc __linux_realloc";
+};
+
+fix = {
+hackname  = aix_stdlib_calloc;
+mach  = "*-*-aix*";
+files = stdlib.h;
+select= "#define[ \t]+calloc[ \t]+__linux_calloc";
+c_fix = format;
+c_fix_arg = "extern void calloc(size_t, size_t)
__asm__(\"__linux_calloc\");";
+test_text = "#define calloc __linux_calloc";
+};
+
+fix = {
+hackname  = aix_stdlib_valloc;
+mach  = "*-*-aix*";
+files = stdlib.h;
+select= "#define[ \t]+valloc[ \t]+__linux_valloc";
+c_fix = format;
+c_fix_arg = "extern void valloc(size_t) __asm__(\"__linux_valloc\");";
+test_text = "#define valloc __linux_valloc";
+};
+
 /*
  * stdlib.h on AIX 4.3 declares strtof() with a non-const first argument.
  */

Re: [PATCH] PR bootstrap/71471: remove selftest for pp_format (%p)

2016-06-09 Thread Jeff Law


On 06/09/2016 10:45 AM, David Malcolm wrote:

I was confused by the comment to pp_format:

  /* The following format specifiers are recognized as being client independent:
 ...
 %p: pointer
 ... */

into thinking that %p is printed in a host-independent manner, when
"client independent" means in relation to different pretty-printers e.g the
C++ pretty printer, or normal tree one etc.

The following patch removes the overzealous test case and clarifies
the comment.

OK for trunk?

I'm working on a followup that makes selftest failures easier to
track down.

gcc/ChangeLog:
PR bootstrap/71471
* pretty-print.c (pp_indent): Specify that %p is printed in a
host-dependent manner.
(test_pp_format): Remove the test for %p.

OK.
jeff

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread Jeff Law


On 06/09/2016 07:30 AM, David Edelsohn wrote:


The self-tests specifically abort the build and break bootstrap upon
failure.  Most other changes that inadvertently have bugs or tickle a
latent issue in a target will introduce some additional testsuite
failures, not a bootstrap failure.  x86 developers seem to get quite
annoyed when a patch causes a bootstrap failure for an x86
configuration.

Second, all of the large changes that may have unknown effects on
various targets have been tested extensively on multiple
architectures, as have most global optimization changes.  It may not
be required, but it generally has been considered "good form" and has
been a stipulation of patch approval by some reviewers.  It would be
very unfortunate for GCC to lower the bar for patches by some
developers and not others.
Let's all calm down a bit here.  Everyone here just wants to make a 
better compiler and mistakes happen.


What I see in David Malcolm's change is a fairly minor bug.  I don't 
think David (or anyone) could have really expected that %p is printed 
differently across different hosts and thus his patch would need wider 
host testing.  And AFAICT David addressed this issue as soon as he 
started his day.


So let's all take a deep breath and get back to improving GCC rather 
than taking jabs at each other.


jeff

Re: Update probabilities in predict.def to match reality

2016-06-09 Thread Jan Hubicka

> On 06/07/2016 09:27 PM, Jan Hubicka wrote:
> > There are bugs in few predictors - goto predictor is dead because the FE 
> > code was dropped,
> > return predictor is bit random because CFG is optimized (it should probably 
> > be done in FE),
> > loop iv compare seems bogus and fortran fail alloc does not seem to work as 
> > intended.
> > I added FIXME and will addres them incrementally.
> 
> Hi.
> 
> I've investigated why 'loop iv compare heuristics' provides bogus values and 
> I've just created
> PR for that:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71474
> 
> And I would like to apply following patch, that adds 'XFAIL' test-case 
> described in the PR and
> I would distinguish scanning for 'loop iv compare' and 'guess loop iv 
> compare'.
> 
> Can I install the patch?
> Thanks,
> Martin
> 

> >From 6272402f76d4e6ff496d55e9a4fac7ee9a696e4e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 9 Jun 2016 16:53:32 +0200
> Subject: [PATCH] Make 'loop iv compare' heuristics scanning more precise in
>  test-suite
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-06-09  Martin Liska  
> 
>   * gcc.dg/predict-1.c: Distinguish between "loop iv compare"
>   and "guess loop iv compared" heuristics.
>   * gcc.dg/predict-2.c: Likewise.
>   * gcc.dg/predict-3.c: Likewise.
>   * gcc.dg/predict-4.c: Likewise.
>   * gcc.dg/predict-5.c: Likewise.
>   * gcc.dg/predict-6.c: Likewise.
>   * gfortran.dg/predict-1.f90: New test.

I think the usual strategy is to add testcases once PR is fixed (so failures
track regressions). So please just fix the predict-?.c templates and leave
predict-1.f90 to go with the patch (I see you arleady proposed one in 
the PR log.

OK for the predict-?.c changes.

Thanks,
Honza

Re: [C++ RFC / Patch] Again about PR 70202

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 07:01:47PM +0200, Paolo Carlini wrote:
> Now the first comment I have about this, about which I'd like to have some
> feedback, is that the gcc_unreachable () at the end of
> build_simple_base_path without a return seems a bit weird to me, because the
> function is *supposed to return a tree*, not void. I guess some front-ends
> would even warn on this?!? Anyway, if we change:
> 
> 
> /* */
> 
> gcc_unreachable ();
> }

I'll just comment on this.  gcc_unreachable () is a noreturn function,
so this is perfectly fine IMHO.

Jakub

[C++ RFC / Patch] Again about PR 70202

2016-06-09 Thread Paolo Carlini


Hi again,

thus today I had to revert my first try at resolving this error recovery 
issue because it caused c++/71465. In retrospect it was a bit heavy 
handed anyway, the zeroing of the type served us well for many years...


Today I tried to investigate the issue a bit more. I remind you that we 
are crashing on the gcc_unreachable () at the end of 
build_simple_base_path while, during error recovery, we are processing 
the initialization of the final 'a' in:


class A
{
  virtual void foo () { }
};
class B : public A, A { };  // { dg-error "duplicate base type" }

B b1,  = b1;
A a = b2;

Now the first comment I have about this, about which I'd like to have 
some feedback, is that the gcc_unreachable () at the end of 
build_simple_base_path without a return seems a bit weird to me, because 
the function is *supposed to return a tree*, not void. I guess some 
front-ends would even warn on this?!? Anyway, if we change:



/* */

gcc_unreachable ();
}

to:

/* */

gcc_assert (seen_error ());
return error_mark_node;
}

we get something which, by and large is equally safe in terms of 
miscompilations and helps a lot error recovery, ie, we would not have 
c++/70202. It's also true however that we would not make much progress 
in trying to understand if we could solve the specific problem at issue 
in a more insightful way.


Then my next observation is the following: if we change the testcase to:

class A
{
  virtual void foo () { }
};
class B : public A, A { };  // { dg-error "duplicate base type" }

B b1,  = b1;
A a = b1;

thus everything is the same but at the end we have 'b1', not 'b2', we 
also don't crash. We are saved not by something particularly smart ;) 
but by this check at the beginning of ocp_convert:


  if (error_operand_p (e) || type == error_mark_node)
 return error_mark_node;

that is, the TREE_TYPE of 'e', the VAR_DECL for 'b1', is error_mark_node 
and we return early the error_mark_node, we don't reach the problematic 
build_base_path. With 'b2' we have a more complicated REFERENCE_REF 
which we don't catch, but otherwise everything is the same, we reach 
build_base_path and we crash on the gcc_unreachable. In practice, if I 
add at the beginning of ocp_convert:


+  if (REFERENCE_REF_P (expr)
+  && VAR_P (TREE_OPERAND (expr, 0))
+  && DECL_INITIAL (TREE_OPERAND (expr, 0)) == error_mark_node)
+return error_mark_node;

it also passes testing. Then the final observation is that we also don't 
crash if the type of 'A' is simpler, thus we don't have the declaration of


virtual void foo () { }

Then I gather that when the type is "complex" enough we are not cleaning 
things up completely enough when we encounter it as a duplicated base. 
Maybe as optimal error recovery when we see a duplicated base we should 
clean up everything having to do with bases end up with something 
equivalent to an incomplete class B; or something like that?!? I don't 
know what we want to try short / medium /long term...


Thanks!
Paolo.

Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 07:53:52PM +0300, Alexander Monakov wrote:
> I'm sending updated patch series with backend prerequisites for OpenMP
> offloading to the NVIDIA PTX ISA.  The first patch has already received some
> comments and this version reflects review feedback.  The other patches have
> been adjusted for clarity and re-cut in a more rigorous manner.  All patches 
> are
> rebased onto current trunk.
> 
> Jakub, can you offer wishes/recommendations for sending the rest of
> (middle-end and libgomp) patches?  As you know there's a branch with

Once all the prerequisites are in (I assume the patches depend on the NVPTX
backend patches you've just posted), then I'd prefer if you rebase the rest
to current trunk and post  in reasonably reviewable chunks (that can be all
of middle-end changes in one patch, all of libgomp plugin changes, all of
other libgomp changes, or if some of those would be too large, split that a
little bit).

Jakub

Re: [RFC: Patch 0/6] Rewrite the noce-ifcvt cost models

2016-06-09 Thread Jeff Law


On 06/02/2016 10:53 AM, James Greenhalgh wrote:

Hi,

When I was working in the area last year, I promised to revisit the cost
model for noce if-conversion and see if I could improve the modeling. This
turned out to be more tricky than I expected.

This patch set rewrites the cost model for noce if-conversion. The goal is
to rationalise the units used in the calculations away from BRANCH_COST,
which is only defined relative to itself.
Right.  I think we all agreed that the key weakness of BRANCH_COST was 
that its meaning is only defined relative to itself.


What we want is a costing metric that would allow us to estimate the 
cost of different forms of the computation, which might include branches 
and which may include edge probabilty information.




If you're looking at that and thinking it doesn't sound much different from
our current call to BRANCH_COST, you're right. This isn't as large a
departure from the existing cost model as I had originally intended.
Perhaps not as large of a change as you intended, but I think you're 
hitting the key issue with BRANCH_COST.




This act of making the cost models consistent will cause code generation
changes on a number of targets - most notably x86_64. On x86_64 the RTX
cost of a conditional move comes out at "20" - this is far higher than
COSTS_N_INSNS (BRANCH_COST) for the x86 targets, so they lose lots
of if-conversion. The easy fix for this would be to implement the new hook.
I measured the performance impact on Spec2000 as a smoke test, it didn't
seem to harm anything, and the result was a slight (< 3%) uplift on
Spec2000FP. I'm no expert on x86_64, so I haven't taken a closer look for
the reasons.
I'd be comfortable with Uros guiding the implementation of the target 
hook for x86 so that we don't take a major step backward.



Jeff

[PATCH 3/8] nvptx -muniform-simt

2016-06-09 Thread Alexander Monakov

This patch implements -muniform-simt code generation option, which is used to
emit code for OpenMP offloading.  The goal is to emit code that can either
execute "normally", or can execute in a way that keeps all lanes in a given
warp active, their local state synchronized, and observable effects from
execution happening as if only one lane was active.  The latter mode is how
OpenMP offloaded code runs outside of SIMD regions.

To achieve that, the compiler instruments atomic instructions and calls to
functions provided by the CUDA runtime (malloc, free, vprintf), i.e. those
that GCC itself doesn't compile.

Instrumentation converts an atomic instruction to a predicated atomic
instruction followed by a warp shuffle.  To illustrate,

atom.op dest, 

becomes

  @PRED atom.op dest, 
shfl.idx dest, dest, MASTER

where, outside of SIMD regions:

- PRED is true in lane 0, false in lanes 1-31, so the side effect happens once
- MASTER is 0 in all lanes, so the shuffle synchronizes 'dest' among all lanes

and inside of SIMD regions:

- PRED is true in all lanes, so the atomic is done in all lanes independently
- MASTER equals to current lane number, so the shuffle is a no-op.

To keep track of current state and compute PRED and MASTER, the compiler uses
shared memory array 'unsigned __nvptx_uni[]' with per-warp all-zeros or
all-ones masks.  The mask word is zero outside of SIMD regions, all-ones
inside.  Function prologue uses mask to compute MASTER and PRED via:

MASTER = LANE_ID & MASK;
PRED   = LANE_ID == MASTER;

Calls are handled like atomics.

gcc/
* config/nvptx/nvptx.c (need_unisimt_decl): New variable.  Set it...
(nvptx_init_unisimt_predicate): ...here (new function) and use it...
(nvptx_file_end): ...here to emit declaration of __nvptx_uni array.
(nvptx_declare_function_name): Call nvptx_init_unisimt_predicate.
(nvptx_get_unisimt_master): New helper function.
(nvptx_get_unisimt_predicate): Ditto.
(nvptx_call_insn_is_syscall_p): Ditto.
(nvptx_unisimt_handle_set): Ditto.
(nvptx_reorg_uniform_simt): New.  Transform code for -muniform-simt.
(nvptx_reorg): Call nvptx_reorg_uniform_simt.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_unisimt__ when -muniform-simt option is active.
(struct machine_function): Add unisimt_master, unisimt_predicate
rtx fields.
* config/nvptx/nvptx.md (atomic): New attribute.
(atomic_compare_and_swap_1): Mark with atomic attribute.
(atomic_exchange): Ditto.
(atomic_fetch_add): Ditto.
(atomic_fetch_addsf): Ditto.
(atomic_fetch_): Ditto.
* config/nvptx/nvptx.opt (muniform-simt): New option.
* doc/invoke.texi (-muniform-simt): Document.

gcc/testsuite/
* gcc.target/nvptx/unisimt.c: New test.

libgcc/
* config/nvptx/crt0.c (__main) [__nvptx_unisimt__]: Setup __nvptx_uni.
* config/nvptx/stacks.c (__nvptx_uni): Define.
---
 gcc/config/nvptx/nvptx.c | 124 +++
 gcc/config/nvptx/nvptx.h |   4 +
 gcc/config/nvptx/nvptx.md|  18 +++--
 gcc/config/nvptx/nvptx.opt   |   4 +
 gcc/doc/invoke.texi  |  10 +++
 gcc/testsuite/gcc.target/nvptx/unisimt.c |  22 ++
 libgcc/config/nvptx/crt0.c   |   4 +
 libgcc/config/nvptx/stacks.c |   3 +-
 8 files changed, 183 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/unisimt.c

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 7b90cb1..f91573a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -142,6 +142,9 @@ static GTY(()) tree global_lock_var;
 /* True if any function references __nvptx_stacks.  */
 static bool need_softstack_decl;
 
+/* True if any function references __nvptx_uni.  */
+static bool need_unisimt_decl;
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -1049,6 +1052,34 @@ nvptx_init_axis_predicate (FILE *file, int regno, const 
char *name)
   fprintf (file, "\t}\n");
 }
 
+/* Emit code to initialize predicate and master lane index registers for
+   -muniform-simt code generation variant.  */
+
+static void
+nvptx_init_unisimt_predicate (FILE *file)
+{
+  int bits = POINTER_SIZE;
+  int master = REGNO (cfun->machine->unisimt_master);
+  int pred = REGNO (cfun->machine->unisimt_predicate);
+  fprintf (file, "\t{\n");
+  fprintf (file, "\t\t.reg.u32 %%ustmp0;\n");
+  fprintf (file, "\t\t.reg.u%d %%ustmp1;\n", bits);
+  fprintf (file, "\t\t.reg.u%d %%ustmp2;\n", bits);
+  fprintf (file, "\t\tmov.u32 %%ustmp0, %%tid.y;\n");
+  fprintf (file, "\t\tmul%s.u32 %%ustmp1, %%ustmp0, 4;\n",
+  bits == 64 ? ".wide" : ".lo");
+  fprintf (file, "\t\tmov.u%d %%ustmp2, __nvptx_uni;\n", bits);
+  fprintf (file, "\t\tadd.u%d %%ustmp2, %%ustmp2, %%ustmp1;\n", bits);

[PATCH 1/8] nvptx -msoft-stack

2016-06-09 Thread Alexander Monakov

This is a respin of the recently reviewed -msoft-stack patch that addresses
review feedback and adds required libgcc changes.

gcc/:
* config/nvptx/nvptx-protos.h (nvptx_output_set_softstack): Declare.
* config/nvptx/nvptx.c: (need_softstack_decl): New variable.
(init_softstack_frame): New.
(nvptx_declare_function_name): Handle TARGET_SOFT_STACK.
(nvptx_output_set_softstack): New.
(nvptx_output_return): Emit stack restore if needed.
(nvptx_get_drap_rtx): Return %argp as the DRAP if needed.
(nvptx_file_end): Handle need_softstack_decl.
* config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_softstack__ when -msoft-stack is active.
(STACK_SIZE_MODE): Define.
(FIXED_REGISTERS): Adjust.
(SOFTSTACK_SLOT_REGNUM): New.
(SOFTSTACK_PREV_REGNUM): New.
(REGISTER_NAMES): Adjust.
(struct machine_function): New bool field has_softstack.
* config/nvptx/nvptx.md (UNSPEC_SET_SOFTSTACK): New.
(allocate_stack): Implement for TARGET_SOFT_STACK.  Remove unused code.
(allocate_stack_): Remove unused pattern.
(set_softstack_insn): New pattern.
(restore_stack_block): Handle for TARGET_SOFT_STACK.
* config/nvptx/nvptx.opt: (msoft-stack): New option.
* doc/invoke.texi (msoft-stack): Document.

gcc/testsuite/:
* gcc.target/nvptx/softstack.c: New test.
* lib/target-supports.exp (check_effective_target_alloca): Use a
compile test.

libgcc/:
* config/nvptx/crt0.c (__main) [__nvptx_softstack__]: Setup
__nvptx_stacks.
* config/nvptx/stacks.c: New file.
* config/nvptx/t-nvptx (LIB2ADD): Add stacks.c.
---
 gcc/config/nvptx/nvptx-protos.h|   1 +
 gcc/config/nvptx/nvptx.c   | 111 ++---
 gcc/config/nvptx/nvptx.h   |  15 +++-
 gcc/config/nvptx/nvptx.md  |  33 ++---
 gcc/config/nvptx/nvptx.opt |   4 ++
 gcc/doc/invoke.texi|  12 
 gcc/testsuite/gcc.target/nvptx/softstack.c |  23 ++
 gcc/testsuite/lib/target-supports.exp  |   5 +-
 libgcc/config/nvptx/crt0.c |   6 ++
 libgcc/config/nvptx/stacks.c   |  24 +++
 libgcc/config/nvptx/t-nvptx|   3 +-
 11 files changed, 213 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/softstack.c
 create mode 100644 libgcc/config/nvptx/stacks.c

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index ec4588e..647607d 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -41,5 +41,6 @@ extern const char *nvptx_ptx_type_from_mode (machine_mode, 
bool);
 extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
+extern const char *nvptx_output_set_softstack (unsigned);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 39e4145..1c95fdf 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -139,6 +139,9 @@ static GTY(()) rtx worker_red_sym;
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
+/* True if any function references __nvptx_stacks.  */
+static bool need_softstack_decl;
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -979,6 +982,60 @@ init_frame (FILE  *file, int regno, unsigned align, 
unsigned size)
   POINTER_SIZE, reg_names[regno], reg_names[regno]);
 }
 
+/* Emit soft stack frame setup sequence.  */
+
+static void
+init_softstack_frame (FILE *file, unsigned alignment, HOST_WIDE_INT size)
+{
+  /* Maintain 64-bit stack alignment.  */
+  unsigned keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+  size = ROUND_UP (size, keep_align);
+  int bits = POINTER_SIZE;
+  const char *reg_stack = reg_names[STACK_POINTER_REGNUM];
+  const char *reg_frame = reg_names[FRAME_POINTER_REGNUM];
+  const char *reg_sspslot = reg_names[SOFTSTACK_SLOT_REGNUM];
+  const char *reg_sspprev = reg_names[SOFTSTACK_PREV_REGNUM];
+  fprintf (file, "\t.reg.u%d %s;\n", bits, reg_stack);
+  fprintf (file, "\t.reg.u%d %s;\n", bits, reg_frame);
+  fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspslot);
+  fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspprev);
+  fprintf (file, "\t{\n");
+  fprintf (file, "\t\t.reg.u32 %%fstmp0;\n");
+  fprintf (file, "\t\t.reg.u%d %%fstmp1;\n", bits);
+  fprintf (file, "\t\t.reg.u%d %%fstmp2;\n", bits);
+  fprintf (file, "\t\tmov.u32 %%fstmp0, %%tid.y;\n");
+  fprintf (file, "\t\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : ".lo", bits / 8);
+  fprintf (file, "\t\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits);
+  /* Initialize %sspslot = &__nvptx_stacks[tid.y].  */
+  fprintf

[PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-06-09 Thread Alexander Monakov

Hi,

I'm sending updated patch series with backend prerequisites for OpenMP
offloading to the NVIDIA PTX ISA.  The first patch has already received some
comments and this version reflects review feedback.  The other patches have
been adjusted for clarity and re-cut in a more rigorous manner.  All patches are
rebased onto current trunk.

Jakub, can you offer wishes/recommendations for sending the rest of
(middle-end and libgomp) patches?  As you know there's a branch with
development history; is that of interest, or would it be easier if I rebased
all stuff anew on current trunk?

Thanks.
Alexander

[PATCH 8/8] nvptx: handle OpenMP "omp target entrypoint"

2016-06-09 Thread Alexander Monakov

This patch implements emission of OpenMP target region entrypoints: the
compiler emits the target function with '$impl' appended to the name, and
under the original name it emits a short entry sequence that sets up shared
memory arrays and calls the target function via 'gomp_nvptx_main' (which is
implemented in libgomp).

* config/nvptx/nvptx.c (write_as_kernel): Restrict to OpenACC target
regions.
(write_omp_entry): New.  Use it...
(nvptx_declare_function_name): ...here to emit OpenMP target region
entrypoints.
(nvptx_record_offload_symbol): Handle NULL attributes.
---
 gcc/config/nvptx/nvptx.c | 82 +---
 1 file changed, 78 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d959f85..5e47002 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -723,7 +723,10 @@ static bool
 write_as_kernel (tree attrs)
 {
   return (lookup_attribute ("kernel", attrs) != NULL_TREE
- || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
+ || (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE
+ && lookup_attribute ("oacc function", attrs) != NULL_TREE));
+  /* For OpenMP target regions, the corresponding kernel entry is emitted from
+ write_omp_entry as a separate function.  */
 }
 
 /* Emit a linker marker for a function decl or defn.  */
@@ -1073,6 +1076,69 @@ nvptx_init_unisimt_predicate (FILE *file)
   need_unisimt_decl = true;
 }
 
+/* Emit kernel NAME for function ORIG outlined for an OpenMP 'target' region:
+
+   extern void gomp_nvptx_main (void (*fn)(void*), void *fnarg);
+   void __attribute__((kernel)) NAME (void *arg, char *stack, size_t stacksize)
+   {
+ __nvptx_stacks[tid.y] = stack + stacksize * (ctaid.x * ntid.y + tid.y + 
1);
+ __nvptx_uni[tid.y] = 0;
+ gomp_nvptx_main (ORIG, arg);
+   }
+   ORIG itself should not be emitted as a PTX .entry function.  */
+
+static void
+write_omp_entry (FILE *file, const char *name, const char *orig)
+{
+  static bool gomp_nvptx_main_declared;
+  if (!gomp_nvptx_main_declared)
+{
+  gomp_nvptx_main_declared = true;
+  write_fn_marker (func_decls, false, true, "gomp_nvptx_main");
+  func_decls << ".extern .func gomp_nvptx_main (.param.u" << POINTER_SIZE
+<< " %in_ar1, .param.u" << POINTER_SIZE << " %in_ar2);\n";
+}
+#define ENTRY_TEMPLATE(PS, PS_BYTES, MAD_PS_32) "\
+ (.param.u" PS " %arg, .param.u" PS " %stack, .param.u" PS " %sz)\n\
+{\n\
+   .reg.u32 %r<3>;\n\
+   .reg.u" PS " %R<4>;\n\
+   mov.u32 %r0, %tid.y;\n\
+   mov.u32 %r1, %ntid.y;\n\
+   mov.u32 %r2, %ctaid.x;\n\
+   cvt.u" PS ".u32 %R1, %r0;\n\
+   " MAD_PS_32 " %R1, %r1, %r2, %R1;\n\
+   mov.u" PS " %R0, __nvptx_stacks;\n\
+   " MAD_PS_32 " %R0, %r0, " PS_BYTES ", %R0;\n\
+   ld.param.u" PS " %R2, [%stack];\n\
+   ld.param.u" PS " %R3, [%sz];\n\
+   add.u" PS " %R2, %R2, %R3;\n\
+   mad.lo.u" PS " %R2, %R1, %R3, %R2;\n\
+   st.shared.u" PS " [%R0], %R2;\n\
+   mov.u" PS " %R0, __nvptx_uni;\n\
+   " MAD_PS_32 " %R0, %r0, 4, %R0;\n\
+   mov.u32 %r0, 0;\n\
+   st.shared.u32 [%R0], %r0;\n\
+   mov.u" PS " %R0, \0;\n\
+   ld.param.u" PS " %R1, [%arg];\n\
+   {\n\
+   .param.u" PS " %P<2>;\n\
+   st.param.u" PS " [%P0], %R0;\n\
+   st.param.u" PS " [%P1], %R1;\n\
+   call.uni gomp_nvptx_main, (%P0, %P1);\n\
+   }\n\
+   ret.uni;\n\
+}\n"
+  static const char entry64[] = ENTRY_TEMPLATE ("64", "8", "mad.wide.u32");
+  static const char entry32[] = ENTRY_TEMPLATE ("32", "4", "mad.lo.u32  ");
+#undef ENTRY_TEMPLATE
+  const char *entry_1 = TARGET_ABI64 ? entry64 : entry32;
+  /* Position ENTRY_2 after the embedded nul using strlen of the prefix.  */
+  const char *entry_2 = entry_1 + strlen (entry64) + 1;
+  fprintf (file, ".visible .entry %s%s%s%s", name, entry_1, orig, entry_2);
+  need_softstack_decl = need_unisimt_decl = true;
+}
+
 /* Implement ASM_DECLARE_FUNCTION_NAME.  Writes the start of a ptx
function, including local var decls and copies from the arguments to
local regs.  */
@@ -1084,6 +1150,14 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
 
+  if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
+  && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
+{
+  char *buf = (char *) alloca (strlen (name) + sizeof ("$impl"));
+  sprintf (buf, "%s$impl", name);
+  write_omp_entry (file, name, buf);
+  name = buf;
+}
   /* We construct the initial part of the function into a string
  stream, in order to share the prototype writing code.  */
   std::stringstream s;
@@ -4153,13 +4227,13 @@ nvptx_record_offload_symbol (tree decl)
 case FUNCTION_DECL:
   {

[PATCH 4/8] nvptx -mgomp

2016-06-09 Thread Alexander Monakov

This patch adds option -mgomp which enables -msoft-stack plus -muniform-simt,
and builds a multilib with it.  This codegen convention is used for OpenMP
offloading.

* config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP.
* config/nvptx/nvptx.opt (mgomp): New option.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): New.
* doc/invoke.texi (mgomp): Document.
---
 gcc/config/nvptx/nvptx.c   | 3 +++
 gcc/config/nvptx/nvptx.opt | 4 
 gcc/config/nvptx/t-nvptx   | 2 ++
 gcc/doc/invoke.texi| 5 +
 4 files changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index f91573a..294002e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -192,6 +192,9 @@ nvptx_option_override (void)
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
+  if (TARGET_GOMP)
+target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT;
 }
 
 /* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 0d46e1d..cb6194d 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -40,3 +40,7 @@ Use custom stacks instead of local memory for automatic 
storage.
 muniform-simt
 Target Report Mask(UNIFORM_SIMT)
 Generate code that can keep local state uniform across all lanes.
+
+mgomp
+Target Report Mask(GOMP)
+Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt.
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index e2580c9..6c1010d 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -8,3 +8,5 @@ ALL_HOST_OBJS += mkoffload.o
 mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a 
$(LIBIBERTY) $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
  mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
+
+MULTILIB_OPTIONS = mgomp
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 534025d..bae7f2d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19631,6 +19631,11 @@ generation variant is used for OpenMP offloading, but 
the option is exposed on
 its own for the purpose of testing the compiler; to generate code suitable for
 linking into programs using OpenMP offloading, use option @option{-mgomp}.
 
+@item -mgomp
+@opindex mgomp
+Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and
+@option{-muniform-simt} options, and selects corresponding multilib variant.
+
 @end table
 
 @node PDP-11 Options
-- 
2.6.6

[PATCH 7/8] nvptx: new insns for OpenMP SIMD-on-SIMT

2016-06-09 Thread Alexander Monakov

This patch implements in nvptx.md a few new instruction patterns that are used
for OpenMP SIMD code.

* config/nvptx/nvptx-protos.h (nvptx_shuffle_kind): Move enum
declaration from nvptx.c.
(nvptx_gen_shuffle): Declare.
* config/nvptx/nvptx.c (nvptx_shuffle_kind): Move to nvptx-protos.h.
(nvptx_gen_shuffle): Export.
* config/nvptx/nvptx.md (UNSPEC_VOTE_BALLOT): New unspec.
(UNSPEC_LANEID): Ditto.
(UNSPECV_NOUNROLL): Ditto.
(nvptx_vote_ballot): New pattern.
(omp_simt_lane): Ditto.
(omp_simt_last_lane): Ditto.
(omp_simt_ordered): Ditto.
(omp_simt_vote_any): Ditto.
(omp_simt_xchg_bfly): Ditto.
(omp_simt_xchg_idx): Ditto.
(nvptx_nounroll): Ditto.
* target-insns.def (omp_simt_lane): New.
(omp_simt_last_lane): New.
(omp_simt_ordered): New.
(omp_simt_vote_any): New.
(omp_simt_xchg_bfly): New.
(omp_simt_xchg_idx): New.
---
 gcc/config/nvptx/nvptx-protos.h | 11 +
 gcc/config/nvptx/nvptx.c| 12 +-
 gcc/config/nvptx/nvptx.md   | 94 +
 gcc/target-insns.def|  6 +++
 4 files changed, 112 insertions(+), 11 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 647607d..331ec0a 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -21,6 +21,16 @@
 #ifndef GCC_NVPTX_PROTOS_H
 #define GCC_NVPTX_PROTOS_H
 
+/* The kind of shuffe instruction.  */
+enum nvptx_shuffle_kind
+{
+  SHUFFLE_UP,
+  SHUFFLE_DOWN,
+  SHUFFLE_BFLY,
+  SHUFFLE_IDX,
+  SHUFFLE_MAX
+};
+
 extern void nvptx_declare_function_name (FILE *, const char *, const_tree 
decl);
 extern void nvptx_declare_object_name (FILE *file, const char *name,
   const_tree decl);
@@ -36,6 +46,7 @@ extern void nvptx_register_pragmas (void);
 extern void nvptx_expand_oacc_fork (unsigned);
 extern void nvptx_expand_oacc_join (unsigned);
 extern void nvptx_expand_call (rtx, rtx);
+extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
 extern const char *nvptx_output_mov_insn (rtx, rtx);
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 46afae4..d959f85 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -70,16 +70,6 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-/* The kind of shuffe instruction.  */
-enum nvptx_shuffle_kind
-{
-  SHUFFLE_UP,
-  SHUFFLE_DOWN,
-  SHUFFLE_BFLY,
-  SHUFFLE_IDX,
-  SHUFFLE_MAX
-};
-
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
 {
@@ -1433,7 +1423,7 @@ nvptx_gen_pack (rtx dst, rtx src0, rtx src1)
 /* Generate an instruction or sequence to broadcast register REG
across the vectors of a single warp.  */
 
-static rtx
+rtx
 nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
 {
   rtx res;
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 31f87ec..b1b27a2 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -42,6 +42,10 @@
 
UNSPEC_BIT_CONV
 
+   UNSPEC_VOTE_BALLOT
+
+   UNSPEC_LANEID
+
UNSPEC_SHUFFLE
UNSPEC_BR_UNIFIED
 ])
@@ -57,6 +61,8 @@
UNSPECV_FORKED
UNSPECV_JOINING
UNSPECV_JOIN
+
+   UNSPECV_NOUNROLL
 ])
 
 (define_attr "subregs_ok" "false,true"
@@ -1166,6 +1172,88 @@
   ""
   "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;")
 
+(define_insn "nvptx_vote_ballot"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (unspec:SI [(match_operand:BI 1 "nvptx_register_operand" "R")]
+  UNSPEC_VOTE_BALLOT))]
+  ""
+  "%.\\tvote.ballot.b32\\t%0, %1;")
+
+;; Patterns for OpenMP SIMD-via-SIMT lowering
+
+;; Implement IFN_GOMP_SIMT_LANE: set operand 0 to lane index
+(define_insn "omp_simt_lane"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+   (unspec:SI [(const_int 0)] UNSPEC_LANEID))]
+  ""
+  "%.\\tmov.u32\\t%0, %%laneid;")
+
+;; Implement IFN_GOMP_SIMT_ORDERED: copy operand 1 to operand 0 and
+;; place a compiler barrier to disallow unrolling/peeling the containing loop
+(define_expand "omp_simt_ordered"
+  [(match_operand:SI 0 "nvptx_register_operand" "=R")
+   (match_operand:SI 1 "nvptx_register_operand" "R")]
+  ""
+{
+  emit_move_insn (operands[0], operands[1]);
+  emit_insn (gen_nvptx_nounroll ());
+  DONE;
+})
+
+;; Implement IFN_GOMP_SIMT_XCHG_BFLY: perform a "butterfly" exchange
+;; across lanes
+(define_expand "omp_simt_xchg_bfly"
+  [(match_operand 0 "nvptx_register_operand" "=R")
+   (match_operand 1 "nvptx_register_operand" "R")
+   (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")]
+  ""
+{
+  emit_insn (nvptx_gen_shuffle (operands[0], operands[1], operands[2],
+   SHUFFLE_BFLY));
+  DONE;
+})
+
+;; Implement

[PATCH 2/8] nvptx: implement predicated instructions

2016-06-09 Thread Alexander Monakov

This patch wires up generation of predicated instruction forms in nvptx.md and
fixes their handling in nvptx.c.  This is a prerequisite for the following
patch.  On its own it doesn't affect generated code because COND_EXEC
instructions are created by if-conversion only after register allocation,
which is not performed on NVPTX.

* config/nvptx/nvptx.c (nvptx_output_call_insn): Handle COND_EXEC
patterns.  Emit instruction predicate.
(nvptx_print_operand): Fix handling of instruction predicates.
* config/nvptx/nvptx.md (predicable): New attribute.  Generate
predicated forms via define_cond_exec.
(br_true): Mark as not predicable.
(br_false): Ditto.
(br_true_uni): Ditto.
(br_false_uni): Ditto.
(return): Ditto.
(trap_if_true): Ditto.
(trap_if_false): Ditto.
(nvptx_fork): Ditto.
(nvptx_forked): Ditto.
(nvptx_joining): Ditto.
(nvptx_join): Ditto.
(nvptx_barsync): Ditto.
---
 gcc/config/nvptx/nvptx.c  | 14 --
 gcc/config/nvptx/nvptx.md | 43 +++
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 1c95fdf..7b90cb1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1919,6 +1919,8 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   return "%.\tcvt%t0%t1\t%0, %1;";
 }
 
+static void nvptx_print_operand (FILE *, rtx, int);
+
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
involves writing .param declarations and in/out copies into them.  For
indirect calls, also write the .callprototype.  */
@@ -1930,6 +1932,8 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx 
callee)
   static int labelno;
   bool needs_tgt = register_operand (callee, Pmode);
   rtx pat = PATTERN (insn);
+  if (GET_CODE (pat) == COND_EXEC)
+pat = COND_EXEC_CODE (pat);
   int arg_end = XVECLEN (pat, 0);
   tree decl = NULL_TREE;
 
@@ -1974,6 +1978,8 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx 
callee)
   fprintf (asm_out_file, ";\n");
 }
 
+  /* The '.' stands for the call's predicate, if any.  */
+  nvptx_print_operand (asm_out_file, NULL_RTX, '.');
   fprintf (asm_out_file, "\t\tcall ");
   if (result != NULL_RTX)
 fprintf (asm_out_file, "(%s_in), ", reg_names[NVPTX_RETURN_REGNUM]);
@@ -2037,8 +2043,6 @@ nvptx_print_operand_punct_valid_p (unsigned char c)
   return c == '.' || c== '#';
 }
 
-static void nvptx_print_operand (FILE *, rtx, int);
-
 /* Subroutine of nvptx_print_operand; used to print a memory reference X to 
FILE.  */
 
 static void
@@ -2099,12 +2103,10 @@ nvptx_print_operand (FILE *file, rtx x, int code)
   x = current_insn_predicate;
   if (x)
{
- unsigned int regno = REGNO (XEXP (x, 0));
- fputs ("[", file);
+ fputs ("@", file);
  if (GET_CODE (x) == EQ)
fputs ("!", file);
- fputs (reg_names [regno], file);
- fputs ("]", file);
+ output_reg (file, REGNO (XEXP (x, 0)), VOIDmode);
}
   return;
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index bbc8b1c..bfe3260 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -126,6 +126,17 @@
   return true;
 })
 
+(define_attr "predicable" "false,true"
+  (const_string "true"))
+
+(define_cond_exec
+  [(match_operator 0 "predicate_operator"
+  [(match_operand:BI 1 "nvptx_register_operand" "")
+   (match_operand:BI 2 "const0_operand" "")])]
+  ""
+  ""
+  )
+
 (define_constraint "P0"
   "An integer with the value 0."
   (and (match_code "const_int")
@@ -511,7 +522,8 @@
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "%j0\\tbra\\t%l1;")
+  "%j0\\tbra\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_insn "br_false"
   [(set (pc)
@@ -520,7 +532,8 @@
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "%J0\\tbra\\t%l1;")
+  "%J0\\tbra\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 ;; unified conditional branch
 (define_insn "br_true_uni"
@@ -529,7 +542,8 @@
   UNSPEC_BR_UNIFIED) (const_int 0))
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
-  "%j0\\tbra.uni\\t%l1;")
+  "%j0\\tbra.uni\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_insn "br_false_uni"
   [(set (pc) (if_then_else
@@ -537,7 +551,8 @@
   UNSPEC_BR_UNIFIED) (const_int 0))
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
-  "%J0\\tbra.uni\\t%l1;")
+  "%J0\\tbra.uni\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_expand "cbranch4"
   [(set (pc)
@@ -940,7 +955,8 @@
   ""
 {
   return nvptx_output_return ();
-})
+}
+  [(set_attr "predicable" "false")])
 
 (define_expand "epilogue"
   [(clobber (const_int 0))]
@@ -1029,14 +1045,16 @@
(const_int 0))

[PATCH 6/8] new target hook: TARGET_SIMT_VF

2016-06-09 Thread Alexander Monakov

This patch adds a new target hook and implements it in a straightforward
manner on NVPTX to indicate that the target is running in SIMT fashion with 32
threads in a synchronous group ("warp").  For use in OpenMP transforms.

* config/nvptx/nvptx.c (nvptx_simt_vf): New.
(TARGET_SIMT_VF): Define.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: (TARGET_SIMT_VF): New hook.
* target.def: Define it.
---
 gcc/config/nvptx/nvptx.c | 11 +++
 gcc/doc/tm.texi  |  4 
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def   | 12 
 4 files changed, 29 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 294002e..46afae4 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4430,6 +4430,14 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 #define PTX_WORKER_LENGTH 32
 #define PTX_GANG_DEFAULT  32
 
+/* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
+
+static int
+nvptx_simt_vf ()
+{
+  return PTX_VECTOR_LENGTH;
+}
+
 /* Validate compute dimensions of an OpenACC offload or routine, fill
in non-unity defaults.  FN_LEVEL indicates the level at which a
routine might spawn a loop.  It is negative for non-routines.  If
@@ -5195,6 +5203,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef  TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL nvptx_builtin_decl
 
+#undef TARGET_SIMT_VF
+#define TARGET_SIMT_VF nvptx_simt_vf
+
 #undef TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..93af006 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5768,6 +5768,10 @@ usable.  In that case, the smaller the number is, the 
more desirable it is
 to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_SIMT_VF (void)
+Return number of threads in SIMT thread group on the target.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int 
*@var{dims}, int @var{fn_level})
 This hook should check the launch dimensions provided for an OpenACC
 compute region, or routine.  Defaulted values are represented as -1
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1e8423c..ccec1b1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4263,6 +4263,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMT_VF
+
 @hook TARGET_GOACC_VALIDATE_DIMS
 
 @hook TARGET_GOACC_DIM_LIMIT
diff --git a/gcc/target.def b/gcc/target.def
index a4df363..ba47e5b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1639,6 +1639,18 @@ int, (struct cgraph_node *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
+/* Functions relating to OpenMP SIMT vectorization transform.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SIMT_"
+HOOK_VECTOR (TARGET_SIMT, simt)
+
+DEFHOOK
+(vf,
+"Return number of threads in SIMT thread group on the target.",
+int, (void), NULL)
+
+HOOK_VECTOR_END (simt)
+
 /* Functions relating to openacc.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_GOACC_"
-- 
2.6.6

[PATCH 5/8] nvptx mkoffload: pass -mgomp for OpenMP offloading

2016-06-09 Thread Alexander Monakov

This patch wires up use of alternative -mgomp multilib for OpenMP offloading
via nvptx mkoffload.  It makes OpenACC and OpenMP incompatible for
simultaneous offloading compilation, so I've added a diagnostic for that.

* config/nvptx/mkoffload.c (main): Check that either OpenACC or OpenMP
is selected.  Pass -mgomp to offload compiler in OpenMP case.
---
 gcc/config/nvptx/mkoffload.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index c8eed45..d876c7b 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -460,6 +460,7 @@ main (int argc, char **argv)
 
   /* Scan the argument vector.  */
   bool fopenmp = false;
+  bool fopenacc = false;
   for (int i = 1; i < argc; i++)
 {
 #define STR "-foffload-abi="
@@ -476,11 +477,15 @@ main (int argc, char **argv)
 #undef STR
   else if (strcmp (argv[i], "-fopenmp") == 0)
fopenmp = true;
+  else if (strcmp (argv[i], "-fopenacc") == 0)
+   fopenacc = true;
   else if (strcmp (argv[i], "-save-temps") == 0)
save_temps = true;
   else if (strcmp (argv[i], "-v") == 0)
verbose = true;
 }
+  if (!(fopenacc ^ fopenmp))
+fatal_error (input_location, "either -fopenacc or -fopenmp must be set");
 
   struct obstack argv_obstack;
   obstack_init (_obstack);
@@ -501,6 +506,8 @@ main (int argc, char **argv)
 default:
   gcc_unreachable ();
 }
+  if (fopenmp)
+obstack_ptr_grow (_obstack, "-mgomp");
 
   for (int ix = 1; ix != argc; ix++)
 {
-- 
2.6.6

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 06:43:04PM +0200, Jakub Jelinek wrote:
> Yes, I'm all in favor in disabling X constraint for inline asm.
> Especially if people actually try to print it as well, rather than make it
> unused.  That is a sure path to ICEs.

Though, on the other side, even our documentation mentions
asm volatile ("mtfsf 255,%1" : "=X"(sum): "f"(fpenv));
So perhaps we need to error just in case such an argument is printed?

Jakub

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 10:30:13AM -0600, Jeff Law wrote:
> On 06/06/2016 01:40 PM, Jakub Jelinek wrote:
> >On Mon, Jun 06, 2016 at 09:27:56PM +0200, Marc Glisse wrote:
> >>The last one would miss floating point registers (no 2 platforms use the
> >>same letter for those, hence my quest for something more generic).
> >>
> >>The goal of the experiment is described in PR59159 (for which "+X" is
> >>unlikely to be the right answer, in particular because it is meaningless for
> >>constants). I don't know in what context people use the "X" constraint, or
> >>even better "=X"...
> >
> >X constraint has been added mainly for uses in match_scratch like:
> >(clobber (match_scratch:SI 2 "=X,X,X,"))
> >or when the predicate takes care of everything and it is not needed to
> >specify anything further:
> >  [(set (match_operand:SWI12 0 "push_operand" "=X")
> >(match_operand:SWI12 1 "nonmemory_no_elim_operand" "rn"))]
> >Using it in inline asm generally has resulted in lots of issues, including
> >ICEs etc., so nothing I'd recommend to use.
> So would it make sense to define it as not available for use in ASMs?  I
> realize that's potentially a user-visible change, but it might be a
> reasonable one to make.

Yes, I'm all in favor in disabling X constraint for inline asm.
Especially if people actually try to print it as well, rather than make it
unused.  That is a sure path to ICEs.

Jakub

Re: [PATCH] Remove old diagnostic macros.

2016-06-09 Thread Jeff Law


On 05/30/2016 05:19 PM, Marcin Baczyński wrote:

Hi,
this is my first GCC patch, so please bear with me if something is wrong with
it in an obvious way.

I've found two unused macros in gcc/diagnostic.h. Is the patch okay as is?
Bootstrapped on x86_64-pc-linux-gnu.

2016-05-31 Marcin Baczyński 

   * gcc/diagnostic.h (diagnostic_line_cutoff, diagnostic_flush_buffer):
delete.

This is fine for the trunk.  Please install.

THanks,
jeff

Re: [PATCH 0/4] BRIG (HSAIL) frontend

2016-06-09 Thread Pekka Jääskeläinen

Ping:

Any further comments to the patch set?

Thanks,
Pekka

On Wed, May 18, 2016 at 8:01 PM, Pekka Jääskeläinen  wrote:
> Hi Joseph,
>
> Updated diffstat below:
>  Makefile.def  | 3 +
>  Makefile.in   |   489 +
>  configure | 1 +
>  configure.ac  | 1 +
>  gcc/brig/Make-lang.in |   246 +
>  gcc/brig/brig-c.h |68 +
>  gcc/brig/brig-lang.c  |   454 +
>  gcc/brig/brigfrontend/brig-arg-block-handler.cc   |67 +
>  gcc/brig/brigfrontend/brig-atomic-inst-handler.cc |   377 +
>  gcc/brig/brigfrontend/brig-basic-inst-handler.cc  |   733 +
>  gcc/brig/brigfrontend/brig-branch-inst-handler.cc |   217 +
>  gcc/brig/brigfrontend/brig-cmp-inst-handler.cc|   212 +
>  gcc/brig/brigfrontend/brig-code-entry-handler.cc  |  2325 +++
>  gcc/brig/brigfrontend/brig-code-entry-handler.h   |   449 +
>  gcc/brig/brigfrontend/brig-comment-handler.cc |39 +
>  gcc/brig/brigfrontend/brig-control-handler.cc |29 +
>  .../brigfrontend/brig-copy-move-inst-handler.cc   |56 +
>  gcc/brig/brigfrontend/brig-cvt-inst-handler.cc|   250 +
>  gcc/brig/brigfrontend/brig-fbarrier-handler.cc|44 +
>  gcc/brig/brigfrontend/brig-function-handler.cc|   373 +
>  gcc/brig/brigfrontend/brig-function.cc|   698 +
>  gcc/brig/brigfrontend/brig-function.h |   216 +
>  gcc/brig/brigfrontend/brig-inst-mod-handler.cc|   168 +
>  gcc/brig/brigfrontend/brig-label-handler.cc   |37 +
>  gcc/brig/brigfrontend/brig-lane-inst-handler.cc   |83 +
>  gcc/brig/brigfrontend/brig-machine.c  |37 +
>  gcc/brig/brigfrontend/brig-machine.h  |35 +
>  gcc/brig/brigfrontend/brig-mem-inst-handler.cc|   181 +
>  gcc/brig/brigfrontend/brig-module-handler.cc  |30 +
>  gcc/brig/brigfrontend/brig-queue-inst-handler.cc  |92 +
>  gcc/brig/brigfrontend/brig-seg-inst-handler.cc|   134 +
>  gcc/brig/brigfrontend/brig-signal-inst-handler.cc |42 +
>  gcc/brig/brigfrontend/brig-util.cc|   348 +
>  gcc/brig/brigfrontend/brig-util.h |49 +
>  gcc/brig/brigfrontend/brig-variable-handler.cc|   256 +
>  gcc/brig/brigfrontend/brig_to_generic.cc  |   773 +
>  gcc/brig/brigfrontend/brig_to_generic.h   |   245 +
>  gcc/brig/brigfrontend/phsa.h  |40 +
>  gcc/brig/brigspec.c   |   193 +
>  gcc/brig/config-lang.in   |41 +
>  gcc/brig/lang-specs.h |28 +
>  gcc/brig/lang.opt |41 +
>  gcc/doc/frontends.texi| 2 +-
>  gcc/doc/invoke.texi   | 4 +
>  gcc/doc/standards.texi| 8 +
>  gcc/testsuite/brig.dg/README  |10 +
>  gcc/testsuite/brig.dg/dg.exp  |27 +
>  gcc/testsuite/brig.dg/test/gimple/alloca.hsail|37 +
>  gcc/testsuite/brig.dg/test/gimple/atomics.hsail   |33 +
>  gcc/testsuite/brig.dg/test/gimple/branches.hsail  |58 +
>  gcc/testsuite/brig.dg/test/gimple/fbarrier.hsail  |74 +
>  .../brig.dg/test/gimple/function_calls.hsail  |59 +
>  gcc/testsuite/brig.dg/test/gimple/mem.hsail   |39 +
>  gcc/testsuite/brig.dg/test/gimple/mulhi.hsail |33 +
>  gcc/testsuite/brig.dg/test/gimple/packed.hsail|78 +
>  .../brig.dg/test/gimple/smoke_test.hsail  |91 +
>  gcc/testsuite/brig.dg/test/gimple/variables.hsail |   124 +
>  gcc/testsuite/brig.dg/test/gimple/vector.hsail|57 +
>  gcc/testsuite/lib/brig-dg.exp |29 +
>  gcc/testsuite/lib/brig.exp|40 +
>  include/hsa-interface.h   |   630 +
>  libhsail-rt/Makefile.am   |   123 +
>  libhsail-rt/Makefile.in   |   721 +
>  libhsail-rt/README| 4 +
>  libhsail-rt/aclocal.m4|   979 +
>  libhsail-rt/config.h.in   |   217 +
>  libhsail-rt/configure | 17162 ++
>  libhsail-rt/configure.ac  |   150 +
>  libhsail-rt/include/internal/phsa-rt.h|97 +
>  .../include/internal/phsa_queue_interface.h   |60 +
>  libhsail-rt/include/internal/workitems.h  |   103 +
>  libhsail-rt/m4/libtool.m4 |  7997 
>  libhsail-rt/m4/ltoptions.m4   |   384 +
>  libhsail-rt/m4/ltsugar.m4 |   123 +
>  libhsail-rt/m4/ltversion.m4   |23 +
>  libhsail-rt/m4/lt~obsolete.m4 |98 +

Re: [PATCH] Fix check_GNU_style.sh for BSD / Mac OS X

2016-06-09 Thread Jeff Law


On 06/03/2016 09:41 AM, Alan Hayward wrote:

check_GNU_style.sh fails to detect lines >80 chars on BSD / Mac OS X
systems.

This is becuase paste is being called with an empty delimiter list.
Instead \0 should be used.

Tested on Ubuntu 14.04 and OS X 10.9.5

contrib/
* check_GNU_style.sh: Fix paste args for BSD

OK.
jeff

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-09 Thread Jeff Law


On 06/06/2016 01:40 PM, Jakub Jelinek wrote:

On Mon, Jun 06, 2016 at 09:27:56PM +0200, Marc Glisse wrote:

The last one would miss floating point registers (no 2 platforms use the
same letter for those, hence my quest for something more generic).

The goal of the experiment is described in PR59159 (for which "+X" is
unlikely to be the right answer, in particular because it is meaningless for
constants). I don't know in what context people use the "X" constraint, or
even better "=X"...


X constraint has been added mainly for uses in match_scratch like:
(clobber (match_scratch:SI 2 "=X,X,X,"))
or when the predicate takes care of everything and it is not needed to
specify anything further:
  [(set (match_operand:SWI12 0 "push_operand" "=X")
(match_operand:SWI12 1 "nonmemory_no_elim_operand" "rn"))]
Using it in inline asm generally has resulted in lots of issues, including
ICEs etc., so nothing I'd recommend to use.
So would it make sense to define it as not available for use in ASMs?  I 
realize that's potentially a user-visible change, but it might be a 
reasonable one to make.


jeff

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-09 Thread Jeff Law


On 06/07/2016 11:58 AM, Bernd Edlinger wrote:


AFACT this is not the only place where overly complex RTL trees can
cause an ICE.
That wouldn't surprise me at all -- but the design of RTL is such that 
it can be arbitrarily complex.  Essentially, routines can not make 
assumptions about the complexity of the RTL they're given -- except for 
any guards/pre-conditions set up in the caller.




Yes. That is what I think too, "X" is probably not used often in
practice, but if it is allowed, it should at least not result in an ICE.

There's no disagreement about "X" should never result in an ICE.

The question is whether or not "X" truly means an arbitrary piece of RTL 
or some tighter subset of RTL.   I guess one could also raise the 
question of whether or not "X" should ever be exposed to asms.


"X"'s original intent IIRC was to specify that for a particular 
alternative the operand didn't matter and would never be used.  This 
comes up with scratch registers for example.





"X" should allow to feed "whatsoever" valid C expressions to the asm
input, and using the %-expression in the assembler string should not
cause an ICE.
Actually it doesn't even have to be a valid C expression.  It might have 
started as a C expression from an ASM statement, but been modified by 
various RTL optimization passes into something that isn't necessarily a 
valid C expression.





However if I compare commond.md ...

(define_constraint "i"
   "Matches a general integer constant."
   (and (match_test "CONSTANT_P (op)")
(match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

... against recog.c, general_operand seems to allow less i.e. checks
targetm.legitimate_constant_p additionally.  So that is something
I don't really understand, I mean if a constant is not a "legitimate"
constant by the target's definition, why should it be safe to use
in later in targetm.asm_out.print_operand?
Legitimacy in this context is usually something related to operands that 
have to be reloaded for PIC code generation.  They may print just fine, 
but from a code generation standpoint they require special handling. 
One could argue that this is inconsistent and it ought to be cleaned up, 
but that's separate from the "X" issue.


jeff

Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN

2016-06-09 Thread Joseph Myers

On Thu, 9 Jun 2016, Jeff Law wrote:

> On 05/26/2016 09:04 AM, David Malcolm wrote:
> > When using config-list.mk to build all configurations, it's useful
> > to filter out the configurations that are known to be broken.
> > 
> > This patch does so, adding a KNOWN_BROKEN variable.
> > 
> > contrib/ChangeLog:
> > * config-list.mk (LIST): Rename to...
> > (FULL_LIST): ...this.
> > (KNOWN_BROKEN): New variable.
> > (LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN.
> I'd rather fix, deprecate and/or remove these kinds of targets.
> 
> Given that interix has been broken for a long time, I'll approve a patch that
> removes it.

The rule was meant to be that a target is deprecated in one release, then 
removed in the next - the error message given without --enable-obsolete 
says so.  So all the targets that require --enable-obsolete in GCC 6 
should be removed for GCC 7 in the absence of interest in reviving them.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] PR bootstrap/71471: remove selftest for pp_format (%p)

2016-06-09 Thread David Malcolm

I was confused by the comment to pp_format:

  /* The following format specifiers are recognized as being client independent:
 ...
 %p: pointer
 ... */

into thinking that %p is printed in a host-independent manner, when
"client independent" means in relation to different pretty-printers e.g the
C++ pretty printer, or normal tree one etc.

The following patch removes the overzealous test case and clarifies
the comment.

OK for trunk?

I'm working on a followup that makes selftest failures easier to
track down.

gcc/ChangeLog:
PR bootstrap/71471
* pretty-print.c (pp_indent): Specify that %p is printed in a
host-dependent manner.
(test_pp_format): Remove the test for %p.
---
 gcc/pretty-print.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index d805da4..8febda0 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -279,7 +279,7 @@ pp_indent (pretty_printer *pp)
%wd, %wi, %wo, %wu, %wx: HOST_WIDE_INT versions.
%c: character.
%s: string.
-   %p: pointer.
+   %p: pointer (printed in a host-dependent manner).
%r: if pp_show_color(pp), switch to color identified by const char *.
%R: if pp_show_color(pp), reset color.
%m: strerror(text->err_no) - does not consume a value from args_ptr.
@@ -1317,8 +1317,8 @@ test_pp_format ()
   assert_pp_format ("A 12345678", "%c %x", 'A', 0x12345678);
   assert_pp_format ("hello world 12345678", "%s %x", "hello world",
0x12345678);
-  assert_pp_format ("0xcafebabe 12345678", "%p %x", (void *)0xcafebabe,
-   0x12345678);
+  /* We can't test for %p; the pointer is printed in an implementation-defined
+ manner.  */
   assert_pp_format ("normal colored normal 12345678",
"normal %rcolored%R normal %x",
"error", 0x12345678);
-- 
1.8.5.3

Re: [PATCH 0/9] separate shrink-wrapping

2016-06-09 Thread Jeff Law


On 06/07/2016 07:47 PM, Segher Boessenkool wrote:

This patch series introduces separate shrink-wrapping.

There are many things the prologue/epilogue of a function do, and most of
those things can be done independently.  For example, most of the time,
for many targets, the save of callee-saved registers can be done later
than the "main" prologue.

Doing so helps quite a bit because the prologue is expensive for functions
that do not need everything it does done for every path through the
function; often, the hot paths do not need much at all, e.g. not those
things the prologue needs to do for the function to call other functions.

The first patch creates a command-line flag, some hooks, a status flag
("is this function wrapped separately", used by later passes), and
documentation for these things.

The next six patches are to prevent later passes from mishandling the
epilogue instructions that now appear before the epilogue: mostly, you
cannot do much to instructions with a REG_CFA_RESTORE note without
confusing dwarf2cfi.  The cprop one is for prologue instructions.

Then, the main patch.  And finally a patch for PowerPC that implements
separate wrapping for GPRs and LR.

Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
powerpc64le-linux.  Previous versions of this series also tested on
x86_64-linux.

Is this okay for trunk?


Segher


Segher Boessenkool (9):
  separate shrink-wrap: New command-line flag, status flag, hooks, and doc
  cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
  dce: Don't dead-code delete separately wrapped restores
  regrename: Don't rename restores
  regrename: Don't run if function was separately shrink-wrapped
  sel-sched: Don't mess with register restores
  cprop: Leave RTX_FRAME_RELATED_P instructions alone
  shrink-wrap: shrink-wrapping for separate concerns
  rs6000: Separate shrink-wrapping

I'm going to largely let Bernd own the review on this.  Just a few comments.

I certainly like the concept.  My mental model is that parts of the 
prologue might sink further than other parts of the prologue.  It's not 
an exact match, but I think it's a "close enough" mental model.


It looks like the "concerns" are all target defined and its the target's 
responsibility to deal with dependencies between the "concerns".  Right? 
 (BTW, I think "concerns" is a particularly poor name, perhaps 
"reasons" would be better?).


Right now it looks like shrink_wrapped_separate essentially says "we did 
something special here" -- while I don't think it's necessary for this 
patch, describing what we did might be better.  Essentially different 
paths would have a set of prologue/epilogue attributes that apply to 
that path.


This would be useful for things like cfgcleanup where you could join 
noreturn calls more aggressively.   Though this may not matter in any 
significant way in practice.


There may be enough commonality that we can promote some "concerns" from 
target dependent to target independent -- I've often wondered if we 
could handle a fair amount of prologue/epilogue generation in target 
independent ways.For simple targets, I wouldn't be terribly 
surprised if all prologue/epilogue generation could be handled 
generically and they'd get separate shrink-wrapping "for free".  But 
that's all blue-sky.


Jeff

[PATCH] Fix PR 71407 : Use correct types for live usage of live operations

2016-06-09 Thread Alan Hayward

This patch fixes PR 71407 by ensuring the BIT_FIELD_REF is created using
the
given vectype and then casted to the result type.

Tested on x86 and aarch64.
Note that the test vectorizes on x86 but does not vectorize on aarch64 or
power (due to a != statement failing to vectorize)

Ok to commit?



gcc/
PR tree-optimization/71407
PR tree-optimization/71416
* tree-vect-loop.c (vectorizable_live_operation): Use vectype for
BIT_FIELD_REF type.

Testsuite/
PR tree-optimization/71407
PR tree-optimization/71416
* gcc.dg/vect/pr71407.c: New
* gcc.dg/vect/pr71416-1.c: New
* gcc.dg/vect/pr71416-2.c: New


Alan.



pr71407.patch
Description: Binary data

Re: [PATCH] c/70883 - inconsistent error message for calls to __builtin_add_overflow with too few arguments

2016-06-09 Thread Jeff Law


On 06/07/2016 08:44 PM, Martin Sebor wrote:

Ping: This is a trivial patch to make the text of an error message
for an insufficient number of arguments to a built-in consistent
across different built-ins.  Rather that sometimes saying "not
enough arguments" and others "too few arguments" the patch makes
GCC print "too few" in all cases.

  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00093.html

OK.
jeff

Re: [PATCH] Do not enable -fcheck-pointer-bounds w/ -fsanitize=bounds

2016-06-09 Thread Jeff Law


On 06/09/2016 03:12 AM, Martin Liška wrote:

Hello.

I've prepared patch for the PR that is very similar to what was done in PR65044.
I've been currently testing the patch on x86_64-linux.

Ready to be installed after it finishes?

OK.
jeff

Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

2016-06-09 Thread Jeff Law


On 06/08/2016 10:47 AM, Jiong Wang wrote:

As discussed on the PR

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,

here is the patch.

For this particular failure on arm, *arm_movsi_insn has the following
operand
constraints:
operand 0: "=rk,r,r,r,rk,m"
  operand 1: "rk, I,K,j,mi,rk"

gcc won't explicitly refuse an unmatch CT_MEMORY operand (r235184) if it
comes from substituion that alternative (alt) 4 got a chance to compete
with
alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is
with rld_nregs=1.

I fell it's OK to give alt 4 a chance here, but we should calculate the
cost
correctly.

For alt 4, it should be treated as spill into memory, but currently lra
only
recognize a spill for pseudo register.   While the spilled rtx for alt 4
is a
plus after equiv substitution.

 (plus:SI (reg/f:SI 102 sfp)
(const_int 4 [0x4]))

This patch thus let lra-constraint cost spill of Non-pseudo as well and
fixed
the regression.

x86_64/aarch64/arm boostrap and regression OK.
arm bootstrapped cc1 is about 0.3% smaller in code size.

OK for trunk?

gcc/
PR rtl-optimization/70751
* lra-constraints.c (process_alt_operands): Recognize Non-pseudo
spilled into
memory.

I think the change itself is fine, but the comment could use some work. 
Actually I think you just need to remove the first sentence (the one 
referring to BZ70751 and r235184) and keep everything from "Suppose a 
target" onward.


OK with that fix.

jeff

Re: [PATCH][MIPS] Don't split shifts by default for MIPS16.

2016-06-09 Thread Maciej W. Rozycki

On Tue, 24 May 2016, Robert Suchanek wrote:

> The following changes the default behaviour of shift splitting
> for MIPS16 e.g. the shifts will be split only when used with
> undocumented -mno-debugd option that is now switched on by default.
> 
> This appears to enable better optimization in certain cases, and hence,
> giving slightly better performance.

 Thank you for your contribution, however I have some issues with your 
proposal.

 First of all since TARGET_DEBUG_D_MODE is used across several places I 
think this really should be split into two separate pieces, independently 
reviewed.

 Then we have two parts:

1. The change to the expander is probably all but obviously correct as it 
   complements a similar one applied to the corresponding splitters eons 
   ago (back in 2001), although I wonder why we need to have both the 
   splitters and instructions split manually in the expander in the first 
   place.  Can you please investigatie it?

2. The other change is far from obvious since it flips the splitting 
   default, which has been there since forever (or the addition of MIPS16 
   support back in 1998).  So I think it needs a justification more 
   elaborate than just "certain cases", and perhaps a proper command-line 
   option defined rather than using an obscure debugging hook.  It may 
   also make sense to set the default dynamically from the CPU tuning 
   selected -- if the "certain cases" change from CPU to CPU, that is.  
   Can you please look into it?

  Maciej

Re: [PATCH, RFC] First cut at using vec_construct for strided loads

2016-06-09 Thread Bill Schmidt


> On Jun 8, 2016, at 9:05 AM, Richard Biener  wrote:
> 
> On Wed, 8 Jun 2016, Bill Schmidt wrote:
> 
>> Hi Richard,
>> 
>>> On Jun 8, 2016, at 7:29 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt
>>>  wrote:
 This patch is a follow-up to the discussion generated by
 http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
 vec_construct to the cost model for use in vect_model_load_cost, and
 implemented a cost calculation that makes sense to me for PowerPC.  I'm
 less certain about the default, i386, and spu implementations.  I took a
 guess at i386 from the discussions we had, and used the same calculation
 for the default and for spu.  I'm hoping you or others can fill in the
 blanks if I guessed badly.
 
 The i386 cost for vec_construct is different from all the others, which
 are parameterized for each processor description.  This should probably
 be parameterized in some way as well, but thought you'd know better than
 I how that should be.  Perhaps instead of
 
   elements / 2 + 1
 
 it should be
 
   (elements / 2) * X + Y
 
 where X and Y are taken from the processor description, and represent
 the cost of a merge and a permute, respectively.  Let me know what you
 think.
>>> 
>>> Just trying to understand how you arrived at the above formulas in 
>>> investigating
>>> strangely low cost for v16qi construction of 9.  If we pairwise reduce 
>>> elements
>>> with a cost of 1 then we arrive at a cost of elements - 1, that's what you'd
>>> get with not accounting an initial move of element zero into a vector and 
>>> then
>>> inserting each other element into that with elements - 1 inserts.
>> 
>> What I wrote there only makes partial sense for certain types on Power, so 
>> far as
>> I can tell, and even then it doesn’t generalize properly.  When the scalar 
>> registers
>> are contained in the vector registers (as happens for floating-point on 
>> Power), then
>> you can do some merges and other forms of permutes to combine them faster
>> than doing specific inserts.  But that isn’t a general solution even on 
>> Power; for the
>> integer modes we still do inserts.
> 
> You mean Power has instructions to combine more than two vector registers
> into one?  Otherwise you still need n / 2 plus n / 4 plus n / 8 ...
> "permutes" which boils down to n - 1.

Right, we do not.  This is what I meant about it not generalizing properly -- 
it was an
ill-thought-out, off-the-cuff remark, so far as I can tell.  I agree with the n 
- 1, and I need
to go in and make a similar change for Power.  There is a special case with 
32-bit
floating-point that probably also needs adjustment.

Bill

> 
>> So what you have makes sense to me, and what’s currently in place for Power 
>> needs
>> work also, so far as I can tell.  I’ll take a note to revisit this.
> 
> Thanks.
> Richard.
> 
>> — Bill
>> 
>>> 
>>> This also matches up with code-generation on x86_64 for
>>> 
>>> vT foo (T a, T b, ...)
>>> {
>>> return (vT) {a, b, ... };
>>> }
>>> 
>>> for any vector / element type combination I tried.  Thus the patch below.
>>> 
>>> I'll bootstrap / test that on x86_64-linux and I'm leaving other
>>> targets to target
>>> maintainers.
>>> 
>>> Ok for the i386 parts?
>>> 
>>> Thanks,
>>> Richard.
>>> 
>>> 2016-06-08  Richard Biener  
>>> 
>>>   * targhooks.c (default_builtin_vectorization_cost): Adjust
>>>   vec_construct cost.
>>>   * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
>>> 
>>> Index: gcc/targhooks.c
>>> ===
>>> --- gcc/targhooks.c (revision 237196)
>>> +++ gcc/targhooks.c (working copy)
>>> @@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum
>>>return 3;
>>> 
>>>  case vec_construct:
>>> -   elements = TYPE_VECTOR_SUBPARTS (vectype);
>>> -   return elements / 2 + 1;
>>> +   return TYPE_VECTOR_SUBPARTS (vectype) - 1;
>>> 
>>>  default:
>>>gcc_unreachable ();
>>> Index: gcc/config/i386/i386.c
>>> ===
>>> --- gcc/config/i386/i386.c  (revision 237196)
>>> +++ gcc/config/i386/i386.c  (working copy)
>>> @@ -49503,8 +49520,6 @@ static int
>>> ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>>> tree vectype, int)
>>> {
>>> -  unsigned elements;
>>> -
>>>  switch (type_of_cost)
>>>{
>>>  case scalar_stmt:
>>> @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
>>>return ix86_cost->vec_stmt_cost;
>>> 
>>>  case vec_construct:
>>> -   elements = TYPE_VECTOR_SUBPARTS (vectype);
>>> -   return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
>>> +   return

[ARM] Fix, add tests for FP16 aapcs.

2016-06-09 Thread Matthew Wahab


Hello,

A number of tests were added to check for FP16 arguments and return
values being passed in registers. These require mfloat-abi=hard to be
selected but in some test configurations they were run with
-mfloat-abi=soft or -mfloat-abi=softfp.

Explict skip-if directives are added to the tests to ensure that they
only run on valid configurations. In addition, the code in the
gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument
passing and return values and a softfp variant is added as
fp16-aapcs-2.c.

Tested for arm-none-linux-gnueabihf with native make check and for
arm-none-eabi with cross-compiled check-gcc. Also checked the new tests
with cross-compiled arm-eabi-qemu/-mcpu=cortex-m3/-mthumb.

Ok for trunk?
Matthew

2016-06-09  Matthew Wahab  

* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for
mfloat-abi=soft and mfloat-abi=softfp.  Replace arm_neon_fp16_ok
with arm_neon_fp16_hw.
* testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
* testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for
mfloat-abi=soft and mfloat-abi=softfp.  Also, simplify the test
and set option -mfloat-abi=hard.
* testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
>From b02f0283367d4a4c1b012e8ca8e7b5c91f3ac561 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 24 May 2016 09:21:11 +0100
Subject: [PATCH] [ARM] Fix, add tests for FP16 aapcs.

A number of tests were added to check for FP16 arguments and return
values being passed in register. These require mfloat-abi=hard to be
selected but, in some test configurations, they were run with
-mfloat-abi=soft or -mfloat-abi=softfp.

Explict skip-if directives are added to the tests to ensure that they
only run on valid configurations. In addition, the code in the
gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument
passing and return values and a softfp variant is added as
fp16-aapcs-2.c.

2016-06-09  Matthew Wahab  

	* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for
	mfloat-abi=soft and mfloat-abi=softfp.  Replace arm_neon_fp16_ok
	with arm_neon_fp16_hw.
	* testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
	* testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for
	mfloat-abi=soft and mfloat-abi=softfp.  Also, simplify the test
	and set option -mfloat-abi=hard.
	* testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
---
 gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c  |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp18.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp19.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp20.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp21.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c  | 22 +-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c  | 21 +
 8 files changed, 46 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c

diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
index 680a3b5..6764a19 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
+++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
@@ -1,8 +1,9 @@
 /* Test AAPCS layout (VFP variant for Neon types) */
 
 /* { dg-do run { target arm_eabi } } */
-/* { dg-require-effective-target arm_neon_fp16_ok } */
+/* { dg-require-effective-target arm_neon_fp16_hw } */
 /* { dg-add-options arm_neon_fp16 } */
+/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */
 
 #ifndef IN_FRAMEWORK
 #define VFP
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
index fc2b13b..b2526a1 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
+++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
@@ -1,8 +1,9 @@
 /* Test AAPCS layout (VFP variant for Neon types) */
 
 /* { dg-do run { target arm_eabi } } */
-/* { dg-require-effective-target arm_neon_fp16_ok } */
+/* { dg-require-effective-target arm_neon_fp16_hw } */
 /* { dg-add-options arm_neon_fp16 } */
+/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */
 
 #ifndef IN_FRAMEWORK
 #define VFP
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c
index 225e9ce..629b884 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c
+++

RE: [PATCH][MIPS] Add -minline-intermix to ignore compression flags when inlining

2016-06-09 Thread Maciej W. Rozycki

On Wed, 25 May 2016, Robert Suchanek wrote:

> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index 73f1cb6..2f6195e 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -837,6 +837,7 @@ Objective-C and Objective-C++ Dialects}.
> > >   -mips16  -mno-mips16  -mflip-mips16 @gol
> > >   -minterlink-compressed -mno-interlink-compressed @gol
> > >   -minterlink-mips16  -mno-interlink-mips16 @gol
> > > +-minline-intermix -mno-inline-intermix @gol
> > 
> > Funky indentation here
> 
> I'm not sure what's wrong here, it's the diff I'd say.  There is no
> indentation within "MIPS Options".

 Two spaces rather than one are used between individual options (modulo 
formatting errata in some previous additions).

  Maciej

Re: [PATCH 2/3] config-list.mk: add GCC_SRC_DIR

2016-06-09 Thread Jeff Law


On 05/26/2016 09:04 AM, David Malcolm wrote:

config-list.mk currently requires the pwd to be in a sibling directory
of the source tree.  However, building using config-list.mk can consume
over 400GB of disk space in this build directory (e.g. my machine
successfully built cc1 for 206 configurations last night, consuming
442GB of space).  I've found it useful to be able to run config-list.mk
in an arbitrary build location (i.e. one with plenty of free space),
so this patch adds a GCC_SRC_DIR variable which can be overridden.

contrib/ChangeLog:
* config-list.mk (GCC_SRC_DIR): New variable.
(make-log-dir): Use GCC_SRC_DIR.
($(LIST)): Likewise.

OK.  Good to see this happen, I've found the restriction annoying as well.

jeff

Re: [PATCH 3/3] config-list.mk: add OPT-enable-obsolete to 4 targets

2016-06-09 Thread Jeff Law


On 05/26/2016 09:04 AM, David Malcolm wrote:

r233165 marked three deprecated rtems targets as obsolete.
r233887 marked mep-elf as obsolete.

Update config-list.mk to add OPT-enable-obsolete to these 4
targets.

contrib/ChangeLog:
* config-list.mk (FULL_LIST): Add OPT-enable-obsolete to
avr-rtems, h8300-rtems, m32r-rtems, mep-elf.

OK.
jeff

Re: [PATCH 1/3] config-list.mk: add KNOWN_BROKEN

2016-06-09 Thread Jeff Law


On 05/26/2016 09:04 AM, David Malcolm wrote:

When using config-list.mk to build all configurations, it's useful
to filter out the configurations that are known to be broken.

This patch does so, adding a KNOWN_BROKEN variable.

contrib/ChangeLog:
* config-list.mk (LIST): Rename to...
(FULL_LIST): ...this.
(KNOWN_BROKEN): New variable.
(LIST): Redefine, in terms of FULL_LIST and KNOWN_BROKEN.

I'd rather fix, deprecate and/or remove these kinds of targets.

Given that interix has been broken for a long time, I'll approve a patch 
that removes it.


jeff

Re: Update probabilities in predict.def to match reality

2016-06-09 Thread Martin Liška

On 06/07/2016 09:27 PM, Jan Hubicka wrote:
> There are bugs in few predictors - goto predictor is dead because the FE code 
> was dropped,
> return predictor is bit random because CFG is optimized (it should probably 
> be done in FE),
> loop iv compare seems bogus and fortran fail alloc does not seem to work as 
> intended.
> I added FIXME and will addres them incrementally.

Hi.

I've investigated why 'loop iv compare heuristics' provides bogus values and 
I've just created
PR for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71474

And I would like to apply following patch, that adds 'XFAIL' test-case 
described in the PR and
I would distinguish scanning for 'loop iv compare' and 'guess loop iv compare'.

Can I install the patch?
Thanks,
Martin

>From 6272402f76d4e6ff496d55e9a4fac7ee9a696e4e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 9 Jun 2016 16:53:32 +0200
Subject: [PATCH] Make 'loop iv compare' heuristics scanning more precise in
 test-suite

gcc/testsuite/ChangeLog:

2016-06-09  Martin Liska  

	* gcc.dg/predict-1.c: Distinguish between "loop iv compare"
	and "guess loop iv compared" heuristics.
	* gcc.dg/predict-2.c: Likewise.
	* gcc.dg/predict-3.c: Likewise.
	* gcc.dg/predict-4.c: Likewise.
	* gcc.dg/predict-5.c: Likewise.
	* gcc.dg/predict-6.c: Likewise.
	* gfortran.dg/predict-1.f90: New test.
---
 gcc/testsuite/gcc.dg/predict-1.c|  2 +-
 gcc/testsuite/gcc.dg/predict-2.c|  2 +-
 gcc/testsuite/gcc.dg/predict-3.c|  2 +-
 gcc/testsuite/gcc.dg/predict-4.c|  2 +-
 gcc/testsuite/gcc.dg/predict-5.c|  2 +-
 gcc/testsuite/gcc.dg/predict-6.c|  2 +-
 gcc/testsuite/gfortran.dg/predict-1.f90 | 18 ++
 7 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/predict-1.f90

diff --git a/gcc/testsuite/gcc.dg/predict-1.c b/gcc/testsuite/gcc.dg/predict-1.c
index d0924f2..10d62ba 100644
--- a/gcc/testsuite/gcc.dg/predict-1.c
+++ b/gcc/testsuite/gcc.dg/predict-1.c
@@ -23,4 +23,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 2.0%" 5 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 5 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-2.c b/gcc/testsuite/gcc.dg/predict-2.c
index 3011686..aa91568 100644
--- a/gcc/testsuite/gcc.dg/predict-2.c
+++ b/gcc/testsuite/gcc.dg/predict-2.c
@@ -23,4 +23,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-not "loop iv compare heuristics of edge\[^:\]*:" "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-not "guess loop iv compare heuristics of edge\[^:\]*:" "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-3.c b/gcc/testsuite/gcc.dg/predict-3.c
index 663f141..7274963 100644
--- a/gcc/testsuite/gcc.dg/predict-3.c
+++ b/gcc/testsuite/gcc.dg/predict-3.c
@@ -25,4 +25,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 98.0%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 3 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-4.c b/gcc/testsuite/gcc.dg/predict-4.c
index 5779da3..2ac2ec5 100644
--- a/gcc/testsuite/gcc.dg/predict-4.c
+++ b/gcc/testsuite/gcc.dg/predict-4.c
@@ -15,4 +15,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump "loop iv compare heuristics of edge\[^:\]*: 50.0%" "profile_estimate"} } */
+/* { dg-final { scan-tree-dump "  loop iv compare heuristics of edge\[^:\]*: 50.0%" "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-5.c b/gcc/testsuite/gcc.dg/predict-5.c
index 56ada30..135081d 100644
--- a/gcc/testsuite/gcc.dg/predict-5.c
+++ b/gcc/testsuite/gcc.dg/predict-5.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 98.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-6.c b/gcc/testsuite/gcc.dg/predict-6.c
index 9ed41ed..104683f 100644
--- a/gcc/testsuite/gcc.dg/predict-6.c
+++ b/gcc/testsuite/gcc.dg/predict-6.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gfortran.dg/predict-1.f90 b/gcc/testsuite/gfortran.dg/predict-1.f90
new file mode 100644
index 000..78a03f6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/predict-1.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-profile_estimate" }
+

Re: [PATCH][wwwdocs][AArch64] Mention -mcpu=qdf24xx support for GCC 6

2016-06-09 Thread Jim Wilson

On Thu, Jun 9, 2016 at 1:43 AM, Kyrill Tkachov
 wrote:
> On 03/06/16 21:32, Evandro Menezes wrote:
>> Shouldn't this read "The Qualcomm QDF24xx processors are now supported via
>> the"?
>>
> I used Jim's suggestion at
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01423.html.
> I don't have a strong opinion on it either.
> If Jim objects to the wording I'm happy to change it.

I don't have a strong opinion either.  The option was named qdf24xx in
anticipation that there might be more than one closely related part,
but currently there are zero of them, as nothing has been released
yet.

Jim

[committed] MIPS: Stay within 79 columns in `mips_output_jump'

2016-06-09 Thread Maciej W. Rozycki

Hi,

 I have applied this change, as obvious.

gcc/
* config/mips/mips.c (mips_output_jump): Fix formatting.

  Maciej

gcc-mips-output-jump-format.diff
Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  (revision 237266)
+++ gcc/config/mips/mips.c  (working copy)
@@ -13588,8 +13588,9 @@ mips_output_jump (rtx *operands, int tar
   else
s += sprintf (s, "%%*");
 
-  s += sprintf (s, "%s%s%s%s%s\t%%%d%s", insn_name, and_link, reg, 
compact, short_delay,
-   target_opno, nop);
+  s += sprintf (s, "%s%s%s%s%s\t%%%d%s",
+   insn_name, and_link, reg, compact, short_delay,
+   target_opno, nop);
 
   if (!reg_p && TARGET_ABICALLS_PIC2)
s += sprintf (s, "\n\t.option\tpic2");

Re: [Patch, testsuite] Skip some more tests for targets with int size < 32

2016-06-09 Thread Mike Stump

On Jun 8, 2016, at 6:14 AM, Senthil Kumar Selvaraj 
 wrote:
> 
>  This patch requires int32plus support for a few more tests - these
>  were failing for the avr target.

>  If this is ok, could someone commit please? I don't have commit access.

Ok.

Committed revision 237266.

> 2016-06-08  Senthil Kumar Selvaraj  
> 
>   * gcc.c-torture/execute/bswap-2.c: Require int32plus.
>   * gcc.dg/torture/pr68067-1.c: Likewise.
>   * gcc.dg/torture/pr68067-2.c: Likewise.

Re: [Patch, avr] Fix broken stack-usage-1.c test

2016-06-09 Thread Mike Stump

On Jun 8, 2016, at 4:20 AM, Senthil Kumar Selvaraj 
 wrote:
> 
>  I forgot to send this testcase modification with that patch - here's
>  the fix for making gcc.dg/stack-usage-1.c pass again for avr.
> 
>  If this is ok, could someone commit please? I don't have commit
>  access.

Ok.

Committed revision 237266.

> 2016-06-08  Senthil Kumar Selvaraj  
> 
>   * gcc.dg/stack-usage-1.c (SIZE): Consider return address
>  when setting SIZE.

[PATCH] Fix tests for x86 interrupt for -fpic and -march=corei7 targets

2016-06-09 Thread Koval, Julia

Hi,

Here is a trivial patch, that fixes tests for these targets. Ok for trunk?

gcc/testsuite/ChangeLog:
* gcc.target/i386/interrupt-12.c: Fix test for -fpic and corei7.
* gcc.target/i386/interrupt-13.c: Likewise. 
* gcc.target/i386/interrupt-15.c: Likewise.
* gcc.target/i386/interrupt-14.c: Fix test for -fpic.
* gcc.target/i386/interrupt-24.c: Likewise. 
* gcc.target/i386/interrupt-3.c: Fix test for corei7. 
* gcc.target/i386/interrupt-9.c: Likewise. 
* gcc.target/i386/interrupt-redzone-2.c: Likewise.

Br,
Julia


fix_tests_09
Description: fix_tests_09

[hsa-branch 5/5] OMP lowering/expansion changes to gridify tiled loops

2016-06-09 Thread Martin Jambor

Hi,

the patch below is the main part of the series that enhances the
existing gridification code to pattern-match sequences of constructs
in which the distribute, parallel and a loop constructs are not in one
simple combined construct but the step in the distribute loop exactly
matches the iteration size of (possibly many) inner loop constructs.
It also checks other conditions, most notably that an unknown
non-const and non-pure function is not called as part of the
distribute loop but outside of a normal loop construct.

If this pattern is matched, iterations of the distribute loop are
converted to HSA groups and iterations of inner loop constructs to
individual work-items within those groups.  In the code, the inner
loops are converted to a barrier.

I'll commit this to the hsa branch in a few moments and will re-submit
it for trunk at some point in summer.

Thanks,

Martin

2016-06-09  Martin Jambor  

gcc/
* gimple.h (enum gf_mask): New element GF_OMP_FOR_GRID_GROUP_ITER.
(gimple_omp_for_grid_group_iter): New function.
(gimple_omp_for_set_grid_group_iter): Likewise.
* omp-low.c (check_omp_nesting_restrictions): Allow kernel loop in
place of a distribute one.
(grid_expand_omp_for_loop): New parameter specifying whether the loop
is an intra-group one.  If so or if the loop is over groups, use the
respective builtins for expansion.  Emit barriers for intra-group
ones.  Moved a branch removeal here from grid_expand_target_grid_body.
(grid_expand_target_grid_body): If the loop iterates over groups, find
the intra-group ones, expand them and remove them from the OMP
construct tree.
(grid_prop): New type.
(grid_safe_assignment_p): New parameter grid, do not consider safe
assignments to variables holding group sizes.
(grid_seq_only_contains_local_assignments): New parameter grid, pass
it to grid_safe_assignment_p.
(grid_find_single_omp_among_assignments_1): Likewise.  Also tkae
target location from grid.  Emit more missed-optimizations
information.
(grid_find_single_omp_among_assignments): Likewise.
(grid_parallel_clauses_gridifiable): New function.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_dist_follows_simple_pattern): Likewise.
(grid_gfor_follows_tiling_pattern): Likewise.
(grid_call_permissible_in_distribute_p): Likewise.
(grid_handle_call_in_distribute): Likewise.
(grid_dist_follows_tiling_pattern): Likewise.
(grid_target_follows_gridifiable_pattern): New parameter grid.  If
distribute is not in a combined construct, attempt tiled
gridification.  Parts of simple gridification moved to the new
functions.
(grid_var_segment): New enum.
(grid_mark_variable_segment): New function.
(grid_copy_leading_local_assignments): Also call
grid_mark_variable_segment if requested by a new parameter.
(grid_mark_tiling_loops): New function.
(grid_mark_tiling_parallels_and_loops): Likewise.
(grid_process_kernel_body_copy): Also handle tiled grids.
(grid_attempt_target_gridification): new variable grid to be passed
around.  Get group sizes from it.

gcc/testsuite/
* c-c++-common/gomp/gridify-2.c: New test.
* c-c++-common/gomp/gridify-3.c: Likewise.

libgomp/
* testsuite/libgomp.hsa.c/tiling-1.c: New test.
* testsuite/libgomp.hsa.c/tiling-2.c: Likewise.
---
 gcc/gimple.h|   21 +
 gcc/omp-low.c   | 1057 ---
 gcc/testsuite/c-c++-common/gomp/gridify-2.c |   66 ++
 gcc/testsuite/c-c++-common/gomp/gridify-3.c |   68 ++
 libgomp/testsuite/libgomp.hsa.c/tiling-1.c  |  212 ++
 libgomp/testsuite/libgomp.hsa.c/tiling-2.c  |  258 +++
 6 files changed, 1434 insertions(+), 248 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-2.c

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 063e29d..2680a13 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -163,6 +163,7 @@ enum gf_mask {
 GF_OMP_FOR_COMBINED= 1 << 4,
 GF_OMP_FOR_COMBINED_INTO   = 1 << 5,
 GF_OMP_FOR_GRID_PHONY  = 1 << 6,
+GF_OMP_FOR_GRID_GROUP_ITER  = 1 << 7,
 GF_OMP_TARGET_KIND_MASK= (1 << 4) - 1,
 GF_OMP_TARGET_KIND_REGION  = 0,
 GF_OMP_TARGET_KIND_DATA= 1,
@@ -5124,6 +5125,26 @@ gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool 
value)
 omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
 }
 
+/* Return true if iterations of a grid OMP_FOR statement correspond to HSA
+   groups.  */
+
+static inline bool
+gimple_omp_for_grid_group_iter (const gomp_for

[hsa-branch 2/5] Make emit_insn_operands handle zero operands

2016-06-09 Thread Martin Jambor

Hi,

the patch below allows emit_insn_operands to instructions with no
operands gracefully.  Apparently so far we have not produced any.

I'll commit this to the hsa branch in a few moments and then to trunk
at some point in summer.

Martin

2016-06-02  Martin Jambor  

* hsa-brig.c (emit_insn_operands): Cope with zero operands in an
instruction.
---
 gcc/hsa-brig.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 9c74b9a..471533c 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -1236,20 +1236,20 @@ emit_insn_operands (hsa_insn_basic *insn)
 operand_offsets;
 
   unsigned l = insn->operand_count ();
-  operand_offsets.safe_grow (l);
-
-  for (unsigned i = 0; i < l; i++)
-operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
   /* We have N operands so use 4 * N for the byte_count.  */
   uint32_t byte_count = lendian32 (4 * l);
-
   unsigned offset = brig_data.add (_count, sizeof (byte_count));
-  brig_data.add (operand_offsets.address (),
-l * sizeof (BrigOperandOffset32_t));
+  if (l > 0)
+{
+  operand_offsets.safe_grow (l);
+  for (unsigned i = 0; i < l; i++)
+   operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
+  brig_data.add (operand_offsets.address (),
+l * sizeof (BrigOperandOffset32_t));
+}
   brig_data.round_size_up (4);
-
   return offset;
 }
 
-- 
2.8.2

[hsa-branch 3/5] Reorganize HSA branches representation

2016-06-09 Thread Martin Jambor

this patch reorganizes the class hierarchy we use to represent HSA
branching and synchronization instructions.  The ultimate goal is to
find a way of representing a barrier instruction which shares the same
kind with branches.

It basically renames hsa_insn_br, which we have used to represent only
conditional branches anyway, to hsa_insn_cbr and makes it inherit from
a new ancestor that will be used to represent barriers.  The next patch
in the series actually introduces barrier instruction into the IL .

I'll commit this to the hsa branch in a few moments and then to trunk
at some point in summer.

Martin

2016-06-03  Martin Jambor  

* hsa.h (hsa_insn_br): Renamed to hsa_insn_cbr, renamed all
occurences in all files too.
(hsa_insn_br): New class, now the ancestor of hsa_incn_cbr.
(is_a_helper ::test): New function.
(is_a_helper ::test): Adjust to only cover conditional
branch instructions.
* hsa-brig.c (emit_branch_insn): Renamed to emit_cond_branch_insn.
Emit the width stored in the class.
(emit_generic_branch_insn): New function.
(emit_insn): Call emit_generic_branch_insn.
* hsa-dump.c (hsa_width_specifier_name): New function.
(dump_hsa_insn_1): Dump generic branch instructions.
* hsa-gen.c (hsa_insn_br::hsa_insn_br): New.
(hsa_insn_br::operator new): Likewise.
(hsa_insn_cbr::hsa_insn_cbr): Set width via ancestor constructor.
* hsa.c (hsa_destroy_insn): Also handle instances of hsa_insn_br.
---
 gcc/hsa-brig.c | 32 
 gcc/hsa-dump.c | 92 --
 gcc/hsa-gen.c  | 37 +--
 gcc/hsa.c  |  6 ++--
 gcc/hsa.h  | 49 +--
 5 files changed, 188 insertions(+), 28 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 471533c..716d8f5 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -1496,11 +1496,29 @@ emit_cmp_insn (hsa_insn_cmp *cmp)
   brig_insn_count++;
 }
 
-/* Emit an HSA branching instruction and all necessary directives, schedule
-   necessary operands for writing.  */
+/* Emit an HSA generic branching/sycnronization instruction.  */
+
+static void
+emit_generic_branch_insn (hsa_insn_br *br)
+{
+  struct BrigInstBr repr;
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (br->m_opcode);
+  repr.width = br->m_width;
+  repr.base.type = lendian16 (br->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (br));
+  memset (, 0, sizeof (repr.reserved));
+
+  brig_code.add (, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA conditional branching instruction and all necessary directives,
+   schedule necessary operands for writing.  */
 
 static void
-emit_branch_insn (hsa_insn_br *br)
+emit_cond_branch_insn (hsa_insn_cbr *br)
 {
   struct BrigInstBr repr;
 
@@ -1513,7 +1531,7 @@ emit_branch_insn (hsa_insn_br *br)
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
   repr.base.opcode = lendian16 (br->m_opcode);
-  repr.width = BRIG_WIDTH_1;
+  repr.width = br->m_width;
   /* For Conditional jumps the type is always B1.  */
   repr.base.type = lendian16 (BRIG_TYPE_B1);
 
@@ -1885,8 +1903,8 @@ emit_insn (hsa_insn_basic *insn)
 emit_segment_insn (seg);
   else if (hsa_insn_cmp *cmp = dyn_cast  (insn))
 emit_cmp_insn (cmp);
-  else if (hsa_insn_br *br = dyn_cast  (insn))
-emit_branch_insn (br);
+  else if (hsa_insn_cbr *br = dyn_cast  (insn))
+emit_cond_branch_insn (br);
   else if (hsa_insn_sbr *sbr = dyn_cast  (insn))
 {
   if (switch_instructions == NULL)
@@ -1895,6 +1913,8 @@ emit_insn (hsa_insn_basic *insn)
   switch_instructions->safe_push (sbr);
   emit_switch_insn (sbr);
 }
+  else if (hsa_insn_br *br = dyn_cast  (insn))
+emit_generic_branch_insn (br);
   else if (hsa_insn_arg_block *block = dyn_cast  (insn))
 emit_arg_block_insn (block);
   else if (hsa_insn_call *call = dyn_cast  (insn))
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 985caca..3b65684 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -621,6 +621,88 @@ hsa_m_atomicop_name (enum BrigAtomicOperation op)
 }
 }
 
+/* Return textual name for atomic operation.  */
+
+static const char *
+hsa_width_specifier_name (BrigWidth8_t width)
+{
+  switch (width)
+{
+case BRIG_WIDTH_NONE:
+  return "none";
+case BRIG_WIDTH_1:
+  return "1";
+case BRIG_WIDTH_2:
+  return "2";
+case BRIG_WIDTH_4:
+  return "4";
+case BRIG_WIDTH_8:
+  return "8";
+case BRIG_WIDTH_16:
+  return "16";
+case BRIG_WIDTH_32:
+  return "32";
+case BRIG_WIDTH_64:
+  return "64";
+case BRIG_WIDTH_128:
+  return "128";
+case BRIG_WIDTH_256:
+  return "256";
+case BRIG_WIDTH_512:
+  return

[hsa-branch 1/5] Allow putting local variables into group and global segments

2016-06-09 Thread Martin Jambor

Hi,

the following patch adds the capability to put local HSA variables
into the group or global segment as indicated by new declaration
attributes.  In the process I had to fix how we differentiate between
local and global attributes because context and allocation properties
now can diverge.

This patch does not do the things necessary to expose the attributes
to the user.  I wonder whether it even makes sense for OpenMP uses.  I
will do so when we provide the users with a more direct way of
generating HSAIL.

I will commit the patch to the HSA branch in a few moments.  The
ultimate goal is trunk but I'd like to keep it there for a number of
weeks.

Martin

2016-06-02  Martin Jambor  

* hsa-gen.c (get_symbol_for_decl): Fix dinstinguishing between
global and local functions.  Put local variables into a segment
according to their attribute or static flag, if there is one.
---
 gcc/hsa-gen.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index c08f4a8..2ead76a 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -794,8 +794,8 @@ get_symbol_for_decl (tree decl)
 
   dummy.m_decl = decl;
 
-  bool is_in_global_vars
-= TREE_CODE (decl) == VAR_DECL && is_global_var (decl);
+  bool is_in_global_vars = ((TREE_CODE (decl) == VAR_DECL)
+   && !decl_function_context (decl));
 
   if (is_in_global_vars)
 slot = hsa_global_variable_symbols->find_slot (, INSERT);
@@ -861,8 +861,17 @@ get_symbol_for_decl (tree decl)
  /* PARM_DECL and RESULT_DECL should be already in m_local_symbols.  */
  gcc_assert (TREE_CODE (decl) == VAR_DECL);
 
- sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_PRIVATE,
-   BRIG_LINKAGE_FUNCTION);
+ BrigSegment8_t segment;
+ if (lookup_attribute ("hsa_group_segment", DECL_ATTRIBUTES (decl)))
+   segment = BRIG_SEGMENT_GROUP;
+ else if (TREE_STATIC (decl)
+  || lookup_attribute ("hsa_global_segment",
+   DECL_ATTRIBUTES (decl)))
+   segment = BRIG_SEGMENT_GLOBAL;
+ else
+   segment = BRIG_SEGMENT_PRIVATE;
+
+ sym = new hsa_symbol (BRIG_TYPE_NONE, segment, BRIG_LINKAGE_FUNCTION);
  sym->m_align = align;
  sym->fillup_for_decl (decl);
  hsa_cfun->m_private_variables.safe_push (sym);
-- 
2.8.2

[hsa-branch 4/5] New HSA builtins needed for tiling

2016-06-09 Thread Martin Jambor

Hi,

this patch adds two hsa builtins and code for emission of HSAIL for
them and for BUILT_IN_GOMP_BARRIER.  These built-ins are going to be
introduced to GIMPLE IL at OMP expansion time by the subsequent patch
in the series.  I plan to experiment with adding more builtins for
special HSAIL instructions but that is a separate effort.

I'll commit this to the hsa branch in a few moments and then to trunk
at some point in summer.

Thanks,

Martin

2016-06-03  Martin Jambor  

* hsa-builtins.def (BUILT_IN_HSA_GET_WORKGROUP_ID): New.
(BUILT_IN_HSA_GET_WORKITEM_ID): Likewise.
* hsa-gen.c (gen_hsa_insns_for_call): Emit HSAIL for the above
builtins and for BUILT_IN_GOMP_BARRIER.  Move emiting of
BUILT_IN_HSA_GET_WORKITEM_ABSID up in the function.
---
 gcc/hsa-builtins.def |  4 
 gcc/hsa-gen.c| 38 ++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
index e4681c1..3f183f1 100644
--- a/gcc/hsa-builtins.def
+++ b/gcc/hsa-builtins.def
@@ -27,5 +27,9 @@ along with GCC; see the file COPYING3.  If not see
 /* The reason why they aren't in gcc/builtins.def is that the Fortran front end
doesn't source those.  */
 
+DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKGROUP_ID, "hsa_get_workgroup_id",
+BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKITEM_ID, "hsa_get_workitem_id",
+BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_HSA_BUILTIN (BUILT_IN_HSA_GET_WORKITEM_ABSID, "hsa_get_workitem_absid",
 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index fb376a1..efcba8c 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5783,6 +5783,36 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
hbb->append_insn (atominsn);
break;
   }
+
+case BUILT_IN_HSA_GET_WORKGROUP_ID:
+  {
+   hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true);
+   if (bdim->m_type != BRIG_TYPE_U32)
+ bdim->get_in_type (BRIG_TYPE_U32, hbb);
+   query_hsa_grid (stmt, BRIG_OPCODE_WORKGROUPID, bdim, hbb);
+   break;
+  }
+case BUILT_IN_HSA_GET_WORKITEM_ID:
+  {
+   hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true);
+   if (bdim->m_type != BRIG_TYPE_U32)
+ bdim->get_in_type (BRIG_TYPE_U32, hbb);
+   query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMID, bdim, hbb);
+   break;
+  }
+case BUILT_IN_HSA_GET_WORKITEM_ABSID:
+  {
+   hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true);
+   if (bdim->m_type != BRIG_TYPE_U32)
+ bdim->get_in_type (BRIG_TYPE_U32, hbb);
+   query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, bdim, hbb);
+   break;
+  }
+
+case BUILT_IN_GOMP_BARRIER:
+  hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER, 
BRIG_TYPE_NONE,
+BRIG_WIDTH_ALL));
+  break;
 case BUILT_IN_GOMP_PARALLEL:
   {
gcc_checking_assert (gimple_call_num_args (stmt) == 4);
@@ -5798,14 +5828,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
break;
   }
-case BUILT_IN_HSA_GET_WORKITEM_ABSID:
-  {
-   hsa_op_immed *bdim = new hsa_op_immed (gimple_call_arg (stmt, 0), true);
-   if (bdim->m_type != BRIG_TYPE_U32)
- bdim->get_in_type (BRIG_TYPE_U32, hbb);
-   query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, bdim, hbb);
-   break;
-  }
 case BUILT_IN_OMP_GET_THREAD_NUM:
   {
query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
-- 
2.8.2

[hsa-branch 0/5] Gridification support for tiling algorithms

2016-06-09 Thread Martin Jambor

Hi,

this patch series, currently intended just for the branch but
eventually also for trunk in time for gcc 7, enables gridification,
that is the expansion of OpenMP loops for HSA GPUs, to work with
separate distribute and loop construct, provided that the step size of
the distribute loop is equal to the iteration size of iteration space
of the  "normal" loops in it.

It also allows the HSA back-end to emit group-segment variables and
expands any variables private to a distribute construct as such.

Apart from increased flexibility, one of the main motivations is to
enable tiling.  The patches enable the compiler to grok the matrix
multiplication code example and and emit it to HSA, which than runs
2.5 times faster (in my very non-scientific settings) than a naive
implementation (compiled for HSA).

Thanks,

Martin


#define BLOCK_SIZE 16

void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, 
const float*A, const int LDA,
   const float*B, const int LDB, const float beta, float*C, const int LDC){

#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
#pragma omp distribute collapse(2)
   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
  for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
{
 /* Each team has a local copy of these mini matrices */
 float As[BLOCK_SIZE][BLOCK_SIZE];
 float Bs[BLOCK_SIZE][BLOCK_SIZE];
#pragma omp parallel
 {
 int C_row, C_col;
 float Cval = 0.0;

 for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
   {
#pragma omp for collapse(2)
 for (int row=0 ; row < BLOCK_SIZE ; row++)
   for (int col=0 ; col < BLOCK_SIZE ; col++)
 {
   C_row = C_row_start + row;
   C_col = C_col_start + col;
   if ((C_row < M) && (kblock + col < K))
 As[row][col] = A[(C_row*LDA)+ kblock + col];
   else
 As[row][col] = 0;
   if ((kblock + row < K) && C_col < N)
 Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
   else
 Bs[row][col] = 0;
 }

#pragma omp for collapse(2)
 for (int row=0 ; row < BLOCK_SIZE ; row++)
   for (int col=0 ; col < BLOCK_SIZE ; col++)
 {
   for (int e = 0; e < BLOCK_SIZE; ++e)
 Cval += As[row][e] * Bs[e][col];
 }
   }  /* End for kblock .. */


#pragma omp for collapse(2)
 for (int row=0 ; row < BLOCK_SIZE ; row++)
   for (int col=0 ; col < BLOCK_SIZE ; col++)
 {
   C_row = C_row_start + row;
   C_col = C_col_start + col;
   if ((C_row < M) && (C_col < N))
 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];

 }
 } /* end parallel */
  }/* end target teams distribute */
}

[hsa-branch] Fix issue with an undefined builtin

2016-06-09 Thread Martin Jambor

Hi,

when I added HSA-only builtins to the HSA branch, I added them only
conditionally if HSA was enabled, which however broke non-HSA
compilation.  This patch fixes this be removing the ifdefs.

Now that we have reorganized HSA so that all GTY stuff is in one file
(hsa.c), the right thing to do is to conditionally compile all of it.
However, this will mean putting ifdefs also to omp-lowering, which is
something that I would like to do only after we split omp-low.c and
gridification gets its own file.  At the branch this does not do any
real harm anyway.

I will commit this to the branch straight away.

Thanks,

Martin


2016-06-06  Martin Jambor  

* builtins.def: Do not enclose DEF_HSA_BUILTIN by an ifdef ENABLE_HSA.

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2bc933b..4e1c0ac 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -189,14 +189,10 @@ along with GCC; see the file COPYING3.  If not see
|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
 #undef DEF_HSA_BUILTIN
-#ifdef ENABLE_HSA
 #define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
false, false, true, ATTRS, false, \
   (!flag_disable_hsa))
-#else
-#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS)
-#endif
 
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
by the compiler but a few are implemented in libcilkrts.  */

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread David Edelsohn

On Thu, Jun 9, 2016 at 9:10 AM, Jakub Jelinek  wrote:
> On Thu, Jun 09, 2016 at 09:02:27AM -0400, David Edelsohn wrote:
>> On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidt  wrote:
>> > On 06/09/2016 02:21 PM, David Edelsohn wrote:
>> >
>> >> This is a completely unacceptable way to introduce these self-tests.
>> >> Please stop adding self-tests that only are tested on x86 Linux and
>> >> cause bootstrap failures.
>> >
>> >
>> > We have no requirement to test patches on more than one target. I think 
>> > your
>> > request is unreasonable.
>>
>> Bernd,
>>
>> This is a completely inappropriate response.  GCC must maintain a
>> stable, working development base on which developers can work.  GCC
>> specifically supports multiple architectures and targets.
>>
>> GCC is not your personal kingdom and playground.
>
> For patches that the submitter can easily expect problems on some
> architectures, testing there is desirable, but we certainly don't and should
> not require every single generic code change to be expected on all supported
> targets, not even on primary targets.  That just doesn't scale, various
> people submit multiple changes a day and having to wait days or weeks before
> testing is over is unacceptable.
>
> You've reported a problem, David is going to look at it and fix.
> I don't see why it would be unacceptable to add further self-tests, as long
> as there is nothing in them where one should expect issues on targets other
> than the tested one.

I didn't request that every patch be tested on every architecture.
Please don't play a rhetorical game of arguing against the strawman
that you created.

The self-tests specifically abort the build and break bootstrap upon
failure.  Most other changes that inadvertently have bugs or tickle a
latent issue in a target will introduce some additional testsuite
failures, not a bootstrap failure.  x86 developers seem to get quite
annoyed when a patch causes a bootstrap failure for an x86
configuration.

Second, all of the large changes that may have unknown effects on
various targets have been tested extensively on multiple
architectures, as have most global optimization changes.  It may not
be required, but it generally has been considered "good form" and has
been a stipulation of patch approval by some reviewers.  It would be
very unfortunate for GCC to lower the bar for patches by some
developers and not others.

Thanks, David

[PATCH] Handle undefined extern vars in output_in_order

2016-06-09 Thread Alexander Monakov

Hi,

This patch teaches cgraphunit.c:output_in_order to output undefined external
variables via assemble_undefined_decl.  At the moment that is only done for
-ftoplevel-reorder in varpool.c:symbol_table::output_variables.  This patch
makes both behave the same way.  I've also made handling of variables in both
functions look similar to each other.

I'll admit it's rather unclear to me what the '!DECL_EXTERNAL (pv->decl)'
condition is guarding in output_in_order (removed in my patch).  AIUI, it
should be always true for defined variables (or, conversely, if it's false
for a defined variable, why are we skipping it?).

I have plans to reimplement handling of "omp declare target link" in a more
clean manner.  Now it looks quite out of place there.

Bootstrapped and regtested on x86_64, OK for trunk?

Alexander

* cgraphunit.c (cgraph_order_sort_kind): New entry ORDER_VAR_UNDEF.
(output_in_order): Loop over undefined variables too.  Output them
via assemble_undefined_decl.  Skip "omp declare target link" variables
earlier.
* varpool.c (symbol_table::output_variables): Handle undefined
variables together with defined ones.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..5a04902 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2141,6 +2141,7 @@ enum cgraph_order_sort_kind
   ORDER_UNDEFINED = 0,
   ORDER_FUNCTION,
   ORDER_VAR,
+  ORDER_VAR_UNDEF,
   ORDER_ASM
 };
 
@@ -2187,16 +2188,25 @@ output_in_order (bool no_reorder)
}
 }
 
-  FOR_EACH_DEFINED_VARIABLE (pv)
-if (!DECL_EXTERNAL (pv->decl))
-  {
-   if (no_reorder && !pv->no_reorder)
-   continue;
-   i = pv->order;
-   gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
-   nodes[i].kind = ORDER_VAR;
-   nodes[i].u.v = pv;
-  }
+  FOR_EACH_VARIABLE (pv)
+{
+  if (no_reorder && !pv->no_reorder)
+   continue;
+  if (DECL_HARD_REGISTER (pv->decl))
+   continue;
+  if (DECL_HAS_VALUE_EXPR_P (pv->decl))
+   {
+ gcc_checking_assert (lookup_attribute ("omp declare target link",
+DECL_ATTRIBUTES (pv->decl)));
+#ifdef ACCEL_COMPILER
+ continue;
+#endif
+   }
+  i = pv->order;
+  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
+  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
+  nodes[i].u.v = pv;
+}
 
   for (pa = symtab->first_asm_symbol (); pa; pa = pa->next)
 {
@@ -,16 +2232,13 @@ output_in_order (bool no_reorder)
  break;
 
case ORDER_VAR:
-#ifdef ACCEL_COMPILER
- /* Do not assemble "omp declare target link" vars.  */
- if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
- && lookup_attribute ("omp declare target link",
-  DECL_ATTRIBUTES (nodes[i].u.v->decl)))
-   break;
-#endif
  nodes[i].u.v->assemble_decl ();
  break;
 
+   case ORDER_VAR_UNDEF:
+ assemble_undefined_decl (nodes[i].u.v->decl);
+ break;
+
case ORDER_ASM:
  assemble_asm (nodes[i].u.a->asm_str);
  break;
diff --git a/gcc/varpool.c b/gcc/varpool.c
index ab615fa..b513026 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -733,11 +733,6 @@ symbol_table::output_variables (void)
 
   timevar_push (TV_VAROUT);
 
-  FOR_EACH_VARIABLE (node)
-if (!node->definition
-   && !DECL_HAS_VALUE_EXPR_P (node->decl)
-   && !DECL_HARD_REGISTER (node->decl))
-  assemble_undefined_decl (node->decl);
   FOR_EACH_DEFINED_VARIABLE (node)
 {
   /* Handled in output_in_order.  */
@@ -747,20 +742,24 @@ symbol_table::output_variables (void)
   node->finalize_named_section_flags ();
 }
 
-  FOR_EACH_DEFINED_VARIABLE (node)
+  FOR_EACH_VARIABLE (node)
 {
-  /* Handled in output_in_order.  */
   if (node->no_reorder)
continue;
+  if (DECL_HARD_REGISTER (node->decl))
+   continue;
+  if (DECL_HAS_VALUE_EXPR_P (node->decl))
+   {
+ gcc_checking_assert (lookup_attribute ("omp declare target link",
+DECL_ATTRIBUTES (node->decl)));
 #ifdef ACCEL_COMPILER
-  /* Do not assemble "omp declare target link" vars.  */
-  if (DECL_HAS_VALUE_EXPR_P (node->decl)
- && lookup_attribute ("omp declare target link",
-  DECL_ATTRIBUTES (node->decl)))
-   continue;
+ continue;
 #endif
-  if (node->assemble_decl ())
-changed = true;
+   }
+  if (node->definition)
+   changed |= node->assemble_decl ();
+  else
+   assemble_undefined_decl (node->decl);
 }
   timevar_pop (TV_VAROUT);
   return changed;

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 09:02:27AM -0400, David Edelsohn wrote:
> On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidt  wrote:
> > On 06/09/2016 02:21 PM, David Edelsohn wrote:
> >
> >> This is a completely unacceptable way to introduce these self-tests.
> >> Please stop adding self-tests that only are tested on x86 Linux and
> >> cause bootstrap failures.
> >
> >
> > We have no requirement to test patches on more than one target. I think your
> > request is unreasonable.
> 
> Bernd,
> 
> This is a completely inappropriate response.  GCC must maintain a
> stable, working development base on which developers can work.  GCC
> specifically supports multiple architectures and targets.
> 
> GCC is not your personal kingdom and playground.

For patches that the submitter can easily expect problems on some
architectures, testing there is desirable, but we certainly don't and should
not require every single generic code change to be expected on all supported
targets, not even on primary targets.  That just doesn't scale, various
people submit multiple changes a day and having to wait days or weeks before
testing is over is unacceptable.

You've reported a problem, David is going to look at it and fix.
I don't see why it would be unacceptable to add further self-tests, as long
as there is nothing in them where one should expect issues on targets other
than the tested one.

Jakub

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread David Edelsohn

On Thu, Jun 9, 2016 at 8:48 AM, Bernd Schmidt  wrote:
> On 06/09/2016 02:21 PM, David Edelsohn wrote:
>
>> This is a completely unacceptable way to introduce these self-tests.
>> Please stop adding self-tests that only are tested on x86 Linux and
>> cause bootstrap failures.
>
>
> We have no requirement to test patches on more than one target. I think your
> request is unreasonable.

Bernd,

This is a completely inappropriate response.  GCC must maintain a
stable, working development base on which developers can work.  GCC
specifically supports multiple architectures and targets.

GCC is not your personal kingdom and playground.

Thanks, David

>
> Have you at least investigated whether the self-test exposes a real bug or
> not?
>
>
> Bernd

Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-09 Thread Bernd Schmidt


On 06/09/2016 02:21 PM, David Edelsohn wrote:


This is a completely unacceptable way to introduce these self-tests.
Please stop adding self-tests that only are tested on x86 Linux and
cause bootstrap failures.


We have no requirement to test patches on more than one target. I think 
your request is unreasonable.


Have you at least investigated whether the self-test exposes a real bug 
or not?



Bernd

Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 02:40:43PM +0200, Christophe Lyon wrote:
> > Bet it depends if this happens before the signal(SIGILL, sig_ill_handler);
> > call or after it.  If before, then I guess you'd better rewrite the
> > long long a = 0, b = 1;
> > asm ("vorr %P0, %P1, %P2"
> >  : "=w" (a)
> >  : "0" (a), "w" (b));
> > if (a != 1)
> 
> Of course you are right: it happens just before the call to signal,
> to build the sig_ill_handler address in r1.
> 
> So it's not even a problem with rewriting the asm.

Ugh, so the added options don't affect just vectorized code, but normal
integer only code?
check_vect is fragile, there is always a risk that some instruction is
scheduled before the call.
If you have working target attribute support, I think you should compile
check_vect with attribute set to some lowest common denominator that every
ARM CPU supports (if there is any, that is).  Though most likely you'll need
to tweak the inline asm, because maybe "w" constraint won't be available
then.

Jakub

Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-09 Thread Christophe Lyon

On 9 June 2016 at 14:31, Jakub Jelinek  wrote:
> On Thu, Jun 09, 2016 at 02:18:44PM +0200, Christophe Lyon wrote:
>> On 8 June 2016 at 16:50, Jakub Jelinek  wrote:
>> > On Wed, Jun 08, 2016 at 04:44:00PM +0200, Christophe Lyon wrote:
>> >> I've tried the attached patch (which does only dg-options ->
>> >> dg-additional-options).
>> >> For GCC, it's better, except that on arm-none-eabi qemu complains about
>> >> an illegal instruction when asked to use arm926 and GCC is configured with
>> >> the default cpu. Maybe that's because check_vect does not have the 
>> >> expected
>> >
>> > check_vect installs a SIGILL handler and if the insn is invalid, excepts
>> > a signal to be raised.  Is that not the case with qemu?  Or is qemu just
>> > being too noisy?
>> >
>> qemu complains when executing check_vect's prologue, which contains
>> movwr1, #35712
>> which is not supported either on arm926.
>
> Bet it depends if this happens before the signal(SIGILL, sig_ill_handler);
> call or after it.  If before, then I guess you'd better rewrite the
> long long a = 0, b = 1;
> asm ("vorr %P0, %P1, %P2"
>  : "=w" (a)
>  : "0" (a), "w" (b));
> if (a != 1)

Of course you are right: it happens just before the call to signal,
to build the sig_ill_handler address in r1.

So it's not even a problem with rewriting the asm.

> fully into inline asm, if after, then it is likely either just too noisy
> qemu, or misdesigned issue in qemu.
>
> Jakub

Re: Maintain loop iteration count estimates

2016-06-09 Thread Jan Hubicka

> On Thu, 9 Jun 2016, Jan Hubicka wrote:
> 
> > Hi,
> > after we read the profile, we know expected number of iterations.
> 
> We know the average ;)  It may make sense to add some histogram
> value profiling for niter now that we should easily able to do so.

I always interpreted the estimated number of iterations to be the same as
expected number of iterations and to be same as average.  So it seems to be
sane to feed the info from profile.

I am thinking to add the histograms, yes.  It is midly anoying to do so becuase
one needs to intrstrument all exit edges out of loop. I guess we don't care
much if histogram will get lost on abnormals.

One option I am thinking about is to introduce counter that will take two
parameters A,B.  It will record linear histogram for values in range 0...A and
logarithmic histogram for values greater than A capping by 2^B (so we don't
need 64 counters for every loop).  I think we are not really that interested in
precise histograms for loops that iterate more than, say, 2^10 times. We only
need to know that it loops a lot.  We however care about low iteration counts
to make good decision for peeling.  This is still 26 counters per loop that is
quite a lot.  

A lot cheaper alternative may be to simply measure loop peeling limit by having
counter that counts how often loop exits in first PARAM_MAX_PEEL_TIMES 
iterations
and second counter that measure what is the maximal number of iterations in this
case (we really want likely maximal number of iterations but that seems harder 
to
get). This will determine peeling limit which we can then store into loop 
structure.

Vectorizer/unroller/prefetcher and most of other classical loop opts care about
the average being large enough so the current expected value seems to do the
trick.

This does not let you to update profile very precisely after peeling (and also 
peeling
done by vectorizer), but it needs only 2 counters per loop.

What other passes, beside peeling, would immediately benefit from a historgram?
I wonder if you can think of better scheme?
> 
> > Currently we use profile each time estimate_numbers_of_iterations_loop
> > is called to recompute this value.  This is not very safe because the
> > profile may be misupdated.  It seems safer to compute it at once and
> > maintain thorough the compilation.
> > 
> > Notice that I removed:
> > -  /* Force estimate compuation but leave any existing upper bound in 
> > place.  */
> > -  loop->any_estimate = false;
> > From beggining of estimate_numbers_of_iterations_loop.  I can not make 
> > sense of
> > this. Even w/o profile if we have estimate, we are better to maintain it 
> > because
> > later we may not be able to derive it again.
> > There seems to be no code that is forced by setting loop->any_estimate = 
> > true.
> > Only code that cares seems to be record_niter_bound that only decreases 
> > existing
> > estimates. THis seems sane procedure - we don't roll loops.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Ok.  Did you check what this does to SPEC with FDO?

I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I have
readily available.  Any regressions would however point to loop info updating
bugs (or wrong use of the profile) so I will look into them if they appear.
I am trying to benchmark firefox now.

Honza

Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-09 Thread Jakub Jelinek

On Thu, Jun 09, 2016 at 02:18:44PM +0200, Christophe Lyon wrote:
> On 8 June 2016 at 16:50, Jakub Jelinek  wrote:
> > On Wed, Jun 08, 2016 at 04:44:00PM +0200, Christophe Lyon wrote:
> >> I've tried the attached patch (which does only dg-options ->
> >> dg-additional-options).
> >> For GCC, it's better, except that on arm-none-eabi qemu complains about
> >> an illegal instruction when asked to use arm926 and GCC is configured with
> >> the default cpu. Maybe that's because check_vect does not have the expected
> >
> > check_vect installs a SIGILL handler and if the insn is invalid, excepts
> > a signal to be raised.  Is that not the case with qemu?  Or is qemu just
> > being too noisy?
> >
> qemu complains when executing check_vect's prologue, which contains
> movwr1, #35712
> which is not supported either on arm926.

Bet it depends if this happens before the signal(SIGILL, sig_ill_handler);
call or after it.  If before, then I guess you'd better rewrite the
long long a = 0, b = 1;
asm ("vorr %P0, %P1, %P2"
 : "=w" (a)
 : "0" (a), "w" (b));
if (a != 1)
fully into inline asm, if after, then it is likely either just too noisy
qemu, or misdesigned issue in qemu.

Jakub

1 2 >

1 - 100 of 123 matches

Mail list logo