C++ PATCH for missing feature test macros

2015-10-19 Thread Jason Merrill
Someone pointed out that we were missing feature test macros for some 
recently implemented C++1z features; this patch fixes that.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit cb8271fe1b49b984fa38dc6cf1d3ed25462afd20
Author: Jason Merrill 
Date:   Sun Oct 18 10:15:36 2015 -1000

	* c-cppbuiltin.c (c_cpp_builtins): Define
	__cpp_enumerator_attributes, __cpp_fold_expressions,
	__cpp_unicode_characters.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 35d246b..cd6fd51 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -833,7 +833,8 @@ c_cpp_builtins (cpp_reader *pfile)
   if (cxx_dialect >= cxx11)
 	{
 	  /* Set feature test macros for C++11.  */
-	  cpp_define (pfile, "__cpp_unicode_characters=200704");
+	  if (cxx_dialect <= cxx14)
+	cpp_define (pfile, "__cpp_unicode_characters=200704");
 	  cpp_define (pfile, "__cpp_raw_strings=200710");
 	  cpp_define (pfile, "__cpp_unicode_literals=200710");
 	  cpp_define (pfile, "__cpp_user_defined_literals=200809");
@@ -869,9 +870,12 @@ c_cpp_builtins (cpp_reader *pfile)
   if (cxx_dialect > cxx14)
 	{
 	  /* Set feature test macros for C++1z.  */
+	  cpp_define (pfile, "__cpp_unicode_characters=201411");
 	  cpp_define (pfile, "__cpp_static_assert=201411");
 	  cpp_define (pfile, "__cpp_namespace_attributes=201411");
+	  cpp_define (pfile, "__cpp_enumerator_attributes=201411");
 	  cpp_define (pfile, "__cpp_nested_namespace_definitions=201411");
+	  cpp_define (pfile, "__cpp_fold_expressions=201411");
 	}
   if (flag_concepts)
 	/* Use a value smaller than the 201507 specified in
diff --git a/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1.C b/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1.C
new file mode 100644
index 000..1f8f848
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1.C
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++1z" }
+
+#ifndef __cpp_enumerator_attributes
+#error __cpp_enumerator_attributes not defined
+#endif
+
+#if __cpp_enumerator_attributes != 201411
+#error Wrong value for __cpp_enumerator_attributes
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1a.C b/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1a.C
new file mode 100644
index 000..f321ba1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/attributes-enum-1a.C
@@ -0,0 +1,5 @@
+// This macro should not be defined without -std=c++1z.
+
+#ifdef __cpp_enumerator_attributes
+#error __cpp_enumerator_attributes defined
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold7.C b/gcc/testsuite/g++.dg/cpp1z/fold7.C
new file mode 100644
index 000..3e6925a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold7.C
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++1z" }
+
+#ifndef __cpp_fold_expressions
+#error __cpp_fold_expressions not defined
+#endif
+
+#if __cpp_fold_expressions != 201411
+#error Wrong value for __cpp_fold_expressions
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold7a.C b/gcc/testsuite/g++.dg/cpp1z/fold7a.C
new file mode 100644
index 000..d56cefb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold7a.C
@@ -0,0 +1,5 @@
+// This macro should not be defined without -std=c++1z.
+
+#ifdef __cpp_fold_expressions
+#error __cpp_fold_expressions defined
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/utf8-2.C b/gcc/testsuite/g++.dg/cpp1z/utf8-2.C
new file mode 100644
index 000..152762f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/utf8-2.C
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++1z" }
+
+#ifndef __cpp_unicode_characters
+#error __cpp_unicode_characters not defined
+#endif
+
+#if __cpp_unicode_characters != 201411
+#error Wrong value for __cpp_unicode_characters
+#endif
diff --git a/gcc/testsuite/g++.dg/cpp1z/utf8-2a.C b/gcc/testsuite/g++.dg/cpp1z/utf8-2a.C
new file mode 100644
index 000..9985cd0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/utf8-2a.C
@@ -0,0 +1,5 @@
+// This macro should not be 201411 without -std=c++1z.
+
+#if __cpp_unicode_characters == 201411
+#error Wrong value for __cpp_unicode_characters
+#endif


C++ PATCH for N4268, constant evaluation of non-type template arguments

2015-10-19 Thread Jason Merrill
N4268 is the last bullet point remaining for C++1z features until 
Saturday, when presumably more stuff will be added.  The basic 
functionality is quite straightforward: first generalize parsing, then 
pass all arguments through maybe_constant_value.  But to get correct 
handling of pointers to members, we need to prevent constexpr evaluation 
from lowering PTRMEM_CST until it's really needed, namely if we're 
trying to use one in a .* expression or we're handing off to the 
language-independent code.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 24ba0af22de3694ee1271a053cea04a35b0c71f1
Author: Jason Merrill 
Date:   Sun Oct 18 22:57:24 2015 -1000

	Implement N4268, Do constant evaluation of all non-type template args.

gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Define
	__cpp_nontype_template_args.
gcc/cp/
	* parser.c (cp_parser_template_argument): For C++1z just parse a
	constant-expression.
	* pt.c (convert_nontype_argument): For C++1z always call
	maybe_constant_value.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index cd6fd51..c6c3d6b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -876,6 +876,7 @@ c_cpp_builtins (cpp_reader *pfile)
 	  cpp_define (pfile, "__cpp_enumerator_attributes=201411");
 	  cpp_define (pfile, "__cpp_nested_namespace_definitions=201411");
 	  cpp_define (pfile, "__cpp_fold_expressions=201411");
+	  cpp_define (pfile, "__cpp_nontype_template_args=201411");
 	}
   if (flag_concepts)
 	/* Use a value smaller than the 201507 specified in
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 435757d..f07a5e4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14977,8 +14977,12 @@ cp_parser_template_argument (cp_parser* parser)
 	warn_deprecated_use (argument, NULL_TREE);
   return argument;
 }
-  /* It must be a non-type argument.  There permitted cases are given
- in [temp.arg.nontype]:
+  /* It must be a non-type argument.  In C++17 any constant-expression is
+ allowed.  */
+  if (cxx_dialect > cxx14)
+goto general_expr;
+
+  /* Otherwise, the permitted cases are given in [temp.arg.nontype]:
 
  -- an integral constant-expression of integral or enumeration
 	type; or
@@ -15090,6 +15094,7 @@ cp_parser_template_argument (cp_parser* parser)
   return error_mark_node;
 }
 
+ general_expr:
   /* If the argument wasn't successfully parsed as a type-id followed
  by '>>', the argument can only be a constant expression now.
  Otherwise, we try parsing the constant-expression tentatively,
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 14a5ddd..142245a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6233,6 +6233,17 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
 	   CONSTRUCTOR.  */;
   else if (INTEGRAL_OR_ENUMERATION_TYPE_P (type))
 	expr = maybe_constant_value (expr);
+  else if (cxx_dialect >= cxx1z)
+	{
+	  if (TREE_CODE (type) != REFERENCE_TYPE)
+	expr = maybe_constant_value (expr);
+	  else if (REFERENCE_REF_P (expr))
+	{
+	  expr = TREE_OPERAND (expr, 0);
+	  expr = maybe_constant_value (expr);
+	  expr = convert_from_reference (expr);
+	}
+	}
   else if (TYPE_PTR_OR_PTRMEM_P (type))
 	{
 	  tree folded = maybe_constant_value (expr);
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype1.C b/gcc/testsuite/g++.dg/cpp1z/nontype1.C
new file mode 100644
index 000..bb46a99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype1.C
@@ -0,0 +1,25 @@
+// { dg-options -std=c++1z }
+
+struct S { int m; static int s; } s;
+
+struct T: S { };
+
+template class X { };
+template class M {};
+template class SP {};
+
+constexpr int *p = ::s;
+constexpr int S::*sp = ::m;
+constexpr S& sr = s;
+constexpr S& sf() { return s; }
+
+// { dg-final { scan-assembler "_Z1g1XIXadL_ZN1S1s" } }
+void g (X<>) { }
+// { dg-final { scan-assembler "_Z1f1XIXadL_ZN1S1s" } }
+void f (X) { }
+// { dg-final { scan-assembler "_Z1f1MIXadL_ZN1S1m" } }
+void f (M) {}
+// { dg-final { scan-assembler "_Z1f2SPIL_Z1sEE" } }
+void f (SP) {}
+// { dg-final { scan-assembler "_Z1g2SPIL_Z1sEE" } }
+void g (SP) {}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype2.C b/gcc/testsuite/g++.dg/cpp1z/nontype2.C
new file mode 100644
index 000..e489476
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype2.C
@@ -0,0 +1,18 @@
+// { dg-options -std=c++1z }
+
+#include 
+
+struct S { int m; static int s; } s;
+
+template class X { };
+template class Y {};
+template class Z {};
+
+X<> x7;			// { dg-error "" }
+Y<"foo"> y1;			// { dg-error "string literal" }
+Z z1;		// { dg-error "" }
+
+void f()
+{
+  Y<__func__> y;		// { dg-error "" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype3.C b/gcc/testsuite/g++.dg/cpp1z/nontype3.C
new file mode 100644
index 000..886d7a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype3.C
@@ -0,0 +1,9 @@
+// { dg-options 

Re: [PATCH] c/67925 - update documentation on `inline'

2015-10-19 Thread Jeff Law

On 10/15/2015 06:25 AM, Arkadiusz Drabczyk wrote:

On Wed, Oct 14, 2015 at 06:18:03PM -0600, Martin Sebor wrote:

>On 10/14/2015 03:42 PM, Arkadiusz Drabczyk wrote:

> >On Wed, Oct 14, 2015 at 08:36:43AM -0600, Martin Sebor wrote:

> >>On 10/13/2015 04:47 PM, Arkadiusz Drabczyk wrote:

> >>>* gcc/doc/extend.texi: documentation says that functions declared
> >>>`inline' would not be integrated if they are called before they are
> >>>defined or if they are recursive. Both of these statements is now
> >>>false as shown in examples on Bugzilla.

> >>
> >>It might also be worth updating the note in the subsequent
> >>paragraph and removing the mention of variable-length data types
> >>which no longer prevent inlining.

> >
> >Done.  I also removed the mention of nested functions as the following
> >code compiled with GCC 6.0 doesn't give any warning with -O2 -Winline
> >and main() is the only function defined in assembler code:

>
>I think this is the Ada-specific warning I mentioned (see
>check_inlining_for_nested_subprog in gcc/ada/gcc-interface/trans.c)
>so the part about nested functions needs to stay.

Ok, I brought it back.


> >> = G_("function %q+F can never be inlined because "
> >>  "it uses non-local goto");

> >
> >I tested of all of these and listed them in the documentation but
> >wasn't able to reproduce this one.  The following code does not give
> >any warning with -O2 -Winline:

>
>The warning above is issued for non-local and computed goto. You can
>find examples of both in the test suite (find gcc/testsuite/ -name
>"*goto*.[cC]")

Aha, ok, I just thought that longjmp() counts as a non-local goto
here.  Indeed, this code invokes `function 'bar' can never be inlined
because it uses non-local goto' with -O2 -Winline (but only if
function that contains goto is nested, otherwise it fails to compile):

#include 
#include 

int main (void)
{
__label__ l1;

void foo (void)
{

inline void bar (void)
{
puts ("goto l1");
goto l1;
}

bar ();
}

foo ();
abort ();
  l1:
puts ("label l1");
return 0;
}

I brought back a mention of nonlocal goto and added a mention of
__builtin_longjmp() as the usual longjmp() does not prevent inlining.

Thanks Arkadiusz & Martin.  I committed this version to the trunk.

jeff



Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Alan Modra
On Mon, Oct 19, 2015 at 08:10:32PM +0100, Szabolcs Nagy wrote:
> On 19/10/15 14:04, Szabolcs Nagy wrote:
> >On 19/10/15 12:12, Alan Modra wrote:
> >>On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:
> >>>A powerpc toolchain built with (or without) --enable-secureplt
> >>>currently creates a binary that uses bss plt if
> >>>
> >>>(1) any of the linked PIC objects have bss plt relocs
> >>>(2) or all the linked objects are non-PIC or have no relocs,
> >>>
> >>>because this is the binutils linker behaviour.
> >>>
> >>>This patch passes --secure-plt to the linker which makes the linker
> >>>warn in case (1) and produce a binary with secure plt in case (2).
> >>
> >>The idea is OK I think, but
> >>
> >>>@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", 
> >>>DEFAULT_ASM_ENDIAN)
> >>>  %{R*} \
> >>>  %(link_shlib) \
> >>>  %{!T*: %(link_start) } \
> >>>+%{!static: %(link_secure_plt_default)} \
> >>>  %(link_os)"
> >>
> >>this change needs to be conditional on !mbss-plt too.
> >>
> >
> >OK, will change that.
> >
> >if -msecure-plt and -mbss-plt are supposed to affect
> >linking too (not just code gen) then shall i add
> >%{msecure-plt: --secure-plt} too?
> >
> 
> I added !mbss-plt only for now as a mix of -msecure-plt
> and -mbss-plt options do not cancel each other in gcc,

They do for code-gen since they share the same variable (see
sysv4.opt), but I guess you meant as far as spec parsing goes.  In
hindsight, it might have been better if I'd spelled -mbss-plt as
-mno-secure-plt.

> the patch only changes behaviour for a secureplt toolchain.
> 
> OK to commit?

Apologies for not thinking of this before when I first reviewed the
patch, but have you bootstrapped this patch on powerpc64-linux?  I'm
guessing not, because it occurs to me that --secure-plt is not a
powerpc64-linux-ld option.  So if you try to build a biarch compiler
with --enable-secure-plt then ld will complain when attempting to link
64-bit binaries.

You'll want this on top of your patch:

diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 9599735..01fb880 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -174,20 +174,24 @@ extern int dot_symbols;
 #undef ASM_DEFAULT_SPEC
 #undef ASM_SPEC
 #undef LINK_OS_LINUX_SPEC
+#undef LINK_SECURE_PLT_SPEC
 
 #ifndefRS6000_BI_ARCH
 #defineASM_DEFAULT_SPEC "-mppc64"
 #defineASM_SPEC "%(asm_spec64) %(asm_spec_common)"
 #defineLINK_OS_LINUX_SPEC "%(link_os_linux_spec64)"
+#defineLINK_SECURE_PLT_SPEC ""
 #else
 #if DEFAULT_ARCH64_P
 #defineASM_DEFAULT_SPEC "-mppc%{!m32:64}"
 #defineASM_SPEC "%{m32:%(asm_spec32)}%{!m32:%(asm_spec64)} 
%(asm_spec_common)"
 #defineLINK_OS_LINUX_SPEC 
"%{m32:%(link_os_linux_spec32)}%{!m32:%(link_os_linux_spec64)}"
+#defineLINK_SECURE_PLT_SPEC "%{m32: " LINK_SECURE_PLT_DEFAULT_SPEC "}"
 #else
 #defineASM_DEFAULT_SPEC "-mppc%{m64:64}"
 #defineASM_SPEC "%{!m64:%(asm_spec32)}%{m64:%(asm_spec64)} 
%(asm_spec_common)"
 #defineLINK_OS_LINUX_SPEC 
"%{!m64:%(link_os_linux_spec32)}%{m64:%(link_os_linux_spec64)}"
+#defineLINK_SECURE_PLT_SPEC "%{!m64: " LINK_SECURE_PLT_DEFAULT_SPEC "}"
 #endif
 #endif
 
diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 93499e8..1bb400f 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -570,6 +570,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
: %(link_start_default) }"
 
 #define LINK_START_DEFAULT_SPEC ""
+#define LINK_SECURE_PLT_SPEC LINK_SECURE_PLT_DEFAULT_SPEC
 
 #undef LINK_SPEC
 #defineLINK_SPEC "\
@@ -577,7 +578,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 %{R*} \
 %(link_shlib) \
 %{!T*: %(link_start) } \
-%{!static: %{!mbss-plt: %(link_secure_plt_default)}} \
+%{!static: %{!mbss-plt: %(link_secure_plt)}} \
 %(link_os)"
 
 /* Shared libraries are not default.  */
@@ -893,7 +894,7 @@ ncrtn.o%s"
   { "link_os_openbsd", LINK_OS_OPENBSD_SPEC }, \
   { "link_os_default", LINK_OS_DEFAULT_SPEC }, \
   { "cc1_secure_plt_default",  CC1_SECURE_PLT_DEFAULT_SPEC },  \
-  { "link_secure_plt_default", LINK_SECURE_PLT_DEFAULT_SPEC }, \
+  { "link_secure_plt", LINK_SECURE_PLT_SPEC }, \
   { "cpp_os_ads",  CPP_OS_ADS_SPEC },  \
   { "cpp_os_yellowknife",  CPP_OS_YELLOWKNIFE_SPEC },  \
   { "cpp_os_mvme", CPP_OS_MVME_SPEC }, \


-- 
Alan Modra
Australia Development Lab, IBM


Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-19 Thread Jeff Law

On 10/14/2015 01:15 PM, Bernd Schmidt wrote:

On 10/14/2015 07:43 PM, Jeff Law wrote:

Obviously some pessimization relative to current code is necessary to
fix some of the problems WRT thread safety and avoiding things like
introducing faults in code which did not previously fault.


Huh? This patch is purely an (attempt at) optimization, not something
that fixes any problems.
Then I must be mentally merging two things Abe has been working on then. 
 He's certainly had an if-converter patch that was designed to avoid 
introducing races in code that didn't previously have races.


Looking back through the archives that appears to be the case. His 
patches to avoid racing are for the tree level if converter, not the RTL 
if converter.


Sigh, sorry for the confusion.  It's totally my fault.  Assuming Abe 
doesn't have a correctness case at all here, then I don't see any way 
for the code to go forward as-is since it's likely making things 
significantly worse.




I can't test valgrind right now, it fails to run on my machine, but I
guess it could adapt to allow stores slightly below the stack (maybe
warning once)? It seems like a bit of an edge case to worry about, but
if supporting it is critical and it can't be changed to adapt to new
optimizations, then I think we're probably better off entirely without
this scratchpad transformation.

Alternatively I can think of a few other possible approaches which
wouldn't require this kind of bloat:
  * add support for allocating space in the stack redzone. That could be
interesting for the register allocator as well. Would help only
x86_64, but that's a large fraction of gcc's userbase.
  * add support for opportunistically finding unused alignment padding
in the existing stack frame. Less likely to work but would produce
better results when it does.
  * on embedded targets we probably don't have to worry about valgrind,
so do the optimal (sp - x) thing there
  * allocate a single global as the dummy target. Might be more
expensive to load the address on some targets though.
  * at least find a way to express costs for this transformation.
Difficult since you don't yet necessarily know if the function is
going to have a stack frame. Hence, IMO this approach is flawed.
(You'll still want cost estimates even when not allocating stuff in
the normal stack frame, because generated code will still execute
between two and four extra instructions).
One could argue these should all be on the table.  However, I tend to 
really dislike using area beyond the current stack.  I realize it's 
throw-away data, but it just seems like a bad idea to me -- even on 
embedded targets that don't support valgrind.





Re: [PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins

2015-10-19 Thread Alan Lawrence

On 14/10/15 23:02, Charles Baylis wrote:

On 12 October 2015 at 11:58, Alan Lawrence  wrote:

>

Given we are making changes here to how this all works on bigendian, have
you tested armeb at all?


I tested on big endian, and it passes, except


Well, I asked because it seemed good to make sure that the changes/improvements 
to how lane-swapping was done, wasn't breaking anything on armeb by the back 
door, and so thank you, I'm happy with that as far as your patch is concerned ;).



for a testsuite issue
with the *_f16 tests, which fail because they are built without the
fp16 options on big endian. This is because
check_effective_target_arm_neon_fp16_ok_nocache gets an ICE when it
attempts to compile the test program. I think those fp16 intrinsics
are in your area, do you want to take a look? :)


Heh, yes, I see ;). So I've dug into this a bit, and the problem seems to be 
that we don't define a movv4hf pattern, and hence, we fall back to 
emit_multi_word_move. This uses subregs, and in simplify_subreg_regno, 
REG_CANNOT_CHANGE_MODE_P is true on bigendian (but false on little-endian).


That is, I *think* the right thing to do is just to add a movv4hf (and v8hf) 
pattern, I'm testing this now


Cheers, Alan



Re: [vec-cmp, patch 7/6] Vector comparison enabling in SLP

2015-10-19 Thread Jeff Law

On 10/19/2015 05:06 AM, Ilya Enkovich wrote:

Hi,

It appeared our testsuite doesn't have a test which would require vector 
comparison support in SLP even after boolean pattern disabling.  This patch 
adds such tests and allow comparison for SLP.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_build_slp_tree_1): Allow
comparison statements.
(vect_get_constant_vectors): Support boolean vector
constants.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.dg/vect/slp-cond-5.c: New test.

OK once any prerequisites are approved.

jeff



Re: [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in continuations

2015-10-19 Thread Thomas Schwinge
Hi!

Ping...

On Fri, 9 Oct 2015 12:15:24 +0200, I wrote:
> On Mon, 27 Jul 2015 16:14:17 +0200, I wrote:
> > On Tue, 30 Jun 2015 03:39:42 +0300, Ilmir Usmanov  wrote:
> > > 08.06.2015, 17:59, "Cesar Philippidis" :
> > > > On 06/07/2015 02:05 PM, Ilmir Usmanov wrote:
> > > >>  08.06.2015, 00:01, "Ilmir Usmanov" :
> > >   This patch fixes checks of OpenMP and OpenACC continuations in
> > >   case if someone mixes them (i.e. continues OpenMP directive with
> > >   !$ACC sentinel or vice versa).
> > 
> > Thanks for working on this!
> > 
> > >   OK for gomp branch?
> > 
> > The same applies to GCC trunk, as far as I can tell -- any reason not to
> > apply the patch to trunk?
> 
> Ping -- OK to commit the following (by Ilmir) to trunk:
> 
> commit 38e62678ef11f349f029d42439668071f170e059
> Author: Ilmir Usmanov 
> Date:   Sun Jul 26 12:10:36 2015 +
> 
> [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in 
> continuations
> 
>   gcc/fortran/
>   PR fortran/63858
>   * scanner.c (skip_omp_attribute_fixed, skip_oacc_attribute_fixed):
>   New functions.
>   (skip_fixed_comments, gfc_next_char_literal): Fix mix of OpenACC
>   and OpenMP sentinels in continuation.
>   gcc/testsuite/
>   PR fortran/63858
>   * gfortran.dg/goacc/omp-fixed.f: New file.
>   * gfortran.dg/goacc/omp.f95: Extend.
> ---
>  gcc/fortran/scanner.c   | 258 
> +---
>  gcc/testsuite/gfortran.dg/goacc/omp-fixed.f |  32 
>  gcc/testsuite/gfortran.dg/goacc/omp.f95 |  10 +-
>  3 files changed, 199 insertions(+), 101 deletions(-)
> 
> diff --git gcc/fortran/scanner.c gcc/fortran/scanner.c
> index bfb7d45..1e1ea84 100644
> --- gcc/fortran/scanner.c
> +++ gcc/fortran/scanner.c
> @@ -935,6 +935,63 @@ skip_free_comments (void)
>return false;
>  }
>  
> +/* Return true if MP was matched in fixed form.  */
> +static bool
> +skip_omp_attribute_fixed (locus *start)
> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'm' || c == 'M')
> +  && ((c = next_char ()) == 'p' || c == 'P'))
> +{
> +  c = next_char ();
> +  if (c != '\n'
> +   && (continue_flag
> +   || c == ' ' || c == '\t' || c == '0'))
> + {
> +   do
> + c = next_char ();
> +   while (gfc_is_whitespace (c));
> +   if (c != '\n' && c != '!')
> + {
> +   /* Canonicalize to *$omp.  */
> +   *start->nextc = '*';
> +   openmp_flag = 1;
> +   gfc_current_locus = *start;
> +   return true;
> + }
> + }
> +}
> +  return false;
> +}
> +
> +/* Return true if CC was matched in fixed form.  */
> +static bool
> +skip_oacc_attribute_fixed (locus *start)
> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'c' || c == 'C')
> +  && ((c = next_char ()) == 'c' || c == 'C'))
> +{
> +  c = next_char ();
> +  if (c != '\n'
> +   && (continue_flag
> +   || c == ' ' || c == '\t' || c == '0'))
> + {
> +   do
> + c = next_char ();
> +   while (gfc_is_whitespace (c));
> +   if (c != '\n' && c != '!')
> + {
> +   /* Canonicalize to *$omp.  */
> +   *start->nextc = '*';
> +   openacc_flag = 1;
> +   gfc_current_locus = *start;
> +   return true;
> + }
> + }
> +}
> +  return false;
> +}
>  
>  /* Skip comment lines in fixed source mode.  We have the same rules as
> in skip_free_comment(), except that we can have a 'c', 'C' or '*'
> @@ -1003,128 +1060,92 @@ skip_fixed_comments (void)
> && continue_line < gfc_linebuf_linenum (gfc_current_locus.lb))
>   continue_line = gfc_linebuf_linenum (gfc_current_locus.lb);
>  
> -   if (flag_openmp || flag_openmp_simd)
> +   if ((flag_openmp || flag_openmp_simd) && !flag_openacc)
>   {
> if (next_char () == '$')
>   {
> c = next_char ();
> if (c == 'o' || c == 'O')
>   {
> -   if (((c = next_char ()) == 'm' || c == 'M')
> -   && ((c = next_char ()) == 'p' || c == 'P'))
> - {
> -   c = next_char ();
> -   if (c != '\n'
> -   && ((openmp_flag && continue_flag)
> -   || c == ' ' || c == '\t' || c == '0'))
> - {
> -   do
> - c = next_char ();
> -   while (gfc_is_whitespace (c));
> -   if (c != '\n' && c != '!')
> - {
> -   /* Canonicalize to *$omp.  */
> -   *start.nextc = '*';
> -   openmp_flag = 1;
> -   gfc_current_locus = start;
> -

[HSA] Class refactoring

2015-10-19 Thread Martin Liška

Hello.

Following tarball is made of patches that rename class member variables to 
respect
C++ coding style, where all member variables should begin with 'm_'.

Martin


hsa-class-refactoring.tar.bz2
Description: application/bzip


Re: Move cabs simplifications to match.pd

2015-10-19 Thread Richard Biener
On October 19, 2015 4:42:23 PM GMT+02:00, Richard Sandiford 
 wrote:
>The fold code also expanded cabs(x+yi) to fsqrt(x*x+y*y) when
>optimising
>for speed.  tree-ssa-math-opts.c has this transformation too, but
>unlike
>the fold code, it first checks whether the target implements the sqrt
>optab.  The patch simply removes the fold code and keeps the
>tree-ssa-math-opts.c logic the same.
>
>gcc.dg/lto/20110201-1_0.c was relying on us replacing cabs
>with fsqrt even on targets where fsqrt is itself a library call.
>The discussion leading up to that patch suggested that we only
>want to test the fold on targets with a square root instruction,
>so it would be OK to skip the test on other targets:
>
>https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01961.html
>https://gcc.gnu.org/ml/gcc-patches/2011-07/msg02036.html
>
>The patch does that using the sqrt_insn effective target.
>
>It's possible that removing the tree folds renders the LTO trick
>unnecessary, but since the test was originally for an ICE, it seems
>better to leave it as-is.
>
>Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>20110201-1_0.c passes on all three.  OK to install?

OK.

Thanks,
Richard.

>Thanks,
>Richard
>
>
>gcc/
>   * builtins.c (fold_builtin_cabs): Delete.
>   (fold_builtin_1): Update accordingly.  Handle constant arguments here.
>   * match.pd: Add rules previously handled by fold_builtin_cabs.
>
>gcc/testsuite/
>   * gcc.dg/lto/20110201-1_0.c: Restrict to sqrt_insn targets.
>   Add associated options for arm*-*-*.
>   (sqrt): Remove dummy definition.
>
>diff --git a/gcc/builtins.c b/gcc/builtins.c
>index 1e4ec35..8f87fd9 100644
>--- a/gcc/builtins.c
>+++ b/gcc/builtins.c
>@@ -7539,82 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl,
>tree arg)
>   return NULL_TREE;
> }
> 
>-/* Fold call to builtin cabs, cabsf or cabsl with argument ARG.  TYPE
>is the
>-   return type.  Return NULL_TREE if no simplification can be made. 
>*/
>-
>-static tree
>-fold_builtin_cabs (location_t loc, tree arg, tree type, tree fndecl)
>-{
>-  tree res;
>-
>-  if (!validate_arg (arg, COMPLEX_TYPE)
>-  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
>-return NULL_TREE;
>-
>-  /* Calculate the result when the argument is a constant.  */
>-  if (TREE_CODE (arg) == COMPLEX_CST
>-  && (res = do_mpfr_arg2 (TREE_REALPART (arg), TREE_IMAGPART
>(arg),
>-type, mpfr_hypot)))
>-return res;
>-
>-  if (TREE_CODE (arg) == COMPLEX_EXPR)
>-{
>-  tree real = TREE_OPERAND (arg, 0);
>-  tree imag = TREE_OPERAND (arg, 1);
>-
>-  /* If either part is zero, cabs is fabs of the other.  */
>-  if (real_zerop (real))
>-  return fold_build1_loc (loc, ABS_EXPR, type, imag);
>-  if (real_zerop (imag))
>-  return fold_build1_loc (loc, ABS_EXPR, type, real);
>-
>-  /* cabs(x+xi) -> fabs(x)*sqrt(2).  */
>-  if (flag_unsafe_math_optimizations
>-&& operand_equal_p (real, imag, OEP_PURE_SAME))
>-{
>-STRIP_NOPS (real);
>-return fold_build2_loc (loc, MULT_EXPR, type,
>-fold_build1_loc (loc, ABS_EXPR, type, real),
>-build_real_truncate (type, dconst_sqrt2 ()));
>-  }
>-}
>-
>-  /* Optimize cabs(-z) and cabs(conj(z)) as cabs(z).  */
>-  if (TREE_CODE (arg) == NEGATE_EXPR
>-  || TREE_CODE (arg) == CONJ_EXPR)
>-return build_call_expr_loc (loc, fndecl, 1, TREE_OPERAND (arg,
>0));
>-
>-  /* Don't do this when optimizing for size.  */
>-  if (flag_unsafe_math_optimizations
>-  && optimize && optimize_function_for_speed_p (cfun))
>-{
>-  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
>-
>-  if (sqrtfn != NULL_TREE)
>-  {
>-tree rpart, ipart, result;
>-
>-arg = builtin_save_expr (arg);
>-
>-rpart = fold_build1_loc (loc, REALPART_EXPR, type, arg);
>-ipart = fold_build1_loc (loc, IMAGPART_EXPR, type, arg);
>-
>-rpart = builtin_save_expr (rpart);
>-ipart = builtin_save_expr (ipart);
>-
>-result = fold_build2_loc (loc, PLUS_EXPR, type,
>-  fold_build2_loc (loc, MULT_EXPR, type,
>-   rpart, rpart),
>-  fold_build2_loc (loc, MULT_EXPR, type,
>-   ipart, ipart));
>-
>-return build_call_expr_loc (loc, sqrtfn, 1, result);
>-  }
>-}
>-
>-  return NULL_TREE;
>-}
>-
> /* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
>complex tree type of the result.  If NEG is true, the imaginary
>zero is negative.  */
>@@ -9683,7 +9607,11 @@ fold_builtin_1 (location_t loc, tree fndecl,
>tree arg0)
> break;
> 
> CASE_FLT_FN (BUILT_IN_CABS):
>-  return fold_builtin_cabs (loc, arg0, type, fndecl);
>+  if (TREE_CODE (arg0) == COMPLEX_CST
>+&& TREE_CODE (TREE_TYPE 

[gomp4] auto partitioning

2015-10-19 Thread Nathan Sidwell

I've committed this patch to gomp4 branch.

It implements handling of the 'auto' clause on a loop.  such loops can be 
implicitly partitioned, if they are (explicitly or implicitly) 'independent'. 
This patch walks the loop structure after explicit partitioning has been 
handled, and attempts to allocate a partitioning for such auto loops.  If 
there's no available partitioning a diagnostic is emitted.


Auto partitioning caused a failure of a collapse testcase.  I considered this a 
latent bug and forced that testcase to retain the original behaviour of a 'seq' 
loop.


nathan
2015-10-19  Nathan Sidwell  

	gcc/
	* omp-low.c (oacc_loop_auto_partitions): New.
	(oacc_loop_partition): Call it.

	gcc/testsuite/
	* gfortran.dg/goacc/routine-4.f90: Add diagnostic.
	* gfortran.dg/goacc/routine-5.f90: Add diagnostic.
	* c-c++-common/goacc-gomp/nesting-1.c: Add diagnostic.
	* c-c++-common/goacc/routine-6.c: Add diagnostic.
	* c-c++-common/goacc/routine-7.c: Add diagnostic.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Force
	serialization.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228960)
+++ gcc/omp-low.c	(working copy)
@@ -16244,6 +16244,50 @@ oacc_loop_fixed_partitions (oacc_loop *l
   return has_auto;
 }
 
+/* Walk the OpenACC loop heirarchy to assign auto-partitioned loops.
+   OUTER_MASK is the partitioning this loop is contained within.
+   Return the cumulative partitioning used by this loop, siblings and
+   children.  */
+
+static unsigned
+oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask)
+{
+  unsigned inner_mask = 0;
+  bool noisy = true;
+
+#ifdef ACCEL_COMPILER
+  /* When device_type is supported, we want the device compiler to be
+ noisy, if the loop parameters are device_type-specific.  */
+  noisy = false;
+#endif
+
+  if (loop->child)
+inner_mask |= oacc_loop_auto_partitions (loop->child,
+	 outer_mask | loop->mask);
+
+  if ((loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT))
+{
+  unsigned this_mask = 0;
+  
+  /* Pick the innermost free partitioning.  */
+  this_mask = inner_mask | GOMP_DIM_MASK (GOMP_DIM_MAX);
+  this_mask = (this_mask & -this_mask) >> 1;
+  this_mask &= ~outer_mask;
+
+  if (!this_mask && noisy)
+	warning_at (loop->loc, 0,
+		"insufficient parallelism available to partition loop");
+
+  loop->mask = this_mask;
+}
+  inner_mask |= loop->mask;
+  
+  if (loop->sibling)
+inner_mask |= oacc_loop_auto_partitions (loop->sibling, outer_mask);
+  
+  return inner_mask;
+}
+
 /* Walk the OpenACC loop heirarchy to check and assign partitioning
axes.  */
 
@@ -16255,7 +16299,8 @@ oacc_loop_partition (oacc_loop *loop, in
   if (fn_level >= 0)
 outer_mask = GOMP_DIM_MASK (fn_level) - 1;
 
-  oacc_loop_fixed_partitions (loop, outer_mask);
+  if (oacc_loop_fixed_partitions (loop, outer_mask))
+oacc_loop_auto_partitions (loop, outer_mask);
 }
 
 /* Default launch dimension validator.  Force everything to 1.  A
Index: gcc/testsuite/gfortran.dg/goacc/routine-4.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-4.f90	(revision 228960)
+++ gcc/testsuite/gfortran.dg/goacc/routine-4.f90	(working copy)
@@ -44,7 +44,7 @@ program main
   !
 
   !$acc parallel copy (a)
-  !$acc loop
+  !$acc loop ! { dg-warning "insufficient parallelism" }
   do i = 1, N
  call gang (a)
   end do
Index: gcc/testsuite/gfortran.dg/goacc/routine-5.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-5.f90	(revision 228960)
+++ gcc/testsuite/gfortran.dg/goacc/routine-5.f90	(working copy)
@@ -87,7 +87,7 @@ subroutine seq (a)
   integer, intent (inout) :: a(N)
   integer :: i
 
-  !$acc loop
+  !$acc loop ! { dg-warning "insufficient parallelism" }
   do i = 1, N
  a(i) = a(i) - a(i)
   end do
Index: gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
===
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c	(revision 228960)
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c	(working copy)
@@ -20,7 +20,7 @@ f_acc_kernels (void)
   }
 }
 
-#pragma acc routine
+#pragma acc routine vector
 void
 f_acc_loop (void)
 {
Index: gcc/testsuite/c-c++-common/goacc/routine-7.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-7.c	(revision 228960)
+++ gcc/testsuite/c-c++-common/goacc/routine-7.c	(working copy)
@@ -74,7 +74,7 @@ vector (int red)
 int
 seq (int red)
 {
-#pragma acc loop reduction (+:red)
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient parallelism" }
   for (int i = 0; i < 10; i++)
 red ++;
 
Index: gcc/testsuite/c-c++-common/goacc/routine-6.c
===
--- 

Re: PING^2: [PATCH] Limit alignment on error_mark_node variable

2015-10-19 Thread Bernd Schmidt

On 10/20/2015 12:07 AM, H.J. Lu wrote:

On Thu, Jul 09, 2015 at 03:57:31PM +0200, Richard Biener wrote:

I don't see why the C FE would need to invoke finish_decl twice here.

[...]

So I'd rather
have the C frontend not invoke rest_of_decl_compilation on this in the
first place.

Your patch still feels like a hack in the wrong place.


I've debugged this a bit and it looks like HJ gave a somewhat confused 
account of what actually happens. The frontend never calls 
rest_of_decl_compilation for the decl in question. It only calls 
finish_decl, which detects the problem and sets the type to error_mark. 
That function calls rest_of_decl_compilation, but only for things at 
file scope (the problematic decl is in a function scope).



diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
index 7d2d096..4d679d9 100644
--- a/gcc/cgraphbuild.c
+++ b/gcc/cgraphbuild.c
@@ -381,7 +381,8 @@ pass_build_cgraph_edges::execute (function *fun)
FOR_EACH_LOCAL_DECL (fun, ix, decl)
  if (TREE_CODE (decl) == VAR_DECL
 && (TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
-   && !DECL_HAS_VALUE_EXPR_P (decl))
+   && !DECL_HAS_VALUE_EXPR_P (decl)
+   && TREE_TYPE (decl) != error_mark_node)
varpool_node::finalize_decl (decl);


This is where we call finalize_decl. I've tried a few things to stop the 
C frontend from letting the erroneous decl escape, but couldn't really 
make it work, so this looks like an adequate fix. The patch is OK IMO.



Bernd


Re: [RFC VTV] Fix VTV for targets that have section anchors.

2015-10-19 Thread Jeff Law

On 10/13/2015 06:53 AM, Ramana Radhakrishnan wrote:




On 12/10/15 21:44, Jeff Law wrote:

On 10/09/2015 03:17 AM, Ramana Radhakrishnan wrote:

This started as a Friday afternoon project ...

It turned out enabling VTV for AArch64 and ARM was a matter of fixing
PR67868 which essentially comes from building libvtv with section
anchors turned on. The problem was that the flow of control from
output_object_block through to switch_section did not have the same
special casing for the vtable section that exists in
assemble_variable.

That's some ugly code.  You might consider factoring that code into a function 
and just calling it from both places.  Your version doesn't seem to handle 
PECOFF, so I'd probably refactor from assemble_variable.



I was a bit lazy as I couldn't immediately think of a target that would want 
PECOFF, section anchors and VTV. That combination seems to be quite rare, 
anyway point taken on the refactor.

Ok if no regressions ?



However both these failures also occur on x86_64 - so I'm content to
declare victory on AArch64 as far as basic enablement goes.

Cool.



1. Are the generic changes to varasm.c ok ? 2. Can we take the
AArch64 support in now, given this amount of testing ? Marcus /
Caroline ? 3. Any suggestions / helpful debug hints for VTV debugging
(other than turning VTV_DEBUG on and inspecting trace) ?

I think that with refactoring they'd be good to go.  No opinions on the AArch64 
specific question -- call for the AArch64 maintainers.

Good to see someone hacking on vtv.  It's in my queue to look at as well.


Yeah figuring out more about vtv is also in my background queue.

regards
Ramana

PR other/67868

* varasm.c (assemble_variable): Move special vtv handling to..
(handle_vtv_comdat_sections): .. here. New function.
(output_object_block): Handle vtv sections.

libvtv/Changelog

* configure.tgt: Support aarch64 and arm.
OK for the trunk.  Sorry for the delay, I was out on the PTO the latter 
half of last week.


jeff



[PATCH] Don't allow FSM threader to create irreducible loops unless it eliminates a multi-way branch

2015-10-19 Thread Jeff Law
If I hack up GCC's old jump threader to avoid threading across backedges 
and instead let the FSM threader handle that case, then we end up with 
cases where the FSM threader creates irreducible loops with marginal 
benefit.


This can be seen in ssa-dom-thread-2{d,e,f}.c.

We've long avoided such threads in the old jump threader.  We generally 
want to avoid them in the FSM threader as well.  The only case where 
we're going to allow them is when we're able to eliminate a multi-way 
branch from the loop.


Bootstrapped and regression tested on x86_64-linux-gnu.  Also tested the 
above mentioned testcases with my hacked up compiler.


Installed on the trunk.

Jeff
commit 518690952b62c1d38b89cdbef0490a7d11f06c40
Author: Jeff Law 
Date:   Mon Oct 19 10:23:26 2015 -0600

[PATCH] Don't allow FSM threader to create irreducible loops unless it 
eliminates a multi-way branch

* tree-ssa-threadupdate.c (valid_jump_thread_path): Reject paths
that create irreducible loops unless the path elimiantes a multiway
branch.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 89a42c1..ff3d3fc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-10-19  Jeff Law  
+
+   * tree-ssa-threadupdate.c (valid_jump_thread_path): Reject paths
+   that create irreducible loops unless the path elimiantes a multiway
+   branch.
+
 2015-10-19  Richard Biener  
 
PR tree-optimization/67975
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 5632a88..8e3437a 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2553,11 +2553,31 @@ static bool
 valid_jump_thread_path (vec *path)
 {
   unsigned len = path->length ();
+  bool multiway_branch = false;
 
-  /* Check that the path is connected.  */
+  /* Check that the path is connected and see if there's a multi-way
+ branch on the path.  */
   for (unsigned int j = 0; j < len - 1; j++)
-if ((*path)[j]->e->dest != (*path)[j+1]->e->src)
-  return false;
+{
+  if ((*path)[j]->e->dest != (*path)[j+1]->e->src)
+return false;
+  gimple *last = last_stmt ((*path)[j]->e->dest);
+  multiway_branch |= (last && gimple_code (last) == GIMPLE_SWITCH);
+}
+
+  /* If we are trying to thread the loop latch to a block that does
+ not dominate the loop latch, then that will create an irreducible
+ loop.  We avoid that unless the jump thread has a multi-way
+ branch, in which case we have deemed it worth losing other
+ loop optimizations later if we can eliminate the multi-way branch.  */
+  edge e = (*path)[0]->e;
+  struct loop *loop = e->dest->loop_father;
+  if (!multiway_branch
+  && loop->latch
+  && loop_latch_edge (loop) == e
+  && (determine_bb_domination_status (loop, path->last ()->e->dest)
+ == DOMST_NONDOMINATING))
+return false;
 
   return true;
 }
@@ -2650,7 +2670,9 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   if (bitmap_bit_p (threaded_blocks, entry->src->index)
  /* Verify that the jump thread path is still valid: a
 previous jump-thread may have changed the CFG, and
-invalidated the current path.  */
+invalidated the current path or the requested jump
+thread might create irreducible loops which should
+generally be avoided.  */
  || !valid_jump_thread_path (path))
{
  /* Remove invalid FSM jump-thread paths.  */


OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Thomas Schwinge
Hi!

Chung-Lin, would you please have a look at the following (on
gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?

On Tue, 23 Jun 2015 13:51:39 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> > On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > > Given that a mapped variable in 4.1 can have different kinds across 
> > > nested data
> > > regions, we need to store map-type not only for each var, but also for 
> > > each
> > > structured mapping.  Here is my WIP patch, is it sane? :)
> > > Attached testcase works OK on the device with non-shared memory.
> > 
> > A bit updated version with a fix for GOMP_MAP_TO_PSET.
> > make check-target-libgomp passed.
> 
> Ok, thanks.
> 
> > include/gcc/
> > * gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
> > GOMP_MAP_ALWAYS_FROM_P): Define.
> > libgomp/
> > * libgomp.h (struct target_var_desc): New.
> > (struct target_mem_desc): Replace array of splay_tree_key with array of
> > target_var_desc.
> > (struct splay_tree_key_s): Move copy_from to target_var_desc.
> > * oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
> > target_var_desc.
> > * oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
> > * target.c (gomp_map_vars_existing): Copy data to device if map-type is
> > 'always to' or 'always tofrom'.
> > (gomp_map_vars): Use key from target_var_desc.  Set copy_from and
> > always_copy_from.
> > (gomp_copy_from_async): Use key and copy_from from target_var_desc.
> > (gomp_unmap_vars): Copy data from device if always_copy_from is set.
> > (gomp_offload_image_to_device): Do not use copy_from.
> > * testsuite/libgomp.c/target-11.c: New test.

(That's gomp-4_1-branch r224838.  The attached
gomp-4_1-branch-r224838.patch is a variant that applies on top of
gomp-4_0-branch r228972.)  This change introduces regressions in OpenACC
async clause handling.

Testing on gomp-4_1-branch r224838:

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Same for C++.

Testing on gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch:

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none (test for 
excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution 
test

Same for C++.

As I mentioned in
,
all three regressions are visible when testing on trunk r228777.  I have
not analyzed why the three different branches show different sets of
regressions -- I'm hoping they're all manifestations of the same
underlying problem: they're all using the OpenACC async clause.

Looking at gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch, clearly there is "some kind of data
corruption":

$ gdb -q a.out 
Reading symbols from a.out...done.
(gdb) start
[...]
25  a = (float *) malloc (nbytes);
(gdb) n
26  b = (float *) malloc (nbytes);
(gdb) print a
$1 = (float *) 0xab12c0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x004015d2 in main (argc=1, argv=0x7fffd408) at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c:133
133 if (a[i] != 3.0)
(gdb) print a
$2 = (float *) 0x500680620

0x500680620 looks like a nvptx device pointer to me, which is a) wrong
(after the "malloc", "a" shouldn't change its value throughout program
execution), and b) that "explains" the segmentation fault (device pointer
dereferenced in host code).

So, maybe data is erroneously being copied back to the host from device,
or from libgomp internal data structures.  Maybe some copy_from flag
handling needs to be adjusted or added in the OpenACC code in libgomp?


I have no idea whether that's related, but I noticed that currently we're
not in any way handling async_refcount in libgomp/oacc-*.c -- do we have
to?  (Its name certainly makes me believe it's related to asynchronous
data (un-)mapping.)  Should we be able to drop some of the
OpenACC-specific async implementation in libgomp, and use new/generic
target.c code instead?


Please note that there will be further libgomp changes (target.c, and
other 

Re: Fix prototype for print_insn in rtl.h

2015-10-19 Thread Jeff Law

On 10/19/2015 09:14 AM, Jeff Law wrote:

On 10/15/2015 10:28 AM, Andrew MacLeod wrote:

On 10/13/2015 11:32 AM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

2015-10-13  Nikolai Bozhenov

 * gcc/rtl.h (print_insn): fix prototype

Installed on the trunk after bootstrap & regression test.

jeff


Sorry, a little late to the party.. but why is print_insn even in
rtl.h?  it seems that sched-vis.c is the only thing that uses it...

Then let's move it to sched-int.h, unless there's some good reason not to.
Because there isn't a sched-vis.h file and sched-vis.c would need to 
include sched-int.h.


That's all rather silly because sched-vis.c has nothing to do with 
scheduling.  It's just an RTL dumper.


I think moving all that stuff into print-rtl.[ch] is probably the better 
solution.


jeff


Re: [patch] fix gotools cross build

2015-10-19 Thread Ian Lance Taylor
On Wed, May 6, 2015 at 5:34 AM, Matthias Klose  wrote:
>
> Yes, it's documented that there is still some work to do for cross builds,
> however a cross build for gotools currently fails.
>
> The toplevel make always passes the GOC variable in the environment, 
> overwriting
> anything configured in gotools own configure.ac. Fixed by explictly using 
> @GOC@
> for GOCOMPILER.
>
> gotools is a host project, and the cross_compiling check always fails. Not 
> sure
> if the $host != $target test is any better, but it works for me.
>
> Ok for the trunk and the gcc-5 branch?
>
>   Matthias
>
> * Makefile.am: Use GOC configured in configure.ac for cross builds.
> * configure.ac: Fix NATIVE conditional.
> * Makefile.in, configure: Regenerate.
>
> --- gotools/Makefile.am
> +++ gotools/Makefile.am
> @@ -33,7 +33,7 @@
>  # Use the compiler we just built.
>  GOCOMPILER = $(GOC_FOR_TARGET)
>  else
> -GOCOMPILER = $(GOC)
> +GOCOMPILER = @GOC@
>  endif
>
>  GOCFLAGS = $(CFLAGS_FOR_TARGET)
> --- gotools/configure.ac
> +++ gotools/configure.ac
> @@ -46,7 +46,7 @@
>  AC_PROG_CC
>  AC_PROG_GO
>
> -AM_CONDITIONAL(NATIVE, test "$cross_compiling" = no)
> +AM_CONDITIONAL(NATIVE, test "$host" = "$target")
>
>  dnl Test for -lsocket and -lnsl.  Copied from libjava/configure.ac.
>  AC_CACHE_CHECK([for socket libraries], gotools_cv_lib_sockets,


I really apologize for never responding to this.  I keep meaning to
figure out the right thing to do, but I never have time to think about
it.

As far as I can see, both these suggested changes are wrong.  They may
avoid problems that exist today, but as far as I can see they don't do
it in the right way.

If GOC is not being set correctly by the top level Makefile, then the
fix is to set it correctly.  Overriding GOC in the gotools Makefile is
just going to lead us into more complexity and confusion over time.

The test for NATIVE currently tests whether we are building with a
cross-compiler.  You are suggesting changing it to be whether we are
building a cross-compiler.  Neither test is fully correct.  If we just
want to make things slightly more correct, we should that $build ==
$host (aka ! $cross_compiling) and that $host == $target.  If both
conditions are true, then we have a native build.  A change along
those lines is OK.

Sorry again for not responding to this.

Ian


Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Ilya Verbin
On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> 
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Maybe it was caused by this change in gomp_unmap_vars?
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html

Looking at the code, I don't see any difference in async_refcount handling, but
I was unable to test it without having hardware :(

  -- Ilya


Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-19 Thread Thomas Schwinge
Hi!

Ping...

On Wed, 30 Sep 2015 17:54:07 +0200, I wrote:
> On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek  wrote:
> > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  
> > > wrote:
> > > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > > construct[ors] of all shared libraries (and the binary) that contain 
> > > > offloaded
> > > > code?  If yes, that is surely going to fail the assertions in there.
> > > 
> > > Indeed.  My original plan has been to generate/invoke this constructor
> > > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implemented this correctly.
> > 
> > How would you mean to implement it?
> 
> I have come to realize that we need to generate/invoke this constructor
> From everything that links against libgomp (which is what I implemented),
> that is, executables as well as shared libraries.
> 
> > -fopenmp or -fopenacc code with
> > offloading bits might not be in the final executable at all, nor in shared
> > libraries it is linked against; such libraries could be only dlopened,
> > consider say python plugin.  And this is not just made up, perhaps not with
> > offloading yet, but people regularly use OpenMP code in plugins and then we
> > get complains that fork child of the main program is not allowed to do
> > anything but async-signal-safe functions.
> 
> I'm not sure I'm completely understanding that paragraph?  Are you saying
> that offloaded code can be in libraries that are not linked against
> libgomp?  How would these register (GOMP_offload_register) their
> offloaded code?  I think it's a reasonable to expect that every shared
> library that contains offloaded code must link against libgomp, which
> will happen automatically given that it is built with -fopenmp/-fopenacc?
> 
> > > > You can dlopen such libraries etc.  What if you link one library with
> > > > -fopenmp=nvptx-none and another one with 
> > > > -fopenmp=x86_64-intelmicemul-linux?
> > > 
> > > So, the first question to answer is: what do we expect to happen in this
> > > case, or similarly, if the executable and any shared libraries are
> > > compiled with different/incompatible -foffload options?
> > 
> > As the device numbers are per-process, the only possibility I see is that
> > all the physically available devices are always available, and just if you
> > try to offload from some code to a device that doesn't support it, you get
> > host fallback.  Because, one shared library could carefully use device(xyz)
> > to offload to say XeonPhi it is compiled for and supports, and another
> > library device(abc) to offload to PTX it is compiled for and supports.
> 
> OK, I think I get that, and it makes sense.  Even though, I don't know
> how you'd do that today: as far as I can tell, there is no specification
> covering the OpenMP 4 target device IDs, so I have no idea how a user
> program/library could realiably use them in practice?  For example, in
> the current GCC implementation, the OpenMP 4 target device IDs depend on
> the number of individual devices availble in the system, and the order in
> which libgomp loads the plugins, which is defined (arbitrarily) by the
> GCC configuration?
> 
> > > For this, I propose that the only mode of operation that we currently can
> > > support is that all of the executable and any shared libraries agree on
> > > the offload targets specified by -foffload, and I thus propose the
> > > following patch on top of what Joseph has posted before (passes the
> > > testsuite, but not yet tested otherwise):
> > 
> > See above, no.
> 
> OK.
> 
> How's the following (complete patch instead of incremental patch; the
> driver changes are still the same as before)?  The changes are:
> 
>   * libgomp/target.c:gomp_target_init again loads all the plugins.
>   * libgomp/target.c:resolve_device and
> libgomp/oacc-init.c:resolve_device verify that a default device
> (OpenMP device-var ICV, and acc_device_default, respectively) is
> actually enabled, or resort to host fallback if not.
>   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> to enable devices specified by -foffload.  Can be called multiple
> times (executable, any shared libraries); the set of enabled devices
> is the union of all those ever requested.
>   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> changed to enable all devices.  This is to maintain compatibility
> with old executables and shared libraries built without the -foffload
> constructor support.
>   * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
> of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
> GOMP_offload_unregister.  To avoid enabling all devices
> (GOMP_offload_register).
>   * New test cases to verify this (-foffload=disable, host 

<    1   2