date:20181217

Re: [PATCH] Fix AVX512VL gather ICEs (PR target/88513, PR target/88514)

2018-12-17 Thread Uros Bizjak

On Mon, Dec 17, 2018 at 11:37 PM Jakub Jelinek  wrote:
>
> Hi!
>
> Some of the following testcases ICE, because I was assuming that
> VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
> VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
> but they really only work if the wider mode is different from the narrower
> one, so e.g. one can extract the lo or hi half of a nunits 16 
> VECTOR_BOOLEAN_TYPE_P
> type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
> combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
> HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
> ambigious - either we need to extract lo/hi half of nunits 8
> VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
> would be QImode -> QImode operation and from the mode one can't figure out
> which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
> because we use the operand mode as the name in the optab, so
> vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
> a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.
>
> When not assuming it works, e.g. if I just used
> supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
> masktype handling code, it would lead just to missed optimizations, because
> the vectorizer just punted in those cases (note, it doesn't affect
> -mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
> already, but the cases when AVX512VL is used with
> -mprefer-vector-width={128,256}.
>
> The following patch introduces 3 new optabs which are like
> vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
> expanders take another argument - CONST_INT representing the number of units
> in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
> to be used for the cases where both modes have different
> TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-12-17  Jakub Jelinek  
>
> PR target/88513
> PR target/88514
> * optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
> vec_unpacks_sbool_lo_optab): New optabs.
> * optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
> and pass additional argument if both input and target have the same
> scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
> * expr.c (expand_expr_real_2) : Handle
> VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
> as the operands using vec_pack_sbool_trunc_optab.
> * tree-vect-stmts.c (supportable_widening_operation): Use
> vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
> where both wider_vectype and vectype have the same scalar mode.
> (supportable_narrowing_operation): Similarly use
> vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
> scalar mode.
> * config/i386/i386.c (ix86_get_builtin)
> : Check for non-VECTOR_MODE_P
> rather than VOIDmode.

This entry doesn't match the change, you are checking for
VECTOR_MODE_P. On a related note, should similar
IX86_BUILTIN_GATHER3ALTDIV16{SF,SI} builtins be changed in the same
way?

> * config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_):
> Remove useless ()s around "register_operand", formatting fixes.
> (vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
> vec_unpacks_sbool_hi_qi): New expanders.
> * doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
> vec_unpacks_sbool_lo_M): Document.
>
> * gcc.target/i386/avx512f-pr88513-1.c: New test.
> * gcc.target/i386/avx512f-pr88513-2.c: New test.
> * gcc.target/i386/avx512vl-pr88464-1.c: New test.
> * gcc.target/i386/avx512vl-pr88464-2.c: New test.
> * gcc.target/i386/avx512vl-pr88464-3.c: New test.
> * gcc.target/i386/avx512vl-pr88464-4.c: New test.
> * gcc.target/i386/avx512vl-pr88513-1.c: New test.
> * gcc.target/i386/avx512vl-pr88513-2.c: New test.
> * gcc.target/i386/avx512vl-pr88513-3.c: New test.
> * gcc.target/i386/avx512vl-pr88513-4.c: New test.
> * gcc.target/i386/avx512vl-pr88514-1.c: New test.
> * gcc.target/i386/avx512vl-pr88514-2.c: New test.
> * gcc.target/i386/avx512vl-pr88514-3.c: New test.

LGTM for the x86 part.

Thanks,
Uros.

> --- gcc/optabs.def.jj   2018-12-14 20:35:37.883126125 +0100
> +++ gcc/optabs.def  2018-12-17 15:18:27.817103784 +0100
> @@ -335,6 +335,7 @@ OPTAB_D (vec_pack_sfix_trunc_optab, "vec
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
> +OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_tr

Re: [C++ Patch] PR 71140 ("[concepts] ill-formed nested-requirement lacking a semicolon not rejected")

2018-12-17 Thread Jason Merrill

On Wed, Oct 31, 2018 at 2:34 PM Andrew Sutton  wrote:
>
> Sorry for the slow reply. I've been stuck working on some other projects.
>>
>> Can you say a bit about why that was better than continuing to use VAR_DECL?
>
> I wanted to make sure that we avoid normal VAR_DECL processing routines, so 
> we don't e.g., slip into a function where we might try to generate an address 
> for a concept.
>
>> Yeah, don't worry about trying to send small patches.  I don't mind
>> reviewing what's on the branch, though at least the final patch should
>> be sent to the list for archival.
>>
>> What feedback are you looking for at this point?
>
> Mostly anything that would obviously prevent or cause problems merging in the 
> near future.
>
> I'll try to keep the asutton/gcc fork on github rebased on trunk so there 
> shouldn't be too many merge issues.

Any updates?

Jason

Re: [ping] Change static chain to r11 on aarch64

2018-12-17 Thread Hans-Peter Nilsson

On Mon, 17 Dec 2018, Hans-Peter Nilsson wrote:
> On Mon, 17 Dec 2018, Wilco Dijkstra wrote:
> > H-P:
> > > So, changing from R18 to R11 for aarch64 seems right, as the
> > > latter is call-clobbered and the former is call-saved IIUC.
> >
> > The AArch64 ABI defines x18 as platform specific:
> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
> > On Linux it is call-clobbered, but it could be a fixed register on other
> > platforms (eg. a thread-local pointer). I don't think it's possible to make
> > it a callee-save.
>
> JFTR, in gcc, it's treated as call-saved, AFAICS, with no
> special-casing (other than being STATIC_CHAIN_REGNUM, which
> by itself has no other effect).

I take that back,  My bad reading.  Sorry for the noise.

brgds, H-P

Re: [ping] Change static chain to r11 on aarch64

2018-12-17 Thread Hans-Peter Nilsson

On Mon, 17 Dec 2018, Wilco Dijkstra wrote:
> H-P:
> > So, changing from R18 to R11 for aarch64 seems right, as the
> > latter is call-clobbered and the former is call-saved IIUC.
>
> The AArch64 ABI defines x18 as platform specific:
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
> On Linux it is call-clobbered, but it could be a fixed register on other
> platforms (eg. a thread-local pointer). I don't think it's possible to make
> it a callee-save.

JFTR, in gcc, it's treated as call-saved, AFAICS, with no
special-casing (other than being STATIC_CHAIN_REGNUM, which
by itself has no other effect).

Is that a bug or deliberate?  If deliberate, a comment at
CALL_USED_REGISTERS would have cleared that question.
The document you refer to seems to be on the fence regarding
whether a compiler should treat it as call-clobbered.

> Still it is the wrong register to use since it already has
> different uses. Using x9 would make its use as an extra argument clearer.

Sure.  For one, it would avoid the bug with
STATIC_CHAIN_REGNUM being call-saved. :)

brgds, H-P

Re: [patch] Fix bootstrap powerpc--freebsd* targets

2018-12-17 Thread Alan Modra

On Mon, Dec 17, 2018 at 11:05:57AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> > Since I broke powerpc*-freebsd and the other non-linux powerpc
> > targets, I guess I ought to fix them.  The following is a variation on
> > your first patch, that results in -mcall-linux for powerpc-freebsd*
> > providing the 32-bit powerpc-linux dynamic linker.
> 
> That, like the first patch, abuses that header file.  Please do it
> somewhere sane instead, not in a random subtarget file?

Is there is a better place, currently?  sysv4.h contains a mess of OS
related defines already, to support various -mcall options.  If those
stay in sysv4.h I can't see a better place for the fall-back
GNU_USER_DYNAMIC_LINKER define.

Here's the problem:
powerpc*-*-linux* uses tm_file="rs6000/rs6000.h dbxelf.h elfos.h
gnu-user.h linux.h freebsd-spec.h rs6000/sysv4.h" plus a few more.
linux.h contains the proper GNU_USER_DYNAMIC_LINKER define for linux.
Fairly obviously we can't put a fallback define in rs6000/rs6000.h
for those targets that don't include linux.h (and including linux.h
for non-linux targets is probably not a good idea).

Besides rs6000/sysv4.h, you could put the fallback in rs6000/freebsd.h
to fix powerpc*-freebsd*, but then you'd need to put it in
rs6000/netbsd.h, rs6000/eabi.h, rs6000/rtems.h, rs6000/vxworks.h,
rs6000/lynx.h to fix those targets.  That would be horrible.  And it
would leave powerpc-elf broken.

> 
> > * config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Define.

-- 
Alan Modra
Australia Development Lab, IBM

[nvptx] vector length patch series -- openacc parts

2018-12-17 Thread Tom de Vries

On 14-12-18 20:58, Tom de Vries wrote:

> 0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch

> 0017-nvptx-Enable-large-vectors.patch

> 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch

Thomas,

these patches are openacc (0003) or have openacc components (0017, 0023).

Can you review and possibly approve the openacc parts?

Thanks,
- Tom

Re: [PATCH] v5: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-17 Thread David Malcolm

On Mon, 2018-12-17 at 14:33 -0500, Jason Merrill wrote:
> On 12/14/18 7:17 PM, David Malcolm wrote:
> > +  /* Since default args are effectively part of the function
> > type,
> > +strip location wrappers here, since otherwise the
> > location of
> > +one function's default arguments is arbitrarily chosen
> > for
> > +all functions with similar signature (due to
> > canonicalization
> > +of function types).  */
> 
> Hmm, looking at this again, why would this happen?  I see that 
> type_list_equal uses == to compare default arguments, so two
> function 
> types with the same default argument but different location wrappers 
> shouldn't be combined.
> 
> Jason

Thanks.

I did some digging into this.  I added this strip to fix
  g++.dg/template/defarg6.C
but it looks like I was overzealous (the comment is correct, but it's
papering over a problem).

It turns out that type_list_equal is doing more than just pointer
equality; it's hitting the simple_cst_equal part of the && at line
7071:

7063bool
7064type_list_equal (const_tree l1, const_tree l2)
7065{
7066  const_tree t1, t2;
7067
7068  for (t1 = l1, t2 = l2; t1 && t2; t1 = TREE_CHAIN (t1), t2 = 
TREE_CHAIN (t2))
7069if (TREE_VALUE (t1) != TREE_VALUE (t2)
7070|| (TREE_PURPOSE (t1) != TREE_PURPOSE (t2)
7071&& ! (1 == simple_cst_equal (TREE_PURPOSE (t1), 
TREE_PURPOSE (t2))
7072  && (TREE_TYPE (TREE_PURPOSE (t1))
7073  == TREE_TYPE (TREE_PURPOSE (t2))
7074  return false;
7075
7076  return t1 == t2;
7077}

What's happening is that there are two different functions with
identical types apart from the locations of their (equal) default
arguments: both of the TREE_PURPOSEs are NON_LVALUE_EXPR wrappers
around a CONST_DECL enum value (at different source locations).

simple_cst_equal is stripping the location wrappers here:

7311  if (CONVERT_EXPR_CODE_P (code1) || code1 == NON_LVALUE_EXPR)
7312{
7313  if (CONVERT_EXPR_CODE_P (code2)
7314  || code2 == NON_LVALUE_EXPR)
7315return simple_cst_equal (TREE_OPERAND (t1, 0), TREE_OPERAND 
(t2, 0));
7316  else
7317return simple_cst_equal (TREE_OPERAND (t1, 0), t2);
7318}

and thus finds them to be equal; the iteration in type_list_equal
continues, and runs out of parameters with t1 == t2 == NULL, and thus
returns true, and thus the two function types hash to the same slot,
and the two function types get treated as being the same.

It's not clear to me yet what the best solution to this is:
- should simple_cst_equal regard different source locations as being
different?
- should function-type hashing use a custom version of type_list_equal
when comparing params, and make different source locations of default
args be different?
- something else?

Dave

Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-17 Thread Jonathan Wakely


On 13/12/18 20:34 +, Jonathan Wakely wrote:

   Construction and modification of paths is now done more efficiently, by
   splitting the input into a stack-based buffer of string_view objects
   instead of a dynamically-allocated vector containing strings. Once the
   final size is known only a single allocation is needed to reserve space
   for it.  The append and concat operations no longer require constructing
   temporary path objects, nor re-parsing the entire native pathname.


This patch fixes a regression in the append/concat code, introduced by
the changes to avoid constructing  an intermediate path. The problem
wasn't caught by existing tests, so I've improved the tests.

Tested x86_64-linux, committed to trunk.


commit 7583088c09b5dd5d33a98abd1f61b0ccff961d6e
Author: Jonathan Wakely 
Date:   Mon Dec 17 21:42:48 2018 +

PR libstdc++/71044 fix off-by-one errors introduced recently

The recent changes to append/concat directly from strings (without
constructing paths) introduced regressions where one of the components
could be omitted from the iteration sequence in the result.

PR libstdc++/71044
* src/filesystem/std-path.cc (path::_M_append): Fix off-by-one error
that caused a component to be lost from the iteration sequence.
(path::_M_concat): Likewise.
* testsuite/27_io/filesystem/path/append/source.cc: Test appending
long strings.
* testsuite/27_io/filesystem/path/concat/strings.cc: Test
concatenating long strings.
* testsuite/27_io/filesystem/path/construct/string_view.cc: Test
construction from long string.

diff --git a/libstdc++-v3/src/filesystem/std-path.cc b/libstdc++-v3/src/filesystem/std-path.cc
index d2520492c03..b5ddbdad149 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -781,10 +781,11 @@ path::_M_append(basic_string_view s)
 	  ::new(output++) _Cmpt(c.str, c.type, parser.offset(c));
 	  ++_M_cmpts._M_impl->_M_size;
 	}
-	  for (auto c = parser.next(); c.valid(); c = parser.next())
+	  while (cmpt.valid())
 	{
-	  ::new(output++) _Cmpt(c.str, c.type, parser.offset(c));
+	  ::new(output++) _Cmpt(cmpt.str, cmpt.type, parser.offset(cmpt));
 	  ++_M_cmpts._M_impl->_M_size;
+	  cmpt = parser.next();
 	}
 
 	  if (s.back() == '/')
@@ -1139,7 +1140,7 @@ path::_M_concat(basic_string_view s)
 	}
 #endif
 	}
-  else if (orig_filenamelen == 0)
+  else if (orig_filenamelen == 0 && extra.empty())
 	{
 	  // Replace empty filename at end of original path.
 	  std::prev(output)->_M_pathname = it->str;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
index e440ca921c7..21ae6be3d97 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/append/source.cc
@@ -137,6 +137,22 @@ test05()
   s = second->native();
   p3 /= s;
   VERIFY( p3.string() == "0/123456789/a/123456789" );
+  }
+
+void
+test06()
+{
+  const std::string s0 = "a/b/c";
+  path p = s0;
+  std::string s;
+  for (int i = 0; i < 10; ++i)
+s += "0/1/2/3/4/5/6/7/8/9/";
+  // append a long string with many components
+  test(p, s.c_str());
+
+  // Same again but with a trailing slash on the left operand:
+  path p2 = s0 + '/';
+  test(p2, s.c_str());
 }
 
 int
@@ -147,4 +163,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
index eea9b6dc69b..316f7c3d7bd 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/concat/strings.cc
@@ -24,8 +24,10 @@
 #include 
 #include 
 #include 
+#include 
 
 using std::filesystem::path;
+using __gnu_test::compare_paths;
 
 void
 test01()
@@ -60,7 +62,7 @@ test01()
 void
 test02()
 {
-  std::basic_string_view s;
+  std::basic_string_view s, expected;
 
   path p = "0/1/2/3/4/5/6";
   // The string_view aliases the path's internal string:
@@ -68,25 +70,54 @@ test02()
   // Append that string_view, which must work correctly even though the
   // internal string will be reallocated during the operation:
   p += s;
-  VERIFY( p.string() == "0/1/2/3/4/5/60/1/2/3/4/5/6" );
+  compare_paths(p, "0/1/2/3/4/5/60/1/2/3/4/5/6");
 
   // Same again with a trailing slash:
   path p2 = "0/1/2/3/4/5/";
   s = p2.native();
   p2 += s;
-  VERIFY( p2.string() == "0/1/2/3/4/5/0/1/2/3/4/5/" );
+  compare_paths(p2, "0/1/2/3/4/5/0/1/2/3/4/5/");
 
   // And aliasing one of the components of the path:
   path p3 = "0/123456789";
   path::iterator second = std::next(p3.begin());
   s = second->native();
   p3 += s;
-  VERIFY( p3.string() == "0/123456789123456789" );
+  compare_paths(p3, "0/1234567891

[PATCH] Fix AVX512VL gather ICEs (PR target/88513, PR target/88514)

2018-12-17 Thread Jakub Jelinek

Hi!

Some of the following testcases ICE, because I was assuming that
VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
but they really only work if the wider mode is different from the narrower
one, so e.g. one can extract the lo or hi half of a nunits 16 
VECTOR_BOOLEAN_TYPE_P
type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
ambigious - either we need to extract lo/hi half of nunits 8
VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
would be QImode -> QImode operation and from the mode one can't figure out
which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
because we use the operand mode as the name in the optab, so
vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.

When not assuming it works, e.g. if I just used
supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
masktype handling code, it would lead just to missed optimizations, because
the vectorizer just punted in those cases (note, it doesn't affect
-mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
already, but the cases when AVX512VL is used with
-mprefer-vector-width={128,256}.

The following patch introduces 3 new optabs which are like
vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
expanders take another argument - CONST_INT representing the number of units
in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
to be used for the cases where both modes have different
TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-17  Jakub Jelinek  

PR target/88513
PR target/88514
* optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
vec_unpacks_sbool_lo_optab): New optabs.
* optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
and pass additional argument if both input and target have the same
scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
* expr.c (expand_expr_real_2) : Handle
VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
as the operands using vec_pack_sbool_trunc_optab.
* tree-vect-stmts.c (supportable_widening_operation): Use
vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
where both wider_vectype and vectype have the same scalar mode.
(supportable_narrowing_operation): Similarly use
vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
scalar mode.
* config/i386/i386.c (ix86_get_builtin)
: Check for non-VECTOR_MODE_P
rather than VOIDmode.
* config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_):
Remove useless ()s around "register_operand", formatting fixes.
(vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
vec_unpacks_sbool_hi_qi): New expanders.
* doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
vec_unpacks_sbool_lo_M): Document.

* gcc.target/i386/avx512f-pr88513-1.c: New test.
* gcc.target/i386/avx512f-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-1.c: New test.
* gcc.target/i386/avx512vl-pr88464-2.c: New test.
* gcc.target/i386/avx512vl-pr88464-3.c: New test.
* gcc.target/i386/avx512vl-pr88464-4.c: New test.
* gcc.target/i386/avx512vl-pr88513-1.c: New test.
* gcc.target/i386/avx512vl-pr88513-2.c: New test.
* gcc.target/i386/avx512vl-pr88513-3.c: New test.
* gcc.target/i386/avx512vl-pr88513-4.c: New test.
* gcc.target/i386/avx512vl-pr88514-1.c: New test.
* gcc.target/i386/avx512vl-pr88514-2.c: New test.
* gcc.target/i386/avx512vl-pr88514-3.c: New test.

--- gcc/optabs.def.jj   2018-12-14 20:35:37.883126125 +0100
+++ gcc/optabs.def  2018-12-17 15:18:27.817103784 +0100
@@ -335,6 +335,7 @@ OPTAB_D (vec_pack_sfix_trunc_optab, "vec
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
 OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
+OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_trunc_$a")
 OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
 OPTAB_D (vec_packs_float_optab, "vec_packs_float_$a")
 OPTAB_D (vec_packu_float_optab, "vec_packu_float_$a")
@@ -350,6 +351,8 @@ OPTAB_D (vec_unpacks_float_hi_optab, "ve
 OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
 OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
 OPTAB_D (vec_unpacks_lo_optab, "vec_unpacks_lo_$a")
+OPTAB_D (vec_unpacks_sbool_hi_optab,

Re: [PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-12-17 Thread Peter Bergner

On 12/17/18 3:55 PM, Segher Boessenkool wrote:
> On Mon, Dec 17, 2018 at 02:23:54PM -0600, Peter Bergner wrote:
>> This means we require the insn to eventually be split and not reach final
>> assembly, does it not?
> 
> Yes, but is the length attribute used for something that matters before
> that?  For correctness, not just for better code.  It isn't clear to me
> that this will work.
> 
> OTOH I cannot currently show it does not work either.  So, okay for trunk,
> and I hope it actually works :-)

Ok, committed.  I'll try and keep an eye out for any fallout, but I'm
fairly certain there won't be.  Thanks!

Peter

Re: [PATCH, V2, d] Fix IdentityExp comparison for complex floats

2018-12-17 Thread Iain Buclaw

On Wed, 28 Nov 2018 at 23:46, Iain Buclaw  wrote:
>
> On Wed, 28 Nov 2018 at 22:32, Johannes Pfau  wrote:
> >
> > Next version, addresses the review comments.
> >
> > Tested at https://github.com/D-Programming-GDC/GDC/pull/768
> > ---
> > gcc/d/ChangeLog:
> >
> > 2018-11-28  Johannes Pfau  
> >
> > * expr.cc (ExprVisitor::visit(IdentityExp)): Add support for 
> > complex types.
> > (build_float_identity): New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-11-28  Johannes Pfau  
> >
> > * gdc.dg/runnable.d: Test IdentityExp for complex types.
> >
> >
> >  gcc/d/expr.cc   | 40 -
> >  gcc/testsuite/gdc.dg/runnable.d | 22 ++
> >  2 files changed, 51 insertions(+), 11 deletions(-)
> >
>
> As I've said before, looks reasonable to me.  Thanks.
>

I noticed that this hasn't been pushed in yet, I was about to go ahead
and commit it, however... there's another case to consider, structs
with creal fields.

This test - modified from your patch - is another that should pass but
currently doesn't.
---
struct CReal
{
creal value;
}

CReal s1 = CReal(+0.0 + 0.0i);
CReal s2 = CReal(+0.0 - 0.0i);
CReal s3 = CReal(-0.0 + 0.0i);
CReal s4 = CReal(+0.0 + 0.0i);

assert(s1 !is s2);
assert(s1 !is s3);
assert(s2 !is s3);
assert(s1 is s4);

assert(!(s1 is s2));
assert(!(s1 is s3));
assert(!(s2 is s3));
assert(!(s1 !is s4));
---

I'll send a supplementary patch, and commit both together.

-- 
Iain

Re: [PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-12-17 Thread Segher Boessenkool

On Mon, Dec 17, 2018 at 02:23:54PM -0600, Peter Bergner wrote:
> On 12/14/18 8:23 PM, Segher Boessenkool wrote:
> > On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
> >> For the alternatives
> >> I'm changing, we're loading into GPR regs and these alternatives are always
> >> split (split2), so these length values are never used/seen at final 
> >> assembly
> >> time.
> > 
> > But some move instructions are created *after* split2.
> 
> That may be so, but I do not know how we could create a move insn using
> this alternative, since rs6000_output_move_128bit() does the following
> for this alternative (ie, loading a constant into a GPR):
> 
> 
>   /* Constants.  */
>   else if (dest_regno >= 0
>&& (GET_CODE (src) == CONST_INT
>|| GET_CODE (src) == CONST_WIDE_INT
>|| GET_CODE (src) == CONST_DOUBLE
>|| GET_CODE (src) == CONST_VECTOR))
> {
>   if (dest_gpr_p)
> return "#";
> 
> This means we require the insn to eventually be split and not reach final
> assembly, does it not?

Yes, but is the length attribute used for something that matters before
that?  For correctness, not just for better code.  It isn't clear to me
that this will work.

OTOH I cannot currently show it does not work either.  So, okay for trunk,
and I hope it actually works :-)


Segher

[PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-17 Thread Bill Schmidt

Hi,

We recently discovered some incorrect documentation about this topic and agreed 
it should be changed.
This is my attempt to clarify it.  Built and verified on powerpc64le-linux-gnu. 
 Is this ok for trunk?

Thanks,
Bill


2018-12-17  Bill Schmidt  

* doc/extend.texi (PowerPC Altivec/VSX Built-in Functions):
Describe when a typedef name can be used as the type specifier for
a vector type, and when it cannot.


Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 265974)
+++ gcc/doc/extend.texi (working copy)
@@ -16229,9 +16229,30 @@ disabled.  To use them, you must include @code{

[nvptx, committed] Move macro defs to top of nvptx.c

2018-12-17 Thread Tom de Vries

[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0005-nvptx-update-openacc-dim-macros.patch

Factored out this patch.

Committed.

Thanks,
- Tom
[nvptx] Move macro defs to top of nvptx.c

Move macro definition to the top of the file, allowing them to be used
there-after.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH,
	PTX_DEFAULT_RUNTIME_DIM): Move to the top of the file.

---
 gcc/config/nvptx/nvptx.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9906716890e..74ca0f585aa 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -82,6 +82,9 @@
 #define WORKAROUND_PTXJIT_BUG_3 1
 
 #define PTX_WARP_SIZE 32
+#define PTX_VECTOR_LENGTH 32
+#define PTX_WORKER_LENGTH 32
+#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -5166,11 +5169,6 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 default: gcc_unreachable ();
 }
 }
-
-/* Define dimension sizes for known hardware.  */
-#define PTX_VECTOR_LENGTH 32
-#define PTX_WORKER_LENGTH 32
-#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
 /* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */

[nvptx, committed] Add PTX_WARP_SIZE

2018-12-17 Thread Tom de Vries

[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0005-nvptx-update-openacc-dim-macros.patch

Factored out this patch.

Committed.

Thanks,
- Tom
[nvptx] Add PTX_WARP_SIZE

Add PTX_WARP_SIZE constant and use it in nvptx_simt_vf.  The function
nvptx_simt_vf is used for OpenMP, and using PTX_WARP_SIZE here decouples the
OpenMP support from the PTX_VECTOR_LENGTH constant used in OpenACC support.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (PTX_WARP_SIZE): Define.
	(nvptx_simt_vf): Return PTX_WARP_SIZE instead of PTX_VECTOR_LENGTH.

---
 gcc/config/nvptx/nvptx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 01505899785..9906716890e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -81,6 +81,8 @@
 #define WORKAROUND_PTXJIT_BUG_2 1
 #define WORKAROUND_PTXJIT_BUG_3 1
 
+#define PTX_WARP_SIZE 32
+
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
 {
@@ -5175,7 +5177,7 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 static int
 nvptx_simt_vf ()
 {
-  return PTX_VECTOR_LENGTH;
+  return PTX_WARP_SIZE;
 }
 
 /* Validate compute dimensions of an OpenACC offload or routine, fill

Re: [nvptx] vector length patch series

2018-12-17 Thread Tom de Vries

On 14-12-18 20:58, Tom de Vries wrote:
> 0009-nvptx-Fix-whitespace-in-nvptx_single-and-nvptx_neute.patch

Committed (Could have been committed as obvious).

Thanks,
- Tom
[nvptx] Fix whitespace in nvptx_single and nvptx_neuter_pars

Fix whitespace in nvptx_single and nvptx_neuter_pars.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_single): Fix whitespace.
	(nvptx_neuter_pars): Likewise.

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 24727ad5920..01505899785 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4224,7 +4224,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	pred = gen_reg_rtx (BImode);
 	cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
 	  }
-	
+
 	rtx br;
 	if (mode == GOMP_DIM_VECTOR)
 	  br = gen_br_true (pred, label);
@@ -4554,7 +4554,7 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
 }
 
   if (skip_mask)
-  nvptx_skip_par (skip_mask, par);
+nvptx_skip_par (skip_mask, par);
   
   if (par->next)
 nvptx_neuter_pars (par->next, modes, outer);

Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2018-12-17 Thread Tom Honermann


On 12/17/18 4:02 PM, Jason Merrill wrote:

On 12/5/18 11:16 AM, Jason Merrill wrote:

On 12/5/18 2:09 AM, Tom Honermann wrote:

On 12/3/18 5:01 PM, Jason Merrill wrote:

On 12/3/18 4:51 PM, Jason Merrill wrote:

On 11/5/18 2:39 PM, Tom Honermann wrote:
This patch adds support for the P0482R5 core language changes.  
This includes:

- The -fchar8_t and -fno_char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
   character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_1 and char8_t 
pointer

   types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
   macros .
- Name mangling and demangling for char8_t (using Du).

gcc/ChangeLog:

2018-11-04  Tom Honermann  

  * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  
  * c-family/c-common.c (c_common_reswords): Add char8_t.
  (fix_string_type): Use char8_t for the type of u8 string 
literals.

  (c_common_get_alias_set): char8_t doesn't alias.
  (c_common_nodes_and_builtins): Define char8_t as a builtin 
type in

  C++.
  (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
  (keyword_begins_type_specifier): Add RID_CHAR8.
  * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
  (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
  Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
  Define char8_type_node and char8_array_type_node.
  * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
  __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
  (c_cpp_builtins): Predefine __cpp_char8_t.
  * c-family/c-lex.c (lex_string): Use char8_array_type_node 
as the

  type of CPP_UTF8STRING.
  (lex_charconst): Use char8_type_node as the type of 
CPP_UTF8CHAR.

  * c-family/c.opt: Add the -fchar8_t command line option.

gcc/c/ChangeLog:

2018-11-04  Tom Honermann  

  * c/c-typeck.c (char_type_p): Add char8_type_node.
  (digest_init): Handle initialization by a u8 string literal of
  char8_t type.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  

  * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
  * cp/decl.c (grokdeclarator): Handle invalid type specifier
  combinations involving char8_t.
  * cp/lex.c (init_reswords): Add char8_t as a reserved word.
  * cp/mangle.c (write_builtin_type): Add name mangling for 
char8_t

  (Du).
  * cp/parser.c (cp_keyword_starts_decl_specifier_p,
  cp_parser_simple_type_specifier): Recognize char8_t as a 
simple

  type specifier.
  (cp_parser_string_literal): Use char8_array_type_node for 
the type

  of CPP_UTF8STRING.
  (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs 
in system

  headers.
  * cp/rtti.c (emit_support_tinfos): type_info support for 
char8_t.
  * cp/tree.c (char_type_p): Recognize char8_t as a character 
type.

  * cp/typeck.c (string_conv_p): Handle conversions of u8 string
  literals of char8_t type.
  (check_literal_operator_args): Handle UDLs with u8 string 
literals

  of char8_t type.
  * cp/typeck2.c (digest_init_r): Disallow initializing a 
char array

  with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  
  * cp-demangle.c (cplus_demangle_builtin_types,
  cplus_demangle_type): Add name demangling for char8_t (Du).
  * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to 
accommodate the

  new char8_t type.



@@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
 return -1;



+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;


This seems unnecessary; doesn't the existing code have the same 
effect? I think we could do with just an adjustment to the 
existing comment.
I'm not sure.  I had concerns about unintended matching due to 
char8_t having an underlying type of unsigned char.


That shouldn't be a problem: if char8_t is a distinct type, it won't 
match unsigned char, and if it's the same as unsigned char, 
flag_char8_t will be false.


+  else if (flag_char8_t && TREE_TYPE (value) == 
char8_array_type_node)

+  || (flag_char8_t && type == char8_type_node)
+  bool char8_array = (flag_char8_t && !!comptypes (typ1, 
char8_type_node));

+   || (flag_char8_t && type == char8_type_node
In many places you check the flag and then for one of the char8 
types. Since the types won't be used without the flag, checking 
the flag seems redundant?


This was again protection against unintended matching of the 
underlying unsigned char type, particularly when compiling as C. 
char8_type_node is constructed (in c_common_nodes_and_builtins) 
following the pattern in place for char16_t and char32_t with the 
follo

[nvptx, committed] Unify C/Fortran routine handling in nvptx_goacc_validate_dims

2018-12-17 Thread Tom de Vries

[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
>> 0004-openacc-Make-GFC-default-to-1-for-OpenACC-routine-di.patch
> If I remove this, I run into ICEs in the compiler, but I think that
> means we need to understand and fix that ICE, instead of concluding that
> we need this patch. It looks completely unrelated.

Indeed this
(0004-openacc-Make-GFC-default-to-1-for-OpenACC-routine-di.patch) patch
is unrelated to the vector length functionality.

However, it fixes a problem in the Fortran front-end which has as
consequence that C and Fortran routines look the same in
nvptx_goacc_validate_dims, which is a good thing to have.

However, the upstreaming of the patch seems to be stuck, so I've
committed an nvptx workaround patch that has the same effect, allowing
us to drop it
(0004-openacc-Make-GFC-default-to-1-for-OpenACC-routine-di.patch) from
the patch series.

Thanks,
- Tom

[nvptx] Unify C/Fortran routine handling in nvptx_goacc_validate_dims

The Fortran front-end has a bug (PR72741) that means what when
nvptx_goacc_validate_dims is called for a Fortran routine, the dims parameter
is not the same as it would have been if the function would have been called for
an equivalent C routine.

Work around this bug by overriding the dims parameter for routines, allowing the
function to handle routines in Fortran and C the same.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Work around Fortran
	bug PR72741 by overriding dims parameter for routines.

---
 gcc/config/nvptx/nvptx.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 746d8bfb100..24727ad5920 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5212,6 +5212,42 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
   else
 gcc_unreachable ();
 
+  if (routine_p)
+{
+  /* OpenACC routines in C arrive here with the following attributes
+	 (omitting the 'omp declare target'):
+	 seq   : __attribute__((oacc function (0 1, 0 1, 0 1)))
+	 vector: __attribute__((oacc function (0 1, 0 1, 1 0)))
+	 worker: __attribute__((oacc function (0 1, 1 0, 1 0)))
+	 gang  : __attribute__((oacc function (1 0, 1 0, 1 0)))
+
+	 If we take f.i. the oacc function attribute of the worker routine
+	 (0 1, 1 0, 1 0), then:
+	 - the slice (0, 1, 1) is interpreted by oacc_fn_attrib_level as
+	   meaning: worker routine, that is:
+	   - can't contain gang loop (0),
+	   - can contain worker loop (1),
+	   - can contain vector loop (1).
+	 - the slice (1, 0, 0) is interpreted by oacc_validate_dims as the
+	 dimensions: gang: 1, worker: 0, vector: 0.
+
+	 OTOH, routines in Fortran arrive here with these attributes:
+	 seq   : __attribute__((oacc function (0 0, 0 0, 0 0)))
+	 vector: __attribute__((oacc function (0 0, 0 0, 1 0)))
+	 worker: __attribute__((oacc function (0 0, 1 0, 1 0)))
+	 gang  : __attribute__((oacc function (1 0, 1 0, 1 0)))
+	 that is, the same as for C but with the dimensions set to 0.
+
+	 This is due to a bug in the Fortran front-end: PR72741.  Work around
+	 this bug by forcing the dimensions to be the same in Fortran as for C,
+	 to be able to handle C and Fortran routines uniformly in this
+	 function.  */
+  dims[GOMP_DIM_VECTOR] = fn_level > GOMP_DIM_VECTOR ? 1 : 0;
+  dims[GOMP_DIM_WORKER] = fn_level > GOMP_DIM_WORKER ? 1 : 0;
+  dims[GOMP_DIM_GANG] = fn_level > GOMP_DIM_GANG ? 1 : 0;
+  changed = true;
+}
+
   /* The vector size must be 32, unless this is a SEQ routine.  */
   if ((offload_region_p || oacc_default_dims_p
|| (routine_p && !routine_seq_p))

[nvptx, committed] Rewrite nvptx_goacc_validate_dims to use predicate vars

2018-12-17 Thread Tom de Vries

[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch

I've factored out this cleanup patch from here.

Committed to trunk.

Thanks,
- Tom
[nvptx] Rewrite nvptx_goacc_validate_dims to use predicate vars

The function nvptx_goacc_validate_dims has arguments decl and fn_level which
together describe different situations.

Introduce a predicate var for each situation, and use them, allowing to
understand what the function does in each situation without having to know the
way the situations are encoded in the args.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Rewrite using
	predicate vars.

---
 gcc/config/nvptx/nvptx.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a273863..746d8bfb100 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5187,13 +5187,39 @@ static bool
 nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
   bool changed = false;
+  bool oacc_default_dims_p = false;
+  bool oacc_min_dims_p = false;
+  bool offload_region_p = false;
+  bool routine_p = false;
+  bool routine_seq_p = false;
+
+  if (decl == NULL_TREE)
+{
+  if (fn_level == -1)
+	oacc_default_dims_p = true;
+  else if (fn_level == -2)
+	oacc_min_dims_p = true;
+  else
+	gcc_unreachable ();
+}
+  else if (fn_level == -1)
+offload_region_p = true;
+  else if (0 <= fn_level && fn_level <= GOMP_DIM_MAX)
+{
+  routine_p = true;
+  routine_seq_p = fn_level == GOMP_DIM_MAX;
+}
+  else
+gcc_unreachable ();
 
   /* The vector size must be 32, unless this is a SEQ routine.  */
-  if (fn_level <= GOMP_DIM_VECTOR && fn_level >= -1
+  if ((offload_region_p || oacc_default_dims_p
+   || (routine_p && !routine_seq_p))
   && dims[GOMP_DIM_VECTOR] >= 0
   && dims[GOMP_DIM_VECTOR] != PTX_VECTOR_LENGTH)
 {
-  if (fn_level < 0 && dims[GOMP_DIM_VECTOR] >= 0)
+  if ((offload_region_p || oacc_default_dims_p)
+	  && dims[GOMP_DIM_VECTOR] >= 0)
 	warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
 		dims[GOMP_DIM_VECTOR]
 		? G_("using vector_length (%d), ignoring %d")
@@ -5213,7 +5239,7 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
   changed = true;
 }
 
-  if (!decl)
+  if (oacc_default_dims_p || oacc_min_dims_p)
 {
   dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
   if (dims[GOMP_DIM_WORKER] < 0)

Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2018-12-17 Thread Jason Merrill


On 12/5/18 11:16 AM, Jason Merrill wrote:

On 12/5/18 2:09 AM, Tom Honermann wrote:

On 12/3/18 5:01 PM, Jason Merrill wrote:

On 12/3/18 4:51 PM, Jason Merrill wrote:

On 11/5/18 2:39 PM, Tom Honermann wrote:
This patch adds support for the P0482R5 core language changes.  
This includes:

- The -fchar8_t and -fno_char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
   character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_1 and char8_t 
pointer

   types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
   macros .
- Name mangling and demangling for char8_t (using Du).

gcc/ChangeLog:

2018-11-04  Tom Honermann  

  * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  
  * c-family/c-common.c (c_common_reswords): Add char8_t.
  (fix_string_type): Use char8_t for the type of u8 string 
literals.

  (c_common_get_alias_set): char8_t doesn't alias.
  (c_common_nodes_and_builtins): Define char8_t as a builtin 
type in

  C++.
  (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
  (keyword_begins_type_specifier): Add RID_CHAR8.
  * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
  (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
  Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
  Define char8_type_node and char8_array_type_node.
  * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
  __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
  (c_cpp_builtins): Predefine __cpp_char8_t.
  * c-family/c-lex.c (lex_string): Use char8_array_type_node as 
the

  type of CPP_UTF8STRING.
  (lex_charconst): Use char8_type_node as the type of 
CPP_UTF8CHAR.

  * c-family/c.opt: Add the -fchar8_t command line option.

gcc/c/ChangeLog:

2018-11-04  Tom Honermann  

  * c/c-typeck.c (char_type_p): Add char8_type_node.
  (digest_init): Handle initialization by a u8 string literal of
  char8_t type.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  

  * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
  * cp/decl.c (grokdeclarator): Handle invalid type specifier
  combinations involving char8_t.
  * cp/lex.c (init_reswords): Add char8_t as a reserved word.
  * cp/mangle.c (write_builtin_type): Add name mangling for 
char8_t

  (Du).
  * cp/parser.c (cp_keyword_starts_decl_specifier_p,
  cp_parser_simple_type_specifier): Recognize char8_t as a simple
  type specifier.
  (cp_parser_string_literal): Use char8_array_type_node for the 
type

  of CPP_UTF8STRING.
  (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in 
system

  headers.
  * cp/rtti.c (emit_support_tinfos): type_info support for 
char8_t.
  * cp/tree.c (char_type_p): Recognize char8_t as a character 
type.

  * cp/typeck.c (string_conv_p): Handle conversions of u8 string
  literals of char8_t type.
  (check_literal_operator_args): Handle UDLs with u8 string 
literals

  of char8_t type.
  * cp/typeck2.c (digest_init_r): Disallow initializing a char 
array

  with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  
  * cp-demangle.c (cplus_demangle_builtin_types,
  cplus_demangle_type): Add name demangling for char8_t (Du).
  * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate 
the

  new char8_t type.



@@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
 return -1;



+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;


This seems unnecessary; doesn't the existing code have the same 
effect? I think we could do with just an adjustment to the existing 
comment.
I'm not sure.  I had concerns about unintended matching due to char8_t 
having an underlying type of unsigned char.


That shouldn't be a problem: if char8_t is a distinct type, it won't 
match unsigned char, and if it's the same as unsigned char, flag_char8_t 
will be false.


+  else if (flag_char8_t && TREE_TYPE (value) == 
char8_array_type_node)

+  || (flag_char8_t && type == char8_type_node)
+  bool char8_array = (flag_char8_t && !!comptypes (typ1, 
char8_type_node));

+   || (flag_char8_t && type == char8_type_node
In many places you check the flag and then for one of the char8 
types. Since the types won't be used without the flag, checking the 
flag seems redundant?


This was again protection against unintended matching of the 
underlying unsigned char type, particularly when compiling as C. 
char8_type_node is constructed (in c_common_nodes_and_builtins) 
following the pattern in place for char16_t and char32_t with the 
following code:


+  char8_type_node = get_identif

Re: [C++ PATCH] Fix ICE with offsetof-like initializer (PR c++/88410)

2018-12-17 Thread Jason Merrill


On 12/9/18 6:05 AM, Jakub Jelinek wrote:

Hi!

The following testcase ICEs starting with my change to move offsetof-like
expression handling from the parsing to cp_fold - if the base expression
is not INTEGER_CST, but a constant VAR_DECL initialized with INTEGER_CST,
then we don't fold it as offsetof-like expression and as we use recursive
cp_fold only on functions, not initializers, we don't fold that VAR_DECL in
there into INTEGER_CST and the middle-end ICEs on it trying to fold it.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.

Jason

Re: [PATCH] PR c++/52321 print note for static_cast to/from incomplete type

2018-12-17 Thread Jason Merrill


On 12/17/18 10:37 AM, Jonathan Wakely wrote:

If build_static_cast_1 returns false, and one or both type is a
pointer/reference to an incomplete class, print a note saying so. This
doesn't attempt to check whether the static_cast failed because the
type is incomplete (which was checked inside build_static_cast_1 but
not in the caller). It just informs the user that it is incomplete,
which is probably the most likely reason.

 PR c++/52321
 * typeck.c (build_static_cast): Print a note when the destination
 type or the operand is a pointer/reference to incomplete class type.

Tested powerpc64le-linux. OK for trunk?


OK.

Jason

Re: [PATCH] C/C++: don't suggest decls that are being initialized (PR c++/88320)

2018-12-17 Thread Jason Merrill


On 12/5/18 4:29 PM, David Malcolm wrote:

diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index 8bfa3f3..8f389e4 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -405,6 +405,15 @@ struct GTY(()) cp_parser {
   specification, if any, or UNKNOWN_LOCATION otherwise.  */
location_t innermost_linkage_specification_location;
  
+  /* A stack of all variables for which we're currently parsing an initializer.

+ This allows us to prevent offering the decls as suggestions for
+ unrecognized identifiers - following such suggestions would lead to
+ -Wuninitialized warnings.  */
+  vec *decls_being_initialized;
+
+  /* Are we within a mem-initializer-list?  This allows us to prevent
+ offering fields as suggestions for unrecognized identifiers.  */
+  int within_mem_initializer_list;


I don't think cp_parser is the right place for these; they should go in 
saved_scope instead so that when we push into a template instantiation 
they are cleared, and then restored when we pop back.


Jason

Ping: [PATCH 0/4] c/c++, asm: Various updates

2018-12-17 Thread Segher Boessenkool

Ping?

On Mon, Dec 10, 2018 at 10:47:23PM +, Segher Boessenkool wrote:
> This ties up some loose ends, and adds some more testcases.
> 
> Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
> 
> 
> Segher
> 
> 
> Segher Boessenkool (4):
>   c/c++, asm: Write the asm-qualifier loop without "done" boolean
>   c/c++, asm: Use nicer error for duplicate asm qualifiers
>   c/c++, asm: Use nicer warning for const and restrict
>   c++, asm: Do not handle any asm-qualifiers in top-level asm
> 
>  gcc/c/c-parser.c  | 106 ++---
>  gcc/c/c-tree.h|   2 +-
>  gcc/c/c-typeck.c  |   4 +-
>  gcc/cp/parser.c   | 107 
> ++
>  gcc/testsuite/g++.dg/asm-qual-1.C |  13 +
>  gcc/testsuite/g++.dg/asm-qual-2.C |  46 
>  gcc/testsuite/g++.dg/asm-qual-3.C |  12 +
>  gcc/testsuite/gcc.dg/asm-qual-1.c |   6 +--
>  gcc/testsuite/gcc.dg/asm-qual-3.c |   9 
>  9 files changed, 209 insertions(+), 96 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/asm-qual-1.C
>  create mode 100644 gcc/testsuite/g++.dg/asm-qual-2.C
>  create mode 100644 gcc/testsuite/g++.dg/asm-qual-3.C
>  create mode 100644 gcc/testsuite/gcc.dg/asm-qual-3.c
> 
> -- 
> 1.8.3.1

Re: [PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-12-17 Thread Peter Bergner

On 12/14/18 8:23 PM, Segher Boessenkool wrote:
> On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
>> For the alternatives
>> I'm changing, we're loading into GPR regs and these alternatives are always
>> split (split2), so these length values are never used/seen at final assembly
>> time.
> 
> But some move instructions are created *after* split2.

That may be so, but I do not know how we could create a move insn using
this alternative, since rs6000_output_move_128bit() does the following
for this alternative (ie, loading a constant into a GPR):


  /* Constants.  */
  else if (dest_regno >= 0
   && (GET_CODE (src) == CONST_INT
   || GET_CODE (src) == CONST_WIDE_INT
   || GET_CODE (src) == CONST_DOUBLE
   || GET_CODE (src) == CONST_VECTOR))
{
  if (dest_gpr_p)
return "#";

This means we require the insn to eventually be split and not reach final
assembly, does it not?

Peter

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Bernd Edlinger

On 12/17/18 7:46 PM, Richard Sandiford wrote:
> Segher Boessenkool  writes:
>> On Mon, Dec 17, 2018 at 11:47:42AM +, Richard Sandiford wrote:
>>> Dimitar Dimitrov  writes:
 On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
> Hi,
>
> if I understood that right, then clobbering sp is and has always been
> ignored.
>>>
>>> PR77904 was about the clobber not being ignored, so the behaviour
>>> hasn't been consistent.
>>>
>>> I'm also not sure it was always ignored in recent sources.  The clobber
>>> does get added to the associated rtl insn, and it'd be surprising if
>>> that never had an effect.
>>
>> Yes, you will usually get a frame pointer.  My point was that the epilogue
>> will restore your stack pointer both with and without the asm clobber.
> 
> I'm not confident that's the only effect though.
> 
> Also, we didn't use a frame in PR77904, and using a frame would have
> been the wrong thing to do.
> 
>>> I don't think there's a good reason to treat this differently from the
>>> preexisting PIC register error.  If the argument for making it a warning
>>> rather than an error is that the asm might happen to work by accident,
>>> then the same is true for the PIC register.
>>
>> Yes.  As well as quite a few more registers, many of those specific to
>> the target.  And there are many more things you can do terribly wrong in
>> inline assembler, of course, most of which we can never detect.
> 
> Right.  And I don't think anyone's suggesting GCC can detect everything.
> It can only police the things it knows about, which include the input,
> output and clobber clauses.
> 
> What makes the PIC register and sp worth special attention is that
> changing their values would in general invalidate other code that GCC
> generates itself.  It's not just about whether the asm has the effect
> the author wanted (whatever that was).
> 
> FWIW, I don't think we should go on a proactive hunt for other registers
> to complain about.
> 

out of curiosity I looked at the clobber statement in gdb/nat/linux-ptrace.c:

   asm volatile ("pushq %0;"
 ".globl linux_ptrace_test_ret_to_nx_instr;"
 "linux_ptrace_test_ret_to_nx_instr:"
 "ret"
 : : "r" ((uint64_t) (uintptr_t) return_address)
 : "%rsp", "memory");

it turns out to be a far jump, instruction.  And I wanted to find out what
removing the %rsp clobber actually does.  First there is a technical problem
with that, because gcc does not print an error, it is possbile to compile the
code without the sp clobber, but not to compare the code that would have been
generated if the error would only be a warning.  So I had to undo the patch
in order to see, what the sp clobber actually does, and I think Segher
mentioned that this might have an influence on the frame pointer, that turns
out to be true:

diff  linux-ptrace-spclobber.dis  linux-ptrace-noclobber.dis

449,590c449,593
<  5c0: 55  push   %rbp
<  5c1: 45 31 c9xor%r9d,%r9d
<  5c4: 41 b8 ff ff ff ff   mov$0x,%r8d
<  5ca: b9 22 00 00 00  mov$0x22,%ecx
<  5cf: ba 03 00 00 00  mov$0x3,%edx
<  5d4: be 02 00 00 00  mov$0x2,%esi
<  5d9: 31 ff   xor%edi,%edi
<  5db: 48 89 e5mov%rsp,%rbp
<  5de: 41 57   push   %r15
<  5e0: 41 56   push   %r14
<  5e2: 41 55   push   %r13
<  5e4: 41 54   push   %r12
<  5e6: 53  push   %rbx
<  5e7: 48 81 ec f8 00 00 00sub$0xf8,%rsp
[snip]
---
>  5c0: 41 56   push   %r14
>  5c2: 45 31 c9xor%r9d,%r9d
>  5c5: 41 b8 ff ff ff ff   mov$0x,%r8d
>  5cb: b9 22 00 00 00  mov$0x22,%ecx
>  5d0: 41 55   push   %r13
>  5d2: ba 03 00 00 00  mov$0x3,%edx
>  5d7: be 02 00 00 00  mov$0x2,%esi
>  5dc: 31 ff   xor%edi,%edi
>  5de: 41 54   push   %r12
>  5e0: 55  push   %rbp
>  5e1: 53  push   %rbx
>  5e2: 48 81 ec f0 00 00 00sub$0xf0,%rsp


So for me this looks not at all trivial to see if this
would work without the sp clobber, since the stack frame
might be completely different without that sp clobber.

I wonder what gdb developers think about the sp clobber
here, if it is easy to fix or if it gives trouble to them.


Thanks
Bernd.

Re: [PATCH] v5: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-17 Thread Jason Merrill


On 12/14/18 7:17 PM, David Malcolm wrote:

+  /* Since default args are effectively part of the function type,
+strip location wrappers here, since otherwise the location of
+one function's default arguments is arbitrarily chosen for
+all functions with similar signature (due to canonicalization
+of function types).  */


Hmm, looking at this again, why would this happen?  I see that 
type_list_equal uses == to compare default arguments, so two function 
types with the same default argument but different location wrappers 
shouldn't be combined.


Jason

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Szabolcs Nagy

On 17/12/2018 18:22, Uecker, Martin wrote:
> Am Montag, den 17.12.2018, 15:25 + schrieb Szabolcs Nagy:
>> On 16/12/2018 22:45, Uecker, Martin wrote:
>>> Am Sonntag, den 16.12.2018, 09:13 -0700 schrieb Jeff Law:
 Ultimately using function descriptors is an ABI breaking choice and we
 might declare that function descriptors imply higher function
 alignments.  
>>>
>>> Increasing the alignment is not an ABI breaking change.
>>
>> increasing alignment _requirement_ is an abi breaking change.
> 
> You are right. The idea was to increase the minimum alignment
> to always be compatible with code compiled with 
> "-fno-trampolines" but without actually requiring the
> alignment for other code as long as "-fno-trampolines"
> is not given. 
> 
> 
>> and it's not clear who would benefit from the new abi:
>>
>> - it affects everything that does indirect calls (if alignment
>> requirement is increased then in addition everything that has
>> functions whose address may be taken), so it can easily affect
>> existing handwritten asm and it definitely requires the rebuild
>> of the c runtime to support this abi (i think it even requires
>> asm changes there if you allow a thread or makecontext start
>> function to be a nested function).
>>
>> - it makes indirect calls more expensive everywhere, even if
>> nested functions are not used.
> 
> Yes, transition to "-fno-trampolines" by default would be a
> major undertaking and the cost for
> indirect calls might not
> be acceptable. I was not proposing this.
> 
>> i think to fix the executable stack problem in practice, the
>> new nested function mechanism should only require the rebuild
>> of code that actually contains nested functions and thus have
>> no abi or performance impact on code that never uses them.
> 
> My use case is to activate '-fno-trampolines' for some
> project which use nested functions internally. This works
> just fine with existing code because 1) no pointers to nested
> functions escape 2) the default alignment on the existing
> code is high enough.
> 
> This is a practical fix, but only when you are careful and
> activate on a case by case. Of course, it is not a full solution
> to the general problem. 

i see.
i think that's not a common use-case.
i'd expect nested function pointers to often escape
(as callbacks to extern library function calls).

>> i believe this can be achieved by some restrictions on nested
>> function usage in a way that covers most practical use-cases:
>> e.g. only allowing one active parent function call frame per
>> thread, no recursive calls to it, the nested function must be
>> invoked in the same thread as the parent using the same stack,
>> etc. (then the new mechanism can be used safely if nested
>> functions are known to follow the restrictions, the compiler
>> may even emit code to check the constraints at runtime.)
> 
> So a thread_local static variable for storing the static
> chain?

something like that, but the more i think about it the
harder it seems: the call site of the nested function
may not be under control of the nested function writer,
in particular the nested function may be called on a
different thread, and extern library apis are unlikely
to provide guarantees about this, so in general if a
nested function escapes into an extern library then
this cannot be relied on, which limits my original
idea again to cases where there is no escape (which i
think is not that useful).

[PATCH][GCC][AArch64] Fix command line options canonicalization. (PR target/88530)

2018-12-17 Thread Tamar Christina

Hi All,

The options don't seem to get canonicalized into the smallest possible set
before output to the assembler. This means that overlapping feature sets are
emitted with superfluous parts.

Normally this isn't an issue, but in the case of crypto we have retro-actively
split it into aes and sha2. We need to emit only +crypto to the assembler
so old assemblers continue to work.

Because of how -mcpu=native and -march=native work they end up enabling all 
feature
bits, so we need to get the smallest possible set, which would also fix the
problem with older the assemblers and the retro-active split.

Admittedly this should be done earlier in options processing, but the problem
with the way AArch64 currently processes options is that where the isa_bits are
determined we don't know which options are part of the default set yet.

Which is why we instead do it late in processing when we have all the
information.  This however requires us to make a duplicate of the extensions
list.

The Option handling structures have been extended to have a boolean to indicate
whether the option is synthetic, with that I mean if the option flag itself has 
a bit.

e.g. +crypto isn't an actual bit, it just enables other bits, but other 
features flags
like +rdma also enable multiple options but are themselves also a feature.

There are two ways to solve this.

1) Either have the options that are feature bits also turn themselves on, e.g. 
change
   rdma to turn on FP, SIMD and RDMA as dependency bits.
2) Make a distinction between these two different type of features and have the 
framework
   handle it correctly.

Even though it's more code I went for the second approach, as it's the one 
that'll be less
fragile and give the least surprises.

This is a stop-gap measure that's has the lowest impact and is back-portable.

Effectively this patch changes the following:

The values before the => are the old compiler and after the => the new code.

-march=armv8.2-a+crypto+sha2 => -march=armv8.2-a+crypto
-march=armv8.2-a+sha2+aes => -march=armv8.2-a+crypto

The remaining behaviors stay the same.


Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-17  Tamar Christina  

PR target/88530
* common/config/aarch64/aarch64-common.c
(struct aarch64_option_extension): Add is_synthetic.
(all_extensions): Use it.
(TARGET_OPTION_INIT_STRUCT): Define hook.
(struct gcc_targetm_common): Moved to end.
(all_extensions_by_on): New.
(opt_ext_cmp, typedef opt_ext): New.
(aarch64_option_init_struct): New.
(aarch64_contains_opt): New.
(aarch64_get_extension_string_for_isa_flags): Output smallest set.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto.
(fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3,
sm4, fp16fml, sve, profile): Set is_synthetic to false.
(crypto): Set is_synthetic to true.

gcc/testsuite/ChangeLog:

2018-12-17  Tamar Christina  

PR target/88530
* gcc.target/aarch64/options_set_1.c: New test.
* gcc.target/aarch64/options_set_2.c: New test.
* gcc.target/aarch64/options_set_3.c: New test.
* gcc.target/aarch64/options_set_4.c: New test.
* gcc.target/aarch64/options_set_5.c: New test.
* gcc.target/aarch64/options_set_6.c: New test.
* gcc.target/aarch64/options_set_7.c: New test.
* gcc.target/aarch64/options_set_8.c: New test.
* gcc.target/aarch64/options_set_9.c: New test.

-- 
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index dd7d42673402c3cf16ebce009d263d62d574690a..d14237d229fd958c940aee32d3d6404b04cc137e 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -46,6 +46,8 @@
 #define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
 #undef TARGET_OPTION_VALIDATE_PARAM
 #define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
+#undef TARGET_OPTION_INIT_STRUCT
+#define TARGET_OPTION_INIT_STRUCT aarch64_option_init_struct
 
 /* Set default optimization options.  */
 static const struct default_options aarch_option_optimization_table[] =
@@ -164,8 +166,6 @@ aarch64_handle_option (struct gcc_options *opts,
 }
 }
 
-struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
-
 /* An ISA extension in the co-processor and main instruction set space.  */
 struct aarch64_option_extension
 {
@@ -173,15 +173,28 @@ struct aarch64_option_extension
   const unsigned long flag_canonical;
   const unsigned long flags_on;
   const unsigned long flags_off;
+  const bool is_synthetic;
 };
 
 /* ISA extensions in AArch64.  */
 static const struct aarch64_option_extension all_extensions[] =
 {
-#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z)

Patch ping (was Re: [C++ PATCH] Fix ICE with offsetof-like initializer (PR c++/88410))

2018-12-17 Thread Jakub Jelinek

Hi!

On Sun, Dec 09, 2018 at 12:05:06PM +0100, Jakub Jelinek wrote:
> The following testcase ICEs starting with my change to move offsetof-like
> expression handling from the parsing to cp_fold - if the base expression
> is not INTEGER_CST, but a constant VAR_DECL initialized with INTEGER_CST,
> then we don't fold it as offsetof-like expression and as we use recursive
> cp_fold only on functions, not initializers, we don't fold that VAR_DECL in
> there into INTEGER_CST and the middle-end ICEs on it trying to fold it.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2018-12-09  Jakub Jelinek  
> 
>   PR c++/88410
>   * cp-gimplify.c (cp_fold) : For offsetof-like folding,
>   call maybe_constant_value on val to see if it is INTEGER_CST.
> 
>   * g++.dg/cpp0x/pr88410.C: New test.

I'd like to ping this patch.

Thanks.

Jakub

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Richard Sandiford

Segher Boessenkool  writes:
> On Mon, Dec 17, 2018 at 11:47:42AM +, Richard Sandiford wrote:
>> Dimitar Dimitrov  writes:
>> > On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
>> >> Hi,
>> >> 
>> >> if I understood that right, then clobbering sp is and has always been
>> >> ignored.
>> 
>> PR77904 was about the clobber not being ignored, so the behaviour
>> hasn't been consistent.
>> 
>> I'm also not sure it was always ignored in recent sources.  The clobber
>> does get added to the associated rtl insn, and it'd be surprising if
>> that never had an effect.
>
> Yes, you will usually get a frame pointer.  My point was that the epilogue
> will restore your stack pointer both with and without the asm clobber.

I'm not confident that's the only effect though.

Also, we didn't use a frame in PR77904, and using a frame would have
been the wrong thing to do.

>> I don't think there's a good reason to treat this differently from the
>> preexisting PIC register error.  If the argument for making it a warning
>> rather than an error is that the asm might happen to work by accident,
>> then the same is true for the PIC register.
>
> Yes.  As well as quite a few more registers, many of those specific to
> the target.  And there are many more things you can do terribly wrong in
> inline assembler, of course, most of which we can never detect.

Right.  And I don't think anyone's suggesting GCC can detect everything.
It can only police the things it knows about, which include the input,
output and clobber clauses.

What makes the PIC register and sp worth special attention is that
changing their values would in general invalidate other code that GCC
generates itself.  It's not just about whether the asm has the effect
the author wanted (whatever that was).

FWIW, I don't think we should go on a proactive hunt for other registers
to complain about.

Thanks,
Richard

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Andreas Schwab

On Dez 17 2018, "Uecker, Martin"  wrote:

> Am Montag, den 17.12.2018, 10:28 -0700 schrieb Jeff Law:
>> On 12/16/18 3:45 PM, Uecker, Martin wrote:
>> > But most architectures require a higher alignment anyway.
>> > Here is a list of all targets where function alignment
>> > is 1 byte:
>> > 
>> > gcc/config/avr/avr.h:#define FUNCTION_BOUNDARY 8
>> > gcc/config/i386/i386.h:#define FUNCTION_BOUNDARY 8
>> > gcc/config/m32c/m32c.h:#define FUNCTION_BOUNDARY 8
>> > gcc/config/mn10300/mn10300.h:#define FUNCTION_BOUNDARY 8
>> > gcc/config/pa/pa.h:#define FUNCTION_BOUNDARY BITS_PER_WORD
>> > gcc/config/rl78/rl78.h:#define FUNCTION_BOUNDARY   8
>> > gcc/config/rx/rx.h:#define FUNCTION_BOUNDARY   ((rx_cpu_type 
>> > == RX100 ||
>> > rx_cpu_type == RX200) ? 4 : 8)
>
> (BTW: pa was included here by mistake)

The rx config apparently confused bits and bytes.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

[PATCH, d] Committed merge with upstream dmd

2018-12-17 Thread Iain Buclaw

Hi,

This patch merges the D front-end implementation with dmd upstream 237ca3fbe.

Backports a fix where a bad cast to TypeFunction resulted in memory
corruption.  The logic in the function semantic has been fixed, and
casts have been replaced with a function call to always check the
front-end AST node value.

Bootstrapped and tested on x86_64-linux-gnu.

Committed to trunk as r267207.

-- 
Iain
---
diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index a1a1fa0efd1..bc35d4adc1f 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-e2fe2687b817a201528abaa3aa882333e04db01b
+237ca3fbe8f9ac4b64e26ce912c20439ee4fc63a
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/dclass.c b/gcc/d/dmd/dclass.c
index 6fe17b36576..ce9849fc7dd 100644
--- a/gcc/d/dmd/dclass.c
+++ b/gcc/d/dmd/dclass.c
@@ -805,7 +805,7 @@ Lancestorsdone:
 if (fd && !fd->errors)
 {
 //printf("Creating default this(){} for class %s\n", toChars());
-TypeFunction *btf = (TypeFunction *)fd->type;
+TypeFunction *btf = fd->type->toTypeFunction();
 TypeFunction *tf = new TypeFunction(NULL, NULL, 0, LINKd, fd->storage_class);
 tf->mod = btf->mod;
 tf->purity = btf->purity;
@@ -1152,7 +1152,7 @@ int isf(void *param, Dsymbol *s)
 
 bool ClassDeclaration::isFuncHidden(FuncDeclaration *fd)
 {
-//printf("ClassDeclaration::isFuncHidden(class = %s, fd = %s)\n", toChars(), fd->toChars());
+//printf("ClassDeclaration::isFuncHidden(class = %s, fd = %s)\n", toChars(), fd->toPrettyChars());
 Dsymbol *s = search(Loc(), fd->ident, IgnoreAmbiguous | IgnoreErrors);
 if (!s)
 {
@@ -1749,6 +1749,7 @@ bool InterfaceDeclaration::isBaseOf(ClassDeclaration *cd, int *poffset)
 //printf("\tX base %s\n", b->sym->toChars());
 if (this == b->sym)
 {
+//printf("\tfound at offset %d\n", b->offset);
 if (poffset)
 {
 // don't return incorrect offsets https://issues.dlang.org/show_bug.cgi?id=16980
@@ -1882,8 +1883,7 @@ bool BaseClass::fillVtbl(ClassDeclaration *cd, FuncDeclarations *vtbl, int newin
 
 assert(ifd);
 // Find corresponding function in this class
-tf = (ifd->type->ty == Tfunction) ? (TypeFunction *)(ifd->type) : NULL;
-assert(tf);  // should always be non-null
+tf = ifd->type->toTypeFunction();
 fd = cd->findFunc(ifd->ident, tf);
 if (fd && !fd->isAbstract())
 {
diff --git a/gcc/d/dmd/dstruct.c b/gcc/d/dmd/dstruct.c
index f9f15ba9092..77d6174241d 100644
--- a/gcc/d/dmd/dstruct.c
+++ b/gcc/d/dmd/dstruct.c
@@ -46,7 +46,7 @@ FuncDeclaration *search_toString(StructDeclaration *sd)
 if (!tftostring)
 {
 tftostring = new TypeFunction(NULL, Type::tstring, 0, LINKd);
-tftostring = (TypeFunction *)tftostring->merge();
+tftostring = tftostring->merge()->toTypeFunction();
 }
 
 fd = fd->overloadExactMatch(tftostring);
@@ -92,6 +92,7 @@ void semanticTypeInfo(Scope *sc, Type *t)
 }
 void visit(TypeStruct *t)
 {
+//printf("semanticTypeInfo::visit(TypeStruct = %s)\n", t->toChars());
 StructDeclaration *sd = t->sym;
 
 /* Step 1: create TypeInfoDeclaration
diff --git a/gcc/d/dmd/func.c b/gcc/d/dmd/func.c
index c8f9c5c350a..4e1b3e2d2d3 100644
--- a/gcc/d/dmd/func.c
+++ b/gcc/d/dmd/func.c
@@ -411,8 +411,8 @@ static bool canInferAttributes(FuncDeclaration *fd, Scope *sc)
  */
 static void initInferAttributes(FuncDeclaration *fd)
 {
-assert(fd->type->ty == Tfunction);
-TypeFunction *tf = (TypeFunction *)fd->type;
+//printf("initInferAttributes() for %s\n", toPrettyChars());
+TypeFunction *tf = fd->type->toTypeFunction();
 if (tf->purity == PUREimpure) // purity not specified
 fd->flags |= FUNCFLAGpurityInprocess;
 
@@ -495,7 +495,7 @@ void FuncDeclaration::semantic(Scope *sc)
 fld->tok = TOKfunction;
 else
 assert(0);
-linkage = ((TypeFunction *)treq->nextOf())->linkage;
+linkage = treq->nextOf()->toTypeFunction()->linkage;
 }
 else
 linkage = sc->linkage;
@@ -505,11 +505,21 @@ void FuncDeclaration::semantic(Scope *sc)
 
 if (!originalType)
 originalType = type->syntaxCopy();
+if (type->ty != Tfunction)
+{
+if (type->ty != Terror)
+{
+error("%s must be a function instead of %s", toChars(), type->toChars());
+type = Type::terror;
+}
+errors = true;
+return;
+}
 if (!type->deco)
 {
 sc = sc->push();
 sc->stc |= storage_class & (STCdisable | STCdeprecated);  // forward to function type
-TypeFunction *tf = (TypeFunction *)type;
+TypeFunction *tf = type->toTypeFunction();
 
 if (sc->func)
 {
@@ -678

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Uecker, Martin

Am Montag, den 17.12.2018, 15:25 + schrieb Szabolcs Nagy:
> On 16/12/2018 22:45, Uecker, Martin wrote:
> > Am Sonntag, den 16.12.2018, 09:13 -0700 schrieb Jeff Law:
> > > Ultimately using function descriptors is an ABI breaking choice and we
> > > might declare that function descriptors imply higher function
> > > alignments.  
> > 
> > Increasing the alignment is not an ABI breaking change.
> 
> increasing alignment _requirement_ is an abi breaking change.

You are right. The idea was to increase the minimum alignment
to always be compatible with code compiled with 
"-fno-trampolines" but without actually requiring the
alignment for other code as long as "-fno-trampolines"
is not given. 

> and it's not clear who would benefit from the new abi:
> 
> - it affects everything that does indirect calls (if alignment
> requirement is increased then in addition everything that has
> functions whose address may be taken), so it can easily affect
> existing handwritten asm and it definitely requires the rebuild
> of the c runtime to support this abi (i think it even requires
> asm changes there if you allow a thread or makecontext start
> function to be a nested function).
>
> - it makes indirect calls more expensive everywhere, even if
> nested functions are not used.

Yes, transition to "-fno-trampolines" by default would be a
major undertaking and the cost for
indirect calls might not
be acceptable. I was not proposing this.

> i think to fix the executable stack problem in practice, the
> new nested function mechanism should only require the rebuild
> of code that actually contains nested functions and thus have
> no abi or performance impact on code that never uses them.

My use case is to activate '-fno-trampolines' for some
project which use nested functions internally. This works
just fine with existing code because 1) no pointers to nested
functions escape 2) the default alignment on the existing
code is high enough.

This is a practical fix, but only when you are careful and
activate on a case by case. Of course, it is not a full solution
to the general problem. 

> i believe this can be achieved by some restrictions on nested
> function usage in a way that covers most practical use-cases:
> e.g. only allowing one active parent function call frame per
> thread, no recursive calls to it, the nested function must be
> invoked in the same thread as the parent using the same stack,
> etc. (then the new mechanism can be used safely if nested
> functions are known to follow the restrictions, the compiler
> may even emit code to check the constraints at runtime.)

So a thread_local static variable for storing the static
chain?

Best,
Martin

Re: [PATCH] AIX extra_headers

2018-12-17 Thread Segher Boessenkool

On Mon, Dec 17, 2018 at 10:07:30AM -0500, David Edelsohn wrote:
> Currently the AIX configuration defines extra_headers to install some
> PowerPC-specific headers, such as altivec.h. The GCC configuration for
> Power was changed to define this centrally and the individual
> definitions for Linux were removed, but remained for AIX.  This means
> that AIX overrides the common definition and is missing some headers.
> 
> This patch removes the AIX-specific definitions to utilize the common
> definition and install all of the GCC PowerPC-specific headers.
> 
> Bootstrapped on powerpc-ibm-aix7.2.0.0
> 
> Thanks, David
> 
> * config.gcc (powerpc-ibm-aix6.*): Delete extra_headers.
> (powerpc-ibm-aix7.1.*): Same.
> (powerpc-ibm-aix[789].*): Same.

Thanks for cleaning this up David!


Segher

Re: [PATCH] avoid folding snprintf calls with bounds > INT_MAX (PR 87096)

2018-12-17 Thread Martin Sebor


On 12/17/18 2:23 AM, Christophe Lyon wrote:

Hi,

On Thu, 13 Dec 2018 at 19:14, Jeff Law  wrote:


On 12/12/18 4:18 PM, Martin Sebor wrote:

POSIX requires snprintf to fail with EOVERFLOW when the specified
bound exceeds INT_MAX.  This requirement conflicts with the C
requirements on valid calls to the function and isn't universally
implemented (e.g., Glibc doesn't seem to follow it, and GCC has
historically not paid heed to it either).  Nevertheless, there
are implementations that do respect it (Solaris being one of
them), and it seems that GCC should make a tricky situation
even more treacherous for programmers by returning different
values from some calls to the function than the library would.
This is also what bug 87096 requests.  The patch also adds
a warning that was missing from a subset of these troublesome
calls.

The attached patch disables the snprintf constant folding and
range optimization for calls to it and other related bounded
functions when the bound is not known not to exceed INT_MAX.

Tested on x86_64-linux.

Martin

gcc-87096.diff

PR tree-optimization/87096 - Optimised snprintf is not POSIX conformant

gcc/ChangeLog:

   PR rtl-optimization/87096
   * gimple-ssa-sprintf.c (sprintf_dom_walker::handle_gimple_call): Avoid
   folding calls whose bound may exceed INT_MAX.  Diagnose bound ranges
   that exceed the limit.

gcc/testsuite/ChangeLog:

   PR tree-optimization/87096
   * gcc.dg/tree-ssa/builtin-snprintf-4.c: New test.


This new test fails on arm (and other 32 bits targets according to
gcc-testresults)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 106)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 128)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 74)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 87)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c scan-tree-dump-times
optimized " = snprintf" 12
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c scan-tree-dump-times
optimized " = vsnprintf" 12


The test assumed that PTRDIFF_MAX > INT_MAX.  I adjusted it in
r267206 to avoid that assumption.

Thanks
Martin

Re: [arm][RFC] PR target/88469 fix incorrect argument passing with 64-bit bitfields

2018-12-17 Thread Bernd Edlinger

Hi,

> Unfortunately another PCS bug has come to light with the layout of
> structs whose alignment is dominated by a 64-bit bitfield element.  Such
> fields in the type list appear to have alignment 1, but in reality, for
> the purposes of alignment of the underlying structure, the alignment is
> derived from the underlying bitfield's type.  We've been getting this
> wrong since support for over-aligned record types was added several
> releases back.  Worse still, the existing code may generate unaligned
> memory accesses that may fault on some versions of the architecture.
> 
> I'd appreciate a comment from someone with a bit more knowledge of
> record layout - the code in stor-layout.c looks hideously complex when
> it comes to bit-field layout; but it's quite possible that all of that
> is irrelevant on Arm.


I am not really an expert here, Joseph might know the right answers.

I think there are two different alignment values, one is the minimum
alignment for a type within another structure, that can be computed
with:

min_align_of_type (type)

and there is a possibly greater value that tells how a type is aligned
when used alone:

TYPE_ALIGN_UNIT (type)

I thought both are always identical on arm, but maybe they are not.
They are for instance different for double on x86_64 in structures that
type is 4-byte aligned, and used alone, it is 8-byte aligned.


I wonder why you have to iterate over all the type members.


for instance this also gets the right alignment of the bit field type:

--- arm/arm.c   (Revision 267164)
+++ arm/arm.c   (Arbeitskopie)
@@ -6622,6 +6622,8 @@
  ret = -1;
}
  
+  if (min_align_of_type (const_cast(type)) > PARM_BOUNDARY / 
BITS_PER_UNIT)
+return 1;
return ret;
  }
  


But I might have missed the point why this needs to be dome this way.


Regards
Bernd.

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Uecker, Martin

Am Montag, den 17.12.2018, 10:31 -0700 schrieb Martin Sebor:
> On 12/16/18 6:45 AM, Uecker, Martin wrote:
> > Am Freitag, den 14.12.2018, 18:20 -0700 schrieb Martin Sebor:
> > > On 12/14/18 4:36 PM, Jeff Law wrote:
> > > > On 12/14/18 3:05 AM, Uecker, Martin wrote:
> > > > > 
> > > > > Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:
> > > > > > On 12/12/18 11:12 AM, Uecker, Martin wrote:
> > > > > 
> > > > > ...
> > > > > > > > > diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
> > > > > > > > > index 78e768c2366..ef039560eb9 100644
> > > > > > > > > --- a/gcc/c/c-objc-common.h
> > > > > > > > > +++ b/gcc/c/c-objc-common.h
> > > > > > > > > @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If
> > > > > > > > > not see
> > > > > > > > >    
> > > > > > > > >    #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
> > > > > > > > >    #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P 
> > > > > > > > > c_vla_unspec_p
> > > > > > > > > +
> > > > > > > > > +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
> > > > > > > > > +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
> > > > > > > > >    #endif /* GCC_C_OBJC_COMMON */
> > > > > > > > 
> > > > > > > > I wonder if we even need the lang hook anymore.  ISTM that a
> > > > > > > > front-end
> > > > > > > > that wants to use the function descriptors can just set
> > > > > > > > FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor,
> > > > > > > > else we'll
> > > > > > > > use the trampoline.  Thoughts?
> > > > > > > 
> > > > > > > The lang hook also affects the minimum alignment for function
> > > > > > > pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This
> > > > > > > does
> > > > > > > not appear to change the default alignment on any architecture, 
> > > > > > > but
> > > > > > > it causes a failure in i386/gcc.target/i386/attr-aligned.c when
> > > > > > > requesting a smaller alignment which is then silently ignored.
> > > > > > 
> > > > > > Ugh.  I didn't see that.
> > > > > 
> > > > > The test is new (2019-11-29 Martin Sebor), but one could
> > > > > argue that we could simply remove this specific test as 'aligned'
> > > > > is only required to increase alignment. Martin?
> > > > 
> > > > The test is meant to test that we do the right thing consistently.  If
> > > > we're failing with your patch, then that needs to be addressed.
> > > 
> > > I haven't been paying attention here and so I don't know how the test
> > > fails after the change.  It's meant to verify that attribute aligned
> > > successfully reduces the alignment of functions that have not been
> > > previously declared with one all the way down to the supported minimum
> > > (which is 1 on i386).  I agree with Jeff that removing the test would
> > > not be right unless the failure is due to some bad assumptions on my
> > > part.  If it's the built-in that fails that could be due to a bug in
> > > it (it's very new).
> > 
> > There is a choice to be made:
> > 
> > Either we activate the lang hook for C, then the minimum alignment for
> > functions on is not 1 anymore, because we need one bit to identify the
> > descriptors.  From a correcntess point of view, this is OK as 'alignas'
> > is only required to increase alignment.
> 
> alignas doesn't apply to functions.  Changing function alignment
> is a feature unique to attribute aligned.  The attribute can be
> used to decrease the alignment of functions that have not yet
> been explicitly declared with one.  GCC has historically allowed
> this, and based on my tests, the i386 target has always allowed
> functions to be completely unaligned (either by using attribute
> aligned (1) when supported or via -Os/-falign-functions=1).
> The purpose of the test is to verify that this works.
> 
> > It is also not really regression
> > in my opinion, as it is nowhere documented that one can reduce alignment
> > to '1'. The test also has just been added a couple of days ago (if I am
> > not mistaken). For these reasons, I think it would be OK to remove the test.
> 
> This wasn't clearly documented until recently.  The test was added
> to verify that GCC behaves as intended and clarified in the manual.
> The latest manual mentions that:
> 
>    The attribute cannot be used to decrease the alignment of
>    a function previously declared with a more restrictive alignment;
>    only to increase it.  Attempts to do otherwise are diagnosed.
>    Some targets specify a minimum default alignment for functions
>    that is greater than 1.  On such targets, specifying a less
>    restrictive alignment is silently ignored.
> 
> The update to the text was discussed here:
>    https://gcc.gnu.org/ml/gcc/2018-11/msg00127.html
> 
> If the i386 target should increase the minimum supported alignment
> up from 1 under some conditions the test might need to be adjusted
> (but it should not be removed).

Thank you for clarifying. Based on the discussion I think we can't
change the minimum alignment (at least without the command-

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Uecker, Martin

Am Montag, den 17.12.2018, 10:28 -0700 schrieb Jeff Law:
> On 12/16/18 3:45 PM, Uecker, Martin wrote:
> > Am Sonntag, den 16.12.2018, 09:13 -0700 schrieb Jeff Law:
> > > On 12/16/18 6:45 AM, Uecker, Martin wrote:
> > > > Am Freitag, den 14.12.2018, 18:20 -0700 schrieb Martin Sebor:
> > > > > On 12/14/18 4:36 PM, Jeff Law wrote:
> > > > > > On 12/14/18 3:05 AM, Uecker, Martin wrote:
> > > > > > > 
> > > > > > > Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:
> > > > > > > > On 12/12/18 11:12 AM, Uecker, Martin wrote:
> > > > > > > 
...

> > > There's certainly targets where 1 byte function alignment is important
> > > from a code space perspective -- it's likely that function descriptors
> > > will never be supported on those targets.
> > 
> > But most architectures require a higher alignment anyway.
> > Here is a list of all targets where function alignment
> > is 1 byte:
> > 
> > gcc/config/avr/avr.h:#define FUNCTION_BOUNDARY 8
> > gcc/config/i386/i386.h:#define FUNCTION_BOUNDARY 8
> > gcc/config/m32c/m32c.h:#define FUNCTION_BOUNDARY 8
> > gcc/config/mn10300/mn10300.h:#define FUNCTION_BOUNDARY 8
> > gcc/config/pa/pa.h:#define FUNCTION_BOUNDARY BITS_PER_WORD
> > gcc/config/rl78/rl78.h:#define FUNCTION_BOUNDARY    8
> > gcc/config/rx/rx.h:#define FUNCTION_BOUNDARY    ((rx_cpu_type 
> > == RX100 ||
> > rx_cpu_type == RX200) ? 4 : 8)

(BTW: pa was included here by mistake)

> > 
> > From this list only i386 currently adds the target hook and
> > would be affected by this patch. But arm is also affected:
> >  
> > > It's also important to remember that not every target which uses
> > > function descriptors uses the LSB.  On some targets the LSB may switch
> > > between modes (arm vs thumb for example).  So on those targets the use
> > > of descriptors may imply an even larger minimum alignment.
> > 
> > Yes, arm reserves the lowest 2 bits so since we require an additional
> > bit and this increases the minimum alignment from 4 to 8 bytes on 
> > aarch64.
> > 
> > Here are the architectures which currently support function
> > descriptors and the bit used:
> > 
> > $ grep TARGET_CUSTOM gcc/config/*/*.c | grep define
> > gcc/config/aarch64/aarch64.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
> > gcc/config/arm/arm.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 2
> > gcc/config/csky/csky.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/i386/i386.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/ia64/ia64.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 0
> > gcc/config/mips/mips.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 2
> > gcc/config/or1k/or1k.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/powerpcspe/powerpcspe.c:#define 
> > TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/riscv/riscv.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/rs6000/rs6000.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/s390/s390.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > gcc/config/sparc/sparc.c:#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
> > 
> > For mips minimum alignment is 4 bytes, so here alignment would
> > not change.
> > 
> > > Ultimately using function descriptors is an ABI breaking choice and we
> > > might declare that function descriptors imply higher function
> > > alignments.  
> > 
> > Increasing the alignment is not an ABI breaking change.
> 
> Increasing alignment is also an ABI change if you make those low bits
> have a new meaning.  

Absolutely, but this would only happen with the command-line
flag.

> Existing code won't know if the target address is
> actually a function descriptor or just a minimally aligned real function.

Let's think practically:

There are projects which use nested functions and typically these functions
and pointers to these functions are used only internally. The code interfaces
with existing code, which usually uses an alignment which is sufficient.
On most architectures this is the case, because the minimum alignment is
already big enough, and on others because the default alignment (atleast with
optimization) is big enough. If one is careful (actually checks that no
pointer to a nested function escapes and no pointer to functions with
smaller alignment are passed in) then using  '-fno-trampolines' is 
perfectly safe. This would then allow to compile these projects in a
way which does not require an executable stack. In my opinion this is
a very useful feature.

Admittedly, we should add the same warning we also have for other
ABI-changing options (e.g. for -freg-struct-return) to the description
and also only change the minimum alignment when the flag is given.
There would then be no impact without the flag and people who know
that it safe could still use it.
 

> It's not terribly important, but the HPPA bits above are wrong.  It's
> minimum alignment is 4 bytes.  The low two bits are used for two
> different style function descriptors that already exist on that
> platform.  We

Re: [PATCH] PR fortran 88116,88467 -- Catch array constructor errors

2018-12-17 Thread Segher Boessenkool

On Sun, Dec 16, 2018 at 07:46:45AM -0800, Steve Kargl wrote:
> On Sun, Dec 16, 2018 at 11:45:11AM +0100, Dominique d'Humières wrote:
> > Hi Steve,
> > 
> > The patch works as expected.
> > 
> > Is "Can\’t " instead of "Can’t " really necessary?
> > 
> 
> I don't remember how dejagnu handles quotes in the
> regex expression.  "Can\'t" seems to work (for me).
> If you want to change it, go ahead.  Personally, I
> would prefer "Cannot". 

A backslash in a double-quoted string is handled by backslash-substitution,
before it is ever seen as a regexp.  "\'" is exactly the same as "'".

In a braced string it does not get backslash-substitution and then it _is_
handled by the regular expression parsing, but again, as RE {\'} is exactly
the same as {'}.

Segher

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Fri, 14 Dec 2018 22:28:58 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/13 11:51 PM, Thomas Schwinge wrote:
> > On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin 
> > Tang  wrote:
> >> On 2018/12/7 6:26 AM, Julian Brown wrote:
> >>> On Thu, 6 Dec 2018 22:22:46 +
> >>> Julian Brown  wrote:
> >>>
>  On Thu, 6 Dec 2018 21:42:14 +0100
>  Thomas Schwinge  wrote:
> 
> > [...]
> > ..., where the "Invalid read of size 8" happens, and which
> > eventually would try to "free (tgt)" again, via
> > libgomp/target.c:gomp_unmap_tgt:
> >
> >   attribute_hidden void
> >   gomp_unmap_tgt (struct target_mem_desc *tgt)
> >   {
> > /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> > region.  */ if (tgt->tgt_end)
> >   gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> >   
> > free (tgt->array);
> > free (tgt);
> >   }
> >
> > Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> > or something else?
> 
> I think I understand the problem now. In gomp_unmap_vars_async(), in the case 
> of
> tgt->list_count == 0 (i.e. no map arguments at all) the code should simply 
> free the tgt
> and return, while the code in goacc_async_copyout_unmap_vars() didn't handle 
> this case
> and always scheduled an asynchronous free of the tgt later, causing that 
> valgrind error
> you see.
> 
> I am still testing the attached patch, but I think it is the right fix: I 
> reviewed what I
> wrote and it seemed the way I organized things into a 
> goacc_async_copyout_unmap_vars() routine,
> including the hackish refcount++, etc. is simply unneeded. I have deleted 
> those stuff
> and consolidated things back into gomp_unmap_vars_async().
> 
> I'll update the whole patches later after complete testing, the attached is 
> the patch atop
> of the prior async patches. (the small program you gave above does pass 
> valgrind now)

Thanks, confirmed.


Grüße
 Thomas


> diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
> --- trunk-orig/libgomp/oacc-async.c   2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-async.c   2018-12-14 22:11:29.252251925 +0800
> @@ -238,31 +238,6 @@
>thr->default_async = async;
>  }
>  
> -static void
> -goacc_async_unmap_tgt (void *ptr)
> -{
> -  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> -
> -  if (tgt->refcount > 1)
> -tgt->refcount--;
> -  else
> -gomp_unmap_tgt (tgt);
> -}
> -
> -attribute_hidden void
> -goacc_async_copyout_unmap_vars (struct target_mem_desc *tgt,
> - struct goacc_asyncqueue *aq)
> -{
> -  struct gomp_device_descr *devicep = tgt->device_descr;
> -
> -  /* Increment reference to delay freeing of device memory until callback
> - has triggered.  */
> -  tgt->refcount++;
> -  gomp_unmap_vars_async (tgt, true, aq);
> -  devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
> -   (void *) tgt);
> -}
> -
>  attribute_hidden void
>  goacc_async_free (struct gomp_device_descr *devicep,
> struct goacc_asyncqueue *aq, void *ptr)
> diff -ru trunk-orig/libgomp/oacc-int.h trunk-work/libgomp/oacc-int.h
> --- trunk-orig/libgomp/oacc-int.h 2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-int.h 2018-12-14 22:11:43.379947915 +0800
> @@ -104,8 +104,6 @@
>  
>  void goacc_init_asyncqueues (struct gomp_device_descr *);
>  bool goacc_fini_asyncqueues (struct gomp_device_descr *);
> -void goacc_async_copyout_unmap_vars (struct target_mem_desc *,
> -  struct goacc_asyncqueue *);
>  void goacc_async_free (struct gomp_device_descr *, struct goacc_asyncqueue *,
>  void *);
>  struct goacc_asyncqueue *get_goacc_asyncqueue (int);
> diff -ru trunk-orig/libgomp/oacc-mem.c trunk-work/libgomp/oacc-mem.c
> --- trunk-orig/libgomp/oacc-mem.c 2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-mem.c 2018-12-14 22:10:08.325998369 +0800
> @@ -911,7 +911,7 @@
>else
>   {
> goacc_aq aq = get_goacc_asyncqueue (async);
> -   goacc_async_copyout_unmap_vars (t, aq);
> +   gomp_unmap_vars_async (t, true, aq);
>   }
>  }
>  
> diff -ru trunk-orig/libgomp/oacc-parallel.c trunk-work/libgomp/oacc-parallel.c
> --- trunk-orig/libgomp/oacc-parallel.c2018-12-14 21:06:06.649794724 
> +0800
> +++ trunk-work/libgomp/oacc-parallel.c2018-12-14 22:09:51.918353575 
> +0800
> @@ -245,7 +245,7 @@
>  {
>acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
>   dims, tgt, aq);
> -  goacc_async_copyout_unmap_vars (tgt, aq);
> +  gomp_unmap_vars_async (tgt, true, aq);
>  }
>  }
>  
> diff -ru trunk-orig/libgomp/target.c trunk-work/libgomp/target.c
> --- trunk-orig/libgomp/target.c   2018

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Martin Sebor


On 12/16/18 6:45 AM, Uecker, Martin wrote:

Am Freitag, den 14.12.2018, 18:20 -0700 schrieb Martin Sebor:

On 12/14/18 4:36 PM, Jeff Law wrote:

On 12/14/18 3:05 AM, Uecker, Martin wrote:


Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:

On 12/12/18 11:12 AM, Uecker, Martin wrote:


...

diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 78e768c2366..ef039560eb9 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If
not see
   
   #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P

   #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
+
+#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
+#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
   #endif /* GCC_C_OBJC_COMMON */


I wonder if we even need the lang hook anymore.  ISTM that a
front-end
that wants to use the function descriptors can just set
FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor,
else we'll
use the trampoline.  Thoughts?


The lang hook also affects the minimum alignment for function
pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This
does
not appear to change the default alignment on any architecture, but
it causes a failure in i386/gcc.target/i386/attr-aligned.c when
requesting a smaller alignment which is then silently ignored.


Ugh.  I didn't see that.


The test is new (2019-11-29 Martin Sebor), but one could
argue that we could simply remove this specific test as 'aligned'
is only required to increase alignment. Martin?


The test is meant to test that we do the right thing consistently.  If
we're failing with your patch, then that needs to be addressed.


I haven't been paying attention here and so I don't know how the test
fails after the change.  It's meant to verify that attribute aligned
successfully reduces the alignment of functions that have not been
previously declared with one all the way down to the supported minimum
(which is 1 on i386).  I agree with Jeff that removing the test would
not be right unless the failure is due to some bad assumptions on my
part.  If it's the built-in that fails that could be due to a bug in
it (it's very new).


There is a choice to be made:

Either we activate the lang hook for C, then the minimum alignment for
functions on is not 1 anymore, because we need one bit to identify the
descriptors.  From a correcntess point of view, this is OK as 'alignas'
is only required to increase alignment.


alignas doesn't apply to functions.  Changing function alignment
is a feature unique to attribute aligned.  The attribute can be
used to decrease the alignment of functions that have not yet
been explicitly declared with one.  GCC has historically allowed
this, and based on my tests, the i386 target has always allowed
functions to be completely unaligned (either by using attribute
aligned (1) when supported or via -Os/-falign-functions=1).
The purpose of the test is to verify that this works.


It is also not really regression
in my opinion, as it is nowhere documented that one can reduce alignment
to '1'. The test also has just been added a couple of days ago (if I am
not mistaken). For these reasons, I think it would be OK to remove the test.


This wasn't clearly documented until recently.  The test was added
to verify that GCC behaves as intended and clarified in the manual.
The latest manual mentions that:

  The attribute cannot be used to decrease the alignment of
  a function previously declared with a more restrictive alignment;
  only to increase it.  Attempts to do otherwise are diagnosed.
  Some targets specify a minimum default alignment for functions
  that is greater than 1.  On such targets, specifying a less
  restrictive alignment is silently ignored.

The update to the text was discussed here:
  https://gcc.gnu.org/ml/gcc/2018-11/msg00127.html

If the i386 target should increase the minimum supported alignment
up from 1 under some conditions the test might need to be adjusted
(but it should not be removed).

Martin



The other option is to decide that having an alignment of only '1'
is a valuable feature and should be preserved on those platforms which
support it. I have no idea what this could be good for, but maybe
there are use cases.  In this case, it makes of course sense to keep
the test.  We should then remove the lang hook (as it could never be
activated for most languages) and instead live with the fact that
'-fno-trampoline' and using alignof(1) on functions are simply
incompatible. A warning could be added if the compiler sees
alignof(1) when -fno-trampoline is active.


I am happy with both choices.


Best,
Martin

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Jeff Law

On 12/16/18 3:45 PM, Uecker, Martin wrote:
> Am Sonntag, den 16.12.2018, 09:13 -0700 schrieb Jeff Law:
>> On 12/16/18 6:45 AM, Uecker, Martin wrote:
>>> Am Freitag, den 14.12.2018, 18:20 -0700 schrieb Martin Sebor:
 On 12/14/18 4:36 PM, Jeff Law wrote:
> On 12/14/18 3:05 AM, Uecker, Martin wrote:
>>
>> Am Donnerstag, den 13.12.2018, 16:35 -0700 schrieb Jeff Law:
>>> On 12/12/18 11:12 AM, Uecker, Martin wrote:
>>
>> ...
>> diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
>> index 78e768c2366..ef039560eb9 100644
>> --- a/gcc/c/c-objc-common.h
>> +++ b/gcc/c/c-objc-common.h
>> @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If
>> not see
>>   
>>   #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
>>   #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
>> +
>> +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
>> +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
>>   #endif /* GCC_C_OBJC_COMMON */
>
> I wonder if we even need the lang hook anymore.  ISTM that a
> front-end
> that wants to use the function descriptors can just set
> FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor,
> else we'll
> use the trampoline.  Thoughts?

 The lang hook also affects the minimum alignment for function
 pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This
 does
 not appear to change the default alignment on any architecture, but
 it causes a failure in i386/gcc.target/i386/attr-aligned.c when
 requesting a smaller alignment which is then silently ignored.
>>>
>>> Ugh.  I didn't see that.
>>
>> The test is new (2019-11-29 Martin Sebor), but one could
>> argue that we could simply remove this specific test as 'aligned'
>> is only required to increase alignment. Martin?
>
> The test is meant to test that we do the right thing consistently.  If
> we're failing with your patch, then that needs to be addressed.

 I haven't been paying attention here and so I don't know how the test
 fails after the change.  It's meant to verify that attribute aligned
 successfully reduces the alignment of functions that have not been
 previously declared with one all the way down to the supported minimum
 (which is 1 on i386).  I agree with Jeff that removing the test would
 not be right unless the failure is due to some bad assumptions on my
 part.  If it's the built-in that fails that could be due to a bug in
 it (it's very new).
>>>
>>> There is a choice to be made: 
>>>
>>> Either we activate the lang hook for C, then the minimum alignment for
>>> functions on is not 1 anymore, because we need one bit to identify the 
>>> descriptors.  From a correcntess point of view, this is OK as 'alignas'
>>> is only required to increase alignment. It is also not really regression
>>> in my opinion, as it is nowhere documented that one can reduce alignment
>>> to '1'. The test also has just been added a couple of days ago (if I am
>>> not mistaken). For these reasons, I think it would be OK to remove the test.
>>>
>>> The other option is to decide that having an alignment of only '1'
>>> is a valuable feature and should be preserved on those platforms which
>>> support it. I have no idea what this could be good for, but maybe
>>> there are use cases.  In this case, it makes of course sense to keep
>>> the test.  We should then remove the lang hook (as it could never be
>>> activated for most languages) and instead live with the fact that
>>> '-fno-trampoline' and using alignof(1) on functions are simply
>>> incompatible. A warning could be added if the compiler sees
>>> alignof(1) when -fno-trampoline is active.
>>
>> There's certainly targets where 1 byte function alignment is important
>> from a code space perspective -- it's likely that function descriptors
>> will never be supported on those targets.
> 
> But most architectures require a higher alignment anyway.
> Here is a list of all targets where function alignment
> is 1 byte:
> 
> gcc/config/avr/avr.h:#define FUNCTION_BOUNDARY 8
> gcc/config/i386/i386.h:#define FUNCTION_BOUNDARY 8
> gcc/config/m32c/m32c.h:#define FUNCTION_BOUNDARY 8
> gcc/config/mn10300/mn10300.h:#define FUNCTION_BOUNDARY 8
> gcc/config/pa/pa.h:#define FUNCTION_BOUNDARY BITS_PER_WORD
> gcc/config/rl78/rl78.h:#define FUNCTION_BOUNDARY  8
> gcc/config/rx/rx.h:#define FUNCTION_BOUNDARY  ((rx_cpu_type == RX100 
> || rx_cpu_type == RX200) ? 4 : 8)


> 
> 
> From this list only i386 currently adds the target hook and
> would be affected by this patch. But arm is also affected:
>  
>> It's also important to remember that not every target which uses
>> function descriptors uses the LSB.  On some targets the LSB may switch
>> between modes (arm vs thumb fo

Re: [patch] Fix bootstrap powerpc--freebsd* targets

2018-12-17 Thread Segher Boessenkool

Hi!

On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> Since I broke powerpc*-freebsd and the other non-linux powerpc
> targets, I guess I ought to fix them.  The following is a variation on
> your first patch, that results in -mcall-linux for powerpc-freebsd*
> providing the 32-bit powerpc-linux dynamic linker.

That, like the first patch, abuses that header file.  Please do it
somewhere sane instead, not in a random subtarget file?


Segher


>   * config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Define.

[arm][RFC] PR target/88469 fix incorrect argument passing with 64-bit bitfields

2018-12-17 Thread Richard Earnshaw (lists)

Unfortunately another PCS bug has come to light with the layout of
structs whose alignment is dominated by a 64-bit bitfield element.  Such
fields in the type list appear to have alignment 1, but in reality, for
the purposes of alignment of the underlying structure, the alignment is
derived from the underlying bitfield's type.  We've been getting this
wrong since support for over-aligned record types was added several
releases back.  Worse still, the existing code may generate unaligned
memory accesses that may fault on some versions of the architecture.

I'd appreciate a comment from someone with a bit more knowledge of
record layout - the code in stor-layout.c looks hideously complex when
it comes to bit-field layout; but it's quite possible that all of that
is irrelevant on Arm.

PR target/88469
* config/arm/arm.c (arm_needs_doubleword_align): Return 2 if a
record's alignment is dominated by a bitfield with 64-bit
aligned base type.
(arm_function_arg): Emit a warning if the alignment has changed
since earlier GCC releases.
(arm_function_arg_boundary): Likewise.
(arm_setup_incoming_varargs): Likewise.
* testsuite/gcc.target/arm/aapcs/bitfield1.c: New test.

I'm not committing this yet, as I'll await comments as to whether folks
think this is sufficient.

R.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 40f0574e32e..5f5269c71c9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6589,7 +6589,9 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
 }
 }
 
-/* Return 1 if double word alignment is required for argument passing.
+/* Return 2 if double word alignment is required for argument passing,
+   but wasn't required before the fix for PR88469.
+   Return 1 if double word alignment is required for argument passing.
Return -1 if double word alignment used to be required for argument
passing before PR77728 ABI fix, but is not required anymore.
Return 0 if double word alignment is not required and wasn't requried
@@ -6609,7 +6611,8 @@ arm_needs_doubleword_align (machine_mode mode, const_tree type)
 return TYPE_ALIGN (TREE_TYPE (type)) > PARM_BOUNDARY;
 
   int ret = 0;
-  /* Record/aggregate types: Use greatest member alignment of any member.  */ 
+  int ret2 = 0;
+  /* Record/aggregate types: Use greatest member alignment of any member.  */
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 if (DECL_ALIGN (field) > PARM_BOUNDARY)
   {
@@ -6621,6 +6624,13 @@ arm_needs_doubleword_align (machine_mode mode, const_tree type)
 	 Make sure we can warn about that with -Wpsabi.  */
 	  ret = -1;
   }
+else if (TREE_CODE (field) == FIELD_DECL
+	 && DECL_BIT_FIELD (field)
+	 && TYPE_ALIGN (DECL_BIT_FIELD_TYPE (field)) > PARM_BOUNDARY)
+  ret2 = 1;
+
+  if (ret2)
+return 2;
 
   return ret;
 }
@@ -6686,7 +6696,12 @@ arm_function_arg (cumulative_args_t pcum_v, machine_mode mode,
 	inform (input_location, "parameter passing for argument of type "
 		"%qT changed in GCC 7.1", type);
   else if (res > 0)
-	pcum->nregs++;
+	{
+	  pcum->nregs++;
+	  if (res > 1 && warn_psabi)
+	inform (input_location, "parameter passing for argument of type "
+		"%qT changed in GCC 9.1", type);
+	}
 }
 
   /* Only allow splitting an arg between regs and memory if all preceding
@@ -6713,6 +6728,9 @@ arm_function_arg_boundary (machine_mode mode, const_tree type)
   if (res < 0 && warn_psabi)
 inform (input_location, "parameter passing for argument of type %qT "
 	"changed in GCC 7.1", type);
+  if (res > 1 && warn_psabi)
+inform (input_location, "parameter passing for argument of type "
+	"%qT changed in GCC 9.1", type);
 
   return res > 0 ? DOUBLEWORD_ALIGNMENT : PARM_BOUNDARY;
 }
@@ -26953,7 +26971,13 @@ arm_setup_incoming_varargs (cumulative_args_t pcum_v,
 	inform (input_location, "parameter passing for argument of "
 		"type %qT changed in GCC 7.1", type);
 	  else if (res > 0)
-	nregs++;
+	{
+	  nregs++;
+	  if (res > 1 && warn_psabi)
+		inform (input_location,
+			"parameter passing for argument of type "
+			"%qT changed in GCC 9.1", type);
+	}
 	}
 }
   else
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/bitfield1.c b/gcc/testsuite/gcc.target/arm/aapcs/bitfield1.c
new file mode 100644
index 000..7d8b7065484
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/bitfield1.c
@@ -0,0 +1,23 @@
+/* Test AAPCS layout (alignment).  */
+
+/* { dg-do run { target arm_eabi } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O" } */
+
+#ifndef IN_FRAMEWORK
+#define TESTFILE "bitfield1.c"
+
+struct bf
+{
+  unsigned long long a: 61;
+  unsigned b: 3;
+} v = {1, 1};
+
+#include "abitest.h"
+#else
+  ARG (int, 7, R0)
+  /* Attribute suggests R2, but we should use only natural alignment:  */
+  ARG (int, 9, R1)
+  ARG (int, 11, R2)
+  LAST_ARG (struct bf, v, STACK)
+#endif

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Segher Boessenkool

On Mon, Dec 17, 2018 at 11:47:42AM +, Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
> > On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
> >> Hi,
> >> 
> >> if I understood that right, then clobbering sp is and has always been
> >> ignored.
> 
> PR77904 was about the clobber not being ignored, so the behaviour
> hasn't been consistent.
> 
> I'm also not sure it was always ignored in recent sources.  The clobber
> does get added to the associated rtl insn, and it'd be surprising if
> that never had an effect.

Yes, you will usually get a frame pointer.  My point was that the epilogue
will restore your stack pointer both with and without the asm clobber.

> I don't think there's a good reason to treat this differently from the
> preexisting PIC register error.  If the argument for making it a warning
> rather than an error is that the asm might happen to work by accident,
> then the same is true for the PIC register.

Yes.  As well as quite a few more registers, many of those specific to
the target.  And there are many more things you can do terribly wrong in
inline assembler, of course, most of which we can never detect.

Segher

[PATCH v2, middle-end/i386]: Fix PR88502, Inline built-in asinh, acosh, atanh for -ffast-math

2018-12-17 Thread Uros Bizjak

On Mon, Dec 17, 2018 at 9:26 AM Richard Biener  wrote:
>
> On Mon, 17 Dec 2018, Uros Bizjak wrote:
>
> > ... and the patch.
>
> middle-end parts are OK.
>
> > On Mon, Dec 17, 2018 at 8:58 AM Uros Bizjak  wrote:
> > >
> > > Attached patch inlines calls to asinh{,f}, acosh{,f,l} and atanh{,f,l}
> > > using x87 XFmode arithmetic. In the patch, I left out asinhl due to
> > > its reduced input argument range, but perhaps it could be added back,
> > > since we are expanding under flag_unsafe_math_optimizations. The
> > > expanders are modelled after the removed inlines in glibc [1] (which
> > > also include asinhl, with a comment mentioning its reduced input
> > > argument range).

Thinking a bit more about reduced input range of asinhl - we have
similar situation with other trigonometric functions, where argument
range is reduced to +-2^63. So, I have committed version 2 of the
patch, which also expands asinhl.

2018-12-17  Uros Bizjak  

PR target/88502
* internal-fn.def (ACOSH): New.
(ASINH): Ditto.
(ATANH): Ditto.
* optabs.def (acosh_optab): New.
(asinh_optab): Ditto.
(atanh_optab): Ditto.
* config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.c (ix86_emit_i387_asinh): New function.
(ix86_emit_i387_acosh): Ditto.
(ix86_emit_i387_atanh): Ditto.
* config/i386/i386.md (asinhxf2): New expander.
(asinh2):Ditto.
(acoshxf2): Ditto.
(acosh2): Ditto.
(atanhxf2): Ditto.
(atanh2): Ditto.

Uros.
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 267203)
+++ config/i386/i386-protos.h   (working copy)
@@ -170,6 +170,9 @@
 extern void x86_emit_floatuns (rtx [2]);
 extern void ix86_emit_fp_unordered_jump (rtx);
 
+extern void ix86_emit_i387_asinh (rtx, rtx);
+extern void ix86_emit_i387_acosh (rtx, rtx);
+extern void ix86_emit_i387_atanh (rtx, rtx);
 extern void ix86_emit_i387_log1p (rtx, rtx);
 extern void ix86_emit_i387_round (rtx, rtx);
 extern void ix86_emit_swdivsf (rtx, rtx, rtx, machine_mode);
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 267203)
+++ config/i386/i386.c  (working copy)
@@ -44054,6 +44054,135 @@
   JUMP_LABEL (insn) = label;
 }
 
+/* Output code to perform an asinh XFmode calculation.  */
+
+void ix86_emit_i387_asinh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx scratch = gen_reg_rtx (HImode);
+  rtx flags = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  rtx cst1, tmp;
+  rtx_code_label *jump_label = gen_label_rtx ();
+  rtx_insn *insn;
+
+  /* e2 = sqrt (op1^2 + 1.0) + 1.0 */
+  emit_insn (gen_mulxf3 (e1, op1, op1));
+  cst1 = force_reg (XFmode, CONST1_RTX (XFmode));
+  emit_insn (gen_addxf3 (e2, e1, cst1));
+  emit_insn (gen_sqrtxf2 (e2, e2));
+  emit_insn (gen_addxf3 (e2, e2, cst1));
+
+  /* e1 = e1 / e2 */
+  emit_insn (gen_divxf3 (e1, e1, e2));
+
+  /* scratch = fxam (op1) */
+  emit_insn (gen_fxamxf2_i387 (scratch, op1));
+
+  /* e1 = e1 + |op1| */
+  emit_insn (gen_absxf2 (e2, op1));
+  emit_insn (gen_addxf3 (e1, e1, e2));
+
+  /* e2 = log1p (e1) */
+  ix86_emit_i387_log1p (e2, e1);
+
+  /* flags = signbit (op1) */
+  emit_insn (gen_testqi_ext_1_ccno (scratch, GEN_INT (0x02)));
+
+  /* if (flags) then e2 = -e2 */
+  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode,
+ gen_rtx_EQ (VOIDmode, flags, const0_rtx),
+ gen_rtx_LABEL_REF (VOIDmode, jump_label),
+ pc_rtx);
+  insn = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+  predict_jump (REG_BR_PROB_BASE * 50 / 100);
+  JUMP_LABEL (insn) = jump_label;
+
+  emit_insn (gen_negxf2 (e2, e2));
+
+  emit_label (jump_label);
+  LABEL_NUSES (jump_label) = 1;
+
+  emit_move_insn (op0, e2);
+}
+
+/* Output code to perform an acosh XFmode calculation.  */
+
+void ix86_emit_i387_acosh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx cst1 = force_reg (XFmode, CONST1_RTX (XFmode));
+
+  /* e2 = sqrt (op1 + 1.0) */
+  emit_insn (gen_addxf3 (e2, op1, cst1));
+  emit_insn (gen_sqrtxf2 (e2, e2));
+
+  /* e1 = sqrt (op1 - 1.0) */
+  emit_insn (gen_subxf3 (e1, op1, cst1));
+  emit_insn (gen_sqrtxf2 (e1, e1));
+
+  /* e1 = e1 * e2 */
+  emit_insn (gen_mulxf3 (e1, e1, e2));
+
+  /* e1 = e1 + op1 */
+  emit_insn (gen_addxf3 (e1, e1, op1));
+
+  /* op0 = log (e1) */
+  emit_insn (gen_logxf2 (op0, e1));
+}
+
+/* Output code to perform an atanh XFmode calculation.  */
+
+void ix86_emit_i387_atanh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx scratch = gen_reg_rtx (HImode);
+  rtx flags = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  rtx half = const_double_from_real_value (dconsthalf, XFmode);
+  rtx cst1, tmp;
+  rtx_code_label *jump_l

[PATCH] PR c++/52321 print note for static_cast to/from incomplete type

2018-12-17 Thread Jonathan Wakely


If build_static_cast_1 returns false, and one or both type is a
pointer/reference to an incomplete class, print a note saying so. This
doesn't attempt to check whether the static_cast failed because the
type is incomplete (which was checked inside build_static_cast_1 but
not in the caller). It just informs the user that it is incomplete,
which is probably the most likely reason.

PR c++/52321
* typeck.c (build_static_cast): Print a note when the destination
type or the operand is a pointer/reference to incomplete class type.

Tested powerpc64le-linux. OK for trunk?


commit c76d6fb5cfb37c2aed387def151d9f707306128a
Author: Jonathan Wakely 
Date:   Mon Dec 17 15:01:21 2018 +

PR c++/52321 print note for static_cast to/from incomplete type

PR c++/52321
* typeck.c (build_static_cast): Print a note when the destination
type or the operand is a pointer/reference to incomplete class type.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index ac0c81155b5..47ddad16fc1 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7348,8 +7348,21 @@ build_static_cast (tree type, tree oexpr, tsubst_flags_t 
complain)
 }
 
   if (complain & tf_error)
-error ("invalid static_cast from type %qT to type %qT",
-   TREE_TYPE (expr), type);
+{
+  error ("invalid static_cast from type %qT to type %qT",
+TREE_TYPE (expr), type);
+  if ((TYPE_PTR_P (type) || TYPE_REF_P (type))
+ && CLASS_TYPE_P (TREE_TYPE (type))
+   && !COMPLETE_TYPE_P (TREE_TYPE (type)))
+   inform (DECL_SOURCE_LOCATION (TYPE_MAIN_DECL (TREE_TYPE (type))),
+   "class type %qT is incomplete", TREE_TYPE (type));
+  tree expr_type = TREE_TYPE (expr);
+  if (TYPE_PTR_P (expr_type))
+   expr_type = TREE_TYPE (expr_type);
+  if (CLASS_TYPE_P (expr_type) && !COMPLETE_TYPE_P (expr_type))
+   inform (DECL_SOURCE_LOCATION (TYPE_MAIN_DECL (expr_type)),
+   "class type %qT is incomplete", expr_type);
+}
   return error_mark_node;
 }
 
diff --git a/gcc/testsuite/g++.dg/expr/static_cast8.C 
b/gcc/testsuite/g++.dg/expr/static_cast8.C
new file mode 100644
index 000..dc4d2162d6c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/static_cast8.C
@@ -0,0 +1,27 @@
+// PR c++/52321
+struct A1; // { dg-message "note: class type 'A1' is incomplete" }
+struct A2; // { dg-message "note: class type 'A2' is incomplete" }
+struct B1; // { dg-message "note: class type 'B1' is incomplete" }
+struct B2; // { dg-message "note: class type 'B2' is incomplete" }
+
+struct C { };
+extern C* c;
+
+void pointers(C* c, A2* a2, B1* b1)
+{
+  (void) static_cast(c);  // { dg-error "invalid static_cast" }
+  (void) static_cast(a2);  // { dg-error "invalid static_cast" }
+  (void) static_cast(b1); // { dg-error "invalid static_cast" }
+}
+
+struct D1; // { dg-message "note: class type 'D1' is incomplete" }
+struct D2; // { dg-message "note: class type 'D2' is incomplete" }
+struct E1; // { dg-message "note: class type 'E1' is incomplete" }
+struct E2; // { dg-message "note: class type 'E2' is incomplete" }
+
+void references(C& c, D2& d2, E1& e1)
+{
+  (void) static_cast(c);  // { dg-error "invalid static_cast" }
+  (void) static_cast(d2);  // { dg-error "invalid static_cast" }
+  (void) static_cast(e1); // { dg-error "invalid static_cast" }
+}

Re: [PATCH, OpenACC] Properly handle wait clause with no arguments

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Tue, 27 Nov 2018 22:41:54 +0800, Chung-Lin Tang  
wrote:
> On 2018/11/8 3:13 AM, Thomas Schwinge wrote:
> > Now, why do we need the following changes, in this rather "convoluted"
> > form:
> 
|--- gcc/omp-expand.c   (revision 263981)
|+++ gcc/omp-expand.c   (working copy)
|@@ -7381,16 +7381,32 @@ expand_omp_target (struct omp_region *region)
| /* ... push a placeholder.  */
| args.safe_push (integer_zero_node);
| 
|+  bool noval_seen = false;
|+  tree noval = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
|+
|   for (; c; c = OMP_CLAUSE_CHAIN (c))
| if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
|   {
> >> +tree wait_expr = OMP_CLAUSE_WAIT_EXPR (c);
> >> +
> >> +if (TREE_CODE (wait_expr) == INTEGER_CST
> >> +&& tree_int_cst_compare (wait_expr, noval) == 0)
> >> +  {
> >> +noval_seen = true;
> >> +continue;
> >> +  }
> >> +
> >>  args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
> >> -  integer_type_node,
> >> -  OMP_CLAUSE_WAIT_EXPR (c)));
> >> +  integer_type_node, wait_expr));
> >>  num_waits++;
> >>}
> >>   
> >> -  if (!tagging || num_waits)
> >> +  if (noval_seen && num_waits == 0)
> >> +args[t_wait_idx] =
> >> +  (tagging
> >> +   ? oacc_launch_pack (GOMP_LAUNCH_WAIT, NULL_TREE, GOMP_ASYNC_NOVAL)
> >> +   : noval);
> >> +  else if (!tagging || num_waits)
> >>  {
> >>tree len;
> 
> >>case GOMP_LAUNCH_WAIT:
> >>  {
> >> -  unsigned num_waits = GOMP_LAUNCH_OP (tag);
> >> +  /* Be careful to cast the op field as a signed 16-bit, and
> >> + sign-extend to full integer.  */
> >> +  int num_waits = ((signed short) GOMP_LAUNCH_OP (tag));
> >>   
> >> -  if (num_waits)
> >> +  if (num_waits > 0)
> >>  goacc_wait (async, num_waits, &ap);
> >> +  else if (num_waits == acc_async_noval)
> >> +acc_wait_all_async (async);
> 
> > Why can't we just pass "GOMP_ASYNC_NOVAL" through like any other
> > async-argument (that is, map a single "wait" clause to "num_waits == 1,
> > *ap == GOMP_ASYNC_NOVAL"), and then handle that case in "goacc_wait",
> > avoiding all these interface changes and special casing in different
> > functions?
> > 
> > Or am I not understanding correctly what the purpose of this is?
> 
> I think the original intention was that wait(acc_async_noval) should
> correspond to "wait all" semantics, hence we should be able to ignore
> and discard other wait() clauses if they exist.

Ah, I see.  But, I'm not sure whether an optimization for such "strange"
user code ("#pragma acc [...] wait(0, 1, acc_async_noval, 5, 0, [...])")
really warrants any such GCC code complications.  ;-)

> Having that said, I think there is some incorrect code in my patch wrt
> this intended behavior, which I'll revise.

(OK, still waiting for that.)

> (The assumption of an argument-less wait clause to mean "wait all" is
> derived from the closely documented OpenACC wait *directive* specification.
> Frankly speaking, the prior section on the wait *clause* is not explicitly
> clear on this, though 'wait all' is a reasonable assumption. It would still
> be helpful if we asked the OpenACC SC to clarify)

(We're discussing that with them, but what you describe indeed I also
would agree to be what's intended, so OK to proceed assuming that.)

> As for the idea on stuffing more code into goacc_wait(), I think that's
> a pretty good suggestion, since all uses of it in oacc-parallel.c are
> actually quite similar; re-factoring this part should make things more 
> elegant.

ACK.

> > My understanding is that before, GCC never generates "negative
> > async-arguments" (now used for "GOMP_ASYNC_NOVAL"), but only non-negative
> > ones (real "async-arguments"), which we continue to handle, as before.
> 
> > Isn't that sufficient for the ABI compatibility that we promise, which is
> > (unless I'm confused now?) that old (existing) executables continue to
> > run correctly when dynamically linking against a new libgomp.  Or do we
> > also have to care about the case that an executable built with a new
> > version of GCC has to work when dynamically linked against an old
> > libgomp?
> 
> I think either way, encoding GOMP_ASYNC_NOVAL in num_waits or as an argument
> should be okay for backward compatibility, i.e. old binaries should still
> work with new libgomp with this modification.
> 
> As for new binaries vs old libgomp, I believe with the original libgomp
> oacc-parallel.c code, it's not quite possible to achieve the intended wait all
> behavior by playing with num_waits or arguments.
> 
> I'll revise the patch and re-submit later.

OK, thanks!


Grüße
 Thomas

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-17 Thread Szabolcs Nagy

On 16/12/2018 22:45, Uecker, Martin wrote:
> Am Sonntag, den 16.12.2018, 09:13 -0700 schrieb Jeff Law:
>> Ultimately using function descriptors is an ABI breaking choice and we
>> might declare that function descriptors imply higher function
>> alignments.  
> 
> Increasing the alignment is not an ABI breaking change.

increasing alignment _requirement_ is an abi breaking change.

and it's not clear who would benefit from the new abi:

- it affects everything that does indirect calls (if alignment
requirement is increased then in addition everything that has
functions whose address may be taken), so it can easily affect
existing handwritten asm and it definitely requires the rebuild
of the c runtime to support this abi (i think it even requires
asm changes there if you allow a thread or makecontext start
function to be a nested function).

- it makes indirect calls more expensive everywhere, even if
nested functions are not used.

i think to fix the executable stack problem in practice, the
new nested function mechanism should only require the rebuild
of code that actually contains nested functions and thus have
no abi or performance impact on code that never uses them.

i believe this can be achieved by some restrictions on nested
function usage in a way that covers most practical use-cases:
e.g. only allowing one active parent function call frame per
thread, no recursive calls to it, the nested function must be
invoked in the same thread as the parent using the same stack,
etc. (then the new mechanism can be used safely if nested
functions are known to follow the restrictions, the compiler
may even emit code to check the constraints at runtime.)

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Segher Boessenkool

On Sun, Dec 16, 2018 at 10:43:47AM +0200, Dimitar Dimitrov wrote:
> On Fri, Dec 14 2018 2:52:17 EET Segher Boessenkool wrote:
> > You need a few tweaks to what you committed.  Or just one perhaps: if
> > flag_pic is not set, you should not check PIC_OFFSET_TABLE_REGNUM, it is
> > meaningless in that case.  I'm not sure if you need to check whether the
> > register is fixed or not.
> The flag_pic flag is already checked by the PIC_OFFSET_TABLE_REGNUM macro. It 
> will return INVALID_REGNUM if flag_pic is false, so no error will be printed.

No, it is not.  On at least six targets the macro is simply defined as a
register number.


Segher

[PATCH] AIX extra_headers

2018-12-17 Thread David Edelsohn

Currently the AIX configuration defines extra_headers to install some
PowerPC-specific headers, such as altivec.h. The GCC configuration for
Power was changed to define this centrally and the individual
definitions for Linux were removed, but remained for AIX.  This means
that AIX overrides the common definition and is missing some headers.

This patch removes the AIX-specific definitions to utilize the common
definition and install all of the GCC PowerPC-specific headers.

Bootstrapped on powerpc-ibm-aix7.2.0.0

Thanks, David

* config.gcc (powerpc-ibm-aix6.*): Delete extra_headers.
(powerpc-ibm-aix7.1.*): Same.
(powerpc-ibm-aix[789].*): Same.

Index: config.gcc
===
--- config.gcc  (revision 267202)
+++ config.gcc  (working copy)
@@ -2734,7 +2734,6 @@ rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
use_collect2=yes
thread_file='aix'
use_gcc_stdint=wrap
-   extra_headers=altivec.h
default_use_cxa_atexit=yes
;;
 rs6000-ibm-aix7.1.* | powerpc-ibm-aix7.1.*)
@@ -2744,7 +2743,6 @@ rs6000-ibm-aix7.1.* | powerpc-ibm-aix7.1.*)
use_collect2=yes
thread_file='aix'
use_gcc_stdint=wrap
-   extra_headers="altivec.h amo.h"
default_use_cxa_atexit=yes
;;
 rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
@@ -2754,7 +2752,6 @@ rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
use_collect2=yes
thread_file='aix'
use_gcc_stdint=wrap
-   extra_headers="altivec.h amo.h"
default_use_cxa_atexit=yes
;;
 rl78-*-elf*)

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Mon, 17 Dec 2018 19:03:12 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/14 10:56 PM, Thomas Schwinge wrote:
> > +  //TODO Are these safe to call, or might this cause deadlock if 
> > something's locked?
> > CUDA_CALL_ASSERT (cuEventCreate, &e, CU_EVENT_DISABLE_TIMING);
> > CUDA_CALL_ASSERT (cuEventRecord, e, aq1->cuda_stream);
> > CUDA_CALL_ASSERT (cuStreamWaitEvent, aq2->cuda_stream, e, 0);
> > @@ -1413,6 +1416,7 @@ static void
> >   cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
> >   {
> > if (res != CUDA_SUCCESS)
> > +//TODO Is this safe to call, or might this cause deadlock if 
> > something's locked?
> >   GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
> 
> The reason there are deadlocks from inside the plugin on GOMP_PLUGIN_fatal() 
> is when we hold the
> struct gomp_device_descr's *device* lock, which is also acquired when we 
> execute atexit device shutdown handlers, hence the deadlock.
> 
> I don't think this is the case for the OpenACC entry points that grab at the 
> openacc.async.* hooks,

Ah, OK, I see.  (But I thought that method of deadlock had been fixed by
some structural changes, to have plugin functions call the
non-terminating "GOMP_PLUGIN_error" and return some error, instead of
calling "GOMP_PLUGIN_fatal"?  I may be misremembering.  Or, that's
another TODO item for later, separately...  Or, if that's actually the
case, that this has been fixed in the way I described, then should these
functions also be changed accordingly: instead of "GOMP_PLUGIN_fatal"
call "GOMP_PLUGIN_error", and then return an error code?)

> though I can audit them again if deemed so.

My understanding had been that deadlock may happen if we're inside some
of these async/wait/serialize/synchronize functions, with "async" locked,
then run into an error, then libgomp prepares to abort, and at that time
shuts down the device, which will shut down the asyncqueues
("goacc_fini_asyncqueues"), which will again try to lock "async" -- which
it actually doesn't.  My misunderstanding, I guess?

> > But then, I wonder if we couldn't skip all that locking, if we moved the
> > "asyncqueue"s from "acc_dispatch_t" into "goacc_thread"?
> > 
> 
> > struct {
> > +//TODO Why do these live in the "device" data structure, and not in 
> > the "per-thread" data structure?
> > +//TODO Aren't they meant to be separate per thread?
> > +//TODO That is, as far as I remember right now, OpenACC explicitly 
> > states that an asyncqueue doesn't entail any synchronization between 
> > different host threads.
> > +//TODO Verify OpenACC.
> > +//TODO With that moved into "goacc_thread", we could get rid of all 
> > the locking needed here?
> >   /* Once created and put into the "active" list, asyncqueues are then 
> > never
> >  destructed and removed from the "active" list, other than if the 
> > TODO
> >  device is shut down.  */
> > 
> > At this point, I will again (as in that other email) state that my
> > understanding of OpenACC is that an async queue does not entail any
> > inter-thread synchronization, so it would seem reasonable that all async
> > queues are separate per thread.
> 
> OpenACC 2.6 specification, section 2.16.1 "async clause (page 68, line 2029):
> 
> "If there are two or more host threads executing and sharing the same 
> accelerator device,
> two asynchronous operations with the same async-value will be enqueued on the 
> same activity queue"

Right, but then, in the very next sentence, it goes on to state: "If the
threads are not synchronized with respect to each other, the operations
may be enqueued in either order and therefore may execute on the device
in either order".  So this, and given that:

> That said, I recall most (if not all) of the synchronization operations and 
> behavior are all
> defined to be with respect to operations of the local host thread only, so 
> the spec mentioning interaction with
> other host threads here may be moot, as there's no way meaningful way to 
> synchronize with
> the OpenACC activity of other host threads (though correct me if I forgot 
> some API)

..., I concluded something must be wrong in the OpenACC 2.6,
2.16.1. "async clause" text, and no such (host-side) inter-thread
synchronization can be expected from OpenACC "async"/"wait".  I've also
already got that on my list of things to clarify with the OpenACC
technical committee, later on.

> Also, CUDA streams do not seem to support local-thread-operation-only 
> synchronization.
> I remember this was an issue in the old implementation inside the nvptx 
> plugin as well, and we
> had hacks to somewhat alleviate it (i.e. calling streams "single" or "multi" 
> threaded)

Right.

> Well, another issue that we might want to bring up to the OpenACC committee :)
> I agree that if async queues spaces were simply thread-local then things 
> would be much simpler.

OK, so you agree with that, good.

And, no proble

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Fri, 14 Dec 2018 23:00:57 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/14 10:53 PM, Thomas Schwinge wrote:
> > Please comment on the one TODO
> > which before your async re-work also was -- incorrectly? -- run
> > asynchronously?

> > @@ -563,6 +563,8 @@ GOACC_update (int device, size_t mapnum,
> >  the value of the allocated device memory in the
> >  previous pointer.  */
> >   *(uintptr_t *) hostaddrs[i] = (uintptr_t)dptr;
> > + /* This is intentionally no calling acc_update_device_async,
> > +because TODO.  */
> >   acc_update_device (hostaddrs[i], sizeof (uintptr_t));
> >   
> >   /* Restore the host pointer.  */
| *(uintptr_t *) hostaddrs[i] = t;
> 
> I don't remember adding this piece of comment, it might have been Cesar I 
> guess?

That "TODO" comment, you mean?  It was me who just added that one, to
highlight this, to ask you to comment, whether it's feasible to turn that
"acc_update_device" into "acc_update_device_async", too, or if that has
intentionally not been done, given that before your async re-work that
also was -- incorrectly? -- run asynchronously.

> I'm not sure if there's any real reason not to use acc_update_device_async 
> here...

My worry was that the data object being copied here ("*hostaddrs[i]") is
changed immediatelly after the "acc_update_device" call (now inlined
above), so if that update get enqueued for asynchronous execution, it
might then actually copy the value after "Restore the host pointer".

Given the code before and after it, maybe "acc_update_device" is
generally the wrong function to call here?  (That's, again, a separate
change, please.)

I have not yet researched where that code is coming from, and what it's
supposed to be used for.  But as part of the async re-work we should at
least understand that one, too.

> Change and test to see?

Generally, when testing asynchronous behavior, I'd be wary of reading too
much into any such test results ("still PASSes" -- by chance?).  Unless,
of course, there's then a clear regression somewhere ("now FAILs").

..., which there isn't, because adding a "gomp_fatal" in that code path,
that doesn't trigger a single time when running the libgomp testsuite...

Grüße
 Thomas

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Bernd Edlinger

On 12/17/18 2:35 PM, Richard Sandiford wrote:
> Christophe Lyon  writes:
>> On Mon, 17 Dec 2018 at 12:47, Richard Sandiford
>>  wrote:
>>>
>>> Dimitar Dimitrov  writes:
 On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
> Hi,
>
> if I understood that right, then clobbering sp is and has always been
> ignored.
>>>
>>> PR77904 was about the clobber not being ignored, so the behaviour
>>> hasn't been consistent.
>>>
>>> I'm also not sure it was always ignored in recent sources.  The clobber
>>> does get added to the associated rtl insn, and it'd be surprising if
>>> that never had an effect.
>>>
> If that is right, then I would much prefer a warning, that says exactly
> that, because that would also help to understand why removing that clobber
> statement is safe even for old gcc versions.
>>>
>>> If the asm does leave sp with a different value, then it's never been safe,
>>> regardless of the gcc version.  That's why an error seems more appropriate.
>>>
 Thank you. Looks like general consensus is to have a warning. See attached
 patch that switches the error to a warning.
>>>
>>> I don't think there's a good reason to treat this differently from the
>>> preexisting PIC register error.  If the argument for making it a warning
>>> rather than an error is that the asm might happen to work by accident,
>>> then the same is true for the PIC register.
>>>
>>
>> If we leave the error, maybe a more explanatory message would be helpful?
>> (along the lines of what I posted earlier in this thread, which may be
>> too verbose)
> 
> The message in that patch suggested removing the clobber and hoping for
> the best, which IMO is bad advice.  If the current message doesn't make
> it clear enough that changing the sp is not allowed, how about:
> 
>  inline % statements must not change the value of the stack pointer
> 
> ?
> 

Yes, things could be easy, but, mot the closer one looks, the more complicated
they get...

Even pushing a value on the stack and popping it again in the _same_ asm 
statement
is dangerous with red-zone targets. Maybe that would be also good to add as an 
advice?


Bernd.

> Thanks,
> Richard
> 
>  
>

Re: [PATCH] error on missing LTO symbols

2018-12-17 Thread Tom de Vries

On 17-12-18 13:46, Martin Jambor wrote:
> Hi,
> 
> On Fri, Dec 14 2018, Jakub Jelinek wrote:
>> On Fri, Dec 14, 2018 at 10:21:35AM +0100, Tom de Vries wrote:
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
>>> @@ -0,0 +1,21 @@
>>> +/* { dg-do link } */
>>> +/* { dg-excess-errors "lto1, mkoffload and lto-wrapper fatal errors" } */
>>> +
>>> +int results[2000]; /* { dg-error "variable 'results' has been referenced 
>>> in offloaded code but hasn't been marked to be included in the offloaded 
>>> code" } */
>>> +
>>> +#pragma omp declare target
>>> +void  __attribute__((noinline, noclone))
>>> +baz (int i)
>>> +{
>>> +  results[i]++;
>>> +}
>>> +#pragma omp end declare target
>>> +
>>> +int
>>> +main ()
>>> +{
>>> +#pragma omp target
>>> +#pragma omp for
>>> +  for (int i = 0; i < 2000; i++)
>>> +baz (i);
>>> +}
>>
>> Ah, one more thing, the testcase doesn't really fail when offloading isn't
>> configured, so it would need some effective target or something that
>> it actually does the offloading.  And not really sure about shared memory
>> offloading like hsail, that doesn't really need the variables declared
>> either, just the functions.
>>
> 
> so IIUC and IIRC, the testcase should have:
> 
> { dg-require-effective-target offload_device_nonshared_as }
> 

Hi Martin,

thanks for the confirmation. That's what's been used in the committed
version.

- Tom

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Bernd Edlinger

On 12/17/18 2:42 PM, Christophe Lyon wrote:
> On Mon, 17 Dec 2018 at 14:35, Richard Sandiford
>  wrote:
>>
>> Christophe Lyon  writes:
>>> On Mon, 17 Dec 2018 at 12:47, Richard Sandiford
>>>  wrote:

 Dimitar Dimitrov  writes:
> On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
>> Hi,
>>
>> if I understood that right, then clobbering sp is and has always been
>> ignored.

 PR77904 was about the clobber not being ignored, so the behaviour
 hasn't been consistent.

 I'm also not sure it was always ignored in recent sources.  The clobber
 does get added to the associated rtl insn, and it'd be surprising if
 that never had an effect.

>> If that is right, then I would much prefer a warning, that says exactly
>> that, because that would also help to understand why removing that 
>> clobber
>> statement is safe even for old gcc versions.

 If the asm does leave sp with a different value, then it's never been safe,
 regardless of the gcc version.  That's why an error seems more appropriate.

> Thank you. Looks like general consensus is to have a warning. See attached
> patch that switches the error to a warning.

 I don't think there's a good reason to treat this differently from the
 preexisting PIC register error.  If the argument for making it a warning
 rather than an error is that the asm might happen to work by accident,
 then the same is true for the PIC register.

>>>
>>> If we leave the error, maybe a more explanatory message would be helpful?
>>> (along the lines of what I posted earlier in this thread, which may be
>>> too verbose)
>>
>> The message in that patch suggested removing the clobber and hoping for
>> the best, which IMO is bad advice.  If the current message doesn't make
>> it clear enough that changing the sp is not allowed, how about:
>>
>>  inline % statements must not change the value of the stack pointer
>>
>> ?
>>
> 
> My understanding is that in some cases, some users still want to
> deliberately change SP,
> so "must not" may be confusing in such cases?
> 

What do they want to achieve, something like a longjmp ?

BTW: on arm, you can add r15 (PC) to the clobber list, and
that seems to be ignored as well.  If you want to jump to
another function, you might want to clobber "sp", "pc".
PS: I know that is invalid, just guessing why there is an itch.

Bernd.

>> Thanks,
>> Richard
>>
>>

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Fri, 14 Dec 2018 22:42:28 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/14 10:32 PM, Thomas Schwinge wrote:
> > Invoked as "acc_wait_async ([...], acc_async_sync)" (as used in a test
> > case that I'll soon submit/commit), we'll end up with "aq2 == NULL", and
> > will segfault in the nvptx "openacc.async.serialize_func".
> 
> What does "wait async(acc_async_sync)" supposed to mean?

In my understanding, that'll translate to just "wait" without an "async"
clause, thus synchronous with the local (host) thread.

> Instead of fixing
> it here, will it make more sense to have the serialize_func hook to 
> accommodate
> the NULL asyncqueue?

Sure, that may make sense, yes.  Right: if there's no asyncqueue to
serialize with, then serialize/synchronize with the local (host) thread.

Grüße
 Thomas

Re: [ping] Change static chain to r11 on aarch64

2018-12-17 Thread Wilco Dijkstra

Hi Hans-Peter,

> While the choice of static-chain register does not affect the
> ABI, it's the other way round: the choice of static-chain
> register matters, specifically it's call-clobberedness.

Agreed.

> It looks like the current aarch64 static-chain register R18 is
> call-saved but without special provisions to save and restore
> the static chain register, i.e. the port is broken wrt.
> trampolines but may appear to work (likely as-if you got the
> call-clobberedness wrong for a special case; I haven't
> investigated).  I understand the i386 port gets this right.
> The CRIS port does not, but attempts and adds another bug (you
> can't use the trampoline as a register-save area on return).

> So, changing from R18 to R11 for aarch64 seems right, as the
> latter is call-clobbered and the former is call-saved IIUC.

The AArch64 ABI defines x18 as platform specific:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
On Linux it is call-clobbered, but it could be a fixed register on other
platforms (eg. a thread-local pointer). I don't think it's possible to make
it a callee-save. Still it is the wrong register to use since it already has
different uses. Using x9 would make its use as an extra argument clearer.

Wilco

Re: V6 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-17 Thread Jason Merrill


On 12/17/18 7:42 AM, H.J. Lu wrote:

On Mon, Dec 17, 2018 at 1:39 AM Richard Biener
 wrote:


On Fri, Dec 14, 2018 at 11:48 PM H.J. Lu  wrote:


On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:


On 12/13/18 6:56 PM, H.J. Lu wrote:

On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:


On 9/25/18 11:46 AM, H.J. Lu wrote:

On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:

On 07/23/2018 05:24 PM, H.J. Lu wrote:


On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


+  if (TREE_CODE (rhs) == COND_EXPR)
+{
+  /* Check the THEN path first.  */
+  tree op1 = TREE_OPERAND (rhs, 1);
+  context = check_address_of_packed_member (type, op1);



This should handle the GNU extension of re-using operand 0 if operand
1 is omitted.



Doesn't that just use a SAVE_EXPR?



Hmm, I suppose it does, but many places in the compiler seem to expect
that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.



Maybe that's used somewhere inside the C++ front end.  For C a SAVE_EXPR
is produced directly.



Here is the updated patch.  Changes from the last one:

1. Handle COMPOUND_EXPR.
2. Fixed typos in comments.
3. Combined warn_for_pointer_of_packed_member and
warn_for_address_of_packed_member into
warn_for_address_or_pointer_of_packed_member.




c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases the
alignment of ‘long int *’ pointer from 1 to 8 [-Waddress-of-packed-member]



I think this would read better as

c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 1) to
‘long int *’ (alignment 8) may result in an unaligned pointer value
[-Waddress-of-packed-member]


Fixed.


+  while (TREE_CODE (base) == ARRAY_REF)
+   base = TREE_OPERAND (base, 0);
+  if (TREE_CODE (base) != COMPONENT_REF)
+   return NULL_TREE;



Are you deliberately not handling the other handled_component_p cases? If
so, there should be a comment.


I changed it to

while (handled_component_p (base))
   {
 enum tree_code code = TREE_CODE (base);
 if (code == COMPONENT_REF)
   break;
 switch (code)
   {
   case ARRAY_REF:
 base = TREE_OPERAND (base, 0);
 break;
   default:
 /* FIXME: Can it ever happen?  */
 gcc_unreachable ();
 break;
   }
   }

Is there a testcase to trigger this ICE? I couldn't find one.


You can take the address of an element of complex:

 __complex int i;
 int *p = &__real(i);

You may get VIEW_CONVERT_EXPR with location wrappers.


Fixed.  I replaced gcc_unreachable with return NULL_TREE;


Then we're back to my earlier question: are you deliberately not
handling the other cases?  Why not look through them as well?  What if
e.g. the operand of __real is a packed field?



Here is the updated patch with

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index 615134cfdac..f105742598e 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -2669,6 +2669,9 @@ check_address_of_packed_member (tree type, tree rhs)
 switch (code)
   {
   case ARRAY_REF:
+ case REALPART_EXPR:
+ case IMAGPART_EXPR:
+ case VIEW_CONVERT_EXPR:
 base = TREE_OPERAND (base, 0);
 break;
   default:


don't we have handled_component_p () for this?  (you're still
missing BIT_FIELD_REF which might be used for vector
element accesses)



Do you have a testcase?


Is there a reason you only want to handle some component references and 
not others?  If not, checking handled_component_p is simpler and more 
future proof than enumerating specific codes.


Jason

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-17 Thread Thomas Schwinge

Hi Chung-Lin!

On Fri, 14 Dec 2018 22:52:44 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/14 10:17 PM, Thomas Schwinge wrote:
> > On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang 
> >  wrote:
> >> --- a/libgomp/oacc-async.c
> >> +++ b/libgomp/oacc-async.c
> > 
> >> +attribute_hidden struct goacc_asyncqueue *
> >> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> >> +{
> >> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> >> + default async stream.  */
> >> +  if (async == acc_async_noval)
> >> +async = thr->default_async;
> >> +
> >> +  if (async == acc_async_sync)
> >> +return NULL;
> >> +
> >> +  if (async < 0)
> >> +gomp_fatal ("bad async %d", async);
> > 
> > To make this "resolve" part more obvious, that is, the translation from
> > the "async" argument to an "asyncqueue" array index:
> > 
> >> +  if (!create
> >> +  && (async >= dev->openacc.async.nasyncqueue
> >> +|| !dev->openacc.async.asyncqueue[async]))
> >> +return NULL;
> >> +[...]
> > 
> > ..., I propose adding a "async2id" function for that, and then rename all
> > "asyncqueue[async]" to "asyncqueue[id]".
> 
> I don't think this is needed. This is the only place in the entire runtime 
> that
> does asyncqueue indexing, adding more conceptual layers of re-directed 
> indexing
> seems unneeded.

It makes the code better understandable?  Or, curious, why do you think
that the translation from an OpenACC async-argument to an internal
asyncqueue ID should not be a separate function?


> I do think the more descriptive comments are nice though.


> > And, this also restores the current trunk behavior, so that
> > "acc_async_noval" gets its own, separate "asyncqueue".
> 
> Is there a reason we need to restore that behavior right now?

Because otherwise that's a functional change ("regression") from the
current GCC trunk behavior, which I wouldn't expect in a re-work.


Grüße
 Thomas

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Christophe Lyon

On Mon, 17 Dec 2018 at 14:35, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Mon, 17 Dec 2018 at 12:47, Richard Sandiford
> >  wrote:
> >>
> >> Dimitar Dimitrov  writes:
> >> > On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
> >> >> Hi,
> >> >>
> >> >> if I understood that right, then clobbering sp is and has always been
> >> >> ignored.
> >>
> >> PR77904 was about the clobber not being ignored, so the behaviour
> >> hasn't been consistent.
> >>
> >> I'm also not sure it was always ignored in recent sources.  The clobber
> >> does get added to the associated rtl insn, and it'd be surprising if
> >> that never had an effect.
> >>
> >> >> If that is right, then I would much prefer a warning, that says exactly
> >> >> that, because that would also help to understand why removing that 
> >> >> clobber
> >> >> statement is safe even for old gcc versions.
> >>
> >> If the asm does leave sp with a different value, then it's never been safe,
> >> regardless of the gcc version.  That's why an error seems more appropriate.
> >>
> >> > Thank you. Looks like general consensus is to have a warning. See 
> >> > attached
> >> > patch that switches the error to a warning.
> >>
> >> I don't think there's a good reason to treat this differently from the
> >> preexisting PIC register error.  If the argument for making it a warning
> >> rather than an error is that the asm might happen to work by accident,
> >> then the same is true for the PIC register.
> >>
> >
> > If we leave the error, maybe a more explanatory message would be helpful?
> > (along the lines of what I posted earlier in this thread, which may be
> > too verbose)
>
> The message in that patch suggested removing the clobber and hoping for
> the best, which IMO is bad advice.  If the current message doesn't make
> it clear enough that changing the sp is not allowed, how about:
>
> inline % statements must not change the value of the stack pointer
>
> ?
>

My understanding is that in some cases, some users still want to
deliberately change SP,
so "must not" may be confusing in such cases?

> Thanks,
> Richard
>
>

Re: [PATCH] DWARF: Don't expand hash table when no insert is needed

2018-12-17 Thread Richard Biener

On Mon, Dec 17, 2018 at 2:26 PM H.J. Lu  wrote:
>
> On Mon, Dec 17, 2018 at 2:00 AM Richard Biener
>  wrote:
> >
> > On Sun, Dec 16, 2018 at 9:33 PM H.J. Lu  wrote:
> > >
> > > find_slot_with_hash has
> > >
> > >  if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> > > expand ();
> > >
> > > which may expand hash table even if no insert is neeed and change hash
> > > table traverse order.  When output_macinfo_op is called, all index
> > > strings have been added to hash table by save_macinfo_strings, we
> > > shouldn't expand index string hash table.  Otherwise find_slot_with_hash
> > > will expand hash table when hash table has the right size.
> >
> > So the point is that output_macinfo_op is called from code traversing the
> > hashtable?  I didn't really manage to quickly spot that.
> >
> > > Tested on i686 and x86-64.  OK for trunk?
> >
> > OK if called from htab traverse (please clarify that in the comment you 
> > add).
> >
> > Richard.
>
> dwarf2out_finish performs:
>
> 1. save_macinfo_strings
> 2. hash table traverse of index_string
> 3. output_macinfo -> output_macinfo_op
> 4. output_indirect_strings -> hash table traverse of output_index_string
>
> find_slot_with_hash has
>
>  if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> expand ();
>
> which may expand hash table even if no insertion is neeed and change hash
> table traverse order.  When output_macinfo_op is called, all index strings
> have been added to hash table by save_macinfo_strings and we shouldn't
> expand index string hash table.  Otherwise find_slot_with_hash will expand
> hash table when hash table has the right size and hash table traverse of
> output_index_string will have a different traverse order from index_string.
>
> This is the updated patch I am checking in.

Thanks for tracking this down.

Richard.

> --
> H.J.

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Richard Sandiford

Christophe Lyon  writes:
> On Mon, 17 Dec 2018 at 12:47, Richard Sandiford
>  wrote:
>>
>> Dimitar Dimitrov  writes:
>> > On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
>> >> Hi,
>> >>
>> >> if I understood that right, then clobbering sp is and has always been
>> >> ignored.
>>
>> PR77904 was about the clobber not being ignored, so the behaviour
>> hasn't been consistent.
>>
>> I'm also not sure it was always ignored in recent sources.  The clobber
>> does get added to the associated rtl insn, and it'd be surprising if
>> that never had an effect.
>>
>> >> If that is right, then I would much prefer a warning, that says exactly
>> >> that, because that would also help to understand why removing that clobber
>> >> statement is safe even for old gcc versions.
>>
>> If the asm does leave sp with a different value, then it's never been safe,
>> regardless of the gcc version.  That's why an error seems more appropriate.
>>
>> > Thank you. Looks like general consensus is to have a warning. See attached
>> > patch that switches the error to a warning.
>>
>> I don't think there's a good reason to treat this differently from the
>> preexisting PIC register error.  If the argument for making it a warning
>> rather than an error is that the asm might happen to work by accident,
>> then the same is true for the PIC register.
>>
>
> If we leave the error, maybe a more explanatory message would be helpful?
> (along the lines of what I posted earlier in this thread, which may be
> too verbose)

The message in that patch suggested removing the clobber and hoping for
the best, which IMO is bad advice.  If the current message doesn't make
it clear enough that changing the sp is not allowed, how about:

inline % statements must not change the value of the stack pointer

?

Thanks,
Richard

Re: V6 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-17 Thread Richard Biener

On Mon, Dec 17, 2018 at 1:43 PM H.J. Lu  wrote:
>
> On Mon, Dec 17, 2018 at 1:39 AM Richard Biener
>  wrote:
> >
> > On Fri, Dec 14, 2018 at 11:48 PM H.J. Lu  wrote:
> > >
> > > On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:
> > > >
> > > > On 12/13/18 6:56 PM, H.J. Lu wrote:
> > > > > On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  
> > > > > wrote:
> > > > >>
> > > > >> On 9/25/18 11:46 AM, H.J. Lu wrote:
> > > > >>> On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  
> > > > >>> wrote:
> > > >  On 07/23/2018 05:24 PM, H.J. Lu wrote:
> > > > >
> > > > > On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> > > > > 
> > > > > wrote:
> > > > >>
> > > > >> On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > > >>
> > > > >>> On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> > > > >>> 
> > > > >>> wrote:
> > > > 
> > > >  On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > > 
> > > > >> +  if (TREE_CODE (rhs) == COND_EXPR)
> > > > >> +{
> > > > >> +  /* Check the THEN path first.  */
> > > > >> +  tree op1 = TREE_OPERAND (rhs, 1);
> > > > >> +  context = check_address_of_packed_member (type, op1);
> > > > >
> > > > >
> > > > > This should handle the GNU extension of re-using operand 0 if 
> > > > > operand
> > > > > 1 is omitted.
> > > > 
> > > > 
> > > >  Doesn't that just use a SAVE_EXPR?
> > > > >>>
> > > > >>>
> > > > >>> Hmm, I suppose it does, but many places in the compiler seem to 
> > > > >>> expect
> > > > >>> that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.
> > > > >>
> > > > >>
> > > > >> Maybe that's used somewhere inside the C++ front end.  For C a 
> > > > >> SAVE_EXPR
> > > > >> is produced directly.
> > > > >
> > > > >
> > > > > Here is the updated patch.  Changes from the last one:
> > > > >
> > > > > 1. Handle COMPOUND_EXPR.
> > > > > 2. Fixed typos in comments.
> > > > > 3. Combined warn_for_pointer_of_packed_member and
> > > > > warn_for_address_of_packed_member into
> > > > > warn_for_address_or_pointer_of_packed_member.
> > > > 
> > > > 
> > > > > c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > > > > increases the
> > > > > alignment of ‘long int *’ pointer from 1 to 8 
> > > > > [-Waddress-of-packed-member]
> > > > 
> > > > 
> > > >  I think this would read better as
> > > > 
> > > >  c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > > >  (alignment 1) to
> > > >  ‘long int *’ (alignment 8) may result in an unaligned pointer value
> > > >  [-Waddress-of-packed-member]
> > > > >>>
> > > > >>> Fixed.
> > > > >>>
> > > > > +  while (TREE_CODE (base) == ARRAY_REF)
> > > > > +   base = TREE_OPERAND (base, 0);
> > > > > +  if (TREE_CODE (base) != COMPONENT_REF)
> > > > > +   return NULL_TREE;
> > > > 
> > > > 
> > > >  Are you deliberately not handling the other handled_component_p 
> > > >  cases? If
> > > >  so, there should be a comment.
> > > > >>>
> > > > >>> I changed it to
> > > > >>>
> > > > >>>while (handled_component_p (base))
> > > > >>>   {
> > > > >>> enum tree_code code = TREE_CODE (base);
> > > > >>> if (code == COMPONENT_REF)
> > > > >>>   break;
> > > > >>> switch (code)
> > > > >>>   {
> > > > >>>   case ARRAY_REF:
> > > > >>> base = TREE_OPERAND (base, 0);
> > > > >>> break;
> > > > >>>   default:
> > > > >>> /* FIXME: Can it ever happen?  */
> > > > >>> gcc_unreachable ();
> > > > >>> break;
> > > > >>>   }
> > > > >>>   }
> > > > >>>
> > > > >>> Is there a testcase to trigger this ICE? I couldn't find one.
> > > > >>
> > > > >> You can take the address of an element of complex:
> > > > >>
> > > > >> __complex int i;
> > > > >> int *p = &__real(i);
> > > > >>
> > > > >> You may get VIEW_CONVERT_EXPR with location wrappers.
> > > > >
> > > > > Fixed.  I replaced gcc_unreachable with return NULL_TREE;
> > > >
> > > > Then we're back to my earlier question: are you deliberately not
> > > > handling the other cases?  Why not look through them as well?  What if
> > > > e.g. the operand of __real is a packed field?
> > > >
> > >
> > > Here is the updated patch with
> > >
> > > diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
> > > index 615134cfdac..f105742598e 100644
> > > --- a/gcc/c-family/c-warn.c
> > > +++ b/gcc/c-family/c-warn.c
> > > @@ -2669,6 +2669,9 @@ check_address_of_packed_member (tree type, tree rhs)
> > > switch (code)
> > >   {
> > >   case ARRAY_REF:
> > > + case REALPART_EXPR:
> > > + case IMAGPART_EXPR:
> > > + case VIE

ping: [PATCH][GCC][AARCH64]Introduce aarch64 atomic_{load,store}ti patterns

2018-12-17 Thread Matthew Malcomson

Ping


On 27/09/18 14:43, Matthew Malcomson wrote:
> [PATCH][GCC][AARCH64] Introduce aarch64 atomic_{load,store}ti patterns
>
> In Armv8.4-a these patterns use the LDP/STP instructions that are guaranteed 
> to
> be single-copy atomic, ensure correct memory ordering semantics by using
> the DMB instruction.
>
> We put the use of these inline expansions behind a command line flag since 
> they
> do not satisfy the libatomic ABI and hence can't be used together with code
> already compiled using 16 byte atomics.
> This command line flag is -matomic-128bit-instructions.
>
> Given the introduction of a flag specified to break ABI compatibility with
> libatomic, it seems reasonable to introduce the load-exclusive/store-exclusive
> read-modify-write loop emulation of 128 bit atomic load and stores for older
> architectures behind this flag.
>
> We introduce the usual extension macros for the "at" extension marking the
> LDP/STP atomicity guarantees introduced in Armv8.4-a and use these to decide
> which to use when -matomic-128bit-instructions is provided on the command 
> line.
>
> Tested with full bootstrap and make check on aarch64-none-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> 2018-09-27  Matthew Malcomson  
>
>   * config/aarch64/aarch64-protos.h (aarch64_split_atomic_ti_access): New
>   prototype.
>   * config/aarch64/aarch64.c (aarch64_split_atomic_ti_access): New.
>   * config/aarch64/aarch64.h (AARCH64_FL_AT): New flag.
>   (AARCH64_FL_PROFILE): Flag moved to accomodate above.
>   (AARCH64_FL_FOR_ARCH8_4): Include AARCH64_FL_AT.
>   (AARCH64_ISA_AT): New ISA flag.
>   * config/aarch64/aarch64.opt (-matomic-128bit-instruction): New.
>   * config/aarch64/atomics.md (atomic_load, atomic_store,
>   @aarch64_load_exclusive {smaller registers},
>   @aarch64_load_exclusive {GPI registers},
>   @aarch64_store_exclusive): Use aarch_mm_needs_{acquire,release}
>   instead of three part check.
>   (atomic_loadti, aarch64_atomic_loadti_ldp, aarch64_atomic_loadti_basic
>   atomic_storeti, aarch64_atomic_storeti_stp,
>   aarch64_atomic_storeti_basic) New
>   * config/aarch64/iterators.md (GPI_TI): New.
>   * config/aarch64/predicates.md (aarch64_atomic_TImode_operand,
>   aarch64_TImode_pair_operand): New.
>   * doc/invoke.texi (-matomic-128bit-instructions): Document option.
>
> gcc/testsuite/ChangeLog:
>
> 2018-09-27  Matthew Malcomson  
>
>   * gcc.target/aarch64/atomic-load128.c: New test.
>   * gcc.target/aarch64/atomic-store.x: Shared macro for below tests.
>   * gcc.target/aarch64/atomic-store.c: Use atomic-store.x.
>   * gcc.target/aarch64/atomic-store128.c: New test using atomic-store.x.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> caf1d2041f0cac8e3f975f8384a167a90dc638e5..578ea925fac9a7237af3a53e7ec642d0ba8e7b93
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -560,6 +560,8 @@ machine_mode aarch64_select_cc_mode (RTX_CODE, rtx, rtx);
>   rtx aarch64_gen_compare_reg (RTX_CODE, rtx, rtx);
>   rtx aarch64_load_tp (rtx);
>   
> +void aarch64_split_atomic_ti_access (rtx op[], bool);
> +
>   void aarch64_expand_compare_and_swap (rtx op[]);
>   void aarch64_split_compare_and_swap (rtx op[]);
>   void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> e5cdb1d54f4ee96140202ea21a9478438d208f45..c1e407b5a3f27aa7eea9c35e749fe597e79f3e65
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -158,9 +158,10 @@ extern unsigned aarch64_architecture_version;
>   #define AARCH64_FL_SHA3   (1 << 18)  /* Has ARMv8.4-a SHA3 and 
> SHA512.  */
>   #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  
> */
>   #define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions.  
> */
> +#define AARCH64_FL_AT (1 << 21)  /* Has ARMv8.4-a AT extensions.  */
>   
>   /* Statistical Profiling extensions.  */
> -#define AARCH64_FL_PROFILE(1 << 21)
> +#define AARCH64_FL_PROFILE(1 << 22)
>   
>   /* Has FP and SIMD.  */
>   #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
> @@ -179,7 +180,7 @@ extern unsigned aarch64_architecture_version;
> (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
>   #define AARCH64_FL_FOR_ARCH8_4  \
> (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
> -   | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
> +   | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4 | AARCH64_FL_AT)
>   
>   /* Macros to test ISA flags.  */
>   
> @@ -201,6 +202,7 @@ extern unsigned aarch64_architecture_version;
>   #define AARCH64_ISA_SHA3   (aarch64_isa_flags & AARCH64_FL_SHA3)
>   #define AARCH64_ISA_F16FML (aarch6

Re: [PATCH] DWARF: Don't expand hash table when no insert is needed

2018-12-17 Thread H.J. Lu

On Mon, Dec 17, 2018 at 2:00 AM Richard Biener
 wrote:
>
> On Sun, Dec 16, 2018 at 9:33 PM H.J. Lu  wrote:
> >
> > find_slot_with_hash has
> >
> >  if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> > expand ();
> >
> > which may expand hash table even if no insert is neeed and change hash
> > table traverse order.  When output_macinfo_op is called, all index
> > strings have been added to hash table by save_macinfo_strings, we
> > shouldn't expand index string hash table.  Otherwise find_slot_with_hash
> > will expand hash table when hash table has the right size.
>
> So the point is that output_macinfo_op is called from code traversing the
> hashtable?  I didn't really manage to quickly spot that.
>
> > Tested on i686 and x86-64.  OK for trunk?
>
> OK if called from htab traverse (please clarify that in the comment you add).
>
> Richard.

dwarf2out_finish performs:

1. save_macinfo_strings
2. hash table traverse of index_string
3. output_macinfo -> output_macinfo_op
4. output_indirect_strings -> hash table traverse of output_index_string

find_slot_with_hash has

 if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
expand ();

which may expand hash table even if no insertion is neeed and change hash
table traverse order.  When output_macinfo_op is called, all index strings
have been added to hash table by save_macinfo_strings and we shouldn't
expand index string hash table.  Otherwise find_slot_with_hash will expand
hash table when hash table has the right size and hash table traverse of
output_index_string will have a different traverse order from index_string.

This is the updated patch I am checking in.

-- 
H.J.
From 18caa0346c36e4e6a9d556d5a1256f6dc99e8598 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 16 Dec 2018 10:12:01 -0800
Subject: [PATCH] DWARF: Don't expand hash table when no insertion is needed

dwarf2out_finish performs:

1. save_macinfo_strings
2. hash table traverse of index_string
3. output_macinfo -> output_macinfo_op
4. output_indirect_strings -> hash table traverse of output_index_string

find_slot_with_hash has

 if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
expand ();

which may expand hash table even if no insertion is neeed and change hash
table traverse order.  When output_macinfo_op is called, all index strings
have been added to hash table by save_macinfo_strings and we shouldn't
expand index string hash table.  Otherwise find_slot_with_hash will expand
hash table when hash table has the right size and hash table traverse of
output_index_string will have a different traverse order from index_string.

	PR debug/79342
	* dwarf2out.c (find_AT_string_in_table): Add insert argument
	defaulting to INSERT and replace INSERT.
	(find_AT_string): Likewise.
	(output_macinfo_op): Pass NO_INSERT to find_AT_string.
---
 gcc/dwarf2out.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index b2381056991..49d8a0af33b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -4618,12 +4618,13 @@ indirect_string_hasher::equal (indirect_string_node *x1, const char *x2)
 
 static struct indirect_string_node *
 find_AT_string_in_table (const char *str,
-			 hash_table *table)
+			 hash_table *table,
+			 enum insert_option insert = INSERT)
 {
   struct indirect_string_node *node;
 
   indirect_string_node **slot
-= table->find_slot_with_hash (str, htab_hash_string (str), INSERT);
+= table->find_slot_with_hash (str, htab_hash_string (str), insert);
   if (*slot == NULL)
 {
   node = ggc_cleared_alloc ();
@@ -4640,12 +4641,12 @@ find_AT_string_in_table (const char *str,
 /* Add STR to the indirect string hash table.  */
 
 static struct indirect_string_node *
-find_AT_string (const char *str)
+find_AT_string (const char *str, enum insert_option insert = INSERT)
 {
   if (! debug_str_hash)
 debug_str_hash = hash_table::create_ggc (10);
 
-  return find_AT_string_in_table (str, debug_str_hash);
+  return find_AT_string_in_table (str, debug_str_hash, insert);
 }
 
 /* Add a string attribute value to a DIE.  */
@@ -28095,7 +28096,19 @@ output_macinfo_op (macinfo_entry *ref)
   break;
 case DW_MACRO_define_strp:
 case DW_MACRO_undef_strp:
-  node = find_AT_string (ref->info);
+  /* NB: dwarf2out_finish performs:
+	   1. save_macinfo_strings
+	   2. hash table traverse of index_string
+	   3. output_macinfo -> output_macinfo_op
+	   4. output_indirect_strings
+		-> hash table traverse of output_index_string
+
+ When output_macinfo_op is called, all index strings have been
+	 added to hash table by save_macinfo_strings and we can't pass
+	 INSERT to find_slot_with_hash which may expand hash table, even
+	 if no insertion is needed, and change hash table traverse order
+	 between index_string and output_index_string.  */
+  node = find_AT_string (ref->info, NO_INSERT);
   gcc_assert (node
 		  && (node->form == DW_FORM_strp
 		  || node->

Fix buffer overflow when handling mismatched profiles

2018-12-17 Thread Jan Hubicka

Hi,
building Firefox with FDO and trunk reproduces another problem where we
ICE in streamer on histogram counter being negative when it should not.
I will turn this ICE into profile verification error because it may
happen when profile collection was corrupted, but in this case it is
caused by fact that we expect more counters at profile-use time then we
produced at profile-generate time.  I am not sure why this happens yet,
but this leads to buffer overrun and we pick random garbage into the
statement histogram.

I am not sure who was writting get_coverage_counts but it was optimistic
person, since he/she dropped the test that numbers match because both
users already compute them for no use :)

Bootstrapped/regtested x86_64-linux, commited to trunk and I plan to
backport it to gcc8 after SPEC FDO tests passes.

Honza

* coverage.c (struct conts_entry): Add n_counts.
(remap_counts_file): Record number of ocunts.
(get_coverage_counts): Verify that counts match.
* coverage.h (get_coverage_counts): Update prototype.
* profile.c (get_exec_counts. compute_value_histograms): Add
n_counts parametrs.

Index: coverage.c
===
--- coverage.c  (revision 267185)
+++ coverage.c  (working copy)
@@ -74,6 +74,7 @@ struct counts_entry : pointer_hash lineno_checksum = lineno_checksum;
  entry->cfg_checksum = cfg_checksum;
  entry->counts = XCNEWVEC (gcov_type, n_counts);
+ entry->n_counts = n_counts;
}
  else if (entry->lineno_checksum != lineno_checksum
   || entry->cfg_checksum != cfg_checksum)
@@ -295,7 +297,7 @@ read_counts_file (void)
 
 gcov_type *
 get_coverage_counts (unsigned counter, unsigned cfg_checksum,
-unsigned lineno_checksum)
+unsigned lineno_checksum, unsigned int n_counts)
 {
   counts_entry *entry, elt;
 
@@ -344,17 +346,27 @@ get_coverage_counts (unsigned counter, u
   return NULL;
 }
   
-  if (entry->cfg_checksum != cfg_checksum)
+  if (entry->cfg_checksum != cfg_checksum || entry->n_counts != n_counts)
 {
   static int warned = 0;
   bool warning_printed = false;
 
-  warning_printed =
-   warning_at (DECL_SOURCE_LOCATION (current_function_decl),
-   OPT_Wcoverage_mismatch,
-   "the control flow of function %qD does not match "
-   "its profile data (counter %qs)", current_function_decl,
-   ctr_names[counter]);
+  if (entry->n_counts != n_counts)
+   warning_printed =
+ warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+ OPT_Wcoverage_mismatch,
+ "number of counters in profile data for function %qD "
+ "does not match "
+ "its profile data (counter %qs, expected %i and have %i)",
+ current_function_decl,
+ ctr_names[counter], entry->n_counts, n_counts);
+  else
+   warning_printed =
+ warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+ OPT_Wcoverage_mismatch,
+ "the control flow of function %qD does not match "
+ "its profile data (counter %qs)", current_function_decl,
+ ctr_names[counter]);
   if (warning_printed && dump_enabled_p ())
{
  dump_user_location_t loc
Index: coverage.h
===
--- coverage.h  (revision 267185)
+++ coverage.h  (working copy)
@@ -52,7 +52,8 @@ extern tree tree_coverage_counter_addr (
 /* Get all the counters for the current function.  */
 extern gcov_type *get_coverage_counts (unsigned /*counter*/,
   unsigned /*cfg_checksum*/,
-  unsigned /*lineno_checksum*/);
+  unsigned /*lineno_checksum*/,
+  unsigned /*n_counts*/);
 
 extern tree get_gcov_type (void);
 extern bool coverage_node_map_initialized_p (void);
Index: profile.c
===
--- profile.c   (revision 267185)
+++ profile.c   (working copy)
@@ -218,7 +218,7 @@ get_exec_counts (unsigned cfg_checksum,
 }
 
   counts = get_coverage_counts (GCOV_COUNTER_ARCS, cfg_checksum,
-   lineno_checksum);
+   lineno_checksum, num_edges);
   if (!counts)
 return NULL;
 
@@ -780,7 +780,8 @@ compute_value_histograms (histogram_valu
 
   histogram_counts[t] = get_coverage_counts (COUNTER_FOR_HIST_TYPE (t),
 cfg_checksum,
-lineno_checksum);
+lineno_checksum,
+

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Christophe Lyon

On Mon, 17 Dec 2018 at 12:47, Richard Sandiford
 wrote:
>
> Dimitar Dimitrov  writes:
> > On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
> >> Hi,
> >>
> >> if I understood that right, then clobbering sp is and has always been
> >> ignored.
>
> PR77904 was about the clobber not being ignored, so the behaviour
> hasn't been consistent.
>
> I'm also not sure it was always ignored in recent sources.  The clobber
> does get added to the associated rtl insn, and it'd be surprising if
> that never had an effect.
>
> >> If that is right, then I would much prefer a warning, that says exactly
> >> that, because that would also help to understand why removing that clobber
> >> statement is safe even for old gcc versions.
>
> If the asm does leave sp with a different value, then it's never been safe,
> regardless of the gcc version.  That's why an error seems more appropriate.
>
> > Thank you. Looks like general consensus is to have a warning. See attached
> > patch that switches the error to a warning.
>
> I don't think there's a good reason to treat this differently from the
> preexisting PIC register error.  If the argument for making it a warning
> rather than an error is that the asm might happen to work by accident,
> then the same is true for the PIC register.
>

If we leave the error, maybe a more explanatory message would be helpful?
(along the lines of what I posted earlier in this thread, which may be
too verbose)

Christophe

> Thanks,
> Richard

Re: [PATCH] error on missing LTO symbols

2018-12-17 Thread Martin Jambor

Hi,

On Fri, Dec 14 2018, Jakub Jelinek wrote:
> On Fri, Dec 14, 2018 at 10:21:35AM +0100, Tom de Vries wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do link } */
>> +/* { dg-excess-errors "lto1, mkoffload and lto-wrapper fatal errors" } */
>> +
>> +int results[2000]; /* { dg-error "variable 'results' has been referenced in 
>> offloaded code but hasn't been marked to be included in the offloaded code" 
>> } */
>> +
>> +#pragma omp declare target
>> +void  __attribute__((noinline, noclone))
>> +baz (int i)
>> +{
>> +  results[i]++;
>> +}
>> +#pragma omp end declare target
>> +
>> +int
>> +main ()
>> +{
>> +#pragma omp target
>> +#pragma omp for
>> +  for (int i = 0; i < 2000; i++)
>> +baz (i);
>> +}
>
> Ah, one more thing, the testcase doesn't really fail when offloading isn't
> configured, so it would need some effective target or something that
> it actually does the offloading.  And not really sure about shared memory
> offloading like hsail, that doesn't really need the variables declared
> either, just the functions.
>

so IIUC and IIRC, the testcase should have:

{ dg-require-effective-target offload_device_nonshared_as }

Thanks,

Martin

Re: V6 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-17 Thread H.J. Lu

On Mon, Dec 17, 2018 at 1:39 AM Richard Biener
 wrote:
>
> On Fri, Dec 14, 2018 at 11:48 PM H.J. Lu  wrote:
> >
> > On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:
> > >
> > > On 12/13/18 6:56 PM, H.J. Lu wrote:
> > > > On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:
> > > >>
> > > >> On 9/25/18 11:46 AM, H.J. Lu wrote:
> > > >>> On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  
> > > >>> wrote:
> > >  On 07/23/2018 05:24 PM, H.J. Lu wrote:
> > > >
> > > > On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> > > > 
> > > > wrote:
> > > >>
> > > >> On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > >>
> > > >>> On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> > > >>> 
> > > >>> wrote:
> > > 
> > >  On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > 
> > > >> +  if (TREE_CODE (rhs) == COND_EXPR)
> > > >> +{
> > > >> +  /* Check the THEN path first.  */
> > > >> +  tree op1 = TREE_OPERAND (rhs, 1);
> > > >> +  context = check_address_of_packed_member (type, op1);
> > > >
> > > >
> > > > This should handle the GNU extension of re-using operand 0 if 
> > > > operand
> > > > 1 is omitted.
> > > 
> > > 
> > >  Doesn't that just use a SAVE_EXPR?
> > > >>>
> > > >>>
> > > >>> Hmm, I suppose it does, but many places in the compiler seem to 
> > > >>> expect
> > > >>> that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.
> > > >>
> > > >>
> > > >> Maybe that's used somewhere inside the C++ front end.  For C a 
> > > >> SAVE_EXPR
> > > >> is produced directly.
> > > >
> > > >
> > > > Here is the updated patch.  Changes from the last one:
> > > >
> > > > 1. Handle COMPOUND_EXPR.
> > > > 2. Fixed typos in comments.
> > > > 3. Combined warn_for_pointer_of_packed_member and
> > > > warn_for_address_of_packed_member into
> > > > warn_for_address_or_pointer_of_packed_member.
> > > 
> > > 
> > > > c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > > > increases the
> > > > alignment of ‘long int *’ pointer from 1 to 8 
> > > > [-Waddress-of-packed-member]
> > > 
> > > 
> > >  I think this would read better as
> > > 
> > >  c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > >  (alignment 1) to
> > >  ‘long int *’ (alignment 8) may result in an unaligned pointer value
> > >  [-Waddress-of-packed-member]
> > > >>>
> > > >>> Fixed.
> > > >>>
> > > > +  while (TREE_CODE (base) == ARRAY_REF)
> > > > +   base = TREE_OPERAND (base, 0);
> > > > +  if (TREE_CODE (base) != COMPONENT_REF)
> > > > +   return NULL_TREE;
> > > 
> > > 
> > >  Are you deliberately not handling the other handled_component_p 
> > >  cases? If
> > >  so, there should be a comment.
> > > >>>
> > > >>> I changed it to
> > > >>>
> > > >>>while (handled_component_p (base))
> > > >>>   {
> > > >>> enum tree_code code = TREE_CODE (base);
> > > >>> if (code == COMPONENT_REF)
> > > >>>   break;
> > > >>> switch (code)
> > > >>>   {
> > > >>>   case ARRAY_REF:
> > > >>> base = TREE_OPERAND (base, 0);
> > > >>> break;
> > > >>>   default:
> > > >>> /* FIXME: Can it ever happen?  */
> > > >>> gcc_unreachable ();
> > > >>> break;
> > > >>>   }
> > > >>>   }
> > > >>>
> > > >>> Is there a testcase to trigger this ICE? I couldn't find one.
> > > >>
> > > >> You can take the address of an element of complex:
> > > >>
> > > >> __complex int i;
> > > >> int *p = &__real(i);
> > > >>
> > > >> You may get VIEW_CONVERT_EXPR with location wrappers.
> > > >
> > > > Fixed.  I replaced gcc_unreachable with return NULL_TREE;
> > >
> > > Then we're back to my earlier question: are you deliberately not
> > > handling the other cases?  Why not look through them as well?  What if
> > > e.g. the operand of __real is a packed field?
> > >
> >
> > Here is the updated patch with
> >
> > diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
> > index 615134cfdac..f105742598e 100644
> > --- a/gcc/c-family/c-warn.c
> > +++ b/gcc/c-family/c-warn.c
> > @@ -2669,6 +2669,9 @@ check_address_of_packed_member (tree type, tree rhs)
> > switch (code)
> >   {
> >   case ARRAY_REF:
> > + case REALPART_EXPR:
> > + case IMAGPART_EXPR:
> > + case VIEW_CONVERT_EXPR:
> > base = TREE_OPERAND (base, 0);
> > break;
> >   default:
>
> don't we have handled_component_p () for this?  (you're still
> missing BIT_FIELD_REF which might be used for vector
> element accesses)
>

Do you have a testcase?

-- 
H.J.

Re: [PATCH v3 10/10] Port testsuite to GCN

2018-12-17 Thread Andrew Stubbs


On 12/12/2018 11:53, Andrew Stubbs wrote:

Since the previous v2 posting, the sqrt_insn additional option mechanism
has been reworked.  Otherwise the patch has only been rebased.


This updated patch includes the missing internals documentation.

OK?

Andrew
Port testsuite to GCN

This collection of miscellaneous patches configures the testsuite to run on AMD
GCN in a standalone (i.e. not offloading) configuration.  It assumes you have
your Dejagnu set up to run binaries via the gcn-run tool.

2018-12-17  Andrew Stubbs  
	Kwok Cheung Yeung  
	Julian Brown  
	Tom de Vries  

	gcc/doc/
	* sourcebuild.texi: Document dg-add-options sqrt_insn.

	gcc/testsuite/
	* gcc.dg/20020312-2.c: Add amdgcn support.
	* gcc.dg/Wno-frame-address.c: Disable on amdgcn.
	* gcc.dg/builtin-apply2.c: Likewise.
	* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
	* gcc.dg/gimplefe-28.c: Add dg-add-options for sqrt_insn.
	* gcc.dg/intermod-1.c: Add -mlocal-symbol-id on amdgcn.
	* gcc.dg/memcmp-1.c: Increase timeout factor.
	* gcc.dg/pr59605-2.c: Addd -DMAX_COPY=1025 on amdgcn.
	* gcc.dg/sibcall-10.c: xfail on amdgcn.
	* gcc.dg/sibcall-9.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-11c.c: Likewise.
	* gcc.dg/tree-ssa/pr84512.c: Likewise.
	* gcc.dg/tree-ssa/loop-1.c: Adjust expectations for amdgcn.
	* gfortran.dg/bind_c_array_params_2.f90: Likewise.
	* lib/target-supports.exp (check_effective_target_trampolines):
	Configure amdgcn.
	(check_profiling_available): Likewise.
	(check_effective_target_global_constructor): Likewise.
	(check_effective_target_return_address): Likewise.
	(check_effective_target_fopenacc): Likewise.
	(check_effective_target_fopenmp): Likewise.
	(check_effective_target_vect_int): Likewise.
	(check_effective_target_vect_intfloat_cvt): Likewise.
	(check_effective_target_vect_uintfloat_cvt): Likewise.
	(check_effective_target_vect_floatint_cvt): Likewise.
	(check_effective_target_vect_floatuint_cvt): Likewise.
	(check_effective_target_vect_simd_clones): Likewise.
	(check_effective_target_vect_shift): Likewise.
	(check_effective_target_whole_vector_shift): Likewise.
	(check_effective_target_vect_bswap): Likewise.
	(check_effective_target_vect_shift_char): Likewise.
	(check_effective_target_vect_long): Likewise.
	(check_effective_target_vect_float): Likewise.
	(check_effective_target_vect_double): Likewise.
	(check_effective_target_vect_perm): Likewise.
	(check_effective_target_vect_perm_byte): Likewise.
	(check_effective_target_vect_perm_short): Likewise.
	(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
	(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
	(check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
	(check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
	(check_effective_target_vect_natural_alignment): Likewise.
	(check_effective_target_vect_fully_masked): Likewise.
	(check_effective_target_vect_element_align): Likewise.
	(check_effective_target_vect_masked_store): Likewise.
	(check_effective_target_vect_scatter_store): Likewise.
	(check_effective_target_vect_condition): Likewise.
	(check_effective_target_vect_cond_mixed): Likewise.
	(check_effective_target_vect_char_mult): Likewise.
	(check_effective_target_vect_short_mult): Likewise.
	(check_effective_target_vect_int_mult): Likewise.
	(check_effective_target_sqrt_insn): Likewise.
	(check_effective_target_vect_call_sqrtf): Likewise.
	(check_effective_target_vect_call_btrunc): Likewise.
	(check_effective_target_vect_call_btruncf): Likewise.
	(check_effective_target_vect_call_ceil): Likewise.
	(check_effective_target_vect_call_floorf): Likewise.
	(check_effective_target_lto): Likewise.
	(check_vect_support_and_set_flags): Likewise.
	(check_effective_target_vect_stridedN): Enable when fully masked is
	available.
	(add_options_for_sqrt_insn): New procedure.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 94c8002..170ee5d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2518,6 +2518,10 @@ Add the flags needed to define macro STACK_SIZE and set it to the stack size
 limit associated with the @ref{stack_size_et,,@code{stack_size} effective
 target}.
 
+@item sqrt_insn
+Add the target-specific flags needed to enable hardware square root
+instructions, if any.
+
 @item tls
 Add the target-specific flags needed to use thread-local storage.
 @end table
diff --git a/gcc/testsuite/gcc.dg/20020312-2.c b/gcc/testsuite/gcc.dg/20020312-2.c
index e72a5b2..c584d35 100644
--- a/gcc/testsuite/gcc.dg/20020312-2.c
+++ b/gcc/testsuite/gcc.dg/20020312-2.c
@@ -119,6 +119,8 @@ extern void abort (void);
 # endif
 #elif defined (__or1k__)
 /* No pic register.  */
+#elif defined (__AMDGCN__)
+/* No pic register.  */
 #else
 # error "Modify the test for your target."
 #endif
diff --git a/gcc/testsuite/gcc.dg/Wno-frame-address.c b/gcc/testsuite/gcc.dg/Wno-frame-address.c
index 11ae0cd..51f20b4 100644
--- a/gcc/testsuite/gcc.dg/Wno-frame-address.c
+++ b/gcc/testsuite/gcc.dg/Wn

Re: [PATCH v3 09/10] Ignore LLVM's blank lines.

2018-12-17 Thread Andrew Stubbs


On 12/12/2018 12:07, Rainer Orth wrote:

* lib/target-supports.exp (check_effective_target_llvm_binutils): New.


like "exceptions", this new effective-target keyword needs documenting
in sourcebuild.texi.


This patch adds the missing docs, and removes the extra blank lines.

OK?

Andrew
Ignore LLVM's blank lines.

The GCN toolchain must use the LLVM assembler and linker because there's no
binutils port.  The LLVM tools do not have the same diagnostic style as
binutils, so the "blank line(s) in output" tests are inappropriate (and very
noisy).

The LLVM tools also have different command line options, so it's not possible
to autodetect object formats in the same way.

This patch addresses both issues.

2018-12-17  Andrew Stubbs  

	gcc/doc/
	* sourcebuild.texi: Document dg-require-effective-target llvm_binutils
	and offload_gcn.

	gcc/testsuite/
	* lib/file-format.exp (gcc_target_object_format): Handle AMD GCN.
	* lib/gcc-dg.exp (gcc-dg-prune): Ignore blank lines from the LLVM
	linker.
	* lib/target-supports.exp (check_effective_target_llvm_binutils): New.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6536a7c..94c8002 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2265,6 +2265,9 @@ Target uses GNU @command{ld}.
 Target keeps null pointer checks, either due to the use of
 @option{-fno-delete-null-pointer-checks} or hardwired into the target.
 
+@item llvm_binutils
+Target is using an LLVM assembler and/or linker, instead of GNU Binutils.
+
 @item lto
 Compiler has been configured to support link-time optimization (LTO).
 
@@ -2285,6 +2288,9 @@ Target uses natural alignment (aligned to type size) for types of
 @item nonpic
 Target does not generate PIC by default.
 
+@item offload_gcn
+Target has been configured for OpenACC/OpenMP offloading on AMD GCN.
+
 @item pie_enabled
 Target generates PIE by default.
 
diff --git a/gcc/testsuite/lib/file-format.exp b/gcc/testsuite/lib/file-format.exp
index 5c47246..c595fe2 100644
--- a/gcc/testsuite/lib/file-format.exp
+++ b/gcc/testsuite/lib/file-format.exp
@@ -41,6 +41,9 @@ proc gcc_target_object_format { } {
 } elseif { [istarget *-*-aix*] } {
 	# AIX doesn't necessarily have objdump, so hand-code it.
 	set gcc_target_object_format_saved coff
+} elseif { [istarget *-*-amdhsa*] } {
+	# AMD GCN uses LLVM objdump which is not CLI-compatible
+	set gcc_target_object_format_saved elf
 } else {
 set objdump_name [find_binutils_prog objdump]
 set open_file [open objfmtst.c w]
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 054d884..ba7cb91 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -363,7 +363,7 @@ proc gcc-dg-prune { system text } {
 
 # Complain about blank lines in the output (PR other/69006)
 global allow_blank_lines
-if { !$allow_blank_lines } {
+if { !$allow_blank_lines && ![check_effective_target_llvm_binutils]} {
 	set num_blank_lines [llength [regexp -all -inline "\n\n" $text]]
 	if { $num_blank_lines } {
 	global testname_with_flags
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index eb3fbcd..98ff90e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8742,6 +8742,14 @@ proc check_effective_target_offload_hsa { } {
 } "-foffload=hsa" ]
 }
 
+# Return 1 if the compiler has been configured with hsa offloading.
+
+proc check_effective_target_offload_gcn { } {
+return [check_no_compiler_messages offload_gcn assembly {
+	int main () {return 0;}
+} "-foffload=amdgcn-unknown-amdhsa" ]
+}
+
 # Return 1 if the target support -fprofile-update=atomic
 proc check_effective_target_profile_update_atomic {} {
 return [check_no_compiler_messages profile_update_atomic assembly {
@@ -9038,3 +9046,9 @@ proc check_effective_target_inf { } {
 const double pinf = __builtin_inf ();
 }]
 }
+
+# Return 1 if this target uses an LLVM assembler and/or linker
+proc check_effective_target_llvm_binutils { } {
+return [expr { [istarget amdgcn*-*-*]
+		   || [check_effective_target_offload_gcn] } ]
+}

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-17 Thread Richard Sandiford

Dimitar Dimitrov  writes:
> On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
>> Hi,
>> 
>> if I understood that right, then clobbering sp is and has always been
>> ignored.

PR77904 was about the clobber not being ignored, so the behaviour
hasn't been consistent.

I'm also not sure it was always ignored in recent sources.  The clobber
does get added to the associated rtl insn, and it'd be surprising if
that never had an effect.

>> If that is right, then I would much prefer a warning, that says exactly
>> that, because that would also help to understand why removing that clobber
>> statement is safe even for old gcc versions.

If the asm does leave sp with a different value, then it's never been safe,
regardless of the gcc version.  That's why an error seems more appropriate.

> Thank you. Looks like general consensus is to have a warning. See attached 
> patch that switches the error to a warning.

I don't think there's a good reason to treat this differently from the
preexisting PIC register error.  If the argument for making it a warning
rather than an error is that the asm might happen to work by accident,
then the same is true for the PIC register.

Thanks,
Richard

Re: Linux x86 unwinder: Handle __NR_sigreturn for __kernel_sigreturn support

2018-12-17 Thread Florian Weimer

* Florian Weimer:

> I believe this may address recent unwinder failures in Fedora if the
> vDSO unwinder does not contain unwinding data:
>
>   
>
> The question is: Do we want to move in that direction?  Or should we
> make clear that the userspace ABI *requires* unwinding information?

This will be fixed on the kernel side, so the patch probably won't be
needed.

Thanks,
Florian

Re: [PATCH, og8] Add OpenACC 2.6 `serial' construct support

2018-12-17 Thread Chung-Lin Tang


Hi Maciej,
I don't think there's anything wrong with the patch itself; if the testcases
pass, then it should implement the functionality correctly.

My only issue is should "serial" really be promoted to such a visible construct
in the middle-end? It's just a special case of parallel, and user debug errors
can be dealt with specifically. I don't see much value of it being preserved
past the front-ends into gimplify/omp-low, just more testing to be done to guard
various conditions that are specific to OpenACC...

That's just my 0.02, others can provide more input.

Chung-Lin

On 2018/12/17 11:09 AM, Maciej W. Rozycki wrote:

The `serial' construct is equivalent to a `parallel' construct with
clauses `num_gangs(1) num_workers(1) vector_length(1)' implied.
Naturally these clauses are therefore not supported with the `serial'
construct.  All the remaining clauses accepted with `parallel' are also
accepted with `serial'.

Consequently implementation is straightforward, by handling `serial'
exactly like `parallel', except for hardcoding dimensions rather than
taking them from the relevant clauses, in `expand_omp_target'.

Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends, saving a lot of mechanical changes and avoiding middle-end
code expansion.  This is so that any reporting such as with warning or
error messages and in diagnostic dumps use `serial' rather than
`parallel', therefore avoiding user confusion.

gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant.  Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(omp_add_variable): Handle ORT_ACC_SERIAL.
(oacc_default_clause): Likewise.
(gimplify_scan_omp_clauses): Likewise.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.  Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_receiver_ref): Adjust accordingly.
(build_sender_ref): Likewise.
(scan_sharing_clauses): Likewise.
(create_omp_child_function): Likewise.
(scan_omp_for): Likewise.
(scan_omp_target): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* tree.def (OACC_SERIAL): New tree code.

* doc/generic.texi (OpenACC): Document OACC_SERIAL.

gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.

gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(OACC_SERIAL_CLAUSE_DEVICE_TYPE_MASK): Likewise.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.

gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(OACC_SERIAL_CLAUSE_DEVICE_TYPE_MASK): Likewise.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.

gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-17 Thread Chung-Lin Tang


On 2018/12/14 10:56 PM, Thomas Schwinge wrote:

Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
wrote:

--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  /* The special value acc_async_noval (-1) maps to the thread-specific
+ default async stream.  */
+  if (async == acc_async_noval)
+async = thr->default_async;
+
+  if (async == acc_async_sync)
+return NULL;
+
+  if (async < 0)
+gomp_fatal ("bad async %d", async);
+
+  struct gomp_device_descr *dev = thr->dev;
+
+  if (!create
+  && (async >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[async]))
+return NULL;
+

Doesn't this last block also have to be included in the lock you're
taking below?


Good catch, I'll revise this.


-  gomp_mutex_lock (&dev->openacc.async.lock);
if (id >= dev->openacc.async.nasyncqueue)
  {
int diff = id + 1 - dev->openacc.async.nasyncqueue;
+  // TODO gomp_realloc might call "gomp_fatal" with 
"&dev->openacc.async.lock" locked.  Might cause deadlock?
dev->openacc.async.asyncqueue
= gomp_realloc (dev->openacc.async.asyncqueue,
sizeof (goacc_aq) * (id + 1));
@@ -105,16 +109,23 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool 
create, int async)
  
if (!dev->openacc.async.asyncqueue[id])

  {
+  //TODO We have "&dev->openacc.async.lock" locked here, and if "openacc.async.construct_func" 
calls "GOMP_PLUGIN_fatal" (via "CUDA_CALL_ASSERT", for example), that might cause deadlock?
+  //TODO Change the interface to emit an error in the plugin, but then "return 
NULL", and we catch that here, unlock, and bail out?
dev->openacc.async.asyncqueue[id] = dev->openacc.async.construct_func 
();



+  //TODO Are these safe to call, or might this cause deadlock if something's 
locked?
CUDA_CALL_ASSERT (cuEventCreate, &e, CU_EVENT_DISABLE_TIMING);
CUDA_CALL_ASSERT (cuEventRecord, e, aq1->cuda_stream);
CUDA_CALL_ASSERT (cuStreamWaitEvent, aq2->cuda_stream, e, 0);
@@ -1413,6 +1416,7 @@ static void
  cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
  {
if (res != CUDA_SUCCESS)
+//TODO Is this safe to call, or might this cause deadlock if something's 
locked?
  GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));


The reason there are deadlocks from inside the plugin on GOMP_PLUGIN_fatal() is 
when we hold the
struct gomp_device_descr's *device* lock, which is also acquired when we 
execute atexit device shutdown handlers, hence the deadlock.

I don't think this is the case for the OpenACC entry points that grab at the 
openacc.async.* hooks,
though I can audit them again if deemed so.


But then, I wonder if we couldn't skip all that locking, if we moved the
"asyncqueue"s from "acc_dispatch_t" into "goacc_thread"?




struct {
+//TODO Why do these live in the "device" data structure, and not in the 
"per-thread" data structure?
+//TODO Aren't they meant to be separate per thread?
+//TODO That is, as far as I remember right now, OpenACC explicitly states 
that an asyncqueue doesn't entail any synchronization between different host 
threads.
+//TODO Verify OpenACC.
+//TODO With that moved into "goacc_thread", we could get rid of all the 
locking needed here?
  /* Once created and put into the "active" list, asyncqueues are then never
 destructed and removed from the "active" list, other than if the TODO
 device is shut down.  */

At this point, I will again (as in that other email) state that my
understanding of OpenACC is that an async queue does not entail any
inter-thread synchronization, so it would seem reasonable that all async
queues are separate per thread.


OpenACC 2.6 specification, section 2.16.1 "async clause (page 68, line 2029):

"If there are two or more host threads executing and sharing the same 
accelerator device,
two asynchronous operations with the same async-value will be enqueued on the same 
activity queue"

That said, I recall most (if not all) of the synchronization operations and 
behavior are all
defined to be with respect to operations of the local host thread only, so the 
spec mentioning interaction with
other host threads here may be moot, as there's no way meaningful way to 
synchronize with
the OpenACC activity of other host threads (though correct me if I forgot some 
API)

Also, CUDA streams do not seem to support local-thread-operation-only 
synchronization.
I remember this was an issue in the old implementation inside the nvptx plugin 
as well, and we
had hacks to somewhat alleviate it (i.e. calling streams "single" or "multi" 
threaded)

Well, another issue that we might want to bring up to the OpenACC committee :)
I agree that if async queues spaces were simply thread-local then things would 
be much simpler.

Thanks,

Re: [PATCH 1/6, OpenACC, libgomp] Async re-work, interfaces

2018-12-17 Thread Chung-Lin Tang


On 2018/12/15 1:52 AM, Thomas Schwinge wrote:

--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h



@@ -199,7 +200,7 @@ enum gomp_map_kind
  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
 should be incremented whenever an ABI-incompatible change is introduced
 to the plugin interface defined in libgomp/libgomp.h.  */
-#define GOMP_VERSION   1
+#define GOMP_VERSION   2
  #define GOMP_VERSION_NVIDIA_PTX 1
  #define GOMP_VERSION_INTEL_MIC 0
  #define GOMP_VERSION_HSA 0


OK, I think -- but I'm never quite sure whether we do need to increment
"GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,
which don't affect the user/GCC side?

GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls
synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,
/* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks
in "GOMP_offload_register_ver", so that we don't try to load offloading
code with an _old_ libgomp that has been compiled with/for the _new_
version.  (Right?)

 void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
int target_type, const void *target_data)
 { [...]
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
 gomp_fatal ("Library too old for offload (version %u < %u)",
 GOMP_VERSION, GOMP_VERSION_LIB (version));

I don't have a problem with your change per se, but wouldn't we still be
able to load such code, given that we only changed the libgomp-interal
libgomp-plugin interface?

Am I confused?

Or is the above just an (unavoidable?) side effect, because we do need to
increment "GOMP_VERSION" for this check here:

   if (device->version_func () != GOMP_VERSION)
 {
   err = "plugin version mismatch";
   goto fail;
 }

..., which is making sure that the libgomp proper vs. libgomp-plugin
versions match.


The intended effect is exactly to ensure libgomp proper vs plugin compatibility.
We don't ensure backward/forward compatibility between libgomp/plugin, and
this version equality check is what enforces that.


--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {
GOMP_PLUGIN_debug;
GOMP_PLUGIN_error;
GOMP_PLUGIN_fatal;
-   GOMP_PLUGIN_async_unmap_vars;
GOMP_PLUGIN_acc_thread;
  };


I think that's fine, but highlighting this again for Jakub, in case
there's an issue with removing a symbol from the libgomp-plugin
interface.


Since we don't enforce the libgomp/plugin interface to be compatible across
versions, I expect this to be okay.


--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h



+/* Opaque type to represent plugin-dependent implementation of an
+   OpenACC asynchronous queue.  */
+struct goacc_asyncqueue;
+
+/* Used to keep a list of active asynchronous queues.  */
+struct goacc_asyncqueue_list
+{
+  struct goacc_asyncqueue *aq;
+  struct goacc_asyncqueue_list *next;
+};
+
+typedef struct goacc_asyncqueue *goacc_aq;
+typedef struct goacc_asyncqueue_list *goacc_aq_list;


I'm not too fond of such "syntactic sugar" typedefs, but if that's fine
for Jakub to have in libgomp, then I won't object.

I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"
variants however, instead of introducing yet another "goacc_aq" acronym
next to "goacc_asyncqueue", and "async queue" or "asynchronous queue" as
used in the descriptive texts (comments, etc.).  Maybe standardize all
these to "asyncqueue", also in the descriptive texts?

OpenACC, by the way, uses the term "device activity queue" (in most?
places...) to describe the underlying mechanism used to implement the
OpenACC "async" clause etc.


Please, no more name style changes, please... Orz (beg)

I think I originally thought of "asyncqueue" too, but I felt a shorthand was
needed in many places, and the straightforward "aq" seemed too short to be
comfortably informative. The "goacc_" prefix seemed just right.

Besides, the crucial name convention I had in mind was "queues" in OpenACC,
versus "streams" in CUDA. I don't see much value in further spinning on this
name.


Should "struct goacc_asyncqueue_list" and its typedef still be defined
here in "libgomp/libgomp-plugin.h" (for proximity to the other stuff),
even though it's not actually used in the libgomp-plugin interface?


It looks like it can indeed be placed in libgomp.h, maybe just before the
declaration of acc_dispatch_t.

I originally placed it there in libgomp-plugin.h simply to
collect the declarations related to goacc_asyncqueue together in one place.
Is separating them really better?


--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,19 +888,23 @@ typedef struct acc_dispatch_t

[...]

+  struct {
+gomp_mutex_t lock;
+int nasyncqueue;
+struct goacc_asyncqueue **asyncqueue;
+struct goacc_asyncqueue_list *active;

[...]

+  } async;


For "lock" see my co

Re: [PATCH] DWARF: Don't expand hash table when no insert is needed

2018-12-17 Thread Richard Biener

On Sun, Dec 16, 2018 at 9:33 PM H.J. Lu  wrote:
>
> find_slot_with_hash has
>
>  if (insert == INSERT && m_size * 3 <= m_n_elements * 4)
> expand ();
>
> which may expand hash table even if no insert is neeed and change hash
> table traverse order.  When output_macinfo_op is called, all index
> strings have been added to hash table by save_macinfo_strings, we
> shouldn't expand index string hash table.  Otherwise find_slot_with_hash
> will expand hash table when hash table has the right size.

So the point is that output_macinfo_op is called from code traversing the
hashtable?  I didn't really manage to quickly spot that.

> Tested on i686 and x86-64.  OK for trunk?

OK if called from htab traverse (please clarify that in the comment you add).

Richard.

>
> Thanks.
>
> H.J.
> ---
> PR debug/79342
> * dwarf2out.c (find_AT_string_in_table): Add insert argument
> and replace INSERT.
> (find_AT_string): Add insert argument defaulting to INSERT
> and replace INSERT.
> (output_macinfo_op): Pass NO_INSERT to find_AT_string.
> ---
>  gcc/dwarf2out.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index b2381056991..2fc88c0c12d 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -4618,12 +4618,13 @@ indirect_string_hasher::equal (indirect_string_node 
> *x1, const char *x2)
>
>  static struct indirect_string_node *
>  find_AT_string_in_table (const char *str,
> -hash_table *table)
> +hash_table *table,
> +enum insert_option insert = INSERT)
>  {
>struct indirect_string_node *node;
>
>indirect_string_node **slot
> -= table->find_slot_with_hash (str, htab_hash_string (str), INSERT);
> += table->find_slot_with_hash (str, htab_hash_string (str), insert);
>if (*slot == NULL)
>  {
>node = ggc_cleared_alloc ();
> @@ -4640,12 +4641,12 @@ find_AT_string_in_table (const char *str,
>  /* Add STR to the indirect string hash table.  */
>
>  static struct indirect_string_node *
> -find_AT_string (const char *str)
> +find_AT_string (const char *str, enum insert_option insert = INSERT)
>  {
>if (! debug_str_hash)
>  debug_str_hash = hash_table::create_ggc (10);
>
> -  return find_AT_string_in_table (str, debug_str_hash);
> +  return find_AT_string_in_table (str, debug_str_hash, insert);
>  }
>
>  /* Add a string attribute value to a DIE.  */
> @@ -28095,7 +28096,12 @@ output_macinfo_op (macinfo_entry *ref)
>break;
>  case DW_MACRO_define_strp:
>  case DW_MACRO_undef_strp:
> -  node = find_AT_string (ref->info);
> +  /* NB: output_macinfo_op is called after save_macinfo_strings.
> +All index strings have been added to hash table at this point.
> +We can't pass INSERT to find_slot_with_hash which may expand
> +hash table even if no insert is neeed and change hash table
> +traverse order.  */
> +  node = find_AT_string (ref->info, NO_INSERT);
>gcc_assert (node
>   && (node->form == DW_FORM_strp
>   || node->form == dwarf_FORM (DW_FORM_strx)));
> --
> 2.19.2
>

Re: V6 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-17 Thread Richard Biener

On Fri, Dec 14, 2018 at 11:48 PM H.J. Lu  wrote:
>
> On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:
> >
> > On 12/13/18 6:56 PM, H.J. Lu wrote:
> > > On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:
> > >>
> > >> On 9/25/18 11:46 AM, H.J. Lu wrote:
> > >>> On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:
> >  On 07/23/2018 05:24 PM, H.J. Lu wrote:
> > >
> > > On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> > > 
> > > wrote:
> > >>
> > >> On Mon, 18 Jun 2018, Jason Merrill wrote:
> > >>
> > >>> On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> > >>> 
> > >>> wrote:
> > 
> >  On Mon, 18 Jun 2018, Jason Merrill wrote:
> > 
> > >> +  if (TREE_CODE (rhs) == COND_EXPR)
> > >> +{
> > >> +  /* Check the THEN path first.  */
> > >> +  tree op1 = TREE_OPERAND (rhs, 1);
> > >> +  context = check_address_of_packed_member (type, op1);
> > >
> > >
> > > This should handle the GNU extension of re-using operand 0 if 
> > > operand
> > > 1 is omitted.
> > 
> > 
> >  Doesn't that just use a SAVE_EXPR?
> > >>>
> > >>>
> > >>> Hmm, I suppose it does, but many places in the compiler seem to 
> > >>> expect
> > >>> that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.
> > >>
> > >>
> > >> Maybe that's used somewhere inside the C++ front end.  For C a 
> > >> SAVE_EXPR
> > >> is produced directly.
> > >
> > >
> > > Here is the updated patch.  Changes from the last one:
> > >
> > > 1. Handle COMPOUND_EXPR.
> > > 2. Fixed typos in comments.
> > > 3. Combined warn_for_pointer_of_packed_member and
> > > warn_for_address_of_packed_member into
> > > warn_for_address_or_pointer_of_packed_member.
> > 
> > 
> > > c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases 
> > > the
> > > alignment of ‘long int *’ pointer from 1 to 8 
> > > [-Waddress-of-packed-member]
> > 
> > 
> >  I think this would read better as
> > 
> >  c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 
> >  1) to
> >  ‘long int *’ (alignment 8) may result in an unaligned pointer value
> >  [-Waddress-of-packed-member]
> > >>>
> > >>> Fixed.
> > >>>
> > > +  while (TREE_CODE (base) == ARRAY_REF)
> > > +   base = TREE_OPERAND (base, 0);
> > > +  if (TREE_CODE (base) != COMPONENT_REF)
> > > +   return NULL_TREE;
> > 
> > 
> >  Are you deliberately not handling the other handled_component_p cases? 
> >  If
> >  so, there should be a comment.
> > >>>
> > >>> I changed it to
> > >>>
> > >>>while (handled_component_p (base))
> > >>>   {
> > >>> enum tree_code code = TREE_CODE (base);
> > >>> if (code == COMPONENT_REF)
> > >>>   break;
> > >>> switch (code)
> > >>>   {
> > >>>   case ARRAY_REF:
> > >>> base = TREE_OPERAND (base, 0);
> > >>> break;
> > >>>   default:
> > >>> /* FIXME: Can it ever happen?  */
> > >>> gcc_unreachable ();
> > >>> break;
> > >>>   }
> > >>>   }
> > >>>
> > >>> Is there a testcase to trigger this ICE? I couldn't find one.
> > >>
> > >> You can take the address of an element of complex:
> > >>
> > >> __complex int i;
> > >> int *p = &__real(i);
> > >>
> > >> You may get VIEW_CONVERT_EXPR with location wrappers.
> > >
> > > Fixed.  I replaced gcc_unreachable with return NULL_TREE;
> >
> > Then we're back to my earlier question: are you deliberately not
> > handling the other cases?  Why not look through them as well?  What if
> > e.g. the operand of __real is a packed field?
> >
>
> Here is the updated patch with
>
> diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
> index 615134cfdac..f105742598e 100644
> --- a/gcc/c-family/c-warn.c
> +++ b/gcc/c-family/c-warn.c
> @@ -2669,6 +2669,9 @@ check_address_of_packed_member (tree type, tree rhs)
> switch (code)
>   {
>   case ARRAY_REF:
> + case REALPART_EXPR:
> + case IMAGPART_EXPR:
> + case VIEW_CONVERT_EXPR:
> base = TREE_OPERAND (base, 0);
> break;
>   default:

don't we have handled_component_p () for this?  (you're still
missing BIT_FIELD_REF which might be used for vector
element accesses)

>
> Now I got
>
> [hjl@gnu-cfl-1 pr51628-6]$ cat foo.i
> struct A { __complex int i; };
> struct B { struct A a; };
> struct C { struct B b __attribute__ ((packed)); };
>
> extern struct C *p;
>
> int*
> foo1 (void)
> {
>   return &__real(p->b.a.i);
> }
> int*
> foo2 (void)
> {
>   return &__imag(p->b.a.i);
> }
> [hjl@gnu-cfl-1 pr51628-6]$ make foo.s
> /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
> -B/exp

Re: [PATCH] avoid folding snprintf calls with bounds > INT_MAX (PR 87096)

2018-12-17 Thread Christophe Lyon

Hi,

On Thu, 13 Dec 2018 at 19:14, Jeff Law  wrote:
>
> On 12/12/18 4:18 PM, Martin Sebor wrote:
> > POSIX requires snprintf to fail with EOVERFLOW when the specified
> > bound exceeds INT_MAX.  This requirement conflicts with the C
> > requirements on valid calls to the function and isn't universally
> > implemented (e.g., Glibc doesn't seem to follow it, and GCC has
> > historically not paid heed to it either).  Nevertheless, there
> > are implementations that do respect it (Solaris being one of
> > them), and it seems that GCC should make a tricky situation
> > even more treacherous for programmers by returning different
> > values from some calls to the function than the library would.
> > This is also what bug 87096 requests.  The patch also adds
> > a warning that was missing from a subset of these troublesome
> > calls.
> >
> > The attached patch disables the snprintf constant folding and
> > range optimization for calls to it and other related bounded
> > functions when the bound is not known not to exceed INT_MAX.
> >
> > Tested on x86_64-linux.
> >
> > Martin
> >
> > gcc-87096.diff
> >
> > PR tree-optimization/87096 - Optimised snprintf is not POSIX conformant
> >
> > gcc/ChangeLog:
> >
> >   PR rtl-optimization/87096
> >   * gimple-ssa-sprintf.c (sprintf_dom_walker::handle_gimple_call): Avoid
> >   folding calls whose bound may exceed INT_MAX.  Diagnose bound ranges
> >   that exceed the limit.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR tree-optimization/87096
> >   * gcc.dg/tree-ssa/builtin-snprintf-4.c: New test.

This new test fails on arm (and other 32 bits targets according to
gcc-testresults)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 106)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 128)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 74)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c  (test for warnings, line 87)
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c scan-tree-dump-times
optimized " = snprintf" 12
FAIL:gcc.dg/tree-ssa/builtin-snprintf-4.c scan-tree-dump-times
optimized " = vsnprintf" 12

Christophe

> OK
> jeff

Re: Implementation of F2018:18.4 C descriptors and ISO_Fortran_binding.h

2018-12-17 Thread Dominique d'Humières

Hi Paul,

Your patch did not apply smoothly on my working tree:

patching file gcc/configure
Reversed (or previously applied) patch detected!  Assume -R? [n] 
…
patching file gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90
Hunk #1 FAILED at 5.
1 out of 2 hunks FAILED -- saving rejects to file 
gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90.rej
…
patching file libgfortran/Makefile.am
Hunk #3 FAILED at 780.
Hunk #4 FAILED at 1019.
2 out of 4 hunks FAILED -- saving rejects to file libgfortran/Makefile.am.rej
patching file libgfortran/Makefile.in
Hunk #3 FAILED at 765.
Hunk #4 FAILED at 1337.
Hunk #5 FAILED at 1536.
3 out of 7 hunks FAILED -- saving rejects to file libgfortran/Makefile.in.rej
…

I ignored the failed hunks in libgfortran/Makefile.am and the corresponding 
ones in libgfortran/Makefile.in and applied manually the hunk #3.

The I did not have any problem to do the update.

Then the test gfortran.dg/ISO_Fortran_binding_1.c failed for the 32 bit mode:

/opt/gcc/work/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.f90:142:8:

  142 | if (c_allocate (x, lower, upper) .ne. 0) stop 10
  |1
Error: Type mismatch in argument 'lower' at (1); passed INTEGER(4) to INTEGER(8)

Cheers,

Dominique

Re: [PATCH, middle-end/i386]: Fix PR88502, Inline built-in asinh, acosh, atanh for -ffast-math

2018-12-17 Thread Richard Biener

On Mon, 17 Dec 2018, Uros Bizjak wrote:

> ... and the patch.

middle-end parts are OK.

> On Mon, Dec 17, 2018 at 8:58 AM Uros Bizjak  wrote:
> >
> > Attached patch inlines calls to asinh{,f}, acosh{,f,l} and atanh{,f,l}
> > using x87 XFmode arithmetic. In the patch, I left out asinhl due to
> > its reduced input argument range, but perhaps it could be added back,
> > since we are expanding under flag_unsafe_math_optimizations. The
> > expanders are modelled after the removed inlines in glibc [1] (which
> > also include asinhl, with a comment mentioning its reduced input
> > argument range).
> >
> > 2018-12-17  Uros Bizjak  
> >
> > PR target/88502
> > * internal-fn.def (ACOSH): New.
> > (ASINH): Ditto.
> > (ATANH): Ditto.
> > * optabs.def (acosh_optab): New.
> > (asinh_optab): Ditto.
> > (atanh_optab): Ditto.
> > * config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
> > (ix86_emit_i387_acosh): Ditto.
> > (ix86_emit_i387_atanh): Ditto.
> > * config/i386/i386.c (ix86_emit_i387_asinh): New function.
> > (ix86_emit_i387_acosh): Ditto.
> > (ix86_emit_i387_atanh): Ditto.
> > * config/i386/i386.md (asinh2): New expander.
> > (acoshxf2): Ditto.
> > (acosh2): Ditto.
> > (atanhxf2): Ditto.
> > (atanh2): Ditto.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > The patch also needs approval for its straightforward middle-end parts.
> >
> > [1] https://sourceware.org/ml/libc-alpha/2018-12/msg00519.html
> >
> > Uros.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH, middle-end/i386]: Fix PR88502, Inline built-in asinh, acosh, atanh for -ffast-math

2018-12-17 Thread Uros Bizjak

... and the patch.

On Mon, Dec 17, 2018 at 8:58 AM Uros Bizjak  wrote:
>
> Attached patch inlines calls to asinh{,f}, acosh{,f,l} and atanh{,f,l}
> using x87 XFmode arithmetic. In the patch, I left out asinhl due to
> its reduced input argument range, but perhaps it could be added back,
> since we are expanding under flag_unsafe_math_optimizations. The
> expanders are modelled after the removed inlines in glibc [1] (which
> also include asinhl, with a comment mentioning its reduced input
> argument range).
>
> 2018-12-17  Uros Bizjak  
>
> PR target/88502
> * internal-fn.def (ACOSH): New.
> (ASINH): Ditto.
> (ATANH): Ditto.
> * optabs.def (acosh_optab): New.
> (asinh_optab): Ditto.
> (atanh_optab): Ditto.
> * config/i386/i386-protos.h (ix86_emit_i387_asinh): New prototype.
> (ix86_emit_i387_acosh): Ditto.
> (ix86_emit_i387_atanh): Ditto.
> * config/i386/i386.c (ix86_emit_i387_asinh): New function.
> (ix86_emit_i387_acosh): Ditto.
> (ix86_emit_i387_atanh): Ditto.
> * config/i386/i386.md (asinh2): New expander.
> (acoshxf2): Ditto.
> (acosh2): Ditto.
> (atanhxf2): Ditto.
> (atanh2): Ditto.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> The patch also needs approval for its straightforward middle-end parts.
>
> [1] https://sourceware.org/ml/libc-alpha/2018-12/msg00519.html
>
> Uros.
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 267187)
+++ config/i386/i386-protos.h   (working copy)
@@ -170,6 +170,9 @@
 extern void x86_emit_floatuns (rtx [2]);
 extern void ix86_emit_fp_unordered_jump (rtx);
 
+extern void ix86_emit_i387_asinh (rtx, rtx);
+extern void ix86_emit_i387_acosh (rtx, rtx);
+extern void ix86_emit_i387_atanh (rtx, rtx);
 extern void ix86_emit_i387_log1p (rtx, rtx);
 extern void ix86_emit_i387_round (rtx, rtx);
 extern void ix86_emit_swdivsf (rtx, rtx, rtx, machine_mode);
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 267187)
+++ config/i386/i386.c  (working copy)
@@ -44054,6 +44054,135 @@
   JUMP_LABEL (insn) = label;
 }
 
+/* Output code to perform an asinh XFmode calculation.  */
+
+void ix86_emit_i387_asinh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx scratch = gen_reg_rtx (HImode);
+  rtx flags = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  rtx cst1, tmp;
+  rtx_code_label *jump_label = gen_label_rtx ();
+  rtx_insn *insn;
+
+  /* e2 = sqrt (op1^2 + 1.0) + 1.0 */
+  emit_insn (gen_mulxf3 (e1, op1, op1));
+  cst1 = force_reg (XFmode, CONST1_RTX (XFmode));
+  emit_insn (gen_addxf3 (e2, e1, cst1));
+  emit_insn (gen_sqrtxf2 (e2, e2));
+  emit_insn (gen_addxf3 (e2, e2, cst1));
+
+  /* e1 = e1 / e2 */
+  emit_insn (gen_divxf3 (e1, e1, e2));
+
+  /* scratch = fxam (op1) */
+  emit_insn (gen_fxamxf2_i387 (scratch, op1));
+
+  /* e1 = e1 + |op1| */
+  emit_insn (gen_absxf2 (e2, op1));
+  emit_insn (gen_addxf3 (e1, e1, e2));
+
+  /* e2 = log1p (e1) */
+  ix86_emit_i387_log1p (e2, e1);
+
+  /* flags = signbit (op1) */
+  emit_insn (gen_testqi_ext_1_ccno (scratch, GEN_INT (0x02)));
+
+  /* if (flags) then e2 = -e2 */
+  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode,
+ gen_rtx_EQ (VOIDmode, flags, const0_rtx),
+ gen_rtx_LABEL_REF (VOIDmode, jump_label),
+ pc_rtx);
+  insn = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+  predict_jump (REG_BR_PROB_BASE * 50 / 100);
+  JUMP_LABEL (insn) = jump_label;
+
+  emit_insn (gen_negxf2 (e2, e2));
+
+  emit_label (jump_label);
+  LABEL_NUSES (jump_label) = 1;
+
+  emit_move_insn (op0, e2);
+}
+
+/* Output code to perform an acosh XFmode calculation.  */
+
+void ix86_emit_i387_acosh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx cst1 = force_reg (XFmode, CONST1_RTX (XFmode));
+
+  /* e2 = sqrt (op1 + 1.0) */
+  emit_insn (gen_addxf3 (e2, op1, cst1));
+  emit_insn (gen_sqrtxf2 (e2, e2));
+
+  /* e1 = sqrt (op1 - 1.0) */
+  emit_insn (gen_subxf3 (e1, op1, cst1));
+  emit_insn (gen_sqrtxf2 (e1, e1));
+
+  /* e1 = e1 * e2 */
+  emit_insn (gen_mulxf3 (e1, e1, e2));
+
+  /* e1 = e1 + op1 */
+  emit_insn (gen_addxf3 (e1, e1, op1));
+
+  /* op0 = log (e1) */
+  emit_insn (gen_logxf2 (op0, e1));
+}
+
+/* Output code to perform an atanh XFmode calculation.  */
+
+void ix86_emit_i387_atanh (rtx op0, rtx op1)
+{
+  rtx e1 = gen_reg_rtx (XFmode);
+  rtx e2 = gen_reg_rtx (XFmode);
+  rtx scratch = gen_reg_rtx (HImode);
+  rtx flags = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  rtx half = const_double_from_real_value (dconsthalf, XFmode);
+  rtx cst1, tmp;
+  rtx_code_label *jump_label = gen_label_rtx ();
+  rtx_insn *insn;
+
+  /* scratch = fxam (op1) */
+  emit_insn (gen_fxamxf2_i387 (scratch, op1));
+
+  /* e2 = |op1| */
+  emit_insn (gen_absxf2 (e2, op1

84 matches

Mail list logo