Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-07 Thread Uros Bizjak
On Mon, Jan 7, 2019 at 11:51 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in that PR, we have a SI->DImode zero extension and RA happens
> to choose to zero extend from a SImode memory slot which is the low part of
> the DImode memory slot into which the zero extension is to be stored.
> Unfortunately, the RTL DSE part really doesn't have infrastructure to
> remember and, if needed, invalidate loads, it just remembers stores, so
> handling this generically is quite unlikely at least for GCC9.
>
> This patch just handles that through a peephole2 (other option would be to
> handle it in the define_split for the zero extension, but the peephole2 is
> likely to catch more things).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Is there a reason stack registers are excluded? Before stackreg pass,
these registers are just like other hard registers.

Other than that, there is no need for REG_P predicate; after reload we
don't have subregs and register_operand will match only hard regs.
Also, please put peep2_reg_dead_p predicate in the pattern predicate.

Uros.

> 2019-01-07  Jakub Jelinek  
>
> PR rtl-optimization/79593
> * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.
>
> --- gcc/config/i386/i386.md.jj  2019-01-01 12:37:31.564738571 +0100
> +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100
> @@ -18740,6 +18740,21 @@ (define_peephole2
>const0_rtx);
>  })
>
> +;; Attempt to optimize away memory stores of values the memory already
> +;; has.  See PR79593.
> +(define_peephole2
> +  [(set (match_operand 0 "register_operand")
> +(match_operand 1 "memory_operand"))
> +   (set (match_dup 1) (match_dup 0))]
> +  "REG_P (operands[0])
> +   && !STACK_REGNO_P (operands[0])
> +   && !MEM_VOLATILE_P (operands[1])"
> +  [(set (match_dup 0) (match_dup 1))]
> +{
> +  if (peep2_reg_dead_p (1, operands[0]))
> +DONE;
> +})
> +
>  ;; Attempt to always use XOR for zeroing registers (including FP modes).
>  (define_peephole2
>[(set (match_operand 0 "general_reg_operand")
>
> Jakub


Re: [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-07 Thread Uros Bizjak
On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu  wrote:
>
> There is no need to generate vzeroupper if caller uses upper bits of
> AVX/AVX512 registers.  We track caller's avx_u128_state and avoid
> vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.
>
> Tested on i686 and x86-64 with and without --with-arch=native.
>
> OK for trunk?

In principle OK, but I think we don't have to cache the result of
ix86_avx_u128_mode_entry. Simply call the function from
ix86_avx_u128_mode_exit; it is a simple function, so I guess we can
afford to re-call it one more time per function.

Uros.

> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/88717
> * config/i386/i386.c (ix86_avx_u128_mode_entry): Set
> caller_avx_u128_dirty to true when caller is AVX_U128_DIRTY.
> (ix86_avx_u128_mode_exit): Set exit mode to AVX_U128_DIRTY if
> caller is AVX_U128_DIRTY.
> * config/i386/i386.h (machine_function): Add
> caller_avx_u128_dirty.
>
> gcc/testsuite/
>
> PR target/88717
> * gcc.target/i386/pr88717.c: New test.
> ---
>  gcc/config/i386/i386.c  | 10 +-
>  gcc/config/i386/i386.h  |  3 +++
>  gcc/testsuite/gcc.target/i386/pr88717.c | 24 
>  3 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d01278d866f..9b49a2c1d9c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19100,7 +19100,11 @@ ix86_avx_u128_mode_entry (void)
>rtx incoming = DECL_INCOMING_RTL (arg);
>
>if (incoming && ix86_check_avx_upper_register (incoming))
> -   return AVX_U128_DIRTY;
> +   {
> + /* Caller is AVX_U128_DIRTY.  */
> + cfun->machine->caller_avx_u128_dirty = true;
> + return AVX_U128_DIRTY;
> +   }
>  }
>
>return AVX_U128_CLEAN;
> @@ -19130,6 +19134,10 @@ ix86_mode_entry (int entity)
>  static int
>  ix86_avx_u128_mode_exit (void)
>  {
> +  /* Exit mode is set to AVX_U128_DIRTY if caller is AVX_U128_DIRTY.  */
> +  if (cfun->machine->caller_avx_u128_dirty)
> +return AVX_U128_DIRTY;
> +
>rtx reg = crtl->return_rtx;
>
>/* Exit mode is set to AVX_U128_DIRTY if there are 256bit
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 83b025e0cf5..c053b657a55 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2747,6 +2747,9 @@ struct GTY(()) machine_function {
>/* If true, ENDBR is queued at function entrance.  */
>BOOL_BITFIELD endbr_queued_at_entrance : 1;
>
> +  /* If true, caller is AVX_U128_DIRTY.  */
> +  BOOL_BITFIELD caller_avx_u128_dirty : 1;
> +
>/* The largest alignment, in bytes, of stack slot actually used.  */
>unsigned int max_used_stack_alignment;
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c 
> b/gcc/testsuite/gcc.target/i386/pr88717.c
> new file mode 100644
> index 000..01680998f1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88717.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
> +
> +#include 
> +
> +__m128
> +foo1 (__m256 x)
> +{
> +  return _mm256_castps256_ps128 (x);
> +}
> +
> +void
> +foo2 (float *p, __m256 x)
> +{
> +  *p = ((__v8sf)x)[0];
> +}
> +
> +void
> +foo3 (float *p, __m512 x)
> +{
> +  *p = ((__v16sf)x)[0];
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroupper" } } */
> --
> 2.20.1
>
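
For illustration only, the review comment above (recompute the entry state at exit instead of caching it in machine_function) can be pictured with a toy model. This is a standalone sketch, not GCC code: ToyFunction, U128State, mode_entry and mode_exit are hypothetical stand-ins, and the per-argument flags replace the real scan over DECL_INCOMING_RTL.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins; none of these are real GCC types.
enum class U128State { Clean, Dirty };

struct ToyFunction {
  // One flag per incoming argument: true if it arrives in a
  // 256-bit or 512-bit vector register.
  std::vector<bool> arg_uses_upper_bits;
  // True if the return value lives in a 256/512-bit register.
  bool return_uses_upper_bits = false;
};

// Entry state: dirty as soon as one incoming argument uses the upper bits.
U128State mode_entry(const ToyFunction& fn) {
  for (bool wide : fn.arg_uses_upper_bits)
    if (wide) return U128State::Dirty;
  return U128State::Clean;
}

// Exit state: call mode_entry again rather than reading a cached flag.
U128State mode_exit(const ToyFunction& fn) {
  if (mode_entry(fn) == U128State::Dirty) return U128State::Dirty;
  return fn.return_uses_upper_bits ? U128State::Dirty : U128State::Clean;
}
```

In this model a function shaped like foo2 from the test (one pointer argument, one 256-bit argument, void return) exits dirty, so no vzeroupper would be requested, at the cost of one extra pass over the argument list.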


Re: C++ PATCH for c++/88548, this accepted in static member functions

2019-01-07 Thread Jason Merrill

On 12/22/18 4:38 PM, Marek Polacek wrote:

I noticed that we weren't diagnosing using 'this' in noexcept-specifiers
of static member functions, and Jakub pointed out that this is also true
for trailing-return-type.  cp_parser has local_variables_forbidden_p to
detect using local vars and 'this' in certain contexts, so let's use that.

...except that I found out that I need to be able to distinguish between
forbidding just local vars and/or this, so I've changed its type to char.
For instance, you can't use 'this' in a static member function declaration,
but you can use a local var because noexcept/decltype is an unevaluated
context.

I also noticed that we weren't diagnosing 'this' in a friend declaration,
which is also forbidden.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


I wonder about suppressing inject_this_parm for a static member 
function, since current_class_ptr is unset within a static member 
function.  But this approach (and patch) is also OK.


Jason
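
As a rough illustration of the distinction described in the quoted message (the Counter type and its members are made up for this note, they are not from the patch or its testsuite): 'this' is fine in the trailing return type of a non-static member, while a static member may still name its parameters inside noexcept/decltype because those operands are unevaluated.

```cpp
#include <cassert>
#include <type_traits>

struct Counter {
  int value = 7;

  // Non-static member: 'this' is valid in the trailing return type.
  auto get() const -> decltype(this->value) { return value; }

  // Static member: there is no 'this', but naming the parameter inside
  // noexcept/decltype is fine because those operands are unevaluated.
  static auto twice(int n) noexcept(noexcept(n + n)) -> decltype(n + n) {
    return n + n;
  }

  // Ill-formed, and the kind of code the patch is meant to diagnose:
  //   static auto bad() -> decltype(this->value);
  //   static void bad2() noexcept(noexcept(this->value));
};
```

The two commented-out declarations are left as comments so the sketch still compiles.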


Re: [C++ PATCH] Avoid ICE trying to add a bogus fixit hint (PR c++/88554)

2019-01-07 Thread Jason Merrill

On 1/4/19 5:57 PM, Jakub Jelinek wrote:

Hi!

As found by Jonathan, we shouldn't be trying to emit return *this; fixit
hint in non-methods where this is not available (and ICEing because
current_class_ref is NULL there).
I've just added a testcase for Jon's patch from the PR and merged two nested
ifs into one.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.

Jason



Re: C++ PATCH for c++/88538 - braced-init-list in template-argument-list

2019-01-07 Thread Jason Merrill

On 1/7/19 6:56 PM, Marek Polacek wrote:

At the risk of seeming overly eager, I thought it would be reasonable to
go with the following: enabling braced-init-list as a template-argument.
As the discussion on the reflector clearly indicates, this was the intent
from the get-go.

I know, it's not a regression.  But I restricted the change to C++20, and it
should strictly allow code that wasn't accepted before -- when a template
argument starts with {.  Perhaps we could even drop the C++20 check.

What's your preference?


Let's keep the C++20 check for now at least.  I'd suggest moving the 
change further down, with this code:



  if (cxx_dialect <= cxx14)
argument = cp_parser_constant_expression (parser);
  else
{
  /* With C++17 generalized non-type template arguments we need to handle  
 lvalue constant expressions, too.  */

  argument = cp_parser_assignment_expression (parser);
  require_potential_constant_expression (argument);
}


Jason


Re: [C++ Patch] Fix a start_decl location

2019-01-07 Thread Jason Merrill

On 1/6/19 4:47 AM, Paolo Carlini wrote:

Hi,

this was supposed to be very straightforward but required a little more. 
A first draft of the patch exploited DECL_SOURCE_LOCATION but that 
failed when I tested the case of a function already defined in class: at 
that point the location of the decl is that of the in class definition 
itself not that of the wrong redeclaration. Thus the use of 
declarator->id_loc. Tested x86_64-linux.


A final note, about a detail already noticed many other times: the 
location we store in such cases is that of K not that of f, and that 
seems suboptimal to me: in principle we should point to f and possibly 
have the wavy queue of the caret going back to K, at least that's what 
clang does... no idea if David has this kind of tweak in his todo list.


David has done various work with handling of qualified-id locations, so 
I imagine he'll be interested in this issue.


The patch is OK.

Jason


[committed] Fix jit test case (PR jit/88747)

2019-01-07 Thread David Malcolm
Amongst other changes, r266077 updated value_range_base::dump so
that it additionally prints the type.  This broke an assertion within
the jit testsuite, in jit.dg/test-sum-of-squares.c, which was checking
for:
  ": [-INF, n_"
but was now getting:
  ": signed int [-INF, n_"

The test is merely intended as a simple verification that we can read
dump files via gcc_jit_context_enable_dump.

This patch loosens the requirements on the dump so that it should work
with either version of value_range_base::dump.

Verified on x86_64-pc-linux-gnu, removing 6 FAIL results from jit.sum
and taking it from 6479 to 10288 PASS results.

Committed to trunk as r267671.
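
To make the effect of the loosened pattern concrete, here is a small standalone sketch. string_contains is a hypothetical stand-in for the harness macro, and the two sample lines are invented to mimic the old and new dump formats quoted above (the variable name and upper bound are made up).

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the substring check: true if 'needle' occurs in 'haystack'.
bool string_contains(const char* haystack, const char* needle) {
  return std::strstr(haystack, needle) != nullptr;
}

// Invented sample lines in the two dump styles.
const char* const old_dump = "i_1: [-INF, n_2(D) + -1]";
const char* const new_dump = "i_1: signed int [-INF, n_2(D) + -1]";

// The stricter pattern used before the fix and the looser one after it.
const char* const strict_pattern = ": [-INF, n_";
const char* const loose_pattern = "[-INF, n_";
```

The strict pattern only matches the old style because the type name now sits between the colon and the bracket; the loose pattern matches both.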

gcc/testsuite/ChangeLog:
PR jit/88747
* jit.dg/test-sum-of-squares.c (verify_code): Update expected vrp
dump to reflect r266077.
---
 gcc/testsuite/jit.dg/test-sum-of-squares.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/jit.dg/test-sum-of-squares.c 
b/gcc/testsuite/jit.dg/test-sum-of-squares.c
index 46fd5c2..f095f41 100644
--- a/gcc/testsuite/jit.dg/test-sum-of-squares.c
+++ b/gcc/testsuite/jit.dg/test-sum-of-squares.c
@@ -137,6 +137,6 @@ verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
  bounds of the iteration variable. Specifically, verify that some
  variable is known to be in the range negative infinity to some
  expression based on param "n" (actually n-1).  */
-  CHECK_STRING_CONTAINS (dump_vrp1, ": [-INF, n_");
+  CHECK_STRING_CONTAINS (dump_vrp1, "[-INF, n_");
   free (dump_vrp1);
 }
-- 
1.8.5.3



Re: [PATCH, testsuite] Allow builtin-has-attribute-* to run as far as possible on targets without alias support.

2019-01-07 Thread Iain Sandoe
Hi Martin,

> On 8 Jan 2019, at 00:25, Martin Sebor  wrote:
> 
> On 1/7/19 9:55 AM, Iain Sandoe wrote:
>> Hi Martin,
>> A)
>> Some of the builtin-has-attribute tests fail because a sub-set of them need 
>> symbol alias support.
>> Darwin has only support for weak aliases and therefore we need to skip these.
>> However, the sub-set is small, and I am reluctant to throw out the entire 
>> set for the sake of a small number, so I propose to wrap that small number 
>> in #ifndef that can be enabled by targets without the necessary support 
>> (Darwin is not the only one, just the most frequently tested and therefore 
>> often “guilty” of finding the problem ;) )
>> It’s a tricky trade-off between having too many test-cases and having test 
>> cases that try to cover too many circumstances...
>> B) [builtin-has-attribute-4.c]
>> I am concerned by the diagnostics for the lack of support for the 
>> “protected” mode (Darwin doesn’t have this, at least at present).
>> B.1 the reported diagnostic appears on the closing brace of the function, 
>> rather than on the statement that triggers it (line 233).
>> B.2 I think you’re perhaps missing a %< %> pair - there’s no ‘’ around 
>> “protected".
>> B.3. there are a bunch of other lines with the “protected” visibility 
>> marking, but no diagnostic (perhaps that’s intended, I am not sure).
>> Addressing B is a separate issue from making the current tests pass, it 
>> might not be appropriate at this stage .. it’s more of a “head’s up”.
>> as for the test fixes, OK for trunk?
> 
> Jeff already approved the patch so let me just say thanks for
> the cleanup and the heads up.

I will apply the patch as is (probably my tomorrow AM) for now - and then we 
can adjust if there’s any change to the diagnostics.
> 
> Using an #ifdef like you did makes sense to me for now.  If other
> targets need something similar it might be worth considering whether
> to add an effective-target for them instead of enumerating them in
> the test, and also change the test to verify that the appropriate
> warning is issued.  I'll try to remember to look into it.
> 
> I suspect the missing quotes from around "protected" in the warning
> come from gcc/config/darwin.c:
> 
>warning (OPT_Wattributes, "protected visibility attribute "
> "not supported in this configuration; ignored");

Ah..  and that’s a “warning” rather than a “warning_at” which explains the 
placement on the final brace.  Without looking at the context, not sure what 
location info we might have there.  Note to self: better audit other stuff in 
Darwin’s port-specific attrs.

> Let me add them along with a test for them.  

There is a test in builtin-has-attribute-4.c, by default and ….
>> +} /* { dg-warning "protected visibility attribute not supported" "" { 
>> target { *-*-darwin* } } } */

> I'll see if I can
> quickly tell also why the warning isn't issued consistently.

… this might produce a bunch more warnings.

thanks
Iain
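
For readers who have not seen the guard in question, a minimal standalone sketch of the SKIP_ALIAS idea follows. The function names are made up, and unlike the patch (which passes -DSKIP_ALIAS through dg-additional-options) this sketch derives the macro from __APPLE__ so that it builds on its own; that derivation is an assumption of the sketch, not part of the patch.

```cpp
#include <cassert>

// Assumption of this sketch: define SKIP_ALIAS automatically on Darwin.
// The patch itself sets it from the test's dg directives instead.
#if defined(__APPLE__) && !defined(SKIP_ALIAS)
#define SKIP_ALIAS 1
#endif

extern "C" int answer(void) { return 42; }

#ifndef SKIP_ALIAS
// Needs symbol alias support from the target's object format.
extern "C" int answer_alias(void) __attribute__((alias("answer")));
#endif

// Use the alias where it exists, otherwise fall back to the real symbol.
int call_answer() {
#ifndef SKIP_ALIAS
  return answer_alias();
#else
  return answer();
#endif
}
```

Everything outside the guarded declaration still compiles on targets without alias support, which is the point of wrapping only the small sub-set.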



PING #2 [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2019-01-07 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00337.html

On 1/3/19 3:22 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00337.html

On 12/20/18 7:51 PM, Martin Sebor wrote:

Jeff, did this and the rest of the discussion answer your question
and if so, is the patch okay to commit?

   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00337.html

Martin

On 12/13/18 12:48 PM, Martin Sebor wrote:

On 12/13/18 12:20 PM, Martin Sebor wrote:

On 12/13/18 11:59 AM, Jeff Law wrote:

On 12/5/18 8:55 PM, Martin Sebor wrote:

The __builtin_has_attribute function fails with an ICE when
its first argument is an expression such as INDIRECT_REF, or
many others.  The code simply assumes it's either a type or
a decl.  The attached patch corrects this oversight.

While testing the fix I also noticed the C++ front end expects
the first operand to be a unary expression, which causes most
other kinds of expressions to be rejected.  The patch fixes
that as well.

Finally, while testing the fix even more I realized that
the built-in considers the underlying array itself in ARRAY_REF
expressions rather than its type, which leads to inconsistent
results for overaligned arrays (it's the array itself that's
overaligned, not its elements).  So I fixed that too and
adjusted the one test designed to verify this.
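
The point about over-aligned arrays can be checked without the built-in at all. The following sketch is only an illustration (the names are invented, and it assumes pointer-to-integer conversion preserves addresses, as on mainstream targets): the array object honours the requested alignment, while its second element sits sizeof(int) bytes past that boundary.

```cpp
#include <cassert>
#include <cstdint>

// The array object is over-aligned; the element type is still plain int.
alignas(32) int overaligned[8];

// Helper: the numeric address of an object.
std::uintptr_t address_of(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Offset of a pointer from the previous 32-byte boundary.
std::uintptr_t offset_in_32(const void* p) {
  return address_of(p) % 32;
}
```

Here offset_in_32 (overaligned) is 0, while offset_in_32 (&overaligned[1]) equals sizeof (int), so an alignment attribute reported for the element would be misleading.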

Tested on x86_64-linux.

Martin

gcc-88383.diff

PR c/88383 - ICE calling __builtin_has_attribute on a reference

gcc/c-family/ChangeLog:

PR c/88383
* c-attribs.c (validate_attribute): Handle expressions.
(has_attribute): Handle types referenced by expressions.
Avoid considering array attributes in ARRAY_REF expressions.

gcc/cp/ChangeLog:

PR c/88383
* parser.c (cp_parser_has_attribute_expression): Handle assignment
expressions.

gcc/testsuite/ChangeLog:

PR c/88383
* c-c++-common/builtin-has-attribute-4.c: Adjust expectations.
* c-c++-common/builtin-has-attribute-6.c: New test.

Well, the high level question here is do we want to support this builtin
on expressions at all.  Generally attributes apply to types and decls,
not expressions.

Clearly we shouldn't fault, but my first inclination is that the
attribute applies to types or decls, not expressions.  In that case we
should just be issuing an error.

I could be convinced otherwise, so if you think we should support
passing expressions to this builtin, make your case.


The support is necessary in order to determine the attributes
in expressions such as:

   struct S { __attribute__ ((packed)) int a[32]; };

   extern struct S s;

   _Static_assert (__builtin_has_attribute (s.a, packed));


An example involving types might be a better one:

   typedef __attribute__ ((may_alias)) int* BadInt;

   void f (BadInt *p)
   {
 _Static_assert (__builtin_has_attribute (*p, may_alias));
   }

Martin








Re: [PATCH, testsuite] Allow builtin-has-attribute-* to run as far as possible on targets without alias support.

2019-01-07 Thread Martin Sebor

On 1/7/19 9:55 AM, Iain Sandoe wrote:

Hi Martin,

A)
Some of the builtin-has-attribute tests fail because a sub-set of them need 
symbol alias support.
Darwin has only support for weak aliases and therefore we need to skip these.

However, the sub-set is small, and I am reluctant to throw out the entire set 
for the sake of a small number, so I propose to wrap that small number in 
#ifndef that can be enabled by targets without the necessary support (Darwin is 
not the only one, just the most frequently tested and therefore often “guilty” 
of finding the problem ;) )

It’s a tricky trade-off between having too many test-cases and having test 
cases that try to cover too many circumstances...

B) [builtin-has-attribute-4.c]
I am concerned by the diagnostics for the lack of support for the “protected” 
mode (Darwin doesn’t have this, at least at present).

B.1 the reported diagnostic appears on the closing brace of the function, 
rather than on the statement that triggers it (line 233).
B.2 I think you’re perhaps missing a %< %> pair - there’s no ‘’ around 
“protected".
B.3. there are a bunch of other lines with the “protected” visibility marking, 
but no diagnostic (perhaps that’s intended, I am not sure).

Addressing B is a separate issue from making the current tests pass, it might 
not be appropriate at this stage .. it’s more of a “head’s up”.

as for the test fixes, OK for trunk?


Jeff already approved the patch so let me just say thanks for
the cleanup and the heads up.

Using an #ifdef like you did makes sense to me for now.  If other
targets need something similar it might be worth considering whether
to add an effective-target for them instead of enumerating them in
the test, and also change the test to verify that the appropriate
warning is issued.  I'll try to remember to look into it.

I suspect the missing quotes from around "protected" in the warning
come from gcc/config/darwin.c:

warning (OPT_Wattributes, "protected visibility attribute "
 "not supported in this configuration; ignored");

Let me add them along with a test for them.  I'll see if I can
quickly tell also why the warning isn't issued consistently.

Martin


Iain
 
gcc/testsuite/
 
	* c-c++-common/builtin-has-attribute-3.c: Skip tests requiring symbol
	alias support.
	* c-c++-common/builtin-has-attribute-4.c: Likewise.
	Append match for warning that ‘protected’ attribute is not supported.

diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c 
b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
index f048059..5b2e5c7 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
@@ -1,7 +1,9 @@
  /* Verify __builtin_has_attribute return value for functions.
 { dg-do compile }
 { dg-options "-Wall -ftrack-macro-expansion=0" }
-   { dg-options "-Wall -Wno-narrowing -Wno-unused-local-typedefs 
-ftrack-macro-expansion=0" { target c++ } }  */
+   { dg-options "-Wall -Wno-narrowing -Wno-unused-local-typedefs 
-ftrack-macro-expansion=0" { target c++ } }
+   { dg-additional-options "-DSKIP_ALIAS" { target *-*-darwin* } }
+*/
  
  #define ATTR(...) __attribute__ ((__VA_ARGS__))
  
@@ -27,7 +29,9 @@ extern "C"

  #endif
  ATTR (noreturn) void fnoreturn (void) { __builtin_abort (); }
  
+#ifndef SKIP_ALIAS

  ATTR (alias ("fnoreturn")) void falias (void);
+#endif
  
  #define A(expect, sym, attr)		\

typedef int Assert [1 - 2 * !(__builtin_has_attribute (sym, attr) == 
expect)]
@@ -114,7 +118,7 @@ void test_alloc_size_malloc (void)
A (1, fmalloc_size_3, malloc);
  }
  
-

+#ifndef SKIP_ALIAS
  void test_alias (void)
  {
A (0, fnoreturn, alias);
@@ -123,7 +127,7 @@ void test_alias (void)
A (0, falias, alias ("falias"));
A (0, falias, alias ("fnone"));
  }
-
+#endif
  
  void test_cold_hot (void)

  {
diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c 
b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
index d56ef6b..0c36cfc 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
@@ -1,7 +1,9 @@
  /* Verify __builtin_has_attribute return value for variables.
 { dg-do compile }
 { dg-options "-Wall -ftrack-macro-expansion=0" }
-   { dg-options "-Wall -Wno-narrowing -Wno-unused -ftrack-macro-expansion=0" { 
target c++ } }  */
+   { dg-options "-Wall -Wno-narrowing -Wno-unused -ftrack-macro-expansion=0" { 
target c++ } }
+   { dg-additional-options "-DSKIP_ALIAS" { target *-*-darwin* } }
+*/
  
  #define ATTR(...) __attribute__ ((__VA_ARGS__))
  
@@ -45,6 +47,7 @@ void test_aligned (void)

  }
  
  
+#ifndef SKIP_ALIAS

  int vtarget;
  extern ATTR (alias ("vtarget")) int valias;
  
@@ -55,7 +58,7 @@ void test_alias (void)

A (1, valias, alias ("vtarget"));
A (0, valias, alias ("vnone"));
  }
-
+#endif
  
  void test_cleanup (void)

  {
@@ -227,7 +230,7 @@ void test_vector_size (vo

C++ PATCH for c++/88538 - braced-init-list in template-argument-list

2019-01-07 Thread Marek Polacek
At the risk of seeming overly eager, I thought it would be reasonable to
go with the following: enabling braced-init-list as a template-argument.
As the discussion on the reflector clearly indicates, this was the intent
from the get-go.

I know, it's not a regression.  But I restricted the change to C++20, and it
should strictly allow code that wasn't accepted before -- when a template
argument starts with {.  Perhaps we could even drop the C++20 check.

What's your preference?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-01-07  Marek Polacek  

PR c++/88538 - braced-init-list in template-argument-list.
* parser.c (cp_parser_template_argument): Handle braced-init-list when
in C++20.

* g++.dg/cpp2a/nontype-class11.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index bca1739ace3..7de2ee28b20 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -16892,7 +16892,18 @@ cp_parser_template_argument (cp_parser* parser)
   return argument;
 }
   /* It must be a non-type argument.  In C++17 any constant-expression is
- allowed.  */
+ allowed.  In C++20, we can encounter a braced-init-list.  */
+  if (cxx_dialect >= cxx2a
+  && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+{
+  cp_parser_parse_tentatively (parser);
+  bool expr_non_constant_p;
+  argument = cp_parser_braced_list (parser, &expr_non_constant_p);
+  if (cp_parser_parse_definitely (parser))
+   /* Yup, it was a braced-init-list.  */
+   return argument;
+}
+
   if (cxx_dialect > cxx14)
 goto general_expr;
 
diff --git gcc/testsuite/g++.dg/cpp2a/nontype-class11.C 
gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
new file mode 100644
index 000..8a06d23904b
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/nontype-class11.C
@@ -0,0 +1,21 @@
+// PR c++/88538
+// { dg-do compile { target c++2a } }
+
+struct S {
+  unsigned a;
+  unsigned b;
+  constexpr S(unsigned _a, unsigned _b) noexcept: a{_a}, b{_b} { }
+};
+
+template 
+void fnc()
+{
+}
+
+template struct X { };
+
+void f()
+{
+  fnc<{10,20}>();
+  X<{1, 2}> x;
+}


Re: [PATCH] soft-fp: Update _FP_W_TYPE_SIZE check from glibc

2019-01-07 Thread H.J. Lu
On Mon, Jan 7, 2019 at 2:49 PM Joseph Myers  wrote:
>
> Except for the libgcc/config/rs6000/ibm-ldouble.c changes, the appropriate
> way to get these changes into libgcc would be a bulk update of all the
> files copied from glibc, rather than selectively merging particular
> changes - which would be done at another development stage rather than
> when in regressions-only mode, given that this is not a user-visible bug
> at all but purely an internal cleanup.
>

I am queuing this patch for GCC 10.

Thanks.

-- 
H.J.
From 9bfd82eca9f53983f8a2bb30ebbbadc05960e9d0 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jan 2019 09:16:23 -0800
Subject: [PATCH] Update soft-fp from glibc

This patch is updating all soft-fp from glibc.  Most changes are copyright
year updates, except for

commit 69da3c9e87e0a692e79db0615a53782e4198dbf0
Author: H.J. Lu 
Date:   Mon Jan 7 09:04:29 2019 -0800

soft-fp: Properly check _FP_W_TYPE_SIZE [BZ #24066]

	* config/rs6000/ibm-ldouble.c: Use "_FP_W_TYPE_SIZE < 64" to
	check if 4_FP_W_TYPEs are used for IEEE quad precision.
	* soft-fp/adddf3.c: Updated from glibc.
	* soft-fp/addsf3.c: Likewise.
	* soft-fp/addtf3.c: Likewise.
	* soft-fp/divdf3.c: Likewise.
	* soft-fp/divsf3.c: Likewise.
	* soft-fp/divtf3.c: Likewise.
	* soft-fp/double.h: Likewise.
	* soft-fp/eqdf2.c: Likewise.
	* soft-fp/eqsf2.c: Likewise.
	* soft-fp/eqtf2.c: Likewise.
	* soft-fp/extenddftf2.c: Likewise.
	* soft-fp/extended.h: Likewise.
	* soft-fp/extendhftf2.c: Likewise.
	* soft-fp/extendsfdf2.c: Likewise.
	* soft-fp/extendsftf2.c: Likewise.
	* soft-fp/extendxftf2.c: Likewise.
	* soft-fp/fixdfdi.c: Likewise.
	* soft-fp/fixdfsi.c: Likewise.
	* soft-fp/fixdfti.c: Likewise.
	* soft-fp/fixhfti.c: Likewise.
	* soft-fp/fixsfdi.c: Likewise.
	* soft-fp/fixsfsi.c: Likewise.
	* soft-fp/fixsfti.c: Likewise.
	* soft-fp/fixtfdi.c: Likewise.
	* soft-fp/fixtfsi.c: Likewise.
	* soft-fp/fixtfti.c: Likewise.
	* soft-fp/fixunsdfdi.c: Likewise.
	* soft-fp/fixunsdfsi.c: Likewise.
	* soft-fp/fixunsdfti.c: Likewise.
	* soft-fp/fixunshfti.c: Likewise.
	* soft-fp/fixunssfdi.c: Likewise.
	* soft-fp/fixunssfsi.c: Likewise.
	* soft-fp/fixunssfti.c: Likewise.
	* soft-fp/fixunstfdi.c: Likewise.
	* soft-fp/fixunstfsi.c: Likewise.
	* soft-fp/fixunstfti.c: Likewise.
	* soft-fp/floatdidf.c: Likewise.
	* soft-fp/floatdisf.c: Likewise.
	* soft-fp/floatditf.c: Likewise.
	* soft-fp/floatsidf.c: Likewise.
	* soft-fp/floatsisf.c: Likewise.
	* soft-fp/floatsitf.c: Likewise.
	* soft-fp/floattidf.c: Likewise.
	* soft-fp/floattihf.c: Likewise.
	* soft-fp/floattisf.c: Likewise.
	* soft-fp/floattitf.c: Likewise.
	* soft-fp/floatundidf.c: Likewise.
	* soft-fp/floatundisf.c: Likewise.
	* soft-fp/floatunditf.c: Likewise.
	* soft-fp/floatunsidf.c: Likewise.
	* soft-fp/floatunsisf.c: Likewise.
	* soft-fp/floatunsitf.c: Likewise.
	* soft-fp/floatuntidf.c: Likewise.
	* soft-fp/floatuntihf.c: Likewise.
	* soft-fp/floatuntisf.c: Likewise.
	* soft-fp/floatuntitf.c: Likewise.
	* soft-fp/gedf2.c: Likewise.
	* soft-fp/gesf2.c: Likewise.
	* soft-fp/getf2.c: Likewise.
	* soft-fp/half.h: Likewise.
	* soft-fp/ledf2.c: Likewise.
	* soft-fp/lesf2.c: Likewise.
	* soft-fp/letf2.c: Likewise.
	* soft-fp/muldf3.c: Likewise.
	* soft-fp/mulsf3.c: Likewise.
	* soft-fp/multf3.c: Likewise.
	* soft-fp/negdf2.c: Likewise.
	* soft-fp/negsf2.c: Likewise.
	* soft-fp/negtf2.c: Likewise.
	* soft-fp/op-1.h: Likewise.
	* soft-fp/op-2.h: Likewise.
	* soft-fp/op-4.h: Likewise.
	* soft-fp/op-8.h: Likewise.
	* soft-fp/op-common.h: Likewise.
	* soft-fp/quad.h: Likewise.
	* soft-fp/single.h: Likewise.
	* soft-fp/soft-fp.h: Likewise.
	* soft-fp/subdf3.c: Likewise.
	* soft-fp/subsf3.c: Likewise.
	* soft-fp/subtf3.c: Likewise.
	* soft-fp/truncdfsf2.c: Likewise.
	* soft-fp/trunctfdf2.c: Likewise.
	* soft-fp/trunctfhf2.c: Likewise.
	* soft-fp/trunctfsf2.c: Likewise.
	* soft-fp/trunctfxf2.c: Likewise.
	* soft-fp/unorddf2.c: Likewise.
	* soft-fp/unordsf2.c: Likewise.
	* soft-fp/unordtf2.c: Likewise.
---
 libgcc/config/rs6000/ibm-ldouble.c | 4 ++--
 libgcc/soft-fp/adddf3.c| 2 +-
 libgcc/soft-fp/addsf3.c| 2 +-
 libgcc/soft-fp/addtf3.c| 2 +-
 libgcc/soft-fp/divdf3.c| 2 +-
 libgcc/soft-fp/divsf3.c| 2 +-
 libgcc/soft-fp/divtf3.c| 2 +-
 libgcc/soft-fp/double.h| 2 +-
 libgcc/soft-fp/eqdf2.c | 2 +-
 libgcc/soft-fp/eqsf2.c | 2 +-
 libgcc/soft-fp/eqtf2.c | 2 +-
 libgcc/soft-fp/extenddftf2.c   | 4 ++--
 libgcc/soft-fp/extended.h  | 2 +-
 libgcc/soft-fp/extendhftf2.c   | 4 ++--
 libgcc/soft-fp/extendsfdf2.c   | 2 +-
 libgcc/soft-fp/extendsftf2.c   | 4 ++--
 libgcc/soft-fp/extendxftf2.c   | 4 ++--
 libgcc/soft-fp/fixdfdi.c   | 2 +-
 libgcc/soft-fp/fixdfsi.c   | 2 +-
 libgcc/soft-fp/fixdfti.c   | 2 +-
 libgcc/soft-fp/fixhfti.c   | 2 +-
 libgcc/soft-fp/fixsfdi.c   | 2 +-
 libgcc/soft-fp/fixsfsi.c   | 2 +-
 libgcc/soft-fp/fixsf

Re: [PATCH] Don't pushdecl compound literals inside C parameter scope (PR c/88701)

2019-01-07 Thread Joseph Myers
On Mon, 7 Jan 2019, Jakub Jelinek wrote:

> Hi!
> 
> As reported recently, my commit to push block scope using pushdecl into
> their corresponding scope broke compound literals appearing in parameter
> scope.  For those, we can keep the previous behavior, where they stayed at
> the function scope if they make it all the way there at all, the fix
> was only needed for block scope compound literals and for those
> current_function_decl is non-NULL.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)

2019-01-07 Thread Jakub Jelinek
Hi!

As mentioned in that PR, we have a SI->DImode zero extension and RA happens
to choose to zero extend from a SImode memory slot which is the low part of
the DImode memory slot into which the zero extension is to be stored. 
Unfortunately, the RTL DSE part really doesn't have infrastructure to
remember and, if needed, invalidate loads, it just remembers stores, so
handling this generically is quite unlikely at least for GCC9.

This patch just handles that through a peephole2 (other option would be to
handle it in the define_split for the zero extension, but the peephole2 is
likely to catch more things).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-07  Jakub Jelinek  

PR rtl-optimization/79593
* config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

--- gcc/config/i386/i386.md.jj  2019-01-01 12:37:31.564738571 +0100
+++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100
@@ -18740,6 +18740,21 @@ (define_peephole2
   const0_rtx);
 })
 
+;; Attempt to optimize away memory stores of values the memory already
+;; has.  See PR79593.
+(define_peephole2
+  [(set (match_operand 0 "register_operand")
+(match_operand 1 "memory_operand"))
+   (set (match_dup 1) (match_dup 0))]
+  "REG_P (operands[0])
+   && !STACK_REGNO_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  if (peep2_reg_dead_p (1, operands[0]))
+DONE;
+})
+
 ;; Attempt to always use XOR for zeroing registers (including FP modes).
 (define_peephole2
   [(set (match_operand 0 "general_reg_operand")

Jakub
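
As a rough picture of what the peephole2 above is after, here is a toy model in plain C++. It is not GCC code: ToyInsn and drop_redundant_stores are invented for this note, memory is reduced to an abstract slot id, and liveness is a flag supplied by the caller rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: either a load (reg = mem) or a store (mem = reg).
struct ToyInsn {
  enum Kind { Load, Store } kind;
  int reg;                   // register number
  int mem;                   // abstract memory slot id
  bool is_volatile;          // models a volatile memory operand
  bool reg_dead_after_pair;  // on a Store: register not needed afterwards
};

// Scan for "reg = mem; mem = reg" on the same register and slot.
std::vector<ToyInsn> drop_redundant_stores(const std::vector<ToyInsn>& in) {
  std::vector<ToyInsn> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const ToyInsn& cur = in[i];
    bool pair = i + 1 < in.size()
                && cur.kind == ToyInsn::Load
                && in[i + 1].kind == ToyInsn::Store
                && in[i + 1].reg == cur.reg
                && in[i + 1].mem == cur.mem
                && !cur.is_volatile
                && !in[i + 1].is_volatile;
    if (!pair) {
      out.push_back(cur);
      continue;
    }
    // The store writes back what the slot already holds, so drop it.
    // Keep the load only if the register is still needed afterwards.
    if (!in[i + 1].reg_dead_after_pair)
      out.push_back(cur);
    ++i;  // skip the store
  }
  return out;
}
```

Volatile accesses are left untouched in the model, mirroring the !MEM_VOLATILE_P condition in the pattern.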


Re: [PATCH, GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical subregs

2019-01-07 Thread Jeff Law
On 1/7/19 7:42 AM, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch fixes the way 'uses_hard_regs_p' handles paradoxical subregs.
>  The function is supposed to detect whether a register access of 'x'
> overlaps with 'set'.  For SUBREGs it should check whether any of the
> full multi-register overlaps with 'set'.  The former behavior used to
> grab the widest mode of the inner/outer registers of a SUBREG and the
> inner register, and check all registers from the inner-register onwards
> for the given width.  For normal SUBREGS this gives you the full
> register, for paradoxical SUBREGS however it may give you the wrong set
> of registers if the index is not the first of the multi-register set.
> 
> The original error reported in PR target/86487 can no longer be
> reproduced with the given test, this was due to an unrelated code-gen
> change, regardless I believe this should still be fixed as it is simply
> wrong behavior by uses_hard_regs_p which may be triggered by a different
> test-case or by future changes to the compiler.  Also it is useful to
> point out that this isn't actually a 'target' issue as this code could
> potentially hit any other target using paradoxical SUBREGS.  Should I
> change the Bugzilla ticket to reflect this is actually a target agnostic
> issue in RTL?
> 
> There is a gotcha here, I don't know what would happen if you hit the
> cases of get_hard_regno where it would return -1, quoting the comment
> above that function "If X is not a register or a subreg of a register,
> return -1." though I think if we are hitting this then things must have
> gone wrong before?
> 
> Bootstrapped on aarch64, arm and x86, no regressions.
> 
> Is this OK for trunk?
> 
> 
> gcc/ChangeLog:
> 2019-01-07 Andre Vieira  
> 
> 
>     PR target/86487
>     * lra-constraints.c(uses_hard_regs_p): Fix handling of
> paradoxical SUBREGS.
But doesn't wider_subreg_mode give us the wider of the two modes here?
We use that wider mode when we call overlaps_hard_reg_set_p, which
should ultimately check all the registers in the paradoxical subreg.

I must be missing something here?!?

jeff



Re: [PATCH] soft-fp: Update _FP_W_TYPE_SIZE check from glibc

2019-01-07 Thread Joseph Myers
Except for the libgcc/config/rs6000/ibm-ldouble.c changes, the appropriate 
way to get these changes into libgcc would be a bulk update of all the 
files copied from glibc, rather than selectively merging particular 
changes - which would be done at another development stage rather than 
when in regressions-only mode, given that this is not a user-visible bug 
at all but purely an internal cleanup.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Don't pushdecl compound literals inside C parameter scope (PR c/88701)

2019-01-07 Thread Jakub Jelinek
Hi!

As reported recently, my commit to push block scope compound literals
using pushdecl into their corresponding scope broke compound literals
appearing in parameter scope.  For those, we can keep the previous
behavior, where they stayed at the function scope if they make it all the
way there at all.  The fix was only needed for block scope compound
literals, and for those current_function_decl is non-NULL.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2019-01-07  Jakub Jelinek  

PR c/88701
* c-decl.c (build_compound_literal): If not TREE_STATIC, only pushdecl
if current_function_decl is non-NULL.

* gcc.dg/pr88701.c: New test.

--- gcc/c/c-decl.c.jj   2019-01-04 18:53:11.273323060 +0100
+++ gcc/c/c-decl.c  2019-01-07 13:30:23.108161124 +0100
@@ -5437,7 +5437,7 @@ build_compound_literal (location_t loc,
   pushdecl (decl);
   rest_of_decl_compilation (decl, 1, 0);
 }
-  else
+  else if (current_function_decl)
 pushdecl (decl);
 
   if (non_const)
--- gcc/testsuite/gcc.dg/pr88701.c.jj   2019-01-07 13:35:07.297528147 +0100
+++ gcc/testsuite/gcc.dg/pr88701.c  2019-01-07 13:34:42.237936686 +0100
@@ -0,0 +1,18 @@
+/* PR c/88701 */
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -pedantic-errors" } */
+
+void foo (int [(int (*)[1]) { 0 } == 0]);
+void bar (int n, int [(int (*)[n]) { 0 } == 0]);
+
+int
+baz (int a[(int (*)[1]) { 0 } == 0])
+{
+  return a[0];
+}
+
+int
+qux (int n, int a[(int (*)[n]) { 0 } == 0])
+{
+  return a[0] + n;
+}

Jakub


[PATCH] avoid ICE when pretty-printing a VLA with an error bound (PR 85956, take 2)

2019-01-07 Thread Jakub Jelinek
Hi!

On Thu, May 31, 2018 at 01:34:19PM -0400, Jason Merrill wrote:
> Returning error_mark_node from omp_copy_decl and then continuing seems
> like the problem, then.  Would it really be that hard to return an
> uninitialized variable instead?

The following patch does that, though not in omp_copy_decl itself, only in
the caller for the array bounds (the rest would still be errors).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-07  Jakub Jelinek  

PR middle-end/85956
PR lto/88733
* tree-inline.h (struct copy_body_data): Add adjust_array_error_bounds
field.
* tree-inline.c (remap_type_1): Formatting fix.  If TYPE_MAX_VALUE of
ARRAY_TYPE's TYPE_DOMAIN is newly error_mark_node, replace it with
a dummy "omp dummy var" variable if id->adjust_array_error_bounds.
* omp-low.c (new_omp_context): Set cb.adjust_array_error_bounds.
fortran/
* trans-openmp.c: Include attribs.h.
(gfc_walk_alloc_comps, gfc_omp_clause_linear_ctor): Handle
VAR_DECL max bound with "omp dummy var" attribute like NULL or
error_mark_node - recompute number of elts independently.
testsuite/
* c-c++-common/gomp/pr85956.c: New test.
* g++.dg/gomp/pr88733.C: New test.

--- gcc/tree-inline.h.jj  2019-01-07 12:37:38.644966905 +0100
+++ gcc/tree-inline.h   2019-01-07 18:03:27.478852009 +0100
@@ -122,6 +122,10 @@ struct copy_body_data
   /* True if the location information will need to be reset.  */
   bool reset_location;
 
+  /* Replace error_mark_node as upper bound of array types with
+ an uninitialized VAR_DECL temporary.  */
+  bool adjust_array_error_bounds;
+
   /* A function to be called when duplicating BLOCK nodes.  */
   void (*transform_lang_insert_block) (tree);
 
--- gcc/tree-inline.c.jj  2019-01-07 12:37:38.635967053 +0100
+++ gcc/tree-inline.c   2019-01-07 19:09:29.887727827 +0100
@@ -523,11 +523,27 @@ remap_type_1 (tree type, copy_body_data
 
   if (TYPE_MAIN_VARIANT (new_tree) != new_tree)
{
- gcc_checking_assert (TYPE_DOMAIN (type) == TYPE_DOMAIN (TYPE_MAIN_VARIANT (type)));
+ gcc_checking_assert (TYPE_DOMAIN (type)
+  == TYPE_DOMAIN (TYPE_MAIN_VARIANT (type)));
  TYPE_DOMAIN (new_tree) = TYPE_DOMAIN (TYPE_MAIN_VARIANT (new_tree));
}
   else
-   TYPE_DOMAIN (new_tree) = remap_type (TYPE_DOMAIN (new_tree), id);
+{
+ TYPE_DOMAIN (new_tree) = remap_type (TYPE_DOMAIN (new_tree), id);
+ /* For array bounds where we have decided not to copy over the bounds
+variable which isn't used in OpenMP/OpenACC region, change them to
+an uninitialized VAR_DECL temporary.  */
+ if (TYPE_MAX_VALUE (TYPE_DOMAIN (new_tree)) == error_mark_node
+ && id->adjust_array_error_bounds
+ && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
+   {
+ tree v = create_tmp_var (TREE_TYPE (TYPE_DOMAIN (new_tree)));
+ DECL_ATTRIBUTES (v)
+   = tree_cons (get_identifier ("omp dummy var"), NULL_TREE,
+DECL_ATTRIBUTES (v));
+ TYPE_MAX_VALUE (TYPE_DOMAIN (new_tree)) = v;
+   }
+}
   break;
 
 case RECORD_TYPE:
--- gcc/omp-low.c.jj  2019-01-07 12:37:38.501969255 +0100
+++ gcc/omp-low.c   2019-01-07 18:03:27.509851500 +0100
@@ -872,6 +872,7 @@ new_omp_context (gimple *stmt, omp_conte
 }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->cb.adjust_array_error_bounds = true;
 
   return ctx;
 }
--- gcc/fortran/trans-openmp.c.jj   2019-01-01 12:37:52.699391804 +0100
+++ gcc/fortran/trans-openmp.c  2019-01-07 19:17:00.295377803 +0100
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
 #include "diagnostic-core.h"
 #undef GCC_DIAG_STYLE
 #define GCC_DIAG_STYLE __gcc_gfc__
+#include "attribs.h"
 
 int ompws_flags;
 
@@ -297,10 +298,19 @@ gfc_walk_alloc_comps (tree decl, tree de
}
   else
{
+ bool compute_nelts = false;
  if (!TYPE_DOMAIN (type)
  || TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE
  || TYPE_MIN_VALUE (TYPE_DOMAIN (type)) == error_mark_node
  || TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == error_mark_node)
+   compute_nelts = true;
+ else if (VAR_P (TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
+   {
+ tree a = DECL_ATTRIBUTES (TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
+ if (lookup_attribute ("omp dummy var", a))
+   compute_nelts = true;
+   }
+ if (compute_nelts)
{
  tem = fold_build2 (EXACT_DIV_EXPR, sizetype,
 TYPE_SIZE_UNIT (type),
@@ -912,11 +922,20 @@ gfc_omp_clause_linear_ctor (tree clause,
   && (!GFC_DECL_GET_SCALAR_ALLOCATABLE (OMP_CLAUSE_DECL (clause))
  || !POINTER_TYPE_P (type)))
 {
+  bool compute_nelts = fal

Fix diagnostics for never-defined inline and nested functions (PR c/88720, PR c/88726)

2019-01-07 Thread Joseph Myers
Bugs 88720 and 88726 report issues where a function is declared inline
in an inner scope, resulting in spurious diagnostics about it being
declared but never defined when that scope is left (possibly in some
cases also wrongly referring to the function as a nested function).
These are regressions that were introduced with the support for C99
inline semantics in 4.3 (they don't appear with 4.2; it's possible
some aspects of the bugs might have been introduced later than 4.3).

For the case of functions being wrongly referred to as nested,
DECL_EXTERNAL was not the right condition for a function being
non-nested; TREE_PUBLIC is appropriate for the case of non-nested
functions with external linkage, while !b->nested means this is the
outermost scope in which the function was declared and so avoids
catching the case of a file-scope static being redeclared inline
inside a function.

For the non-nested, external-linkage case, the code attempts to avoid
duplicate diagnostics by diagnosing only when scope != external_scope,
but actually scope == external_scope is more appropriate, as it's only
when the file and external scopes are popped that the code can
actually tell whether a function ended up being defined, and all such
functions will appear in the (GCC-internal) external scope.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2019-01-07  Joseph Myers  

PR c/88720
PR c/88726
* c-decl.c (pop_scope): Use TREE_PUBLIC and b->nested to determine
whether a function is nested, not DECL_EXTERNAL.  Diagnose inline
functions declared but never defined only for external scope, not
for other scopes.

gcc/testsuite:
2019-01-07  Joseph Myers  

PR c/88720
PR c/88726
* gcc.dg/inline-40.c, gcc.dg/inline-41.c: New tests.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 267648)
+++ gcc/c/c-decl.c  (working copy)
@@ -1251,8 +1251,9 @@ pop_scope (void)
  && DECL_ABSTRACT_ORIGIN (p) != NULL_TREE
  && DECL_ABSTRACT_ORIGIN (p) != p)
TREE_ADDRESSABLE (DECL_ABSTRACT_ORIGIN (p)) = 1;
- if (!DECL_EXTERNAL (p)
+ if (!TREE_PUBLIC (p)
  && !DECL_INITIAL (p)
+ && !b->nested
  && scope != file_scope
  && scope != external_scope)
{
@@ -1268,7 +1269,7 @@ pop_scope (void)
 in the same translation unit."  */
  if (!flag_gnu89_inline
  && !lookup_attribute ("gnu_inline", DECL_ATTRIBUTES (p))
- && scope != external_scope)
+ && scope == external_scope)
pedwarn (input_location, 0,
 "inline function %q+D declared but never defined", p);
  DECL_EXTERNAL (p) = 1;
Index: gcc/testsuite/gcc.dg/inline-40.c
===
--- gcc/testsuite/gcc.dg/inline-40.c(nonexistent)
+++ gcc/testsuite/gcc.dg/inline-40.c(working copy)
@@ -0,0 +1,49 @@
+/* Test inline functions declared in inner scopes.  Bugs 88720 and 88726.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void
+inline_1 (void)
+{
+}
+
+void
+inline_2 (void)
+{
+}
+
+static void
+inline_static_1 (void)
+{
+}
+
+static void
+inline_static_2 (void)
+{
+}
+
+static void inline_static_3 (void);
+static void inline_static_4 (void);
+
+static void
+test (void)
+{
+  inline void inline_1 (void);
+  extern inline void inline_2 (void);
+  inline void inline_3 (void);
+  extern inline void inline_4 (void);
+  inline void inline_static_1 (void);
+  extern inline void inline_static_2 (void);
+  inline void inline_static_3 (void);
+  extern inline void inline_static_4 (void);
+}
+
+void
+inline_3 (void)
+{
+}
+
+void
+inline_4 (void)
+{
+}
Index: gcc/testsuite/gcc.dg/inline-41.c
===
--- gcc/testsuite/gcc.dg/inline-41.c(nonexistent)
+++ gcc/testsuite/gcc.dg/inline-41.c(working copy)
@@ -0,0 +1,49 @@
+/* Test inline functions declared in inner scopes.  Bugs 88720 and 88726.  */
+/* { dg-do compile } */
+/* { dg-options "-fgnu89-inline" } */
+
+void
+inline_1 (void)
+{
+}
+
+void
+inline_2 (void)
+{
+}
+
+static void
+inline_static_1 (void)
+{
+}
+
+static void
+inline_static_2 (void)
+{
+}
+
+static void inline_static_3 (void);
+static void inline_static_4 (void);
+
+static void
+test (void)
+{
+  inline void inline_1 (void);
+  extern inline void inline_2 (void);
+  inline void inline_3 (void);
+  extern inline void inline_4 (void);
+  inline void inline_static_1 (void);
+  extern inline void inline_static_2 (void);
+  inline void inline_static_3 (void);
+  extern inline void inline_static_4 (void);
+}
+
+void
+inline_3 (void)
+{
+}
+
+void
+inline_4 (void)
+{
+}

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] genattrtab bit-rot, and if_then_else in values

2019-01-07 Thread Alan Modra
On Mon, Jan 07, 2019 at 12:10:38PM +, Richard Sandiford wrote:
> Alan Modra  writes:
> > +/* Given an attribute value expression, return the maximum value that
> > +   might be evaluated assuming all conditionals are independent.
> > +   Return INT_MAX if the value can't be calculated by this function.  */
> 
> Not sure about "assuming all conditionals are independent".  All three
> functions should be conservatively correct without any assumptions.

True.  I'll drop that phrase.  The functions aren't even guaranteed to
give an exactly correct result if they contain any conditions.
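
To make "conservatively correct" concrete, here is a minimal sketch of an upper bound over a tiny expression tree.  It is not the genattrtab code: `Expr`, `constant`, `if_then_else`, `unknown` and `max_attr_value` are hypothetical names for this example.  The condition of an if_then_else is ignored entirely, so either branch is assumed reachable, and anything the function cannot analyse yields INT_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <memory>

// Hypothetical, simplified attribute value expression.
struct Expr {
  enum Kind { Const, IfThenElse, Unknown } kind;
  int value;                             // used by Const
  std::shared_ptr<Expr> then_e, else_e;  // used by IfThenElse
};

std::shared_ptr<Expr> constant(int v) {
  return std::make_shared<Expr>(Expr{Expr::Const, v, nullptr, nullptr});
}

std::shared_ptr<Expr> if_then_else(std::shared_ptr<Expr> t,
                                   std::shared_ptr<Expr> e) {
  return std::make_shared<Expr>(Expr{Expr::IfThenElse, 0, t, e});
}

std::shared_ptr<Expr> unknown() {
  return std::make_shared<Expr>(Expr{Expr::Unknown, 0, nullptr, nullptr});
}

// Conservative upper bound: never smaller than any value the expression
// could evaluate to, though possibly larger than the true maximum.
int max_attr_value(const Expr& e) {
  switch (e.kind) {
    case Expr::Const:
      return e.value;
    case Expr::IfThenElse:
      // The condition is not inspected; take the larger branch.
      return std::max(max_attr_value(*e.then_e), max_attr_value(*e.else_e));
    case Expr::Unknown:
    default:
      return INT_MAX;  // cannot be calculated by this function
  }
}
```

Because the condition is never evaluated, the result can overestimate when a branch is in fact unreachable, which is why the bound is conservative rather than exact.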

> OK without that part if you agree.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, testsuite] Allow builtin-has-attribute-* to run as far as possible on targets without alias support.

2019-01-07 Thread Jeff Law
On 1/7/19 9:55 AM, Iain Sandoe wrote:
> Hi Martin,
> 
> A)
> Some of the builtin-has-attribute tests fail because a sub-set of them need 
> symbol alias support.
> Darwin has only support for weak aliases and therefore we need to skip these.
> 
> However, the sub-set is small, and I am reluctant to throw out the entire set 
> for the sake of a small number, so I propose to wrap that small number in 
> #ifndef that can be enabled by targets without the necessary support (Darwin 
> is not the only one, just the most frequently tested and therefore often 
> “guilty” of finding the problem ;) )
> 
> It’s a tricky trade-off between having too many test-cases and having test 
> cases that try to cover too many circumstances...
> 
> B) [builtin-has-attribute-4.c]
> I am concerned by the diagnostics for the lack of support for the “protected” 
> mode (Darwin doesn’t have this, at least at present).
> 
> B.1 the reported diagnostic appears on the closing brace of the function, 
> rather than on the statement that triggers it (line 233).
> B.2 I think you’re perhaps missing a %< %> pair - there’s no ‘’ around 
> “protected".
> B.3. there are a bunch of other lines with the “protected” visibility 
> marking, but no diagnostic (perhaps that’s intended, I am not sure).
> 
> Addressing B is a separate issue from making the current tests pass, it might 
> not be appropriate at this stage; it’s more of a “heads-up”.
> 
> as for the test fixes, OK for trunk?
> Iain
> 
> gcc/testsuite/
> 
>   * c-c++-common/builtin-has-attribute-3.c: Skip tests requiring symbol
>   alias support.
>   * c-c++-common/builtin-has-attribute-4.c: Likewise.
>   Append match for warning that ‘protected’ attribute is not supported.
> 
> diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c 
> b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
OK
jeff


Re: libgo patch committed: set gp->m in getTraceback

2019-01-07 Thread Ian Lance Taylor
On Mon, Jan 7, 2019 at 12:12 PM Ian Lance Taylor  wrote:
>
> This libgo patch by Cherry Zhang moves setting gp->m from gtraceback to
> getTraceback.  Currently, when collecting a traceback for another
> goroutine, getTraceback calls gogo(gp) switching to gp, which will
> resume in mcall, which will call gtraceback, which will set up gp->m.
> There is a gap between setting the current running g to gp and setting
> gp->m. If a profiling signal arrives in between, sigtramp will see a
> non-nil gp with a nil m, and will seg fault.  This patch fixes this by
> setting up gp->m first.  This fixes https://golang.org/issue/29448.
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

And here is a follow up patch by Cherry Zhang to make the same fix in
doscanstackswitch.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
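
The ordering issue can be sketched in isolation.  The following is a toy model, not the libgo runtime: `G`, `M`, `current_g`, `signal_handler_is_safe` and `switch_and_scan` are hypothetical stand-ins.  It shows the fixed order, where gp borrows an m before gp becomes the current g, so a signal arriving in between never sees a non-nil g with a nil m.

```cpp
#include <cassert>

struct M {};
struct G { M* m = nullptr; };

// Hypothetical stand-in for "the g that is currently running".
G* current_g = nullptr;

// The invariant a profiling signal handler relies on:
// a non-null current g must have a non-null m.
bool signal_handler_is_safe() {
  return current_g == nullptr || current_g->m != nullptr;
}

// Fixed ordering: set gp->m before gp becomes current, restore it afterwards.
bool switch_and_scan(G* me, G* gp) {
  M* holdm = gp->m;
  gp->m = me->m;                         // set up m first
  current_g = gp;                        // only now switch to gp
  bool safe = signal_handler_is_safe();  // a signal here sees a valid m
  current_g = me;                        // switch back
  gp->m = holdm;                         // restore the original m
  return safe;
}
```

Swapping the two assignments before the check would reproduce the window described above, with the handler observing `current_g->m == nullptr`.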

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 267660)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c8a9bccbc524381d150c84907a61ac257c1b07cc
+085ef4556ec810a5a9c422e7b86d98441dc92e86
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 267658)
+++ libgo/runtime/proc.c(working copy)
@@ -482,9 +482,14 @@ void doscanstackswitch(G*, G*) __asm__(G
 void
 doscanstackswitch(G* me, G* gp)
 {
+   M* holdm;
+
__go_assert(me->entry == nil);
me->fromgogo = false;
 
+   holdm = gp->m;
+   gp->m = me->m;
+
 #ifdef USING_SPLIT_STACK
__splitstack_getcontext((void*)(&me->stackcontext[0]));
 #endif
@@ -507,6 +512,8 @@ doscanstackswitch(G* me, G* gp)
 
if (gp->scang != 0)
runtime_gogo(gp);
+
+   gp->m = holdm;
 }
 
 // Do a stack scan, then switch back to the g that triggers this scan.
@@ -515,21 +522,15 @@ static void
 gscanstack(G *gp)
 {
G *oldg, *oldcurg;
-   M* holdm;
 
oldg = (G*)gp->scang;
oldcurg = oldg->m->curg;
-   holdm = gp->m;
-   if(holdm != nil && holdm != g->m)
-   runtime_throw("gscanstack: m is not nil");
oldg->m->curg = gp;
-   gp->m = oldg->m;
gp->scang = 0;
 
doscanstack(gp, (void*)gp->scangcw);
 
gp->scangcw = 0;
-   gp->m = holdm;
oldg->m->curg = oldcurg;
runtime_gogo(oldg);
 }


Re: C++ PATCH for c++/88692 - -Wredundant-move false positive with *this

2019-01-07 Thread Jason Merrill

On 1/7/19 4:29 PM, Marek Polacek wrote:

> This patch fixes bogus -Wredundant-move warnings reported in 88692 and 87882.
>
> To quickly recap, this warning is supposed to warn for cases like
>
> struct T { };
>
> T fn(T t)
> {
>   return std::move (t);
> }
>
> where NRVO isn't applicable for T because it's a parameter, but it's
> a local variable and we're returning, so C++11 says activate move
> semantics, so the std::move is redundant.  But, as these testcases show,
> we're failing to realize that that is not the case when returning *this,
> which is disguised as an ordinary PARM_DECL, and treat_lvalue_as_rvalue_p
> was fooled by that.


Hmm, the function isn't returning 'this', it's returning '*this'.  I 
guess what's happening is that in order to pass *this to the reference 
parameter of move, we end up converting it from pointer to reference by 
NOP_EXPR, and the STRIP_NOPS in maybe_warn_pessimizing_move throws that 
away so that it then thinks we're returning 'this'.  I expect the same 
thing could happen with any parameter of pointer-to-class type.


Jason


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-07 Thread Bernd Edlinger
On 1/7/19 10:23 AM, Jakub Jelinek wrote:
> On Sun, Dec 16, 2018 at 06:13:57PM +0200, Dimitar Dimitrov wrote:
>> -  /* Clobbering the STACK POINTER register is an error.  */
>> +  /* Clobbered STACK POINTER register is not saved/restored by GCC,
>> + which is often unexpected by users.  See PR52813.  */
>>if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
>>  {
>> -  error ("Stack Pointer register clobbered by %qs in %<asm%>", regname);
>> +  warning (0, "Stack Pointer register clobbered by %qs in %<asm%>",
>> +   regname);
>> +  warning (0, "GCC has always ignored Stack Pointer %<asm%> clobbers");
> 
> Why do we write Stack Pointer rather than stack pointer?  That is really
> weird.  The second warning would be a note based on the first one, i.e.
> if (warning ()) note ();
> and better have some -W* option to silence the warning.
> 

Yes, thanks for this suggestion.

Meanwhile I found out that the stack clobber has only been ignored up to
gcc-5 (at least with lra targets; I am not really sure about reload targets).
From gcc-6 on, this works for all targets I have tried so far.  The one
exception is PR arm/77904, a regression due to the underlying lra change,
which was fixed later and back-ported to gcc-6.3.0.

To me, it starts to look like a rather unique and useful feature, that I would
like to keep working.


Attached is an updated version of my patch, using the suggested warning option,
and a note with the details.


Bootstrapped on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

2018-01-07  Bernd Edlinger  

	* doc/invoke.texi: Document -Wstack-clobber.
	* common.opt (-Wstack-clobber): New default-enabled warning.
	* cfgexpand.c (asm_clobber_reg_is_valid): Emit only a warning together
	with an informative note when the stack pointer is clobbered.

testsuite:
2018-07-01  Bernd Edlinger  

	* gcc.target/arm/pr77904.c: Adjust test.
	* gcc.target/i386/pr52813.c: Adjust test.


Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c	(revision 267653)
+++ gcc/cfgexpand.c	(working copy)
@@ -2854,6 +2854,7 @@ tree_conflicts_with_clobbers_p (tree t, HARD_REG_S
asm clobber operand.  Some HW registers cannot be
saved/restored, hence they should not be clobbered by
asm statements.  */
+
 static bool
 asm_clobber_reg_is_valid (int regno, int nregs, const char *regname)
 {
@@ -2872,11 +2873,22 @@ asm_clobber_reg_is_valid (int regno, int nregs, co
   error ("PIC register clobbered by %qs in %<asm%>", regname);
   is_valid = false;
 }
-  /* Clobbering the STACK POINTER register is an error.  */
+  /* Clobbering the STACK POINTER register is likely an error.
+ However it is useful to force the use of frame pointer and prevent
+ the use of red zone.  Thus without this clobber, pushing temporary
+ values onto the stack might clobber the red zone or make stack based
+ memory references invalid.  */
   if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
 {
-  error ("Stack Pointer register clobbered by %qs in %<asm%>", regname);
-  is_valid = false;
+  if (warning (OPT_Wstack_clobber,
+		   "stack pointer register clobbered by %qs in %<asm%>",
+		   regname))
+	inform (input_location,
+		"This does likely not do what you would expect."
+		" The stack pointer register still has to be restored to"
+		" the previous value, however it is safe to push values onto"
+		" the stack, when they are popped again from the stack"
+		" before the asm statement terminates");
 }
 
   return is_valid;
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 267653)
+++ gcc/common.opt	(working copy)
@@ -702,6 +702,10 @@ Warn when one local variable shadows another local
 Wshadow-compatible-local
 Common Warning Undocumented Alias(Wshadow=compatible-local)
 
+Wstack-clobber
+Common Warning Var(warn_stack_clobber) Init(1)
+Warn when asm statements try to clobber the stack pointer register.
+
 Wstack-protector
 Common Var(warn_stack_protect) Warning
 Warn when not issuing stack smashing protection for some reason.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 267653)
+++ gcc/doc/invoke.texi	(working copy)
@@ -339,7 +339,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wshift-count-negative  -Wshift-count-overflow  -Wshift-negative-value @gol
 -Wsign-compare  -Wsign-conversion  -Wfloat-conversion @gol
 -Wno-scalar-storage-order  -Wsizeof-pointer-div @gol
--Wsizeof-pointer-memaccess  -Wsizeof-array-argument @gol
+-Wsizeof-pointer-memaccess  -Wsizeof-array-argument  -Wstack-clobber @gol
 -Wstack-protector  -Wstack-usage=@var{byte-size}  -Wstrict-aliasing @gol
 -Wstrict-aliasing=n  -Wstrict-overflow  -Wstrict-overflow=@var{n} @gol
 -Wstringop-overflow=@var{n}  -Wstringop-truncation  -Wsubobject-linkage @gol
@@ -7560,6 +7560,20 @@ This option is only supported fo

Go patch committed: Move slice construction to callers of makeslice

2019-01-07 Thread Ian Lance Taylor
This patch to the Go frontend moves slice construction to callers of
makeslice.  This is the gccgo version of https://golang.org/cl/141822,
which was described as:

Only return a pointer p to the new slice's backing array from makeslice.
Makeslice callers then construct sliceheader{p, len, cap} explicitly
instead of makeslice returning the slice.
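
The split of responsibilities described above can be sketched roughly as follows.  This is a toy model and not the Go runtime or the gofrontend code: `SliceHeader`, `makeslice_mem` and `make_slice` are hypothetical names, and error handling is reduced to returning a null pointer where the real runtime would panic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical slice header: pointer to backing array, length, capacity.
struct SliceHeader {
  void* data;
  std::size_t len;
  std::size_t cap;
};

// Allocator side: returns only a pointer to zeroed backing storage.
void* makeslice_mem(std::size_t elem_size, std::size_t len, std::size_t cap) {
  if (len > cap || elem_size == 0)
    return nullptr;  // the real runtime would panic on a bad len/cap
  return std::calloc(cap ? cap : 1, elem_size);
}

// Caller side: builds the header explicitly from the pointer, len and cap.
SliceHeader make_slice(std::size_t elem_size, std::size_t len,
                       std::size_t cap) {
  void* p = makeslice_mem(elem_size, len, cap);
  return SliceHeader{p, len, cap};
}
```

Since the caller uses len and cap twice (once for the allocation call and once for the header), each value has to be evaluated only once, which is presumably why the patch stores non-constant arguments in temporaries first.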

This change caused the GCC backend to break the runtime/pprof test by
merging together the identical functions allocateReflectTransient and
allocateTransient2M.  This caused the traceback to be other than
expected.  Fix that by making the functions not identical.

This is a step toward updating libgo to the Go1.12beta1 release.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 267658)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c257303eaef143663216e483857d5b259e05753f
+c8a9bccbc524381d150c84907a61ac257c1b07cc
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 267580)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -1737,6 +1737,16 @@ Escape_analysis_assign::expression(Expre
   }
   break;
 
+case Expression::EXPRESSION_SLICE_VALUE:
+  {
+   // Connect the pointer field to the slice value.
+   Node* slice_node = Node::make_node(*pexpr);
+   Node* ptr_node =
+ Node::make_node((*pexpr)->slice_value_expression()->valmem());
+   this->assign(slice_node, ptr_node);
+  }
+  break;
+
 case Expression::EXPRESSION_HEAP:
   {
Node* pointer_node = Node::make_node(*pexpr);
@@ -2263,6 +2273,8 @@ Escape_analysis_assign::assign(Node* dst
  // DST = map[T]V{...}.
case Expression::EXPRESSION_STRUCT_CONSTRUCTION:
  // DST = T{...}.
+   case Expression::EXPRESSION_SLICE_VALUE:
+ // DST = slice{ptr, len, cap}
case Expression::EXPRESSION_ALLOCATION:
  // DST = new(T).
case Expression::EXPRESSION_BOUND_METHOD:
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 267580)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7787,21 +7787,29 @@ Builtin_call_expression::lower_make(Stat
   Expression* call;
   if (is_slice)
 {
+  Temporary_statement* len_temp = NULL;
+  if (!len_arg->is_constant())
+   {
+ len_temp = Statement::make_temporary(NULL, len_arg, loc);
+ inserter->insert(len_temp);
+ len_arg = Expression::make_temporary_reference(len_temp, loc);
+   }
+
   if (cap_arg == NULL)
{
   cap_small = len_small;
-  if (len_arg->numeric_constant_value(&nclen)
-  && nclen.to_unsigned_long(&vlen) == Numeric_constant::NC_UL_VALID)
-cap_arg = Expression::make_integer_ul(vlen, len_arg->type(), loc);
-  else
-{
-  Temporary_statement* temp = Statement::make_temporary(NULL,
-len_arg,
-loc);
-  inserter->insert(temp);
-  len_arg = Expression::make_temporary_reference(temp, loc);
-  cap_arg = Expression::make_temporary_reference(temp, loc);
-}
+ if (len_temp == NULL)
+   cap_arg = len_arg->copy();
+ else
+   cap_arg = Expression::make_temporary_reference(len_temp, loc);
+   }
+  else if (!cap_arg->is_constant())
+   {
+ Temporary_statement* cap_temp = Statement::make_temporary(NULL,
+   cap_arg,
+   loc);
+ inserter->insert(cap_temp);
+ cap_arg = Expression::make_temporary_reference(cap_temp, loc);
}
 
   Type* et = type->array_type()->element_type();
@@ -7809,7 +7817,12 @@ Builtin_call_expression::lower_make(Stat
   Runtime::Function code = Runtime::MAKESLICE;
   if (!len_small || !cap_small)
code = Runtime::MAKESLICE64;
-  call = Runtime::make_call(code, loc, 3, type_arg, len_arg, cap_arg);
+  Expression* mem = Runtime::make_call(code, loc, 3, type_arg, len_arg,
+  cap_arg);
+  mem = Expression::make_unsafe_cast(Type::make_pointer_type(et), mem,
+loc);
+  call = Expression::make_slice_value(type, mem, len_arg->copy(),
+ cap_arg->copy(), loc);
 }
   else if (is_map)
 {
@@ -13585,9 +13

C++ PATCH for c++/88692 - -Wredundant-move false positive with *this

2019-01-07 Thread Marek Polacek
This patch fixes bogus -Wredundant-move warnings reported in 88692 and 87882.

To quickly recap, this warning is supposed to warn for cases like

struct T { };

T fn(T t)
{
  return std::move (t);
}

where NRVO isn't applicable for T because it's a parameter, but it's
a local variable and we're returning, so C++11 says activate move
semantics, so the std::move is redundant.  But, as these testcases show,
we're failing to realize that that is not the case when returning *this,
which is disguised as an ordinary PARM_DECL, and treat_lvalue_as_rvalue_p
was fooled by that.
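
To see why the two cases differ, here is a small self-contained sketch; the `Tracker` type and `from_param` are made up for illustration and are not part of the patch or its tests.  Returning a by-value parameter by name already selects the move constructor, whereas `*this` is an lvalue, so without std::move the copy constructor is used.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how it was constructed.
struct Tracker {
  int copies = 0;
  int moves = 0;
  Tracker() = default;
  Tracker(const Tracker& o) : copies(o.copies + 1), moves(o.moves) {}
  Tracker(Tracker&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}

  // *this is an lvalue expression: the return value is copy-constructed.
  Tracker by_copy() { return *this; }
  // Not redundant: std::move is what selects the move constructor here.
  Tracker by_move() { return std::move(*this); }
};

// A by-value parameter returned by name is moved implicitly,
// so std::move (t) here would be redundant.
Tracker from_param(Tracker t) { return t; }
```

Calling `by_copy()` on a fresh object yields a result with `copies == 1`, while `by_move()` yields `copies == 0`, which is the behavioural difference the warning must not flag as redundant.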

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-01-07  Marek Polacek  

PR c++/88692, c++/87882 - -Wredundant-move false positive with *this.
* typeck.c (treat_lvalue_as_rvalue_p): Return false for 'this'.

* g++.dg/cpp0x/Wredundant-move5.C: New test.
* g++.dg/cpp0x/Wredundant-move6.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index e399cd3fe45..c6908d23a11 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9371,6 +9371,9 @@ bool
 treat_lvalue_as_rvalue_p (tree retval, bool parm_ok)
 {
   STRIP_ANY_LOCATION_WRAPPER (retval);
+  /* *this remains an lvalue expression.  */
+  if (is_this_parameter (retval))
+return false;
   return ((cxx_dialect != cxx98)
  && ((VAR_P (retval) && !DECL_HAS_VALUE_EXPR_P (retval))
  || (parm_ok && TREE_CODE (retval) == PARM_DECL))
diff --git gcc/testsuite/g++.dg/cpp0x/Wredundant-move5.C gcc/testsuite/g++.dg/cpp0x/Wredundant-move5.C
new file mode 100644
index 000..b6a3b2296a8
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/Wredundant-move5.C
@@ -0,0 +1,37 @@
+// PR c++/88692
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wredundant-move" }
+
+// Define std::move.
+namespace std {
+  template<typename _Tp>
+struct remove_reference
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+constexpr typename std::remove_reference<_Tp>::type&&
+move(_Tp&& __t) noexcept
+{ return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
+}
+
+struct X {
+X f() && {
+return std::move(*this); // { dg-bogus "redundant move in return statement" }
+}
+
+X f2() & {
+return std::move(*this); // { dg-bogus "redundant move in return statement" }
+}
+
+X f3() {
+return std::move(*this); // { dg-bogus "redundant move in return statement" }
+}
+};
diff --git gcc/testsuite/g++.dg/cpp0x/Wredundant-move6.C gcc/testsuite/g++.dg/cpp0x/Wredundant-move6.C
new file mode 100644
index 000..5808a78638e
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/Wredundant-move6.C
@@ -0,0 +1,43 @@
+// PR c++/87882
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wredundant-move" }
+
+// Define std::move.
+namespace std {
+  template<typename _Tp>
+struct remove_reference
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+struct remove_reference<_Tp&&>
+{ typedef _Tp   type; };
+
+  template<typename _Tp>
+constexpr typename std::remove_reference<_Tp>::type&&
+move(_Tp&& __t) noexcept
+{ return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
+}
+
+struct Foo {
+   Foo Bar() {
+ return std::move(*this); // { dg-bogus "redundant move in return statement" }
+   }
+   Foo Baz() {
+ return *this;
+   }
+  int i;
+};
+
+void Move(Foo & f)
+{
+  f = Foo{}.Bar();
+}
+
+void NoMove(Foo & f)
+{
+  f = Foo{}.Baz();
+}


Re: [PATCH][jit] Add thread-local globals to the libgccjit frontend

2019-01-07 Thread David Malcolm
On Sat, 2019-01-05 at 23:10 +0100, Marc Nieper-Wißkirchen wrote:
> This patch adds thread-local globals to the libgccjit frontend. The
> library user can mark a global as being thread-local by calling
> `gcc_jit_lvalue_set_bool_thread_local'. It is implemented by calling
> `set_decl_tls_model (inner, decl_default_tls_model (inner))', where
> `inner' is the GENERIC tree corresponding to the global.

Thanks for creating this patch.

Have you done the legal paperwork with the FSF for contributing to GCC?
 See https://gcc.gnu.org/contribute.html#legal

Overall, this looks pretty good (though we'd want to get release
manager approval for late-breaking changes given where we are in the
release cycle).

Various comments inline below.

> ChangeLog
> 
> 2019-01-05  Marc Nieper-Wißkirchen  
> 
> * docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_11.
> * docs/topics/expressions.rst (Global variables): Add
> documentation of gcc_jit_lvalue_set_bool_thread_local.
> * docs/_build/texinfo/libgccjit.texi: Regenerate.
> * jit-playback.c: Include "varasm.h".
> Within namespace gcc::jit::playback...
> (context::new_global): Add "thread_local_p" param and use it
> to set DECL_TLS_MODEL.
> * jit-playback.h: Within namespace gcc::jit::playback...
> (context::new_global): Add "thread_local_p" param.
> * jit-recording.c: Within namespace gcc::jit::recording...
> (global::replay_into): Provide m_thread_local to call to
> new_global.
> (global::write_reproducer): Call write_reproducer_thread_local.
> (global::write_reproducer_thread_local): New method.
> * jit-recording.h: Within namespace gcc::jit::recording...
> (lvalue::dyn_cast_global): New virtual function.
> (global::m_thread_local): New field.
> * libgccjit.c (gcc_jit_lvalue_set_bool_thread_local): New
> function.
> * libgccjit.h
> (LIBGCCJIT_HAVE_gcc_jit_lvalue_set_bool_thread_local): New
> macro.
> (gcc_jit_lvalue_set_bool_thread_local): New function.
> * libgccjit.map (LIBGCCJIT_ABI_11): New.
> (gcc_jit_lvalue_set_bool_thread_local): Add.
> 
> Testing
> 
> The result of `make check-jit' is (the failing test in
> `test-sum-squares.c` is also failing without this patch on my
> machine):
> 
> Native configuration is x86_64-pc-linux-gnu

[...]

> FAIL:  test-combination.c.exe iteration 1 of 5:
> verify_code_sum_of_squares: dump_vrp1: actual: "
> FAIL: test-combination.c.exe killed: 20233 exp6 0 0 CHILDKILLED
> SIGABRT SIGABRT
> FAIL:  test-sum-of-squares.c.exe iteration 1 of 5: verify_code:
> dump_vrp1: actual: "
> FAIL: test-sum-of-squares.c.exe killed: 22698 exp6 0 0 CHILDKILLED
> SIGABRT SIGABRT
> FAIL:  test-threads.c.exe: verify_code_sum_of_squares: dump_vrp1:
> actual: "
> FAIL: test-threads.c.exe killed: 22840 exp6 0 0 CHILDKILLED SIGABRT
> SIGABRT

That one's failing for me as well.  I'm investigating (I've filed it as
PR jit/88747).

[...]

> diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h
> b/gcc/testsuite/jit.dg/all-non-failing-tests.h
> index bf02e1258..c2654ff09 100644
> --- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
> +++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
> @@ -224,6 +224,13 @@
>  #undef create_code
>  #undef verify_code
> 
> +/* test-factorial-must-tail-call.c */

Looks like a cut&paste error: presumably the above comment is meant to
refer to the new test...

> +#define create_code create_code_thread_local
> +#define verify_code verify_code_thread_local
> +#include "test-thread-local.c"

...but it looks like the new test file is missing from the patch.

> +#undef create_code
> +#undef verify_code
> +
>  /* test-types.c */
>  #define create_code create_code_types
>  #define verify_code verify_code_types
> @@ -353,6 +360,9 @@ const struct testcase testcases[] = {
>{"switch",
> create_code_switch,
> verify_code_switch},
> +  {"thread_local",
> +   create_code_thread_local,
> +   verify_code_thread_local},
>{"types",
> create_code_types,
> verify_code_types},

[...]

Thanks
Dave


libgo patch committed: set gp->m in getTraceback

2019-01-07 Thread Ian Lance Taylor
This libgo patch by Cherry Zhang moves setting gp->m from gtraceback to
getTraceback.  Currently, when collecting a traceback for another
goroutine, getTraceback calls gogo(gp) switching to gp, which will
resume in mcall, which will call gtraceback, which will set up gp->m.
There is a gap between setting the current running g to gp and setting
gp->m. If a profiling signal arrives in between, sigtramp will see a
non-nil gp with a nil m, and will seg fault.  This patch fixes this by
setting up gp->m first.  This fixes https://golang.org/issue/29448.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.
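
The ordering issue described above can be sketched in isolation. This is only an illustrative model, not the runtime code: `M`, `G`, `current_g`, and the two helper functions are hypothetical stand-ins, and the "signal handler" is reduced to a consistency check that a non-nil current g always has a non-nil m.

```cpp
#include <cassert>

// Hypothetical stand-ins for the runtime's M and G structures.
struct M { int id; };
struct G { M* m = nullptr; };

// Stand-in for the "current goroutine" a signal handler would inspect.
G* current_g = nullptr;

// What a profiling signal handler needs: a current g implies a usable m.
bool signal_state_ok() {
  return current_g == nullptr || current_g->m != nullptr;
}

// Patched order: borrow me's m before switching, restore it afterwards.
// Returns true if the state was consistent at every step.
bool traceback_patched_order(G* me, G* gp) {
  M* holdm = gp->m;
  gp->m = me->m;                 // set up gp->m first
  bool ok = signal_state_ok();
  current_g = gp;                // models switching to gp
  ok = ok && signal_state_ok();  // a signal here sees a non-nil m
  current_g = me;                // models switching back
  gp->m = holdm;
  return ok && signal_state_ok();
}

// Old order: switch first, set gp->m later, leaving a gap in between.
bool traceback_old_order(G* me, G* gp) {
  current_g = gp;
  bool ok = signal_state_ok();   // gap: gp->m may still be nil here
  M* holdm = gp->m;
  gp->m = me->m;
  current_g = me;
  gp->m = holdm;
  return ok;
}
```

With a gp whose m starts out nil, the patched order stays consistent throughout, while the old order reports the inconsistent window.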

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 267590)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2ce291eaee427799bfcde256929dab89e0ab61eb
+c257303eaef143663216e483857d5b259e05753f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/pprof/pprof_test.go
===
--- libgo/go/runtime/pprof/pprof_test.go(revision 267580)
+++ libgo/go/runtime/pprof/pprof_test.go(working copy)
@@ -946,3 +946,38 @@ func TestAtomicLoadStore64(t *testing.T)
atomic.StoreUint64(&flag, 1)
<-done
 }
+
+func TestTracebackAll(t *testing.T) {
+   // With gccgo, if a profiling signal arrives at the wrong time
+   // during traceback, it may crash or hang. See issue #29448.
+   f, err := ioutil.TempFile("", "proftraceback")
+   if err != nil {
+   t.Fatalf("TempFile: %v", err)
+   }
+   defer os.Remove(f.Name())
+   defer f.Close()
+
+   if err := StartCPUProfile(f); err != nil {
+   t.Fatal(err)
+   }
+   defer StopCPUProfile()
+
+   ch := make(chan int)
+   defer close(ch)
+
+   count := 10
+   for i := 0; i < count; i++ {
+   go func() {
+   <-ch // block
+   }()
+   }
+
+   N := 1
+   if testing.Short() {
+   N = 500
+   }
+   buf := make([]byte, 10*1024)
+   for i := 0; i < N; i++ {
+   runtime.Stack(buf, true)
+   }
+}
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 267580)
+++ libgo/runtime/proc.c(working copy)
@@ -442,6 +442,11 @@ void getTraceback(G*, G*) __asm__(GOSYM_
 // goroutine stored in the traceback field, which is me.
 void getTraceback(G* me, G* gp)
 {
+   M* holdm;
+
+   holdm = gp->m;
+   gp->m = me->m;
+
 #ifdef USING_SPLIT_STACK
__splitstack_getcontext((void*)(&me->stackcontext[0]));
 #endif
@@ -450,6 +455,8 @@ void getTraceback(G* me, G* gp)
if (gp->traceback != 0) {
runtime_gogo(gp);
}
+
+   gp->m = holdm;
 }
 
 // Do a stack trace of gp, and then restore the context to
@@ -459,17 +466,11 @@ void
 gtraceback(G* gp)
 {
Traceback* traceback;
-   M* holdm;
 
traceback = (Traceback*)gp->traceback;
gp->traceback = 0;
-   holdm = gp->m;
-   if(holdm != nil && holdm != g->m)
-   runtime_throw("gtraceback: m is not nil");
-   gp->m = traceback->gp->m;
traceback->c = runtime_callers(1, traceback->locbuf,
sizeof traceback->locbuf / sizeof traceback->locbuf[0], false);
-   gp->m = holdm;
runtime_gogo(traceback->gp);
 }
 


Re: Add new --param knobs for inliner

2019-01-07 Thread Jan Hubicka
> On 1/5/19 11:58 AM, Jan Hubicka wrote:
> > @@ -791,7 +791,7 @@ want_inline_small_function_p (struct cgr
> >ipa_hints hints = estimate_edge_hints (e);
> >int big_speedup = -1; /* compute this lazily */
> >  
> > -  if (growth <= 0)
> > +  if (growth <= PARAM_VALUE (PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)))
> 
> Extra PARAM_VALUE here.

This was fixed by the followup commit. Indeed a stupid bug.

Honza
> 
> -Pat
> 


Re: [PATCH, rs6000] Port cleanup patch, use rtl.h convenience macros, etc.

2019-01-07 Thread Peter Bergner
On 12/12/18 1:58 PM, Peter Bergner wrote:
> Ping.

Ping * 2.

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00212.html

Peter




Re: [PATCH, testsuite] Require alias support in three tests.

2019-01-07 Thread Jeff Law
On 1/7/19 6:39 AM, Iain Sandoe wrote:
> Hi,
> 
> These three tests fail on targets without alias support,
> 
> OK to apply?
> 
> Iain
> 
> 
> gcc/testsuite/
> 
>   * gcc.dg/Wmissing-attributes.c: Require alias support.
>   * gcc.dg/attr-copy-2.c: Likewise.
>   * gcc.dg/attr-copy-5.c: Likewise.
Yes.  And ISTM that something like adding a missing dg-require to a test
ought to be able to move forward without requiring review.

jeff


Re: [PATCH 4/4][libbacktrace] Add tests for unused formats

2019-01-07 Thread Iain Sandoe


> On 3 Jan 2019, at 23:10, Ian Lance Taylor  wrote:
> 
> On Thu, Jan 3, 2019 at 3:49 AM Iain Sandoe  wrote:
>> 
>>> On 2 Jan 2019, at 13:26, Iain Sandoe  wrote:
>>> 
>>> 
>>>> On 2 Jan 2019, at 13:20, Rainer Orth  wrote:
>>>> 
>>>> Gerald Pfeifer  writes:
>>>> 
>>>>> I believe that in addition to FreeBSD this probably also fails on
>>>>> Solaris and Darwin.
>>>> 
>>>> I cannot test Darwin right now,
>>> 
>>> I have builds running on a number of versions, will take a look
>>> - is it missing a “macho_xx.c” implementation, anyway ?
>> 
>> (on darwin17 / macOS 10.13)
>> 
>> make check-target-libbacktrace
>> 
>> fails for me at trunk r267505 with:
>> 
>> /src-local/gcc-trunk/libbacktrace/elf.c:144:2: error: #error "Unknown BACKTRACE_ELF_SIZE"
>>  144 | #error "Unknown BACKTRACE_ELF_SIZE"
>>  |  ^
>> 
>> So, it looks like there’s some configury-fixing/implementation work needed
>> for Darwin.
> 
> Someone needs to write a macho.c that roughly corresponds to the
> existing elf.c, pecoff.c, and xcoff.c.

Filed PR88745 for this.
Iain



Re: C++ PATCH for c++/88741 - wrong error with initializer-string

2019-01-07 Thread Jason Merrill

On 1/7/19 2:09 PM, Marek Polacek wrote:

> This fixes c++/88741, a bogus error with using [] in a template.
> Starting with the "more location wrapper nodes" patch, cp_complete_array_type
> can now receive a V_C_E in a CONSTRUCTOR, instead of just {"test"}, so we need
> to strip any location wrappers as elsewhere for this code to work.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.

Jason



Re: [C++ Patch] Fix four more locations

2019-01-07 Thread Jason Merrill

OK.

Jason


[nvptx, committed] Force vl32 if calling vector-partitionable routines

2019-01-07 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch

> @@ -73,6 +73,7 @@
>  #include "cfgloop.h"
>  #include "fold-const.h"
>  #include "intl.h"
> +#include "tree-hash-traits.h"
>  #include "omp-offload.h"
>  
>  /* This file should be included last.  */

I dropped that include; it's not necessary.

> @@ -5557,19 +5637,6 @@ nvptx_adjust_parallelism (unsigned inner_mask, 
> unsigned outer_mask)
>if (wv)
>  return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
>  
> -  /* It's difficult to guarantee that warps in large vector_lengths
> - will remain convergent when a vector loop is nested inside a
> - worker loop.  Therefore, fallback to setting vector_length to
> - PTX_WARP_SIZE.  Hopefully this condition may be relaxed for
> - sm_70+ targets.  */
> -  if ((inner_mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
> -  && (outer_mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
> -{
> -  tree attr = tree_cons (get_identifier (NVPTX_GOACC_VL_WARP), NULL_TREE,
> - DECL_ATTRIBUTES (current_function_decl));
> -  DECL_ATTRIBUTES (current_function_decl) = attr;
> -}
> -
>return inner_mask;
>  }
>  

This patch removes code belonging to a workaround that was added earlier
in the patch series (0017-nvptx-Enable-large-vectors.patch), which means
that patch should not have contained the code in the first place.

Committed (without test-cases) as attached.

Thanks,
- Tom
[nvptx] Force vl32 if calling vector-partitionable routines

With PTX_MAX_VECTOR_LENGTH set to larger than PTX_WARP_SIZE, routines can be
called from offloading regions with vector-size set to larger than warp size.
OTOH, vector-partitionable routines assume warp-sized vector length.

Detect if we're calling a vector-partitionable routine from an offloading
region, and if so, fall back to warp-sized vector length in that region.
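
The decision described above can be sketched outside the compiler. This is a simplified model under stated assumptions: `Call`, `calls_partitionable_routine`, and `validate_vector_length` are hypothetical names, the constants mirror PTX_WARP_SIZE and GOMP_DIM_MAX, and callees without an OpenACC routine attribute are not modelled at all.

```cpp
#include <cassert>
#include <vector>

// Assumed constants for this sketch, mirroring names used in the patch.
constexpr int kWarpSize = 32;  // PTX_WARP_SIZE
constexpr int kDimMax = 3;     // GOMP_DIM_MAX: the level of a "seq" routine

// Hypothetical model of a call: the partitioning level of the callee routine.
struct Call { int partition_level; };

// True if any callee is partitionable, i.e. its level is not the "seq" level.
bool calls_partitionable_routine(const std::vector<Call>& calls) {
  for (const Call& c : calls)
    if (c.partition_level != kDimMax)
      return true;
  return false;
}

// Fall back to a warp-sized vector length when such a call is present.
int validate_vector_length(int requested, const std::vector<Call>& calls) {
  if (requested > kWarpSize && calls_partitionable_routine(calls))
    return kWarpSize;
  return requested;
}
```

For example, a region requesting a vector length of 128 that calls a routine at level 2 would end up with 32 in this model, while a region that only calls seq routines keeps 128.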

2018-12-17  Tom de Vries  

	PR target/85486
	* config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): New
	function.
	(nvptx_goacc_validate_dims): Force vl32 if calling vector-partitionable
	routines.

---
 gcc/config/nvptx/nvptx.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5a4b38de522..7fdc285b6f8 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -59,6 +59,7 @@
 #include "builtins.h"
 #include "omp-general.h"
 #include "omp-low.h"
+#include "omp-offload.h"
 #include "gomp-constants.h"
 #include "dumpfile.h"
 #include "internal-fn.h"
@@ -5496,6 +5497,40 @@ nvptx_apply_dim_limits (int dims[])
 dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE;
 }
 
+/* Return true if FNDECL contains calls to vector-partitionable routines.  */
+
+static bool
+has_vector_partitionable_routine_calls_p (tree fndecl)
+{
+  if (!fndecl)
+return false;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (fndecl))
+for (gimple_stmt_iterator i = gsi_start_bb (bb); !gsi_end_p (i);
+	 gsi_next_nondebug (&i))
+  {
+	gimple *stmt = gsi_stmt (i);
+	if (gimple_code (stmt) != GIMPLE_CALL)
+	  continue;
+
+	tree callee = gimple_call_fndecl (stmt);
+	if (!callee)
+	  continue;
+
+	tree attrs  = oacc_get_fn_attrib (callee);
+	if (attrs == NULL_TREE)
+	  return false;
+
+	int partition_level = oacc_fn_attrib_level (attrs);
+	bool seq_routine_p = partition_level == GOMP_DIM_MAX;
+	if (!seq_routine_p)
+	  return true;
+  }
+
+  return false;
+}
+
 /* As nvptx_goacc_validate_dims, but does not return bool to indicate whether
DIMS has changed.  */
 
@@ -5611,6 +5646,16 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int fn_level)
 old_dims[i] = dims[i];
 
   const char *vector_reason = NULL;
+  if (offload_region_p && has_vector_partitionable_routine_calls_p (decl))
+{
+  if (dims[GOMP_DIM_VECTOR] > PTX_WARP_SIZE)
+	{
+	  vector_reason = G_("using vector_length (%d) due to call to"
+			 " vector-partitionable routine, ignoring %d");
+	  dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE;
+	}
+}
+
   if (dims[GOMP_DIM_VECTOR] == 0)
 {
   vector_reason = G_("using vector_length (%d), ignoring runtime setting");


C++ PATCH for c++/88741 - wrong error with initializer-string

2019-01-07 Thread Marek Polacek
This fixes c++/88741, a bogus error with using [] in a template.
Starting with the "more location wrapper nodes" patch, cp_complete_array_type
can now receive a V_C_E in a CONSTRUCTOR, instead of just {"test"}, so we need
to strip any location wrappers as elsewhere for this code to work.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-01-07  Marek Polacek  

PR c++/88741 - wrong error with initializer-string.
* decl.c (cp_complete_array_type): Strip any location wrappers.

* g++.dg/init/array50.C: New test.

diff --git gcc/cp/decl.c gcc/cp/decl.c
index 15bc4887a59..1fc7a1acf56 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -8433,6 +8433,7 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default)
{
  vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (initial_value);
  tree value = (*v)[0].value;
+ STRIP_ANY_LOCATION_WRAPPER (value);
 
  if (TREE_CODE (value) == STRING_CST
  && v->length () == 1)
diff --git gcc/testsuite/g++.dg/init/array50.C gcc/testsuite/g++.dg/init/array50.C
new file mode 100644
index 000..a5c129d0eb5
--- /dev/null
+++ gcc/testsuite/g++.dg/init/array50.C
@@ -0,0 +1,12 @@
+// PR c++/88741
+
+template 
+void foo()
+{
+  char row[] = {"test"};
+}
+  
+void bar()
+{
+  foo();
+} 


Re: [committed][nvptx] Allow larger PTX_MAX_VECTOR_LENGTH in nvptx_goacc_validate_dims_1

2019-01-07 Thread Tom de Vries
On 07-01-19 09:56, Tom de Vries wrote:
> +  /* Check that the vector_length is not too large.  */
> +  if (dims[GOMP_DIM_VECTOR] > PTX_MAX_VECTOR_LENGTH)
> +dims[GOMP_DIM_VECTOR] = PTX_MAX_VECTOR_LENGTH;

And just to note this:

I've chosen a different solution here than og8, which falls back to
PTX_WARP_SIZE here.

I figured that, given that when too many workers are specified we fall
back to the maximum number of workers, we should do the same for
vector_length.

Thanks,
- Tom
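
A minimal sketch of the two fallback policies being compared, with hypothetical function names and with the limits passed in as parameters rather than taken from the nvptx port:

```cpp
#include <cassert>

// Policy chosen here: clamp an oversized request to the maximum.
int clamp_vector_length(int requested, int max_vector_length) {
  return requested > max_vector_length ? max_vector_length : requested;
}

// Policy the message attributes to og8: fall back to the warp size instead.
int fallback_to_warp(int requested, int max_vector_length, int warp_size) {
  return requested > max_vector_length ? warp_size : requested;
}
```

The two only differ for requests above the maximum; anything within range is returned unchanged by both.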



Re: [PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key

2019-01-07 Thread James Greenhalgh
On Fri, Dec 21, 2018 at 09:00:10AM -0600, Sam Tebbs wrote:
> On 11/9/18 11:04 AM, Sam Tebbs wrote:
 


> Attached is an improved patch with "hint" removed from the test scans, 
> pauth_hint_num_a and pauth_hint_num_b merged into pauth_hint_num and the 
> "gcc_assert (cfun->machine->frame.laid_out)" removal reverted since it was 
> an unnecessary change.
> 
> OK for trunk?

While the AArch64 parts look OK to me and are buried behind an option so are
relatively safe even though we're late in development, you'll need someone
else to approve the libgcc changes. Especially as you change a generic
routine with an undocumented (?) AArch64-specific change.

Thanks,
James

> 
> gcc/
> 2018-12-21  Sam Tebbs
> 
>   * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
>   AARCH64_PAUTH_BUILTIN_AUTIB1716 and AARCH64_PAUTH_BUILTIN_PACIB1716.
>   * config/aarch64/aarch64-builtins.c (aarch64_init_pauth_hint_builtins):
>   Add autib1716 and pacib1716 initialisation.
>   * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Add checks
>   for autib1716 and pacib1716.
>   * config/aarch64/aarch64-protos.h (aarch64_key_type,
>   aarch64_post_cfi_startproc): Define.
>   * config/aarch64/aarch64-protos.h (aarch64_ra_sign_key): Define extern.
>   * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): Add
>   check for b-key.
>   * config/aarch64/aarch64.c (aarch64_ra_sign_key,
>   aarch64_post_cfi_startproc, aarch64_handle_pac_ret_b_key): Define.
>   * config/aarch64/aarch64.h (TARGET_ASM_POST_CFI_STARTPROC): Define.
>   * config/aarch64/aarch64.c (aarch64_pac_ret_subtypes): Add "b-key".
>   * config/aarch64/aarch64.md (unspec): Add UNSPEC_AUTIA1716,
>   UNSPEC_AUTIB1716, UNSPEC_AUTIASP, UNSPEC_AUTIBSP, UNSPEC_PACIA1716,
>   UNSPEC_PACIB1716, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>   * config/aarch64/aarch64.md (do_return): Add check for b-key.
>   * config/aarch64/aarch64.md (sp): Replace
>   pauth_hint_num_a with pauth_hint_num.
>   * config/aarch64/aarch64.md (1716): Replace
>   pauth_hint_num_a with pauth_hint_num.
>   * config/aarch64/aarch64.opt (msign-return-address=): Deprecate.
>   * config/aarch64/iterators.md (PAUTH_LR_SP): Add UNSPEC_AUTIASP,
>   UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>   * config/aarch64/iterators.md (PAUTH_17_16): Add UNSPEC_AUTIA1716,
>   UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716.
>   * config/aarch64/iterators.md (pauth_mnem_prefix): Add UNSPEC_AUTIA1716,
>   UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716, UNSPEC_AUTIASP,
>   UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>   * config/aarch64/iterators.md (pauth_hint_num_a): Replace
>   UNSPEC_PACI1716 and UNSPEC_AUTI1716 with UNSPEC_PACIA1716 and
>   UNSPEC_AUTIA1716 respectively.
>   * config/aarch64/iterators.md (pauth_hint_num_a): Rename to pauth_hint_num
>   and add UNSPEC_PACIBSP, UNSPEC_AUTIBSP, UNSPEC_PACIB1716, UNSPEC_AUTIB1716.
> 
> gcc/testsuite
> 2018-12-21  Sam Tebbs
> 
>   * gcc.target/aarch64/return_address_sign_1.c (dg-final): Replace
>   "autiasp" and "paciasp" with "hint\t29 // autisp" and
>   "hint\t25 // pacisp" respectively.
>   * gcc.target/aarch64/return_address_sign_2.c (dg-final): Replace
>   "paciasp" with "hint\t25 // pacisp".
>   * gcc.target/aarch64/return_address_sign_3.c (dg-final): Replace
>   "paciasp" and "autiasp" with "pacisp" and "autisp" respectively.
>   * gcc.target/aarch64/return_address_sign_b_1.c: New file.
>   * gcc.target/aarch64/return_address_sign_b_2.c: New file.
>   * gcc.target/aarch64/return_address_sign_b_3.c: New file.
>   * gcc.target/aarch64/return_address_sign_b_exception.c: New file.
>   * gcc.target/aarch64/return_address_sign_builtin.c: New file
> 
> libgcc/
> 2018-12-21  Sam Tebbs
> 
>   * config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key): New
>   function.
>   * config/aarch64/aarch64-unwind.h (aarch64_post_extract_frame_addr,
>   aarch64_post_frob_eh_handler_addr): Add check for b-key.
>   * unwind-dw2-fde.c (get_cie_encoding): Add check for 'B' in augmentation
>   string.
>   * unwind-dw2.c (extract_cie_info): Add check for 'B' in augmentation
>   string.
> 


Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2019-01-07 Thread James Greenhalgh
On Thu, Dec 20, 2018 at 10:38:42AM -0600, Sam Tebbs wrote:
> On 11/22/18 4:54 PM, Sam Tebbs wrote:



> 
> Hi all,
> 
> Attached is an updated patch with branch_protec_type renamed to 
> branch_protect_type, some unneeded ATTRIBUTE_USED removed and an added 
> use of ARRAY_SIZE.
> 
> Below is the updated changelog.
> 
> OK for trunk? I have committed the preceding patch in the series.


OK. Please get this in soon as we really want to be closing down for Stage 4
(and fix a few bugs in return :-) ).

Thanks,
James

> 
> gcc/ChangeLog:
> 
> 2018-12-20  Sam Tebbs
> 
>   * config/aarch64/aarch64.c (BRANCH_PROTECT_STR_MAX,
>   aarch64_parse_branch_protection,
>   struct aarch64_branch_protect_type,
>   aarch64_handle_no_branch_protection,
>   aarch64_handle_standard_branch_protection,
>   aarch64_validate_mbranch_protection,
>   aarch64_handle_pac_ret_protection,
>   aarch64_handle_attr_branch_protection,
>   accepted_branch_protection_string,
>   aarch64_pac_ret_subtypes,
>   aarch64_branch_protect_types,
>   aarch64_handle_pac_ret_leaf): Define.
>   (aarch64_override_options_after_change_1): Add check for
>   accepted_branch_protection_string.
>   (aarch64_override_options): Add check for
>   accepted_branch_protection_string.
>   (aarch64_option_save): Save accepted_branch_protection_string.
>   (aarch64_option_restore): Save
>   accepted_branch_protection_string.
>   * config/aarch64/aarch64.c (aarch64_attributes): Add branch-protection.
>   * config/aarch64/aarch64.opt: Add mbranch-protection. Deprecate
>   msign-return-address.
>   * doc/invoke.texi: Add mbranch-protection.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-12-20  Sam Tebbs
> 
>   * (gcc.target/aarch64/return_address_sign_1.c,
>   gcc.target/aarch64/return_address_sign_2.c,
>   gcc.target/aarch64/return_address_sign_3.c (__attribute__)): Change
>   option to -mbranch-protection.
>   * gcc.target/aarch64/(branch-protection-option.c,
>   branch-protection-option-2.c, branch-protection-attr.c,
>   branch-protection-attr-2.c): New file.
> 



Re: Add new --param knobs for inliner

2019-01-07 Thread Pat Haugen
On 1/5/19 11:58 AM, Jan Hubicka wrote:
> @@ -791,7 +791,7 @@ want_inline_small_function_p (struct cgr
>ipa_hints hints = estimate_edge_hints (e);
>int big_speedup = -1; /* compute this lazily */
>  
> -  if (growth <= 0)
> +  if (growth <= PARAM_VALUE (PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)))

Extra PARAM_VALUE here.

-Pat



[PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

2019-01-07 Thread H.J. Lu
There is no need to generate vzeroupper if the caller uses upper bits of
AVX/AVX512 registers.  We track the caller's avx_u128_state and avoid
vzeroupper when the caller's avx_u128_state is AVX_U128_DIRTY.

Tested on i686 and x86-64 with and without --with-arch=native.

OK for trunk?

Thanks.

H.J.
---
gcc/

PR target/88717
* config/i386/i386.c (ix86_avx_u128_mode_entry): Set
caller_avx_u128_dirty to true when caller is AVX_U128_DIRTY.
(ix86_avx_u128_mode_exit): Set exit mode to AVX_U128_DIRTY if
caller is AVX_U128_DIRTY.
* config/i386/i386.h (machine_function): Add
caller_avx_u128_dirty.

gcc/testsuite/

PR target/88717
* gcc.target/i386/pr88717.c: New test.
---
 gcc/config/i386/i386.c  | 10 +-
 gcc/config/i386/i386.h  |  3 +++
 gcc/testsuite/gcc.target/i386/pr88717.c | 24 
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..9b49a2c1d9c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19100,7 +19100,11 @@ ix86_avx_u128_mode_entry (void)
   rtx incoming = DECL_INCOMING_RTL (arg);
 
   if (incoming && ix86_check_avx_upper_register (incoming))
-   return AVX_U128_DIRTY;
+   {
+ /* Caller is AVX_U128_DIRTY.  */
+ cfun->machine->caller_avx_u128_dirty = true;
+ return AVX_U128_DIRTY;
+   }
 }
 
   return AVX_U128_CLEAN;
@@ -19130,6 +19134,10 @@ ix86_mode_entry (int entity)
 static int
 ix86_avx_u128_mode_exit (void)
 {
+  /* Exit mode is set to AVX_U128_DIRTY if caller is AVX_U128_DIRTY.  */
+  if (cfun->machine->caller_avx_u128_dirty)
+return AVX_U128_DIRTY;
+
   rtx reg = crtl->return_rtx;
 
   /* Exit mode is set to AVX_U128_DIRTY if there are 256bit
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5..c053b657a55 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2747,6 +2747,9 @@ struct GTY(()) machine_function {
   /* If true, ENDBR is queued at function entrance.  */
   BOOL_BITFIELD endbr_queued_at_entrance : 1;
 
+  /* If true, caller is AVX_U128_DIRTY.  */
+  BOOL_BITFIELD caller_avx_u128_dirty : 1;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c
new file mode 100644
index 000..01680998f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88717.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
+
+#include <immintrin.h>
+
+__m128
+foo1 (__m256 x)
+{
+  return _mm256_castps256_ps128 (x);
+}
+
+void
+foo2 (float *p, __m256 x)
+{
+  *p = ((__v8sf)x)[0];
+}
+
+void
+foo3 (float *p, __m512 x)
+{
+  *p = ((__v16sf)x)[0];
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.20.1
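
The idea in the patch above can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the i386 mode-switching code: `FunctionInfo`, `exit_state`, and `needs_vzeroupper_at_exit` are hypothetical names, and the model assumes vzeroupper is only wanted when the state before the exit is dirty but the required exit state is clean.

```cpp
#include <cassert>

enum class Avx128State { Clean, Dirty };

// Hypothetical per-function summary used only by this sketch.
struct FunctionInfo {
  bool caller_dirty;        // an incoming argument uses upper register bits
  bool returns_wide_value;  // the return value uses upper register bits
};

// Required state at function exit: dirty if the caller was already dirty.
Avx128State exit_state(const FunctionInfo& f) {
  if (f.caller_dirty)
    return Avx128State::Dirty;
  return f.returns_wide_value ? Avx128State::Dirty : Avx128State::Clean;
}

// In this model, vzeroupper is needed only for a dirty-to-clean transition.
bool needs_vzeroupper_at_exit(const FunctionInfo& f, Avx128State before_exit) {
  return before_exit == Avx128State::Dirty
         && exit_state(f) == Avx128State::Clean;
}
```

A function like foo2 in the test, which receives a 256-bit argument and returns nothing wide, corresponds to `caller_dirty == true` here, so no vzeroupper is requested at exit.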



Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-07 Thread Richard Sandiford
Steve Ellcey  writes:
> On Thu, 2018-12-06 at 12:25 +, Richard Sandiford wrote:
>> 
>> Since we're looking at the call insns anyway, we could have a hook that
>> "jousts" two calls and picks the one that preserves *fewer* registers.
>> This would mean that loop produces a single instruction that conservatively
>> describes the call-preserved registers.  We could then stash that
>> instruction in lra_reg instead of the current check_part_clobbered
>> boolean.
>> 
>> The hook should by default be a null pointer, so that we can avoid
>> the instruction walk on targets that don't need it.
>> 
>> That would mean that LRA would always have a call instruction to hand
>> when asking about call-preserved information.  So I think we should
>> add an insn parameter to targetm.hard_regno_call_part_clobbered,
>> with a null insn selecting the defaul behaviour.   I know it's
>> going to be a pain to update all callers and targets, sorry.
>
> Richard,  here is an updated version of this patch.  It is not
> completely tested yet but I wanted to send this out and make
> sure it is what you had in mind and see if you had any comments about
> the new target function while I am testing it (including building
> some of the other targets).

Yeah, this was the kind of thing I had in mind, thanks.
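
For reference, the "jousting" idea quoted above can be sketched as a fold over the calls seen in a loop. The types and names here (`CallInsn`, `more_clobbering`, `summarize_calls`) are hypothetical, and the model assumes only two kinds of call, where a SIMD-ABI call preserves more register bits than a normal one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a call instruction for this sketch.
struct CallInsn {
  int id;
  bool simd_abi;  // true if the callee follows the SIMD ABI
};

// "Joust" two calls and keep the one that preserves fewer registers.
// In this model a non-SIMD call preserves fewer, so it wins over a SIMD call.
const CallInsn* more_clobbering(const CallInsn* call_1, const CallInsn* call_2) {
  assert(call_1 != nullptr);
  if (call_2 == nullptr || call_2->simd_abi)
    return call_1;
  return call_2;
}

// Reduce all calls to a single one that conservatively describes them.
const CallInsn* summarize_calls(const std::vector<CallInsn>& calls) {
  const CallInsn* worst = nullptr;
  for (const CallInsn& c : calls)
    worst = more_clobbering(&c, worst);
  return worst;  // null when there are no calls at all
}
```

The null result for an empty list corresponds to the "null insn selects the default behaviour" convention mentioned in the quoted text.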

>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
> the lower 64 bits of a 128-bit register.  Tell the compiler the callee
> clobbers the top 64 bits when restoring the bottom 64 bits.  */
>  
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
> + unsigned int regno,
> + machine_mode mode)
>  {
> +  if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
> +return false;
>return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);

This should be choosing between 8 and 16 for the maybe_gt, since
even SIMD functions clobber bits 128 and above for SVE.
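
A sketch of what that suggestion amounts to, assuming fixed byte sizes so that a plain comparison can stand in for maybe_gt; `part_clobbered` and its parameters are hypothetical names for illustration only.

```cpp
#include <cassert>

// A normal call preserves the low 8 bytes of an FP register in this model,
// a SIMD-ABI call preserves the low 16 bytes; anything wider is clobbered.
bool part_clobbered(bool is_fp_reg, unsigned mode_size_bytes, bool simd_call) {
  unsigned preserved_bytes = simd_call ? 16 : 8;
  return is_fp_reg && mode_size_bytes > preserved_bytes;
}
```

So a 16-byte value survives a SIMD call but not a normal call, and a 32-byte value is partially clobbered by both.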

> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> +  gcc_assert (CALL_P (call_1));
> +  if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
> +return call_1;
> +  else
> +return call_2;

Nit: redundant parens in "(call_2 == NULL_RTX)".

> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> index 023308b..2cf993d 100644
> --- a/gcc/config/avr/avr.c
> +++ b/gcc/config/avr/avr.c
> @@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
>  
>  static bool
> -avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
> +avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> + unsigned regno,
> + machine_mode mode)
>  {

Also very minor, sorry, but: I think it's usual to put the parameters
on the same line when they fit.  Same for the other hooks.

> @@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to nearby locations.
>  
>  The hook returns true if it succeeds, storing the offset of the
>  anchor from the base in @var{offset1} and the offset of the final address
> -from the anchor in @var{offset2}.  The default implementation returns false.
> +from the anchor in @var{offset2}.  ehe defnult implementation returns false.
>  @end deftypefn

Stray change here.

> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 7ffcd35..31a567a 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
>  static inline bool
>  need_for_call_save_p (int regno)
>  {
> +  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
> +  int new_regno = reg_renumber[regno];
> +
>lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
> -  return (usage_insns[regno].calls_num < calls_num
> -   && (overlaps_hard_reg_set_p
> -   ((flag_ipa_ra &&
> - ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> -? lra_reg_info[regno].actual_call_used_reg_set
> -: call_used_reg_set,
> -PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
> -   || (targetm.hard_regno_call_part_clobbered
> -   (reg_renumber[regno], PSEUDO_REGNO_MODE (regno);
> +
> +  if (usage_insns[regno].calls_num >= calls_num)
> +return false;
> +
> +  if (flag_ipa_ra
> +  && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> +return (overlaps_hard_reg_set_p
> + (lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
> + || targetm.hard_regno_call_part_clobbered
> + (lra_reg_info[regno].call_insn, new_regno, pm

Re: Remove overall growth from badness metrics

2019-01-07 Thread Qing Zhao


> On Jan 6, 2019, at 5:52 PM, Jan Hubicka  wrote:
> 
> Hi,
> this patch removes overall growth from badness metrics.  This code was
> added at a time inliner was purely function size based to give a hint
> that inlining more different functions into all callers is better than
> inlining one called very many times.
> 
> With profile we now have more fine grained information and in all
> tests I did this heuristics seems to be counter-productive now and
> harmful especially on large units where growth estimate gets out of
> date.
> 
> I plan to commit the patch tomorrow after re-testing everything after
> the bugfixes from today and yesterday.  In addition to this have found
> that current inline-unit-growth is too small for LTO of large programs
> (especially Firefox:) and there are important improvements when
> increased from 20 to 30 or 40.  I am re-running C++ benchmarks and other
> tests to decide about precise setting.  Finally I plan to increase
> the new parameters for bit more inlining at -O2 and -Os.

Increasing these parameters might increase the compilation time and the
final code size.  Do you have any data on the compilation time and code
size impact of these parameter changes?

thanks.

Qing
> 
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow.
> 
>   * ipa-inline.c (edge_badness): Do not account overall_growth into
>   badness metrics.
> Index: ipa-inline.c
> ===
> --- ipa-inline.c  (revision 267612)
> +++ ipa-inline.c  (working copy)
> @@ -1082,8 +1082,8 @@ edge_badness (struct cgraph_edge *edge,
>   /* When profile is available. Compute badness as:
> 
>  time_saved * caller_count
> - goodness =  -------------------------------------------------
> -  growth_of_caller * overall_growth * combined_size
> + goodness =  --------------------------------
> +  growth_of_caller * combined_size
> 
>  badness = - goodness
> 
> @@ -1094,7 +1094,6 @@ edge_badness (struct cgraph_edge *edge,
>  || caller->count.ipa ().nonzero_p ())
> {
>   sreal numerator, denominator;
> -  int overall_growth;
>   sreal inlined_time = compute_inlined_call_time (edge, edge_time);
> 
>   numerator = (compute_uninlined_call_time (edge, unspec_edge_time)
> @@ -1106,73 +1105,6 @@ edge_badness (struct cgraph_edge *edge,
>   else if (caller->count.ipa ().initialized_p ())
>   numerator = numerator >> 11;
>   denominator = growth;
> -
> -  overall_growth = callee_info->growth;
> -
> -  /* Look for inliner wrappers of the form:
> -
> -  inline_caller ()
> -{
> -  do_fast_job...
> -  if (need_more_work)
> -noninline_callee ();
> -}
> -  Withhout panilizing this case, we usually inline noninline_callee
> -  into the inline_caller because overall_growth is small preventing
> -  further inlining of inline_caller.
> -
> -  Penalize only callgraph edges to functions with small overall
> -  growth ...
> - */
> -  if (growth > overall_growth
> -   /* ... and having only one caller which is not inlined ... */
> -   && callee_info->single_caller
> -   && !edge->caller->global.inlined_to
> -   /* ... and edges executed only conditionally ... */
> -   && edge->sreal_frequency () < 1
> -   /* ... consider case where callee is not inline but caller is ... */
> -   && ((!DECL_DECLARED_INLINE_P (edge->callee->decl)
> -&& DECL_DECLARED_INLINE_P (caller->decl))
> -   /* ... or when early optimizers decided to split and edge
> -  frequency still indicates splitting is a win ... */
> -   || (callee->split_part && !caller->split_part
> -   && edge->sreal_frequency () * 100
> -  < PARAM_VALUE
> -   (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY)
> -   /* ... and do not overwrite user specified hints.   */
> -   && (!DECL_DECLARED_INLINE_P (edge->callee->decl)
> -   || DECL_DECLARED_INLINE_P (caller->decl)
> - {
> -   ipa_fn_summary *caller_info = ipa_fn_summaries->get (caller);
> -   int caller_growth = caller_info->growth;
> -
> -   /* Only apply the penalty when caller looks like inline candidate,
> -  and it is not called once and.  */
> -   if (!caller_info->single_caller && overall_growth < caller_growth
> -   && caller_info->inlinable
> -   && caller_info->size
> -  < (DECL_DECLARED_INLINE_P (caller->decl)
> - ? MAX_INLINE_INSNS_SINGLE : MAX_INLINE_INSNS_AUTO))
> - {
> -   if (dump)
> - fprintf (dump_file,
> -  " Wrapper penalty. Increasing growth %i to %i\n",
> -  overall_growth, caller_growth);
> -   overall_growth = caller_growth;
> - }
> - }
> -  if (ove

[PATCH] soft-fp: Update _FP_W_TYPE_SIZE check from glibc

2019-01-07 Thread H.J. Lu
OK for trunk?

Thanks.


H.J.
---
Update soft-fp from glibc with:

commit 69da3c9e87e0a692e79db0615a53782e4198dbf0
Author: H.J. Lu 
Date:   Mon Jan 7 09:04:29 2019 -0800

quad.h has

 #if _FP_W_TYPE_SIZE < 64

union _FP_UNION_Q
{
  Use 4 _FP_W_TYPEs
}

 #else

union _FP_UNION_Q
{
  Use 2 _FP_W_TYPEs
}

 #endif

Replace

 #if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q

with

 #if _FP_W_TYPE_SIZE < 64

to check whether 4 or 2 _FP_W_TYPEs are used for IEEE quad precision.
Tested with build-many-glibcs.py.
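
For illustration only (not part of the commit): a minimal sketch of why the
old and new conditions pick the same branch for 32-bit and 64-bit word sizes.
It assumes _FP_FRACBITS_Q is 113 (the IEEE binary128 fraction width); the
constant and function names are hypothetical and merely model the
preprocessor tests as functions.

```cpp
// Sketch only: models the two preprocessor conditions as constexpr functions.
// Assumption: kFracBitsQ mirrors _FP_FRACBITS_Q (113 for IEEE binary128).
constexpr int kFracBitsQ = 113;

// Old condition: (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
constexpr bool old_check_uses_four_words(int w_type_size) {
  return 2 * w_type_size < kFracBitsQ;
}

// New condition: _FP_W_TYPE_SIZE < 64, the same test quad.h itself uses.
constexpr bool new_check_uses_four_words(int w_type_size) {
  return w_type_size < 64;
}

// Number of _FP_W_TYPE words needed to hold the 128-bit quad value.
constexpr int quad_words(int w_type_size) {
  return new_check_uses_four_words(w_type_size) ? 4 : 2;
}

// Both conditions agree for the word sizes discussed here.
static_assert(old_check_uses_four_words(32) == new_check_uses_four_words(32),
              "32-bit words: both checks select the 4-word layout");
static_assert(old_check_uses_four_words(64) == new_check_uses_four_words(64),
              "64-bit words: both checks select the 2-word layout");
static_assert(quad_words(32) * 32 == 128 && quad_words(64) * 64 == 128,
              "either layout covers the full 128 bits");
```

The new form has the advantage of matching the condition in quad.h
textually, so the choice of FP_EXTEND/FP_TRUNC word counts cannot drift from
the union layout.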

* soft-fp/extenddftf2.c: Use "_FP_W_TYPE_SIZE < 64" to check if
4 _FP_W_TYPEs are used for IEEE quad precision.
* soft-fp/extendhftf2.c: Likewise.
* soft-fp/extendsftf2.c: Likewise.
* soft-fp/extendxftf2.c: Likewise.
* soft-fp/trunctfdf2.c: Likewise.
* soft-fp/trunctfhf2.c: Likewise.
* soft-fp/trunctfsf2.c: Likewise.
* soft-fp/trunctfxf2.c: Likewise.
* config/rs6000/ibm-ldouble.c: Likewise.
---
 libgcc/config/rs6000/ibm-ldouble.c | 4 ++--
 libgcc/soft-fp/extenddftf2.c   | 2 +-
 libgcc/soft-fp/extendhftf2.c   | 2 +-
 libgcc/soft-fp/extendsftf2.c   | 2 +-
 libgcc/soft-fp/extendxftf2.c   | 2 +-
 libgcc/soft-fp/trunctfdf2.c| 2 +-
 libgcc/soft-fp/trunctfhf2.c| 2 +-
 libgcc/soft-fp/trunctfsf2.c| 2 +-
 libgcc/soft-fp/trunctfxf2.c| 2 +-
 9 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index f9118d8fc39..0e1c443af01 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -407,7 +407,7 @@ fmsub (double a, double b, double c)
 FP_UNPACK_RAW_D (C, c);
 
 /* Extend double to quad.  */
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
 FP_EXTEND(Q,D,4,2,X,A);
 FP_EXTEND(Q,D,4,2,Y,B);
 FP_EXTEND(Q,D,4,2,Z,C);
@@ -436,7 +436,7 @@ fmsub (double a, double b, double c)
 FP_SUB_Q(V,U,Z);
 
 /* Truncate quad to double.  */
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
 V_f[3] &= 0x0007;
 FP_TRUNC(D,Q,2,4,R,V);
 #else
diff --git a/libgcc/soft-fp/extenddftf2.c b/libgcc/soft-fp/extenddftf2.c
index 31c7263efa5..ea40d47e3f2 100644
--- a/libgcc/soft-fp/extenddftf2.c
+++ b/libgcc/soft-fp/extenddftf2.c
@@ -43,7 +43,7 @@ __extenddftf2 (DFtype a)
 
   FP_INIT_EXCEPTIONS;
   FP_UNPACK_RAW_D (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_EXTEND (Q, D, 4, 2, R, A);
 #else
   FP_EXTEND (Q, D, 2, 1, R, A);
diff --git a/libgcc/soft-fp/extendhftf2.c b/libgcc/soft-fp/extendhftf2.c
index 7f1f89ed66f..f42db71d198 100644
--- a/libgcc/soft-fp/extendhftf2.c
+++ b/libgcc/soft-fp/extendhftf2.c
@@ -41,7 +41,7 @@ __extendhftf2 (HFtype a)
 
   FP_INIT_EXCEPTIONS;
   FP_UNPACK_RAW_H (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_EXTEND (Q, H, 4, 1, R, A);
 #else
   FP_EXTEND (Q, H, 2, 1, R, A);
diff --git a/libgcc/soft-fp/extendsftf2.c b/libgcc/soft-fp/extendsftf2.c
index e3f2e950bf1..618ab098faa 100644
--- a/libgcc/soft-fp/extendsftf2.c
+++ b/libgcc/soft-fp/extendsftf2.c
@@ -43,7 +43,7 @@ __extendsftf2 (SFtype a)
 
   FP_INIT_EXCEPTIONS;
   FP_UNPACK_RAW_S (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_EXTEND (Q, S, 4, 1, R, A);
 #else
   FP_EXTEND (Q, S, 2, 1, R, A);
diff --git a/libgcc/soft-fp/extendxftf2.c b/libgcc/soft-fp/extendxftf2.c
index 2d12da1dde6..5ea58d981c7 100644
--- a/libgcc/soft-fp/extendxftf2.c
+++ b/libgcc/soft-fp/extendxftf2.c
@@ -41,7 +41,7 @@ __extendxftf2 (XFtype a)
 
   FP_INIT_TRAPPING_EXCEPTIONS;
   FP_UNPACK_RAW_E (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_EXTEND (Q, E, 4, 4, R, A);
 #else
   FP_EXTEND (Q, E, 2, 2, R, A);
diff --git a/libgcc/soft-fp/trunctfdf2.c b/libgcc/soft-fp/trunctfdf2.c
index c0d1d34841a..0a52987f815 100644
--- a/libgcc/soft-fp/trunctfdf2.c
+++ b/libgcc/soft-fp/trunctfdf2.c
@@ -42,7 +42,7 @@ __trunctfdf2 (TFtype a)
 
   FP_INIT_ROUNDMODE;
   FP_UNPACK_SEMIRAW_Q (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_TRUNC (D, Q, 2, 4, R, A);
 #else
   FP_TRUNC (D, Q, 1, 2, R, A);
diff --git a/libgcc/soft-fp/trunctfhf2.c b/libgcc/soft-fp/trunctfhf2.c
index 8eddd14b2f4..156fab231e5 100644
--- a/libgcc/soft-fp/trunctfhf2.c
+++ b/libgcc/soft-fp/trunctfhf2.c
@@ -40,7 +40,7 @@ __trunctfhf2 (TFtype a)
 
   FP_INIT_ROUNDMODE;
   FP_UNPACK_SEMIRAW_Q (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if _FP_W_TYPE_SIZE < 64
   FP_TRUNC (H, Q, 1, 4, R, A);
 #else
   FP_TRUNC (H, Q, 1, 2, R, A);
diff --git a/libgcc/soft-fp/trunctfsf2.c b/libgcc/soft-fp/trunctfsf2.c
index 4b04d698d24..e37976e2064 100644
--- a/libgcc/soft-fp/trunctfsf2.c
+++ b/libgcc/soft-fp/trunctfsf2.c
@@ -42,7 +42,7 @@ __trunctfsf2 (TFtype a)
 
   FP_INIT_ROUNDMODE;
   FP_UNPACK_SEMIRAW_Q (A, a);
-#if (2 * _FP_W_TYPE_SIZE) < _FP_FRACBITS_Q
+#if 

[C++ Patch] Fix four more locations

2019-01-07 Thread Paolo Carlini

Hi,

should be straightforward material. Tested x86_64-linux, as usual.

Thanks, Paolo.

/

/cp
2019-01-07  Paolo Carlini  

* decl.c (start_decl): Improve two error_at locations.
(expand_static_init): Likewise.

/testsuite
2019-01-07  Paolo Carlini  

* g++.dg/diagnostic/constexpr1.C: New.
* g++.dg/diagnostic/thread1.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 267651)
+++ cp/decl.c   (working copy)
@@ -5235,10 +5236,12 @@ start_decl (const cp_declarator *declarator,
 {
   bool ok = false;
   if (CP_DECL_THREAD_LOCAL_P (decl))
-   error ("%qD declared %<thread_local%> in %<constexpr%> function",
-  decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD declared %<thread_local%> in %<constexpr%> function",
+ decl);
   else if (TREE_STATIC (decl))
-   error ("%qD declared %<static%> in %<constexpr%> function", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD declared %<static%> in %<constexpr%> function", decl);
   else
ok = true;
   if (!ok)
@@ -8253,18 +8256,18 @@ expand_static_init (tree decl, tree init)
   if (CP_DECL_THREAD_LOCAL_P (decl) && DECL_GNU_TLS_P (decl)
   && !DECL_FUNCTION_SCOPE_P (decl))
 {
+  location_t dloc = DECL_SOURCE_LOCATION (decl);
   if (init)
-   error ("non-local variable %qD declared %<__thread%> "
-  "needs dynamic initialization", decl);
+   error_at (dloc, "non-local variable %qD declared %<__thread%> "
+ "needs dynamic initialization", decl);
   else
-   error ("non-local variable %qD declared %<__thread%> "
-  "has a non-trivial destructor", decl);
+   error_at (dloc, "non-local variable %qD declared %<__thread%> "
+ "has a non-trivial destructor", decl);
   static bool informed;
   if (!informed)
{
- inform (DECL_SOURCE_LOCATION (decl),
- "C++11 %<thread_local%> allows dynamic initialization "
- "and destruction");
+ inform (dloc, "C++11 %<thread_local%> allows dynamic "
+ "initialization and destruction");
  informed = true;
}
   return;
Index: testsuite/g++.dg/diagnostic/constexpr1.C
===
--- testsuite/g++.dg/diagnostic/constexpr1.C(nonexistent)
+++ testsuite/g++.dg/diagnostic/constexpr1.C(working copy)
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++11 } }
+
+constexpr void foo() { thread_local int i __attribute__((unused)) {}; }  // { dg-error "41:.i. declared .thread_local." }
+
+constexpr void bar() { static int i __attribute__((unused)) {}; }  // { dg-error "35:.i. declared .static." }
Index: testsuite/g++.dg/diagnostic/thread1.C
===
--- testsuite/g++.dg/diagnostic/thread1.C   (nonexistent)
+++ testsuite/g++.dg/diagnostic/thread1.C   (working copy)
@@ -0,0 +1,13 @@
+// { dg-do compile { target c++11 } }
+
+int foo();
+
+__thread int i __attribute__((unused)) = foo();  // { dg-error "14:non-local variable .i. declared .__thread. needs" }
+
+struct S
+{
+  constexpr S() {}
+  ~S();
+};
+
+__thread S s __attribute__((unused));  // { dg-error "12:non-local variable .s. declared .__thread. has" }


Re: [PATCH] Define new filesystem::__file_clock type

2019-01-07 Thread Christophe Lyon
On Sun, 6 Jan 2019 at 22:45, Jonathan Wakely  wrote:
>
> On 05/01/19 20:03 +, Jonathan Wakely wrote:
> >In C++17 the clock used for filesystem::file_time_type is unspecified,
> >allowing it to be chrono::system_clock. The C++2a draft requires it to
> >be a distinct type, with additional member functions to convert to/from
> >other clocks (either the system clock or UTC). In order to avoid an ABI
> >change later, this patch defines a new distinct type now, which will be
> >used for std::chrono::file_clock later.
> >
> >   * include/bits/fs_fwd.h (__file_clock): Define new clock.
> >   (file_time_type): Redefine in terms of __file_clock.
> >   * src/filesystem/ops-common.h (file_time): Add FIXME comment about
> >   overflow.
> >   * src/filesystem/std-ops.cc (is_set(perm_options, perm_options)): Give
> >   internal linkage.
> >   (internal_file_lock): New helper type for accessing __file_clock.
> >   (do_copy_file): Use internal_file_lock to convert system time to
> >   file_time_type.
> >   (last_write_time(const path&, error_code&)): Likewise.
> >   (last_write_time(const path&, file_time_type, error_code&)): Likewise.
> >
> >Tested powerpc64-linux, committed to trunk.
>
> There's a new failure on 32-bit x86:
>
> /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc:148:
>  void test02(): Assertion 'approx_equal(last_write_time(f.path), time)' 
> failed.
> FAIL: 27_io/filesystem/operations/last_write_time.cc execution test
>

I've seen the same error on arm.

> I'll deal with that ASAP.
Thanks!
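
For readers following along, a minimal sketch of what "a distinct clock
type" means in chrono terms. Everything here is hypothetical:
demo_file_clock, its arbitrary epoch offset, and from_sys/to_sys are
illustrative names, not the libstdc++ __file_clock implementation.

```cpp
#include <chrono>
#include <type_traits>

// Hypothetical clock for illustration; not libstdc++'s __file_clock.
struct demo_file_clock {
  using duration = std::chrono::nanoseconds;
  using rep = duration::rep;
  using period = duration::period;
  using time_point = std::chrono::time_point<demo_file_clock, duration>;
  static constexpr bool is_steady = false;

  // Arbitrary offset between this clock's epoch and system_clock's epoch,
  // chosen only for the demo.
  static constexpr std::chrono::seconds epoch_diff() {
    return std::chrono::seconds(1000);
  }

  // Convert a system_clock time to this clock.
  static time_point from_sys(std::chrono::system_clock::time_point t) {
    return time_point(
        std::chrono::duration_cast<duration>(t.time_since_epoch())
        - epoch_diff());
  }

  // Convert a time on this clock back to system_clock.
  static std::chrono::system_clock::time_point to_sys(time_point t) {
    return std::chrono::system_clock::time_point(
        std::chrono::duration_cast<std::chrono::system_clock::duration>(
            t.time_since_epoch() + epoch_diff()));
  }

  static time_point now() {
    return from_sys(std::chrono::system_clock::now());
  }
};

// A distinct clock yields a distinct time_point type, so the two cannot be
// mixed by accident.
static_assert(!std::is_same<demo_file_clock::time_point,
                            std::chrono::system_clock::time_point>::value,
              "distinct clocks give distinct time_point types");
```

Because the time_point types differ, code that compares a file time against
a system time has to go through an explicit conversion, which is where
rounding or overflow differences between targets can show up.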


Re: [PATCH, C++,rebased] Fix PR c++/88261

2019-01-07 Thread Jason Merrill

On 1/7/19 10:38 AM, Bernd Edlinger wrote:
> On 1/7/19 1:08 AM, Martin Sebor wrote:
>> On 1/5/19 9:04 AM, Bernd Edlinger wrote:
>>> On 1/4/19 10:22 PM, Jason Merrill wrote:
>>>> Hmm, I'm uncomfortable with starting to pass in the decl just for the
>>>> sake of deciding whether this diagnostic should be a pedwarn or error.
>>>> In general, because of copy elision, we can't know at this point what
>>>> we're initializing, so I'd rather not pretend we can.  Instead, maybe
>>>> add a LOOKUP_ALLOW_FLEXARY_INIT flag that you can add to the flags
>>>> argument in the call from store_init_value?
>>>
>>> Okay, I reworked the patch to pass a bit in the flags.  It was a bit
>>> more complicated than anticipated, because it is necessary to pass the
>>> flag through process_init_constructor and friends to the recursive
>>> invocation of digest_init_r.  It turned out that digest_nsdmi_init did
>>> not need to change, since it is always wrong to use flexarray init
>>> there.  I added a new test case (flexary32.C) to exercise a few cases
>>> where non-static direct member initializers are allowed to use
>>> flexarrays (in static members) and where that would be wrong (in
>>> automatic members).  So that seems to work.
>>
>> If that resolves pr69338 can you please also reference the bug in
>> the test and in the ChangeLog?  (Ditto for pr69697.)
>
> Yes, those appear to be fixed as well.
> Added pr69338 + pr69697 to the ChangeLog,
> and also added test cases from both PRs.
>
> Attached the otherwise unchanged v3 of my patch.
>
> Is it OK for trunk?

OK, thanks.

Jason
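
As a rough illustration of the approach discussed in this thread (carry the
permission as a bit in the flags argument and pass it through the recursion,
rather than passing the decl), here is a minimal sketch. All names are
hypothetical and only mirror the shape of the idea; this is not the code in
typeck2.c.

```cpp
// All names below are hypothetical; they only mirror the shape of the idea.
enum : int {
  kDelegatingCons = 1 << 0,
  kAllowFlexarrayInit = kDelegatingCons << 1  // next free bit
};

struct InitNode {
  bool inits_flexarray;    // does this initializer target a flexible array?
  const InitNode* nested;  // nested initializer, or nullptr
};

// Returns true if the initializer is accepted under FLAGS.
bool digest_init(const InitNode& init, int flags) {
  if (init.inits_flexarray && !(flags & kAllowFlexarrayInit))
    return false;  // flexible array init is not permitted in this context
  // The flags are passed unchanged to the recursive call, so the permission
  // reaches nested initializers too.
  return init.nested == nullptr || digest_init(*init.nested, flags);
}

// Only the caller that knows the object is static sets the bit.
bool store_static_init(const InitNode& init) {
  return digest_init(init, kAllowFlexarrayInit);
}

bool store_automatic_init(const InitNode& init) {
  return digest_init(init, 0);
}
```

The point of the design is that the recursive worker never needs to know
what is being initialized; only the outermost caller decides.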



[PATCH, testsuite] Allow builtin-has-attribute-* to run as far as possible on targets without alias support.

2019-01-07 Thread Iain Sandoe
Hi Martin,

A)
Some of the builtin-has-attribute tests fail because a subset of them needs 
symbol alias support.
Darwin supports only weak aliases, so we need to skip these.

However, the subset is small, and I am reluctant to throw out the entire set 
for the sake of a small number, so I propose to wrap that small number in an 
#ifndef that can be enabled by targets without the necessary support (Darwin is 
not the only one, just the most frequently tested and therefore often “guilty” 
of finding the problem ;) )

It’s a tricky trade-off between having too many test-cases and having test 
cases that try to cover too many circumstances...
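
A minimal standalone sketch of the guard pattern described above, assuming a
SKIP_ALIAS macro as in the patch below. The function names and the automatic
definition on __APPLE__ are assumptions made so the sketch builds on its own;
in the testsuite the macro comes from dg-additional-options instead.

```cpp
// Assumption for this standalone sketch only: define SKIP_ALIAS on Darwin.
// In the testsuite the macro is supplied via dg-additional-options.
#if defined(__APPLE__) && !defined(SKIP_ALIAS)
#define SKIP_ALIAS 1
#endif

// Hypothetical alias target; extern "C" keeps the symbol name unmangled so
// the alias attribute can refer to it by name.
extern "C" int answer() { return 42; }

#ifndef SKIP_ALIAS
// Only compiled on targets with (non-weak) symbol alias support.
extern "C" int answer_alias() __attribute__((alias("answer")));
#endif

// The rest of the file still builds and runs when the alias part is skipped.
int call_through_alias_or_direct() {
#ifndef SKIP_ALIAS
  return answer_alias();
#else
  return answer();
#endif
}
```

Everything outside the guarded region is still compiled and checked on every
target, which is the benefit over skipping the whole file.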

B) [builtin-has-attribute-4.c]
I am concerned by the diagnostics for the lack of support for the “protected” 
mode (Darwin doesn’t have this, at least at present).

B.1 The reported diagnostic appears on the closing brace of the function, 
rather than on the statement that triggers it (line 233).
B.2 I think you’re perhaps missing a %< %> pair - there’s no ‘’ around 
“protected”.
B.3 There are a bunch of other lines with the “protected” visibility marking, 
but no diagnostic (perhaps that’s intended, I am not sure).

Addressing B is a separate issue from making the current tests pass; it might 
not be appropriate at this stage. It’s more of a heads-up.

as for the test fixes, OK for trunk?
Iain

gcc/testsuite/

* c-c++-common/builtin-has-attribute-3.c: Skip tests requiring symbol
alias support.
* c-c++-common/builtin-has-attribute-4.c: Likewise.
Append match for warning that ‘protected’ attribute is not supported.

diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
index f048059..5b2e5c7 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
@@ -1,7 +1,9 @@
 /* Verify __builtin_has_attribute return value for functions.
{ dg-do compile }
{ dg-options "-Wall -ftrack-macro-expansion=0" }
-   { dg-options "-Wall -Wno-narrowing -Wno-unused-local-typedefs -ftrack-macro-expansion=0" { target c++ } }  */
+   { dg-options "-Wall -Wno-narrowing -Wno-unused-local-typedefs -ftrack-macro-expansion=0" { target c++ } } 
+   { dg-additional-options "-DSKIP_ALIAS" { target *-*-darwin* } } 
+*/
 
 #define ATTR(...) __attribute__ ((__VA_ARGS__))
 
@@ -27,7 +29,9 @@ extern "C"
 #endif
 ATTR (noreturn) void fnoreturn (void) { __builtin_abort (); }
 
+#ifndef SKIP_ALIAS
 ATTR (alias ("fnoreturn")) void falias (void);
+#endif
 
 #define A(expect, sym, attr)   \
   typedef int Assert [1 - 2 * !(__builtin_has_attribute (sym, attr) == expect)]
@@ -114,7 +118,7 @@ void test_alloc_size_malloc (void)
   A (1, fmalloc_size_3, malloc);
 }
 
-
+#ifndef SKIP_ALIAS
 void test_alias (void)
 {
   A (0, fnoreturn, alias);
@@ -123,7 +127,7 @@ void test_alias (void)
   A (0, falias, alias ("falias"));
   A (0, falias, alias ("fnone"));
 }
-
+#endif
 
 void test_cold_hot (void)
 {
diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
index d56ef6b..0c36cfc 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
@@ -1,7 +1,9 @@
 /* Verify __builtin_has_attribute return value for variables.
{ dg-do compile }
{ dg-options "-Wall -ftrack-macro-expansion=0" }
-   { dg-options "-Wall -Wno-narrowing -Wno-unused -ftrack-macro-expansion=0" { target c++ } }  */
+   { dg-options "-Wall -Wno-narrowing -Wno-unused -ftrack-macro-expansion=0" { target c++ } }
+   { dg-additional-options "-DSKIP_ALIAS" { target *-*-darwin* } } 
+*/
 
 #define ATTR(...) __attribute__ ((__VA_ARGS__))
 
@@ -45,6 +47,7 @@ void test_aligned (void)
 }
 
 
+#ifndef SKIP_ALIAS
 int vtarget;
 extern ATTR (alias ("vtarget")) int valias;
 
@@ -55,7 +58,7 @@ void test_alias (void)
   A (1, valias, alias ("vtarget"));
   A (0, valias, alias ("vnone"));
 }
-
+#endif
 
 void test_cleanup (void)
 {
@@ -227,7 +230,7 @@ void test_vector_size (void)
 ATTR (visibility ("default")) int vdefault;
 ATTR (visibility ("hidden")) int vhidden;
 ATTR (visibility ("internal")) int vinternal;
-ATTR (visibility ("protected")) int vprotected;
+ATTR (visibility ("protected")) int vprotected; 
 
 void test_visibility (void)
 {
@@ -280,6 +283,6 @@ void test_weak (void)
 
   A (1, var_init_weak, weak);
   A (1, var_uninit_weak, weak);
-}
+} /* { dg-warning "protected visibility attribute not supported" "" { target { *-*-darwin* } } } */
 
 /* { dg-prune-output "specifies less restrictive attribute" } */



Re: Add forgotten options to -fprofile-use

2019-01-07 Thread Sandra Loosemore

On 1/6/19 9:47 AM, Jan Hubicka wrote:


Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 267603)
+++ doc/invoke.texi (working copy)
@@ -9499,6 +9499,8 @@ DO I = 1, N
 D(I) = E(I) * F
  ENDDO
  @end smallexample
+This flag is enabled by default at @option{-O3}.
+It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
  
  @item -ftree-loop-distribute-patterns

  @opindex ftree-loop-distribute-patterns
@@ -9524,6 +9526,8 @@ DO I = 1, N
  ENDDO
  @end smallexample
  and the initialization loop is transformed into a call to memset zero.
+This flag is enabled by default at @option{-O3}.
+It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
  
  @item -floop-interchange

  @opindex floop-interchange
@@ -9544,12 +9548,14 @@ for (int i = 0; i < N; i++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
  @end smallexample
  This flag is enabled by default at @option{-O3}.
+It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
  
  @item -floop-unroll-and-jam

  @opindex floop-unroll-and-jam
  Apply unroll and jam transformations on feasible loops.  In a loop
  nest this unrolls the outer loop by some factor and fuses the resulting
  multiple inner loops.  This flag is enabled by default at @option{-O3}.
+It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
  
  @item -ftree-loop-im

  @opindex ftree-loop-im
@@ -10804,6 +10810,8 @@ else
  
  This is particularly useful for assumed-shape arrays in Fortran where

  (for example) it allows better vectorization assuming contiguous accesses.
+This flag is enabled by default at @option{-O3}.
+It is also enabled by @option{-fprofile-use} and @option{-fauto-profile}.
  
  @item -ffunction-sections

  @itemx -fdata-sections



The documentation for -fprofile-use and -fauto-profile also includes 
tables of the other options implied by those options.  Can you please 
update those too?  I just synced them up with the code a couple months 
ago (PR middle-end/23197).


-Sandra


Re: [PATCH] Use proper print formatter in main function in fixincl.c

2019-01-07 Thread Jonathan Wakely

On 07/01/19 11:01 -0500, NightStrike wrote:
>On Mon, Jan 7, 2019 at 10:57 AM Jonathan Wakely  wrote:
>>
>> On 20/12/18 17:23 -0500, Nicholas Krause wrote:
>> >This fixes the bug id, 71176 to use the proper known
>> >code print formatter type, %lu for size_t rather than
>> >%d which is considered best practice for print statements.
>>
>> Well the proper specifier for size_t is %zu, but since you cast to
>> unsigned long then %lu is right.
>
>Wouldn't the right fix be to not cast to unsigned long and use %zu?


That was added in C99 and so is not part of C++98. GCC only requires a
C++98 compiler (and library).




Re: [PATCH] Use proper print formatter in main function in fixincl.c

2019-01-07 Thread NightStrike
On Mon, Jan 7, 2019 at 10:57 AM Jonathan Wakely  wrote:
>
> On 20/12/18 17:23 -0500, Nicholas Krause wrote:
> >This fixes the bug id, 71176 to use the proper known
> >code print formatter type, %lu for size_t rather than
> > >%d which is considered best practice for print statements.
>
> Well the proper specifier for size_t is %zu, but since you cast to
> unsigned long then %lu is right.

Wouldn't the right fix be to not cast to unsigned long and use %zu?


Re: [PATCH] Use proper print formatter in main function in fixincl.c

2019-01-07 Thread Jonathan Wakely

On 20/12/18 17:23 -0500, Nicholas Krause wrote:
>This fixes the bug id, 71176 to use the proper known
>code print formatter type, %lu for size_t rather than
>%d which is considered best practice for print statements.

Well the proper specifier for size_t is %zu, but since you cast to
unsigned long then %lu is right.

>Signed-off-by: Nicholas Krause 
>---
>fixincludes/fixincl.c | 4 ++--
>1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
>index 6dba2f6e830..5b8b77a77f0 100644
>--- a/fixincludes/fixincl.c
>+++ b/fixincludes/fixincl.c
>@@ -158,11 +158,11 @@ main (int argc, char** argv)
>   if (VLEVEL( VERB_PROGRESS )) {
> tSCC zFmt[] =
>   "\
>-Processed %5d files containing %d bytes\n\
>+Processed %5d files containing %lu bytes\n\
> Applying  %5d fixes to %d files\n\
> Altering  %5d of them\n";
>
>-fprintf (stderr, zFmt, process_ct, ttl_data_size, apply_ct,
>+fprintf (stderr, zFmt, process_ct, (unsigned int long) ttl_data_size, apply_ct,

I'd expect "unsigned long" not "unsigned int long".



Re: C++ PATCH to implement deferred parsing of noexcept-specifiers (c++/86476, c++/52869)

2019-01-07 Thread Jason Merrill

On 12/19/18 3:27 PM, Marek Polacek wrote:

Prompted by Jon's observation in 52869, I noticed that we don't treat
a noexcept-specifier as a complete-class context of a class ([class.mem]/6).
As with member function bodies, default arguments, and NSDMIs, names used in
a noexcept-specifier of a member-function can be declared later in the class
body, so we need to wait and parse them at the end of the class.
For that, I've made use of DEFAULT_ARG (now best to be renamed to UNPARSED_ARG).


Or DEFERRED_PARSE, yes.
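
For context, a small sketch of the language rule being implemented: a
noexcept-specifier is a complete-class context, so it may name members
declared later in the class, and it may also use the function's own
parameters. The struct and member names are hypothetical, and the first
declaration needs a compiler that defers parsing of the specifier.

```cpp
struct S {
  // g() is declared after f(), yet f's noexcept-specifier may name it,
  // because the specifier is parsed as if at the end of the class.
  void f() noexcept(noexcept(g())) {}
  void g() noexcept {}

  // The function's own parameters are in scope in the specifier as well.
  void h(int p) noexcept(noexcept(p)) {}
};

static_assert(noexcept(S().f()), "f is noexcept because g is noexcept");
static_assert(noexcept(S().h(0)), "evaluating an int parameter cannot throw");
```

Without deferred parsing, the use of g in f's specifier is rejected because
g has not been declared yet at that point in the class body.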


+  /* We can't compare unparsed noexcept-specifiers.  Save the old decl
+ and check this again after we've parsed the noexcept-specifiers
+ for real.  */
+  if (UNPARSED_NOEXCEPT_SPEC_P (new_exceptions))
+{
+  vec_safe_push (DEFARG_INSTANTIATIONS (TREE_PURPOSE (new_exceptions)),
+copy_decl (old_decl));
+  return;
+}


Why copy_decl?

It seems wasteful to allocate a vec to hold this single decl; let's make 
the last field of tree_default_arg a union instead.  And add a new macro 
for the single decl case.


I notice that default_arg currently uses tree_common for some reason, 
and we ought to be able to save two words by switching to tree_base.



@@ -1245,6 +1245,7 @@ nothrow_spec_p (const_tree spec)
  || TREE_VALUE (spec)
  || spec == noexcept_false_spec
  || TREE_PURPOSE (spec) == error_mark_node
+ || TREE_CODE (TREE_PURPOSE (spec)) == DEFAULT_ARG


Maybe use UNPARSED_NOEXCEPT_SPEC_P here?


+/* Make sure that any member-function parameters are in scope.
+   For instance, a function's noexcept-specifier can use the function's
+   parameters:
+
+   struct S {
+ void fn (int p) noexcept(noexcept(p));
+   };
+
+   so we need to make sure name lookup can find them.  This is used
+   when we delay parsing of the noexcept-specifier.  */
+
+static void
+maybe_begin_member_function_processing (tree decl)


This name is pretty misleading.  How about inject_parm_decls, to go with 
inject_this_parameter?



+/* Undo the effects of maybe_begin_member_function_processing.  */
+
+static void
+maybe_end_member_function_processing (void)


And then perhaps pop_injected_parms.


+/* Check throw specifier of OVERRIDER is at least as strict as
+   the one of BASEFN.  */
+
+bool
+maybe_check_throw_specifier (tree overrider, tree basefn)
+{
+  maybe_instantiate_noexcept (basefn);
+  maybe_instantiate_noexcept (overrider);
+  tree base_throw = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (basefn));
+  tree over_throw = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (overrider));
+
+  if (DECL_INVALID_OVERRIDER_P (overrider))
+return true;
+
+  /* Can't check this yet.  Pretend this is fine and let
+ noexcept_override_late_checks check this later.  */
+  if (UNPARSED_NOEXCEPT_SPEC_P (base_throw)
+  || UNPARSED_NOEXCEPT_SPEC_P (over_throw))
+return true;
+
+  if (!comp_except_specs (base_throw, over_throw, ce_derived))
+{
+  auto_diagnostic_group d;
+  error ("looser throw specifier for %q+#F", overrider);


Since we're touching this diagnostic, let's correct it now to "exception 
specification".  And add "on overriding virtual function".


Jason


Re: [PATCH, C++,rebased] Fix PR c++/88261

2019-01-07 Thread Bernd Edlinger
On 1/7/19 1:08 AM, Martin Sebor wrote:
> On 1/5/19 9:04 AM, Bernd Edlinger wrote:
>> On 1/4/19 10:22 PM, Jason Merrill wrote:
>>> Hmm, I'm uncomfortable with starting to pass in the decl just for the sake 
>>> of deciding whether this diagnostic should be a pedwarn or error. In 
>>> general, because of copy elision, we can't know at this point what we're 
>>> initializing, so I'd rather not pretend we can.  Instead, maybe add a 
>>> LOOKUP_ALLOW_FLEXARY_INIT flag that you can add to the flags argument in 
>>> the call from store_init_value?
>>>
>>
>> Okay, I reworked the patch to pass a bit in the flags.  It was a bit more
>> complicated than anticipated, because it is necessary to pass the flag
>> through process_init_constructor and friends to the recursive invocation
>> of digest_init_r.  It turned out that digest_nsdmi_init did not need to
>> change, since it is always wrong to use flexarray init there.  I added a
>> new test case (flexary32.C) to exercise a few cases where non-static
>> direct member initializers are allowed to use flexarrays (in static
>> members) and where that would be wrong (in automatic members).  So that
>> seems to work.
> 
> If that resolves pr69338 can you please also reference the bug in
> the test and in the ChangeLog?  (Ditto for pr69697.)
> 

Yes, those appear to be fixed as well.
Added pr69338 + pr69697 to the ChangeLog,
and also added test cases from both PRs.


Attached the otherwise unchanged v3 of my patch.

Is it OK for trunk?


Thanks
Bernd.
gcc/cp:
2019-01-05  Bernd Edlinger  

	PR c++/88261
	PR c++/69338
	PR c++/69696
	PR c++/69697
	* cp-tree.h (LOOKUP_ALLOW_FLEXARRAY_INIT): New flag value.
	* typeck2.c (digest_init_r): Raise an error for non-static
	initialization of a flexible array member.
	(process_init_constructor, massage_init_elt,
	process_init_constructor_array, process_init_constructor_record,
	process_init_constructor_union, process_init_constructor): Add the
	flags parameter and pass it thru.
	(store_init_value): Pass LOOKUP_ALLOW_FLEXARRAY_INIT parameter to
	digest_init_flags for static decls.

gcc/testsuite:
2019-01-05  Bernd Edlinger  

	PR c++/88261
	PR c++/69338
	PR c++/69696
	PR c++/69697
	* gcc.dg/array-6.c: Move from here ...
	* c-c++-common/array-6.c: ... to here and add some more test coverage.
	* g++.dg/pr69338.C: New test.
	* g++.dg/pr69697.C: Likewise.
	* g++.dg/ext/flexary32.C: Likewise.
	* g++.dg/ext/flexary3.C: Adjust test.
	* g++.dg/ext/flexary12.C: Likewise.
	* g++.dg/ext/flexary13.C: Likewise.
	* g++.dg/ext/flexary15.C: Likewise.
	* g++.dg/warn/Wplacement-new-size-1.C: Likewise.
	* g++.dg/warn/Wplacement-new-size-2.C: Likewise.
	* g++.dg/warn/Wplacement-new-size-6.C: Likewise.


Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h	(revision 267569)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -5454,6 +5454,8 @@ enum overload_flags { NO_SPECIAL = 0, DTOR_FLAG, T
 #define LOOKUP_NO_NON_INTEGRAL (LOOKUP_NO_RVAL_BIND << 1)
 /* Used for delegating constructors in order to diagnose self-delegation.  */
 #define LOOKUP_DELEGATING_CONS (LOOKUP_NO_NON_INTEGRAL << 1)
+/* Allow initialization of a flexible array members.  */
+#define LOOKUP_ALLOW_FLEXARRAY_INIT (LOOKUP_DELEGATING_CONS << 1)
 
 #define LOOKUP_NAMESPACES_ONLY(F)  \
   (((F) & LOOKUP_PREFER_NAMESPACES) && !((F) & LOOKUP_PREFER_TYPES))
Index: gcc/cp/typeck2.c
===
--- gcc/cp/typeck2.c	(revision 267569)
+++ gcc/cp/typeck2.c	(working copy)
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcc-rich-location.h"
 
 static tree
-process_init_constructor (tree type, tree init, int nested,
+process_init_constructor (tree type, tree init, int nested, int flags,
 			  tsubst_flags_t complain);
 
 
@@ -817,8 +817,12 @@ store_init_value (tree decl, tree init, vec *v = CONSTRUCTOR_ELTS (init);
@@ -1365,7 +1383,8 @@ static int
 	ce->index = error_mark_node;
   gcc_assert (ce->value);
   ce->value
-	= massage_init_elt (TREE_TYPE (type), ce->value, nested, complain);
+	= massage_init_elt (TREE_TYPE (type), ce->value, nested, flags,
+			complain);
 
   gcc_checking_assert
 	(ce->value == error_mark_node
@@ -1373,7 +1392,7 @@ static int
 	 (strip_array_types (TREE_TYPE (type)),
 	  strip_array_types (TREE_TYPE (ce->value);
 
-  flags |= picflag_from_initializer (ce->value);
+  picflags |= picflag_from_initializer (ce->value);
 }
 
   /* No more initializers. If the array is unbounded, we are done. Otherwise,
@@ -1389,7 +1408,8 @@ static int
 	   we can't rely on the back end to do it for us, so make the
 	   initialization explicit by list-initializing from T{}.  */
 	next = build_constructor (init_list_type_node, NULL);
-	next = massage_init_elt (TREE_TYPE (type), next, nested, complain);
+	next = massage_init_elt (TREE_TYPE (type), next, nested, flags,
+ complain);
 	if 

Re: [testsuite] Fix gcc.dg/debug/dwarf2/inline5.c with Solaris as (PR debug/87451)

2019-01-07 Thread Richard Biener
On Fri, 4 Jan 2019, Rainer Orth wrote:

> Hi Richard,
> 
> >> On Thu, 3 Jan 2019, Rainer Orth wrote:
> >>
> >>> gcc.dg/debug/dwarf2/inline5.c currently FAILs with Solaris as (both
> >>> sparc and x86):
> >>> 
> >>> FAIL: gcc.dg/debug/dwarf2/inline5.c scan-assembler-not (DIE
> >>> (0x([0-9a-f]*)) DW_TAG_lexical_block)[^#/!]*[#/!]
> >>> [^(].*DW_TAG_lexical_block)[^#/!x]*x1[^#/!]*[#/!]
> >>> DW_AT_abstract_origin
> >>> FAIL: gcc.dg/debug/dwarf2/inline5.c scan-assembler-times
> >>> DW_TAG_lexical_block)[^#/!]*[#/!] (DIE (0x[0-9a-f]*)
> >>> DW_TAG_variable 1
> >>> 
> >>> The first failure seems to be caused because .* performs multiline
> >>> matches by default in Tcl; tightening it to [^\n]* avoids the problem.
> >>
> >> Hmm, but the matches are supposed to match multiple lines...  how
> >> does it fail for you?
> >
> > it matches all of
> >
> > (DIE (0x19f) DW_TAG_lexical_block)
> > .byte   0xd / uleb128 0xd; (DIE (0x1a0) DW_TAG_variable)
> > .ascii "j"  / DW_AT_name
> > .byte   0x1 / DW_AT_decl_file 
> > (/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/debug/dwarf2/inline5.c)
> > .byte   0x12/ DW_AT_decl_line
> > .byte   0x14/ DW_AT_decl_column
> > .long   0x17f   / DW_AT_type
> > .byte   0   / end of children of DIE 0x19f
> > .byte   0   / end of children of DIE 0x184
> > .byte   0xe / uleb128 0xe; (DIE (0x1ac) DW_TAG_subprogram)
> > .long   0x184   / DW_AT_abstract_origin
> > .long   .LFB0   / DW_AT_low_pc
> > .long   .LFE0-.LFB0 / DW_AT_high_pc
> > .byte   0x1 / uleb128 0x1; DW_AT_frame_base
> > .byte   0x9c/ DW_OP_call_frame_cfa
> > / DW_AT_GNU_all_call_sites
> > .byte   0xf / uleb128 0xf; (DIE (0x1bb) DW_TAG_formal_parameter)
> > .long   0x195   / DW_AT_abstract_origin
> > .byte   0x2 / uleb128 0x2; DW_AT_location
> > .byte   0x91/ DW_OP_fbreg
> > .byte   0   / sleb128 0
> > .byte   0x6 / uleb128 0x6; (DIE (0x1c3) DW_TAG_lexical_block)
> > .long   0x19f   / DW_AT_abstract_origin
> >
> > while with gas there's instead
> >
> > .uleb128 0xc/ (DIE (0xad) DW_TAG_lexical_block)
> > .uleb128 0xd/ (DIE (0xae) DW_TAG_variable)
> > .ascii "j\0"/ DW_AT_name
> > .byte   0x1 / DW_AT_decl_file 
> > (/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/debug/dwarf2/inline5.c)
> >
> > i.e. the pattern doesn't match with gas due to the [^(] while with as we
> > have uleb128 first which does match, producing the failure (which shows
> > that that part of my patch is wrong).
> 
> I still have a hard time determining what to do here.  I've now reverted
> the tree to r264642, i.e. the one before the PR debug/87443 patch.  Then
> I built on x86_64-pc-linux-gnu and ran the inline5.c testcase against
> the old compiler.  I'd have expected all the scan-assembler* tests to
> FAIL here, but instead I get
> 
> PASS: gcc.dg/debug/dwarf2/inline5.c (test for excess errors)
> PASS: gcc.dg/debug/dwarf2/inline5.c scan-assembler-times DW_TAG_inlined_subroutine 2
> FAIL: gcc.dg/debug/dwarf2/inline5.c scan-assembler-times DW_TAG_lexical_block\\)[^#/!]*[#/!] DW_AT_abstract_origin 2
> PASS: gcc.dg/debug/dwarf2/inline5.c scan-assembler-times DW_TAG_lexical_block\\)[^#/!]*[#/!] \\(DIE \\(0x[0-9a-f]*\\) DW_TAG_variable 1
> PASS: gcc.dg/debug/dwarf2/inline5.c scan-assembler-not \\(DIE \\(0x([0-9a-f]*)\\) DW_TAG_lexical_block\\)[^#/!]*[#/!] [^(].*DW_TAG_lexical_block\\)[^#/!x]*x\\1[^#/!]*[#/!] DW_AT_abstract_origin
> FAIL: gcc.dg/debug/dwarf2/inline5.c scan-assembler-not DW_TAG_lexical_block\\)[^#/!x]*x([0-9a-f]*)[^#/!]*[#/!] DW_AT_abstract_origin.*\\(DIE \\(0x\\1\\) DW_TAG_lexical_block\\)[^#/!]*[#/!] DW_AT
> 
> i.e. the problematic scan-assembler-not test PASSes before and after
> your patch, making it hard to determine what that test is guarding
> against (i.e. what is matched on Linux/x86_64 or Solaris with gas) and
> adapting it to the Solaris as syntax.

Yeah, the issue is I applied patches in another order than I developed
the testcases...  I think you need to back out the PR87428/87362
fix to see this FAIL happening.

What we want to not see is a lexical block used as abstract origin
that has further attributes.  GCC 8 shows bogus DWARF:

 <2><5c>: Abbrev Number: 4 (DW_TAG_inlined_subroutine)
<5d>   DW_AT_abstract_origin: <0xa9>
<61>   DW_AT_low_pc  : 0xf
<69>   DW_AT_high_pc : 0xf
<71>   DW_AT_call_file   : 1
<72>   DW_AT_call_line   : 10
<73>   DW_AT_call_column : 20
 <3><74>: Abbrev Number: 5 (DW_TAG_formal_parameter)
<75>   DW_AT_abstract_origin: <0xba>
<79>   DW_AT_location: 0x0 (location list)
 <3><7d>: Abbrev Number: 6 (DW_TAG_lexical_block)
<7e>   DW_AT_abstract_origin: <0xf1>
<82>   DW_AT_low_pc  : 0xf
...
 <1>: A

[PATCH] PR target/85596 Add --with-multilib-list doc for aarch64

2019-01-07 Thread Christophe Lyon
Hi,

This small patch adds a short description of --with-multilib-list for aarch64.
OK?

Thanks,

Christophe
2019-01-07  Christophe Lyon  

PR target/85596
* doc/install.texi (with-multilib-list): Document for aarch64.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5cf007b..d2bf21d 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1073,10 +1073,19 @@ sysv, aix.
 @itemx --without-multilib-list
 Specify what multilibs to build.  @var{list} is a comma separated list of
 values, possibly consisting of a single value.  Currently only implemented
-for arm*-*-*, riscv*-*-*, sh*-*-* and x86-64-*-linux*.  The accepted
-values and meaning for each target is given below.
+for aarch64*-*-*, arm*-*-*, riscv*-*-*, sh*-*-* and x86-64-*-linux*.  The
+accepted values and meaning for each target is given below.
 
 @table @code
+@item aarch64*-*-*
+@var{list} is a comma separated list of @code{ilp32}, and @code{lp64}
+to enable ILP32 and LP64 run-time libraries, respectively.  If
+@var{list} is empty, then there will be no multilibs and only the
+default run-time library will be built.  If @var{list} is
+@code{default} or --with-multilib-list= is not specified, then the
+default set of libraries is selected based on the value of
+@option{--target}.
+
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
 @code{rmprofile} to build multilibs for A or R and M architecture


Re: Add new --param knobs for inliner

2019-01-07 Thread Jan Hubicka
> Jan Hubicka wrote:
> 
> > uninlined-* should be useful for architectures with greater function
> > call overhead than modern x86 chips (which is a good portion of them,
> > especially s390 as I learnt at Cauldron). It would be nice to benchmark
> > the effect of those and tune the defaults in config/* files. I think this
> > is a reasonable way to deal with architectural differences without making
> > the inliner hard to tune in the long term.
> 
> Thanks for the heads-up!  This looks interesting, we'll have a look.

It may (and likely will) still be necessary to also increase
max-inline-insns-auto and perhaps -single, but I think it is good to get
the model realistic first. That will make the inliner prioritize better and
consider more inline candidates as important via the big speedup metric.

Honza
> 
> Bye,
> Ulrich
> 
> -- 
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   ulrich.weig...@de.ibm.com
> 


[PATCH] PR libstdc++/87787 avoid undefined null args to memcpy and memmove

2019-01-07 Thread Jonathan Wakely

The C++ char_traits and ctype APIs do not disallow null pointer
arguments, so we need explicit checks to ensure we don't forward null
pointers to memcpy or memmove.

PR libstdc++/87787
* include/bits/char_traits.h (char_traits::move): Do not pass null
pointers to memmove.
* include/bits/locale_facets.h
(ctype::widen(const char*, const char*, char*)): Do not
pass null pointers to memcpy.
(ctype::narrow(const char*, const char*, char, char*)):
Likewise.
(ctype::do_widen(const char*, const char*, char*)):
Likewise.
(ctype::do_narrow(const char*, const char*, char, char*)):
Likewise.

Tested powerpc64-linux, committed to trunk.
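
For readers who want the shape of the guard in isolation, here is a minimal
stand-alone sketch.  The helper name copy_range is hypothetical (it is not a
libstdc++ function); it only mirrors the pattern the patch applies in
do_widen and do_narrow: skip the memcpy call when the range is empty, so a
null pointer is never forwarded.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of libstdc++: copy [lo, hi) to `to`.
// Passing a null pointer to memcpy is undefined even when the length is
// zero, so the call is skipped entirely for an empty range.
inline const char*
copy_range(const char* lo, const char* hi, char* to)
{
  if (hi != lo)
    std::memcpy(to, lo, static_cast<std::size_t>(hi - lo));
  return hi;
}
```

With this shape, calling copy_range with three null pointers is harmless,
which is the case the PR is about.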


commit 8322b49ba9dfa3cef33a78e4bc2bab0937c3849a
Author: Jonathan Wakely 
Date:   Mon Jan 7 14:38:22 2019 +

PR libstdc++/87787 avoid undefined null args to memcpy and memmove

The C++ char_traits and ctype APIs do not disallow null pointer
arguments, so we need explicit checks to ensure we don't forward null
pointers to memcpy or memmove.

PR libstdc++/87787
* include/bits/char_traits.h (char_traits::move): Do not pass null
pointers to memmove.
* include/bits/locale_facets.h
(ctype::widen(const char*, const char*, char*)): Do not
pass null pointers to memcpy.
(ctype::narrow(const char*, const char*, char, char*)):
Likewise.
(ctype::do_widen(const char*, const char*, char*)):
Likewise.
(ctype::do_narrow(const char*, const char*, char, char*)):
Likewise.

diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h
index a2ba5da910f..06e04ceaa34 100644
--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -183,6 +183,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 char_traits<_CharT>::
 move(char_type* __s1, const char_type* __s2, std::size_t __n)
 {
+  if (__n == 0)
+   return __s1;
   return static_cast<_CharT*>(__builtin_memmove(__s1, __s2,
__n * sizeof(char_type)));
 }
diff --git a/libstdc++-v3/include/bits/locale_facets.h b/libstdc++-v3/include/bits/locale_facets.h
index 33cff65..66ac9c07a5d 100644
--- a/libstdc++-v3/include/bits/locale_facets.h
+++ b/libstdc++-v3/include/bits/locale_facets.h
@@ -896,7 +896,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
if (_M_widen_ok == 1)
  {
-   __builtin_memcpy(__to, __lo, __hi - __lo);
+   if (__builtin_expect(__hi != __lo, true))
+ __builtin_memcpy(__to, __lo, __hi - __lo);
return __hi;
  }
if (!_M_widen_ok)
@@ -961,7 +962,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
if (__builtin_expect(_M_narrow_ok == 1, true))
  {
-   __builtin_memcpy(__to, __lo, __hi - __lo);
+   if (__builtin_expect(__hi != __lo, true))
+ __builtin_memcpy(__to, __lo, __hi - __lo);
return __hi;
  }
if (!_M_narrow_ok)
@@ -1100,7 +1102,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   virtual const char*
   do_widen(const char* __lo, const char* __hi, char_type* __to) const
   {
-   __builtin_memcpy(__to, __lo, __hi - __lo);
+   if (__builtin_expect(__hi != __lo, true))
+ __builtin_memcpy(__to, __lo, __hi - __lo);
return __hi;
   }
 
@@ -1153,7 +1156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   do_narrow(const char_type* __lo, const char_type* __hi,
char __dfault __attribute__((__unused__)), char* __to) const
   {
-   __builtin_memcpy(__to, __lo, __hi - __lo);
+   if (__builtin_expect(__hi != __lo, true))
+ __builtin_memcpy(__to, __lo, __hi - __lo);
return __hi;
   }
 


Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch)

2019-01-07 Thread Marc Glisse

On Mon, 7 Jan 2019, Tamar Christina wrote:

> The 01/04/2019 17:50, Marc Glisse wrote:
> > > +(convert:newtype (op (convert:newtype @1) (convert:newtype @2)))
> >
> > The outer 'convert' is unnecessary, op already has the same type.
>
> Does it? The only comparison that has been done between the type of op and
> "type" is that they are both a decimal floating point type. I don't see any
> reason why they have to be the same type.

op is just PLUS_EXPR (or another operation), it isn't related to @0, it 
does not have a type in itself. When you build the sum of 2 objects of 
type newtype, the result has type newtype. On the other hand, if newtype 
is not the same as type, you may be missing a conversion of the result to 
type. Ah, I see that newtype is always == type here.

> > > +   (nop:type (op (convert:ty1 @1) (convert:ty2 @2)
> >
> > Please don't use 'nop' directly, use 'convert' instead. This line is very
> > suspicious, both arguments of op should have the same type. Specifying the
> > outertype should be unnecessary, it is always 'type'. And if necessary, I
> > expect '(convert:ty1 @1)' is the same as '{ arg0; }'.
>
> Ah I wasn't aware I could use arg0 here. I've updated the patch, though I don't
> really find this clearer.

> + (convert (op (convert:ty1 { arg0; }) (convert:ty2 { arg1; })


I think you misunderstood my point. What you wrote is equivalent to:

(convert (op { arg0; } { arg1; }

since arg0 already has type ty1. And I am complaining that both arguments 
to op must have the same type, but you are creating one of type ty1 and 
one of type ty2, which doesn't clearly indicate that ty1==ty2.


Maybe experiment with
(long double)some_float * (long double)some_double
cast to either float or double.
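
A small harness for that experiment might look like the sketch below.  The
function names are made up for illustration; the first form multiplies in
long double as written above, the second is the narrowed form a
cast-stripping rule could produce.  Whether the two agree for all inputs is
exactly the open question, so the sketch makes no claim beyond giving a
place to plug in test values.

```cpp
// Form from the suggestion: widen both operands to long double, multiply,
// then narrow the product to float.
inline float mul_wide(float f, double d)
{
  return static_cast<float>(static_cast<long double>(f)
                            * static_cast<long double>(d));
}

// Candidate narrowed form (illustrative only): multiply in double, then
// narrow to float.  This adds a rounding step to double before the final
// rounding to float.
inline float mul_narrow(float f, double d)
{
  return static_cast<float>(static_cast<double>(f) * d);
}
```

For operands whose product is exactly representable in float the two forms
trivially agree; the interesting inputs are those where the double product
lands near a float rounding boundary.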

SCALAR_FLOAT_TYPE_P may be safer than FLOAT_TYPE_P.

--
Marc Glisse


PR target/86891 __builtin_sub_overflow issues on AArch64

2019-01-07 Thread Richard Earnshaw (lists)
Investigating PR target/86891 revealed a number of issues with the way
the AArch64 backend was handling overflow detection patterns.  Firstly,
expansion for signed and unsigned types is not the same, as in one form
the overflow is detected via the C flag and in the other it is done via
the V flag in the PSR.  Secondly, particular care has to be taken when
describing overflow of signed types: the comparison has to be performed
conceptually on a value that cannot overflow and compared to a value
that might have overflowed.
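
As a source-level illustration of the two cases (not of the RTL patterns
themselves), consider the sketch below.  The function names are invented for
this example.  The unsigned check is a borrow test, which is the case
reported through the carry flag; the signed check is written the way the
paragraph above describes it conceptually: compute the difference in a wider
type that cannot overflow and compare it against the range of the narrow
type.

```cpp
#include <cstdint>

// Unsigned subtraction wraps exactly when a borrow is needed (a < b).
inline bool usub_overflows(std::uint32_t a, std::uint32_t b)
{
  std::uint32_t r;
  return __builtin_sub_overflow(a, b, &r);
}

// Signed subtraction, checked via the GCC/Clang builtin.
inline bool ssub_overflows(std::int32_t a, std::int32_t b)
{
  std::int32_t r;
  return __builtin_sub_overflow(a, b, &r);
}

// The same signed check spelled out: the 64-bit difference of two 32-bit
// values cannot overflow, so it can be compared against the 32-bit range.
inline bool ssub_overflows_wide(std::int32_t a, std::int32_t b)
{
  const std::int64_t wide
    = static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
  return wide < INT32_MIN || wide > INT32_MAX;
}
```

Note that 0 - 1 overflows as an unsigned subtraction but not as a signed
one, which is why a single expansion cannot serve both.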

It became apparent that some of the patterns were simply unmatchable
(they collapse to NEG in the RTL rather than subtracting from zero) and
a number of patterns were overly restrictive in terms of the immediate
constants that they supported.  I've tried to address all of these
issues as well.

Committed to trunk.

gcc:

PR target/86891
* config/aarch64/aarch64.c (aarch64_expand_subvti): New parameter
unsigned_p.  Handle signed and unsigned overflow correction as
required.
* config/aarch64/aarch64-protos.h (aarch64_expand_subvti): Update
prototype.
* config/aarch64/aarch64.md (addv4): Use aarch64_plus_operand
for operand 2.
(add3_compareV_imm): Make this callable for expanding.
(subv4): Use register_operand for operand 1.  Use
aarch64_plus_operand for operand 2.
(subv_insn): New insn pattern.
(subv_imm): Likewise.
(negv3): New expand pattern.
(negv_insn): New insn pattern.
(negv_cmp_only): Likewise.
(cmpv_insn): Likewise.
(subvti4): Use register_operand for operand 1.  Update call to
aarch64_expand_subvti.
(usubvti4): Likewise.
(negvti3): New expand pattern.
(negdi_carryout): New insn pattern.
(negvdi_carryinV): New insn pattern.
(sub_compare1_imm): Delete named insn pattern, make anonymous
version the named version.
(peepholes to convert to sub_compare1_imm): Adjust order of
operands.
(usub3_carryinC, usub3_carryinC_z1): New insn
patterns.
(usub3_carryinC_z2, usub3_carryinC): New insn
patterns.
(sub3_carryinCV, sub3_carryinCV_z1_z2): Delete.
(sub3_carryinCV_z1, sub3_carryinCV_z2): Delete.
(sub3_carryinCV): Delete.
(sub3_carryinV): New expand pattern.
(sub3_carryinV, sub3_carryinV_z2): New insn patterns.

testsuite:

* gcc.target/aarch64/subs_compare_2.c: Make '#' immediate prefix
optional in scan pattern.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9a8f81e..209c09b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -530,7 +530,7 @@ void aarch64_subvti_scratch_regs (rtx, rtx, rtx *,
   rtx *, rtx *,
   rtx *, rtx *, rtx *);
 void aarch64_expand_subvti (rtx, rtx, rtx,
-			rtx, rtx, rtx, rtx);
+			rtx, rtx, rtx, rtx, bool);
 
 
 /* Initialize builtins for SIMD intrinsics.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c5036c8..c879940 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16668,32 +16668,38 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
LOW_IN2 represents the low half (DImode) of TImode operand 2
HIGH_DEST represents the high half (DImode) of TImode operand 0
HIGH_IN1 represents the high half (DImode) of TImode operand 1
-   HIGH_IN2 represents the high half (DImode) of TImode operand 2.  */
-
+   HIGH_IN2 represents the high half (DImode) of TImode operand 2
+   UNSIGNED_P is true if the operation is being performed on unsigned
+   values.  */
 void
 aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
 		   rtx low_in2, rtx high_dest, rtx high_in1,
-		   rtx high_in2)
+		   rtx high_in2, bool unsigned_p)
 {
   if (low_in2 == const0_rtx)
 {
   low_dest = low_in1;
-  emit_insn (gen_subdi3_compare1 (high_dest, high_in1,
-  force_reg (DImode, high_in2)));
+  high_in2 = force_reg (DImode, high_in2);
+  if (unsigned_p)
+	emit_insn (gen_subdi3_compare1 (high_dest, high_in1, high_in2));
+  else
+	emit_insn (gen_subvdi_insn (high_dest, high_in1, high_in2));
 }
   else
 {
   if (CONST_INT_P (low_in2))
 	{
-	  low_in2 = force_reg (DImode, GEN_INT (-UINTVAL (low_in2)));
 	  high_in2 = force_reg (DImode, high_in2);
-	  emit_insn (gen_adddi3_compareC (low_dest, low_in1, low_in2));
+	  emit_insn (gen_subdi3_compare1_imm (low_dest, low_in1, low_in2,
+	  GEN_INT (-INTVAL (low_in2))));
 	}
   else
 	emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
-  emit_insn (gen_subdi3_carryinCV (high_dest,
-   force_reg (DImode, high_in1),
-   high_in2));
+
+  if (unsigned_p)
+	emit_insn (gen_usubdi3_carryinC (high_dest, high_in1, high_in2));
+  else
+	emit_insn (gen_subdi3_carryinV (high_dest, hig

[PATCH, GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical subregs

2019-01-07 Thread Andre Vieira (lists)

Hi,

This patch fixes the way 'uses_hard_regs_p' handles paradoxical subregs.
The function is supposed to detect whether a register access of 'x'
overlaps with 'set'.  For SUBREGs it should check whether any register of
the full multi-register overlaps with 'set'.  The former behavior was to
take the wider of the inner and outer modes of the SUBREG together with
the inner register, and to check all registers from the inner register
onwards for that width.  For normal SUBREGs this gives you the full
register; for paradoxical SUBREGs, however, it may give you the wrong set
of registers if the index is not the first of the multi-register set.
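
To make the failure mode concrete, here is a toy model, not LRA code.  The
type HardRegSet and the function range_overlaps are invented for this
sketch, and the register numbers used in the note after it are made up; the
only point is that the answer depends on which register the scan starts
from.

```cpp
#include <bitset>
#include <cstddef>

// Toy stand-in for HARD_REG_SET: one bit per hard register.
using HardRegSet = std::bitset<64>;

// Returns true if any of the `nregs` consecutive hard registers starting
// at `first` is a member of `set`.
inline bool range_overlaps(const HardRegSet& set, int first, int nregs)
{
  for (int r = first; r < first + nregs; ++r)
    if (r >= 0 && r < 64 && set.test(static_cast<std::size_t>(r)))
      return true;
  return false;
}
```

Suppose, purely for illustration, that the inner register is hard register
3 while the paradoxical SUBREG as a whole occupies registers 2 and 3.  With
'set' containing only register 2, scanning two registers from the inner
register (3 and 4) reports no overlap, whereas scanning from the first
register of the full SUBREG (2 and 3) reports one.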


The original error reported in PR target/86487 can no longer be
reproduced with the given test; this is due to an unrelated code-gen
change.  Regardless, I believe this should still be fixed, as it is simply
wrong behavior by uses_hard_regs_p which may be triggered by a different
test-case or by future changes to the compiler.  It is also worth
pointing out that this isn't actually a 'target' issue, as this code could
potentially hit any other target using paradoxical SUBREGs.  Should I
change the Bugzilla ticket to reflect that this is actually a
target-agnostic issue in RTL?


There is a gotcha here: I don't know what would happen if you hit the
cases where get_hard_regno returns -1.  Quoting the comment above that
function: "If X is not a register or a subreg of a register, return -1."
I think that if we are hitting this, then things must have gone wrong
before?


Bootstrapped on aarch64, arm and x86, no regressions.

Is this OK for trunk?


gcc/ChangeLog:
2019-01-07 Andre Vieira  


PR target/86487
* lra-constraints.c (uses_hard_regs_p): Fix handling of
paradoxical SUBREGS.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index c061093ed699620afe2dfda60d58066d6967523a..736b084acc552b75ff4d369b6584bc9ab422e21b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1761,11 +1761,21 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
 return false;
   code = GET_CODE (x);
   mode = GET_MODE (x);
+
   if (code == SUBREG)
 {
+  /* For all SUBREGs we want to check whether the full multi-register
+	 overlaps the set.  For normal SUBREGs this means 'get_hard_regno' of
+	 the inner register, for paradoxical SUBREGs this means the
+	 'get_hard_regno' of the full SUBREG and for complete SUBREGs either is
+	 fine.  Use the wider mode for all cases.  */
+  rtx subreg = SUBREG_REG (x);
   mode = wider_subreg_mode (x);
-  x = SUBREG_REG (x);
-  code = GET_CODE (x);
+  if (mode == GET_MODE (subreg))
+	{
+	  x = subreg;
+	  code = GET_CODE (x);
+	}
 }
 
   if (REG_P (x))


Re: Fix ICE in get_initial_defs_for_reduction (PR 88567)

2019-01-07 Thread Richard Biener
On Mon, Jan 7, 2019 at 2:27 PM Richard Sandiford
 wrote:
>
> The use of "j" in:
>
>   init = permute_results[number_of_vectors - j - 1];
>
> was out-of-sync with the new flat loop structure.  Now that all that
> reversing is gone, we can just use the result of duplicate_and_interleave
> directly.
>
> The other cases shouldn't be affected by postponing the insertion
> of ctor_seq, since gimple_build* appends to the seq without clearing
> it first (unlike some of the gimplify routines).
>
> The ICE is already covered by gcc.dg/vect/pr63379.c.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-01-07  Richard Sandiford  
>
> gcc/
> PR middle-end/88567
> * tree-vect-loop.c (get_initial_defs_for_reduction): Pass the
> output vector directly to duplicate_and_interleave instead of
> going through a temporary.  Postpone insertion of ctor_seq to
> the end of the loop.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-01-04 11:39:26.674251052 +
> +++ gcc/tree-vect-loop.c2019-01-07 13:23:22.924449595 +
> @@ -4103,7 +4103,6 @@ get_initial_defs_for_reduction (slp_tree
>unsigned int group_size = stmts.length ();
>unsigned int i;
>struct loop *loop;
> -  auto_vec permute_results;
>
>vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
>
> @@ -4138,6 +4137,7 @@ get_initial_defs_for_reduction (slp_tree
>bool constant_p = true;
>tree_vector_builder elts (vector_type, nunits, 1);
>elts.quick_grow (nunits);
> +  gimple_seq ctor_seq = NULL;
>for (j = 0; j < nunits * number_of_vectors; ++j)
>  {
>tree op;
> @@ -4163,7 +4163,6 @@ get_initial_defs_for_reduction (slp_tree
>
>if (number_of_places_left_in_vector == 0)
> {
> - gimple_seq ctor_seq = NULL;
>   tree init;
>   if (constant_p && !neutral_op
>   ? multiple_p (TYPE_VECTOR_SUBPARTS (vector_type), nunits)
> @@ -4189,16 +4188,11 @@ get_initial_defs_for_reduction (slp_tree
>   else
> {
>   /* First time round, duplicate ELTS to fill the
> -required number of vectors, then cherry pick the
> -appropriate result for each iteration.  */
> - if (vec_oprnds->is_empty ())
> -   duplicate_and_interleave (&ctor_seq, vector_type, elts,
> - number_of_vectors,
> - permute_results);
> - init = permute_results[number_of_vectors - j - 1];
> +required number of vectors.  */
> + duplicate_and_interleave (&ctor_seq, vector_type, elts,
> +   number_of_vectors, *vec_oprnds);
> + break;
> }
> - if (ctor_seq != NULL)
> -   gsi_insert_seq_on_edge_immediate (pe, ctor_seq);
>   vec_oprnds->quick_push (init);
>
>   number_of_places_left_in_vector = nunits;
> @@ -4207,6 +4201,8 @@ get_initial_defs_for_reduction (slp_tree
>   constant_p = true;
> }
>  }
> +  if (ctor_seq != NULL)
> +gsi_insert_seq_on_edge_immediate (pe, ctor_seq);
>  }
>
>


Re: Fix IFN_MASK_STORE handling of IFN_GOMP_SIMD_LANE

2019-01-07 Thread Richard Biener
On Mon, Jan 7, 2019 at 2:25 PM Richard Sandiford
 wrote:
>
> The IFN_GOMP_SIMD_LANE handling in vectorizable_store tries to use MEM_REF
> offsets to maintain pointer disambiguation info.  This patch makes sure
> that we don't try to do the same optimisation for IFN_MASK_STOREs, which
> have no similar offset argument.
>
> The patch fixes libgomp.c-c++-common/pr66199-*.c for SVE.  Previously
> we had an ncopies==2 store and stored both halves to the same address.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-01-07  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (vectorizable_store): Don't use the dataref_offset
> optimization for masked stores.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-01-04 11:39:27.190246648 +
> +++ gcc/tree-vect-stmts.c   2019-01-07 13:23:28.048406652 +
> @@ -7059,6 +7059,7 @@ vectorizable_store (stmt_vec_info stmt_i
>   bool simd_lane_access_p
> = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
>   if (simd_lane_access_p
> + && !loop_masks
>   && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
>   && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
>   && integer_zerop (DR_OFFSET (first_dr_info->dr))


Re: [PATCH] Fix PR85574

2019-01-07 Thread Richard Biener
On Thu, 3 Jan 2019, Richard Biener wrote:

> On Thu, 3 Jan 2019, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > The following rectifies a change during hash-map introduction which
> > > changed the uncprop operand_equal_p based hash to a pointer-based
> > > hash_map.  That assumes pointer equality between constants which
> > > does not hold since different (but compatible) typed constants
> > > can appear in the IL.  For SSA names the equivalence holds, of course.
> > >
> > > Fixing this should increase the number of eliminated const-copies
> > > on edges during out-of-SSA.  It also happens to fix the LTO
> > > bootstrap miscompare of cc1 and friends, but I can't really
> > > explain that.
> > >
> > > [LTO] bootstrapped and tested on x86_64-unknown-linux-gnu, applied
> > > to trunk.
> > >
> > > Richard.
> > >
> > > 2019-01-03  Jan Hubicka  
> > >
> > >   PR tree-optimization/85574
> > >   * tree-ssa-uncprop.c (struct equiv_hash_elt): Remove unused
> > >   structure.
> > >   (struct ssa_equip_hash_traits): Declare.
> > >   (val_ssa_equiv): Use custom hash traits using operand_equal_p.
> > >
> > > Index: gcc/tree-ssa-uncprop.c
> > > ===
> > > --- gcc/tree-ssa-uncprop.c(revision 267549)
> > > +++ gcc/tree-ssa-uncprop.c(working copy)
> > > @@ -268,21 +268,24 @@ associate_equivalences_with_edges (void)
> > > so with each value we have a list of SSA_NAMEs that have the
> > > same value.  */
> > >  
> > > -
> > > -/* Main structure for recording equivalences into our hash table.  */
> > > -struct equiv_hash_elt
> > > -{
> > > -  /* The value/key of this entry.  */
> > > -  tree value;
> > > -
> > > -  /* List of SSA_NAMEs which have the same value/key.  */
> > > -  vec equivalences;
> > > +/* Traits for the hash_map to record the value to SSA name equivalences
> > > +   mapping.  */
> > > +struct ssa_equip_hash_traits : default_hash_traits 
> > > +{
> > > +  static inline hashval_t hash (value_type value)
> > > +{ return iterative_hash_expr (value, 0); }
> > > +  static inline bool equal (value_type existing, value_type candidate)
> > > +{ return operand_equal_p (existing, candidate, 0); }
> > >  };
> > 
> > FWIW, this is a dup of tree_operand_hash.
> 
> Indeed.  Testing patch to do the obvious replacement.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.
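
For anyone following along, the difference between the two keying schemes
can be shown with a toy model outside of GCC.  Everything in the sketch
below is invented for illustration (the Constant struct, ValueHash and
ValueEqual); it is not how tree_operand_hash is implemented, it only shows
why a pointer-keyed map misses two distinct nodes that carry the same value.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

// Toy stand-in for a constant node: same value may appear with different
// (but compatible) types as distinct objects.
struct Constant
{
  int type_id;
  long value;
};

// Hash and equality on the value only; the type is deliberately ignored
// in this toy model.
struct ValueHash
{
  std::size_t operator()(const Constant* c) const
  { return std::hash<long>()(c->value); }
};

struct ValueEqual
{
  bool operator()(const Constant* a, const Constant* b) const
  { return a->value == b->value; }
};

// Keyed on pointer identity: two nodes with equal values are different keys.
using PointerKeyed = std::unordered_map<const Constant*, int>;

// Keyed on value: two nodes with equal values map to the same entry.
using ValueKeyed
  = std::unordered_map<const Constant*, int, ValueHash, ValueEqual>;
```

Inserting one node and looking up a second node with the same value fails
in the pointer-keyed map and succeeds in the value-keyed one.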

2019-01-07  Richard Biener  

* tree-ssa-uncprop.c (ssa_equip_hash_traits): Remove in favor
of tree_operand_hash.

Index: gcc/tree-ssa-uncprop.c
===
--- gcc/tree-ssa-uncprop.c  (revision 267553)
+++ gcc/tree-ssa-uncprop.c  (working copy)
@@ -268,19 +268,7 @@ associate_equivalences_with_edges (void)
so with each value we have a list of SSA_NAMEs that have the
same value.  */
 
-/* Traits for the hash_map to record the value to SSA name equivalences
-   mapping.  */
-struct ssa_equip_hash_traits : default_hash_traits 
-{
-  static inline hashval_t hash (value_type value)
-{ return iterative_hash_expr (value, 0); }
-  static inline bool equal (value_type existing, value_type candidate)
-{ return operand_equal_p (existing, candidate, 0); }
-};
-
-typedef hash_map,
-simple_hashmap_traits  > > val_ssa_equiv_t;
+typedef hash_map > val_ssa_equiv_t;
 
 /* Global hash table implementing a mapping from invariant values
to a list of SSA_NAMEs which have the same value.  We might be


Re: [GCC][middle-end] Add rules to strip away unneeded type casts in expressions (2nd patch)

2019-01-07 Thread Tamar Christina
Hi Marc

The 01/04/2019 17:50, Marc Glisse wrote:
> > +(convert:newtype (op (convert:newtype @1) (convert:newtype @2)))
> 
> The outer 'convert' is unnecessary, op already has the same type.
> 

Does it? The only comparison that has been done between the type of op and 
"type" is that
they are both a decimal floating point type. I don't see any reason why they 
have to be the
same type.

> > +   (nop:type (op (convert:ty1 @1) (convert:ty2 @2)
> 
> Please don't use 'nop' directly, use 'convert' instead. This line is very 
> suspicious, both arguments of op should have the same type. Specifying the 
> outertype should be unnecessary, it is always 'type'. And if necessary, I 
> expect '(convert:ty1 @1)' is the same as '{ arg0; }'.
> 

Ah I wasn't aware I could use arg0 here. I've updated the patch, though I don't
really find this clearer.

Bootstrapped and regtested again on aarch64-none-linux-gnu and 
x86_64-pc-linux-gnu and no issues.

Ok for trunk?

Thanks,
Tamar

> The explicit list of types is quite ugly, but since it already exists...
> 
> -- 
> Marc Glisse

-- 
diff --git a/gcc/convert.c b/gcc/convert.c
index 1a3353c870768a33fe22480ec97c7d3e0c504075..a16b7af0ec54693eb4f1e3a110aabc1aa18eb8df 100644
--- a/gcc/convert.c
+++ b/gcc/convert.c
@@ -295,92 +295,6 @@ convert_to_real_1 (tree type, tree expr, bool fold_p)
 	  return build1 (TREE_CODE (expr), type, arg);
 	}
 	  break;
-	/* Convert (outertype)((innertype0)a+(innertype1)b)
-	   into ((newtype)a+(newtype)b) where newtype
-	   is the widest mode from all of these.  */
-	case PLUS_EXPR:
-	case MINUS_EXPR:
-	case MULT_EXPR:
-	case RDIV_EXPR:
-	   {
-	 tree arg0 = strip_float_extensions (TREE_OPERAND (expr, 0));
-	 tree arg1 = strip_float_extensions (TREE_OPERAND (expr, 1));
-
-	 if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-		 && FLOAT_TYPE_P (TREE_TYPE (arg1))
-		 && DECIMAL_FLOAT_TYPE_P (itype) == DECIMAL_FLOAT_TYPE_P (type))
-	   {
-		  tree newtype = type;
-
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == SDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == SDmode
-		  || TYPE_MODE (type) == SDmode)
-		newtype = dfloat32_type_node;
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == DDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == DDmode
-		  || TYPE_MODE (type) == DDmode)
-		newtype = dfloat64_type_node;
-		  if (TYPE_MODE (TREE_TYPE (arg0)) == TDmode
-		  || TYPE_MODE (TREE_TYPE (arg1)) == TDmode
-		  || TYPE_MODE (type) == TDmode)
-newtype = dfloat128_type_node;
-		  if (newtype == dfloat32_type_node
-		  || newtype == dfloat64_type_node
-		  || newtype == dfloat128_type_node)
-		{
-		  expr = build2 (TREE_CODE (expr), newtype,
- convert_to_real_1 (newtype, arg0,
-			fold_p),
- convert_to_real_1 (newtype, arg1,
-			fold_p));
-		  if (newtype == type)
-			return expr;
-		  break;
-		}
-
-		  if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (newtype))
-		newtype = TREE_TYPE (arg0);
-		  if (TYPE_PRECISION (TREE_TYPE (arg1)) > TYPE_PRECISION (newtype))
-		newtype = TREE_TYPE (arg1);
-		  /* Sometimes this transformation is safe (cannot
-		 change results through affecting double rounding
-		 cases) and sometimes it is not.  If NEWTYPE is
-		 wider than TYPE, e.g. (float)((long double)double
-		 + (long double)double) converted to
-		 (float)(double + double), the transformation is
-		 unsafe regardless of the details of the types
-		 involved; double rounding can arise if the result
-		 of NEWTYPE arithmetic is a NEWTYPE value half way
-		 between two representable TYPE values but the
-		 exact value is sufficiently different (in the
-		 right direction) for this difference to be
-		 visible in ITYPE arithmetic.  If NEWTYPE is the
-		 same as TYPE, however, the transformation may be
-		 safe depending on the types involved: it is safe
-		 if the ITYPE has strictly more than twice as many
-		 mantissa bits as TYPE, can represent infinities
-		 and NaNs if the TYPE can, and has sufficient
-		 exponent range for the product or ratio of two
-		 values representable in the TYPE to be within the
-		 range of normal values of ITYPE.  */
-		  if (TYPE_PRECISION (newtype) < TYPE_PRECISION (itype)
-		  && (flag_unsafe_math_optimizations
-			  || (TYPE_PRECISION (newtype) == TYPE_PRECISION (type)
-			  && real_can_shorten_arithmetic (TYPE_MODE (itype),
-			  TYPE_MODE (type))
-			  && !excess_precision_type (newtype))))
-		{
-		  expr = build2 (TREE_CODE (expr), newtype,
- convert_to_real_1 (newtype, arg0,
-			fold_p),
- convert_to_real_1 (newtype, arg1,
-			fold_p));
-		  if (newtype == type)
-			return expr;
-		}
-	   }
-	   }
-	  break;
 	default:
 	  break;
   }
diff --git a/gcc/match.pd b/gcc/match.pd
index 97a94cd8b2f2e0fee9ffbc76c5277c97689b6f42..ef4d9dcbc10d48b893ae5b7df06127a6dc51856a 

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v4)

2019-01-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Sat, 5 Jan 2019 17:47:10 +0800, Chung-Lin Tang  
wrote:
> this is the current version of the oacc-* parts of the Async Re-work patch.
> 
> I have reverted away from the earlier mentioned attempt of using lockless
> techniques to manage the asyncqueues; it is really hard to do in a 100% 
> correct
> manner, unless we only use something like simple lists to manage them,
> which probably makes lookup unacceptably slow.
> 
> For now, I have changed to use the conventional locking and success/fail
> return codes for the synchronize/serialize hooks.

OK, thanks.


> I hope this is enough to pass
> and get committed.

Well, the "Properly handle wait clause with no arguments" changes still
need to be completed and go in first (to avoid introducing regressions),
and then I will have to see your whole set of changes that you intend to
commit: the bits you've incrementally posted still don't include several
of the changes I suggested and provided patches for (again, to avoid
introducing regressions).


But GCC now is in "regression and documentation fixes mode", so I fear
that it's too late now?


> --- oacc-async.c  (revision 267507)
> +++ oacc-async.c  (working copy)

> @@ -62,12 +158,10 @@ acc_wait (int async)

> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
> +  thr->dev->openacc.async.synchronize_func (aq);

Have to check the result here?  Like you're doing here, for example:

>  acc_wait_async (int async1, int async2)
>  {

> +  if (!thr->dev->openacc.async.synchronize_func (aq1))
> +gomp_fatal ("wait on %d failed", async1);
> +  if (!thr->dev->openacc.async.serialize_func (aq1, aq2))
> +gomp_fatal ("ordering of async ids %d and %d failed", async1, async2);

> --- oacc-parallel.c   (revision 267507)
> +++ oacc-parallel.c   (working copy)

> @@ -521,17 +500,22 @@ goacc_wait (int async, int num_waits, va_list *ap)

>if (async == acc_async_sync)
> - acc_wait (qid);
> + acc_dev->openacc.async.synchronize_func (aq);

Likewise?

>else if (qid == async)
> - ;/* If we're waiting on the same asynchronous queue as we're
> - launching on, the queue itself will order work as
> - required, so there's no need to wait explicitly.  */
> + /* If we're waiting on the same asynchronous queue as we're
> +launching on, the queue itself will order work as
> +required, so there's no need to wait explicitly.  */
> + ;
>else
> - acc_dev->openacc.async_wait_async_func (qid, async);
> + {
> +   goacc_aq aq2 = get_goacc_asyncqueue (async);
> +   acc_dev->openacc.async.synchronize_func (aq);
> +   acc_dev->openacc.async.serialize_func (aq, aq2);
> + }

Likewise?
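
To make the request concrete, here is a minimal, self-contained sketch of
the "check every hook result" pattern.  All names in it (demo_queue,
demo_hooks, demo_wait_async and the stub hooks) are made up for
illustration and are not the libgomp types; a status return stands in for
the gomp_fatal call that the real code would make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a plugin asyncqueue; not the libgomp type.  */
struct demo_queue
{
  bool broken;
};

/* Hypothetical hook table: both hooks report success as a bool.  */
struct demo_hooks
{
  bool (*synchronize) (struct demo_queue *);
  bool (*serialize) (struct demo_queue *, struct demo_queue *);
};

bool
demo_synchronize (struct demo_queue *q)
{
  return !q->broken;
}

bool
demo_serialize (struct demo_queue *q1, struct demo_queue *q2)
{
  return !q1->broken && !q2->broken;
}

const struct demo_hooks demo_default_hooks
  = { demo_synchronize, demo_serialize };

/* Return 0 on success, 1 if the wait failed, 2 if the ordering failed.
   Real code would report these cases with gomp_fatal instead.  */
int
demo_wait_async (const struct demo_hooks *h, struct demo_queue *q1,
		 struct demo_queue *q2)
{
  if (!h->synchronize (q1))
    return 1;
  if (!h->serialize (q1, q2))
    return 2;
  return 0;
}
```

The point is only that no call site drops the bool on the floor; the two
"Likewise?" spots above would follow the same shape.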


Also, I had to apply additional changes as attached, to make this build.


Regards
 Thomas


From e4c187a4be46682a989165c38bc6a8d8324554b9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 7 Jan 2019 13:25:18 +0100
Subject: [PATCH] [WIP] into async re-work: complete
 GOMP_OFFLOAD_openacc_async_synchronize, GOMP_OFFLOAD_openacc_async_serialize
 interface changes

---
 libgomp/libgomp-plugin.h  |  4 ++--
 libgomp/plugin/plugin-nvptx.c | 29 +
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index e3c031a282a1..ce3ae125e208 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -115,8 +115,8 @@ extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
 extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
 extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
 extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
-extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
-extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
+extern bool GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
+extern bool GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
 		  struct goacc_asyncqueue *);
 extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *,
 		   void (*)(void *), void *);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f42cbf488a79..12f87ba7be4d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1395,22 +1395,35 @@ GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *aq)
   return -1;
 }
 
-void
+bool
 GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *aq)
 {
-  //TODO Is this safe to call, or might this cause deadlock if something's locked?
-  CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);
+  CUresult r = CUDA_CALL_NOCHECK (cuStreamSynchronize, aq->cuda_stream);
+  return r == CUDA_SUCCESS;
 }
 
-void
+bool
 GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *aq1,
   struct goacc_asyncqueue *aq2)
 {
+  CUresult r;
   CUe

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2019-01-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Wed, 2 Jan 2019 20:46:12 +0800, Chung-Lin Tang wrote:
> Hi Thomas, Happy New Year,

Thanks!  If I remember right, you still have a few weeks until "your" New
Year/Spring Festival, right?


> On 2018/12/19 5:03 AM, Thomas Schwinge wrote:
> >> +
> >> +  if (!dev->openacc.async.asyncqueue[async])
> >> +{
> >> +  dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
> >> +
> >> +  if (!dev->openacc.async.asyncqueue[async])
> >> +  {
> >> +gomp_mutex_unlock (&dev->openacc.async.lock);
> >> +gomp_fatal ("async %d creation failed", async);
> >> +  }
> > That will now always fail for host fallback, where
> > "host_openacc_async_construct" just always does "return NULL".
> > 
> > Actually, if the device doesn't support asyncqueues, this whole function
> > should turn into some kind of no-op, so that we don't again and again try
> > to create a new one for every call to "lookup_goacc_asyncqueue".
> > 
> > I'm attaching one possible solution.  I think it's fine to assume that
> > the majority of devices will support asyncqueues, and for those that
> > don't, this is just a one-time overhead per async-argument.  So, no
> > special handling required in "lookup_goacc_asyncqueue".
> 
> > --- a/libgomp/oacc-host.c
> > +++ b/libgomp/oacc-host.c
> > @@ -212,7 +212,8 @@ host_openacc_async_queue_callback (struct goacc_asyncqueue *aq
> >   static struct goacc_asyncqueue *
> >   host_openacc_async_construct (void)
> >   {
> > -  return NULL;
> > +  /* We have to return non-NULL here, but it's OK to use a dummy.  */
> > +  return (struct goacc_asyncqueue *) -1;
> >   }
> 
> I'm not sure I understand the meaning of this? Is there any use to
> segfaulting somewhere else due to this 0x... pointer?

There will be no such dereferencing (and thus no such segfault), as you
(quite nicely!) made this an opaque data type to the generic code.
The concrete type is specific to, and only ever dereferenced inside each
plugin, and the "host plugin" never dereferences it, so returning minus
one here only serves as a non-NULL value/identifier to the generic code.
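
A small sketch of why the dummy is harmless, assuming a simplified lookup
table.  The demo_* names are hypothetical and this is not the libgomp
code: the generic side only ever compares the pointer against NULL, and
because the dummy is non-NULL the constructor runs once per slot.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque to the generic code: only a plugin would know the layout.  */
struct demo_asyncqueue;

/* Hypothetical host-fallback constructor: a dummy non-NULL identifier
   that nobody ever dereferences.  */
struct demo_asyncqueue *
demo_host_construct (void)
{
  return (struct demo_asyncqueue *) -1;
}

#define DEMO_MAX_ASYNC 4
struct demo_asyncqueue *demo_slots[DEMO_MAX_ASYNC];
int demo_construct_calls;

/* Generic lookup: create the queue on first use of a slot, then reuse
   it.  The pointer is compared against NULL but never dereferenced.  */
struct demo_asyncqueue *
demo_lookup (int async)
{
  if (!demo_slots[async])
    {
      demo_slots[async] = demo_host_construct ();
      demo_construct_calls++;
    }
  return demo_slots[async];
}
```

With a constructor that returned NULL instead, every demo_lookup call
would take the creation path again, which is the repeated work mentioned
earlier in this thread.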

> A feature of a NULL asyncqueue should mean that it is simply synchronous

OK, then that should be documented, and as I mentioned above, the
"lookup" code be adjusted so that it doesn't again and again try to
create an asyncqueue when the "construct" function returns NULL.

> however this does somewhat
> conflict with the case of async.construct_func() returning NULL on error...
> 
> Perhaps, again using an explicit success code as the return value (and
> return asyncqueue using an out parameter)?

Sure, that's also fine.  I just did it as presented above, because of its
simplicity, and to avoid adjusting the "lookup" code, as mentioned above.
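
For comparison, the out-parameter variant could look roughly like this.
It is a sketch under assumptions: demo_aq, demo_construct and its two
boolean knobs are invented for the example and do not exist in libgomp.
The return value says whether construction succeeded, and a NULL queue
with a successful return means "this device is simply synchronous".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue type, private to the "plugin" in this sketch.  */
struct demo_aq
{
  int id;
};

/* Return true on success and store the queue in *OUT.  *OUT is NULL on
   success when the device has no asyncqueues, so NULL no longer has to
   double as the error indicator.  */
bool
demo_construct (bool device_has_queues, bool simulate_error,
		struct demo_aq **out)
{
  static struct demo_aq the_queue = { 1 };

  *out = NULL;
  if (simulate_error)
    return false;
  if (device_has_queues)
    *out = &the_queue;
  return true;
}
```

This separates "no queue" from "creation failed", at the cost of touching
the lookup code as discussed above.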


Regards
 Thomas


Re: [PATCH 2/2] PR libstdc++/86756 Move rest of std::filesystem to libstdc++.so

2019-01-07 Thread Christophe Lyon
On Mon, 7 Jan 2019 at 13:39, Jonathan Wakely  wrote:
>
> On 07/01/19 09:48 +, Jonathan Wakely wrote:
> >On 07/01/19 10:24 +0100, Christophe Lyon wrote:
> >>Hi Jonathan
> >>
> >>On Sun, 6 Jan 2019 at 23:37, Jonathan Wakely  wrote:
> >>>
> >>>Move std::filesystem directory iterators and operations from
> >>>libstdc++fs.a to main libstdc++ library. These components have many
> >>>dependencies on OS support, which is not available on all targets. Some
> >>>additional autoconf checks and conditional compilation is needed to
> >>>ensure the files will build for all targets. Previously this code was
> >>>not compiled without --enable-libstdcxx-filesystem-ts but the C++17
> >>>components should be available for all hosted builds.
> >>>
> >>>The tests for these components no longer need to link to libstdc++fs.a,
> >>>but are not expected to pass on all targets. To avoid numerous failures
> >>>on targets which are not expected to pass the tests (due to missing OS
> >>>functionality) leave the dg-require-filesystem-ts directives in place
> >>>for now. This will ensure the tests only run for builds where the
> >>>filesystem-ts library is built, which presumably means some level of OS
> >>>support is present.
> >>>
> >>>
> >>>Tested x86_64-linux (old/new string ABIs, 32/64 bit), x86_64-w64-mingw32.
> >>>
> >>>Committed to trunk.
> >>>
> >>
> >>After this commit (r267616), I've noticed build failures for my
> >>newlib-based toolchains:
> >>aarch64-elf, arm-eabi:
> >>
> >>In file included from
> >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:57:
> >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/../filesystem/ops-common.h:142:11:
> >>error: '::truncate' has not been declared
> >> 142 |   using ::truncate;
> >> |   ^~~~
> >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:
> >>In function 'void std::filesystem::resize_file(const
> >>std::filesystem::__cxx11::path&, uintmax_t, std::error_code&)':
> >>/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:1274:19:
> >>error: 'truncate' is not a member of 'posix'
> >>1274 |   else if (posix::truncate(p.c_str(), size))
> >> |   ^~~~
> >>make[5]: *** [fs_ops.lo] Error 1
> >>
> >>I'm not sure if there's an obvious fix? Note that I'm using a rather
> >>old newlib version, if that matters.
> >
> >That's probably the reason, as I didn't see this in my tests with
> >newlib builds.
> >
> >The fix is to add yet another autoconf check and guard the uses of
> >truncate with a _GLIBCXX_USE_TRUNCATE macro. I'll do that now ...
>
>
> Should be fixed with this patch, committed to trunk as r267647.
>

Yes, it works. Thanks!

Christophe


Re: Add new --param knobs for inliner

2019-01-07 Thread Ulrich Weigand
Jan Hubicka wrote:

> uninlined-* should be useful for architectures with greater function
> call overhead than modern x86 chips (which is a good portion of them,
> especially s390 as I learnt on Cauldron). It would be nice to benchmark
> the effect of those and tune the defaults in config/* files. I think this
> is a reasonable way to deal with architectural differences without making
> the inliner hard to tune in the long term.

Thanks for the heads-up!  This looks interesting, we'll have a look.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-07 Thread H.J. Lu
On Sun, Dec 30, 2018 at 8:40 AM H.J. Lu  wrote:
>
> On Wed, Nov 28, 2018 at 12:17 PM Jeff Law  wrote:
> >
> > On 11/28/18 12:48 PM, H.J. Lu wrote:
> > > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka  wrote:
> > >>
> > >>> On 11/5/18 7:21 AM, Jan Hubicka wrote:
> > >
> > > Did you mean "the nearest common dominator"?
> > 
> >  If the nearest common dominator appears in the loop while all uses are
> >  out of loops, this will result in suboptimal xor placement.
> >  In this case you want to split edges out of the loop.
> > 
> >  In general this is what the LCM framework will do for you if the
> >  problem is modelled in a similar way as in mode_switching.  At function
> >  entry the mode is "no zero register needed" and all conversions need
> >  mode "zero register needed".  Mode switching should then make the
> >  correct placement decisions (reaching the minimal number of executions
> >  of xor).
> > 
> >  Jeff, what is your opinion on the approach taken by the patch?
> >  It seems like a special case of a more general issue, but I do not see
> >  a very elegant way to solve it at least in the GCC 9 horizon, so if
> >  the placement is correct we can probably go either with a new pass or
> >  making this part of mode switching (which is anyway run by x86 backend)
> > >>> So I haven't followed this discussion at all, but did touch on this
> > >>> issue with some patch a month or two ago with a target patch that was
> > >>> trying to avoid the partial stalls.
> > >>>
> > >>> My assumption is that we're trying to find one or more places to
> > >>> initialize the upper half of an avx register so as to avoid partial
> > >>> register stall at existing sites that set the upper half.
> > >>>
> > >>> This sounds like a classic PRE/LCM style problem (of which mode
> > >>> switching is just another variant).   A common-dominator approach is
> > >>> closer to a classic GCSE and is going to result in more initializations
> > >>> at sub-optimal points than a PRE/LCM style.
> > >>
> > >> yes, it is the usual code placement problem. It is a special case because
> > >> the zero register is not modified by the conversion (we just need to have
> > >> zero somewhere).  So basically we do not have kills to the zero except
> > >> for entry block.
> > >>
> > >
> > > Do you have a testcase to show that the nearest common dominator
> > > in the loop, while all uses are out of loops, leads to suboptimal xor
> > > placement?
> > I don't have a testcase, but it's all but certain nearest common
> > dominator is going to be a suboptimal placement.  That's going to create
> > paths where you're going to emit the xor when it's not used.
> >
> > The whole point of the LCM algorithms is they are optimal in terms of
> > expression evaluations.
>
> We tried LCM and it didn't work well for this case.  LCM places a single
> VXOR close to the location where it is needed, which can be inside a
> loop.  There is nothing wrong with the LCM algorithms.   But this doesn't solve
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
>
> where VXOR is executed multiple times inside of a function, instead of
> just once.   We are investigating generating a single VXOR at entry of the
> nearest dominator for basic blocks with SF/DF conversions, which is in
> the fake loop that contains the whole function:
>
>   bb = nearest_common_dominator_for_set (CDI_DOMINATORS,
>  convert_bbs);
>   while (bb->loop_father->latch
>  != EXIT_BLOCK_PTR_FOR_FN (cfun))
> bb = get_immediate_dominator (CDI_DOMINATORS,
>   bb->loop_father->header);
>
>   insn = BB_HEAD (bb);
>   if (!NONDEBUG_INSN_P (insn))
> insn = next_nonnote_nondebug_insn (insn);
>   set = gen_rtx_SET (v4sf_const0, CONST0_RTX (V4SFmode));
>   set_insn = emit_insn_before (set, insn);
>
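
As a rough illustration of the placement rule in the quoted snippet, here
is a toy version on a hand-written dominator tree.  Everything in it is
hypothetical (the demo_* names, the idom and loop_depth arrays), and it
simplifies the real walk: instead of jumping to the immediate dominator of
each enclosing loop header, it climbs the dominator tree one block at a
time until it reaches a block outside every real loop.  It assumes the
entry block is its own immediate dominator and is not inside a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Depth of block B in the dominator tree; the entry block is its own
   immediate dominator and has depth 0.  */
int
demo_dom_depth (const int *idom, int b)
{
  int depth = 0;
  while (idom[b] != b)
    {
      b = idom[b];
      depth++;
    }
  return depth;
}

/* Nearest common dominator of blocks A and B: lift the deeper block to
   the same depth, then climb both until they meet.  */
int
demo_nearest_common_dominator (const int *idom, int a, int b)
{
  int da = demo_dom_depth (idom, a);
  int db = demo_dom_depth (idom, b);
  while (da > db)
    {
      a = idom[a];
      da--;
    }
  while (db > da)
    {
      b = idom[b];
      db--;
    }
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}

/* Block in which a single zeroing instruction would be placed: the
   nearest common dominator of all conversion blocks, hoisted until it
   is outside every real loop (loop_depth 0).  */
int
demo_vxor_block (const int *idom, const int *loop_depth,
		 const int *convert_bbs, size_t n)
{
  int bb = convert_bbs[0];
  for (size_t i = 1; i < n; i++)
    bb = demo_nearest_common_dominator (idom, bb, convert_bbs[i]);
  while (loop_depth[bb] > 0)
    bb = idom[bb];
  return bb;
}
```

For a CFG with entry 0, loop header 1, loop body 2 and loop exit 3
(idom = {0, 0, 1, 1}, loop_depth = {0, 1, 1, 0}), conversions in blocks 2
and 3 have nearest common dominator 1, which is inside the loop, so the
sketch hoists the placement to block 0.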

Here is the updated patch.  OK for trunk?

Thanks.

-- 
H.J.
From 6eca7dbf282d7e2a5cde41bffeca66195d72d48e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jan 2019 05:44:59 -0800
Subject: [PATCH] i386: Add pass_remove_partial_avx_dependency

With -mavx, for

$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

we need to generate

	vxorp[ds]	%xmmN, %xmmN, %xmmN
	...
	vcvtss2sd	f(%rip), %xmmN, %xmmX
	...
	vcvtsi2ss	i(%rip), %xmmN, %xmmY

to avoid partial XMM register stall.  This patch adds a pass to generate
a single

	vxorps		%xmmN, %xmmN, %xmmN

at entry of the nearest dominator for basic blocks with SF/DF conversions,
which is in the fake loop that contains the whole function, instead of
generating one

	vxorp[ds]	%xmmN, %xmmN, %xmmN

for each SF/DF conversion.

NB: The LCM algorithm isn't appropriate here since it may place a vxorps
inside the loop.  A simple testcase shows this:

$ cat badcase.c

extern float f;
extern double d;

void
foo (int n, int k)
{
  for (int j = 0; j 

[PATCH, testsuite] Require alias support in three tests.

2019-01-07 Thread Iain Sandoe
Hi,

These three tests fail on targets without alias support.

OK to apply?

Iain


gcc/testsuite/

* gcc.dg/Wmissing-attributes.c: Require alias support.
* gcc.dg/attr-copy-2.c: Likewise.
* gcc.dg/attr-copy-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/Wmissing-attributes.c | 1 +
 gcc/testsuite/gcc.dg/attr-copy-2.c | 1 +
 gcc/testsuite/gcc.dg/attr-copy-5.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/Wmissing-attributes.c b/gcc/testsuite/gcc.dg/Wmissing-attributes.c
index 2a981828a2..049a4c095b 100644
--- a/gcc/testsuite/gcc.dg/Wmissing-attributes.c
+++ b/gcc/testsuite/gcc.dg/Wmissing-attributes.c
@@ -1,5 +1,6 @@
 /* PR middle-end/81824 - Warn for missing attributes with function aliases
{ dg-do compile }
+   { dg-require-alias "" }
{ dg-options "-Wall" } */
 
 #define ATTR(list)   __attribute__ (list)
diff --git a/gcc/testsuite/gcc.dg/attr-copy-2.c b/gcc/testsuite/gcc.dg/attr-copy-2.c
index 39e5f087de..8564811fc0 100644
--- a/gcc/testsuite/gcc.dg/attr-copy-2.c
+++ b/gcc/testsuite/gcc.dg/attr-copy-2.c
@@ -1,6 +1,7 @@
 /* PR middle-end/81824 - Warn for missing attributes with function aliases
Exercise attribute copy for functions.
{ dg-do compile }
+   { dg-require-alias "" }
{ dg-options "-O2 -Wall" } */
 
 #define Assert(expr)   typedef char AssertExpr[2 * !!(expr) - 1]
diff --git a/gcc/testsuite/gcc.dg/attr-copy-5.c b/gcc/testsuite/gcc.dg/attr-copy-5.c
index b085cf968a..24cd6e7aa7 100644
--- a/gcc/testsuite/gcc.dg/attr-copy-5.c
+++ b/gcc/testsuite/gcc.dg/attr-copy-5.c
@@ -3,6 +3,7 @@
copied.  Also verify that copying attribute tls_model to a non-thread
variable triggers a warning.
{ dg-do compile }
+   { dg-require-alias "" }
{ dg-options "-Wall" }
{ dg-require-effective-target tls } */
 
-- 
2.17.1




Fix ICE in get_initial_defs_for_reduction (PR 88567)

2019-01-07 Thread Richard Sandiford
The use of "j" in:

  init = permute_results[number_of_vectors - j - 1];

was out-of-sync with the new flat loop structure.  Now that all that
reversing is gone, we can just use the result of duplicate_and_interleave
directly.

The other cases shouldn't be affected by postponing the insertion
of ctor_seq, since gimple_build* appends to the seq without clearing
it first (unlike some of the gimplify routines).

The ICE is already covered by gcc.dg/vect/pr63379.c.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2019-01-07  Richard Sandiford  

gcc/
PR middle-end/88567
* tree-vect-loop.c (get_initial_defs_for_reduction): Pass the
output vector directly to duplicate_and_interleave instead of
going through a temporary.  Postpone insertion of ctor_seq to
the end of the loop.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-01-04 11:39:26.674251052 +
+++ gcc/tree-vect-loop.c2019-01-07 13:23:22.924449595 +
@@ -4103,7 +4103,6 @@ get_initial_defs_for_reduction (slp_tree
   unsigned int group_size = stmts.length ();
   unsigned int i;
   struct loop *loop;
-  auto_vec permute_results;
 
   vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
 
@@ -4138,6 +4137,7 @@ get_initial_defs_for_reduction (slp_tree
   bool constant_p = true;
   tree_vector_builder elts (vector_type, nunits, 1);
   elts.quick_grow (nunits);
+  gimple_seq ctor_seq = NULL;
   for (j = 0; j < nunits * number_of_vectors; ++j)
 {
   tree op;
@@ -4163,7 +4163,6 @@ get_initial_defs_for_reduction (slp_tree
 
   if (number_of_places_left_in_vector == 0)
{
- gimple_seq ctor_seq = NULL;
  tree init;
  if (constant_p && !neutral_op
  ? multiple_p (TYPE_VECTOR_SUBPARTS (vector_type), nunits)
@@ -4189,16 +4188,11 @@ get_initial_defs_for_reduction (slp_tree
  else
{
  /* First time round, duplicate ELTS to fill the
-required number of vectors, then cherry pick the
-appropriate result for each iteration.  */
- if (vec_oprnds->is_empty ())
-   duplicate_and_interleave (&ctor_seq, vector_type, elts,
- number_of_vectors,
- permute_results);
- init = permute_results[number_of_vectors - j - 1];
+required number of vectors.  */
+ duplicate_and_interleave (&ctor_seq, vector_type, elts,
+   number_of_vectors, *vec_oprnds);
+ break;
}
- if (ctor_seq != NULL)
-   gsi_insert_seq_on_edge_immediate (pe, ctor_seq);
  vec_oprnds->quick_push (init);
 
  number_of_places_left_in_vector = nunits;
@@ -4207,6 +4201,8 @@ get_initial_defs_for_reduction (slp_tree
  constant_p = true;
}
 }
+  if (ctor_seq != NULL)
+gsi_insert_seq_on_edge_immediate (pe, ctor_seq);
 }
 
 


Fix IFN_MASK_STORE handling of IFN_GOMP_SIMD_LANE

2019-01-07 Thread Richard Sandiford
The IFN_GOMP_SIMD_LANE handling in vectorizable_store tries to use MEM_REF
offsets to maintain pointer disambiguation info.  This patch makes sure
that we don't try to do the same optimisation for IFN_MASK_STOREs, which
have no similar offset argument.

The patch fixes libgomp.c-c++-common/pr66199-*.c for SVE.  Previously
we had an ncopies==2 store and stored both halves to the same address.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2019-01-07  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_store): Don't use the dataref_offset
optimization for masked stores.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-01-04 11:39:27.190246648 +
+++ gcc/tree-vect-stmts.c   2019-01-07 13:23:28.048406652 +
@@ -7059,6 +7059,7 @@ vectorizable_store (stmt_vec_info stmt_i
  bool simd_lane_access_p
= STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
  if (simd_lane_access_p
+ && !loop_masks
  && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
  && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
  && integer_zerop (DR_OFFSET (first_dr_info->dr))


Re: [PATCH 3/2] Update documentation for C++17 filesystem library

2019-01-07 Thread Jonathan Wakely

This updates the manual to reflect the std::filesystem definitions
moving to the main library.

Committed to trunk.


commit ca7e342f4f8be3d32a58357b9d85b26fc2635fab
Author: Jonathan Wakely 
Date:   Mon Jan 7 12:45:04 2019 +

Update documentation for C++17 filesystem library

* doc/xml/manual/spine.xml: Update copyright years.
* doc/xml/manual/status_cxx2017.xml: Adjust note about -lstdc++fs.
* doc/xml/manual/using.xml: Remove requirement to link with -lstdc++fs
for C++17 filesystem library.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/spine.xml b/libstdc++-v3/doc/xml/manual/spine.xml
index b9f05e24f3d..2b6973ba0ae 100644
--- a/libstdc++-v3/doc/xml/manual/spine.xml
+++ b/libstdc++-v3/doc/xml/manual/spine.xml
@@ -26,6 +26,7 @@
 2016
 2017
 2018
+2019
 
   http://www.w3.org/1999/xlink"; xlink:href="https://www.fsf.org";>FSF
 
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 181dbe7a6ec..f3793083375 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -692,7 +692,7 @@ Feature-testing recommendations for C++.
8.1 
__has_include() ,
 	  __cpp_lib_filesystem >= 201603 
-	 (requires linking with -lstdc++fs)
+	 (GCC 8.x requires linking with -lstdc++fs)
   
 
 
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 63031c8a86d..2d44a739406 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -99,9 +99,7 @@
   -lstdc++fs
   Linking to libstdc++fs
 is required for use of the Filesystem library extensions in
-
-and the C++17 Filesystem library in
-.
+.
   
 
 


Re: [PATCH 2/2] PR libstdc++/86756 Move rest of std::filesystem to libstdc++.so

2019-01-07 Thread Jonathan Wakely

On 07/01/19 09:48 +, Jonathan Wakely wrote:

On 07/01/19 10:24 +0100, Christophe Lyon wrote:

Hi Jonathan

On Sun, 6 Jan 2019 at 23:37, Jonathan Wakely  wrote:


Move std::filesystem directory iterators and operations from
libstdc++fs.a to main libstdc++ library. These components have many
dependencies on OS support, which is not available on all targets. Some
additional autoconf checks and conditional compilation are needed to
ensure the files will build for all targets. Previously this code was
not compiled without --enable-libstdcxx-filesystem-ts but the C++17
components should be available for all hosted builds.

The tests for these components no longer need to link to libstdc++fs.a,
but are not expected to pass on all targets. To avoid numerous failures
on targets which are not expected to pass the tests (due to missing OS
functionality) leave the dg-require-filesystem-ts directives in place
for now. This will ensure the tests only run for builds where the
filesystem-ts library is built, which presumably means some level of OS
support is present.


Tested x86_64-linux (old/new string ABIs, 32/64 bit), x86_64-w64-mingw32.

Committed to trunk.



After this commit (r267616), I've noticed build failures for my
newlib-based toolchains:
aarch64-elf, arm-eabi:

In file included from
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:57:
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/../filesystem/ops-common.h:142:11:
error: '::truncate' has not been declared
142 |   using ::truncate;
|   ^~~~
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:
In function 'void std::filesystem::resize_file(const
std::filesystem::__cxx11::path&, uintmax_t, std::error_code&)':
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:1274:19:
error: 'truncate' is not a member of 'posix'
1274 |   else if (posix::truncate(p.c_str(), size))
|   ^~~~
make[5]: *** [fs_ops.lo] Error 1

I'm not sure if there's an obvious fix? Note that I'm using a rather
old newlib version, if that matters.


That's probably the reason, as I didn't see this in my tests with
newlib builds.

The fix is to add yet another autoconf check and guard the uses of
truncate with a _GLIBCXX_USE_TRUNCATE macro. I'll do that now ...



Should be fixed with this patch, committed to trunk as r267647.
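
For readers who only want the idea behind the fallback described in the
commit message below, here is a hedged sketch in plain C.  It is not the
committed code: demo_truncate_fallback is a made-up name, the length type
is simplified to long, and the choice of ENOTSUP for the unsupported case
is an assumption of this sketch.  It only handles truncation to zero
length, by reopening the file with O_TRUNC.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback for systems without truncate(): truncating to zero length
   is emulated by opening the file with O_TRUNC; any other length is
   reported as unsupported.  */
int
demo_truncate_fallback (const char *path, long length)
{
  if (length == 0)
    {
      int fd = open (path, O_WRONLY | O_TRUNC);
      if (fd == -1)
	return -1;
      close (fd);
      return 0;
    }
  errno = ENOTSUP;
  return -1;
}
```

A caller sees the same 0 / -1-with-errno contract as the POSIX function,
so code such as resize_file does not need its own #if around the call.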

commit 5f0f0401171507e887f9ba775bcf243d3b3aff91
Author: Jonathan Wakely 
Date:   Mon Jan 7 11:47:20 2019 +

Fix build for systems without POSIX truncate

Older versions of newlib do not provide truncate so add a configure
check for it, and provide a fallback definition.

There were also some missing exports in the linker script, which went
unnoticed because there are no tests for some functions. A new link-only
test checks that every filesystem operation function is defined by the
library.

* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check for truncate.
* config.h.in: Regenerate.
* config/abi/pre/gnu.ver: Order patterns for filesystem operations
alphabetically and add missing entries for copy_symlink,
hard_link_count, rename, and resize_file.
* configure: Regenerate.
* src/c++17/fs_ops.cc (resize_file): Remove #if so posix::truncate is
used unconditionally.
* src/filesystem/ops-common.h (__gnu_posix::truncate)
[!_GLIBCXX_HAVE_TRUNCATE]: Provide fallback definition that only
supports truncating to zero length.
* testsuite/27_io/filesystem/operations/all.cc: New test.
* testsuite/27_io/filesystem/operations/resize_file.cc: New test.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index ce91e495fab..8950e4c8872 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4589,6 +4589,19 @@ dnl
   AC_DEFINE(HAVE_SYMLINK, 1, [Define if symlink is available in <unistd.h>.])
 fi
 AC_MSG_RESULT($glibcxx_cv_symlink)
+dnl
+AC_MSG_CHECKING([for truncate])
+AC_CACHE_VAL(glibcxx_cv_truncate, [dnl
+  GCC_TRY_COMPILE_OR_LINK(
+[#include <unistd.h>],
+[truncate("", 99);],
+[glibcxx_cv_truncate=yes],
+[glibcxx_cv_truncate=no])
+])
+if test $glibcxx_cv_truncate = yes; then
+  AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
+fi
+AC_MSG_RESULT($glibcxx_cv_truncate)
 dnl
 CXXFLAGS="$ac_save_CXXFLAGS"
 AC_LANG_RESTORE
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 20325bf7a33..02a6ec90375 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2167,31 +2167,35 @@ GLIBCXX_3.4.26 {
 _ZNSt10filesystem7__cxx114pathpLERKS1_;
 _ZT[IV]NSt10filesystem7__cxx1116filesystem_errorE;
 
-_ZNSt10filesystem10e

GCC 9 Status report (2019-01-07), trunk in regression and documentation fixes mode

2019-01-07 Thread Richard Biener


Status
======

Stage 3 is done now.

Changes of GCC trunk should now be restricted to regression and documentation
fixes.  That is, it is in the same mode as the open release branches we have.
As soon as the count of P1 bugs drops to zero (and un-categorized, aka P3
bugs have been categorized) you can expect trunk to branch and stage 1 open
for general development of GCC 10.  Do not hold your breath though, history
suggests you'll have to wait until mid-April for that to happen.

You can make it happen faster by fixing regressions.

Please also give your favorite target production-level quality testing
and make sure to file bugs about regressions you encounter.


Quality Data
============

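Priority          #   Change from GCC 8 stage3 -> stage4 transition
--------        ---   ---------------------------------------------
P1               42   +   6
P2              187   +  54
P3               47   -  10
P4              182   +  24
P5               25   -   2
--------        ---   ---------------------------------------------
Total P1-P3     276   +  50
Total           483   +  72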


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2018-11/msg00067.html


Re: [PATCH] genattrtab bit-rot, and if_then_else in values

2019-01-07 Thread Richard Sandiford
Alan Modra  writes:
> On Fri, Jan 04, 2019 at 12:18:03PM +, Richard Sandiford wrote:
>> Alan Modra  writes:
>> > On Thu, Jan 03, 2019 at 07:03:59PM +, Richard Sandiford wrote:
>> >> Richard Sandiford  writes:
>> >> > This still seems risky and isn't what the name and function comment
>> >
>> > OK, how about this delta from the previous patch to ameliorate the
>> > maintenance risk?
>> 
>> attr_value_alignment seems clearer, and means that we can handle
>> things like:
>> 
>>   (mult (symbol_ref "...") (const_int 4))
>
> OK, revised patch as follows, handling MINUS and MULT in the max/min
> value functions too.
>
>   * genattrtab.c (max_attr_value, min_attr_value, or_attr_value):
>   Delete "unknownp" parameter.  Adjust callers.  Handle
>   CONST_INT, PLUS, MINUS, and MULT.
>   (attr_value_aligned): Renamed from or_attr_value.
>   (min_attr_value): Return INT_MIN for unhandled rtl case..
>   (min_fn): ..and translate to INT_MAX here.
>   (write_length_unit_log): Modify to cope without "unknown".
>   (write_attr_value): Handle IF_THEN_ELSE.
>
> diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
> index 2cd04cdb06f..b8adf704009 100644
> --- a/gcc/genattrtab.c
> +++ b/gcc/genattrtab.c
> @@ -266,9 +266,9 @@ static int compares_alternatives_p (rtx);
>  static void make_internal_attr (const char *, rtx, int);
>  static void insert_insn_ent(struct attr_value *, struct insn_ent *);
>  static void walk_attr_value (rtx);
> -static int max_attr_value   (rtx, int*);
> -static int min_attr_value   (rtx, int*);
> -static int or_attr_value(rtx, int*);
> +static int max_attr_value   (rtx);
> +static int min_attr_value   (rtx);
> +static unsigned int attr_value_alignment (rtx);
>  static rtx simplify_test_exp(rtx, int, int);
>  static rtx simplify_test_exp_in_temp (rtx, int, int);
>  static rtx copy_rtx_unchanging  (rtx);
> @@ -1550,15 +1550,16 @@ one_fn (rtx exp ATTRIBUTE_UNUSED)
>  static rtx
>  max_fn (rtx exp)
>  {
> -  int unknown;
> -  return make_numeric_value (max_attr_value (exp, &unknown));
> +  return make_numeric_value (max_attr_value (exp));
>  }
>  
>  static rtx
>  min_fn (rtx exp)
>  {
> -  int unknown;
> -  return make_numeric_value (min_attr_value (exp, &unknown));
> +  int val = min_attr_value (exp);
> +  if (val < 0)
> +val = INT_MAX;
> +  return make_numeric_value (val);
>  }
>  
>  static void
> @@ -1568,24 +1569,21 @@ write_length_unit_log (FILE *outf)
>struct attr_value *av;
>struct insn_ent *ie;
>unsigned int length_unit_log, length_or;
> -  int unknown = 0;
>  
>if (length_attr)
>  {
> -  length_or = or_attr_value (length_attr->default_val->value, &unknown);
> +  length_or = attr_value_alignment (length_attr->default_val->value);
>for (av = length_attr->first_value; av; av = av->next)
>   for (ie = av->first_insn; ie; ie = ie->next)
> -   length_or |= or_attr_value (av->value, &unknown);
> -}
> +   length_or |= attr_value_alignment (av->value);
>  
> -  if (length_attr == NULL || unknown)
> -length_unit_log = 0;
> -  else
> -{
>length_or = ~length_or;
>for (length_unit_log = 0; length_or & 1; length_or >>= 1)
>   length_unit_log++;
>  }
> +  else
> +length_unit_log = 0;
> +
>fprintf (outf, "EXPORTED_CONST int length_unit_log = %u;\n", length_unit_log);
>  }
>  
> @@ -3753,11 +3751,12 @@ write_test_expr (FILE *outf, rtx exp, unsigned int attrs_cached, int flags,
>return attrs_cached;
>  }
>  
> -/* Given an attribute value, return the maximum CONST_STRING argument
> -   encountered.  Set *UNKNOWNP and return INT_MAX if the value is unknown.  */
> +/* Given an attribute value expression, return the maximum value that
> +   might be evaluated assuming all conditionals are independent.
> +   Return INT_MAX if the value can't be calculated by this function.  */

Not sure about "assuming all conditionals are independent".  All three
functions should be conservatively correct without any assumptions.

OK without that part if you agree.

Thanks,
Richard
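
For readers following the write_length_unit_log hunk quoted above, here is a minimal standalone sketch of the computation it keeps: OR all the lengths together, invert, and count the trailing one bits. It assumes every attribute value is a plain, known length; in the patch each contribution comes from attr_value_alignment, whose body is not shown in this message. The name length_unit_log_of and the array interface are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of genattrtab.c: return log2 of the
   largest power of two that divides every entry of LENGTHS.  */
unsigned int
length_unit_log_of (const unsigned int *lengths, size_t n)
{
  unsigned int length_or = 0;
  unsigned int length_unit_log;

  assert (lengths != NULL && n > 0);

  /* Any bit set in any length ends up set in LENGTH_OR.  */
  for (size_t i = 0; i < n; ++i)
    length_or |= lengths[i];

  /* After inversion, the low bits that were clear in every length are
     set; counting them gives the common alignment.  */
  length_or = ~length_or;
  for (length_unit_log = 0; length_or & 1; length_or >>= 1)
    length_unit_log++;

  return length_unit_log;
}
```

With lengths that are all multiples of 4 this returns 2, i.e. lengths could be expressed in units of 4 bytes; a single odd length forces the result to 0.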


Re: [PATCH v3 00/10] AMD GCN Port v3

2019-01-07 Thread Richard Biener
On Mon, Jan 7, 2019 at 11:48 AM Andrew Stubbs  wrote:
>
> New year ping!
>
> The last remaining middle-end patch was already applied, so it's only
> the backend, config, and testsuite patches remaining to be committed.
> And, it's mostly only the back end still requiring review.
>
> Hopefully I should still be able to get them reviewed and committed in
> time for GCC 9, given that disruption to other targets should no longer
> be an issue?

Yes, it's OK to add during stage4 once approved.

Richard.

> Andrew
>
> On 12/12/2018 11:52, Andrew Stubbs wrote:
> > This is the third rework of the patchset previously posted on September
> > 5th and November 16th. As before, the series contains the
> > non-OpenACC/OpenMP portions of a port to AMD GCN3 and GCN5 GPU
> > processors.  It's sufficient to build single-threaded programs, with
> > vectorization in the usual way.  C and Fortran are supported, C++ is not
> > supported, and the other front-ends have not been tested.  The
> > OpenACC/OpenMP/libgomp portion will follow, once this is committed,
> > eventually.
> >
> > Compared to the v2 patchset, patch 1, "Fix IRA ICE", has been dropped,
> > and a new, unrelated, patch 1 has been added: "Fix LRA bug".
> >
> > The IRA issue has now been solved by reworking the move instructions in
> > the back-end so that they no longer require explicit mention of the EXEC
> > register (this is now managed mostly by the md_reorg pass).  I also took
> > the opportunity to rework the EXEC use throughout the machine
> > description (something I've been wanting to get to for ages); the
> > primary instruction patterns no longer use vec_merge, and there are
> > "_exec" variants defined (mostly via define_subst) for the use of
> > specific expanders and so that combine can optimize conditional vector
> > moves.
> >
> > Additionally, the patterns that choose which unit to use for
> > scalar operations now only clobber the relevant condition register (via
> > a match_scratch), not both of them.
> >
> > The new LRA issue was exposed by the above changes, but would affect any
> > target where patterns referring to an eliminable register might also
> > include a "scratch" register.
> >
> > I've also addressed the various feedback I received from patch
> > reviewers.
> >
>


Re: [1/2] PR88598: Optimise x * { 0 or 1, 0 or 1, ... }

2019-01-07 Thread Richard Biener
On Fri, Jan 4, 2019 at 1:44 PM Richard Sandiford
 wrote:
>
> Jakub Jelinek  writes:
> > On Fri, Jan 04, 2019 at 12:13:13PM +, Richard Sandiford wrote:
> >> > Can we avoid the gratuitous use of template here?  We were told that C++ would
> >> > be used only when it makes things more straightforward and it's the contrary
> >> > in this case, to wit the need for the ugly RECURSE macro in the middle.
> >>
> >> I did it that way so that it would be easy to add things like
> >> zero_or_minus_onep without cut-&-pasting the whole structure.
> >
> > IMHO we can make such a change only when it is needed.
>
> The other predicates in tree.c suggest that we won't though.
> E.g. there was never any attempt to unify integer_zerop vs. integer_onep
> and real_zerop vs. real_onep.
>
> >> The way to do that in C would be to use a macro for the full
> >> function, but that's even uglier due to the extra backslashes.
> >
> > Or just make the function static inline and pass the function pointers
> > to it as arguments?  If it is inlined, it will be the same, it could be
> > even always_inline if that is really needed.
>
> For that to work for recursive functions I think we'd need to pass the
> caller predicate in too, which means one more function pointer overall.
>
> Anyway, here's the patch without the template.

OK.

Thanks,
Richard.
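
As a rough illustration of the match.pd fold named in the ChangeLog below (x * { 0 or 1, ... } to x & { 0 or -1, ... }), here is a scalar sketch on plain arrays. It only covers the integer case and assumes two's complement; the function names are hypothetical and this is not the code in the patch, which works on VECTOR_CSTs and also handles floating-point selectors through signed_or_unsigned_type_for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference form: multiply each element by a selector that is 0 or 1.  */
void
mul_by_selector (const int32_t *x, const int32_t *sel, int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      assert (sel[i] == 0 || sel[i] == 1);
      out[i] = x[i] * sel[i];
    }
}

/* Folded form: AND each element with a mask that is 0 or -1.
   Negating the selector maps 0 -> 0 and 1 -> -1 (all bits set).  */
void
and_with_mask (const int32_t *x, const int32_t *sel, int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      assert (sel[i] == 0 || sel[i] == 1);
      out[i] = x[i] & -sel[i];
    }
}
```

Both functions produce the same output for any selector made only of zeros and ones, which is the property initializer_each_zero_or_onep is there to establish before the fold is applied.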

> Thanks,
> Richard
>
>
> 2019-01-04  Richard Sandiford  
>
> gcc/
> PR tree-optimization/88598
> * tree.h (initializer_each_zero_or_onep): Declare.
> * tree.c (initializer_each_zero_or_onep): New function.
> (signed_or_unsigned_type_for): Handle float types too.
> (unsigned_type_for, signed_type_for): Update comments accordingly.
> * match.pd: Fold x * { 0 or 1, 0 or 1, ...} to
> x & { 0 or -1, 0 or -1, ... }.
>
> gcc/testsuite/
> PR tree-optimization/88598
> * gcc.dg/pr88598-1.c: New test.
> * gcc.dg/pr88598-2.c: Likewise.
> * gcc.dg/pr88598-3.c: Likewise.
> * gcc.dg/pr88598-4.c: Likewise.
> * gcc.dg/pr88598-5.c: Likewise.
>
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2019-01-04 12:40:51.0 +
> +++ gcc/tree.h  2019-01-04 12:40:51.990582844 +
> @@ -4506,6 +4506,7 @@ extern tree first_field (const_tree);
> combinations indicate definitive answers.  */
>
>  extern bool initializer_zerop (const_tree, bool * = NULL);
> +extern bool initializer_each_zero_or_onep (const_tree);
>
>  extern wide_int vector_cst_int_elt (const_tree, unsigned int);
>  extern tree vector_cst_elt (const_tree, unsigned int);
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2019-01-04 12:40:51.0 +
> +++ gcc/tree.c  2019-01-04 12:40:51.990582844 +
> @@ -11229,6 +11229,45 @@ initializer_zerop (const_tree init, bool
>  }
>  }
>
> +/* Return true if EXPR is an initializer expression in which every element
> +   is a constant that is numerically equal to 0 or 1.  The elements do not
> +   need to be equal to each other.  */
> +
> +bool
> +initializer_each_zero_or_onep (const_tree expr)
> +{
> +  STRIP_ANY_LOCATION_WRAPPER (expr);
> +
> +  switch (TREE_CODE (expr))
> +{
> +case INTEGER_CST:
> +  return integer_zerop (expr) || integer_onep (expr);
> +
> +case REAL_CST:
> +  return real_zerop (expr) || real_onep (expr);
> +
> +case VECTOR_CST:
> +  {
> +   unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (expr);
> +   if (VECTOR_CST_STEPPED_P (expr)
> +   && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (expr)).is_constant (&nelts))
> + return false;
> +
> +   for (unsigned int i = 0; i < nelts; ++i)
> + {
> +   tree elt = VECTOR_CST_ENCODED_ELT (expr, i);
> +   if (!initializer_each_zero_or_onep (elt))
> + return false;
> + }
> +
> +   return true;
> +  }
> +
> +default:
> +  return false;
> +}
> +}
> +
>  /* Check if vector VEC consists of all the equal elements and
> that the number of elements corresponds to the type of VEC.
> The function returns first element of the vector
> @@ -11672,7 +11711,10 @@ int_cst_value (const_tree x)
>
>  /* If TYPE is an integral or pointer type, return an integer type with
> the same precision which is unsigned iff UNSIGNEDP is true, or itself
> -   if TYPE is already an integer type of signedness UNSIGNEDP.  */
> +   if TYPE is already an integer type of signedness UNSIGNEDP.
> +   If TYPE is a floating-point type, return an integer type with the same
> +   bitsize and with the signedness given by UNSIGNEDP; this is useful
> +   when doing bit-level operations on a floating-point value.  */
>
>  tree
>  signed_or_unsigned_type_for (int unsignedp, tree type)
> @@ -11702,17 +11744,23 @@ signed_or_unsigned_type_for (int unsigne
>return build_complex_type (inner2);
>  }
>
> -  if (!IN

Re: [2/2] PR88598: Optimise reduc (bit_and)

2019-01-07 Thread Richard Biener
On Fri, Jan 4, 2019 at 12:49 PM Richard Sandiford
 wrote:
>
> This patch folds certain reductions of X & CST to X[I] & CST[I] if I is
> the only nonzero element of CST.  This includes the motivating case in
> which CST[I] is -1.
>
> We could do the same for REDUC_MAX on unsigned types, but I wasn't sure
> that that special case was worth it.
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?
>
> Obvious follow-ons include handling multiplications of
> single_nonzero_element constants, multiplications of uniform constants,
> and additions of any constant, but this is enough to fix the PR.

OK.

Thanks,
Richard.
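
To make the fold described above concrete, here is a plain-array sketch of the idea: if only element I of the constant is nonzero, every other term of the reduction is zero, so the reduction of X & CST collapses to X[I] & CST[I]. The helper names are hypothetical; the patch's single_nonzero_element works on VECTOR_CST trees (including variable-length encodings), which this sketch does not attempt to model.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Array analogue of single_nonzero_element: return the index of the only
   nonzero entry of CST, or -1 if there is none or more than one.  */
int
single_nonzero_index (const int *cst, size_t n)
{
  int res = -1;

  assert (n <= (size_t) INT_MAX);
  for (size_t i = 0; i < n; ++i)
    if (cst[i] != 0)
      {
        if (res >= 0)
          return -1;
        res = (int) i;
      }
  return res;
}

/* Unfolded reduction: sum of x[i] & cst[i] over all elements.  */
int
reduc_plus_and (const int *x, const int *cst, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; ++i)
    res += x[i] & cst[i];
  return res;
}

/* Folded form: a single AND when CST has exactly one nonzero element,
   otherwise fall back to the full reduction.  */
int
reduc_plus_and_folded (const int *x, const int *cst, size_t n)
{
  int i = single_nonzero_index (cst, n);
  if (i >= 0)
    return x[i] & cst[i];
  return reduc_plus_and (x, cst, n);
}
```

The same argument holds for the IOR and XOR reductions listed in the match.pd pattern, since zero is the identity for all three operations.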

> Richard
>
>
> 2019-01-04  Richard Sandiford  
>
> gcc/
> PR tree-optimization/88598
> * tree.h (single_nonzero_element): Declare.
> * tree.c (single_nonzero_element): New function.
> * match.pd: Fold certain reductions of X & CST to X[I] & CST[I]
> if I is the only nonzero element of CST.
>
> gcc/testsuite/
> PR tree-optimization/88598
> * gcc.dg/vect/pr88598-1.c: New test.
> * gcc.dg/vect/pr88598-2.c: Likewise.
> * gcc.dg/vect/pr88598-3.c: Likewise.
> * gcc.dg/vect/pr88598-4.c: Likewise.
> * gcc.dg/vect/pr88598-5.c: Likewise.
> * gcc.dg/vect/pr88598-6.c: Likewise.
>
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2019-01-04 11:40:33.141683783 +
> +++ gcc/tree.h  2019-01-04 11:40:36.581654424 +
> @@ -4522,6 +4522,8 @@ extern tree uniform_vector_p (const_tree
>
>  extern tree uniform_integer_cst_p (tree);
>
> +extern int single_nonzero_element (const_tree);
> +
>  /* Given a CONSTRUCTOR CTOR, return the element values as a vector.  */
>
>  extern vec *ctor_to_vec (tree);
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2019-01-04 11:40:33.141683783 +
> +++ gcc/tree.c  2019-01-04 11:40:36.577654458 +
> @@ -11355,6 +11355,38 @@ uniform_integer_cst_p (tree t)
>return NULL_TREE;
>  }
>
> +/* If VECTOR_CST T has a single nonzero element, return the index of that
> +   element, otherwise return -1.  */
> +
> +int
> +single_nonzero_element (const_tree t)
> +{
> +  unsigned HOST_WIDE_INT nelts;
> +  unsigned int repeat_nelts;
> +  if (VECTOR_CST_NELTS (t).is_constant (&nelts))
> +repeat_nelts = nelts;
> +  else if (VECTOR_CST_NELTS_PER_PATTERN (t) == 2)
> +{
> +  nelts = vector_cst_encoded_nelts (t);
> +  repeat_nelts = VECTOR_CST_NPATTERNS (t);
> +}
> +  else
> +return -1;
> +
> +  int res = -1;
> +  for (unsigned int i = 0; i < nelts; ++i)
> +{
> +  tree elt = vector_cst_elt (t, i);
> +  if (!integer_zerop (elt) && !real_zerop (elt))
> +   {
> + if (res >= 0 || i >= repeat_nelts)
> +   return -1;
> + res = i;
> +   }
> +}
> +  return res;
> +}
> +
>  /* Build an empty statement at location LOC.  */
>
>  tree
> Index: gcc/match.pd
> ===
> --- gcc/match.pd2019-01-04 11:40:33.137683817 +
> +++ gcc/match.pd2019-01-04 11:40:36.573654492 +
> @@ -5251,3 +5251,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> { wide_int_to_tree (sizetype, off); })
>   { swap_p ? @0 : @2; }))
> { rhs_tree; })
> +
> +/* Fold REDUC (@0 & @1) -> @0[I] & @1[I] if element I is the only nonzero
> +   element of @1.  */
> +(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
> + (simplify (reduc (view_convert? (bit_and @0 VECTOR_CST@1)))
> +  (with { int i = single_nonzero_element (@1); }
> +   (if (i >= 0)
> +(with { tree elt = vector_cst_elt (@1, i);
> +   tree elt_type = TREE_TYPE (elt);
> +   unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));
> +   tree size = bitsize_int (elt_bits);
> +   tree pos = bitsize_int (elt_bits * i); }
> + (view_convert
> +  (bit_and:elt_type
> +   (BIT_FIELD_REF:elt_type @0 { size; } { pos; })
> +   { elt; })))
> Index: gcc/testsuite/gcc.dg/vect/pr88598-1.c
> ===
> --- /dev/null   2018-12-31 11:20:29.178325188 +
> +++ gcc/testsuite/gcc.dg/vect/pr88598-1.c   2019-01-04 11:40:36.577654458 
> +
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +#include "tree-vect.h"
> +
> +#define N 4
> +
> +int a[N];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int b[N] = { 15, 0, 0, 0 }, res = 0;
> +  for (int i = 0; i < N; ++i)
> +res += a[i] & b[i];
> +  return res;
> +}
> +
> +int __attribute__ ((noipa))
> +f2 (void)
> +{
> +  int b[N] = { 0, 31, 0, 0 }, res = 0;
> +  for (int i = 0; i < N; ++i)
> +res += a[i] & b[i];
> +  return res;
> +}
> +
> +int __attribute__ ((noipa))
> +f3 (void)
> +{
> +  int b[N] = { 0, 0, 0, -1 }

Re: [PATCH v3 00/10] AMD GCN Port v3

2019-01-07 Thread Andrew Stubbs

New year ping!

The last remaining middle-end patch was already applied, so it's only
the backend, config, and testsuite patches remaining to be committed.
And, it's mostly only the back end still requiring review.

Hopefully I should still be able to get them reviewed and committed in
time for GCC 9, given that disruption to other targets should no longer
be an issue?

Andrew

On 12/12/2018 11:52, Andrew Stubbs wrote:
> This is the third rework of the patchset previously posted on September
> 5th and November 16th. As before, the series contains the
> non-OpenACC/OpenMP portions of a port to AMD GCN3 and GCN5 GPU
> processors.  It's sufficient to build single-threaded programs, with
> vectorization in the usual way.  C and Fortran are supported, C++ is not
> supported, and the other front-ends have not been tested.  The
> OpenACC/OpenMP/libgomp portion will follow, once this is committed,
> eventually.
>
> Compared to the v2 patchset, patch 1, "Fix IRA ICE", has been dropped,
> and a new, unrelated, patch 1 has been added: "Fix LRA bug".
>
> The IRA issue has now been solved by reworking the move instructions in
> the back-end so that they no longer require explicit mention of the EXEC
> register (this is now managed mostly by the md_reorg pass).  I also took
> the opportunity to rework the EXEC use throughout the machine
> description (something I've been wanting to get to for ages); the
> primary instruction patterns no longer use vec_merge, and there are
> "_exec" variants defined (mostly via define_subst) for the use of
> specific expanders and so that combine can optimize conditional vector
> moves.
>
> Additionally, the patterns that choose which unit to use for
> scalar operations now only clobber the relevant condition register (via
> a match_scratch), not both of them.
>
> The new LRA issue was exposed by the above changes, but would affect any
> target where patterns referring to an eliminable register might also
> include a "scratch" register.
>
> I've also addressed the various feedback I received from patch
> reviewers.




Re: [PATCH] Silence some dwarf2out UNSPEC related notes (PR debug/88723)

2019-01-07 Thread Richard Biener
On Mon, 7 Jan 2019, Jakub Jelinek wrote:

> Hi!
> 
> Apparently my recent patch turned on many non-delegitimized UNSPEC notes
> (it is a checking-only note that goes away in release builds, but anyway).
> 
> The problem is that formerly UNSPECs were diagnosed that way only when
> inside of CONST, but now also outside of them.  The patch keeps ignoring
> them if they don't have all constant arguments though as before, only tries
> to handle them if they have all constant arguments and thus normally would
> appear inside of CONST but we are processing parts of the CONST
> individually.
> 
> The following patch fixes it by first determining whether they have all
> CONSTANT_P arguments and, only if they do, following that with the
> const_ok_for_output_1 verification, which can emit this diagnostic.
> 
> The patch also removes what appears to be a badly applied patch in r255862:
> the posted patch contained just the addition of
>   if (CONST_POLY_INT_P (rtl)) return false;
> to the function, but the commit also added the hunk I'm removing, so we now have
>   if (targetm.const_not_ok_for_debug_p (rtl))
> {
>   if (GET_CODE (rtl) != UNSPEC)
>   {
> diagnostics;
> return false;
>   }
>   another diagnostics;
>   return false;
> }
>   if (CONST_POLY_INT_P (rtl))
> return false;
>   if (targetm.const_not_ok_for_debug_p (rtl))
> {
>   diagnostics;
>   return false;
> }
> Calling it twice will not help in any way.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on the
> testcase with -> sparc-solaris cross-compiler, ok for trunk?

OK.

Richard.

> 2019-01-07  Jakub Jelinek  
> 
>   PR debug/88723
>   * dwarf2out.c (const_ok_for_output_1): Remove redundant call to
>   const_not_ok_for_debug_p target hook.
>   (mem_loc_descriptor) : Only call const_ok_for_output_1
>   on UNSPEC and subexpressions thereof if all subexpressions of the
>   UNSPEC are CONSTANT_P.
> 
> --- gcc/dwarf2out.c.jj2019-01-05 12:10:36.630753817 +0100
> +++ gcc/dwarf2out.c   2019-01-06 21:33:58.583426865 +0100
> @@ -14445,13 +14445,6 @@ const_ok_for_output_1 (rtx rtl)
>if (CONST_POLY_INT_P (rtl))
>  return false;
>  
> -  if (targetm.const_not_ok_for_debug_p (rtl))
> -{
> -  expansion_failed (NULL_TREE, rtl,
> - "Expression rejected for debug by the backend.\n");
> -  return false;
> -}
> -
>/* FIXME: Refer to PR60655. It is possible for simplification
>   of rtl expressions in var tracking to produce such expressions.
>   We should really identify / validate expressions
> @@ -15660,8 +15653,17 @@ mem_loc_descriptor (rtx rtl, machine_mod
> bool not_ok = false;
> subrtx_var_iterator::array_type array;
> FOR_EACH_SUBRTX_VAR (iter, array, rtl, ALL)
> - if ((*iter != rtl && !CONSTANT_P (*iter))
> - || !const_ok_for_output_1 (*iter))
> + if (*iter != rtl && !CONSTANT_P (*iter))
> +   {
> + not_ok = true;
> + break;
> +   }
> +
> +   if (not_ok)
> + break;
> +
> +   FOR_EACH_SUBRTX_VAR (iter, array, rtl, ALL)
> + if (!const_ok_for_output_1 (*iter))
> {
>   not_ok = true;
>   break;
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH] Replace outdated references to x86_64-unknown-linux-gnu in docs

2019-01-07 Thread Jakub Jelinek
On Mon, Jan 07, 2019 at 09:40:30AM +, Jonathan Wakely wrote:
>   * doc/install.texi: Replace references to x86_64-unknown-linux-gnu
>   with x86_64-pc-linux-gnu.
> 
> OK for trunk?

Yes, thanks.

> commit 7586c65abcb0f0967a11639baf1d9332dbc0339c
> Author: Jonathan Wakely 
> Date:   Mon Jan 7 09:38:51 2019 +
> 
> Replace outdated references to x86_64-unknown-linux-gnu in docs
> 
> * doc/install.texi: Replace references to x86_64-unknown-linux-gnu
> with x86_64-pc-linux-gnu.
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 5cf007bd1ec..dd01e4caeb1 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -261,10 +261,10 @@ In order to build GCC, the C standard library and headers must be present
>  for all target variants for which target libraries will be built (and not
>  only the variant of the host C++ compiler).
>  
> -This affects the popular @samp{x86_64-unknown-linux-gnu} platform (among
> +This affects the popular @samp{x86_64-pc-linux-gnu} platform (among
>  other multilib targets), for which 64-bit (@samp{x86_64}) and 32-bit
>  (@samp{i386}) libc headers are usually packaged separately. If you do a
> -build of a native compiler on @samp{x86_64-unknown-linux-gnu}, make sure you
> +build of a native compiler on @samp{x86_64-pc-linux-gnu}, make sure you
>  either have the 32-bit libc developer package properly installed (the exact
>  name of the package depends on your distro) or you must build GCC as a
>  64-bit only compiler by configuring with the option
> @@ -2070,14 +2070,14 @@ host system architecture.  For the case that the linker has a
>  different (but run-time compatible) architecture, these flags can be
>  specified to build plugins that are compatible to the linker.  For
>  example, if you are building GCC for a 64-bit x86_64
> -(@samp{x86_64-unknown-linux-gnu}) host system, but have a 32-bit x86
> +(@samp{x86_64-pc-linux-gnu}) host system, but have a 32-bit x86
>  GNU/Linux (@samp{i686-pc-linux-gnu}) linker executable (which is
>  executable on the former system), you can configure GCC as follows for
>  getting compatible linker plugins:
>  
>  @smallexample
>  % @var{srcdir}/configure \
> ---host=x86_64-unknown-linux-gnu \
> +--host=x86_64-pc-linux-gnu \
>  --enable-linker-plugin-configure-flags=--host=i686-pc-linux-gnu \
>  --enable-linker-plugin-flags='CC=gcc\ -m32\ -Wl,-rpath,[...]/i686-pc-linux-gnu/lib'
>  @end smallexample


Jakub


Re: [PATCH 2/2] PR libstdc++/86756 Move rest of std::filesystem to libstdc++.so

2019-01-07 Thread Jonathan Wakely

On 07/01/19 10:24 +0100, Christophe Lyon wrote:

> Hi Jonathan
>
> On Sun, 6 Jan 2019 at 23:37, Jonathan Wakely  wrote:
> >
> > Move std::filesystem directory iterators and operations from
> > libstdc++fs.a to main libstdc++ library. These components have many
> > dependencies on OS support, which is not available on all targets. Some
> > additional autoconf checks and conditional compilation are needed to
> > ensure the files will build for all targets. Previously this code was
> > not compiled without --enable-libstdcxx-filesystem-ts but the C++17
> > components should be available for all hosted builds.
> >
> > The tests for these components no longer need to link to libstdc++fs.a,
> > but are not expected to pass on all targets. To avoid numerous failures
> > on targets which are not expected to pass the tests (due to missing OS
> > functionality) leave the dg-require-filesystem-ts directives in place
> > for now. This will ensure the tests only run for builds where the
> > filesystem-ts library is built, which presumably means some level of OS
> > support is present.
> >
> > Tested x86_64-linux (old/new string ABIs, 32/64 bit), x86_64-w64-mingw32.
> >
> > Committed to trunk.
> >
>
> After this commit (r267616), I've noticed build failures for my
> newlib-based toolchains:
> aarch64-elf, arm-eabi:
>
> In file included from
> /tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:57:
> /tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/../filesystem/ops-common.h:142:11:
> error: '::truncate' has not been declared
>  142 |   using ::truncate;
>  |   ^~~~
> /tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:
> In function 'void std::filesystem::resize_file(const
> std::filesystem::__cxx11::path&, uintmax_t, std::error_code&)':
> /tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:1274:19:
> error: 'truncate' is not a member of 'posix'
> 1274 |   else if (posix::truncate(p.c_str(), size))
>  |   ^~~~
> make[5]: *** [fs_ops.lo] Error 1
>
> I'm not sure if there's an obvious fix? Note that I'm using a rather
> old newlib version, if that matters.

That's probably the reason, as I didn't see this in my tests with
newlib builds.

The fix is to add yet another autoconf check and guard the uses of
truncate with a _GLIBCXX_USE_TRUNCATE macro. I'll do that now ...
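
The general shape of such a guard, sketched in plain C rather than in the libstdc++ sources: the call to truncate is compiled only when a configure-time macro says the function exists, and a fallback reports an error otherwise. The macro SKETCH_HAVE_TRUNCATE, the function name, and the choice of ENOSYS in the fallback are all assumptions for illustration; the actual fix uses _GLIBCXX_USE_TRUNCATE and may behave differently when truncate is missing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#ifdef SKETCH_HAVE_TRUNCATE
# include <unistd.h>
#endif

/* Hypothetical wrapper: resize PATH to SIZE bytes if the target provides
   truncate, otherwise fail with ENOSYS.  Returns 0 on success, -1 on
   error with errno set.  */
int
resize_file_sketch (const char *path, off_t size)
{
  assert (path != NULL);
#ifdef SKETCH_HAVE_TRUNCATE
  return truncate (path, size);
#else
  /* No truncate on this target: report the operation as unsupported
     instead of failing to compile.  */
  (void) size;
  errno = ENOSYS;
  return -1;
#endif
}
```

Built without -DSKETCH_HAVE_TRUNCATE, the function compiles on a target with no truncate declaration and simply reports failure at run time, which is the property the newlib builds above need.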



[PATCH] Silence some dwarf2out UNSPEC related notes (PR debug/88723)

2019-01-07 Thread Jakub Jelinek
Hi!

Apparently my recent patch turned on many non-delegitimized UNSPEC notes
(it is a checking-only note that goes away in release builds, but anyway).

The problem is that formerly UNSPECs were diagnosed that way only when
inside of CONST, but now also outside of them.  The patch keeps ignoring
them if they don't have all constant arguments though as before, only tries
to handle them if they have all constant arguments and thus normally would
appear inside of CONST but we are processing parts of the CONST
individually.

The following patch fixes it by first determining whether they have all
CONSTANT_P arguments and, only if they do, following that with the
const_ok_for_output_1 verification, which can emit this diagnostic.

The patch also removes what appears to be a badly applied patch in r255862:
the posted patch contained just the addition of
  if (CONST_POLY_INT_P (rtl)) return false;
to the function, but the commit also added the hunk I'm removing, so we now have
  if (targetm.const_not_ok_for_debug_p (rtl))
{
  if (GET_CODE (rtl) != UNSPEC)
{
  diagnostics;
  return false;
}
  another diagnostics;
  return false;
}
  if (CONST_POLY_INT_P (rtl))
return false;
  if (targetm.const_not_ok_for_debug_p (rtl))
{
  diagnostics;
  return false;
}
Calling it twice will not help in any way.

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on the
testcase with -> sparc-solaris cross-compiler, ok for trunk?

2019-01-07  Jakub Jelinek  

PR debug/88723
* dwarf2out.c (const_ok_for_output_1): Remove redundant call to
const_not_ok_for_debug_p target hook.
(mem_loc_descriptor) : Only call const_ok_for_output_1
on UNSPEC and subexpressions thereof if all subexpressions of the
UNSPEC are CONSTANT_P.

--- gcc/dwarf2out.c.jj  2019-01-05 12:10:36.630753817 +0100
+++ gcc/dwarf2out.c 2019-01-06 21:33:58.583426865 +0100
@@ -14445,13 +14445,6 @@ const_ok_for_output_1 (rtx rtl)
   if (CONST_POLY_INT_P (rtl))
 return false;
 
-  if (targetm.const_not_ok_for_debug_p (rtl))
-{
-  expansion_failed (NULL_TREE, rtl,
-   "Expression rejected for debug by the backend.\n");
-  return false;
-}
-
   /* FIXME: Refer to PR60655. It is possible for simplification
  of rtl expressions in var tracking to produce such expressions.
  We should really identify / validate expressions
@@ -15660,8 +15653,17 @@ mem_loc_descriptor (rtx rtl, machine_mod
  bool not_ok = false;
  subrtx_var_iterator::array_type array;
  FOR_EACH_SUBRTX_VAR (iter, array, rtl, ALL)
-   if ((*iter != rtl && !CONSTANT_P (*iter))
-   || !const_ok_for_output_1 (*iter))
+   if (*iter != rtl && !CONSTANT_P (*iter))
+ {
+   not_ok = true;
+   break;
+ }
+
+ if (not_ok)
+   break;
+
+ FOR_EACH_SUBRTX_VAR (iter, array, rtl, ALL)
+   if (!const_ok_for_output_1 (*iter))
  {
not_ok = true;
break;


Jakub
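
The reordering in the mem_loc_descriptor hunk above can be sketched outside of GCC as a two-pass check: a cheap, silent pass first rejects any UNSPEC with a non-constant operand, and only if that pass succeeds does the second pass run the validator that may emit notes. Everything below is a hypothetical model (struct subexpr, noisy_validate, and the counter are stand-ins for rtx subexpressions, const_ok_for_output_1, and its diagnostics), not dwarf2out.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one subexpression of an UNSPEC.  */
struct subexpr
{
  bool is_constant;    /* plays the role of CONSTANT_P */
  bool ok_for_output;  /* what the noisy validator would decide */
};

/* Counts validator runs; stands in for the notes it might print.  */
int noisy_checks;

static bool
noisy_validate (const struct subexpr *e)
{
  ++noisy_checks;
  return e->ok_for_output;
}

/* PARTS[0] models the UNSPEC itself, PARTS[1..N-1] its subexpressions.  */
bool
unspec_ok_two_pass (const struct subexpr *parts, size_t n)
{
  assert (parts != NULL && n > 0);

  /* Pass 1: silent.  Skip the UNSPEC itself, as the patch does with
     "*iter != rtl", and bail out on the first non-constant operand.  */
  for (size_t i = 1; i < n; ++i)
    if (!parts[i].is_constant)
      return false;

  /* Pass 2: only now run the validator that can emit diagnostics.  */
  for (size_t i = 0; i < n; ++i)
    if (!noisy_validate (&parts[i]))
      return false;

  return true;
}
```

With a single combined loop, the noisy validator could run on early subexpressions before a later non-constant one was found; splitting the loop is what keeps those notes from appearing.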


Re: [PATCH] Add vec_extract{v32qiv16qi,v16hiv8hi,v8siv4si,v4div2di,v8sfv4sf,v4dfv2df}

2019-01-07 Thread Uros Bizjak
On Sun, Jan 6, 2019 at 11:33 AM Jakub Jelinek  wrote:
>
> Hi!
>
> Looking at the output of builtin-convertvector-1.c (f4), this patch changes
> the generated code:
> vcvttpd2dqy (%rdi), %xmm0
> -   vmovdqa %xmm0, %xmm0
> vmovaps %xmm0, (%rsi)
> -   vzeroupper
> ret
> The problem is that without vec_extract patterns to extract 128-bit vectors
> from 256-bit ones, the expander creates TImode extraction and combine +
> simplify-rtx.c isn't able to optimize it out properly due to vector ->
> non-vector -> vector mode subregs in there.
> We already have vec_extract patterns to extract 256-bit vectors from 512-bit
> ones and we have all the vec_extract_{lo,hi}_* named insns even for the
> 128-bit out of 256-bit vectors, so this patch just makes those available to
> the expander.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-01-06  Jakub Jelinek  
>
> * config/i386/sse.md (vec_extract): Use
> V_256_512 iterator instead of V_512 and TARGET_AVX instead of
> TARGET_AVX512F as condition.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-01-04 09:56:08.548495229 +0100
> +++ gcc/config/i386/sse.md  2019-01-05 21:33:34.057288059 +0100
> @@ -8362,9 +8362,9 @@ (define_expand "vec_extract
>  (define_expand "vec_extract"
>[(match_operand: 0 "nonimmediate_operand")
> -   (match_operand:V_512 1 "register_operand")
> +   (match_operand:V_256_512 1 "register_operand")
> (match_operand 2 "const_0_to_1_operand")]
> -  "TARGET_AVX512F"
> +  "TARGET_AVX"
>  {
>if (INTVAL (operands[2]))
>  emit_insn (gen_vec_extract_hi_ (operands[0], operands[1]));
>
> Jakub


[PATCH] Replace outdated references to x86_64-unknown-linux-gnu in docs

2019-01-07 Thread Jonathan Wakely

* doc/install.texi: Replace references to x86_64-unknown-linux-gnu
with x86_64-pc-linux-gnu.

OK for trunk?


commit 7586c65abcb0f0967a11639baf1d9332dbc0339c
Author: Jonathan Wakely 
Date:   Mon Jan 7 09:38:51 2019 +

Replace outdated references to x86_64-unknown-linux-gnu in docs

* doc/install.texi: Replace references to x86_64-unknown-linux-gnu
with x86_64-pc-linux-gnu.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5cf007bd1ec..dd01e4caeb1 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -261,10 +261,10 @@ In order to build GCC, the C standard library and headers must be present
 for all target variants for which target libraries will be built (and not
 only the variant of the host C++ compiler).
 
-This affects the popular @samp{x86_64-unknown-linux-gnu} platform (among
+This affects the popular @samp{x86_64-pc-linux-gnu} platform (among
 other multilib targets), for which 64-bit (@samp{x86_64}) and 32-bit
 (@samp{i386}) libc headers are usually packaged separately. If you do a
-build of a native compiler on @samp{x86_64-unknown-linux-gnu}, make sure you
+build of a native compiler on @samp{x86_64-pc-linux-gnu}, make sure you
 either have the 32-bit libc developer package properly installed (the exact
 name of the package depends on your distro) or you must build GCC as a
 64-bit only compiler by configuring with the option
@@ -2070,14 +2070,14 @@ host system architecture.  For the case that the linker has a
 different (but run-time compatible) architecture, these flags can be
 specified to build plugins that are compatible to the linker.  For
 example, if you are building GCC for a 64-bit x86_64
-(@samp{x86_64-unknown-linux-gnu}) host system, but have a 32-bit x86
+(@samp{x86_64-pc-linux-gnu}) host system, but have a 32-bit x86
 GNU/Linux (@samp{i686-pc-linux-gnu}) linker executable (which is
 executable on the former system), you can configure GCC as follows for
 getting compatible linker plugins:
 
 @smallexample
 % @var{srcdir}/configure \
---host=x86_64-unknown-linux-gnu \
+--host=x86_64-pc-linux-gnu \
 --enable-linker-plugin-configure-flags=--host=i686-pc-linux-gnu \
 --enable-linker-plugin-flags='CC=gcc\ -m32\ -Wl,-rpath,[...]/i686-pc-linux-gnu/lib'
 @end smallexample


Re: [PATCH] restore CFString handling in attribute format (PR 88638)

2019-01-07 Thread Iain Sandoe


> On 6 Jan 2019, at 23:34, Martin Sebor  wrote:
> 
> Attached is an updated patch with the wording change to the manual
> discussed below and rebased on the top of today's trunk.

Works for me as well.
thanks for the patch.

Iain

> 
> Martin
> 
> PS Thanks for the additional info, Iain.
> 
> On 1/5/19 10:53 AM, Iain Sandoe wrote:
>>> On 5 Jan 2019, at 17:39, Martin Sebor  wrote:
>>> 
>>> On 1/5/19 3:31 AM, Iain Sandoe wrote:
 Hi Martin,
> On 4 Jan 2019, at 22:30, Mike Stump  wrote:
> 
> On Jan 4, 2019, at 2:03 PM, Martin Sebor  wrote:
>> 
>> The improved handling of attribute positional arguments added
>> in r266195 introduced a regression on Darwin where attribute
>> format with the CFString archetype accepts CFString* parameter
>> types in positions where only char* would otherwise be allowed.
> 
> You didn't ask Ok?  I'll assume you want a review...  The darwin bits and 
> the testsuite bits look fine.
>> 
>> Index: gcc/doc/extend.texi
>> ===
>> --- gcc/doc/extend.texi  (revision 267580)
>> +++ gcc/doc/extend.texi  (working copy)
>> @@ -22368,10 +22368,12 @@ bit-fields.  See the Solaris man page for 
>> @code{cm
>>  @node Darwin Format Checks
>>  @subsection Darwin Format Checks
>>  -Darwin targets support the @code{CFString} (or @code{__CFString__}) in 
>> the format
>> -attribute context.  Declarations made with such attribution are parsed 
>> for correct syntax
>> -and format argument types.  However, parsing of the format string 
>> itself is currently undefined
>> -and is not carried out by this version of the compiler.
>> +In addition to the full set of archetypes, Darwin targets also support
>> +the @code{CFString} (or @code{__CFString__}) archetype in the 
>> @code{format}
>> +attribute.  Declarations with this archetype are parsed for correct 
>> syntax
>> +and argument types.  However, parsing of the format string itself and
>> +validating arguments against it in calls to such functions is currently
>> +not performed.
>>Additionally, @code{CFStringRefs} (defined by the 
>> @code{CoreFoundation} headers) may
>>  also be used as format arguments.  Note that the relevant headers are 
>> only likely to be
>> 
 I find “archetype(s)” to be an unusual (and possibly unfamiliar to many)
 word for this context.
 how about:
 s/archetype in/variant for the/
 and then
  s/with this archetype/with this variant/
 in the following sentence.
 However, just 0.02GBP .. (fixing the fails is more important than 
 bikeshedding the wording).
>>> 
>>> Thanks for chiming in!  I used archetype because that's the term
>>> used in the attribute format specification to describe the first
>>> argument.  I do tend to agree that archetype alone may not be
>>> sufficiently familiar to all users.  I'm happy to add text to
>>> make that clear.  Would you find the following better?
>>> 
>>>  In addition to the full set of format archetypes (attribute
>>>  format style arguments such as @code{printf}, @code{scanf},
>>>  @code{strftime}, and @code{strfmon}), Darwin targets also
>>>  support the @code{CFString} (or @code{__CFString__}) archetype…
>> Yes, that makes it clearer
>> (as an aside, I think that to many people the meaning of archetype - as 
>> ‘generic’  or ‘root example’
>>   etc  tends to come to mind before the ‘template/mold’ meaning … however 
>> couldn’t think of
>>  a better term that’s not already overloaded).
>>> FWIW, I wanted to figure out how the CFString attribute made it
>>> possible to differentiate between printf and scanf (and the other)
>>> kinds of functions, for example, so I could add new tests for it,
>>> but I couldn't tell that from the manual.  So I'm trying to update
>>> the text to make it clear that although CFString is just like
>>> the sprintf and scanf format arguments/archetypes, beyond
>>> validating declarations that use it, the attribute serves no
>>> functional purpose, so the printf/scanf distinction is moot.
>> The CFString container** is more general than our implementation, e.g. it 
>> should be able
>> to contain a variety of string formats (e.g. UTF etc.).   AFAIR at the time 
>> I implemented
>> it for FSF GCC (it was originally in the Apple Local branch), we didn’t have 
>> sufficient parsing
>> support for such things (and the support in the Apple-local branch didn’t 
>> look applicable).
>> If we do more sophisticated checks, we probably need to take cognisance of
>> the fact that a
>> fully-implemented CFString impl can have non-ascii payload.   I suspect 
>> (although honestly
>> it’s been a while, I may be misremembering) that was the reason I didn’t try to
>> implement the
>> inspection at the time (if so, there’s probably a code comment to that 
>> effect).
>>> Out of curiosity, is the attribute used for fun

Re: [PATCH 2/2] PR libstdc++/86756 Move rest of std::filesystem to libstdc++.so

2019-01-07 Thread Christophe Lyon
Hi Jonathan

On Sun, 6 Jan 2019 at 23:37, Jonathan Wakely  wrote:
>
> Move std::filesystem directory iterators and operations from
> libstdc++fs.a to main libstdc++ library. These components have many
> dependencies on OS support, which is not available on all targets. Some
> additional autoconf checks and conditional compilation are needed to
> ensure the files will build for all targets. Previously this code was
> not compiled without --enable-libstdcxx-filesystem-ts but the C++17
> components should be available for all hosted builds.
>
> The tests for these components no longer need to link to libstdc++fs.a,
> but are not expected to pass on all targets. To avoid numerous failures
> on targets which are not expected to pass the tests (due to missing OS
> functionality) leave the dg-require-filesystem-ts directives in place
> for now. This will ensure the tests only run for builds where the
> filesystem-ts library is built, which presumably means some level of OS
> support is present.
>
>
> Tested x86_64-linux (old/new string ABIs, 32/64 bit), x86_64-w64-mingw32.
>
> Committed to trunk.
>

After this commit (r267616), I've noticed build failures for my
newlib-based toolchains:
aarch64-elf, arm-eabi:

In file included from
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:57:
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/../filesystem/ops-common.h:142:11:
error: '::truncate' has not been declared
  142 |   using ::truncate;
  |   ^~~~
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:
In function 'void std::filesystem::resize_file(const
std::filesystem::__cxx11::path&, uintmax_t, std::error_code&)':
/tmp/5241593_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++17/fs_ops.cc:1274:19:
error: 'truncate' is not a member of 'posix'
 1274 |   else if (posix::truncate(p.c_str(), size))
  |   ^~~~
make[5]: *** [fs_ops.lo] Error 1

I'm not sure if there's an obvious fix? Note that I'm using a rather
old newlib version, if that matters.

Thanks,

Christophe


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-07 Thread Jakub Jelinek
On Sun, Dec 16, 2018 at 06:13:57PM +0200, Dimitar Dimitrov wrote:
> -  /* Clobbering the STACK POINTER register is an error.  */
> +  /* Clobbered STACK POINTER register is not saved/restored by GCC,
> + which is often unexpected by users.  See PR52813.  */
>if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
>  {
> -  error ("Stack Pointer register clobbered by %qs in %<asm%>", regname);
> +  warning (0, "Stack Pointer register clobbered by %qs in %<asm%>",
> +regname);
> +  warning (0, "GCC has always ignored Stack Pointer %<asm%> clobbers");

Why do we write Stack Pointer rather than stack pointer?  That is really
weird.  The second warning would be a note based on the first one, i.e.
if (warning ()) note ();
and better have some -W* option to silence the warning.
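
For illustration only, the "warn, then attach a note" shape suggested above can be sketched in plain C.  This is not GCC code: emit_warning, emit_note and warn_sp_clobber_enabled are hypothetical stand-ins for a diagnostic routine that reports whether it actually emitted anything, a note routine, and the state of some -W* option.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the enabled state of a -W* option.  */
static bool warn_sp_clobber_enabled = true;

/* Counts notes so the behaviour can be checked.  */
static int notes_emitted = 0;

/* Stand-in for a warning routine: returns true only if the
   diagnostic was really emitted.  */
static bool
emit_warning (bool enabled, const char *msg)
{
  if (!enabled)
    return false;
  fprintf (stderr, "warning: %s\n", msg);
  return true;
}

/* Stand-in for a note routine.  */
static void
emit_note (const char *msg)
{
  fprintf (stderr, "note: %s\n", msg);
  notes_emitted++;
}

/* The note is tied to the warning, so silencing the warning
   silences the note as well.  */
static void
diagnose_sp_clobber (void)
{
  if (emit_warning (warn_sp_clobber_enabled,
		    "stack pointer register clobbered in asm"))
    emit_note ("a clobbered stack pointer is not saved or restored");
}
```

With the option enabled, one call produces one warning plus one note; with it disabled, neither is printed.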

>is_valid = false;
>  }
>  
> diff --git a/gcc/testsuite/gcc.target/i386/pr52813.c 
> b/gcc/testsuite/gcc.target/i386/pr52813.c
> index 154ebbfc423..644fef15fef 100644
> --- a/gcc/testsuite/gcc.target/i386/pr52813.c
> +++ b/gcc/testsuite/gcc.target/i386/pr52813.c
> @@ -5,5 +5,5 @@
>  void
>  test1 (void)
>  {
> -  asm volatile ("" : : : "%esp"); /* { dg-error "Stack Pointer register 
> clobbered" } */
> +  asm volatile ("" : : : "%esp"); /* { dg-warning "Stack Pointer register 
> clobbered.\+GCC has always ignored Stack Pointer 'asm' clobbers" } */
>  }
> -- 
> 2.11.0
> 


Jakub


[nvptx] Handle large vector reductions

2019-01-07 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0024-nvptx-Handle-large-vector-reductions.patch

Committed.

Thanks,
- Tom
[nvptx] Handle large vector reductions

Add support for vector reductions with openacc vector_length larger than
warp-size.
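
As a rough sketch of the addressing that the emitted PTX in this patch appears to implement (mad.lo.u64 of %tid.y, the partition size and the __vector_red base), each worker gets its own slice of the reduction buffer.  The helper names below are hypothetical and the code runs on the host only to show the arithmetic.

```c
#include <stdint.h>

/* Address of the reduction slice for worker TID_Y, assuming slices of
   PARTITION bytes laid out back to back starting at BASE.  */
static uint64_t
red_partition_addr (uint64_t base, uint32_t tid_y, uint32_t partition)
{
  return (uint64_t) tid_y * partition + base;
}

/* Nonzero if MAX_WORKERS slices of PARTITION bytes fit in a buffer of
   SIZE bytes; this mirrors the shape of the gcc_assert in the patch.  */
static int
red_buffer_fits (uint32_t partition, uint32_t max_workers, uint32_t size)
{
  return (uint64_t) partition * max_workers <= size;
}
```

For example, with a 64-byte partition worker 3 starts 192 bytes past the base, and 32 workers need at least 2048 bytes in total.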

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx-protos.h (nvptx_output_red_partition): Declare.
	* config/nvptx/nvptx.c (vector_red_size, vector_red_align,
	vector_red_partition, vector_red_sym): New global variables.
	(nvptx_option_override): Initialize vector_red_sym.
	(nvptx_declare_function_name): Restore red_partition register.
	(nvptx_file_end): Emit code to declare the vector reduction variables.
	(nvptx_output_red_partition): New function.
	(nvptx_expand_shared_addr): Add vector argument. Use it to handle
	large vector reductions.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_VECTOR_ADDR.
	(nvptx_init_builtins): Add VECTOR_ADDR.
	(nvptx_expand_builtin): Update call to nvptx_expand_shared_addr.
	Handle nvptx_expand_shared_addr.
	(nvptx_get_shared_red_addr): Add vector argument and handle large
	vectors.
	(nvptx_goacc_reduction_setup): Add offload_attrs argument and handle
	large vectors.
	(nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.
	(nvptx_goacc_reduction): Update calls to nvptx_goacc_reduction_{setup,
	init,fini,teardown}.
	(nvptx_init_axis_predicate): Initialize vector_red_partition.
	(nvptx_set_current_function): Init vector_red_partition.
	* config/nvptx/nvptx.md (UNSPECV_RED_PART): New unspecv.
	(nvptx_red_partition): New insn.
	* config/nvptx/nvptx.h (struct machine_function): Add red_partition.

---
 gcc/config/nvptx/nvptx-protos.h |   1 +
 gcc/config/nvptx/nvptx.c| 154 
 gcc/config/nvptx/nvptx.h|   2 +
 gcc/config/nvptx/nvptx.md   |  12 
 4 files changed, 140 insertions(+), 29 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 1a26d00ab99..be09a15e49c 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -56,5 +56,6 @@ extern const char *nvptx_output_return (void);
 extern const char *nvptx_output_set_softstack (unsigned);
 extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
+extern const char *nvptx_output_red_partition (rtx, rtx);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 26c80716603..5a4b38de522 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -150,6 +150,14 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Buffer needed for vector reductions, when vector_length >
+   PTX_WARP_SIZE.  This has to be distinct from the worker broadcast
+   array, as both may be live concurrently.  */
+static unsigned vector_red_size;
+static unsigned vector_red_align;
+static unsigned vector_red_partition;
+static GTY(()) rtx vector_red_sym;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -226,6 +234,11 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  vector_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__vector_red");
+  SET_SYMBOL_DATA_AREA (vector_red_sym, DATA_AREA_SHARED);
+  vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+  vector_red_partition = 0;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -1104,8 +1117,25 @@ nvptx_init_axis_predicate (FILE *file, int regno, const char *name)
 {
   fprintf (file, "\t{\n");
   fprintf (file, "\t\t.reg.u32\t%%%s;\n", name);
+  if (strcmp (name, "x") == 0 && cfun->machine->red_partition)
+{
+  fprintf (file, "\t\t.reg.u64\t%%t_red;\n");
+  fprintf (file, "\t\t.reg.u64\t%%y64;\n");
+}
   fprintf (file, "\t\tmov.u32\t%%%s, %%tid.%s;\n", name, name);
   fprintf (file, "\t\tsetp.ne.u32\t%%r%d, %%%s, 0;\n", regno, name);
+  if (strcmp (name, "x") == 0 && cfun->machine->red_partition)
+{
+  fprintf (file, "\t\tcvt.u64.u32\t%%y64, %%tid.y;\n");
+  fprintf (file, "\t\tcvta.shared.u64\t%%t_red, __vector_red;\n");
+  fprintf (file, "\t\tmad.lo.u64\t%%r%d, %%y64, %d, %%t_red; "
+	   "// vector reduction buffer\n",
+	   REGNO (cfun->machine->red_partition),
+	   vector_red_partition);
+}
+  /* Verify vector_red_size.  */
+  gcc_assert (vector_red_partition * nvptx_mach_max_workers ()
+	  <= vector_red_size);
   fprintf (file, "\t}\n");
 }
 
@@ -1342,6 +1372,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
 	fprintf (file, "\t.local.align 8 .b8 %%simtstack_ar["
 		HOST_

[nvptx] Don't emit barriers for empty loops -- fix

2019-01-07 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0022-nvptx-openacc-Don-t-emit-barriers-for-empty-loops.patch

Committed without test-case.

Thanks,
- Tom
[nvptx] Don't emit barriers for empty loops -- fix

When compiling an empty loop:
...
  long long v1;
  #pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
  #pragma acc loop
for (v1 = 0; v1 < 20; v1 += 2)
;
...
the compiler emits two subsequent bar.syncs.  This triggers some bug on my
quadro m1200 (I'm assuming in the ptxas/JIT compiler) that hangs the testcase.

This patch works around the bug by doing an optimization: we detect that this is
an empty loop (a forked immediately followed by a joining), and don't emit the
barriers.
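
The decision described above can be sketched outside the compiler as follows.  This is a simplified model, assuming a singly linked instruction list; struct insn and the helper names are hypothetical, while the conditions follow the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of an instruction chain.  */
struct insn
{
  struct insn *next;
};

/* True if the forked insn is immediately followed by the joining insn,
   i.e. the partitioned region has no body.  */
static bool
empty_loop_p (bool is_call, const struct insn *forked,
	      const struct insn *joining)
{
  return !is_call && forked->next != NULL && forked->next == joining;
}

/* True if begin/end barriers should be emitted.  They are skipped when
   nothing was propagated and the region is either empty or a call.  */
static bool
needs_barriers_p (bool no_prop_p, bool is_call, bool empty_loop)
{
  if (no_prop_p && empty_loop)
    return false;
  if (no_prop_p && is_call)
    return false;
  return true;
}
```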

The patch does not include the test-case yet, since vector_length (128) is not
yet supported at this point.

2018-12-17  Tom de Vries  

	PR target/85381
	* config/nvptx/nvptx.c (nvptx_process_pars): Don't emit barriers for
	empty loops.

---
 gcc/config/nvptx/nvptx.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2166f37b182..26c80716603 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4636,9 +4636,12 @@ nvptx_process_pars (parallel *par)
 {
   nvptx_shared_propagate (false, is_call, par->forked_block,
 			  par->forked_insn, !worker);
-  bool empty = nvptx_shared_propagate (true, is_call,
-	   par->forked_block, par->fork_insn,
-	   !worker);
+  bool no_prop_p
+	= nvptx_shared_propagate (true, is_call, par->forked_block,
+  par->fork_insn, !worker);
+  bool empty_loop_p
+	= !is_call && (NEXT_INSN (par->forked_insn)
+		   && NEXT_INSN (par->forked_insn) == par->joining_insn);
   rtx barrier = GEN_INT (0);
   int threads = 0;
 
@@ -4648,7 +4651,11 @@ nvptx_process_pars (parallel *par)
 	  threads = nvptx_mach_vector_length ();
 	}
 
-  if (!empty || !is_call)
+  if (no_prop_p && empty_loop_p)
+	;
+  else if (no_prop_p && is_call)
+	;
+  else
 	{
 	  /* Insert begin and end synchronizations.  */
 	  emit_insn_before (nvptx_cta_sync (barrier, threads),


[nvptx, committed] Add support for a per-worker broadcast buffer and barrier

2019-01-07 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0015-nvptx-Generalize-state-propagation-and-synchronizati.patch

Committed.

Thanks,
- Tom
[nvptx] Add support for a per-worker broadcast buffer and barrier

Add support for a per-worker broadcast buffer and barrier, to be used for
openacc vector_length larger than warp-size.
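
A small host-side sketch of the per-worker values that nvptx_init_oacc_workers computes in PTX below: the broadcast slice is indexed by tid.y + 1 (the "vector ID" in the emitted comment), and the barrier number is also tid.y + 1.  The function names are hypothetical, and the reading that slot 0 and barrier 0 stay reserved for CTA-wide use is my assumption, not something the patch states.

```c
#include <stdint.h>

/* Address of the broadcast slice for worker TID_Y.  The slice index is
   tid.y + 1, so the first PARTITION bytes at BASE are not used by any
   per-worker slice.  */
static uint64_t
bcast_partition_addr (uint64_t base, uint32_t tid_y, uint32_t partition)
{
  uint64_t vector_id = (uint64_t) tid_y + 1;
  return vector_id * partition + base;
}

/* Barrier number used to synchronize the vector lanes of worker TID_Y.  */
static uint32_t
vector_sync_barrier (uint32_t tid_y)
{
  return tid_y + 1;
}
```

This also explains the "+ 1" in the size assertion: N workers need N + 1 partitions under this layout.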

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (oacc_bcast_partition): Declare.
	(nvptx_option_override): Init oacc_bcast_partition.
	(nvptx_init_oacc_workers): New function.
	(nvptx_declare_function_name): Call nvptx_init_oacc_workers.
	(nvptx_needs_shared_bcast): New function.
	(nvptx_find_par): Generalize to enable vectors to use shared-memory
	to propagate state.
	(nvptx_shared_propagate): Initialize vector bcast partition and
	synchronization state.
	(nvptx_single):  Generalize to enable vectors to use shared-memory
	to propagate state.
	(nvptx_process_pars): Likewise.
	(nvptx_set_current_function): Initialize oacc_broadcast_partition.
	* config/nvptx/nvptx.h (struct machine_function): Add
	bcast_partition and sync_bar members.

---
 gcc/config/nvptx/nvptx.c | 153 +--
 gcc/config/nvptx/nvptx.h |   4 ++
 2 files changed, 138 insertions(+), 19 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 6df4d02c4c1..2166f37b182 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -140,6 +140,7 @@ static GTY((cache)) hash_table *needed_fndecls_htab;
memory.  It'd be nice if PTX supported common blocks, because then
this could be shared across TUs (taking the largest size).  */
 static unsigned oacc_bcast_size;
+static unsigned oacc_bcast_partition;
 static unsigned oacc_bcast_align;
 static GTY(()) rtx oacc_bcast_sym;
 
@@ -158,6 +159,8 @@ static bool need_softstack_decl;
 /* True if any function references __nvptx_uni.  */
 static bool need_unisimt_decl;
 
+static int nvptx_mach_max_workers ();
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -217,6 +220,7 @@ nvptx_option_override (void)
   oacc_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__oacc_bcast");
   SET_SYMBOL_DATA_AREA (oacc_bcast_sym, DATA_AREA_SHARED);
   oacc_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+  oacc_bcast_partition = 0;
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
@@ -1105,6 +1109,40 @@ nvptx_init_axis_predicate (FILE *file, int regno, const char *name)
   fprintf (file, "\t}\n");
 }
 
+/* Emit code to initialize OpenACC worker broadcast and synchronization
+   registers.  */
+
+static void
+nvptx_init_oacc_workers (FILE *file)
+{
+  fprintf (file, "\t{\n");
+  fprintf (file, "\t\t.reg.u32\t%%tidy;\n");
+  if (cfun->machine->bcast_partition)
+{
+  fprintf (file, "\t\t.reg.u64\t%%t_bcast;\n");
+  fprintf (file, "\t\t.reg.u64\t%%y64;\n");
+}
+  fprintf (file, "\t\tmov.u32\t\t%%tidy, %%tid.y;\n");
+  if (cfun->machine->bcast_partition)
+{
+  fprintf (file, "\t\tcvt.u64.u32\t%%y64, %%tidy;\n");
+  fprintf (file, "\t\tadd.u64\t\t%%y64, %%y64, 1; // vector ID\n");
+  fprintf (file, "\t\tcvta.shared.u64\t%%t_bcast, __oacc_bcast;\n");
+  fprintf (file, "\t\tmad.lo.u64\t%%r%d, %%y64, %d, %%t_bcast; "
+	   "// vector broadcast offset\n",
+	   REGNO (cfun->machine->bcast_partition),
+	   oacc_bcast_partition);
+}
+  /* Verify oacc_bcast_size.  */
+  gcc_assert (oacc_bcast_partition * (nvptx_mach_max_workers () + 1)
+	  <= oacc_bcast_size);
+  if (cfun->machine->sync_bar)
+fprintf (file, "\t\tadd.u32\t\t%%r%d, %%tidy, 1; "
+	 "// vector synchronization barrier\n",
+	 REGNO (cfun->machine->sync_bar));
+  fprintf (file, "\t}\n");
+}
+
 /* Emit code to initialize predicate and master lane index registers for
-muniform-simt code generation variant.  */
 
@@ -1331,6 +1369,8 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   if (cfun->machine->unisimt_predicate
   || (cfun->machine->has_simtreg && !crtl->is_leaf))
 nvptx_init_unisimt_predicate (file);
+  if (cfun->machine->bcast_partition || cfun->machine->sync_bar)
+nvptx_init_oacc_workers (file);
 }
 
 /* Output code for switching uniform-simt state.  ENTERING indicates whether
@@ -3072,6 +3112,19 @@ nvptx_split_blocks (bb_insn_map_t *map)
 }
 }
 
+/* Return true if MASK contains parallelism that requires shared
+   memory to broadcast.  */
+
+static bool
+nvptx_needs_shared_bcast (unsigned mask)
+{
+  bool worker = mask & GOMP_DIM_MASK (GOMP_DIM_WORKER);
+  bool large_vector = (mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
+&& nvptx_mach_vector_length () != PTX_WARP_SIZE;
+
+  return worker || large_vector;
+}
+
 /* BLOCK is a basic block containing a head or tail instruction.
Locate the associated prehead or pretail instruction, which must be
in the single predecessor blo

[committed][nvptx] Allow larger PTX_MAX_VECTOR_LENGTH in nvptx_goacc_validate_dims_1

2019-01-07 Thread Tom de Vries
Hi,

Allow PTX_MAX_VECTOR_LENGTH to be defined as larger than PTX_WARP_SIZE in
nvptx_goacc_validate_dims_1.
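
To make the effect concrete, here is a standalone sketch of the two helpers this patch adds.  The constants are assumptions for illustration: MAX_VECTOR_LENGTH is set to 128 to show the "larger than warp size" case (the patch itself still leaves PTX_MAX_VECTOR_LENGTH equal to PTX_WARP_SIZE), and CTA_SIZE is assumed to be 1024.

```c
#define WARP_SIZE 32
#define MAX_VECTOR_LENGTH 128	/* Hypothetical value for illustration.  */
#define WORKER_LENGTH 32
#define CTA_SIZE 1024		/* Assumed threads-per-block limit.  */

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* A vector length is well formed if it is a positive multiple of the
   warp size.  (The patch asserts l > 0 instead of testing it.)  */
static int
wellformed_vector_length_p (int l)
{
  return l > 0 && l % WARP_SIZE == 0;
}

/* Clamp vector and worker dimensions, then fall back to one warp per
   worker if workers * vector_length would exceed the CTA size.  */
static void
apply_dim_limits (int dims[])
{
  if (dims[DIM_VECTOR] > MAX_VECTOR_LENGTH)
    dims[DIM_VECTOR] = MAX_VECTOR_LENGTH;

  if (dims[DIM_WORKER] > WORKER_LENGTH)
    dims[DIM_WORKER] = WORKER_LENGTH;

  if (dims[DIM_WORKER] > 0 && dims[DIM_VECTOR] > 0
      && dims[DIM_WORKER] * dims[DIM_VECTOR] > CTA_SIZE)
    dims[DIM_VECTOR] = WARP_SIZE;
}
```

Under these assumed limits, 8 workers with vector_length 128 are accepted unchanged (8 * 128 = 1024), while 32 workers with vector_length 128 fall back to vector_length 32.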

Committed to trunk.

Thanks,
- Tom

[nvptx] Allow larger PTX_MAX_VECTOR_LENGTH in nvptx_goacc_validate_dims_1

2019-01-07  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_welformed_vector_length_p)
(nvptx_apply_dim_limits): New function.

(nvptx_goacc_validate_dims_1): Allow PTX_MAX_VECTOR_LENGTH larger than
PTX_WARP_SIZE.

---
 gcc/config/nvptx/nvptx.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3a4a5a3a159..6df4d02c4c1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5267,6 +5267,30 @@ nvptx_simt_vf ()
   return PTX_WARP_SIZE;
 }
 
+static bool
+nvptx_welformed_vector_length_p (int l)
+{
+  gcc_assert (l > 0);
+  return l % PTX_WARP_SIZE == 0;
+}
+
+static void
+nvptx_apply_dim_limits (int dims[])
+{
+  /* Check that the vector_length is not too large.  */
+  if (dims[GOMP_DIM_VECTOR] > PTX_MAX_VECTOR_LENGTH)
+dims[GOMP_DIM_VECTOR] = PTX_MAX_VECTOR_LENGTH;
+
+  /* Check that the number of workers is not too large.  */
+  if (dims[GOMP_DIM_WORKER] > PTX_WORKER_LENGTH)
+dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+
+  /* Ensure that num_worker * vector_length <= cta size.  */
+  if (dims[GOMP_DIM_WORKER] > 0 &&  dims[GOMP_DIM_VECTOR] > 0
+  && dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR] > PTX_CTA_SIZE)
+dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE;
+}
+
 /* As nvptx_goacc_validate_dims, but does not return bool to indicate whether
DIMS has changed.  */
 
@@ -5389,12 +5413,10 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int 
fn_level)
 }
 
   if (dims[GOMP_DIM_VECTOR] > 0
-  && dims[GOMP_DIM_VECTOR] != PTX_WARP_SIZE)
+  && !nvptx_welformed_vector_length_p (dims[GOMP_DIM_VECTOR]))
 dims[GOMP_DIM_VECTOR] = PTX_DEFAULT_VECTOR_LENGTH;
 
-  /* Check the num workers is not too large.  */
-  if (dims[GOMP_DIM_WORKER] > PTX_WORKER_LENGTH)
-dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+  nvptx_apply_dim_limits (dims);
 
   if (dims[GOMP_DIM_VECTOR] != old_dims[GOMP_DIM_VECTOR])
 warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
@@ -5415,6 +5437,7 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int 
fn_level)
dims[GOMP_DIM_WORKER] = PTX_DEFAULT_RUNTIME_DIM;
   if (dims[GOMP_DIM_GANG] < 0)
dims[GOMP_DIM_GANG] = PTX_DEFAULT_RUNTIME_DIM;
+  nvptx_apply_dim_limits (dims);
 }
 }
 


[committed][nvptx] Postpone warnings in nvptx_goacc_validate_dims_1

2019-01-07 Thread Tom de Vries
Hi,

this patch moves warnings in nvptx_goacc_validate_dims_1 to as late as
possible.  This allows us more flexibility in setting the dimensions.
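
The restructuring follows a "snapshot, adjust, then report" pattern, which might be sketched as below.  The function is a simplified, hypothetical model: it omits the runtime-default handling and the vector_reason message selection from the real patch, and WARP_SIZE and WORKER_LENGTH are assumed to be 32.

```c
#include <stdio.h>
#include <string.h>

#define WARP_SIZE 32
#define WORKER_LENGTH 32

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Adjust DIMS in place and return the number of warnings printed.  */
static int
validate_dims (int dims[])
{
  int old_dims[DIM_MAX];
  int warnings = 0;

  /* Snapshot the requested values before touching anything.  */
  memcpy (old_dims, dims, sizeof old_dims);

  /* Phase 1: adjust silently.  Later adjustments are free to override
     earlier ones without producing stale diagnostics.  */
  if (dims[DIM_VECTOR] > 0 && dims[DIM_VECTOR] != WARP_SIZE)
    dims[DIM_VECTOR] = WARP_SIZE;
  if (dims[DIM_WORKER] > WORKER_LENGTH)
    dims[DIM_WORKER] = WORKER_LENGTH;

  /* Phase 2: report only the net change per dimension.  */
  if (dims[DIM_VECTOR] != old_dims[DIM_VECTOR])
    {
      fprintf (stderr, "using vector_length (%d), ignoring %d\n",
	       dims[DIM_VECTOR], old_dims[DIM_VECTOR]);
      warnings++;
    }
  if (dims[DIM_WORKER] != old_dims[DIM_WORKER])
    {
      fprintf (stderr, "using num_workers (%d), ignoring %d\n",
	       dims[DIM_WORKER], old_dims[DIM_WORKER]);
      warnings++;
    }
  return warnings;
}
```

Because the messages are derived from the final values, each dimension is reported at most once, however many times it was adjusted in phase 1.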

Committed to trunk.

Thanks,
- Tom

[nvptx] Postpone warnings in nvptx_goacc_validate_dims_1

2019-01-07  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Move warnings to
as late as possible.

---
 gcc/config/nvptx/nvptx.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3d680e9d80a..3a4a5a3a159 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5376,25 +5376,37 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int 
fn_level)
   gcc_assert (dims[GOMP_DIM_GANG] >= -1);
 }
 
-  if (dims[GOMP_DIM_VECTOR] >= 0
-  && dims[GOMP_DIM_VECTOR] != PTX_WARP_SIZE)
+  int old_dims[GOMP_DIM_MAX];
+  unsigned int i;
+  for (i = 0; i < GOMP_DIM_MAX; ++i)
+old_dims[i] = dims[i];
+
+  const char *vector_reason = NULL;
+  if (dims[GOMP_DIM_VECTOR] == 0)
 {
-  warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
- dims[GOMP_DIM_VECTOR]
- ? G_("using vector_length (%d), ignoring %d")
- : G_("using vector_length (%d), ignoring runtime setting"),
- PTX_DEFAULT_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
+  vector_reason = G_("using vector_length (%d), ignoring runtime setting");
   dims[GOMP_DIM_VECTOR] = PTX_DEFAULT_VECTOR_LENGTH;
 }
 
+  if (dims[GOMP_DIM_VECTOR] > 0
+  && dims[GOMP_DIM_VECTOR] != PTX_WARP_SIZE)
+dims[GOMP_DIM_VECTOR] = PTX_DEFAULT_VECTOR_LENGTH;
+
   /* Check the num workers is not too large.  */
   if (dims[GOMP_DIM_WORKER] > PTX_WORKER_LENGTH)
-{
-  warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
- "using num_workers (%d), ignoring %d",
- PTX_WORKER_LENGTH, dims[GOMP_DIM_WORKER]);
-  dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
-}
+dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+
+  if (dims[GOMP_DIM_VECTOR] != old_dims[GOMP_DIM_VECTOR])
+warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
+   vector_reason != NULL
+   ? vector_reason
+   : G_("using vector_length (%d), ignoring %d"),
+   dims[GOMP_DIM_VECTOR], old_dims[GOMP_DIM_VECTOR]);
+
+  if (dims[GOMP_DIM_WORKER] != old_dims[GOMP_DIM_WORKER])
+warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
+   G_("using num_workers (%d), ignoring %d"),
+   dims[GOMP_DIM_WORKER], old_dims[GOMP_DIM_WORKER]);
 
   if (oacc_default_dims_p)
 {


[committed][nvptx] Add asserts in nvptx_goacc_validate_dims

2019-01-07 Thread Tom de Vries
Hi,

this patch adds a few asserts to nvptx_goacc_validate_dims.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add asserts in nvptx_goacc_validate_dims

2019-01-07  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Add asserts.

---
 gcc/config/nvptx/nvptx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5d0bab65d07..c0a58f3aee5 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5421,6 +5421,10 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int 
fn_level)
 
   nvptx_goacc_validate_dims_1 (decl, dims, fn_level);
 
+  gcc_assert (dims[GOMP_DIM_VECTOR] != 0);
+  if (dims[GOMP_DIM_WORKER] > 0 && dims[GOMP_DIM_VECTOR] > 0)
+gcc_assert (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR] <= PTX_CTA_SIZE);
+
   for (i = 0; i < GOMP_DIM_MAX; ++i)
 if (old_dims[i] != dims[i])
   return true;


[committed][nvptx] Eliminate PTX_VECTOR_LENGTH

2019-01-07 Thread Tom de Vries
Hi,

this patch removes PTX_VECTOR_LENGTH and replaces uses of it with
PTX_DEFAULT_VECTOR_LENGTH, PTX_MAX_VECTOR_LENGTH and PTX_WARP_SIZE.
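
The reduction loop touched by the last hunk is a case where PTX_WARP_SIZE is the right replacement, since the shuffle tree works within a single warp.  The following host-side simulation is only a sketch of that idea, assuming a shuffle-down style exchange in which an out-of-range source lane yields the lane's own value; warp_reduce_sum is a hypothetical name.

```c
#define WARP_SIZE 32

/* Simulate a binary shuffle tree: at each step every lane adds the
   value held by the lane SHFL positions above it.  After log2(32) = 5
   steps, lane 0 holds the sum over all lanes.  */
static int
warp_reduce_sum (const int lanes_in[WARP_SIZE])
{
  int val[WARP_SIZE];
  int next[WARP_SIZE];

  for (int lane = 0; lane < WARP_SIZE; lane++)
    val[lane] = lanes_in[lane];

  for (int shfl = WARP_SIZE / 2; shfl > 0; shfl = shfl >> 1)
    {
      for (int lane = 0; lane < WARP_SIZE; lane++)
	{
	  int src = lane + shfl;
	  /* Out-of-range source: the lane reads its own value.  The
	     upper lanes then hold junk, which lane 0 never consumes.  */
	  int other = src < WARP_SIZE ? val[src] : val[lane];
	  next[lane] = val[lane] + other;
	}
      for (int lane = 0; lane < WARP_SIZE; lane++)
	val[lane] = next[lane];
    }

  return val[0];
}
```

The loop bound is tied to the warp width rather than to the vector length, which is why the hunk uses PTX_WARP_SIZE and not one of the new vector-length macros.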

Committed to trunk.

Thanks,
- Tom

[nvptx] Eliminate PTX_VECTOR_LENGTH

2019-01-07  Tom de Vries  

* config/nvptx/nvptx.c (PTX_VECTOR_LENGTH): Remove.
(PTX_DEFAULT_VECTOR_LENGTH, PTX_MAX_VECTOR_LENGTH): Define.
(nvptx_goacc_validate_dims_1, nvptx_dim_limit)
(nvptx_goacc_reduction_fini): Use PTX_DEFAULT_VECTOR_LENGTH,
PTX_MAX_VECTOR_LENGTH and PTX_WARP_SIZE instead of PTX_VECTOR_LENGTH.

---
 gcc/config/nvptx/nvptx.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0a58f3aee5..3d680e9d80a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -82,7 +82,8 @@
 #define WORKAROUND_PTXJIT_BUG_3 1
 
 #define PTX_WARP_SIZE 32
-#define PTX_VECTOR_LENGTH 32
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_WARP_SIZE
+#define PTX_MAX_VECTOR_LENGTH PTX_WARP_SIZE
 #define PTX_WORKER_LENGTH 32
 #define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
@@ -5376,14 +5377,14 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int 
fn_level)
 }
 
   if (dims[GOMP_DIM_VECTOR] >= 0
-  && dims[GOMP_DIM_VECTOR] != PTX_VECTOR_LENGTH)
+  && dims[GOMP_DIM_VECTOR] != PTX_WARP_SIZE)
 {
   warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION, 0,
  dims[GOMP_DIM_VECTOR]
  ? G_("using vector_length (%d), ignoring %d")
  : G_("using vector_length (%d), ignoring runtime setting"),
- PTX_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
-  dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
+ PTX_DEFAULT_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
+  dims[GOMP_DIM_VECTOR] = PTX_DEFAULT_VECTOR_LENGTH;
 }
 
   /* Check the num workers is not too large.  */
@@ -5397,7 +5398,7 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int 
fn_level)
 
   if (oacc_default_dims_p)
 {
-  dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
+  dims[GOMP_DIM_VECTOR] = PTX_DEFAULT_VECTOR_LENGTH;
   if (dims[GOMP_DIM_WORKER] < 0)
dims[GOMP_DIM_WORKER] = PTX_DEFAULT_RUNTIME_DIM;
   if (dims[GOMP_DIM_GANG] < 0)
@@ -5440,7 +5441,7 @@ nvptx_dim_limit (int axis)
   switch (axis)
 {
 case GOMP_DIM_VECTOR:
-  return PTX_VECTOR_LENGTH;
+  return PTX_MAX_VECTOR_LENGTH;
 
 default:
   break;
@@ -5937,7 +5938,7 @@ nvptx_goacc_reduction_fini (gcall *call)
   /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
 but that requires a method of emitting a unified jump at the
 gimple level.  */
-  for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
+  for (int shfl = PTX_WARP_SIZE / 2; shfl > 0; shfl = shfl >> 1)
{
  tree other_var = make_ssa_name (TREE_TYPE (var));
  nvptx_generate_vector_shuffle (gimple_location (call),


[nvptx, committed] Fix libgomp.oacc-c-c++-common/vector-length-128-3.c

2019-01-07 Thread Tom de Vries
[was: Re: [nvptx] vector length patch series]

On 03-01-19 17:29, Tom de Vries wrote:
> +/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "-:-:128" } */

Committed as obvious.

Thanks,
- Tom
[nvptx] Fix libgomp.oacc-c-c++-common/vector-length-128-3.c

The vector-length-128-3.c test-case uses GOMP_OPENACC_DIM=-:-:128, but '-' is
not yet supported on trunk.  Use GOMP_OPENACC_DIM=::128 instead.
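
To illustrate why "::128" is accepted where "-:-:128" is not, here is a toy parser for a colon-separated gang:worker:vector string.  It is not the parser GCC or libgomp uses; parse_dims and its conventions (0 meaning "left unspecified", no '-' placeholder) are assumptions made for this sketch.

```c
#include <limits.h>
#include <stdlib.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Parse S into DIMS.  Empty fields stay 0 (unspecified) and trailing
   fields may be omitted.  Returns 0 on success, -1 if S is malformed.  */
static int
parse_dims (const char *s, int dims[DIM_MAX])
{
  for (int i = 0; i < DIM_MAX; i++)
    dims[i] = 0;

  for (int i = 0; i < DIM_MAX; i++)
    {
      if (i > 0)
	{
	  if (*s == '\0')
	    return 0;
	  if (*s != ':')
	    return -1;
	  s++;
	}
      if (*s >= '0' && *s <= '9')
	{
	  char *end;
	  long v = strtol (s, &end, 10);
	  if (v <= 0 || v > INT_MAX)
	    return -1;
	  dims[i] = (int) v;
	  s = end;
	}
    }

  return *s == '\0' ? 0 : -1;
}
```

In this sketch "::128" yields an unspecified gang and worker count with a vector length of 128, whereas "-:-:128" is rejected at the first '-'.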

2019-01-07  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Fix
	GOMP_OPENACC_DIM argument.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
index c403e74658b..59be37a7c27 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* We default to warp size 32 for the vector length, so the GOMP_OPENACC_DIM has
no effect.  */
-/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "-:-:128" } */
+/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */
 
 


[openacc] Add oacc_get_min_dim

2019-01-07 Thread Tom de Vries
[ was: Re: Fwd: [openacc, committed] Add oacc_get_default_dim ]

On 19-12-18 16:27, Tom de Vries wrote:
> [ Adding gcc-patches ]
> 
>  Forwarded Message 
> Subject: [openacc, committed] Add oacc_get_default_dim
> Date: Wed, 19 Dec 2018 16:24:25 +0100
> From: Tom de Vries 
> To: Thomas Schwinge 
> 
> [ was: Re: [nvptx] vector length patch series -- openacc parts ]
> 
> On 19-12-18 11:40, Thomas Schwinge wrote:
>> Hi Tom!
>>
>> Thanks for picking up this series!
>>
>>
>> And just to note:
>>
>> On Tue, 18 Dec 2018 00:52:30 +0100, Tom de Vries  wrote:
>>> On 14-12-18 20:58, Tom de Vries wrote:
>>>
 0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch
>>>
 0017-nvptx-Enable-large-vectors.patch
>>>
 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch
>>>
>>> Thomas,
>>>
>>> these patches are openacc (0003) or have openacc components (0017, 0023).
>>>
>>> Can you review and possibly approve the openacc parts?
>>
>> I've seen this (and your earlier questions), and will get to it
>> eventually, thanks.
>>
>>
> 
> In that case, let's make the review for the IMO trivial bits post-commit.
> 
> Committed the openacc component of 0017 ...
>


Likewise, added oacc_get_min_dim.

Thanks,
- Tom

[openacc] Add oacc_get_min_dim

Expose oacc_min_dims to backends.

2019-01-07  Tom de Vries  

	* omp-offload.c (oacc_get_min_dim): New function.
	* omp-offload.h (oacc_get_min_dim): Declare.

---
 gcc/omp-offload.c | 7 +++
 gcc/omp-offload.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index aade9f2dc60..9cac5655c63 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -580,6 +580,13 @@ oacc_get_default_dim (int dim)
   return oacc_default_dims[dim];
 }
 
+int
+oacc_get_min_dim (int dim)
+{
+  gcc_assert (0 <= dim && dim < GOMP_DIM_MAX);
+  return oacc_min_dims[dim];
+}
+
 /* Parse the default dimension parameter.  This is a set of
:-separated optional compute dimensions.  Each specified dimension
is a positive integer.  When device type support is added, it is
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 6759a832d2b..21c9236b74f 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_OMP_DEVICE_H
 
 extern int oacc_get_default_dim (int dim);
+extern int oacc_get_min_dim (int dim);
 extern int oacc_fn_attrib_level (tree attr);
 
 extern GTY(()) vec *offload_funcs;


Re: [PATCH] Add __builtin_convertvector support (PR c++/85052, take 2)

2019-01-07 Thread Richard Biener
On Thu, 3 Jan 2019, Jakub Jelinek wrote:

> On Thu, Jan 03, 2019 at 06:32:44PM +0100, Marc Glisse wrote:
> > > That said, not sure if e.g. using an opaque builtin for the conversion 
> > > that
> > > supportable_convert_operation sometimes uses is better over this ifn.
> > > What exact optimization opportunities you are looking for if it is lowered
> > > earlier?  I have the VECTOR_CST folding in place...
> > 
> > I don't know, any kind of optimization we currently do on scalars... For
> > conversions between integers and floats, that seems to be very limited,
> > maybe combine consecutive casts in rare cases. For sign changes, we have a
> > number of transformations in match.pd that are fine with an intermediate
> > cast that only changes the sign (I even introduced nop_convert to handle
> > vectors at the same time). I guess we could handle this IFN as well. It is
> 
> For the sign only changes (non-narrowing/widening), the FE is already
> emitting a VIEW_CONVERT_EXPR instead of the IFN.
> 
> Anyway, here is an updated version of the patch, that:
> 1) has doc/extend.texi changes
> 2) adds the missing function comment
> 3) has the tweak suggested by Richard Sandiford
> 4) has implemented 2x narrowing and 2x widening vector support, haven't done
> the multi cvt cases yet though
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

OK for trunk.

Thanks,
Richard.

> 2019-01-03  Jakub Jelinek  
> 
>   PR c++/85052
>   * tree-vect-generic.c: Include insn-config.h and recog.h.
>   (expand_vector_piecewise): Add defaulted ret_type argument,
>   if non-NULL, use that in preference to type for the result type.
>   (expand_vector_parallel): Formatting fix.
>   (do_vec_conversion, do_vec_narrowing_conversion,
>   expand_vector_conversion): New functions.
>   (expand_vector_operations_1): Call expand_vector_conversion
>   for VEC_CONVERT ifn calls.
>   * internal-fn.def (VEC_CONVERT): New internal function.
>   * internal-fn.c (expand_VEC_CONVERT): New function.
>   * fold-const-call.c (fold_const_vec_convert): New function.
>   (fold_const_call): Use it for CFN_VEC_CONVERT.
>   * doc/extend.texi (__builtin_convertvector): Document.
> c-family/
>   * c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
>   (c_build_vec_convert): Declare.
>   * c-common.c (c_build_vec_convert): New function.
> c/
>   * c-parser.c (c_parser_postfix_expression): Parse
>   __builtin_convertvector.
> cp/
>   * cp-tree.h (cp_build_vec_convert): Declare.
>   * parser.c (cp_parser_postfix_expression): Parse
>   __builtin_convertvector.
>   * constexpr.c: Include fold-const-call.h.
>   (cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
>   (potential_constant_expression_1): Likewise.
>   * semantics.c (cp_build_vec_convert): New function.
>   * pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
>   IFN_VEC_CONVERT.
> testsuite/
>   * c-c++-common/builtin-convertvector-1.c: New test.
>   * c-c++-common/torture/builtin-convertvector-1.c: New test.
>   * g++.dg/ext/builtin-convertvector-1.C: New test.
>   * g++.dg/cpp0x/constexpr-builtin4.C: New test.
> 
> --- gcc/tree-vect-generic.c.jj  2019-01-02 20:48:25.880725772 +0100
> +++ gcc/tree-vect-generic.c   2019-01-03 17:45:43.005459518 +0100
> @@ -39,6 +39,8 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-vector-builder.h"
>  #include "vec-perm-indices.h"
> +#include "insn-config.h"
> +#include "recog.h"   /* FIXME: for insn_data */
>  
>  
>  static void expand_vector_operations_1 (gimple_stmt_iterator *);
> @@ -267,7 +269,8 @@ do_negate (gimple_stmt_iterator *gsi, tr
>  static tree
>  expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
>tree type, tree inner_type,
> -  tree a, tree b, enum tree_code code)
> +  tree a, tree b, enum tree_code code,
> +  tree ret_type = NULL_TREE)
>  {
>vec<constructor_elt, va_gc> *v;
>tree part_width = TYPE_SIZE (inner_type);
> @@ -278,23 +281,27 @@ expand_vector_piecewise (gimple_stmt_ite
>int i;
>location_t loc = gimple_location (gsi_stmt (*gsi));
>  
> -  if (types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
> +  if (ret_type
> +  || types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
>  warning_at (loc, OPT_Wvector_operation_performance,
>   "vector operation will be expanded piecewise");
>else
>  warning_at (loc, OPT_Wvector_operation_performance,
>   "vector operation will be expanded in parallel");
>  
> +  if (!ret_type)
> +ret_type = type;
>vec_alloc (v, (nunits + delta - 1) / delta);
>for (i = 0; i < nunits;
> i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
>  {
> -  tree result = f (gsi, inner_type, a, b, index, part_width, code, type);
> +  tree result = f (gsi,
