Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Hongtao Liu
On Wed, Dec 11, 2019 at 3:54 PM Jakub Jelinek  wrote:
>
> On Wed, Dec 11, 2019 at 09:55:24AM +0800, Hongtao Liu wrote:
> > Changelog
> > gcc/
> > PR target/92865
> > * config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
> > integer mask cmov when available even with TARGET_XOP.
> >
> > gcc/testsuite
> > * gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.
>
> Please remove gcc/testsuite/ here too.
> Ok with that change.
Yes, thanks.
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Jakub Jelinek
On Wed, Dec 11, 2019 at 09:55:24AM +0800, Hongtao Liu wrote:
> Changelog
> gcc/
> PR target/92865
> * config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
> integer mask cmov when available even with TARGET_XOP.
> 
> gcc/testsuite
> * gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.

Please remove gcc/testsuite/ here too.
Ok with that change.

Jakub



Re: [PATCH] OpenACC reference count overhaul

2019-12-10 Thread Chung-Lin Tang

On 2019/12/10 12:04 AM, Julian Brown wrote:

I'm citing below the changes introducing 'gomp_remove_var_async',
modelled similar to the existing 'gomp_unmap_vars_async'.


Also for both these, do I understand correctly, that it's actually not
the 'gomp_unref_tgt' that needs to be "delayed" via
'goacc_asyncqueue', but rather really only the
'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
target_mem_desc' alive?  Per my understanding, that one is one
component of the mapping table, and not relevant anymore (thus can be
'free'd) as soon as it has been determined that 'tgt->refcount ==
0'?  Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying
freeing of the target_mem_desc isn't really a huge problem, in practice.


I don't clearly remember all the details. It could be possible that not
asyncqueue-ifying gomp_remove_var was simply an overlook.

The 'target_mem_desc' is supposed to represent the piece of device memory
inside libgomp, so unref/freeing it only after all dev-to-host copying is
done seems logical.

Chung-Lin


[PATCH] Fix gnu-versioned-namespace build

2019-12-10 Thread François Dumont

I plan to commit this tomorrow.

Note that rather than just adding the missing 
_GLIBCXX_[BEGIN,END]_VERSION_NAMESPACE I also move anonymous namespace 
usage outside std namespace. Let me know if it was intentional.


    * src/c++11/random.cc: Add _GLIBCXX_BEGIN_NAMESPACE_VERSION and
    _GLIBCXX_END_NAMESPACE_VERSION. Move anonymous namespace outside std
    namespace.

Tested under Linux x86_64 normal/debug/versioned namespace modes.

There are still tests failing in versioned namespace, more patches to come.

François

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 10fbe1dc4c4..d4ebc9556ab 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -73,8 +73,6 @@
 # define USE_MT19937 1
 #endif
 
-namespace std _GLIBCXX_VISIBILITY(default)
-{
 namespace
 {
 #if USE_RDRAND
@@ -124,6 +122,10 @@ namespace std _GLIBCXX_VISIBILITY(default)
 #endif
 }
 
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
   void
   random_device::_M_init(const std::string& token)
   {
@@ -286,7 +288,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _M_mt.seed(seed);
 #else
 // Convert old default token "mt19937" or numeric seed tokens to "default".
-if (token == "mt19937" || isdigit((unsigned char)token[0]))
+if (token == "mt19937" || std::isdigit((unsigned char)token[0]))
   _M_init("default");
 else
   _M_init(token);
@@ -407,5 +409,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 0x9d2c5680UL, 15,
 0xefc6UL, 18, 1812433253UL>;
 #endif // USE_MT19937
-}
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
 #endif // _GLIBCXX_USE_C99_STDINT_TR1


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-10 Thread Sandra Loosemore

On 12/6/19 3:41 PM, Jeff Law wrote:

On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:


I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c

I believe tihs is another instance of the FSA optimization.  I'd need
to see the before/after dumps to know if it's regressed.  The main
purpose of the test was to verify that we didn't muck up the profile
estimation after the FSA optimization.



gcc.dg/tree-ssa/ssa-dce-3.c

So I think this one is supposed to collapse into a trivial infinite
loop.  Anything else would be a regression.



gcc.dg/tree-ssa/ssa-dom-thread-7.c

This is the FSA optimization.  Unfortunately checking for it being done
right is exceedingly painful.  If you pass along the before/after dumps
I can probably help determine whether or not we just need an update to
the scanned bits.

Given how much pressure there was to handle the FSA optimization, I'd
prefer to make sure we're still doing it before moving forward.


I poked at these 3 test cases some more.  FWIW, it appears that if you 
use an unmodified build to compile them as C++ instead of C, the same 
problems appear.  So I guess it is an existing bug in 
something-or-another that we are not presently optimizing code output by 
the C++ front end as well as that from the C front end.  :-S  (At least, 
the ssa-dce-3.c optimization failure is a bug; the other two might be 
fragile test cases.)


The C++ front end used to lower loops in exactly the same way as the C 
front end.  This is the patch that changed it:


https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html

There wasn't much discussion about how this affected optimization beyond 
noting a slight decrease in code size for a single benchmark, and no 
consideration at all of whether it wouldn't be better to have the C and 
C++ front ends use the same lowering strategy for loops, whichever way 
produced better code, so that the optimizers can better recognize the 
common patterns from both languages.


Anyway, I'm no longer expecting to get this front end patch into GCC 10, 
but it would be helpful to get some guidance on how to proceed for 
resubmission when stage 1 re-opens.  E.g. from where I'm sitting now, 
fixing the C++ constexpr evaluator to handle gotos (if it doesn't 
already?) and reverting to the C way of lowering loops seems like a much 
more bounded problem than fixing optimizers and trying to benchmark 
their effectiveness.  :-S  OTOH, somebody more familiar with these 
optimizations might see easy fixes.  Or I could re-jigger my patches to 
continue to use different loop lowering strategies for C and C++ so it 
at least wouldn't make things any worse for either language than it 
already is.


-Sandra


[PATCH] Rename condition_variable_any::wait_on_* methods

2019-12-10 Thread Thomas Rodgers
User-agent: mu4e 1.3.4; emacs 26.2
* include/std/condition_variable
(condition_variable_any::wait_on(_Lock&, stop_token, _Predicate): Rename
to match current draft standard.
(condition_variable_any::wait_on_until(_Lock&, stop_token,
const chrono::time_point<>&, _Predicate): Likewise.
(condition_variable_any::wait_on_for(_Lock&, stop_token,
const chrono::duration<>&, _Predicate(: Likewise.
* testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc 
(main):
Adjust tests to account for renamed methods.
---
 libstdc++-v3/include/std/condition_variable   | 30 +--
 .../stop_token/wait_on.cc | 22 +++---
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 8887cee29fa..3346a28e5dd 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -369,9 +369,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef __cpp_lib_jthread
 template 
-bool wait_on(_Lock& __lock,
- stop_token __stoken,
- _Predicate __p)
+bool wait(_Lock& __lock,
+  stop_token __stoken,
+  _Predicate __p)
 {
   if (__stoken.stop_requested())
 {
@@ -397,10 +397,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 template 
-bool wait_on_until(_Lock& __lock,
-   stop_token __stoken,
-   const chrono::time_point<_Clock, _Duration>& __abs_time,
-   _Predicate __p)
+bool wait_until(_Lock& __lock,
+stop_token __stoken,
+const chrono::time_point<_Clock, _Duration>& __abs_time,
+_Predicate __p)
 {
   if (__stoken.stop_requested())
 {
@@ -432,16 +432,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 template 
-bool wait_on_for(_Lock& __lock,
- stop_token __stoken,
- const chrono::duration<_Rep, _Period>& __rel_time,
- _Predicate __p)
+bool wait_for(_Lock& __lock,
+  stop_token __stoken,
+  const chrono::duration<_Rep, _Period>& __rel_time,
+  _Predicate __p)
 {
   auto __abst = std::chrono::steady_clock::now() + __rel_time;
-  return wait_on_until(__lock,
-   std::move(__stoken),
-   __abst,
-   std::move(__p));
+  return wait_until(__lock,
+std::move(__stoken),
+__abst,
+std::move(__p));
 }
 #endif
   };
diff --git 
a/libstdc++-v3/testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc
 
b/libstdc++-v3/testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc
index 212fc949b3f..636425b43fc 100644
--- 
a/libstdc++-v3/testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc
+++ 
b/libstdc++-v3/testsuite/30_threads/condition_variable_any/stop_token/wait_on.cc
@@ -28,7 +28,7 @@
 
 using namespace::std::literals;
 
-void test_wait_on_stop()
+void test_wait_stop()
 {
   bool ready = false;
   std::mutex mtx;
@@ -40,7 +40,7 @@ void test_wait_on_stop()
   std::thread t([, , , tok]
 {
   std::unique_lock lck(mtx);
-  auto res = cv.wait_on(lck, tok, [] { return ready; });
+  auto res = cv.wait(lck, tok, [] { return ready; });
   if (!res)
 {
   VERIFY(tok.stop_requested());
@@ -54,7 +54,7 @@ void test_wait_on_stop()
   VERIFY(src.stop_requested());
 }
 
-void test_wait_on_until(bool ck = true)
+void test_wait_until(bool ck = true)
 {
   bool ready = false;
   std::mutex mtx;
@@ -67,7 +67,7 @@ void test_wait_on_until(bool ck = true)
   std::thread t([ck, , , , abst, tok]
 {
   std::unique_lock lck(mtx);
-  auto res = cv.wait_on_until(lck, tok, abst, [] { 
return ready; });
+  auto res = cv.wait_until(lck, tok, abst, [] { return 
ready; });
   if (!res && ck)
 {
   VERIFY(tok.stop_requested());
@@ -90,7 +90,7 @@ void test_wait_on_until(bool ck = true)
 }
 }
 
-void test_wait_on_for(bool ck = true)
+void test_wait_for(bool ck = true)
 {
   bool ready = false;
   std::mutex mtx;
@@ -102,7 +102,7 @@ void test_wait_on_for(bool ck = true)
   std::thread t([ck, , , , tok]
 {
   std::unique_lock lck(mtx);
-  auto res = cv.wait_on_for(lck, tok, 1.0s, [] { return 
ready; });
+  auto res = cv.wait_for(lck, tok, 1.0s, [] { return 
ready; });
   if (!res && ck)
 {
   VERIFY(tok.stop_requested());
@@ -127,10 +127,10 @@ void 

Re: [PATCH] Restore enable_if lost during original import of pstl

2019-12-10 Thread Thomas Rodgers


Tested x86_64-pc-linux-gnu, committed to trunk, backported to
gcc-9-branch.

Jonathan Wakely writes:

> On 18/11/19 20:54 -0800, Thomas Rodgers wrote:
>>
>>  * include/pstl/glue_numeric_defs.h: Restore enable_if lost during 
>> original
>>  import of pstl.
>>  * include/pstl/glue_numeric_impl.h: Likewise.
>
> OK for trunk and gcc-9-branch, thanks.



Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-10 Thread Jeff Law
On Mon, 2019-12-09 at 13:40 +, Stam Markianos-Wright wrote:
> 
> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
> > 
> > On 12/2/19 9:27 PM, Joseph Myers wrote:
> > > On Mon, 2 Dec 2019, Jeff Law wrote:
> > > 
> > > > > 2019-11-13  Stam Markianos-Wright  <
> > > > > stam.markianos-wri...@arm.com>
> > > > > 
> > > > > * real.c (struct arm_bfloat_half_format,
> > > > > encode_arm_bfloat_half, decode_arm_bfloat_half): New.
> > > > > * real.h (arm_bfloat_half_format): New.
> > > > > 
> > > > > 
> > > > Generally OK.  Please consider using "arm_bfloat_half" instead
> > > > of
> > > > "bfloat_half" for the name field in the arm_bfloat_half_format
> > > > structure.  I'm not sure if that's really visible externally,
> > > > but it
> > Hi both! Agreed that we want to be conservative. See latest diff 
> > attached with the name field change (also pasted below).
> 
> .Ping :)
Sorry if I wasn't clear.  WIth the name change I considered this OK for
the trunk.  Please install on the trunk.

If you don't have commit privs let me know.


Jeff



[PATCH] Add abs pattern to handle {si,di} mode abs to avoid pmax/cmove conversion (PR92651)

2019-12-10 Thread 玩还有
Hi:
  Currently smax/smin pattern added by r274481 cause some regression
in 525.x264_r by 8% with -O2 -march=corei7. The reason is some IA
backends (contain TARGET_SSE4_1) will do transform for simple abs
(using rshift, xor and sub) to pmax/pmin if smax/smin pattern exists,
which generate unnecessary sse instruction. This patch adds abs
patterns to generate simple abs for integer mode to recover the
regression.

  Bootstrap ok, regression test on i386 backend is ok.
  Ok for trunk?

Changelog
gcc/
PR target/92651
* config/i386/i386.h (TARGET_USE_SIMPLE_ABS_PATTERN): New macro.
* config/i386/x86-tune.def (X86_TUNE_USE_SIMPLE_ABS_PATTERN): New
* config/i386/i386.md (abs2): New define_expand.

gcc/testsuite
* gcc.target/i386/pr92651.c: New testcase.

Regards,
Hongyu, Wang
From c0bf64efbcaa989349130b676880cc2ed49fca69 Mon Sep 17 00:00:00 2001
From: hongyuw1 
Date: Thu, 28 Nov 2019 12:49:04 +
Subject: [PATCH] Add abs pattern to handle {si,di} mode abs to avoid
 pmax/cmove conversion.

gcc/
	PR target/92651
	* config/i386/i386.h (TARGET_USE_SIMPLE_ABS_PATTERN): New macro.
	* config/i386/x86-tune.def (X86_TUNE_USE_SIMPLE_ABS_PATTERN): New.
	* config/i386/i386.md (abs2): New define_expand.

gcc/testsuite
	* gcc.target/i386/pr92651.c: New testcase.
---
 gcc/config/i386/i386.h  |  2 ++
 gcc/config/i386/i386.md | 39 +
 gcc/config/i386/x86-tune.def|  7 +
 gcc/testsuite/gcc.target/i386/pr92651.c | 16 ++
 4 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92651.c

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 108456b14d4..ea27526e42e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -596,6 +596,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 	ix86_tune_features[X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE]
 #define TARGET_EMIT_VZEROUPPER \
 	ix86_tune_features[X86_TUNE_EMIT_VZEROUPPER]
+#define TARGET_USE_SIMPLE_ABS_PATTERN \
+	ix86_tune_features[X86_TUNE_USE_SIMPLE_ABS_PATTERN]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7ff5872ba43..8e5059aedec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9668,6 +9668,45 @@
   "#"
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
+;; Special expand pattern to handle integer mode abs
+
+(define_expand "abs2"
+  [(set (match_operand:SWI48x 0 "register_operand")
+(abs:SWI48x
+  (match_operand:SWI48x 1 "register_operand")))]
+  "TARGET_USE_SIMPLE_ABS_PATTERN"
+  {
+machine_mode mode = mode;
+
+/* Generate rtx abs using abs (x) = (((signed) x >> (W-1)) ^ x) -
+   ((signed) x >> (W-1)) */
+rtx shift_amount = gen_int_shift_amount (mode,
+   GET_MODE_PRECISION (mode)
+   - 1);
+shift_amount = convert_modes (E_QImode, GET_MODE (shift_amount),
+			shift_amount, 1);
+rtx shift_dst = gen_reg_rtx (mode);
+rtx shift_op = gen_rtx_SET (shift_dst,
+			  gen_rtx_fmt_ee (ASHIFTRT, mode,
+	  operands[1], shift_amount));
+rtx clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode,
+		FLAGS_REG));
+emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, shift_op,
+		clobber)));
+
+rtx xor_op = gen_rtx_SET (operands[0],
+			gen_rtx_fmt_ee (XOR, mode, shift_dst,
+	operands[1]));
+emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, xor_op, clobber)));
+
+rtx minus_op = gen_rtx_SET (operands[0],
+			  gen_rtx_fmt_ee (MINUS, mode,
+	  operands[0], shift_dst));
+emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, minus_op,
+		clobber)));
+DONE;
+  })
+
 (define_expand "2"
   [(set (match_operand:X87MODEF 0 "register_operand")
 	(absneg:X87MODEF (match_operand:X87MODEF 1 "register_operand")))]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 328535d38d7..86ff24122e6 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -317,6 +317,13 @@ DEF_TUNE (X86_TUNE_ONE_IF_CONV_INSN, "one_if_conv_insn",
 DEF_TUNE (X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE, "use_xchg_for_atomic_store",
 	 m_CORE_ALL | m_BDVER | m_ZNVER | m_GENERIC)
 
+/* X86_TUNE_USE_SIMPLE_ABS_PATTERN: This enables a new abs pattern by
+   generating instructions for abs (x) = (((signed) x >> (W-1) ^ x) -
+   (signed) x >> (W-1)) instead of cmove or SSE max/abs instructions.  */
+DEF_TUNE (X86_TUNE_USE_SIMPLE_ABS_PATTERN, "use_simple_abs_pattern",
+	  m_CORE_ALL | m_SILVERMONT | m_KNL | m_KNM | m_GOLDMONT
+	  | m_GOLDMONT_PLUS | m_TREMONT )
+
 /*/
 /* 387 instruction selection tuning  */
 /*/
diff --git a/gcc/testsuite/gcc.target/i386/pr92651.c 

Re: [PATCH] Fix multibyte-related issues in pretty-print.c (PR 91843)

2019-12-10 Thread David Malcolm
On Tue, 2019-12-10 at 09:36 -0500, Lewis Hyatt wrote:
> On Mon, Dec 9, 2019 at 4:58 PM David Malcolm 
> wrote:
> > On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:
> > > Hello-
> > > 
> > > This short patch addresses
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843
> > > by adding the needed multibyte awareness to pretty-print.c.
> > > Together with my other patch awaiting review
> > > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this
> > > fixes all
> > > issues that I am aware of regarding printing diagnostics with
> > > multibyte
> > > characters in UTF-8 locales. Would you please have a look and see
> > > if
> > > it's OK?
> > > Thanks very much.
> > > 
> > > bootstrapped and tested on x86-64 Linux, all test results were
> > > identical before
> > > and after:
> > > 34 XPASS
> > > 109 FAIL
> > > 1490 XFAIL
> > > 9470 UNSUPPORTED
> > > 332971 PASS
> > > 
> > > -Lewis
> > 
> > Patch looks good to me.
> > 
> > Do you want SVN commit access, as per:
> >   https://www.gnu.org/software/gcc/svnwrite.html
> > ?
> > 
> > I'm willing to sponsor you.
> > 
> > Dave
> > 
> 
> All set with the commit access, thank you. 

Excellent.

> I just wanted to double
> check that this patch is OK to go in.

Looks good to me.

Dave



Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Hongtao Liu
On Tue, Dec 10, 2019 at 4:11 PM Jakub Jelinek  wrote:
>
> On Tue, Dec 10, 2019 at 01:47:50PM +0800, Hongtao Liu wrote:
> >   This patch is to enable integer mask cmp/cmov under AVX512F even
> > with TARGET_XOP .
> >   Bootstrap and regression test on i386/x86_64 backend is ok.
> >
> > Changelog:
> > PR target/92865
> > * gcc/config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
> > integer mask cmov when available even with TARGET_XOP.
> > * gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.
>
> No gcc/ or gcc/testsuite/ prefixes in ChangeLog.
>
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -3428,7 +3428,7 @@ static bool
> >  ix86_valid_mask_cmp_mode (machine_mode mode)
> >  {
> >/* XOP has its own vector conditional movement.  */
> > -  if (TARGET_XOP)
> > +  if (TARGET_XOP && !TARGET_AVX512F)
> >  return false;
> >
> >/* AVX512F is needed for mask operation.  */
>
> We don't know what will AMD CPUs with AVX512* do or what will be optimal for
Yes, I'll make it tunable for different processors in another
separated patch or in this one?
> them, there aren't any yet.  I guess this is fine for now, so would be your
> previous && GET_MODE_SIZE (mode) == 16.
>
> Jakub
>
Updated patch with Changelog


Changelog
gcc/
PR target/92865
* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
integer mask cmov when available even with TARGET_XOP.

gcc/testsuite
* gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.


-- 
BR,
Hongtao
From 16b7c5caa930684fce604b352575d27a92b313fb Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 10 Dec 2019 09:44:18 +0800
Subject: [PATCH] Fix unrecognizable insn of pr92865.

gcc/
PR target/92865
* config/i386/i386-expand.c (ix86_valid_mask_cmp_mode): Enable
integer mask cmov when available even with TARGET_XOP.

gcc/testsuite
* gcc/testsuite/gcc.target/i386/pr92865-1.c: New test.
---
 gcc/config/i386/i386-expand.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr92865-1.c | 67 +++
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92865-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index ff3c24cc5b7..cbf4eb7b487 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3428,7 +3428,7 @@ static bool
 ix86_valid_mask_cmp_mode (machine_mode mode)
 {
   /* XOP has its own vector conditional movement.  */
-  if (TARGET_XOP)
+  if (TARGET_XOP && !TARGET_AVX512F)
 return false;
 
   /* AVX512F is needed for mask operation.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr92865-1.c b/gcc/testsuite/gcc.target/i386/pr92865-1.c
new file mode 100644
index 000..49b5778a067
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92865-1.c
@@ -0,0 +1,67 @@
+/* PR target/92865 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512f -mavx512bw -mxop" } */
+/* { dg-final { scan-assembler-times "vpcmp\[bwdq\]\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vpcmpu\[bwdq\]\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 4 } } *
+/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 4 } } */
+
+extern char arraysb[64];
+extern short arraysw[32];
+extern int arraysd[16];
+extern long long arraysq[8];
+
+extern unsigned char arrayub[64];
+extern unsigned short arrayuw[32];
+extern unsigned int arrayud[16];
+extern unsigned long long arrayuq[8];
+
+int f1(char a)
+{
+  for (int i = 0; i < 64; i++)
+arraysb[i] = arraysb[i] >= a;
+}
+
+int f2(short a)
+{
+  for (int i = 0; i < 32; i++)
+arraysw[i] = arraysw[i] >= a;
+}
+
+int f3(int a)
+{
+  for (int i = 0; i < 16; i++)
+arraysd[i] = arraysd[i] >= a;
+}
+
+int f4(long long a)
+{
+  for (int i = 0; i < 8; i++)
+arraysq[i] = arraysq[i] >= a;
+}
+
+int f5(unsigned char a)
+{
+  for (int i = 0; i < 64; i++)
+arrayub[i] = arrayub[i] >= a;
+}
+
+int f6(unsigned short a)
+{
+  for (int i = 0; i < 32; i++)
+arrayuw[i] = arrayuw[i] >= a;
+}
+
+int f7(unsigned int a)
+{
+  for (int i = 0; i < 16; i++)
+arrayud[i] = arrayud[i] >= a;
+}
+
+int f8(unsigned long long a)
+{
+  for (int i = 0; i < 8; i++)
+arrayuq[i] = arrayuq[i] >= a;
+}
-- 
2.18.1



[patch] factor out common files for compare_exclusions

2019-12-10 Thread Matthias Klose
the toplevel configure.ac repeats common exclusion files for specific targets.
Just factor those out.  Maybe not required, but gm2 is adding more files to be
ignored on every target, so make it easy to only have these files mentioned in
one place. Ok for the trunk?

Matthias

2019-12-11  Matthias Klose  

	* configure.ac: Factor out common cases for compare_exclusions.
	* configure: Regenerate.

Index: configure.ac
===
--- configure.ac	(revision 279204)
+++ configure.ac	(working copy)
@@ -3628,8 +3628,8 @@
 compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/*tools/*"
 case "$target" in
   hppa*64*-*-hpux*) ;;
-  hppa*-*-hpux*) compare_exclusions="gcc/cc*-checksum\$(objext) | */libgcc/lib2funcs* | gcc/ada/*tools/* | gcc/function-tests.o" ;;
-  powerpc*-ibm-aix*) compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/*tools/* | *libgomp*\$(objext)" ;;
+  hppa*-*-hpux*) compare_exclusions="$compare_exclusions | */libgcc/lib2funcs* | gcc/function-tests.o" ;;
+  powerpc*-ibm-aix*) compare_exclusions="$compare_exclusions | *libgomp*\$(objext)" ;;
 esac
 AC_SUBST(compare_exclusions)
 


Go patch committed: Build type descriptor for alias in other package

2019-12-10 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang generates type
descriptors for pointers to aliases defined in other packages.  When a
type descriptor is needed (for e.g. interface conversion), if the type
is a pointer to a named type defined in another package, we don't
generate the definition of the type descriptor because it is generated
in the package where the type is defined.  However, if the named type
is an alias to an unnamed type, its descriptor is not generated in the
other package, and we need to generate it.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 279136)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6f2bf15e15bf7516c393966577d72b79cba7f980
+85641a0f26061f7c98db42a2adb3250c07ce504e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 278984)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -1453,13 +1453,13 @@ Type::type_descriptor_defined_elsewhere(
   else
 {
   if (this->points_to() != NULL
- && this->points_to()->named_type() != NULL
- && this->points_to()->named_type()->named_object()->package() != NULL)
+ && this->points_to()->unalias()->named_type() != NULL
+ && 
this->points_to()->unalias()->named_type()->named_object()->package() != NULL)
{
  // This is an unnamed pointer to a named type defined in a
  // different package.  The descriptor should be defined in
  // that package.
- *package = this->points_to()->named_type()->named_object()->package();
+ *package = 
this->points_to()->unalias()->named_type()->named_object()->package();
  return true;
}
 }


[PATCH] libstdc++: Correct noexcept-specifiers on span constructors

2019-12-10 Thread Jonathan Wakely

As discussed at https://github.com/cplusplus/draft/issues/3534 two
std::span constructors specify incorrect conditions for throwing
exceptions. This patch makes those constructors have correct
noexcept-specifiers that accurately reflect what can actually throw.

(span(ContiguousIterator, Sentinel)): Add conditional noexcept.
* include/std/span (span(ContiguousIterator, size_type)): Change
noexcept to be unconditionally true.
* testsuite/23_containers/span/nothrow_cons.cc: New test.

Tested x86_64-linux, committed to trunk.

commit a1cc0205b3a297db1399519f5169a6a1f245a421
Author: Jonathan Wakely 
Date:   Tue Dec 10 23:39:22 2019 +

libstdc++: Correct noexcept-specifiers on span constructors

As discussed at https://github.com/cplusplus/draft/issues/3534 two
std::span constructors specify incorrect conditions for throwing
exceptions. This patch makes those constructors have correct
noexcept-specifiers that accurately reflect what can actually throw.

(span(ContiguousIterator, Sentinel)): Add conditional noexcept.
* include/std/span (span(ContiguousIterator, size_type)): Change
noexcept to be unconditionally true.
* testsuite/23_containers/span/nothrow_cons.cc: New test.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index ecce0b33b0b..6328ecbcde5 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -210,6 +210,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
&& (!is_convertible_v<_Sentinel, size_type>)
constexpr
span(_ContiguousIterator __first, _Sentinel __last)
+   noexcept(noexcept(__last - __first))
: _M_extent(static_cast(__last - __first)),
  _M_ptr(std::to_address(__first))
{
@@ -221,7 +222,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
requires (__is_compatible_iterator<_ContiguousIterator>::value)
constexpr
span(_ContiguousIterator __first, size_type __count)
-   noexcept(noexcept(std::to_address(__first)))
+   noexcept
: _M_extent(__count), _M_ptr(std::to_address(__first))
{ __glibcxx_assert(_Extent == dynamic_extent || __count == _Extent); }
 
diff --git a/libstdc++-v3/testsuite/23_containers/span/nothrow_cons.cc 
b/libstdc++-v3/testsuite/23_containers/span/nothrow_cons.cc
new file mode 100644
index 000..f28a3386aaf
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/nothrow_cons.cc
@@ -0,0 +1,59 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+using std::span;
+using std::is_nothrow_constructible_v;
+
+static_assert( is_nothrow_constructible_v> );
+static_assert( is_nothrow_constructible_v> );
+
+static_assert( is_nothrow_constructible_v, span&> );
+static_assert( is_nothrow_constructible_v, span&> );
+static_assert( is_nothrow_constructible_v, span&> );
+static_assert( is_nothrow_constructible_v, span&> );
+static_assert( is_nothrow_constructible_v, span&> );
+static_assert( is_nothrow_constructible_v, span&> );
+
+static_assert( is_nothrow_constructible_v, int(&)[1]> );
+static_assert( is_nothrow_constructible_v, int(&)[1]> );
+static_assert( is_nothrow_constructible_v, std::array&> );
+static_assert( is_nothrow_constructible_v, std::array&> );
+
+template
+struct sentinel { int* p; };
+
+template
+bool operator==(sentinel s, int* p) noexcept { return s.p == p; }
+
+template
+std::ptrdiff_t operator-(sentinel s, int* p) noexcept(B) { return s.p - p; }
+
+template
+std::ptrdiff_t operator-(int* p, sentinel s) noexcept { return p - s.p; }
+
+static_assert(std::sized_sentinel_for, int*>);
+static_assert(std::sized_sentinel_for, int*>);
+
+static_assert(is_nothrow_constructible_v, int*, std::size_t>);
+static_assert(is_nothrow_constructible_v, int*, const int*>);
+static_assert(is_nothrow_constructible_v, int*, sentinel>);
+static_assert(!is_nothrow_constructible_v, int*, sentinel>);


Re: [PATCH] Fix ICE in compute_objsize (PR tree-optimization/92891)

2019-12-10 Thread Jeff Law
On Wed, 2019-12-11 at 00:23 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs, because gimple_call_alloc_size doesn't
> always
> return a sizetype typed INTEGER_CST, which the callers rely on
> (compare
> those converted to wide_int with other wide_ints with the sizetype
> precision).  If alloc_size attribute has two arguments,
> gimple_call_alloc_size will always build a sizetype INTEGER_CST,
> but if it is just one, it returns what is passed to the argument,
> whatever type it has, so could be wider (e.g. __int128) or narrower
> like in the testcase on lp64.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for
> trunk?
> 
> 2019-12-10  Jakub Jelinek  
> 
>   PR tree-optimization/92891
>   * builtins.c (gimple_call_alloc_size): Convert size to sizetype
>   before returning it.
> 
>   * gcc.c-torture/compile/pr92891.c: New test.
OK.

jeff
> 



[PATCH] Fix ICE in compute_objsize (PR tree-optimization/92891)

2019-12-10 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because gimple_call_alloc_size doesn't always
return a sizetype typed INTEGER_CST, which the callers rely on (compare
those converted to wide_int with other wide_ints with the sizetype
precision).  If alloc_size attribute has two arguments,
gimple_call_alloc_size will always build a sizetype INTEGER_CST,
but if it is just one, it returns what is passed to the argument,
whatever type it has, so could be wider (e.g. __int128) or narrower
like in the testcase on lp64.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2019-12-10  Jakub Jelinek  

PR tree-optimization/92891
* builtins.c (gimple_call_alloc_size): Convert size to sizetype
before returning it.

* gcc.c-torture/compile/pr92891.c: New test.

--- gcc/builtins.c.jj   2019-12-09 19:50:24.733953169 +0100
+++ gcc/builtins.c  2019-12-10 20:56:53.619769947 +0100
@@ -3755,7 +3755,7 @@ gimple_call_alloc_size (gimple *stmt)
 return NULL_TREE;
 
   if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
-return size;
+return fold_convert (sizetype, size);
 
   /* To handle ranges do the math in wide_int and return the product
  of the upper bounds as a constant.  Ignore anti-ranges.  */
--- gcc/testsuite/gcc.c-torture/compile/pr92891.c.jj2019-12-10 
21:09:14.137648344 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr92891.c   2019-12-10 
21:08:56.902907013 +0100
@@ -0,0 +1,16 @@
+/* PR tree-optimization/92891 */
+
+int a, b;
+char *foo (int) __attribute__((alloc_size(1)));
+
+void
+bar (void)
+{
+  char *e = foo (2);
+  while (a)
+{
+  if (b <= 0)
+   continue;
+  e[b] = 0;
+}
+}

Jakub



Re: [PATCH] avoid invoking assignment on uninitialized objects (PR 92761, 92762)

2019-12-10 Thread Martin Sebor

On 12/10/19 3:07 PM, David Malcolm wrote:

On Tue, 2019-12-03 at 15:41 -0700, Martin Sebor wrote:

After allocating a new chunk of memory hash_table::expand() copy-
assigns elements from the current array over the newly allocated
elements without constructing them.

Similarly, after destroying existing elements, hash_table::
empty_slow() assigns a new value to them.  This bug was
introduced in r249234 when trying to deal with -Wclass-memaccess
instances when the warning was first added.

Neither of these is a problem for PODs but both cause trouble when
the hash_table contains elements of a type with a user-defined copy
assignment operator.  There are at least two such instances in GCC
already and a third is under review.

The attached patch avoids this by using placement new to construct
new elements in uninitialized storage and restoring the original
call to memset in hash_table::empty_slow(), analogously to what
was done in r272893 for PR90923,

Longer term, to make these templates safely and efficiently usable
with non-POD types with user-defined copy ctors and copy assignment
operators I think these classes will probably need to be enhanced
to make use of "assign" and "move" traits functions to efficiently
assign and move objects.

Martin



diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index ba5d64fb7a3..26bac624a08 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -818,8 +818,7 @@ hash_table::expand ()
if (!is_empty (x) && !is_deleted (x))
  {
value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
-
-  *q = x;
+ new ((void*) q) value_type (x);
  }
  
p++;

@@ -869,14 +868,8 @@ hash_table::empty_slow ()
m_size_prime_index = nindex;
  }
else
-{
-#ifndef BROKEN_VALUE_INITIALIZATION
-  for ( ; size; ++entries, --size)
-   *entries = value_type ();
-#else
-  memset (entries, 0, size * sizeof (value_type));
-#endif
-}
+memset ((void *) entries, 0, size * sizeof (value_type));
+
m_n_deleted = 0;
m_n_elements = 0;
  }


On attempting to rebase my analyzer branch I found that this patch
uncovered a bug in it, but also possibly a bug in hash-table.h.

In the analyzer branch I have a hash_map with a key where the "empty"
value isn't all-zero-bits.


There's a test case for it in comment #3 in 92762.



Specifically, in gcc/analyzer/program-state.h: sm_state_map has a
hash_map  map_t, where svalue_id, the key type, has
hash traits:

template <>
inline void
pod_hash_traits::mark_empty (value_type )
{
   v = svalue_id::null ();
}

which is a -1 int (all ones).

memset to zero bits populates the "empty" slots with a key with a non-
empty value for this key type, effectively corrupting the data
structure (luckily a selftest caught this).

Looking at the above patch, it looks like I was unwittingly relying on
two things:
(a) #ifndef BROKEN_VALUE_INITIALIZATION, and
(b) that the default ctor for my key type was the "empty" value.

However, shouldn't empty_slow be calling the Descriptor::mark_empty
function?  Doesn't this memset code assume that the "empty" value of
the hash_table entry is all-zeroes (and thus imposing the same
assumption on all hash_maps' key and value types?) - which isn't the
case for my code.  I don't remember seeing that assumption documented.

Or am I misreading this?


IIRC, I had tried the below but it caused problems during bootstrap
that I didn't spend enough time investigating.

  for (size_t i = size - 1; i < size; i--)
if (!is_empty (entries[i]) && !is_deleted (entries[i]))
  {
Descriptor::remove (entries[i]);
mark_empty (entries[i]);
  }

(I think it also caused compilation error in one of the IPA passes
because of a missing mark_empty function in its traits class; maybe
ipa_bit_ggc_hash_traits in ipa-prop.c).

To be honest, I'm not sure I understand why the memset is even there.
It seems like a leak to me (I can reproduce leaks with a user-defined
type).  I was going to get back to it at some point.

Martin


Re: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons

2019-12-10 Thread Vineet Gupta
On 12/10/19 1:12 AM, Claudiu Zissulescu wrote:
> Hi,
>
> Thank you for your contribution, I'll push it asap. As far as I understand, 
> you need this patch both in gcc9 branch and mainline.
>
> Cheers,
> Claudiu

Indeed both mainline and gcc9

Thx
-Vineet

>
>> -Original Message-
>> From: Vineet Gupta [mailto:vgu...@synopsys.com]
>> Sent: Monday, December 09, 2019 8:02 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Claudiu Zissulescu ;
>> andrew.burg...@embecosm.com; linux-snps-...@lists.infradead.org;
>> Vineet Gupta 
>> Subject: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float
>> comparisons
>>
>> ARC gcc generates FDCMP instructions which raises Invalid operation for
>> signaling NaN only. This causes glibc iseqsig() primitives to fail (in
>> the current ongoing glibc port to ARC)
>>
>> So split up the hard float compares into two categories and for unordered
>> compares generate the FDCMPF instruction (vs. FDCMP) which raises
>> exception
>> for either NaNs.
>>
>> With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC.
>>
>> Also passes 6 additional tests in glibc testsuite (test*iseqsig) and no
>> regressions
>>
>> gcc/
>> -xx-xx  Vineet Gupta  
>>
>>  * config/arc/arc-modes.def (CC_FPUE): New Mode CC_FPUE which
>>  helps codegen generate exceptions even for quiet NaN.
>>  * config/arc/arc.c (arc_init_reg_tables): Handle New CC_FPUE mode.
>>  (get_arc_condition_code): Likewise.
>>  (arc_select_cc_mode): LT, LE, GT, GE to use the New CC_FPUE
>> mode.
>>  * config/arc/arc.h (REVERSE_CONDITION): Handle New CC_FPUE
>> mode.
>>  * config/arc/predicates.md (proper_comparison_operator):
>> Likewise.
>>  * config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE.
>>  (cmpdf_fpu_trap): Likewise.
>>
>> Signed-off-by: Vineet Gupta 
>> ---
>>  gcc/config/arc/arc-modes.def |  1 +
>>  gcc/config/arc/arc.c |  8 ++--
>>  gcc/config/arc/arc.h |  2 +-
>>  gcc/config/arc/fpu.md| 24 
>>  gcc/config/arc/predicates.md |  1 +
>>  5 files changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
>> index 36a2f4abfb25..d16b6a289a15 100644
>> --- a/gcc/config/arc/arc-modes.def
>> +++ b/gcc/config/arc/arc-modes.def
>> @@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI
>> */
>>
>>  /* FPU condition flags.  */
>>  CC_MODE (CC_FPU);
>> +CC_MODE (CC_FPUE);
>>  CC_MODE (CC_FPU_UNEQ);
>> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
>> index 28305f459dcd..cbb95d6e9043 100644
>> --- a/gcc/config/arc/arc.c
>> +++ b/gcc/config/arc/arc.c
>> @@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison)
>>  default : gcc_unreachable ();
>>  }
>>  case E_CC_FPUmode:
>> +case E_CC_FPUEmode:
>>switch (GET_CODE (comparison))
>>  {
>>  case EQ: return ARC_CC_EQ;
>> @@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x,
>> rtx y)
>>case UNLE:
>>case UNGT:
>>case UNGE:
>> +return CC_FPUmode;
>> +
>>case LT:
>>case LE:
>>case GT:
>>case GE:
>> -return CC_FPUmode;
>> +return CC_FPUEmode;
>>
>>case LTGT:
>>case UNEQ:
>> @@ -1844,7 +1847,7 @@ arc_init_reg_tables (void)
>>if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
>>|| i == (int) CC_Cmode
>>|| i == CC_FP_GTmode || i == CC_FP_GEmode || i ==
>> CC_FP_ORDmode
>> -  || i == CC_FPUmode || i == CC_FPU_UNEQmode)
>> +  || i == CC_FPUmode || i == CC_FPUEmode || i ==
>> CC_FPU_UNEQmode)
>>  arc_mode_class[i] = 1 << (int) C_MODE;
>>else
>>  arc_mode_class[i] = 0;
>> @@ -8401,6 +8404,7 @@ arc_reorg (void)
>>
>>/* Avoid FPU instructions.  */
>>if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode)
>> +  || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode)
>>|| (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) ==
>> CC_FPU_UNEQmode))
>>  continue;
>>
>> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
>> index 4d7ac3281b41..c08ca3d0d432 100644
>> --- a/gcc/config/arc/arc.h
>> +++ b/gcc/config/arc/arc.h
>> @@ -1531,7 +1531,7 @@ enum arc_function_type {
>>(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode \
>>  || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode   \
>>  || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode \
>> -|| (MODE) == CC_FPUmode) \
>> +|| (MODE) == CC_FPUmode || (MODE) == CC_FPUEmode)\
>> ? reverse_condition_maybe_unordered ((CODE))  \
>> : reverse_condition ((CODE)))
>>
>> diff --git a/gcc/config/arc/fpu.md b/gcc/config/arc/fpu.md
>> index 6289e9c3f593..6729795de542 100644
>> --- a/gcc/config/arc/fpu.md
>> +++ b/gcc/config/arc/fpu.md
>> @@ -242,6 +242,18 @@
>> 

Re: [Patch, Fortran] PR92898 - [9/10 Regression] ICE in gfc_check_is_contiguous, at fortran/check.c:7157

2019-12-10 Thread Thomas Koenig

Hello Harald,


Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (Revision 279183)
+++ gcc/fortran/check.c (Arbeitskopie)
@@ -7154,7 +7154,9 @@ bool
  gfc_check_is_contiguous (gfc_expr *array)
  {
if (array->expr_type == EXPR_NULL
-  && array->symtree->n.sym->attr.pointer == 1)
+  && (!array->symtree ||
+ (array->symtree->n.sym &&
+  array->symtree->n.sym->attr.pointer == 1)))


I have to admit I do not understand the original code here, nor
do I quite understand your fix.

Is there any circumstance where array->expr_type == EXPR_NULL, but
is_contiguous is valid?  What would go wrong if the other tests
were removed?



Index: gcc/testsuite/gfortran.dg/pr91641.f90
===
--- gcc/testsuite/gfortran.dg/pr91641.f90   (Revision 279183)
+++ gcc/testsuite/gfortran.dg/pr91641.f90   (Arbeitskopie)
@@ -1,7 +1,9 @@
  ! { dg-do compile }
  ! PR fortran/91641
-! Code conyributed by Gerhard Steinmetz
+! PR fortran/92898
+! Code contributed by Gerhard Steinmetz
  program p
 real, pointer :: z(:)
 print *, is_contiguous (null(z))! { dg-error "shall be an associated" }
+   print *, is_contiguous (null()) ! { dg-error "shall be an associated" }
  end


Sometimes, it is necessary to change test cases, when error messages
change.  If this is not the case, it is better to add new tests to
new test cases - this makes regression hunting much easier.

Regards

Thomas


patch to fix PR92796

2019-12-10 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92796

The patch was successfully bootstrapped and tested on x86-64.

Committed as r279204


Index: ChangeLog
===
--- ChangeLog	(revision 279200)
+++ ChangeLog	(working copy)
@@ -1,3 +1,18 @@
+2019-12-10  Vladimir Makarov  
+
+	PR rtl-optimization/92796
+	* lra-int.h (lra_risky_transformations_p): Rename to
+	check_and_force_assignment_correctness_p.
+	* lra-assigns.c: Ditto.
+	(lra_assign): Reset check_and_force_assignment_correctness_p.
+	* lra-constraints.c (lra_risky_transformations_p): Rename to
+	check_and_force_assignment_correctness_p.
+	(lra_constraints): Set up check_and_force_assignment_correctness_p
+	only for the 1st sub-pass.
+	* lra-eliminations.c (process_insn_for_elimination): Set up
+	check_and_force_assignment_correctness_p if the insn chnaged its
+	code.
+
 2019-12-10  Jakub Jelinek  
 
 	PR rtl-optimization/92882
Index: lra-assigns.c
===
--- lra-assigns.c	(revision 278608)
+++ lra-assigns.c	(working copy)
@@ -1131,7 +1131,7 @@ static int *sorted_pseudos;
 /* The constraints pass is allowed to create equivalences between
pseudos that make the current allocation "incorrect" (in the sense
that pseudos are assigned to hard registers from their own conflict
-   sets).  The global variable lra_risky_transformations_p says
+   sets).  The global variable check_and_force_assignment_correctness_p says
whether this might have happened.
 
Process pseudos assigned to hard registers (less frequently used
@@ -1152,7 +1152,7 @@ setup_live_pseudos_and_spill_after_risky
   bitmap_iterator bi;
   int max_regno = max_reg_num ();
 
-  if (! lra_risky_transformations_p)
+  if (! check_and_force_assignment_correctness_p)
 {
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
 	if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
@@ -1690,6 +1690,8 @@ lra_assign (bool _p)
 internal_error
   ("maximum number of LRA assignment passes is achieved (%d)",
LRA_MAX_ASSIGNMENT_ITERATION_NUMBER);
+  /* Reset the assignment correctness flag: */
+  check_and_force_assignment_correctness_p = false;
   return no_spills_p;
 }
 
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 278608)
+++ lra-constraints.c	(working copy)
@@ -4665,11 +4665,14 @@ loc_equivalence_callback (rtx loc, const
 /* The current iteration number of this LRA pass.  */
 int lra_constraint_iter;
 
-/* True if we substituted equiv which needs checking register
-   allocation correctness because the equivalent value contains
-   allocatable hard registers or when we restore multi-register
-   pseudo.  */
-bool lra_risky_transformations_p;
+/* True if we should during assignment sub-pass check assignment
+   correctness for all pseudos and spill some of them to correct
+   conflicts.  It can be necessary when we substitute equiv which
+   needs checking register allocation correctness because the
+   equivalent value contains allocatable hard registers, or when we
+   restore multi-register pseudo, or when we change the insn code and
+   its operand became INOUT operand when it was IN one before.  */
+bool check_and_force_assignment_correctness_p;
 
 /* Return true if REGNO is referenced in more than one block.  */
 static bool
@@ -4811,14 +4814,14 @@ lra_constraints (bool first_p)
   changed_p = false;
   if (pic_offset_table_rtx
   && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
-lra_risky_transformations_p = true;
-  else
+check_and_force_assignment_correctness_p = true;
+  else if (first_p)
 /* On the first iteration we should check IRA assignment
correctness.  In rare cases, the assignments can be wrong as
early clobbers operands are ignored in IRA or usages of
paradoxical sub-registers are not taken into account by
IRA.  */
-lra_risky_transformations_p = first_p;
+check_and_force_assignment_correctness_p = true;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
   /* Mark used hard regs for target stack size calulations.  */
@@ -4994,7 +4997,7 @@ lra_constraints (bool first_p)
 		  dump_insn_slim (lra_dump_file, curr_insn);
 		}
 		  if (contains_reg_p (x, true, false))
-		lra_risky_transformations_p = true;
+		check_and_force_assignment_correctness_p = true;
 		  lra_set_insn_deleted (curr_insn);
 		  continue;
 		}
@@ -5507,7 +5510,7 @@ need_for_split_p (HARD_REG_SET potential
 	   /* Don't split call clobbered hard regs living through
 	  calls, otherwise we might have a check problem in the
 	  assign sub-pass as in the most cases (exception is a
-	  situation when lra_risky_transformations_p value is
+	  situation when check_and_force_assignment_correctness_p value 

Re: [PATCH] avoid invoking assignment on uninitialized objects (PR 92761, 92762)

2019-12-10 Thread David Malcolm
On Tue, 2019-12-03 at 15:41 -0700, Martin Sebor wrote:
> After allocating a new chunk of memory hash_table::expand() copy-
> assigns elements from the current array over the newly allocated
> elements without constructing them.
> 
> Similarly, after destroying existing elements, hash_table::
> empty_slow() assigns a new value to them.  This bug was
> introduced in r249234 when trying to deal with -Wclass-memaccess
> instances when the warning was first added.
> 
> Neither of these is a problem for PODs but both cause trouble when
> the hash_table contains elements of a type with a user-defined copy
> assignment operator.  There are at least two such instances in GCC
> already and a third is under review.
> 
> The attached patch avoids this by using placement new to construct
> new elements in uninitialized storage and restoring the original
> call to memset in hash_table::empty_slow(), analogously to what
> was done in r272893 for PR90923,
> 
> Longer term, to make these templates safely and efficiently usable
> with non-POD types with user-defined copy ctors and copy assignment
> operators I think these classes will probably need to be enhanced
> to make use of "assign" and "move" traits functions to efficiently
> assign and move objects.
> 
> Martin

> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index ba5d64fb7a3..26bac624a08 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -818,8 +818,7 @@ hash_table::expand ()
>if (!is_empty (x) && !is_deleted (x))
>  {
>value_type *q = find_empty_slot_for_expand (Descriptor::hash (x));
> -
> -  *q = x;
> +   new ((void*) q) value_type (x);
>  }
>  
>p++;
> @@ -869,14 +868,8 @@ hash_table::empty_slow ()
>m_size_prime_index = nindex;
>  }
>else
> -{
> -#ifndef BROKEN_VALUE_INITIALIZATION
> -  for ( ; size; ++entries, --size)
> - *entries = value_type ();
> -#else
> -  memset (entries, 0, size * sizeof (value_type));
> -#endif
> -}
> +memset ((void *) entries, 0, size * sizeof (value_type));
> +
>m_n_deleted = 0;
>m_n_elements = 0;
>  }

On attempting to rebase my analyzer branch I found that this patch
uncovered a bug in it, but also possibly a bug in hash-table.h.

In the analyzer branch I have a hash_map with a key where the "empty"
value isn't all-zero-bits.

Specifically, in gcc/analyzer/program-state.h: sm_state_map has a
hash_map  map_t, where svalue_id, the key type, has
hash traits:

template <>
inline void
pod_hash_traits::mark_empty (value_type )
{
  v = svalue_id::null ();
}

which is a -1 int (all ones).

memset to zero bits populates the "empty" slots with a key with a non-
empty value for this key type, effectively corrupting the data
structure (luckily a selftest caught this).

Looking at the above patch, it looks like I was unwittingly relying on
two things:
(a) #ifndef BROKEN_VALUE_INITIALIZATION, and
(b) that the default ctor for my key type was the "empty" value.

However, shouldn't empty_slow be calling the Descriptor::mark_empty
function?  Doesn't this memset code assume that the "empty" value of
the hash_table entry is all-zeroes (and thus imposing the same
assumption on all hash_maps' key and value types?) - which isn't the
case for my code.  I don't remember seeing that assumption documented.

Or am I misreading this?

Thanks
Dave



[Patch, Fortran] PR92898 - [9/10 Regression] ICE in gfc_check_is_contiguous, at fortran/check.c:7157

2019-12-10 Thread Harald Anlauf
The testcase in the PR exhibits a corner case in a check on invalid
code that was not handled appropriately and in turn ICEs.  The patch
below enhances that check.  Instead of adding a new testcase, I modified
the related one that came with the 'introduction' of the regression when
fixing PR91641.

Regtested on x86_64-pc-linux-gnu.

OK for trunk and 9 ?

Thanks,
Harald

2019-12-10  Harald Anlauf  

PR fortran/92898
* check.c (gfc_check_is_contiguous): Adjust check to handle NULL()
argument without an actual argument.

2019-12-10  Harald Anlauf  

PR fortran/92898
* gfortran.dg/pr91641.f90: Extend to check fix for PR92898.
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (Revision 279183)
+++ gcc/fortran/check.c (Arbeitskopie)
@@ -7154,7 +7154,9 @@ bool
 gfc_check_is_contiguous (gfc_expr *array)
 {
   if (array->expr_type == EXPR_NULL
-  && array->symtree->n.sym->attr.pointer == 1)
+  && (!array->symtree ||
+ (array->symtree->n.sym &&
+  array->symtree->n.sym->attr.pointer == 1)))
 {
   gfc_error ("Actual argument at %L of %qs intrinsic shall be an "
 "associated pointer", >where, gfc_current_intrinsic);
Index: gcc/testsuite/gfortran.dg/pr91641.f90
===
--- gcc/testsuite/gfortran.dg/pr91641.f90   (Revision 279183)
+++ gcc/testsuite/gfortran.dg/pr91641.f90   (Arbeitskopie)
@@ -1,7 +1,9 @@
 ! { dg-do compile }
 ! PR fortran/91641
-! Code conyributed by Gerhard Steinmetz
+! PR fortran/92898
+! Code contributed by Gerhard Steinmetz
 program p
real, pointer :: z(:)
print *, is_contiguous (null(z))! { dg-error "shall be an associated" }
+   print *, is_contiguous (null()) ! { dg-error "shall be an associated" }
 end


[patch, fortran, committed] Fix PR 91643, repacking of assumed rank argument

2019-12-10 Thread Thomas Koenig

Hello world,

this fixes a regression introduced by my inline repacking patch.
With the test case, it is simple and obvious enough - do not repack
an assumed rank argument (which makes no sense).

Committed as obvious and simple as r279203 after regression-testing.


2019-12-10  Thomas Koenig  

PR fortran/91643
* trans-array.c (gfc_conv_array_parameter): Do not repack
an assume dummy argument.

2019-12-10  Thomas Koenig  

PR fortran/91643
* gfortran.dg/assumed_rank_18.f90: New test.
Index: trans-array.c
===
--- trans-array.c	(Revision 279064)
+++ trans-array.c	(Arbeitskopie)
@@ -8141,6 +8141,8 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr *
 
   if (g77 && optimize && !optimize_size && expr->expr_type == EXPR_VARIABLE
 	  && !is_pointer (expr) && ! gfc_has_dimen_vector_ref (expr)
+	  && !(expr->symtree->n.sym->as
+	   && expr->symtree->n.sym->as->type == AS_ASSUMED_RANK)
 	  && (fsym == NULL || fsym->ts.type != BT_ASSUMED))
 	{
 	  gfc_conv_subref_array_arg (se, expr, g77,
! { dg-do run }
! PR 91643 - this used to cause an ICE.
! Original test case by Gerhard Steinmetz.
program p
   real :: z(3) = [1.0, 2.0, 3.0]
   call g(z)
contains
   subroutine g(x)
  real :: x(..)
  call h(x)
   end
   subroutine h(x)
  real :: x(*)
  if (x(1) /= 1.0) stop 1
   end
end


Re: [patch, fortran] Introduce -finline-pack

2019-12-10 Thread Thomas Koenig

Am 09.12.19 um 17:30 schrieb Thomas Koenig:

Maybe -finline-repack would be a better name? -finline-internal-pack?


Steve made an excellent suggestion: -finline-arg-packing .

So, OK with that change?

Regards

Thomas


[C++ PATCH] Improve C++ error recovery (PR c++/59655)

2019-12-10 Thread Jakub Jelinek
Hi!

On the following testcase, we emit 2 errors and 1 warning, when the user
really should see one error.  The desirable error is static_assert failure,
the bogus error is during error recovery, complaining that a no_linkage
template isn't defined when it really is defined, but we haven't bothered
instantiating it, because limit_bad_template_recursion sad so.
And finally a warning from the middle-end about function being used even
when it is not defined.

The last one already checks TREE_NO_WARNING on the function decl to avoid
the warning, so this patch just uses TREE_NO_WARNING to signal this case,
both to that warning and to no_linkage_error.

Now, I admit I'm not 100% sure if using TREE_NO_WARNING is the best thing
for no_linkage_error, another possibility might be adding some bit in
lang_decl_base or so and setting that bit in addition to TREE_NO_WARNING,
where that new bit would mean this decl might be defined if we didn't decide
not to instantiate it.  With the patch as is, there is a risk if
TREE_NO_WARNING is set for some other reason on a fndecl (or variable
template instantiation?) and some errors have been reported already that we
won't report another error for it when we should.  But perhaps that is
acceptable, once users fix the original errors even if TREE_NO_WARNING will
be set, errorcount + sorrycount will be zero and thus no_linkage_error will
still report it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or do you want to use an additional bit for that?

2019-12-10  Jakub Jelinek  

PR c++/59655
* pt.c (push_tinst_level_loc): If limit_bad_template_recursion,
set TREE_NO_WARNING on tldcl.
* decl2.c (no_linkage_error): Treat templates with TREE_NO_WARNING
as defined during error recovery.

* g++.dg/cpp0x/diag3.C: New test.

--- gcc/cp/pt.c.jj  2019-12-10 00:52:39.017449262 +0100
+++ gcc/cp/pt.c 2019-12-10 19:20:11.046062705 +0100
@@ -10640,7 +10640,12 @@ push_tinst_level_loc (tree tldcl, tree t
  anything else.  Do allow deduction substitution and decls usable in
  constant expressions.  */
   if (!targs && limit_bad_template_recursion (tldcl))
-return false;
+{
+  /* Avoid no_linkage_errors and unused function warnings for this
+decl.  */
+  TREE_NO_WARNING (tldcl) = 1;
+  return false;
+}
 
   /* When not -quiet, dump template instantiations other than functions, since
  announce_function will take care of those.  */
--- gcc/cp/decl2.c.jj   2019-11-28 09:05:13.376262983 +0100
+++ gcc/cp/decl2.c  2019-12-10 19:33:05.555237052 +0100
@@ -4414,7 +4414,14 @@ decl_maybe_constant_var_p (tree decl)
 void
 no_linkage_error (tree decl)
 {
-  if (cxx_dialect >= cxx11 && decl_defined_p (decl))
+  if (cxx_dialect >= cxx11
+  && (decl_defined_p (decl)
+ /* Treat templates which limit_bad_template_recursion decided
+not to instantiate as if they were defined.  */
+ || (errorcount + sorrycount > 0
+ && DECL_LANG_SPECIFIC (decl)
+ && DECL_TEMPLATE_INFO (decl)
+ && TREE_NO_WARNING (decl
 /* In C++11 it's ok if the decl is defined.  */
 return;
   tree t = no_linkage_check (TREE_TYPE (decl), /*relaxed_p=*/false);
--- gcc/testsuite/g++.dg/cpp0x/diag3.C.jj   2019-12-10 19:39:06.659722663 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/diag3.C  2019-12-10 19:37:30.617189316 +0100
@@ -0,0 +1,20 @@
+// PR c++/59655
+// { dg-do compile { target c++11 } }
+
+template struct A { static constexpr bool value = false; };
+
+struct B {
+  template
+  B (T t)
+  {
+static_assert (A::value, "baz");// { dg-error "static 
assertion failed" }
+foo (t);
+  }
+  template void foo (T) {} // { dg-bogus "used but never 
defined" }
+};
+
+int
+main ()
+{
+  B t([](int) { });
+}

Jakub



[PATCH] Fix vect rotate pattern recog (PR target/92723)

2019-12-10 Thread Jakub Jelinek
Hi!

Unlike x86, where the last operand of vector by scalar shift is DImode for
V[248]DImode shifts, on aarch64 they seem to be SImode.
vect_recog_rotate_pattern when the rotate amount is not constant casts the
amount to the element type of the vector, so for V[248]DImode vectors to
DImode, but then we ICE during expand_shift_1 because such argument doesn't
satisfy the predicate and can't be widened to it.

The following patch fixes it by special casing vector by scalar shifts
when adding patterns for rotates, in that case we punt if the operand isn't
INTEGER_CST or external_def, and the patch just keeps using the type of the
amount operand the rotate had for the shifts too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-10  Jakub Jelinek  

PR target/92723
* tree-vect-patterns.c (vect_recog_rotate_pattern): If vector x vector
shifts aren't supported and rotate amount is SSA_NAME, use its type
rather than first operand's type for the shift amounts.

* gcc.dg/vect/pr92723.c: New test.

--- gcc/tree-vect-patterns.c.jj 2019-12-09 11:12:29.983288823 +0100
+++ gcc/tree-vect-patterns.c2019-12-10 16:30:59.922177911 +0100
@@ -2242,6 +2242,7 @@ vect_recog_rotate_pattern (stmt_vec_info
   optab optab1, optab2;
   edge ext_def = NULL;
   bool bswap16_p = false;
+  bool scalar_shift_p = false;
 
   if (is_gimple_assign (last_stmt))
 {
@@ -2420,6 +2421,7 @@ vect_recog_rotate_pattern (stmt_vec_info
  || !optab2
  || optab_handler (optab2, TYPE_MODE (vectype)) == CODE_FOR_nothing)
return NULL;
+  scalar_shift_p = true;
 }
 
   *type_out = vectype;
@@ -2439,7 +2441,8 @@ vect_recog_rotate_pattern (stmt_vec_info
   def = NULL_TREE;
   scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
   if (TREE_CODE (oprnd1) == INTEGER_CST
-  || TYPE_MODE (TREE_TYPE (oprnd1)) == mode)
+  || TYPE_MODE (TREE_TYPE (oprnd1)) == mode
+  || scalar_shift_p)
 def = oprnd1;
   else if (def_stmt && gimple_assign_cast_p (def_stmt))
 {
--- gcc/testsuite/gcc.dg/vect/pr92723.c.jj  2019-12-10 16:37:09.924375698 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr92723.c 2019-12-10 16:37:21.823189103 +0100
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+void
+foo (unsigned long long *x, unsigned long long *y, int z)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+x[i] = (y[i] >> z) | (y[i] << (-z & (__SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 
1)));
+}

Jakub



[C++ PATCH] Fix -std=c++17 and earlier handling of CLASSTYPE_NON_AGGREGATE (PR c++/92869)

2019-12-10 Thread Jakub Jelinek
Hi!

In C++20, types with user-declared constructors are not aggregate types,
while in C++17 only types with user-provided or explicit constructors.
In check_bases_and_members we handle it properly:
  CLASSTYPE_NON_AGGREGATE (t)
|= ((cxx_dialect < cxx2a
 ? type_has_user_provided_or_explicit_constructor (t)
 : TYPE_HAS_USER_CONSTRUCTOR (t))
|| TYPE_POLYMORPHIC_P (t));
but for templates the code right now behaves the C++20 way regardless of the
selected standard.

The following patch tweaks finish_struct to match check_bases_and_members.
I had to add !CLASSTYPE_NON_AGGREGATE check because
type_has_user_provided_or_explicit_constructor -> user_provided_p ICEd
on inherited ctors represented as USING_DECLs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-10  Jakub Jelinek  

PR c++/92869
* class.c (finish_struct): For C++17 and earlier, check
type_has_user_provided_or_explicit_constructor rather than
TYPE_HAS_USER_CONSTRUCTOR whether to set CLASSTYPE_NON_AGGREGATE.

* g++.dg/cpp0x/aggr3.C: New test.

--- gcc/cp/class.c.jj   2019-12-06 00:40:44.525629037 +0100
+++ gcc/cp/class.c  2019-12-10 14:18:41.171380767 +0100
@@ -7456,7 +7456,13 @@ finish_struct (tree t, tree attributes)
   /* Remember current #pragma pack value.  */
   TYPE_PRECISION (t) = maximum_field_alignment;
 
-  if (TYPE_HAS_USER_CONSTRUCTOR (t))
+  if (cxx_dialect < cxx2a)
+   {
+ if (!CLASSTYPE_NON_AGGREGATE (t)
+ && type_has_user_provided_or_explicit_constructor (t))
+   CLASSTYPE_NON_AGGREGATE (t) = 1;
+   }
+  else if (TYPE_HAS_USER_CONSTRUCTOR (t))
CLASSTYPE_NON_AGGREGATE (t) = 1;
 
   /* Fix up any variants we've already built.  */
--- gcc/testsuite/g++.dg/cpp0x/aggr3.C.jj   2019-12-10 14:25:22.344231923 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/aggr3.C  2019-12-10 14:23:31.700927787 +0100
@@ -0,0 +1,20 @@
+// PR c++/92869
+// { dg-do compile { target c++11 } }
+
+struct A {
+  A () = default;
+  A (const A &) = default;
+  A (A &&) = default;
+  int arr[3];
+};
+
+template 
+struct B {
+  B () = default;
+  B (const B &) = default;
+  B (B &&) = default;
+  T arr[N];
+};
+
+A a = { { 1, 2, 3 } }; // { dg-error "could not convert" "" { target 
c++2a } }
+B b = { { 1, 2, 3 } }; // { dg-error "could not convert" "" { target 
c++2a } }

Jakub



Re: [PATCH] Improve -fstack-protector-strong (PR middle-end/92825)

2019-12-10 Thread Jeff Law
On Tue, 2019-12-10 at 10:27 +0100, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, -fstack-protector-strong in some cases
> instruments
> even functions which except for the stack canary setup and later
> testing of
> that don't touch the stack at all.
> 
> This is because for -fstack-protector-strong we setup the canary
> whenever stack_protect_decl_p returns true, and that goes through all
> local
> decls and if it is a var that is not a global var and is either
> address
> taken or is array or has arrays embedded in its type, returns true.
> Now, the problem is that by going through all non-global variables,
> it goes
> even through variables that are or have arrays, but we allocate them
> in
> registers like in the testcase below, or in theory for nested
> functions
> it could be variables from the parent function too.
> Variables which live in registers, even when they have arrays in
> them,
> shouldn't really cause stack overflows of any kind, out of bounds
> accesses
> to them never cause out of bounds stack accesses, usually they ought
> to be
> already optimized away, arrays that are accessed through non-constant
> indexes should cause variables to live in memory.
> 
> Another issue is that the record_or_union_type_has_array_p function
> duplicates
> the behavior of stack_protect_classify_type when it returns the
> SPCT_HAS_ARRAY
> bit set.
> 
> Initially, I thought I'd just move the stack_protect_decl_p call
> later when
> the stack_vars array is computed and instead of FOR_EACH_LOCAL_DECL
> walk the stack_vars array.  But for the discovery of stack_vars that
> are or
> have arrays that is a waste of time, all such variables are already
> picked
> for phase 1 or phase 2 and thus has_protected_decls bool will be set
> if
> there are any and we already do create the stack canary if
> has_protected_decls is set.  So, the only thing that remains is catch
> variables that aren't or don't contain arrays, but are address
> taken.  Those
> go into the last unnumbered phase, but we still want to create stack
> canary.
> Instead of adding yet another walk I've done that in a function that
> already
> walks them.
> 
> In addition to this, I've tried to clarify this in the
> documentation.  For
> -fstack-protector, we've been doing it this way already before,
> optimized
> away or arrays living in registers didn't count, because we only
> walked
> stack-vars.  And, the documentation also said that only > 8 byte
> arrays are
> protected with -stack-protector, but it is actually >= 8 byte (or
> whatever
> the ssp-buffer-size param is).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, in addition
> regtested
> (just check-gcc, check-c++-all and check-target-libstdc++-v3) with
> --target_board=unix/-fstack-protector-strong on both.
> Ok for trunk?
> 
> 2019-12-10  Jakub Jelinek  
> 
>   PR middle-end/92825
>   * cfgexpand.c (add_stack_protection_conflicts): Change return
> type
>   from void to bool, return true if at least one
> stack_vars[i].decl
>   is addressable.
>   (record_or_union_type_has_array_p, stack_protect_decl_p):
> Remove.
>   (expand_used_vars): Don't call stack_protect_decl_p, instead
> for
>   -fstack-protector-strong set gen_stack_protect_signal to true
>   if add_stack_protection_conflicts returned true.  Formatting
> fixes.
>   * doc/invoke.texi (-fstack-protector-strong): Clarify that
> optimized
>   out variables or variables not living on the stack don't count.
>   (-fstack-protector): Likewise.  Clarify it affects >= 8 byte
> arrays
>   rather than > 8 byte.
> 
>   * gcc.target/i386/pr92825.c: New test.
OK
jeff
> 



Re: [PATCH] Fix propagate_vr_across_jump_function (PR ipa/92883)

2019-12-10 Thread Jeff Law
On Tue, 2019-12-10 at 21:33 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The ipa vr hash table apparently intentionally doesn't differentiate
> between
> value ranges with different types, all that matters are the values of
> min and max, so before using it ipa_vr_operation_and_type_effects
> needs to
> be called to convert the value_range to the right type.
> Most of the spots do that correctly, but the second hunk fixes a spot
> that called it but ignored the newly computed value_range and used
> the
> non-converted one anyway.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-12-10  Jakub Jelinek  
> 
>   PR ipa/92883
>   * ipa-cp.c (propagate_vr_across_jump_function): Pass jvr rather
>   than *jfunc->m_vr to intersect.  Formatting fix.
OK
jeff




Re: [PATCH] Fix ICE in regstat_bb_compute_calls_crossed (PR rtl-optimization/92882)

2019-12-10 Thread Jeff Law
On Tue, 2019-12-10 at 21:40 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs on the newly added asserts.
> Some comments hint that maybe it is fine if CODE_LABEL additions
> don't
> trigger df recomputations, but even if it is not ok,
> regstat_bb_compute_calls_crossed doesn't look like an IL verification
> routine and for !NONDEBUG_INSN_P it really doesn't need
> DF_INSN_INFO_GET
> for anything, so I think it is best not to get it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-12-10  Jakub Jelinek  
> 
>   PR rtl-optimization/92882
>   * regstat.c (regstat_bb_compute_calls_crossed): Don't check
>   INSN_UID against DF_INSN_SIZE or use DF_INSN_INFO_GET unless
>   NONDEBUG_INSN_P.
> 
>   * gfortran.dg/pr92882.f: New test.
OK
jeff
> 



[PATCH] Fix ICE in regstat_bb_compute_calls_crossed (PR rtl-optimization/92882)

2019-12-10 Thread Jakub Jelinek
Hi!

The following testcase ICEs on the newly added asserts.
Some comments hint that maybe it is fine if CODE_LABEL additions don't
trigger df recomputations, but even if it is not ok,
regstat_bb_compute_calls_crossed doesn't look like an IL verification
routine and for !NONDEBUG_INSN_P it really doesn't need DF_INSN_INFO_GET
for anything, so I think it is best not to get it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-10  Jakub Jelinek  

PR rtl-optimization/92882
* regstat.c (regstat_bb_compute_calls_crossed): Don't check
INSN_UID against DF_INSN_SIZE or use DF_INSN_INFO_GET unless
NONDEBUG_INSN_P.

* gfortran.dg/pr92882.f: New test.

--- gcc/regstat.c.jj2019-12-09 15:02:30.112287863 +0100
+++ gcc/regstat.c   2019-12-10 13:36:23.231327649 +0100
@@ -324,13 +324,13 @@ regstat_bb_compute_calls_crossed (unsign
 
   FOR_BB_INSNS_REVERSE (bb, insn)
 {
+  if (!NONDEBUG_INSN_P (insn))
+   continue;
+
   gcc_assert (INSN_UID (insn) < (int) DF_INSN_SIZE ());
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   unsigned int regno;
 
-  if (!NONDEBUG_INSN_P (insn))
-   continue;
-
   /* Process the defs.  */
   if (CALL_P (insn))
{
--- gcc/testsuite/gfortran.dg/pr92882.f.jj  2019-12-10 13:38:17.737571868 
+0100
+++ gcc/testsuite/gfortran.dg/pr92882.f 2019-12-10 13:38:07.453729553 +0100
@@ -0,0 +1,4 @@
+C PR rtl-optimization/92882
+C { dg-do compile }
+C { dg-options "-O2 -fno-inline" }
+  INCLUDE 'secnds.f'

Jakub



[committed] Fix some comment typos

2019-12-10 Thread Jakub Jelinek
Hi!

While working on the PR92883 patch I've noticed some typos, fixed thusly,
bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2019-12-10  Jakub Jelinek  

* ipa-param-manipulation.c
(ipa_param_body_adjustments::register_replacement): Fix comment typo
- accross -> across.
* ipa-sra.c (propagate_used_across_scc_edge, ipa_sra_analysis):
Likewise.
(param_splitting_across_edge): Fix typo in dump message - accross
-> across.

--- gcc/ipa-param-manipulation.c.jj 2019-10-01 18:16:10.756161489 +0200
+++ gcc/ipa-param-manipulation.c2019-12-10 13:27:31.050486852 +0100
@@ -895,7 +895,7 @@ ipa_param_adjustments::debug ()
 }
 
 /* Register that REPLACEMENT should replace parameter described in APM and
-   optionally as DUMMY to mark transitive splits accross calls.  */
+   optionally as DUMMY to mark transitive splits across calls.  */
 
 void
 ipa_param_body_adjustments::register_replacement (ipa_adjusted_param *apm,
--- gcc/ipa-sra.c.jj2019-11-25 11:33:41.019082342 +0100
+++ gcc/ipa-sra.c   2019-12-10 13:27:16.792705175 +0100
@@ -3167,7 +3167,7 @@ isra_mark_caller_param_used (isra_func_s
 
 
 /* Propagate information that any parameter is not used only locally within a
-   SCC accross CS to the caller, which must be in the same SCC as the
+   SCC across CS to the caller, which must be in the same SCC as the
callee. Push any callers that need to be re-processed to STACK.  */
 
 static void
@@ -3397,7 +3397,7 @@ param_splitting_across_edge (cgraph_edge
: 0);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-fprintf (dump_file, "Splitting accross %s->%s:\n",
+fprintf (dump_file, "Splitting across %s->%s:\n",
 cs->caller->dump_name (), callee->dump_name ());
 
   unsigned i;
@@ -3876,9 +3876,9 @@ ipa_sra_analysis (void)
  param_removal_cross_scc_edge (cs);
}
 
-  /* Look at edges within the current SCC and propagate used-ness accross
-  them, pushing onto the stack all notes which might need to be
-  revisited.  */
+  /* Look at edges within the current SCC and propagate used-ness across
+them, pushing onto the stack all notes which might need to be
+revisited.  */
   FOR_EACH_VEC_ELT (cycle_nodes, j, v)
v->call_for_symbol_thunks_and_aliases (propagate_used_to_scc_callers,
   , true);

Jakub



[PATCH] Fix propagate_vr_across_jump_function (PR ipa/92883)

2019-12-10 Thread Jakub Jelinek
Hi!

The ipa vr hash table apparently intentionally doesn't differentiate between
value ranges with different types, all that matters are the values of
min and max, so before using it ipa_vr_operation_and_type_effects needs to
be called to convert the value_range to the right type.
Most of the spots do that correctly, but the second hunk fixes a spot
that called it but ignored the newly computed value_range and used the
non-converted one anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-10  Jakub Jelinek  

PR ipa/92883
* ipa-cp.c (propagate_vr_across_jump_function): Pass jvr rather
than *jfunc->m_vr to intersect.  Formatting fix.

* gcc.dg/ipa/pr92883.c: New test.

--- gcc/ipa-cp.c.jj 2019-12-06 00:40:40.0 +0100
+++ gcc/ipa-cp.c2019-12-10 11:57:10.354999085 +0100
@@ -2369,12 +2369,10 @@ propagate_vr_across_jump_function (cgrap
 
   value_range vr;
   if (TREE_CODE_CLASS (operation) == tcc_unary)
-   {
- ipa_vr_operation_and_type_effects (,
-_lats->m_value_range.m_vr,
-operation, param_type,
-operand_type);
-   }
+   ipa_vr_operation_and_type_effects (,
+  _lats->m_value_range.m_vr,
+  operation, param_type,
+  operand_type);
   /* A crude way to prevent unbounded number of value range updates
 in SCC components.  We should allow limited number of updates within
 SCC, too.  */
@@ -2400,7 +2398,7 @@ propagate_vr_across_jump_function (cgrap
 NOP_EXPR,
 param_type,
 jfunc->m_vr->type ()))
-   vr.intersect (*jfunc->m_vr);
+   vr.intersect (jvr);
}
  return dest_lat->meet_with ();
}
--- gcc/testsuite/gcc.dg/ipa/pr92883.c.jj   2019-12-10 12:45:49.260640164 
+0100
+++ gcc/testsuite/gcc.dg/ipa/pr92883.c  2019-12-10 12:45:07.490278603 +0100
@@ -0,0 +1,14 @@
+/* PR ipa/92883 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int a, b, c, d;
+unsigned e;
+void baz (void *, int);
+void grault (int, unsigned long);
+int foo (unsigned g) { return a / g; }
+void bar (void *g) { if (b) baz (g, 5); }
+static void quux (int, unsigned long);
+static void qux (unsigned long g) { if (g) { d = foo (-1); quux (e, (d & 2) + 
g); } }
+static void quux (int g, unsigned long m) { (void) g; grault (c, m); bar (""); 
}
+void corge () { qux (e); }

Jakub



[C++ PATCH] PR c++/92847 - C++20 comparison ambiguity with class template.

2019-12-10 Thread Jason Merrill
This testcase demonstrates that looking at cand->template_decl is not a good
starting place for finding the most general template, as it is only set for
primary templates.

Tested x86_64-pc-linux-gnu, applying to trunk.

* call.c (cand_parms_match): Handle all templated functions.
---
 gcc/cp/call.c | 29 ++-
 .../g++.dg/cpp2a/spaceship-rewrite5.C | 15 ++
 2 files changed, 30 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite5.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 48d49b7ec87..cbd5747d6ca 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -11052,24 +11052,25 @@ joust_maybe_elide_copy (z_candidate *)
 bool
 cand_parms_match (z_candidate *c1, z_candidate *c2)
 {
-  tree fn1 = c1->template_decl;
-  tree fn2 = c2->template_decl;
-  if (fn1 && fn2)
-{
-  fn1 = most_general_template (TI_TEMPLATE (fn1));
-  fn1 = DECL_TEMPLATE_RESULT (fn1);
-  fn2 = most_general_template (TI_TEMPLATE (fn2));
-  fn2 = DECL_TEMPLATE_RESULT (fn2);
-}
-  else
-{
-  fn1 = c1->fn;
-  fn2 = c2->fn;
-}
+  tree fn1 = c1->fn;
+  tree fn2 = c2->fn;
   if (fn1 == fn2)
 return true;
   if (identifier_p (fn1) || identifier_p (fn2))
 return false;
+  /* We don't look at c1->template_decl because that's only set for primary
+ templates, not e.g. non-template member functions of class templates.  */
+  tree t1 = most_general_template (fn1);
+  tree t2 = most_general_template (fn2);
+  if (t1 || t2)
+{
+  if (!t1 || !t2)
+   return false;
+  if (t1 == t2)
+   return true;
+  fn1 = DECL_TEMPLATE_RESULT (t1);
+  fn2 = DECL_TEMPLATE_RESULT (t2);
+}
   return compparms (TYPE_ARG_TYPES (TREE_TYPE (fn1)),
TYPE_ARG_TYPES (TREE_TYPE (fn2)));
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite5.C 
b/gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite5.C
new file mode 100644
index 000..d0424377e8e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-rewrite5.C
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++11 } }
+
+template
+struct A {
+  A() {}
+
+  template
+A(const A&) {}
+
+  bool operator==(const A&) const { return true; }
+};
+
+A a;
+A b;
+auto c = (a == b);

base-commit: 4381967563fe149f4b0f5eb6d61ed7645fac70ff
-- 
2.18.1



[C++ PATCH] Fix C++20 structural type vs. private base.

2019-12-10 Thread Jason Merrill
In my patch to implement C++20 "structural type" I tried to set the access
flags on the artificial base fields appropriately, but failed.  I was
copying TREE_PRIVATE from the binfo, but TREE_PRIVATE on binfo is just a
temporary cache for dfs_access_in_type; we really need to get the
inheritance access information from BINFO_BASE_ACCESSES.

Tested x86_64-pc-linux-gnu, applying to trunk.

* class.c (build_base_field_1): Take access parameter.
(build_base_field): Likewise.
(build_base_fields, layout_virtual_bases): Pass it.
* tree.c (structural_type_p): Improve private base diagnostic.
---
 gcc/cp/class.c   | 54 +---
 gcc/cp/tree.c|  8 ++-
 gcc/testsuite/g++.dg/cpp2a/nontype-class25.C |  6 +++
 3 files changed, 49 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class25.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index d8bb44990b7..a8c6b1cb01c 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -134,7 +134,6 @@ static void build_vtbl_initializer (tree, tree, tree, tree, 
int *,
 static bool check_bitfield_decl (tree);
 static bool check_field_decl (tree, tree, int *, int *);
 static void check_field_decls (tree, tree *, int *, int *);
-static tree *build_base_field (record_layout_info, tree, splay_tree, tree *);
 static void build_base_fields (record_layout_info, splay_tree, tree *);
 static void check_methods (tree);
 static void remove_zero_width_bit_fields (tree);
@@ -4407,7 +4406,7 @@ layout_empty_base_or_field (record_layout_info rli, tree 
binfo_or_decl,
fields at NEXT_FIELD, and return it.  */
 
 static tree
-build_base_field_1 (tree t, tree binfo, tree *_field)
+build_base_field_1 (tree t, tree binfo, tree access, tree *_field)
 {
   /* Create the FIELD_DECL.  */
   tree basetype = BINFO_TYPE (binfo);
@@ -4417,8 +4416,6 @@ build_base_field_1 (tree t, tree binfo, tree *_field)
   DECL_ARTIFICIAL (decl) = 1;
   DECL_IGNORED_P (decl) = 1;
   DECL_FIELD_CONTEXT (decl) = t;
-  TREE_PRIVATE (decl) = TREE_PRIVATE (binfo);
-  TREE_PROTECTED (decl) = TREE_PROTECTED (binfo);
   if (is_empty_class (basetype))
 /* CLASSTYPE_SIZE is one byte, but the field needs to have size zero.  */
 DECL_SIZE (decl) = DECL_SIZE_UNIT (decl) = size_zero_node;
@@ -4432,6 +4429,11 @@ build_base_field_1 (tree t, tree binfo, tree 
*_field)
   SET_DECL_MODE (decl, TYPE_MODE (basetype));
   DECL_FIELD_IS_BASE (decl) = 1;
 
+  if (access == access_private_node)
+TREE_PRIVATE (decl) = true;
+  else if (access == access_protected_node)
+TREE_PROTECTED (decl) = true;
+
   /* Add the new FIELD_DECL to the list of fields for T.  */
   DECL_CHAIN (decl) = *next_field;
   *next_field = decl;
@@ -4450,7 +4452,7 @@ build_base_field_1 (tree t, tree binfo, tree *_field)
Returns the location at which the next field should be inserted.  */
 
 static tree *
-build_base_field (record_layout_info rli, tree binfo,
+build_base_field (record_layout_info rli, tree binfo, tree access,
  splay_tree offsets, tree *next_field)
 {
   tree t = rli->t;
@@ -4471,7 +4473,7 @@ build_base_field (record_layout_info rli, tree binfo,
   CLASSTYPE_EMPTY_P (t) = 0;
 
   /* Create the FIELD_DECL.  */
-  decl = build_base_field_1 (t, binfo, next_field);
+  decl = build_base_field_1 (t, binfo, access, next_field);
 
   /* Try to place the field.  It may take more than one try if we
 have a hard time placing the field without putting two
@@ -4505,7 +4507,7 @@ build_base_field (record_layout_info rli, tree binfo,
 aggregate bases.  */
   if (cxx_dialect >= cxx17 && !BINFO_VIRTUAL_P (binfo))
{
- tree decl = build_base_field_1 (t, binfo, next_field);
+ tree decl = build_base_field_1 (t, binfo, access, next_field);
  DECL_FIELD_OFFSET (decl) = BINFO_OFFSET (binfo);
  DECL_FIELD_BIT_OFFSET (decl) = bitsize_zero_node;
  SET_DECL_OFFSET_ALIGN (decl, BITS_PER_UNIT);
@@ -4536,25 +4538,39 @@ build_base_fields (record_layout_info rli,
   /* Chain to hold all the new FIELD_DECLs which stand in for base class
  subobjects.  */
   tree t = rli->t;
-  int n_baseclasses = BINFO_N_BASE_BINFOS (TYPE_BINFO (t));
-  int i;
+  tree binfo = TYPE_BINFO (t);
+  int n_baseclasses = BINFO_N_BASE_BINFOS (binfo);
 
   /* The primary base class is always allocated first.  */
-  if (CLASSTYPE_HAS_PRIMARY_BASE_P (t))
-next_field = build_base_field (rli, CLASSTYPE_PRIMARY_BINFO (t),
-  offsets, next_field);
+  const tree primary_binfo = CLASSTYPE_PRIMARY_BINFO (t);
+  if (primary_binfo)
+{
+  /* We need to walk BINFO_BASE_BINFO to find the access of the primary
+base, if it is direct.  Indirect base fields are private.  */
+  tree primary_access = access_private_node;
+  for (int i = 0; i < n_baseclasses; ++i)
+   {
+ tree base_binfo = BINFO_BASE_BINFO (binfo, 

[C++ PATCH] PR c++/92560 - ICE with decltype and rewritten operator.

2019-12-10 Thread Jason Merrill
A call as the immediate operand of decltype is handled differently; we don't
create an object of the return type as we do normally.  But in the case of a
rewritten operator, we're adding another call as a wrapper, so the inner
call doesn't get the special handling.

Tested x86_64-pc-linux-gnu, applying to trunk.

* call.c (build_new_op_1): Clear tf_decltype on inner call.
---
 gcc/cp/call.c|  6 +-
 gcc/testsuite/g++.dg/cpp2a/spaceship-decltype1.C | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-decltype1.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index ce942977f45..48d49b7ec87 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6242,6 +6242,10 @@ build_new_op_1 (const op_location_t , enum tree_code 
code, int flags,
result = error_mark_node;
  else
{
+ tsubst_flags_t ocomplain = complain;
+ if (cand->rewritten ())
+   /* We'll wrap this call in another one.  */
+   ocomplain &= ~tf_decltype;
  if (cand->reversed ())
{
  /* We swapped these in add_candidate, swap them back now.  */
@@ -6251,7 +6255,7 @@ build_new_op_1 (const op_location_t , enum tree_code 
code, int flags,
"current function recursively with reversed "
"arguments");
}
- result = build_over_call (cand, LOOKUP_NORMAL, complain);
+ result = build_over_call (cand, LOOKUP_NORMAL, ocomplain);
}
 
  if (trivial_fn_p (cand->fn))
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-decltype1.C 
b/gcc/testsuite/g++.dg/cpp2a/spaceship-decltype1.C
new file mode 100644
index 000..bc673b2e020
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-decltype1.C
@@ -0,0 +1,11 @@
+// PR c++/92560
+// { dg-do compile { target c++2a } }
+
+#include 
+
+struct X
+{
+  friend std::strong_ordering operator<=>(X, X);
+} x;
+
+using T = decltype(x < x);

base-commit: b746406f8489fd246aabe01bdf0fc2f319be5b05
-- 
2.18.1



Re: [PATCH] rs6000: Fix 2 for PR92661, Do not define builtins that overload disabled builtins

2019-12-10 Thread Iain Sandoe

Peter Bergner  wrote:


On 12/10/19 12:27 PM, Peter Bergner wrote:
Ok, how about the patch below?  If Iain and David could test this on  
Darwin

and AIX respectively, that would be great.  I'm currently testing this on
powerpc64le-linux, with and without --disable-decimal-float.


So my --enable-decimal-float builds showed no regressions or differences
in the testsuite results (as expected).  My --disable-decimal-float runs
showed no new regressions.  The only differences in the testsuite results
were due to all of the FAILs on the base compiler build that are fixed
with the patch.


make check-gcc-c RUNTESTFLAGS="powerpc.exp dfp.exp”

did not show anything unexpected with a built tree (with the system
assembler). So no prob. from my PoV (if something comes up when/if I
build with a newer assembler, that can be tackled in target-supports as
already mentioned).

thanks
Iain



Re: [PATCH 28/49] analyzer: new files: analyzer.{cc|h}

2019-12-10 Thread Eric Gallager
On 12/9/19, David Malcolm  wrote:
> On Fri, 2019-12-06 at 22:38 -0500, Eric Gallager wrote:
>> On 11/15/19, David Malcolm  wrote:
> [...]
>> > diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
>> > new file mode 100644
>> > index 000..ace8924
>> > --- /dev/null
>> > +++ b/gcc/analyzer/analyzer.h
>> > @@ -0,0 +1,126 @@
>> > +/* Utility functions for the analyzer.
>> > +   Copyright (C) 2019 Free Software Foundation, Inc.
>> > +   Contributed by David Malcolm .
>> > +
>> > +This file is part of GCC.
>> > +
>> > +GCC is free software; you can redistribute it and/or modify it
>> > +under the terms of the GNU General Public License as published by
>> > +the Free Software Foundation; either version 3, or (at your
>> > option)
>> > +any later version.
>> > +
>> > +GCC is distributed in the hope that it will be useful, but
>> > +WITHOUT ANY WARRANTY; without even the implied warranty of
>> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > +General Public License for more details.
>> > +
>> > +You should have received a copy of the GNU General Public License
>> > +along with GCC; see the file COPYING3.  If not see
>> > +;.  */
>> > +
>> > +#ifndef GCC_ANALYZER_ANALYZER_H
>> > +#define GCC_ANALYZER_ANALYZER_H
>> > +
>> > +/* Forward decls of common types, with indentation to show
>> > inheritance. */
>>
>> I'm wondering about the "with indentation to show inheritance"
>> part...
>> does that require tweaking any editor configuration files or adding
>> /*INDENT-OFF*/ comments or anything to prevent automatic formatting
>> tools from "fixing" the indentation to go back to the normal style of
>> having everything be aligned?
>
> If we had some kind of automatic formatting then I guess it would, but
> I don't think we have such a system in place.
>

Check the contrib directory; there's a clang-format file and a vimrc
file in there that provide automatic formatting; do `make vimrc` and
`make clang-format` from the top-level to use them. There's also the
check_GNU_style scripts, but those just check & don't actually
reformat, AFAIK...

> [...]
>
>


[C++ PATCH] c++/92878 - Parenthesized init of aggregates in new-expression.

2019-12-10 Thread Marek Polacek
Ville pointed out that our paren init of aggregates doesn't work for

  auto a = new A(1, 2, 3);

and I think it should:

A new-expression that creates an object of type T initializes that object
as follows:
...
-- Otherwise, the new-initializer is interpreted according to the
   initialization rules of [dcl.init] for direct-initialization.

so I think it follows that we should perform dcl.init#17.6.2.2.

This doesn't work with new[]; we have:
  error ("parenthesized initializer in array new");

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-12-10  Marek Polacek  

PR c++/92878 - Parenthesized init of aggregates in new-expression.
* init.c (build_new_1): Handle parenthesized initialization of
aggregates in new-expression.

* g++.dg/cpp2a/paren-init20.C: New test.

diff --git gcc/cp/init.c gcc/cp/init.c
index e40afe27e1a..b0331b8ba53 100644
--- gcc/cp/init.c
+++ gcc/cp/init.c
@@ -3608,10 +3608,22 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  tree ie;
 
  /* We are processing something like `new int (10)', which
-means allocate an int, and initialize it with 10.  */
+means allocate an int, and initialize it with 10.
 
- ie = build_x_compound_expr_from_vec (*init, "new initializer",
-  complain);
+In C++20, also handle `new A(1, 2)'.  */
+ if (cxx_dialect >= cxx2a
+ && AGGREGATE_TYPE_P (type)
+ && (*init)->length () > 1)
+   {
+ ie = build_tree_list_vec (*init);
+ ie = build_constructor_from_list (init_list_type_node, ie);
+ CONSTRUCTOR_IS_DIRECT_INIT (ie) = true;
+ CONSTRUCTOR_IS_PAREN_INIT (ie) = true;
+ ie = digest_init (type, ie, complain);
+   }
+ else
+   ie = build_x_compound_expr_from_vec (*init, "new initializer",
+complain);
  init_expr = cp_build_modify_expr (input_location, init_expr,
INIT_EXPR, ie, complain);
}
diff --git gcc/testsuite/g++.dg/cpp2a/paren-init20.C 
gcc/testsuite/g++.dg/cpp2a/paren-init20.C
new file mode 100644
index 000..05da7604686
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/paren-init20.C
@@ -0,0 +1,54 @@
+// PR c++/92878 - Parenthesized init of aggregates in new-expression.
+// { dg-do compile { target c++2a } }
+// Test new TYPE(...).
+
+int f ();
+
+struct A
+{
+  int a;
+  int b;
+};
+
+void
+fn_A ()
+{
+  int i = 0;
+  auto y = new A(1, 2);
+  auto x = new A(++i, ++i);
+  auto z = new A(1, { ++i });
+  auto u = new A(1, f());
+}
+
+struct B
+{
+  int a;
+  int b;
+  int c = 42;
+};
+
+void
+fn_B ()
+{
+  int i = 0;
+  auto y = new B(1, 2);
+  auto x = new B(++i, ++i);
+  auto z = new B(1, { ++i });
+  auto u = new B(1, f());
+}
+
+struct C
+{
+  int a;
+  int b = 10;
+};
+
+void
+fn_C ()
+{
+  int i = 0;
+  auto y = new C(1);
+  auto x = new C(++i);
+  auto z = new C({ ++i });
+  auto u = new C(f());
+}



Re: [PATCH] rs6000: Fix 2 for PR92661, Do not define builtins that overload disabled builtins

2019-12-10 Thread Peter Bergner
On 12/10/19 12:27 PM, Peter Bergner wrote:
> Ok, how about the patch below?  If Iain and David could test this on Darwin
> and AIX respectively, that would be great.  I'm currently testing this on
> powerpc64le-linux, with and without --disable-decimal-float.

So my --enable-decimal-float builds showed no regressions or differences
in the testsuite results (as expected).  My --disable-decimal-float runs
showed no new regressions.  The only differences in the testsuite results
were due to all of the FAILs on the base compiler build that are fixed
with the patch.

Peter





[patch, fortran, committed] Fix PR 92863, 'ICE in gfc_typename

2019-12-10 Thread Thomas Koenig

Hello world,

I have just committed the attached patch as obvious and simple
as r279180.

The ICE for the test case occurred because a previous error
had left the derived field of the typespec NULL. Just returning
"invalid type" or "invalid class" in such a case is enough
to cure the ICE.

Regards

Thomas

2019-12-10  Thomas Koenig  

PR fortran/92863
* misc.c (gfc_typename): If derived component is NULL for
derived or class, return "invalid type" or "invalid class",
respectively.

2019-12-10  Thomas Koenig  

PR fortran/92863
* gfortran.dg/interface_45.f90: New test.

Index: misc.c
===
--- misc.c	(Revision 279064)
+++ misc.c	(Arbeitskopie)
@@ -164,9 +164,19 @@ gfc_typename (gfc_typespec *ts)
   sprintf (buffer, "UNION(%s)", ts->u.derived->name);
   break;
 case BT_DERIVED:
+  if (ts->u.derived == NULL)
+	{
+	  sprintf (buffer, "invalid type");
+	  break;
+	}
   sprintf (buffer, "TYPE(%s)", ts->u.derived->name);
   break;
 case BT_CLASS:
+  if (ts->u.derived == NULL)
+	{
+	  sprintf (buffer, "invalid class");
+	  break;
+	}
   ts1 = ts->u.derived->components ? >u.derived->components->ts : NULL;
   if (ts1 && ts1->u.derived && ts1->u.derived->attr.unlimited_polymorphic)
 	sprintf (buffer, "CLASS(*)");
! { dg-do compile }
! PR 92863 - this used to ICE
! Test case by Arseny Solokha.

type(l1) function mp() ! { dg-error "type for function" }
  call sub(mp) ! { dg-error "Type mismatch" }
end function mp

function bi(ry)
  call sub(ry) ! { dg-error "Type mismatch" }
end function bi


Re: [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost

2019-12-10 Thread Craig Blackmore
Hi Jim,

On 31/10/2019 00:03, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
>  wrote:
>> The sched2 pass undoes some of the addresses generated by the RISC-V
>> shorten_memrefs code size optimization (patch 1/2) and consequently increases
>> code size. This patch prevents sched-deps.c from changing an address if it is
>> expected to increase address cost.
>
> This should be rewritten as a target hook, and then we can define the
> hook to do what we want for RISC-V.  It isn't OK to make this change
> for other targets without testing them.  If the default hook does
> nothing, then other targets won't be affected.
>

Thanks for the review, here is an updated patch rewriting this as a target hook.

Thanks,
Craig

---
gcc/ChangeLog:

* config/riscv/riscv.c (riscv_new_address_profitable_p): New function.
(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_NEW_ADDRESS_PROFITABLE_P):  New hook.
* sched-deps.c (attempt_change): Use old address if it is cheaper than
new address.
* target.def (new_address_profitable_p): New hook.
* targhooks.c (default_new_address_profitable_p): New function.
* targhooks.h (default_new_address_profitable_p): Declare.
---
 gcc/config/riscv/riscv.c | 16 
 gcc/doc/tm.texi  |  7 +++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/sched-deps.c |  3 +++
 gcc/target.def   | 11 +++
 gcc/targhooks.c  | 13 +
 gcc/targhooks.h  |  1 +
 7 files changed, 53 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 1d6d2f89f7d..a508894d16e 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5095,6 +5095,19 @@ riscv_reorg (void)
 riscv_remove_unneeded_save_restore_calls ();
 }
 
+/* Implement TARGET_NEW_ADDRESS_PROFITABLE_P.  */
+
+bool
+riscv_new_address_profitable_p (rtx memref, rtx_insn *insn, rtx new_addr)
+{
+  /* Prefer old address if it is less expensive.  */
+  addr_space_t as = MEM_ADDR_SPACE (memref);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
+  int old_cost = address_cost (XEXP (memref, 0), GET_MODE (memref), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (memref), as, speed);
+  return new_cost <= old_cost;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5272,6 +5285,9 @@ riscv_reorg (void)
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG riscv_reorg
 
+#undef TARGET_NEW_ADDRESS_PROFITABLE_P
+#define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5b8b68bd710..85470c8afa0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6944,6 +6944,13 @@ candidate as a replacement for the if-convertible 
sequence described in
 @code{if_info}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx 
@var{memref}, rtx_insn * @var{insn}, rtx @var{new_addr})
+Return @code{true} if it is profitable to replace the address in
+@var{memref} with @var{new_addr}.  This allows targets to prevent the
+scheduler from undoing address optimizations.  The instruction containing the
+memref is @var{insn}.  The default implementation returns @code{true}.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1b061d70127..b5c80ce2fdc 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4695,6 +4695,8 @@ Define this macro if a non-short-circuit operation 
produced by
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
 
+@hook TARGET_NEW_ADDRESS_PROFITABLE_P
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @hook TARGET_ESTIMATED_POLY_VALUE
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index a9e73fc6044..8f56594b745 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -4693,6 +4693,9 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  if (!targetm.new_address_profitable_p (mem, mii->mem_insn, new_addr))
+return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
  do not want to call one of the change address variants that take
  an offset even though we know the offset in many cases.  These
diff --git a/gcc/target.def b/gcc/target.def
index e0e856979a9..81c78d8ffea 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3824,6 +3824,17 @@ candidate as a replacement for the if-convertible 
sequence described in\n\
 bool, (rtx_insn *seq, struct noce_if_info *if_info),
 

Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-12-10 Thread Craig Blackmore
Hi Jim,

Thank you for your review. I have posted an updated patch below which I think
addresses your comments.

On 30/10/2019 23:57, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
>  wrote:
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new 
>> load/store
>> of 'new base + small offset'. If the new base gets stored in a compressed
>> register, then the new load/store can be compressed. Since there is an 
>> overhead
>> in creating the new base, this change is only attempted when 'base register' 
>> is
>> referenced in at least 4 load/stores in a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed for 
>> the
>> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in 
>> future.
>
> The fact that this needs 4 load/stores in a block with the same base
> address means it won't trigger very often.  But it does seem to work.
> I see about a 0.05% code size reduction for a rv32gc newlib nano
> libc.a.  There might be other codes that benefit more.
>
> I'm concerned about the number of RISC-V specific optimization passes
> people are writing.  I've seen at least 3 so far.  This one is at
> least small and simple enough to understand that it doesn't look like
> it will cause maintenance problems.
>
> The config.gcc change conflicts with the save-restore optimization
> pass that Andrew Burgess added, but that is a trivial fix.
>

I have rebased onto latest trunk and fixed this.

> The code only works for 32-bit load/stores.  rv64 has compressed
> 64-bit load/stores, and the F and D extensions have float and double
> compressed loads and stores.  The patch would be more useful if it
> handled all of these.
>

We plan to add support for these in the future.

> The patch doesn't check the other operand, it only looks at the memory
> operand.  This results in some cases where the code rewrites
> instructions that can never be compressed.  For instance, given
> void
> store1z (int *array)
> {
>   array[200] = 0;
>   array[201] = 0;
>   array[202] = 0;
>   array[203] = 0;
> }
> the patch increases code size by 4 bytes because it rewrites the
> stores, but we don't have compressed store 0, and x0 is not a
> compressed reg, so there is no point in rewriting the stores.

I have now excluded store zero from being counted and rewritten.

>  I can
> also create examples that show a size increase with loads but it is a
> little trickier.
> extern int sub1 (int, int, int, int, int, int, int);
> int
> load1a (int a0, int a1, int a2, int a3, int a4, int *array)
> {
>   int a = 0;
>   a += array[200];
>   a += array[201];
>   a += array[202];
>   a += array[203];
>   return sub1 (a0, a1, a2, a3, a4, 0, a);
> }
> The register allocator will pass a0-a4 directly through from args to
> the call, leaving only a5-a7 for the load base address and dest, and
> only one of those regs is a compressed reg, but we need two, so these
> loads can't be compressed.  The code still gets rewritten, resulting
> in a size increase for the extra add.  Not sure if you can do anything
> about that though, since you are doing this before register
> allocation.

I haven't got a way of avoiding this at present, so I have added this
case as an xfail.

>
> It isn't clear that the change riscv_address_cost is for.  That should
> be commented.
>

This is now commented.

> I'd suggest parens in the riscv_compressed_lw_offset_p return
> statement, that makes it easier to align the code correctly (in emacs
> at least).
>

Parens added.

> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
> constants" section of riscv.h, and maybe renamed similar to the other
> constants.
>

Done.

> There is a REG_P check in get_si_mem_reg.  That should probably handle
> SUBREGs too.
>

I couldn't produce a case where the address is 'subreg + offset', so I have
not added handling for SUBREGs for now.

> A testcase to verify the patch would be nice.  I have one I wrote for
> testing that shows some tests that work, some tests that don't work,
> and some that work but maybe shouldn't.
>

Thank you for the test file. I have added some testcases to the patch
based on this. 

> I vaguely remember Micheal Meissner talking about doing something
> similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
> if he tried, or how far he got if he did try.  It might be useful to
> ask him about that work.
>
> Otherwise this looks OK to me.
>
> Jim

Thanks,
Craig

---

gcc/ChangeLog:

* config.gcc:  Add riscv-shorten-memrefs.o to extra_objs for riscv.
* config/riscv/riscv-passes.def: New file.
* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
* config/riscv/riscv-shorten-memrefs.c: New file.
* config/riscv/riscv.c (tree-pass.h): New include.

Re: [PATCH] rs6000: Fix 2 for PR92661, Do not define builtins that overload disabled builtins

2019-12-10 Thread Peter Bergner
On 12/4/19 5:03 PM, Segher Boessenkool wrote:
> On Wed, Dec 04, 2019 at 03:53:30PM -0600, Peter Bergner wrote:
>> Right.  I'll come up with a patch and hopefully Iain and David can test
>> on Darwin and AIX and I can test on Linux with --enable-decimal-float
>> and --disable-decimal-float.  That should cover the major subtargets and
>> if it works there, I'd expect it to work on the minor subtargets too.

Ok, how about the patch below?  If Iain and David could test this on Darwin
and AIX respectively, that would be great.  I'm currently testing this on
powerpc64le-linux, with and without --disable-decimal-float.

The pr92661.c test case is the DFP test case you wanted added to make sure
we do not ICE, even when DFP is disabled.  The dfp-[dt]d*.c changes are
due to me seeing them being run (and FAILing) on my --disable-decimal-float
runs.  Clearly, they shouldn't be run when DFP is disabled.

All of the powerpc/dfp/* tests had powerpc*-*-* target tests, but that is
covered by the dfp.exp target tests, so I removed them along with the
now unneeded dg-skip-if AIX tests.  If dg-do compile is the default, do
we want to just remove that whole line?

How is this looking?

Peter


PR bootstrap/92661
* gcc.target/powerpc/pr92661.c: New test.
* gcc.target/powerpc/dfp-dd.c: Add dg-require-effective-target dfp_hw.
Remove unneeded powerpc_fprs test.
* gcc.target/powerpc/dfp-td.c: Likewise.
* gcc.target/powerpc/dfp-dd-2.c: Add dg-require-effective-target dfp.
* gcc.target/powerpc/dfp-td-2.c: Likewise.
* gcc.target/powerpc/dfp-td-3.c: Likewise.
* gcc.target/powerpc/dfp/dfp.exp: Remove rs6000-*-* and
powerpc*-*-darwin* target tests.  Add check_effective_target_dfp test.
* gcc.target/powerpc/dfp/dtstsfi-0.c: Remove unneeded target test.
Remove unneeded dg-skip-if.
* gcc.target/powerpc/dfp/dtstsfi-1.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-10.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-11.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-12.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-13.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-14.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-15.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-16.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-17.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-18.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-19.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-2.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-20.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-21.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-22.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-23.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-24.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-25.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-26.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-27.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-28.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-29.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-3.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-30.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-31.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-32.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-33.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-34.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-35.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-36.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-37.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-38.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-39.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-4.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-40.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-41.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-42.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-43.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-44.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-45.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-46.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-47.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-48.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-49.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-5.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-50.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-51.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-52.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-53.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-54.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-55.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-56.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-57.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-58.c: Likewise.
* gcc.target/powerpc/dfp/dtstsfi-59.c: 

Re: [PATCH] Freestanding proposal P0829

2019-12-10 Thread Paul M. Bendixen
Den ons. 4. dec. 2019 kl. 22.43 skrev Jonathan Wakely :
>
> On 03/12/19 23:38 +0100, Paul M. Bendixen wrote:
> >Hello
> >I've made an implementation of P0829 and tested it with gcc for avr.
>
> This is great, thanks!
>
> What's the status of libstdc++ for AVR? If I understand correctly, it
> doesn't build by default when GCC is configured for AVR. Are you
> overriding that, or just copying the headers somewhere and including
> them in your programs?
>
The status is that it isn't built by default for AVR. I use the method described
in "Real time C++" by Christopher Kormanyos and modified it a bit.

I have a docker image that is built as part of CI at
https://gitlab.com/avr-libstdcxx/avr-libstdcxx-docker that describes
the build process.
I use avrlibc instead of newlib and it seems to work well.

> >I've included the patch and I believe it might need some modification.
> >Should I rather just maintain my own fork until P0829 is a bit
> >further? (It is currently being split into smaller proposals).
>
> Yes, I think it's too early to actually make this change. Our current
> !_GLIBCXX_HOSTED build conforms to the requirements of a freestanding
> implementation (for anything from C++98 to C++17) so I'd prefer not to
> mess with that drastically until the P0829 work has made more
> progress. Despite that, I'm very happy to see somebody take on the
> task, and will be glad to apply a patch like this when the time is
> right.
>
I also considered using another macro ( such as _P0829_FREESTANDING )
so that it might be added directly. I will try to keep the branch up to date
with regards to the current proposal.
If anyone else is interested, the code is at:
https://gitlab.com/avr-libstdcxx/

> >I have not included further tests, nor actually run the tests since
> >everything is in header files is only includes of already existing
> >libstdc++ code.
>
> If we were serious about improving our freestanding support, we'd need
> to partition the testsuite into the parts that should pass for
> freestanding, and mark all the other tests UNSUPPORTED. That would
> mean we could run tests for a freestanding build and not get thousands
> of failures for tests that are not expected to work for freestanding.
> That would be a big project though.
>
> Your patch is mostly just adding #if in various places, not adding new
> code, and so I'm not sure it needs a copyright assignment (the changes
> probably aren't copyrightable anyway). But if you plan to contribute
> to GCC it would be good to get an assignment in place (if you don't
> have one already). See https://gcc.gnu.org/contribute.html#legal for
> more information, and feel free to contact me directly if you have any
> questions or if you want to start the process.
>
> Thanks again for the patch. Please do keep working on it as the P0829
> work evolves, so that we can integrate it with libstdc++ at some
> point.
>

Thank you, I will


-- 
• − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − −
•− •/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//


Fix function profile computation

2019-12-10 Thread Jan Hubicka
Hi,
this patch fixes compute_function_frequency to work well with profiles
that was downgraded from IPA to local profiles.  

Bootstrapped/regtested x86_64-linux, comitted.

honza

* predict.c (compute_function_frequency): Check for presence of IPA
profile.
Index: predict.c
===
--- predict.c   (revision 279167)
+++ predict.c   (working copy)
@@ -3932,13 +3932,11 @@ compute_function_frequency (void)
   if (DECL_STATIC_DESTRUCTOR (current_function_decl))
 node->only_called_at_exit = true;
 
-  if (profile_status_for_fn (cfun) != PROFILE_READ)
+  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa_p ())
 {
   int flags = flags_from_decl_or_type (current_function_decl);
-  if ((ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa_p ()
-  && ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa() == profile_count::zero 
())
- || lookup_attribute ("cold", DECL_ATTRIBUTES (current_function_decl))
-!= NULL)
+  if (lookup_attribute ("cold", DECL_ATTRIBUTES (current_function_decl))
+ != NULL)
{
   node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
  warn_function_cold (current_function_decl);


Re: Fix overflows in -fprofile-reorder-functions

2019-12-10 Thread Jan Hubicka
> On Tue, 10 Dec 2019, Jan Hubicka wrote:
> 
> > > I would recommend to make these variables uint64_t, then you can simply do
> > > 
> > >   tp_first_run_a--;
> > >   tp_first_run_b--;
> > > 
> > > making 0 wrap around to UINT64_MAX. Then they will naturally sort after 
> > > all
> > > other nodes.
> > 
> > Then we would still have to watch the overflow before returning? I
> > actually find the condtional sort of more readable than intentional wrap
> > around the range, so I kept it in the code espeically because I made the
> > value 32bit again and without this trick I no longer need to watch
> > overflows.
> 
> For int-typed timestamps, you'd need to warp 0 to INT_MAX, e.g. like this:
> 
>   tp_first_run_a = (tp_first_run_a - 1) & INT_MAX;
>   tp_first_run_b = (tp_first_run_b - 1) & INT_MAX;
> 
> which keeps them in 0..INT_MAX range so 'return tp_first_run_a - 
> tp_first_run_b'
> still works.

OK, updatd code for that :)
> 
> > > > +  /* Output functions in RPO so callers get optimized before callees.  
> > > > This
> > > > + makes ipa-ra and other propagators to work.
> > > > + FIXME: This is far from optimal code layout.  */
> > > 
> > > I think this should have said "callees get optimized before callers".
> > 
> > Indeed.
> 
> So technically this would be just postorder, not RPO :)

Well, we it is computed by ipa_reverse_postorder :)
> 
> > --- cgraph.c(revision 279093)
> > +++ cgraph.c(working copy)
> > @@ -3074,6 +3074,11 @@ cgraph_node::verify_node (void)
> >inlined_to->count.debug ();
> >error_found = true;
> >  }
> > +  if (tp_first_run < 0)
> > +{
> > +  error ("tp_first_run must be positive");
> > +  error_found = true;
> > +}
> 
> "non-negative"?
Fixed too.

I have comitted the following variant of patch after re-testing.


* cgraph.c (cgraph_node::verify_node): Verify tp_first_run.
* cgraph.h (cgrpah_node): Turn tp_first_run back to int.
* cgraphunit.c (tp_first_run_node_cmp): Do not watch for overflows.
(expand_all_functions): First expand ordered section and then
unordered.
* lto-partition.c (lto_balanced_map): Fix printing of tp_first_run.
* profile.c (compute_value_histograms): Error on out of range
tp_first_runs.
Index: cgraph.c
===
--- cgraph.c(revision 279167)
+++ cgraph.c(working copy)
@@ -3066,6 +3066,11 @@ cgraph_node::verify_node (void)
   inlined_to->count.debug ();
   error_found = true;
 }
+  if (tp_first_run < 0)
+{
+  error ("tp_first_run must be non-negative");
+  error_found = true;
+}
   if (!definition && !in_other_partition && local)
 {
   error ("local symbols must be defined");
Index: cgraph.h
===
--- cgraph.h(revision 279167)
+++ cgraph.h(working copy)
@@ -926,9 +926,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
   clone_of (NULL), call_site_hash (NULL), former_clone_of (NULL),
   simdclone (NULL), simd_clones (NULL), ipa_transforms_to_apply (vNULL),
   inlined_to (NULL), rtl (NULL), clone (), thunk (),
-  count (profile_count::uninitialized ()), tp_first_run (false),
+  count (profile_count::uninitialized ()),
   count_materialization_scale (REG_BR_PROB_BASE), profile_id (0),
-  unit_id (0), used_as_abstract_origin (false),
+  unit_id (0), tp_first_run (0), used_as_abstract_origin (false),
   lowered (false), process (false), frequency (NODE_FREQUENCY_NORMAL),
   only_called_at_startup (false), only_called_at_exit (false),
   tm_clone (false), dispatcher_function (false), calls_comdat_local 
(false),
@@ -1469,8 +1469,6 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
 
   /* Expected number of executions: calculated in profile.c.  */
   profile_count count;
-  /* Time profiler: first run of function.  */
-  gcov_type tp_first_run;
   /* How to scale counts at materialization time; used to merge
  LTO units with different number of profile runs.  */
   int count_materialization_scale;
@@ -1478,6 +1476,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
   unsigned int profile_id;
   /* ID of the translation unit.  */
   int unit_id;
+  /* Time profiler: first run of function.  */
+  int tp_first_run;
 
   /* Set when decl is an abstract function pointed to by the
  ABSTRACT_DECL_ORIGIN of a reachable function.  */
Index: cgraphunit.c
===
--- cgraphunit.c(revision 279167)
+++ cgraphunit.c(working copy)
@@ -2364,8 +2364,8 @@ tp_first_run_node_cmp (const void *pa, c
 {
   const cgraph_node *a = *(const cgraph_node * const *) pa;
   const cgraph_node *b = *(const cgraph_node * const *) pb;
-  gcov_type tp_first_run_a = a->tp_first_run;
-  gcov_type tp_first_run_b = b->tp_first_run;
+  unsigned int tp_first_run_a = a->tp_first_run;
+  unsigned int 

[Patch, Fortran] OpenMP/OpenACC – fix more issues with OPTIONAL

2019-12-10 Thread Tobias Burnus
Nonallocatable, nonpointer array arguments (of assumed shape) are 
special as they get a get an array descriptor ('arg') as argument but 
create a local variable which accesses the actual data ('arg.0 = 
arg->data').


With OPTIONAL, there are/were two outstanding issues:

(A) If the argument is not present, 'arg.0' is/was never assigned to.

(B) The optional-arg-is-present check is not just 'if (arg)' but 'if 
(arg && arg->data)' as passing an unallocated allocatable/disassociated 
pointer (i.e. 'arg->data = NULL') to a nonpointer, nonallocatable 
optional dummy argument counts as absent argument; this affects (A).


Solution:

(B) is now solved by updating what gfc_omp_check_optional_argument 
returns; as is now always returns an boolean_type_node, one can clean up 
the code which adds "!= NULL" when using the "present" tree variable.


(A) For mapping, one also does GOMP_MAP_POINTER; if one replaces this by 
a temporary variable 'D.124 = present ? arg.0 : NULL', it will later ICE 
in omp-low.c one confuses the identifier handling, which replaces the 
variables in 'target (data)'.


Build on x86-64-gnu-linux w/o offloading and on one nvptx configuration 
with actual offloading.

OK?

Tobias

PS: Besides adding tons of test cases, it also fixes the transient issue 
(which does only occur with -O1 ?!?) with the existing 
use_device_addr-{3,4}.f90 test case. That failed due to reason (A). – 
Cf. https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00499.html


PPS: I haven't tried polymorphic data types but I am positive they will 
fail. Cray pointers are also candidates for additional failures.


2019-12-10  Tobias Burnus  

	gcc/fortran/
	* trans-openmp.c (gfc_omp_check_optional_argument): Always return a
	Boolean expression; handle unallocated/disassociated actual arguments
	as absent if passed to nonallocatable/nonpointer dummy array arguments.
	(gfc_build_cond_assign): Change to assume a Boolean expr not a pointer.
	(gfc_omp_finish_clause, gfc_trans_omp_clauses): Assign NULL to generated
	array-data variable if the argument is absent. Simplify code as
	'present' is now a Boolean expression.

	libgomp/
	* testsuite/libgomp.fortran/optional-map.f90: Add test for
	unallocated/disassociated actual arguments to nonallocatable/nonpointer
	dummy arguments; those are/shall be regarded as absent arguments.
	* testsuite/libgomp.fortran/use_device_ptr-optional-2.f90: Ditto.
	* testsuite/libgomp.fortran/use_device_ptr-optional-3.f90: New.

 gcc/fortran/trans-openmp.c | 117 +++--
 libgomp/testsuite/libgomp.fortran/optional-map.f90 |  13 ++
 .../libgomp.fortran/use_device_ptr-optional-2.f90  |  11 ++
 .../libgomp.fortran/use_device_ptr-optional-3.f90  | 140 +
 4 files changed, 242 insertions(+), 39 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 356fd04e6c3..e46086d3916 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -90,11 +90,16 @@ gfc_omp_check_optional_argument (tree decl, bool for_present_check)
   if (!DECL_LANG_SPECIFIC (decl))
 return NULL_TREE;
 
+  bool is_array_type = false;
+
   /* For assumed-shape arrays, a local decl with arg->data is used.  */
   if (TREE_CODE (decl) != PARM_DECL
   && (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))
 	  || GFC_ARRAY_TYPE_P (TREE_TYPE (decl
-decl = GFC_DECL_SAVED_DESCRIPTOR (decl);
+{
+  is_array_type = true;
+  decl = GFC_DECL_SAVED_DESCRIPTOR (decl);
+}
 
   if (TREE_CODE (decl) != PARM_DECL
   || !DECL_LANG_SPECIFIC (decl)
@@ -126,7 +131,23 @@ gfc_omp_check_optional_argument (tree decl, bool for_present_check)
   return decl;
 }
 
-  return decl;
+  tree cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
+			   decl, null_pointer_node);
+
+  /* Fortran regards unallocated allocatables/disassociated pointer which
+ are passed to a nonallocatable, nonpointer argument as not associated;
+ cf. F2018, 15.5.2.12, Paragraph 1.  */
+  if (is_array_type)
+{
+  tree cond2 = build_fold_indirect_ref_loc (input_location, decl);
+  cond2 = gfc_conv_array_data (cond2);
+  cond2 = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
+			   cond2, null_pointer_node);
+  cond = fold_build2_loc (input_location, TRUTH_ANDIF_EXPR,
+			  boolean_type_node, cond, cond2);
+}
+
+  return cond;
 }
 
 
@@ -1189,7 +1210,7 @@ gfc_build_cond_assign (stmtblock_t *block, tree val, tree cond_val,
 		   tree then_b, tree else_val)
 {
   stmtblock_t cond_block;
-  tree cond, else_b = NULL_TREE;
+  tree else_b = NULL_TREE;
   tree val_ty = TREE_TYPE (val);
 
   if (else_val)
@@ -1198,15 +1219,9 @@ gfc_build_cond_assign (stmtblock_t *block, tree val, tree cond_val,
   gfc_add_modify (_block, val, fold_convert (val_ty, else_val));
   else_b = gfc_finish_block (_block);
 }
-  cond = fold_build2_loc (input_location, NE_EXPR,
-			  logical_type_node,
-			  cond_val, 

Fix hot/startup partitioning with LTO

2019-12-10 Thread Jan Hubicka
Hi,
as noticed by Martin's new heatmaps default_function_sections disables
test.unlikely and text.startup sections during LTO.  This was meant to
be only with tp_first_run profiling. The hot section is still useful
since it does "poor man's clustering" so I kept only the test to disable
startup subsection with corrected fix.

Bootstrapped/regtested x86_64-linux and checked that it works well with
heatmap.

Honza

* varasm.c (default_function_section): Fix confused tests for
tp_first_run reordering.
Index: varasm.c
===
--- varasm.c(revision 279167)
+++ varasm.c(working copy)
@@ -589,9 +589,13 @@ default_function_section (tree decl, enu
  where we can split away unnecessary parts of static constructors.  */
   if (startup && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED)
   {
-/* If we do have a profile or(and) LTO phase is executed, we do not need
-   these ELF section.  */
-if (!in_lto_p || !flag_profile_values)
+/* During LTO the tp_first_run profiling will naturally place all
+   initialization code first.  Using separate section is counter-productive
+   because startup only code may call functions which are no longer
+   startup only.  */
+if (!in_lto_p
+|| !cgraph_node::get (decl)->tp_first_run
+   || !opt_for_fn (decl, flag_profile_reorder_functions))
   return get_named_text_section (decl, ".text.startup", NULL);
 else
   return NULL;
@@ -607,10 +611,7 @@ default_function_section (tree decl, enu
   case NODE_FREQUENCY_UNLIKELY_EXECUTED:
return get_named_text_section (decl, ".text.unlikely", NULL);
   case NODE_FREQUENCY_HOT:
-/* If we do have a profile or(and) LTO phase is executed, we do not 
need
-   these ELF section.  */
-if (!in_lto_p || !flag_profile_values)
-  return get_named_text_section (decl, ".text.hot", NULL);
+return get_named_text_section (decl, ".text.hot", NULL);
/* FALLTHRU */
   default:
return NULL;


[PATCH, libstdc++] Two parts of C++20 p1032...

2019-12-10 Thread Smith-Rowland, Edward M
These patches are modest reworkings of previous patches.

The patch for char_traits was pretty much approved already.  But in response to 
a comment from François I at least added __copy_backwards, and used ptrdiff_t 
instead of size_t parameters and I put all the mem* wrappers in a __detail 
namespace. I didn't change some names. I know Jonathan wanted a bikeshed 
discussion.

For the constexpr iterators I was asked to simplify the testsuite. Done.

These would finish p1032 except for std::string::copy which I think should wait 
behind the rest of constexpr string.

Another thing I need to do is respond to the SG-10 changes for the p1032 
macros.  There's one thing that bothers me still in light of the policy that 
these macros are names __cpp_lib_constexpr_HEADER: We sill have 
__cpp_lib_constexpr_algorithmS in .

OK if these pass final retesting?
2019-12-10  Edward Smith-Rowland  <3dw...@verizon.net>

	Implement the char_traits and string_view part of C++20 p1032 Misc.
	constexpr bits.
	* include/bits/char_traits.h (move, copy, assign): Constexpr.
	* include/bits/stl_algobase.h (__memcpy, __memset): New.
	* include/ext/pod_char_traits.h (from, to, operator==, operator<)
	(assign, eq, lt, compare, length, find, move, copy, assign)
	(to_char_type, to_int_type, eq_int_type, eof, not_eof):
	Make these constespr for appropriate standards.
	* include/std/string_view (copy): Constexpr.
	* testsuite/21_strings/char_traits/requirements/char/constexpr.cc: New test.
	* testsuite/21_strings/char_traits/requirements/wchar_t/constexpr.cc: New test.
	* testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc: New test.
	* testsuite/21_strings/basic_string_view/operations/copy/wchar_t/constexpr.cc: New test.

Index: include/bits/char_traits.h
===
--- include/bits/char_traits.h	(revision 279174)
+++ include/bits/char_traits.h	(working copy)
@@ -113,13 +113,13 @@
   static _GLIBCXX14_CONSTEXPR const char_type*
   find(const char_type* __s, std::size_t __n, const char_type& __a);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, std::size_t __n);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   copy(char_type* __s1, const char_type* __s2, std::size_t __n);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   assign(char_type* __s, std::size_t __n, char_type __a);
 
   static _GLIBCXX_CONSTEXPR char_type
@@ -179,18 +179,18 @@
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 move(char_type* __s1, const char_type* __s2, std::size_t __n)
 {
   if (__n == 0)
 	return __s1;
-  return static_cast<_CharT*>(__builtin_memmove(__s1, __s2,
-		__n * sizeof(char_type)));
+  return static_cast<_CharT*>
+		(std::__detail::__memmove(__s1, __s2, __n));
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 copy(char_type* __s1, const char_type* __s2, std::size_t __n)
 {
@@ -200,7 +200,7 @@
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 assign(char_type* __s, std::size_t __n, char_type __a)
 {
@@ -349,28 +349,30 @@
 	return static_cast(__builtin_memchr(__s, __a, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
-	return static_cast(__builtin_memmove(__s1, __s2, __n));
+	return static_cast
+		(std::__detail::__memmove(__s1, __s2, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   copy(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
-	return static_cast(__builtin_memcpy(__s1, __s2, __n));
+	return static_cast(
+			std::__detail::__memcpy(__s1, __s2, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   assign(char_type* __s, size_t __n, char_type __a)
   {
 	if (__n == 0)
 	  return __s;
-	return static_cast(__builtin_memset(__s, __a, __n));
+	return static_cast(std::__detail::__memset(__s, __a, __n));
   }
 
   static _GLIBCXX_CONSTEXPR char_type
@@ -458,27 +460,45 @@
 	return wmemchr(__s, __a, __n);
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
+#ifdef __cpp_lib_is_constant_evaluated
+	if (std::is_constant_evaluated())
+	  return static_cast
+		   (std::__detail::__memmove(__s1, __s2, __n));
+	else
+#endif
 	return 

Re: [PATCH 28/49] analyzer: new files: analyzer.{cc|h}

2019-12-10 Thread Martin Sebor

On 11/15/19 6:23 PM, David Malcolm wrote:

gcc/ChangeLog:
* analyzer/analyzer.cc: New file.
* analyzer/analyzer.h: New file.


Here are a few more comments/suggestions as I'm slowly making my way
through the patch, partly for my own benefit.  Hopefully they're at
least moderately useful.


---
  gcc/analyzer/analyzer.cc | 125 ++
  gcc/analyzer/analyzer.h  | 126 +++
  2 files changed, 251 insertions(+)
  create mode 100644 gcc/analyzer/analyzer.cc
  create mode 100644 gcc/analyzer/analyzer.h

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
new file mode 100644
index 000..399925c
--- /dev/null
+++ b/gcc/analyzer/analyzer.cc
@@ -0,0 +1,125 @@
+/* Utility functions for the analyzer.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "gimple.h"
+#include "diagnostic.h"
+#include "intl.h"
+#include "analyzer/analyzer.h"
+
+/* Helper function for checkers.  Is the CALL to the given function name?  */
+
+bool
+is_named_call_p (const gcall *call, const char *funcname)
+{
+  gcc_assert (funcname);
+
+  tree fndecl = gimple_call_fndecl (call);
+  if (!fndecl)
+return false;
+
+  return 0 == strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)), funcname);
+}
+
+/* Helper function for checkers.  Is the CALL to the given function name,
+   and with the given number of arguments?  */
+
+bool
+is_named_call_p (const gcall *call, const char *funcname,
+unsigned int num_args)
+{
+  gcc_assert (funcname);
+
+  if (!is_named_call_p (call, funcname))
+return false;
+
+  if (gimple_call_num_args (call) != num_args)
+return false;
+
+  return true;
+}
+
+/* Return true if stmt is a setjmp call.  */
+
+bool
+is_setjmp_call_p (const gimple *stmt)
+{
+  /* TODO: is there a less hacky way to check for "setjmp"?  */
+  if (const gcall *call = dyn_cast  (stmt))
+if (is_named_call_p (call, "_setjmp", 1))
+  return true;
+
+  return false;
+}
+
+/* Return true if stmt is a longjmp call.  */
+
+bool
+is_longjmp_call_p (const gcall *call)
+{
+  /* TODO: is there a less hacky way to check for "longjmp"?  */
+  if (is_named_call_p (call, "longjmp", 2))
+return true;
+
+  return false;
+}


I haven't actually needed these helper functions myself but they seem
quite generic and might make good additions to gimple.h.


+
+/* Generate a label_text instance by formatting FMT, using a
+   temporary clone of the global_dc's printer (thus using its
+   formatting callbacks).
+
+   Colorize if the global_dc supports colorization and CAN_COLORIZE is
+   true.  */
+
+label_text
+make_label_text (bool can_colorize, const char *fmt, ...)
+{
+  pretty_printer *pp = global_dc->printer->clone ();
+  pp_clear_output_area (pp);
+
+  if (!can_colorize)
+pp_show_color (pp) = false;
+
+  text_info ti;
+  rich_location rich_loc (line_table, UNKNOWN_LOCATION);
+
+  va_list ap;
+
+  va_start (ap, fmt);
+
+  ti.format_spec = _(fmt);
+  ti.args_ptr = 
+  ti.err_no = 0;
+  ti.x_data = NULL;
+  ti.m_richloc = _loc;
+
+  pp_format (pp, );
+  pp_output_formatted_text (pp);
+
+  va_end (ap);
+
+  label_text result = label_text::take (xstrdup (pp_formatted_text (pp)));
+  delete pp;
+  return result;
+}
diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
new file mode 100644
index 000..ace8924
--- /dev/null
+++ b/gcc/analyzer/analyzer.h
@@ -0,0 +1,126 @@
+/* Utility functions for the analyzer.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+

Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-10 Thread Kyrill Tkachov

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!



Sorry for the delay.



Thank you,
Stam


 Forwarded Message 
Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
branches in Thumb2 (PR91816)
Date: Mon, 21 Oct 2019 10:37:09 +0100
From: Stam Markianos-Wright 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , nd ,
James Greenhalgh , Richard Earnshaw




On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> however, on my native Aarch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * config/arm/arm.md: Update b for Thumb2 range checks.
>>   * config/arm/arm.c: New function arm_gen_far_branch.
>>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>>   prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * testsuite/gcc.target/arm/pr91816.c: New test.
>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
cpu_arch_option *,

>>
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
char *);

>> +
>> +
>
> Lets get the nits out of the way.
>
> Unnecessary extra new line, need a space between int and const above.
>
>

.Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>
>> +
>> +/* Generate code to enable conditional branches in functions over 
1 MiB.  */

>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +    const char * branch_format)
>
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
>

.Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +    CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
label_ptr);

>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
label_ptr);

>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or 
-4) + 4).

>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
(-2040->2048).
>>   ;; For a 'b' pos_range = 254, neg_range = -256  giving 
(-250 ->256).

>>
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'   +/- 16MB is not checked for.
>> +;; For a 'b' pos_range = 1048574, neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>> (match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>> (pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -    {
>> -  arm_ccfsm_state += 2;
>> -  return \"\";
>> -    }
>> -  return \"b%d1\\t%l0\";
>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> +  {
>> +    arm_ccfsm_state += 2;
>> +    return \"\";
>> +  }
>> + switch (get_attr_length (insn))
>> +  {
>> +    // Thumb2 16-bit b{cond}
>> +    case 2:
>> +
>> +    // Thumb2 32-bit b{cond}
>> +    case 4: return \"b%d1\\t%l0\";break;
>> +
>> +    // Thumb2 b{cond} out of range.  Use unconditional branch.
>> +    case 8: return arm_gen_far_branch \
>> +    (operands, 0, \"Lbcond\", \"b%D1\t\");
>> +    break;
>> +
>> +    // A32 b{cond}
>> +    

[PATCH] libstdc++: Fix description of std::ios::trunc (PR 92886)

2019-12-10 Thread Jonathan Wakely

PR libstdc++/92886
* include/bits/ios_base.h (std::ios_base::trunc): Fix comment.

Committing to trunk as obvious.

commit 9acff881dea3a00ae588f8ebc4c5289b98f7ee52
Author: Jonathan Wakely 
Date:   Tue Dec 10 17:00:30 2019 +

libstdc++: Fix description of std::ios::trunc (PR 92886)

PR libstdc++/92886
* include/bits/ios_base.h (std::ios_base::trunc): Fix comment.

diff --git a/libstdc++-v3/include/bits/ios_base.h 
b/libstdc++-v3/include/bits/ios_base.h
index 06a3e0870f3..be908b3ffd2 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -445,7 +445,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /// Open for output.  Default for @c ofstream and fstream.
 static const openmode out =_S_out;
 
-/// Open for input.  Default for @c ofstream.
+/// Truncate an existing stream when opening.  Default for @c ofstream.
 static const openmode trunc =  _S_trunc;
 
 // 27.4.2.1.5  Type ios_base::seekdir


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-10 Thread Jan Hubicka
>   * Makefile.in: Add ipa-reorder.o.
>   * cgraph.c (cgraph_node::dump): Dump
>   text_sorted_order.
>   (cgraph_node_cmp_by_text_sorted):
>   New function that sorts functions based
>   on text_sorted_order.
>   * cgraph.h (cgraph_node): Add text_sorted_order.
>   (cgraph_node_cmp_by_text_sorted): New.
>   * cgraphclones.c (cgraph_node::create_clone):
>   Clone also text_sorted_order.
>   * cgraphunit.c (node_cmp): Remove.
>   (expand_all_functions): Use new function
>   cgraph_node_cmp_by_text_sorted.
>   * common.opt: Add new option reorder_functions_algorithm.
>   * flag-types.h (enum reorder_functions_algorithm):
>   New enum.
>   * ipa-reorder.c: New file.
>   * lto-cgraph.c (lto_output_node): Stream in and out
>   text_sorted_order.
>   (input_node): Likewise.
>   * passes.def: Add pass_ipa_reorder.
>   * timevar.def (TV_IPA_REORDER): New.
>   * tree-pass.h (make_pass_ipa_reorder): New.
>   * varasm.c (default_function_section): Assign text.sorted.X
>   section.
> 
> gcc/lto/ChangeLog:
> 
> 2019-11-25  Martin Liska  
> 
>   * lto-partition.c (node_cmp): Remove.
>   (lto_balanced_map): Use new cgraph_node_cmp_by_text_sorted.
> +/* Sort cgraph_nodes by text_sorted_order if available, or by order.  */
> +
> +int
> +cgraph_node_cmp_by_text_sorted (const void *pa, const void *pb)
> +{
> +  const cgraph_node *a = *(const cgraph_node * const *) pa;
> +  const cgraph_node *b = *(const cgraph_node * const *) pb;
> +
> +  /* Functions with text_sorted_order should be before these
> + without profile.  */
> +  if (a->text_sorted_order == 0 || b->text_sorted_order == 0)
> +return a->text_sorted_order - b->text_sorted_order;
> +
> +  return a->text_sorted_order != b->text_sorted_order
> +  ? b->text_sorted_order - a->text_sorted_order
> +  : b->order - a->order;
> +}

We now have tp_first_run_node_cmp with needs to be updated and renamed
isntead of adding new comparator.
>/* Time profiler: first run of function.  */
>int tp_first_run;
> +  /* Order in .text.sorted.* section.  */
> +  int text_sorted_order;

tp_first_run probalby should go into summary since it will live from
profling to the ipa-reorder pass where it will be copied to
text_sorted_order? Or how these two interface?
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index a79491e0b88..f624cbee185 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -372,6 +372,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
> prof_count,
>new_node->rtl = rtl;
>new_node->frequency = frequency;
>new_node->tp_first_run = tp_first_run;
> +  new_node->text_sorted_order = text_sorted_order;
Copy it also in duplicate_thunk_for_node and create_version_node.
I think we should refactor the copying code and share it in all places
we duplicate function.
> +freorder-functions-algorithm=
> +Common Joined RejectNegative Enum(reorder_functions_algorithm) 
> Var(flag_reorder_functions_algorithm) 
> Init(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING) Optimization
> +-freorder-functions-algorithm=[first-run|call-chain-clustering]  Set the 
> used function reordering algorithm.
> +
> +Enum
> +Name(reorder_functions_algorithm) Type(enum reorder_functions_algorithm) 
> UnknownError(unknown function reordering algorithm %qs)
> +
> +EnumValue
> +Enum(reorder_functions_algorithm) String(first-run) 
> Value(REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN)
> +
> +EnumValue
> +Enum(reorder_functions_algorithm) String(call-chain-clustering) 
> Value(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING)

I think we should also have "definition-order" which will sort by
node->order.  For testing we may also want to have "random" which will
do something stupid like sorting by symbol name checksum so we could see
how these compares.
> +/* The algorithm used for function reordering.  */
> +enum reorder_functions_algorithm
> +{
> +  REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN,
> +  REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING
> +};

Isn't this supposed to be generated by awk scripts?
> +
> +#define C3_CLUSTER_THRESHOLD 1024
description + perhaps PARAM?
> +
> +struct cluster_edge;
> +
> +/* Cluster is set of functions considered in C3 algorithm.  */
Perhaps an high level overview of C3 algorithm with a reference would be
nice.
> +/* Sort functions based of first execution of the function.  */
> +
> +static void
> +sort_functions_by_first_run (void)
> +{
> +  cgraph_node *node;
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +if (node->tp_first_run && !node->alias)
> +  node->text_sorted_order = node->tp_first_run;

Why you care about excluding aliases?  We also have thunks where
tp_first_run is ignored.  I am just curious if you got some ICE here.
> +}
> +
> +/* Compare clusters by density after that are established.  */
> +
> +static int
> +cluster_cmp (const void *a_p, const void *b_p)
> +{
> +  const cluster *a = *(cluster 

[committed][AArch64] Don't allow partial SVE modes in GPRs

2019-12-10 Thread Richard Sandiford
With -msve-vector-bits=N, the payload of some partial SVE modes can
be 16 bytes or smaller, which makes them small enough to fit in a
pair of GPRs.  We specifically don't want that, because the payload
is distributed evenly across the SVE register rather than collected
at one end.  Marshalling it into a GPR via register operations would
be expensive.

Tested on aarch64-linux-gnu, applied as r279174.

Richard


2019-12-10  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_hard_regno_mode_ok): Don't
allow SVE modes in GPRs.

gcc/testsuite/
* gcc.target/aarch64/sve/mixed_size_7.c: New test.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-12-05 14:20:17.409060413 +
+++ gcc/config/aarch64/aarch64.c2019-12-10 16:45:42.794317637 +
@@ -2019,9 +2019,11 @@ aarch64_hard_regno_mode_ok (unsigned reg
 
   if (GP_REGNUM_P (regno))
 {
+  if (vec_flags & VEC_ANY_SVE)
+   return false;
   if (known_le (GET_MODE_SIZE (mode), 8))
return true;
-  else if (known_le (GET_MODE_SIZE (mode), 16))
+  if (known_le (GET_MODE_SIZE (mode), 16))
return (regno & 1) == 0;
 }
   else if (FP_REGNUM_P (regno))
Index: gcc/testsuite/gcc.target/aarch64/sve/mixed_size_7.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/mixed_size_7.c 2019-12-10 
16:45:42.794317637 +
@@ -0,0 +1,28 @@
+/* Originally gcc.dg/vect/bb-slp-6.c */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 
-fno-vect-cost-model" } */
+
+#define N 16
+
+unsigned int out[N];
+unsigned int in[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+__attribute__ ((noinline)) int
+main1 (unsigned int x, unsigned int y)
+{
+  int i;
+  unsigned int *pin = [0];
+  unsigned int *pout = [0];
+  unsigned int a0, a1, a2, a3;
+
+  a0 = *pin++ + 23;
+  a1 = *pin++ + 142;
+  a2 = *pin++ + 2;
+  a3 = *pin++ + 31;
+
+  *pout++ = a0 * x;
+  *pout++ = a1 * y;
+  *pout++ = a2 * x;
+  *pout++ = a3 * y;
+
+  return 0;
+}


[committed][AArch64] Fix INDEX patterns for partial VNx2 modes

2019-12-10 Thread Richard Sandiford
The INDEX patterns handle partial modes by choosing the container
size rather than the element size, so that the number of lanes
(and thus number of additions) matches the mode.  This means that
all VNx4 modes use .s and all VNx2 modes use .d, etc.

When adding this, I'd forgotten that the choice between Wn and Xn
registers would need to be updated to use the container size too.
For partial VNx2s, we were using .d containers with Wn rather than
Xn source registers.

Tested on aarch64-linux-gnu, applied as r279173.

Richard


2019-12-10  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (vccore): New iterator.
* config/aarch64/aarch64-sve.md (vec_series): Use it instead
of vwcore.
(*vec_series_plus): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/mixed_size_6.c: New test.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-11-18 15:36:04.861884957 +
+++ gcc/config/aarch64/iterators.md 2019-12-10 16:31:31.328032388 +
@@ -1093,6 +1093,12 @@ (define_mode_attr vwcore [(V8QI "w") (V1
  (VNx2DI "x")
  (VNx2DF "x")])
 
+;; Like vwcore, but for the container mode rather than the element mode.
+(define_mode_attr vccore [(VNx16QI "w") (VNx8QI "w") (VNx4QI "w") (VNx2QI "x")
+ (VNx8HI "w") (VNx4HI "w") (VNx2HI "x")
+ (VNx4SI "w") (VNx2SI "x")
+ (VNx2DI "x")])
+
 ;; Double vector types for ALLX.
 (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
 
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-11-16 13:31:24.342304673 +
+++ gcc/config/aarch64/aarch64-sve.md   2019-12-10 16:31:31.328032388 +
@@ -2541,9 +2541,9 @@ (define_insn "vec_series"
  (match_operand: 2 "aarch64_sve_index_operand" "r, Usi, r")))]
   "TARGET_SVE"
   "@
-   index\t%0., #%1, %2
-   index\t%0., %1, #%2
-   index\t%0., %1, %2"
+   index\t%0., #%1, %2
+   index\t%0., %1, #%2
+   index\t%0., %1, %2"
 )
 
 ;; Optimize {x, x, x, x, ...} + {0, n, 2*n, 3*n, ...} if n is in range
@@ -2557,7 +2557,7 @@ (define_insn "*vec_series_plus"
   "TARGET_SVE && aarch64_check_zero_based_sve_index_immediate (operands[2])"
   {
 operands[2] = aarch64_check_zero_based_sve_index_immediate (operands[2]);
-return "index\t%0., %1, #%2";
+return "index\t%0., %1, #%2";
   }
 )
 
Index: gcc/testsuite/gcc.target/aarch64/sve/mixed_size_6.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/mixed_size_6.c 2019-12-10 
16:31:31.328032388 +
@@ -0,0 +1,47 @@
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include 
+
+void
+f1 (uint64_t *restrict ptr1, uint8_t *restrict ptr2, uint8_t start)
+{
+#pragma GCC unroll 0
+  for (int i = 0; i < 4; ++i)
+{
+  ptr1[i] = 10;
+  ptr2[i] = start;
+  start += 1;
+}
+}
+
+void
+f2 (uint64_t *restrict ptr1, uint16_t *restrict ptr2, uint16_t start)
+{
+#pragma GCC unroll 0
+  for (int i = 0; i < 4; ++i)
+{
+  ptr1[i] = 10;
+  ptr2[i] = start;
+  start += 2;
+}
+}
+
+void
+f3 (uint64_t *restrict ptr1, uint32_t *restrict ptr2, uint32_t start)
+{
+#pragma GCC unroll 0
+  for (int i = 0; i < 4; ++i)
+{
+  ptr1[i] = 10;
+  ptr2[i] = start;
+  start += 4;
+}
+}
+
+/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} } } */
+/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} } } */
+/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.d, x[0-9]+, #4\n} } } */
+
+/* { dg-final { scan-assembler-not {\tindex\tz[0-9]+\.d, w[0-9]+, #1\n} } } */
+/* { dg-final { scan-assembler-not {\tindex\tz[0-9]+\.d, w[0-9]+, #1\n} } } */
+/* { dg-final { scan-assembler-not {\tindex\tz[0-9]+\.d, w[0-9]+, #4\n} } } */


Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

> Thanks, yes, with my following remarks considered, and acted on per your
> preference.  To record the review effort, please include "Reviewed-by:
> Thomas Schwinge " in the commit log, see
> .

Committed as r279168 and r279169.

Frederik




[PATCH] libstdc++: Define __cpp_lib_constexpr_complex macro

2019-12-10 Thread Jonathan Wakely

This is LWG issue 3349.

* include/std/complex (__cpp_lib_constexpr_complex): Define.
* include/std/version (__cpp_lib_constexpr_complex): Likewise.
* testsuite/26_numerics/complex/1.cc: New test.
* testsuite/26_numerics/complex/2.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit fb98f3d61b578aedabf371608b105d8d0b2dc999
Author: Jonathan Wakely 
Date:   Tue Dec 10 16:03:43 2019 +

libstdc++: Define __cpp_lib_constexpr_complex macro

This is LWG issue 3349.

* include/std/complex (__cpp_lib_constexpr_complex): Define.
* include/std/version (__cpp_lib_constexpr_complex): Likewise.
* testsuite/26_numerics/complex/1.cc: New test.
* testsuite/26_numerics/complex/2.cc: New test.

diff --git a/libstdc++-v3/include/std/complex b/libstdc++-v3/include/std/complex
index 45450e8ca01..d156d3249a7 100644
--- a/libstdc++-v3/include/std/complex
+++ b/libstdc++-v3/include/std/complex
@@ -47,6 +47,10 @@
 // Get rid of a macro possibly defined in 
 #undef complex
 
+#if __cplusplus > 201703L
+# define __cpp_lib_constexpr_complex 201711L
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 8e08dc99049..e55a092eb89 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -180,6 +180,7 @@
 #if _GLIBCXX_HOSTED
 #define __cpp_lib_bind_front 201907L
 #define __cpp_lib_constexpr_algorithms 201806L
+#define __cpp_lib_constexpr_complex 201711L
 #define __cpp_lib_constexpr_dynamic_alloc 201907L
 #define __cpp_lib_constexpr_invoke 201907L
 #define __cpp_lib_erase_if 201900L
diff --git a/libstdc++-v3/testsuite/26_numerics/complex/1.cc 
b/libstdc++-v3/testsuite/26_numerics/complex/1.cc
new file mode 100644
index 000..e3e6c22c5f4
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/complex/1.cc
@@ -0,0 +1,27 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do preprocess { target c++2a } }
+
+#include 
+
+#ifndef __cpp_lib_constexpr_complex
+# error "Feature test macro for constexpr complex is missing in "
+#elif __cpp_lib_constexpr_complex < 201711L
+# error "Feature test macro for constexpr complex has wrong value in "
+#endif
diff --git a/libstdc++-v3/testsuite/26_numerics/complex/2.cc 
b/libstdc++-v3/testsuite/26_numerics/complex/2.cc
new file mode 100644
index 000..e84e92c62ae
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/complex/2.cc
@@ -0,0 +1,27 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do preprocess { target c++2a } }
+
+#include 
+
+#ifndef __cpp_lib_constexpr_complex
+# error "Feature test macro for constexpr complex is missing in "
+#elif __cpp_lib_constexpr_complex < 201711L
+# error "Feature test macro for constexpr complex has wrong value in "
+#endif


[PATCH] libstdc++: Reduce header dependencies in

2019-12-10 Thread Jonathan Wakely

* include/std/span: Do not include  and .
(tuple_size, tuple_element): Declare.

Tested powerpc64le-linux, committed to trunk.

commit a32163a1de2a84dde15df80dd33e54a57746d6eb
Author: Jonathan Wakely 
Date:   Tue Dec 10 15:59:54 2019 +

libstdc++: Reduce header dependencies in 

* include/std/span: Do not include  and .
(tuple_size, tuple_element): Declare.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index f215decb453..ecce0b33b0b 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -39,8 +39,6 @@
 #if __cplusplus > 201703L
 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -451,6 +449,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __sp[_Index];
 }
 
+  template struct tuple_size;
+  template struct tuple_element;
+
   template
 struct tuple_size>
 : public integral_constant


[PATCH] libstdc++: Fix bug in std::indirect_result_t

2019-12-10 Thread Jonathan Wakely

The alias template wasn't working because it applied iter_reference_t to
the pack of iterators before and after passing the pack to the
__indeirect_result helper.

* include/bits/iterator_concepts.h (indirect_result_t): Do not apply
iter_reference_t to parameter pack.
* testsuite/24_iterators/indirect_callable/projected.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit c87d75745e6a133a2e8e2703cec24d09bc53c636
Author: Jonathan Wakely 
Date:   Tue Dec 10 15:52:50 2019 +

libstdc++: Fix bug in std::indirect_result_t

The alias template wasn't working because it applied iter_reference_t to
the pack of iterators before and after passing the pack to the
__indeirect_result helper.

* include/bits/iterator_concepts.h (indirect_result_t): Do not apply
iter_reference_t to parameter pack.
* testsuite/24_iterators/indirect_callable/projected.cc: New test.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index 97aed72e255..ab9851f19c4 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -474,18 +474,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct __iter_common_ref
   : common_reference, iter_value_t<_Tp>&>
   { };
-
-// FIXME: needed due to PR c++/67704
-template
-  struct __indirect_result
-  { };
-
-template
-  requires (readable<_Is> && ...)
-   && invocable<_Fn, iter_reference_t<_Is>...>
-  struct __indirect_result<_Fn, _Is...>
-  : invoke_result<_Fn, iter_reference_t<_Is>...>
-  { };
   } // namespace __detail
 
   template
@@ -653,15 +641,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   && strict_weak_order<_Fn&, iter_common_reference_t<_I1>,
   iter_common_reference_t<_I2>>;
 
+  namespace __detail
+  {
+// FIXME: needed due to PR c++/67704
+template
+  struct __indirect_result
+  { };
+
+template
+  requires (readable<_Is> && ...)
+   && invocable<_Fn, iter_reference_t<_Is>...>
+  struct __indirect_result<_Fn, _Is...>
+  : invoke_result<_Fn, iter_reference_t<_Is>...>
+  { };
+  } // namespace __detail
+
   template
 using indirect_result_t = typename
-  __detail::__indirect_result<_Fn, iter_reference_t<_Is>...>::type;
+  __detail::__indirect_result<_Fn, _Is...>::type;
 
   /// [projected], projected
   template _Proj>
 struct projected
 {
   using value_type = remove_cvref_t>;
+
   indirect_result_t<_Proj&, _Iter> operator*() const; // not defined
 };
 
diff --git a/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected.cc 
b/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected.cc
new file mode 100644
index 000..2237314ddc2
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/indirect_callable/projected.cc
@@ -0,0 +1,41 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+template
+  using PI = std::projected;
+
+static_assert(std::same_as::value_type, int>);
+static_assert(std::same_as&>()), int&>);
+
+struct X
+{
+  using value_type = char*;
+  char* const& operator*() &;
+};
+static_assert( std::readable );
+static_assert(std::same_as::value_type, char*>);
+static_assert(std::same_as&>()), char* const&>);
+
+struct Y;
+using PY = std::projected;
+static_assert(std::same_as);
+static_assert(std::same_as()), const int&>);


Re: Fix overflows in -fprofile-reorder-functions

2019-12-10 Thread Alexander Monakov
On Tue, 10 Dec 2019, Jan Hubicka wrote:

> > I would recommend to make these variables uint64_t, then you can simply do
> > 
> >   tp_first_run_a--;
> >   tp_first_run_b--;
> > 
> > making 0 wrap around to UINT64_MAX. Then they will naturally sort after all
> > other nodes.
> 
> Then we would still have to watch the overflow before returning? I
> actually find the condtional sort of more readable than intentional wrap
> around the range, so I kept it in the code espeically because I made the
> value 32bit again and without this trick I no longer need to watch
> overflows.

For int-typed timestamps, you'd need to warp 0 to INT_MAX, e.g. like this:

  tp_first_run_a = (tp_first_run_a - 1) & INT_MAX;
  tp_first_run_b = (tp_first_run_b - 1) & INT_MAX;

which keeps them in 0..INT_MAX range so 'return tp_first_run_a - tp_first_run_b'
still works.

> > > +  /* Output functions in RPO so callers get optimized before callees.  
> > > This
> > > + makes ipa-ra and other propagators to work.
> > > + FIXME: This is far from optimal code layout.  */
> > 
> > I think this should have said "callees get optimized before callers".
> 
> Indeed.

So technically this would be just postorder, not RPO :)

> --- cgraph.c  (revision 279093)
> +++ cgraph.c  (working copy)
> @@ -3074,6 +3074,11 @@ cgraph_node::verify_node (void)
>inlined_to->count.debug ();
>error_found = true;
>  }
> +  if (tp_first_run < 0)
> +{
> +  error ("tp_first_run must be positive");
> +  error_found = true;
> +}

"non-negative"?

Alexander


Re: Fix overflows in -fprofile-reorder-functions

2019-12-10 Thread Jan Hubicka
> 2On Sun, 8 Dec 2019, Jan Hubicka wrote:
> 
> > Other explanation would be that our new qsort with broken comparator due to
> > overflow can actualy remove some entries in the array, but that sounds bit
> > crazy.
> 
> gcc_qsort only reorders elements, making it possible for gcc_qsort_chk (that
> runs afterwards) to catch crazy comparators in a sound manner.

I understand this problem (and it is very weird).  It was caused by
optimize attribute overwritting incorrectly
flag_profile_reorder_functions which is not supposed to change mid
compilation globally. So we ended up putting some symbols for ordered
output and then forgetting about them.
> >/* Functions with time profile must be before these without profile.  */
> > -  if (!a->tp_first_run || !b->tp_first_run)
> > -return a->tp_first_run - b->tp_first_run;
> > +  if (!tp_first_run_a || !tp_first_run_b)
> > +return tp_first_run_a ? 1 : -1;
> 
> The code does the opposite of the comment: when tp_first_run_b is 0, it will
> return 1, indicating a > b, causing b to appear in front of a in the sorted
> array.

You are right - I have noticed that and fixed it as (apparently
forgotten) last minute change. Trunk says tp_first_run_b.
> 
> I would recommend to make these variables uint64_t, then you can simply do
> 
>   tp_first_run_a--;
>   tp_first_run_b--;
> 
> making 0 wrap around to UINT64_MAX. Then they will naturally sort after all
> other nodes.

Then we would still have to watch the overflow before returning? I
actually find the condtional sort of more readable than intentional wrap
around the range, so I kept it in the code espeically because I made the
value 32bit again and without this trick I no longer need to watch
overflows.
> 
> > +  /* Output functions in RPO so callers get optimized before callees.  This
> > + makes ipa-ra and other propagators to work.
> > + FIXME: This is far from optimal code layout.  */
> 
> I think this should have said "callees get optimized before callers".

Indeed.
Here is patch fixing the issues which I have bootstrapped
I will wait a bit for comments before comitting.

Honza

* cgraph.c (cgraph_node::verify_node): Verify tp_first_run.
* cgraph.h (cgrpah_node): Turn tp_first_run back to int.
* cgraphunit.c (tp_first_run_node_cmp): Do not watch for overflows.
(expand_all_functions): First expand ordered section and then
unordered.
* lto-partition.c (lto_balanced_map): Fix printing of tp_first_run.
* profile.c (compute_value_histograms): Error on out of range
tp_first_runs.
Index: cgraph.c
===
--- cgraph.c(revision 279093)
+++ cgraph.c(working copy)
@@ -3074,6 +3074,11 @@ cgraph_node::verify_node (void)
   inlined_to->count.debug ();
   error_found = true;
 }
+  if (tp_first_run < 0)
+{
+  error ("tp_first_run must be positive");
+  error_found = true;
+}
   if (!definition && !in_other_partition && local)
 {
   error ("local symbols must be defined");
Index: cgraph.h
===
--- cgraph.h(revision 279093)
+++ cgraph.h(working copy)
@@ -1430,8 +1430,6 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
 
   /* Expected number of executions: calculated in profile.c.  */
   profile_count count;
-  /* Time profiler: first run of function.  */
-  gcov_type tp_first_run;
   /* How to scale counts at materialization time; used to merge
  LTO units with different number of profile runs.  */
   int count_materialization_scale;
@@ -1439,6 +1437,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
   unsigned int profile_id;
   /* ID of the translation unit.  */
   int unit_id;
+  /* Time profiler: first run of function.  */
+  int tp_first_run;
 
   /* Set when decl is an abstract function pointed to by the
  ABSTRACT_DECL_ORIGIN of a reachable function.  */
Index: lto/lto-partition.c
===
--- lto/lto-partition.c (revision 279093)
+++ lto/lto-partition.c (working copy)
@@ -514,11 +514,11 @@ lto_balanced_map (int n_lto_partitions,
   if (dump_file)
 {
   for (unsigned i = 0; i < order.length (); i++)
-   fprintf (dump_file, "Balanced map symbol order:%s:%" PRId64 "\n",
-order[i]->name (), (int64_t) order[i]->tp_first_run);
+   fprintf (dump_file, "Balanced map symbol order:%s:%u\n",
+order[i]->name (), order[i]->tp_first_run);
   for (unsigned i = 0; i < noreorder.length (); i++)
-   fprintf (dump_file, "Balanced map symbol no_reorder:%s:%" PRId64 "\n",
-noreorder[i]->name (), (int64_t) noreorder[i]->tp_first_run);
+   fprintf (dump_file, "Balanced map symbol no_reorder:%s:%u\n",
+noreorder[i]->name (), noreorder[i]->tp_first_run);
 }
 
   /* Collect all variables that should not be reordered.  */
Index: profile.c

Re: copy/copy_backward/fill/fill_n/equal rework

2019-12-10 Thread Jonathan Wakely

On 09/12/19 10:32 +0100, François Dumont wrote:
After completing this work and running more tests I realized that the 
declaration of algos was still not ideal.


So here is another version where algos are not re-declare in 
stl_deque.h, I rather include stl_algobase.h in deque.tcc. The problem 
was spotted but another patch I am going to submit afterward.


OK for trunk (with a suitable ChangeLog).

Thanks for persisting with this, sorry it took so long.



Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

>> Frederik Harwath (2):
>>   Use clause locations in OpenACC nested reduction warnings
>>   Add tests to verify OpenACC clause locations
> 
> I won't insist, but suggest (common practice) to merge that into one
> patch: bug fix plus test cases, using the summary line of your first
> patch.> [...]
> It's of course always OK to add new test cases, but wouldn't the same
> test coverage be reached by just adding such checking to the existing
> test cases in 'c-c++-common/goacc/nested-reductions-warn.c',
> 'gfortran.dg/goacc/nested-reductions-warn.f90'?

Sure, we could have everything in one patch and one test. The rationale
for splitting the patches and for splitting the tests is that the tests do
not try to verify the nested reductions validation code. They try to verify
that the language front-ends set the correct locations for clauses.
Without a possibility to do proper unit testing, I just had to find some
way to check the clauses. I had no immediate success triggering one of the
very few other warnings that use the location of omp_clauses from both Fortran
and C code and hence I went with the nested reductions code.

Thanks for your review!

Best regards,
Frederik




[PATCH] Fix libstdc++ testsuite to handle VxWorks gthreads implementation

2019-12-10 Thread Corentin Gay
Hello there ! 




When implementing the support for gthreads in VxWorks, we stumbled on a
problem in the testsuite. In the libstdc++ testsuite, we
indiscriminately add the `-pthread` switch to the tests that require
linking against the pthread library. In certain cases, such as VxWorks,
the gthread interface relies on the system native threads lilbrary and 
the `-pthread` switch does not exist.


This patch adds a condition for the use of the `-pthread` switch. It
adds it only if the target supports it. The patch also adds
`dg-require-gthreads` in tests that were lacking it.

This patch was tested on x86_64-linux and is part of our nightly testing
on all platforms, including VxWorks.

As this is my first submission, please tell me if I missed anything.

Below you will find the diff and the Changelog.

Cheers,
Corentin Gay

--

libstc++/ChangeLog:
* testsuite/20_util/shared_ptr/atomic/3.cc: Do not require POSIX
threads and add -pthread only on targets supporting them.
* testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc:
Likewise.
* testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise.
* testsuite/30_threads/async/42819.cc: Likewise.
* testsuite/30_threads/async/49668.cc: Likewise.
* testsuite/30_threads/async/54297.cc: Likewise.
* testsuite/30_threads/async/any.cc: Likewise.
* testsuite/30_threads/async/async.cc: Likewise.
* testsuite/30_threads/async/except.cc: Likewise.
* testsuite/30_threads/async/launch.cc: Likewise.
* testsuite/30_threads/async/lwg2021.cc: Likewise.
* testsuite/30_threads/async/sync.cc: Likewise. : Likewise.
* testsuite/30_threads/call_once/39909.cc: Likewise.
* testsuite/30_threads/call_once/49668.cc: Likewise.
* testsuite/30_threads/call_once/60497.cc: Likewise.
* testsuite/30_threads/call_once/call_once1.cc: Likewise.
* testsuite/30_threads/call_once/dr2442.cc: Likewise.
* testsuite/30_threads/condition_variable/54185.cc: Likewise.
* testsuite/30_threads/condition_variable/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable/members/2.cc: Likewise.
* testsuite/30_threads/condition_variable/members/3.cc: Likewise.
* testsuite/30_threads/condition_variable/members/53841.cc: Likewise.
* testsuite/30_threads/condition_variable/members/68519.cc: Likewise.
* testsuite/30_threads/condition_variable/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/condition_variable_any/50862.cc: Likewise.
* testsuite/30_threads/condition_variable_any/53830.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cond.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/members/2.cc: Likewise.
* testsuite/30_threads/future/cons/move.cc: Likewise.
* testsuite/30_threads/future/members/45133.cc: Likewise.
* testsuite/30_threads/future/members/get.cc: Likewise.
* testsuite/30_threads/future/members/get2.cc: Likewise.
* testsuite/30_threads/future/members/share.cc: Likewise.
* testsuite/30_threads/future/members/valid.cc: Likewise.
* testsuite/30_threads/future/members/wait.cc: Likewise.
* testsuite/30_threads/future/members/wait_for.cc: Likewise.
* testsuite/30_threads/future/members/wait_until.cc: Likewise.
* testsuite/30_threads/lock/1.cc: Likewise.
* testsuite/30_threads/lock/2.cc: Likewise.
* testsuite/30_threads/lock/3.cc: Likewise.
* testsuite/30_threads/lock/4.cc: Likewise.
* testsuite/30_threads/mutex/cons/1.cc: Likewise.
* testsuite/30_threads/mutex/dest/destructor_locked.cc: Likewise.
* testsuite/30_threads/mutex/lock/1.cc: Likewise.
* testsuite/30_threads/mutex/native_handle/1.cc: Likewise.
* testsuite/30_threads/mutex/native_handle/typesizes.cc: Likewise.
* testsuite/30_threads/mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/mutex/unlock/2.cc: Likewise.
* testsuite/30_threads/packaged_task/49668.cc: Likewise.
* testsuite/30_threads/packaged_task/60564.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/1.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/2.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/3.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/56492.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/alloc.cc: Likewise.
* testsuite/30_threads/packaged_task/cons/move.cc: Likewise.
* 

Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Thomas Schwinge
Hi Frederik!

On 2019-12-10T15:23:01+0100, Frederik Harwath  wrote:
> On 09.12.19 16:58, Harwath, Frederik wrote:
>> Tobias has recently fixed a problem with the column information in gfortran 
>> locations
>> [...]
>> I have tested the patch manually by adapting the validity check for nested 
>> OpenACC reductions (see omp-low.c)
>> to include the location of clauses in warnings instead of the location of 
>> the loop to which the clause belongs.
>> I can add a regression test based on this later on after adapting the code 
>> in omp-low.c.
>
> here are patches adding the promised test for Fortran and a corresponding 
> test for C.
>
> Is it ok to include them in trunk?

Thanks, yes, with my following remarks considered, and acted on per your
preference.  To record the review effort, please include "Reviewed-by:
Thomas Schwinge " in the commit log, see
.

> Frederik Harwath (2):
>   Use clause locations in OpenACC nested reduction warnings
>   Add tests to verify OpenACC clause locations

I won't insist, but suggest (common practice) to merge that into one
patch: bug fix plus test cases, using the summary line of your first
patch.

On 2019-12-10T15:23:02+0100, Frederik Harwath  wrote:
> Since the Fortran front-end now sets the clause locations correctly, we can
> emit warnings with more precise locations if we encounter conflicting
> operations for a variable in reduction clauses.
>
> 2019-12-10  Frederik Harwath  
>
> gcc/
>   * omp-low.c (scan_omp_for): Use clause location in warning.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -2473,7 +2473,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
> tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
> if (outer_var == local_var && outer_op != local_op)
>   {
> -   warning_at (gimple_location (stmt), 0,
> +   warning_at (OMP_CLAUSE_LOCATION (local_clause), 0,
> "conflicting reduction operations for %qE",
> local_var);
> inform (OMP_CLAUSE_LOCATION (outer_clause),

Watch me think aloud: doesn't the same also apply to the other
'warning_at' added as part of "Warn about inconsistent OpenACC nested
reduction clauses", the "nested loop in reduction needs [...]" diagnotic?
Haha, of course it doesn't -- in that situation we don't have a specific
clause location, because we don't have a 'reduction' clause.  ;-)

On 2019-12-10T15:23:03+0100, Frederik Harwath  wrote:
> Check that the column information for OpenACC clauses is communicated 
> correctly
> to the middle-end, in particular by the Fortran front-end (cf. PR 92793).
>
> 2019-12-10  Frederik Harwath  
>
> gcc/testsuite/
>   * gcc.dg/goacc/clause-locations.c: New test.
>   * gfortran.dg/goacc/clause-locations.f90: New test.

It's of course always OK to add new test cases, but wouldn't the same
test coverage be reached by just adding such checking to the existing
test cases in 'c-c++-common/goacc/nested-reductions-warn.c',
'gfortran.dg/goacc/nested-reductions-warn.f90'?

That said:

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/goacc/clause-locations.c

May want to into 'c-c++-common/', so that both C and C++ will be tested.

> @@ -0,0 +1,17 @@
> +/* Verify that the location information for clauses is correct. */
> +
> +void
> +check_clause_columns() {
> +  int i, j, sum, diff;
> +
> +  #pragma acc parallel
> +  {
> +#pragma acc loop reduction(+:sum)
> +for (i = 1; i <= 10; i++)
> +  {
> +#pragma acc loop reduction(-:diff) reduction(-:sum)  /* { dg-warning 
> "53: conflicting reduction operations for .sum." } */
> + for (j = 1; j <= 10; j++)
> +   sum = 1;
> +  }
> +  }
> +}

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90
> @@ -0,0 +1,18 @@
> +! Verify that the location information for clauses is correct.
> +! See also PR 92793.
> +
> +subroutine check_clause_columns ()
> +  implicit none (type, external)
> +  integer :: i, j, sum, diff
> +
> +  !$acc parallel
> +!$acc loop reduction(+:sum)
> +do i = 1, 10
> +  !$acc loop reduction(-:diff) reduction(-:sum)  ! { dg-warning "47: 
> conflicting reduction operations for .sum." }
> +  do j = 1, 10
> +sum = 1
> +  end do
> +end do
> +  !$acc end parallel
> +end subroutine check_clause_columns
> +


Grüße
 Thomas


Re: [PATCH] Fix multibyte-related issues in pretty-print.c (PR 91843)

2019-12-10 Thread Lewis Hyatt
On Mon, Dec 9, 2019 at 4:58 PM David Malcolm  wrote:
>
> On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > This short patch addresses
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843
> > by adding the needed multibyte awareness to pretty-print.c.
> > Together with my other patch awaiting review
> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this
> > fixes all
> > issues that I am aware of regarding printing diagnostics with
> > multibyte
> > characters in UTF-8 locales. Would you please have a look and see if
> > it's OK?
> > Thanks very much.
> >
> > bootstrapped and tested on x86-64 Linux, all test results were
> > identical before
> > and after:
> > 34 XPASS
> > 109 FAIL
> > 1490 XFAIL
> > 9470 UNSUPPORTED
> > 332971 PASS
> >
> > -Lewis
>
> Patch looks good to me.
>
> Do you want SVN commit access, as per:
>   https://www.gnu.org/software/gcc/svnwrite.html
> ?
>
> I'm willing to sponsor you.
>
> Dave
>

All set with the commit access, thank you. I just wanted to double
check that this patch is OK to go in.

-Lewis


[committed] Add myself to MAINTAINERS

2019-12-10 Thread Lewis Hyatt
Thanks for the sponsorship David.

2019-12-10  Lewis Hyatt  

* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 279166)
+++ MAINTAINERS	(revision 279167)
@@ -429,6 +429,7 @@
 Andrew John Hughes
 Dominique d'Humieres
 Andy Hutchinson	
+Lewis Hyatt	
 Naveen H.S	
 Roland Illig	
 Meador Inge	


[PATCH 2/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Frederik Harwath
Check that the column information for OpenACC clauses is communicated correctly
to the middle-end, in particular by the Fortran front-end (cf. PR 92793).

2019-12-10  Frederik Harwath  

gcc/testsuite/
* gcc.dg/goacc/clause-locations.c: New test.
* gfortran.dg/goacc/clause-locations.f90: New test.
---
 gcc/testsuite/gcc.dg/goacc/clause-locations.c  | 17 +
 .../gfortran.dg/goacc/clause-locations.f90 | 18 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90

diff --git a/gcc/testsuite/gcc.dg/goacc/clause-locations.c 
b/gcc/testsuite/gcc.dg/goacc/clause-locations.c
new file mode 100644
index 000..51184e3517b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/clause-locations.c
@@ -0,0 +1,17 @@
+/* Verify that the location information for clauses is correct. */
+
+void
+check_clause_columns() {
+  int i, j, sum, diff;
+
+  #pragma acc parallel
+  {
+#pragma acc loop reduction(+:sum)
+for (i = 1; i <= 10; i++)
+  {
+#pragma acc loop reduction(-:diff) reduction(-:sum)  /* { dg-warning 
"53: conflicting reduction operations for .sum." } */
+   for (j = 1; j <= 10; j++)
+ sum = 1;
+  }
+  }
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 
b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90
new file mode 100644
index 000..29798d31542
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90
@@ -0,0 +1,18 @@
+! Verify that the location information for clauses is correct.
+! See also PR 92793.
+
+subroutine check_clause_columns ()
+  implicit none (type, external)
+  integer :: i, j, sum, diff
+
+  !$acc parallel
+!$acc loop reduction(+:sum)
+do i = 1, 10
+  !$acc loop reduction(-:diff) reduction(-:sum)  ! { dg-warning "47: 
conflicting reduction operations for .sum." }
+  do j = 1, 10
+sum = 1
+  end do
+end do
+  !$acc end parallel
+end subroutine check_clause_columns
+
-- 
2.17.1



[PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Frederik Harwath
Hi,

On 09.12.19 16:58, Harwath, Frederik wrote:
> Tobias has recently fixed a problem with the column information in gfortran 
> locations
> [...]
> I have tested the patch manually by adapting the validity check for nested 
> OpenACC reductions (see omp-low.c)
> to include the location of clauses in warnings instead of the location of the 
> loop to which the clause belongs.
> I can add a regression test based on this later on after adapting the code in 
> omp-low.c.

here are patches adding the promised test for Fortran and a corresponding test 
for C.

Is it ok to include them in trunk?

Best regards,
Frederik

Frederik Harwath (2):
  Use clause locations in OpenACC nested reduction warnings
  Add tests to verify OpenACC clause locations

 gcc/omp-low.c  |  2 +-
 gcc/testsuite/gcc.dg/goacc/clause-locations.c  | 17 +
 .../gfortran.dg/goacc/clause-locations.f90 | 18 ++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90

-- 
2.17.1



[PATCH 1/2] Use clause locations in OpenACC nested reduction warnings

2019-12-10 Thread Frederik Harwath
Since the Fortran front-end now sets the clause locations correctly, we can
emit warnings with more precise locations if we encounter conflicting
operations for a variable in reduction clauses.

2019-12-10  Frederik Harwath  

gcc/
* omp-low.c (scan_omp_for): Use clause location in warning.
---
 gcc/omp-low.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ad26f7918a5..d422c205836 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2473,7 +2473,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
  tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
  if (outer_var == local_var && outer_op != local_op)
{
- warning_at (gimple_location (stmt), 0,
+ warning_at (OMP_CLAUSE_LOCATION (local_clause), 0,
  "conflicting reduction operations for %qE",
  local_var);
  inform (OMP_CLAUSE_LOCATION (outer_clause),
-- 
2.17.1



Re: [PATCH] Bail out in gfc_dep_compare_expr for a NULL argument.

2019-12-10 Thread Tobias Burnus

Hi Martin,

On 12/10/19 2:04 PM, Martin Liška wrote:

The patch is about early bail out in gfc_dep_compare_expr function
when one of the arguments (and not both at the same time) is null.
I'm not sure, it's a proper fix however.


I think it is the best one can do. Technically, one could replace the 
comparison by
"length == 1 || "" == x(1:)" but I am not sure whether that would be be 
advantageous in general (esp. when taking implementation time into 
account); in any case, that's independent of the ICE.


The caller in this example is "are_identical_variables":

  if (gfc_dep_compare_expr (r1->u.ss.end, r2->u.ss.end) != 0)
return false;
comparing the string end.

And "-2 if the relationship could not be determined".



Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Yes; it looks good to me – thanks!

Tobias


gcc/fortran/ChangeLog:

2019-12-10  Martin Liska  

PR fortran/92874
* dependency.c (gfc_dep_compare_expr): Bail out
when one of the arguments is null.

gcc/testsuite/ChangeLog:

2019-12-10  Martin Liska  

PR fortran/92874
* gfortran.dg/pr92874.f90: New test.
---
 gcc/fortran/dependency.c  |  2 ++
 gcc/testsuite/gfortran.dg/pr92874.f90 | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr92874.f90




Re: [PATCH, Modula-2 (C/C++/D/F/Go/Jit)] (Register spec fn) (v3)

2019-12-10 Thread Matthias Klose
On 09.12.19 17:41, Gaius Mulley wrote:
> Matthias Klose  writes:
> 
>> On 17.11.19 07:49, Gaius Mulley wrote:
>>>
>>> Hello,
>>>
>>> while spending the weekend on the Howland and Baker islands :-) I
>>> thought I'd post version three of the patches which introduce Modula-2
>>> into the GCC trunk.  The patches include:
>>
>> [...]
>>
>>> At a later point (after it is reviewed/approved) the gm2 tree
>>> http://git.savannah.gnu.org/cgit/gm2.git/tree/gcc-versionno/m2/ could
>>> be included.  Together with the gm2 testsuite.
>>>
>>> But for now here are the proposed patches and ChangeLogs and new files
>>> (gm2-v3.tar.gz) (after the patches):
>>
>> I have updated my distro packaging to build gcc-10, including gm2 from the
>> trunk.  Both native and cross builds seem to work, with some glitches:
>>
>>  - For native builds, the profiled build doesn't work, failing to link
>>the gcov library. Failing that, I can't check the lto+profiled build.
>>Both the profiled and lto+profiled builds are working on your gcc-9
>>branch.
>>
>>  - For cross builds, the libgm2 libraries install as host libraries,
>>not target libraries (but are correctly built).  I sent one patch
>>to Gaius, but couldn't figure out yet, why the libs are not
>>installed as target libraries.
>>
>> The packages are publicly available in Debian experimental [1] and Ubuntu 
>> focal
>> [2], test results are sent to the gcc-testresults ML.
>>
>> Are you still aiming for inclusion in GCC 10?
>>
>> Matthias
>>
>> [1] https://tracker.debian.org/pkg/gcc-10
>> [2]
>> https://launchpad.net/~doko/+archive/ubuntu/toolchain/+sourcepub/10781708/+listing-archive-extra
> 
> Hello,
> 
> yes I'm still aiming for inclusion in GCC 10.  I'm still examining the
> target/host bugs in the libgm2 build infrastructure (and slowly going
> insane :-).  I'll also look at the lto+profiled build - great to hear
> the gcc-9/gm2 branch works.  How do the GCC 10 gm2 v3 patches look?

I didn't have to modify any of these for my gcc-10 packaging, from my point of
view these look ok. However I can't officially review any of those.

Matthias


[PATCH][vect] Keep track of DR_OFFSET advance in dr_vec_info rather than data_reference

2019-12-10 Thread Andre Vieira (lists)

Hi,

This patch aims at refactoring the vectorizer code to no longer need to 
reset DR_OFFSET for epilogue vectorization and instead keep track of 
DR_OFFSET changes per dr_vec_info and just update it as needed.  I added 
a member of type 'tree' called 'offset' to dr_vec_info, which keeps 
track of the changes to the data_reference's offset per dr_vec_info and 
thus per instance of loop vectorization.  To get the current loop's 
DR_OFFSET I introduced a function 'get_dr_vinfo_offset' which will add 
the dr_vec_info's offset to either the data_reference's innermost offset 
or the offset of the 'innermost_loop_behavior' returned by 
'vect_dr_behavior' depending on whether 'get_dr_vinfo_offset's second 
parameter 'check_outer' is true.  This is to mimic the behavior of using 
the outer loop relative 'innermost_loop_behavior' in 
'vect_create_addr_base_for_vector_ref'.


I regression tested vect.exp on aarch64 and x86_64. I also regression 
tested libgomp on aarch64 and x86_64, no changes, but there were quite a 
few test failures with the commit I based this patch on...


Is this OK for trunk or shall I wait until libgomp stabilizes?

2019-12-10  Andre Vieira  

* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use
get_dr_vinfo_offset
* tree-vect-loop.c (update_epilogue_loop_vinfo):  Remove 
orig_drs_init

parameter and its use to reset DR_OFFSET's.
(vect_transform_loop): Remove orig_drs_init argument.
* tree-vect-loop-manip.c (vect_update_init_of_dr): Update the 
offset

member of dr_vec_info rather than the offset of the associated
data_reference's innermost_loop_behavior.
(vect_update_init_of_dr): Pass dr_vec_info instead of 
data_reference.

(vect_do_peeling): Remove orig_drs_init parameter and its
construction.
* tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET
with get_dr_vinfo_offset.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index b876d07f44b9922be478fe04b9e7cb201f245b24..6745f99e3e25f690ef80433bc2b0651e4b994ba2 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4597,7 +4597,7 @@ vect_create_addr_base_for_vector_ref (stmt_vec_info stmt_info,
   innermost_loop_behavior *drb = vect_dr_behavior (dr_info);
 
   tree data_ref_base = unshare_expr (drb->base_address);
-  tree base_offset = unshare_expr (drb->offset);
+  tree base_offset = unshare_expr (get_dr_vinfo_offset (dr_info, true));
   tree init = unshare_expr (drb->init);
 
   if (loop_vinfo)
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index b4dda971b18e17733b121038e89e95fc59f09d0c..3cc76c30af50967a579164cc03c2a2cf2ba96640 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1711,19 +1711,22 @@ vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
iterations before the scalar one (using masking to skip inactive
elements).  This function updates the information recorded in DR to
account for the difference.  Specifically, it updates the OFFSET
-   field of DR.  */
+   field of DR_INFO.  */
 
 static void
-vect_update_init_of_dr (struct data_reference *dr, tree niters, tree_code code)
+vect_update_init_of_dr (dr_vec_info *dr_info, tree niters, tree_code code)
 {
-  tree offset = DR_OFFSET (dr);
+  struct data_reference *dr = dr_info->dr;
+  tree offset = dr_info->offset;
+  if (!offset)
+offset = build_zero_cst (sizetype);
 
   niters = fold_build2 (MULT_EXPR, sizetype,
 			fold_convert (sizetype, niters),
 			fold_convert (sizetype, DR_STEP (dr)));
   offset = fold_build2 (code, sizetype,
 			fold_convert (sizetype, offset), niters);
-  DR_OFFSET (dr) = offset;
+  dr_info->offset = offset;
 }
 
 
@@ -1753,7 +1756,7 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
 {
   dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
   if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
-	vect_update_init_of_dr (dr, niters, code);
+	vect_update_init_of_dr (dr_info, niters, code);
 }
 }
 
@@ -2441,7 +2444,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 		 tree *niters_vector, tree *step_vector,
 		 tree *niters_vector_mult_vf_var, int th,
 		 bool check_profitability, bool niters_no_overflow,
-		 tree *advance, drs_init_vec _drs_init)
+		 tree *advance)
 {
   edge e, guard_e;
   tree type = TREE_TYPE (niters), guard_cond;
@@ -2656,14 +2659,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  scale_loop_profile (prolog, prob_prolog, bound_prolog);
 	}
 
-  /* Save original inits for each data_reference before advancing them with
-	 NITERS_PROLOG.  */
-  unsigned int i;
-  struct data_reference *dr;
-  vec datarefs = loop_vinfo->shared->datarefs;
-  FOR_EACH_VEC_ELT (datarefs, i, dr)
-	orig_drs_init.safe_push (std::make_pair (dr, DR_OFFSET 

Re: [RFC, vectorizer] Fix ICE with masked vectors

2019-12-10 Thread Andrew Stubbs

On 09/12/2019 15:59, Richard Sandiford wrote:

No, the assumption's correct even there.  The assert usually triggers
because something elsewhere is getting confused about the vector types.


The attached patch fixes the ICE in the testcase, but I suspect does not
go far enough. Can it happen that NUNITS can be greater than the
vectorization factor, but not a multiple? Is this even a valid fix in
the first place? Must it be conditionalized on masking being available?
Is the exactness even worth checking, in the presence of exceptions?


The vector types and VF aren't chosen based on whether masking is available.
It happens the other way around: we first analyse the loop and pick the VF
for an unmasked loop, but record as we go whether a masked implementation
is also possible.  Then we decide at the end whether to use a masked
implementation instead of an unmasked one.

So if this assert triggers for masked loops, it could trigger for unmasked
loops too.


OK, I completely misunderstood what was happening here.

What happens is that it goes through and finds vector types for every 
statement, and then says "Updating vectorization factor to 4.", but 
doesn't then go back and check for suitable types.


So, then it gets to this:

 if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
   {
 poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
 ? vf * DR_GROUP_SIZE (stmt_info) : vf);
 possible_npeel_number
   = vect_get_num_vectors (nscalars, vectype);

 /* NPEEL_TMP is 0 when there is no misalignment, but also
allow peeling NELEMENTS.  */
 if (DR_MISALIGNMENT (dr_info) == 0)
   possible_npeel_number++;
   }

where "vf" is now 4, the group size appears to be 2, and "vectype" is 
V64SI, and vect_get_num_vectors blows up trying to divide 8 by 64.


If I switch back to the default cost model then the "vect" pass 
completes successfully, although vectorization fails due to a missing 
vector operator. The following "slp2" pass then switches to TImode fake 
vectors and works fine.


Alternatively, if I add back my patch then the pass completes the same 
way (without vectorization).


Any suggestions? I can't see how this stuff is supposed to work?

Andrew


[PATCH] Bail out in gfc_dep_compare_expr for a NULL argument.

2019-12-10 Thread Martin Liška

Hi.

The patch is about early bail out in gfc_dep_compare_expr function
when one of the arguments (and not both at the same time) is null.
I'm not sure, it's a proper fix however.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/fortran/ChangeLog:

2019-12-10  Martin Liska  

PR fortran/92874
* dependency.c (gfc_dep_compare_expr): Bail out
when one of the arguments is null.

gcc/testsuite/ChangeLog:

2019-12-10  Martin Liska  

PR fortran/92874
* gfortran.dg/pr92874.f90: New test.
---
 gcc/fortran/dependency.c  |  2 ++
 gcc/testsuite/gfortran.dg/pr92874.f90 | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr92874.f90


diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c
index 394d85b48a2..eb741137d60 100644
--- a/gcc/fortran/dependency.c
+++ b/gcc/fortran/dependency.c
@@ -319,6 +319,8 @@ gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2)
 
   if (e1 == NULL && e2 == NULL)
 return 0;
+  else if (e1 == NULL || e2 == NULL)
+return -2;
 
   e1 = gfc_discard_nops (e1);
   e2 = gfc_discard_nops (e2);
diff --git a/gcc/testsuite/gfortran.dg/pr92874.f90 b/gcc/testsuite/gfortran.dg/pr92874.f90
new file mode 100644
index 000..a1fca3d7718
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr92874.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-O2" }
+! PR fortran/92874
+program p
+   call s('a')
+   call s('abc')
+end
+subroutine s(x)
+   character(*) :: x
+   print *, (x(1:1) == x(1:))
+end



Re: Make dwarf2out punt for MODE_VECTOR_BOOL

2019-12-10 Thread Richard Biener
On December 10, 2019 12:44:01 PM GMT+01:00, Richard Sandiford 
 wrote:
>The dwarf2 handling of vector constants currently divides the vector
>into a length (number of elements) and byte element size.  This doesn't
>work well for MODE_VECTOR_BOOL, where several elements are packed into
>the same byte.
>
>We should probably add a way of encoding this in future, but for now
>the safest thing is to punt, like we already do for variable-length
>vectors.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>2019-12-10  Richard Sandiford  
>
>gcc/
>   * dwarf2out.c (loc_descriptor): Punt for MODE_VECTOR_BOOL.
>   (add_const_value_attribute): Likewise.
>
>gcc/testsuite/
>   * gcc.target/aarch64/sve/acle/general/debug_4.c: New test.
>
>Index: gcc/dwarf2out.c
>===
>--- gcc/dwarf2out.c2019-11-29 13:04:13.978672241 +
>+++ gcc/dwarf2out.c2019-12-10 11:43:15.560875505 +
>@@ -16763,7 +16763,12 @@ loc_descriptor (rtx rtl, machine_mode mo
>   if (mode == VOIDmode)
>   mode = GET_MODE (rtl);
> 
>-  if (mode != VOIDmode && (dwarf_version >= 4 || !dwarf_strict))
>+  if (mode != VOIDmode
>+/* The combination of a length and byte elt_size doesn't extend
>+   naturally to boolean vectors, where several elements are packed
>+   into the same byte.  */
>+&& GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL
>+&& (dwarf_version >= 4 || !dwarf_strict))
>   {
> unsigned int length;
> if (!CONST_VECTOR_NUNITS (rtl).is_constant ())
>@@ -19622,6 +19627,12 @@ add_const_value_attribute (dw_die_ref di
> return false;
> 
>   machine_mode mode = GET_MODE (rtl);
>+  /* The combination of a length and byte elt_size doesn't extend
>+ naturally to boolean vectors, where several elements are packed
>+ into the same byte.  */
>+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
>+return false;
>+
>   unsigned int elt_size = GET_MODE_UNIT_SIZE (mode);
>   unsigned char *array
> = ggc_vec_alloc (length * elt_size);
>Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/debug_4.c
>===
>--- /dev/null  2019-09-17 11:41:18.176664108 +0100
>+++
>gcc/testsuite/gcc.target/aarch64/sve/acle/general/debug_4.c2019-12-10
>11:43:15.572875421 +
>@@ -0,0 +1,16 @@
>+/* { dg-options "-O -g -msve-vector-bits=512" } */
>+
>+#include 
>+
>+void __attribute__((noipa))
>+g (volatile int *x, svbool_t pg)
>+{
>+  *x = 1;
>+}
>+
>+void
>+f (volatile int *x)
>+{
>+  svbool_t pg = svorr_z (svpfalse (), svpfalse (), svpfalse ());
>+  g (x, pg);
>+}



Re: Record the loop masks needed for EXTRACT_LAST_REDUCTIONs

2019-12-10 Thread Richard Biener
On December 10, 2019 12:34:33 PM GMT+01:00, Richard Sandiford 
 wrote:
>The analysis phase of vectorizable_condition wasn't recording the
>loop masks needed by the transform phase.  This meant that the masks
>wouldn't be created in the (rare) case that no other statement needed
>them.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>2019-12-10  Richard Sandiford  
>
>gcc/
>   * tree-vect-stmts.c (vectorizable_condition): Record the loop
>   masks required for extract-last reductions.
>
>gcc/testsuite/
>   * gcc.target/aarch64/sve/clastb_9.c: New test.
>
>Index: gcc/tree-vect-stmts.c
>===
>--- gcc/tree-vect-stmts.c  2019-12-10 11:33:27.136893420 +
>+++ gcc/tree-vect-stmts.c  2019-12-10 11:33:34.084845950 +
>@@ -9912,6 +9912,7 @@ vectorizable_condition (stmt_vec_info st
>vect_unknown_def_type, vect_unknown_def_type};
>   int ndts = 4;
>   int ncopies;
>+  int vec_num;
>  enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>   stmt_vec_info prev_stmt_info = NULL;
>   int i, j;
>@@ -9969,9 +9970,15 @@ vectorizable_condition (stmt_vec_info st
>   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> 
>   if (slp_node)
>-ncopies = 1;
>+{
>+  ncopies = 1;
>+  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
>+}
>   else
>-ncopies = vect_get_num_copies (loop_vinfo, vectype);
>+{
>+  ncopies = vect_get_num_copies (loop_vinfo, vectype);
>+  vec_num = 1;
>+}
> 
>   gcc_assert (ncopies >= 1);
>   if (for_reduction && ncopies > 1)
>@@ -10094,6 +10101,12 @@ vectorizable_condition (stmt_vec_info st
>   }
>   }
> 
>+  if (loop_vinfo
>+&& LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
>+&& reduction_type == EXTRACT_LAST_REDUCTION)
>+  vect_record_loop_mask (loop_vinfo, _VINFO_MASKS (loop_vinfo),
>+ ncopies * vec_num, vectype, NULL);
>+
>   vect_cost_for_stmt kind = vector_stmt;
>   if (reduction_type == EXTRACT_LAST_REDUCTION)
>   /* Count one reduction-like operation per vector.  */
>Index: gcc/testsuite/gcc.target/aarch64/sve/clastb_9.c
>===
>--- /dev/null  2019-09-17 11:41:18.176664108 +0100
>+++ gcc/testsuite/gcc.target/aarch64/sve/clastb_9.c2019-12-10
>11:33:34.080845979 +
>@@ -0,0 +1,21 @@
>+/* Originally gcc.dg/vect/O1-pr41008.c.  */
>+/* { dg-options "-O1 -ftree-vectorize -fno-vect-cost-model
>-msve-vector-bits=256" } */
>+
>+double heating[2][2];
>+
>+void foo (int, int);
>+
>+void map_do()
>+{
>+  int jsav, ksav, k, j;
>+
>+  for(k = 0; k < 2; k++)
>+for(j = 0; j < 2; j++)
>+  if (heating[k][j] > 0.)
>+{
>+  jsav = j;
>+  ksav = k;
>+}
>+
>+  foo (jsav, ksav);
>+}



Re: Add missing conversion in vect_create_epilog_for_reduction

2019-12-10 Thread Richard Biener
On December 10, 2019 12:39:53 PM GMT+01:00, Richard Sandiford 
 wrote:
>The direct_slp_reduc code in vect_create_epilog_for_reduction was
>still assuming that all types involved in a reduction are the same
>(up to types_compatible_p), whereas we now support differences in
>sign.  This was causing an ICE in gcc.dg/vect/pr92324-4.c for SVE.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>2019-12-10  Richard Sandiford  
>
>gcc/
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): When
>   handling direct_slp_reduc, allow the PHI arguments to have
>   a different type from the vector elements.
>
>Index: gcc/tree-vect-loop.c
>===
>--- gcc/tree-vect-loop.c   2019-12-10 11:30:23.506138368 +
>+++ gcc/tree-vect-loop.c   2019-12-10 11:38:37.322774821 +
>@@ -5054,6 +5054,8 @@ vect_create_epilog_for_reduction (stmt_v
> tree scalar_value
>   = PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt,
>loop_preheader_edge (loop));
>+scalar_value = gimple_convert (, TREE_TYPE (vectype),
>+   scalar_value);
> vector_identity = gimple_build_vector_from_val (, vectype,
> scalar_value);
>   }



Re: Fix EXTRACT_LAST_REDUCTION handling of pattern stmts

2019-12-10 Thread Richard Biener
On December 10, 2019 12:28:24 PM GMT+01:00, Richard Sandiford 
 wrote:
>Unlike most vector ops, extract-last reductions replace the original
>scalar code in-situ rather than adding an adjacent vector
>implementation.
>I.e.:
>
>   dest_1 = COND_EXPR <...>;
>
>becomes:
>
>   dest_1 = .EXTRACT_LAST (...);
>
>gcc.dg/vect/vect-cond-reduc-4.c was ICEing for SVE because we tried
>to replace the pattern statement in this way, rather than replacing
>the original scalar statement.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>2019-12-10  Richard Sandiford  
>
>gcc/
>   * tree-vect-stmts.c (vect_finish_replace_stmt): Always use the
>   original scalar statement rather than a pattern statement.
>   (vectorizable_condition): Likewise, in the handling of extract-last
>   reductions.
>
>Index: gcc/tree-vect-stmts.c
>===
>--- gcc/tree-vect-stmts.c  2019-12-10 09:17:18.768002919 +
>+++ gcc/tree-vect-stmts.c  2019-12-10 11:27:23.619355131 +
>@@ -1779,9 +1779,10 @@ vect_finish_stmt_generation_1 (stmt_vec_
> stmt_vec_info
> vect_finish_replace_stmt (stmt_vec_info stmt_info, gimple *vec_stmt)
> {
>-  gcc_assert (gimple_get_lhs (stmt_info->stmt) == gimple_get_lhs
>(vec_stmt));
>+  gimple *scalar_stmt = vect_orig_stmt (stmt_info)->stmt;
>+  gcc_assert (gimple_get_lhs (scalar_stmt) == gimple_get_lhs
>(vec_stmt));
> 
>-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt_info->stmt);
>+  gimple_stmt_iterator gsi = gsi_for_stmt (scalar_stmt);
>   gsi_replace (, vec_stmt, true);
> 
>   return vect_finish_stmt_generation_1 (stmt_info, vec_stmt);
>@@ -10324,20 +10325,21 @@ vectorizable_condition (stmt_vec_info st
> 
> if (reduction_type == EXTRACT_LAST_REDUCTION)
>   {
>+gimple *old_stmt = vect_orig_stmt (stmt_info)->stmt;
>+tree lhs = gimple_get_lhs (old_stmt);
> gcall *new_stmt = gimple_build_call_internal
>   (IFN_FOLD_EXTRACT_LAST, 3, else_clause, vec_compare,
>vec_then_clause);
>-gimple_call_set_lhs (new_stmt, scalar_dest);
>-SSA_NAME_DEF_STMT (scalar_dest) = new_stmt;
>-if (stmt_info->stmt == gsi_stmt (*gsi))
>+gimple_call_set_lhs (new_stmt, lhs);
>+SSA_NAME_DEF_STMT (lhs) = new_stmt;
>+if (old_stmt == gsi_stmt (*gsi))
>   new_stmt_info = vect_finish_replace_stmt (stmt_info, new_stmt);
> else
>   {
> /* In this case we're moving the definition to later in the
>block.  That doesn't matter because the only uses of the
>lhs are in phi statements.  */
>-gimple_stmt_iterator old_gsi
>-  = gsi_for_stmt (stmt_info->stmt);
>+gimple_stmt_iterator old_gsi = gsi_for_stmt (old_stmt);
> gsi_remove (_gsi, true);
> new_stmt_info
>   = vect_finish_stmt_generation (stmt_info, new_stmt, gsi);



Make dwarf2out punt for MODE_VECTOR_BOOL

2019-12-10 Thread Richard Sandiford
The dwarf2 handling of vector constants currently divides the vector
into a length (number of elements) and byte element size.  This doesn't
work well for MODE_VECTOR_BOOL, where several elements are packed into
the same byte.

We should probably add a way of encoding this in future, but for now
the safest thing is to punt, like we already do for variable-length
vectors.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-10  Richard Sandiford  

gcc/
* dwarf2out.c (loc_descriptor): Punt for MODE_VECTOR_BOOL.
(add_const_value_attribute): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/debug_4.c: New test.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c 2019-11-29 13:04:13.978672241 +
+++ gcc/dwarf2out.c 2019-12-10 11:43:15.560875505 +
@@ -16763,7 +16763,12 @@ loc_descriptor (rtx rtl, machine_mode mo
   if (mode == VOIDmode)
mode = GET_MODE (rtl);
 
-  if (mode != VOIDmode && (dwarf_version >= 4 || !dwarf_strict))
+  if (mode != VOIDmode
+ /* The combination of a length and byte elt_size doesn't extend
+naturally to boolean vectors, where several elements are packed
+into the same byte.  */
+ && GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL
+ && (dwarf_version >= 4 || !dwarf_strict))
{
  unsigned int length;
  if (!CONST_VECTOR_NUNITS (rtl).is_constant ())
@@ -19622,6 +19627,12 @@ add_const_value_attribute (dw_die_ref di
  return false;
 
machine_mode mode = GET_MODE (rtl);
+   /* The combination of a length and byte elt_size doesn't extend
+  naturally to boolean vectors, where several elements are packed
+  into the same byte.  */
+   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+ return false;
+
unsigned int elt_size = GET_MODE_UNIT_SIZE (mode);
unsigned char *array
  = ggc_vec_alloc (length * elt_size);
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/debug_4.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/debug_4.c 2019-12-10 
11:43:15.572875421 +
@@ -0,0 +1,16 @@
+/* { dg-options "-O -g -msve-vector-bits=512" } */
+
+#include 
+
+void __attribute__((noipa))
+g (volatile int *x, svbool_t pg)
+{
+  *x = 1;
+}
+
+void
+f (volatile int *x)
+{
+  svbool_t pg = svorr_z (svpfalse (), svpfalse (), svpfalse ());
+  g (x, pg);
+}


Add missing conversion in vect_create_epilog_for_reduction

2019-12-10 Thread Richard Sandiford
The direct_slp_reduc code in vect_create_epilog_for_reduction was
still assuming that all types involved in a reduction are the same
(up to types_compatible_p), whereas we now support differences in
sign.  This was causing an ICE in gcc.dg/vect/pr92324-4.c for SVE.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-10  Richard Sandiford  

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): When
handling direct_slp_reduc, allow the PHI arguments to have
a different type from the vector elements.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-12-10 11:30:23.506138368 +
+++ gcc/tree-vect-loop.c2019-12-10 11:38:37.322774821 +
@@ -5054,6 +5054,8 @@ vect_create_epilog_for_reduction (stmt_v
  tree scalar_value
= PHI_ARG_DEF_FROM_EDGE (orig_phis[i]->stmt,
 loop_preheader_edge (loop));
+ scalar_value = gimple_convert (, TREE_TYPE (vectype),
+scalar_value);
  vector_identity = gimple_build_vector_from_val (, vectype,
  scalar_value);
}


Record the loop masks needed for EXTRACT_LAST_REDUCTIONs

2019-12-10 Thread Richard Sandiford
The analysis phase of vectorizable_condition wasn't recording the
loop masks needed by the transform phase.  This meant that the masks
wouldn't be created in the (rare) case that no other statement needed
them.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-10  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_condition): Record the loop
masks required for extract-last reductions.

gcc/testsuite/
* gcc.target/aarch64/sve/clastb_9.c: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-12-10 11:33:27.136893420 +
+++ gcc/tree-vect-stmts.c   2019-12-10 11:33:34.084845950 +
@@ -9912,6 +9912,7 @@ vectorizable_condition (stmt_vec_info st
vect_unknown_def_type, vect_unknown_def_type};
   int ndts = 4;
   int ncopies;
+  int vec_num;
   enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   stmt_vec_info prev_stmt_info = NULL;
   int i, j;
@@ -9969,9 +9970,15 @@ vectorizable_condition (stmt_vec_info st
   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
 
   if (slp_node)
-ncopies = 1;
+{
+  ncopies = 1;
+  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+}
   else
-ncopies = vect_get_num_copies (loop_vinfo, vectype);
+{
+  ncopies = vect_get_num_copies (loop_vinfo, vectype);
+  vec_num = 1;
+}
 
   gcc_assert (ncopies >= 1);
   if (for_reduction && ncopies > 1)
@@ -10094,6 +10101,12 @@ vectorizable_condition (stmt_vec_info st
}
}
 
+  if (loop_vinfo
+ && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
+ && reduction_type == EXTRACT_LAST_REDUCTION)
+   vect_record_loop_mask (loop_vinfo, _VINFO_MASKS (loop_vinfo),
+  ncopies * vec_num, vectype, NULL);
+
   vect_cost_for_stmt kind = vector_stmt;
   if (reduction_type == EXTRACT_LAST_REDUCTION)
/* Count one reduction-like operation per vector.  */
Index: gcc/testsuite/gcc.target/aarch64/sve/clastb_9.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/clastb_9.c 2019-12-10 
11:33:34.080845979 +
@@ -0,0 +1,21 @@
+/* Originally gcc.dg/vect/O1-pr41008.c.  */
+/* { dg-options "-O1 -ftree-vectorize -fno-vect-cost-model 
-msve-vector-bits=256" } */
+
+double heating[2][2];
+
+void foo (int, int);
+
+void map_do()
+{
+  int jsav, ksav, k, j;
+
+  for(k = 0; k < 2; k++)
+for(j = 0; j < 2; j++)
+  if (heating[k][j] > 0.)
+{
+  jsav = j;
+  ksav = k;
+}
+
+  foo (jsav, ksav);
+}


[committed] Disallow EXTRACT_LAST_REDUCTION for reduction chains

2019-12-10 Thread Richard Sandiford
gcc.dg/vect/vect-cond-reduc-5.c was ICEing for SVE because we
tried to use an extract-last reduction for a chain of COND_EXPRs.
Adding support for the chained case would be too invasive for stage 3
so this patch explicitly forbids it instead.  I've filed PR92884 for
the possible future work.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, applied as obvious.

Richard


2019-12-10  Richard Sandiford  

gcc/
* tree-vect-loop.c (vectorizable_reduction): Don't use
EXTRACT_LAST_REDUCTION for chained reductions.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-12-06 10:31:12.558424757 +
+++ gcc/tree-vect-loop.c2019-12-10 11:30:16.014189041 +
@@ -6196,8 +6196,9 @@ vectorizable_reduction (stmt_vec_info st
  return false;
}
 
-  if (direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST,
- vectype_in, OPTIMIZE_FOR_SPEED))
+  if (reduc_chain_length == 1
+ && direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST,
+vectype_in, OPTIMIZE_FOR_SPEED))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,


Fix EXTRACT_LAST_REDUCTION handling of pattern stmts

2019-12-10 Thread Richard Sandiford
Unlike most vector ops, extract-last reductions replace the original
scalar code in-situ rather than adding an adjacent vector implementation.
I.e.:

   dest_1 = COND_EXPR <...>;

becomes:

   dest_1 = .EXTRACT_LAST (...);

gcc.dg/vect/vect-cond-reduc-4.c was ICEing for SVE because we tried
to replace the pattern statement in this way, rather than replacing
the original scalar statement.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-10  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vect_finish_replace_stmt): Always use the
original scalar statement rather than a pattern statement.
(vectorizable_condition): Likewise, in the handling of extract-last
reductions.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-12-10 09:17:18.768002919 +
+++ gcc/tree-vect-stmts.c   2019-12-10 11:27:23.619355131 +
@@ -1779,9 +1779,10 @@ vect_finish_stmt_generation_1 (stmt_vec_
 stmt_vec_info
 vect_finish_replace_stmt (stmt_vec_info stmt_info, gimple *vec_stmt)
 {
-  gcc_assert (gimple_get_lhs (stmt_info->stmt) == gimple_get_lhs (vec_stmt));
+  gimple *scalar_stmt = vect_orig_stmt (stmt_info)->stmt;
+  gcc_assert (gimple_get_lhs (scalar_stmt) == gimple_get_lhs (vec_stmt));
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt_info->stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (scalar_stmt);
   gsi_replace (, vec_stmt, true);
 
   return vect_finish_stmt_generation_1 (stmt_info, vec_stmt);
@@ -10324,20 +10325,21 @@ vectorizable_condition (stmt_vec_info st
 
  if (reduction_type == EXTRACT_LAST_REDUCTION)
{
+ gimple *old_stmt = vect_orig_stmt (stmt_info)->stmt;
+ tree lhs = gimple_get_lhs (old_stmt);
  gcall *new_stmt = gimple_build_call_internal
(IFN_FOLD_EXTRACT_LAST, 3, else_clause, vec_compare,
 vec_then_clause);
- gimple_call_set_lhs (new_stmt, scalar_dest);
- SSA_NAME_DEF_STMT (scalar_dest) = new_stmt;
- if (stmt_info->stmt == gsi_stmt (*gsi))
+ gimple_call_set_lhs (new_stmt, lhs);
+ SSA_NAME_DEF_STMT (lhs) = new_stmt;
+ if (old_stmt == gsi_stmt (*gsi))
new_stmt_info = vect_finish_replace_stmt (stmt_info, new_stmt);
  else
{
  /* In this case we're moving the definition to later in the
 block.  That doesn't matter because the only uses of the
 lhs are in phi statements.  */
- gimple_stmt_iterator old_gsi
-   = gsi_for_stmt (stmt_info->stmt);
+ gimple_stmt_iterator old_gsi = gsi_for_stmt (old_stmt);
  gsi_remove (_gsi, true);
  new_stmt_info
= vect_finish_stmt_generation (stmt_info, new_stmt, gsi);


Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-10 Thread Jan Hubicka
> Hi,
> 
> On Tue, Dec 10 2019, Jan Hubicka wrote:
> > Hi,
> > I think the updating should treat self recursive edges as loops: that is
> > calculate SUM of counts incomming edges which are not self recursive,
> > calculate probability of self recursing and then set count as
> > SUM * (1/(1-probability_of_recursion))
> > we do same thing when computing counts withing loops.
> >
> > Martin, I woner how updating works for indirectly self recursive edges?
> 
> It does not special-case them in any way, if that was the question.

Well, it should, summing only frequencies of edges entering the SCC is
not going to give a good answer.
If SCC is having single entry and forms a loop the generalization of
algorithm above will work, but for general graphs this is hard problem.

I suppose we could special case self recurisve nodes (since they are
most common) and apply some fixed scale for other SCCs?

Also i wonder if the updating is done in RPO so we do not account edges
not updated yet?

Honza

> 
> Martin


Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-10 Thread Sudakshina Das
Hi Christophe

On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
> 
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das  wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote:
 Hi

 While looking at the vectorization for following example, we
 realized
 that even though vectorizable_shift function was distinguishing
 vector
 shifted by vector from vector shifted by scalar, while modeling the
 cost
 it would always add the cost of building a vector constant despite
 not
 needing it for vector shifted by scalar.

 This patch fixes this by using scalar_shift_arg to determine whether
 we
 need to build a vector for the second operand or not. This reduces
 prologue cost as shown in the test.

 Build and regression tests pass on aarch64-none-elf and
 x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
 Spec2017 for AArch64.

> 
> Looks like you didn't check on arm, where I can see that the new testcase 
> fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
> 
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> 
> Christophe

Thanks for reporting this. There is already a bugzilla report PR92870 
for powerpc that I am looking at. Apologies I couldn't find your email 
address there to add you to the cc list.

Thanks
Sudi

> 
 gcc/ChangeLog:

 2019-xx-xx  Sudakshina Das  
   Richard Sandiford  

   * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
   vect_model_simple_cost call on scalar_shift_arg.

 gcc/testsuite/ChangeLog:

 2019-xx-xx  Sudakshina Das  

   * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>



[patch, committed, Fortran] PR 92872 – Fix get_CFI_desc

2019-12-10 Thread Tobias Burnus
Fix the condition; the DECL_LANG_SPECIFIC can be used for much more than 
just to save the decl.


Committed as Rev. 279160.

Tobias

2019-12-10  Tobias Burnus  

	PR fortran/92872
	* trans-array.c (get_CFI_desc): Fix cond whether saved desc exists.

	PR fortran/92872
	* gfortran.dg/bind_c_optional-1.f90: New.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5c27c065ff0..1b779988616 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -882,7 +882,7 @@ get_CFI_desc (gfc_symbol *sym, gfc_expr *expr,
   else
 tmp = sym->backend_decl;
 
-  if (tmp && DECL_LANG_SPECIFIC (tmp))
+  if (tmp && DECL_LANG_SPECIFIC (tmp) && GFC_DECL_SAVED_DESCRIPTOR (tmp))
 tmp = GFC_DECL_SAVED_DESCRIPTOR (tmp);
 
   *desc = tmp;
diff --git a/gcc/testsuite/gfortran.dg/bind_c_optional-1.f90 b/gcc/testsuite/gfortran.dg/bind_c_optional-1.f90
new file mode 100644
index 000..99409205b6f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bind_c_optional-1.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+!
+! PR fortran/92872
+!
+! Contributed by G. Steinmetz
+!
+module m
+contains
+subroutine s(x) bind(c)
+   integer, allocatable, optional :: x(:)
+   x = [1, 2, 3]
+end
+end
+
+use m
+integer, allocatable :: y(:)
+! NOTE: starting at 0, otherwise it will fail due to PR 92189
+allocate(y(0:2))
+y = [9, 8, 7]
+call s(y)
+if (any (y /= [1, 2, 3])) stop 1
+end


Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Jakub Jelinek
On Tue, Dec 10, 2019 at 11:02:39AM +0100, Andreas Schwab wrote:
> On Dez 10 2019, Jakub Jelinek wrote:
> 
> > --- gcc/testsuite/gcc.target/i386/pr92841.c.jj  2019-12-09 
> > 19:38:29.572759215 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr92841.c 2019-12-09 19:40:59.642492417 
> > +0100
> > @@ -0,0 +1,17 @@
> > +/* PR target/92841 */
> > +/* { dg-do compile { target fstack_protector } } */
> > +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> > +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), 
> > %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> > +
> > +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> > +void bar (int *);
> > +
> > +void
> > +foo (void)
> > +{
> > +  int e[4];
> > +  const struct S *a;
> > +  for (a = c; a < c + sizeof (c); a++)
> > +if (a->b)
> 
> This accesses c beyond bounds.  Does that invalidate the test?

Thanks for noticing, changed into
  for (a = c; a < c + sizeof (c) / sizeof (c[0]); a++)
in my copy, the testcase still FAILs before the patch and PASSes with it.

Jakub



Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Andreas Schwab
On Dez 10 2019, Jakub Jelinek wrote:

> --- gcc/testsuite/gcc.target/i386/pr92841.c.jj2019-12-09 
> 19:38:29.572759215 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92841.c   2019-12-09 19:40:59.642492417 
> +0100
> @@ -0,0 +1,17 @@
> +/* PR target/92841 */
> +/* { dg-do compile { target fstack_protector } } */
> +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), 
> %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> +
> +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> +void bar (int *);
> +
> +void
> +foo (void)
> +{
> +  int e[4];
> +  const struct S *a;
> +  for (a = c; a < c + sizeof (c); a++)
> +if (a->b)

This accesses c beyond bounds.  Does that invalidate the test?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Jakub Jelinek
Hi!

The stack_protect_set_1_ pattern intentionally clears the register it
used as a temporary to read the canary from the register and push it back
on the stack for security reasons, to make sure the stack canary isn't
spilled somewhere else etc.  On the following testcase, we end up with a
weird:
movq%fs:40, %rax
movq%rax, 24(%rsp)
xorl%eax, %eax
movl$30, %eax
sequence though, where the reporter rightfully complains it is a waste
to clear the register and immediately set it to something else.

We really don't want to split this into two patterns, because then the
scheduler and whatever other post-RA passes could stick some further
code in between and increase the lifetime of the security sensitive
data in the register.

So, the following patch uses peephole2 to merge the
stack_protect_set_1_ insn with following setter of the same register.
Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
(unless actually emitted as SImode) moves don't overwrite the whole
register, and for simplicity only the most common cases (no XMM/MM etc.
sources).

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
(just check-gcc check-c++-all check-target-libstdc++-v3 with
--target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
testing was done with additional logging whenever the peephole2's kick in.
During the -m64 testing, it kicked in 17682 times, during -m32 testing
6344 times.  Ok for trunk?

2019-12-10  Jakub Jelinek  

PR target/92841
* config/i386/i386.md (*stack_protect_set_2_,
*stack_protect_set_3): New define_insns and corresponding
define_peephole2s.

* gcc.target/i386/pr92841.c: New test.

--- gcc/config/i386/i386.md.jj  2019-12-05 10:04:05.35723 +0100
+++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100
@@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, 
%2}\;xor{l}\t%k2, %k2"
   [(set_attr "type" "multi")])
 
+;; Patterns and peephole2s to optimize stack_protect_set_1_
+;; immediately followed by *mov{s,d}i_internal to the same register,
+;; where we can avoid the xor{l} above.  We don't split this, so that
+;; scheduling or anything else doesn't separate the *stack_protect_set*
+;; pattern from the set of the register that overwrites the register
+;; with a new value.
+(define_insn "*stack_protect_set_2_"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
+   UNSPEC_SP_SET))
+   (set (match_operand:SI 1 "register_operand" "=")
+   (match_operand:SI 2 "general_operand" "g"))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands);
+  if (pic_32bit_operand (operands[2], SImode)
+  || ix86_use_lea_for_mov (insn, operands + 1))
+return "lea{l}\t{%E2, %1|%1, %E2}";
+  else
+return "mov{l}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+   (set_attr "length" "24")])
+
+(define_peephole2
+ [(parallel [(set (match_operand:PTR 0 "memory_operand")
+ (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
+ UNSPEC_SP_SET))
+(set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
+(clobber (reg:CC FLAGS_REG))])
+  (set (match_operand:SI 3 "general_reg_operand")
+   (match_operand:SI 4 "general_operand"))]
+ "reload_completed
+  && REGNO (operands[2]) == REGNO (operands[3])
+  && !reg_overlap_mentioned_p (operands[3], operands[4])
+  && (general_reg_operand (operands[4], SImode)
+  || !register_operand (operands[4], SImode))"
+ [(parallel [(set (match_dup 0)
+ (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
+(set (match_dup 3) (match_dup 4))
+(clobber (reg:CC FLAGS_REG))])])
+
+(define_insn "*stack_protect_set_3"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
+   (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
+  UNSPEC_SP_SET))
+   (set (match_operand:DI 1 "register_operand" "=,r,r")
+   (match_operand:DI 2 "general_operand" "Z,rem,i"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT
+   && reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
+  if (which_alternative == 0)
+return "mov{l}\t{%k2, %k1|%k1, %k2}";
+  else if (which_alternative == 2)
+return "movabs{q}\t{%2, %1|%1, %2}";
+  else if (pic_32bit_operand (operands[2], DImode)
+  || ix86_use_lea_for_mov (insn, operands + 1))
+return "lea{q}\t{%E2, %1|%1, %E2}";
+  else
+return "mov{q}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+   

[PATCH] Improve -fstack-protector-strong (PR middle-end/92825)

2019-12-10 Thread Jakub Jelinek
Hi!

As mentioned in the PR, -fstack-protector-strong in some cases instruments
even functions which except for the stack canary setup and later testing of
that don't touch the stack at all.

This is because for -fstack-protector-strong we setup the canary
whenever stack_protect_decl_p returns true, and that goes through all local
decls and if it is a var that is not a global var and is either address
taken or is array or has arrays embedded in its type, returns true.
Now, the problem is that by going through all non-global variables, it goes
even through variables that are or have arrays, but we allocate them in
registers like in the testcase below, or in theory for nested functions
it could be variables from the parent function too.
Variables which live in registers, even when they have arrays in them,
shouldn't really cause stack overflows of any kind, out of bounds accesses
to them never cause out of bounds stack accesses, usually they ought to be
already optimized away, arrays that are accessed through non-constant
indexes should cause variables to live in memory.

Another issue is that the record_or_union_type_has_array_p function duplicates
the behavior of stack_protect_classify_type when it returns the SPCT_HAS_ARRAY
bit set.

Initially, I thought I'd just move the stack_protect_decl_p call later when
the stack_vars array is computed and instead of FOR_EACH_LOCAL_DECL
walk the stack_vars array.  But for the discovery of stack_vars that are or
have arrays that is a waste of time, all such variables are already picked
for phase 1 or phase 2 and thus has_protected_decls bool will be set if
there are any and we already do create the stack canary if
has_protected_decls is set.  So, the only thing that remains is catch
variables that aren't or don't contain arrays, but are address taken.  Those
go into the last unnumbered phase, but we still want to create stack canary.
Instead of adding yet another walk I've done that in a function that already
walks them.

In addition to this, I've tried to clarify this in the documentation.  For
-fstack-protector, we've been doing it this way already before, optimized
away or arrays living in registers didn't count, because we only walked
stack-vars.  And, the documentation also said that only > 8 byte arrays are
protected with -stack-protector, but it is actually >= 8 byte (or whatever
the ssp-buffer-size param is).

Bootstrapped/regtested on x86_64-linux and i686-linux, in addition regtested
(just check-gcc, check-c++-all and check-target-libstdc++-v3) with
--target_board=unix/-fstack-protector-strong on both.
Ok for trunk?

2019-12-10  Jakub Jelinek  

PR middle-end/92825
* cfgexpand.c (add_stack_protection_conflicts): Change return type
from void to bool, return true if at least one stack_vars[i].decl
is addressable.
(record_or_union_type_has_array_p, stack_protect_decl_p): Remove.
(expand_used_vars): Don't call stack_protect_decl_p, instead for
-fstack-protector-strong set gen_stack_protect_signal to true
if add_stack_protection_conflicts returned true.  Formatting fixes.
* doc/invoke.texi (-fstack-protector-strong): Clarify that optimized
out variables or variables not living on the stack don't count.
(-fstack-protector): Likewise.  Clarify it affects >= 8 byte arrays
rather than > 8 byte.

* gcc.target/i386/pr92825.c: New test.

--- gcc/cfgexpand.c.jj  2019-12-06 09:51:56.917899249 +0100
+++ gcc/cfgexpand.c 2019-12-09 13:29:16.518031095 +0100
@@ -1888,17 +1888,23 @@ asan_decl_phase_3 (size_t i)
 }
 
 /* Ensure that variables in different stack protection phases conflict
-   so that they are not merged and share the same stack slot.  */
+   so that they are not merged and share the same stack slot.
+   Return true if there are any address taken variables.  */
 
-static void
+static bool
 add_stack_protection_conflicts (void)
 {
   size_t i, j, n = stack_vars_num;
   unsigned char *phase;
+  bool ret = false;
 
   phase = XNEWVEC (unsigned char, n);
   for (i = 0; i < n; ++i)
-phase[i] = stack_protect_decl_phase (stack_vars[i].decl);
+{
+  phase[i] = stack_protect_decl_phase (stack_vars[i].decl);
+  if (TREE_ADDRESSABLE (stack_vars[i].decl))
+   ret = true;
+}
 
   for (i = 0; i < n; ++i)
 {
@@ -1909,6 +1915,7 @@ add_stack_protection_conflicts (void)
 }
 
   XDELETEVEC (phase);
+  return ret;
 }
 
 /* Create a decl for the guard at the top of the stack frame.  */
@@ -1993,50 +2000,6 @@ estimated_stack_frame_size (struct cgrap
   return estimated_poly_value (size);
 }
 
-/* Helper routine to check if a record or union contains an array field. */
-
-static int
-record_or_union_type_has_array_p (const_tree tree_type)
-{
-  tree fields = TYPE_FIELDS (tree_type);
-  tree f;
-
-  for (f = fields; f; f = DECL_CHAIN (f))
-if (TREE_CODE (f) == FIELD_DECL)
-  {
-   tree field_type = TREE_TYPE (f);
-   if 

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-10 Thread Martin Jambor
Hi,

On Tue, Dec 10 2019, Jan Hubicka wrote:
> Hi,
> I think the updating should treat self recursive edges as loops: that is
> calculate SUM of counts incomming edges which are not self recursive,
> calculate probability of self recursing and then set count as
> SUM * (1/(1-probability_of_recursion))
> we do same thing when computing counts withing loops.
>
> Martin, I woner how updating works for indirectly self recursive edges?

It does not special-case them in any way, if that was the question.

Martin


RE: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float comparisons

2019-12-10 Thread Claudiu Zissulescu
Hi,

Thank you for your contribution, I'll push it asap. As far as I understand, you 
need this patch both in gcc9 branch and mainline.

Cheers,
Claudiu

> -Original Message-
> From: Vineet Gupta [mailto:vgu...@synopsys.com]
> Sent: Monday, December 09, 2019 8:02 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ;
> andrew.burg...@embecosm.com; linux-snps-...@lists.infradead.org;
> Vineet Gupta 
> Subject: [PATCH] PR 92846: [ARC] generate signaling FDCMPF for hard float
> comparisons
> 
> ARC gcc generates FDCMP instructions which raises Invalid operation for
> signaling NaN only. This causes glibc iseqsig() primitives to fail (in
> the current ongoing glibc port to ARC)
> 
> So split up the hard float compares into two categories and for unordered
> compares generate the FDCMPF instruction (vs. FDCMP) which raises
> exception
> for either NaNs.
> 
> With this fix testsuite/gcc.dg/torture/pr52451.c passes for ARC.
> 
> Also passes 6 additional tests in glibc testsuite (test*iseqsig) and no
> regressions
> 
> gcc/
> -xx-xx  Vineet Gupta  
> 
>   * config/arc/arc-modes.def (CC_FPUE): New Mode CC_FPUE which
>   helps codegen generate exceptions even for quiet NaN.
>   * config/arc/arc.c (arc_init_reg_tables): Handle New CC_FPUE mode.
>   (get_arc_condition_code): Likewise.
>   (arc_select_cc_mode): LT, LE, GT, GE to use the New CC_FPUE
> mode.
>   * config/arc/arc.h (REVERSE_CONDITION): Handle New CC_FPUE
> mode.
>   * config/arc/predicates.md (proper_comparison_operator):
> Likewise.
>   * config/arc/fpu.md (cmpsf_fpu_trap): New Pattern for CC_FPUE.
>   (cmpdf_fpu_trap): Likewise.
> 
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/arc/arc-modes.def |  1 +
>  gcc/config/arc/arc.c |  8 ++--
>  gcc/config/arc/arc.h |  2 +-
>  gcc/config/arc/fpu.md| 24 
>  gcc/config/arc/predicates.md |  1 +
>  5 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
> index 36a2f4abfb25..d16b6a289a15 100644
> --- a/gcc/config/arc/arc-modes.def
> +++ b/gcc/config/arc/arc-modes.def
> @@ -38,4 +38,5 @@ VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI
> */
> 
>  /* FPU condition flags.  */
>  CC_MODE (CC_FPU);
> +CC_MODE (CC_FPUE);
>  CC_MODE (CC_FPU_UNEQ);
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 28305f459dcd..cbb95d6e9043 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -1564,6 +1564,7 @@ get_arc_condition_code (rtx comparison)
>   default : gcc_unreachable ();
>   }
>  case E_CC_FPUmode:
> +case E_CC_FPUEmode:
>switch (GET_CODE (comparison))
>   {
>   case EQ: return ARC_CC_EQ;
> @@ -1686,11 +1687,13 @@ arc_select_cc_mode (enum rtx_code op, rtx x,
> rtx y)
>case UNLE:
>case UNGT:
>case UNGE:
> + return CC_FPUmode;
> +
>case LT:
>case LE:
>case GT:
>case GE:
> - return CC_FPUmode;
> + return CC_FPUEmode;
> 
>case LTGT:
>case UNEQ:
> @@ -1844,7 +1847,7 @@ arc_init_reg_tables (void)
> if (i == (int) CCmode || i == (int) CC_ZNmode || i == (int) CC_Zmode
> || i == (int) CC_Cmode
> || i == CC_FP_GTmode || i == CC_FP_GEmode || i ==
> CC_FP_ORDmode
> -   || i == CC_FPUmode || i == CC_FPU_UNEQmode)
> +   || i == CC_FPUmode || i == CC_FPUEmode || i ==
> CC_FPU_UNEQmode)
>   arc_mode_class[i] = 1 << (int) C_MODE;
> else
>   arc_mode_class[i] = 0;
> @@ -8401,6 +8404,7 @@ arc_reorg (void)
> 
> /* Avoid FPU instructions.  */
> if ((GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUmode)
> +   || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) == CC_FPUEmode)
> || (GET_MODE (XEXP (XEXP (pc_target, 0), 0)) ==
> CC_FPU_UNEQmode))
>   continue;
> 
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 4d7ac3281b41..c08ca3d0d432 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1531,7 +1531,7 @@ enum arc_function_type {
>(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode  \
>  || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode\
>  || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode  \
> -|| (MODE) == CC_FPUmode)  \
> +|| (MODE) == CC_FPUmode || (MODE) == CC_FPUEmode) \
> ? reverse_condition_maybe_unordered ((CODE))   \
> : reverse_condition ((CODE)))
> 
> diff --git a/gcc/config/arc/fpu.md b/gcc/config/arc/fpu.md
> index 6289e9c3f593..6729795de542 100644
> --- a/gcc/config/arc/fpu.md
> +++ b/gcc/config/arc/fpu.md
> @@ -242,6 +242,18 @@
> (set_attr "type" "fpu")
> (set_attr "predicable" "yes")])
> 
> +(define_insn "*cmpsf_fpu_trap"
> +  [(set (reg:CC_FPUE CC_REG)
> + (compare:CC_FPUE (match_operand:SF 0 "register_operand"  "r,
> 

[committed] Fix up i386/avx512f-vmovntp?-2.c tests

2019-12-10 Thread Jakub Jelinek
Hi!

I've noticed these two tests FAIL execution with -fstack-protector-strong,
without it we are just lucky.  _mm512_stream_* requires the destination to
be 64-byte aligned and without stack protector the s vars were allocated
next to res and ensured the right alignment, but with
-fstack-protector-strong, they go into different phases (res being an array
so goes into phase 1 and s being a union/record containing arrays goes into
phase 2) and so res remained only 8-byte aligned.

Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.

2019-12-10  Jakub Jelinek  

* gcc.target/i386/avx512f-vmovntpd-2.c: Ensure res is 64-byte aligned.
* gcc.target/i386/avx512f-vmovntps-2.c: Likewise.

--- gcc/testsuite/gcc.target/i386/avx512f-vmovntpd-2.c.jj   2013-12-31 
12:51:09.564632909 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-vmovntpd-2.c  2019-12-10 
09:53:02.213679644 +0100
@@ -8,7 +8,7 @@ void static
 avx512f_test (void)
 {
   union512d s;
-  double res[8];
+  double res[8] __attribute__((aligned (64)));
 
   s.x = _mm512_set_pd (-39578.467285, 4294967295.1, -7856.342941, 0,
   85632.783567, 1234., 47563.234215, -1.07);
--- gcc/testsuite/gcc.target/i386/avx512f-vmovntps-2.c.jj   2013-12-31 
12:51:09.555632955 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-vmovntps-2.c  2019-12-10 
09:53:19.096418920 +0100
@@ -8,7 +8,7 @@ void static
 avx512f_test (void)
 {
   union512 s;
-  float res[16];
+  float res[16] __attribute__((aligned (64)));
 
   s.x = _mm512_set_ps (-39578.467285, 4294967295.1, -7856.342941, 0,
   85632.783567, 1234., 47563.234215, -1.07,

Jakub



Re: [PATCH] Fix unrecognizable insn of pr92865

2019-12-10 Thread Eric Botcazou
> No gcc/ or gcc/testsuite/ prefixes in ChangeLog.

And 2 different ChangeLogs: ChangeLog and testsuite/ChangeLog.

-- 
Eric Botcazou


[committed] Fix some i386.c comment typos

2019-12-10 Thread Jakub Jelinek
Hi!

Last night when working on -fstack-protector* improvement to be posted,
I've noticed a few typos.  Fixed thusly, committed to trunk as obvious.

2019-12-10  Jakub Jelinek  

* config/i386/i386.c (IX86_LEA_PRIORITY): Fix comment typos.

--- gcc/config/i386/i386.c.jj   2019-12-09 11:12:30.083287310 +0100
+++ gcc/config/i386/i386.c  2019-12-09 18:09:45.162155616 +0100
@@ -14411,10 +14411,10 @@ distance_agu_use (unsigned int regno0, r
 }
 
 /* Define this macro to tune LEA priority vs ADD, it take effect when
-   there is a dilemma of choicing LEA or ADD
+   there is a dilemma of choosing LEA or ADD
Negative value: ADD is more preferred than LEA
-   Zero: Netrual
-   Positive value: LEA is more preferred than ADD*/
+   Zero: Neutral
+   Positive value: LEA is more preferred than ADD.  */
 #define IX86_LEA_PRIORITY 0
 
 /* Return true if usage of lea INSN has performance advantage

Jakub



Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-10 Thread Christophe Lyon
Hi,

On Mon, 9 Dec 2019 at 11:23, Sudakshina Das  wrote:
>
> Hi Jeff
>
> On 07/12/2019 17:44, Jeff Law wrote:
> > On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote:
> >> Hi
> >>
> >> While looking at the vectorization for following example, we
> >> realized
> >> that even though vectorizable_shift function was distinguishing
> >> vector
> >> shifted by vector from vector shifted by scalar, while modeling the
> >> cost
> >> it would always add the cost of building a vector constant despite
> >> not
> >> needing it for vector shifted by scalar.
> >>
> >> This patch fixes this by using scalar_shift_arg to determine whether
> >> we
> >> need to build a vector for the second operand or not. This reduces
> >> prologue cost as shown in the test.
> >>
> >> Build and regression tests pass on aarch64-none-elf and
> >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
> >> Spec2017 for AArch64.
> >>

Looks like you didn't check on arm, where I can see that the new testcase fails:
FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
scan-tree-dump vect "vectorizable_shift
===[\\n\\r][^\\n]*prologue_cost = 0"
FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
"vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"

Seen on arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16

Christophe

> >> gcc/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  
> >>  Richard Sandiford  
> >>
> >>  * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> >>  vect_model_simple_cost call on scalar_shift_arg.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  
> >>
> >>  * gcc.dg/vect/vect-shift-5.c: New test.
> > It's a bit borderline, but it's really just twiddling a cost, so OK.
>
> Thanks :) Committed as r279114.
>
> Sudi
>
> >
> > jeff
> >
>


Re: [PATCH] Refactor IPA devirt a bit.

2019-12-10 Thread Jan Hubicka
> > 
> > 2019-12-09  Richard Sandiford  
> > 
> > gcc/
> > * ipa-utils.h (get_odr_name_for_type): Check for a TYPE_DECL.
> > * ipa-devirt.c (warn_types_mismatch): Don't call xstrdup for the
> > second demangled name.
> > 
> > gcc/testsuite/
> > * gcc.dg/lto/tag-1_0.c, gcc.dg/lto/tag-1_1.c: New test.

OK,
thanks
Honza
> > 
> > Index: gcc/ipa-utils.h
> > ===
> > --- gcc/ipa-utils.h 2019-12-09 12:23:47.0 +
> > +++ gcc/ipa-utils.h 2019-12-09 12:23:48.326292463 +
> > @@ -256,6 +256,7 @@ get_odr_name_for_type (tree type)
> >   {
> > tree type_name = TYPE_NAME (type);
> > if (type_name == NULL_TREE
> > +  || TREE_CODE (type_name) != TYPE_DECL
> > || !DECL_ASSEMBLER_NAME_SET_P (type_name))
> >   return NULL;
> > Index: gcc/ipa-devirt.c
> > ===
> > --- gcc/ipa-devirt.c2019-12-09 12:23:47.0 +
> > +++ gcc/ipa-devirt.c2019-12-09 12:23:48.326292463 +
> > @@ -1042,7 +1042,7 @@ warn_types_mismatch (tree t1, tree t2, l
> >   {
> > const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
> > char *name1 = xstrdup (cplus_demangle (odr1, opts));
> > -  char *name2 = xstrdup (cplus_demangle (odr2, opts));
> > +  char *name2 = cplus_demangle (odr2, opts);
> > if (name1 && name2 && strcmp (name1, name2))
> > {
> >   inform (loc_t1,
> > Index: gcc/testsuite/gcc.dg/lto/tag-1_0.c
> > ===
> > --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> > +++ gcc/testsuite/gcc.dg/lto/tag-1_0.c  2019-12-09 12:23:48.326292463 
> > +
> > @@ -0,0 +1,5 @@
> > +/* { dg-lto-do link } */
> > +/* { dg-lto-options { { -Wodr -flto } } }  */
> > +
> > +struct foo { int x; };
> > +struct foo a = {};
> > Index: gcc/testsuite/gcc.dg/lto/tag-1_1.c
> > ===
> > --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> > +++ gcc/testsuite/gcc.dg/lto/tag-1_1.c  2019-12-09 12:23:48.326292463 
> > +
> > @@ -0,0 +1,6 @@
> > +struct foo { short x; };
> > +
> > +extern struct foo a; /* { dg-lto-warning {type of 'a' does not match 
> > original declaration} } */
> > +struct foo *ptr = 
> > +
> > +int main () { return 0; }
> > 
> 


Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-10 Thread Jan Hubicka
Hi,
I think the updating should treat self recursive edges as loops: that is
calculate SUM of counts incomming edges which are not self recursive,
calculate probability of self recursing and then set count as
SUM * (1/(1-probability_of_recursion))
we do same thing when computing counts withing loops.

Martin, I woner how updating works for indirectly self recursive edges?

Honza


  1   2   >