Re: [PATCH] c++: Further tweaks for new-expression and paren-init [PR77841]

2020-09-07 Thread Jason Merrill via Gcc-patches

On 9/6/20 11:34 AM, Marek Polacek wrote:

This patch corrects our handling of array new-expression with ()-init:

   new int[4](1, 2, 3, 4);

should work even with the explicit array bound, and

   new char[3]("so_sad");

should cause an error, but we weren't giving any.

Fixed by handling array new-expressions with ()-init in the same spot
where we deduce the array bound in array new-expression.  I'm now
always passing STRING_CSTs to build_new_1 wrapped in { } which allowed
me to remove the special handling of STRING_CSTs in build_new_1.  And
since the DIRECT_LIST_INIT_P block in build_new_1 calls digest_init, we
report errors about too short arrays.

I took a stab at cp_complete_array_type's "FIXME: this code is duplicated
from reshape_init", but calling reshape_init there, I ran into issues
with has_designator_problem: when we reshape an already reshaped
CONSTRUCTOR again, d.cur.index has been filled, so we think that we
have a user-provided designator (though there was no designator in the
source code), and report an error.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/77841
* init.c (build_new_1): Don't handle string-initializers here.
(build_new): Handle new-expression with paren-init when the
array bound is known.  Always pass string constants to build_new_1
enclosed in braces.

gcc/testsuite/ChangeLog:

PR c++/77841
* g++.old-deja/g++.ext/arrnew2.C: Expect the error only in C++17
and less.
* g++.old-deja/g++.robertl/eb58.C: Adjust dg-error.
* g++.dg/cpp2a/paren-init36.C: New test.
---
  gcc/cp/init.c | 37 ++-
  gcc/testsuite/g++.dg/cpp2a/paren-init36.C | 14 +++
  gcc/testsuite/g++.old-deja/g++.ext/arrnew2.C  |  2 +-
  gcc/testsuite/g++.old-deja/g++.robertl/eb58.C |  2 +-
  4 files changed, 35 insertions(+), 20 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init36.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 3268ae4ad3f..537651778b9 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3596,15 +3596,6 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  vecinit = digest_init (arraytype, vecinit, complain);
}
}
- /* This handles code like new char[]{"foo"}.  */
- else if (len == 1
-  && char_type_p (TYPE_MAIN_VARIANT (type))
-  && TREE_CODE (tree_strip_any_location_wrapper ((**init)[0]))
- == STRING_CST)
-   {
- vecinit = (**init)[0];
- STRIP_ANY_LOCATION_WRAPPER (vecinit);
-   }
  else if (*init)
  {
if (complain & tf_error)
@@ -3944,9 +3935,9 @@ build_new (location_t loc, vec **placement, 
tree type,
  }
  
/* P1009: Array size deduction in new-expressions.  */

-  if (TREE_CODE (type) == ARRAY_TYPE
-  && !TYPE_DOMAIN (type)
-  && *init)
+  const bool deduce_array_p = (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type));
+  if (*init && (deduce_array_p || (nelts && cxx_dialect >= cxx20)))


Looks like this won't handle new (char[4]), for which we also get an 
ARRAY_TYPE.



  {
/* This means we have 'new T[]()'.  */
if ((*init)->is_empty ())
@@ -3955,16 +3946,20 @@ build_new (location_t loc, vec 
**placement, tree type,
  CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
  vec_safe_push (*init, ctor);
}
+  tree array_type = deduce_array_p ? TREE_TYPE (type) : type;


I'd call this variable elt_type.


tree  = (**init)[0];
/* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
{
- /* Handle new char[]("foo").  */
+ /* Handle new char[]("foo"): turn it into new char[]{"foo"}.  */
  if (vec_safe_length (*init) == 1
- && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && char_type_p (TYPE_MAIN_VARIANT (array_type))
  && TREE_CODE (tree_strip_any_location_wrapper (elt))
 == STRING_CST)
-   /* Leave it alone: the string should not be wrapped in {}.  */;
+   {
+ elt = build_constructor_single (init_list_type_node, NULL_TREE, 
elt);
+ CONSTRUCTOR_IS_DIRECT_INIT (elt) = true;
+   }
  else
{
  tree ctor = build_constructor_from_vec (init_list_type_node, 
*init);


With this change, doesn't the string special case produce the same 
result as the general case?



@@ -3977,9 +3972,15 @@ build_new (location_t loc, vec **placement, 
tree type,
}
}
/* Otherwise we should have 'new T[]{e_0, ..., e_k}'.  */
-  if (BRACE_ENCLOSED_INITIALIZER_P (elt))
-   elt = reshape_init (type, elt, complain);
-  cp_complete_array_type (, elt, 

RE: [PATCH v2] doc: add 'cd' command before 'make check-gcc' command in install.texi

2020-09-07 Thread Hans-Peter Nilsson
On Mon, 7 Sep 2020, Hu, Jiangping wrote:
> Hi, H-P
>
> Thanks for comment.
>
> > On Sat, 29 Aug 2020, Hu Jiangping wrote:
> >
> > > This patch add 'cd' command before 'make check-gcc' command
> > > when run the testsuite on selected tests.
> >
> > No, don't do that; those targets work fine from the toplevel
> > too, and then include the language libs.
> Yes, I know that 'make check-gcc' work well from the toplevel,
> but 'make check-g++' does not. Is there anything wrong with
> the Makefile?

IIUC check-g++ is somewhat a historic artefact, but for
consistency it should be added to the toplevel Makefile too as
a synonym for check-c++.

>   i.e.:
>   Note that if run 'make check-testsuite' from the object directory,
>   not only the tests under gcc subdirectory but also the tests under
>   the target libriaries will be performed.

(There's no "make check-testsuite".)

> What do you think?

I think that after re-reading the patch, I retract my objection,
thanks.

FWIW, note also "check-gcc-" where LANGUAGE="c, c++,
fortran, ada" which do consider the library testsuite.

brgds, H-P


Re: [PATCH] libphobos: libdruntime doesn't support shadow stack (PR95680)

2020-09-07 Thread H.J. Lu via Gcc-patches
On Mon, Sep 7, 2020 at 2:35 PM Iain Buclaw  wrote:
>
> Hi,
>
> This patch removes whatever CET support was in the switchContext routine
> for x86 D runtime, and instead uses the ucontext fallback, which propely
> handles shadow stack handling.
>
> Rather than implementing support within D runtime itself, use libc
> getcontext/setcontext functions if CET is enabled instead.
>
> HJ, does this look reasonable before I commit it?  The detection has
> been done at configure-time, rather than adding a predefined version
> condition for CET within the compiler.
>
> Done regression testing on x86_64-linux-gnu/-m32/-mx32.
>
> Regards
> Iain.
>
> ---
> libphobos/ChangeLog:
>
> PR d/95680
> * Makefile.in: Regenerate.
> * configure: Regenerate.
> * configure.ac (DCFG_ENABLE_CET): Substitute.
> * libdruntime/Makefile.in: Regenerate.
> * libdruntime/config/x86/switchcontext.S: Remove CET support code.
> * libdruntime/core/thread.d: Import gcc.config.  Don't set version
> AsmExternal when GNU_Enable_CET is true.
> * libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
> * src/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.

Looks good.  I can try it on Tiger Lake after it has been checked in.

Thanks.

-- 
H.J.


Re: [PATCH] gcc: Make strchr return value pointers const

2020-09-07 Thread JonY via Gcc-patches
On 9/4/20 12:47 PM, Martin Storsjö wrote:
> Hi,
> 
> On Fri, 4 Sep 2020, Jakub Jelinek wrote:
> 
>> On Tue, Sep 01, 2020 at 04:01:42PM +0300, Martin Storsjö wrote:
>>> This fixes compilation of codepaths for dos-like filesystems
>>> with Clang. When built with clang, it treats C input files as C++
>>> when the compiler driver is invoked in C++ mode, triggering errors
>>> when the return value of strchr() on a pointer to const is assigned
>>> to a pointer to non-const variable.
>>
>> Not really specific to clang, e.g. glibc does that in its headers too
>> as the C++ standard mandates that (and I guess mingw should do that too).
>>
>>> This matches similar variables outside of the ifdefs for dos-like
>>> path handling.
>>>
>>> 2020-09-01  Martin Storsjö  
>>>
>>> gcc/Changelog:
>>>     * dwarf2out.c (file_name_acquire): Make a strchr return value
>>>     pointer to const.
>>>
>>> libcpp/Changelog:
>>>     * files.c (remap_filename): Make a strchr return value pointer
>>>     to const.
>>
>> LGTM.  And it is short enough not to need copyright assignment, so ok for
>> trunk.
> 
> Thanks! Can someone commit this for me?
> 
> // Martin

Ping can anyone commit this?

Are platform maintainers allowed to push general changes like these? If
so I can push soon.



signature.asc
Description: OpenPGP digital signature


[PATCH] Add support for C++20 barriers

2020-09-07 Thread Thomas Rodgers
Re-sending

Adds 

* include/Makefile.am (std_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h (__atomic_base<_Itp>::_M_wait): Define.
(__atomic_base<_Itp>::wait): Delegate to _M_wait.
* include/std/barrier: New file.
* testsuite/30_thread/barrier/1.cc: New test.
* testsuite/30_thread/barrier/2.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_thread/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_thread/barrier/arrive.cc: Likewise.
* testsuite/30_thread/barrier/completion.cc: Likewise.
* testsuite/30_thread/barrier/max.cc: Likewise.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   3 +-
 libstdc++-v3/include/bits/atomic_base.h   |  16 +-
 libstdc++-v3/include/std/barrier  | 230 ++
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/30_threads/barrier/1.cc |  27 ++
 .../testsuite/30_threads/barrier/2.cc |  27 ++
 .../testsuite/30_threads/barrier/arrive.cc|  34 +++
 .../30_threads/barrier/arrive_and_drop.cc |  32 +++
 .../30_threads/barrier/arrive_and_wait.cc |  33 +++
 .../30_threads/barrier/completion.cc  |  39 +++
 .../testsuite/30_threads/barrier/max.cc   |  26 ++
 12 files changed, 465 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/include/std/barrier
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_drop.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/arrive_and_wait.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/completion.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/max.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index ef8acd4a389..bae97852348 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -30,6 +30,7 @@ std_headers = \
${std_srcdir}/any \
${std_srcdir}/array \
${std_srcdir}/atomic \
+   ${std_srcdir}/barrier \
${std_srcdir}/bit \
${std_srcdir}/bitset \
${std_srcdir}/charconv \
diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index c121d993fee..98f9941c2d4 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -229,6 +229,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_load(&_M_i, &__v, int(__m));
   return __v == __GCC_ATOMIC_TEST_AND_SET_TRUEVAL;
 }
+
 #endif // C++20
 
 _GLIBCXX_ALWAYS_INLINE void
@@ -566,13 +567,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 #if __cplusplus > 201703L
+  template
+   _GLIBCXX_ALWAYS_INLINE void
+   _M_wait(__int_type __old, _Pred&& __pred,
+   memory_order __m) const noexcept
+   {
+ std::__atomic_wait(&_M_i, __old,
+std::forward<_Pred&&>(__pred));
+   }
+
   _GLIBCXX_ALWAYS_INLINE void
   wait(__int_type __old,
  memory_order __m = memory_order_seq_cst) const noexcept
   {
-   std::__atomic_wait(&_M_i, __old,
-  [__m, this, __old]
-  { return this->load(__m) != __old; });
+   _M_wait(__old, [__m, this, __old]()
+   { return this->load(__m) != __old; },
+   __m);
   }
 
   // TODO add const volatile overload
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
new file mode 100644
index 000..870db7db2ac
--- /dev/null
+++ b/libstdc++-v3/include/std/barrier
@@ -0,0 +1,230 @@
+//  -*- C++ -*-
+// This implementation is based on libcxx/include/barrier
+//===-- barrier.h --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---===//
+
+#ifndef _GLIBCXX_BARRIER
+#define _GLIBCXX_BARRIER 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+#define __cpp_lib_barrier 201907L
+
+#include 
+
+#if defined(_GLIBCXX_HAS_GTHREADS)
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __empty_completion
+  {
+_GLIBCXX_ALWAYS_INLINE void
+operator()() noexcept
+{ }
+  };
+
+/*
+
+The default implementation of __barrier_base is a classic tree barrier.
+
+It looks different from literature pseudocode for two main 

Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes

2020-09-07 Thread Alexandre Oliva
On Sep  4, 2020, Segher Boessenkool  wrote:

> Please open a PR?  (But see below first.)

I saw the bit below, then filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96965

> Btw, we could perhaps try permuting arms in combine itself (not in
> recog)?  In combine we know that there are at most 4 sets (and usually
> at most 3), and we also often have a good idea what swaps are likely
> to work and which not, so maybe trying just one extra thing will have
> good results already.

That makes sense.  Perhaps for starters we could just recall that the
original i2 was (parallel [i1 i2]), i.e., with i1 appearing first, so
that added_sets tried to insert i1 first rather than last.  That would
catch this case, and likely any case in which we i1 is split out of i2
and i2 rather than i1 feeds i3.  When it is i1 that feeds i3, we'll have
added_sets_2 and add i2 last in the i3 parallel, which is happens to
already be the most likely order for recog.  In both cases, the
likelihood follows from the fact that i1 and i2 were split, in this
order, out of the original i2's parallel.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] c++: Fix ICE in reshape_init with init-list [PR95164]

2020-09-07 Thread Jason Merrill via Gcc-patches

On 9/4/20 5:39 PM, Marek Polacek wrote:

This patch fixes a long-standing bug in reshape_init_r.  Since r209314
we implement DR 1467 which handles list-initialization with a single
initializer of the same type as the target.  In this test this causes
a crash in reshape_init_r when we're processing a constructor that has
undergone the DR 1467 transformation.

Take e.g. the

   foo({{1, {H{k);

line in the attached test.  {H{k}} initializes the field b of H in I.
H{k} is a functional cast, so has TREE_HAS_CONSTRUCTOR set, so is
COMPOUND_LITERAL_P.  We perform the DR 1467 transformation and turn
{H{k}} into H{k}.  Then we attempt to reshape H{k} again and since
first_initializer_p is null and it's COMPOUND_LITERAL_P, we go here:

else if (COMPOUND_LITERAL_P (stripped_init))
  gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));


It looks to me like the bug is here:

  /* [dcl.init.aggr]   
   
 All implicit type conversions (clause _conv_) are considered when 
 initializing the aggregate member with an initializer from an 
 initializer-list.  If the initializer can initialize a member,
 the member is initialized.  Otherwise, if the member is itself a  
 non-empty subaggregate, brace elision is assumed and the  
 initializer is considered for the initialization of the first 
 member of the subaggregate.  */

  if (TREE_CODE (init) != CONSTRUCTOR
  /* But don't try this for the first initializer, since that would be 
 looking through the outermost braces; A a2 = { a1 }; is not a 
 valid aggregate initialization.  */

  && !first_initializer_p
  && (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (init))
  || can_convert_arg (type, TREE_TYPE (init), init, LOOKUP_NORMAL,
  complain)))
{
  d->cur++;
  return init;
}


We ought to handle H{k} here, treat it as the initializer for the 
member, and not get as far as the code you quote above.


Jason



Re: [PATCH] --enable-link-serialization support

2020-09-07 Thread Jason Merrill via Gcc-patches
On Thu, Sep 3, 2020 at 10:49 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Sep 03, 2020 at 03:53:35PM +0200, Richard Biener wrote:
> > On Thu, 3 Sep 2020, Jakub Jelinek wrote:
> > But is that an issue in practice?  I usually do not do make -j32 cc1plus
> > in a tree that was configured for bootstrap, nor do I use
> > --enable-link-serialization in that case.
>
> Guess most often true, but one could still do it when debugging some
> particular problem during bootstrap.
>
> > Btw, do you plan to keep --enable-link-mutex the way it is?  What it
> > provides ontop of --enable-link-serialization is that progress
> > metering for very long running LTO link stages.  Not sure if that
> > works for the last LTO link or only when there's some other link
> > waiting.

Only when there's another link waiting.

> I kept it as is for now.  Primarily I didn't want to use
> --enable-linux-mutex for the new thing when it has nothing to do with a
> mutex.  I think with both options together it will just print useless
> message that it acquired the lock immediately in each case.

Yes.

For this to replace --enable-link-mutex for me, I'd want it to work
with plain 'make' in the gcc directory, not just at top level.

Jason



Re: [PATCH, rs6000] Add non-relative jump table support on Power Linux

2020-09-07 Thread Segher Boessenkool
On Mon, Aug 24, 2020 at 03:48:43PM +0800, HAO CHEN GUI wrote:
> >I'll try to be quicker at reviewing iterations of this -- there is quite
> >some way to go, without me slowing things down!

Sigh :-(

>   * config/rs6000/linux.h (rs6000_relative_jumptables): Define.

That macro looks like it is variable (or function).  *Make* it a
variable, please?

>   * config/rs6000/rs6000.c (TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC):
>   Define

Period?

>   (rs6000_gen_pic_addr_diff_vec, rs6000_output_addr_vec_elt): Implement.

"New function."

>   * config/rs6000/rs6000.md (absolute_tablejumpsi,
>   absolute_tablejumpsi_nospec, absolute_tablejumpdi,
>   absolute_tablejumpdi_nospec): Add four new expansions.

"New define_expands." or "New expanders."

>   * config/rs6000/rs6000.opt (mrelative-jumptables): Add a new option and
>   set rs6000_relative_jumptables to true by default.

"rs6000.opt: Add -mrelative-jumptables."

> +/* Disable relative jump tables for Power Linux.  */
> +#undef rs6000_relative_jumptables
> +#define rs6000_relative_jumptables 0

Why?

> +/* Disable relative jump tables for Power Linux64.  */
> +#undef rs6000_relative_jumptables
> +#define rs6000_relative_jumptables 0

(That's not what it's called...  Just don't say the "for..." at all?
It is clear from what file it is in.)

>  /* Indicate that jump tables go in the text section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION rs6000_relative_jumptables

Not sure that is correct.  Maybe the patch using rodata (.data.rel.ro)
should be a separate patch?

>  /* Define as C expression which evaluates to nonzero if the tablejump
> instruction expects the table to contain offsets from the address of the
> table.
> Do not define this if the table should contain absolute addresses.  */
> -#define CASE_VECTOR_PC_RELATIVE 1
> +#define CASE_VECTOR_PC_RELATIVE 0

This should depend on the new flag?

> +/* Specify the machine mode that this machine uses
> +   for the index in the tablejump instruction.  */
> +#define CASE_VECTOR_MODE \
> +  (TARGET_32BIT || rs6000_relative_jumptables ? SImode : DImode)

rs6000_relative_jumptables ? SImode : Pmode;

> +  if (rs6000_relative_jumptables)
> + {
> +   if (TARGET_32BIT)
> + emit_jump_insn (gen_tablejumpsi (operands[0], operands[1]));
> +   else
> + emit_jump_insn (gen_tablejumpdi (operands[0], operands[1]));
> + }

Hrm, I guess we should make that a parameterized name (future work,
don't do it now :-) )

> +(define_expand "absolute_tablejumpsi"

Don't prefix names; it should start with "tablejump".


Segher


[PATCH] libphobos: libdruntime doesn't support shadow stack (PR95680)

2020-09-07 Thread Iain Buclaw via Gcc-patches
Hi,

This patch removes whatever CET support was in the switchContext routine
for x86 D runtime, and instead uses the ucontext fallback, which propely
handles shadow stack handling.

Rather than implementing support within D runtime itself, use libc
getcontext/setcontext functions if CET is enabled instead.

HJ, does this look reasonable before I commit it?  The detection has
been done at configure-time, rather than adding a predefined version
condition for CET within the compiler.

Done regression testing on x86_64-linux-gnu/-m32/-mx32.

Regards
Iain.

---
libphobos/ChangeLog:

PR d/95680
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac (DCFG_ENABLE_CET): Substitute.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/config/x86/switchcontext.S: Remove CET support code.
* libdruntime/core/thread.d: Import gcc.config.  Don't set version
AsmExternal when GNU_Enable_CET is true.
* libdruntime/gcc/config.d.in (GNU_Enable_CET): Define.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
---
 libphobos/Makefile.in |  3 ++
 libphobos/configure   | 13 +++--
 libphobos/configure.ac|  3 ++
 libphobos/libdruntime/Makefile.in |  4 ++
 .../libdruntime/config/x86/switchcontext.S| 12 +
 libphobos/libdruntime/core/thread.d   | 52 +++
 libphobos/libdruntime/gcc/config.d.in |  3 ++
 libphobos/src/Makefile.in |  4 ++
 libphobos/testsuite/Makefile.in   |  4 ++
 9 files changed, 64 insertions(+), 34 deletions(-)

diff --git a/libphobos/Makefile.in b/libphobos/Makefile.in
index 4806f69f406..f6cba17159f 100644
--- a/libphobos/Makefile.in
+++ b/libphobos/Makefile.in
@@ -108,6 +108,8 @@ target_triplet = @target@
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
+   $(top_srcdir)/../config/cet.m4 \
+   $(top_srcdir)/../config/enable.m4 \
$(top_srcdir)/../config/lead-dot.m4 \
$(top_srcdir)/../config/multi.m4 \
$(top_srcdir)/../config/override.m4 \
@@ -214,6 +216,7 @@ CPPFLAGS = @CPPFLAGS@
 CYGPATH_W = @CYGPATH_W@
 DCFG_ARM_EABI_UNWINDER = @DCFG_ARM_EABI_UNWINDER@
 DCFG_DLPI_TLS_MODID = @DCFG_DLPI_TLS_MODID@
+DCFG_ENABLE_CET = @DCFG_ENABLE_CET@
 DCFG_HAVE_64BIT_ATOMICS = @DCFG_HAVE_64BIT_ATOMICS@
 DCFG_HAVE_ATOMIC_BUILTINS = @DCFG_HAVE_ATOMIC_BUILTINS@
 DCFG_HAVE_LIBATOMIC = @DCFG_HAVE_LIBATOMIC@
diff --git a/libphobos/configure b/libphobos/configure
index a8d151cdccb..86a0aba6976 100755
--- a/libphobos/configure
+++ b/libphobos/configure
@@ -722,6 +722,7 @@ LIBTOOL
 CFLAGS_FOR_BUILD
 CC_FOR_BUILD
 AR
+DCFG_ENABLE_CET
 CET_FLAGS
 RANLIB
 MAINT
@@ -5586,7 +5587,7 @@ case "$host" in
 case "$enable_cet" in
   auto)
# Check if target supports multi-byte NOPs
-   # and if assembler supports CET insn.
+   # and if compiler and assembler support CET insn.
cet_save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS -fcf-protection"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -5650,6 +5651,12 @@ $as_echo "no" >&6; }
 fi
 
 
+if test x$enable_cet = xyes; then :
+  DCFG_ENABLE_CET=true
+else
+  DCFG_ENABLE_CET=false
+fi
+
 
 # This should be inherited in the recursive make, but ensure it is defined.
 test "$AR" || AR=ar
@@ -11738,7 +11745,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11741 "configure"
+#line 11748 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11844,7 +11851,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11847 "configure"
+#line 11854 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/libphobos/configure.ac b/libphobos/configure.ac
index ec8a30ea511..97f96934aaf 100644
--- a/libphobos/configure.ac
+++ b/libphobos/configure.ac
@@ -68,6 +68,9 @@ AC_PROG_MAKE_SET
 # Add CET specific flags if CET is enabled
 GCC_CET_FLAGS(CET_FLAGS)
 AC_SUBST(CET_FLAGS)
+AS_IF([test x$enable_cet = xyes],
+  [DCFG_ENABLE_CET=true], [DCFG_ENABLE_CET=false])
+AC_SUBST(DCFG_ENABLE_CET)
 
 # This should be inherited in the recursive make, but ensure it is defined.
 test "$AR" || AR=ar
diff --git a/libphobos/libdruntime/Makefile.in 
b/libphobos/libdruntime/Makefile.in
index 3fddbc340de..28b4333838f 100644
--- a/libphobos/libdruntime/Makefile.in
+++ b/libphobos/libdruntime/Makefile.in
@@ -131,6 +131,8 @@ target_triplet = @target@
 subdir = libdruntime
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
+   $(top_srcdir)/../config/cet.m4 \
+   $(top_srcdir)/../config/enable.m4 \
$(top_srcdir)/../config/lead-dot.m4 \
$(top_srcdir)/../config/multi.m4 \

Re: [PATCH] hppa64: Improve hppa_rtx_costs for DImode shifts by constants.

2020-09-07 Thread John David Anglin
On 2020-09-07 3:59 p.m., Roger Sayle wrote:
> Please let me know what you think.
>
>
> 2020-09-07  Roger Sayle  
>
> gcc/ChangeLog
>   * config/pa/pa.c (hppa_rtx_costs) [ASHIFT, ASHIFTRT, LSHIFTRT]:
>   Provide accurate costs for DImode shifts of integer constants.
Will do.

I'm still testing patchh3.txt on hppa64-hpux.  Takes a bit of time to do full 
build and check.

Thanks,
Dave

-- 
John David Anglin  dave.ang...@bell.net



[PATCH] hppa64: Improve hppa_rtx_costs for DImode shifts by constants.

2020-09-07 Thread Roger Sayle
Hi Dave,
Here's another patch for you to try on hppa64.  I've no idea whether it
helps PR middle-end/87256 on hppa64 or makes things worse, but this
updates the DImode shift by constant costs on TARGET_64BIT, in the
same way that the previous rtx_costs patch improved the SImode costs.
Generating optimal code is a useful pre-requisite.  If nothing else, this
should reduce the number PA2.0 instructions generated for PR87256's
hog.c.

Please let me know what you think.


2020-09-07  Roger Sayle  

gcc/ChangeLog
* config/pa/pa.c (hppa_rtx_costs) [ASHIFT, ASHIFTRT, LSHIFTRT]:
Provide accurate costs for DImode shifts of integer constants.


Many thanks in advance,
Roger
--

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index a9223ab..210e44f 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -1630,13 +1630,16 @@ hppa_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case ASHIFT:
   if (mode == DImode)
{
- if (TARGET_64BIT)
-   *total = COSTS_N_INSNS (3);
- else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+ if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
{
- *total = COSTS_N_INSNS (2);
+ if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (1);
+ else
+   *total = COSTS_N_INSNS (2);
  return true;
}
+ else if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (3);
  else if (speed)
*total = COSTS_N_INSNS (13);
  else
@@ -1661,13 +1664,16 @@ hppa_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case ASHIFTRT:
   if (mode == DImode)
{
- if (TARGET_64BIT)
-   *total = COSTS_N_INSNS (3);
- else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+ if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
{
- *total = COSTS_N_INSNS (2);
+ if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (1);
+ else
+   *total = COSTS_N_INSNS (2);
  return true;
}
+ else if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (3);
  else if (speed)
*total = COSTS_N_INSNS (14);
  else
@@ -1692,13 +1698,16 @@ hppa_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
 case LSHIFTRT:
   if (mode == DImode)
{
- if (TARGET_64BIT)
-   *total = COSTS_N_INSNS (2);
- else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+ if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
{
- *total = COSTS_N_INSNS (2);
+ if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (1);
+ else
+   *total = COSTS_N_INSNS (2);
  return true;
}
+ else if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (2);
  else if (speed)
*total = COSTS_N_INSNS (12);
  else


Re: *Ping*: [PATCH] PR fortran/96711 - ICE on NINT() Function

2020-09-07 Thread Harald Anlauf
Hi Thomas,

thanks for the review!

> Regarding the patch: Could you change the test caes into a
> run-time test and check for the results both for compile-time
> simplification and evaluation at run-time?  Just to make sure that
> fold_convert wasn't trying to tell us something that convert doesn't...
>
> OK with that change.

Here's the test I committed:


! { dg-do run }
! { dg-require-effective-target fortran_integer_16 }
! { dg-require-effective-target fortran_real_16 }
! { dg-additional-options "-fdump-tree-original" }
! { dg-final { scan-tree-dump-times "_gfortran_stop_numeric" 2 "original" } }
!
! PR fortran/96711 - ICE on NINT() Function

program p
  implicit none
  real(8):: x
  real(16)   :: y
  integer(16), parameter :: k1 = nint (2 / epsilon (x), kind(k1))
  integer(16), parameter :: k2 = nint (2 / epsilon (y), kind(k2))
  integer(16), parameter :: m1 = 9007199254740992_16!2**53
  integer(16), parameter :: m2 = 10384593717069655257060992658440192_16 !2**113
  integer(16), volatile  :: m
  x = 2 / epsilon (x)
  y = 2 / epsilon (y)
  m = nint (x, kind(m))
! print *, m
  if (k1 /= m1) stop 1
  if (m  /= m1) stop 2
  m = nint (y, kind(m))
! print *, m
  if (k2 /= m2) stop 3
  if (m  /= m2) stop 4
end program


Harald



Re: [PING] floatformat.h: Add bfloat16 support.

2020-09-07 Thread Joseph Myers
On Mon, 7 Sep 2020, Willgerodt, Felix via Gcc-patches wrote:

> @@ -133,6 +133,9 @@ extern const struct floatformat 
> floatformat_ia64_quad_little;
>  /* IBM long double (double+double).  */  extern const struct floatformat 
> floatformat_ibm_long_double_big;  extern const struct floatformat 
> floatformat_ibm_long_double_little;
> +/* bfloat16.  */
> +extern const struct floatformat floatformat_bfloat16_big; extern const 
> +struct floatformat floatformat_bfloat16_little;

There seems to be something odd about the diff formatting here.  I'd 
expect each declaration to be on its own line, not "extern const" at the 
end of a line and the rest of a declaration on the next line.  OK with 
that fixed.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 6/6] ipa-cp: Separate and increase the large-unit parameter

2020-09-07 Thread Martin Jambor
Hi,

a previous patch in the series has taught IPA-CP to identify the
important cloning opportunities in 548.exchange2_r as worthwhile on
their own, but the optimization is still prevented from taking place
because of the overall unit-growh limit.  This patches raises that
limit so that it takes place and the benchmark runs 30% faster (on AMD
Zen2 CPU at least).

Before this patch, IPA-CP uses the following formulae to arrive at the
overall_size limit:

base = MAX(orig_size, param_large_unit_insns)
overall_size_limit = base + base * param_ipa_cp_unit_growth / 100

since param_ipa_cp_unit_growth has default 10, param_large_unit_insns
has default value 1.

The problem with exchange2 (at least on zen2 but I have had a quick
look on aarch64 too) is that the original estimated unit size is 10513
and so param_large_unit_insns does not apply and the default limit is
therefore 11564 which is good enough only for one of the ideal 8
clonings, we need the limit to be at least 16291.

I would like to raise param_ipa_cp_unit_growth a little bit more soon
too, but most certainly not to 55.  Therefore, the large_unit must be
increased.  In this patch, I decided to decouple the inlining and
ipa-cp large-unit parameters.  It also makes sense because IPA-CP uses
it only at -O3 while inlining also at -O2 (IIUC).  But if we agree we
can try raising param_large_unit_insns to 13-14 thousand
"instructions," perhaps it is not necessary.  But then again, it may
make sense to actually increase the IPA-CP limit further.

I plan to experiment with IPA-CP tuning on a larger set of programs.
Meanwhile, mainly to address the 548.exchange2_r regression, I'm
suggesting this simple change.

Bootstrapped and tested and LTO bootstrapped on x86_64 as a part of
the whole series.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-07  Martin Jambor  

* params.opt (ipa-cp-large-unit-insns): New parameter.
* ipa-cp.c (get_max_overall_size): Use the new parameter.
---
 gcc/ipa-cp.c   | 2 +-
 gcc/params.opt | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 12acf24c553..2152f9e5876 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3448,7 +3448,7 @@ static long
 get_max_overall_size (cgraph_node *node)
 {
   long max_new_size = orig_overall_size;
-  long large_unit = opt_for_fn (node->decl, param_large_unit_insns);
+  long large_unit = opt_for_fn (node->decl, param_ipa_cp_large_unit_insns);
   if (max_new_size < large_unit)
 max_new_size = large_unit;
   int unit_growth = opt_for_fn (node->decl, param_ipa_cp_unit_growth);
diff --git a/gcc/params.opt b/gcc/params.opt
index 97509963d71..ef2c1f81dd7 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -214,6 +214,10 @@ Percentage penalty functions containing a single call to 
another function will r
 Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param 
Optimization
 How much can given compilation unit grow because of the interprocedural 
constant propagation (in percent).
 
+-param=ipa-cp-large-unit-insns=
+Common Joined UInteger Var(param_ipa_cp_large_unit_insns) Optimization 
Init(16000) Param
+The size of translation unit that IPA-CP pass considers large.
+
 -param=ipa-cp-value-list-size=
 Common Joined UInteger Var(param_ipa_cp_value_list_size) Init(8) Param 
Optimization
 Maximum size of a list of values associated with each parameter for 
interprocedural constant propagation.
-- 
2.28.0



[PATCH 5/6] ipa-cp: Add dumping of overall_size after cloning

2020-09-07 Thread Martin Jambor
Hi,

when experimenting with IPA-CP parameters, especially when looking
into exchange2_r, it has been very useful to know what the value of
overall_size is at different stages of the decision process.  This
patch therefore adds it to the generated dumps.

Bootstrapped and tested and LTO bootstrapped on x86_64 as a part of
the whole series.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-07  Martin Jambor  

* ipa-cp.c (estimate_local_effects): Add overeall_size to dumped
string.
(decide_about_value): Add dumping new overall_size.
---
 gcc/ipa-cp.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index f6320c787de..12acf24c553 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3517,7 +3517,8 @@ estimate_local_effects (struct cgraph_node *node)
 
  if (dump_file)
fprintf (dump_file, " Decided to specialize for all "
-"known contexts, growth deemed beneficial.\n");
+"known contexts, growth (to %li) deemed "
+"beneficial.\n", overall_size);
}
  else if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "  Not cloning for all contexts because "
@@ -5506,6 +5507,9 @@ decide_about_value (struct cgraph_node *node, int index, 
HOST_WIDE_INT offset,
   val->spec_node = create_specialized_node (node, known_csts, known_contexts,
aggvals, callers);
   overall_size += val->local_size_cost;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, " overall size reached %li\n",
+overall_size);
 
   /* TODO: If for some lattice there is only one other known value
  left, make a special node for it too. */
-- 
2.28.0



[PATCH 4/6] ipa: Multiple predicates for loop properties, with frequencies

2020-09-07 Thread Martin Jambor
Hi,

this patch enhances the ability of IPA to reason under what conditions
loops in a function have known iteration counts or strides because it
replaces single predicates which currently hold conjunction of
predicates for all loops with vectors capable of holding multiple
predicates, each with a cumulative frequency of loops with the
property.

This second property is then used by IPA-CP to much more aggressively
boost its heuristic score for cloning opportunities which make
iteration counts or strides of frequent loops compile time constant.

Last week I bootstrapped and tested (and LTO-bootstrapped) the series up
to this patch in particular, this week I did so for the whole patch set.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-03  Martin Jambor  

* ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
(ipa_fn_summary): Change the type of loop_iterations and loop_strides
to vectors of ipa_freqcounting_predicate.
(ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
(ipa_call_estimates): New fields loops_with_known_iterations and
loops_with_known_strides.
* ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
with the expected frequencies of loops with known iteration count or
stride.
* ipa-fnsummary.c (add_freqcounting_predicate): New function.
(ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
just two predicates.
(remap_hint_predicate_after_duplication): Replace with function
remap_freqcounting_preds_after_dup.
(ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
(ipa_dump_fn_summary): Dump the new vectors.
(analyze_function_body): Compute the loop property vectors.
(ipa_call_context::estimate_size_and_time): Calculate also
loops_with_known_iterations and loops_with_known_strides.  Adjusted
dumping accordinly.
(remap_hint_predicate): Replace with function
remap_freqcounting_predicate.
(ipa_merge_fn_summary_after_inlining): Use it.
(inline_read_section): Stream loopcounting vectors instead of two
simple predicates.
(ipa_fn_summary_write): Likewise.
* params.opt (ipa-max-loop-predicates): New parameter.
* doc/invoke.texi (ipa-max-loop-predicates): Document new param.

gcc/testsuite/ChangeLog:

2020-09-03  Martin Jambor  

* gcc.dg/ipa/ipcp-loophint-1.c: New test.
---
 gcc/doc/invoke.texi|   4 +
 gcc/ipa-cp.c   |   9 +
 gcc/ipa-fnsummary.c| 318 ++---
 gcc/ipa-fnsummary.h|  38 ++-
 gcc/params.opt |   4 +
 gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c |  29 ++
 6 files changed, 288 insertions(+), 114 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-loophint-1.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bca8c856dc8..d539e58a11c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13259,6 +13259,10 @@ of iterations of a loop known, it adds a bonus of
 @option{ipa-cp-loop-hint-bonus} to the profitability score of
 the candidate.
 
+@item ipa-max-loop-predicates
+The maximum number of different predicates IPA will use to describe when
+loops in a function have known properties.
+
 @item ipa-max-aa-steps
 During its analysis of function bodies, IPA-CP employs alias analysis
 in order to track values pointed to by function parameters.  In order
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 77c84a6ed5d..f6320c787de 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3205,6 +3205,15 @@ hint_time_bonus (cgraph_node *node, const 
ipa_call_estimates )
   ipa_hints hints = estimates.hints;
   if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
 result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+
+  sreal bonus_for_one = opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
+
+  if (hints & INLINE_HINT_loop_iterations)
+result += (estimates.loops_with_known_iterations * bonus_for_one).to_int 
();
+
+  if (hints & INLINE_HINT_loop_stride)
+result += (estimates.loops_with_known_strides * bonus_for_one).to_int ();
+
   return result;
 }
 
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 2404a92291c..aec1c319a65 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -310,6 +310,36 @@ set_hint_predicate (predicate **p, predicate new_predicate)
 }
 }
 
+/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
+   Otherwise add a new item to the vector with this predicate and frerq equal
+   to add_freq, unless the number of predicates would exceed MAX_NUM_PREDICATES
+   in which case the function does nothing.  */
+
+static void
+add_freqcounting_predicate (vec **v,
+   const predicate _predicate, sreal add_freq,
+   

[PATCH 3/6] ipa: Bundle estimates of ipa_call_context::estimate_size_and_time

2020-09-07 Thread Martin Jambor
Hi,

A subsequent patch adds another two estimates that the code in
ipa_call_context::estimate_size_and_time computes, and the fact that
the function has a special output parameter for each thing it computes
would make it have just too many.  Therefore, this patch collapses all
those ouptut parameters into one output structure.

Last week I bootstrapped and tested (and LTO-bootstrapped) the series up
to this patch in particular, this week I did so for the whole patch set.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-02  Martin Jambor  

* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to use
ipa_call_estimates.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-fnsummary.h (struct ipa_call_estimates): New type.
(ipa_call_context::estimate_size_and_time): Adjusted declaration.
(estimate_ipcp_clone_size_and_time): Likewise.
* ipa-cp.c (hint_time_bonus): Changed the type of the second argument
to ipa_call_estimates.
(perform_estimation_of_a_value): Adjusted to use ipa_call_estimates.
(estimate_local_effects): Likewise.
* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Adjusted
to return estimates in a single ipa_call_estimates parameter.
(estimate_ipcp_clone_size_and_time): Likewise.
---
 gcc/ipa-cp.c  | 45 ++---
 gcc/ipa-fnsummary.c   | 60 +++
 gcc/ipa-fnsummary.h   | 36 +--
 gcc/ipa-inline-analysis.c | 47 +-
 4 files changed, 105 insertions(+), 83 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 292dd7e5bdf..77c84a6ed5d 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3196,12 +3196,13 @@ devirtualization_time_bonus (struct cgraph_node *node,
   return res;
 }
 
-/* Return time bonus incurred because of HINTS.  */
+/* Return time bonus incurred because of hints stored in ESTIMATES.  */
 
 static int
-hint_time_bonus (cgraph_node *node, ipa_hints hints)
+hint_time_bonus (cgraph_node *node, const ipa_call_estimates )
 {
   int result = 0;
+  ipa_hints hints = estimates.hints;
   if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
 result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
   return result;
@@ -3397,15 +3398,13 @@ perform_estimation_of_a_value (cgraph_node *node,
   int removable_params_cost, int est_move_cost,
   ipcp_value_base *val)
 {
-  int size, time_benefit;
-  sreal time, base_time;
-  ipa_hints hints;
+  int time_benefit;
+  ipa_call_estimates estimates;
 
-  estimate_ipcp_clone_size_and_time (node, avals, , ,
-_time, );
-  base_time -= time;
-  if (base_time > 65535)
-base_time = 65535;
+  estimate_ipcp_clone_size_and_time (node, avals, );
+  sreal time_delta = estimates.nonspecialized_time - estimates.time;
+  if (time_delta > 65535)
+time_delta = 65535;
 
   /* Extern inline functions have no cloning local time benefits because they
  will be inlined anyway.  The only reason to clone them is if it enables
@@ -3413,11 +3412,12 @@ perform_estimation_of_a_value (cgraph_node *node,
   if (DECL_EXTERNAL (node->decl) && DECL_DECLARED_INLINE_P (node->decl))
 time_benefit = 0;
   else
-time_benefit = base_time.to_int ()
+time_benefit = time_delta.to_int ()
   + devirtualization_time_bonus (node, avals)
-  + hint_time_bonus (node, hints)
+  + hint_time_bonus (node, estimates)
   + removable_params_cost + est_move_cost;
 
+  int size = estimates.size;
   gcc_checking_assert (size >=0);
   /* The inliner-heuristics based estimates may think that in certain
  contexts some functions do not have any size at all but we want
@@ -3472,23 +3472,21 @@ estimate_local_effects (struct cgraph_node *node)
   || (removable_params_cost && node->can_change_signature))
 {
   struct caller_statistics stats;
-  ipa_hints hints;
-  sreal time, base_time;
-  int size;
+  ipa_call_estimates estimates;
 
   init_caller_stats ();
   node->call_for_symbol_thunks_and_aliases (gather_caller_stats, ,
  false);
-  estimate_ipcp_clone_size_and_time (node, , , ,
-_time, );
-  time -= devirt_bonus;
-  time -= hint_time_bonus (node, hints);
-  time -= removable_params_cost;
-  size -= stats.n_calls * removable_params_cost;
+  estimate_ipcp_clone_size_and_time (node, , );
+  sreal time = estimates.nonspecialized_time - estimates.time;
+  time += devirt_bonus;
+  time += hint_time_bonus (node, estimates);
+  time += removable_params_cost;
+  int size = estimates.size - stats.n_calls * removable_params_cost;
 
   if (dump_file)
fprintf (dump_file, " - context independent values, size: %i, 

[PATCH 2/6] ipa: Introduce ipa_cached_call_context

2020-09-07 Thread Martin Jambor
Hi,

as we discussed with Honza on the mailin glist last week, making
cached call context structure distinct from the normal one may make it
clearer that the cached data need to be explicitely deallocated.

This implements does that division.  It is not mandatory for the overall
main goals of the patch set and can be dropped if deemed superfluous.

Last week I bootstrapped and tested (and LTO-bootstrapped) this patch
individually, this week I did so for the whole patch set.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-02  Martin Jambor  

* ipa-fnsummary.h (ipa_cached_call_context): New forward declaration
and class.
(class ipa_call_context): Make friend ipa_cached_call_context.  Moved
methods duplicate_from and release to it too.
* ipa-fnsummary.c (ipa_call_context::duplicate_from): Moved to class
ipa_cached_call_context.
(ipa_call_context::release): Likewise, removed the parameter.
* ipa-inline-analysis.c (node_context_cache_entry): Change the type of
ctx to ipa_cached_call_context.
(do_estimate_edge_time): Remove parameter from the call to
ipa_cached_call_context::release.
---
 gcc/ipa-fnsummary.c   | 21 -
 gcc/ipa-fnsummary.h   | 16 ++--
 gcc/ipa-inline-analysis.c |  4 ++--
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index d8b95aca307..aaccc203b3b 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3284,7 +3284,7 @@ ipa_call_context::ipa_call_context (cgraph_node *node, 
clause_t possible_truths,
 /* Set THIS to be a duplicate of CTX.  Copy all relevant info.  */
 
 void
-ipa_call_context::duplicate_from (const ipa_call_context )
+ipa_cached_call_context::duplicate_from (const ipa_call_context )
 {
   m_node = ctx.m_node;
   m_possible_truths = ctx.m_possible_truths;
@@ -3354,24 +3354,19 @@ ipa_call_context::duplicate_from (const 
ipa_call_context )
   m_avals.m_known_value_ranges = vNULL;
 }
 
-/* Release memory used by known_vals/contexts/aggs vectors.
-   If ALL is true release also inline_param_summary.
-   This happens when context was previously duplicated to be stored
-   into cache.  */
+/* Release memory used by known_vals/contexts/aggs vectors.  and
+   inline_param_summary.  */
 
 void
-ipa_call_context::release (bool all)
+ipa_cached_call_context::release ()
 {
   /* See if context is initialized at first place.  */
   if (!m_node)
 return;
-  ipa_release_agg_values (m_avals.m_known_aggs, all);
-  if (all)
-{
-  m_avals.m_known_vals.release ();
-  m_avals.m_known_contexts.release ();
-  m_inline_param_summary.release ();
-}
+  ipa_release_agg_values (m_avals.m_known_aggs, true);
+  m_avals.m_known_vals.release ();
+  m_avals.m_known_contexts.release ();
+  m_inline_param_summary.release ();
 }
 
 /* Return true if CTX describes the same call context as THIS.  */
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index dfcde9f91b8..8e5659f62aa 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -287,6 +287,8 @@ public:
  ipa_call_summary *dst_data);
 };
 
+class ipa_cached_call_context;
+
 /* This object describe a context of call.  That is a summary of known
information about its parameters.  Main purpose of this context is
to give more realistic estimations of function runtime, size and
@@ -307,8 +309,6 @@ public:
   sreal *ret_time,
   sreal *ret_nonspecialized_time,
   ipa_hints *ret_hints);
-  void duplicate_from (const ipa_call_context );
-  void release (bool all = false);
   bool equal_to (const ipa_call_context &);
   bool exists_p ()
   {
@@ -329,6 +329,18 @@ private:
   /* Even after having calculated clauses, the information about argument
  values is used to resolve indirect calls.  */
   ipa_call_arg_values m_avals;
+
+  friend ipa_cached_call_context;
+};
+
+/* Variant of ipa_call_context that is stored in a cache over a longer period
+   of time.  */
+
+class ipa_cached_call_context : public ipa_call_context
+{
+public:
+  void duplicate_from (const ipa_call_context );
+  void release ();
 };
 
 extern fast_call_summary  *ipa_call_summaries;
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index d2ae8196d09..b7af77f7b9b 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -57,7 +57,7 @@ fast_call_summary 
*edge_growth_cache = NULL;
 class node_context_cache_entry
 {
 public:
-  ipa_call_context ctx;
+  ipa_cached_call_context ctx;
   sreal time, nonspec_time;
   int size;
   ipa_hints hints;
@@ -226,7 +226,7 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal 
*ret_nonspec_time)
node_context_cache_miss++;
  else
node_context_cache_clear++;
- e->entry.ctx.release (true);
+ e->entry.ctx.release ();
  

[PATCH 1/6] ipa: Bundle vectors describing argument values

2020-09-07 Thread Martin Jambor
Hi,

this large patch is mostly mechanical change which aims to replace
uses of separate vectors about known scalar values (usually called
known_vals or known_csts), known aggregate values (known_aggs), known
virtual call contexts (known_contexts) and known value
ranges (known_value_ranges) with uses of either new type
ipa_call_arg_values or ipa_auto_call_arg_values, both of which simply
contain these vectors inside them.

The need for two distinct types comes from the fact that when the
vectors are constructed from jump functions or lattices, we really
should use auto_vecs with embedded storage allocated on stack.  On the
other hand, the bundle in ipa_call_context can be allocated on heap when
in cache, one time for each call_graph node.

ipa_call_context is constructible from ipa_auto_call_arg_values but
then its vectors must not be resized, otherwise the vectors will stop
pointing to the stack ones.  Unfortunately, I don't think the
structure embedded in ipa_call_context can be made constant because we
need to manipulate and deallocate it when in cache.

Last week I bootstrapped and tested (and LTO-bootstrapped) this patch
individually, this week I did so for the whole patch set.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-09-01  Martin Jambor  

* ipa-prop.h (ipa_auto_call_arg_values): New type.
(class ipa_call_arg_values): Likewise.
(ipa_get_indirect_edge_target): Replaced vector arguments with
ipa_call_arg_values in declaration.  Added an overload for
ipa_auto_call_arg_values.
* ipa-fnsummary.h (ipa_call_context): Removed members m_known_vals,
m_known_contexts, m_known_aggs, duplicate_from, release and equal_to,
new members m_avals, store_to_cache and equivalent_to_p.  Adjusted
construcotr arguments.
(estimate_ipcp_clone_size_and_time): Replaced vector arguments
with ipa_auto_call_arg_values in declaration.
(evaluate_properties_for_edge): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.  Added an
overload for ipa_auto_call_arg_values.
(devirtualization_time_bonus): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(gather_context_independent_values): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(perform_estimation_of_a_value): Likewise.
(estimate_local_effects): Likewise.
(modify_known_vectors_with_val): Adjusted both variants to work on
ipa_auto_call_arg_values and rename them to
copy_known_vectors_add_val.
(decide_about_value): Adjusted to work on ipa_call_arg_values rather
than on separate vectors.
(decide_whether_version_node): Likewise.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(estimate_edge_devirt_benefit): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time_1): Likewise.
(summarize_calls_size_and_time): Adjusted calls to
estimate_edge_size_and_time.
(estimate_calls_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(ipa_call_context::ipa_call_context): Construct from a pointer to
ipa_auto_call_arg_values instead of inividual vectors.
(ipa_call_context::duplicate_from): Adjusted to access vectors within
m_avals.
(ipa_call_context::release): Likewise.
(ipa_call_context::equal_to): Likewise.
(ipa_call_context::estimate_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_ipcp_clone_size_and_time): Adjusted to work with
ipa_auto_call_arg_values rather than on separate vectors.
(ipa_merge_fn_summary_after_inlining): Likewise.  Adjusted call to
estimate_edge_size_and_time.
(ipa_update_overall_fn_summary): Adjusted call to
estimate_edge_size_and_time.
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to work with
ipa_auto_call_arg_values rather than with separate vectors.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-prop.c (ipa_auto_call_arg_values::~ipa_auto_call_arg_values):
New destructor.
---
 gcc/ipa-cp.c  | 245 ++---
 gcc/ipa-fnsummary.c   | 446 +-
 gcc/ipa-fnsummary.h   |  27 +--
 gcc/ipa-inline-analysis.c |  41 +---
 gcc/ipa-prop.c|  10 +
 gcc/ipa-prop.h| 112 +-
 6 files changed, 452 insertions(+), 429 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 

Re: [committed] libstdc++: Optimise GCD algorithms

2020-09-07 Thread Jonathan Wakely via Gcc-patches

On 05/09/20 09:34 +0100, Jonathan Wakely via Libstdc++ wrote:

On Sat, 5 Sep 2020 at 01:35, Sidney Marshall  wrote:


Jonathan

I don't know if the following comments are useful or not but here goes:


Reviews of my patches are always welcome, thanks.



>The current std::gcd and std::chrono::duration::_S_gcd algorithms are
>both recursive. This is potentially expensive to evaluate in constant
>expressions, because each level of recursion makes a new copy of the
>function to evaluate. The maximum number of steps is bounded
>(proportional to the number of decimal digits in the smaller value) and
>so unlikely to exceed the limit for constexpr nesting, but the memory
>usage is still suboptimal. By using an iterative algorithm we avoid
>that compile-time cost. Because looping in constexpr functions is not
>allowed until C++14, we need to keep the recursive implementation in
>duration::_S_gcd for C++11 mode.
>
>For std::gcd we can also optimise runtime performance by using the
>binary GCD algorithm.
>
>libstdc++-v3/ChangeLog:
>
> * include/std/chrono (duration::_S_gcd): Use iterative algorithm
> for C++14 and later.
> * include/std/numeric (__detail::__gcd): Replace recursive
> Euclidean algorithm with iterative version of binary GCD algorithm.
> * testsuite/26_numerics/gcd/1.cc: Test additional inputs.
> * testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
> * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
> * testsuite/experimental/numeric/gcd.cc: Test additional inputs.
> * testsuite/26_numerics/gcd/2.cc: New test.
>
>Tested powerpc64le-linux. Committed to trunk.
>
>-- next part --
>commit 3c219134152f645103f2fcd50735b177ccd76cde
>Author: Jonathan Wakely 
>Date:   Thu Sep 3 12:38:50 2020
>
> libstdc++: Optimise GCD algorithms
>
> The current std::gcd and std::chrono::duration::_S_gcd algorithms are
> both recursive. This is potentially expensive to evaluate in constant
> expressions, because each level of recursion makes a new copy of the
> function to evaluate. The maximum number of steps is bounded
> (proportional to the number of decimal digits in the smaller value) and
> so unlikely to exceed the limit for constexpr nesting, but the memory
> usage is still suboptimal. By using an iterative algorithm we avoid
> that compile-time cost. Because looping in constexpr functions is not
> allowed until C++14, we need to keep the recursive implementation in
> duration::_S_gcd for C++11 mode.
>
> For std::gcd we can also optimise runtime performance by using the
> binary GCD algorithm.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono (duration::_S_gcd): Use iterative algorithm
> for C++14 and later.
> * include/std/numeric (__detail::__gcd): Replace recursive
> Euclidean algorithm with iterative version of binary
> GCD algorithm.
> * testsuite/26_numerics/gcd/1.cc: Test additional inputs.
> * testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
> * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
> * testsuite/experimental/numeric/gcd.cc: Test additional inputs.
> * testsuite/26_numerics/gcd/2.cc: New test.
>
>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>index 1682263fd9f..0e2efb2522b 100644
>--- a/libstdc++-v3/include/std/chrono
>+++ b/libstdc++-v3/include/std/chrono
>@@ -428,8 +428,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> _S_gcd(intmax_t __m, intmax_t __n) noexcept
> {
>   // Duration only allows positive periods so we don't need to
>- // support negative values here (unlike __static_gcd and std::gcd).
>+ // handle negative values here (unlike __static_gcd and std::gcd).
>+#if __cplusplus >= 201402L
>+ while (__m != 0 && __n != 0)

Why are you testing for both __m != 0 && __n != 0 in the loop. Only
__n can be zero. A test for __m == 0 can be made out side of the loop
to save a possible division. Or is there something special about
compile time execution?


It was adapted (maybe too hastily) from the general purpose std::gcd
function, which does need to handle zeros. I didn't consider how to
improve it using extra knowledge about non-zero inputs (I only
considered that they can't be negative).

Neither value can be zero initially (duration only allows positive
ratios, and ratio doesn't allow zero denominators) so I suppose it
could be:

 do
   {
 intmax_t __rem = __m % __n;
 __m = __n;
 __n = __rem;
   } while (__n != 0)
 return __m;




>+   {
>+ intmax_t __rem = __m % __n;
>+ __m = __n;
>+ __n = __rem;
>+   }
>+ return __m + __n;

If __n is zero then why not just return __m? Or, again, is there
something special about compile time execution?


If n is 

[committed] libstdc++: Simplify constraints for semiregular-box [LWG 3477]

2020-09-07 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/std/ranges (__box): Simplify constraints as per LWG 3477.

Tested powerpc64le-linux. Committed to trunk.

commit 00ffe730072f5e2e1923692163dc37db7b3784cb
Author: Jonathan Wakely 
Date:   Mon Sep 7 20:09:17 2020

libstdc++: Simplify constraints for semiregular-box [LWG 3477]

libstdc++-v3/ChangeLog:

* include/std/ranges (__box): Simplify constraints as per LWG 3477.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 054ffe85d0f..23a04d61174 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -494,10 +494,12 @@ namespace ranges
 
using std::optional<_Tp>::operator=;
 
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3477. Simplify constraints for semiregular-box
__box&
operator=(const __box& __that)
noexcept(is_nothrow_copy_constructible_v<_Tp>)
-   requires (!assignable_from<_Tp&, const _Tp&>)
+   requires (!copyable<_Tp>)
{
  if ((bool)__that)
this->emplace(*__that);
@@ -509,7 +511,7 @@ namespace ranges
__box&
operator=(__box&& __that)
noexcept(is_nothrow_move_constructible_v<_Tp>)
-   requires (!assignable_from<_Tp&, _Tp>)
+   requires (!movable<_Tp>)
{
  if ((bool)__that)
this->emplace(std::move(*__that));


Re: [PATCH] vec: Revert "dead code removal in tree-vect-loop.c" and add a comment.

2020-09-07 Thread Andrea Corallo
Richard Sandiford  writes:

> Andrea Corallo  writes:
>> Hi Richard,
>>
>> this reverts the discussed patch and adds what was suggested as a
>> comment.
>>
>> Apologies for the inconvenience, okay for trunk?
>
> OK, thanks!  And sorry for not commenting/documenting this very well.
>
> Richard

Installed as e147bb0faad.

Thanks
  Andrea


Re: ubsan: d-demangle.c:214 signed integer overflow

2020-09-07 Thread Ian Lance Taylor
Iain Buclaw  writes:

> Excerpts from Alan Modra's message of September 7, 2020 2:56 am:
>> On Fri, Sep 04, 2020 at 06:23:10PM +0200, Iain Buclaw wrote:
>>> If we're already using limits.h, I guess it should be fine to also add
>>> 
>>> #define UINT_MAX ((unsigned) ~0U)
>> 
>> Yes, except that I'll use the simpler fall-back
>> #define UINT_MAX (~0U)
>> 
>> The habit of using a cast for unsigned constants dates back to K C
>> where a U suffix was not valid.  For example, from libiberty/strtol.c
>> #define  ULONG_MAX   ((unsigned long)(~0L))
>> 
>> Since the code uses ISO/ANSI C features such as prototypes I think
>> we're OK with a U suffix.  And if there's something I'm missing then
>> #define UINT_MAX ((unsigned) ~0)
>> would be correct for K
>> 
>>> I'll leave it to your judgement on that though.
>>> 
>>> Other than that, OK from me.
>> 
>> Do I need an OK from Ian too?
>> 
>
> As it only touches D support files, I'd say no.

I agree.

Ian


Re: [PATCH] vec: Revert "dead code removal in tree-vect-loop.c" and add a comment.

2020-09-07 Thread Richard Sandiford
Andrea Corallo  writes:
> Hi Richard,
>
> this reverts the discussed patch and adds what was suggested as a
> comment.
>
> Apologies for the inconvenience, okay for trunk?

OK, thanks!  And sorry for not commenting/documenting this very well.

Richard

>
> Thanks
>
>   Andrea
>
> From 84dfa2d461692f445f45b3c3498f9bedba5c3848 Mon Sep 17 00:00:00 2001
> From: Andrea Corallo 
> Date: Mon, 7 Sep 2020 13:45:47 +0100
> Subject: [PATCH] vec: Revert "dead code removal in tree-vect-loop.c" and add a
>  comment.
>
> gcc/ChangeLog
>
> 2020-09-07  Andrea Corallo  
>
>   * tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
>   dead-code removal introduced by 09fa6acd8d9 + add a comment to
>   clarify.
> ---
>  gcc/tree-vect-loop.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 9a82573604b..5ba4e594df9 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4103,6 +4103,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  
>if (vec_outside_cost <= 0)
>  min_profitable_estimate = 0;
> +  /* ??? This "else if" arm is written to handle all cases; see below for
> + what we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P.  */
>else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>  {
>/* This is a repeat of the code above, but with + SOC rather
> @@ -4115,10 +4117,17 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>if (outside_overhead > 0)
>   min_vec_niters = outside_overhead / saving_per_viter + 1;
>  
> -  int threshold = (vec_inside_cost * min_vec_niters
> -+ vec_outside_cost
> -+ scalar_outside_cost);
> -  min_profitable_estimate = threshold / scalar_single_iter_cost + 1;
> +  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> + {
> +   int threshold = (vec_inside_cost * min_vec_niters
> ++ vec_outside_cost
> ++ scalar_outside_cost);
> +   min_profitable_estimate = threshold / scalar_single_iter_cost + 1;
> + }
> +  else
> + min_profitable_estimate = (min_vec_niters * assumed_vf
> ++ peel_iters_prologue
> ++ peel_iters_epilogue);
>  }
>else
>  {


Re: [PATCH] doc: Update documentation on MODE_PARTIAL_INT subregs

2020-09-07 Thread Jozef Lawrynowicz
On Mon, Sep 07, 2020 at 01:55:59PM +0100, Richard Sandiford wrote:
> Jozef Lawrynowicz  writes:
> > In d8487c949ad5 (~GCC 4.9.0), MODE_PARTIAL_INT modes were changed from
> > having an unknown number of undefined bits, to having a known number of
> > undefined bits, however the documentation on using SUBREG expressions
> > with MODE_PARTIAL_INT modes was not updated to reflect this.
> >
> > The attached patch updates that portion of the GCC internals
> > documentation.
> >
> > Ok for trunk?
> 
> Thanks for doing this.  OK for trunk with one very minor nit:
> 
> > -does not guarantee that @samp{(subreg:HI (reg:PSI 0) 0)} has the
> > -value @samp{(reg:HI 4)}.
> > +sets the low 16 bits of @samp{(reg:PSI 0)} to @samp{(reg:HI 4)}, and
> > +the high 4 defined bits of @samp{(reg:PSI 0)} retain their
> > +original value.  The behavior here is therefore the same as
> 
> IMO reads more naturally as “the same as for”.
> 
> Richard
> 
> > +normal @code{subreg}s, when there are no
> > +@code{MODE_PARTIAL_INT} modes involved.
> >  

Thanks, fixed and applied.

Jozef


[committed] MSP430: Don't override default ISA when MCU name is unrecognized

2020-09-07 Thread Jozef Lawrynowicz
430X is the default ISA under normal operation, so even when the MCU name
passed to -mmcu= is unrecognized, it should not be overriden.

Committed as obvious.
>From 7f87e446691f1debfe2671a40f8738bd5e128832 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 7 Sep 2020 17:35:04 +0100
Subject: [PATCH] MSP430: Don't override default ISA when MCU name is
 unrecognized

430X is the default ISA under normal operation, so even when the MCU name
passed to -mmcu= is unrecognized, it should not be overriden.

gcc/ChangeLog:

* config/msp430/msp430.c (msp430_option_override): Don't set the
ISA to 430 when the MCU is unrecognized.

gcc/testsuite/ChangeLog:

* gcc.target/msp430/430x-default-isa.c: New test.
---
 gcc/config/msp430/msp430.c | 11 ++-
 gcc/testsuite/gcc.target/msp430/430x-default-isa.c | 10 ++
 2 files changed, 12 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/430x-default-isa.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index c2b24974364..129b916715e 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -223,7 +223,7 @@ msp430_option_override (void)
  if (target_cpu == NULL)
warning (0,
 "Unrecognized MCU name %qs, assuming that it is "
-"just a MSP430 with no hardware multiply.\n"
+"just a MSP430X with no hardware multiply.\n"
 "Use the %<-mcpu%> and %<-mhwmult%> options to "
 "set these explicitly.",
 target_mcu);
@@ -242,22 +242,15 @@ msp430_option_override (void)
  if (msp430_warn_mcu)
warning (0,
 "Unrecognized MCU name %qs, assuming that it just "
-"supports the MSP430 ISA.\nUse the %<-mcpu%> option "
+"supports the MSP430X ISA.\nUse the %<-mcpu%> option "
 "to set the ISA explicitly.",
 target_mcu);
-
- msp430x = false;
}
  else if (msp430_warn_mcu)
warning (0, "Unrecognized MCU name %qs.", target_mcu);
}
 }
 
-  /* The F5 series are all able to support the 430X ISA.  */
-  if (target_cpu == NULL && target_mcu == NULL
-  && msp430_hwmult_type == MSP430_HWMULT_F5SERIES)
-msp430x = true;
-
   if (TARGET_LARGE && !msp430x)
 error ("%<-mlarge%> requires a 430X-compatible %<-mmcu=%>");
 
diff --git a/gcc/testsuite/gcc.target/msp430/430x-default-isa.c 
b/gcc/testsuite/gcc.target/msp430/430x-default-isa.c
new file mode 100644
index 000..4e8077786b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/430x-default-isa.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-mcpu=*" "-mmcu=*" } } */
+/* { dg-options "-mmcu=msp430foobar -mno-warn-mcu -mno-warn-devices-csv" } */
+
+/* Verify that the default ISA is set to 430X when the MCU passed to -mmcu= is
+   unrecognized.  */
+
+#ifndef __MSP430X__
+#error
+#endif
-- 
2.28.0



[pushed] Darwin, testsuite : Update pubtypes tests.

2020-09-07 Thread Iain Sandoe via Gcc-patches
Hi

Recent changes in debug output have resulted in a change
in the length of the pub types info.  This updates the tests to
reflect the new length.

tested on i686-darwin, x86_64-darwin,
applied to master
thanks
Iain

gcc/testsuite/ChangeLog:

* gcc.dg/pubtypes-2.c: Amend Pub Info Length.
* gcc.dg/pubtypes-3.c: Likewise.
* gcc.dg/pubtypes-4.c: Likewise.
---
 gcc/testsuite/gcc.dg/pubtypes-2.c | 2 +-
 gcc/testsuite/gcc.dg/pubtypes-3.c | 2 +-
 gcc/testsuite/gcc.dg/pubtypes-4.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pubtypes-2.c 
b/gcc/testsuite/gcc.dg/pubtypes-2.c
index 6669f3d3170..116e3489bc0 100644
--- a/gcc/testsuite/gcc.dg/pubtypes-2.c
+++ b/gcc/testsuite/gcc.dg/pubtypes-2.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x12e+\[ \t\]+\[#;]+\[ \t\]+Pub 
Info Length" } } */
+/* { dg-final { scan-assembler {long+[ \t]+0x14d+[ \t]+[#;]+[ \t]+Pub Info 
Length} } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 
diff --git a/gcc/testsuite/gcc.dg/pubtypes-3.c 
b/gcc/testsuite/gcc.dg/pubtypes-3.c
index 345e4edaba6..3fb3468fb00 100644
--- a/gcc/testsuite/gcc.dg/pubtypes-3.c
+++ b/gcc/testsuite/gcc.dg/pubtypes-3.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x12e+\[ \t\]+\[#;]+\[ \t\]+Pub 
Info Length" } } */
+/* { dg-final { scan-assembler {long+[ \t]+0x14d+[ \t]+[#;]+[ \t]+Pub Info 
Length} } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 /* { dg-final { scan-assembler-not "\"list_name_type0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
diff --git a/gcc/testsuite/gcc.dg/pubtypes-4.c 
b/gcc/testsuite/gcc.dg/pubtypes-4.c
index da2f9b02ff4..83fba8dfabc 100644
--- a/gcc/testsuite/gcc.dg/pubtypes-4.c
+++ b/gcc/testsuite/gcc.dg/pubtypes-4.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x165+\[ \t\]+\[#;]+\[ \t\]+Pub 
Info Length" } } */
+/* { dg-final { scan-assembler {long+[ \t]+0x184+[ \t]+[#;]+[ \t]+Pub Info 
Length} } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
 /* { dg-final { scan-assembler "\"list_name_type0\"+\[ \t\]+\[#;]+\[ 
\t\]+external name" } } */
-- 
2.24.1




Re: ubsan: d-demangle.c:214 signed integer overflow

2020-09-07 Thread Iain Buclaw via Gcc-patches
Excerpts from Alan Modra's message of September 7, 2020 2:56 am:
> On Fri, Sep 04, 2020 at 06:23:10PM +0200, Iain Buclaw wrote:
>> If we're already using limits.h, I guess it should be fine to also add
>> 
>> #define UINT_MAX ((unsigned) ~0U)
> 
> Yes, except that I'll use the simpler fall-back
> #define UINT_MAX (~0U)
> 
> The habit of using a cast for unsigned constants dates back to K C
> where a U suffix was not valid.  For example, from libiberty/strtol.c
> #define   ULONG_MAX   ((unsigned long)(~0L))
> 
> Since the code uses ISO/ANSI C features such as prototypes I think
> we're OK with a U suffix.  And if there's something I'm missing then
> #define UINT_MAX ((unsigned) ~0)
> would be correct for K
> 
>> I'll leave it to your judgement on that though.
>> 
>> Other than that, OK from me.
> 
> Do I need an OK from Ian too?
> 

As it only touches D support files, I'd say no.

Iain


[PATCH] vec: Revert "dead code removal in tree-vect-loop.c" and add a comment.

2020-09-07 Thread Andrea Corallo
Hi Richard,

this reverts the discussed patch and adds what was suggested as a
comment.

Apologies for the inconvenience, okay for trunk?

Thanks

  Andrea

>From 84dfa2d461692f445f45b3c3498f9bedba5c3848 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Mon, 7 Sep 2020 13:45:47 +0100
Subject: [PATCH] vec: Revert "dead code removal in tree-vect-loop.c" and add a
 comment.

gcc/ChangeLog

2020-09-07  Andrea Corallo  

* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d9 + add a comment to
clarify.
---
 gcc/tree-vect-loop.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9a82573604b..5ba4e594df9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4103,6 +4103,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 
   if (vec_outside_cost <= 0)
 min_profitable_estimate = 0;
+  /* ??? This "else if" arm is written to handle all cases; see below for
+ what we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P.  */
   else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
 {
   /* This is a repeat of the code above, but with + SOC rather
@@ -4115,10 +4117,17 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
   if (outside_overhead > 0)
min_vec_niters = outside_overhead / saving_per_viter + 1;
 
-  int threshold = (vec_inside_cost * min_vec_niters
-  + vec_outside_cost
-  + scalar_outside_cost);
-  min_profitable_estimate = threshold / scalar_single_iter_cost + 1;
+  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+   {
+ int threshold = (vec_inside_cost * min_vec_niters
+  + vec_outside_cost
+  + scalar_outside_cost);
+ min_profitable_estimate = threshold / scalar_single_iter_cost + 1;
+   }
+  else
+   min_profitable_estimate = (min_vec_niters * assumed_vf
+  + peel_iters_prologue
+  + peel_iters_epilogue);
 }
   else
 {
-- 
2.17.1



[PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

2020-09-07 Thread Christophe Lyon via Gcc-patches
Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
branches over IT blocks on Cortex-M. As a result, instead of
generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
generate either a single IT block, or use branches depending on
conditions tested by the program.

Since this was a deliberate change and the tests still pass as
expected on Cortex-A, this patch skips them when targetting
Cortex-M. The avoids the failures on Cortex M3, M4, and M33.  This
patch makes the testcases unsupported on Cortex-M7 although they pass
in this case because this CPU has different branch costs.

I tried to relax the scan-assembler directives using eg. cmpne|subne
or cmpgt|ble but that seemed fragile.

OK?

2020-09-07  Christophe Lyon  

gcc/testsuite/
PR target/94595
* gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
---
 gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c | 2 +-
 gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c | 2 +-
 gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c | 2 +-
 gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c 
b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
index 45ab605..36204f4 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
@@ -1,6 +1,6 @@
 /* Use conditional compare */
 /* { dg-options "-O2" } */
-/* { dg-skip-if "" { arm_thumb1_ok } } */
+/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
 /* { dg-final { scan-assembler "cmpne" } } */
 
 int f(int i, int j) 
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c 
b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
index 17d9a8f..108d1c3 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
@@ -1,6 +1,6 @@
 /* Use conditional compare */  
   
 /* { dg-options "-O2" } */
-/* { dg-skip-if "" { arm_thumb1_ok } } */
+/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
 /* { dg-final { scan-assembler "cmpeq" } } */
 
 int f(int i, int j) 
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c 
b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
index 6b2a79b..ca7fd9f 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
@@ -1,6 +1,6 @@
 /* Use conditional compare */  
   
 /* { dg-options "-O2" } */
-/* { dg-skip-if "" { arm_thumb1_ok } } */
+/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
 /* { dg-final { scan-assembler "cmpgt" } } */
 
 int f(int i, int j)
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c 
b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
index 80e1076..91cc8f4 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
@@ -1,6 +1,6 @@
 /* Use conditional compare */  
   
 /* { dg-options "-O2" } */
-/* { dg-skip-if "" { arm_thumb1_ok } } */
+/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
 /* { dg-final { scan-assembler "cmpgt" } } */
 
 int f(int i, int j)
-- 
2.7.4



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-07 Thread H.J. Lu via Gcc-patches
On Mon, Sep 7, 2020 at 7:06 AM Segher Boessenkool
 wrote:
>
> On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> > On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
> >  wrote:
> > > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > > You probably have to do this for every target separately?  But it is 
> > > > > not
> > > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > > done on every path that returns *without* epilogue.
> > > >
> > > > This feature is designed for normal return with epilogue.
> > >
> > > Very many normal returns do *not* pass through an epilogue, but are
> > > simple_return.  Disabling that is *much* more expensive than that 2%.
> >
> > Sibcall isn't covered.  What other cases don't have an epilogue?
>
> Shrink-wrapped stuff.  Quite important for performance.  Not something
> you can throw away.
>

Qing, can you check how it interacts with shrink-wrap?

-- 
H.J.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-07 Thread Segher Boessenkool
On Fri, Sep 04, 2020 at 01:23:14AM +, Rodriguez Bahena, Victor wrote:
> Qing, thanks a lot for the measurement, I am not sure if this is the limit of 
> overhead the community is willing to accept by adding extra security (me as 
> gcc user will be willing to accept). 

The overhead is of course bearable for most programs / users, but what
is the return?  For what percentage of programs are ROP attacks no
longer possible, for example.


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-07 Thread Segher Boessenkool
On Fri, Sep 04, 2020 at 02:00:41PM -0500, Qing Zhao wrote:
> >> However, if we only clear USED registers, the worst case is 1.72% on 
> >> average.  This overhead is very reasonable. 
> > 
> > No, that is the number I meant.  2% overhead is extremely much, unless
> > this is magically super effective, and actually protects many things
> > from exploitation (that aren't already protected some other way, SSP for
> > example).
> 
> Then how about the 0.81% overhead on average for 
> -fzero-call-used-regs=used_gpr_arg? 

That is still quite a lot.

> This option can be used to effectively mitigate ROP attack. 

Nice assertion.  Show it!

> > Yes.  Which is why I asked for numbers of both sides of the equation:
> > how much it costs, vs. how much value it brings.
> 
> Reasonable. 

I'm glad you agree :-)

> >> For compiler, we should provide such option to the users to satisfy their 
> >> security need even though the runtime overhead.  Of course, during 
> >> compiler implementation, we will do our best to minimize the runtime 
> >> overhead.
> > 
> > There also is a real cost to the compiler *developers*.  Which is my
> > prime worry here.  If this gives users at most marginal value, then it
> > is real cost to us, but nothing to hold up to that.
> 
> Here, you mean the future maintenance  cost  for this part of the code?

Not just that.  *All* support costs, and consider all other
optimisations it will interfere with, etc.


Segher


Re: [PATCH] aarch64: Remove redundant mult patterns

2020-09-07 Thread Alex Coplan
On 28/08/2020 11:00, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hello,
> >
> > Following on from the earlier patch to fix up the syntax for
> > add/sub/adds/subs and friends with a sign/zero-extended operand [0],
> > this patch removes the "mult" variants of these patterns which are
> > all redundant.
> >
> > This patch removes the following patterns from the AArch64 backend:
> >
> >  *adds_mul_imm_
> >  *subs_mul_imm_
> >  *adds__multp2
> >  *subs__multp2
> >  *add_mul_imm_
> >  *add__mult_
> >  *add__mult_si_uxtw
> >  *add__multp2
> >  *add_si_multp2_uxtw
> >  *add_uxt_multp2
> >  *add_uxtsi_multp2_uxtw
> >  *sub_mul_imm_
> >  *sub_mul_imm_si_uxtw
> >  *sub__multp2
> >  *sub_si_multp2_uxtw
> >  *sub_uxt_multp2
> >  *sub_uxtsi_multp2_uxtw
> >  *neg_mul_imm_2
> >  *neg_mul_imm_si2_uxtw
> >
> > Together with the following predicates which were used only by these
> > patterns:
> >
> >  - aarch64_pwr_imm3
> >  - aarch64_pwr_2_si
> >  - aarch64_pwr_2_di
> >
> > These patterns are all redundant since multiplications by powers of two
> > should be represented as shfits outside a (mem).
> >
> > Testing:
> >  * Bootstrapped and regtested on aarch64-none-linux-gnu (on top of [0]),
> >no regressions.
> >
> > OK for master (when applied after [0])?
> 
> That's a nice collection of minuses.
> 
> OK for trunk, thanks.  Since this depends on the RA patch, and since
> RA patches have a habit of exposing problems on other targets, it might
> be better to wait for a week or so before committing this one.
> Just a suggestion though -- go ahead and commit whenever you're
> comfortable committing.

Pushed to trunk, together with the previous patch to fix the
sign/zero-extend syntax:

2f8ae301f6a aarch64: Remove redundant mult patterns
d4febc75e8d aarch64: Don't emit invalid zero/sign-extend syntax

Thanks,
Alex

> 
> The patch might expose code quality regressions.  If so, that's
> probably a sign that some other pass needs a similar fix to the RA,
> even though the symptom is “just” a missed optimisation rather than
> an ICE.
> 
> Thanks,
> Richard


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-07 Thread Segher Boessenkool
On Fri, Sep 04, 2020 at 11:52:13AM -0700, H.J. Lu wrote:
> On Fri, Sep 4, 2020 at 11:09 AM Segher Boessenkool
>  wrote:
> > On Fri, Sep 04, 2020 at 10:34:23AM -0700, H.J. Lu wrote:
> > > > You probably have to do this for every target separately?  But it is not
> > > > enough to handle it in the epilogue, you also need to make sure it is
> > > > done on every path that returns *without* epilogue.
> > >
> > > This feature is designed for normal return with epilogue.
> >
> > Very many normal returns do *not* pass through an epilogue, but are
> > simple_return.  Disabling that is *much* more expensive than that 2%.
> 
> Sibcall isn't covered.  What other cases don't have an epilogue?

Shrink-wrapped stuff.  Quite important for performance.  Not something
you can throw away.


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-07 Thread Rodriguez Bahena, Victor via Gcc-patches


From: Qing Zhao 
Date: Friday, September 4, 2020 at 9:19 AM
To: "Rodriguez Bahena, Victor" , Kees Cook 

Cc: Segher Boessenkool , Jakub Jelinek 
, Uros Bizjak , GCC Patches 

Subject: Re: PING [Patch][Middle-end]Add 
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 8:23 PM, Rodriguez Bahena, Victor 
mailto:victor.rodriguez.bah...@intel.com>> 
wrote:



-Original Message-
From: Qing Zhao mailto:qing.z...@oracle.com>>
Date: Thursday, September 3, 2020 at 12:55 PM
To: Kees Cook mailto:keesc...@chromium.org>>
Cc: Segher Boessenkool 
mailto:seg...@kernel.crashing.org>>, Jakub Jelinek 
mailto:ja...@redhat.com>>, Uros Bizjak 
mailto:ubiz...@gmail.com>>, "Rodriguez Bahena, Victor" 
mailto:victor.rodriguez.bah...@intel.com>>, 
GCC Patches mailto:gcc-patches@gcc.gnu.org>>
Subject: Re: PING [Patch][Middle-end]Add 
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Sep 3, 2020, at 12:13 PM, Kees Cook 
mailto:keesc...@chromium.org>> wrote:

On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:

On average, all the options starting with “used_…”  (i.e, only the registers 
that are used in the routine will be zeroed) have very low runtime overheads, 
at most 1.72% for integer benchmarks, and 1.17% for FP benchmarks.
If all the registers will be zeroed, the runtime overhead is bigger, all_arg is 
5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks on average.
Looks like the overhead of zeroing vector registers is much bigger.

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, the 
runtime overhead with this is very small.

That looks great; thanks for doing those tests!

(And it seems like these benchmarks are kind of a "worst case" scenario
with regard to performance, yes? As in it's mostly tight call loops?)

   The top 3 benchmarks that have the most overhead from this option are: 
531.deepsjeng_r, 541.leela_r, and 511.povray_r.
   All of them are C++ benchmarks.
   I guess that the most important reason is  the smaller routine size in 
general (especially at the hot execution path or loops).
   As a result, the overhead of these additional zeroing instructions in each 
routine will be relatively higher.

   Qing

I think that overhead is expected in benchmarks like 541.leela_r, according to 
https://urldefense.com/v3/__https://www.spec.org/cpu2017/Docs/benchmarks/541.leela_r.html__;!!GqivPVa7Brio!I4c2wyzrNGbeOTsX7BSD-4C9Cv3ypQ4N1qfRzSK__STxRGa5M4VarBKof2ak8-dT$
  is a benchmark for Artificial Intelligence (Monte Carlo simulation, game tree 
search & pattern recognition). The addition of fzero-call-used-regs will 
represent an overhead each time the functions are being call and in areas like 
game tree search is high.

Qing, thanks a lot for the measurement, I am not sure if this is the limit of 
overhead the community is willing to accept by adding extra security (me as gcc 
user will be willing to accept).

From the performance data, we can see that the runtime overhead of clearing 
only_used registers is very reasonable, even for 541.leela_r, 531.deepsjent_r, 
and 511.povray.   If try to clear all registers whatever used or not in the 
current routine, the overhead will be increased dramatically.

So, my question is:

From the security point of view, does clearing ALL registers have more benefit 
than clearing USED registers?
From my understanding, clearing registers that are not used in the current 
routine does NOT provide additional benefit, correct me if I am wrong here.

You are right, it does not provide additional security


Thanks.

Qing



Regards

Victor




--
Kees Cook




Re: [PATCH] doc: Update documentation on MODE_PARTIAL_INT subregs

2020-09-07 Thread Richard Sandiford
Jozef Lawrynowicz  writes:
> In d8487c949ad5 (~GCC 4.9.0), MODE_PARTIAL_INT modes were changed from
> having an unknown number of undefined bits, to having a known number of
> undefined bits, however the documentation on using SUBREG expressions
> with MODE_PARTIAL_INT modes was not updated to reflect this.
>
> The attached patch updates that portion of the GCC internals
> documentation.
>
> Ok for trunk?

Thanks for doing this.  OK for trunk with one very minor nit:

> -does not guarantee that @samp{(subreg:HI (reg:PSI 0) 0)} has the
> -value @samp{(reg:HI 4)}.
> +sets the low 16 bits of @samp{(reg:PSI 0)} to @samp{(reg:HI 4)}, and
> +the high 4 defined bits of @samp{(reg:PSI 0)} retain their
> +original value.  The behavior here is therefore the same as

IMO reads more naturally as “the same as for”.

Richard

> +normal @code{subreg}s, when there are no
> +@code{MODE_PARTIAL_INT} modes involved.
>  
>  @cindex @code{TARGET_CAN_CHANGE_MODE_CLASS} and subreg semantics
>  The rules above apply to both pseudo @var{reg}s and hard @var{reg}s.


[PATCH] gas: Don't error when .debug_line already exists, unless .loc was used

2020-09-07 Thread Mark Wielaard
When -g was used to generate DWARF gas would error out when a .debug_line
already exists. But when a .debug_info section already exists it would
simply skip generating one without warning or error. Do the same for
.debug_line. It is only an error when the user explicitly uses .loc
directives and also generates the .debug_line table itself.

The tests are unfortunately arch specific because the line table is only
generated when actual instructions have been emitted. Use i386 because
that is probably the most used architecture. Before this patch the new
dwarf-line-2 testcase would fail, with this patch it succeeds (and doesn't
try to add its own line table).

gas/ChangeLog:

* as.texi (-g): Explicitly mention when .debug_info and .debug_line
are generated for the DWARF format.
(Loc): Add that it is an error to both use a .loc directive and
generate a .debug_line yourself.
* dwarf2dbg.c (dwarf2_any_loc_directive_seen): New static variable.
(dwarf2_directive_loc): Set dwarf2_any_loc_directive_seen to TRUE.
(dwarf2_finish): Check dwarf2_any_loc_directive_seen before emitting
an error. Only create .debug_line if it is empty (or doesn't exist).
* testsuite/gas/i386/i386.exp: Add dwarf2-line-{1,2,3,4} when testing
an elf target.
* testsuite/gas/i386/dwarf2-line-{1,2,3,4}.{s,d,l}: New test files.
---
 gas/ChangeLog  | 14 
 gas/doc/as.texi|  7 +-
 gas/dwarf2dbg.c| 29 +---
 gas/testsuite/gas/i386/dwarf2-line-1.d | 45 +
 gas/testsuite/gas/i386/dwarf2-line-1.s | 28 
 gas/testsuite/gas/i386/dwarf2-line-2.d | 48 ++
 gas/testsuite/gas/i386/dwarf2-line-2.s | 91 ++
 gas/testsuite/gas/i386/dwarf2-line-3.d |  3 +
 gas/testsuite/gas/i386/dwarf2-line-3.l |  2 +
 gas/testsuite/gas/i386/dwarf2-line-3.s | 32 +
 gas/testsuite/gas/i386/dwarf2-line-4.d | 46 +
 gas/testsuite/gas/i386/dwarf2-line-4.s | 29 
 gas/testsuite/gas/i386/i386.exp|  5 ++
 13 files changed, 369 insertions(+), 10 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-1.d
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-1.s
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-2.d
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-2.s
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-3.d
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-3.l
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-3.s
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-4.d
 create mode 100644 gas/testsuite/gas/i386/dwarf2-line-4.s

diff --git a/gas/ChangeLog b/gas/ChangeLog
index d34c08e924c..70d09729443 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,3 +1,17 @@
+2020-09-07  Mark Wielaard  
+
+   * as.texi (-g): Explicitly mention when .debug_info and .debug_line
+   are generated for the DWARF format.
+   (Loc): Add that it is an error to both use a .loc directive and
+   generate a .debug_line yourself.
+   * dwarf2dbg.c (dwarf2_any_loc_directive_seen): New static variable.
+   (dwarf2_directive_loc): Set dwarf2_any_loc_directive_seen to TRUE.
+   (dwarf2_finish): Check dwarf2_any_loc_directive_seen before emitting
+   an error. Only create .debug_line if it is empty (or doesn't exist).
+   * testsuite/gas/i386/i386.exp: Add dwarf2-line-{1,2,3,4} when testing
+   an elf target.
+   * testsuite/gas/i386/dwarf2-line-{1,2,3,4}.{s,d,l}: New test files.
+
 2020-09-04  Mark Wielaard  
 
* dwarf2dbg.c (add_line_strp): New function.
diff --git a/gas/doc/as.texi b/gas/doc/as.texi
index 112eaf810cd..f2a0314310d 100644
--- a/gas/doc/as.texi
+++ b/gas/doc/as.texi
@@ -745,7 +745,9 @@ compiler output).
 @itemx --gen-debug
 Generate debugging information for each assembler source line using whichever
 debug format is preferred by the target.  This currently means either STABS,
-ECOFF or DWARF2.
+ECOFF or DWARF2.  When the debug format is DWARF then a @code{.debug_info} and
+@code{.debug_line} section is only emitted when the assembly file doesn't
+generate one itself.
 
 @item --gstabs
 Generate stabs debugging information for each assembler line.  This
@@ -5857,7 +5859,8 @@ the @code{.loc} directive will add a row to the 
@code{.debug_line} line
 number matrix corresponding to the immediately following assembly
 instruction.  The @var{fileno}, @var{lineno}, and optional @var{column}
 arguments will be applied to the @code{.debug_line} state machine before
-the row is added.
+the row is added.  It is an error for the input assembly file to generate
+a non-empty @code{.debug_line} and also use @code{loc} directives.
 
 The @var{options} are a sequence of the following tokens in any order:
 
diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
index e0ea3991b45..1c21d58c591 100644
--- a/gas/dwarf2dbg.c
+++ b/gas/dwarf2dbg.c
@@ -226,9 +226,15 @@ static unsigned int  dirs_in_use = 0;
 static unsigned 

[PATCH] improve SLP vect dumping

2020-09-07 Thread Richard Biener
This adds additional dumping helping in particular basic-block
vectorization SLP dump reading plus showing what we actually
generate code from.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-09-07  Richard Biener  

* tree-vect-slp.c (vect_analyze_slp_instance): Dump
stmts we start SLP analysis from, failure and splitting.
(vect_schedule_slp): Dump SLP graph entry and root stmt
we are about to emit code for.
---
 gcc/tree-vect-slp.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c89ed04f479..dcc80d55917 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2202,6 +2202,15 @@ vect_analyze_slp_instance (vec_info *vinfo,
scalar_stmts.safe_push (next_info);
 }
 
+  if (dump_enabled_p ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "Starting SLP discovery for\n");
+  for (i = 0; i < scalar_stmts.length (); ++i)
+   dump_printf_loc (MSG_NOTE, vect_location,
+"  %G", scalar_stmts[i]->stmt);
+}
+
   /* Build the tree for the SLP instance.  */
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
@@ -2232,6 +2241,10 @@ vect_analyze_slp_instance (vec_info *vinfo,
  return false;
}
  /* Fatal mismatch.  */
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"SLP discovery succeeded but node needs "
+"splitting\n");
  matches[0] = true;
  matches[group_size / const_max_nunits * const_max_nunits] = false;
  vect_free_slp_tree (node, false);
@@ -2374,6 +2387,9 @@ vect_analyze_slp_instance (vec_info *vinfo,
  gcc_assert ((const_nunits & (const_nunits - 1)) == 0);
  unsigned group1_size = i & ~(const_nunits - 1);
 
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"Splitting SLP group at stmt %u\n", i);
  stmt_vec_info rest = vect_split_slp_store_group (stmt_info,
   group1_size);
  bool res = vect_analyze_slp_instance (vinfo, bst_map, stmt_info,
@@ -2395,6 +2411,9 @@ vect_analyze_slp_instance (vec_info *vinfo,
 (some) of the remainder.  FORNOW ignore this possibility.  */
 }
 
+  /* Failed to SLP.  */
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "SLP discovery failed\n");
   return false;
 }
 
@@ -4662,6 +4681,16 @@ vect_schedule_slp (vec_info *vinfo)
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
 {
   slp_tree node = SLP_INSTANCE_TREE (instance);
+  if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "Vectorizing SLP tree:\n");
+ if (SLP_INSTANCE_ROOT_STMT (instance))
+   dump_printf_loc (MSG_NOTE, vect_location, "Root stmt: %G",
+SLP_INSTANCE_ROOT_STMT (instance)->stmt);
+ vect_print_slp_graph (MSG_NOTE, vect_location,
+   SLP_INSTANCE_TREE (instance));
+   }
   /* Schedule the tree of INSTANCE.  */
   vect_schedule_slp_instance (vinfo, node, instance);
 
-- 
2.26.2


Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-07 Thread Richard Biener via Gcc-patches
On Mon, Sep 7, 2020 at 7:44 AM luoxhu  wrote:
>
> Hi,
>
> On 2020/9/4 18:23, Segher Boessenkool wrote:
> >> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> >> index 03b00738a5e..00c65311f76 100644
> >> --- a/gcc/config/rs6000/rs6000-c.c
> >> +++ b/gcc/config/rs6000/rs6000-c.c
> >> /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0. 
> >> */
> >> @@ -1654,15 +1656,8 @@ altivec_resolve_overloaded_builtin (location_t loc, 
> >> tree fndecl,
> >>SET_EXPR_LOCATION (stmt, loc);
> >>stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
> >>  }
> >> -
> >> -  innerptrtype = build_pointer_type (arg1_inner_type);
> >> -
> >> -  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
> >> -  stmt = convert (innerptrtype, stmt);
> >> -  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
> >> -  stmt = build_indirect_ref (loc, stmt, RO_NULL);
> >> -  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
> >> -convert (TREE_TYPE (stmt), arg0));
> >> +  stmt = build_array_ref (loc, stmt, arg2);
> >> +  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt, arg0);
> >> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> >> return stmt;
> >>   }
> > You should make a copy of the vector, not modify the original one in
> > place?  (If I read that correctly, heh.)  Looks good otherwise.
> >
>
> Segher, there  is already existed code to make a copy of vector as we
> discussed offline.  Thanks for the reminder.
>
> cat pr79251.c.006t.gimple
> __attribute__((noinline))
> test (__vector signed int v, int i, size_t n)
> {
>   __vector signed int D.3192;
>   __vector signed int D.3194;
>   __vector signed int v1;
>   v1 = v;
>   D.3192 = v1;
>   _1 = n & 3;
>   VIEW_CONVERT_EXPR(D.3192)[_1] = i;
>   v1 = D.3192;
>   D.3194 = v1;
>   return D.3194;
> }
>
> Attached the v2 patch which does:
> 1) Build VIEW_CONVERT_EXPR for vec_insert (i, v, n) like v[n%4] = i to
> unify the gimple code, then expander could use vec_set_optab to expand.
> 2) Recognize the pattern in expander and use optabs to expand
> VIEW_CONVERT_EXPR to vec_insert with variable index to fast instructions:
> lvsl+xxperm+xxsel.

Looking at the RTL expander side I see several issues.


@@ -5237,6 +5288,16 @@ expand_assignment (tree to, tree from, bool nontemporal)

   to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);

+  if (TREE_CODE (to) == ARRAY_REF)
+   {
+ tree op0 = TREE_OPERAND (to, 0);
+ if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
+ && expand_view_convert_to_vec_set (to, from, to_rtx))
+   {
+ pop_temp_slots ();
+ return;
+   }
+   }

you're placing this at an awkward spot IMHO, after to_rtx expansion
but disregading parts of  it and compensating just with 'to' matching.
Is the pieces (offset, bitpos) really too awkward to work with for
matching?

Because as written you'll miscompile

struct X { _vector signed int v; _vector singed int u; } x;

test(int i, int a)
{
  x.u[i] = a;
}

as I think you'll end up assigning to x.v.

Are we just interested in the case were we store to a
pseudo or also when the destination is memory?  I guess
only when it's a pseudo - correct?  In that case
handling this all in optimize_bitfield_assignment_op
is probably the best thing to try.

Note we possibly refrain from assigning a pseudo to
such vector because we see a variable array-ref to it.

Richard.

> Thanks,
> Xionghu


Re: [PATCH] C-SKY: Add fpuv3 instructions and CK860 arch

2020-09-07 Thread Xianmiao Qu

Hi Gengqi,

Is this patch based on the master branch? I cannot apply it successfully.


Cooper


On 8/27/20 5:33 PM, gengqi via Gcc-patches wrote:

From: gengq 

gcc/ChangeLog:

* config/csky/constraints.md (W): New constriant for mem operand with
a base reg with a index register.
(Q): Renamed and modified "csky_valid_fpuv2_mem_operand" to
"csky_valid_mem_constraint_operand" to deal with both "Q" and "W"
constraint.
(Dv): New constraint for const double value that can be used at fmovi
instruction.
* config/csky/csky-modes.def (HFmode): New mode.
* config/csky/csky-protos.h (csky_valid_mem_constraint_operand):
Declare.
(get_output_csky_movedouble_length): Declare.
(fpuv3_output_move): Declare.
(fpuv3_const_double): Declare.
* config/csky/csky.c (csky_option_override): New arch CK860 with fpv3.
(decompose_csky_address): Robustness adjust.
(csky_print_operand): New "CONST_DOUBLE" operand.
(csky_output_move): New support for fpv3 instructions.
(get_output_csky_movedouble_length): New function.
(fpuv3_output_move): New function.
(fpuv3_const_double): New function.
(csky_emit_compare): New cover for float comparsion.
(csky_emit_compare_float): Refine.
(csky_valid_mem_constraint_operand): Rename from
"csky_valid_fpuv2_mem_operand" and new support for constraint "W".
(ck860_rtx_costs): New function.
(csky_rtx_costs): New subcall for CK860.
* gcc/config/csky/csky.h (TARGET_TLS): Add CK860.
* gcc/config/csky/csky.md (movsf): Delete, and achieve it at
csky_insn_fpu.md.
(ck801_movsf): Delete, and achieve it at csky_insn_fpu.md.
(csky_movsf): Delete, and achieve it at csky_insn_fpu.md.
(movdf): Delete, achieve it at csky_insn_fpu.md.
(ck801_movdf): Delete, and achieve it at csky_insn_fpu.md.
(csky_movdf): Delete, and achieve it at csky_insn_fpu.md.
(csky_movsf_fpv2): Delete, and achieve it at csky_insn_fpuv2.md.
(csky_movdf_fpv2): Delete, and achieve it at csky_insn_fpuv2.md.
(movsicc): Refine. Use "comparison_operatior" instead of
"ordered_comparison_operatior".
(addsicc): Likewise.
* config/csky/csky_cores.def (CK860): New arch and cpu.
(fpv3_hf, fpv3_hsf, fpv3_sdf, fpv3): New fpus.
* config/csky/csky_insn_fpu.md: Refactor. For fpuv2, separate all float
patterns into emit-patterns and match-patterns, remain the
emit-patterns here, and move the match-patterns to csky_insn_fpuv2.md.
For fpuv3, add patterns and fuse part of them with the fpuv2's.
* config/csky/csky_insn_fpuv2.md: New file for fpuv2 instructions.
* config/csky/csky_insn_fpuv3.md: New flie and new patterns for fpuv3
isntructions.
* config/csky/csky_isa.def (fcr): New isa.
(fpv3_hi, fpv3_hf, fpv3_sf, fpv3_df): New isa.
(CK860): New definition.
* gcc/config/csky/csky_tables.opt (ck860): New processors ck860,
ck860f. And new arch ck860.
(fpv3_hf, fpv3_hsf, fpv3_sdf, fpv3): New fpus.
* config/csky/predicates.md (csky_float_comparsion_operator): Delete
"geu", "gtu", "leu", "ltu", which will never appear at float
comparison.
* doc/md.texi: Add "Q" and "W" constraints for C-SKY.

---
  gcc/config/csky/constraints.md |  13 +-
  gcc/config/csky/csky-modes.def |   2 +
  gcc/config/csky/csky-protos.h  |   7 +-
  gcc/config/csky/csky.c | 528 +++
  gcc/config/csky/csky.h |   2 +-
  gcc/config/csky/csky.md| 120 +
  gcc/config/csky/csky_cores.def |  13 +
  gcc/config/csky/csky_insn_fpu.md   | 797 -
  gcc/config/csky/csky_insn_fpuv2.md | 470 +
  gcc/config/csky/csky_insn_fpuv3.md | 418 +++
  gcc/config/csky/csky_isa.def   |  15 +
  gcc/config/csky/csky_tables.opt|  21 +
  gcc/config/csky/predicates.md  |   3 +-
  gcc/doc/md.texi|   8 +
  14 files changed, 1752 insertions(+), 665 deletions(-)
  create mode 100644 gcc/config/csky/csky_insn_fpuv2.md
  create mode 100644 gcc/config/csky/csky_insn_fpuv3.md

diff --git a/gcc/config/csky/constraints.md b/gcc/config/csky/constraints.md
index b9990b7dac9..874f698411b 100644
--- a/gcc/config/csky/constraints.md
+++ b/gcc/config/csky/constraints.md
@@ -34,7 +34,11 @@
  
  (define_memory_constraint "Q"

"Memory operands with base register, index register and short displacement for 
FPUV2"
-  (match_test "csky_valid_fpuv2_mem_operand (op)"))
+  (match_test "csky_valid_mem_constraint_operand (op, \"Q\")"))
+
+(define_memory_constraint "W"
+  "memory operands with base register, index register"
+  (match_test "csky_valid_mem_constraint_operand (op, \"W\")"))
  
  (define_constraint "R"

"Memory operands whose 

Re: Do we need to do a loop invariant motion after loop interchange ?

2020-09-07 Thread Richard Biener via Gcc-patches
On Mon, Sep 7, 2020 at 11:43 AM HAO CHEN GUI via Gcc-patches
 wrote:
>
> Hi,
>
> I want to follow Lijia's work as I gained the performance benefit on
> some SPEC workloads by adding a im pass after loop interchange.  Could
> you send me the latest patches? I could do further testing. Thanks a lot.
>
> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html

Given the patches are quite complex I wonder if executing region-based LIM
from interchange makes more sense here.  Martin might want to work on this.

Richard.


Re: [PATCH] vec: remove unreachable code

2020-09-07 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi Andrea,
>
> on 2020/9/4 锟斤拷锟斤拷8:11, Andrea Corallo wrote:
>> Hi all,
>> 
>> just a small patch removing a piece of unreachable code in
>> 'vect_estimate_min_profitable_iters' given the condition
>> (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) is always true as
>> checked just above.
>>  
>
> FWIW, I had the same confusion when I saw the code at the first time,
> the comments at outer if "??? The "if" arm ..." and the commit messages
> seem to give some hints.  IIUC, the code in this outer "if" was
> implemented to handle all cases but that time was stage 4, so it's
> conservative to be guarded as fully-predicated only to avoid the
> potential risks.  I was imagining that it would finally end up
> by removing this outer if condition guard and the outer else.

Right.  So the idea was to show what the code would look like if we
removed the outer “if” condition.  We would then always calculate the
estimate based on the minimum number of profitable vector iterations
first, then convert that into a minimum number of profitable scalar
iterations.

That still seems like a promising future direction.  It would make the
estimate calculations more consistent and make it easier to compare the
cost of full-vector loops with the cost of partial-vector loops.

> But not sure why it didn't change in the following stage1.

Excess lameness on my part, sorry.

> Maybe Richard S. (the author) will have some comments.

TBH I'd prefer for the patch to be reverted.  Maybe we could add
a version of the existing:

  /* ??? The "if" arm is written to handle all cases; see below for what
 we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P.  */

comment above:

  else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{

E.g.:

  /* ??? This "else if" arm is written to handle all cases; see below for
 what we would do for !LOOP_VINFO_USING_PARTIAL_VECTORS_P.  */

As:

  /* This is a repeat of the code above, but with + SOC rather
 than - SOC.  */

says, this block is essentially repeating the logic from the earlier block,
so at the time it didn't seem like a good idea to duplicate the commentary.

Either way, the two blocks should be kept in sync, so if we want to remove
the path from this block, we should remove it from the earlier block too.
But my preference would be to keep the path in both blocks.

I'm not foolish enough to promise that I'll test/benchmark removing the
outer “if” condition in this stage 1 cycle, but I'll at least try to find
time…

Thanks,
Richard


[PING] floatformat.h: Add bfloat16 support.

2020-09-07 Thread Willgerodt, Felix via Gcc-patches
Kindly pinging.

Felix

-Original Message-
From: Felix Willgerodt  
Sent: Montag, 17. August 2020 13:37
To: Willgerodt, Felix ; gcc-patches@gcc.gnu.org
Subject: [PATCH] floatformat.h: Add bfloat16 support.

This change is motivated by a patchset that adds bfloat16 debugging support for 
new avx512 instructions to GDB. The gdb thread can be found
here: https://sourceware.org/pipermail/gdb-patches/2020-July/170820.html

include:
2020-08-17  Felix Willgerodt  

* floatformat.h (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.

libiberty:
2020-08-17  Felix Willgerodt  

* floatformat.c (floatformat_bfloat16_big): New.
(floatformat_bfloat16_little): New.
---
 include/floatformat.h   |  3 +++
 libiberty/floatformat.c | 19 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/floatformat.h b/include/floatformat.h index 
ce8d6d4add8..630fade0449 100644
--- a/include/floatformat.h
+++ b/include/floatformat.h
@@ -133,6 +133,9 @@ extern const struct floatformat 
floatformat_ia64_quad_little;
 /* IBM long double (double+double).  */  extern const struct floatformat 
floatformat_ibm_long_double_big;  extern const struct floatformat 
floatformat_ibm_long_double_little;
+/* bfloat16.  */
+extern const struct floatformat floatformat_bfloat16_big; extern const 
+struct floatformat floatformat_bfloat16_little;
 
 /* Convert from FMT to a double.
FROM is the address of the extended float.
diff --git a/libiberty/floatformat.c b/libiberty/floatformat.c index 
2fd5e688ec4..6b9b03288e2 100644
--- a/libiberty/floatformat.c
+++ b/libiberty/floatformat.c
@@ -389,7 +389,24 @@ const struct floatformat 
floatformat_ibm_long_double_little =
   floatformat_ibm_long_double_is_valid,
   _ieee_double_little
 };
-

+
+const struct floatformat floatformat_bfloat16_big = {
+  floatformat_big, 16, 0, 1, 8, 127, 255, 9, 7,
+  floatformat_intbit_no,
+  "floatformat_bfloat16_big",
+  floatformat_always_valid,
+  NULL
+};
+
+const struct floatformat floatformat_bfloat16_little = {
+  floatformat_little, 16, 0, 1, 8, 127, 255, 9, 7,
+  floatformat_intbit_no,
+  "floatformat_bfloat16_little",
+  floatformat_always_valid,
+  NULL
+};
 
 #ifndef min
 #define min(a, b) ((a) < (b) ? (a) : (b))
--
2.25.4

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


Re: *PING* [Patch] Fortran: Fixes for pointer function call as variable (PR96896)

2020-09-07 Thread Thomas Koenig via Gcc-patches

Hi Tobias,


*PING*


OK.

Thanks for the patch!

Regards

Thomas


[committed][libatomic, testsuite] Add missing include in atomic-generic.c

2020-09-07 Thread Tom de Vries
Hi,

When compiling atomic-generic.c from the libatomic testsuite, we run into:
...
$ gcc src/libatomic/testsuite/libatomic.c/atomic-generic.c -latomic
src/libatomic/testsuite/libatomic.c/atomic-generic.c: In function ‘main’:
src/libatomic/testsuite/libatomic.c/atomic-generic.c:31:7: warning: \
  implicit declaration of function ‘memcmp’ [-Wimplicit-function-declaration]
   if (memcmp (, , size))
   ^~
...

Fix this by adding the missing string.h include.

Tested on x86_64.

Committed to trunk.

Thanks,
- Tom

[libatomic, testsuite] Add missing include in atomic-generic.c

libatomic/ChangeLog:

* testsuite/libatomic.c/atomic-generic.c: Include string.h.

---
 libatomic/testsuite/libatomic.c/atomic-generic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libatomic/testsuite/libatomic.c/atomic-generic.c 
b/libatomic/testsuite/libatomic.c/atomic-generic.c
index 11ef6c5831d..662467ea82a 100644
--- a/libatomic/testsuite/libatomic.c/atomic-generic.c
+++ b/libatomic/testsuite/libatomic.c/atomic-generic.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 extern void abort();
 


RE: [PATCH v2] doc: add 'cd' command before 'make check-gcc' command in install.texi

2020-09-07 Thread Hu, Jiangping
Hi, H-P

Thanks for comment.

> On Sat, 29 Aug 2020, Hu Jiangping wrote:
> 
> > This patch add 'cd' command before 'make check-gcc' command
> > when run the testsuite on selected tests.
> 
> No, don't do that; those targets work fine from the toplevel
> too, and then include the language libs.
Yes, I know that 'make check-gcc' work well from the toplevel,
but 'make check-g++' does not. Is there anything wrong with
the Makefile?

I pasted the original error in the v1 submitting text.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552761.html

> 
> > Richard and I agree it would be good for clarity and
> > emphasis to have the cd in the example as well, although
> > the text above the example was trying to restrict that to
> > objdir/gcc.
> 
> Aha, I guess you misinterpreted the existing "cd @var{objdir};
> make -k check", because I see no /gcc there.
Does the "cd @var{objdir}; make -k check" mean to execute command
from the toplevel.

I understand the context like this:
1. We can execute 'make -k check' from the toplevel to run all testsuites
2. And also, We can execute 'make check-gcc' from gcc subdirectory
   to run selected targets
3. And 'A more selective way', We can execute command with 'RUNTESTFLAGS' 
   option to run selected test(s).

"In order to run sets of tests selectively, there are targets
@samp{make check-gcc}, @samp{make check-g++} and language specific
@samp{make check-c},
@samp{make check-c++}, @samp{make check-d}, @samp{make check-fortran},
@samp{make check-ada}, @samp{make check-objc}, @samp{make 
check-obj-c++},
@samp{make check-lto}
in the @file{gcc} subdirectory of the object directory.  You can also
just run @samp{make check} in a subdirectory of the object directory."

I think the text above the 'make check-gcc' command implies point 2.
So, if the following commands which are examples of point 3 are also
executed from gcc subdirectory, it will make the context more logical
(The tests to be performed are more specific than point 2).

But it does not mean the commands must be executed from gcc subdirectory.
If anyone wants to execute the testsuites including the target libraries,
he should refer to other documents, or directly the Makefile under toplevel.
Or we can make a note here to illustrate that.

i.e.:
Note that if run 'make check-testsuite' from the object directory,
  not only the tests under gcc subdirectory but also the tests under
  the target libriaries will be performed.

What do you think?

Regards!
Hujp

> 
> Please remove the /gcc from the patch.
> 
> Some .exp just exist in the gcc dir, but that's not
> the point of these examples and the extra iteration from the
> toplevel doesn't take more than a second or two.  The bigger
> risk is missing testing in the target libraries.
> 
> (Incidentally, while GNU make is required such that "-C objdir"
> would be equivalent, for newcomers I agree the "cd" is
> clearer in examples.)
> 
> brgds, H-P
> 





Do we need to do a loop invariant motion after loop interchange ?

2020-09-07 Thread HAO CHEN GUI via Gcc-patches

Hi,

I want to follow Lijia's work as I gained the performance benefit on 
some SPEC workloads by adding a im pass after loop interchange.  Could 
you send me the latest patches? I could do further testing. Thanks a lot.


https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html



*PING* [Patch] Fortran: Fixes for pointer function call as variable (PR96896)

2020-09-07 Thread Tobias Burnus

*PING*

On 9/2/20 5:02 PM, Tobias Burnus wrote:

During some discussion such an example as attached came up:
  f() = 0.0
where 'f' is a function which returns a pointer to an array.
This gets handled as
  _F.D0 => f()
  _F.D0 = 0.0
However, the first line did fail with a rank error as the rank
was taken from the RHS.

Changing this to the LHS express failed due to 'use_assoc',
which added an 'extern' to the variable and 'proc_pointer'
also caused problems – in principle, either problem could
have also occurred for the RHS.

Side effect: The error message is better for rank mismatch
as for 'f() = a' no pointer assignment is involved (in terms
of the user code) but before we had the error message
'Different ranks in pointer assignment'.

OK?

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [r11-1851 Regression] FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-07 Thread Richard Biener
On Mon, 7 Sep 2020, Hongtao Liu wrote:

> On Mon, Aug 31, 2020 at 8:35 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > On Mon, Aug 31, 2020 at 12:25 AM Richard Biener  wrote:
> > >
> > > On Sat, 29 Aug 2020, sunil.k.pandey wrote:
> > >
> > > > On Linux/x86_64,
> > > >
> > > > dccbf1e2a6e544f71b4a5795f0c79015db019fc3 is the first bad commit
> > > > commit dccbf1e2a6e544f71b4a5795f0c79015db019fc3
> > > > Author: Richard Biener 
> > > > Date:   Mon Jul 6 16:26:50 2020 +0200
> > > >
> > > > tree-optimization/96075 - fix bogus misalignment calculation
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.dg/vect/slp-46.c -flto -ffat-lto-objects  
> > > > scan-tree-dump-times vect "vectorizing stmts using SLP" 2
> > > > FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts 
> > > > using SLP" 2
> > >
> > > It might be useful to point out that the commit in question added the
> > > testcase that "regressed" plus that you are using a non-standard
> > > configuration (x86 vectorization unfortunately is too fragmented to
> > > produce fully attributed or generic testcases that will pass with
> > > any configuration).
> > >
> > > So the appropriate report would be "FAIL: gcc.dg/vect/slp-46.c
> > > with -march=cascadelake" which isn't a regression.
> > >
> 
> Add /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ ?

Works for me.

Richard.


Re: [r11-1851 Regression] FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-07 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 31, 2020 at 8:35 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Mon, Aug 31, 2020 at 12:25 AM Richard Biener  wrote:
> >
> > On Sat, 29 Aug 2020, sunil.k.pandey wrote:
> >
> > > On Linux/x86_64,
> > >
> > > dccbf1e2a6e544f71b4a5795f0c79015db019fc3 is the first bad commit
> > > commit dccbf1e2a6e544f71b4a5795f0c79015db019fc3
> > > Author: Richard Biener 
> > > Date:   Mon Jul 6 16:26:50 2020 +0200
> > >
> > > tree-optimization/96075 - fix bogus misalignment calculation
> > >
> > > caused
> > >
> > > FAIL: gcc.dg/vect/slp-46.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > > vect "vectorizing stmts using SLP" 2
> > > FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts 
> > > using SLP" 2
> >
> > It might be useful to point out that the commit in question added the
> > testcase that "regressed" plus that you are using a non-standard
> > configuration (x86 vectorization unfortunately is too fragmented to
> > produce fully attributed or generic testcases that will pass with
> > any configuration).
> >
> > So the appropriate report would be "FAIL: gcc.dg/vect/slp-46.c
> > with -march=cascadelake" which isn't a regression.
> >

Add /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ ?

>
> We analyze and report any new failures.  When a testcase fails
> with -march=cascadelake, it can be
>
> 1. The testcase doesn't expect AVX512.
> 2. Compiler fails to handle AVX512.
>
> In any case, there are failures which didn't exist before the commit.
>
> --
> H.J.



-- 
BR,
Hongtao


Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-09-07 Thread Dennis Zhang
Hi Ramana,

On 8/21/20 10:33 PM, Ramana Radhakrishnan wrote:
> On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang  wrote:
>>
>>
>> Hi all,
>>
>> This patch enables MVE vsub instructions for auto-vectorization.
>> It adds RTL templates for MVE vsub instructions using 'minus' instead of
>> unspec expression to make the instructions recognizable for vectorization.
>> MVE target is added in sub3 optab. The sub3 optab is
>> modified to use a mode iterator that selects available modes for various
>> targets correspondingly.
>> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
>> support vectorization.
>>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>> MVE instructions are documented here:
>> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>>
> 
> Hi Dennis,
> 
> Thanks for this patch . However a quick read suggests  at first glance
> that it could do with some refactoring or indeed further breaking
> down.
> 
> 1. The refactor for TARGET_NEON_IWWMMXT and friends which I don't get
> the motivation for obviously on a quick read. I'll try and read that
> again. Please document why these complex TARGET_ macros exist and how
> they are expected to be used in the machine description and what they
> are indicated to do.

Thanks for the questions.
The macros are used in the iterators as conditions to enable modes 
separately for different targets. The reason to define these macros is 
to make the iterators short.
And about why using conditions for the iterators, the aim is to put 
different modes in a single expander. Otherwise the expander would 
repeat several times for different sets of modes supported by different 
targets.

> 2. It seems odd that we would have
>   "&& ((mode != V2SFmode && mode != V4SFmode)
> +|| flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
> apply this to TARGET_MVE_FLOAT in the sub3 expander. The point
> is that if it isn't safe to vectorize a subtract for Neon, why is it
> safe to do the same for MVE ? This was done in 2010 by Julian to fix
> PR target/43703 - isn't this applicable on MVE as well ?

I agree with this after investigation. I've add 
flag_unsafe_math_optimizations fot MVE_FLOAT target.

> 3. I'm also going to quibble a bit about the use of VSEL as the name
> of an iterator as that conflates it with the instruction vsel and it's
> not obvious what's going on here.

I have changed the name to VNIM_COND, which means NONE, IWWMMXT and MVE 
according to conditions.
I've add comments to document the aim of the iterator.
Please let me know if you think it needs further fix.

> 
> 
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
> 
> I'm a bit confused as to why this got exposed because of the new MVE
> vector modes exposed by this patch.

The aim of the tests is only to check the reinterpret intrinsics working 
well.
However the two functions in each test contain icf optimization pattern 
and then the second function is folded due to same code. The icf pattern 
is not expected but to make the test pass, the author only checked the 
instruction count for the first function.
With my patch that enables MVE vector modes in arm_preferred_simd_mode, 
the estimated code size is smaller so that the code is inlined from the 
first function back to the second one in inlining optimization after icf 
optimization. Then the instruction count changes.
Because the icf is not the expected pattern to be tested but causes 
above mentioned issues, -fno-ipa-icf is used to avoid unstable 
instruction count in these tests.

> 
>> The patch is regtested for arm-none-eabi and bootstrapped for
>> arm-none-linux-gnueabihf.
>>
> Bootstrapped and regression tested for arm-none-linux-gnueabihf with a
> --with-fpu=neon in the configuration ?

Yes, for arm-none-linux-gnueabihf bootstrap there is --with-fpu=neon.
Should I test it without this configuration?

The new patch is attached.
I updated the comments for the iterator and the macros.

Many thanks!
Dennis

gcc/ChangeLog:

2020-08-27  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
(TARGET_NEON_MVE_HFP): Likewise.
* config/arm/iterators.md (VNIM_COND): New mode iterator to enable
modes according to corresponding targets.
* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using 

Re: [PATCH] lto: Stream edge goto_locus [PR94235]

2020-09-07 Thread Richard Biener
On Sat, 5 Sep 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds streaming of edge goto_locus (both LOCATION_LOCUS
> and LOCATION_BLOCK from it), the PR shows a testcase (inappropriate for
> gcc testsuite) where the lack of streaming of goto_locus results in worse
> debug info.
> Earlier version of the patch (without the output_function changes) failed
> miserably, because on the order mismatch - input_function would
> first input_cfg, then input_eh_regions and then input_bb (all of which now
> have locations), while output_function used output_eh_regions, then output_bb
> and then output_cfg.  *_cfg went to a separate stream...
> Now, is there a reason why the order is different?

It's probably simply an oversight.

> Bootstrap/regtest on x86_64-linux and i686-linux went fine after changing 
> that,
> including lto bootstrap on x86_64-linux.

OK.

Thanks,
Richard.

> If the intent is that the cfg could be read separately from the rest of
> function or vice versa, alternatively we'd need to clear_line_info ();
> before output_eh_regions and before/after output_cfg to make them
> independent.
> 
> 2020-09-05  Jakub Jelinek  
> 
>   PR debug/94235
>   * lto-streamer-out.c (output_cfg): Also stream goto_locus for edges.
>   Use bp_pack_var_len_unsigned instead of streamer_write_uhwi to stream
>   e->dest->index and e->flags.
>   (output_function): Call output_cfg before output_ssa_name, rather than
>   after streaming all bbs.
>   * lto-streamer-in.c (input_cfg): Stream in goto_locus for edges.
>   Use bp_unpack_var_len_unsigned instead of streamer_read_uhwi to stream
>   in dest_index and edge_flags.
> 
> --- gcc/lto-streamer-out.c.jj 2020-09-04 11:53:22.673647471 +0200
> +++ gcc/lto-streamer-out.c2020-09-04 22:42:14.595657649 +0200
> @@ -2100,9 +2100,11 @@ output_cfg (struct output_block *ob, str
>streamer_write_uhwi (ob, EDGE_COUNT (bb->succs));
>FOR_EACH_EDGE (e, ei, bb->succs)
>   {
> -   streamer_write_uhwi (ob, e->dest->index);
> +   bitpack_d bp = bitpack_create (ob->main_stream);
> +   bp_pack_var_len_unsigned (, e->dest->index);
> +   bp_pack_var_len_unsigned (, e->flags);
> +   stream_output_location_and_block (ob, , e->goto_locus);
> e->probability.stream_out (ob);
> -   streamer_write_uhwi (ob, e->flags);
>   }
>  }
>  
> @@ -2418,6 +2420,8 @@ output_function (struct cgraph_node *nod
>streamer_write_uhwi (ob, 1);
>output_struct_function_base (ob, fn);
>  
> +  output_cfg (ob, fn);
> +
>/* Output all the SSA names used in the function.  */
>output_ssa_names (ob, fn);
>  
> @@ -2430,8 +2434,6 @@ output_function (struct cgraph_node *nod
>  
>/* The terminator for this function.  */
>streamer_write_record_start (ob, LTO_null);
> -
> -  output_cfg (ob, fn);
> }
>else
>  streamer_write_uhwi (ob, 0);
> --- gcc/lto-streamer-in.c.jj  2020-09-04 11:55:08.069114502 +0200
> +++ gcc/lto-streamer-in.c 2020-09-04 13:26:04.439987053 +0200
> @@ -780,23 +780,19 @@ input_cfg (class lto_input_block *ib, cl
>/* Connect up the CFG.  */
>for (i = 0; i < edge_count; i++)
>   {
> -   unsigned int dest_index;
> -   unsigned int edge_flags;
> -   basic_block dest;
> -   profile_probability probability;
> -   edge e;
> -
> -   dest_index = streamer_read_uhwi (ib);
> -   probability = profile_probability::stream_in (ib);
> -   edge_flags = streamer_read_uhwi (ib);
> -
> -   dest = BASIC_BLOCK_FOR_FN (fn, dest_index);
> +   bitpack_d bp = streamer_read_bitpack (ib);
> +   unsigned int dest_index = bp_unpack_var_len_unsigned ();
> +   unsigned int edge_flags = bp_unpack_var_len_unsigned ();
> +   basic_block dest = BASIC_BLOCK_FOR_FN (fn, dest_index);
>  
> if (dest == NULL)
>   dest = make_new_block (fn, dest_index);
>  
> -   e = make_edge (bb, dest, edge_flags);
> -   e->probability = probability;
> +   edge e = make_edge (bb, dest, edge_flags);
> +   data_in->location_cache.input_location_and_block (>goto_locus,
> + , ib, data_in);
> +   e->probability = profile_probability::stream_in (ib);
> +
>   }
>  
>index = streamer_read_hwi (ib);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Patch for 96948

2020-09-07 Thread Kirill Müller via Gcc-patches

Hi


As requested, attaching a patch for 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96948. This solves a 
problem with _Unwind_Backtrace() on mingw64 + SEH.



Best regards

Kirill

>From bbc163476cab9bcaaa044d8aa9ecee824f7284f5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kirill=20M=C3=BCller?= 
Date: Sun, 6 Sep 2020 10:01:46 +0200
Subject: [PATCH] Fix _Unwind_Backtrace() for SEH

---
 libgcc/unwind-seh.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 8c6aade9a3b3..a2d51b6df02b 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -450,8 +450,11 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace,
   gcc_context.disp->ContextRecord = _context;
   gcc_context.disp->HistoryTable = _history;
 
+  int first = 1;
   while (1)
 {
+  gcc_context.cfa = ms_context.Rsp;
+  gcc_context.ra = ms_context.Rip;
   gcc_context.disp->ControlPc = ms_context.Rip;
   gcc_context.disp->FunctionEntry
 	= RtlLookupFunctionEntry (ms_context.Rip, _context.disp->ImageBase,
@@ -466,9 +469,14 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace,
 			_context.disp->HandlerData,
 			_context.disp->EstablisherFrame, NULL);
 
-  /* Call trace function.  */
-  if (trace (_context, trace_argument) != _URC_NO_REASON)
-	return _URC_FATAL_PHASE1_ERROR;
+  /* Call trace function, skip first call.  */
+  if (first) {
+	first = 0;
+  }
+  else {
+	if (trace (_context, trace_argument) != _URC_NO_REASON)
+	  return _URC_FATAL_PHASE1_ERROR;
+  }
 
   /* ??? Check for invalid stack pointer.  */
   if (ms_context.Rip == 0)