Re: [PATCH 3/4] PowerPC: Add power10 xsmaxcqp/xsmincqp support

2020-08-27 Thread Michael Meissner via Gcc-patches
On Thu, Aug 27, 2020 at 03:47:19PM -0500, will schmidt wrote:
> > (Fm): New mode attribute for floating point scalars.
> 
> Mixed feels on mixed case, but I defer. :-) 

It is similar to other mode attributes (Ff, Fv) used for setting constraints
based on the mode.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


RE: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

2020-08-27 Thread Qian, Jianhua
Hi Iain

Iain Sandoe  wrote:
>Richard Sandiford  wrote:
>> "Qian, Jianhua"  writes:
>>> Hi Richard
>>> 
>>> I found that some instructions are using '#' before immediate value,
>>> and others are not. For example
>>> (define_insn "insv_imm"
>>>  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
>>> 
>>> 
>>> (match_operand:GPI 2 "const_int_operand" "n"))]  "UINTVAL 
>>> (operands[1]) < GET_MODE_BITSIZE (mode)
>>>   && UINTVAL (operands[1]) % 16 == 0"
>>>  "movk\\t%0, %X2, lsl %1"
>>>  [(set_attr "type" "mov_imm")]
>>> )
>>> 
>>> Are there any standards for this?
>> 
>> No, it's unfortunately inconsistent.  FWIW, if we were going to change 
>> this, personally I've a slight preference for having the “#”.
>
>Absence of the # makes assemblers based on the LLVM backend reject GCC’s 
>output, as such I’ve got a strong preference for adding it (I’ve got some 
>local patches for this already).
>e.g.
>https://github.com/iains/gcc-darwin-arm64/commit/526ffb6b34ddb848853016cd14a438683aa0e6de

That's good, we are of the same mind.

Regards
Qian





[PATCH] c: Silently ignore pragma region [PR85487]

2020-08-27 Thread Austin Morton via Gcc-patches
#pragma region is a feature introduced by Microsoft in order to allow
manual grouping and folding of code within Visual Studio.  It is
entirely ignored by the compiler.  Clang has supported this feature
in MSVC compatibility mode since 2012, and enabled it across the
board 3 months ago.

As it stands, you cannot use #pragma region within GCC without
disabling unknown pragma warnings, which is not advisable.

I propose GCC adopt "#pragma region" and "#pragma endregion" in order
to alleviate these issues.  Because the pragma has no purpose at
compile time, the implementation is trivial.
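
For illustration, a minimal translation unit using the pragmas (today
this triggers -Wunknown-pragmas with GCC; Clang and MSVC already accept
it silently):

  #include <stdio.h>

  #pragma region Accessors

  static int x;
  static int get_x (void) { return x; }
  static void set_x (int value) { x = value; }

  #pragma endregion

  int
  main (void)
  {
    set_x (42);
    printf ("%d\n", get_x ());
    return 0;
  }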


Microsoft Documentation on the feature:
https://docs.microsoft.com/en-us/cpp/preprocessor/region-endregion

LLVM change which enabled pragma region across the board:
https://reviews.llvm.org/D42248

---
 gcc/c-family/ChangeLog   |  5 +
 gcc/c-family/c-pragma.c  | 10 ++
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/pragma-region.c | 21 +
 4 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pragma-region.c

diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index 1eaa99f31..97ba259cd 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-27  Austin Morton  
+
+ PR c/85487
+ * c-pragma.c (handle_pragma_region): Declare.
+
 2020-08-11  Jakub Jelinek  

  PR c/96545
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index e3169e68f..de0411d07 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1166,6 +1166,13 @@ handle_pragma_message (cpp_reader *ARG_UNUSED(dummy))
 TREE_STRING_POINTER (message));
 }

+/* Silently ignore region pragmas.  */
+
+static void
+handle_pragma_region (cpp_reader *ARG_UNUSED(dummy))
+{
+}
+
 /* Mark whether the current location is valid for a STDC pragma.  */

 static bool valid_location_for_stdc_pragma;
@@ -1584,6 +1591,9 @@ init_pragma (void)

   c_register_pragma_with_expansion (0, "message", handle_pragma_message);

+  c_register_pragma (0, "region", handle_pragma_region);
+  c_register_pragma (0, "endregion", handle_pragma_region);
+
 #ifdef REGISTER_TARGET_PRAGMAS
   REGISTER_TARGET_PRAGMAS ();
 #endif
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5c1a45716..4033c111c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-27  Austin Morton  
+
+ PR c/85487
+ * gcc.dg/pragma-region.c: New test.
+
 2020-08-26  Jeff Law  

  * gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
diff --git a/gcc/testsuite/gcc.dg/pragma-region.c
b/gcc/testsuite/gcc.dg/pragma-region.c
new file mode 100644
index 0..72cc2c144
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pragma-region.c
@@ -0,0 +1,21 @@
+/* Verify #pragma region and #pragma endregion do not emit warnings.  */
+
+/* { dg-options "-Wunknown-pragmas" } */
+
+#pragma region
+
+#pragma region name
+
+#pragma region "name"
+
+#pragma region()
+
+#pragma region("name")
+
+#pragma endregion
+
+#pragma endregion garbage
+
+#pragma endregion()
+
+#pragma endregion("garbage")
-- 
2.17.1


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Alan Modra via Gcc-patches
On Thu, Aug 27, 2020 at 03:17:45PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> It is not the copy that is unnecessary: preventing it *here*, manually,
> is what is unnecessary.

Blame me for the original !rtx_equal_p in rs6000_call_aix that Bill
copied.  So does emit_move_insn prevent the copy?  I can't spot where,
maybe I haven't looked hard enough.

If emit_move_insn doesn't prevent it, then why create useless RTL that
is only going to make work for optimisation passes that remove such
nops?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-27 Thread Martin Sebor via Gcc-patches

With --enable-valgrind-annotations the change to the member function
signature in this patch triggers compilation errors during bootstrap:

/src/gcc/trunk/gcc/ggc-common.c: In function ‘void gt_pch_save(FILE*)’:
/src/gcc/trunk/gcc/ggc-common.c:509:33: error: no matching function for 
call to ‘vec::safe_grow(size_t&)’

  vbits.safe_grow (valid_size);
 ^
In file included from /src/gcc/trunk/gcc/hash-table.h:248,
 from /src/gcc/trunk/gcc/coretypes.h:476,
 from /src/gcc/trunk/gcc/ggc-common.c:26:
/src/gcc/trunk/gcc/vec.h:1897:1: note: candidate: ‘void 
vec::safe_grow(unsigned int, bool) [with T = char]’
 vec::safe_grow (unsigned len, bool exact 
MEM_STAT_DECL)

 ^~~
/src/gcc/trunk/gcc/vec.h:1897:1: note:   candidate expects 2 arguments, 
1 provided


/src/gcc/trunk/gcc/jit/jit-recording.c: In member function ‘virtual 
gcc::jit::recording::string* 
gcc::jit::recording::switch_::make_debug_string()’:
/src/gcc/trunk/gcc/jit/jit-recording.c:6313:41: error: no matching 
function for call to ‘auto_vec::safe_grow(size_t)’

 6313 |   cases_str.safe_grow (idx + 1 + len);
  | ^
In file included from /src/gcc/trunk/gcc/hash-table.h:248,
 from /src/gcc/trunk/gcc/coretypes.h:476,
 from /src/gcc/trunk/gcc/jit/jit-recording.c:23:
/src/gcc/trunk/gcc/vec.h:1897:1: note: candidate: ‘void 
vec::safe_grow(unsigned int, bool) [with T = char]’
 1897 | vec::safe_grow (unsigned len, bool exact 
MEM_STAT_DECL)

  | ^~~
/src/gcc/trunk/gcc/vec.h:1897:1: note:   candidate expects 2 arguments, 
1 provided



Bootstrap succeeds with the patch below:

diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 0d528cf455c..d4691cc5448 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -506,7 +506,7 @@ gt_pch_save (FILE *f)
   if (__builtin_expect (RUNNING_ON_VALGRIND, 0))
{
  if (vbits.length () < valid_size)
-   vbits.safe_grow (valid_size);
+   vbits.safe_grow (valid_size, true);
  get_vbits = VALGRIND_GET_VBITS (state.ptrs[i]->obj,
  vbits.address (), valid_size);
  if (get_vbits == 3)
diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index b73cd76a0a0..a9e6528db69 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -6310,7 +6310,7 @@ recording::switch_::make_debug_string ()
 {
   size_t len = strlen (c->get_debug_string ());
   unsigned idx = cases_str.length ();
-  cases_str.safe_grow (idx + 1 + len);
+  cases_str.safe_grow (idx + 1 + len, true);
   cases_str[idx] = ' ';
   memcpy (&(cases_str[idx + 1]),
  c->get_debug_string (),

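For readers following along, the new `exact' parameter distinguishes
reserving precisely the requested length from letting capacity grow
geometrically.  A rough standalone analogue, purely for illustration
(this is not GCC's vec.h code):

  #include <cstddef>
  #include <vector>

  // Rough analogue of the new safe_grow (len, exact) interface: with
  // EXACT, capacity is raised to exactly LEN; otherwise the vector is
  // left to grow geometrically so repeated calls stay amortized O(1).
  template <typename T>
  void
  grow (std::vector<T> &v, std::size_t len, bool exact)
  {
    if (exact && len > v.capacity ())
      v.reserve (len);   // allocate exactly LEN slots
    v.resize (len);      // value-initialize any new elements
  }
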
Martin


libgo patch committed: Remove middle dot from shell script

2020-08-27 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Maciej W. Rozycki removes a middle dot from the
gotest shell script.  There was a U+00B7 middle dot character, placed
after "mips64p32le" in the target lists, which is now changed to a
space.  The U+00B7 character may not be considered whitespace by
Bourne shell and any non-ASCII character may render incorrectly in
some terminal devices.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
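
Stray bytes like this are easy to find mechanically.  A minimal
scanner, for illustration only (not part of the patch):

  #include <cstdio>
  #include <fstream>

  // Report any non-ASCII byte with its line and column so stray
  // characters such as U+00B7 can be spotted before they break a
  // shell script.  (In UTF-8 the middle dot is two bytes; both are
  // flagged.)
  int
  main (int argc, char **argv)
  {
    if (argc != 2)
      {
        std::fprintf (stderr, "usage: %s FILE\n", argv[0]);
        return 2;
      }
    std::ifstream in (argv[1], std::ios::binary);
    int line = 1, col = 0, status = 0;
    for (char c; in.get (c); )
      {
        col++;
        if (c == '\n')
          {
            line++;
            col = 0;
            continue;
          }
        if (static_cast<unsigned char> (c) > 0x7f)
          {
            std::printf ("%s:%d:%d: non-ASCII byte 0x%02x\n", argv[1],
                         line, col, static_cast<unsigned char> (c));
            status = 1;
          }
      }
    return status;
  }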

Ian
c0260afc181c31419c3e0f6603a492a3e0b3881f
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 780588aabc5..dc63f4a696a 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-9aed2d2c5e9c69aa530bf09d72d33c66e497d720
+b75a139fcc7c56988ce2d5b3a2b9e274eb521b0d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/testsuite/gotest b/libgo/testsuite/gotest
index 2bd6b5e2b2a..eadafa1a7cd 100755
--- a/libgo/testsuite/gotest
+++ b/libgo/testsuite/gotest
@@ -308,7 +308,7 @@ x)
aix | android | darwin | dragonfly | freebsd | hurd | illumos | js 
| linux | nacl | netbsd | openbsd | plan9 | solaris | windows)
tag1=nonmatchingtag
;;
-   386 | amd64 | amd64p32 | arm | armbe | arm64 | arm64be | alpha | 
ia64 | m68k | mips | mipsle | mips64 | mips64le | mips64p32 | mips64p32le·| 
nios2 | ppc | ppc64 | ppc64le | riscv64 | s390 | s390x | sh | shbe | sparc | 
sparc64 | wasm)
+   386 | amd64 | amd64p32 | arm | armbe | arm64 | arm64be | alpha | 
ia64 | m68k | mips | mipsle | mips64 | mips64le | mips64p32 | mips64p32le | 
nios2 | ppc | ppc64 | ppc64le | riscv64 | s390 | s390x | sh | shbe | sparc | 
sparc64 | wasm)
tag1=nonmatchingtag
;;
esac
@@ -320,7 +320,7 @@ x)
aix | android | darwin | dragonfly | freebsd | hurd | illumos | js 
| linux | nacl | netbsd | openbsd | plan9 | solaris | windows)
tag2=nonmatchingtag
;;
-   386 | amd64 | amd64p32 | arm | armbe | arm64 | arm64be | alpha | 
ia64 | m68k | mips | mipsle | mips64 | mips64le | mips64p32 | mips64p32le·| 
nios2 | ppc | ppc64 | ppc64le | riscv64 | s390 | s390x | sh | shbe | sparc | 
sparc64 | wasm)
+   386 | amd64 | amd64p32 | arm | armbe | arm64 | arm64be | alpha | 
ia64 | m68k | mips | mipsle | mips64 | mips64le | mips64p32 | mips64p32le | 
nios2 | ppc | ppc64 | ppc64le | riscv64 | s390 | s390x | sh | shbe | sparc | 
sparc64 | wasm)
tag2=nonmatchingtag
;;
esac


[PATCH 2/n] ipa: Add more intuitive ways of constructing ipa_call_context

2020-08-27 Thread Martin Jambor
Hi,

this patch adds two static methods to ipa_call_context which construct
and return the object in the two scenarios where we use them (what if
an edge was inlined, what if a node was cloned), which saves callers a
bit of work and is more intuitive.

The next step is to make ipa_call_context::estimate_size_and_time store
its results directly into the object.  Ideally, these construction
functions would immediately invoke it, but that would not work with
caching in do_estimate_edge_time.  I guess ideally for_inlined_edge
would do the caching... but this patch is a move in the right direction.

Anyway, assuming it passes bootstrap and testing (on x86_64-linux) and
the previous patch is ACKed, is this one on top of it OK too?

Thanks,

Martin


gcc/ChangeLog:

2020-08-28  Martin Jambor  

* ipa-fnsummary.h (class ipa_call_context): New static member
methods for_inlined_edge and for_cloned_node.
(estimate_ipcp_clone_size_and_time): Remove declaration.
(evaluate_properties_for_edge): Remove declaration.
* ipa-cp.c (perform_estimation_of_a_value): Use directly
ipa_call_context instead of estimate_ipcp_clone_size_and_time.
(estimate_local_effects): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Make static.
(ipa_call_context::for_inlined_edge): New function.
(ipa_call_context::for_cloned_node): Likewise.
(estimate_ipcp_clone_size_and_time): Removed.
* ipa-inline-analysis.c (do_estimate_edge_time): Use
ipa_call_context::for_inlined_edge.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
---
 gcc/ipa-cp.c  | 10 +
 gcc/ipa-fnsummary.c   | 45 +--
 gcc/ipa-fnsummary.h   | 19 ++---
 gcc/ipa-inline-analysis.c | 24 ++---
 4 files changed, 48 insertions(+), 50 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 970675efe49..82875e89a92 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3391,8 +3391,9 @@ perform_estimation_of_a_value (cgraph_node *node, 
ipa_call_arg_values *avals,
   sreal time, base_time;
   ipa_hints hints;
 
-  estimate_ipcp_clone_size_and_time (node, avals, , ,
-_time, );
+  ipa_call_context ctx = ipa_call_context::for_cloned_node (node, avals);
+  ctx.estimate_size_and_time (, NULL, , _time, );
+
   base_time -= time;
   if (base_time > 65535)
 base_time = 65535;
@@ -3469,8 +3470,9 @@ estimate_local_effects (struct cgraph_node *node)
   init_caller_stats ();
   node->call_for_symbol_thunks_and_aliases (gather_caller_stats, ,
  false);
-  estimate_ipcp_clone_size_and_time (node, , , ,
-_time, );
+  ipa_call_context ctx = ipa_call_context::for_cloned_node (node, );
+  ctx.estimate_size_and_time (, NULL, , _time, );
+
   time -= devirt_bonus;
   time -= hint_time_bonus (node, hints);
   time -= removable_params_cost;
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index f154d5ee9fc..2a0f8ad37b2 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -539,7 +539,7 @@ fre_will_run_p (struct cgraph_node *node)
except for m_known_contexts which will only be calculated if
COMPUTE_CONTEXTS is true.  */
 
-void
+static void
 evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
  clause_t *clause_ptr,
  clause_t *nonspec_clause_ptr,
@@ -3423,27 +3423,40 @@ ipa_call_context::estimate_size_and_time (int *ret_size,
   return;
 }
 
+/* Construct and return a context object representing the context of E, if
+   it were inlined.  INLINE_PARAM_SUMMARY must be an existing summary with
+   change probabilities of individual parameters.  ARG_VALUES will be
+   computed; this structure will be used to hold the result throughout the
+   life of the result context.  */
 
-/* Estimate size and time needed to execute callee of EDGE assuming that
-   parameters known to be constant at caller of EDGE are propagated.
-   KNOWN_VALS and KNOWN_CONTEXTS are vectors of assumed known constant values
-   and types for parameters.  */
+ipa_call_context
+ipa_call_context
+::for_inlined_edge (cgraph_edge *e,
+   vec inline_param_summary,
+   ipa_call_arg_values *arg_values)
+{
+  clause_t clause, nonspec_clause;
+  evaluate_properties_for_edge (e, true, , _clause,
+   arg_values, true);
+  return ipa_call_context (e->callee->ultimate_alias_target (),
+  clause, nonspec_clause, inline_param_summary,
+  arg_values);
+}
 
-void
-estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
-  ipa_call_arg_values *avals,
-  int *ret_size, sreal *ret_time,
-

Re: [PATCH 2/6] Implement a new partitioner for parallel compilation

2020-08-27 Thread Giuliano Belinassi via Gcc-patches
Hi, Honza.

Again, thank you for your detailed review!

On 08/27, Jan Hubicka wrote:
> > When using the LTO infrastructure to compile files in parallel, we
> > can't simply use any of the LTO partitioner, once extra dependency
> > analysis is required to ensure that some nodes are correctly
> > partitioned together.
> > 
> > Therefore, here we implement a new partitioner called
> > "lto_merge_comdat_map" that does all these required analysis.
> > The partitioner works as follows:
> > 
> > 1. We create a number of disjoint sets and inserts each node into a
> >separate set, which may be merged together in the future.
> > 
> > 2. Find COMDAT groups, and mark them to be partitioned together.
> > 
> > 3. Check all nodes that would require any COMDAT group to be
> >copied to its partition (which we name "COMDAT frontier"),
> >and mark them to be partitioned together.
> >This avoids duplication of COMDAT groups and crashes on the LTO
> >partitioning infrastructure.
> 
> What kind of crash you get here?

This assertion.

  bool added = add_symbol_to_partition_1 (part, node1);
  gcc_assert (added);

It checks that the COMDAT node was not already inserted into some
other partition.

> > 
> > 4. Check if the user allows the partitioner to promote non-public
> >functions or variables to global to improve parallelization
> >opportunity with a cost of modifying the output code layout.
> > 
> > 5. Balance generated partitions for performance unless not told to.
> > 
> > The choice of 1. was by design, so we could use a union-find
> > data structure, which is known for being very fast on set-union
> > operations.
> 
> In the LTO partitioner the groups of objects that "must go together"
> are discovered when the first object is placed into the partition (via
> add_to_partition) because with the LTO rules it is always possible to
> discover all members from the group starting from any random element via
> graph walking.
> 
> I guess it is same with your partitioner?  I basically wonder how much
> code can be shared and what needs to be duplicated.
> It is not very nice to have partitioning implemented twice - it is a bit
> of a subtle problem when it comes to details, so I would be happier if we
> brought lto/lto-partition.c into the middle end and updated/cleaned it
> up as needed.

They are almost the same thing: they group together nodes that need
to be in the same partition before deciding how to partition them.

Things are a little different in that this partitioner starts with n
partitions and merges nodes together as we decide that they need to be
in the same partition.  The advantage of this is that merging
partitions is quite cheap, but the drawback is that I can't undo
partitions easily.  You can also see that I only use
add_node_to_partition after it decides which nodes should be in the
partition.

I think that if there is a way to avoid failing that assertion that I
mentioned above, we could even get rid of this step and use the
balanced_map partitioner.
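
For reference, the union-find structure mentioned above can be sketched
in a few lines (path compression plus union by size; the class added by
the patch is called union_find, though its exact interface may differ):

  #include <utility>
  #include <vector>

  class union_find
  {
    std::vector<int> parent, size;

  public:
    explicit union_find (int n) : parent (n), size (n, 1)
    {
      for (int i = 0; i < n; i++)
        parent[i] = i;
    }

    int
    find (int x)
    {
      while (parent[x] != x)
        x = parent[x] = parent[parent[x]];  // path halving
      return x;
    }

    // Unite the sets containing A and B in near-constant amortized
    // time, which is what makes merging partitions cheap.
    void
    unite (int a, int b)
    {
      a = find (a);
      b = find (b);
      if (a == b)
        return;
      if (size[a] < size[b])
        std::swap (a, b);
      parent[b] = a;
      size[a] += size[b];
    }
  };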

> > 
> > For 3. to work properly, we also had to modify
> > lto_promote_cross_file_statics to handle this case.
> > 
> > The parameters --param=promote-statics and --param=balance-partitions
> > control 4. and 5., respectively
> > 
> > gcc/ChangeLog:
> > 2020-08-20  Giuliano Belinassi  
> > 
> > * Makefile.in: Add lto-partition.o
> > * cgraph.h (struct symtab_node::aux2): New variable.
> > * lto-partition.c: Move from gcc/lto/lto-partition.c
> > (add_symbol_to_partition_1): Only compute insn size
> > if information is available.
> > (node_cmp): Same as above.
> > (class union_find): New.
> > (ds_print_roots): New function.
> > (balance_partitions): New function.
> > (build_ltrans_partitions): New function.
> > (merge_comdat_nodes): New function.
> > (merge_static_calls): New function.
> > (merge_contained_symbols): New function.
> > (lto_merge_comdat_map): New function.
> > (privatize_symbol_name_1): Handle when WPA is not enabled.
> > (privatize_symbol_name): Same as above.
> > (lto_promote_cross_file_statics): New parameter to select when
> > to promote to global.
> > (lto_check_usage_from_other_partitions): New function.
> > * lto-partition.h: Move from gcc/lto/lto-partition.h
> > (lto_promote_cross_file_statics): Update prototype.
> > (lto_check_usage_from_other_partitions): Declare.
> > (lto_merge_comdat_map): Declare.
> > 
> > gcc/lto/ChangeLog:
> > 2020-08-20  Giuliano Belinassi  
> > 
> > * lto-partition.c: Move to gcc/lto-partition.c.
> > * lto-partition.h: Move to gcc/lto-partition.h.
> > * lto.c: Update call to lto_promote_cross_file_statics.
> > * Makefile.in: Remove lto-partition.o.
> > ---
> >  gcc/Makefile.in   |   1 +
> >  gcc/cgraph.h  |   1 +
> >  gcc/{lto => }/lto-partition.c | 463 +-
> >  gcc/{lto => }/lto-partition.h |   4 

[committed] libstdc++: Make std::chrono::duration use reduced ratio for period

2020-08-27 Thread Jonathan Wakely via Gcc-patches
This implements the changes from P0548 "common_type and duration". That
was a change for C++17, but as it corrects some issues introduced by DRs
I'm also treating it as a DR and changing it for all modes from C++11
up.

The main change is that duration::period no longer denotes P, but
rather P::type, the reduced ratio. The unary operator+ and operator-
members of duration should now return a duration using that reduced
ratio.

The requirement that common_type::type is the same type as
common_type::type (rather than simply T) was already implemented
for PR 89102.

The standard says that duration::operator+() and duration::operator-()
should return common_type_t, but that seems unnecessarily
expensive to compute. This change just uses duration which
is the same type, so we don't need to instantiate common_type.

As an optimization, this also adds partial specializations of
common_type for two durations of the same type, a single duration, two
time_points of the same type, and a single time_point. These
specializations avoid instantiating other specializations of common_type
and one or both of __duration_common_type or __timepoint_common_type for
the cases where the answer is trivial to obtain.

libstdc++-v3/ChangeLog:

* include/std/chrono (__duration_common_type): Ensure the
reduced ratio is used. Remove unused partial specialization
using __failure_type.
(common_type): Pass reduced ratios to __duration_common_type.
Add partial specializations for simple cases involving a single
duration or time_point type.
(duration::period): Use reduced ratio.
(duration::operator+(), duration::operator-()): Return duration
type using the reduced ratio.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc:
Adjust expected errors.
* testsuite/20_util/duration/requirements/reduced_period.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

This is a C++17 feature, so I think it would be good to backport it to
gcc-10 as well. I'll let it bake on trunk for a while first though.
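
To make the user-visible effect concrete, a small example (illustrative,
not part of the patch):

  #include <chrono>
  #include <ratio>
  #include <type_traits>

  using D = std::chrono::duration<int, std::ratio<60, 40>>;

  // Before this change D::period was ratio<60, 40>; with P0548 it is
  // the reduced ratio<3, 2>.
  static_assert (std::is_same<D::period, std::ratio<3, 2>>::value,
                 "period is reduced");

  // Unary + (and -) likewise return a duration using the reduced
  // period rather than the original one.
  static_assert (std::is_same<decltype (+D (1)),
                              std::chrono::duration<int, std::ratio<3, 2>>
                             >::value,
                 "unary + uses the reduced period");

  int main () { return 0; }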


commit 82030d51017323c5706d58d8c8626324ece007e4
Author: Jonathan Wakely 
Date:   Thu Aug 27 22:36:03 2020

libstdc++: Make std::chrono::duration use reduced ratio for period


diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 9fc8f560d99..fb251848da8 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -94,13 +94,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
(_Period1::den / __gcd_den::value) * _Period2::den>;
 
 public:
-  using type = chrono::duration<__cr, __r>;
+  using type = chrono::duration<__cr, typename __r::type>;
 };
 
-  template
-struct __duration_common_type<__failure_type, _Period1, _Period2>
-{ typedef __failure_type type; };
-
   /// @endcond
 
   /// Specialization of 

[PATCH] ipa: Merge vectors describing argument values into one type

2020-08-27 Thread Martin Jambor
Hi,

this large patch is a semi-mechanical change which aims to replace
uses of separate vectors describing known scalar values (usually called
known_vals or known_csts), known aggregate values (known_aggs), known
virtual call contexts (known_contexts) and known value
ranges (known_value_ranges) with uses of one type called
ipa_call_arg_values.  The main benefit is bringing down the number of
arguments that various call context evaluating functions have.
Because the new type uses auto_vecs deallocation is simpler and it
also takes advantage of storage within the auto_vecs themselves,
eliminating the need for allocating memory when analyzing most
functions.
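
The underlying pattern, stripped of GCC internals, looks roughly like
this (a hypothetical sketch; the real type uses auto_vec and GCC's own
element types):

  #include <cstddef>
  #include <utility>
  #include <vector>

  // Four parallel vectors describing call arguments become a single
  // aggregate, so the estimator functions take one parameter and
  // deallocation is automatic.
  struct call_arg_values
  {
    std::vector<long> known_csts;             // known scalar constants
    std::vector<int> known_contexts;          // stand-in for contexts
    std::vector<double> known_aggs;           // stand-in for aggregates
    std::vector<std::pair<long, long>> known_value_ranges;
  };

  // Before: estimate (csts, contexts, aggs, ranges); after: the
  // signature no longer grows with each new kind of per-argument
  // information.
  static std::size_t
  estimate (const call_arg_values &avals)
  {
    return avals.known_csts.size () + avals.known_aggs.size ();
  }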

The one place where this patch is not mechanical is the
node_context_cache_entry RCU cache.  Previously the elements directly
contained ipa_call_context instances which, however, owned their
vectors rather than sharing them with their users like non-cache
contexts do.  Now the vectors are behind a pointer, which means
ipa_call_arg_values would have to be allocated, for many functions at
once, and so it could not make use of auto_vecs with generous internal
storage.

I avoided both by reworking the cache to contain copies of all fields
of interest of ipa_call_context, including the interesting vectors.
This makes the type a bit more verbose but the main functionality,
storing to the cache and comparing a context with a cache entry,
remained very similar.

This patch does not unify information about scalar values, aggregate
values, virtual contexts and value ranges that IPA-CP stores for
clones it creates.  The main reason is not to make this patch even
bigger.  Another one is that such change would not be as mechanical,
the aggregate values for clones are stored in a different format.  I
am considering doing something similar for them too however, even
though using ipa_call_arg_values directly would also mean not using
big auto_vecs in it.

I have bootstrapped and tested what I believe to be an identical patch
(before I fixed some comments) on x86_64-linux, I am running LTO
bootstrap on exactly this patch now.  OK for master if it passes?

Thanks,

Martin



gcc/ChangeLog:

2020-08-27  Martin Jambor  

* ipa-prop.h (ipa_call_arg_values): New type.
(ipa_get_indirect_edge_target): Replaced vector arguments
with ipa_call_arg_values in declaration.
* ipa-fnsummary.h (ipa_node_context_cache_entry): New forward
declaration.
(class ipa_call_context): Removed members m_known_vals,
m_known_contexts, m_known_aggs, duplicate_from, release and
equal_to, new members m_avals, store_to_cache and equivalent_to_p.
Adjusted constructor arguments.
(estimate_ipcp_clone_size_and_time): Replaced vector arguments
with ipa_call_arg_values in declaration.
(evaluate_properties_for_edge): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target): Adjust to work on
ipa_call_arg_values rather than on separate vectors.
(devirtualization_time_bonus): Likewise.
(gather_context_independent_values): Likewise.
(perform_estimation_of_a_value): Likewise.
(estimate_local_effects): Likewise.
(modify_known_vectors_with_val): Adjust both variants to work on
ipa_call_arg_values and rename them to copy_known_vectors_add_val.
(decide_about_value): Adjust to work on ipa_call_arg_values rather
than on separate vectors.
(decide_whether_version_node): Likewise.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Adjust to
work on ipa_call_arg_values rather than on separate vectors.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(estimate_edge_devirt_benefit): Likewise.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time_1): Likewise.
(summarize_calls_size_and_time): Adjust calls to
estimate_edge_size_and_time.
(estimate_calls_size_and_time): Adjust to work on
ipa_call_arg_values rather than on separate vectors.
(ipa_call_context::ipa_call_context): Likewise.
(ipa_call_context::duplicate_from): Removed.
(ipa_call_context::release): Likewise.
(ipa_call_context::equal_to): Likewise.
(ipa_call_context::estimate_size_and_time): Adjust to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_ipcp_clone_size_and_time): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
* ipa-inline-analysis.c (node_context_cache_entry): Replaced with
ipa_node_context_cache_entry which does not directly hold context
but directly its necessary fields.
(ipa_node_context_cache_entry::release): New.
(ipa_call_context::store_to_cache): Likewise.
(ipa_call_context::equivalent_to_p): Likewise.
(do_estimate_edge_time): Adjust to work on ipa_call_arg_values
rather than on 

Re: [PATCH] avoid erroneous argument types when checking built-in signatures (PR c/96596)

2020-08-27 Thread Joseph Myers
On Thu, 27 Aug 2020, Martin Sebor via Gcc-patches wrote:

> The attached change has match_builtin_function_types() fail
> for erroneous argument types to prevent an ICE due to assuming
> they are necessarily valid.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 4/4] PowerPC: Add power10 xscmp{eq,gt,ge}qp support

2020-08-27 Thread will schmidt via Gcc-patches
On Wed, 2020-08-26 at 22:46 -0400, Michael Meissner via Gcc-patches wrote:
> PowerPC: Add power10 xscmp{eq,gt,ge}qp support.
> 
> This patch adds the conditional move support.  In adding the conditional move
> support, the optimizers will be able to convert things like:
> 
>   a = (b > c) ? b : c;
> 
> into the instructions.  This patch merges together the scalar SF/DF 
> conditional
> move with the scalar KF/TF conditional move.  It extends the optimization that
> was previously used for SFmode and DFmode to allow the comparison to be a
> different scalar floating point mode than the move.  I.e.
> 
>   __float128 a, b, c;
>   float x, y;
> 
>   /* ... */
> 
>   a = (x == y) ? b : c;
> 
> I did have to add an XXPERMDI if the comparison mode was SFmode or DFmode, and
> the move mode is KFmode or TFmode (the XSCMP{EQ,GT,GE}DP instructions
> explicitly set the bottom 64 bits of the vector register to 0).
> 
> I have built compilers on a little endian power9 Linux system with all 4
> patches applied.  I did bootstrap builds and ran the testsuite, with no
> regressions.  Previous versions of the patch were also tested on a little 
> endian
> power8 Linux system.  I would like to check this patch into the master branch
> for GCC 11.  At this time, I do not anticipate needing to backport these
> changes to GCC 10.3.
> 
> gcc/
> 2020-08-26  Michael Meissner  
> 
>   * config/rs6000/predicates.md (fpmask_normal_or_invert_operator):
>   New predicate.
>   * config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
>   128-bit floating point types.

>   * config/rs6000/rs6000.md (FSCALAR2): New iterator for floating
>   point conditional moves.


>   (movcc_p9): Replace with
>   mov.

(movcc): Update to use FSCALAR iterator

>   (movcc_invert_p9): Replace with
>   mov.

>   (mov): Combine both

If I followed correctly, should that be
*movcc " ..?

>   movcc_p9 and
>   movcc_invert_p9 patterns.  Add ISA 3.1
>   support for IEEE 128-bit conditional moves.  Always use an
>   earlyclobber register for the mask.  Use XXPERMDI to extend the
>   mask if we have a 64-bit comparison and 128-bit move.
>   register for the mask.

ok

>   (fpmask, xxsel): Add ISA 3.1 support for IEEE 128-bit
>   conditional moves.  Enable the generator functionality so
>   mov can call it.  Update constraints
>   for 128-bit operations.

ok

reviewed the rest, nothing else jumped out at me.   lgtm, thanks
-Will

> 
> gcc/testsuite/
> 2020-08-26  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-cmove.c: New test.
>   * gcc.target/powerpc/float128-minmax-3.c: New test.
> 
> ---
>  gcc/config/rs6000/predicates.md   |   5 +
>  gcc/config/rs6000/rs6000.c|   4 +
>  gcc/config/rs6000/rs6000.md   | 154 ++
>  .../gcc.target/powerpc/float128-cmove.c   |  93 +++
>  .../gcc.target/powerpc/float128-minmax-3.c|  15 ++
>  5 files changed, 200 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
> 
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 2709e46f7e5..60b45601e9b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1225,6 +1225,11 @@ (define_predicate "fpmask_comparison_operator"
>  (define_predicate "invert_fpmask_comparison_operator"
>(match_code "ne,unlt,unle"))
> 
> +;; Return 1 if OP is either a fpmask_comparison_operator or
> +;; invert_fpmask_comparsion_operator.
> +(define_predicate "fpmask_normal_or_invert_operator"
> +  (match_code "eq,gt,ge,ne,unlt,unle"))
> +
>  ;; Return 1 if OP is a comparison operation suitable for integer 
> vector/scalar
>  ;; comparisons that generate a -1/0 mask.
>  (define_predicate "vecint_comparison_operator"
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 05eb141a2cd..403897926c5 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15177,6 +15177,10 @@ have_compare_and_set_mask (machine_mode mode)
>  case DFmode:
>return TARGET_P9_MINMAX;
> 
> +case KFmode:
> +case TFmode:
> +  return FLOAT128_IEEE_MINMAX_P (mode);
> +
>  default:
>break;
>  }
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 006e60f09bc..147c635994c 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -795,6 +795,15 @@ (define_mode_iterator FSCALAR [SF
>  (KF "FLOAT128_IEEE_MINMAX_P (KFmode)")
>  (TF "FLOAT128_IEEE_MINMAX_P (TFmode)")])
> 
> +;; Secondary iterator for scalar binary floating point operations.  This is
> +;; used for the conditional moves when we have a compare and set mask
> +;; instruction.  Using this 

Re: [PATCH 3/4] PowerPC: Add power10 xsmaxcqp/xsmincqp support

2020-08-27 Thread will schmidt via Gcc-patches
On Wed, 2020-08-26 at 22:45 -0400, Michael Meissner via Gcc-patches wrote:
> PowerPC: Add power10 xsmaxcqp/xsmincqp support.
> 
> This patch adds support for the ISA 3.1 (power10) IEEE 128-bit "C" minimum and
> maximum functions.  Because of the NaN differences, the built-in functions 
> will
> only generate these instructions if -ffast-math is used until the conditional
> move support is added in the next patch.
> 
> I have built compilers on a little endian power9 Linux system with all 4
> patches applied.  I did bootstrap builds and ran the testsuite, with no
> regressions.  Previous versions of the patch were also tested on a little 
> endian
> power8 Linux system.  I would like to check this patch into the master branch
> for GCC 11.  At this time, I do not anticipate needing to backport these
> changes to GCC 10.3.
> 
> 
> gcc/
> 2020-08-26  Michael Meissner  
> 

* config/rs6000/rs6000.c (rs6000_emit_minmax): Add 128-bit
clause to if-check.  

>   * config/rs6000/rs6000.h (FLOAT128_IEEE_MINMAX_P): New helper
>   macro.
ok

>   * config/rs6000/rs6000.md (FSCALAR): New mode iterator for floating
>   point scalars.

>   (Fm): New mode attribute for floating point scalars.

Mixed feels on mixed case, but I defer. :-) 

>   (s): Add support for the ISA 3.1 IEEE 128-bit
>   minimum and maximum instructions.
>   (s3_vsx): Add support for the ISA 3.1 IEEE 128-bit
>   minimum and maximum instructions.
> 
> gcc/testsuite/
> 2020-08-26  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-minmax-2.c: New test.
> 

the rest lgtm, 
thanks
-Will

> ---
>  gcc/config/rs6000/rs6000.c|  3 +-
>  gcc/config/rs6000/rs6000.h|  4 +++
>  gcc/config/rs6000/rs6000.md   | 28 +++
>  .../gcc.target/powerpc/float128-minmax-2.c| 15 ++
>  4 files changed, 43 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6324f930628..05eb141a2cd 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15445,7 +15445,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx 
> op0, rtx op1)
>/* VSX/altivec have direct min/max insns.  */
>if ((code == SMAX || code == SMIN)
>&& (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
> -   || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode
> +   || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
> +   || FLOAT128_IEEE_MINMAX_P (mode)))
>  {
>emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
>return;
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index bbd8060e143..b504aaa0199 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -345,6 +345,10 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
> || ((MODE) == TDmode) \
> || (!TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (MODE)))
> 
> +/* Macro whether the float128 min/max instructions are enabled.  */
> +#define FLOAT128_IEEE_MINMAX_P(MODE) \
> +  (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (MODE))
> +
>  /* Return true for floating point that does not use a vector register.  */
>  #define SCALAR_FLOAT_MODE_NOT_VECTOR_P(MODE) \
>(SCALAR_FLOAT_MODE_P (MODE) && !FLOAT128_VECTOR_P (MODE))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 43b620ae1c0..006e60f09bc 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -789,6 +789,18 @@ (define_code_attr minmax [(smin "min")
>  (define_code_attr SMINMAX[(smin "SMIN")
>(smax "SMAX")])
> 
> +;; Mode iterator for scalar binary floating point operations
> +(define_mode_iterator FSCALAR [SF
> +DF
> +(KF "FLOAT128_IEEE_MINMAX_P (KFmode)")
> +(TF "FLOAT128_IEEE_MINMAX_P (TFmode)")])
> +
> +;; Constraints to use for scalar FP operations
> +(define_mode_attr Fm [(SF "wa")
> +   (DF "wa")
> +   (TF "v")
> +   (KF "v")])
> +
>  ;; Iterator to optimize the following cases:
>  ;;   D-form load to FPR register & move to Altivec register
>  ;;   Move Altivec register to FPR register and store
> @@ -5142,9 +5154,9 @@ (define_insn "copysign3_fcpsgn"
>  ;; to allow either DF/SF to use only traditional registers.
> 
>  (define_expand "s3"
> -  [(set (match_operand:SFDF 0 "gpc_reg_operand")
> - (fp_minmax:SFDF (match_operand:SFDF 1 "gpc_reg_operand")
> - (match_operand:SFDF 2 "gpc_reg_operand")))]
> +  [(set (match_operand:FSCALAR 0 "gpc_reg_operand")
> + (fp_minmax:FSCALAR (match_operand:FSCALAR 1 "gpc_reg_operand")

Re: [PATCH 2/4] PowerPC: Rename functions for min, max, cmove

2020-08-27 Thread will schmidt via Gcc-patches
On Wed, 2020-08-26 at 22:44 -0400, Michael Meissner via Gcc-patches wrote:
> PowerPC: Rename functions for min, max, cmove.
> 
> This patch renames the functions that generate the ISA 3.0 C minimum, C
> maximum, and conditional move instructions to use a better name than just
> using a _p9 suffix.  Because the functions can fail, the names use
> "maybe_emit" instead of "generate_".
> 
> I have built compilers on a little endian power9 Linux system with all 4
> patches applied.  I did bootstrap builds and ran the testsuite, with no
> regressions.  Previous versions of the patch were also tested on a little 
> endian
> power8 Linux system.  I would like to check this patch into the master
> branch for GCC 11.  At this time, I do not anticipate needing to backport 
> these
> changes to GCC 10.3.
> 
> gcc/
> 2020-08-26  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (rs6000_emit_p9_fp_minmax): Rename to
>   maybe_emit_fp_c_minmax.
ok

>   (maybe_emit_fp_c_minmax): Rename rs6000_emit_p9_fp_minmax.  Return
>   bool instead of int.

function rename is redundant between the two entries?


>   (rs6000_emit_p9_fp_cmove): Rename to maybe_emit_fp_cmove.
>   (maybe_emit_fp_cmove): Rename rs6000_emit_p9_fp_cmove.  Return
>   bool instead of int.

Function rename comment is redundant?

>   (have_compare_and_set_mask): New helper function.
>   (rs6000_emit_cmove): Rework support of ISA 3.0 functions to
>   generate "C" minimum, "C" maximum, and conditional move
>   instructions for scalar floating point.
> 

Nothing else jumped out at me below. 
lgtm,  thanks, 
-Will

> ---
>  gcc/config/rs6000/rs6000.c | 77 ++
>  1 file changed, 53 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index bac50c2bcf6..6324f930628 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15056,13 +15056,17 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx 
> op_true, rtx op_false,
>return 1;
>  }
> 
> -/* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction
> -   for SF/DF scalars.  Move TRUE_COND to DEST if OP of the operands of the 
> last
> -   comparison is nonzero/true, FALSE_COND if it is zero/false.  Return 0 if 
> the
> -   hardware has no such operation.  */
> +/* Possibly emit the C variant of the minimum or maximum instruction for
> +   floating point scalars (xsmincdp, xsmaxcdp, etc.).
> 
> -static int
> -rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
> +   Move TRUE_COND to DEST if OP of the operands of the last comparison is
> +   nonzero/true, FALSE_COND if it is zero/false.
> +
> +   Return false if we can't generate the appropriate minimum or maximum, and
> +   true if we did generate the minimum or maximum.  */
> +
> +static bool
> +maybe_emit_fp_c_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>  {
>enum rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
> @@ -15072,14 +15076,14 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx 
> true_cond, rtx false_cond)
>bool max_p = false;
> 
>if (result_mode != compare_mode)
> -return 0;
> +return false;
> 
>if (code == GE || code == GT)
>  max_p = true;
>else if (code == LE || code == LT)
>  max_p = false;
>else
> -return 0;
> +return false;
> 
>if (rtx_equal_p (op0, true_cond) && rtx_equal_p (op1, false_cond))
>  ;
> @@ -15092,19 +15096,23 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx 
> true_cond, rtx false_cond)
>  max_p = !max_p;
> 
>else
> -return 0;
> +return false;
> 
>rs6000_emit_minmax (dest, max_p ? SMAX : SMIN, op0, op1);
> -  return 1;
> +  return true;
>  }
> 
> -/* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and
> -   XXSEL instructions for SF/DF scalars.  Move TRUE_COND to DEST if OP of the
> -   operands of the last comparison is nonzero/true, FALSE_COND if it is
> -   zero/false.  Return 0 if the hardware has no such operation.  */
> +/* Possibly emit a floating point conditional move by generating a compare 
> that
> +   sets a mask instruction and a XXSEL select instruction.
> 
> -static int
> -rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
> +   Move TRUE_COND to DEST if OP of the operands of the last comparison is
> +   nonzero/true, FALSE_COND if it is zero/false.
> +
> +   Return false if the operation cannot be generated, and true if we could
> +   generate the instruction.  */
> +
> +static bool
> +maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>  {
>enum rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
> @@ -15132,7 +15140,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx 
> true_cond, rtx false_cond)
>break;
> 
>  default:
> -  return 0;
> +  return false;
>  }
> 
>/* Generate:   [(parallel [(set (dest)
> @@ 

Re: [PATCH 1/4] PowerPC: Change cmove function return to bool

2020-08-27 Thread will schmidt via Gcc-patches
On Wed, 2020-08-26 at 22:43 -0400, Michael Meissner via Gcc-patches wrote:
> PowerPC: Change cmove function return to bool.
> 
> In doing the other work for adding ISA 3.1 128-bit minimum, maximum, and
> conditional move support, I noticed the two functions that process conditional
> moves return 'int' instead of 'bool'.  This patch changes these functions to
> return 'bool'.
> 
> I have built compilers on a little endian power9 Linux system with all 4
> patches applied.  I did bootstrap builds and ran the testsuite, with no
> regressions.  Previous versions of the patch were also tested on a little 
> endian
> power8 Linux system.  I would like to check this patch into the master
> branch for GCC 11.  At this time, I do not anticipate needing to backport 
> these
> changes to GCC 10.3.
> 
> gcc/
> 2020-08-26  Michael Meissner  
> 
>   * config/rs6000/rs6000-protos.h (rs6000_emit_cmove): Change return
>   type to bool.
>   (rs6000_emit_int_cmove): Change return type to bool.
>   * config/rs6000/rs6000.c (rs6000_emit_cmove): Change return type
>   to bool.
>   (rs6000_emit_int_cmove): Change return type to bool.
> 


lgtm,
thanks
-Will


> ---
>  gcc/config/rs6000/rs6000-protos.h |  4 ++--
>  gcc/config/rs6000/rs6000.c| 32 +++
>  2 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index 28e859f4381..02e4d71922f 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -119,8 +119,8 @@ extern char * output_cbranch (rtx, const char *, int, 
> rtx_insn *);
>  extern const char * output_probe_stack_range (rtx, rtx, rtx);
>  extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
>  extern bool rs6000_emit_set_const (rtx, rtx);
> -extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx);
> -extern int rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
> +extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
> +extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
>  extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
>  extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 1c1caa90ede..bac50c2bcf6 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15159,7 +15159,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx 
> true_cond, rtx false_cond)
> operands of the last comparison is nonzero/true, FALSE_COND if it
> is zero/false.  Return 0 if the hardware has no such operation.  */
> 
> -int
> +bool
>  rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>  {
>enum rtx_code code = GET_CODE (op);
> @@ -15175,11 +15175,11 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
> rtx false_cond)
>/* In the isel case however, we can use a compare immediate, so
>op1 may be a small constant.  */
>&& (!TARGET_ISEL || !short_cint_operand (op1, VOIDmode)))
> -return 0;
> +return false;
>if (GET_MODE (true_cond) != result_mode)
> -return 0;
> +return false;
>if (GET_MODE (false_cond) != result_mode)
> -return 0;
> +return false;
> 
>/* See if we can use the ISA 3.0 (power9) min/max/compare functions.  */
>if (TARGET_P9_MINMAX
> @@ -15187,16 +15187,16 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
> rtx false_cond)
>&& (result_mode == SFmode || result_mode == DFmode))
>  {
>if (rs6000_emit_p9_fp_minmax (dest, op, true_cond, false_cond))
> - return 1;
> + return true;
> 
>if (rs6000_emit_p9_fp_cmove (dest, op, true_cond, false_cond))
> - return 1;
> + return true;
>  }
> 
>/* Don't allow using floating point comparisons for integer results for
>   now.  */
>if (FLOAT_MODE_P (compare_mode) && !FLOAT_MODE_P (result_mode))
> -return 0;
> +return false;
> 
>/* First, work out if the hardware can do this at all, or
>   if it's too slow  */
> @@ -15204,7 +15204,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
> rtx false_cond)
>  {
>if (TARGET_ISEL)
>   return rs6000_emit_int_cmove (dest, op, true_cond, false_cond);
> -  return 0;
> +  return false;
>  }
> 
>is_against_zero = op1 == CONST0_RTX (compare_mode);
> @@ -15216,7 +15216,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
> rtx false_cond)
>   generated.  */
>if (SCALAR_FLOAT_MODE_P (compare_mode)
>&& flag_trapping_math && ! is_against_zero)
> -return 0;
> +return false;
> 
>/* Eliminate half of the comparisons by switching operands, this
>   makes the remaining code simpler.  */
> @@ -15232,7 +15232,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
> rtx false_cond)
>/* UNEQ and LTGT take four instructions for a 

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Iain Buclaw via Gcc-patches
Excerpts from Jakub Jelinek's message of August 27, 2020 12:06 pm:
> On Fri, Jul 31, 2020 at 04:28:05PM -0400, Jason Merrill via Gcc-patches wrote:
>> On 7/31/20 6:06 AM, Jakub Jelinek wrote:
>> > On Fri, Jul 31, 2020 at 10:54:46AM +0100, Jonathan Wakely wrote:
>> > > > Does the standard require that somewhere?  Because that is not what the
>> > > > compiler implements right now.
>> > > 
>> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78620
>> > 
>> > But does that imply that all CONSTRUCTORs without CONSTRUCTOR_NO_CLEARING
>> > need to be treated that way?  I mean, aren't such CONSTRUCTORs used also 
>> > for
>> > other initializations?
>> 
>> Yes, they are also used to represent constant values of classes that are
>> initialized by constexpr constructor.
>> 
>> > And, are the default copy constructors or assignment operators supposed to
>> > also copy the padding bits, or do they become unspecified again through
>> > that?
>> 
>> For a non-union class, a defaulted copy is defined as memberwise copy, not a
>> copy of the entire object representation.  So I guess strictly speaking the
>> padding bits do become unspecified.  But I think if the copy is trivial, in
>> practice all implementations do copy the object representation; perhaps the
>> specification should adjust accordingly.
> 
> Sorry for not responding earlier.  I think at least in GCC there is no
> guarantee the copying is copying the object representation rather than
> memberwise copy, both are possible, depending e.g. whether SRA happens or
> not.
> 
> So, shouldn't we have a new CONSTRUCTOR flag that will represent whether
> padding bits are cleared or not and then use it e.g. in the gimplifier?
> Right now the gimplifier only adds first zero initialization if
> CONSTRUCTOR_NO_CLEARING is not set and some initializers are not present,
> so if there is a new flag, we'd need to in that case find out if there are
> any padding bits and do the zero initialization in that case.
> A question is if GIMPLE var = {}; statement (empty CONSTRUCTOR) is handled
> as zero initialization of also the padding bits, or if we should treat it
> that way only if the CONSTRUCTOR on the rhs has the new bit set and e.g.
> when lowering memset 0 into var = {}; set the bit too.
> From what I understood on IRC, D has similar need for zero initialization of
> padding.
> 

Yes, by the D spec, all holes in structs and arrays should be zeroed at
all times.  This has resulted in some awkward lowerings in order to
try to force the issue.  In particular, small structs not passed in
memory are prone to losing the value of padding bits after memset().
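
A small C++ illustration of the padding question discussed above
(illustrative only; S has a padding hole between C and I on typical
ABIs):

  #include <cstdio>
  #include <cstring>

  struct S { char c; int i; };

  int
  main ()
  {
    S a;
    std::memset (&a, 0, sizeof a);  // zeroes the members *and* the padding
    a.c = 1;
    a.i = 2;

    // Defaulted copy: memberwise per the standard, so the padding bytes
    // of B are formally unspecified, even though most compilers in
    // practice copy the whole object representation when the copy is
    // trivial.
    S b = a;

    unsigned char bytes[sizeof b];
    std::memcpy (bytes, &b, sizeof b);
    for (unsigned k = 0; k < sizeof b; k++)
      std::printf ("%02x ", bytes[k]);
    std::printf ("\n");
    return 0;
  }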

Iain.


Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly

2020-08-27 Thread Segher Boessenkool
Hi!

On Thu, Aug 27, 2020 at 08:47:19PM +0800, Jojo R wrote:
> +insn-emit-split-c = $(foreach o, $(shell for i in 
> {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c)

If you use a variable for the result of that "seq", this will be more
readable / maintainable / etc.

(Should this use := instead of = ?  What about the assignment to
insn-generated-split-num itself?)

> +  long long read_count = 0;

We use "int" in many other places for similar counts.  2**31 should be
enough for anybody.

>md_rtx_info info;
>while (read_md_rtx ())
> +{
> +if (!(read_count++ % 1))

Wrong indent.  "== 0" is more typical for testing if numbers are zero.

> +  {
> +printf ("/* Split file into separate compilation units for parallel 
> compilation %lld */\n\n", read_count);

Please split this (at least the source line, but probably the target
line is too long as well).


Those are all details, though.  This does look like it fixes the problems
in the previous versions.  Thanks!
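
For the archives, the splitting scheme being reviewed amounts to
roughly the following (a hedged sketch; names and the chunk size are
illustrative, not the patch's actual code):

  #include <cstdio>

  // Start a new output file every CHUNK patterns so the generated code
  // can be compiled in parallel.
  static std::FILE *outf;
  static int file_idx;

  static void
  maybe_rotate_output (long read_count, long chunk)
  {
    if (read_count % chunk != 0)
      return;
    if (outf)
      std::fclose (outf);
    char name[32];
    std::snprintf (name, sizeof name, "insn-emit%d.c", ++file_idx);
    outf = std::fopen (name, "w");
    if (outf)
      std::fprintf (outf,
                    "/* Part %d, split for parallel compilation.  */\n",
                    file_idx);
  }

  int
  main ()
  {
    for (long i = 0; i < 25; i++)
      maybe_rotate_output (i, 10);  // creates insn-emit1.c ... insn-emit3.c
    if (outf)
      std::fclose (outf);
    return 0;
  }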


Segher


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Segher Boessenkool
On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> >>+  /* For ELFv2, r12 and CTR need to hold the function address
> >>+ for an indirect call.  */
> >>+  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
> >>+{
> >>+  r12 = gen_rtx_REG (Pmode, 12);
> >>+  if (!rtx_equal_p (r12, func_desc))
> >>+   emit_move_insn (r12, func_desc);
> >These last two lines aren't needed?  A move from r12 to r12 is harmless,
> >and should be optimised away just fine?
> 
> On entry to this function, func_desc has the function address, and the 
> problem is for a sibcall it is generally not in r12.  So I'm forcing it 
> into r12 here.  I guess we could remove the !rtx_equal_p test and 

Yes, that is what I meant, sorry.

> generate an unnecessary copy, but I don't see why we should do that.  

Because it is simpler code.  Micro-optimisations are often not
optimisations at all, just obfuscations, and those hurt in various
ways (they make the binary code bigger than necessary, but perhaps more
importantly they make the source code bigger, which hurts too).

It is not the copy that is unnecessary: preventing it *here*, manually,
is what is unnecessary.

> This is the same thing we do elsewhere (such as rs6000_aix_call).

That does not mean we have to do the same harmful (or just not useful)
thing everywhere else as well ;-)

> >>/* Create the call.  */
> >>-  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), 
> >>tlsarg);
> >>+  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), 
> >>tlsarg);
> >I don't understand this change?  (Maybe I'm not looking well enough.)
> 
> Prior to this change, func_desc comes in as a parameter and is never 
> changed.  Now it's either that case, or it's the new case, so this just 
> is the join point of that decision.

So I'm not looking well enough, okay :-)


Segher


[committed] hppa: Fix shadd-2.c scan assembler count

2020-08-27 Thread John David Anglin
Due to revisions to hppa_rtx_costs by Roger Sayle, we now have shift add 
instructions in
the shadd-2.c test.  Committed to trunk and gcc-10 branch.

Dave

Fix shadd-2.c scan assembler count.

2020-08-27  John David Anglin  

gcc/testsuite/
* gcc.target/hppa/shadd-2.c: Adjust times to 4.

diff --git a/gcc/testsuite/gcc.target/hppa/shadd-2.c 
b/gcc/testsuite/gcc.target/hppa/shadd-2.c
index 34708e51597..b92f782cf0d 100644
--- a/gcc/testsuite/gcc.target/hppa/shadd-2.c
+++ b/gcc/testsuite/gcc.target/hppa/shadd-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile }  */
 /* { dg-options "-O2" }  */
-/* { dg-final { scan-assembler-times "sh.add" 2 } }  */
+/* { dg-final { scan-assembler-times "sh.add" 4 } }  */

 typedef struct rtx_def *rtx;
 typedef const struct rtx_def *const_rtx;


Re: [PATCH] hppa: Improve hppa_rtx_costs for shifts by constants.

2020-08-27 Thread John David Anglin
Hi Roger,

On 2020-08-27 12:42 p.m., Roger Sayle wrote:
> I was wondering whether you could please "put this in the queue", and
> reconfirm that PR middle-end/87256 remains resolved?
>
>
> 2020-08-27  Roger Sayle  
>
> gcc/ChangeLog
>   * config/pa/pa.c (hppa_rtx_costs) [ASHIFT, ASHIFTRT, LSHIFTRT]:
>   Provide accurate costs for shifts of integer constants.
Will test.

-- 
John David Anglin  dave.ang...@bell.net



Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Bill Schmidt via Gcc-patches

Hi!

On 8/27/20 1:41 PM, Segher Boessenkool wrote:

Hi!

On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:

+  /* For ELFv2, r12 and CTR need to hold the function address
+ for an indirect call.  */
+  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
+{
+  r12 = gen_rtx_REG (Pmode, 12);
+  if (!rtx_equal_p (r12, func_desc))
+   emit_move_insn (r12, func_desc);

These last two lines aren't needed?  A move from r12 to r12 is harmless,
and should be optimised away just fine?



On entry to this function, func_desc has the function address, and the 
problem is for a sibcall it is generally not in r12.  So I'm forcing it 
into r12 here.  I guess we could remove the !rtx_equal_p test and 
generate an unnecessary copy, but I don't see why we should do that.  
This is the same thing we do elsewhere (such as rs6000_aix_call).





/* Create the call.  */
-  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
+  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);

I don't understand this change?  (Maybe I'm not looking well enough.)


Prior to this change, func_desc comes in as a parameter and is never 
changed.  Now it's either that case, or it's the new case, so this just 
is the join point of that decision.


Thanks,

Bill



Looks fine otherwise, yes :-)


Segher


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Segher Boessenkool
Hi!

On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
> +  /* For ELFv2, r12 and CTR need to hold the function address
> + for an indirect call.  */
> +  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
> +{
> +  r12 = gen_rtx_REG (Pmode, 12);
> +  if (!rtx_equal_p (r12, func_desc))
> + emit_move_insn (r12, func_desc);

These last two lines aren't needed?  A move from r12 to r12 is harmless,
and should be optimised away just fine?

>/* Create the call.  */
> -  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
> +  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);

I don't understand this change?  (Maybe I'm not looking well enough.)

Looks fine otherwise, yes :-)


Segher


Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-27 Thread Segher Boessenkool
On Thu, Aug 27, 2020 at 08:43:40AM -0700, Carl Love wrote:
> 2020-08-26  Carl Love  
>   * config/rs6000/rs6000-builtin.def: (BU_P10V_VSX_1) New builtin macro 
> expansion.
>   (XVCVBF16SPN, XVCVSPBF16): Replace macro expansion BU_VSX_1 with 
> BU_P10V_VSX_1.
>   * config/rs6000/rs6000-call.c: (VSX_BUILTIN_XVCVSPBF16, 
> VSX_BUILTIN_XVCVBF16SPN):
>   Replace with P10V_BUILTIN_XVCVSPBF16, P10V_BUILTIN_XVCVBF16SPN 
> respectively.

(lines too long)

This patch is fine for GCC 10.  Thanks!

As an aside:

> -case VSX_BUILTIN_XVCVSPBF16:
> -case VSX_BUILTIN_XVCVBF16SPN:
> +case P10V_BUILTIN_XVCVSPBF16:
> +case P10V_BUILTIN_XVCVBF16SPN:

Having "P10V" in the name here doesn't really help anything; in fact, it
could just be BUILTIN_XVCBCVXBCXBNVC?  Simpler names like that will
improve readability as well.  But that is all for the future :-)


Segher


Re: [PATCH 3/6] Implement fork-based parallelism engine

2020-08-27 Thread Giuliano Belinassi via Gcc-patches
Hi, Honza.

Thank you for your detailed review!

On 08/27, Jan Hubicka wrote:
> > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > index c0b45795059..22405098dc5 100644
> > --- a/gcc/cgraph.c
> > +++ b/gcc/cgraph.c
> > @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree 
> > decl)
> >decl_node->remove ();
> >  }
> >  
> > +/* Release function dominator info if present.  */
> > +
> > +void
> > +cgraph_node::maybe_release_dominators (void)
> > +{
> > +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> > +
> > +  if (fun && fun->cfg)
> > +{
> > +  if (dom_info_available_p (fun, CDI_DOMINATORS))
> > +   free_dominance_info (fun, CDI_DOMINATORS);
> > +  if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> > +   free_dominance_info (fun, CDI_POST_DOMINATORS);
> > +}
> > +}
> 
> I am not sure if that needs to be a member function, but if so we want to
> merge it with other places in cgraph.c and cgraphunit.c where dominators
> are freed.  I do not think you need to check availability.

This is necessary to remove some nodes from the callgraph.  For some
reason, if I call node->remove () while it still has the dominance info
available, it will fail some assertions in the compiler.

However, with regard to code layout, this can be moved to lto-cgraph.c,
as it is only used there.

> > +
> >  /* Record that DECL1 and DECL2 are semantically identical function
> > versions.  */
> >  void
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index b4a7871bd3d..72ac19f9672 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -463,6 +463,15 @@ public:
> >   Return NULL if there's no such node.  */
> >static symtab_node *get_for_asmname (const_tree asmname);
> >  
> > +  /* Get symtab node by order.  */
> > +  static symtab_node *find_by_order (int order);
> 
> This is quadratic and moreover seems unused. Why do you add it?

I added this for debugging, since I used this a lot inside GDB.
Sure, I can remove this without any problems, or print a warning
for the developer to avoid calling this in production code.

> > +
> > +  /* Get symtab_node by its name.  */
> > +  static symtab_node *find_by_name (const char *);
> 
> Similarly here, note that names are not really very meaningful as lookup
> things, since they get duplicated.
> > +
> > +  /* Get symtab_node by its ASM name.  */
> > +  static symtab_node *find_by_asm_name (const char *);
> 
> For this we have get_for_asmname (which also populates asm name hash as
> needed and is not quadratic)

Cool. I will surely remove this then :)

> > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > index d10d635e942..73e4bed3b61 100644
> > --- a/gcc/cgraphunit.c
> > +++ b/gcc/cgraphunit.c
> > @@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
> >  {
> >location_t saved_loc;
> >  
> > +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> > +  if (body_removed)
> > +return;
> 
> Indeed, we should know :)

Looks like this was due to an earlier problem.  I removed this and
bootstrap is working OK.

> > +
> > +
> >/* We ought to not compile any inline clones.  */
> >gcc_assert (!inlined_to);
> >  
> > @@ -2658,6 +2663,7 @@ ipa_passes (void)
> >  
> >execute_ipa_summary_passes
> > ((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
> > +
> This seems accidental.

Yes.

> >  }
> >  
> >/* Some targets need to handle LTO assembler output specially.  */
> > @@ -2687,10 +2693,17 @@ ipa_passes (void)
> >if (flag_generate_lto || flag_generate_offload)
> >  targetm.asm_out.lto_end ();
> >  
> > -  if (!flag_ltrans
> > +  if (split_outputs)
> > +flag_ltrans = true;
> > +
> > +  if ((!flag_ltrans || split_outputs)
> >&& ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
> >   || !flag_lto || flag_fat_lto_objects))
> >  execute_ipa_pass_list (passes->all_regular_ipa_passes);
> > +
> > +  if (split_outputs)
> > +flag_ltrans = false;
> > +
> >invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
> >  
> >bitmap_obstack_release (NULL);
> > @@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
> >}
> >  }
> >  
> > +static bool is_number (const char *str)
> > +{
> > +  while (*str != '\0')
> > +switch (*str++)
> > +  {
> > +   case '0':
> > +   case '1':
> > +   case '2':
> > +   case '3':
> > +   case '4':
> > +   case '5':
> > +   case '6':
> > +   case '7':
> > +   case '8':
> > +   case '9':
> > + continue;
> > +   default:
> > + return false;
> > +  }
> > +
> > +  return true;
> > +}
> 
> This looks odd, we have other places where we parse number from command
> line :)

isdigit () is poisoned in GCC.  But I guess I should look at how -flto=
does this.
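
(For reference, a sketch of the conventional approach -- assuming
libiberty's safe-ctype.h, whose ISDIGIT is locale-independent and not
poisoned:

  #include "safe-ctype.h"

  static bool
  is_number (const char *str)
  {
    for (; *str != '\0'; str++)
      if (!ISDIGIT (*str))
        return false;
    return true;
  }

Like the original, this accepts the empty string; callers may want to
reject that separately.)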

> > +
> > +/* If forked, which child am I?  */
> > +
> > +static int childno = -1;
> > +
> > +static bool
> > +maybe_compile_in_parallel (void)
> > +{
> > +  struct symtab_node *node;
> > +  int partitions, i, j;
> > +  int *pids;
> > +
> > +  bool promote_statics = 

Re: [PATCH] hppa: Improve hppa_rtx_costs for shifts by constants.

2020-08-27 Thread Jeff Law via Gcc-patches
On Thu, 2020-08-27 at 17:42 +0100, Roger Sayle wrote:
> Hi Dave (and Jeff),
> For your consideration, here's a patch that should fix the recent regression
> of gcc.dg/tree-ssa/slsr-13.c on hppa targets.
> 
> This patch provides more accurate rtx_costs estimates for shifts by
> integer constants (which are cheaper than by a register amount).
> Fine tuning these is sufficient to have simple-ssa-strength-reduce
> prefer multiplications by four, over multiplications by five.
> 
> Technically only the ASHIFT is required to fix the regression, but for
> symmetry
> I've added the appropriate PA-RISC values for all three shift types (in
> SImode).
> I've also checked that this doesn't reintroduce PR middle-end/87256.
> 
> I was wondering whether you could please "put this in the queue", and
> reconfirm that PR middle-end/87256 remains resolved?
> 
> 
> 2020-08-27  Roger Sayle  
> 
> gcc/ChangeLog
>   * config/pa/pa.c (hppa_rtx_costs) [ASHIFT, ASHIFTRT, LSHIFTRT]:
>   Provide accurate costs for shifts of integer constants.
Spinning in the tester.  Figure results about this time tomorrow:

http://3.14.90.209:8080/job/hppa-linux-gnu/944/console
> 


jeff



Re: [PATCH] modules: c++: Fix push_namespace ICE with modules

2020-08-27 Thread Nathan Sidwell

On 8/14/20 11:04 AM, Jeff Chapman wrote:

Hello!

Attached is a patch that fixes an ICE on the devel/c++-modules branch caused
by a slot invalidation edge case in push_namespace.


I just fell over this myself, reducing a testcase.  Your fix wasn't
quite right -- we were creating an empty slot in the hash table and then
doing another insertion, w/o initializing that slot.  We have to also
not insert on the first lookup.
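
In GCC hash-table terms the hazard is the classic slot-invalidation
one -- a generic sketch, not the modules code itself:

  /* A reference obtained from get_or_insert can be invalidated if a
     later insertion resizes the table.  */
  hash_map<tree, tree> table;
  tree &slot = table.get_or_insert (key1); /* creates an empty slot */
  table.get_or_insert (key2);              /* may resize the table */
  slot = value;                            /* 'slot' may now dangle */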


Thanks for the patch!  It was clueful when I hit the problem :)


It's been sitting around for a while and I wasn't sure if I should use the
original date or not. Feel free to adjust that to the commit date if that's
what it should be instead.


2020-05-15  Jeff Chapman II  

gcc/cp/
* name-lookup.c (push_namespace): Fix slot invalidation related
ICE when compiling with modules enabled.

gcc/testsuite/

* g++.dg/modules/string-view1.C: New test.
* g++.dg/modules/string-view2.C: Ditto.


Please let me know if there's anything more I can do,
Jeff Chapman II




--
Nathan Sidwell
diff --git c/ChangeLog.modules w/ChangeLog.modules
index 4d50352a698..27233f0228d 100644
--- c/ChangeLog.modules
+++ w/ChangeLog.modules
@@ -1,3 +1,14 @@
+2020-08-27  Nathan Sidwell  
+	Jeff Chapman II  
+
+	Fix hash table breakage.
+	gcc/cp/
+	* name-lookup.c (push_namespace): Do not create slot on first
+	lookup.
+	gcc/testsuite/
+	* g++.dg/modules/string-view1.C: New test.
+	* g++.dg/modules/string-view2.C: Ditto.
+
 2020-08-27  Nathan Sidwell  
 
 	Merge trunk 708b3600d04.
diff --git c/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index d7ec9d214ed..3e5985d396c 100644
--- c/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -8872,8 +8872,9 @@ push_namespace (tree name, bool make_inline)
 {
   /* Before making a new namespace, see if we already have one in
 	 the existing partitions of the current namespace.  */
-  tree *slot = find_namespace_slot (current_namespace, name, true);
-  ns = reuse_namespace (slot, current_namespace, name);
+  tree *slot = find_namespace_slot (current_namespace, name, false);
+  if (slot)
+	ns = reuse_namespace (slot, current_namespace, name);
   if (!ns)
 	ns = make_namespace (current_namespace, name,
 			 input_location, make_inline);
@@ -8884,6 +8885,12 @@ push_namespace (tree name, bool make_inline)
 	{
 	  /* finish up making the namespace.  */
 	  add_decl_to_level (NAMESPACE_LEVEL (current_namespace), ns);
+	  if (!slot)
+	{
+	  slot = find_namespace_slot (current_namespace, name);
+	  /* This should find the slot created by pushdecl.  */
+	  gcc_checking_assert (slot);
+	}
 	  make_namespace_finish (ns, slot);
 
 	  /* Add the anon using-directive here, we don't do it in
diff --git c/gcc/testsuite/g++.dg/modules/string-view1.C w/gcc/testsuite/g++.dg/modules/string-view1.C
new file mode 100644
index 000..f5391f39180
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/string-view1.C
@@ -0,0 +1,6 @@
+// { dg-additional-options "-fmodules-ts" }
+module;
+#include 
+#include 
+export module foo;
+// { dg-module-cmi foo }
diff --git c/gcc/testsuite/g++.dg/modules/string-view2.C w/gcc/testsuite/g++.dg/modules/string-view2.C
new file mode 100644
index 000..dad30eabba6
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/string-view2.C
@@ -0,0 +1,56 @@
+// { dg-additional-options "-fmodules-ts" }
+// reduced from string-view1 through cvise. Broken under c++2a too.
+// we were creating a hash slot, and then doing another lookup without
+// initializing that first slot :(
+
+namespace std {
+typedef int a;
+int b;
+decltype(nullptr) c;
+namespace xyz {}
+__builtin_va_list d;
+int n;
+int e;
+int f;
+int g;
+int h;
+int i;
+int j;
+int k;
+typedef struct l m;
+typedef struct aa w;
+typedef struct x o;
+typedef x p;
+long q;
+long r;
+typedef l s;
+extern p ab;
+void t();
+void v();
+extern p ac;
+void ad();
+int ae;
+int af;
+extern p ag;
+extern p ah;
+void ai();
+void y();
+int aj;
+int ak;
+int al;
+char am;
+int an;
+a ao;
+int ap;
+int aq;
+void z();
+int ar;
+int as;
+void at();
+void au();
+void av();
+void aw();
+int u;
+namespace zz {
+}
+}


[PATCH] avoid erroneous argument types when checking built-in signatures (PR c/96596)

2020-08-27 Thread Martin Sebor via Gcc-patches

The attached change has match_builtin_function_types() fail
for erroneous argument types to prevent an ICE due to assuming
they are necessarily valid.

Martin
PR c/96596 - ICE in match_builtin_function_types on a declaration of a built-in with invalid array argument

gcc/c/ChangeLog:

	PR c/96596
	* c-decl.c (match_builtin_function_types): Avoid dealing with erroneous
	argument type.

gcc/testsuite/ChangeLog:

	PR c/96596
	* gcc.dg/Wbuiltin-declaration-mismatch-16.c: New test.

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5d6b504fe78..190c00bd435 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -1712,7 +1712,10 @@ match_builtin_function_types (tree newtype, tree oldtype,
 	return NULL_TREE;
 
   tree oldtype = TYPE_MAIN_VARIANT (TREE_VALUE (oldargs));
-  tree newtype = TYPE_MAIN_VARIANT (TREE_VALUE (newargs));
+  tree newtype = TREE_VALUE (newargs);
+  if (newtype == error_mark_node)
+   return NULL_TREE;
+  newtype = TYPE_MAIN_VARIANT (newtype);
 
   if (!types_close_enough_to_match (oldtype, newtype))
 	return NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-16.c b/gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-16.c
new file mode 100644
index 000..494fff9ec60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-16.c
@@ -0,0 +1,12 @@
+/* PR c/96596 - ICE on a declaration of a built-in with invalid array
+   argument
+   { dg-do compile } */
+
+void __builtin_abort (int[foo]);/* { dg-error "'foo' undeclared" } */
+void __builtin_trap (int[__SIZE_MAX__]);/* { dg-error "size of unnamed array is too large" } */
+void __builtin_unreachable (int[][]);   /* { dg-error "array type has incomplete element type" } */
+
+void __builtin_exit (int, int[+]);  /* { dg-error "expected expression before" } */
+
+/* { dg-prune-output "\\\[-Wbuiltin-declaration-mismatch" }  */
+


[PATCH] aarch64: Remove redundant mult patterns

2020-08-27 Thread Alex Coplan
Hello,

Following on from the earlier patch to fix up the syntax for
add/sub/adds/subs and friends with a sign/zero-extended operand [0],
this patch removes the "mult" variants of these patterns which are
all redundant.

This patch removes the following patterns from the AArch64 backend:

 *adds_mul_imm_
 *subs_mul_imm_
 *adds__multp2
 *subs__multp2
 *add_mul_imm_
 *add__mult_
 *add__mult_si_uxtw
 *add__multp2
 *add_si_multp2_uxtw
 *add_uxt_multp2
 *add_uxtsi_multp2_uxtw
 *sub_mul_imm_
 *sub_mul_imm_si_uxtw
 *sub__multp2
 *sub_si_multp2_uxtw
 *sub_uxt_multp2
 *sub_uxtsi_multp2_uxtw
 *neg_mul_imm_2
 *neg_mul_imm_si2_uxtw

Together with the following predicates which were used only by these
patterns:

 - aarch64_pwr_imm3
 - aarch64_pwr_2_si
 - aarch64_pwr_2_di

These patterns are all redundant since multiplications by powers of two
should be represented as shifts outside a (mem).
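
For instance (an illustrative example, not from the testsuite):

  /* At -O2 on aarch64 this compiles to "add w0, w1, w0, lsl 2":
     the (mult x 4) form is canonicalized to a shift everywhere
     except inside memory addresses.  */
  int f (int x, int y) { return y + x * 4; }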

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu (on top of [0]),
   no regressions.

OK for master (when applied after [0])?

Thanks,
Alex

[0] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552137.html

---

gcc/ChangeLog:

* config/aarch64/aarch64.md (*adds_mul_imm_): Delete.
(*subs_mul_imm_): Delete.
(*adds__multp2): Delete.
(*subs__multp2): Delete.
(*add_mul_imm_): Delete.
(*add__mult_): Delete.
(*add__mult_si_uxtw): Delete.
(*add__multp2): Delete.
(*add_si_multp2_uxtw): Delete.
(*add_uxt_multp2): Delete.
(*add_uxtsi_multp2_uxtw): Delete.
(*sub_mul_imm_): Delete.
(*sub_mul_imm_si_uxtw): Delete.
(*sub__multp2): Delete.
(*sub_si_multp2_uxtw): Delete.
(*sub_uxt_multp2): Delete.
(*sub_uxtsi_multp2_uxtw): Delete.
(*neg_mul_imm_2): Delete.
(*neg_mul_imm_si2_uxtw): Delete.
* config/aarch64/predicates.md (aarch64_pwr_imm3): Delete.
(aarch64_pwr_2_si): Delete.
(aarch64_pwr_2_di): Delete.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b20dd0b1a0..59516db4b77 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2341,38 +2341,6 @@
   [(set_attr "type" "alus_shift_imm")]
 )
 
-(define_insn "*adds_mul_imm_"
-  [(set (reg:CC_NZ CC_REGNUM)
-   (compare:CC_NZ
-(plus:GPI (mult:GPI
-   (match_operand:GPI 1 "register_operand" "r")
-   (match_operand:QI 2 "aarch64_pwr_2_" "n"))
-  (match_operand:GPI 3 "register_operand" "r"))
-(const_int 0)))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (plus:GPI (mult:GPI (match_dup 1) (match_dup 2))
- (match_dup 3)))]
-  ""
-  "adds\\t%0, %3, %1, lsl %p2"
-  [(set_attr "type" "alus_shift_imm")]
-)
-
-(define_insn "*subs_mul_imm_"
-  [(set (reg:CC_NZ CC_REGNUM)
-   (compare:CC_NZ
-(minus:GPI (match_operand:GPI 1 "register_operand" "r")
-   (mult:GPI
-(match_operand:GPI 2 "register_operand" "r")
-(match_operand:QI 3 "aarch64_pwr_2_" "n")))
-(const_int 0)))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (minus:GPI (match_dup 1)
-  (mult:GPI (match_dup 2) (match_dup 3]
-  ""
-  "subs\\t%0, %1, %2, lsl %p3"
-  [(set_attr "type" "alus_shift_imm")]
-)
-
 (define_insn "*adds__"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
@@ -2437,46 +2405,6 @@
   [(set_attr "type" "alus_ext")]
 )
 
-(define_insn "*adds__multp2"
-  [(set (reg:CC_NZ CC_REGNUM)
-   (compare:CC_NZ
-(plus:GPI (ANY_EXTRACT:GPI
-   (mult:GPI (match_operand:GPI 1 "register_operand" "r")
- (match_operand 2 "aarch64_pwr_imm3" "Up3"))
-   (match_operand 3 "const_int_operand" "n")
-   (const_int 0))
-  (match_operand:GPI 4 "register_operand" "rk"))
-   (const_int 0)))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (plus:GPI (ANY_EXTRACT:GPI (mult:GPI (match_dup 1) (match_dup 2))
-  (match_dup 3)
-  (const_int 0))
- (match_dup 4)))]
-  "aarch64_is_extend_from_extract (mode, operands[2], operands[3])"
-  "adds\\t%0, %4, %1, xt%e3 %p2"
-  [(set_attr "type" "alus_ext")]
-)
-
-(define_insn "*subs__multp2"
-  [(set (reg:CC_NZ CC_REGNUM)
-   (compare:CC_NZ
-(minus:GPI (match_operand:GPI 4 "register_operand" "rk")
-   (ANY_EXTRACT:GPI
-(mult:GPI (match_operand:GPI 1 "register_operand" "r")
-  (match_operand 2 "aarch64_pwr_imm3" "Up3"))
-(match_operand 3 "const_int_operand" "n")
-(const_int 0)))
-   (const_int 0)))
-   (set (match_operand:GPI 0 "register_operand" "=r")
-   (minus:GPI (match_dup 4) (ANY_EXTRACT:GPI
- (mult:GPI (match_dup 

Re: [PATCH] x86: Add -mbaseline-isas-only/target("baseline-isas-only")

2020-08-27 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 27, 2020 at 4:45 PM H.J. Lu  wrote:

> > > > > > How about target("baseline-isas-only")? All CPUID functions are
> > > > > > inlined.
> > > > >
> > > > > No, I don't think this is a good idea. Now consider the situation that
> > > > > caller functions are compiled with e.g. -mgeneral-regs-only. Due to
> > > > > #pragmas, CPUID functions are compiled with a superset ISAs, so they
> > > > > again won't be inlined. ISAs of caller functions and CPUID should
> > > > > match, the best way is to include  after the #pragma. And
> > > > > IMO, general-regs-only target #pragma is an excellent setting for
> > > > > both: cpuid.h and caller bit testing functions.
> > > > >
> > > > > So, if we care about inlining, decorating cpuid.h with target pragmas
> > > > > is a bad idea.
> > > >
> > > > This can be done with #pragma in .
> > > >
> > >
> > > We just need to update ix86_can_inline_p to allow inline functions
> > > with baseline-isas-only and general-regs-only attributes if caller
> > > supports the same set of ISAs.
> > >
> > > Here is the updated patch.
> >
> > I'm not against it, but I don't plan to approve the attached patch.
> >
>
> How about this one?

I really don't see any benefit in introducing baseline-isas-only
#pragma, when we have general-regs-only #pragma. We may want (for
whatever reason) to avoid SSE2 movd/movq instructions also for 64bit
targets in functions that test bits, returned by cpuid. And since
cpuid.h functions are extremely simple (because we want them to be
inlined!), we can simply #include them after mentioned #pragma,
together with bit testing functions. This way, all problems involving
inlining are avoided.
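
(A sketch of that usage; the have_avx2 helper is illustrative, not
from any posted patch:

  #pragma GCC push_options
  #pragma GCC target("general-regs-only")
  #include <cpuid.h>

  static inline int
  have_avx2 (void)
  {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
      return 0;
    return (ebx & bit_AVX2) != 0;
  }
  #pragma GCC pop_options

Both the cpuid wrappers and the bit tests then get the same ISA set,
so inlining just works.)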

Uros.


Re: [PATCH v2] testsuite: Update some vect cases for partial vectors

2020-08-27 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi Richard,
>
>>> Yeah, the comments were confusing, its intent is to check which targets
>>> support partial vectors and which usage to be used.
>>>
>>> How about to update them like:
>>>
>>> "Return true if loops using partial vectors are supported and usage kind is
>>> 1/2".
>> 
>> I wasn't really commenting on the comment so much as the intent.
>> It should be possible to run the testsuite with:
>> 
>>   --target_board unix/--param=vect-partial-vector-usage=1
>> 
>> and get the right results.
>> 
 E.g. maybe use check_compile to run gcc with “-Q --help=params” and an
 arbitrary output type (probably assembly).  Then use “regexp” on the
 lines to parse the --param=vect-partial-vector-usage value.  At that
 point it would be worth caching the result.
>>>
>>> Now the default value of this parameter is 2, even for those targets which
>>> don't have the supports with partial vectors.  Since we will get the value
>>> 2 on those unsupported targets, it looks like we have to set it manually?
>> 
>> I think that just means we want:
>> 
>> vect_len_load_store
>>   the len_load_store equivalent of vect_fully_masked, i.e. whether
>>   the target supports len load/store (regardless of whether the
>>   --param enables it)
>> 
>> vect_partial_vectors
>>   (vect_fully_masked || vect_len_load_store) && param != 0
>> 
>> vect_partial_vectors_usage_1
>>   (vect_fully_masked || vect_len_load_store) && param == 1
>> 
>> vect_partial_vectors_usage_2
>>   (vect_fully_masked || vect_len_load_store) && param == 2
>> 
>
> Thanks for the clarification!
>
> Sorry for the late update, this new version depends on the --help= patch,
> I held it for a while.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9, with/without any
> explicit usage setting, it also works well with --target_board 
> unix/--param=vect-partial-vector-usage=1.
>
> By the way, for the testing on usage 1, it needs the function 
> check_effective_target_vect_len_load_store to return true for Power9;
> this part will be sent for review separately.

Gah, sorry for the slow reply.  Thought I'd already reviewed this,
but just realised I hadn't.

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 9f37ac26241..2290af5812d 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1689,6 +1689,19 @@ Target supports AND, IOR and XOR reduction on vectors.
>  
>  @item vect_fold_extract_last
>  Target supports the @code{fold_extract_last} optab.
> +
> +@item vect_len_load_store
> +Target supports loops using length-based partial vectors.

Probably easiest to talk in terms of internals, so:

  Target supports the @code{len_load} and @code{len_store} optabs.

> +@item vect_partial_vectors_usage_1
> +Target supports loops using partial vectors but only for those loops whose 
> need
> +to iterate can be removed.

Here too I think it might be better to talk in terms of existing
features and flags, so:

  Target supports loop vectorization with partial vectors and
  @code{vect-partial-vector-usage} is set to 1.

> +@item vect_partial_vectors_usage_2
> +Target supports loops using partial vectors and for all loops.

Similarly here for 2.

> +
> +@item vect_partial_vectors
> +Target supports loops using partial vectors.

For this one, probably:

  Target supports loop vectorization with partial vectors and
  @code{vect-partial-vector-usage} is nonzero.

> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c 
> b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> index 5200ed1cd94..d6b0604db78 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> @@ -49,5 +49,7 @@ int main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_unpack } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  
> { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } 
> */
> +/* The epilogues are vectorized using partial vectors.  */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  
> { target { vect_unpack && {! vect_partial_vectors_usage_1 } } xfail { 
> vect_variable_length && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  
> { target { vect_unpack && { vect_partial_vectors_usage_1 } } xfail { 
> vect_variable_length && vect_load_lanes } } } } */

No need for the braces around vect_partial_vectors_usage_1 when it's
not negated.  Same for later tests.

> +# Return true if loops using partial vectors are supported but only for loops
> +# whose need to iterate can be removed, that is, value of
> +# param_vect_partial_vector_usage is set to 1.

For these comments, I think it would be good to use the sourcebuild.texi
wording, but with:

  Return true if the target supports …

instead of just “Target supports …”.

OK with those changes, thanks.

Richard


Re: [PATCH] libstdc++: Fix arithmetic bug in chrono::year_month::operator+

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 12:37 -0400, Patrick Palka wrote:

On Thu, 27 Aug 2020, Jonathan Wakely wrote:


On 27/08/20 11:29 -0400, Patrick Palka via Libstdc++ wrote:
> This fixes the months-based addition for year_month when the
> year_month's month component is zero.
>
> Successfully tested on x86_64-pc-linux-gnu, on the 'date' library's
> calendar and (now) on libcxx's calendar tests.  Does this look OK to
> commit?
>
> libstdc++-v3/ChangeLog:
>
>* include/std/chrono (year_month::operator+): Properly handle a
>month value of 0 by casting the month value to int before
>subtracting 1 from it so that the difference is signed in the
>subsequent addition.
>* testsuite/std/time/year_month/1.cc: Test addition of months to
>a year_month whose month value is below and above the normalized
>range of [1,12].
> ---
> libstdc++-v3/include/std/chrono |  2 +-
> libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index 417954d103b..398008c8f31 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{
>  // TODO: Optimize?
>  auto __m = __ym.month() + __dm;
> -auto __i = unsigned{__ym.month()} - 1 + __dm.count();
> +auto __i = int(unsigned{__ym.month()}) - 1 + __dm.count();

The mix of paren-init and braced-init makes me do a double-take.

It makes me stop and wonder if there is some reason? Maybe to check
for narrowing? But it can't narrow since it just calls
year_month::operator unsigned().


[time.cal] seems to mostly use braced-init for converting a calendar
type to the underlying type, and I mindlessly followed suit :) e.g. in
[time.cal.year.nonmembers] and [time.cal.day.nonmembers].

Ah, but there's [time.cal.month.nonmembers]/7 which uses a static_cast
instead.



But it's really just because int{unsigned{__ym}} /would/ give a
narrowing warning, right?


Hmm yes, looks like it.



Would either int(unsigned(__ym)) or static_cast<int>(__ym) make sense
instead, to avoid people wondering about it in future?


The first form works for me.  The second form is rejected it seems.

Does the following look OK to commit?


OK, thanks.



-- >8 --

Subject: [PATCH] libstdc++: Fix arithmetic bug in
chrono::year_month::operator+

This fixes the months-based addition for year_month when the
year_month's month component is zero.
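
To see the failure mode concretely (a worked illustration, assuming a
typical LP64 target where months::rep is a 64-bit signed type):

  // With __ym.month() == month{0} and __dm == months{-1}:
  //   unsigned{month{0}} - 1       -> wraps to 4294967295u
  //   4294967295u + __dm.count()   -> 4294967294 after the usual
  //                                   arithmetic conversions
  // so __i is hugely positive and the (__i < 0) branch is never taken.
  // With the cast, the arithmetic stays signed:
  //   int(unsigned(month{0})) - 1 + (-1) == -2
  //   years{(-2 - 11) / 12} == years{-1}, giving 2009y/November.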

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month::operator+): Properly handle a
month value of 0 by casting the month value to int before
subtracting 1 from it so that the difference is sign-extended in
the subsequent addition.
* testsuite/std/time/year_month/1.cc: Test adding months to a
year_month whose month component is below or above the
normalized range of [1,12].
---
libstdc++-v3/include/std/chrono |  2 +-
libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 417954d103b..f0fa66c7a2b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  // TODO: Optimize?
  auto __m = __ym.month() + __dm;
- auto __i = unsigned{__ym.month()} - 1 + __dm.count();
+ auto __i = int(unsigned(__ym.month())) - 1 + __dm.count();
  auto __y = (__i < 0
  ? __ym.year() + years{(__i - 11) / 12}
  : __ym.year() + years{__i / 12});
diff --git a/libstdc++-v3/testsuite/std/time/year_month/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month/1.cc
index 007cfeb2f72..4c331dcdb50 100644
--- a/libstdc++-v3/testsuite/std/time/year_month/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month/1.cc
@@ -83,4 +83,16 @@ constexpr_year_month()
  static_assert(2017y/33 + months{0} == 2019y/9);

  static_assert(2010y/January + months{-12} == 2009y/January);
+
+  static_assert(2010y/month{0} + months{-1} == 2009y/November);
+  static_assert(2010y/month{0} + months{0} == 2009y/December);
+  static_assert(2010y/month{0} + months{1} == 2010y/January);
+  static_assert(2010y/month{0} + months{2} == 2010y/February);
+  static_assert(2010y/month{0} + months{11} == 2010y/November);
+  static_assert(2010y/month{0} + months{12} == 2010y/December);
+  static_assert(2010y/month{0} + months{13} == 2011y/January);
+
+  static_assert(months{-1} + 2010y/month{37} == 2012y/December);
+  static_assert(months{0} + 2010y/month{37} == 2013y/January);
+  static_assert(months{1} + 2010y/month{37} == 2013y/February);
}
--
2.28.0.337.ge9b77c84a0








[PATCH] hppa: Improve hppa_rtx_costs for shifts by constants.

2020-08-27 Thread Roger Sayle

Hi Dave (and Jeff),
For your consideration, here's a patch that should fix the recent regression
of gcc.dg/tree-ssa/slsr-13.c on hppa targets.

This patch provides more accurate rtx_costs estimates for shifts by
integer constants (which are cheaper than by a register amount).
Fine tuning these is sufficient to have simple-ssa-strength-reduce
prefer multiplications by four, over multiplications by five.

Technically only the ASHIFT is required to fix the regression, but for
symmetry
I've added the appropriate PA-RISC values for all three shift types (in
SImode).
I've also checked that this doesn't reintroduce PR middle-end/87256.

I was wondering whether you could please "put this in the queue", and
reconfirm that PR middle-end/87256 remains resolved?


2020-08-27  Roger Sayle  

gcc/ChangeLog
* config/pa/pa.c (hppa_rtx_costs) [ASHIFT, ASHIFTRT, LSHIFTRT]:
Provide accurate costs for shifts of integer constants.

Many thanks in advance (and my apologies for the inconvenience),
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index cb88852..a9223ab 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -1642,6 +1642,14 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
  else
*total = COSTS_N_INSNS (18);
}
+  else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+   {
+ if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (2);
+ else
+   *total = COSTS_N_INSNS (1);
+ return true;
+   }
   else if (TARGET_64BIT)
*total = COSTS_N_INSNS (4);
   else
@@ -1665,6 +1673,14 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
  else
*total = COSTS_N_INSNS (19);
}
+  else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+   {
+ if (TARGET_64BIT)
+   *total = COSTS_N_INSNS (2);
+ else
+   *total = COSTS_N_INSNS (1);
+ return true;
+   }
   else if (TARGET_64BIT)
*total = COSTS_N_INSNS (4);
   else
@@ -1688,6 +1704,11 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
  else
*total = COSTS_N_INSNS (15);
}
+  else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
   else if (TARGET_64BIT)
*total = COSTS_N_INSNS (3);
   else


Re: [PATCH] libstdc++: Fix arithmetic bug in chrono::year_month::operator+

2020-08-27 Thread Patrick Palka via Gcc-patches
On Thu, 27 Aug 2020, Jonathan Wakely wrote:

> On 27/08/20 11:29 -0400, Patrick Palka via Libstdc++ wrote:
> > This fixes the months-based addition for year_month when the
> > year_month's month component is zero.
> > 
> > Successfully tested on x86_64-pc-linux-gnu, on the 'date' library's
> > calendar and (now) on libcxx's calendar tests.  Does this look OK to
> > commit?
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/std/chrono (year_month::operator+): Properly handle a
> > month value of 0 by casting the month value to int before
> > subtracting 1 from it so that the difference is signed in the
> > subsequent addition.
> > * testsuite/std/time/year_month/1.cc: Test addition of months to
> > a year_month whose month value is below and above the normalized
> > range of [1,12].
> > ---
> > libstdc++-v3/include/std/chrono |  2 +-
> > libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
> > 2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libstdc++-v3/include/std/chrono
> > b/libstdc++-v3/include/std/chrono
> > index 417954d103b..398008c8f31 100644
> > --- a/libstdc++-v3/include/std/chrono
> > +++ b/libstdc++-v3/include/std/chrono
> > @@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > {
> >   // TODO: Optimize?
> >   auto __m = __ym.month() + __dm;
> > - auto __i = unsigned{__ym.month()} - 1 + __dm.count();
> > + auto __i = int(unsigned{__ym.month()}) - 1 + __dm.count();
> 
> The mix of paren-init and braced-init makes me do a double-take.
> 
> It makes me stop and wonder if there is some reason? Maybe to check
> for narrowing? But it can't narrow since it just calls
> year_month::operator unsigned().

[time.cal] seems to mostly use braced-init for converting a calendar
type to the underlying type, and I mindlessly followed suit :) e.g. in
[time.cal.year.nonmembers] and [time.cal.day.nonmembers].

Ah, but there's [time.cal.month.nonmembers]/7 which uses a static_cast
instead.

> 
> But it's really just because int{unsigned{__ym}} /would/ give a
> narrowing warning, right?

Hmm yes, looks like it.

> 
> Would either int(unsigned(__ym)) or static_cast<int>(__ym) make sense
> instead, to avoid people wondering about it in future?

The first form works for me.  The second form is rejected it seems.

Does the following look OK to commit?

-- >8 --

Subject: [PATCH] libstdc++: Fix arithmetic bug in
 chrono::year_month::operator+

This fixes the months-based addition for year_month when the
year_month's month component is zero.

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month::operator+): Properly handle a
month value of 0 by casting the month value to int before
subtracting 1 from it so that the difference is sign-extended in
the subsequent addition.
* testsuite/std/time/year_month/1.cc: Test adding months to a
year_month whose month component is below or above the
normalized range of [1,12].
---
 libstdc++-v3/include/std/chrono |  2 +-
 libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 417954d103b..f0fa66c7a2b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  // TODO: Optimize?
  auto __m = __ym.month() + __dm;
- auto __i = unsigned{__ym.month()} - 1 + __dm.count();
+ auto __i = int(unsigned(__ym.month())) - 1 + __dm.count();
  auto __y = (__i < 0
  ? __ym.year() + years{(__i - 11) / 12}
  : __ym.year() + years{__i / 12});
diff --git a/libstdc++-v3/testsuite/std/time/year_month/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month/1.cc
index 007cfeb2f72..4c331dcdb50 100644
--- a/libstdc++-v3/testsuite/std/time/year_month/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month/1.cc
@@ -83,4 +83,16 @@ constexpr_year_month()
   static_assert(2017y/33 + months{0} == 2019y/9);
 
   static_assert(2010y/January + months{-12} == 2009y/January);
+
+  static_assert(2010y/month{0} + months{-1} == 2009y/November);
+  static_assert(2010y/month{0} + months{0} == 2009y/December);
+  static_assert(2010y/month{0} + months{1} == 2010y/January);
+  static_assert(2010y/month{0} + months{2} == 2010y/February);
+  static_assert(2010y/month{0} + months{11} == 2010y/November);
+  static_assert(2010y/month{0} + months{12} == 2010y/December);
+  static_assert(2010y/month{0} + months{13} == 2011y/January);
+
+  static_assert(months{-1} + 2010y/month{37} == 2012y/December);
+  static_assert(months{0} + 2010y/month{37} == 2013y/January);
+  static_assert(months{1} + 2010y/month{37} == 2013y/February);
 }
-- 
2.28.0.337.ge9b77c84a0



Re: [PATCH] libstdc++: Fix operator overload resolution for calendar types

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 25/08/20 15:47 -0400, Patrick Palka via Libstdc++ wrote:

My original patch that implemented the calendar type operations failed
to enforce a constraint on some of the addition/subtraction operator
overloads that take a 'months' argument:

 Constraints: If the argument supplied by the caller for the months
 parameter is convertible to years, its implicit conversion sequence to
 years is worse than its implicit conversion sequence to months

This constraint is relevant when adding/subtracting a duration to/from
say a year_month when the given duration is convertible to both 'months'
and to 'years'.  The correct behavior here in light of this constraint
is to select the (more efficient) 'years'-taking overload, but we
currently emit an ambiguous overload error.

This patch follows the approach taken in the 'date' library and defines
the constrained 'months'-taking operator overloads as function
templates, so that we break such an implicit-conversion tie by selecting
the non-template 'years'-taking overload.
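
The mechanism in miniature (a self-contained sketch with stand-in
types, not the library code):

  struct months { };
  struct years { };
  struct ym { };

  // The dummy template parameter makes this overload a template, so
  // it loses ties against the non-template overload below.
  template<typename = void>
  ym operator+ (const ym&, const months&) { return {}; }

  ym operator+ (const ym&, const years&) { return {}; }

  struct dur
  {
    operator months () const { return {}; }
    operator years () const { return {}; }
  };

  // Both conversion sequences are equally good, yet the call is not
  // ambiguous: the non-template 'years' overload is preferred.
  ym f (ym y, dur d) { return y + d; }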

Tested on x86_64-pc-linux-gnu, does this look OK to commit?  (The below
diff is generated with --ignore-space-change for easier review.  In the
actual patch, the function templates are indented two extra spaces after
the template parameter list.)

libstdc++-v3/ChangeLog:

* include/std/chrono (__detail::__months_years_conversion_disambiguator):
Define.
(year_month::operator+=): Turn the 'months'-taking overload
into a function template, so that the 'years'-taking overload is
selected in case of an equally-ranked implicit conversion
sequence to both 'months' and 'years' from the supplied argument.
(year_month::operator-=): Likewise.
(year_month::operator+): Likewise.
(year_month::operator-): Likewise.
(year_month_day::operator+=): Likewise.
(year_month_day::operator-=): Likewise.
(year_month_day::operator+): Likewise.
(year_month_day::operator-): Likewise.
(year_month_day_last::operator+=): Likewise.
(year_month_day_last::operator-=): Likewise.
(year_month_day_last::operator+): Likewise
(year_month_day_last::operator-): Likewise.
(year_month_weekday::operator+=): Likewise.
(year_month_weekday::operator-=): Likewise.
(year_month_weekday::operator+): Likewise.
(year_month_weekday::operator-): Likewise.
(year_month_weekday_last::operator+=): Likewise.
(year_month_weekday_last::operator-=): Likewise.
(year_month_weekday_last::operator+): Likewise.
(year_month_weekday_last::operator-): Likewise.
* testsuite/std/time/year_month/2.cc: New test.
* testsuite/std/time/year_month_day/2.cc: New test.
* testsuite/std/time/year_month_day_last/2.cc: New test.
* testsuite/std/time/year_month_weekday/2.cc: New test.
* testsuite/std/time/year_month_weekday_last/2.cc: New test.
---
libstdc++-v3/include/std/chrono   | 52 ++-
.../testsuite/std/time/year_month/2.cc| 40 ++
.../testsuite/std/time/year_month_day/2.cc| 40 ++
.../std/time/year_month_day_last/2.cc | 40 ++
.../std/time/year_month_weekday/2.cc  | 40 ++
.../std/time/year_month_weekday_last/2.cc | 40 ++
6 files changed, 250 insertions(+), 2 deletions(-)
create mode 100644 libstdc++-v3/testsuite/std/time/year_month/2.cc
create mode 100644 libstdc++-v3/testsuite/std/time/year_month_day/2.cc
create mode 100644 libstdc++-v3/testsuite/std/time/year_month_day_last/2.cc
create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday/2.cc
create mode 100644 libstdc++-v3/testsuite/std/time/year_month_weekday_last/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 3cc1438a7b6..0e272c3da58 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2046,6 +2046,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

// YEAR_MONTH

+namespace __detail
+{
+  // [time.cal.ym], [time.cal.ymd], etc constrain the 'months'-taking
+  // addition/subtraction operator overloads like so:
+  //
+  //   Constraints: if the argument supplied by the caller for the months
+  //   parameter is convertible to years, its implicit conversion sequence
+  //   to years is worse than its implicit conversion sequence to months.
+  //
+  // We realize this constraint by defining the 'months'-taking overloads as
+  // function templates (with a dummy defaulted template parameter), so that
+  // overload resolution doesn't select the 'months'-taking overload unless
+  // the implicit conversion sequence to 'months' is better than that to
+  // 'years'.
+  using __months_years_conversion_disambiguator = void;
+}
+
class year_month
{
private:
@@ -2068,6 +2085,7 @@ 

Re: [PATCH] libstdc++: Fix arithmetic bug in chrono::year_month::operator+

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 11:29 -0400, Patrick Palka via Libstdc++ wrote:

This fixes the months-based addition for year_month when the
year_month's month component is zero.

Successfully tested on x86_64-pc-linux-gnu, on the 'date' library's
calendar and (now) on libcxx's calendar tests.  Does this look OK to
commit?

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month::operator+): Properly handle a
month value of 0 by casting the month value to int before
subtracting 1 from it so that the difference is signed in the
subsequent addition.
* testsuite/std/time/year_month/1.cc: Test addition of months to
a year_month whose month value is below and above the normalized
range of [1,12].
---
libstdc++-v3/include/std/chrono |  2 +-
libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 417954d103b..398008c8f31 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  // TODO: Optimize?
  auto __m = __ym.month() + __dm;
- auto __i = unsigned{__ym.month()} - 1 + __dm.count();
+ auto __i = int(unsigned{__ym.month()}) - 1 + __dm.count();


The mix of paren-init and braced-init makes me do a double-take.

It makes me stop and wonder if there is some reason? Maybe to check
for narrowing? But it can't narrow since it just calls
year_month::operator unsigned().

But it's really just because int{unsigned{__ym}} /would/ give a
narrowing warning, right?

Would either int(unsigned(__ym)) or static_cast<int>(__ym) make sense
instead, to avoid people wondering about it in future?



  auto __y = (__i < 0
  ? __ym.year() + years{(__i - 11) / 12}
  : __ym.year() + years{__i / 12});
diff --git a/libstdc++-v3/testsuite/std/time/year_month/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month/1.cc
index 007cfeb2f72..4c331dcdb50 100644
--- a/libstdc++-v3/testsuite/std/time/year_month/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month/1.cc
@@ -83,4 +83,16 @@ constexpr_year_month()
  static_assert(2017y/33 + months{0} == 2019y/9);

  static_assert(2010y/January + months{-12} == 2009y/January);
+
+  static_assert(2010y/month{0} + months{-1} == 2009y/November);
+  static_assert(2010y/month{0} + months{0} == 2009y/December);
+  static_assert(2010y/month{0} + months{1} == 2010y/January);
+  static_assert(2010y/month{0} + months{2} == 2010y/February);
+  static_assert(2010y/month{0} + months{11} == 2010y/November);
+  static_assert(2010y/month{0} + months{12} == 2010y/December);
+  static_assert(2010y/month{0} + months{13} == 2011y/January);
+
+  static_assert(months{-1} + 2010y/month{37} == 2012y/December);
+  static_assert(months{0} + 2010y/month{37} == 2013y/January);
+  static_assert(months{1} + 2010y/month{37} == 2013y/February);
}
--
2.28.0.337.ge9b77c84a0





Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-27 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch has been updated based on the comments from Will
and Segher.  

The patch is a subset of the mainline commit:

   commit 07d456bb80a16405723c98c2ab74ccc2a5a23898
   Author: Carl Love 
   Date:   Mon Aug 10 19:37:41 2020 -0500

   rs6000, restrict bfloat convert intrinsic to Power 10. Fix
   BU_P10V macro definitions.

Only the changes from the mainline patch to restrict the bfloat convert
intrinsics (XVCVBF16SPN and XVCVSPBF16) to Power 10 are included in
this patch.  This patch adds the BU_P10V_VSX_1 macro definition for
Power 10.  The macro definition restricts the use of the named
intrinsics similarly to Power 8 and Power 9.

The changes for the BU_P10V macro definitions from the mainline patch do
not apply to the GCC 10 branch, as the macro definitions and uses do not
exist in the GCC 10 branch.

The patch has been updated per the comments above.  It was rebased onto
the latest GCC 10 code base and retested on 

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.  Please let me know if this patch is acceptable
for the GCC 10 branch.  Thank you.

 Carl Love


rs6000, restrict bfloat convert intrinsic to Power 10.

gcc/ChangeLog

2020-08-26  Carl Love  

	* config/rs6000/rs6000-builtin.def (BU_P10V_VSX_1): New builtin
	macro expansion.
	(XVCVBF16SPN, XVCVSPBF16): Replace macro expansion BU_VSX_1 with
	BU_P10V_VSX_1.
	* config/rs6000/rs6000-call.c (VSX_BUILTIN_XVCVSPBF16,
	VSX_BUILTIN_XVCVBF16SPN): Replace with P10V_BUILTIN_XVCVSPBF16,
	P10V_BUILTIN_XVCVBF16SPN respectively.
---
 gcc/config/rs6000/rs6000-builtin.def | 12 ++--
 gcc/config/rs6000/rs6000-call.c  |  4 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 88f78cb0a15..5de17e79855 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1014,6 +1014,14 @@
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* Built-ins for ISA 3.1 Altivec instructions.  */
+#define BU_P10V_VSX_1(ENUM, NAME, ATTR, ICODE)\
+  RS6000_BUILTIN_1 (P10V_BUILTIN_ ## ENUM, /* ENUM */  \
+   "__builtin_vsx_" NAME,  /* NAME */  \
+   RS6000_BTM_P10, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+   | RS6000_BTC_UNARY),\
+   CODE_FOR_ ## ICODE) /* ICODE */
 #endif
 
 
@@ -2698,8 +2706,8 @@ BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, 
"__builtin_cfstring", RS6000_BTM_ALWAYS,
  RS6000_BTC_MISC)
 
 /* POWER10 MMA builtins.  */
-BU_VSX_1 (XVCVBF16SPN, "xvcvbf16spn",  MISC, vsx_xvcvbf16spn)
-BU_VSX_1 (XVCVSPBF16,  "xvcvspbf16",   MISC, vsx_xvcvspbf16)
+BU_P10V_VSX_1 (XVCVBF16SPN,"xvcvbf16spn",  MISC, vsx_xvcvbf16spn)
+BU_P10V_VSX_1 (XVCVSPBF16, "xvcvspbf16",   MISC, vsx_xvcvspbf16)
 
 BU_MMA_1 (XXMFACC, "xxmfacc",  QUAD, mma_xxmfacc)
 BU_MMA_1 (XXMTACC, "xxmtacc",  QUAD, mma_xxmtacc)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2cf78dfa5fe..fc1671e1bb2 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13383,8 +13383,8 @@ builtin_function_type (machine_mode mode_ret, 
machine_mode mode_arg0,
 case P8V_BUILTIN_VGBBD:
 case MISC_BUILTIN_CDTBCD:
 case MISC_BUILTIN_CBCDTD:
-case VSX_BUILTIN_XVCVSPBF16:
-case VSX_BUILTIN_XVCVBF16SPN:
+case P10V_BUILTIN_XVCVSPBF16:
+case P10V_BUILTIN_XVCVBF16SPN:
   h.uns_p[0] = 1;
   h.uns_p[1] = 1;
   break;
-- 
2.17.1




Re: [PATCH 4/6] Add `+' for Jobserver Integration

2020-08-27 Thread Jan Hubicka
> On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:
> 
> >  libbacktrace/Makefile.in |   2 +-
> >  zlib/Makefile.in |  64 ++--
> 
> These directories use makefiles generated by automake.  Rather than 
> modifying the generated files, you need to modify the sources (whether 
> that's Makefile.am, or code in automake itself - if in automake itself, we 
> should wait for an actual new automake release before updating the version 
> used in GCC).

We really want to fix make to allow tools to connect to the jobserver
without the "+" on the line.  It affects other things, like dry run and
is generally a sad hack.

Honza
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH 3/6] Implement fork-based parallelism engine

2020-08-27 Thread Jan Hubicka
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index c0b45795059..22405098dc5 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
>decl_node->remove ();
>  }
>  
> +/* Release function dominator info if present.  */
> +
> +void
> +cgraph_node::maybe_release_dominators (void)
> +{
> +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> +
> +  if (fun && fun->cfg)
> +{
> +  if (dom_info_available_p (fun, CDI_DOMINATORS))
> + free_dominance_info (fun, CDI_DOMINATORS);
> +  if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> + free_dominance_info (fun, CDI_POST_DOMINATORS);
> +}
> +}

I am not sure if that needs to be a member function, but if so we want to
merge it with other places in cgraph.c and cgraphunit.c where dominators
are freed.  I do not think you need to check availability.
> +
>  /* Record that DECL1 and DECL2 are semantically identical function
> versions.  */
>  void
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index b4a7871bd3d..72ac19f9672 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -463,6 +463,15 @@ public:
>   Return NULL if there's no such node.  */
>static symtab_node *get_for_asmname (const_tree asmname);
>  
> +  /* Get symtab node by order.  */
> +  static symtab_node *find_by_order (int order);

This is quadratic and moreover seems unused. Why do you add it?
> +
> +  /* Get symtab_node by its name.  */
> +  static symtab_node *find_by_name (const char *);

Similarly here, note that names are not really very meaningful as lookup
things, since they get duplicated.
> +
> +  /* Get symtab_node by its ASM name.  */
> +  static symtab_node *find_by_asm_name (const char *);

For this we have get_for_asmname (which also populates asm name hash as
needed and is not quadratic)
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index d10d635e942..73e4bed3b61 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
>  {
>location_t saved_loc;
>  
> +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> +  if (body_removed)
> +return;

Indeed, we should know :)
> +
> +
>/* We ought to not compile any inline clones.  */
>gcc_assert (!inlined_to);
>  
> @@ -2658,6 +2663,7 @@ ipa_passes (void)
>  
>execute_ipa_summary_passes
>   ((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
> +
This seems accidental.
>  }
>  
>/* Some targets need to handle LTO assembler output specially.  */
> @@ -2687,10 +2693,17 @@ ipa_passes (void)
>if (flag_generate_lto || flag_generate_offload)
>  targetm.asm_out.lto_end ();
>  
> -  if (!flag_ltrans
> +  if (split_outputs)
> +flag_ltrans = true;
> +
> +  if ((!flag_ltrans || split_outputs)
>&& ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
> || !flag_lto || flag_fat_lto_objects))
>  execute_ipa_pass_list (passes->all_regular_ipa_passes);
> +
> +  if (split_outputs)
> +flag_ltrans = false;
> +
>invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
>  
>bitmap_obstack_release (NULL);
> @@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
>}
>  }
>  
> +static bool is_number (const char *str)
> +{
> +  while (*str != '\0')
> +switch (*str++)
> +  {
> + case '0':
> + case '1':
> + case '2':
> + case '3':
> + case '4':
> + case '5':
> + case '6':
> + case '7':
> + case '8':
> + case '9':
> +   continue;
> + default:
> +   return false;
> +  }
> +
> +  return true;
> +}

This looks odd, we have other places where we parse number from command
line :)
> +
> +/* If forked, which child am I?  */
> +
> +static int childno = -1;
> +
> +static bool
> +maybe_compile_in_parallel (void)
> +{
> +  struct symtab_node *node;
> +  int partitions, i, j;
> +  int *pids;
> +
> +  bool promote_statics = param_promote_statics;
> +  bool balance = param_balance_partitions;
> +  bool jobserver = false;
> +  bool job_auto = false;
> +  int num_jobs = -1;
> +
> +  if (!flag_parallel_jobs || !split_outputs)
> +return false;
> +
> +  if (!strcmp (flag_parallel_jobs, "auto"))
> +{
> +  jobserver = jobserver_initialize ();
> +  job_auto = true;
> +}
> +  else if (!strcmp (flag_parallel_jobs, "jobserver"))
> +jobserver = jobserver_initialize ();
> +  else if (is_number (flag_parallel_jobs))
> +num_jobs = atoi (flag_parallel_jobs);
> +  else
> +gcc_unreachable ();
> +
> +  if (job_auto && !jobserver)
> +{
> +  num_jobs = sysconf (_SC_NPROCESSORS_CONF);
> +  if (num_jobs > 2)
> + num_jobs = 2;
> +}
> +
> +  if (num_jobs < 0 && !jobserver)
> +{
> +  inform (UNKNOWN_LOCATION,
> +   "-fparallel-jobs=jobserver, but no GNU Jobserver found");
> +  return false;
> +}
> +
> +  if (jobserver)
> +num_jobs = 2;
> +
> +  if (num_jobs == 0)
> +{
> +  inform 

[PATCH] libstdc++: Fix arithmetic bug in chrono::year_month::operator+

2020-08-27 Thread Patrick Palka via Gcc-patches
This fixes the months-based addition for year_month when the
year_month's month component is zero.

Successfully tested on x86_64-pc-linux-gnu, on the 'date' library's
calendar and (now) on libcxx's calendar tests.  Does this look OK to
commit?

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month::operator+): Properly handle a
month value of 0 by casting the month value to int before
subtracting 1 from it so that the difference is signed in the
subsequent addition.
* testsuite/std/time/year_month/1.cc: Test addition of months to
a year_month whose month value is below and above the normalized
range of [1,12].
---
 libstdc++-v3/include/std/chrono |  2 +-
 libstdc++-v3/testsuite/std/time/year_month/1.cc | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 417954d103b..398008c8f31 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2133,7 +2133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  // TODO: Optimize?
  auto __m = __ym.month() + __dm;
- auto __i = unsigned{__ym.month()} - 1 + __dm.count();
+ auto __i = int(unsigned{__ym.month()}) - 1 + __dm.count();
  auto __y = (__i < 0
  ? __ym.year() + years{(__i - 11) / 12}
  : __ym.year() + years{__i / 12});
diff --git a/libstdc++-v3/testsuite/std/time/year_month/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month/1.cc
index 007cfeb2f72..4c331dcdb50 100644
--- a/libstdc++-v3/testsuite/std/time/year_month/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month/1.cc
@@ -83,4 +83,16 @@ constexpr_year_month()
   static_assert(2017y/33 + months{0} == 2019y/9);
 
   static_assert(2010y/January + months{-12} == 2009y/January);
+
+  static_assert(2010y/month{0} + months{-1} == 2009y/November);
+  static_assert(2010y/month{0} + months{0} == 2009y/December);
+  static_assert(2010y/month{0} + months{1} == 2010y/January);
+  static_assert(2010y/month{0} + months{2} == 2010y/February);
+  static_assert(2010y/month{0} + months{11} == 2010y/November);
+  static_assert(2010y/month{0} + months{12} == 2010y/December);
+  static_assert(2010y/month{0} + months{13} == 2011y/January);
+
+  static_assert(months{-1} + 2010y/month{37} == 2012y/December);
+  static_assert(months{0} + 2010y/month{37} == 2013y/January);
+  static_assert(months{1} + 2010y/month{37} == 2013y/February);
 }
-- 
2.28.0.337.ge9b77c84a0



Re: [PATCH 3/6] Implement fork-based parallelism engine

2020-08-27 Thread Jan Hubicka
> We also implemented a GNU Make Jobserver integration to this mechanism,
> as implemented in jobserver.cc. This works as follows:
> diff --git a/gcc/jobserver.cc b/gcc/jobserver.cc
> new file mode 100644
> index 000..8cb374de86e
> --- /dev/null
> +++ b/gcc/jobserver.cc

I wonder if this can go in separately and be used first to throttle down
the number of streaming processes in WPA?

See the TODO at the beginning of do_whole_program_analysis and
the logic in stream_out_partitions.  Adding your API to take tokens for
every new partition being streamed (with the exception of the first one) is
probably very easy.
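
Something like the following sketch; jobserver_acquire/jobserver_release
are assumed names here, I did not check the actual jobserver.cc API:

  for (unsigned i = 0; i < n_partitions; i++)
    {
      if (i > 0)
        jobserver_acquire ();      /* block until a job token is free */
      stream_out_partition (i);    /* hypothetical streaming call */
      if (i > 0)
        jobserver_release ();
    }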

Note that there is also logic for detecting the jobserver in toplev.c that
may be merged with your logic.
In the longer run I think your jobserver.cc fits better in libiberty, but
we could keep it GCC-only until other tools want to integrate.

Honza


Re: [PATCH 2/6] Implement a new partitioner for parallel compilation

2020-08-27 Thread Jan Hubicka
> When using the LTO infrastructure to compile files in parallel, we
> can't simply use any of the LTO partitioners, since extra dependency
> analysis is required to ensure that some nodes are correctly
> partitioned together.
> 
> Therefore, here we implement a new partitioner called
> "lto_merge_comdat_map" that does all the required analysis.
> The partitioner works as follows:
> 
> 1. We create a number of disjoint sets and inserts each node into a
>separate set, which may be merged together in the future.
> 
> 2. Find COMDAT groups, and mark them to be partitioned together.
> 
> 3. Check all nodes that would require any COMDAT group to be
>copied to its partition (which we name "COMDAT frontier"),
>and mark them to be partitioned together.
>This avoids duplication of COMDAT groups and crashes on the LTO
>partitioning infrastructure.

What kind of crash do you get here?
> 
> 4. Check if the user allows the partitioner to promote non-public
>functions or variables to global to improve parallelization
>opportunity at the cost of modifying the output code layout.
> 
> 5. Balance generated partitions for performance unless told not to.
> 
> The choice of 1. was by design, so we could use a union-find
> data structure, which is known for being very fast on set-union
> operations.
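
(For reference, a minimal sketch of the classic disjoint-set structure;
illustrative only, not the patch's actual code:

  #include <vector>
  #include <utility>

  struct union_find
  {
    std::vector<int> parent, size;
    explicit union_find (int n) : parent (n), size (n, 1)
    { for (int i = 0; i < n; i++) parent[i] = i; }
    int find (int x)   /* path compression */
    { return parent[x] == x ? x : parent[x] = find (parent[x]); }
    void unite (int a, int b)   /* union by size */
    {
      a = find (a); b = find (b);
      if (a == b) return;
      if (size[a] < size[b]) std::swap (a, b);
      parent[b] = a; size[a] += size[b];
    }
  };

Together the two heuristics make unite/find run in near-constant amortized
time per operation.)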

In the LTO partitioner, the groups of objects that "must go together"
are discovered when the first object is placed into the partition (via
add_to_partition), because with the LTO rules it is always possible to
discover all members of the group starting from any random element via
graph walking.

I guess it is the same with your partitioner?  I basically wonder how much
code can be shared and what needs to be duplicated.
It is not very nice to have partitioning implemented twice - it is a bit
of a subtle problem when it comes to details, so I would be happier if we
brought lto/lto-partition.c into the middle end and updated/cleaned it
up as needed.
> 
> For 3. to work properly, we also had to modify
> lto_promote_cross_file_statics to handle this case.
> 
> The parameters --param=promote-statics and --param=balance-partitions
> control 4. and 5., respectively
> 
> gcc/ChangeLog:
> 2020-08-20  Giuliano Belinassi  
> 
>   * Makefile.in: Add lto-partition.o
>   * cgraph.h (struct symtab_node::aux2): New variable.
>   * lto-partition.c: Move from gcc/lto/lto-partition.c
>   (add_symbol_to_partition_1): Only compute insn size
>   if information is available.
>   (node_cmp): Same as above.
>   (class union_find): New.
>   (ds_print_roots): New function.
>   (balance_partitions): New function.
>   (build_ltrans_partitions): New function.
>   (merge_comdat_nodes): New function.
>   (merge_static_calls): New function.
>   (merge_contained_symbols): New function.
>   (lto_merge_comdat_map): New function.
>   (privatize_symbol_name_1): Handle when WPA is not enabled.
>   (privatize_symbol_name): Same as above.
>   (lto_promote_cross_file_statics): New parameter to select when
>   to promote to global.
>   (lto_check_usage_from_other_partitions): New function.
>   * lto-partition.h: Move from gcc/lto/lto-partition.h
>   (lto_promote_cross_file_statics): Update prototype.
>   (lto_check_usage_from_other_partitions): Declare.
>   (lto_merge_comdat_map): Declare.
> 
> gcc/lto/ChangeLog:
> 2020-08-20  Giuliano Belinassi  
> 
>   * lto-partition.c: Move to gcc/lto-partition.c.
>   * lto-partition.h: Move to gcc/lto-partition.h.
>   * lto.c: Update call to lto_promote_cross_file_statics.
>   * Makefile.in: Remove lto-partition.o.
> ---
>  gcc/Makefile.in   |   1 +
>  gcc/cgraph.h  |   1 +
>  gcc/{lto => }/lto-partition.c | 463 +-
>  gcc/{lto => }/lto-partition.h |   4 +-
>  gcc/lto/Make-lang.in  |   4 +-
>  gcc/lto/lto.c |   2 +-
>  gcc/params.opt|   8 +
>  gcc/tree.c|  23 +-
>  8 files changed, 489 insertions(+), 17 deletions(-)
>  rename gcc/{lto => }/lto-partition.c (78%)
>  rename gcc/{lto => }/lto-partition.h (89%)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..be42b15f4ff 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1457,6 +1457,7 @@ OBJS = \
>   lra-spills.o \
>   lto-cgraph.o \
>   lto-streamer.o \
> + lto-partition.o \
>   lto-streamer-in.o \
>   lto-streamer-out.o \
>   lto-section-in.o \
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..b4a7871bd3d 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -615,6 +615,7 @@ public:
>struct lto_file_decl_data * lto_file_data;
>  
>PTR GTY ((skip)) aux;
> +  int aux2;

We do not really want to add more pass-specific data to the symbol table
(since it is critical wrt memory use during the WPA stage).
It is possible to attach extra info using symbol-summary.h.
>  
>/* 

Re: [PATCH] ia32: Fix alignment of _Atomic fields [PR65146]

2020-08-27 Thread H.J. Lu via Gcc-patches
On Thu, Aug 27, 2020 at 5:20 AM Uros Bizjak  wrote:
>
> On Thu, Aug 27, 2020 at 10:35 AM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > For _Atomic fields, lowering the alignment of long long or double etc.
> > fields on ia32 is undesirable, because then one really can't perform atomic
> > operations on those using cmpxchg8b.
> >
> > The following patch stops lowering the alignment in fields for _Atomic
> > types (the x86_field_alignment change) and for -mpreferred-stack-boundary=2
> > also ensures we don't misalign _Atomic long long etc. automatic variables
> > (the ix86_{local,minimum}_alignment changes).
> > Not sure about iamcu_alignment change, I know next to nothing about IA MCU,
> > but unless it doesn't have cmpxchg8b instruction, it would surprise me if we
> > don't want to do it as well.
> > clang apparently doesn't lower the field alignment for _Atomic.
>
> Intel MCU implements pentium ISA, which includes cmpxchg8b.

That is correct.

> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2020-08-27  Jakub Jelinek  
> >
> > PR target/65146
> > * config/i386/i386.c (iamcu_alignment): Don't decrease alignment
> > for TYPE_ATOMIC types.
> > (ix86_local_alignment): Likewise.
> > (ix86_minimum_alignment): Likewise.
> > (x86_field_alignment): Likewise, and emit a -Wpsabi diagnostic
> > for it.
> >
> > * gcc.target/i386/pr65146.c: New test.
>
> LGTM, but please allow some time for HJ to comment.

LGTM too.  Thanks.

> Thanks,
> Uros.
>
> > --- gcc/config/i386/i386.c.jj   2020-08-24 10:00:01.321258451 +0200
> > +++ gcc/config/i386/i386.c  2020-08-26 19:19:11.834297395 +0200
> > @@ -16487,7 +16487,11 @@ iamcu_alignment (tree type, int align)
> >
> >/* Intel MCU psABI specifies scalar types > 4 bytes aligned to 4
> >   bytes.  */
> > -  mode = TYPE_MODE (strip_array_types (type));
> > +  type = strip_array_types (type);
> > +  if (TYPE_ATOMIC (type))
> > +return align;
> > +
> > +  mode = TYPE_MODE (type);
> >switch (GET_MODE_CLASS (mode))
> >  {
> >  case MODE_INT:
> > @@ -16644,7 +16648,8 @@ ix86_local_alignment (tree exp, machine_
> >&& align == 64
> >&& ix86_preferred_stack_boundary < 64
> >&& (mode == DImode || (type && TYPE_MODE (type) == DImode))
> > -  && (!type || !TYPE_USER_ALIGN (type))
> > +  && (!type || (!TYPE_USER_ALIGN (type)
> > +   && !TYPE_ATOMIC (strip_array_types (type
> >&& (!decl || !DECL_USER_ALIGN (decl)))
> >  align = 32;
> >
> > @@ -16757,7 +16762,8 @@ ix86_minimum_alignment (tree exp, machin
> >/* Don't do dynamic stack realignment for long long objects with
> >   -mpreferred-stack-boundary=2.  */
> >if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
> > -  && (!type || !TYPE_USER_ALIGN (type))
> > +  && (!type || (!TYPE_USER_ALIGN (type)
> > +   && !TYPE_ATOMIC (strip_array_types (type
> >&& (!decl || !DECL_USER_ALIGN (decl)))
> >  {
> >gcc_checking_assert (!TARGET_STV);
> > @@ -20293,11 +20299,30 @@ x86_field_alignment (tree type, int comp
> >  return computed;
> >if (TARGET_IAMCU)
> >  return iamcu_alignment (type, computed);
> > -  mode = TYPE_MODE (strip_array_types (type));
> > +  type = strip_array_types (type);
> > +  mode = TYPE_MODE (type);
> >if (mode == DFmode || mode == DCmode
> >|| GET_MODE_CLASS (mode) == MODE_INT
> >|| GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
> > -return MIN (32, computed);
> > +{
> > +  if (TYPE_ATOMIC (type) && computed > 32)
> > +   {
> > + static bool warned;
> > +
> > + if (!warned && warn_psabi)
> > +   {
> > + const char *url
> > +   = CHANGES_ROOT_URL "gcc-11/changes.html#ia32_atomic";
> > +
> > + warned = true;
> > + inform (input_location, "the alignment of %<_Atomic %T%> "
> > + "fields changed in %{GCC 11.1%}",
> > + TYPE_MAIN_VARIANT (type), url);
> > +   }
> > +   }
> > +  else
> > +  return MIN (32, computed);
> > +}
> >return computed;
> >  }
> >
> > --- gcc/testsuite/gcc.target/i386/pr65146.c.jj  2020-08-26 
> > 19:25:27.720023020 +0200
> > +++ gcc/testsuite/gcc.target/i386/pr65146.c 2020-08-26 
> > 19:28:24.982535651 +0200
> > @@ -0,0 +1,12 @@
> > +/* PR target/65146 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Wno-psabi" } */
> > +
> > +struct A { char a; _Atomic long long b; };
> > +struct B { char a; _Atomic double b; };
> > +struct C { char a; _Atomic long long b[2]; };
> > +struct D { char a; _Atomic double b[2]; };
> > +extern int a[__builtin_offsetof (struct A, b) == 8 ? 1 : -1];
> > +extern int b[__builtin_offsetof (struct B, b) == 8 ? 1 : -1];
> > +extern int c[__builtin_offsetof (struct C, b) == 8 ? 1 : -1];
> > +extern int d[__builtin_offsetof (struct D, b) 

[PATCH] x86: Add -mbaseline-isas-only/target("baseline-isas-only")

2020-08-27 Thread H.J. Lu via Gcc-patches
On Tue, Aug 25, 2020 at 5:27 AM Uros Bizjak  wrote:
>
> On Tue, Aug 25, 2020 at 2:13 PM H.J. Lu  wrote:
> >
> > On Mon, Aug 24, 2020 at 12:40 PM H.J. Lu  wrote:
> > >
> > > On Mon, Aug 24, 2020 at 12:25 PM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Aug 24, 2020 at 6:17 PM H.J. Lu  wrote:
> > > > >
> > > > > On Mon, Aug 24, 2020 at 7:55 AM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:
> > > > > >
> > > > > > > > Speaking of pragmas, these should be added outside cpuid.h, 
> > > > > > > > like:
> > > > > > > >
> > > > > > > > #pragma GCC push_options
> > > > > > > > #pragma GCC target("general-regs-only")
> > > > > > > >
> > > > > > > > #include <cpuid.h>
> > > > > > > >
> > > > > > > > void cpuid_check ()
> > > > > > > > ...
> > > > > > > >
> > > > > > > > #pragma GCC pop_options
> > > > > > > >
> > > > > > > > >footnote
> > > > > > > >
> > > > > > > > Nowadays, -march=native is mostly used outside generic target
> > > > > > > > compilations, so for relevant avx512 targets, we still generate 
> > > > > > > > spills
> > > > > > > > to mask regs. In future, we can review the setting of the 
> > > > > > > > tuning flag
> > > > > > > > for a generic target in the same way as with SSE2 inter-reg 
> > > > > > > > moves.
> > > > > > > >
> > > > > > >
> > > > > > > Florian raised an issue that we need to limit <cpuid.h> to the
> > > > > > > basic ISAs.
> > > > > > > <cpuid.h> should be handled similarly to other intrinsic header
> > > > > > > files.
> > > > > > > That is <cpuid.h> should use
> > > > > > >
> > > > > > > #pragma GCC push_options
> > > > > > > #ifdef __x86_64__
> > > > > > > #pragma GCC target("arch=x86-64")
> > > > > > > #else
> > > > > > > #pragma GCC target("arch=i386")
> > > > > > > ...
> > > > > > > #pragma GCC pop_options
> > > > > > >
> > > > > > > Here is a patch.  OK for master?
> > > > > >
> > > > > > -ENOPATCH
> > > > > >
> > > > > > However, how will this affect inlining? Every single function in
> > > > > > cpuid.h is defined as static __inline, and due to target flags
> > > > > > mismatch, it won't be inlined anymore. These inline functions are 
> > > > > > used
> > > > > > in some bit testing functions, and to keep them inlined, these 
> > > > > > should
> > > > > > also use the same options to avoid non-basic ISAs. This is the 
> > > > > > reason
> > > > > > cpuid.h should be #included after the pragma, together with bit-testing
> > > > > > functions, as shown above.
> > > > > >
> > > > >
> > > > > How about target("baseline-isas-only")? All CPUID functions are
> > > > > inlined.
> > > >
> > > > No, I don't think this is a good idea. Now consider the situation that
> > > > caller functions are compiled with e.g. -mgeneral-regs-only. Due to
> > > > #pragmas, CPUID functions are compiled with a superset of ISAs, so they
> > > > again won't be inlined. The ISAs of caller functions and CPUID should
> > > > match; the best way is to include <cpuid.h> after the #pragma. And
> > > > IMO, the general-regs-only target #pragma is an excellent setting for
> > > > both cpuid.h and the caller bit-testing functions.
> > > >
> > > > So, if we care about inlining, decorating cpuid.h with target pragmas
> > > > is a bad idea.
> > >
> > > This can be done with #pragma in <cpuid.h>.
> > >
> >
> > We just need to update ix86_can_inline_p to allow inline functions
> > with baseline-isas-only and general-regs-only attributes if caller
> > supports the same set of ISAs.
> >
> > Here is the updated patch.
>
> I'm not against it, but I don't plan to approve the attached patch.
>

How about this one?

-- 
H.J.
From a6ffd2914535d23eb916b4684d5edef132912f72 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 21 Aug 2020 09:42:49 -0700
Subject: [PATCH] x86: Add -mbaseline-isas-only/target("baseline-isas-only")

Add -mbaseline-isas-only and target("baseline-isas-only") attribute to
support baseline ISAs, which include FXSR, MMX, SSE and SSE2 in 64-bit
mode.  Use only general registers and baseline ISAs to perform the CPUID
check.  We can inline functions with general-registers and baseline-ISAs
attributes if the caller supports the same set of ISAs.
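
A sketch of the intended usage, assuming the semantics this patch describes
(__get_cpuid_count and bit_AVX2 come from cpuid.h):

#pragma GCC push_options
#pragma GCC target("general-regs-only,baseline-isas-only")
#include <cpuid.h>

static inline int
have_avx2 (void)
{
  unsigned int eax, ebx, ecx, edx;
  /* Leaf 7, subleaf 0: structured extended feature flags.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx & bit_AVX2) != 0;
}

#pragma GCC pop_options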

gcc/

	PR target/96744
	* common/config/i386/i386-common.c (ix86_handle_option): Support
	-mbaseline-isas-only.
	* config/i386/i386-options.c (ix86_valid_target_attribute_inner_p):
	Handle baseline-isas-only.
	* config/i386/i386.c (ix86_can_inline_p): Allow inline functions
	with baseline-isas-only and general-regs-only attributes if caller
	supports the same set of ISAs.
	* config/i386/i386.h (TARGET_64BIT_BASELINE_ISAS): New.
	* config/i386/i386.opt: Add -mbaseline-isas-only.
	* doc/extend.texi: Document target("baseline-isas-only") function
	attribute.
	* doc/invoke.texi: Document -mbaseline-isas-only.

gcc/testsuite/

	PR target/96744
	* gcc.target/i386/avx512-check.h: Add #pragma GCC
	target(""general-regs-only,baseline-isas-only") for CPUID check.
	* gcc.target/i386/pr96744-10.c: New test.
	* gcc.target/i386/pr96744-11.c: Likewise.
	* gcc.target/i386/pr96744-12.c: Likewise.
	* 

Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread David Edelsohn via Gcc-patches
On Thu, Aug 27, 2020 at 9:21 AM Bill Schmidt  wrote:
>
> Prior to P10, ELFv2 didn't implement nonlocal sibcalls.  Now that we do,
> we need to be sure that r12 is set up prior to such a call.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this okay for trunk?
>
> Thanks,
> Bill
>
>
> 2020-08-27  Bill Schmidt  
>
> gcc/
> PR target/96787
> * config/rs6000/rs6000-logue.c (rs6000_sibcall_aix): Support
> indirect call for ELFv2.
>
> gcc/testsuite/
>
> PR target/96787
> * gcc.target/powerpc/pr96787-1.c: New.
> * gcc.target/powerpc/pr96787-2.c: New.

Okay.

Thanks, David


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Alan Modra via Gcc-patches
On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
> Prior to P10, ELFv2 didn't implement nonlocal sibcalls.  Now that we do,
> we need to be sure that r12 is set up prior to such a call.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this okay for trunk?

FWIW, looks fine to me.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-27 Thread Christophe Lyon via Gcc-patches
In comment 14 from PR94538, it was suggested to switch off jump tables
on thumb-1 cores when using -mpure-code, like we already do for thumb-2.

This is what this patch does, and it also restores the previous value of
CASE_VECTOR_PC_RELATIVE, since changing it was not the right way of
handling this.

It also adds a new test, since the existing no-casesi.c did not catch
this problem.

Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
-mfloat-abi=soft: no regressions, and the new test passes (and fails
without the fix).

2020-08-27  Christophe Lyon  

gcc/
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
-mpure-code.
* config/arm/thumb1.md (tablejump): Disable when -mpure-code.

gcc/testsuite/
* gcc.target/arm/pure-code/pr96768.c: New test.
---
 gcc/config/arm/arm.h |  5 ++---
 gcc/config/arm/thumb1.md |  2 +-
 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 +
 3 files changed, 24 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3887c51..7d43721 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
for the index in the tablejump instruction.  */
 #define CASE_VECTOR_MODE Pmode
 
-#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2
\
+#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2 \
  || (TARGET_THUMB1 \
- && (optimize_size || flag_pic)))  \
-&& (!target_pure_code))
+ && (optimize_size || flag_pic)))
 
 
 #define CASE_VECTOR_SHORTEN_MODE(min, max, body)   \
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index f0129db..602039e 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
 (define_expand "tablejump"
   [(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
  (use (label_ref (match_operand 1 "" "")))])]
-  "TARGET_THUMB1"
+  "TARGET_THUMB1 && !target_pure_code"
   "
   if (flag_pic)
 {
diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
new file mode 100644
index 000..fd4cad5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mpure-code" } */
+
+int f2 (int x, int y)
+{
+  switch (x)
+{
+case 0: return y + 0;
+case 1: return y + 1;
+case 2: return y + 2;
+case 3: return y + 3;
+case 4: return y + 4;
+case 5: return y + 5;
+}
+  return y;
+}
+
+/* We do not want any load from literal pool, but accept loads from r7
+   (frame pointer, used at -O0).  */
+/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } */
+/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
-- 
2.7.4



[PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-27 Thread Bill Schmidt via Gcc-patches
Prior to P10, ELFv2 didn't implement nonlocal sibcalls.  Now that we do,
we need to be sure that r12 is set up prior to such a call.
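
The expected tail-call sequence is then roughly (illustrative only; the
source register holding the target address varies):

	mr 12,9		# function address into r12
	mtctr 12	# CTR = function address
	bctr		# sibcall: branch without link, no bctrl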

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this okay for trunk?

Thanks,
Bill


2020-08-27  Bill Schmidt  

gcc/
PR target/96787
* config/rs6000/rs6000-logue.c (rs6000_sibcall_aix): Support
indirect call for ELFv2.

gcc/testsuite/

PR target/96787
* gcc.target/powerpc/pr96787-1.c: New.
* gcc.target/powerpc/pr96787-2.c: New.
---
 gcc/config/rs6000/rs6000.c   | 19 +-
 gcc/testsuite/gcc.target/powerpc/pr96787-1.c | 38 
 gcc/testsuite/gcc.target/powerpc/pr96787-2.c | 35 ++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96787-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96787-2.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1c1caa90ede..09545278dcf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24833,14 +24833,27 @@ rs6000_sibcall_aix (rtx value, rtx func_desc, rtx 
tlsarg, rtx cookie)
 {
   rtx call[2];
   rtx insn;
+  rtx r12 = NULL_RTX;
+  rtx func_addr = func_desc;
 
   gcc_assert (INTVAL (cookie) == 0);
 
   if (global_tlsarg)
 tlsarg = global_tlsarg;
 
+  /* For ELFv2, r12 and CTR need to hold the function address
+ for an indirect call.  */
+  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
+{
+  r12 = gen_rtx_REG (Pmode, 12);
+  if (!rtx_equal_p (r12, func_desc))
+   emit_move_insn (r12, func_desc);
+  func_addr = gen_rtx_REG (Pmode, CTR_REGNO);
+  emit_move_insn (func_addr, r12);
+}
+
   /* Create the call.  */
-  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
+  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);
   if (value != NULL_RTX)
 call[0] = gen_rtx_SET (value, call[0]);
 
@@ -24853,6 +24866,10 @@ rs6000_sibcall_aix (rtx value, rtx func_desc, rtx 
tlsarg, rtx cookie)
   if (!rs6000_pcrel_p (cfun))
 use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
 gen_rtx_REG (Pmode, TOC_REGNUM));
+
+  /* Note use of r12.  */
+  if (r12)
+use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r12);
 }
 
 /* Expand code to perform a call under the SYSV4 ABI.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96787-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr96787-1.c
new file mode 100644
index 000..3c58e63797f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96787-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify that we generate an indirect sibcall for ELFv2 on P10 and
+   later, with r12 and CTR containing the function address.  PR96787.  */
+
+extern int f (int);
+
+int main ()
+{
+  if (f (3) != 6)
+return 1;
+  return 0;
+}
+
+
+int g (int a)
+{
+  return a * 2;
+}
+
+
+int h (int a)
+{
+  return a + 2;
+}
+
+int __attribute__((__noinline__)) f (int a)
+{
+  int (*x) (int) = a % 2 ? &g : &h;
+  (*x) (a);
+}
+
+/* { dg-final { scan-assembler {\mmtctr 12\M} } } */
+/* { dg-final { scan-assembler {\mbctr\M} } } */
+/* { dg-final { scan-assembler-not {\mbctrl\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96787-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr96787-2.c
new file mode 100644
index 000..b10ab7a8ce8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96787-2.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify that we generate an indirect sibcall for ELFv2 on P10 and
+   later, with r12 and CTR containing the function address.  PR96787.  */
+
+extern void abort (void);
+extern int f (int);
+
+int main ()
+{
+  if (f (3) != 6)
+abort ();
+  return 0;
+}
+
+
+int g (int a)
+{
+  return a * 2;
+}
+
+
+int h (int a)
+{
+  return a + 2;
+}
+
+int __attribute__((__noinline__)) f (int a)
+{
+  int (*x) (int) = a % 2 ? &g : &h;
+  (*x) (a);
+}
+
-- 
2.17.1



Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 27, 2020 at 03:07:59PM +0200, Richard Biener wrote:
> > Also, isn't the pass also useful for TARGET_AVX and above (but in that case
> > only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
> > full vector loads?
> >
> > As Jeff wrote, I wonder if when successfully replacing those pool constants
> > the old constant pool entries will be omitted.
> >
> > Another thing I wonder about is whether more analysis shouldn't be used.
> > E.g. if the constant pool entry is already emitted into .rodata anyway
> > (e.g. some earlier function needed it), using the broadcast will mean
> > actually larger .rodata.  If {1to8} and similar is as fast as reading all
> > the same elements from memory (or faster), perhaps in that case it should
> > broadcast from the first element of the existing constant pool full vector
> > rather than creating a new one.
> > And similarly, perhaps the function should look at all constant pool entries
> > in the current function (not yet emitted into .rodata) and if it would
> > succeed for some and not for others, either use broadcast from its first
> > element or not perform it for the others too.
> 
> IIRC I once implemented this (re-using vector constant components
> for non-vector pool entries) but it was quite hackish and never merged
> it seems.

If the generic constant pool code could do it, it would of course simplify
this pass.  Not sure if the case where earlier function emits already some
smaller constant and later function needs a CONST_VECTOR containing that can
be handled at all (probably not), but if the same function has both scalar
pool entries and CONST_VECTOR ones that contain those, or already emitted
CONST_VECTOR pool entry has them, it shouldn't be that hard, at least for
targets with symbol aliases, e.g. by using .LC33 = .LC24 or .LC34 = .LC24 + 8
where .LC33 or .LC34 would be the scalar pool entry labels and .LC24 the
CONST_VECTOR containing them.
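
Concretely, the emitted assembly could look like this (an illustrative
sketch, not actual GCC output):

	.section .rodata
	.align 32
.LC24:				# V4DF constant { 1.0, 1.0, 1.0, 1.0 }
	.quad 0x3ff0000000000000
	.quad 0x3ff0000000000000
	.quad 0x3ff0000000000000
	.quad 0x3ff0000000000000
.LC33 = .LC24			# scalar 1.0, no extra bytes emitted
.LC34 = .LC24 + 8		# second scalar slot into the same vector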

Seems constant pool marking is performed during
mark_constant_pool called during final from assemble_start_function or
assemble_end_function, so if the pass replaces the constants before final
and the constants are unused, they won't be emitted.

Jakub



[PATCH] cfgloop.h: Reword comment describing is_exit flag

2020-08-27 Thread Alex Coplan
This simple patch rewords a comment in cfgloop.h to improve the
grammar and readability.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* cfgloop.h (nb_iter_bound): Reword comment describing is_exit.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 18b404e292f..be978288aab 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -60,9 +60,9 @@ public:
  b) it is consistent with the result of number_of_iterations_exit.  */
   widest_int bound;
 
-  /* True if the statement will cause the loop to be leaved the (at most)
- BOUND + 1-st time it is executed, that is, all the statements after it
- are executed at most BOUND times.  */
+  /* True if, after executing the statement BOUND + 1 times, we will
+ leave the loop; that is, all the statements after it are executed at most
+ BOUND times.  */
   bool is_exit;
 
   /* The next bound in the list.  */
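
A concrete instance of the reworded semantics (illustrative):

  extern void body (void);

  for (int i = 0; ; i++)
    {
      if (i == 10)   /* is_exit stmt, BOUND == 10: runs at most 11 times */
	break;
      body ();       /* everything after it: at most BOUND == 10 times */
    }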


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-27 Thread Richard Biener via Gcc-patches
On Thu, Aug 27, 2020 at 2:25 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Jul 09, 2020 at 04:33:46PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +static void
> > +replace_constant_pool_with_broadcast (rtx_insn* insn)
> > +{
> > +  subrtx_ptr_iterator::array_type array;
> > +  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
> > +{
> > +  rtx *loc = *iter;
> > +  rtx x = *loc;
> > +  rtx broadcast_mem, vec_dup, constant, first;
> > +  machine_mode mode;
> > +  if (GET_CODE (x) != MEM
>
> MEM_P
>
> > +   || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
>
> SYMBOL_REF_P
>
> > +   || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
> > + continue;
> > +
> > +  mode = GET_MODE (x);
> > +  if (!VECTOR_MODE_P (mode))
> > + return;
> > +
> > +  constant = get_pool_constant (XEXP (x, 0));
> > +  first = XVECEXP (constant, 0, 0);
>
> Shouldn't this verify first that GET_CODE (constant) == CONST_VECTOR
> and punt otherwise?
>
> > +  broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
> > +  vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
> > +  *loc = vec_dup;
> > +  INSN_CODE (insn) = -1;
> > +  /* Revert change if there's no corresponding pattern.  */
> > +  if (recog_memoized (insn) < 0)
> > + {
> > +   *loc = x;
> > +   recog_memoized (insn);
> > + }
>
> The usual way of doing this would be through
>   validate_change (insn, loc, vec_dup, 0);
>
> Also, isn't the pass also useful for TARGET_AVX and above (but in that case
> only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
> full vector loads?
>
> As Jeff wrote, I wonder if when successfully replacing those pool constants
> the old constant pool entries will be omitted.
>
> Another thing I wonder about is whether more analysis shouldn't be used.
> E.g. if the constant pool entry is already emitted into .rodata anyway
> (e.g. some earlier function needed it), using the broadcast will mean
> actually larger .rodata.  If {1to8} and similar is as fast as reading all
> the same elements from memory (or faster), perhaps in that case it should
> broadcast from the first element of the existing constant pool full vector
> rather than creating a new one.
> And similarly, perhaps the function should look at all constant pool entries
> in the current function (not yet emitted into .rodata) and if it would
> succeed for some and not for others, either use broadcast from its first
> element or not perform it for the others too.

IIRC I once implemented this (re-using vector constant components
for non-vector pool entries) but it was quite hackish and never merged
it seems.

Richard.

> Jakub
>


Re: [PATCH] C-SKY: Add -mbacktrace option.

2020-08-27 Thread Jojo R
Hi,

Ok & thanks, I will reroll this :)

Jojo
On 2020-08-26 AM 4:22 +0800, Jeff Law wrote:
> On Fri, 2020-08-21 at 14:18 +0800, Jojo R wrote:
> > gcc/ChangeLog:
> >
> > * config/csky/csky.opt (TARGET_BACKTRACE): New.
> > * doc/invoke.texi (C-SKY Options): Document -mbacktrace.
> ISTM you need an actual implementation. All this does is add an option. It's
> impossible to know if this is a good idea without seeing implementation bits.
>
> jeff


[PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly

2020-08-27 Thread Jojo R
gcc/ChangeLog:

* genemit.c (main): Print 'split line'.
* Makefile.in (insn-emit.c): Define split count and file
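
Roughly, the build flow becomes (an illustration; the exact marker text is
whatever genemit prints, it only needs to match the csplit pattern):

  # genemit emits N marker comments containing "parallel compilation"
  # into insn-emit.c; the Makefile then cuts the file at those markers:
  csplit insn-emit.c /parallel\ compilation/ -k -s {N} -f insn-emit -b "%d.c"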

---
 gcc/Makefile.in | 15 +
 gcc/genemit.c   | 87 -
 2 files changed, 64 insertions(+), 38 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..08e4aa7ef6f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1258,6 +1258,17 @@ ANALYZER_OBJS = \
 # We put the *-match.o and insn-*.o files first so that a parallel make
 # will build them sooner, because they are large and otherwise tend to be
 # the last objects to finish building.
+
+# target overrides
+-include $(tmake_file)
+
+INSN-GENERATED-SPLIT-NUM ?= 0
+insn-generated-split-num = $(shell expr $(INSN-GENERATED-SPLIT-NUM) + 1)
+
+insn-emit-split-c = $(foreach o, $(shell for i in 
{1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c)
+insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
+$(insn-emit-split-c): insn-emit.c
+
 OBJS = \
gimple-match.o \
generic-match.o \
@@ -1265,6 +1276,7 @@ OBJS = \
insn-automata.o \
insn-dfatab.o \
insn-emit.o \
+   $(insn-emit-split-obj) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -2365,6 +2377,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
build/gen%$(build_exeext)
$(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
  $(filter insn-conditions.md,$^) > tmp-$*.c
$(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
+   $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s 
{$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
+   [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
+   [ -s insn-$*0.c ] || (for i in {1..$(insn-generated-split-num)}; do 
touch insn-$*$$i.c; done && echo "" > insn-$*.c)
$(STAMP) s-$*
 
 # gencheck doesn't read the machine description, and the file produced
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 84d07d388ee..3aaaeb62b0a 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -847,6 +847,46 @@ handle_overloaded_gen (overloaded_name *oname)
 }
 }
 
+#define printf_include() do { \
+  printf ("/* Generated automatically by the program `genemit'\n\
+from the machine description file `md'.  */\n\n");  \
+  printf ("#define IN_TARGET_CODE 1\n");\
+  printf ("#include \"config.h\"\n");   \
+  printf ("#include \"system.h\"\n");   \
+  printf ("#include \"coretypes.h\"\n");\
+  printf ("#include \"backend.h\"\n");  \
+  printf ("#include \"predict.h\"\n");  \
+  printf ("#include \"tree.h\"\n"); \
+  printf ("#include \"rtl.h\"\n");  \
+  printf ("#include \"alias.h\"\n");\
+  printf ("#include \"varasm.h\"\n");   \
+  printf ("#include \"stor-layout.h\"\n");  \
+  printf ("#include \"calls.h\"\n");\
+  printf ("#include \"memmodel.h\"\n"); \
+  printf ("#include \"tm_p.h\"\n"); \
+  printf ("#include \"flags.h\"\n");\
+  printf ("#include \"insn-config.h\"\n");  \
+  printf ("#include \"expmed.h\"\n");   \
+  printf ("#include \"dojump.h\"\n");   \
+  printf ("#include \"explow.h\"\n");   \
+  printf ("#include \"emit-rtl.h\"\n"); \
+  printf ("#include \"stmt.h\"\n"); \
+  printf ("#include \"expr.h\"\n"); \
+  printf ("#include \"insn-codes.h\"\n");   \
+  printf ("#include \"optabs.h\"\n");   \
+  printf ("#include \"dfp.h\"\n");  \
+  printf ("#include \"output.h\"\n");   \
+  printf ("#include \"recog.h\"\n");\
+  printf ("#include \"df.h\"\n");   \
+  printf ("#include \"resource.h\"\n"); \
+  printf ("#include \"reload.h\"\n");   \
+  printf ("#include \"diagnostic-core.h\"\n");  \
+  printf ("#include \"regs.h\"\n"); \
+  printf ("#include \"tm-constrs.h\"\n");   \
+  printf ("#include \"ggc.h\"\n");  \
+  printf ("#include \"target.h\"\n\n"); \
+} while (0)
+
 int
 main (int argc, const char **argv)
 {
@@ -862,49 +902,19 @@ main (int argc, const char **argv)
   /* Assign sequential codes to all entries in the machine description
  in parallel with the tables in insn-output.c.  */
 
-  printf ("/* Generated automatically by the program `genemit'\n\
-from the machine description file `md'.  */\n\n");
-
-  printf ("#define IN_TARGET_CODE 1\n");
-  printf ("#include \"config.h\"\n");
-  printf ("#include \"system.h\"\n");
-  printf ("#include \"coretypes.h\"\n");
-  printf ("#include \"backend.h\"\n");
-  printf ("#include 

Re: [PATCH v4] genemit.c (main): split insn-emit.c for compiling parallelly

2020-08-27 Thread Jojo R


Jojo
On 2020-08-02 AM 8:09 +0800, Segher Boessenkool wrote:
> On Sat, Aug 01, 2020 at 07:02:07PM +0800, Jojo R wrote:
> > +insn-generated-split-num = $(shell nproc)
>
> nproc isn't portable, is not the same on every system, and can lead to
> a number of processes quadratic in the number of processors being
> launched (say, if someone does make -jK with K some fraction of the
> number of processors).
>
> (It is a bad choice anyway: nproc shows how many hardware threads are
> available, not how many it is a good idea to use for optimal
> performance; and it can be overridden by the user as well, via an
> environment variable).
>
> You need to split to some fixed number of parts, where that fixed number
> can depend on the target, but not on the host (or build machine) at all.
>
>
> Segher

Ok & Thanks,

It’s fixed in patch v5

Jojo


[PATCH] rtl-optimization/96812 - remap dependence info on RTL loop unrolling

2020-08-27 Thread Richard Biener
This carries over the PR87609 fix also to RTL loop unrolling.  The
gcc.dg/torture/pr90328.c testcase otherwise is miscompiled with
the tree-ssa-address.c hunk (or alternatively with -fno-ivopts
on master).  I've tried to find the correct abstraction and
adjusted two other duplicate_insn_chain users for which I do not
have testcases.  There may be other insn-chain copying routines
that could be affected, but hopefully most of them go through the
CFG hooks.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2020-08-27  Richard Biener  

PR rtl-optimization/96812
* tree-ssa-address.c (copy_ref_info): Also copy dependence info.
* cfgrtl.h (duplicate_insn_chain): Adjust prototype.
* cfgrtl.c (duplicate_insn_chain): Remap dependence info
if requested.
(cfg_layout_duplicate_bb): Make sure we remap dependence info.
* modulo-sched.c (duplicate_insns_of_cycles): Remap dependence
info.
(generate_prolog_epilog): Adjust.
* config/c6x/c6x.c (hwloop_optimize): Remap dependence info.
---
 gcc/cfgrtl.c   | 60 ++
 gcc/cfgrtl.h   |  3 ++-
 gcc/config/c6x/c6x.c   |  3 ++-
 gcc/modulo-sched.c | 10 ---
 gcc/tree-ssa-address.c | 10 +++
 5 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 03fa688fed6..eb5ccd42ed7 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -61,6 +61,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-pass.h"
 #include "print-rtl.h"
+#include "rtl-iter.h"
+#include "gimplify.h"
 
 /* Disable warnings about missing quoting in GCC diagnostics.  */
 #if __GNUC__ >= 10
@@ -4199,7 +4201,8 @@ cfg_layout_can_duplicate_bb_p (const_basic_block bb)
 }
 
 rtx_insn *
-duplicate_insn_chain (rtx_insn *from, rtx_insn *to)
+duplicate_insn_chain (rtx_insn *from, rtx_insn *to,
+ class loop *loop, copy_bb_data *id)
 {
   rtx_insn *insn, *next, *copy;
   rtx_note *last;
@@ -4228,6 +4231,51 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to)
  && ANY_RETURN_P (JUMP_LABEL (insn)))
JUMP_LABEL (copy) = JUMP_LABEL (insn);
   maybe_copy_prologue_epilogue_insn (insn, copy);
+ /* If requested remap dependence info of cliques brought in
+via inlining.  */
+ if (id)
+   {
+ subrtx_iterator::array_type array;
+ FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
+   if (MEM_P (*iter) && MEM_EXPR (*iter))
+ {
+   tree op = MEM_EXPR (*iter);
+   if (TREE_CODE (op) == WITH_SIZE_EXPR)
+ op = TREE_OPERAND (op, 0);
+   while (handled_component_p (op))
+ op = TREE_OPERAND (op, 0);
+   if ((TREE_CODE (op) == MEM_REF
+|| TREE_CODE (op) == TARGET_MEM_REF)
+   && MR_DEPENDENCE_CLIQUE (op) > 1
+   && (!loop
+   || (MR_DEPENDENCE_CLIQUE (op)
+   != loop->owned_clique)))
+ {
+   if (!id->dependence_map)
+ id->dependence_map = new hash_map<dependence_hash, unsigned short>;
+   bool existed;
+   unsigned short &newc = id->dependence_map->get_or_insert
+(MR_DEPENDENCE_CLIQUE (op), &existed);
+   if (!existed)
+ {
+   gcc_assert
+ (MR_DEPENDENCE_CLIQUE (op) <= cfun->last_clique);
+   newc = ++cfun->last_clique;
+ }
+   /* We cannot adjust MR_DEPENDENCE_CLIQUE in-place
+  since MEM_EXPR is shared so make a copy and
+  walk to the subtree again.  */
+   tree new_expr = unshare_expr (MEM_EXPR (*iter));
+   if (TREE_CODE (new_expr) == WITH_SIZE_EXPR)
+ new_expr = TREE_OPERAND (new_expr, 0);
+   while (handled_component_p (new_expr))
+ new_expr = TREE_OPERAND (new_expr, 0);
+   MR_DEPENDENCE_CLIQUE (new_expr) = newc;
+   set_mem_expr (const_cast <rtx> (*iter), new_expr);
+ }
+ }
+   }
  break;
 
case JUMP_TABLE_DATA:
@@ -4292,12 +4340,14 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to)
 /* Create a duplicate of the basic block BB.  */
 
 static basic_block
-cfg_layout_duplicate_bb (basic_block bb, copy_bb_data *)
+cfg_layout_duplicate_bb (basic_block bb, copy_bb_data *id)
 {
   rtx_insn *insn;
   basic_block new_bb;
 
-  insn = duplicate_insn_chain (BB_HEAD (bb), BB_END (bb));
+  class loop *loop = (id && current_loops) ? 

[PATCH] tree-optimization/96522 - transfer of flow-sensitive info in copy_ref_info

2020-08-27 Thread Richard Biener
This removes the bogus transfer of flow-sensitive info in copy_ref_info
plus fixes one oversight in FRE when flow-sensitive non-NULLness was added to
points-to info.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-08-27  Richard Biener  

PR tree-optimization/96522
* tree-ssa-address.c (copy_ref_info): Reset flow-sensitive
info of the copied points-to.  Transfer bigger alignment
via the access type.
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
Reset all flow-sensitive info.

* gcc.dg/torture/pr96522.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr96522.c | 36 +++
 gcc/tree-ssa-address.c | 48 ++
 gcc/tree-ssa-sccvn.c   |  3 +-
 3 files changed, 55 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr96522.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr96522.c 
b/gcc/testsuite/gcc.dg/torture/pr96522.c
new file mode 100644
index 000..2f55d1aeb4d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr96522.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-tree-pta" } */
+
+__attribute__((noipa)) void
+bar (void)
+{
+  volatile int v = 1;
+  if (v)
+__builtin_abort ();
+}
+
+__attribute__((noipa)) void
+baz (void)
+{
+}
+
+__attribute__((noipa)) void
+foo (int n, double *p, double *x)
+{
+  if (n < 10 && p != 0)
+for (int i = 0; i < 10; i++)
+  if (x[0] < p[i])
+x[i] = 0;
+  if (p != 0)
+bar ();
+  else
+baz ();
+}
+
+int
+main ()
+{
+  double arr[10];
+  foo (1000, 0, arr);
+  return 0;
+}
diff --git a/gcc/tree-ssa-address.c b/gcc/tree-ssa-address.c
index 21ad4e57e40..66756921f13 100644
--- a/gcc/tree-ssa-address.c
+++ b/gcc/tree-ssa-address.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-affine.h"
 #include "gimplify.h"
+#include "builtins.h"
 
 /* FIXME: We compute address costs using RTL.  */
 #include "tree-ssa-address.h"
@@ -1015,45 +1016,24 @@ copy_ref_info (tree new_ref, tree old_ref)
 
   new_ptr_base = TREE_OPERAND (new_ref, 0);
 
+  tree base = get_base_address (old_ref);
+  if (!base)
+return;
+
   /* We can transfer points-to information from an old pointer
  or decl base to the new one.  */
   if (new_ptr_base
   && TREE_CODE (new_ptr_base) == SSA_NAME
   && !SSA_NAME_PTR_INFO (new_ptr_base))
 {
-  tree base = get_base_address (old_ref);
-  if (!base)
-   ;
-  else if ((TREE_CODE (base) == MEM_REF
-   || TREE_CODE (base) == TARGET_MEM_REF)
-  && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
-  && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
+  if ((TREE_CODE (base) == MEM_REF
+  || TREE_CODE (base) == TARGET_MEM_REF)
+ && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+ && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
{
- struct ptr_info_def *new_pi;
- unsigned int align, misalign;
-
  duplicate_ssa_name_ptr_info
(new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
- new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
- /* We have to be careful about transferring alignment information.  */
- if (get_ptr_info_alignment (new_pi, , )
- && TREE_CODE (old_ref) == MEM_REF
- && !(TREE_CODE (new_ref) == TARGET_MEM_REF
-  && (TMR_INDEX2 (new_ref)
-  /* TODO: Below conditions can be relaxed if TMR_INDEX
- is an indcution variable and its initial value and
- step are aligned.  */
-  || (TMR_INDEX (new_ref) && !TMR_STEP (new_ref))
-  || (TMR_STEP (new_ref)
-  && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
-  < align)
-   {
- poly_uint64 inc = (mem_ref_offset (old_ref)
-- mem_ref_offset (new_ref)).force_uhwi ();
- adjust_ptr_info_misalignment (new_pi, inc);
-   }
- else
-   mark_ptr_info_alignment_unknown (new_pi);
+ reset_flow_sensitive_info (new_ptr_base);
}
   else if (VAR_P (base)
   || TREE_CODE (base) == PARM_DECL
@@ -1063,6 +1043,14 @@ copy_ref_info (tree new_ref, tree old_ref)
  pt_solution_set_var (&pi->pt, base);
}
 }
+
+  /* And alignment info.  Note we cannot transfer misalignment info
+ since that sits on the SSA name but this is flow-sensitive info
+ which we cannot transfer in this generic routine.  */
+  unsigned old_align = get_object_alignment (old_ref);
+  unsigned new_align = get_object_alignment (new_ref);
+  if (new_align < old_align)
+TREE_TYPE (new_ref) = build_aligned_type (TREE_TYPE (new_ref), old_align);
 }
 
 /* Move constants in target_mem_ref REF to offset.  Returns the new 

Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 09, 2020 at 04:33:46PM +0800, Hongtao Liu via Gcc-patches wrote:
> +static void
> +replace_constant_pool_with_broadcast (rtx_insn* insn)
> +{
> +  subrtx_ptr_iterator::array_type array;
> +  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
> +{
> +  rtx *loc = *iter;
> +  rtx x = *loc;
> +  rtx broadcast_mem, vec_dup, constant, first;
> +  machine_mode mode;
> +  if (GET_CODE (x) != MEM

MEM_P

> +   || GET_CODE (XEXP (x, 0)) != SYMBOL_REF

SYMBOL_REF_P

> +   || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
> + continue;
> +
> +  mode = GET_MODE (x);
> +  if (!VECTOR_MODE_P (mode))
> + return;
> +
> +  constant = get_pool_constant (XEXP (x, 0));
> +  first = XVECEXP (constant, 0, 0);

Shouldn't this verify first that GET_CODE (constant) == CONST_VECTOR
and punt otherwise?

> +  broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
> +  vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
> +  *loc = vec_dup;
> +  INSN_CODE (insn) = -1;
> +  /* Revert change if there's no corresponding pattern.  */
> +  if (recog_memoized (insn) < 0)
> + {
> +   *loc = x;
> +   recog_memoized (insn);
> + }

The usual way of doing this would be through
  validate_change (insn, loc, vec_dup, 0);

Also, isn't the pass also useful for TARGET_AVX and above (but in that case
only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
full vector loads?

As Jeff wrote, I wonder if when successfully replacing those pool constants
the old constant pool entries will be omitted.

Another thing I wonder about is whether more analysis shouldn't be used.
E.g. if the constant pool entry is already emitted into .rodata anyway
(e.g. some earlier function needed it), using the broadcast will mean
actually larger .rodata.  If {1to8} and similar is as fast as reading all
the same elements from memory (or faster), perhaps in that case it should
broadcast from the first element of the existing constant pool full vector
rather than creating a new one.
And similarly, perhaps the function should look at all constant pool entries
in the current function (not yet emitted into .rodata) and if it would
succeed for some and not for others, either use broadcast from its first
element or not perform it for the others too.

Jakub



Re: [PATCH] ia32: Fix alignment of _Atomic fields [PR65146]

2020-08-27 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 27, 2020 at 10:35 AM Jakub Jelinek  wrote:
>
> Hi!
>
> For _Atomic fields, lowering the alignment of long long or double etc.
> fields on ia32 is undesirable, because then one really can't perform atomic
> operations on those using cmpxchg8b.
>
> The following patch stops lowering the alignment in fields for _Atomic
> types (the x86_field_alignment change) and for -mpreferred-stack-boundary=2
> also ensures we don't misalign _Atomic long long etc. automatic variables
> (the ix86_{local,minimum}_alignment changes).
> Not sure about iamcu_alignment change, I know next to nothing about IA MCU,
> but unless it doesn't have cmpxchg8b instruction, it would surprise me if we
> don't want to do it as well.
> clang apparently doesn't lower the field alignment for _Atomic.

Intel MCU implements pentium ISA, which includes cmpxchg8b.
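
Concretely (a sketch; ia32 struct layout assumed):

  struct P { char c; long long x; };		/* x at offset 4 on ia32 */
  struct Q { char c; _Atomic long long x; };	/* x at offset 8 with this patch */
  /* The 8-byte alignment is what lets cmpxchg8b operate atomically on &q.x. */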

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-08-27  Jakub Jelinek  
>
> PR target/65146
> * config/i386/i386.c (iamcu_alignment): Don't decrease alignment
> for TYPE_ATOMIC types.
> (ix86_local_alignment): Likewise.
> (ix86_minimum_alignment): Likewise.
> (x86_field_alignment): Likewise, and emit a -Wpsabi diagnostic
> for it.
>
> * gcc.target/i386/pr65146.c: New test.

LGTM, but please allow some time for HJ to comment.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2020-08-24 10:00:01.321258451 +0200
> +++ gcc/config/i386/i386.c  2020-08-26 19:19:11.834297395 +0200
> @@ -16487,7 +16487,11 @@ iamcu_alignment (tree type, int align)
>
>/* Intel MCU psABI specifies scalar types > 4 bytes aligned to 4
>   bytes.  */
> -  mode = TYPE_MODE (strip_array_types (type));
> +  type = strip_array_types (type);
> +  if (TYPE_ATOMIC (type))
> +return align;
> +
> +  mode = TYPE_MODE (type);
>switch (GET_MODE_CLASS (mode))
>  {
>  case MODE_INT:
> @@ -16644,7 +16648,8 @@ ix86_local_alignment (tree exp, machine_
>&& align == 64
>&& ix86_preferred_stack_boundary < 64
>&& (mode == DImode || (type && TYPE_MODE (type) == DImode))
> -  && (!type || !TYPE_USER_ALIGN (type))
> +  && (!type || (!TYPE_USER_ALIGN (type)
> +   && !TYPE_ATOMIC (strip_array_types (type
>&& (!decl || !DECL_USER_ALIGN (decl)))
>  align = 32;
>
> @@ -16757,7 +16762,8 @@ ix86_minimum_alignment (tree exp, machin
>/* Don't do dynamic stack realignment for long long objects with
>   -mpreferred-stack-boundary=2.  */
>if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
> -  && (!type || !TYPE_USER_ALIGN (type))
> +  && (!type || (!TYPE_USER_ALIGN (type)
> +   && !TYPE_ATOMIC (strip_array_types (type
>&& (!decl || !DECL_USER_ALIGN (decl)))
>  {
>gcc_checking_assert (!TARGET_STV);
> @@ -20293,11 +20299,30 @@ x86_field_alignment (tree type, int comp
>  return computed;
>if (TARGET_IAMCU)
>  return iamcu_alignment (type, computed);
> -  mode = TYPE_MODE (strip_array_types (type));
> +  type = strip_array_types (type);
> +  mode = TYPE_MODE (type);
>if (mode == DFmode || mode == DCmode
>|| GET_MODE_CLASS (mode) == MODE_INT
>|| GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
> -return MIN (32, computed);
> +{
> +  if (TYPE_ATOMIC (type) && computed > 32)
> +   {
> + static bool warned;
> +
> + if (!warned && warn_psabi)
> +   {
> + const char *url
> +   = CHANGES_ROOT_URL "gcc-11/changes.html#ia32_atomic";
> +
> + warned = true;
> + inform (input_location, "the alignment of %<_Atomic %T%> "
> + "fields changed in %{GCC 11.1%}",
> + TYPE_MAIN_VARIANT (type), url);
> +   }
> +   }
> +  else
> +  return MIN (32, computed);
> +}
>return computed;
>  }
>
> --- gcc/testsuite/gcc.target/i386/pr65146.c.jj  2020-08-26 19:25:27.720023020 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr65146.c 2020-08-26 19:28:24.982535651 
> +0200
> @@ -0,0 +1,12 @@
> +/* PR target/65146 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-psabi" } */
> +
> +struct A { char a; _Atomic long long b; };
> +struct B { char a; _Atomic double b; };
> +struct C { char a; _Atomic long long b[2]; };
> +struct D { char a; _Atomic double b[2]; };
> +extern int a[__builtin_offsetof (struct A, b) == 8 ? 1 : -1];
> +extern int b[__builtin_offsetof (struct B, b) == 8 ? 1 : -1];
> +extern int c[__builtin_offsetof (struct C, b) == 8 ? 1 : -1];
> +extern int d[__builtin_offsetof (struct D, b) == 8 ? 1 : -1];
>
> Jakub
>


Re: [PATCH] testsuite: add -fno-tree-fre in gcc.dg/guality

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 27, 2020 at 11:13:50AM +, Hu, Jiangping wrote:
> I'm not sure whether the case should fail.
> So I'm adding Jakub, who committed this testcase.
> 
> I thought the case should succeed, but due to changes in GCC over the
> years, it now fails. So I think some optimizations may be unnecessary
> for this testcase, and I found FRE to be the one.

The intent of the guality testsuite is to track regressions in debug info
quality.  Because the testing matrix is huge
(different targets x different -O* options x different gdb versions),
it isn't really feasible to keep xfails for them accurately, so all we care
about is whether we don't regress compared to older gcc releases for the
same target, same -O* options and same gdb version.
If we do regress e.g. because of some added optimization, the intent is
that it is analyzed why it regressed and if possible some improvements
in the debug info generation are added to deal with the optimization if
possible.
So adding -fno-tree-* is not the right solution; the majority of people will
not use such options in their code, and so they will observe the debug-info
quality degradation there.

Jakub



Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 13:17 +0200, Jakub Jelinek wrote:

On Thu, Aug 27, 2020 at 12:06:59PM +0100, Jonathan Wakely wrote:

On 27/08/20 12:46 +0200, Jakub Jelinek wrote:
> On Thu, Aug 27, 2020 at 12:06:13PM +0200, Jakub Jelinek via Gcc-patches wrote:
>
> Oops, rewrote the testcase from __builtin_bit_cast to std::bit_cast without
> adjusting the syntax properly.
> Also, let's not use bitfields in there, as clang doesn't support those.
> So, adjusted testcase below.  clang++ rejects all 6 of those, but from what
> you said, I'd expect that u and z should be well defined.
>
> #include <bit>
>
> struct S { short a; int b; };
> struct T { int a, b; };
>
> constexpr int
> foo ()
> {
>  S a = S ();
>  S b = { 0, 0 };
>  S c = a;
>  S d;
>  S e;
>  d = a;
>  e = S ();
>  int u = std::bit_cast<T> (a).a; // Is this well defined due to value
> initialization of a?

The standard says that padding bits in the bit_cast result are
unspecified, so I don't think they have to be copied even if the
source object was zero-initialized and has all-zero padding bits.


My understanding of
"Padding bits of the To object are unspecified."
is that one shouldn't treat the result of std::bit_cast as if it was e.g.
value initialization followed by member-wise assignment.
But it doesn't say anything about padding bits in the From object.
In the above testcase, T has no padding bits, only S has them.


Doh, yes, sorry.


I think the "Padding bits of the To object are unspecified." should be about:
 T t = { 0, 0 };
 int s = std::bit_cast<T> (std::bit_cast<S> (t)).a;
being UB, that one can't expect the padding bits in S to have a particular
value.
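
(Spelling out the layout assumption, for a typical ia32/x86-64 ABI:

  struct S { short a; int b; };   // 2 bytes a, 2 bytes padding, 4 bytes b
  struct T { int a, b; };         // no padding
  static_assert (sizeof (S) == sizeof (T), "bit_cast needs equal sizes");

so the round trip through S loses any guarantee on bytes 2-3.)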

Jakub




Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-27 Thread Christophe Lyon via Gcc-patches
On Thu, 27 Aug 2020 at 11:02, Ramana Radhakrishnan
 wrote:
>
> On Mon, Aug 24, 2020 at 4:35 PM Christophe Lyon
>  wrote:
> >
> > On Mon, 24 Aug 2020 at 11:09, Christophe Lyon
> >  wrote:
> > >
> > > On Sat, 22 Aug 2020 at 00:44, Ramana Radhakrishnan
> > >  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > armv8-m.base (cortex-m23) has the movt instruction, so we need to
> > > > > disable the define_split to generate a constant in this case,
> > > > > otherwise we get incorrect insn constraints as described in PR94538.
> > > > >
> > > > > We also need to fix the pure-code alternative for thumb1_movsi_insn
> > > > > because the assembler complains with instructions like
> > > > > movs r0, #:upper8_15:1234
> > > > > (Internal error in md_apply_fix)
> > > > > We now generate movs r0, 4 instead.
> > > > >
> > > > > 2020-08-19  Christophe Lyon  
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/arm/thumb1.md: Disable set-constant splitter when
> > > > > TARGET_HAVE_MOVT.
> > > > > (thumb1_movsi_insn): Fix -mpure-code
> > > > > alternative.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/arm/pure-code/pr94538-1.c: New test.
> > > > > * gcc.target/arm/pure-code/pr94538-2.c: New test.
> > > >
> > >
> > > Hi Ramana,
> > >
> > > >  I take it that this fixes the ICE rather than addressing the code
> > > > generation / performance bits that Wilco was referring to ? Really the
> > > > other code quality / performance issues listed in the PR should really
> > > > be broken down into separate PRs while we take this as a fix for
> > > > fixing the ICE,
> > > >
> > > > Under that assumption OK.
> > > >
> > >
> > > Yes, that's correct, this patch just fixes the ICE, as it is cleaner
> > > to handle perf issues in a different patch.
> > >
> >
> > The patch applies cleanly to gcc-9 and gcc-10: OK to backport there?
>
> Yes, all good.
>

Thanks, I just pushed the two backports.

Christophe

> Thanks,
> Ramana
>
> >
> > (I tested that the offending testcase passes on the branches with the
> > backport, and fails without).
> >
> > Thanks,
> >
> > Christophe
> >
> > > I'll open a new PR to track the perf issue.
> > >
> > > Thanks
> > >
> > >
> > > > Ramana
> > > >
> > > > > ---
> > > > >  gcc/config/arm/thumb1.md   | 66 
> > > > > ++
> > > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
> > > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
> > > > >  3 files changed, 79 insertions(+), 12 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
> > > > >
> > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > > index 0ff8190..f0129db 100644
> > > > > --- a/gcc/config/arm/thumb1.md
> > > > > +++ b/gcc/config/arm/thumb1.md
> > > > > @@ -70,6 +70,7 @@ (define_split
> > > > >"TARGET_THUMB1
> > > > > && arm_disable_literal_pool
> > > > > && GET_CODE (operands[1]) == CONST_INT
> > > > > +   && !TARGET_HAVE_MOVT
> > > > > && !satisfies_constraint_I (operands[1])"
> > > > >[(clobber (const_int 0))]
> > > > >"
> > > > > @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
> > > > >"TARGET_THUMB1
> > > > > && (   register_operand (operands[0], SImode)
> > > > > || register_operand (operands[1], SImode))"
> > > > > -  "@
> > > > > -   movs%0, %1
> > > > > -   movs%0, %1
> > > > > -   movw%0, %1
> > > > > -   #
> > > > > -   #
> > > > > -   ldmia\\t%1, {%0}
> > > > > -   stmia\\t%0, {%1}
> > > > > -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> > > > > #:upper0_7:%1; lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, 
> > > > > #8; adds\\t%0, #:lower0_7:%1
> > > > > -   ldr\\t%0, %1
> > > > > -   str\\t%1, %0
> > > > > -   mov\\t%0, %1"
> > > > > +{
> > > > > +  switch (which_alternative)
> > > > > +{
> > > > > +  default:
> > > > > +  case 0: return "movs\t%0, %1";
> > > > > +  case 1: return "movs\t%0, %1";
> > > > > +  case 2: return "movw\t%0, %1";
> > > > > +  case 3: return "#";
> > > > > +  case 4: return "#";
> > > > > +  case 5: return "ldmia\t%1, {%0}";
> > > > > +  case 6: return "stmia\t%0, {%1}";
> > > > > +  case 7:
> > > > > +  /* pure-code alternative: build the constant byte by byte,
> > > > > +instead of loading it from a constant pool.  */
> > > > > +   {
> > > > > + int i;
> > > > > + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> > > > > + bool mov_done_p = false;
> > > > > + rtx ops[2];
> > > > > + ops[0] = operands[0];
> > > > > +
> > > > > + /* Emit upper 3 bytes if needed.  */
> > > > > + for (i = 0; i < 3; i++)
> > > > > +   {
> > > > > +   

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 27, 2020 at 12:06:59PM +0100, Jonathan Wakely wrote:
> On 27/08/20 12:46 +0200, Jakub Jelinek wrote:
> > On Thu, Aug 27, 2020 at 12:06:13PM +0200, Jakub Jelinek via Gcc-patches 
> > wrote:
> > 
> > Oops, rewrote the testcase from __builtin_bit_cast to std::bit_cast without
> > adjusting the syntax properly.
> > Also, let's not use bitfields in there, as clang doesn't support those.
> > So, adjusted testcase below.  clang++ rejects all 6 of those, but from what
> > you said, I'd expect that u and z should be well defined.
> > 
> > #include <bit>
> > 
> > struct S { short a; int b; };
> > struct T { int a, b; };
> > 
> > constexpr int
> > foo ()
> > {
> >  S a = S ();
> >  S b = { 0, 0 };
> >  S c = a;
> >  S d;
> >  S e;
> >  d = a;
> >  e = S ();
> >  int u = std::bit_cast<T> (a).a; // Is this well defined due to value initialization of a?
> 
> The standard says that padding bits in the bit_cast result are
> unspecified, so I don't think they have to be copied even if the
> source object was zero-initialized and has all-zero padding bits.

My understanding of
"Padding bits of the To object are unspecified."
is that one shouldn't treat the result of std::bit_cast as if it was e.g.
value initialization followed by member-wise assignment.
But it doesn't say anything about padding bits in the From object.
In the above testcase, T has no padding bits, only S has them.
I think the "Padding bits of the To object are unspecified." should be about:
  T t = { 0, 0 };
  int s = std::bit_cast<T> (std::bit_cast<S> (t)).a;
being UB, that one can't expect the padding bits in S to have a particular
value.

Jakub



RE: [PATCH] testsuite: add -fno-tree-fre in gcc.dg/guality

2020-08-27 Thread Hu, Jiangping
Hi, Richard, Jakub

Thanks for the reply.

I'm not sure about if the case should fail.
So, I add Jakub who committed this testcase.

I thought the case should succeed, but due to changes in GCC over the
years, it now fails.  So I think some optimizations may be unnecessary
for this testcase, and I found FRE.

> Huh well - I guess it should be XFAILed instead?  But then we see
> XPASSes.  Disabling FRE doesn't look correct - we _do_ want the test to
> succeed here.  Did you analyze why it fails?
Do you mean the reason may be in gcc, not in this testcase?
If so, I'll do more debug.

> 
> It looks like we assign 'a' a register at RTL expansion time for -Og
> but do not perform any fancy tracking of components of it then.  On
> GIMPLE we assume it's eventually memory and thus wouldn't bother with
> any debug stmts.
> 
> So it's a genuine FAIL for -Og at least.  That said, the logic is
> simple - both a.i and a.j need to be available for a.i + a.j to be
> evaluated.  For some reason we fail to record where their values
> reside.  For -Og they are in %rbp, resp. %rbx.  But obviously some
> elaborate DWARF is necessary to convey that info.
I tried -gdwarf options, but it makes no difference.

Jakub, what do you think?

Regards!
Hujp

> 
> Richard.
> 
> > Regards!
> > hujp
> >
> > ---
> >  gcc/testsuite/gcc.dg/guality/sra-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c
> b/gcc/testsuite/gcc.dg/guality/sra-1.c
> > index 8ad57cf3f8e..94ea29dd727 100644
> > --- a/gcc/testsuite/gcc.dg/guality/sra-1.c
> > +++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
> > @@ -1,6 +1,6 @@
> >  /* PR debug/43983 */
> >  /* { dg-do run } */
> > -/* { dg-options "-g -fno-ipa-icf" } */
> > +/* { dg-options "-g -fno-tree-fre -fno-ipa-icf" } */
> >
> >  struct A { int i; int j; };
> >  struct B { int : 4; int i : 12; int j : 12; int : 4; };
> > --
> > 2.17.1
> >
> >
> >
> 





Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-27 Thread Jan Hubicka
> Under what circumstances are we seeing a SEQUENCE in the x86 backend?  I'm
> surprised we need to handle that case.
> 
> So your pass modifies the insn in place, which is fine.  But do we actually
> remove the original constant pool entry if it's no longer used?  If not, does
> this patch actually save anything (memory bandwidth perhaps?)

Constant pool entries are output only if actually used by asm output, so
this could just work.
> 
> Is there an existing pass over the RTL chain where this would work so that 
> it's
> more compile-time efficient?

I was also concerned about adding yet another pass and wanted to look a
bit more into the possibility of making this part of the peephole pass.
While it is true that the usual way to write it (adding an extra pattern
for every instruction) is a lot of work, I was thinking we could perhaps
just add a quite generic define_peephole which will match everything
containing a broadcast via a predicate, call into the expander that will
try to build a matching instruction, and fail otherwise.  While it is
still a bit of a hack, I think it may be less intrusive than yet another
machine-specific pass.

Honza
> 
> jeff
> 


[PATCH] streamline TARGET_MEM_REF dumping

2020-08-27 Thread Richard Biener
The following streamlines TARGET_MEM_REF dumping building
on what we do for MEM_REF and thus dumping things like
access type, TBAA type and base/clique.  I've changed it
to do semantic dumping aka base + offset + step * index
rather than the odd base: A, step: way.
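
As an illustration (hypothetical SSA names and constants; the exact
spelling is produced by dump_mem_ref below), a TARGET_MEM_REF that
used to be dumped in a style like

  MEM[base: p_1, offset: 4, index: i_2, step: 4]

is now dumped in the semantic form

  MEM[(int *)p_1 + 4 + i_2 * 4]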

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-27  Richard Biener  

* tree-pretty-print.c (dump_mem_ref): Handle TARGET_MEM_REFs.
(dump_generic_node): Use dump_mem_ref also for TARGET_MEM_REF.

* gcc.dg/tree-ssa/loop-19.c: Adjust.
* gcc.dg/tree-ssa/loop-2.c: Likewise.
* gcc.dg/tree-ssa/loop-3.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/loop-19.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-2.c  |  1 -
 gcc/testsuite/gcc.dg/tree-ssa/loop-3.c  |  3 +-
 gcc/tree-pretty-print.c | 89 -
 4 files changed, 30 insertions(+), 67 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
index af7a3daddec..0c73111c1ee 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
@@ -22,6 +22,6 @@ void tuned_STREAM_Copy()
However, due to a bug in jump threading, we end up peeling one iteration 
from
the loop, which creates an additional occurrence.  */
 
-/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )a," 2 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )c," 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\[^;\]*" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\[^;\]*" 1 "optimized" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
index bda25167353..e58561a6650 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
@@ -27,7 +27,6 @@ void xxx(void)
 
 /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
 
 /* 17 * iter should be strength reduced.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
index d3b26b7ad19..74491f80e49 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
@@ -20,8 +20,7 @@ void xxx(void)
 /* Access to arr_base[iter].y should not be strength reduced, since
we have a memory mode including multiplication by 4.  */
 
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "step:" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\[^;\]* \* 4" 1 "optimized" } } */
 
 /* And original induction variable should be preserved.  */
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 655061c174d..075a3fca766 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1441,7 +1441,7 @@ dump_omp_atomic_memory_order (pretty_printer *pp, enum omp_memory_order mo)
 static void
 dump_mem_ref (pretty_printer *pp, tree node, int spc, dump_flags_t flags)
 {
-  if (flags & TDF_GIMPLE)
+  if (TREE_CODE (node) == MEM_REF && (flags & TDF_GIMPLE))
 {
   pp_string (pp, "__MEM <");
   dump_generic_node (pp, TREE_TYPE (node),
@@ -1472,7 +1472,8 @@ dump_mem_ref (pretty_printer *pp, tree node, int spc, dump_flags_t flags)
}
   pp_right_paren (pp);
 }
-  else if (integer_zerop (TREE_OPERAND (node, 1))
+  else if (TREE_CODE (node) == MEM_REF
+  && integer_zerop (TREE_OPERAND (node, 1))
   /* Dump the types of INTEGER_CSTs explicitly, for we can't
  infer them and MEM_ATTR caching will share MEM_REFs
  with differently-typed op0s.  */
@@ -1541,12 +1542,33 @@ dump_mem_ref (pretty_printer *pp, tree node, int spc, dump_flags_t flags)
   dump_generic_node (pp, op1type, spc, flags | TDF_SLIM, false);
   pp_right_paren (pp);
   dump_generic_node (pp, op0, spc, flags, false);
-  if (!integer_zerop (op1))
-  if (!integer_zerop (TREE_OPERAND (node, 1)))
+  if (TREE_CODE (node) == MEM_REF
+ && !integer_zerop (op1))
{
  pp_string (pp, " + ");
  dump_generic_node (pp, op1, spc, flags, false);
}
+  if (TREE_CODE (node) == TARGET_MEM_REF)
+   {
+ tree tmp = TMR_INDEX2 (node);
+ if (tmp)
+   {
+ pp_string (pp, " + ");
+ dump_generic_node (pp, tmp, spc, flags, false);
+   }
+ tmp = TMR_INDEX (node);
+ if (tmp)
+   {
+ pp_string (pp, " + ");
+ dump_generic_node (pp, tmp, spc, flags, false);
+ tmp = TMR_STEP (node);
+ pp_string (pp, " * ");
+ if (tmp)
+   dump_generic_node (pp, tmp, spc, flags, false);
+ else
+   pp_string (pp, "1");
+

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 12:46 +0200, Jakub Jelinek wrote:

On Thu, Aug 27, 2020 at 12:06:13PM +0200, Jakub Jelinek via Gcc-patches wrote:

Oops, rewrote the testcase from __builtin_bit_cast to std::bit_cast without
adjusting the syntax properly.
Also, let's not use bitfields in there, as clang doesn't support those.
So, adjusted testcase below.  clang++ rejects all 6 of those, but from what
you said, I'd expect that u and z should be well defined.

#include <bit>

struct S { short a; int b; };
struct T { int a, b; };

constexpr int
foo ()
{
 S a = S ();
 S b = { 0, 0 };
 S c = a;
 S d;
 S e;
 d = a;
 e = S ();
 int u = std::bit_cast<T> (a).a; // Is this well defined due to value initialization of a?


The standard says that padding bits in the bit_cast result are
unspecified, so I don't think they have to be copied even if the
source object was zero-initialized and has all-zero padding bits.



Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 12:06 +0200, Jakub Jelinek wrote:

On Fri, Jul 31, 2020 at 04:28:05PM -0400, Jason Merrill via Gcc-patches wrote:

On 7/31/20 6:06 AM, Jakub Jelinek wrote:
> On Fri, Jul 31, 2020 at 10:54:46AM +0100, Jonathan Wakely wrote:
> > > Does the standard require that somewhere?  Because that is not what the
> > > compiler implements right now.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78620
>
> But does that imply that all CONSTRUCTORs without CONSTRUCTOR_NO_CLEARING
> need to be treated that way?  I mean, aren't such CONSTRUCTORs used also for
> other initializations?

Yes, they are also used to represent constant values of classes that are
initialized by constexpr constructor.

> And, are the default copy constructors or assignment operators supposed to
> also copy the padding bits, or do they become unspecified again through
> that?

For a non-union class, a defaulted copy is defined as memberwise copy, not a
copy of the entire object representation.  So I guess strictly speaking the
padding bits do become unspecified.  But I think if the copy is trivial, in
practice all implementations do copy the object representation; perhaps the
specification should adjust accordingly.


Sorry for not responding earlier.  I think at least in GCC there is no
guarantee that the copying copies the object representation rather than doing a
memberwise copy; both are possible, depending e.g. on whether SRA happens or
not.

So, shouldn't we have a new CONSTRUCTOR flag that will represent whether
padding bits are cleared or not and then use it e.g. in the gimplifier?
Right now the gimplifier only adds first zero initialization if
CONSTRUCTOR_NO_CLEARING is not set and some initializers are not present,
so if there is a new flag, we'd need to in that case find out if there are
any padding bits and do the zero initialization in that case.
A question is if GIMPLE var = {}; statement (empty CONSTRUCTOR) is handled
as zero initialization of also the padding bits, or if we should treat it
that way only if the CONSTRUCTOR on the rhs has the new bit set and e.g.
when lowering memset 0 into var = {}; set the bit too.
From what I understood on IRC, D has similar need for zero initialization of
padding.

In the testcase below, what is and what is not UB?

#include <bit>

struct S { int a : 31; int b; };
struct T { int a, b; };

constexpr int
foo ()
{
 S a = S ();
 S b = { 0, 0 };
 S c = a;
 S d;
 S e;
 d = a;
 e = S ();
 int u = std::bit_cast (T, a).a; // Is this well defined due to value initialization of a?
 int v = std::bit_cast (T, b).a; // But this is invalid, right?  There is no difference in the IL though.
 int w = std::bit_cast (T, c).a; // And this is also invalid, or are default copy ctors required to copy padding bits?


They are not required to. It does a memberwise copy of the bases and
members. Padding is not copied.


 int x = std::bit_cast (T, d).a; // Similarly for default copy assignment operators...


Same again, memberwise assignment of the bases and members.

I'm not sure whether the previous values of padding bits has to be
preserved though. If the LHS was zero-initialized (so its padding bits
were zeroed) and you assign to it, I don't see anything that allows
the padding bits to change. But I don't think the intention is to
forbid using memcpy for trivial assignments, and that would copy
padding bits.


 int y = std::bit_cast (T, e).a; // And this too?


Yes. I don't think e = S() is required to change the padding bits of
e to be copies of the zero bits in S(), so they are unspecified after
that assignment, and after the bit_cast.


 int z = std::bit_cast (T, S ()).a; // This one is well defined?


Yes, I think so.


 return u + v + w + x + y + z;
}

constexpr int x = foo ();

Jakub




Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 27, 2020 at 12:06:13PM +0200, Jakub Jelinek via Gcc-patches wrote:

Oops, rewrote the testcase from __builtin_bit_cast to std::bit_cast without
adjusting the syntax properly.
Also, let's not use bitfields in there, as clang doesn't support those.
So, adjusted testcase below.  clang++ rejects all 6 of those, but from what
you said, I'd expect that u and z should be well defined.

#include <bit>

struct S { short a; int b; };
struct T { int a, b; };

constexpr int
foo ()
{
  S a = S ();
  S b = { 0, 0 };
  S c = a;
  S d;
  S e;
  d = a;
  e = S ();
  int u = std::bit_cast<T> (a).a; // Is this well defined due to value initialization of a?
  int v = std::bit_cast<T> (b).a; // But this is invalid, right?  There is no difference in the IL though.
  int w = std::bit_cast<T> (c).a; // And this is also invalid, or are default copy ctors required to copy padding bits?
  int x = std::bit_cast<T> (d).a; // Similarly for default copy assignment operators...
  int y = std::bit_cast<T> (e).a; // And this too?
  int z = std::bit_cast<T> (S ()).a; // This one is well defined?
  return u + v + w + x + y + z;
}

constexpr int x = foo ();

Jakub



Re: [PATCH] x86: Reject target("no-general-regs-only")

2020-08-27 Thread H.J. Lu via Gcc-patches
On Thu, Aug 27, 2020 at 1:47 AM Richard Biener
 wrote:
>
> On Wed, Aug 26, 2020 at 9:40 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > Reject target("no-general-regs-only") pragma and attribute.
>
> mgeneral-regs-only
> Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Var(ix86_target_flags) 
> Save
> Generate code which uses only the general registers.
>
> it has already RejectNegative - why's that not honored?  Is this a general
> issue?
>

target("no-general-regs-only") needs to be handled separately.

-- 
H.J.


Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-27 Thread Richard Biener via Gcc-patches
On Thu, Aug 27, 2020 at 9:17 AM Roger Sayle  wrote:
>
>
> >On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> >> These more accurate target rtx_costs are used by the
> >> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
> >> decide whether applying strength reduction would be profitable.  This test 
> >> case, slsr-13.c, assumes that two multiplications by four are
> >> cheaper than two multiplications by five.   (I believe) This is not the 
> >> case on hppa which
> >> has a sh2add instruction, that performs a multiplication by five in
> >> one cycle, or exactly the same cost as performing a left shift by two
> >> (i.e. a multiplication by four).  Oddly, I also believe this isn't the
> >> case on x86_64, where the similar lea instruction is (sometimes) as 
> >> efficient as left shift by two bits.
> >This looks like a regression.
> >
> >gcc-10 (prepatch):
> >
> >addl %r25,%r26,%r28
> >sh2addl %r25,%r28,%r25
> >sh2addl %r26,%r28,%r26
> >addl %r26,%r28,%r28
> >bv %r0(%r2)
> >addl %r28,%r25,%r28
> >
> >   [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  slsr_11 = s_3(D) * 4;
> >  x2_6 = x1_4 + slsr_11;
> >  slsr_12 = c_2(D) * 4;
> >  x3_8 = x1_4 + slsr_12;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> >gcc-11 (with patch):
> >
> >addl %r25,%r26,%r19
> >sh2addl %r26,%r26,%r28
> >addl %r28,%r25,%r28
> >sh2addl %r25,%r25,%r25
> >addl %r28,%r19,%r28
> >addl %r25,%r26,%r26
> >bv %r0(%r2)
> >addl %r28,%r26,%r28
> >
> >   [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  a2_5 = s_3(D) * 5;
> >  x2_6 = c_2(D) + a2_5;
> >  a3_7 = c_2(D) * 5;
> >  x3_8 = s_3(D) + a3_7;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> > Regards,
> > Dave
>
> There are two interesting (tree-optimization) observations here.  The
> first is that at the tree-ssa level both of these gimple sequences look
> to have exactly the same cost, seven assignments on a target where *4
> is the same cost as *5.  The gimple doesn't attempt to model the
> sh?add/lea instructions that combine may find, so at RTL expansion both
> sequences look equivalent.  One fix may be to have
> gimple-ssa-strength-reduction.c just prefer multiplications by 2, 4 and
> 8, even on targets that have a single cycle "mul" instruction.
>
> The second observation is why isn't tree-ssa-reassoc.c doing something
> here.  The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this
> strength reduction test is expecting this to turn into "tmp=s+c;
> return tmp+(tmp+4*c)+(4*s+tmp)" which is clever and an improvement, but
> overlooks the obvious reassociation 7*(s+c).  Indeed LLVM does this in
> three instructions:

reassoc doesn't work on signed types

>
> tmp1 = s+c;
> tmp2 = tmp1<<3;
> return tmp2-tmp1;
>
> Although the PA backend is (mostly) innocent in this, the lowest
> impact fix/workaround is to have multiplications by 2, 4 and 8 return
> COSTS_N_INSNS(1)-1, to indicate a preference when splitting ties.  I'll
> prepare a patch.
>
> Roger
> --
>
>
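
For reference, the function under discussion can be reconstructed from
the GIMPLE dumps quoted above (a sketch, not the exact slsr-13.c
source):

int
f (int c, int s)
{
  int x1 = c + s;      /* x1_4 */
  int x2 = c + s * 5;  /* a2_5 = s * 5; x2_6 */
  int x3 = s + c * 5;  /* a3_7 = c * 5; x3_8 */
  return x1 + x2 + x3; /* algebraically 7 * (s + c) */
}

which is why the "tmp1 = s+c; tmp2 = tmp1<<3; return tmp2-tmp1;"
reassociation above reaches a three-instruction form.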


Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Richard Biener
On Thu, 27 Aug 2020, Jakub Jelinek wrote:

> On Fri, Jul 31, 2020 at 04:28:05PM -0400, Jason Merrill via Gcc-patches wrote:
> > On 7/31/20 6:06 AM, Jakub Jelinek wrote:
> > > On Fri, Jul 31, 2020 at 10:54:46AM +0100, Jonathan Wakely wrote:
> > > > > Does the standard require that somewhere?  Because that is not what 
> > > > > the
> > > > > compiler implements right now.
> > > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78620
> > > 
> > > But does that imply that all CONSTRUCTORs without CONSTRUCTOR_NO_CLEARING
> > > need to be treated that way?  I mean, aren't such CONSTRUCTORs used also 
> > > for
> > > other initializations?
> > 
> > Yes, they are also used to represent constant values of classes that are
> > initialized by constexpr constructor.
> > 
> > > And, are the default copy constructors or assignment operators supposed to
> > > also copy the padding bits, or do they become unspecified again through
> > > that?
> > 
> > For a non-union class, a defaulted copy is defined as memberwise copy, not a
> > copy of the entire object representation.  So I guess strictly speaking the
> > padding bits do become unspecified.  But I think if the copy is trivial, in
> > practice all implementations do copy the object representation; perhaps the
> > specification should adjust accordingly.
> 
> Sorry for not responding earlier.  I think at least in GCC there is no
> guarantee that the copying copies the object representation rather than doing a
> memberwise copy; both are possible, depending e.g. on whether SRA happens or
> not.

Note we've basically settled on that SRA needs to copy padding and that
GIMPLE copies all bytes for aggregate copies and thus

  x = {}

is equivalent to a memset.
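
In source terms the distinction is roughly the following (a hedged
C-level analogy; the real semantics are a property of GIMPLE, not of
C):

#include <string.h>

struct S { char c; long long ll; };  /* assumed: 7 padding bytes */

void
f (struct S *p)
{
  struct S x = { 0 };        /* memberwise init: padding not guaranteed */
  memset (p, 0, sizeof *p);  /* all bytes zeroed, padding included */
  *p = x;                    /* GIMPLE aggregate copy: copies all bytes
                                of x, including whatever its padding
                                happens to hold */
}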

> So, shouldn't we have a new CONSTRUCTOR flag that will represent whether
> padding bits are cleared or not and then use it e.g. in the gimplifier?
> Right now the gimplifier only adds first zero initialization if
> CONSTRUCTOR_NO_CLEARING is not set and some initializers are not present,
> so if there is a new flag, we'd need to in that case find out if there are
> any padding bits and do the zero initialization in that case.
> A question is if GIMPLE var = {}; statement (empty CONSTRUCTOR) is handled
> as zero initialization of also the padding bits, or if we should treat it
> that way only if the CONSTRUCTOR on the rhs has the new bit set and e.g.
> when lowering memset 0 into var = {}; set the bit too.
> From what I understood on IRC, D has similar need for zero initialization of
> padding.

Now indeed the gimplifier will turn an aggregate CTOR initialization
into memberwise init without caring for padding.  Which means GENERIC
has the less strict semantics, and we indeed would need some CTOR flag
to tell whether padding is implicitly zero or undefined?

> In the testcase below, what is and what is not UB?
> 
> #include <bit>
> 
> struct S { int a : 31; int b; };
> struct T { int a, b; };
> 
> constexpr int
> foo ()
> {
>   S a = S ();
>   S b = { 0, 0 };
>   S c = a;
>   S d;
>   S e;
>   d = a;
>   e = S ();
>   int u = std::bit_cast (T, a).a; // Is this well defined due to value initialization of a?
>   int v = std::bit_cast (T, b).a; // But this is invalid, right?  There is no difference in the IL though.
>   int w = std::bit_cast (T, c).a; // And this is also invalid, or are default copy ctors required to copy padding bits?
>   int x = std::bit_cast (T, d).a; // Similarly for default copy assignment operators...
>   int y = std::bit_cast (T, e).a; // And this too?
>   int z = std::bit_cast (T, S ()).a; // This one is well defined?
>   return u + v + w + x + y + z;
> }
> 
> constexpr int x = foo ();
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-08-27 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 31, 2020 at 04:28:05PM -0400, Jason Merrill via Gcc-patches wrote:
> On 7/31/20 6:06 AM, Jakub Jelinek wrote:
> > On Fri, Jul 31, 2020 at 10:54:46AM +0100, Jonathan Wakely wrote:
> > > > Does the standard require that somewhere?  Because that is not what the
> > > > compiler implements right now.
> > > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78620
> > 
> > But does that imply that all CONSTRUCTORs without CONSTRUCTOR_NO_CLEARING
> > need to be treated that way?  I mean, aren't such CONSTRUCTORs used also for
> > other initializations?
> 
> Yes, they are also used to represent constant values of classes that are
> initialized by constexpr constructor.
> 
> > And, are the default copy constructors or assignment operators supposed to
> > also copy the padding bits, or do they become unspecified again through
> > that?
> 
> For a non-union class, a defaulted copy is defined as memberwise copy, not a
> copy of the entire object representation.  So I guess strictly speaking the
> padding bits do become unspecified.  But I think if the copy is trivial, in
> practice all implementations do copy the object representation; perhaps the
> specification should adjust accordingly.

Sorry for not responding earlier.  I think at least in GCC there is no
guarantee that the copying copies the object representation rather than doing a
memberwise copy; both are possible, depending e.g. on whether SRA happens or
not.

So, shouldn't we have a new CONSTRUCTOR flag that will represent whether
padding bits are cleared or not and then use it e.g. in the gimplifier?
Right now the gimplifier only adds first zero initialization if
CONSTRUCTOR_NO_CLEARING is not set and some initializers are not present,
so if there is a new flag, we'd need to in that case find out if there are
any padding bits and do the zero initialization in that case.
A question is if GIMPLE var = {}; statement (empty CONSTRUCTOR) is handled
as zero initialization of also the padding bits, or if we should treat it
that way only if the CONSTRUCTOR on the rhs has the new bit set and e.g.
when lowering memset 0 into var = {}; set the bit too.
From what I understood on IRC, D has similar need for zero initialization of
padding.

In the testcase below, what is and what is not UB?

#include <bit>

struct S { int a : 31; int b; };
struct T { int a, b; };

constexpr int
foo ()
{
  S a = S ();
  S b = { 0, 0 };
  S c = a;
  S d;
  S e;
  d = a;
  e = S ();
  int u = std::bit_cast (T, a).a; // Is this well defined due to value initialization of a?
  int v = std::bit_cast (T, b).a; // But this is invalid, right?  There is no difference in the IL though.
  int w = std::bit_cast (T, c).a; // And this is also invalid, or are default copy ctors required to copy padding bits?
  int x = std::bit_cast (T, d).a; // Similarly for default copy assignment operators...
  int y = std::bit_cast (T, e).a; // And this too?
  int z = std::bit_cast (T, S ()).a; // This one is well defined?
  return u + v + w + x + y + z;
}

constexpr int x = foo ();

Jakub



[PATCH] C-SKY: Add fpuv3 instructions and CK860 arch

2020-08-27 Thread gengqi via Gcc-patches
From: gengq 

gcc/ChangeLog:

* config/csky/constraints.md (W): New constraint for a memory operand
with a base register and an index register.
(Q): Rename and modify "csky_valid_fpuv2_mem_operand" to
"csky_valid_mem_constraint_operand" to deal with both the "Q" and "W"
constraints.
(Dv): New constraint for a const double value that can be used by the
fmovi instruction.
* config/csky/csky-modes.def (HFmode): New mode.
* config/csky/csky-protos.h (csky_valid_mem_constraint_operand):
Declare.
(get_output_csky_movedouble_length): Declare.
(fpuv3_output_move): Declare.
(fpuv3_const_double): Declare.
* config/csky/csky.c (csky_option_override): New arch CK860 with fpv3.
(decompose_csky_address): Adjust for robustness.
(csky_print_operand): New "CONST_DOUBLE" operand.
(csky_output_move): New support for fpv3 instructions.
(get_output_csky_movedouble_length): New function.
(fpuv3_output_move): New function.
(fpuv3_const_double): New function.
(csky_emit_compare): New cover for float comparison.
(csky_emit_compare_float): Refine.
(csky_valid_mem_constraint_operand): Rename from
"csky_valid_fpuv2_mem_operand" and add support for constraint "W".
(ck860_rtx_costs): New function.
(csky_rtx_costs): New subcall for CK860.
* gcc/config/csky/csky.h (TARGET_TLS): Add CK860.
* gcc/config/csky/csky.md (movsf): Delete; now implemented in
csky_insn_fpu.md.
(ck801_movsf): Delete; now implemented in csky_insn_fpu.md.
(csky_movsf): Delete; now implemented in csky_insn_fpu.md.
(movdf): Delete; now implemented in csky_insn_fpu.md.
(ck801_movdf): Delete; now implemented in csky_insn_fpu.md.
(csky_movdf): Delete; now implemented in csky_insn_fpu.md.
(csky_movsf_fpv2): Delete; now implemented in csky_insn_fpuv2.md.
(csky_movdf_fpv2): Delete; now implemented in csky_insn_fpuv2.md.
(movsicc): Refine.  Use "comparison_operator" instead of
"ordered_comparison_operator".
(addsicc): Likewise.
* config/csky/csky_cores.def (CK860): New arch and cpu.
(fpv3_hf, fpv3_hsf, fpv3_sdf, fpv3): New fpus.
* config/csky/csky_insn_fpu.md: Refactor.  For fpuv2, separate all
float patterns into emit-patterns and match-patterns, keep the
emit-patterns here, and move the match-patterns to csky_insn_fpuv2.md.
For fpuv3, add patterns and fuse part of them with the fpuv2 ones.
* config/csky/csky_insn_fpuv2.md: New file for fpuv2 instructions.
* config/csky/csky_insn_fpuv3.md: New file and new patterns for fpuv3
instructions.
* config/csky/csky_isa.def (fcr): New isa.
(fpv3_hi, fpv3_hf, fpv3_sf, fpv3_df): New isa.
(CK860): New definition.
* gcc/config/csky/csky_tables.opt (ck860): New processors ck860 and
ck860f, and new arch ck860.
(fpv3_hf, fpv3_hsf, fpv3_sdf, fpv3): New fpus.
* config/csky/predicates.md (csky_float_comparison_operator): Delete
"geu", "gtu", "leu", "ltu", which never appear in float
comparisons.
* doc/md.texi: Add "Q" and "W" constraints for C-SKY.

---
 gcc/config/csky/constraints.md |  13 +-
 gcc/config/csky/csky-modes.def |   2 +
 gcc/config/csky/csky-protos.h  |   7 +-
 gcc/config/csky/csky.c | 528 +++
 gcc/config/csky/csky.h |   2 +-
 gcc/config/csky/csky.md| 120 +
 gcc/config/csky/csky_cores.def |  13 +
 gcc/config/csky/csky_insn_fpu.md   | 797 -
 gcc/config/csky/csky_insn_fpuv2.md | 470 +
 gcc/config/csky/csky_insn_fpuv3.md | 418 +++
 gcc/config/csky/csky_isa.def   |  15 +
 gcc/config/csky/csky_tables.opt|  21 +
 gcc/config/csky/predicates.md  |   3 +-
 gcc/doc/md.texi|   8 +
 14 files changed, 1752 insertions(+), 665 deletions(-)
 create mode 100644 gcc/config/csky/csky_insn_fpuv2.md
 create mode 100644 gcc/config/csky/csky_insn_fpuv3.md

diff --git a/gcc/config/csky/constraints.md b/gcc/config/csky/constraints.md
index b9990b7dac9..874f698411b 100644
--- a/gcc/config/csky/constraints.md
+++ b/gcc/config/csky/constraints.md
@@ -34,7 +34,11 @@
 
(define_memory_constraint "Q"
  "Memory operands with base register, index register and short displacement for FPUV2"
-  (match_test "csky_valid_fpuv2_mem_operand (op)"))
+  (match_test "csky_valid_mem_constraint_operand (op, \"Q\")"))
+
+(define_memory_constraint "W"
+  "memory operands with base register, index register"
+  (match_test "csky_valid_mem_constraint_operand (op, \"W\")"))
 
 (define_constraint "R"
   "Memory operands whose address is a label_ref"
@@ -172,3 +176,10 @@
   "Constant in range [-8, -1]"
   (and (match_code "const_int")
(match_test "CSKY_CONST_OK_FOR_US (ival)")))
+

Re: [PATCH] testsuite: add -fno-tree-fre in gcc.dg/guality

2020-08-27 Thread Richard Biener via Gcc-patches
On Thu, Aug 27, 2020 at 3:38 AM Hu Jiangping  wrote:
>
> This patch adds -fno-tree-fre to dg-options in gcc.dg/guality/sra-1.c,
> to make the following testcases pass.
>
> FAIL: gcc.dg/guality/sra-1.c  -Og -DPREVENT_OPTIMIZATION  line 43 a.i == 4
> FAIL: gcc.dg/guality/sra-1.c  -Og -DPREVENT_OPTIMIZATION  line 43 a.j == 14
> FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 21 a.j == 14
> FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 32 a[1] == 14
> FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 43 a.j == 14
>
> The detail error logs like this:
> > Spawning: gdb -nx -nw -quiet -batch -x sra-1.gdb ./sra-1.exe
> > spawn gdb -nx -nw -quiet -batch -x sra-1.gdb ./sra-1.exe
> > Breakpoint 1 at 0x40074c: file 
> > /home/build_gcc/gcc-a64fx-1-a/gcc/testsuite/gcc.dg/guality/sra-1.c, line 43.
> >
> > Breakpoint 1, f3 (k=k@entry=7) at 
> > /home/build_gcc/gcc-a64fx-1-a/gcc/testsuite/gcc.dg/guality/sra-1.c:43
> > 43bar (a.j);/* { dg-final { gdb-test . "a.j" "14" } } */
> > $1 = 
> > $2 = 4
> >  != 4
>
> Tested on aarch64.

Huh well - I guess it should be XFAILed instead?  But then we see
XPASSes.  Disabling FRE doesn't look correct - we _do_ want the test to
succeed here.  Did you analyze why it fails?

It looks like we assign 'a' a register at RTL expansion time for -Og
but do not perform any fancy tracking of components of it then.  On
GIMPLE we assume it's eventually memory and thus wouldn't bother with
any debug stmts.

So it's a genuine FAIL for -Og at least.  That said, the logic is
simple - both a.i and a.j need to be available for a.i + a.j to be
evaluated.  For some reason we fail to record where their values
reside.  For -Og they are in %rbp, resp. %rbx.  But obviously some
elaborate DWARF is necessary to convey that info.

Richard.

> Regards!
> hujp
>
> ---
>  gcc/testsuite/gcc.dg/guality/sra-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c 
> b/gcc/testsuite/gcc.dg/guality/sra-1.c
> index 8ad57cf3f8e..94ea29dd727 100644
> --- a/gcc/testsuite/gcc.dg/guality/sra-1.c
> +++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
> @@ -1,6 +1,6 @@
>  /* PR debug/43983 */
>  /* { dg-do run } */
> -/* { dg-options "-g -fno-ipa-icf" } */
> +/* { dg-options "-g -fno-tree-fre -fno-ipa-icf" } */
>
>  struct A { int i; int j; };
>  struct B { int : 4; int i : 12; int j : 12; int : 4; };
> --
> 2.17.1
>
>
>


Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-27 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 24, 2020 at 4:35 PM Christophe Lyon
 wrote:
>
> On Mon, 24 Aug 2020 at 11:09, Christophe Lyon
>  wrote:
> >
> > On Sat, 22 Aug 2020 at 00:44, Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
> > >  wrote:
> > > >
> > > > armv8-m.base (cortex-m23) has the movt instruction, so we need to
> > > > disable the define_split to generate a constant in this case,
> > > > otherwise we get incorrect insn constraints as described in PR94538.
> > > >
> > > > We also need to fix the pure-code alternative for thumb1_movsi_insn
> > > > because the assembler complains with instructions like
> > > > movs r0, #:upper8_15:1234
> > > > (Internal error in md_apply_fix)
> > > > We now generate movs r0, 4 instead.
> > > >
> > > > 2020-08-19  Christophe Lyon  
> > > > gcc/ChangeLog:
> > > >
> > > > * config/arm/thumb1.md: Disable set-constant splitter when
> > > > TARGET_HAVE_MOVT.
> > > > (thumb1_movsi_insn): Fix -mpure-code
> > > > alternative.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/arm/pure-code/pr94538-1.c: New test.
> > > > * gcc.target/arm/pure-code/pr94538-2.c: New test.
> > >
> >
> > Hi Ramana,
> >
> > >  I take it that this fixes the ICE rather than addressing the code
> > > generation / performance bits that Wilco was referring to ? Really the
> > > other code quality / performance issues listed in the PR should really
> > > be broken down into separate PRs while we take this as a fix for
> > > fixing the ICE,
> > >
> > > Under that assumption OK.
> > >
> >
> > Yes, that's correct, this patch just fixes the ICE, as it is cleaner
> > to handle perf issues in a different patch.
> >
>
> The patch applies cleanly to gcc-9 and gcc-10: OK to backport there?

Yes, all good.

Thanks,
Ramana

>
> (I tested that the offending testcase passes on the branches with the
> backport, and fails without).
>
> Thanks,
>
> Christophe
>
> > I'll open a new PR to track the perf issue.
> >
> > Thanks
> >
> >
> > > Ramana
> > >
> > > > ---
> > > >  gcc/config/arm/thumb1.md   | 66 
> > > > ++
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
> > > >  3 files changed, 79 insertions(+), 12 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
> > > >
> > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > index 0ff8190..f0129db 100644
> > > > --- a/gcc/config/arm/thumb1.md
> > > > +++ b/gcc/config/arm/thumb1.md
> > > > @@ -70,6 +70,7 @@ (define_split
> > > >"TARGET_THUMB1
> > > > && arm_disable_literal_pool
> > > > && GET_CODE (operands[1]) == CONST_INT
> > > > +   && !TARGET_HAVE_MOVT
> > > > && !satisfies_constraint_I (operands[1])"
> > > >[(clobber (const_int 0))]
> > > >"
> > > > @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
> > > >"TARGET_THUMB1
> > > > && (   register_operand (operands[0], SImode)
> > > > || register_operand (operands[1], SImode))"
> > > > -  "@
> > > > -   movs%0, %1
> > > > -   movs%0, %1
> > > > -   movw%0, %1
> > > > -   #
> > > > -   #
> > > > -   ldmia\\t%1, {%0}
> > > > -   stmia\\t%0, {%1}
> > > > -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> > > > lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> > > > #:lower0_7:%1
> > > > -   ldr\\t%0, %1
> > > > -   str\\t%1, %0
> > > > -   mov\\t%0, %1"
> > > > +{
> > > > +  switch (which_alternative)
> > > > +{
> > > > +  default:
> > > > +  case 0: return "movs\t%0, %1";
> > > > +  case 1: return "movs\t%0, %1";
> > > > +  case 2: return "movw\t%0, %1";
> > > > +  case 3: return "#";
> > > > +  case 4: return "#";
> > > > +  case 5: return "ldmia\t%1, {%0}";
> > > > +  case 6: return "stmia\t%0, {%1}";
> > > > +  case 7:
> > > > +  /* pure-code alternative: build the constant byte by byte,
> > > > +instead of loading it from a constant pool.  */
> > > > +   {
> > > > + int i;
> > > > + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> > > > + bool mov_done_p = false;
> > > > + rtx ops[2];
> > > > + ops[0] = operands[0];
> > > > +
> > > > + /* Emit upper 3 bytes if needed.  */
> > > > + for (i = 0; i < 3; i++)
> > > > +   {
> > > > +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> > > > +
> > > > + if (byte)
> > > > +   {
> > > > + ops[1] = GEN_INT (byte);
> > > > + if (mov_done_p)
> > > > +   output_asm_insn ("adds\t%0, %1", ops);
> > > > + else
> > > > +   output_asm_insn ("movs\t%0, %1", ops);
> > > > +   
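
For a constant such as 0x12345678, the intent of that loop is a
byte-by-byte build-up along the lines of the old single-template
alternative removed above (a hand-written sketch, not compiler output):

movs    r0, #0x12   @ upper8_15
lsls    r0, #8
adds    r0, #0x34   @ upper0_7
lsls    r0, #8
adds    r0, #0x56   @ lower8_15
lsls    r0, #8
adds    r0, #0x78   @ lower0_7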

Re: [PATCH 3/3] vec: use inexact growth where possible.

2020-08-27 Thread Richard Biener via Gcc-patches
On Wed, Aug 26, 2020 at 11:02 PM Jeff Law  wrote:
>
> On Tue, 2020-08-11 at 13:37 +0200, Martin Liška wrote:
> > From cc1d41a469d76f2f8e4f44bed788ace77a1c6d62 Mon Sep 17 00:00:00 2001
> > From: Martin Liska 
> > Date: Mon, 10 Aug 2020 12:09:19 +0200
> > Subject: [PATCH 3/3] vec: use inexact growth where possible.
> >
> > gcc/ChangeLog:
> >
> >   * cfgrtl.c (rtl_create_basic_block): Use default value for
> >   growth vector function.
> >   * gimple.c (gimple_set_bb): Likewise.
> >   * symbol-summary.h: Likewise.
> >   * tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
> >   (build_gimple_cfg): Likewise.
> >   (create_bb): Likewise.
> >   (move_block_to_fn): Likewise.
> I'll note that in some cases we were avoiding exponential growth in our new 
> size
> computations.  Presumably the inexact growth support will do something 
> similar,
> even if it's not exactly the same.  Right?  Assuming that's the case this is 
> OK
> too.

@@ -183,12 +183,12 @@ init_empty_tree_cfg_for_function (struct function *fn)
   last_basic_block_for_fn (fn) = NUM_FIXED_BLOCKS;
   vec_alloc (basic_block_info_for_fn (fn), initial_cfg_capacity);
   vec_safe_grow_cleared (basic_block_info_for_fn (fn),
-initial_cfg_capacity, true);
+initial_cfg_capacity);

   /* Build a mapping of labels to their associated blocks.  */
   vec_alloc (label_to_block_map_for_fn (fn), initial_cfg_capacity);
   vec_safe_grow_cleared (label_to_block_map_for_fn (fn),
-initial_cfg_capacity, true);
+initial_cfg_capacity);

this is odd now at least since we explicitly alloc initial_cfg_capacity.
IMHO we should instead use

 basic_block_info_for_fn (fn)->quick_grow_cleared (initial_cfg_capacity)

here?

The rest looks OK and along the plan.

Thanks,
Richard.

> jeff
>


Re: [PATCH] x86: Reject target("no-general-regs-only")

2020-08-27 Thread Richard Biener via Gcc-patches
On Wed, Aug 26, 2020 at 9:40 PM H.J. Lu via Gcc-patches
 wrote:
>
> Reject target("no-general-regs-only") pragma and attribute.

mgeneral-regs-only
Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Var(ix86_target_flags) Save
Generate code which uses only the general registers.

it has already RejectNegative - why's that not honored?  Is this a general
issue?

Richard.

> gcc/
>
> PR target/96802
> * config/i386/i386-options.c (ix86_valid_target_attribute_inner_p):
> Reject target("no-general-regs-only").
>
> gcc/testsuite/
>
> PR target/96802
> * gcc.target/i386/pr96802-1.c: New test.
> * gcc.target/i386/pr96802-2.c: Likewise.
> ---
>  gcc/config/i386/i386-options.c|  7 +++
>  gcc/testsuite/gcc.target/i386/pr96802-1.c | 12 
>  gcc/testsuite/gcc.target/i386/pr96802-2.c | 16 
>  3 files changed, 35 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr96802-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr96802-2.c
>
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index e0fc68c27bf..b93c338346f 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -1189,6 +1189,13 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
> args, char *p_strings[],
> {
>   if (mask == OPTION_MASK_GENERAL_REGS_ONLY)
> {
> + if (!opt_set_p)
> +   {
> + error_at (loc, "pragma or attribute %<target(\"%s\")%> "
> +   "does not allow a negated form", p);
> + return false;
> +   }
> +
>   if (type != ix86_opt_ix86_yes)
> gcc_unreachable ();
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr96802-1.c 
> b/gcc/testsuite/gcc.target/i386/pr96802-1.c
> new file mode 100644
> index 000..e6ceb95d238
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr96802-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +
> +/* Reject the negated form of non-negatable attributes.  */
> +
> +__attribute__ ((target ("no-general-regs-only")))
> +int
> +foo (int a)
> +{
> +  return a + 1;
> +}
> +
> +/* { dg-error "does not allow a negated form" "" { target *-*-* } 0 } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr96802-2.c 
> b/gcc/testsuite/gcc.target/i386/pr96802-2.c
> new file mode 100644
> index 000..515f5673777
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr96802-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +
> +/* Reject the negated form of non-negatable pragma target.  */
> +
> +#pragma GCC push_options
> +#pragma GCC target("no-general-regs-only")
> +
> +int
> +foo (int a)
> +{
> +  return a + 1;
> +}
> +
> +#pragma GCC pop_options
> +
> +/* { dg-error "does not allow a negated form" "" { target *-*-* } 0 } */
> --
> 2.26.2
>


Re: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

2020-08-27 Thread Iain Sandoe via Gcc-patches

Richard Sandiford  wrote:


"Qian, Jianhua"  writes:

Hi Richard

I found that some instructions are using '#' before immediate value,
and others are not. For example
(define_insn "insv_imm<mode>"
 [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
  (const_int 16)
  (match_operand:GPI 1 "const_int_operand" "n"))
(match_operand:GPI 2 "const_int_operand" "n"))]
 "UINTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
  && UINTVAL (operands[1]) % 16 == 0"
 "movk\\t%0, %X2, lsl %1"
 [(set_attr "type" "mov_imm")]
)

Are there any standards for this?


No, it's unfortunately inconsistent.  FWIW, if we were going to change
this, personally I've a slight preference for having the “#”.


Absence of the # makes assemblers based on the LLVM backend reject GCC's
output, as such I've got a strong preference for adding it (I've got some
local patches for this already).
e.g.
https://github.com/iains/gcc-darwin-arm64/commit/526ffb6b34ddb848853016cd14a438683aa0e6de

(hacking branch, please don’t shoot me, yet :) )
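
Concretely, the two spellings at issue for the movk pattern quoted
above (illustrative operands taken from the insv_1.c discussion):

movk    x0, 0x1d6b, lsl 32     // current GCC output; LLVM-based
                               // assemblers reject this form
movk    x0, #0x1d6b, lsl 32    // with '#': accepted by both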

Iain



[PATCH] fix a typo in rtl.texi

2020-08-27 Thread Wei Wentao
Hi,

This patch fixes a typo in rtl.texi.

Regards!

Weiwt

---
 gcc/doc/rtl.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 501fa1a31da..f8e1f950823 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -3954,7 +3954,7 @@ variable.
 @findex NOTE_INSN_BEGIN_STMT
 @item NOTE_INSN_BEGIN_STMT
 This note is used to generate @code{is_stmt} markers in line number
-debuggign information.  It indicates the beginning of a user
+debugging information.  It indicates the beginning of a user
 statement.
 
 @findex NOTE_INSN_INLINE_ENTRY
-- 
2.18.1





[PATCH] ia32: Fix alignment of _Atomic fields [PR65146]

2020-08-27 Thread Jakub Jelinek via Gcc-patches
Hi!

For _Atomic fields, lowering the alignment of long long or double etc.
fields on ia32 is undesirable, because then one really can't perform atomic
operations on those using cmpxchg8b.

The following patch stops lowering the alignment in fields for _Atomic
types (the x86_field_alignment change) and for -mpreferred-stack-boundary=2
also ensures we don't misalign _Atomic long long etc. automatic variables
(the ix86_{local,minimum}_alignment changes).
I'm not sure about the iamcu_alignment change; I know next to nothing about
IA MCU, but unless it lacks the cmpxchg8b instruction, it would surprise me
if we didn't want to do it there as well.
clang apparently doesn't lower the field alignment for _Atomic.
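
A minimal illustration, mirroring the new testcase below (offsets
assume the ia32 psABI):

struct A { char a; _Atomic long long b; };

/* Previously the field alignment of 'b' was lowered to 4, so 'b' could
   end up at offset 4 and a cmpxchg8b on &b might be misaligned; with
   the patch it keeps its natural 8-byte alignment.  */
extern int chk[__builtin_offsetof (struct A, b) == 8 ? 1 : -1];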

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-08-27  Jakub Jelinek  

PR target/65146
* config/i386/i386.c (iamcu_alignment): Don't decrease alignment
for TYPE_ATOMIC types.
(ix86_local_alignment): Likewise.
(ix86_minimum_alignment): Likewise.
(x86_field_alignment): Likewise, and emit a -Wpsabi diagnostic
for it.

* gcc.target/i386/pr65146.c: New test.

--- gcc/config/i386/i386.c.jj   2020-08-24 10:00:01.321258451 +0200
+++ gcc/config/i386/i386.c  2020-08-26 19:19:11.834297395 +0200
@@ -16487,7 +16487,11 @@ iamcu_alignment (tree type, int align)
 
   /* Intel MCU psABI specifies scalar types > 4 bytes aligned to 4
  bytes.  */
-  mode = TYPE_MODE (strip_array_types (type));
+  type = strip_array_types (type);
+  if (TYPE_ATOMIC (type))
+return align;
+
+  mode = TYPE_MODE (type);
   switch (GET_MODE_CLASS (mode))
 {
 case MODE_INT:
@@ -16644,7 +16648,8 @@ ix86_local_alignment (tree exp, machine_
   && align == 64
   && ix86_preferred_stack_boundary < 64
   && (mode == DImode || (type && TYPE_MODE (type) == DImode))
-  && (!type || !TYPE_USER_ALIGN (type))
+  && (!type || (!TYPE_USER_ALIGN (type)
+   && !TYPE_ATOMIC (strip_array_types (type
   && (!decl || !DECL_USER_ALIGN (decl)))
 align = 32;
 
@@ -16757,7 +16762,8 @@ ix86_minimum_alignment (tree exp, machin
   /* Don't do dynamic stack realignment for long long objects with
  -mpreferred-stack-boundary=2.  */
   if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
-  && (!type || !TYPE_USER_ALIGN (type))
+  && (!type || (!TYPE_USER_ALIGN (type)
+   && !TYPE_ATOMIC (strip_array_types (type
   && (!decl || !DECL_USER_ALIGN (decl)))
 {
   gcc_checking_assert (!TARGET_STV);
@@ -20293,11 +20299,30 @@ x86_field_alignment (tree type, int comp
 return computed;
   if (TARGET_IAMCU)
 return iamcu_alignment (type, computed);
-  mode = TYPE_MODE (strip_array_types (type));
+  type = strip_array_types (type);
+  mode = TYPE_MODE (type);
   if (mode == DFmode || mode == DCmode
   || GET_MODE_CLASS (mode) == MODE_INT
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
-return MIN (32, computed);
+{
+  if (TYPE_ATOMIC (type) && computed > 32)
+   {
+ static bool warned;
+
+ if (!warned && warn_psabi)
+   {
+ const char *url
+   = CHANGES_ROOT_URL "gcc-11/changes.html#ia32_atomic";
+
+ warned = true;
+ inform (input_location, "the alignment of %<_Atomic %T%> "
+ "fields changed in %{GCC 11.1%}",
+ TYPE_MAIN_VARIANT (type), url);
+   }
+   }
+  else
+  return MIN (32, computed);
+}
   return computed;
 }
 
--- gcc/testsuite/gcc.target/i386/pr65146.c.jj  2020-08-26 19:25:27.720023020 +0200
+++ gcc/testsuite/gcc.target/i386/pr65146.c 2020-08-26 19:28:24.982535651 +0200
@@ -0,0 +1,12 @@
+/* PR target/65146 */
+/* { dg-do compile } */
+/* { dg-options "-Wno-psabi" } */
+
+struct A { char a; _Atomic long long b; };
+struct B { char a; _Atomic double b; };
+struct C { char a; _Atomic long long b[2]; };
+struct D { char a; _Atomic double b[2]; };
+extern int a[__builtin_offsetof (struct A, b) == 8 ? 1 : -1];
+extern int b[__builtin_offsetof (struct B, b) == 8 ? 1 : -1];
+extern int c[__builtin_offsetof (struct C, b) == 8 ? 1 : -1];
+extern int d[__builtin_offsetof (struct D, b) == 8 ? 1 : -1];

Jakub



Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-27 Thread Richard Sandiford
xiezhiheng  writes:
> I made two separate patches for these two groups for review purposes.
>
> Note: Patch for min/max intrinsics should be applied before the patch for 
> rounding intrinsics
>
> Bootstrapped and tested on aarch64 Linux platform.

Thanks, LGTM.  Pushed to master.

Richard


[PATCH] tree-optimization/96579 - another special-operands fix in reassoc

2020-08-27 Thread Richard Biener
This makes sure to put special-ops expanded rhs left where
expression rewrite expects it.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-27  Richard Biener  

PR tree-optimization/96579
* tree-ssa-reassoc.c (linearize_expr_tree): If we expand
rhs via special ops make sure to swap operands.

* gcc.dg/pr96579.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr96579.c |  4 
 gcc/tree-ssa-reassoc.c | 13 +++--
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr96579.c

diff --git a/gcc/testsuite/gcc.dg/pr96579.c b/gcc/testsuite/gcc.dg/pr96579.c
new file mode 100644
index 000..49fdcb40359
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96579.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-forwprop -ffast-math -fno-tree-vrp" } */
+
+#include "pr96370.c"
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index fed463b0350..a5f5d52edab 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5651,13 +5651,20 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple *stmt,
 
   if (!binrhsisreassoc)
{
- if (!try_special_add_to_ops (ops, rhscode, binrhs, binrhsdef))
+ bool swap = false;
+ if (try_special_add_to_ops (ops, rhscode, binrhs, binrhsdef))
+   /* If we add ops for the rhs we expect to be able to recurse
+  to it via the lhs during expression rewrite so swap
+  operands.  */
+   swap = true;
+ else
add_to_ops_vec (ops, binrhs);
 
  if (!try_special_add_to_ops (ops, rhscode, binlhs, binlhsdef))
add_to_ops_vec (ops, binlhs);
 
- return;
+ if (!swap)
+   return;
}
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -5676,6 +5683,8 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple *stmt,
  fprintf (dump_file, " is now ");
  print_gimple_stmt (dump_file, stmt, 0);
}
+  if (!binrhsisreassoc)
+   return;
 
   /* We want to make it so the lhs is always the reassociative op,
 so swap.  */
-- 
2.26.2


Re: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

2020-08-27 Thread Richard Sandiford
"Qian, Jianhua"  writes:
> Hi Richard
>
> I found that some instructions are using '#' before immediate value,
> and others are not. For example
> (define_insn "insv_imm"
>   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
> (const_int 16)
> (match_operand:GPI 1 "const_int_operand" "n"))
>   (match_operand:GPI 2 "const_int_operand" "n"))]
>   "UINTVAL (operands[1]) < GET_MODE_BITSIZE (mode)
>&& UINTVAL (operands[1]) % 16 == 0"
>   "movk\\t%0, %X2, lsl %1"
>   [(set_attr "type" "mov_imm")]
> )
>
> Are there any standards for this?

No, it's unfortunately inconsistent.  FWIW, if we were going to change this,
personally I've a slight preference for having the “#”.

Thanks,
Richard

>
> Regards
> Qian
>
> -Original Message-
> From: Richard Sandiford  
> Sent: Wednesday, August 26, 2020 6:09 PM
> To: Qian, Jianhua/钱 建华 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] fix testcase gcc.target/aarch64/insv_1.c
>
> Qian Jianhua  writes:
>> There are three failures in gcc.target/aarch64/insv_1.c.
>>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 0, 8
>>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 16, 5
>>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler movk\tx[0-9]+, 0x1d6b, lsl 32
>>
>> This patch fixes the third failure, which was missing a "#" before the immediate value in the scan-assembler pattern.
>
> Thanks, pushed to master.
>
> Richard
>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/insv_1.c: Add '#' in scan-assembler
>>
>> ---
>>  gcc/testsuite/gcc.target/aarch64/insv_1.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c
>> index 360a9892ad9..9efa22e649d 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
>> @@ -32,7 +32,7 @@ bfi2 (bitfield a)
>>  bitfield
>>  movk (bitfield a)
>>  {
>> -  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
>> +  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, #0x1d6b, lsl 32" } } */
>>a.sixteen = 7531;
>>return a;
>>  }


[PATCH] doc: add 'cd' command before 'make check-gcc' command in install.texi

2020-08-27 Thread Hu Jiangping
Hi,

This patch adds a 'cd' command before the 'make check-gcc' command
for running the testsuite on selected tests.

The original text implicitly assumes that you execute a cd
command to move to the gcc subdirectory of the object
directory before executing the make command.  However,
leaving this implicit may cause confusion, for the
following reasons (see the example after this list):

* the make check-gcc command can also be executed in the
object directory, where it runs more testcases than when
it is executed in the gcc subdirectory.
* the make check-g++ command below reports a
'No rule to make target' error if it is executed in the
object directory.
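
For example (the directory names are illustrative):

cd objdir/gcc
make check-gcc RUNTESTFLAGS="execute.exp"
make check-g++ RUNTESTFLAGS="dg.exp"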

Tested on x86_64. OK for master?

Regards!
hujp

---
 gcc/doc/install.texi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5330bf3bb29..fd4409921ee 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2975,6 +2975,7 @@ A more selective way to just run all @command{gcc} execute tests in the
 testsuite is to use
 
 @smallexample
+cd @var{objdir}/gcc
 make check-gcc RUNTESTFLAGS="execute.exp @var{other-options}"
 @end smallexample
 
-- 
2.17.1





Re: [Patch, fortran] PR96624 - A segment fault occurred when using the reshape function result to assign a variable

2020-08-27 Thread Tobias Burnus

Looks good to me.

Thanks for the patch!

Tobias

On 8/27/20 8:17 AM, Paul Richard Thomas via Fortran wrote:

Hi All,

Here is another of Steve Kargl's patches.

Before the patch is applied, the following code is generated:
 atmp.0.span = 4;
 atmp.0.data = 0B;
 atmp.0.offset = 0;
 (*(integer(kind=4)[0] * restrict) atmp.0.data)[0] = 1;
 (*(integer(kind=4)[0] * restrict) atmp.0.data)[1] = 2;

which causes a segfault at run time. The test case counts the number of
occurrences of 'data' to check that the bad assignments have gone.

Regtests OK on FC31/x86_64 - OK for master?

This patch fixes PR96624.

2020-08-27  Paul Thomas  

gcc/fortran
PR fortran/96624
* simplify.c (gfc_simplify_reshape): Detect zero shape and
clear index if found.

gcc/testsuite/
PR fortran/96624
* gfortran.dg/reshape_8.f90: New test.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-27 Thread Roger Sayle


>On 2020-08-26 5:23 p.m., Roger Sayle wrote:
>> These more accurate target rtx_costs are used by the 
>> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to 
>> decide whether applying strength reduction would be profitable.  This test 
>> case, slsr-13.c, assumes that two multiplications by four are
>> cheaper than two multiplications by five.  (I believe) This is not the case
>> on hppa, which has a sh2add instruction that performs a multiplication by five in 
>> one cycle, or exactly the same cost as performing a left shift by two 
>> (i.e. a multiplication by four).  Oddly, I also believe this isn't the 
>> case on x86_64, where the similar lea instruction is (sometimes) as 
>> efficient as left shift by two bits.
>This looks like a regression.
>
>gcc-10 (prepatch):
>
>addl %r25,%r26,%r28
>sh2addl %r25,%r28,%r25
>sh2addl %r26,%r28,%r26
>addl %r26,%r28,%r28
>bv %r0(%r2)
>addl %r28,%r25,%r28
>
>   <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  slsr_11 = s_3(D) * 4;
>  x2_6 = x1_4 + slsr_11;
>  slsr_12 = c_2(D) * 4;
>  x3_8 = x1_4 + slsr_12;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
>gcc-11 (with patch):
>
>addl %r25,%r26,%r19
>sh2addl %r26,%r26,%r28
>addl %r28,%r25,%r28
>sh2addl %r25,%r25,%r25
>addl %r28,%r19,%r28
>addl %r25,%r26,%r26
>bv %r0(%r2)
>addl %r28,%r26,%r28
>
>   <bb 2> [local count: 1073741824]:
>  x1_4 = c_2(D) + s_3(D);
>  a2_5 = s_3(D) * 5;
>  x2_6 = c_2(D) + a2_5;
>  a3_7 = c_2(D) * 5;
>  x3_8 = s_3(D) + a3_7;
>  _1 = x1_4 + x2_6;
>  x_9 = _1 + x3_8;
>  return x_9;
>
> Regards,
> Dave

There are two interesting (tree-optimization) observations here.  The first is
that at the tree-ssa level both of these gimple sequences look to have exactly
the same cost, seven assignments on a target where *4 is the same cost as *5.
The gimple doesn't attempt to model the sh?add/lea instructions that combine
may find, so at RTL expansion both sequences look equivalent.  One fix may be
to have gimple-ssa-strength-reduction.c just prefer multiplications by 2, 4
and 8, even on targets that have a single-cycle "mul" instruction.

The second observation is to ask why tree-ssa-reassoc.c isn't doing something
here.  The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this strength
reduction test is expecting this to turn into "tmp=s+c;  return
tmp+(tmp+4*c)+(4*s+tmp)", which is clever and an improvement, but overlooks
the obvious reassociation 7*(s+c).  Indeed LLVM does this in three
instructions:

tmp1 = s+c;
tmp2 = tmp1<<3;
return tmp2-tmp1;
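
To make the example concrete, here is that computation and its reassociated
form written out as plain C (an illustrative sketch only; slsr-13.c itself
is phrased slightly differently):

long f (long s, long c)
{
  long x1 = s + c;
  long x2 = s + 5 * c;
  long x3 = 5 * s + c;
  return x1 + x2 + x3;      /* algebraically 7 * (s + c) */
}

/* The reassociated form: one add, one shift, one subtract, no multiply.  */
long f_reassoc (long s, long c)
{
  long tmp = s + c;
  return (tmp << 3) - tmp;  /* 8*tmp - tmp == 7*tmp */
}

Both functions compute the same value; the second needs no multiplication at
all, which is exactly what the 7*(s+c) reassociation exploits.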

Although the PA backend is (mostly) innocent in this, the lowest-impact
workaround is to have multiplications by 2, 4 and 8 return COSTS_N_INSNS(1)-1,
to indicate a preference when splitting ties.  I'll prepare a patch.
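
As a rough illustration, such a tweak could take a shape like the following
fragment of a target rtx_costs hook (a hypothetical sketch, not the actual
hppa change; CONST_INT_P, XEXP, INTVAL and COSTS_N_INSNS are the usual GCC
rtl helpers):

case MULT:
  if (CONST_INT_P (XEXP (x, 1)))
    {
      HOST_WIDE_INT v = INTVAL (XEXP (x, 1));
      /* Multipliers that are also simple shifts cost one unit less than
         a full single-cycle multiply, so passes such as strength
         reduction break cost ties in their favour.  */
      if (v == 2 || v == 4 || v == 8)
        {
          *total = COSTS_N_INSNS (1) - 1;
          return true;
        }
    }
  *total = COSTS_N_INSNS (1);
  return true;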

Roger
--




[Patch, fortran] PR96624 - A segment fault occurred when using the reshape function result to assign a variable

2020-08-27 Thread Paul Richard Thomas via Gcc-patches
Hi All,

Here is another of Steve Kargl's patches.

Before the patch is applied, the following code is generated:
atmp.0.span = 4;
atmp.0.data = 0B;
atmp.0.offset = 0;
(*(integer(kind=4)[0] * restrict) atmp.0.data)[0] = 1;
(*(integer(kind=4)[0] * restrict) atmp.0.data)[1] = 2;

which causes a segfault at run time. The test case counts the number of
occurrences of 'data' to check that the bad assignments have gone.
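
In C terms, the failure mode is simply a store through a null pointer; a
minimal sketch of the equivalent behaviour (illustrative only, not the code
gfortran actually emits):

#include <stddef.h>

int main (void)
{
  int *data = NULL;  /* corresponds to atmp.0.data = 0B above */
  data[0] = 1;       /* undefined behaviour: segfaults in practice */
  data[1] = 2;
  return 0;
}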

Regtests OK on FC31/x86_64 - OK for master?

This patch fixes PR96624.

2020-08-27  Paul Thomas  

gcc/fortran
PR fortran/96624
* simplify.c (gfc_simplify_reshape): Detect zero shape and
clear index if found.

gcc/testsuite/
PR fortran/96624
* gfortran.dg/reshape_8.f90: New test.
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 074b50c2e68..8e0d2f97a60 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6398,7 +6398,7 @@ gfc_simplify_is_contiguous (gfc_expr *array)
 
   if (gfc_is_not_contiguous (array))
    return gfc_get_logical_expr (gfc_default_logical_kind, &array->where, 0);
-
+
   return NULL;
 }
 
@@ -6725,6 +6725,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   unsigned long j;
   size_t nsource;
   gfc_expr *e, *result;
+  bool zerosize = false;
 
   /* Check that argument expression types are OK.  */
   if (!is_constant_array_expr (source)
@@ -6847,7 +6848,14 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   result->rank = rank;
   result->shape = gfc_get_shape (rank);
   for (i = 0; i < rank; i++)
-mpz_init_set_ui (result->shape[i], shape[i]);
+{
+  mpz_init_set_ui (result->shape[i], shape[i]);
+  if (shape[i] == 0)
+	zerosize = true;
+}
+
+  if (zerosize)
+goto sizezero;
 
   while (nsource > 0 || npad > 0)
 {
@@ -6897,6 +6905,8 @@ inc:
   break;
 }
 
+sizezero:
+
   mpz_clear (index);
 
   return result;
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
!
! Test the fix for PR96624 in which an attempt was made to assign
! to the zero length temporary created by reshape, resulting in a segfault.
!
! Contributed by Dong Shenpo  
!
program test
  integer :: a(2,0)
  a = reshape([1,2,3,4], [2,0])
  print *, a
end
! { dg-final { scan-tree-dump-times "data" 3 "original" } }