[PATCH] rs6000: Remove ALTIVEC_BUILTIN_MASK_FOR_STORE

2020-08-28 Thread Bill Schmidt via Gcc-patches
It turns out that the target hook that this is supposed to satisfy
disappeared in 2004.  Probably time to retire it.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions; committed as obvious.

Thanks,
Bill


2020-08-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin.def (MASK_FOR_STORE): Remove.
* config/rs6000/rs6000-call.c (rs6000_expand_builtin): Remove
all logic for ALTIVEC_BUILTIN_MASK_FOR_STORE.
---
 gcc/config/rs6000/rs6000-builtin.def |  1 -
 gcc/config/rs6000/rs6000-call.c  | 12 +++---------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index aa1b945b063..e91a48ddf5f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1515,7 +1515,6 @@ BU_ALTIVEC_C (STVLXL, "stvlxl",   MEM)
 BU_ALTIVEC_C (STVRX,   "stvrx",MEM)
 BU_ALTIVEC_C (STVRXL,  "stvrxl",   MEM)
 BU_ALTIVEC_X (MASK_FOR_LOAD,   "mask_for_load",MISC)
-BU_ALTIVEC_X (MASK_FOR_STORE,  "mask_for_store",   MISC)
 BU_ALTIVEC_X (VEC_INIT_V4SI,   "vec_init_v4si",CONST)
 BU_ALTIVEC_X (VEC_INIT_V8HI,   "vec_init_v8hi",CONST)
 BU_ALTIVEC_X (VEC_INIT_V16QI,  "vec_init_v16qi",   CONST)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 0e9dc77da91..26388762c5f 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12643,7 +12643,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   }
 
 case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
-case ALTIVEC_BUILTIN_MASK_FOR_STORE:
   {
int icode2 = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr_direct
 : (int) CODE_FOR_altivec_lvsl_direct);
@@ -12658,14 +12657,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
gcc_assert (POINTER_TYPE_P (TREE_TYPE (arg)));
op = expand_expr (arg, NULL_RTX, Pmode, EXPAND_NORMAL);
addr = memory_address (mode, op);
-   if (fcode == ALTIVEC_BUILTIN_MASK_FOR_STORE)
- op = addr;
-   else
- {
-   /* For the load case need to negate the address.  */
-   op = gen_reg_rtx (GET_MODE (addr));
-   emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
- }
+   /* We need to negate the address.  */
+   op = gen_reg_rtx (GET_MODE (addr));
+   emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
op = gen_rtx_MEM (mode, op);
 
if (target == 0
-- 
2.17.1



Re: [PATCH] rs6000: r12 copy cleanup

2020-08-28 Thread Segher Boessenkool
On Fri, Aug 28, 2020 at 11:48:41AM -0500, Bill Schmidt wrote:
> Remove unnecessary tests before copying function address to r12, as
> requested by Segher.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions, committed as obvious.

Thanks!


Segher


Re: [committed] libstdc++: Make std::chrono::duration use reduced ratio for period

2020-08-28 Thread Jonathan Wakely via Gcc-patches

On 27/08/20 22:41 +0100, Jonathan Wakely wrote:

This implements the changes from P0548 "common_type and duration". That
was a change for C++17, but as it corrects some issues introduced by DRs
I'm also treating it as a DR and changing it for all modes from C++11
up.

The main change is that duration<R, P>::period no longer denotes P, but
rather P::type, the reduced ratio. The unary operator+ and operator-
members of duration should now return a duration using that reduced
ratio.

The requirement that common_type<T>::type is the same type as
common_type<T, T>::type (rather than simply T) was already implemented
for PR 89102.

The standard says that duration::operator+() and duration::operator-()
should return common_type_t<duration>, but that seems unnecessarily
expensive to compute. This change just uses duration<rep, period> which
is the same type, so we don't need to instantiate common_type.
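
For example (my own illustration, not part of the patch):

  #include <chrono>
  #include <ratio>
  #include <type_traits>
  #include <utility>

  using D = std::chrono::duration<int, std::ratio<2, 4>>;
  // The member typedef now denotes the reduced ratio ...
  static_assert(std::is_same<D::period, std::ratio<1, 2>>::value, "");
  // ... and unary + and - return a duration that uses it.
  static_assert(std::is_same<decltype(+std::declval<D>()),
                             std::chrono::duration<int, std::ratio<1, 2>>>::value, "");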

As an optimization, this also adds partial specializations of
common_type for two durations of the same type, a single duration, two
time_points of the same type, and a single time_point. These
specializations avoid instantiating other specializations of common_type
and one or both of __duration_common_type or __timepoint_common_type for
the cases where the answer is trivial to obtain.

libstdc++-v3/ChangeLog:

* include/std/chrono (__duration_common_type): Ensure the
reduced ratio is used. Remove unused partial specialization
using __failure_type.
(common_type): Pass reduced ratios to __duration_common_type.
Add partial specializations for simple cases involving a single
duration or time_point type.
(duration::period): Use reduced ratio.
(duration::operator+(), duration::operator-()): Return duration
type using the reduced ratio.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc:
Adjust expected errors.
* testsuite/20_util/duration/requirements/reduced_period.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

This is a C++17 feature, so I think it would be good to backport it to
gcc-10 as well. I'll let it bake on trunk for a while first though.


A correction ...

   libstdc++: Fix common_type specializations for duration

My recent change to implement P0548 ("common_type and duration") was not
correct. The result of common_type_t<duration<R, P>, duration<R, P>>
should be duration<common_type_t<R>, P::type>, not duration<R, P::type>.
The common_type specialization for two different duration types was
correct, but the specializations for a single duration type (which only
exist to optimize compilation time) were wrong.

This fixes the partial specializations of common_type for a single
duration type, and also the return types of duration::operator+ and
duration::operator- which are supposed to use common_type_t.
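
Concretely (again my illustration, not from the commit):

  #include <chrono>
  #include <ratio>
  #include <type_traits>

  using D = std::chrono::duration<int, std::ratio<2, 4>>;
  // Both the single-type and two-type specializations must agree on
  // duration<common_type_t<rep>, period::type>:
  static_assert(std::is_same<std::common_type<D, D>::type,
                             std::chrono::duration<int, std::ratio<1, 2>>>::value, "");
  static_assert(std::is_same<std::common_type<D>::type,
                             std::common_type<D, D>::type>::value, "");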

libstdc++-v3/ChangeLog:

* include/std/chrono (common_type): Fix partial specializations
for a single duration type to use the common_type of the rep.
(duration::operator+, duration::operator-): Fix return types
to also use the common_type of the rep.
* testsuite/20_util/duration/requirements/reduced_period.cc:
Check duration using a rep that has common_type specialized.

Tested powerpc64le-linux. Committed to trunk.

commit f2f48b68a6a586f40dd8ae0e6d391b7f88756eec
Author: Jonathan Wakely 
Date:   Fri Aug 28 23:41:13 2020

libstdc++: Fix common_type specializations for duration

My recent change to implement P0548 ("common_type and duration") was not
correct. The result of common_type_t<duration<R, P>, duration<R, P>>
should be duration<common_type_t<R>, P::type>, not duration<R, P::type>.
The common_type specialization for two different duration types was
correct, but the specializations for a single duration type (which only
exist to optimize compilation time) were wrong.

This fixes the partial specializations of common_type for a single
duration type, and also the return types of duration::operator+ and
duration::operator- which are supposed to use common_type_t.

libstdc++-v3/ChangeLog:

* include/std/chrono (common_type): Fix partial specializations
for a single duration type to use the common_type of the rep.
(duration::operator+, duration::operator-): Fix return types
to also use the common_type of the rep.
* testsuite/20_util/duration/requirements/reduced_period.cc:
Check duration using a rep that has common_type specialized.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index fb251848da8..524d23ea6a7 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -114,13 +114,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Rep, typename _Period>
     struct common_type<chrono::duration<_Rep, _Period>,
 		   chrono::duration<_Rep, _Period>>
-{ using type = chrono::duration<_Rep, typename _Period::type>; };
+{
+  using type = 

Re: [committed] libstdc++: Fix std::gcd and std::lcm for unsigned integers [PR 92978]

2020-08-28 Thread Jonathan Wakely via Gcc-patches

On 29/08/20 00:20 +0200, Daniel Krügler wrote:

On Sat, 29 Aug 2020 at 00:12, Jonathan Wakely via
Libstdc++ wrote:


This fixes a bug with mixed signed and unsigned types, where converting
a negative value to the unsigned result type alters the value. The
solution is to obtain the absolute values of the arguments immediately
and to perform the actual GCD or LCM algorithm on two arguments of the
same type.

In order to operate on the most negative number without overflow when
taking its absolute, use an unsigned type for the result of the abs
operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
is (unsigned)INT_MAX+1U which is the correct value.

libstdc++-v3/ChangeLog:

PR libstdc++/92978
* include/std/numeric (__abs_integral): Replace with ...
(__detail::__absu): New function template that returns an
unsigned type, guaranteeing it can represent the most
negative signed value.
(__detail::__gcd, __detail::__lcm): Require arguments to
be unsigned and therefore already non-negative.
(gcd, lcm): Convert arguments to absolute value as unsigned
type before calling __detail::__gcd or __detail::__lcm.
* include/experimental/numeric (gcd, lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
errors.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/92978.cc: New test.
* testsuite/26_numerics/lcm/92978.cc: New test.
* testsuite/experimental/numeric/92978.cc: New test.

Tested powerpc64le-linux. Committed to trunk.


Shouldn't the overload of __absu

void __absu(bool) = delete;

still also be a template or is just the diff presentation confusing me?


Good point! It's called as __absu<_Up>(v) so it needs to be a function
template for the deleted one to be a candidate.

I'm not sure we really need it, since all the callers have a
static_assert ensuring it's not a bool. But if we have it, it should
be correct.
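
Something like this should do it (a sketch, assuming the call sites
spell the call __absu<_Up>(v)):

  namespace __detail
  {
    // A deleted *function template*, so it remains a candidate for
    // calls written as __absu<_Up>(v):
    template<typename _Up> void __absu(bool) = delete;
  }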




[PATCH 3/n] ipa: Simplify interface of ipa_call_context::estimate_size_and_time

2020-08-28 Thread Martin Jambor
Hi,

this patch changes ipa_call_context::estimate_size_and_time to store
its results into member fields of the ipa_call_context class instead
of into pointers it receives as parameters, so that it can compute more
stuff without cluttering the interface even further.
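
In other words, callers move from the out-parameter style to something
like this (a sketch based on the diff below):

  ipa_call_context ctx = ipa_call_context::for_cloned_node (node, avals);
  ctx.estimate_size_and_time ();
  int size = ctx.m_size;        /* results now live in member fields */
  sreal time = ctx.m_time;
  ipa_hints hints = ctx.m_hints;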

Bootstrapped and tested on x86_64-linux.  OK for master on top of the
previous two patches?

Thanks,

Martin


gcc/ChangeLog:

2020-08-28  Martin Jambor  

* ipa-fnsummary.h (class ipa_call_context): Changed declaration of
estimate_size_and_time to accept two booleans.  Added an overload
of the method without any parameters.  New fields m_size,
m_min_size, m_time, m_nonspecialized_time and m_hints.
* ipa-cp.c (hint_time_bonus): Changed the second parameter from
just hints to a const reference to ipa_call_context.
(perform_estimation_of_a_value): Adjusted to the new interface of
ipa_call_context::estimate_size_and_time.
* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time):
Modified to store results into member fields of the class.
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to the
new interface of ipa_call_context::estimate_size_and_time.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
---
 gcc/ipa-cp.c  | 41 -
 gcc/ipa-fnsummary.c   | 48 +++
 gcc/ipa-fnsummary.h   | 33 +++
 gcc/ipa-inline-analysis.c | 45 ++--
 4 files changed, 94 insertions(+), 73 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index e37e71bd24f..010ecfc6b43 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3182,12 +3182,13 @@ devirtualization_time_bonus (struct cgraph_node *node,
   return res;
 }
 
-/* Return time bonus incurred because of HINTS.  */
+/* Return time bonus incurred because of hints stored in CTX.  */
 
 static int
-hint_time_bonus (cgraph_node *node, ipa_hints hints)
+hint_time_bonus (cgraph_node *node, const ipa_call_context &ctx)
 {
   int result = 0;
+  ipa_hints hints = ctx.m_hints;
   if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
 result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
   return result;
@@ -3387,16 +3388,14 @@ perform_estimation_of_a_value (cgraph_node *node, ipa_call_arg_values *avals,
   int removable_params_cost, int est_move_cost,
   ipcp_value_base *val)
 {
-  int size, time_benefit;
-  sreal time, base_time;
-  ipa_hints hints;
+  int time_benefit;
 
   ipa_call_context ctx = ipa_call_context::for_cloned_node (node, avals);
-  ctx.estimate_size_and_time (&size, NULL, &time, &base_time, &hints);
+  ctx.estimate_size_and_time ();
 
-  base_time -= time;
-  if (base_time > 65535)
-base_time = 65535;
+  sreal time_delta = ctx.m_nonspecialized_time - ctx.m_time;
+  if (time_delta > 65535)
+time_delta = 65535;
 
   /* Extern inline functions have no cloning local time benefits because they
  will be inlined anyway.  The only reason to clone them is if it enables
@@ -3404,11 +3403,12 @@ perform_estimation_of_a_value (cgraph_node *node, ipa_call_arg_values *avals,
   if (DECL_EXTERNAL (node->decl) && DECL_DECLARED_INLINE_P (node->decl))
 time_benefit = 0;
   else
-time_benefit = base_time.to_int ()
+time_benefit = time_delta.to_int ()
   + devirtualization_time_bonus (node, avals)
-  + hint_time_bonus (node, hints)
+  + hint_time_bonus (node, ctx)
   + removable_params_cost + est_move_cost;
 
+  int size = ctx.m_size;
   gcc_checking_assert (size >=0);
   /* The inliner-heuristics based estimates may think that in certain
  contexts some functions do not have any size at all but we want
@@ -3463,24 +3463,22 @@ estimate_local_effects (struct cgraph_node *node)
   || (removable_params_cost && node->can_change_signature))
 {
   struct caller_statistics stats;
-  ipa_hints hints;
-  sreal time, base_time;
-  int size;
 
   init_caller_stats (&stats);
   node->call_for_symbol_thunks_and_aliases (gather_caller_stats, &stats,
  false);
   ipa_call_context ctx = ipa_call_context::for_cloned_node (node, &avals);
-  ctx.estimate_size_and_time (&size, NULL, &time, &base_time, &hints);
+  ctx.estimate_size_and_time ();
 
-  time -= devirt_bonus;
-  time -= hint_time_bonus (node, hints);
-  time -= removable_params_cost;
-  size -= stats.n_calls * removable_params_cost;
+  sreal time = ctx.m_nonspecialized_time - ctx.m_time;
+  time += devirt_bonus;
+  time += hint_time_bonus (node, ctx);
+  time += removable_params_cost;
+  int size = ctx.m_size - stats.n_calls * removable_params_cost;
 
   if (dump_file)
fprintf (dump_file, " - context independent values, size: %i, "
-"time_benefit: %f\n", size, (base_time - time).to_double ());
+ 

Re: [committed] libstdc++: Fix std::gcd and std::lcm for unsigned integers [PR 92978]

2020-08-28 Thread Daniel Krügler via Gcc-patches
On Sat, 29 Aug 2020 at 00:12, Jonathan Wakely via
Libstdc++ wrote:
>
> This fixes a bug with mixed signed and unsigned types, where converting
> a negative value to the unsigned result type alters the value. The
> solution is to obtain the absolute values of the arguments immediately
> and to perform the actual GCD or LCM algorithm on two arguments of the
> same type.
>
> In order to operate on the most negative number without overflow when
> taking its absolute, use an unsigned type for the result of the abs
> operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
> is (unsigned)INT_MAX+1U which is the correct value.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/92978
> * include/std/numeric (__abs_integral): Replace with ...
> (__detail::__absu): New function template that returns an
> unsigned type, guaranteeing it can represent the most
> negative signed value.
> (__detail::__gcd, __detail::__lcm): Require arguments to
> be unsigned and therefore already non-negative.
> (gcd, lcm): Convert arguments to absolute value as unsigned
> type before calling __detail::__gcd or __detail::__lcm.
> * include/experimental/numeric (gcd, lcm): Likewise.
> * testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
> errors.
> * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
> * testsuite/26_numerics/gcd/92978.cc: New test.
> * testsuite/26_numerics/lcm/92978.cc: New test.
> * testsuite/experimental/numeric/92978.cc: New test.
>
> Tested powerpc64le-linux. Committed to trunk.

Shouldn't the overload of __absu

void __absu(bool) = delete;

still also be a template or is just the diff presentation confusing me?

Thanks,

- Daniel


[committed] libstdc++: Fix std::gcd and std::lcm for unsigned integers [PR 92978]

2020-08-28 Thread Jonathan Wakely via Gcc-patches
This fixes a bug with mixed signed and unsigned types, where converting
a negative value to the unsigned result type alters the value. The
solution is to obtain the absolute values of the arguments immediately
and to perform the actual GCD or LCM algorithm on two arguments of the
same type.

In order to operate on the most negative number without overflow when
taking its absolute, use an unsigned type for the result of the abs
operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
is (unsigned)INT_MAX+1U which is the correct value.
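
A minimal sketch of the idea behind the new helper (my reconstruction;
the committed code in <numeric> differs in detail):

  #include <type_traits>

  // Return |val| as the unsigned type _Up; converting to unsigned
  // *before* negating avoids the signed overflow of -INT_MIN.
  template<typename _Up, typename _Tp>
    constexpr _Up
    __absu(_Tp __val)
    {
      static_assert(std::is_unsigned<_Up>::value,
                    "result type must be unsigned");
      return __val < _Tp(0) ? -static_cast<_Up>(__val)
                            : static_cast<_Up>(__val);
    }
  // __absu<unsigned>(INT_MIN) == (unsigned)INT_MAX + 1U, no overflow.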

libstdc++-v3/ChangeLog:

PR libstdc++/92978
* include/std/numeric (__abs_integral): Replace with ...
(__detail::__absu): New function template that returns an
unsigned type, guaranteeing it can represent the most
negative signed value.
(__detail::__gcd, __detail::__lcm): Require arguments to
be unsigned and therefore already non-negative.
(gcd, lcm): Convert arguments to absolute value as unsigned
type before calling __detail::__gcd or __detail::__lcm.
* include/experimental/numeric (gcd, lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
errors.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/92978.cc: New test.
* testsuite/26_numerics/lcm/92978.cc: New test.
* testsuite/experimental/numeric/92978.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 82db1a42e9254c9009bbf8ac01366da4d1ab6df5
Author: Jonathan Wakely 
Date:   Fri Aug 28 22:45:24 2020

libstdc++: Fix std::gcd and std::lcm for unsigned integers [PR 92978]

This fixes a bug with mixed signed and unsigned types, where converting
a negative value to the unsigned result type alters the value. The
solution is to obtain the absolute values of the arguments immediately
and to perform the actual GCD or LCM algorithm on two arguments of the
same type.

In order to operate on the most negative number without overflow when
taking its absolute, use an unsigned type for the result of the abs
operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
is (unsigned)INT_MAX+1U which is the correct value.

libstdc++-v3/ChangeLog:

PR libstdc++/92978
* include/std/numeric (__abs_integral): Replace with ...
(__detail::__absu): New function template that returns an
unsigned type, guaranteeing it can represent the most
negative signed value.
(__detail::__gcd, __detail::__lcm): Require arguments to
be unsigned and therefore already non-negative.
(gcd, lcm): Convert arguments to absolute value as unsigned
type before calling __detail::__gcd or __detail::__lcm.
* include/experimental/numeric (gcd, lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
errors.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/92978.cc: New test.
* testsuite/26_numerics/lcm/92978.cc: New test.
* testsuite/experimental/numeric/92978.cc: New test.

diff --git a/libstdc++-v3/include/experimental/numeric b/libstdc++-v3/include/experimental/numeric
index d8a904a7057..4154e535a94 100644
--- a/libstdc++-v3/include/experimental/numeric
+++ b/libstdc++-v3/include/experimental/numeric
@@ -54,15 +54,19 @@ inline namespace fundamentals_v2
   /// Greatest common divisor
   template<typename _Mn, typename _Nn>
 constexpr common_type_t<_Mn, _Nn>
-gcd(_Mn __m, _Nn __n)
+gcd(_Mn __m, _Nn __n) noexcept
 {
-  static_assert(is_integral_v<_Mn>, "gcd arguments are integers");
-  static_assert(is_integral_v<_Nn>, "gcd arguments are integers");
-  static_assert(!is_same_v<remove_cv_t<_Mn>, bool>,
-   "gcd arguments are not bools");
-  static_assert(!is_same_v<remove_cv_t<_Nn>, bool>,
-   "gcd arguments are not bools");
-  return std::__detail::__gcd(__m, __n);
+  static_assert(is_integral_v<_Mn>,
+ "std::experimental::gcd arguments must be integers");
+  static_assert(is_integral_v<_Nn>,
+ "std::experimental::gcd arguments must be integers");
+  static_assert(_Mn(2) != _Mn(1),
+ "std::experimental::gcd arguments must not be bool");
+  static_assert(_Nn(2) != _Nn(1),
+ "std::experimental::gcd arguments must not be bool");
+  using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>;
+  return std::__detail::__gcd(std::__detail::__absu<_Up>(__m),
+ std::__detail::__absu<_Up>(__n));
 }
 
   /// Least common multiple
@@ -70,13 +74,17 @@ inline namespace fundamentals_v2
 constexpr common_type_t<_Mn, _Nn>
 lcm(_Mn __m, _Nn __n)
 {
-  static_assert(is_integral_v<_Mn>, "lcm arguments are integers");
-  static_assert(is_integral_v<_Nn>, "lcm arguments are integers");
-  

Re: [PATCH] c: Silently ignore pragma region [PR85487]

2020-08-28 Thread Austin Morton via Gcc-patches
> The patch misses documentation of the pragma.

This was an intentional omission - when looking for documentation of
the pragma in clang I found none.

If we do want to document the pragmas in GCC:
 - what section of the documentation would that go in?
- gcc/doc/cpp.texi, section 7, "Pragmas"
- this section states: "This manual documents the pragmas
which are meaningful to the preprocessor itself. Other pragmas are
meaningful to the C or C++ compilers. They are documented in the GCC
manual."
- these pragmas aren't exactly meaningful to the preprocessor
_or_ the C/C++ compiler, so it's not exactly clear where they fit.
- gcc/doc/extend.texi, section 6.62, "Pragmas Accepted by GCC"
 - what exactly would it say?
- I guess just mention the existence and that they won't trigger
unknown pragma warnings? They don't _do_ anything
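
For reference, the pragmas under discussion are pure source-folding
markers for editors and look like this (illustrative snippet, not from
the patch):

  #pragma region Helpers
  static int counter;
  static void reset (void) { counter = 0; }
  #pragma endregion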

Austin


Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-08-28 Thread Richard Sandiford
"duanbo (C)"  writes:
> @@ -4395,6 +4395,40 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>   {
> tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
> pattern_stmt = gimple_build_assign (tmp, rhs1);
> +   tree rhs1_op0 = TREE_OPERAND (rhs1, 0);
> +   tree rhs1_op1 = TREE_OPERAND (rhs1, 1);

I think we also need to handle this case when picking the vector
types and deciding whether a pattern is needed.  Otherwise it would
be possible for rhs1_op1 to be the only input that requires a different
mask, and we'd wrongly avoid creating a conversion for it.

> +   if (rhs1_op0 && rhs1_op1
> +   && (TREE_CODE (TREE_TYPE (rhs1_op0)) == BOOLEAN_TYPE)
> +   && (TREE_CODE (TREE_TYPE (rhs1_op1)) == BOOLEAN_TYPE))

Minor formatting nit, but the brackets around the == are redundant
(but are still kept for multiline comparisons like the == below).

> + {
> +   tree rhs1_op0_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   tree rhs1_op1_type = integer_type_for_mask (rhs1_op1, vinfo);
> +   enum tree_code rhs1_code = gimple_assign_rhs_code (pattern_stmt);
> +   if (rhs1_op0_type && rhs1_op1_type
> +   && (!(TYPE_PRECISION (rhs1_op0_type)
> + == TYPE_PRECISION (rhs1_op1_type))))

More obvious as != rather than !(… == …).

> + {
> +   if (TYPE_PRECISION (rhs1_op0_type)
> +   < TYPE_PRECISION (rhs1_op1_type))
> + {
> +   vectype2
> + = get_mask_type_for_scalar_type (vinfo, rhs1_op0_type);
> +   if (vectype2)
> + rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> +   vectype2, stmt_vinfo);

It isn't safe to ignore null returns like this.  We should bail out
instead.

> + }
> +   else
> + {
> +   vectype2
> + = get_mask_type_for_scalar_type (vinfo, rhs1_op1_type);
> +   if (vectype2)
> + rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> +   vectype2, stmt_vinfo);
> + }

This seems to go for the narrower of the two precisions.  Are you sure
that's always the optimal choice?  Gut instinct says that we should
move in the direction of vectype1, since that's what the result will
be converted to.  E.g. maybe:

- if both precisions are less than vectype1's precision or both are
  more than vectype1's precision, pick the closest to vectype1.

- otherwise, convert both inputs to vectype1 individually, rather than
  converting the result to vectype1.

I admit I haven't thought about it much though.

> +   pattern_stmt = gimple_build_assign (tmp, rhs1_code,
> +   rhs1_op0, rhs1_op1);
> + }

This statement is redundant with the one created above.  I think instead
we should create the statement this way in all cases, even if rhs1_op0
and rhs1_op1 don't change.  That's effectively what the:

  gimple_build_assign (tmp, rhs1)

does anyway.

I'm going to be away next week so my next reply might be even slower
than usual, sorry.  Would be happy for someone else to review in the
meantime though. :-)

Thanks,
Richard


Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Richard Earnshaw
On 28/08/2020 16:36, Christophe Lyon via Gcc-patches wrote:
> On Fri, 28 Aug 2020 at 16:27, Richard Earnshaw
>  wrote:
>>
>> On 28/08/2020 14:24, Christophe Lyon via Gcc-patches wrote:
>>> On Fri, 28 Aug 2020 at 14:00, Richard Earnshaw
>>>  wrote:

 On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
> In comment 14 from PR94538, it was suggested to switch off jump tables
> on thumb-1 cores when using -mpure-code, like we already do for thumb-2.
>
> This is what this patch does, and also restores the previous value of
> CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
> this.
>
> It also adds a new test, since the existing no-casesi.c did not catch
> this problem.
>
> Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
> -mfloat-abi=soft, no regression and the new test passes (and fails
> without the fix).
>
> 2020-08-27  Christophe Lyon  
>
>   gcc/
>   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
>   -mpure-code.
>   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.
>
>   gcc/testsuite/
>   * gcc.target/arm/pure-code/pr96768.c: New test.
> ---
>  gcc/config/arm/arm.h |  5 ++---
>  gcc/config/arm/thumb1.md |  2 +-
>  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 +++++++++++++++++++++
>  3 files changed, 24 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 3887c51..7d43721 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
> for the index in the tablejump instruction.  */
>  #define CASE_VECTOR_MODE Pmode
>
> -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2			\
> +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2			\
> || (TARGET_THUMB1 \
> -   && (optimize_size || flag_pic)))  \
> -  && (!target_pure_code))
> +   && (optimize_size || flag_pic)))
>
>
>  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index f0129db..602039e 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
>  (define_expand "tablejump"
>[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
> (use (label_ref (match_operand 1 "" "")))])]
> -  "TARGET_THUMB1"
> +  "TARGET_THUMB1 && !target_pure_code"
>"
>if (flag_pic)
>  {
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> new file mode 100644
> index 000..fd4cad5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mpure-code" } */
> +
> +int f2 (int x, int y)
> +{
> +  switch (x)
> +{
> +case 0: return y + 0;
> +case 1: return y + 1;
> +case 2: return y + 2;
> +case 3: return y + 3;
> +case 4: return y + 4;
> +case 5: return y + 5;
> +}
> +  return y;
> +}
> +
> +/* We do not want any load from literal pool, but accept loads from r7
> +   (frame pointer, used at -O0).  */
> +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } */
> +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
>

 Having switch tables is only a problem if they are embedded in the .text
 segment.  But the case you show in the PR has them in the .rodata
 segment, so is this really necessary?

>>>
>>> Well, in the original PR94538, comment #14, Wilco said this was the best 
>>> option.
>>>
>>
>> If the choice were not really a choice (ie pure code could not use
>> switch tables and still be pure) yes, it would be the case.  But that
>> isn't the case.
> 
> OK thanks, that was my initial impression.
> 
> So it seems this PR is actually not-a-bug/invalid?
> 
> 
>>
>> GCC already knows generally when using jump sequences is going to be
>> better than switch tables and when tables are likely to be better.  It
>> can even produce a meld of the two when appropriate, it can even take
>> into account whether or not we are optimizing for size.  So we shouldn't
>> just ride rough-shod over those choices.  Pure code doesn't really
>> change the balance.
> 
> 
> ... or 

Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 27, 2020 at 8:24 PM Jakub Jelinek  wrote:
>
> On Thu, Jul 09, 2020 at 04:33:46PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +static void
> > +replace_constant_pool_with_broadcast (rtx_insn* insn)
> > +{
> > +  subrtx_ptr_iterator::array_type array;
> > +  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
> > +{
> > +  rtx *loc = *iter;
> > +  rtx x = *loc;
> > +  rtx broadcast_mem, vec_dup, constant, first;
> > +  machine_mode mode;
> > +  if (GET_CODE (x) != MEM
>
> MEM_P
>
> > +   || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
>
> SYMBOL_REF_P
>
> > +   || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
> > + continue;
> > +
> > +  mode = GET_MODE (x);
> > +  if (!VECTOR_MODE_P (mode))
> > + return;
> > +
> > +  constant = get_pool_constant (XEXP (x, 0));
> > +  first = XVECEXP (constant, 0, 0);
>
> Shouldn't this verify first that GET_CODE (constant) == CONST_VECTOR
> and punt otherwise?
>
> > +  broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
> > +  vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
> > +  *loc = vec_dup;
> > +  INSN_CODE (insn) = -1;
> > +  /* Revert change if there's no corresponding pattern.  */
> > +  if (recog_memoized (insn) < 0)
> > + {
> > +   *loc = x;
> > +   recog_memoized (insn);
> > + }
>
> The usual way of doing this would be through
>   validate_change (insn, loc, vec_dup, 0);
>
> Also, isn't the pass also useful for TARGET_AVX and above (but in that case
> only if it is a simple memory load)?  Or are avx/avx2 broadcast slower than
> full vector loads?
>

Yes, broadcast is inefficient; refer to [1].
[1]. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767#c5

Since pass_combine won't combine a load instruction into a memory operand,
i.e. it wouldn't combine *mov mem xmm* and *vaddps xmm, xmm, xmm* into
*vaddps mem, xmm, xmm*, it just leaves this to ira, where broadcast under
avx512f isn't excluded.

Maybe the patch should be rewritten as a post-reload splitter; then
there would be no concern about an extra pass, but it would still have
the latter issue you mentioned.

> As Jeff wrote, I wonder if when successfully replacing those pool constants
> the old constant pool entries will be omitted.
>
> Another thing I wonder about is whether more analysis shouldn't be used.
> E.g. if the constant pool entry is already emitted into .rodata anyway
> (e.g. some earlier function needed it), using the broadcast will mean

Yes, some later function may need it too, so we need a global view
to decide on the replacement; hopefully it could be done by the generic
constant pool code.

> actually larger .rodata.  If {1to8} and similar is as fast as reading all
> the same elements from memory (or faster), perhaps in that case it should
> broadcast from the first element of the existing constant pool full vector
> rather than creating a new one.
> And similarly, perhaps the function should look at all constant pool entries
> in the current function (not yet emitted into .rodata) and if it would
> succeed for some and not for others, either use broadcast from its first
> element or not perform it for the others too.
>
> Jakub
>


-- 
BR,
Hongtao


[PATCH] use get_size_range to get allocated size (PR 92942)

2020-08-28 Thread Martin Sebor via Gcc-patches

The gimple_call_alloc_size() function, which determines the range
of sizes of allocated objects and constrains the bounds in calls
to functions like memcpy, calls get_range() instead of
get_size_range() to obtain its result.  The latter is the right
function to call because it has the necessary logic to constrain
the range to just the values that are valid for object sizes.
This is especially useful when the range is the result of
a conversion from a signed to a wider unsigned integer where
the upper subrange is excessive and can be eliminated such as in:

  char* f (int n)
  {
if (n > 8)
  n = 8;
char *p = malloc (n);
strcpy (p, "0123456789");   // buffer overflow
...
  }

Attached is a fix that lets -Wstringop-overflow diagnose the buffer
overflow above.  Besides with GCC I have also tested the change by
building Binutils/GDB and Glibc and verifying that it doesn't
introduce any false positives.

Martin
PR middle-end/92942 - missing -Wstringop-overflow for allocations with a negative lower bound size

gcc/ChangeLog:

	PR middle-end/92942
	* builtins.c (gimple_call_alloc_size): Call get_size_range instead
	of get_range.
	* calls.c (get_size_range): Define new overload.  Handle anti-ranges
	whose upper part is within the valid size range.
	* calls.h (get_size_range): Declare new overload.

gcc/testsuite/ChangeLog:

	PR middle-end/92942
	* gcc.dg/Wstringop-overflow-40.c: New test.
	* gcc.dg/Wstringop-overflow-41.c: New test.
	* gcc.dg/attr-alloc_size-10.c: Disable macro tracking.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 8845816aebd..4caa02f9a3b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3884,7 +3884,7 @@ check_access (tree exp, tree, tree, tree dstwrite,
 
 tree
 gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
-			const vr_values *rvals /* = NULL */)
+			const vr_values * /* = NULL */)
 {
   if (!stmt)
 return NULL_TREE;
@@ -3936,7 +3936,7 @@ gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
   if (!rng1)
 rng1 = rng1_buf;
 
-  if (!get_range (size, rng1, rvals))
+  if (!get_size_range (size, rng1))
 return NULL_TREE;
 
   if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
@@ -3946,7 +3946,7 @@ gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
  of the upper bounds as a constant.  Ignore anti-ranges.  */
   tree n = argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : integer_one_node;
   wide_int rng2[2];
-  if (!get_range (n, rng2, rvals))
+  if (!get_size_range (n, rng2))
 return NULL_TREE;
 
   /* Extend to the maximum precision to avoid overflow.  */
diff --git a/gcc/calls.c b/gcc/calls.c
index a11da66492d..6c59b82dd78 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1238,17 +1238,17 @@ alloc_max_size (void)
 }
 
 /* Return true when EXP's range can be determined and set RANGE[] to it
-   after adjusting it if necessary to make EXP a represents a valid size
-   of object, or a valid size argument to an allocation function declared
-   with attribute alloc_size (whose argument may be signed), or to a string
-   manipulation function like memset.  When ALLOW_ZERO is true, allow
-   returning a range of [0, 0] for a size in an anti-range [1, N] where
-   N > PTRDIFF_MAX.  A zero range is a (nearly) invalid argument to
-   allocation functions like malloc but it is a valid argument to
-   functions like memset.  */
+   after adjusting it if necessary to make EXP represent a valid size
+   of an object, or a valid size argument to an allocation function
+   declared with attribute alloc_size (whose argument may be signed),
+   or to a string manipulation function like memset.  When ALLOW_ZERO
+   is true, allow returning a range of [0, 0] for a size in an anti-range
+   [1, N] where N >= PTRDIFF_MAX.  A zero range is a (nearly) invalid
+   argument to allocation functions like malloc but it is a valid argument
+   to functions like memset.  */
 
 bool
-get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+get_size_range (tree exp, wide_int range[2], bool allow_zero /* = false */)
 {
   if (!exp)
 return false;
@@ -1256,7 +1256,7 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
   if (tree_fits_uhwi_p (exp))
 {
   /* EXP is a constant.  */
-  range[0] = range[1] = exp;
+  range[0] = range[1] = wi::to_wide (exp);
   return true;
 }
 
@@ -1277,13 +1277,11 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
 	{
 	  /* Use the full range of the type of the expression when
 	 no value range information is available.  */
-	  range[0] = TYPE_MIN_VALUE (exptype);
-	  range[1] = TYPE_MAX_VALUE (exptype);
+	  range[0] = wi::to_wide (TYPE_MIN_VALUE (exptype));
+	  range[1] = wi::to_wide (TYPE_MAX_VALUE (exptype));
 	  return true;
 	}
 
-  range[0] = NULL_TREE;
-  range[1] = NULL_TREE;
   return false;
 }
 
@@ -1337,14 +1335,46 @@ get_size_range (tree exp, tree range[2], bool allow_zero 

Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 04:40:48PM +0100, Maciej W. Rozycki wrote:
>  As far as `-fexceptions' and `-fasynchronous-unwind-tables' are concerned 
> it aligns with my understanding, i.e. in this specific scenario we need 
> `-fasynchronous-unwind-tables' for libcalls (all of them) so that an 

-fasynchronous-unwind-tables is solely about whether one can unwind through
the code, but not necessarily that EH will work.  If it is not turned on,
non-call exceptions will certainly not work properly, because unwind info
if present at all will only be correct on call insns and not guaranteed on
anything else.

-fexceptions enables EH handling in whatever function it is turned on,
entries in .gcc_except_table as well as a personality routine will be added
for it, calls from it which can throw (including special calls like throw)
will be marked there and cause EH regions, similarly EH pads will be added
for anything where an exception needs to be caught or where a cleanup needs
to be run before continuing with the unwinding (e.g. destructors or cleanup
attribute).  But still, it only expects that call can throw, not arbitrary
instructions, so those won't be covered necessarily in EH regions, in the IL
there won't be EH edges etc.

-fnon-call-exceptions is like -fexceptions, except in addition to calls
(that may throw) it considers also possibly trapping instructions as
something that can throw.  So way too many more EH edges, EH regions etc.

Now, I'd say if for some trapping instruction in a function we want to throw
an exception from a signal handler, surely the signal handler needs to be
built with -fexceptions, but as long as the function that contains the
trapping instruction doesn't really need to catch the exception (including
running a cleanup in it and continuing with the exception), i.e. when the
exception is thrown from the signal handler and caught in some caller of the
function with the trapping instruction, I'd say all that needs to be done
is the function with the trapping instruction be compiled with
-fasynchronous-unwind-tables and the callers of it that are still not meant
to catch it should be compiled with -funwind-tables (or
-fasynchronous-unwind-tables) and only the function that should catch it or
run cleanups should be compiled with -fexceptions (no need for
-fnon-call-exceptions in that case, but of course not disallowed).
This is for DWARF EH, I have no idea how ARM, or IA64 EH works, so it might
be different for those.
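
Concretely, the split described above might look like this (file names
and the SIGFPE-to-exception wiring are illustrative, not from this
thread):

  // div.cc: built with -fasynchronous-unwind-tables; every insn, not
  // just calls, gets correct unwind info, and no EH regions are needed.
  int quot (int a, int b) { return a / b; }   // may trap when b == 0

  // mid.cc: built with -funwind-tables (or the above); it is only
  // unwound through and neither catches nor runs cleanups.
  int quot_wrap (int a, int b) { return quot (a, b); }

  // top.cc: built with -fexceptions; it catches the exception that a
  // suitably built signal handler throws for SIGFPE.
  int safe_quot (int a, int b)
  {
    try { return quot_wrap (a, b); }
    catch (...) { return 0; }
  }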

So, if Ada has testsuite coverage for EH throwing on division or modulo by
zero and your patch doesn't break Ada on at least some common DWARF using target
and ARM (let's ignore IA64), I'd say replacing
LIB2_DIVMOD_EXCEPTION_FLAGS := -fexceptions -fnon-call-exceptions
with
LIB2_DIVMOD_EXCEPTION_FLAGS := -fasynchronous-unwind-tables
looks reasonable to me.

Jakub



[PING] [PATCH V2 0/4] Unify C and C++ handling of loops and switches

2020-08-28 Thread Sandra Loosemore
Ping!  Only the fix for the Fortran bootstrap failure in part 3 has been 
reviewed.


-Sandra


https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551927.html

On 8/13/20 10:34 AM, Sandra Loosemore wrote:

This is a revised version of the patch set originally posted
last November:

https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534142.html

In addition to generally updating and rebasing the patches to reflect
other changes on mainline in the meantime, for this version I have
switched to using the C lowering strategy (directly to goto form)
rather than the C++ one (to LOOP_EXPR) because of regressions in the C
optimization tests.  Besides the ones previously noted in the original
patch submission, there were a bunch of new ones since November.  Some
of them were trivial to fix (e.g., flipping branch probabilities to
reflect the different sense of the loop exit condition in the
C++-style output), but I wasn't making much progress on others and
eventually decided to pursue the "plan B" of using the C-style output
everywhere, as discussed here:

https://gcc.gnu.org/pipermail/gcc-patches/2019-December/536536.html

The only regression I ran into with this was a bootstrap failure
building the Fortran front end from a new -Wmaybe-uninitialized error.
This might be a false positive but part 3 of the new series works
around it by adding an assertion to give g++ a hint.  Unfortunately I
had no luck in trying to reduce this to a standalone test case, but I
did observe that the failure went away when I compiled that file with
debugging enabled.  :-S  I could file a PR to look into this further if
the workaround is good enough for now.

-Sandra


Sandra Loosemore (4):
   Move loop and switch tree data structures from cp/ to c-family/.
   Use C-style loop lowering instead of C++-style.
   Work around bootstrap failure in Fortran front end.
   Change C front end to emit structured loop and switch tree nodes.

  gcc/c-family/c-common.c |  24 ++
  gcc/c-family/c-common.def   |  24 ++
  gcc/c-family/c-common.h |  53 +++-
  gcc/c-family/c-dump.c   |  38 +++
  gcc/c-family/c-gimplify.c   | 422 
  gcc/c-family/c-pretty-print.c   |  92 ++-
  gcc/c/c-decl.c  |  18 +-
  gcc/c/c-lang.h  |   3 +-
  gcc/c/c-objc-common.h   |   2 +
  gcc/c/c-parser.c| 125 +-
  gcc/c/c-tree.h  |  21 +-
  gcc/c/c-typeck.c| 227 ++---
  gcc/cp/cp-gimplify.c| 469 +++-
  gcc/cp/cp-objcp-common.c|  13 +-
  gcc/cp/cp-tree.def  |  23 --
  gcc/cp/cp-tree.h|  40 ---
  gcc/cp/cxx-pretty-print.c   |  78 --
  gcc/cp/dump.c   |  31 ---
  gcc/doc/generic.texi|  56 +++--
  gcc/fortran/interface.c |   4 +
  gcc/objc/ChangeLog  |   5 +
  gcc/objc/objc-act.c |   6 +-
  gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
  23 files changed, 938 insertions(+), 848 deletions(-)





[PATCH] sra: Avoid SRAing if there is an out-of-bounds access (PR 96820)

2020-08-28 Thread Martin Jambor
Hi,

the testcase causes an ICE in the SRA verifier on x86_64 when
compiling with -m32 because build_user_friendly_ref_for_offset looks
at an out-of-bounds array_ref within an array_ref which accesses an
offset which does not fit into a signed 32bit integer and turns it
into an array-ref with a negative index.

The best thing is probably to bail out early when encountering an out
of bounds access to a local stack-allocated aggregate (and let the DSE
just delete such statements) which is what the patch does.

I also glanced over the initial candidate vetting routine to make
sure the size would fit into a HWI and noticed that it uses unsigned
variants whereas the rest of SRA operates on signed offsets and
sizes (because get_ref_base_and_extent does) and so changed that for the
sake of consistency.  These ancient checks operate on sizes of types
as opposed to DECLs but I hope that any issues potentially arising
from that are basically hypothetical.

Bootstrapped and tested on x86_64-linux.  OK for master and then for
gcc-10 branch?

Thanks,

Martin


gcc/ChangeLog:

2020-08-28  Martin Jambor  

PR tree-optimization/96820
* tree-sra.c (create_access): Disqualify candidates with accesses
beyond the end of the original aggregate.
(maybe_add_sra_candidate): Check that the candidate type size fits a
signed HWI for the sake of consistency.

gcc/testsuite/ChangeLog:

2020-08-28  Martin Jambor  

PR tree-optimization/96820
* gcc.dg/tree-ssa/pr96820.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr96820.c | 12 
 gcc/tree-sra.c  |  9 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr96820.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96820.c b/gcc/testsuite/gcc.dg/tree-ssa/pr96820.c
new file mode 100644
index 000..f5c2195f310
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96820.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+struct a {
+  int b;
+};
+int main() {
+  struct a d[][6] = {4};
+  struct a e;
+  d[1955249013][1955249013] = e;
+  return e.b;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 754f41302fc..98a6cacbe2a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -941,6 +941,11 @@ create_access (tree expr, gimple *stmt, bool write)
   disqualify_candidate (base, "Encountered an unconstrained access.");
   return NULL;
 }
+  if (offset + size > tree_to_shwi (DECL_SIZE (base)))
+{
+  disqualify_candidate (base, "Encountered an access beyond the base.");
+  return NULL;
+}
 
   access = create_access_1 (base, offset, size);
   access->expr = expr;
@@ -1880,12 +1885,12 @@ maybe_add_sra_candidate (tree var)
   reject (var, "has incomplete type");
   return false;
 }
-  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
+  if (!tree_fits_shwi_p (TYPE_SIZE (type)))
 {
   reject (var, "type size not fixed");
   return false;
 }
-  if (tree_to_uhwi (TYPE_SIZE (type)) == 0)
+  if (tree_to_shwi (TYPE_SIZE (type)) == 0)
 {
   reject (var, "type size is zero");
   return false;
-- 
2.28.0



[PATCH] rs6000: r12 copy cleanup

2020-08-28 Thread Bill Schmidt via Gcc-patches
Remove unnecessary tests before copying function address to r12, as
requested by Segher.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions, committed as obvious.

Thanks,
Bill


2020-08-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000.c (rs6000_call_aix): Remove test for r12.
(rs6000_sibcall_aix): Likewise.
---
 gcc/config/rs6000/rs6000.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 09545278dcf..ca5b71ecdd3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24725,8 +24725,7 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
  /* A function pointer in the ELFv2 ABI is just a plain address, but
 the ABI requires it to be loaded into r12 before the call.  */
  func_addr = gen_rtx_REG (Pmode, 12);
- if (!rtx_equal_p (func_addr, func))
-   emit_move_insn (func_addr, func);
+ emit_move_insn (func_addr, func);
  abi_reg = func_addr;
  /* Indirect calls via CTR are strongly preferred over indirect
 calls via LR, so move the address there.  Needed to mark
@@ -24846,8 +24845,7 @@ rs6000_sibcall_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
   if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
 {
   r12 = gen_rtx_REG (Pmode, 12);
-  if (!rtx_equal_p (r12, func_desc))
-   emit_move_insn (r12, func_desc);
+  emit_move_insn (r12, func_desc);
   func_addr = gen_rtx_REG (Pmode, CTR_REGNO);
   emit_move_insn (func_addr, r12);
 }
-- 
2.17.1



Re: [PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

2020-08-28 Thread Patrick Palka via Gcc-patches
(Removing libstd...@gcc.gnu.org from CC list)

On Fri, 28 Aug 2020, Patrick Palka wrote:
> In resolve_address_of_overloaded_function, currently only the second
> pass over the overload set (which considers just the function templates
> in the overload set) checks constraints and performs return type
> deduction when necessary.  But as the testcases below show, we need to
> do this when considering non-template functions during the first pass,
> too.
> 
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/96647
>   * class.c (resolve_address_of_overloaded_function): Also check
>   constraints and perform return type deduction when considering
>   non-template functions in the overload set.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/96647
>   * g++.dg/cpp0x/auto-96647.C: New test.
>   * g++.dg/cpp2a/concepts-fn6.C: New test.
> ---
>  gcc/cp/class.c| 16 
>  gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
>  gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
>  3 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
> 
> diff --git a/gcc/cp/class.c b/gcc/cp/class.c
> index 3479b8207d2..c15cb04c654 100644
> --- a/gcc/cp/class.c
> +++ b/gcc/cp/class.c
> @@ -8286,6 +8286,22 @@ resolve_address_of_overloaded_function (tree target_type,
>one, or vice versa.  */
> continue;
>  
> + /* Constraints must be satisfied. This is done before
> +return type deduction since that instantiates the
> +function. */
> + if (!constraints_satisfied_p (fn))
> +   continue;
> +
> + if (undeduced_auto_decl (fn))
> +   {
> + /* Force instantiation to do return type deduction.  */
> + ++function_depth;
> + instantiate_decl (fn, /*defer*/false, /*class*/false);
> + --function_depth;
> +
> + require_deduced_type (fn);
> +   }
> +
>   /* In C++17 we need the noexcept-qualifier to compare types.  */
>   if (flag_noexcept_type
>   && !maybe_instantiate_noexcept (fn, complain))
> diff --git a/gcc/testsuite/g++.dg/cpp0x/auto-96647.C b/gcc/testsuite/g++.dg/cpp0x/auto-96647.C
> new file mode 100644
> index 000..314b2a16ac2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/auto-96647.C
> @@ -0,0 +1,10 @@
> +// PR c++/96647
> +// { dg-do compile { target c++11 } }
> +
> +template<class T>
> +struct Base {
> +  auto f(int) { }
> +  auto f(char) { }
> +};
> +
> +void (Base<int>::*ptr)(int) = &Base<int>::f;
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
> new file mode 100644
> index 000..3d7941658d4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
> @@ -0,0 +1,10 @@
> +// PR c++/96647
> +// { dg-do compile { target c++20 } }
> +
> +template<class T>
> +struct Base {
> +  auto f(int) { }
> +  auto f(int) requires T::fail { static_assert(T::fail); }
> +};
> +
> +void (Base<int>::*ptr)(int) = &Base<int>::f;
> -- 
> 2.28.0.358.g20de7e7e4f
> 
> 



[PATCH] c++: Fix resolving the address of overloaded pmf [PR96647]

2020-08-28 Thread Patrick Palka via Gcc-patches
In resolve_address_of_overloaded_function, currently only the second
pass over the overload set (which considers just the function templates
in the overload set) checks constraints and performs return type
deduction when necessary.  But as the testcases below show, we need to
do this when considering non-template functions during the first pass,
too.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

gcc/cp/ChangeLog:

PR c++/96647
* class.c (resolve_address_of_overloaded_function): Also check
constraints and perform return type deduction when considering
non-template functions in the overload set.

gcc/testsuite/ChangeLog:

PR c++/96647
* g++.dg/cpp0x/auto-96647.C: New test.
* g++.dg/cpp2a/concepts-fn6.C: New test.
---
 gcc/cp/class.c| 16 
 gcc/testsuite/g++.dg/cpp0x/auto-96647.C   | 10 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C | 10 ++
 3 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto-96647.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 3479b8207d2..c15cb04c654 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -8286,6 +8286,22 @@ resolve_address_of_overloaded_function (tree target_type,
 one, or vice versa.  */
  continue;
 
+   /* Constraints must be satisfied. This is done before
+  return type deduction since that instantiates the
+  function. */
+   if (!constraints_satisfied_p (fn))
+ continue;
+
+   if (undeduced_auto_decl (fn))
+ {
+   /* Force instantiation to do return type deduction.  */
+   ++function_depth;
+   instantiate_decl (fn, /*defer*/false, /*class*/false);
+   --function_depth;
+
+   require_deduced_type (fn);
+ }
+
/* In C++17 we need the noexcept-qualifier to compare types.  */
if (flag_noexcept_type
&& !maybe_instantiate_noexcept (fn, complain))
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto-96647.C b/gcc/testsuite/g++.dg/cpp0x/auto-96647.C
new file mode 100644
index 000..314b2a16ac2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto-96647.C
@@ -0,0 +1,10 @@
+// PR c++/96647
+// { dg-do compile { target c++11 } }
+
+template<class T>
+struct Base {
+  auto f(int) { }
+  auto f(char) { }
+};
+
+void (Base<int>::*ptr)(int) = &Base<int>::f;
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C b/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
new file mode 100644
index 000..3d7941658d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn6.C
@@ -0,0 +1,10 @@
+// PR c++/96647
+// { dg-do compile { target c++20 } }
+
+template<class T>
+struct Base {
+  auto f(int) { }
+  auto f(int) requires T::fail { static_assert(T::fail); }
+};
+
+void (Base<int>::*ptr)(int) = &Base<int>::f;
-- 
2.28.0.358.g20de7e7e4f



Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 05:07:11PM +0100, Richard Sandiford wrote:
> Jakub Jelinek via Gcc-patches  writes:
> > +  auto_vec<target_unit, 128> buffer;
> > +  auto_vec<constant_descriptor_rtx_data *> vec;
> > +  object_allocator<constant_descriptor_rtx_data>
> > +data_pool ("constant_descriptor_rtx_data_pool");
> > +  int idx = 0;
> > +  size_t size = 0;
> > +  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
> > +if (desc->mark > 0
> > +   && ! (SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
> > + && SYMBOL_REF_BLOCK (desc->sym)))
> > +  {
> > +   buffer.truncate (0);
> 
> 128 isn't big enough for all targets (e.g. aarch64 with
> -msve-vector-bits=2048), so I think we still need a reserve
> call here.

You're right, thanks for spotting it; I missed that native_encode_rtx will do
quick_push rather than safe_push.

Updated patch below; the reserve shouldn't be needed in the second loop,
because the first loop should already grow the buffer to the largest size.

2020-08-28  Jakub Jelinek  

PR middle-end/54201
* varasm.c: Include alloc-pool.h.
(output_constant_pool_contents): Emit desc->mark < 0 entries as
aliases.
(struct constant_descriptor_rtx_data): New type.
(constant_descriptor_rtx_data_cmp): New function.
(struct const_rtx_data_hasher): New type.
(const_rtx_data_hasher::hash, const_rtx_data_hasher::equal): New
methods.
(optimize_constant_pool): New function.
(output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.

--- gcc/varasm.c.jj 2020-07-28 15:39:10.091755086 +0200
+++ gcc/varasm.c2020-08-28 18:21:58.943759578 +0200
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 #include "asan.h"
 #include "rtl-iter.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "alloc-pool.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data declarations.  */
@@ -4198,7 +4199,27 @@ output_constant_pool_contents (struct rt
   class constant_descriptor_rtx *desc;
 
   for (desc = pool->first; desc ; desc = desc->next)
-if (desc->mark)
+if (desc->mark < 0)
+  {
+#ifdef ASM_OUTPUT_DEF
+   const char *name = targetm.strip_name_encoding (XSTR (desc->sym, 0));
+   char label[256];
+   char buffer[256 + 32];
+   const char *p;
+
+   ASM_GENERATE_INTERNAL_LABEL (label, "LC", ~desc->mark);
+   p = targetm.strip_name_encoding (label);
+   if (desc->offset)
+ {
+   sprintf (buffer, "%s+%ld", p, desc->offset);
+   p = buffer;
+ }
+   ASM_OUTPUT_DEF (asm_out_file, name, p);
+#else
+   gcc_unreachable ();
+#endif
+  }
+else if (desc->mark)
   {
/* If the constant is part of an object_block, make sure that
   the constant has been positioned within its block, but do not
@@ -4216,6 +4237,160 @@ output_constant_pool_contents (struct rt
   }
 }
 
+struct constant_descriptor_rtx_data {
+  constant_descriptor_rtx *desc;
+  target_unit *bytes;
+  unsigned short size;
+  unsigned short offset;
+  unsigned int hash;
+};
+
+/* qsort callback to sort constant_descriptor_rtx_data * vector by
+   decreasing size.  */
+
+static int
+constant_descriptor_rtx_data_cmp (const void *p1, const void *p2)
+{
+  constant_descriptor_rtx_data *const data1
+= *(constant_descriptor_rtx_data * const *) p1;
+  constant_descriptor_rtx_data *const data2
+= *(constant_descriptor_rtx_data * const *) p2;
+  if (data1->size > data2->size)
+return -1;
+  if (data1->size < data2->size)
+return 1;
+  if (data1->hash < data2->hash)
+return -1;
+  gcc_assert (data1->hash > data2->hash);
+  return 1;
+}
+
+struct const_rtx_data_hasher : nofree_ptr_hash <constant_descriptor_rtx_data>
+{
+  static hashval_t hash (constant_descriptor_rtx_data *);
+  static bool equal (constant_descriptor_rtx_data *,
+constant_descriptor_rtx_data *);
+};
+
+/* Hash and compare functions for const_rtx_data_htab.  */
+
+hashval_t
+const_rtx_data_hasher::hash (constant_descriptor_rtx_data *data)
+{
+  return data->hash;
+}
+
+bool
+const_rtx_data_hasher::equal (constant_descriptor_rtx_data *x,
+ constant_descriptor_rtx_data *y)
+{
+  if (x->hash != y->hash || x->size != y->size)
+return 0;
+  unsigned int align1 = x->desc->align;
+  unsigned int align2 = y->desc->align;
+  unsigned int offset1 = (x->offset * BITS_PER_UNIT) & (align1 - 1);
+  unsigned int offset2 = (y->offset * BITS_PER_UNIT) & (align2 - 1);
+  if (offset1)
+align1 = least_bit_hwi (offset1);
+  if (offset2)
+align2 = least_bit_hwi (offset2);
+  if (align2 > align1)
+return 0;
+  if (memcmp (x->bytes, y->bytes, x->size * sizeof (target_unit)) != 0)
+return 0;
+  return 1;
+}
+
+/* Attempt to optimize constant pool POOL.  If it contains both CONST_VECTOR
+   constants and scalar constants with the values of CONST_VECTOR elements,
+   try to alias the scalar constants with the CONST_VECTOR elements.  */
+
+static void
+optimize_constant_pool (struct 

Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Richard Sandiford
Thanks for doing this.  I don't feel qualified to review the full
patch, but one thing:

Jakub Jelinek via Gcc-patches  writes:
> +  auto_vec<target_unit, 128> buffer;
> +  auto_vec<constant_descriptor_rtx_data *> vec;
> +  object_allocator<constant_descriptor_rtx_data>
> +    data_pool ("constant_descriptor_rtx_data_pool");
> +  int idx = 0;
> +  size_t size = 0;
> +  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
> +if (desc->mark > 0
> + && ! (SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
> +   && SYMBOL_REF_BLOCK (desc->sym)))
> +  {
> + buffer.truncate (0);

128 isn't big enough for all targets (e.g. aarch64 with
-msve-vector-bits=2048), so I think we still need a reserve
call here.

Thanks,
Richard

> + if (native_encode_rtx (desc->mode, desc->constant, buffer, 0,
> +GET_MODE_SIZE (desc->mode)))
> +   {
> + constant_descriptor_rtx_data *data = data_pool.allocate ();
> + data->desc = desc;
> + data->bytes = NULL;
> + data->size = GET_MODE_SIZE (desc->mode);
> + data->offset = 0;
> + data->hash = idx++;
> + size += data->size;
> + vec.safe_push (data);
> +   }
> +  }
> +  if (idx)
> +{
> +  vec.qsort (constant_descriptor_rtx_data_cmp);
> +  unsigned min_size = vec.last ()->size;
> +  target_unit *bytes = XNEWVEC (target_unit, size);
> +  unsigned int i;
> +  constant_descriptor_rtx_data *data;
> +  hash_table<const_rtx_data_hasher> *htab
> +    = new hash_table<const_rtx_data_hasher> (31);
> +  size = 0;
> +  FOR_EACH_VEC_ELT (vec, i, data)
> + {
> +   buffer.truncate (0);
> +   native_encode_rtx (data->desc->mode, data->desc->constant,
> +  buffer, 0, data->size);
> +   memcpy (bytes + size, buffer.address (), data->size);
> +   data->bytes = bytes + size;
> +   data->hash = iterative_hash (data->bytes,
> +data->size * sizeof (target_unit), 0);
> +   size += data->size;
> +   constant_descriptor_rtx_data **slot
> + = htab->find_slot_with_hash (data, data->hash, INSERT);
> +   if (*slot)
> + {
> +   data->desc->mark = ~(*slot)->desc->labelno;
> +   data->desc->offset = (*slot)->offset;
> + }
> +   else
> + {
> +   unsigned int sz = 1 << floor_log2 (data->size);
> +
> +   *slot = data;
> +   for (sz >>= 1; sz >= min_size; sz >>= 1)
> + for (unsigned off = 0; off + sz <= data->size; off += sz)
> +   {
> + constant_descriptor_rtx_data tmp;
> + tmp.desc = data->desc;
> + tmp.bytes = data->bytes + off;
> + tmp.size = sz;
> + tmp.offset = off;
> + tmp.hash = iterative_hash (tmp.bytes,
> +sz * sizeof (target_unit), 0);
> + slot = htab->find_slot_with_hash (, tmp.hash, INSERT);
> + if (*slot == NULL)
> +   {
> + *slot = data_pool.allocate ();
> + **slot = tmp;
> +   }
> +   }
> + }
> + }
> +  delete htab;
> +  XDELETE (bytes);
> +}
> +  data_pool.release ();
> +}
> +
>  /* Mark all constants that are used in the current function, then write
> out the function's private constant pool.  */
>  
> @@ -4251,6 +4425,10 @@ output_constant_pool (const char *fnname
>  void
>  output_shared_constant_pool (void)
>  {
> +  if (optimize
> +  && TARGET_SUPPORTS_ALIASES)
> +optimize_constant_pool (shared_constant_pool);
> +
>output_constant_pool_contents (shared_constant_pool);
>  }
>  
>
>
>   Jakub


Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-28 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 26, 2020 at 12:08 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Aug 25, 2020 at 6:32 PM Maciej W. Rozycki  wrote:
> >
> > Hi Kito,
> >
> > > I just found the mail thread about div mod with -fnon-call-exceptions,
> > > I think keeping the default LIB2_DIVMOD_EXCEPTION_FLAGS unchanged
> > > should be the best way to go.
> > >
> > > Non-call exceptions and libcalls
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-06/msg01108.html
> > >
> > > Non-call exceptions and libcalls Part 2
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-07/msg00402.html
> >
> >  Thank you for your input.  I believe I had a look at these commits before
> > I posted my original proposal.  Please note however that they both predate
> > the addition of `-fasynchronous-unwind-tables', so clearly the option
> > could not have been considered at the time the changes were accepted into
> > GCC.
> >
> >  Please note that, as observed by Andreas and Richard, in no case do we
> >  in no case we
> > want to have full exception handling here, so we clearly need no
> > `-fexceptions'; this libcall code won't itself ever call `throw'.
> >
> >  Now it might be a bit unclear from documentation as to whether we want
> > `-fnon-call-exceptions' or `-fasynchronous-unwind-tables', as it says that
> > the former option makes GCC:
> >
> > "Generate code that allows trapping instructions to throw
> >  exceptions.  Note that this requires platform-specific runtime
> >  support that does not exist everywhere.  Moreover, it only allows
> >  _trapping_ instructions to throw exceptions, i.e. memory references
> >  or floating-point instructions.  It does not allow exceptions to be
> >  thrown from arbitrary signal handlers such as 'SIGALRM'."
> >
> > Note the observation that arbitrary signal handlers (invoked at more inner
> > a frame level, and necessarily built with `-fexceptions') are still not
> > allowed to throw exceptions.  For that, as far as I understand it, you
> > actually need `-fasynchronous-unwind-tables', which makes GCC:
> >
> > "Generate unwind table in DWARF format, if supported by target
> >  machine.  The table is exact at each instruction boundary, so it
> >  can be used for stack unwinding from asynchronous events (such as
> >  debugger or garbage collector)."
> >
> > and therefore allows arbitrary signal handlers to throw exceptions,
> > effectively making the option a superset of `-fexceptions'.  As libcall
> > code can generally be implicitly invoked everywhere, we want people not to
> > be restrained by it and let an exception thrown by e.g. a user-supplied
> > SIGALRM handler propagate through the relevant libcall's stack frame,
> > rather than just those exceptions the libcall itself might indirectly
> > cause.
> >
> >  Maybe I am missing something here, especially as `-fexceptions' mentions
> > code generation, while `-fasynchronous-unwind-tables' only refers to
> > unwind table generation, but then what would be the option to allow
> > exceptions to be thrown from arbitrary signal handlers rather than those
> > for memory references or floating-point instructions (where by a special
> > provision integer division falls as well)?
> >
> >  My understanding has been it is `-fasynchronous-unwind-tables', but I'll
> > be gladly straightened out otherwise.  If I am indeed right, then perhaps
> > the documentation could be clarified and expanded a bit.
> >
> >  Barring evidence to the contrary I maintain the change I have proposed is
> > correct, and not only removes the RISC-V `ld.so' build issue, but it fixes
> > the handling of asynchronous events arriving in the middle of the relevant
> > libcalls for all platforms as well.
> >
> >  Please let me know if you have any further questions, comments or
> > concerns.
>
> You only need -fexceptions for that, then you can throw; from a signal handler
> for example.  If you want to be able to catch the exception somewhere up
> the call chain all intermediate code needs to be compiled so that unwinding
> from asynchronous events is possible - -fasynchronous-unwind-tables.
>
> So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
> is about throw/catch.  Clearly libgcc does neither throw nor catch but with
> async events we might need to unwind from inside it.
>
> Now I don't know about the arm situation but if arm cannot do async unwinding
> then even -fexceptions won't help it here - libgcc still does not throw.

On Arm as in the AArch32 port,  async unwinding will not work as those
can't be expressed in the EH format tables.

regards
Ramana

>
> Richard.
>
> >
> >   Maciej


[PING 2][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-08-28 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551783.html

On 8/19/20 9:00 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551783.html

On 8/11/20 10:19 AM, Martin Sebor wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.
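
(To make the distinction concrete, a minimal sketch of ours - not from the
patch.  An array index counts elements while a pointer offset counts
bytes, so scaling both by the element size mis-sizes the remaining space
whenever the element size is greater than one byte.)

  #include <new>

  struct S { int i[4]; };          // element size > 1 byte
  S buf[4];

  void *by_index ()                // ARRAY_REF: index 1 means one element
  { return new (&buf[1]) S; }

  void *by_offset ()               // POINTER_PLUS: offset already in bytes
  { return new ((char *) buf + sizeof (S)) S; }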

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.

Tested on x86_64-linux plus by building the latest Glibc and
confirming no new warnings.

Martin






Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-28 Thread Maciej W. Rozycki via Gcc-patches
On Wed, 26 Aug 2020, Jakub Jelinek wrote:

> On Wed, Aug 26, 2020 at 01:08:00PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > You only need -fexceptions for that, then you can throw; from a signal 
> > handler
> > for example.  If you want to be able to catch the exception somewhere up
> > the call chain all intermediate code needs to be compiled so that unwinding
> > from asynchronous events is possible - -fasynchronous-unwind-tables.
> > 
> > So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
> > is about throw/catch.  Clearly libgcc does neither throw nor catch but with
> > async events we might need to unwind from inside it.

 Thank you for the clarification.

 As far as `-fexceptions' and `-fasynchronous-unwind-tables' are concerned 
it aligns with my understanding, i.e. in this specific scenario we need 
`-fasynchronous-unwind-tables' for libcalls (all of them) so that an 
exception thrown from a signal handler can unwind through our code, and it 
is the signal handler that needs `-fexceptions' to be able to throw an 
exception, and then another piece of code upwards the call chain also 
needs `-fexceptions' to catch it.
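
(A self-contained sketch of that division of labour - ours, not from the
thread.  It typically behaves as described on x86_64 GNU/Linux; treat it
as an illustration rather than a portability guarantee.)

  // middle.cc - stands in for the libcall; build with
  //   g++ -c -O2 -fno-exceptions -fasynchronous-unwind-tables middle.cc
  // It neither throws nor catches; it only has to be unwindable.
  extern "C" void deliver (void);
  extern "C" long middle (long x, long y)
  {
    deliver ();                 // the async event arrives "inside the libcall"
    return x / y;
  }

  // main.cc - build with:  g++ -O2 main.cc middle.o
  #include <csignal>
  #include <cstdio>
  extern "C" long middle (long, long);
  extern "C" void deliver (void) { std::raise (SIGALRM); }
  static void on_alarm (int) { throw 42; }  // the thrower needs -fexceptions
                                            // (on by default for g++)
  int main ()
  {
    std::signal (SIGALRM, on_alarm);
    try { middle (10, 2); }
    catch (int v) { std::printf ("caught %d\n", v); return 0; }
    return 1;
  }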

 I'm still unclear about `-fnon-call-exceptions', as what you write,
combined with the documentation for said option, seems to imply that it
causes a signal handler to be installed for SIGBUS/SIGSEGV/SIGFPE; however,
I seem unable to locate such code except in the Ada frontend.

 The option does cause certain instruction classes, as per `may_trap_p_1', 
to be annotated for exception handling and prevent them from being 
optimised away, but it is not clear to me what the difference is between 
the case where a piece of code has been built with `-fnon-call-exceptions' 
and `-fasynchronous-unwind-tables', and an instruction within (that hasn't 
been optimised away) traps into an exception handler that throws an 
exception.  I.e. what the extra annotation is used for (obviously being 
optimised away or not is a significant difference).

 IIUC in both cases the exact causing instruction can be identified, 
especially as I note that internally `-fnon-call-exceptions' implies 
`-fasynchronous-unwind-tables':

  if (flag_non_call_exceptions)
flag_asynchronous_unwind_tables = 1;

(which I suspect might be worth documenting).  

> In C code -f{,non-call-}exceptions is also about whether cleanup attribute
> will work or not.  But I think in libgcc we don't really use it, especially
> not in the division/modulo code, not even indirectly from libc headers like
> pthread_cleanup_* macros.

 Thank you for your observation, I didn't know of the cleanup attribute 
before.
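
(For reference, a minimal self-contained example of the attribute - ours,
not from the thread.  The cleanup function runs when the variable goes out
of scope; with exceptions enabled it also runs if the frame is unwound.)

  #include <cstdio>

  static void out (int *p) { std::printf ("cleanup: %d\n", *p); }

  int main ()
  {
    int x __attribute__ ((cleanup (out))) = 42;
    return 0;                    // prints "cleanup: 42" on the way out
  }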

 So what shall we do about my patch?

  Maciej


[PATCH] rs6000, vec_popcntd is improperly defined in altivec.h

2020-08-28 Thread Carl Love via Gcc-patches
GCC maintainers:

The defines for vec_popcntb, vec_popcnth, vec_popcntw, and vec_popcntd in
gcc/config/rs6000/altivec.h are not listed in the Power 64-Bit ELF V2
ABI specification revision 1.4, May 10, 2017.  They are not used by any
of the regression tests.  They also do not work as reported in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

The following patch removes the unsupported defines for the builtin
functions.
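
(For reference, the overloaded form that remains - a sketch of ours, not
part of the patch; it assumes a POWER8-or-later target, e.g.
-mcpu=power8 -maltivec.)

  #include <altivec.h>

  vector unsigned int
  count_bits (vector unsigned int v)
  {
    return vec_popcnt (v);       /* still defined in altivec.h */
  }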

The patch has been tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

Please let me know if this patch is acceptable for mainline.

 Carl Love

--
 vec_popcntd is improperly defined in altivec.h

gcc/ChangeLog

2020-08-27  Carl Love  

PR target/85830
* config/rs6000/altivec.h (vec_popcntub, vec_popcntuh, vec_popcntuw,
vec_popcntud): Remove defines.
---
 gcc/config/rs6000/altivec.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index bf2240f16a2..8a2dcda0144 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -407,10 +407,6 @@
 #define vec_vpopcnth __builtin_vec_vpopcnth
 #define vec_vpopcntw __builtin_vec_vpopcntw
 #define vec_popcnt __builtin_vec_vpopcntu
-#define vec_popcntb __builtin_vec_vpopcntub
-#define vec_popcnth __builtin_vec_vpopcntuh
-#define vec_popcntw __builtin_vec_vpopcntuw
-#define vec_popcntd __builtin_vec_vpopcntud
 #define vec_vrld __builtin_vec_vrld
 #define vec_vsld __builtin_vec_vsld
 #define vec_vsrad __builtin_vec_vsrad
-- 
2.17.1




Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Christophe Lyon via Gcc-patches
On Fri, 28 Aug 2020 at 16:27, Richard Earnshaw
 wrote:
>
> On 28/08/2020 14:24, Christophe Lyon via Gcc-patches wrote:
> > On Fri, 28 Aug 2020 at 14:00, Richard Earnshaw
> >  wrote:
> >>
> >> On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
> >>> In comment 14 from PR94538, it was suggested to switch off jump tables
> >>> on thumb-1 cores when using -mpure-code, like we already do for thumb-2.
> >>>
> >>> This is what this patch does, and also restores the previous value of
> >>> CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
> >>> this.
> >>>
> >>> It also adds a new test, since the existing no-casesi.c did not catch
> >>> this problem.
> >>>
> >>> Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
> >>> -mfloat-abi=soft, no regression and the new test passes (and fails
> >>> without the fix).
> >>>
> >>> 2020-08-27  Christophe Lyon  
> >>>
> >>>   gcc/
> >>>   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
> >>>   -mpure-code.
> >>>   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.
> >>>
> >>>   gcc/testsuite/
> >>>   * gcc.target/arm/pure-code/pr96768.c: New test.
> >>> ---
> >>>  gcc/config/arm/arm.h |  5 ++---
> >>>  gcc/config/arm/thumb1.md |  2 +-
> >>>  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 
> >>> +
> >>>  3 files changed, 24 insertions(+), 4 deletions(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> >>>
> >>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> >>> index 3887c51..7d43721 100644
> >>> --- a/gcc/config/arm/arm.h
> >>> +++ b/gcc/config/arm/arm.h
> >>> @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
> >>> for the index in the tablejump instruction.  */
> >>>  #define CASE_VECTOR_MODE Pmode
> >>>
> >>> -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2  
> >>> \
> >>> +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2   
> >>> \
> >>> || (TARGET_THUMB1 \
> >>> -   && (optimize_size || flag_pic)))  \
> >>> -  && (!target_pure_code))
> >>> +   && (optimize_size || flag_pic)))
> >>>
> >>>
> >>>  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
> >>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> >>> index f0129db..602039e 100644
> >>> --- a/gcc/config/arm/thumb1.md
> >>> +++ b/gcc/config/arm/thumb1.md
> >>> @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
> >>>  (define_expand "tablejump"
> >>>[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
> >>> (use (label_ref (match_operand 1 "" "")))])]
> >>> -  "TARGET_THUMB1"
> >>> +  "TARGET_THUMB1 && !target_pure_code"
> >>>"
> >>>if (flag_pic)
> >>>  {
> >>> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
> >>> b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> >>> new file mode 100644
> >>> index 000..fd4cad5
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> >>> @@ -0,0 +1,21 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-mpure-code" } */
> >>> +
> >>> +int f2 (int x, int y)
> >>> +{
> >>> +  switch (x)
> >>> +{
> >>> +case 0: return y + 0;
> >>> +case 1: return y + 1;
> >>> +case 2: return y + 2;
> >>> +case 3: return y + 3;
> >>> +case 4: return y + 4;
> >>> +case 5: return y + 5;
> >>> +}
> >>> +  return y;
> >>> +}
> >>> +
> >>> +/* We do not want any load from literal pool, but accept loads from r7
> >>> +   (frame pointer, used at -O0).  */
> >>> +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } 
> >>> */
> >>> +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
> >>>
> >>
> >> Having switch tables is only a problem if they are embedded in the .text
> >> segment.  But the case you show in the PR has them in the .rodata
> >> segment, so is this really necessary?
> >>
> >
> > Well, in the original PR94538, comment #14, Wilco said this was the best 
> > option.
> >
>
> If the choice were not really a choice (ie pure code could not use
> switch tables and still be pure) yes, it would be the case.  But that
> isn't the case.

OK thanks, that was my initial impression.

So it seems this PR is actually not-a-bug/invalid?


>
> GCC already knows generally when using jump sequences is going to be
> better than switch tables and when tables are likely to be better.  It
> can even produce a meld of the two when appropriate, it can even take
> into account whether or not we are optimizing for size.  So we shouldn't
> just ride roughshod over those choices.  Pure code doesn't really
> change the balance.


... or should the PR be the other way around and request to improve how such
cases are handled for 

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-28 Thread H.J. Lu via Gcc-patches
On Fri, Aug 28, 2020 at 8:22 AM Qing Zhao  wrote:
>
>
>
> > On Aug 28, 2020, at 2:47 AM, Alexandre Oliva  wrote:
> >
> > On Aug 26, 2020, Qing Zhao  wrote:
> >
> >> There are two issues I can see with adding a default generator in middle 
> >> end:
> >
> >> 1. In order to determine where a target should not use the generic
> >> code to emit the zeroing sequence,
> >> a new target hook to determine this has to be added;
> >
> > Yeah, a target hook whose default is the generic code, and that targets
> > that need it, or that benefit from it, can override.  That's the point
> > of hooks, to enable overriding.  Why should this be an issue?
>
> A default handler will be invoked for all the targets.  So, if the target
> does not provide any target-specific handler to override it, the default
> handler should be correct on this target.
>
> So, the default handler should be correct on all the targets assuming no 
> override happening.
>
> Correct me if I am wrong with the above understanding.
>
> Then, for example, for 32-bit hpux, is a default handler without any
> special target handling correct on it?  My understanding from the previous
> discussion is, we need some special handling on 32-bit hpux to make it
> correct.  So, in order to make the default handler correct on 32-bit hpux,
> we need to add another target hook, for example, targetm.has_return_stubs(),
> to check whether a target has such a feature; then in the default handler we
> can call this new target hook to check, and make sure the default handler is
> correct on 32-bit hpux.
>
> There might be other targets that need other special handling which we
> currently don’t know about yet.  Do we need to identify all those targets
> and all those special features, and then add a new target hook for each
> identified special feature?
>
> Yes, theoretically it’s doable to run testing on all the targets to see
> which targets need special handling and what kind of special handling we
> need; however, is doing this really necessary?
>
>
> >
> >> 2. In order to avoid the generated zeroing insns (which are simply
> >> insns that set registers) being deleted,
> >> We have to define a new insn “pro_epilogue_use” in the target.
> >
> > Why won't a naked USE pattern do?  We already issue those in generic
> > code, for regs holding return values.  If we were to pretend that other
> > registers are also holding zeros as values to be returned, why shouldn't
> > that work for them as well?
>
> From the current implementation based on X86, I see the following comments:
>
> ;; As USE insns aren't meaningful after reload, this is used instead
> ;; to prevent deleting instructions setting registers for PIC code
> (define_insn "pro_epilogue_use"
>   [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
>
> My understanding is, the “USE” will not be useful after reload, so a new
> “pro_epilogue_use” should be added.
>
> HongJiu, could you please provide more information on this?

pro_epilogue_use is needed.  Otherwise, these zeroing instructions
will be removed after reload.

-- 
H.J.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-28 Thread Qing Zhao via Gcc-patches



> On Aug 28, 2020, at 2:47 AM, Alexandre Oliva  wrote:
> 
> On Aug 26, 2020, Qing Zhao  wrote:
> 
>> There are two issues I can see with adding a default generator in middle end:
> 
>> 1. In order to determine where a target should not use the generic
>> code to emit the zeroing sequence,
>> a new target hook to determine this has to be added;
> 
> Yeah, a target hook whose default is the generic code, and that targets
> that need it, or that benefit from it, can override.  That's the point
> of hooks, to enable overriding.  Why should this be an issue?

A default handler will be invoked for all the targets.  So, if the target
does not provide any target-specific handler to override it, the default
handler should be correct on this target.

So, the default handler should be correct on all the targets assuming no 
override happening. 

Correct me if I am wrong with the above understanding.

Then, for example, for 32-bit hpux, is a default handler without any
special target handling correct on it?  My understanding from the previous
discussion is, we need some special handling on 32-bit hpux to make it
correct.  So, in order to make the default handler correct on 32-bit hpux,
we need to add another target hook, for example, targetm.has_return_stubs(),
to check whether a target has such a feature; then in the default handler we
can call this new target hook to check, and make sure the default handler is
correct on 32-bit hpux.

There might be other targets that need other special handling which we
currently don’t know about yet.  Do we need to identify all those targets
and all those special features, and then add a new target hook for each
identified special feature?

Yes, theoretically it’s doable to run testing on all the targets to see
which targets need special handling and what kind of special handling we
need; however, is doing this really necessary?


> 
>> 2. In order to avoid the generated zeroing insns (which are simply
>> insns that set registers) being deleted,
>> We have to define a new insn “pro_epilogue_use” in the target. 
> 
> Why won't a naked USE pattern do?  We already issue those in generic
> code, for regs holding return values.  If we were to pretend that other
> registers are also holding zeros as values to be returned, why shouldn't
> that work for them as well?

From the current implementation based on X86, I see the following comments:

;; As USE insns aren't meaningful after reload, this is used instead
;; to prevent deleting instructions setting registers for PIC code
(define_insn "pro_epilogue_use"
  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]

My understanding is, the “USE” will not be useful after reload, so a new
“pro_epilogue_use” should be added.

HongJiu, could you please provide more information on this?

Thanks.

Qing

> 
> -- 
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer



Re: [committed] doc: Switch valgrind.com to https

2020-08-28 Thread Gerald Pfeifer
On Tue, 25 Aug 2020, Martin Liška wrote:
> I noticed your continual effort to change http:// links to https://.
> I've written a simple script that can do that in automatic way.

Thanks, Martin!  My mission isn't actually changing http:// to https://
where possible, but checking for broken links and redirects offered by
server admins. ;-)

The latter indeed have included a number of those you observed, though
there aren't many (if any) left.

> What do you think about using it?

Are you offering to update those links the script identifies, or is
this for me? :)  If the former, absolutely - even where server admins
have not activated a redirect (yet), if there is proper https available
we can use that.  If the latter, the primary challenge is one of time,
as I have already fallen behind on fixing the really broken links.

Gerald


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 01:06:40PM +0200, Richard Biener via Gcc-patches wrote:
> > I don't see why it would break, it will not optimize { -1LL, -1LL }
> > vs. -1 scalar, sure, but it uses the hash and equality function the
> > rtl constant pool uses, which means it compares both the constants
> > (rtx_equal_p) and mode we have recorded for it.
> 
> Oh, I thought my patch for PR54201 was installed but it was not
> (I grepped my patch folder...).  PR54201 complains about something
> similar but then different ;)  Guess post-processing that would also be
> possible but also a bit awkward.  Maybe we should hash the
> full byte representation instead of the components.

Here is an adjusted patch that ought to merge even the same sized different
mode vectors with the same byte representation, etc.
It won't really help with avoiding the multiple reads of the constant in the
same function, but as you found, your patch doesn't help with that either.
Your patch isn't really incompatible with what the patch below does, though
I wonder a) whether it wouldn't be better to always canonicalize to an
integral mode with as few elts as possible, even e.g. for floats, and
b) whether asserting that simplify_rtx succeeds is safe, or whether it
shouldn't just canonicalize when the canonicalization works and do what it
previously did otherwise.

The following patch puts all pool entries which can be natively encoded
into a vector, sorts it by decreasing size, determines minimum size
of a pool entry and adds hash elts for each (aligned) min_size or wider
power of two-ish portion of the pool constant in addition to the whole pool
constant byte representation.
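
(To make the sub-span idea concrete, a self-contained toy - ours, not GCC
code.  A 16-byte "vector" constant is scanned at aligned power-of-two
sizes; an 8-byte "scalar" constant whose bytes match one of those
sub-spans could then be emitted as an alias into the vector.  The FNV
hash merely stands in for iterative_hash.)

  #include <cstdio>
  #include <cstring>

  static unsigned hash_bytes (const unsigned char *p, std::size_t n)
  {
    unsigned h = 2166136261u;                     // FNV-1a
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
  }

  int main ()
  {
    unsigned char vec16[16] = {1,2,3,4,5,6,7,8, 1,2,3,4,5,6,7,8};
    unsigned char scal8[8]  = {1,2,3,4,5,6,7,8};

    for (std::size_t sz = 16; sz >= 8; sz >>= 1)           // decreasing sizes
      for (std::size_t off = 0; off + sz <= 16; off += sz)  // aligned sub-spans
        if (sz == sizeof scal8
            && std::memcmp (vec16 + off, scal8, sz) == 0)
          std::printf ("scalar matches vector bytes at offset %zu (hash %08x)\n",
                       off, hash_bytes (vec16 + off, sz));
    return 0;
  }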

2020-08-28  Jakub Jelinek  

PR middle-end/54201
* varasm.c: Include alloc-pool.h.
(output_constant_pool_contents): Emit desc->mark < 0 entries as
aliases.
(struct constant_descriptor_rtx_data): New type.
(constant_descriptor_rtx_data_cmp): New function.
(struct const_rtx_data_hasher): New type.
(const_rtx_data_hasher::hash, const_rtx_data_hasher::equal): New
methods.
(optimize_constant_pool): New function.
(output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.

--- gcc/varasm.c.jj 2020-07-28 15:39:10.091755086 +0200
+++ gcc/varasm.c2020-08-28 15:37:30.605076961 +0200
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 #include "asan.h"
 #include "rtl-iter.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "alloc-pool.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data declarations.  */
@@ -4198,7 +4199,27 @@ output_constant_pool_contents (struct rt
   class constant_descriptor_rtx *desc;
 
   for (desc = pool->first; desc ; desc = desc->next)
-if (desc->mark)
+if (desc->mark < 0)
+  {
+#ifdef ASM_OUTPUT_DEF
+   const char *name = targetm.strip_name_encoding (XSTR (desc->sym, 0));
+   char label[256];
+   char buffer[256 + 32];
+   const char *p;
+
+   ASM_GENERATE_INTERNAL_LABEL (label, "LC", ~desc->mark);
+   p = targetm.strip_name_encoding (label);
+   if (desc->offset)
+ {
+   sprintf (buffer, "%s+%ld", p, desc->offset);
+   p = buffer;
+ }
+   ASM_OUTPUT_DEF (asm_out_file, name, p);
+#else
+   gcc_unreachable ();
+#endif
+  }
+else if (desc->mark)
   {
/* If the constant is part of an object_block, make sure that
   the constant has been positioned within its block, but do not
@@ -4216,6 +4237,159 @@ output_constant_pool_contents (struct rt
   }
 }
 
+struct constant_descriptor_rtx_data {
+  constant_descriptor_rtx *desc;
+  target_unit *bytes;
+  unsigned short size;
+  unsigned short offset;
+  unsigned int hash;
+};
+
+/* qsort callback to sort constant_descriptor_rtx_data * vector by
+   decreasing size.  */
+
+static int
+constant_descriptor_rtx_data_cmp (const void *p1, const void *p2)
+{
+  constant_descriptor_rtx_data *const data1
+= *(constant_descriptor_rtx_data * const *) p1;
+  constant_descriptor_rtx_data *const data2
+= *(constant_descriptor_rtx_data * const *) p2;
+  if (data1->size > data2->size)
+return -1;
+  if (data1->size < data2->size)
+return 1;
+  if (data1->hash < data2->hash)
+return -1;
+  gcc_assert (data1->hash > data2->hash);
+  return 1;
+}
+
+struct const_rtx_data_hasher : nofree_ptr_hash <constant_descriptor_rtx_data>
+{
+  static hashval_t hash (constant_descriptor_rtx_data *);
+  static bool equal (constant_descriptor_rtx_data *,
+constant_descriptor_rtx_data *);
+};
+
+/* Hash and compare functions for const_rtx_data_htab.  */
+
+hashval_t
+const_rtx_data_hasher::hash (constant_descriptor_rtx_data *data)
+{
+  return data->hash;
+}
+
+bool
+const_rtx_data_hasher::equal (constant_descriptor_rtx_data *x,
+ constant_descriptor_rtx_data *y)
+{
+  if (x->hash != y->hash || x->size != y->size)
+return 0;
+  

Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Andrew Stubbs

On 28/08/2020 15:26, Martin Liška wrote:

On 8/28/20 4:24 PM, Andrew Stubbs wrote:

Should I just add "true" to fix it? That's enough to make it build.


Yes, please and push it as obvious.


I've committed the attached.

Andrew
amdgcn: Update vec_safe_grow_cleared usage

An API change broke the amdgcn build.

gcc/ChangeLog:

	* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Add "true"
	parameter to vec_safe_grow_cleared.

diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index 4dcb179ba71..a681426139b 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -456,7 +456,7 @@ gcn_goacc_get_worker_red_decl (tree type, unsigned offset)
 
   varpool_node::finalize_decl (decl);
 
-  vec_safe_grow_cleared (machfun->reduc_decls, offset + 1);
+  vec_safe_grow_cleared (machfun->reduc_decls, offset + 1, true);
   (*machfun->reduc_decls)[offset] = decl;
 
   return decl;


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Martin Sebor via Gcc-patches

On 8/28/20 12:49 AM, Martin Liška wrote:

On 8/28/20 1:29 AM, Martin Sebor wrote:

With --enable-valgrind-annotations the change to the member function
signature in this patch triggers compilation errors during bootstrap:


I must confirm I didn't test the configuration. Feel free to install
the patch, it's obvious.


I checked it in r11-2925.

Martin



Thank you,
Martin




Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Richard Earnshaw
On 28/08/2020 15:27, Richard Earnshaw wrote:
> On 28/08/2020 14:24, Christophe Lyon via Gcc-patches wrote:
>> On Fri, 28 Aug 2020 at 14:00, Richard Earnshaw
>>  wrote:
>>>
>>> On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
 In comment 14 from PR94538, it was suggested to switch off jump tables
 on thumb-1 cores when using -mpure-code, like we already do for thumb-2.

 This is what this patch does, and also restores the previous value of
 CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
 this.

 It also adds a new test, since the existing no-casesi.c did not catch
 this problem.

 Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
 -mfloat-abi=soft, no regression and the new test passes (and fails
 without the fix).

 2020-08-27  Christophe Lyon  

   gcc/
   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
   -mpure-code.
   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.

   gcc/testsuite/
   * gcc.target/arm/pure-code/pr96768.c: New test.
 ---
  gcc/config/arm/arm.h |  5 ++---
  gcc/config/arm/thumb1.md |  2 +-
  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 
 +
  3 files changed, 24 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c

 diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
 index 3887c51..7d43721 100644
 --- a/gcc/config/arm/arm.h
 +++ b/gcc/config/arm/arm.h
 @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
 for the index in the tablejump instruction.  */
  #define CASE_VECTOR_MODE Pmode

 -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2   
\
 +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2
\
 || (TARGET_THUMB1 \
 -   && (optimize_size || flag_pic)))  \
 -  && (!target_pure_code))
 +   && (optimize_size || flag_pic)))


  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
 diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
 index f0129db..602039e 100644
 --- a/gcc/config/arm/thumb1.md
 +++ b/gcc/config/arm/thumb1.md
 @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
  (define_expand "tablejump"
[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
 (use (label_ref (match_operand 1 "" "")))])]
 -  "TARGET_THUMB1"
 +  "TARGET_THUMB1 && !target_pure_code"
"
if (flag_pic)
  {
 diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
 b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
 new file mode 100644
 index 000..fd4cad5
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
 @@ -0,0 +1,21 @@
 +/* { dg-do compile } */
 +/* { dg-options "-mpure-code" } */
 +
 +int f2 (int x, int y)
 +{
 +  switch (x)
 +{
 +case 0: return y + 0;
 +case 1: return y + 1;
 +case 2: return y + 2;
 +case 3: return y + 3;
 +case 4: return y + 4;
 +case 5: return y + 5;
 +}
 +  return y;
 +}
 +
 +/* We do not want any load from literal pool, but accept loads from r7
 +   (frame pointer, used at -O0).  */
 +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } 
 */
 +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */

>>>
>>> Having switch tables is only a problem if they are embedded in the .text
>>> segment.  But the case you show in the PR has them in the .rodata
>>> segment, so is this really necessary?
>>>
>>
>> Well, in the original PR94538, comment #14, Wilco said this was the best 
>> option.
>>
> 
> If the choice were not really a choice (ie pure code could not use
> switch tables and still be pure) yes, it would be the case.  But that
> isn't the case.
> 
> GCC already knows generally when using jump sequences is going to be
> better than switch tables and when tables are likely to be better.  It
> can even produce a meld of the two when appropriate, it can even take
> into account whether or not we are optimizing for size.  So we shouldn't
> just ride roughshod over those choices.  Pure code doesn't really
> change the balance.
> 
> R.
> 
>>> R.
> 

Also note that it would be possible to compress the data tables in a
similar way to the aarch64 port; there's no need for the entries to be
32 bits most of the time.  It just needs someone to do the work.

R.


Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Richard Earnshaw
On 28/08/2020 14:24, Christophe Lyon via Gcc-patches wrote:
> On Fri, 28 Aug 2020 at 14:00, Richard Earnshaw
>  wrote:
>>
>> On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
>>> In comment 14 from PR94538, it was suggested to switch off jump tables
>>> on thumb-1 cores when using -mpure-code, like we already do for thumb-2.
>>>
>>> This is what this patch does, and also restores the previous value of
>>> CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
>>> this.
>>>
>>> It also adds a new test, since the existing no-casesi.c did not catch
>>> this problem.
>>>
>>> Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
>>> -mfloat-abi=soft, no regression and the new test passes (and fails
>>> without the fix).
>>>
>>> 2020-08-27  Christophe Lyon  
>>>
>>>   gcc/
>>>   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
>>>   -mpure-code.
>>>   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.
>>>
>>>   gcc/testsuite/
>>>   * gcc.target/arm/pure-code/pr96768.c: New test.
>>> ---
>>>  gcc/config/arm/arm.h |  5 ++---
>>>  gcc/config/arm/thumb1.md |  2 +-
>>>  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 +
>>>  3 files changed, 24 insertions(+), 4 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
>>>
>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>>> index 3887c51..7d43721 100644
>>> --- a/gcc/config/arm/arm.h
>>> +++ b/gcc/config/arm/arm.h
>>> @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
>>> for the index in the tablejump instruction.  */
>>>  #define CASE_VECTOR_MODE Pmode
>>>
>>> -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2
>>>   \
>>> +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2 
>>>   \
>>> || (TARGET_THUMB1 \
>>> -   && (optimize_size || flag_pic)))  \
>>> -  && (!target_pure_code))
>>> +   && (optimize_size || flag_pic)))
>>>
>>>
>>>  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
>>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
>>> index f0129db..602039e 100644
>>> --- a/gcc/config/arm/thumb1.md
>>> +++ b/gcc/config/arm/thumb1.md
>>> @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
>>>  (define_expand "tablejump"
>>>[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
>>> (use (label_ref (match_operand 1 "" "")))])]
>>> -  "TARGET_THUMB1"
>>> +  "TARGET_THUMB1 && !target_pure_code"
>>>"
>>>if (flag_pic)
>>>  {
>>> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
>>> b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
>>> new file mode 100644
>>> index 000..fd4cad5
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
>>> @@ -0,0 +1,21 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mpure-code" } */
>>> +
>>> +int f2 (int x, int y)
>>> +{
>>> +  switch (x)
>>> +{
>>> +case 0: return y + 0;
>>> +case 1: return y + 1;
>>> +case 2: return y + 2;
>>> +case 3: return y + 3;
>>> +case 4: return y + 4;
>>> +case 5: return y + 5;
>>> +}
>>> +  return y;
>>> +}
>>> +
>>> +/* We do not want any load from literal pool, but accept loads from r7
>>> +   (frame pointer, used at -O0).  */
>>> +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } */
>>> +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
>>>
>>
>> Having switch tables is only a problem if they are embedded in the .text
>> segment.  But the case you show in the PR has them in the .rodata
>> segment, so is this really necessary?
>>
> 
> Well, in the original PR94538, comment #14, Wilco said this was the best 
> option.
> 

If the choice were not really a choice (ie pure code could not use
switch tables and still be pure) yes, it would be the case.  But that
isn't the case.

GCC already knows generally when using jump sequences is going to be
better than switch tables and when tables are likely to be better.  It
can even produce a meld of the two when appropriate, it can even take
into account whether or not we are optimizing for size.  So we shouldn't
just ride roughshod over those choices.  Pure code doesn't really
change the balance.

R.

>> R.



Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Martin Liška

On 8/28/20 4:24 PM, Andrew Stubbs wrote:

Should I just add "true" to fix it? That's enough to make it build.


Yes, please and push it as obvious.

Thanks,
Martin


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Andrew Stubbs

On 11/08/2020 12:36, Martin Liška wrote:

Hello.

All right, I did it in 3 steps:
1) new exact argument is added (no default value) - I tested it on
x86_64-linux-gnu and built all cross targets.
2) set default value of exact = false
3) change places which calculate their own growth to use the default argument

I would like to install first 1) and then wait some time before the rest 
is installed.


This broke the amdgcn-amdhsa build.

Should I just add "true" to fix it? That's enough to make it build.

Andrew


[PATCH] VEC_COND_EXPR: fix ICE in gimple_expand_vec_cond_expr

2020-08-28 Thread Martin Liška

Hey.

The patch is about a VEC_COND_EXPR comparison of an SSA_NAME with a constant
that is an artifact of -fno-tree-ccp.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR tree-optimization/96466
* gimple-fold.c (expand_cmp_piecewise): New.
* gimple-fold.h (nunits_for_known_piecewise_op): New.
(expand_cmp_piecewise): Moved from ...
* tree-vect-generic.c (expand_vector_comparison): ... here.
(nunits_for_known_piecewise_op): Moved to gimple-fold.h.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Use
expand_cmp_piecewise fallback for constants.

gcc/testsuite/ChangeLog:

PR tree-optimization/96466
* gcc.dg/vect/pr96466.c: New test.
---
 gcc/gimple-fold.c   | 28 
 gcc/gimple-fold.h   | 14 ++
 gcc/gimple-isel.cc  | 10 ---
 gcc/testsuite/gcc.dg/vect/pr96466.c | 18 +
 gcc/tree-vect-generic.c | 41 ++---
 5 files changed, 69 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr96466.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index dcc1b56a273..86d5d0ed7d8 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -8056,3 +8056,31 @@ gimple_stmt_integer_valued_real_p (gimple *stmt, int 
depth)
   return false;
 }
 }
+
+tree
+expand_cmp_piecewise (gimple_stmt_iterator *gsi, tree type, tree op0, tree op1)
+{
+  tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+  tree part_width = vector_element_bits_tree (TREE_TYPE (op0));
+  tree index = bitsize_int (0);
+  int nunits = nunits_for_known_piecewise_op (TREE_TYPE (op0));
+  int prec = GET_MODE_PRECISION (SCALAR_TYPE_MODE (type));
+  tree ret_type = build_nonstandard_integer_type (prec, 1);
+  tree ret_inner_type = boolean_type_node;
+  int i;
+  tree t = build_zero_cst (ret_type);
+
+  if (TYPE_PRECISION (ret_inner_type) != 1)
+ret_inner_type = build_nonstandard_integer_type (1, 1);
+  for (i = 0; i < nunits;
+   i++, index = int_const_binop (PLUS_EXPR, index, part_width))
+{
+  tree a = tree_vec_extract (gsi, inner_type, op0, part_width, index);
+  tree b = tree_vec_extract (gsi, inner_type, op1, part_width, index);
+  tree result = gimplify_build2 (gsi, NE_EXPR, ret_inner_type, a, b);
+  t = gimplify_build3 (gsi, BIT_INSERT_EXPR, ret_type, t, result,
+  bitsize_int (i));
+}
+
+  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
+}
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 0ed1d1ffe83..7e843b34f53 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -147,6 +147,20 @@ gimple_build_vector (gimple_seq *seq, tree_vector_builder 
*builder)
 extern bool gimple_stmt_nonnegative_warnv_p (gimple *, bool *, int = 0);
 extern bool gimple_stmt_integer_valued_real_p (gimple *, int = 0);
 
+/* Return the number of elements in a vector type TYPE that we have
+   already decided needs to be expanded piecewise.  We don't support
+   this kind of expansion for variable-length vectors, since we should
+   always check for target support before introducing uses of those.  */
+
+static inline unsigned int
+nunits_for_known_piecewise_op (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
+}
+
+extern tree expand_cmp_piecewise (gimple_stmt_iterator *gsi, tree lhs,
+ tree op0, tree op1);
+
 /* In gimple-match.c.  */
 extern tree gimple_simplify (enum tree_code, tree, tree,
 gimple_seq *, tree (*)(tree));
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index b330cf4c20e..32e3bc31f7f 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -33,8 +33,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "gimplify.h"
 #include "tree-cfg.h"
-#include "bitmap.h"
 #include "tree-ssa-dce.h"
+#include "gimple-fold.h"
 
 /* Expand all VEC_COND_EXPR gimple assignments into calls to internal
    function based on type of selected expansion.  */
@@ -119,8 +119,12 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
   /* Fake op0 < 0.  */
   else
{
- gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
- == MODE_VECTOR_INT);
+ if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0))) != MODE_VECTOR_INT)
+   {
+ tree t = expand_cmp_piecewise (gsi, TREE_TYPE (lhs), op0, op1);
+ return gimple_build_assign (lhs, NOP_EXPR, t);
+   }
+
  op0a = op0;
  op0b = build_zero_cst (TREE_TYPE (op0));
  tcode = LT_EXPR;
diff --git a/gcc/testsuite/gcc.dg/vect/pr96466.c 
b/gcc/testsuite/gcc.dg/vect/pr96466.c
new file mode 100644
index 000..8cca5e12ff2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr96466.c
@@ -0,0 +1,18 @@
+/* PR tree-optimization/96466 */
+/* { dg-do 

[PATCH] lra: Avoid cycling on certain subreg reloads [PR96796]

2020-08-28 Thread Richard Sandiford
This PR is about LRA cycling for a reload of the form:


Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
  Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
  Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
  REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0


The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

This happens for input address reloads, where the class is taken
from the target's chosen base register class.  It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):


 Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
  Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
Inserting insn reload before:
  320: r291:SI=r287:DI#0


IMO, in this case we should rely on the reload in insn 316 to narrow
down the class of r287.  Currently we do:


 Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
  Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
  318: r287:DI=r289:DI
---

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.

The patch is quite aggressive in that it does this for all reload
pseudos in all reload instructions.  I wondered about reusing the
condition for a reload move in in_class_p:

  INSN_UID (curr_insn) >= new_insn_uid_start
  && curr_insn_set != NULL
  && ((OBJECT_P (SET_SRC (curr_insn_set))
   && ! CONSTANT_P (SET_SRC (curr_insn_set)))
  || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
  && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
  && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))

but I can't really justify that on first principles.  I think we
should apply the rule consistently until we have a specific reason
for doing otherwise.

OK for trunk?  If so, I think we should leave it on trunk for a bit
before backporting.  (And perhaps including the in_class_p condition
above would be good for release branches?)

I'm off next week, so if the patch is OK, I'll hold off applying
until I get back.

Richard


gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
---
 gcc/lra-constraints.c | 54 ++
 gcc/testsuite/gcc.c-torture/compile/pr96796.c | 55 +++
 2 files changed, 99 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr96796.c

diff --git a/gcc/lra-constraints.c 

Re: [PATCH] x86: Add -mbaseline-isas-only/target("baseline-isas-only")

2020-08-28 Thread H.J. Lu via Gcc-patches
On Thu, Aug 27, 2020 at 10:00 AM Uros Bizjak  wrote:
>
> On Thu, Aug 27, 2020 at 4:45 PM H.J. Lu  wrote:
>
> > > > > > > How about target("baseline-isas-only")? All CPUID functions are
> > > > > > > inlined.
> > > > > >
> > > > > > No, I don't think this is a good idea. Now consider the situation 
> > > > > > that
> > > > > > caller functions are compiled with e.g. -mgeneral-regs-only. Due to
> > > > > > #pragmas, CPUID functions are compiled with a superset ISAs, so they
> > > > > > again won't be inlined. ISAs of caller functions and CPUID should
> > > > > > match, the best way is to include  after the #pragma. And
> > > > > > IMO, general-regs-only target #pragma is an excellent setting for
> > > > > > both: cpuid.h and caller bit testing functions.
> > > > > >
> > > > > > So, if we care about inlining, decorating cpuid.h with target 
> > > > > > pragmas
> > > > > > is a bad idea.
> > > > >
> > > > > This can be done with #pragma in .
> > > > >
> > > >
> > > > We just need to update ix86_can_inline_p to allow inline functions
> > > > with baseline-isas-only and general-regs-only attributes if caller
> > > > supports the same set of ISAs.
> > > >
> > > > Here is the updated patch.
> > >
> > > I'm not against it, but I don't plan to approve the attached patch.
> > >
> >
> > How about this one?
>
> I really don't see any benefit in introducing baseline-isas-only
> #pragma, when we have general-regs-only #pragma. We may want (for
> whatever reason) to avoid SSE2 movd/movq instructions also for 64bit
> targets in functions that test bits, returned by cpuid. And since
> cpuid.h functions are extremely simple (because we want them to be
> inlined!), we can simply #include them after mentioned #pragma,
> together with bit testing functions. This way, all problems involving
> inlining are avoided.
>

Baseline ISAs are related to -march=x86-64.  The differences are

1. Baseline ISAs apply to both 32-bit and 64-bit.
2. Baseline ISAs don't change -mtune.

It is guaranteed to work for both 32-bit and 64-bit as well as with any
-mXXX options on the command line.
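
(For illustration, the pattern being discussed - a sketch of ours: wrap
both cpuid.h and the bit-testing helper in the general-regs-only target
pragma, so the helper can be inlined into callers built with restricted
ISAs.)

  #pragma GCC push_options
  #pragma GCC target("general-regs-only")
  #include <cpuid.h>

  static inline int
  have_sse2 (void)
  {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
      return 0;
    return (edx & bit_SSE2) != 0;
  }
  #pragma GCC pop_options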

-- 
H.J.


Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Christophe Lyon via Gcc-patches
On Fri, 28 Aug 2020 at 14:00, Richard Earnshaw
 wrote:
>
> On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
> > In comment 14 from PR94538, it was suggested to switch off jump tables
> > on thumb-1 cores when using -mpure-code, like we already do for thumb-2.
> >
> > This is what this patch does, and also restores the previous value of
> > CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
> > this.
> >
> > It also adds a new test, since the existing no-casesi.c did not catch
> > this problem.
> >
> > Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
> > -mfloat-abi=soft, no regression and the new test passes (and fails
> > without the fix).
> >
> > 2020-08-27  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
> >   -mpure-code.
> >   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/pure-code/pr96768.c: New test.
> > ---
> >  gcc/config/arm/arm.h |  5 ++---
> >  gcc/config/arm/thumb1.md |  2 +-
> >  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 +
> >  3 files changed, 24 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> >
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index 3887c51..7d43721 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
> > for the index in the tablejump instruction.  */
> >  #define CASE_VECTOR_MODE Pmode
> >
> > -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2
> >   \
> > +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2 
> >   \
> > || (TARGET_THUMB1 \
> > -   && (optimize_size || flag_pic)))  \
> > -  && (!target_pure_code))
> > +   && (optimize_size || flag_pic)))
> >
> >
> >  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
> > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > index f0129db..602039e 100644
> > --- a/gcc/config/arm/thumb1.md
> > +++ b/gcc/config/arm/thumb1.md
> > @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
> >  (define_expand "tablejump"
> >[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
> > (use (label_ref (match_operand 1 "" "")))])]
> > -  "TARGET_THUMB1"
> > +  "TARGET_THUMB1 && !target_pure_code"
> >"
> >if (flag_pic)
> >  {
> > diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
> > b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> > new file mode 100644
> > index 000..fd4cad5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mpure-code" } */
> > +
> > +int f2 (int x, int y)
> > +{
> > +  switch (x)
> > +{
> > +case 0: return y + 0;
> > +case 1: return y + 1;
> > +case 2: return y + 2;
> > +case 3: return y + 3;
> > +case 4: return y + 4;
> > +case 5: return y + 5;
> > +}
> > +  return y;
> > +}
> > +
> > +/* We do not want any load from literal pool, but accept loads from r7
> > +   (frame pointer, used at -O0).  */
> > +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } */
> > +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
> >
>
> Having switch tables is only a problem if they are embedded in the .text
> segment.  But the case you show in the PR has them in the .rodata
> segment, so is this really necessary?
>

Well, in the original PR94538, comment #14, Wilco said this was the best option.

> R.


Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-28 Thread Bill Schmidt via Gcc-patches

On 8/28/20 7:25 AM, Alan Modra wrote:

On Fri, Aug 28, 2020 at 01:17:27AM -0500, Segher Boessenkool wrote:

1) Very many unnecessary moves are generated during expand *anyway*, a
few more will not hurt;
2) In practice we always generate a move from a pseudo into r12 here,
never a copy from r12 into r12;
3) Why generate dead code optimising cases that do not happen, making
the compiler bigger and slower, but more importantly, making the
compiler code harder to read?

Uh, OK point 2 and followup 3 is persuasive.  I should have thought of
that myself, thanks.

OK.  I had already upstreamed the patch on David's ok, but I can go back 
in and clear out those two unnecessary if's in _call and _sibcall fns.


Bill



Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-28 Thread Alan Modra via Gcc-patches
On Fri, Aug 28, 2020 at 01:17:27AM -0500, Segher Boessenkool wrote:
> 1) Very many unnecessary moves are generated during expand *anyway*, a
> few more will not hurt;
> 2) In practice we always generate a move from a pseudo into r12 here,
> never a copy from r12 into r12;
> 3) Why generate dead code optimising cases that do not happen, making
> the compiler bigger and slower, but more importantly, making the
> compiler code harder to read?

Uh, OK point 2 and followup 3 is persuasive.  I should have thought of
that myself, thanks.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] dwarf: Multi-register CFI address support

2020-08-28 Thread Andrew Stubbs

Hi all,

This patch introduces DWARF CFI support for architectures that require 
multiple registers to hold pointers, such as the stack pointer, frame 
pointer, and return address. The motivating case is the AMD GCN 
architecture which has 64-bit address pointers, but 32-bit registers.


The current implementation permits program variables to span as many 
registers as they need, but assumes that CFI expressions will only need 
a single register for each frame value.


To be fair, the DWARF standard makes a similar assumption; the engineers 
working on LLVM and GDB, at AMD, have therefore invented some new DWARF 
operators that they plan to propose for a future standard. Only one is 
relevant here, however: DW_OP_LLVM_piece_end. (Unfortunately this 
clashes with an AArch64 extension, but I think we can cope using an 
alias -- only GCC dumps will be confusing.)


My approach is to change the type representing a DWARF register 
throughout the CFI code. This permits the register span information to 
propagate to where it is needed.


I've taken advantage of C++ struct copies and operator== to minimize the 
amount of refactoring required. I'm not sure this meets the GCC 
guidelines exactly, but if not I can change that once the basic form is 
agreed. (I also considered an operator= to make assigning single dwreg 
values transparent, but that hid too many invalid assumptions.)
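
For orientation, the new type is roughly of this shape (an illustrative
sketch only; the authoritative definition is the dwarf2out.h hunk in
the patch, where the span fields use poly_uint16_pod rather than the
plain integers shown here):

struct cfa_reg
{
  unsigned int reg;           /* First DWARF register of the value.  */
  unsigned short span;        /* Number of consecutive registers.  */
  unsigned short span_width;  /* Size in bytes of each piece.  */

  cfa_reg &set_by_dwreg (unsigned int r)
  {
    reg = r;
    span = 1;        /* The common single-register case.  */
    span_width = 0;  /* Only meaningful when span > 1.  */
    return *this;
  }

  bool operator== (const cfa_reg &other) const
  {
    return (reg == other.reg
            && span == other.span
            && span_width == other.span_width);
  }
};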


OK to commit? (Although, I'll hold off until AMD release the compatible 
GDB.)


Thanks

Andrew
dwarf: Multi-register CFI address support

Add support for architectures such as AMD GCN, in which the pointer size is
larger than the register size.  This allows the CFI information to include
multi-register locations for the stack pointer, frame pointer, and return
address.

Note that this uses a newly proposed DWARF operator DW_OP_LLVM_piece_end,
which is currently only recognized by the ROCGDB debugger from AMD.  The exact
name and encoding for this operator is subject to change if and when the DWARF
standard accepts it.

gcc/ChangeLog:

	* dwarf2cfi.c (dw_stack_pointer_regnum): Change type to struct cfa_reg.
	(dw_frame_pointer_regnum): Likewise.
	(new_cfi_row): Use set_by_dwreg.
	(get_cfa_from_loc_descr): Use set_by_dwreg.  Support register spans
	with DW_OP_piece and DW_OP_LLVM_piece_end.  Support DW_OP_lit*,
	DW_OP_const*, DW_OP_minus, and DW_OP_plus.
	(lookup_cfa_1): Use set_by_dwreg.
	(def_cfa_0): Update for cfa_reg and support register spans.
	(reg_save): Change sreg parameter to struct cfa_reg.  Support register
	spans.
	(dwf_cfa_reg): New function.
	(dwarf2out_flush_queued_reg_saves): Use dwf_cfa_reg instead of
	dwf_regno.
	(dwarf2out_frame_debug_def_cfa): Likewise.
	(dwarf2out_frame_debug_adjust_cfa): Likewise.
	(dwarf2out_frame_debug_cfa_offset): Likewise.  Update reg_save usage.
	(dwarf2out_frame_debug_cfa_register): Likewise.
	(dwarf2out_frame_debug_expr): Likewise.
	(create_pseudo_cfg): Use set_by_dwreg.
	(initial_return_save): Use set_by_dwreg and dwf_cfa_reg,
	(create_cie_data): Use dwf_cfa_reg.
	(execute_dwarf2_frame): Use dwf_cfa_reg.
	(dump_cfi_row): Use set_by_dwreg.
	* dwarf2out.c (build_span_loc): New function.
	(build_cfa_loc): Support register spans.
	(build_cfa_aligned_loc): Update cfa_reg usage.
	(convert_cfa_to_fb_loc_list): Use set_by_dwreg.
	* dwarf2out.h (struct cfa_reg): New type.
	(struct dw_cfa_location): Use struct cfa_reg.
	(build_span_loc): New prototype.
	* gengtype.c (main): Accept poly_uint16_pod type.

include/ChangeLog:

	* dwarf2.def (DW_OP_LLVM_piece_end): New extension operator.

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 0d179b388e4..63cb6de5c4f 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -229,8 +229,8 @@ static vec<queued_reg_save> queued_reg_saves;
 static bool any_cfis_emitted;
 
 /* Short-hand for commonly used register numbers.  */
-static unsigned dw_stack_pointer_regnum;
-static unsigned dw_frame_pointer_regnum;
+static struct cfa_reg dw_stack_pointer_regnum;
+static struct cfa_reg dw_frame_pointer_regnum;
 
 /* Hook used by __throw.  */
 
@@ -430,7 +430,7 @@ new_cfi_row (void)
 {
  dw_cfi_row *row = ggc_cleared_alloc<dw_cfi_row> ();
 
-  row->cfa.reg = INVALID_REGNUM;
+  row->cfa.reg.set_by_dwreg (INVALID_REGNUM);
 
   return row;
 }
@@ -538,7 +538,11 @@ get_cfa_from_loc_descr (dw_cfa_location *cfa, struct dw_loc_descr_node *loc)
   cfa->offset = 0;
   cfa->base_offset = 0;
   cfa->indirect = 0;
-  cfa->reg = -1;
+  cfa->reg.set_by_dwreg (-1);
+
+  /* Record previous register pieces here.  */
+  struct cfa_reg span;
+  span.set_by_dwreg (INVALID_REGNUM);
 
   for (ptr = loc; ptr != NULL; ptr = ptr->dw_loc_next)
 {
@@ -578,10 +582,10 @@ get_cfa_from_loc_descr (dw_cfa_location *cfa, struct dw_loc_descr_node *loc)
 	case DW_OP_reg29:
 	case DW_OP_reg30:
 	case DW_OP_reg31:
-	  cfa->reg = op - DW_OP_reg0;
+	  cfa->reg.set_by_dwreg (op - DW_OP_reg0);
 	  break;
 	case DW_OP_regx:
-	  cfa->reg = ptr->dw_loc_oprnd1.v.val_int;
+	  cfa->reg.set_by_dwreg (ptr->dw_loc_oprnd1.v.val_int);
 	  break;
 	

Re: [PATCH] arm: Fix switch tables for thumb-1 with -mpure-code [PR96768]

2020-08-28 Thread Richard Earnshaw
On 27/08/2020 14:27, Christophe Lyon via Gcc-patches wrote:
> In comment 14 from PR94538, it was suggested to switch off jump tables
> on thumb-1 cores when using -mpure-code, like we already do for thumb-2.
> 
> This is what this patch does, and also restores the previous value of
> CASE_VECTOR_PC_RELATIVE since it was not the right way of handling
> this.
> 
> It also adds a new test, since the existing no-casesi.c did not catch
> this problem.
> 
> Tested by running the whole testsuite with -mpure-code -mcpu=cortex-m0
> -mfloat-abi=soft, no regression and the new test passes (and fails
> without the fix).
> 
> 2020-08-27  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove condition on
>   -mpure-code.
>   * config/arm/thumb1.md (tablejump): Disable when -mpure-code.
> 
>   gcc/testsuite/
>   * gcc.target/arm/pure-code/pr96768.c: New test.
> ---
>  gcc/config/arm/arm.h |  5 ++---
>  gcc/config/arm/thumb1.md |  2 +-
>  gcc/testsuite/gcc.target/arm/pure-code/pr96768.c | 21 +
>  3 files changed, 24 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 3887c51..7d43721 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1975,10 +1975,9 @@ enum arm_auto_incmodes
> for the index in the tablejump instruction.  */
>  #define CASE_VECTOR_MODE Pmode
>  
> -#define CASE_VECTOR_PC_RELATIVE ((TARGET_THUMB2  
> \
> +#define CASE_VECTOR_PC_RELATIVE (TARGET_THUMB2   
> \
> || (TARGET_THUMB1 \
> -   && (optimize_size || flag_pic)))  \
> -  && (!target_pure_code))
> +   && (optimize_size || flag_pic)))
>  
>  
>  #define CASE_VECTOR_SHORTEN_MODE(min, max, body) \
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index f0129db..602039e 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -2012,7 +2012,7 @@ (define_insn "*epilogue_insns"
>  (define_expand "tablejump"
>[(parallel [(set (pc) (match_operand:SI 0 "register_operand"))
> (use (label_ref (match_operand 1 "" "")))])]
> -  "TARGET_THUMB1"
> +  "TARGET_THUMB1 && !target_pure_code"
>"
>if (flag_pic)
>  {
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c 
> b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> new file mode 100644
> index 000..fd4cad5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96768.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mpure-code" } */
> +
> +int f2 (int x, int y)
> +{
> +  switch (x)
> +{
> +case 0: return y + 0;
> +case 1: return y + 1;
> +case 2: return y + 2;
> +case 3: return y + 3;
> +case 4: return y + 4;
> +case 5: return y + 5;
> +}
> +  return y;
> +}
> +
> +/* We do not want any load from literal pool, but accept loads from r7
> +   (frame pointer, used at -O0).  */
> +/* { dg-final { scan-assembler-not "ldr\tr\\d+, \\\[r\[0-689\]+\\\]" } } */
> +/* { dg-final { scan-assembler "text,\"0x2006\"" } } */
> 

Having switch tables is only a problem if they are embedded in the .text
segment.  But the case you show in the PR has them in the .rodata
segment, so is this really necessary?

R.


Re: [Patch] Fortran: Fix absent-optional handling for nondescriptor arrays (PR94672)

2020-08-28 Thread Andre Vehreschild
Hi Tobias,

the patch looks ok to me.

Thanks for the patch.

Regards,
Andre

On Wed, 26 Aug 2020 18:29:40 +0200
Tobias Burnus  wrote:

> This fixes an issue caused by the patch for PR 94672, which
> affects both GCC 10 and GCC 11.
> 
> Only 'sVal' of 'subroutine foo' was affected, the rest is
> only a crosscheck that it worked for those code paths.
> 
> (I did check against the dump – which looks fine. I could
> add dump tests as well. The 'foo' test was failing with
> 'stop 5' (absent argument) at runtime before the patch;
> the report was for the 'stop 4' case, which is probably
> harder to trigger as run-time fail as the stack memory
> is likely zero-initialized. → -fdump-tree-original scan
> test useful?)
> 
> OK for mainline and GCC 10?
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Commercial Register Munich HRB 106955, Managing Directors: Thomas Heurung,
> Alexander Walter


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 01:06:40PM +0200, Richard Biener wrote:
> > On Fri, Aug 28, 2020 at 12:36:00PM +0200, Richard Biener wrote:
> > > Guess this would work indeed.  It's probably quite common to have
> > > both vector and non-vector constants because of vectorization
> > > and scalar epilogues.  But note that elsewhere we're using
> > > the largest component mode to emit vector constant pool entries
> > > to share { -1, -1, -1, -1 } and {-1l, -1l } for example so while the
> > > below code works for FP modes it likely will break down for
> > > integer modes?
> >
> > I don't see why it would break, it will not optimize { -1LL, -1LL }
> > vs. -1 scalar, sure, but it uses the hash and equality function the
> > rtl constant pool uses, which means it compares both the constants
> > (rtx_equal_p) and mode we have recorded for it.
> 
> Oh, I thought my patch for PR54201 was installed but it was not
> (I grepped my patch folder...).  PR54201 complains about something
> similar but then different ;)  Guess post-processing that would also be
> possible but also a bit awkward.  Maybe we should hash the
> full byte representation instead of the components.

I think we in general can't, because the constants can contain even things
like LABEL_REFs, so not everything is actually representable as host byte
stream.
And I'm not convinced doing it in force_constant_mem is best, because at
that point we really don't know if the constants will not be optimized away.

We could extend what my patch does though, by sorting the pool entries
(those with mark > 0 only) by decreasing size and, for those for which
native_encode_rtx is successful, entering into a hash table an entry for
their whole byte representation, then half of it, and so on down to
single bytes (possibly taking the alignment into account too).
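
Abstractly, the dedup scheme would look something like this (an
illustration detached from GCC's internals; std::map/std::string stand
in for the real pool structures and byte images, and dedup is a made-up
name):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct alias_target { std::size_t entry; std::size_t offset; };

/* For every entry, return either itself or the (entry, offset) of an
   earlier, larger entry whose bytes contain it at that offset.  */
std::map<std::size_t, alias_target>
dedup (const std::vector<std::string> &entries_by_decreasing_size)
{
  std::map<std::string, alias_target> seen;   /* byte image -> place  */
  std::map<std::size_t, alias_target> result;
  for (std::size_t i = 0; i < entries_by_decreasing_size.size (); ++i)
    {
      const std::string &bytes = entries_by_decreasing_size[i];
      auto hit = seen.find (bytes);
      if (hit != seen.end ())
        {
          result[i] = hit->second;   /* alias into the earlier entry  */
          continue;
        }
      result[i] = { i, 0 };
      /* Record the whole image, then aligned halves, quarters, ...
         down to single bytes; emplace keeps the earliest (largest)
         claimant for any given byte image.  */
      for (std::size_t len = bytes.size (); len >= 1; len /= 2)
        for (std::size_t off = 0; off + len <= bytes.size (); off += len)
          seen.emplace (bytes.substr (off, len), alias_target{ i, off });
    }
  return result;
}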

As for section anchors, handling that seems far less important, as e.g. on
both powerpc64 and aarch64 I see (unless -fno-merge-constants) the constant
pool emitted outside of the blocks and so optimized by the posted patch.

Jakub



Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Richard Biener via Gcc-patches
On Fri, Aug 28, 2020 at 12:47 PM Jakub Jelinek  wrote:
>
> On Fri, Aug 28, 2020 at 12:36:00PM +0200, Richard Biener wrote:
> > Guess this would work indeed.  It's probably quite common to have
> > both vector and non-vector constants because of vectorization
> > and scalar epilogues.  But note that elsewhere we're using
> > the largest component mode to emit vector constant pool entries
> > to share { -1, -1, -1, -1 } and {-1l, -1l } for example so while the
> > below code works for FP modes it likely will break down for
> > integer modes?
>
> I don't see why it would break, it will not optimize { -1LL, -1LL }
> vs. -1 scalar, sure, but it uses the hash and equality function the
> rtl constant pool uses, which means it compares both the constants
> (rtx_equal_p) and mode we have recorded for it.

Oh, I thought my patch for PR54201 was installed but it was not
(I grepped my patch folder...).  PR54201 complains about something
similar but then different ;)  Guess post-processing that would also be
possible but also a bit awkward.  Maybe we should hash the
full byte representation instead of the components.

Richard.

> Of course, on x86_64 integer scalar constants will pretty much never appear
> in the constant pool, so guess we'll need a different target for testing
> that.
>
> Jakub
>


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 12:36:00PM +0200, Richard Biener wrote:
> Guess this would work indeed.  It's probably quite common to have
> both vector and non-vector constants because of vectorization
> and scalar epilogues.  But note that elsewhere we're using
> the largest component mode to emit vector constant pool entries
> to share { -1, -1, -1, -1 } and {-1l, -1l } for example so while the
> below code works for FP modes it likely will break down for
> integer modes?

I don't see why it would break, it will not optimize { -1LL, -1LL }
vs. -1 scalar, sure, but it uses the hash and equality function the
rtl constant pool uses, which means it compares both the constants
(rtx_equal_p) and mode we have recorded for it.
Of course, on x86_64 integer scalar constants will pretty much never appear
in the constant pool, so guess we'll need a different target for testing
that.

Jakub



RE: [PATCH PR96357][GCC][AArch64]: could not split insn UNSPEC_COND_FSUB with AArch64 SVE

2020-08-28 Thread Przemyslaw Wirkus
> Sorry for the micromanagement, but I think this is easier to read if it flows 
> as a
> single paragraph:

[snip...]

> I should have realised this would be the case, sorry, but now that there's 
> only
> one rewrite, this should simply be:
> 
>   "&& reload_completed
>    && register_operand (operands[4], <MODE>mode)
>    && !rtx_equal_p (operands[0], operands[4])"
>   {
>     emit_insn (gen_vcond_mask_<mode><vpred> (operands[0], operands[3],
>                operands[4], operands[1]));
> operands[4] = operands[3] = operands[0];
>   }

Done and done.

> OK with those changes, thanks.
> 
> Richard

Committed with:

commit b648814c02eb418aaf27897c480452172ee96303
Date:   Fri Aug 28 11:31:04 2020 +0100

Kind regards,
Przemyslaw



Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Richard Biener via Gcc-patches
On Fri, Aug 28, 2020 at 10:52 AM Jakub Jelinek  wrote:
>
> On Fri, Aug 28, 2020 at 08:47:06AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > IIRC elsewhere it was discussed to use ld to perform merging by
> > emitting separate rodata sections for constant sizes (4, 8, 16, 32, 64
> > byte sizes).
>
> ld does that already, and gcc too.
>
> > ld could always direct 8 byte constant refs to the larger pools (sub-)entry.
>
> But there is no way to express in ELF that something like that would be
> acceptable.

Hmm, I see.

> I meant something like the following, which on e.g. a dumb:
>
typedef float V __attribute__((vector_size (4 * sizeof (float))));
>
> void
> foo (V *p, float *q)
> {
>   p[0] += (V) { 1.0f, 2.0f, 3.0f, 4.0f };
>   q[0] += 4.0f;
>   q[1] -= 3.0f;
>   q[17] -= 2.0f;
>   q[31] += 1.0f;
> }
>
> testcase merges all the 4 scalar constant pool entries into the CONST_VECTOR
> one.
>
> I'm punting for section anchors and not doing it in the per-function (i.e.
> non-shared) constant pools simply because I don't know them well enough,
> don't know whether backends use the offsets for something etc.
> For section anchors, I guess it would need to be done before (re)computing the
> offsets and arrange for the desc->mark < 0 entries not to be considered as
> objects in the object block, for non-shared pools, perhaps it would be
> enough to call the new function from output_constant_pool before calling
> recompute_pool_offsets and adjust recompute_pool_offsets to ignore
> desc->mark < 0.

Guess this would work indeed.  It's probably quite common to have
both vector and non-vector constants because of vectorization
and scalar epilogues.  But note that elsewhere we're using
the largest component mode to emit vector constant pool entries
to share { -1, -1, -1, -1 } and {-1l, -1l } for example so while the
below code works for FP modes it likely will break down for
integer modes?

Richard.

> 2020-08-28  Jakub Jelinek  
>
> * varasm.c (output_constant_pool_contents): Emit desc->mark < 0
> entries as aliases.
> (optimize_constant_pool): New function.
> (output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.
>
> --- gcc/varasm.c.jj 2020-07-28 15:39:10.091755086 +0200
> +++ gcc/varasm.c2020-08-28 10:38:10.207636849 +0200
> @@ -4198,7 +4198,27 @@ output_constant_pool_contents (struct rt
>class constant_descriptor_rtx *desc;
>
>for (desc = pool->first; desc ; desc = desc->next)
> -if (desc->mark)
> +if (desc->mark < 0)
> +  {
> +#ifdef ASM_OUTPUT_DEF
> +const char *name = targetm.strip_name_encoding (XSTR (desc->sym, 0));
> +char label[256];
> +char buffer[256 + 32];
> +const char *p;
> +
> +ASM_GENERATE_INTERNAL_LABEL (label, "LC", ~desc->mark);
> +   p = targetm.strip_name_encoding (label);
> +   if (desc->offset)
> + {
> +   sprintf (buffer, "%s+%ld", p, (long) (desc->offset));
> +   p = buffer;
> + }
> +   ASM_OUTPUT_DEF (asm_out_file, name, p);
> +#else
> +   gcc_unreachable ();
> +#endif
> +  }
> +else if (desc->mark)
>{
> /* If the constant is part of an object_block, make sure that
>the constant has been positioned within its block, but do not
> @@ -4216,6 +4236,52 @@ output_constant_pool_contents (struct rt
>}
>  }
>
> +/* Attempt to optimize constant pool POOL.  If it contains both CONST_VECTOR
> +   constants and scalar constants with the values of CONST_VECTOR elements,
> +   try to alias the scalar constants with the CONST_VECTOR elements.  */
> +
> +static void
> +optimize_constant_pool (struct rtx_constant_pool *pool)
> +{
> +  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
> +if (desc->mark > 0
> +   && GET_CODE (desc->constant) == CONST_VECTOR
> +   && VECTOR_MODE_P (desc->mode)
> +   && GET_MODE_CLASS (desc->mode) != MODE_VECTOR_BOOL
> +   && !(SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
> +&& SYMBOL_REF_BLOCK (desc->sym))
> +   && desc->labelno >= 0)
> +  {
> +   scalar_mode submode = GET_MODE_INNER (desc->mode);
> +   unsigned int subalign = MIN (desc->align, GET_MODE_BITSIZE (submode));
> +   int units = GET_MODE_NUNITS (desc->mode);
> +
> +   for (int i = 0; i < units; i++)
> + {
> +   if (i != 0
> +   && rtx_equal_p (CONST_VECTOR_ELT (desc->constant, i),
> +   CONST_VECTOR_ELT (desc->constant, i - 1)))
> + continue;
> +
> +   constant_descriptor_rtx tmp;
> +   tmp.constant = CONST_VECTOR_ELT (desc->constant, i);
> +   tmp.mode = submode;
> +   hashval_t hash = const_rtx_hash (tmp.constant);
> +   constant_descriptor_rtx *eldesc
> + = pool->const_rtx_htab->find_with_hash (&tmp, hash);
> +   if (eldesc
> +   && eldesc->mark > 0
> +   && eldesc->align <= subalign

[Patch, Fortran, Coarray] PR fortran/96418: Fix ICE on test when compiled with coarray=single

2020-08-28 Thread Andre Vehreschild
Hi,

the attached patch fixes PR96418, where the code in the testsuite, when
compiled with -fcoarray=single, led to an ICE. The reason was that the
coarray object was dereferenced as an array, but it was not an array.
Introducing a test for the descriptor removes the ICE.

Regtests ok on x86_64-linux/FC31. Ok for trunk?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
commit feb00c16ba50473e32a0c4adda470a0111603aed
Author: Andre Vehreschild 
Date:   Fri Aug 28 11:48:31 2020 +0200

Fix ICE when compiling with -fcoarray=single, when derefing a non-array.

gcc/fortran/ChangeLog:

PR fortran/96418
* trans.c (gfc_deallocate_with_status): Check that object to deref
is an array, before applying array deref.

gcc/testsuite/ChangeLog:

PR fortran/96418
* gfortran.dg/coarray_alloc_comp_3.f08: Moved to...
* gfortran.dg/coarray/alloc_comp_6.f08: ...here.
Should be tested for both -fcoarray=single and lib, resp.
* gfortran.dg/coarray_alloc_comp_4.f08: Fix program name.

diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index ed054261452..1c23cdf8a0e 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -1374,7 +1374,8 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	  else
 	caf_dereg_type = (enum gfc_coarray_deregtype) coarray_dealloc_mode;
 	}
-  else if (flag_coarray == GFC_FCOARRAY_SINGLE)
+  else if (flag_coarray == GFC_FCOARRAY_SINGLE
+	   && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer)))
 	pointer = gfc_conv_descriptor_data_get (pointer);
 }
   else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer)))
diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f08
similarity index 95%
rename from gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08
rename to gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f08
index e2037aa5809..8b153925129 100644
--- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08
+++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f08
@@ -1,12 +1,11 @@
 ! { dg-do run }
-! { dg-options "-fcoarray=lib -lcaf_single" }
 ! { dg-additional-options "-latomic" { target libatomic_available } }
 !
 ! Contributed by Andre Vehreschild
 ! Check that manually freeing components does not lead to a runtime crash,
 ! when the auto-deallocation is taking care.

-program coarray_alloc_comp_3
+program alloc_comp_6
   implicit none

   type dt
diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
index 6586ec651dd..4c71a90af8f 100644
--- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
+++ b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
@@ -5,7 +5,7 @@
 ! Contributed by Andre Vehreschild
 ! Check that sub-components are caf_deregistered and not freed.

-program coarray_alloc_comp_3
+program coarray_alloc_comp_4
   implicit none

   type dt


Re: [PATCH] aarch64: Remove redundant mult patterns

2020-08-28 Thread Richard Sandiford
Alex Coplan  writes:
> Hello,
>
> Following on from the earlier patch to fix up the syntax for
> add/sub/adds/subs and friends with a sign/zero-extended operand [0],
> this patch removes the "mult" variants of these patterns which are
> all redundant.
>
> This patch removes the following patterns from the AArch64 backend:
>
>  *adds_mul_imm_
>  *subs_mul_imm_
>  *adds__multp2
>  *subs__multp2
>  *add_mul_imm_
>  *add__mult_
>  *add__mult_si_uxtw
>  *add__multp2
>  *add_si_multp2_uxtw
>  *add_uxt_multp2
>  *add_uxtsi_multp2_uxtw
>  *sub_mul_imm_
>  *sub_mul_imm_si_uxtw
>  *sub__multp2
>  *sub_si_multp2_uxtw
>  *sub_uxt_multp2
>  *sub_uxtsi_multp2_uxtw
>  *neg_mul_imm_2
>  *neg_mul_imm_si2_uxtw
>
> Together with the following predicates which were used only by these
> patterns:
>
>  - aarch64_pwr_imm3
>  - aarch64_pwr_2_si
>  - aarch64_pwr_2_di
>
> These patterns are all redundant since multiplications by powers of two
> should be represented as shifts outside a (mem).
>
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu (on top of [0]),
>no regressions.
>
> OK for master (when applied after [0])?

That's a nice collection of minuses.

OK for trunk, thanks.  Since this depends on the RA patch, and since
RA patches have a habit of exposing problems on other targets, it might
be better to wait for a week or so before committing this one.
Just a suggestion though -- go ahead and commit whenever you're
comfortable committing.

The patch might expose code quality regressions.  If so, that's
probably a sign that some other pass needs a similar fix to the RA,
even though the symptom is “just” a missed optimisation rather than
an ICE.

Thanks,
Richard


Re: [PATCH v5] genemit.c (main): split insn-emit.c for compiling parallelly

2020-08-28 Thread Richard Sandiford
Thanks for doing this.  In addition to what Segher said:

Jojo R  writes:
> gcc/ChangeLog:
>
>   * genemit.c (main): Print 'split line'.
>   * Makefile.in (insn-emit.c): Define split count and file
>
> ---
>  gcc/Makefile.in | 15 +
>  gcc/genemit.c   | 87 -
>  2 files changed, 64 insertions(+), 38 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..08e4aa7ef6f 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1258,6 +1258,17 @@ ANALYZER_OBJS = \
>  # We put the *-match.o and insn-*.o files first so that a parallel make
>  # will build them sooner, because they are large and otherwise tend to be
>  # the last objects to finish building.
> +
> +# target overrides
> +-include $(tmake_file)
> +
> +INSN-GENERATED-SPLIT-NUM ?= 0
> +insn-generated-split-num = $(shell expr $(INSN-GENERATED-SPLIT-NUM) + 1)
> +
> +insn-emit-split-c = $(foreach o, $(shell for i in 
> {1..$(insn-generated-split-num)}; do echo $$i; done), insn-emit$(o).c)

The {a..b} construct isn't portable: this needs to be runnable with
a plain Bourne shell like /bin/dash.

I think we should use the same wordlist technique as check_p_numbers[0-6].
So I guess the first step would be to rename check_p_numbers[0-6] to
something more general and use it both here and in check_p_numbers.

Thanks,
Richard


Re: [PATCH] fix a typo in rtl.texi

2020-08-28 Thread Richard Sandiford
Wei Wentao  writes:
> Hi,
>
> This patch fix a typo in rtl.texi.
>
> Regards!
>
> Weiwt
>
> ---
>  gcc/doc/rtl.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 501fa1a31da..f8e1f950823 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -3954,7 +3954,7 @@ variable.
>  @findex NOTE_INSN_BEGIN_STMT
>  @item NOTE_INSN_BEGIN_STMT
>  This note is used to generate @code{is_stmt} markers in line number
> -debuggign information.  It indicates the beginning of a user
> +debugging information.  It indicates the beginning of a user
>  statement.
>  
>  @findex NOTE_INSN_INLINE_ENTRY

Thanks, pushed to master.

Richard


Re: [PATCH] doc: add 'cd' command before 'make check-gcc' command in install.texi

2020-08-28 Thread Richard Sandiford
Hu Jiangping  writes:
> Hi,
>
> This patch adds a 'cd' command before the 'make check-gcc' command
> used when running the testsuite on selected tests.
>
> I think the implicit meaning of the original text is to
> execute the cd command to move to the gcc subdirectory of
> the object directory before executing the make command.
> However, for the following reasons, it may cause
> confusion if this is not stated explicitly.
>
>   * the make check-gcc command can also be executed in the
>   object directory, which runs more test cases than
>   executing it in the gcc subdirectory.
>   * the make check-g++ command below reports a
>   'No rule to make target' error if executed in the
>   object directory.
>
> Tested on x86_64. OK for master?

I think the text above was trying to restrict this to objdir/gcc:

In order to run sets of tests selectively, there are targets
@samp{make check-gcc} and language specific @samp{make check-c},
@samp{make check-c++}, @samp{make check-d} @samp{make check-fortran},
@samp{make check-ada}, @samp{make check-objc}, @samp{make check-obj-c++},
@samp{make check-lto}
in the @file{gcc} subdirectory of the object directory.

but I agree it would be good for clarity and emphasis to have the cd in
the example as well.  But if we do that, I think it would be better to
add the cd to the make check-g++ examples too, since I'm not sure the
examples were supposed to be executed in sequence.

It's also a bit inconsistent that the text quoted above uses “check-c++”
while the examples use “check-g++”.  The two are equivalent, but it
would be better for the examples to match the text.

Thanks,
Richard


>
> Regards!
> hujp
>
> ---
>  gcc/doc/install.texi | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 5330bf3bb29..fd4409921ee 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -2975,6 +2975,7 @@ A more selective way to just run all @command{gcc} 
> execute tests in the
>  testsuite is to use
>  
>  @smallexample
> +cd @var{objdir}/gcc
>  make check-gcc RUNTESTFLAGS="execute.exp @var{other-options}"
>  @end smallexample


Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-28 Thread Alex Coplan

Hi Christophe,

On 28/08/2020 10:16, Christophe Lyon wrote:
> Hi Alex,
> 
> 
> On Wed, 26 Aug 2020 at 17:15, Alex Coplan  wrote:
> >
> > Thanks for the review, both.
> >
> > On 26/08/2020 09:19, Vladimir Makarov wrote:
> > >
> > > On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > > > Alex Coplan  writes:
> > > >
> > > > Minor nit, should be formatted as:
> > > >
> > > > static rtx
> > > > canonicalize_reload_addr (rtx addr)
> > > Sorry for missing this.  Alex, it should be fixed anyway.
> > > >
> > > > I don't think we should restrict this to (plus (mult X Y) Z),
> > > > since addresses can be more complicated than that.  One way to
> > > > search for all MULTs is:
> > > >
> > > >subrtx_iterator::array_type array;
> > > >FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> > > >  {
> > > >rtx x = *iter;
> > > >if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> > > >  ...
> > > >  }
> > > >
> > > > (Needs rtl-iter.h)
> > >
> > > I agree it would be nice to process a general case.  Alex, you can do
> > > this as a separate patch if you want.
> > >
> > > Richard, thank you for looking at this patch too.
> > >
> > >
> >
> > Please find a reworked version of the patch attached incorporating
> > Richard's feedback.
> >
> > Testing:
> >  * Bootstrap and regtest on aarch64-none-linux-gnu,
> >arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.
> >
> 
> Unfortunately, the new test fails on aarch64 with mabi=ilp32
> FAIL: gcc.target/aarch64/mem-shift-canonical.c scan-assembler-times 
> add_lsl_di 3

Thanks for catching this. It fails since we're looking for a pattern
that could only be hit on LP64. Disabling the test on ILP32 since the
problematic mult pattern was never hit there, so there's nothing to
test.

Committing as obvious.
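
For reference, the reworked helper from the attached patch is
essentially of the following shape (a sketch, with the attachment being
authoritative; it assumes FOR_EACH_SUBRTX_PTR is used so the MULT can
be rewritten in place):

/* Before reload, a reload address can contain a non-canonical
   (mult X (2^n)); rewrite each such subrtx as the canonical
   (ashift X n).  Needs rtl-iter.h.  */
static rtx
canonicalize_reload_addr (rtx addr)
{
  subrtx_ptr_iterator::array_type array;
  FOR_EACH_SUBRTX_PTR (iter, array, &addr, NONCONST)
    {
      rtx *loc = *iter;
      rtx x = *loc;
      if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
	{
	  const HOST_WIDE_INT ci = INTVAL (XEXP (x, 1));
	  const int pwr2 = exact_log2 (ci);
	  if (pwr2 > 0)
	    *loc = gen_rtx_ASHIFT (GET_MODE (x), XEXP (x, 0),
				   GEN_INT (pwr2));
	}
    }
  return addr;
}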

Thanks,
Alex

> 
> Christophe
> 

---

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mem-shift-canonical.c: Skip on ILP32.

---

diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c 
b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
index 36beed497a0..4747929 100644
--- a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -3,6 +3,7 @@
 
 /* { dg-do compile } */
 /* { dg-options "-Os -ftree-vectorize -dp" } */
+/* { dg-require-effective-target lp64 } */
 
 
 struct T



Re: [PATCH] streamline TARGET_MEM_REF dumping

2020-08-28 Thread Richard Biener
On Fri, 28 Aug 2020, Christophe Lyon wrote:

> Hi,
> 
> 
> On Thu, 27 Aug 2020 at 13:07, Richard Biener  wrote:
> >
> > The following streamlines TARGET_MEM_REF dumping building
> > on what we do for MEM_REF and thus dumping things like
> > access type, TBAA type and base/clique.  I've changed it
> > to do semantic dumping aka base + offset + step * index
> > rather than the odd base: A, step: way.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > 2020-08-27  Richard Biener  
> >
> > * tree-pretty-print.c (dump_mem_ref): Handle TARGET_MEM_REFs.
> > (dump_generic_node): Use dump_mem_ref also for TARGET_MEM_REF.
> >
> > * gcc.dg/tree-ssa/loop-19.c: Adjust.
> > * gcc.dg/tree-ssa/loop-2.c: Likewise.
> > * gcc.dg/tree-ssa/loop-3.c: Likewise.
> 
> This introduced 2 regressions on arm
> (for instance --target arm-none-linux-gnueabihf --with-mode arm
> --with-cpu cortex-a9 --with-fpu neon-fp16):
> FAIL:gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times ivopts "&a" 1
> gcc.dg/tree-ssa/scev-3.c: pattern found 2 times
> 
> FAIL:gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times ivopts "&a" 1
> gcc.dg/tree-ssa/scev-5.c: pattern found 2 times

I'm seeing them on i?86 as well and they seem to be genuine latent
issues:

FAIL: gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times ivopts "&a" 1
FAIL: gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times ivopts "&a" 1
FAIL: gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times ivopts "&a" 1

The testcases all do something like

__BB(5):
  i_12 = __PHI (__BB6: i_9, __BB4: i_5);
  _1 = &a[i_12];
  a_p = _1;
  __MEM <int> ((int *)&a)[i_12] = 100;
  i_9 = i_5 + i_12;
  if (i_9 <= 999ll)
...

and expect IVOPTs to realize that the address stored to a_p
is the same as the one for the store and create a single
IV for this.  But it doesn't:

  <bb 5> :
  # i_12 = PHI <i_5(4), i_9(6)>
  _5 = (unsigned int) &a;
  _11 = (unsigned int) i_12;
  _10 = _11 * 4;
  _8 = _5 + _10;
  _7 = (int *) _8;
  _1_16 = _7;
  a_p = _1_16;
  _6 = (sizetype) i_12;
  MEM[(int *)&a + _6 * 4] = 100;

previously the access was dumped as MEM[symbol: a, ...] and thus
didn't contain the suspicious &a (oops).  But now the redundancy
is clearly visible; it is possibly caused by 'long long i' in the
above case, but for scev-4.c it is int vs. unsigned int.
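
The loops these tests exercise are roughly of this shape (paraphrased
from memory, not the verbatim testcase):

int *a_p;
int a[1000];

void
f (long long k)
{
  for (long long i = k; i < 1000; i += k)
    {
      a_p = &a[i];   /* The address of a[i] is stored ...  */
      a[i] = 100;    /* ... and the same element is written.  */
    }
  /* IVOPTs should compute &a[i] once and use it for both.  */
}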

Now, the 2017-01-17 change to the testcases notes this already
and hints at missed store-motion / final value replacement.
Those testcases have been fragile for a while (and the question
is still what they exactly were added for).

Richard.

> Christophe
> 
> 
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/loop-19.c |  4 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/loop-2.c  |  1 -
> >  gcc/testsuite/gcc.dg/tree-ssa/loop-3.c  |  3 +-
> >  gcc/tree-pretty-print.c | 89 -
> >  4 files changed, 30 insertions(+), 67 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> > index af7a3daddec..0c73111c1ee 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> > @@ -22,6 +22,6 @@ void tuned_STREAM_Copy()
> > However, due to a bug in jump threading, we end up peeling one 
> > iteration from
> > the loop, which creates an additional occurrence.  */
> >
> > -/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )a," 2 
> > "optimized" } } */
> > -/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )c," 2 
> > "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "MEM\[^;\]*&a" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "MEM\[^;\]*&c" 1 "optimized" } } */
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> > index bda25167353..e58561a6650 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> > @@ -27,7 +27,6 @@ void xxx(void)
> >
> >  /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } 
> > } */
> >  /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } 
> > } */
> > -/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
> >
> >  /* 17 * iter should be strength reduced.  */
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> > index d3b26b7ad19..74491f80e49 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> > @@ -20,8 +20,7 @@ void xxx(void)
> >  /* Access to arr_base[iter].y should not be strength reduced, since
> > we have a memory mode including multiplication by 4.  */
> >
> > -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
> > -/* { dg-final { scan-tree-dump-times "step:" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "MEM\[^;\]* \* 4" 1 "optimized" } } */
> >
> >  /* And original induction variable should be preserved.  */
> >
> > diff --git a/gcc/tree-pretty-print.c 

Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 28, 2020 at 08:47:06AM +0200, Richard Biener via Gcc-patches wrote:
> IIRC elsewhere it was discussed to use ld to perform merging by
> emitting separate rodata sections for constant sizes (4, 8, 16, 32, 64
> byte sizes).

ld does that already, and gcc too.

> ld could always direct 8 byte constant refs to the larger pools (sub-)entry.

But there is no way to express in ELF that something like that would be
acceptable.

I meant something like the following, which on e.g. a dumb:

typedef float V __attribute__((vector_size (4 * sizeof (float))));

void
foo (V *p, float *q)
{
  p[0] += (V) { 1.0f, 2.0f, 3.0f, 4.0f };
  q[0] += 4.0f;
  q[1] -= 3.0f;
  q[17] -= 2.0f;
  q[31] += 1.0f;
}

testcase merges all the 4 scalar constant pool entries into the CONST_VECTOR
one.

I'm punting for section anchors and not doing it in the per-function (i.e.
non-shared) constant pools simply because I don't know them well enough,
don't know whether backends use the offsets for something etc.
For section anchors, I guess it would need to be done before (re)computing the
offsets and arrange for the desc->mark < 0 entries not to be considered as
objects in the object block, for non-shared pools, perhaps it would be
enough to call the new function from output_constant_pool before calling
recompute_pool_offsets and adjust recompute_pool_offsets to ignore
desc->mark < 0.

2020-08-28  Jakub Jelinek  

* varasm.c (output_constant_pool_contents): Emit desc->mark < 0
entries as aliases.
(optimize_constant_pool): New function.
(output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.

--- gcc/varasm.c.jj 2020-07-28 15:39:10.091755086 +0200
+++ gcc/varasm.c2020-08-28 10:38:10.207636849 +0200
@@ -4198,7 +4198,27 @@ output_constant_pool_contents (struct rt
   class constant_descriptor_rtx *desc;
 
   for (desc = pool->first; desc ; desc = desc->next)
-if (desc->mark)
+if (desc->mark < 0)
+  {
+#ifdef ASM_OUTPUT_DEF
+const char *name = targetm.strip_name_encoding (XSTR (desc->sym, 0));
+char label[256];
+char buffer[256 + 32];
+const char *p;
+
+ASM_GENERATE_INTERNAL_LABEL (label, "LC", ~desc->mark);
+   p = targetm.strip_name_encoding (label);
+   if (desc->offset)
+ {
+   sprintf (buffer, "%s+%ld", p, (long) (desc->offset));
+   p = buffer;
+ }
+   ASM_OUTPUT_DEF (asm_out_file, name, p);
+#else
+   gcc_unreachable ();
+#endif
+  }
+else if (desc->mark)
   {
/* If the constant is part of an object_block, make sure that
   the constant has been positioned within its block, but do not
@@ -4216,6 +4236,52 @@ output_constant_pool_contents (struct rt
   }
 }
 
+/* Attempt to optimize constant pool POOL.  If it contains both CONST_VECTOR
+   constants and scalar constants with the values of CONST_VECTOR elements,
+   try to alias the scalar constants with the CONST_VECTOR elements.  */
+
+static void
+optimize_constant_pool (struct rtx_constant_pool *pool)
+{
+  for (constant_descriptor_rtx *desc = pool->first; desc; desc = desc->next)
+if (desc->mark > 0
+   && GET_CODE (desc->constant) == CONST_VECTOR
+   && VECTOR_MODE_P (desc->mode)
+   && GET_MODE_CLASS (desc->mode) != MODE_VECTOR_BOOL
+   && !(SYMBOL_REF_HAS_BLOCK_INFO_P (desc->sym)
+&& SYMBOL_REF_BLOCK (desc->sym))
+   && desc->labelno >= 0)
+  {
+   scalar_mode submode = GET_MODE_INNER (desc->mode);
+   unsigned int subalign = MIN (desc->align, GET_MODE_BITSIZE (submode));
+   int units = GET_MODE_NUNITS (desc->mode);
+
+   for (int i = 0; i < units; i++)
+ {
+   if (i != 0
+   && rtx_equal_p (CONST_VECTOR_ELT (desc->constant, i),
+   CONST_VECTOR_ELT (desc->constant, i - 1)))
+ continue;
+
+   constant_descriptor_rtx tmp;
+   tmp.constant = CONST_VECTOR_ELT (desc->constant, i);
+   tmp.mode = submode;
+   hashval_t hash = const_rtx_hash (tmp.constant);
+   constant_descriptor_rtx *eldesc
+ = pool->const_rtx_htab->find_with_hash (&tmp, hash);
+   if (eldesc
+   && eldesc->mark > 0
+   && eldesc->align <= subalign
+   && !(SYMBOL_REF_HAS_BLOCK_INFO_P (eldesc->sym)
+&& SYMBOL_REF_BLOCK (eldesc->sym)))
+ {
+   eldesc->mark = ~desc->labelno;
+   eldesc->offset = i * GET_MODE_SIZE (submode);
+ }
+ }
+  }
+}
+
 /* Mark all constants that are used in the current function, then write
out the function's private constant pool.  */
 
@@ -4251,6 +4317,9 @@ output_constant_pool (const char *fnname
 void
 output_shared_constant_pool (void)
 {
+  if (TARGET_SUPPORTS_ALIASES)
+optimize_constant_pool (shared_constant_pool);
+
   output_constant_pool_contents (shared_constant_pool);
 }
 


Jakub



Re: [PATCH] streamline TARGET_MEM_REF dumping

2020-08-28 Thread Christophe Lyon via Gcc-patches
Hi,


On Thu, 27 Aug 2020 at 13:07, Richard Biener  wrote:
>
> The following streamlines TARGET_MEM_REF dumping building
> on what we do for MEM_REF and thus dumping things like
> access type, TBAA type and base/clique.  I've changed it
> to do semantic dumping aka base + offset + step * index
> rather than the odd base: A, step: way.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> 2020-08-27  Richard Biener  
>
> * tree-pretty-print.c (dump_mem_ref): Handle TARGET_MEM_REFs.
> (dump_generic_node): Use dump_mem_ref also for TARGET_MEM_REF.
>
> * gcc.dg/tree-ssa/loop-19.c: Adjust.
> * gcc.dg/tree-ssa/loop-2.c: Likewise.
> * gcc.dg/tree-ssa/loop-3.c: Likewise.

This introduced 2 regressions on arm
(for instance --target arm-none-linux-gnueabihf --with-mode arm
--with-cpu cortex-a9 --with-fpu neon-fp16):
FAIL:gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times ivopts "&a" 1
gcc.dg/tree-ssa/scev-3.c: pattern found 2 times

FAIL:gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times ivopts "&a" 1
gcc.dg/tree-ssa/scev-5.c: pattern found 2 times

Christophe


> ---
>  gcc/testsuite/gcc.dg/tree-ssa/loop-19.c |  4 +-
>  gcc/testsuite/gcc.dg/tree-ssa/loop-2.c  |  1 -
>  gcc/testsuite/gcc.dg/tree-ssa/loop-3.c  |  3 +-
>  gcc/tree-pretty-print.c | 89 -
>  4 files changed, 30 insertions(+), 67 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> index af7a3daddec..0c73111c1ee 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> @@ -22,6 +22,6 @@ void tuned_STREAM_Copy()
> However, due to a bug in jump threading, we end up peeling one iteration 
> from
> the loop, which creates an additional occurrence.  */
>
> -/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )a," 2 
> "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "MEM.(base: &|symbol: )c," 2 
> "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\[^;\]*&a" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\[^;\]*&c" 1 "optimized" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> index bda25167353..e58561a6650 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
> @@ -27,7 +27,6 @@ void xxx(void)
>
>  /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } 
> */
>  /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } 
> */
> -/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
>
>  /* 17 * iter should be strength reduced.  */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> index d3b26b7ad19..74491f80e49 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-3.c
> @@ -20,8 +20,7 @@ void xxx(void)
>  /* Access to arr_base[iter].y should not be strength reduced, since
> we have a memory mode including multiplication by 4.  */
>
> -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "step:" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\[^;\]* \* 4" 1 "optimized" } } */
>
>  /* And original induction variable should be preserved.  */
>
> diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
> index 655061c174d..075a3fca766 100644
> --- a/gcc/tree-pretty-print.c
> +++ b/gcc/tree-pretty-print.c
> @@ -1441,7 +1441,7 @@ dump_omp_atomic_memory_order (pretty_printer *pp, enum 
> omp_memory_order mo)
>  static void
>  dump_mem_ref (pretty_printer *pp, tree node, int spc, dump_flags_t flags)
>  {
> -  if (flags & TDF_GIMPLE)
> +  if (TREE_CODE (node) == MEM_REF && (flags & TDF_GIMPLE))
>  {
>pp_string (pp, "__MEM <");
>dump_generic_node (pp, TREE_TYPE (node),
> @@ -1472,7 +1472,8 @@ dump_mem_ref (pretty_printer *pp, tree node, int spc, 
> dump_flags_t flags)
> }
>pp_right_paren (pp);
>  }
> -  else if (integer_zerop (TREE_OPERAND (node, 1))
> +  else if (TREE_CODE (node) == MEM_REF
> +  && integer_zerop (TREE_OPERAND (node, 1))
>/* Dump the types of INTEGER_CSTs explicitly, for we can't
>   infer them and MEM_ATTR caching will share MEM_REFs
>   with differently-typed op0s.  */
> @@ -1541,12 +1542,33 @@ dump_mem_ref (pretty_printer *pp, tree node, int spc, 
> dump_flags_t flags)
>dump_generic_node (pp, op1type, spc, flags | TDF_SLIM, false);
>pp_right_paren (pp);
>dump_generic_node (pp, op0, spc, flags, false);
> -  if (!integer_zerop (op1))
> -  if (!integer_zerop (TREE_OPERAND (node, 1)))
> +  if (TREE_CODE (node) == MEM_REF
> + && !integer_zerop (op1))
> {
>   pp_string (pp, " + ");
>   dump_generic_node (pp, op1, spc, flags, 

Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-28 Thread Christophe Lyon via Gcc-patches
Hi Alex,


On Wed, 26 Aug 2020 at 17:15, Alex Coplan  wrote:
>
> Thanks for the review, both.
>
> On 26/08/2020 09:19, Vladimir Makarov wrote:
> >
> > On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > > Alex Coplan  writes:
> > >
> > > Minor nit, should be formatted as:
> > >
> > > static rtx
> > > canonicalize_reload_addr (rtx addr)
> > Sorry for missing this.  Alex, it should be fixed anyway.
> > >
> > > I don't think we should restrict this to (plus (mult X Y) Z),
> > > since addresses can be more complicated than that.  One way to
> > > search for all MULTs is:
> > >
> > >subrtx_iterator::array_type array;
> > >FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> > >  {
> > >rtx x = *iter;
> > >if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> > >  ...
> > >  }
> > >
> > > (Needs rtl-iter.h)
> >
> > I agree it would be nice to process a general case.  Alex, you can do
> > this as a separate patch if you want.
> >
> > Richard, thank you for looking at this patch too.
> >
> >
>
> Please find a reworked version of the patch attached incorporating
> Richard's feedback.
>
> Testing:
>  * Bootstrap and regtest on aarch64-none-linux-gnu,
>arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.
>

Unfortunately, the new test fails on aarch64 with mabi=ilp32
FAIL: gcc.target/aarch64/mem-shift-canonical.c scan-assembler-times add_lsl_di 3

Christophe

> Is the updated patch OK for master?
>
> Thanks,
> Alex


Re: [PATCH] tree-optimization/96579 - another special-operands fix in reassoc

2020-08-28 Thread Christophe Lyon via Gcc-patches
Hi,

On Thu, 27 Aug 2020 at 10:04, Richard Biener  wrote:
>
> This makes sure to put special-ops expanded rhs left where
> expression rewrite expects it.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> 2020-08-27  Richard Biener  
>
> PR tree-optimization/96579
> * tree-ssa-reassoc.c (linearize_expr_tree): If we expand
> rhs via special ops make sure to swap operands.
>
> * gcc.dg/pr96579.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/pr96579.c |  4 
>  gcc/tree-ssa-reassoc.c | 13 +++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr96579.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr96579.c b/gcc/testsuite/gcc.dg/pr96579.c
> new file mode 100644
> index 000..49fdcb40359
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr96579.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fno-tree-forwprop -ffast-math -fno-tree-vrp" } */
> +
> +#include "pr96370.c"

pr96370 has
/* { dg-do compile { target dfp } } */
which is needed here too, otherwise the test fails on arm/aarch64 and others.

I've pushed the obvious fix.

Christophe


> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index fed463b0350..a5f5d52edab 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -5651,13 +5651,20 @@ linearize_expr_tree (vec<operand_entry *> *ops, 
> gimple *stmt,
>
>if (!binrhsisreassoc)
> {
> - if (!try_special_add_to_ops (ops, rhscode, binrhs, binrhsdef))
> + bool swap = false;
> + if (try_special_add_to_ops (ops, rhscode, binrhs, binrhsdef))
> +   /* If we add ops for the rhs we expect to be able to recurse
> +  to it via the lhs during expression rewrite so swap
> +  operands.  */
> +   swap = true;
> + else
> add_to_ops_vec (ops, binrhs);
>
>   if (!try_special_add_to_ops (ops, rhscode, binlhs, binlhsdef))
> add_to_ops_vec (ops, binlhs);
>
> - return;
> + if (!swap)
> +   return;
> }
>
>if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -5676,6 +5683,8 @@ linearize_expr_tree (vec<operand_entry *> *ops, gimple 
> *stmt,
>   fprintf (dump_file, " is now ");
>   print_gimple_stmt (dump_file, stmt, 0);
> }
> +  if (!binrhsisreassoc)
> +   return;
>
>/* We want to make it so the lhs is always the reassociative op,
>  so swap.  */
> --
> 2.26.2


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-28 Thread Alexandre Oliva
On Aug 26, 2020, Qing Zhao  wrote:

> There are two issues I can see with adding a default generator in middle end:

> 1. In order to determine whether a target should not use the generic
> code to emit the zeroing sequence,
> a new target hook to determine this has to be added;

Yeah, a target hook whose default is the generic code, and that targets
that need it, or that benefit from it, can override.  That's the point
of hooks, to enable overriding.  Why should this be an issue?

> 2. In order to avoid the generated zeroing insns (which are simply
> insns that set registers) being deleted,
> we have to define a new insn “pro_epilogue_use” in the target.

Why won't a naked USE pattern do?  We already issue those in generic
code, for regs holding return values.  If we were to pretend that other
registers are also holding zeros as values to be returned, why shouldn't
that work for them as well?
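
Concretely, I'd expect something along these lines to be enough (a
hypothetical helper, not from any posted patch; emit_use, gen_rtx_REG
and the hard-reg-set macros are the existing internals):

/* Emit a naked USE for every zeroed register so the zeroing sets are
   not deleted as dead, mirroring what generic code already does for
   registers holding return values.  */
static void
use_zeroed_hard_regs (const HARD_REG_SET &zeroed)
{
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if (TEST_HARD_REG_BIT (zeroed, regno))
      emit_use (gen_rtx_REG (word_mode, regno));
}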

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] c: Silently ignore pragma region [PR85487]

2020-08-28 Thread Richard Biener via Gcc-patches
On Fri, Aug 28, 2020 at 3:26 AM Austin Morton via Gcc-patches
 wrote:
>
> #pragma region is a feature introduced by Microsoft in order to allow
> manual grouping and folding of code within Visual Studio.  It is
> entirely ignored by the compiler.  Clang has supported this feature
> since 2012 when in MSVC compatibility mode, and enabled it across the
> board 3 months ago.
>
> As it stands, you cannot use #pragma region within GCC without
> disabling unknown pragma warnings, which is not advisable.
>
> I propose GCC adopt "#pragma region" and "#pragma endregion" in order
> to alleviate these issues.  Because the pragma has no purpose at
> compile time, the implementation is trivial.

The patch misses documentation of the pragma.

Richard.

>
> Microsoft Documentation on the feature:
> https://docs.microsoft.com/en-us/cpp/preprocessor/region-endregion
>
> LLVM change which enabled pragma region across the board:
> https://reviews.llvm.org/D42248
>
> ---
>  gcc/c-family/ChangeLog   |  5 +
>  gcc/c-family/c-pragma.c  | 10 ++
>  gcc/testsuite/ChangeLog  |  5 +
>  gcc/testsuite/gcc.dg/pragma-region.c | 21 +
>  4 files changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pragma-region.c
>
> diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> index 1eaa99f31..97ba259cd 100644
> --- a/gcc/c-family/ChangeLog
> +++ b/gcc/c-family/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-08-27  Austin Morton  
> +
> + PR c/85487
> + * c-pragma.c (handle_pragma_region): Declare.
> +
>  2020-08-11  Jakub Jelinek  
>
>   PR c/96545
> diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
> index e3169e68f..de0411d07 100644
> --- a/gcc/c-family/c-pragma.c
> +++ b/gcc/c-family/c-pragma.c
> @@ -1166,6 +1166,13 @@ handle_pragma_message (cpp_reader *ARG_UNUSED(dummy))
>  TREE_STRING_POINTER (message));
>  }
>
> +/* Silently ignore region pragmas.  */
> +
> +static void
> +handle_pragma_region (cpp_reader *ARG_UNUSED(dummy))
> +{
> +}
> +
>  /* Mark whether the current location is valid for a STDC pragma.  */
>
>  static bool valid_location_for_stdc_pragma;
> @@ -1584,6 +1591,9 @@ init_pragma (void)
>
>c_register_pragma_with_expansion (0, "message", handle_pragma_message);
>
> +  c_register_pragma (0, "region", handle_pragma_region);
> +  c_register_pragma (0, "endregion", handle_pragma_region);
> +
>  #ifdef REGISTER_TARGET_PRAGMAS
>REGISTER_TARGET_PRAGMAS ();
>  #endif
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 5c1a45716..4033c111c 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-08-27  Austin Morton  
> +
> + PR c/85487
> + * gcc.dg/pragma-region.c: New test.
> +
>  2020-08-26  Jeff Law  
>
>   * gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
> diff --git a/gcc/testsuite/gcc.dg/pragma-region.c
> b/gcc/testsuite/gcc.dg/pragma-region.c
> new file mode 100644
> index 0..72cc2c144
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pragma-region.c
> @@ -0,0 +1,21 @@
> +/* Verify #pragma region and #pragma endregion do not emit warnings.  */
> +
> +/* { dg-options "-Wunknown-pragmas" } */
> +
> +#pragma region
> +
> +#pragma region name
> +
> +#pragma region "name"
> +
> +#pragma region()
> +
> +#pragma region("name")
> +
> +#pragma endregion
> +
> +#pragma endregion garbage
> +
> +#pragma endregion()
> +
> +#pragma endregion("garbage")
> --
> 2.17.1


Re: [PATCH] libstdc++: remove an ignored qualifier on function return type

2020-08-28 Thread Krystian Kuźniarek via Gcc-patches
Actually ChangeLog also uses tabs, so it should be:

libstdc++-v3/ChangeLog:

* include/std/variant: Fix -Wignored-qualifiers
in system headers.


On Fri, 28 Aug 2020 at 08:55, Krystian Kuźniarek <
krystian.kuznia...@gmail.com> wrote:

> > So then you need to produce a changelog entry by hand.
> I had this problem on some old Ubuntu 18.04. Anyway, here's the new ChangeLog:
>
> libstdc++-v3/ChangeLog:
>
> * include/std/variant: Fix -Wignored-qualifiers
> in system headers.
>
>
> >That doesn't test this header at all.
> It does but indirectly. What I meant by manual test was:
> ${GCC_GIT} -E contains_only_stdcpp_include.cpp > preprocessed.cpp
> ${GCC_GIT} -Wall -Wextra -pedantic -fsyntax-only preprocessed.cpp
> By manipulating GCC_GIT variable to trunk GCC and patched GCC, I checked
> if the warning is gone.
>
> >What about the libstdc++ testsuite?
> I hope you mean calling make bootstrap and make check. If that's ok, I
> confirm it works on Manjaro and Ubuntu 18.04 with gcc10 and gcc8
> respectively.
>
> >I don't remember exactly why I put it there, but I seem to recall it
> >was necessary.
> I don't know your reasons but I can only say that this patch seems to
> compile and work just fine.
>
> On Mon, 24 Aug 2020 at 13:38, Jonathan Wakely  wrote:
>
>> On 24/08/20 13:26 +0200, Krystian Kuźniarek via Libstdc++ wrote:
>> >Hi,
>> >
>> >First of all, sorry, I must have sent it as quoted-printable so spaces
>> and
>> >tabs are preserved.
>> >
>> >A description of the problem/bug and how your patch addresses it:
>> >I've got a small patch for -Wignored-qualifiers in system headers.
>> >
>> >Testcases:
>> >N/A, it's only a warning.
>> >
>> >ChangeLog:
>> >Sorry, contrib/mklog.py didn't quite work for me.
>> >For some reason after instruction in line 129: "diff = PatchSet(data)" my
>> >"diff" variable is always empty.
>>
>> So then you need to produce a changelog entry by hand.
>>
>> >Bootstrapping and testing:
>> >Tested that manually by recompiling GCC, unfolding all headers with
>> >`#include ` and compiling what's been included by it.
>>
>> That doesn't test this header at all.
>>
>> What about the libstdc++ testsuite?
>>
>> >The patch itself:
>> >
>> >diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
>> >index eb3d6779205..e8fcb57a401 100644
>> >--- a/libstdc++-v3/include/std/variant
>> >+++ b/libstdc++-v3/include/std/variant
>> >@@ -808,7 +808,7 @@ namespace __variant
>> >   { using element_type = _Tp; };
>> >
>> >   template 
>> >-  struct __untag_result
>> >+  struct __untag_result
>>
>> I don't remember exactly why I put it there, but I seem to recall it
>> was necessary.
>>
>>
>> >   : false_type
>> >   { using element_type = void(*)(_Args...); };
>> >
>> >
>> >
>> >Best regards,
>> >Krystian
>> >
>>
>>


Re: [PATCH] libstdc++: remove an ignored qualifier on function return type

2020-08-28 Thread Krystian Kuźniarek via Gcc-patches
> So then you need to produce a changelog entry by hand.
I had this problem on some old Ubuntu 18.04. Anyway, here's the new ChangeLog:

libstdc++-v3/ChangeLog:

* include/std/variant: Fix -Wignored-qualifiers
in system headers.


>That doesn't test this header at all.
It does but indirectly. What I meant by manual test was:
${GCC_GIT} -E contains_only_stdcpp_include.cpp > preprocessed.cpp
${GCC_GIT} -Wall -Wextra -pedantic -fsyntax-only preprocessed.cpp
By manipulating GCC_GIT variable to trunk GCC and patched GCC, I checked if
the warning is gone.

>What about the libstdc++ testsuite?
I hope you mean calling make bootstrap and make check. If that's ok, I
confirm it works on Manjaro and Ubuntu 18.04 with gcc10 and gcc8
respectively.

>I don't remember exactly why I put it there, but I seem to recall it
>was necessary.
I don't know your reasons but I can only say that this patch seems to
compile and work just fine.

On Mon, 24 Aug 2020 at 13:38, Jonathan Wakely  wrote:

> On 24/08/20 13:26 +0200, Krystian Kuźniarek via Libstdc++ wrote:
> >Hi,
> >
> >First of all, sorry, I must have sent it as quoted-printable so spaces and
> >tabs are preserved.
> >
> >A description of the problem/bug and how your patch addresses it:
> >I've got a small patch for -Wignored-qualifiers in system headers.
> >
> >Testcases:
> >N/A, it's only a warning.
> >
> >ChangeLog:
> >Sorry, contrib/mklog.py didn't quite work for me.
> >For some reason after instruction in line 129: "diff = PatchSet(data)" my
> >"diff" variable is always empty.
>
> So then you need to produce a changelog entry by hand.
>
> >Bootstrapping and testing:
> >Tested that manually by recompiling GCC, unfolding all headers with
> >`#include ` and compiling what's been included by it.
>
> That doesn't test this header at all.
>
> What about the libstdc++ testsuite?
>
> >The patch itself:
> >
> >diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
> >index eb3d6779205..e8fcb57a401 100644
> >--- a/libstdc++-v3/include/std/variant
> >+++ b/libstdc++-v3/include/std/variant
> >@@ -808,7 +808,7 @@ namespace __variant
> >   { using element_type = _Tp; };
> >
> >   template 
> >-  struct __untag_result
> >+  struct __untag_result
>
> I don't remember exactly why I put it there, but I seem to recall it
> was necessary.
>
>
> >   : false_type
> >   { using element_type = void(*)(_Args...); };
> >
> >
> >
> >Best regards,
> >Krystian
> >
>
>


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-28 Thread Martin Liška

On 8/28/20 1:29 AM, Martin Sebor wrote:

With --enable-valgrind-annotations the change to the member function
signature in this patch triggers compilation errors during bootstrap:


I must admit I didn't test that configuration. Feel free to install
the patch, it's obvious.

Thank you,
Martin


Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-28 Thread Richard Biener via Gcc-patches
On Thu, Aug 27, 2020 at 3:20 PM Jakub Jelinek  wrote:
>
> On Thu, Aug 27, 2020 at 03:07:59PM +0200, Richard Biener wrote:
> > > Also, isn't the pass also useful for TARGET_AVX and above (but in that 
> > > case
> > > only if it is a simple memory load)?  Or are avx/avx2 broadcast slower 
> > > than
> > > full vector loads?
> > >
> > > As Jeff wrote, I wonder if when successfully replacing those pool 
> > > constants
> > > the old constant pool entries will be omitted.
> > >
> > > Another thing I wonder about is whether more analysis shouldn't be used.
> > > E.g. if the constant pool entry is already emitted into .rodata anyway
> > > (e.g. some earlier function needed it), using the broadcast will mean
> > > actually larger .rodata.  If {1to8} and similar is as fast as reading all
> > > the same elements from memory (or faster), perhaps in that case it should
> > > broadcast from the first element of the existing constant pool full vector
> > > rather than creating a new one.
> > > And similarly, perhaps the function should look at all constant pool 
> > > entries
> > > in the current function (not yet emitted into .rodata) and if it would
> > > succeed for some and not for others, either use broadcast from its first
> > > element or not perform it for the others too.
> >
> > IIRC I once implemented this (re-using vector constant components
> > for non-vector pool entries) but it was quite hackish and never merged
> > it seems.
>
> If the generic constant pool code could do it, it would of course simplify
> this pass.  Not sure if the case where earlier function emits already some
> smaller constant and later function needs a CONST_VECTOR containing that can
> be handled at all (probably not), but if the same function has both scalar
> pool entries and CONST_VECTOR ones that contain those, or already emitted
> CONST_VECTOR pool entry has them, it shouldn't be that hard, at least for
> targets with symbol aliases, e.g. by using .LC33 = .LC24 or .LC34 = .LC24 + 8
> where .LC33 or .LC34 would be the scalar pool entry label and .LC24
> CONST_VECTOR containing those.
>
> Seems constant pool marking is performed during
> mark_constant_pool called during final from assemble_start_function or
> assemble_end_function, so if the pass replaces the constants before final
> and the constants are unused, they won't be emitted.

IIRC elsewhere it was discussed to use ld to perform merging by
emitting separate rodata sections for constants of each size (4, 8,
16, 32 and 64 bytes).
ld could always direct 8-byte constant refs to the larger pool's (sub-)entry.

As for GCCs constant pool code the issue is in the way lookup works
(hashing) and my earlier patch was recording an additional descriptor
for the first vector element IIRC.  So indeed if first the scalar is emitted
and then the vector this won't work easily (we'd need to be able to
associate multiple labels with a constant), but it could be made to work, too.

I guess it would be interesting to experiment with function local pools
ordered by accesses to reduce memory bandwidth cost (and pressure
on prefetchers) for memory bandwidth starved code.
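
To make the tradeoff concrete, here is a minimal sketch (assuming
AVX512F; the function name is made up) of the kind of constant this
affects:

  #include <immintrin.h>

  __m512i
  add3 (__m512i x)
  {
    /* The all-3s operand can be materialized either as a full 64-byte
       constant pool entry loaded with vmovdqa64, or as a single 4-byte
       pool entry combined with an embedded broadcast like {1to16}.  */
    return _mm512_add_epi32 (x, _mm512_set1_epi32 (3));
  }

With the broadcast form the pool entry shrinks from 64 bytes to 4, at
the cost of whatever the broadcast costs on the load side.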

Richard.

> Jakub
>


Re: [PATCH] libstdc++: mark variables as possibly unused

2020-08-28 Thread Krystian Kuźniarek via Gcc-patches
>Test suite confirmation:
>All tests pass. Tested on both Manjaro and some Ubuntu 18.04 with gcc10
and gcc8 respectively.

Jonathan, one more thing. I hope it's what you asked for, because all I did
was:
make bootstrap
make check

On Fri, 28 Aug 2020 at 08:32, Krystian Kuźniarek <
krystian.kuznia...@gmail.com> wrote:

> Ok, so here it is.
> New diff:
>
> diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
> index 015acef83c4..c4a763aae5c 100644
> --- a/libstdc++-v3/include/bits/atomic_base.h
> +++ b/libstdc++-v3/include/bits/atomic_base.h
> @@ -231,7 +231,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _GLIBCXX_ALWAYS_INLINE void
>  clear(memory_order __m = memory_order_seq_cst) noexcept
>  {
> -  memory_order __b = __m & __memory_order_mask;
> +  memory_order __b __attribute__ ((__unused__))
> += __m & __memory_order_mask;
>__glibcxx_assert(__b != memory_order_consume);
>__glibcxx_assert(__b != memory_order_acquire);
>__glibcxx_assert(__b != memory_order_acq_rel);
> @@ -242,7 +243,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _GLIBCXX_ALWAYS_INLINE void
>  clear(memory_order __m = memory_order_seq_cst) volatile noexcept
>  {
> -  memory_order __b = __m & __memory_order_mask;
> +  memory_order __b __attribute__ ((__unused__))
> += __m & __memory_order_mask;
>__glibcxx_assert(__b != memory_order_consume);
>__glibcxx_assert(__b != memory_order_acquire);
>__glibcxx_assert(__b != memory_order_acq_rel);
> @@ -416,7 +418,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX_ALWAYS_INLINE void
>store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
>{
> - memory_order __b = __m & __memory_order_mask;
> + memory_order __b __attribute__ ((__unused__))
> +   = __m & __memory_order_mask;
>   __glibcxx_assert(__b != memory_order_acquire);
>   __glibcxx_assert(__b != memory_order_acq_rel);
>   __glibcxx_assert(__b != memory_order_consume);
> @@ -428,7 +431,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>store(__int_type __i,
>   memory_order __m = memory_order_seq_cst) volatile noexcept
>{
> - memory_order __b = __m & __memory_order_mask;
> + memory_order __b __attribute__ ((__unused__))
> +   = __m & __memory_order_mask;
>   __glibcxx_assert(__b != memory_order_acquire);
>   __glibcxx_assert(__b != memory_order_acq_rel);
>   __glibcxx_assert(__b != memory_order_consume);
> @@ -439,7 +443,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX_ALWAYS_INLINE __int_type
>load(memory_order __m = memory_order_seq_cst) const noexcept
>{
> - memory_order __b = __m & __memory_order_mask;
> + memory_order __b __attribute__ ((__unused__))
> +   = __m & __memory_order_mask;
>   __glibcxx_assert(__b != memory_order_release);
>   __glibcxx_assert(__b != memory_order_acq_rel);
>
> @@ -449,7 +454,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX_ALWAYS_INLINE __int_type
>load(memory_order __m = memory_order_seq_cst) const volatile noexcept
>{
> - memory_order __b = __m & __memory_order_mask;
> + memory_order __b __attribute__ ((__unused__))
> +   = __m & __memory_order_mask;
>   __glibcxx_assert(__b != memory_order_release);
>   __glibcxx_assert(__b != memory_order_acq_rel);
>
> @@ -475,8 +481,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>compare_exchange_weak(__int_type& __i1, __int_type __i2,
>   memory_order __m1, memory_order __m2) noexcept
>{
> - memory_order __b2 = __m2 & __memory_order_mask;
> - memory_order __b1 = __m1 & __memory_order_mask;
> + memory_order __b2 __attribute__ ((__unused__))
> +   = __m2 & __memory_order_mask;
> + memory_order __b1 __attribute__ ((__unused__))
> +   = __m1 & __memory_order_mask;
>   __glibcxx_assert(__b2 != memory_order_release);
>   __glibcxx_assert(__b2 != memory_order_acq_rel);
>   __glibcxx_assert(__b2 <= __b1);
> @@ -490,8 +498,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   memory_order __m1,
>   memory_order __m2) volatile noexcept
>{
> - memory_order __b2 = __m2 & __memory_order_mask;
> - memory_order __b1 = __m1 & __memory_order_mask;
> + memory_order __b2 __attribute__ ((__unused__))
> +   = __m2 & __memory_order_mask;
> + memory_order __b1 __attribute__ ((__unused__))
> +   = __m1 & __memory_order_mask;
>   __glibcxx_assert(__b2 != memory_order_release);
>   __glibcxx_assert(__b2 != memory_order_acq_rel);
>   __glibcxx_assert(__b2 <= __b1);
> @@ -520,8 +530,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>compare_exchange_strong(__int_type& __i1, __int_type __i2,
> memory_order __m1, memory_order __m2) noexcept
>{
> - memory_order __b2 = __m2 & __memory_order_mask;
> - 

Re: [PATCH] libstdc++: mark variables as possibly unused

2020-08-28 Thread Krystian Kuźniarek via Gcc-patches
Ok, so here it is.
New diff:

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 015acef83c4..c4a763aae5c 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -231,7 +231,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_ALWAYS_INLINE void
 clear(memory_order __m = memory_order_seq_cst) noexcept
 {
-  memory_order __b = __m & __memory_order_mask;
+  memory_order __b __attribute__ ((__unused__))
+= __m & __memory_order_mask;
   __glibcxx_assert(__b != memory_order_consume);
   __glibcxx_assert(__b != memory_order_acquire);
   __glibcxx_assert(__b != memory_order_acq_rel);
@@ -242,7 +243,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_ALWAYS_INLINE void
 clear(memory_order __m = memory_order_seq_cst) volatile noexcept
 {
-  memory_order __b = __m & __memory_order_mask;
+  memory_order __b __attribute__ ((__unused__))
+= __m & __memory_order_mask;
   __glibcxx_assert(__b != memory_order_consume);
   __glibcxx_assert(__b != memory_order_acquire);
   __glibcxx_assert(__b != memory_order_acq_rel);
@@ -416,7 +418,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_ALWAYS_INLINE void
   store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
   {
-   memory_order __b = __m & __memory_order_mask;
+   memory_order __b __attribute__ ((__unused__))
+ = __m & __memory_order_mask;
__glibcxx_assert(__b != memory_order_acquire);
__glibcxx_assert(__b != memory_order_acq_rel);
__glibcxx_assert(__b != memory_order_consume);
@@ -428,7 +431,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   store(__int_type __i,
memory_order __m = memory_order_seq_cst) volatile noexcept
   {
-   memory_order __b = __m & __memory_order_mask;
+   memory_order __b __attribute__ ((__unused__))
+ = __m & __memory_order_mask;
__glibcxx_assert(__b != memory_order_acquire);
__glibcxx_assert(__b != memory_order_acq_rel);
__glibcxx_assert(__b != memory_order_consume);
@@ -439,7 +443,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_ALWAYS_INLINE __int_type
   load(memory_order __m = memory_order_seq_cst) const noexcept
   {
-   memory_order __b = __m & __memory_order_mask;
+   memory_order __b __attribute__ ((__unused__))
+ = __m & __memory_order_mask;
__glibcxx_assert(__b != memory_order_release);
__glibcxx_assert(__b != memory_order_acq_rel);

@@ -449,7 +454,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_ALWAYS_INLINE __int_type
   load(memory_order __m = memory_order_seq_cst) const volatile noexcept
   {
-   memory_order __b = __m & __memory_order_mask;
+   memory_order __b __attribute__ ((__unused__))
+ = __m & __memory_order_mask;
__glibcxx_assert(__b != memory_order_release);
__glibcxx_assert(__b != memory_order_acq_rel);

@@ -475,8 +481,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   compare_exchange_weak(__int_type& __i1, __int_type __i2,
memory_order __m1, memory_order __m2) noexcept
   {
-   memory_order __b2 = __m2 & __memory_order_mask;
-   memory_order __b1 = __m1 & __memory_order_mask;
+   memory_order __b2 __attribute__ ((__unused__))
+ = __m2 & __memory_order_mask;
+   memory_order __b1 __attribute__ ((__unused__))
+ = __m1 & __memory_order_mask;
__glibcxx_assert(__b2 != memory_order_release);
__glibcxx_assert(__b2 != memory_order_acq_rel);
__glibcxx_assert(__b2 <= __b1);
@@ -490,8 +498,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
memory_order __m1,
memory_order __m2) volatile noexcept
   {
-   memory_order __b2 = __m2 & __memory_order_mask;
-   memory_order __b1 = __m1 & __memory_order_mask;
+   memory_order __b2 __attribute__ ((__unused__))
+ = __m2 & __memory_order_mask;
+   memory_order __b1 __attribute__ ((__unused__))
+ = __m1 & __memory_order_mask;
__glibcxx_assert(__b2 != memory_order_release);
__glibcxx_assert(__b2 != memory_order_acq_rel);
__glibcxx_assert(__b2 <= __b1);
@@ -520,8 +530,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   compare_exchange_strong(__int_type& __i1, __int_type __i2,
  memory_order __m1, memory_order __m2) noexcept
   {
-   memory_order __b2 = __m2 & __memory_order_mask;
-   memory_order __b1 = __m1 & __memory_order_mask;
+   memory_order __b2 __attribute__ ((__unused__))
+ = __m2 & __memory_order_mask;
+   memory_order __b1 __attribute__ ((__unused__))
+ = __m1 & __memory_order_mask;
__glibcxx_assert(__b2 != memory_order_release);
__glibcxx_assert(__b2 != memory_order_acq_rel);
__glibcxx_assert(__b2 <= __b1);
@@ -535,8 +547,10 @@ 

[PATCH] C-SKY: Add compatibility of elf target name

2020-08-28 Thread Jojo R
gcc/ChangeLog:

* config.gcc (C-SKY): Add compatibility of elf target name.

libgcc/ChangeLog:

* config.host (C-SKY): Add compatibility of elf target name.
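
To illustrate the effect of the relaxed glob (cskyv2-elf is a made-up
variant triplet, shown only as an example):

  csky-unknown-elf   # matched by both csky-*-elf* and csky*-elf*
  csky-elf           # matched only by csky*-elf*
  cskyv2-elf         # matched only by csky*-elf*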

---
 gcc/config.gcc | 2 +-
 libgcc/config.host | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ed9697b..f2324ba 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1558,7 +1558,7 @@ csky-*-*)
fi
 
case ${target} in
-   csky-*-elf*)
+   csky*-elf*)
tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file} 
csky/csky-elf.h"
tmake_file="csky/t-csky csky/t-csky-elf"
default_use_cxa_atexit=no
diff --git a/libgcc/config.host b/libgcc/config.host
index c529cc4..c291a8a 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -586,7 +586,7 @@ cris-*-elf)
 cris-*-linux* | crisv32-*-linux*)
tmake_file="$tmake_file cris/t-cris t-softfp-sfdf t-softfp cris/t-linux"
;;
-csky-*-elf*)
+csky*-elf*)
tmake_file="csky/t-csky t-fdpbit"
extra_parts="$extra_parts crti.o crtn.o"
;;
-- 
1.9.1



Re: [PATCH] rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

2020-08-28 Thread Segher Boessenkool
On Fri, Aug 28, 2020 at 10:48:43AM +0930, Alan Modra wrote:
> On Thu, Aug 27, 2020 at 03:17:45PM -0500, Segher Boessenkool wrote:
> > On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> > It not the copy that is unnecessary: the preventing it *here*, manually,
> > is what is unnecessary.
> 
> Blame me for the original !rtx_equal_p in rs6000_call_aix that Bill
> copied.  So does emit_move_insn prevent the copy?  I can't spot where,
> maybe I haven't looked hard enough.

It doesn't.  But this is generated during expand already, many later
passes can optimise it away.

> If emit_move_insn doesn't prevent it, then why create useless RTL that
> is only going to make work for optimisation passes that remove such
> nops?

1) Very many unnecessary moves are generated during expand *anyway*, a
few more will not hurt;
2) In practice we always generate a move from a pseudo into r12 here,
never a copy from r12 into r12;
3) Why generate dead code optimising cases that do not happen, making
the compiler bigger and slower, but more importantly, making the
compiler code harder to read?

Such optimisations can be useful for code we generate very late, but
not for stuff run at expand.  It is distracting to the reader, who then
can think a copy from r12 to r12 can happen here, and even is frequent
enough to warrant a special optimisation.

Older GCC was chock-full of such questionable micro-optimisations.  But
imo we should try to make the code base more modern now.
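
For concreteness, the only shape that occurs here in practice looks
like this (illustrative RTL; pseudo 123 is made up):

  (set (reg:DI 12)
       (reg:DI 123))    ;; copy of a pseudo into r12, a real move

so guarding against a hypothetical r12-to-r12 nop copy buys nothing.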


Segher


Re: [PATCH] cmpelim: recognize extra clobbers in insns

2020-08-28 Thread Pip Cet via Gcc-patches
On Mon, Aug 24, 2020 at 6:12 PM Jeff Law  wrote:
> On Thu, 2020-08-06 at 12:42 +, Pip Cet via Gcc-patches wrote:
> > I've bootstrapped and run the test suite with the patch, without 
> > differences.
> So it looks like Richard has given you some feedback and you've got some 
> further
> work to do.  I just wanted to chime in and say thanks for tackling this.

Thanks for the encouragement!

Richard wrote:
> We should be able to test whether the rest of the parallel is suitable by 
> adding a single_set test (after testing everything else).

It seems to me the existing two checks using single_set are sufficient
to ensure this.

Revised patch attached. It bootstraps and causes no test regressions,
and it appears to work.
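
To illustrate with the andhi3 pattern from the commit message below,
the transformation the pass can now validate looks roughly like this
(a sketch only; operands simplified):

  ;; before: the operation clobbers CC plus a scratch
  (parallel [(set (reg:HI 0) (and:HI (reg:HI 1) (reg:HI 2)))
             (clobber (reg:CC REG_CC))
             (clobber (scratch:QI))])

  ;; after: the CC clobber becomes a compare-set and the extra
  ;; clobber is carried over
  (parallel [(set (reg:CC REG_CC)
                  (compare:CC (and:HI (reg:HI 1) (reg:HI 2))
                              (const_int 0)))
             (set (reg:HI 0) (and:HI (reg:HI 1) (reg:HI 2)))
             (clobber (scratch:QI))])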
From 72e05ac3a570f3e6ca9e54599db1b3b4daa84e90 Mon Sep 17 00:00:00 2001
From: Pip Cet 
Date: Tue, 25 Aug 2020 14:23:58 +
Subject: [PATCH] cmpelim: handle extra clobbers

Handle extra clobbers in CC-clobbering insns when attempting to
recognize the corresponding CC-setting insn.

This is for the AVR CCmode conversion. AVR has insns like

(define_insn "andhi3"
  [(set (match_operand:HI 0 ...)
(and:HI (match_operand:HI 1 ...)
	(match_operand:HI 2 ...)))
   (clobber (reg:CC REG_CC))
   (clobber (match_scratch:QI 3 ...))] ...)

which can be profitably converted into CC-setting variants.

2020-08-28  Pip Cet  

gcc/ChangeLog:

	* compare-elim.c (arithmetic_flags_clobber_p): Allow extra
	clobbers.
	(try_validate_parallel): Likewise.
	(try_eliminate_compare): Likewise.
---
 gcc/compare-elim.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 85f3e344074..8e5c9a05fa8 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -202,7 +202,7 @@ arithmetic_flags_clobber_p (rtx_insn *insn)
   if (asm_noperands (pat) >= 0)
 return false;
 
-  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
+  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) >= 2)
 {
   x = XVECEXP (pat, 0, 0);
   if (GET_CODE (x) != SET)
@@ -663,6 +663,17 @@ static rtx
 try_validate_parallel (rtx set_a, rtx set_b)
 {
   rtx par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
+  if (GET_CODE (set_b) == PARALLEL)
+{
+  int len = XVECLEN (set_b, 0);
+  rtvec v = rtvec_alloc (len);
+  RTVEC_ELT (v, 0) = set_a;
+  RTVEC_ELT (v, 1) = XVECEXP (set_b, 0, 0);
+  for (int i = 2; i < len; i++)
+	RTVEC_ELT (v, i) = XVECEXP (set_b, 0, i);
+
+  par = gen_rtx_PARALLEL (VOIDmode, v);
+}
   rtx_insn *insn = make_insn_raw (par);
 
   if (insn_invalid_p (insn, false))
@@ -873,10 +884,13 @@ try_eliminate_compare (struct comparison *cmp)
  [(set (reg:CCM) (compare:CCM (operation) (immediate)))
   (set (reg) (operation)]  */
 
-  rtvec v = rtvec_alloc (2);
+  int len = XVECLEN (PATTERN (insn), 0);
+  rtvec v = rtvec_alloc (len);
   RTVEC_ELT (v, 0) = y;
   RTVEC_ELT (v, 1) = x;
-  
+  for (int i = 2; i < len; i++)
+RTVEC_ELT (v, i) = XVECEXP (PATTERN (insn), 0, i);
+
   rtx pat = gen_rtx_PARALLEL (VOIDmode, v);
   
   /* Succeed if the new instruction is valid.  Note that we may have started
-- 
2.28.0