Re: [PATCH v3] x86: Don't enable UINTR in 32-bit mode

2021-07-13 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 13, 2021 at 8:59 PM Jakub Jelinek  wrote:
>
> On Tue, Jul 13, 2021 at 09:35:18AM -0700, H.J. Lu wrote:
> > Here is the v3 patch.   OK for master?
>
> From my POV LGTM, but please give Uros a chance to chime in.
>
> > From ceab81ef97ab102c410830c41ba7fea911170d1a Mon Sep 17 00:00:00 2001
> > From: "H.J. Lu" 
> > Date: Fri, 9 Jul 2021 09:16:01 -0700
> > Subject: [PATCH v3] x86: Don't enable UINTR in 32-bit mode
> >
> > UINTR is available only in 64-bit mode.  Since the codegen target is
> > unknown when the gcc driver is processing -march=native, to properly
> > handle UINTR for -march=native:
> >
> > 1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
> > indicate 32-bit and 64-bit codegen.
> > 2. Change ix86_option_override_internal to enable UINTR only in 64-bit
> > mode for -march=CPU when PTA_CPU includes PTA_UINTR.
> >
> > gcc/
> >
> >   PR target/101395
> >   * config/i386/driver-i386.c (host_detect_local_cpu): Check
> >   "arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
> >   Enable UINTR only for 64-bit codegen.
> >   * config/i386/i386-options.c
> >   (ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
> >   in 64-bit mode.
> >   * config/i386/i386.h (ARCH_ARG): New.
> >   (CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
> >   "[arch|tune] 64" for 64-bit codegen.
> >
> > gcc/testsuite/
> >
> >   PR target/101395
> >   * gcc.target/i386/pr101395-1.c: New test.
> >   * gcc.target/i386/pr101395-2.c: Likewise.
> >   * gcc.target/i386/pr101395-3.c: Likewise.

OK.

Thanks,
Uros.


Re: [patch] PR jit/87808: Allow libgccjit to work without an external gcc driver

2021-07-13 Thread Matthias Klose
On 7/13/21 8:41 AM, Richard Biener wrote:
> On Mon, Jul 12, 2021 at 11:00 PM Matthias Klose  wrote:
>>
>> On 3/26/19 12:52 PM, Matthias Klose wrote:
>>> On 22.03.19 23:00, David Malcolm wrote:
 On Thu, 2019-03-21 at 12:26 +0100, Matthias Klose wrote:
> Fix PR jit/87808, the embedded driver still needing the external gcc
> driver to
> find the gcc_lib_dir. This can happen in a packaging context when
> libgccjit
> doesn't depend on the gcc package, but just on binutils and libgcc-
> dev packages.
> libgccjit probably could use /proc/self/maps to find the gcc_lib_dir,
> but that
> doesn't seem to be very portable.
>
> Ok for the trunk and the branches?
>
> Matthias

 [CCing the jit list]

 I've been trying to reproduce this bug in a working copy, and failing.

 Matthias, do you have a recipe you've been using to reproduce this?
>>>
>>> the JIT debug log shows the driver names that it wants to call.  Are you
>>> sure that this driver isn't available anywhere?  I configure the gcc build
>>> with --program-suffix=-8 --program-prefix=x86_64-linux-gnu-, and that one
>>> was only available in one place, /usr/bin.
>>>
>>> Matthias
>>
>> David, the bug report now has two more comments from people saying that the
>> current behavior is broken.  Could you please review the patch?
> 
> I think libgccjit should use the same strategy for finding the install
> location as the driver does itself.  I couldn't readily decipher its
> magic, but at least there's STANDARD_EXEC_PREFIX, which seems to be
> used as a possible fallback.

No, it's crtbeginS.o and libgcc.* which are not found in the
STANDARD_EXEC_PREFIX.

> In particular your patch doesn't seem to work with a DESTDIR=
> install?

It does.  Usually you build as configure && make && make install, with a
DESTDIR set only for the last step, which doesn't rebuild any object file.

> Can we instead add a --with-gccjit-install-dir= or sth like that (whatever
> path to whatever files the JIT exactly looks for)?

That should be possible, moving the definition of FALLBACK_GCC_EXEC_PREFIX
from the Makefile to a value specified at configure time.  Or is there
already a macro that doesn't get prefixed by DESTDIR?

Matthias


Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?

2021-07-13 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu  wrote:
>
> Hi:
>   The original problem was that some users wanted the cmdline option
> -ffast-math not to act on code produced from intrinsics, i.e. for code
> like
>
> #include <immintrin.h>
> __m256d
> foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
> {
> __m256d tmp = _mm256_add_pd (a, b);
> tmp = _mm256_sub_pd (tmp, c);
> tmp = _mm256_sub_pd (tmp, d);
> return tmp;
> }
>
> compiled with -O2 -mavx2 -ffast-math, users expected codes generated like
>
> vaddpd ymm0, ymm0, ymm1
> vsubpd ymm0, ymm0, ymm2
> vsubpd ymm0, ymm0, ymm3
>
> but not
>
> vsubpd ymm1, ymm1, ymm2
> vsubpd ymm0, ymm0, ymm3
> vaddpd ymm0, ymm1, ymm0
>
>
> On the LLVM side, there are mechanisms like
> #pragma float_control( precise, on, push)
> ...(intrinsics definition)..
> #pragma float_control(pop)
>
> When intrinsics are inlined, their IRs will be marked with
> "no-fast-math", and even if the caller is compiled with -ffast-math,
> reassociation only happens to those IRs which are not marked with
> "no-fast-math".  This seems more flexible, supporting fast-math
> control at region granularity (inside a function).
Testcase
https://godbolt.org/z/9cYMGGWPG
>
> Does GCC have a similar mechanism?
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: avoid early reference to debug-only symbol

2021-07-13 Thread Alexandre Oliva
On Jul 13, 2021, Richard Biener  wrote:

> Hmm, elsewhere in this function we're not anticipating future removal but
> instead use ->global_info_ready which IIRC is when the unit was
> initially analyzed.  So don't the other uses have the same issue?

Possibly.

The call to debug_hooks->late_global_decl in symtab::remove_unused_nodes
is only for varpool nodes, and only variables have initializers that are
expanded as if for non-debug uses, but the initializers *might*
reference function symbols, and that might in turn lead to analogous
codegen differences.  I don't have a testcase for that, but it might not
be hard to come up with an analogous one.

> Maybe reference_to_unused is the wrong tool here and we need a
> reference_to_discardable or so?

Yeah, back when we expanded debug info very late in compilation,
reference_to_unused was good enough, but now that we process and discard
stuff early, what we really need is some way to avoid issuing a hard
reference to something that might end up not be referenced at all if it
weren't for debug info.

> In other places we manage to use symbolic DIE references later resolved
> by note_variable_values, can we maybe do this unconditionally for the
> initializers of removed decls somehow?

That seems doable and desirable.  decContext.c doesn't generate location
or const_value for either mfctop or mfcone, even after removing
rtl_for_decl_location from add_location_or_const_value_attribute,
despite lookup_decl_loc's finding a location expression.

I'm afraid I can't really commit to pursuing this any further.

I'd be glad if someone more familiar with this were to pick this up, but
I haven't managed to come up with a testcase to trigger the problem
without the patchset that I'm not ready to contribute.

The kicker, in case someone wants to force it, is that at
materialize_all_clones, the reference to mfctop in a constprop-ed
decContextTestEndian gets substituted by its constant initializer,
so mfctop becomes unreachable and goes through late_global_decl
in remove_unreachable_nodes.  Eventually, ccp2 resolves the mfcone
dereference to a constant, and no reason remains for mfcone to be
output.  However, when outputting debug info, the expand_expr of
mfctop's initializer will have already generated RTL for mfcone,
committing it to be output, whereas without debug info, mfcone is not
output at all.

What enables these transformations during IPA is the introduction of a
wrapper for decContextTestEndian, moving its body to a clone that is
materialized early, at the end of all_small_ipa_passes.  This clone,
being a local function, seems to be what enables the substitution of the
mfctop constant initializer.  I haven't found a way to cause such a
difference without my pass, even getting a constprop (local) clone for
the function, and preventing its inlining into an exported function.

Hopefully this is enough information to enable the issue to be
investigated.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?

2021-07-13 Thread Hongtao Liu via Gcc-patches
Hi:
  The original problem was that some users wanted the cmdline option
-ffast-math not to act on code produced from intrinsics, i.e. for code
like

#include <immintrin.h>
__m256d
foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
{
__m256d tmp = _mm256_add_pd (a, b);
tmp = _mm256_sub_pd (tmp, c);
tmp = _mm256_sub_pd (tmp, d);
return tmp;
}

compiled with -O2 -mavx2 -ffast-math, users expected codes generated like

vaddpd ymm0, ymm0, ymm1
vsubpd ymm0, ymm0, ymm2
vsubpd ymm0, ymm0, ymm3

but not

vsubpd ymm1, ymm1, ymm2
vsubpd ymm0, ymm0, ymm3
vaddpd ymm0, ymm1, ymm0


On the LLVM side, there are mechanisms like
#pragma float_control( precise, on, push)
...(intrinsics definition)..
#pragma float_control(pop)

When intrinsics are inlined, their IRs will be marked with
"no-fast-math", and even if the caller is compiled with -ffast-math,
reassociation only happens to those IRs which are not marked with
"no-fast-math".  This seems more flexible, supporting fast-math
control at region granularity (inside a function).

Does GCC have a similar mechanism?


-- 
BR,
Hongtao


Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-13 Thread guojiufu via Gcc-patches

On 2021-07-14 04:50, Segher Boessenkool wrote:

Hi!

On Tue, Jul 13, 2021 at 08:50:46PM +0800, Jiufu Guo wrote:

* doc/tm.texi: Regenerated.


Pet peeve: "Regenerate.", no "d".


OK, yes.  Though both 'Regenerate' and 'Regenerated' have been used by
commits here and there :)





+DEFHOOK
+(preferred_doloop_mode,
+ "This hook returns a more preferred mode or the @var{mode} itself.",
+ machine_mode,
+ (machine_mode mode),
+ default_preferred_doloop_mode)


You need a bit more description here.  What does the value it returns
mean?  If you want to say "a more preferred mode or the mode itself",
you should explain what the difference means, too.


Ok, thanks.



You also should say the hook does not need to test if things will fit,
since the generic code already does.

And say this should return a MODE_INT always -- you never test for that
as far as I can see, but you don't need to, as long as everyone does the
sane thing.  So just state every hook implementation should :-)


Yes, the preferred 'doloop iv mode' from targets should be a MODE_INT.
I will add comments, and update the gcc_assert you mentioned below
for this.

Thanks a lot for your comments and suggestions!

When writing the wording, I keep adding and deleting, and it is still
hard to get it perfect :-(




+extern machine_mode
+default_preferred_doloop_mode (machine_mode);


One line please (this is a declaration).


+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+void foo(int *p1, long *p2, int s)
+{
+  int n, v, i;
+
+  v = 0;
+  for (n = 0; n <= 100; n++) {
+ for (i = 0; i < s; i++)
+if (p2[i] == n)
+   p1[i] = v;
+ v += 88;
+  }
+}
+
+/* { dg-final { scan-assembler-not {\mrldicl\M} } } */


That is a pretty fragile thing to test for.  It also need a line or two
of comment in the test case what this does, what kind of thing it does
not want to see.


Thanks! I will update accordingly.  And I'm thinking of adding tests to
check the doloop.xx type: no zero_extend to access a subreg.  This is
the intention of this patch.




+/* If PREFERRED_MODE is suitable and profitable, use the preferred
+   PREFERRED_MODE to compute doloop iv base from niter: base = niter + 1.  */

+
+static tree
+compute_doloop_base_on_mode (machine_mode preferred_mode, tree niter,
+			     const widest_int &iterations_max)
+{
+  tree ntype = TREE_TYPE (niter);
+  tree pref_type = lang_hooks.types.type_for_mode (preferred_mode, 1);

+
+  gcc_assert (pref_type && TYPE_UNSIGNED (ntype));


Should that be pref_type instead of ntype?  If not, write it as two
separate asserts please.


Ok, will separate as two asserts.




+static machine_mode
+rs6000_preferred_doloop_mode (machine_mode)
+{
+  return word_mode;
+}


This is fine if the generic code does the right thing if it passes say
TImode here, and if it never will pass some other mode class mode.


The generic code checks if the returned mode works correctly for the
doloop iv; if the preferred mode is not suitable (e.g.
preferred_doloop_mode returns DI, but niter is a large value in TI),
then the doloop.xx IV will use the original mode.


When a target really prefers TImode, and TImode can represent the
number of iterations, this would still work.  In current code,
word_mode is SImode/DImode on most targets, like Pmode.

On powerpc, it is DImode (for 64-bit) / SImode (for 32-bit).

Thanks again for your comments!

BR,
Jiufu




Segher


Re: [PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-07-13 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 14, 2021 at 10:34 AM liuhongt  wrote:
>
> By optimizing vector movement to broadcast in ix86_expand_vector_move
> during pass_expand, pass_reload/LRA can automatically generate an avx512
> embedded broadcast, pass_cpb is not needed.
>
> Considering that even in the absence of AVX512F, broadcasting a scalar
> from memory is still slightly faster than loading the entire vector from
> memory, always enable broadcast.
>
> benchmark:
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast
>
> The performance diff
>
> strategy: cycles
> memory  : 1046611188
> memory  : 1255420817
> memory  : 1044720793
> memory  : 1253414145
> average : 1097868397
>
> broadcast   : 1044430688
> broadcast   : 1044477630
> broadcast   : 1253554603
> broadcast   : 1044561934
> average : 1096756213
>
> However, broadcast has a larger code size.
>
> the size diff
>
> size broadcast.o
>    text    data     bss     dec     hex filename
> 137   0   0 137  89 broadcast.o
>
> size memory.o
>    text    data     bss     dec     hex filename
> 115   0   0 115  73 memory.o
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.c
> (ix86_broadcast_from_integer_constant): Rename to ..
> (ix86_broadcast_from_constant): .. this, and extend it to
> handle float mode.
> (ix86_expand_vector_move): Extend to float mode.
> * config/i386/i386-features.c
> (replace_constant_pool_with_broadcast): Remove.
> (remove_partial_avx_dependency_gate): Ditto.
> (constant_pool_broadcast): Ditto.
> (class pass_constant_pool_broadcast): Ditto.
> (make_pass_constant_pool_broadcast): Ditto.
> (remove_partial_avx_dependency): Adjust gate.
> * config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
> * config/i386/i386-protos.h
> (make_pass_constant_pool_broadcast): Remove.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
> ---
>  gcc/config/i386/i386-expand.c |  29 +++-
>  gcc/config/i386/i386-features.c   | 157 +-
>  gcc/config/i386/i386-passes.def   |   1 -
>  gcc/config/i386/i386-protos.h |   1 -
>  .../gcc.target/i386/fuse-caller-save-xmm.c|   2 +-
>  5 files changed, 26 insertions(+), 164 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 69ea79e6123..ba870145acd 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -453,8 +453,10 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>emit_insn (gen_rtx_SET (op0, op1));
>  }
>
> +/* OP is a memref of CONST_VECTOR, return scalar constant mem
> +   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
>  static rtx
> -ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
> +ix86_broadcast_from_constant (machine_mode mode, rtx op)
>  {
>int nunits = GET_MODE_NUNITS (mode);
>if (nunits < 2)
> @@ -462,7 +464,8 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
>
>/* Don't use integer vector broadcast if we can't move from GPR to SSE
>   register directly.  */
> -  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> +  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
> +  && INTEGRAL_MODE_P (mode))
>  return nullptr;
>
>/* Convert CONST_VECTOR to a non-standard SSE constant integer
> @@ -470,12 +473,17 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
>if (!(TARGET_AVX2
> || (TARGET_AVX
> && (GET_MODE_INNER (mode) == SImode
> -   || GET_MODE_INNER (mode) == DImode)))
> +   || GET_MODE_INNER (mode) == DImode))
> +   || FLOAT_MODE_P (mode))
>|| standard_sse_constant_p (op, mode))
>  return nullptr;
>
> -  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
> -  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
> +  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
> + We can still put 64-bit integer constant in memory when
> + avx512 embed broadcast is available.  */
> +  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
> +  && (!TARGET_AVX512F
> + || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
>  return nullptr;
>
>if (GET_MODE_INNER (mode) == TImode)
> @@ -561,17 +569,20 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
>
>if (can_create_pseudo_p ()
>&& GET_MODE_SIZE (mode) >= 16
> -  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +  && VECTOR_MODE_P (mode)
>&& (MEM_P (op1)
>   && SYMBOL_REF_P (XEXP (op1, 0))
>   && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0))))
>  {
> -  rtx first = ix86_broadcast_from_integer_constant (mode, op1);
> +  rtx first = ix86_broadcast_from_constant (mode, op1);
>

Re: [PATCH] c++: constexpr array reference and value-initialization [PR101371]

2021-07-13 Thread Jason Merrill via Gcc-patches

On 7/13/21 8:15 PM, Marek Polacek wrote:

This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.


But not for the scalar case, surely?  Other similar places check 
AGGREGATE_TYPE_P || VECTOR_TYPE_P, or !SCALAR_TYPE_P.



In this test, we are evaluating

   ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) [0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

   gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

With that in mind, the fix is straightforward, except that when the
value-init produced an AGGR_INIT_EXPR, we shouldn't set ctx.object so
that

2508   if (DECL_CONSTRUCTOR_P (fun) && !ctx->object
2509   && TREE_CODE (t) == AGGR_INIT_EXPR)
2510 {
2511   /* We want to have an initialization target for an AGGR_INIT_EXPR.
2512  If we don't already have one in CTX, use the AGGR_INIT_EXPR_SLOT.  */
2513   new_ctx.object = AGGR_INIT_EXPR_SLOT (t);

comes into play.


Hmm, setting new_ctx.object to t here looks like it should be the 
correct c.arr[0], not ((B*)this)->a.  It was wrong in the current code 
because we weren't setting up new_ctx at all, but once that's fixed I 
don't think you need special AGGR_INIT_EXPR handling.



Bootstrapped/regtested on {x86_64,ppc64le,aarch64}-pc-linux-gnu, ok for trunk?

PR c++/101371

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate case too when value-initializing.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
---
  gcc/cp/constexpr.c| 15 ++
  .../g++.dg/cpp1y/constexpr-101371-2.C | 23 +++
  gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C | 29 +++
  3 files changed, 61 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 39787f3f5d5..584ef55703c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3844,23 +3844,26 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, 
tree t,
   initializer, it's initialized from {}.  But use build_value_init
   directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
tree val;
-  constexpr_ctx new_ctx;
if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
  return build_constructor (elem_type, NULL);
else if (CP_AGGREGATE_TYPE_P (elem_type))
  {
tree empty_ctor = build_constructor (init_list_type_node, NULL);
val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
-  new_ctx = *ctx;
-  new_ctx.object = t;
-  new_ctx.ctor = build_constructor (elem_type, NULL);
-  ctx = &new_ctx;
  }
else
  val = build_value_init (elem_type, tf_warning_or_error);
+
+  constexpr_ctx new_ctx = *ctx;
+  /* If we are using an AGGR_INIT_EXPR, clear OBJECT for now so that
+ cxx_eval_call_expression can make use of AGGR_INIT_EXPR_SLOT.  */
+  new_ctx.object = (TREE_CODE (val) == AGGR_INIT_EXPR
+   ? NULL_TREE : t);
+  new_ctx.ctor = build_constructor (elem_type, NULL);
+  ctx = &new_ctx;
t = cxx_eval_constant_expression (ctx, val, lval, non_constant_p,
overflow_p);
-  if (CP_AGGREGATE_TYPE_P (elem_type) && t != ctx->ctor)
+  if (t != ctx->ctor)
  free_constructor (ctx->ctor);
return t;
  }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
new file mode 100644
index 000..fb67b67c265
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
@@ -0,0 +1,23 @@
+// PR c++/101371
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int i;
+};
+struct B {
+  A a{};
+  constexpr B() : a() {}
+  constexpr B(const B ) 

Re: contracts library support (was Re: [PATCH] PING implement pre-c++20 contracts)

2021-07-13 Thread Jason Merrill via Gcc-patches

On 7/12/21 3:58 PM, Jonathan Wakely wrote:

On Mon, 5 Jul 2021 at 20:07, Jason Merrill  wrote:


On 6/26/21 10:23 AM, Andrew Sutton wrote:


I ended up taking over this work from Jeff (CC'd on his existing email
address). I scraped all the contracts changes into one big patch
against master. See attached. The ChangeLog.contracts files list the
sum of changes for the patch, not the full history of the work.


Jonathan, can you advise where the library support should go?

In N4820  was part of the language-support clause, which makes
sense, but it uses string_view, which brings in a lot of the rest of the
library.  Did LWG talk about this when contracts went in?  How are
freestanding implementations expected to support contracts?


I don't recall that being discussed, but I think I was in another room
for much of the contracts review.

If necessary we could make the std::char_traits specialization
available freestanding, without the primary template (or the other
specializations). But since C++20 std::string_view also depends on
quite a lot of ranges, which depends on iterators, which is not
freestanding. Some of those dependencies were added more recently than
contracts was reviewed and then yanked out, so maybe wasn't considered
a big problem back then. In any case, depending on std::string_view
(even without the rest of std::basic_string_view) is not currently
possible for freestanding.


I guess I'll change string_view to const char* for now.


I imagine the header should be  for now.


Agreed.


And the type std::experimental::??::contract_violation.  Maybe 
contracts_v1 for the inline namespace?


Did you have any thoughts about the violation handler?  Is it OK to add 
a default definition to the library, in the above namespace?


Jason



Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-13 Thread Jason Merrill via Gcc-patches

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the 
change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the same 
value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 
it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent on vec 
being a thin wrapper around a pointer.  In almost all cases it can be 
replaced with {}; one exception is == comparison, where it seems to be 
testing that the embedded pointer is null, which is a weird thing to 
want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.


If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.


I already have a patch to replace all but one use of vNULL, but I'll 
hold off with it until after your patch.


Somewhat relatedly, use of vec variables or fields seems almost 
always a mistake, as they need explicit .release() that could be 
automatic with auto_vec, and is easy to forget.  For instance, the 
recursive call in get_all_loop_exits returns a vec that is never 
released.  And I see a couple of leaks in the C++ front end as well.


I agree.  The challenge I ran into with changing vec fields is with
code that uses the vec member as a reference to auto_vec.  This is
the case in gcc/ipa-prop.h, for instance.  Those instances could
be changed to auto_vec references or pointers but again it's a more
intrusive change than the simple replacements I was planning to make
in this first iteration.

So in summary, I agree with the changes you suggest.  Given their
scope I'd prefer not to make them in the same patch, and rather make
them at some point in the future when I or someone else has the time
and energy.  I'm running out.


Oh, absolutely.

Jason



Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread guojiufu via Gcc-patches

On 2021-07-13 23:38, Segher Boessenkool wrote:

On Mon, Jul 12, 2021 at 08:20:14AM +0200, Richard Biener wrote:

On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> Almost all targets just use Pmode, but there is no such guarantee I
> think, and esp. some targets that do not have machine insns for this
> (but want to generate different code for this anyway) can do pretty much
> anything.
>
> Maybe using just Pmode here is good enough though?

I think Pmode is a particularly bad choice and I'd prefer word_mode
if we go for any hardcoded mode.


In many important cases you use a pointer as iteration variable.

Is word_mode the register size on most current targets?
I searched the implementation: word_mode is the MODE_INT mode whose
size is BITS_PER_WORD.  Actually, when targets define Pmode and
BITS_PER_WORD, these two macros are aligned :-), and it seems most
targets define both of them.





s390x for example seems to handle
both SImode and DImode (but names the helper gen_doloop_si64
for SImode?!).


Yes, so Pmode will work fine for 390.  It would be nice if we could
allow multiple modes here, certainly.  Can we?


:), for other IVs, multiple modes are allowed to be added as candidates,
while only one doloop iv is added.  Compared with supporting more doloop
IVs, changing the doloop iv mode seems relatively easy to me.  So, the
patch is trying to update the doloop iv.




But indeed it looks like somehow querying doloop_end
is going to be difficult since the expander doesn't have any mode,
so we'd have to actually try emit RTL here.


Or add a well-designed target macro for this.  "Which modes do we like
for IVs", perhaps?


In the new patch, a target hook preferred_doloop_mode is introduced,
though this hook is only for the doloop iv at this time.
Maybe we could have preferred_iv_mode if needed.  In the current code,
IVs are free to be added in different types, and the cost model is
applied to determine which IV may be better.  The iv mode would be one
factor in the cost.



BR,
Jiufu




Segher


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-13 Thread Kees Cook via Gcc-patches
On Tue, Jul 13, 2021 at 11:16:59PM +, Qing Zhao wrote:
> Hi, Kees,
> 
> I took a look at the kernel testing case you attached in the previous email, 
> and found the testing failed with the following case:
> 
> #define INIT_STRUCT_static_all  = { .one = arg->one,\
> .two = arg->two,\
> .three = arg->three,\
> .four = arg->four,  \
> }
> 
> i.e., when a structure-type auto variable has been explicitly initialized in
> the source code.  -ftrivial-auto-var-init in the 4th version
> does not initialize the padding for such variables.
> 
> But in the previous versions of the patch (2 or 3), -ftrivial-auto-var-init
> initializes the padding for such variables.
> 
> I intended to remove this part of the code from the 4th version of the patch
> since the implementation for initializing such padding is completely
> different from initializing the whole structure with memset in this
> version of the implementation.
> 
> If we really need this functionality, I will add another separate patch for 
> this additional functionality, but not with this patch.

Yes, this is required to get proper coverage for initialization in the
kernel (or, really, any program). Without this, things are still left
uninitialized in the padding of structs.

A separate patch is fine by me; my only desire is to still have it be
part of -ftrivial-auto-var-init when it's all done. :)

Thanks!

-- 
Kees Cook


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread guojiufu via Gcc-patches

On 2021-07-13 23:51, Segher Boessenkool wrote:

On Tue, Jul 13, 2021 at 10:09:25AM +0800, guojiufu wrote:

> For a loop that looks like:
>  do ;
>  while (n-- > 0); /* while  (n-- > low); */


(This whole loop as written will be optimised away, but :-) )

At -O2, the loop is optimized away.
At -O1, the loop is there.
.cfi_startproc
addi %r3,%r3,1
.L2:
addi %r9,%r3,-1
mr %r3,%r9
andi. %r9,%r9,0xff
bne %cr0,.L2
The patch v2 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574596.html)

could help it to be:
.cfi_startproc
addi %r3,%r3,1
mtctr %r3
.L2:
addi %r3,%r3,-1
bdnz .L2




There is a patch that could mitigate the "-1 +1" pair issue in the RTL part.
https://gcc.gnu.org/g:8a15faa730f99100f6f3ed12663563356ec5a2c0


Does that solve PR67288 (and its many duplicates)?

I ran a test: yes, the "-1 +1" issue in PR67288 was fixed by that patch.

BR,
Jiufu.



Segher


[PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-07-13 Thread liuhongt via Gcc-patches
By optimizing the vector move into a broadcast in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate an avx512
embedded broadcast, so pass_cpb is no longer needed.

Since broadcasting from memory is still slightly faster than loading the
entire vector from memory, even in the absence of avx512f, broadcast is
always enabled.

benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast

The performance diff

strategy: cycles
memory  : 1046611188
memory  : 1255420817
memory  : 1044720793
memory  : 1253414145
average : 1097868397

broadcast   : 1044430688
broadcast   : 1044477630
broadcast   : 1253554603
broadcast   : 1044561934
average : 1096756213

However, the broadcast version has a larger code size.

the size diff

size broadcast.o
   text    data     bss     dec     hex filename
    137       0       0     137      89 broadcast.o

size memory.o
   text    data     bss     dec     hex filename
    115       0       0     115      73 memory.o

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}

gcc/ChangeLog:

* config/i386/i386-expand.c
(ix86_broadcast_from_integer_constant): Rename to ..
(ix86_broadcast_from_constant): .. this, and extend it to
handle float mode.
(ix86_expand_vector_move): Extend to float mode.
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): Remove.
(remove_partial_avx_dependency_gate): Ditto.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): Ditto.
(make_pass_constant_pool_broadcast): Ditto.
(remove_partial_avx_dependency): Adjust gate.
* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
---
 gcc/config/i386/i386-expand.c |  29 +++-
 gcc/config/i386/i386-features.c   | 157 +-
 gcc/config/i386/i386-passes.def   |   1 -
 gcc/config/i386/i386-protos.h |   1 -
 .../gcc.target/i386/fuse-caller-save-xmm.c|   2 +-
 5 files changed, 26 insertions(+), 164 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 69ea79e6123..ba870145acd 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -453,8 +453,10 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   emit_insn (gen_rtx_SET (op0, op1));
 }
 
+/* OP is a memref of CONST_VECTOR, return scalar constant mem
+   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
 static rtx
-ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
+ix86_broadcast_from_constant (machine_mode mode, rtx op)
 {
   int nunits = GET_MODE_NUNITS (mode);
   if (nunits < 2)
@@ -462,7 +464,8 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
 
   /* Don't use integer vector broadcast if we can't move from GPR to SSE
  register directly.  */
-  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
+  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
+  && INTEGRAL_MODE_P (mode))
 return nullptr;
 
   /* Convert CONST_VECTOR to a non-standard SSE constant integer
@@ -470,12 +473,17 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
   if (!(TARGET_AVX2
|| (TARGET_AVX
&& (GET_MODE_INNER (mode) == SImode
-   || GET_MODE_INNER (mode) == DImode)))
+   || GET_MODE_INNER (mode) == DImode))
+   || FLOAT_MODE_P (mode))
   || standard_sse_constant_p (op, mode))
 return nullptr;
 
-  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
-  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
+  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
+ We can still put 64-bit integer constant in memory when
+ avx512 embed broadcast is available.  */
+  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
+  && (!TARGET_AVX512F
+ || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
 return nullptr;
 
   if (GET_MODE_INNER (mode) == TImode)
@@ -561,17 +569,20 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
 
   if (can_create_pseudo_p ()
   && GET_MODE_SIZE (mode) >= 16
-  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+  && VECTOR_MODE_P (mode)
   && (MEM_P (op1)
  && SYMBOL_REF_P (XEXP (op1, 0))
  && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0
 {
-  rtx first = ix86_broadcast_from_integer_constant (mode, op1);
+  rtx first = ix86_broadcast_from_constant (mode, op1);
   if (first != nullptr)
{
  /* Broadcast to XMM/YMM/ZMM register from an integer
-constant.  */
- op1 = ix86_gen_scratch_sse_rtx (mode);
+constant or scalar mem.  */
+ op1 = gen_reg_rtx (mode);
+ if (FLOAT_MODE_P (mode)
+

Re: [PATCH] rs6000: Support [u]mul3_highpart for vector

2021-07-13 Thread Kewen.Lin via Gcc-patches
on 2021/7/14 6:07 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Jul 13, 2021 at 04:58:42PM +0800, Kewen.Lin wrote:
>> This patch makes the vector multiply-high (part) instructions
>> newly introduced in Power10 exploited in vectorized loops;
>> it renames existing define_insns to standard pattern
>> names.  It depends on the patch that enables the vectorizer
>> to recognize mul_highpart.
> 
> It actually is correct already, it will just not be used yet, right?

Yes, the names are just not standard.  :)

> But the testcases will fail until the generic support lands.
> 

Yes!

> Okay for trunk.  Thanks!
> 
> 

Thanks!

BR,
Kewen


Re: fix typo in attr_fnspec::verify

2021-07-13 Thread Alexandre Oliva
On Jul 13, 2021, Richard Biener  wrote:

> oops - also worth backporting to affected branches.

Thanks, I took that as explicit approval and put it in.

attr fnspec is new in gcc-11, not present in gcc-10, so I'm testing a
trivial backport, just to be sure...  Will install in gcc-11 when done.

>> * tree-ssa-alias.c (attr_fnspec::verify): Fix index in
>> non-'t'-sized arg check.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] c++: constexpr array reference and value-initialization [PR101371]

2021-07-13 Thread Marek Polacek via Gcc-patches
This PR gave me a hard time: I saw multiple issues starting with
different revisions.  But ultimately the root cause seems to be
the following, and the attached patch fixes all issues I've found
here.

In cxx_eval_array_reference we create a new constexpr context for the
CP_AGGREGATE_TYPE_P case, but we also have to create it for the
non-aggregate case.  In this test, we are evaluating

  ((B *)this)->a = rhs->a

which means that we set ctx.object to ((B *)this)->a.  Then we proceed
to evaluate the initializer, rhs->a.  For *rhs, we eval rhs, a PARM_DECL,
for which we have (const B &) [0] in the hash table.  Then
cxx_fold_indirect_ref gives us c.arr[0].  c is evaluated to {.arr={}} so
c.arr is {}.  Now we want c.arr[0], so we end up in cxx_eval_array_reference
and since we're initializing from {}, we call build_value_init which
gives us an AGGR_INIT_EXPR that calls 'constexpr B::B()'.  Then we
evaluate this AGGR_INIT_EXPR and since its first argument is dummy,
we take ctx.object instead.  But that is the wrong object, we're not
initializing ((B *)this)->a here.  And so we wound up with an
initializer for A, and then crash in cxx_eval_component_reference:

  gcc_assert (DECL_CONTEXT (part) == TYPE_MAIN_VARIANT (TREE_TYPE (whole)));

where DECL_CONTEXT (part) is B (as it should be) but the type of whole
was A.

With that in mind, the fix is straightforward, except that when the
value-init produced an AGGR_INIT_EXPR, we shouldn't set ctx.object so
that

2508   if (DECL_CONSTRUCTOR_P (fun) && !ctx->object
2509   && TREE_CODE (t) == AGGR_INIT_EXPR)
2510 {
2511   /* We want to have an initialization target for an AGGR_INIT_EXPR.
2512  If we don't already have one in CTX, use the AGGR_INIT_EXPR_SLOT.  */
2513   new_ctx.object = AGGR_INIT_EXPR_SLOT (t);

comes into play.

Bootstrapped/regtested on {x86_64,ppc64le,aarch64}-pc-linux-gnu, ok for trunk?

PR c++/101371

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): Create a new .object
and .ctor for the non-aggregate case too when value-initializing.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-101371-2.C: New test.
* g++.dg/cpp1y/constexpr-101371.C: New test.
---
 gcc/cp/constexpr.c| 15 ++
 .../g++.dg/cpp1y/constexpr-101371-2.C | 23 +++
 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C | 29 +++
 3 files changed, 61 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 39787f3f5d5..584ef55703c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3844,23 +3844,26 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
  initializer, it's initialized from {}.  But use build_value_init
  directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
   tree val;
-  constexpr_ctx new_ctx;
   if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
 return build_constructor (elem_type, NULL);
   else if (CP_AGGREGATE_TYPE_P (elem_type))
 {
   tree empty_ctor = build_constructor (init_list_type_node, NULL);
   val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
-  new_ctx = *ctx;
-  new_ctx.object = t;
-  new_ctx.ctor = build_constructor (elem_type, NULL);
-  ctx = &new_ctx;
 }
   else
 val = build_value_init (elem_type, tf_warning_or_error);
+
+  constexpr_ctx new_ctx = *ctx;
+  /* If we are using an AGGR_INIT_EXPR, clear OBJECT for now so that
+ cxx_eval_call_expression can make use of AGGR_INIT_EXPR_SLOT.  */
+  new_ctx.object = (TREE_CODE (val) == AGGR_INIT_EXPR
+   ? NULL_TREE : t);
+  new_ctx.ctor = build_constructor (elem_type, NULL);
+  ctx = &new_ctx;
   t = cxx_eval_constant_expression (ctx, val, lval, non_constant_p,
overflow_p);
-  if (CP_AGGREGATE_TYPE_P (elem_type) && t != ctx->ctor)
+  if (t != ctx->ctor)
 free_constructor (ctx->ctor);
   return t;
 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
new file mode 100644
index 000..fb67b67c265
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371-2.C
@@ -0,0 +1,23 @@
+// PR c++/101371
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int i;
+};
+struct B {
+  A a{};
+  constexpr B() : a() {}
+  constexpr B(const B &rhs) : a(rhs.a) {}
+};
+struct C {
+  B arr[1];
+};
+
+constexpr C
+fn ()
+{
+  C c{};
+  return c;
+}
+
+constexpr C c = fn();
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C
new file mode 100644
index 000..b6351b806b9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-101371.C
@@ -0,0 +1,29 @@
+// PR c++/101371
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int i;
+};
+struct 

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-13 Thread Qing Zhao via Gcc-patches
Hi, Kees,

I took a look at the kernel test case you attached in the previous email,
and found that the test failed with the following case:

#define INIT_STRUCT_static_all  = { .one = arg->one,\
.two = arg->two,\
.three = arg->three,\
.four = arg->four,  \
}

i.e., when the structure type auto variable has been explicitly initialized in
the source code.  -ftrivial-auto-var-init in the 4th version
does not initialize the padding for such variables.

But in the previous version of the patches ( 2 or 3), -ftrivial-auto-var-init 
initializes the paddings for such variables.

I intended to remove this part of the code from the 4th version of the patch 
since the implementation for initializing such padding is completely different
from initializing the structure as a whole with memset in this version of the
implementation.

If we really need this functionality, I will add it in a separate follow-up
patch rather than in this one.

Richard, what are your comments and suggestions on this?

Thanks.

Qing

> On Jul 13, 2021, at 4:29 PM, Kees Cook  wrote:
> 
> On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
>>> On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
>>> On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
 This is the 4th version of the patch for the new security feature for GCC.
>>> 
>>> It looks like padding initialization has regressed to where things where
>>> in version 1[1] (it was, however, working in version 2[2]). I'm seeing
>>> these failures again in the kernel self-test:
>>> 
>>> test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
>>> test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
>>> test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
>>> test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
>>> test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
>>> test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
>> 
>> Are the above failures for -ftrivial-auto-var-init=zero or 
>> -ftrivial-auto-var-init=pattern?  Or both?
> 
> Yes, I was only testing =zero (the kernel test handles =pattern as well:
> it doesn't explicitly test for 0x00). I've verified with =pattern now,
> too.
> 
>> For the current implementation, I believe that all paddings should be 
>> initialized with this option, 
>> for -ftrivial-auto-var-init=zero, the padding will be initialized to zero as 
>> before, however, for
>> -ftrivial-auto-var-init=pattern, the padding will be initialized to 0xFE 
>> byte-repeatable patterns.
> 
> I've double-checked that I'm using the right gcc, with the flag.
> 
>>> 
>>> In looking at the gcc test cases, I think the wrong thing is
>>> being checked: we want to verify the padding itself. For example,
>>> in auto-init-17.c, the actual bytes after "four" need to be checked,
>>> rather than "four" itself.
>> 
>> **For the current auto-init-17.c
>> 
>>  1 /* Verify zero initialization for array type with structure element with
>>  2padding.  */
>>  3 /* { dg-do compile } */
>>  4 /* { dg-options "-ftrivial-auto-var-init=zero" } */
>>  5 
>>  6 struct test_trailing_hole {
>>  7 int one;
>>  8 int two;
>>  9 int three;
>> 10 char four;
>> 11 /* "sizeof(unsigned long) - 1" byte padding hole here. */
>> 12 };
>> 13 
>> 14 
>> 15 int foo ()
>> 16 {
>> 17   struct test_trailing_hole var[10];
>> 18   return var[2].four;
>> 19 }
>> 20 
>> 21 /* { dg-final { scan-assembler "movl\t\\\$0," } } */
>> 22 /* { dg-final { scan-assembler "movl\t\\\$20," } } */
>> 23 /* { dg-final { scan-assembler "rep stosq" } } */
>> ~  
>> **We have the assembly as: (-ftrivial-auto-var-init=zero)
>> 
>>.file   "auto-init-17.c"
>>.text
>>.globl  foo
>>.type   foo, @function
>> foo:
>> .LFB0:
>>.cfi_startproc
>>pushq   %rbp
>>.cfi_def_cfa_offset 16
>>.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
subq    $40, %rsp
leaq    -160(%rbp), %rax
movq    %rax, %rsi
movl    $0, %eax
movl    $20, %edx
movq    %rsi, %rdi
movq    %rdx, %rcx
>>rep stosq
>>movzbl  -116(%rbp), %eax
>>movsbl  %al, %eax
>>leave
>>.cfi_def_cfa 7, 8
>>ret
>>.cfi_endproc
>> .LFE0:
>>.size   foo, .-foo
>>.section.note.GNU-stack,"",@progbits
>> 
>> From the above, we can see that “zero” will be used to initialize 8 * 20 =
>> 16 * 10 bytes of memory starting from the beginning of “var”, which includes
>> all the padding holes inside this array of structures.
>> 
>> I didn’t see an issue with padding initialization here.
> 
> Hm, agreed -- 

Re: [PATCH 1/4 committed] rs6000: Add support for SSE4.1 "test" intrinsics

2021-07-13 Thread Segher Boessenkool
On Tue, Jul 13, 2021 at 02:01:18PM -0500, Paul A. Clarke wrote:
> > > >+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > > >__artificial__))
> > > Line too long, please fix here and below.  (Existing cases can be left.)
> > 
> > I wouldn't bother in this case.  There is no way to write these
> > attribute lines in a reasonable way, it doesn't overflow 80 char by that
> > much, and there isn't anything interesting at the end of line.
> 
> I bothered. ;-)

Ha :-)

Btw, Bill suggested to me offline making a preprocessor macro for this
long attribute line.  Which is a fine suggestion!  Something for the
future, maybe?


Segher


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-13 Thread Kees Cook via Gcc-patches
On Tue, Jul 13, 2021 at 02:29:33PM -0700, Kees Cook wrote:
> I've extracted the kernel test to build for userspace, and it behaves
> the same way. See attached "stackinit.c".

I've adjusted this slightly (the "static" tests weren't testing the
correct thing, but the results remained the same). Here's what I see.

This is the variable on the stack:

struct test_small_hole {
size_t one;  /* 0 8 */
char   two;  /* 8 1 */

/* XXX 3 bytes hole, try to pack */

intthree;/*12 4 */
long unsigned int  four; /*16 8 */

/* size: 24, cachelines: 1, members: 4 */
/* sum members: 21, holes: 1, sum holes: 3 */
/* last cacheline: 24 bytes */
};

The above is 0x18 in size.

00405370 :
  405370:   40 0f b6 c7 movzbl %dil,%eax
  405374:   41 89 f1                mov    %esi,%r9d
  405377:   48 8d 74 24 b8          lea    -0x48(%rsp),%rsi

-0x48(%rsp) is the location of the variable.

  40537c:   44 0f be c7 movsbl %dil,%r8d
  405380:   48 b9 01 01 01 01 01movabs $0x101010101010101,%rcx
  405387:   01 01 01 
  40538a:   49 89 c2                mov    %rax,%r10
  40538d:   48 c7 44 24 b8 00 00    movq   $0x0,-0x48(%rsp)
  405394:   00 00 

8 byte move of 0 to -0x48(%rsp) through -0x41

  405396:   48 f7 e1                mul    %rcx
  405399:   c6 44 24 c0 00  movb   $0x0,-0x40(%rsp)

1 byte move of 0 to -0x40(%rsp)

  40539e:   4c 0f af d1 imul   %rcx,%r10
  4053a2:   c7 44 24 c4 00 00 00    movl   $0x0,-0x3c(%rsp)
  4053a9:   00 

4 byte move of 0 to -0x3c through -0x39 (note that -0x3f, -0x3e, and
-0x3d are _not_ written, which maps to the 3-byte struct hole).

  4053aa:   48 c7 44 24 c8 00 00    movq   $0x0,-0x38(%rsp)
  4053b1:   00 00 

8 byte move of 0 to -0x38(%rsp) through -0x31.

  4053b3:   48 89 35 d6 9c 00 00    mov    %rsi,0x9cd6(%rip)        # 40f090 


variable address is saved to global.

  4053ba:   4c 01 d2                add    %r10,%rdx
  4053bd:   48 89 44 24 d8          mov    %rax,-0x28(%rsp)
  4053c2:   48 c7 05 b3 9c 00 00    movq   $0x18,0x9cb3(%rip)        # 40f080 

  4053c9:   18 00 00 00 

variable size is saved to global.

  4053cd:   48 89 54 24 e0          mov    %rdx,-0x20(%rsp)
  4053d2:   48 89 44 24 e8          mov    %rax,-0x18(%rsp)
  4053d7:   48 89 54 24 f0          mov    %rdx,-0x10(%rsp)
  4053dc:   45 84 c9                test   %r9b,%r9b
  4053df:   75 1d                   jne    4053fe 

  4053e1:   48 8b 44 24 c8          mov    -0x38(%rsp),%rax
  4053e6:   66 0f 6f 44 24 b8       movdqa -0x48(%rsp),%xmm0
  4053ec:   48 89 05 bd 9c 00 00    mov    %rax,0x9cbd(%rip)        # 40f0b0 

  4053f3:   44 89 c0                mov    %r8d,%eax
  4053f6:   0f 29 05 a3 9c 00 00    movaps %xmm0,0x9ca3(%rip)        # 40f0a0 


Here's the unrolled memcpy (8 bytes and 16 bytes) to the global buffer,
taking the "uninitialized" padding with it.

  4053fd:   c3                      ret

-- 
Kees Cook
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Test cases for compiler-based stack variable zeroing via
 * -ftrivial-auto-var-init={zero,pattern} or CONFIG_GCC_PLUGIN_STRUCTLEAK*.
 *
 * Build example:
 * gcc -O2 -Wall -ftrivial-auto-var-init=zero -o stackinit stackinit.c
 */

/* Userspace headers. */
#include 
#include 
#include 
#include 
#include 

/* Linux kernel-isms */
#define KBUILD_MODNAME		"stackinit"
#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
#define pr_err(fmt, ...)	fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)
#define pr_warn(fmt, ...)	fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)
#define pr_info(fmt, ...)	fprintf(stdout, pr_fmt(fmt), ##__VA_ARGS__)
#define __init			/**/
#define __user			/**/
#define noinline		__attribute__((__noinline__))
#define __aligned(x)		__attribute__((__aligned__(x)))
#ifdef __clang__
# define __compiletime_error(message) /**/
#else
# define __compiletime_error(message) __attribute__((__error__(message)))
#endif
#define __compiletime_assert(condition, msg, prefix, suffix)		\
	do {\
		extern void prefix ## suffix(void) __compiletime_error(msg); \
		if (!(condition))	\
			prefix ## suffix();\
	} while (0)
#define _compiletime_assert(condition, msg, prefix, suffix) \
	__compiletime_assert(condition, msg, prefix, suffix)
#define compiletime_assert(condition, msg) \
	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
#define BUILD_BUG_ON(condition) \
	BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
typedef uint8_t			u8;
typedef uint16_t		u16;
typedef uint32_t		u32;
typedef uint64_t		u64;


/* Exfiltration buffer. */
#define MAX_VAR_SIZE	128
static u8 

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-13 Thread Segher Boessenkool
Hi!

On Tue, Jul 13, 2021 at 12:14:22PM -0500, Peter Bergner wrote:
> +/* If the target storage locations of arguments MEM1 and MEM2 are
> +   adjacent, then return the argument that has the lower address.
> +   Otherwise, return NULL_RTX.  */
>  
> -static bool
> +static rtx
>  adjacent_mem_locations (rtx mem1, rtx mem2)

Note that the function name now only makes much sense if you use the
return value as a boolean (zero for false, non-zero for true).

> @@ -18633,7 +18639,7 @@ power6_sched_reorder2 (rtx_insn **ready, int lastpos)
>   first_store_pos = pos;
>  
> if (is_store_insn (last_scheduled_insn, _mem2)
> -   && adjacent_mem_locations (str_mem, str_mem2))
> +   && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)

... so don't change this?  Or write != 0 != 0 != 0, if one time is good,
three times must be better!  :-)

> @@ -26752,13 +26758,53 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> if (GET_MODE (src) == OOmode)
>   gcc_assert (VSX_REGNO_P (REGNO (dst)));
>  
> -   reg_mode = GET_MODE (XVECEXP (src, 0, 0));
> int nvecs = XVECLEN (src, 0);
> for (int i = 0; i < nvecs; i++)
>   {
> -   int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
> -   rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
> -   emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
> +   rtx op;
> +   int regno = reg + i;
> +
> +   if (WORDS_BIG_ENDIAN)
> + {
> +   op = XVECEXP (src, 0, i);
> +
> +   /* If we are loading an even VSX register and the memory location
> +  is adjacent to the next register's memory location (if any),
> +  then we can load them both with one LXVP instruction.  */
> +   if ((regno & 1) == 0)
> + {
> +   rtx op2 = XVECEXP (src, 0, i + 1);
> +   if (adjacent_mem_locations (op, op2) == op)
> + {
> +   op = adjust_address (op, OOmode, 0);
> +   /* Skip the next register, since we're going to
> +  load it together with this register.  */
> +   i++;
> + }
> + }
> + }
> +   else
> + {
> +   op = XVECEXP (src, 0, nvecs - i - 1);
> +
> +   /* If we are loading an even VSX register and the memory location
> +  is adjacent to the next register's memory location (if any),
> +  then we can load them both with one LXVP instruction.  */
> +   if ((regno & 1) == 0)
> + {
> +   rtx op2 = XVECEXP (src, 0, nvecs - i - 2);
> +   if (adjacent_mem_locations (op2, op) == op2)
> + {
> +   op = adjust_address (op2, OOmode, 0);
> +   /* Skip the next register, since we're going to
> +  load it together with this register.  */
> +   i++;
> + }
> + }
> + }
> +
> +   rtx dst_i = gen_rtx_REG (GET_MODE (op), regno);
> +   emit_insn (gen_rtx_SET (dst_i, op));
>   }

So we are sure we have a hard register here, and it is a VSX register.
Okay.  Factoring this code would not hurt ;-)

Okay for trunk.  Thanks!


Segher


Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-13 Thread Segher Boessenkool
On Tue, Jul 13, 2021 at 12:09:07PM -0500, Peter Bergner wrote:
> On 7/10/21 7:39 PM, seg...@gate.crashing.org wrote:
> > It is very hard to see the differences now.  Don't fold the changes into
> > one patch, just have the code movement in a separate trivial patch, and
> > then the actual changes as a separate patch?  That way it is much easier
> > to review :-)
> 
> Ok, I split the patch into 2 patches.  The one here is simply the move.

This one is obviously okay for trunk.  Thanks!


Segher


Re: [PATCH] rs6000: Support [u]mul3_highpart for vector

2021-07-13 Thread Segher Boessenkool
Hi!

On Tue, Jul 13, 2021 at 04:58:42PM +0800, Kewen.Lin wrote:
> This patch makes the vector multiply-high (part) instructions
> newly introduced in Power10 exploited in vectorized loops;
> it renames existing define_insns to standard pattern
> names.  It depends on the patch that enables the vectorizer
> to recognize mul_highpart.

It actually is correct already, it will just not be used yet, right?
But the testcases will fail until the generic support lands.

Okay for trunk.  Thanks!


Segher


[PATCH] Rewrite memset expanders with vec_duplicate

2021-07-13 Thread H.J. Lu via Gcc-patches
1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.

PR middle-end/90773
* builtins.c (gen_memset_value_from_prev): New function.
(gen_memset_broadcast): Likewise.
(builtin_memset_read_str): Use gen_memset_value_from_prev
and gen_memset_broadcast.
(builtin_memset_gen_str): Likewise.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.
---
 gcc/builtins.c | 123 +
 gcc/doc/tm.texi|   5 ++
 gcc/doc/tm.texi.in |   2 +
 gcc/target.def |   7 +++
 4 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39ab139b7e1..c1758ae2efc 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
   return NULL_RTX;
 }
 
-/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
-   bytes from constant string DATA + OFFSET and return it as target
-   constant.  If PREV isn't nullptr, it has the RTL info from the
+/* Return the RTL of a register in MODE generated from PREV in the
previous iteration.  */
 
-rtx
-builtin_memset_read_str (void *data, void *prevp,
-HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
-scalar_int_mode mode)
+static rtx
+gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
 {
+  rtx target = nullptr;
   by_pieces_prev *prev = (by_pieces_prev *) prevp;
   if (prev != nullptr && prev->data != nullptr)
 {
   /* Use the previous data in the same mode.  */
   if (prev->mode == mode)
return prev->data;
+
+  rtx prev_rtx = prev->data;
+  machine_mode prev_mode = prev->mode;
+  unsigned int word_size = GET_MODE_SIZE (word_mode);
+  if (word_size < GET_MODE_SIZE (prev->mode)
+ && word_size > GET_MODE_SIZE (mode))
+   {
+ /* First generate subreg of word mode if the previous mode is
+wider than word mode and word mode is wider than MODE.  */
+ prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
+ prev_mode, 0);
+ prev_mode = word_mode;
+   }
+  if (prev_rtx != nullptr)
+   target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
 }
+  return target;
+}
+
+/* Return the RTL of a register in MODE broadcasted from DATA.  */
+
+static rtx
+gen_memset_broadcast (rtx data, scalar_int_mode mode)
+{
+  /* Skip if regno_reg_rtx isn't initialized.  */
+  if (!regno_reg_rtx)
+return nullptr;
+
+  rtx target = nullptr;
+
+  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
+  machine_mode vector_mode;
+  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
+gcc_unreachable ();
+
+  enum insn_code icode = optab_handler (vec_duplicate_optab,
+   vector_mode);
+  if (icode != CODE_FOR_nothing)
+{
+  rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
+  if (CONST_INT_P (data))
+   {
+ /* Use the move expander with CONST_VECTOR.  */
+ rtx const_vec = gen_const_vec_duplicate (vector_mode, data);
+ emit_move_insn (reg, const_vec);
+   }
+  else
+   {
+
+ class expand_operand ops[2];
+ create_output_operand (&ops[0], reg, vector_mode);
+ create_input_operand (&ops[1], data, QImode);
+ expand_insn (icode, 2, ops);
+ if (!rtx_equal_p (reg, ops[0].value))
+   emit_move_insn (reg, ops[0].value);
+   }
+  target = lowpart_subreg (mode, reg, vector_mode);
+}
+
+  return target;
+}
+
+/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
+   bytes from constant string DATA + OFFSET and return it as target
+   constant.  If PREV isn't nullptr, it has the RTL info from the
+   previous iteration.  */
 
+rtx
+builtin_memset_read_str (void *data, void *prev,
+HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
+scalar_int_mode mode)
+{
+  rtx target;
   const char *c = (const char *) data;
-  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
+  char *p;
+
+  /* Don't use the previous value if size is 1.  */
+  if (GET_MODE_SIZE (mode) != 1)
+{
+  target = gen_memset_value_from_prev (prev, mode);
+  if (target != nullptr)
+   return target;
+
+  p = XALLOCAVEC (char, GET_MODE_SIZE (QImode));
+  memset (p, *c, GET_MODE_SIZE (QImode));
+  rtx src = c_readstr (p, QImode);
+  target = gen_memset_broadcast (src, mode);
+  if (target != nullptr)
+   return target;
+}
+
+  p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
 
   memset (p, *c, GET_MODE_SIZE (mode));
 

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-13 Thread Kees Cook via Gcc-patches
On Mon, Jul 12, 2021 at 08:28:55PM +, Qing Zhao wrote:
> > On Jul 12, 2021, at 12:56 PM, Kees Cook  wrote:
> > On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
> >> This is the 4th version of the patch for the new security feature for GCC.
> > 
> > It looks like padding initialization has regressed to where things where
> > in version 1[1] (it was, however, working in version 2[2]). I'm seeing
> > these failures again in the kernel self-test:
> > 
> > test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
> > test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
> > test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
> > test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
> > test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
> > test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
>  
> Are the above failures for -ftrivial-auto-var-init=zero or 
> -ftrivial-auto-var-init=pattern?  Or both?

Yes, I was only testing =zero (the kernel test handles =pattern as well:
it doesn't explicitly test for 0x00). I've verified with =pattern now,
too.

> For the current implementation, I believe that all paddings should be 
> initialized with this option, 
> for -ftrivial-auto-var-init=zero, the padding will be initialized to zero as 
> before, however, for
> -ftrivial-auto-var-init=pattern, the padding will be initialized to 0xFE 
> byte-repeatable patterns.

I've double-checked that I'm using the right gcc, with the flag.

> > 
> > In looking at the gcc test cases, I think the wrong thing is
> > being checked: we want to verify the padding itself. For example,
> > in auto-init-17.c, the actual bytes after "four" need to be checked,
> > rather than "four" itself.
> 
> **For the current auto-init-17.c
> 
>   1 /* Verify zero initialization for array type with structure element with
>   2padding.  */
>   3 /* { dg-do compile } */
>   4 /* { dg-options "-ftrivial-auto-var-init=zero" } */
>   5 
>   6 struct test_trailing_hole {
>   7 int one;
>   8 int two;
>   9 int three;
>  10 char four;
>  11 /* "sizeof(unsigned long) - 1" byte padding hole here. */
>  12 };
>  13 
>  14 
>  15 int foo ()
>  16 {
>  17   struct test_trailing_hole var[10];
>  18   return var[2].four;
>  19 }
>  20 
>  21 /* { dg-final { scan-assembler "movl\t\\\$0," } } */
>  22 /* { dg-final { scan-assembler "movl\t\\\$20," } } */
>  23 /* { dg-final { scan-assembler "rep stosq" } } */
> ~  
> **We have the assembly as: (-ftrivial-auto-var-init=zero)
> 
> .file   "auto-init-17.c"
> .text
> .globl  foo
> .type   foo, @function
> foo:
> .LFB0:
> .cfi_startproc
> pushq   %rbp
> .cfi_def_cfa_offset 16
> .cfi_offset 6, -16
> movq%rsp, %rbp
> .cfi_def_cfa_register 6
> subq$40, %rsp
> leaq-160(%rbp), %rax
> movq%rax, %rsi
> movl$0, %eax
> movl$20, %edx
> movq%rsi, %rdi
> movq%rdx, %rcx
> rep stosq
> movzbl  -116(%rbp), %eax
> movsbl  %al, %eax
> leave
> .cfi_def_cfa 7, 8
> ret
> .cfi_endproc
> .LFE0:
> .size   foo, .-foo
> .section.note.GNU-stack,"",@progbits
> 
> From the above, we can see that “zero” will be used to initialize 8 * 20 =
> 160 (= 16 * 10) bytes of memory starting from the beginning of “var”, which
> includes all the padding holes inside this array of structures.
> 
> I didn’t see issue with padding initialization here.

Hm, agreed -- this test does do the right thing.

> > But this isn't actually sufficient because they may _accidentally_
> > be zero already. The kernel tests specifically make sure to fill the
> > about-to-be-used stack with 0xff before calling a function like foo()
> > above.

I've extracted the kernel test to build for userspace, and it behaves
the same way. See attached "stackinit.c".

$ gcc-build/auto-var-init.4/installed/bin/gcc -O2 -Wall -o stackinit stackinit.c
$ ./stackinit 2>&1 | grep failures:
stackinit: failures: 23
$ gcc-build/auto-var-init.4/installed/bin/gcc -O2 -Wall -ftrivial-auto-var-init=zero -o stackinit stackinit.c
stackinit.c: In function ‘__leaf_switch_none’:
stackinit.c:326:26: warning: statement will never be executed
[-Wswitch-unreachable]
  326 | uint64_t var;
  |  ^~~
$ ./stackinit 2>&1 | grep failures:
stackinit: failures: 6

Same failures as seen in the kernel test (and an expected warning
about the initialization that will never happen for a pre-case switch
statement).

> > 
> > (And as an aside, it seems like naming the test cases with some details
> > about what is being tested in the filename would be nice -- it was
> > a little weird having to dig through their numeric names to find the
> > padding tests.)
> 
> Yes, I will fix the testing names to more reflect the testing details. 

Great!


[PATCH 3/3] [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran testsuite

2021-07-13 Thread Sandra Loosemore
ISO_Fortran_binding.h is now generated in the libgfortran build
directory where it is on the default include path.  Adjust includes in
the gfortran testsuite not to include an explicit path pointing at the
source directory.

2021-07-13  Sandra Loosemore  

gcc/testsuite/
PR libfortran/101305
* gfortran.dg/ISO_Fortran_binding_1.c: Adjust include path.
* gfortran.dg/ISO_Fortran_binding_10.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_11.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_12.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_15.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_16.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_17.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_18.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_3.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_5.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_6.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_7.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_8.c: Likewise.
* gfortran.dg/ISO_Fortran_binding_9.c: Likewise.
* gfortran.dg/bind_c_array_params_3_aux.c: Likewise.
* gfortran.dg/iso_fortran_binding_uint8_array_driver.c: Likewise.
* gfortran.dg/pr93524.c: Likewise.
---
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_16.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_17.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_3.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_5.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_6.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_7.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_8.c  | 2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_9.c  | 2 +-
 gcc/testsuite/gfortran.dg/bind_c_array_params_3_aux.c  | 2 +-
 gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c | 2 +-
 gcc/testsuite/gfortran.dg/pr93524.c| 2 +-
 17 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c 
b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
index a571459..9da5d85 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
@@ -1,6 +1,6 @@
 /* Test F2008 18.5: ISO_Fortran_binding.h functions.  */
 
-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"
 #include 
 #include 
 #include 
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c 
b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c
index 9f06e2d..c3954e4 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c
@@ -2,7 +2,7 @@
 
 /* Contributed by Reinhold Bader   */
 
-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"
 #include 
 #include 
 #include 
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c 
b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c
index ac17690..c2d4e11 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c
@@ -5,7 +5,7 @@ Contributed by Reinhold Bader  #include  
*/
 #include 
 #include 
 #include 
-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"
 
 typedef struct
 {
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c 
b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c
index 279d9f6..078c5de 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c
@@ -2,7 +2,7 @@
 
 #include 
 #include 
-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"
 
 /* Contributed by Reinhold Bader*/
 
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c 
b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c
index f5c83c7..622f2de 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c
@@ -4,7 +4,7 @@
 
 #include 
 #include 
-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"
 
 // Prototype for Fortran functions
 extern void Fsub(CFI_cdesc_t *);
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_16.c 

[PATCH 1/3] [PR libfortran/101305] Bind(C): Fix type encodings in ISO_Fortran_binding.h

2021-07-13 Thread Sandra Loosemore
ISO_Fortran_binding.h had many incorrect hardwired kind encodings in
the definitions of the CFI_type_* macros.  Additionally, not all
targets support all the defined type encodings, and the Fortran
standard requires those macros to have a negative value.

This patch changes ISO_Fortran_binding.h to use sizeof instead of
hard-coded sizes, and assembles it from fragments that reflect the
set of types supported by the target.

2021-07-13  Sandra Loosemore  
Tobias Burnus  

libgfortran/
PR libfortran/101305
* ISO_Fortran_binding.h: Fix hard-coded sizes and split into...
* ISO_Fortran_binding-1-tmpl.h: New file.
* ISO_Fortran_binding-2-tmpl.h: New file.
* ISO_Fortran_binding-3-tmpl.h: New file.
* Makefile.am: Add rule for generating ISO_Fortran_binding.h.
Adjust pathnames to that file.
* Makefile.in: Regenerated.
* mk-kinds-h.sh: New file.
* runtime/ISO_Fortran_binding.c: Fix include path.
---
 libgfortran/ISO_Fortran_binding-1-tmpl.h  | 196 
 libgfortran/ISO_Fortran_binding-2-tmpl.h  |  42 ++
 libgfortran/ISO_Fortran_binding-3-tmpl.h  |   5 +
 libgfortran/ISO_Fortran_binding.h | 206 --
 libgfortran/Makefile.am   |  15 ++-
 libgfortran/Makefile.in   |  16 ++-
 libgfortran/mk-kinds-h.sh |  25 +++-
 libgfortran/runtime/ISO_Fortran_binding.c |   2 +-
 8 files changed, 292 insertions(+), 215 deletions(-)
 create mode 100644 libgfortran/ISO_Fortran_binding-1-tmpl.h
 create mode 100644 libgfortran/ISO_Fortran_binding-2-tmpl.h
 create mode 100644 libgfortran/ISO_Fortran_binding-3-tmpl.h
 delete mode 100644 libgfortran/ISO_Fortran_binding.h

diff --git a/libgfortran/ISO_Fortran_binding-1-tmpl.h 
b/libgfortran/ISO_Fortran_binding-1-tmpl.h
new file mode 100644
index 000..dde7c3d
--- /dev/null
+++ b/libgfortran/ISO_Fortran_binding-1-tmpl.h
@@ -0,0 +1,196 @@
+/* Declarations for ISO Fortran binding.
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Celis Garza  
+
+This file is part of the GNU Fortran runtime library (libgfortran).
+
+Libgfortran is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+Libgfortran is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#ifndef ISO_FORTRAN_BINDING_H
+#define ISO_FORTRAN_BINDING_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include   /* Standard ptrdiff_t and size_t. */
+#include   /* Integer types. */
+
+/* Constants, defined as macros. */
+#define CFI_VERSION 1
+#define CFI_MAX_RANK 15
+
+/* Attributes. */
+#define CFI_attribute_pointer 0
+#define CFI_attribute_allocatable 1
+#define CFI_attribute_other 2
+
+/* Error codes.
+   CFI_INVALID_STRIDE should be defined in the standard because they are
+   useful to the implementation of the functions.
+ */
+#define CFI_SUCCESS 0
+#define CFI_FAILURE 1
+#define CFI_ERROR_BASE_ADDR_NULL 2
+#define CFI_ERROR_BASE_ADDR_NOT_NULL 3
+#define CFI_INVALID_ELEM_LEN 4
+#define CFI_INVALID_RANK 5
+#define CFI_INVALID_TYPE 6
+#define CFI_INVALID_ATTRIBUTE 7
+#define CFI_INVALID_EXTENT 8
+#define CFI_INVALID_STRIDE 9
+#define CFI_INVALID_DESCRIPTOR 10
+#define CFI_ERROR_MEM_ALLOCATION 11
+#define CFI_ERROR_OUT_OF_BOUNDS 12
+
+/* CFI type definitions. */
+typedef ptrdiff_t CFI_index_t;
+typedef int8_t CFI_rank_t;
+typedef int8_t CFI_attribute_t;
+typedef int16_t CFI_type_t;
+
+/* CFI_dim_t. */
+typedef struct CFI_dim_t
+  {
+CFI_index_t lower_bound;
+CFI_index_t extent;
+CFI_index_t sm;
+  }
+CFI_dim_t;
+
+/* CFI_cdesc_t, C descriptors are cast to this structure as follows:
+   CFI_CDESC_T(CFI_MAX_RANK) foo;
+   CFI_cdesc_t * bar = (CFI_cdesc_t *) 
+ */
+typedef struct CFI_cdesc_t
+ {
+void *base_addr;
+size_t elem_len;
+int version;
+CFI_rank_t rank;
+CFI_attribute_t attribute;
+CFI_type_t type;
+CFI_dim_t dim[];
+ }
+CFI_cdesc_t;
+
+/* CFI_CDESC_T with an explicit type. */
+#define CFI_CDESC_TYPE_T(r, base_type) \
+   struct { \
+   base_type *base_addr; \
+   size_t elem_len; \
+   int version; \
+   CFI_rank_t rank; \
+   CFI_attribute_t 

[PATCH 2/3] [PR libfortran/101305] Bind(C): Correct sizes of some types in CFI_establish

2021-07-13 Thread Sandra Loosemore
CFI_establish was failing to set the default elem_len correctly for
CFI_type_cptr, CFI_type_cfunptr, CFI_type_long_double, and
CFI_type_long_double_Complex.

2021-07-13  Sandra Loosemore  

libgfortran/
PR libfortran/101305
* runtime/ISO_Fortran_binding.c (CFI_establish): Special-case
CFI_type_cptr and CFI_type_cfunptr.  Correct size of long double
on targets where it has kind 10.
---
 libgfortran/runtime/ISO_Fortran_binding.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/libgfortran/runtime/ISO_Fortran_binding.c 
b/libgfortran/runtime/ISO_Fortran_binding.c
index 28fa9f5..6b5f26c 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -341,9 +341,13 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, 
CFI_attribute_t attribute,
 
   dv->base_addr = base_addr;
 
-  if (type == CFI_type_char || type == CFI_type_ucs4_char ||
-  type == CFI_type_struct || type == CFI_type_other)
+  if (type == CFI_type_char || type == CFI_type_ucs4_char
+  || type == CFI_type_struct || type == CFI_type_other)
 dv->elem_len = elem_len;
+  else if (type == CFI_type_cptr)
+dv->elem_len = sizeof (void *);
+  else if (type == CFI_type_cfunptr)
+dv->elem_len = sizeof (void (*)(void));
   else
 {
   /* base_type describes the intrinsic type with kind parameter. */
@@ -351,16 +355,13 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, 
CFI_attribute_t attribute,
   /* base_type_size is the size in bytes of the variable as given by its
* kind parameter. */
   size_t base_type_size = (type - base_type) >> CFI_type_kind_shift;
-  /* Kind types 10 have a size of 64 bytes. */
+  /* Kind type 10 maps onto the 80-bit long double encoding on x86.
+Note that this has different storage size for -m32 than -m64.  */
   if (base_type_size == 10)
-   {
- base_type_size = 64;
-   }
+   base_type_size = sizeof (long double);
   /* Complex numbers are twice the size of their real counterparts. */
   if (base_type == CFI_type_Complex)
-   {
- base_type_size *= 2;
-   }
+   base_type_size *= 2;
   dv->elem_len = base_type_size;
 }
 
-- 
2.8.1



[PATCH 0/3] [PR libfortran/101305] Bind(C): Fix kind/size mappings

2021-07-13 Thread Sandra Loosemore
This set of patches is for PR libfortran/101305, about bugs in
ISO_Fortran_binding.h's type kind/size encodings, and also incorrect
kind/size mappings in CFI_establish.  For instance,
ISO_Fortran_binding.h had hard-wired encodings that ptrdiff_t and long
are 8 bytes that are clearly incorrect on a 32-bit target, and other
encodings like CFI_type_int_fast8_t and CFI_type_long_double were
incorrect on some 64-bit targets too.  So part of this patch involves
using sizeof in the CFI_type_* macro definitions, instead of literal
constants.

Another difficulty is that the 2018 Fortran standard requires that the
CFI_type_* macros for C types not supported by the Fortran processor
have negative values.  Tobias contributed some scripting to check for
that; now ISO_Fortran_binding.h is generated at build time from
fragments in the libgfortran source directory.

The remaining parts of the patch fix up related bugs in CFI_establish
for types whose size isn't directly encoded in the corresponding
CFI_type_* macro, and adjust include paths for ISO_Fortran_binding.h
in the test suite.

Jose has posted a patch that fixes some additional bugs in type/size encodings
in descriptors passed to and from C:

https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html

and there remains a messy bug (PR fortran/100917) relating to
ambiguity in handling long double on some targets -- specifically, on
x86_64 targets that have both 80-bit long doubles with a storage size
of 16 and a true 128-bit floating-point format, the GFC descriptor
representation can't tell them apart.

I tested these patches on i686-pc-linux-gnu with both -m32 and -m64
multilibs.

Sandra Loosemore (3):
  [PR libfortran/101305] Bind(C): Fix type encodings in
ISO_Fortran_binding.h
  [PR libfortran/101305] Bind(C): Correct sizes of some types in
CFI_establish
  [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran
testsuite

 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_10.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_11.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_12.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_16.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_17.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_3.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_5.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_6.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_7.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_8.c  |   2 +-
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_9.c  |   2 +-
 .../gfortran.dg/bind_c_array_params_3_aux.c|   2 +-
 .../iso_fortran_binding_uint8_array_driver.c   |   2 +-
 gcc/testsuite/gfortran.dg/pr93524.c|   2 +-
 libgfortran/ISO_Fortran_binding-1-tmpl.h   | 196 
 libgfortran/ISO_Fortran_binding-2-tmpl.h   |  42 +
 libgfortran/ISO_Fortran_binding-3-tmpl.h   |   5 +
 libgfortran/ISO_Fortran_binding.h  | 206 -
 libgfortran/Makefile.am|  15 +-
 libgfortran/Makefile.in|  16 +-
 libgfortran/mk-kinds-h.sh  |  25 ++-
 libgfortran/runtime/ISO_Fortran_binding.c  |  21 ++-
 25 files changed, 319 insertions(+), 241 deletions(-)
 create mode 100644 libgfortran/ISO_Fortran_binding-1-tmpl.h
 create mode 100644 libgfortran/ISO_Fortran_binding-2-tmpl.h
 create mode 100644 libgfortran/ISO_Fortran_binding-3-tmpl.h
 delete mode 100644 libgfortran/ISO_Fortran_binding.h

-- 
2.8.1



Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-13 Thread Segher Boessenkool
Hi!

On Tue, Jul 13, 2021 at 08:50:46PM +0800, Jiufu Guo wrote:
>   * doc/tm.texi: Regenerated.

Pet peeve: "Regenerate.", no "d".

> +DEFHOOK
> +(preferred_doloop_mode,
> + "This hook returns a more preferred mode or the @var{mode} itself.",
> + machine_mode,
> + (machine_mode mode),
> + default_preferred_doloop_mode)

You need a bit more description here.  What does the value it returns
mean?  If you want to say "a more preferred mode or the mode itself",
you should explain what the difference means, too.

You also should say the hook does not need to test if things will fit,
since the generic code already does.

And say this should return a MODE_INT always -- you never test for that
as far as I can see, but you don't need to, as long as everyone does the
sane thing.  So just state every hook implementation should :-)

> +extern machine_mode
> +default_preferred_doloop_mode (machine_mode);

One line please (this is a declaration).

> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +void foo(int *p1, long *p2, int s)
> +{
> +  int n, v, i;
> +
> +  v = 0;
> +  for (n = 0; n <= 100; n++) {
> + for (i = 0; i < s; i++)
> +if (p2[i] == n)
> +   p1[i] = v;
> + v += 88;
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not {\mrldicl\M} } } */

That is a pretty fragile thing to test for.  It also needs a line or two
of comment in the test case what this does, what kind of thing it does
not want to see.

> +/* If PREFERRED_MODE is suitable and profitable, use the preferred
> +   PREFERRED_MODE to compute doloop iv base from niter: base = niter + 1.  */
> +
> +static tree
> +compute_doloop_base_on_mode (machine_mode preferred_mode, tree niter,
> +  const widest_int _max)
> +{
> +  tree ntype = TREE_TYPE (niter);
> +  tree pref_type = lang_hooks.types.type_for_mode (preferred_mode, 1);
> +
> +  gcc_assert (pref_type && TYPE_UNSIGNED (ntype));

Should that be pref_type instead of ntype?  If not, write it as two
separate asserts please.

> +static machine_mode
> +rs6000_preferred_doloop_mode (machine_mode)
> +{
> +  return word_mode;
> +}

This is fine if the generic code does the right thing if it passes say
TImode here, and if it never will pass some other mode class mode.


Segher


[PATCH v2] gcov: Add __gcov_info_to_gcda()

2021-07-13 Thread Sebastian Huber
Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcov info in a
freestanding environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use a linker
script is (use "gcc -coverage -fprofile-info-section -lgcc test.c" to compile
it):

  #include 
  #include 
  #include 

  extern const struct gcov_info *my_info;

  static void
  filename (const char *f, void *arg)
  {
printf("filename: %s\n", f);
  }

  static void
  dump (const void *d, unsigned n, void *arg)
  {
const unsigned char *c = d;

for (unsigned i = 0; i < n; ++i)
  printf ("%02x", c[i]);
  }

  static void *
  allocate (unsigned length, void *arg)
  {
return malloc (length);
  }

  int main()
  {
__asm__ volatile (".set my_info, .LPBX2");
__gcov_info_to_gcda (my_info, filename, dump, allocate, NULL);
return 0;
  }

With this patch,  is included in libgcov-driver.c even if
inhibit_libc is defined.  This header file should also be available for
freestanding environments.  If this is not the case, then we have to define
intptr_t somehow.

The patch removes one use of memset() which makes the  include
superfluous.

gcc/

* gcc/gcov-io.h (gcov_write): Declare.
* gcc/gcov-io.c (gcov_write): New.
* doc/invoke.texi (fprofile-info-section): Mention
__gcov_info_to_gcda().

libgcc/

Makefile.in (LIBGCOV_DRIVER): Add _gcov_info_to_gcda.
gcov.h (gcov_info): Declare.
(__gcov_info_to_gcda): Likewise.
libgcov-driver.c (#include ): New.
(#include ): Remove.
(NEED_L_GCOV): Conditionally define.
(NEED_L_GCOV_INFO_TO_GCDA): Likewise.
(are_all_counters_zero): New.
(dump_handler): Likewise.
(allocate_handler): Likewise.
(dump_unsigned): Likewise.
(dump_counter): Likewise.
(write_topn_counters): Add dump, allocate, and arg parameters.  Use
dump_unsigned() and dump_counter().
(write_one_data): Add dump, allocate, and arg parameters.  Use
dump_unsigned(), dump_counter(), and are_all_counters_zero().
(__gcov_info_to_gcda): New.
---
 gcc/doc/invoke.texi |  80 +++---
 gcc/gcov-io.c   |  10 +++
 gcc/gcov-io.h   |   1 +
 libgcc/Makefile.in  |   2 +-
 libgcc/gcov.h   |  17 
 libgcc/libgcov-driver.c | 176 ++--
 6 files changed, 230 insertions(+), 56 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e67d47af676d..2c514acf2003 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14782,17 +14782,17 @@ To optimize the program based on the collected 
profile information, use
 Register the profile information in the specified section instead of using a
 constructor/destructor.  The section name is @var{name} if it is specified,
 otherwise the section name defaults to @code{.gcov_info}.  A pointer to the
-profile information generated by @option{-fprofile-arcs} or
-@option{-ftest-coverage} is placed in the specified section for each
-translation unit.  This option disables the profile information registration
-through a constructor and it disables the profile information processing
-through a destructor.  This option is not intended to be used in hosted
-environments such as GNU/Linux.  It targets systems with limited resources
-which do not support constructors and destructors.  The linker could collect
-the input sections in a continuous memory block and define start and end
-symbols.  The runtime support could dump the profiling information registered
-in this linker set during program termination to a serial line for example.  A
-GNU linker script example which defines a linker output section follows:
+profile information generated by @option{-fprofile-arcs} is placed in the
+specified section for each translation unit.  This option disables the profile
+information registration through a constructor and it disables the profile
+information processing through a destructor.  This option is not intended to be
+used in hosted environments such as GNU/Linux.  It targets free-standing
+environments (for example embedded systems) with limited resources which do not
+support constructors/destructors or the C library file I/O.
+
+The linker could collect the input sections in a continuous memory block and
+define start and end symbols.  A GNU linker script example which defines a
+linker output section follows:
 
 @smallexample
   .gcov_info  :
@@ -14803,6 +14803,64 @@ GNU linker script example which defines a linker 
output section follows:
   @}
 @end smallexample
 
+The program could dump the profiling information registered in this linker set
+for example like this:
+
+@smallexample
+#include 
+#include 
+#include 
+
+extern const struct gcov_info *__gcov_info_start[];
+extern const struct gcov_info *__gcov_info_end[];
+
+static void
+filename (const char *f, void *arg)
+@{
+  puts (f);
+@}
+

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-13 Thread Martin Sebor via Gcc-patches

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the 
change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the same 
value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 it's 
implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird blurring 
of the object/pointer boundary that is also dependent on vec being a 
thin wrapper around a pointer.  In almost all cases it can be replaced 
with {}; one exception is == comparison, where it seems to be testing 
that the embedded pointer is null, which is a weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.

If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change. I also don't know if I can commit to making
all this cleanup.



Somewhat relatedly, use of vec variables or fields seems almost 
always a mistake, as they need explicit .release() that could be 
automatic with auto_vec, and is easy to forget.  For instance, the 
recursive call in get_all_loop_exits returns a vec that is never 
released.  And I see a couple of leaks in the C++ front end as well.


I agree.  The challenge I ran into with changing vec fields is with
code that uses the vec member as a reference to auto_vec.  This is
the case in gcc/ipa-prop.h, for instance.  Those instances could
be changed to auto_vec references or pointers but again it's a more
intrusive change than the simple replacements I was planning to make
in this first iteration.

So in summary, I agree with the changes you suggest.  Given their
scope I'd prefer not to make them in the same patch, and rather make
them at some point in the future when I or someone else has the time
and energy.  I'm running out.

Martin


Re: [PATCH V2] coroutines: Adjust outlined function names [PR95520].

2021-07-13 Thread Jason Merrill via Gcc-patches

On 7/13/21 4:11 AM, Iain Sandoe wrote:

Hi Jason


On 12 Jul 2021, at 20:40, Jason Merrill  wrote:

On 7/11/21 9:03 AM, Iain Sandoe wrote:

Hi Jason,

On 9 Jul 2021, at 22:40, Jason Merrill  wrote:

On 7/9/21 2:18 PM, Iain Sandoe wrote:
How about handling this in write_encoding, along the lines of the 
devel/c++-contracts branch?

OK, so I took a look at this and implemented as below.


Oh, sorry, I didn't expect it to be such a large change!


  Some small differences from your contracts impl described here.
recalling
the original function becomes the ramp - it is called directly by the user-code.
the resumer (actor) contains the outlined code wrapped in synthesized logic as 
dictated by the standard
the destroy function effectively calls the actor with a flag that says “take 
the DTOR path” (since the DTOR path has to be available in the case of resume 
too).
this means that it is possible for the actor to be partially (or completely for 
a generator-style coro) inlined into either the ramp or the destroyer.
1. using DECL_ABSTRACT_ORIGIN didn’t work with optimisation and debug since the 
inlining of the outlining confuses the issue (the actor/destroy helpers are not 
real clones).


Hmm, I wonder if that will bite my use in contracts as well.  Can you elaborate?


In the coroutines case I think it is simply a lie to set DECL_ABSTRACT_ORIGIN 
since that is telling the debug machinery:

"For any sort of a ..._DECL node, this points to the original (abstract)
decl node which this decl is an inlined/cloned instance of, or else it
is NULL indicating that this decl is not an instance of some other decl. “

That is not true for either the actor or destroy functions in coroutines - they 
are not instances of the ramp.

The problem comes when the actor gets inlined into the ramp - so I guess the 
machinery is expecting that we’ve done something akin to a recursion - but the 
actor is completely different code from the ramp, and has a different interface:
void actor(frame*) c.f. whatever the user’s function was (including being a 
class method or a lambda).

The fail occurs here:

gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die)
  …..
   /* Make sure any inlined functions are known to be inlineable.  */
   gcc_checking_assert (DECL_ABSTRACT_P (decl)
   || cgraph_function_possibly_inlined_p (decl));


Hmm, I would hope that cgraph_function_possibly_inlined_p should be true 
for a function you're trying to inline, I wonder what's interfering with 
that...



--

* I’d expect the JOIN_STR change to bite you at some point (since there are 
some platforms that don’t allow periods in symbols).


Indeed, thanks.


- const char *mangled_name
-   = (ovl_op_info[DECL_ASSIGNMENT_OPERATOR_P (decl)]
+ const char *mangled_name;
+ if (DECL_IS_CORO_ACTOR_P (decl) || DECL_IS_CORO_DESTROY_P (decl))
+   {
+ tree t = DECL_RAMP_FN (decl);


This ends up doing 5 lookups in the to_ramp hashtable; that should be fast 
enough, but better I think to drop the DECL_IS_CORO_*_P macros and check 
DECL_RAMP_FN directly, both here and in write_encoding.


TBH, I had misgivings about this - primarily that the “not used” path should 
have low impact.

However, if there are no coroutines in a TU, then the case above should only be 
two calls which immediately return NULL_TREE…

… however, I’ve changed this as suggested so that there are fewer calls in all 
cases (in the attached).

We can just test DECL_RAMP_FN (decl) since that will return NULL_TREE for any 
case that isn’t a helper (and, again, if there are no coroutines in the TU, it 
returns NULL_TREE immediately).

If we can guarantee that cfun will be available (so we didn’t need to check for 
its presence), then there’s a “coroutine helper” flag there which could be used 
to guard this further (but I’m not sure that it will be massively quicker if we 
needed to check to see if the cfun is available first).


+ mangled_name = (ovl_op_info[DECL_ASSIGNMENT_OPERATOR_P (t)]
+  [DECL_OVERLOADED_OPERATOR_CODE_RAW (t)].mangled_name);


Is there a reason not to do decl = t; and then share the array indexing line?


No - tidied in the revised version.

tested on x86_64-darwin,
OK for master / backports (with wider testing first).


OK, thanks.


thanks
Iain

===

The mechanism used to date for uniquing the coroutine helper
functions (actor, destroy) was over-complicating things and
leading to the noted PR and also difficulties in setting
breakpoints on these functions (so this will help PR99215 as
well).

This implementation delegates the adjustment to the mangling
to write_encoding(), which necessitates some book-keeping so
that it is possible to determine which of the coroutine
helper names is to be mangled.

Signed-off-by: Iain Sandoe 

PR c++/95520 - [coroutines] __builtin_FUNCTION() returns mangled .actor instead 
of original function name

PR c++/95520


[PATCH] PR fortran/100949 - [9/10/11/12 Regression] ICE in gfc_conv_expr_present, at fortran/trans-expr.c:1975

2021-07-13 Thread Harald Anlauf via Gcc-patches
Hello world,

we shouldn't consider a presence check for a non-dummy variable.

Regtested on x86_64-pc-linux-gnu.  OK for all affected branches?

Thanks,
Harald


Fortran - ICE in gfc_conv_expr_present while initializing a non-dummy
class variable

gcc/fortran/ChangeLog:

PR fortran/100949
* trans-expr.c (gfc_trans_class_init_assign): Call
gfc_conv_expr_present only for dummy variables.

gcc/testsuite/ChangeLog:

PR fortran/100949
* gfortran.dg/pr100949.f90: New test.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index de406ad2e8f..9e0dcdefd25 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1741,8 +1741,9 @@ gfc_trans_class_init_assign (gfc_code *code)
 	}
 }

-  if (code->expr1->symtree->n.sym->attr.optional
-  || code->expr1->symtree->n.sym->ns->proc_name->attr.entry_master)
+  if (code->expr1->symtree->n.sym->attr.dummy
+  && (code->expr1->symtree->n.sym->attr.optional
+	  || code->expr1->symtree->n.sym->ns->proc_name->attr.entry_master))
 {
   tree present = gfc_conv_expr_present (code->expr1->symtree->n.sym);
   tmp = build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp),
diff --git a/gcc/testsuite/gfortran.dg/pr100949.f90 b/gcc/testsuite/gfortran.dg/pr100949.f90
new file mode 100644
index 000..6c736fd7f72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr100949.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/100949 - ICE in gfc_conv_expr_present, at fortran/trans-expr.c:1975
+
+subroutine s
+entry f
+  type t
+  end type
+  class(t), allocatable :: y, z
+  allocate (z, mold=y)
+end


[PATCH] rs6000: Fix up easy_vector_constant_msb handling [PR101384]

2021-07-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following gcc.dg/pr101384.c testcase is miscompiled on
powerpc64le-linux.
easy_altivec_constant has code to try to construct vector constants with
different element sizes, perhaps different from the CONST_VECTOR's mode.  But
as written, that works fine for the vspltis[bhw] cases, but not for the
vspltisw x,-1; vsl[bhw] x,x,x case, because that always creates a V16QImode,
V8HImode or V4SImode constant containing a broadcast constant with just the
MSB set.  The vspltis_constant function etc. expects the vspltis[bhw]
instructions, where the small [-16..15] or even [-32..30] constant is
sign-extended to the remaining step bytes, but that is not the case for the
0x80...00 constants.  With step > 1 we can't handle e.g.
{ 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff }
vectors but do want to handle e.g.
{ 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80 }
and similarly with copies > 1 we do want to handle e.g.
{ 0x80808080, 0x80808080, 0x80808080, 0x80808080 }.

Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (the latter
regtested with -m32/-m64), ok for trunk?

Perhaps for backports it would be best to limit the EASY_VECTOR_MSB case
matching to step == 1 && copies == 1, because that is the only case the
splitter handled correctly, but as can be seen in the gcc.target tests, the
patch tries to handle it for all the cases.  Do you want that other patch
or prefer this patch for the backports too?

2021-07-13  Jakub Jelinek  

PR target/101384
* config/rs6000/rs6000-protos.h (easy_altivec_constant): Change return
type from bool to int.
* config/rs6000/rs6000.c (vspltis_constant): Fix up handling the
EASY_VECTOR_MSB case if either step or copies is not 1.
(vspltis_shifted): Fix comment typo.
(easy_altivec_constant): Change return type from bool to int; instead
of returning true, return the byte size of the element mode that should
be used to synthesize the constant.
* config/rs6000/predicates.md (easy_vector_constant_msb): Require
that vspltis_shifted is 0, handle the case where easy_altivec_constant
assumes using different vector mode from CONST_VECTOR's mode.
* config/rs6000/altivec.md (easy_vector_constant_msb splitter): Use
easy_altivec_constant to determine mode in which -1 >> -1 should be
performed, use rs6000_expand_vector_init instead of gen_vec_initv4sisi.

* gcc.dg/pr101384.c: New test.
* gcc.target/powerpc/pr101384-1.c: New test.
* gcc.target/powerpc/pr101384-2.c: New test.

--- gcc/config/rs6000/rs6000-protos.h.jj2021-07-13 09:07:03.697092286 
+0200
+++ gcc/config/rs6000/rs6000-protos.h   2021-07-13 11:28:54.876243593 +0200
@@ -30,7 +30,7 @@ extern void init_cumulative_args (CUMULA
  tree, machine_mode);
 #endif /* TREE_CODE */
 
-extern bool easy_altivec_constant (rtx, machine_mode);
+extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
--- gcc/config/rs6000/rs6000.c.jj   2021-07-13 09:07:03.715092036 +0200
+++ gcc/config/rs6000/rs6000.c  2021-07-13 12:18:08.715706507 +0200
@@ -6134,6 +6134,27 @@ vspltis_constant (rtx op, unsigned step,
   splat_val = val;
   msb_val = val >= 0 ? 0 : -1;
 
+  if (val == 0 && step > 1)
+{
+  /* Special case for loading most significant bit with step > 1.
+In that case, match 0s in all but step-1s elements, where match
+EASY_VECTOR_MSB.  */
+  for (i = 1; i < nunits; ++i)
+   {
+ unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
+ HOST_WIDE_INT elt_val = const_vector_elt_as_int (op, elt);
+ if ((i & (step - 1)) == step - 1)
+   {
+ if (!EASY_VECTOR_MSB (elt_val, inner))
+   break;
+   }
+ else if (elt_val)
+   break;
+   }
+  if (i == nunits)
+   return true;
+}
+
   /* Construct the value to be splatted, if possible.  If not, return 0.  */
   for (i = 2; i <= copies; i *= 2)
 {
@@ -6146,6 +6167,7 @@ vspltis_constant (rtx op, unsigned step,
   | (small_val & mask)))
return false;
   splat_val = small_val;
+  inner = smallest_int_mode_for_size (bitsize);
 }
 
   /* Check if SPLAT_VAL can really be the operand of a vspltis[bhw].  */
@@ -6160,8 +6182,9 @@ vspltis_constant (rtx op, unsigned step,
 ;
 
   /* Also check if are loading up the most significant bit which can be done by
- loading up -1 and shifting the value left by -1.  */
-  else if (EASY_VECTOR_MSB (splat_val, inner))
+ loading up -1 and shifting the value left by -1.  Only do this for
+ step 1 here, for larger steps it is done earlier.  */
+  else if (EASY_VECTOR_MSB (splat_val, inner) && step == 1)
 ;
 
   else

[PATCH] handle vector and aggregate stores in -Wstringop-overflow [PR 97027]

2021-07-13 Thread Martin Sebor via Gcc-patches

An existing, previously xfailed test that I recently removed
the xfail from made me realize that -Wstringop-overflow doesn't
properly detect buffer overflow resulting from vectorized stores.
Because of a difference in the IL the test passes on x86_64 but
fails on targets like aarch64.  Other examples can be constructed
that -Wstringop-overflow fails to diagnose even on x86_64.  For
INSTANCE, the overflow in the following function isn't diagnosed
when the loop is vectorized:

  void* f (void)
  {
char *p = __builtin_malloc (8);
for (int i = 0; i != 16; ++i)
  p[i] = 1 << i;
return p;
  }

The attached change enhances the warning to detect those as well.
It found a few bugs in vectorizer tests that the patch corrects.
Tested on x86_64-linux and with an aarch64 cross.

Martin
Detect buffer overflow by aggregate and vector stores [PR97027].

Resolves:
PR middle-end/97027 - missing warning on buffer overflow storing a larger scalar into a smaller array

gcc/ChangeLog:

	PR middle-end/97027
	* tree-ssa-strlen.c (handle_assign): New function.
	(nonzero_bytes_for_type): New function.
	(count_nonzero_bytes): Handle more tree types.  Call
	nonzero_bytes_for_type.
	(count_nonzero_bytes): Handle types.
	(handle_store): Handle stores from function calls.
	(strlen_check_and_optimize_call): Move code to handle_assign.  Call
	it for assignments from function calls.

gcc/testsuite/ChangeLog:

	PR middle-end/97027
	* gcc.dg/Wstringop-overflow-15.c: Remove an xfail.
	* gcc.dg/Wstringop-overflow-47.c: Adjust xfails.
	* gcc.dg/torture/pr69170.c: Avoid valid warnings.
	* gcc.dg/torture/pr70025.c: Prune out a false positive.
	* gcc.dg/vect/pr97769.c: Initialize a loop control variable.
	* gcc.target/i386/pr92658-avx512bw-trunc.c: Increase buffer size
	to avoid overflow.
	* gcc.target/i386/pr92658-avx512f.c: Same.
	* gcc.dg/Wstringop-overflow-68.c: New test.
	* gcc.dg/Wstringop-overflow-69.c: New test.
	* gcc.dg/Wstringop-overflow-70.c: New test.
	* gcc.dg/Wstringop-overflow-71.c: New test.
	* gcc.dg/strlenopt-95.c: New test.

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-15.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-15.c
index 1907bac2722..87f8462d431 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-15.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-15.c
@@ -30,7 +30,7 @@ void vla_bounded (int n)
   a[0] = 0;
   a[1] = 1;
   a[n] = n; // { dg-warning "\\\[-Wstringop-overflow" "pr82608" { xfail *-*-* } }
-  a[69] = n;// { dg-warning "\\\[-Wstringop-overflow" "pr82608" { xfail *-*-* } }
+  a[69] = n;// { dg-warning "\\\[-Wstringop-overflow" "pr82608" }
 
   sink ();
 }
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
index 6412874e2f9..968f6ee4ad4 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-47.c
@@ -31,15 +31,15 @@ void nowarn_c32 (char c)
 
 void warn_c32 (char c)
 {
-  extern char warn_a32[32];   // { dg-message "at offset 32 into destination object 'warn_a32' of size 32" "pr97027" }
+  extern char warn_a32[32];   // { dg-message "at offset (32|1) into destination object 'warn_a32' of size 32" "pr97027" }
 
   void *p = warn_a32 + 1;
-  *(C32*)p = (C32){ c };  // { dg-warning "writing 1 byte into a region of size 0" "pr97027" }
+  *(C32*)p = (C32){ c };  // { dg-warning "writing (1 byte|32 bytes) into a region of size (0|31)" "pr97027" }
 
   /* Verify a local variable too. */
   char a32[32];
   p = a32 + 1;
-  *(C32*)p = (C32){ c };  // { dg-warning "writing 1 byte into a region of size 0" "pr97027" }
+  *(C32*)p = (C32){ c };  // { dg-warning "writing (1 byte|32 bytes) into a region of size (0|31)" "pr97027" }
   sink (p);
 }
 
@@ -60,15 +60,20 @@ void nowarn_i16_64 (int16_t i)
 
 void warn_i16_64 (int16_t i)
 {
-  extern char warn_a64[64];   // { dg-message "at offset 128 to object 'warn_a64' with size 64" "pr97027" { xfail *-*-* } }
+/* The IL below that's visible to the warning changes from one target to
+   another.  On some like aarch64 it's a single vector store, on others
+   like x86_64 it's a series of BIT_FIELD_REFs.  The overflow by
+   the former is detected but the latter is not yet.  */
+
+ extern char warn_a64[64];   // { dg-message "at offset (1|128) into destination object 'warn_a64' of size (63|64)" "pr97027 note" { xfail { ! aarch64-*-* } } }
 
   void *p = warn_a64 + 1;
   I16_64 *q = (I16_64*)p;
-  *q = (I16_64){ i }; // { dg-warning "writing 1 byte into a region of size 0" "pr97027" { xfail *-*-* } }
+  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) into a region of size (0|63)" "pr97027" { xfail { ! aarch64-*-* } } }
 
   char a64[64];
   p = a64 + 1;
   q = (I16_64*)p;
-  *q = (I16_64){ i }; // { dg-warning "writing 1 byte into a region of size 0" "pr97027" { xfail *-*-* } }
+  *q = (I16_64){ i }; // { dg-warning "writing (1 byte|64 bytes) into a region of size (0|63)" 

[PATCH] libstdc++: invalid default init in _CachedPosition [PR101231]

2021-07-13 Thread Patrick Palka via Gcc-patches
The primary template for _CachedPosition is a dummy implementation for
non-forward ranges, the iterators for which generally can't be cached.
Because this implementation doesn't actually cache anything, _M_has_value
is defined to be false and so calls to _M_get (which are always guarded
by _M_has_value) are unreachable.

Still, to suppress a "control reaches end of non-void function" warning
I made _M_get return {}, but after P2325 input iterators are no longer
necessarily default constructible so this workaround now breaks valid
programs.

This patch fixes this by instead using __builtin_unreachable to squelch
the warning.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR libstdc++/101231

libstdc++-v3/ChangeLog:

* include/std/ranges (_CachedPosition::_M_get): For non-forward
ranges, just call __builtin_unreachable.
* testsuite/std/ranges/istream_view.cc (test05): New test.
---
 libstdc++-v3/include/std/ranges   |  2 +-
 libstdc++-v3/testsuite/std/ranges/istream_view.cc | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index df74ac9dc19..d791e15d096 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1232,7 +1232,7 @@ namespace views::__adaptor
_M_get(const _Range&) const
{
  __glibcxx_assert(false);
- return {};
+ __builtin_unreachable();
}
 
constexpr void
diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc 
b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
index 369790e89e5..2f15f787250 100644
--- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
@@ -83,6 +83,17 @@ test04()
   static_assert(!std::forward_iterator);
 }
 
+void
+test05()
+{
+  // PR libstdc++/101231
+  auto words = std::istringstream{"42"};
+  auto is = ranges::istream_view(words);
+  auto r = is | views::filter([](auto) { return true; });
+  for (auto x : r)
+;
+}
+
 int
 main()
 {
@@ -90,4 +101,5 @@ main()
   test02();
   test03();
   test04();
+  test05();
 }
-- 
2.32.0.170.gd486ca60a5



[PATCH] libstdc++: Give split_view::_Sentinel a default ctor [PR101214]

2021-07-13 Thread Patrick Palka via Gcc-patches
This gives the new split_view's sentinel type a defaulted default
constructor, something which was overlooked in r12-1665.  This patch
also fixes a couple of other issues with the new split_view as reported
in the PR.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR libstdc++/101214

libstdc++-v3/ChangeLog:

* include/std/ranges (split_view::split_view): Use std::move.
(split_view::_Iterator::_Iterator): Remove redundant
default_initializable constraint.
(split_view::_Sentinel::_Sentinel): Declare.
* testsuite/std/ranges/adaptors/split.cc (test02): New test.
---
 libstdc++-v3/include/std/ranges |  6 --
 libstdc++-v3/testsuite/std/ranges/adaptors/split.cc | 11 +++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index f552caa9d5b..df74ac9dc19 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3306,7 +3306,7 @@ namespace views::__adaptor
&& constructible_from<_Pattern, single_view>>
 constexpr
 split_view(_Range&& __r, range_value_t<_Range> __e)
-  : _M_pattern(views::single(__e)),
+  : _M_pattern(views::single(std::move(__e))),
_M_base(views::all(std::forward<_Range>(__r)))
 { }
 
@@ -3364,7 +3364,7 @@ namespace views::__adaptor
   using value_type = subrange>;
   using difference_type = range_difference_t<_Vp>;
 
-  _Iterator() requires default_initializable> = default;
+  _Iterator() = default;
 
   constexpr
   _Iterator(split_view* __parent,
@@ -3429,6 +3429,8 @@ namespace views::__adaptor
   { return __x._M_cur == _M_end && !__x._M_trailing_empty; }
 
 public:
+  _Sentinel() = default;
+
   constexpr explicit
   _Sentinel(split_view* __parent)
: _M_end(ranges::end(__parent->_M_base))
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
index 02c6073a503..b4e01fea6e4 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
@@ -46,6 +46,16 @@ test01()
   VERIFY( ranges::equal(ints, (int[]){1,2,3,4}) );
 }
 
+void
+test02()
+{
+  // PR libstdc++/101214
+  auto v = views::iota(0) | views::take(5) | views::split(0);
+  static_assert(!ranges::common_range);
+  static_assert(std::default_initializable);
+  static_assert(std::sentinel_for);
+}
+
 // The following testcases are adapted from lazy_split.cc.
 namespace from_lazy_split_cc
 {
@@ -189,6 +199,7 @@ int
 main()
 {
   test01();
+  test02();
 
   from_lazy_split_cc::test01();
   from_lazy_split_cc::test02();
-- 
2.32.0.170.gd486ca60a5



Re: [PATCH 1/4 committed] rs6000: Add support for SSE4.1 "test" intrinsics

2021-07-13 Thread Paul A. Clarke via Gcc-patches
On Mon, Jul 12, 2021 at 05:24:07PM -0500, Segher Boessenkool wrote:
> On Sun, Jul 11, 2021 at 10:45:45AM -0500, Bill Schmidt wrote:
> > On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> > >--- a/gcc/config/rs6000/smmintrin.h
> > >+++ b/gcc/config/rs6000/smmintrin.h
> > >@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i 
> > >__mask)
> > >return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> > >  }
> > >
> > >+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > >__artificial__))
> > Line too long, please fix here and below.  (Existing cases can be left.)
> 
> I wouldn't bother in this case.  There is no way to write these
> attribute lines in a reasonable way, it doesn't overflow 80 char by that
> much, and there isn't anything interesting at the end of line.

I bothered. ;-)

> You could put it on a line by itself, which helps for now because it
> won't get too long until you add another attribute ;-)

OK

> There should be a space before (( though, and "extern" on definitions is
> superfluous.  But I do not care much about that either -- this isn't a
> part of the compiler proper anyway :-)

OK

> It is okay for trunk with whatever changes you want to do.  Thanks!

This is what I committed:

2021-07-13  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros): New.
---
 gcc/config/rs6000/smmintrin.h | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..16fd34d836ff 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,60 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
 #endif
-- 
2.27.0

PC


Re: [PATCH v3] x86: Don't enable UINTR in 32-bit mode

2021-07-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 13, 2021 at 09:35:18AM -0700, H.J. Lu wrote:
> Here is the v3 patch.   OK for master?

From my POV LGTM, but please give Uros a chance to chime in.

> From ceab81ef97ab102c410830c41ba7fea911170d1a Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Fri, 9 Jul 2021 09:16:01 -0700
> Subject: [PATCH v3] x86: Don't enable UINTR in 32-bit mode
> 
> UINTR is available only in 64-bit mode.  Since the codegen target is
> unknown when the gcc driver is processing -march=native, to properly
> handle UINTR for -march=native:
> 
> 1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
> indicate 32-bit and 64-bit codegen.
> 2. Change ix86_option_override_internal to enable UINTR only in 64-bit
> mode for -march=CPU when PTA_CPU includes PTA_UINTR.
> 
> gcc/
> 
>   PR target/101395
>   * config/i386/driver-i386.c (host_detect_local_cpu): Check
>   "arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
>   Enable UINTR only for 64-bit codegen.
>   * config/i386/i386-options.c
>   (ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
>   in 64-bit mode.
>   * config/i386/i386.h (ARCH_ARG): New.
>   (CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
>   "[arch|tune] 64" for 64-bit codegen.
> 
> gcc/testsuite/
> 
>   PR target/101395
>   * gcc.target/i386/pr101395-1.c: New test.
>   * gcc.target/i386/pr101395-2.c: Likewise.
>   * gcc.target/i386/pr101395-3.c: Likewise.

Jakub



Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-13 Thread Peter Bergner via Gcc-patches
On 7/13/21 12:14 PM, Peter Bergner wrote:
> ...and patch 2:
[snip]
> I'm currently bootstrapping and regtesting these two patches and
> will report back.  Better now?

Ok, this along with the previous move patch bootstrapped and regtested
with no regressions on powerpc64le-linux.

Peter




Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-13 Thread Jason Merrill via Gcc-patches

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the same 
value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in C++17 it's 
implicitly inline instead of static.


But then, do we really want to keep vNULL at all?  It's a weird blurring 
of the object/pointer boundary that is also dependent on vec being a 
thin wrapper around a pointer.  In almost all cases it can be replaced 
with {}; one exception is == comparison, where it seems to be testing 
that the embedded pointer is null, which is a weird thing to want to test.


Somewhat relatedly, use of vec variables or fields seems almost 
always a mistake, as they need explicit .release() that could be 
automatic with auto_vec, and is easy to forget.  For instance, the 
recursive call in get_all_loop_exits returns a vec that is never 
released.  And I see a couple of leaks in the C++ front end as well.


Jason



Re: disable -Warray-bounds in libgo (PR 101374)

2021-07-13 Thread Dimitar Dimitrov
On Fri, Jul 09, 2021 at 08:16:24AM +0200, Richard Biener via Gcc-patches wrote:
> On Thu, Jul 8, 2021 at 8:02 PM Martin Sebor via Gcc-patches
>  wrote:
> >
> > Hi Ian,
> >
> > Yesterday's enhancement to -Warray-bounds has exposed a couple of
> > issues in libgo where the code writes into an invalid constant
> > address that the warning is designed to flag.
> >
> > On the assumption that those invalid addresses are deliberate,
> > the attached patch suppresses these instances by using #pragma
> > GCC diagnostic but I don't think I'm supposed to commit it (at
> > least Git won't let me).  To avoid Go bootstrap failures please
> > either apply the patch or otherwise suppress the warning (e.g.,
> > by using a volatile pointer temporary).
> 
> Btw, I don't think we should diagnose things like
> 
> *(int*)0x21 = 0x21;
> 
> when somebody literally writes that he'll be just annoyed by diagnostics.
I agree. This will raise a lot of noise for embedded targets.

Similar constructs are used extensively in pretty much any microcontroller
project to define macros to access I/O special-function addresses.
A few random examples:

http://svn.savannah.gnu.org/viewvc/avr-libc/trunk/avr-libc/include/avr/sfr_defs.h?view=markup#l128
https://sourceforge.net/p/mspgcc/msp430mcu/ci/master/tree/upstream/cc430f5123.h#l2141
https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/RTOS/RTX/SRC/rt_HAL_CM.h#L138

Regards,
Dimitar

> 
> Of course the above might be able to use __builtin_trap (); - it looks
> like it is placed where control flow should never end, kind of a
> __builtin_unreachable (), which means abort () might do as well.
> 
> Richard.
> 
> > Thanks
> > Martin


[r12-2267 Regression] FAIL: g++.dg/vect/slp-pr87105.cc -std=c++2a scan-tree-dump-times slp2 "basic block part vectorized" 1 on Linux/x86_64

2021-07-13 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

f546e2b6cc5c610ae18aac274d0d6493f2da3801 is the first bad commit
commit f546e2b6cc5c610ae18aac274d0d6493f2da3801
Author: Richard Biener 
Date:   Tue Jul 13 08:04:34 2021 +0200

Revert "Display the number of components BB vectorized"

caused

FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++14  scan-tree-dump-times slp2 "basic 
block part vectorized" 1
FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++17  scan-tree-dump-times slp2 "basic 
block part vectorized" 1
FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++2a  scan-tree-dump-times slp2 "basic 
block part vectorized" 1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2267/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=g++.dg/vect/slp-pr87105.cc --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=g++.dg/vect/slp-pr87105.cc --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH 2/2] RISC-V: Add ldr/str instruction for T-HEAD.

2021-07-13 Thread Palmer Dabbelt

On Tue, 29 Jun 2021 01:11:07 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

gcc/
* gcc/config/riscv/riscv-opts.h (TARGET_LDR): New.
(TARGET_LDUR): Likewise.
* gcc/config/riscv/riscv.h (INDEX_REG_CLASS): Use TARGET_LDR.
(REGNO_OK_FOR_INDEX_P): Use TARGET_LDR.
(REG_OK_FOR_INDEX_P): Use REGNO_OK_FOR_INDEX_P.
* gcc/config/riscv/riscv.c (riscv_address_type): Add ADDRESS_REG_REG,
ADDRESS_REG_UREG.
(riscv_address_info): Add shift.
(riscv_classify_address_index): New.
(riscv_classify_address): Use riscv_classify_address_index.
(riscv_legitimize_address_index_p): New.
(riscv_output_move_index): New.
(riscv_output_move): Add parameter, Use riscv_output_move_index.
(riscv_print_operand_address): Use ADDRESS_REG_REG, ADDRESS_REG_UREG.
* gcc/config/riscv/riscv-protos.h (riscv_output_move): Update 
riscv_output_move.
* gcc/config/riscv/riscv.md (zero_extendsidi2): Use riscv_output_move.
(zero_extendhi2): Likewise.
(zero_extendqi2): Likewise.
(extendsidi2): Likewise.
(extend2): Likewise.
* gcc/config/riscv/predicates.md (sync_memory_operand): New.
* gcc/config/riscv/sync.md (atomic_store): Use 
sync_memory_operand.
(atomic_): Likewise.
(atomic_fetch_): Likewise.
(atomic_exchange): Likewise.
(atomic_cas_value_strong): Likewise.
(atomic_compare_and_swap): Likewise.
(atomic_test_and_set): Likewise.

gcc/testsuite/
* gcc.target/riscv/xthead/riscv-xthead.exp: New.
* gcc.target/riscv/xthead/ldr.c: Likewise.
---
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   2 +-
 gcc/config/riscv/riscv.c  | 234 --
 gcc/config/riscv/riscv.h  |   7 +-
 gcc/config/riscv/riscv.md |  50 ++--
 gcc/config/riscv/sync.md  |  14 +-
 gcc/testsuite/gcc.target/riscv/xthead/ldr.c   |  34 +++
 .../gcc.target/riscv/xthead/riscv-xthead.exp  |  41 +++
 9 files changed, 348 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xthead/ldr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xthead/riscv-xthead.exp

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 232115135544..802e7a40e880 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -217,3 +217,7 @@
 {
   return riscv_gpr_save_operation_p (op);
 })
+
+(define_predicate "sync_memory_operand"
+  (and (match_operand 0 "memory_operand")
+   (match_code "reg" "0")))


This should be split out into a standalone patch: it's really 
preparatory work for the reg/reg instructions, and having it standalone 
will make it easier to review.



diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index a2d84a66f037..d3163cb2377c 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -76,4 +76,7 @@ enum stack_protector_guard {
 #define MASK_XTHEAD_C (1 << 0)
 #define TARGET_XTHEAD_C ((riscv_x_subext & MASK_XTHEAD_C) != 0)

+#define TARGET_LDR (TARGET_XTHEAD_C)
+#define TARGET_LDUR (TARGET_XTHEAD_C)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43d7224d6941..3a218f327c42 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -52,9 +52,9 @@ extern bool riscv_legitimize_move (machine_mode, rtx, rtx);
 extern rtx riscv_subword (rtx, bool);
 extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
-extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
 #ifdef RTX_CODE
+extern const char *riscv_output_move (rtx, rtx, rtx_code outer = UNKNOWN);
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 576960bb37cb..7d321826f669 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -80,6 +80,12 @@ along with GCC; see the file COPYING3.  If not see
A natural register + offset address.  The register satisfies
riscv_valid_base_register_p and the offset is a const_arith_operand.

+  ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+  ADDRESS_REG_UREG
+   A base register indexed by (optionally scaled) zero-extended register.
+
ADDRESS_LO_SUM
A LO_SUM rtx.  The first operand is a valid base register and
the second operand is a symbolic address.
@@ -91,6 +97,8 @@ along with GCC; see the file COPYING3.  If not see
A constant symbolic 

Re: [PATCH 0/2] RISC-V: Add ldr/str instruction for T-HEAD.

2021-07-13 Thread Palmer Dabbelt

On Sat, 10 Jul 2021 19:31:20 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Hi,

Ping.

@Jim @kito

— Jojo
On Jul 9, 2021, 9:30 AM +0800, ALO wrote:

Hi,
Ping.

— Jojo
On Jun 29, 2021, 4:11 PM +0800, Jojo R wrote:
> T-HEAD extends some customized ISAs for Cores.
> The patches support ldr/str insns, it likes arm's LDR insn, the
> memory model is a base register indexed by (optionally scaled) register.


Sorry about that.  I'd seen some discussion here, but I guess it wasn't 
on the lists and wasn't really a review anyway.  I've taken a 
preliminary look and have a few questions; they're in the patches.


Thanks!


Re: [PATCH 1/2] RISC-V: Add arch flags for T-HEAD.

2021-07-13 Thread Palmer Dabbelt

On Tue, 29 Jun 2021 01:11:06 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

gcc/
* gcc/config/riscv/riscv.opt (riscv_x_subext): New.
* gcc/config/riscv/riscv-opts.h (MASK_XTHEAD_C): New.
(TARGET_XTHEAD_C): Likewise.
* gcc/common/config/riscv/riscv-common.c
(riscv_ext_flag_table): Use riscv_x_subext & MASK_XTHEAD_C.
---
 gcc/common/config/riscv/riscv-common.c | 2 ++
 gcc/config/riscv/riscv-opts.h  | 3 +++
 gcc/config/riscv/riscv.opt | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index 10868fd417dc..a62080129259 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -906,6 +906,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zicsr",    &gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},

+  {"xtheadc", &gcc_options::x_riscv_x_subext, MASK_XTHEAD_C},
+
+


Is there any documentation as to what this "theadc" extension is?  My 
main worry here would be trickling in instructions under the same custom 
extension, as that will quickly get confusing for users.  If you really 
just have one instruction in this extension that's fine, but if there 
are lots (as the marketing material seems to indicate) then I'd prefer 
to at least have a complete picture first.


Also, having the documentation will be necessary for anyone to actually 
use these instructions.



   {NULL, NULL, 0}
 };

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index f4cf6ca4b823..a2d84a66f037 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -73,4 +73,7 @@ enum stack_protector_guard {
 #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)

+#define MASK_XTHEAD_C (1 << 0)
+#define TARGET_XTHEAD_C ((riscv_x_subext & MASK_XTHEAD_C) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 5ff85c214307..84176aea05e9 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -195,6 +195,9 @@ long riscv_stack_protector_guard_offset = 0
 TargetVariable
 int riscv_zi_subext

+TargetVariable
+int riscv_x_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):


[PATCH] libstdc++: Give split_view::_Sentinel a default ctor [PR101214]

2021-07-13 Thread Patrick Palka via Gcc-patches
This gives the new split_view's sentinel type a defaulted default
constructor, something which was overlooked in r12-1665.  This patch
also fixes a couple of other issues with the new split_view as reported
in the PR.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR libstdc++/101214

libstdc++-v3/ChangeLog:

* include/std/ranges (split_view::split_view): Use std::move.
(split_view::_Iterator::_Iterator): Remove redundant
default_initializable constraint.
(split_view::_Sentinel::_Sentinel): Declare.
* testsuite/std/ranges/adaptors/split.cc (test02): New test.
---
 libstdc++-v3/include/std/ranges |  6 --
 libstdc++-v3/testsuite/std/ranges/adaptors/split.cc | 11 +++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index f552caa9d5b..df74ac9dc19 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3306,7 +3306,7 @@ namespace views::__adaptor
&& constructible_from<_Pattern, single_view>>
 constexpr
 split_view(_Range&& __r, range_value_t<_Range> __e)
-  : _M_pattern(views::single(__e)),
+  : _M_pattern(views::single(std::move(__e))),
_M_base(views::all(std::forward<_Range>(__r)))
 { }
 
@@ -3364,7 +3364,7 @@ namespace views::__adaptor
   using value_type = subrange>;
   using difference_type = range_difference_t<_Vp>;
 
-  _Iterator() requires default_initializable> = default;
+  _Iterator() = default;
 
   constexpr
   _Iterator(split_view* __parent,
@@ -3429,6 +3429,8 @@ namespace views::__adaptor
   { return __x._M_cur == _M_end && !__x._M_trailing_empty; }
 
 public:
+  _Sentinel() = default;
+
   constexpr explicit
   _Sentinel(split_view* __parent)
: _M_end(ranges::end(__parent->_M_base))
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
index 02c6073a503..b4e01fea6e4 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
@@ -46,6 +46,16 @@ test01()
   VERIFY( ranges::equal(ints, (int[]){1,2,3,4}) );
 }
 
+void
+test02()
+{
+  // PR libstdc++/101214
+  auto v = views::iota(0) | views::take(5) | views::split(0);
+  static_assert(!ranges::common_range);
+  static_assert(std::default_initializable);
+  static_assert(std::sentinel_for);
+}
+
 // The following testcases are adapted from lazy_split.cc.
 namespace from_lazy_split_cc
 {
@@ -189,6 +199,7 @@ int
 main()
 {
   test01();
+  test02();
 
   from_lazy_split_cc::test01();
   from_lazy_split_cc::test02();
-- 
2.32.0.170.gd486ca60a5



Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-13 Thread Peter Bergner via Gcc-patches
...and patch 2:

On 7/10/21 7:39 PM, seg...@gate.crashing.org wrote:
>> +  unsigned subreg =
>> +(WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
> 
> This is not new code, but it caught my eye, so just for the record: the
> "=" should start a new line:
> unsigned subreg
>   = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
> (and don't put parens around random words please :-) ).

Fixed.


>> +  int nvecs = XVECLEN (src, 0);
>> +  for (int i = 0; i < nvecs; i++)
>> +{
>> +  rtx opnd;
> 
> Just "op" (and "op2") please?  If you use long names you might as well
> just spell "operand" :-)

Done.



>> +  if (WORDS_BIG_ENDIAN)
>> +opnd = XVECEXP (src, 0, i);
>> +  else
>> +opnd = XVECEXP (src, 0, nvecs - i - 1);
> 
> Put this together with the case below as well?  Probably keep the
> WORDS_BIG_ENDIAN test as the outer "if"?

Ok, reworked a little bit.


I'm currently bootstrapping and regtesting these two patches and
will report back.  Better now?

Peter



rs6000: Generate an lxvp instead of two adjacent lxv instructions

The MMA build built-ins currently use individual lxv instructions to
load up the registers of a __vector_pair or __vector_quad.  If the
memory addresses of the built-in operands are to adjacent locations,
then we can use an lxvp in some cases to load up two registers at once.
The patch below adds support for checking whether memory addresses are
adjacent and emitting an lxvp instead of two lxv instructions.

gcc/
* config/rs6000/rs6000.c (adjacent_mem_locations): Return the lower
addressed memory rtx, if any.
(power6_sched_reorder2): Update for adjacent_mem_locations change.
(rs6000_split_multireg_move): Fix code formatting.
Handle MMA build built-ins with operands in adjacent memory locations.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-9.c: New test.
---
 gcc/config/rs6000/rs6000.c| 84 ++-
 .../gcc.target/powerpc/mma-builtin-9.c| 28 +++
 2 files changed, 93 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ae11b8d52cb..5fed3bc3ac1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18051,23 +18051,29 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT 
*offset,
   return true;
 }
 
-/* The function returns true if the target storage location of
-   mem1 is adjacent to the target storage location of mem2 */
-/* Return 1 if memory locations are adjacent.  */
+/* If the target storage locations of arguments MEM1 and MEM2 are
+   adjacent, then return the argument that has the lower address.
+   Otherwise, return NULL_RTX.  */
 
-static bool
+static rtx
 adjacent_mem_locations (rtx mem1, rtx mem2)
 {
   rtx reg1, reg2;
   HOST_WIDE_INT off1, size1, off2, size2;
 
-  if (get_memref_parts (mem1, &reg1, &off1, &size1)
-      && get_memref_parts (mem2, &reg2, &off2, &size2))
-return ((REGNO (reg1) == REGNO (reg2))
-   && ((off1 + size1 == off2)
-   || (off2 + size2 == off1)));
+  if (MEM_P (mem1)
+  && MEM_P (mem2)
+  && get_memref_parts (mem1, &reg1, &off1, &size1)
+  && get_memref_parts (mem2, &reg2, &off2, &size2)
+  && REGNO (reg1) == REGNO (reg2))
+{
+  if (off1 + size1 == off2)
+   return mem1;
+  else if (off2 + size2 == off1)
+   return mem2;
+}
 
-  return false;
+  return NULL_RTX;
 }
 
 /* This function returns true if it can be determined that the two MEM
@@ -18633,7 +18639,7 @@ power6_sched_reorder2 (rtx_insn **ready, int lastpos)
first_store_pos = pos;
 
  if (is_store_insn (last_scheduled_insn, &str_mem2)
- && adjacent_mem_locations (str_mem, str_mem2))
+ && adjacent_mem_locations (str_mem, str_mem2) != NULL_RTX)
{
  /* Found an adjacent store.  Move it to the head of the
 ready list, and adjust it's priority so that it is
@@ -26708,8 +26714,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
  for (int i = 0; i < nregs; i += reg_mode_nregs)
{
- unsigned subreg =
-   (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
+ unsigned subreg
+   = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
  rtx dst2 = adjust_address (dst, reg_mode, offset);
  rtx src2 = gen_rtx_REG (reg_mode, reg + subreg);
  offset += size;
@@ -26726,8 +26732,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
  for (int i = 0; i < nregs; i += reg_mode_nregs)
{
- unsigned subreg =
-   (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);
+ unsigned subreg
+   = WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
  rtx dst2 = gen_rtx_REG (reg_mode, 

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-13 Thread Peter Bergner via Gcc-patches
On 7/10/21 7:39 PM, seg...@gate.crashing.org wrote:
> It is very hard to see the differences now.  Don't fold the changes into
> one patch, just have the code movement in a separate trivial patch, and
> then the actual changes as a separate patch?  That way it is much easier
> to review :-)

Ok, I split the patch into 2 patches.  The one here is simply the move.
The second patch in a different reply will handle your other comments.

Peter


rs6000: Move rs6000_split_multireg_move to later in file

An upcoming change to rs6000_split_multireg_move requires it to be
moved later in the file to fix a declaration issue.

gcc/
* config/rs6000/rs6000.c (rs6000_split_multireg_move): Move to later
in the file.
---
 gcc/config/rs6000/rs6000.c | 751 ++---
 1 file changed, 375 insertions(+), 376 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..ae11b8d52cb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16690,382 +16690,6 @@ rs6000_expand_atomic_op (enum rtx_code code, rtx mem, 
rtx val,
 emit_move_insn (orig_after, after);
 }
 
-/* Emit instructions to move SRC to DST.  Called by splitters for
-   multi-register moves.  It will emit at most one instruction for
-   each register that is accessed; that is, it won't emit li/lis pairs
-   (or equivalent for 64-bit code).  One of SRC or DST must be a hard
-   register.  */
-
-void
-rs6000_split_multireg_move (rtx dst, rtx src)
-{
-  /* The register number of the first register being moved.  */
-  int reg;
-  /* The mode that is to be moved.  */
-  machine_mode mode;
-  /* The mode that the move is being done in, and its size.  */
-  machine_mode reg_mode;
-  int reg_mode_size;
-  /* The number of registers that will be moved.  */
-  int nregs;
-
-  reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
-  mode = GET_MODE (dst);
-  nregs = hard_regno_nregs (reg, mode);
-
-  /* If we have a vector quad register for MMA, and this is a load or store,
- see if we can use vector paired load/stores.  */
-  if (mode == XOmode && TARGET_MMA
-  && (MEM_P (dst) || MEM_P (src)))
-{
-  reg_mode = OOmode;
-  nregs /= 2;
-}
-  /* If we have a vector pair/quad mode, split it into two/four separate
- vectors.  */
-  else if (mode == OOmode || mode == XOmode)
-reg_mode = V1TImode;
-  else if (FP_REGNO_P (reg))
-reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
-   (TARGET_HARD_FLOAT ? DFmode : SFmode);
-  else if (ALTIVEC_REGNO_P (reg))
-reg_mode = V16QImode;
-  else
-reg_mode = word_mode;
-  reg_mode_size = GET_MODE_SIZE (reg_mode);
-
-  gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode));
-
-  /* TDmode residing in FP registers is special, since the ISA requires that
- the lower-numbered word of a register pair is always the most significant
- word, even in little-endian mode.  This does not match the usual subreg
- semantics, so we cannnot use simplify_gen_subreg in those cases.  Access
- the appropriate constituent registers "by hand" in little-endian mode.
-
- Note we do not need to check for destructive overlap here since TDmode
- can only reside in even/odd register pairs.  */
-  if (FP_REGNO_P (reg) && DECIMAL_FLOAT_MODE_P (mode) && !BYTES_BIG_ENDIAN)
-{
-  rtx p_src, p_dst;
-  int i;
-
-  for (i = 0; i < nregs; i++)
-   {
- if (REG_P (src) && FP_REGNO_P (REGNO (src)))
-   p_src = gen_rtx_REG (reg_mode, REGNO (src) + nregs - 1 - i);
- else
-   p_src = simplify_gen_subreg (reg_mode, src, mode,
-i * reg_mode_size);
-
- if (REG_P (dst) && FP_REGNO_P (REGNO (dst)))
-   p_dst = gen_rtx_REG (reg_mode, REGNO (dst) + nregs - 1 - i);
- else
-   p_dst = simplify_gen_subreg (reg_mode, dst, mode,
-i * reg_mode_size);
-
- emit_insn (gen_rtx_SET (p_dst, p_src));
-   }
-
-  return;
-}
-
-  /* The __vector_pair and __vector_quad modes are multi-register
- modes, so if we have to load or store the registers, we have to be
- careful to properly swap them if we're in little endian mode
- below.  This means the last register gets the first memory
- location.  We also need to be careful of using the right register
- numbers if we are splitting XO to OO.  */
-  if (mode == OOmode || mode == XOmode)
-{
-  nregs = hard_regno_nregs (reg, mode);
-  int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
-  if (MEM_P (dst))
-   {
- unsigned offset = 0;
- unsigned size = GET_MODE_SIZE (reg_mode);
-
- /* If we are reading an accumulator register, we have to
-deprime it before we can access it.  */
- if (TARGET_MMA
- && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
-   emit_insn (gen_mma_xxmfacc (src, src));
-
- 

[PATCH v3] x86: Don't enable UINTR in 32-bit mode

2021-07-13 Thread H.J. Lu via Gcc-patches
On Mon, Jul 12, 2021 at 11:56 PM Jakub Jelinek  wrote:
>
> On Mon, Jul 12, 2021 at 06:51:30PM -0700, H.J. Lu wrote:
> > @@ -404,9 +404,18 @@ const char *host_detect_local_cpu (int argc, const 
> > char **argv)
> >if (argc < 1)
> >  return NULL;
>
> I think it would be simpler to use 2 arguments instead of one.
> So change the above to if (argc < 2)

Fixed.

> >
> > -  arch = !strcmp (argv[0], "arch");
> > +  arch = !strncmp (argv[0], "arch", 4);
> >
> > -  if (!arch && strcmp (argv[0], "tune"))
> > +  if (!arch && strncmp (argv[0], "tune", 4))
> > +return NULL;
>
> Keep strcmp as is here.

Fixed.

> > +
> > +  bool codegen_x86_64;
> > +
> > +  if (!strcmp (argv[0] + 4, "32"))
> > +codegen_x86_64 = false;
> > +  else if (!strcmp (argv[0] + 4, "64"))
> > +codegen_x86_64 = true;
> > +  else
> >  return NULL;
>
> Check argv[1] here instead.

Fixed.

> > @@ -813,7 +826,8 @@ const char *host_detect_local_cpu (int argc, const char 
> > **argv)
> >  }
> >
> >  done:
> > -  return concat (cache, "-m", argv[0], "=", cpu, options, NULL);
> > +  const char *moption = arch ? "-march=" : "-mtune=";
> > +  return concat (cache, moption, cpu, options, NULL);
> >  }
> >  #else
>
> You don't need this change.

Fixed.

> > diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> > index 7a35c468da3..7cba655595e 100644
> > --- a/gcc/config/i386/i386-options.c
> > +++ b/gcc/config/i386/i386-options.c
> > @@ -2109,6 +2109,7 @@ ix86_option_override_internal (bool main_args_p,
> >  #define DEF_PTA(NAME) \
> >   if (((processor_alias_table[i].flags & PTA_ ## NAME) != 0) \
> >   && PTA_ ## NAME != PTA_64BIT \
> > + && (TARGET_64BIT || PTA_ ## NAME != PTA_UINTR) \
> >   && !TARGET_EXPLICIT_ ## NAME ## _P (opts)) \
> > SET_TARGET_ ## NAME (opts);
> >  #include "i386-isa.def"
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 8c3eace56da..ae9f455c48d 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -577,9 +577,12 @@ extern const char *host_detect_local_cpu (int argc, 
> > const char **argv);
> >  #define CC1_CPU_SPEC CC1_CPU_SPEC_1
> >  #else
> >  #define CC1_CPU_SPEC CC1_CPU_SPEC_1 \
> > -"%{march=native:%>march=native %:local_cpu_detect(arch) \
> > -  %{!mtune=*:%>mtune=native %:local_cpu_detect(tune)}} \
> > -%{mtune=native:%>mtune=native %:local_cpu_detect(tune)}"
> > +"%{" OPT_ARCH32 ":%{march=native:%>march=native %:local_cpu_detect(arch32) 
> > \
> > +   %{!mtune=*:%>mtune=native %:local_cpu_detect(tune32)}}}" \
> > +"%{" OPT_ARCH32 ":%{mtune=native:%>mtune=native 
> > %:local_cpu_detect(tune32)}}" \
> > +"%{" OPT_ARCH64 ":%{march=native:%>march=native %:local_cpu_detect(arch64) 
> > \
> > +   %{!mtune=*:%>mtune=native %:local_cpu_detect(tune64)}}}" \
> > +"%{" OPT_ARCH64 ":%{mtune=native:%>mtune=native 
> > %:local_cpu_detect(tune64)}}"
>
> And you can use
> #define ARCH_ARG "%{" OPT_ARCH64 ":64;32}"

I added

#define ARCH_ARG "%{" OPT_ARCH64 ":64;:32}"

> %:local_cpu_detect(arch, " ARCH_ARG ")
> etc.
>
> Jakub
>

Here is the v3 patch.   OK for master?

Thanks.

-- 
H.J.
From ceab81ef97ab102c410830c41ba7fea911170d1a Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 9 Jul 2021 09:16:01 -0700
Subject: [PATCH v3] x86: Don't enable UINTR in 32-bit mode

UINTR is available only in 64-bit mode.  Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:

1. Pass "arch [32|64]" and "tune [32|64]" to host_detect_local_cpu to
indicate 32-bit and 64-bit codegen.
2. Change ix86_option_override_internal to enable UINTR only in 64-bit
mode for -march=CPU when PTA_CPU includes PTA_UINTR.

gcc/

	PR target/101395
	* config/i386/driver-i386.c (host_detect_local_cpu): Check
	"arch [32|64]" and "tune [32|64]" for 32-bit and 64-bit codegen.
	Enable UINTR only for 64-bit codegen.
	* config/i386/i386-options.c
	(ix86_option_override_internal::DEF_PTA): Skip PTA_UINTR if not
	in 64-bit mode.
	* config/i386/i386.h (ARCH_ARG): New.
	(CC1_CPU_SPEC): Pass "[arch|tune] 32" for 32-bit codegen and
	"[arch|tune] 64" for 64-bit codegen.

gcc/testsuite/

	PR target/101395
	* gcc.target/i386/pr101395-1.c: New test.
	* gcc.target/i386/pr101395-2.c: Likewise.
	* gcc.target/i386/pr101395-3.c: Likewise.
---
 gcc/config/i386/driver-i386.c  | 25 --
 gcc/config/i386/i386-options.c |  1 +
 gcc/config/i386/i386.h |  7 +++---
 gcc/testsuite/gcc.target/i386/pr101395-1.c | 12 +++
 gcc/testsuite/gcc.target/i386/pr101395-2.c | 22 +++
 gcc/testsuite/gcc.target/i386/pr101395-3.c |  6 ++
 6 files changed, 64 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-3.c

diff --git 

Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread H.J. Lu via Gcc-patches
On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers  wrote:
>
> On Tue, 13 Jul 2021, H.J. Lu wrote:
>
> > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  
> > wrote:
> > >
> > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> > >
> > > Can you please explain the behavior here? Is there difference between 
> > > _Float16 and _Complex _Float16 when return? I.e.,
> > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> > > 2, For a single _Float16 value, are both real part and imaginary part 
> > > returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
> >
> > Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at
> >
> > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
>
> This PDF shows _Complex _Float16 as having a size of 2 bytes (should be
> 4-byte size, 2-byte alignment).
>
> It also seems to change double from 4-byte to 8-byte alignment, which is
> wrong.  And it's inconsistent about whether it covers the long double =
> double (Android) case - it shows that case for _Complex long double but
> not for long double itself.

Here is the v3 patch with the fixes.  I also updated the PDF file.

> --
> Joseph S. Myers
> jos...@codesourcery.com
>


-- 
H.J.
From a02a11ef0ea066cab57eb66ef392b21d243d2734 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 1 Jul 2021 13:58:00 -0700
Subject: [PATCH v3] Add optional _Float16 support

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
---
 low-level-sys-info.tex | 76 ++
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..9ae7995 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a
 \subsubsection{Fundamental Types}
 
 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types.  \code{__float80},
+scalar types and the processor scalar types.  \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.
 
@@ -79,23 +80,28 @@ scalar types and the processor scalar types.  \code{__float80},
 & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
 & \texttt{\textit{any-type} (*)()} & & \\
 \hline
-Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
 \cline{2-5}
-point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\
+\cline{2-5}
+& \texttt{float} & 4 & 4 & single (IEEE-754) \\
+\cline{2-5}
+Floating- & \texttt{double} & 8 & 4 & double (IEEE-754) \\
+point & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & 8 & 4 & double (IEEE-754) \\
 \cline{2-5}
 & \texttt{__float80}$^{\dagger\dagger}$  & 12 & 4 & 80-bit extended (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & 12 & 4 & 80-bit extended (IEEE-754) \\
 \cline{2-5}
 & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\
 \hline
-Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+& \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger}$ & 4 & 2 & complex 16-bit (IEEE-754) \\
 \cline{2-5}
-Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
-point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
 \cline{2-5}
-& \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 80-bit extended (IEEE-754) \\
-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
+Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+\cline{2-5}
+point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\
+& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
 \cline{2-5}
 & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\
 \hline
@@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double}
 type, on the Android{\texttrademark} platform.  More information on the
 Android{\texttrademark} platform is available from
 \url{http://www.android.com/}.}\\
+\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger}$
+The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\
  

Re: [PATCH] godump: Fix -fdump-go-spec= reproduceability issue

2021-07-13 Thread Ian Lance Taylor via Gcc-patches
On Tue, Jul 13, 2021 at 12:49 AM Jakub Jelinek  wrote:
>
> 2021-07-13  Jakub Jelinek  
>
> PR go/101407
> * godump.c (godump_str_hash): New type.
> (godump_container::pot_dummy_types): Use string_hash instead of
> ptr_hash in the hash_set.

Thanks for looking at this.  This is OK.

Ian


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread Segher Boessenkool
On Tue, Jul 13, 2021 at 10:09:25AM +0800, guojiufu wrote:
> >For loop looks like:
> >  do ;
> >  while (n-- > 0); /* while  (n-- > low); */

(This whole loop as written will be optimised away, but :-) )

> There is a patch that could mitigate "-1 +1 pair" in rtl part.
> https://gcc.gnu.org/g:8a15faa730f99100f6f3ed12663563356ec5a2c0

Does that solve PR67288 (and its many duplicates)?


Segher


Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread Joseph Myers
On Tue, 13 Jul 2021, H.J. Lu wrote:

> On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  wrote:
> >
> > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > Can you please explain the behavior here? Is there difference between 
> > _Float16 and _Complex _Float16 when return? I.e.,
> > 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> > 2, For a single _Float16 value, are both real part and imaginary part 
> > returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
> 
> Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at
> 
> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

This PDF shows _Complex _Float16 as having a size of 2 bytes (should be 
4-byte size, 2-byte alignment).

It also seems to change double from 4-byte to 8-byte alignment, which is 
wrong.  And it's inconsistent about whether it covers the long double = 
double (Android) case - it shows that case for _Complex long double but 
not for long double itself.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread Segher Boessenkool
On Mon, Jul 12, 2021 at 08:20:14AM +0200, Richard Biener wrote:
> On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> > Almost all targets just use Pmode, but there is no such guarantee I
> > think, and esp. some targets that do not have machine insns for this
> > (but want to generate different code for this anyway) can do pretty much
> > anything.
> > 
> > Maybe using just Pmode here is good enough though?
> 
> I think Pmode is a particularly bad choice and I'd prefer word_mode
> if we go for any hardcoded mode.

In many important cases you use a pointer as iteration variable.

Is word_mode register size on most current targets?

> s390x for example seems to handle
> both SImode and DImode (but names the helper gen_doloop_si64
> for SImode?!).

Yes, so Pmode will work fine for 390.  It would be nice if we could
allow multiple modes here, certainly.  Can we?

> But indeed it looks like somehow querying doloop_end
> is going to be difficult since the expander doesn't have any mode,
> so we'd have to actually try emit RTL here.

Or add a well-designed target macro for this.  "Which modes do we like
for IVs", perhaps?


Segher


Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread H.J. Lu via Gcc-patches
On Tue, Jul 13, 2021 at 7:48 AM Wang, Pengfei  wrote:
>
> Hi H.J.,
>
> Our LLVM implementation currently uses %xmm0 for both _Complex's real part and
> imaginary part. Do we have a special reason to use two registers?
> We are using one register on X64. Considering the performance, especially the
> register pressure, would it be better to use one register for _Complex
> _Float16 on 32-bit targets?

x86-64 psABI is unrelated to i386 psABI.  Using a pair of registers is
more natural for complex _Float16.  Since it is only used for function
return value, I don't think there is a register pressure issue.

> Thanks
> Pengfei
>
> -Original Message-
> From: H.J. Lu 
> Sent: Tuesday, July 13, 2021 10:26 PM
> To: Wang, Pengfei ; llvm-...@lists.llvm.org
> Cc: Joseph Myers ; GCC Patches 
> ; GNU C Library ; IA32 
> System V Application Binary Interface 
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  wrote:
> >
> > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > Can you please explain the behavior here? Is there difference between
> > _Float16 and _Complex _Float16 when return? I.e., 1, In which case will 
> > _Float16 values return in both %xmm0 and %xmm1?
> > 2, For a single _Float16 value, are both real part and imaginary part 
> > returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
>
> Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at
>
> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
>
> > Thanks
> > Pengfei
> >
> > -Original Message-
> > From: llvm-dev  On Behalf Of H.J. Lu
> > via llvm-dev
> > Sent: Friday, July 2, 2021 6:28 AM
> > To: Joseph Myers 
> > Cc: llvm-...@lists.llvm.org; GCC Patches ;
> > GNU C Library ; IA32 System V Application
> > Binary Interface 
> > Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
> >
> > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
> > >
> > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> > >
> > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 
> > > > registers.
> > >
> > > That restricts use of _Float16 to processors with SSE.  Is that what
> > > we want in the ABI, or should _Float16 be available with base 32-bit
> > > x86 architecture features only, much like _Float128 and the decimal
> > > FP types
> >
> > Yes, _Float16 requires XMM registers.
> >
> > > are?  (If it is restricted to SSE, we can of course ensure relevant
> > > libgcc functions are built with SSE enabled, and likewise in glibc
> > > if that gains
> > > _Float16 functions, though maybe with some extra complications to
> > > get relevant testcases to run whenever possible.)
> > >
> >
> > _Float16 functions in libgcc should be compiled with SSE enabled.
> >
> > BTW, _Float16 software emulation may require more than just SSE since we 
> > need to do _Float16 load and store with XMM registers.
> > There is no 16bit load/store for XMM registers without AVX512FP16.
> >
> > --
> > H.J.
> > ___
> > LLVM Developers mailing list
> > llvm-...@lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> --
> H.J.



-- 
H.J.


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Kewen.Lin via Gcc-patches
on 2021/7/13 8:42 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the comments!
>>
on 2021/7/13 5:35 PM, Richard Biener wrote:
>>> On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:

 Hi,

 When I added the support for Power10 newly introduced multiply
 highpart instructions, I noticed that currently the vectorizer
 doesn't try to vectorize the multiply-highpart pattern; I hope
 this isn't intentional?

 This patch is to extend the existing pattern mulhs handlings
 to cover multiply highpart.  Another alternative seems to
 recog mul_highpart operation in a general place applied for
 scalar code when the target supports the optab for the scalar
 operation, it's based on the assumption that one target which
 supports vector version of multiply highpart should have the
 scalar version.  I noticed that the function can_mult_highpart_p
 can check/handle mult_highpart well even without mul_highpart
 optab support, I think to recog this pattern in vectorizer
 is better.  Is it on the right track?
>>>
>>> I think it's on the right track, using IFN_LAST is a bit awkward
>>> in case yet another case pops up so maybe you can use
>>> a code_helper instance instead which unifies tree_code,
>>> builtin_code and internal_fn?
>>>
>>
>> If there is one new requirement which doesn't have/introduce IFN
>> stuffs but have one existing tree_code, can we add one more field
>> with type tree_code, then for the IFN_LAST path we can check the
>> different requirements under the guard with that tree_code variable?
>>
>>> I also notice that can_mult_highpart_p will return true if
>>> only vec_widen_[us]mult_{even,odd,hi,lo} are available,
>>> but then the result might be less optimal (or even not
>>> handled later)?
>>>
>>
>> I think it will be handled always?  The expander calls
>>
>> rtx
>> expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
>>   rtx target, bool uns_p)
>>
>> which will further check with can_mult_highpart_p.
>>
>> For the below case,
>>
>> #define SHT_CNT 16
>>
>> __attribute__ ((noipa)) void
>> test ()
>> {
>>   for (int i = 0; i < N; i++)
>> sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
>> }
>>
>> Without this patch, it use widen_mult like below:
>>
>>   vect__1.5_19 = MEM  [(short int *)_a + ivtmp.18_24 
>> * 1];
>>   vect__3.8_14 = MEM  [(short int *)_b + ivtmp.18_24 
>> * 1];
>>   vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR ;
>>   vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR ;
>>   vect__6.10_25 = vect_patt_22.9_13 >> 16;
>>   vect__6.10_26 = vect_patt_22.9_9 >> 16;
>>   vect__7.11_27 = VEC_PACK_TRUNC_EXPR ;
>>   MEM  [(short int *)_c + ivtmp.18_24 * 1] = 
>> vect__7.11_27;
>>
>> .L2:
>> lxvx 33,7,9
>> lxvx 32,8,9
>> vmulosh 13,0,1// widen mult
>> vmulesh 0,0,1
>> xxmrglw 33,32,45  // merge
>> xxmrghw 32,32,45
>> vsraw 1,1,12  // shift
>> vsraw 0,0,12
>> vpkuwum 0,0,1 // pack
>> stxvx 32,10,9
>> addi 9,9,16
>> bdnz .L2
>>
>>
>> With this patch, it ends up with:
>>
>>   vect__1.5_14 = MEM  [(short int *)_a + ivtmp.17_24 
>> * 1];
>>   vect__3.8_8 = MEM  [(short int *)_b + ivtmp.17_24 
>> * 1];
>>   vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
>>   MEM  [(short int *)_c + ivtmp.17_24 * 1] = 
>> vect_patt_21.9_25;
> 
> Yes, so I'm curious what it ends up with/without the patch on x86_64 which
> can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart.
> 

For test case:

```
#define N 32
typedef signed int bigType;
typedef signed short smallType;
#define SH_CNT 16

extern smallType small_a[N], small_b[N], small_c[N];

__attribute__((noipa)) void test_si(int n) {
  for (int i = 0; i < n; i++)
small_c[i] = ((bigType)small_a[i] * (bigType)small_b[i]) >> SH_CNT;
}

```

on x86_64, with option set: -mfpmath=sse -msse2 -O2 -ftree-vectorize

1) without this patch, the pattern isn't recognized, the IR looks like:

   [local count: 94607391]:
  bnd.5_34 = niters.4_25 >> 3;
  _13 = (sizetype) bnd.5_34;
  _29 = _13 * 16;

   [local count: 378429566]:
  # ivtmp.34_4 = PHI 
  vect__1.10_40 = MEM  [(short int *)_a + ivtmp.34_4 
* 1];
  vect__3.13_43 = MEM  [(short int *)_b + ivtmp.34_4 
* 1];
  vect_patt_18.14_44 = WIDEN_MULT_LO_EXPR ;
  vect_patt_18.14_45 = WIDEN_MULT_HI_EXPR ;
  vect__6.15_46 = vect_patt_18.14_44 >> 16;
  vect__6.15_47 = vect_patt_18.14_45 >> 16;
  vect__7.16_48 = VEC_PACK_TRUNC_EXPR ;
  MEM  [(short int *)_c + ivtmp.34_4 * 1] = 
vect__7.16_48;
  ivtmp.34_5 = ivtmp.34_4 + 16;
  if (ivtmp.34_5 != _29)
goto ; [75.00%]
  else
goto ; [25.00%]

...

*asm*:

.L4:
movdqu  small_b(%rax), %xmm3
movdqu  small_a(%rax), %xmm1
addq$16, %rax
movdqu  small_a-16(%rax), %xmm2
pmullw  %xmm3, %xmm1
pmulhw  %xmm3, %xmm2
movdqa  %xmm1, %xmm0
punpcklwd   %xmm2, %xmm0
punpckhwd 

RE: [PATCH] Port GCC documentation to Sphinx

2021-07-13 Thread Tamar Christina via Gcc-patches
Hi Martin,

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Martin Liška
> Sent: Tuesday, June 29, 2021 11:09 AM
> To: Joseph Myers 
> Cc: GCC Development ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Port GCC documentation to Sphinx
> 
> On 6/28/21 5:33 PM, Joseph Myers wrote:
> > Are formatted manuals (HTML, PDF, man, info) corresponding to this
> > patch version also available for review?
> 
> I've just uploaded them here:
> https://splichal.eu/gccsphinx-final/
> 

Thanks for doing this 

I'm a primary user of the PDFs (easier to work offline). I do like the look and
syntax highlighting of the new PDFs, but I do prefer the way the current
itemized entries are laid out.

See for instance `vect_interleave`, which before would have the description
indented on the next line, vs. the new docs, which put it on the same line.

The visual break in my opinion makes it easier to read.  It currently looks 
like a sea of text.  Also purely personal I expect, but could the weight of the 
bold entries be reduced a bit? They look very BOLD to me atm and when there's a 
lot of them I find it slightly harder to read.

Cheers,
Tamar

> Martin



RE: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread Wang, Pengfei via Gcc-patches
Hi H.J.,

Our LLVM implementation currently uses %xmm0 for both the real part and the
imaginary part of _Complex _Float16. Do we have a special reason to use two
registers? We are using one register on X64. Considering performance, especially
register pressure, wouldn't it be better to use one register for _Complex
_Float16 on 32-bit targets?

Thanks
Pengfei

-Original Message-
From: H.J. Lu  
Sent: Tuesday, July 13, 2021 10:26 PM
To: Wang, Pengfei ; llvm-...@lists.llvm.org
Cc: Joseph Myers ; GCC Patches 
; GNU C Library ; IA32 
System V Application Binary Interface 
Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support

On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there difference between 
> _Float16 and _Complex _Float16 when return? I.e., 1, In which case will 
> _Float16 values return in both %xmm0 and %xmm1?
> 2, For a single _Float16 value, are both real part and imaginary part 
> returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?

Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -Original Message-
> From: llvm-dev  On Behalf Of H.J. Lu 
> via llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers 
> Cc: llvm-...@lists.llvm.org; GCC Patches ; 
> GNU C Library ; IA32 System V Application 
> Binary Interface 
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE.  Is that what 
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal 
> > FP types
>
> Yes, _Float16 requires XMM registers.
>
> > are?  (If it is restricted to SSE, we can of course ensure relevant 
> > libgcc functions are built with SSE enabled, and likewise in glibc 
> > if that gains
> > _Float16 functions, though maybe with some extra complications to 
> > get relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE since we need 
> to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.
> ___
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
H.J.


Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread H.J. Lu via Gcc-patches
On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there difference between 
> _Float16 and _Complex _Float16 when return? I.e.,
> 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> 2, For a single _Float16 value, are both real part and imaginary part 
> returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?

Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -Original Message-
> From: llvm-dev  On Behalf Of H.J. Lu via 
> llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers 
> Cc: llvm-...@lists.llvm.org; GCC Patches ; GNU C 
> Library ; IA32 System V Application Binary 
> Interface 
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE.  Is that what
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal FP
> > types
>
> Yes, _Float16 requires XMM registers.
>
> > are?  (If it is restricted to SSE, we can of course ensure relevant
> > libgcc functions are built with SSE enabled, and likewise in glibc if
> > that gains
> > _Float16 functions, though maybe with some extra complications to get
> > relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE since we need 
> to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.
> ___
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
H.J.
From b48c361b939ef9216184f1a58a9d5052bbeb7551 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 1 Jul 2021 13:58:00 -0700
Subject: [PATCH v2] Add optional _Float16 support

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
---
 low-level-sys-info.tex | 76 ++
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..157509b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a
 \subsubsection{Fundamental Types}
 
 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types.  \code{__float80},
+scalar types and the processor scalar types.  \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.
 
@@ -79,22 +80,27 @@ scalar types and the processor scalar types.  \code{__float80},
 & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
 & \texttt{\textit{any-type} (*)()} & & \\
 \hline
-Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
 \cline{2-5}
-point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\
 \cline{2-5}
-& \texttt{__float80}$^{\dagger\dagger}$  & 12 & 4 & 80-bit extended (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{float} & 4 & 4 & single (IEEE-754) \\
+\cline{2-5}
+Floating- & \texttt{double} & 8
+	& 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\
+\cline{2-5}
+point & \texttt{__float80}$^{\dagger\dagger}$  & 16 & 16 & 80-bit extended (IEEE-754) \\
+& \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$  & 16 & 16 & 80-bit extended (IEEE-754) \\
 \cline{2-5}
 & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\
 \hline
-Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+& \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & complex 16-bit (IEEE-754) \\
 \cline{2-5}
-Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
-point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
 \cline{2-5}
-& \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 80-bit extended (IEEE-754) \\
+Complex& 

[committed] libstdc++: Simplify basic_string_view::ends_with [PR 101361]

2021-07-13 Thread Jonathan Wakely via Gcc-patches
The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR c++/101361
* include/std/string_view (ends_with): Use traits_type::compare
directly.

Tested powerpc64le-linux. Committed to trunk.

commit 4d3eaeb4f505b0838c673ee28e7dba8687fc8272
Author: Jonathan Wakely 
Date:   Tue Jul 13 12:21:27 2021

libstdc++: Simplify basic_string_view::ends_with [PR 101361]

The use of npos triggers a diagnostic as described in PR c++/101361.
This change replaces the use of npos with the exact length, which is
already known. We can further simplify it by inlining the effects of
compare and substr, avoiding the redundant range checks in the latter.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR c++/101361
* include/std/string_view (ends_with): Use traits_type::compare
directly.

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index cfdcf28f026..4ea72c6cef2 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -361,8 +361,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   ends_with(basic_string_view __x) const noexcept
   {
-   return this->size() >= __x.size()
-   && this->compare(this->size() - __x.size(), npos, __x) == 0;
+   const auto __len = this->size();
+   const auto __xlen = __x.size();
+   return __len >= __xlen
+ && traits_type::compare(end() - __xlen, __x.data(), __xlen) == 0;
   }
 
   constexpr bool


Re: [PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following adds support for re-using the vector reduction def
> from the main loop in vectorized epilogue loops on architectures
> which use different vector sizes for the epilogue.  That's only
> x86 as far as I am aware.
>
> vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
> regtest in progress.
>
> There's costing issues on x86 which usually prevent vectorizing
> an epilogue with a reduction, at least for loops that only
> have a reduction - it could be mitigated by not accounting for
> the epilogue there if we can compute that we can re-use the
> main loops cost.
>
> Richard - did I figure the correct place to adjust?  I guess
> adjusting accumulator->reduc_input in vect_transform_cycle_phi
> for re-use by the skip code in vect_create_epilog_for_reduction
> is a bit awkward but at least we're conciously doing
> vect_create_epilog_for_reduction last (via vectorizing live
> operations).

Yeah.  IMO it'd be a bit cleaner to store the new accumulator directly
in the reduc_info, but I don't feel strongly about it.

Apart from that and a minor nit below, it looks good to me FWIW.

(At some point it'd be good for reduc_info to be its own structure,
separate from stmt_vec_info, so that there's less of a cost associated
with storing more data there.)

Thanks,
Richard

> OK in the unlikely case all testing succeeds (I also want to
> run it through SPEC with/without -fno-vect-cost-model which
> will take some time)?
>
> Thanks,
> Richard.
>
> 2021-07-13  Richard Biener  
>
>   * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
>   vector types where the old vector type has a multiple of
>   the new vector type elements.
>   (vect_create_partial_epilog): New function, split out from...
>   (vect_create_epilog_for_reduction): ... here.
>   (vect_transform_cycle_phi): Reduce the re-used accumulator
>   to the new vector type.
>
>   * gcc.target/i386/vect-reduc-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
>  gcc/tree-vect-loop.c | 223 ---
>  2 files changed, 155 insertions(+), 85 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c 
> b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> new file mode 100644
> index 000..9ee9ba4e736
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
> +
> +#define N 32
> +int foo (int *a, int n)
> +{
> +  int sum = 1;
> +  for (int i = 0; i < 8*N + 4; ++i)
> +sum += a[i];
> +  return sum;
> +}
> +
> +/* The reduction epilog should be vectorized and the accumulator
> +   re-used.  */
> +/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-times "psrl" 2 } } */
> +/* { dg-final { scan-assembler-times "padd" 5 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 8c27d75f889..98e2a845629 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info 
> loop_vinfo,
>   ones as well.  */
>tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> -  if (!useless_type_conversion_p (old_vectype, vectype))
> +  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> + TYPE_VECTOR_SUBPARTS (vectype)))
>  return false;
>  
>/* Non-SLP reductions might apply an adjustment after the reduction

The comment above this needs updating too.

> @@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info 
> loop_vinfo,
>return true;
>  }
>  
> +/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
> +   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
> +
> +static tree
> +vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
> + gimple_seq *seq)
> +{
> +  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant 
> ();
> +  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +  tree stype = TREE_TYPE (vectype);
> +  tree new_temp = vec_def;
> +  while (nunits > nunits1)
> +{
> +  nunits /= 2;
> +  tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE 
> (vectype),
> +stype, nunits);
> +  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
> +
> +  /* The target has to make sure we support lowpart/highpart
> +  extraction, either via direct vector extract or through
> +  an integer mode punning.  */
> +  tree dst1, dst2;
> +  gimple *epilog_stmt;
> +  if (convert_optab_handler (vec_extract_optab,
> +  

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-13 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:
> Somebody with more C++ knowledge than me needs to approve the
> vec.h changes - I don't feel competent to assess all effects of the change.

They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit. The extern object avoids that problem. It probably won't cause
real problems in practice but it's still technically UB.

If avoiding the extern variable is desirable, you can make it inline
when compiled by a C++17 compiler (such as GCC 11):

#if __cpp_inline_variables
inline constexpr vnull vNULL{ };
#else
extern const vnull vNULL;
#endif

and then define it in vec.c:

#if ! __cpp_inline_variables
const vnull vNULL{ };
#endif



Re: [PATCHv3 00/55] Replace the Power target-specific builtin machinery

2021-07-13 Thread Bill Schmidt via Gcc-patches

Ping^2

On 6/25/21 10:25 AM, Bill Schmidt wrote:

Ping / beg  :-)

On 6/17/21 10:18 AM, Bill Schmidt via Gcc-patches wrote:

Original patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html

V2 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html

I've made some slight changes to the V2 series because of recent updates
to trunk from Carl Love and Peter Bergner.  Carl added some new P10
builtins, and Peter made some changes to the MMA builtins.  This series
is compatible with all builtins that are currently upstream.

As a reminder, as a result of reviews, the original patch 0001 has been
dropped, so the patch numbering is off by one compared with the original
series.  Status of the remaining patches (using new numbering):

0001: Approved
0002: Approved
0003: Approved
0004: Approved
0005: Needs re-review
0006: Approved
0007: Approved
0008: Approved
0009: Approved
0010-0055: Not yet reviewed

Thanks again for the ongoing reviews!

Bill

Bill Schmidt (55):
Support scanning of build-time GC roots in gengtype
rs6000: Initial create of rs6000-gen-builtins.c
rs6000: Add initial input files
rs6000: Add file support and functions for diagnostic support
rs6000: Add helper functions for parsing
rs6000: Add functions for matching types, part 1 of 3
rs6000: Add functions for matching types, part 2 of 3
rs6000: Add functions for matching types, part 3 of 3
rs6000: Red-black tree implementation for balanced tree search
rs6000: Main function with stubs for parsing and output
rs6000: Parsing built-in input file, part 1 of 3
rs6000: Parsing built-in input file, part 2 of 3
rs6000: Parsing built-in input file, part 3 of 3
rs6000: Parsing of overload input file
rs6000: Build and store function type identifiers
rs6000: Write output to the builtin definition include file
rs6000: Write output to the builtins header file
rs6000: Write output to the builtins init file, part 1 of 3
rs6000: Write output to the builtins init file, part 2 of 3
rs6000: Write output to the builtins init file, part 3 of 3
rs6000: Write static initializations for built-in table
rs6000: Write static initializations for overload tables
rs6000: Incorporate new builtins code into the build machinery
rs6000: Add gengtype handling to the build machinery
rs6000: Add the rest of the [altivec] stanza to the builtins file
rs6000: Add VSX builtins
rs6000: Add available-everywhere and ancient builtins
rs6000: Add power7 and power7-64 builtins
rs6000: Add power8-vector builtins
rs6000: Add Power9 builtins
rs6000: Add more type nodes to support builtin processing
rs6000: Add Power10 builtins
rs6000: Add MMA builtins
rs6000: Add miscellaneous builtins
rs6000: Add Cell builtins
rs6000: Add remaining overloads
rs6000: Execute the automatic built-in initialization code
rs6000: Darwin builtin support
rs6000: Add sanity to V2DI_type_node definitions
rs6000: Always initialize vector_pair and vector_quad nodes
rs6000: Handle overloads during program parsing
rs6000: Handle gimple folding of target built-ins
rs6000: Support for vectorizing built-in functions
rs6000: Builtin expansion, part 1
rs6000: Builtin expansion, part 2
rs6000: Builtin expansion, part 3
rs6000: Builtin expansion, part 4
rs6000: Builtin expansion, part 5
rs6000: Builtin expansion, part 6
rs6000: Update rs6000_builtin_decl
rs6000: Miscellaneous uses of rs6000_builtin_decls_x
rs6000: Debug support
rs6000: Update altivec.h for automated interfaces
rs6000: Test case adjustments
rs6000: Enable the new builtin support

   gcc/Makefile.in   |5 +-
   gcc/config.gcc|2 +
   gcc/config/rs6000/altivec.h   |  522 +-
   gcc/config/rs6000/darwin.h|8 +-
   gcc/config/rs6000/rbtree.c|  242 +
   gcc/config/rs6000/rbtree.h|   52 +
   gcc/config/rs6000/rs6000-builtin-new.def  | 3998 +++
   gcc/config/rs6000/rs6000-c.c  | 1083 +++
   gcc/config/rs6000/rs6000-call.c   | 3399 -
   gcc/config/rs6000/rs6000-gen-builtins.c   | 2984 
   gcc/config/rs6000/rs6000-overload.def | 6186 +
   gcc/config/rs6000/rs6000.c|  219 +-
   gcc/config/rs6000/rs6000.h|   84 +
   gcc/config/rs6000/t-rs6000|   45 +-
   gcc/gengtype-state.c  |   32 +-
   gcc/gengtype.c|   22 +-
   gcc/gengtype.h|5 +
   .../powerpc/bfp/scalar-extract-exp-2.c|2 +-
   .../powerpc/bfp/scalar-extract-sig-2.c|2 +-
   .../powerpc/bfp/scalar-insert-exp-2.c |2 +-
   

[COMMITTED] tree-optimization/93781 - Adjust testcase to test the call is removed.

2021-07-13 Thread Andrew MacLeod via Gcc-patches

Ranger now resolves the first test in this series, so add the check.

Andrew

commit f75560398af6f1f696c820016f437af4e8b4265c
Author: Andrew MacLeod 
Date:   Tue Jul 13 09:41:30 2021 -0400

Adjust testcase to test the call is removed.

Ranger now handles the test.

gcc/testsuite
PR tree-optimization/93781
* gcc.dg/tree-ssa/pr93781-1.c: Check that call is removed.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr93781-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr93781-1.c
index 5ebd8053965..b2505f3959d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr93781-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr93781-1.c
@@ -12,7 +12,9 @@ void foo (unsigned int arg)
   if (a < 0)
 b = x;
 
-  /* In the fullness of time, we will delete this call.  */
   if (b >=  5)
 kill ();;
 }
+
+/* { dg-final { scan-tree-dump-not "kill" "evrp" } }  */
+


[Committed] Make gimple_could_trap_p const-safe.

2021-07-13 Thread Roger Sayle

On Mon, Jul 12, 2021 Richard Biener  wrote:
> There's const-correctness pieces in your patch - those are OK under the 
> obvious rule and you might want to push them separately.

Allow gimple_could_trap_p (which previously took a non-const gimple)
to be called from functions that take a const gimple (such as
gimple_has_side_effects), and update its prototypes.

These chunks have been (re)tested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures.

Committed to mainline as obvious/pre-approved.

2021-07-13  Roger Sayle  
Richard Biener  

gcc/ChangeLog
* gimple.c (gimple_could_trap_p_1):  Make S argument a
"const gimple*".  Preserve constness in call to
gimple_asm_volatile_p.
(gimple_could_trap_p): Make S argument a "const gimple*".
* gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
Update function prototypes.

Thanks,
Roger
--

diff --git a/gcc/gimple.c b/gcc/gimple.c
index f1044e9..66edc1e 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2129,7 +2136,7 @@ gimple_has_side_effects (const gimple *s)
S is a GIMPLE_ASSIGN, the LHS of the assignment is also checked.  */
 
 bool
-gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores)
+gimple_could_trap_p_1 (const gimple *s, bool include_mem, bool include_stores)
 {
   tree t, div = NULL_TREE;
   enum tree_code op;
@@ -2146,7 +2153,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool 
include_stores)
   switch (gimple_code (s))
 {
 case GIMPLE_ASM:
-  return gimple_asm_volatile_p (as_a <gasm *> (s));
+  return gimple_asm_volatile_p (as_a <const gasm *> (s));
 
 case GIMPLE_CALL:
   t = gimple_call_fndecl (s);
@@ -2192,7 +2199,7 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool 
include_stores)
 /* Return true if statement S can trap.  */
 
 bool
-gimple_could_trap_p (gimple *s)
+gimple_could_trap_p (const gimple *s)
 {
   return gimple_could_trap_p_1 (s, true, true);
 }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index e7dc2a4..1a2e120 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1601,8 +1601,8 @@ void gimple_set_lhs (gimple *, tree);
 gimple *gimple_copy (gimple *);
 void gimple_move_vops (gimple *, gimple *);
 bool gimple_has_side_effects (const gimple *);
-bool gimple_could_trap_p_1 (gimple *, bool, bool);
-bool gimple_could_trap_p (gimple *);
+bool gimple_could_trap_p_1 (const gimple *, bool, bool);
+bool gimple_could_trap_p (const gimple *);
 bool gimple_assign_rhs_could_trap_p (gimple *);
 extern void dump_gimple_statistics (void);
 unsigned get_gimple_rhs_num_ops (enum tree_code);


Re: [PATCH] gcov: Add __gcov_info_to_gdca()

2021-07-13 Thread Sebastian Huber

On 13/07/2021 15:03, Sebastian Huber wrote:

memset (list_sizes, 0, counters * sizeof (unsigned));


Sorry, I just realized that memset() cannot be used if inhibit_libc is 
defined. I will send a v2 of the patch.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


[PATCH] gcov: Add __gcov_info_to_gdca()

2021-07-13 Thread Sebastian Huber
Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcda info in a
free-standing environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use a linker
script is (use "gcc -coverage -fprofile-info-section -lgcc test.c" to compile
it):

  #include <stdio.h>
  #include <stdlib.h>
  #include <gcov.h>

  extern const struct gcov_info *my_info;

  static void
  filename (const char *f, void *arg)
  {
printf("filename: %s\n", f);
  }

  static void
  dump (const void *d, unsigned n, void *arg)
  {
const unsigned char *c = d;

for (unsigned i = 0; i < n; ++i)
  printf ("%02x", c[i]);
  }

  static void *
  allocate (unsigned length, void *arg)
  {
return malloc (length);
  }

  int main()
  {
__asm__ volatile (".set my_info, .LPBX2");
__gcov_info_to_gcda (my_info, filename, dump, allocate, NULL);
return 0;
  }

gcc/

* gcc/gcov-io.h (gcov_write): Declare.
* gcc/gcov-io.c (gcov_write): New.
* doc/invoke.texi (fprofile-info-section): Mention
__gcov_info_to_gcda().

libgcc/

* Makefile.in (LIBGCOV_DRIVER): Add _gcov_info_to_gcda.
* gcov.h (gcov_info): Declare.
(__gcov_info_to_gcda): Likewise.
* libgcov-driver.c (are_all_counters_zero): New.
(dump_handler): Likewise.
(allocate_handler): Likewise.
(dump_unsigned): Likewise.
(dump_counter): Likewise.
(write_topn_counters): Add dump, allocate, and arg parameters.  Use
dump_unsigned() and dump_counter().
(write_one_data): Add dump, allocate, and arg parameters.  Use
dump_unsigned(), dump_counter(), and are_all_counters_zero().
(__gcov_info_to_gcda): New.
---
 gcc/doc/invoke.texi |  80 ++---
 gcc/gcov-io.c   |  10 +++
 gcc/gcov-io.h   |   1 +
 libgcc/Makefile.in  |   2 +-
 libgcc/gcov.h   |  17 +
 libgcc/libgcov-driver.c | 155 +++-
 6 files changed, 218 insertions(+), 47 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e67d47af676d..2c514acf2003 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14782,17 +14782,17 @@ To optimize the program based on the collected 
profile information, use
 Register the profile information in the specified section instead of using a
 constructor/destructor.  The section name is @var{name} if it is specified,
 otherwise the section name defaults to @code{.gcov_info}.  A pointer to the
-profile information generated by @option{-fprofile-arcs} or
-@option{-ftest-coverage} is placed in the specified section for each
-translation unit.  This option disables the profile information registration
-through a constructor and it disables the profile information processing
-through a destructor.  This option is not intended to be used in hosted
-environments such as GNU/Linux.  It targets systems with limited resources
-which do not support constructors and destructors.  The linker could collect
-the input sections in a continuous memory block and define start and end
-symbols.  The runtime support could dump the profiling information registered
-in this linker set during program termination to a serial line for example.  A
-GNU linker script example which defines a linker output section follows:
+profile information generated by @option{-fprofile-arcs} is placed in the
+specified section for each translation unit.  This option disables the profile
+information registration through a constructor and it disables the profile
+information processing through a destructor.  This option is not intended to be
+used in hosted environments such as GNU/Linux.  It targets free-standing
+environments (for example embedded systems) with limited resources which do not
+support constructors/destructors or the C library file I/O.
+
+The linker could collect the input sections in a continuous memory block and
+define start and end symbols.  A GNU linker script example which defines a
+linker output section follows:
 
 @smallexample
   .gcov_info  :
@@ -14803,6 +14803,64 @@ GNU linker script example which defines a linker 
output section follows:
   @}
 @end smallexample
 
+The program could dump the profiling information registered in this linker set
+for example like this:
+
+@smallexample
+#include <gcov.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+extern const struct gcov_info *__gcov_info_start[];
+extern const struct gcov_info *__gcov_info_end[];
+
+static void
+filename (const char *f, void *arg)
+@{
+  puts (f);
+@}
+
+static void
+dump (const void *d, unsigned n, void *arg)
+@{
+  const unsigned char *c = d;
+
+  for (unsigned i = 0; i < n; ++i)
+printf ("%02x", c[i]);
+@}
+
+static void *
+allocate (unsigned length, void *arg)
+@{
+  return malloc (length);
+@}
+
+static void
+dump_gcov_info (void)
+@{
+  const struct gcov_info **info = __gcov_info_start;
+  const struct gcov_info **end = __gcov_info_end;
+
+  /* Obfuscate variable to prevent 

[PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-13 Thread Jiufu Guo via Gcc-patches
Major changes from v1:
* Add target hook to query preferred doloop mode.
* Recompute doloop iv base from niter under preferred mode.

Currently, the doloop.xx variable uses the same type as niter, which may be
shorter than the word size.  In some cases it would be better to use the
word-size type.  For example, on a 64-bit system a 32-bit value may be
accessed through a subreg, so using a 64-bit type for niter is preferable
if the value can be represented in both 32 and 64 bits.

This patch adds a target hook to query the preferred mode for the doloop
iv, and updates the doloop iv mode accordingly.

Bootstrapped and regtested on powerpc64le; is this OK for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-13  Jiufu Guo  

PR target/61837
* gcc.target/powerpc/pr61837.c: New test.
---
 gcc/config/rs6000/rs6000.c |  9 +++
 gcc/doc/tm.texi|  4 ++
 gcc/doc/tm.texi.in |  2 +
 gcc/target.def |  7 +++
 gcc/targhooks.c|  8 +++
 gcc/targhooks.h|  2 +
 gcc/testsuite/gcc.target/powerpc/pr61837.c | 16 ++
 gcc/tree-ssa-loop-ivopts.c | 66 +-
 8 files changed, 112 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr61837.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9a5db63d0ef..444f3c49288 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1700,6 +1700,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_DOLOOP_COST_FOR_ADDRESS
 #define TARGET_DOLOOP_COST_FOR_ADDRESS 10
 
+#undef TARGET_PREFERRED_DOLOOP_MODE
+#define TARGET_PREFERRED_DOLOOP_MODE rs6000_preferred_doloop_mode
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
@@ -27867,6 +27870,12 @@ rs6000_predict_doloop_p (struct loop *loop)
   return true;
 }
 
+static machine_mode
+rs6000_preferred_doloop_mode (machine_mode)
+{
+  return word_mode;
+}
+
 /* Implement TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P.  */
 
 static bool
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2a41ae5fba1..3f5881220f8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11984,6 +11984,10 @@ By default, the RTL loop optimizer does not use a 
present doloop pattern for
 loops containing function calls or branch on table instructions.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE 
(machine_mode @var{mode})
+This hook returns a more preferred mode or the @var{mode} itself.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
*@var{insn})
 Take an instruction in @var{insn} and return @code{false} if the instruction
 is not appropriate as a combination of two or more instructions.  The
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f881cdabe9e..38215149a92 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7917,6 +7917,8 @@ to by @var{ce_info}.
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
+@hook TARGET_PREFERRED_DOLOOP_MODE
+
 @hook TARGET_LEGITIMATE_COMBINED_INSN
 
 @hook TARGET_CAN_FOLLOW_JUMP
diff --git a/gcc/target.def b/gcc/target.def
index c009671c583..91a96150e50 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4454,6 +4454,13 @@ loops containing function calls or branch on table 
instructions.",
  const char *, (const rtx_insn *insn),
  default_invalid_within_doloop)
 
+DEFHOOK
+(preferred_doloop_mode,
+ "This hook returns a more preferred mode or the @var{mode} itself.",
+ machine_mode,
+ (machine_mode mode),
+ default_preferred_doloop_mode)
+
 /* Returns true for a legitimate combined insn.  */
 DEFHOOK
 (legitimate_combined_insn,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 44a1facedcf..eb5190910dc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -660,6 +660,14 @@ default_predict_doloop_p (class loop *loop 
ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* By default, just use the input MODE itself.  */
+
+machine_mode
+default_preferred_doloop_mode (machine_mode mode)
+{
+  return mode;
+}
+
 /* NULL if INSN insn is valid within a low-overhead loop, otherwise returns
an error message.
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f70a307d26c..f0560b6ae34 

Re: [PATCH 0/7] ifcvt: Convert multiple

2021-07-13 Thread Robin Dapp via Gcc-patches

Ping :)


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin  wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
on 2021/7/13 5:35 PM, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> When I added the support for Power10 newly introduced multiply
> >> highpart instrutions, I noticed that currently vectorizer
> >> doesn't try to vectorize multiply highpart pattern, I hope
> >> this isn't intentional?
> >>
> >> This patch is to extend the existing pattern mulhs handlings
> >> to cover multiply highpart.  Another alternative seems to
> >> recog mul_highpart operation in a general place applied for
> >> scalar code when the target supports the optab for the scalar
> >> operation, it's based on the assumption that one target which
> >> supports vector version of multiply highpart should have the
> >> scalar version.  I noticed that the function can_mult_highpart_p
> >> can check/handle mult_highpart well even without mul_highpart
> >> optab support, I think to recog this pattern in vectorizer
> >> is better.  Is it on the right track?
> >
> > I think it's on the right track, using IFN_LAST is a bit awkward
> > in case yet another case pops up so maybe you can use
> > a code_helper instance instead which unifies tree_code,
> > builtin_code and internal_fn?
> >
>
> If there is one new requirement which doesn't have/introduce IFN
> stuffs but have one existing tree_code, can we add one more field
> with type tree_code, then for the IFN_LAST path we can check the
> different requirements under the guard with that tree_code variable?
>
> > I also notice that can_mult_highpart_p will return true if
> > only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> > but then the result might be less optimal (or even not
> > handled later)?
> >
>
> I think it will be handled always?  The expander calls
>
> rtx
> expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
>   rtx target, bool uns_p)
>
> which will further check with can_mult_highpart_p.
>
> For the below case,
>
> #define SHT_CNT 16
>
> __attribute__ ((noipa)) void
> test ()
> {
>   for (int i = 0; i < N; i++)
> sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
> }
>
> Without this patch, it uses widen_mult like below:
>
>   vect__1.5_19 = MEM  [(short int *)_a + ivtmp.18_24 
> * 1];
>   vect__3.8_14 = MEM  [(short int *)_b + ivtmp.18_24 
> * 1];
>   vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR ;
>   vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR ;
>   vect__6.10_25 = vect_patt_22.9_13 >> 16;
>   vect__6.10_26 = vect_patt_22.9_9 >> 16;
>   vect__7.11_27 = VEC_PACK_TRUNC_EXPR ;
>   MEM  [(short int *)_c + ivtmp.18_24 * 1] = 
> vect__7.11_27;
>
> .L2:
> lxvx 33,7,9
> lxvx 32,8,9
> vmulosh 13,0,1// widen mult
> vmulesh 0,0,1
> xxmrglw 33,32,45  // merge
> xxmrghw 32,32,45
> vsraw 1,1,12  // shift
> vsraw 0,0,12
> vpkuwum 0,0,1 // pack
> stxvx 32,10,9
> addi 9,9,16
> bdnz .L2
>
>
> With this patch, it ends up with:
>
>   vect__1.5_14 = MEM  [(short int *)_a + ivtmp.17_24 
> * 1];
>   vect__3.8_8 = MEM  [(short int *)_b + ivtmp.17_24 * 
> 1];
>   vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
>   MEM  [(short int *)_c + ivtmp.17_24 * 1] = 
> vect_patt_21.9_25;

Yes, so I'm curious what it ends up with/without the patch on x86_64 which
can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart.

Richard.

> .L2:
> lxvx 32,8,9
> lxvx 33,10,9
> vmulosh 13,0,1   // widen mult
> vmulesh 0,0,1
> vperm 0,0,13,12  // perm on widen mults
> stxvx 32,7,9
> addi 9,9,16
> bdnz .L2
>
>
> > That is, what about adding optab internal functions
> > for [us]mul_highpart instead, much like the existing
> > ones for MULH{R,}S?
> >
>
> OK, I was thinking the IFN way at the beginning, but was worried
> that it's easy to be blamed saying it's not necessary since there
> is one existing tree_code.  :-)  Will update it with IFN way.
>
> BR,
> Kewen
>
> > Richard.
> >
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu.
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> >> recog normal multiply highpart.
> >>
>
>


[PATCH] Support reduction def re-use for epilogue with different vector size

2021-07-13 Thread Richard Biener
The following adds support for re-using the vector reduction def
from the main loop in vectorized epilogue loops on architectures
which use different vector sizes for the epilogue.  That's only
x86 as far as I am aware.

vect.exp tested on x86_64-unknown-linux-gnu, full bootstrap &
regtest in progress.

There's costing issues on x86 which usually prevent vectorizing
an epilogue with a reduction, at least for loops that only
have a reduction - it could be mitigated by not accounting for
the epilogue there if we can compute that we can re-use the
main loops cost.

Richard - did I figure the correct place to adjust?  I guess
adjusting accumulator->reduc_input in vect_transform_cycle_phi
for re-use by the skip code in vect_create_epilog_for_reduction
is a bit awkward but at least we're conciously doing
vect_create_epilog_for_reduction last (via vectorizing live
operations).

OK in the unlikely case all testing succeeds (I also want to
run it through SPEC with/without -fno-vect-cost-model which
will take some time)?

Thanks,
Richard.

2021-07-13  Richard Biener  

* tree-vect-loop.c (vect_find_reusable_accumulator): Handle
vector types where the old vector type has a multiple of
the new vector type elements.
(vect_create_partial_epilog): New function, split out from...
(vect_create_epilog_for_reduction): ... here.
(vect_transform_cycle_phi): Reduce the re-used accumulator
to the new vector type.

* gcc.target/i386/vect-reduc-1.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
 gcc/tree-vect-loop.c | 223 ---
 2 files changed, 155 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-1.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c 
b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
new file mode 100644
index 000..9ee9ba4e736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
+
+#define N 32
+int foo (int *a, int n)
+{
+  int sum = 1;
+  for (int i = 0; i < 8*N + 4; ++i)
+sum += a[i];
+  return sum;
+}
+
+/* The reduction epilog should be vectorized and the accumulator
+   re-used.  */
+/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-times "psrl" 2 } } */
+/* { dg-final { scan-assembler-times "padd" 5 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8c27d75f889..98e2a845629 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4901,7 +4901,8 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
  ones as well.  */
   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
-  if (!useless_type_conversion_p (old_vectype, vectype))
+  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
+   TYPE_VECTOR_SUBPARTS (vectype)))
 return false;
 
   /* Non-SLP reductions might apply an adjustment after the reduction
@@ -4935,6 +4936,101 @@ vect_find_reusable_accumulator (loop_vec_info 
loop_vinfo,
   return true;
 }
 
+/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
+   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
+
+static tree
+vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
+   gimple_seq *seq)
+{
+  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
+  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+  tree stype = TREE_TYPE (vectype);
+  tree new_temp = vec_def;
+  while (nunits > nunits1)
+{
+  nunits /= 2;
+  tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+  stype, nunits);
+  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
+
+  /* The target has to make sure we support lowpart/highpart
+extraction, either via direct vector extract or through
+an integer mode punning.  */
+  tree dst1, dst2;
+  gimple *epilog_stmt;
+  if (convert_optab_handler (vec_extract_optab,
+TYPE_MODE (TREE_TYPE (new_temp)),
+TYPE_MODE (vectype1))
+ != CODE_FOR_nothing)
+   {
+ /* Extract sub-vectors directly once vec_extract becomes
+a conversion optab.  */
+ dst1 = make_ssa_name (vectype1);
+ epilog_stmt
+ = gimple_build_assign (dst1, BIT_FIELD_REF,
+build3 (BIT_FIELD_REF, vectype1,
+new_temp, TYPE_SIZE (vectype1),
+bitsize_int (0)));
+ gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+ dst2 =  

[committed] libstdc++: Remove duplicate #include in <string_view>

2021-07-13 Thread Jonathan Wakely via Gcc-patches
When I added the new C++23 constructor I added a conditional include of a
header that was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/string_view: Only include the header
once, and only for C++20 and later.

Tested powerpc64le-linux. Committed to trunk.

commit bd1eb556b910fd4853ea83291e495d40adbcdf81
Author: Jonathan Wakely 
Date:   Tue Jul 13 12:09:37 2021

libstdc++: Remove duplicate #include in <string_view>

When I added the new C++23 constructor I added a conditional include of a
header that was already being included unconditionally.
This removes the unconditional include but changes the condition for the
other one, so it's used for C++20 as well.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/string_view: Only include the header
once, and only for C++20 and later.

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index 33e2129383a..cfdcf28f026 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -41,11 +41,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#if __cplusplus > 202002L
+#if __cplusplus >= 202002L
 # include 
 #endif
 


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for the comments!

on 2021/7/13 5:35 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> When I added the support for Power10 newly introduced multiply
>> highpart instrutions, I noticed that currently vectorizer
>> doesn't try to vectorize multiply highpart pattern, I hope
>> this isn't intentional?
>>
>> This patch is to extend the existing pattern mulhs handlings
>> to cover multiply highpart.  Another alternative seems to
>> recog mul_highpart operation in a general place applied for
>> scalar code when the target supports the optab for the scalar
>> operation, it's based on the assumption that one target which
>> supports vector version of multiply highpart should have the
>> scalar version.  I noticed that the function can_mult_highpart_p
>> can check/handle mult_highpart well even without mul_highpart
>> optab support, I think to recog this pattern in vectorizer
>> is better.  Is it on the right track?
> 
> I think it's on the right track, using IFN_LAST is a bit awkward
> in case yet another case pops up so maybe you can use
> a code_helper instance instead which unifies tree_code,
> builtin_code and internal_fn?
> 

If there is one new requirement which doesn't have/introduce IFN
stuffs but have one existing tree_code, can we add one more field
with type tree_code, then for the IFN_LAST path we can check the
different requirements under the guard with that tree_code variable?

> I also notice that can_mult_highpart_p will return true if
> only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> but then the result might be less optimal (or even not
> handled later)?
> 

I think it will be handled always?  The expander calls

rtx
expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
  rtx target, bool uns_p)

which will further check with can_mult_highpart_p.

For the below case, 

#define SHT_CNT 16

__attribute__ ((noipa)) void
test ()
{
  for (int i = 0; i < N; i++)
sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
}

Without this patch, it uses widen_mult like below:

  vect__1.5_19 = MEM  [(short int *)_a + ivtmp.18_24 * 
1];
  vect__3.8_14 = MEM  [(short int *)_b + ivtmp.18_24 * 
1];
  vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR ;
  vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR ;
  vect__6.10_25 = vect_patt_22.9_13 >> 16;
  vect__6.10_26 = vect_patt_22.9_9 >> 16;
  vect__7.11_27 = VEC_PACK_TRUNC_EXPR ;
  MEM  [(short int *)_c + ivtmp.18_24 * 1] = 
vect__7.11_27;

.L2:
lxvx 33,7,9
lxvx 32,8,9
vmulosh 13,0,1// widen mult
vmulesh 0,0,1
xxmrglw 33,32,45  // merge
xxmrghw 32,32,45
vsraw 1,1,12  // shift
vsraw 0,0,12
vpkuwum 0,0,1 // pack
stxvx 32,10,9
addi 9,9,16
bdnz .L2


With this patch, it ends up with:

  vect__1.5_14 = MEM  [(short int *)_a + ivtmp.17_24 * 
1];
  vect__3.8_8 = MEM  [(short int *)_b + ivtmp.17_24 * 
1];
  vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
  MEM  [(short int *)_c + ivtmp.17_24 * 1] = 
vect_patt_21.9_25;

.L2:
lxvx 32,8,9
lxvx 33,10,9
vmulosh 13,0,1   // widen mult
vmulesh 0,0,1
vperm 0,0,13,12  // perm on widen mults
stxvx 32,7,9
addi 9,9,16
bdnz .L2


> That is, what about adding optab internal functions
> for [us]mul_highpart instead, much like the existing
> ones for MULH{R,}S?
> 

OK, I was thinking the IFN way at the beginning, but was worried
that it's easy to be blamed saying it's not necessary since there
is one existing tree_code.  :-)  Will update it with IFN way.

BR,
Kewen

> Richard.
> 
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu.
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>> recog normal multiply highpart.
>>




Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Jul 13, 2021 at 11:40 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
>> >>
>> >> Hi,
>> >>
>> >> When I added the support for Power10 newly introduced multiply
>> >> highpart instructions, I noticed that currently vectorizer
>> >> doesn't try to vectorize multiply highpart pattern, I hope
>> >> this isn't intentional?
>> >>
>> >> This patch is to extend the existing pattern mulhs handlings
>> >> to cover multiply highpart.  Another alternative seems to
>> >> recog mul_highpart operation in a general place applied for
>> >> scalar code when the target supports the optab for the scalar
>> >> operation, it's based on the assumption that one target which
>> >> supports vector version of multiply highpart should have the
>> >> scalar version.  I noticed that the function can_mult_highpart_p
>> >> can check/handle mult_highpart well even without mul_highpart
>> >> optab support, I think to recog this pattern in vectorizer
>> >> is better.  Is it on the right track?
>> >
>> > I think it's on the right track, using IFN_LAST is a bit awkward
>> > in case yet another case pops up so maybe you can use
>> > a code_helper instance instead which unifies tree_code,
>> > builtin_code and internal_fn?
>> >
>> > I also notice that can_mult_highpart_p will return true if
>> > only vec_widen_[us]mult_{even,odd,hi,lo} are available,
>> > but then the result might be less optimal (or even not
>> > handled later)?
>> >
>> > That is, what about adding optab internal functions
>> > for [us]mul_highpart instead, much like the existing
>> > ones for MULH{R,}S?
>>
>> Yeah, that'd be my preference too FWIW.  All uses of MULT_HIGHPART_EXPR
>> already have to be guarded by can_mult_highpart_p, so replacing it with
>> a directly-mapped ifn seems like a natural fit.  (Then can_mult_highpart_p
>> should be replaced with a direct_internal_fn_supported_p query.)
>
> But note can_mult_highpart_p covers use via
> vec_widen_[us]mult_{even,odd,hi,lo}
> but I think this specific pattern should key on [us]mul_highpart only?
>
> Because vec_widen_* implies a higher VF (or else we might miss vectorizing?)?

But wouldn't it be better to do the existing hi/lo/even/odd conversion in
gimple, rather than hide it in expand?  (Yes, this is feature creep. :-))

Richard


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 11:40 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> When I added the support for Power10 newly introduced multiply
> >> highpart instrutions, I noticed that currently vectorizer
> >> doesn't try to vectorize multiply highpart pattern, I hope
> >> this isn't intentional?
> >>
> >> This patch is to extend the existing pattern mulhs handlings
> >> to cover multiply highpart.  Another alternative seems to
> >> recog mul_highpart operation in a general place applied for
> >> scalar code when the target supports the optab for the scalar
> >> operation, it's based on the assumption that one target which
> >> supports vector version of multiply highpart should have the
> >> scalar version.  I noticed that the function can_mult_highpart_p
> >> can check/handle mult_highpart well even without mul_highpart
> >> optab support, I think to recog this pattern in vectorizer
> >> is better.  Is it on the right track?
> >
> > I think it's on the right track, using IFN_LAST is a bit awkward
> > in case yet another case pops up so maybe you can use
> > a code_helper instance instead which unifies tree_code,
> > builtin_code and internal_fn?
> >
> > I also notice that can_mult_highpart_p will return true if
> > only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> > but then the result might be less optimal (or even not
> > handled later)?
> >
> > That is, what about adding optab internal functions
> > for [us]mul_highpart instead, much like the existing
> > ones for MULH{R,}S?
>
> Yeah, that'd be my preference too FWIW.  All uses of MULT_HIGHPART_EXPR
> already have to be guarded by can_mult_highpart_p, so replacing it with
> a directly-mapped ifn seems like a natural fit.  (Then can_mult_highpart_p
> should be replaced with a direct_internal_fn_supported_p query.)

But note can_mult_highpart_p covers use via vec_widen_[us]mult_{even,odd,hi,lo}
but I think this specific pattern should key on [us]mul_highpart only?

Because vec_widen_* implies a higher VF (or else we might miss vectorizing?)?

Richard.


> Thanks,
> Richard


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> When I added the support for Power10 newly introduced multiply
>> highpart instrutions, I noticed that currently vectorizer
>> doesn't try to vectorize multiply highpart pattern, I hope
>> this isn't intentional?
>>
>> This patch is to extend the existing pattern mulhs handlings
>> to cover multiply highpart.  Another alternative seems to
>> recog mul_highpart operation in a general place applied for
>> scalar code when the target supports the optab for the scalar
>> operation, it's based on the assumption that one target which
>> supports vector version of multiply highpart should have the
>> scalar version.  I noticed that the function can_mult_highpart_p
>> can check/handle mult_highpart well even without mul_highpart
>> optab support, I think to recog this pattern in vectorizer
>> is better.  Is it on the right track?
>
> I think it's on the right track, using IFN_LAST is a bit awkward
> in case yet another case pops up so maybe you can use
> a code_helper instance instead which unifies tree_code,
> builtin_code and internal_fn?
>
> I also notice that can_mult_highpart_p will return true if
> only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> but then the result might be less optimal (or even not
> handled later)?
>
> That is, what about adding optab internal functions
> for [us]mul_highpart instead, much like the existing
> ones for MULH{R,}S?

Yeah, that'd be my preference too FWIW.  All uses of MULT_HIGHPART_EXPR
already have to be guarded by can_mult_highpart_p, so replacing it with
a directly-mapped ifn seems like a natural fit.  (Then can_mult_highpart_p
should be replaced with a direct_internal_fn_supported_p query.)

Thanks,
Richard


Re: [RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin  wrote:
>
> Hi,
>
> When I added the support for Power10 newly introduced multiply
> highpart instructions, I noticed that currently vectorizer
> doesn't try to vectorize multiply highpart pattern, I hope
> this isn't intentional?
>
> This patch is to extend the existing pattern mulhs handlings
> to cover multiply highpart.  Another alternative seems to
> recog mul_highpart operation in a general place applied for
> scalar code when the target supports the optab for the scalar
> operation, it's based on the assumption that one target which
> supports vector version of multiply highpart should have the
> scalar version.  I noticed that the function can_mult_highpart_p
> can check/handle mult_highpart well even without mul_highpart
> optab support, I think to recog this pattern in vectorizer
> is better.  Is it on the right track?

I think it's on the right track, using IFN_LAST is a bit awkward
in case yet another case pops up so maybe you can use
a code_helper instance instead which unifies tree_code,
builtin_code and internal_fn?

I also notice that can_mult_highpart_p will return true if
only vec_widen_[us]mult_{even,odd,hi,lo} are available,
but then the result might be less optimal (or even not
handled later)?

That is, what about adding optab internal functions
for [us]mul_highpart instead, much like the existing
ones for MULH{R,}S?

Richard.

>
> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu.
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> recog normal multiply highpart.
>


Re: [PATCH 00/10] vect: Reuse reduction accumulators between loops

2021-07-13 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richard,
>
on 2021/7/8 8:38 PM, Richard Sandiford via Gcc-patches wrote:
>> Quoting from the final patch in the series:
>> 
>> 
>> This patch adds support for reusing a main loop's reduction accumulator
>> in an epilogue loop.  This in turn lets the loops share a single piece
>> of vector->scalar reduction code.
>> 
>> The patch has the following restrictions:
>> 
>> (1) The epilogue reduction can only operate on a single vector
>> (e.g. ncopies must be 1 for non-SLP reductions, and the group size
>> must be <= the element count for SLP reductions).
>> 
>> (2) Both loops must use the same vector mode for their accumulators.
>> This means that the patch is restricted to targets that support
>> --param vect-partial-vector-usage=1.
>> 
>> (3) The reduction must be a standard “tree code” reduction.
>> 
>> However, these restrictions could be lifted in future.  For example,
>> if the main loop operates on 128-bit vectors and the epilogue loop
>> operates on 64-bit vectors, we could in future reduce the 128-bit
>> vector by one stage and use the 64-bit result as the starting point
>> for the epilogue result.
>> 
>> The patch tries to handle chained SLP reductions, unchained SLP
>> reductions and non-SLP reductions.  It also handles cases in which
>> the epilogue loop is entered directly (rather than via the main loop)
>> and cases in which the epilogue loop can be skipped.
>> 
>> 
>> However, it ended up being difficult to do that without some preparatory
>> clean-ups.  Some of them could probably stand on their own, but others
>> are a bit “meh” without the final patch to justify them.
>> 
>> The diff below shows the effect of the patch when compiling:
>> 
>>   unsigned short __attribute__((noipa))
>>   add_loop (unsigned short *x, int n)
>>   {
>> unsigned short res = 0;
>> for (int i = 0; i < n; ++i)
>>   res += x[i];
>> return res;
>>   }
>> 
>> with -O3 --param vect-partial-vector-usage=1 on an SVE target:
>> 
>> add_loop:add_loop:
>> .LFB0:   .LFB0:
>>  .cfi_startproc  .cfi_startproc
>>  mov x4, x0<
>>  cmp w1, 0   cmp w1, 0
>>  ble .L7 ble .L7
>>  cnthx0| cnthx4
>>  sub w2, w1, #1  sub w2, w1, #1
>>  sub w3, w0, #1| sub w3, w4, #1
>>  cmp w2, w3  cmp w2, w3
>>  bcc .L8 bcc .L8
>>  sub w0, w1, w0| sub w4, w1, w4
>>  mov x3, 0   mov x3, 0
>>  cnthx5  cnthx5
>>  mov z0.b, #0mov z0.b, #0
>>  ptrue   p0.b, all   ptrue   p0.b, all
>>  .p2align 3,,7   .p2align 3,,7
>> .L4: .L4:
>>  ld1hz1.h, p0/z, [x4, x3,  | ld1hz1.h, p0/z, [x0, x3, 
>>  mov x2, x3  mov x2, x3
>>  add x3, x3, x5  add x3, x3, x5
>>  add z0.h, z0.h, z1.hadd z0.h, z0.h, z1.h
>>  cmp w0, w3| cmp w4, w3
>>  bcs .L4 bcs .L4
>>  uaddv   d0, p0, z0.h  <
>>  umovw0, v0.h[0]   <
>>  inchx2  inchx2
>>  and w0, w0, 65535 <
>>  cmp w1, w2  cmp w1, w2
>>  beq .L2   | beq .L6
>> .L3: .L3:
>>  sub w1, w1, w2  sub w1, w1, w2
>>  mov z1.b, #0  | add x2, x0, w2, uxtw 1
>>  whilelo p0.h, wzr, w1   whilelo p0.h, wzr, w1
>>  add x2, x4, w2, uxtw 1| ld1hz1.h, p0/z, [x2]
>>  ptrue   p1.b, all | add z0.h, p0/m, z0.h, z1.
>>  ld1hz0.h, p0/z, [x2]  | .L6:
>>  sel z0.h, p0, z0.h, z1.h  | ptrue   p0.b, all
>>  uaddv   d0, p1, z0.h  | uaddv   d0, p0, z0.h
>>  fmovx1, d0| umovw0, v0.h[0]
>>  add w0, w0, w1, uxth  <
>>  and w0, w0, 65535   and w0, w0, 65535
>> .L2:   <
>>  ret ret
>>  .p2align 2,,3   .p2align 2,,3
>> .L7: .L7:
>>  mov w0, 0   mov w0, 0
>>  ret 

Re: [PATCH 04/10] vect: Ensure reduc_inputs always have vectype

2021-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Jul 8, 2021 at 2:44 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Vector reduction accumulators can differ in signedness from the
>> final scalar result.  The conversions to handle that case were
>> distributed through vect_create_epilog_for_reduction; this patch
>> does the conversion up-front instead.
>
> But is that still correct?  The conversions should be unsigned -> signed,
> that is, we've performed the reduction in unsigned because we associated
> the originally undefined overflow signed reduction.  But the final
> reduction of the vector lanes in the epilogue still needs to be done
> unsigned.
>
> So it's just not obvious that the patch preserves this - if it does then
> the patch is OK.

We ended up covering most of this in the later 6/10 thread, but just to
follow up here for the record, in case anyone looks at the list archives:

In that scenario, the phis are created with the signed type and then
(like you say) the reduction happens in the unsigned type.  These
conversions are from the signed type to the unsigned type ready for
the reduction.

All later code either performed the conversion itself or (in the
case of some of the cond reductions) required the phi and reduction
vectypes to be the same.

I've pushed the series now -- thanks for the reviews.

Richard


Re: move unreachable user labels to entry point

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 10:52 AM Jakub Jelinek  wrote:
>
> On Tue, Jul 13, 2021 at 09:56:41AM +0200, Richard Biener wrote:
> > The comment above says
> >
> >   /* During cfg pass make sure to put orphaned labels
> >  into the right OMP region.  */
> >
> > and the full guard is
> >
> >   if ((unsigned) bb->index < bb_to_omp_idx.length ()
> >   && ((unsigned) new_bb->index >= bb_to_omp_idx.length ()
> >   || (bb_to_omp_idx[bb->index]
> >   != bb_to_omp_idx[new_bb->index])))
> >
> > The right OMP region suggests something wrt correctness
> > (though the original choice of bb->prev_bb doesn't put the initial new_bb
> > in a semantic relation to the original block).
>
> The reason for the OMP code I've added there is to put orphaned labels into
> the right OpenMP region, because if they are moved to arbitrary unrelated
> locations, when outlining the OMP regions (e.g. parallel, task, or target),
> we want the orphaned labels to be in the right function as opposed to some
> unrelated one, at least if the orphaned label is referenced in the code.
> Because referencing say from offloaded code a label that got moved to the
> host function or vice versa simply can't work and similarly causes ICEs
> even with parallel/task etc. regions.
>
> But I'm not sure I agree with the intention of the patch; yes, orphaned
> labels even without OpenMP are moved to some other place, but the current
> code typically keeps them close to where they used to live.

typically, yes, but bb->prev_bb doesn't really have such meaning.  OTOH
it's hard to do better from delete_basic_block, callers might do a better
job but there's currently no way to communicate down sth like an
entry or exit to a removed CFG region where these things could be moved to.

Maybe the GIMPLE CFG hooks could queue to be re-issued stmts somewhere
and callers be responsible for re-emitting them.

>  Moving them to
> the entry will always be a worse user experience if people e.g. print the &
> addresses etc.
>
> Jakub
>


Re: [PATCH 1/2] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-07-13 Thread Christophe Lyon via Gcc-patches
Ping?

On Fri, Jul 2, 2021 at 10:53 AM Christophe Lyon 
wrote:

> Hi,
>
> On Wed, 9 Jun 2021 at 17:04, Richard Sandiford
>  wrote:
> >
> > Christophe Lyon  writes:
> > > The problem in this PR is that we call VPSEL with a mask of vector
> > > type instead of HImode. This happens because operand 3 in vcond_mask
> > > is the pre-computed vector comparison and has vector type. The fix is
> > > to transfer this value to VPR.P0 by comparing operand 3 with a vector
> > > of constant 1 of the same type as operand 3.
> >
> > The alternative is to implement TARGET_VECTORIZE_GET_MASK_MODE
> > and return HImode for MVE.  This is how AVX512 handles masks.
> >
> > It might be worth trying that to see how it works.  I'm not sure
> > off-hand whether it'll produce worse code or better code.  However,
> > using HImode as the mask mode would help when defining other
> > predicated optabs in future.
> >
>
> Here is my v2 of this patch, hopefully implementing what you suggested.
>
> Sorry it took me so long, but implementing this hook was of course
> not sufficient, and it took me a while to figure out I needed to keep the
> non-HI expanders (vec_cmp, ...).  Each time I fixed a bug, I created
> another one... I shouldn't have added so many tests ;-)
>
> I'm not sure how to improve the vectorizer doc, to better describe the
> vec_cmp/vcond patterns and see which ones the vectorizer is trying
> to use (to understand which ones I should implement).
>
> Then I realized I was about to break Neon support, so I decided
> it was safer to add Neon tests ;-)
>
> Is that version OK?
>
> Thanks,
>
> Christophe
>
>
> > Thanks,
> > Richard
> >
> > > The pr100757*.c testcases are derived from
> > > gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
> > > different types and return values different from 0 and 1 to avoid
> > > commonalization with boolean masks.
> > >
> > > Reducing the number of iterations in pr100757-3.c from 32 to 8, we
> > > generate the code below:
> > >
> > > float a[32];
> > > float fn1(int d) {
> > >   int c = 4;
> > >   for (int b = 0; b < 8; b++)
> > > if (a[b] != 2.0f)
> > >   c = 5;
> > >   return c;
> > > }
> > >
> > > fn1:
> > > ldr r3, .L4+80
> > >   vpush.64{d8, d9}
> > >   vldrw.32q3, [r3]// q3=a[0..3]
> > >   vldr.64 d8, .L4 // q4=(2.0,2.0,2.0,2.0)
> > >   vldr.64 d9, .L4+8
> > >   addsr3, r3, #16
> > >   vcmp.f32eq, q3, q4  // cmp a[0..3] ==
> (2.0,2.0,2.0,2.0)
> > >   vldr.64 d2, .L4+16  // q1=(1,1,1,1)
> > >   vldr.64 d3, .L4+24
> > >   vldrw.32q3, [r3]// q3=a[4..7]
> > >   vldr.64 d4, .L4+32  // q2=(0,0,0,0)
> > >   vldr.64 d5, .L4+40
> > >   vpsel q0, q1, q2// q0=select (a[0..3])
> > >   vcmp.f32eq, q3, q4  // cmp a[4..7] ==
> (2.0,2.0,2.0,2.0)
> > >   vldmsp!, {d8-d9}
> > >   vpsel q2, q1, q2// q2=select (a[4..7])
> > >   vandq2, q0, q2  // q2=select (a[0..3]) && select
> (a[4..7])
> > >   vldr.64 d6, .L4+48  // q3=(4.0,4.0,4.0,4.0)
> > >   vldr.64 d7, .L4+56
> > >   vldr.64 d0, .L4+64  // q0=(5.0,5.0,5.0,5.0)
> > >   vldr.64 d1, .L4+72
> > >   vcmp.i32  eq, q2, q1// cmp mask(a[0..7]) == (1,1,1,1)
> > >   vpsel q3, q3, q0// q3= vcond_mask(4.0,5.0)
> > >   vmov.32 r3, q3[0]   // keep the scalar max
> > >   vmov.32 r1, q3[1]
> > >   vmov.32 r0, q3[3]
> > >   vmov.32 r2, q3[2]
> > >   vmovs14, r1
> > >   vmovs15, r3
> > >   vmaxnm.f32  s15, s15, s14
> > >   vmovs14, r2
> > >   vmaxnm.f32  s15, s15, s14
> > >   vmovs14, r0
> > >   vmaxnm.f32  s15, s15, s14
> > >   vmovr0, s15
> > >   bx  lr
> > >   .L5:
> > >   .align  3
> > >   .L4:
> > >   .word   1073741824
> > >   .word   1073741824
> > >   .word   1073741824
> > >   .word   1073741824
> > >   .word   1
> > >   .word   1
> > >   .word   1
> > >   .word   1
> > >   .word   0
> > >   .word   0
> > >   .word   0
> > >   .word   0
> > >   .word   1082130432
> > >   .word   1082130432
> > >   .word   1082130432
> > >   .word   1082130432
> > >   .word   1084227584
> > >   .word   1084227584
> > >   .word   1084227584
> > >   .word   1084227584
> > >
> > > 2021-06-09  Christophe Lyon  
> > >
> > >   PR target/100757
> > >   gcc/
> > >   * config/arm/vec-common.md (vcond_mask_): Fix
> > >   expansion for MVE.
> > >
> > >   gcc/testsuite/
> > >   * gcc.target/arm/simd/pr100757.c: New test.
> > >   * gcc.target/arm/simd/pr100757-2.c: New test.
> > >   * gcc.target/arm/simd/pr100757-3.c: New test.
> > > ---
> > >  

[PATCH] rs6000: Support [u]mul3_highpart for vector

2021-07-13 Thread Kewen.Lin via Gcc-patches
Hi,

This patch makes the vector multiply high (part) instructions
newly introduced in Power10 exploitable in vectorized loops,
by renaming the existing define_insns to standard pattern
names.  It depends on the patch which enables the vectorizer
to recognize mul_highpart.

Tested on powerpc64le-linux-gnu P9 with P10-supporting
binutils; I will test more once the vectorizer patch
lands.

BR,
Kewen.
-
gcc/ChangeLog:

* config/rs6000/vsx.md (mulhs_): Rename to...
(smul3_highpart): ... this.
(mulhu_): Rename to...
(umul3_highpart): ... this.
* config/rs6000/rs6000-builtin.def (MULHS_V2DI, MULHS_V4SI,
MULHU_V2DI, MULHU_V4SI): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/mul-vectorize-3.c: New test.
* gcc.target/powerpc/mul-vectorize-4.c: New test.
---
 gcc/config/rs6000/rs6000-builtin.def  |  8 ++---
 gcc/config/rs6000/vsx.md  |  4 +--
 .../gcc.target/powerpc/mul-vectorize-3.c  | 32 ++
 .../gcc.target/powerpc/mul-vectorize-4.c  | 33 +++
 4 files changed, 71 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mul-vectorize-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mul-vectorize-4.c

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 592efe31b04..cbacbc6b785 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -3016,10 +3016,10 @@ BU_P10V_AV_2 (MODS_V2DI, "vmodsd", CONST, modv2di3)
 BU_P10V_AV_2 (MODS_V4SI, "vmodsw", CONST, modv4si3)
 BU_P10V_AV_2 (MODU_V2DI, "vmodud", CONST, umodv2di3)
 BU_P10V_AV_2 (MODU_V4SI, "vmoduw", CONST, umodv4si3)
-BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, mulhs_v2di)
-BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, mulhs_v4si)
-BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, mulhu_v2di)
-BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, mulhu_v4si)
+BU_P10V_AV_2 (MULHS_V2DI, "vmulhsd", CONST, smulv2di3_highpart)
+BU_P10V_AV_2 (MULHS_V4SI, "vmulhsw", CONST, smulv4si3_highpart)
+BU_P10V_AV_2 (MULHU_V2DI, "vmulhud", CONST, umulv2di3_highpart)
+BU_P10V_AV_2 (MULHU_V4SI, "vmulhuw", CONST, umulv4si3_highpart)
 BU_P10V_AV_2 (MULLD_V2DI, "vmulld", CONST, mulv2di3)
 
 BU_P10V_VSX_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, xxspltiw_v4si)
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f622873d758..6f6fc0bd835 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6351,7 +6351,7 @@ (define_insn "umod3"
   [(set_attr "type" "vecdiv")
(set_attr "size" "")])
 
-(define_insn "mulhs_"
+(define_insn "smul3_highpart"
   [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
(mult:VIlong (ashiftrt
   (match_operand:VIlong 1 "vsx_register_operand" "v")
@@ -6363,7 +6363,7 @@ (define_insn "mulhs_"
   "vmulhs %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "mulhu_"
+(define_insn "umul3_highpart"
   [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
(us_mult:VIlong (ashiftrt
  (match_operand:VIlong 1 "vsx_register_operand" "v")
diff --git a/gcc/testsuite/gcc.target/powerpc/mul-vectorize-3.c 
b/gcc/testsuite/gcc.target/powerpc/mul-vectorize-3.c
new file mode 100644
index 000..2c89c0faec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mul-vectorize-3.c
@@ -0,0 +1,32 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize 
-fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test vectorizer can exploit ISA 3.1 instructions Vector Multiply
+   High Signed/Unsigned Word for both signed and unsigned int high part
+   multiplication.  */
+
+#define N 128
+
+extern signed int si_a[N], si_b[N], si_c[N];
+extern unsigned int ui_a[N], ui_b[N], ui_c[N];
+
+typedef signed long long sLL;
+typedef unsigned long long uLL;
+
+__attribute__ ((noipa)) void
+test_si ()
+{
+  for (int i = 0; i < N; i++)
+si_c[i] = ((sLL) si_a[i] * (sLL) si_b[i]) >> 32;
+}
+
+__attribute__ ((noipa)) void
+test_ui ()
+{
+  for (int i = 0; i < N; i++)
+ui_c[i] = ((uLL) ui_a[i] * (uLL) ui_b[i]) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvmulhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmulhuw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mul-vectorize-4.c 
b/gcc/testsuite/gcc.target/powerpc/mul-vectorize-4.c
new file mode 100644
index 000..265e7588bb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mul-vectorize-4.c
@@ -0,0 +1,33 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize 
-fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test vectorizer can exploit ISA 3.1 instructions Vector Multiply
+   High 

[RFC/PATCH] vect: Recog mul_highpart pattern

2021-07-13 Thread Kewen.Lin via Gcc-patches
Hi,

While adding support for the multiply highpart instructions newly
introduced in Power10, I noticed that the vectorizer currently
doesn't try to vectorize the multiply highpart pattern; I hope
this isn't intentional?

This patch extends the existing mulhs pattern handling to
cover multiply highpart.  An alternative would be to
recognize the mul_highpart operation in a generic place,
applied to scalar code when the target supports the optab for
the scalar operation; that is based on the assumption that a
target which supports the vector version of multiply highpart
also has the scalar version.  I noticed that the function
can_mult_highpart_p can check/handle mult_highpart well even
without mul_highpart optab support, so I think recognizing
this pattern in the vectorizer is better.  Is it on the right track?

Bootstrapped & regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

BR,
Kewen
-
gcc/ChangeLog:

* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
recog normal multiply highpart.

---
 gcc/tree-vect-patterns.c | 67 
 1 file changed, 48 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7a..9253c8088e9 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1896,8 +1896,15 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
 
1) Multiply high with scaling
  TYPE res = ((TYPE) a * (TYPE) b) >> c;
+ Here, c is bitsize (TYPE) / 2 - 1.
+
2) ... or also with rounding
  TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
+ Here, d is bitsize (TYPE) / 2 - 2.
+
+   3) Normal multiply high
+ TYPE res = ((TYPE) a * (TYPE) b) >> e;
+ Here, e is bitsize (TYPE) / 2.
 
where only the bottom half of res is used.  */
 
@@ -1942,7 +1949,6 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
   stmt_vec_info mulh_stmt_info;
   tree scale_term;
   internal_fn ifn;
-  unsigned int expect_offset;
 
   /* Check for the presence of the rounding term.  */
   if (gimple_assign_rhs_code (rshift_input_stmt) == PLUS_EXPR)
@@ -1991,25 +1997,37 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
 
   /* Get the scaling term.  */
   scale_term = gimple_assign_rhs2 (plus_input_stmt);
+  /* Check that the scaling factor is correct.  */
+  if (TREE_CODE (scale_term) != INTEGER_CST)
+   return NULL;
+
+  /* Check pattern 2).  */
+  if (wi::to_widest (scale_term) + target_precision + 2
+ != TYPE_PRECISION (lhs_type))
+   return NULL;
 
-  expect_offset = target_precision + 2;
   ifn = IFN_MULHRS;
 }
   else
 {
   mulh_stmt_info = rshift_input_stmt_info;
   scale_term = gimple_assign_rhs2 (last_stmt);
+  /* Check that the scaling factor is correct.  */
+  if (TREE_CODE (scale_term) != INTEGER_CST)
+   return NULL;
 
-  expect_offset = target_precision + 1;
-  ifn = IFN_MULHS;
+  /* Check for pattern 1).  */
+  if (wi::to_widest (scale_term) + target_precision + 1
+ == TYPE_PRECISION (lhs_type))
+   ifn = IFN_MULHS;
+  /* Check for pattern 3).  */
+  else if (wi::to_widest (scale_term) + target_precision
+  == TYPE_PRECISION (lhs_type))
+   ifn = IFN_LAST;
+  else
+   return NULL;
 }
 
-  /* Check that the scaling factor is correct.  */
-  if (TREE_CODE (scale_term) != INTEGER_CST
-  || wi::to_widest (scale_term) + expect_offset
-  != TYPE_PRECISION (lhs_type))
-return NULL;
-
   /* Check whether the scaling input term can be seen as two widened
  inputs multiplied together.  */
   vect_unpromoted_value unprom_mult[2];
@@ -2029,9 +2047,14 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
 
   /* Check for target support.  */
   tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
-  if (!new_vectype
-  || !direct_internal_fn_supported_p
-   (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
+  if (!new_vectype)
+return NULL;
+  if (ifn != IFN_LAST
+  && !direct_internal_fn_supported_p (ifn, new_vectype, 
OPTIMIZE_FOR_SPEED))
+return NULL;
+  else if (ifn == IFN_LAST
+  && !can_mult_highpart_p (TYPE_MODE (new_vectype),
+   TYPE_UNSIGNED (new_type)))
 return NULL;
 
   /* The IR requires a valid vector type for the cast result, even though
@@ -2040,14 +2063,20 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
   if (!*type_out)
 return NULL;
 
-  /* Generate the IFN_MULHRS call.  */
+  gimple *mulhrs_stmt;
   tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
   tree new_ops[2];
-  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type,
-  unprom_mult, new_vectype);
-  gcall *mulhrs_stmt
-= gimple_build_call_internal (ifn, 2, new_ops[0], new_ops[1]);
-  gimple_call_set_lhs (mulhrs_stmt, new_var);
+  vect_convert_inputs (vinfo, last_stmt_info, 2, new_ops, new_type, 
unprom_mult,
+ 

Re: move unreachable user labels to entry point

2021-07-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 13, 2021 at 09:56:41AM +0200, Richard Biener wrote:
> The comment above says
> 
>   /* During cfg pass make sure to put orphaned labels
>  into the right OMP region.  */
> 
> and the full guard is
> 
>   if ((unsigned) bb->index < bb_to_omp_idx.length ()
>   && ((unsigned) new_bb->index >= bb_to_omp_idx.length ()
>   || (bb_to_omp_idx[bb->index]
>   != bb_to_omp_idx[new_bb->index])))
> 
> The right OMP region suggests something wrt correctness
> (though the original choice of bb->prev_bb doesn't put the initial new_bb
> in a semantic relation to the original block).

The reason for the OMP code I've added there is to put orphaned labels into
the right OpenMP region, because if they are moved to arbitrary unrelated
locations, when outlining the OMP regions (e.g. parallel, task, or target),
we want the orphaned labels to be in the right function as opposed to some
unrelated one, at least if the orphaned label is referenced in the code.
Because referencing say from offloaded code a label that got moved to the
host function or vice versa simply can't work and similarly causes ICEs
even with parallel/task etc. regions.

But I'm not sure I agree with the intention of the patch; yes, orphaned
labels even without OpenMP are moved to some other place, but the current
code typically keeps them close to where they used to live.  Moving them to
the entry will always be a worse user experience if people e.g. print the &
addresses etc.

Jakub



Re: [PATCH] passes: Fix up subobject __bos [PR101419]

2021-07-13 Thread Richard Biener
On Tue, 13 Jul 2021, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because VN during cunrolli changes
> __bos argument from address of a larger field to address of a smaller field
> and so __builtin_object_size (, 1) then folds into smaller value than the
> actually available size.
> copy_reference_ops_from_ref has a hack for this, but it was using
> cfun->after_inlining as a check whether the hack can be ignored, and
> cunrolli is after_inlining.
> 
> This patch uses a property to make it exact (set at the end of objsz
> pass that doesn't do insert_min_max_p) and additionally based on discussions
> in the PR moves the objsz pass earlier after IPA.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-07-13  Jakub Jelinek  
>   Richard Biener  
> 
>   PR tree-optimization/101419
>   * tree-pass.h (PROP_objsz): Define.
>   (make_pass_early_object_sizes): Declare.
>   * passes.def (pass_all_early_optimizations): Rename pass_object_sizes
>   there to pass_early_object_sizes, drop parameter.
>   (pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
>   drop parameter, move pass_post_ipa_warn right after that.
>   * tree-object-size.c (pass_object_sizes::execute): Rename to...
>   (object_sizes_execute): ... this.  Add insert_min_max_p argument.
>   (pass_data_object_sizes): Move after object_sizes_execute.
>   (pass_object_sizes): Likewise.  In execute method call
>   object_sizes_execute, drop set_pass_param method and insert_min_max_p
>   non-static data member and its initializer in the ctor.
>   (pass_data_early_object_sizes, pass_early_object_sizes,
>   make_pass_early_object_sizes): New.
>   * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
>   (cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.
> 
>   * gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
>   instead of -fdump-tree-objsz1-details in dg-options and adjust names
>   of dump file in scan-tree-dump.
>   * gcc.dg/pr101419.c: New test.
> 
> --- gcc/tree-pass.h.jj2021-01-27 10:10:00.525903635 +0100
> +++ gcc/tree-pass.h   2021-07-12 17:23:42.322648068 +0200
> @@ -208,6 +208,7 @@ protected:
>  #define PROP_gimple_lcf  (1 << 1)/* lowered control flow 
> */
>  #define PROP_gimple_leh  (1 << 2)/* lowered eh */
>  #define PROP_cfg (1 << 3)
> +#define PROP_objsz   (1 << 4)/* object sizes computed */
>  #define PROP_ssa (1 << 5)
>  #define PROP_no_crit_edges  (1 << 6)
>  #define PROP_rtl (1 << 7)
> @@ -426,6 +427,7 @@ extern gimple_opt_pass *make_pass_omp_ta
>  extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_omp_device_lower (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_early_object_sizes (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_warn_printf (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
> --- gcc/passes.def.jj 2021-05-19 09:16:34.434046683 +0200
> +++ gcc/passes.def2021-07-12 17:41:38.274859148 +0200
> @@ -74,7 +74,7 @@ along with GCC; see the file COPYING3.
>NEXT_PASS (pass_all_early_optimizations);
>PUSH_INSERT_PASSES_WITHIN (pass_all_early_optimizations)
> NEXT_PASS (pass_remove_cgraph_callee_edges);
> -   NEXT_PASS (pass_object_sizes, true /* insert_min_max_p */);
> +   NEXT_PASS (pass_early_object_sizes);
> /* Don't record nonzero bits before IPA to avoid
>using too much memory.  */
> NEXT_PASS (pass_ccp, false /* nonzero_p */);
> @@ -194,14 +194,14 @@ along with GCC; see the file COPYING3.
>They ensure memory accesses are not indirect wherever possible.  */
>NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>NEXT_PASS (pass_ccp, true /* nonzero_p */);
> -  NEXT_PASS (pass_post_ipa_warn);
>/* After CCP we rewrite no longer addressed locals into SSA
>form if possible.  */
> +  NEXT_PASS (pass_object_sizes);
> +  NEXT_PASS (pass_post_ipa_warn);
>NEXT_PASS (pass_complete_unrolli);
>NEXT_PASS (pass_backprop);
>NEXT_PASS (pass_phiprop);
>NEXT_PASS (pass_forwprop);
> -  NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
>/* pass_build_alias is a dummy pass that ensures that we
>execute TODO_rebuild_alias at this point.  */
>NEXT_PASS (pass_build_alias);
> --- gcc/tree-object-size.c.jj 2021-01-04 10:25:39.911221618 +0100
> +++ gcc/tree-object-size.c2021-07-12 17:47:30.497018569 +0200
> @@ -1304,45 +1304,6 @@ fini_object_sizes (void)
>  }
>  }
>  
> -
> 

Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread guojiufu via Gcc-patches

On 2021-07-13 15:09, Richard Biener wrote:

On Tue, 13 Jul 2021, guojiufu wrote:


On 2021-07-12 23:53, guojiufu via Gcc-patches wrote:
> On 2021-07-12 22:46, Richard Biener wrote:
>> On Mon, 12 Jul 2021, guojiufu wrote:
>>
>>> On 2021-07-12 18:02, Richard Biener wrote:
>>> > On Mon, 12 Jul 2021, guojiufu wrote:
>>> >
>>> >> On 2021-07-12 16:57, Richard Biener wrote:
>>> >> > On Mon, 12 Jul 2021, guojiufu wrote:
>>> >> >
>>> >> >> On 2021-07-12 14:20, Richard Biener wrote:
>>> >> >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>>> >> >> >
>>> >> >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>>> >> >> >> > I wonder if there's a way to query the target what modes the
>>> >> >> >> > doloop
>>> >> >> >> > pattern can handle (not being too familiar with the doloop
>>> >> >> >> > code).
>>> >> >> >>
>>> >> >> >> You can look what modes are allowed for operand 0 of doloop_end,
>>> >> >> >> perhaps?  Although that is a define_expand, not a define_insn, so
>>> >> >> >> it
>>> >> >> >> is
>>> >> >> >> hard to introspect.
>>> >> >> >>
>>> >> >> >> > Why do you need to do any checks besides the new type being
>>> >> >> >> > able to
>>> >> >> >> > represent all IV values?  The original doloop IV will never
>>> >> >> >> > wrap
>>> >> >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will
>>> >> >> >> > become
>>> >> >> >> > zero ... I suppose the doloop might still do the correct thing
>>> >> >> >> > here
>>> >> >> >> > but it also still will with a IV with larger type).
>>> >> >>
>>> >> >> The issue comes from U*_MAX (original short MAX), as you said: on
>>> >> >> which
>>> >> >> niter + 1 becomes zero.  And because the step for doloop is -1;
>>> >> >> then, on
>>> >> >> larger type 'zero - 1' will be a very large number on larger type
>>> >> >> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a
>>> >> >> small
>>> >> >> value
>>> >> >> (e.g. "0xff").
>>> >> >
>>> >> > But for the larger type the small type MAX + 1 fits and does not
>>> >> > yield
>>> >> > zero so it should still work exactly as before, no?  Of course you
>>> >> > have to compute the + 1 in the larger type.
>>> >> >
>>> >> You are right, if compute the "+ 1" in the larger type it is ok, as
>>> >> below
>>> >> code:
>>> >> ```
>>> >>    /* Using a type of word size may be faster.  */
>>> >> if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
>>> >>   {
>>> >> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>>> >> niter = fold_convert (ntype, niter);
>>> >>   }
>>> >>
>>> >> tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
>>> >>  build_int_cst (ntype, 1));
>>> >>
>>> >>
>>> >> add_candidate (data, base, build_int_cst (ntype, -1), true, NULL,
>>> >> NULL,
>>> >> true);
>>> >> ```
>>> >> The issue of this is, this code generates more stmt for doloop.xxx:
>>> >>   _12 = (unsigned int) xx(D);
>>> >>   _10 = _12 + 4294967295;
>>> >>   _24 = (long unsigned int) _10;
>>> >>   doloop.6_8 = _24 + 1;
>>> >>
>>> >> if use previous patch, "+ 1" on original type, then the stmts will
>>> >> looks
>>> >> like:
>>> >>   _12 = (unsigned int) xx(D);
>>> >>   doloop.6_8 = (long unsigned int) _12;
>>> >>
>>> >> This is the reason for checking
>>> >>wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
>>> >
>>> > But this then only works when there's an upper bound on the number
>>> > of iterations.  Note you should not use TYPE_MAX_VALUE here but
>>> > you can instead use
>>> >
>>> >  wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value
>>> > (TYPE_PRECISION (ntype), TYPE_SIGN (ntype;
>>>
>>> Ok, Thanks!
>>> I remember you mentioned that:
>>> widest_int::from (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN
>>> (ntype)),
>>> TYPE_SIGN (ntype))
>>> would be better than
>>> wi::to_widest (TYPE_MAX_VALUE (ntype)).
>>>
>>> It seems that:
>>> "TYPE_MAX_VALUE (ntype)" is "NUMERICAL_TYPE_CHECK
>>> (NODE)->type_non_common.maxval"
>>> which do a numerical-check and return the field of maxval.  And then call
>>> to
>>> wi::to_widest
>>>
>>> The other code "widest_int::from (wi::max_value (..,..),..)" calls
>>> wi::max_value
>>> and widest_int::from.
>>>
>>> I'm wondering if wi::to_widest (TYPE_MAX_VALUE (ntype)) is cheaper?
>>
>> TYPE_MAX_VALUE can be "surprising"; it does not necessarily match the
>> underlying modes precision.  At some point we've tried to eliminate
>> most of its uses, not sure what the situation/position is right now.
> Ok, get it, thanks.
> I will use "widest_int::from (wi::max_value (..,..),..)".
>
>>
>>> > I think the -1 above comes from number of latch iterations vs. header
>>> > entries - it's a common source for this kind of issues.  range analysis
>>> > might be able to prove that we can still merge the two adds even with
>>> > the intermediate extension.
>>> Yes, as you mentioned here, it relates to number of latch iterations
>>> For loop looks like : while (l < n) or for (i = 0; i < n; i++)

Re: fix typo in attr_fnspec::verify

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 5:15 AM Alexandre Oliva  wrote:
>
>
> Odd-numbered indices describing argument access sizes in the fnspec
> string can only hold 't' or a digit, as tested in the beginning of the
> case.  When checking that the size-supplying argument does not have
> additional information associated with it, the test that excludes the
> 't' possibility looks for it at the even position in the fnspec
> string.  Oops.
>
> This might yield false positives and negatives if a function has a
> fnspec in which an argument uses a 't' access-size, and ('t' - '1')
> happens to be the index of an argument described in an fnspec string.
> Assuming ASCII encoding, it would take a function with at least 68
> arguments described in fnspec.  Still, probably worth fixing.
>
> Regstrapped on x86_64-linux-gnu.  I'm checking this in as obvious unless
> there are objections within some 48 hours.

oops - also worth backporting to affected branches.

Richard.

>
> for  gcc/ChangeLog
>
> * tree-ssa-alias.c (attr_fnspec::verify): Fix index in
> non-'t'-sized arg check.
> ---
>  gcc/tree-ssa-alias.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index 0421bfac99869..742a95a549e20 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -3895,7 +3895,7 @@ attr_fnspec::verify ()
> && str[idx] != 'w' && str[idx] != 'W'
> && str[idx] != 'o' && str[idx] != 'O')
>   err = true;
> -   if (str[idx] != 't'
> +   if (str[idx + 1] != 't'
> /* Size specified is scalar, so it should be described
>by ". " if specified at all.  */
> && (arg_specified_p (str[idx + 1] - '1')
>
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
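
The off-by-one being fixed is easiest to see against the fnspec layout itself.  A small Python model (standing in for the C code; the helper names here are illustrative, not GCC's):

```python
# Model of the fnspec layout from tree-ssa-alias.c: two chars describe the
# return value, then each argument i gets a pair of chars.  The even offset
# holds the access kind ('r', 'w', 'o', ...); the odd offset holds the
# access size ('t' for "a type", or a digit naming the argument that
# supplies the size).

def access_kind(fnspec, arg):
    return fnspec[2 * arg + 2]      # even index for argument `arg`

def access_size(fnspec, arg):
    return fnspec[2 * arg + 3]      # odd index for argument `arg`

# ". Ot" describes a function whose first argument is accessed with kind
# 'O' and size 't' (a type, not another argument's value).
spec = ". Ot"
idx = 2 * 0 + 2                     # as computed in attr_fnspec::verify

# The corrected test inspects the odd position and sees the 't' ...
assert access_size(spec, 0) == 't'
assert spec[idx + 1] == 't'
# ... while the original test inspected the even position, which holds the
# access kind, so "str[idx] != 't'" held and verify() went on to treat
# ('t' - '1') as an argument index.
assert access_kind(spec, 0) == 'O'
assert spec[idx] != 't'
```

With the fixed check, the 't' at the odd index stops verification before ('t' - '1') is ever treated as an argument number.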


Re: avoid early reference to debug-only symbol

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 5:13 AM Alexandre Oliva  wrote:
>
>
> If some IPA pass replaces the only reference to a constant non-public
> non-automatic variable with its initializer, namely the address of
> another such variable, the former becomes unreferenced and it's
> discarded by symbol_table::remove_unreachable_nodes.  It calls
> debug_hooks->late_global_decl while at that, and this expands the
> initializer, which assigns RTL to the latter variable and forces it to
> be retained by remove_unreferenced_decls, and eventually be output
> despite not being referenced.  Without debug information, it's not
> output.
>
> This has caused a bootstrap-debug compare failure in
> libdecnumber/decContext.o, while developing a transformation that ends
> up enabling the above substitution in constprop.
>
> This patch makes reference_to_unused slightly more conservative about
> such variables at the end of IPA passes, falling back onto
> expand_debug_expr for expressions referencing symbols that might or
> might not be output, avoiding the loss of debug information when the
> symbol is output, while avoiding a symbol output only because of debug
> info.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
> * dwarf2out.c (add_const_value_attribute): Return false if
> resolve_one_addr fails.
> (reference_to_unused): Don't assume local symbol presence
> while it can still be optimized out.
> (rtl_for_decl_init): Fallback to expand_debug_expr.
> * cfgexpand.c (expand_debug_expr): Export.
> * expr.h (expand_debug_expr): Declare.
> ---
>  gcc/cfgexpand.c |   12 +---
>  gcc/dwarf2out.c |   15 +--
>  gcc/expr.h  |2 ++
>  3 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 3edd53c37dcb3..b731a5598230c 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -91,8 +91,6 @@ struct ssaexpand SA;
> of comminucating the profile info to the builtin expanders.  */
>  gimple *currently_expanding_gimple_stmt;
>
> -static rtx expand_debug_expr (tree);
> -
>  static bool defer_stack_allocation (tree, bool);
>
>  static void record_alignment_for_reg_var (unsigned int);
> @@ -4413,7 +4411,7 @@ expand_debug_parm_decl (tree decl)
>
>  /* Return an RTX equivalent to the value of the tree expression EXP.  */
>
> -static rtx
> +rtx
>  expand_debug_expr (tree exp)
>  {
>rtx op0 = NULL_RTX, op1 = NULL_RTX, op2 = NULL_RTX;
> @@ -5285,6 +5283,14 @@ expand_debug_expr (tree exp)
>else
> goto flag_unsupported;
>
> +case COMPOUND_LITERAL_EXPR:
> +  exp = COMPOUND_LITERAL_EXPR_DECL_EXPR (exp);
> +  /* DECL_EXPR is a tcc_statement, which expand_debug_expr does
> +not expect, so instead of recursing we take care of it right
> +away.  */
> +  exp = DECL_EXPR_DECL (exp);
> +  return expand_debug_expr (exp);
> +
>  case CALL_EXPR:
>/* ??? Maybe handle some builtins?  */
>return NULL;
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 82783c4968b85..bb7e2b8dc4e2c 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -20170,7 +20170,8 @@ add_const_value_attribute (dw_die_ref die, rtx rtl)
>if (dwarf_version >= 4 || !dwarf_strict)
> {
>   dw_loc_descr_ref loc_result;
> - resolve_one_addr ();
> + if (!resolve_one_addr ())
> +   return false;
> rtl_addr:
>loc_result = new_addr_loc_descr (rtl, dtprel_false);
>   add_loc_descr (_result, new_loc_descr (DW_OP_stack_value, 0, 
> 0));
> @@ -20255,6 +20256,12 @@ reference_to_unused (tree * tp, int * walk_subtrees,
>varpool_node *node = varpool_node::get (*tp);
>if (!node || !node->definition)
> return *tp;
> +  /* If it's local, it might still be optimized out, unless we've
> +already committed to outputting it by assigning RTL to it.  */
> +  if (! TREE_PUBLIC (*tp) && ! TREE_ASM_WRITTEN (*tp)
> + && symtab->state <= IPA_SSA_AFTER_INLINING

Hmm, elsewhere in this function we're not anticipating future removal but
instead use ->global_info_ready which IIRC is when the unit was
initially analyzed.  So don't the other uses have the same issue?  Maybe
reference_to_unused is the wrong tool here and we need a
reference_to_discardable or so?

In other places we manage to use symbolic DIE references later resolved
by note_variable_values, can we maybe do this unconditionally for the
initializers of removed decls somehow?

> + && ! DECL_RTL_SET_P (*tp))
> +   return *tp;
>  }
>else if (TREE_CODE (*tp) == FUNCTION_DECL
>&& (!DECL_EXTERNAL (*tp) || DECL_DECLARED_INLINE_P (*tp)))
> @@ -20279,6 +20286,7 @@ static rtx
>  rtl_for_decl_init (tree init, tree type)
>  {
>rtx rtl = NULL_RTX;
> +  bool valid_p = false;
>
>STRIP_NOPS (init);
>
> @@ -20322,7 +20330,7 @@ rtl_for_decl_init (tree init, tree type)
>/* If 

Re: [PATCH V2] coroutines: Adjust outlined function names [PR95520].

2021-07-13 Thread Iain Sandoe
Hi Jason

> On 12 Jul 2021, at 20:40, Jason Merrill  wrote:
> 
> On 7/11/21 9:03 AM, Iain Sandoe wrote:
>> Hi Jason,
>>> On 9 Jul 2021, at 22:40, Jason Merrill  wrote:
>>> 
>>> On 7/9/21 2:18 PM, Iain Sandoe wrote:
>>> How about handling this in write_encoding, along the lines of the 
>>> devel/c++-contracts branch?
>> OK, so I took a look at this and implemented as below.
> 
> Oh, sorry, I didn't expect it to be such a large change!
> 
>>  Some small differences from your contracts impl described here.
>> recalling
>> the original function becomes the ramp - it is called directly by the 
>> user-code.
>> the resumer (actor) contains the outlined code wrapped in synthesized logic 
>> as dictated by the std
>> the destroy function effectively calls the actor with a flag that says “take 
>> the DTOR path” (since the DTOR path has to be available in the case of 
>> resume too).
>> this means that is is possible for the actor to be partially (or completely 
>> for a generator-style coro) inlined into either the ramp or the destroyer.
>> 1. using DECL_ABSTRACT_ORIGIN didn’t work with optimisation and debug since 
>> the inlining of the outlining confuses the issue (the actor/destory helpers 
>> are not real clones).
> 
> Hmm, I wonder if that will bite my use in contracts as well.  Can you 
> elaborate?

In the coroutines case I think it is simply a lie to set DECL_ABSTRACT_ORIGIN 
since that is telling the debug machinery:

"For any sort of a ..._DECL node, this points to the original (abstract)
   decl node which this decl is an inlined/cloned instance of, or else it
   is NULL indicating that this decl is not an instance of some other decl. “

That is not true for either the actor or destroy functions in coroutines - they 
are not instances of the ramp.

The problem comes when the actor gets inlined into the ramp - so I guess the 
machinery is expecting that we’ve done something akin to a recursion - but the 
actor is completely different code from the ramp, and has a different interface:
void actor(frame*) c.f. whatever the user’s function was (including being a 
class method or a lambda).

The fail occurs here:

gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die)
 …..
  /* Make sure any inlined functions are known to be inlineable.  */
  gcc_checking_assert (DECL_ABSTRACT_P (decl)
   || cgraph_function_possibly_inlined_p (decl));

--

* I’d expect the JOIN_STR change to bite you at some point (since there are 
some platforms that don’t allow periods in symbols).

>> -  const char *mangled_name
>> -= (ovl_op_info[DECL_ASSIGNMENT_OPERATOR_P (decl)]
>> +  const char *mangled_name;
>> +  if (DECL_IS_CORO_ACTOR_P (decl) || DECL_IS_CORO_DESTROY_P (decl))
>> +{
>> +  tree t = DECL_RAMP_FN (decl);
> 
> This ends up doing 5 lookups in the to_ramp hashtable; that should be fast 
> enough, but better I think to drop the DECL_IS_CORO_*_P macros and check 
> DECL_RAMP_FN directly, both here and in write_encoding.

TBH, I had misgivings about this - primarily that the “not used” path should 
have low impact.

However, if there are no coroutines in a TU, then the case above should only be 
two calls which immediately return NULL_TREE…

… however, I’ve changed this as suggested so that there are fewer calls in all 
cases (in the attached).

We can just test DECL_RAMP_FN (decl) since that will return NULL_TREE for any 
case that isn’t a helper (and, again, if there are no coroutines in the TU, it 
returns NULL_TREE immediately).

If we can guarantee that cfun will be available (so we didn’t need to check for 
its presence), then there’s a “coroutine helper” flag there which could be used 
to guard this further (but I’m not sure that it will be massively quicker if we
needed to check to see if the cfun is available first).

>> +  mangled_name = (ovl_op_info[DECL_ASSIGNMENT_OPERATOR_P (t)]
>> +   [DECL_OVERLOADED_OPERATOR_CODE_RAW (t)].mangled_name);
> 
> Is there a reason not to do decl = t; and then share the array indexing line?

No - tidied in the revised version.

tested on x86_64-darwin,
OK for master / backports (with wider testing first).
thanks
Iain

===

The mechanism used to date for uniquing the coroutine helper
functions (actor, destroy) was over-complicating things and
leading to the noted PR and also difficulties in setting
breakpoints on these functions (so this will help PR99215 as
well).

This implementation delegates the adjustment to the mangling
to write_encoding() which necessitates some book-keeping so
that it is possible to determine which of the coroutine
helper names is to be mangled.

Signed-off-by: Iain Sandoe 

PR c++/95520 - [coroutines] __builtin_FUNCTION() returns mangled .actor instead 
of original function name

PR c++/95520

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add fields for
actor and destroy function decls.
(to_ramp): New.
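
For illustration, the naming scheme discussed in this thread (the ramp keeps the user-visible name, the outlined helpers get a JOIN_STR-separated suffix, with JOIN_STR avoiding '.' on targets that disallow periods in symbols) can be sketched as follows.  helper_name is a hypothetical stand-in for what write_encoding emits, not the actual implementation:

```python
# Hypothetical model of the coroutine helper naming: the ramp mangling is
# the user function's mangling, and the actor/destroy helpers append a
# suffix through JOIN_STR ('.' where periods are allowed in symbols, '$'
# otherwise).  Names and the join rule are assumptions for illustration.

def helper_name(ramp_mangling: str, kind: str, periods_ok: bool = True) -> str:
    assert kind in ("actor", "destroy")
    join = "." if periods_ok else "$"
    return ramp_mangling + join + kind

assert helper_name("_Z3foov", "actor") == "_Z3foov.actor"
assert helper_name("_Z3foov", "destroy", periods_ok=False) == "_Z3foov$destroy"
```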

Re: move unreachable user labels to entry point

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 5:11 AM Alexandre Oliva  wrote:
>
>
> pr42739.C, complicated by some extra wrappers and cleanups from a
> feature I'm working on, got me very confused because a user label
> ended up in a cleanup introduced by my pass, where it couldn't
> possibly have been initially.
>
> The current logic may move such an unreachable user label multiple
> times, if it ends up in blocks that get removed one after the other.
> Since it doesn't really matter where it lands (save for omp
> constraints), I propose we move it once and for all to a stable, final
> location, that we currently use only as a last resort: the entry
> point.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
> * tree-cfg.c (remove_bb): When preserving an unreachable user
> label, use the entry block as the preferred choice for its
> surviving location, rather than as a last resort.
> ---
>  gcc/tree-cfg.c |   18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 1f0f4a2c6eb2c..f6f005f10a9f5 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -2300,13 +2300,11 @@ remove_bb (basic_block bb)
>   FORCED_LABEL (gimple_label_label (label_stmt)) = 1;
> }
>
> - new_bb = bb->prev_bb;
> - /* Don't move any labels into ENTRY block.  */
> - if (new_bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
> -   {
> - new_bb = single_succ (new_bb);
> - gcc_assert (new_bb != bb);
> -   }
> + /* We have to move the unreachable label somewhere.
> +Moving it to the entry block makes sure it's moved at
> +most once, and avoids messing with anonymous landing
> +pad labels.  */
> + new_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>   if ((unsigned) bb->index < bb_to_omp_idx.length ()
>   && ((unsigned) new_bb->index >= bb_to_omp_idx.length ()
>   || (bb_to_omp_idx[bb->index]
> @@ -2316,7 +2314,6 @@ remove_bb (basic_block bb)
>  into the right OMP region.  */

The comment above says

  /* During cfg pass make sure to put orphaned labels
 into the right OMP region.  */

and the full guard is

  if ((unsigned) bb->index < bb_to_omp_idx.length ()
  && ((unsigned) new_bb->index >= bb_to_omp_idx.length ()
  || (bb_to_omp_idx[bb->index]
  != bb_to_omp_idx[new_bb->index])))

The right OMP region suggests something wrt correctness
(though the original choice of bb->prev_bb doesn't put the initial new_bb
in a semantic relation to the original block).

So I wonder what the code was for and whether we still need it.
The ENTRY_BLOCK successor can be in some OMP region I guess
but we're still falling back to a "mismatched" OMP region in case
it was.  Then we could also insert on the ENTRY_BLOCK single successor
edge...

That said, the patch doesn't look incorrect - it just tweaks heuristics - but
the OMP code looks odd.

Richard.

>   unsigned int i;
>   int idx;
> - new_bb = NULL;
>   FOR_EACH_VEC_ELT (bb_to_omp_idx, i, idx)
> if (i >= NUM_FIXED_BLOCKS
> && idx == bb_to_omp_idx[bb->index]
> @@ -2325,11 +2322,6 @@ remove_bb (basic_block bb)
> new_bb = BASIC_BLOCK_FOR_FN (cfun, i);
> break;
>   }
> - if (new_bb == NULL)
> -   {
> - new_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> - gcc_assert (new_bb != bb);
> -   }
> }
>   new_gsi = gsi_after_labels (new_bb);
>   gsi_remove (, false);
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-13 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 12, 2021 at 05:29:36PM +0200, Florian Weimer wrote:
> I verifed that this change on top successfully builds GCC for all glibc
> targets:

Here is what I've committed after testing overnight:

2021-07-13  Jakub Jelinek  
Florian Weimer  

* config/linux/sem.h: Don't include limits.h.
(SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
* config/linux/affinity.c: Include limits.h.

--- libgomp/config/linux/sem.h.jj   2021-01-18 07:18:42.360339646 +0100
+++ libgomp/config/linux/sem.h  2021-07-12 15:18:10.121178404 +0200
@@ -33,10 +33,8 @@
 #ifndef GOMP_SEM_H
 #define GOMP_SEM_H 1
 
-#include  /* For INT_MIN */
-
 typedef int gomp_sem_t;
-#define SEM_WAIT INT_MIN
+#define SEM_WAIT (-__INT_MAX__ - 1)
 #define SEM_INC 1
 
 extern void gomp_sem_wait_slow (gomp_sem_t *, int);
--- libgomp/config/linux/affinity.c.jj  2021-01-04 10:25:56.160037625 +0100
+++ libgomp/config/linux/affinity.c 2021-07-12 17:19:07.280429144 +0200
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef HAVE_PTHREAD_AFFINITY_NP
 


Jakub
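
The (-__INT_MAX__ - 1) idiom is equivalent to INT_MIN on the usual two's-complement int; the reason for spelling it that way rather than as a literal is that 2147483648 does not fit in int, so -2147483648 is not a single int constant in C.  A quick check of the arithmetic (a Python sketch with 32-bit values modeled explicitly):

```python
# INT_MAX for a 32-bit int, i.e. what __INT_MAX__ expands to on x86.
INT_MAX = 2**31 - 1

# The patched definition: the same value as INT_MIN, without limits.h.
SEM_WAIT = -INT_MAX - 1

assert SEM_WAIT == -2**31           # identical to INT_MIN
# SEM_INC stays 1; the "waiting" state is flagged by the sign bit.
assert SEM_WAIT & 0xFFFFFFFF == 0x80000000
```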



[PATCH] godump: Fix -fdump-go-spec= reproduceability issue

2021-07-13 Thread Jakub Jelinek via Gcc-patches
Hi!

pot_dummy_types is a hash_set from whose traversal the code prints some type
lines.  hash_set normally uses default_hash_traits which for pointer types
(the hash set hashes const char *) uses pointer_hash which hashes the
addresses of the pointers except for the least significant 3 bits.
With address space randomization, that results in non-determinism in the
-fdump-go-spec= generated file, each invocation can have different order of
the lines emitted from pot_dummy_types traversal.

This patch fixes it by hashing the string contents instead to make the
hashes reproducible.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-13  Jakub Jelinek  

PR go/101407
* godump.c (godump_str_hash): New type.
(godump_container::pot_dummy_types): Use string_hash instead of
ptr_hash in the hash_set.

--- gcc/godump.c.jj 2021-05-07 10:34:46.255123614 +0200
+++ gcc/godump.c2021-07-12 15:00:31.328761742 +0200
@@ -56,6 +56,8 @@ static FILE *go_dump_file;
 
 static GTY(()) vec *queue;
 
+struct godump_str_hash : string_hash, ggc_remove  {};
+
 /* A hash table of macros we have seen.  */
 
 static htab_t macro_hash;
@@ -535,7 +537,7 @@ public:
 
   /* Types which may potentially have to be defined as dummy
  types.  */
-  hash_set pot_dummy_types;
+  hash_set pot_dummy_types;
 
   /* Go keywords.  */
   htab_t keyword_hash;

Jakub
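
The nondeterminism and its fix can be modeled outside GCC: any hash that depends only on the string's bytes gives the same traversal order on every run, while an address-based hash follows the allocator.  A sketch using FNV-1a as a stand-in for string_hash (illustrative only, not GCC's hash function):

```python
def fnv1a(s: str) -> int:
    # 64-bit FNV-1a over the string's bytes; depends only on contents.
    h = 0xcbf29ce484222325
    for b in s.encode():
        h = ((h ^ b) * 0x100000001b3) % 2**64
    return h

names = ["sigaction", "timespec", "stat"]

# Content-based ordering is identical no matter how the set was populated;
# a pointer hash would instead depend on where each string was allocated,
# which address-space randomization changes between runs.
order1 = sorted(names, key=fnv1a)
order2 = sorted(names[::-1], key=fnv1a)

assert order1 == order2
assert fnv1a("stat") != fnv1a("timespec")
```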



Re: adjust landing pads when changing main label

2021-07-13 Thread Richard Biener via Gcc-patches
On Tue, Jul 13, 2021 at 5:10 AM Alexandre Oliva  wrote:
>
>
> If an artificial label created for a landing pad ends up being
> dropped in favor of a user-supplied label, the user-supplied label
> inherits the landing pad index, but the post_landing_pad field is not
> adjusted to point to the new label.
>
> This patch fixes the problem, and adds verification that we don't
> remove a label that's still used as a landing pad.
>
> The circumstance in which this problem can be hit was unusual: removal
> of a block with an unreachable label moves the label to some other
> unrelated block, in case its address is taken.  In the case at hand
> (pr42739.C, complicated by wrappers and cleanups), the chosen block
> happened to be an EH landing pad.  (A followup patch will change that.)
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

>
> for  gcc/ChangeLog
>
> * tree-cfg.c (cleanup_dead_labels_eh): Update
> post_landing_pad label upon change of landing pad block's
> primary label.
> (cleanup_dead_labels): Check that a removed label is not that
> of a landing pad.
> ---
>  gcc/tree-cfg.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index c73e1cbdda6b9..1f0f4a2c6eb2c 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -1481,6 +1481,7 @@ cleanup_dead_labels_eh (label_record *label_for_bb)
> if (lab != lp->post_landing_pad)
>   {
> EH_LANDING_PAD_NR (lp->post_landing_pad) = 0;
> +   lp->post_landing_pad = lab;
> EH_LANDING_PAD_NR (lab) = lp->index;
>   }
>}
> @@ -1707,7 +1708,10 @@ cleanup_dead_labels (void)
>   || FORCED_LABEL (label))
> gsi_next ();
>   else
> -   gsi_remove (, true);
> +   {
> + gcc_checking_assert (EH_LANDING_PAD_NR (label) == 0);
> + gsi_remove (, true);
> +   }
> }
>  }
>
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
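
The invariant the one-line fix restores (EH_LANDING_PAD_NR of the pad's label and the pad's post_landing_pad field must agree) can be modeled in a few lines.  This is a sketch, not GCC's data structures:

```python
# Toy model of cleanup_dead_labels_eh: when a landing pad's main label is
# replaced by a user-supplied label, the index moves to the new label and,
# with the fix, post_landing_pad is retargeted as well.

class LandingPad:
    def __init__(self, index, label):
        self.index = index
        self.post_landing_pad = label

EH_LANDING_PAD_NR = {}

def retarget(lp, new_label):
    if new_label is not lp.post_landing_pad:
        EH_LANDING_PAD_NR[lp.post_landing_pad] = 0
        lp.post_landing_pad = new_label     # the line the patch adds
        EH_LANDING_PAD_NR[new_label] = lp.index

artificial, user = "L.42", "user_label"     # hypothetical label names
lp = LandingPad(index=3, label=artificial)
EH_LANDING_PAD_NR[artificial] = 3
retarget(lp, user)

# Without the added line, post_landing_pad would still name "L.42" while
# the index lived on "user_label": exactly the stale state being fixed.
assert lp.post_landing_pad == user
assert EH_LANDING_PAD_NR[user] == 3 and EH_LANDING_PAD_NR[artificial] == 0
```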


Re: disable -Warray-bounds in libgo (PR 101374)

2021-07-13 Thread Rainer Orth
Hi Martin,

>>> while this patch does fix the libgo bootstrap failure, Go is completely
>>> broken: almost 1000 go.test failures and all libgo tests FAIL as well.
>>> Seen on both i386-pc-solaris2.11 and sparc-sun-solaris2.11.
>> FWIW, I see exactly the same failures on x86_64-pc-linux-gnu, so nothing
>> Solaris-specific here.
>
> I don't normally test Go because of PR 91992, but I see just

I've never seen this myself, neither on Fedora 29 in the past or on
Ubuntu 20.04 right now.

> the three test failures below on x86_64-linux with the latest trunk:
>
> FAIL: go.test/test/fixedbugs/issue10441.go   -O  (test for excess errors)
> FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
> FAIL: runtime/pprof
>
> The excess errors don't look related to my changes:
>
> FAIL: go.test/test/fixedbugs/issue10441.go   -O  (test for excess errors)
> Excess errors:
> /usr/bin/ld:
> /ssd/test/build/gcc-trunk/x86_64-pc-linux-gnu/./libgo/.libs/libgo.so: 
> undefined reference to `__go_init_main'
> /usr/bin/ld:
> /ssd/test/build/gcc-trunk/x86_64-pc-linux-gnu/./libgo/.libs/libgo.so: 
> undefined reference to `main.main'

Indeed: this is a compile-only test that isn't marked as such and thus
fails to link.

> If you see different failures in your build that look like they
> might be caused by them then please show what those are.

I've started another round of bootstraps last night for the first time
since Friday and indeed the failures are gone.  Before, every single Go
execution test died with a SEGV.  No idea what fixed that, though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] passes: Fix up subobject __bos [PR101419]

2021-07-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled, because VN during cunrolli changes
__bos argument from address of a larger field to address of a smaller field
and so __builtin_object_size (, 1) then folds into smaller value than the
actually available size.
copy_reference_ops_from_ref has a hack for this, but it was using
cfun->after_inlining as a check whether the hack can be ignored, and
cunrolli is after_inlining.

This patch uses a property to make it exact (set at the end of objsz
pass that doesn't do insert_min_max_p) and additionally based on discussions
in the PR moves the objsz pass earlier after IPA.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-13  Jakub Jelinek  
Richard Biener  

PR tree-optimization/101419
* tree-pass.h (PROP_objsz): Define.
(make_pass_early_object_sizes): Declare.
* passes.def (pass_all_early_optimizations): Rename pass_object_sizes
there to pass_early_object_sizes, drop parameter.
(pass_all_optimizations): Move pass_object_sizes right after pass_ccp,
drop parameter, move pass_post_ipa_warn right after that.
* tree-object-size.c (pass_object_sizes::execute): Rename to...
(object_sizes_execute): ... this.  Add insert_min_max_p argument.
(pass_data_object_sizes): Move after object_sizes_execute.
(pass_object_sizes): Likewise.  In execute method call
object_sizes_execute, drop set_pass_param method and insert_min_max_p
non-static data member and its initializer in the ctor.
(pass_data_early_object_sizes, pass_early_object_sizes,
make_pass_early_object_sizes): New.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Use
(cfun->curr_properties & PROP_objsz) instead of cfun->after_inlining.

* gcc.dg/builtin-object-size-10.c: Pass -fdump-tree-early_objsz-details
instead of -fdump-tree-objsz1-details in dg-options and adjust names
of dump file in scan-tree-dump.
* gcc.dg/pr101419.c: New test.

--- gcc/tree-pass.h.jj  2021-01-27 10:10:00.525903635 +0100
+++ gcc/tree-pass.h 2021-07-12 17:23:42.322648068 +0200
@@ -208,6 +208,7 @@ protected:
 #define PROP_gimple_lcf(1 << 1)/* lowered control flow 
*/
 #define PROP_gimple_leh(1 << 2)/* lowered eh */
 #define PROP_cfg   (1 << 3)
+#define PROP_objsz (1 << 4)/* object sizes computed */
 #define PROP_ssa   (1 << 5)
 #define PROP_no_crit_edges  (1 << 6)
 #define PROP_rtl   (1 << 7)
@@ -426,6 +427,7 @@ extern gimple_opt_pass *make_pass_omp_ta
 extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_early_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_printf (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
--- gcc/passes.def.jj   2021-05-19 09:16:34.434046683 +0200
+++ gcc/passes.def  2021-07-12 17:41:38.274859148 +0200
@@ -74,7 +74,7 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_all_early_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_early_optimizations)
  NEXT_PASS (pass_remove_cgraph_callee_edges);
- NEXT_PASS (pass_object_sizes, true /* insert_min_max_p */);
+ NEXT_PASS (pass_early_object_sizes);
  /* Don't record nonzero bits before IPA to avoid
 using too much memory.  */
  NEXT_PASS (pass_ccp, false /* nonzero_p */);
@@ -194,14 +194,14 @@ along with GCC; see the file COPYING3.
 They ensure memory accesses are not indirect wherever possible.  */
   NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
-  NEXT_PASS (pass_post_ipa_warn);
   /* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
+  NEXT_PASS (pass_object_sizes);
+  NEXT_PASS (pass_post_ipa_warn);
   NEXT_PASS (pass_complete_unrolli);
   NEXT_PASS (pass_backprop);
   NEXT_PASS (pass_phiprop);
   NEXT_PASS (pass_forwprop);
-  NEXT_PASS (pass_object_sizes, false /* insert_min_max_p */);
   /* pass_build_alias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
   NEXT_PASS (pass_build_alias);
--- gcc/tree-object-size.c.jj   2021-01-04 10:25:39.911221618 +0100
+++ gcc/tree-object-size.c  2021-07-12 17:47:30.497018569 +0200
@@ -1304,45 +1304,6 @@ fini_object_sizes (void)
 }
 }
 
-
-/* Simple pass to optimize all __builtin_object_size () builtins.  */
-
-namespace {
-
-const pass_data pass_data_object_sizes =
-{
-  GIMPLE_PASS, /* type */
-  "objsz", /* name */

Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-13 Thread Richard Biener
On Tue, 13 Jul 2021, guojiufu wrote:

> On 2021-07-12 23:53, guojiufu via Gcc-patches wrote:
> > On 2021-07-12 22:46, Richard Biener wrote:
> >> On Mon, 12 Jul 2021, guojiufu wrote:
> >> 
> >>> On 2021-07-12 18:02, Richard Biener wrote:
> >>> > On Mon, 12 Jul 2021, guojiufu wrote:
> >>> >
> >>> >> On 2021-07-12 16:57, Richard Biener wrote:
> >>> >> > On Mon, 12 Jul 2021, guojiufu wrote:
> >>> >> >
> >>> >> >> On 2021-07-12 14:20, Richard Biener wrote:
> >>> >> >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> >>> >> >> >
> >>> >> >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> >>> >> >> >> > I wonder if there's a way to query the target what modes the
> >>> >> >> >> > doloop
> >>> >> >> >> > pattern can handle (not being too familiar with the doloop
> >>> >> >> >> > code).
> >>> >> >> >>
> >>> >> >> >> You can look what modes are allowed for operand 0 of doloop_end,
> >>> >> >> >> perhaps?  Although that is a define_expand, not a define_insn, so
> >>> >> >> >> it
> >>> >> >> >> is
> >>> >> >> >> hard to introspect.
> >>> >> >> >>
> >>> >> >> >> > Why do you need to do any checks besides the new type being
> >>> >> >> >> > able to
> >>> >> >> >> > represent all IV values?  The original doloop IV will never
> >>> >> >> >> > wrap
> >>> >> >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will
> >>> >> >> >> > become
> >>> >> >> >> > zero ... I suppose the doloop might still do the correct thing
> >>> >> >> >> > here
> >>> >> >> >> > but it also still will with a IV with larger type).
> >>> >> >>
> >>> >> >> The issue comes from U*_MAX (original short MAX), as you said: on
> >>> >> >> which
> >>> >> >> niter + 1 becomes zero.  And because the step for doloop is -1;
> >>> >> >> then, on
> >>> >> >> larger type 'zero - 1' will be a very large number on larger type
> >>> >> >> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a
> >>> >> >> small
> >>> >> >> value
> >>> >> >> (e.g. "0xff").
> >>> >> >
> >>> >> > But for the larger type the small type MAX + 1 fits and does not
> >>> >> > yield
> >>> >> > zero so it should still work exactly as before, no?  Of course you
> >>> >> > have to compute the + 1 in the larger type.
> >>> >> >
> >>> >> You are right, if compute the "+ 1" in the larger type it is ok, as
> >>> >> below
> >>> >> code:
> >>> >> ```
> >>> >>/* Use type in word size may fast.  */
> >>> >> if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
> >>> >>   {
> >>> >> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
> >>> >> niter = fold_convert (ntype, niter);
> >>> >>   }
> >>> >>
> >>> >> tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
> >>> >>  build_int_cst (ntype, 1));
> >>> >>
> >>> >>
> >>> >> add_candidate (data, base, build_int_cst (ntype, -1), true, NULL,
> >>> >> NULL,
> >>> >> true);
> >>> >> ```
> >>> >> The issue of this is, this code generates more stmt for doloop.xxx:
> >>> >>   _12 = (unsigned int) xx(D);
> >>> >>   _10 = _12 + 4294967295;
> >>> >>   _24 = (long unsigned int) _10;
> >>> >>   doloop.6_8 = _24 + 1;
> >>> >>
> >>> >> if use previous patch, "+ 1" on original type, then the stmts will
> >>> >> looks
> >>> >> like:
> >>> >>   _12 = (unsigned int) xx(D);
> >>> >>   doloop.6_8 = (long unsigned int) _12;
> >>> >>
> >>> >> This is the reason for checking
> >>> >>wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
> >>> >
> >>> > But this then only works when there's an upper bound on the number
> >>> > of iterations.  Note you should not use TYPE_MAX_VALUE here but
> >>> > you can instead use
> >>> >
> >>> >  wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value
> >>> > (TYPE_PRECISION (ntype), TYPE_SIGN (ntype;
> >>> 
> >>> Ok, Thanks!
> >>> I remember you mentioned that:
> >>> widest_int::from (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN
> >>> (ntype)),
> >>> TYPE_SIGN (ntype))
> >>> would be better than
> >>> wi::to_widest (TYPE_MAX_VALUE (ntype)).
> >>> 
> >>> It seems that:
> >>> "TYPE_MAX_VALUE (ntype)" is "NUMERICAL_TYPE_CHECK
> >>> (NODE)->type_non_common.maxval"
> >>> which does a numerical check and returns the maxval field, and then
> >>> calls wi::to_widest.
> >>> 
> >>> The other code "widest_int::from (wi::max_value (..,..),..)" calls
> >>> wi::max_value
> >>> and widest_int::from.
> >>> 
> >>> I'm wondering if wi::to_widest (TYPE_MAX_VALUE (ntype)) is cheaper?
> >> 
> >> TYPE_MAX_VALUE can be "surprising", it does not necessarily match the
> >> underlying mode's precision.  At some point we've tried to eliminate
> >> most of its uses, not sure what the situation/position is right now.
> > Ok, get it, thanks.
> > I will use "widest_int::from (wi::max_value (..,..),..)".
> > 
> >> 
> >>> > I think the -1 above comes from number of latch iterations vs. header
> >>> > entries - it's a common source for this kind of issues.  range analysis
> >>> > might be able to prove that we can still 
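
The wraparound under discussion is concrete: if niter already equals the narrow type's maximum, adding 1 in that type yields 0, while adding 1 after widening yields the true trip count.  A sketch with a 16-bit niter (illustrative values, not the PR's testcase):

```python
# Model 16-bit vs 64-bit unsigned arithmetic with explicit masks.
M16 = 2**16 - 1                        # U*_MAX of a 16-bit unsigned niter
niter = M16                            # worst case: the type's maximum

narrow_base = (niter + 1) & M16        # "+ 1" computed in the narrow type
wide_base = (niter + 1) & (2**64 - 1)  # "+ 1" computed after widening

assert narrow_base == 0                # wraps: the doloop would misbehave
assert wide_base == 2**16              # the correct iteration count

# This is why the cheaper narrow "+ 1" is only safe when range analysis
# shows niter_desc->max is strictly below the narrow type's maximum;
# at the maximum itself, that bound check fails:
assert not (niter < M16)
```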

Re: [PATCH v2] x86: Don't enable UINTR in 32-bit mode

2021-07-13 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 12, 2021 at 06:51:30PM -0700, H.J. Lu wrote:
> @@ -404,9 +404,18 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>if (argc < 1)
>  return NULL;

I think it would be simpler to use 2 arguments instead of one.
So change the above to if (argc < 2)

>  
> -  arch = !strcmp (argv[0], "arch");
> +  arch = !strncmp (argv[0], "arch", 4);
>  
> -  if (!arch && strcmp (argv[0], "tune"))
> +  if (!arch && strncmp (argv[0], "tune", 4))
> +return NULL;

Keep strcmp as is here.

> +
> +  bool codegen_x86_64;
> +
> +  if (!strcmp (argv[0] + 4, "32"))
> +codegen_x86_64 = false;
> +  else if (!strcmp (argv[0] + 4, "64"))
> +codegen_x86_64 = true;
> +  else
>  return NULL;

Check argv[1] here instead.

> @@ -813,7 +826,8 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>  }
>  
>  done:
> -  return concat (cache, "-m", argv[0], "=", cpu, options, NULL);
> +  const char *moption = arch ? "-march=" : "-mtune=";
> +  return concat (cache, moption, cpu, options, NULL);
>  }
>  #else

You don't need this change.

> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 7a35c468da3..7cba655595e 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2109,6 +2109,7 @@ ix86_option_override_internal (bool main_args_p,
>  #define DEF_PTA(NAME) \
>   if (((processor_alias_table[i].flags & PTA_ ## NAME) != 0) \
>   && PTA_ ## NAME != PTA_64BIT \
> + && (TARGET_64BIT || PTA_ ## NAME != PTA_UINTR) \
>   && !TARGET_EXPLICIT_ ## NAME ## _P (opts)) \
> SET_TARGET_ ## NAME (opts);
>  #include "i386-isa.def"
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 8c3eace56da..ae9f455c48d 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -577,9 +577,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define CC1_CPU_SPEC CC1_CPU_SPEC_1
>  #else
>  #define CC1_CPU_SPEC CC1_CPU_SPEC_1 \
> -"%{march=native:%>march=native %:local_cpu_detect(arch) \
> -  %{!mtune=*:%>mtune=native %:local_cpu_detect(tune)}} \
> -%{mtune=native:%>mtune=native %:local_cpu_detect(tune)}"
> +"%{" OPT_ARCH32 ":%{march=native:%>march=native %:local_cpu_detect(arch32) \
> +   %{!mtune=*:%>mtune=native %:local_cpu_detect(tune32)}}}" \
> +"%{" OPT_ARCH32 ":%{mtune=native:%>mtune=native %:local_cpu_detect(tune32)}}" \
> +"%{" OPT_ARCH64 ":%{march=native:%>march=native %:local_cpu_detect(arch64) \
> +   %{!mtune=*:%>mtune=native %:local_cpu_detect(tune64)}}}" \
> +"%{" OPT_ARCH64 ":%{mtune=native:%>mtune=native %:local_cpu_detect(tune64)}}"

And you can use
#define ARCH_ARG "%{" OPT_ARCH64 ":64;32}"

%:local_cpu_detect(arch, " ARCH_ARG ")
etc.
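A sketch of how that suggestion might read in i386.h (illustrative only, not the committed spec string; it assumes local_cpu_detect is taught to accept the extra argument):

```c
/* Hypothetical combination of Jakub's ARCH_ARG with the existing spec.  */
#define ARCH_ARG "%{" OPT_ARCH64 ":64;32}"
#define CC1_CPU_SPEC CC1_CPU_SPEC_1 \
"%{march=native:%>march=native %:local_cpu_detect(arch, " ARCH_ARG ") \
  %{!mtune=*:%>mtune=native %:local_cpu_detect(tune, " ARCH_ARG ")}} \
%{mtune=native:%>mtune=native %:local_cpu_detect(tune, " ARCH_ARG ")}"
```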

Jakub



Re: [patch] PR jit/87808: Allow libgccjit to work without an external gcc driver

2021-07-13 Thread Richard Biener via Gcc-patches
On Mon, Jul 12, 2021 at 11:00 PM Matthias Klose  wrote:
>
> On 3/26/19 12:52 PM, Matthias Klose wrote:
> > On 22.03.19 23:00, David Malcolm wrote:
> >> On Thu, 2019-03-21 at 12:26 +0100, Matthias Klose wrote:
> >>> Fix PR jit/87808, the embedded driver still needing the external gcc
> >>> driver to
> >>> find the gcc_lib_dir. This can happen in a packaging context when
> >>> libgccjit
> >>> doesn't depend on the gcc package, but just on binutils and libgcc-
> >>> dev packages.
> >>> libgccjit probably could use /proc/self/maps to find the gcc_lib_dir,
> >>> but that
> >>> doesn't seem to be very portable.
> >>>
> >>> Ok for the trunk and the branches?
> >>>
> >>> Matthias
> >>
> >> [CCing the jit list]
> >>
> >> I've been trying to reproduce this bug in a working copy, and failing.
> >>
> >> Matthias, do you have a recipe you've been using to reproduce this?
> >
> > the JIT debug log shows the driver names that it wants to call.  Are you 
> > sure
> > that this driver isn't available anywhere?  I configure the gcc build with
> > --program-suffix=-8 --program-prefix=x86_64-linux-gnu-, and that one was 
> > only
> > available in one place, /usr/bin.
> >
> > Matthias
>
> David, the bug report now has two more comments from people that the current
> behavior is broken.  Please could you review the patch?

I think libgccjit should use the same strategy for finding the install location
as the driver itself does.  I couldn't readily decipher its magic, but at least
there's STANDARD_EXEC_PREFIX, which seems to be used as a possible
fallback.

In particular your patch doesn't seem to work with a DESTDIR=
install?

Can we instead add a --with-gccjit-install-dir= or sth like that (whatever
path to whatever files the JIT exactly looks for)?
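For illustration, such a switch might be used like this (the option name and paths are hypothetical, matching the proposal above, including a DESTDIR-staged install):

```shell
# Hypothetical configure-time pinning of the directory libgccjit probes,
# followed by a staged install that must still work:
./configure --with-gccjit-install-dir=/usr/lib/gcc/x86_64-linux-gnu/11
make
make DESTDIR=/tmp/staging install
```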

Richard.

> Thanks, Matthias

