Re: [PATCH] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]

2022-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/22 13:43, Marek Polacek wrote:

On Tue, Jul 12, 2022 at 04:15:00PM -0400, Jason Merrill wrote:

On 7/12/22 16:10, Jason Merrill wrote:

On 7/8/22 13:41, Marek Polacek wrote:

This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary.  They can be used to detect code
like

    std::tuple t("meow");

which is incorrect because it always creates a dangling reference,
because
the std::string temporary is created inside the selected constructor of
std::tuple, and not outside it.

There are two new compiler builtins,
__reference_constructs_from_temporary
and __reference_converts_from_temporary.  The former is used to simulate
direct- and the latter copy-initialization context.  But I had a
hard time
finding a test where there's actually a difference.  Under DR 2267, both
of these are invalid:

    struct A { } a;
    struct B { explicit B(const A&); };
    const B {a};
    const B (a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

    struct G {
  operator int(); // #1
  explicit operator int&&(); // #2
    };

int&& r1(G{}); // use #2 (no temporary)
int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have conv_binds_ref_to_prvalue.  The main function here is
reference_from_temporary.  The renaming to ref_conv_binds_to_temporary_p
is because previously the function didn't distinguish between an invalid
conversion and one that binds to a prvalue.

The patch also adds the relevant class and variable templates to
.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

 PR c++/104477

gcc/c-family/ChangeLog:

 * c-common.cc (c_common_reswords): Add
 __reference_constructs_from_temporary and
 __reference_converts_from_temporary.
 * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
 RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

 * call.cc (ref_conv_binds_directly_p): Rename to ...
 (ref_conv_binds_to_temporary_p): ... this.  Add a new bool
 parameter.  Return true only if the conversion is valid and
 conv_binds_ref_to_prvalue returns true.
 * constraint.cc (diagnose_trait_expr): Handle
 CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
CPTK_REF_CONVERTS_FROM_TEMPORARY.
 * cp-tree.h (enum cp_trait_kind): Add
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
 and CPTK_REF_CONVERTS_FROM_TEMPORARY.
 (ref_conv_binds_directly_p): Rename to ...
 (ref_conv_binds_to_temporary_p): ... this.
 (reference_from_temporary): Declare.
 * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
 CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
CPTK_REF_CONVERTS_FROM_TEMPORARY.
 * method.cc (reference_from_temporary): New.
 * parser.cc (cp_parser_primary_expression): Handle
 RID_REF_CONSTRUCTS_FROM_TEMPORARY and
RID_REF_CONVERTS_FROM_TEMPORARY.
 (cp_parser_trait_expr): Likewise.
 (warn_for_range_copy): Adjust to call ref_conv_binds_to_temporary_p.
 * semantics.cc (trait_expr_value): Handle
 CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
CPTK_REF_CONVERTS_FROM_TEMPORARY.
 (finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

 * include/std/type_traits (reference_constructs_from_temporary,
 reference_converts_from_temporary): New class templates.
 (reference_constructs_from_temporary_v,
 reference_converts_from_temporary_v): New variable templates.
 (__cpp_lib_reference_from_temporary): Define for C++23.
 * include/std/version (__cpp_lib_reference_from_temporary):
Define for
 C++23.
 * testsuite/20_util/variable_templates_for_traits.cc: Test
 reference_constructs_from_temporary_v and
 reference_converts_from_temporary_v.
 * testsuite/20_util/reference_from_temporary/value.cc: New test.
 * testsuite/20_util/reference_from_temporary/value2.cc: New test.
 * testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:

 * g++.dg/ext/reference_constructs_from_temporary1.C: New test.
 * g++.dg/ext/reference_converts_from_temporary1.C: New test.
---
   gcc/c-family/c-common.cc  |   4 +
   gcc/c-family/c-common.h   |   2 +
   gcc/cp/call.cc    |  14 +-
   gcc/cp/constraint.cc  |   8 +
   gcc/cp/cp-tree.h  |   7 +-
   gcc/cp/cxx-pretty-print.cc    |   6 +
   gcc/cp/method.cc  |  28 +++
   gcc/cp/parser.cc  |  14 +-
   gcc/cp/semantics.cc   |   8 +
   .../reference_constructs_from_temporary1.C    | 214 ++
   .../ext/reference_converts_from_temporary1.C  | 214 ++
   libstdc++-v3/include/std/type_traits  |  39 
   libstdc++-v3/include/std/version  |   5 +-
   

Re: [PATCH] x86: Disable sibcall if indirect_return attribute doesn't match

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Fri, Jul 15, 2022 at 1:44 AM H.J. Lu via Gcc-patches
 wrote:
>
> When shadow stack is enabled, function with indirect_return attribute
> may return via indirect jump.  In this case, we need to disable sibcall
> if caller doesn't have indirect_return attribute and indirect branch
> tracking is enabled since compiler won't generate ENDBR when calling the
> caller.
>
LGTM.
> gcc/
>
> PR target/85620
> * config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
> false if callee has indirect_return attribute and caller
> doesn't.
>
> gcc/testsuite/
>
> PR target/85620
> * gcc.target/i386/pr85620-2.c: Updated.
> * gcc.target/i386/pr85620-5.c: New test.
> * gcc.target/i386/pr85620-6.c: Likewise.
> * gcc.target/i386/pr85620-7.c: Likewise.
> ---
>  gcc/config/i386/i386.cc   | 10 ++
>  gcc/testsuite/gcc.target/i386/pr85620-2.c |  3 ++-
>  gcc/testsuite/gcc.target/i386/pr85620-5.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr85620-6.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr85620-7.c | 14 ++
>  5 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-7.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3a3c7299eb4..e03f86d4a23 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -1024,6 +1024,16 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>  return false;
>  }
>
> +  /* Disable sibcall if callee has indirect_return attribute and
> + caller doesn't since callee will return to the caller's caller
> + via an indirect jump.  */
> +  if (((flag_cf_protection & (CF_RETURN | CF_BRANCH))
> +   == (CF_RETURN | CF_BRANCH))
> +  && lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (type))
> +  && !lookup_attribute ("indirect_return",
> +   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl
> +return false;
> +
>/* Otherwise okay.  That also includes certain types of indirect calls.  */
>return true;
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/pr85620-2.c 
> b/gcc/testsuite/gcc.target/i386/pr85620-2.c
> index b2e680fa1fe..14ce0ffd1e1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr85620-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr85620-2.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fcf-protection" } */
> -/* { dg-final { scan-assembler-times {\mendbr} 1 } } */
> +/* { dg-final { scan-assembler-times {\mendbr} 2 } } */
> +/* { dg-final { scan-assembler-not "jmp" } } */
>
>  struct ucontext;
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr85620-5.c 
> b/gcc/testsuite/gcc.target/i386/pr85620-5.c
> new file mode 100644
> index 000..04537702d09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr85620-5.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection" } */
> +/* { dg-final { scan-assembler-not "jmp" } } */
> +
> +struct ucontext;
> +
> +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__));
> +
> +int
> +foo (struct ucontext *oucp)
> +{
> +  return bar (oucp);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr85620-6.c 
> b/gcc/testsuite/gcc.target/i386/pr85620-6.c
> new file mode 100644
> index 000..0b6a64e8454
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr85620-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection" } */
> +/* { dg-final { scan-assembler "jmp" } } */
> +
> +struct ucontext;
> +
> +extern int bar (struct ucontext *) __attribute__((__indirect_return__));
> +
> +__attribute__((__indirect_return__))
> +int
> +foo (struct ucontext *oucp)
> +{
> +  return bar (oucp);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr85620-7.c 
> b/gcc/testsuite/gcc.target/i386/pr85620-7.c
> new file mode 100644
> index 000..fa62d56decf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr85620-7.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcf-protection" } */
> +/* { dg-final { scan-assembler "jmp" } } */
> +
> +struct ucontext;
> +
> +extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__));
> +extern int foo (struct ucontext *) __attribute__((__indirect_return__));
> +
> +int
> +foo (struct ucontext *oucp)
> +{
> +  return bar (oucp);
> +}
> --
> 2.36.1
>


-- 
BR,
Hongtao


Re: [PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 2:11 PM Kong, Lingling via Gcc-patches
 wrote:
>
> Hi,
>
> The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result.  These 
> intrinsics have changed over time, like `_mm_comieq_ss ` old operation is 
> `RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation update is `RETURN 
> ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0`.
>
> OK for master?
All _mm_comiXX_ss uses order_compare except for mm_comine_ss which
uses unordered_compare, now it's aligned with intrinsic guide.
Ok for trunk.
>
> gcc/ChangeLog:
>
> PR target/106113
> * config/i386/i386-builtin.def (BDESC): Fix [u]comi{ss,sd}
> comparison due to intrinsics changed over time.
> * config/i386/i386-expand.cc (ix86_ssecom_setcc):
> Add unordered check and mode for sse comi codegen.
> (ix86_expand_sse_comi): Add unordered check and check a different
> CCmode.
> (ix86_expand_sse_comi_round):Extract unordered check and mode part
> in ix86_ssecom_setcc.
>
> gcc/testsuite/ChangeLog:
>
> PR target/106113
> * gcc.target/i386/avx-vcomisd-pr106113-2.c: New test.
> * gcc.target/i386/avx-vcomiss-pr106113-2.c: Ditto.
> * gcc.target/i386/avx-vucomisd-pr106113-2.c: Ditto.
> * gcc.target/i386/avx-vucomiss-pr106113-2.c: Ditto.
> * gcc.target/i386/sse-comiss-pr106113-1.c: Ditto.
> * gcc.target/i386/sse-comiss-pr106113-2.c: Ditto.
> * gcc.target/i386/sse-ucomiss-pr106113-1.c: Ditto.
> * gcc.target/i386/sse-ucomiss-pr106113-2.c: Ditto.
> * gcc.target/i386/sse2-comisd-pr106113-1.c: Ditto.
> * gcc.target/i386/sse2-comisd-pr106113-2.c: Ditto.
> * gcc.target/i386/sse2-ucomisd-pr106113-1.c: Ditto.
> * gcc.target/i386/sse2-ucomisd-pr106113-2.c: Ditto.
> ---
>  gcc/config/i386/i386-builtin.def  |  32 ++--
>  gcc/config/i386/i386-expand.cc| 140 +++---
>  .../gcc.target/i386/avx-vcomisd-pr106113-2.c  |   8 +
>  .../gcc.target/i386/avx-vcomiss-pr106113-2.c  |   8 +
>  .../gcc.target/i386/avx-vucomisd-pr106113-2.c |   8 +
>  .../gcc.target/i386/avx-vucomiss-pr106113-2.c |   8 +
>  .../gcc.target/i386/sse-comiss-pr106113-1.c   |  19 +++
>  .../gcc.target/i386/sse-comiss-pr106113-2.c   |  59 
>  .../gcc.target/i386/sse-ucomiss-pr106113-1.c  |  19 +++
>  .../gcc.target/i386/sse-ucomiss-pr106113-2.c  |  59 
>  .../gcc.target/i386/sse2-comisd-pr106113-1.c  |  19 +++
>  .../gcc.target/i386/sse2-comisd-pr106113-2.c  |  59 
>  .../gcc.target/i386/sse2-ucomisd-pr106113-1.c |  19 +++
>  .../gcc.target/i386/sse2-ucomisd-pr106113-2.c |  59 
>  14 files changed, 450 insertions(+), 66 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomisd-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomiss-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomisd-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomiss-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-2.c
>
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index fd160935e67..acb7e8ca64b 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -35,30 +35,30 @@
>  IX86_BUILTIN__BDESC_##NEXT_KIND##_FIRST - 1.  */
>
>  BDESC_FIRST (comi, COMI,
> -   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
> IX86_BUILTIN_COMIEQSS, UNEQ, 0)
> -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
> IX86_BUILTIN_COMILTSS, UNLT, 0)
> -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
> IX86_BUILTIN_COMILESS, UNLE, 0)
> +   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
> IX86_BUILTIN_COMIEQSS, EQ, 0)
> +BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
> IX86_BUILTIN_COMILTSS, LT, 0)
> +BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
> IX86_BUILTIN_COMILESS, LE, 0)
>  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comigt", 
> IX86_BUILTIN_COMIGTSS, GT, 0)
>  BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comige", 
> IX86_BUILTIN_COMIGESS, GE, 0)
> -BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comineq", 
> 

Re: [PATCH, committed] Fortran: error recovery for bad initializers of implied-shape arrays [PR106209]

2022-07-14 Thread Jerry D via Gcc-patches



Hi Herald,

Looks good to me. I have always preferred informative messages.

Thanks,

Jerry
On 7/14/22 1:34 PM, Harald Anlauf via Fortran wrote:

Dear all,

the attached patch introduces error recovery for two cases of using
an invalid array in a declaration of an implied-shape array instead
of hitting internal errors.

Initial patch by Steve.  The final version was approved in the PR
by Steve.

Committed as:

https://gcc.gnu.org/g:748f8a8b145dde59c7b63aa68b5a59515b7efc49

Thanks,
Harald





Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-07-14 Thread Yonghong Song via Gcc-patches




On 7/14/22 8:09 AM, Jose E. Marchesi wrote:


Hi Yonghong.


On 7/7/22 1:24 PM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/21/22 9:12 AM, Jose E. Marchesi wrote:



On 6/17/22 10:18 AM, Jose E. Marchesi wrote:

Hi Yonghong.


On 6/15/22 1:57 PM, David Faust wrote:


On 6/14/22 22:53, Yonghong Song wrote:



On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow to associate (to "annotate" or
to "tag") particular declarations and types with arbitrary strings. As
explained below, this is intended to be used to, for example, 
characterize
certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
  tags on certain language elements, such as struct fields.

  The purpose of these annotations is to provide additional information 
about
  types, variables, and function parameters of interest to the kernel. A
  driving use case is to tag pointer types within the linux kernel and 
eBPF
  programs with additional semantic information, such as '__user' or 
'__rcu'.

  For example, consider the linux kernel function do_execve with the
  following declaration:

static int do_execve(struct filename *filename,
   const char __user *const __user *__argv,
   const char __user *const __user *__envp);

  Here, __user could be defined with these annotations to record 
semantic
  information about the pointer parameters (e.g., they are 
user-provided) in
  DWARF and BTF information. Other kernel facilites such as the eBPF 
verifier
  can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

  The main motivation for emitting the tags in DWARF is that the Linux 
kernel
  generates its BTF information via pahole, using DWARF as a source:

  ++  BTF  BTF   +--+
  | pahole |---> vmlinux.btf --->| verifier |
  ++ +--+
  ^^
  ||
DWARF |BTF |
  ||
   vmlinux  +-+
   module1.ko   | BPF program |
   module2.ko   +-+
 ...

  This is because:

  a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

  b)  GCC can generate BTF for whatever target with -gbtf, but there is 
no
  support for linking/deduplicating BTF in the linker.

  In the scenario above, the verifier needs access to the pointer tags 
of
  both the kernel types/declarations (conveyed in the DWARF and 
translated
  to BTF by pahole) and those of the BPF program (available directly in 
BTF).

  Another motivation for having the tag information in DWARF, unrelated 
to
  BPF and BTF, is that the drgn project (another DWARF consumer) also 
wants
  to benefit from these tags in order to differentiate between different
  kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

  This is easy: the main purpose of having this info in BTF is for the
  compiled eBPF programs. The kernel verifier can then access the tags
  of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
===

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes 

Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Segher Boessenkool
On Thu, Jul 14, 2022 at 05:49:57PM -0400, Michael Meissner wrote:
> On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote:
> > You never posted the trunk version of this, so that never was approved
> > either.
> 
> I did post the trunk version on June 10th, and your only comment was fix the
> commit message, which I thought I did in the commit.

I did not approve the patch.  Of course not, I didn't even get as far as
reading it.  You should have fixed it and sent again, I did not approve
anything.

> > > +  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> > > +   && rs6000_tune != PROCESSOR_POWER10)
> > >   rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > >else
> > >   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > 
> > The TARGET_MMA in that should not be there.  Please fix that (that
> > probably needs more changes).
> 
> All of the movoo and movxo support require TARGET_MMA as does the code in
> rs6000-string.cc that could possibly generate load/store vector pair.

And all that is wrong and should be fixed.

> To
> remove the check here would mean also fixing all of the vector load and store
> pairs in mma.md.

That is wha I said, yes,.

> > This statement does the opposite of what the comment says.
> > 
> > Please fix this.  On trunk, first.

This is the core problem with this patch: it is simply wrong.

It is a very roundabout way of saying "only enable vector pairs if
-mcpu=power10 but -mtune=somethingelse".  Which is not a sensible
thing to do, and not what the comment says either.


Segher


[PATCH] stack-protector: Check stack canary for noreturn function

2022-07-14 Thread H.J. Lu via Gcc-patches
Check stack canary for noreturn function to catch stack corruption
before calling noreturn function.  For C++, check stack canary when
throwing exception or resuming stack unwind to avoid corrupted stack.

gcc/

PR middle-end/58245
* calls.cc (expand_call): Check stack canary for noreturn
function.

gcc/testsuite/

PR middle-end/58245
* c-c++-common/pr58245-1.c: New test.
* g++.dg/pr58245-1.C: Likewise.
* g++.dg/fstack-protector-strong.C: Adjusted.
---
 gcc/calls.cc   |  7 ++-
 gcc/testsuite/c-c++-common/pr58245-1.c | 12 
 gcc/testsuite/g++.dg/fstack-protector-strong.C |  2 +-
 gcc/testsuite/g++.dg/pr58245-1.C   | 10 ++
 4 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr58245-1.c
 create mode 100644 gcc/testsuite/g++.dg/pr58245-1.C

diff --git a/gcc/calls.cc b/gcc/calls.cc
index bc96aff38f0..7816c2c8d99 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -3154,7 +3154,12 @@ expand_call (tree exp, rtx target, int ignore)
   if (pass && (flags & ECF_MALLOC))
start_sequence ();
 
-  if (pass == 0
+  /* Check the canary value for sibcall or function which doesn't
+return.  */
+  if ((pass == 0
+  || ((flags & ECF_NORETURN) != 0
+  && (fndecl
+  != get_callee_fndecl (targetm.stack_protect_fail ()
  && crtl->stack_protect_guard
  && targetm.stack_protect_runtime_enabled_p ())
stack_protect_epilogue ();
diff --git a/gcc/testsuite/c-c++-common/pr58245-1.c 
b/gcc/testsuite/c-c++-common/pr58245-1.c
new file mode 100644
index 000..945acc53004
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr58245-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+
+extern void foo (void) __attribute__ ((noreturn));
+
+void
+bar (void)
+{
+  foo ();
+}
+
+/* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */
diff --git a/gcc/testsuite/g++.dg/fstack-protector-strong.C 
b/gcc/testsuite/g++.dg/fstack-protector-strong.C
index ae6d2fdb8df..034af2ce9ab 100644
--- a/gcc/testsuite/g++.dg/fstack-protector-strong.C
+++ b/gcc/testsuite/g++.dg/fstack-protector-strong.C
@@ -85,4 +85,4 @@ int foo7 (B *p)
   return p->return_slot ().a1;
 }
 
-/* { dg-final { scan-assembler-times "stack_chk_fail" 7 } } */
+/* { dg-final { scan-assembler-times "stack_chk_fail" 8 } } */
diff --git a/gcc/testsuite/g++.dg/pr58245-1.C b/gcc/testsuite/g++.dg/pr58245-1.C
new file mode 100644
index 000..1439bc62e71
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr58245-1.C
@@ -0,0 +1,10 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-options "-O2 -fstack-protector-all" } */
+
+void
+bar (void)
+{
+  throw 1;
+}
+
+/* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */
-- 
2.36.1



[PATCH] libphobos: Fix instability in the parallelized testsuite

2022-07-14 Thread Lewis Hyatt via Gcc-patches
Hello-

I get a different number of test results from libphobos.unittest/unittest.exp,
depending on server load. I believe it's because this testsuite doesn't check
runtest_file_p:

$ make -j 1 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10

$ make -j 2 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10
 # of expected passes   10

$ make -j 4 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10
 # of expected passes   10
 # of expected passes   10
 # of expected passes   10

When running in parallel along with other tests, even at a fixed argument
for -j, the number of tests that actually execute will depend on how many of the
parallel sub-makes happened to start prior to the first one finishing, hence
it changes from run to run.

The attached patch fixes it for me, if it looks OK? Thanks, this would remove
some noise from before/after test comparisons.

-Lewis
libphobos: Fix instability in the parallelized testsuite

libphobos.unittest/unittest.exp calls bare dg-test rather than dg-runtest, and
so it should call runtest_file_p to determine whether to run each test or
not. Without that call, the tests run too many times in parallel mode (they will
run as many times, as the argument to make -j).

libphobos/ChangeLog:

* testsuite/libphobos.unittest/unittest.exp: Call runtest_file_p
prior to running each test.

diff --git a/libphobos/testsuite/libphobos.unittest/unittest.exp 
b/libphobos/testsuite/libphobos.unittest/unittest.exp
index 2a019caca8c..175decdc333 100644
--- a/libphobos/testsuite/libphobos.unittest/unittest.exp
+++ b/libphobos/testsuite/libphobos.unittest/unittest.exp
@@ -42,6 +42,9 @@ foreach unit_test $unit_test_list {
 set expected_fail [lindex $unit_test 1]
 
 foreach test $tests {
+   if {![runtest_file_p $runtests $test]} {
+continue
+}
 set shouldfail $expected_fail
 dg-test $test "" $test_flags
 }


Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Michael Meissner via Gcc-patches
On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote:
> > I have applied the patch to GCC 12.
> > 
> > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
> > | From: Michael Meissner 
> > | Date: Thu, 14 Jul 2022 11:16:08 -0400
> > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs 
> > for block copies.
> > 
> > Testing has found that using load and store vector pair for block copies
> > can result in a slow down on power10.  This patch disables using the
> > vector pair instructions for block copies if we are tuning for power10.
> > 
> > 2022-06-11   Michael Meissner  
> > 
> > gcc/
> > 
> > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
> > not generate block copies with vector pair instructions if we are
> > tuning for power10.  Back port from master branch.
> 
> You never posted the trunk version of this, so that never was approved
> either.

I did post the trunk version on June 10th, and your only comment was fix the
commit message, which I thought I did in the commit.

> > +++ b/gcc/config/rs6000/rs6000.cc
> > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
> >  
> >if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
> >  {
> > -  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> > +  /* Do not generate lxvp and stxvp on power10 since there are some
> > +performance issues.  */
> > +  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> > + && rs6000_tune != PROCESSOR_POWER10)
> > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> >else
> > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> 
> The TARGET_MMA in that should not be there.  Please fix that (that
> probably needs more changes).


All of the movoo and movxo support require TARGET_MMA as does the code in
rs6000-string.cc that could possibly generate load/store vector pair.  To
remove the check here would mean also fixing all of the vector load and store
pairs in mma.md.

> This statement does the opposite of what the comment says.
> 
> Please fix this.  On trunk, first.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] libcpp: Improve encapsulation of label_text

2022-07-14 Thread David Malcolm via Gcc-patches
On Thu, 2022-07-14 at 22:10 +0100, Jonathan Wakely wrote:

Thanks for the patch.

> I'm not sure if label_text::get () is the best name for the new
> accessor. Other options include buffer () and c_str () but I don't
> see a
> compelling reason to prefer either of those.

label_text::get should return a const char *, rather than a char *.

I don't think anything uses label_text::is_owner; do we need that?

> 
> As a follow-up patch we could change label_text::m_buffer (and
> pod_label_text::m_buffer) to be const char*, since those labels are
> never modified once a label_text takes ownership of it. That would
> make it clear to callers that they are not supposed to modify the
> contents, and would remove the const_cast currently used by
> label_text::borrow (), which is a code smell (returning a non-const
> char* makes it look like it's OK to write into it, which is
> definitely
> not true for a borrowed string that was originally a string literal).
> That would require using const_cast when passing the buffer to free
> for
> deallocation, but that's fine, and avoids the impression that the
> object
> holds a modifiable string buffer.

I'm not sure I agree with your reasoning about the proposed follow-up,
but the patch below is OK for trunk, with the above nits fixed.

Thanks
Dave

> 
> Tested x86_64-linux, OK for trunk?
> 
> 
> -- >8 --
> 
> This adjusts the API of label_text so that the data members are
> private
> and cannot be modified by callers.  Add accessors for them instead.
> Also rename moved_from () to the more idiomatic release ().  Also
> remove
> the unused take_or_copy () member function which has confusing
> ownership
> semantics.
> 
> gcc/analyzer/ChangeLog:
> 
> * call-info.cc (call_info::print): Adjust to new label_text
> API.
> * checker-path.cc (checker_event::dump): Likewise.
> (region_creation_event::get_desc): Likewise.
> (state_change_event::get_desc): Likewise.
> (superedge_event::should_filter_p): Likewise.
> (start_cfg_edge_event::get_desc): Likewise.
> (call_event::get_desc): Likewise.
> (return_event::get_desc): Likewise.
> (warning_event::get_desc): Likewise.
> (checker_path::dump): Likewise.
> (checker_path::debug): Likewise.
> * diagnostic-manager.cc
> (diagnostic_manager::prune_for_sm_diagnostic):
> Likewise.
> (diagnostic_manager::prune_interproc_events): Likewise.
> * engine.cc (feasibility_state::maybe_update_for_edge):
> Likewise.
> * program-state.cc (sm_state_map::to_json): Likewise.
> * region-model-impl-calls.cc
> (region_model::impl_call_analyzer_describe): Likewise.
> (region_model::impl_call_analyzer_dump_capacity): Likewise.
> * region.cc (region::to_json): Likewise.
> * sm-malloc.cc (inform_nonnull_attribute): Likewise.
> * store.cc (binding_map::to_json): Likewise.
> (store::to_json): Likewise.
> * supergraph.cc (superedge::dump): Likewise.
> * svalue.cc (svalue::to_json): Likewise.
> 
> gcc/c-family/ChangeLog:
> 
> * c-format.cc (class range_label_for_format_type_mismatch):
> Adjust to new label_text API.
> 
> gcc/ChangeLog:
> 
> * diagnostic-format-json.cc (json_from_location_range):
> Adjust
> to new label_text API.
> * diagnostic-format-sarif.cc
> (sarif_builder::make_location_object):
> Likewise.
> * diagnostic-show-locus.cc (struct pod_label_text): Likewise.
> (layout::print_any_labels): Likewise.
> * tree-diagnostic-path.cc (class path_label): Likewise.
> (struct event_range): Likewise.
> (default_tree_diagnostic_path_printer): Likewise.
> (default_tree_make_json_for_path): Likewise.
> 
> libcpp/ChangeLog:
> 
> * include/line-map.h (label_text::take_or_copy): Remove.
> (label_text::moved_from): Rename to release.
> (label_text::m_buffer, label_text::m_owned): Make private.
> (label_text::get, label_text::is_owned): New accessors.
> ---
>  gcc/analyzer/call-info.cc   |  2 +-
>  gcc/analyzer/checker-path.cc    | 46 ---
> --
>  gcc/analyzer/diagnostic-manager.cc  | 20 +--
>  gcc/analyzer/engine.cc  |  2 +-
>  gcc/analyzer/program-state.cc   |  2 +-
>  gcc/analyzer/region-model-impl-calls.cc |  4 +--
>  gcc/analyzer/region.cc  |  2 +-
>  gcc/analyzer/sm-malloc.cc   | 10 +++---
>  gcc/analyzer/store.cc   |  6 ++--
>  gcc/analyzer/supergraph.cc  |  4 +--
>  gcc/analyzer/svalue.cc  |  2 +-
>  gcc/c-family/c-format.cc    |  4 +--
>  gcc/diagnostic-format-json.cc   |  4 +--
>  gcc/diagnostic-format-sarif.cc  |  2 +-
>  gcc/diagnostic-show-locus.cc    |  6 ++--
>  gcc/tree-diagnostic-path.cc | 16 -
>  libcpp/include/line-map.h 

Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Segher Boessenkool
On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote:
> I have applied the patch to GCC 12.
> 
> | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
> | From: Michael Meissner 
> | Date: Thu, 14 Jul 2022 11:16:08 -0400
> | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for 
> block copies.
> 
> Testing has found that using load and store vector pair for block copies
> can result in a slow down on power10.  This patch disables using the
> vector pair instructions for block copies if we are tuning for power10.
> 
> 2022-06-11   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
>   not generate block copies with vector pair instructions if we are
>   tuning for power10.  Back port from master branch.

You never posted the trunk version of this, so that never was approved
either.

> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
>  
>if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
>  {
> -  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> +  /* Do not generate lxvp and stxvp on power10 since there are some
> +  performance issues.  */
> +  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> +   && rs6000_tune != PROCESSOR_POWER10)
>   rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
>else
>   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;

The TARGET_MMA in that should not be there.  Please fix that (that
probably needs more changes).

This statement does the opposite of what the comment says.

Please fix this.  On trunk, first.


Segher


[PATCH] libcpp: Improve encapsulation of label_text

2022-07-14 Thread Jonathan Wakely via Gcc-patches
I'm not sure if label_text::get () is the best name for the new
accessor. Other options include buffer () and c_str () but I don't see a
compelling reason to prefer either of those.

As a follow-up patch we could change label_text::m_buffer (and
pod_label_text::m_buffer) to be const char*, since those labels are
never modified once a label_text takes ownership of it. That would
make it clear to callers that they are not supposed to modify the
contents, and would remove the const_cast currently used by
label_text::borrow (), which is a code smell (returning a non-const
char* makes it look like it's OK to write into it, which is definitely
not true for a borrowed string that was originally a string literal).
That would require using const_cast when passing the buffer to free for
deallocation, but that's fine, and avoids the impression that the object
holds a modifiable string buffer.

Tested x86_64-linux, OK for trunk?


-- >8 --

This adjusts the API of label_text so that the data members are private
and cannot be modified by callers.  Add accessors for them instead.
Also rename moved_from () to the more idiomatic release ().  Also remove
the unused take_or_copy () member function which has confusing ownership
semantics.

gcc/analyzer/ChangeLog:

* call-info.cc (call_info::print): Adjust to new label_text API.
* checker-path.cc (checker_event::dump): Likewise.
(region_creation_event::get_desc): Likewise.
(state_change_event::get_desc): Likewise.
(superedge_event::should_filter_p): Likewise.
(start_cfg_edge_event::get_desc): Likewise.
(call_event::get_desc): Likewise.
(return_event::get_desc): Likewise.
(warning_event::get_desc): Likewise.
(checker_path::dump): Likewise.
(checker_path::debug): Likewise.
* diagnostic-manager.cc (diagnostic_manager::prune_for_sm_diagnostic):
Likewise.
(diagnostic_manager::prune_interproc_events): Likewise.
* engine.cc (feasibility_state::maybe_update_for_edge):
Likewise.
* program-state.cc (sm_state_map::to_json): Likewise.
* region-model-impl-calls.cc 
(region_model::impl_call_analyzer_describe): Likewise.
(region_model::impl_call_analyzer_dump_capacity): Likewise.
* region.cc (region::to_json): Likewise.
* sm-malloc.cc (inform_nonnull_attribute): Likewise.
* store.cc (binding_map::to_json): Likewise.
(store::to_json): Likewise.
* supergraph.cc (superedge::dump): Likewise.
* svalue.cc (svalue::to_json): Likewise.

gcc/c-family/ChangeLog:

* c-format.cc (class range_label_for_format_type_mismatch):
Adjust to new label_text API.

gcc/ChangeLog:

* diagnostic-format-json.cc (json_from_location_range): Adjust
to new label_text API.
* diagnostic-format-sarif.cc (sarif_builder::make_location_object):
Likewise.
* diagnostic-show-locus.cc (struct pod_label_text): Likewise.
(layout::print_any_labels): Likewise.
* tree-diagnostic-path.cc (class path_label): Likewise.
(struct event_range): Likewise.
(default_tree_diagnostic_path_printer): Likewise.
(default_tree_make_json_for_path): Likewise.

libcpp/ChangeLog:

* include/line-map.h (label_text::take_or_copy): Remove.
(label_text::moved_from): Rename to release.
(label_text::m_buffer, label_text::m_owned): Make private.
(label_text::get, label_text::is_owned): New accessors.
---
 gcc/analyzer/call-info.cc   |  2 +-
 gcc/analyzer/checker-path.cc| 46 -
 gcc/analyzer/diagnostic-manager.cc  | 20 +--
 gcc/analyzer/engine.cc  |  2 +-
 gcc/analyzer/program-state.cc   |  2 +-
 gcc/analyzer/region-model-impl-calls.cc |  4 +--
 gcc/analyzer/region.cc  |  2 +-
 gcc/analyzer/sm-malloc.cc   | 10 +++---
 gcc/analyzer/store.cc   |  6 ++--
 gcc/analyzer/supergraph.cc  |  4 +--
 gcc/analyzer/svalue.cc  |  2 +-
 gcc/c-family/c-format.cc|  4 +--
 gcc/diagnostic-format-json.cc   |  4 +--
 gcc/diagnostic-format-sarif.cc  |  2 +-
 gcc/diagnostic-show-locus.cc|  6 ++--
 gcc/tree-diagnostic-path.cc | 16 -
 libcpp/include/line-map.h   | 27 ---
 17 files changed, 80 insertions(+), 79 deletions(-)

diff --git a/gcc/analyzer/call-info.cc b/gcc/analyzer/call-info.cc
index e1142d743a3..efc070b8bed 100644
--- a/gcc/analyzer/call-info.cc
+++ b/gcc/analyzer/call-info.cc
@@ -75,7 +75,7 @@ void
 call_info::print (pretty_printer *pp) const
 {
   label_text desc (get_desc (pp_show_color (pp)));
-  pp_string (pp, desc.m_buffer);
+  pp_string (pp, desc.get ());
 }
 
 /* Implementation of custom_edge_info::add_events_to_path vfunc for
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 

Re: [PATCH v3] Simplify memchr with small constant strings

2022-07-14 Thread H.J. Lu via Gcc-patches
On Wed, Jul 13, 2022 at 11:42 PM Richard Biener
 wrote:
>
> On Wed, Jul 13, 2022 at 6:50 PM H.J. Lu  wrote:
> >
> > When memchr is applied on a constant string of no more than the bytes of
> > a word, simplify memchr by checking each byte in the constant string.
> >
> > int f (int a)
> > {
> >return  __builtin_memchr ("AE", a, 2) != 0;
> > }
> >
> > is simplified to
> >
> > int f (int a)
> > {
> >   return ((char) a == 'A' || (char) a == 'E') != 0;
> > }
> >
> > gcc/
> >
> > PR tree-optimization/103798
> > * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h".
> > (simplify_builtin_call): Inline memchr with constant strings of
> > no more than the bytes of a word.
> > * tree-ssa-strlen.cc (use_in_zero_equality): Make it global.
> > * tree-ssa-strlen.h (use_in_zero_equality): New.
> >
> > gcc/testsuite/
> >
> > PR tree-optimization/103798
> > * c-c++-common/pr103798-1.c: New test.
> > * c-c++-common/pr103798-2.c: Likewise.
> > * c-c++-common/pr103798-3.c: Likewise.
> > * c-c++-common/pr103798-4.c: Likewise.
> > * c-c++-common/pr103798-5.c: Likewise.
> > * c-c++-common/pr103798-6.c: Likewise.
> > * c-c++-common/pr103798-7.c: Likewise.
> > * c-c++-common/pr103798-8.c: Likewise.
> > * c-c++-common/pr103798-9.c: Likewise.
> > * c-c++-common/pr103798-10.c: Likewise.
> > ---
> >  gcc/testsuite/c-c++-common/pr103798-1.c  | 28 +
> >  gcc/testsuite/c-c++-common/pr103798-10.c | 10 
> >  gcc/testsuite/c-c++-common/pr103798-2.c  | 30 ++
> >  gcc/testsuite/c-c++-common/pr103798-3.c  | 28 +
> >  gcc/testsuite/c-c++-common/pr103798-4.c  | 28 +
> >  gcc/testsuite/c-c++-common/pr103798-5.c  | 26 +
> >  gcc/testsuite/c-c++-common/pr103798-6.c  | 27 +
> >  gcc/testsuite/c-c++-common/pr103798-7.c  | 27 +
> >  gcc/testsuite/c-c++-common/pr103798-8.c  | 27 +
> >  gcc/testsuite/c-c++-common/pr103798-9.c  | 10 
> >  gcc/tree-ssa-forwprop.cc | 73 
> >  gcc/tree-ssa-strlen.cc   |  4 +-
> >  gcc/tree-ssa-strlen.h|  2 +
> >  13 files changed, 318 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-10.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c
> >  create mode 100644 gcc/testsuite/c-c++-common/pr103798-9.c
> >
> > diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c 
> > b/gcc/testsuite/c-c++-common/pr103798-1.c
> > new file mode 100644
> > index 000..cd3edf569fc
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/pr103798-1.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> > +
> > +__attribute__ ((weak))
> > +int
> > +f (char a)
> > +{
> > +   return  __builtin_memchr ("a", a, 1) == 0;
> > +}
> > +
> > +__attribute__ ((weak))
> > +int
> > +g (char a)
> > +{
> > +  return a != 'a';
> > +}
> > +
> > +int
> > +main ()
> > +{
> > + for (int i = 0; i < 255; i++)
> > +   if (f (i) != g (i))
> > + __builtin_abort ();
> > +
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "memchr" } } */
> > diff --git a/gcc/testsuite/c-c++-common/pr103798-10.c 
> > b/gcc/testsuite/c-c++-common/pr103798-10.c
> > new file mode 100644
> > index 000..4677e9539fa
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/pr103798-10.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -fdump-tree-optimized -save-temps" } */
> > +
> > +int
> > +f (char a)
> > +{
> > +  return  __builtin_memchr ("ac", a, 1) == 0;
> > +}
> > +
> > +/* { dg-final { scan-assembler "memchr" } } */
> > diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c 
> > b/gcc/testsuite/c-c++-common/pr103798-2.c
> > new file mode 100644
> > index 000..e7e99c3679e
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/pr103798-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> > +
> > +#include 
> > +
> > +__attribute__ ((weak))
> > +int
> > +f (int a)
> > +{
> > +   return memchr ("aE", a, 2) != NULL;
> > +}
> > +
> > +__attribute__ ((weak))
> > +int
> > +g (char a)
> > +{
> > +  return a == 'a' || a == 'E';
> > +}
> > +
> > +int
> > +main ()
> > +{
> > + for (int i = 0; i < 255; i++)
> > +   if (f (i + 256) != g (i + 256))
> > + __builtin_abort ();
> > +
> > + return 0;
> > +}
> > +
> > +/* { dg-final { 

Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."

2022-07-14 Thread Fangrui Song via Gcc-patches

On 2022-07-14, Palmer Dabbelt wrote:

On Mon, 20 Jun 2022 20:48:50 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Mon, Jun 20, 2022 at 1:21 AM Kito Cheng  wrote:


Generally I agree we should fix that by GCC driver rather than ld
emulation, but I think this should be reverted with the -L path fix,
otherwise that will break multilib on GNU toolchain for linux
immediately?


IIUC just changing this will break even non-multlib systems, though 
it's possible that the symlinks work around that sufficiently.



Thanks for the good consideration. That said, I am unsure any distro
uses this currently.
I think some just work around the possibly non-existent paths by
creating symlinks.
Perhaps we should prioritize on fixing the scheme before distros start
to rely on the behavior.


I'm kind of torn on this one: this has been around for a while and 
dropping it would be an ABI break, but the feedback from distro folks 
is pretty consistently that multlib is broken on RISC-V.  If it's 
really unusably broken then I could buy the argument that there's no 
binaries (and thus no ABI to break), but there's at some base multilib 
functionality working -- I build multilib cross toolchains regularly, 
for example, and they can build simple stuff.


I always find making that "nobody's used it" argument really hard, 
there's just too many users to try and track everyone down.  We're in 
kind of a weird spot with RISC-V in general when it comes to ABI 
stuff: we were probably a bit overly optimistic about how fast any of 
this was going to get used when we committed to the ABI freeze, but 
any ABi break has such a huge potential for user headaches that I'm 
not sure it's going to be possible.  It means we're stuck with some 
baggage, and while it's a headache to keep around stuff that's 
probably not all that useful I think it's just what we've got to live 
with.


If multlib really is so broken it's not fixable without an ABI break 
then I guess there's no other option, but I think in this case we have 
some:


One option would be to add an ld argument that says to turn off the 
emulation-specific path resolution, which we could then add to 
LINK_SPEC when we get the library paths sorted out?  We'd still have 
the emulations and the subdirs, but at least we wouldn't need a flag 
day.


Another option would be to add new multlib paths that don't have the 
subdirectories, as last I checked that was an issue for distros 
(violates FHS, breaks build systems, etc).  If we're going to do that 
anyway then we could piggyback the new behavior on it and deprecate 
the old paths along with whatever behavior is associated with them.


Thanks for chiming in and providing the good consideration.  I don't do
much GNU development.  I just spotted something and think it should be
fixed.  If dropping -L is considered a risk, an alternative scheme is
that someone:

* create an alternative patch making GCC driver pass the -L to ld and use the 
plain emulation name.
* a configure option can possibly be added to guard whether the -L is added at 
all
* drop the excessive emulations in GNU ld.

This way an alternative linker implementation doesn't have to add a
compatibility no-op option.

I appreciate anyone who wants to step up and helps removing
eelf64lriscv_{lp64,ilp32f,...}. We will have "*64briscv*" variants and
it will then look really ugly.


On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches
 wrote:


This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc.

The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962
let GCC pass -m emulation to ld and let the ld emulation configure
default library paths.  This scheme is problematic:

* It's not ld's business to specify default -L.  Different platforms have
different opinions on the hierarchy and all other arches work well without ld's
default -L.
* If some ABI derived library paths are desired, the compiler driver is in a
better position to make the decision and traditionally has done this.
* -m emulation is opaque to the compiler driver.  It doesn't affect -B, so
data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not 
affected.

As is, many platforms just use symlinks to fake the lib64/{ilp32{,f},lp64{,f}}
hierarchies needed by the GNU ld emulation.  They can always specify -L
explicitly if they want some ABI derived library paths.  See also the rejected
https://reviews.llvm.org/D95755


I don't do a lot of LLVM stuff, but that has a green check mark that 
says "accepted" at the top.  Does that mean it was merged somewhere, 
or just that it was acked/reviewed and then dropped?


I could land it, but I realized that would be a mistake so I stopped.
There has since been no real need for this feature, either.  I wished
that some RISC-V folks stepped up and fixed it.  It took so long so I
ended up sending this patch myself, but I have no bandwidth to change this
patch to a form providing better compatibility



gcc/Changelog:

* 

Re: [PATCH] Revert "[PATCH] RISC-V: Use new linker emulations for glibc ABI."

2022-07-14 Thread Palmer Dabbelt

On Mon, 20 Jun 2022 20:48:50 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Mon, Jun 20, 2022 at 1:21 AM Kito Cheng  wrote:


Generally I agree we should fix that by GCC driver rather than ld
emulation, but I think this should be reverted with the -L path fix,
otherwise that will break multilib on GNU toolchain for linux
immediately?


IIUC just changing this will break even non-multlib systems, though it's 
possible that the symlinks work around that sufficiently.



Thanks for the good consideration. That said, I am unsure any distro
uses this currently.
I think some just work around the possibly non-existent paths by
creating symlinks.
Perhaps we should prioritize on fixing the scheme before distros start
to rely on the behavior.


I'm kind of torn on this one: this has been around for a while and 
dropping it would be an ABI break, but the feedback from distro folks is 
pretty consistently that multlib is broken on RISC-V.  If it's really 
unusably broken then I could buy the argument that there's no binaries 
(and thus no ABI to break), but there's at some base multilib 
functionality working -- I build multilib cross toolchains regularly, 
for example, and they can build simple stuff.


I always find making that "nobody's used it" argument really hard, 
there's just too many users to try and track everyone down.  We're in 
kind of a weird spot with RISC-V in general when it comes to ABI stuff: 
we were probably a bit overly optimistic about how fast any of this was 
going to get used when we committed to the ABI freeze, but any ABi break 
has such a huge potential for user headaches that I'm not sure it's 
going to be possible.  It means we're stuck with some baggage, and while 
it's a headache to keep around stuff that's probably not all that useful 
I think it's just what we've got to live with.


If multlib really is so broken it's not fixable without an ABI break 
then I guess there's no other option, but I think in this case we have 
some:


One option would be to add an ld argument that says to turn off the 
emulation-specific path resolution, which we could then add to LINK_SPEC 
when we get the library paths sorted out?  We'd still have the 
emulations and the subdirs, but at least we wouldn't need a flag day.


Another option would be to add new multlib paths that don't have the 
subdirectories, as last I checked that was an issue for distros 
(violates FHS, breaks build systems, etc).  If we're going to do that 
anyway then we could piggyback the new behavior on it and deprecate the old 
paths along with whatever behavior is associated with them.



On Wed, Jun 15, 2022 at 4:00 PM Fangrui Song via Gcc-patches
 wrote:
>
> This reverts commit 37d57ac9a636f2235f9060e84fb8dd7968abd1dc.
>
> The resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=22962
> let GCC pass -m emulation to ld and let the ld emulation configure
> default library paths.  This scheme is problematic:
>
> * It's not ld's business to specify default -L.  Different platforms have
> different opinions on the hierarchy and all other arches work well without 
ld's
> default -L.
> * If some ABI derived library paths are desired, the compiler driver is in a
> better position to make the decision and traditionally has done this.
> * -m emulation is opaque to the compiler driver.  It doesn't affect -B, so
> data files like crt*.o, libasan_preinit.o, and libtsan_preinit.o are not 
affected.
>
> As is, many platforms just use symlinks to fake the lib64/{ilp32{,f},lp64{,f}}
> hierarchies needed by the GNU ld emulation.  They can always specify -L
> explicitly if they want some ABI derived library paths.  See also the rejected
> https://reviews.llvm.org/D95755


I don't do a lot of LLVM stuff, but that has a green check mark that 
says "accepted" at the top.  Does that mean it was merged somewhere, or 
just that it was acked/reviewed and then dropped?



>
> gcc/Changelog:
>
> * config/riscv/linux.h (LD_EMUL_SUFFIX): Remove.
> (LINK_SPEC): Remove LD_EMUL_SUFFIX.
> ---
>  gcc/config/riscv/linux.h | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
> index 38803723ba9..e0ff6e6a178 100644
> --- a/gcc/config/riscv/linux.h
> +++ b/gcc/config/riscv/linux.h
> @@ -49,16 +49,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  #define CPP_SPEC "%{pthread:-D_REENTRANT}"
>
> -#define LD_EMUL_SUFFIX \
> -  "%{mabi=lp64d:}" \
> -  "%{mabi=lp64f:_lp64f}" \
> -  "%{mabi=lp64:_lp64}" \
> -  "%{mabi=ilp32d:}" \
> -  "%{mabi=ilp32f:_ilp32f}" \
> -  "%{mabi=ilp32:_ilp32}"
> -
>  #define LINK_SPEC "\
> --melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \
> +-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \
>  %{mno-relax:--no-relax} \
>  %{mbig-endian:-EB} \
>  %{mlittle-endian:-EL} \
> --
> 2.36.1.476.g0c4daa206d-goog
>




--
宋方睿


[PATCH, committed] Fortran: error recovery for bad initializers of implied-shape arrays [PR106209]

2022-07-14 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch introduces error recovery for two cases of using
an invalid array in a declaration of an implied-shape array instead
of hitting internal errors.

Initial patch by Steve.  The final version was approved in the PR
by Steve.

Committed as:

https://gcc.gnu.org/g:748f8a8b145dde59c7b63aa68b5a59515b7efc49

Thanks,
Harald

From 748f8a8b145dde59c7b63aa68b5a59515b7efc49 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 14 Jul 2022 22:24:55 +0200
Subject: [PATCH] Fortran: error recovery for bad initializers of implied-shape
 arrays [PR106209]

gcc/fortran/ChangeLog:

	PR fortran/106209
	* decl.cc (add_init_expr_to_sym): Handle bad initializers for
	implied-shape arrays.

gcc/testsuite/ChangeLog:

	PR fortran/106209
	* gfortran.dg/pr106209.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/decl.cc| 15 +--
 gcc/testsuite/gfortran.dg/pr106209.f90 |  9 +
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr106209.f90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 339f8b15035..b6400514731 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -2129,10 +2129,21 @@ add_init_expr_to_sym (const char *name, gfc_expr **initp, locus *var_locus)
 	  /* The shape may be NULL for EXPR_ARRAY, set it.  */
 	  if (init->shape == NULL)
 	{
-	  gcc_assert (init->expr_type == EXPR_ARRAY);
+	  if (init->expr_type != EXPR_ARRAY)
+		{
+		  gfc_error ("Bad shape of initializer at %L", >where);
+		  return false;
+		}
+
 	  init->shape = gfc_get_shape (1);
 	  if (!gfc_array_size (init, >shape[0]))
-		  gfc_internal_error ("gfc_array_size failed");
+		{
+		  gfc_error ("Cannot determine shape of initializer at %L",
+			 >where);
+		  free (init->shape);
+		  init->shape = NULL;
+		  return false;
+		}
 	}

 	  for (dim = 0; dim < sym->as->rank; ++dim)
diff --git a/gcc/testsuite/gfortran.dg/pr106209.f90 b/gcc/testsuite/gfortran.dg/pr106209.f90
new file mode 100644
index 000..44f9233ec2f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr106209.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/106209 - ICE in add_init_expr_to_sym
+! Contributed by G.Steinmetz
+
+program p
+  integer, parameter :: a(:) = 0   ! { dg-error "of deferred shape" }
+  integer, parameter :: b(*) = a   ! { dg-error "Bad shape of initializer" }
+  integer, parameter :: c(*) = [a] ! { dg-error "Cannot determine shape" }
+end
--
2.35.3



Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-14 Thread Alexander Monakov via Gcc-patches


On Thu, 14 Jul 2022, Richard Biener wrote:

> Indeed.  Guess that's what __builtin_setjmp[_receiver] for SJLJ_EH got 
> "right".
> 
> When copying a block we do not copy labels so any "jumps" remain to the 
> original
> block and thus we are indeed able to isolate normal control flow.  Given that
> returns_twice functions _do_ seem to be special, and also special as to how
> we handle other abnormal receivers in duplicate_block.

We do? Sorry, I don't see what you mean here, can you point me to specific 
lines?

> So it might indeed make sense to special-case them in can_duplicate_block_p
> ... (sorry for going back-and-forth ...)
> 
> Note that I think this detail of duplicate_block (the function) and the hook
> needs to be better documented (the semantics on incoming edges, not 
> duplicating
> labels used for incoming control flow).
> 
> Can you see as to how to adjust the RTL side for this?  It looks like at least
> some places set a REG_SETJMP note on call_insns (emit_call_1), I wonder
> if in rtl_verify_flow_info[_1] (or its callees) we can check that such
> calls come
> first ... they might not since IIRC we do _not_ preserve abnormal edges when
> expanding RTL (there's some existing bug about this and how this breaks some
> setjmp tests) (but we try to recompute them?).

No, we emit arguments/return value handling before/after a REG_SETJMP call,
and yeah, we don't always properly recompute abnormal edges, so improving
RTL in this respect seems hopeless. For example, it is easy enough to create
a testcase where bb-reordering duplicates a returns_twice call, although it
runs so late that perhaps later passes don't care:

// gcc -O2 --param=max-grow-copy-bb-insns=100
__attribute__((returns_twice))
int rtwice(int);
int g1(int), g2(int);
void f(int i)
{
  do {
i = i%2 ? g1(i) : g2(i);
  } while (i = rtwice(i));
}

FWIW, I also investigated https://gcc.gnu.org/PR101347

> Sorry about the back-and-forth again ... your original patch looks OK for the
> GIMPLE side but can you amend the cfghooks.{cc,h} documentation to
> summarize our findings and
> the desired semantics of duplicate_block in this respect?

Like below?

---8<---

Subject: [PATCH v3] tree-cfg: do not duplicate returns_twice calls

A returns_twice call may have associated abnormal edges that correspond
to the "second return" from the call. If the call is duplicated, the
copies of those edges also need to be abnormal, but e.g. tracer does not
enforce that. Just prohibit the (unlikely to be useful) duplication.

gcc/ChangeLog:

* cfghooks.cc (duplicate_block): Expand comment.
* tree-cfg.cc (gimple_can_duplicate_bb_p): Reject blocks with
calls that may return twice.
---
 gcc/cfghooks.cc | 13 ++---
 gcc/tree-cfg.cc |  7 +--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/cfghooks.cc b/gcc/cfghooks.cc
index e435891fa..c6ac9532c 100644
--- a/gcc/cfghooks.cc
+++ b/gcc/cfghooks.cc
@@ -1086,9 +1086,16 @@ can_duplicate_block_p (const_basic_block bb)
   return cfg_hooks->can_duplicate_block_p (bb);
 }
 
-/* Duplicates basic block BB and redirects edge E to it.  Returns the
-   new basic block.  The new basic block is placed after the basic block
-   AFTER.  */
+/* Duplicate basic block BB, place it after AFTER (if non-null) and redirect
+   edge E to it (if non-null).  Return the new basic block.
+
+   If BB contains a returns_twice call, the caller is responsible for 
recreating
+   incoming abnormal edges corresponding to the "second return" for the copy.
+   gimple_can_duplicate_bb_p rejects such blocks, while RTL likes to live
+   dangerously.
+
+   If BB has incoming abnormal edges for some other reason, their destinations
+   should be tied to label(s) of the original BB and not the copy.  */
 
 basic_block
 duplicate_block (basic_block bb, edge e, basic_block after, copy_bb_data *id)
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index f846dc2d8..5bcf78198 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6346,12 +6346,15 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
 {
   gimple *g = gsi_stmt (gsi);
 
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+  /* Prohibit duplication of returns_twice calls, otherwise associated
+abnormal edges also need to be duplicated properly.
+An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 duplicated as part of its group, or not at all.
 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
 group, so the same holds there.  */
   if (is_gimple_call (g)
- && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ && (gimple_call_flags (g) & ECF_RETURNS_TWICE
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, 

Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-14 Thread Andrew Pinski via Gcc-patches
On Thu, Jul 14, 2022 at 12:38 PM Sam Feifer  wrote:
>
>
>
> On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski  wrote:
>>
>> On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer  wrote:
>> >
>> >
>> > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski  wrote:
>> >>
>> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
>> >>  wrote:
>> >> >
>> >> > This patch is intended to fix a missed optimization in match.pd. It 
>> >> > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to 
>> >> > write a second simplification in match.pd to handle the commutative 
>> >> > property as the match was not ocurring otherwise. Additionally, the 
>> >> > pattern (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps 
>> >> > with the other simplification rule.
>> >>
>> >> You could use :c for the commutative property instead and that should
>> >> simplify things.
>> >> That is:
>> >>
>> >> (simplify
>> >>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
>> >>   (abs @0))
>> >>
>> >> Also since integer_zerop works on vectors, it seems like you should
>> >> add a testcase or two for the vector case.
>> >> Also would be useful if you write a testcase that uses different
>> >> statements rather than one big one so it gets exercised in the
>> >> forwprop case.
>> >> Note also if either of the max are used more than just in this
>> >> simplification, it could increase the lifetime of @0, maybe you need
>> >> to add :s to the max expressions.
>> >>
>> >
>> > Thanks for the feedback. I'm not quite sure what a vector test case would 
>> > look like for this. Could I get some guidance on that?
>>
>> Yes this should produce the pattern at forwprop1 time (with the C++
>> front-end, the C front-end does not support vector selects):
>> typedef int __attribute__((vector_size(4*sizeof(int vint;
>>
>> vint foo(vint x) {
>> vint t = (x >= 0 ? x : 0) ;
>> vint xx = -x;
>> vint t1 =  (xx >= 0 ? xx : 0);
>> return t + t1;
>> }
>>
>> int foo(int x) {
>> int t = (x >= 0 ? x : 0) ;
>> int xx = -x;
>> int t1 =  (xx >= 0 ? xx : 0);
>> return t + t1;
>> }
>>
>> Thanks,
>> Andrew Pinski
>>
>
> Thanks for the help. I'm still having trouble with the vector test, though. 
> When I try to compile, I get an error saying "used vector type where scalar 
> is required", referring to the max expressions. How do I use the max 
> expression with two vectors as the inputs?

As I mentioned it only works with the C++ front-end :). The C
front-end does not support ?: with vectors types.

Thanks,
Andrew Pinski

>
> Thanks
> -Sam
>>
>>
>> >
>> > Thanks
>> > -Sam
>> >
>> >>
>> >> Thanks,
>> >> Andrew
>> >>
>> >> >
>> >> > Tests are also included to be added to the testsuite.
>> >> >
>> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >> >
>> >> > PR tree-optimization/94290
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New 
>> >> > simplification.
>> >> > * match.pd (x <= 0 ? -x : 0): New Simplification.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > * gcc.c-torture/execute/pr94290-1.c: New test.
>> >> > * gcc.dg/pr94290-2.c: New test.
>> >> > * gcc.dg/pr94290.c: New test.
>> >> > ---
>> >> >  gcc/match.pd  | 15 ++
>> >> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
>> >> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
>> >> >  gcc/testsuite/gcc.dg/pr94290.c| 46 +++
>> >> >  4 files changed, 92 insertions(+)
>> >> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
>> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
>> >> >
>> >> > diff --git a/gcc/match.pd b/gcc/match.pd
>> >> > index 45aefd96688..55ca79d7ac9 100644
>> >> > --- a/gcc/match.pd
>> >> > +++ b/gcc/match.pd
>> >> > @@ -7848,3 +7848,18 @@ and,
>> >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
>> >> >  (bit_and @0 @1)
>> >> >(cond (le @0 @1) @0 (bit_and @0 @1))
>> >> > +
>> >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
>> >> > +(simplify
>> >> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
>> >> > +  (abs @0))
>> >> > +
>> >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
>> >> > +(simplify
>> >> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
>> >> > +  (abs @0))
>> >> > +
>> >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
>> >> > +(simplify
>> >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
>> >> > + (max (negate @0) @1))
>> >> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c 
>> >> > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> >> > new file mode 100644
>> >> > index 000..93b80d569aa
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> >> > @@ -0,0 +1,16 @@
>> >> > 

Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-14 Thread Sam Feifer via Gcc-patches
On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski  wrote:

> On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer  wrote:
> >
> >
> > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski  wrote:
> >>
> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
> >>  wrote:
> >> >
> >> > This patch is intended to fix a missed optimization in match.pd. It
> optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to
> write a second simplification in match.pd to handle the commutative
> property as the match was not ocurring otherwise. Additionally, the pattern
> (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the
> other simplification rule.
> >>
> >> You could use :c for the commutative property instead and that should
> >> simplify things.
> >> That is:
> >>
> >> (simplify
> >>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
> >>   (abs @0))
> >>
> >> Also since integer_zerop works on vectors, it seems like you should
> >> add a testcase or two for the vector case.
> >> Also would be useful if you write a testcase that uses different
> >> statements rather than one big one so it gets exercised in the
> >> forwprop case.
> >> Note also if either of the max are used more than just in this
> >> simplification, it could increase the lifetime of @0, maybe you need
> >> to add :s to the max expressions.
> >>
> >
> > Thanks for the feedback. I'm not quite sure what a vector test case
> would look like for this. Could I get some guidance on that?
>
> Yes this should produce the pattern at forwprop1 time (with the C++
> front-end, the C front-end does not support vector selects):
> typedef int __attribute__((vector_size(4*sizeof(int vint;
>
> vint foo(vint x) {
> vint t = (x >= 0 ? x : 0) ;
> vint xx = -x;
> vint t1 =  (xx >= 0 ? xx : 0);
> return t + t1;
> }
>
> int foo(int x) {
> int t = (x >= 0 ? x : 0) ;
> int xx = -x;
> int t1 =  (xx >= 0 ? xx : 0);
> return t + t1;
> }
>
> Thanks,
> Andrew Pinski
>
>
Thanks for the help. I'm still having trouble with the vector test, though.
When I try to compile, I get an error saying "used vector type where scalar
is required", referring to the max expressions. How do I use the max
expression with two vectors as the inputs?

Thanks
-Sam

>
> >
> > Thanks
> > -Sam
> >
> >>
> >> Thanks,
> >> Andrew
> >>
> >> >
> >> > Tests are also included to be added to the testsuite.
> >> >
> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >> >
> >> > PR tree-optimization/94290
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New
> simplification.
> >> > * match.pd (x <= 0 ? -x : 0): New Simplification.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> > * gcc.c-torture/execute/pr94290-1.c: New test.
> >> > * gcc.dg/pr94290-2.c: New test.
> >> > * gcc.dg/pr94290.c: New test.
> >> > ---
> >> >  gcc/match.pd  | 15 ++
> >> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
> >> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
> >> >  gcc/testsuite/gcc.dg/pr94290.c| 46
> +++
> >> >  4 files changed, 92 insertions(+)
> >> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
> >> >
> >> > diff --git a/gcc/match.pd b/gcc/match.pd
> >> > index 45aefd96688..55ca79d7ac9 100644
> >> > --- a/gcc/match.pd
> >> > +++ b/gcc/match.pd
> >> > @@ -7848,3 +7848,18 @@ and,
> >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> >> >  (bit_and @0 @1)
> >> >(cond (le @0 @1) @0 (bit_and @0 @1))
> >> > +
> >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
> >> > +(simplify
> >> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
> >> > +  (abs @0))
> >> > +
> >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
> >> > +(simplify
> >> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
> >> > +  (abs @0))
> >> > +
> >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
> >> > +(simplify
> >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
> >> > + (max (negate @0) @1))
> >> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >> > new file mode 100644
> >> > index 000..93b80d569aa
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >> > @@ -0,0 +1,16 @@
> >> > +/* PR tree-optimization/94290 */
> >> > +
> >> > +#include "../../gcc.dg/pr94290.c"
> >> > +
> >> > +int main() {
> >> > +
> >> > +if (foo(0) != 0
> >> > +|| foo(-42) != 42
> >> > +|| foo(42) != 42
> >> > +|| baz(-10) != 10
> >> > +|| baz(-10) != 10) {
> >> > +__builtin_abort();
> >> > +}
> >> > +
> >> > +return 0;
> >> > +}

Re: [PATCH] PR target/106278: Keep REG_EQUAL notes consistent during TImode STV.

2022-07-14 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 14, 2022 at 6:58 PM Roger Sayle  wrote:
>
>
> This patch resolves PR target/106278 a regression on x86_64 caused by my
> recent TImode STV improvements.  Now that TImode STV can handle comparisons
> such as "(set (regs:CC) (compare:CC (reg:TI) ...))" the convert_insn method
> sensibly checks that the mode of the SET_DEST is TImode before setting
> it to V1TImode [to avoid V1TImode appearing on the hard reg CC_FLAGS.
>
> Hence the current code looks like:
>
>   if (GET_MODE (dst) == TImode)
> {
>   tmp = find_reg_equal_equiv_note (insn);
>   if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
> PUT_MODE (XEXP (tmp, 0), V1TImode);
>   PUT_MODE (dst, V1TImode);
>   fix_debug_reg_uses (dst);
> }
>   break;
>
> which checks GET_MODE (dst) before calling PUT_MODE, and when a
> change is made updating the REG_EQUAL_NOTE tmp if it exists.
>
> The logical flaw (oversight) is that due to RTL sharing, the destination
> of this set may already have been updated to V1TImode, as this chain is
> being converted, but we still need to update any REG_EQUAL_NOTE that
> still has TImode.  Hence the correct code is actually:
>
>
>   if (GET_MODE (dst) == TImode)
> {
>   PUT_MODE (dst, V1TImode);
>   fix_debug_reg_uses (dst);
> }
>   if (GET_MODE (dst) == V1TImode)
> {
>   tmp = find_reg_equal_equiv_note (insn);
>   if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
> PUT_MODE (XEXP (tmp, 0), V1TImode);
> }
>   break;
>
> While fixing this behavior, I noticed I had some indentation whitespace
> issues and some vestigial dead code in this function/method that I've
> taken the liberty of cleaning up (as obvious) in this patch.

While it is desired to fix cosmetic issues, those kinds of changes
unnecessarily make review of the patch more difficult.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

OK, but please do not mix cleanups with the functional changes.
Cleanups should be separated to a follow-up patch.

Thanks,
Uros.

>
>
> 2022-07-14  Roger Sayle  
>
> gcc/ChangeLog
> PR target/106278
> * config/i386/i386-features.cc (general_scalar_chain::convert_insn):
> Fix indentation whitespace.
> (timode_scalar_chain::fix_debug_reg_uses): Likewise.
> (timode_scalar_chain::convert_insn): Delete dead code.
> Update TImode REG_EQUAL_NOTE even if the SET_DEST is already V1TI.
> Fix indentation whitespace.
> (convertible_comparison_p): Likewise.
> (timode_scalar_to_vector_candidate_p): Likewise.
>
> gcc/testsuite/ChangeLog
> PR target/106278
> * gcc.dg/pr106278.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] jit: Make recording::memento non-copyable

2022-07-14 Thread David Malcolm via Gcc-patches
On Thu, 2022-07-14 at 19:44 +0100, Jonathan Wakely wrote:
> These polymoprhic types don't appear to be copied anywhere. Rather
> than
> trying to reason about what it means to copy a polymoprhic base
> without
> copying the derived part, disable copies. This also avoids a
> potential
> double-free if a recorindg::string object does somehow get copied (it
> owns a pointer to dynamically allocated memory, but the implicit copy
> constructor will just make shallow copies).
> 
> Tested x86_64-linux with --enable-languages=c,c++,jit --enable-host-
> shared
> 
> OK for trunk?

[CCing jit mailing list]

Thanks; OK for trunk.

> 
> -- >8 --
> 
> gcc/jit/ChangeLog:
> 
> * jit-recording.h (recording::memento): Define copy
> constructor
> and copy assignment operator as deleted.
> (recording::string): Likewise.
> (recording::string::c_str): Add const qualifier.
> ---
>  gcc/jit/jit-recording.h | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
> index 0dfb42f2676..8610ea988bd 100644
> --- a/gcc/jit/jit-recording.h
> +++ b/gcc/jit/jit-recording.h
> @@ -405,6 +405,9 @@ public:
>    virtual void write_reproducer (reproducer ) = 0;
>    virtual location *dyn_cast_location () { return NULL; }
>  
> +  memento (const memento&) = delete;
> +  memento& operator= (const memento&) = delete;
> +
>  protected:
>    memento (context *ctxt)
>    : m_ctxt (ctxt),
> @@ -436,13 +439,16 @@ public:
>    string (context *ctxt, const char *text, bool escaped);
>    ~string ();
>  
> -  const char *c_str () { return m_buffer; }
> +  const char *c_str () const { return m_buffer; }
>  
>    static string * from_printf (context *ctxt, const char *fmt, ...)
>  GNU_PRINTF(2, 3);
>  
>    void replay_into (replayer *) final override {}
>  
> +  string (const string&) = delete;
> +  string& operator= (const string&) = delete;
> +
>  private:
>    string * make_debug_string () final override;
>    void write_reproducer (reproducer ) final override;




[PATCH] jit: Make recording::memento non-copyable

2022-07-14 Thread Jonathan Wakely via Gcc-patches
These polymoprhic types don't appear to be copied anywhere. Rather than
trying to reason about what it means to copy a polymoprhic base without
copying the derived part, disable copies. This also avoids a potential
double-free if a recorindg::string object does somehow get copied (it
owns a pointer to dynamically allocated memory, but the implicit copy
constructor will just make shallow copies).

Tested x86_64-linux with --enable-languages=c,c++,jit --enable-host-shared

OK for trunk?

-- >8 --

gcc/jit/ChangeLog:

* jit-recording.h (recording::memento): Define copy constructor
and copy assignment operator as deleted.
(recording::string): Likewise.
(recording::string::c_str): Add const qualifier.
---
 gcc/jit/jit-recording.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
index 0dfb42f2676..8610ea988bd 100644
--- a/gcc/jit/jit-recording.h
+++ b/gcc/jit/jit-recording.h
@@ -405,6 +405,9 @@ public:
   virtual void write_reproducer (reproducer ) = 0;
   virtual location *dyn_cast_location () { return NULL; }
 
+  memento (const memento&) = delete;
+  memento& operator= (const memento&) = delete;
+
 protected:
   memento (context *ctxt)
   : m_ctxt (ctxt),
@@ -436,13 +439,16 @@ public:
   string (context *ctxt, const char *text, bool escaped);
   ~string ();
 
-  const char *c_str () { return m_buffer; }
+  const char *c_str () const { return m_buffer; }
 
   static string * from_printf (context *ctxt, const char *fmt, ...)
 GNU_PRINTF(2, 3);
 
   void replay_into (replayer *) final override {}
 
+  string (const string&) = delete;
+  string& operator= (const string&) = delete;
+
 private:
   string * make_debug_string () final override;
   void write_reproducer (reproducer ) final override;
-- 
2.36.1



Re: [GCC 11 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Michael Meissner via Gcc-patches
Back port patch (changing .cc to .c) from trunk to GCC 11 committed.

| From 3118d0856b030fe491a170354fed2df570df199f Mon Sep 17 00:00:00 2001
| From: Michael Meissner 
| Date: Thu, 14 Jul 2022 14:03:37 -0400
| Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for 
block copies.

Testing has found that using load and store vector pair for block copies
can result in a slow down on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  

gcc/

* config/rs6000/rs6000.c (rs6000_option_override_internal): Do
not generate block copies with vector pair instructions if we are
tuning for power10.  Back port from master branch.
---
 gcc/config/rs6000/rs6000.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c89b45922f..73f90f134f8 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4149,7 +4149,10 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
 {
-  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+  /* Do not generate lxvp and stxvp on power10 since there are some
+performance issues.  */
+  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
+ && rs6000_tune != PROCESSOR_POWER10)
rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
   else
rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC] RISC-V: Add support for RV64E/lp64e

2022-07-14 Thread Palmer Dabbelt

On Tue, 12 Jul 2022 22:09:53 PDT (-0700), Palmer Dabbelt wrote:

gcc/ChangeLog

* config.gcc (riscv): Accept rv64e and lp64e.
* config/riscv/arch-canonicalize: Likewise.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Likewise.
* config/riscv/riscv-opts.h (riscv_abi_type): Likewise.
* config/riscv/riscv.cc (riscv_option_override): Likewise
* config/riscv/riscv.h (UNITS_PER_FP_ARG): Likewise.
(STACK_BOUNDARY): Likewise.
(ABI_STACK_BOUNDARY): Likewise.
(MAX_ARGS_IN_REGISTERS): Likewise.
(ABI_SPEC): Likewise.
* config/riscv/riscv.opt (abi_type): Likewise.
* doc/invoke.texi (RISC-V) <-mabi>: Likewise.
---
This is all still in flight, but evidently RV64E exists.  I haven't
tested this at all, but given that we don't even have the ABI docs lined
up yet it's likely a bit away from being mergable.
---
 gcc/config.gcc |  8 +---
 gcc/config/riscv/arch-canonicalize |  2 +-
 gcc/config/riscv/riscv-c.cc|  1 +
 gcc/config/riscv/riscv-opts.h  |  1 +
 gcc/config/riscv/riscv.cc  |  6 --
 gcc/config/riscv/riscv.h   | 11 +++
 gcc/config/riscv/riscv.opt |  3 +++
 gcc/doc/invoke.texi|  5 +++--
 8 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4e3b15bb5e9..4617ecb8d9b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4637,7 +4637,7 @@ case "${target}" in

# Infer arch from --with-arch, --target, and --with-abi.
case "${with_arch}" in
-   rv32e* | rv32i* | rv32g* | rv64i* | rv64g*)
+   rv32e* | rv32i* | rv32g* | rv64e* | rv64i* | rv64g*)
# OK.
;;
"")
@@ -4645,12 +4645,13 @@ case "${target}" in
case "${with_abi}" in
ilp32e) with_arch="rv32e" ;;
ilp32 | ilp32f | ilp32d) with_arch="rv32gc" ;;
+   lp64e) with_arch="rv64e" ;;
lp64 | lp64f | lp64d) with_arch="rv64gc" ;;
*) with_arch="rv${xlen}gc" ;;
esac
;;
*)
-   echo "--with-arch=${with_arch} is not supported.  The argument must 
begin with rv32e, rv32i, rv32g, rv64i, or rv64g." 1>&2
+   echo "--with-arch=${with_arch} is not supported.  The argument must 
begin with rv32e, rv32i, rv32g, rv64e, rv64i, or rv64g." 1>&2
exit 1
;;
esac
@@ -4672,6 +4673,7 @@ case "${target}" in
rv32e*) with_abi=ilp32e ;;
rv32*) with_abi=ilp32 ;;
rv64*d* | rv64g*) with_abi=lp64d ;;
+   rv64e*) with_abi=lp64e ;;
rv64*) with_abi=lp64 ;;
esac
;;
@@ -4687,7 +4689,7 @@ case "${target}" in
ilp32,rv32* | ilp32e,rv32e* \
| ilp32f,rv32*f* | ilp32f,rv32g* \
| ilp32d,rv32*d* | ilp32d,rv32g* \
-   | lp64,rv64* \
+   | lp64,rv64* | lp64e,rv64e* \
| lp64f,rv64*f* | lp64f,rv64g* \
| lp64d,rv64*d* | lp64d,rv64g*)
;;
diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index fd7651ac491..8db3e88ddd7 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -71,7 +71,7 @@ def arch_canonicalize(arch, isa_spec):
   new_arch = ""
   extra_long_ext = []
   std_exts = []
-  if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
+  if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64e', 'rv64i', 'rv64g']:
 new_arch = arch[:5].replace("g", "i")
 if arch[:5] in ['rv32g', 'rv64g']:
   std_exts = ['m', 'a', 'f', 'd']
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index eb7ef09297e..4614dc6b6d9 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -67,6 +67,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
   switch (riscv_abi)
 {
 case ABI_ILP32E:
+case ABI_LP64E:
   builtin_define ("__riscv_abi_rve");
   gcc_fallthrough ();

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1e153b3a6e7..70fe708cbae 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -27,6 +27,7 @@ enum riscv_abi_type {
   ABI_ILP32F,
   ABI_ILP32D,
   ABI_LP64,
+  ABI_LP64E,
   ABI_LP64F,
   ABI_LP64D
 };
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2e83ca07394..51b7195c17b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5047,8 +5047,10 @@ riscv_option_override (void)
 error ("requested ABI requires %<-march%> to subsume the %qc extension",
   

Re: [PATCH] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]

2022-07-14 Thread Marek Polacek via Gcc-patches
On Tue, Jul 12, 2022 at 04:15:00PM -0400, Jason Merrill wrote:
> On 7/12/22 16:10, Jason Merrill wrote:
> > On 7/8/22 13:41, Marek Polacek wrote:
> > > This patch implements C++23 P2255R2, which adds two new type traits to
> > > detect reference binding to a temporary.  They can be used to detect code
> > > like
> > > 
> > >    std::tuple t("meow");
> > > 
> > > which is incorrect because it always creates a dangling reference,
> > > because
> > > the std::string temporary is created inside the selected constructor of
> > > std::tuple, and not outside it.
> > > 
> > > There are two new compiler builtins,
> > > __reference_constructs_from_temporary
> > > and __reference_converts_from_temporary.  The former is used to simulate
> > > direct- and the latter copy-initialization context.  But I had a
> > > hard time
> > > finding a test where there's actually a difference.  Under DR 2267, both
> > > of these are invalid:
> > > 
> > >    struct A { } a;
> > >    struct B { explicit B(const A&); };
> > >    const B {a};
> > >    const B (a);
> > > 
> > > so I had to peruse [over.match.ref], and eventually realized that the
> > > difference can be seen here:
> > > 
> > >    struct G {
> > >  operator int(); // #1
> > >  explicit operator int&&(); // #2
> > >    };
> > > 
> > > int&& r1(G{}); // use #2 (no temporary)
> > > int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)
> > > 
> > > The implementation itself was rather straightforward because we already
> > > have conv_binds_ref_to_prvalue.  The main function here is
> > > reference_from_temporary.  The renaming to ref_conv_binds_to_temporary_p
> > > is because previously the function didn't distinguish between an invalid
> > > conversion and one that binds to a prvalue.
> > > 
> > > The patch also adds the relevant class and variable templates to
> > > .
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > PR c++/104477
> > > 
> > > gcc/c-family/ChangeLog:
> > > 
> > > * c-common.cc (c_common_reswords): Add
> > > __reference_constructs_from_temporary and
> > > __reference_converts_from_temporary.
> > > * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
> > > RID_REF_CONVERTS_FROM_TEMPORARY.
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > > * call.cc (ref_conv_binds_directly_p): Rename to ...
> > > (ref_conv_binds_to_temporary_p): ... this.  Add a new bool
> > > parameter.  Return true only if the conversion is valid and
> > > conv_binds_ref_to_prvalue returns true.
> > > * constraint.cc (diagnose_trait_expr): Handle
> > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
> > > CPTK_REF_CONVERTS_FROM_TEMPORARY.
> > > * cp-tree.h (enum cp_trait_kind): Add
> > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
> > > and CPTK_REF_CONVERTS_FROM_TEMPORARY.
> > > (ref_conv_binds_directly_p): Rename to ...
> > > (ref_conv_binds_to_temporary_p): ... this.
> > > (reference_from_temporary): Declare.
> > > * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
> > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
> > > CPTK_REF_CONVERTS_FROM_TEMPORARY.
> > > * method.cc (reference_from_temporary): New.
> > > * parser.cc (cp_parser_primary_expression): Handle
> > > RID_REF_CONSTRUCTS_FROM_TEMPORARY and
> > > RID_REF_CONVERTS_FROM_TEMPORARY.
> > > (cp_parser_trait_expr): Likewise.
> > > (warn_for_range_copy): Adjust to call ref_conv_binds_to_temporary_p.
> > > * semantics.cc (trait_expr_value): Handle
> > > CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and
> > > CPTK_REF_CONVERTS_FROM_TEMPORARY.
> > > (finish_trait_expr): Likewise.
> > > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > > * include/std/type_traits (reference_constructs_from_temporary,
> > > reference_converts_from_temporary): New class templates.
> > > (reference_constructs_from_temporary_v,
> > > reference_converts_from_temporary_v): New variable templates.
> > > (__cpp_lib_reference_from_temporary): Define for C++23.
> > > * include/std/version (__cpp_lib_reference_from_temporary):
> > > Define for
> > > C++23.
> > > * testsuite/20_util/variable_templates_for_traits.cc: Test
> > > reference_constructs_from_temporary_v and
> > > reference_converts_from_temporary_v.
> > > * testsuite/20_util/reference_from_temporary/value.cc: New test.
> > > * testsuite/20_util/reference_from_temporary/value2.cc: New test.
> > > * testsuite/20_util/reference_from_temporary/version.cc: New test.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > * g++.dg/ext/reference_constructs_from_temporary1.C: New test.
> > > * g++.dg/ext/reference_converts_from_temporary1.C: New test.
> > > ---
> > >   gcc/c-family/c-common.cc  |   4 +
> > >   gcc/c-family/c-common.h   |   2 +
> > >   gcc/cp/call.cc    |  14 +-
> > >   gcc/cp/constraint.cc  |   8 +
> > >   

[PATCH] x86: Disable sibcall if indirect_return attribute doesn't match

2022-07-14 Thread H.J. Lu via Gcc-patches
When shadow stack is enabled, function with indirect_return attribute
may return via indirect jump.  In this case, we need to disable sibcall
if caller doesn't have indirect_return attribute and indirect branch
tracking is enabled since compiler won't generate ENDBR when calling the
caller.

gcc/

PR target/85620
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if callee has indirect_return attribute and caller
doesn't.

gcc/testsuite/

PR target/85620
* gcc.target/i386/pr85620-2.c: Updated.
* gcc.target/i386/pr85620-5.c: New test.
* gcc.target/i386/pr85620-6.c: Likewise.
* gcc.target/i386/pr85620-7.c: Likewise.
---
 gcc/config/i386/i386.cc   | 10 ++
 gcc/testsuite/gcc.target/i386/pr85620-2.c |  3 ++-
 gcc/testsuite/gcc.target/i386/pr85620-5.c | 13 +
 gcc/testsuite/gcc.target/i386/pr85620-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr85620-7.c | 14 ++
 5 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr85620-7.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3a3c7299eb4..e03f86d4a23 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1024,6 +1024,16 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
 return false;
 }
 
+  /* Disable sibcall if callee has indirect_return attribute and
+ caller doesn't since callee will return to the caller's caller
+ via an indirect jump.  */
+  if (((flag_cf_protection & (CF_RETURN | CF_BRANCH))
+   == (CF_RETURN | CF_BRANCH))
+  && lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (type))
+  && !lookup_attribute ("indirect_return",
+   TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl
+return false;
+
   /* Otherwise okay.  That also includes certain types of indirect calls.  */
   return true;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr85620-2.c 
b/gcc/testsuite/gcc.target/i386/pr85620-2.c
index b2e680fa1fe..14ce0ffd1e1 100644
--- a/gcc/testsuite/gcc.target/i386/pr85620-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr85620-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fcf-protection" } */
-/* { dg-final { scan-assembler-times {\mendbr} 1 } } */
+/* { dg-final { scan-assembler-times {\mendbr} 2 } } */
+/* { dg-final { scan-assembler-not "jmp" } } */
 
 struct ucontext;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr85620-5.c 
b/gcc/testsuite/gcc.target/i386/pr85620-5.c
new file mode 100644
index 000..04537702d09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85620-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection" } */
+/* { dg-final { scan-assembler-not "jmp" } } */
+
+struct ucontext;
+
+extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__));
+
+int
+foo (struct ucontext *oucp)
+{
+  return bar (oucp);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr85620-6.c 
b/gcc/testsuite/gcc.target/i386/pr85620-6.c
new file mode 100644
index 000..0b6a64e8454
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85620-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection" } */
+/* { dg-final { scan-assembler "jmp" } } */
+
+struct ucontext;
+
+extern int bar (struct ucontext *) __attribute__((__indirect_return__));
+
+__attribute__((__indirect_return__))
+int
+foo (struct ucontext *oucp)
+{
+  return bar (oucp);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr85620-7.c 
b/gcc/testsuite/gcc.target/i386/pr85620-7.c
new file mode 100644
index 000..fa62d56decf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85620-7.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcf-protection" } */
+/* { dg-final { scan-assembler "jmp" } } */
+
+struct ucontext;
+
+extern int (*bar) (struct ucontext *) __attribute__((__indirect_return__));
+extern int foo (struct ucontext *) __attribute__((__indirect_return__));
+
+int
+foo (struct ucontext *oucp)
+{
+  return bar (oucp);
+}
-- 
2.36.1



[PATCH v3] c++: Add __reference_con{struc,ver}ts_from_temporary [PR104477]

2022-07-14 Thread Marek Polacek via Gcc-patches
On Tue, Jul 12, 2022 at 04:10:08PM -0400, Jason Merrill wrote:
> On 7/8/22 13:41, Marek Polacek wrote:
> > +bool
> > +reference_from_temporary (tree to, tree from, bool direct_init_p)
> > +{
> > +  /* Check is_reference.  */
> > +  if (!TYPE_REF_P (to))
> > +return false;
> > +  /* Check is_constructible.
> > + ??? This check doesn't seem to be necessary; if T isn't constructible
> > + from U, we won't be able to create a conversion.  */
> > +  if (!is_xible (INIT_EXPR, to, build_tree_list (NULL_TREE, from)))
> > +return false;
> 
> I agree with the comment, did you try leaving this out?  If it stays I'd
> think it needs to consider direct_init_p.

I did, it doesn't break anything, so in this version I dropped the is_xible
call entirely.
 
> > @@ -13811,7 +13821,7 @@ warn_for_range_copy (tree decl, tree expr)
> > if (TYPE_REF_P (type))
> >   {
> > -  if (glvalue_p (expr) && !ref_conv_binds_directly_p (type, expr))
> > +  if (glvalue_p (expr) && ref_conv_binds_to_temporary_p (type, expr))
> > {
> >   auto_diagnostic_group d;
> >   if (warning_at (loc, OPT_Wrange_loop_construct,
> > @@ -13842,7 +13852,7 @@ warn_for_range_copy (tree decl, tree expr)
> > tree rtype = cp_build_reference_type (type, /*rval*/false);
> > /* If we could initialize the reference directly, it wouldn't involve 
> > any
> >copies.  */
> > -  if (!ref_conv_binds_directly_p (rtype, expr))
> > +  if (ref_conv_binds_to_temporary_p (rtype, expr))
> >   return;
> 
> I think this case wants the old handling of invalid conversions you
> mentioned in your intro; we don't want to suggest changing to a reference if
> that's ill-formed.

Right.  I've changed the return type to 'tristate', so that we can properly
distinguish between true/false/dunno.
 
> In passing we might change the comment to "If we can initialize a reference
> directly, suggest that to avoid the copy." and move it above the rtype
> declaration.

Done.
 
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary.  They can be used to detect code
like

  std::tuple t("meow");

which is incorrect because it always creates a dangling reference, because
the std::string temporary is created inside the selected constructor of
std::tuple, and not outside it.

There are two new compiler builtins, __reference_constructs_from_temporary
and __reference_converts_from_temporary.  The former is used to simulate
direct- and the latter copy-initialization context.  But I had a hard time
finding a test where there's actually a difference.  Under DR 2267, both
of these are invalid:

  struct A { } a;
  struct B { explicit B(const A&); };
  const B {a};
  const B (a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

  struct G {
operator int(); // #1
explicit operator int&&(); // #2
  };

int&& r1(G{}); // use #2 (no temporary)
int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have the conv_binds_ref_to_prvalue function.  The main function here is
reference_from_temporary.
I've changed the return type of ref_conv_binds_directly to tristate, because
previously the function didn't distinguish between an invalid conversion and
one that binds to a prvalue.  Since it no longer returns a bool, I removed
the _p suffix.

The patch also adds the relevant class and variable templates to .

PR c++/104477

gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Add
__reference_constructs_from_temporary and
__reference_converts_from_temporary.
* c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

* call.cc (ref_conv_binds_directly_p): Rename to ...
(ref_conv_binds_directly): ... this.  Add a new bool parameter.  Change
the return type to tristate.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
* cp-tree.h: Include "tristate.h".
(enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
and CPTK_REF_CONVERTS_FROM_TEMPORARY.
(ref_conv_binds_directly_p): Rename to ...
(ref_conv_binds_directly): ... this.
(reference_from_temporary): Declare.
* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
* method.cc (reference_from_temporary): New.
* parser.cc (cp_parser_primary_expression): Handle
RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.
(cp_parser_trait_expr): Likewise.
(warn_for_range_copy): Adjust to call ref_conv_binds_directly.
* 

Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-14 Thread Andrew Pinski via Gcc-patches
On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer  wrote:
>
>
> On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski  wrote:
>>
>> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
>>  wrote:
>> >
>> > This patch is intended to fix a missed optimization in match.pd. It 
>> > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to 
>> > write a second simplification in match.pd to handle the commutative 
>> > property as the match was not ocurring otherwise. Additionally, the 
>> > pattern (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps 
>> > with the other simplification rule.
>>
>> You could use :c for the commutative property instead and that should
>> simplify things.
>> That is:
>>
>> (simplify
>>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
>>   (abs @0))
>>
>> Also since integer_zerop works on vectors, it seems like you should
>> add a testcase or two for the vector case.
>> Also would be useful if you write a testcase that uses different
>> statements rather than one big one so it gets exercised in the
>> forwprop case.
>> Note also if either of the max are used more than just in this
>> simplification, it could increase the lifetime of @0, maybe you need
>> to add :s to the max expressions.
>>
>
> Thanks for the feedback. I'm not quite sure what a vector test case would 
> look like for this. Could I get some guidance on that?

Yes this should produce the pattern at forwprop1 time (with the C++
front-end, the C front-end does not support vector selects):
typedef int __attribute__((vector_size(4*sizeof(int vint;

vint foo(vint x) {
vint t = (x >= 0 ? x : 0) ;
vint xx = -x;
vint t1 =  (xx >= 0 ? xx : 0);
return t + t1;
}

int foo(int x) {
int t = (x >= 0 ? x : 0) ;
int xx = -x;
int t1 =  (xx >= 0 ? xx : 0);
return t + t1;
}

Thanks,
Andrew Pinski


>
> Thanks
> -Sam
>
>>
>> Thanks,
>> Andrew
>>
>> >
>> > Tests are also included to be added to the testsuite.
>> >
>> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >
>> > PR tree-optimization/94290
>> >
>> > gcc/ChangeLog:
>> >
>> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New 
>> > simplification.
>> > * match.pd (x <= 0 ? -x : 0): New Simplification.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.c-torture/execute/pr94290-1.c: New test.
>> > * gcc.dg/pr94290-2.c: New test.
>> > * gcc.dg/pr94290.c: New test.
>> > ---
>> >  gcc/match.pd  | 15 ++
>> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
>> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
>> >  gcc/testsuite/gcc.dg/pr94290.c| 46 +++
>> >  4 files changed, 92 insertions(+)
>> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
>> >
>> > diff --git a/gcc/match.pd b/gcc/match.pd
>> > index 45aefd96688..55ca79d7ac9 100644
>> > --- a/gcc/match.pd
>> > +++ b/gcc/match.pd
>> > @@ -7848,3 +7848,18 @@ and,
>> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
>> >  (bit_and @0 @1)
>> >(cond (le @0 @1) @0 (bit_and @0 @1))
>> > +
>> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
>> > +(simplify
>> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
>> > +  (abs @0))
>> > +
>> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
>> > +(simplify
>> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
>> > +  (abs @0))
>> > +
>> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
>> > +(simplify
>> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
>> > + (max (negate @0) @1))
>> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c 
>> > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> > new file mode 100644
>> > index 000..93b80d569aa
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> > @@ -0,0 +1,16 @@
>> > +/* PR tree-optimization/94290 */
>> > +
>> > +#include "../../gcc.dg/pr94290.c"
>> > +
>> > +int main() {
>> > +
>> > +if (foo(0) != 0
>> > +|| foo(-42) != 42
>> > +|| foo(42) != 42
>> > +|| baz(-10) != 10
>> > +|| baz(-10) != 10) {
>> > +__builtin_abort();
>> > +}
>> > +
>> > +return 0;
>> > +}
>> > diff --git a/gcc/testsuite/gcc.dg/pr94290-2.c 
>> > b/gcc/testsuite/gcc.dg/pr94290-2.c
>> > new file mode 100644
>> > index 000..ea6e55755f5
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.dg/pr94290-2.c
>> > @@ -0,0 +1,15 @@
>> > +/* PR tree-optimization/94290 */
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> > +
>> > +/* Form from PR.  */
>> > +__attribute__((noipa)) unsigned int foo(int x) {
>> > +return x <= 0 ? -x : 0;
>> > +}
>> > +
>> > +/* Changed order.  */
>> > +__attribute__((noipa)) 

[PATCH] PR target/106278: Keep REG_EQUAL notes consistent during TImode STV.

2022-07-14 Thread Roger Sayle

This patch resolves PR target/106278 a regression on x86_64 caused by my
recent TImode STV improvements.  Now that TImode STV can handle comparisons
such as "(set (regs:CC) (compare:CC (reg:TI) ...))" the convert_insn method
sensibly checks that the mode of the SET_DEST is TImode before setting
it to V1TImode [to avoid V1TImode appearing on the hard reg CC_FLAGS.

Hence the current code looks like:

  if (GET_MODE (dst) == TImode)
{
  tmp = find_reg_equal_equiv_note (insn);
  if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
PUT_MODE (XEXP (tmp, 0), V1TImode);
  PUT_MODE (dst, V1TImode);
  fix_debug_reg_uses (dst);
}
  break;

which checks GET_MODE (dst) before calling PUT_MODE, and when a
change is made updating the REG_EQUAL_NOTE tmp if it exists.

The logical flaw (oversight) is that due to RTL sharing, the destination
of this set may already have been updated to V1TImode, as this chain is
being converted, but we still need to update any REG_EQUAL_NOTE that
still has TImode.  Hence the correct code is actually:


  if (GET_MODE (dst) == TImode)
{
  PUT_MODE (dst, V1TImode);
  fix_debug_reg_uses (dst);
}
  if (GET_MODE (dst) == V1TImode)
{
  tmp = find_reg_equal_equiv_note (insn);
  if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
PUT_MODE (XEXP (tmp, 0), V1TImode);
}
  break;

While fixing this behavior, I noticed I had some indentation whitespace
issues and some vestigial dead code in this function/method that I've
taken the liberty of cleaning up (as obvious) in this patch.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-07-14  Roger Sayle  

gcc/ChangeLog
PR target/106278
* config/i386/i386-features.cc (general_scalar_chain::convert_insn):
Fix indentation whitespace.
(timode_scalar_chain::fix_debug_reg_uses): Likewise.
(timode_scalar_chain::convert_insn): Delete dead code.
Update TImode REG_EQUAL_NOTE even if the SET_DEST is already V1TI.
Fix indentation whitespace.
(convertible_comparison_p): Likewise.
(timode_scalar_to_vector_candidate_p): Likewise.

gcc/testsuite/ChangeLog
PR target/106278
* gcc.dg/pr106278.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index f1b03c3..813b203 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1054,13 +1054,13 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
   else if (REG_P (dst) && GET_MODE (dst) == smode)
 {
   /* Replace the definition with a SUBREG to the definition we
- use inside the chain.  */
+use inside the chain.  */
   rtx *vdef = defs_map.get (dst);
   if (vdef)
dst = *vdef;
   dst = gen_rtx_SUBREG (vmode, dst, 0);
   /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
- is a non-REG_P.  So kill those off.  */
+is a non-REG_P.  So kill those off.  */
   rtx note = find_reg_equal_equiv_note (insn);
   if (note)
remove_note (insn, note);
@@ -1246,7 +1246,7 @@ timode_scalar_chain::fix_debug_reg_uses (rtx reg)
 {
   rtx_insn *insn = DF_REF_INSN (ref);
   /* Make sure the next ref is for a different instruction,
- so that we're not affected by the rescan.  */
+so that we're not affected by the rescan.  */
   next = DF_REF_NEXT_REG (ref);
   while (next && DF_REF_INSN (next) == insn)
next = DF_REF_NEXT_REG (next);
@@ -1336,21 +1336,19 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
   rtx dst = SET_DEST (def_set);
   rtx tmp;
 
-  if (MEM_P (dst) && !REG_P (src))
-{
-  /* There are no scalar integer instructions and therefore
-temporary register usage is required.  */
-}
   switch (GET_CODE (dst))
 {
 case REG:
   if (GET_MODE (dst) == TImode)
{
+ PUT_MODE (dst, V1TImode);
+ fix_debug_reg_uses (dst);
+   }
+  if (GET_MODE (dst) == V1TImode)
+   {
  tmp = find_reg_equal_equiv_note (insn);
  if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
PUT_MODE (XEXP (tmp, 0), V1TImode);
- PUT_MODE (dst, V1TImode);
- fix_debug_reg_uses (dst);
}
   break;
 case MEM:
@@ -1410,8 +1408,8 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
   if (MEM_P (dst))
{
  tmp = gen_reg_rtx (V1TImode);
-  emit_insn_before (gen_rtx_SET (tmp, src), insn);
-  src = tmp;
+ emit_insn_before (gen_rtx_SET (tmp, src), insn);
+ src = tmp;
}
   break;
 
@@ -1434,8 +1432,8 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
   if (MEM_P (dst))
{
  

Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Michael Meissner via Gcc-patches
I have applied the patch to GCC 12.

| From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
| From: Michael Meissner 
| Date: Thu, 14 Jul 2022 11:16:08 -0400
| Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for 
block copies.

Testing has found that using load and store vector pair for block copies
can result in a slow down on power10.  This patch disables using the
vector pair instructions for block copies if we are tuning for power10.

2022-06-11   Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
not generate block copies with vector pair instructions if we are
tuning for power10.  Back port from master branch.
---
 gcc/config/rs6000/rs6000.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4030864aa1a..040487bd277 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
 {
-  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+  /* Do not generate lxvp and stxvp on power10 since there are some
+performance issues.  */
+  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
+ && rs6000_tune != PROCESSOR_POWER10)
rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
   else
rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes

2022-07-14 Thread Jose E. Marchesi via Gcc-patches


Hi Yonghong.

> On 7/7/22 1:24 PM, Jose E. Marchesi wrote:
>> Hi Yonghong.
>> 
>>> On 6/21/22 9:12 AM, Jose E. Marchesi wrote:

> On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>> Hi Yonghong.
>>
>>> On 6/15/22 1:57 PM, David Faust wrote:

 On 6/14/22 22:53, Yonghong Song wrote:
>
>
> On 6/7/22 2:43 PM, David Faust wrote:
>> Hello,
>>
>> This patch series adds support for:
>>
>> - Two new C-language-level attributes that allow to associate (to 
>> "annotate" or
>>to "tag") particular declarations and types with arbitrary 
>> strings. As
>>explained below, this is intended to be used to, for example, 
>> characterize
>>certain pointer types.
>>
>> - The conveyance of that information in the DWARF output in the form 
>> of a new
>>DIE: DW_TAG_GNU_annotation.
>>
>> - The conveyance of that information in the BTF output in the form 
>> of two new
>>kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>
>> All of these facilities are being added to the eBPF ecosystem, and 
>> support for
>> them exists in some form in LLVM.
>>
>> Purpose
>> ===
>>
>> 1)  Addition of C-family language constructs (attributes) to specify 
>> free-text
>>  tags on certain language elements, such as struct fields.
>>
>>  The purpose of these annotations is to provide additional 
>> information about
>>  types, variables, and function parameters of interest to 
>> the kernel. A
>>  driving use case is to tag pointer types within the linux 
>> kernel and eBPF
>>  programs with additional semantic information, such as 
>> '__user' or '__rcu'.
>>
>>  For example, consider the linux kernel function do_execve 
>> with the
>>  following declaration:
>>
>>static int do_execve(struct filename *filename,
>>   const char __user *const __user *__argv,
>>   const char __user *const __user *__envp);
>>
>>  Here, __user could be defined with these annotations to 
>> record semantic
>>  information about the pointer parameters (e.g., they are 
>> user-provided) in
>>  DWARF and BTF information. Other kernel facilites such as 
>> the eBPF verifier
>>  can read the tags and make use of the information.
>>
>> 2)  Conveying the tags in the generated DWARF debug info.
>>
>>  The main motivation for emitting the tags in DWARF is that 
>> the Linux kernel
>>  generates its BTF information via pahole, using DWARF as a 
>> source:
>>
>>  ++  BTF  BTF   +--+
>>  | pahole |---> vmlinux.btf --->| verifier |
>>  ++ +--+
>>  ^^
>>  ||
>>DWARF |BTF |
>>  ||
>>   vmlinux  +-+
>>   module1.ko   | BPF program |
>>   module2.ko   +-+
>> ...
>>
>>  This is because:
>>
>>  a)  Unlike GCC, LLVM will only generate BTF for BPF 
>> programs.
>>
>>  b)  GCC can generate BTF for whatever target with -gbtf, 
>> but there is no
>>  support for linking/deduplicating BTF in the linker.
>>
>>  In the scenario above, the verifier needs access to the 
>> pointer tags of
>>  both the kernel types/declarations (conveyed in the DWARF 
>> and translated
>>  to BTF by pahole) and those of the BPF program (available 
>> directly in BTF).
>>
>>  Another motivation for having the tag information in DWARF, 
>> unrelated to
>>  BPF and BTF, is that the drgn project (another DWARF 
>> consumer) also wants
>>  to benefit from these tags in order to differentiate 
>> between different
>>  kinds of pointers in the kernel.
>>
>> 3)  Conveying the tags in the generated BTF debug info.

Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-14 Thread Sam Feifer via Gcc-patches
On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski  wrote:

> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
>  wrote:
> >
> > This patch is intended to fix a missed optimization in match.pd. It
> optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to
> write a second simplification in match.pd to handle the commutative
> property as the match was not ocurring otherwise. Additionally, the pattern
> (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the
> other simplification rule.
>
> You could use :c for the commutative property instead and that should
> simplify things.
> That is:
>
> (simplify
>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
>   (abs @0))
>
> Also since integer_zerop works on vectors, it seems like you should
> add a testcase or two for the vector case.
> Also would be useful if you write a testcase that uses different
> statements rather than one big one so it gets exercised in the
> forwprop case.
> Note also if either of the max are used more than just in this
> simplification, it could increase the lifetime of @0, maybe you need
> to add :s to the max expressions.
>
>
Thanks for the feedback. I'm not quite sure what a vector test case would
look like for this. Could I get some guidance on that?

Thanks
-Sam


> Thanks,
> Andrew
>
> >
> > Tests are also included to be added to the testsuite.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > PR tree-optimization/94290
> >
> > gcc/ChangeLog:
> >
> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New
> simplification.
> > * match.pd (x <= 0 ? -x : 0): New Simplification.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/pr94290-1.c: New test.
> > * gcc.dg/pr94290-2.c: New test.
> > * gcc.dg/pr94290.c: New test.
> > ---
> >  gcc/match.pd  | 15 ++
> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
> >  gcc/testsuite/gcc.dg/pr94290.c| 46 +++
> >  4 files changed, 92 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 45aefd96688..55ca79d7ac9 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7848,3 +7848,18 @@ and,
> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> >  (bit_and @0 @1)
> >(cond (le @0 @1) @0 (bit_and @0 @1))
> > +
> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
> > +(simplify
> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
> > +  (abs @0))
> > +
> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
> > +(simplify
> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
> > +  (abs @0))
> > +
> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
> > +(simplify
> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
> > + (max (negate @0) @1))
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> > new file mode 100644
> > index 000..93b80d569aa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> > @@ -0,0 +1,16 @@
> > +/* PR tree-optimization/94290 */
> > +
> > +#include "../../gcc.dg/pr94290.c"
> > +
> > +int main() {
> > +
> > +if (foo(0) != 0
> > +|| foo(-42) != 42
> > +|| foo(42) != 42
> > +|| baz(-10) != 10
> > +|| baz(-10) != 10) {
> > +__builtin_abort();
> > +}
> > +
> > +return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/pr94290-2.c
> b/gcc/testsuite/gcc.dg/pr94290-2.c
> > new file mode 100644
> > index 000..ea6e55755f5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr94290-2.c
> > @@ -0,0 +1,15 @@
> > +/* PR tree-optimization/94290 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* Form from PR.  */
> > +__attribute__((noipa)) unsigned int foo(int x) {
> > +return x <= 0 ? -x : 0;
> > +}
> > +
> > +/* Changed order.  */
> > +__attribute__((noipa)) unsigned int bar(int x) {
> > +return 0 >= x ? -x : 0;
> > +}
> > +
> > +/* { dg-final {scan-tree-dump-times " MAX_EXPR " 2 "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/pr94290.c
> b/gcc/testsuite/gcc.dg/pr94290.c
> > new file mode 100644
> > index 000..47617c36c02
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr94290.c
> > @@ -0,0 +1,46 @@
> > +/* PR tree-optimization/94290 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +
> > +/* Same form as PR.  */
> > +__attribute__((noipa)) unsigned int foo(int x) {
> > +return (x >= 0 ? x : 0) + (x <= 0 ? -x : 0);
> > +}
> > +
> > +/* Signed function.  */
> > 

[PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit

2022-07-14 Thread Marco Falke via Gcc-patches
>From 2d4e7cd1d476a065d824e11045c8dbc049d5f0a0 Mon Sep 17 00:00:00 2001
From: MacroFake 
Date: Thu, 14 Jul 2022 15:26:12 +0200
Subject: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit

The optimizations from commit a54137c88061c7495728fc6b8dfd0474e812b2cb
introduced a clang integer sanitizer error.

Fix this with an explicit static_cast, similar to the fix in commit
074436cf8cdd2a9ce75cadd36deb8301f00e55b9.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum_to_val): Replace
  implicit conversions from int to unsigned char with explicit
  casts.
---
 libstdc++-v3/include/std/charconv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/charconv
b/libstdc++-v3/include/std/charconv
index 218813e4797..bdf23e4a5ad 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -436,7 +436,7 @@ namespace __detail
 __from_chars_alnum_to_val(unsigned char __c)
 {
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
-return __c - '0';
+   return static_cast(__c - '0');
   else
 {
   // This initializer is deliberately made dependent in order to work
-- 
2.35.3
From 2d4e7cd1d476a065d824e11045c8dbc049d5f0a0 Mon Sep 17 00:00:00 2001
From: MacroFake 
Date: Thu, 14 Jul 2022 15:26:12 +0200
Subject: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit

The optimizations from commit a54137c88061c7495728fc6b8dfd0474e812b2cb
introduced a clang integer sanitizer error.

Fix this with an explicit static_cast, similar to the fix in commit
074436cf8cdd2a9ce75cadd36deb8301f00e55b9.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum_to_val): Replace
  implicit conversions from int to unsigned char with explicit
  casts.
---
 libstdc++-v3/include/std/charconv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 218813e4797..bdf23e4a5ad 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -436,7 +436,7 @@ namespace __detail
 __from_chars_alnum_to_val(unsigned char __c)
 {
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
-	return __c - '0';
+   return static_cast(__c - '0');
   else
 	{
 	  // This initializer is deliberately made dependent in order to work
-- 
2.35.3



Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
>  wrote:
>>
>> On Wed, 13 Jul 2022 at 12:22, Richard Biener  
>> wrote:
>> >
>> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
>> >  wrote:
>> > >
>> > > Hi Richard,
>> > > For the following test:
>> > >
>> > > svint32_t f2(int a, int b, int c, int d)
>> > > {
>> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
>> > >   return svld1rq_s32 (svptrue_b8 (), [0]);
>> > > }
>> > >
>> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
>> > > foo.c: In function ‘f2’:
>> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
>> > > 4 | svint32_t f2(int a, int b, int c, int d)
>> > >   |   ^~
>> > > svint32_t
>> > > __Int32x4_t
>> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > > during GIMPLE pass: forwprop
>> > > dump file: foo.c.109t.forwprop2
>> > > foo.c:4:11: internal compiler error: verify_gimple failed
>> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
>> > > ../../gcc/gcc/tree-cfg.cc:5568
>> > > 0xe9371f execute_function_todo
>> > > ../../gcc/gcc/passes.cc:2091
>> > > 0xe93ccb execute_todo
>> > > ../../gcc/gcc/passes.cc:2145
>> > >
>> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we 
>> > > have:
>> > >   int32x4_t v;
>> > >   __Int32x4_t _1;
>> > >   svint32_t _9;
>> > >   vector(4) int _11;
>> > >
>> > >:
>> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
>> > >   v_12 = _1;
>> > >   _11 = v_12;
>> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
>> > >   return _9;
>> > >
>> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
>> > > view_convert_expr,
>> > > and the end result becomes:
>> > >   svint32_t _7;
>> > >   __Int32x4_t _8;
>> > >
>> > > ;;   basic block 2, loop depth 0
>> > > ;;pred:   ENTRY
>> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
>> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
>> > >   return _7;
>> > > ;;succ:   EXIT
>> > >
>> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
>> > > has incompatible types (svint32_t, int32x4_t).
>> > >
>> > > The attached patch disables simplification of VEC_PERM_EXPR
>> > > in simplify_permutation, if lhs and rhs have non compatible types,
>> > > which resolves ICE, but am not sure if it's the correct approach ?
>> >
>> > It for sure papers over the issue.  I think the error happens earlier,
>> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
>> > which is the type of the LHS.  But then you probably run into the
>> > different sizes ICE (VLA vs constant size).  I think for this case you
>> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
>> > selecting the "low" part of the VLA vector.
>> Hi Richard,
>> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
>> represent dup operation
>> from fixed width to VLA vector. I am not sure how folding it to
>> BIT_FIELD_REF will work.
>> Could you please elaborate ?
>>
>> Also, the issue doesn't seem restricted to this case.
>> The following test case also ICE's during forwprop:
>> svint32_t foo()
>> {
>>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), [0]);
>>   return v2;
>> }
>>
>> foo2.c: In function ‘foo’:
>> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
>> 9 | }
>>   | ^
>> svint32_t
>> int32x4_t
>> v2_4 = { 1, 2, 3, 4 };
>>
>> because simplify_permutation folds
>> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
>> into:
>> vector_cst {1, 2, 3, 4}
>>
>> and it complains during verify_gimple_assign_single because we don't
>> support assignment of vector_cst to VLA vector.
>>
>> I guess the issue really is that currently, only VEC_PERM_EXPR
>> supports lhs and rhs
>> to have vector types with differing lengths, and simplifying it to
>> other tree codes, like above,
>> will result in type errors ?
>
> That might be the case - Richard should know.

I don't see anything particularly special about VEC_PERM_EXPR here,
or about the VLA vs. VLS thing.  We would have the same issue trying
to build a 128-bit vector from 2 64-bit vectors.  And there are other
tree codes whose input types are/can be different from their output
types.

So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
of type T needs to be replaced by something of type T.  Whether T has a
constant size or a variable size doesn't matter.

> If so your type check
> is still too late, you should instead recognize that we are permuting
> a VLA vector and then refuse to go any of the non-VEC_PERM generating
> paths - that probably means just allowing the code == VEC_PERM_EXPR
> case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Yeah.  If we're talking about the match.pd code, I think only:

  (if (sel.series_p (0, 1, 0, 1))
   { op0; }
   (if (sel.series_p (0, 1, nelts, 1))
{ op1; }

need a type compatibility check.  For fold_vec_perm I think
we 

Re: [aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const

2022-07-14 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> Hi,
> For following test case:
>
> svint32_t foo()
> {
>   int32x4_t v = (int32x4_t) { 1, 2, 3, 4 };
>   svint32_t v2 = svld1rq_s32 (svptrue_b8(), [0]);
>   return v2;
> }
>
> After applying workaround in forwprop to not simplify VEC_PERM_EXPR in
> simplify_permutation to avoid type error in middle end (or using
> -fno-tree-forwprop)
> as mentioned in:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598390.html
>
> We get following optimized gimple:
>   v2_2 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
>   return v2_2;

Hmm, we really should be able to fold that to a constant.  However…

> However we hit the following ICE during expansion of vec_perm_expr
> because in aarch64_vectorize_vec_perm_const,
> op0 is VECTOR_CST, and we call force_reg (VNx4SI, op0), which is incorrect 
> mode
> for op0. The patch fixes it by using op_mode instead of vmode in calls
> to force_reg
> for op0 and op1.
>
> during RTL pass: expand
> foo2.c: In function ‘foo’:
> foo2.c:8:10: internal compiler error: in emit_move_insn, at expr.cc:4052
> 8 |   return v2;
>   |  ^~
> 0x74789b emit_move_insn(rtx_def*, rtx_def*)
> ../../gcc/gcc/expr.cc:4052
> 0xb8f664 force_reg(machine_mode, rtx_def*)
> ../../gcc/gcc/explow.cc:688
> 0x134182f aarch64_vectorize_vec_perm_const
> ../../gcc/gcc/config/aarch64/aarch64.cc:24132
> 0xe63070 expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*,
> int_vector_builder > const&, machine_mode,
> rtx_def*)
> ../../gcc/gcc/optabs.cc:6254
> 0xbb1569 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
> ../../gcc/gcc/expr.cc:10273
> 0xbb6498 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
> ../../gcc/gcc/expr.cc:10625
> 0xa897dc expand_expr
> ../../gcc/gcc/expr.h:310
> 0xa897dc expand_return
> ../../gcc/gcc/cfgexpand.cc:3809
> 0xa897dc expand_gimple_stmt_1
> ../../gcc/gcc/cfgexpand.cc:3918
> 0xa897dc expand_gimple_stmt
> ../../gcc/gcc/cfgexpand.cc:4044
> 0xa8f238 expand_gimple_basic_block
> ../../gcc/gcc/cfgexpand.cc:6096
> 0xa91187 execute
> ../../gcc/gcc/cfgexpand.cc:6822
>
> Is the patch OK to commit after bootstrap+test on aarch64-linux-gnu ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 25f4cbb466d..303814b8cca 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24129,11 +24129,11 @@ aarch64_vectorize_vec_perm_const (machine_mode 
> vmode, machine_mode op_mode,
>d.op_mode = op_mode;
>d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode);
>d.target = target;
> -  d.op0 = op0 ? force_reg (vmode, op0) : NULL_RTX;
> +  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
>if (op0 == op1)
>  d.op1 = d.op0;
>else
> -d.op1 = op1 ? force_reg (vmode, op1) : NULL_RTX;
> +d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;
>d.testing_p = !target;
>  
>if (!d.testing_p)

…yes, this is OK, thanks.

Richard


Re: [PATCH] lto-plugin: use -pthread only for detected targets

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 1:13 PM Martin Liška  wrote:
>
> On 7/14/22 12:10, Richard Biener wrote:
> > On Thu, Jul 14, 2022 at 12:03 PM Martin Liška  wrote:
> >>
> >> On 7/13/22 14:15, Richard Biener wrote:
> >>> Didn't we have it that way and not work?  IIRC LDFLAGS is only
> >>> used during configure link tests and _not_ substituted?
> >>
> >> You are right.
> >>
> >> There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly
> >> in Makefile.am.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > I'm not really an expert on build issues but I'd have added
> > some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it
> > and done
> >
> > AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@
> >
> > or figure where we set ac_lto_plugin_ldflags and amend that during
> > configure instead:
> >
> > # Need -Wc to get it through libtool.
> > if test "x$have_static_libgcc" = xyes; then
> >ac_lto_plugin_ldflags="-Wc,-static-libgcc"
> > fi
> > AC_SUBST(ac_lto_plugin_ldflags)
>
> All right, so something like this?

LGTM with the changelog fixed

> Martin
>
> >
> >
> >> Thanks,
> >> Martin


Re: [PATCH] lto-plugin: use -pthread only for detected targets

2022-07-14 Thread Martin Liška
On 7/14/22 12:10, Richard Biener wrote:
> On Thu, Jul 14, 2022 at 12:03 PM Martin Liška  wrote:
>>
>> On 7/13/22 14:15, Richard Biener wrote:
>>> Didn't we have it that way and not work?  IIRC LDFLAGS is only
>>> used during configure link tests and _not_ substituted?
>>
>> You are right.
>>
>> There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly
>> in Makefile.am.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> I'm not really an expert on build issues but I'd have added
> some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it
> and done
> 
> AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@
> 
> or figure where we set ac_lto_plugin_ldflags and amend that during
> configure instead:
> 
> # Need -Wc to get it through libtool.
> if test "x$have_static_libgcc" = xyes; then
>ac_lto_plugin_ldflags="-Wc,-static-libgcc"
> fi
> AC_SUBST(ac_lto_plugin_ldflags)

All right, so something like this?

Martin

> 
> 
>> Thanks,
>> Martin
From 248469bd1b35d2c87d751b1adbc07c784ebd4830 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 14 Jul 2022 09:51:33 +0200
Subject: [PATCH] lto-plugin: use -pthread only for detected targets

Use -pthread only if we are going to use pthread functionality.

	PR bootstrap/106156

lto-plugin/ChangeLog:

	* Makefile.am: Append -pthread to LDFLAGS if
	LTO_PLUGIN_USE_PTHREAD.
	* configure.ac: Use AM_CONDITIONAL for LTO_PLUGIN_USE_PTHREAD.
	* Makefile.in: Regenerate.
	* configure: Regenerate.
---
 lto-plugin/Makefile.am  |  3 +--
 lto-plugin/Makefile.in  |  4 ++--
 lto-plugin/configure| 10 --
 lto-plugin/configure.ac |  5 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 482946e4dd5..aba3c5a00cd 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -9,8 +9,7 @@ libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(a
 
 AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"'
-# The plug-in depends on pthreads.
-AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@
+AM_LDFLAGS = @ac_lto_plugin_ldflags@ @ac_lto_plugin_extra_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
 override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS))
 override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS))
diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index 9453bc7d607..cb568e1e09f 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -276,6 +276,7 @@ abs_top_builddir = @abs_top_builddir@
 abs_top_srcdir = @abs_top_srcdir@
 ac_ct_CC = @ac_ct_CC@
 ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
+ac_lto_plugin_extra_ldflags = @ac_lto_plugin_extra_ldflags@
 ac_lto_plugin_ldflags = @ac_lto_plugin_ldflags@
 ac_lto_plugin_warn_cflags = @ac_lto_plugin_warn_cflags@
 accel_dir_suffix = @accel_dir_suffix@
@@ -344,8 +345,7 @@ gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(accel_dir_suffix)
 AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"'
-# The plug-in depends on pthreads.
-AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@
+AM_LDFLAGS = @ac_lto_plugin_ldflags@ @ac_lto_plugin_extra_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
 libexecsub_LTLIBRARIES = liblto_plugin.la
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
diff --git a/lto-plugin/configure b/lto-plugin/configure
index 870e49b2e62..fdb8a5964b4 100755
--- a/lto-plugin/configure
+++ b/lto-plugin/configure
@@ -650,6 +650,7 @@ LD
 FGREP
 SED
 LIBTOOL
+ac_lto_plugin_extra_ldflags
 LTO_PLUGIN_USE_SYMVER_SUN_FALSE
 LTO_PLUGIN_USE_SYMVER_SUN_TRUE
 LTO_PLUGIN_USE_SYMVER_GNU_FALSE
@@ -6012,6 +6013,7 @@ fi
 
 # Check for thread headers.
 use_locking=no
+ac_lto_plugin_extra_ldflags=
 
 case $target in
   riscv*)
@@ -6031,8 +6033,12 @@ $as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h
 fi
 
 
+
+  ac_lto_plugin_extra_ldflags="-pthread"
 fi
 
+
+
 case `pwd` in
   *\ * | *\	*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5
@@ -12104,7 +12110,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12107 "configure"
+#line 12113 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12210,7 +12216,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12213 "configure"
+#line 12219 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac
index 18eb4f60b0a..e69a7bd9ec0 100644
--- 

Re: Mips: Fix kernel_stat structure size

2022-07-14 Thread Dimitrije Milosevic
Hi Xi,
> Please CC me (@xry111) in LLVM review.
I just posted the patch. You should've gotten the email. If not, see: 
https://reviews.llvm.org/D129749.
> By the way, if you have some spare time, you can also add the value for
> Musl (all 3 ABIs).  Musl has a different struct stat than Glibc (it's PR
> 106136 in GCC bugzilla).
I might be able to, but no promises. :)


From: Xi Ruoyao 
Sent: Wednesday, July 13, 2022 4:38 AM
To: Dimitrije Milosevic ; Hans-Peter Nilsson 

Cc: Djordje Todorovic ; gcc-patches@gcc.gnu.org 

Subject: Re: Mips: Fix kernel_stat structure size 
 
On Tue, 2022-07-12 at 06:42 +, Dimitrije Milosevic wrote:

> I will send out a review, covering both 32 ABIs, meaning that both O32
> and N32 ABIs will be working properly.

Please CC me (@xry111) in LLVM review.

By the way, if you have some spare time, you can also add the value for
Musl (all 3 ABIs).  Musl has a different struct stat than Glibc (it's PR
106136 in GCC bugzilla).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const

2022-07-14 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
For following test case:

svint32_t foo()
{
  int32x4_t v = (int32x4_t) { 1, 2, 3, 4 };
  svint32_t v2 = svld1rq_s32 (svptrue_b8(), [0]);
  return v2;
}

After applying workaround in forwprop to not simplify VEC_PERM_EXPR in
simplify_permutation to avoid type error in middle end (or using
-fno-tree-forwprop)
as mentioned in:
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598390.html

We get following optimized gimple:
  v2_2 = VEC_PERM_EXPR <{ 1, 2, 3, 4 }, { 1, 2, 3, 4 }, { 0, 1, 2, 3, ... }>;
  return v2_2;

However we hit the following ICE during expansion of vec_perm_expr
because in aarch64_vectorize_vec_perm_const,
op0 is VECTOR_CST, and we call force_reg (VNx4SI, op0), which is incorrect mode
for op0. The patch fixes it by using op_mode instead of vmode in calls
to force_reg
for op0 and op1.

during RTL pass: expand
foo2.c: In function ‘foo’:
foo2.c:8:10: internal compiler error: in emit_move_insn, at expr.cc:4052
8 |   return v2;
  |  ^~
0x74789b emit_move_insn(rtx_def*, rtx_def*)
../../gcc/gcc/expr.cc:4052
0xb8f664 force_reg(machine_mode, rtx_def*)
../../gcc/gcc/explow.cc:688
0x134182f aarch64_vectorize_vec_perm_const
../../gcc/gcc/config/aarch64/aarch64.cc:24132
0xe63070 expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*,
int_vector_builder > const&, machine_mode,
rtx_def*)
../../gcc/gcc/optabs.cc:6254
0xbb1569 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../gcc/gcc/expr.cc:10273
0xbb6498 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.cc:10625
0xa897dc expand_expr
../../gcc/gcc/expr.h:310
0xa897dc expand_return
../../gcc/gcc/cfgexpand.cc:3809
0xa897dc expand_gimple_stmt_1
../../gcc/gcc/cfgexpand.cc:3918
0xa897dc expand_gimple_stmt
../../gcc/gcc/cfgexpand.cc:4044
0xa8f238 expand_gimple_basic_block
../../gcc/gcc/cfgexpand.cc:6096
0xa91187 execute
../../gcc/gcc/cfgexpand.cc:6822

Is the patch OK to commit after bootstrap+test on aarch64-linux-gnu ?

Thanks,
Prathamesh
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 25f4cbb466d..303814b8cca 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24129,11 +24129,11 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
   d.op_mode = op_mode;
   d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode);
   d.target = target;
-  d.op0 = op0 ? force_reg (vmode, op0) : NULL_RTX;
+  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
   if (op0 == op1)
 d.op1 = d.op0;
   else
-d.op1 = op1 ? force_reg (vmode, op1) : NULL_RTX;
+d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;
   d.testing_p = !target;
 
   if (!d.testing_p)


Re: [PATCH] lto-plugin: use -pthread only for detected targets

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 12:03 PM Martin Liška  wrote:
>
> On 7/13/22 14:15, Richard Biener wrote:
> > Didn't we have it that way and not work?  IIRC LDFLAGS is only
> > used during configure link tests and _not_ substituted?
>
> You are right.
>
> There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly
> in Makefile.am.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I'm not really an expert on build issues but I'd have added
some lto_plugin_extra_ldflags variable, AC_SUBST ()ed it
and done

AM_LDFLAGS = @lto_plugin_extra_ldflags@ @ac_lto_plugin_ldflags@

or figure where we set ac_lto_plugin_ldflags and amend that during
configure instead:

# Need -Wc to get it through libtool.
if test "x$have_static_libgcc" = xyes; then
   ac_lto_plugin_ldflags="-Wc,-static-libgcc"
fi
AC_SUBST(ac_lto_plugin_ldflags)


> Thanks,
> Martin


Re: [PATCH] lto-plugin: use -pthread only for detected targets

2022-07-14 Thread Martin Liška
On 7/13/22 14:15, Richard Biener wrote:
> Didn't we have it that way and not work?  IIRC LDFLAGS is only
> used during configure link tests and _not_ substituted?

You are right.

There's a proper fix that utilizes AM_CONDITIONAL and sets LDFLAGS properly
in Makefile.am.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
MartinFrom bf0e9635086b5b3f88a40bab9839b2c438eed831 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 14 Jul 2022 09:51:33 +0200
Subject: [PATCH] lto-plugin: use -pthread only for detected targets

Use -pthread only if we are going to use pthread functionality.

	PR bootstrap/106156

lto-plugin/ChangeLog:

	* Makefile.am: Append -pthread to LDFLAGS if
	LTO_PLUGIN_USE_PTHREAD.
	* configure.ac: Use AM_CONDITIONAL for LTO_PLUGIN_USE_PTHREAD.
	* Makefile.in: Regenerate.
	* configure: Regenerate.
---
 lto-plugin/Makefile.am  |  7 +--
 lto-plugin/Makefile.in  |  5 +++--
 lto-plugin/configure| 19 +--
 lto-plugin/configure.ac |  2 ++
 4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 482946e4dd5..ce51e8e173e 100644
--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -9,12 +9,15 @@ libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(a
 
 AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"'
-# The plug-in depends on pthreads.
-AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@
+AM_LDFLAGS = @ac_lto_plugin_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
 override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS))
 override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS))
 
+if LTO_PLUGIN_USE_PTHREAD
+override LDFLAGS := $(LDFLAGS) -pthread
+endif
+
 libexecsub_LTLIBRARIES = liblto_plugin.la
 gcc_build_dir = @gcc_build_dir@
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index 9453bc7d607..c323b6a51de 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -344,8 +344,7 @@ gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 libexecsubdir := $(libexecdir)/gcc/$(real_target_noncanonical)/$(gcc_version)$(accel_dir_suffix)
 AM_CPPFLAGS = -I$(top_srcdir)/../include $(DEFS)
 AM_CFLAGS = @ac_lto_plugin_warn_cflags@ $(CET_HOST_FLAGS) -DBASE_VERSION='"$(gcc_version)"'
-# The plug-in depends on pthreads.
-AM_LDFLAGS = -pthread @ac_lto_plugin_ldflags@
+AM_LDFLAGS = @ac_lto_plugin_ldflags@
 AM_LIBTOOLFLAGS = --tag=disable-static
 libexecsub_LTLIBRARIES = liblto_plugin.la
 in_gcc_libs = $(foreach lib, $(libexecsub_LTLIBRARIES), $(gcc_build_dir)/$(lib))
@@ -685,6 +684,8 @@ uninstall-am: uninstall-libexecsubLTLIBRARIES
 override CFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(CFLAGS))
 override LDFLAGS := $(filter-out -fsanitize=address -fsanitize=hwaddress,$(LDFLAGS))
 
+@LTO_PLUGIN_USE_PTHREAD_TRUE@override LDFLAGS := $(LDFLAGS) -pthread
+
 all-local: $(in_gcc_libs)
 
 $(in_gcc_libs) : $(gcc_build_dir)/%: %
diff --git a/lto-plugin/configure b/lto-plugin/configure
index 870e49b2e62..bf7c4d064ee 100755
--- a/lto-plugin/configure
+++ b/lto-plugin/configure
@@ -650,6 +650,8 @@ LD
 FGREP
 SED
 LIBTOOL
+LTO_PLUGIN_USE_PTHREAD_FALSE
+LTO_PLUGIN_USE_PTHREAD_TRUE
 LTO_PLUGIN_USE_SYMVER_SUN_FALSE
 LTO_PLUGIN_USE_SYMVER_SUN_TRUE
 LTO_PLUGIN_USE_SYMVER_GNU_FALSE
@@ -6033,6 +6035,15 @@ fi
 
 fi
 
+ if test "x$use_locking" = xyes; then
+  LTO_PLUGIN_USE_PTHREAD_TRUE=
+  LTO_PLUGIN_USE_PTHREAD_FALSE='#'
+else
+  LTO_PLUGIN_USE_PTHREAD_TRUE='#'
+  LTO_PLUGIN_USE_PTHREAD_FALSE=
+fi
+
+
 case `pwd` in
   *\ * | *\	*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5
@@ -12104,7 +12115,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12107 "configure"
+#line 12118 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12210,7 +12221,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12213 "configure"
+#line 12224 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12693,6 +12704,10 @@ if test -z "${LTO_PLUGIN_USE_SYMVER_SUN_TRUE}" && test -z "${LTO_PLUGIN_USE_SYMV
   as_fn_error $? "conditional \"LTO_PLUGIN_USE_SYMVER_SUN\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${LTO_PLUGIN_USE_PTHREAD_TRUE}" && test -z "${LTO_PLUGIN_USE_PTHREAD_FALSE}"; then
+  as_fn_error $? "conditional \"LTO_PLUGIN_USE_PTHREAD\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 
 : "${CONFIG_STATUS=./config.status}"
 ac_write_fail=0
diff 

Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative

2022-07-14 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 14, 2022 at 11:32 AM Hongtao Liu  wrote:
>
> On Thu, Jul 14, 2022 at 3:22 PM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Thu, Jul 14, 2022 at 7:33 AM liuhongt  wrote:
> > >
> > > And split it to GPR-version instruction after reload.
> > >
> > > > ?r was introduced under the assumption that we want vector values
> > > > mostly in vector registers. Currently there are no instructions with
> > > > memory or immediate operand, so that made sense at the time. Let's
> > > > keep ?r until logic instructions with mem/imm operands are introduced.
> > > > So, for the patch that adds 64-bit vector logic in GPR, I would advise
> > > > to first introduce only register operands. mem/imm operands should be
> > > Update patch to add ?r to 64-bit bit_op patterns.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > No big imact on SPEC2017(Most same binary).
> >
> > The problem with your approach is with the combine pass, where combine
> > first tries to recognize the combined instruction without clobber,
> > before re-recognizing instruction with added clobber. So, if a forward
> > propagation happens, the combine will *always* choose the insn variant
> > without GPR.
> Thank you for the explanation, I really did not know this point.
> >
> > So, the solution with VI_16_32 is to always expand with a clobbered
> > version that is split to either SImode or V16QImode. With 64-bit
> > instructions, we have two additional complications. First, we have a
> > native MMX instruction, and we have to split to it after reload, and
> > second, we have a builtin that expects vector insn.
> >
> > To solve the first issue, we should change the mode of
> > "*mmx" to V1DImode and split your new _gpr version with
> > clobber to it for !GENERAL_REG_P operands.
> >
> > The second issue could be solved by emitting V1DImode instructions
> > directly from the expander. Please note there are several expanders
> > that expect non-clobbered logic insn in certain mode to be available,
> > so the situation can become quite annoying...
> Yes. It looks like it would add a lot of code complexity, I'll hold
> the patch for now.

I did some experimenting in the past with the idea of adding GPR
instructions to 64-bit vectors. While there were some opportunities
with 32- and 16-bit operations, mostly due to the fact that these
arguments are passed via integer registers, 64-bit cases never
triggered, because 64-bit vectors are passed via XMM registers. Also,
when mem/imm alternatives were added, many inter-regunit moves were
generated for everything but the most simple testcases involving logic
operations, also considering the limited range of 64-bit immediates.
IMO, the only case it is worth adding is a direct immediate store to
memory, which HJ recently added.

Uros.


Uros.


Re: [PATCH] ipa-cp: Fix assert triggering with -fno-toplevel-reorder (PR 106260)

2022-07-14 Thread Martin Jambor
Hi,

On Thu, Jul 14 2022, Richard Biener wrote:
> On Wed, Jul 13, 2022 at 11:06 PM Martin Jambor  wrote:
>>
>> Hi,
>>
>> with -fno-toplevel-reorder (and -fwhole-program), there apparently can
>> be local functions without any callers.
>
> Did you check why?  Can't we fix that?

no, I have not checked much.

Our manual description of -fno-toplevel-reorder says that:

  "When this option is used, unreferenced static variables are not
   removed."

In the beginning, static (because of -fwhole-program) variables _ZTV1S
and __gxx_personality_v0 do refer to the destructor in question, my
hypothesis is that the dead function removal code somehow has to be
careful in case these have been already output, even when these two
variables are no longer in the symbol table when IPA-CP runs.

But I did not look deeper, hoping that Honza would correct me if I am
totally off.  Perhaps the bug is that the function should not be local
even with -fwhole-program.  The semantics of this combination of options
is a bit unclear to me.

Martin


>
>>  This is something that IPA-CP
>> does not like because its propagation verifier checks that local
>> functions do not end up with TOP in their lattices.  Therefore there
>> is an assert checking that all call-less unreachable functions have
>> been removed, which triggers in PR 106260 with these two options.
>>
>> This patch detects the situation and marks the lattices as variable,
>> thus avoiding both the assert trigger and the verification failure.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master and then all
>> active release branches?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2022-07-13  Martin Jambor  
>>
>> PR ipa/106260
>> * ipa-cp.cc (initialize_node_lattices): Replace assert that there are
>> callers with handling that situation when -fno-toplevel_reorder.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2022-07-13  Martin Jambor  
>>
>> PR ipa/106260
>> * g++.dg/ipa/pr106260.C: New test.
>> ---
>>  gcc/ipa-cp.cc   |  6 ++-
>>  gcc/testsuite/g++.dg/ipa/pr106260.C | 64 +
>>  2 files changed, 69 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr106260.C
>>
>> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
>> index 543a9334e2c..f699a8dadc0 100644
>> --- a/gcc/ipa-cp.cc
>> +++ b/gcc/ipa-cp.cc
>> @@ -1286,10 +1286,14 @@ initialize_node_lattices (struct cgraph_node *node)
>>int caller_count = 0;
>>node->call_for_symbol_thunks_and_aliases (count_callers, 
>> _count,
>> true);
>> -  gcc_checking_assert (caller_count > 0);
>>if (caller_count == 1)
>> node->call_for_symbol_thunks_and_aliases (set_single_call_flag,
>>   NULL, true);
>> +  else if (caller_count == 0)
>> +   {
>> + gcc_checking_assert (!opt_for_fn (node->decl, 
>> flag_toplevel_reorder));
>> + variable = true;
>> +   }
>>  }
>>else
>>  {
>> diff --git a/gcc/testsuite/g++.dg/ipa/pr106260.C 
>> b/gcc/testsuite/g++.dg/ipa/pr106260.C
>> new file mode 100644
>> index 000..bd3b6e0af79
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/ipa/pr106260.C
>> @@ -0,0 +1,64 @@
>> +// { dg-do compile }
>> +// { dg-options "-O2 -std=gnu++14 -fwhole-program -fno-unit-at-a-time" }
>> +
>> +struct A;
>> +template 
>> +struct Q { Q (T); };
>> +template
>> +struct U {
>> +  ~U () { m1 (nullptr); }
>> +  D m2 ();
>> +  T *u;
>> +  void m1 (T *) { m2 () (u); }
>> +};
>> +struct F { F (int *); };
>> +template 
>> +using W = Q;
>> +int a, b;
>> +void fn1 (void *);
>> +template 
>> +void
>> +fn2 (T *x)
>> +{
>> +  if (x)
>> +x->~T();
>> +  fn1 (x);
>> +}
>> +template 
>> +struct C {
>> +  void operator() (T *x) { fn2 (x); }
>> +};
>> +struct D;
>> +template  >
>> +using V = U;
>> +struct A {
>> +  A (int *);
>> +};
>> +struct S;
>> +struct G {
>> +  V m3 ();
>> +};
>> +struct S {
>> +  int e;
>> +  virtual ~S () {}
>> +};
>> +template
>> +struct H {
>> +  H (int, T x, int) : h(x) {}
>> +  G g;
>> +  void m4 () { g.m3 (); }
>> +  T h;
>> +};
>> +struct I {
>> +  I(A, W);
>> +};
>> +void
>> +test ()
>> +{
>> +  A c ();
>> +  W d ();
>> +  I e (c, d);
>> +  H f (0, e, a);
>> +  f.m4 ();
>> +}
>> +
>> --
>> 2.36.1
>>


[PATCH][pushed] libiberty: fix docs typo

2022-07-14 Thread Martin Liška
libiberty/ChangeLog:

* functions.texi: Replace strtoul with strtoull.
---
 libiberty/functions.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libiberty/functions.texi b/libiberty/functions.texi
index e4d74b58220..b56b02e0686 100644
--- a/libiberty/functions.texi
+++ b/libiberty/functions.texi
@@ -1747,7 +1747,7 @@ that the converted value is unsigned.
 @c strtoll.c:33
 @deftypefn Supplemental {long long int} strtoll (const char *@var{string}, @
   char **@var{endptr}, int @var{base})
-@deftypefnx Supplemental {unsigned long long int} strtoul (@
+@deftypefnx Supplemental {unsigned long long int} strtoull (@
   const char *@var{string}, char **@var{endptr}, int @var{base})
 
 The @code{strtoll} function converts the string in @var{string} to a
-- 
2.37.0



Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 3:22 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, Jul 14, 2022 at 7:33 AM liuhongt  wrote:
> >
> > And split it to GPR-version instruction after reload.
> >
> > > ?r was introduced under the assumption that we want vector values
> > > mostly in vector registers. Currently there are no instructions with
> > > memory or immediate operand, so that made sense at the time. Let's
> > > keep ?r until logic instructions with mem/imm operands are introduced.
> > > So, for the patch that adds 64-bit vector logic in GPR, I would advise
> > > to first introduce only register operands. mem/imm operands should be
> > Update patch to add ?r to 64-bit bit_op patterns.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > No big imact on SPEC2017(Most same binary).
>
> The problem with your approach is with the combine pass, where combine
> first tries to recognize the combined instruction without clobber,
> before re-recognizing instruction with added clobber. So, if a forward
> propagation happens, the combine will *always* choose the insn variant
> without GPR.
Thank you for the explanation, I really did not know this point.
>
> So, the solution with VI_16_32 is to always expand with a clobbered
> version that is split to either SImode or V16QImode. With 64-bit
> instructions, we have two additional complications. First, we have a
> native MMX instruction, and we have to split to it after reload, and
> second, we have a builtin that expects vector insn.
>
> To solve the first issue, we should change the mode of
> "*mmx" to V1DImode and split your new _gpr version with
> clobber to it for !GENERAL_REG_P operands.
>
> The second issue could be solved by emitting V1DImode instructions
> directly from the expander. Please note there are several expanders
> that expect non-clobbered logic insn in certain mode to be available,
> so the situation can become quite annoying...
Yes. It looks like it would add a lot of code complexity, I'll hold
the patch for now.
>
> Uros.



-- 
BR,
Hongtao


Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 4:53 PM Hongtao Liu  wrote:
>
> On Thu, Jul 14, 2022 at 4:20 PM Richard Biener
>  wrote:
> >
> > On Wed, Jul 13, 2022 at 9:34 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu  wrote:
> > > >
> > > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > The patch only handles load/store(including ctor/permutation, 
> > > > > > > > except
> > > > > > > > gather/scatter) for complex type, other operations don't needs 
> > > > > > > > to be
> > > > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD 
> > > > > > > > is not
> > > > > > > > supported for complex type, so no need to handle either).
> > > > > > >
> > > > > > > (*)
> > > > > > >
> > > > > > > > Instead of support vector(2) _Complex double, this patch takes 
> > > > > > > > vector(4)
> > > > > > > > double as vector type of _Complex double. Since vectorizer 
> > > > > > > > originally
> > > > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for 
> > > > > > > > complex
> > > > > > > > type, the patch handles nunits/ncopies/vf specially for complex 
> > > > > > > > type.
> > > > > > >
> > > > > > > For the limited set above(*) can you explain what's "special" 
> > > > > > > about
> > > > > > > vector(2) _Complex
> > > > > > > vs. vector(4) double, thus why we need to have 
> > > > > > > STMT_VINFO_COMPLEX_P at all?
> > > > > > Supporting a vector(2) complex  is a straightforward idea, just like
> > > > > > supporting other scalar type in vectorizer, but it requires more
> > > > > > efforts(in the backend and frontend), considering that most of
> > > > > > operations of complex type will be lowered into realpart and 
> > > > > > imagpart
> > > > > > operations, supporting a vector(2) complex does not look that
> > > > > > necessary. Then it comes up with supporting vector(4) double(with
> > > > > > adjustment of vf/ctor/permutation), the vectorizer only needs to
> > > > > > handle the vectorization of the move operation of the complex 
> > > > > > type(no
> > > > > > need to worry about wrongly mapping vector(4) double multiplication 
> > > > > > to
> > > > > > complex type multiplication since it's already lowered before
> > > > > > vectorizer).
> > > > > > stmt_info does not record the scalar type, in order to avoid 
> > > > > > duplicate
> > > > > > operation like getting a lhs type from stmt to determine whether it 
> > > > > > is
> > > > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is 
> > > > > > mainly
> > > > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_
> > > > > > stmt.
> > > > > > >
> > > > > > > I wonder to what extent your handling can be extended to support 
> > > > > > > re-vectorizing
> > > > > > > (with a higher VF for example) already vectorized code?  The 
> > > > > > > vectorizer giving
> > > > > > > up on vector(2) double looks quite obviously similar to it giving 
> > > > > > > up
> > > > > > > on _Complex double ...
> > > > > > Yes, it can be extended to vector(2) double/float/int/ with a 
> > > > > > bit
> > > > > > adjustment(exacting element by using bit_field instead of
> > > > > > imagpart_expr/realpart_expr).
> > > > > > > It would be a shame to not use the same underlying mechanism for 
> > > > > > > dealing with
> > > > > > > both, where for the vector case obviously vector(4) would be 
> > > > > > > supported as well.
> > > > > > >
> > > > > > > In principle _Complex double operations should be two SLP lanes 
> > > > > > > but it seems you
> > > > > > > are handling them with classical interleaving as well?
> > > > > > I'm only handling move operations, for other operations it will be
> > > > > > lowered to realpart and imagpart and thus two SLP lanes.
> > > > >
> > > > > Yes, I understood that.
> > > > >
> > > > > Doing it more general (and IMHO better) would involve enhancing
> > > > > how we represent dataref groups, maintaining the number of scalars
> > > > > covered by each of the vinfos.  On the SLP representation side it
> > > > > probably requires to rely on the representative for access and not
> > > > > on the scalar stmts (since those do not map properly to the lanes).
> > > > >
> > > > > Ideally we'd be able to handle
> > > > >
> > > > > struct { _Complex double c; double a; double b; } a[], b[];
> > > > >
> > > > > void foo ()
> > > > > {
> > > > >for (int i = 0; i < 100; ++i)
> > > > > {
> > > > >   a[i].c = b[i].c;
> > > > >   a[i].a = b[i].a;
> > > > >   a[i].b = b[i].b;
> > > > > }
> > > > > }
> > > > >
> > > > > which I guess your patch doesn't handle with plain AVX vector
> > > > > copies but instead uses 

Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-14 Thread Hongtao Liu via Gcc-patches
On Thu, Jul 14, 2022 at 4:20 PM Richard Biener
 wrote:
>
> On Wed, Jul 13, 2022 at 9:34 AM Richard Biener
>  wrote:
> >
> > On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu  wrote:
> > >
> > > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt  
> > > > > > wrote:
> > > > > > >
> > > > > > > The patch only handles load/store(including ctor/permutation, 
> > > > > > > except
> > > > > > > gather/scatter) for complex type, other operations don't needs to 
> > > > > > > be
> > > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD 
> > > > > > > is not
> > > > > > > supported for complex type, so no need to handle either).
> > > > > >
> > > > > > (*)
> > > > > >
> > > > > > > Instead of support vector(2) _Complex double, this patch takes 
> > > > > > > vector(4)
> > > > > > > double as vector type of _Complex double. Since vectorizer 
> > > > > > > originally
> > > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for complex
> > > > > > > type, the patch handles nunits/ncopies/vf specially for complex 
> > > > > > > type.
> > > > > >
> > > > > > For the limited set above(*) can you explain what's "special" about
> > > > > > vector(2) _Complex
> > > > > > vs. vector(4) double, thus why we need to have STMT_VINFO_COMPLEX_P 
> > > > > > at all?
> > > > > Supporting a vector(2) complex  is a straightforward idea, just like
> > > > > supporting other scalar type in vectorizer, but it requires more
> > > > > efforts(in the backend and frontend), considering that most of
> > > > > operations of complex type will be lowered into realpart and imagpart
> > > > > operations, supporting a vector(2) complex does not look that
> > > > > necessary. Then it comes up with supporting vector(4) double(with
> > > > > adjustment of vf/ctor/permutation), the vectorizer only needs to
> > > > > handle the vectorization of the move operation of the complex type(no
> > > > > need to worry about wrongly mapping vector(4) double multiplication to
> > > > > complex type multiplication since it's already lowered before
> > > > > vectorizer).
> > > > > stmt_info does not record the scalar type, in order to avoid duplicate
> > > > > operation like getting a lhs type from stmt to determine whether it is
> > > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is mainly
> > > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_
> > > > > stmt.
> > > > > >
> > > > > > I wonder to what extent your handling can be extended to support 
> > > > > > re-vectorizing
> > > > > > (with a higher VF for example) already vectorized code?  The 
> > > > > > vectorizer giving
> > > > > > up on vector(2) double looks quite obviously similar to it giving up
> > > > > > on _Complex double ...
> > > > > Yes, it can be extended to vector(2) double/float/int/ with a bit
> > > > > adjustment(exacting element by using bit_field instead of
> > > > > imagpart_expr/realpart_expr).
> > > > > > It would be a shame to not use the same underlying mechanism for 
> > > > > > dealing with
> > > > > > both, where for the vector case obviously vector(4) would be 
> > > > > > supported as well.
> > > > > >
> > > > > > In principle _Complex double operations should be two SLP lanes but 
> > > > > > it seems you
> > > > > > are handling them with classical interleaving as well?
> > > > > I'm only handling move operations, for other operations it will be
> > > > > lowered to realpart and imagpart and thus two SLP lanes.
> > > >
> > > > Yes, I understood that.
> > > >
> > > > Doing it more general (and IMHO better) would involve enhancing
> > > > how we represent dataref groups, maintaining the number of scalars
> > > > covered by each of the vinfos.  On the SLP representation side it
> > > > probably requires to rely on the representative for access and not
> > > > on the scalar stmts (since those do not map properly to the lanes).
> > > >
> > > > Ideally we'd be able to handle
> > > >
> > > > struct { _Complex double c; double a; double b; } a[], b[];
> > > >
> > > > void foo ()
> > > > {
> > > >for (int i = 0; i < 100; ++i)
> > > > {
> > > >   a[i].c = b[i].c;
> > > >   a[i].a = b[i].a;
> > > >   a[i].b = b[i].b;
> > > > }
> > > > }
> > > >
> > > > which I guess your patch doesn't handle with plain AVX vector
> > > > copies but instead uses interleaving for the _Complex and non-_Complex
> > > > parts?
> > > Indeed, it produces wrong code.
> >
> > For _Complex, in case we don't get to the "true and only" solution it
> > might be easier to split the loads and stores when it's just memory
> > copies and we have vectorization enabled and a supported vector
> > mode that would surely re-assemble them (store-merging doesn't seem
> > to do that).
> >
> > 

[PATCH part2][PUSHED] docs: fix position of @end deftypefn

2022-07-14 Thread Martin Liška
gcc/ChangeLog:

* doc/gimple.texi: Close properly a deftypefn.
---
 gcc/doc/gimple.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 9150218fed8..7832fa6ff90 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -1977,12 +1977,12 @@ executed in sequence.
 @deftypefn {GIMPLE function} gomp_parallel *gimple_build_omp_parallel (@
 gimple_seq body, tree clauses, tree child_fn, tree data_arg)
 Build a @code{GIMPLE_OMP_PARALLEL} statement.
-@end deftypefn
 
 @code{BODY} is sequence of statements which are executed in parallel.
 @code{CLAUSES}, are the @code{OMP} parallel construct's clauses.  
@code{CHILD_FN} is
 the function created for the parallel threads to execute.
 @code{DATA_ARG} are the shared data argument(s).
+@end deftypefn
 
 @deftypefn {GIMPLE function} bool gimple_omp_parallel_combined_p (gimple g)
 Return true if @code{OMP} parallel statement @code{G} has the
@@ -2076,9 +2076,9 @@ Return true if @code{OMP} return statement @code{G} has 
the
 
 @deftypefn {GIMPLE function} gimple gimple_build_omp_section (gimple_seq body)
 Build a @code{GIMPLE_OMP_SECTION} statement for a sections statement.
-@end deftypefn
 
 @code{BODY} is the sequence of statements in the section.
+@end deftypefn
 
 @deftypefn {GIMPLE function} bool gimple_omp_section_last_p (gimple g)
 Return true if @code{OMP} section statement @code{G} has the
-- 
2.37.0



Re: [PATCH] Fix ICE on view conversion between struct and integer

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 10:20 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> you can build a view conversion between pretty much anything in Ada including
> between types with different sizes, although the compiler warns in this case
> and gigi pads the smaller type to end up with the same size.
>
> The attached testcase triggers an ICE at -O or above for one of them:
>
> FAIL: gnat.dg/opt98.adb (test for excess errors)
> Excess errors:
> during GIMPLE pass: esra
> +===GNAT BUG DETECTED==+
> | 13.0.0 20220713 (experimental) [master 6f5cf9470aa] (x86_64-suse-linux) GCC 
> error:|
> | in gimplify_modify_expr, at gimplify.cc:6254 |
> | Error detected around 
> /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb:10:7|
> | Compiling /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb |
>
>   if (gimplify_ctxp->into_ssa && is_gimple_reg (*to_p))
> {
>   /* We should have got an SSA name from the start.  */
>   gcc_assert (TREE_CODE (*to_p) == SSA_NAME
>   || ! gimple_in_ssa_p (cfun));
> }
>
> This happens from prepare_gimple_addressable for the variable to be marked 
> with
> DECL_NOT_GIMPLE_REG_P when its initialization is gimplified, so it's 
> apparently
> just a matter of setting the flag earlier.
>
> Bootstrapped/regtested on x86-64/Linux, OK for the mainline?

OK.

Thanks,
Richard.

>
> 2022-07-14  Eric Botcazou  
>
> * gimplify.cc (lookup_tmp_var): Add NOT_GIMPLE_REG boolean parameter
> and set DECL_NOT_GIMPLE_REG_P on the variable according to it.
> (internal_get_tmp_var): Add NOT_GIMPLE_REG boolean parameter and pass
> it in the call to lookup_tmp_var.
> (get_formal_tmp_var): Pass false in the call to lookup_tmp_var.
> (get_initialized_tmp_var): Likewise.
> (prepare_gimple_addressable): Call internal_get_tmp_var instead of
> get_initialized_tmp_var with NOT_GIMPLE_REG set to true.
>
>
> 2022-07-14  Eric Botcazou  
>
> * gnat.dg/opt98.ads, gnat.dg/opt98.adb: New test.
>
> --
> Eric Botcazou


Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
 wrote:
>
> On Wed, 13 Jul 2022 at 12:22, Richard Biener  
> wrote:
> >
> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > Hi Richard,
> > > For the following test:
> > >
> > > svint32_t f2(int a, int b, int c, int d)
> > > {
> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >   return svld1rq_s32 (svptrue_b8 (), [0]);
> > > }
> > >
> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > > foo.c: In function ‘f2’:
> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > >   |   ^~
> > > svint32_t
> > > __Int32x4_t
> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > during GIMPLE pass: forwprop
> > > dump file: foo.c.109t.forwprop2
> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > 0xe9371f execute_function_todo
> > > ../../gcc/gcc/passes.cc:2091
> > > 0xe93ccb execute_todo
> > > ../../gcc/gcc/passes.cc:2145
> > >
> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> > >   int32x4_t v;
> > >   __Int32x4_t _1;
> > >   svint32_t _9;
> > >   vector(4) int _11;
> > >
> > >:
> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >   v_12 = _1;
> > >   _11 = v_12;
> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >   return _9;
> > >
> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > > view_convert_expr,
> > > and the end result becomes:
> > >   svint32_t _7;
> > >   __Int32x4_t _8;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > > has incompatible types (svint32_t, int32x4_t).
> > >
> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > > which resolves ICE, but am not sure if it's the correct approach ?
> >
> > It for sure papers over the issue.  I think the error happens earlier,
> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > which is the type of the LHS.  But then you probably run into the
> > different sizes ICE (VLA vs constant size).  I think for this case you
> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > selecting the "low" part of the VLA vector.
> Hi Richard,
> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> represent dup operation
> from fixed width to VLA vector. I am not sure how folding it to
> BIT_FIELD_REF will work.
> Could you please elaborate ?
>
> Also, the issue doesn't seem restricted to this case.
> The following test case also ICE's during forwprop:
> svint32_t foo()
> {
>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), [0]);
>   return v2;
> }
>
> foo2.c: In function ‘foo’:
> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> 9 | }
>   | ^
> svint32_t
> int32x4_t
> v2_4 = { 1, 2, 3, 4 };
>
> because simplify_permutation folds
> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> into:
> vector_cst {1, 2, 3, 4}
>
> and it complains during verify_gimple_assign_single because we don't
> support assignment of vector_cst to VLA vector.
>
> I guess the issue really is that currently, only VEC_PERM_EXPR
> supports lhs and rhs
> to have vector types with differing lengths, and simplifying it to
> other tree codes, like above,
> will result in type errors ?

That might be the case - Richard should know.  If so your type check
is still too late, you should instead recognize that we are permuting
a VLA vector and then refuse to go any of the non-VEC_PERM generating
paths - that probably means just allowing the code == VEC_PERM_EXPR
case and not any of the CTOR/CST/VIEW_CONVERT_EXPR cases?

Richard.

>
> Thanks,
> Prathamesh
> >
> > >
> > > Alternatively, should we allow assignments from fixed-width to SVE
> > > vector, so the above
> > > VIEW_CONVERT_EXPR would result in dup ?
> > >
> > > Thanks,
> > > Prathamesh


[PATCH][pushed] docs: fix position of @end deftypefn

2022-07-14 Thread Martin Liška
gcc/ChangeLog:

* doc/gimple.texi: Close properly a deftypefn.
---
 gcc/doc/gimple.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index dd9149377f3..9150218fed8 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -1965,11 +1965,10 @@ statements to be executed by just the master.
 
 @deftypefn {GIMPLE function} gimple gimple_build_omp_ordered (gimple_seq body)
 Build a @code{GIMPLE_OMP_ORDERED} statement.
-@end deftypefn
 
 @code{BODY} is the sequence of statements inside a loop that will
 executed in sequence.
-
+@end deftypefn
 
 @node @code{GIMPLE_OMP_PARALLEL}
 @subsection @code{GIMPLE_OMP_PARALLEL}
-- 
2.37.0



Re: [PATCH] [RFC]Support vectorization for Complex type.

2022-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 13, 2022 at 9:34 AM Richard Biener
 wrote:
>
> On Wed, Jul 13, 2022 at 6:47 AM Hongtao Liu  wrote:
> >
> > On Tue, Jul 12, 2022 at 10:12 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Jul 12, 2022 at 6:11 AM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Jul 11, 2022 at 7:47 PM Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Mon, Jul 11, 2022 at 5:44 AM liuhongt  
> > > > > wrote:
> > > > > >
> > > > > > The patch only handles load/store(including ctor/permutation, except
> > > > > > gather/scatter) for complex type, other operations don't needs to be
> > > > > > handled since they will be lowered by pass cplxlower.(MASK_LOAD is 
> > > > > > not
> > > > > > supported for complex type, so no need to handle either).
> > > > >
> > > > > (*)
> > > > >
> > > > > > Instead of support vector(2) _Complex double, this patch takes 
> > > > > > vector(4)
> > > > > > double as vector type of _Complex double. Since vectorizer 
> > > > > > originally
> > > > > > takes TYPE_VECTOR_SUBPARTS as nunits which is not true for complex
> > > > > > type, the patch handles nunits/ncopies/vf specially for complex 
> > > > > > type.
> > > > >
> > > > > For the limited set above(*) can you explain what's "special" about
> > > > > vector(2) _Complex
> > > > > vs. vector(4) double, thus why we need to have STMT_VINFO_COMPLEX_P 
> > > > > at all?
> > > > Supporting a vector(2) complex  is a straightforward idea, just like
> > > > supporting other scalar type in vectorizer, but it requires more
> > > > efforts(in the backend and frontend), considering that most of
> > > > operations of complex type will be lowered into realpart and imagpart
> > > > operations, supporting a vector(2) complex does not look that
> > > > necessary. Then it comes up with supporting vector(4) double(with
> > > > adjustment of vf/ctor/permutation), the vectorizer only needs to
> > > > handle the vectorization of the move operation of the complex type(no
> > > > need to worry about wrongly mapping vector(4) double multiplication to
> > > > complex type multiplication since it's already lowered before
> > > > vectorizer).
> > > > stmt_info does not record the scalar type, in order to avoid duplicate
> > > > operation like getting a lhs type from stmt to determine whether it is
> > > > a complex type, STMT_VINFO_COMPLEX_P bit is added, this bit is mainly
> > > > initialized in vect_analyze_data_refs and vect_get_vector_types_for_
> > > > stmt.
> > > > >
> > > > > I wonder to what extent your handling can be extended to support 
> > > > > re-vectorizing
> > > > > (with a higher VF for example) already vectorized code?  The 
> > > > > vectorizer giving
> > > > > up on vector(2) double looks quite obviously similar to it giving up
> > > > > on _Complex double ...
> > > > Yes, it can be extended to vector(2) double/float/int/ with a bit
> > > > adjustment(exacting element by using bit_field instead of
> > > > imagpart_expr/realpart_expr).
> > > > > It would be a shame to not use the same underlying mechanism for 
> > > > > dealing with
> > > > > both, where for the vector case obviously vector(4) would be 
> > > > > supported as well.
> > > > >
> > > > > In principle _Complex double operations should be two SLP lanes but 
> > > > > it seems you
> > > > > are handling them with classical interleaving as well?
> > > > I'm only handling move operations, for other operations it will be
> > > > lowered to realpart and imagpart and thus two SLP lanes.
> > >
> > > Yes, I understood that.
> > >
> > > Doing it more general (and IMHO better) would involve enhancing
> > > how we represent dataref groups, maintaining the number of scalars
> > > covered by each of the vinfos.  On the SLP representation side it
> > > probably requires to rely on the representative for access and not
> > > on the scalar stmts (since those do not map properly to the lanes).
> > >
> > > Ideally we'd be able to handle
> > >
> > > struct { _Complex double c; double a; double b; } a[], b[];
> > >
> > > void foo ()
> > > {
> > >for (int i = 0; i < 100; ++i)
> > > {
> > >   a[i].c = b[i].c;
> > >   a[i].a = b[i].a;
> > >   a[i].b = b[i].b;
> > > }
> > > }
> > >
> > > which I guess your patch doesn't handle with plain AVX vector
> > > copies but instead uses interleaving for the _Complex and non-_Complex
> > > parts?
> > Indeed, it produces wrong code.
>
> For _Complex, in case we don't get to the "true and only" solution it
> might be easier to split the loads and stores when it's just memory
> copies and we have vectorization enabled and a supported vector
> mode that would surely re-assemble them (store-merging doesn't seem
> to do that).
>
> Btw, we seem to produce
>
> movsd   b(%rip), %xmm0
> movsd   %xmm0, a(%rip)
> movsd   b+8(%rip), %xmm0
> movsd   %xmm0, a+8(%rip)
>
> for a _Complex double memory copy on x86 which means we lack
> true DCmode support (pseudos get decomposed).  Not sure if we
> can somehow 

[PATCH] Fix ICE on view conversion between struct and integer

2022-07-14 Thread Eric Botcazou via Gcc-patches
Hi,

you can build a view conversion between pretty much anything in Ada including
between types with different sizes, although the compiler warns in this case
and gigi pads the smaller type to end up with the same size.

The attached testcase triggers an ICE at -O or above for one of them:

FAIL: gnat.dg/opt98.adb (test for excess errors)
Excess errors:
during GIMPLE pass: esra
+===GNAT BUG DETECTED==+
| 13.0.0 20220713 (experimental) [master 6f5cf9470aa] (x86_64-suse-linux) GCC 
error:|
| in gimplify_modify_expr, at gimplify.cc:6254 |
| Error detected around /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb:10:7|
| Compiling /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt98.adb |

  if (gimplify_ctxp->into_ssa && is_gimple_reg (*to_p))
{
  /* We should have got an SSA name from the start.  */
  gcc_assert (TREE_CODE (*to_p) == SSA_NAME
  || ! gimple_in_ssa_p (cfun));
}

This happens from prepare_gimple_addressable for the variable to be marked with
DECL_NOT_GIMPLE_REG_P when its initialization is gimplified, so it's apparently
just a matter of setting the flag earlier.

Bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2022-07-14  Eric Botcazou  

* gimplify.cc (lookup_tmp_var): Add NOT_GIMPLE_REG boolean parameter
and set DECL_NOT_GIMPLE_REG_P on the variable according to it.
(internal_get_tmp_var): Add NOT_GIMPLE_REG boolean parameter and pass
it in the call to lookup_tmp_var.
(get_formal_tmp_var): Pass false in the call to lookup_tmp_var.
(get_initialized_tmp_var): Likewise.
(prepare_gimple_addressable): Call internal_get_tmp_var instead of
get_initialized_tmp_var with NOT_GIMPLE_REG set to true.


2022-07-14  Eric Botcazou  

* gnat.dg/opt98.ads, gnat.dg/opt98.adb: New test.

-- 
Eric Botcazoudiff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 04990ad91a6..2ac7ca0855e 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -573,20 +573,26 @@ create_tmp_from_val (tree val)
 }
 
 /* Create a temporary to hold the value of VAL.  If IS_FORMAL, try to reuse
-   an existing expression temporary.  */
+   an existing expression temporary.  If NOT_GIMPLE_REG, mark it as such.  */
 
 static tree
-lookup_tmp_var (tree val, bool is_formal)
+lookup_tmp_var (tree val, bool is_formal, bool not_gimple_reg)
 {
   tree ret;
 
+  /* We cannot mark a formal temporary with DECL_NOT_GIMPLE_REG_P.  */
+  gcc_assert (!is_formal || !not_gimple_reg);
+
   /* If not optimizing, never really reuse a temporary.  local-alloc
  won't allocate any variable that is used in more than one basic
  block, which means it will go into memory, causing much extra
  work in reload and final and poorer code generation, outweighing
  the extra memory allocation here.  */
   if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
-ret = create_tmp_from_val (val);
+{
+  ret = create_tmp_from_val (val);
+  DECL_NOT_GIMPLE_REG_P (ret) = not_gimple_reg;
+}
   else
 {
   elt_t elt, *elt_p;
@@ -617,7 +623,7 @@ lookup_tmp_var (tree val, bool is_formal)
 
 static tree
 internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
-  bool is_formal, bool allow_ssa)
+		  bool is_formal, bool allow_ssa, bool not_gimple_reg)
 {
   tree t, mod;
 
@@ -639,7 +645,7 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
 	}
 }
   else
-t = lookup_tmp_var (val, is_formal);
+t = lookup_tmp_var (val, is_formal, not_gimple_reg);
 
   mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val));
 
@@ -667,7 +673,7 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
 tree
 get_formal_tmp_var (tree val, gimple_seq *pre_p)
 {
-  return internal_get_tmp_var (val, pre_p, NULL, true, true);
+  return internal_get_tmp_var (val, pre_p, NULL, true, true, false);
 }
 
 /* Return a temporary variable initialized with VAL.  PRE_P and POST_P
@@ -678,7 +684,7 @@ get_initialized_tmp_var (tree val, gimple_seq *pre_p,
 			 gimple_seq *post_p /* = NULL */,
 			 bool allow_ssa /* = true */)
 {
-  return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa);
+  return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa, false);
 }
 
 /* Declare all the variables in VARS in SCOPE.  If DEBUG_INFO is true,
@@ -4574,13 +4580,10 @@ prepare_gimple_addressable (tree *expr_p, gimple_seq *seq_p)
 {
   while (handled_component_p (*expr_p))
 expr_p = _OPERAND (*expr_p, 0);
+
+  /* Do not allow an SSA name as the temporary.  */
   if (is_gimple_reg (*expr_p))
-{
-  /* Do not allow an SSA name as the temporary.  */
-  tree var = get_initialized_tmp_var (*expr_p, seq_p, NULL, false);
-  DECL_NOT_GIMPLE_REG_P (var) = 1;
-  *expr_p = var;
-}
+*expr_p = internal_get_tmp_var (*expr_p, seq_p, NULL, false, false, true);
 }
 
 

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-14 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 13 Jul 2022 at 12:22, Richard Biener  wrote:
>
> On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi Richard,
> > For the following test:
> >
> > svint32_t f2(int a, int b, int c, int d)
> > {
> >   int32x4_t v = (int32x4_t) {a, b, c, d};
> >   return svld1rq_s32 (svptrue_b8 (), [0]);
> > }
> >
> > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > foo.c: In function ‘f2’:
> > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > 4 | svint32_t f2(int a, int b, int c, int d)
> >   |   ^~
> > svint32_t
> > __Int32x4_t
> > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > during GIMPLE pass: forwprop
> > dump file: foo.c.109t.forwprop2
> > foo.c:4:11: internal compiler error: verify_gimple failed
> > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > ../../gcc/gcc/tree-cfg.cc:5568
> > 0xe9371f execute_function_todo
> > ../../gcc/gcc/passes.cc:2091
> > 0xe93ccb execute_todo
> > ../../gcc/gcc/passes.cc:2145
> >
> > This happens because, after folding svld1rq_s32 to vec_perm_expr, we have:
> >   int32x4_t v;
> >   __Int32x4_t _1;
> >   svint32_t _9;
> >   vector(4) int _11;
> >
> >:
> >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> >   v_12 = _1;
> >   _11 = v_12;
> >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> >   return _9;
> >
> > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > view_convert_expr,
> > and the end result becomes:
> >   svint32_t _7;
> >   __Int32x4_t _8;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> >   return _7;
> > ;;succ:   EXIT
> >
> > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > has incompatible types (svint32_t, int32x4_t).
> >
> > The attached patch disables simplification of VEC_PERM_EXPR
> > in simplify_permutation, if lhs and rhs have non compatible types,
> > which resolves ICE, but am not sure if it's the correct approach ?
>
> It for sure papers over the issue.  I think the error happens earlier,
> the V_C_E should have been built with the type of the VEC_PERM_EXPR
> which is the type of the LHS.  But then you probably run into the
> different sizes ICE (VLA vs constant size).  I think for this case you
> want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> selecting the "low" part of the VLA vector.
Hi Richard,
Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
represent dup operation
from fixed width to VLA vector. I am not sure how folding it to
BIT_FIELD_REF will work.
Could you please elaborate ?

Also, the issue doesn't seem restricted to this case.
The following test case also ICE's during forwprop:
svint32_t foo()
{
  int32x4_t v = (int32x4_t) {1, 2, 3, 4};
  svint32_t v2 = svld1rq_s32 (svptrue_b8 (), [0]);
  return v2;
}

foo2.c: In function ‘foo’:
foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
9 | }
  | ^
svint32_t
int32x4_t
v2_4 = { 1, 2, 3, 4 };

because simplify_permutation folds
VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
into:
vector_cst {1, 2, 3, 4}

and it complains during verify_gimple_assign_single because we don't
support assignment of vector_cst to VLA vector.

I guess the issue really is that currently, only VEC_PERM_EXPR
supports lhs and rhs
to have vector types with differing lengths, and simplifying it to
other tree codes, like above,
will result in type errors ?

Thanks,
Prathamesh
>
> >
> > Alternatively, should we allow assignments from fixed-width to SVE
> > vector, so the above
> > VIEW_CONVERT_EXPR would result in dup ?
> >
> > Thanks,
> > Prathamesh


Re: [PATCH] xtensa: Minor fix for FP constant synthesis

2022-07-14 Thread Max Filippov via Gcc-patches
On Wed, Jul 13, 2022 at 4:41 AM Takayuki 'January June' Suwa
 wrote:
>
> This patch fixes an non-fatal issue about negative constant values derived
> from FP constant synthesis on hosts whose 'long' is wider than 'int32_t'.
>
> And also replaces the dedicated code in FP constant synthesis split
> pattern with the appropriate existing function call.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md:
> In FP constant synthesis split pattern, subcontract to
> avoid_constant_pool_reference() as in the case of integer,
> because it can handle well too.  And cast to int32_t before
> calling xtensa_constantsynth() in order to ignore upper 32-bit.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/xtensa/constsynth_double.c:
> Modify in order to catch the issue.
> ---
>  gcc/config/xtensa/xtensa.md   | 35 +--
>  .../gcc.target/xtensa/constsynth_double.c |  2 +-
>  2 files changed, 9 insertions(+), 28 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH] Extend 64-bit vector bit_op patterns with ?r alternative

2022-07-14 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 14, 2022 at 7:33 AM liuhongt  wrote:
>
> And split it to GPR-version instruction after reload.
>
> > ?r was introduced under the assumption that we want vector values
> > mostly in vector registers. Currently there are no instructions with
> > memory or immediate operand, so that made sense at the time. Let's
> > keep ?r until logic instructions with mem/imm operands are introduced.
> > So, for the patch that adds 64-bit vector logic in GPR, I would advise
> > to first introduce only register operands. mem/imm operands should be
> Update patch to add ?r to 64-bit bit_op patterns.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> No big imact on SPEC2017(Most same binary).

The problem with your approach is with the combine pass, where combine
first tries to recognize the combined instruction without clobber,
before re-recognizing instruction with added clobber. So, if a forward
propagation happens, the combine will *always* choose the insn variant
without GPR.

So, the solution with VI_16_32 is to always expand with a clobbered
version that is split to either SImode or V16QImode. With 64-bit
instructions, we have two additional complications. First, we have a
native MMX instruction, and we have to split to it after reload, and
second, we have a builtin that expects vector insn.

To solve the first issue, we should change the mode of
"*mmx" to V1DImode and split your new _gpr version with
clobber to it for !GENERAL_REG_P operands.

The second issue could be solved by emitting V1DImode instructions
directly from the expander. Please note there are several expanders
that expect non-clobbered logic insn in certain mode to be available,
so the situation can become quite annoying...

Uros.


Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.

2022-07-14 Thread Richard Biener via Gcc-patches
On Thu, Jul 14, 2022 at 7:32 AM Roger Sayle  wrote:
>
>
> On Mon, Jul 11, 2022, H.J. Lu  wrote:
> > On Sun, Jul 10, 2022 at 2:38 PM Roger Sayle 
> > wrote:
> > > Hi HJ,
> > >
> > > I believe this should now be handled by the post-reload (CSE) pass.
> > > Consider the simple test case:
> > >
> > > __int128 a, b, c;
> > > void foo()
> > > {
> > >   a = 0;
> > >   b = 0;
> > >   c = 0;
> > > }
> > >
> > > Without any STV, i.e. -O2 -msse4 -mno-stv, GCC get TI mode writes:
> > > movq$0, a(%rip)
> > > movq$0, a+8(%rip)
> > > movq$0, b(%rip)
> > > movq$0, b+8(%rip)
> > > movq$0, c(%rip)
> > > movq$0, c+8(%rip)
> > > ret
> > >
> > > But with STV, i.e. -O2 -msse4, things get converted to V1TI mode:
> > > pxor%xmm0, %xmm0
> > > movaps  %xmm0, a(%rip)
> > > movaps  %xmm0, b(%rip)
> > > movaps  %xmm0, c(%rip)
> > > ret
> > >
> > > You're quite right internally the STV actually generates the equivalent 
> > > of:
> > > pxor%xmm0, %xmm0
> > > movaps  %xmm0, a(%rip)
> > > pxor%xmm0, %xmm0
> > > movaps  %xmm0, b(%rip)
> > > pxor%xmm0, %xmm0
> > > movaps  %xmm0, c(%rip)
> > > ret
> > >
> > > And currently because STV run before cse2 and combine, the const0_rtx
> > > gets CSE'd be the cse2 pass to produce the code we see.  However, if
> > > you specify -fno-rerun-cse-after-loop (to disable the cse2 pass),
> > > you'll see we continue to generate the same optimized code, as the
> > > same const0_rtx gets CSE'd in postreload.
> > >
> > > I can't be certain until I try the experiment, but I believe that the
> > > postreload CSE will clean-up, all of the same common subexpressions.
> > > Hence, it should be safe to perform all STV at the same point (after
> > > combine), which for a few additional optimizations.
> > >
> > > Does this make sense?  Do you have a test case,
> > > -fno-rerun-cse-after-loop produces different/inferior code for TImode STV
> > chains?
> > >
> > > My guess is that the RTL passes have changed so much in the last six
> > > or seven years, that some of the original motivation no longer applies.
> > > Certainly we now try to keep TI mode operations visible longer, and
> > > then allow STV to behave like a pre-reload pass to decide which set of
> > > registers to use (vector V1TI or scalar doubleword DI).  Any CSE
> > > opportunities that cse2 finds with V1TI mode, could/should equally
> > > well be found for TI mode (mostly).
> >
> > You are probably right.  If there are no regressions in GCC testsuite, my 
> > original
> > motivation is no longer valid.
>
> It was good to try the experiment, but H.J. is right, there is still some 
> benefit
> (as well as some disadvantages)  to running STV lowering before CSE2/combine.
> A clean-up patch to perform all STV conversion as a single pass (removing a
> pass from the compiler) results in just a single regression in the test suite:
> FAIL: gcc.target/i386/pr70155-17.c scan-assembler-times movv1ti_internal 8
> which looks like:
>
> __int128 a, b, c, d, e, f;
> void foo (void)
> {
>   a = 0;
>   b = -1;
>   c = 0;
>   d = -1;
>   e = 0;
>   f = -1;
> }
>
> By performing STV after combine (without CSE), reload prefers to implement
> this function using a single register, that then requires 12 instructions 
> rather
> than 8 (if using two registers).  Alas there's nothing that postreload 
> CSE/GCSE
> can do.  Doh!

Hmm, the RA could be taught to make use of more of the register file I suppose
(shouldn't regrename do this job - but it runs after postreload-cse)

> pxor%xmm0, %xmm0
> movaps  %xmm0, a(%rip)
> pcmpeqd %xmm0, %xmm0
> movaps  %xmm0, b(%rip)
> pxor%xmm0, %xmm0
> movaps  %xmm0, c(%rip)
> pcmpeqd %xmm0, %xmm0
> movaps  %xmm0, d(%rip)
> pxor%xmm0, %xmm0
> movaps  %xmm0, e(%rip)
> pcmpeqd %xmm0, %xmm0
> movaps  %xmm0, f(%rip)
> ret
>
> I also note that even without STV, the scalar implementation of this function 
> when
> compiled with -Os is also larger than it needs to be due to poor CSE (notice 
> in the
> following we only need a single zero register, and  an all_ones reg would be 
> helpful).
>
> xorl%eax, %eax
> xorl%edx, %edx
> xorl%ecx, %ecx
> movq$-1, b(%rip)
> movq%rax, a(%rip)
> movq%rax, a+8(%rip)
> movq$-1, b+8(%rip)
> movq%rdx, c(%rip)
> movq%rdx, c+8(%rip)
> movq$-1, d(%rip)
> movq$-1, d+8(%rip)
> movq%rcx, e(%rip)
> movq%rcx, e+8(%rip)
> movq$-1, f(%rip)
> movq$-1, f+8(%rip)
> ret
>
> I need to give the problem some more thought.  It would be good to 
> clean-up/unify
> the STV passes, but I/we need to solve/CSE HJ's last test case before we do.  
> 

Re: [PATCH] ipa-cp: Fix assert triggering with -fno-toplevel-reorder (PR 106260)

2022-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 13, 2022 at 11:06 PM Martin Jambor  wrote:
>
> Hi,
>
> with -fno-toplevel-reorder (and -fwhole-program), there apparently can
> be local functions without any callers.

Did you check why?  Can't we fix that?

>  This is something that IPA-CP
> does not like because its propagation verifier checks that local
> functions do not end up with TOP in their lattices.  Therefore there
> is an assert checking that all call-less unreachable functions have
> been removed, which triggers in PR 106260 with these two options.
>
> This patch detects the situation and marks the lattices as variable,
> thus avoiding both the assert trigger and the verification failure.
>
> Bootstrapped and tested on x86_64-linux.  OK for master and then all
> active release branches?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2022-07-13  Martin Jambor  
>
> PR ipa/106260
> * ipa-cp.cc (initialize_node_lattices): Replace assert that there are
> callers with handling that situation when -fno-toplevel_reorder.
>
> gcc/testsuite/ChangeLog:
>
> 2022-07-13  Martin Jambor  
>
> PR ipa/106260
> * g++.dg/ipa/pr106260.C: New test.
> ---
>  gcc/ipa-cp.cc   |  6 ++-
>  gcc/testsuite/g++.dg/ipa/pr106260.C | 64 +
>  2 files changed, 69 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr106260.C
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 543a9334e2c..f699a8dadc0 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1286,10 +1286,14 @@ initialize_node_lattices (struct cgraph_node *node)
>int caller_count = 0;
>node->call_for_symbol_thunks_and_aliases (count_callers, _count,
> true);
> -  gcc_checking_assert (caller_count > 0);
>if (caller_count == 1)
> node->call_for_symbol_thunks_and_aliases (set_single_call_flag,
>   NULL, true);
> +  else if (caller_count == 0)
> +   {
> + gcc_checking_assert (!opt_for_fn (node->decl, 
> flag_toplevel_reorder));
> + variable = true;
> +   }
>  }
>else
>  {
> diff --git a/gcc/testsuite/g++.dg/ipa/pr106260.C 
> b/gcc/testsuite/g++.dg/ipa/pr106260.C
> new file mode 100644
> index 000..bd3b6e0af79
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr106260.C
> @@ -0,0 +1,64 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -std=gnu++14 -fwhole-program -fno-unit-at-a-time" }
> +
> +struct A;
> +template 
> +struct Q { Q (T); };
> +template
> +struct U {
> +  ~U () { m1 (nullptr); }
> +  D m2 ();
> +  T *u;
> +  void m1 (T *) { m2 () (u); }
> +};
> +struct F { F (int *); };
> +template 
> +using W = Q;
> +int a, b;
> +void fn1 (void *);
> +template 
> +void
> +fn2 (T *x)
> +{
> +  if (x)
> +x->~T();
> +  fn1 (x);
> +}
> +template 
> +struct C {
> +  void operator() (T *x) { fn2 (x); }
> +};
> +struct D;
> +template  >
> +using V = U;
> +struct A {
> +  A (int *);
> +};
> +struct S;
> +struct G {
> +  V m3 ();
> +};
> +struct S {
> +  int e;
> +  virtual ~S () {}
> +};
> +template
> +struct H {
> +  H (int, T x, int) : h(x) {}
> +  G g;
> +  void m4 () { g.m3 (); }
> +  T h;
> +};
> +struct I {
> +  I(A, W);
> +};
> +void
> +test ()
> +{
> +  A c ();
> +  W d ();
> +  I e (c, d);
> +  H f (0, e, a);
> +  f.m4 ();
> +}
> +
> --
> 2.36.1
>


Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 13, 2022 at 9:36 PM Andrew Pinski via Gcc-patches
 wrote:
>
> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
>  wrote:
> >
> > This patch is intended to fix a missed optimization in match.pd. It 
> > optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to 
> > write a second simplification in match.pd to handle the commutative 
> > property as the match was not ocurring otherwise. Additionally, the pattern 
> > (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the 
> > other simplification rule.
>
> You could use :c for the commutative property instead and that should
> simplify things.
> That is:
>
> (simplify
>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
>   (abs @0))
>
> Also since integer_zerop works on vectors, it seems like you should
> add a testcase or two for the vector case.
> Also would be useful if you write a testcase that uses different
> statements rather than one big one so it gets exercised in the
> forwprop case.
> Note also if either of the max are used more than just in this
> simplification, it could increase the lifetime of @0, maybe you need
> to add :s to the max expressions.

Note :s will be ineffective since the result is "simple", I think it
is OK to do this simplification even when the max are not dead
after the transform.

>
> Thanks,
> Andrew
>
> >
> > Tests are also included to be added to the testsuite.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > PR tree-optimization/94290
> >
> > gcc/ChangeLog:
> >
> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New simplification.
> > * match.pd (x <= 0 ? -x : 0): New Simplification.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/pr94290-1.c: New test.
> > * gcc.dg/pr94290-2.c: New test.
> > * gcc.dg/pr94290.c: New test.
> > ---
> >  gcc/match.pd  | 15 ++
> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
> >  gcc/testsuite/gcc.dg/pr94290.c| 46 +++
> >  4 files changed, 92 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 45aefd96688..55ca79d7ac9 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7848,3 +7848,18 @@ and,
> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> >  (bit_and @0 @1)
> >(cond (le @0 @1) @0 (bit_and @0 @1))
> > +
> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
> > +(simplify
> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
> > +  (abs @0))
> > +
> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
> > +(simplify
> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
> > +  (abs @0))
> > +
> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
> > +(simplify
> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
> > + (max (negate @0) @1))
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c 
> > b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> > new file mode 100644
> > index 000..93b80d569aa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> > @@ -0,0 +1,16 @@
> > +/* PR tree-optimization/94290 */
> > +
> > +#include "../../gcc.dg/pr94290.c"
> > +
> > +int main() {
> > +
> > +if (foo(0) != 0
> > +|| foo(-42) != 42
> > +|| foo(42) != 42
> > +|| baz(-10) != 10
> > +|| baz(-10) != 10) {
> > +__builtin_abort();
> > +}
> > +
> > +return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/pr94290-2.c 
> > b/gcc/testsuite/gcc.dg/pr94290-2.c
> > new file mode 100644
> > index 000..ea6e55755f5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr94290-2.c
> > @@ -0,0 +1,15 @@
> > +/* PR tree-optimization/94290 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* Form from PR.  */
> > +__attribute__((noipa)) unsigned int foo(int x) {
> > +return x <= 0 ? -x : 0;
> > +}
> > +
> > +/* Changed order.  */
> > +__attribute__((noipa)) unsigned int bar(int x) {
> > +return 0 >= x ? -x : 0;
> > +}
> > +
> > +/* { dg-final {scan-tree-dump-times " MAX_EXPR " 2 "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/pr94290.c b/gcc/testsuite/gcc.dg/pr94290.c
> > new file mode 100644
> > index 000..47617c36c02
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr94290.c
> > @@ -0,0 +1,46 @@
> > +/* PR tree-optimization/94290 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +
> > +/* Same form as PR.  */
> > +__attribute__((noipa)) unsigned int foo(int x) {
> > +return (x >= 0 ? x : 0) + (x <= 0 ? -x : 0);
> > +}

Re: [PATCH v3] Simplify memchr with small constant strings

2022-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 13, 2022 at 6:50 PM H.J. Lu  wrote:
>
> When memchr is applied on a constant string of no more than the bytes of
> a word, simplify memchr by checking each byte in the constant string.
>
> int f (int a)
> {
>return  __builtin_memchr ("AE", a, 2) != 0;
> }
>
> is simplified to
>
> int f (int a)
> {
>   return ((char) a == 'A' || (char) a == 'E') != 0;
> }
>
> gcc/
>
> PR tree-optimization/103798
> * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h".
> (simplify_builtin_call): Inline memchr with constant strings of
> no more than the bytes of a word.
> * tree-ssa-strlen.cc (use_in_zero_equality): Make it global.
> * tree-ssa-strlen.h (use_in_zero_equality): New.
>
> gcc/testsuite/
>
> PR tree-optimization/103798
> * c-c++-common/pr103798-1.c: New test.
> * c-c++-common/pr103798-2.c: Likewise.
> * c-c++-common/pr103798-3.c: Likewise.
> * c-c++-common/pr103798-4.c: Likewise.
> * c-c++-common/pr103798-5.c: Likewise.
> * c-c++-common/pr103798-6.c: Likewise.
> * c-c++-common/pr103798-7.c: Likewise.
> * c-c++-common/pr103798-8.c: Likewise.
> * c-c++-common/pr103798-9.c: Likewise.
> * c-c++-common/pr103798-10.c: Likewise.
> ---
>  gcc/testsuite/c-c++-common/pr103798-1.c  | 28 +
>  gcc/testsuite/c-c++-common/pr103798-10.c | 10 
>  gcc/testsuite/c-c++-common/pr103798-2.c  | 30 ++
>  gcc/testsuite/c-c++-common/pr103798-3.c  | 28 +
>  gcc/testsuite/c-c++-common/pr103798-4.c  | 28 +
>  gcc/testsuite/c-c++-common/pr103798-5.c  | 26 +
>  gcc/testsuite/c-c++-common/pr103798-6.c  | 27 +
>  gcc/testsuite/c-c++-common/pr103798-7.c  | 27 +
>  gcc/testsuite/c-c++-common/pr103798-8.c  | 27 +
>  gcc/testsuite/c-c++-common/pr103798-9.c  | 10 
>  gcc/tree-ssa-forwprop.cc | 73 
>  gcc/tree-ssa-strlen.cc   |  4 +-
>  gcc/tree-ssa-strlen.h|  2 +
>  13 files changed, 318 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-10.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr103798-9.c
>
> diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c 
> b/gcc/testsuite/c-c++-common/pr103798-1.c
> new file mode 100644
> index 000..cd3edf569fc
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +__attribute__ ((weak))
> +int
> +f (char a)
> +{
> +   return  __builtin_memchr ("a", a, 1) == 0;
> +}
> +
> +__attribute__ ((weak))
> +int
> +g (char a)
> +{
> +  return a != 'a';
> +}
> +
> +int
> +main ()
> +{
> + for (int i = 0; i < 255; i++)
> +   if (f (i) != g (i))
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-10.c 
> b/gcc/testsuite/c-c++-common/pr103798-10.c
> new file mode 100644
> index 000..4677e9539fa
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-10.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized -save-temps" } */
> +
> +int
> +f (char a)
> +{
> +  return  __builtin_memchr ("ac", a, 1) == 0;
> +}
> +
> +/* { dg-final { scan-assembler "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c 
> b/gcc/testsuite/c-c++-common/pr103798-2.c
> new file mode 100644
> index 000..e7e99c3679e
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> +#include 
> +
> +__attribute__ ((weak))
> +int
> +f (int a)
> +{
> +   return memchr ("aE", a, 2) != NULL;
> +}
> +
> +__attribute__ ((weak))
> +int
> +g (char a)
> +{
> +  return a == 'a' || a == 'E';
> +}
> +
> +int
> +main ()
> +{
> + for (int i = 0; i < 255; i++)
> +   if (f (i + 256) != g (i + 256))
> + __builtin_abort ();
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-assembler-not "memchr" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr103798-3.c 
> b/gcc/testsuite/c-c++-common/pr103798-3.c
> new file mode 100644
> index 000..ddcedc7e238
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr103798-3.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */
> +
> 

Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-14 Thread Richard Biener via Gcc-patches
On Wed, Jul 13, 2022 at 4:57 PM Alexander Monakov  wrote:
>
> On Wed, 13 Jul 2022, Richard Biener wrote:
>
> > > > The thing to check would be incoming abnormal edges in
> > > > can_duplicate_block_p, not (only) returns twice functions?
> > >
> > > Unfortunately not, abnormal edges are also used for computed gotos, which 
> > > are
> > > less magic than returns_twice edges and should not block tracer I think.
> >
> > I think computed gotos should use regular edges, only non-local goto should
> > use abnormals...
>
> Yeah, afaict it's not documented what "abnormal" is supposed to mean :/
>
> > I suppose asm goto also uses abnormal edges?
>
> Heh, no, asm goto appears to use normal edges, but there's an old gap in
> their specification: can you use them like computed gotos, i.e. can asm-goto
> jump to a computed target? Or must they be similar to plain gotos where the
> jump label is redirectable (because it's substitutable in the asm template)?
>
> If you take a restrictive interpretation (asm goto may not jump to a computed
> label) then using regular edges looks fine.
>
> > Btw, I don't see how they in general are "less magic".  Sure, we have an
> > explicit receiver (the destination label), but we can only do edge inserts
> > if we have a single computed goto edge into a block (we can "move" the
> > label to the block created when splitting the edge).
>
> Sure, they are a bit magic, but returns_twice edges are even more magic: their
> destination looks tied to a label in the IR, but in reality their destination
> is inside a call that returns twice (hence GCC must be careful not to insert
> anything between the label and the call, like in patch 1/3).

Indeed.  Guess that's what __builtin_setjmp[_receiver] for SJLJ_EH got "right".

When copying a block we do not copy labels so any "jumps" remain to the original
block and thus we are indeed able to isolate normal control flow.  Given that
returns_twice functions _do_ seem to be special, and also special as to how
we handle other abnormal receivers in duplicate_block.  So it might indeed make
sense to special-case them in can_duplicate_block_p ... (sorry for
going back-and-forth ...)

Note that I think this detail of duplicate_block (the function) and the hook
needs to be better documented (the semantics on incoming edges, not duplicating
labels used for incoming control flow).

Can you see as to how to adjust the RTL side for this?  It looks like at least
some places set a REG_SETJMP note on call_insns (emit_call_1), I wonder
if in rtl_verify_flow_info[_1] (or its callees) we can check that such
calls come
first ... they might not since IIRC we do _not_ preserve abnormal edges when
expanding RTL (there's some existing bug about this and how this breaks some
setjmp tests) (but we try to recompute them?).

Sorry about the back-and-forth again ... your original patch looks OK for the
GIMPLE side but can you amend the cfghooks.{cc,h} documentation to
summarize our findings and
the desired semantics of duplicate_block in this respect?

Thanks,
Richard.

> > > This implies patch 1/3 [1] unnecessary blocks sinking to computed goto 
> > > targets.
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588498.html
> > >
> > > How would you like to proceed here? Is my initial patch ok?
> >
> > Hmm, so for returns twice calls duplicate_block correctly copies the call
> > and redirects the provided incoming edge to it.  The API does not
> > handle adding any further incoming edges - the caller would be responsible
> > for this.  So I still somewhat fail to see the point here.  If tracer does 
> > not
> > handle extra incoming edges properly then we need to fix tracer?
>
> I think abnormal edges corresponding to computed gotos are fine: we are
> attempting to create a chain of blocks with no incoming edges in the middle,
> right? Destinations of computed gotos remain at labels of original blocks.
>
> Agreed about correcting this in the tracer.
>
> > This also includes non-local goto (we seem to copy non-local labels just
> > fine - wasn't there a bugreport about this!?).
>
> Sorry, no idea about this.
>
> > So I think can_duplicate_block_p is the wrong place to fix (the RTL side
> > would need a similar fix anyhow?)
>
> Right. I'm happy to leave both RTL and GIMPLE can_duplicate_block_p as is,
> and instead constrain just the tracer. Alternative patch below:
>
> * tracer.cc (analyze_bb): Disallow duplication of returns_twice calls.
>
> diff --git a/gcc/tracer.cc b/gcc/tracer.cc
> index 64517846d..422e2b6a7 100644
> --- a/gcc/tracer.cc
> +++ b/gcc/tracer.cc
> @@ -132,14 +132,19 @@ analyze_bb (basic_block bb, int *count)
>gimple *stmt;
>int n = 0;
>
> +  bool can_dup = can_duplicate_block_p (CONST_CAST_BB (bb));
> +
>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
>  {
>stmt = gsi_stmt (gsi);
>n += estimate_num_insns (stmt, _size_weights);
> +  if (can_dup && cfun->calls_setjmp && gimple_code (stmt) == 

[PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-14 Thread Kong, Lingling via Gcc-patches
Hi,

The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result.  These 
intrinsics have changed over time, like `_mm_comieq_ss ` old operation is 
`RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation update is `RETURN ( 
a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0`.

OK for master?

gcc/ChangeLog:

PR target/106113
* config/i386/i386-builtin.def (BDESC): Fix [u]comi{ss,sd}
comparison due to intrinsics changed over time.
* config/i386/i386-expand.cc (ix86_ssecom_setcc):
Add unordered check and mode for sse comi codegen.
(ix86_expand_sse_comi): Add unordered check and check a different
CCmode.
(ix86_expand_sse_comi_round):Extract unordered check and mode part
in ix86_ssecom_setcc.

gcc/testsuite/ChangeLog:

PR target/106113
* gcc.target/i386/avx-vcomisd-pr106113-2.c: New test.
* gcc.target/i386/avx-vcomiss-pr106113-2.c: Ditto.
* gcc.target/i386/avx-vucomisd-pr106113-2.c: Ditto.
* gcc.target/i386/avx-vucomiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse-comiss-pr106113-1.c: Ditto.
* gcc.target/i386/sse-comiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse-ucomiss-pr106113-1.c: Ditto.
* gcc.target/i386/sse-ucomiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse2-comisd-pr106113-1.c: Ditto.
* gcc.target/i386/sse2-comisd-pr106113-2.c: Ditto.
* gcc.target/i386/sse2-ucomisd-pr106113-1.c: Ditto.
* gcc.target/i386/sse2-ucomisd-pr106113-2.c: Ditto.
---
 gcc/config/i386/i386-builtin.def  |  32 ++--
 gcc/config/i386/i386-expand.cc| 140 +++---
 .../gcc.target/i386/avx-vcomisd-pr106113-2.c  |   8 +
 .../gcc.target/i386/avx-vcomiss-pr106113-2.c  |   8 +
 .../gcc.target/i386/avx-vucomisd-pr106113-2.c |   8 +
 .../gcc.target/i386/avx-vucomiss-pr106113-2.c |   8 +
 .../gcc.target/i386/sse-comiss-pr106113-1.c   |  19 +++
 .../gcc.target/i386/sse-comiss-pr106113-2.c   |  59 
 .../gcc.target/i386/sse-ucomiss-pr106113-1.c  |  19 +++
 .../gcc.target/i386/sse-ucomiss-pr106113-2.c  |  59 
 .../gcc.target/i386/sse2-comisd-pr106113-1.c  |  19 +++
 .../gcc.target/i386/sse2-comisd-pr106113-2.c  |  59 
 .../gcc.target/i386/sse2-ucomisd-pr106113-1.c |  19 +++
 .../gcc.target/i386/sse2-ucomisd-pr106113-2.c |  59 
 14 files changed, 450 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-2.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index fd160935e67..acb7e8ca64b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -35,30 +35,30 @@
 IX86_BUILTIN__BDESC_##NEXT_KIND##_FIRST - 1.  */
 
 BDESC_FIRST (comi, COMI,
-   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
IX86_BUILTIN_COMIEQSS, UNEQ, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
IX86_BUILTIN_COMILTSS, UNLT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
IX86_BUILTIN_COMILESS, UNLE, 0)
+   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
IX86_BUILTIN_COMIEQSS, EQ, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
IX86_BUILTIN_COMILTSS, LT, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
IX86_BUILTIN_COMILESS, LE, 0)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comigt", 
IX86_BUILTIN_COMIGTSS, GT, 0)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comige", 
IX86_BUILTIN_COMIGESS, GE, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comineq", 
IX86_BUILTIN_COMINEQSS, LTGT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomieq", 
IX86_BUILTIN_UCOMIEQSS, UNEQ, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomilt", 
IX86_BUILTIN_UCOMILTSS, UNLT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomile", 
IX86_BUILTIN_UCOMILESS, UNLE, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0,