Re: [PATCH v2] bpf: Add support to eBPF atomic instructions

2021-10-25 Thread Andrew Pinski via Gcc-patches
On Mon, Oct 25, 2021 at 6:29 PM Guillermo E. Martinez via Gcc-patches
 wrote:
>
> Hello people,
>
> This is v2 of the patch to add support for atomic operations in the
> eBPF target using the GCC built-in functions:
>
>   __atomic_<op>_fetch
>   __atomic_fetch_<op>
>
> This new version restricts/enables the use of `add + fetch' and the
> rest of the atomic instructions via the -m[no-]atomics option, so that
> the generated code can run on old or new kernel versions.
>
> If you have comments, please don't hesitate to let me know.

Just a small comment: I see you placed the atomics RTL patterns in bpf.md.
Most other targets place them in atomics.md (or sync.md, for targets where
__sync was implemented before the __atomic builtins went in), which makes
them easier to find.  It might make sense to do the same here.
Changing that should be easy.

Thanks,
Andrew Pinski

>
> Kind regards,
> Guillermo
>
> Add support for eBPF atomic instructions.  The following GCC built-in
> functions are implemented for the bpf target, for both 32-bit and
> 64-bit data types:
>
>  __atomic_fetch_add
>  __atomic_fetch_sub
>  __atomic_fetch_and
>  __atomic_fetch_xor
>  __atomic_fetch_or
>  __atomic_exchange
>  __atomic_compare_exchange_n
>
> Calls to __atomic_<op>_fetch are also fully supported.
>
> New instruction definitions were added to bpf.md, along with tests
> for the GCC atomic built-in functions.
>
> In order to restrict/enable the use of `add + fetch' and the rest of
> the atomic instructions, the -m[no-]atomics option was added.
>
> Those instructions are fully compliant with:
>   https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
>   https://www.kernel.org/doc/Documentation/networking/filter.rst
>
> This support depends on two previous submissions to the CGEN and
> binutils-gdb projects:
>    https://sourceware.org/pipermail/cgen/2021q3/002774.html
>    https://sourceware.org/pipermail/binutils/2021-August/117798.html
>
> gcc/
>   * config/bpf/bpf.md: Add defines for atomic instructions.
>   * config/bpf/bpf.c: Enable atomics by default in ISA v3.
>   * config/bpf/bpf.h: Pass option to gas to disable use of
>   atomics (-mno-atomics).
>   * config/bpf/bpf.opt: Add -m[no-]atomics option.
>   * doc/invoke.texi: Add documentation for -m[no-]atomics.
>
> gcc/testsuite/
>   * gcc.target/bpf/atomic-compare-exchange.c: New test.
>   * gcc.target/bpf/atomic-exchange.c: Likewise.
>   * gcc.target/bpf/atomic-add.c: Likewise.
>   * gcc.target/bpf/atomic-and.c: Likewise.
>   * gcc.target/bpf/atomic-or.c: Likewise.
>   * gcc.target/bpf/atomic-sub.c: Likewise.
>   * gcc.target/bpf/atomic-xor.c: Likewise.
>   * gcc.target/bpf/atomics-disabled.c: Likewise.
>   * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise.
>   * gcc.target/bpf/ftest-no-atomics-add.c: Likewise.
> ---
>  gcc/ChangeLog |   8 +
>  gcc/config/bpf/bpf.c  |   2 +
>  gcc/config/bpf/bpf.h  |  12 +-
>  gcc/config/bpf/bpf.md | 155 --
>  gcc/config/bpf/bpf.opt|   4 +
>  gcc/config/bpf/constraints.md |   3 +
>  gcc/doc/invoke.texi   |  12 +-
>  gcc/testsuite/ChangeLog   |  12 ++
>  .../gcc.target/bpf/atomic-add-fetch.c |  29 
>  gcc/testsuite/gcc.target/bpf/atomic-and.c |  25 +++
>  .../gcc.target/bpf/atomic-compare-exchange.c  |  28 
>  .../gcc.target/bpf/atomic-exchange.c  |  19 +++
>  gcc/testsuite/gcc.target/bpf/atomic-or.c  |  25 +++
>  gcc/testsuite/gcc.target/bpf/atomic-sub.c |  27 +++
>  gcc/testsuite/gcc.target/bpf/atomic-xor.c |  25 +++
>  .../gcc.target/bpf/atomics-disabled.c |  28 
>  .../gcc.target/bpf/ftest-mcpuv3-atomics.c |  36 
>  .../gcc.target/bpf/ftest-no-atomics-add.c |  23 +++
>  18 files changed, 458 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 115f32e5061..782d33908ba 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,11 @@
> +2021-10-25  Guillermo E. Martinez  
> +   * config/bpf/bpf.md: Add defines for atomic instructions.
> +   * 

Re: [PATCH] Fix loop split incorrect count and probability

2021-10-25 Thread Xionghu Luo via Gcc-patches


On 2021/10/21 18:55, Richard Biener wrote:
> On Thu, 21 Oct 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
>>>
>>>
>>> On 2021/9/23 20:17, Richard Biener wrote:
 On Wed, 22 Sep 2021, Xionghu Luo wrote:

>
>
> On 2021/8/11 17:16, Richard Biener wrote:
>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/8/10 22:47, Richard Biener wrote:
 On Mon, 9 Aug 2021, Xionghu Luo wrote:

> Thanks,
>
> On 2021/8/6 19:46, Richard Biener wrote:
>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>
>>> The loop split condition is moved between loop1 and loop2, so the split
>>> bb's count and probability should also be duplicated instead of being
>>> (100% vs INV).  Secondly, the counts of the original loop1 and loop2
>>> need to be proportional to those of the original loop.
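
To make the "proportional" part concrete, here is a small arithmetic sketch. It is not code from the patch: the helper names scale_count and split_count are made up for illustration, and GCC's profile_count and profile_probability use their own fixed-point representation, so real dumps can differ from this by a unit or so. Taking the loop header count 955630225 and the 33%/67% branch split from the dump in this message, the two loop copies should end up with counts close to the patched values 315357973 and 640272252:

```c
#include <stdint.h>

/* Hypothetical helper: scale COUNT by a probability given in percent,
   rounding to nearest.  */
uint64_t
scale_count (uint64_t count, unsigned percent)
{
  return (count * percent + 50) / 100;
}

/* Hypothetical helper: split COUNT between the two loop copies so that
   the two parts always add up to the original count.  */
void
split_count (uint64_t count, unsigned true_percent,
	     uint64_t *loop1, uint64_t *loop2)
{
  *loop1 = scale_count (count, true_percent);
  *loop2 = count - *loop1;
}
```

Calling split_count (955630225, 33, &a, &b) gives a = 315357974 and b = 640272251, which matches the patched dump to within rounding, whereas the unpatched dump keeps 955630225 on both copies.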
>>>
>>>
>>> diff base/loop-cond-split-1.c.151t.lsplit
>>> patched/loop-cond-split-1.c.151t.lsplit:
>>> ...
>>>   int prephitmp_16;
>>>   int prephitmp_25;
>>>
>>>[local count: 118111600]:
>>>   if (n_7(D) > 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 118111600]:
>>>   return;
>>>
>>>[local count: 105119324]:
>>>   pretmp_3 = ga;
>>>
>>> -   [local count: 955630225]:
>>> +   [local count: 315357973]:
>>>   # i_13 = PHI 
>>>   # prephitmp_12 = PHI 
>>>   if (prephitmp_12 != 0)
>>> goto ; [33.00%]
>>>   else
>>> goto ; [67.00%]
>>>
>>> -   [local count: 315357972]:
>>> +   [local count: 104068130]:
>>>   _2 = do_something ();
>>>   ga = _2;
>>>
>>> -   [local count: 955630225]:
>>> +   [local count: 315357973]:
>>>   # prephitmp_5 = PHI 
>>>   i_10 = inc (i_13);
>>>   if (n_7(D) > i_10)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   goto ; [100.00%]
>>>
>>> -   [local count: 850510901]:
>>> +   [local count: 280668596]:
>>>   if (prephitmp_12 != 0)
>>> -goto ; [100.00%]
>>> +goto ; [33.00%]
>>>   else
>>> -goto ; [INV]
>>> +goto ; [67.00%]
>>>
>>> -   [local count: 850510901]:
>>> +   [local count: 280668596]:
>>>   goto ; [100.00%]
>>>
>>> -   [count: 0]:
>>> +   [local count: 70429947]:
>>>   # i_23 = PHI 
>>>   # prephitmp_25 = PHI 
>>>
>>> -   [local count: 955630225]:
>>> +   [local count: 640272252]:
>>>   # i_15 = PHI 
>>>   # prephitmp_16 = PHI 
>>>   i_22 = inc (i_15);
>>>   if (n_7(D) > i_22)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>> -   [local count: 850510901]:
>>> +   [local count: 569842305]:
>>>   goto ; [100.00%]
>>>
>>> }
>>>
>>> gcc/ChangeLog:
>>>
>>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>  (do_split_loop_on_cond): Likewise.
>>> ---
>>> gcc/tree-ssa-loop-split.c | 16 
>>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>> --- a/gcc/tree-ssa-loop-split.c
>>> +++ b/gcc/tree-ssa-loop-split.c
>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>  basic_block cond_bb;
>
>   if (!initial_true)
> -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> +? EDGE_SUCC (bbs[i], 0)
> +: EDGE_SUCC (bbs[i], 1);
>
>>> 
>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>> -  profile_probability::always
>>> (),
>>> -  profile_probability::always
>>> (),
>>> -  profile_probability::always
>>> (),
>>> -  

[PATCH] forwprop: Remove incorrect assertion [PR102897]

2021-10-25 Thread Kewen.Lin via Gcc-patches
Hi,

As PR102897 shows, there is an incorrect assertion in function
simplify_permutation.  It is based on the wrong assumption that
all cases with op2_type == tgt_type have been handled earlier.  The
proposed fix is to remove this wrong assertion.

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

BR,
Kewen
-
gcc/ChangeLog:

PR tree-optimization/102897
* tree-ssa-forwprop.c (simplify_permutation): Remove a wrong assertion.

gcc/testsuite/ChangeLog:

* gcc.dg/pr102897.c: New test.
---
 gcc/testsuite/gcc.dg/pr102897.c | 16 
 gcc/tree-ssa-forwprop.c |  2 --
 2 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102897.c

diff --git a/gcc/testsuite/gcc.dg/pr102897.c b/gcc/testsuite/gcc.dg/pr102897.c
new file mode 100644
index 000..d96b0e48ccc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102897.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* Specify C99 to avoid the warning/error on compound literals.  */
+/* { dg-options "-std=c99" } */
+
+/* Verify that there is no ICE.  */
+
+typedef __attribute__((vector_size(8))) signed char int8x8_t;
+typedef __attribute__((vector_size(8))) unsigned char uint8x8_t;
+
+int8x8_t fn1 (int8x8_t val20, char tmp)
+{
+  uint8x8_t __trans_tmp_3;
+  __trans_tmp_3 = (uint8x8_t){tmp};
+  int8x8_t __a = (int8x8_t) __trans_tmp_3;
+  return __builtin_shuffle (__a, val20, (uint8x8_t){0});
+}
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 5b30d4c1a76..a830bab78ba 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2267,8 +2267,6 @@ simplify_permutation (gimple_stmt_iterator *gsi)
  if (!VECTOR_TYPE_P (tgt_type))
return 0;
  tree op2_type = TREE_TYPE (op2);
- /* Should have folded this before.  */
- gcc_assert (op2_type != tgt_type);

  /* Figure out the shrunk factor.  */
  poly_uint64 tgt_units = TYPE_VECTOR_SUBPARTS (tgt_type);
--
2.27.0


[PATCH v2] bpf: Add support to eBPF atomic instructions

2021-10-25 Thread Guillermo E. Martinez via Gcc-patches
Hello people,

This is v2 of the patch to add support for atomic operations in the
eBPF target using the GCC built-in functions:

  __atomic_<op>_fetch
  __atomic_fetch_<op>

This new version restricts/enables the use of `add + fetch' and the
rest of the atomic instructions via the -m[no-]atomics option, so that
the generated code can run on old or new kernel versions.

If you have comments, please don't hesitate to let me know.

Kind regards,
Guillermo

Add support for eBPF atomic instructions.  The following GCC built-in
functions are implemented for the bpf target, for both 32-bit and
64-bit data types:

 __atomic_fetch_add
 __atomic_fetch_sub
 __atomic_fetch_and
 __atomic_fetch_xor
 __atomic_fetch_or
 __atomic_exchange
 __atomic_compare_exchange_n

Calls to __atomic_<op>_fetch are also fully supported.

New instruction definitions were added to bpf.md, along with tests
for the GCC atomic built-in functions.

In order to restrict/enable the use of `add + fetch' and the rest of
the atomic instructions, the -m[no-]atomics option was added.
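
As a rough illustration of the builtins listed above (a sketch only, not taken from the patch or its tests; the wrapper names fetch_then_add, add_then_fetch and cas32 are made up here), the difference between the fetch-first and fetch-last forms is just which value is returned:

```c
#include <stdint.h>

/* __atomic_fetch_add returns the value the object held BEFORE the add.  */
uint64_t
fetch_then_add (uint64_t *p, uint64_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}

/* __atomic_add_fetch returns the value AFTER the add.  */
uint64_t
add_then_fetch (uint64_t *p, uint64_t v)
{
  return __atomic_add_fetch (p, v, __ATOMIC_SEQ_CST);
}

/* Strong compare-and-exchange on a 32-bit object.  Returns nonzero on
   success; on failure *EXPECTED is updated to the value found in *P.  */
int
cas32 (uint32_t *p, uint32_t *expected, uint32_t desired)
{
  return __atomic_compare_exchange_n (p, expected, desired, 0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

With a counter starting at 5, fetch_then_add (&c, 3) returns 5 and leaves 8 in c, and a following add_then_fetch (&c, 2) returns 10.  How these calls are lowered on the bpf target would depend on the -m[no-]atomics setting described above; on a host compiler they simply exercise the generic builtins.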

Those instructions are fully compliant with:
  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
  https://www.kernel.org/doc/Documentation/networking/filter.rst

This support depends on two previous submissions to the CGEN and
binutils-gdb projects:
   https://sourceware.org/pipermail/cgen/2021q3/002774.html
   https://sourceware.org/pipermail/binutils/2021-August/117798.html

gcc/
  * config/bpf/bpf.md: Add defines for atomic instructions.
  * config/bpf/bpf.c: Enable atomics by default in ISA v3.
  * config/bpf/bpf.h: Pass option to gas to disable use of
  atomics (-mno-atomics).
  * config/bpf/bpf.opt: Add -m[no-]atomics option.
  * doc/invoke.texi: Add documentation for -m[no-]atomics.

gcc/testsuite/
   * gcc.target/bpf/atomic-compare-exchange.c: New test.
   * gcc.target/bpf/atomic-exchange.c: Likewise.
   * gcc.target/bpf/atomic-add.c: Likewise.
   * gcc.target/bpf/atomic-and.c: Likewise.
   * gcc.target/bpf/atomic-or.c: Likewise.
   * gcc.target/bpf/atomic-sub.c: Likewise.
   * gcc.target/bpf/atomic-xor.c: Likewise.
   * gcc.target/bpf/atomics-disabled.c: Likewise.
   * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise.
   * gcc.target/bpf/ftest-no-atomics-add.c: Likewise.
---
 gcc/ChangeLog |   8 +
 gcc/config/bpf/bpf.c  |   2 +
 gcc/config/bpf/bpf.h  |  12 +-
 gcc/config/bpf/bpf.md | 155 --
 gcc/config/bpf/bpf.opt|   4 +
 gcc/config/bpf/constraints.md |   3 +
 gcc/doc/invoke.texi   |  12 +-
 gcc/testsuite/ChangeLog   |  12 ++
 .../gcc.target/bpf/atomic-add-fetch.c |  29 
 gcc/testsuite/gcc.target/bpf/atomic-and.c |  25 +++
 .../gcc.target/bpf/atomic-compare-exchange.c  |  28 
 .../gcc.target/bpf/atomic-exchange.c  |  19 +++
 gcc/testsuite/gcc.target/bpf/atomic-or.c  |  25 +++
 gcc/testsuite/gcc.target/bpf/atomic-sub.c |  27 +++
 gcc/testsuite/gcc.target/bpf/atomic-xor.c |  25 +++
 .../gcc.target/bpf/atomics-disabled.c |  28 
 .../gcc.target/bpf/ftest-mcpuv3-atomics.c |  36 
 .../gcc.target/bpf/ftest-no-atomics-add.c |  23 +++
 18 files changed, 458 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 115f32e5061..782d33908ba 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2021-10-25  Guillermo E. Martinez  
+   * config/bpf/bpf.md: Add defines for atomic instructions.
+   * config/bpf/bpf.c: Enable atomics by default in ISA v3.
+   * config/bpf/bpf.h: Pass option to gas to disable use of
+   atomics (-mno-atomics).
+   * config/bpf/bpf.opt: Add -m[no-]atomics option.
+   * doc/invoke.texi: Add documentation for -m[no-]atomics.
+
 2021-10-20  Alex Coplan  
 
* calls.c (initialize_argument_information): Remove some dead
diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 82bb698bd91..5f489c829cc 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -253,6 +253,8 @@ bpf_option_override (void)
   if (bpf_has_jmp32 == -1)
 bpf_has_jmp32 = 

Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-10-25 Thread Joseph Myers
On Mon, 25 Oct 2021, Richard Biener via Gcc-patches wrote:

> So it looks like tm_d.h is much more stripped down compared to regular
> tm_p.h but also oddly enough config/default-d.c includes tm_d.h
> while config/default-c.c explicitly documents itself to not do that.

I think the intent of that comment in default-c.c (which I wrote) was that 
if a separate tm_c.h is needed, it should use its own headers, disjoint 
from those used by tm.h.  In particular, as noted in the original patch 
submission 
, that 
avoids making macros used only to define hooks visible throughout the 
compiler.

> Is it maybe a bug that tm_d.h includes defaults.h at all?  Should

It's a bug that it includes defaults.h, and a bug that it includes 
${cpu_type}/${cpu_type}.h.  Any macros used only to define D hooks should 
be in completely separate headers that aren't used elsewhere in the 
compiler.

> "d defaults" be in a defaults-d.h instead?  If I remove the

Yes, and likewise any target-specific overrides of such macros should be 
in a separate header, not ${cpu_type}/${cpu_type}.h.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/avx512f-pr96891-3.c on Solaris [PR102834]

2021-10-25 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 25, 2021 at 10:01 PM Rainer Orth
 wrote:
>
> gcc.target/i386/avx512f-pr96891-3.c currently FAILs on 32-bit Solaris/x86:
>
> FAIL: gcc.target/i386/avx512f-pr96891-3.c scan-assembler-times 
> (?n)vpcmp[bwdq][ t]*\$7 4
>
> There are only 3 instances of the expected pattern because Solaris/x86
> defaults to -mno-stv.  Fixed by compiling with -mstv and
> -mno-stackrealign.  Tested on i386-pc-solaris2.11 and
> x86_64-pc-linux-gnu.
>
> Ok for master?
Ok.
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2021-10-20  Rainer Orth  
>
> gcc/testsuite:
> PR testsuite/102834
> * gcc.target/i386/avx512f-pr96891-3.c: Add -mstv -mno-stackrealign
> to dg-options.
>


-- 
BR,
Hongtao


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/avx512fp16-trunchf.c on Solaris [PR102835]

2021-10-25 Thread Hongyu Wang via Gcc-patches
I think this can be put in as an obvious fix.

Thanks for the patch.

On Mon, Oct 25, 2021 at 9:53 PM, Rainer Orth wrote:
>
> The gcc.target/i386/avx512fp16-trunchf.c test FAILs on 32-bit Solaris/x86:
>
> FAIL: gcc.target/i386/avx512fp16-trunchf.c scan-assembler-times vcvttsh2si[ 
> t]+[^{\\n]*(?:%xmm[0-9]|(%esp))+, %eax(?:\\n|[ t]+#) 3
> FAIL: gcc.target/i386/avx512fp16-trunchf.c scan-assembler-times vcvttsh2usi[ 
> t]+[^{\\n]*(?:%xmm[0-9]|(%esp))+, %eax(?:\\n|[ t]+#) 2
>
> This happens because Solaris defaults to -fno-omit-frame-pointer, so it
> uses %ebp instead of the expected %esp.  As Hongyu Wang suggested in the
> PR, this can be fixed by accepting both forms, which this patch does.
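
The actual change is to the Tcl regular expression in the test's scan-assembler-times directives, which is not shown in this message.  Purely to illustrate the idea of accepting either frame register, here is a sketch using POSIX regcomp/regexec in C; the function name matches_stack_operand and the pattern are assumptions made for this example, not the pattern used in the testsuite:

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains a memory operand based on %esp or %ebp,
   0 if it does not, and -1 if the pattern fails to compile.  */
int
matches_stack_operand (const char *line)
{
  regex_t re;
  /* Extended regex: a literal "(", then %esp or %ebp, then ")".  */
  if (regcomp (&re, "\\((%esp|%ebp)\\)", REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

A line using 8(%esp) and one using -4(%ebp) are both accepted, while a register-only operand such as %xmm0 is not.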
>
> Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
>
> Ok for master?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2021-10-20  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/avx512fp16-trunchf.c: Allow for %esp instead of
> %ebp.
>


[COMMITTED] Move vrp_simplify_cond_using_ranges to the simplifier.

2021-10-25 Thread Andrew MacLeod via Gcc-patches
VRP currently performs a simplification that we can move into the 
general simplification code. I'll just quote the comment:


   If the conditional is of the form SSA_NAME op constant and the SSA_NAME
   was set via a type conversion, try to replace the SSA_NAME with the RHS
   of the type conversion.  Doing so makes the conversion dead which helps
   subsequent passes.  */

This patch moves the routine to the simplify_using_ranges class, and 
calls it when other conditional simplifications fail.  It also moves the 
simplify_casted_conds routine into the VRP folder instead of being a 
standalone static function in tree-vrp.c.
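
At the source level, the transformation described in the quoted comment corresponds roughly to the following pair of functions.  This is only an illustrative sketch (the names cmp_widened and cmp_narrow are made up, and the real work happens on GIMPLE, not on C source): the comparison on the converted value is replaced by a comparison on the conversion's operand, with the constant converted to the operand's type.

```c
#include <stdint.h>

/* Before: T is set via a type conversion and then compared against a
   constant.  */
int
cmp_widened (uint8_t c)
{
  uint32_t t = (uint32_t) c;
  return t == 3;
}

/* After (conceptually): compare the source of the conversion directly.
   This is valid here because every uint8_t value fits in uint32_t and
   the constant 3 fits in uint8_t, so the conversion becomes dead.  */
int
cmp_narrow (uint8_t c)
{
  return c == (uint8_t) 3;
}
```

The two functions agree for every possible argument, which is the property the range and int_fits_type_p checks in the routine are there to guarantee.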


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew


>From f5bacd9c5be5e129688d9c91eeed05e7b968117e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 25 Oct 2021 18:04:06 -0400
Subject: [PATCH 2/2] Move vrp_simplify_cond_using_ranges into the simplifier.

This static VRP routine does a simplification with casted conditions.  Add it
to the general simplifier, and continue to invoke it from the VRP folder.

	* tree-vrp.c (vrp_simplify_cond_using_ranges): Add return type and
	move to vr-values.c.
	(simplify_casted_conds): Move to vrp_folder class.
	(execute_vrp): Call via vrp_folder now.
	* vr-values.c (simplify_cond_using_ranges_1): Call simplify_casted_cond.
	(simplify_using_ranges::simplify_casted_cond): Relocate from tree-vrp.c.
	* vr-values.h (simplify_casted_cond): Add prototype.
---
 gcc/tree-vrp.c  | 91 -
 gcc/vr-values.c | 69 +
 gcc/vr-values.h |  1 +
 3 files changed, 85 insertions(+), 76 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index a948c524098..38ea50303e0 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4031,6 +4031,7 @@ class vrp_folder : public substitute_and_fold_engine
 : substitute_and_fold_engine (/* Fold all stmts.  */ true),
   m_vr_values (v), simplifier (v)
 {  }
+  void simplify_casted_conds (function *fun);
 
 private:
   tree value_of_expr (tree name, gimple *stmt) OVERRIDE
@@ -4117,78 +4118,6 @@ vrp_folder::fold_stmt (gimple_stmt_iterator *si)
   return simplifier.simplify (si);
 }
 
-/* STMT is a conditional at the end of a basic block.
-
-   If the conditional is of the form SSA_NAME op constant and the SSA_NAME
-   was set via a type conversion, try to replace the SSA_NAME with the RHS
-   of the type conversion.  Doing so makes the conversion dead which helps
-   subsequent passes.  */
-
-static void
-vrp_simplify_cond_using_ranges (range_query *query, gcond *stmt)
-{
-  tree op0 = gimple_cond_lhs (stmt);
-  tree op1 = gimple_cond_rhs (stmt);
-
-  /* If we have a comparison of an SSA_NAME (OP0) against a constant,
- see if OP0 was set by a type conversion where the source of
- the conversion is another SSA_NAME with a range that fits
- into the range of OP0's type.
-
- If so, the conversion is redundant as the earlier SSA_NAME can be
- used for the comparison directly if we just massage the constant in the
- comparison.  */
-  if (TREE_CODE (op0) == SSA_NAME
-  && TREE_CODE (op1) == INTEGER_CST)
-{
-  gimple *def_stmt = SSA_NAME_DEF_STMT (op0);
-  tree innerop;
-
-  if (!is_gimple_assign (def_stmt))
-	return;
-
-  switch (gimple_assign_rhs_code (def_stmt))
-	{
-	CASE_CONVERT:
-	  innerop = gimple_assign_rhs1 (def_stmt);
-	  break;
-	case VIEW_CONVERT_EXPR:
-	  innerop = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (innerop)))
-	return;
-	  break;
-	default:
-	  return;
-	}
-
-  if (TREE_CODE (innerop) == SSA_NAME
-	  && !POINTER_TYPE_P (TREE_TYPE (innerop))
-	  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (innerop)
-	  && desired_pro_or_demotion_p (TREE_TYPE (innerop), TREE_TYPE (op0)))
-	{
-	  const value_range *vr = query->get_value_range (innerop);
-
-	  if (range_int_cst_p (vr)
-	  && range_fits_type_p (vr,
-TYPE_PRECISION (TREE_TYPE (op0)),
-TYPE_SIGN (TREE_TYPE (op0)))
-	  && int_fits_type_p (op1, TREE_TYPE (innerop)))
-	{
-	  tree newconst = fold_convert (TREE_TYPE (innerop), op1);
-	  gimple_cond_set_lhs (stmt, innerop);
-	  gimple_cond_set_rhs (stmt, newconst);
-	  update_stmt (stmt);
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-		{
-		  fprintf (dump_file, "Folded into: ");
-		  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
-		  fprintf (dump_file, "\n");
-		}
-	}
-	}
-}
-}
-
 /* A comparison of an SSA_NAME against a constant where the SSA_NAME
was set by a type conversion can often be rewritten to use the RHS
of the type conversion.  Do this optimization for all conditionals
@@ -4198,15 +4127,25 @@ vrp_simplify_cond_using_ranges (range_query *query, gcond *stmt)
So that transformation is not performed until after jump threading
is complete.  */
 
-static void
-simplify_casted_conds (function *fun, range_query *query)
+void

[COMMITTED] Fold all statements in Ranger VRP.

2021-10-25 Thread Andrew MacLeod via Gcc-patches
This patch changes the ranger VRP pass to simplify all statements, not 
just the ones with ranges.  I believe Jeff had mentioned we were no 
longer doing this a while back.  Now we need it when running as the VRP2 
pass to satisfy the testcase: gcc.dg/wrapped-binop-simplify.c


This also requires a testcase adjustment since EVRP will now perform 
this simplification, and it causes a test looking for it in VRP1 to 
fail.  In that test, I simply disable evrp, and then add a duplicate of 
the test which then tests that EVRP also performs the optimization.


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew

>From cb153222404e2e149aa65a4b3139b09477551203 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 20 Oct 2021 13:37:29 -0400
Subject: [PATCH 1/2] Fold all statements in Ranger VRP.

Until now, ranger VRP has only simplified statements with ranges.  This patch
enables us to fold all statements.

	gcc/
	* tree-vrp.c (rvrp_folder::fold_stmt): If simplification fails, try
	to fold anyway.

	gcc/testsuite/
	* gcc.dg/tree-ssa/vrp98.c: Disable evrp for vrp1 test.
	* gcc.dg/tree-ssa/vrp98-1.c: New. Test for folding in evrp.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp98-1.c | 41 +
 gcc/testsuite/gcc.dg/tree-ssa/vrp98.c   |  2 +-
 gcc/tree-vrp.c  |  5 ++-
 3 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp98-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp98-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp98-1.c
new file mode 100644
index 000..daa3f073b92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp98-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-Os -fdump-tree-evrp-details" } */
+
+#include <stdint.h>
+#include <limits.h>
+
+typedef unsigned int word __attribute__((mode(word)));
+typedef unsigned __int128 bigger_than_word;
+
+int
+foo (bigger_than_word a, word b, uint8_t c)
+{
+  /* Must fold use of t1 into use of b, as b is no wider than word_mode. */
+  const uint8_t t1 = b % UCHAR_MAX;
+
+  /* Must NOT fold use of t2 into use of a, as a is wider than word_mode. */
+  const uint8_t t2 = a % UCHAR_MAX;
+
+  /* Must fold use of t3 into use of c, as c is narrower than t3. */
+  const uint32_t t3 = (const uint32_t)(c >> 1);
+
+  uint16_t ret = 0;
+
+  if (t1 == 1)
+ret = 20;
+  else if (t2 == 2)
+ret = 30;
+  else if (t3 == 3)
+ret = 40;
+  /* Th extra condition below is necessary to prevent a prior pass from
+ folding away the cast. Ignored in scan-tree-dump. */
+  else if (t3 == 4)
+ret = 50;
+
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump "Folded into: if \\(_\[0-9\]+ == 1\\)" "evrp" } } */
+/* { dg-final { scan-tree-dump-not "Folded into: if \\(_\[0-9\]+ == 2\\)" "evrp" } } */
+/* { dg-final { scan-tree-dump "Folded into: if \\(_\[0-9\]+ == 3\\)" "evrp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp98.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp98.c
index 982f091080c..78d3bbaf499 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp98.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp98.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target int128 } */
-/* { dg-options "-Os -fdump-tree-vrp1-details" } */
+/* { dg-options "-Os -fdisable-tree-evrp -fdump-tree-vrp1-details" } */
 
 #include 
 #include 
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index ba7a4efc7c6..a948c524098 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-range.h"
 #include "gimple-range-path.h"
 #include "value-pointer-equiv.h"
+#include "gimple-fold.h"
 
 /* Set of SSA names found live during the RPO traversal of the function
for still active basic-blocks.  */
@@ -4381,7 +4382,9 @@ public:
 
   bool fold_stmt (gimple_stmt_iterator *gsi) OVERRIDE
   {
-return m_simplifier.simplify (gsi);
+if (m_simplifier.simplify (gsi))
+  return true;
+return ::fold_stmt (gsi, follow_single_use_edges);
   }
 
 private:
-- 
2.17.2



[PATCH] rs6000: Fix bootstrap (libffi)

2021-10-25 Thread Segher Boessenkool
This fixes bootstrap for the current problems building libffi.

I'll work on getting this into upstream as well.  If the maintainers
want it done differently, at least we have bootstrap working again
until then.

Tested on powerpc64-linux {-m32,-m64}.


Segher


2021-10-25  Segher Boessenkool  

libffi/
* src/powerpc/linux64.S: Enable AltiVec insns.
* src/powerpc/linux64_closure.S: Ditto.
---
 libffi/src/powerpc/linux64.S | 2 ++
 libffi/src/powerpc/linux64_closure.S | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/libffi/src/powerpc/linux64.S b/libffi/src/powerpc/linux64.S
index e92d64af34fd..1f876ea39edd 100644
--- a/libffi/src/powerpc/linux64.S
+++ b/libffi/src/powerpc/linux64.S
@@ -29,6 +29,8 @@
 #include 
 #include 
 
+   .machine altivec
+
 #ifdef POWERPC64
.hidden ffi_call_LINUX64
.globl  ffi_call_LINUX64
diff --git a/libffi/src/powerpc/linux64_closure.S 
b/libffi/src/powerpc/linux64_closure.S
index 3469a2cbb01e..199981db3307 100644
--- a/libffi/src/powerpc/linux64_closure.S
+++ b/libffi/src/powerpc/linux64_closure.S
@@ -30,6 +30,8 @@
 
.file   "linux64_closure.S"
 
+   .machine altivec
+
 #ifdef POWERPC64
FFI_HIDDEN (ffi_closure_LINUX64)
.globl  ffi_closure_LINUX64
-- 
1.8.3.1



Re: [PATCH] rs6000: Fixes for tests including only

2021-10-25 Thread Segher Boessenkool
Hi!

On Mon, Oct 25, 2021 at 03:33:21PM -0500, Paul A. Clarke wrote:
>   * config/rs6000/x86intrin.h: Move some included headers to new
>   headers; include new immintrin.h instead.

s/; i/.  I/  (And instead of what?)

>   * config/rs6000/immintrin.h: New.
>   * config/rs6000/x86gprintrin.h: New.

(That is a filename worse than our worst mnemonic :-) )

>   * config/config.gcc (powerpc-*-*): Add new headers to extra_headers.

powerpc*-*-*

> --- a/gcc/testsuite/gcc.target/powerpc/pr78102.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mvsx" } */
> -/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mpower8-vector -DNO_WARN_X86_INTRINSICS" } */
> +/* { dg-require-effective-target p8vector_hw } */

Please use -mcpu=power8 instead?  (And -mdejagnu-cpu=power8 in
testcases).

(The changelog should say you added the -D btw).

If you run you need *_hw.  If you only compile, like here, you want to
use *_ok instead.

Okay for trunk with those things tuned up.  Thanks!


Segher


Re: Make full use of context-sensitive ranges in access warnings

2021-10-25 Thread Martin Sebor via Gcc-patches

On 10/25/21 2:24 PM, Jeff Law wrote:



On 10/25/2021 1:31 PM, Martin Sebor wrote:

On 10/25/21 12:57 PM, Jeff Law wrote:



On 10/23/2021 5:49 PM, Martin Sebor via Gcc-patches wrote:

Somewhat belatedly following Aldy's lead on finishing
the conversion to Ranger, the attached patch modifies
gimple-ssa-warn-access and other passes that use
the pointer_query machinery to provide Ranger with
the statement it's being called to determine ranges for.
The changes are almost completely mechanical, involving
passing a GIMPLE statement around (and a range_query
pointer) all the way into the bowels of the pointer_query
class to make them available when range info is being
determined.

There might be some overlap with Aldy's tree-ssa-strlen.c
changes to do the same there.  I'll deal with any conflicts
when it comes time to commit the work.

The changes trigger a couple of -Wstringop-overread instances
in libstdc++ tests.  The warnings look valid for the IL but
the code they're in is unreachable.  One of the tests already
suppresses -Wstringop-overflow so also suppressing
-Wstringop-overread doesn't seem out of line.

Tested on x86_64-linux.

Martin

PS The warning for the u8path-char8_t.cc test is this:

/ssd/test/build/gcc-test/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:355: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' reading between 16 and 4611686018427387903 bytes from a region 
of size 10 [-Wstringop-overread]


The IL for it is below.  The loop in BB 3 exits with __i_22 equal
to 10 so BBs 5, 6 and 7 are unreachable.  It's surprising to me
that the loop isn't optimized into something better (like a MEM
array assignment or memcpy).

   [local count: 1073741824]:
  MEM[(struct basic_string *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)]._M_p = _M_local_buf;

   [local count: 8687547547]:
  # __i_109 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_109 + 1;
  _24 = MEM[(const char_type &)"filename2" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 1073741824]:   <<< __i_22 == 10 here
  if (__i_22 > 15)
    goto ; [33.00%]
  else
    goto ; [67.00%]

   [local count: 354334802]:
  if (__i_22 > 4611686018427387903)
    goto ; [0.04%]
  else
    goto ; [99.96%]   >>> __i_22 in [16, 4611686018427387903]

   [local count: 141736]:
  std::__throw_length_error ("basic_string::_M_create");

   [local count: 354193066]:
  _85 = __i_109 + 2;
  _42 = operator new (_85);
  s1._M_dataplus._M_p = _42;
  s1.D.30357._M_allocated_capacity = __i_22;
  __builtin_memcpy (_42, "filename2", __i_22);   << -Wstringop-overread


Do you mean __i_22 == 16 earlier?  I don't see how it's restricted to 
10.


The loop computes the size of the "filename2" string so the result
is 10, no?

Oh, duh.  I'm not sure that Ranger will pick that up though.


I don't expect Ranger to figure it out from the loop, but I'd
expect the loop to be unrolled into a constant.  It's just:

  int __i_22 = 0;
  do; while ("filename2"[__i_22++]);

and this and all its variations I've tried is folded to 10.
Something in the bowels of std::u8string is keeping that from
happening.
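As a rough standalone illustration of the loop above (a sketch only; the
names count_through_nul and check_count are made up here and are not part of
the patch or of libstdc++), the same do/while shape can be written as a plain
C function.  It counts every character it examines, including the terminating
NUL, which is why "filename2" gives 10 rather than 9:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: same shape as the loop
   quoted above.  The post-increment runs even on the iteration that
   reads the terminating NUL, so the NUL is counted too.  */
size_t
count_through_nul (const char *s)
{
  size_t i = 0;
  do; while (s[i++]);
  return i;
}

/* Hypothetical check: "filename2" has 9 characters plus the NUL.  */
void
check_count (void)
{
  assert (count_through_nul ("filename2") == 10);
}
```

Whether a given GCC version folds a call like this to a constant depends on
the optimization level and the surrounding code; the sketch only pins down
the value the loop is expected to produce.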

The test case that shows the difference is just:

#include <string>

int f ()
{
  std::string s = "filename2";
  return s.length ();   // folded to 10
}

int g ()
{
  std::u8string s = u8"filename2";
  return s.length ();   // not folded
}

Compile with -std=c++17 -fchar8_t.

Martin


Re: Make full use of context-sensitive ranges in access warnings

2021-10-25 Thread Andrew MacLeod via Gcc-patches

On 10/25/21 4:24 PM, Jeff Law via Gcc-patches wrote:



On 10/25/2021 1:31 PM, Martin Sebor wrote:

On 10/25/21 12:57 PM, Jeff Law wrote:



On 10/23/2021 5:49 PM, Martin Sebor via Gcc-patches wrote:

Somewhat belatedly following Aldy's lead on finishing
the conversion to Ranger, the attached patch modifies
gimple-ssa-warn-access and other passes that use
the pointer_query machinery to provide Ranger with
the statement it's being called to determine ranges for.
The changes are almost completely mechanical, involving
passing a GIMPLE statement around (and a range_query
pointer) all the way into the bowels of the pointer_query
class to make them available when range info is being
determined.

There might be some overlap with Aldy's tree-ssa-strlen.c
changes to do the same there.  I'll deal with any conflicts
when it comes time to commit the work.

The changes trigger a couple of -Wstringop-overread instances
in libstdc++ tests.  The warnings look valid for the IL but
the code they're in is unreachable.  One of the tests already
suppresses -Wstringop-overflow so also suppressing
-Wstringop-overread doesn't seem out of line.

Tested on x86_64-linux.

Martin

PS The warning for the u8path-char8_t.cc test is this:

/ssd/test/build/gcc-test/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:355: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' reading between 16 and 4611686018427387903 bytes from a 
region of size 10 [-Wstringop-overread]


The IL for it is below.  The loop in BB 3 exits with __i_22 equal
to 10 so BBs 5, 6 and 7 are unreachable.  It's surprising to me
that the loop isn't optimized into something better (like a MEM
array assignment or memcpy).

   [local count: 1073741824]:
  MEM[(struct basic_string *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)]._M_p = _M_local_buf;

   [local count: 8687547547]:
  # __i_109 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_109 + 1;
  _24 = MEM[(const char_type &)"filename2" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 1073741824]:   <<< __i_22 == 10 here
  if (__i_22 > 15)
    goto ; [33.00%]
  else
    goto ; [67.00%]

   [local count: 354334802]:
  if (__i_22 > 4611686018427387903)
    goto ; [0.04%]
  else
    goto ; [99.96%]   >>> __i_22 in [16, 4611686018427387903]

   [local count: 141736]:
  std::__throw_length_error ("basic_string::_M_create");

   [local count: 354193066]:
  _85 = __i_109 + 2;
  _42 = operator new (_85);
  s1._M_dataplus._M_p = _42;
  s1.D.30357._M_allocated_capacity = __i_22;
  __builtin_memcpy (_42, "filename2", __i_22);   << -Wstringop-overread


Do you mean __i_22 == 16 earlier?  I don't see how it's restricted 
to 10.


The loop computes the size of the "filename2" string so the result
is 10, no?

Oh, duh.  I'm not sure that Ranger will pick that up though.
Absolutely not on its own, unless it's globally set earlier by loop 
analysis or someone else.




[PATCH] rs6000: Fixes for tests including only

2021-10-25 Thread Paul A. Clarke via Gcc-patches
Tests which only include  expect many other include files
to be brought in, but not enough are.

Try to increase compatibility with x86 headers by:
- Create new immintrin.h, including the analogous subset of intrinsics
  headers available for powerpc.
- Create new x86gprintrin.h, serving exclusively as the umbrella for
  bmiintrin.h and bmi2intrin.h.
- Modify x86intrin.h:
  - Include new immintrin.h.
  - Remove mmintrin.h, xmmintrin.h, emmintrin.h, now included indirectly
from immintrin.h.
  - Remove bmiintrin.h, bmi2intrin.h, now included indirectly from
x86gprintrin.h (which is now included from immintrin.h).

Add the new files to gcc/config.gcc.

Also, fix up the testcase that provoked PR102719, which requires
Power8 vector support.

Fixes commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-25  Paul A. Clarke  

gcc
PR target/102719
* config/rs6000/x86intrin.h: Move some included headers to new
headers; include new immintrin.h instead.
* config/rs6000/immintrin.h: New.
* config/rs6000/x86gprintrin.h: New.
* config/config.gcc (powerpc-*-*): Add new headers to extra_headers.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Fix dg directives to require Power8
vector support.
---
Tested on powerpc64le-linux (Power9), powerpc64-linux (Power8) and
powerpc-linux (Power8).

OK for trunk?

 gcc/config.gcc |  2 +-
 gcc/config/rs6000/immintrin.h  | 41 ++
 gcc/config/rs6000/x86gprintrin.h   | 31 
 gcc/config/rs6000/x86intrin.h  | 10 +-
 gcc/testsuite/gcc.target/powerpc/pr78102.c |  4 +--
 5 files changed, 76 insertions(+), 12 deletions(-)
 create mode 100644 gcc/config/rs6000/immintrin.h
 create mode 100644 gcc/config/rs6000/x86gprintrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb1f06f3da89..efd1f42ac234 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,7 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
-   extra_headers="${extra_headers} nmmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
diff --git a/gcc/config/rs6000/immintrin.h b/gcc/config/rs6000/immintrin.h
new file mode 100644
index ..647a5ae49b5a
--- /dev/null
+++ b/gcc/config/rs6000/immintrin.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#define _IMMINTRIN_H_INCLUDED
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/rs6000/x86gprintrin.h b/gcc/config/rs6000/x86gprintrin.h
new file mode 100644
index ..57ef120f805f
--- /dev/null
+++ b/gcc/config/rs6000/x86gprintrin.h
@@ -0,0 +1,31 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library 

Re: [PING] rs6000 built-in series

2021-10-25 Thread Bill Schmidt via Gcc-patches
Ping...

On 10/11/21 5:17 PM, Bill Schmidt wrote:
> Hi!  Ping, please. :-)
>
> Bill
>
> On 9/29/21 3:38 PM, Bill Schmidt wrote:
>> Hi Segher,
>>
>> Might as well ping this before I go on vacation.  :-)  I think we're up to 
>> 06/18:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578604.html
>>
>> Thanks!!
>>
>> Bill


[COMMITTED] rs6000: Fix missing "externs" in smmintrin.h

2021-10-25 Thread Paul A. Clarke via Gcc-patches
Inline functions defined in smmintrin.h need "extern" as part of their
declaration, otherwise instances of those functions are created in the
objects which include them.
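
A minimal sketch of the pattern, assuming GCC's documented gnu_inline
semantics for C: a definition declared "extern inline" with that attribute
is used only for inlining and is never emitted as a standalone function,
whereas the same definition without "extern" is also compiled out of line in
every object that includes it.  The names my_add_one and use_add_one are
made up for illustration and do not appear in smmintrin.h:

```c
/* Hypothetical header-style function, not part of smmintrin.h.
   With "extern" plus gnu_inline, this body is only ever inlined;
   always_inline forces the inlining even when not optimizing.  */
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
my_add_one (int x)
{
  return x + 1;
}

/* Hypothetical caller: the call below is expanded inline, so no
   out-of-line copy of my_add_one is needed at link time.  */
int
use_add_one (int v)
{
  return my_add_one (v);
}
```

Dropping "extern" from the first declaration should make a global
my_add_one symbol show up in each including object, which can be checked
with nm on the resulting object files.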

Fixes commits:
- acd4b9103c1a30c833de4eee31fb69c3ff13cd77
- 9d352c68e8c8b642a36a6bcfc7f6b5dba11ac748
- bd9a8737d478f7f1d01a9d5f1cc4309ffbb53103
- 5f500715438761f59de5fb992267748c5d4dc4b6
- eaa93a0f3d9f67c8cbc1dc849ea6feba432ff412
- 29fb1e831bf1c25e4574bf2f98a9f534e5c67665

2021-10-25  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_testz_si128): Add "extern" to
function signature.
(_mm_testc_si128): Likewise.
(_mm_testnzc_si128): Likewise.
(_mm_blend_ps): Likewise.
(_mm_blendv_ps): Likewise.
(_mm_blend_pd): Likewise.
(_mm_blendv_pd): Likewise.
(_mm_ceil_pd): Likewise.
(_mm_ceil_sd): Likewise.
(_mm_ceil_ps): Likewise.
(_mm_ceil_ss): Likewise.
(_mm_floor_pd): Likewise.
(_mm_floor_sd): Likewise.
(_mm_floor_ps): Likewise.
(_mm_floor_ss): Likewise.
(_mm_minpos_epu16): Likewise.
(_mm_mul_epi32): Likewise.
(_mm_cvtepi8_epi16): Likewise.
(_mm_packus_epi32): Likewise.
(_mm_cmpgt_epi64): Likewise.
---
Tested on powerpc64le-linux (Power9), powerpc64-linux (Power8),
powerpc-linux (Power8).

Committed as trivial, obvious.

 gcc/config/rs6000/smmintrin.h | 40 +--
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index b732fbca7b09..0fab308b1951 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -118,7 +118,7 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
 {
@@ -145,7 +145,7 @@ _mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
   return (__m128) __r;
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
 {
@@ -154,7 +154,7 @@ _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
   return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
 {
@@ -170,7 +170,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
 }
 
 #ifdef _ARCH_PWR8
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 {
@@ -180,7 +180,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 }
 #endif
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
 {
@@ -189,7 +189,7 @@ _mm_testz_si128 (__m128i __A, __m128i __B)
   return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
 }
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testc_si128 (__m128i __A, __m128i __B)
 {
@@ -199,7 +199,7 @@ _mm_testc_si128 (__m128i __A, __m128i __B)
   return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
 }
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testnzc_si128 (__m128i __A, __m128i __B)
 {
@@ -214,14 +214,14 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_pd (__m128d __A)
 {
   return (__m128d) vec_ceil ((__v2df) __A);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_sd (__m128d __A, __m128d __B)
 {
@@ -230,14 +230,14 @@ _mm_ceil_sd (__m128d __A, __m128d __B)
   return (__m128d) __r;
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_floor_pd (__m128d __A)
 {
   return (__m128d) vec_floor ((__v2df) __A);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_floor_sd (__m128d __A, __m128d __B)
 {
@@ -246,14 +246,14 @@ _mm_floor_sd (__m128d __A, __m128d __B)
   return (__m128d) __r;
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_ps (__m128 __A)
 {
   return (__m128) vec_ceil ((__v4sf) __A);
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, 

Re: Make full use of context-sensitive ranges in access warnings

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/25/2021 1:31 PM, Martin Sebor wrote:

On 10/25/21 12:57 PM, Jeff Law wrote:



On 10/23/2021 5:49 PM, Martin Sebor via Gcc-patches wrote:

Somewhat belatedly following Aldy's lead on finishing
the conversion to Ranger, the attached patch modifies
gimple-ssa-warn-access and other passes that use
the pointer_query machinery to provide Ranger with
the statement it's being called to determine ranges for.
The changes are almost completely mechanical, involving
passing a GIMPLE statement around (and a range_query
pointer) all the way into the bowels of the pointer_query
class to make them available when range info is being
determined.

There might be some overlap with Aldy's tree-ssa-strlen.c
changes to do the same there.  I'll deal with any conflicts
when it comes time to commit the work.

The changes trigger a couple of -Wstringop-overread instances
in libstdc++ tests.  The warnings look valid for the IL but
the code they're in is unreachable.  One of the tests already
suppresses -Wstringop-overflow so also suppressing
-Wstringop-overread doesn't seem out of line.

Tested on x86_64-linux.

Martin

PS The warning for the u8path-char8_t.cc test is this:

/ssd/test/build/gcc-test/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:355: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' reading between 16 and 4611686018427387903 bytes from a region 
of size 10 [-Wstringop-overread]


The IL for it is below.  The loop in BB 3 exits with __i_22 equal
to 10 so BBs 5, 6 and 7 are unreachable.  It's surprising to me
that the loop isn't optimized into something better (like a MEM
array assignment or memcpy).

   [local count: 1073741824]:
  MEM[(struct basic_string *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)]._M_p = _M_local_buf;

   [local count: 8687547547]:
  # __i_109 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_109 + 1;
  _24 = MEM[(const char_type &)"filename2" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 1073741824]:   <<< __i_22 == 10 here
  if (__i_22 > 15)
    goto ; [33.00%]
  else
    goto ; [67.00%]

   [local count: 354334802]:
  if (__i_22 > 4611686018427387903)
    goto ; [0.04%]
  else
    goto ; [99.96%]   >>> __i_22 in [16, 4611686018427387903]

   [local count: 141736]:
  std::__throw_length_error ("basic_string::_M_create");

   [local count: 354193066]:
  _85 = __i_109 + 2;
  _42 = operator new (_85);
  s1._M_dataplus._M_p = _42;
  s1.D.30357._M_allocated_capacity = __i_22;
  __builtin_memcpy (_42, "filename2", __i_22);   << -Wstringop-overread


Do you mean __i_22 == 16 earlier?  I don't see how it's restricted to 
10.


The loop computes the size of the "filename2" string so the result
is 10, no?

Oh, duh.  I'm not sure that Ranger will pick that up though.

jeff




Re: Make full use of context-sensitive ranges in access warnings

2021-10-25 Thread Martin Sebor via Gcc-patches

On 10/25/21 12:57 PM, Jeff Law wrote:



On 10/23/2021 5:49 PM, Martin Sebor via Gcc-patches wrote:

Somewhat belatedly following Aldy's lead on finishing
the conversion to Ranger, the attached patch modifies
gimple-ssa-warn-access and other passes that use
the pointer_query machinery to provide Ranger with
the statement it's being called to determine ranges for.
The changes are almost completely mechanical, involving
passing a GIMPLE statement around (and a range_query
pointer) all the way into the bowels of the pointer_query
class to make them available when range info is being
determined.

There might be some overlap with Aldy's tree-ssa-strlen.c
changes to do the same there.  I'll deal with any conflicts
when it comes time to commit the work.

The changes trigger a couple of -Wstringop-overread instances
in libstdc++ tests.  The warnings look valid for the IL but
the code they're in is unreachable.  One of the tests already
suppresses -Wstringop-overflow so also suppressing
-Wstringop-overread doesn't seem out of line.

Tested on x86_64-linux.

Martin

PS The warning for the u8path-char8_t.cc test is this:

/ssd/test/build/gcc-test/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:355: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' reading between 16 and 4611686018427387903 bytes from a region 
of size 10 [-Wstringop-overread]


The IL for it is below.  The loop in BB 3 exits with __i_22 equal
to 10 so BBs 5, 6 and 7 are unreachable.  It's surprising to me
that the loop isn't optimized into something better (like a MEM
array assignment or memcpy).

   [local count: 1073741824]:
  MEM[(struct basic_string *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)]._M_p = _M_local_buf;

   [local count: 8687547547]:
  # __i_109 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_109 + 1;
  _24 = MEM[(const char_type &)"filename2" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 1073741824]:   <<< __i_22 == 10 here
  if (__i_22 > 15)
    goto ; [33.00%]
  else
    goto ; [67.00%]

   [local count: 354334802]:
  if (__i_22 > 4611686018427387903)
    goto ; [0.04%]
  else
    goto ; [99.96%]   >>> __i_22 in [16, 4611686018427387903]

   [local count: 141736]:
  std::__throw_length_error ("basic_string::_M_create");

   [local count: 354193066]:
  _85 = __i_109 + 2;
  _42 = operator new (_85);
  s1._M_dataplus._M_p = _42;
  s1.D.30357._M_allocated_capacity = __i_22;
  __builtin_memcpy (_42, "filename2", __i_22);   << -Wstringop-overread


Do you mean __i_22 == 16 earlier?  I don't see how it's restricted to 10.


The loop computes the size of the "filename2" string so the result
is 10, no?



I would have expected to have a global range for i_22 of [0,16] which in 
turn should have allowed the optimizers to remove bb5 and bb6.  Not sure 
if that'd fix your overread though.


OK.  I'll let you and Aldy coordinate since y'all may be hitting some of 
the same bits.


Will do.

Martin


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/25/2021 12:49 PM, Aldy Hernandez wrote:

On Mon, Oct 25, 2021 at 8:42 PM Jeff Law  wrote:



On 10/24/2021 12:25 PM, Aldy Hernandez wrote:

On 10/24/21 6:57 PM, Jeff Law wrote:


Ug... we could put the test back, check for some random large
number, and come up with a more satisfactory test later? ;-)

I thought our "counting" based tests could only check equality (ie,
expect to see this string precisely N times).  Though if we could
check that # threads realized was > some low water mark, that'd
probably be better than what we've got right now.

Andrew actually had a patch for a dejagnu construct doing just that
(scan-tree-dump-minimum), but I just noticed it didn't work quite
right for this test.

This is a bit embarrassing, but upon further analysis I've just
noticed that the number of threadable candidates has been exploding
over the year, but the ones that actually make it past the block
copier restrictions plus rewire_first_differing_edge, etc, only
changed by 1 with this patch.  So perhaps we don't need to bend over
backward (just yet anyhow).

I can leave the simple gimple FE test since I've already coded it.
Up to you.

I'd keep the gimple FE test.  I can easily see coming back to this ;-)


How does this look?

Looks good for the trunk to me.

Thanks Jeff.

I will commit the other patch from this series as well as the
testsuite change, both of which you approved.  Also, I was going to
commit the following as obvious until I noticed it depended on the
other patches:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582232.html

Just to be explicit, that patch is fine too.



I think it's now obvious, but if you have an objection, let me know.

It'll be a while, cause I need to test everything again on x86 and
ppc64.  I'm tired of getting mail from CI bots :).

Thanks for your feedback and patience.

Thanks for digging into this stuff.  It's ripe for some developer love.

jeff


Re: Make full use of context-sensitive ranges in access warnings

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/23/2021 5:49 PM, Martin Sebor via Gcc-patches wrote:

Somewhat belatedly following Aldy's lead on finishing
the conversion to Ranger, the attached patch modifies
gimple-ssa-warn-access and other passes that use
the pointer_query machinery to provide Ranger with
the statement it's being called to determine ranges for.
The changes are almost completely mechanical, involving
passing a GIMPLE statement around (and a range_query
pointer) all the way into the bowels of the pointer_query
class to make them available when range info is being
determined.

There might be some overlap with Aldy's tree-ssa-strlen.c
changes to do the same there.  I'll deal with any conflicts
when it comes time to commit the work.

The changes trigger a couple of -Wstringop-overread instances
in libstdc++ tests.  The warnings look valid for the IL but
the code they're in is unreachable.  One of the tests already
suppresses -Wstringop-overflow so also suppressing
-Wstringop-overread doesn't seem out of line.

Tested on x86_64-linux.

Martin

PS The warning for the u8path-char8_t.cc test is this:

/ssd/test/build/gcc-test/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:355: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' reading between 16 and 4611686018427387903 bytes from a region 
of size 10 [-Wstringop-overread]


The IL for it is below.  The loop in BB 3 exits with __i_22 equal
to 10 so BBs 5, 6 and 7 are unreachable.  It's surprising to me
that the loop isn't optimized into something better (like a MEM
array assignment or memcpy).

   [local count: 1073741824]:
  MEM[(struct basic_string *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)]._M_p = _M_local_buf;

   [local count: 8687547547]:
  # __i_109 = PHI <__i_22(3), 0(2)>
  __i_22 = __i_109 + 1;
  _24 = MEM[(const char_type &)"filename2" + __i_22 * 1];
  if (_24 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 1073741824]:   <<< __i_22 == 10 here
  if (__i_22 > 15)
    goto ; [33.00%]
  else
    goto ; [67.00%]

   [local count: 354334802]:
  if (__i_22 > 4611686018427387903)
    goto ; [0.04%]
  else
    goto ; [99.96%]   >>> __i_22 in [16, 4611686018427387903]

   [local count: 141736]:
  std::__throw_length_error ("basic_string::_M_create");

   [local count: 354193066]:
  _85 = __i_109 + 2;
  _42 = operator new (_85);
  s1._M_dataplus._M_p = _42;
  s1.D.30357._M_allocated_capacity = __i_22;
  __builtin_memcpy (_42, "filename2", __i_22);   << -Wstringop-overread


Do you mean __i_22 == 16 earlier?  I don't see how it's restricted to 10.

I would have expected to have a global range for i_22 of [0,16] which in 
turn should have allowed the optimizers to remove bb5 and bb6.  Not sure 
if that'd fix your overread though.


OK.  I'll let you and Aldy coordinate since y'all may be hitting some of 
the same bits.


jeff



Re: [PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant

2021-10-25 Thread Uros Bizjak via Gcc-patches
On Mon, Oct 25, 2021 at 4:16 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> I believe the proposed sequences should be dramatically faster than LLVM's
> implementation(s), due to the large latencies required to move values between
> the vector and scalar parts on modern x86_64 microarchitectures.  All of the
> SSE2 instructions used in the sequences proposed by my patch have single
> cycle latencies, so have a maximum total latency of 5 cycles, though due to
> multiple issue, typically require between 1 and 3 cycles depending upon the
> sequence.
>
> Moving between units is significantly slower; according to Agner Fog's tables,
> the pinsrq/pextrq instructions you suggest have latencies up to 7 cycles on the
> Silvermont architecture.  Let's take the LLVM code you've provided, and
> annotate it with cycle counts for a recent Intel (cascadelake) and a recent AMD
> (zen2) CPU.
>
> movq    %xmm0, %rax         ; 2-3 cycles
> pshufd  $78, %xmm0, %xmm0   ; 1 cycle
> movq    %xmm0, %rcx         ; 2-3 cycles
> shldq   $8, %rax, %rcx      ; 3 cycles
> shlq    $8, %rax            ; 1 cycle
> movq    %rcx, %xmm1         ; 2-3 cycles
> movq    %rax, %xmm0         ; 2-3 cycles
> punpcklqdq  %xmm1, %xmm0    ; 1 cycle
>
> This 8 instruction sequence has a total latency of 14 cycles on CascadeLake and
> 18 cycles on Zen2, but a scheduled cycle count of 9 cycles and 11 cycles
> respectively.
>
> The same left shift by 8 as implemented by the proposed patch is:
>
> pslldq  $1, %xmm0   ; 1 cycle
>
> And for reference, the code currently generated by GCC is:
>
> movaps  %xmm0, -24(%rsp)    ; 3 cycles
> movq    -24(%rsp), %rax     ; 2 cycles
> movq    -16(%rsp), %rdx     ; 2 cycles
> shldq   $8, %rax, %rdx      ; 3 cycles
> salq    $8, %rax            ; 1 cycle
> movq    %rax, -24(%rsp)     ; 2 cycles
> movq    %rdx, -16(%rsp)     ; 2 cycles
> movdqa  -24(%rsp), %xmm0    ; 2 cycles
>
>
> The very worst case timing of my patches is the five instruction rotate:
> pshufd  $78, %xmm0, %xmm1   ; 1 cycle
> pshufd  $57, %xmm0, %xmm0   ; 1 cycle
> pslld   $1, %xmm1           ; 1 cycle
> psrld   $31, %xmm0          ; 1 cycle
> por     %xmm1, %xmm0        ; 1 cycle
>
> which has a 5 cycle total latency, but can complete in 3 cycles when suitably
> scheduled: the two pshufd can execute concurrently, as can the two shifts,
> followed finally by the por.
>
> Perhaps I'm missing something, but I'd expect this patch to be three or
> four times faster, on recent hardware, than the code generated by LLVM.
>
> Let me know if you'd like me to run microbenchmarks, but the documented
> timings are such a dramatic improvement, I'm a little surprised you've
> asked about performance.  My patch is also a code size win with -Os
> (ashl_8 is currently 39 bytes, shrinks to 5 bytes with this patch).
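
A portable C model of the five-instruction rotate quoted above may help when
checking it by hand.  This is only a sketch: the type v128 and the functions
model_pshufd, model_rotate_seq, ref_rotl and same_v128 are made up for
illustration, and the rotate count of 65 bits to the left (equivalently 63 to
the right) is my own reading of the sequence, not something stated in the
message.  The model follows the usual pshufd rule that lane i of the result
is lane ((imm >> 2*i) & 3) of the source:

```c
#include <stdint.h>

/* Four 32-bit lanes, least significant first, modeling an XMM register.  */
typedef struct { uint32_t d[4]; } v128;

/* Model of pshufd: lane i of the result is lane ((imm >> 2*i) & 3).  */
v128
model_pshufd (v128 src, unsigned imm)
{
  v128 r;
  for (int i = 0; i < 4; i++)
    r.d[i] = src.d[(imm >> (2 * i)) & 3];
  return r;
}

/* Model of: pshufd $78; pshufd $57; pslld $1; psrld $31; por.  */
v128
model_rotate_seq (v128 x)
{
  v128 a = model_pshufd (x, 78);   /* lanes 2,3,0,1 */
  v128 b = model_pshufd (x, 57);   /* lanes 1,2,3,0 */
  v128 r;
  for (int i = 0; i < 4; i++)
    r.d[i] = (a.d[i] << 1) | (b.d[i] >> 31);
  return r;
}

/* Reference: rotate the 128-bit value left by n bits, one bit at a time.  */
v128
ref_rotl (v128 x, unsigned n)
{
  v128 r = { { 0, 0, 0, 0 } };
  for (unsigned k = 0; k < 128; k++)
    {
      uint32_t bit = (x.d[k / 32] >> (k % 32)) & 1u;
      unsigned t = (k + n) % 128;
      r.d[t / 32] |= bit << (t % 32);
    }
  return r;
}

/* Lane-by-lane equality.  */
int
same_v128 (v128 a, v128 b)
{
  for (int i = 0; i < 4; i++)
    if (a.d[i] != b.d[i])
      return 0;
  return 1;
}
```

Comparing model_rotate_seq (x) against ref_rotl (x, 65) for a few inputs is
a cheap sanity check of the shuffle immediates before looking at real
hardware timings.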

I was a bit worried about latencies, but as shown above in great
detail, this worry was not justified. Yes, taking into account that
V1TI lives natively in XMM registers, we should keep it there as much
as possible, and even if the sequences look complicated at first
sight, they win in all cases.

So, the patch is OK.

Thanks,
Uros.

> Please let me know what you think.
> Roger
> --
>
> -----Original Message-----
> From: Uros Bizjak 
> Sent: 25 October 2021 09:02
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant
>
> On Sun, Oct 24, 2021 at 6:34 PM Roger Sayle  
> wrote:
> >
> >
> > This patch provides RTL expanders to implement logical shifts and
> > rotates of 128-bit values (stored in vector integer registers) by
> > constant bit counts.  Previously, GCC would transfer these values to a
> > pair of scalar registers (TImode) via memory to perform the operation,
> > then transfer the result back via memory.  Instead these operations
> > are now expanded using (between 1 and 5) SSE2 vector instructions.
>
> Hm, instead of using memory (without STL forwarding for general -> XMM
> moves!) these should use something similar to what clang produces (or use 
> pextrq/pinsrq, at least with SSE4.1):
>
>    movq    %xmm0, %rax
>    pshufd  $78, %xmm0, %xmm0
>    movq    %xmm0, %rcx
>    shldq   $8, %rax, %rcx
>    shlq    $8, %rax
>    movq    %rcx, %xmm1
>    movq    %rax, %xmm0
>    punpcklqdq  %xmm1, %xmm0
>
> > Logical shifts by multiples of 8 can be implemented using x86_64's
> > pslldq/psrldq instruction:
> > ashl_8:
> > pslldq  $1, %xmm0
> > ret
> > lshr_32:
> > psrldq  $4, %xmm0
> > ret
> >
> > Logical shifts by greater than 64 can use pslldq/psrldq $8, followed
> > by a psllq/psrlq for the remaining bits:
> > ashl_111:
> > pslldq  $8, %xmm0
> > psllq   $47, %xmm0
> > ret
> > lshr_127:
> > psrldq  $8, %xmm0
> > psrlq   $63, %xmm0
> > ret
> >
> > The remaining logical shifts 

[committed] libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca

2021-10-25 Thread Tobias Burnus

In PR testsuite/102910 there was some discussion about alloca.h and
whether that header exists or whether 'alloca' is provided by stdlib.h
or ...

Well, some grepping showed that libgomp.oacc-c-c++-common/loop-gwv-2.c
also used 'alloca'. Solution: Do it like other testcases and use
__builtin_alloca. (I think this will fail on nvptx, but otherwise
it should now pass on more systems.)
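
A minimal sketch of the pattern, for readers unfamiliar with it:
__builtin_alloca is a GCC (and Clang) builtin, so no header is needed
to declare it.  The helper names below are invented for illustration
and do not come from the testcase.

```c
#include <assert.h>
#include <string.h>

/* Illustrative helper (not from the testcase): sums 0..n-1 through a
   stack buffer obtained with the compiler builtin, so <alloca.h> is
   not required.  n must be positive and small enough for the stack.  */
static int sum_via_builtin_alloca (int n)
{
  int *buf = (int *) __builtin_alloca (n * sizeof (int));
  memset (buf, 0, n * sizeof (int));
  for (int i = 0; i < n; i++)
    buf[i] = i;

  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}

/* Small usage check: 0 + 1 + 2 + 3 == 6.  */
static void check_sum (void)
{
  assert (sum_via_builtin_alloca (4) == 6);
}
```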

I have only tested it on x86-64, where it passed before and after,
but I do see in gcc-testresults@ fails (even though I don't know
whether those fails are for this issue or others).

Committed as Rev. r12-4691.

Tobias
commit 72dc270be793f159a3a038bef41542d85550b331
Author: Tobias Burnus 
Date:   Mon Oct 25 20:40:13 2021 +0200

libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca

Some systems do not have <alloca.h> but provide alloca differently, e.g.
via stdlib.h. Do it like other testcases do and use __builtin_alloca.

libgomp/ChangeLog:

PR testsuite/102910
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
instead of #include <alloca.h> + alloca.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
index cb3878b8d4e..e73ed6064eb 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -6,7 +6,6 @@
 
 #include 
 #include 
-#include <alloca.h>
 #include 
 #include 
 #include 
@@ -78,9 +77,9 @@ int main ()
 vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
-  gangdist = (int *) alloca (gangsize * sizeof (int));
-  workerdist = (int *) alloca (workersize * sizeof (int));
-  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  gangdist = (int *) __builtin_alloca (gangsize * sizeof (int));
+  workerdist = (int *) __builtin_alloca (workersize * sizeof (int));
+  vectordist = (int *) __builtin_alloca (vectorsize * sizeof (int));
   memset (gangdist, 0, gangsize * sizeof (int));
   memset (workerdist, 0, workersize * sizeof (int));
   memset (vectordist, 0, vectorsize * sizeof (int));


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Aldy Hernandez via Gcc-patches
On Mon, Oct 25, 2021 at 8:42 PM Jeff Law  wrote:
>
>
>
> On 10/24/2021 12:25 PM, Aldy Hernandez wrote:
> > On 10/24/21 6:57 PM, Jeff Law wrote:
> >
> >>> Ugwe could put the test back, check for some random large
> >>> number, and come up with a more satisfactory test later? ;-)
> >> I thought our "counting" based tests could only check equality (ie,
> >> expect to see this string precisely N times).  Though if we could
> >> check that # threads realized was > some low water mark, that'd
> >> probably be better than what we've got right now.
> >
> > Andrew actually had a patch for a dejagnu construct doing just that
> > (scan-tree-dump-minimum), but I just noticed it didn't work quite
> > right for this test.
> >
> > This is a bit embarrassing, but upon further analysis I've just
> > noticed that the number of threadable candidates has been exploding
> > over the year, but the ones that actually make it past the block
> > copier restrictions plus rewire_first_differing_edge, etc, only
> > changed by 1 with this patch.  So perhaps we don't need to bend over
> > backward (just yet anyhow).
> >
> > I can leave the simple gimple FE test since I've already coded it.
> > Up to you.
> I'd keep the gimple FE test.  I can easily see coming back to this ;-)
>
> >
> > How does this look?
> Looks good for the trunk to me.

Thanks Jeff.

I will commit the other patch from this series as well as the
testsuite change, both of which you approved.  Also, I was going to
commit the following as obvious until I noticed it depended on the
other patches:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582232.html

I think it's now obvious, but if you have an objection, let me know.

It'll be a while, cause I need to retest everything again on x86 and
ppc64.  I'm tired of getting mail from CI bots :).

Thanks for your feedback and patience.
Aldy



Re: [PATCH] [PR testsuite/102857] Tweak ssa-dom-thread-7.c for aarch64.

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/23/2021 3:14 AM, Aldy Hernandez wrote:

First, ssa-dom-thread-7 was looking at a dump file that was not
being generated.  This probably happened in the detangling of the VRP
threader from VRP, and I didn't notice because the test came back as
UNRESOLVED instead of FAIL.

Second, aarch64 gets far more threads than other architectures (20
versus 12).  The difference is large enough to make the
regex awkward.

We already have special casing for aarch64 in other parts of this
test, so perhaps it's simplest to have an arch specific test
for the thread3 count.

I don't know, perhaps there's a better way.  I wake up with chills in
the middle of the night thinking about this test ;-).

Tested on x86-64 Linux and aarch64 Linux.

OK?

gcc/testsuite/ChangeLog:

PR testsuite/102857
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Add -fdump-tree-vrp2-stats.
Tweak for aarch64.

OK
jeff



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/24/2021 12:25 PM, Aldy Hernandez wrote:

On 10/24/21 6:57 PM, Jeff Law wrote:

Ugwe could put the test back, check for some random large 
number, and come up with a more satisfactory test later? ;-)
I thought our "counting" based tests could only check equality (ie, 
expect to see this string precisely N times).  Though if we could 
check that # threads realized was > some low water mark, that'd 
probably be better than what we've got right now.


Andrew actually had a patch for a dejagnu construct doing just that 
(scan-tree-dump-minimum), but I just noticed it didn't work quite 
right for this test.


This is a bit embarrassing, but upon further analysis I've just 
noticed that the number of threadable candidates has been exploding 
over the year, but the ones that actually make it past the block 
copier restrictions plus rewire_first_differing_edge, etc, only 
changed by 1 with this patch.  So perhaps we don't need to bend over 
backward (just yet anyhow).


I can leave the simple gimple FE test since I've already coded it.   
Up to you.

I'd keep the gimple FE test.  I can easily see coming back to this ;-)



How does this look?

Looks good for the trunk to me.

jeff



Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-25 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> >>
>> >> int32x4_t foo(int32x4_t x) {
>> >>   return vshlq_s32(x, vdupq_n_s32(256)); }
>> >>
>> >> should fold to “x” (if we fold it at all).  Similarly:
>> >>
>> >> int32x4_t foo(int32x4_t x) {
>> >>   return vshlq_s32(x, vdupq_n_s32(257)); }
>> >>
>> >> should fold to x << 1 (again if we fold it at all).
>> >>
>> >> For a shift right:
>> >>
>> >> int32x4_t foo(int32x4_t x) {
>> >>   return vshlq_s32(x, vdupq_n_s32(-64)); }
>> >>
>> >> is equivalent to:
>> >>
>> >> int32x4_t foo(int32x4_t x) {
>> >>   return vshrq_n_s32(x, 31);
>> >> }
>> >>
>> >> and so it shouldn't fold to 0.
>> >
>> > And here I thought I had read the specs very carefully...
>> >
>> > I will punt on them because I don't think those ranges are common at all.
>> 
>> Sounds good.
>> 
>> There were other review comments further down the message (I should
>> have been clearer about that, sorry).  Could you have a look at those too?
>> 
>
> Yes sorry I had missed those.
>
>> > +  }
>> > +  break;
>> > +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
>> > +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
>> > +  {
>> > +tree cst = args[1];
>> > +tree ctype = TREE_TYPE (cst);
>> > +HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
>> > (args[0])));
>> > +if (INTEGRAL_TYPE_P (ctype)
>> > +&& TREE_CODE (cst) == INTEGER_CST)
>> 
>> I don't think this works, since args[1] is a vector rather than a scalar.  
>> E.g. trying locally:
>
> The _x1_t types are treated as scalar, not vectors, so both are needed.

Ah, yeah, sorry for missing that.
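
For reference, the register-shift behaviour quoted at the top of this
message (counts of 256, 257 and -64) can be modelled per lane in plain
C.  This is only a sketch, assuming, consistently with those examples,
that the shift amount is the signed low byte of the count and that
right shifts are arithmetic and truncating; sshl_lane is a made-up
name, not an intrinsic or anything in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane model of a signed 32-bit register shift.  */
static int32_t sshl_lane (int32_t x, int32_t count)
{
  /* Take the low byte of the count and interpret it as signed.  */
  int shift = (int) ((uint32_t) count & 0xffu);
  if (shift >= 128)
    shift -= 256;

  if (shift >= 0)
    /* Left shift; everything is shifted out at 32 or more.  */
    return shift >= 32 ? 0 : (int32_t) ((uint32_t) x << shift);

  /* Arithmetic right shift; 32 or more leaves only sign bits.  */
  int n = -shift;
  if (n >= 32)
    n = 31;
  uint32_t u = (uint32_t) x >> n;
  if (x < 0)
    u |= ~(UINT32_MAX >> n);   /* replicate the sign bit */
  return (int32_t) u;
}

/* The three cases from the quoted discussion.  */
static void check_sshl_lane (void)
{
  assert (sshl_lane (5, 256) == 5);     /* low byte 0: unchanged */
  assert (sshl_lane (5, 257) == 10);    /* low byte 1: x << 1 */
  assert (sshl_lane (-8, -64) == -1);   /* same as x >> 31 */
}
```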

> My original patch tested the scalar variant which is why this is here.
> I added vector one.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.c
>   (aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
>   ashr_simd, lshr, lshr_simd.
>   * config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
>   * config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
>   vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.
>
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test.
>   * gcc.target/aarch64/signbit-2.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> f6b41d9c200d6300dee65ba60ae94488231a8a38..41da13f82f8cfe0de3c56e62fe884ffabf315ef9
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2394,6 +2394,89 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt)
>  1, args[0]);
>   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>   break;
> +  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   if (TREE_CODE (cst) == INTEGER_CST)
> + {
> +   wide_int wcst = wi::to_wide (cst);
> +   if (wi::geu_p (wi::abs (wcst), element_precision (args[0])))
> + break;
> +
> +   if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +RSHIFT_EXPR, args[0],
> +wide_int_to_tree (ctype,
> +  wi::abs (wcst)));
> +   else
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +LSHIFT_EXPR, args[0], args[1]);
> + }

I don't think we should fold the negative cases here: they're erroneous
in the same way that shifts by precision are.  E.g. clang gives an error
for:

#include <arm_neon.h>

int32x4_t foo(int32x4_t x) {
  return vshlq_n_s32(x, -1);
}

So I think this simplifies to:

if (TREE_CODE (args[1]) == INTEGER_CST
&& wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
   LSHIFT_EXPR, args[0], args[1]);

along similar lines to the shifts right.
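
To spell out why one unsigned comparison is enough here, a small C
sketch follows.  It is not GCC code: can_fold_shift_left is a
hypothetical stand-in for the wi::ltu_p test above, with the count held
in an int64_t instead of a wide_int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guard: compare the count as an
   unsigned value against the element precision.  A negative count
   becomes a huge unsigned number, so it is rejected together with
   counts that are greater than or equal to the precision.  */
static bool can_fold_shift_left (int64_t count, unsigned precision)
{
  return (uint64_t) count < (uint64_t) precision;
}

static void check_fold_guard (void)
{
  assert (can_fold_shift_left (0, 32));
  assert (can_fold_shift_left (31, 32));
  assert (!can_fold_shift_left (32, 32));   /* shift by precision */
  assert (!can_fold_shift_left (-1, 32));   /* negative count */
}
```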

> + }
> + break;
> +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
> +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
> + 

[Fortran] Fix broken use of alloca in C interoperability testcase

2021-10-25 Thread Sandra Loosemore
This patch is for PR102910.  There's no reason why the testcase in 
question needs to use alloca, but I wasn't aware there were portability 
issues with it until I saw this issue.


I think this fix is probably obvious and will commit it tomorrow unless 
I get some feedback on it meanwhile.
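
In outline, the replacement pattern looks like the sketch below.
BUFSIZE and adata match the names used in the patch; get_scratch and
check_scratch are hypothetical helpers added only to make the sketch
self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 512
static char adata[BUFSIZE];

/* Hypothetical helper: hand out the fixed-size static buffer instead
   of allocating n bytes dynamically, aborting if n does not fit.  */
static char *get_scratch (int n)
{
  if (n < 0 || n > BUFSIZE)
    abort ();
  memset (adata, 0, (size_t) n);
  return adata;
}

static void check_scratch (void)
{
  char *p = get_scratch (16);
  assert (p == adata);
  assert (p[15] == 0);
}
```

The trade-off is a hard upper bound on n in exchange for not depending
on alloca at all.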


-Sandra
commit 75b603334401d079391ca950dd2e22663cdb3080
Author: Sandra Loosemore 
Date:   Mon Oct 25 11:08:28 2021 -0700

[Fortran] Fix broken use of alloca in C interoperability testcase

2021-10-25  Sandra Loosemore  

	gcc/testsuite/

	PR testsuite/102910
	* gfortran.dg/c-interop/cf-descriptor-5-c.c: Use a static buffer
	instead of alloca.

diff --git a/gcc/testsuite/gfortran.dg/c-interop/cf-descriptor-5-c.c b/gcc/testsuite/gfortran.dg/c-interop/cf-descriptor-5-c.c
index 12464b5..320a354 100644
--- a/gcc/testsuite/gfortran.dg/c-interop/cf-descriptor-5-c.c
+++ b/gcc/testsuite/gfortran.dg/c-interop/cf-descriptor-5-c.c
@@ -1,6 +1,5 @@
 #include 
 #include 
-#include 
 
 #include 
 #include "dump-descriptors.h"
@@ -8,12 +7,18 @@
 extern void ctest (int n);
 extern void ftest (CFI_cdesc_t *a, int n);
 
+#define BUFSIZE 512
+static char adata[BUFSIZE];
+
 void
 ctest (int n)
 {
   CFI_CDESC_T(0) adesc;
   CFI_cdesc_t *a = (CFI_cdesc_t *) &adesc;
-  char *adata = (char *) alloca (n);
+
+  /* Use a fixed-size static buffer instead of allocating one dynamically.  */
+  if (n > BUFSIZE)
+abort ();
 
   /* Fill in adesc.  */
   check_CFI_status ("CFI_establish",


Re: [PATH][_GLIBCXX_DEBUG] Fix unordered container merge

2021-10-25 Thread François Dumont via Gcc-patches

New patch with the proposed workaround below.

I also slightly changed the _M_merge_multi implementation so that if the 
new hash code computation raises an exception, the node is simply not 
extracted rather than extracted and then released. This way, if it takes 
place on the 1st moved node, the _GLIBCXX_DEBUG mode won't try to 
invalidate anything because the source size won't have changed.


Ok to commit ?

François


On 16/10/21 4:52 pm, Jonathan Wakely wrote:



On Sat, 16 Oct 2021, 14:49 François Dumont via Libstdc++ wrote:


Hi

 Here is the new proposal. My only concern is that we are also using
hash or equal_to functors in the guard destructor.



Can we catch any exception there, invalidate all iterators, and not 
rethrow the exception?



 I am going to enhance merge normal implementation to make use of
the cached hash code when hash functors are the same between the source
and destination of nodes. Maybe I'll be able to make use of it in Debug
implementation too.

François


On 14/10/21 10:23 am, Jonathan Wakely wrote:
> On Wed, 13 Oct 2021 at 18:10, François Dumont via Libstdc++ wrote:
>> Hi
>>
>>       libstdc++: [_GLIBCXX_DEBUG] Implement unordered container
merge
>>
>>       The _GLIBCXX_DEBUG unordered containers need a dedicated
merge
>> implementation
>>       so that any existing iterator on the transfered nodes is
properly
>> invalidated.
>>
>>       Add typedef/using declaration for everything used as-is
from normal
>> implementation.
>>
>>       libstdc++-v3/ChangeLog:
>>
>>               * include/debug/safe_container.h
(_Safe_container<>): Make
>> all methods
>>               protected.
>>               * include/debug/safe_unordered_container.h
>>  (_Safe_unordered_container<>::_M_invalide_all): Make public.
>>  (_Safe_unordered_container<>::_M_invalide_if): Likewise.
>> (_Safe_unordered_container<>::_M_invalide_local_if): Likewise.
>>               * include/debug/unordered_map
>>  (unordered_map<>::mapped_type, pointer, const_pointer): New
>> typedef.
>>               (unordered_map<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_map<>::get_allocator, empty, size, max_size):
>> Add usings.
>>  (unordered_map<>::bucket_count, max_bucket_count, bucket):
>> Add usings.
>>  (unordered_map<>::hash_function, key_equal, count,
>> contains): Add usings.
>>               (unordered_map<>::operator[], at, rehash,
reserve): Add usings.
>>               (unordered_map<>::merge): New.
>>  (unordered_multimap<>::mapped_type, pointer,
>> const_pointer): New typedef.
>>  (unordered_multimap<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_multimap<>::get_allocator, empty, size,
>> max_size): Add usings.
>>  (unordered_multimap<>::bucket_count, max_bucket_count,
>> bucket): Add usings.
>>  (unordered_multimap<>::hash_function, key_equal, count,
>> contains): Add usings.
>>  (unordered_multimap<>::rehash, reserve): Add usings.
>>  (unordered_multimap<>::merge): New.
>>               * include/debug/unordered_set
>>  (unordered_set<>::mapped_type, pointer, const_pointer): New
>> typedef.
>>               (unordered_set<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_set<>::get_allocator, empty, size, max_size):
>> Add usings.
>>  (unordered_set<>::bucket_count, max_bucket_count, bucket):
>> Add usings.
>>  (unordered_set<>::hash_function, key_equal, count,
>> contains): Add usings.
>>               (unordered_set<>::rehash, reserve): Add usings.
>>               (unordered_set<>::merge): New.
>>  (unordered_multiset<>::mapped_type, pointer,
>> const_pointer): New typedef.
>>  (unordered_multiset<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_multiset<>::get_allocator, empty, size,
>> max_size): Add usings.
>>  (unordered_multiset<>::bucket_count, max_bucket_count,
>> bucket): Add usings.
>>  (unordered_multiset<>::hash_function, key_equal, count,
>> contains): Add usings.
>>  (unordered_multiset<>::rehash, reserve): Add usings.
>>  (unordered_multiset<>::merge): New.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge1_neg.cc: New
test.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge2_neg.cc: New
test.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge3_neg.cc: New
test.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge4_neg.cc: New
test.
>>               *
>> 

Re: [PATCH][RFC] Map -ftrapv to -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error

2021-10-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 20, 2021 at 03:22:10PM +0200, Richard Biener via Gcc-patches wrote:
> This maps -ftrapv to -fsanitize=signed-integer-overflow
> -fsanitize-undefined-trap-on-error, effectively removing
> flag_trapv (or rather making it always false).
> 
> This has implications on language support - while -ftrapv
> was formerly universally available the mapping restricts it
> to the C family of frontends.
> 
> It also raises questions on mixing -ftrapv with -fsanitize
> flags, specifically with other recovery options for the
> undefined sanitizer since -fsanitize-undefined-trap-on-error
> cannot be restricted to the signed-integer-overflow part at
> the moment.  To more closely map behavior we could add
> -fsanitize=trapv where with a single option we could also
> simply alias -ftrapv to that.

I think we shouldn't do it this way.
There is no reason not to support it in all FEs, not just C family,
the instrumentation in this case is done in the ubsan pass anyway.
And it also should cope well with different sanitizers, while
-ftrapv vs. -fsanitize=signed-integer-overflow probably needs to be
either/or, so one of those should take precedence over the other,
e.g. -fsanitize=shift -fsanitize-recover=shift -ftrapv should result
in recovering from shift UBs, not trap on them.

My preference would be new set of ifns for -ftrapv, similar to
.UBSAN_CHECK_{ADD,SUB,MUL}, say .TRAPV_CHECK_{ADD,SUB,MUL,DIV},
that uses more or less the same internal-fn.c expansion as .UBSAN_CHECK_*,
but doesn't call ubsan_build_overflow_builtin and rely on
flag_sanitize_undefined_trap_on_error, instead either emits
the trap call directly, or also uses libcalls if optab isn't
available and libcall is (or for -Os cases if libcall is smaller).
Because some of the -fsanitize=signed-integer-overflow emitted code
for multiplication at least on some architectures is very large...
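
At source level, the kind of check being discussed can be sketched
with existing GCC builtins.  This is not the proposed
.TRAPV_CHECK_ADD expansion, only an illustration of "add, then trap on
signed overflow"; the helper names are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + b overflows int, without
   trapping, so the condition can be tested safely.  */
static bool add_would_trap (int a, int b)
{
  int r;
  return __builtin_add_overflow (a, b, &r);
}

/* Hypothetical helper: signed add that traps directly on overflow
   instead of calling a ubsan runtime routine.  */
static int trapping_add (int a, int b)
{
  int r;
  if (__builtin_add_overflow (a, b, &r))
    __builtin_trap ();
  return r;
}

static void check_trapping_add (void)
{
  assert (trapping_add (2, 3) == 5);
  assert (add_would_trap (INT_MAX, 1));
  assert (!add_would_trap (INT_MAX, 0));
}
```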

Jakub



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Jeff Law via Gcc-patches




On 10/25/2021 10:58 AM, Andrew MacLeod wrote:

On 10/20/21 6:28 AM, Aldy Hernandez wrote:

Sometimes we can solve a candidate path without having to recurse
further back.  This can mostly happen in fully resolving mode, because
we can ask the ranger what the range on entry to the path is, but
there's no reason this can't always apply.  This one-liner removes
the fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early
in the compilation.  I actually had to disable jump threading entirely
for a few tests because the early threader was catching things
disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
cleanups happening.

I was going to commit this as obvious, but I think the test changes
merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
some time.  Every time a threading pass gets smarter, we push the
check further down the pipeline.  We've officially run out of dumb
threading passes to disable ;-).  In the last year we've gone up from a
handful of threads, to 34 threads with the current combination of
options.  I doubt this is testing anything useful any more, so I've
removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
jump threads, but they were disallowed because of loop rotation.  Then
we started catching more jump threads in VRP2 threading so we tested
there.  With this patch though, we triple the number of threads found
from 11 to 31.  I believe this test has outlived its usefulness, and
I've removed it.  Note that even though we have these outrageous
possibilities for this test, the block copier ultimately chops them
down (23 survive though).


I'm running into an issue with ssa-dom-thread-4.c when trying to run 
ranger for the VRP2 pass.  It reduces the number of threads to 2, and 
upon closer inspection as to why, I see:


unsigned char
bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
  const_bitmap kill)
{
  unsigned char changed = 0;

  bitmap_element *dst_elt;
  const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;

  while (a_elt || b_elt)
    {

Ranger determines that the uses of a_elt and b_elt in the guard are 
used before defined, so assumed UNDEFINED and removes a condition check.


So it seems like this entire test case is predicated on undefined 
behaviour?  fwiw, If I initialize them, I get 0 threads...

It may have been over-reduced.  It looks like one of mine :-)

I'd argue that in this case the test is compromised and should just be 
removed.


Jeff


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Aldy Hernandez via Gcc-patches
On Mon, Oct 25, 2021 at 6:58 PM Andrew MacLeod  wrote:
>
> On 10/20/21 6:28 AM, Aldy Hernandez wrote:
> > Sometimes we can solve a candidate path without having to recurse
> > further back.  This can mostly happen in fully resolving mode, because
> > we can ask the ranger what the range on entry to the path is, but
> > there's no reason this can't always apply.  This one-liner removes
> > the fully-resolving restriction.
> >
> > I'm tickled pink to see how many things we now get quite early
> > in the compilation.  I actually had to disable jump threading entirely
> > for a few tests because the early threader was catching things
> > disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> > cleanups happening.
> >
> > I was going to commit this as obvious, but I think the test changes
> > merit discussion.
> >
> > We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> > some time.  Every time a threading pass gets smarter, we push the
> > check further down the pipeline.  We've officially run out of dumb
> > threading passes to disable ;-).  In the last year we've gone up from a
> > handful of threads, to 34 threads with the current combination of
> > options.  I doubt this is testing anything useful any more, so I've
> > removed it.
> >
> > Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> > jump threads, but they were disallowed because of loop rotation.  Then
> > we started catching more jump threads in VRP2 threading so we tested
> > there.  With this patch though, we triple the number of threads found
> > from 11 to 31.  I believe this test has outlived its usefulness, and
> > I've removed it.  Note that even though we have these outrageous
> > possibilities for this test, the block copier ultimately chops them
> > down (23 survive though).
>
> I'm running into an issue with ssa-dom-thread-4.c when trying to run
> ranger for the VRP2 pass.  It reduces the number of threads to 2, and
> upon closer inspection as to why, I see:
>
> unsigned char
> bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
>const_bitmap kill)
> {
>unsigned char changed = 0;
>
>bitmap_element *dst_elt;
>const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
>
>while (a_elt || b_elt)
>  {
>
> Ranger determines that the uses of a_elt and b_elt in the guard are used
> before defined, so assumed UNDEFINED and removes a condition check.
>
> So it seems like this entire test case is predicated on undefined
> behaviour?  fwiw, If I initialize them, I get 0 threads...

Hah.  That makes sense.  As the threaders have gotten smarter, the
scan-tree-dump-times has moved later on in the pipeline to less
capable threaders.  Currently it's testing for 3 threads in the first
VRP threader pass.  With my proposed patch I bet we see an UNDEFINED
somewhere in the calculation, and bail on the entire thread as
unreachable.

Aldy



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-25 Thread Andrew MacLeod via Gcc-patches

On 10/20/21 6:28 AM, Aldy Hernandez wrote:

Sometimes we can solve a candidate path without having to recurse
further back.  This can mostly happen in fully resolving mode, because
we can ask the ranger what the range on entry to the path is, but
there's no reason this can't always apply.  This one-liner removes
the fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early
in the compilation.  I actually had to disable jump threading entirely
for a few tests because the early threader was catching things
disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
cleanups happening.

I was going to commit this as obvious, but I think the test changes
merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
some time.  Every time a threading pass gets smarter, we push the
check further down the pipeline.  We've officially run out of dumb
threading passes to disable ;-).  In the last year we've gone up from a
handful of threads, to 34 threads with the current combination of
options.  I doubt this is testing anything useful any more, so I've
removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
jump threads, but they were disallowed because of loop rotation.  Then
we started catching more jump threads in VRP2 threading so we tested
there.  With this patch though, we triple the number of threads found
from 11 to 31.  I believe this test has outlived its usefulness, and
I've removed it.  Note that even though we have these outrageous
possibilities for this test, the block copier ultimately chops them
down (23 survive though).


I'm running into an issue with ssa-dom-thread-4.c when trying to run 
ranger for the VRP2 pass.  It reduces the number of threads to 2, and 
upon closer inspection as to why, I see:


unsigned char
bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
  const_bitmap kill)
{
  unsigned char changed = 0;

  bitmap_element *dst_elt;
  const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;

  while (a_elt || b_elt)
    {

Ranger determines that the uses of a_elt and b_elt in the guard are used 
before defined, so assumed UNDEFINED and removes a condition check.


So it seems like this entire test case is predicated on undefined 
behaviour?  fwiw, If I initialize them, I get 0 threads...


Andrew




Re: [PATCH][RFC] Map -ftrapv to -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error

2021-10-25 Thread Martin Sebor via Gcc-patches

On 10/20/21 7:22 AM, Richard Biener via Gcc-patches wrote:

This maps -ftrapv to -fsanitize=signed-integer-overflow
-fsanitize-undefined-trap-on-error, effectively removing
flag_trapv (or rather making it always false).


It sounds like C/C++ programmers might benefit from this change
but users of the option in other languages would not.  I'm sure
they'd appreciate a heads up on the upcoming removal of a feature
so they could adjust to it.  Issuing a warning would be one way
to give them such a heads up, while keeping the existing behavior
for a release, and then removing it.



This has implications on language support - while -ftrapv
was formerly universally available the mapping restricts it
to the C family of frontends.

It also raises questions on mixing -ftrapv with -fsanitize
flags, specifically with other recovery options for the
undefined sanitizer since -fsanitize-undefined-trap-on-error
cannot be restricted to the signed-integer-overflow part at
the moment.  To more closely map behavior we could add
-fsanitize=trapv where with a single option we could also
simply alias -ftrapv to that.

Code quality wise a simple signed add compiles to

 movl    %edi, %eax
 addl    %esi, %eax
 jo      .L5
 ...
.L5:
 ud2

compared to

 call    __addvsi3

and it has less of the bugs -ftrapv has.  The IL will
not contain a PLUS_EXPR but a .UBSAN_CHECK_ADD internal
function call which has rudimentary support throughout
optimizers but is not recognized as possibly terminating
the program so

int foo (int i, int j, int *p, int k)
{
   int tem = i + j;
   *p = 0;
   if (k)
 return tem;
   return 0;
}

will be optimized to perform the add only conditionally
and the possibly NULL *p dereference first (note the
same happens with the "legacy" -ftrapv).  The behavior
with -fnon-call-exceptions is also different as the
internal functions are marked as not throwing and
as seen above the actual kind of trap can change (SIGILL
vs. SIGABRT).

One question is whether -ftrapv makes signed integer overflow
well-defined (to trap)


Trapping isn't well-defined in the C/C++ sense of the word.
It's still undefined behavior, even if it's documented that
way.  (Same way dereferencing a null pointer is undefined,
even if it results in SIGBUS.)

Martin

 like -fwrapv makes it wrap.  If so

then the above behavior is ill-formed.  Not sure how
sanitizers position themselves with respect to this and
whether the current behavior is OK there.  The patch below
instruments signed integer ops but leaves them undefined
so the compiler still has to be careful as to not introduce
new signed overflow (but at least that won't trap).
Currently -fwrapv -fsanitize=signed-integer-overflow will
not instrument any signed operations for example.

I do consider the option to simply make -ftrapv do nothing
but warn that people should use UBSAN - that wouldn't
imply semantics are 1:1 the same (which they are not).

Bootstrapped and tested on x86_64-unknown-linux-gnu, regresses

FAIL: gcc.dg/vect/trapv-vect-reduc-4.c scan-tree-dump-times vect "Detected reduction." 3
FAIL: gcc.dg/vect/trapv-vect-reduc-4.c scan-tree-dump-times vect "using an in-order (fold-left) reduction" 1
FAIL: gcc.dg/vect/trapv-vect-reduc-4.c scan-tree-dump-times vect "vectorized 3 loops" 1

where the vectorizer doesn't know the UBSAN IFNs.

2021-10-20  Richard Biener  

* opts.c (common_handle_option): Handle -ftrapv like
-fsanitize=signed-integer-overflow
-fsanitize-undefined-trap-on-error and do not set
flag_trapv.
---
  gcc/opts.c | 16 +++++++++++++++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/opts.c b/gcc/opts.c
index 65fe192a198..909d2a031ff 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -3022,7 +3022,21 @@ common_handle_option (struct gcc_options *opts,
  
  case OPT_ftrapv:

if (value)
-   opts->x_flag_wrapv = 0;
+   {
+ opts->x_flag_wrapv = 0;
+ opts->x_flag_sanitize
+   = parse_sanitizer_options ("signed-integer-overflow",
+  loc, code, opts->x_flag_sanitize,
+  value, false);
+ if (!opts_set->x_flag_sanitize_undefined_trap_on_error)
+   opts->x_flag_sanitize_undefined_trap_on_error = 1;
+ /* This keeps overflow undefined and not trap.  Specifically
+it does no longer allow to catch exceptions together with
+-fnon-call-exceptions.  It also makes -ftrapv cease to
+work with non-C-family languages since ubsan only works for
+those.  */
+ opts->x_flag_trapv = 0;
+   }
break;
  
  case OPT_fstrict_overflow:






Re: [PATCH 3/N] Come up with casm global state.

2021-10-25 Thread Segher Boessenkool
Hi!

On Mon, Oct 25, 2021 at 03:36:25PM +0200, Martin Liška wrote:
> --- a/gcc/config/rs6000/rs6000-internal.h
> +++ b/gcc/config/rs6000/rs6000-internal.h
> @@ -189,4 +189,13 @@ extern bool rs6000_passes_vector;
>  extern bool rs6000_returns_struct;
>  extern bool cpu_builtin_p;
>  
> +struct rs6000_asm_out_state : public asm_out_state
> +{
> +  /* Initialize ELF sections. */
> +  void init_elf_sections ();
> +
> +  /* Initialize XCOFF sections. */
> +  void init_xcoff_sections ();
> +};

Our coding convention says to use "class", not "struct" (since this
isn't valid C code at all).

> -  sdata2_section
> + sec.sdata2
>  = get_unnamed_section (SECTION_WRITE, output_section_asm_op,
>  SDATA2_SECTION_ASM_OP);

(broken indentation)

> +/* Implement TARGET_ASM_INIT_SECTIONS.  */

That comment is out-of-date.

> +static asm_out_state *
> +rs6000_elf_asm_init_sections (void)
> +{
> +  rs6000_asm_out_state *target_state
> += new (ggc_alloc ()) rs6000_asm_out_state ();

Hrm, maybe we can have a macro or function that does this, ggc_new or
something?

> +/* Implement TARGET_ASM_INIT_SECTIONS.  */
> +
> +static asm_out_state *
> +rs6000_xcoff_asm_init_sections (void)

Here, too.  Both implementations are each one of several functions that
together implement the target macro.

> +/* The section that holds the DWARF2 frame unwind information, when known.
> +   The section is set either by the target's init_sections hook or by the
> +   first call to switch_to_eh_frame_section.  */
> +section *eh_frame;
> +
> +/* RS6000 sections.  */

Nothing here?  Just remove the comment header?

The idea looks fine to me.


Segher


RE: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-25 Thread Tamar Christina via Gcc-patches
> >>
> >> int32x4_t foo(int32x4_t x) {
> >>   return vshlq_s32(x, vdupq_n_s32(256)); }
> >>
> >> should fold to “x” (if we fold it at all).  Similarly:
> >>
> >> int32x4_t foo(int32x4_t x) {
> >>   return vshlq_s32(x, vdupq_n_s32(257)); }
> >>
> >> should fold to x << 1 (again if we fold it at all).
> >>
> >> For a shift right:
> >>
> >> int32x4_t foo(int32x4_t x) {
> >>   return vshlq_s32(x, vdupq_n_s32(-64)); }
> >>
> >> is equivalent to:
> >>
> >> int32x4_t foo(int32x4_t x) {
> >>   return vshrq_n_s32(x, 31);
> >> }
> >>
> >> and so it shouldn't fold to 0.
> >
> > And here I thought I had read the specs very carefully...
> >
> > I will punt on them because I don't think those ranges are common at all.
> 
> Sounds good.
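
For reference, the three quoted cases (256, 257 and -64) follow from how I understand the per-element behaviour of the signed register shift: only the low byte of the shift operand is used, it is read as a signed value, and a negative value means a truncating right shift.  A minimal scalar sketch of that reading follows; the function name sshl_s32_model is made up for illustration and this is not code from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of a signed register
   shift.  Only the low 8 bits of SHIFT are used, interpreted as a
   signed byte.  */
static int32_t sshl_s32_model(int32_t x, int32_t shift)
{
    int s = shift & 0xff;
    uint32_t ux = (uint32_t)x;

    if (s >= 128)
        s -= 256;               /* sign-extend the low byte */

    if (s >= 0) {
        if (s >= 32)
            return 0;           /* every bit is shifted out */
        /* Shift as unsigned to avoid signed-overflow UB; the conversion
           back wraps with GCC.  */
        return (int32_t)(ux << s);
    }

    int n = -s;
    if (n > 31)
        n = 31;                 /* only copies of the sign bit remain */
    /* Arithmetic right shift written without relying on >> of a
       negative value.  */
    return (int32_t)(x < 0 ? ~(~ux >> n) : ux >> n);
}
```

With this model a shift operand of 256 leaves x unchanged, 257 gives x << 1, and -64 gives the same result as an arithmetic right shift by 31, matching the examples quoted above.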
> 
> There were other review comments further down the message (I should
> have been clearer about that, sorry).  Could you have a look at those too?
> 

Yes sorry I had missed those.

> > +   }
> > +   break;
> > +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
> > +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
> > +   {
> > + tree cst = args[1];
> > + tree ctype = TREE_TYPE (cst);
> > + HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE (args[0])));
> > + if (INTEGRAL_TYPE_P (ctype)
> > + && TREE_CODE (cst) == INTEGER_CST)
> 
> I don't think this works, since args[1] is a vector rather than a scalar.  
> E.g. trying locally:

The _x1_t types are treated as scalars, not vectors, so both are needed.
My original patch tested the scalar variant, which is why this is here.
I have added the vector one.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
ashr_simd, lshr, lshr_simd.
* config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
* config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test.
* gcc.target/aarch64/signbit-2.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index f6b41d9c200d6300dee65ba60ae94488231a8a38..41da13f82f8cfe0de3c56e62fe884ffabf315ef9 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2394,6 +2394,89 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ if (TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ if (wi::geu_p (wi::abs (wcst), element_precision (args[0])))
+   break;
+
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  RSHIFT_EXPR, args[0],
+  wide_int_to_tree (ctype,
+wi::abs (wcst)));
+ else
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   break;
+  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ /* Left shifts can be both scalar or vector, e.g. uint64x1_t is
+treated as a scalar type not a vector one.  */
+ if ((VECTOR_INTEGER_TYPE_P (ctype)
+  && uniform_vector_p (cst))
+ || TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst;
+ tree unit_ty;
+ if (TREE_CODE (cst) == INTEGER_CST)
+   {
+ wcst = wi::to_wide (cst);
+ unit_ty = TREE_TYPE (cst);
+   }
+ else
+   {
+ tree tmp = vector_cst_elt (cst, 0);
+ wcst = wi::to_wide (tmp);

Re: [PATCH] libcody: add mostlyclean Makefile target

2021-10-25 Thread Eric Gallager via Gcc-patches
On Mon, Oct 25, 2021 at 7:35 AM Martin Liška  wrote:
>
> Hello.
>
> The patch adds missing Makefile mostlyclean.
>
> Ready to be installed?
> Thanks,
> Martin
>

Generally the way the various "*clean" targets are arranged, in order
of cleanliness, from least clean to most clean, is:
mostlyclean
clean
distclean
maintainer-clean
...with each target depending on the previous one in the order. So,
instead of mostlyclean depending on clean, it'd be the other way
around, with clean depending on mostlyclean. See how the gcc/
subdirectory does it, for example. See the "Standard Targets for
Users" section of the GNU Coding Standards:
https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets

> PR other/102657
>
> libcody/ChangeLog:
>
> * Makefile.in: Add mostlyclean Makefile target.
> ---
>   libcody/Makefile.in | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/libcody/Makefile.in b/libcody/Makefile.in
> index b8b45a2e310..d8f1e8216d4 100644
> --- a/libcody/Makefile.in
> +++ b/libcody/Makefile.in
> @@ -111,7 +111,7 @@ maintainer-clean:: distclean
>   clean::
> rm -f $(shell find $(srcdir) -name '*~')
>
> -.PHONY: all check clean distclean maintainer-clean
> +.PHONY: all check clean distclean maintainer-clean mostlyclean
>
>   CXXFLAGS/ := -I$(srcdir)
>   LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
> @@ -127,6 +127,8 @@ clean::
> rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
> rm -f libcody.a
>
> +mostlyclean: clean
> +
>   CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"'
>
>   fatal.o: Makefile revision
> --
> 2.33.1
>


Re: [PATCH 3/N] Come up with casm global state.

2021-10-25 Thread Segher Boessenkool
On Mon, Oct 25, 2021 at 12:46:30PM +0200, Richard Biener wrote:
> On Thu, Oct 21, 2021 at 5:42 PM Segher Boessenkool
>  wrote:
> > It's disgusting, and fragile.  The define is slightly better than having
> > to write it out every time.  But can this not be done properly?
> >
> > If you use object-oriented stuff and need casts for that, you are doing
> > something wrong.
> 
> I think the "proper" fix would be to make 'casm' have the correct type
> in the first place

+1

> - of course that would either mean that the target
> needs to provide the (possibly derived) type, for example via a typedef
> in the target structure or a classical target macro.  If gengtype would
> know about inheritance that would also fix the GC marking issue.

Do we want gengtype to do inheritance, will it solve more future
problems, or will it *cause* more problems later?

> Of course encoding a static type in the target structure is the
> wrong direction from making targets "switchable".

Yes.

> Other than that my C++ fu is too weak to suggest the correct "pattern"
> here.

Virtual functions something something?


Segher


Re: how does vrp2 rearrange this?

2021-10-25 Thread Andrew MacLeod via Gcc-patches

On 10/21/21 3:59 PM, Andrew Pinski wrote:

On Thu, Oct 21, 2021 at 8:04 AM Andrew MacLeod  wrote:

On 10/19/21 7:13 PM, Andrew Pinski wrote:

On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod  wrote:

On 10/19/21 5:13 PM, Andrew Pinski wrote:

On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
 wrote:

using testcase ifcvt-4.c:


typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
  word i = x;
  word j = y;
  /* Try to make taking the branch likely.  */
  __builtin_expect (x > y, 1);
  if (x > y)
{
  i = a;
  j = i;
}
  return i * j;


The testcase is broken anyways.
The builtin_expect should be inside the if to have any effect.  Look
at the estimated values:
 if (x_3(D) > y_4(D))
   goto ; [50.00%]<<-- has been reversed.
 else
   goto ; [50.00%]
;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;3 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)

See how it is 50/50?
The testcase is not even testing what it says it is testing.  Just
happened to work previously does not mean anything.  Move the
builtin_expect inside the if and try again. I am shocked it took this
long to find the testcase issue really.

Thanks,
Andrew Pinski


Moving the expect around doesn't change anything, in fact, it makes it
worse since fre and evrp immediately eliminate it as true if it is in
the THEN block.

I think you misunderstood the change I was saying to do.
Try this:
typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
   word i = x;
   word j = y;
   /* Try to make taking the branch likely.  */
   if (__builtin_expect (x > y, 1))
 {
   i = a;
   j = i;
 }
   return i * j;
}
/* { dg-final { scan-rtl-dump "2 true changes made" "ce1" } } */

This should fix the "estimated values" to be more correct.

Thanks,
Andrew Pinski


estimated values are now correct, but it changes nothing.  I think you 
are right that it could be an artifact of going in and out of ASSERT_EXPRs.


;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 0, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:   ENTRY [always]  count:1073741824 (estimated locally) 
(FALLTHRU,EXECUTABLE)

  if (x_6(D) > y_7(D))
    goto ; [90.00%]
  else
    goto ; [10.00%]
;;    succ:   3 [90.0% (guessed)]  count:966367640 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;    5 [10.0% (guessed)]  count:107374184 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)


;;   basic block 5, loop depth 0, count 107374184 (estimated locally), 
maybe hot

;;    prev block 2, next block 3, flags: (NEW)
;;    pred:   2 [10.0% (guessed)]  count:107374184 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)

  x_2 = ASSERT_EXPR ;
  y_3 = ASSERT_EXPR = x_2>;
  goto ; [100.00%]
;;    succ:   4 [always]  count:107374184 (estimated locally) (FALLTHRU)

;;   basic block 3, loop depth 0, count 966367640 (estimated locally), 
maybe hot

;;    prev block 5, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:   2 [90.0% (guessed)]  count:966367640 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)

  x_1 = ASSERT_EXPR  y_7(D)>;
  y_12 = ASSERT_EXPR ;
;;    succ:   4 [always]  count:966367640 (estimated locally) 
(FALLTHRU,EXECUTABLE)


;;   basic block 4, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:   5 [always]  count:107374184 (estimated locally) (FALLTHRU)
;;    3 [always]  count:966367640 (estimated locally) 
(FALLTHRU,EXECUTABLE)

  # i_4 = PHI 
  # j_5 = PHI 
  _9 = i_4 * j_5;
  # VUSE <.MEM_10(D)>
  return _9;

nothing is done, and upon removing the ASSERTs, it becomes:

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:   ENTRY [always]  count:1073741824 (estimated locally) 
(FALLTHRU,EXECUTABLE)

  if (x_6(D) > y_7(D))
    goto ; [90.00%]
  else
    goto ; [10.00%]
;;    succ:   4 [90.0% (guessed)]  count:966367640 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;    3 [10.0% (guessed)]  count:107374184 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)


;;   basic block 3, loop depth 0, count 107374184 (estimated locally), 
maybe hot

;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:   2 [10.0% (guessed)]  count:107374184 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:   4 [always]  count:107374184 (estimated locally) 
(FALLTHRU,EXECUTABLE)


;;   basic block 4, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:   3 [always]  count:107374184 (estimated locally) 
(FALLTHRU,EXECUTABLE)
;;

[Ada] Remove gnatfind and gnatxref

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
These tools are no longer maintained and have never supported project
files. They are replaced by the Ada Language Server which implements the
Language Server Protocol.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/Make-lang.in, gcc-interface/Makefile.in: Remove
gnatfind and gnatxref.

diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -116,7 +116,7 @@ ADA_FLAGS_TO_PASS = \
 
 # List of Ada tools to build and install
 ADA_TOOLS=gnatbind gnatchop gnat gnatkr gnatlink gnatls gnatmake \
-  gnatname gnatprep gnatxref gnatfind gnatclean
+  gnatname gnatprep gnatclean
 
 # Say how to compile Ada programs.
 .SUFFIXES: .ada .adb .ads
@@ -903,8 +903,7 @@ doc/gnat-style.pdf: ada/gnat-style.texi $(gcc_docdir)/include/fdl.texi
 # (cross). $(prefix) comes from the --program-prefix configure option,
 # or from the --target option if the former is not specified.
 # Do the same for the rest of the Ada tools (gnatchop, gnat, gnatkr,
-# gnatlink, gnatls, gnatmake, gnatname, gnatprep, gnatxref, gnatfind,
-# gnatclean).
+# gnatlink, gnatls, gnatmake, gnatname, gnatprep, gnatclean).
 # gnatdll is only used on Windows.
 ada.install-common: $(gnat_install_lib) gnat-install-tools
 
@@ -975,8 +974,6 @@ ada.distclean:
 	-$(RM) gnatmake$(exeext)
 	-$(RM) gnatname$(exeext)
 	-$(RM) gnatprep$(exeext)
-	-$(RM) gnatfind$(exeext)
-	-$(RM) gnatxref$(exeext)
 	-$(RM) gnatclean$(exeext)
 	-$(RM) ada/rts/*
 	-$(RMDIR) ada/rts


diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -449,7 +449,7 @@ gnattools2: ../stamp-tools
 common-tools: ../stamp-tools
 	$(GNATMAKE) -j0 -c -b $(ADA_INCLUDES) \
 	  --GNATBIND="$(GNATBIND)" --GCC="$(CC) $(ALL_ADAFLAGS)" \
-	  gnatchop gnatcmd gnatkr gnatls gnatprep gnatxref gnatfind gnatname \
+	  gnatchop gnatcmd gnatkr gnatls gnatprep gnatname \
 	  gnatclean -bargs $(ADA_INCLUDES) $(GNATBIND_FLAGS)
 	$(GNATLINK) -v gnatcmd -o ../../gnat$(exeext) \
 	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
@@ -461,10 +461,6 @@ common-tools: ../stamp-tools
 	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
 	$(GNATLINK) -v gnatprep -o ../../gnatprep$(exeext) \
 	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
-	$(GNATLINK) -v gnatxref -o ../../gnatxref$(exeext) \
-	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
-	$(GNATLINK) -v gnatfind -o ../../gnatfind$(exeext) \
-	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
 	$(GNATLINK) -v gnatname -o ../../gnatname$(exeext) \
 	  --GCC="$(CC) $(ADA_INCLUDES)" --LINK="$(GCC_LINK)" $(TOOLS_LIBS)
 	$(GNATLINK) -v gnatclean -o ../../gnatclean$(exeext) \




[Ada] Spurious error on user-defined literal and operator

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
This patch improves the handling of the Ada_2022 aspect involving
user-defined literals on integers, reals, and strings, when the literal
that must be converted to a type (for which the aspect is defined)
appears as an operand of a predefined operator. The target type may be
given by the type of the context, or by the type of another operand of
the operator.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch4.adb (Has_Possible_Literal_Aspects): If analysis of an
operator node fails to find a possible interpretation, and one
of its operands is a literal or a named number, assign to the
node the corresponding class type (Any_Integer, Any_String,
etc).
(Operator_Check): Call it before emitting a type error.
* sem_res.adb (Has_Applicable_User_Defined_Literal): Given a
literal and a type, determine whether the type has a
user_defined aspect that can apply to the literal, and rewrite
the node as a call to the corresponding function. Most of the code
was previously in procedure Resolve.
(Try_User_Defined_Literal): Check operands of a predefined
operator that fails to resolve, and apply
Has_Applicable_User_Defined_Literal to literal operands if any,
to find if a conversion will allow the operator to resolve
properly.
(Resolve): Call the above when a literal or an operator with a
literal operand fails to resolve.

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -281,6 +281,19 @@ package body Sem_Ch4 is
--  type is not directly visible. The routine uses this type to emit a more
--  informative message.
 
+   function Has_Possible_Literal_Aspects (N : Node_Id) return Boolean;
+   --  Ada_2022: if an operand is a literal it may be subject to an
+   --  implicit conversion to a type for which a user-defined literal
+   --  function exists. During the first pass of type resolution we do
+   --  not know the context imposed on the literal, so we assume that
+   --  the literal type is a valid candidate and rely on the second pass
+   --  of resolution to find the type with the proper aspect. We only
+   --  add this interpretation if no other one was found, which may be
+   --  too restrictive but seems sufficient to handle most proper uses
+   --  of the new aspect. It is unclear whether a full implementation of
+   --  these aspects can be achieved without larger modifications to the
+   --  two-pass resolution algorithm.
+
procedure Remove_Abstract_Operations (N : Node_Id);
--  Ada 2005: implementation of AI-310. An abstract non-dispatching
--  operation is not a candidate interpretation.
@@ -7541,6 +7554,9 @@ package body Sem_Ch4 is
 then
return;
 
+elsif Has_Possible_Literal_Aspects (N) then
+   return;
+
 --  If we have a logical operator, one of whose operands is
 --  Boolean, then we know that the other operand cannot resolve to
 --  Boolean (since we got no interpretations), but in that case we
@@ -7857,6 +7873,69 @@ package body Sem_Ch4 is
   end if;
end Operator_Check;
 
+   ----------------------------------
+   -- Has_Possible_Literal_Aspects --
+   ----------------------------------
+
+   function Has_Possible_Literal_Aspects (N : Node_Id) return Boolean is
+  R : constant Node_Id := Right_Opnd (N);
+  L : Node_Id := Empty;
+
+  procedure Check_Literal_Opnd (Opnd : Node_Id);
+  --  If an operand is a literal to which an aspect may apply,
+  --  add the corresponding type to operator node.
+
+  ------------------------
+  -- Check_Literal_Opnd --
+  ------------------------
+
+  procedure Check_Literal_Opnd (Opnd : Node_Id) is
+  begin
+ if Nkind (Opnd) in N_Numeric_Or_String_Literal
+   or else (Is_Entity_Name (Opnd)
+ and then Present (Entity (Opnd))
+ and then Is_Named_Number (Entity (Opnd)))
+ then
+Add_One_Interp (N, Etype (Opnd), Etype (Opnd));
+ end if;
+  end Check_Literal_Opnd;
+
+   --  Start of processing for Has_Possible_Literal_Aspects
+
+   begin
+  if Ada_Version < Ada_2022 then
+ return False;
+  end if;
+
+  if Nkind (N) in N_Binary_Op then
+ L := Left_Opnd (N);
+  else
+ L := Empty;
+  end if;
+  Check_Literal_Opnd (R);
+
+  --  Check left operand only if right one did not provide a
+  --  possible interpretation. Note that literal types are not
+  --  overloadable, in the sense that there is no overloadable
+  --  entity name whose several interpretations can be used to
+  --  indicate possible resulting types, so there is no way to
+  --  provide more than one interpretation to the operator node.
+  --  The choice of one operand over the other is arbitrary at
+  --  

[Ada] Follow-on cleanups for Uint fields

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Subsequent to prior major cleanups of Uint fields, this patch includes a
few more, fairly minor, cleanups.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* freeze.adb (Freeze_Fixed_Point_Type): Remove
previously-inserted test for Uint_0; no longer needed.
* gen_il-gen.ads: Improve comments.
* repinfo.adb (Rep_Value): Use Ubool type for B.
* repinfo.ads (Node_Ref): Use Unegative type.
(Node_Ref_Or_Val): Document that values of this type can be
No_Uint.
* exp_disp.adb (Make_Disp_Requeue_Body): Minor comment fix.
* sem_ch3.adb: Likewise.
* sem_ch8.adb: Likewise.
* sinfo-utils.adb (End_Location): End_Span can never be No_Uint,
so remove the "if No (L)" test.
* uintp.adb (Image_String): Use "for ... of" loop.
* uintp.ads (Unegative): New type for negative integers.  We
give it a long name (unlike Unat and Upos) because it is rarely
used.

diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb
--- a/gcc/ada/exp_disp.adb
+++ b/gcc/ada/exp_disp.adb
@@ -3037,7 +3037,7 @@ package body Exp_Disp is
begin
   pragma Assert (not Restriction_Active (No_Dispatching_Calls));
 
-  --  Null body is generated for interface types and non-concurrent
+  --  Null body is generated for interface types and nonconcurrent
   --  tagged types.
 
   if Is_Interface (Typ)


diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -9500,9 +9500,7 @@ package body Freeze is
  Minsiz : constant Uint := UI_From_Int (Minimum_Size (Typ));
 
   begin
- if Known_RM_Size (Typ)
-   and then RM_Size (Typ) /= Uint_0
- then
+ if Known_RM_Size (Typ) then
 if RM_Size (Typ) < Minsiz then
Error_Msg_Uint_1 := RM_Size (Typ);
Error_Msg_Uint_2 := Minsiz;


diff --git a/gcc/ada/gen_il-gen.ads b/gcc/ada/gen_il-gen.ads
--- a/gcc/ada/gen_il-gen.ads
+++ b/gcc/ada/gen_il-gen.ads
@@ -204,9 +204,22 @@ package Gen_IL.Gen is
--  Gen_IL.Fields, and delete all occurrences from Gen_IL.Gen.Gen_Entities.
 
--  If a field is not set, it is initialized by default to whatever value is
-   --  represented by all-zero bits, with two exceptions: Elist fields default
-   --  to No_Elist, and Uint fields default to Uint_0. In retrospect, it would
-   --  have been better to use No_Uint instead of Uint_0.
+   --  represented by all-zero bits, with some exceptions. This means Flags are
+   --  initialized to False, Node_Ids and List_Ids are initialized to Empty,
+   --  and enumeration fields are initialized to 'First of the type (assuming
+   --  there is no representation clause).
+   --
+   --  Elists default to No_Elist.
+   --
+   --  Fields of type Uint (but not its subtypes) are initialized to No_Uint.
+   --  Fields of subtypes Valid_Uint, Unat, Upos, Nonzero_Uint, and Ureal have
+   --  no default; it is an error to call a getter before calling the setter.
+   --  Likewise, other types whose range does not include zero have no default
+   --  (see package Types for the ranges).
+   --
+   --  If a node is created by a function in Nmake, then the defaults are
+   --  different from what is specified above. The parameters of Make_...
+   --  functions can have defaults specified; see Create_Syntactic_Field.
 
procedure Create_Node_Union_Type
  (T : Abstract_Node; Children : Type_Array);


diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -2120,7 +2120,7 @@ package body Repinfo is
 
function Rep_Value (Val : Node_Ref_Or_Val; D : Discrim_List) return Uint is
 
-  function B (Val : Boolean) return Uint;
+  function B (Val : Boolean) return Ubool;
   --  Returns Uint_0 for False, Uint_1 for True
 
   function T (Val : Node_Ref_Or_Val) return Boolean;
@@ -2141,7 +2141,7 @@ package body Repinfo is
   -- B --
   -------
 
-  function B (Val : Boolean) return Uint is
+  function B (Val : Boolean) return Ubool is
   begin
  if Val then
 return Uint_1;


diff --git a/gcc/ada/repinfo.ads b/gcc/ada/repinfo.ads
--- a/gcc/ada/repinfo.ads
+++ b/gcc/ada/repinfo.ads
@@ -118,12 +118,12 @@ package Repinfo is
--  this field is done only in -gnatR3 mode, and in other modes, the value
--  is set to Uint_Minus_1.
 
-   subtype Node_Ref is Uint;
+   subtype Node_Ref is Unegative;
--  Subtype used for negative Uint values used to represent nodes
 
subtype Node_Ref_Or_Val is Uint;
-   --  Subtype used for values that can either be a Node_Ref (negative)
-   --  or a value (non-negative)
+   --  Subtype used for values that can be a Node_Ref (negative) or a value
+   --  (non-negative) or No_Uint.
 
type TCode is range 0 .. 27;
--  Type used on Ada side to represent DEFTREECODE values defined in
@@ -306,7 +306,7 @@ package Repinfo is
--  In the 

[Ada] Change format of the ?? warning insertion sequence

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Update all ?X? to ?.x? (likewise for  [-gnatwx]
* ?_x? -> [-gnatw_x]
* ?.x? -> [-gnatw.x]

With the support of the ?_x? insertion sequences, messages that related
to -gnatw_a, -gnatw_c, -gnatw_p, -gnatw_r are now correctly advertised as
relating to these.
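
As a rough illustration of the mapping listed above, here is a small C sketch that turns an insertion sequence into the tag shown to the user.  It is not how Errout or Erroutc implement it; the helper name warning_tag is invented, and the plain "?x?" -> "[-gnatwx]" case is assumed by analogy with the two mappings spelled out above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map "?x?", "?_x?" or "?.x?" to "[-gnatwx]",
   "[-gnatw_x]" or "[-gnatw.x]".  Returns 0 on success and -1 if SEQ
   does not have one of those shapes or BUF is too small.  */
static int warning_tag(const char *seq, char *buf, size_t buflen)
{
    size_t len = strlen(seq);
    int n;

    if (len < 3 || len > 4 || seq[0] != '?' || seq[len - 1] != '?')
        return -1;
    if (len == 4 && seq[1] != '_' && seq[1] != '.')
        return -1;

    /* Copy the one or two characters between the question marks.  */
    n = snprintf(buf, buflen, "[-gnatw%.*s]", (int)(len - 2), seq + 1);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```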

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* errout.adb (Skip_Msg_Insertion_Warning): Adapt and format as
Erroutc.Prescan_Message.Parse_Message_Class.
(Warn_Insertion): Adapt to new format.
* errout.ads: Update documentation.
* erroutc.adb (Get_Warning_Tag): Adapt to new format.
(Prescan_Message): Introduce Parse_Message_Class function.
(Validate_Specific_Warnings): Update ?W? to ?.w?.
* erroutc.ads: Update type and documentation.
* checks.adb (Validity_Check_Range): Update ?X? to ?.x?.
* exp_ch11.adb (Possible_Local_Raise): Update ?X? to ?.x?.
(Warn_If_No_Local_Raise): Likewise.
(Warn_If_No_Propagation): Likewise.
(Warn_No_Exception_Propagation_Active): Likewise.
* exp_ch4.adb (Expand_N_Allocator): Attach warning message to
-gnatw_a.
* exp_prag.adb (Expand_Pragma_Check): Update ?A? to ?.a?.
* exp_util.adb (Activate_Atomic_Synchronization): Update ?N? to
?.n?.
(Add_Invariant_Check): Update ?L? to ?.l?.
* freeze.adb (Check_Suspicious_Modulus): Update ?M? to ?.m?.
(Freeze_Entity): Update ?T? to ?.t?, ?Z? to ?.z?.
* par-util.adb (Warn_If_Standard_Redefinition): Update ?K? to
?.k?.
* sem_attr.adb (Min_Max): Update ?U? to ?.u?.
* sem_ch13.adb (Adjust_Record_For_Reverse_Bit_Order): Update ?V?
to ?.v?.
(Adjust_Record_For_Reverse_Bit_Order_Ada_95): Update ?V? to ?.v?.
(Component_Size_Case): Update ?S? to ?.s?.
(Analyze_Record_Representation_Clause): Update ?S? to ?.s? and
?C? to ?.c?.
(Add_Call): Update ?L? to ?.l?.
(Component_Order_Check): Attach warning message to -gnatw_r.
(Check_Component_List): Update ?H? to ?.h?.
(Set_Biased): Update ?B? to ?.b?.
* sem_ch3.adb (Modular_Type_Declaration): Update ?M? to ?.m?.
* sem_ch4.adb (Analyze_Mod): Update ?M? to ?.m?.
(Analyze_Quantified_Expression): Update ?T? to ?.t?.
* sem_ch6.adb (Check_Conformance): Attach warning message to
-gnatw_p.
(List_Inherited_Pre_Post_Aspects): Update ?L? to ?.l?.
* sem_ch7.adb (Unit_Requires_Body_Info): Update ?Y? to ?.y?.
* sem_ch8.adb (Analyze_Object_Renaming): Update ?R? to ?.r?.
* sem_prag.adb (Validate_Compile_Time_Warning_Or_Error): Attach
warning message to -gnatw_c.
* sem_res.adb (Check_Argument_Order): Update ?P? to ?.p?.
(Resolve_Comparison_Op): Update ?U? to ?.u?.
(Resolve_Range): Update ?U? to ?.u?.
(Resolve_Short_Circuit): Update ?A? to ?.a?.
(Resolve_Unary_Op): Update ?M? to ?.m?.
* sem_util.adb (Check_Result_And_Post_State): Update ?T? to ?.t?.
* sem_warn.adb (Output_Unused_Warnings_Off_Warnings): Update ?W?
to ?.w?.
* warnsw.ads: Update documentation for -gnatw_c.

patch.diff.gz
Description: application/gzip


[Ada] Fix bugs in Base_Type_Only (etc.) fields

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
If a field has Type_Only set to something other than No_Type_Only, then
we need to fetch the field from a possibly different node. For example,
the Modulus field has Type_Only = Base_Type_Only (and is documented as a
"[base type only]" field in Einfo). Therefore if we try to get Modulus
from node N, we must actually get it from Base_Type(N), not from N.

This was working correctly for the normal getters generated by Gen_IL.
However, when using Field_Descriptors to fetch fields (see package
Seinfo), the Type_Only aspect was ignored. This patch fixes that bug.
Treepr is the main place where Field_Descriptors are used to fetch
fields, so the effect of the bug was mainly to cause Treepr to print
wrong information.
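
To make the redirection concrete, here is a small sketch in C rather than Ada.  It is only a model of the idea described above, not the Atree code: the struct layout, the two-value enum and the function names are simplified assumptions, with node_to_fetch_from named after the new Ada function.

```c
#include <stddef.h>

/* Simplified stand-in for Type_Only_Enum; the real type has more
   values.  */
enum type_only { NO_TYPE_ONLY, BASE_TYPE_ONLY };

/* Hypothetical node: a base type points to itself.  */
struct node {
    struct node *base_type;
    unsigned modulus;
};

/* Hypothetical descriptor: where the field lives and whether it is
   stored on the base type only.  */
struct field_descriptor {
    size_t offset;
    enum type_only type_only;
};

/* Pick the node that actually holds the field.  */
static const struct node *
node_to_fetch_from(const struct node *n, const struct field_descriptor *d)
{
    return d->type_only == BASE_TYPE_ONLY ? n->base_type : n;
}

/* Descriptor-based getter.  Reading from N directly instead of going
   through node_to_fetch_from is the kind of bug described above.  */
static unsigned
get_field(const struct node *n, const struct field_descriptor *d)
{
    const struct node *nn = node_to_fetch_from(n, d);

    return *(const unsigned *)((const char *)nn + d->offset);
}
```

With a subtype whose own modulus slot is unset, the descriptor-based getter returns the base type's value only when the descriptor carries the base-type-only marking.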

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gen_il-gen.adb (Put_Seinfo): Generate type
Seinfo.Type_Only_Enum based on type
Gen_IL.Internals.Type_Only_Enum. Automatically generating a copy
of the type will help keep them in sync.  (Note that there are
no Ada compiler packages imported into Gen_IL.)  Add a Type_Only
field to Field_Descriptor, so this information is available in
the Ada compiler (as opposed to just in the Gen_IL "compiler").
(One_Comp): Add initialization of the Type_Only field of
Field_Descriptor.
* gen_il-internals.ads (Image): Image function for
Type_Only_Enum.
* atree.ads (Node_To_Fetch_From): New function to compute which
node to fetch from, based on the Type_Only aspect.
* atree.adb (Get_Field_Value): Call Node_To_Fetch_From.
* treepr.adb (Print_Entity_Field): Call Node_To_Fetch_From.
(Print_Node_Field): Assert.
* sinfo-utils.adb (Walk_Sinfo_Fields,
Walk_Sinfo_Fields_Pairwise): Asserts.

diff --git a/gcc/ada/atree.adb b/gcc/ada/atree.adb
--- a/gcc/ada/atree.adb
+++ b/gcc/ada/atree.adb
@@ -854,14 +854,15 @@ package body Atree is
  (N : Node_Id; Field : Node_Or_Entity_Field) return Field_Size_32_Bit
is
   Desc : Field_Descriptor renames Field_Descriptors (Field);
+  NN : constant Node_Or_Entity_Id := Node_To_Fetch_From (N, Field);
 
begin
   case Field_Size (Desc.Kind) is
- when 1 => return Field_Size_32_Bit (Get_1_Bit_Val (N, Desc.Offset));
- when 2 => return Field_Size_32_Bit (Get_2_Bit_Val (N, Desc.Offset));
- when 4 => return Field_Size_32_Bit (Get_4_Bit_Val (N, Desc.Offset));
- when 8 => return Field_Size_32_Bit (Get_8_Bit_Val (N, Desc.Offset));
- when others => return Get_32_Bit_Val (N, Desc.Offset);  -- 32
+ when 1 => return Field_Size_32_Bit (Get_1_Bit_Val (NN, Desc.Offset));
+ when 2 => return Field_Size_32_Bit (Get_2_Bit_Val (NN, Desc.Offset));
+ when 4 => return Field_Size_32_Bit (Get_4_Bit_Val (NN, Desc.Offset));
+ when 8 => return Field_Size_32_Bit (Get_8_Bit_Val (NN, Desc.Offset));
+ when others => return Get_32_Bit_Val (NN, Desc.Offset);  -- 32
   end case;
end Get_Field_Value;
 


diff --git a/gcc/ada/atree.ads b/gcc/ada/atree.ads
--- a/gcc/ada/atree.ads
+++ b/gcc/ada/atree.ads
@@ -47,6 +47,7 @@
 with Alloc;
 with Sinfo.Nodes;use Sinfo.Nodes;
 with Einfo.Entities; use Einfo.Entities;
+with Einfo.Utils;use Einfo.Utils;
 with Types;  use Types;
 with Seinfo; use Seinfo;
 with System; use System;
@@ -616,6 +617,20 @@ package Atree is
--  always the same; for example we change from E_Void, to E_Variable, to
--  E_Void, to E_Constant.
 
+   function Node_To_Fetch_From
+ (N : Node_Or_Entity_Id; Field : Node_Or_Entity_Field)
+ return Node_Or_Entity_Id is
+  (case Field_Descriptors (Field).Type_Only is
+ when No_Type_Only => N,
+ when Base_Type_Only => Base_Type (N),
+ when Impl_Base_Type_Only => Implementation_Base_Type (N),
+ when Root_Type_Only => Root_Type (N));
+   --  This is analogous to the same-named function in Gen_IL.Gen. Normally,
+   --  Type_Only is No_Type_Only, and we fetch the field from the node N. But
+   --  if Type_Only = Base_Type_Only, we need to go to the Base_Type, and
+   --  similarly for the other two cases. This can return something other
+   --  than N only if N is an Entity.
+
-
-- Private Part Subpackage --
-


diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb
--- a/gcc/ada/gen_il-gen.adb
+++ b/gcc/ada/gen_il-gen.adb
@@ -2157,7 +2157,8 @@ package body Gen_IL.Gen is
 
   Put (S, F_Image (F) & " => (" &
Image (Field_Table (F).Field_Type) & "_Field, " &
-   Image (Offset) & ")");
+   Image (Offset) & ", " &
+   Image (Field_Table (F).Type_Only) & ")");
 
   FS := Field_Size (F);
   FB := First_Bit (F, Offset);
@@ -2252,10 +2253,32 @@ package body Gen_IL.Gen is
  Decrease_Indent 

[Ada] Simplify iteration of record components when expanding equality

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Replace a confusing loop with two exit statements by a straightforward
while loop with an explicit condition. Also, explicitly iterate over
discriminants and components, not over entities.
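
The shape of the simplified loop can be sketched in C as follows. This is an illustration only: `struct component` and its `skip` flag are hypothetical stand-ins for the Ada entity chain and for the "inherited, _Tag or interface" test, and nothing here is taken from exp_ch4.adb beyond the control flow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a record component or discriminant. */
struct component {
    bool skip;                 /* inherited, _Tag, or interface component */
    struct component *next;    /* next component or discriminant */
};

/* Return the first element at or after c that must be compared,
   or NULL when none is left. */
static struct component *element_to_compare(struct component *c)
{
    while (c != NULL) {
        if (c->skip)
            c = c->next;
        else
            return c;
    }
    return NULL;
}
```

The single loop condition plus an early return replaces the two separate exit tests, and the "nothing left" case becomes an explicit final return.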

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch4.adb (Expand_Composite_Equality): Fix style.
(Element_To_Compare): Simplify loop.
(Expand_Record_Equality): Adapt calls to Element_To_Compare.

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -2583,7 +2583,7 @@ package body Exp_Ch4 is
 
  return
Make_Function_Call (Loc,
- Name => New_Occurrence_Of (Eq_Op, Loc),
+ Name   => New_Occurrence_Of (Eq_Op, Loc),
  Parameter_Associations =>
New_List
  (Unchecked_Convert_To (Etype (First_Formal (Eq_Op)), Lhs),
@@ -2606,7 +2606,7 @@ package body Exp_Ch4 is
begin
   return
 Make_Function_Call (Loc,
-  Name  => New_Occurrence_Of (Eq_Op, Loc),
+  Name   => New_Occurrence_Of (Eq_Op, Loc),
   Parameter_Associations => New_List (
 OK_Convert_To (T, Lhs),
 OK_Convert_To (T, Rhs)));
@@ -13116,41 +13116,35 @@ package body Exp_Ch4 is
   
 
   function Element_To_Compare (C : Entity_Id) return Entity_Id is
- Comp : Entity_Id;
+ Comp : Entity_Id := C;
 
   begin
- Comp := C;
- loop
---  Exit loop when the next element to be compared is found, or
---  there is no more such element.
-
-exit when No (Comp);
-
-exit when Ekind (Comp) in E_Discriminant | E_Component
-  and then not (
+ while Present (Comp) loop
+--  Skip inherited components
 
-  --  Skip inherited components
+--  Note: for a tagged type, we always generate the "=" primitive
+--  for the base type (not on the first subtype), so the test for
+--  Comp /= Original_Record_Component (Comp) is True for inherited
+--  components only.
 
-  --  Note: for a tagged type, we always generate the "=" primitive
-  --  for the base type (not on the first subtype), so the test for
-  --  Comp /= Original_Record_Component (Comp) is True for
-  --  inherited components only.
-
-  (Is_Tagged_Type (Typ)
+if (Is_Tagged_Type (Typ)
 and then Comp /= Original_Record_Component (Comp))
 
-  --  Skip _Tag
+--  Skip _Tag
 
   or else Chars (Comp) = Name_uTag
 
-  --  Skip interface elements (secondary tags???)
-
-  or else Is_Interface (Etype (Comp)));
+--  Skip interface elements (secondary tags???)
 
-Next_Entity (Comp);
+  or else Is_Interface (Etype (Comp))
+then
+   Next_Component_Or_Discriminant (Comp);
+else
+   return Comp;
+end if;
  end loop;
 
- return Comp;
+ return Empty;
   end Element_To_Compare;
 
--  Start of processing for Expand_Record_Equality
@@ -13166,7 +13160,7 @@ package body Exp_Ch4 is
   --and then Lhs.Cmpn = Rhs.Cmpn
 
   Result := New_Occurrence_Of (Standard_True, Loc);
-  C := Element_To_Compare (First_Entity (Typ));
+  C := Element_To_Compare (First_Component_Or_Discriminant (Typ));
   while Present (C) loop
  declare
 New_Lhs : Node_Id;
@@ -13224,7 +13218,7 @@ package body Exp_Ch4 is
  end;
 
  First_Time := False;
- C := Element_To_Compare (Next_Entity (C));
+ C := Element_To_Compare (Next_Component_Or_Discriminant (C));
   end loop;
 
   return Result;




[Ada] Relax INOX restrictions when casing on composite value.

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
When casing on a composite value, certain component types/subtypes were
previously disallowed. These included access types, real types,
nonstatic discrete subtypes, and others. This restriction is relaxed so
that such components are now allowed, but no non-box value may be
specified for such a component in a case choice.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_case.adb (Composite_Case_Ops.Box_Value_Required): A new
function which takes a component type and returns a Boolean.
Returns True for the cases which were formerly forbidden as
components (these checks were formerly performed in the
now-deleted procedure
Check_Composite_Case_Selector.Check_Component_Subtype).
(Composite_Case_Ops.Normalized_Case_Expr_Type): Hoist this
function out of the Array_Case_Ops package because it has been
generalized to also do the analogous thing in the case of a
discriminated type.
(Composite_Case_Ops.Scalar_Part_Count): Return 0 if
Box_Value_Required returns True for the given type/subtype.

(Composite_Case_Ops.Choice_Analysis.Choice_Analysis.Component_Bounds_Info.
Traverse_Discrete_Parts): Return without doing anything if
Box_Value_Required returns True for the given type/subtype.
(Composite_Case_Ops.Choice_Analysis.Parse_Choice.Traverse_Choice):
If Box_Value_Required yields True for a given component type,
then check that the value of that component in a choice
expression is indeed a box (in which case the component is
ignored).
* doc/gnat_rm/implementation_defined_pragmas.rst: Update
documentation.
* gnat_rm.texi: Regenerate.

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
--- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -2268,9 +2268,24 @@ of GNAT specific extensions are recognized as follows:
   set shall be a proper subset of the second (and the later alternative
   will not be executed if the earlier alternative "matches"). All possible
   values of the composite type shall be covered. The composite type of the
-  selector shall be a nonlimited untagged (but possibly discriminated)
-  record type, all of whose subcomponent subtypes are either static discrete
-  subtypes or record types that meet the same restrictions.
+  selector shall be an array or record type that is neither limited
+  class-wide.
+
+  If a subcomponent's subtype does not meet certain restrictions, then
+  the only value that can be specified for that subcomponent in a case
+  choice expression is a "box" component association (which matches all
+  possible values for the subcomponent). This restriction applies if
+
+  - the component subtype is not a record, array, or discrete type; or
+
+  - the component subtype is subject to a non-static constraint or
+has a predicate; or
+
+  - the component type is an enumeration type that is subject to an
+enumeration representation clause; or
+
+  - the component type is a multidimensional array type or an
+array type with a nonstatic index subtype.
 
   Support for casing on arrays (and on records that contain arrays) is
   currently subject to some restrictions. Non-positional


diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -21,7 +21,7 @@
 
 @copying
 @quotation
-GNAT Reference Manual , Sep 28, 2021
+GNAT Reference Manual , Oct 25, 2021
 
 AdaCore
 
@@ -3707,9 +3707,32 @@ overlaps the corresponding set of a later alternative, then the first
 set shall be a proper subset of the second (and the later alternative
 will not be executed if the earlier alternative “matches”). All possible
 values of the composite type shall be covered. The composite type of the
-selector shall be a nonlimited untagged (but possibly discriminated)
-record type, all of whose subcomponent subtypes are either static discrete
-subtypes or record types that meet the same restrictions.
+selector shall be an array or record type that is neither limited
+class-wide.
+
+If a subcomponent’s subtype does not meet certain restrictions, then
+the only value that can be specified for that subcomponent in a case
+choice expression is a “box” component association (which matches all
+possible values for the subcomponent). This restriction applies if
+
+
+@itemize -
+
+@item 
+the component subtype is not a record, array, or discrete type; or
+
+@item 
+the component subtype is subject to a non-static constraint or
+has a predicate; or
+
+@item 
+the component type is an enumeration type that is subject to an
+enumeration representation clause; or
+
+@item 
+the component type is a multidimensional array type or an
+array type with a nonstatic index subtype.
+@end itemize
 
 Support for casing on arrays (and 

[Ada] Update the inactive GMP variant of Big_Integers

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
The GMP variant of Ada.Numerics.Big_Numbers.Big_Integers is currently
not used, but since we keep it, it seems worth keeping it up to date
with respect to the corresponding spec.

Part of providing a gdb pretty-printer for Big_Integer objects.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-nbnbin__gmp.adb (From_String): Fix predicate
mismatch between subprogram declaration and body.

diff --git a/gcc/ada/libgnat/a-nbnbin__gmp.adb b/gcc/ada/libgnat/a-nbnbin__gmp.adb
--- a/gcc/ada/libgnat/a-nbnbin__gmp.adb
+++ b/gcc/ada/libgnat/a-nbnbin__gmp.adb
@@ -327,7 +327,7 @@ package body Ada.Numerics.Big_Numbers.Big_Integers is
-- From_String --
-
 
-   function From_String (Arg : String) return Big_Integer is
+   function From_String (Arg : String) return Valid_Big_Integer is
   function mpz_set_str
 (this : access mpz_t;
  str  : System.Address;




[Ada] Make Declaration_Node return nondeclarations in fewer cases

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
This patch changes Declaration_Node to avoid returning certain strange
node kinds. We don't avoid them all (in particular N_Null_Statement),
but we document what it's returning with a pragma Assert.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* einfo-utils.adb (Declaration_Node): Avoid returning the
following node kinds: N_Assignment_Statement, N_Integer_Literal,
N_Procedure_Call_Statement, N_Subtype_Indication, and
N_Type_Conversion.  Assert that the result is in N_Is_Decl or
empty.
* gen_il-gen-gen_nodes.adb (N_Is_Decl): Modify to match the
things that Declaration_Node can return.

diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -698,6 +698,30 @@ package body Einfo.Utils is
  P := Empty;
   end if;
 
+  --  Declarations are sometimes removed by replacing them with other
+  --  irrelevant nodes. For example, a declare expression can be turned
+  --  into a literal by constant folding. In these cases we want to
+  --  return Empty.
+
+  if Nkind (P) in
+  N_Assignment_Statement
+| N_Integer_Literal
+| N_Procedure_Call_Statement
+| N_Subtype_Indication
+| N_Type_Conversion
+  then
+ P := Empty;
+  end if;
+
+  --  The following Assert indicates what kinds of nodes can be returned;
+  --  they are not all "declarations".
+
+  if Serious_Errors_Detected = 0 then
+ pragma Assert
+   (Nkind (P) in N_Is_Decl | N_Empty,
+"Declaration_Node incorrect kind: " & Node_Kind'Image (Nkind (P)));
+  end if;
+
   return P;
end Declaration_Node;
 


diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -1675,16 +1675,29 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Union (N_Is_Decl,
   Children =>
-(N_Declaration,
+(N_Aggregate,
+ N_Block_Statement,
+ N_Declaration,
  N_Discriminant_Specification,
+ N_Entry_Index_Specification,
  N_Enumeration_Type_Definition,
  N_Exception_Handler,
+ N_Explicit_Dereference,
+ N_Expression_With_Actions,
+ N_Extension_Aggregate,
+ N_Identifier,
+ N_Iterated_Component_Association,
  N_Later_Decl_Item,
+ N_Loop_Statement,
+ N_Null_Statement,
+ N_Number_Declaration,
  N_Package_Specification,
  N_Parameter_Specification,
  N_Renaming_Declaration,
- N_Subprogram_Specification));
-   --  Nodes that can be returned by Declaration_Node
+ N_Quantified_Expression));
+   --  Nodes that can be returned by Declaration_Node; it can also return
+   --  Empty. Not all of these are true "declarations", but Declaration_Node
+   --  can return them in some cases.
 
Union (N_Is_Range,
   Children =>




[Ada] Global contracts on expression functions in Ada.Strings.Superbounded

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
For consistency, add Global => null contracts also to expression
functions in the Ada.Strings.Superbounded package.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-strsup.ads (Super_Length, Super_Element,
Super_Slice): Add Global contracts.

diff --git a/gcc/ada/libgnat/a-strsup.ads b/gcc/ada/libgnat/a-strsup.ads
--- a/gcc/ada/libgnat/a-strsup.ads
+++ b/gcc/ada/libgnat/a-strsup.ads
@@ -76,7 +76,8 @@ package Ada.Strings.Superbounded with SPARK_Mode is
--  that they can be renamed in Ada.Strings.Bounded.Generic_Bounded_Length.
 
function Super_Length (Source : Super_String) return Natural
-   is (Source.Current_Length);
+   is (Source.Current_Length)
+   with Global => null;
 

-- Conversion, Concatenation, and Selection Functions --
@@ -620,7 +621,8 @@ package Ada.Strings.Superbounded with SPARK_Mode is
is (if Index <= Source.Current_Length
then Source.Data (Index)
else raise Index_Error)
-   with Pre => Index <= Super_Length (Source);
+   with Pre=> Index <= Super_Length (Source),
+Global => null;
 
procedure Super_Replace_Element
  (Source : in out Super_String;
@@ -649,8 +651,9 @@ package Ada.Strings.Superbounded with SPARK_Mode is
   --  get the null string in accordance with normal Ada slice rules.
 
   String (Source.Data (Low .. High)))
-   with Pre => Low - 1 <= Super_Length (Source)
- and then High <= Super_Length (Source);
+   with Pre=> Low - 1 <= Super_Length (Source)
+and then High <= Super_Length (Source),
+Global => null;
 
function Super_Slice
  (Source : Super_String;




[Ada] Simplify detection of a parent interface equality

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Replace subtle conditions on First_Entity/Last_Entity with a
straightforward Number_Formals/First_Formal/Last_Formal.

Code cleanup related to handling of dispatching equality in SPARK.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Predefined_Primitive_Bodies): Simplify detection
of existing equality operator.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -10972,16 +10972,13 @@ package body Exp_Ch3 is
  while Present (Prim) loop
 if Chars (Node (Prim)) = Name_Op_Eq
   and then not Is_Internal (Node (Prim))
-  and then Present (First_Entity (Node (Prim)))
 
   --  The predefined equality primitive must have exactly two
-  --  formals whose type is this tagged type
+  --  formals whose type is this tagged type.
 
-  and then Present (Last_Entity (Node (Prim)))
-  and then Next_Entity (First_Entity (Node (Prim)))
- = Last_Entity (Node (Prim))
-  and then Etype (First_Entity (Node (Prim))) = Tag_Typ
-  and then Etype (Last_Entity (Node (Prim))) = Tag_Typ
+  and then Number_Formals (Node (Prim)) = 2
+  and then Etype (First_Formal (Node (Prim))) = Tag_Typ
+  and then Etype (Last_Formal (Node (Prim))) = Tag_Typ
 then
Eq_Needed := False;
Eq_Name := No_Name;




[Ada] Remove redundant guard in expansion of dispatching calls

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Routine Predefined_Primitive_Bodies, which creates predefined primitives
for derived tagged types, is only called with non-interface type
entities (which is even enforced with an assertion at the very start of
its body). There is no need to recheck this condition when creating
individual primitive operations related to tasking and equality.

Code cleanup related to handling of dispatching equality in SPARK.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Predefined_Primitive_Bodies): Remove redundant
conditions related to interface types.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -11102,7 +11102,6 @@ package body Exp_Ch3 is
   --  they may be ancestors of synchronized interface types).
 
   if Ada_Version >= Ada_2005
-and then not Is_Interface (Tag_Typ)
 and then
   ((Is_Interface (Etype (Tag_Typ))
  and then Is_Limited_Record (Etype (Tag_Typ)))
@@ -11123,7 +11122,7 @@ package body Exp_Ch3 is
  Append_To (Res, Make_Disp_Timed_Select_Body(Tag_Typ));
   end if;
 
-  if not Is_Limited_Type (Tag_Typ) and then not Is_Interface (Tag_Typ) then
+  if not Is_Limited_Type (Tag_Typ) then
 
  --  Body for equality
 




[Ada] Do not expect execv to return 0

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
When spawning subprocesses with fork & execv there is no need to check
the result of execv, because it either succeeds and does not return or
fails and returns -1.

This is only a code cleanup related to the use of fork-vs-vfork in
GNATprove; behaviour is unaffected, though the GNAT runtime library
will be marginally smaller.
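
As a minimal sketch of the pattern (not the adaint.c code itself), the child branch can call execv unconditionally and fall through to _exit. The helper name `spawn_and_wait` is hypothetical, and the MAYBE_TO_PTR32 conversion used in adaint.c is left out.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn args[0] and wait for it; returns the child's exit status,
   or -1 if fork/waitpid fails or the child did not exit normally. */
static int spawn_and_wait(char *const args[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The child. */
        execv(args[0], args);

        /* execv() returns only on error. */
        _exit(1);
    }

    /* The parent. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a path that cannot be executed, execv fails in the child, control reaches `_exit (1)`, and the parent sees exit status 1. No comparison of the return value is needed to get that behaviour.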

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* adaint.c (__gnat_portable_spawn): Do not expect execv to
return 0.
(__gnat_portable_no_block_spawn): Likewise.

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -2424,8 +2424,10 @@ __gnat_portable_spawn (char *args[] ATTRIBUTE_UNUSED)
   if (pid == 0)
 {
   /* The child. */
-  if (execv (args[0], MAYBE_TO_PTR32 (args)) != 0)
-	_exit (1);
+  execv (args[0], MAYBE_TO_PTR32 (args));
+
+  /* execv() returns only on error */
+  _exit (1);
 }
 
   /* The parent.  */
@@ -2822,8 +2824,10 @@ __gnat_portable_no_block_spawn (char *args[] ATTRIBUTE_UNUSED)
   if (pid == 0)
 {
   /* The child.  */
-  if (execv (args[0], MAYBE_TO_PTR32 (args)) != 0)
-	_exit (1);
+  execv (args[0], MAYBE_TO_PTR32 (args));
+
+  /* execv() returns only on error */
+  _exit (1);
 }
 
   return pid;




[Ada] Initialize variable to Empty

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
CodePeer was warning about this variable being potentially used without
being initialized.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch8.adb (Analyze_Subprogram_Renaming): Set New_S to Empty.

diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb
--- a/gcc/ada/sem_ch8.adb
+++ b/gcc/ada/sem_ch8.adb
@@ -3099,7 +3099,7 @@ package body Sem_Ch8 is
   --  related formal type is class-wide.
 
   Inst_Node: Node_Id   := Empty;
-  New_S: Entity_Id;
+  New_S: Entity_Id := Empty;
   Wrapped_Prim : Entity_Id := Empty;
 
--  Start of processing for Analyze_Subprogram_Renaming




[Ada] Reference in Unbounded_String is almost never null

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
There are two variants of the Ada.Strings.Unbounded_String package, with
and without atomic reference counters. The underlying pointer is never
null in one variant (and had a null-excluding type) and almost never
null in the other variant (and now has a null-excluding type as well).

Cleanup related to sync of contracts for GNATprove between both variants
of the package.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-strunb.ads (Unbounded_String): Reference is never
null.
* libgnat/a-strunb.adb (Finalize): Copy reference while it needs
to be deallocated.

diff --git a/gcc/ada/libgnat/a-strunb.adb b/gcc/ada/libgnat/a-strunb.adb
--- a/gcc/ada/libgnat/a-strunb.adb
+++ b/gcc/ada/libgnat/a-strunb.adb
@@ -505,8 +505,14 @@ package body Ada.Strings.Unbounded is
   --  Note: Don't try to free statically allocated null string
 
   if Object.Reference /= Null_String'Access then
- Deallocate (Object.Reference);
- Object.Reference := Null_Unbounded_String.Reference;
+ declare
+Reference_Copy : String_Access := Object.Reference;
+--  The original reference cannot be null, so we must create a
+--  copy which will become null when deallocated.
+ begin
+Deallocate (Reference_Copy);
+Object.Reference := Null_Unbounded_String.Reference;
+ end;
  Object.Last := 0;
   end if;
end Finalize;


diff --git a/gcc/ada/libgnat/a-strunb.ads b/gcc/ada/libgnat/a-strunb.ads
--- a/gcc/ada/libgnat/a-strunb.ads
+++ b/gcc/ada/libgnat/a-strunb.ads
@@ -746,8 +746,8 @@ private
  renames To_Unbounded_String;
 
type Unbounded_String is new AF.Controlled with record
-  Reference : String_Access := Null_String'Access;
-  Last  : Natural   := 0;
+  Reference : not null String_Access := Null_String'Access;
+  Last  : Natural:= 0;
end record with Put_Image => Put_Image;
 
procedure Put_Image




[Ada] Don't expect enumeration literals to be renamings

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
When using cross-reference information to get subprogram effects in
SPARK (i.e. its reads and writes), we were calling Renamed_Object on
enumeration literals. This was pointless but harmless; now it rightly
triggers an assertion failure in developer builds, so avoid that.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-xref.adb (Get_Through_Renamings): Exit loop when an
enumeration literal is found.

diff --git a/gcc/ada/lib-xref.adb b/gcc/ada/lib-xref.adb
--- a/gcc/ada/lib-xref.adb
+++ b/gcc/ada/lib-xref.adb
@@ -481,7 +481,9 @@ package body Lib.Xref is
--  e.g. function call, slicing of a function call,
--  pointer dereference, etc.
 
-   if No (Obj) then
+   if No (Obj)
+ or else Ekind (Obj) = E_Enumeration_Literal
+   then
   return Empty;
end if;
 else




[Ada] Shutdown codepeer message

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
The message is a false positive, caused by confusion over the expanded
code for pragma Loop_Variant.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-widthu.adb: Add pragma Annotate.

diff --git a/gcc/ada/libgnat/s-widthu.adb b/gcc/ada/libgnat/s-widthu.adb
--- a/gcc/ada/libgnat/s-widthu.adb
+++ b/gcc/ada/libgnat/s-widthu.adb
@@ -134,10 +134,13 @@ begin
  W := W + 1;
  Pow := Pow * 10;
 
- pragma Loop_Variant (Decreases => T);
  pragma Loop_Invariant (W in 3 .. Max_W + 3);
  pragma Loop_Invariant (Pow = Big_10 ** (W - 2));
  pragma Loop_Invariant (Big (T) = Big (T_Init) / Pow);
+ pragma Loop_Variant (Decreases => T);
+ pragma Annotate
+   (CodePeer, False_Positive,
+"validity check", "confusion on generated code");
   end loop;
 
   declare




[Ada] Ada 2022: Class-wide types and formal abstract subprograms

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Ada 2022 specifies that when the controlling type of a formal abstract
subprogram declaration is a formal type, and the actual type is a
class-wide type T'Class, the actual subprogram can be an implicitly
declared subprogram corresponding to a primitive operation of type T
(AI12-0165-1/05).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch8.adb (Build_Class_Wide_Wrapper): Previous version split
in two subprograms to factorize its functionality:
Find_Suitable_Candidate, and Build_Class_Wide_Wrapper. These
routines are also placed in the new subprogram
Handle_Instance_With_Class_Wide_Type.
(Handle_Instance_With_Class_Wide_Type): New subprogram that
encapsulates all the code that handles instantiations with
class-wide types.
(Analyze_Subprogram_Renaming): Adjust code to invoke the new
nested subprogram Handle_Instance_With_Class_Wide_Type; adjust
documentation.

patch.diff.gz
Description: application/gzip


[Ada] Renamed_Or_Alias cleanup

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
There are three "fields" that are aliases for the Renamed_Or_Alias
field: Alias, Renamed_Entity, and Renamed_Object. The getters and
setters were (mis)used more or less interchangeably, in violation of the
comments.

This patch adds assertions to enforce the comments, and changes all of
the call sites to obey the comments, except for some call sites of
[Set_]Renamed_Object involving front end inlining and generation of
debug information, which are too complicated to fix and which are well
isolated from the other uses.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* einfo-utils.ads, einfo-utils.adb (Alias, Set_Alias,
Renamed_Entity, Set_Renamed_Entity, Renamed_Object,
Set_Renamed_Object): Add assertions that reflect how these are
supposed to be used and what they are supposed to return.
(Renamed_Entity_Or_Object): New getter.
(Set_Renamed_Object_Of_Possibly_Void): Setter that allows N to
be E_Void.
* checks.adb (Ensure_Valid): Use Renamed_Entity_Or_Object
because this is called for both cases.
* exp_dbug.adb (Debug_Renaming_Declaration): Use
Renamed_Entity_Or_Object because this is called for both cases.
Add assertions.
* exp_util.adb (Possible_Bit_Aligned_Component): Likewise.
* freeze.adb (Freeze_All_Ent): Likewise.
* sem_ch5.adb (Within_Function): Likewise.
* exp_attr.adb (Calculate_Header_Size): Call Renamed_Entity
instead of Renamed_Object.
* exp_ch11.adb (Expand_N_Raise_Statement): Likewise.
* repinfo.adb (Find_Declaration): Likewise.
* sem_ch10.adb (Same_Unit, Process_Spec_Clauses,
Analyze_With_Clause, Install_Parents): Likewise.
* sem_ch12.adb (Build_Local_Package, Needs_Body_Instantiated,
Build_Subprogram_Renaming, Check_Formal_Package_Instance,
Check_Generic_Actuals, In_Enclosing_Instance,
Denotes_Formal_Package, Process_Nested_Formal,
Check_Initialized_Types, Map_Formal_Package_Entities,
Restore_Nested_Formal): Likewise.
* sem_ch6.adb (Report_Conflict): Likewise.
* sem_ch8.adb (Analyze_Exception_Renaming,
Analyze_Generic_Renaming, Analyze_Package_Renaming,
Is_Primitive_Operator_In_Use, Declared_In_Actual,
Note_Redundant_Use): Likewise.
* sem_warn.adb (Find_Package_Renaming): Likewise.
* sem_elab.adb (Ultimate_Variable): Call Renamed_Object instead
of Renamed_Entity.
* exp_ch6.adb (Get_Function_Id): Call
Set_Renamed_Object_Of_Possibly_Void, because the defining
identifier is still E_Void at this point.
* sem_util.adb (Function_Call_Or_Allocator_Level): Likewise.
Remove redundant (unreachable) code.
(Is_Object_Renaming, Is_Valid_Renaming): Call Renamed_Object
instead of Renamed_Entity.
(Get_Fullest_View): Call Renamed_Entity instead of
Renamed_Object.
(Copy_Node_With_Replacement): Call
Set_Renamed_Object_Of_Possibly_Void because the defining entity
is sometimes E_Void.
* exp_ch5.adb (Expand_N_Assignment_Statement): Protect a call to
Renamed_Object with Is_Object to avoid assertion failure.
* einfo.ads: Minor comment fixes.
* inline.adb: Minor comment fixes.
* tbuild.ads: Minor comment fixes.

patch.diff.gz
Description: application/gzip


[Ada] Issue error on invalid use of Ghost inside pragma Predicate

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Checking for ghost placement was only occurring inside the various
versions of predicate aspects, not inside the pragma Predicate. Now
fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Freeze_Entity_Checks): Perform same check on
predicate expression inside pragma as inside aspect.
* sem_util.adb (Is_Current_Instance): Recognize possible
occurrence of subtype as current instance inside the pragma
Predicate.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -13144,6 +13144,28 @@ package body Sem_Ch13 is
   else
  Check_Aspect_At_Freeze_Point (Ritem);
   end if;
+
+   --  A pragma Predicate should be checked like one of the
+   --  corresponding aspects, wrt possible misuse of ghost
+   --  entities.
+
+   elsif Nkind (Ritem) = N_Pragma
+ and then No (Corresponding_Aspect (Ritem))
+ and then
+   Get_Pragma_Id (Pragma_Name (Ritem)) = Pragma_Predicate
+   then
+  --  Retrieve the visibility to components and discriminants
+  --  in order to properly analyze the pragma.
+
+  declare
+ Arg : constant Node_Id :=
+Next (First (Pragma_Argument_Associations (Ritem)));
+  begin
+ Push_Type (E);
+ Preanalyze_Spec_Expression
+   (Expression (Arg), Standard_Boolean);
+ Pop_Type (E);
+  end;
end if;
 
Next_Rep_Item (Ritem);


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -16644,7 +16644,8 @@ package body Sem_Util is
 --  Predicate_Failure aspect, for which we do not construct a
 --  wrapper procedure. The subtype will be replaced by the
 --  expression being tested when the corresponding predicate
---  check is expanded.
+--  check is expanded. It may also appear in the pragma Predicate
+--  expression during legality checking.
 
 elsif Nkind (P) = N_Aspect_Specification
   and then Nkind (Parent (P)) = N_Subtype_Declaration
@@ -16652,7 +16653,8 @@ package body Sem_Util is
return True;
 
 elsif Nkind (P) = N_Pragma
-  and then Get_Pragma_Id (P) = Pragma_Predicate_Failure
+  and then Get_Pragma_Id (P) in Pragma_Predicate
+  | Pragma_Predicate_Failure
 then
return True;
 end if;




[Ada] Fix deleted Compile_Time warnings causing crashes

2021-10-25 Thread Pierre-Marie de Rodat via Gcc-patches
Count_Compile_Time_Pragma_Warnings also counted deleted pragmas. This
caused discrepancies ultimately leading to a crash when Compile_Time
warnings were suppressed by a Warnings(Off, ...) pragma.
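
For readers who do not read Ada, the counting rule after the fix can be sketched in C. This is only an illustration: the struct and function names below are hypothetical stand-ins for the Ada error table, and only the three conditions are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the error table; the field
   names mirror the Ada record components tested in the patch.  */
struct error_entry
{
  bool warn;
  bool compile_time_pragma;
  bool deleted;
};

/* Count warnings that come from compile-time pragmas and are still
   live, i.e. not marked as deleted.  */
size_t
count_compile_time_pragma_warnings (const struct error_entry *table, size_t n)
{
  size_t result = 0;
  for (size_t i = 0; i < n; i++)
    if (table[i].warn && table[i].compile_time_pragma && !table[i].deleted)
      result++;
  return result;
}
```

Without the `!table[i].deleted` test, a warning removed by Warnings(Off, ...) would still be counted, which is the discrepancy described above.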

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* erroutc.adb (Count_Compile_Time_Pragma_Warnings): Don't count
deleted warnings.

diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -277,7 +277,9 @@ package body Erroutc is
begin
   for J in 1 .. Errors.Last loop
  begin
-if Errors.Table (J).Warn and Errors.Table (J).Compile_Time_Pragma
+if Errors.Table (J).Warn
+   and then Errors.Table (J).Compile_Time_Pragma
+   and then not Errors.Table (J).Deleted
 then
Result := Result + 1;
 end if;




[COMMITTED] Always output exported ranges to a dump_file.

2021-10-25 Thread Andrew MacLeod via Gcc-patches
This patch makes execute_ranger_vrp () match what VRP does and output 
any exported ranges at the end of the pass to a dump file, without 
needing TDF_DETAILS to be provided.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 17d26698aa31268acdf5e1d4d0bc363dd35378ac Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 20 Oct 2021 13:41:12 -0400
Subject: [PATCH 2/3] Always output exported ranges to a dump_file.

	* gimple-range.cc (gimple_ranger::export_global_ranges): Remove check
	for TDF_DETAILS.
---
 gcc/gimple-range.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 69cde911c49..91bacda6dd0 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -303,7 +303,7 @@ gimple_ranger::export_global_ranges ()
 	  && !r.varying_p())
 	{
 	  bool updated = update_global_range (r, name);
-	  if (!updated || !dump_file || !(dump_flags & TDF_DETAILS))
+	  if (!updated || !dump_file)
 	continue;
 
 	  if (print_header)
-- 
2.17.2



[COMMITTED] Tweak ranger-debug flags

2021-10-25 Thread Andrew MacLeod via Gcc-patches
When I split out the ranger debug flags from the evrp-mode flag, I 
should have also given each of the 3 separate debug flags its own unique 
bit value, and combined them for the various options.  As it is, when you 
ask for a trace, you also get cache output, which is unintended.  This 
patch resolves that situation.
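
A small C sketch of why the old value leaked cache output into a plain trace. The enum values are the ones from the patch; OLD_RANGER_DEBUG_CACHE and flag_enabled are hypothetical names used only here, and the sketch assumes the flags are tested with a nonzero bitwise AND.

```c
/* Values as they stand after the patch: each basic flag is its own bit.  */
enum ranger_debug
{
  RANGER_DEBUG_NONE = 0,
  RANGER_DEBUG_TRACE = 1,
  RANGER_DEBUG_CACHE = 2,
  RANGER_DEBUG_GORI = 4,
  RANGER_DEBUG_TRACE_GORI = (RANGER_DEBUG_TRACE | RANGER_DEBUG_GORI),
  RANGER_DEBUG_TRACE_CACHE = (RANGER_DEBUG_TRACE | RANGER_DEBUG_CACHE),
  RANGER_DEBUG_ALL = (RANGER_DEBUG_GORI | RANGER_DEBUG_CACHE)
};

/* Hypothetical name for the value RANGER_DEBUG_CACHE had before the patch.  */
#define OLD_RANGER_DEBUG_CACHE (2 | RANGER_DEBUG_TRACE)

/* Hypothetical helper: nonzero if SETTING shares any bit with FLAG.
   This is an assumption about how the flags are tested.  */
int
flag_enabled (int setting, int flag)
{
  return (setting & flag) != 0;
}
```

With the old value, flag_enabled (RANGER_DEBUG_TRACE, OLD_RANGER_DEBUG_CACHE) is true because both share bit 0.  With the new value the same test is false, and the "cache" option still gets both outputs through RANGER_DEBUG_TRACE_CACHE.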


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 2bfb21bb8ce16698926576870e4b1f2609e0c909 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 21 Oct 2021 10:58:16 -0400
Subject: [PATCH 1/3] Tweak ranger-debug flags.

Set the 3 possible flags as all individual bits and group for options.

	* flag-types.h (enum ranger_debug): Adjust values.
	* params.opt (ranger_debug): Ditto.
---
 gcc/flag-types.h | 3 ++-
 gcc/params.opt   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a5a637160d7..7cf8c28933b 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -454,9 +454,10 @@ enum ranger_debug
 {
   RANGER_DEBUG_NONE = 0,
   RANGER_DEBUG_TRACE = 1,
-  RANGER_DEBUG_CACHE = (2 | RANGER_DEBUG_TRACE),
+  RANGER_DEBUG_CACHE = 2,
   RANGER_DEBUG_GORI = 4,
   RANGER_DEBUG_TRACE_GORI = (RANGER_DEBUG_TRACE | RANGER_DEBUG_GORI),
+  RANGER_DEBUG_TRACE_CACHE = (RANGER_DEBUG_TRACE | RANGER_DEBUG_CACHE),
   RANGER_DEBUG_ALL = (RANGER_DEBUG_GORI | RANGER_DEBUG_CACHE)
 };
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 393d52bc660..6eb3e15a9e6 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -887,7 +887,7 @@ EnumValue
 Enum(ranger_debug) String(trace) Value(RANGER_DEBUG_TRACE)
 
 EnumValue
-Enum(ranger_debug) String(cache) Value(RANGER_DEBUG_CACHE)
+Enum(ranger_debug) String(cache) Value(RANGER_DEBUG_TRACE_CACHE)
 
 EnumValue
 Enum(ranger_debug) String(gori) Value(RANGER_DEBUG_GORI)
-- 
2.17.2



[COMMITTED] Re: [PATCH] Possible use before def in fortran/trans-decl.c.

2021-10-25 Thread Andrew MacLeod via Gcc-patches

On 10/21/21 3:02 PM, Andrew MacLeod wrote:
As I'm tweaking installing ranger as the VRP2 pass, I am getting a 
stage 2 bootstrap failure now:


In file included from 
/opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:28:
/opt/notnfs/amacleod/master/gcc/gcc/tree.h: In function ‘void 
gfc_conv_cfi_to_gfc(stmtblock_t*, stmtblock_t*, tree, tree, 
gfc_symbol*)’:
/opt/notnfs/amacleod/master/gcc/gcc/tree.h:244:56: error: ‘rank’ may 
be used uninitialized in this function [-Werror=maybe-uninitialized]

  244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
  | ^~~~
/opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:6671:8: note: 
‘rank’ was declared here

 6671 |   tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE;
  |    ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1136: fortran/trans-decl.o] Error 1


looking at that function, in the middle I see:

  if (sym->as->rank < 0)
    {
  /* Set gfc->dtype.rank, if assumed-rank.  */
  rank = gfc_get_cfi_desc_rank (cfi);
  gfc_add_modify (, gfc_conv_descriptor_rank (gfc_desc), rank);
    }
  else if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (gfc_desc)))
    /* In that case, the CFI rank and the declared rank can differ.  */
    rank = gfc_get_cfi_desc_rank (cfi);
  else
    rank = build_int_cst (signed_char_type_node, sym->as->rank);


so rank is set on all paths here.   However, stepping back a bit, 
earlier in the function I see:


  if (!sym->attr.dimension || !GFC_DESCRIPTOR_TYPE_P (TREE_TYPE 
(gfc_desc)))

    {
  tmp = gfc_get_cfi_desc_base_addr (cfi);
  gfc_add_modify (, gfc_desc,
  fold_convert (TREE_TYPE (gfc_desc), tmp));
  if (!sym->attr.dimension)
    goto done;
    }

The done: label occurs *after* that block of initialization code, and a 
bit further down, I see this:


      gfc_add_modify (_body, tmpidx, idx);
  stmtblock_t inner_loop;
  gfc_init_block (_loop);
  tree dim = gfc_create_var (TREE_TYPE (rank), "dim");

I cannot convince myself by looking at the intervening code that this 
cannot be executed along this path.  Perhaps someone more familiar 
with the code can check it out.   However, it seems worthwhile to at 
least initialize rank to NULL_TREE, so that we are more likely to see 
a trap if that path ever gets followed.
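
The shape of the problem can be reduced to a few lines of C. This is a hypothetical miniature, not the Fortran front-end code: rank_after_done and its string value are made up, and only the control flow (an early goto that skips every assignment) follows the description above.

```c
#include <stddef.h>

/* Hypothetical miniature of the control flow: the early exit jumps
   past the only assignment to RANK.  */
const char *
rank_after_done (int has_dimension)
{
  /* Defensive default, in the spirit of initializing to NULL_TREE.  */
  const char *rank = NULL;

  if (!has_dimension)
    goto done;

  /* Stands in for the block that sets rank on all of its paths.  */
  rank = "rank";

done:
  /* Without the default above, this read would be of an indeterminate
     value on the early-exit path.  With it, the caller sees NULL.  */
  return rank;
}
```

A NULL default does not prove the path is unreachable; it only turns a silent garbage read into something that is likely to fault if the path is ever taken.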


And it makes the warning go away :-)

OK?

Andrew

PS as a side note, it would be handy to have the def point *and* the 
use point that might be undefined.   It's a big function and it took me 
a while just to see where a possible use might be.






Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 387c665392366a543fb29badaee329533b32abb3 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 21 Oct 2021 14:48:20 -0400
Subject: [PATCH 3/3] Initialize variable.

	gcc/fortran/
	* trans-decl.c (gfc_conv_cfi_to_gfc): Initialize rank to NULL_TREE.
---
 gcc/fortran/trans-decl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index de624c82fcf..fe5511b5285 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -6668,7 +6668,7 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally,
   stmtblock_t block;
   gfc_init_block ();
   tree cfi = build_fold_indirect_ref_loc (input_location, cfi_desc);
-  tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE;
+  tree idx, etype, tmp, tmp2, size_var = NULL_TREE, rank = NULL_TREE;
   bool do_copy_inout = false;
 
   /* When allocatable + intent out, free the cfi descriptor.  */
-- 
2.17.2



Re: [Aarch64] Fix alignment of neon loads & stores in gimple

2021-10-25 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This fixes the alignment on the memory access type for neon loads & 
> stores in the gimple lowering. Bootstrap ubsan on aarch64 builds again 
> with this change.
>
>
> 2021-10-25  Andre Vieira  
>
> gcc/ChangeLog:
>
>      * config/aarch64/aarch64-builtins.c 
> (aarch64_general_gimple_fold_builtin): Fix memory access
>      type alignment.
>
>
> Is this OK for trunk?
>
> Kind regards,
> Andre
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> a815e4cfbccab692ca688ba87c71b06c304abbfb..f5436baf5f8a65c340e05faa491d86a7847c37d3
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2490,12 +2490,16 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt,
>   gimple_seq stmts = NULL;
>   tree base = gimple_convert (, elt_ptr_type,
>   args[0]);
> + /* Use element type alignment.  */
> + tree access_type
> +   = build_aligned_type (simd_type.itype,
> + TYPE_ALIGN (TREE_TYPE (simd_type.itype)));

Guess this is slightly simpler as TYPE_ALIGN (simd_type.eltype)
but either's fine.

OK with or without that change.

Thanks,
Richard

>   if (stmts)
> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>   new_stmt
> = gimple_build_assign (gimple_get_lhs (stmt),
>fold_build2 (MEM_REF,
> -   simd_type.itype,
> +   access_type,
> base, zero));
> }
>   break;
> @@ -2512,13 +2516,16 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt,
>   gimple_seq stmts = NULL;
>   tree base = gimple_convert (, elt_ptr_type,
>   args[0]);
> + /* Use element type alignment.  */
> + tree access_type
> +   = build_aligned_type (simd_type.itype,
> + TYPE_ALIGN (TREE_TYPE (simd_type.itype)));
>   if (stmts)
> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>   new_stmt
> -   = gimple_build_assign (fold_build2 (MEM_REF,
> -  simd_type.itype,
> -  base,
> -  zero), args[1]);
> +   = gimple_build_assign (fold_build2 (MEM_REF, access_type,
> +   base, zero),
> +  args[1]);
> }
>   break;
>  


[PATCH] libcody: add mostlyclean Makefile target

2021-10-25 Thread Martin Liška

Hello.

The patch adds the missing mostlyclean Makefile target.

Ready to be installed?
Thanks,
Martin

PR other/102657

libcody/ChangeLog:

* Makefile.in: Add mostlyclean Makefile target.
---
 libcody/Makefile.in | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libcody/Makefile.in b/libcody/Makefile.in
index b8b45a2e310..d8f1e8216d4 100644
--- a/libcody/Makefile.in
+++ b/libcody/Makefile.in
@@ -111,7 +111,7 @@ maintainer-clean:: distclean
 clean::
rm -f $(shell find $(srcdir) -name '*~')
 
-.PHONY: all check clean distclean maintainer-clean

+.PHONY: all check clean distclean maintainer-clean mostlyclean
 
 CXXFLAGS/ := -I$(srcdir)

 LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
@@ -127,6 +127,8 @@ clean::
rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
rm -f libcody.a
 
+mostlyclean: clean

+
 CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"'
 
 fatal.o: Makefile revision

--
2.33.1



Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-25 Thread Richard Sandiford via Gcc-patches
n Arm Cortex CPUs it is not a regression as a DUP on a SIMD scalar has the 
> same throughput and latencies as a MOVI
> according to the Arm Performance Software Optimization guides.

Costing them as equal would be OK when they are equal.  It's the “DUP (lane)/
mov high is strictly cheaper” bit I'm concerned about.

> So to me this looks like an improvement overall.  And this is where we likely 
> disagree?

Well, the disagreement isn't about whether the new compiler output for
these testcases is better than the old compiler output.  It's more a
question of how we're getting there.

>> > MOVI as I mentioned before is the one case where this is a toss up.
>> > But there are far more constants that cannot be created with a movi.
>> > A simple example is
>> >
>> > #include <arm_neon.h>
>> >
>> > int8x16_t square(int8x16_t full, int8x8_t small) {
>> > int8x16_t cst = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,15};
>> > int8x8_t low = vget_high_s8 (cst);
>> > int8x8_t res1 = vmul_s8 (small, low);
>> > return vaddq_s8 (vmulq_s8 (full, cst), vcombine_s8 (res1, res1));
>> > }
>> >
>> > Where in Gimple we get
>> >
>> >[local count: 1073741824]:
>> >   _2 = __builtin_aarch64_get_highv16qi ({ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
>> > 10, 11, 12,
>> 13, 15, 0 });
>> >   _4 = _2 * small_3(D);
>> >   _6 = full_5(D) * { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 0 };
>> >   _7 = __builtin_aarch64_combinev8qi (_4, _4);
>> >   _8 = _6 + _7;
>> >   return _8;
>> >
>> > Regardless of what happens to __builtin_aarch64_get_highv16qi nothing
>> > will recreate the relationship with cst, whether
>> __builtin_aarch64_get_highv16qi is lowered or not, constant prop will still
>> push in constants.
>> 
>> Yeah, constants are (by design) free in gimple.  But that's OK in itself,
>> because RTL optimisers have the job of removing any duplicates that end up
>> requiring separate moves.  I think we both agree on that.
>> 
>> E.g. for:
>> 
>> #include <arm_neon.h>
>> 
>> void foo(int8x16_t *x) {
>>   x[0] = vaddq_s8 (x[0], (int8x16_t) 
>> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15});
>>   x[1] = vaddq_s8 (x[1], (int8x16_t) 
>> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15});
>> }
>> 
>> the final gimple is:
>> 
>>[local count: 1073741824]:
>>   _1 = *x_4(D);
>>   _5 = _1 + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>>   *x_4(D) = _5;
>>   _2 = MEM[(int8x16_t *)x_4(D) + 16B];
>>   _7 = _2 + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>>   MEM[(int8x16_t *)x_4(D) + 16B] = _7;
>>   return;
>> 
>> but cse1 removes the duplicated constant even before the patch.
>
> It doesn't for me, again an unmodified compiler:
>
> https://godbolt.org/z/qnvf7496h 

FWIW, the link for my example is:

  https://godbolt.org/z/G6vaE3nab

but it sounds like the disagreement wasn't where I thought it was.

> and CSE1 has as the final codegen:
>
> (insn 7 4 8 2 (set (reg:V16QI 99)
> (const_vector:V16QI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> (const_int 4 [0x4])
> (const_int 5 [0x5])
> (const_int 6 [0x6])
> (const_int 7 [0x7])
> (const_int 8 [0x8])
> (const_int 9 [0x9])
> (const_int 10 [0xa])
> (const_int 11 [0xb])
> (const_int 12 [0xc])
> (const_int 13 [0xd])
> (const_int 15 [0xf])
> (const_int 0 [0])
> ]))
>
> (insn 8 7 9 2 (set (reg:V8QI 92 [ _2 ])
> (const_vector:V8QI [
> (const_int 8 [0x8])
> (const_int 9 [0x9])
> (const_int 10 [0xa])
>     (const_int 11 [0xb])
> (const_int 12 [0xc])
> (const_int 13 [0xd])
> (const_int 15 [0xf])
> (const_int 0 [0])
> ]))
>
> (insn 11 10 12 2 (set (reg:V16QI 95 [ _7 ])
> (vec_concat:V16QI (vec_select:V8QI (reg:V16QI 95 [ _7 ])
> (parallel:V16QI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> (const_int 4 [0x4])
> (const_int 5 [0x5])
> (const_int 6 [0x6])
>  

RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-25 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Friday, October 15, 2021 12:31 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> 
> Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> on signed values
> 
> On Fri, 15 Oct 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > During testing after rebasing to commit I noticed a failing testcase
> > with the bitmask compare patch.
> >
> > Consider the following C++ testcase:
> >
> > #include <compare>
> >
> > #define A __attribute__((noipa))
> > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; }
> >
> > This turns into a comparison against chars, on systems where chars are
> > signed the pattern inserts an unsigned convert such that it's able to
> > do the transformation.
> >
> > i.e.:
> >
> >   # RANGE [-1, 2]
> >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> >   # RANGE ~[3, 254]
> >   _11 = (unsigned char) c$_M_value_22;
> >   _19 = _11 <= 1;
> >   # .MEM_24 = VDEF <.MEM_6(D)>
> >   D.10434 ={v} {CLOBBER};
> >   # .MEM_14 = VDEF <.MEM_24>
> >   D.10407 ={v} {CLOBBER};
> >   # VUSE <.MEM_14>
> >   return _19;
> >
> > instead of:
> >
> >   # RANGE [-1, 2]
> >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> >   # RANGE [-2, 2]
> >   _3 = c$_M_value_5 & -2;
> >   _19 = _3 == 0;
> >   # .MEM_24 = VDEF <.MEM_6(D)>
> >   D.10440 ={v} {CLOBBER};
> >   # .MEM_14 = VDEF <.MEM_24>
> >   D.10413 ={v} {CLOBBER};
> >   # VUSE <.MEM_14>
> >   return _19;
> >
> > This causes much worse codegen under -ffast-math due to phiops no
> > longer recognizing the pattern.  It turns out that phiopts
> > spaceship_replacement is looking for the exact form that was just changed.
> >
> > Trying to get it to recognize the new form is not trivial as the
> > transformation doesn't look to work when the thing it's pointing to is 
> > itself
> a phi-node.
> 
> What do you mean?  Where it handles the BIT_AND it could also handle the
> conversion, no?  The later handling would probably more explicitely need to
> distinguish between the BIT_AND and the conversion forms.

Looks like I misunderstood the code, it was looking at the uses not the defs of
the value.

--- inline copy of patch ---

The comments seem to suggest this code only checks for (res & ~1) == 0 but the
implementation seems to suggest it's broader.

As such, I added a case that checks whether the value comparison we found is a
type cast, strips away the type cast, and continues.

In match.pd the typecasts are only added for signed comparisons to == 0 and != 0
which are then rewritten into comparisons with 1.

As such I only check for 1 and LE and GT, which is what match.pd would have
rewritten it to.
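
To make the equivalence concrete, here is a small C check that the bit-clear form and the unsigned comparison agree for every signed char value.  The function names are hypothetical and this is only a sketch of the arithmetic, not of the phiopt code.

```c
#include <stdbool.h>

/* Original form: clear bit 0 and compare against zero.  */
bool
bitclear_form (signed char res)
{
  return (res & ~1) == 0;
}

/* Rewritten form: convert to unsigned, then compare against 1.  */
bool
unsigned_le_form (signed char res)
{
  return (unsigned char) res <= 1;
}

/* Hypothetical helper: true if both forms agree on all 256 values.  */
bool
forms_agree_everywhere (void)
{
  for (int v = -128; v <= 127; v++)
    if (bitclear_form ((signed char) v) != unsigned_le_form ((signed char) v))
      return false;
  return true;
}
```

Both forms are true only for 0 and 1.  Negative values fail the first form because their high bits survive the mask, and fail the second because the conversion to unsigned char maps them to 128 and above.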

This fixes the regression but this is not code I 100% understand, since I don't
really know the semantics of the spaceship operator so would appreciate an extra
look.

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical
codegen.

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 
0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78aa2340418571a33
 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2037,6 +2037,7 @@ spaceship_replacement (basic_block cond_bb, basic_block 
middle_bb,
   tree lhs, rhs;
   gimple *orig_use_stmt = use_stmt;
   tree orig_use_lhs = NULL_TREE;
+  bool is_canon = false;
   int prec = TYPE_PRECISION (TREE_TYPE (phires));
   if (is_gimple_assign (use_stmt)
   && gimple_assign_rhs_code (use_stmt) == BIT_AND_EXPR
@@ -2063,6 +2064,26 @@ spaceship_replacement (basic_block cond_bb, basic_block 
middle_bb,
 }
   else if (is_gimple_assign (use_stmt))
 {
+  /* Deal with if match.pd has rewritten the (res & ~1) == 0
+into res <= 1 and has left a type-cast for signed types.  */
+  if (gimple_assign_cast_p (use_stmt))
+   {
+ orig_use_lhs = gimple_assign_lhs (use_stmt);
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
+   return false;
+ if (EDGE_COUNT (phi_bb->preds) != 4)
+   return false;
+ if (!TYPE_UNSIGNED (TREE_TYPE (orig_use_lhs)))
+   return false;
+ if (!single_imm_use (orig_use_lhs, _p, _stmt))
+   return false;
+ tree_code cmp;
+ if (is_gimple_assign (use_stmt)
+ && (cmp = gimple_assign_rhs_code (use_stmt))
+ && (cmp == LE_EXPR || cmp == GT_EXPR)
+ && wi::eq_p (wi::to_wide (gimple_assign_rhs2 (use_stmt)), 1))
+   is_canon = true;
+   }
   if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
{
  cmp = gimple_assign_rhs_code (use_stmt);
@@ -2099,7 +2120,9 @@ spaceship_replacement (basic_block cond_bb, basic_block 
middle_bb,
   || !tree_fits_shwi_p (rhs)
   || !IN_RANGE (tree_to_shwi (rhs), -1, 1))
 return 

Re: [Version 2][Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers

2021-10-25 Thread Qing Zhao via Gcc-patches
Ping….

Is this Okay for trunk?

> On Oct 18, 2021, at 2:26 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi, Jakub,
> 
> This is the 2nd version of the patch based on your comment.
> 
> Bootstrapped on both x86 and aarch64. Regression testings are ongoing.

The regression testing looks good.

Thanks.

Qing
> 
> Please let me know if this is ready for committing?
> 
> Thanks a lot.
> 
> Qing.
> 
> ==
> 
> From d6f60370dee69b5deb3d7ef51873a5e986490782 Mon Sep 17 00:00:00 2001
> From: Qing Zhao 
> Date: Mon, 18 Oct 2021 19:04:39 +
> Subject: [PATCH] PR 102281 (-ftrivial-auto-var-init=zero causes ice)
> 
> Do not add call to __builtin_clear_padding when a variable is a gimple
> register or it might not have padding.
> 
> gcc/ChangeLog:
> 
> 2021-10-18  qing zhao  
> 
>   * gimplify.c (gimplify_decl_expr): Do not add call to
>   __builtin_clear_padding when a variable is a gimple register
>   or it might not have padding.
>   (gimplify_init_constructor): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-10-18  qing zhao  
> 
>   * c-c++-common/pr102281.c: New test.
>   * gcc.target/i386/auto-init-2.c: Adjust testing case.
>   * gcc.target/i386/auto-init-4.c: Likewise.
>   * gcc.target/i386/auto-init-6.c: Likewise.
>   * gcc.target/aarch64/auto-init-6.c: Likewise.
> ---
> gcc/gimplify.c| 25 ++-
> gcc/testsuite/c-c++-common/pr102281.c | 17 +
> .../gcc.target/aarch64/auto-init-6.c  |  4 +--
> gcc/testsuite/gcc.target/i386/auto-init-2.c   |  2 +-
> gcc/testsuite/gcc.target/i386/auto-init-4.c   | 10 +++-
> gcc/testsuite/gcc.target/i386/auto-init-6.c   |  7 +++---
> 6 files changed, 47 insertions(+), 18 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/pr102281.c
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index d8e4b139349..b27dc0ed308 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1784,8 +1784,8 @@ gimple_add_init_for_auto_var (tree decl,
>that padding is initialized to zero. So, we always initialize paddings
>to zeroes regardless INIT_TYPE.
>To do the padding initialization, we insert a call to
> -   __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true).
> -   Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING,
> +   __builtin_clear_padding (, 0, for_auto_init = true).
> +   Note, we add an additional dummy argument for __builtin_clear_padding,
>'for_auto_init' to distinguish whether this call is for automatic
>variable initialization or not.
>*/
> @@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
>pattern initialization.
>In order to make the paddings as zeroes for pattern init, We
>should add a call to __builtin_clear_padding to clear the
> -  paddings to zero in compatiple with CLANG.  */
> -   if (flag_auto_var_init == AUTO_INIT_PATTERN)
> +  paddings to zero in compatiple with CLANG.
> +  We cannot insert this call if the variable is a gimple register
> +  since __builtin_clear_padding will take the address of the
> +  variable.  As a result, if a long double/_Complex long double
> +  variable will spilled into stack later, its padding is 0XFE.  */
> +   if (flag_auto_var_init == AUTO_INIT_PATTERN
> +   && !is_gimple_reg (decl)
> +   && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
>   gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
>   }
> }
> @@ -5384,12 +5390,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
> 
>   /* If the user requests to initialize automatic variables, we
>  should initialize paddings inside the variable.  Add a call to
> - __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true) to
> + __builtin_clear_pading (, 0, for_auto_init = true) to
>  initialize paddings of object always to zero regardless of
>  INIT_TYPE.  Note, we will not insert this call if the aggregate
>  variable has be completely cleared already or it's initialized
> - with an empty constructor.  */
> + with an empty constructor.  We cannot insert this call if the
> + variable is a gimple register since __builtin_clear_padding will take
> + the address of the variable.  As a result, if a long double/_Complex 
> long
> + double variable will be spilled into stack later, its padding cannot
> + be cleared with __builtin_clear_padding.  We should clear its padding
> + when it is spilled into memory.  */
>   if (is_init_expr
> +  && !is_gimple_reg (object)
> +  && clear_padding_type_may_have_padding_p (type)
>   && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor)
> || !AGGREGATE_TYPE_P (type))
>   && is_var_need_auto_init (object))
> diff --git a/gcc/testsuite/c-c++-common/pr102281.c 
> b/gcc/testsuite/c-c++-common/pr102281.c
> 

RE: [PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant

2021-10-25 Thread Roger Sayle


Hi Uros,
I believe the proposed sequences should be dramatically faster than LLVM's
implementation(s), due to the large latencies required to move values between
the vector and scalar parts on modern x86_64 microarchitectures.  All of the
SSE2 instructions used in the sequences proposed by my patch have single
cycle latencies, so have a maximum total latency of 5 cycles, though due to
multiple issue, typically require between 1 and 3 cycles depending upon the 
sequence.

Moving between units is significantly slower; according to Agner Fog's tables,
the pinsrq/pextrq instructions you suggest have latencies up to 7 cycles on the
Silvermont architecture.  Let's take the LLVM code you've provided, and 
annotate it with cycle counts for recent Intel (cascadelake) and AMD
(zen2) CPUs.

movq        %xmm0, %rax         ; 2-3 cycles
pshufd      $78, %xmm0, %xmm0   ; 1 cycle
movq        %xmm0, %rcx         ; 2-3 cycles
shldq       $8, %rax, %rcx      ; 3 cycles
shlq        $8, %rax            ; 1 cycle
movq        %rcx, %xmm1         ; 2-3 cycles
movq        %rax, %xmm0         ; 2-3 cycles
punpcklqdq  %xmm1, %xmm0        ; 1 cycle

This 8 instruction sequence has a total latency of 14 cycles on CascadeLake and
18 cycles on Zen2, but a scheduled cycle count of 9 cycles and 11 cycles 
respectively.

The same left shift by 8 as implemented by the proposed patch is:

pslldq  $1, %xmm0   ; 1 cycle

And for reference, the code currently generated by GCC is:

movaps  %xmm0, -24(%rsp)    ; 3 cycles
movq    -24(%rsp), %rax     ; 2 cycles
movq    -16(%rsp), %rdx     ; 2 cycles
shldq   $8, %rax, %rdx      ; 3 cycles
salq    $8, %rax            ; 1 cycle
movq    %rax, -24(%rsp)     ; 2 cycles
movq    %rdx, -16(%rsp)     ; 2 cycles
movdqa  -24(%rsp), %xmm0    ; 2 cycles


The very worst case timing of my patches is the five instruction rotate:
pshufd  $78, %xmm0, %xmm1   ; 1 cycle
pshufd  $57, %xmm0, %xmm0   ; 1 cycle
pslld   $1, %xmm1           ; 1 cycle
psrld   $31, %xmm0          ; 1 cycle
por     %xmm1, %xmm0        ; 1 cycle

which has a 5 cycle total latency, but can complete in 3 cycles when suitably
scheduled, since the two pshufd instructions can execute concurrently, as can
the two shifts, followed finally by the por.
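
For anyone who wants to check the five-instruction sequence without an x86 machine, here is a portable C model of it, working on four 32-bit lanes.  The type and function names are hypothetical.  By my reading the sequence is a rotate left by 65 bits (equivalently right by 63); that count is inferred from the shuffle immediates rather than stated above, so the sketch compares the lane model against a reference built from two 64-bit halves.

```c
#include <stdint.h>

/* A 128-bit value as four 32-bit lanes, d[0] least significant.  */
typedef struct { uint32_t d[4]; } v4si;

/* Model of pshufd: result lane i takes source lane ((imm >> 2*i) & 3).  */
v4si
model_pshufd (v4si x, unsigned imm)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.d[i] = x.d[(imm >> (2 * i)) & 3];
  return r;
}

/* Model of the five-instruction rotate.  */
v4si
model_rotate (v4si x)
{
  v4si a = model_pshufd (x, 78);   /* pshufd $78: swap 64-bit halves */
  v4si b = model_pshufd (x, 57);   /* pshufd $57: rotate lanes by one */
  v4si r;
  for (int i = 0; i < 4; i++)
    r.d[i] = (a.d[i] << 1) | (b.d[i] >> 31);   /* pslld, psrld, por */
  return r;
}

/* Reference: rotate left by 65 using two 64-bit halves.  */
v4si
reference_rotl65 (v4si x)
{
  uint64_t lo = x.d[0] | ((uint64_t) x.d[1] << 32);
  uint64_t hi = x.d[2] | ((uint64_t) x.d[3] << 32);
  /* Rotating by 64 swaps the halves; then rotate left by one more bit.  */
  uint64_t new_lo = (hi << 1) | (lo >> 63);
  uint64_t new_hi = (lo << 1) | (hi >> 63);
  v4si r = { { (uint32_t) new_lo, (uint32_t) (new_lo >> 32),
               (uint32_t) new_hi, (uint32_t) (new_hi >> 32) } };
  return r;
}
```

The two pshufd models depend only on the input, and the two shifts depend only on their own shuffle, which is the independence that lets the sequence schedule into 3 cycles.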

Perhaps I'm missing something, but I'd expect this patch to be three or
four times faster, on recent hardware, than the code generated by LLVM.

Let me know if you'd like me to run microbenchmarks, but the documented
timings are such a dramatic improvement, I'm a little surprised you've
asked about performance.  My patch is also a code size win with -Os
(ashl_8 is currently 39 bytes, shrinks to 5 bytes with this patch).


Please let me know what you think.
Roger
--

-Original Message-
From: Uros Bizjak  
Sent: 25 October 2021 09:02
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant

On Sun, Oct 24, 2021 at 6:34 PM Roger Sayle  wrote:
>
>
> This patch provides RTL expanders to implement logical shifts and 
> rotates of 128-bit values (stored in vector integer registers) by 
> constant bit counts.  Previously, GCC would transfer these values to a 
> pair of scalar registers (TImode) via memory to perform the operation, 
> then transfer the result back via memory.  Instead these operations 
> are now expanded using (between 1 and 5) SSE2 vector instructions.

Hm, instead of using memory (without STL forwarding for general -> XMM
moves!) these should use something similar to what clang produces (or use 
pextrq/pinsrq, at least with SSE4.1):

   movq        %xmm0, %rax
   pshufd      $78, %xmm0, %xmm0
   movq        %xmm0, %rcx
   shldq       $8, %rax, %rcx
   shlq        $8, %rax
   movq        %rcx, %xmm1
   movq        %rax, %xmm0
   punpcklqdq  %xmm1, %xmm0

> Logical shifts by multiples of 8 can be implemented using x86_64's 
> pslldq/psrldq instruction:
> ashl_8: pslldq  $1, %xmm0
> ret
> lshr_32:
> psrldq  $4, %xmm0
> ret
>
> Logical shifts by greater than 64 can use pslldq/psrldq $8, followed 
> by a psllq/psrlq for the remaining bits:
> ashl_111:
> pslldq  $8, %xmm0
> psllq   $47, %xmm0
> ret
> lshr_127:
> psrldq  $8, %xmm0
> psrlq   $63, %xmm0
> ret
>
> The remaining logical shifts make use of the following idiom:
> ashl_1:
> movdqa  %xmm0, %xmm1
> psllq   $1, %xmm0
> pslldq  $8, %xmm1
> psrlq   $63, %xmm1
> por %xmm1, %xmm0
> ret
> lshr_15:
> movdqa  %xmm0, %xmm1
> psrlq   $15, %xmm0
> psrldq  $8, %xmm1
> psllq   $49, %xmm1
> por %xmm1, %xmm0
> ret
>
> Rotates by multiples of 32 can use x86_64's pshufd:
> rotr_32:
> pshufd  $57, %xmm0, %xmm0
> ret
> rotr_64:
> pshufd  $78, %xmm0, %xmm0
> ret
> rotr_96:
> pshufd  $147, %xmm0, %xmm0
> 

[committed][PATCH]AArch64 testsuite: Force shrn-combine-*.c to use NEON.

2021-10-25 Thread Tamar Christina via Gcc-patches
Hi All,

These tests are testing Advanced SIMD codegen, so if the compiler or the
testsuite is forcing SVE they will fail.

This adds +nosve so that we always generate neon.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR target/102907
* gcc.target/aarch64/shrn-combine-1.c: Disable SVE.
* gcc.target/aarch64/shrn-combine-2.c: Likewise.
* gcc.target/aarch64/shrn-combine-3.c: Likewise.
* gcc.target/aarch64/shrn-combine-4.c: Likewise.
* gcc.target/aarch64/shrn-combine-5.c: Likewise.
* gcc.target/aarch64/shrn-combine-6.c: Likewise.
* gcc.target/aarch64/shrn-combine-7.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
index 
a28524662edca8eb149e34c2242091b51a167b71..334e94aa76e030d18cfbda2febe3200f0ccb7b5e
 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE char
 
 void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
index 
012135b424f98abadc480e7ef13fcab080d99c28..c90de72e9c39e2cac22264004015b4be62c38110
 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE short
 
 void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
index 
8b5b360de623b0ada0da1531795ba6b428c7f9e1..a05ecbb373a55d39e07bb1d8f887485d73740638
 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE int
 
 void foo (unsigned long long * restrict a, TYPE * restrict d, int n)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-4.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-4.c
index 
fedca7621e2a82df0df9d12b91c5c0c9fd3dfc60..36ebab7b742add831403f6d2000c14f6a7714770
 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-4.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE long long
 
 void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c
index 408e85535788b2c1c9b05672a269e4e6567f2683..973e577e938198fb8ab5ee8662bb16fa695a6842 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-5.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE1 char
 #define TYPE2 short
 #define SHIFT 8
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c
index 6211ba3e41c199f325b80217d298801767c8dad5..db36a9c421815987778d1427be232d9264bf7094 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-6.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE1 short
 #define TYPE2 int
 #define SHIFT 16
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c
index 56cbeacc6de54f177f5b66d26b62ba6cefb921ad..e7caf3c7587a7df15889760a2090e3fa264bc66e 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-7.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
 
+#pragma GCC target "+nosve"
+
 #define TYPE1 int
 #define TYPE2 long long
 #define SHIFT 32


-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
index a28524662edca8eb149e34c2242091b51a167b71..334e94aa76e030d18cfbda2febe3200f0ccb7b5e 100644
--- a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options 

[PATCH] testsuite: i386: Fix gcc.target/i386/avx512f-pr96891-3.c on Solaris [PR102834]

2021-10-25 Thread Rainer Orth
gcc.target/i386/avx512f-pr96891-3.c currently FAILs on 32-bit Solaris/x86:

FAIL: gcc.target/i386/avx512f-pr96891-3.c scan-assembler-times (?n)vpcmp[bwdq][ \t]*\$7 4

There are only 3 instances of the expected pattern because Solaris/x86
defaults to -mno-stv.  Fixed by compiling with -mstv and
-mno-stackrealign.  Tested on i386-pc-solaris2.11 and
x86_64-pc-linux-gnu.

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

gcc/testsuite:
PR testsuite/102834
* gcc.target/i386/avx512f-pr96891-3.c: Add -mstv -mno-stackrealign
to dg-options.

# HG changeset patch
# Parent  fb0ee6d7c96c44712f6a682a8be50ea3471d73fc
testsuite: i386: Fix gcc.target/i386/avx512f-pr96891-3.c on Solaris [PR102834]

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2 -masm=att" } */
+/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2 -masm=att -mstv -mno-stackrealign" } */
 /* { dg-final { scan-assembler-not {not[bwlqd]\]} } } */
 /* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$5} 4} } */
 /* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]*\$6} 4} } */


[PATCH] testsuite: i386: Fix gcc.target/i386/avx512fp16-trunchf.c on Solaris [PR102835]

2021-10-25 Thread Rainer Orth
The gcc.target/i386/avx512fp16-trunchf.c test FAILs on 32-bit Solaris/x86:

FAIL: gcc.target/i386/avx512fp16-trunchf.c scan-assembler-times vcvttsh2si[ \t]+[^{\\n]*(?:%xmm[0-9]|(%esp))+, %eax(?:\\n|[ \t]+#) 3
FAIL: gcc.target/i386/avx512fp16-trunchf.c scan-assembler-times vcvttsh2usi[ \t]+[^{\\n]*(?:%xmm[0-9]|(%esp))+, %eax(?:\\n|[ \t]+#) 2

This happens because Solaris defaults to -fno-omit-frame-pointer, so it
uses %ebp instead of the expected %esp.  As Hongyu Wang suggested in the
PR, this can be fixed by accepting both forms, which this patch does.
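
As a rough illustration of why the bracket expression is enough, here is a minimal C sketch using POSIX extended regexes.  This is an assumption-laden stand-in: DejaGnu matches with Tcl regexps, not regcomp, and the patterns below are reduced to the stack-slot operand only.  The names matches, old_operand and new_operand are made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT matches the POSIX extended regex PATTERN, else 0.  */
static int
matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);	/* The patterns used here are known to compile.  */
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Reduced (hypothetical) forms of the memory-operand alternative:
   the old one accepts only %esp, the new one %esp or %ebp.  */
static const char old_operand[] = "\\(%esp\\), %eax";
static const char new_operand[] = "\\(%e[bs]p\\), %eax";
```

With these, an %ebp-relative operand such as "8(%ebp), %eax" is rejected by old_operand but accepted by new_operand, while "4(%esp), %eax" is accepted by both.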

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/avx512fp16-trunchf.c: Allow for %ebp as well as
%esp.

# HG changeset patch
# Parent  42f3c920a9840cff9344293def88b179095d62bd
testsuite: i386: Fix gcc.target/i386/avx512fp16-trunchf.c on Solaris [PR102835]

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-trunchf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-trunchf.c
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-trunchf.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-trunchf.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512fp16" } */
-/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*(?:%xmm\[0-9\]|\\(%esp\\))+, %eax(?:\n|\[ \\t\]+#)" 3 } } */
-/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*(?:%xmm\[0-9\]|\\(%esp\\))+, %eax(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*(?:%xmm\[0-9\]|\\(%e\[bs\]p\\))+, %eax(?:\n|\[ \\t\]+#)" 3 } } */
+/* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*(?:%xmm\[0-9\]|\\(%e\[bs\]p\\))+, %eax(?:\n|\[ \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vcvttsh2si\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+, %rax(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vcvttsh2usi\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+, %rax(?:\n|\[ \\t\]+#)" 1 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler "xorl\[ \\t\]+%edx, %edx" { target ia32 } } } */


Re: [PATCH] testsuite: i386: Use -fomit-frame-pointer for gcc.target/i386/pr100704-1.c etc.

2021-10-25 Thread H.J. Lu via Gcc-patches
On Mon, Oct 25, 2021 at 6:42 AM Rainer Orth  
wrote:
>
> gcc.target/i386/pr100704-[12].c currently FAIL on 64-bit Solaris/x86:
>
> FAIL: gcc.target/i386/pr100704-1.c scan-assembler-not pushq
> FAIL: gcc.target/i386/pr100704-2.c scan-assembler-not pushq
>
> Fixed by compiling with -fomit-frame-pointer.
>
> Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
>
> Ok for master?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2021-10-20  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr100704-1.c: Add -fomit-frame-pointer to
> dg-options.
> * gcc.target/i386/pr100704-2.c: Likewise.
>

LGTM.

Thanks.

-- 
H.J.


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pieces-memset-1.c etc. on Solaris [PR102836]

2021-10-25 Thread H.J. Lu via Gcc-patches
On Mon, Oct 25, 2021 at 6:46 AM Rainer Orth  
wrote:
>
> Several of the gcc.target/i386/pieces-memset-*.c tests FAIL on 32-bit
> Solaris/x86:
>
> FAIL: gcc.target/i386/pieces-memset-1.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pieces-memset-4.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pieces-memset-41.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pieces-memset-7.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pieces-memset-8.c scan-assembler-not %[re]bp
> FAIL: gcc.target/i386/pr90773-1.c scan-assembler-times movq[\t ]+7(%[^,]+), 1
> FAIL: gcc.target/i386/pr90773-1.c scan-assembler-times movq[\t ]+(%[^,]+), 1
>
> Fixed by compiling with -mno-stackrealign.  Tested on
> i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
>
> Ok for master?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2021-10-20  Rainer Orth  
>
> gcc/testsuite:
> PR testsuite/102836
> * gcc.target/i386/pieces-memset-1.c: Add -mno-stackrealign to
> dg-options.
> * gcc.target/i386/pieces-memset-4.c: Likewise.
> * gcc.target/i386/pieces-memset-7.c: Likewise.
> * gcc.target/i386/pieces-memset-8.c: Likewise.
> * gcc.target/i386/pieces-memset-41.c: Likewise.
> * gcc.target/i386/pr90773-1.c: Likewise.
>

LGTM.

Thanks.

-- 
H.J.


[PATCH] testsuite: i386: Fix gcc.target/i386/pieces-memset-1.c etc. on Solaris [PR102836]

2021-10-25 Thread Rainer Orth
Several of the gcc.target/i386/pieces-memset-*.c tests FAIL on 32-bit
Solaris/x86:

FAIL: gcc.target/i386/pieces-memset-1.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-4.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-41.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-7.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-8.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pr90773-1.c scan-assembler-times movq[\t ]+7(%[^,]+), 1
FAIL: gcc.target/i386/pr90773-1.c scan-assembler-times movq[\t ]+(%[^,]+), 1

Fixed by compiling with -mno-stackrealign.  Tested on
i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

gcc/testsuite:
PR testsuite/102836
* gcc.target/i386/pieces-memset-1.c: Add -mno-stackrealign to
dg-options.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.

# HG changeset patch
# Parent  36b044e6c2ffe7fb4c59b9e83604e747575dbb35
testsuite: i386: Fix gcc.target/i386/pieces-memset-1.c etc. on Solaris [PR102836]

diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic -mno-stackrealign" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic -mno-stackrealign" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge -mno-stackrealign" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-7.c b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-7.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic -mno-stackrealign" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-8.c b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-8.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic -mno-stackrealign" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c
--- a/gcc/testsuite/gcc.target/i386/pr90773-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse2 -mtune=generic" } */
+/* { dg-options "-O2 -msse2 -mtune=generic -mno-stackrealign" } */
 
 extern char *dst, *src;
 


[PATCH] testsuite: i386: Use -fomit-frame-pointer for gcc.target/i386/pr100704-1.c etc.

2021-10-25 Thread Rainer Orth
gcc.target/i386/pr100704-[12].c currently FAIL on 64-bit Solaris/x86:

FAIL: gcc.target/i386/pr100704-1.c scan-assembler-not pushq
FAIL: gcc.target/i386/pr100704-2.c scan-assembler-not pushq

Fixed by compiling with -fomit-frame-pointer.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/pr100704-1.c: Add -fomit-frame-pointer to
dg-options.
* gcc.target/i386/pr100704-2.c: Likewise.

# HG changeset patch
# Parent  0bfb6ff336f41aa5422e34580f68d5cf27a1641c
testsuite: i386: Use -fomit-frame-pointer for gcc.target/i386/pr100704-1.c etc.

diff --git a/gcc/testsuite/gcc.target/i386/pr100704-1.c b/gcc/testsuite/gcc.target/i386/pr100704-1.c
--- a/gcc/testsuite/gcc.target/i386/pr100704-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr100704-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -march=x86-64" } */
+/* { dg-options "-O2 -fomit-frame-pointer -march=x86-64" } */
 
 struct S
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr100704-2.c b/gcc/testsuite/gcc.target/i386/pr100704-2.c
--- a/gcc/testsuite/gcc.target/i386/pr100704-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr100704-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -march=x86-64" } */
+/* { dg-options "-O2 -fomit-frame-pointer -march=x86-64" } */
 
 struct S
 {


Re: [PATCH] Add TSVC tests.

2021-10-25 Thread Martin Liška

PING^1

On 10/19/21 08:49, Martin Liška wrote:

On 10/18/21 12:08, Richard Biener wrote:

Can you please use a subdirectory for the sources, a "toplevel"
license.txt doesn't make much sense.  You can simply amend
vect.exp to process tsvc/*.c as well as sources so no need for an
extra .exp file.


Sure, it's a good idea and I've done that.



Is the license recognized as
compatible to the GPL as far as source distribution is concerned?


Yes: https://www.gnu.org/licenses/license-list.html#NCSA



Did you test the testcases on any non-x86 target?  (power/aarch64/arm)


Yes, I ran the tests also on ppc64le-linux-gnu and aarch64-linux-gnu.

Thoughts?
Thanks,
Martin



Richard.




[PATCH] testsuite: i386: Require dfp in gcc.target/i386/pr101346.c

2021-10-25 Thread Rainer Orth
gcc.target/i386/pr101346.c currently FAILs on Solaris/x86:

FAIL: gcc.target/i386/pr101346.c (test for excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:6:1: error: decimal floating-point not supported for this target
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:7:6: error: decimal floating-point not supported for this target
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/pr101346.c:9:12: warning: implicit declaration of function '__builtin_fabsd128'; did you mean '__builtin_fabsf128'? [-Wimplicit-function-declaration]

Fixed by requiring dfp support.  Tested on i386-pc-solaris2.11 and
x86_64-pc-linux-gnu.

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/pr101346.c: Require dfp support.

# HG changeset patch
# Parent  68af767588b52fb610717b4dbbc97b030192ae48
testsuite: i386: Require dfp in gcc.target/i386/pr101346.c

diff --git a/gcc/testsuite/gcc.target/i386/pr101346.c b/gcc/testsuite/gcc.target/i386/pr101346.c
--- a/gcc/testsuite/gcc.target/i386/pr101346.c
+++ b/gcc/testsuite/gcc.target/i386/pr101346.c
@@ -2,6 +2,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O0 -fprofile-generate -msse" } */
 /* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-require-effective-target dfp } */
 
 _Decimal128
 foo (_Decimal128 x)


Re: [PATCH] Port update-copyright.py to Python3

2021-10-25 Thread Martin Liška

On 10/22/21 23:00, Thomas Schwinge wrote:

Turns out, there is another issue, observed in combination with a few "BadYear" occurrences
due to "improper" copyright lines (Bill, for your information). OK to push "Fix
'contrib/update-copyright.py':


Thank you for the fix; the change seems obvious to me!

Martin


[PATCH] libstdc++: Fix 28_regex/basic_regex/84110.cc on Solaris

2021-10-25 Thread Rainer Orth
28_regex/basic_regex/84110.cc currently FAILs on Solaris:

FAIL: 28_regex/basic_regex/84110.cc (test for excess errors)
UNRESOLVED: 28_regex/basic_regex/84110.cc compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14: error: reference to 'extended' is ambiguous

The issue is seen in the full output:

/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc: In function ‘void test01()’:
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14: error: reference to ‘extended’ is ambiguous
In file included from /var/gcc/regression/master/11.4-gcc-gas/build/gcc/include-fixed/math.h:391,
                 from /var/gcc/regression/master/11.4-gcc-gas/build/i386-pc-solaris2.11/libstdc++-v3/include/cmath:45,
                 from /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/floatingpoint.h:73: note: candidates are: ‘typedef unsigned int extended [3]’

Fixed by qualifying extended.  Tested on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.

Ok for master?

I'm not certain if this is the best fix, though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

libstdc++-v3:
* testsuite/28_regex/basic_regex/84110.cc (test01)
[__cpp_exceptions]: Disambiguate extended.

# HG changeset patch
# Parent  1a71c5553268184d62ac25cc8838e7ad096199b3
libstdc++: Fix 28_regex/basic_regex/84110.cc on Solaris

diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
@@ -11,7 +11,7 @@ void test01()
 
 #if __cpp_exceptions
   using namespace std::regex_constants;
-  for (auto syn : {basic, extended, awk, grep, egrep})
+  for (auto syn : {basic, std::regex::extended, awk, grep, egrep})
   {
 try
 {


[Aarch64] Fix alignment of neon loads & stores in gimple

2021-10-25 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This fixes the alignment on the memory access type for neon loads & 
stores in the gimple lowering. Bootstrap ubsan on aarch64 builds again 
with this change.
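
To illustrate the idea (not the GCC-internal code), here is a minimal C sketch using GNU vector extensions.  It assumes a GCC-compatible compiler; the type and function names (v4si, v4si_elt, load4, store4, roundtrip_lane) are made up for this example.  The point is that an access made through a pointer to the element type should only claim element alignment, not the full vector alignment.

```c
#include <assert.h>

/* A 16-byte vector of four ints with its natural (vector) alignment.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* The same vector type, but claiming only the alignment of its element
   type.  may_alias is added so the access may legally overlap int
   objects.  */
typedef int v4si_elt
  __attribute__ ((vector_size (16), aligned (__alignof__ (int)), may_alias));

/* Load four ints from P, which is only guaranteed to be int-aligned.  */
static v4si
load4 (const int *p)
{
  return *(const v4si_elt *) p;
}

/* Store four ints to P, which is only guaranteed to be int-aligned.  */
static void
store4 (int *p, v4si v)
{
  *(v4si_elt *) p = v;
}

/* Copy four ints between offsets that need not be 16-byte aligned and
   return one lane of the result.  */
static int
roundtrip_lane (int lane)
{
  static const int src[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int dst[5] = { 0 };
  assert (lane >= 0 && lane < 4);
  store4 (dst + 1, load4 (src + 1));
  return dst[1 + lane];
}
```

Dereferencing a plain v4si pointer at src + 1 would instead assert 16-byte alignment that the address does not necessarily have, which is the kind of mismatch a sanitizer can flag.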



2021-10-25  Andre Vieira  

gcc/ChangeLog:

    * config/aarch64/aarch64-builtins.c
    (aarch64_general_gimple_fold_builtin): Fix memory access
    type alignment.


Is this OK for trunk?

Kind regards,
Andre
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a815e4cfbccab692ca688ba87c71b06c304abbfb..f5436baf5f8a65c340e05faa491d86a7847c37d3 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2490,12 +2490,16 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
gimple_seq stmts = NULL;
tree base = gimple_convert (&stmts, elt_ptr_type,
args[0]);
+   /* Use element type alignment.  */
+   tree access_type
+ = build_aligned_type (simd_type.itype,
+   TYPE_ALIGN (TREE_TYPE (simd_type.itype)));
if (stmts)
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
new_stmt
  = gimple_build_assign (gimple_get_lhs (stmt),
 fold_build2 (MEM_REF,
- simd_type.itype,
+ access_type,
  base, zero));
  }
break;
@@ -2512,13 +2516,16 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
gimple_seq stmts = NULL;
tree base = gimple_convert (&stmts, elt_ptr_type,
args[0]);
+   /* Use element type alignment.  */
+   tree access_type
+ = build_aligned_type (simd_type.itype,
+   TYPE_ALIGN (TREE_TYPE (simd_type.itype)));
if (stmts)
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
new_stmt
- = gimple_build_assign (fold_build2 (MEM_REF,
-simd_type.itype,
-base,
-zero), args[1]);
+ = gimple_build_assign (fold_build2 (MEM_REF, access_type,
+ base, zero),
+args[1]);
  }
break;
 


[PATCH] libstdc++: Fix 17_intro/names.cc on Solaris

2021-10-25 Thread Rainer Orth
17_intro/names.cc and experimental/names.cc currently FAIL on Solaris:

FAIL: 17_intro/names.cc (test for excess errors)
FAIL: experimental/names.cc (test for excess errors)

Excess errors:
/usr/include/sys/timespec_util.h:22: error: expected ')' before ';' token
/usr/include/stdlib.h:157: error: expected unqualified-id before '[' token
/usr/include/stdlib.h:157: error: expected ')' before '[' token

<sys/timespec_util.h> has

extern int timespeccompare(const struct timespec *l, const struct timespec *r);

while <stdlib.h> has

typedef struct drand48_data {
unsigned int _initialised;
unsigned short int x[3];
unsigned short int a[3];
unsigned int c;
unsigned short lastx[3];
} drand48_data;

both of which break because the testcase defines r and x, respectively, as (.

Fixed by undoing the defines.  Tested on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
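
For readers unfamiliar with how the test works, here is a minimal C sketch of the mechanism.  It is only an illustration under stated assumptions: the struct and function below (drand48_like, compare_ints, first_x, self_check) are hypothetical stand-ins for the system declarations quoted above, not the Solaris headers themselves.

```c
#include <assert.h>

/* Poison short identifiers, as the names test does, so that any header
   using them fails to parse.  */
#define r (
#define x (

/* Declarations that use `r' as a parameter name or `x' as a member name
   cannot be parsed while the macros are active, so undo them first.  */
#undef r
#undef x

/* Hypothetical stand-in for a struct with a member named x.  */
struct drand48_like
{
  unsigned short x[3];
};

/* Hypothetical stand-in for a function with a parameter named r.  */
static int
compare_ints (const int *l, const int *r)
{
  return (*l > *r) - (*l < *r);
}

static unsigned short
first_x (const struct drand48_like *d)
{
  return d->x[0];
}

/* Small sanity check that the declarations above behave as intended.  */
static void
self_check (void)
{
  int a = 1, b = 2;
  assert (compare_ints (&a, &b) == -1);
}
```

Removing the two #undef lines makes the struct and function declarations fail to compile, which mirrors the excess errors shown above.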

Ok for master?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-10-20  Rainer Orth  

libstdc++-v3:
* testsuite/17_intro/names.cc [__sun__] (r, x): Undef.


# HG changeset patch
# Parent  46e494d20576c10260ec41a54607f36eb946dc2e
libstdc++: Fix 17_intro/names.cc on Solaris

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -251,6 +251,10 @@
 #undef p
 // See https://gcc.gnu.org/ml/libstdc++/2019-05/msg00175.html
 #undef ptr
+// <sys/timespec_util.h> uses this as parameter
+#undef r
+// <stdlib.h> uses this as member of drand48_data
+#undef x
 #endif
 
 #ifdef __VXWORKS__


RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-25 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, October 25, 2021 10:54 AM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; Richard
> Earnshaw ; nd ; Marcus
> Shawcroft 
> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
> and operations
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Saturday, October 23, 2021 11:40 AM
> >> To: Tamar Christina via Gcc-patches 
> >> Cc: Tamar Christina ; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector
> >> constants and operations
> >>
> >> Tamar Christina via Gcc-patches  writes:
> >> >> I'm still a bit sceptical about treating the high-part cost as lower.
> >> >> ISTM that the subreg cases are the ones that are truly “free” and
> >> >> any others should have a normal cost.  So if CSE handled the
> >> >> subreg case itself (to model how the rtx would actually be
> >> >> generated) then aarch64 code would have to do less work.  I imagine
> >> >> that will be true for other targets as well.
> >> >
> >> > I guess the main problem is that CSE lacks context because it's not
> >> > until after combine that the high part becomes truly "free" when
> >> > pushed into a high operation.
> >>
> >> Yeah.  And the aarch64 code is just being asked to cost the operation
> >> it's given, which could for example come from an existing
> >> aarch64_simd_mov_from_<mode>high.  I think we should try to ensure
> >> that an aarch64_simd_mov_from_<mode>high followed by some arithmetic
> >> on the result is more expensive than the fused operation (when fusing
> >> is possible).
> >>
> >> An analogy might be: if the cost code is given:
> >>
> >>   (add (reg X) (reg Y))
> >>
> >> then, at some later point, the (reg X) might be replaced with a
> >> multiplication, in which case we'd have a MADD operation and the
> >> addition is effectively free.  Something similar would happen if (reg
> >> X) became a shift by a small amount on newer cores, although I guess
> >> then you could argue either that the cost of the add disappears or that
> >> the cost of the shift disappears.
> >>
> >> But we shouldn't count ADD as free on the basis that it could be
> >> combined with a multiplication or shift in future.  We have to cost
> >> what we're given.  I think the same thing applies to the high part.
> >>
> >> Here we're trying to prevent cse1 from replacing a DUP (lane) with a
> >> MOVI by saying that the DUP is strictly cheaper than the MOVI.
> >> I don't think that's really true though, and the cost tables in the
> >> patch say that DUP is more expensive (rather than less expensive) than
> >> MOVI.
> >
> > No we're not. The front end has already pushed the constant into each
> > operation that needs it which is the entire problem.
> 
> I think we're talking about different things here.  I'll come to the gimple
> stuff below, but I was talking purely about the effect on the RTL
> optimisers.  What I meant above is that, in the cse1 dumps, the patch leads
> to changes like:
> 
>  (insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
> -(const_vector:V8QI [
> +(vec_select:V8QI (reg:V16QI 116)
> +(parallel:V16QI [
> +(const_int 8 [0x8])
> +(const_int 9 [0x9])
> +(const_int 10 [0xa])
> +(const_int 11 [0xb])
> +(const_int 12 [0xc])
> +(const_int 13 [0xd])
> +(const_int 14 [0xe])
> +(const_int 15 [0xf])
> +]))) "include/arm_neon.h":6477:22 1394
> {aarch64_simd_mov_from_v16qihigh}
> + (expr_list:REG_EQUAL (const_vector:V8QI [
>  (const_int 3 [0x3]) repeated x8
> -])) "include/arm_neon.h":6477:22 1160 {*aarch64_simd_movv8qi}
> - (expr_list:REG_DEAD (reg:V16QI 117)
> -(nil)))
> +])
> +(expr_list:REG_DEAD (reg:V16QI 117)
> +(nil
> 
> The pre-cse1 code is:
> 
> (insn 19 18 20 2 (set (reg:V16QI 117)
> (const_vector:V16QI [
> (const_int 3 [0x3]) repeated x16
> ])) "include/arm_neon.h":6477:22 1166 {*aarch64_simd_movv16qi}
>  (nil))
> (insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
> (vec_select:V8QI (reg:V16QI 117)
> (parallel:V16QI [
> (const_int 8 [0x8])
> (const_int 9 [0x9])
> (const_int 10 [0xa])
> (const_int 11 [0xb])
> (const_int 12 [0xc])
> (const_int 13 [0xd])
> (const_int 14 [0xe])
> (const_int 15 [0xf])
> ]))) "include/arm_neon.h":6477:22 1394
> {aarch64_simd_mov_from_v16qihigh}
>  (nil))
> 
> That is, before the patch, we folded insn 19 into insn 20 to get:
> 
> (insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
> 

Re: [PATCH] Constant fold/simplify SS_ASHIFT and US_ASHIFT in simplify-rtx.c

2021-10-25 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> This patch adds compile-time evaluation of signed saturating left shift
> (SS_ASHIFT) and unsigned saturating left shift (US_ASHIFT) to simplify-rtx's
> simplify_const_binary_operation.  US_ASHIFT saturates to the maximum
> unsigned value on overflow (which occurs when the shift is greater than
> the leading zero count), while SS_ASHIFT saturates on overflow to the
> maximum signed value for positive arguments, and the minimum signed value
> for negative arguments (which occurs when the shift count is greater than
> the number of leading redundant sign bits, clrsb).  This suggests
> some additional simplifications that this patch implements in
> simplify_binary_operation_1; us_ashift:HI of 0xffff remains 0xffff
> (much like any ashift of 0x0000 remains 0x0000), and ss_ashift:HI of
> 0x7fff remains 0x7fff, and of 0x8000 remains 0x8000.
>
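
To make the saturation rules described above concrete, here is a minimal C sketch of 16-bit saturating left shifts.  It is only an illustrative model, not the code in simplify-rtx.c; the helper names ss_ashift16, us_ashift16 and check_fixed_points are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ss_ashift:HI: shift left, clamping to the
   signed 16-bit range on overflow.  */
static int16_t
ss_ashift16 (int16_t x, unsigned int count)
{
  /* Counts of 16 or more overflow for every nonzero input, so cap the
     count to keep the widened product in range.  */
  if (count > 16)
    count = 16;
  /* Multiply instead of shifting: left-shifting a negative value is
     undefined in C.  */
  int64_t wide = (int64_t) x * ((int64_t) 1 << count);
  if (wide > INT16_MAX)
    return INT16_MAX;
  if (wide < INT16_MIN)
    return INT16_MIN;
  return (int16_t) wide;
}

/* Hypothetical model of us_ashift:HI: shift left, clamping to the
   unsigned 16-bit maximum on overflow.  */
static uint16_t
us_ashift16 (uint16_t x, unsigned int count)
{
  if (count > 16)
    count = 16;
  uint64_t wide = (uint64_t) x << count;
  return wide > UINT16_MAX ? UINT16_MAX : (uint16_t) wide;
}

/* The saturation values are fixed points for every shift count, which
   is what allows the shift to be dropped for those operands.  */
static void
check_fixed_points (void)
{
  for (unsigned int n = 0; n < 32; n++)
    {
      assert (ss_ashift16 (INT16_MAX, n) == INT16_MAX);
      assert (ss_ashift16 (INT16_MIN, n) == INT16_MIN);
      assert (us_ashift16 (UINT16_MAX, n) == UINT16_MAX);
    }
}
```

For example, ss_ashift16 (0x4000, 1) overflows and yields 0x7fff, while ss_ashift16 (3, 4) is an ordinary shift yielding 48.
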
> Conveniently the bfin backend provides instructions/built-ins that allow
> this functionality to be tested.  The two functions below
>
> short stest_sat_max() { return __builtin_bfin_shl_fr1x16(1,8); }
> short stest_sat_min() { return __builtin_bfin_shl_fr1x16(-1,8); }
>
> previously on bfin-elf with -O2 generated:
>
> _stest_sat_max:
> nop;
> nop;
> R0 = 1 (X);
> R0 = R0 << 8 (V,S);
> rts;
>
> _stest_sat_min:
> nop;
> nop;
> R0 = -1 (X);
> R0 = R0 << 8 (V,S);
> rts;
>
> With this patch, bfin-elf now generates:
>
> _stest_sat_max:
> nop;
> nop;
> nop;
> R0 = 32767 (X);
> rts;
>
> _stest_sat_min:
> nop;
> nop;
> nop;
> R0 = -32768 (X);
> rts;
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check with no new failures, and on a cross-compiler to bfin-elf
> with no regressions.  Ok for mainline?
>
>
> 2021-10-25  Roger Sayle  
>
> gcc/ChangeLog
>   * simplify-rtx.c (simplify_binary_operation_1) [SS_ASHIFT]: Simplify
>   shifts of the mode's smin_value and smax_value when the bit count
>   operand doesn't have side-effects.
>   [US_ASHIFT]: Likewise, simplify shifts of the mode's umax_value
>   when the bit count operand doesn't have side-effects.
>   (simplify_const_binary_operation) [SS_ASHIFT, US_ASHIFT]: Perform
>   compile-time evaluation of saturating left shifts with constant
>   arguments.
>
> gcc/testsuite/ChangeLog
>   * gcc.target/bfin/ssashift-1.c: New test case.
>
>
> Roger
> --
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 2bb18fb..5903d55 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4064,9 +4064,25 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>   }
>break;
>  
> -case ASHIFT:
>  case SS_ASHIFT:
> +  if (CONST_INT_P (trueop0)
> +   && HWI_COMPUTABLE_MODE_P (mode)
> +   && (UINTVAL (trueop0) == (GET_MODE_MASK (mode) >> 1)
> +   || mode_signbit_p (mode, trueop0))
> +   && ! side_effects_p (op1))
> + return op0;
> +  goto simplify_ashift;
> +
>  case US_ASHIFT:
> +  if (CONST_INT_P (trueop0)
> +   && HWI_COMPUTABLE_MODE_P (mode)
> +   && UINTVAL (trueop0) == GET_MODE_MASK (mode)
> +   && ! side_effects_p (op1))
> + return op0;
> +  /* FALLTHRU */
> +
> +case ASHIFT:
> +simplify_ashift:
>if (trueop1 == CONST0_RTX (mode))
>   return op0;
>if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
> @@ -5004,6 +5020,8 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
>   case LSHIFTRT:
>   case ASHIFTRT:
>   case ASHIFT:
> + case SS_ASHIFT:
> + case US_ASHIFT:
> {
>   wide_int wop1 = pop1;
>   if (SHIFT_COUNT_TRUNCATED)
> @@ -5025,6 +5043,27 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
>   result = wi::lshift (pop0, wop1);
>   break;
>  
> +   case SS_ASHIFT:
> + if (wi::leu_p (wop1, wi::clrsb (pop0)))
> +   result = wi::lshift (pop0, wop1);
> + else if (wi::neg_p (pop0))
> +   result = wi::min_value (GET_MODE_PRECISION (int_mode),
> +   SIGNED);
> + else
> +   result = wi::max_value (GET_MODE_PRECISION (int_mode),
> +   SIGNED);
> + break;
> +
> +   case US_ASHIFT:
> + if (wi::eq_p (pop0, 0))
> +   result = pop0;
> + else if (wi::leu_p (wop1, wi::clz (pop0)))
> +   result = wi::lshift (pop0, wop1);

I guess in the SS_ASHIFT case we're relying on clrsb (0) doing something
sensible (return the number of bits minus 1, which it does).  We could
also rely on wi::clz doing something sensible here (like we already do in
wi::min_precision) and remove the special case for zero.  Either way
is fine 

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-25 Thread Aldy Hernandez via Gcc-patches
On Mon, Oct 25, 2021 at 6:42 AM Jeff Law  wrote:
>
>
>
> On 10/24/2021 8:15 PM, Jeff Law wrote:
> >
> >
> > On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
> >>
> >>
> >> On 10/18/21 12:52 AM, Jeff Law wrote:
> >>>
> >>>
> >>> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
>  The following patch converts the strlen pass from evrp to ranger,
>  leaving DOM as the last remaining user.
> >>> So is there any reason why we can't convert DOM as well? DOM's use
> >>> of EVRP is pretty limited.  You've mentioned FP bits before, but my
> >>> recollection is those are not part of the EVRP analysis DOM uses.
> >>> Hell, give me a little guidance and I'll do the work...
> >>
> >> Not only will I take you up on that offer, but I can provide 90% of
> >> the work.  Here be dragons, though (well, for me, maybe not for you
> >> ;-)).
> > [ ... ]
> > So the failure I see is a bootstrap comparison failure affecting
> > omp-expand.c and cp/cp-gimplify.c.  We end up generating different
> > code with and without debug symbols.
> Replying to myself
>
>
> So we're getting different results from a call to fold_range_internal
> for this statement in bb #35 of expand_omp_target:
>
> (gdb) p debug_gimple_stmt (stmt)
> if (loop_171 != 0B)
>
>
> 259 res = fold_range_internal (r, s, NULL_TREE);
> (gdb) n
> 283   if (idx)
> (gdb) p res
> $60 = true
> (gdb) p r
> $61 = (irange &) @0x7fffdb20: {m_num_ranges = 1 '\001',
>m_max_ranges = 2 '\002', m_kind = VR_RANGE, m_base = 0x7fffdb30}
>
>
> vs
>
> 259 res = fold_range_internal (r, s, NULL_TREE);
> (gdb)
> 283   if (idx)
> (gdb) p res
> $16 = true
> (gdb) p r
> $17 = (irange &) @0x7fffdba0: {m_num_ranges = 1 '\001', m_max_ranges
> = 2 '\002', m_kind = VR_VARYING,
>m_base = 0x7fffdbb0}
>
> Anyway, not sure when I'll be able to look at this again, perhaps
> Wednesday.  But my sense is something isn't right WRT the range of loop_171.

You can print the range in gdb by calling debug(r) or alternatively r->debug().

It'd be interesting to see why the statement got folded to two
different ranges.  Did the IL change?  Was a global range recorded in
SSA_NAME_RANGE_INFO that perhaps the ranger picked up?  Actually, even
if the IL changed, it'd be interesting to see what exactly caused the
disparity.

Can you call gimple_ranger::dump_bb() on the problematic BB on both
compiles to see what the ranger sees for that BB?

You could also call debug_ranger() from within gdb and get a dump of
what a fresh ranger would be able to calculate with the current IL.
Try it on both compiles and send it to us, if you don't want to spam
the list.

Thanks.
Aldy



[PATCH] tree-optimization/102905 - restore re-align load for alignment peeling

2021-10-25 Thread Richard Biener via Gcc-patches
Previous refactoring made the possibility of considering re-aligned
loads for unlimited cost model alignment peeling difficult so I
ditched that.  Later refactoring made it easily possible again so
the following patch re-instantiates this which should fix the
observed regression on powerpc with altivec.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-25  Richard Biener  

PR tree-optimization/102905
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Use vect_supportable_dr_alignment again to determine whether
an access is supported when not aligned.
---
 gcc/tree-vect-data-refs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 556ae9725f1..cbcd4b80246 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1994,9 +1994,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 prune all entries from the peeling hashtable which cause
 DRs to be not supported.  */
  bool supportable_if_not_aligned
-   = targetm.vectorize.support_vector_misalignment
-   (TYPE_MODE (vectype), TREE_TYPE (DR_REF (dr_info->dr)),
-DR_MISALIGNMENT_UNKNOWN, false);
+   = vect_supportable_dr_alignment
+   (loop_vinfo, dr_info, vectype, DR_MISALIGNMENT_UNKNOWN);
  while (known_le (npeel_tmp, nscalars))
 {
   vect_peeling_hash_insert (_htab, loop_vinfo,
-- 
2.31.1


Re: [PATCH 3/N] Come up with casm global state.

2021-10-25 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 5:42 PM Segher Boessenkool
 wrote:
>
> On Thu, Oct 21, 2021 at 02:42:10PM +0200, Richard Biener wrote:
> > +#define rs6000_casm static_cast (casm)
> >
> > maybe there's a better way?  Though I can't think of one at the moment.
> > There are only 10 uses so eventually we can put the
> > static_cast into all places.  Let's ask the powerpc maintainers (CCed).
>
> It's disgusting, and fragile.  The define is slightly better than having
> to write it out every time.  But can this not be done properly?
>
> If you use object-oriented stuff and need casts for that, you are doing
> something wrong.

I think the "proper" fix would be to make 'casm' have the correct type
in the first place - of course that would either mean that the target
needs to provide the (possibly derived) type, for example via a typedef
in the target structure or a classical target macro.  If gengtype would
know about inheritance that would also fix the GC marking issue.
Of course encoding a static type in the target structure is the
wrong direction from making targets "switchable".

Other than that my C++ fu is too weak to suggest the correct "pattern"
here.

> > Note you do
> >
> > +/* Implement TARGET_ASM_INIT_SECTIONS.  */
> > +
> > +static asm_out_state *
> > +rs6000_elf_asm_init_sections (void)
> > +{
> > +  rs6000_asm_out_state *target_state
> > += new (ggc_alloc ()) rs6000_asm_out_state ();
> > +  target_state->init_elf_sections ();
> > +  target_state->init_sections ();
> > +
> > +  return target_state;
> > +}
> >
> > If you'd have made init_sections virtual the flow would be more
> > natural and we could separate section init from casm construction
> > (and rs6000 would override init_sections but call the base function
> > from the override).
>
> Yeah.
>
>
> Segher


Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-25 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Saturday, October 23, 2021 11:40 AM
>> To: Tamar Christina via Gcc-patches 
>> Cc: Tamar Christina ; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
>> and operations
>> 
>> Tamar Christina via Gcc-patches  writes:
>> >> I'm still a bit sceptical about treating the high-part cost as lower.
>> >> ISTM that the subreg cases are the ones that are truly “free” and any
>> >> others should have a normal cost.  So if CSE handled the subreg case
>> >> itself (to model how the rtx would actually be generated) then
>> >> aarch64 code would have to do less work.  I imagine that will be true for
>> other targets as well.
>> >
>> > I guess the main problem is that CSE lacks context because it's not
>> > until after combine that the high part becomes truly "free" when pushed
>> into a high operation.
>> 
>> Yeah.  And the aarch64 code is just being asked to cost the operation it's
>> given, which could for example come from an existing
>> aarch64_simd_mov_from_high.  I think we should try to ensure that
>> an aarch64_simd_mov_from_high followed by some arithmetic on
>> the result is more expensive than the fused operation (when fusing is
>> possible).
>> 
>> An analogy might be: if the cost code is given:
>> 
>>   (add (reg X) (reg Y))
>> 
>> then, at some later point, the (reg X) might be replaced with a multiplication,
>> in which case we'd have a MADD operation and the addition is effectively
>> free.  Something similar would happen if (reg X) became a shift by a small
>> amount on newer cores, although I guess then you could argue either that
>> the cost of the add disappears or that the cost of the shift disappears.
>> 
>> But we shouldn't count ADD as free on the basis that it could be combined
>> with a multiplication or shift in future.  We have to cost what we're given.
>> I think the same thing applies to the high part.
>> 
>> Here we're trying to prevent cse1 from replacing a DUP (lane) with a MOVI
>> by saying that the DUP is strictly cheaper than the MOVI.
>> I don't think that's really true though, and the cost tables in the patch
>> say that DUP is more expensive (rather than less expensive) than MOVI.
>
> No we're not. The front end has already pushed the constant into each
> operation that needs it which is the entire problem.

I think we're talking about different things here.  I'll come to the
gimple stuff below, but I was talking purely about the effect on the
RTL optimisers.  What I meant above is that, in the cse1 dumps,
the patch leads to changes like:

 (insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
-(const_vector:V8QI [
+(vec_select:V8QI (reg:V16QI 116)
+(parallel:V16QI [
+(const_int 8 [0x8])
+(const_int 9 [0x9])
+(const_int 10 [0xa])
+(const_int 11 [0xb])
+(const_int 12 [0xc])
+(const_int 13 [0xd])
+(const_int 14 [0xe])
+(const_int 15 [0xf])
+]))) "include/arm_neon.h":6477:22 1394 {aarch64_simd_mov_from_v16qihigh}
+ (expr_list:REG_EQUAL (const_vector:V8QI [
 (const_int 3 [0x3]) repeated x8
-])) "include/arm_neon.h":6477:22 1160 {*aarch64_simd_movv8qi}
- (expr_list:REG_DEAD (reg:V16QI 117)
-(nil)))
+])
+(expr_list:REG_DEAD (reg:V16QI 117)
+(nil

The pre-cse1 code is:

(insn 19 18 20 2 (set (reg:V16QI 117)
(const_vector:V16QI [
(const_int 3 [0x3]) repeated x16
])) "include/arm_neon.h":6477:22 1166 {*aarch64_simd_movv16qi}
 (nil))
(insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
(vec_select:V8QI (reg:V16QI 117)
(parallel:V16QI [
(const_int 8 [0x8])
(const_int 9 [0x9])
(const_int 10 [0xa])
(const_int 11 [0xb])
(const_int 12 [0xc])
(const_int 13 [0xd])
(const_int 14 [0xe])
(const_int 15 [0xf])
]))) "include/arm_neon.h":6477:22 1394 {aarch64_simd_mov_from_v16qihigh}
 (nil))

That is, before the patch, we folded insn 19 into insn 20 to get:

(insn 20 19 21 2 (set (reg:V8QI 96 [ _8 ])
(const_vector:V8QI [
(const_int 3 [0x3]) repeated x8
])) "include/arm_neon.h":6477:22 1160 {*aarch64_simd_movv8qi}
 (expr_list:REG_DEAD (reg:V16QI 117)
(nil)))

After the patch we reject that because:

  (set (reg:V8QI X) (const_vector:V8QI [3]))

is costed as a MOVI (cost 4) and the original aarch64_simd_mov_from_v16qihigh
is costed as zero.  In other words, the patch makes the DUP (lane) in the
“mov high” strictly cheaper than a constant move (MOVI).

Preventing this fold seems like a 

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-25 Thread Richard Biener via Gcc-patches
On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:

> On Fri, 22 Oct 2021 at 14:56, Richard Biener  wrote:
> >
> > On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 20 Oct 2021 at 18:21, Richard Biener  wrote:
> > > >
> > > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via 
> > > > > > > > Gcc-patches
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch 
> > > > > > > > > > > > > > > that adds two patterns
> > > > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() 
> > > > > > > > > > > > > > > and,
> > > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This works to remove call to erfc for the 
> > > > > > > > > > > > > > > following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does 
> > > > > > > > > > > > > > > not transform 1 -
> > > > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> 
> > > > > > > > > > > > > > > erfc(x) gets applied,
> > > > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > > > resimplify erfc(x), which fails post 
> > > > > > > > > > > > > > > canonicalization. So we end up
> > > > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose 
> > > > > > > > > > > > > > > isn't ideal.
> > > > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I applied your patch manually and it does the 
> > > > > > > > > > > > > > intended
> > > > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) 
> > > > > > > > > > > > > even when there's
> > > > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > > > >
> > > > > > > > > > > > I do think it's reasonable to expect erfc to be 
> > > > > > > > > > > > available when erf
> > > > > > > > > > > > is and vice versa but note both are C99 specified 
> > > > > > > > > > > > functions (either
> > > > > > > > > > > > requires -lm).
> > > > > > > > > > > OK, thanks. Would it be OK to commit the patch after 
> > > > > > > > > > > bootstrap+test ?
> > > > > > > > > >
> > > > > > > > > > Yes, but I'm confused because you say the patch doesn't 
> > > > > > > > > > work for you?
> > > > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > > > However when there's only erfc in the source, it 
> > > > > > > > > canonicalizes erfc(x)
> > > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) 
> > > > > > > > > back to
> > > > > > > > > erfc(x)
> > > > > > > > > with -O3 -funsafe-math-optimizations.
> 

Re: [PATCH] Objective-C: fix protocol list count type (pertinent to non-LP64)

2021-10-25 Thread Iain Sandoe
Hi Matt,

> On 23 Oct 2021, at 09:46, Iain Sandoe via Gcc-patches 
>  wrote:
> 
>> On 20 Oct 2021, at 04:51, Matt Jacobson via Gcc-patches 
>>  wrote:
>> 
>> 
>>> On Sep 26, 2021, at 11:45 PM, Matt Jacobson  wrote:
>>> 
>>> Fix protocol list layout for non-LP64.  clang and objc4 both give the `count`
>>> field as `long`, not `intptr_t`.  Those are the same on LP64, but not
>>> everywhere.  For non-LP64, this fixes binary compatibility with clang-built
>>> classes.
>>> 
>>> This was more complicated than I anticipated, because the relevant frontend
>>> code in fact had no AST type for `protocol_list_t`, instead emitting protocol
>>> lists as `protocol_t[]`, with the zeroth element actually being the integer
>>> count.  That made it nontrivial to change the count to `long`.  With this
>>> change, there is now a true `protocol_list_t` type in the AST.
>>> 
>>> Tested multiple ways.  On x86_64/Darwin, I confirmed with a test program that
>>> protocol conformances by classes, categories, and protocols work.

see ** below …

>>> On AVR, I manually inspected the generated assembly to confirm that protocol
>>> lists gain an extra two bytes of `count`, matching clang.

Did you test objective-c++ on Darwin?

I see a lot of fails of the form:
Excess errors:
: error: initialization of a flexible array member [-Wpedantic]


>>> Thank you for your time.
>>> 
>>> 
>> 
>> Friendly ping.  Please let me know if there’s anything I can clarify.
> 
> The patch is in my queue (it will not get lost), the rationale seems reasonable,

For a patch that changes code-gen we should have a test that it produces what’s
expected (in general, a ‘torture’ test would be preferable so that we can be
sure the output is as expected for different optimisation levels).

** If you have a test program, that could be the basis for this, but it needs
to be integrated into the testsuite with scans for the required output.

Have to give some thought to how to solve the obj-c++ issue.
Iain
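To illustrate the layout point in the quoted patch description, here is a small C sketch. The struct names and fields are hypothetical stand-ins, not the actual objc4 or GCC definitions; they only show that a `long` count and an `intptr_t` count give the same layout when the two types have the same size (as on LP64) and may differ elsewhere, which is consistent with the AVR observation quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a protocol list with a `long` count.  */
struct protocol_list_long
{
  long count;
  void *list[];   /* flexible array member holding protocol pointers */
};

/* Hypothetical stand-in for the old layout with an `intptr_t` count.  */
struct protocol_list_intptr
{
  intptr_t count;
  void *list[];
};

/* Return nonzero if both layouts place `list` at the same offset.  */
static int
layouts_agree (void)
{
  return offsetof (struct protocol_list_long, list)
	 == offsetof (struct protocol_list_intptr, list);
}
```

On an LP64 host `layouts_agree ()` is expected to return nonzero; on a target where `long` is wider than a pointer it would not.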




Re: [PATCH,Fortran 0/7] delete some unused decls, make static

2021-10-25 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 25 Oct 2021 00:30:16 +0200
Bernhard Reutner-Fischer  wrote:

> Hi!
> 
> Quickly skimming through the frontend headers.
> There are a couple of declarations for functions that do not have
> definitions. And there are a couple of functions that can be static.

> Bootstraps fine, regression tests running over night.
> Ok for trunk if it passes?

Tested with no regressions on x86_64-unknown-linux {-m32,-m64}.
Ok for trunk?


Re: [RFC PATCH 0/8] RISC-V: Bit-manipulation extension.

2021-10-25 Thread Kito Cheng via Gcc-patches
As we discussed in the last RISC-V GNU sync up, I've committed this
patch-set to trunk after rebase and running regression with latest
binutils.

On Mon, Oct 18, 2021 at 11:23 AM Kito Cheng  wrote:
>
> Hi Vineet:
>
> I am not familiar with buildroot, so I am not sure which GCC version will work,
> but I think the patch set should apply to both gcc 11.1 and
> trunk without conflict.
>
> Here is a gcc 11.1 + this patch set on my github, hope this could help :)
> https://github.com/kito-cheng/riscv-gcc/tree/riscv-gcc-11.1.0-zbabcs
>
> On Thu, Oct 14, 2021 at 4:22 AM Vineet Gupta  wrote:
> >
> > Hi Kito,
> >
> > On 9/23/21 12:57 AM, Kito Cheng wrote:
> > > The bit-manipulation extension[1] is finishing public review and waiting for
> > > the rest of the ratification process. I believe it will become a ratified
> > > extension soon, so I think it's time to submit it upstream for review now :)
> > >
> > > As the title includes RFC, there is no rush to merge to trunk yet; I would
> > > like to hold the merge until it is officially ratified.
> > >
> > > This patch set is the implementation of the bit-manipulation extension, which
> > > includes the zba, zbb, zbc and zbs extensions, but only as instruction/md
> > > patterns, with no intrinsic function implementation.
> > >
> > > Most work is done by Jim Wilson and many other contributors
> > > on https://github.com/riscv-collab/riscv-gcc.
> > >
> > >
> > > [1] https://github.com/riscv/riscv-bitmanip/releases/tag/1.0.0
> >
> > I wanted to give these a try. Is it reasonable to apply these to a gcc
> > 11.1 baseline and give them a spin in buildroot, or do these absolutely
> > have to be on bleeding edge gcc?
> >
> > Thx,
> > -Vineet


Re: [PATCH] Canonicalize __atomic/sync_fetch_or/xor/and for constant mask.

2021-10-25 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 25, 2021 at 1:59 PM liuhongt  wrote:
>
> Canonicalize & and nop_convert order for
> __atomic_fetch_or_*, __atomic_fetch_xor_*,
> __atomic_xor_fetch_*,__sync_fetch_and_or_*,
> __sync_fetch_and_xor_*,__sync_xor_and_fetch_*,
> __atomic_fetch_and_*,__sync_fetch_and_and_* when mask is constant.
>
> i.e.
>
> +/* Canonicalize
> +  _1 = __atomic_fetch_or_4 (, 1, 0);
> +  _2 = (int) _1;
> +  _5 = _2 & 1;
> +
> +to
> +
> +  _1 = __atomic_fetch_or_4 (, 1, 0);
> +  _2 = _1 & 1;
> +  _5 = (int) _2;
>
> +/* Convert
> + _1 = __atomic_fetch_and_4 (a_6(D), 4294959103, 0);
> + _2 = (int) _1;
> + _3 = _2 & 8192;
> +to
> +  _1 = __atomic_fetch_and_4 (a_4(D), 4294959103, 0);
> +  _7 = _1 & 8192;
> +  _6 = (int) _7;
> + So it can be handled by  optimize_atomic_bit_test_and.  */
>
> I'm trying to rewrite match part in match.pd and find the
> canonicalization is ok when mask is constant, but not for variable
> since it will be simplified back by
>  /* In GIMPLE, getting rid of 2 conversions for one new results
> in smaller IL.  */
>  (simplify
>   (convert (bitop:cs@2 (nop_convert:s @0) @1))
>   (if (GIMPLE
>&& TREE_CODE (@1) != INTEGER_CST
>&& tree_nop_conversion_p (type, TREE_TYPE (@2))
>&& types_match (type, @0))
>(bitop @0 (convert @1)
>
> The canonicalization for a variable mask is like
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>  _4 = (int) _3;
>  _5 = _4 & mask_7;
>
> to
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>   _4 = (unsigned int) mask_7
>   _6 = _3 & _4
>   _5 = (int) _6
>
> and be simplified back.
>
> I've also tried another way of simplification like
>
> convert
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>  _4 = (int) _3;
>  _5 = _4 & mask_7;
>
> to
>   _1 = (unsigned int)mask_7;
>   _2 = ~ _1;
>   _3 = __atomic_fetch_and_4 (ptr_6, _2, 0);
>_6 = _3 & _1
>   _5 = (int)
>
> but it's prevented by the code below since __atomic_fetch_and_4 is not CONST,
> yet we need to regenerate it with an updated parameter.
>
>   /* We can't and should not emit calls to non-const functions.  */
>   if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> return NULL;
>
> gcc/ChangeLog:
>
> * match.pd: Canonicalize __atomic/sync_fetch_or/xor/and for
> constant mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102566-1a.c: New test.
> * gcc.target/i386/pr102566-2a.c: New test.
> ---
>  gcc/match.pd| 114 
>  gcc/testsuite/gcc.target/i386/pr102566-1a.c |  66 
>  gcc/testsuite/gcc.target/i386/pr102566-2a.c |  65 +++
>  3 files changed, 245 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102566-2a.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5bed2e12715..545a243eae6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -104,6 +104,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (define_operator_list COND_TERNARY
>IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
>
> +/* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
> +(define_operator_list ATOMIC_FETCH_OR_XOR_N
> +  BUILT_IN_ATOMIC_FETCH_OR_1 BUILT_IN_ATOMIC_FETCH_OR_2
> +  BUILT_IN_ATOMIC_FETCH_OR_4 BUILT_IN_ATOMIC_FETCH_OR_8
> +  BUILT_IN_ATOMIC_FETCH_OR_16
> +  BUILT_IN_ATOMIC_FETCH_XOR_1 BUILT_IN_ATOMIC_FETCH_XOR_2
> +  BUILT_IN_ATOMIC_FETCH_XOR_4 BUILT_IN_ATOMIC_FETCH_XOR_8
> +  BUILT_IN_ATOMIC_FETCH_XOR_16
> +  BUILT_IN_ATOMIC_XOR_FETCH_1 BUILT_IN_ATOMIC_XOR_FETCH_2
> +  BUILT_IN_ATOMIC_XOR_FETCH_4 BUILT_IN_ATOMIC_XOR_FETCH_8
> +  BUILT_IN_ATOMIC_XOR_FETCH_16)
> +/* __sync_fetch_and_or_*, __sync_fetch_and_xor_*, __sync_xor_and_fetch_*  */
> +(define_operator_list SYNC_FETCH_OR_XOR_N
> +  BUILT_IN_SYNC_FETCH_AND_OR_1 BUILT_IN_SYNC_FETCH_AND_OR_2
> +  BUILT_IN_SYNC_FETCH_AND_OR_4 BUILT_IN_SYNC_FETCH_AND_OR_8
> +  BUILT_IN_SYNC_FETCH_AND_OR_16
> +  BUILT_IN_SYNC_FETCH_AND_XOR_1 BUILT_IN_SYNC_FETCH_AND_XOR_2
> +  BUILT_IN_SYNC_FETCH_AND_XOR_4 BUILT_IN_SYNC_FETCH_AND_XOR_8
> +  BUILT_IN_SYNC_FETCH_AND_XOR_16
> +  BUILT_IN_SYNC_XOR_AND_FETCH_1 BUILT_IN_SYNC_XOR_AND_FETCH_2
> +  BUILT_IN_SYNC_XOR_AND_FETCH_4 BUILT_IN_SYNC_XOR_AND_FETCH_8
> +  BUILT_IN_SYNC_XOR_AND_FETCH_16)
> +/* __atomic_fetch_and_*.  */
> +(define_operator_list ATOMIC_FETCH_AND_N
> +  BUILT_IN_ATOMIC_FETCH_AND_1 BUILT_IN_ATOMIC_FETCH_AND_2
> +  BUILT_IN_ATOMIC_FETCH_AND_4 BUILT_IN_ATOMIC_FETCH_AND_8
> +  BUILT_IN_ATOMIC_FETCH_AND_16)
> +/* __sync_fetch_and_and_*.  */
> +(define_operator_list SYNC_FETCH_AND_AND_N
> +  BUILT_IN_SYNC_FETCH_AND_AND_1 BUILT_IN_SYNC_FETCH_AND_AND_2
> +  BUILT_IN_SYNC_FETCH_AND_AND_4 BUILT_IN_SYNC_FETCH_AND_AND_8
> +  BUILT_IN_SYNC_FETCH_AND_AND_16)
> +
>  /* With nop_convert? combine convert? and view_convert? in one pattern
> plus conditionalize on 
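As a rough illustration of the source-level idiom that this canonicalization targets, the following C sketch sets or clears a bit with the `__atomic_fetch_or` / `__atomic_fetch_and` built-ins and then masks the result after converting it to `int`. The function names are made up for this note, and the masks (1 and 8192, with 4294959103 being ~8192 as an unsigned 32-bit value) are taken from the quoted GIMPLE. Whether the compiler turns either function into a single bit-test instruction depends on the target and GCC version and is not claimed here.

```c
#include <stdbool.h>

/* Atomically set bit 0 of *v and report whether it was already set.
   The (int) conversion before the mask mirrors the GIMPLE shape
   `_2 = (int) _1; _5 = _2 & 1;` quoted in the patch description.  */
static bool
test_and_set_bit0 (unsigned int *v)
{
  int old = (int) __atomic_fetch_or (v, 1u, __ATOMIC_RELAXED);
  return (old & 1) != 0;
}

/* Atomically clear bit 13 (value 8192) of *v and report whether it
   was set before.  */
static bool
test_and_clear_bit13 (unsigned int *v)
{
  int old = (int) __atomic_fetch_and (v, ~(1u << 13), __ATOMIC_RELAXED);
  return (old & (1 << 13)) != 0;
}
```

In both helpers the mask is a compile-time constant, which is the case the quoted message says the canonicalization handles.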

Re: [PATCH] Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when a is a _Float16 value.

2021-10-25 Thread Richard Biener via Gcc-patches
On Mon, Oct 25, 2021 at 10:26 AM Hongtao Liu  wrote:
>
> On Mon, Oct 25, 2021 at 1:59 PM liuhongt  wrote:
> >
> > Similar for sqrt/sqrtl.
> >
>   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>   Ok for trunk?

OK.

> > gcc/ChangeLog:
> >
> > PR target/102464
> > * match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a)
> > when direct_internal_fn_supported_p, similar for sqrt/sqrtl.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/102464
> > * gcc.target/i386/pr102464-sqrtph.c: New test.
> > * gcc.target/i386/pr102464-sqrtsh.c: New test.
> > ---
> >  gcc/match.pd  |  6 +++--
> >  .../gcc.target/i386/pr102464-sqrtph.c | 27 +++
> >  .../gcc.target/i386/pr102464-sqrtsh.c | 23 
> >  3 files changed, 54 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 5bed2e12715..43d1c1bc0bd 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6228,14 +6228,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
> > BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
> > BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
> > -   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)
> > +   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF
> > +   BUILT_IN_SQRTL BUILT_IN_SQRT BUILT_IN_SQRTF)
> >   tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
> >   IFN_FLOOR IFN_FLOOR IFN_FLOOR
> >   IFN_CEIL IFN_CEIL IFN_CEIL
> >   IFN_ROUNDEVEN IFN_ROUNDEVEN IFN_ROUNDEVEN
> >   IFN_ROUND IFN_ROUND IFN_ROUND
> >   IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
> > - IFN_RINT IFN_RINT IFN_RINT)
> > + IFN_RINT IFN_RINT IFN_RINT
> > + IFN_SQRT IFN_SQRT IFN_SQRT)
> >   /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
> >  if x is a _Float16.  */
> >   (simplify
> > diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
> > new file mode 100644
> > index 000..8bd19c6e65e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
> > @@ -0,0 +1,27 @@
> > +/* PR target/102464.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ffast-math -ftree-vectorize" } */
> > +
> > +#include
> > +void foo1 (_Float16* __restrict a, _Float16* b)
> > +{
> > +  for (int i = 0; i != 8; i++)
> > +a[i] =  sqrtf (b[i]);
> > +}
> > +
> > +void foo2 (_Float16* __restrict a, _Float16* b)
> > +{
> > +  for (int i = 0; i != 8; i++)
> > +a[i] =  sqrt (b[i]);
> > +}
> > +
> > +void foo3 (_Float16* __restrict a, _Float16* b)
> > +{
> > +  for (int i = 0; i != 8; i++)
> > +a[i] =  sqrtl (b[i]);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> > +/* { dg-final { scan-assembler-not "vcvtph2p\[sd\]" } } */
> > +/* { dg-final { scan-assembler-not "extendhfxf" } } */
> > +/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 3 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
> > new file mode 100644
> > index 000..4cf0089a67f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/102464.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512fp16 -ffast-math" } */
> > +
> > +#include
> > +_Float16 foo1 (_Float16 a)
> > +{
> > +  return sqrtf (a);
> > +}
> > +
> > +_Float16 foo2 (_Float16 a)
> > +{
> > +  return sqrt (a);
> > +}
> > +
> > +_Float16 foo3 (_Float16 a)
> > +{
> > +  return sqrtl (a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> > +/* { dg-final { scan-assembler-not "extendhfxf" } } */
> > +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 3 } } */
> > --
> > 2.18.1
> >
>
>
> --
> BR,
> Hongtao


Re: [PATCH] Fix PR 102908: wrongly removing null pointer loads

2021-10-25 Thread Richard Biener via Gcc-patches
On Sun, Oct 24, 2021 at 11:28 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Just like PR 100382, here we have DCE removing a
> null pointer load which is still needed.
> In this case, execute_fixup_cfg removes a store (correctly)
> and then removes the null load (incorrectly) due to
> not checking stmt_unremovable_because_of_non_call_eh_p.
> This patch adds the check in a similar way to the patch
> that fixed PR 100382.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-dce.c (simple_dce_from_worklist):
> Check stmt_unremovable_because_of_non_call_eh_p also
> before removing the statement.
> ---
>  gcc/tree-ssa-dce.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
> index 372e0691ae6..1281e67489c 100644
> --- a/gcc/tree-ssa-dce.c
> +++ b/gcc/tree-ssa-dce.c
> @@ -1828,6 +1828,11 @@ simple_dce_from_worklist (bitmap worklist)
>if (gimple_has_side_effects (t))
> continue;
>
> +  /* Don't remove statements that are needed for non-call
> +eh to work.  */
> +  if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
> +   continue;
> +
>/* Add uses to the worklist.  */
>ssa_op_iter iter;
>use_operand_p use_p;
> --
> 2.17.1
>


[PATCH] tree-optimization/102920 - fix PHI VN with undefined args

2021-10-25 Thread Richard Biener via Gcc-patches
This fixes a latent issue exposed by now allowing VN_TOP in PHI
arguments.  We may only use optimistic equality when merging values on
different edges, not when merging values on the same edge - in particular
we may not choose the undef value on any edge when there's a not undef
value as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-25  Richard Biener  

PR tree-optimization/102920
* tree-ssa-sccvn.h (expressions_equal_p): Add argument
controlling VN_TOP matching behavior.
* tree-ssa-sccvn.c (expressions_equal_p): Likewise.
(vn_phi_eq): Do not optimistically match VN_TOP.

* gcc.dg/torture/pr102920.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr102920.c | 25 +
 gcc/tree-ssa-sccvn.c| 21 ++---
 gcc/tree-ssa-sccvn.h|  2 +-
 3 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr102920.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr102920.c b/gcc/testsuite/gcc.dg/torture/pr102920.c
new file mode 100644
index 000..aa27ac5f6ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr102920.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "-funswitch-loops" } */
+
+unsigned short a = 42;
+unsigned short b = 1;
+long int c = 1;
+unsigned char var_120;
+unsigned char var_123;
+
+void __attribute__((noipa)) test(unsigned short a, unsigned short b, long c)
+{
+  for (char i = 0; i < (char)c; i += 5)
+if (!b)
+  var_120 = a;
+else
+  var_123 = a;
+}
+
+int main()
+{
+  test(a, b, c);
+  if (var_123 != 42)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 893b1d0ddaa..d5242597684 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4441,11 +4441,15 @@ vn_phi_eq (const_vn_phi_t const vp1, const_vn_phi_t const vp2)
if (inverted_p)
  std::swap (te2, fe2);
 
-   /* ???  Handle VN_TOP specially.  */
+   /* Since we do not know which edge will be executed we have
+  to be careful when matching VN_TOP.  Be conservative and
+  only match VN_TOP == VN_TOP for now, we could allow
+  VN_TOP on the not prevailing PHI though.  See for example
+  PR102920.  */
if (! expressions_equal_p (vp1->phiargs[te1->dest_idx],
-  vp2->phiargs[te2->dest_idx])
+  vp2->phiargs[te2->dest_idx], false)
|| ! expressions_equal_p (vp1->phiargs[fe1->dest_idx],
- vp2->phiargs[fe2->dest_idx]))
+ vp2->phiargs[fe2->dest_idx], false))
  return false;
 
return true;
@@ -4470,7 +4474,7 @@ vn_phi_eq (const_vn_phi_t const vp1, const_vn_phi_t const vp2)
   tree phi2op = vp2->phiargs[i];
   if (phi1op == phi2op)
continue;
-  if (!expressions_equal_p (phi1op, phi2op))
+  if (!expressions_equal_p (phi1op, phi2op, false))
return false;
 }
 
@@ -5816,17 +5820,20 @@ get_next_constant_value_id (void)
 }
 
 
-/* Compare two expressions E1 and E2 and return true if they are equal.  */
+/* Compare two expressions E1 and E2 and return true if they are equal.
+   If match_vn_top_optimistically is true then VN_TOP is equal to anything,
+   otherwise VN_TOP only matches VN_TOP.  */
 
 bool
-expressions_equal_p (tree e1, tree e2)
+expressions_equal_p (tree e1, tree e2, bool match_vn_top_optimistically)
 {
   /* The obvious case.  */
   if (e1 == e2)
 return true;
 
   /* If either one is VN_TOP consider them equal.  */
-  if (e1 == VN_TOP || e2 == VN_TOP)
+  if (match_vn_top_optimistically
+  && (e1 == VN_TOP || e2 == VN_TOP))
 return true;
 
   /* SSA_NAME compare pointer equal.  */
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 8a1b649c726..7d53ab5e39f 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -22,7 +22,7 @@
 #define TREE_SSA_SCCVN_H
 
 /* In tree-ssa-sccvn.c  */
-bool expressions_equal_p (tree, tree);
+bool expressions_equal_p (tree, tree, bool = true);
 
 
 /* TOP of the VN lattice.  */
-- 
2.31.1
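
The effect of the new flag can be illustrated outside of GCC. What follows is only a minimal sketch: `model_value`, `MODEL_VN_TOP`, `model_expressions_equal_p` and `model_phi_args_equal` are hypothetical names, integers stand in for `tree` value numbers, and only the two early exits touched by the patch are modelled (the real function continues with a structural comparison).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a value number; GCC itself uses `tree`.  */
typedef int model_value;
#define MODEL_VN_TOP (-1)   /* models VN_TOP, the "not yet known" value */

/* Models only the two early exits the patch touches.  Anything else is
   treated as unequal here, which is a simplification.  */
static bool
model_expressions_equal_p (model_value e1, model_value e2,
                           bool match_vn_top_optimistically)
{
  /* The obvious case.  */
  if (e1 == e2)
    return true;

  /* VN_TOP matches anything only when the caller asks for it.  */
  if (match_vn_top_optimistically
      && (e1 == MODEL_VN_TOP || e2 == MODEL_VN_TOP))
    return true;

  return false;
}

/* Models the PHI argument loop; after the patch vn_phi_eq passes
   false for the flag.  */
static bool
model_phi_args_equal (const model_value *a, const model_value *b, size_t n,
                      bool match_vn_top_optimistically)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    if (!model_expressions_equal_p (a[i], b[i], match_vn_top_optimistically))
      return false;
  return true;
}
```

With arguments `{7, MODEL_VN_TOP}` and `{MODEL_VN_TOP, 7}` the optimistic mode reports the two PHIs as equal, while the conservative mode used after the patch does not; two PHIs with VN_TOP in the same position still match.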


Re: [PATCH] Simplify (_Float16) sqrtf((float) a) to .SQRT(a) when a is a _Float16 value.

2021-10-25 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 25, 2021 at 1:59 PM liuhongt  wrote:
>
> Similar for sqrt/sqrtl.
>
  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
  Ok for trunk?

> gcc/ChangeLog:
>
> PR target/102464
> * match.pd: Simplify (_Float16) sqrtf((float) a) to .SQRT(a)
> when direct_internal_fn_supported_p, similar for sqrt/sqrtl.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102464
> * gcc.target/i386/pr102464-sqrtph.c: New test.
> * gcc.target/i386/pr102464-sqrtsh.c: New test.
> ---
>  gcc/match.pd  |  6 +++--
>  .../gcc.target/i386/pr102464-sqrtph.c | 27 +++
>  .../gcc.target/i386/pr102464-sqrtsh.c | 23 
>  3 files changed, 54 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5bed2e12715..43d1c1bc0bd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6228,14 +6228,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> BUILT_IN_ROUNDEVENL BUILT_IN_ROUNDEVEN BUILT_IN_ROUNDEVENF
> BUILT_IN_ROUNDL BUILT_IN_ROUND BUILT_IN_ROUNDF
> BUILT_IN_NEARBYINTL BUILT_IN_NEARBYINT BUILT_IN_NEARBYINTF
> -   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF)
> +   BUILT_IN_RINTL BUILT_IN_RINT BUILT_IN_RINTF
> +   BUILT_IN_SQRTL BUILT_IN_SQRT BUILT_IN_SQRTF)
>   tos (IFN_TRUNC IFN_TRUNC IFN_TRUNC
>   IFN_FLOOR IFN_FLOOR IFN_FLOOR
>   IFN_CEIL IFN_CEIL IFN_CEIL
>   IFN_ROUNDEVEN IFN_ROUNDEVEN IFN_ROUNDEVEN
>   IFN_ROUND IFN_ROUND IFN_ROUND
>   IFN_NEARBYINT IFN_NEARBYINT IFN_NEARBYINT
> - IFN_RINT IFN_RINT IFN_RINT)
> + IFN_RINT IFN_RINT IFN_RINT
> + IFN_SQRT IFN_SQRT IFN_SQRT)
>   /* (_Float16) round ((doube) x) -> __built_in_roundf16 (x), etc.,
>  if x is a _Float16.  */
>   (simplify
> diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
> new file mode 100644
> index 000..8bd19c6e65e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtph.c
> @@ -0,0 +1,27 @@
> +/* PR target/102464.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ffast-math -ftree-vectorize" } */
> +
> +#include <math.h>
> +void foo1 (_Float16* __restrict a, _Float16* b)
> +{
> +  for (int i = 0; i != 8; i++)
> +    a[i] =  sqrtf (b[i]);
> +}
> +
> +void foo2 (_Float16* __restrict a, _Float16* b)
> +{
> +  for (int i = 0; i != 8; i++)
> +    a[i] =  sqrt (b[i]);
> +}
> +
> +void foo3 (_Float16* __restrict a, _Float16* b)
> +{
> +  for (int i = 0; i != 8; i++)
> +    a[i] =  sqrtl (b[i]);
> +}
> +
> +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> +/* { dg-final { scan-assembler-not "vcvtph2p\[sd\]" } } */
> +/* { dg-final { scan-assembler-not "extendhfxf" } } */
> +/* { dg-final { scan-assembler-times "vsqrtph\[^\n\r\]*xmm\[0-9\]" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
> new file mode 100644
> index 000..4cf0089a67f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102464-sqrtsh.c
> @@ -0,0 +1,23 @@
> +/* PR target/102464.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512fp16 -ffast-math" } */
> +
> +#include <math.h>
> +_Float16 foo1 (_Float16 a)
> +{
> +  return sqrtf (a);
> +}
> +
> +_Float16 foo2 (_Float16 a)
> +{
> +  return sqrt (a);
> +}
> +
> +_Float16 foo3 (_Float16 a)
> +{
> +  return sqrtl (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "vcvtsh2s\[sd\]" } } */
> +/* { dg-final { scan-assembler-not "extendhfxf" } } */
> +/* { dg-final { scan-assembler-times "vsqrtsh\[^\n\r\]*xmm\[0-9\]" 3 } } */
> --
> 2.18.1
>


-- 
BR,
Hongtao


Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-10-25 Thread Richard Biener via Gcc-patches
On Sun, 24 Oct 2021, Jan-Benedict Glaw wrote:

> Hi Richard,
> 
> On Sun, 2021-10-24 08:36:36 +0200, Richard Biener  wrote:
> > On October 23, 2021 10:00:05 PM GMT+02:00, Jan-Benedict Glaw 
> >  wrote:
> > >On Tue, 2021-09-21 16:25:19 +0200, Richard Biener via Gcc-patches 
> > > wrote:
> > >> I have built all targets from contrib/config-list.mk to make sure we
> > >> don't run into the #error and the following makes the STABS usage
> > >> explicit for pdp11 and hppa with SOM.
> > >
> > >I'm running build tests based on config-list.mk as well and see a good
> > >number of targets failing, all about the same, ie. for moxie-elf:
> > 
> > That's odd. I did test the patch using config-list.mk - the patch
> > > sat in the commit tree for quite a while since that exercise (but
> > unchanged), but I doubt anything significant changed in between. 
> > 
> > >[all 2021-10-17 00:01:19] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c 
> > >-DIN_GCC_FRONTEND -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE 
> > >-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
> > >-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag 
> > >-Wmissing-format-attribute -Woverloaded-virtual -pedantic 
> > >-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror 
> > >-fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. 
> > >-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include 
> > >-I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber 
> > >-I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
> > >-I../../gcc/gcc/../libbacktrace -o default-d.o -MT default-d.o -MMD 
> > >-MP -MF ./.deps/default-d.TPo ../../gcc/gcc/config/default-d.c [all 
> > >2021-10-17 00:01:19] In file included from ./tm_d.h:9, [all 
> > >2021-10-17 00:01:19] from ../../gcc/gcc/config/default-d.c:22: [all 
> > >2021-10-17 00:01:19] ../../gcc/gcc/defaults.h:908:2: error: #error 
> > >You must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported
> > 
> > Is that building the D frontend? I remember restricting the builds to C... 
> 
> Probably. I configure as
> 
> .../gcc/configure --target=moxie-elf --enable-werror-always
> --enable-languages=all --disable-gcov --disable-shared
> --disable-threads --without-headers
> --prefix=/var/lib/laminar/run/gcc-moxie-elf/13/toolchain-install

So it looks like tm_d.h is much more stripped down compared to regular
tm_p.h but also oddly enough config/default-d.c includes tm_d.h
while config/default-c.c explicitly documents itself to not do that.

In particular tm_d.h includes defaults.h which now has the requirement
that either PREFERRED_DEBUGGING_TYPE is defined or DWARF2_DEBUGGING_INFO
but the latter is usually picked up from config/elfos.h or similar
which are headers _not_ included via tm_d.h.

The old defaults.h resulted in NO_DEBUG if no PREFERRED_DEBUGGING_TYPE
and no DWARF2_DEBUGGING_INFO was defined.
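
To make the difference concrete, here is a minimal sketch of the selection logic as I understand it from the description above. It is not the text of defaults.h: `model_debug_type`, `MODEL_NO_DEBUG`, `MODEL_DWARF2_DEBUG`, `MODEL_NEW_DEFAULTS` and `model_preferred_debugging_type` are made-up names, and the exact ordering of the checks in the real header is an assumption.

```c
/* Hypothetical stand-ins for GCC's debug-format constants.  */
enum model_debug_type { MODEL_NO_DEBUG, MODEL_DWARF2_DEBUG };

/* This models a translation unit such as default-d.c on moxie-elf:
   nothing it includes defines DWARF2_DEBUGGING_INFO or
   PREFERRED_DEBUGGING_TYPE.  Define MODEL_NEW_DEFAULTS (for example
   with -DMODEL_NEW_DEFAULTS) to model the new behaviour, which is a
   hard #error instead of a silent fallback.  */

#ifndef PREFERRED_DEBUGGING_TYPE
# if defined DWARF2_DEBUGGING_INFO
#  define PREFERRED_DEBUGGING_TYPE MODEL_DWARF2_DEBUG
# elif defined MODEL_NEW_DEFAULTS
#  error "You must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported"
# else
#  define PREFERRED_DEBUGGING_TYPE MODEL_NO_DEBUG   /* old fallback */
# endif
#endif

/* Returns whatever the preprocessor logic above selected.  */
static enum model_debug_type
model_preferred_debugging_type (void)
{
  return PREFERRED_DEBUGGING_TYPE;
}
```

Compiled as is, the function returns `MODEL_NO_DEBUG`, which corresponds to the old behaviour; with `MODEL_NEW_DEFAULTS` defined the file no longer compiles, which corresponds to the failure reported for moxie-elf.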

I also note that default-d.o is not built on x86_64-linux?  It looks
like it is only added when

if [ "$target_has_targetdm" = "no" ]; then
  d_target_objs="$d_target_objs default-d.o"
fi

I note that for example config/glibc-d.c includes tm.h and tm_p.h
which would end up in proper definitions.

So ... for moxie-elf, did D really end up with NO_DEBUG previously?
Is that "correct" for D or was that a bug?  moxie-elf seems to use
default-c.c as well but that does not end up including defaults.h.

Is it maybe a bug that tm_d.h includes defaults.h at all?  Should
"d defaults" be in a defaults-d.h instead?  If I remove the
defaults.h include from tm_d.h the build for moxie-elf succeeds.

Ian?  Joseph?

Thanks,
Richard.


[PATCH] Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

2021-10-25 Thread liuhongt via Gcc-patches
  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

PR target/102464
* config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
function type.
(V16HF_FTYPE_V16HF): Ditto.
(V32HF_FTYPE_V32HF): Ditto.
(V8HF_FTYPE_V8HF_ROUND): Ditto.
(V16HF_FTYPE_V16HF_ROUND): Ditto.
(V32HF_FTYPE_V32HF_ROUND): Ditto.
* config/i386/i386-builtin.def ( IX86_BUILTIN_FLOORPH,
IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH,
IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256,
IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512,
IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
* config/i386/i386-builtins.c
(ix86_builtin_vectorized_function): Enable vectorization for
HFmode FLOOR/CEIL/TRUNC operation.
* config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
new builtins.
* config/i386/sse.md (rint2, nearbyint2): Extend
to vector HFmodes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102464-vrndscaleph.c: New test.
---
 gcc/config/i386/i386-builtin-types.def|   7 ++
 gcc/config/i386/i386-builtin.def  |  11 ++
 gcc/config/i386/i386-builtins.c   |  42 +++
 gcc/config/i386/i386-expand.c |   3 +
 gcc/config/i386/sse.md|  12 +-
 .../gcc.target/i386/pr102464-vrndscaleph.c| 115 ++
 6 files changed, 184 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-vrndscaleph.c

diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 4c355c587b5..e33f06ab30b 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1380,3 +1380,10 @@ DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
 DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
+
+DEF_FUNCTION_TYPE (V8HF, V8HF)
+DEF_FUNCTION_TYPE (V16HF, V16HF)
+DEF_FUNCTION_TYPE (V32HF, V32HF)
+DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND)
+DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND)
+DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 99217d08d37..d9eee3f373c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -958,6 +958,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, 
CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__buil
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2, 
"__builtin_ia32_roundpd_az", IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) 
V2DF_FTYPE_V2DF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2_vec_pack_sfix, 
"__builtin_ia32_roundpd_az_vec_pack_sfix", 
IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX, UNKNOWN, (int) V4SI_FTYPE_V2DF_V2DF)
 
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_floorph", 
IX86_BUILTIN_FLOORPH, (enum rtx_code) ROUND_FLOOR, (int) V8HF_FTYPE_V8HF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_ceilph", IX86_BUILTIN_CEILPH, 
(enum rtx_code) ROUND_CEIL, (int) V8HF_FTYPE_V8HF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_truncph", 
IX86_BUILTIN_TRUNCPH, (enum rtx_code) ROUND_TRUNC, (int) V8HF_FTYPE_V8HF_ROUND)
+
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
"__builtin_ia32_floorps", IX86_BUILTIN_FLOORPS, (enum rtx_code) ROUND_FLOOR, 
(int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
"__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, (int) 
V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
"__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, 
(int) V4SF_FTYPE_V4SF_ROUND)
@@ -1090,6 +1094,10 @@ BDESC (OPTION_MASK_ISA_AVX, 0, 
CODE_FOR_roundv4df2_vec_pack_sfix, "__builtin_ia3
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
"__builtin_ia32_floorpd_vec_pack_sfix256", 
IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_FLOOR, (int) 
V8SI_FTYPE_V4DF_V4DF_ROUND)
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
"__builtin_ia32_ceilpd_vec_pack_sfix256", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX256, 
(enum rtx_code) ROUND_CEIL, (int) V8SI_FTYPE_V4DF_V4DF_ROUND)
 
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512vl_rndscalev16hf, "__builtin_ia32_floorph256", 
IX86_BUILTIN_FLOORPH256, (enum rtx_code) ROUND_FLOOR, (int) 
V16HF_FTYPE_V16HF_ROUND)
+BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_avx512vl_rndscalev16hf, "__builtin_ia32_ceilph256", 
IX86_BUILTIN_CEILPH256, (enum 

Re: [PATCH] x86_64: Implement V1TI mode shifts/rotates by a constant

2021-10-25 Thread Uros Bizjak via Gcc-patches
On Sun, Oct 24, 2021 at 6:34 PM Roger Sayle  wrote:
>
>
> This patch provides RTL expanders to implement logical shifts and
> rotates of 128-bit values (stored in vector integer registers) by
> constant bit counts.  Previously, GCC would transfer these values
> to a pair of scalar registers (TImode) via memory to perform the
> operation, then transfer the result back via memory.  Instead these
> operations are now expanded using (between 1 and 5) SSE2 vector
> instructions.

Hm, instead of going through memory (without store-to-load forwarding
for general -> XMM moves!) these should use something similar to what
clang produces (or use pextrq/pinsrq, at least with SSE4.1):

   movq    %xmm0, %rax
   pshufd  $78, %xmm0, %xmm0
   movq    %xmm0, %rcx
   shldq   $8, %rax, %rcx
   shlq    $8, %rax
   movq    %rcx, %xmm1
   movq    %rax, %xmm0
   punpcklqdq  %xmm1, %xmm0

> Logical shifts by multiples of 8 can be implemented using x86_64's
> pslldq/psrldq instruction:
> ashl_8:
> pslldq  $1, %xmm0
> ret
> lshr_32:
> psrldq  $4, %xmm0
> ret
>
> Logical shifts by greater than 64 can use pslldq/psrldq $8, followed
> by a psllq/psrlq for the remaining bits:
> ashl_111:
> pslldq  $8, %xmm0
> psllq   $47, %xmm0
> ret
> lshr_127:
> psrldq  $8, %xmm0
> psrlq   $63, %xmm0
> ret
>
> The remaining logical shifts make use of the following idiom:
> ashl_1:
> movdqa  %xmm0, %xmm1
> psllq   $1, %xmm0
> pslldq  $8, %xmm1
> psrlq   $63, %xmm1
> por %xmm1, %xmm0
> ret
> lshr_15:
> movdqa  %xmm0, %xmm1
> psrlq   $15, %xmm0
> psrldq  $8, %xmm1
> psllq   $49, %xmm1
> por %xmm1, %xmm0
> ret
>
> Rotates by multiples of 32 can use x86_64's pshufd:
> rotr_32:
> pshufd  $57, %xmm0, %xmm0
> ret
> rotr_64:
> pshufd  $78, %xmm0, %xmm0
> ret
> rotr_96:
> pshufd  $147, %xmm0, %xmm0
> ret
>
> Rotates by multiples of 8 (other than multiples of 32) can make
> use of both pslldq and psrldq, followed by por:
> rotr_8:
> movdqa  %xmm0, %xmm1
> psrldq  $1, %xmm0
> pslldq  $15, %xmm1
> por %xmm1, %xmm0
> ret
> rotr_112:
> movdqa  %xmm0, %xmm1
> psrldq  $14, %xmm0
> pslldq  $2, %xmm1
> por %xmm1, %xmm0
> ret
>
> And the remaining rotates use one or two pshufd, followed by a
> psrld/pslld/por sequence:
> rotr_1:
> movdqa  %xmm0, %xmm1
> pshufd  $57, %xmm0, %xmm0
> psrld   $1, %xmm1
> pslld   $31, %xmm0
> por %xmm1, %xmm0
> ret
> rotr_63:
> pshufd  $78, %xmm0, %xmm1
> pshufd  $57, %xmm0, %xmm0
> pslld   $1, %xmm1
> psrld   $31, %xmm0
> por %xmm1, %xmm0
> ret
> rotr_111:
> pshufd  $147, %xmm0, %xmm1
> pslld   $17, %xmm0
> psrld   $15, %xmm1
> por %xmm1, %xmm0
> ret
>
> The new test case, sse2-v1ti-shift.c, is a run-time check to confirm that
> the results of V1TImode shifts/rotates by constants exactly match the
> expected results of TImode operations, for various input test vectors.

Is the sequence of 4+ SSE instructions really faster than
pinsrq/pextrq (and two movq insn) + two operations on integer
registers?

Uros.
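
Whichever route turns out faster, both should compute the same value for a left shift by 0 < n < 64. The following is a rough portable model for checking that, not a benchmark and not code from the patch: `model_v1ti`, `model_ashl_lanes` and `model_ashl_scalar` are hypothetical names, and a pair of 64-bit integers stands in for the XMM register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit lanes.  */
typedef struct { uint64_t lo, hi; } model_v1ti;

/* Vector route for 0 < n < 64, following the quoted ashl_1 idiom:
   psllq $n on both lanes; on a copy, pslldq $8 then psrlq $(64-n);
   finally por.  */
static model_v1ti
model_ashl_lanes (model_v1ti x, unsigned n)
{
  assert (n > 0 && n < 64);
  model_v1ti shifted = { x.lo << n, x.hi << n };   /* psllq $n */
  model_v1ti carry = { 0, x.lo };                  /* pslldq $8 */
  carry.hi >>= 64 - n;                             /* psrlq $(64-n); low lane stays 0 */
  model_v1ti r = { shifted.lo | carry.lo,          /* por */
                   shifted.hi | carry.hi };
  return r;
}

/* Scalar route, following the movq/shldq/shlq sequence above.  */
static model_v1ti
model_ashl_scalar (model_v1ti x, unsigned n)
{
  assert (n > 0 && n < 64);
  uint64_t rax = x.lo;                    /* movq %xmm0, %rax */
  uint64_t rcx = x.hi;                    /* pshufd $78 + movq */
  rcx = (rcx << n) | (rax >> (64 - n));   /* shldq $n, %rax, %rcx */
  rax <<= n;                              /* shlq $n, %rax */
  model_v1ti r = { rax, rcx };            /* movq twice + punpcklqdq */
  return r;
}
```

For example, shifting the value with lo = 0x8000000000000001 and hi = 1 left by one bit gives lo = 2 and hi = 3 through either function.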

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-10-24  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_v1ti_shift): New helper
> function to expand V1TI mode logical shifts by integer constants.
> (ix86_expand_v1ti_rotate): New helper function to expand V1TI
> mode rotations by integer constants.
> * config/i386/i386-protos.h (ix86_expand_v1ti_shift,
> ix86_expand_v1ti_rotate): Prototype new functions here.
> * config/i386/sse.md (ashlv1ti3, lshrv1ti3, rotlv1ti3, rotrv1ti3):
> New TARGET_SSE2 expanders to implement V1TI shifts and rotations.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse2-v1ti-shift.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
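
For the rotate cases quoted earlier in this message, the pshufd immediates can be decoded by hand. The sketch below is a hypothetical model (the names `model_xmm`, `model_pshufd`, `model_rotr_lanes` and `model_rotr_1` are made up) that treats the register as four 32-bit lanes with lane 0 least significant; it is meant for checking the immediates 57, 78 and 147 and the rotr_1 sequence, not as a description of the expanders in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an XMM register as four 32-bit lanes, lane 0
   being the least significant.  */
typedef struct { uint32_t lane[4]; } model_xmm;

/* pshufd: result lane i takes the source lane selected by bits
   [2i+1:2i] of the 8-bit immediate.  */
static model_xmm
model_pshufd (model_xmm src, unsigned imm8)
{
  assert (imm8 < 256);
  model_xmm r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = src.lane[(imm8 >> (2 * i)) & 3];
  return r;
}

/* Reference: rotate right by k * 32 bits, so lane i receives
   lane (i + k) mod 4.  */
static model_xmm
model_rotr_lanes (model_xmm src, unsigned k)
{
  model_xmm r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = src.lane[(i + k) & 3];
  return r;
}

/* The quoted rotr_1 sequence: psrld $1 on the original, pslld $31 on
   the pshufd $57 copy, then por.  */
static model_xmm
model_rotr_1 (model_xmm src)
{
  model_xmm rot32 = model_pshufd (src, 57);
  model_xmm r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (src.lane[i] >> 1) | (rot32.lane[i] << 31);
  return r;
}
```

In this model `model_pshufd (x, 57)`, `model_pshufd (x, 78)` and `model_pshufd (x, 147)` agree with `model_rotr_lanes (x, 1)`, `(x, 2)` and `(x, 3)` respectively, and `model_rotr_1` moves bit 0 of the 128-bit value to bit 127.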

