Re: [PATCH 1/2] REE: PR rtl-optimization/100264: Handle more PARALLEL SET expressions

2021-05-05 Thread Jim Wilson
On Fri, Apr 30, 2021 at 4:10 PM Christoph Müllner via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Sat, May 1, 2021 at 12:48 AM Jeff Law  wrote:
> > On 4/26/2021 5:38 AM, Christoph Muellner via Gcc-patches wrote:
> > > [ree] PR rtl-optimization/100264: Handle more PARALLEL SET expressions
> > >
> > >  PR rtl-optimization/100264
> > >  * ree.c (get_sub_rtx): Ignore SET expressions without register
> > >  destinations.
> > >  (merge_def_and_ext): Eliminate destination check for register
> > >  as such SET expressions can't occur anymore.
> > >  (combine_reaching_defs): Likewise.
> >
> > This is pretty sensible.  Do you have commit privs for GCC?
>

This looks reasonable to me also.  But I tried a build and check with an
rv64gc/lp64d linux toolchain built from riscv-gnu-toolchain and I get two
extra failures in the gfortran testsuite.

/scratch/jimw/fsf-testing/patched/riscv-gcc/gcc/testsuite/gfortran.dg/typebound_operator_3.f03:93:21: internal compiler error: in get_sub_rtx, at ree.c:705
0x15664f8 get_sub_rtx
../../../patched/riscv-gcc/gcc/ree.c:705
0x15672ce merge_def_and_ext
../../../patched/riscv-gcc/gcc/ree.c:719
0x15672ce combine_reaching_defs
../../../patched/riscv-gcc/gcc/ree.c:1020
0x1568308 find_and_remove_re
../../../patched/riscv-gcc/gcc/ree.c:1319
0x1568308 rest_of_handle_ree
../../../patched/riscv-gcc/gcc/ree.c:1390
0x1568308 execute
../../../patched/riscv-gcc/gcc/ree.c:1418
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gfortran.dg/typebound_operator_3.f03   -Os  (internal compiler error)
FAIL: gfortran.dg/typebound_operator_3.f03   -Os  (test for excess errors)

Jim


Ping: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-05-05 Thread Xionghu Luo via Gcc-patches

Gentle ping, thanks.


On 2021/4/16 15:10, Xiong Hu Luo wrote:

fmod/fmodf and remainder/remainderf can be expanded inline instead of
emitting a library call when built with fast-math, which is much faster.

fmodf:
  fdivs   f0,f1,f2
  friz    f0,f0
  fnmsubs f1,f2,f0,f1

remainderf:
  fdivs   f0,f1,f2
  frin    f0,f0
  fnmsubs f1,f2,f0,f1
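
For reference, a C-level sketch of the identities these sequences implement
(valid under fast-math, where the exact rounding and exception semantics of
fmod/remainder may be relaxed; the function names are illustrative):

#include <cmath>

// fdivs + friz + fnmsubs: x - trunc(x/y) * y
float fmodf_sketch (float x, float y)
{
  float q = std::trunc (x / y);      // friz: round toward zero
  return x - q * y;                  // fnmsubs: fused x - q*y
}

// fdivs + frin + fnmsubs: x - round(x/y) * y
float remainderf_sketch (float x, float y)
{
  float q = std::nearbyint (x / y);  // frin: round to nearest
  return x - q * y;                  // fnmsubs: fused x - q*y
}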

gcc/ChangeLog:

2021-04-16  Xionghu Luo  

PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-04-16  Xionghu Luo  

PR target/97142
* gcc.target/powerpc/pr97142.c: New test.
---
  gcc/config/rs6000/rs6000.md| 36 ++
  gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++
  2 files changed, 66 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a1315523fec..7e0e94e6ba4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4902,6 +4902,42 @@ (define_insn "fre"
[(set_attr "type" "fp")
 (set_attr "isa" "*,")])
  
+(define_expand "fmod3"

+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))
+   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (mode);
+  emit_insn (gen_div3 (div, operands[1], operands[2]));
+
+  rtx friz = gen_reg_rtx (mode);
+  emit_insn (gen_btrunc2 (friz, div));
+
+  emit_insn (gen_nfms4 (operands[0], operands[2], friz, operands[1]));
+  DONE;
+ })
+
+(define_expand "remainder3"
+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))
+   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (mode);
+  emit_insn (gen_div3 (div, operands[1], operands[2]));
+
+  rtx frin = gen_reg_rtx (mode);
+  emit_insn (gen_round2 (frin, div));
+
+  emit_insn (gen_nfms4 (operands[0], operands[2], frin, operands[1]));
+  DONE;
+ })
+
  (define_insn "*rsqrt2"
[(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c 
b/gcc/testsuite/gcc.target/powerpc/pr97142.c
new file mode 100644
index 000..48f25ca5b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+#include <math.h>
+
+float test1 (float x, float y)
+{
+  return fmodf (x, y);
+}
+
+double test2 (double x, double y)
+{
+  return fmod (x, y);
+}
+
+float test3 (float x, float y)
+{
+  return remainderf (x, y);
+}
+
+double test4 (double x, double y)
+{
+  return remainder (x, y);
+}
+
+/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
+/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainder\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainderf\M} } } */
+



--
Thanks,
Xionghu


Re: [PATCH] split loop for NE condition.

2021-05-05 Thread guojiufu via Gcc-patches

On 2021-05-01 05:37, Segher Boessenkool wrote:

Hi!

On Thu, Apr 29, 2021 at 05:50:48PM +0800, Jiufu Guo wrote:
When there is the possibility that overflow may happen on the loop index,
a few optimizations would not happen.  For example code:

foo (int *a, int *b, unsigned k, unsigned n)
{
  while (++k != n)
    a[k] = b[k] + 1;
}

For this code, if "k > n", overflow may happen.  If "k < n" at the
beginning, it could be optimized (e.g. vectorization).


FWIW, this isn't called "overflow" in C: all overflow is undefined
behaviour.

"A computation involving unsigned operands can never overflow, because 
a
result that cannot be represented by the resulting unsigned integer 
type

is reduced modulo the number that is one greater than the largest value
that can be represented by the resulting type."


Thanks for pointing this out; yes, it may be better to call it 'wrap' :)




+-param=max-insns-ne-cond-split=
+Common Joined UInteger Var(param_max_insn_ne_cond_split) Init(64) Param Optimization
+The maximum threshold for insnstructions number of a loop with ne condition to split.


"number of instructions".

Perhaps you should mark up "ne" as a codeword somehow, but because it
is in a help text it is probably better to just write out "not equal"
or similar?


Would update it accordingly. Thanks for your suggestion!



@@ -248,13 +250,14 @@ connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)

!gsi_end_p (psi_first);
gsi_next (&psi_first), gsi_next (&psi_second))
 {
-  tree init, next, new_init;
+  tree init, next, new_init, prev;
   use_operand_p op;
   gphi *phi_first = psi_first.phi ();
   gphi *phi_second = psi_second.phi ();

   init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
   next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+  prev = PHI_RESULT (phi_first);
   op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
   gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));




I would just declare it at the first use...  Less mental load for the
reader.  (And a smaller patch ;-) )

Yeap, thanks!




+/* Check if the LOOP exit branch likes "if (idx != bound)".
+   if INV is not NULL and the branch is "if (bound != idx)", set *INV to true.


"If INV", sentences start with a capital.


Thanks :)



+  /* Make sure idx and bound.  */
+  tree idx = gimple_cond_lhs (cond);
+  tree bnd = gimple_cond_rhs (cond);
+  if (expr_invariant_in_loop_p (loop, idx))
+   {
+ std::swap (idx, bnd);
+ if (inv)
+   *inv = true;
+   }
+  else if (!expr_invariant_in_loop_p (loop, bnd))
+   continue;


Make sure idx and bound what?  What about them?


+  /* Make sure idx is iv.  */
+  class loop *useloop = loop_containing_stmt (cond);
+  affine_iv iv;
+  if (!simple_iv (loop, useloop, idx, &iv, false))
+   continue;


"Make sure idx is a simple_iv"?

Thanks, the comment should be clearer; the intention is:
make sure the "lhs/rhs" pair is an "index/bound" pair.




+
+  /* No need to split loop, if base is know value.
+Or check range info.  */


"if base is a known value".  Not sure what you mean with range info?
A possible future improvement?
The intention is "If there is no wrap/overflow happen", no need to split 
loop".
If the base is a known value, the index may not wrap/overflow and may be 
able

optimized by other passes.
Using range-info to check wrap/overflow could be a future improvement.




+  /* There is type conversion on idx(or rhs of idx's def).
+And there is converting shorter to longer type. */
+  tree type = TREE_TYPE (idx);
+  if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
+ || !TYPE_UNSIGNED (type)
+ || TYPE_PRECISION (type) == TYPE_PRECISION (sizetype))
+   continue;


"IDX is an unsigned type that is widened to SIZETYPE" etc.

This is better wording :)



This code assumes SIZETYPE is bigger than any other integer type.  Is
that true?  Even if so, the second comment could be improved.

(Not reviewing further, my Gimple isn't near good enough, sorry.  But
at least to my untrained eye it looks pretty good :-) )


Thanks so much for your very helpful comments!

Jiufu Guo.




Segher


Re: [PATCH] RISC-V: Generate helpers for cbranch4

2021-05-05 Thread Jim Wilson
On Wed, May 5, 2021 at 12:23 PM Christoph Muellner 
wrote:

> gcc/
> PR 100266
> * config/rsicv/riscv.c (riscv_block_move_loop): Simplify.
> * config/rsicv/riscv.md (cbranch4): Generate helpers.
>

OK.  Committed.  Though I had to fix the ChangeLog entry.  It was indented
by spaces instead of tabs.  The PR line is missing the component (target).
riscv is misspelled twice as rsicv.  And it doesn't mention the
stack_protect_test change.  The gcc commit hooks complained about most of
this stuff.  It seems fairly good at finding minor ChangeLog issues.

Jim


Re: [PATCH] split loop for NE condition.

2021-05-05 Thread guojiufu via Gcc-patches

On 2021-05-01 00:27, Jeff Law wrote:

On 4/29/2021 3:50 AM, Jiufu Guo via Gcc-patches wrote:
When there is the possibility that overflow may happen on the loop index,
a few optimizations would not happen.  For example code:

foo (int *a, int *b, unsigned k, unsigned n)
{
  while (++k != n)
    a[k] = b[k] + 1;
}

For this code, if "k > n", overflow may happen.  If "k < n" at the
beginning, it could be optimized (e.g. vectorization).

We can split the loop into two loops:

   while (++k > n)
 a[k] = b[k]  + 1;
   while (l++ < n)
 a[k] = b[k]  + 1;

then for the second loop, it could be optimized.
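
To make the hazard concrete, a hedged walk-through (the helper below is
illustrative, not from the patch):

#include <climits>

// With k = UINT_MAX - 1 and n = 2, the original loop body runs for
// k = UINT_MAX, then k wraps to 0, runs for k = 0 and k = 1, and the
// loop exits when ++k == 2.  Only the post-wrap iterations have a
// monotonically increasing index, which is what the second loop of the
// split exposes to the vectorizer.
unsigned iterations (unsigned k, unsigned n)
{
  unsigned count = 0;
  while (++k != n)
    ++count;          // stands in for a[k] = b[k] + 1
  return count;       // iterations (UINT_MAX - 1, 2) == 3
}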

This patch splits this kind of small loop to achieve better performance.


Bootstrap and regtest pass on ppc64le.  Is this ok for trunk?

Thanks!

Jiufu Guo.

gcc/ChangeLog:

2021-04-29  Jiufu Guo  

* params.opt (max-insns-ne-cond-split): New.
* tree-ssa-loop-split.c (connect_loop_phis): Add new param.
(get_ne_cond_branch): New function.
(split_ne_loop): New function.
(split_loop_on_ne_cond): New function.
(tree_ssa_split_loops): Use split_loop_on_ne_cond.


I haven't looked at the patch in any detail, but I wonder if the same
concept could be used to fix pr59371, which is a long standing
regression.  Yea, it's reported against MIPS, but the concepts are
fairly generic.


Yes, thanks for pointing this out!  This patch handles "!=", which is a
little different from pr59371.  But as you point out, the concept can
be used for pr59371: split the loop for possible wrap/overflow on the
index/bound.

We could enhance this patch to handle the case in pr59371!

Thanks!
Jiufu Guo.



Jeff


Re: [PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]

2021-05-05 Thread Jim Wilson
On Wed, May 5, 2021 at 12:37 PM Christoph Muellner 
wrote:

> The existing CAS implementation uses an INSN definition, which provides
> the core LR/SC sequence. Additionally to that, there is a follow-up code,
> that evaluates the results and calculates the return values.
> This has two drawbacks: a) an extension to sub-word CAS implementations
> is not possible (even if, then it would be unmaintainable), and b) the
> implementation is hard to maintain/improve.
> This patch provides a programmatic implementation of CAS, similar
> like many other architectures are having one.
>

A comment that Andrew Waterman made to me today about the safety of this
under various circumstances got me thinking, and I realized that without
the special cas pattern we can get reloads in the middle of the sequence
which would be bad.  Experimenting a bit, I managed to prove it.  This is
using the old version of the patch which I already had handy, but I'm sure
the new version will behave roughly the same way.  Using the testsuite
testcase atomic-compare-exchange-3.c as before, and adding a lot of
-ffixed-X options to simulate high register pressure, with the compiler
command
./xgcc -B./ -O2 -S tmp.c -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19
-ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25
-ffixed-x26 -ffixed-x27 -ffixed-x28 -ffixed-x29 -ffixed-x30 -ffixed-x31
-ffixed-x15 -ffixed-x14 -ffixed-x13 -ffixed-x12 -ffixed-s0 -ffixed-s1
-ffixed-t2 -ffixed-t1 -ffixed-t0
I get for the first lr/sc loop
.L2:
	lui	a1,%hi(v)
	addi	a0,a1,%lo(v)
	lr.w	a1, 0(a0)
	ld	a0,8(sp)
	sw	a1,24(sp)
	bne	a1,a0,.L39
	lui	a1,%hi(v)
	addi	a0,a1,%lo(v)
	lw	a1,16(sp)
	sd	ra,24(sp)
	sc.w	ra, a1, 0(a0)
	sext.w	a1,ra
	ld	ra,24(sp)
	bne	a1,zero,.L2
and note all of the misc load/store instructions added by reload.  I don't
think this is safe or guaranteed to work.  With the cas pattern, any
reloads are guaranteed to be emitted before and/or after the lr/sc loop.
With the separate patterns, there is no way to ensure that we won't get
accidental reloads in the middle of the lr/sc loop.

I think we need to keep the cas pattern.  We can always put C code inside
the output template of the cas pattern if that is helpful.  It can do any
necessary tests and then return an appropriate string for the instructions
we want.

Jim


[PR66791][ARM] Replace __builtin_neon_vtst*

2021-05-05 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
The attached patch replaces __builtin_neon_vtst* (a, b) with (a & b) != 0.
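
For context, a scalar model of the lane-wise semantics being replaced (a
sketch; the 8-bit lane width and function name are chosen for illustration):

#include <cstdint>

// Per lane, vtst yields all-ones when (a & b) is non-zero, else all-zeros
// -- exactly the (a & b) != 0 form the patch lowers to.
uint8_t vtst_lane (uint8_t a, uint8_t b)
{
  return (a & b) != 0 ? 0xFF : 0x00;
}
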
Bootstrapped and tested on arm-linux-gnueabihf and cross-tested on arm*-*-*.
OK to commit ?

Thanks,
Prathamesh


vtst-1.diff


[PATCH 2/2] dwarf: new dwarf_debuginfo_p predicate

2021-05-05 Thread Indu Bhagat via Gcc-patches
This patch introduces a dwarf_debuginfo_p predicate that abstracts and
replaces complex checks on write_symbols.
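
The opts.c definition itself is cut off in the quoted diff below; a minimal
sketch of such a predicate, assuming the bitmask encoding from patch 1/2
(the extern declaration and bit value are stand-ins, not the real headers):

#include <cstdint>

extern uint32_t write_symbols;   /* bitmask after patch 1/2 */
#define DWARF2_DEBUG (1U << 2)   /* hypothetical bit value */

/* One bit test replaces the scattered write_symbols comparisons.  */
inline bool
dwarf_debuginfo_p ()
{
  return (write_symbols & DWARF2_DEBUG) != 0;
}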

gcc/c-family/ChangeLog:

* c-lex.c (init_c_lex): Use dwarf_debuginfo_p.

gcc/ChangeLog:

* config/c6x/c6x.c (c6x_output_file_unwind): Use dwarf_debuginfo_p.
* dwarf2cfi.c (cfi_label_required_p): Likewise.
(dwarf2out_do_frame): Likewise.
* final.c (dwarf2_debug_info_emitted_p): Likewise.
(final_scan_insn_1): Likewise.
* flags.h (dwarf_debuginfo_p): New function declaration.
* opts.c (dwarf_debuginfo_p): New function definition.
* targhooks.c (default_debug_unwind_info): Use dwarf_debuginfo_p.
* toplev.c (process_options): Likewise.
---
 gcc/c-family/c-lex.c |  4 ++--
 gcc/config/c6x/c6x.c |  3 +--
 gcc/dwarf2cfi.c  |  9 -
 gcc/final.c  | 15 ++-
 gcc/flags.h  |  4 
 gcc/opts.c   |  8 
 gcc/targhooks.c  |  2 +-
 gcc/toplev.c |  6 ++
 8 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 6374b72..5174b22 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "c-pragma.h"
 #include "debug.h"
+#include "flags.h"
 #include "file-prefix-map.h" /* remap_macro_filename()  */
 #include "langhooks.h"
 #include "attribs.h"
@@ -87,8 +88,7 @@ init_c_lex (void)
 
   /* Set the debug callbacks if we can use them.  */
   if ((debug_info_level == DINFO_LEVEL_VERBOSE
-   && (write_symbols == DWARF2_DEBUG
-  || write_symbols == VMS_AND_DWARF2_DEBUG))
+   && dwarf_debuginfo_p ())
   || flag_dump_go_spec != NULL)
 {
   cb->define = cb_define;
diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index f9ad1e5..a10e2f8 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -439,8 +439,7 @@ c6x_output_file_unwind (FILE * f)
 {
   if (flag_unwind_tables || flag_exceptions)
{
- if (write_symbols == DWARF2_DEBUG
- || write_symbols == VMS_AND_DWARF2_DEBUG)
+ if (dwarf_debuginfo_p ())
asm_fprintf (f, "\t.cfi_sections .debug_frame, .c6xabi.exidx\n");
  else
asm_fprintf (f, "\t.cfi_sections .c6xabi.exidx\n");
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 362ff3f..c27ac19 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -39,7 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"  /* init_return_column_size */
 #include "output.h"/* asm_out_file */
 #include "debug.h" /* dwarf2out_do_frame, dwarf2out_do_cfi_asm */
-
+#include "flags.h" /* dwarf_debuginfo_p */
 
 /* ??? Poison these here until it can be done generically.  They've been
totally replaced in this file; make sure it stays that way.  */
@@ -2289,8 +2289,7 @@ cfi_label_required_p (dw_cfi_ref cfi)
 
   if (dwarf_version == 2
   && debug_info_level > DINFO_LEVEL_TERSE
-  && (write_symbols == DWARF2_DEBUG
- || write_symbols == VMS_AND_DWARF2_DEBUG))
+  && dwarf_debuginfo_p ())
 {
   switch (cfi->dw_cfi_opc)
{
@@ -3557,9 +3556,9 @@ bool
 dwarf2out_do_frame (void)
 {
   /* We want to emit correct CFA location expressions or lists, so we
- have to return true if we're going to output debug info, even if
+ have to return true if we're going to generate debug info, even if
  we're not going to output frame or unwind info.  */
-  if (write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
+  if (dwarf_debuginfo_p ())
 return true;
 
   if (saved_do_cfi_asm > 0)
diff --git a/gcc/final.c b/gcc/final.c
index ba4285d..794702f 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1428,7 +1428,8 @@ asm_str_count (const char *templ)
 static bool
 dwarf2_debug_info_emitted_p (tree decl)
 {
-  if (write_symbols != DWARF2_DEBUG && write_symbols != VMS_AND_DWARF2_DEBUG)
+  /* When DWARF2 debug info is not generated internally.  */
+  if (!dwarf_debuginfo_p ())
 return false;
 
   if (DECL_IGNORED_P (decl))
@@ -2298,10 +2299,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
  break;
 
case NOTE_INSN_BLOCK_BEG:
- if (debug_info_level == DINFO_LEVEL_NORMAL
- || debug_info_level == DINFO_LEVEL_VERBOSE
- || write_symbols == DWARF2_DEBUG
- || write_symbols == VMS_AND_DWARF2_DEBUG
+ if (debug_info_level >= DINFO_LEVEL_NORMAL
+ || dwarf_debuginfo_p ()
  || write_symbols == VMS_DEBUG)
{
  int n = BLOCK_NUMBER (NOTE_BLOCK (insn));
@@ -2336,10 +2335,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
case NOTE_INSN_BLOCK_END:
  maybe_output_next_view (seen);
 
- if (debug_info_level == DINFO_LEVEL_NORMAL
- 

[PATCH 1/2] opts: change write_symbols to support bitmasks

2021-05-05 Thread Indu Bhagat via Gcc-patches
To support multiple debug formats, we need to move away from explicit
enumeration of each individual combination of debug formats.

gcc/c-family/ChangeLog:

* c-opts.c (c_common_post_options): Adjust access to debug_type_names.
* c-pch.c (struct c_pch_validity): Use type uint32_t.
(pch_init): Renamed member.
(c_common_valid_pch): Adjust access to debug_type_names.

gcc/ChangeLog:

* common.opt: Change type to support bitmasks.
* flag-types.h (enum debug_info_type): Rename enumerator constants.
(NO_DEBUG): New bitmask.
(DBX_DEBUG): Likewise.
(DWARF2_DEBUG): Likewise.
(XCOFF_DEBUG): Likewise.
(VMS_DEBUG): Likewise.
(VMS_AND_DWARF2_DEBUG): Likewise.
* flags.h (debug_set_to_format): New function declaration.
(debug_set_count): Likewise.
(debug_set_names): Likewise.
* opts.c (debug_type_masks): Array of bitmasks for debug formats.
(debug_set_to_format): New function definition.
(debug_set_count): Likewise.
(debug_set_names): Likewise.
(set_debug_level): Update access to debug_type_names.
* toplev.c: Likewise.

gcc/objc/ChangeLog:

* objc-act.c (synth_module_prologue): Use uint32_t instead of enum
debug_info_type.
---
 gcc/c-family/c-opts.c |  10 +++--
 gcc/c-family/c-pch.c  |  12 +++---
 gcc/common.opt|   2 +-
 gcc/flag-types.h  |  29 ++
 gcc/flags.h   |  17 +++-
 gcc/objc/objc-act.c   |   2 +-
 gcc/opts.c| 109 +-
 gcc/toplev.c  |   9 +++--
 8 files changed, 158 insertions(+), 32 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 89e05a4..e463240 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -1112,9 +1112,13 @@ c_common_post_options (const char **pfilename)
  /* Only -g0 and -gdwarf* are supported with PCH, for other
 debug formats we warn here and refuse to load any PCH files.  */
  if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
-   warning (OPT_Wdeprecated,
-"the %qs debug format cannot be used with "
-"pre-compiled headers", debug_type_names[write_symbols]);
+   {
+ gcc_assert (debug_set_count (write_symbols) <= 1);
+ warning (OPT_Wdeprecated,
+  "the %qs debug format cannot be used with "
+  "pre-compiled headers",
+  debug_type_names[debug_set_to_format (write_symbols)]);
+   }
}
   else if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
c_common_no_more_pch ();
diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index fd94c37..6804388 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -52,7 +52,7 @@ enum {
 
 struct c_pch_validity
 {
-  unsigned char debug_info_type;
+  uint32_t pch_write_symbols;
   signed char match[MATCH_SIZE];
   void (*pch_init) (void);
   size_t target_data_length;
@@ -108,7 +108,7 @@ pch_init (void)
   pch_outfile = f;
 
   memset (, '\0', sizeof (v));
-  v.debug_info_type = write_symbols;
+  v.pch_write_symbols = write_symbols;
   {
 size_t i;
 for (i = 0; i < MATCH_SIZE; i++)
@@ -252,13 +252,15 @@ c_common_valid_pch (cpp_reader *pfile, const char *name, int fd)
   /* The allowable debug info combinations are that either the PCH file
  was built with the same as is being used now, or the PCH file was
  built for some kind of debug info but now none is in use.  */
-  if (v.debug_info_type != write_symbols
+  if (v.pch_write_symbols != write_symbols
   && write_symbols != NO_DEBUG)
 {
+  gcc_assert (debug_set_count (v.pch_write_symbols) <= 1);
+  gcc_assert (debug_set_count (write_symbols) <= 1);
   cpp_warning (pfile, CPP_W_INVALID_PCH,
   "%s: created with -g%s, but used with -g%s", name,
-  debug_type_names[v.debug_info_type],
-  debug_type_names[write_symbols]);
+  debug_type_names[debug_set_to_format (v.pch_write_symbols)],
+  debug_type_names[debug_set_to_format (write_symbols)]);
   return 2;
 }
 
diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44e..ffb968d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -109,7 +109,7 @@ bool exit_after_options
 ; flag-types.h for the definitions of the different possible types of
 ; debugging information.
 Variable
-enum debug_info_type write_symbols = NO_DEBUG
+uint32_t write_symbols = NO_DEBUG
 
 ; Level of debugging information we are producing.  See flag-types.h
 ; for the definitions of the different possible levels.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a038c8f..d60bb30 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -24,15 +24,30 @@ along with GCC; see the file COPYING3.  If not see
 
 enum debug_info_type
 

[PATCH 0/2] Fix write_symbols for supporting multiple debug formats

2021-05-05 Thread Indu Bhagat via Gcc-patches
Hello,

Over the last year, we have discussed and agreed that in order to support
multiple debug formats, we keep DWARF as the default internal debug format;
any new debug format to be supported feeds off the DWARF DIEs.  This
requirement specification has worked well for the addition of CTF/BTF overall.

There are some existing issues that need to discussed and fixed in this regard,
however. One of them is the definition and handling of write_symbols.

The current issue is that write_symbols is defined as 

   enum debug_info_type write_symbols = NO_DEBUG;

This means any new combination of debug formats needs to be explicitly
enumerated, like CTF_AND_DWARF2_DEBUG, VMS_AND_DWARF2_DEBUG etc. So to support
say, "-gctf -gbtf -g" or to make other combinations of debug formats possible,
each one needs to spelled out explicitly; this will make the handling ugly.

We discussed over IRC about the possibility of write_symbols to use bitmasks
instead. Please take a look at the patch set and let me know what you think.

BTW, the patch 2/2 [dwarf: new dwarf_debuginfo_p predicate] in this series is
the same as the one sent earlier in the CTF patch series (and has been
approved). I just include it in this patch series as it fits better here.

In a subsequent patch after these current two patches, I can work on removing
the VMS_AND_DWARF2_DEBUG symbol and replacing its usages with the appropriate
bitmasks. I would also like to review the usages of debug_type_names [] in code
diagnostics around PCH (in c-family/c-opts.c and c-family/c-pch.c) in terms of
what combination of debug formats would be allowed and such. But at this
time, the patch retains the current behaviour by simply adjusting the approach
to access debug_type_names [].

Bootstrapped and regression tested on x86_64.

Thanks,
Indu Bhagat (2):
  opts: change write_symbols to support bitmasks
  dwarf: new dwarf_debuginfo_p predicate

 gcc/c-family/c-lex.c  |   4 +-
 gcc/c-family/c-opts.c |  10 +++--
 gcc/c-family/c-pch.c  |  12 +++---
 gcc/common.opt|   2 +-
 gcc/config/c6x/c6x.c  |   3 +-
 gcc/dwarf2cfi.c   |   9 ++--
 gcc/final.c   |  15 +++
 gcc/flag-types.h  |  29 ++---
 gcc/flags.h   |  21 -
 gcc/objc/objc-act.c   |   2 +-
 gcc/opts.c| 117 +-
 gcc/targhooks.c   |   2 +-
 gcc/toplev.c  |  15 ---
 13 files changed, 186 insertions(+), 55 deletions(-)

-- 
1.8.3.1



Re: testsuite: gcc.c-torture/execute/ieee/cdivchkld.c needs fmaxl

2021-05-05 Thread Joseph Myers
On Tue, 4 May 2021, Christophe Lyon via Gcc-patches wrote:

> The new test gcc.c-torture/execute/ieee/cdivchkld.c needs fmaxl(),
> which may not be available, for instance on aarch64-elf with newlib.
> As discussed in the PR, requiring c99_runtime enables to skip the test
> in this case.
> 
> 2021-05-04  Christophe Lyon  
> 
> PR testsuite/100355
> gcc/testsuite/
> * gcc.c-torture/execute/ieee/cdivchkld.x: New.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [RFC v2] bpf.2: Use standard types and attributes

2021-05-05 Thread Joseph Myers
On Wed, 5 May 2021, David Laight via Libc-alpha wrote:

> > __u64 can't be formatted with %llu on all architectures.  That's not
> > true for uint64_t, where you have to use %lu on some architectures to
> > avoid compiler warnings (and technically undefined behavior).  There are
> > preprocessor macros to get the expected format specifiers, but they are
> > clunky.  I don't know if the problem applies to uint32_t.  It does
> > happen with size_t and ptrdiff_t on 32-bit targets (both vary between
> > int and long).
> 
> uint32_t can be 'randomly' either int or long on typical 32bit architectures.
> The correct way to print it is with eg "xxx %5.4" PRI_u32 " yyy".

C2X adds printf length modifiers such as "w32", so you can use a 
friendlier %w32u, for example.  (Not yet implemented in glibc or in GCC's 
format checking.)
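
For illustration, today's portable spelling next to the C2X form (the %w32u
line is aspirational; as noted, neither glibc nor GCC's format checking
supports it yet):

#include <cinttypes>
#include <cstdio>

void print_u32 (std::uint32_t x)
{
  std::printf ("%" PRIu32 "\n", x);  // clunky but portable <inttypes.h> macro
  // std::printf ("%w32u\n", x);     // C2X length modifier (not yet supported)
}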

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 3/3] go: use htab_eq_string in godump

2021-05-05 Thread Tom Tromey
This changes godump to use the new htab_eq_string function.

gcc

* godump.c (string_hash_eq): Remove.
(go_finish): Use htab_eq_string.
---
 gcc/godump.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/gcc/godump.c b/gcc/godump.c
index 7864d9d63e5d..cf9989490356 100644
--- a/gcc/godump.c
+++ b/gcc/godump.c
@@ -109,14 +109,6 @@ macro_hash_del (void *v)
   XDELETE (mhv);
 }
 
-/* For the string hash tables.  */
-
-static int
-string_hash_eq (const void *y1, const void *y2)
-{
-  return strcmp ((const char *) y1, (const char *) y2) == 0;
-}
-
 /* A macro definition.  */
 
 static void
@@ -1374,11 +1366,11 @@ go_finish (const char *filename)
   real_debug_hooks->finish (filename);
 
   container.type_hash = htab_create (100, htab_hash_string,
-				     string_hash_eq, NULL);
+				     htab_eq_string, NULL);
   container.invalid_hash = htab_create (10, htab_hash_string,
-					string_hash_eq, NULL);
+					htab_eq_string, NULL);
   container.keyword_hash = htab_create (50, htab_hash_string,
-					string_hash_eq, NULL);
+					htab_eq_string, NULL);
   obstack_init (&type_obstack);
 
   keyword_hash_init ();
-- 
2.26.3



[PATCH 2/3] gcc: use htab_eq_string

2021-05-05 Thread Tom Tromey
This changes one spot in GCC to use the new htab_eq_string function.

gcc

* gengtype-state.c (read_state): Use htab_eq_string.
(string_eq): Remove.
---
 gcc/gengtype-state.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c
index 891f2e18a610..a8fde959f4eb 100644
--- a/gcc/gengtype-state.c
+++ b/gcc/gengtype-state.c
@@ -2556,15 +2556,6 @@ equals_type_number (const void *ty1, const void *ty2)
   return type1->state_number == type2->state_number;
 }
 
-static int
-string_eq (const void *a, const void *b)
-{
-  const char *a0 = (const char *)a;
-  const char *b0 = (const char *)b;
-
-  return (strcmp (a0, b0) == 0);
-}
-
 
 /* The function reading the state, called by main from gengtype.c.  */
 void
@@ -2588,7 +2579,7 @@ read_state (const char *path)
   state_seen_types =
 htab_create (2017, hash_type_number, equals_type_number, NULL);
   state_ident_tab =
-htab_create (4027, htab_hash_string, string_eq, NULL);
+htab_create (4027, htab_hash_string, htab_eq_string, NULL);
   read_state_version (version_string);
   read_state_srcdir ();
   read_state_languages ();
-- 
2.26.3



[PATCH 0/3] Add htab_eq_string to libiberty

2021-05-05 Thread Tom Tromey
The libiberty hash table defines a hash function for strings, but not
an equality function.  This means that various files have had to
implement their own comparison function over the years.

This series resolves this for gcc.  Once this is in, I plan to import
the change into binutils-gdb and apply a similar fix there.

While examining all the uses of htab_hash_string, I found an oddity
related to this in libcpp.  I've filed PR preprocessor/100435 for
this.

Tom




[PATCH 1/3] libiberty: add htab_eq_string

2021-05-05 Thread Tom Tromey
The libiberty hash table includes a helper function for strings, but
no equality function.  Consequently, this equality function has been
reimplemented a number of times in both the gcc and binutils-gdb
source trees.  This patch adds the function to the libiberty hash
table, as a step toward the goal of removing all the copies.

One change to gcc is included here.  Normally I would have put this in
the next patch, but gensupport.c used the most natural name for its
reimplementation of this function, and this can't coexist with the
extern function in libiberty.
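
For illustration, typical client code after this series (the table size and
the NULL delete callback are arbitrary; assumes libiberty's include path):

#include "hashtab.h"  /* libiberty */

/* A string-keyed table built from the library-provided hash function and
   the new library-provided equality function.  */
static htab_t
make_string_table (void)
{
  return htab_create (31, htab_hash_string, htab_eq_string, NULL);
}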

include

* hashtab.h (htab_eq_string): Declare.

libiberty

* hashtab.c (htab_eq_string): New function.

gcc

* gensupport.c (htab_eq_string): Remove.
---
 gcc/gensupport.c| 8 
 include/hashtab.h   | 3 +++
 libiberty/hashtab.c | 7 +++
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 0f19bd706646..e1ca06dbc1ec 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -2322,14 +2322,6 @@ gen_reader::handle_unknown_directive (file_location loc, 
const char *rtx_name)
 process_rtx (x, loc);
 }
 
-/* Comparison function for the mnemonic hash table.  */
-
-static int
-htab_eq_string (const void *s1, const void *s2)
-{
-  return strcmp ((const char*)s1, (const char*)s2) == 0;
-}
-
 /* Add mnemonic STR with length LEN to the mnemonic hash table
MNEMONIC_HTAB.  A trailing zero end character is appended to STR
and a permanent heap copy of STR is created.  */
diff --git a/include/hashtab.h b/include/hashtab.h
index b3a6265eeb6e..77c5eec79055 100644
--- a/include/hashtab.h
+++ b/include/hashtab.h
@@ -192,6 +192,9 @@ extern htab_eq htab_eq_pointer;
 /* A hash function for null-terminated strings.  */
 extern hashval_t htab_hash_string (const void *);
 
+/* An equality function for null-terminated strings.  */
+extern int htab_eq_string (const void *, const void *);
+
 /* An iterative hash function for arbitrary data.  */
 extern hashval_t iterative_hash (const void *, size_t, hashval_t);
 /* Shorthand for hashing something with an intrinsic size.  */
diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
index 0c7208effe11..7c424e8f6cc1 100644
--- a/libiberty/hashtab.c
+++ b/libiberty/hashtab.c
@@ -841,6 +841,13 @@ htab_hash_string (const PTR p)
   return r;
 }
 
+/* An equality function for null-terminated strings.  */
+int
+htab_eq_string (const void *a, const void *b)
+{
+  return strcmp ((const char *) a, (const char *) b) == 0;
+}
+
 /* DERIVED FROM:
 
 lookup2.c, by Bob Jenkins, December 1996, Public Domain.
-- 
2.26.3



Re: [committed] libstdc++: Use unsigned char argument to std::isdigit

2021-05-05 Thread Jonathan Wakely via Gcc-patches

On 05/05/21 21:57 +0200, François Dumont via Libstdc++ wrote:

On 05/05/21 2:01 pm, Jonathan Wakely via Libstdc++ wrote:

Passing plain char to isdigit is undefined if the value is negative.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum): Pass unsigned
char to std::isdigit.

Tested powerpc64le-linux. Committed to trunk.


   unsigned char __c = *__first;
-      if (std::isdigit(__c))
+      if (std::isdigit(static_cast<unsigned char>(__c)))

I am very curious to know what this static_cast<unsigned char> does on
__c which is already unsigned char ?  If it does I'll just start to
hate C++ :-)


Maybe you wanted to put it on the previous *__first ?


Ugh, yes, but it's not even needed there because the implicit
conversion is fine.

We do need to fix the isspace calls in src/c++11/debug.cc but this one
was already correct. Thanks!
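
As a reference for call sites that do take plain char, a minimal sketch of
the correct pattern (the function name is illustrative):

#include <cctype>

// Plain char may be signed; passing a negative value (other than EOF)
// to std::isdigit is undefined, so go through unsigned char first.
bool is_digit_safe (char c)
{
  return std::isdigit (static_cast<unsigned char> (c)) != 0;
}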




Fix PR target/100402

2021-05-05 Thread Eric Botcazou
This is a regression for 64-bit Windows present from mainline down to the 9
branch and introduced by my fix for PR target/99234.  Again SEH, but with a
twist related to the way MinGW implements setjmp/longjmp that I discovered
while debugging it: setjmp/longjmp is piggybacked on SEH with recent versions
of MinGW, i.e. the longjmp performs a bona-fide unwinding of the stack, as it
calls RtlUnwindEx with the second argument passed to setjmp, which is the
result of __builtin_frame_address (0) in the MinGW header file:

  #define setjmp(BUF) _setjmp((BUF), __builtin_frame_address (0))

This means that we directly expose the frame pointer to the SEH machinery here
(unlike with regular exception handling where we use an intermediate CFA) and
thus that we cannot do whatever we want with it.  The old code would leave it
unaligned, i.e. not multiple of 16, whereas the new code aligns it, but this
breaks for some reason; at least it appears that a .seh_setframe directive
with 0 as second argument always works, so the fix aligns it this way.

Tested on x86-64/Windows, applied on the affected branches as obvious.


2021-05-05  Eric Botcazou  

PR target/100402
* config/i386/i386.c (ix86_compute_frame_layout): For a SEH target,
always return the establisher frame for __builtin_frame_address (0).


2021-05-05  Eric Botcazou  

* gcc.c-torture/execute/20210505-1.c: New test.

-- 
Eric Botcazoudiff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 06b0f5814ea..ecc15358efe 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6672,12 +6672,29 @@ ix86_compute_frame_layout (void)
 	 area, see the SEH code in config/i386/winnt.c for the rationale.  */
   frame->hard_frame_pointer_offset = frame->sse_reg_save_offset;
 
-  /* If we can leave the frame pointer where it is, do so.  Also, return
+  /* If we can leave the frame pointer where it is, do so; however return
 	 the establisher frame for __builtin_frame_address (0) or else if the
-	 frame overflows the SEH maximum frame size.  */
+	 frame overflows the SEH maximum frame size.
+
+	 Note that the value returned by __builtin_frame_address (0) is quite
+	 constrained, because setjmp is piggybacked on the SEH machinery with
+	 recent versions of MinGW:
+
+	  #elif defined(__SEH__)
+	  # if defined(__aarch64__) || defined(_ARM64_)
+	  #  define setjmp(BUF) _setjmp((BUF), __builtin_sponentry())
+	  # elif (__MINGW_GCC_VERSION < 40702)
+	  #  define setjmp(BUF) _setjmp((BUF), mingw_getsp())
+	  # else
+	  #  define setjmp(BUF) _setjmp((BUF), __builtin_frame_address (0))
+	  # endif
+
+	 and the second argument passed to _setjmp, if not null, is forwarded
+	 to the TargetFrame parameter of RtlUnwindEx by longjmp (after it has
+	 built an ExceptionRecord on the fly describing the setjmp buffer).  */
   const HOST_WIDE_INT diff
 	= frame->stack_pointer_offset - frame->hard_frame_pointer_offset;
-  if (diff <= 255)
+  if (diff <= 255 && !crtl->accesses_prior_frames)
 	{
 	  /* The resulting diff will be a multiple of 16 lower than 255,
 	 i.e. at most 240 as required by the unwind data structure.  */
/* PR target/100402 */
/* Testcase by Hannes Domani  */

/* { dg-require-effective-target indirect_jumps } */

#include <setjmp.h>
#include <stdbool.h>

static jmp_buf buf;
static _Bool stop = false;

void call_func (void(*func)(void))
{
  func ();
}

void func (void)
{
  stop = true;
  longjmp (buf, 1);
}

int main (void)
{
  setjmp (buf);

  while (!stop)
call_func (func);

  return 0;
}


[PATCH 2/2] libstdc++: Implement LWG 3533 changes to foo_view::iterator::base()

2021-05-05 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/10/11?

libstdc++-v3/ChangeLog:

* include/std/ranges (filter_view::_Iterator::base): Make the
const& overload return a const reference and remove its
constraint as per LWG 3533. Make unconditionally noexcept.
(transform_view::_Iterator::base): Likewise.
(elements_view::_Iterator::base): Likewise.
---
 libstdc++-v3/include/std/ranges | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 7075fa3ae6e..bc11505c167 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1199,9 +1199,8 @@ namespace views::__adaptor
_M_parent(__parent)
{ }
 
-   constexpr _Vp_iter
-   base() const &
- requires copyable<_Vp_iter>
+   constexpr const _Vp_iter&
+   base() const & noexcept
{ return _M_current; }
 
constexpr _Vp_iter
@@ -1467,9 +1466,8 @@ namespace views::__adaptor
: _M_current(std::move(__i._M_current)), _M_parent(__i._M_parent)
  { }
 
- constexpr _Base_iter
- base() const &
-   requires copyable<_Base_iter>
+ constexpr const _Base_iter&
+ base() const & noexcept
  { return _M_current; }
 
  constexpr _Base_iter
@@ -3403,8 +3401,8 @@ namespace views::__adaptor
: _M_base(std::move(base))
   { }
 
-  constexpr _Vp
-  base() const& requires copy_constructible<_Vp>
+  constexpr const _Vp&
+  base() const & noexcept
   { return _M_base; }
 
   constexpr _Vp
-- 
2.31.1.442.g7e39198978



[PATCH 1/2] libstdc++: Implement LWG 3391 changes to move/counted_iterator::base

2021-05-05 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/10/11?

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::base): Make the
const& overload return a const reference and remove its
constraint as per LWG 3391.  Make unconditionally noexcept.
(counted_iterator::base): Likewise.
* testsuite/24_iterators/move_iterator/lwg3391.cc: New test.
* testsuite/24_iterators/move_iterator/move_only.cc: Adjust
has_member_base concept to decay-copy the result of base().
---
 libstdc++-v3/include/bits/stl_iterator.h  | 13 ++-
 .../24_iterators/move_iterator/lwg3391.cc | 37 +++
 .../24_iterators/move_iterator/move_only.cc   |  8 +++-
 3 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 049f83cff90..2409cd71f86 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1409,11 +1409,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   base() const
   { return _M_current; }
 #else
-  constexpr iterator_type
-  base() const &
-#if __cpp_lib_concepts
-   requires copy_constructible<_Iterator>
-#endif
+  constexpr const iterator_type&
+  base() const & noexcept
   { return _M_current; }
 
   constexpr iterator_type
@@ -2141,10 +2138,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return *this;
}
 
-  constexpr _It
-  base() const &
-  noexcept(is_nothrow_copy_constructible_v<_It>)
-  requires copy_constructible<_It>
+  constexpr const _It&
+  base() const & noexcept
   { return _M_current; }
 
   constexpr _It
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc 
b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc
new file mode 100644
index 000..18e015777cd
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc
@@ -0,0 +1,37 @@
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// Verify LWG 3391 changes.
+
+#include 
+#include 
+
+#include 
+
+using __gnu_test::test_range;
+using __gnu_test::input_iterator_wrapper_nocopy;
+
+void
+test01()
+{
+  extern test_range<int, input_iterator_wrapper_nocopy> rx;
+  auto v = rx | std::views::take(5);
+  std::ranges::begin(v) != std::ranges::end(v);
+}
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc 
b/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
index 639adfab0e2..5537dfbf3cd 100644
--- a/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
@@ -43,7 +43,13 @@ template<> struct std::iterator_traits<move_only_iterator>
 static_assert(std::input_iterator<move_only_iterator>);
 
 template<typename T>
-  concept has_member_base = requires (T t) { std::forward<T>(t).base(); };
+  concept has_member_base = requires (T t) {
+// LWG 3391 made the const& overload of move_iterator::base()
+// unconstrained and return a const reference.  So rather than checking
+// whether base() is valid (which is now trivially true in an unevaluated
+// context), the below now checks whether decay-copying base() is valid.
+[](auto){}(std::forward<T>(t).base());
+  };
 
 using move_only_move_iterator = std::move_iterator<move_only_iterator>;
 
-- 
2.31.1.442.g7e39198978



Re: [committed] libstdc++: Use unsigned char argument to std::isdigit

2021-05-05 Thread François Dumont via Gcc-patches

On 05/05/21 2:01 pm, Jonathan Wakely via Libstdc++ wrote:

Passing plain char to isdigit is undefined if the value is negative.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum): Pass unsigned
char to std::isdigit.

Tested powerpc64le-linux. Committed to trunk.


   unsigned char __c = *__first;
-      if (std::isdigit(__c))
+      if (std::isdigit(static_cast<unsigned char>(__c)))

I am very curious to know what this static_cast<unsigned char> does on
__c which is already unsigned char ?  If it does I'll just start to hate
C++ :-)


Maybe you wanted to put it on the previous *__first ?



[PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
Atomic instructions require zero-offset memory addresses.
If we allow all addresses, the nonzero-offset addresses will
be prepared in an extra register in an extra instruction before
the actual atomic instruction.

This patch introduces the predicate "riscv_sync_memory_operand",
which restricts the memory operand to be suitable for atomic
instructions.
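
For illustration, a source fragment whose atomic address has a non-zero
offset (hypothetical types and names).  RISC-V AMO/LR/SC instructions only
accept a bare 0(rs1) address, so GCC must materialize the address in a
register first (e.g. "addi a5,a0,4"); the stricter predicate makes expand
legitimize such addresses up front instead of leaving an offset form to
later passes:

struct counters { int pad; int hits; };

int bump (struct counters *c)
{
  // &c->hits is base + 4, not a bare register address.
  return __atomic_fetch_add (&c->hits, 1, __ATOMIC_RELAXED);
}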

gcc/
PR 100266
* config/riscv/sync.md (riscv_sync_memory_operand): New.
* config/riscv/sync.md (riscv_load_reserved): Use new predicate.
* config/riscv/sync.md (riscv_store_conditional): Likewise.
* config/riscv/sync.md (atomic_<atomic_optab><mode>): Likewise.
* config/riscv/sync.md (atomic_fetch_<atomic_optab><mode>): Likewise.
* config/riscv/sync.md (atomic_exchange<mode>): Likewise.
* config/riscv/sync.md (atomic_compare_and_swap<mode>): Likewise.
---
 gcc/config/riscv/sync.md | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index da8dbf698163..cd9078a40248 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -30,6 +30,10 @@
   UNSPEC_STORE_CONDITIONAL
 ])
 
+(define_predicate "riscv_sync_memory_operand"
+  (and (match_operand 0 "memory_operand")
+   (match_code "reg" "0")))
+
 (define_code_iterator any_atomic [plus ior xor and])
 (define_code_attr atomic_optab
   [(plus "add") (ior "or") (xor "xor") (and "and")])
@@ -118,7 +122,7 @@
 (define_insn "@riscv_load_reserved"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 (unspec_volatile:GPR
-  [(match_operand:GPR 1 "memory_operand" "A")
+  [(match_operand:GPR 1 "riscv_sync_memory_operand" "A")
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_LOAD_RESERVED))]
   "TARGET_ATOMIC"
@@ -133,7 +137,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r")
 (sign_extend:DI
   (unspec_volatile:SI
-   [(match_operand:SI 1 "memory_operand" "A")
+   [(match_operand:SI 1 "riscv_sync_memory_operand" "A")
 (match_operand:SI 2 "const_int_operand")]  ;; model
UNSPEC_LOAD_RESERVED)))]
   "TARGET_ATOMIC && TARGET_64BIT"
@@ -143,7 +147,7 @@
 (define_insn "@riscv_store_conditional"
   [(set (match_operand:GPR 0 "register_operand" "=")
 (unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
-   (set (match_operand:GPR 1 "memory_operand" "=A")
+   (set (match_operand:GPR 1 "riscv_sync_memory_operand" "=A")
 (unspec_volatile:GPR
   [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
(match_operand:SI 3 "const_int_operand")]  ;; model
@@ -162,7 +166,7 @@
   [(set (match_operand:DI 0 "register_operand" "=&r")
 (sign_extend:DI
   (unspec_volatile:SI [(const_int 0)] UNSPEC_STORE_CONDITIONAL)))
-   (set (match_operand:SI 1 "memory_operand" "=A")
+   (set (match_operand:SI 1 "riscv_sync_memory_operand" "=A")
 (unspec_volatile:SI
   [(match_operand:SI 2 "reg_or_0_operand" "rJ")
(match_operand:SI 3 "const_int_operand")]  ;; model
@@ -172,7 +176,7 @@
 )
 
 (define_insn "atomic_"
-  [(set (match_operand:GPR 0 "memory_operand" "+A")
+  [(set (match_operand:GPR 0 "riscv_sync_memory_operand" "+A")
(unspec_volatile:GPR
  [(any_atomic:GPR (match_dup 0)
 (match_operand:GPR 1 "reg_or_0_operand" "rJ"))
@@ -184,7 +188,7 @@
 
 (define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=")
-   (match_operand:GPR 1 "memory_operand" "+A"))
+   (match_operand:GPR 1 "riscv_sync_memory_operand" "+A"))
(set (match_dup 1)
(unspec_volatile:GPR
  [(any_atomic:GPR (match_dup 1)
@@ -198,7 +202,7 @@
 (define_insn "atomic_exchange"
   [(set (match_operand:GPR 0 "register_operand" "=")
(unspec_volatile:GPR
- [(match_operand:GPR 1 "memory_operand" "+A")
+ [(match_operand:GPR 1 "riscv_sync_memory_operand" "+A")
   (match_operand:SI 3 "const_int_operand")] ;; model
  UNSPEC_SYNC_EXCHANGE))
(set (match_dup 1)
@@ -208,14 +212,14 @@
 )
 
 (define_expand "atomic_compare_and_swap"
-  [(match_operand:SI 0 "register_operand" "")   ;; bool output
-   (match_operand:GPR 1 "register_operand" "")  ;; val output
-   (match_operand:GPR 2 "memory_operand" "");; memory
-   (match_operand:GPR 3 "reg_or_0_operand" "")  ;; expected value
-   (match_operand:GPR 4 "reg_or_0_operand" "")  ;; desired value
-   (match_operand:SI 5 "const_int_operand" "")  ;; is_weak
-   (match_operand:SI 6 "const_int_operand" "")  ;; mod_s
-   (match_operand:SI 7 "const_int_operand" "")] ;; mod_f
+  [(match_operand:SI 0 "register_operand" "")   ;; bool output
+   (match_operand:GPR 1 "register_operand" "")  ;; val output
+   (match_operand:GPR 2 "riscv_sync_memory_operand" "") ;; memory
+   (match_operand:GPR 3 "reg_or_0_operand" "")  ;; expected value
+   (match_operand:GPR 4 "reg_or_0_operand" "")  ;; desired value
+   (match_operand:SI 5 "const_int_operand" "") 

[PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The existing CAS implementation uses an INSN definition, which provides
the core LR/SC sequence. Additionally to that, there is a follow-up code,
that evaluates the results and calculates the return values.
This has two drawbacks: a) an extension to sub-word CAS implementations
is not possible (even if, then it would be unmaintainable), and b) the
implementation is hard to maintain/improve.
This patch provides a programmatic implementation of CAS, similar
to what many other architectures have.

The implementation supports both RV32 and RV64.

Additionally, the implementation does not introduce data dependencies
for computation of the return value. Instead, we set the return value
(success state of the CAS operation) based on structural information.
This approach is also shown in the the RISC-V unpriv spec (as part
of the sample code for a compare-and-swap function using LR/SC).
The cost of this implementation is a single LI instruction on top,
which is actually not required in case of success (it will be
overwritten in the success case later).

The resulting sequence requires 9 instructions in the success case.
The previous implementation required 11 instructions in the success
case (including a taken branch) and had a "subw;seqz;beqz" sequence,
with direct dependencies.

Below is the generated code of a 32-bit CAS sequence with the old
implementation and the new implementation (ignore the ANDIs below).

Old:
 f00:	419c	lw	a5,0(a1)
 f02:	1005272f	lr.w	a4,(a0)
 f06:	00f71563	bne	a4,a5,f10
 f0a:	18c526af	sc.w	a3,a2,(a0)
 f0e:	faf5	bnez	a3,f02
 f10:	40f707bb	subw	a5,a4,a5
 f14:	0017b513	seqz	a0,a5
 f18:	c391	beqz	a5,f1c
 f1a:	c198	sw	a4,0(a1)
 f1c:	8905	andi	a0,a0,1
 f1e:	8082	ret

New:
 e28:	4194	lw	a3,0(a1)
 e2a:	4701	li	a4,0
 e2c:	1005282f	lr.w	a6,(a0)
 e30:	00d81963	bne	a6,a3,e42
 e34:	18c527af	sc.w	a5,a2,(a0)
 e38:	fbf5	bnez	a5,e2c
 e3a:	4705	li	a4,1
 e3c:	00177513	andi	a0,a4,1
 e40:	8082	ret
 e42:	0105a023	sw	a6,0(a1)
 e46:	00177513	andi	a0,a4,1
 e4a:	8082	ret
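
A non-atomic C-level model of the new sequence (illustration only; the real
code is the LR/SC loop above, and try_store is a hypothetical stand-in for
SC.W, which may spuriously fail):

// Structural success flag: each exit path already knows whether it is the
// success or the failure path, so no value comparison is needed at the end.
static bool try_store (int *mem, int desired) { *mem = desired; return true; }

static bool cas_model (int *mem, int *expected, int desired)
{
  bool success = false;              // li   a4,0
  for (;;)
    {
      int observed = *mem;           // lr.w a6,(a0)
      if (observed != *expected)     // bne  a6,a3,<fail>
        {
          *expected = observed;      // sw   a6,0(a1)   (failure path)
          break;
        }
      if (!try_store (mem, desired)) // sc.w a5,a2,(a0); bnez a5,<retry>
        continue;
      success = true;                // li   a4,1       (success path only)
      break;
    }
  return success;                    // andi a0,a4,1; ret
}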

gcc/
PR 100266
* config/riscv/riscv-protos.h (riscv_expand_compare_and_swap): New.
* config/riscv/riscv.c (riscv_emit_unlikely_jump): New.
* config/riscv/riscv.c (riscv_expand_compare_and_swap): New.
* config/riscv/sync.md (atomic_cas_value_strong<mode>): Removed.
* config/riscv/sync.md (atomic_compare_and_swap<mode>): Call
  riscv_expand_compare_and_swap.
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.c| 75 +
 gcc/config/riscv/sync.md| 35 +--
 3 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43d7224d6941..eb7e67d3b95a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -59,6 +59,7 @@ extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_move (rtx, rtx, rtx, rtx_code, rtx, rtx);
+extern void riscv_expand_compare_and_swap (rtx[]);
 #endif
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 5fe65776e608..a7b18d650daa 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2496,6 +2496,81 @@ riscv_expand_conditional_move (rtx dest, rtx cons, rtx alt, rtx_code code,
  cons, alt)));
 }
 
+/* Mark the previous jump instruction as unlikely.  */
+
+static void
+riscv_emit_unlikely_jump (rtx insn)
+{
+  rtx_insn *jump = emit_jump_insn (insn);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+}
+
+/* Expand code to perform a compare-and-swap.  */
+
+extern void
+riscv_expand_compare_and_swap (rtx operands[])
+{
+  rtx bval, oldval, mem, expval, newval, mod_s, mod_f, scratch, cond1, cond2;
+  machine_mode mode;
+  rtx_code_label *begin_label, *end_label;
+
+  bval = operands[0];
+  oldval = operands[1];
+  mem = operands[2];
+  expval = operands[3];
+  newval = operands[4];
+  mod_s = operands[6];
+  mod_f = operands[7];

[PATCH v2 08/10] RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The current model of the LR and SC INSNs requires a sign-extension
to use the generated SImode value for conditional branches, which
only operate on XLEN registers.
However, the sign-extension is actually not required in both cases,
therefore this patch introduces additional INSNs that consume
the sign-extension.

Rationale:
The loaded value of LR.W is specified to be sign-extended, so an
explicit sign-extension is not required.
The success value of SC.W is specified to be non-zero.  As a
sign-extended non-zero value remains non-zero, the sign-extension
is not required here either.
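
The two properties can be stated as compile-time checks (a sketch, not part
of the patch):

#include <cstdint>

static_assert (static_cast<int64_t> (static_cast<int32_t> (0x80000000u)) != 0,
               "a non-zero SI value stays non-zero after sign-extension");
static_assert (static_cast<int64_t> (int32_t{-1}) == -1,
               "sign-extension preserves an already sign-extended value");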

gcc/
PR 100266
* config/riscv/sync.md (riscv_load_reserved): New.
* config/riscv/sync.md (riscv_store_conditional): New.
---
 gcc/config/riscv/sync.md | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index edff6520b87e..49b860da8ef0 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -125,6 +125,21 @@
   "lr.%A2 %0, %1"
 )
 
+;; This pattern allows a sign-extension of the loaded value to be
+;; consumed.  This is legal, because the specification of LR.W defines
+;; the loaded value to be sign-extended.
+
+(define_insn "riscv_load_reserved"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(sign_extend:DI
+  (unspec_volatile:SI
+   [(match_operand:SI 1 "memory_operand" "A")
+(match_operand:SI 2 "const_int_operand")]  ;; model
+   UNSPEC_LOAD_RESERVED)))]
+  "TARGET_ATOMIC && TARGET_64BIT"
+  "lr.w%A2 %0, %1"
+)
+
 (define_insn "@riscv_store_conditional"
   [(set (match_operand:GPR 0 "register_operand" "=")
 (unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
@@ -137,6 +152,25 @@
   "sc.%A3 %0, %z2, %1"
 )
 
+;; This pattern allows a sign-extension of the success value of SC.W
+;; to be consumed; the result can then be used by instructions which
+;; require values of XLEN size (e.g. conditional branches).
+;; This is legal, because any non-zero value remains non-zero
+;; after sign-extension.
+
+(define_insn "riscv_store_conditional"
+  [(set (match_operand:DI 0 "register_operand" "=&r")
+(sign_extend:DI
+  (unspec_volatile:SI [(const_int 0)] UNSPEC_STORE_CONDITIONAL)))
+   (set (match_operand:SI 1 "memory_operand" "=A")
+(unspec_volatile:SI
+  [(match_operand:SI 2 "reg_or_0_operand" "rJ")
+   (match_operand:SI 3 "const_int_operand")]  ;; model
+  UNSPEC_STORE_CONDITIONAL))]
+  "TARGET_ATOMIC && TARGET_64BIT"
+  "sc.w%A3 %0, %z2, %1"
+)
+
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
(unspec_volatile:GPR
-- 
2.31.1



[PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
In order to emit LR/SC sequences, let's provide INSNs, which
take care of memory ordering constraints.

gcc/
PR 100266
* config/riscv/sync.md (UNSPEC_LOAD_RESERVED): New.
* config/riscv/sync.md (UNSPEC_STORE_CONDITIONAL): New.
* config/riscv/sync.md (riscv_load_reserved): New.
* config/riscv/sync.md (riscv_store_conditional): New.
---
 gcc/config/riscv/sync.md | 24 
 1 file changed, 24 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index ceec324dfa30..edff6520b87e 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -26,6 +26,8 @@
   UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
+  UNSPEC_LOAD_RESERVED
+  UNSPEC_STORE_CONDITIONAL
 ])
 
 (define_code_iterator any_atomic [plus ior xor and])
@@ -113,6 +115,28 @@
 DONE;
 })
 
+(define_insn "@riscv_load_reserved"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec_volatile:GPR
+  [(match_operand:GPR 1 "memory_operand" "A")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_LOAD_RESERVED))]
+  "TARGET_ATOMIC"
+  "lr.%A2 %0, %1"
+)
+
+(define_insn "@riscv_store_conditional"
+  [(set (match_operand:GPR 0 "register_operand" "=")
+(unspec_volatile:GPR [(const_int 0)] UNSPEC_STORE_CONDITIONAL))
+   (set (match_operand:GPR 1 "memory_operand" "=A")
+(unspec_volatile:GPR
+  [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+   (match_operand:SI 3 "const_int_operand")]  ;; model
+  UNSPEC_STORE_CONDITIONAL))]
+  "TARGET_ATOMIC"
+  "sc.%A3 %0, %z2, %1"
+)
+
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
(unspec_volatile:GPR
-- 
2.31.1



[PATCH v2 06/10] RISC-V: Implement atomic_{load,store} [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
A recent commit introduced a mechanism to emit proper fences
for RISC-V. Additionally, we already have emit_move_insn ().
Let's reuse this code and provide atomic_load<mode> and
atomic_store<mode> for RISC-V (as defined in section
"Code Porting and Mapping Guidelines" of the unpriv spec).
Note that this also works for sub-word atomics.
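
As an illustration, the intended mappings for a 32-bit acquire load and
release store look as follows (a sketch; the comments show the expected
instruction sequences, not actual compiler output):

#include <stdatomic.h>

int
load_acquire (_Atomic int *p)
{
  return atomic_load_explicit (p, memory_order_acquire);  /* lw; fence r,rw */
}

void
store_release (_Atomic int *p, int v)
{
  atomic_store_explicit (p, v, memory_order_release);     /* fence rw,w; sw */
}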

gcc/
PR 100265
* config/riscv/sync.md (atomic_load<mode>): New.
* config/riscv/sync.md (atomic_store<mode>): New.
---
 gcc/config/riscv/sync.md | 41 
 1 file changed, 41 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 406db1730b81..ceec324dfa30 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -23,6 +23,7 @@
   UNSPEC_COMPARE_AND_SWAP
   UNSPEC_SYNC_OLD_OP
   UNSPEC_SYNC_EXCHANGE
+  UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -72,6 +73,46 @@
 
 ;; Atomic memory operations.
 
+(define_expand "atomic_load"
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+(unspec_volatile:ANYI
+  [(match_operand:ANYI 1 "memory_operand" "A")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_ATOMIC_LOAD))]
+  ""
+  {
+rtx target = operands[0];
+rtx mem = operands[1];
+enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+if (is_mm_seq_cst (model))
+  emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+emit_move_insn (target, mem);
+if (is_mm_acquire (model) || is_mm_seq_cst (model))
+  emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_ACQUIRE)));
+
+DONE;
+})
+
+(define_expand "atomic_store"
+  [(set (match_operand:ANYI 0 "memory_operand" "=A")
+(unspec_volatile:ANYI
+  [(match_operand:ANYI 1 "reg_or_0_operand" "rJ")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_ATOMIC_STORE))]
+  ""
+  {
+rtx mem = operands[0];
+rtx val = operands[1];
+enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+if (is_mm_release (model) || is_mm_seq_cst (model))
+  emit_insn (gen_mem_fence (GEN_INT (MEMMODEL_RELEASE)));
+emit_move_insn (mem, val);
+
+DONE;
+})
+
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
(unspec_volatile:GPR
-- 
2.31.1



[PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
mem_thread_fence gets the desired memory model as operand.
Let's emit fences according to this value (as defined in section
"Code Porting and Mapping Guidelines" of the unpriv spec).

gcc/
PR 100265
* config/riscv/sync.md (mem_thread_fence):
  Emit fences according to given operand.
* config/riscv/sync.md (mem_fence):
  Add INSNs for different fence flavours.
* config/riscv/sync.md (mem_thread_fence_1):
  Remove.
---
 gcc/config/riscv/sync.md | 41 +++-
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index efd49745a8e2..406db1730b81 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -34,26 +34,41 @@
 ;; Memory barriers.
 
 (define_expand "mem_thread_fence"
-  [(match_operand:SI 0 "const_int_operand" "")] ;; model
+  [(match_operand:SI 0 "const_int_operand")] ;; model
   ""
 {
-  if (INTVAL (operands[0]) != MEMMODEL_RELAXED)
-{
-  rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
-  MEM_VOLATILE_P (mem) = 1;
-  emit_insn (gen_mem_thread_fence_1 (mem, operands[0]));
-}
+  enum memmodel model = memmodel_from_int (INTVAL (operands[0]));
+  if (!is_mm_relaxed (model))
+    emit_insn (gen_mem_fence (operands[0]));
   DONE;
 })
 
-;; Until the RISC-V memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
-(define_insn "mem_thread_fence_1"
+(define_expand "mem_fence"
+  [(set (match_dup 1)
+   (unspec:BLK [(match_dup 1) (match_operand:SI 0 "const_int_operand")]
+   UNSPEC_MEMORY_BARRIER))]
+  ""
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*mem_fence"
   [(set (match_operand:BLK 0 "" "")
-   (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
-   (match_operand:SI 1 "const_int_operand" "")] ;; model
+   (unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
+   UNSPEC_MEMORY_BARRIER))]
   ""
-  "fence\tiorw,iorw")
+{
+  enum memmodel model = memmodel_from_int (INTVAL (operands[1]));
+  if (is_mm_consume (model) || is_mm_acquire (model))
+return "fence\tr, rw";
+  else if (is_mm_release (model))
+return "fence\trw, w";
+  else if (is_mm_acq_rel (model))
+return "fence.tso";
+  else
+return "fence\trw, rw";
+})
 
 ;; Atomic memory operations.
 
-- 
2.31.1



[PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
Using AMOSWAP as atomic store does not allow us to do sub-word accesses.
Further, it is not consistent with our atomic_load () implementation.
The benefit of AMOSWAP is that the resulting code sequence is smaller
(compared to FENCE+STORE); however, this does not outweigh the lack of
sub-word accesses.
Additionally, HW implementors have claimed that an optimal implementation
of AMOSWAP is slightly more expensive than FENCE+STORE.
So let's use STORE instead of AMOSWAP.
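
A sub-word example makes the motivation concrete (a sketch; AMOSWAP exists
only in word/doubleword widths in the ratified A extension, so it cannot
implement this store directly):

#include <stdatomic.h>

void
store8_release (_Atomic char *p, char v)
{
  /* With this series: fence rw,w; sb -- there is no byte-sized amoswap.  */
  atomic_store_explicit (p, v, memory_order_release);
}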

gcc/
PR 100265
* config/riscv/sync.md (atomic_store<mode>):
  Remove.
---
 gcc/config/riscv/sync.md | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index aeeb2e854b68..efd49745a8e2 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -57,17 +57,6 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
-(define_insn "atomic_store"
-  [(set (match_operand:GPR 0 "memory_operand" "=A")
-(unspec_volatile:GPR
-  [(match_operand:GPR 1 "reg_or_0_operand" "rJ")
-   (match_operand:SI 2 "const_int_operand")]  ;; model
-  UNSPEC_ATOMIC_STORE))]
-  "TARGET_ATOMIC"
-  "amoswap.%A2 zero,%z1,%0"
-  [(set (attr "length") (const_int 8))])
-
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
(unspec_volatile:GPR
-- 
2.31.1



[PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
A previous patch took care that the proper memory ordering suffixes
for AMOs are emitted. Therefore there is no reason to keep the fence
generation mechanism for release operations.

gcc/
PR 100265
* config/riscv/riscv.c (riscv_memmodel_needs_release_fence):
  Remove function.
* config/riscv/riscv.c (riscv_print_operand): Remove
  %F format specifier.
* config/riscv/sync.md: Remove %F format specifier uses.
---
 gcc/config/riscv/riscv.c | 29 -
 gcc/config/riscv/sync.md | 16 
 2 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 3edd5c239d7c..5fe65776e608 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3371,29 +3371,6 @@ riscv_print_amo_memory_ordering_suffix (FILE *file, 
const enum memmodel model)
 }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
-
-static bool
-riscv_memmodel_needs_release_fence (const enum memmodel model)
-{
-  switch (model)
-{
-  case MEMMODEL_ACQ_REL:
-  case MEMMODEL_SEQ_CST:
-  case MEMMODEL_RELEASE:
-   return true;
-
-  case MEMMODEL_ACQUIRE:
-  case MEMMODEL_CONSUME:
-  case MEMMODEL_RELAXED:
-   return false;
-
-  default:
-   gcc_unreachable ();
-}
-}
-
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
'h' Print the high-part relocation associated with OP, after stripping
@@ -3401,7 +3378,6 @@ riscv_memmodel_needs_release_fence (const enum memmodel 
model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
-   'F' Print a FENCE if the memory model requires a release.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.  */
 
@@ -3433,11 +3409,6 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   riscv_print_amo_memory_ordering_suffix (file, model);
   break;
 
-case 'F':
-  if (riscv_memmodel_needs_release_fence (model))
-   fputs ("fence iorw,ow; ", file);
-  break;
-
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 747a799e2377..aeeb2e854b68 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -65,7 +65,7 @@
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_ATOMIC_STORE))]
   "TARGET_ATOMIC"
-  "%F2amoswap.%A2 zero,%z1,%0"
+  "amoswap.%A2 zero,%z1,%0"
   [(set (attr "length") (const_int 8))])
 
 (define_insn "atomic_"
@@ -76,8 +76,8 @@
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F2amo.%A2 zero,%z1,%0"
-  [(set (attr "length") (const_int 8))])
+  "amo.%A2 zero,%z1,%0"
+)
 
 (define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -89,8 +89,8 @@
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F3amo.%A3 %0,%z2,%1"
-  [(set (attr "length") (const_int 8))])
+  "amo.%A3 %0,%z2,%1"
+)
 
 (define_insn "atomic_exchange"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -101,8 +101,8 @@
(set (match_dup 1)
(match_operand:GPR 2 "register_operand" "0"))]
   "TARGET_ATOMIC"
-  "%F3amoswap.%A3 %0,%z2,%1"
-  [(set (attr "length") (const_int 8))])
+  "amoswap.%A3 %0,%z2,%1"
+)
 
 (define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=")
@@ -115,7 +115,7 @@
 UNSPEC_COMPARE_AND_SWAP))
 (clobber (match_scratch:GPR 6 "=&r"))]
   "TARGET_ATOMIC"
-  "%F5 1: lr.%A5 %0,%1; bne %0,%z2,1f; sc.%A4 %6,%z3,%1; bnez %6,1b; 
1:"
+  "1: lr.%A5 %0,%1; bne %0,%z2,1f; sc.%A4 %6,%z3,%1; bnez %6,1b; 1:"
   [(set (attr "length") (const_int 20))])
 
 (define_expand "atomic_compare_and_swap"
-- 
2.31.1



[PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The ratified A extension supports '.aq', '.rl' and '.aqrl' as
memory ordering suffixes. Let's emit them in case we get a '%A'
conversion specifier for riscv_print_operand().

As '%A' was already used for a similar, but restricted, purpose
(only '.aq' was emitted so far), this does not require any other
changes.
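
For instance, a sequentially consistent fetch-and-add now carries the
full '.aqrl' suffix, where previously only '.aq' was emitted (a sketch;
the operands in the comment are illustrative, not from a real compile):

#include <stdatomic.h>

int
fetch_add_seq_cst (_Atomic int *p)
{
  /* now emits: amoadd.w.aqrl a0,a5,(a0)  */
  return atomic_fetch_add_explicit (p, 1, memory_order_seq_cst);
}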

gcc/
PR 100265
* config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire):
  Remove function.
* config/riscv/riscv.c (riscv_print_amo_memory_ordering_suffix):
  Add function to emit AMO memory ordering suffixes.
* config/riscv/riscv.c (riscv_print_operand): Call
  riscv_print_amo_memory_ordering_suffix() instead of
  riscv_memmodel_needs_amo_acquire().
---
 gcc/config/riscv/riscv.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 545f3d0cb82c..3edd5c239d7c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3346,24 +3346,26 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool 
hi_reloc)
   fputc (')', file);
 }
 
-/* Return true if the .AQ suffix should be added to an AMO to implement the
-   acquire portion of memory model MODEL.  */
+/* Print the memory ordering suffix for AMOs.  */
 
-static bool
-riscv_memmodel_needs_amo_acquire (const enum memmodel model)
+static void
+riscv_print_amo_memory_ordering_suffix (FILE *file, const enum memmodel model)
 {
   switch (model)
 {
-  case MEMMODEL_ACQ_REL:
-  case MEMMODEL_SEQ_CST:
-  case MEMMODEL_ACQUIRE:
+  case MEMMODEL_RELAXED:
+   break;
   case MEMMODEL_CONSUME:
-   return true;
-
+  case MEMMODEL_ACQUIRE:
+   fputs (".aq", file);
+   break;
   case MEMMODEL_RELEASE:
-  case MEMMODEL_RELAXED:
-   return false;
-
+   fputs (".rl", file);
+   break;
+  case MEMMODEL_ACQ_REL:
+  case MEMMODEL_SEQ_CST:
+   fputs (".aqrl", file);
+   break;
   default:
gcc_unreachable ();
 }
@@ -3428,8 +3430,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire (model))
-   fputs (".aq", file);
+  riscv_print_amo_memory_ordering_suffix (file, model);
   break;
 
 case 'F':
-- 
2.31.1



[PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
We don't have any special treatment of MEMMODEL_SYNC_* values,
so let's hide them behind the memmodel_base() function.
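
For reference, memmodel_base() comes from memmodel.h and simply masks away
the MEMMODEL_SYNC flag bit, so both calls in this sketch yield
MEMMODEL_SEQ_CST:

enum memmodel m1 = memmodel_base (MEMMODEL_SEQ_CST);
enum memmodel m2 = memmodel_base (MEMMODEL_SYNC_SEQ_CST);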

gcc/
PR 100265
* config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire):
  Ignore MEMMODEL_SYNC_* values.
* config/riscv/riscv.c (riscv_memmodel_needs_release_fence):
  Likewise.
* config/riscv/riscv.c (riscv_print_operand): Eliminate
  MEMMODEL_SYNC_* values by calling memmodel_base().
---
 gcc/config/riscv/riscv.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 27665e5b58f9..545f3d0cb82c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3350,20 +3350,17 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool 
hi_reloc)
acquire portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_amo_acquire (enum memmodel model)
+riscv_memmodel_needs_amo_acquire (const enum memmodel model)
 {
   switch (model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
return true;
 
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -3376,20 +3373,17 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
implement the release portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_release_fence (enum memmodel model)
+riscv_memmodel_needs_release_fence (const enum memmodel model)
 {
   switch (model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
return true;
 
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -3414,6 +3408,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 {
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
+  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
 {
@@ -3433,12 +3428,12 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
   break;
 
 case 'F':
-  if (riscv_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_release_fence (model))
fputs ("fence iorw,ow; ", file);
   break;
 
-- 
2.31.1



[PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are very related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
  (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE improvement
(see PR100264):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).

Further, the new CAS code requires cbranch INSN helpers to be present:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

Changes for v2:
* Guard LL/SC sequence by compiler barriers ("blockage")
  (suggested by Andrew Waterman)
* Changed commit message for AMOSWAP->STORE change
  (suggested by Andrew Waterman)
* Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
* Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
* Fix small code style issue

Christoph Muellner (10):
  RISC-V: Simplify memory model code [PR 100265]
  RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
  RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
  RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
  RISC-V: Emit fences according to chosen memory model [PR 100265]
  RISC-V: Implement atomic_{load,store} [PR 100265]
  RISC-V: Model INSNs for LR and SC [PR 100266]
  RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
  RISC-V: Provide programmatic implementation of CAS [PR 100266]
  RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

 gcc/config/riscv/riscv-protos.h |   1 +
 gcc/config/riscv/riscv.c| 136 +---
 gcc/config/riscv/sync.md| 216 +---
 3 files changed, 235 insertions(+), 118 deletions(-)

-- 
2.31.1



Re: [PATCH 09/10] RISC-V: Generate helpers for cbranch4 [PR 100266]

2021-05-05 Thread Christoph Müllner via Gcc-patches
On Mon, Apr 26, 2021 at 4:40 PM Kito Cheng  wrote:
>
> This patch is a good and simple improvement which could be an independent 
> patch.
>
> There is only one comment from me for this patch: could you also add @
> to the cbranch pattern for floating modes? I would prefer that
> gen_cbranch4 could handle floating modes as well, for consistency.

Did that, and I also found one more code location which could be
simplified.
The patch can be found here:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html

>
> So feel free to commit this patch once you have addressed my comment.
>
>
>
> On Mon, Apr 26, 2021 at 8:46 PM Christoph Muellner
>  wrote:
> >
> > On RISC-V we are facing the fact that our conditional branches
> > require Pmode conditions. Currently, we generate them explicitly
> > with a check for Pmode and then call the proper generator
> > (i.e. gen_cbranchdi4 on RV64 and gen_cbranchsi4 on RV32).
> > Let's simplify this code by using gen_cbranch4 (Pmode).
> >
> > gcc/
> > PR 100266
> > * config/riscv/riscv.c (riscv_block_move_loop): Simplify.
> > * config/riscv/riscv.md (cbranch<mode>4): Generate helpers.
> > ---
> >  gcc/config/riscv/riscv.c  | 5 +
> >  gcc/config/riscv/riscv.md | 2 +-
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > index 87cdde73ae21..6e97b38db6db 100644
> > --- a/gcc/config/riscv/riscv.c
> > +++ b/gcc/config/riscv/riscv.c
> > @@ -3250,10 +3250,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned 
> > HOST_WIDE_INT length,
> >
> >/* Emit the loop condition.  */
> >test = gen_rtx_NE (VOIDmode, src_reg, final_src);
> > -  if (Pmode == DImode)
> > -emit_jump_insn (gen_cbranchdi4 (test, src_reg, final_src, label));
> > -  else
> > -emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
> > +  emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label));
> >
> >/* Mop up any left-over bytes.  */
> >if (leftover)
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index c3687d57047b..52f8a321ac23 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -1908,7 +1908,7 @@
> >   (label_ref (match_operand 1))
> >   (pc)))])
> >
> > -(define_expand "cbranch4"
> > +(define_expand "@cbranch4"
> >[(set (pc)
> > (if_then_else (match_operator 0 "comparison_operator"
> >   [(match_operand:BR 1 "register_operand")
> > --
> > 2.31.1
> >


[PATCH] RISC-V: Generate helpers for cbranch4

2021-05-05 Thread Christoph Muellner via Gcc-patches
On RISC-V we are facing the fact that our conditional branches
require Pmode conditions. Currently, we generate them explicitly
with a check for Pmode and then call the proper generator
(i.e. gen_cbranchdi4 on RV64 and gen_cbranchsi4 on RV32).
Let's simplify this code by generating the INSN helpers
and using gen_cbranch4 (Pmode).
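
For background, the '@' marker makes genemit create an overload of the
generator that takes the mode as an explicit first argument; a sketch of
the generated declaration and its use (not part of the patch):

/* Generated by genemit for "@cbranch<mode>4":  */
extern rtx gen_cbranch4 (machine_mode, rtx, rtx, rtx, rtx);

/* ...which allows a single, mode-agnostic call site:  */
emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label));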

gcc/
PR 100266
* config/riscv/riscv.c (riscv_block_move_loop): Simplify.
* config/riscv/riscv.md (cbranch<mode>4): Generate helpers.
---
 gcc/config/riscv/riscv.c  |  5 +
 gcc/config/riscv/riscv.md | 12 
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index e1064e374eb0..27665e5b58f9 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3258,10 +3258,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length,
 
   /* Emit the loop condition.  */
   test = gen_rtx_NE (VOIDmode, src_reg, final_src);
-  if (Pmode == DImode)
-emit_jump_insn (gen_cbranchdi4 (test, src_reg, final_src, label));
-  else
-emit_jump_insn (gen_cbranchsi4 (test, src_reg, final_src, label));
+  emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label));
 
   /* Mop up any left-over bytes.  */
   if (leftover)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0e35960fefaa..f88877fd5966 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2153,7 +2153,7 @@
  (label_ref (match_operand 1))
  (pc)))])
 
-(define_expand "cbranch4"
+(define_expand "@cbranch4"
   [(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
  [(match_operand:BR 1 "register_operand")
@@ -2167,7 +2167,7 @@
   DONE;
 })
 
-(define_expand "cbranch4"
+(define_expand "@cbranch4"
   [(set (pc)
(if_then_else (match_operator 0 "fp_branch_comparison"
   [(match_operand:ANYF 1 "register_operand")
@@ -2829,12 +2829,8 @@
operands[0],
operands[1]));
 
-  if (mode == DImode)
-emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
-   result, const0_rtx, operands[2]));
-  else
-emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
-   result, const0_rtx, operands[2]));
+  rtx cond = gen_rtx_EQ (VOIDmode, result, const0_rtx);
+  emit_jump_insn (gen_cbranch4 (mode, cond, result, const0_rtx, operands[2]));
 
   DONE;
 })
-- 
2.31.1



[Patch, fortran] PR84119 - Type parameter inquiry for PDT returns array instead of scalar

2021-05-05 Thread Paul Richard Thomas via Gcc-patches
Ping!

On Tue, 20 Apr 2021 at 12:51, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:

> Hi All,
>
> This is another PDT warm-up patch before tackling the real beast: PR82649.
>
> As the contributor wrote in the PR, "The F08 standard clearly
> distinguishes between type parameter definition statements and component
> definition statements. See R425, R431, R435, and in particular see Note 6.7
> which says 'It [array%a, for example] is scalar even if designator is an
> array.' " gfortran was not making this distinction. The patch realises the
> fix by lifting the code used for inquiry part references into a new
> function and calling it for PDT parameters and inquiry references. The
> arrayspec lbound is used for 'start' now, rather than unity. In principle
> this should remove the need to suppress bound checking. However, since this
> would be confusing for the user to say the least of it, the suppression has
> been retained.
>
> Bootstraps and regtests on FC33/x86_64. OK for 12- and 11-branches?
>
> Cheers
>
> Paul
>
> Fortran: Make PDT LEN and KIND expressions always scalar [PR84119].
>
> 2021-04-20  Paul Thomas  
>
> gcc/fortran
> PR fortran/84119
> * resolve.c (reset_array_ref_to_scalar): New function.
> (gfc_resolve_ref): Call it for PDT kind and len expressions.
> Code for inquiry refs. moved to new function and replaced by a
> call to it.
>
> gcc/testsuite/
> PR fortran/84119
> * gfortran.dg/pdt_32.f03: New test.
> * gfortran.dg/pdt_20.f03: Correct the third test to be against
> a scalar instead of an array.
>
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [PATCH] Remove CC0

2021-05-05 Thread Koning, Paul via Gcc-patches



> On May 5, 2021, at 8:45 AM, Segher Boessenkool  
> wrote:
> 
> Hi~
> 
> On Tue, May 04, 2021 at 04:08:22PM +0100, Richard Earnshaw wrote:
>> On 03/05/2021 23:55, Segher Boessenkool wrote:
>>> CC_STATUS_INIT is suggested in final.c to also be useful for ports that
>>> are not CC0, and at least arm seems to use it for something.  So I am
>>> leaving that alone, but most targets that have it could remove it.
>> 
>> A quick look through the code suggests it's being used for thumb1 code 
>> gen to try to reproduce the traditional CC0 type behaviour of 
>> eliminating redundant compare operations when you have sequences such as
>> 
>> cmp a, b
>> b d1
>> cmp a, b
>> b d2
>> 
>> The second compare operation can be eliminated.
>> 
>> It might be possible to eliminate this another way by reworking the 
>> thumb1 codegen to expose the condition codes after register allocation 
>> has completed (much like x86 does these days), but that would be quite a 
>> lot of work right now.  I don't know if such splitting would directly 
>> lead to the ability to remove the redundant compares - it might need a 
>> new pass to spot them.
> 
> At least on rs6000 on a simple example this is handled by fwprop1
> already.  Does that work for thumb1?  Or maybe that uses hard regs for
> the condition codes and that does not work here?
> 
> Example code:
> 
> ===
> void g(void);
> void h(void);
> void i(void);
> void f(long a, long b)
> {
>if (a < b)
>g();
>if (a == b)
>h();
>if (a > b)
>i();
> }

FWIW, that also works on pdp11, so it seems the general mechanism is in place 
and working.  Well, with one oddity, an unnecessary third conditional branch:

_f:
mov 02(sp),r1
mov 04(sp),r0
cmp r1,r0
blt L_7
beq L_4
bgt L_5
rts pc
L_5:
jsr pc,_i
rts pc
L_4:
jsr pc,_h
rts pc
L_7:
jsr pc,_g
rts pc



Re: [PATCH v2] x86: Build only one __cpu_model/__cpu_features2 variables

2021-05-05 Thread Uros Bizjak via Gcc-patches
On Wed, May 5, 2021 at 4:50 PM H.J. Lu  wrote:
>
> On Wed, May 05, 2021 at 09:36:16AM +0200, Richard Biener wrote:
> > On Mon, May 3, 2021 at 11:31 AM Ivan Sorokin via Gcc-patches
> >  wrote:
> > >
> > > Prior to this commit GCC -O2 generated quite bad code for this
> > > function:
> > >
> > > bool f()
> > > {
> > > return __builtin_cpu_supports("popcnt")
> > > && __builtin_cpu_supports("ssse3");
> > > }
> > >
> > > f:
> > > movl__cpu_model+12(%rip), %eax
> > > xorl%r8d, %r8d
> > > testb   $4, %al
> > > je  .L1
> > > shrl$6, %eax
> > > movl%eax, %r8d
> > > andl$1, %r8d
> > > .L1:
> > > movl%r8d, %eax
> > > ret
> > >
> > > The problem was caused by the fact that internally every invocation
> > > of __builtin_cpu_supports built a new variable __cpu_model and a new
> > > type __processor_model. Because of this GIMPLE level optimizers
> > > weren't able to CSE the loads of __cpu_model and optimize
> > > bit-operations properly.
> > >
> > > This commit fixes the problem by caching created __cpu_model
> > > variable and __processor_model type. Now the GCC -O2 generates:
> > >
> > > f:
> > > movl__cpu_model+12(%rip), %eax
> > > andl$68, %eax
> > > cmpl$68, %eax
> > > sete%al
> > > ret
> >
> > The patch looks good, the function could need a comment
> > and the global variables better names, not starting with __
> >
> > Up to the x86 maintainers - HJ, can you pick up this work?
> >
>
> Here is the updated patch to also handle __cpu_features2.
> OK for master?

LGTM (Richi effectively approved tree parts).

Thanks,
Uros.

>
> Thanks.
>
> H.J.
> ---
> GCC -O2 generated quite bad code for this function:
>
> bool
> f (void)
> {
>   return __builtin_cpu_supports("popcnt")
>  && __builtin_cpu_supports("ssse3");
> }
>
> f:
> movl__cpu_model+12(%rip), %edx
> movl%edx, %eax
> shrl$6, %eax
> andl$1, %eax
> andl$4, %edx
> movl$0, %edx
> cmove   %edx, %eax
> ret
>
> The problem was caused by the fact that internally every invocation of
> __builtin_cpu_supports built a new variable __cpu_model and a new type
> __processor_model.  Because of this, GIMPLE level optimizers weren't able
> to CSE the loads of __cpu_model and optimize bit-operations properly.
>
> Improve GCC -O2 code generation by caching __cpu_model and __cpu_features2
> variables as well as their types:
>
> f:
> movl__cpu_model+12(%rip), %eax
> andl$68, %eax
> cmpl$68, %eax
> sete%al
> ret
>
> gcc/ChangeLog:
>
> 2021-05-05  Ivan Sorokin 
> H.J. Lu 
>
> PR target/91400
> * config/i386/i386-builtins.c (ix86_cpu_model_type_node): New.
> (ix86_cpu_model_var): Likewise.
> (ix86_cpu_features2_type_node): Likewise.
> (ix86_cpu_features2_var): Likewise.
> (fold_builtin_cpu): Cache __cpu_model and __cpu_features2 with
> their types.
>
> gcc/testsuite/Changelog:
>
> 2021-05-05  Ivan Sorokin 
> H.J. Lu 
>
> PR target/91400
> * gcc.target/i386/pr91400-1.c: New test.
> * gcc.target/i386/pr91400-2.c: Likewise.
> ---
>  gcc/config/i386/i386-builtins.c   | 52 +++
>  gcc/testsuite/gcc.target/i386/pr91400-1.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr91400-2.c | 14 ++
>  3 files changed, 63 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-2.c
>
> diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
> index b66911082ab..8036aedebac 100644
> --- a/gcc/config/i386/i386-builtins.c
> +++ b/gcc/config/i386/i386-builtins.c
> @@ -2103,6 +2103,11 @@ make_var_decl (tree type, const char *name)
>return new_decl;
>  }
>
> +static GTY(()) tree ix86_cpu_model_type_node;
> +static GTY(()) tree ix86_cpu_model_var;
> +static GTY(()) tree ix86_cpu_features2_type_node;
> +static GTY(()) tree ix86_cpu_features2_var;
> +
>  /* FNDECL is a __builtin_cpu_is or a __builtin_cpu_supports call that is 
> folded
> into an integer defined in libgcc/config/i386/cpuinfo.c */
>
> @@ -2114,12 +2119,16 @@ fold_builtin_cpu (tree fndecl, tree *args)
>  = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>tree param_string_cst = NULL;
>
> -  tree __processor_model_type = build_processor_model_struct ();
> -  tree __cpu_model_var = make_var_decl (__processor_model_type,
> -   "__cpu_model");
> -
> -
> -  varpool_node::add (__cpu_model_var);
> +  if (ix86_cpu_model_var == nullptr)
> +{
> +  /* Build a single __cpu_model variable for all references to
> +__cpu_model so that GIMPLE level optimizers can CSE the loads
> +of __cpu_model and optimize bit-operations 

[PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 19, 2021 at 04:10:33PM +, Richard Sandiford via Gcc-patches 
wrote:
> Ah, ok, thanks for the extra context.
> 
> So AIUI the problem when recording xmm2<-di isn't just:
> 
>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> 
> but also that:
> 
>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> 
> For example, all registers in this sequence can be part of the same chain:
> 
> (set (reg:HI R1) (reg:HI R0))
> (set (reg:SI R2) (reg:SI R1)) // [A]
> (set (reg:DI R3) (reg:DI R2)) // [A]
> (set (reg:SI R4) (reg:SI R[0-3]))
> (set (reg:HI R5) (reg:HI R[0-4]))
> 
> But:
> 
> (set (reg:SI R1) (reg:SI R0))
> (set (reg:HI R2) (reg:HI R1))
> (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> 
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
> 
> When this happens, I guess we have two choices:
> 
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
> (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
> 
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
> 
> I think we should test [B] as well as [A] though.  The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate “if” or “else if”,
> with its own comment.

Unfortunately, we now have a testcase showing that testing [B] as well
is a problem (now latent on trunk; it only reproduces on the 10 and 11
branches).

The comment in the patch tries to list just the interesting instructions:
we have a 64-bit value, copy the low 8 bits of it to another register,
copy the full 64 bits to another register and then clobber the original
register.  Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp; that instruction drops the DI r14 from that chain, so
we have QI si, DI bp, with si being the oldest_regno.
Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
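
Condensed, the sequence described above looks like this (a sketch, not
literal RTL from the testcase):

(set (reg:DI r14) (const_int ...))  ;; chain {DI r14, QI si, DI bp};
                                    ;; r14 is dropped -> {QI si, DI bp}
(set (reg:DI dx) (reg:DI si))       ;; only the low 8 bits of si are
                                    ;; defined, yet dx is recorded in
                                    ;; DImode at the chain's end  [*]
;; ... si is overwritten -> chain {DI bp, DI dx} ...
(set (reg:DI dx) (reg:DI bp))       ;; wrongly removed as redundant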

The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
   && partial_subreg_p (vd->e[sr].mode,
vd->e[vd->e[sr].oldest_regno].mode))
return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, which register is the oldest_regno in a chain
changes over time, so relying on it for anything is problematic: a register
can have one oldest_regno and later on get a different oldest_regno
(perhaps with a different mode) because the old oldest_regno got
overwritten, and it can change both ways.

I wrote the following patch (originally against 10 branch because that is
where Uros has been debugging it) and bootstrapped/regtested it on 11
branch successfully.
It effectively implements your (2) above; I'm not sure if
REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in
find_oldest_value_reg -> maybe_mode_change -> mode_change_ok.

So perhaps just the vd->e[dr].mode in there could change to
GET_MODE (src) and drop the previous PR98694 change?
If yes, what to do with the previously added comment?

2021-05-05  Jakub Jelinek  

PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded destination
mode too.

* gcc.target/i386/pr100342.c: New test.

--- gcc/regcprop.c.jj   2020-04-30 17:41:37.624675304 +0200
+++ gcc/regcprop.c  2021-05-05 16:24:01.667308941 +0200
@@ -358,6 +358,22 @@ copy_value (rtx dest, rtx src, struct va
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
 return;
 
+  /* If a narrower value is copied using wider mode, the upper bits
+ are undefined (could be e.g. a former paradoxical subreg).  Signal
+ in that case we've only copied value using the narrower mode.
+ Consider:
+ (set (reg:DI r14) (mem:DI ...))
+ (set (reg:QI si) (reg:QI r14))
+ (set (reg:DI bp) 

FW: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-05 Thread Tamar Christina via Gcc-patches
Forgot to CC maintainers..

-Original Message-
From: Tamar Christina  
Sent: Wednesday, May 5, 2021 6:39 PM
To: gcc-patches@gcc.gnu.org
Cc: nd 
Subject: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot 
for NEON. 

Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res, SIGNEDNESS_3 
char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
  int av = a[i];
  int bv = b[i];
  SIGNEDNESS_2 short mult = av * bv;
  res += mult;
}
  return res;
}

Generates

f:
vmov.i32q8, #0  @ v4si
add r3, r2, #480
.L2:
vld1.8  {q10}, [r2]!
vld1.8  {q9}, [r1]!
vusdot.s8   q8, q9, q10
cmp r3, r2
bne .L2
vadd.i32d16, d16, d17
vpadd.i32   d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx  lr

instead of

f:
vmov.i32q8, #0  @ v4si
add r3, r2, #480
.L2:
vld1.8  {q9}, [r2]!
vld1.8  {q11}, [r1]!
cmp r3, r2
vmull.s8 q10, d18, d22
vmull.s8 q9, d19, d23
vaddw.s16   q8, q8, d20
vaddw.s16   q8, q8, d21
vaddw.s16   q8, q8, d18
vaddw.s16   q8, q8, d19
bne .L2
vadd.i32d16, d16, d17
vpadd.i32   d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx  lr

For NEON.  I couldn't figure out whether the MVE instruction vmlaldav.s16
could be used to emulate this.  Because it would require additional widening
to work, I left MVE out of this patch set, but perhaps someone should take a
look.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/simd/vusdot-autovec.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 
fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3075,6 +3075,24 @@ (define_expand "dot_prod"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod"
+  [(set (match_operand:VCVTI 0 "register_operand")
+   (plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+   "register_operand")
+  (match_operand:<VSI2QI> 2
+   "register_operand")]
+UNSPEC_DOT_US)
+   (match_operand:VCVTI 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+   operands[2]));
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  DONE;
+})
+
 (define_expand "neon_copysignf"
   [(match_operand:VCVTF 0 "register_operand")
(match_operand:VCVTF 1 "register_operand") diff --git 
a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c 
b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
new file mode 100644
index 
..7cc56f68817d77d6950df0ab372d6fbaad6b3813
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res, 
+SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+{
+  int av = a[i];
+  int bv = b[i];
+  SIGNEDNESS_2 short mult = av * bv;
+  res += mult;
+}
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa)) g (SIGNEDNESS_1 int res, 
+SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+{
+  int av = a[i];
+  int bv = b[i];
+  SIGNEDNESS_2 short mult = av * bv;
+  res += mult;
+}
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { 
+arm-*-*-gnueabihf } } } } */


[PATCH 4/4]middle-end: Add tests middle end generic tests for sign differing dotproduct.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

This adds testcases for auto-vectorization detection of the new
sign-differing dot product.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (arm_v8_2a_i8mm_neon_hw): Document.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_v8_2a_i8mm_neon_ok_nocache,
check_effective_target_arm_v8_2a_i8mm_neon_hw,
check_effective_target_vect_usdot_qi): New.
* gcc.dg/vect/vect-reduc-dot-10.c: New test.
* gcc.dg/vect/vect-reduc-dot-11.c: New test.
* gcc.dg/vect/vect-reduc-dot-12.c: New test.
* gcc.dg/vect/vect-reduc-dot-13.c: New test.
* gcc.dg/vect/vect-reduc-dot-14.c: New test.
* gcc.dg/vect/vect-reduc-dot-15.c: New test.
* gcc.dg/vect/vect-reduc-dot-16.c: New test.
* gcc.dg/vect/vect-reduc-dot-9.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 
b0001247795947c9dcab1a14884ecd585976dfdd..0034ac9d86b26e6674d71090b9d04b6148f99e17
 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1672,6 +1672,10 @@ Target supports a vector dot-product of @code{signed 
char}.
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
+@item vect_usdot_qi
+Target supports a vector dot-product where one operand of the multiply is
+@code{signed char} and the other of @code{unsigned char}.
+
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
@@ -1947,6 +1951,11 @@ ARM target supports executing instructions from 
ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_2a_i8mm_neon_hw
+ARM target supports executing instructions from ARMv8.2-A with the 8-bit
+Matrix Multiply extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_i8mm_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
new file mode 100644
index 
..7ce86965ea97d37c43d96b4d2271df667dcb2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target 
vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
new file mode 100644
index 
..0f7cbbb87ef028f166366aea55bc4ef49d2f8e9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-11.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 unsigned
+#define SIGNEDNESS_4 signed
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target 
vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
new file mode 100644
index 
..08412614fc67045d3067b5b55ba032d297595237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-12.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_i8mm }  */
+
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+#include "vect-reduc-dot-9.c"
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target 
vect_usdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
new file mode 100644
index 
..7ee0f45f64296442204ee13d5f880f4b7716fb85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-13.c
@@ -0,0 +1,13 @@
+/* { 

[PATCH 2/4]AArch64: Add support for sign differing dot-product usdot for NEON and SVE.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
  int av = a[i];
  int bv = b[i];
  SIGNEDNESS_2 short mult = av * bv;
  res += mult;
}
  return res;
}

Generates for NEON

f:
moviv0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q1, [x2, x3]
ldr q2, [x1, x3]
usdot   v0.4s, v1.16b, v2.16b
add x3, x3, 16
cmp x3, 480
bne .L2
addvs0, v0.4s
fmovw1, s0
add w0, w0, w1
ret

and for SVE

f:
mov x3, 0
cntbx5
mov w4, 480
mov z1.b, #0
whilelo p0.b, wzr, w4
mov z3.b, #0
ptrue   p1.b, all
.p2align 3,,7
.L2:
ld1bz2.b, p0/z, [x1, x3]
ld1bz0.b, p0/z, [x2, x3]
add x3, x3, x5
sel z0.b, p0, z0.b, z3.b
whilelo p0.b, w3, w4
usdot   z1.s, z0.b, z2.b
b.any   .L2
uaddv   d0, p1, z1.s
fmovx1, d0
add w0, w0, w1
ret

instead of

f:
moviv0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q2, [x1, x3]
ldr q1, [x2, x3]
add x3, x3, 16
sxtlv4.8h, v2.8b
sxtl2   v3.8h, v2.16b
uxtlv2.8h, v1.8b
uxtl2   v1.8h, v1.16b
mul v2.8h, v2.8h, v4.8h
mul v1.8h, v1.8h, v3.8h
saddw   v0.4s, v0.4s, v2.4h
saddw2  v0.4s, v0.4s, v2.8h
saddw   v0.4s, v0.4s, v1.4h
saddw2  v0.4s, v0.4s, v1.8h
cmp x3, 480
bne .L2
addvs0, v0.4s
fmovw1, s0
add w0, w0, w1
ret

and

f:
mov x3, 0
cnthx5
mov w4, 480
mov z1.b, #0
whilelo p0.h, wzr, w4
ptrue   p2.b, all
.p2align 3,,7
.L2:
ld1sb   z2.h, p0/z, [x1, x3]
punpklo p1.h, p0.b
ld1bz0.h, p0/z, [x2, x3]
add x3, x3, x5
mul z0.h, p2/m, z0.h, z2.h
sunpklo z2.s, z0.h
sunpkhi z0.s, z0.h
add z1.s, p1/m, z1.s, z2.s
punpkhi p1.h, p0.b
whilelo p0.h, w3, w4
add z1.s, p1/m, z1.s, z0.s
b.any   .L2
uaddv   d0, p2, z1.s
fmovx1, d0
add w0, w0, w1
ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): New.
* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
Rename to...
(@<sur>dot_prod<vsi2qi>): ...This.
* config/aarch64/aarch64-sve-builtins-base.cc
(svusdot_impl::expand): Use it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
* gcc.target/aarch64/sve/vusdot-autovec.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
4edee99051c4e2112b546becca47da32aae21df2..c9fb8e702732dd311fb10de17126432e2a63a32b
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -648,6 +648,22 @@ (define_expand "dot_prod"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod"
+  [(set (match_operand:VS 0 "register_operand")
+   (plus:VS (unspec:VS [(match_operand: 1 "register_operand")
+   (match_operand: 2 "register_operand")]
+UNSPEC_USDOT)
+   (match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+gen_aarch64_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+  operands[2]));
+  emit_move_insn (operands[0], operands[3]);
+  DONE;
+})
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_dot_lane"
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 
dfdf0e2fd186389cbddcff51ef52f8778d7fdb24..50adcd5404e97e610485140fdbfe4c8ebbf2f602
 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2366,7 +2366,7 @@ public:
Hence we do the same rotation on arguments as svdot_impl does.  */
 e.rotate_inputs_left (0, 3);
 machine_mode mode = e.vector_mode (0);
-insn_code icode = code_for_aarch64_dot_prod (UNSPEC_USDOT, mode);
+insn_code icode = code_for_dot_prod (UNSPEC_USDOT, mode);
 return e.use_exact_insn (icode);
   }
 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md

[PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

This adds optabs implementing usdot_prod.

The following testcase:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
  int av = a[i];
  int bv = b[i];
  SIGNEDNESS_2 short mult = av * bv;
  res += mult;
}
  return res;
}

Generates

f:
vmov.i32q8, #0  @ v4si
add r3, r2, #480
.L2:
vld1.8  {q10}, [r2]!
vld1.8  {q9}, [r1]!
vusdot.s8   q8, q9, q10
cmp r3, r2
bne .L2
vadd.i32d16, d16, d17
vpadd.i32   d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx  lr

instead of

f:
vmov.i32q8, #0  @ v4si
add r3, r2, #480
.L2:
vld1.8  {q9}, [r2]!
vld1.8  {q11}, [r1]!
cmp r3, r2
vmull.s8 q10, d18, d22
vmull.s8 q9, d19, d23
vaddw.s16   q8, q8, d20
vaddw.s16   q8, q8, d21
vaddw.s16   q8, q8, d18
vaddw.s16   q8, q8, d19
bne .L2
vadd.i32d16, d16, d17
vpadd.i32   d16, d16, d16
vmov.32 r3, d16[0]
add r0, r0, r3
bx  lr

For NEON.  I couldn't figure out whether the MVE instruction vmlaldav.s16
could be used to emulate this.  Because it would require additional widening
to work, I left MVE out of this patch set, but perhaps someone should take a
look.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (usdot_prod<vsi2qi>): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/simd/vusdot-autovec.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3075,6 +3075,24 @@ (define_expand "dot_prod"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod"
+  [(set (match_operand:VCVTI 0 "register_operand")
+   (plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+   "register_operand")
+  (match_operand:<VSI2QI> 2
+   "register_operand")]
+UNSPEC_DOT_US)
+   (match_operand:VCVTI 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+gen_neon_usdot<vsi2qi> (operands[3], operands[3], operands[1],
+   operands[2]));
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  DONE;
+})
+
 (define_expand "neon_copysignf"
   [(match_operand:VCVTF 0 "register_operand")
(match_operand:VCVTF 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c 
b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
new file mode 100644
index 
..7cc56f68817d77d6950df0ab372d6fbaad6b3813
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+{
+  int av = a[i];
+  int bv = b[i];
+  SIGNEDNESS_2 short mult = av * bv;
+  res += mult;
+}
+  return res;
+}
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
+   SIGNEDNESS_4 char *restrict a)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+{
+  int av = a[i];
+  int bv = b[i];
+  SIGNEDNESS_2 short mult = av * bv;
+  res += mult;
+}
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { 
arm-*-*-gnueabihf } } } } */


-- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3075,6 +3075,24 @@ (define_expand "dot_prod"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot
+(define_expand "usdot_prod"
+  [(set (match_operand:VCVTI 0 "register_operand")
+	(plus:VCVTI (unspec:VCVTI [(match_operand: 1
+			"register_operand")
+   (match_operand: 2
+			"register_operand")]
+		 UNSPEC_DOT_US)
+		(match_operand:VCVTI 3 "register_operand")))]
+  "TARGET_I8MM"
+{
+  emit_insn (
+gen_neon_usdot (operands[3], operands[3], operands[1],

[PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds support for a dot product where the signs of the
multiplication arguments differ, i.e. one is signed and one is unsigned
but the precisions are the same.

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
  int av = a[i];
  int bv = b[i];
  SIGNEDNESS_2 short mult = av * bv;
  res += mult;
}
  return res;
}

The operations are performed as if the operands were extended to a 32-bit
value. As such, this operation isn't valid if there is an intermediate
conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
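
In scalar terms the restriction can be illustrated with a small sketch
(a hypothetical function, not from the patch):

int
f (int res, signed char sc, unsigned char uc)
{
  /* Valid for usdot_prod: the sum behaves as if each product were
     computed in 32-bit arithmetic (sc * uc always fits in a short).  */
  res += (signed short) (sc * uc);

  /* Not valid: an intermediate conversion to an unsigned type wraps
     modulo 2^16, which a 32-bit dot product cannot reproduce:
       res += (unsigned short) (sc * uc);  */
  return res;
}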

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the
same optab is used but the operands are flipped in the optab expansion.

To support this, the patch extends the dot-product detection to optionally
ignore operands with differing signs and stores this information in the
optab subtype, which is now made a bitfield.

The subtype now additionally controls which optab an EXPR can expand to.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs-tree.h (enum optab_subtype): Likewise.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vect_determine_dot_kind): New.
(vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
mismatched types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an 
additional mask operand
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
 @itemx @samp{udot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@itemx @samp{usdot_prod@var{m}}
 Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+Operand 1 and operand 2 are of the same mode but may differ in sign.  Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index 
c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d
 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
shift amount vs. machines that take a vector for the shift amount.  */
 enum optab_subtype
 {
-  optab_default,
-  optab_scalar,
-  optab_vector
+  optab_default = 1 << 0,
+  optab_scalar = 1 << 1,
+  optab_vector = 1 << 2,
+  optab_signed_to_unsigned = 1 << 3,
+  optab_unsigned_to_signed = 1 << 4
 };
 
+/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype&
+operator |= (enum optab_subtype& a, enum optab_subtype b)
+{
+return a = static_cast<enum optab_subtype> (static_cast<int> (a)
+ | static_cast<int> (b));
+}
+
+/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype
+operator | (enum optab_subtype a, enum optab_subtype b)
+{
+return static_cast<enum optab_subtype> (static_cast<int> (a)
+ | static_cast<int> (b));
+}
+
 /* Return the optab used for computing the given operation on the type given by
the second argument.  The third argument distinguishes between the types of
vector shifts and rotates.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 
95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072
 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
   return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
 case DOT_PROD_EXPR:
-  

[PATCH] Vect: Remove restrictions on dotprod signedness

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

There's no reason that the signs of the operands of a dot-product all have to
be the same.  The only real restriction is that the signs of the multiplicands
are the same; the sign of the accumulator need not match them.

The type of the overall operation should be determined by the sign of the
multiplicands, which is already being done by optabs-tree.c.
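
For instance, a kernel like the following (an illustrative example in the
style of the vect-reduc-dot testcases, not taken from the patch) has signed
multiplicands but an unsigned accumulator and can now be recognized:

unsigned int
dot (signed char *restrict a, signed char *restrict b, unsigned int res)
{
  for (int i = 0; i < 480; ++i)
    res += (signed short) (a[i] * b[i]);
  return res;
}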

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Remove sign check.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-reduc-dot-2.c: Expect to pass.
* gcc.dg/vect/vect-reduc-dot-3.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-6.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-7.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-2.c
index 
25757d2b6713b53a325979b96f89396dbf4675b8..2ebe98887a6072b9e674846af1df38cdc94258dd
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-2.c
@@ -6,5 +6,5 @@
 
 #include "vect-reduc-dot-1.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" 
"vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-3.c
index 
b1deb64e186da99ef42cb687d107445c0b800bd8..6a6679d522350ab4c19836f5537119122f0e654e
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-3.c
@@ -6,5 +6,5 @@
 
 #include "vect-reduc-dot-1.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" 
"vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-6.c
index 
b690c9f2eb18b34f4b147d779bb3da582e285399..0cd4b823643bd4fadd529b2fe4e1d664aa1159ad
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-6.c
@@ -6,5 +6,5 @@
 
 #include "vect-reduc-dot-1.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" 
"vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-7.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-7.c
index 
29e442e8bbf7176cf861518dc171a83d82967764..eefee2e2ca27d749cd3af2238723aeae4e60a429
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-7.c
@@ -6,5 +6,5 @@
 
 #include "vect-reduc-dot-1.c"
 
-/* { dg-final { scan-tree-dump-not "vect_recog_dot_prod_pattern: detected" 
"vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
 
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 
803de3fc287371fa202610a55b17e2c8934672f3..441d6cd28c4eaded7abd756164890dbcffd2f3b8
 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -946,7 +946,8 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
  In which
  - DX is double the size of X
  - DY is double the size of Y
- - DX, DY, DPROD all have the same type
+ - DX, DY, DPROD all have the same type but the sign
+   between DX, DY and DPROD can differ.
  - sum is the same size of DPROD or bigger
  - sum has been recognized as a reduction variable.
 
@@ -988,12 +989,6 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 false, 2, unprom0, _type))
 return NULL;
 
-  /* If there are two widening operations, make sure they agree on
- the sign of the extension.  */
-  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
-  && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
-return NULL;
-
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;


-- 

[PATCH]AArch64: Have -mcpu=native and -march=native enable extensions when CPU is unknown

2021-05-05 Thread Tamar Christina via Gcc-patches
Hi All,

Currently, when using -mcpu=native or -march=native on a CPU that is unknown to
the compiler, the compiler just uses -march=armv8-a and enables none of the
extensions.

To make this a bit more useful, this patch changes it to still use
-march=armv8-a but to enable the extensions.  We still cannot do tuning, but at
least when using this on a future SVE core the compiler will enable SVE etc.
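
As an illustration, on such an unknown CPU the driver would now pick
something like the following (hypothetical invocation; the exact extension
string depends on what /proc/cpuinfo reports):

  $ gcc -mcpu=native -E -v - < /dev/null 2>&1 | grep cc1
  ... -march=armv8-a+crypto+crc+dotprod+sve2 ...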

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/driver-aarch64.c (DEFAULT_ARCH): New.
(host_detect_local_cpu): Use it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_16: New test.
* gcc.target/aarch64/cpunative/info_17: New test.
* gcc.target/aarch64/cpunative/native_cpu_16.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_17.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/driver-aarch64.c 
b/gcc/config/aarch64/driver-aarch64.c
index 
e2935a1156412c898ea086feb0d698ec92107652..b58591d497461cae6e8014fa39afd9dd26ae67bf
 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -58,6 +58,8 @@ struct aarch64_core_data
 #define INVALID_IMP ((unsigned char) -1)
 #define INVALID_CORE ((unsigned)-1)
 #define ALL_VARIANTS ((unsigned)-1)
+/* Default architecture to use if -mcpu=native did not detect a known CPU.  */
+#define DEFAULT_ARCH "8A"
 
 #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
PART, VARIANT) \
   { CORE_NAME, #ARCH, IMP, PART, VARIANT, FLAGS },
@@ -390,10 +392,18 @@ host_detect_local_cpu (int argc, const char **argv)
 && (aarch64_cpu_data[i].variant == ALL_VARIANTS
 || variants[0] == aarch64_cpu_data[i].variant))
  break;
+
   if (aarch64_cpu_data[i].name == NULL)
-goto not_found;
+   {
+ aarch64_arch_driver_info* arch_info
+   = get_arch_from_id (DEFAULT_ARCH);
+
+ gcc_assert (arch_info);
 
-  if (arch)
+ res = concat ("-march=", arch_info->name, NULL);
+ default_flags = arch_info->flags;
+   }
+  else if (arch)
{
  const char *arch_id = aarch64_cpu_data[i].arch;
  aarch64_arch_driver_info* arch_info = get_arch_from_id (arch_id);
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
new file mode 100644
index 
..b0679579d9167d46c832e55cb63d9077f7a80f70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
+CPU implementer: 0xff
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
new file mode 100644
index 
..b0679579d9167d46c832e55cb63d9077f7a80f70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
+CPU implementer: 0xff
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
new file mode 100644
index 
..a424e7c56c782ca6e6917248e2fa7a18eb94e06a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO 
"$srcdir/gcc.target/aarch64/cpunative/info_16" } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+crc\+dotprod\+sve2} } 
} */
+
+/* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
new file mode 100644
index 
..8104761be927275207318a834f03041b627856b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO 
"$srcdir/gcc.target/aarch64/cpunative/info_16" } */
+/* { dg-additional-options "-march=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+crc\+dotprod\+sve2} } 
} */
+
+/* Test a normal looking procinfo.  */


-- 

Re: [PATCH] -Walloca-larger-than with constant sizes at -O0 (PR 100425)

2021-05-05 Thread Martin Sebor via Gcc-patches

On 5/5/21 1:32 AM, Richard Biener wrote:

On Wed, May 5, 2021 at 4:20 AM Martin Sebor via Gcc-patches
 wrote:


Even when explicitly enabled, -Walloca-larger-than doesn't run
unless optimization is enabled as well.  This prevents diagnosing
alloca calls with constant arguments in excess of the limit that
could otherwise be flagged even at -O0, making the warning less
consistent and less useful than is possible.

The attached patch enables -Walloca-larger-than for calls with
constant arguments in excess of the limit even at -O0 (variable
arguments are only handled with optimization, when VRP runs).


Hmm, but then the pass runs even without -Walloca or -Walloca-larger-than
or -Wvla[-larger-than].  It performs an IL walk we should avoid in those
cases.

So the patch is OK but can you please come up with a gate that disables
the pass when all of the warnings it handles won't fire anyway?


-W{alloca,vla}-larger-than=PTRDIFF_MAX are enabled by default, so
the pass needs to do the walk.

FWIW, it would make sense to me to consolidate all the checking of
calls for arguments with excessive sizes/values into the same pass
and single walk (with code still in separate source files).  As it
is, some are done in their own passes (like alloca and sprintf),
and others during expansion (-Wstringop-overflow), and others in
calls.c (-Walloc-size-larger-than).  That leads to repetitive code
and inconsistent approaches and inconsistent false positives and
negatives (because some are done at -O0 but others require
optimization).
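
For reference, this is the kind of case that can be diagnosed without
optimization (hypothetical example, compiled with -O0
-Walloca-larger-than=65536):

  void *f (void)
  {
    return __builtin_alloca (1 << 20);  /* constant size above the limit */
  }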

Martin



Thanks,
Richard.


Martin




Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-05 Thread Qing Zhao via Gcc-patches
> 
>> 
>>> @@ -11950,6 +12088,72 @@ lower_bound_in_type (tree outer, tree inner)
>>>}
>>> }
>>> 
>>> +/* Returns true when the given TYPE has padding inside it.
>>> +   return false otherwise.  */
>>> +bool
>>> +type_has_padding (tree type)
>> 
>> Would it be possible to reuse __builtin_clear_padding here?
> 
> Not sure, where can I get more details on __builtin_clear_padding? I can 
> study a little bit more on this to make sure this.

After some study, my understanding is that the call to __builtin_clear_padding
is expanded during the gimplification phase,
and there is no __builtin_clear_padding expansion during the RTX expansion phase.

Is the above understanding correct? 

If so, for -ftrivial-auto-var-init, padding initialization should be done both
in the gimplification phase and in the RTX expansion phase.
And since __builtin_clear_padding might not be usable during RTX expansion,
reusing __builtin_clear_padding might not work.
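
For reference, a minimal sketch of how the builtin is used today (the
struct is just an example):

  struct S { char c; /* typically 3 bytes of padding here */ int i; };

  void f (void)
  {
    struct S s = { 0, 0 };
    __builtin_clear_padding (&s);  /* zeroes only the padding bytes of s */
  }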

Let me know if I misunderstand something.

Qing

>> 



Re: [PATCH] testsuite: Add vect_floatint_cvt to gcc.dg/vect/pr56541.c

2021-05-05 Thread Jeff Law via Gcc-patches



On 5/5/2021 8:31 AM, Robin Dapp via Gcc-patches wrote:

Hi,

pr56541.c implicitly converts a float vector to an int (bool) vector:

 rMin = (rMax>0) ? rMin : rBig;

It fails on some s390 targets because they do not support converting 
from vector float to int.  Is adding a vect_floatint_cvt as in the 
attached patch the OK thing to do?


Or better an xfail with ! vect_floatint_cvt?

Regards
 Robin

gcc/testsuite/ChangeLog:

    * gcc.dg/vect/pr56541.c: Add vect_floatint_cvt.


OK.  I'd tend to use XFAIL for a compiler bug that we haven't fixed.  In 
this case the target doesn't support what the test is trying to do.  So 
skipping the test in one manner or another seems better.



jeff




Re: [PATCH] testsuite: Add s390 to vect_*_cvt checks

2021-05-05 Thread Jeff Law via Gcc-patches



On 5/5/2021 8:39 AM, Robin Dapp via Gcc-patches wrote:

Hi,

this patch adds some s390 checks for vect_*_cvts. Is it OK?

Regards
 Robin

gcc/testsuite/ChangeLog:

    * lib/target-supports.exp: Add s390 checks for vect conversions.


OK

jeff




Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches



On 05/05/2021 13:34, Richard Biener wrote:

On Wed, 5 May 2021, Andre Vieira (lists) wrote:


I tried to see what IVOPTs would make of this, and it is able to analyze the
IVs but doesn't realize (not even sure it tries) that one IV's end (loop 1)
could be used as the base for the other (loop 2).  I don't know if this is
where you'd want such optimizations to be made; on one hand I think it would
be great, as it would also help with non-vectorized loops as you alluded to.

Hmm, OK.  So there's the first loop that has a looparound jump and thus
we do not always enter the 2nd loop with the first loop final value of the
IV.  But yes, IVOPTs does not try to allocate IVs across multiple loops.
And for a followup transform to catch this it would need to compute
the final value of the IV and then match this up with the initial
value computation.  I suppose FRE could be taught to do this, at
least for very simple cases.
I will admit I am not at all familiar with how FRE works; I know it exists 
mostly because running it often breaks my vector patches :P 
But that's about all I know.
I will have a look and see if it makes sense from my perspective to 
address it there, because ...



Anyway I diverge. Back to the main question of this patch. How do you suggest
I go about this? Is there a way to make IVOPTS aware of the 'iterate-once' IVs
in the epilogue(s) (both vector and scalar!) and then teach it to merge IV's
if one ends where the other begins?

I don't think we will make that work easily.  So indeed attacking this
in the vectorizer sounds most promising.


The problem I found with this approach is that it only tackles the 
vectorized epilogues, and that leads to regressions.  I don't 
have the example at hand, but what I saw happening was that 
increased register pressure led to a spill in the hot path.  I believe 
this was caused by the epilogue loop using the updated pointers as the 
base for their DRs; in this case there were three DRs (2 loads, one 
store), but the scalar epilogue was still using the original base + niters, 
since this data_reference approach only changes the vectorized epilogues.




  I'll note there's also
the issue of epilogue vectorization and reductions where we seem
to not re-use partially reduced reduction vectors but instead
reduce to a scalar in each step.  That's a related issue - we're
not able to carry forward a (reduction) IV we generated for the
main vector loop to the epilogue loops.  Like for

double foo (double *a, int n)
{
   double sum = 0.;
   for (int i = 0; i < n; ++i)
 sum += a[i];
   return sum;
}

with AVX512 we get three reductions to scalars instead of
a partial reduction from zmm to ymm before the first vectorized
epilogue followed by a reduction from ymm to xmm before the second
(the jump around for the epilogues needs to jump to the further
reduction piece obviously).
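
Roughly the shape one would hope for (a hypothetical AVX-512 sketch, not
actual generated code):

  ; main loop:     accumulate in zmm0
  ; 1st epilogue:  vextractf64x4 ymm1, zmm0, 1
  ;                vaddpd ymm0, ymm0, ymm1     ; partial zmm->ymm reduction
  ;                continue accumulating in ymm0
  ; 2nd epilogue:  vextractf128 xmm1, ymm0, 1
  ;                vaddpd xmm0, xmm0, xmm1     ; partial ymm->xmm reduction
  ;                single horizontal add at the very end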

So I think we want to record IVs we generate (the reduction IVs
are already nicely associated with the stmt-infos), one might
consider to refer to them from the dr_vec_info for example.

It's just going to be "interesting" to wire everything up
correctly with all the jump-arounds we have ...
I have a downstream hack for the reductions, but it only worked for 
partial-vector-usage, as there you have the guarantee it's the same 
vector mode, so you don't need to faff around with half and full 
vectors.  Obviously what you are suggesting has much wider applications, 
and not surprisingly Richard Sandiford also pointed out to me 
that these are somewhat related and we might be able to reuse the 
IV-creation to manage it all.  But I feel like I am currently light years 
away from that.


I had started to look at removing the data_reference updating we have 
now and dealing with this in the 'create_iv' calls from 
'vect_create_data_ref_ptr' inside 'vectorizable_{load,store}' but then I 
thought it would be good to discuss it with you first. This will require 
keeping track of the 'end-value' of the IV, which for loops where we can 
skip the previous loop means we will need to construct a phi-node 
containing the updated pointer and the initial base. But I'm not 
entirely sure where to keep track of all this. Also I don't know if I 
can replace the base address of the data_reference right there at the 
'create_iv' call, can a data_reference be used multiple times in the 
same loop?


I'll go do a bit more nosing around this idea and the ivmap you 
mentioned before. Let me know if you have any ideas on how this all 
should look like, even if its a 'in an ideal world'.


Andre



On 04/05/2021 10:56, Richard Biener wrote:

On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:


Hi,

The aim of this RFC is to explore a way of cleaning up the codegen around
data_references.  To be specific, I'd like to reuse the main-loop's updated
data_reference as the base_address for the epilogue's corresponding
data_reference, rather than use the niters.  We have found this leads to
better 

Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 01:45:29PM +0200, Marc Glisse wrote:
> On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:
> 
> > 2) the pr94589-2.C testcase should be matching just 12 times each, but runs
> > into operator>=(strong_ordering, unspecified) being defined as
> > (_M_value&1)==_M_value
> > rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> > unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
> > but is not a single use but two uses.  I'll need to pattern match that case
> > specially.
> 
> Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> that could be worth doing already in GIMPLE.

Apparently it is
  /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
 constant folding if x/y is a constant.  */
  if ((code == EQ || code == NE)
  && (op0code == AND || op0code == IOR)
  && !side_effects_p (op1)
  && op1 != CONST0_RTX (cmp_mode))
{
  /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
 (eq/ne (and (not y) x) 0).  */
...
  /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
 (eq/ne (and (not x) y) 0).  */
Yes, doing that on GIMPLE for the case where the not argument is constant
would simplify the phiopt follow-up (it would be a single imm use then).
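
For instance, a minimal match.pd sketch of that (hypothetical, not a
committed pattern) would fold (X & C) == X into (X & ~C) == 0 when C is a
constant, leaving a single immediate use of X:

  (for cmp (eq ne)
   (simplify
    (cmp (bit_and @0 INTEGER_CST@1) @0)
    (cmp (bit_and @0 (bit_not @1)) { build_zero_cst (TREE_TYPE (@0)); })))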

Jakub



[vect] Support min/max + index pattern

2021-05-05 Thread Joel Hutton via Gcc-patches
Hi all,

Looking for some feedback on this.  One thing I would like to draw attention to 
is the fact that this pattern requires 2 separate dependent reductions in the 
epilogue.
The accumulator vector for the maximum/minimum elements can be reduced to a 
scalar result trivially with a min/max, but getting the index from the 
accumulator vector of indices is more complex: it requires using the position 
of the maximum/minimum scalar result value within the accumulator vector to 
create a mask, as sketched below.
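
In rough C-like pseudocode (the helper names are hypothetical, for
illustration only):

  int best = horizontal_min (vmin);        /* trivial min reduction */
  vmask m = (vmin == splat (best));        /* lanes that hold the min */
  vidx cand = select (m, vindices, splat (INT_MAX));
  int best_i = horizontal_min (cand);      /* smallest matching index */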

The given solution works but it's slightly messy. 
vect_create_epilog_for_reduction creates the epilogue for one vectorized 
scalar stmt at a time.  This modification makes one
invocation create the epilogue for both related stmts and marks the other as 
'done'.  Alternate suggestions are welcome.

Joel

[vect] Support min/max + index pattern

Add the capability to vect-loop to support the following pattern.

for (int i = 0; i < n; i ++)
{
if (data[i] < best)
{
best = data[i];
best_i = i;
}
}

gcc/ChangeLog:

* tree-vect-loop.c (vect_reassociating_reduction_simple_p): New function.
(vect_recog_minmax_index_pattern): New function.
(vect_is_simple_reduction): Add multi_use_reduction case.
(vect_create_epilog_for_reduction): Add minmax+index epilogue handling.


minmax.patch
Description: minmax.patch


Re: [PATCH] libstdc++: Reduce ranges::minmax/minmax_element comparison complexity

2021-05-05 Thread Patrick Palka via Gcc-patches
On Wed, 5 May 2021, Patrick Palka wrote:

> On Wed, 5 May 2021, Jonathan Wakely wrote:
> 
> > On 04/05/21 21:42 -0400, Patrick Palka via Libstdc++ wrote:
> > > This rewrites ranges::minmax and ranges::minmax_element so that it
> > > performs at most 3*N/2 many comparisons, as required by the standard.
> > > In passing, this also fixes PR100387 by avoiding a premature std::move
> > > in ranges::minmax and in std::shift_right.
> > > 
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
> > > 10/11?
> > > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > >   PR libstdc++/100387
> > >   * include/bits/ranges_algo.h (__minmax_fn::operator()): Rewrite
> > >   to limit comparison complexity to 3*N/2.  Avoid premature std::move.
> > >   (__minmax_element_fn::operator()): Likewise.
> > >   (shift_right): Avoid premature std::move of __result.
> > >   * testsuite/25_algorithms/minmax/constrained.cc (test04, test05):
> > >   New tests.
> > >   * testsuite/25_algorithms/minmax_element/constrained.cc (test02):
> > >   Likewise.
> > > ---
> > > libstdc++-v3/include/bits/ranges_algo.h   | 87 ++-
> > > .../25_algorithms/minmax/constrained.cc   | 31 +++
> > > .../minmax_element/constrained.cc | 19 
> > > 3 files changed, 113 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > > b/libstdc++-v3/include/bits/ranges_algo.h
> > > index cda3042c11f..bbd29127e89 100644
> > > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > > @@ -3291,18 +3291,39 @@ namespace ranges
> > >   auto __first = ranges::begin(__r);
> > >   auto __last = ranges::end(__r);
> > >   __glibcxx_assert(__first != __last);
> > > + auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
> > > minmax_result<range_value_t<_Range>> __result = {*__first, *__first};
> > >   while (++__first != __last)
> > > {
> > > - auto __tmp = *__first;
> > > - if (std::__invoke(__comp,
> > > -   std::__invoke(__proj, __tmp),
> > > -   std::__invoke(__proj, __result.min)))
> > > -   __result.min = std::move(__tmp);
> > > - if (!(bool)std::__invoke(__comp,
> > > -  std::__invoke(__proj, __tmp),
> > > -  std::__invoke(__proj, __result.max)))
> > > -   __result.max = std::move(__tmp);
> > > + // Process two elements at a time so that we perform at most
> > > + // 3*N/2 many comparisons in total (each of the N/2 iterations
> > 
> > Is "many" a typo here?
> 
> Just a bad habit of mine to usually write "<N> many" instead of just
> "<N>" :) Consider the "many" removed.
> 
> > 
> > > + // of this loop performs three comparisons).
> > > + auto __val1 = *__first;
> > 
> > Can we avoid making this copy if the range satisfies forward_range, by
> > keeping copies of the min/max iterators, or just forwarding to
> > ranges::minmax_element?
> 
> Maybe we can make __val1 and __val2 universal references?  Ah, but then
> __val1 would potentially be invalidated after incrementing __first.  I
> think it should be safe to make __val2 a universal reference though.
> I've done this in v2 below.
> 
> Forwarding to ranges::minmax_element seems like it would be profitable
> in some situations, e.g if the value type isn't trivially copyable.  I
> can do this in a followup patch for ranges::max/max_element and
> ranges::min/min_element too, they should all use the same heuristic.
> 
> > 
> > 
> > > + if (++__first == __last)
> > > +   {
> > > + // N is odd; in this final iteration, we perform a just one
> > 
> > s/perform a just one/perform just one/
> 
> Fixed.
> 
> > 
> > > + // comparison, for a total of 3*(N-1)/2 + 1 < 3*N/2
> > > comparisons.
> > 
> > I find this a bit hard to parse with the inequality there.
> 
> Removed.
> 
> > 
> > > + if (__comp_proj(__val1, __result.min))
> > > +   __result.min = std::move(__val1);
> > > + else if (!__comp_proj(__val1, __result.max))
> > > +   __result.max = std::move(__val1);
> > 
> > This can be two comparisons, can't it? Would this be better...
> 
> Whoops, yeah...
> 
> > 
> >   // N is odd; in this final iteration, we perform at most two
> >   // comparisons, for a total of 3*(N-1)/2 + 2 comparisons,
> >   // which is not more than 3*N/2, as required.
> > 
> > ?
> 
> Ah, but then the total is more than 3*N/2 :(  And I think we reach this
> case really when N is even, not odd (sorry, I really botched this
> patch).
> 
> And when N=2 in particular, we perform up to two comparisons instead of
> three, but actually a single comparison should suffice in this case.  I
> think all this is fixed in v2 below by handling the second element in
> the range specially.
> 
> > 
> > > + break;
> > > +   }
> > > + auto __val2 = *__first;
> > > + if (!__comp_proj(__val2, __val1))
> > > +   {
> > > + if 

Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Martin Sebor via Gcc-patches

On 5/4/21 1:44 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

genericize_spaceship genericizes i <=> j to approximately
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
for strong ordering and
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; 
else c = 2; c; })
for partial ordering.
The C++ standard then supports == or != comparisons of that against
strong/partial ordering enums, or <, <=, >, >= comparisons of the <=>
result against literal 0.

In some cases we already optimize that but in many cases we keep performing
all the 2 or 3 comparisons, compute the spaceship value and then compare
that.

The following patch recognizes those patterns if the <=> operands are
integral types or floating point (the latter only for -ffast-math) and
optimizes it to the single comparison that is needed (plus adds debug stmts
if needed for the spaceship result).
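
As a hypothetical illustration of the transform:

  #include <compare>
  bool lt (int x, int y) { return (x <=> y) < 0; }  // now compiles as x < y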

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Just a few words on readabilty.

I didn't follow all the logic carefully but I found the code very
dense and hard to read, and a little too frugal with comments.
The new spaceship_replacement function is over 340 lines long!  It
could stand to be broken up into a few smaller logical pieces to
be easier to read and understand.  Commenting the pieces would
also help (a lot).  (It may seem crystal clear to you the way it
is but not all of us read code as if it were prose.  It's good to
keep that in mind.)

Overall, long functions are easier to read if they make more liberal
use of vertical whitespace than when they look like one uninterrupted
stream of text.  I see that sometimes you separate chunks of code with
blank lines but other times you don't.  I'm not sure I see any pattern
to it but it would help to separate logical blocks with blank lines,
especially after return statements.  I marked up a few spots to give
you a general idea what I mean.

Martin



There are two things I'd like to address in a follow-up:
1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
is what I've copied from elsewhere in phiopt, but thinking about it,
alll we care is probably only HONOR_NANS, the matched pattern starts with
== or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
equal, which should be the case for signed zero differences.
2) the pr94589-2.C testcase should be matching just 12 times each, but runs
into operator>=(strong_ordering, unspecified) being defined as
(_M_value&1)==_M_value
rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
but is not a single use but two uses.  I'll need to pattern match that case
specially.

2021-05-04  Jakub Jelinek  

PR tree-optimization/94589
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
spaceship_replacement.
(cond_only_block_p, spaceship_replacement): New functions.

* gcc.dg/pr94589-1.c: New test.
* gcc.dg/pr94589-2.c: New test.
* gcc.dg/pr94589-3.c: New test.
* gcc.dg/pr94589-4.c: New test.
* g++.dg/opt/pr94589-1.C: New test.
* g++.dg/opt/pr94589-2.C: New test.
* g++.dg/opt/pr94589-3.C: New test.
* g++.dg/opt/pr94589-4.C: New test.

--- gcc/tree-ssa-phiopt.c.jj2021-05-02 10:17:49.095397758 +0200
+++ gcc/tree-ssa-phiopt.c   2021-05-03 17:49:54.233300624 +0200
@@ -64,6 +64,8 @@ static bool abs_replacement (basic_block
 edge, edge, gimple *, tree, tree);
  static bool xor_replacement (basic_block, basic_block,
 edge, edge, gimple *, tree, tree);
+static bool spaceship_replacement (basic_block, basic_block,
+  edge, edge, gimple *, tree, tree);
  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
basic_block,
  edge, edge, gimple *,
  tree, tree);
@@ -357,6 +359,8 @@ tree_ssa_phiopt_worker (bool do_store_el
cfgchanged = true;
  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
+ else if (spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+   cfgchanged = true;
}
  }
  
@@ -1806,6 +1810,420 @@ minmax_replacement (basic_block cond_bb,
  
return true;

  }
+
+/* Return true if the only executable statement in BB is a GIMPLE_COND.  */
+
+static bool
+cond_only_block_p (basic_block bb)
+{
+  /* BB must have no executable statements.  */
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  if (phi_nodes (bb))
+return false;
+  while (!gsi_end_p (gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  if (is_gimple_debug (stmt))
+   ;
+  else if (gimple_code (stmt) == GIMPLE_NOP
+  || gimple_code (stmt) == GIMPLE_PREDICT
+  || gimple_code (stmt) 

Re: [PATCH] libstdc++: Reduce ranges::minmax/minmax_element comparison complexity

2021-05-05 Thread Patrick Palka via Gcc-patches
On Wed, 5 May 2021, Jonathan Wakely wrote:

> On 04/05/21 21:42 -0400, Patrick Palka via Libstdc++ wrote:
> > This rewrites ranges::minmax and ranges::minmax_element so that it
> > performs at most 3*N/2 many comparisons, as required by the standard.
> > In passing, this also fixes PR100387 by avoiding a premature std::move
> > in ranges::minmax and in std::shift_right.
> > 
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
> > 10/11?
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > PR libstdc++/100387
> > * include/bits/ranges_algo.h (__minmax_fn::operator()): Rewrite
> > to limit comparison complexity to 3*N/2.  Avoid premature std::move.
> > (__minmax_element_fn::operator()): Likewise.
> > (shift_right): Avoid premature std::move of __result.
> > * testsuite/25_algorithms/minmax/constrained.cc (test04, test05):
> > New tests.
> > * testsuite/25_algorithms/minmax_element/constrained.cc (test02):
> > Likewise.
> > ---
> > libstdc++-v3/include/bits/ranges_algo.h   | 87 ++-
> > .../25_algorithms/minmax/constrained.cc   | 31 +++
> > .../minmax_element/constrained.cc | 19 
> > 3 files changed, 113 insertions(+), 24 deletions(-)
> > 
> > diff --git a/libstdc++-v3/include/bits/ranges_algo.h
> > b/libstdc++-v3/include/bits/ranges_algo.h
> > index cda3042c11f..bbd29127e89 100644
> > --- a/libstdc++-v3/include/bits/ranges_algo.h
> > +++ b/libstdc++-v3/include/bits/ranges_algo.h
> > @@ -3291,18 +3291,39 @@ namespace ranges
> > auto __first = ranges::begin(__r);
> > auto __last = ranges::end(__r);
> > __glibcxx_assert(__first != __last);
> > +   auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
> > minmax_result<range_value_t<_Range>> __result = {*__first, *__first};
> > while (++__first != __last)
> >   {
> > -   auto __tmp = *__first;
> > -   if (std::__invoke(__comp,
> > - std::__invoke(__proj, __tmp),
> > - std::__invoke(__proj, __result.min)))
> > - __result.min = std::move(__tmp);
> > -   if (!(bool)std::__invoke(__comp,
> > -std::__invoke(__proj, __tmp),
> > -std::__invoke(__proj, __result.max)))
> > - __result.max = std::move(__tmp);
> > +   // Process two elements at a time so that we perform at most
> > +   // 3*N/2 many comparisons in total (each of the N/2 iterations
> 
> Is "many" a typo here?

Just a bad habit of mine to usually write "<N> many" instead of just
"<N>" :) Consider the "many" removed.

> 
> > +   // of this loop performs three comparisons).
> > +   auto __val1 = *__first;
> 
> Can we avoid making this copy if the range satisfies forward_range, by
> keeping copies of the min/max iterators, or just forwarding to
> ranges::minmax_element?

Maybe we can make __val1 and __val2 universal references?  Ah, but then
__val1 would potentially be invalidated after incrementing __first.  I
think it should be safe to make __val2 a universal reference though.
I've done this in v2 below.

Forwarding to ranges::minmax_element seems like it would be profitable
in some situations, e.g if the value type isn't trivially copyable.  I
can do this in a followup patch for ranges::max/max_element and
ranges::min/min_element too, they should all use the same heuristic.

> 
> 
> > +   if (++__first == __last)
> > + {
> > +   // N is odd; in this final iteration, we perform a just one
> 
> s/perform a just one/perform just one/

Fixed.

> 
> > +   // comparison, for a total of 3*(N-1)/2 + 1 < 3*N/2
> > comparisons.
> 
> I find this a bit hard to parse with the inequality there.

Removed.

> 
> > +   if (__comp_proj(__val1, __result.min))
> > + __result.min = std::move(__val1);
> > +   else if (!__comp_proj(__val1, __result.max))
> > + __result.max = std::move(__val1);
> 
> This can be two comparisons, can't it? Would this be better...

Whoops, yeah...

> 
>   // N is odd; in this final iteration, we perform at most two
>   // comparisons, for a total of 3*(N-1)/2 + 2 comparisons,
>   // which is not more than 3*N/2, as required.
> 
> ?

Ah, but then the total is more than 3*N/2 :(  And I think we reach this
case really when N is even, not odd (sorry, I really botched this
patch).

And when N=2 in particular, we perform up to two comparisons instead of
three, but actually a single comparison should suffice in this case.  I
think all this is fixed in v2 below by handling the second element in
the range specially.
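
For reference, a standalone sketch of the pairwise strategy (plain '<' and
no projections; illustrative only, not the libstdc++ code):

  #include <iterator>
  #include <utility>

  template<typename It,
           typename T = typename std::iterator_traits<It>::value_type>
  std::pair<T, T>
  minmax_pairwise (It first, It last)
  {
    T mn = *first, mx = *first;
    while (++first != last)
      {
        T a = *first;
        if (++first == last)
          {
            // odd leftover element: at most two extra comparisons
            if (a < mn) mn = std::move (a);
            else if (!(a < mx)) mx = std::move (a);
            break;
          }
        T b = *first;
        // order the pair (1 comparison), then compare against the running
        // min and max (2 comparisons): 3 comparisons per 2 elements
        if (b < a) std::swap (a, b);
        if (a < mn) mn = std::move (a);
        if (!(b < mx)) mx = std::move (b);
      }
    return {std::move (mn), std::move (mx)};
  }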

> 
> > +   break;
> > + }
> > +   auto __val2 = *__first;
> > +   if (!__comp_proj(__val2, __val1))
> > + {
> > +   if (__comp_proj(__val1, __result.min))
> > + __result.min = std::move(__val1);
> > +   if (!__comp_proj(__val2, __result.max))
> > + __result.max = std::move(__val2);
> > + }
> > +   else

[committed] Get avr building again

2021-05-05 Thread Jeff Law via Gcc-patches


Removes references to CC_STATUS_INIT from the avr port, which should get 
it to the point of building again.



Committed to the trunk.


Jeff

commit b927ffdd6cecd0eeda6ef77df2623519870b1e75
Author: Jeff Law 
Date:   Wed May 5 09:15:42 2021 -0600

Remove cc0 remnants from avr port

gcc/
* config/avr/avr.md: Remove references to CC_STATUS_INIT.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index a1a325b7a8c..271f95fbf7a 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -7668,7 +7668,6 @@
   {
 const char *op;
 int jump_mode;
-CC_STATUS_INIT;
 if (test_hard_reg_class (ADDW_REGS, operands[0]))
   output_asm_insn ("sbiw %0,1" CR_TAB
"sbc %C0,__zero_reg__" CR_TAB
@@ -7713,7 +7712,6 @@
   {
 const char *op;
 int jump_mode;
-CC_STATUS_INIT;
 if (test_hard_reg_class (ADDW_REGS, operands[0]))
   output_asm_insn ("sbiw %0,1", operands);
 else
@@ -7756,7 +7754,6 @@
   {
 const char *op;
 int jump_mode;
-CC_STATUS_INIT;
 if (test_hard_reg_class (ADDW_REGS, operands[0]))
   output_asm_insn ("sbiw %0,1", operands);
 else
@@ -7799,7 +7796,6 @@
   {
 const char *op;
 int jump_mode;
-CC_STATUS_INIT;
 output_asm_insn ("ldi %3,1"   CR_TAB
  "sub %A0,%3" CR_TAB
  "sbc %B0,__zero_reg__", operands);


[PATCH] s390: Add more vcond_mask patterns.

2021-05-05 Thread Robin Dapp via Gcc-patches

Hi,

this patch adds vcond_mask patterns with mixed modes for the condition/mask 
and the source/target, so that e.g. boolean conditions become 
possible:


  vtarget = bool_cond ? vsource1 : vsource2.

Is it OK for trunk?

Regards
 Robin

gcc/ChangeLog:

* config/s390/vector.md (vcond_mask_): Add vcond_mask with mixed mode.
(vcond_mask_): Dito.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vcond-mixed-double.c: New test.
* gcc.target/s390/vector/vcond-mixed-float.c: New test.
>From 4ce2d9a3f43c44d35142a726921258540adfca51 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 18 Mar 2021 11:31:02 +0100
Subject: [PATCH 3/7] s390: Add more vcond_mask patterns.

Add vcond_mask patterns that allow another mode for the condition/mask
than the source and target so e.g. boolean conditions become possible:

  vtarget = bool_cond ? vsource1 : vsource2.

Also, add test cases for vcond_mask with mixed modes.
---
 gcc/config/s390/vector.md | 21 ++
 .../s390/vector/vcond-mixed-double.c  | 41 +++
 .../s390/vector/vcond-mixed-float.c   | 41 +++
 3 files changed, 103 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300..7c730432d80 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -36,6 +36,7 @@
 (define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE")
 			 (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
+
 (define_mode_iterator V_HW_64 [V2DI V2DF])
 (define_mode_iterator VT_HW_HSDT [V8HI V4SI V4SF V2DI V2DF V1TI V1TF TI TF])
 (define_mode_iterator V_HW_HSD [V8HI V4SI (V4SF "TARGET_VXE") V2DI V2DF])
@@ -725,6 +726,26 @@
   "TARGET_VX"
   "operands[4] = CONST0_RTX (mode);")
 
+(define_expand "vcond_mask_"
+  [(set (match_operand:VX_VEC_CONV_BFP 0 "register_operand" "")
+	(if_then_else:VX_VEC_CONV_BFP
+	 (eq (match_operand:VX_VEC_CONV_INT 3 "register_operand" "")
+	 (match_dup 4))
+	 (match_operand:VX_VEC_CONV_BFP 2 "register_operand" "")
+	 (match_operand:VX_VEC_CONV_BFP 1 "register_operand" "")))]
+  "TARGET_VX"
+  "operands[4] = CONST0_RTX (mode);")
+
+(define_expand "vcond_mask_"
+  [(set (match_operand:VX_VEC_CONV_INT 0 "register_operand" "")
+	(if_then_else:VX_VEC_CONV_INT
+	 (eq (match_operand:VX_VEC_CONV_BFP 3 "register_operand" "")
+	 (match_dup 4))
+	 (match_operand:VX_VEC_CONV_INT 2 "register_operand" "")
+	 (match_operand:VX_VEC_CONV_INT 1 "register_operand" "")))]
+  "TARGET_VX"
+  "operands[4] = CONST0_RTX (mode);")
+
 
 ; We only have HW support for byte vectors.  The middle-end is
 ; supposed to lower the mode if required.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
new file mode 100644
index 000..8795d08a732
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
@@ -0,0 +1,41 @@
+/* Check for vectorization of mixed conditionals.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O3 -march=z14 -mzarch" } */
+
+double xd[1024];
+double zd[1024];
+double wd[1024];
+
+long xl[1024];
+long zl[1024];
+long wl[1024];
+
+void foold ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = xl[i] ? zd[i] : wd[i];
+}
+
+void foodl ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zl[i] = xd[i] ? zl[i] : wl[i];
+}
+
+void foold2 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = (xd[i] > 0) ? zd[i] : wd[i];
+}
+
+void foold3 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = (xd[i] > 0. & wd[i] < 0.) ? zd[i] : wd[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
new file mode 100644
index 000..1153cace420
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
@@ -0,0 +1,41 @@
+/* Check for vectorization of mixed conditionals.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O3 -march=z15 -mzarch" } */
+
+float xf[1024];
+float zf[1024];
+float wf[1024];
+
+int xi[1024];
+int zi[1024];
+int wi[1024];
+
+void fooif ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = xi[i] ? zf[i] : wf[i];
+}
+
+void foofi ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zi[i] = xf[i] ? zi[i] : wi[i];
+}
+
+void fooif2 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = (xf[i] > 0) ? zf[i] : wf[i];
+}
+
+void fooif3 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = (xf[i] > 0.f & wf[i] < 0.f) ? zf[i] : wf[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-- 
2.23.0



[PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings

2021-05-05 Thread Chung-Lin Tang via Gcc-patches

This patch implements relaxing the requirements when a map with the implicit 
attribute encounters
an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, 
lines 18-27 (and 5.1 spec,
page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an 
implicit data-mapping
 attribute has corresponding storage in the device data environment prior to a 
task encountering the
 construct that is associated with the map clause, only that part of the 
original storage will have
 corresponding storage in the device data environment as a result of the map 
clause."

Also tracked in the OpenMP spec context as issue #1463:
https://github.com/OpenMP/spec/issues/1463

The implementation inside the compiler is, of course, to tag the implicitly 
created maps with some
indication of "implicit".  I've done this with an OMP_CLAUSE_MAP_IMPLICIT_P 
macro, using
'base.deprecated_flag' underneath.

There is an encoding of this as GOMP_MAP_IMPLICIT == 
GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4
in include/gomp-constants.h for the runtime, but I've intentionally avoided 
exploding the entire
gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, 
instead adding in the new
flag bits only at the final runtime call generation during omp-lowering.

The rest is libgomp mapping taking care of the implicit case: allowing map 
success if an existing
map is a proper subset of the new map, if the new map is implicit. 
Straightforward enough I think.
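
A hypothetical illustration of the relaxed behavior:

  int a[100];
  #pragma omp target enter data map(to: a[0:50])
  /* The implicit map of 'a' below now refers only to the already
     mapped a[0:50] part instead of failing on the partial overlap.  */
  #pragma omp target
  a[10] += 1;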

There are also some additions to print the implicit attribute during tree 
pretty-printing, for that
reason some scan tests were updated.

Also, another adjustment in this patch is how implicitly created clauses are 
added to the current
clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the 
new clauses to the end,
this patch adds them at the position "after initial non-map clauses, but right 
before any existing
map clauses".

The reason for this is: when combined with other map clauses, for example:

  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
rec.ptr[i] += 1;

There will be an implicit map created for map(rec), because of the access 
inside the target region.
The expectation is that 'rec' is implicitly mapped, and then the pointed 
array-section part by 'rec.ptr'
will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' 
(in that order).

If the implicit 'map(rec)' is appended to the end, instead of placed before 
other maps, the attachment
operation will not find anything to attach to, and the entire region will fail.

Note: this touches a bit on another issue which I will be sending a patch for 
later:
per the discussion on omp-lang, an array section list item should *not* be 
mapping its base-pointer
(although an attachment attempt should exist), while in current GCC behavior, 
for struct member pointers
like 'rec.ptr' above, we do map it (which should be deemed incorrect).

This means that as of right now, this modification of map order doesn't really 
exhibit the above mentioned
behavior yet. I have included it as part of this patch because the "[implicit]" 
tree printing requires
modifying many gimple scan tests already, so including the test modifications 
together seems more
manageable patch-wise.

Tested with no regressions, and pushed to devel/omp/gcc-10. Will be submitting 
a mainline trunk version later.

Chung-Lin

2021-05-05  Chung-Lin Tang  

include/ChangeLog:

* gomp-constants.h (GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_NONCONTIG_ARRAY_P): Adjust test for non-contiguous array map
kind bits to be more specific.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

* tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit'
bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* c-c++-common/goacc/reduction-1.c: Likewise.
* c-c++-common/goacc/reduction-2.c: Likewise.
* 

[PATCH v2] x86: Build only one __cpu_model/__cpu_features2 variables

2021-05-05 Thread H.J. Lu via Gcc-patches
On Wed, May 05, 2021 at 09:36:16AM +0200, Richard Biener wrote:
> On Mon, May 3, 2021 at 11:31 AM Ivan Sorokin via Gcc-patches
>  wrote:
> >
> > Prior to this commit GCC -O2 generated quite bad code for this
> > function:
> >
> > bool f()
> > {
> > return __builtin_cpu_supports("popcnt")
> > && __builtin_cpu_supports("ssse3");
> > }
> >
> > f:
> > movl__cpu_model+12(%rip), %eax
> > xorl%r8d, %r8d
> > testb   $4, %al
> > je  .L1
> > shrl$6, %eax
> > movl%eax, %r8d
> > andl$1, %r8d
> > .L1:
> > movl%r8d, %eax
> > ret
> >
> > The problem was caused by the fact that internally every invocation
> > of __builtin_cpu_supports built a new variable __cpu_model and a new
> > type __processor_model. Because of this GIMPLE level optimizers
> > weren't able to CSE the loads of __cpu_model and optimize
> > bit-operations properly.
> >
> > This commit fixes the problem by caching created __cpu_model
> > variable and __processor_model type. Now the GCC -O2 generates:
> >
> > f:
> > movl__cpu_model+12(%rip), %eax
> > andl$68, %eax
> > cmpl$68, %eax
> > sete%al
> > ret
> 
> The patch looks good, the function could need a comment
> and the global variables better names, not starting with __
> 
> Up to the x86 maintainers - HJ, can you pick up this work?
> 

Here is the updated patch to also handle __cpu_features2.
OK for master?

Thanks.

H.J.
---
GCC -O2 generated quite bad code for this function:

bool
f (void)
{
  return __builtin_cpu_supports("popcnt")
 && __builtin_cpu_supports("ssse3");
}

f:
movl__cpu_model+12(%rip), %edx
movl%edx, %eax
shrl$6, %eax
andl$1, %eax
andl$4, %edx
movl$0, %edx
cmove   %edx, %eax
ret

The problem was caused by the fact that internally every invocation of
__builtin_cpu_supports built a new variable __cpu_model and a new type
__processor_model.  Because of this, GIMPLE level optimizers weren't able
to CSE the loads of __cpu_model and optimize bit-operations properly.

Improve GCC -O2 code generation by caching the __cpu_model and __cpu_features2
variables as well as their types:

f:
movl__cpu_model+12(%rip), %eax
andl$68, %eax
cmpl$68, %eax
sete%al
ret

gcc/ChangeLog:

2021-05-05  Ivan Sorokin 
H.J. Lu 

PR target/91400
* config/i386/i386-builtins.c (ix86_cpu_model_type_node): New.
(ix86_cpu_model_var): Likewise.
(ix86_cpu_features2_type_node): Likewise.
(ix86_cpu_features2_var): Likewise.
(fold_builtin_cpu): Cache __cpu_model and __cpu_features2 with
their types.

gcc/testsuite/ChangeLog:

2021-05-05  Ivan Sorokin 
H.J. Lu 

PR target/91400
* gcc.target/i386/pr91400-1.c: New test.
* gcc.target/i386/pr91400-2.c: Likewise.
---
 gcc/config/i386/i386-builtins.c   | 52 +++
 gcc/testsuite/gcc.target/i386/pr91400-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr91400-2.c | 14 ++
 3 files changed, 63 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-2.c

diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index b66911082ab..8036aedebac 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -2103,6 +2103,11 @@ make_var_decl (tree type, const char *name)
   return new_decl;
 }
 
+static GTY(()) tree ix86_cpu_model_type_node;
+static GTY(()) tree ix86_cpu_model_var;
+static GTY(()) tree ix86_cpu_features2_type_node;
+static GTY(()) tree ix86_cpu_features2_var;
+
 /* FNDECL is a __builtin_cpu_is or a __builtin_cpu_supports call that is folded
into an integer defined in libgcc/config/i386/cpuinfo.c */
 
@@ -2114,12 +2119,16 @@ fold_builtin_cpu (tree fndecl, tree *args)
 = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl);
   tree param_string_cst = NULL;
 
-  tree __processor_model_type = build_processor_model_struct ();
-  tree __cpu_model_var = make_var_decl (__processor_model_type,
-   "__cpu_model");
-
-
-  varpool_node::add (__cpu_model_var);
+  if (ix86_cpu_model_var == nullptr)
+{
+  /* Build a single __cpu_model variable for all references to
+__cpu_model so that GIMPLE level optimizers can CSE the loads
+of __cpu_model and optimize bit-operations properly.  */
+  ix86_cpu_model_type_node = build_processor_model_struct ();
+  ix86_cpu_model_var = make_var_decl (ix86_cpu_model_type_node,
+ "__cpu_model");
+  varpool_node::add (ix86_cpu_model_var);
+}
 
   gcc_assert ((args != NULL) && (*args != NULL));
 
@@ -2160,7 +2169,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
 

Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-05 Thread Qing Zhao via Gcc-patches
Hi, Richard, 

During the change for the 2nd version based on your previous comments, I have 
the following questions that need your help:

> 
>> +  sra_stats.subtree_deferred_init++;
>> +}
>> +  else if (access->grp_to_be_debug_replaced)
>> +{
>> +  /* FIXME, this part might have some issue.  */
>> +  tree drhs = build_debug_ref_for_model (loc, agg,
>> + access->offset - top_offset,
>> + access);
>> +  gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
>> +drhs, gsi_stmt (*gsi));
>> +  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
> 
> Would be good to fix the FIXME :-)
> 
> I guess the thing we need to decide here is whether -ftrivial-auto-var-init
> should affect debug-only constructs too.  If it doesn't, exmaining removed
> components in a debugger might show uninitialised values in cases where
> the user was expecting initialised ones.  There would be no security
> concern, but it might be surprising.
> 
> I think in principle the DRHS can contain a call to DEFERRED_INIT.
> Doing that would probably require further handling elsewhere though.

Right now, what I did is:

  else if (lhs_access->grp_to_be_debug_replaced)
{
  tree lhs_drepl = get_access_replacement (lhs_access);
  tree init_type_node
   = build_int_cst (integer_type_node, (int) init_type);
  tree call = build_call_expr_internal_loc
  (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
  TREE_TYPE (lhs_drepl), 2, lhs_drepl, init_type_node);
  gdebug *ds = gimple_build_debug_bind (lhs_drepl, call,
gsi_stmt (*gsi));
  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
}

Is the above matching what you suggested?

What do you mean by “further handling elsewhere”?

> 
>> + is better even for pattern initialization.  */
>> +  return build_int_cstu (type, largevalue);
> 
> I've no objection to that choice for booleans, but: booleans in some
> languages (like Ada) can have multibit precision.  If we want booleans
> to be zero then it would probably be better to treat them as a separate
> case and just use build_zero_cst (type) for them.
> 
> Also, the above won't work correctly for 128-bit integers: it will
> zero-initialize the upper half.  It would probably be better to use
> wi::from_buffer to construct the integer instead.

You mean using wi::from_buffer to construct all the integer types (including 
64-bit, 32-bit, etc.)?

I read the corresponding source code related to "wi::from_buffer", but am 
still not very clear
on how to use it for my purpose.

From my current understanding, I should use it like the following:

"

unsigned char *ptr = “0x”;

int total_bytes = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type));
wide_int result = wi::from_buffer (ptr, total_bytes);
return wide_int_to_tree (type, result);

“

Is the above correct for INTEGER type?
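
For comparison, here is a minimal sketch of one plausible use, assuming a
repeating fill byte such as 0xfe for pattern init and total_bytes <= 16
(hypothetical, not necessarily the final code):

  unsigned total_bytes = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type));
  unsigned char buf[16];
  memset (buf, 0xfe, total_bytes);  /* the repeating pattern byte */
  wide_int result = wi::from_buffer (buf, total_bytes);
  return wide_int_to_tree (type, result);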

thanks.

Qing


[PATCH] testsuite: Add s390 to vect_*_cvt checks

2021-05-05 Thread Robin Dapp via Gcc-patches

Hi,

this patch adds some s390 checks for vect_*_cvts. Is it OK?

Regards
 Robin

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add s390 checks for vect conversions.
>From 959251d5d2684a9ffebec1b341a4413c2f2328db Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 22 Apr 2021 09:36:04 +0200
Subject: [PATCH 04/10] Add s390 to vect_*_cvt checks.

---
 gcc/testsuite/lib/target-supports.exp | 29 ---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a522da322aa..f8d2ad3e623 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3504,7 +3504,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
 	 || [is-effective-target arm_neon]
 	 || ([istarget mips*-*-*]
 		 && [et-is-effective-target mips_msa])
-	 || [istarget amdgcn-*-*] }}]
+	 || [istarget amdgcn-*-*]
+	 || ([istarget s390*-*-*]
+		 && [check_effective_target_s390_vxe2]) }}]
 }
 
 # Return 1 if the target supports signed double->int conversion
@@ -3521,7 +3523,9 @@ proc check_effective_target_vect_doubleint_cvt { } {
 	|| [istarget aarch64*-*-*]
 	|| ([istarget powerpc*-*-*] && [check_vsx_hw_available])
 	|| ([istarget mips*-*-*]
-		 && [et-is-effective-target mips_msa]) }}]
+		 && [et-is-effective-target mips_msa])
+	 || ([istarget s390*-*-*]
+		 && [check_effective_target_s390_vx]) }}]
 }
 
 # Return 1 if the target supports signed int->double conversion
@@ -3538,7 +3542,9 @@ proc check_effective_target_vect_intdouble_cvt { } {
 	 || [istarget aarch64*-*-*]
 	 || ([istarget powerpc*-*-*] && [check_vsx_hw_available])
 	 || ([istarget mips*-*-*]
-		 && [et-is-effective-target mips_msa]) }}]
+		 && [et-is-effective-target mips_msa])
+	 || ([istarget s390*-*-*]
+		 && [check_effective_target_s390_vx]) }}]
 }
 
 #Return 1 if we're supporting __int128 for target, 0 otherwise.
@@ -3567,7 +3573,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
 	 || [is-effective-target arm_neon]
 	 || ([istarget mips*-*-*]
 		 && [et-is-effective-target mips_msa])
-	 || [istarget amdgcn-*-*] }}]
+	 || [istarget amdgcn-*-*]
+	 || ([istarget s390*-*-*]
+	  && [check_effective_target_s390_vxe2]) }}]
 }
 
 
@@ -3582,7 +3590,9 @@ proc check_effective_target_vect_floatint_cvt { } {
 	 || [is-effective-target arm_neon]
 	 || ([istarget mips*-*-*]
 		 && [et-is-effective-target mips_msa])
-	 || [istarget amdgcn-*-*] }}]
+	 || [istarget amdgcn-*-*]
+	 || ([istarget s390*-*-*]
+		 && [check_effective_target_s390_vxe2]) }}]
 }
 
 # Return 1 if the target supports unsigned float->int conversion
@@ -3595,7 +3605,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
 	|| [is-effective-target arm_neon]
 	|| ([istarget mips*-*-*]
 		&& [et-is-effective-target mips_msa])
-	|| [istarget amdgcn-*-*] }}]
+	|| [istarget amdgcn-*-*]
+	|| ([istarget s390*-*-*]
+		&& [check_effective_target_s390_vxe2]) }}]
 }
 
 # Return 1 if peeling for alignment might be profitable on the target
@@ -9794,7 +9806,10 @@ proc check_vect_support_and_set_flags { } {
 	lappend DEFAULT_VECTCFLAGS "--param" "max-unroll-times=8"
 	lappend DEFAULT_VECTCFLAGS "--param" "max-completely-peeled-insns=200"
 	lappend DEFAULT_VECTCFLAGS "--param" "max-completely-peel-times=16"
-if [check_effective_target_s390_vxe] {
+if [check_effective_target_s390_vxe2] {
+	lappend DEFAULT_VECTCFLAGS "-march=z15" "-mzarch"
+set dg-do-what-default run
+	} elseif [check_effective_target_s390_vxe] {
 	lappend DEFAULT_VECTCFLAGS "-march=z14" "-mzarch"
 set dg-do-what-default run
 	} elseif [check_effective_target_s390_vx] {
-- 
2.23.0



[PATCH] testsuite: Add vect_floatint_cvt to gcc.dg/vect/pr56541.c

2021-05-05 Thread Robin Dapp via Gcc-patches

Hi,

pr56541.c implicitly converts a float vector to an int (bool) vector:

 rMin = (rMax>0) ? rMin : rBig;

It fails on some s390 targets because they do not support converting from 
vector float to int.  Is adding a vect_floatint_cvt as in the attached 
patch the OK thing to do?


Or better an xfail with ! vect_floatint_cvt?

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr56541.c: Add vect_floatint_cvt.
From 4c19323b2c392923391a3c37b92054852d671c19 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 22 Apr 2021 09:22:07 +0200
Subject: [PATCH 03/10] Add vect_floatint_cvt.

---
 gcc/testsuite/gcc.dg/vect/pr56541.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr56541.c b/gcc/testsuite/gcc.dg/vect/pr56541.c
index d5def6899e4..e1cee6d0b0e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56541.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56541.c
@@ -24,4 +24,4 @@ void foo()
 }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_cond_mixed } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_floatint_cvt } xfail { ! vect_cond_mixed } } } } */
-- 
2.23.0



Re: [PATCH] phiopt, v2: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Richard Biener
On Wed, 5 May 2021, Jakub Jelinek wrote:

> On Wed, May 05, 2021 at 01:39:32PM +0200, Richard Biener wrote:
> > Can you in the above IL snippets mark COND_BB and MIDDLE_BB?
> ...
> 
> Thanks.
> Here is an updated patch (attached) and interdiff from previous patch.
> Ok for trunk if it passes bootstrap/regtest?

OK.

Thanks,
Richard.

> --- gcc/tree-ssa-phiopt.c 2021-05-03 17:49:54.233300624 +0200
> +++ gcc/tree-ssa-phiopt.c 2021-05-05 15:06:23.253189139 +0200
> @@ -65,7 +65,7 @@
>  static bool xor_replacement (basic_block, basic_block,
>edge, edge, gimple *, tree, tree);
>  static bool spaceship_replacement (basic_block, basic_block,
> -edge, edge, gimple *, tree, tree);
> +edge, edge, gphi *, tree, tree);
>  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
> basic_block,
> edge, edge, gimple *,
> tree, tree);
> @@ -1840,53 +1840,53 @@
>  
>  /* Attempt to optimize (x <=> y) cmp 0 and similar comparisons.
> For strong ordering <=> try to match something like:
> - :
> + :  // cond3_bb (== cond2_bb)
>  if (x_4(D) != y_5(D))
>goto ; [INV]
>  else
>goto ; [INV]
>  
> - :
> + :  // cond_bb
>  if (x_4(D) < y_5(D))
>goto ; [INV]
>  else
>goto ; [INV]
>  
> - :
> + :  // middle_bb
>  
> - :
> + :  // phi_bb
>  # iftmp.0_2 = PHI <1(4), 0(2), -1(3)>
>  _1 = iftmp.0_2 == 0;
>  
> and for partial ordering <=> something like:
>  
> - :
> + :  // cond3_bb
>  if (a_3(D) == b_5(D))
>goto ; [50.00%]
>  else
>goto ; [50.00%]
>  
> - [local count: 536870913]:
> + [local count: 536870913]:  // cond2_bb
>  if (a_3(D) < b_5(D))
>goto ; [50.00%]
>  else
>goto ; [50.00%]
>  
> - [local count: 268435456]:
> + [local count: 268435456]:  // cond_bb
>  if (a_3(D) > b_5(D))
>goto ; [50.00%]
>  else
>goto ; [50.00%]
>  
> - [local count: 134217728]:
> + [local count: 134217728]:  // middle_bb
>  
> - [local count: 1073741824]:
> + [local count: 1073741824]:  // phi_bb
>  # SR.27_4 = PHI <0(2), -1(3), 1(4), 2(5)>
>  _2 = SR.27_4 > 0;  */
>  
>  static bool
>  spaceship_replacement (basic_block cond_bb, basic_block middle_bb,
> -edge e0, edge e1, gimple *phi,
> +edge e0, edge e1, gphi *phi,
>  tree arg0, tree arg1)
>  {
>if (!INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
> @@ -1897,6 +1897,11 @@
>|| !IN_RANGE (tree_to_shwi (arg1), -1, 2))
>  return false;
>  
> +  basic_block phi_bb = gimple_bb (phi);
> +  gcc_assert (phi_bb == e0->dest && phi_bb == e1->dest);
> +  if (!IN_RANGE (EDGE_COUNT (phi_bb->preds), 3, 4))
> +return false;
> +
>use_operand_p use_p;
>gimple *use_stmt;
>if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
> @@ -1953,11 +1958,6 @@
>if (!empty_block_p (middle_bb))
>  return false;
>  
> -  basic_block phi_bb = gimple_bb (phi);
> -  gcc_assert (phi_bb == e0->dest && phi_bb == e1->dest);
> -  if (!IN_RANGE (EDGE_COUNT (phi_bb->preds), 3, 4))
> -return false;
> -
>gcond *cond1 = as_a  (last_stmt (cond_bb));
>enum tree_code cmp1 = gimple_cond_code (cond1);
>if (cmp1 != LT_EXPR && cmp1 != GT_EXPR)
> @@ -1965,7 +1965,7 @@
>tree lhs1 = gimple_cond_lhs (cond1);
>tree rhs1 = gimple_cond_rhs (cond1);
>/* The optimization may be unsafe due to NaNs.  */
> -  if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
> +  if (HONOR_NANS (TREE_TYPE (lhs1)))
>  return false;
>if (TREE_CODE (lhs1) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs1))
>  return false;
> @@ -2180,11 +2180,6 @@
>gimple_assign_set_rhs1 (use_stmt, cond);
>  }
>update_stmt (use_stmt);
> -  if (cmp3 == EQ_EXPR)
> -gimple_cond_make_true (as_a  (cond3));
> -  else
> -gimple_cond_make_false (as_a  (cond3));
> -  update_stmt (cond3);
>  
>if (MAY_HAVE_DEBUG_BIND_STMTS)
>  {
> @@ -2201,6 +2196,13 @@
>  
>if (has_debug_uses)
>   {
> +   /* If there are debug uses, emit something like:
> +  # DEBUG D#1 => i_2(D) > j_3(D) ? 1 : -1
> +  # DEBUG D#2 => i_2(D) == j_3(D) ? 0 : D#1
> +  where > stands for the comparison that yielded 1
> +  and replace debug uses of phi result with that D#2.
> +  Ignore the value of 2, because if NaNs aren't expected,
> +  all floating point numbers should be comparable.  */
> gimple_stmt_iterator gsi = gsi_after_labels (gimple_bb (phi));
> tree type = TREE_TYPE (PHI_RESULT (phi));
> tree temp1 = make_node (DEBUG_EXPR_DECL);
> @@ -2224,6 +2226,9 @@
>   }
>  }
>  
> +  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> +  

Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-05 Thread Christophe Lyon via Gcc-patches
On Tue, 4 May 2021 at 19:03, Christophe Lyon  wrote:
>
> On Tue, 4 May 2021 at 15:43, Christophe Lyon  
> wrote:
> >
> > On Tue, 4 May 2021 at 13:48, Andre Vieira (lists)
> >  wrote:
> > >
> > > It would be good to also add tests for NEON as you also enable auto-vec
> > > for it. I checked and I do think the necessary 'neon_vc' patterns exist
> > > for 'VH', so we should be OK there.
> > >
> >
> > Actually since I posted the patch series, I've noticed a regression in
> > armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t 
> > loops,
> > but we lose the fact that some FP comparisons can throw exceptions.
> >
> > I'll have to revisit this patch.
>
> Actually it looks like my patch does the right thing: we now vectorize
> appropriately, given that the testcase is compiled with -ffast-math.
> I need to update the testcase, though.
>

Here is a new version, with armv8_2-fp16-arith-1.c updated to take
into account the new vectorization.

Christophe


> >
> > Thanks,
> >
> > Christophe
> >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > This patch adds __fp16 support to the previous patch that added vcmp
> > > > support with MVE. For this we update existing expanders to use VDQWH
> > > > iterator, and add a new expander vcond.  In the
> > > > process we need to create suitable iterators, and update v_cmp_result
> > > > as needed.
> > > >
> > > > 2021-04-26  Christophe Lyon  
> > > >
> > > >   gcc/
> > > >   * config/arm/iterators.md (V16): New iterator.
> > > >   (VH_cvtto): New iterator.
> > > >   (v_cmp_result): Added V4HF and V8HF support.
> > > >   * config/arm/vec-common.md (vec_cmp): Use 
> > > > VDQWH.
> > > >   (vcond): Likewise.
> > > >   (vcond_mask_): Likewise.
> > > >   (vcond): New expander.
> > > >
> > > >   gcc/testsuite/
> > > >   * gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
> > > >   * gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > >   auto-vectorization.
> > > > ---
> > > >   gcc/config/arm/iterators.md   |  6 
> > > >   gcc/config/arm/vec-common.md  | 40 
> > > > ---
> > > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 
> > > > +
> > > >   gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 
> > > > +
> > > >   4 files changed, 102 insertions(+), 12 deletions(-)
> > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > >
> > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > > index a128465..3042baf 100644
> > > > --- a/gcc/config/arm/iterators.md
> > > > +++ b/gcc/config/arm/iterators.md
> > > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > > >   ;; Vector modes for 16-bit floating-point support.
> > > >   (define_mode_iterator VH [V8HF V4HF])
> > > >
> > > > +;; Modes with 16-bit elements only.
> > > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > > +
> > > >   ;; 16-bit floating-point vector modes suitable for moving (includes 
> > > > BFmode).
> > > >   (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > > >
> > > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF 
> > > > "v2si")
> > > >   ;; (Opposite) mode to convert to/from for vector-half mode 
> > > > conversions.
> > > >   (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > > >   (V8HI "V8HF") (V8HF "V8HI")])
> > > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > > + (V8HI "v8hf") (V8HF "v8hi")])
> > > >
> > > >   ;; Define element mode for each vector mode.
> > > >   (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") 
> > > > (V16QI "V16QI")
> > > >   (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > > >   (V4HI "v4hi") (V8HI  "v8hi")
> > > >   (V2SI "v2si") (V4SI  "v4si")
> > > > + (V4HF "v4hi") (V8HF  "v8hi")
> > > >   (DI   "di")   (V2DI  "v2di")
> > > >   (V2SF "v2si") (V4SF  "v4si")])
> > > >
> > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > > index 034b48b..3fd341c 100644
> > > > --- a/gcc/config/arm/vec-common.md
> > > > +++ b/gcc/config/arm/vec-common.md
> > > > @@ -366,8 +366,8 @@ (define_expand "vlshr3"
> > > >   (define_expand "vec_cmp"
> > > > [(set (match_operand: 0 "s_register_operand")
> > > >   (match_operator: 1 "comparison_operator"
> > > > -   [(match_operand:VDQW 2 "s_register_operand")
> > > > -(match_operand:VDQW 3 "reg_or_zero_operand")]))]
> > > > +   [(match_operand:VDQWH 2 "s_register_operand")
> > > > +(match_operand:VDQWH 

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-05 Thread Christophe Lyon via Gcc-patches
On Tue, 4 May 2021 at 15:41, Christophe Lyon  wrote:
>
> On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
>  wrote:
> >
> > Hi Christophe,
> >
> > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > Since MVE has a different set of vector comparison operators from
> > > Neon, we have to update the expansion to take into account the new
> > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > with the inverted condition.
> > >
> > > Conversely, Neon supports comparisons with #0, MVE does not.
> > >
> > > For:
> > > typedef long int vs32 __attribute__((vector_size(16)));
> > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > >
> > > we now generate:
> > > cmp_eq_vs32_reg:
> > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > >   vldr.64 d5, .L123+8
> > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > >   vldr.64 d7, .L123+24
> > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > .L124:
> > >   .align  3
> > > .L123:
> > >   .word   0
> > >   .word   0
> > >   .word   0
> > >   .word   0
> > >   .word   1
> > >   .word   1
> > >   .word   1
> > >   .word   1
> > >
> > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > a pair of vldr instead of vmov.i32, qX, #0
> > I think ideally we would even want:
> > vpte  eq, q0, q1
> > vmovt.i32 q0, #0
> > vmove.i32 q0, #1
> >
> > But we don't have a way to generate VPT blocks with multiple
> > instructions yet unfortunately so I guess VPSEL will have to do for now.
>
> TBH,  I looked at what LLVM generates currently ;-)
>

Here is an updated version, which adds
&& (! || flag_unsafe_math_optimizations)
to vcond_mask_

This condition was not present in the neon.md version I moved to vec-common.md,
but since the VDQW iterator includes V2SF and V4SF, it should take
floating-point flags into account.

Christophe

> >
> > >
> > > 2021-03-01  Christophe Lyon  
> > >
> > >   gcc/
> > >   * config/arm/arm-protos.h (arm_expand_vector_compare): Update
> > >   prototype.
> > >   * config/arm/arm.c (arm_expand_vector_compare): Add support for
> > >   MVE.
> > >   (arm_expand_vcond): Likewise.
> > >   * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
> > >   VCMPEQQ_N_S, VCMPNEQ_N_S.
> > >   (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
> > >   * config/arm/mve.md (@mve_vcmpq_): Add '@' prefix.
> > >   (@mve_vcmpq_f): Likewise.
> > >   (@mve_vcmpq_n_f): Likewise.
> > >   (@mve_vpselq_): Likewise.
> > >   (@mve_vpselq_f"): Likewise.
> > >   * config/arm/neon.md (vec_cmp > >   and move to vec-common.md.
> > >   (vec_cmpu): Likewise.
> > >   (vcond): Likewise.
> > >   (vcond): Likewise.
> > >   (vcondu): Likewise.
> > >   (vcond_mask_): Likewise.
> > >   * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S)
> > >   (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U)
> > >   (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S)
> > >   (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U)
> > >   (VCMPHIQ_N_U, VCMPHIQ_U): Remove.
> > >   * config/arm/vec-common.md (vec_cmp > >   from neon.md.
> > >   (vec_cmpu): Likewise.
> > >   (vcond): Likewise.
> > >   (vcond): Likewise.
> > >   (vcondu): Likewise.
> > >   (vcond_mask_): Likewise.
> > >
> > >   gcc/testsuite
> > >   * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
> > >   * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
> > >   * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
> > >   vectors.
> > >   * gcc.target/arm/simd/mve-vcmp-f32.c: New test for
> > >   auto-vectorization.
> > >   * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
> > >
> > > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
> > > ---
> > >   gcc/config/arm/arm-protos.h|   2 +-
> > >   gcc/config/arm/arm.c   | 211 
> > > -
> > >   gcc/config/arm/iterators.md|   9 +-
> > >   gcc/config/arm/mve.md  |  10 +-
> > >   gcc/config/arm/neon.md |  87 -
> > >   gcc/config/arm/unspecs.md  |  20 --
> > >   gcc/config/arm/vec-common.md   | 107 +++
> > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c  |  80 
> > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c  |  38 
> > >   .../gcc.target/arm/simd/mve-compare-scalar-1.c |  69 +++
> > >   gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c   |  30 +++
> > >   

[committed] Fix minor CC0 removal fallout on cr16

2021-05-05 Thread Jeff Law via Gcc-patches


cr16 failed to build due to remnants of CC0 support (NOTICE_UPDATE_CC).  
This removes the macro and obvious bits necessary to support it.  
Committed to the trunk.



Jeff


commit 14cf6aab8578132ec89ccb46e69899ae6008ff63
Author: Jeff Law 
Date:   Wed May 5 07:49:28 2021 -0600

Remove NOTICE_UPDATE_CC remnants on cr16

gcc
* config/cr16/cr16.h (NOTICE_UPDATE_CC): Remove.
* config/cr16/cr16.c (notice_update_cc): Remove.
* config/cr16/cr16-protos.h (notice_update_cc): Remove.

diff --git a/gcc/config/cr16/cr16-protos.h b/gcc/config/cr16/cr16-protos.h
index 32f54e0936e..8580dfef716 100644
--- a/gcc/config/cr16/cr16-protos.h
+++ b/gcc/config/cr16/cr16-protos.h
@@ -67,7 +67,6 @@ enum cr16_addrtype
   CR16_ABSOLUTE
 };
 
-extern void notice_update_cc (rtx);
 extern int cr16_operand_bit_pos (int val, int bitval);
 extern void cr16_decompose_const (rtx x, int *code,
  enum data_model_type *data,
diff --git a/gcc/config/cr16/cr16.c b/gcc/config/cr16/cr16.c
index 079706f7a91..6c81c399f70 100644
--- a/gcc/config/cr16/cr16.c
+++ b/gcc/config/cr16/cr16.c
@@ -2095,37 +2095,6 @@ cr16_legitimate_constant_p (machine_mode mode 
ATTRIBUTE_UNUSED,
   return 1;
 }
 
-void
-notice_update_cc (rtx exp)
-{
-  if (GET_CODE (exp) == SET)
-{
-  /* Jumps do not alter the cc's.  */
-  if (SET_DEST (exp) == pc_rtx)
-   return;
-
-  /* Moving register or memory into a register:
- it doesn't alter the cc's, but it might invalidate
- the RTX's which we remember the cc's came from.
- (Note that moving a constant 0 or 1 MAY set the cc's).  */
-  if (REG_P (SET_DEST (exp))
- && (REG_P (SET_SRC (exp)) || GET_CODE (SET_SRC (exp)) == MEM))
-   {
- return;
-   }
-
-  /* Moving register into memory doesn't alter the cc's.
- It may invalidate the RTX's which we remember the cc's came from.  */
-  if (GET_CODE (SET_DEST (exp)) == MEM && REG_P (SET_SRC (exp)))
-   {
- return;
-   }
-}
-
-  CC_STATUS_INIT;
-  return;
-}
-
 static scalar_int_mode
 cr16_unwind_word_mode (void)
 {
diff --git a/gcc/config/cr16/cr16.h b/gcc/config/cr16/cr16.h
index ae90610ad80..4ce9e81b0e3 100644
--- a/gcc/config/cr16/cr16.h
+++ b/gcc/config/cr16/cr16.h
@@ -195,9 +195,6 @@ while (0)
   (targetm.hard_regno_nregs (REGNO,  \
 GET_MODE_WIDER_MODE (word_mode).require ()) == 1)
 
-#define NOTICE_UPDATE_CC(EXP, INSN) \
-   notice_update_cc ((EXP))
-
 /* Interrupt functions can only use registers that have already been 
saved by the prologue, even if they would normally be call-clobbered 
Check if sizes are same and then check if it is possible to rename.  */


[Patch?][RFC][RTL] clobber handling & buildin expansion - missing insn_invalid_p call [PR100418]

2021-05-05 Thread Tobias Burnus

Hi Eric, hi all,

Currently, gcn (amdgcn-amdhsa) bootstrapping fails, as Alexandre's
patch to __builtin_memset (applied yesterday) now does more expansions.

The problem is [→ PR100418]
  (set(reg:DI)(plus:DI(reg:DI)(const_int)))  [= "adddi3"]
This fails with gcn as gcn has two clobbers for "adddi3" - and when
  expand_insn
is called, INSN_CODE == -1 via:
  icode = recog_memoized (insn);
i.e.
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
As the "int *pnum_clobber" argument is NULL (well, '0'), the
clobbers are not available - which causes the pattern fail.

I think that's a general issue with the RTX code generated by
builtins.c, except that most targets either do not
have clobbers for the operators used, or the code happens
to get fixed up later:

For instance, I see several "if" blocks being processed in
recog.c's insn_invalid_p via 'cleanup_cfg (CLEANUP_NO_INSN_DEL)';
the innermost parts of the call chain are:
apply_change_group → verify_changes → insn_invalid_p
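
For reference, the way insn_invalid_p makes such an insn acceptable is to
re-run recog with a pnum_clobbers out-parameter and then materialize the
missing clobbers; a rough sketch of that mechanism (not the committed fix):

  int num_clobbers = 0;
  int icode = recog (PATTERN (insn), insn, &num_clobbers);
  if (icode >= 0 && num_clobbers > 0)
    {
      /* Wrap the pattern in a PARALLEL and let add_clobbers append the
	 clobbers that e.g. gcn's "adddi3" pattern expects.  */
      rtx newpat = gen_rtx_PARALLEL (VOIDmode,
				     rtvec_alloc (num_clobbers + 1));
      XVECEXP (newpat, 0, 0) = PATTERN (insn);
      add_clobbers (newpat, icode);
      PATTERN (insn) = newpat;
      INSN_CODE (insn) = icode;
    }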

* * *

The attached patch seems to solve the GCN issue. Does it look OK?

Or does the insn_invalid_p call come too late?
If so, any suggestion where it would fit best?

Tobias,
who is more a FE and early-ME person.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
extract_insn: Call insn_invalid_p if insn cannot be found.

gcc/ChangeLog:

	* recog.c (extract_insn): Call insn_invalid_p if
	recog_memoized did not find the insn.

diff --git a/gcc/recog.c b/gcc/recog.c
index eb617f11163..4ddc5d185af 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -2766,6 +2766,8 @@ extract_insn (rtx_insn *insn)
 	 and get the constraints.  */
 
   icode = recog_memoized (insn);
+  if (icode < 0 && !insn_invalid_p (insn, false))
+	icode = INSN_CODE (insn);
   if (icode < 0)
 	fatal_insn_not_found (insn);
 


[GCC-10 backport][PATCH] arm: PR target/95646: Do not clobber callee saved registers with CMSE.

2021-05-05 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This is a backport to gcc-10, cleanly applied on the branch.

As reported in bugzilla when the -mcmse option is used while compiling for size
(-Os) with a thumb-1 target the generated code will clear the registers r7-r10.
These however are callee saved and should be preserved across ABI boundaries.
The reason this happens is because these registers are made "fixed" when
optimising for size with Thumb-1 in a way to make sure they are not used, as
pushing and popping hi-registers requires extra moves to and from LO_REGS.

To fix this, this patch uses 'callee_saved_reg_p', which accounts for this
optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
'callee_saved_reg_p''s definition, as it does still take call used registers
into account, which aren't callee_saved in my opinion, so it is rather a
misnomer; it works to our advantage here though, as it does exactly what we need.

Regression tested on arm-none-eabi.

Is this Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:
2020-06-19  Andre Vieira  

PR target/95646
* config/arm/arm.c (cmse_nonsecure_entry_clear_before_return): Use
'callee_saved_reg_p' instead of 'call_used_or_fixed_reg_p'.

gcc/testsuite/ChangeLog:
2020-06-19  Andre Vieira  

PR target/95646
* gcc.target/arm/pr95646.c: New test.

(cherry picked from commit 5f426554fd804d65509875d706d8b8bc3a48393b)


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
781bcc8ca42e10524595cb6c90b61450a41f739e..6f4381fd6e959321d8d319fafdce4079c7b54e5f
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27011,7 +27011,7 @@ cmse_nonsecure_entry_clear_before_return (void)
continue;
   if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM))
continue;
-  if (call_used_or_fixed_reg_p (regno)
+  if (!callee_saved_reg_p (regno)
  && (!IN_RANGE (regno, FIRST_VFP_REGNUM, LAST_VFP_REGNUM)
  || TARGET_HARD_FLOAT))
bitmap_set_bit (to_clear_bitmap, regno);
diff --git a/gcc/testsuite/gcc.target/arm/pr95646.c 
b/gcc/testsuite/gcc.target/arm/pr95646.c
new file mode 100644
index 
..12d06a0c8c1ed7de1f8d4d15130432259e613a32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr95646.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } 
{ "-march=armv8-m.base" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mcpu=*" } { 
"-mcpu=cortex-m23" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfpu=*" } { 
} } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { 
"-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
+/* { dg-options "-mcpu=cortex-m23 -mcmse" } */
+/* { dg-additional-options "-Os" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+int __attribute__ ((cmse_nonsecure_entry))
+foo (void)
+{
+  return 1;
+}
+/* { { dg-final { scan-assembler-not "mov\tr9, r0" } } */
+
+/*
+** __acle_se_bar:
+** mov (r[0-3]), r9
+** push{\1}
+** ...
+** pop {(r[0-3])}
+** mov r9, \2
+** ...
+** bxnslr
+*/
+int __attribute__ ((cmse_nonsecure_entry))
+bar (void)
+{
+  asm ("": : : "r9");
+  return 1;
+}


[GCC-10 backport][PATCH] arm: Fix testisms introduced with fix for pr target/95646.

2021-05-05 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This is a backport to gcc-10, cleanly applied on the branch.

This patch changes the test to use the effective-target machinery and disables
the error message "ARMv8-M Security Extensions incompatible with selected FPU"
when -mfloat-abi=soft.
It further changes 'asm' to '__asm__' to avoid failures with '-std=' options.

Regression tested on arm-none-eabi.

Is this Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:
2020-07-06  Andre Vieira  

* config/arm/arm.c (arm_options_perform_arch_sanity_checks): Do not
check +D32 for CMSE if -mfloat-abi=soft

gcc/testsuite/ChangeLog:
2020-07-06  Andre Vieira  

* gcc.target/arm/pr95646.c: Fix testism.

(cherry picked from commit 80297f897758f59071968ddff2a04a8d11481117)


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
6f4381fd6e959321d8d319fafdce4079c7b54e5f..c3bbd9fd5e177f07b37610df57d4f02bd0402761
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3850,7 +3850,7 @@ arm_options_perform_arch_sanity_checks (void)
 
   /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
  and ARMv8-M Baseline and Mainline do not allow such configuration.  */
-  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+  if (use_cmse && TARGET_HARD_FLOAT && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
 error ("ARMv8-M Security Extensions incompatible with selected FPU");
 
 
diff --git a/gcc/testsuite/gcc.target/arm/pr95646.c 
b/gcc/testsuite/gcc.target/arm/pr95646.c
index 
12d06a0c8c1ed7de1f8d4d15130432259e613a32..cde1b2d9d36a4e39cd916fdcc9eef424a22bd589
 100644
--- a/gcc/testsuite/gcc.target/arm/pr95646.c
+++ b/gcc/testsuite/gcc.target/arm/pr95646.c
@@ -1,10 +1,7 @@
 /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } 
{ "-march=armv8-m.base" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mcpu=*" } { 
"-mcpu=cortex-m23" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfpu=*" } { 
} } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { 
"-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
-/* { dg-options "-mcpu=cortex-m23 -mcmse" } */
-/* { dg-additional-options "-Os" } */
+/* { dg-require-effective-target arm_arch_v8m_base_ok } */
+/* { dg-add-options arm_arch_v8m_base } */
+/* { dg-additional-options "-mcmse -Os" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 int __attribute__ ((cmse_nonsecure_entry))
@@ -27,6 +24,6 @@ foo (void)
 int __attribute__ ((cmse_nonsecure_entry))
 bar (void)
 {
-  asm ("": : : "r9");
+  __asm__ ("" : : : "r9");
   return 1;
 }




[PATCH] i386: Implement integer vector compares for 64bit vectors [PR98218]

2021-05-05 Thread Uros Bizjak via Gcc-patches
Implement integer vector compares for 64bit vectors for TARGET_MMX_WITH_SSE.

2021-05-05  Uroš Bizjak  

gcc/
PR target/98218
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
Handle V8QI, V4HI and V2SI modes.
* config/i386/i386.c (ix86_build_const_vector): Handle V2SImode.
(ix86_build_signbit_mask): Ditto.
* config/i386/mmx.md (MMXMODE14): New mode iterator.
(3): New expander.
(*mmx_3): New insn pattern.
(3): New expander.
(*mmx_3): New insn pattern.
(vec_cmp): New expander.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
(vcond_mask_): Ditto.

gcc/testsuite/

PR target/98218
* gcc.target/i386/pr98218-1.c: New test.
* gcc.target/i386/pr98218-1a.c: Ditto.
* gcc.target/i386/pr98218-2.c: Ditto.
* gcc.target/i386/pr98218-2a.c: Ditto.
* gcc.target/i386/pr98218-3.c: Ditto.
* gcc.target/i386/pr98218-3a.c: Ditto.
* gcc.dg/vect/vect-bool-cmp.c (dg-final):
Scan vect tree dump for "LOOP VECTORIZED", not VECTORIZED.

BTW: The reason for the xfail in pr98218-3a.c is that I didn't succeed in
getting an int[2] array vectorized to V2SI mode, even with
-fno-vect-cost-model.  The mode handling and patterns in the patch are
correct, as confirmed by the pr98218-3.c testcase.
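
For illustration, the kind of 64-bit vector compare these patterns enable
(a minimal sketch modeled on the new tests; the exact testcase contents are
an assumption here):

typedef int v2si __attribute__((vector_size (8)));

v2si
cmp_gt (v2si a, v2si b)
{
  /* A vector comparison yields a mask vector; with TARGET_MMX_WITH_SSE
     this can now go through the new V2SI min/max-based sequences.  */
  return a > b;
}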

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index fee4d07b7fd..4dfe7d6c282 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -4204,16 +4204,32 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
  else if (code == GT && TARGET_SSE4_1)
gen = gen_sminv16qi3;
  break;
+   case E_V8QImode:
+ if (code == GTU && TARGET_SSE2)
+   gen = gen_uminv8qi3;
+ else if (code == GT && TARGET_SSE4_1)
+   gen = gen_sminv8qi3;
+ break;
case E_V8HImode:
  if (code == GTU && TARGET_SSE4_1)
gen = gen_uminv8hi3;
  else if (code == GT && TARGET_SSE2)
gen = gen_sminv8hi3;
  break;
+   case E_V4HImode:
+ if (code == GTU && TARGET_SSE4_1)
+   gen = gen_uminv4hi3;
+ else if (code == GT && TARGET_SSE2)
+   gen = gen_sminv4hi3;
+ break;
case E_V4SImode:
  if (TARGET_SSE4_1)
gen = (code == GTU) ? gen_uminv4si3 : gen_sminv4si3;
  break;
+   case E_V2SImode:
+ if (TARGET_SSE4_1)
+   gen = (code == GTU) ? gen_uminv2si3 : gen_sminv2si3;
+ break;
case E_V2DImode:
  if (TARGET_AVX512VL)
{
@@ -4254,6 +4270,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
case E_V8SImode:
case E_V4DImode:
case E_V4SImode:
+   case E_V2SImode:
case E_V2DImode:
{
  rtx t1, t2, mask;
@@ -4278,7 +4295,9 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
case E_V32QImode:
case E_V16HImode:
case E_V16QImode:
+   case E_V8QImode:
case E_V8HImode:
+   case E_V4HImode:
  /* Perform a parallel unsigned saturating subtraction.  */
  x = gen_reg_rtx (mode);
  emit_insn (gen_rtx_SET
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 780da108a7c..06b0f5814ea 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15284,6 +15284,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, 
rtx value)
 case E_V16SImode:
 case E_V8SImode:
 case E_V4SImode:
+case E_V2SImode:
 case E_V8DImode:
 case E_V4DImode:
 case E_V2DImode:
@@ -15334,6 +15335,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, 
bool invert)
 case E_V8SFmode:
 case E_V4SFmode:
 case E_V2SFmode:
+case E_V2SImode:
   vec_mode = mode;
   imode = SImode;
   break;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4c2b724dc6f..347295afbb5 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -52,6 +52,7 @@ (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
 
 ;; Mix-n-match
 (define_mode_iterator MMXMODE12 [V8QI V4HI])
+(define_mode_iterator MMXMODE14 [V8QI V2SI])
 (define_mode_iterator MMXMODE24 [V4HI V2SI])
 (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
 
@@ -1417,6 +1418,31 @@ (define_insn "*sse2_umulv1siv1di3"
(set_attr "type" "mmxmul,ssemul,ssemul")
(set_attr "mode" "DI,TI,TI")])
 
+(define_expand "3"
+  [(set (match_operand:MMXMODE14 0 "register_operand")
+(smaxmin:MMXMODE14
+ (match_operand:MMXMODE14 1 "register_operand")
+ (match_operand:MMXMODE14 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE && 

[PATCH] phiopt, v2: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 01:39:32PM +0200, Richard Biener wrote:
> Can you in the above IL snippets mark COND_BB and MIDDLE_BB?
...

Thanks.
Here is an updated patch (attached) and interdiff from previous patch.
Ok for trunk if it passes bootstrap/regtest?

--- gcc/tree-ssa-phiopt.c   2021-05-03 17:49:54.233300624 +0200
+++ gcc/tree-ssa-phiopt.c   2021-05-05 15:06:23.253189139 +0200
@@ -65,7 +65,7 @@
 static bool xor_replacement (basic_block, basic_block,
 edge, edge, gimple *, tree, tree);
 static bool spaceship_replacement (basic_block, basic_block,
-  edge, edge, gimple *, tree, tree);
+  edge, edge, gphi *, tree, tree);
 static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block,
  edge, edge, gimple *,
  tree, tree);
@@ -1840,53 +1840,53 @@
 
 /* Attempt to optimize (x <=> y) cmp 0 and similar comparisons.
For strong ordering <=> try to match something like:
- :
+ :  // cond3_bb (== cond2_bb)
 if (x_4(D) != y_5(D))
   goto ; [INV]
 else
   goto ; [INV]
 
- :
+ :  // cond_bb
 if (x_4(D) < y_5(D))
   goto ; [INV]
 else
   goto ; [INV]
 
- :
+ :  // middle_bb
 
- :
+ :  // phi_bb
 # iftmp.0_2 = PHI <1(4), 0(2), -1(3)>
 _1 = iftmp.0_2 == 0;
 
and for partial ordering <=> something like:
 
- :
+ :  // cond3_bb
 if (a_3(D) == b_5(D))
   goto ; [50.00%]
 else
   goto ; [50.00%]
 
- [local count: 536870913]:
+ [local count: 536870913]:  // cond2_bb
 if (a_3(D) < b_5(D))
   goto ; [50.00%]
 else
   goto ; [50.00%]
 
- [local count: 268435456]:
+ [local count: 268435456]:  // cond_bb
 if (a_3(D) > b_5(D))
   goto ; [50.00%]
 else
   goto ; [50.00%]
 
- [local count: 134217728]:
+ [local count: 134217728]:  // middle_bb
 
- [local count: 1073741824]:
+ [local count: 1073741824]:  // phi_bb
 # SR.27_4 = PHI <0(2), -1(3), 1(4), 2(5)>
 _2 = SR.27_4 > 0;  */
 
 static bool
 spaceship_replacement (basic_block cond_bb, basic_block middle_bb,
-  edge e0, edge e1, gimple *phi,
+  edge e0, edge e1, gphi *phi,
   tree arg0, tree arg1)
 {
   if (!INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
@@ -1897,6 +1897,11 @@
   || !IN_RANGE (tree_to_shwi (arg1), -1, 2))
 return false;
 
+  basic_block phi_bb = gimple_bb (phi);
+  gcc_assert (phi_bb == e0->dest && phi_bb == e1->dest);
+  if (!IN_RANGE (EDGE_COUNT (phi_bb->preds), 3, 4))
+return false;
+
   use_operand_p use_p;
   gimple *use_stmt;
   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
@@ -1953,11 +1958,6 @@
   if (!empty_block_p (middle_bb))
 return false;
 
-  basic_block phi_bb = gimple_bb (phi);
-  gcc_assert (phi_bb == e0->dest && phi_bb == e1->dest);
-  if (!IN_RANGE (EDGE_COUNT (phi_bb->preds), 3, 4))
-return false;
-
   gcond *cond1 = as_a  (last_stmt (cond_bb));
   enum tree_code cmp1 = gimple_cond_code (cond1);
   if (cmp1 != LT_EXPR && cmp1 != GT_EXPR)
@@ -1965,7 +1965,7 @@
   tree lhs1 = gimple_cond_lhs (cond1);
   tree rhs1 = gimple_cond_rhs (cond1);
   /* The optimization may be unsafe due to NaNs.  */
-  if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
+  if (HONOR_NANS (TREE_TYPE (lhs1)))
 return false;
   if (TREE_CODE (lhs1) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs1))
 return false;
@@ -2180,11 +2180,6 @@
   gimple_assign_set_rhs1 (use_stmt, cond);
 }
   update_stmt (use_stmt);
-  if (cmp3 == EQ_EXPR)
-gimple_cond_make_true (as_a  (cond3));
-  else
-gimple_cond_make_false (as_a  (cond3));
-  update_stmt (cond3);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
 {
@@ -2201,6 +2196,13 @@
 
   if (has_debug_uses)
{
+ /* If there are debug uses, emit something like:
+# DEBUG D#1 => i_2(D) > j_3(D) ? 1 : -1
+# DEBUG D#2 => i_2(D) == j_3(D) ? 0 : D#1
+where > stands for the comparison that yielded 1
+and replace debug uses of phi result with that D#2.
+Ignore the value of 2, because if NaNs aren't expected,
+all floating point numbers should be comparable.  */
  gimple_stmt_iterator gsi = gsi_after_labels (gimple_bb (phi));
  tree type = TREE_TYPE (PHI_RESULT (phi));
  tree temp1 = make_node (DEBUG_EXPR_DECL);
@@ -2224,6 +2226,9 @@
}
 }
 
+  gimple_stmt_iterator psi = gsi_for_stmt (phi);
+  remove_phi_node (, true);
+
   return true;
 }
 


Jakub
2021-05-05  Jakub Jelinek  

PR tree-optimization/94589
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
spaceship_replacement.
(cond_only_block_p, spaceship_replacement): New functions.

* gcc.dg/pr94589-1.c: 

[PATCH] AArch64: Improve GOT addressing

2021-05-05 Thread Wilco Dijkstra via Gcc-patches

Improve GOT addressing by emitting the instructions as a pair.  This reduces
register pressure and improves code quality.  With -fPIC, code size improves by
0.65% and SPECINT2017 improves by 0.25%.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-05-05  Wilco Dijkstra  

* config/aarch64/aarch64.md (ldr_got_small_): Emit ADRP+LDR GOT 
sequence.
(ldr_got_small_sidi): Likewise.
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Remove 
tmp_reg.
(aarch64_print_operand): Correctly print got_lo12 in L specifier.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
641c83b479e76cbcc75b299eb7ae5f634d9db7cd..32c5c76d3c001a79d2a69b7f8243f1f1f605f901
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3625,27 +3625,21 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
rtx insn;
rtx mem;
-   rtx tmp_reg = dest;
machine_mode mode = GET_MODE (dest);
 
-   if (can_create_pseudo_p ())
- tmp_reg = gen_reg_rtx (mode);
-
-   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
if (mode == ptr_mode)
  {
if (mode == DImode)
- insn = gen_ldr_got_small_di (dest, tmp_reg, imm);
+ insn = gen_ldr_got_small_di (dest, imm);
else
- insn = gen_ldr_got_small_si (dest, tmp_reg, imm);
+ insn = gen_ldr_got_small_si (dest, imm);
 
mem = XVECEXP (SET_SRC (insn), 0, 0);
  }
else
  {
gcc_assert (mode == Pmode);
-
-   insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm);
+   insn = gen_ldr_got_small_sidi (dest, imm);
mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
  }
 
@@ -11019,7 +11013,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   switch (aarch64_classify_symbolic_expression (x))
{
case SYMBOL_SMALL_GOT_4G:
- asm_fprintf (asm_out_file, ":lo12:");
+ asm_fprintf (asm_out_file, ":got_lo12:");
  break;
 
case SYMBOL_SMALL_TLSGD:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
abfd84526745d029ad4953eabad6dd17b159a218..36c5c054f86e9cdd1f0945cdbc1beb47aa7ad80a
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6705,25 +6705,23 @@ (define_insn "add_losym_"
 
 (define_insn "ldr_got_small_"
   [(set (match_operand:PTR 0 "register_operand" "=r")
-   (unspec:PTR [(mem:PTR (lo_sum:PTR
- (match_operand:PTR 1 "register_operand" "r")
- (match_operand:PTR 2 "aarch64_valid_symref" 
"S")))]
+   (unspec:PTR [(mem:PTR (match_operand:PTR 1 "aarch64_valid_symref" "S"))]
UNSPEC_GOTSMALLPIC))]
   ""
-  "ldr\\t%0, [%1, #:got_lo12:%c2]"
-  [(set_attr "type" "load_")]
+  "adrp\\t%0, %A1\;ldr\\t%0, [%0, %L1]"
+  [(set_attr "type" "load_")
+   (set_attr "length" "8")]
 )
 
 (define_insn "ldr_got_small_sidi"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
-(unspec:SI [(mem:SI (lo_sum:DI
-(match_operand:DI 1 "register_operand" "r")
-(match_operand:DI 2 "aarch64_valid_symref" "S")))]
+(unspec:SI [(mem:SI (match_operand:DI 1 "aarch64_valid_symref" "S"))]
UNSPEC_GOTSMALLPIC)))]
   "TARGET_ILP32"
-  "ldr\\t%w0, [%1, #:got_lo12:%c2]"
-  [(set_attr "type" "load_4")]
+  "adrp\\t%0, %A1\;ldr\\t%w0, [%0, %L1]"
+  [(set_attr "type" "load_4")
+   (set_attr "length" "8")]
 )
 
 (define_insn "ldr_got_small_28k_"


Re: [PATCH][PR94156] Split COMDAT groups on target that do not support them

2021-05-05 Thread Richard Biener via Gcc-patches
On Tue, Mar 23, 2021 at 5:29 PM Markus Böck via Gcc-patches
 wrote:
>
> GCC at the moment uses COMDAT groups for things like virtual thunks,
> even on targets that do not support COMDAT groups. This has not been a
> problem as on platforms not supporting these (such as PE COFF on
> Windows), the backend handled it through directives to GAS. GCC would
> simply use a .linkonce directive telling the assembler that this
> symbol may occur multiple times, and GAS would translate that into a
> "select any" COMDAT, containing only the symbol itself (as Windows
> does support COMDAT, just not groups).
>
> When using LTO on Windows however, a few problems occur: The COMDAT
> group is transmitted as part of the IR and the linker (ld) will try to
> resolve symbols. On Windows the COMDAT information is put into the
> symbol table, instead of in sections, leading the linker to error
> out with a multiple reference error before even calling the
> lto-wrapper and LTRANS on the IR, which would otherwise resolve the
> use of COMDAT groups.
>
> This patch removes comdat groups for symbols in the ipa-visibility
> pass and instead puts them into their own comdat. An exception to this
> rule are aliases which (at least on Windows) are also allowed to be in
> the same comdat group as the symbol they are referencing.
>
> This fixes PR94156: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94156
> A previous discussion on the problems this patch attempts to fix were
> had here: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550148.html
>
> I tested this patch with a x86_64-w64-mingw32 target on a Linux host.
> No regressions between before and after of this patch were noted. The
> Test cases provided with this patch have been confirmed to reproduce
> before this patch, and link and work after the application of the
> patch.
>
> Feedback is very welcome, especially on implications I might not be aware of.

Honza - can you have a look here?

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2020-03-23  Markus Böck  
>
> * ipa-visibility.c (function_and_variable_visibility): Split
> COMDAT groups on targets not supporting them
>
> gcc/testsuite/ChangeLog:
>
> 2020-03-23  Markus Böck  
>
> * g++.dg/lto/pr94156.h: New test.
> * g++.dg/lto/pr94156_0.C: New test.
> * g++.dg/lto/pr94156_1.C: New test.
>
> --
> diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> index eb0ebf770e3..76f1a8ff72a 100644
> --- a/gcc/ipa-visibility.c
> +++ b/gcc/ipa-visibility.c
> @@ -709,6 +709,14 @@ function_and_variable_visibility (bool whole_program)
>   }
> node->dissolve_same_comdat_group_list ();
>   }
> +
> +  if (!HAVE_COMDAT_GROUP && node->same_comdat_group
> +  && !node->alias && !node->has_aliases_p())
> +{
> +  node->remove_from_same_comdat_group();
> +  node->set_comdat_group(DECL_ASSEMBLER_NAME_RAW(node->decl));
> +}
> +
>gcc_assert ((!DECL_WEAK (node->decl)
> && !DECL_COMDAT (node->decl))
>   || TREE_PUBLIC (node->decl)
> @@ -742,8 +750,11 @@ function_and_variable_visibility (bool whole_program)
>   {
> gcc_checking_assert (DECL_COMDAT (node->decl)
>  == DECL_COMDAT (decl_node->decl));
> -   gcc_checking_assert (node->in_same_comdat_group_p (decl_node));
> -   gcc_checking_assert (node->same_comdat_group);
> +   if (HAVE_COMDAT_GROUP)
> +{
> +  gcc_checking_assert (node->in_same_comdat_group_p
> (decl_node));
> +  gcc_checking_assert (node->same_comdat_group);
> +}
>   }
> node->forced_by_abi = decl_node->forced_by_abi;
> if (DECL_EXTERNAL (decl_node->decl))
> diff --git a/gcc/testsuite/g++.dg/lto/pr94156.h
> b/gcc/testsuite/g++.dg/lto/pr94156.h
> new file mode 100644
> index 000..3990ac46fcb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr94156.h
> @@ -0,0 +1,20 @@
> +class Base0 {
> +public:
> +  virtual ~Base0() {}
> +};
> +
> +class Base1 {
> +public:
> +  virtual void foo() = 0;
> +};
> +
> +class Base2 {
> +public:
> +  virtual void foo() = 0;
> +};
> +
> +class Derived : public Base0, public Base1, public Base2 {
> +public:
> +  virtual ~Derived();
> +  virtual void foo() override {}
> +};
> diff --git a/gcc/testsuite/g++.dg/lto/pr94156_0.C
> b/gcc/testsuite/g++.dg/lto/pr94156_0.C
> new file mode 100644
> index 000..1a2e30badc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr94156_0.C
> @@ -0,0 +1,6 @@
> +// { dg-lto-do link }
> +#include "pr94156.h"
> +
> +Derived::~Derived() {}
> +
> +int main() {}
> diff --git a/gcc/testsuite/g++.dg/lto/pr94156_1.C
> b/gcc/testsuite/g++.dg/lto/pr94156_1.C
> new file mode 100644
> index 000..d7a40efa96c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr94156_1.C
> @@ -0,0 +1,6 @@
> +#include "pr94156.h"
> +
> +void bar(Derived* p)
> +{
> +  p->foo();
> +}


Re: [PATCH] ipa-sra: Do not bail out when callers cannot be cloned

2021-05-05 Thread Richard Biener via Gcc-patches
On Tue, Apr 27, 2021 at 5:29 PM Martin Jambor  wrote:
>
> Hi,
>
> IPA-SRA fails to produce (very simple) edge summaries when a caller
> cannot be cloned or its signature cannot be changed, which makes it
> less powerful for no good reason.  This patch fixes that problem.
>
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for trunk?

OK.

> A common reason why we think that a function cannot change its
> signature is presence of function type attributes.  I dumped those
> that caused this in our testsuite on x86_64 and got:
>
>   - access
>   - alloc_align
>   - alloc_size
>   - externally_visible
>   - fn spec
>   - force_align_arg_pointer
>   - format
>   - interrupt
>   - ms_abi
>   - no_caller_saved_registers
>   - nocf_check
>   - nonnull
>   - regparm
>   - returns_nonnull
>   - sysv_abi
>   - transaction_callable
>   - transaction_may_cancel_outer
>   - transaction_pure
>   - transaction_safe
>   - transaction_unsafe
>   - type generic
>   - warn_unused_result
>
> and on an Aarch64 I have also seen aarch64_vector_pcs.
>
> Allowing to clone and modify functions with some of these attributes
> like returns_nonnull should be easy (it still should probably be
> dropped if the clone does not return anything at all) while doing so
> for attributes like fnspec would need massaging their value.  I am not
> sure what to think of the transaction related ones.

I guess one can classify attributes in those that can survive regardless
of signature changes, those that can be safely dropped (safely as in
no wrong-code, only missed optimizations) and those that can be
transformed (fixing missed optimizations).

I suppose some global attribute_handler[] indexed by some new
attribute key enum providing a class interface so an attribute
can implement its own IPA signature transform method (defaulting
to dropping) and predicate on whether it wants to be
exempted from IPA transforms (default to true) would be nice to have.

Unfortunately attribute handling is scattered amongst frontends at
the moment...

It would also finally allow the attributes to be stored with their ID,
not requiring string compares and eventually allowing a better
data structure for searching.
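
A rough sketch of what such an interface could look like (every name below
is invented for illustration; nothing of this exists in the tree today):

enum attribute_id { ATTR_NONNULL, ATTR_RETURNS_NONNULL, /* ..., */ ATTR_LAST };

struct attribute_handler
{
  /* Return true if a function carrying the attribute may still undergo
     IPA signature changes.  */
  bool (*allows_signature_change_p) (tree attr);
  /* Rewrite ATTR for a clone whose parameters were remapped according to
     PARAM_MAP (old index -> new index, -1 for a removed parameter);
     return NULL_TREE to drop the attribute.  */
  tree (*transform_for_signature) (tree attr, const int *param_map);
};

extern const struct attribute_handler attribute_handlers[ATTR_LAST];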

> (I have seen IPA-SRA remove unused return values of functions marked
> as malloc, so the alloc_align could also be dealt with but I am not
> sure how important it is.)
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-04-12  Martin Jambor  
>
> * ipa-sra.c (ipa_sra_dump_all_summaries): Dump edge summaries even
> when there is no function summary.
> (ipa_sra_summarize_function): produce edge summaries even when
> bailing out early.
>
> gcc/testsuite/ChangeLog:
>
> 2021-04-12  Martin Jambor  
>
> * gcc.dg/ipa/ipa-sra-1.c (main): Revert change done by
> 05193687dde, make the argv again pointer to an array.
> ---
>  gcc/ipa-sra.c| 45 +++-
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-1.c |  2 +-
>  2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
> index 7a89906cee6..3f90d4d81b6 100644
> --- a/gcc/ipa-sra.c
> +++ b/gcc/ipa-sra.c
> @@ -2795,27 +2795,27 @@ ipa_sra_dump_all_summaries (FILE *f)
>
>isra_func_summary *ifs = func_sums->get (node);
>if (!ifs)
> -   {
> - fprintf (f, "  Function does not have any associated IPA-SRA "
> -  "summary\n");
> - continue;
> -   }
> -  if (!ifs->m_candidate)
> -   {
> - fprintf (f, "  Not a candidate function\n");
> - continue;
> -   }
> -  if (ifs->m_returns_value)
> - fprintf (f, "  Returns value\n");
> -  if (vec_safe_is_empty (ifs->m_parameters))
> -   fprintf (f, "  No parameter information. \n");
> +   fprintf (f, "  Function does not have any associated IPA-SRA "
> +"summary\n");
>else
> -   for (unsigned i = 0; i < ifs->m_parameters->length (); ++i)
> - {
> -   fprintf (f, "  Descriptor for parameter %i:\n", i);
> -   dump_isra_param_descriptor (f, &(*ifs->m_parameters)[i]);
> - }
> -  fprintf (f, "\n");
> +   {
> + if (!ifs->m_candidate)
> +   {
> + fprintf (f, "  Not a candidate function\n");
> + continue;
> +   }
> + if (ifs->m_returns_value)
> +   fprintf (f, "  Returns value\n");
> + if (vec_safe_is_empty (ifs->m_parameters))
> +   fprintf (f, "  No parameter information. \n");
> + else
> +   for (unsigned i = 0; i < ifs->m_parameters->length (); ++i)
> + {
> +   fprintf (f, "  Descriptor for parameter %i:\n", i);
> +   dump_isra_param_descriptor (f, &(*ifs->m_parameters)[i]);
> + }
> + fprintf (f, "\n");
> +   }
>
>struct cgraph_edge *cs;
>for (cs = node->callees; cs; cs = cs->next_callee)
> @@ -4063,7 +4063,10 @@ 

Re: [wwwdocs] Remove CC0 from backends.html

2021-05-05 Thread Segher Boessenkool
On Tue, May 04, 2021 at 06:30:04PM +0200, Eric Botcazou wrote:
> > Pushed.  What is next?  :-)
> 
> You can finally remove powerpcspe. :-)

Done, thanks!


Segher


[wwwdocs] Remove powerpcspe from backends.html

2021-05-05 Thread Segher Boessenkool
Committed.


Segher


---
 htdocs/backends.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/htdocs/backends.html b/htdocs/backends.html
index 8034a5776360..f80378b90170 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -103,7 +103,6 @@ nios2  | C   ia
 nvptx  |   S Q   Cqmg   e
 pa | Q   CBD  qr b   i  e
 pdp11  |L   ICqr b  e
-powerpcspe | Q   Cqrpb   ia
 pru|L  F  a  s
 riscv  | Q   Cqrgia
 rl78   |L  F l  gs
-- 
1.8.3.1



Re: [PATCH] Remove CC0

2021-05-05 Thread Segher Boessenkool
Hi~

On Tue, May 04, 2021 at 04:08:22PM +0100, Richard Earnshaw wrote:
> On 03/05/2021 23:55, Segher Boessenkool wrote:
> >CC_STATUS_INIT is suggested in final.c to also be useful for ports that
> >are not CC0, and at least arm seems to use it for something.  So I am
> >leaving that alone, but most targets that have it could remove it.
> 
> A quick look through the code suggests it's being used for thumb1 code 
> gen to try to reproduce the traditional CC0 type behaviour of 
> eliminating redundant compare operations when you have sequences such as
> 
> cmp a, b
> b d1
> cmp a, b
> b d2
> 
> The second compare operation can be eliminated.
> 
> It might be possible to eliminate this another way by reworking the 
> thumb1 codegen to expose the condition codes after register allocation 
> has completed (much like x86 does these days), but that would be quite a 
> lot of work right now.  I don't know if such splitting would directly 
> lead to the ability to remove the redundant compares - it might need a 
> new pass to spot them.

At least on rs6000 on a simple example this is handled by fwprop1
already.  Does that work for thumb1?  Or maybe that uses hard regs for
the condition codes and that does not work here?

Example code:

===
void g(void);
void h(void);
void i(void);
void f(long a, long b)
{
if (a < b)
g();
if (a == b)
h();
if (a > b)
i();
}
===


Segher


RE: [GCC][PATCH] arm: Remove duplicate definitions from arm_mve.h (pr100419).

2021-05-05 Thread Srinath Parvathaneni via Gcc-patches
Hi Richard,

> -Original Message-
> From: Richard Earnshaw 
> Sent: 05 May 2021 11:15
> To: Srinath Parvathaneni ; gcc-
> patc...@gcc.gnu.org
> Cc: Richard Earnshaw 
> Subject: Re: [GCC][PATCH] arm: Remove duplicate definitions from
> arm_mve.h (pr100419).
> 
> 
> 
> On 05/05/2021 10:56, Srinath Parvathaneni via Gcc-patches wrote:
> > Hi All,
> >
> > This patch removes several duplicated intrinsic definitions from
> > arm_mve.h mentioned in PR100419 and also fixes the wrong arguments
> > in a few of the intrinsics' polymorphic variants.
> >
> > Regression tested and found no issues.
> >
> > Ok for master ? GCC-11 and GCC-10 branch backports?
> > gcc/ChangeLog:
> >
> > 2021-05-04  Srinath Parvathaneni  
> >
> >  PR target/100419
> >  * config/arm/arm_mve.h (__arm_vstrwq_scatter_offset): Fix wrong
> arguments.
> >  (__arm_vcmpneq): Remove duplicate definition.
> >  (__arm_vstrwq_scatter_offset_p): Likewise.
> >  (__arm_vmaxq_x): Likewise.
> >  (__arm_vmlsdavaq): Likewise.
> >  (__arm_vmlsdavaxq): Likewise.
> >  (__arm_vmlsdavq_p): Likewise.
> >  (__arm_vmlsdavxq_p): Likewise.
> >  (__arm_vrmlaldavhaq): Likewise.
> >  (__arm_vstrbq_p): Likewise.
> >  (__arm_vstrbq_scatter_offset): Likewise.
> >  (__arm_vstrbq_scatter_offset_p): Likewise.
> >  (__arm_vstrdq_scatter_offset): Likewise.
> >  (__arm_vstrdq_scatter_offset_p): Likewise.
> >  (__arm_vstrdq_scatter_shifted_offset): Likewise.
> >  (__arm_vstrdq_scatter_shifted_offset_p): Likewise.
> >
> > Co-authored-by: Joe Ramsay  
> 
> Let's take this example:
> 
> -#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 =
> (p1); \
> +#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 =
> (p0); \
> __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]:
> __arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(p0, int32_t *), __p1,
> __ARM_mve_coerce(__p2, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]:
> __arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(p0, uint32_t *),
> __p1,
> __ARM_mve_coerce(__p2, uint32x4_t)));})
> +  _Generic( (int
> (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]:
> __arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int32_t *), p1,
> __ARM_mve_coerce(__p2, int32x4_t)), \
> +  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]:
> __arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *),
> p1,
> __ARM_mve_coerce(__p2, uint32x4_t)));})
> 
> It removes the safe shadow copy of p1 but adds a safe shadow copy of p0.
>   Why?  Isn't it better (and safer) to just create shadow copies of all
> the arguments and let the compiler worry about when it's safe to
> eliminate them?

As you already know, polymorphic variants are used to select the intrinsics 
based on the types of their arguments.

Consider the following code from arm_mve.h:
__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_s32 (int32_t * __base, uint32x4_t __offset, 
int32x4_t __value)
{
  __builtin_mve_vstrwq_scatter_offset_sv4si ((__builtin_neon_si *) __base, 
__offset, __value);
}

__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_u32 (uint32_t * __base, uint32x4_t __offset, 
uint32x4_t __value)
{
  __builtin_mve_vstrwq_scatter_offset_uv4si ((__builtin_neon_si *) __base, 
__offset, __value);
}

__extension__ extern __inline void
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
__arm_vstrwq_scatter_offset_f32 (float32_t * __base, uint32x4_t __offset, 
float32x4_t __value)
{
  __builtin_mve_vstrwq_scatter_offset_fv4sf ((__builtin_neon_si *) __base, 
__offset, __value);
}

Which of the above 3 functions is to be called from the following 
polymorphic variant is
decided based on the types of arguments p0, p1 and p2.
#define __arm_vstrwq_scatter_offset(p0,p1,p2)

For the 3 function definitions mentioned above, only the types of arguments 1 
(p0) and 3 (p2) vary,
whereas the type of the second argument (p1) is the same (uint32x4_t).

This is the reason we need shadow copies of only p0 and p2 to determine the 
actual function to be called;
the type of p1 is irrelevant. Previously p1 was wrongly used instead of p0 
to determine the function,
and that is the bug fixed by this patch.

Since the type of p1 is irrelevant in deciding the function to be called, I 
believe adding a shadow copy
for p1 (__typeof(p1) __p1 = (p1)) in this macro expansion is of no use. 
Considering we have more than
250 polymorphic variants defined in the arm_mve.h header, this would result in 
more than 250 lines of extra code.
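
To make this concrete, here is a minimal, self-contained sketch of the same
_Generic dispatch idiom (hypothetical function names, not the real MVE
intrinsics): only p0 and p2, whose types drive the selection, get shadow
copies, while p1 is passed through unchanged.

#include <stdint.h>

static void store_s32 (int32_t *base, uint32_t off, int32_t val)
{ base[off] = val; }
static void store_u32 (uint32_t *base, uint32_t off, uint32_t val)
{ base[off] = val; }

#define store(p0, p1, p2)                                                  \
  ({ __typeof(p0) __p0 = (p0);                                             \
     __typeof(p2) __p2 = (p2);                                             \
     _Generic (__p0,                                                       \
       int32_t *:  store_s32 ((int32_t *)  __p0, (p1), (int32_t)  __p2),   \
       uint32_t *: store_u32 ((uint32_t *) __p0, (p1), (uint32_t) __p2)); })

/* usage: int32_t buf[4]; store (buf, 1u, 42);  selects store_s32.  */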

Regards,
Srinath.

> R.
> 
> 

Re: [patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 02:19:27PM +0200, Eric Botcazou wrote:
> > At least for NOTE_INSN_BASIC_BLOCK skipping more than one might
> > be problematic, because that would mean we've skipped into a different basic
> > block and it wouldn't surprise me if split_block in that case crashed or
> > did something weird (if the first argument is not BLOCK_FOR_INSN of the
> > second argument when it is non-NULL).
> > For the other notes, I think they should normally appear just once and
> > shouldn't be a problem therefore.
> 
> OK, version essentially equivalent to the original one, but with a loop.

LGTM.

> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index f05cb6136c7..17edc4f37ad 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -2145,7 +2145,11 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>if (NOTE_INSN_BASIC_BLOCK_P (newpos1))
>  newpos1 = NEXT_INSN (newpos1);
>  
> -  while (DEBUG_INSN_P (newpos1))
> +  /* Skip also prologue and function markers.  */
> +  while (DEBUG_INSN_P (newpos1)
> +  || (NOTE_P (newpos1)
> +  && (NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END
> +  || NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)))
>  newpos1 = NEXT_INSN (newpos1);
>  
>redirect_from = split_block (src1, PREV_INSN (newpos1))->src;


Jakub



Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Richard Biener
On Wed, 5 May 2021, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> So I'm trying to look at what IVOPTs does right now and how it might be able
> to help us. Looking at these two code examples:
> #include <stddef.h>
> #if 0
> int foo(short * a, short * b, unsigned int n)
> {
>     int sum = 0;
>     for (unsigned int i = 0; i < n; ++i)
>     sum += a[i] + b[i];
> 
>     return sum;
> }
> 
> 
> #else
> 
> int bar (short * a, short *b, unsigned int n)
> {
>     int sum = 0;
>     unsigned int i = 0;
>     for (; i < (n / 16); i += 1)
>     {
>     // Iterates [0, 16, .., (n/16 * 16) * 16]
>     // Example n = 127,
>     // iterates [0, 16, 32, 48, 64, 80, 96, 112]
>     sum += a[i*16] + b[i*16];
>     }
>     for (size_t j =  (size_t) ((n / 16) * 16); j < n; ++j)
>     {
>     // Iterates [(n/16 * 16) * 16 , (((n/16 * 16) + 1) * 16)... ,n*16]
>     // Example n = 127,
>     // j starts at (127/16) * 16 = 7 * 16 = 112,
>     // So iterates over [112, 113, 114, 115, ..., 127]
>     sum += a[j] + b[j];
>     }
>     return sum;
> }
> #endif
> 
> Compiled the bottom one (#if 0) with 'aarch64-linux-gnu' with the following
> options '-O3 -march=armv8-a -fno-tree-vectorize -fdump-tree-ivopts-all
> -fno-unroll-loops'. See godbolt link here: https://godbolt.org/z/MEf6j6ebM
> 
> I tried to see what IVOPTs would make of this and it is able to analyze the
> IVs but it doesn't realize (not even sure it tries) that one IV's end (loop 1)
> could be used as the base for the other (loop 2). I don't know if this is
> where you'd want such optimizations to be made, on one side I think it would
> be great as it would also help with non-vectorized loops as you alluded to.

Hmm, OK.  So there's the first loop that has a looparound jump and thus
we do not always enter the 2nd loop with the first loop final value of the
IV.  But yes, IVOPTs does not try to allocate IVs across multiple loops.
And for a followup transform to catch this it would need to compute
the final value of the IV and then match this up with the initial
value computation.  I suppose FRE could be taught to do this, at
least for very simple cases.
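
Concretely, on the 'bar' example above that match would amount to rewriting
it by hand as below (sketch; it relies on i == n / 16 holding on every path
reaching the second loop):

#include <stddef.h>

int bar2 (short *a, short *b, unsigned int n)
{
    int sum = 0;
    unsigned int i = 0;
    for (; i < n / 16; i += 1)
        sum += a[i*16] + b[i*16];
    /* Reuse loop 1's final IV value: i == n / 16 here, so i * 16
       replaces the recomputed (n / 16) * 16.  */
    for (size_t j = (size_t) i * 16; j < n; ++j)
        sum += a[j] + b[j];
    return sum;
}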

> However, if you compile the top test case (#if 1) and let the tree-vectorizer
> have a go you will see different behaviours for different vectorization
> approaches, so for:
> '-O3 -march=armv8-a', using NEON and epilogue vectorization it seems IVOPTs
> only picks up on one loop.

Yep, the "others" are likely fully peeled because they have just a single
iteration.  Again some kind-of final-value replacement "CSE" could
eventually help - but then we have jump-arounds here as well thus
we'd need final-value replacement "PRE".

> If you use '-O3 -march=armv8-a+sve --param vect-partial-vector-usage=1' it
> will detect two loops. This may well be because in fact epilogue vectorization
> 'un-loops' it because it knows it will only have to do one iteration of the
> vectorized epilogue. vect-partial-vector-usage=1 could have done the same, but
> because we are dealing with polymorphic vector modes it fails to. I have a
> hack that works for vect-partial-vector-usage to avoid it, but I think we can
> probably do better and try to reason about boundaries in poly_int's rather
> than integers (TBC).
> 
> Anyway I diverge. Back to the main question of this patch. How do you suggest
> I go about this? Is there a way to make IVOPTS aware of the 'iterate-once' IVs
> in the epilogue(s) (both vector and scalar!) and then teach it to merge IV's
> if one ends where the other begins?

I don't think we will make that work easily.  So indeed attacking this
in the vectorizer sounds most promising.  I'll note there's also
the issue of epilogue vectorization and reductions where we seem
to not re-use partially reduced reduction vectors but instead
reduce to a scalar in each step.  That's a related issue - we're
not able to carry forward a (reduction) IV we generated for the
main vector loop to the epilogue loops.  Like for

double foo (double *a, int n)
{
  double sum = 0.;
  for (int i = 0; i < n; ++i)
sum += a[i];
  return sum;
}

with AVX512 we get three reductions to scalars instead of
a partial reduction from zmm to ymm before the first vectorized
epilogue followed by a reduction from ymm to xmm before the second
(the jump around for the epilogues need to jump to the further
reduction piece obviously).
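
For illustration, the partial-reduction step between the main loop and the
first epilogue could look like the following sketch using GCC vector
extensions (types and names are hypothetical):

typedef double v8df __attribute__ ((vector_size (64)));  /* zmm */
typedef double v4df __attribute__ ((vector_size (32)));  /* ymm */

static v4df
narrow_partial_sum (v8df acc)
{
  /* Fold the 512-bit accumulator into a 256-bit partial sum that the
     first vectorized epilogue can keep accumulating into, instead of
     reducing all the way to a scalar.  */
  v4df lo = { acc[0], acc[1], acc[2], acc[3] };
  v4df hi = { acc[4], acc[5], acc[6], acc[7] };
  return lo + hi;
}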

So I think we want to record IVs we generate (the reduction IVs
are already nicely associated with the stmt-infos), one might
consider to refer to them from the dr_vec_info for example.

It's just going to be "interesting" to wire everything up
correctly with all the jump-arounds we have ...

> On 04/05/2021 10:56, Richard Biener wrote:
> > On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> The aim of this RFC is to explore a way of cleaning up the codegen around
> >> data_references.  To be specific, I'd like to reuse the main-loop's updated
> >> data_reference as the base_address 


Re: [patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Eric Botcazou
> At least for NOTE_INSN_BASIC_BLOCK skipping more than one might
> be problematic, because that would mean we've skipped into a different basic
> block and it wouldn't surprise me if split_block in that case crashed or
> did something weird (if the first argument is not BLOCK_FOR_INSN of the
> second argument when it is non-NULL).
> For the other notes, I think they should normally appear just once and
> shouldn't be a problem therefore.

OK, version essentially equivalent to the original one, but with a loop.

-- 
Eric Botcazou

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index f05cb6136c7..17edc4f37ad 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -2145,7 +2145,11 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
   if (NOTE_INSN_BASIC_BLOCK_P (newpos1))
 newpos1 = NEXT_INSN (newpos1);
 
-  while (DEBUG_INSN_P (newpos1))
+  /* Skip also prologue and function markers.  */
+  while (DEBUG_INSN_P (newpos1)
+	 || (NOTE_P (newpos1)
+	 && (NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END
+		 || NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)))
 newpos1 = NEXT_INSN (newpos1);
 
   redirect_from = split_block (src1, PREV_INSN (newpos1))->src;


[committed] libstdc++: Add tests for std::invoke feature test macro

2021-05-05 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/20_util/function_objects/invoke/3.cc: Check feature
test macro.
* testsuite/20_util/function_objects/invoke/version.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 29745bf06276b9628d08ef1c9e28890cc56df4aa
Author: Jonathan Wakely 
Date:   Wed May 5 12:41:14 2021

libstdc++: Add tests for std::invoke feature test macro

libstdc++-v3/ChangeLog:

* testsuite/20_util/function_objects/invoke/3.cc: Check feature
test macro.
* testsuite/20_util/function_objects/invoke/version.cc: New test.

diff --git a/libstdc++-v3/testsuite/20_util/function_objects/invoke/3.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/invoke/3.cc
index 5ffc5dedbab..40e068e86fd 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/invoke/3.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/invoke/3.cc
@@ -15,11 +15,16 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++17" }
 // { dg-do compile { target c++17 } }
 
 #include <functional>
 
+#ifndef __cpp_lib_invoke
+# error Feature-test macro for invoke is missing in <functional>
+#elif __cpp_lib_invoke < 201411L
+# error Feature-test macro for invoke has the wrong value in <functional>
+#endif
+
 struct abstract {
   virtual ~abstract() = 0;
   void operator()() noexcept;
diff --git a/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc
new file mode 100644
index 000..cf1a46a1ada
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++17 } }
+
+#include <version>
+
+#ifndef __cpp_lib_invoke
+# error Feature-test macro for invoke is missing in <version>
+#elif __cpp_lib_invoke < 201411L
+# error Feature-test macro for invoke has the wrong value in <version>
+#endif


[committed] libstdc++: Use unsigned char argument to std::isdigit

2021-05-05 Thread Jonathan Wakely via Gcc-patches
Passing plain char to isdigit is undefined if the value is negative.
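
For illustration (not part of the patch): with a signed plain char, a byte
like '\xe9' converts to a negative int, so the conversion to unsigned char
has to happen first.

#include <cctype>

bool is_digit_safe(char c)
{ return std::isdigit(static_cast<unsigned char>(c)) != 0; }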

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum): Pass unsigned
char to std::isdigit.

Tested powerpc64le-linux. Committed to trunk.

commit d0d6ca019717305df0ef41e3fe1da48f7f561fac
Author: Jonathan Wakely 
Date:   Wed May 5 11:19:55 2021

libstdc++: Use unsigned char argument to std::isdigit

Passing plain char to isdigit is undefined if the value is negative.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum): Pass unsigned
char to std::isdigit.

diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index 193702e677a..571be075a6b 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -565,7 +565,7 @@ namespace __detail
   while (__first != __last)
{
  unsigned char __c = *__first;
- if (std::isdigit(__c))
+ if (std::isdigit(static_cast<unsigned char>(__c)))
__c -= '0';
  else
{


Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Marc Glisse

On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:


2) the pr94589-2.C testcase should be matching just 12 times each, but runs
into operator>=(strong_ordering, unspecified) being defined as
(_M_value&1)==_M_value
rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
but is not a single use but two uses.  I'll need to pattern match that case
specially.


Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0; 
that could be worth doing already in GIMPLE.
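
The two forms do agree exactly; a quick standalone check (illustration
only): (x & 1) == x and (x & -2) == 0 both hold iff x is 0 or 1.

#include <assert.h>

int main (void)
{
  for (int x = -8; x <= 8; ++x)
    assert (((x & 1) == x) == ((x & -2) == 0));
  return 0;
}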


--
Marc Glisse


Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Richard Biener
On Tue, 4 May 2021, Jakub Jelinek wrote:

> Hi!
> 
> genericize_spaceship genericizes i <=> j to approximately
> ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
> for strong ordering and
> ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; 
> else c = 2; c; })
> for partial ordering.
> The C++ standard supports then == or != comparisons of that against
> strong/partial ordering enums, or />= comparisons of <=> result
> against literal 0.
> 
> In some cases we already optimize that but in many cases we keep performing
> all the 2 or 3 comparisons, compute the spaceship value and then compare
> that.
> 
> The following patch recognizes those patterns if the <=> operands are
> integral types or floating point (the latter only for -ffast-math) and
> optimizes it to the single comparison that is needed (plus adds debug stmts
> if needed for the spaceship result).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> There are two things I'd like to address in a follow-up:
> 1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
> is what I've copied from elsewhere in phiopt, but thinking about it,
> alll we care is probably only HONOR_NANS, the matched pattern starts with
> == or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
> equal, which should be the case for signed zero differences.
> 2) the pr94589-2.C testcase should be matching just 12 times each, but runs
> into operator>=(strong_ordering, unspecified) being defined as
> (_M_value&1)==_M_value
> rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
> but is not a single use but two uses.  I'll need to pattern match that case
> specially.
> 
> 2021-05-04  Jakub Jelinek  
> 
>   PR tree-optimization/94589
>   * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
>   spaceship_replacement.
>   (cond_only_block_p, spaceship_replacement): New functions.
> 
>   * gcc.dg/pr94589-1.c: New test.
>   * gcc.dg/pr94589-2.c: New test.
>   * gcc.dg/pr94589-3.c: New test.
>   * gcc.dg/pr94589-4.c: New test.
>   * g++.dg/opt/pr94589-1.C: New test.
>   * g++.dg/opt/pr94589-2.C: New test.
>   * g++.dg/opt/pr94589-3.C: New test.
>   * g++.dg/opt/pr94589-4.C: New test.
> 
> --- gcc/tree-ssa-phiopt.c.jj  2021-05-02 10:17:49.095397758 +0200
> +++ gcc/tree-ssa-phiopt.c 2021-05-03 17:49:54.233300624 +0200
> @@ -64,6 +64,8 @@ static bool abs_replacement (basic_block
>edge, edge, gimple *, tree, tree);
>  static bool xor_replacement (basic_block, basic_block,
>edge, edge, gimple *, tree, tree);
> +static bool spaceship_replacement (basic_block, basic_block,
> +edge, edge, gimple *, tree, tree);
>  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
> basic_block,
> edge, edge, gimple *,
> tree, tree);
> @@ -357,6 +359,8 @@ tree_ssa_phiopt_worker (bool do_store_el
>   cfgchanged = true;
> else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
>   cfgchanged = true;
> +   else if (spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> + cfgchanged = true;
>   }
>  }
>  
> @@ -1806,6 +1810,420 @@ minmax_replacement (basic_block cond_bb,
>  
>return true;
>  }
> +
> +/* Return true if the only executable statement in BB is a GIMPLE_COND.  */
> +
> +static bool
> +cond_only_block_p (basic_block bb)
> +{
> +  /* BB must have no executable statements.  */
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  if (phi_nodes (bb))
> +return false;
> +  while (!gsi_end_p (gsi))
> +{
> +  gimple *stmt = gsi_stmt (gsi);
> +  if (is_gimple_debug (stmt))
> + ;
> +  else if (gimple_code (stmt) == GIMPLE_NOP
> +|| gimple_code (stmt) == GIMPLE_PREDICT
> +|| gimple_code (stmt) == GIMPLE_COND)
> + ;
> +  else
> + return false;
> +  gsi_next (&gsi);
> +}
> +  return true;
> +}
> +
> +/* Attempt to optimize (x <=> y) cmp 0 and similar comparisons.
> +   For strong ordering <=> try to match something like:
> + :
> +if (x_4(D) != y_5(D))
> +  goto ; [INV]
> +else
> +  goto ; [INV]
> +
> + :
> +if (x_4(D) < y_5(D))
> +  goto ; [INV]
> +else
> +  goto ; [INV]
> +
> + :
> +
> + :
> +# iftmp.0_2 = PHI <1(4), 0(2), -1(3)>
> +_1 = iftmp.0_2 == 0;
> +
> +   and for partial ordering <=> something like:
> +
> + :
> +if (a_3(D) == b_5(D))
> +  goto ; [50.00%]
> +else
> +  goto ; [50.00%]
> +
> + [local count: 536870913]:
> +if (a_3(D) < b_5(D))
> +  goto ; [50.00%]
> +else
> +  goto ; [50.00%]
> +
> + [local count: 

Re: [patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 01:21:20PM +0200, Eric Botcazou wrote:
> > I mean, can't we have just one while loop that skips over all debug insns,
> > NOTE_INSN_BASIC_BLOCK_P, NOTE_INSN_PROLOGUE_END and NOTE_INSN_FUNCTION_BEG
> > and stops on anything else, or, if we want to skip at most one of some or
> > all of those note kinds, do some tracking if we've already skipped that kind
> > of note?
> 
> Revised version attached.

It was meant as a question, I don't know what the right answer is.
At least for NOTE_INSN_BASIC_BLOCK skipping more than one might
be problematic, because that would mean we've skipped into a different basic
block and it wouldn't surprise me if split_block in that case crashed or
did something weird (if the first argument is not BLOCK_FOR_INSN of the
second argument when it is non-NULL).
For the other notes, I think they should normally appear just once and
shouldn't be a problem therefore.

> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index f05cb6136c7..f49a34dcb0f 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -2134,18 +2134,15 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>update_br_prob_note (redirect_edges_to);
>  
>/* Edit SRC1 to go to REDIRECT_TO at NEWPOS1.  */
> -
> -  /* Skip possible basic block header.  */
>if (LABEL_P (newpos1))
>  newpos1 = NEXT_INSN (newpos1);
>  
> -  while (DEBUG_INSN_P (newpos1))
> -newpos1 = NEXT_INSN (newpos1);
> -
> -  if (NOTE_INSN_BASIC_BLOCK_P (newpos1))
> -newpos1 = NEXT_INSN (newpos1);
> -
> -  while (DEBUG_INSN_P (newpos1))
> +  /* Skip debug insns, basic block header and prologue markers.  */
> +  while (DEBUG_INSN_P (newpos1)
> +  || (NOTE_P (newpos1)
> +  && (NOTE_KIND (newpos1) == NOTE_INSN_BASIC_BLOCK
> +  || NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END
> +  || NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)))
>  newpos1 = NEXT_INSN (newpos1);
>  
>redirect_from = split_block (src1, PREV_INSN (newpos1))->src;


Jakub



Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches

Hi Richi,

So I'm trying to look at what IVOPTs does right now and how it might be 
able to help us. Looking at these two code examples:

#include <stddef.h>
#if 0
int foo(short * a, short * b, unsigned int n)
{
    int sum = 0;
    for (unsigned int i = 0; i < n; ++i)
    sum += a[i] + b[i];

    return sum;
}


#else

int bar (short * a, short *b, unsigned int n)
{
    int sum = 0;
    unsigned int i = 0;
    for (; i < (n / 16); i += 1)
    {
    // Iterates [0, 16, .., (n/16 * 16) * 16]
    // Example n = 127,
    // iterates [0, 16, 32, 48, 64, 80, 96, 112]
    sum += a[i*16] + b[i*16];
    }
    for (size_t j =  (size_t) ((n / 16) * 16); j < n; ++j)
    {
    // Iterates [(n/16 * 16) * 16 , (((n/16 * 16) + 1) * 16)... ,n*16]
    // Example n = 127,
    // j starts at (127/16) * 16 = 7 * 16 = 112,
    // So iterates over [112, 113, 114, 115, ..., 127]
    sum += a[j] + b[j];
    }
    return sum;
}
#endif

Compiled the bottom one (#if 0) with 'aarch64-linux-gnu' with the 
following options '-O3 -march=armv8-a -fno-tree-vectorize 
-fdump-tree-ivopts-all -fno-unroll-loops'. See godbolt link here: 
https://godbolt.org/z/MEf6j6ebM


I tried to see what IVOPTs would make of this and it is able to analyze 
the IVs but it doesn't realize (not even sure it tries) that one IV's 
end (loop 1) could be used as the base for the other (loop 2). I don't 
know if this is where you'd want such optimizations to be made, on one 
side I think it would be great as it would also help with non-vectorized 
loops as you alluded to.


However, if you compile the top test case (#if 1) and let the 
tree-vectorizer have a go you will see different behaviours for 
different vectorization approaches, so for:
'-O3 -march=armv8-a', using NEON and epilogue vectorization it seems 
IVOPTs only picks up on one loop.
If you use '-O3 -march=armv8-a+sve --param vect-partial-vector-usage=1' 
it will detect two loops. This may well be because in fact epilogue 
vectorization 'un-loops' it because it knows it will only have to do one 
iteration of the vectorized epilogue. vect-partial-vector-usage=1 could 
have done the same, but because we are dealing with polymorphic vector 
modes it fails to. I have a hack that works for 
vect-partial-vector-usage to avoid it, but I think we can probably do 
better and try to reason about boundaries in poly_int's rather than 
integers (TBC).


Anyway I diverge. Back to the main question of this patch. How do you 
suggest I go about this? Is there a way to make IVOPTS aware of the 
'iterate-once' IVs in the epilogue(s) (both vector and scalar!) and then 
teach it to merge IV's if one ends where the other begins?


On 04/05/2021 10:56, Richard Biener wrote:

On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:


Hi,

The aim of this RFC is to explore a way of cleaning up the codegen around
data_references.  To be specific, I'd like to reuse the main-loop's updated
data_reference as the base_address for the epilogue's corresponding
data_reference, rather than use the niters.  We have found this leads to
better codegen in the vectorized epilogue loops.

The approach in this RFC creates a map of iv_updates which always contains an
updated pointer that is captured in vectorizable_{load,store}; an iv_update may
also contain a skip_edge in case we decide the vectorization can be skipped in
'vect_do_peeling'. During the epilogue update this map of iv_updates is then
checked to see if it contains an entry for a data_reference and it is used
accordingly and if not it reverts back to the old behavior of using the niters
to advance the data_reference.
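
For reference, a rough sketch of the shape of that map, using existing GCC
types (member and variable names invented for illustration):

struct iv_update
{
  tree updated_pointer;  /* IV value captured in vectorizable_{load,store}.  */
  edge skip_edge;        /* Non-null when vect_do_peeling may skip the loop.  */
};

static hash_map<struct data_reference *, iv_update> *iv_updates_map;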

The motivation for this work is to improve codegen for the option `--param
vect-partial-vector-usage=1` for SVE. We found that one of the main problems
for the codegen here was coming from unnecessary conversions caused by the way
we update the data_references in the epilogue.

This patch passes regression tests in aarch64-linux-gnu, but the codegen is
still not optimal in some cases. Specifically those where we have a scalar
epilogue, as this does not use the data_references and will rely on the
gimple scalar code, thus constructing again a memory access using the niters.
This is a limitation for which I haven't quite worked out a solution yet and
does cause some minor regressions due to unfortunate spills.

Let me know what you think and if you have ideas of how we can better achieve
this.

Hmm, so the patch adds a kludge to improve the kludge we have in place ;)

I think it might be interesting to create a C testcase mimicking the
update problem without involving the vectorizer.  That way we can
see how the various components involved behave (FRE + ivopts most
specifically).

That said, a cleaner approach to dealing with this would be to
explicitly track the IVs we generate for vectorized DRs, eventually
factoring that out from vectorizable_{store,load} so we can simply
carry over the actual pointer IV final value to the epilogue as
initial 

Re: [PATCH][_GLIBCXX_DEBUG] libbacktrace integration

2021-05-05 Thread Jonathan Wakely via Gcc-patches

On 24/04/21 15:46 +0200, François Dumont via Libstdc++ wrote:

Hi

    Here is the patch to add backtrace generation on _GLIBCXX_DEBUG 
assertions thanks to libbacktrace.


    In addition to this integration I am also improving the generation 
of the assertion message thanks to the "%.*s" printf format; it avoids 
an intermediate buffer most of the time. I am also removing the "__" 
used for uglification to get a nicer output. I can propose this in a 
dedicated patch if you prefer.


    I am adding GLIBCXX_3.4.30 abi version to properly export the 2 
new weak symbols. Let me know if it isn't necessary.


    libstdc++: [_GLIBCXX_DEBUG] Add backtrace generation thanks to 
libbacktrace


  Add _GLIBCXX_DEBUG_BACKTRACE macro to activate backtrace 
generation on

    _GLIBCXX_DEBUG assertions using libbacktrace.

    * config/abi/pre/gnu.ver: Add GLIBCXX_3.4.30 version and 
new exports.

    * include/debug/formatter.h [_GLIBCXX_DEBUG_BACKTRACE]:
    Include .
    [_GLIBCXX_DEBUG_BACKTRACE && BACKTRACE_SUPPORTED]:
    Include .
    [(!_GLIBCXX_DEBUG_BACKTRACE || !BACKTRACE_SUPPORTED) &&
    _GLIBCXX_USE_C99_STDINT_TR1]: Include .
    [_GLIBCXX_DEBUG_USE_LIBBACKTRACE]
    (__gnu_debug::__create_backtrace_state): New.
    [_GLIBCXX_DEBUG_USE_LIBBACKTRACE]
    (__gnu_debug::__render_backtrace): New.
[_GLIBCXX_DEBUG_USE_LIBBACKTRACE](_Error_formatter::_M_print_backtrace):
    New.
[_GLIBCXX_DEBUG_USE_LIBBACKTRACE](_Error_formatter::_M_backtrace_state):
    New.
    (_Error_formatter::_Error_formatter): Outline definition.
    * src/c++11/debug.cc: Include .
    (_Print_func_t): New.
    (print_word): Use '%.*s' format in fprintf to render only 
expected

    number of chars.
    (print_raw(PrintContext&, const char*, ptrdiff_t)): New.
    (print_function(PrintContext&, const char*, 
_Print_func_t)): New.

    (print_type): Use latter.
    (print_string(PrintContext&, const char*, const 
_Parameter*, size_t)):

    Change signature to...
    (print_string(PrintContext&, const char*, ptrdiff_t, const 
_Parameter*,
    size_t)): ...this and adapt. Remove intermediate buffer to 
render input

    string.
    (print_string(PrintContext&, const char*, ptrdiff_t)): New.
    [_GLIBCXX_DEBUG_USE_LIBBACKTRACE]
    (print_backtrace(void*, uintptr_t, const char*, int, const 
char*)): New.

    (_Error_formatter::_M_error()): Adapt.
    [_GLIBCXX_DEBUG_USE_LIBBACKTRACE]
    (__gnu_debug::__create_backtrace_state): New, weak symbol.
    [_GLIBCXX_DEBUG_USE_LIBBACKTRACE]
    (__gnu_debug::__render_backtrace): New, weak symbol.
    * testsuite/util/testsuite_abi.cc: Add new symbol version.
    * doc/xml/manual/debug_mode.xml: Document 
_GLIBCXX_DEBUG_BACKTRACE.

    * doc/xml/manual/using.xml: Likewise.

Tested under Linux x86_64.

Ok to commit ?

François




diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 5323c7f0604..2606d67d8a9 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2397,6 +2397,15 @@ GLIBCXX_3.4.29 {

} GLIBCXX_3.4.28;

+GLIBCXX_3.4.30 {
+
+# __gnu_debug::__create_backtrace
+_ZN11__gnu_debug24__create_backtrace_stateEv;
+_ZN11__gnu_debug18__render_backtraceEPvPFiS0_mPKciS2_ES0_;
+
+} GLIBCXX_3.4.29;
+
+
# Symbols in the support library (libsupc++) have their own tag.
CXXABI_1.3 {

diff --git a/libstdc++-v3/doc/xml/manual/debug_mode.xml 
b/libstdc++-v3/doc/xml/manual/debug_mode.xml
index 883e8cb4f03..931b09710f3 100644
--- a/libstdc++-v3/doc/xml/manual/debug_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/debug_mode.xml
@@ -160,6 +160,12 @@ which always works correctly.
  GLIBCXX_DEBUG_MESSAGE_LENGTH can be used to request a
  different length.

+Note that libstdc++ is able to use
+  http://www.w3.org/1999/xlink;
+  
xlink:href="https://github.com/ianlancetaylor/libbacktrace;>libbacktrace
+  to produce backtraces on error. Use -D_GLIBCXX_DEBUG_BACKTRACE 
to
+  activate it. You'll also have to link with libbacktrace
+  (-lbacktrace) to build your application.


Using a 
Specific Debug Container
diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 24543e9526e..9bd0da8c1c5 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1128,6 +1128,16 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
hello.cc -o test.exe
extensions and libstdc++-specific behavior into errors.
  

+_GLIBCXX_DEBUG_BACKTRACE
+
+  
+   Undefined by default. Considered only if _GLIBCXX_DEBUG
+   is defined. When defined, checks for http://www.w3.org/1999/xlink;
+   

Re: [patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Eric Botcazou
> I mean, can't we have just one while loop that skips over all debug insns,
> NOTE_INSN_BASIC_BLOCK_P, NOTE_INSN_PROLOGUE_END and NOTE_INSN_FUNCTION_BEG
> and stops on anything else, or, if we want to skip at most one of some or
> all of those note kinds, do some tracking if we've already skipped that kind
> of note?

Revised version attached.

-- 
Eric Botcazou

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index f05cb6136c7..f49a34dcb0f 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -2134,18 +2134,15 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
   update_br_prob_note (redirect_edges_to);
 
   /* Edit SRC1 to go to REDIRECT_TO at NEWPOS1.  */
-
-  /* Skip possible basic block header.  */
   if (LABEL_P (newpos1))
 newpos1 = NEXT_INSN (newpos1);
 
-  while (DEBUG_INSN_P (newpos1))
-newpos1 = NEXT_INSN (newpos1);
-
-  if (NOTE_INSN_BASIC_BLOCK_P (newpos1))
-newpos1 = NEXT_INSN (newpos1);
-
-  while (DEBUG_INSN_P (newpos1))
+  /* Skip debug insns, basic block header and prologue markers.  */
+  while (DEBUG_INSN_P (newpos1)
+	 || (NOTE_P (newpos1)
+	 && (NOTE_KIND (newpos1) == NOTE_INSN_BASIC_BLOCK
+		 || NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END
+		 || NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)))
 newpos1 = NEXT_INSN (newpos1);
 
   redirect_from = split_block (src1, PREV_INSN (newpos1))->src;


Re: [patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 01:00:35PM +0200, Eric Botcazou wrote:
> 2021-05-05  Eric Botcazou  
> 
>   PR rtl-optimization/100411
>   * cfgcleanup.c (try_crossjump_to_edge): Also skip end of prologue
>   and beginning of function markers.
> 
> -- 
> Eric Botcazou

> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index f05cb6136c7..64279cc8c20 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -2148,6 +2148,20 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>while (DEBUG_INSN_P (newpos1))
>  newpos1 = NEXT_INSN (newpos1);
>  
> +  /* And end of prologue marker.  */
> +  if (NOTE_P (newpos1) && NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END)
> +newpos1 = NEXT_INSN (newpos1);
> +
> +  while (DEBUG_INSN_P (newpos1))
> +newpos1 = NEXT_INSN (newpos1);
> +
> +  /* And also beginning of function marker.  */
> +  if (NOTE_P (newpos1) && NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)
> +newpos1 = NEXT_INSN (newpos1);
> +
> +  while (DEBUG_INSN_P (newpos1))
> +newpos1 = NEXT_INSN (newpos1);

Do those notes always have to appear in that order?
I mean, can't we have just one while loop that skips over all debug insns,
NOTE_INSN_BASIC_BLOCK_P, NOTE_INSN_PROLOGUE_END and NOTE_INSN_FUNCTION_BEG
and stops on anything else, or, if we want to skip at most one of some or
all of those note kinds, do some tracking if we've already skipped that kind of
note?

> +
>redirect_from = split_block (src1, PREV_INSN (newpos1))->src;
>to_remove = single_succ (redirect_from);
>  


Jakub



[patch] Fix PR rtl-optimization/100411

2021-05-05 Thread Eric Botcazou
Hi,

this is the bootstrap failure of GCC 11 on MinGW64 configured with --enable-
tune=nocona.  The bottom line is that SEH does not support CFI for epilogues 
but the x86 back-end nevertheless attaches it to instructions, so we have to 
filter it out and this is done by detecting the end of the prologue by means 
of the NOTE_INSN_PROLOGUE_END note.

But the compiler manages to generate a second epilogue before this note in the 
RTL stream and this fools the above logic.  The root cause is cross-jumping, 
which inserts a jump before the end of the prologue (in fact just before the 
note); the rest (CFG cleanup, BB reordering) is downhill from there.
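
Schematically, the insn stream at the crossjump point then looks like this
(hypothetical sketch), so everything between the label and the first real
insn has to be skipped before splitting:

;; (code_label 23)
;; (note NOTE_INSN_BASIC_BLOCK)
;; (debug_insn ...)
;; (note NOTE_INSN_PROLOGUE_END)
;; (note NOTE_INSN_FUNCTION_BEG)
;; (insn ...)           <-- newpos1 must end up here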

Tested on x86-64/Linux and x86-64/Windows, OK for mainline and 11 branch?


2021-05-05  Eric Botcazou  

PR rtl-optimization/100411
* cfgcleanup.c (try_crossjump_to_edge): Also skip end of prologue
and beginning of function markers.

-- 
Eric Botcazou

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index f05cb6136c7..64279cc8c20 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -2148,6 +2148,20 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
   while (DEBUG_INSN_P (newpos1))
 newpos1 = NEXT_INSN (newpos1);
 
+  /* And end of prologue marker.  */
+  if (NOTE_P (newpos1) && NOTE_KIND (newpos1) == NOTE_INSN_PROLOGUE_END)
+newpos1 = NEXT_INSN (newpos1);
+
+  while (DEBUG_INSN_P (newpos1))
+newpos1 = NEXT_INSN (newpos1);
+
+  /* And also beginning of function marker.  */
+  if (NOTE_P (newpos1) && NOTE_KIND (newpos1) == NOTE_INSN_FUNCTION_BEG)
+newpos1 = NEXT_INSN (newpos1);
+
+  while (DEBUG_INSN_P (newpos1))
+newpos1 = NEXT_INSN (newpos1);
+
   redirect_from = split_block (src1, PREV_INSN (newpos1))->src;
   to_remove = single_succ (redirect_from);
 


Re: [PATCH] PR rtl-optimization/100263: Ensure register can change mode

2021-05-05 Thread Eric Botcazou
> For move2add_valid_value_p we also have to ask the target whether a
> register can be accessed in a different mode than it was set before.
> 
> gcc/ChangeLog:
> 
>   PR rtl-optimization/100263
>   * postreload.c (move2add_valid_value_p): Ensure register can
>   change mode.
> 
> Bootstrapped and regtested releases/gcc-{8,9,10,11} and master on IBM Z.
> Ok for those branches?

Yes, OK everywhere, thanks.

-- 
Eric Botcazou




Re: [GCC][PATCH] arm: Remove duplicate definitions from arm_mve.h (pr100419).

2021-05-05 Thread Richard Earnshaw via Gcc-patches




On 05/05/2021 10:56, Srinath Parvathaneni via Gcc-patches wrote:

Hi All,

This patch removes several duplicated intrinsic definitions from
arm_mve.h mentioned in PR100419 and also fixes the wrong arguments
in a few of the intrinsics' polymorphic variants.

Regression tested and found no issues.

Ok for master ? GCC-11 and GCC-10 branch backports?
gcc/ChangeLog:

2021-05-04  Srinath Parvathaneni  

 PR target/100419
 * config/arm/arm_mve.h (__arm_vstrwq_scatter_offset): Fix wrong 
arguments.
 (__arm_vcmpneq): Remove duplicate definition.
 (__arm_vstrwq_scatter_offset_p): Likewise.
 (__arm_vmaxq_x): Likewise.
 (__arm_vmlsdavaq): Likewise.
 (__arm_vmlsdavaxq): Likewise.
 (__arm_vmlsdavq_p): Likewise.
 (__arm_vmlsdavxq_p): Likewise.
 (__arm_vrmlaldavhaq): Likewise.
 (__arm_vstrbq_p): Likewise.
 (__arm_vstrbq_scatter_offset): Likewise.
 (__arm_vstrbq_scatter_offset_p): Likewise.
 (__arm_vstrdq_scatter_offset): Likewise.
 (__arm_vstrdq_scatter_offset_p): Likewise.
 (__arm_vstrdq_scatter_shifted_offset): Likewise.
 (__arm_vstrdq_scatter_shifted_offset_p): Likewise.

Co-authored-by: Joe Ramsay  


Let's take this example:

-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = 
(p1); \
+#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = 
(p0); \

   __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(p0, int32_t *), __p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(p0, uint32_t *), __p1, 
__ARM_mve_coerce(__p2, uint32x4_t)));})

+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int32_t *), p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, 
__ARM_mve_coerce(__p2, uint32x4_t)));})


It removes the safe shadow copy of p1 but adds a safe shadow copy of p0. 
 Why?  Isn't it better (and safer) to just create shadow copies of all 
the arguments and let the compiler worry about when it's safe to 
eliminate them?


R.




### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 
3a40c6e68161b64319b071f57a5b0d8393303cfd..dc1d874a6366eb5fe755a70c72ed371c915bd04b
 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -37808,33 +37808,19 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32(p0, p1, 
__ARM_mve_coerce(__p2, uint32x4_t), p3), \
int (*)[__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_base_p_f32(p0, 
p1, __ARM_mve_coerce(__p2, float32x4_t), p3));})
  
-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \

+#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
__typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(p0, int32_t *), __p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(p0, uint32_t *), __p1, 
__ARM_mve_coerce(__p2, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_f32 (__ARM_mve_coerce(p0, float32_t *), __p1, 
__ARM_mve_coerce(__p2, float32x4_t)));})
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int32_t *), p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, 
__ARM_mve_coerce(__p2, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_f32 (__ARM_mve_coerce(__p0, float32_t *), p1, 
__ARM_mve_coerce(__p2, float32x4_t)));})
  
-#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \

+#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = 
(p0); \
__typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_p_s32 

Re: [patch] Generate debug info for local dynamic record types

2021-05-05 Thread Eric Botcazou
> OK.

Thanks.

> I suppose there's no easy way to build a gnat.dg "guality" testcase for
> this?

On all the platforms I test, I have a bunch of guality failures so I'm not 
really thrilled by the idea...  This should be covered by the GDB testsuite 
run with -fgnat-encodings=minimal but it's actually easy to check in DWARF 5.


* gnat.dg/debug17.adb: New test.

-- 
Eric Botcazou

-- { dg-do compile }
-- { dg-skip-if "No Dwarf" { { hppa*-*-hpux* } && { ! lp64 } } }
-- { dg-options "-cargs -O0 -g -dA -fgnat-encodings=minimal -margs" }

pragma No_Component_Reordering;

procedure Debug17 (Number_Of_Bits : Natural) is

   type Bitinfos_T is array (Natural range 1 .. Number_Of_Bits) of Float;

   type Inner_Record_T is
  record
 Bitinfos : Bitinfos_T := (others => 1.5);
 Check1 : Integer := 1983;
 Check2 : Integer := 1995;
 Flag : Boolean := False;
 Check3 : Integer := 2005;

  end record;

   Rfield : Inner_Record_T;

begin
   null;
end;

-- { dg-final { scan-assembler-not "DW_AT_data_member_location (0)" } }


Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-05-05 Thread Richard Biener via Gcc-patches
On Wed, May 5, 2021 at 11:36 AM Richard Biener
 wrote:
>
> On Tue, Mar 16, 2021 at 6:13 PM Stefan Schulze Frielinghaus
>  wrote:
> >
> > [snip]
> >
> > Please find attached a new version of the patch.  A major change compared to
> > the previous patch is that I created a separate pass which hopefully makes
> > reviewing also easier since it is almost self-contained.  After realizing 
> > that
> > detecting loops which mimic the behavior of rawmemchr/strlen functions does 
> > not
> > really fit into the topic of loop distribution, I created a separate pass.
>
> It's true that these reduction-like patterns are more difficult than
> the existing
> memcpy/memset cases.
>
> >  Due
> > to this I was also able to play around a bit and schedule the pass at 
> > different
> > times.  Currently it is scheduled right before loop distribution where loop
> > header copying already took place which leads to the following effect.
>
> In fact I'd schedule it after loop distribution so there's the chance that 
> loop
> distribution can expose a loop that fits the new pattern.
>
> >  Running
> > this setup over
> >
> > char *t (char *p)
> > {
> >   for (; *p; ++p);
> >   return p;
> > }
> >
> > the new pass transforms
> >
> > char * t (char * p)
> > {
> >   char _1;
> >   char _7;
> >
> >[local count: 118111600]:
> >   _7 = *p_3(D);
> >   if (_7 != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >
> >[local count: 955630225]:
> >   # p_8 = PHI 
> >   p_6 = p_8 + 1;
> >   _1 = *p_6;
> >   if (_1 != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >   # p_2 = PHI 
> >   goto ; [100.00%]
> >
> >[local count: 850510901]:
> >   goto ; [100.00%]
> >
> >[local count: 12992276]:
> >
> >[local count: 118111600]:
> >   # p_9 = PHI 
> >   return p_9;
> >
> > }
> >
> > into
> >
> > char * t (char * p)
> > {
> >   char * _5;
> >   char _7;
> >
> >[local count: 118111600]:
> >   _7 = *p_3(D);
> >   if (_7 != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >   _5 = p_3(D) + 1;
> >   p_10 = .RAWMEMCHR (_5, 0);
> >
> >[local count: 118111600]:
> >   # p_9 = PHI 
> >   return p_9;
> >
> > }
> >
> > which is fine so far.  However, I haven't made up my mind so far whether it 
> > is
> > worthwhile to spend more time in order to also eliminate the "first 
> > unrolling"
> > of the loop.
>
> Might be a phiopt transform ;)  Might apply to quite some set of
> builtins.  I wonder what the strlen case looks like though.
>
> > I gave it a shot by scheduling the pass prior pass copy header
> > and ended up with:
> >
> > char * t (char * p)
> > {
> >[local count: 118111600]:
> >   p_5 = .RAWMEMCHR (p_3(D), 0);
> >   return p_5;
> >
> > }
> >
> > which seems optimal to me.  The downside of this is that I have to 
> > initialize
> > scalar evolution analysis which might be undesired that early.
> >
> > All this brings me to the question where do you see this peace of code 
> > running?
> > If in a separate pass when would you schedule it?  If in an existing pass,
> > which one would you choose?
>
> I think it still fits loop distribution.  If you manage to detect it
> with your pass
> standalone then you should be able to detect it in loop distribution.  Can you
> explain what part is "easier" as standalone pass?

Btw, another "fitting" pass would be final value replacement (pass_scev_cprop)
since what these patterns provide is a builtin call to compute the value of one
of the loop PHIs on exit.  Note this pass leaves removal of in-loop computations
to followup DCE which means that in some cases it does unprofitable transforms.
There's a bug somewhere where I worked on doing final value replacement
on-demand when DCE figures out the loop is otherwise dead but I never finished
this (loop distribution could also use such a mechanism to get rid of
unwanted PHIs).
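
Concretely, the final-value view of the running example is just (sketch;
rawmemchr here is the GNU libc extension that the new internal function
mirrors):

#define _GNU_SOURCE
#include <string.h>

char *t (char *p)
{
  /* The loop 'for (; *p; ++p);' only exists to compute the exit value
     of the pointer PHI, which is exactly what rawmemchr returns.  */
  return rawmemchr (p, 0);
}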

> > Another topic which came up is whether there exists a more elegant solution 
> > to
> > my current implementation in order to deal with stores (I'm speaking of the 
> > `if
> > (store_dr)` statement inside of function transform_loop_1).  For example,
> >
> > extern char *p;
> > char *t ()
> > {
> >   for (; *p; ++p);
> >   return p;
> > }
> >
> > ends up as
> >
> > char * t ()
> > {
> >   char * _1;
> >   char * _2;
> >   char _3;
> >   char * p.1_8;
> >   char _9;
> >   char * p.1_10;
> >   char * p.1_11;
> >
> >[local count: 118111600]:
> >   p.1_8 = p;
> >   _9 = *p.1_8;
> >   if (_9 != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >
> >[local count: 955630225]:
> >   # p.1_10 = PHI <_1(6), p.1_8(5)>
> >   _1 = p.1_10 + 1;
> >   p = _1;
> >   _3 = *_1;
> >   if (_3 != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >   # _2 = PHI <_1(3)>
> >   goto ; [100.00%]
> >
> >[local count: 

Re: [PATCH][_GLIBCXX_DEBUG] libbacktrace integration

2021-05-05 Thread Jonathan Wakely via Gcc-patches

On 04/05/21 08:03 +0200, François Dumont wrote:

On 03/05/21 11:06 pm, Jonathan Wakely wrote:

On 03/05/21 22:17 +0200, François Dumont via Libstdc++ wrote:

Is it too early to consider this patch ? Or just lack of time ?


I haven't had time to review it yet, but my general feeling hasn't
changed. I still don't like the idea of executing additional code
after undefined behaviour is detected. I've been convinced by glibc
folk that every bit of code run when the program state is corrupt
increases the risk that it can be exploited by an attacker.




Ok, I must have missed (or forgotten) this feedback.


See https://gcc.gnu.org/pipermail/libstdc++/2018-December/048061.html


Well, isn't it the current situation of the whole _GLIBCXX_DEBUG mode ?


Yes, but adding more code makes it worse.

For me, the purpose of _GLIBCXX_DEBUG mode is to detect a UB situation and to 
assert _before_ any UB code is run.


Yes, it stops running the user code, but then runs its own code to
format the message to show to the user. The more code that runs when
the program is in an inconsistent/undefined state, the more likely it
is that some of that code can be exploited to do something bad.

Moreover, it is optional. This is a feature to use when _GLIBCXX_DEBUG 
is telling you that you have a problem in your code but you just 
cannot find where it is called from.


Which you can do with a debugger. When debug mode calls abort() it
will stop the program in a debugger, or produce a core file that can
be examined in a debugger.

The stacktrace is a convenience, not providing anything that couldn't
be done already.

Anyway, I'll review the patch...




[GCC][PATCH] arm: Remove duplicate definitions from arm_mve.h (pr100419).

2021-05-05 Thread Srinath Parvathaneni via Gcc-patches
Hi All,

This patch removes several duplicated intrinsic definitions from
arm_mve.h mentioned in PR100419 and also fixes the wrong arguments
in a few of the intrinsics' polymorphic variants.

Regression tested and found no issues.

Ok for master ? GCC-11 and GCC-10 branch backports?
gcc/ChangeLog:

2021-05-04  Srinath Parvathaneni  

PR target/100419
* config/arm/arm_mve.h (__arm_vstrwq_scatter_offset): Fix wrong 
arguments.
(__arm_vcmpneq): Remove duplicate definition.
(__arm_vstrwq_scatter_offset_p): Likewise.
(__arm_vmaxq_x): Likewise.
(__arm_vmlsdavaq): Likewise.
(__arm_vmlsdavaxq): Likewise.
(__arm_vmlsdavq_p): Likewise.
(__arm_vmlsdavxq_p): Likewise.
(__arm_vrmlaldavhaq): Likewise.
(__arm_vstrbq_p): Likewise.
(__arm_vstrbq_scatter_offset): Likewise.
(__arm_vstrbq_scatter_offset_p): Likewise.
(__arm_vstrdq_scatter_offset): Likewise.
(__arm_vstrdq_scatter_offset_p): Likewise.
(__arm_vstrdq_scatter_shifted_offset): Likewise.
(__arm_vstrdq_scatter_shifted_offset_p): Likewise.

Co-authored-by: Joe Ramsay  


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 
3a40c6e68161b64319b071f57a5b0d8393303cfd..dc1d874a6366eb5fe755a70c72ed371c915bd04b
 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -37808,33 +37808,19 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32(p0, p1, 
__ARM_mve_coerce(__p2, uint32x4_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_base_p_f32(p0, p1, 
__ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
+#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(p0, int32_t *), __p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(p0, uint32_t *), __p1, 
__ARM_mve_coerce(__p2, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_f32 (__ARM_mve_coerce(p0, float32_t *), __p1, 
__ARM_mve_coerce(__p2, float32x4_t)));})
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int32_t *), p1, 
__ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *), p1, 
__ARM_mve_coerce(__p2, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_f32 (__ARM_mve_coerce(__p0, float32_t *), p1, 
__ARM_mve_coerce(__p2, float32x4_t)));})
 
-#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = 
(p1); \
+#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = 
(p0); \
   __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_p_s32 (__ARM_mve_coerce(p0, int32_t *), __p1, 
__ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_p_u32 (__ARM_mve_coerce(p0, uint32_t *), __p1, 
__ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_p_f32 (__ARM_mve_coerce(p0, float32_t *), __p1, 
__ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = 
(p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: 
__arm_vstrwq_scatter_offset_p_s32 (__ARM_mve_coerce(p0, int32_t *), __p1, 
__ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: 
__arm_vstrwq_scatter_offset_p_u32 (__ARM_mve_coerce(p0, uint32_t *), __p1, 
__ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: 
__arm_vstrwq_scatter_offset_p_f32 (__ARM_mve_coerce(p0, float32_t *), __p1, 
__ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
-  int 

[patch, committed] libgfortran/intrinsics/chmod.c: Silence unused var warning

2021-05-05 Thread Tobias Burnus

Found with amdgcn – silences a build warning there.

Committed as r12-509-gdee371fdd4ae25f837b9b2ded7789d07ed739c9e

Tobias

commit dee371fdd4ae25f837b9b2ded7789d07ed739c9e
Author: Tobias Burnus 
Date:   Wed May 5 11:48:48 2021 +0200

libgfortran/intrinsics/chmod.c: Silence unused var warning

libgfortran/ChangeLog:

* intrinsics/chmod.c (chmod_internal): Only declare mode_mask var
if HAVE_UMASK.

diff --git a/libgfortran/intrinsics/chmod.c b/libgfortran/intrinsics/chmod.c
index 8b5140a05a3..d0371ce560f 100644
--- a/libgfortran/intrinsics/chmod.c
+++ b/libgfortran/intrinsics/chmod.c
@@ -71,7 +71,10 @@ chmod_internal (char *file, char *mode, gfc_charlen_type mode_len)
 #ifndef __MINGW32__
   bool is_dir;
 #endif
-  mode_t mode_mask, file_mode, new_mode;
+#ifdef HAVE_UMASK
+  mode_t mode_mask;
+#endif
+  mode_t file_mode, new_mode;
   struct stat stat_buf;
 
   if (mode_len == 0)

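The fix follows the usual pattern for configure-conditional code: declare a variable under the same feature macro that guards its only uses. A minimal sketch of the idiom, with hypothetical names rather than the libgfortran code:

#include <sys/types.h>
#include <sys/stat.h>

/* Sketch only: mode_mask exists only when umask does, so targets
   without HAVE_UMASK (such as amdgcn) see no unused variable.  */
int
set_mode (const char *path, mode_t new_mode)
{
#ifdef HAVE_UMASK
  mode_t mode_mask = umask (0);  /* read the process umask ...  */
  umask (mode_mask);             /* ... and restore it at once  */
  new_mode &= ~mode_mask;
#endif
  return chmod (path, new_mode);
}
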

Re: [PATCH] libstdc++: Reduce ranges::minmax/minmax_element comparison complexity

2021-05-05 Thread Jonathan Wakely via Gcc-patches

On 05/05/21 10:39 +0100, Jonathan Wakely wrote:

On 04/05/21 21:42 -0400, Patrick Palka via Libstdc++ wrote:

This rewrites ranges::minmax and ranges::minmax_element so that each
performs at most 3*N/2 comparisons, as required by the standard.
In passing, this also fixes PR100387 by avoiding a premature std::move
in ranges::minmax and in std::shift_right.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
10/11?

libstdc++-v3/ChangeLog:

PR libstdc++/100387
* include/bits/ranges_algo.h (__minmax_fn::operator()): Rewrite
to limit comparison complexity to 3*N/2.  Avoid premature std::move.
(__minmax_element_fn::operator()): Likewise.
(shift_right): Avoid premature std::move of __result.
* testsuite/25_algorithms/minmax/constrained.cc (test04, test05):
New tests.
* testsuite/25_algorithms/minmax_element/constrained.cc (test02):
Likewise.
---
libstdc++-v3/include/bits/ranges_algo.h   | 87 ++-
.../25_algorithms/minmax/constrained.cc   | 31 +++
.../minmax_element/constrained.cc | 19 
3 files changed, 113 insertions(+), 24 deletions(-)

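On the "premature std::move" part: the old loop, quoted below, moved __tmp into the min slot and could then read the same, now moved-from, object again when testing for the max. A minimal sketch of that bug class, in hypothetical code rather than the patch itself:

#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch only: the second test may read a moved-from string.
std::pair<std::string, std::string>
buggy_minmax (const std::vector<std::string>& v)
{
  assert (!v.empty ());
  std::pair<std::string, std::string> r{v[0], v[0]};
  for (const std::string& s : v)
    {
      std::string tmp = s;
      if (tmp < r.first)
	r.first = std::move (tmp);  // tmp may be moved-from now ...
      if (!(tmp < r.second))        // ... yet it is read again here
	r.second = std::move (tmp);
    }
  return r;
}
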
diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index cda3042c11f..bbd29127e89 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3291,18 +3291,39 @@ namespace ranges
auto __first = ranges::begin(__r);
auto __last = ranges::end(__r);
__glibcxx_assert(__first != __last);
+   auto __comp_proj = __detail::__make_comp_proj(__comp, __proj);
	minmax_result<range_value_t<_Range>> __result = {*__first, *__first};
while (++__first != __last)
  {
-   auto __tmp = *__first;
-   if (std::__invoke(__comp,
- std::__invoke(__proj, __tmp),
- std::__invoke(__proj, __result.min)))
- __result.min = std::move(__tmp);
-   if (!(bool)std::__invoke(__comp,
-std::__invoke(__proj, __tmp),
-std::__invoke(__proj, __result.max)))
- __result.max = std::move(__tmp);
+   // Process two elements at a time so that we perform at most
+   // 3*N/2 many comparisons in total (each of the N/2 iterations


Is "many" a typo here?


+   // of this loop performs three comparisons).
+   auto __val1 = *__first;


Can we avoid making this copy if the range satisfies forward_range, by
keeping copies of the min/max iterators, or just forwarding to
ranges::minmax_element?


Hmm, on the other hand, for a forward range of ints we're probably
better off just making the copy here and not going through the
indirections done by minmax_element.

Maybe something like:

  if constexpr (forward_range<_Range> && is_scalar_v<range_value_t<_Range>>)
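To make the comparison counting concrete, here is a standalone sketch of the pairwise scheme with plain ints rather than the generic libstdc++ code: one comparison orders each pair, then the smaller element is tested against the running min and the larger against the running max, giving three comparisons per two elements consumed.

#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: roughly 3*N/2 comparisons in total.
std::pair<int, int>
pairwise_minmax (const std::vector<int>& v)
{
  assert (!v.empty ());
  int mn = v[0], mx = v[0];
  std::size_t i = 1;
  for (; i + 1 < v.size (); i += 2)
    {
      int a = v[i], b = v[i + 1];
      if (b < a) std::swap (a, b);  // 1: order the pair
      if (a < mn) mn = a;           // 2: pair min vs running min
      if (mx < b) mx = b;           // 3: pair max vs running max
    }
  if (i < v.size ())                // odd leftover element
    {
      if (v[i] < mn) mn = v[i];
      else if (mx < v[i]) mx = v[i];
    }
  return {mn, mx};
}
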



