Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-09 Thread Richard Biener
On Thu, Nov 9, 2023 at 9:25 PM Vladimir Makarov  wrote:
>
>
> On 11/7/23 22:47, Lehua Ding wrote:
> >
> > Lehua Ding (7):
> >ira: Refactor the handling of register conflicts to make it more
> >  general
> >ira: Add live_subreg problem and apply to ira pass
> >ira: Support subreg live range track
> >ira: Support subreg copy
> >ira: Add all nregs >= 2 pseudos to track subreg list
> >lra: Apply live_subreg df_problem to lra pass
> >lra: Support subreg live range track and conflict detect
> >
> Thank you very much for addressing subreg RA.  It is a big piece of work.  I
> wanted to address this a long time ago but had no time to do it myself.
>
> I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC.
> I used -O3 for SPEC2017 compilation.
>
> Here are the results:
>
>                 baseline   baseline+patches
> specint2017:    8.51       8.58       (+0.8%)
> specfp2017:     21.1       21.1       (+0%)
> compile time:   2426.41s   2580.58s   (+6.4%)
>
> Spec2017 average code size change: -0.07%
>
> Improving specint by 0.8% is impressive for me.
>
> Unfortunately, it is achieved by decreasing compilation speed by 6.4%
> (although on a smaller benchmark I saw only a 3% slowdown).  I don't know
> how, but we should mitigate this speed degradation.  Maybe we can find a
> hot spot in the new code (though I don't think it is the linear search
> pointed out by Richard Biener, as the object vectors most probably contain
> only 1-2 elements) and improve it, or we could enable this only for
> -O3/-Ofast, or make the code function or target dependent.
>
> I also find that GCC consumes more memory with the patches.  Maybe it can be
> improved too (although I am not sure about this).

Note that I think it's important that this is disabled by default at -O1,
which we recommend when you feed GCC large machine-generated code.  That
is also where I guess you'll find the effect is much worse.

That includes disabling the memory usage side-effect which I guess might
be hard given you grow generic data structures.

> I'll start to review the patches next week.  I don't expect that
> I'll find anything serious enough to reject the patches, but again we should
> work on mitigating the compilation speed problem.  We can file a new
> PR for this and resolve the problem during the release cycle.
>
>
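The percentages in the results table are plain relative changes against the baseline. Here is a minimal C sketch of that arithmetic. The helper names percent_change and print_spec_deltas are illustrative; they do not come from any GCC or SPEC tool.

```c
#include <stdio.h>

/* Relative change of NEW_VAL with respect to BASE, in percent.  */
double
percent_change (double base, double new_val)
{
  return (new_val - base) / base * 100.0;
}

/* Print the deltas for the numbers reported in the table.  */
void
print_spec_deltas (void)
{
  printf ("specint2017:  %+.1f%%\n", percent_change (8.51, 8.58));
  printf ("specfp2017:   %+.1f%%\n", percent_change (21.1, 21.1));
  printf ("compile time: %+.1f%%\n", percent_change (2426.41, 2580.58));
}
```

Calling print_spec_deltas () prints +0.8%, +0.0% and +6.4%, which match the rounded figures in the table.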


Fwd: [PING][PATCH V15 4/4] ree: Improve ree pass using defined abi interfaces

2023-11-09 Thread Ajit Agarwal
Ping!

Ok for trunk?

Thanks & Regards
Ajit

 Forwarded Message 
Subject: [PATCH V15 4/4] ree: Improve ree pass using defined abi interfaces
Date: Sun, 29 Oct 2023 16:14:17 +0530
From: Ajit Agarwal 
To: gcc-patches , Jeff Law , 
Vineet Gupta , Bernhard Reutner-Fischer 

CC: Richard Biener , Segher Boessenkool 
, Peter Bergner 

Hello Vineet, Jeff and Bernhard:

This version 15 of the patch uses ABI interfaces to eliminate zero and sign
extensions.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 15) of the patch the following review comments are
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fix aarch64 regression failures.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see zero and sign extensions whose operands have no
reaching definitions. The ree pass is improved to eliminate such zero and
sign extensions using the defined ABI interfaces.
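The candidate test in abi_extension_candidate_p in the patch below combines three target queries. The following is a simplified C model of its control flow. It assumes the results of those queries are precomputed booleans; struct ext_insn and abi_candidate_model_p are hypothetical, not GCC types or functions. Here "promoted" means promote_function_mode returns a mode different from the source mode, i.e. abi_target_promote_function_mode returns false.

```c
#include <stdbool.h>

/* Hypothetical summary of "(set (reg:DST d) (zero_extend:DST (reg:SRC s)))"
   with the relevant target queries already evaluated.  */
struct ext_insn
{
  int src_regno;
  int dst_regno;
  int src_mode_bits;
  int dst_mode_bits;
  bool src_is_arg_reg;     /* FUNCTION_ARG_REGNO_P (src).  */
  bool src_is_return_reg;  /* function_value_regno_p (src).  */
  bool src_mode_promoted;  /* promote_function_mode changes the mode.  */
};

/* Mirrors the control flow of abi_extension_candidate_p as posted.  */
bool
abi_candidate_model_p (const struct ext_insn *x)
{
  /* Only argument registers that are not also return registers, and only
     real mode changes, are considered.  */
  if (!x->src_is_arg_reg || x->src_is_return_reg
      || x->src_mode_bits == x->dst_mode_bits)
    return false;

  /* A promoted source mode is rejected when the extension writes a
     different register.  */
  if (x->src_mode_promoted && x->src_regno != x->dst_regno)
    return false;

  return true;
}
```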

2023-10-29  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
using defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C: New test.
---
changes since v6:
  - Added missing abi interfaces.
  - Rearranging and restructuring the code.
  - Removal of hard coded zero extend and sign extend in abi interfaces.
  - Relaxed different registers with source and destination in abi interfaces.
  - Using CSE in abi interfaces.
  - Fix aarch64 regressions.
  - Add Sign extension removal in abi interfaces.
  - Modified comments as per coding convention.
  - Modified code as per coding convention.
  - Fix RISC-V bootstrap failures.
---
 gcc/ree.cc| 136 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 143 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..6af82093eaf 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+    return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,109 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+    = targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+                                           NULL_TREE, 1);
+
+  return tgt_mode == mode;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if
+   reg source operand is argument register and not return register,
+   mode of source and destination operand are different,
+   if not promoted REGNO of source and destination operand are the same.  */
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+      && !abi_extension_candidate_return_reg_p (REGNO (orig_src))
+      && dst_mode != GET_MODE (orig_src))
+    {
+      if (!abi_target_promote_function_mode (GET_MODE (orig_src))
+          && REGNO (SET_DEST (set)) != REGNO (orig_src))
+        return false;
+
+      return true;
+    }
+  return false;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-09 Thread HAO CHEN GUI
Hi Richard,
  Thanks so much for your comments.

On 2023/11/9 19:41, Richard Biener wrote:
> I'm not sure if the testcase is valid though?
> 
> @defbuiltin{{void} __builtin_return (void *@var{result})}
> This built-in function returns the value described by @var{result} from
> the containing function.  You should specify, for @var{result}, a value
> returned by @code{__builtin_apply}.
> @enddefbuiltin
> 
> I don't see __builtin_apply being used here?

The prototype of the test case is from "__objc_block_forward" in
libobjc/sendmsg.c.

  void *args, *res;

  args = __builtin_apply_args ();
  res = __objc_forward (rcv, op, args);
  if (res)
    __builtin_return (res);
  else
    ...

__builtin_apply_args puts the return values on the stack with the required
alignment. But the forward function can do anything and return an arbitrary
void * pointer, so IMHO the alignment might be broken. So I simplified the
test case to pass a void * pointer directly as the argument of
"__builtin_return" and skip "__builtin_apply_args".

Thanks
Gui Haochen


[PATCH] RISC-V: Fix bug that XTheadMemPair extension caused fcsr not to be saved and restored before and after interrupt.

2023-11-09 Thread Jin Ma
The t0 register is used as a temporary register for saving and restoring fcsr
in interrupt handlers, so it needs special treatment. "th.ldd" must not be used
for t0 in an interrupt handler, because pairing t0 that way skips the
subsequent fcsr save/restore that goes through t0. Therefore the interrupt
handling and the XTheadMemPair handling need to exchange positions in the
function "riscv_for_each_saved_reg".
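Why the order matters can be seen in a toy model of the register save loop. This is not GCC code. Its pairing rule is a simplifying assumption: any two consecutive registers are treated as a th.ldd/th.sdd pair, ignoring the offset and alignment checks GCC really performs. Once the XTheadMemPair branch has consumed t0 as part of a pair, the fcsr branch never sees it. count_fcsr_saves is a hypothetical helper; T0 is x5.

```c
#include <stdbool.h>

/* t0 is x5; stands in for RISCV_PROLOGUE_TEMP_REGNUM.  */
enum { T0 = 5 };

/* Walk the saved registers REGS[0..N-1] in order and return how many
   fcsr saves ("frcsr") would be emitted.  MEMPAIR enables the simplified
   pairing; INTERRUPT_CHECK_FIRST selects the order used by the patch.  */
int
count_fcsr_saves (const int *regs, int n, bool mempair,
                  bool interrupt_check_first)
{
  int frcsr = 0;
  for (int i = 0; i < n; i++)
    {
      if (interrupt_check_first && regs[i] == T0)
        {
          frcsr++;              /* Save t0, then read fcsr through t0.  */
          continue;
        }
      if (mempair && i + 1 < n)
        {
          i++;                  /* Both registers handled by one pair.  */
          continue;
        }
      if (!interrupt_check_first && regs[i] == T0)
        frcsr++;
    }
  return frcsr;
}
```

With registers {t0, t1, t2} and pairing enabled, the old order emits no frcsr because t0 is swallowed by the first pair, while the new order emits one, as the test case expects.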

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
operation before the XTheadMemPair.
---
 gcc/config/riscv/riscv.cc | 56 +--
 .../riscv/xtheadmempair-interrupt-fcsr.c  | 18 ++
 2 files changed, 46 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e25692b86fc..fa2d4d4b779 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6346,6 +6346,34 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
  && riscv_is_eh_return_data_register (regno))
continue;
 
+  /* In an interrupt function, save and restore some necessary CSRs in the stack
+to avoid changes in CSRs.  */
+  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
+ && cfun->machine->interrupt_handler_p
+ && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
+ || (TARGET_ZFINX
+ && (cfun->machine->frame.mask & ~(1 << RISCV_PROLOGUE_TEMP_REGNUM)))))
+   {
+ unsigned int fcsr_size = GET_MODE_SIZE (SImode);
+ if (!epilogue)
+   {
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+ emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset, riscv_save_reg);
+   }
+ else
+   {
+ riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
+ offset - fcsr_size, riscv_restore_reg);
+ emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
+ riscv_save_restore_reg (word_mode, regno, offset, fn);
+ offset -= fcsr_size;
+   }
+ continue;
+   }
+
   if (TARGET_XTHEADMEMPAIR)
{
  /* Get the next reg/offset pair.  */
@@ -6376,34 +6404,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
}
}
 
-  /* In an interrupt function, save and restore some necessary CSRs in the stack
-to avoid changes in CSRs.  */
-  if (regno == RISCV_PROLOGUE_TEMP_REGNUM
- && cfun->machine->interrupt_handler_p
- && ((TARGET_HARD_FLOAT  && cfun->machine->frame.fmask)
- || (TARGET_ZFINX
- && (cfun->machine->frame.mask & ~(1 << RISCV_PROLOGUE_TEMP_REGNUM)))))
-   {
- unsigned int fcsr_size = GET_MODE_SIZE (SImode);
- if (!epilogue)
-   {
- riscv_save_restore_reg (word_mode, regno, offset, fn);
- offset -= fcsr_size;
- emit_insn (gen_riscv_frcsr (RISCV_PROLOGUE_TEMP (SImode)));
- riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
- offset, riscv_save_reg);
-   }
- else
-   {
- riscv_save_restore_reg (SImode, RISCV_PROLOGUE_TEMP_REGNUM,
- offset - fcsr_size, riscv_restore_reg);
- emit_insn (gen_riscv_fscsr (RISCV_PROLOGUE_TEMP (SImode)));
- riscv_save_restore_reg (word_mode, regno, offset, fn);
- offset -= fcsr_size;
-   }
- continue;
-   }
-
   riscv_save_restore_reg (word_mode, regno, offset, fn);
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
new file mode 100644
index 000..d06f05f5c7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-interrupt-fcsr.c
@@ -0,0 +1,18 @@
+/* Verify that fcsr instructions emitted.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906 -funwind-tables" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906 -funwind-tables" { target { rv32 } } } */
+
+
+extern int foo (void);
+
+void __attribute__ ((interrupt))
+sub (void)
+{
+  foo ();
+}
+
+/* { dg-final { scan-assembler-times "frcsr\t" 1 } } */
+/* { dg-final { scan-assembler-times "fscsr\t" 1 } } */

base-commit: e7f4040d9d6ec40c48ada940168885d7dde03af9
-- 
2.17.1



[PING ^1] [PATCH v2 3/4] Improve functionality of ree pass with various constants with AND operation.

2023-11-09 Thread Ajit Agarwal
ping!

 Forwarded Message 
Subject: [PING ^0] [PATCH v2 3/4] Improve functionality of ree pass with various constants with AND operation.
Date: Sun, 15 Oct 2023 18:28:51 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Vineet Gupta , 
Richard Biener , Segher Boessenkool 
, Peter Bergner , Kewen.Lin 




Hello All:

Please review. This patch adds support in the ree pass for different modes and
constants in sign and zero extension elimination.

Please review and send your comments so that it can be committed to trunk.

Thanks & Regards
Ajit
 Forwarded Message 
Subject: [PATCH v2 3/4] Improve functionality of ree pass with various constants with AND operation.
Date: Tue, 19 Sep 2023 14:51:16 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Vineet Gupta , 
Richard Biener , Peter Bergner 
, Segher Boessenkool 


Hello Jeff:

This patch eliminates redundant zero and sign extension with ree pass for rs6000
target.

Bootstrapped and regtested for powerpc64-linux-gnu.

Thanks & Regards
Ajit


ree: Improve ree pass

For the rs6000 target we see redundant zero and sign extensions, and the ree
pass is improved to eliminate such redundant extensions. Adds support for
zero_extend, sign_extend and AND.
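The idea behind treating an AND with a constant as a zero extension is that masking with 2^k - 1 keeps exactly the low k bits. Here is a minimal C sketch of that identity for a 64-bit value. and_mask_zext_width and and_matches_zext are hypothetical helpers. Unlike the posted rtx_is_zext_p, this sketch only accepts the full-width masks 0xff, 0xffff and 0xffffffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the width in bits (8, 16 or 32) of the zero extension that
   "x & MASK" performs on a 64-bit value, or 0 if MASK is not one of
   these low-bit masks.  */
int
and_mask_zext_width (uint64_t mask)
{
  switch (mask)
    {
    case UINT64_C (0xff):
      return 8;
    case UINT64_C (0xffff):
      return 16;
    case UINT64_C (0xffffffff):
      return 32;
    default:
      return 0;
    }
}

/* Check, for one value X, that each mask gives the same result as a
   truncate-then-zero-extend through the narrower unsigned type.  */
bool
and_matches_zext (uint64_t x)
{
  return (x & 0xff) == (uint64_t) (uint8_t) x
         && (x & 0xffff) == (uint64_t) (uint16_t) x
         && (x & 0xffffffff) == (uint64_t) (uint32_t) x;
}
```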

2023-09-04  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(rtx_is_zext_p): New function.
(feasible_cfg): New function.
* rtl.h (reg_used_set_between_p): Add prototype.
* rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim.C: New testcase.
* g++.target/powerpc/zext-elim-1.C: New testcase.
* g++.target/powerpc/zext-elim-2.C: New testcase.
* g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/ree.cc| 487 --
 gcc/rtl.h |   1 +
 gcc/rtlanal.cc|  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  17 +
 .../g++.target/powerpc/zext-elim-1.C  |  19 +
 .../g++.target/powerpc/zext-elim-2.C  |  11 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
 7 files changed, 534 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..931b9b08821 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,77 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+{
+  rtx set = XEXP (insn, 0);
+  if (REG_P (set))
+   {
+ rtx src = XEXP (insn, 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 0x007F)
+ || (m_mode == HImode && INTVAL (src) == 0x7FFF)
+ || (m_mode == SImode && INTVAL (src) == 0x007F)))
+   return true;
+
+   }
+  else
+   return false;
+}
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+ rtx set = XEXP (SET_SRC (body), 0);
+
+ if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+   {
+ rtx src = XEXP (SET_SRC (body), 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 

[PING ^2][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-11-09 Thread Ajit Agarwal
Ping ^2.


On 23/10/23 2:02 pm, Ajit Agarwal wrote:
> 
> 
> Ping ^1.
> 
>  Forwarded Message 
> Subject: [PING ^0][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp
> Date: Sun, 15 Oct 2023 17:43:24 +0530
> From: Ajit Agarwal 
> To: gcc-patches 
> CC: Segher Boessenkool , Kewen.Lin 
> , Peter Bergner 
> 
> Hello All:
> 
> Please review.
> 
> Thanks & Regards
> Ajit
> 
> 
>  Forwarded Message 
> Subject: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp
> Date: Sun, 8 Oct 2023 00:34:27 +0530
> From: Ajit Agarwal 
> To: gcc-patches 
> CC: Segher Boessenkool , Peter Bergner 
> , Kewen.Lin 
> 
> Hello All:
> 
> This patch adds a new pass to replace vector loads (lxv) from contiguous
> addresses with the MMA instruction lxvp. This patch also addresses one
> regression failure on the ARM architecture.
> 
> Bootstrapped and regtested with powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> rs6000: Add new pass for replacement of contiguous lxv with lxvp.
> 
> New pass to replace contiguous addresses lxv with lxvp. This pass
> is registered after ree rtl pass.
> 
> 2023-10-07  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: Registered vecload pass.
>   * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>   pass.
>   * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>   * config/rs6000/t-rs6000: Add new rule.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/vecload.C: New test.
> ---
>  gcc/config.gcc |   4 +-
>  gcc/config/rs6000/rs6000-passes.def|   1 +
>  gcc/config/rs6000/rs6000-protos.h  |   2 +
>  gcc/config/rs6000/rs6000-vecload-opt.cc| 234 +
>  gcc/config/rs6000/rs6000.cc|   3 +-
>  gcc/config/rs6000/t-rs6000 |   4 +
>  gcc/testsuite/g++.target/powerpc/vecload.C |  15 ++
>  7 files changed, 260 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index ee46d96bf62..482ab094b89 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -515,7 +515,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -552,7 +552,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
>   extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
>   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..9ecf8ce6a9c 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>   The power8 does not have instructions that automaticaly do the byte swaps
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_AFTER (pass_ree, 1, pass_analyze_vecload);
>  
>/* Pass to do the PCREL_OPT optimization that combines the load of an
>   external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index f70118ea40f..9c44bae33d3 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -91,6 +91,7 @@ extern int mems_ok_for_quad_peep (rtx, rtx);
>  extern bool gpr_or_gpr_p (rtx, rtx);
>  extern bool direct_move_p (rtx, rtx);
>  extern bool quad_address_p (rtx, machine_mode, bool);
> +extern bool mode_supports_dq_form (machine_mode);
>  extern bool quad_load_store_p (rtx, rtx);
>  extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
>  extern void expand_fusion_gpr_load (rtx *);
> @@ -344,6 +345,7 @@ class rtl_opt_pass;
>  
>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
>  extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
> +extern rtl_opt_pass *make_pass_analyze_vecload 

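The pairing condition the new pass looks for can be sketched as a predicate over two loads. This simplified C model rests on stated assumptions:

- Each lxv loads 16 bytes.
- Both loads use the same base register, with the second load exactly 16 bytes after the first.
- The first displacement is a multiple of 16, taken here as the DQ-form restriction; prefixed forms are ignored.

struct vload and vloads_pairable_p are hypothetical, not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical description of a vector load: base register number and
   displacement in bytes.  */
struct vload
{
  int base_regno;
  long offset;
};

/* Return true if A followed by B could be merged into one 32-byte
   paired load under the assumptions stated above.  */
bool
vloads_pairable_p (const struct vload *a, const struct vload *b)
{
  return a->base_regno == b->base_regno
         && b->offset == a->offset + 16
         && a->offset % 16 == 0;
}
```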
[PING][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-11-09 Thread Surya Kumari Jangala
Ping

On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
> Hi Segher,
> I have incorporated changes in the code as per the review comments you provided
> for version 2 of the patch. Please review.
> 
> Regards,
> Surya
> 
> 
> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable instructions
> are not optimized, i.e., the webs do not contain any permuting load/store
> instructions along with the associated register swap instructions. Doing
> special handling in such webs will result in the extracted lane being adjusted
> unnecessarily for vec_extract.
> 
> Another issue is that the existing code treats non-permuting loads/stores as
> special swappables. Non-permuting loads/stores (that have not yet been split
> into a permuting load/store and a swap) are handled by converting them into a
> permuting load/store (which effectively removes the swap). As a result, if
> special swappables are handled only in webs containing permuting loads/stores,
> then non-optimal code is generated for non-permuting loads/stores.
> 
> Hence, in this patch, all webs containing either permuting loads/stores or
> non-permuting loads/stores are marked as requiring special handling of
> swappables. Swaps associated with permuting loads/stores are marked for
> removal, and non-permuting loads/stores are converted to permuting
> loads/stores. Then the special swappables in the webs are fixed up.
> 
> This patch also ensures that swappable instructions are not modified in the
> following webs as it is incorrect to do so:
>  - webs containing permuting load/store instructions and associated swap
>instructions that are transformed by converting the permuting memory
>instructions into non-permuting instructions and removing the swap
>instructions.
>  - webs where swap(load(vector constant)) instructions are replaced with
>load(swapped vector constant).
> 
> 2023-09-10  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>   (handle_non_permuting_mem_insn): New function.
>   (rs6000_analyze_swaps): Handle swappable instructions only in certain
>   webs.
>   (web_requires_special_handling): New instance variable.
>   (handle_special_swappables): Remove handling of non-permuting load/store
>   instructions.
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
> index 0388b9bd736..02ea299bc3d 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the swappable insns in the web represented by this entry
> + have to be fixed. Swappable insns have to be fixed in:
> +   - webs containing permuting loads/stores and the swap insns
> +  in such webs have been marked for removal
> +   - webs where non-permuting loads/stores have been converted
> +  to permuting loads/stores  */
> +  unsigned int web_requires_special_handling : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, unsigned i)
>if (dump_file)
>   fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>break;
> -case SH_NOSWAP_LD:
> -  /* Convert a non-permuting load to a permuting one.  */
> -  permute_load (insn);
> -  break;
> -case SH_NOSWAP_ST:
> -  /* Convert a non-permuting store to a permuting one.  */
> -  permute_store (insn);
> -  break;
>  case SH_EXTRACT:
>/* Change the lane on an extract operation.  */
>adjust_extract (insn);
> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>free (to_delete);
>  }
>  
> +/* Return true if insn is a non-permuting load/store.  */
> +static bool
> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  return insn_entry[i].special_handling == SH_NOSWAP_LD
> +  || insn_entry[i].special_handling == SH_NOSWAP_ST;
> +}
> +
> +/* Convert a non-permuting load/store insn to a permuting one.  */
> +static void
> +convert_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  rtx_insn *insn = insn_entry[i].insn;
> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
> +permute_load (insn);
> +  if (insn_entry[i].special_handling == SH_NOSWAP_ST)
> +permute_store (insn);
> +}
> +
>  /* Main entry point for this pass.  */
>  unsigned 

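The lane adjustment at the heart of PR106770 can be illustrated with a toy model. A doubleword swap (xxswapd) exchanges the two 64-bit halves of a vector register, so element i moves to (i + n/2) mod n. An extract from the swapped register must therefore use the adjusted lane. Applying that adjustment to a register that was never swapped selects the wrong element, which is the unnecessary adjustment described in the patch. The helpers below are hypothetical, use int elements, and assume an even element count.

```c
/* Model xxswapd on a vector of N elements: exchange the two halves.  */
void
swap_doublewords (const int *in, int *out, int n)
{
  int half = n / 2;
  for (int i = 0; i < n; i++)
    out[(i + half) % n] = in[i];
}

/* Lane to use when extracting element LANE from a register whose
   doublewords are swapped relative to the original vector.  */
int
adjusted_lane (int lane, int n)
{
  int half = n / 2;
  return lane >= half ? lane - half : lane + half;
}
```

For v = {10, 20, 30, 40}, the swapped register holds {30, 40, 10, 20}. Extracting adjusted_lane (1, 4) = 3 from it yields 20, the original lane 1. Using lane 3 on the unswapped v would wrongly yield 40.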
Re: [PATCH] tree-ssa-loop-ivopts : Add live analysis in regs used in decision making

2023-11-09 Thread Ajit Agarwal
Hello Richard:


On 09/11/23 6:21 pm, Richard Biener wrote:
> On Wed, Nov 8, 2023 at 4:00 PM Ajit Agarwal  wrote:
>>
>> tree-ssa-loop-ivopts : Add live analysis in regs used in decision making.
>>
>> Add live analysis to the regs_used calculation used when deciding which
>> ivopts candidates to select.
>>
>> 2023-11-08  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * tree-ssa-loop-ivopts.cc (get_regs_used): New function.
>> (determine_set_costs): Call to get_regs_used to use live
>> analysis.
>> ---
>>  gcc/tree-ssa-loop-ivopts.cc | 73 +++--
>>  1 file changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
>> index c3336603778..e02fe7d434b 100644
>> --- a/gcc/tree-ssa-loop-ivopts.cc
>> +++ b/gcc/tree-ssa-loop-ivopts.cc
>> @@ -6160,6 +6160,68 @@ ivopts_estimate_reg_pressure (struct ivopts_data 
>> *data, unsigned n_invs,
>>return cost + n_cands;
>>  }
>>
>> +/* Return regs used based on live-in and liveout of given ssa variables.  */
> 
> Please explain how the following code relates to anything like "live
> analysis" and
> where it uses live-in and live-out.  And what "live-in/out of a given
> SSA variable"
> should be.
> 
> Also explain why you are doing this at all.  The patch doesn't come
> with a testcase
> or with any other hint that motivated you.
> 
> Richard.
>

The function get_regs_used increments regs_used based on a live-in
and live-out analysis of the given SSA name. Instead of setting live-in and
live-out bitmaps, I increment regs_used.

Below is how I identify live-in and live-out and increment the regs_used
variable:

a) The def_bb of the gimple statement defining the SSA name is treated as
live-out, and regs_used is incremented.

b) Visit each use of the SSA_NAME; if it isn't in the same block as the def,
 the use block is identified as live on entry and regs_used is incremented.

The function below is a modification of set_var_live_on_entry in
tree-ssa-live.cc, which sets the live-out and live-in bitmaps of basic blocks.
Instead of setting bitmaps, regs_used is incremented.

I define regs_used as the number of live-in and live-out points of the
given SSA name.

For each IV candidate SSA variable I compute regs_used, and take the maximum
regs_used over all the IV candidates; this value is used in the
ivopts_estimate_reg_pressure cost analysis.

The motivation behind this optimization is that I get good performance
improvements, around 2% to 7%, for several SPEC CPU 2017 FP and INT benchmarks.

Also, setting regs_used to the number of IV candidates is neither an optimized
nor a robust basis for ivopts decision making, so I base it on live-in and
live-out analysis, which is a more correct and appropriate way of identifying
regs_used.

Also, there are no regressions; bootstrapped and regtested on
powerpc64-linux-gnu.

Thanks & Regards
Ajit
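The counting performed by get_regs_used can be modeled without GIMPLE. This sketch uses hypothetical names and leaves out virtual and undefined SSA names, which the function rejects early. The model's conventions:

- Blocks are plain integers.
- DEF_BB is -1 for a name with no defining block.
- Each entry of USE_BBS is the block of a non-debug use, or, for a PHI use, the source block of the incoming edge.

As in the posted code, every qualifying use increments the count, so two uses in the same block count twice.

```c
/* Toy model of get_regs_used: one for a defining block, plus one for
   every use whose block differs from the defining block.  */
unsigned
count_regs_used (int def_bb, const int *use_bbs, int n_uses)
{
  unsigned regs_used = def_bb >= 0 ? 1 : 0;
  for (int i = 0; i < n_uses; i++)
    if (use_bbs[i] != def_bb)
      regs_used++;              /* Counted once per use, not per block.  */
  return regs_used;
}
```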
 
>> +static unsigned
>> +get_regs_used (tree ssa_name)
>> +{
>> +  unsigned regs_used = 0;
>> +  gimple *stmt;
>> +  use_operand_p use;
>> +  basic_block def_bb = NULL;
>> +  imm_use_iterator imm_iter;
>> +
>> +  stmt = SSA_NAME_DEF_STMT (ssa_name);
>> +  if (stmt)
>> +{
>> +  def_bb = gimple_bb (stmt);
>> +  /* Mark defs in liveout bitmap temporarily.  */
>> +  if (def_bb)
>> +   regs_used++;
>> +}
>> +  else
>> +def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>> +
>> +  /* An undefined local variable does not need to be very alive.  */
>> +  if (virtual_operand_p (ssa_name)
>> +  || ssa_undefined_value_p (ssa_name, false))
>> +return 0;
>> +
>> +  /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>> + add it to the list of live on entry blocks.  */
>> +  FOR_EACH_IMM_USE_FAST (use, imm_iter, ssa_name)
>> +{
>> +  gimple *use_stmt = USE_STMT (use);
>> +  basic_block add_block = NULL;
>> +
>> +  if (gimple_code (use_stmt) == GIMPLE_PHI)
>> +   {
>> + /* Uses in PHI's are considered to be live at exit of the SRC block
>> +as this is where a copy would be inserted.  Check to see if it is
>> +defined in that block, or whether its live on entry.  */
>> + int index = PHI_ARG_INDEX_FROM_USE (use);
>> + edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt), index);
>> + if (e->src != def_bb)
>> +   add_block = e->src;
>> +   }
>> +  else if (is_gimple_debug (use_stmt))
>> +   continue;
>> +  else
>> +   {
>> + /* If its not defined in this block, its live on entry.  */
>> + basic_block use_bb = gimple_bb (use_stmt);
>> + if (use_bb != def_bb)
>> +   add_block = use_bb;
>> +   }
>> +
>> +  /* If there was a live on entry use, increment register used.  */
>> +  if (add_block)
>> +   {
>> + regs_used++;
>> +   }
>> +}
>> +  return regs_used;
>> +}
>> +
>>  /* For each size of the induction variable set determine the penalty.  */
>>
>>  static 

RE: [PATCH v1] RISC-V: Support vec_init for trailing same element

2023-11-09 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zhong 
Sent: Friday, November 10, 2023 2:32 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Li, Pan2 ; Wang, Yanzhang ; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support vec_init for trailing same element

lgtm
 Replied Message 
From: pan2...@intel.com
Date: 11/10/2023 14:22
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Support vec_init for trailing same element



Re: [PATCH v1] RISC-V: Support vec_init for trailing same element

2023-11-09 Thread juzhe.zhong
lgtm
 Replied Message 
From: pan2...@intel.com
Date: 11/10/2023 14:22
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Support vec_init for trailing same element


[PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just VCE:((a cmp b) ? (VCE c) : (VCE d)).

2023-11-09 Thread liuhongt
While working on PR112443, I noticed some missed optimizations:
after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
fails to combine them back into v{,p}blendv{b,ps,pd} because the pattern
is too complicated, so I think we should handle it at the gimple
level.

The dump is like

  _1 = c_3(D) >= { 0, 0, 0, 0 };
  _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  _7 = VIEW_CONVERT_EXPR(_2);
  _8 = VIEW_CONVERT_EXPR(b_6(D));
  _9 = VIEW_CONVERT_EXPR(a_5(D));
  _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _11 = VEC_COND_EXPR <_10, _8, _9>;

It can be optimized to

  _1 = c_2(D) >= { 0, 0, 0, 0 };
  _6 = VEC_COND_EXPR <_1, b_5(D), a_4(D)>;

Since each element of _7 is either -1 or 0, selecting _7 < 0 ? _8 : _9
is equivalent to _1 ? b : a, as long as the TYPE_PRECISION of the
component type of the second VEC_COND_EXPR is less than or equal to
that of the first one.  This patch adds a gimple pattern to handle that.
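The equivalence can be checked with a scalar C model of the 256-bit case from the dump (4 x 64-bit mask lanes viewed as 32 signed bytes). This is a hypothetical illustration, not code from the patch: `select_lanes` stands for `_1 ? b : a`, and `select_bytes` for the byte-wise `VEC_COND_EXPR <_10, _8, _9>` on the view-converted values.

```c
#include <stdint.h>
#include <string.h>

#define NLANES 4                                /* 4 x 64-bit lanes.  */
#define NBYTES (NLANES * sizeof (int64_t))      /* 32 bytes.  */

/* Simplified form: one decision per 64-bit lane.  MASK lanes must be
   0 or -1, like _2 in the dump.  */
static void
select_lanes (const int64_t mask[NLANES], const int64_t b[NLANES],
              const int64_t a[NLANES], int64_t out[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    out[i] = mask[i] != 0 ? b[i] : a[i];
}

/* Original form: view everything as signed bytes (the VIEW_CONVERT_EXPRs)
   and pick each byte by the sign of the corresponding mask byte.  */
static void
select_bytes (const int64_t mask[NLANES], const int64_t b[NLANES],
              const int64_t a[NLANES], int64_t out[NLANES])
{
  signed char m[NBYTES], vb[NBYTES], va[NBYTES], vo[NBYTES];

  memcpy (m, mask, NBYTES);
  memcpy (vb, b, NBYTES);
  memcpy (va, a, NBYTES);
  for (size_t i = 0; i < NBYTES; i++)
    vo[i] = m[i] < 0 ? vb[i] : va[i];
  memcpy (out, vo, NBYTES);
}
```

Every byte of a 0 / -1 lane carries that lane's sign, so the two selections agree. The reverse direction (a narrow mask viewed as wider lanes) has no such property, which is why the pattern requires the second VEC_COND_EXPR's element precision to be no larger than the first one's.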

gcc/ChangeLog:

* match.pd (VCE:(a cmp b ? -1 : 0) < 0) ? c : d ---> VCE:((a
cmp b) ? (VCE:c) : (VCE:d)): New gimple simplification.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-blendv-3.c: New test.
* gcc.target/i386/blendv-3.c: New test.
---
 gcc/match.pd  | 19 
 .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
 gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
 3 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index dbc811b2b38..4d823882a7c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5170,6 +5170,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
 
+(for cmp (simple_comparison)
+ (simplify
+  (vec_cond
+(lt (view_convert?@5 (vec_cond@6 (cmp@4 @0 @1)
+integer_all_onesp
+integer_zerop))
+ integer_zerop) @2 @3)
+  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
+   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
+   && !TYPE_UNSIGNED (TREE_TYPE (@5))
+   && VECTOR_TYPE_P (TREE_TYPE (@6))
+   && VECTOR_TYPE_P (type)
+   && (TYPE_PRECISION (TREE_TYPE (type))
+ <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6
+   && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (@6)))
+   (with { tree vtype = TREE_TYPE (@6);}
+ (view_convert:type
+   (vec_cond @4 (view_convert:vtype @2) (view_convert:vtype @3)))
+
 /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
 (simplify
  (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
new file mode 100644
index 000..2777e72ab5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include "blendv-3.c"
diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c b/gcc/testsuite/gcc.target/i386/blendv-3.c
new file mode 100644
index 000..fa0fb067a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include 
+
+__m256i
+foo (__m256i a, __m256i b, __m256i c)
+{
+  return _mm256_blendv_epi8 (a, b, ~c < 0);
+}
+
+__m256d
+foo1 (__m256d a, __m256d b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_pd (a, b, (__m256d)d);
+}
+
+__m256
+foo2 (__m256 a, __m256 b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_ps (a, b, (__m256)d);
+}
+
+__m128i
+foo4 (__m128i a, __m128i b, __m128i c)
+{
+  return _mm_blendv_epi8 (a, b, ~c < 0);
+}
+
+__m128d
+foo5 (__m128d a, __m128d b, __m128i c)
+{
+  __m128i d = ~c < 0;
+  return _mm_blendv_pd (a, b, (__m128d)d);
+}
+
+__m128
+foo6 (__m128 a, __m128 b, __m128i c)
+{
+  __m128i d = ~c < 0;
+  return _mm_blendv_ps (a, b, (__m128)d);
+}
-- 
2.31.1



Re: RFA: make scan-assembler* ignore LTO sections (Was: Re: committed [RISC-V]: Harden test scan patterns)

2023-11-09 Thread Jeff Law




On 11/8/23 09:00, Joern Rennecke wrote:

On Fri, 29 Sept 2023 at 14:54, Jeff Law  wrote:

...  Joern  can you post a follow-up manual twiddle so
that other ports can follow your example and avoid this problem?

Thanks,

jeff

The attached patch makes the scan-assembler* directives ignore the LTO
sections.
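The effect of ignoring LTO sections when scanning can be illustrated with a small C model. This is a hypothetical sketch, not the Tcl code in lib/scanasm.exp: `count_outside_lto` and `starts_directive` are invented names, and the sketch assumes LTO data is emitted into sections whose names begin with `.gnu.lto_`, with any following `.section`, `.text`, `.data` or `.bss` directive ending such a section.

```c
#include <stdbool.h>
#include <string.h>

/* True if LINE, after leading blanks, starts with directive D followed by a
   blank, a comma or the end of the line.  */
static bool
starts_directive (const char *line, const char *d)
{
  size_t n = strlen (d);
  line += strspn (line, " \t");
  return strncmp (line, d, n) == 0
         && (line[n] == '\0' || line[n] == ' ' || line[n] == '\t'
             || line[n] == ',');
}

/* Hypothetical helper: count the lines of ASM_TEXT containing PATTERN,
   ignoring lines that belong to .gnu.lto_* sections.  .previous,
   .pushsection and similar directives are not modeled, and lines longer
   than 511 characters are truncated before matching.  */
static int
count_outside_lto (const char *asm_text, const char *pattern)
{
  bool in_lto = false;
  int count = 0;
  char line[512];
  const char *p = asm_text;

  while (*p != '\0')
    {
      size_t len = strcspn (p, "\n");
      size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
      memcpy (line, p, copy);
      line[copy] = '\0';
      p += len + (p[len] == '\n');

      if (starts_directive (line, ".section"))
        {
          const char *name = line + strspn (line, " \t") + strlen (".section");
          name += strspn (name, " \t");
          in_lto = strncmp (name, ".gnu.lto_", strlen (".gnu.lto_")) == 0;
        }
      else if (starts_directive (line, ".text")
               || starts_directive (line, ".data")
               || starts_directive (line, ".bss"))
        in_lto = false;

      if (!in_lto && strstr (line, pattern) != NULL)
        count++;
    }
  return count;
}
```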

Regression tested (using QEMU) for
 riscv-sim
 riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
 riscv-sim/-march=rv32imac/-mabi=ilp32
 riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
 riscv-sim/-march=rv64imac/-mabi=lp64


scanasm-diff-5.txt

2023-11-08  Joern Rennecke

gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Disregard LTO sections.
(scan-assembler-dem, scan-assembler-dem-not): Likewise.
(dg-scan): Likewise, if name starts with scan-assembler.
(scan-raw-assembler): New proc.
* gcc.dg/pr61868.c: Use scan-raw-assembler.
* gcc.dg/scantest-lto.c: New test.
gcc/
* doc/sourcebuild.texi (Scan the assembly output): Document change.

Looks reasonable to me. OK for the trunk.
jeff


Re: [PATCH v4 1/2] c++: Initial support for P0847R7 (Deducing this) [PR102609]

2023-11-09 Thread waffl3x
Ahh, I should have updated my progress last night after all, it would
have saved us some time. Regardless, it's nice to see we independently
came to the same conclusions.

Side note, would you prefer I compile the lambda and by-value fixes
into a new version of this patch? Or as a separate patch? Originally I
had planned to put it in another patch, but I identified that the code
I wrote in build_over_call was kind of fundamentally broken and it was
almost merely coincidence that it worked at all. In light of this and
your comments (which I've skimmed, I will respond directly below) I
think I should just revise this patch with everything else.


On Thursday, November 9th, 2023 at 2:53 PM, Jason Merrill wrote:


> 
> 
> On 11/5/23 10:06, waffl3x wrote:
> 
> > Bootstrapped and tested on x86_64-linux with no regressions.
> > 
> > I originally threw this e-mail together last night, but threw in the
> > towel when I thought I saw tests failing and went to sleep. I did a
> > proper bootstrap and comparison and whatnot and found that there were
> > thankfully no regressions.
> > 
> > Anyhow, the first patch feels ready for trunk, the second needs at
> > least one review, I'll write more on that in the second e-mail though.
> > I put quite a lot into the commit message, in hindsight I think I may
> > have gone overboard, but that isn't something I'm going to rewrite at
> > the moment. I really want to get these patches up for review so they
> > can be finalized.
> > 
> > I'm also including my usual musings on things that came up as I was
> > polishing off the patches. I reckon some of them aren't all that
> > important right now but I would rather toss them in here than forget
> > about them.
> > 
> > I'm starting to think that we should have a general macro that
> > indicates whether an implicit object argument should be passed in the
> > call. It might be more clear than what is currently present. I've also
> > noticed that there's a fair amount of places where instead of using
> > DECL_NONSTATIC_MEMBER_FUNCTION_P the code checks if tree_code of the
> > type is a METHOD_TYPE, which is exactly what the aforementioned macro
> > does.
> 
> 
> Agreed.
> 
> > In build_min_non_dep_op_overload I reversed the branches of a condition
> > because it made more sense with METHOD_TYPE first so it doesn't have to
> > take xobj member functions into account on both branches. I am slightly
> > concerned that flipping the branch around might have consequences,
> > hence why I am mentioning it. Realistically I think it's probably fine
> > though.
> 
> 
> Agreed.

Great, I was definitely concerned about this.
 
> > BTW let me know if there's anything you would prefer to be done
> > differently in the changelog, I am still having trouble writing them
> > and I'm usually uncertain if I'm writing them properly.
> 
> > (DECL_FUNCTION_XOBJ_FLAG): Define.
> 
> 
> This is usually "New macro" or just "New".
> 
> > * decl.cc (grokfndecl): New param XOBJ_FUNC_P, for xobj member
> > functions set DECL_FUNCTION_XOBJ_FLAG and don't set
> > DECL_STATIC_FUNCTION_P.
> > (grokdeclarator): Check for xobj param, clear its purpose and set
> > is_xobj_member_function if it is present. When flag set, don't change
> > type to METHOD_TYPE, keep it as FUNCTION_TYPE.
> > Adjust call to grokfndecl, pass is_xobj_member_function.
> 
> 
> These could be less verbose; for grokfndecl it makes sense to mention
> the new parameter, but otherwise just saying "handle explicit object
> member functions" is enough.

Will do.

> > It needs to be noted that we cannot add checking for xobj member functions to
> > DECL_NONSTATIC_MEMBER_FUNCTION_P as it is used in cp-objcp-common.cc. While it
> > most likely would be fine, it's possible it could have unintended effects. In
> > light of this, we will most likely need to do some refactoring, possibly
> > renaming and replacing it. In contrast, DECL_FUNCTION_MEMBER_P is not used
> > outside of C++ code, so we can add checking for xobj member functions to it
> > without any concerns.
> 
> 
> I think DECL_NONSTATIC_MEMBER_FUNCTION_P should probably be renamed to
> DECL_IOBJ_MEMBER_FUNC_P to parallel the new macro...
> 
> > @@ -3660,6 +3660,7 @@ build_min_non_dep_op_overload (enum tree_code op,
> > 
> > expected_nargs = cp_tree_code_length (op);
> > if (TREE_CODE (TREE_TYPE (overload)) == METHOD_TYPE
> > + || DECL_XOBJ_MEMBER_FUNC_P (overload)
> 
> 
> ...and then the combination should have its own macro, perhaps
> DECL_OBJECT_MEMBER_FUNC_P, spelling out OBJECT to avoid visual confusion
> with either IOBJ/XOBJ.
> 
> Renaming the old macro doesn't need to happen in this patch, but adding
> the new macro should.

Sounds good, I will add it in the next revision.

> > There are a few known issues still present in this patch. Most importantly,
> > the implicit object argument fails to convert when passed to by-value xobj
> > parameters. This occurs both for xobj parameters that match the argument type
> > and 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-09 Thread waffl3x

> > I'm unfortunately going down a rabbit hole again.
> > 
> > --function.h:608
> > /* If pointers to member functions use the least significant bit to
> >    indicate whether a function is virtual, ensure a pointer to this
> >    function will have that bit clear.  */
> > #define MINIMUM_METHOD_BOUNDARY \
> >   ((TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn) \
> >    ? MAX (FUNCTION_BOUNDARY, 2 * BITS_PER_UNIT) : FUNCTION_BOUNDARY)
> 
> 
> So yes, it was for PMFs using the low bit of the pointer to indicate a
> virtual member function. Since an xobj memfn can't be virtual, it's
> correct for them to have the same alignment as a static memfn.
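The low-bit scheme Jason describes can be illustrated with a small C model of the Itanium C++ ABI representation of a pointer to member function (the layout selected when TARGET_PTRMEMFUNC_VBIT_LOCATION is ptrmemfunc_vbit_in_pfn): for a virtual function the `ptr` field holds 1 plus the vtable offset in bytes, otherwise it holds the function's address. The struct and helper names below are hypothetical; the point is only that the tag bit is unambiguous when non-virtual member functions are at least 2-byte aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an Itanium-ABI pointer to member function.  */
struct pmf_model
{
  uintptr_t ptr;  /* Function address, or 1 + vtable offset if virtual.  */
  ptrdiff_t adj;  /* Adjustment applied to the this pointer.  */
};

/* Only well-formed if FN_ADDR is even, i.e. the function is aligned to at
   least 2 bytes (what MINIMUM_METHOD_BOUNDARY enforces).  */
static struct pmf_model
make_nonvirtual_pmf (uintptr_t fn_addr)
{
  struct pmf_model p = { fn_addr, 0 };
  return p;
}

static struct pmf_model
make_virtual_pmf (size_t vtable_offset)
{
  struct pmf_model p = { (uintptr_t) vtable_offset + 1, 0 };
  return p;
}

static bool
pmf_is_virtual (struct pmf_model p)
{
  return (p.ptr & 1) != 0;
}

static size_t
pmf_vtable_offset (struct pmf_model p)
{
  return (size_t) (p.ptr - 1);
}
```

An odd function address would be misread as a virtual slot, which is why only functions that can be the target of a non-virtual PMF need the extra alignment.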

Is it worth considering whether we want to support virtual xobj member
functions in the future? If that were the case would it be better if we
aligned things a little differently here? Or might it be better if we
wanted to support it as an extension to just effectively translate the
declaration back to one that is a METHOD_TYPE? I imagine this would be
the best solution for non-standard support of the syntax. We would
simply have to forbid by-value and conversion semantics and on the
user's side they would get consistent syntax.

However, this flies in the face of the defective/contradictory spec for
virtual function overrides. So I'm not really sure whether we would
want to do this. I just want to raise the question before we lock in
the alignment, if pushing the patch locks it in that is, I'm not really
sure if it needs to be stable or not.

> > I stumbled upon this while cleaning up the patch, grokfndecl is just so
> > full of cruft it's crazy hard to reason about. There's more than one
> > block that I am near certain is completely dead code. I would like to
> > just ignore them for now but some of them unfortunately pertain to xobj
> > functions. I just don't feel good about putting in any hacks, but to
> > really get any modifications in here correct it would need to be
> > refactored much more than I should be doing in this patch.
> > 
> > Here's another example that I'm not sure how I want to address it.
> > 
> > :10331~decl.cc grokfndecl
> > int staticp = ctype && TREE_CODE (type) == FUNCTION_TYPE;
> > :10506~decl.cc grokfndecl
> > /* If this decl has namespace scope, set that up.  */
> > if (in_namespace)
> >   set_decl_namespace (decl, in_namespace, friendp);
> > else if (ctype)
> >   DECL_CONTEXT (decl) = ctype;
> > else
> >   DECL_CONTEXT (decl) = FROB_CONTEXT (current_decl_namespace ());
> > And just a few lines down;
> > :10529~decl.cc
> > /* Should probably propagate const out from type to decl I bet (mrs).  */
> > if (staticp)
> >   {
> >     DECL_STATIC_FUNCTION_P (decl) = 1;
> >     DECL_CONTEXT (decl) = ctype;
> >   }
> > 
> > If staticp is true, ctype must have been non-null, and if ctype is
> > non-null, the context for decl should have been set in the second
> > block. So why was the code in the second block added?
> > 
> > commit f3665bdc1799c0421490b5e655f977570354
> > Author: Nathan Sidwell nat...@acm.org
> > Date: Tue Jul 28 08:57:36 2020 -0700
> > 
> > c++: Set more DECL_CONTEXTs
> > 
> > I discovered we were not setting DECL_CONTEXT in a few cases, and
> > grokfndecl's control flow wasn't making it clear that we were doing it
> > in all cases.
> > 
> > gcc/cp/
> > * cp-gimplify.c (cp_genericize_r): Set IMPORTED_DECL's context.
> > * cp-objcp-common.c (cp_pushdecl): Set decl's context.
> > * decl.c (grokfndecl): Make DECL_CONTEXT setting clearer.
> > 
> > According to the commit message, the change was made because the control
> > flow was unclear, which quite frankly I agree with; it just wasn't noticed
> > that the code below redundantly sets the context, so it wasn't removed.
> > 
> > This puts me in a dilemma though, do I put another condition in that
> > code block for the xobj case even though the code is nearly dead? Or do
> > I give it a minor refactor for it to make a little more sense? If I add
> > to the code I feel like it's just going to add to the problem, while if
> > I give it a minor refactor it still won't look great and has a greater
> > chance of breaking something.
> > 
> > In this case I'm going to risk refactoring it, staticp is only used in
> > that one place so I will just rip it out. I am not concerned with decl's
> > type spontaneously changing to something that is not FUNCTION_TYPE, and
> > if it did I think there are bigger problems afoot.
> > 
> > I guess I'll know if I went too far with the refactoring when the patch
> > reaches you, do let me know about this one specifically though because
> > it took up a lot of my time trying to decide how to address it.
> 
> 
> Removing the redundant DECL_CONTEXT setting seems appropriate, and
> changing how staticp is handled to reflect that xobfns can also have
> FUNCTION_TYPE.

I removed staticp as it was only used in that one case. I'm pretty
happy with the resulting code but I saw you replied on the patch as
well so I'll see if you commented on it in the review and address your
thoughts there.

> > All tests 

Re: [PATCH 2/3] attribs: Consider namespaces when comparing attributes

2023-11-09 Thread Jeff Law




On 11/6/23 05:24, Richard Sandiford wrote:

decl_attributes and comp_type_attributes both had code that
iterated over one list of attributes and looked for corresponding
attributes in another list.  This patch makes those lookups
namespace-aware.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* attribs.cc (find_same_attribute): New function.
(decl_attributes, comp_type_attributes): Use it when looking
up one list's attributes in another list.

OK
jeff


Re: [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

2023-11-09 Thread Jeff Law




On 10/4/23 12:28, Brendan Shanks wrote:

Hi,

This patch implements pex_unix_exec_child using posix_spawn when
available.

This should especially benefit recent macOS (where vfork just calls
fork), but should have equivalent or faster performance on all
platforms.
In addition, the implementation is substantially simpler than the
vfork+exec code path.
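The basic posix_spawn pattern that replaces vfork+exec can be sketched as follows. This is a minimal illustration, not Brendan's implementation: `spawn_and_wait` is a hypothetical name, and pex-unix.c additionally has to wire up pipes and redirections (for example through `posix_spawn_file_actions_t`), which this sketch omits.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run ARGV[0], searched for in PATH, with the current
   environment, wait for it, and return its exit status, or -1 if it could
   not be spawned or did not exit normally.  */
static int
spawn_and_wait (char *const argv[])
{
  pid_t pid;
  int status;

  /* posix_spawnp returns an error number directly instead of setting errno;
     no file actions or spawn attributes are used here.  */
  if (posix_spawnp (&pid, argv[0], NULL, NULL, argv, environ) != 0)
    return -1;

  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Unlike fork/vfork, there is no child-side code to run between process creation and exec, which is one reason a spawn-based path can be simpler than the vfork+exec one.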

Tested on x86_64-linux.

v2: Fix error handling (previously the function would be run twice in
case of error), and don't use a macro that changes control flow.

v3: Match file style for error-handling blocks, don't close
in/out/errdes on error, and check close() for errors.

libiberty/
* configure.ac (AC_CHECK_HEADERS): Add spawn.h.
(checkfuncs): Add posix_spawn, posix_spawnp.
(AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
* configure, config.in: Rebuild.
* pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.
Thanks.  I pushed this based on Richi's ACK after fixing some minor 
whitespace problems and rebuilding the generated files.


Sorry about the long delay.

jeff


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Jeff Law




On 11/9/23 18:57, Kewen.Lin wrote:

Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:


Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if
the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
already quite a bit of logic in the scheduler to skip them _as part of normal
operation_.


Yeah, I noticed that the scheduler handles DEBUG_INSNs as part of normal
operation.  When I started to work on this issue, I initially wanted to try
something similar to your idea #2, but when checking the APIs, I realized:
why not skip basic blocks containing only DEBUG_INSNs as well, just as we
already do for NOTEs and LABELs?  IMHO there is no value in trying to
schedule this kind of BB (to-be-scheduled range); skipping it saves some
resource allocation (like block dependencies) and makes things more
efficient (we don't enter schedule_block etc.), so from this perspective it
seems an enhancement.  Does that sound reasonable to you?
It sounds reasonable, but only if doing so doesn't add significant 
implementation complexity.  ie, the gains from doing less work here are 
likely to be very marginal, so I'm more interested in clean, easy to 
maintain code.


Jeff


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Jeff Law




On 11/9/23 10:40, Alexander Monakov wrote:


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:


Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if
the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
already quite a bit of logic in the scheduler to skip them _as part of normal
operation_.


I agree with the concern. I hoped that solving the problem by skipping the BB
like the (bit-rotted) debug code needs to would be a minor surgery. As things
look now, it may be better to remove the non-working sched_block debug counter
entirely and implement a good solution for the problem at hand.
I wouldn't lose sleep over this -- if removing a debug counter from this 
code simplifies the implementation, that's fine with me.  I believe they 
were all added when significant work in the scheduler was more common.





Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
boundaries when -fcompare-debug is enabled? It should make the problem
readily detectable by -fcompare-debug even when scheduling did not diverge.

I like it.

jeff


[PATCH] Support vec_set/vec_extract/vec_init for V4HF/V2HF.

2023-11-09 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_expand_vector_init_duplicate): Handle V4HF/V4BF and
V2HF/V2BF.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_one_var): Ditto.
(ix86_expand_vector_init_general): Ditto.
(ix86_expand_vector_set_var): Ditto.
(ix86_expand_vector_set): Ditto.
(ix86_expand_vector_extract): Ditto.
* config/i386/mmx.md
(mmxdoublevecmode): Extend to V4HF/V4BF/V2HF/V2BF.
(*mmx_pinsrw): Extend to V4FI_64, add a new alternative (,
x, x), add a new define_split after the pattern.
(*mmx_pextrw): New define_insn.
(mmx_pshufw_1): Rename to ..
(mmx_pshufw_1): .. this, extend to V4FI_64.
(*mmx_pblendw64): Extend to V4FI_64.
(*vec_dup): New define_insn.
(vec_setv4hi): Rename to ..
(vec_set): .. this, and extend to V4FI_64
(vec_extractv4hihi): Rename to ..
(vec_extract): .. this, and extend
to V4FI_64.
(vec_init): New define_insn.
(*pinsrw): Extend to V2FI_32, add a new alternative (,
x, x), and add a new define_split after it.
(*pextrw): New define_insn.
(vec_setv2hi): Rename to ..
(vec_set): .. this, extend to V2FI_32.
(vec_extractv2hihi): Rename to ..
(vec_extract): .. this, extend to
V2FI_32.
(*punpckwd): Extend to V2FI_32.
(*pshufw_1): Rename to ..
(*pshufw_1): .. this, extend to V2FI_32.
(vec_initv2hihi): Rename to ..
(vec_init): .. this, and extend to
V2FI_32.
(*vec_dup): New define_insn.
* config/i386/sse.md (*vec_extract): Refine constraint
from v to Yw.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-vec_elem-1.c: New test.
* gcc.target/i386/part-vect-vec_elem-2.c: New test.
---
 gcc/config/i386/i386-expand.cc|  60 
 gcc/config/i386/mmx.md| 271 ++
 gcc/config/i386/sse.md|   4 +-
 .../gcc.target/i386/part-vect-vec_elem-1.c| 135 +
 .../gcc.target/i386/part-vect-vec_elem-2.c| 135 +
 5 files changed, 541 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_elem-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_elem-2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8fad73c1549..b52ec51fbe4 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15592,6 +15592,17 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
}
   goto widen;
 
+case E_V4HFmode:
+case E_V4BFmode:
+  if (TARGET_MMX_WITH_SSE)
+   {
+ val = force_reg (GET_MODE_INNER (mode), val);
+ rtx x = gen_rtx_VEC_DUPLICATE (mode, val);
+ emit_insn (gen_rtx_SET (target, x));
+ return true;
+   }
+  return false;
+
 case E_V2HImode:
   if (TARGET_SSE2)
{
@@ -15605,6 +15616,17 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
}
   return false;
 
+case E_V2HFmode:
+case E_V2BFmode:
+  if (TARGET_SSE2)
+   {
+ val = force_reg (GET_MODE_INNER (mode), val);
+ rtx x = gen_rtx_VEC_DUPLICATE (mode, val);
+ emit_insn (gen_rtx_SET (target, x));
+ return true;
+   }
+  return false;
+
 case E_V8QImode:
 case E_V4QImode:
   if (!mmx_ok)
@@ -15815,6 +15837,8 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
   use_vector_set = TARGET_MMX_WITH_SSE && TARGET_SSE4_1;
   break;
 case E_V4HImode:
+case E_V4HFmode:
+case E_V4BFmode:
   use_vector_set = TARGET_SSE || TARGET_3DNOW_A;
   break;
 case E_V4QImode:
@@ -16051,6 +16075,8 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
 case E_V4SImode:
 case E_V8HImode:
 case E_V4HImode:
+case E_V4HFmode:
+case E_V4BFmode:
   break;
 
 case E_V16QImode:
@@ -16438,6 +16464,7 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
   rtx ops[64], op0, op1, op2, op3, op4, op5;
   machine_mode half_mode = VOIDmode;
   machine_mode quarter_mode = VOIDmode;
+  machine_mode int_inner_mode = VOIDmode;
   int n, i;
 
   switch (mode)
@@ -16582,6 +16609,13 @@ quarter:
   ix86_expand_vector_init_interleave (mode, target, ops, n >> 1);
   return;
 
+case E_V4HFmode:
+case E_V4BFmode:
+case E_V2HFmode:
+case E_V2BFmode:
+  int_inner_mode = HImode;
+  break;
+
 case E_V4HImode:
 case E_V8QImode:
 
@@ -16613,6 +16647,16 @@ quarter:
  for (j = 0; j < n_elt_per_word; ++j)
{
  rtx elt = XVECEXP (vals, 0, (i+1)*n_elt_per_word - j - 1);
+ if 

[PATCH] RISC-V: Add combine optimization by slideup for vec_init vectorization

2023-11-09 Thread Juzhe-Zhong
This patch is a small optimization for vector initialization, which I
discovered while evaluating benchmarks.

Consider this following case:
void foo3 (int8_t *out, int8_t x, int8_t y)
{
  v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x};
  *(v16qi*)out = v;
}

Before this patch:

vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v1,a2
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vslide1down.vx  v1,v1,a1
vse8.v  v1,0(a0)
ret

After this patch:

vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v1,a1
vmv.v.x v2,a2
vslideup.vi v1,v2,8
vse8.v  v1,0(a0)
ret
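For reference, `vslideup.vi vd, vs2, OFFSET` leaves vd[0..OFFSET-1] unchanged and writes vd[i] = vs2[i - OFFSET] for OFFSET <= i < vl. A hypothetical scalar model of the two-broadcast-plus-slideup sequence above (16 x int8_t, vl = 16, tail and mask policies ignored; the `model_*` names are invented for illustration):

```c
#include <stdint.h>

#define VL 16

/* vmv.v.x: broadcast a scalar into every element.  */
static void
model_vmv_v_x (int8_t vd[VL], int8_t x)
{
  for (int i = 0; i < VL; i++)
    vd[i] = x;
}

/* vslideup.vi vd, vs2, off: elements below OFF keep their old value.  */
static void
model_vslideup (int8_t vd[VL], const int8_t vs2[VL], int off)
{
  for (int i = off; i < VL; i++)
    vd[i] = vs2[i - off];
}

/* Models: vmv.v.x v1,a1; vmv.v.x v2,a2; vslideup.vi v1,v2,8.  */
static void
model_combine (int8_t out[VL], int8_t x, int8_t y)
{
  int8_t v2[VL];

  model_vmv_v_x (out, x);
  model_vmv_v_x (v2, y);
  model_vslideup (out, v2, 8);
}
```

With x in a1 and y in a2, this sequence yields eight copies of x followed by eight copies of y.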

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): New enum.
* config/riscv/riscv-v.cc 
(rvv_builder::combine_sequence_use_slideup_profitable_p): New function.
(expand_vector_init_slideup_combine_sequence): Ditto.
(expand_vec_init): Add slideup combine optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add combine test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-7.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |   5 +
 gcc/config/riscv/riscv-v.cc   |  53 +++
 .../riscv/rvv/autovec/vls-vlmax/combine-1.c   |  30 ++
 .../riscv/rvv/autovec/vls/combine-1.c | 338 ++
 .../riscv/rvv/autovec/vls/combine-2.c | 178 +
 .../riscv/rvv/autovec/vls/combine-3.c |  98 +
 .../riscv/rvv/autovec/vls/combine-4.c |  58 +++
 .../riscv/rvv/autovec/vls/combine-5.c | 178 +
 .../riscv/rvv/autovec/vls/combine-6.c |  98 +
 .../riscv/rvv/autovec/vls/combine-7.c |  58 +++
 .../gcc.target/riscv/rvv/autovec/vls/def.h|   7 +
 11 files changed, 1101 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/combine-7.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 85d4f6ed9ea..ad8c42018d9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -386,6 +386,11 @@ enum insn_type : unsigned int
   COMPRESS_OP_MERGE
   = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P | BINARY_OP_P,
 
+  /* For vslideup.up has merge operand but use ta.  */
+  SLIDEUP_OP_MERGE = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
+| HAS_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
+| BINARY_OP_P,
+
   /* For vreduce, no mask policy operand. */
   REDUCE_OP = __NORMAL_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
   REDUCE_OP_M = __MASK_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c32cd8abe6c..4381d0abc88 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -414,6 +414,7 @@ public:
   rtx get_merged_repeating_sequence ();
 
   bool repeating_sequence_use_merge_profitable_p ();
+  bool combine_sequence_use_slideup_profitable_p ();
   rtx get_merge_scalar_mask (unsigned int) const;
 
   bool single_step_npatterns_p () const;
@@ -511,6 +512,22 @@ rvv_builder::repeating_sequence_use_merge_profitable_p ()
   return (build_merge_mask_cost + merge_cost) * npatterns () < slide1down_cost;
 }
 
+/* Return true if it's worthwhile to use slideup combine 2 vectors.  */
+bool
+rvv_builder::combine_sequence_use_slideup_profitable_p ()
+{
+  int nelts = full_nelts ().to_constant ();
+  int leading_ndups = this->count_dups (0, nelts - 1, 1);
+  int trailing_ndups = this->count_dups (nelts - 1, -1, -1);
+
+  /* ??? Current heuristic we do is we do combine 2 vectors
+ by slideup when:
+   1. # of leading same elements is equal to # of trailing same elements.
+   2. 

[PATCH] RISC-V: Robustify vec_init pattern[NFC]

2023-11-09 Thread Juzhe-Zhong
Although current GCC doesn't ICE when I create an FP16 vec_init case
with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong:
the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init
needs vfslide1down/vfslide1up, which require TARGET_ZVFH.

It makes more sense to robustify the vec_init pattern by splitting it
into 2 patterns (one for integer, the other for floating-point) like the
other autovectorization patterns.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_init): Split patterns.

---
 gcc/config/riscv/autovec.md | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 33722ea1139..868b47c8af7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,7 +373,19 @@
 ;; -
 
 (define_expand "vec_init"
-  [(match_operand:V_VLS 0 "register_operand")
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_init (operands[0], operands[1]);
+DONE;
+  }
+)
+
+;; We split RVV floating-point because we are going to
+;; use vfslide1down/vfslide1up for FP16 which need TARGET_ZVFH.
+(define_expand "vec_init"
+  [(match_operand:V_VLSF 0 "register_operand")
(match_operand 1 "")]
   "TARGET_VECTOR"
   {
-- 
2.36.3



Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-09 Thread Hongtao Liu
On Fri, Nov 10, 2023 at 10:11 AM Andrew Pinski  wrote:
>
> On Thu, Nov 9, 2023 at 5:52 PM liuhongt  wrote:
> >
> > While working on PR112443, I noticed some missed optimizations: after we
> > fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine
> > them back into v{,p}blendv{b,ps,pd} because the pattern is too complicated,
> > so I think we should handle it at the gimple level.
> >
> > The dump is like
> >
> >   _1 = c_3(D) >= { 0, 0, 0, 0 };
> >   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >   _7 = VIEW_CONVERT_EXPR(_2);
> >   _8 = VIEW_CONVERT_EXPR(b_6(D));
> >   _9 = VIEW_CONVERT_EXPR(a_5(D));
> >   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> >   _11 = VEC_COND_EXPR <_10, _8, _9>;
> >
> >
> > It can be optimized to
> >
> >   _6 = VIEW_CONVERT_EXPR(b_4(D));
> >   _7 = VIEW_CONVERT_EXPR(a_3(D));
> >   _10 = VIEW_CONVERT_EXPR(c_1(D));
> >   _5 = _10 >= { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> >   _8 = VEC_COND_EXPR <_5, _6, _7>;
> >   _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
>
> Actually this is an invalid transformation. It is only valid for unsigned types.
> It is invalid because the sign bit changes when
> going from a larger type to a smaller one.
> It would be valid for equality, but for no other comparison.
Yes, I think we should VIEW_CONVERT_EXPR the true/false data instead
of the comparison operands.
And it should only be valid when the component type of the second
VEC_COND_EXPR is smaller than that of the first VEC_COND_EXPR.
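The scalar sketch below illustrates why view-converting the data, rather than the comparison operands, is safe when the second VEC_COND_EXPR has narrower elements. Every byte of an all-ones or all-zero 64-bit mask lane has the same sign, so a byte-wise select over the reinterpreted data picks the same bytes as a lane-wise select. The 64-bit and 8-bit widths, the helper names, and the test values are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Select at 64-bit lane granularity: lane i takes b[i] when mask lane i is
   negative (all-ones), otherwise a[i].  */
static void select_lanes64 (uint64_t *r, const int64_t *mask,
                            const uint64_t *b, const uint64_t *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = mask[i] < 0 ? b[i] : a[i];
}

/* The same select after view-converting mask, b and a to bytes: byte j
   takes b's byte when the mask byte is negative.  */
static void select_bytes (uint8_t *r, const int8_t *mask,
                          const uint8_t *b, const uint8_t *a, size_t n)
{
  for (size_t j = 0; j < n; j++)
    r[j] = mask[j] < 0 ? b[j] : a[j];
}

/* Returns 1 if the byte-wise select on the reinterpreted data produces the
   same 16 bytes as the lane-wise select on two 64-bit lanes.  */
static int byte_select_matches (const int64_t mask[2], const uint64_t b[2],
                                const uint64_t a[2])
{
  uint64_t r64[2];
  int8_t mb[16];
  uint8_t bb[16], ab[16], rb[16];

  select_lanes64 (r64, mask, b, a, 2);
  memcpy (mb, mask, sizeof mb);
  memcpy (bb, b, sizeof bb);
  memcpy (ab, a, sizeof ab);
  select_bytes (rb, mb, bb, ab, sizeof rb);
  return memcmp (rb, r64, sizeof rb) == 0;
}
```

The equivalence depends on every mask lane being 0 or -1. A lane such as 0x80 is non-negative as a 64-bit value but contains a negative byte, so the two selects then disagree.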
>
> Thanks,
> Andrew
>
> >
> > Since _7 is either -1 or 0, _7 < 0 is equivalent to
> > _1 = c_3(D) >= { 0, 0, 0, 0 }.
> > The patch adds a gimple pattern to handle that.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * match.pd (VCE:(a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE:a cmp
> > VCE:b) ? c : d): New gimple simplification.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512vl-blendv-3.c: New test.
> > * gcc.target/i386/blendv-3.c: New test.
> > ---
> >  gcc/match.pd  | 17 +++
> >  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
> >  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
> >  3 files changed, 69 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index dbc811b2b38..e6f9c4fa1fd 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5170,6 +5170,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
> >(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
> >
> > +(for cmp (simple_comparison)
> > + (simplify
> > +  (vec_cond
> > +(lt@4 (view_convert?@5 (vec_cond (cmp @0 @1)
> > +integer_all_onesp
> > +integer_zerop))
> > + integer_zerop) @2 @3)
> > +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> > +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> > +   && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@5))
> > +   && VECTOR_TYPE_P (type))
> > +   (with {
> > +  tree itype = TREE_TYPE (@5);
> > +  tree vbtype = TREE_TYPE (@4);}
> > + (vec_cond (cmp:vbtype (view_convert:itype @0)
> > +  (view_convert:itype @1)) @2 @3)
> > +
> >  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
> >  (simplify
> >   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> > new file mode 100644
> > index 000..2777e72ab5f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> > @@ -0,0 +1,6 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> > +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> > +/* { dg-final { scan-assembler-not {vpcmp} } } */
> > +
> > +#include "blendv-3.c"
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c b/gcc/testsuite/gcc.target/i386/blendv-3.c
> > new file mode 100644
> > index 000..fa0fb067a73
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
> > @@ -0,0 +1,46 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O2" } */
> > +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> > +/* { dg-final { scan-assembler-not {vpcmp} } } */
> > +
> > +#include 
> > +
> > +__m256i
> > +foo (__m256i a, __m256i b, __m256i c)
> > +{
> > +  return _mm256_blendv_epi8 (a, b, ~c < 0);
> > +}
> > +
> > +__m256d
> > +foo1 (__m256d a, __m256d b, __m256i c)
> > +{
> > +  __m256i d = ~c < 0;
> > +  

Re: [PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-09 Thread Andrew Pinski
On Thu, Nov 9, 2023 at 5:52 PM liuhongt  wrote:
>
> While working on PR112443, I noticed some missed optimizations: after we
> fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine it
> back into v{,p}blendv{b,ps,pd} because the pattern is too complicated, so I think
> we should handle it at the gimple level.
>
> The dump is like
>
>   _1 = c_3(D) >= { 0, 0, 0, 0 };
>   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>   _7 = VIEW_CONVERT_EXPR(_2);
>   _8 = VIEW_CONVERT_EXPR(b_6(D));
>   _9 = VIEW_CONVERT_EXPR(a_5(D));
>   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _11 = VEC_COND_EXPR <_10, _8, _9>;
>
>
> It can be optimized to
>
>   _6 = VIEW_CONVERT_EXPR(b_4(D));
>   _7 = VIEW_CONVERT_EXPR(a_3(D));
>   _10 = VIEW_CONVERT_EXPR(c_1(D));
>   _5 = _10 >= { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _8 = VEC_COND_EXPR <_5, _6, _7>;
>   _9 = VIEW_CONVERT_EXPR<__m256i>(_8);

Actually this is an invalid transformation. It is only valid for unsigned types.
It is invalid because the sign bit changes when
going from a larger type to a smaller one.
It would be valid for equality, but for no other comparison.
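To make the objection concrete, here is a scalar check for one 64-bit lane. It uses the comparison from the dump (c >= 0), and the result is consumed byte-wise as in the epi8 case. The helper names and chosen lane values are illustrative. The check examines all 8 bytes, so it does not depend on endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original form, for byte j of one 64-bit lane:
   _2 = c >= 0 ? -1 : 0;  then (byte j of VIEW_CONVERT (_2)) < 0.  */
static int original_pred (int64_t c, int j)
{
  int64_t lane = c >= 0 ? -1 : 0;
  int8_t bytes[8];
  memcpy (bytes, &lane, sizeof bytes);
  return bytes[j] < 0;
}

/* Proposed fold: (byte j of VIEW_CONVERT (c)) >= 0, compared as a signed
   byte.  */
static int folded_pred (int64_t c, int j)
{
  int8_t bytes[8];
  memcpy (bytes, &c, sizeof bytes);
  return bytes[j] >= 0;
}

/* Returns 1 if the two forms disagree on any byte of the lane.  */
static int fold_disagrees (int64_t c)
{
  for (int j = 0; j < 8; j++)
    if (original_pred (c, j) != folded_pred (c, j))
      return 1;
  return 0;
}
```

For c = 0x80, the lane is non-negative, so every byte of the original mask is -1. However, the byte holding 0x80 is negative as a signed char, so the folded form gives the wrong answer for that byte.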

Thanks,
Andrew

>
> Since _7 is either -1 or 0, _7 < 0 is equivalent to
> _1 = c_3(D) >= { 0, 0, 0, 0 }.
> The patch adds a gimple pattern to handle that.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * match.pd (VCE:(a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE:a cmp
> VCE:b) ? c : d): New gimple simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vl-blendv-3.c: New test.
> * gcc.target/i386/blendv-3.c: New test.
> ---
>  gcc/match.pd  | 17 +++
>  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
>  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
>  3 files changed, 69 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index dbc811b2b38..e6f9c4fa1fd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5170,6 +5170,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
>(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
>
> +(for cmp (simple_comparison)
> + (simplify
> +  (vec_cond
> +(lt@4 (view_convert?@5 (vec_cond (cmp @0 @1)
> +integer_all_onesp
> +integer_zerop))
> + integer_zerop) @2 @3)
> +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> +   && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@5))
> +   && VECTOR_TYPE_P (type))
> +   (with {
> +  tree itype = TREE_TYPE (@5);
> +  tree vbtype = TREE_TYPE (@4);}
> + (vec_cond (cmp:vbtype (view_convert:itype @0)
> +  (view_convert:itype @1)) @2 @3)
> +
>  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
>  (simplify
>   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> new file mode 100644
> index 000..2777e72ab5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include "blendv-3.c"
> diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c b/gcc/testsuite/gcc.target/i386/blendv-3.c
> new file mode 100644
> index 000..fa0fb067a73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include 
> +
> +__m256i
> +foo (__m256i a, __m256i b, __m256i c)
> +{
> +  return _mm256_blendv_epi8 (a, b, ~c < 0);
> +}
> +
> +__m256d
> +foo1 (__m256d a, __m256d b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_pd (a, b, (__m256d)d);
> +}
> +
> +__m256
> +foo2 (__m256 a, __m256 b, __m256i c)
> +{
> +  __m256i d = ~c < 0;
> +  return _mm256_blendv_ps (a, b, (__m256)d);
> +}
> +
> +__m128i
> +foo4 (__m128i a, __m128i b, __m128i c)
> +{
> +  return _mm_blendv_epi8 (a, b, ~c < 0);
> +}
> +
> +__m128d
> +foo5 (__m128d a, __m128d b, __m128i c)
> +{
> +  __m128i d = ~c < 0;
> +  return _mm_blendv_pd (a, b, (__m128d)d);
> +}
> +
> +__m128
> +foo6 (__m128 a, __m128 b, __m128i c)
> +{
> +  __m128i d = ~c < 0;
> +  return 

Re: [PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-09 Thread Kewen.Lin
Hi,

on 2023/11/9 09:31, HAO CHEN GUI wrote:
> Hi,
>   This patch enables vector mode for by-pieces equality compare. It
> adds a new expand pattern, cbranchv16qi4, and sets MOVE_MAX_PIECES
> and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare
> relies on both move and compare instructions, so both macros are changed.
> As the vector load/store might be unaligned, the 16-byte move and
> compare are only enabled when VSX and EFFICIENT_UNALIGNED_VSX are both
> enabled.
> 
>   This patch enables 16-byte by-pieces move. As the vector mode is not
> enabled for by-pieces move, TImode is used for the move. It caused two
> regressions. The root cause is that a 16-byte array can now be
> constructed by a single load instruction instead of being put into LC0,
> so the SRA optimization is no longer performed.
> 
>   Compared to the previous version, the main changes are to the guard
> of the expand pattern and the compile options of the test case. Also, the
> fix for the two regressions caused by the 16-byte move enablement has been
> moved to this patch.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Enable vector mode for by pieces equality compare
> 
> This patch adds a new expand pattern, cbranchv16qi4, to enable vector
> mode by-pieces equality compare on rs6000.  The macro MOVE_MAX_PIECES
> (COMPARE_MAX_PIECES) is set to 16 bytes when VSX and
> EFFICIENT_UNALIGNED_VSX are enabled; otherwise it is unchanged.  The
> macro STORE_MAX_PIECES is set to the same value as MOVE_MAX_PIECES by
> default, so it is now explicitly defined to keep it unchanged.
> 
> gcc/
>   PR target/111449
>   * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
>   * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
>   insn sequence for V16QImode equality compare.
>   * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
>   (STORE_MAX_PIECES): Define.
> 
> gcc/testsuite/
>   PR target/111449
>   * gcc.target/powerpc/pr111449-1.c: New.
>   * gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit powerpc.
>   * gcc.dg/tree-ssa/sra-18.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index e8a596fb7e9..a1423c76451 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2605,6 +2605,48 @@ (define_insn "altivec_vupklpx"
>  }
>[(set_attr "type" "vecperm")])
> 
> +/* The cbranch_optabs doesn't allow FAIL, so old cpus which are

Nit: s/cbranch_optabs/cbranch_optab/

> +   inefficient on unaligned vsx are disabled as the cost is high
> +   for unaligned load/store.  */
> +(define_expand "cbranchv16qi4"
> +  [(use (match_operator 0 "equality_operator"
> + [(match_operand:V16QI 1 "reg_or_mem_operand")
> +  (match_operand:V16QI 2 "reg_or_mem_operand")]))
> +   (use (match_operand 3))]
> +  "VECTOR_MEM_VSX_P (V16QImode)
> +   && TARGET_EFFICIENT_UNALIGNED_VSX"
> +{
> +  /* Use direct move for P8 LE to skip double-word swap, as the byte
> + order doesn't matter for equality compare.  If any operands are
> + altivec indexed or indirect operands, the load can be implemented
> + directly by altivec aligned load instruction and swap is no
> + need.  */
> +  if (!TARGET_P9_VECTOR
> +  && !BYTES_BIG_ENDIAN
> +  && MEM_P (operands[1])
> +  && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
> +  && MEM_P (operands[2])
> +  && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
> +{
> +  rtx reg_op1 = gen_reg_rtx (V16QImode);
> +  rtx reg_op2 = gen_reg_rtx (V16QImode);
> +  rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
> +  rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
> +  operands[1] = reg_op1;
> +  operands[2] = reg_op2;
> +}
> +  else
> +{
> +  operands[1] = force_reg (V16QImode, operands[1]);
> +  operands[2] = force_reg (V16QImode, operands[2]);
> +}
> +
> +  rtx_code code = GET_CODE (operands[0]);
> +  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
> +  rs6000_emit_cbranch (V16QImode, operands);
> +  DONE;
> +})
> +
>  ;; Compare vectors producing a vector result and a predicate, setting CR6 to
>  ;; indicate a combined status
>  (define_insn "altivec_vcmpequ_p"
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index cc24dd5301e..10279052636 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
> else
>   emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
>   }
> +  else if (mode == V16QImode)
> + {
> +   gcc_assert (code == EQ || code == NE);
> +
> +   rtx result_vector = gen_reg_rtx (V16QImode);
> +   rtx cc_bit = gen_reg_rtx (SImode);
> 
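The scalar sketch below shows why the expander comment can skip the doubleword swap on P8 little-endian. If both operands are loaded with the same doubleword permutation, the permuted blocks are equal exactly when the originals are. The load_dw_swapped helper is a hypothetical model of that permuted load, not an rs6000 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a P8 LE VSX load that leaves the two 8-byte
   doublewords of a 16-byte block swapped.  */
static void load_dw_swapped (uint8_t out[16], const uint8_t in[16])
{
  memcpy (out, in + 8, 8);
  memcpy (out + 8, in, 8);
}

/* One 16-byte equality "piece".  */
static bool equal16 (const uint8_t a[16], const uint8_t b[16])
{
  return memcmp (a, b, 16) == 0;
}

/* Compare via the permuted loads without undoing the permutation: the same
   permutation applied to both sides cannot change whether they are equal.  */
static bool equal16_swapped (const uint8_t a[16], const uint8_t b[16])
{
  uint8_t sa[16], sb[16];
  load_dw_swapped (sa, a);
  load_dw_swapped (sb, b);
  return equal16 (sa, sb);
}
```

This argument only holds for EQ/NE, which is why the expander only accepts equality operators.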

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Kewen.Lin
Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:
> 
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> Hi Kewen,
>>
>> Below are my comments.  I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.

Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as part of
normal operation.  When I started to work on this issue, I initially wanted
to try something similar to your idea #2, but when checking the APIs, I
realized we could simply skip basic blocks containing only DEBUG_INSNs, in
the same way as those containing only NOTEs and LABELs.  IMHO there is no
value in trying to schedule this kind of BB (to-be-scheduled range);
skipping it saves some resource allocation (like block dependencies) and is
more efficient (we don't enter schedule_block etc.), so from this
perspective it seems like an enhancement.  Does that sound reasonable to
you?
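The skipping criterion described above, changing no_real_insns_p to no_real_nondebug_insns_p as in the patch subject, can be sketched as a toy model over an array of insn kinds. The enum and the *_model functions are hypothetical simplifications. The real predicates walk the insns between a block's head and tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative insn kinds; these are not GCC's rtx codes.  */
enum insn_kind { K_NOTE, K_LABEL, K_DEBUG_INSN, K_INSN };

/* Existing-style criterion: the block contains only notes and labels.  */
static bool no_real_insns_model (const enum insn_kind *bb, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (bb[i] != K_NOTE && bb[i] != K_LABEL)
      return false;
  return true;
}

/* Proposed criterion: debug insns do not count either, so a block whose only
   contents exist because of -g is skipped just like the corresponding
   block without -g.  */
static bool no_real_nondebug_insns_model (const enum insn_kind *bb, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (bb[i] == K_INSN)
      return false;
  return true;
}
```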

> 
> I agree with the concern. I hoped that solving the problem by skipping the BB
> like the (bit-rotted) debug code needs to would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.

OK, if the sched_block debug counter is useless and can be removed, then the
proposed new skipping becomes the only real need for the artificial
resolve_forw_deps.

> 
>>
>> Would you please consider 2 ideas below.
>>
>> #1:
>> After a brief look, I'm guessing this part is causing the problem:
>> haifa-sched.cc :schedule_block():
>> === [1]
>>   /* Loop until all the insns in BB are scheduled.  */
>>   while ((*current_sched_info->schedule_more_p) ())
>> {
>>   perform_replacements_new_cycle ();
>>   do
>>  {
>>start_clock_var = clock_var;
>>
>>clock_var++;
>>
>>advance_one_cycle ();
> 
> As I understand, we have spurious calls to advance_one_cycle on basic block
> boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
> and cause divergence when passing through a debug-only BB which would not be
> present at all without -g.
> 
> Since EBBs and regions may not have jump targets in the middle, advancing
> a cycle on BB boundaries does not seem well motivated. Can we remove it?
> 
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries when -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.

Good idea!  It would be easy to detect the inconsistent issue with such note.
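Here is a minimal sketch of the detection idea, assuming the DFA state can be treated as an opaque byte buffer. The state is hashed at each BB boundary, and the resulting sequence of hashes is compared between the -g and -g0 compilations. FNV-1a is an arbitrary choice of hash. The function names are hypothetical, and this thread does not specify how such hashes would be recorded in notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the raw bytes of a DFA state snapshot.  */
static uint64_t fnv1a64 (const unsigned char *p, size_t n)
{
  uint64_t h = UINT64_C (0xcbf29ce484222325);   /* FNV offset basis.  */
  for (size_t i = 0; i < n; i++)
    {
      h ^= p[i];
      h *= UINT64_C (0x100000001b3);            /* FNV prime.  */
    }
  return h;
}

/* Given the per-boundary hashes recorded in a -g0 run and a -g run, return
   the index of the first BB boundary where they differ, or N if none.  */
static size_t first_divergence (const uint64_t *h_g0, const uint64_t *h_g,
                                size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (h_g0[i] != h_g[i])
      return i;
  return n;
}
```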

BR,
Kewen


[PATCH] Simplify vector ((VCE?(a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE:a cmp VCE:b) ? c : d.

2023-11-09 Thread liuhongt
While working on PR112443, I noticed some missed optimizations: after we
fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend fails to combine it
back into v{,p}blendv{b,ps,pd} because the pattern is too complicated, so I think
we should handle it at the gimple level.

The dump is like

  _1 = c_3(D) >= { 0, 0, 0, 0 };
  _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  _7 = VIEW_CONVERT_EXPR(_2);
  _8 = VIEW_CONVERT_EXPR(b_6(D));
  _9 = VIEW_CONVERT_EXPR(a_5(D));
  _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _11 = VEC_COND_EXPR <_10, _8, _9>;


It can be optimized to

  _6 = VIEW_CONVERT_EXPR(b_4(D));
  _7 = VIEW_CONVERT_EXPR(a_3(D));
  _10 = VIEW_CONVERT_EXPR(c_1(D));
  _5 = _10 >= { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_5, _6, _7>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);

Since _7 is either -1 or 0, _7 < 0 is equivalent to
_1 = c_3(D) >= { 0, 0, 0, 0 }.
The patch adds a gimple pattern to handle that.
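The scalar sketch below models one 64-bit lane of foo1 in blendv-3.c, where the mask and data lanes have the same width. blendv tests only the sign bit of each mask lane, and ~c < 0 holds exactly when c >= 0, so the materialized mask can be folded away. The sketch covers only this same-width case, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit lane of blendv_pd: take b when the sign bit of the mask lane
   is set, otherwise a.  */
static double blendv_pd_lane (double a, double b, int64_t mask)
{
  return mask < 0 ? b : a;
}

/* One lane of d = ~c < 0 from the tests: all-ones (-1) or zero.  */
static int64_t mask_lane (int64_t c)
{
  return ~c < 0 ? -1 : 0;
}

/* The simplified select: ~c < 0 exactly when c >= 0, so no mask is needed.  */
static double simplified_lane (double a, double b, int64_t c)
{
  return c >= 0 ? b : a;
}
```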

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

* match.pd (VCE:(a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE:a cmp
VCE:b) ? c : d): New gimple simplification.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-blendv-3.c: New test.
* gcc.target/i386/blendv-3.c: New test.
---
 gcc/match.pd  | 17 +++
 .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
 gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
 3 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index dbc811b2b38..e6f9c4fa1fd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5170,6 +5170,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
 
+(for cmp (simple_comparison)
+ (simplify
+  (vec_cond
+(lt@4 (view_convert?@5 (vec_cond (cmp @0 @1)
+integer_all_onesp
+integer_zerop))
+ integer_zerop) @2 @3)
+  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
+   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
+   && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@5))
+   && VECTOR_TYPE_P (type))
+   (with {
+  tree itype = TREE_TYPE (@5);
+  tree vbtype = TREE_TYPE (@4);}
+ (vec_cond (cmp:vbtype (view_convert:itype @0)
+  (view_convert:itype @1)) @2 @3)
+
 /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
 (simplify
  (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
new file mode 100644
index 000..2777e72ab5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include "blendv-3.c"
diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c b/gcc/testsuite/gcc.target/i386/blendv-3.c
new file mode 100644
index 000..fa0fb067a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include 
+
+__m256i
+foo (__m256i a, __m256i b, __m256i c)
+{
+  return _mm256_blendv_epi8 (a, b, ~c < 0);
+}
+
+__m256d
+foo1 (__m256d a, __m256d b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_pd (a, b, (__m256d)d);
+}
+
+__m256
+foo2 (__m256 a, __m256 b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_ps (a, b, (__m256)d);
+}
+
+__m128i
+foo4 (__m128i a, __m128i b, __m128i c)
+{
+  return _mm_blendv_epi8 (a, b, ~c < 0);
+}
+
+__m128d
+foo5 (__m128d a, __m128d b, __m128i c)
+{
+  __m128i d = ~c < 0;
+  return _mm_blendv_pd (a, b, (__m128d)d);
+}
+
+__m128
+foo6 (__m128 a, __m128 b, __m128i c)
+{
+  __m128i d = ~c < 0;
+  return _mm_blendv_ps (a, b, (__m128)d);
+}
-- 
2.31.1



[PATCH] Initial support for AVX10.1

2023-11-09 Thread Haochen Jiang
gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Add avx10_set and version and detect avx10.1.
(cpu_indicator_init): Handle avx10.1-512.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_256_SET): New.
(OPTION_MASK_ISA2_AVX10_1_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX2_UNSET): Modify for AVX10.1.
(ix86_handle_option): Handle -mavx10.1-256 and -mavx10.1-512.
Add indicator for explicit no-avx512 and no-avx10.1 options.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX10_1_256 and FEATURE_AVX10_1_512.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
AVX10_1_256 and AVX10_1_512.
* config/i386/cpuid.h (bit_AVX10): New.
(bit_AVX10_256): Ditto.
(bit_AVX10_512): Ditto.
* config/i386/driver-i386.cc (check_avx10_avx512_features): New.
(host_detect_local_cpu): Do not append "-mno-" options under
specific scenarios to avoid emitting a warning.
* config/i386/i386-isa.def
(EVEX512): Add DEF_PTA(EVEX512).
(AVX10_1_256): Add DEF_PTA(AVX10_1_256).
(AVX10_1_512): Add DEF_PTA(AVX10_1_512).
* config/i386/i386-options.cc (isa2_opts): Add -mavx10.1-256 and
-mavx10.1-512.
(ix86_function_specific_save): Save explicit no indicator.
(ix86_function_specific_restore): Restore explicit no indicator.
(ix86_valid_target_attribute_inner_p): Handle avx10.1, avx10.1-256 and
avx10.1-512.
(ix86_valid_target_attribute_tree): Handle avx512 function
attributes with avx10.1 command line option.
(ix86_option_override_internal): Handle AVX10.1 options.
* config/i386/i386.h: Add PTA_EVEX512 for AVX512 target
machines.
* config/i386/i386.opt: Add variable ix86_no_avx512_explicit and
ix86_no_avx10_1_explicit, option -mavx10.1, -mavx10.1-256 and
-mavx10.1-512.
* doc/extend.texi: Document avx10.1, avx10.1-256 and avx10.1-512.
* doc/invoke.texi: Document -mavx10.1, -mavx10.1-256 and -mavx10.1-512.
* doc/sourcebuild.texi: Document target avx10.1, avx10.1-256
and avx10.1-512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-1.c: New test.
* gcc.target/i386/avx10_1-10.c: Ditto.
* gcc.target/i386/avx10_1-11.c: Ditto.
* gcc.target/i386/avx10_1-12.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-14.c: Ditto.
* gcc.target/i386/avx10_1-15.c: Ditto.
* gcc.target/i386/avx10_1-16.c: Ditto.
* gcc.target/i386/avx10_1-17.c: Ditto.
* gcc.target/i386/avx10_1-18.c: Ditto.
* gcc.target/i386/avx10_1-19.c: Ditto.
* gcc.target/i386/avx10_1-2.c: Ditto.
* gcc.target/i386/avx10_1-20.c: Ditto.
* gcc.target/i386/avx10_1-21.c: Ditto.
* gcc.target/i386/avx10_1-22.c: Ditto.
* gcc.target/i386/avx10_1-23.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-4.c: Ditto.
* gcc.target/i386/avx10_1-5.c: Ditto.
* gcc.target/i386/avx10_1-6.c: Ditto.
* gcc.target/i386/avx10_1-7.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/avx10_1-9.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h   |  33 ++
 gcc/common/config/i386/i386-common.cc  |  55 -
 gcc/common/config/i386/i386-cpuinfo.h  |   2 +
 gcc/common/config/i386/i386-isas.h |   3 +
 gcc/config/i386/cpuid.h|   5 +
 gcc/config/i386/driver-i386.cc |  43 ++-
 gcc/config/i386/i386-isa.def   |   3 +
 gcc/config/i386/i386-options.cc| 132 +++--
 gcc/config/i386/i386.h |   2 +-
 gcc/config/i386/i386.opt   |  30 +
 gcc/doc/extend.texi|  15 +++
 gcc/doc/invoke.texi|  17 ++-
 gcc/doc/sourcebuild.texi   |   9 ++
 gcc/testsuite/gcc.target/i386/avx10_1-1.c  |  22 
 gcc/testsuite/gcc.target/i386/avx10_1-10.c |   6 +
 gcc/testsuite/gcc.target/i386/avx10_1-11.c |   6 +
 gcc/testsuite/gcc.target/i386/avx10_1-12.c |   6 +
 gcc/testsuite/gcc.target/i386/avx10_1-13.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-14.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-15.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-16.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-17.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-18.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-19.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-2.c  |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-20.c |  13 ++
 gcc/testsuite/gcc.target/i386/avx10_1-21.c |   6 +
 gcc/testsuite/gcc.target/i386/avx10_1-22.c |  13 ++
 

[RFC] Intel AVX10.1 Compiler Design and Support

2023-11-09 Thread Haochen Jiang
Hi all,

This RFC patch aims to add AVX10.1 options. Now that -m[no-]evex512
support has been added, adding them is much easier than in the August version.
Details on AVX10 are shown below:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
It describes the Intel Advanced Vector Extensions 10 Instruction Set
Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel
Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Our proposal is to treat AVX10.1-256 and AVX10.1-512 as two "virtual" ISAs in
the compiler. AVX10.1-512 will imply AVX10.1-256. They will not enable
anything at first. At the end of option handling, we will check whether
the two bits are set. If AVX10.1-256 is set, we will set the AVX512-related
ISA bits. AVX10.1-512 will additionally set the EVEX512 ISA bit.

This means that AVX10 options will be separate from the existing AVX512 and
the newly added -m[no-]evex512 options. AVX10 and AVX512 options will
independently control (enable/disable/set the vector size of) the underlying
AVX512 features. If there’s potential overlap or conflict between AVX10 and
AVX512 options, some rules are provided to define the behavior, as described below.
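The "virtual ISA" expansion described above can be sketched as a flag-resolution step that runs at the end of option handling. The struct and field names are hypothetical stand-ins, not GCC's OPTION_MASK_* macros, and the single avx512_bits flag stands in for the whole set of AVX512-related ISA bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag set; these are not GCC's OPTION_MASK_* names.  */
struct isa_flags
{
  bool avx10_1_256;   /* "virtual" ISA bit from -mavx10.1-256 / -mavx10.1.  */
  bool avx10_1_512;   /* "virtual" ISA bit from -mavx10.1-512.  */
  bool avx512_bits;   /* stands in for the AVX512-related ISA bits.  */
  bool evex512;       /* the EVEX512 ISA bit.  */
};

/* Expansion done once, at the end of option handling.  */
static void expand_avx10_1 (struct isa_flags *f)
{
  if (f->avx10_1_512)
    f->avx10_1_256 = true;      /* AVX10.1-512 implies AVX10.1-256.  */
  if (f->avx10_1_256)
    f->avx512_bits = true;      /* AVX10.1-256 sets the AVX512 bits.  */
  if (f->avx10_1_512)
    f->evex512 = true;          /* AVX10.1-512 additionally sets EVEX512.  */
}
```

Deferring the expansion to the end of option handling keeps the two virtual bits independent of the AVX512 options until all options have been seen.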

The avx10.1 option will be provided as an alias for avx10.1-256.

In the future, the AVX10 options will imply each other as follows:

AVX10.1-256 < AVX10.1-512
     ^             ^
     |             |
AVX10.2-256 < AVX10.2-512
     ^             ^
     |             |
AVX10.3-256 < AVX10.3-512

Each of them will have its own option to enable/disable the corresponding
features. The alias avx10.x will also be provided.

As mentioned in the August version of the RFC, since we lean towards adopting
AVX10 instead of AVX512 from now on, we don’t recommend that users combine
AVX10 and legacy AVX512 options. However, we would like to introduce some
simple rules for users who do combine them.

1. Enabling AVX10 and AVX512 on the same command line with different vector
sizes will lead to a warning message. The compiler will then enable AVX10 with
the longer, i.e., 512-bit, vector size.

If the vector sizes are the same (e.g. -mavx10.1-256 -mavx512f -mno-evex512,
-mavx10.1-512 -mavx512f), the combination is valid with the corresponding vector size.

2. The -mno-avx10.1 option can’t disable any features enabled by AVX512 options
or affect the vector size, and vice versa. The compiler will emit warnings if
necessary.
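Rule 1 can be modeled as a small resolution function. In this hypothetical sketch, each argument is the vector size in bits requested by one option family, with 0 meaning that family was not used. The function name and the warn out-parameter are illustrative. No actual diagnostic text is implied.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of rule 1.  Sizes are 0 (family not used), 256 or 512.
   Returns the resulting vector size and sets *warn when a warning would be
   emitted.  */
static int resolve_vector_bits (int avx10_bits, int avx512_bits, bool *warn)
{
  assert (avx10_bits == 0 || avx10_bits == 256 || avx10_bits == 512);
  assert (avx512_bits == 0 || avx512_bits == 256 || avx512_bits == 512);
  *warn = false;
  if (avx10_bits == 0)
    return avx512_bits;
  if (avx512_bits == 0)
    return avx10_bits;
  if (avx10_bits != avx512_bits)
    {
      *warn = true;            /* Different sizes: diagnose...  */
      return 512;              /* ...and use the longer, 512-bit size.  */
    }
  return avx10_bits;           /* Same size: valid as requested.  */
}
```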

For auto-dispatch support, including function multiversioning and function
attribute usage, the behavior will be identical to that of the compiler options.

If you have any questions, feel free to ask in this thread.

Thx,
Haochen




Re: [PATCH 1/3] attribs: Cache the gnu namespace

2023-11-09 Thread Jeff Law




On 11/6/23 05:23, Richard Sandiford wrote:

Later patches add more calls to get_attribute_namespace.
For scoped attributes, this is a simple operation on tree pointers.
But for normal GNU attributes (the vast majority), it involves a
call to get_identifier ("gnu").  This patch caches the identifier
for speed.
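Here is a minimal sketch of the caching pattern, using a toy intern table as a stand-in for get_identifier. The names intern_identifier, gnu_ns_cache and get_gnu_ns are hypothetical. In GCC itself the cached tree must also be visible to the garbage collector, which is why the patch adds attribs.cc to GTFILES. This plain C model has no GC.

```c
#include <assert.h>
#include <string.h>

#define MAX_IDS 64
#define MAX_LEN 32

/* Toy intern table: equal strings map to the same pointer.  */
static char id_table[MAX_IDS][MAX_LEN];
static int n_ids;
static int intern_calls;   /* counts lookups, to show the cache's effect */

static const char *intern_identifier (const char *s)
{
  intern_calls++;
  for (int i = 0; i < n_ids; i++)
    if (strcmp (id_table[i], s) == 0)
      return id_table[i];
  assert (n_ids < MAX_IDS && strlen (s) < MAX_LEN);
  strcpy (id_table[n_ids], s);
  return id_table[n_ids++];
}

/* Look "gnu" up once, then reuse the cached pointer.  */
static const char *gnu_ns_cache;

static const char *get_gnu_ns (void)
{
  if (!gnu_ns_cache)
    gnu_ns_cache = intern_identifier ("gnu");
  return gnu_ns_cache;
}
```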

Admittedly I'm just going off gut instinct here.  I'm happy to drop
the patch if this doesn't seem worth a new GC root.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

gcc/
* Makefile.in (GTFILES): Add attribs.cc.
* attribs.cc (gnu_namespace_cache): New variable.
(get_gnu_namespace): New function.
(lookup_attribute_spec): Use it instead of get_identifier ("gnu").
(get_attribute_namespace, attribs_cc_tests): Likewise.

I trust your gut instincts.  So OK.

jeff


Re: [PATCH] g++: Add require-effective-target to multi-input file testcase pr95401.cc

2023-11-09 Thread Jeff Law




On 11/3/23 00:18, Patrick O'Neill wrote:

On non-vector targets dejagnu attempts dg-do compile for pr95401.cc.
This produces a command like this:
g++ pr95401.cc pr95401a.cc -S -o pr95401.s

which isn't valid (gcc does not accept multiple input files when using
-S with -o).

This patch adds require-effective-target vect_int to avoid the case
where the testcase is invoked with dg-do compile.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr95401.cc: Add require-effective-target vect_int.

Sorry, I must be missing something here.  I fail to see how adding an 
effective target check would/should impact the problem you've described 
above with the dg-additional-sources interaction with -S.


Jeff




Re: [committed] RISC-V: Fix INSN costing and more zicond tests

2023-11-09 Thread Maciej W. Rozycki
On Thu, 9 Nov 2023, Jeff Law wrote:

> >   Can we have the insn costing reverted to correct calculation?
> What needs to happen is that code needs to be extended, not reverted. Many
> codes have to be synthesized based on the condition and the true/false arms.
> That's not currently accounted for.

 How is maintaining zillions of variants of insn counts by hand (IIUC what 
you mean) going to be more efficient (or even practical maintenance-wise) 
than what the middle end did automagically?  What exactly was wrong with 
the previous approach, and then why didn't your change include a proof of 
correctness in the form of testsuite cases verifying branch vs conditional 
move costing stays the same (or gets corrected if applicable) across your 
change?

 I guess I'll post my patch series regardless on the presumption that 
correct insn counting will have been reinstated for GCC 14 one way or 
another (i.e. by reverting commit 44efc743acc0 locally in my tree and then 
getting clean test results across the patch series) and we can take it 
from there.  Also to make sure we're on the same page.

 I do hope it will be considered worthwhile despite this issue making it 
not ready for testsuite verification, as it not only adds new features, 
but also fixes numerous existing problems, plain bugs, and deficiencies 
that we currently have in conditional move handling.  But it relies on 
correct costing for verification, which I couldn't have expected would 
get broken again (regardless of your clearly good intentions).  And I'd 
rather we had these test cases, as otherwise costing regressions are 
easily missed (as indicated here).

  Maciej


Re: Re: [PATCH V3] test: Fix FAIL of pr97428.c for RVV

2023-11-09 Thread juzhe.zh...@rivai.ai
Thanks Jeff. Committed.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-10 09:26
To: Juzhe-Zhong; gcc-patches
CC: rguenther
Subject: Re: [PATCH V3] test: Fix FAIL of pr97428.c for RVV
 
 
On 11/7/23 08:18, Juzhe-Zhong wrote:
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/pr97428.c: Add additional compile option for riscv.
I don't guess we know if other targets would benefit from this option. 
The only reference in gcc-testresults to pr97428.c is an armv7 run from 
2022.  So let's assume nobody else cares.
 
OK for the trunk.
 
jeff
 
 


Re: [PATCH V3] test: Fix FAIL of pr97428.c for RVV

2023-11-09 Thread Jeff Law




On 11/7/23 08:18, Juzhe-Zhong wrote:

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97428.c: Add additional compile option for riscv.
I don't guess we know if other targets would benefit from this option. 
The only reference in gcc-testresults to pr97428.c is an armv7 run from 
2022.  So let's assume nobody else cares.


OK for the trunk.

jeff



Re: [PATCH] g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets

2023-11-09 Thread Jeff Law




On 11/2/23 17:45, Patrick O'Neill wrote:

Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The three testcases in this patch overwrite the default with
dg-do run.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr102788.cc: Remove dg-do run directive.
OK.  I'll note your patch has just one file patched, but your comment 
indicates three testcases have this problem.  Did you forget to include 
a couple changes?


If so, those are pre-approved as well.  Just post them for the archiver 
and commit.


Thanks,
jeff


Re: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread juzhe.zh...@rivai.ai
I am using --with-arch=rv32gcv --with-abi=ilp32d

I changed dg-additional-options to dg-options in all of those tests.
The issues are gone.




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-10 09:15
To: juzhe.zh...@rivai.ai; Robin Dapp; gcc-patches; palmer; kito.cheng
Subject: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.
 
 
On 11/9/23 18:12, juzhe.zh...@rivai.ai wrote:
> How do I fix it? I am pretty new to testing CI.
> Can Robin fix that?
It's most likely a problem on your side with how you've configured the 
toolchain.  I don't think this is something Robin can fix for you.
jeff
 


Re: [PATCH 12/12] mode-switching: Add a backprop hook

2023-11-09 Thread Jeff Law




On 11/5/23 11:50, Richard Sandiford wrote:

This patch adds a way for targets to ask that selected mode changes
be brought forward, through a combination of:

(1) requiring a mode in blocks where the entity was previously
 transparent

(2) pushing the transition at the head of a block onto incoming edges

SME has two uses for this:

- A "one-shot" entity that, for any given path of execution,
   either stays off or makes exactly one transition from off to on.
   This relies only on (1) above; see the hook description for more info.

   The main purpose of using mode-switching for this entity is to
   shrink-wrap the code that requires it.

- A second entity for which all transitions must be from known
   modes, which is enforced using a combination of (1) and (2).
   More specifically, (1) looks for edges B1->B2 for which:

   - B2 requires a specific mode and
   - B1 does not guarantee a specific starting mode

   In this system, such an edge is only possible if the entity is
   transparent in B1.  (1) then forces B1 to require some safe common
   mode.  Applying this inductively means that all incoming edges are
   from known modes.  If different edges give different starting modes,
   (2) pushes the transitions onto the edges themselves; this only
   happens if the entity is not transparent in some predecessor block.
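A toy model may make rule (1) easier to picture. The sketch below assumes a CFG reduced to a per-block "required mode" (with -1 meaning transparent) plus a list of edges, and forces any transparent predecessor of a block that requires a mode to require a caller-supplied safe mode, iterating to a fixed point so the requirement spreads backwards. All names are hypothetical; this is not the mode-switching.cc implementation, and rule (2) (pushing transitions onto edges) is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model; not GCC's mode-switching.cc data structures.
constexpr int kTransparent = -1;  // block neither requires nor changes the mode

struct ToyCfg {
  std::vector<int> needed;  // per block: required mode, or kTransparent
  std::vector<std::pair<std::size_t, std::size_t>> edges;  // (pred, succ)
};

// Rule (1): for an edge B1->B2 where B2 requires a specific mode and B1 is
// transparent, make B1 require SAFE_MODE.  Repeat until nothing changes,
// so the requirement propagates backwards through transparent blocks.
void backprop_required_modes(ToyCfg &cfg, int safe_mode) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &edge : cfg.edges) {
      int &pred_mode = cfg.needed[edge.first];
      if (cfg.needed[edge.second] != kTransparent && pred_mode == kTransparent) {
        pred_mode = safe_mode;
        changed = true;
      }
    }
  }
}
```

Each change turns one transparent block into a mode-requiring one, so the loop always terminates; blocks that already require a mode are never overwritten.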

The patch also uses the back-propagation as an excuse to do a simple
on-the-fly optimisation.

Hopefully the comments in the patch explain things a bit better.

gcc/
* target.def (mode_switching.backprop): New hook.
* doc/tm.texi.in (TARGET_MODE_BACKPROP): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (struct bb_info): Add single_succ.
(confluence_info): Add transp field.
(single_succ_confluence_n, single_succ_transfer): New functions.
(backprop_confluence_n, backprop_transfer): Likewise.
(optimize_mode_switching): Use them.  Push mode transitions onto
a block's incoming edges, if the backprop hook requires it.
OK.  Really curious if we might be able to use this to improve the 
vsetvl bits in the RISC-V backend.


Jeff



Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread Jeff Law




On 11/9/23 18:12, juzhe.zh...@rivai.ai wrote:

How do I fix it? I am pretty new to testing CI.
Can Robin fix that?
It's most likely a problem on your side with how you've configured the 
toolchain.  I don't think this is something Robin can fix for you.

jeff


Re: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread juzhe.zh...@rivai.ai
How do I fix it? I am pretty new to testing CI.
Can Robin fix that?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-10 09:11
To: juzhe.zh...@rivai.ai; Robin Dapp; gcc-patches; palmer; kito.cheng
Subject: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.
 
 
On 11/9/23 18:09, juzhe.zh...@rivai.ai wrote:
> I am already using master branch.
> 
> The FAIL is:
> xgcc: fatal error: Cannot find suitable multilib set for 
> '-march=rv64imafdcv_zicsr_zifencei_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
That's a multilib configuration issue of some kind.  Changing binutils 
isn't going to fix that.
 
jeff
 


Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread Jeff Law




On 11/9/23 18:09, juzhe.zh...@rivai.ai wrote:

I am already using master branch.

The FAIL is:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
That's a multilib configuration issue of some kind.  Changing binutils 
isn't going to fix that.


jeff


Re: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread juzhe.zh...@rivai.ai
I am already using master branch.

The FAIL is:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'

I will update binutils again today to see whether it can fix the issue.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-10 07:58
To: 钟居哲; rdapp.gcc; gcc-patches; palmer; kito.cheng
Subject: Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.
 
 
On 11/9/23 15:43, 钟居哲 wrote:
> Hi. Robin.
[ ... ]
You may need a development version of binutils to get the zfh/zvfh 
support and unreleased patches to get zfb/zvfb support.
 
Probably the easiest thing to do would be to look in the gcc.log file at 
those failures and see what the excess failure is.  If it's a diagnostic 
from the assembler about an unrecognized instruction or something 
similar, then that's a pretty clear sign you need a newer binutils.
 
jeff
 


Re: [PATCH 2/3] RISC-V: Update XCValu constraints to match other vendors

2023-11-09 Thread Jeff Law




On 11/8/23 04:09, Mary Bennett wrote:

gcc/ChangeLog:
* config/riscv/constraints.md: CVP2 -> CV_alu_pow2.
* config/riscv/corev.md: Likewise.
Bikeshedding alert...  Usually we keep constraint names pretty small. It 
helps when you've got patterns that may have many constraints.  I don't 
see that likely happening here, so I think we're OK.  But something to 
keep in mind.


2^n - 1 is a pretty common constraint and normally I might suggest we 
make this more generic for use elsewhere.  But in this case there's a 
restriction on the upper bound of 0x3fff, so it's not as generic as 
2^n - 1 up to word size.
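A sketch of the check described above, assuming the constraint accepts exactly the values of the form 2^n - 1 up to 0x3fff. The helper name is made up, and whether the real CV_alu_pow2 constraint accepts 0 (that is, 2^0 - 1) is not stated in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if VALUE has the form 2^n - 1 and does not
// exceed 0x3fff (the upper bound mentioned in the review).
bool is_low_mask_up_to_0x3fff(std::int64_t value) {
  if (value < 0 || value > 0x3fff)
    return false;
  // 2^n - 1 is a run of n low one-bits; adding 1 yields a single bit that
  // shares no bits with VALUE.
  return (value & (value + 1)) == 0;
}
```

The bound check is what makes this less generic than a plain "2^n - 1 up to word size" predicate.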


So OK for the trunk as-is.

jeff


Re: Re: [PATCH] RISC-V: Move cond_copysign from combine pattern to autovec pattern

2023-11-09 Thread juzhe.zh...@rivai.ai
Yes. No regression. Committed.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-10 07:56
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Move cond_copysign from combine pattern to autovec pattern
 
 
On 11/9/23 16:33, Juzhe-Zhong wrote:
> Since cond_copysign is now supported in match.pd (middle-end),
> we don't need to support conditional copysign in the RTL combine pass.
> 
> Instead, we can support it directly through an explicit cond_copysign optab.
> 
> Conditional copysign tests are already available in the testsuite.
> No need to add tests.
> 
> gcc/ChangeLog:
> 
> * config/riscv/autovec-opt.md (*cond_copysign): Remove.
> * config/riscv/autovec.md (cond_copysign): New pattern.
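For reference, a scalar model of what a conditional copysign computes per lane: active lanes get copysign(a, b), inactive lanes keep an "else" value. The operand order (mask, a, b, else) is assumed to mirror GCC's other COND_* internal functions, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a masked copysign over equally sized inputs.
std::vector<double> cond_copysign(const std::vector<bool> &mask,
                                  const std::vector<double> &a,
                                  const std::vector<double> &b,
                                  const std::vector<double> &else_vals) {
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    // Magnitude from a[i], sign from b[i]; untouched lanes keep else_vals[i].
    out[i] = mask[i] ? std::copysign(a[i], b[i]) : else_vals[i];
  return out;
}
```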
I assume you ran the testsuite after this change to ensure there weren't 
any regressions?  We need to make sure that we indicate what testing 
we've done.
 
You don't need to run every multilib or anything like that.  For a given 
change I trust you to run a reasonable set of tests.
 
OK assuming you've done a testsuite run.
 
Jeff
 


RE: [PATCH v1] Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]

2023-11-09 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, November 9, 2023 11:12 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 
Subject: Re: [PATCH v1] Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]



> On 09.11.2023 at 15:34, pan2...@intel.com wrote:
> 
> From: Pan Li 
> 
> Functions defined with DEF_EXT_LIB_FLOATN_NX_BUILTINS should also be
> defined with DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN to
> get FLOATN support. Based on the glibc API and the gcc builtins, the
> table below shows which functions support FLOATN.
> 
> +---------+-------+-------------------------------------+
> |         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
> +---------+-------+-------------------------------------+
> | iceil   | N     | N                                   |
> | ifloor  | N     | N                                   |
> | irint   | N     | N                                   |
> | iround  | N     | N                                   |
> | lceil   | N     | N                                   |
> | lfloor  | N     | N                                   |
> | lrint   | Y     | Y                                   |
> | lround  | Y     | Y                                   |
> | llceil  | N     | N                                   |
> | llfloor | N     | N                                   |
> | llrint  | Y     | Y                                   |
> | llround | Y     | Y                                   |
> +---------+-------+-------------------------------------+
> 
> This patch adds FLOATN support for:
> 1. lrint
> 2. lround
> 3. llrint
> 4. llround
> 
> The following tests passed with this patch:
> 1. x86 bootstrap and regression test.
> 2. aarch64 regression test.
> 3. riscv regression tests.
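For reference, the four functions gaining FLOATN coverage round a floating-point value to an integer: lround/llround round halfway cases away from zero, while lrint/llrint use the current rounding mode (round-to-nearest-even by default). The sketch below uses the standard <cmath> versions on double rather than _FloatN types, and the loop is only an assumed example of the kind of code where the corresponding internal functions (IFN_LROUND and friends) could come into play.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise lround: nearest integer, halfway cases away from zero.
std::vector<long> lround_all(const std::vector<double> &in) {
  std::vector<long> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = std::lround(in[i]);
  return out;
}
```

Under the default rounding mode, std::lround(2.5) gives 3 while std::lrint(2.5) gives 2.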

Ok 

Richard 

>PR target/112432
> 
> gcc/ChangeLog:
> 
>* internal-fn.def (LRINT): Add FLOATN support.
>(LROUND): Ditto.
>(LLRINT): Ditto.
>(LLROUND): Ditto.
> 
> Signed-off-by: Pan Li 
> ---
> gcc/internal-fn.def | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 7f0e3759615..10f88e37bc9 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -365,12 +365,12 @@ DEF_INTERNAL_FLT_FN (IRINT, ECF_CONST, lrint, unary_convert)
> DEF_INTERNAL_FLT_FN (IROUND, ECF_CONST, lround, unary_convert)
> DEF_INTERNAL_FLT_FN (LCEIL, ECF_CONST, lceil, unary_convert)
> DEF_INTERNAL_FLT_FN (LFLOOR, ECF_CONST, lfloor, unary_convert)
> -DEF_INTERNAL_FLT_FN (LRINT, ECF_CONST, lrint, unary_convert)
> -DEF_INTERNAL_FLT_FN (LROUND, ECF_CONST, lround, unary_convert)
> +DEF_INTERNAL_FLT_FLOATN_FN (LRINT, ECF_CONST, lrint, unary_convert)
> +DEF_INTERNAL_FLT_FLOATN_FN (LROUND, ECF_CONST, lround, unary_convert)
> DEF_INTERNAL_FLT_FN (LLCEIL, ECF_CONST, lceil, unary_convert)
> DEF_INTERNAL_FLT_FN (LLFLOOR, ECF_CONST, lfloor, unary_convert)
> -DEF_INTERNAL_FLT_FN (LLRINT, ECF_CONST, lrint, unary_convert)
> -DEF_INTERNAL_FLT_FN (LLROUND, ECF_CONST, lround, unary_convert)
> +DEF_INTERNAL_FLT_FLOATN_FN (LLRINT, ECF_CONST, lrint, unary_convert)
> +DEF_INTERNAL_FLT_FLOATN_FN (LLROUND, ECF_CONST, lround, unary_convert)
> 
> /* FP rounding.  */
> DEF_INTERNAL_FLT_FLOATN_FN (CEIL, ECF_CONST, ceil, unary)
> -- 
> 2.34.1
> 


[committed] Improve single bit zero extraction on H8.

2023-11-09 Thread Jeff Law
When zero extracting a single bit bitfield from bits 16..31 on the H8 we 
currently generate some pretty bad code.


The fundamental issue is we can't shift efficiently and there's no 
trivial way to extract a single bit out of the high half word of an 
SImode value.


What usually happens is we use a synthesized right shift to get the 
single bit into the desired position, then a bit-and to mask off 
everything we don't care about.


The shifts are expensive, even using tricks like half and quarter word 
moves to implement shift-by-16 and shift-by-8.  Additionally a logical 
right shift must clear out the upper bits which is redundant since we're 
going to mask things with &1 later.


This patch provides a consistently better sequence for such extractions. 
The general form moves the high half into the low half, extracts the bit 
into C, clears the destination, then moves C into the destination, with a 
few special cases.
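A C++ model of the idea (not H8 code): extracting bit pos in 16..31 only needs the high half word, and the tested bit is already 0 or 1, so the clearing of upper bits that a full logical right shift performs is unnecessary. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Generic form: logical right shift, then mask with 1.
unsigned extract_bit_shift_mask(std::uint32_t x, unsigned pos) {
  return (x >> pos) & 1u;
}

// Half-word-first form for 16 <= pos <= 31.  On the H8 the first step is a
// register half move rather than a shift; here it is modelled with >> 16.
unsigned extract_bit_high_half(std::uint32_t x, unsigned pos) {
  std::uint16_t low = static_cast<std::uint16_t>(x >> 16);  // high half -> low half
  bool carry = ((low >> (pos - 16)) & 1u) != 0;  // stands in for the C flag
  unsigned dest = 0;                             // clear the destination
  if (carry)
    dest = 1;                                    // move C into the destination
  return dest;
}
```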


This also avoids all the shenanigans for H8/SX which has a much more 
capable shifter.  It's not single cycle, but it is reasonably efficient.


This has been regression tested on the H8 without issues.  Pushing to 
the trunk momentarily.


jeff

ps.  Yes, supporting extraction of multi-bit fields might be improvable 
as well.  But I've already spent more time on this than I can reasonably 
justify.


commit 57dbc02d261bb833f6ef287187eb144321dd595c
Author: Jeff Law 
Date:   Thu Nov 9 17:34:01 2023 -0700

[committed] Improve single bit zero extraction on H8.

When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.

The fundamental issue is we can't shift efficiently and there's no trivial way
to extract a single bit out of the high half word of an SImode value.

What usually happens is we use a synthesized right shift to get the single bit
into the desired position, then a bit-and to mask off everything we don't care
about.

The shifts are expensive, even using tricks like half and quarter word moves to
implement shift-by-16 and shift-by-8.  Additionally a logical right shift must
clear out the upper bits which is redundant since we're going to mask things
with &1 later.

This patch provides a consistently better sequence for such extractions.  The
general form moves the high half into the low half, a bit extraction into C,
clear the destination, then move C into the destination with a few special
cases.

This also avoids all the shenanigans for H8/SX which has a much more capable
shifter.  It's not single cycle, but it is reasonably efficient.

This has been regression tested on the H8 without issues.  Pushing to the trunk
momentarily.

jeff

ps.  Yes, supporting zero extraction of multi-bit fields might be improvable as
well.  But I've already spent more time on this than I can reasonably justify.

gcc/
* config/h8300/combiner.md (single bit sign_extract): Avoid recently
added patterns for H8/SX.
(single bit zero_extract): New patterns.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 2f7faf77c93..e1179b5fea6 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1278,7 +1278,7 @@ (define_insn_and_split ""
(sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
 (const_int 1)
 (match_operand 2 "immediate_operand")))]
-  ""
+  "!TARGET_H8300SX"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 0)
@@ -1291,7 +1291,7 @@ (define_insn ""
 (const_int 1)
 (match_operand 2 "immediate_operand")))
(clobber (reg:CC CC_REG))]
-  ""
+  "!TARGET_H8300SX"
 {
   int position = INTVAL (operands[2]);
 
@@ -1359,3 +1359,69 @@ (define_insn ""
   return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
 }
   [(set_attr "length" "10")])
+
+;; For shift counts >= 16 we can always do better than the
+;; generic sequences.  Other patterns handle smaller counts.
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+(match_operand 2 "immediate_operand" "n"))
+   (const_int 1)))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0) (and:SI (lshiftrt:SI (match_dup 0) (match_dup 2))
+(const_int 1)))
+ (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+(match_operand 2 "immediate_operand" "n"))
+   (const_int 1)))
+   (clobber (reg:CC CC_REG))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"

Re: [PATCH] c++: fix tf_decltype manipulation for COMPOUND_EXPR

2023-11-09 Thread Jason Merrill

On 11/7/23 10:08, Patrick Palka wrote:

bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

In the COMPOUND_EXPR case of tsubst_expr, we were redundantly clearing
the tf_decltype flag when substituting the LHS and also neglecting to
propagate it when substituting the RHS.  This patch corrects this flag
manipulation, which allows us to accept the below testcase.
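For background, tf_decltype matters because of the rule that, inside decltype, a prvalue produced by a function call (or by the right operand of a comma expression) does not create a temporary, so its type may be incomplete and no destructor is required. A minimal non-template illustration follows (the names are made up); the patch is about getting the same treatment when such a comma expression is substituted inside a template.

```cpp
#include <cassert>
#include <type_traits>

struct Incomplete;  // deliberately never defined
Incomplete make();  // declared only; decltype does not evaluate the call

// No temporary is introduced for these prvalues, so the incomplete type is
// acceptable.  For the comma form, the rule applies to its right operand.
using CallType = decltype(make());
using CommaType = decltype(void(), make());

static_assert(std::is_same<CallType, Incomplete>::value, "call");
static_assert(std::is_same<CommaType, Incomplete>::value, "comma");
```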

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) : Don't redundantly
clear tf_decltype when substituting the LHS.  Propagate
tf_decltype when substituting the RHS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-call7.C: New test.
---
  gcc/cp/pt.cc| 9 -
  gcc/testsuite/g++.dg/cpp0x/decltype-call7.C | 9 +
  2 files changed, 13 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 521749df525..5f879287a58 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20382,11 +20382,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
  
  case COMPOUND_EXPR:

{
-   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
-   complain & ~tf_decltype, in_decl);
-   RETURN (build_x_compound_expr (EXPR_LOCATION (t),
-  op0,
-  RECUR (TREE_OPERAND (t, 1)),
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = tsubst_expr (TREE_OPERAND (t, 1), args,
+   complain|decltype_flag, in_decl);
+   RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
   templated_operator_saved_lookups (t),
   complain|decltype_flag));


Hmm, passing decltype_flag to both op1 and the comma expression itself is concerning.  Can you 
add a test with overloaded operator, where the RHS is a class with a 
destructor?


Jason



Re: [PATCH] c++: decltype of capture proxy [PR79378, PR96917]

2023-11-09 Thread Jason Merrill

On 11/7/23 14:52, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

We usually don't see capture proxies in finish_decltype_type because
process_outer_var_ref is a no-op inside an unevaluated context and
so a use of a capture inside decltype refers directly to the captured
variable.  But we can still see a capture proxy during decltype(auto)
deduction and for decltype of an init-capture, which suggests we need
to handle capture proxies specially within finish_decltype_type (since
they're always implicitly const).  This patch adds such handling.

PR c++/79378
PR c++/96917

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Handle an id-expression
naming a capture proxy specially.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto7.C: New test.
* g++.dg/cpp1y/lambda-init20.C: New test.
---
  gcc/cp/semantics.cc | 28 +--
  gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C | 53 +
  gcc/testsuite/g++.dg/cpp1y/lambda-init20.C  | 22 +
  3 files changed, 98 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-init20.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 4059e74bdb7..f583dedd6cf 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11643,12 +11643,30 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
/* Fall through for fields that aren't bitfields.  */
  gcc_fallthrough ();
  
-case FUNCTION_DECL:

  case VAR_DECL:
-case CONST_DECL:
-case PARM_DECL:
-case RESULT_DECL:
-case TEMPLATE_PARM_INDEX:
+ if (is_capture_proxy (expr))
+   {
+ if (is_normal_capture_proxy (expr))
+   {
+ expr = DECL_CAPTURED_VARIABLE (expr);
+ type = TREE_TYPE (expr);
+ type = non_reference (type);
+   }
+ else
+   {
+ expr = DECL_VALUE_EXPR (expr);
+ gcc_assert (TREE_CODE (expr) == COMPONENT_REF);
+ expr = TREE_OPERAND (expr, 1);
+ type = TREE_TYPE (expr);
+   }
+ break;
+   }
+ /* Fall through.  */
+   case FUNCTION_DECL:
+   case CONST_DECL:
+   case PARM_DECL:
+   case RESULT_DECL:
+   case TEMPLATE_PARM_INDEX:
  expr = mark_type_use (expr);
type = TREE_TYPE (expr);
  if (VAR_P (expr) && DECL_NTTP_OBJECT_P (expr))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C b/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
new file mode 100644
index 000..a37b9db38d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
@@ -0,0 +1,53 @@
+// PR c++/96917
+// { dg-do compile { target c++14 } }
+
+int main() {
+  int x = 0;
+  int y = 0;
+  const int cx = 0;
+  const int cy = 0;
+
+  [x, &y, cx, &cy] {
+decltype(auto) a = x;
+using ty1 = int;
+using ty1 = decltype(x);
+using ty1 = decltype(a);
+
+decltype(auto) b = y;
+using ty2 = int;
+using ty2 = decltype(y);
+using ty2 = decltype(b);
+
+decltype(auto) ca = cx;
+using ty3 = const int;
+using ty3 = decltype(cx);
+using ty3 = decltype(ca);
+
+decltype(auto) cb = cy;
+using ty4 = const int;
+using ty4 = decltype(cy);
+using ty4 = decltype(cb);
+  };
+
+  [x=x, &y=y, cx=cx, &cy=cy] {
+decltype(auto) a = x;
+using ty1 = int;
+using ty1 = decltype(x);
+using ty1 = decltype(a);
+
+decltype(auto) b = y;
+using ty2 = int&;
+using ty2 = decltype(y);
+using ty2 = decltype(b);
+
+decltype(auto) ca = cx;
+using ty3 = int;
+using ty3 = decltype(cx);
+using ty3 = decltype(ca);
+
+decltype(auto) cb = cy;
+using ty4 = const int&;
+using ty4 = decltype(cy);
+using ty4 = decltype(cb);
+  };
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C b/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C
new file mode 100644
index 000..a06b77a664d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C
@@ -0,0 +1,22 @@
+// PR c++/79378
+// { dg-do compile { target c++14 } }
+
+int main() {
+  int x = 0;
+  [x=x, &r=x] {
+using ty1 = int;
+using ty1 = decltype(x);
+
+using ty2 = int&;
+using ty2 = decltype(r);
+  };
+
+  const int cx = 0;
+  [x=cx, &r=cx] {
+using ty1 = int;
+using ty1 = decltype(x);
+
+using ty2 = const int&;
+using ty2 = decltype(r);
+  };
+}




Re: [PATCH] libgccjit: Add ability to get CPU features

2023-11-09 Thread Antoni Boucher
Hi.
See answers below.

On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:
> On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:
> > Hi.
> > This patch adds support for getting the CPU features in libgccjit
> > (bug 112466)
> > 
> > There's a TODO in the test:
> > I'm not sure how to test that gcc_jit_target_info_arch returns the
> > correct value since it is dependent on the CPU.
> > Any idea on how to improve this?
> > 
> > Also, I created a CStringHash to be able to have a
> > std::unordered_set. Is there any built-in way of doing this?
> 
> Thanks for the patch.
> 
> Some high-level questions:
> 
> Is this specifically about detecting capabilities of the host that
> libgccjit is currently running on? or how the target was configured
> when libgccjit was built?

I'm less sure about this part. I'll need to do more tests.

> 
> One of the benefits of libgccjit is that, in theory, we support all of
> the targets that GCC already supports.  Does this patch change that, or
> is this more about giving client code the ability to determine
> capabilities of the specific host being compiled for?

This should not change that. If it does, this is a bug.

> 
> I'm nervous about having per-target jit code.  Presumably there's a
> reason that we can't reuse existing target logic here - can you please
> describe what the problem is.  I see that the ChangeLog has:
> 
> > * config/i386/i386-jit.cc: New file.
> 
> where i386-jit.cc has almost 200 lines of nontrivial code.  Where did
> this come from?  Did you base it on existing code in our source tree,
> making modifications to fit the new internal API, or did you write it
> from scratch?  In either case, how onerous would this be for other
> targets?

This was mostly copied from the same code done for the Rust and D
frontends.
See this commit and the following:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb
The equivalent to i386-jit.cc is there:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9

> 
> I'm not an expert at target hooks (or at the i386 backend), so if we do
> go with this approach I'd want someone else to review those parts of
> the patch.
> 
> Have you verified that GCC builds with this patch with jit *not*
> enabled in the enabled languages?

I will do.

> 
> [...snip...]
> 
> A nitpick:
> 
> > +.. function:: const char * \
> > +  gcc_jit_target_info_arch (gcc_jit_target_info *info)
> > +
> > +   Get the architecture of the currently running CPU.
> 
> What does this string look like?
> How long does the pointer remain valid?

It's the march string, like "znver2", for instance.
It remains valid until we free the gcc_jit_target_info object.

> 
> Thanks again; hope the above makes sense
> Dave
> 
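On the CStringHash question: as far as I know, C++11 has no standard hasher for C strings, since std::hash<const char *> hashes the pointer value rather than the characters (C++17 adds std::hash<std::string_view>). A minimal sketch of the usual workaround, assuming the set only stores pointers to strings that outlive it, pairs a content hash (64-bit FNV-1a, chosen arbitrarily) with a strcmp-based equality. The names are illustrative, not the ones in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>

// Hash the characters (64-bit FNV-1a), not the pointer value.
struct CStringHash {
  std::size_t operator()(const char *s) const {
    std::uint64_t h = 0xcbf29ce484222325ull;  // FNV-1a offset basis
    for (; *s != '\0'; ++s) {
      h ^= static_cast<unsigned char>(*s);
      h *= 0x100000001b3ull;                  // FNV-1a prime
    }
    return static_cast<std::size_t>(h);
  }
};

// Compare the characters, not the pointers.
struct CStringEqual {
  bool operator()(const char *a, const char *b) const {
    return std::strcmp(a, b) == 0;
  }
};

// The set stores pointers only: the strings must outlive it.
using CStringSet = std::unordered_set<const char *, CStringHash, CStringEqual>;
```

Both functors are needed: a content hash alone would still treat two distinct pointers to equal strings as different keys.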



Re: [PATCH] c++: decltype of (by-value captured reference) [PR79620]

2023-11-09 Thread Jason Merrill

On 11/7/23 14:52, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

The capture decltype handling in finish_decltype_type wasn't looking
through implicit INDIRECT_REF (added by convert_from_reference), which
caused us to incorrectly resolve decltype((x)) to float& below.

We still don't fully accept the example ultimately because when
processing the decltype inside the first lambda's trailing return type,
we're in lambda type scope but not yet in lambda function scope that
the check looks for, which seems like an orthogonal bug.
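The rule at issue, shown on a plain int rather than a reference (names made up): inside a non-mutable lambda, decltype(x) gives the declared type of the captured variable, whereas decltype((x)) treats (x) as an access to the closure member through a const call operator.

```cpp
#include <cassert>
#include <type_traits>

int read_captured_copy() {
  int x = 7;
  return [x]() -> int {
    // Unparenthesized id-expression: declared type of the captured variable.
    static_assert(std::is_same<decltype(x), int>::value, "decltype(x)");
    // Parenthesized: access to the closure member through the const call
    // operator of a non-mutable lambda.
    static_assert(std::is_same<decltype((x)), const int &>::value,
                  "decltype((x))");
    // PR79620 is the same rule when x is a reference captured by copy
    // (e.g. float &): the expected result is const float &, whereas the
    // patch description says GCC produced float &.  Not checked here.
    return x;
  }();
}
```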

PR c++/79620

gcc/cp/ChangeLog:

* cp-tree.h (STRIP_REFERENCE_REF): Define.
* semantics.cc (finish_decltype_type): Use it to look
through implicit INDIRECT_REF when deciding whether to
call capture_decltype.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.
---
  gcc/cp/cp-tree.h  |  4 +++
  gcc/cp/semantics.cc   |  4 +--
  .../g++.dg/cpp0x/lambda/lambda-decltype3.C| 28 +++
  3 files changed, 34 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b2603d4830e..1fa710d7154 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4084,6 +4084,10 @@ struct GTY(()) lang_decl {
 && TREE_TYPE (TREE_OPERAND (NODE, 0))  \
 && TYPE_REF_P (TREE_TYPE (TREE_OPERAND ((NODE), 0
  
+/* Look through an implicit INDIRECT_REF from convert_from_reference.  */

+#define STRIP_REFERENCE_REF(NODE)  \
+  (REFERENCE_REF_P (NODE) ? TREE_OPERAND (NODE, 0) : NODE)
+
  /* True iff this represents an lvalue being treated as an rvalue during return
 or throw as per [class.copy.elision].  */
  #define IMPLICIT_RVALUE_P(NODE) \
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f583dedd6cf..8df4521bf7c 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11717,10 +11717,10 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
 transformed into an access to a corresponding data member
 of the closure type that would have been declared if x
 were a use of the denoted entity.  */
-  if (outer_automatic_var_p (expr)
+  if (outer_automatic_var_p (STRIP_REFERENCE_REF (expr))


Let's also have outer_automatic_var_p assert that its argument is not 
REFERENCE_REF_P.  OK with that change (if no regressions).


Jason



Re: [PATCH] c++: non-dependent .* folding [PR112427]

2023-11-09 Thread Jason Merrill

On 11/8/23 16:59, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here when building up the non-dependent .* expression, we crash from
fold_convert on 'b.a' due to this (templated) COMPONENT_REF having an
IDENTIFIER_NODE instead of FIELD_DECL operand that middle-end routines
expect.  Like in r14-4899-gd80a26cca02587, this patch fixes this by
replacing the problematic piecemeal folding with a single call to
cp_fully_fold.
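The "object + offset" comment in the hunk relies on how GCC represents pointers to data members: under the Itanium C++ ABI, which GCC follows on most targets, such a pointer holds the member's byte offset. The sketch below (hypothetical helper, standard-layout struct assumed) checks that `obj.*pm` names the same object as the base address plus that offset, which is what the POINTER_PLUS_EXPR expresses.

```cpp
#include <cassert>
#include <cstddef>

struct S {  // standard-layout, so offsetof is well defined
  int a;
  int b;
};

// Hypothetical helper: form "address of OBJ + OFFSET bytes" as an int*.
int *member_at_offset(S &obj, std::size_t offset) {
  char *base = reinterpret_cast<char *>(&obj);
  return reinterpret_cast<int *>(base + offset);
}
```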

PR c++/112427

gcc/cp/ChangeLog:

* typeck2.cc (build_m_component_ref): Use cp_convert, build2 and
cp_fully_fold instead of fold_build_pointer_plus and fold_convert.



gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent29.C: New test.
---
  gcc/cp/typeck2.cc   |  5 -
  gcc/testsuite/g++.dg/template/non-dependent29.C | 13 +
  2 files changed, 17 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent29.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 309903afed8..208004221da 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -2378,7 +2378,10 @@ build_m_component_ref (tree datum, tree component, tsubst_flags_t complain)
/* Build an expression for "object + offset" where offset is the
 value stored in the pointer-to-data-member.  */
ptype = build_pointer_type (type);
-  datum = fold_build_pointer_plus (fold_convert (ptype, datum), component);
+  datum = cp_convert (ptype, datum, complain);
+  datum = build2 (POINTER_PLUS_EXPR, ptype,
+ datum, convert_to_ptrofftype (component));


We shouldn't need to build the POINTER_PLUS_EXPR at all in template 
context.  OK with that change.


Jason



[COMMITTED] bpf: fix pseudo-c asm emitted for *mulsidi3_zeroextend

2023-11-09 Thread Jose E. Marchesi
This patch fixes the pseudo-c BPF assembly syntax used for
*mulsidi3_zeroextend, which was being emitted as:

  rN *= wM

instead of the proper way to denote a mul32 in pseudo-C syntax:

  wN *= wM

Includes test.
Tested in bpf-unknown-none-gcc target in x86_64-linux-gnu host.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_print_register): Accept modifier code 'W'
to force emitting register names using the wN form.
* config/bpf/bpf.md (*mulsidi3_zeroextend): Force operands to
always use wN written form in pseudo-C assembly syntax.
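The naming decision made by the patched bpf_print_register can be modelled on its own. The sketch below is a hypothetical standalone function (not GCC code) that takes the operand modifier, register number and mode size and returns the pseudo-C name. It leaves out the frame-pointer special case visible in the hunk and assumes the remaining path prints rN.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the pseudo-C naming rule after the patch.
// CODE: operand modifier ('w', 'W', or 0 for none); MODE_SIZE in bytes.
// The frame-pointer special case in the real function is omitted.
std::string pseudoc_reg_name(char code, unsigned regno, unsigned mode_size) {
  bool use_w = code == 'W' || (code == 'w' && mode_size <= 4);
  return (use_w ? "w" : "r") + std::to_string(regno);
}
```

With 'W' the mode no longer matters, which is why the mul32 template can always print `wN *= wM`.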

gcc/testsuite/ChangeLog:

* gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c: New test.
---
 gcc/config/bpf/bpf.cc  | 18 +++---
 gcc/config/bpf/bpf.md  |  2 +-
 .../bpf/mulsidi3-zeroextend-pseudoc.c  | 14 ++
 3 files changed, 26 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 63637ece78e..a0956a06972 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -763,13 +763,17 @@ bpf_output_call (rtx target)
   return "";
 }
 
-/* Print register name according to assembly dialect.
-   In normal syntax registers are printed like %rN where N is the
-   register number.
+/* Print register name according to assembly dialect.  In normal
+   syntax registers are printed like %rN where N is the register
+   number.
+
In pseudoc syntax, the register names do not feature a '%' prefix.
-   Additionally, the code 'w' denotes that the register should be printed
-   as wN instead of rN, where N is the register number, but only when the
-   value stored in the operand OP is 32-bit wide.  */
+   Additionally, the code 'w' denotes that the register should be
+   printed as wN instead of rN, where N is the register number, but
+   only when the value stored in the operand OP is 32-bit wide.
+   Finally, the code 'W' denotes that the register should be printed
+   as wN instead of rN, in all cases, regardless of the mode of the
+   value stored in the operand.  */
 
 static void
 bpf_print_register (FILE *file, rtx op, int code)
@@ -778,7 +782,7 @@ bpf_print_register (FILE *file, rtx op, int code)
 fprintf (file, "%s", reg_names[REGNO (op)]);
   else
 {
-  if (code == 'w' && GET_MODE_SIZE (GET_MODE (op)) <= 4)
+  if (code == 'W' || (code == 'w' && GET_MODE_SIZE (GET_MODE (op)) <= 4))
{
  if (REGNO (op) == BPF_FP)
fprintf (file, "w10");
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 0e2ad8da5ac..522351a6596 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -184,7 +184,7 @@ (define_insn "*mulsidi3_zeroextend"
  (mult:SI (match_operand:SI 1 "register_operand" "0,0")
   (match_operand:SI 2 "reg_or_imm_operand" "r,I"]
   ""
-  "{mul32\t%0,%2|%w0 *= %w2}"
+  "{mul32\t%0,%2|%W0 *= %W2}"
   [(set_attr "type" "alu32")])
 
 ;;; Division
diff --git a/gcc/testsuite/gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c b/gcc/testsuite/gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c
new file mode 100644
index 000..63d63142708
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c
@@ -0,0 +1,14 @@
+/* Make sure that we are emitting `wN *= wM' and not `rN *= wM' for a mul32 in
+   pseudo-C assembly syntax when emitting assembly for a recognized
+   *mulsidi3_zeroextend pattern.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=pseudoc" } */
+
+unsigned long foo (unsigned snd_cwnd, unsigned mss_cache)
+{
+  return snd_cwnd * mss_cache;
+}
+
+/* { dg-final { scan-assembler-not {\tr. \*= w.\n} } } */
+/* { dg-final { scan-assembler {\tw. \*= w.\n} } } */
-- 
2.30.2



Re: [PATCH] c++: fix parsing with auto(x) [PR112410]

2023-11-09 Thread Jason Merrill

On 11/9/23 14:58, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we are wrongly parsing

   int y(auto(42));

which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
However, we were treating the auto as an implicit template parameter.

Fixing the auto{42} case is easy, but when auto is followed by a (,
I found the fix to be much more involved.  For instance, we cannot
use cp_parser_expression, because that can give hard errors.  It's
also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
are all function declarations.  We have to look at more than one
token to decide.


Yeah, this is a most vexing parse problem.  The code is synthesizing 
template parameters before we've resolved whether the auto is a 
decl-specifier or not.



In this fix, I'm (ab)using cp_parser_declarator, with member_p=false
so that it doesn't commit.  But it handles even more complicated
cases such as

   int fn (auto (*const **)(int) -> char);


But it doesn't seem to handle the extremely vexing

struct A {
  A(int,int);
};

int main()
{
  int a;
  A b(auto(a), 42);
}

I think we need to stop synthesizing immediately when we see RID_AUTO, 
and instead go back after we successfully parse a declaration and 
synthesize for any autos we saw along the way.  :/


Jason



Re: [PATCH] RISC-V/testsuite: Fix zvfh tests.

2023-11-09 Thread Jeff Law




On 11/9/23 15:43, 钟居哲 wrote:

Hi, Robin.

[ ... ]
You may need a development version of binutils to get the zfh/zvfh 
support and unreleased patches to get zfb/zvfb support.


Probably the easiest thing to do would be to look in the gcc.log file at 
those failures and see what the excess failure is.  If it's a diagnostic 
from the assembler about an unrecognized instruction or something 
similar, then that's a pretty clear sign you need a newer binutils.


jeff


Re: [PATCH] RISC-V: Move cond_copysign from combine pattern to autovec pattern

2023-11-09 Thread Jeff Law




On 11/9/23 16:33, Juzhe-Zhong wrote:

Since cond_copysign is now supported in match.pd (middle end),
we don't need to support conditional copysign in the RTL combine pass.

Instead, we can support it directly through the explicit cond_copysign optab.

Conditional copysign tests are already available in the testsuite,
so no new tests are needed.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_copysign): Remove.
* config/riscv/autovec.md (cond_copysign): New pattern.

I assume you ran the testsuite after this change to ensure there weren't
any regressions?  We need to make sure that we indicate what testing 
we've done.


You don't need to run every multilib or anything like that.  For a given 
change I trust you to run a reasonable set of tests.


OK assuming you've done a testsuite run.

Jeff


[COMMITTED] bpf: testsuite: fix expected regexp in gcc.target/bpf/ldxdw.c

2023-11-09 Thread Jose E. Marchesi
gcc/testsuite/ChangeLog:

* gcc.target/bpf/ldxdw.c: Fix regexp with expected result.
---
 gcc/testsuite/gcc.target/bpf/ldxdw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/bpf/ldxdw.c b/gcc/testsuite/gcc.target/bpf/ldxdw.c
index 0985ea3e6ac..72db8f03324 100644
--- a/gcc/testsuite/gcc.target/bpf/ldxdw.c
+++ b/gcc/testsuite/gcc.target/bpf/ldxdw.c
@@ -4,7 +4,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 
-/* { dg-final { scan-assembler-times "ldxdw\t%r.,\\\[%r.+0\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldxdw\t%r.,\\\[%r.\\+\[0-9\]+\\\]" 1 } } */
 /* { dg-final { scan-assembler-not "ldxdw\t%r.,\[0-9\]+" } } */
 
 unsigned long long test () {
-- 
2.30.2
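The effect of the regexp change can be checked outside DejaGnu with POSIX `<regex.h>`. This sketch rests on two assumptions. First, POSIX extended regexps treat these simple constructs the same way as the Tcl regexp engine behind scan-assembler. Second, the sample `ldxdw` lines resemble real compiler output; they are illustrative and were not taken from a test run. In the old pattern `+` is a quantifier, so the pattern only matches when the bracketed address ends in `0]`. The new pattern requires a literal `+` followed by a decimal offset.

```c
#include <regex.h>
#include <stddef.h>

/* ERE transcriptions of the DejaGnu patterns after Tcl string
   processing: "\t" is a literal tab and "\\[" becomes \[, a literal
   bracket.  A plain ']' outside a bracket expression is ordinary.  */
static const char old_pattern[] = "ldxdw\t%r.,\\[%r.+0]";
static const char new_pattern[] = "ldxdw\t%r.,\\[%r.\\+[0-9]+]";

/* Return 1 if the extended regex PATTERN matches anywhere in LINE,
   0 if it does not, and -1 if PATTERN fails to compile.  */
static int
ere_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With these definitions the old pattern matches `[%r1+0]` but not `[%r1+8]`. The new pattern matches both.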



Re: [PATCH] c++/modules: fix virtual destructors [PR103499]

2023-11-09 Thread Nathan Sidwell

On 11/9/23 18:29, Nathaniel Shead wrote:

On Thu, Nov 09, 2023 at 05:57:39PM -0500, Nathan Sidwell wrote:

On 11/9/23 04:55, Nathaniel Shead wrote:

I'm not sure if this is just papering over a general issue of clones not being
exported/imported, or if this is just an exception to the general case of
clones being able to be freely regenerated with no other issues.

Alternatively, would it be better to override the DECL_VINDEX of the original
declaration after filling it in for the clones as well? I wasn't able to see
anything depending on the current behaviour (though I didn't look very hard).


I think your patch is a fine approach. IIRC just streaming out the clones
directly ran into a bunch of issues, hence the current implementation.


ok

nathan


Sorry, I don't have write access, would you be able to push? Thanks.
(And for my other patch.)


ok, no worries






Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Currently, cloned functions are not included in the CMI.  However, for
virtual destructors the clones must have a different DECL_VINDEX from
their base declaration: the former have an INTEGER_CST indicating the
index into the vtable, while the latter indicate the FUNCTION_DECL that
they're overriding.

As such, this patch ensures that DECL_VINDEX is properly passed on for
cloned functions as well to prevent this from causing issues.
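The patch depends on one invariant: the writer and the reader test the same predicate, so both sides of the stream stay in step. The toy model below illustrates that idea. `ToyDecl`, `write_decl` and `read_decl` are hypothetical stand-ins for trees_out::decl_node and trees_in::tree_node. The stream is reduced to a vector of strings, and DECL_VINDEX is reduced to an integer vtable index, which corresponds to the clones' INTEGER_CST case.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy declaration: a name plus, for virtual functions, a vtable index.
struct ToyDecl {
  std::string name;
  bool is_virtual_function = false;
  int vindex = -1;  // meaningful only when is_virtual_function
};

// Writer: emit the name and, only for virtual functions, the vindex.
void write_decl(std::vector<std::string>& out, const ToyDecl& d) {
  out.push_back(d.name);
  if (d.is_virtual_function)
    out.push_back(std::to_string(d.vindex));
}

// Reader: LOOKED_UP stands in for the declaration found by name lookup.
// It must test the same predicate as the writer, otherwise the stream
// position drifts and later reads go wrong.
std::optional<ToyDecl> read_decl(const std::vector<std::string>& in,
                                 std::size_t& pos, const ToyDecl& looked_up) {
  if (pos >= in.size() || in[pos] != looked_up.name)
    return std::nullopt;  // roughly analogous to set_overrun ()
  ++pos;
  ToyDecl d = looked_up;
  if (d.is_virtual_function) {
    if (pos >= in.size())
      return std::nullopt;
    d.vindex = std::stoi(in[pos++]);  // take the streamed vtable index
  }
  return d;
}
```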

PR c++/103499

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_node): Write DECL_VINDEX for
virtual clones.
(trees_in::tree_node): Read DECL_VINDEX for virtual clones.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr103499_a.C: New test.
* g++.dg/modules/pr103499_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/module.cc  |  6 ++
   gcc/testsuite/g++.dg/modules/pr103499_a.C | 12 
   gcc/testsuite/g++.dg/modules/pr103499_b.C |  8 
   3 files changed, 26 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_a.C
   create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c1c8c226bc1..416a7c414cc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8648,6 +8648,8 @@ trees_out::decl_node (tree decl, walk_kind ref)
 tree_node (target);
 tree_node (DECL_NAME (decl));
+  if (TREE_CODE (decl) == FUNCTION_DECL && DECL_VIRTUAL_P (decl))
+   tree_node (DECL_VINDEX (decl));
 int tag = insert (decl);
 if (streaming_p ())
dump (dumper::TREE)
@@ -9869,6 +9871,10 @@ trees_in::tree_node (bool is_use)
}
  }
+   /* A clone might have a different vtable entry.  */
+   if (res && TREE_CODE (res) == FUNCTION_DECL && DECL_VIRTUAL_P (res))
+ DECL_VINDEX (res) = tree_node ();
+
if (!res)
  set_overrun ();
int tag = insert (res);
diff --git a/gcc/testsuite/g++.dg/modules/pr103499_a.C b/gcc/testsuite/g++.dg/modules/pr103499_a.C
new file mode 100644
index 000..0497c2c5504
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr103499_a.C
@@ -0,0 +1,12 @@
+// PR c++/103499
+// { dg-module-do compile }
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi pr103499 }
+
+export module pr103499;
+
+export struct base {
+  virtual ~base() = default;
+};
+
+export struct derived : base {};
diff --git a/gcc/testsuite/g++.dg/modules/pr103499_b.C b/gcc/testsuite/g++.dg/modules/pr103499_b.C
new file mode 100644
index 000..b7468562ba9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr103499_b.C
@@ -0,0 +1,8 @@
+// PR c++/103499
+// { dg-additional-options "-fmodules-ts" }
+
+import pr103499;
+
+void test(derived* p) {
+  delete p;
+}


--
Nathan Sidwell






Re: [PATCH v4 2/2] c++: Diagnostics for P0847R7 (Deducing this) [PR102609]

2023-11-09 Thread Jason Merrill

On 11/5/23 10:06, waffl3x wrote:

I had wanted to write about some of my frustrations with trying to
write a test for virtual specifiers and errors/warnings for
shadowing/overloading virtual functions, but I am a bit too tired at
the moment and I don't want to delay getting this up for another night.
In short, the standard does not properly specify the criteria for
overriding functions, which leaves a lot of ambiguity in how exactly we
should be handling these cases. 


Agreed, this issue came up in the C++ committee meeting today.  See

https://cplusplus.github.io/CWG/issues/2553.html
https://cplusplus.github.io/CWG/issues/2554.html

for draft changes to clarify some of these issues.


The standard also really poorly
specifies things related to the implicit object parameter and implicit
object argument which also causes some trouble. Anyhow, for the time
being I am not including my test for diagnostics related to a virtual
specifier on xobj member functions. I can't get it to a point I am
happy with it and I think there will need to be some discussion on how
exactly we want to handle that.


The discussion might be easier with the testcase to refer to?


I was fairly lazy with the changelog and commit message in this patch
as I expect to need to do another round on this patch before it can be
accepted. One specific question I have is whether I should be listing
out all the diagnostics that were added to a function. For the cases
where only one diagnostic was added I stated it, but for
grokdeclarator, which has the majority of the diagnostics, I did not. I
welcome input here, really I request it, because the changelogs are
still fairly difficult for me to write. Hell, the commit messages are
hard to write, I feel I went overboard on the first patch but I guess
it's a fairly large patch so maybe it's alright? Again, I am looking
for feedback here if anyone is willing to provide it.


ChangeLog entries are very brief summaries of the changes, there's 
absolutely no need to enumerate multiple diagnostics.  If someone wants 
more detail they can look at the patch.



+  if (xobj_func_p && (quals || rqual))
+   inform (DECL_SOURCE_LOCATION (DECL_ARGUMENTS (decl)),
+   "explicit object parameter declared here");


When you add an inform after a diagnostic, you also need to add an 
auto_diagnostic_group declaration before the error so that they get 
grouped together for JSON/SARIF diagnostic output.


This applies to a lot of the diagnostics in the patch.
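To show what grouping means, here is a toy RAII model. `ToySink`, `ToyDiagnosticGroup`, `toy_error` and `toy_inform` are hypothetical. They only mimic the idea behind GCC's auto_diagnostic_group: diagnostics emitted while the guard is alive share a group id. A consumer such as a SARIF writer can then keep an error together with its note.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy diagnostic sink: records each message with the group it belongs to.
struct ToySink {
  int next_group = 0;
  int open_groups = 0;     // nesting depth of live guards
  int current_group = -1;  // -1 means "not in a group"
  std::vector<std::pair<int, std::string>> log;
};

// RAII guard: the outermost guard opens a new group id and closes it on
// destruction; nested guards join the enclosing group.
class ToyDiagnosticGroup {
public:
  explicit ToyDiagnosticGroup(ToySink& s) : sink_(s) {
    if (sink_.open_groups++ == 0)
      sink_.current_group = sink_.next_group++;
  }
  ~ToyDiagnosticGroup() {
    if (--sink_.open_groups == 0)
      sink_.current_group = -1;
  }
  ToyDiagnosticGroup(const ToyDiagnosticGroup&) = delete;
  ToyDiagnosticGroup& operator=(const ToyDiagnosticGroup&) = delete;

private:
  ToySink& sink_;
};

void toy_error(ToySink& s, const std::string& msg) {
  s.log.emplace_back(s.current_group, "error: " + msg);
}

void toy_inform(ToySink& s, const std::string& msg) {
  s.log.emplace_back(s.current_group, "note: " + msg);
}
```

Declaring the guard before the error is what places the error and the following note in the same group. A diagnostic emitted after the guard goes out of scope is ungrouped.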


+ pedwarn(DECL_SOURCE_LOCATION (xobj_parm), OPT_Wc__23_extensions,


Missing space before (


+   /* If   */


I think this comment doesn't add much.  :)


+   else if (declarator->declarator->kind == cdk_ptrmem)
+ error_at (DECL_SOURCE_LOCATION (xobj_parm),
+   "a member function pointer type "
+   "cannot have an explicit object parameter");


Let's say "a pointer to member function type "


+   /* Ideally we should synthesize the correct syntax
+  for the user, perhaps this could be added later.  */


Should be pretty simple to produce an add_fixit_remove() for the 'this' 
token here?



+   /* Free function case,
+  surely there is a better way to identify it?  */


Move these diagnostics down past where ctype gets set?


+   else if (decl_context == NORMAL
+&& (in_namespace
+|| !declarator->declarator->u.id.qualifying_scope))
+   error_at (DECL_SOURCE_LOCATION (xobj_parm),
+ "a free function cannot have "
+ "an explicit object parameter");


Let's say "non-member function".


+ /* Ideally we synthesize a full rewrite, at the moment
+there are issues with it though.
+It rewrites "f(S this & s)" correctly,
+but fails to rewrite "f(const this S s)" correctly.
+It also does not handle "f(S& this s)" correctly at all.


David Malcolm would be the one to ask for advice about fixit tricks, if 
you want.



+It's also possible we want to wait and see if the parm
+could even be a valid xobj parm as it might be confusing
+to the user to see an error, fix it, and then see another
+error for something new.


I don't see how that applies here; we don't bail out after this error, 
so we should continue to give any other needed errors.



+ /* If default_argument is non-null token should always be the
+the location of the `=' token, this is brittle code though
+and should be rectified in the future.  */


It would be easy enough to add an eq_token variable?


+  /* I can imagine doing a fixit here, suggesting replacing
+this / *this / this-> with  / name / "name." but it 



[PATCH] RISC-V: Move cond_copysign from combine pattern to autovec pattern

2023-11-09 Thread Juzhe-Zhong
Since cond_copysign is now supported in match.pd (middle end),
we don't need to support conditional copysign in the RTL combine pass.

Instead, we can support it directly through the explicit cond_copysign optab.

Conditional copysign tests are already available in the testsuite,
so no new tests are needed.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_copysign): Remove.
* config/riscv/autovec.md (cond_copysign): New pattern.

---
 gcc/config/riscv/autovec-opt.md | 22 --
 gcc/config/riscv/autovec.md | 22 ++
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 3c87e66ea49..986ac6e9181 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -486,28 +486,6 @@
 }
 [(set_attr "type" "vector")])
 
-;; Combine vfsgnj.vv + vcond_mask
-(define_insn_and_split "*cond_copysign"
-   [(set (match_operand:V_VLSF 0 "register_operand")
-(if_then_else:V_VLSF
-  (match_operand: 1 "register_operand")
-  (unspec:V_VLSF
-   [(match_operand:V_VLSF 2 "register_operand")
-(match_operand:V_VLSF 3 "register_operand")] UNSPEC_VCOPYSIGN)
-  (match_operand:V_VLSF 4 "register_operand")))]
-   "TARGET_VECTOR && can_create_pseudo_p ()"
-   "#"
-   "&& 1"
-   [(const_int 0)]
-{
-  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, mode);
-  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
-   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
-  riscv_vector::expand_cond_len_binop (icode, ops);
-   DONE;
-}
-[(set_attr "type" "vector")])
-
 ;; Combine vnsra + vcond_mask
 (define_insn_and_split "*cond_vtrunc"
   [(set (match_operand: 0 "register_operand")
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 973dc4ac235..33722ea1139 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1808,6 +1808,28 @@
   DONE;
 })
 
+;; -
+;;  [FP] Conditional copysign operations
+;; -
+;; Includes:
+;; - vfsgnj
+;; -
+
+(define_expand "cond_copysign"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:V_VLSF 2 "register_operand")
+   (match_operand:V_VLSF 3 "register_operand")
+   (match_operand:V_VLSF 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_binop (icode, ops);
+  DONE;
+})
+
 ;; -
 ;;  [INT] Conditional ternary operations
 ;; -
-- 
2.36.3
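The scalar sketch below is a reference for what the new expander computes. It assumes cond_copysign follows the same convention as the other cond_<op> optabs: lane by lane, operand 0 = operand 1 ? operand 2 OP operand 3 : operand 4. It also models vfsgnj's sign injection on IEEE 754 binary64 values, with the magnitude taken from the first operand and the sign from the second. `cond_copysign_ref` and `copysign_bits` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign injection on binary64: all bits except the sign come from MAG,
   the sign bit comes from SGN (what copysign (MAG, SGN) returns).  */
static double
copysign_bits (double mag, double sgn)
{
  const uint64_t sign_mask = UINT64_C (1) << 63;
  uint64_t m, s;

  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  m = (m & ~sign_mask) | (s & sign_mask);
  memcpy (&mag, &m, sizeof mag);
  return mag;
}

/* Scalar reference over N lanes, in the expander's operand order:
   DST = op0, MASK = op1, A = op2, B = op3, ELSE_VALS = op4.  */
static void
cond_copysign_ref (double *dst, const bool *mask, const double *a,
                   const double *b, const double *else_vals, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? copysign_bits (a[i], b[i]) : else_vals[i];
}
```

Inactive lanes take the value of operand 4 unchanged. Active lanes get the magnitude of operand 2 with the sign of operand 3.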



Re: [PATCH] c++/modules: fix virtual destructors [PR103499]

2023-11-09 Thread Nathaniel Shead
On Thu, Nov 09, 2023 at 05:57:39PM -0500, Nathan Sidwell wrote:
> On 11/9/23 04:55, Nathaniel Shead wrote:
> > I'm not sure if this is just papering over a general issue of clones not being
> > exported/imported, or if this is just an exception to the general case of
> > clones being able to be freely regenerated with no other issues.
> > 
> > Alternatively, would it be better to override the DECL_VINDEX of the original
> > declaration after filling it in for the clones as well? I wasn't able to see
> > anything depending on the current behaviour (though I didn't look very hard).
> 
> I think your patch is a fine approach. IIRC just streaming out the clones
> directly ran into a bunch of issues, hence the current implementation.
> 
> 
> ok
> 
> nathan

Sorry, I don't have write access, would you be able to push? Thanks.
(And for my other patch.)

> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > 
> > -- >8 --
> > 
> > Currently, cloned functions are not included in the CMI.  However, for
> > virtual destructors the clones must have a different DECL_VINDEX from
> > their base declaration: the former have an INTEGER_CST indicating the
> > index into the vtable, while the latter indicate the FUNCTION_DECL that
> > they're overriding.
> > 
> > As such, this patch ensures that DECL_VINDEX is properly passed on for
> > cloned functions as well to prevent this from causing issues.
> > 
> > PR c++/103499
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_out::decl_node): Write DECL_VINDEX for
> > virtual clones.
> > (trees_in::tree_node): Read DECL_VINDEX for virtual clones.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/pr103499_a.C: New test.
> > * g++.dg/modules/pr103499_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc  |  6 ++
> >   gcc/testsuite/g++.dg/modules/pr103499_a.C | 12 
> >   gcc/testsuite/g++.dg/modules/pr103499_b.C |  8 
> >   3 files changed, 26 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_a.C
> >   create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index c1c8c226bc1..416a7c414cc 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -8648,6 +8648,8 @@ trees_out::decl_node (tree decl, walk_kind ref)
> > tree_node (target);
> > tree_node (DECL_NAME (decl));
> > +  if (TREE_CODE (decl) == FUNCTION_DECL && DECL_VIRTUAL_P (decl))
> > +   tree_node (DECL_VINDEX (decl));
> > int tag = insert (decl);
> > if (streaming_p ())
> > dump (dumper::TREE)
> > @@ -9869,6 +9871,10 @@ trees_in::tree_node (bool is_use)
> > }
> >   }
> > +   /* A clone might have a different vtable entry.  */
> > +   if (res && TREE_CODE (res) == FUNCTION_DECL && DECL_VIRTUAL_P (res))
> > + DECL_VINDEX (res) = tree_node ();
> > +
> > if (!res)
> >   set_overrun ();
> > int tag = insert (res);
> > diff --git a/gcc/testsuite/g++.dg/modules/pr103499_a.C b/gcc/testsuite/g++.dg/modules/pr103499_a.C
> > new file mode 100644
> > index 000..0497c2c5504
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr103499_a.C
> > @@ -0,0 +1,12 @@
> > +// PR c++/103499
> > +// { dg-module-do compile }
> > +// { dg-additional-options "-fmodules-ts" }
> > +// { dg-module-cmi pr103499 }
> > +
> > +export module pr103499;
> > +
> > +export struct base {
> > +  virtual ~base() = default;
> > +};
> > +
> > +export struct derived : base {};
> > diff --git a/gcc/testsuite/g++.dg/modules/pr103499_b.C b/gcc/testsuite/g++.dg/modules/pr103499_b.C
> > new file mode 100644
> > index 000..b7468562ba9
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/pr103499_b.C
> > @@ -0,0 +1,8 @@
> > +// PR c++/103499
> > +// { dg-additional-options "-fmodules-ts" }
> > +
> > +import pr103499;
> > +
> > +void test(derived* p) {
> > +  delete p;
> > +}
> 
> -- 
> Nathan Sidwell
> 


Re: [PATCH 1/2] libstdc++: declare std::allocator in !HOSTED as an extension

2023-11-09 Thread Arsen Arsenović

Jonathan Wakely  writes:

> OK

Thanks, pushed (tests did pass).

Have a lovely night.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH 1/2] libstdc++: declare std::allocator in !HOSTED as an extension

2023-11-09 Thread Jonathan Wakely
On Thu, 9 Nov 2023 at 19:32, Arsen Arsenović  wrote:
>
> This allows us to add features to freestanding which allow specifying
> non-default allocators (generators, collections, ...) without having to
> modify them.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/memoryfwd.h: Remove HOSTED check around allocator
> and its specializations.
> ---
> Evening,
>
> This patch adds std::allocator as a declaration to freestanding, so that
> it doesn't block various other bits of the library (such as collections
> or the generators that I intend to send in soon) from being added to
> freestanding anymore.
>
> I don't intend to pull in anything but  into freestanding
> in this release, though, so, this patch will have little impact for now.
>
> Testing on x86_64-pc-linux-gnu (the tests are not done yet, but I see no
> relevant fails in previous test runs).  The follow-up patch also marks a
> new test as freestanding (as it was failing).
>
> OK for trunk?

OK


>
> Have a lovely evening!
>
>  libstdc++-v3/include/bits/memoryfwd.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/memoryfwd.h b/libstdc++-v3/include/bits/memoryfwd.h
> index 330a6df7f44a..2b79cd8880a1 100644
> --- a/libstdc++-v3/include/bits/memoryfwd.h
> +++ b/libstdc++-v3/include/bits/memoryfwd.h
> @@ -60,13 +60,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> * @{
> */
>
> -#if _GLIBCXX_HOSTED
> +  // Included in freestanding as a libstdc++ extension.
>template
>  class allocator;
>
>template<>
>  class allocator;
> -#endif
>
>  #if __cplusplus >= 201103L
>/// Declare uses_allocator so it can be specialized in `` etc.
> --
> 2.42.1
>
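A toy illustration of why a declaration alone is enough. The names under `toy` are hypothetical: `generator_like` stands in for the generators and collections mentioned in the message. The language rule itself is standard. A template's default argument may name a class template specialization that is declared but never defined, as long as nothing requires that type to be complete.

```cpp
#include <type_traits>

namespace toy {
  // Declared but never defined, like std::allocator in a freestanding
  // <bits/memoryfwd.h> after this patch.
  template<typename T> class allocator;

  // A library component can still use it as a default template argument;
  // naming allocator<T> does not require it to be complete.
  template<typename T, typename Alloc = allocator<T>>
  struct generator_like {
    using allocator_type = Alloc;
  };

  // A user-provided allocator that can be passed explicitly.
  template<typename T>
  struct arena_allocator {
    using value_type = T;
  };
}

// Both spellings compile even though toy::allocator has no definition.
static_assert(std::is_same<toy::generator_like<int>::allocator_type,
                           toy::allocator<int>>::value,
              "the default allocator is named, not instantiated");
static_assert(std::is_same<toy::generator_like<int, toy::arena_allocator<int>>::allocator_type,
                           toy::arena_allocator<int>>::value,
              "a custom allocator is used as given");
```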



Re: [PATCH] libgccjit: Add ability to get CPU features

2023-11-09 Thread David Malcolm
On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds support for getting the CPU features in libgccjit
> (bug 112466).
> 
> There's a TODO in the test:
> I'm not sure how to test that gcc_jit_target_info_arch returns the
> correct value since it is dependent on the CPU.
> Any idea on how to improve this?
> 
> Also, I created a CStringHash to be able to have a
> std::unordered_set. Is there any built-in way of doing
> this?
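On the CStringHash question: the standard library has no ready-made hash for C strings by content, because `std::hash<const char *>` hashes the pointer value. The sketch below shows one common approach. It assumes C++17 for `std::string_view`, which GCC's own sources may not be able to rely on. `CStrHash`, `CStrEqual` and `CStringSet` are hypothetical names, not part of the patch.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_set>

// Hash a NUL-terminated string by its contents rather than its address.
struct CStrHash {
  std::size_t operator()(const char* s) const noexcept {
    return std::hash<std::string_view>{}(std::string_view(s));
  }
};

// Compare by contents; std::equal_to<const char*> would compare pointers.
struct CStrEqual {
  bool operator()(const char* a, const char* b) const noexcept {
    return std::strcmp(a, b) == 0;
  }
};

// The set stores pointers only, so the strings must outlive it.
using CStringSet = std::unordered_set<const char*, CStrHash, CStrEqual>;
```

Storing `std::string` keys instead removes the need for both functors and the lifetime concern, at the cost of copying each string.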

Thanks for the patch.

Some high-level questions:

Is this specifically about detecting capabilities of the host that
libgccjit is currently running on?  Or how the target was configured
when libgccjit was built?

One of the benefits of libgccjit is that, in theory, we support all of
the targets that GCC already supports.  Does this patch change that, or
is this more about giving client code the ability to determine
capabilities of the specific host being compiled for?

I'm nervous about having per-target jit code.  Presumably there's a
reason that we can't reuse existing target logic here - can you please
describe what the problem is.  I see that the ChangeLog has:

>   * config/i386/i386-jit.cc: New file.

where i386-jit.cc has almost 200 lines of nontrivial code.  Where did
this come from?  Did you base it on existing code in our source tree,
making modifications to fit the new internal API, or did you write it
from scratch?  In either case, how onerous would this be for other
targets?

I'm not an expert at target hooks (or at the i386 backend), so if we do
go with this approach I'd want someone else to review those parts of
the patch.

Have you verified that GCC builds with this patch with jit *not*
enabled in the enabled languages?

[...snip...]

A nitpick:

> +.. function:: const char * \
> +  gcc_jit_target_info_arch (gcc_jit_target_info *info)
> +
> +   Get the architecture of the currently running CPU.

What does this string look like?
How long does the pointer remain valid?

Thanks again; hope the above makes sense
Dave



RFC (V3) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Qing Zhao
Hi,

I added the BPF-related issue and its solution to Appendix 4 (Known issues).
No changes to other parts.

Sending this V3 for the record.

Qing


Represent the missing dependence for the "counted_by" attribute and its consumers

Qing Zhao

11/09/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635397.html

1. The problem

There is a data dependency between the size assignment and the implicit use of the size information in __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the example below). This missing information can result in incorrect code reordering and other incorrect code transformations.

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
See a more complicated example in Appendix 1.

We need to represent this data dependency correctly in the IL.

2. The solution:

2.1 Summary

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the "counted_by" attribute with a call to the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information (for example, BDOS or the array bound sanitizer), query the size information or ACCESS_MODE information from the new internal function;
* When the size and ACCESS_MODE information are no longer used, possibly at the 2nd object size phase, replace the internal function with the actual reference to the FAM field;
* Make some adjustments to the inlining heuristics, IPA alias analysis, and other SSA passes to mitigate the impact on the optimizer and code generation.

2.2 The new internal function 

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns REF_TO_OBJ, i.e. the same value as its 1st argument.

1st argument "REF_TO_OBJ": the reference to the object;
2nd argument "REF_TO_SIZE": the reference to the size of the object;
3rd argument "CLASS_OF_SIZE": what the size referenced by REF_TO_SIZE represents:
   0: unknown;
   1: the number of elements of the object type;
   2: the number of bytes;
4th argument "SIZE_OF_SIZE": the size in bytes of the object that REF_TO_SIZE points to;
5th argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
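The plain-C sketch below makes the size arguments concrete. It shows roughly how a consumer such as BDOS could combine REF_TO_SIZE, CLASS_OF_SIZE and SIZE_OF_SIZE into a byte count. `fam_size_in_bytes` is hypothetical and models only the arithmetic. The FAM's element size is an extra input here because the argument list does not carry it explicitly; presumably it is derived from the type of REF_TO_OBJ. For "counted_by", CLASS_OF_SIZE is 1, so the byte size is the element count times the element size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: combine the REF_TO_SIZE,
   CLASS_OF_SIZE and SIZE_OF_SIZE arguments with the element size of the
   FAM to get the object size in bytes.  Returns (size_t) -1 when the size
   is unknown.  Assumes the size field is unsigned and ignores overflow
   in the multiplication.  */
static size_t
fam_size_in_bytes (const void *ref_to_size, int class_of_size,
                   size_t size_of_size, size_t elem_size)
{
  uint64_t value;

  /* Load the size field using the width given by SIZE_OF_SIZE.  */
  switch (size_of_size)
    {
    case 1: { uint8_t v; memcpy (&v, ref_to_size, 1); value = v; break; }
    case 2: { uint16_t v; memcpy (&v, ref_to_size, 2); value = v; break; }
    case 4: { uint32_t v; memcpy (&v, ref_to_size, 4); value = v; break; }
    case 8: { uint64_t v; memcpy (&v, ref_to_size, 8); value = v; break; }
    default: return (size_t) -1;
    }

  switch (class_of_size)
    {
    case 1:  /* Number of elements of the object type.  */
      return (size_t) value * elem_size;
    case 2:  /* Number of bytes.  */
      return (size_t) value;
    default: /* 0: unknown.  */
      return (size_t) -1;
    }
}
```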

NOTES:
  A. This new internal function is intended for more general use by all 3 attributes, "access", "alloc_size", and the new "counted_by", to attach the "size" and "access_mode" information to the corresponding pointer (in order to resolve PR96503, etc.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
  B. For "counted_by", the 3rd argument will be 1;
  C. For the "counted_by" and "alloc_size" attributes, the 5th argument will be -1;
  D. In this writeup, we focus on the implementation details for the "counted_by" attribute. However, this function should be usable by "access" and "alloc_size" without issue.

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field must be initialized before the first reference to the FAM field; otherwise, the behavior is undefined.
This additional requirement on the user guarantees that the first reference to the FAM knows the size of the FAM.

Another thing that needs to be clarified:
a later reference to the FAM field will use the latest value assigned to the size field before that reference. For example,
 obj->size = val1;
 ref1 (obj->buf);
 obj->size = val2;
 ref2 (obj->buf);
in the above, "ref1" will use val1 and "ref2" will use val2.
This clarification informs users that the dynamic array feature is fully supported.

We need to add the above additional requirement and clarification to the user 
documentation.
The complete user documentation is in Appendix 2. 

2.4 Replace the reference to a FAM field with the new function .ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
  if YES, replace 



Re: [PATCH] c++/modules: handle templates in exported using-declarations [PR106849]

2023-11-09 Thread Nathan Sidwell

On 11/9/23 16:06, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

A TEMPLATE_DECL does not have module attachment flags associated with
it, so this patch extracts the result from the template to read the
flags from there instead.



oh yeah.  my original plan had it duplicated, but that didn't work out well. 
You can use

   tree decl = STRIP_TEMPLATE (new_fn);
btw.  ok with that change


As a drive-by fix we also group the error with its informative note.

PR c++/106849

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Handle
TEMPLATE_DECLs when checking module attachment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-9.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc  | 14 ++
  gcc/testsuite/g++.dg/modules/using-9.C | 13 +
  2 files changed, 23 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-9.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index a8b9229b29e..512dc1be87f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4846,12 +4846,18 @@ do_nonmember_using_decl (name_lookup , bool fn_scope_p,
  bool exporting = revealing_p && module_exporting_p ();
  if (exporting)
{
+ /* Module flags for templates are on the template_result.  */
+ tree decl = new_fn;
+ if (TREE_CODE (decl) == TEMPLATE_DECL)
+   decl = DECL_TEMPLATE_RESULT (decl);
+
  /* If the using decl is exported, the things it refers
-to must also be exported (or not habve module attachment).  */
- if (!DECL_MODULE_EXPORT_P (new_fn)
- && (DECL_LANG_SPECIFIC (new_fn)
- && DECL_MODULE_ATTACH_P (new_fn)))
+to must also be exported (or not have module attachment).  */
+ if (!DECL_MODULE_EXPORT_P (decl)
+ && (DECL_LANG_SPECIFIC (decl)
+ && DECL_MODULE_ATTACH_P (decl)))
{
+ auto_diagnostic_group d;
  error ("%q#D does not have external linkage", new_fn);
  inform (DECL_SOURCE_LOCATION (new_fn),
  "%q#D declared here", new_fn);
diff --git a/gcc/testsuite/g++.dg/modules/using-9.C b/gcc/testsuite/g++.dg/modules/using-9.C
new file mode 100644
index 000..4290280d897
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-9.C
@@ -0,0 +1,13 @@
+// PR c++/106849
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi !lib }
+
+export module lib;
+
+namespace outer {
+  template void any_of(T) { }  // { dg-note "declared here" }
+}
+
+export using outer::any_of;  // { dg-error "does not have external linkage" }
+
+// { dg-prune-output "not writing module" }


--
Nathan Sidwell



Re: [PATCH] c++/modules: fix virtual destructors [PR103499]

2023-11-09 Thread Nathan Sidwell

On 11/9/23 04:55, Nathaniel Shead wrote:

I'm not sure if this is just papering over a general issue of clones not being
exported/imported, or if this is just an exception to the general case of
clones being able to be freely regenerated with no other issues.

Alternatively, would it be better to override the DECL_VINDEX of the original
declaration after filling it in for the clones as well? I wasn't able to see
anything depending on the current behaviour (though I didn't look very hard).


I think your patch is a fine approach. IIRC just streaming out the clones 
directly ran into a bunch of issues, hence the current implementation.



ok

nathan



Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Currently, cloned functions are not included in the CMI.  However, for
virtual destructors the clones must have a different DECL_VINDEX from
their base declaration: the former have an INTEGER_CST indicating the
index into the vtable, while the latter indicates the FUNCTION_DECL that
it is overriding.

As such, this patch also streams DECL_VINDEX for virtual cloned
functions, so that an importer sees the correct vtable entry for each clone.
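The key constraint on such a change is that the writer and reader test the same condition. Otherwise every later read in the stream is misaligned. The toy model below is a sketch of that invariant only, not GCC's trees_out/trees_in. `FnDecl`, `write_ref`, `read_ref`, and the id-to-clone map are hypothetical. A vindex is modelled as a plain `long` (a clone's vtable slot), whereas GCC stores a tree.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy model of symmetric streaming; not GCC's trees_out or
// trees_in.  Clones are not streamed themselves: the writer emits an id,
// and the reader looks up the clone it regenerated locally.
struct FnDecl {
  bool is_virtual = false;
  long vindex = -1;   // vtable slot; -1 models "not filled in"
};

// Writer side: emit the reference, plus the vtable slot for virtual fns.
void write_ref (std::vector<long> &out, int id, const FnDecl &d)
{
  out.push_back (id);
  if (d.is_virtual)
    out.push_back (d.vindex);
}

// Reader side: the extra field is read under the same condition the
// writer used, checked on the decl the reader resolved.
FnDecl &read_ref (const std::vector<long> &in, std::size_t &pos,
                  std::map<int, FnDecl> &local)
{
  FnDecl &res = local.at (static_cast<int> (in.at (pos++)));
  if (res.is_virtual)
    res.vindex = in.at (pos++);
  return res;
}
```

As in the patch, the reader-side check is made on the resolved declaration (`res`) rather than on a streamed flag. The regenerated clone therefore has to agree with the exporter about being virtual.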

PR c++/103499

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_node): Write DECL_VINDEX for
virtual clones.
(trees_in::tree_node): Read DECL_VINDEX for virtual clones.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr103499_a.C: New test.
* g++.dg/modules/pr103499_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc  |  6 ++
  gcc/testsuite/g++.dg/modules/pr103499_a.C | 12 
  gcc/testsuite/g++.dg/modules/pr103499_b.C |  8 
  3 files changed, 26 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr103499_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c1c8c226bc1..416a7c414cc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8648,6 +8648,8 @@ trees_out::decl_node (tree decl, walk_kind ref)
  
tree_node (target);

tree_node (DECL_NAME (decl));
+  if (TREE_CODE (decl) == FUNCTION_DECL && DECL_VIRTUAL_P (decl))
+   tree_node (DECL_VINDEX (decl));
int tag = insert (decl);
if (streaming_p ())
dump (dumper::TREE)
@@ -9869,6 +9871,10 @@ trees_in::tree_node (bool is_use)
}
  }
  
+	/* A clone might have a different vtable entry.  */
+   if (res && TREE_CODE (res) == FUNCTION_DECL && DECL_VIRTUAL_P (res))
+ DECL_VINDEX (res) = tree_node ();
+
if (!res)
  set_overrun ();
int tag = insert (res);
diff --git a/gcc/testsuite/g++.dg/modules/pr103499_a.C b/gcc/testsuite/g++.dg/modules/pr103499_a.C
new file mode 100644
index 000..0497c2c5504
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr103499_a.C
@@ -0,0 +1,12 @@
+// PR c++/103499
+// { dg-module-do compile }
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi pr103499 }
+
+export module pr103499;
+
+export struct base {
+  virtual ~base() = default;
+};
+
+export struct derived : base {};
diff --git a/gcc/testsuite/g++.dg/modules/pr103499_b.C b/gcc/testsuite/g++.dg/modules/pr103499_b.C
new file mode 100644
index 000..b7468562ba9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr103499_b.C
@@ -0,0 +1,8 @@
+// PR c++/103499
+// { dg-additional-options "-fmodules-ts" }
+
+import pr103499;
+
+void test(derived* p) {
+  delete p;
+}


--
Nathan Sidwell



[pushed] diagnostics: cleanups to diagnostic-show-locus.cc

2023-11-09 Thread David Malcolm
Reduce implicit usage of line_table global, and move source printing to
within diagnostic_context.

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5300-g8625aa24669669.

gcc/ChangeLog:
* diagnostic-show-locus.cc (layout::m_line_table): New field.
(compatible_locations_p): Convert to...
(layout::compatible_locations_p): ...this, replacing uses of
line_table global with m_line_table.
(layout::layout): Convert "richloc" param from a pointer to a
const reference.  Initialize m_line_table member.
(layout::maybe_add_location_range):  Replace uses of line_table
global with m_line_table.  Pass the latter to
linemap_client_expand_location_to_spelling_point.
(layout::print_leading_fixits): Pass m_line_table to
affects_line_p.
(layout::print_trailing_fixits): Likewise.
(gcc_rich_location::add_location_if_nearby): Update for change
to layout ctor params.
(diagnostic_show_locus): Convert to...
(diagnostic_context::maybe_show_locus): ...this, converting
richloc param from a pointer to a const reference.  Make "loc"
const.  Split out printing part of function to...
(diagnostic_context::show_locus): ...this.
(selftest::test_offset_impl): Update for change to layout ctor
params.
(selftest::test_layout_x_offset_display_utf8): Likewise.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_tab_expansion): Likewise.
* diagnostic.h (diagnostic_context::maybe_show_locus): New decl.
(diagnostic_context::show_locus): New decl.
(diagnostic_show_locus): Convert from a decl to an inline function.
* gdbinit.in (break-on-diagnostic): Update from a breakpoint
on diagnostic_show_locus to one on
diagnostic_context::maybe_show_locus.
* genmatch.cc (linemap_client_expand_location_to_spelling_point):
Add "set" param and use it in place of line_table global.
* input.cc (expand_location_1): Likewise.
(expand_location): Update for new param of expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "set"
param and use it in place of line_table global.
* tree-diagnostic-path.cc (event_range::print): Pass line_table
for new param of linemap_client_expand_location_to_spelling_point.

libcpp/ChangeLog:
* include/line-map.h (rich_location::get_expanded_location): Make
const.
(rich_location::get_line_table): New accessor.
(rich_location::m_line_table): Make the pointer be const.
(rich_location::m_have_expanded_location): Make mutable.
(rich_location::m_expanded_location): Likewise.
(fixit_hint::affects_line_p): Add const line_maps * param.
(linemap_client_expand_location_to_spelling_point): Likewise.
* line-map.cc (rich_location::get_expanded_location): Make const.
Pass m_line_table to
linemap_client_expand_location_to_spelling_point.
(rich_location::maybe_add_fixit): Likewise.
(fixit_hint::affects_line_p): Add set param and pass to
linemap_client_expand_location_to_spelling_point.
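The libcpp part of the change pairs a const accessor with mutable cache fields and an explicit, const line-table pointer. The following sketch illustrates that general C++ idiom. `LineTable` and `RichLocation` are hypothetical simplified stand-ins, not libcpp's `line_maps` or `rich_location`. A location is modelled as an index into a vector of line numbers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a line table: location -> line number.
struct LineTable {
  std::vector<int> lines;
};

class RichLocation {
public:
  RichLocation (const LineTable *table, int loc)
    : m_line_table (table), m_loc (loc) {}

  // Logically const: expanding the location does not change the object's
  // observable state, so the cache fields below are mutable.
  int get_expanded_line () const
  {
    if (!m_have_expanded)
      {
        m_expanded = m_line_table->lines.at (m_loc);
        m_have_expanded = true;
      }
    return m_expanded;
  }

  const LineTable *get_line_table () const { return m_line_table; }

private:
  const LineTable *m_line_table;   // passed in explicitly, not a global
  int m_loc;
  mutable bool m_have_expanded = false;
  mutable int m_expanded = 0;
};
```

Because the table travels with the object, callers such as a layout class can read it through an accessor instead of reaching for a global.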
---
 gcc/diagnostic-show-locus.cc | 122 +++
 gcc/diagnostic.h |  21 --
 gcc/gdbinit.in   |   2 +-
 gcc/genmatch.cc  |   7 +-
 gcc/input.cc |  23 ---
 gcc/tree-diagnostic-path.cc  |   2 +-
 libcpp/include/line-map.h|  17 +++--
 libcpp/line-map.cc   |  22 ---
 8 files changed, 129 insertions(+), 87 deletions(-)

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 43523572fe5b..5edc319c3130 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -366,7 +366,7 @@ class layout
 {
  public:
   layout (const diagnostic_context ,
- rich_location *richloc,
+ const rich_location ,
  diagnostic_t diagnostic_kind,
  pretty_printer *pp);
 
@@ -428,7 +428,10 @@ class layout
   move_to_column (int *column, int dest_column, bool add_left_margin);
 
  private:
+  bool compatible_locations_p (location_t loc_a, location_t loc_b) const;
+
   const diagnostic_source_printing_options _options;
+  const line_maps *m_line_table;
   pretty_printer *m_pp;
   char_display_policy m_policy;
   location_t m_primary_loc;
@@ -930,13 +933,13 @@ test_get_line_bytes_without_trailing_whitespace ()
erroneously was leading to hundreds of lines of irrelevant source
being printed.  */
 
-static bool
-compatible_locations_p (location_t loc_a, location_t loc_b)
+bool
+layout::compatible_locations_p (location_t loc_a, location_t loc_b) const
 {
   if (IS_ADHOC_LOC (loc_a))
-loc_a = get_location_from_adhoc_loc (line_table, loc_a);
+loc_a = 

[PATCH] libgccjit: Add ability to get CPU features

2023-11-09 Thread Antoni Boucher
Hi.
This patch adds support for getting the CPU features in libgccjit (bug
112466)

There's a TODO in the test:
I'm not sure how to test that gcc_jit_target_info_arch returns the
correct value since it is dependent on the CPU.
Any idea on how to improve this?

Also, I created a CStringHash to be able to have a
std::unordered_set. Is there any built-in way of doing
this?
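A set keyed on C strings needs both a contents-based hash and a contents-based equality predicate. Without the latter, equal strings at different addresses count as different keys. The sketch below assumes C++11 only. `CStringEqual` and `CStringSet` are hypothetical names, and `CStringHash` here is an illustration rather than the patch's definition. With C++17, `std::hash<std::string_view>` would avoid the temporary `std::string` copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_set>

// Hash by contents, not by pointer value.  Building a std::string copies
// the characters; this keeps the sketch within C++11.
struct CStringHash {
  std::size_t operator() (const char *s) const
  {
    return std::hash<std::string> () (s);
  }
};

// Equality must also compare contents to match the hash.
struct CStringEqual {
  bool operator() (const char *a, const char *b) const
  {
    return std::strcmp (a, b) == 0;
  }
};

// The set stores pointers only, so the strings must outlive it.
using CStringSet = std::unordered_set<const char *, CStringHash, CStringEqual>;
```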

Thanks for the review.
From 302f9f0bb22deae3deb8249a9127447c3ec4f7c7 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Mon, 26 Jun 2023 18:29:15 -0400
Subject: [PATCH] libgccjit: Add ability to get CPU features

gcc/ChangeLog:
	PR jit/112466
	* Makefile.in (tm_jit_file_list, tm_jit_include_list, TM_JIT_H,
	JIT_TARGET_DEF, JIT_TARGET_H, JIT_TARGET_OBJS): New variables.
	(tm_jit.h, cs-tm_jit.h, jit/jit-target-hooks-def.h,
	s-jit-target-hooks-def-h): New rules.
	(s-tm-texi): Also check timestamp on jit-target.def.
	(generated_files): Add TM_JIT_H and jit/jit-target-hooks-def.h.
	(build/genhooks.o): Also depend on JIT_TARGET_DEF.
	* config.gcc (tm_jit_file, jit_target_objs, target_has_targetjitm):
	New variables.
	* config/i386/t-i386 (i386-jit.o): New rule.
	* config/t-linux (linux-jit.o): New rule.
	* configure: Regenerate.
	* configure.ac (tm_jit_file_list, tm_jit_include_list,
	jit_target_objs): Add substitutes.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (targetjitm): Document.
	(target_has_targetjitm): Document.
	* genhooks.cc: Include jit/jit-target.def.
	* config/default-jit.cc: New file.
	* config/i386/i386-jit.cc: New file.
	* config/i386/i386-jit.h: New file.
	* config/linux-jit.cc: New file.

gcc/jit/ChangeLog:
	PR jit/112466
	* Make-lang.in (JIT_OBJS): New variable.
	* jit-playback.cc (replay): Include jit-target.h and initialize
	target.
	* jit-playback.h (class get_target_info): New class.
	* jit-recording.cc (recording::context::get_target_info): New
	method.
	* jit-recording.h (recording::context::get_target_info): New
	method.
	* libgccjit.cc: Include jit-target.h.
	(struct gcc_jit_target_info): New struct.
	(gcc_jit_context_get_target_info, gcc_jit_target_info_release,
	gcc_jit_target_info_cpu_supports, gcc_jit_target_info_arch,
	gcc_jit_target_info_supports_128bit_int): New functions.
	* libgccjit.h (gcc_jit_context_get_target_info,
	gcc_jit_target_info_release, gcc_jit_target_info_cpu_supports,
	gcc_jit_target_info_arch, gcc_jit_target_info_supports_128bit_int):
	New functions.
	* libgccjit.map (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/compilation.rst: Add documentation for the
	functions gcc_jit_context_get_target_info, gcc_jit_target_info_release,
	gcc_jit_target_info_cpu_supports, gcc_jit_target_info_arch,
	gcc_jit_target_info_supports_128bit_int.
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* jit-target-def.h: New file.
	* jit-target.cc: New file.
	* jit-target.def: New file.
	* jit-target.h: New file.

gcc/testsuite/ChangeLog:
	PR jit/112466
	* jit.dg/all-non-failing-tests.h: Mention
	test-target-info.c.
	* jit.dg/test-target-info.c: New test.
---
 gcc/Makefile.in  |  29 ++-
 gcc/config.gcc   |  21 ++
 gcc/config/default-jit.cc|  29 +++
 gcc/config/i386/i386-jit.cc  | 195 +++
 gcc/config/i386/i386-jit.h   |  22 +++
 gcc/config/i386/t-i386   |   4 +
 gcc/config/linux-jit.cc  |  36 
 gcc/config/t-linux   |   4 +
 gcc/configure|  14 ++
 gcc/configure.ac |  14 ++
 gcc/doc/tm.texi  |  26 +++
 gcc/doc/tm.texi.in   |  16 ++
 gcc/genhooks.cc  |   1 +
 gcc/jit/Make-lang.in |   8 +-
 gcc/jit/docs/topics/compatibility.rst|  14 ++
 gcc/jit/docs/topics/compilation.rst  |  51 +
 gcc/jit/jit-playback.cc  |   2 +
 gcc/jit/jit-playback.h   |  17 +-
 gcc/jit/jit-recording.cc |  19 ++
 gcc/jit/jit-recording.h  |   3 +
 gcc/jit/jit-target-def.h |  20 ++
 gcc/jit/jit-target.cc|  89 +
 gcc/jit/jit-target.def   |  52 +
 gcc/jit/jit-target.h |  73 +++
 gcc/jit/libgccjit.cc |  43 
 gcc/jit/libgccjit.h  |  60 ++
 gcc/jit/libgccjit.map|   9 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |   3 +
 gcc/testsuite/jit.dg/test-target-info.c  |  63 ++
 29 files changed, 931 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/default-jit.cc
 create mode 100644 gcc/config/i386/i386-jit.cc
 create mode 100644 gcc/config/i386/i386-jit.h
 create mode 100644 gcc/config/linux-jit.cc
 create mode 100644 gcc/jit/jit-target-def.h
 create mode 100644 gcc/jit/jit-target.cc
 create 

RISC-V GCC Patchwork Sync on Nov 14th and 21st

2023-11-09 Thread Palmer Dabbelt
I'm going to be traveling for the next two weeks (Plumbers and then 
Thanksgiving), so I won't be at the patchwork syncs.


Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-09 Thread Maxim Blinov
Yes, those tests that triggered the ICE now pass.

Maxim


On Thu, 9 Nov 2023 at 16:26, Jeff Law  wrote:

>
>
> On 11/6/23 06:01, Maxim Blinov wrote:
> > From: Maxim Blinov 
> >
> > This patch is based on and intended for the
> vendors/riscv/gcc-13-with-riscv-opts branch - please apply if looks OK.
> >
> > Fixes the following ICEs that I'm seeing:
> >
> > FAIL: gcc.dg/vect/O3-pr49087.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-1.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-2.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-3.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/no-scevccp-pr86725-4.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/slp-50.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/slp-50.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-cond-13.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-cond-13.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> > FAIL: gcc.target/riscv/rvv/autovec/partial/live-2.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
> >
> > -- >8 --
> >
> > When we create a VEC_EXTRACT gimple stmt:
> >
> >/* SCALAR_RES = VEC_EXTRACT .  */
> >tree scalar_res
> >  = gimple_build (, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> >  vec_lhs_phi, last_index);
> >
> > Under the hood we are really just creating a GIMPLE_CALL stmt. Later
> > on, when we `gsi_insert_seq_before` our stmts:
> >
> >if (stmts)
> >  {
> >gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
> >gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
> >
> > We eventually run into tree-ssa-operands.cc:1147:
> >
> >operands_scanner (fn, stmt).build_ssa_operands ();
> >
> > Since VEC_EXTRACT is *not* marked with ECF_NOVOPS, ECF_CONST, or
> > ECF_PURE flags in internal-fn.def, when
> > `operands_scanner::parse_ssa_operands` comes across our
> > VEC_EXTRACT-type GIMPLE_CALL, it generates a `gimple_vop()` artificial
> > variable.
> >
> > `operands_scanner::finalize_ssa_defs` then picks this up, so our final
> > stmt goes from
> >
> > _73 = .VEC_EXTRACT (vect_last_9.56_71, _72);
> >
> > to
> >
> > # .MEM = VDEF <>
> > _73 = .VEC_EXTRACT (vect_last_9.56_71, _72);
> >
> > But more importantly it marks us as `ssa_renaming_needed`, in
> > tree-ssa-operands.cc:420:
> >
> >/* If we have a non-SSA_NAME VDEF, mark it for renaming.  */
> >if (gimple_vdef (stmt)
> >&& TREE_CODE (gimple_vdef (stmt)) != SSA_NAME)
> >  {
> >fn->gimple_df->rename_vops = 1;
> >fn->gimple_df->ssa_renaming_needed = 1;
> >  }
> >
> > This then proceeds to crash the compiler when we are about to leave
> > `vect_transform_loops`:
> >
> >if (need_ssa_update_p (cfun))
> >  {
> >gcc_assert (loop_vinfo->any_known_not_updated_vssa);
> >fun->gimple_df->ssa_renaming_needed = false;
> >todo |= TODO_update_ssa_only_virtuals;
> >  }
> >
> > Since,
> >
> > - `need_ssa_update_p (cfun)` is true (it was set when we generated a
> >memory vdef)
> > - `loop_vinfo->any_known_not_updated_vssa` is false
> >
> > As the code currently stands, creating a gimple stmt containing a
> > VEC_EXTRACT should always generate a memory vdef, therefore we should
> > remember to mark `loop_vinfo->any_known_not_updated_vssa` afterwards.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vectorizable_live_operation): Remember to
> >   assert loop_vinfo->any_known_not_updated_vssa if we are inserting
> >   a call to VEC_EXTRACT.
> Just to avoid any doubt -- with the internal-fn.def patch I cherry
> picked earlier this week to the branch, this is no longer needed, right?
>
> jeff
>
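The chain of events in the quoted patch reduces to two predicates, modelled below as a toy C++ sketch. The flag names, `LoopState`, `insert_call`, and `transform_ok` are hypothetical simplifications, not GCC's ECF_* values or vectorizer structures. A call gets a virtual definition unless it is const, pure, or novops. The check at the end of vect_transform_loops fails whenever renaming is needed but the vectorizer did not record it.

```cpp
#include <cassert>

// Hypothetical flag bits modelling ECF_CONST, ECF_PURE and ECF_NOVOPS.
enum CallFlags : unsigned {
  FLAG_CONST  = 1u << 0,
  FLAG_PURE   = 1u << 1,
  FLAG_NOVOPS = 1u << 2
};

// A call that is not const, pure or novops gets a virtual definition.
bool call_gets_vdef (unsigned flags)
{
  return (flags & (FLAG_CONST | FLAG_PURE | FLAG_NOVOPS)) == 0;
}

struct LoopState {
  bool ssa_renaming_needed = false;         // set by the operand scanner
  bool any_known_not_updated_vssa = false;  // vectorizer's own bookkeeping
};

// Insert a call; RECORD selects whether the vectorizer notes the new vdef,
// which is what the patch adds.
void insert_call (LoopState &st, unsigned flags, bool record)
{
  if (call_gets_vdef (flags))
    {
      st.ssa_renaming_needed = true;
      if (record)
        st.any_known_not_updated_vssa = true;
    }
}

// Mirrors the gcc_assert in vect_transform_loops: false where GCC would ICE.
bool transform_ok (const LoopState &st)
{
  return !st.ssa_renaming_needed || st.any_known_not_updated_vssa;
}
```

In this model, a call flagged const never triggers renaming, so the bookkeeping is not needed. That corresponds to fixing the flags in internal-fn.def instead of recording the vdef.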


Re: [PATCH v4 1/2] c++: Initial support for P0847R7 (Deducing this) [PR102609]

2023-11-09 Thread Jason Merrill

On 11/5/23 10:06, waffl3x wrote:

Bootstrapped and tested on x86_64-linux with no regressions.

I originally threw this e-mail together last night, but threw in the
towel when I thought I saw tests failing and went to sleep. I did a
proper bootstrap and comparison and whatnot and found that there were
thankfully no regressions.

Anyhow, the first patch feels ready for trunk, the second needs at
least one review, I'll write more on that in the second e-mail though.
I put quite a lot into the commit message, in hindsight I think I may
have gone overboard, but that isn't something I'm going to rewrite at
the moment. I really want to get these patches up for review so they
can be finalized.

I'm also including my usual musings on things that came up as I was
polishing off the patches. I reckon some of them aren't all that
important right now but I would rather toss them in here than forget
about them.

I'm starting to think that we should have a general macro that
indicates whether an implicit object argument should be passed in the
call. It might be more clear than what is currently present. I've also
noticed that there are a fair number of places where, instead of using
DECL_NONSTATIC_MEMBER_FUNCTION_P, the code checks whether the TREE_CODE of the
type is a METHOD_TYPE, which is exactly what the aforementioned macro
does.


Agreed.


In build_min_non_dep_op_overload I reversed the branches of a condition
because it made more sense with METHOD_TYPE first so it doesn't have to
take xobj member functions into account on both branches. I am slightly
concerned that flipping the branch around might have consequences,
hence why I am mentioning it. Realistically I think it's probably fine
though.


Agreed.


BTW let me know if there's anything you would prefer to be done
differently in the changelog, I am still having trouble writing them
and I'm usually uncertain if I'm writing them properly.



(DECL_FUNCTION_XOBJ_FLAG): Define.


This is usually "New macro" or just "New".


* decl.cc (grokfndecl): New param XOBJ_FUNC_P, for xobj member
functions set DECL_FUNCTION_XOBJ_FLAG and don't set
DECL_STATIC_FUNCTION_P.
(grokdeclarator): Check for xobj param, clear its purpose and set
is_xobj_member_function if it is present.  When flag set, don't change
type to METHOD_TYPE, keep it as FUNCTION_TYPE.
Adjust call to grokfndecl, pass is_xobj_member_function.


These could be less verbose; for grokfndecl it makes sense to mention 
the new parameter, but otherwise just saying "handle explicit object 
member functions" is enough.



It needs to be noted that we cannot add checking for xobj member functions to
DECL_NONSTATIC_MEMBER_FUNCTION_P as it is used in cp-objcp-common.cc.  While it
most likely would be fine, it's possible it could have unintended effects.  In
light of this, we will most likely need to do some refactoring, possibly
renaming and replacing it.  In contrast, DECL_FUNCTION_MEMBER_P is not used
outside of C++ code, so we can add checking for xobj member functions to it
without any concerns.


I think DECL_NONSTATIC_MEMBER_FUNCTION_P should probably be renamed to 
DECL_IOBJ_MEMBER_FUNC_P to parallel the new macro...



@@ -3660,6 +3660,7 @@ build_min_non_dep_op_overload (enum tree_code op,
 
   expected_nargs = cp_tree_code_length (op);

   if (TREE_CODE (TREE_TYPE (overload)) == METHOD_TYPE
+  || DECL_XOBJ_MEMBER_FUNC_P (overload)


...and then the combination should have its own macro, perhaps 
DECL_OBJECT_MEMBER_FUNC_P, spelling out OBJECT to avoid visual confusion 
with either IOBJ/XOBJ.


Renaming the old macro doesn't need to happen in this patch, but adding 
the new macro should.



There are a few known issues still present in this patch.  Most importantly,
the implicit object argument fails to convert when passed to by-value xobj
parameters.  This occurs both for xobj parameters that match the argument type
and xobj parameters that are unrelated to the object type, but have valid
conversions available.  This behavior can be observed in the
explicit-obj-by-value[1-3].C tests.  The implicit object argument appears to be
simply reinterpreted instead of any conversion applied.  This is elaborated on
in the test cases.


Yes, that's because of:


@@ -9949,7 +9951,8 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
}
 }
   /* Bypass access control for 'this' parameter.  */
-  else if (TREE_CODE (TREE_TYPE (fn)) == METHOD_TYPE)
+  else if (TREE_CODE (TREE_TYPE (fn)) == METHOD_TYPE
+  || DECL_XOBJ_MEMBER_FUNC_P (fn))


We don't want to take this path for xob fns.  Instead I think we need to 
change the existing:



  gcc_assert (first_arg == NULL_TREE);


to assert that if first_arg is non-null, we're dealing with an xob fn, 
and then go ahead and do the same conversion as the loop body on first_arg.



Despite this, calls where there is no valid conversion
available are correctly rejected, which I 

Re: [PATCH] testsuite: tsan: add fallback overload for pthread_cond_clockwait

2023-11-09 Thread Mike Stump
On Nov 8, 2023, at 5:49 PM, Alexandre Oliva  wrote:
> 
> LTS GNU/Linux distros from 2018, still in use, don't have
> pthread_cond_clockwait.  There's no trivial way to detect it so as to
> make the test conditional, but there's an easy enough way to silence
> the fail due to lack of the function in libc, and that has nothing to
> do with the false positive that this is testing against.
> 
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
> x86_64-, on distros that offer and that lack pthread_cond_clockwait.  Ok
> to install?

Ok.


Re: [PATCH] Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]

2023-11-09 Thread Joseph Myers
On Thu, 9 Nov 2023, Jakub Jelinek wrote:

> The main reason to add these is to support arbitrary unsigned (for
> clrsb/ffs signed) bit-precise integer types and also __int128 which
> wasn't supported by the existing builtins, so that e.g. 
> type-generic functions could then support not just bit-precise unsigned
> integer type whose width matches a standard or extended integer type,
> but others too.

Thanks for working on this.  My plan for the  implementation I'm 
working on for glibc is to start with implementations using existing 
built-in functions (and only handling types whose width matches standard 
types), then supporting _BitInt with these new built-in functions will be 
suitable for a followup.
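The "existing built-ins first" approach can be sketched in C++ as a template that dispatches on operand width to the fixed-type GCC builtins `__builtin_popcount`, `__builtin_popcountl` and `__builtin_popcountll`. `count_ones` is a hypothetical name, not a <stdbit.h> function. Like the plan above, it only covers types no wider than `unsigned long long`. Wider types such as `unsigned __int128` or `_BitInt` are what the new type-generic builtins are for.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: count set bits in any standard-width unsigned type
// by forwarding to the builtin whose operand type is wide enough.
template <typename T>
int count_ones (T x)
{
  static_assert (std::is_unsigned<T>::value, "unsigned types only");
  static_assert (sizeof (T) <= sizeof (unsigned long long),
                 "wider types need the new type-generic builtins");
  // Zero-extension to a wider unsigned type does not add set bits.
  if (sizeof (T) <= sizeof (unsigned int))
    return __builtin_popcount (x);
  else if (sizeof (T) <= sizeof (unsigned long))
    return __builtin_popcountl (x);
  else
    return __builtin_popcountll (x);
}
```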

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: testsuite: introduce hostedlib effective target

2023-11-09 Thread Mike Stump
On Nov 8, 2023, at 8:29 AM, Alexandre Oliva  wrote:
> 
> On Nov  5, 2023, Mike Stump  wrote:
> 
>> that, otherwise, I'll approve this version.
> 
> FWIW, this version is not usable as is.  Something went wrong in my
> testing, and several regressions, only visible in hosted mode, made it into
> the version I posted.  The fix adds some missing end-of-comment markers for
> the added dg directives, and moves the new dg directive to the end so as
> not to disturb line numbers.  I've got a fully fixed and properly tested
> version, but since it's about as big as the original patch, I'll only
> post it upon request.

Updates and fixes to the original plan are fine.

I'm still planning on letting you decide based upon input from everyone.  :-)


Re: [PATCH] testsuite: arg-pushing reqs -mno-accumulate-outgoing-args

2023-11-09 Thread Mike Stump
On Nov 8, 2023, at 7:55 AM, Alexandre Oliva  wrote:
> 
> gcc.target/i386/pr95126-m32-[34].c expect push instructions that are
> only present with -mno-accumulate-outgoing-args, so make that option
> explicit rather than dependent on tuning.
> 
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
> x86_64-.  Ok to install?

Ok.


[PATCH] diagnostics: Fix behavior of permerror options after diagnostic pop [PR111918]

2023-11-09 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111918

This patch fixes the behavior of `#pragma GCC diagnostic pop' for permissive
error diagnostics such as -Wnarrowing (in C++11). Those currently do not
return to the correct state after the last pop; they become effectively
simple warnings instead. Bootstrap + regtest all languages on x86-64, does
it look OK please? Thanks!

-Lewis

-- >8 --

When a diagnostic pragma changes the classification of a given diagnostic,
the global options flags (such as warn_narrowing, etc.) may get changed too.
Specifically, if a warning was not enabled initially and was later enabled
by a pragma, then the corresponding global flag will change from false to
true when the pragma is processed. That change is permanent and is not
undone by a subsequent `#pragma GCC diagnostic pop'; the warning flag needs
to remain enabled since a diagnostic could be generated later on for a
source location prior to the pop.

So in order to support popping to the initial classification, given that the
global options flags no longer reflect that state, the diagnostic_context
object itself remembers the way things were before it changed anything. The
current implementation works fine for diagnostics that are always errors or
always warnings, but it doesn't do the right thing for diagnostics that
could be either, such as -Wnarrowing. The classification of that diagnostic
(or any permerror diagnostic) depends on the state of -fpermissive; for the
particular case of -Wnarrowing it also matters whether a compile-time or
run-time narrowing is being diagnosed.

The problem is that the current implementation insists on recording whether
an enabled diagnostic should be a DK_WARNING or a DK_ERROR, and then, after
popping to the initial state, it overrides it always to that type only. Fix
that up by adding a new internal diagnostic type DK_ANY. This just indicates
that the diagnostic is enabled without mandating exactly what type of
diagnostic it should be. Then the diagnostic can be emitted with whatever
type the frontend asks for.
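The effect of DK_ANY can be shown with a toy model of a single option's push/pop history. This is a sketch under simplifying assumptions, not GCC's diagnostic_option_classifier. `Kind`, `OptionState`, and its methods are hypothetical. The model ignores source locations, which the real history chain tracks so that diagnostics for earlier locations still see the earlier state. The initial state records only "enabled", so after the final pop the front end's requested kind is used again.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kinds modelling DK_IGNORED, DK_WARNING, DK_ERROR, DK_ANY.
enum Kind { K_IGNORED, K_WARNING, K_ERROR, K_ANY };

class OptionState {
public:
  // Record only whether the option starts enabled, not a fixed kind.
  explicit OptionState (bool enabled_initially)
    : m_current (enabled_initially ? K_ANY : K_IGNORED) {}

  void push () { m_stack.push_back (m_current); }   // diagnostic push
  void classify (Kind k) { m_current = k; }          // warning/error/ignored
  void pop ()                                        // diagnostic pop
  {
    if (!m_stack.empty ())
      {
        m_current = m_stack.back ();
        m_stack.pop_back ();
      }
  }

  // Kind actually emitted when the front end asks for REQUESTED.
  Kind emit (Kind requested) const
  {
    return m_current == K_ANY ? requested : m_current;
  }

private:
  Kind m_current;
  std::vector<Kind> m_stack;
};
```

For a permerror such as -Wnarrowing, the front end may request an error or a warning depending on -fpermissive and on the kind of narrowing. After the last pop, the model again honours whichever it asks for instead of forcing the kind that was saved when the first pragma was seen.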

Incidentally, while making this change, I noticed that classify_diagnostic()
spends some time computing a return value (the old classification kind) that
is not used anywhere. The computed value seems to have some problems, mainly
that it does not take into account `#pragma GCC diagnostic pop' at all, and
so the returned value doesn't seem like it could make sense in many
contexts. Given it would also not be desirable to leak the new internal-only
DK_ANY type to outside callers, I think it would make sense in a subsequent
cleanup patch to remove the return value altogether.

gcc/ChangeLog:

PR c++/111918
* diagnostic-core.h (enum diagnostic_t): Add DK_ANY special flag.
* diagnostic.cc (diagnostic_option_classifier::classify_diagnostic):
Make use of DK_ANY to indicate a diagnostic was initially enabled.
(diagnostic_context::diagnostic_enabled): Do not change the type of
a diagnostic if the saved classification is type DK_ANY.

gcc/testsuite/ChangeLog:

PR c++/111918
* g++.dg/cpp0x/Wnarrowing21a.C: New test.
* g++.dg/cpp0x/Wnarrowing21b.C: New test.
* g++.dg/cpp0x/Wnarrowing21c.C: New test.
* g++.dg/cpp0x/Wnarrowing21d.C: New test.
---
 gcc/diagnostic-core.h  |  5 -
 gcc/diagnostic.cc  | 13 ++---
 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21a.C | 14 ++
 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21b.C |  9 +
 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21c.C |  9 +
 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21d.C |  9 +
 6 files changed, 55 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21a.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21b.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21c.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wnarrowing21d.C

diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index 04eba3d140e..4926c48da96 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -33,7 +33,10 @@ typedef enum
   DK_LAST_DIAGNOSTIC_KIND,
   /* This is used for tagging pragma pops in the diagnostic
  classification history chain.  */
-  DK_POP
+  DK_POP,
+  /* This is used internally to note that a diagnostic is enabled
+ without mandating any specific type.  */
+  DK_ANY,
 } diagnostic_t;
 
 /* RAII-style class for grouping related diagnostics.  */
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index addd6606eaa..99921a10b7b 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -1126,8 +1126,7 @@ classify_diagnostic (const diagnostic_context *context,
  old_kind = !context->m_option_enabled (option_index,
 context->m_lang_mask,
 context->m_option_state)
-   ? DK_IGNORED : 

[COMMITTED] MAINTAINERS: Add myself to write after approval

2023-11-09 Thread Jivan Hakobyan
MAINTAINERS: Add myself to write after approval

Signed-off-by: Jeff Law 

ChangeLog:

* MAINTAINERS: Add myself.

diff --git a/MAINTAINERS b/MAINTAINERS
index 30cb530a3b1..c43167d9a75 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -445,6 +445,7 @@ Wei Guozhi  <car...@google.com>
 Vineet Gupta   
 Naveen H.S 
 Mostafa Hagog  
+Jivan Hakobyan 
 Andrew Haley   
 Frederik Harwath   
 Stuart Hastings

-- 
With the best regards
Jivan Hakobyan


[PATCH] c++/modules: handle templates in exported using-declarations [PR106849]

2023-11-09 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

A TEMPLATE_DECL does not have module attachment flags associated with
it, so this patch extracts the result from the template to read the
flags from there instead.

As a drive-by fix we also group the error with its informative note.

PR c++/106849

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Handle
TEMPLATE_DECLs when checking module attachment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-9.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc  | 14 ++
 gcc/testsuite/g++.dg/modules/using-9.C | 13 +
 2 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-9.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index a8b9229b29e..512dc1be87f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4846,12 +4846,18 @@ do_nonmember_using_decl (name_lookup , bool fn_scope_p,
  bool exporting = revealing_p && module_exporting_p ();
  if (exporting)
{
+ /* Module flags for templates are on the template_result.  */
+ tree decl = new_fn;
+ if (TREE_CODE (decl) == TEMPLATE_DECL)
+   decl = DECL_TEMPLATE_RESULT (decl);
+
  /* If the using decl is exported, the things it refers
-to must also be exported (or not habve module attachment).  */
- if (!DECL_MODULE_EXPORT_P (new_fn)
- && (DECL_LANG_SPECIFIC (new_fn)
- && DECL_MODULE_ATTACH_P (new_fn)))
+to must also be exported (or not have module attachment).  */
+ if (!DECL_MODULE_EXPORT_P (decl)
+ && (DECL_LANG_SPECIFIC (decl)
+ && DECL_MODULE_ATTACH_P (decl)))
{
+ auto_diagnostic_group d;
  error ("%q#D does not have external linkage", new_fn);
  inform (DECL_SOURCE_LOCATION (new_fn),
  "%q#D declared here", new_fn);
diff --git a/gcc/testsuite/g++.dg/modules/using-9.C b/gcc/testsuite/g++.dg/modules/using-9.C
new file mode 100644
index 000..4290280d897
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-9.C
@@ -0,0 +1,13 @@
+// PR c++/106849
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi !lib }
+
+export module lib;
+
+namespace outer {
+  template void any_of(T) { }  // { dg-note "declared here" }
+}
+
+export using outer::any_of;  // { dg-error "does not have external linkage" }
+
+// { dg-prune-output "not writing module" }
-- 
2.42.0



Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread David Malcolm
On Thu, 2023-11-09 at 21:51 +0100, Guillaume Gomez wrote:
> I confirm it does. I realized it when finalizing our patch for
> attributes support.

Excellent; thanks for the fix.

Dave




Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread Guillaume Gomez
I confirm it does. I realized it when finalizing our patch for
attributes support.

On Thu, 9 Nov 2023 at 21:49, David Malcolm wrote:
>
> On Thu, 2023-11-09 at 21:03 +0100, Guillaume Gomez wrote:
> > Hi,
> >
> > This patch adds the `get_restrict` method declaration for
> > the C++ interface as it was forgotten.
> >
> > Thanks in advance for the review.
>
> Looking at my jit.sum results, it looks like the .cc files are indeed
> FAILing on initial compilation, with errors such as:
>
> In file included from gcc/testsuite/jit.dg/test-alignment.cc:4:
> gcc/testsuite/../jit/libgccjit++.h:1414:1: error: no declaration matches 'gccjit::type gccjit::type::get_restrict()'
> gcc/testsuite/../jit/libgccjit++.h:1414:1: note: no functions named 'gccjit::type gccjit::type::get_restrict()'
> gcc/testsuite/../jit/libgccjit++.h:350:9: note: 'class gccjit::type' defined here
>
> which presumably started with r14-3552-g29763b002459cb.
>
> Hence the patch looks good to me - thanks!
>
> Does this patch fix those test cases?
>
> Dave
>


Re: [PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread David Malcolm
On Thu, 2023-11-09 at 21:03 +0100, Guillaume Gomez wrote:
> Hi,
> 
> This patch adds the `get_restrict` method declaration for
> the C++ interface as it was forgotten.
> 
> Thanks in advance for the review.

Looking at my jit.sum results, it looks like the .cc files are indeed
FAILing on initial compilation, with errors such as:

In file included from gcc/testsuite/jit.dg/test-alignment.cc:4:
gcc/testsuite/../jit/libgccjit++.h:1414:1: error: no declaration matches 'gccjit::type gccjit::type::get_restrict()'
gcc/testsuite/../jit/libgccjit++.h:1414:1: note: no functions named 'gccjit::type gccjit::type::get_restrict()'
gcc/testsuite/../jit/libgccjit++.h:350:9: note: 'class gccjit::type' defined here

which presumably started with r14-3552-g29763b002459cb.

Hence the patch looks good to me - thanks!

Does this patch fix those test cases?

Dave



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-09 Thread Vladimir Makarov



On 11/7/23 22:47, Lehua Ding wrote:
>
> Lehua Ding (7):
>    ira: Refactor the handling of register conflicts to make it more
>      general
>    ira: Add live_subreg problem and apply to ira pass
>    ira: Support subreg live range track
>    ira: Support subreg copy
>    ira: Add all nregs >= 2 pseudos to tracke subreg list
>    lra: Apply live_subreg df_problem to lra pass
>    lra: Support subreg live range track and conflict detect

Thank you very much for addressing subreg RA.  It is a big piece of work.  I
wanted to address this a long time ago but had no time to do it myself.


I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC.  
I used -O3 for SPEC2017 compilation.


Here are the results:

              baseline   baseline+patches
specint2017:  8.51       8.58      (+0.8%)
specfp2017:   21.1       21.1      (+0%)
compile time: 2426.41s   2580.58s  (+6.4%)

Spec2017 average code size change: -0.07%
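As a quick check of the arithmetic, each percentage above is (patched / baseline - 1) × 100.  The small helper below (its name is illustrative) reproduces the figures: specint comes out at about +0.82% and compile time at about +6.35%, which rounds to the quoted 6.4%.

```cpp
#include <cassert>
#include <cmath>

// Relative change, in percent, going from a baseline measurement to a
// patched one.  For SPEC scores a positive value is an improvement; for
// compile time a positive value is a slowdown.
double percent_change(double baseline, double patched) {
    assert(baseline != 0.0);
    return (patched / baseline - 1.0) * 100.0;
}

// True if A and B differ by less than TOL.
bool approx_equal(double a, double b, double tol) {
    return std::fabs(a - b) < tol;
}
```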

To me, improving specint by 0.8% is impressive.

Unfortunately, it comes at the cost of a 6.4% compile-time slowdown
(although on a smaller benchmark I saw only a 3% slowdown).  I don't know
how, but we should mitigate this degradation.  Maybe we can find a hot
spot in the new code (I don't think it is the linear search pointed out by
Richard Biener, as the object vectors most probably contain only 1-2
elements) and improve it, or we could enable this only for -O3/-Ofast, or
make the code function- or target-dependent.


I also found that GCC consumes more memory with the patches.  Maybe that can
be improved too (although I am not sure about this).


I'll start reviewing the patches next week.  I don't expect to find
anything serious enough to reject them, but again, we should work on
mitigating the compilation speed problem.  We can file a new PR for this
and resolve it during the release cycle.





[PATCH] Add missing declaration of get_restrict in C++ interface

2023-11-09 Thread Guillaume Gomez
Hi,

This patch adds the missing `get_restrict` method declaration to
the C++ interface.

Thanks in advance for the review.
From e819fd01cd3e79bfab28a77f4ce78f34156e7a83 Mon Sep 17 00:00:00 2001
From: Guillaume Gomez 
Date: Thu, 9 Nov 2023 17:53:08 +0100
Subject: [PATCH] Add missing declaration of get_restrict in C++ interface

gcc/jit/ChangeLog:

	* libgccjit++.h (gccjit::type::get_restrict): Declare.
---
 gcc/jit/libgccjit++.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index 4a04db386e6..f9a0017cae5 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -360,6 +360,7 @@ namespace gccjit
 type get_volatile ();
 type get_aligned (size_t alignment_in_bytes);
 type get_vector (size_t num_units);
+type get_restrict ();
 
 // Shortcuts for getting values of numeric types:
 rvalue zero ();
-- 
2.34.1
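Why a missing in-class declaration produces the "no declaration matches" error from the review can be shown with a hypothetical miniature of the header layout (the `sketch::type` class and its qualifier bits are illustrative, not the real libgccjit++ API): the class body declares each method, and the inline definitions come later in the same header, so every out-of-class definition needs a matching declaration.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical miniature of the libgccjit++.h layout: the class body
// declares each method, and the definitions appear later in the header.
class type {
public:
    explicit type(unsigned quals = 0) : quals_(quals) {}

    type get_volatile();
    type get_restrict();  // without this line, the out-of-class definition
                          // below is rejected with "no declaration matches"

    unsigned quals() const { return quals_; }

private:
    unsigned quals_;  // illustrative qualifier bits: 1 = volatile, 2 = restrict
};

// Out-of-class inline definitions, as later in the real header.
inline type type::get_volatile() { return type(quals_ | 1u); }
inline type type::get_restrict() { return type(quals_ | 2u); }

} // namespace sketch
```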



Re: [PATCH] testsuite/vect: Make check more accurate.

2023-11-09 Thread Thomas Schwinge
Hi!

On 2023-11-07T09:22:16+0100, Robin Dapp  wrote:
> similar to before this modifies a check so we do only match a
> vectorization attempt if it succeeded.  On riscv we potentially try
> several modes of which some may fail.
>
> I tested on riscv, aarch64 and x86 but on the cfarm machines
> there is no vect_fold_extract_last.  Maybe gcn would work?

With GCN (tested '-march=gfx906'), I actually see a "regression":

PASS: gcc.dg/vect/vect-cond-reduc-4.c (test for excess errors)
PASS: gcc.dg/vect/vect-cond-reduc-4.c execution test
PASS: gcc.dg/vect/vect-cond-reduc-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
[-PASS:-]{+FAIL:+} gcc.dg/vect/vect-cond-reduc-4.c scan-tree-dump-times vect "optimizing condition reduction with [-FOLD_EXTRACT_LAST"-]{+FOLD_EXTRACT_LAST(?:(?!failed)(?!Re-trying).)*succeeded"+} 2

That's "regression" in quotes as indeed there is no "succeeded":

$ grep -C1 'optimizing condition reduction with FOLD_EXTRACT_LAST' -- vect-cond-reduc-4.c.176t.vect
[...]/source-gcc/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c:19:21: note:   vect_is_simple_use: vectype vector(64) int
[...]/source-gcc/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c:19:21: missed:   optimizing condition reduction with FOLD_EXTRACT_LAST.
vect_model_reduction_cost: inside_cost = 0, prologue_cost = 0, epilogue_cost = 0 .
--
[...]/source-gcc/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c:19:21: note:   vect_is_simple_use: vectype vector(64) int
[...]/source-gcc/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c:19:21: missed:   optimizing condition reduction with FOLD_EXTRACT_LAST.
vect_model_reduction_cost: inside_cost = 0, prologue_cost = 0, epilogue_cost = 0 .
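The new scan pattern relies on a tempered token, `(?:(?!failed)(?!Re-trying).)*`, which consumes characters only while neither "failed" nor "Re-trying" starts at the current position, so "succeeded" must be reached without passing either word. The sketch below approximates the check with `std::regex` (ECMAScript grammar). DejaGnu actually evaluates the pattern with Tcl's regexp engine, and ECMAScript's `.` does not match newlines, so this only illustrates single-line behaviour; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern from the updated dg-final line, evaluated here with std::regex
// (ECMAScript) as an approximation of the Tcl engine DejaGnu uses.
const std::regex kSucceededOnly(
    "optimizing condition reduction with "
    "FOLD_EXTRACT_LAST(?:(?!failed)(?!Re-trying).)*succeeded");

// True if DUMP_TEXT contains a FOLD_EXTRACT_LAST attempt that reaches
// "succeeded" without an intervening "failed" or "Re-trying".
bool counts_as_success(const std::string& dump_text) {
    return std::regex_search(dump_text, kSucceededOnly);
}
```

With the GCN dump shown above, only the "missed: ... FOLD_EXTRACT_LAST." lines appear and nothing later says "succeeded", so the stricter pattern finds no match.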


Regards
 Thomas


> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-4.c: Make check more accurate.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> index 8ea8c538713..c5aa989ec29 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> @@ -42,7 +42,7 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && vect_pack_trunc } } } } */
> +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST(?:(?!failed)(?!Re-trying).)*succeeded" 2 "vect" { target { vect_fold_extract_last && vect_pack_trunc } } } } */
>  /* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target { { vect_fold_extract_last } && { ! vect_pack_trunc } } } } } */
>  /* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
>
> --
> 2.41.0
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registration court:
München, HRB 106955


[PATCH] c++: fix parsing with auto(x) [PR112410]

2023-11-09 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we are wrongly parsing

  int y(auto(42));

which uses the C++23 cast-to-prvalue feature, and initializes y to 42.
However, we were treating the auto as an implicit template parameter.

Fixing the auto{42} case is easy, but when auto is followed by a (,
I found the fix to be much more involved.  For instance, we cannot
use cp_parser_expression, because that can give hard errors.  It's
also necessary to disambiguate 'auto(i)' as 'auto i', not a cast.
auto(), auto(int), auto(f)(int), auto(*), auto(i[]), auto(...), etc.
are all function declarations.  We have to look at more than one
token to decide.

In this fix, I'm (ab)using cp_parser_declarator, with member_p=false
so that it doesn't commit.  But it handles even more complicated
cases such as

  int fn (auto (*const **)(int) -> char);
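For readers less familiar with C++23 `auto(x)` (P0849), here is a minimal sketch of its unambiguous expression form, where it simply yields a prvalue copy of its operand. The declaration forms the patch disambiguates appear only as comments, since compilers without this fix may misparse them. The code is guarded by the `__cpp_auto_cast` feature-test macro so it also builds in older language modes; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Declaration contexts handled by the patch (not compiled here):
//   int y(auto(42));   // variable y initialized with a prvalue copy of 42
//   int f(auto(i));    // function f with an abbreviated-template parameter i

// Consumes the vector it is given by rvalue reference and returns its old size.
std::size_t consume(std::vector<int>&& v) {
    std::size_t n = v.size();
    v.clear();
    return n;
}

// Passes a fresh copy of V to consume(), leaving V itself untouched.
std::size_t consume_copy(const std::vector<int>& v) {
#if defined(__cpp_auto_cast)
    return consume(auto(v));              // C++23: auto(v) is a prvalue copy
#else
    return consume(std::vector<int>(v));  // pre-C++23 spelling of the same copy
#endif
}
```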

PR c++/112410

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Disambiguate
between a variable and function declaration with auto.
(cp_parser_constructor_declarator_p): Use cp_parser_starts_param_decl_p.
(cp_parser_starts_param_decl_p): New, factored out of
cp_parser_constructor_declarator_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/auto-fncast13.C: New test.
---
 gcc/cp/parser.cc   | 79 ++
 gcc/testsuite/g++.dg/cpp23/auto-fncast13.C | 61 +
 2 files changed, 125 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast13.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5116bcb78f6..3edee092e56 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2887,6 +2887,8 @@ static bool cp_parser_next_token_ends_template_argument_p
   (cp_parser *);
 static bool cp_parser_nth_token_starts_template_argument_list_p
   (cp_parser *, size_t);
+static bool cp_parser_starts_param_decl_p
+  (cp_parser *);
 static enum tag_types cp_parser_token_is_class_key
   (cp_token *);
 static enum tag_types cp_parser_token_is_type_parameter_key
@@ -19991,6 +19993,8 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  /* The 'auto' might be the placeholder return type for a function decl
 with trailing return type.  */
  bool have_trailing_return_fn_decl = false;
+ /* Or it might be auto(x) or auto {x}.  */
+ bool decay_copy = false;
 
  cp_parser_parse_tentatively (parser);
  cp_lexer_consume_token (parser->lexer);
@@ -20002,12 +20006,43 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
{
  cp_lexer_consume_token (parser->lexer);
+ /* An auto specifier that appears in a parameter declaration
+might be the placeholder for a late return type, or it
+can be an implicit template parameter.  But it can also
+be a prvalue cast, rendering the current construct not
+a function declaration at all.  Check if it decidedly
+cannot be a valid function-style cast first...  */
+ if (!cp_parser_starts_param_decl_p (parser))
+   {
+ /* Ug, we couldn't tell.  Try to parse whatever follows
+as a declarator; this should detect cases like
+auto(i), auto(*), auto(f[]), auto(f)(int).  */
+ cp_parser_declarator (parser, CP_PARSER_DECLARATOR_EITHER,
+   CP_PARSER_FLAGS_NONE,
+   /*ctor_dtor_or_conv_p=*/nullptr,
+   /*parenthesized_p=*/NULL,
+   /*member_p=*/true,
+   /*friend_p=*/false,
+   /*static_p=*/true);
+ /* OK, if we now see a ')', it looks like a valid
+function declaration.  Otherwise, let's go with
+auto(x).  */
+ decay_copy
+   = cp_lexer_next_token_is_not (parser->lexer,
+ CPP_CLOSE_PAREN);
+   }
  cp_parser_skip_to_closing_parenthesis (parser,
 /*recovering*/false,
 /*or_comma*/false,
 /*consume_paren*/true);
  continue;
}
+ /* The easy case: it has got to be C++23 auto(x).  */
+ else if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+   {
+ decay_copy = true;
+ break;
+   }
 
  if 

Re: [PATCH 2/2] libstdc++: mark 20_util/scoped_allocator/noexcept.cc R-E-T hosted

2023-11-09 Thread Jonathan Wakely
On Thu, 9 Nov 2023 at 19:32, Arsen Arsenović  wrote:
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/20_util/scoped_allocator/noexcept.cc: Mark as
> requiring hosted.

OK for trunk, thanks.

The test has been backported, but we don't have the hosted effective-target
there, so this isn't needed on the branches.

> ---
>  libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc b/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
> index 16992968d3b9..f14eff2c46f5 100644
> --- a/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
> +++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
> @@ -1,4 +1,5 @@
>  // { dg-do compile { target c++11 } }
> +// { dg-require-effective-target hosted }
>
>  #include 
>
> --
> 2.42.1
>



Re: [PATCH] libstdc++: Fix forwarding in __take/drop_of_repeat_view [PR112453]

2023-11-09 Thread Jonathan Wakely
On Thu, 9 Nov 2023 at 16:01, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?  (The
> && overloads are also missing on earlier branches, but I don't think
> it makes a difference there since all uses of that operator* are on
> lvalues before this fix.)

OK for trunk and gcc-13, thanks.


>
> -- >8 --
>
> We need to respect the value category of the repeat_view passed to these
> two functions when accessing its _M_value member.  This revealed that
> the space-efficient partial specialization of __box lacks && overloads
> of operator* to match std::optional's API.
>
> PR libstdc++/112453
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (__detail::__box::operator*): Define &&
> overloads as well.
> (__detail::__take_of_repeat_view): Forward __r when accessing
> its _M_value member.
> (__detail::__drop_of_repeat_view): Likewise.
> * testsuite/std/ranges/repeat/1.cc (test07): New test.
> ---
>  libstdc++-v3/include/std/ranges   | 20 ++-
>  libstdc++-v3/testsuite/std/ranges/repeat/1.cc | 13 
>  2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 7893e3a84c9..41f95dc8f78 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -250,13 +250,21 @@ namespace ranges
> { return true; };
>
> constexpr _Tp&
> -   operator*() noexcept
> +   operator*() & noexcept
> { return _M_value; }
>
> constexpr const _Tp&
> -   operator*() const noexcept
> +   operator*() const & noexcept
> { return _M_value; }
>
> +   constexpr _Tp&&
> +   operator*() && noexcept
> +   { return std::move(_M_value); }
> +
> +   constexpr const _Tp&&
> +   operator*() const && noexcept
> +   { return std::move(_M_value); }
> +
> constexpr _Tp*
> operator->() noexcept
> { return std::__addressof(_M_value); }
> @@ -7799,9 +7807,10 @@ namespace views::__adaptor
>   using _Tp = remove_cvref_t<_Range>;
>   static_assert(__is_repeat_view<_Tp>);
>   if constexpr (sized_range<_Tp>)
> -   return views::repeat(*__r._M_value, std::min(ranges::distance(__r), __n));
> +   return views::repeat(*std::forward<_Range>(__r)._M_value,
> +std::min(ranges::distance(__r), __n));
>   else
> -   return views::repeat(*__r._M_value, __n);
> +   return views::repeat(*std::forward<_Range>(__r)._M_value, __n);
> }
>
>template
> @@ -7813,7 +7822,8 @@ namespace views::__adaptor
>   if constexpr (sized_range<_Tp>)
> {
>   auto __sz = ranges::distance(__r);
> - return views::repeat(*__r._M_value, __sz - std::min(__sz, __n));
> + return views::repeat(*std::forward<_Range>(__r)._M_value,
> +  __sz - std::min(__sz, __n));
> }
>   else
> return __r;
> diff --git a/libstdc++-v3/testsuite/std/ranges/repeat/1.cc b/libstdc++-v3/testsuite/std/ranges/repeat/1.cc
> index 30636407ee2..9551414e2c8 100644
> --- a/libstdc++-v3/testsuite/std/ranges/repeat/1.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/repeat/1.cc
> @@ -2,6 +2,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #if __cpp_lib_ranges_repeat != 202207L
> @@ -137,6 +138,17 @@ test06()
>static_assert( requires { views::repeat(move_only{}, 2); } );
>  }
>
> +void
> +test07()
> +{
> +  // PR libstdc++/112453
> +  auto t = std::views::repeat(std::make_unique(5)) | std::views::take(2);
> +  auto d = std::views::repeat(std::make_unique(5)) | std::views::drop(2);
> +
> +  auto t2 = std::views::repeat(std::make_unique(5), 4) | std::views::take(2);
> +  auto d2 = std::views::repeat(std::make_unique(5), 4) | std::views::drop(2);
> +}
> +
>  int
>  main()
>  {
> @@ -146,4 +158,5 @@ main()
>static_assert(test04());
>test05();
>test06();
> +  test07();
>  }
> --
> 2.43.0.rc1
>
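The value-category problem fixed by this patch can be shown in isolation with a hypothetical `box` type (a simplified stand-in, not libstdc++'s `__box`). With ref-qualified `operator*` overloads, forwarding the box preserves whether the caller passed an lvalue or an rvalue, so a move-only payload such as `std::unique_ptr` can be moved out of an rvalue box instead of triggering a copy.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical minimal box with ref-qualified operator* overloads in the
// style of std::optional (and of the overloads added to __box).
template<typename T>
struct box {
    T value;

    T& operator*() & noexcept { return value; }
    const T& operator*() const & noexcept { return value; }
    T&& operator*() && noexcept { return std::move(value); }
    const T&& operator*() const && noexcept { return std::move(value); }
};

// Extracts the boxed value, preserving the caller's value category: an
// lvalue box yields a copy, an rvalue box is moved from.  Without the &&
// overloads, the rvalue case would also try to copy, which fails to compile
// for move-only T.
template<typename Box>
auto take_value(Box&& b) {
    return *std::forward<Box>(b);
}
```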



[PATCH 2/2] libstdc++: mark 20_util/scoped_allocator/noexcept.cc R-E-T hosted

2023-11-09 Thread Arsen Arsenović
libstdc++-v3/ChangeLog:

* testsuite/20_util/scoped_allocator/noexcept.cc: Mark as
requiring hosted.
---
 libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc b/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
index 16992968d3b9..f14eff2c46f5 100644
--- a/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
+++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
+// { dg-require-effective-target hosted }
 
 #include 
 
-- 
2.42.1



[PATCH 1/2] libstdc++: declare std::allocator in !HOSTED as an extension

2023-11-09 Thread Arsen Arsenović
This allows us to add features to freestanding which allow specifying
non-default allocators (generators, collections, ...) without having to
modify them.

libstdc++-v3/ChangeLog:

* include/bits/memoryfwd.h: Remove HOSTED check around allocator
and its specializations.
---
Evening,

This patch adds std::allocator as a declaration to freestanding, so that
it doesn't block various other bits of the library (such as collections
or the generators that I intend to send in soon) from being added to
freestanding anymore.

I don't intend to pull in anything but  into freestanding
in this release, though, so, this patch will have little impact for now.

Testing on x86_64-pc-linux-gnu (the tests are not done yet, but I see no
relevant fails in previous test runs).  The follow-up patch also marks a
new test as freestanding (as it was failing).

OK for trunk?

Have a lovely evening!

 libstdc++-v3/include/bits/memoryfwd.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/memoryfwd.h b/libstdc++-v3/include/bits/memoryfwd.h
index 330a6df7f44a..2b79cd8880a1 100644
--- a/libstdc++-v3/include/bits/memoryfwd.h
+++ b/libstdc++-v3/include/bits/memoryfwd.h
@@ -60,13 +60,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-#if _GLIBCXX_HOSTED
+  // Included in freestanding as a libstdc++ extension.
   template
 class allocator;
 
   template<>
 class allocator;
-#endif
 
 #if __cplusplus >= 201103L
   /// Declare uses_allocator so it can be specialized in `` etc.
-- 
2.42.1
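The reason a bare declaration of `std::allocator` is enough can be illustrated with a hypothetical sketch. Names in namespace `sketch` are illustrative, and real freestanding containers will differ. A container may name an allocator template that is only declared in its default template argument. As long as users supply their own allocator, the default is never instantiated, so its definition is never needed.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Stand-in for std::allocator in a freestanding build: declared, never defined.
template<typename T> class default_allocator;

// A trivial user-supplied allocator handing out slots from a fixed buffer
// whose elements already exist (so this sketch is only valid for trivial T).
template<typename T>
struct bump_allocator {
    using value_type = T;
    bump_allocator(T* b, std::size_t c) : buf(b), cap(c), used(0) {}
    T* allocate(std::size_t n) {
        if (used + n > cap) return nullptr;  // out of space
        T* p = buf + used;
        used += n;
        return p;
    }
    T* buf;
    std::size_t cap;
    std::size_t used;
};

// Fixed-capacity stack that names default_allocator<T> only as a default
// template argument; it is instantiated only if a user relies on the default.
template<typename T, typename Alloc = default_allocator<T>>
class tiny_stack {
public:
    tiny_stack(Alloc a, std::size_t capacity)
        : alloc_(a), data_(alloc_.allocate(capacity)),
          cap_(data_ ? capacity : 0) {}
    bool push(const T& v) {
        if (size_ == cap_) return false;
        data_[size_++] = v;
        return true;
    }
    T pop() {
        assert(size_ > 0);
        return data_[--size_];
    }
    std::size_t size() const { return size_; }
private:
    Alloc alloc_;
    T* data_;
    std::size_t cap_;
    std::size_t size_ = 0;
};

} // namespace sketch
```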



[committed] i386: Improve stack protector patterns and peephole2s even more

2023-11-09 Thread Uros Bizjak
Improve stack protector patterns and peephole2s even more:

a. Use unrelated register clears with integer mode size <= word
   mode size to clear stack protector scratch register.

b. Use unrelated register initializations in front of stack
   protector sequence to clear stack protector scratch register.

c. Use unrelated register initializations using LEA instructions
   to clear stack protector scratch register.

These improvements reuse 6914 unrelated register initializations to
replace the clear of the stack protector scratch register, across the
12034 stack protector sequences in a recent Linux defconfig build.

gcc/ChangeLog:

* config/i386/i386.md (@stack_protect_set_1__):
Use W mode iterator instead of SWI48.  Output MOV instead of XOR
for TARGET_USE_MOV0.
(stack_protect_set_1 peephole2): Use integer modes with
mode size <= word mode size for operand 3.
(stack_protect_set_1 peephole2 #2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization, originally in front of stack
protector sequence.
(*stack_protect_set_3__): New insn pattern.
(stack_protect_set_1 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving LEA instruction.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ce7102af44f..046b6b7919e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24306,11 +24306,11 @@ (define_expand "stack_protect_set"
   DONE;
 })
 
-(define_insn "@stack_protect_set_1__"
+(define_insn "@stack_protect_set_1__"
   [(set (match_operand:PTR 0 "memory_operand" "=m")
(unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")]
UNSPEC_SP_SET))
-   (set (match_operand:SWI48 2 "register_operand" "=") (const_int 0))
+   (set (match_operand:W 2 "register_operand" "=") (const_int 0))
(clobber (reg:CC FLAGS_REG))]
   ""
 {
@@ -24318,7 +24318,10 @@ (define_insn "@stack_protect_set_1__"
   operands);
   output_asm_insn ("mov{}\t{%2, %0|%0, %2}",
   operands);
-  return "xor{l}\t%k2, %k2";
+  if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+return "xor{l}\t%k2, %k2";
+  else
+return "mov{l}\t{$0, %k2|%k2, 0}";
 }
   [(set_attr "type" "multi")])
 
@@ -24334,15 +24337,16 @@ (define_peephole2
   UNSPEC_SP_SET))
  (set (match_operand:W 2 "general_reg_operand") (const_int 0))
  (clobber (reg:CC FLAGS_REG))])
-   (parallel [(set (match_operand:SWI48 3 "general_reg_operand")
-  (match_operand:SWI48 4 "const0_operand"))
- (clobber (reg:CC FLAGS_REG))])]
-  "peep2_reg_dead_p (0, operands[3])
+   (set (match_operand 3 "general_reg_operand")
+   (match_operand 4 "const0_operand"))]
+  "GET_MODE_SIZE (GET_MODE (operands[3])) <= UNITS_PER_WORD
+   && peep2_reg_dead_p (0, operands[3])
&& peep2_reg_dead_p (1, operands[2])"
   [(parallel [(set (match_dup 0)
   (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
  (set (match_dup 3) (const_int 0))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  "operands[3] = gen_lowpart (word_mode, operands[3]);")
 
 (define_insn "*stack_protect_set_2__si"
   [(set (match_operand:PTR 0 "memory_operand" "=m")
@@ -24401,6 +24405,59 @@ (define_peephole2
   (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
  (set (match_dup 3) (match_dup 4))])])
 
+(define_peephole2
+  [(set (match_operand:SWI48 3 "general_reg_operand")
+   (match_operand:SWI48 4 "general_gr_operand"))
+   (parallel [(set (match_operand:PTR 0 "memory_operand")
+  (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
+  UNSPEC_SP_SET))
+ (set (match_operand:W 2 "general_reg_operand") (const_int 0))
+ (clobber (reg:CC FLAGS_REG))])]
+  "peep2_reg_dead_p (0, operands[3])
+   && peep2_reg_dead_p (2, operands[2])
+   && !reg_mentioned_p (operands[3], operands[0])
+   && !reg_mentioned_p (operands[3], operands[1])"
+  [(parallel [(set (match_dup 0)
+  (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
+ (set (match_dup 3) (match_dup 4))])])
+
+(define_insn "*stack_protect_set_3__"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
+   UNSPEC_SP_SET))
+   (set (match_operand:SWI48 1 "register_operand" "=")
+   (match_operand:SWI48 2 "address_no_seg_operand" "Ts"))]
+  ""
+{
+  output_asm_insn ("mov{}\t{%3, %1|%1, %3}",
+  operands);
+  output_asm_insn ("mov{}\t{%1, %0|%0, %1}",
+  operands);
+  if (SImode_address_operand (operands[2], VOIDmode))
+{
+  gcc_assert (TARGET_64BIT);

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-09 Thread Jason Merrill

On 11/4/23 02:40, waffl3x wrote:

> I'm unfortunately going down a rabbit hole again.
>
> --function.h:608
> ```
> /* If pointers to member functions use the least significant bit to
>    indicate whether a function is virtual, ensure a pointer
>    to this function will have that bit clear.  */
> #define MINIMUM_METHOD_BOUNDARY \
>   ((TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn)  \
>    ? MAX (FUNCTION_BOUNDARY, 2 * BITS_PER_UNIT) : FUNCTION_BOUNDARY)
> ```
```


So yes, it was for PMFs using the low bit of the pointer to indicate a
virtual member function.  Since an xobj memfn can't be virtual, it's
correct for them to have the same alignment as a static memfn.
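The encoding that MINIMUM_METHOD_BOUNDARY protects can be sketched as follows. This assumes the Itanium C++ ABI layout used on `ptrmemfunc_vbit_in_pfn` targets: the pointer field holds either a function address or, for a virtual function, 1 plus the byte offset of its vtable slot. The low bit tells the two apart only if non-virtual function addresses are always even, hence the 2-byte minimum alignment. The struct and helpers are hypothetical, and the ABI's `this` adjustment field is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "ptr" field of an Itanium-ABI pointer to
// member function (the "adj" field is omitted).
struct pmf_repr {
    std::uintptr_t ptr;  // function address, or 1 + vtable slot offset
};

pmf_repr encode_nonvirtual(std::uintptr_t fn_addr) {
    assert((fn_addr & 1) == 0);  // relies on >= 2-byte function alignment
    return {fn_addr};
}

pmf_repr encode_virtual(std::uintptr_t vtable_offset) {
    return {1 + vtable_offset};  // slot offsets are pointer-size multiples, so this is odd
}

bool is_virtual(pmf_repr p) { return (p.ptr & 1) != 0; }

std::uintptr_t vtable_offset(pmf_repr p) { return p.ptr - 1; }
```

An xobj member function is never encoded the "virtual" way, so it does not need the extra alignment.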



> I stumbled upon this while cleaning up the patch, grokfndecl is just so
> full of cruft it's crazy hard to reason about. There's more than one
> block that I am near certain is completely dead code. I would like to
> just ignore them for now but some of them unfortunately pertain to xobj
> functions. I just don't feel good about putting in any hacks, but to
> really get any modifications in here correct it would need to be
> refactored much more than I should be doing in this patch.
>
> Here's another example that I'm not sure how I want to address it.
>
> ~decl.cc:10331 grokfndecl
> ```
>   int staticp = ctype && TREE_CODE (type) == FUNCTION_TYPE;
> ```
> ~decl.cc:10506 grokfndecl
> ```
>   /* If this decl has namespace scope, set that up.  */
>   if (in_namespace)
>     set_decl_namespace (decl, in_namespace, friendp);
>   else if (ctype)
>     DECL_CONTEXT (decl) = ctype;
>   else
>     DECL_CONTEXT (decl) = FROB_CONTEXT (current_decl_namespace ());
> ```
> And just a few lines down;
> ~decl.cc:10529
> ```
>   /* Should probably propagate const out from type to decl I bet (mrs).  */
>   if (staticp)
>     {
>       DECL_STATIC_FUNCTION_P (decl) = 1;
>       DECL_CONTEXT (decl) = ctype;
>     }
> ```
>
> If staticp is true, ctype must have been non-null, and if ctype is
> non-null, the context for decl should have been set in the second
> block. So why was the code in the second block added?
>
> commit f3665bdc1799c0421490b5e655f977570354
> Author: Nathan Sidwell
> Date:   Tue Jul 28 08:57:36 2020 -0700
>
>     c++: Set more DECL_CONTEXTs
>
>     I discovered we were not setting DECL_CONTEXT in a few cases, and
>     grokfndecl's control flow wasn't making it clear that we were doing it
>     in all cases.
>
>     gcc/cp/
>
>     * cp-gimplify.c (cp_genericize_r): Set IMPORTED_DECL's context.
>     * cp-objcp-common.c (cp_pushdecl): Set decl's context.
>     * decl.c (grokfndecl): Make DECL_CONTEXT setting clearer.
>
> According to the commit, it was because it was not clear, which quite
> frankly I can agree to, it just wasn't determined that the code below
> is redundantly setting the context so it wasn't removed.
>
> This puts me in a dilemma though, do I put another condition in that
> code block for the xobj case even though the code is nearly dead? Or do
> I give it a minor refactor for it to make a little more sense? If I add
> to the code I feel like it's just going to add to the problem, while if
> I give it a minor refactor it still won't look great and has a greater
> chance of breaking something.
>
> In this case I'm going to risk refactoring it, staticp is only used in
> that 1 place so I will just rip it out. I am not concerned with decl's
> type spontaneously changing to something that is not FUNCTION_TYPE, and
> if it did I think there are bigger problems afoot.
>
> I guess I'll know if I went too far with the refactoring when the patch
> reaches you, do let me know about this one specifically though because
> it took up a lot of my time trying to decide how to address it.


Removing the redundant DECL_CONTEXT setting seems appropriate, as does
changing how staticp is handled to reflect that xobj functions can also
have FUNCTION_TYPE.



> All tests seemed to pass when applied to GCC14, but the results did
> something funny where it said tests disappeared and new tests appeared
> and passed. The ones that disappeared and the new ones that appeared
> looked like they were identical so I'm not worrying about it. Just
> mentioning it in case this is something I do need to look into.


That doesn't sound like a problem, but I'm curious about the specific 
output you're seeing.


Jason



[pushed] [IRA]: Fixing conflict calculation from region landing pads.

2023-11-09 Thread Vladimir Makarov

This is one more patch for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

The patch was successfully tested and bootstrapped on x86-64, aarch64, 
ppc64le.


commit df14f1c0582cd6742a37abf3a97f4c4bf0caf864
Author: Vladimir N. Makarov 
Date:   Thu Nov 9 08:51:15 2023 -0500

[IRA]: Fixing conflict calculation from region landing pads.

The following patch fixes conflict calculation from exception landing
pads.  The previous patch processed only one newly created landing pad.
Besides being wrong, it also resulted in large memory consumption by IRA.

gcc/ChangeLog:

PR rtl-optimization/110215
* ira-lives.cc: (add_conflict_from_region_landing_pads): New
function.
(process_bb_node_lives): Use it.

diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index bc8493856a4..81af5c06460 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -1214,6 +1214,32 @@ process_out_of_region_eh_regs (basic_block bb)
 
 #endif
 
+/* Add conflicts for object OBJ from REGION landing pads using CALLEE_ABI.  */
+static void
+add_conflict_from_region_landing_pads (eh_region region, ira_object_t obj,
+   function_abi callee_abi)
+{
+  ira_allocno_t a = OBJECT_ALLOCNO (obj);
+  rtx_code_label *landing_label;
+  basic_block landing_bb;
+
+  for (eh_landing_pad lp = region->landing_pads; lp ; lp = lp->next_lp)
+{
+  if ((landing_label = lp->landing_pad) != NULL
+	  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
+	  && (region->type != ERT_CLEANUP
+	  || bitmap_bit_p (df_get_live_in (landing_bb),
+			   ALLOCNO_REGNO (a))))
+	{
+	  HARD_REG_SET new_conflict_regs
+	= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+	  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+	  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+	  return;
+	}
+}
+}
+
 /* Process insns of the basic block given by its LOOP_TREE_NODE to
update allocno live ranges, allocno hard register conflicts,
intersected calls, and register pressure info for allocnos for the
@@ -1385,23 +1411,9 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 		  SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
 		}
 		  eh_region r;
-		  eh_landing_pad lp;
-		  rtx_code_label *landing_label;
-		  basic_block landing_bb;
 		  if (can_throw_internal (insn)
-		  && (r = get_eh_region_from_rtx (insn)) != NULL
-		  && (lp = gen_eh_landing_pad (r)) != NULL
-		  && (landing_label = lp->landing_pad) != NULL
-		  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
-		  && (r->type != ERT_CLEANUP
-			  || bitmap_bit_p (df_get_live_in (landing_bb),
-					   ALLOCNO_REGNO (a))))
-		{
-		  HARD_REG_SET new_conflict_regs
-			= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-		  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
-		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
-		}
+		  && (r = get_eh_region_from_rtx (insn)) != NULL)
+		add_conflict_from_region_landing_pads (r, obj, callee_abi);
 		  if (sparseset_bit_p (allocnos_processed, num))
 		continue;
 		  sparseset_set_bit (allocnos_processed, num);
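The essence of the fix is to walk the whole `region->landing_pads` list instead of inspecting the single pad returned by `gen_eh_landing_pad`, and to stop at the first pad that qualifies. Below is a toy model in plain C, not GCC's real data structures: the `toy_*` types and the 32-bit masks are hypothetical stand-ins for `eh_landing_pad`, `eh_region`, `df_get_live_in` and `HARD_REG_SET`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for eh_landing_pad: a singly linked list of
   pads, each with the set of registers live on entry (bit N = reg N).  */
struct toy_landing_pad
{
  struct toy_landing_pad *next_lp;
  bool has_label;      /* models lp->landing_pad != NULL */
  uint32_t live_in;    /* models df_get_live_in (landing_bb) */
};

/* Hypothetical stand-in for eh_region.  */
struct toy_region
{
  bool is_cleanup;     /* models region->type == ERT_CLEANUP */
  struct toy_landing_pad *landing_pads;
};

/* Mirror of the loop in add_conflict_from_region_landing_pads: OR the
   call-clobbered set into *CONFLICTS as soon as one pad qualifies, then
   stop, because adding the same set again would change nothing.  */
static void
toy_add_conflict_from_landing_pads (const struct toy_region *region,
                                    int regno, uint32_t clobbers,
                                    uint32_t *conflicts)
{
  for (const struct toy_landing_pad *lp = region->landing_pads; lp;
       lp = lp->next_lp)
    if (lp->has_label
        && (!region->is_cleanup || ((lp->live_in >> regno) & 1u)))
      {
        *conflicts |= clobbers;
        return;
      }
}
```

In a cleanup region whose first pad does not have register 5 live but whose second pad does, the conflict is only found by continuing down the list; a check that looks at just one pad of the region could miss it.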


Re: [AVR PATCH] Optimize (X>>C)&1 for C in [1,4,8,16,24] in *insv.any_shift..

2023-11-09 Thread Georg-Johann Lay



On 02.11.23 at 12:50, Roger Sayle wrote:


This patch optimizes a few special cases in avr.md's *insv.any_shift.
instruction.  This template handles tests for a single bit, where only a
single (different) bit is set in the result.  Usually (currently)
this always requires a three-instruction sequence of a BST, a CLR and a BLD
(plus any additional CLR instructions to clear the rest of the result
bytes).
The special cases considered here are those that can be done with only two
instructions (plus CLRs); an ANDI preceded by either a MOV, a SHIFT or a
SWAP.

Hence for C=1 in HImode, GCC with -O2 currently generates:

 bst r24,1
 clr r24
 clr r25
 bld r24,0

with this patch, we now generate:

 lsr r24
 andi r24,1
 clr r25

Likewise, HImode C=4 now becomes:

 swap r24
 andi r24,1
 clr r25

and SImode C=8 now becomes:

 mov r22,r23
 andi r22,1
 clr r23
 clr r24
 clr r25
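One way to convince oneself that these two-instruction special cases compute `(X >> C) & 1` is to model the byte operations in C and compare them exhaustively against the shift. This is a sketch: the `avr_*` helpers are hypothetical models of what LSR, SWAP and ANDI do to one 8-bit register, the MOV is modelled by picking out a byte, and the register mapping (r25:r24 for HImode, r25:r24:r23:r22 for SImode, low byte first) is assumed from the listings above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models of single AVR instructions acting on one 8-bit register.  */
static uint8_t avr_lsr (uint8_t r)  { return (uint8_t) (r >> 1); }
static uint8_t avr_swap (uint8_t r) { return (uint8_t) ((r << 4) | (r >> 4)); }
static uint8_t avr_andi (uint8_t r, uint8_t k) { return r & k; }

/* HImode C=1: lsr r24; andi r24,1; clr r25.  */
static uint16_t seq_c1 (uint16_t x)
{
  return avr_andi (avr_lsr ((uint8_t) x), 1);   /* high byte is 0 */
}

/* HImode C=4: swap r24; andi r24,1; clr r25.  */
static uint16_t seq_c4 (uint16_t x)
{
  return avr_andi (avr_swap ((uint8_t) x), 1);
}

/* SImode C=8: mov r22,r23; andi r22,1; clr r23; clr r24; clr r25.  */
static uint32_t seq_c8 (uint32_t x)
{
  uint8_t r23 = (uint8_t) (x >> 8);   /* the byte MOV copies into r22 */
  return avr_andi (r23, 1);
}

/* Return true if every 16-bit input agrees with (X >> C) & 1.  */
static bool check_all (void)
{
  for (uint32_t i = 0; i <= 0xFFFF; i++)
    {
      uint16_t x = (uint16_t) i;
      uint32_t y = (i << 16) | i;     /* spread the pattern over 4 bytes */
      if (seq_c1 (x) != ((x >> 1) & 1)
          || seq_c4 (x) != ((x >> 4) & 1)
          || seq_c8 (y) != ((y >> 8) & 1))
        return false;
    }
  return true;
}
```

SWAP works for C=4 because exchanging the nibbles moves bit 4 into bit 0, so the following ANDI only has to mask.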


I've not attempted to model the instruction length accurately for these
special cases; the logic would be ugly, but it's safe to use the current
(1 insn longer) length.

This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions.  If someone could test this more
thoroughly that would be great.


2023-11-02  Roger Sayle  


CCing Andrew.

Hi, here is a version based on yours.

I am still unsure what to make of this insn; one approach would be
to split it after reload, which simplifies the pattern a bit.  However,
where the current pattern would use MOVW, a split version would need one
more instruction, because there would be no MOVW but two MOVs.

Splitting would improve the situation when not all of the output bytes
are used by the following code, though.

Maybe Andrew has an idea; he helped a lot to improve code generation
by fixing and tweaking the middle-end using AVR test cases like PR55181
or PR109907.

Anyway, here is a version that works out exact code lengths, and it
handles some more cases.

Then I am not really sure whether testcases that assert certain instruction
sequences from optimizers are a good idea or rather a liability:
the middle-end is not very good at generating reproducible code
across versions.  In particular, it's not uncommon that newer GCC
versions no longer find some optimizations.  So the attached patch just
has a dg-do run without asserting anything on the exact code sequence.
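For illustration, a run-only test in that spirit might look like the sketch below. It is *not* the attached `insv-anyshift.c`: the function names, inputs and shift counts are hypothetical. Because it only checks results at run time, it keeps passing even if a future GCC emits a different instruction sequence; in a real testsuite file, `main` would simply call `run_tests ()`.

```c
/* { dg-do run } */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Each function is a candidate for *insv.any_shift: one bit of the input
   ends up as bit 0 of the result.  noinline keeps the expression from
   being folded into the caller with constant arguments.  */
#define DEF_TEST(C)                                                  \
  static __attribute__((noinline)) uint32_t bit_##C (uint32_t x)     \
  { return (x >> C) & 1; }

DEF_TEST (1)
DEF_TEST (4)
DEF_TEST (8)
DEF_TEST (16)
DEF_TEST (24)

/* Reference computed with a loop, so it does not share code with the
   tested expression.  */
static uint32_t
ref_bit (uint32_t x, int c)
{
  for (int i = 0; i < c; i++)
    x /= 2;
  return x % 2;
}

static int
run_tests (void)
{
  static const uint32_t inputs[] = { 0, 1, 0x10, 0x100, 0x10000, 0x1000000,
                                     0xFFFFFFFFu, 0x12345678u };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    {
      uint32_t x = inputs[i];
      if (bit_1 (x) != ref_bit (x, 1) || bit_4 (x) != ref_bit (x, 4)
          || bit_8 (x) != ref_bit (x, 8) || bit_16 (x) != ref_bit (x, 16)
          || bit_24 (x) != ref_bit (x, 24))
        abort ();
    }
  return 0;
}
```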

Johann

--

Improve insn output for "*insv.any_shift.".

gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.md (adjust_len) [insv]: Add to define_attr.
(*insv.any_shift.): Output using...
* config/avr/avr.cc (avr_out_insv): ...this new function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle new case.

gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift.c: New test.

diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 5c1343f0df8..dfc949a8c0f 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -58,6 +58,7 @@ extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
 extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, int*);
+extern const char *avr_out_insv (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr_not (rtx_insn *, rtx*, int*);
 extern const char *avr_out_plus_set_ZN (rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 5e0217de36f..b4d082315b5 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -9066,6 +9066,159 @@ avr_out_insert_notbit (rtx_insn *insn, rtx op[], int *plen)
 }
 
 
+/* Output instructions for  XOP[0] = (XOP[1]  XOP[2]) & XOP[3]  where
+   * XOP[0] and XOP[1] have the same mode which is one of: QI, HI, PSI, SI.
+   * XOP[3] is an exact power of 2.
+   * XOP[2] and XOP[3] are const_int.
+   *  is any of: ASHIFT, LSHIFTRT, ASHIFTRT.
+   * The result depends on XOP[1].
+   Returns "".
+   PLEN != 0: Set *PLEN to the code length in words.  Don't output anything.
+   PLEN == 0: Output instructions.  */
+
+const char*
+avr_out_insv (rtx_insn *insn, rtx xop[], int *plen)
+{
+  machine_mode mode = GET_MODE (xop[0]);
+  int n_bytes = GET_MODE_SIZE (mode);
+  rtx xsrc = SET_SRC (single_set (insn));
+
+  // Any of ASHIFT, LSHIFTRT, ASHIFTRT.
+  enum rtx_code code = GET_CODE (XEXP (xsrc, 0));
+  int shift = code == ASHIFT ? INTVAL (xop[2]) : -INTVAL (xop[2]);
+
+  // Determines the position of the output bit.
+  unsigned mask = GET_MODE_MASK (mode) & INTVAL (xop[3]);
+
+  // Position of the output / input bit, respectively.
+  int obit = exact_log2 (mask);
+  int ibit = obit - shift;
+
+  gcc_assert (IN_RANGE (obit, 0, 

RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-11-09 Thread Di Zhao OS
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 31, 2023 9:48 PM
> To: Di Zhao OS 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> get_reassociation_width
> 
> On Sun, Oct 8, 2023 at 6:40 PM Di Zhao OS 
> wrote:
> >
> > Attached is a new version of the patch.
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, October 6, 2023 5:33 PM
> > > To: Di Zhao OS 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > get_reassociation_width
> > >
> > > On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
> > >  wrote:
> > > >
> > > > This is a new version of the patch on "nested FMA".
> > > > Sorry for updating this after so long, I've been studying and
> > > > writing micro cases to sort out the cause of the regression.
> > >
> > > Sorry for taking so long to reply.
> > >
> > > > First, following previous discussion:
> > > > (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
> > > >
> > > > 1. From testing more altered cases, I don't think the
> > > > problem is that reassociation works locally. In that:
> > > >
> > > >   1) On the example with multiplications:
> > > >
> > > > tmp1 = a + c * c + d * d + x * y;
> > > > tmp2 = x * tmp1;
> > > > result += (a + c + d + tmp2);
> > > >
> > > >   Given "result" rewritten by width=2, the performance is
> > > >   worse if we rewrite "tmp1" with width=2. In contrast, if we
> > > >   remove the multiplications from the example (and make "tmp1"
> > > >   not single-used), and still rewrite "result" by width=2, then
> > > >   rewriting "tmp1" with width=2 is better. (Makes sense because
> > > >   the tree's depth at "result" is still smaller if we rewrite
> > > >   "tmp1".)
> > > >
> > > >   2) I tried to modify the assembly code of the example without
> > > >   FMA, so the width of "result" is 4. On Ampere1 there's no
> > > >   obvious improvement. So although this is an interesting
> > > >   problem, it doesn't seem like the cause of the regression.
> > >
> > > OK, I see.
> > >
> > > > 2. From assembly code of the case with FMA, one problem is
> > > > that, rewriting "tmp1" to parallel didn't decrease the
> > > > minimum CPU cycles (taking MULT_EXPRs into account), but
> > > > increased code size, so the overhead is increased.
> > > >
> > > >a) When "tmp1" is not re-written to parallel:
> > > > fmadd d31, d2, d2, d30
> > > > fmadd d31, d3, d3, d31
> > > > fmadd d31, d4, d5, d31  //"tmp1"
> > > > fmadd d31, d31, d4, d3
> > > >
> > > >b) When "tmp1" is re-written to parallel:
> > > > fmul  d31, d4, d5
> > > > fmadd d27, d2, d2, d30
> > > > fmadd d31, d3, d3, d31
> > > > fadd  d31, d31, d27 //"tmp1"
> > > > fmadd d31, d31, d4, d3
> > > >
> > > > For version a), there are 3 dependent FMAs to calculate "tmp1".
> > > > For version b), there are also 3 dependent instructions in the
> > > > longer path: the 1st, 3rd and 4th.
> > >
> > > Yes, it doesn't really change anything.  The patch has
> > >
> > > +  /* If there's code like "acc = a * b + c * d + acc" in a tight loop, some
> > > + uarchs can execute results like:
> > > +
> > > +   _1 = a * b;
> > > +   _2 = .FMA (c, d, _1);
> > > +   acc_1 = acc_0 + _2;
> > > +
> > > + in parallel, while turning it into
> > > +
> > > +   _1 = .FMA(a, b, acc_0);
> > > +   acc_1 = .FMA(c, d, _1);
> > > +
> > > + hinders that, because then the first FMA depends on the result of preceding
> > > + iteration.  */
> > >
> > > I can't see what can be run in parallel for the first case.  The .FMA
> > > depends on the multiplication a * b.  Iff the uarch somehow decomposes
> > > .FMA into multiply + add then the c * d multiply could run in parallel
> > > with the a * b multiply which _might_ be able to hide some of the
> > > latency of the full .FMA.  Like on x86 Zen FMA has a latency of 4
> > > cycles but a multiply only 3.  But I never got confirmation from any
> > > of the CPU designers that .FMAs are issued when the multiply
> > > operands are ready and the add operand can be forwarded.
> > >
> > > I also wonder why the multiplications of the two-FMA sequence
> > > then cannot be executed at the same time?  So I have some doubt
> > > of the theory above.
> >
> > The parallel execution for the code snippet above was the other
> > issue (previously discussed here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628960.html).
> > Sorry it's a bit confusing to include that here, but these 2 fixes
> > need to be combined to avoid new regressions.  Since considering
> > FMA in get_reassociation_width produces more results of width=1,
> > there would be more loop-dependent FMA chains.
> >
> > > Iff this really is the reason for the sequence to execute with lower
> > > overall latency and we want to attack this on 
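The instruction counting for sequences a) and b) above can be made mechanical with a small critical-path model: each instruction's result is ready one latency after its latest source. This is a sketch under stated assumptions: register numbers mirror the AArch64 listings, every input register is ready at cycle 0, there are no resource limits, and latencies are parameters rather than measured Ampere1 numbers.

```c
#include <assert.h>

enum { NREGS = 32 };

/* One instruction: destination register, up to three sources, latency.  */
struct insn
{
  int dest;
  int src[3];
  int nsrc;
  int latency;
};

/* Walk INSNS in program order, tracking when each register's value
   becomes available.  Return the ready cycle of the last instruction and
   store the ready cycle of instruction TMP_IDX in *TMP_READY.  */
static int
critical_path (const struct insn *insns, int n, int tmp_idx, int *tmp_ready)
{
  int ready[NREGS] = { 0 };
  int last = 0;
  for (int i = 0; i < n; i++)
    {
      int start = 0;
      for (int s = 0; s < insns[i].nsrc; s++)
        if (ready[insns[i].src[s]] > start)
          start = ready[insns[i].src[s]];
      last = ready[insns[i].dest] = start + insns[i].latency;
      if (i == tmp_idx)
        *tmp_ready = last;
    }
  return last;
}

/* Sequence a): "tmp1" kept as a serial FMA chain (insn 2 defines tmp1).  */
static int
seq_a (int lat_fma, int *tmp1)
{
  const struct insn a[] = {
    { 31, { 2, 2, 30 }, 3, lat_fma },   /* fmadd d31, d2, d2, d30 */
    { 31, { 3, 3, 31 }, 3, lat_fma },   /* fmadd d31, d3, d3, d31 */
    { 31, { 4, 5, 31 }, 3, lat_fma },   /* fmadd d31, d4, d5, d31 */
    { 31, { 31, 4, 3 }, 3, lat_fma },   /* fmadd d31, d31, d4, d3 */
  };
  return critical_path (a, 4, 2, tmp1);
}

/* Sequence b): "tmp1" rewritten with width 2 (insn 3 defines tmp1).  */
static int
seq_b (int lat_fma, int lat_mul, int lat_add, int *tmp1)
{
  const struct insn b[] = {
    { 31, { 4, 5 }, 2, lat_mul },       /* fmul  d31, d4, d5 */
    { 27, { 2, 2, 30 }, 3, lat_fma },   /* fmadd d27, d2, d2, d30 */
    { 31, { 3, 3, 31 }, 3, lat_fma },   /* fmadd d31, d3, d3, d31 */
    { 31, { 31, 27 }, 2, lat_add },     /* fadd  d31, d31, d27 */
    { 31, { 31, 4, 3 }, 3, lat_fma },   /* fmadd d31, d31, d4, d3 */
  };
  return critical_path (b, 5, 3, tmp1);
}
```

With unit latency both versions have "tmp1" three instructions deep and the final FMA four deep, which matches the count in the quoted text. With non-uniform latencies (for example FMA 4 and multiply 3, the Zen-like figures Richard mentions, plus an assumed add latency of 2) the model gives 16 cycles for a) and 13 for b), so the conclusion depends on the target's real numbers.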

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Alexander Monakov


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:

> Hi Kewen,
> 
> Below are my comments.  I don't want to override Alexander's review, and if
> the patch looks good to him, it's fine to ignore my concerns.
> 
> My main concern is that this adds a new entity -- forceful skipping of
> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
> already quite a bit of logic in the scheduler to skip them _as part of normal
> operation_.

I agree with the concern. I hoped that solving the problem by skipping the BB
like the (bit-rotted) debug code needs to would be a minor surgery. As things
look now, it may be better to remove the non-working sched_block debug counter
entirely and implement a good solution for the problem at hand.

> 
> Would you please consider 2 ideas below.
> 
> #1:
> After a brief look, I'm guessing this part is causing the problem:
> haifa-sched.cc :schedule_block():
> === [1]
>   /* Loop until all the insns in BB are scheduled.  */
>   while ((*current_sched_info->schedule_more_p) ())
> {
>   perform_replacements_new_cycle ();
>   do
>   {
> start_clock_var = clock_var;
> 
> clock_var++;
> 
> advance_one_cycle ();

As I understand, we have spurious calls to advance_one_cycle on basic block
boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
and cause divergence when passing through a debug-only BB which would not be
present at all without -g.

Since EBBs and regions may not have jump targets in the middle, advancing
a cycle on BB boundaries does not seem well motivated. Can we remove it?

Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
boundaries when -fcompare-debug is enabled?  That should make the problem
readily detectable by -fcompare-debug even when scheduling did not diverge.
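The point about spurious cycle advances can be illustrated with a deliberately tiny model: if the clock is bumped at every basic-block boundary, a debug-only block that exists only with -g adds an extra tick, so the state (reduced here to a clock value) differs from the non-debug compilation. The types and the one-tick-per-boundary rule are hypothetical simplifications, not haifa-sched's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

/* A basic block reduced to the number of real (non-debug) insns it
   holds, each assumed to take one cycle.  A block with zero real insns
   is "debug-only" and would not exist at all without -g.  */
struct toy_bb
{
  int n_real_insns;
};

/* Simulate the clock over BBS.  If ADVANCE_ON_BOUNDARY, tick once when
   entering every block after the first, standing in for the
   advance_one_cycle call at block boundaries.  */
static int
final_clock (const struct toy_bb *bbs, int n, bool advance_on_boundary)
{
  int clock = 0;
  for (int i = 0; i < n; i++)
    {
      if (i > 0 && advance_on_boundary)
        clock++;
      clock += bbs[i].n_real_insns;
    }
  return clock;
}
```

With blocks {3, 0, 2} (the middle one debug-only) versus {3, 2}, advancing on boundaries yields 7 versus 6 cycles, while not advancing yields 5 in both cases.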

Alexander


Re: RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Qing Zhao


> On Nov 9, 2023, at 11:50 AM, Jose Marchesi  wrote:
> 
>> 
>> On Thu, Nov 09, 2023 at 03:49:49PM +, Qing Zhao wrote:
>>> Is it reasonable to add one option to disable the “counted_by” attribute?
>>> (then no insertion of the new .ACCESS_WITH_SIZE into IL).  
>>> 
>>> The major reason is: some users might want to ignore all the “counted_by” 
>>> attribute added in the source code,
>>> We need to provide them a way to disable this feature.
>> 
>> -D'counted_by(x)='
>> and/or
>> -D'__counted_by__(x)='
>> ?
> 
> The insertion of .ACCESS_WITH_SIZE collides with the BPF CO-RE
> preserve_access_index implementation.
> 
> I don't think this will be a problem in practice (the BPF program can
> define counted_by to the empty string as Jakub suggests) but we ought to
> at least detect when a data structure featuring a counted_by FMA is
> accessed with access index preservation (either attribute or builtin)
> and either error out, or warn and try to accommodate it by turning the
> .ACCESS_WITH_SIZE back into plain accesses.  We can do either with BPF
> specific backend code.

Yes, I agree that handling this in BPF backend code might be a better approach
since this is really a BPF CO-RE specific issue.

For the counted_by implementation, I will keep the current design.

But I will add this BPF CO-RE issue to the proposal as a known
issue, for the record.

Thanks a lot for raising this issue and the possible solutions.

Qing



Re: [PATCH 1/3] RISC-V: Add support for XCVelw extension in CV32E40P

2023-11-09 Thread Jeff Law




On 11/8/23 04:09, Mary Bennett wrote:


+;; XCVELW builtins
+(define_insn "riscv_cv_elw_elw_si"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+  (unspec_volatile [(mem:SI (match_operand:SI 1 "address_operand" "p"))]
+  UNSPECV_CV_ELW))]
+
+  "TARGET_XCVELW && !TARGET_64BIT"
+  "cv.elw\t%0,%a1"
+
+  [(set_attr "type" "load")
+  (set_attr "mode" "SI")])
Would it make more sense to pull the MEM into the operand?  So instead 
of "address_operand", you'd define a new operand predicate which
accepts (mem (...)), and that chunk of your insn would look like



(unspec_volatile [(match_operand:SI 1 "new_predicate" "")] UNSPECV_CV_ELW))]

Or something close to that.


From a quick look at the docs it looks like the addressing modes are 
similar to other extensions and could be re-used.


Thoughts?

jeff



Re: RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Jose E. Marchesi


> On Thu, Nov 09, 2023 at 03:49:49PM +, Qing Zhao wrote:
>> Is it reasonable to add one option to disable the “counted_by” attribute?
>> (then no insertion of the new .ACCESS_WITH_SIZE into IL).  
>> 
>> The major reason is: some users might want to ignore all the “counted_by” 
>> attribute added in the source code,
>> We need to provide them a way to disable this feature.
>
> -D'counted_by(x)='
> and/or
> -D'__counted_by__(x)='
> ?

The insertion of .ACCESS_WITH_SIZE collides with the BPF CO-RE
preserve_access_index implementation.

I don't think this will be a problem in practice (the BPF program can
define counted_by to the empty string as Jakub suggests) but we ought to
at least detect when a data structure featuring a counted_by FMA is
accessed with access index preservation (either attribute or builtin)
and either error out, or warn and try to accommodate it by turning the
.ACCESS_WITH_SIZE back into plain accesses.  We can do either with BPF
specific backend code.
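Jakub's suggestion works because `counted_by` is written like a function call inside the attribute: defining it as an empty function-like macro turns `__attribute__ ((counted_by (len)))` into `__attribute__ (())`, an empty attribute list that GCC accepts. A sketch, with the macro defined in the source to mimic `-D'counted_by(x)='` on the command line; `struct packet` and `packet_new` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Equivalent of compiling with -D'counted_by(x)=': every use of the
   attribute expands to nothing, so the front end never sees it.  */
#define counted_by(x)

/* Hypothetical structure with a flexible array member (FAM).  */
struct packet
{
  size_t len;
  unsigned char data[] __attribute__ ((counted_by (len)));
};

/* Allocate a packet of LEN bytes and fill it with 0, 1, 2, ...  */
static struct packet *
packet_new (size_t len)
{
  struct packet *p = malloc (sizeof *p + len);
  if (p)
    {
      p->len = len;
      for (size_t i = 0; i < len; i++)
        p->data[i] = (unsigned char) i;
    }
  return p;
}
```

Code that spells the attribute `__counted_by__` needs the second definition, `-D'__counted_by__(x)='`, which is why both forms were suggested.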


Re: [PATCH] RISC-V: VECT: Remember to assert any_known_not_updated_vssa

2023-11-09 Thread Jeff Law




On 11/6/23 06:01, Maxim Blinov wrote:

From: Maxim Blinov 

This patch is based on and intended for the 
vendors/riscv/gcc-13-with-riscv-opts branch - please apply if looks OK.

Fixes the following ICEs that I'm seeing:

FAIL: gcc.dg/vect/O3-pr49087.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-1.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-2.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-3.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/no-scevccp-pr86725-4.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/slp-50.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/slp-50.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-cond-13.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-cond-13.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-live-6.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-live-6.c -flto -ffat-lto-objects (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.target/riscv/rvv/autovec/partial/live-2.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

-- >8 --

When we create a VEC_EXTRACT gimple stmt:

   /* SCALAR_RES = VEC_EXTRACT .  */
   tree scalar_res
 = gimple_build (, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
 vec_lhs_phi, last_index);

Under the hood we are really just creating a GIMPLE_CALL stmt. Later
on, when we `gsi_insert_seq_before` our stmts:

   if (stmts)
 {
   gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
   gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);

We eventually run into tree-ssa-operands.cc:1147:

   operands_scanner (fn, stmt).build_ssa_operands ();

Since VEC_EXTRACT is *not* marked with ECF_NOVOPS, ECF_CONST, or
ECF_PURE flags in internal-fn.def, when
`operands_scanner::parse_ssa_operands` comes across our
VEC_EXTRACT-type GIMPLE_CALL, it generates a `gimple_vop()` artificial
variable.

`operands_scanner::finalize_ssa_defs` then picks this up, so our final
stmt goes from

_73 = .VEC_EXTRACT (vect_last_9.56_71, _72);

to

# .MEM = VDEF <>
_73 = .VEC_EXTRACT (vect_last_9.56_71, _72);

But more importantly it marks us as `ssa_renaming_needed`, in
tree-ssa-operands.cc:420:

   /* If we have a non-SSA_NAME VDEF, mark it for renaming.  */
   if (gimple_vdef (stmt)
   && TREE_CODE (gimple_vdef (stmt)) != SSA_NAME)
 {
   fn->gimple_df->rename_vops = 1;
   fn->gimple_df->ssa_renaming_needed = 1;
 }

This then proceeds to crash the compiler when we are about to leave
`vect_transform_loops`:

   if (need_ssa_update_p (cfun))
 {
   gcc_assert (loop_vinfo->any_known_not_updated_vssa);
   fun->gimple_df->ssa_renaming_needed = false;
   todo |= TODO_update_ssa_only_virtuals;
 }

Since,

- `need_ssa_update_p (cfun)` is true (it was set when we generated a
   memory vdef)
- `loop_vinfo->any_known_not_updated_vssa` is false

As the code currently stands, creating a gimple stmt containing a
VEC_EXTRACT should always generate a memory vdef, therefore we should
remember to mark `loop_vinfo->any_known_not_updated_vssa` afterwards.
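The invariant behind the ICE can be stated as a tiny state machine: a call that receives a fresh (non-SSA) virtual definition sets the "renaming needed" flag, and unless the vectorizer has recorded that it knowingly left virtual SSA un-updated, the final assertion fires. A hypothetical C model following the description above; the field names echo, but are not, the GCC ones, and VUSE-only effects of pure calls are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags mirroring the ones discussed above.  */
struct toy_state
{
  bool ssa_renaming_needed;          /* set by the operand scanner */
  bool any_known_not_updated_vssa;   /* set by the vectorizer */
};

/* Call flags that, per the description above, suppress the VDEF.  */
enum { TOY_ECF_CONST = 1, TOY_ECF_PURE = 2, TOY_ECF_NOVOPS = 4 };

/* Insert a call: without any of those flags it gets a new VDEF,
   which requests SSA renaming.  */
static void
toy_insert_call (struct toy_state *st, int ecf_flags)
{
  if (!(ecf_flags & (TOY_ECF_CONST | TOY_ECF_PURE | TOY_ECF_NOVOPS)))
    st->ssa_renaming_needed = true;
}

/* Model of the check at the end of vect_transform_loops: return false
   where GCC would hit the gcc_assert.  */
static bool
toy_finish_loop (const struct toy_state *st)
{
  return !st->ssa_renaming_needed || st->any_known_not_updated_vssa;
}
```

In this model there are two ways out: set the "known not updated" flag after inserting the call, as the patch does, or give the call flags that suppress the VDEF, which corresponds to changing the internal function's definition instead.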

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_live_operation): Remember to
assert loop_vinfo->any_known_not_updated_vssa if we are inserting
a call to VEC_EXTRACT.

Just to avoid any doubt: with the internal-fn.def patch I cherry-picked
earlier this week to the branch, this is no longer needed, right?


jeff


RE: [PATCH] tree-optimization/111950 - vectorizer loop copying

2023-11-09 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, November 9, 2023 11:54 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] tree-optimization/111950 - vectorizer loop copying
> 
> On Thu, 9 Nov 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, November 9, 2023 9:24 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: RE: [PATCH] tree-optimization/111950 - vectorizer loop
> > > copying
> > >
> > > On Thu, 9 Nov 2023, Tamar Christina wrote:
> > >
> > > > > guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
> > > > > -   guard_to = split_edge (epilog_e);
> > > > > +   guard_to = epilog_e->dest;
> > > > > guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, 
> > > > > guard_to,
> > > > >  skip_vector ? anchor : 
> > > > > guard_bb,
> > > > >  prob_epilog.invert (),
> > > > > @@ -3443,8 +3229,30 @@ vect_do_peeling (loop_vec_info
> > > > > loop_vinfo, tree niters, tree nitersm1,
> > > > > if (vect_epilogues)
> > > > >   epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > > > edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > -   slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv,
> > > > > guard_e,
> > > > > -   epilog_e);
> > > > > +   gphi_iterator gsi2 = gsi_start_phis (main_iv->dest);
> > > > > +   for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > +!gsi_end_p (gsi); gsi_next ())
> > > > > + {
> > > > > +   /* We are expecting all of the PHIs we have on epilog_e
> > > > > +  to be also on the main loop exit.  But sometimes
> > > > > +  a stray virtual definition can appear at epilog_e
> > > > > +  which we can then take as the same on all exits,
> > > > > +  we've removed the LC SSA PHI on the main exit before
> > > > > +  so we wouldn't need to create a loop PHI for it.  */
> > > > > +   if (virtual_operand_p (gimple_phi_result (*gsi))
> > > > > +   && (gsi_end_p (gsi2)
> > > > > +   || !virtual_operand_p (gimple_phi_result 
> > > > > (*gsi2
> > > > > + add_phi_arg (*gsi,
> > > > > +  gimple_phi_arg_def_from_edge (*gsi,
> epilog_e),
> > > > > +  guard_e, UNKNOWN_LOCATION);
> > > > > +   else
> > > > > + {
> > > > > +   add_phi_arg (*gsi, gimple_phi_result (*gsi2),
> guard_e,
> > > > > +UNKNOWN_LOCATION);
> > > > > +   gsi_next ();
> > > > > + }
> > > > > + }
> > > > > +
> > > >
> > > > I've been having some trouble incorporating this change into the
> > > > early break work.
> > > > My understanding is that here you've removed the lookup that
> > > > find_guard did and are assuming that the order between the PHI
> > > > nodes between loop->exit and epilog->exit are the same - sporadic
> > > > virtual operands.
> > > >
> > > > But the loop->exit for early break has to materialize all PHI
> > > > nodes from the main loop into the epilog loop since we need them
> > > > to restart the scalar loop iteration.
> > > >
> > > > This means that the number of PHI nodes between the first loop and
> > > > the second Loop are not the same, so we end up mis-linking phi nodes.
> > > > i.e. consider this loop
> > > >
> > > >
> https://gist.github.com/Mistuke/65d476b18f991772fdec159a09b81869
> > >
> > > I don't see any multi-exits here?  I think you need exactly the same
> > > PHIs you need for the branch to the epilogue, no?
> > >
> >
> > Ah it's a failing testcase but not one with an early break,
> >
> > > If you can point me to a testcase that fails on your branch I can
> > > try to have a look.
> >
> > I've updated the branch refs/users/tnfchris/heads/gcc-14-early-break
> >
> > Quite a few tests fail, a simple one is vect-early-break_5.c and
> > vect-early-break_20.c
> >
> > But what you just said above makes me wonder.. at the moment we
> > have differing amounts because we require the loop counters
> > and IVs to be PHI nodes such that vect_update_ivs_after_vectorizer can
> > thread them through correctly as it searches for PHI nodes.  However,
> > for the epilog exit, those that are not live are not needed.  This is
> > why we get different counts.
> >
> > Maybe.. the solution is that I need to do the same thing as
> > vectorizable_live_operation, in that when
> > vect_update_ivs_after_vectorizer is done I should either remove the PHI
> > nodes or turn them into simple assignments, since they're always single
> > value.
> >
> > Looking at a few examples that seems like it would fix the issue.. 
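The mis-linking described above is the general hazard of pairing two lists by position: it only works while both lists hold the same items in the same order. A toy sketch contrasting positional pairing with a lookup by key (roughly the kind of lookup find_guard provided); PHIs are reduced to integer variable ids and all names here are hypothetical.

```c
#include <assert.h>

/* PHI results at an exit, identified by the variable they carry.  */
struct phi_list
{
  const int *vars;
  int n;
};

/* Pair each PHI on EPILOG_EXIT with the PHI at the same position on
   MAIN_EXIT.  Return how many pairs name different variables.  */
static int
mislinks_by_position (struct phi_list main_exit, struct phi_list epilog_exit)
{
  int bad = 0;
  for (int i = 0; i < epilog_exit.n; i++)
    if (i >= main_exit.n || main_exit.vars[i] != epilog_exit.vars[i])
      bad++;
  return bad;
}

/* Pair by looking each variable up on MAIN_EXIT instead.  Return how
   many PHIs on EPILOG_EXIT find no partner at all.  */
static int
unmatched_by_lookup (struct phi_list main_exit, struct phi_list epilog_exit)
{
  int missing = 0;
  for (int i = 0; i < epilog_exit.n; i++)
    {
      int found = 0;
      for (int j = 0; j < main_exit.n && !found; j++)
        found = main_exit.vars[j] == epilog_exit.vars[i];
      missing += !found;
    }
  return missing;
}
```

If the main exit materializes PHIs for variables {1, 2, 3, 4} while the epilogue exit only has the live ones {2, 4}, positional pairing mis-links both, whereas the lookup pairs both correctly. Removing the extra PHIs, as suggested above, would instead make the two lists line up again.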

Re: [PATCH] minimal support for xtheadv

2023-11-09 Thread Kito Cheng
A few more thoughts behind my first LGTM:

I am OK *IF* the binutils bits are accepted, since this just passes
-march through to binutils to enable those instructions for assembly code.
However, the situation seems a little more complicated than I expected
at the beginning... :P

Anyway, I still think it's fine to accept this *IF* the binutils
part has landed, but limited to -march support only: nothing further
for GCC 14, such as intrinsics and auto-vectorizer support for T-Head
vector (or vector 0.7).

On Fri, Nov 10, 2023 at 12:05 AM Jeff Law  wrote:
>
>
>
> On 11/9/23 01:38, Yixuan Chen wrote:
> > Hi Kito and Christoph,
> >
> > XYenChi (oriachi...@gmail.com ) is my
> > e-mail address too. I didn't notice that my git email config had changed;
> > very sorry about that.
> >
> > We want to support another operating system project from our team, so we
> > ported XTheadV. If T-Head and VRULL have made great progress, it would be a
> > pleasure to follow your work. By the way, I have sent the opcode patch to
> > binutils; if you have any concerns, please check the patch:
> > https://sourceware.org/pipermail/binutils/2023-November/130431.html
> > 
> >
> > If our team could provide any help, please let us know.
> Given we see multiple organizations with an interest in this work, but
> that the bulk of the work can't be integrated in the short term, y'all
> might consider a shared development branch for coordination.
>
> That gives the two organizations a place to coordinate their work while
> things like the ISA spec and such get solidified.  Presumably the goal
> for the main body of work is not gcc-14, but gcc-15.
>
> I don't want to dictate how coordination happens.  Ultimately it's
> something the relevant developers can decide.
>
> jeff


Re: [PATCH v2] DSE: Allow vector type for get_stored_val when read < store

2023-11-09 Thread Jeff Law




On 11/8/23 23:08, pan2...@intel.com wrote:

From: Pan Li 

Update in v2:
* Move vector type support to get_stored_val.

Original log:

This patch allows vector modes in get_stored_val in DSE.
This is valid for the read rtx if and only if the read
bitsize is less than the stored bitsize.

Given below example code with
--param=riscv-autovec-preference=fixed-vlmax.

#include <riscv_vector.h>

vuint8m1_t test () {
   uint8_t arr[32] = {
 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
   };

   return __riscv_vle8_v_u8m1(arr, 32);
}

Before this patch:
test:
   lui     a5,%hi(.LANCHOR0)
   addi    sp,sp,-32
   addi    a5,a5,%lo(.LANCHOR0)
   li      a3,32
   vl2re64.v   v2,0(a5)
   vsetvli zero,a3,e8,m1,ta,ma
   vs2r.v  v2,0(sp) <== Unnecessary store to stack
   vle8.v  v1,0(sp) <== Ditto
   vs1r.v  v1,0(a0)
   addi    sp,sp,32
   jr      ra

After this patch:
test:
   lui     a5,%hi(.LANCHOR0)
   addi    a5,a5,%lo(.LANCHOR0)
   li      a4,32
   addi    sp,sp,-32
   vsetvli zero,a4,e8,m1,ta,ma
   vle8.v  v1,0(a5)
   vs1r.v  v1,0(a0)
   addi    sp,sp,32
   jr      ra

The following tests pass with this patch:

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression test.

PR target/111720

gcc/ChangeLog:

* dse.cc (get_stored_val): Allow vector mode if the read
bitsize is less than stored bitsize.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111720-0.c: New test.
* gcc.target/riscv/rvv/base/pr111720-1.c: New test.
* gcc.target/riscv/rvv/base/pr111720-10.c: New test.
* gcc.target/riscv/rvv/base/pr111720-2.c: New test.
* gcc.target/riscv/rvv/base/pr111720-3.c: New test.
* gcc.target/riscv/rvv/base/pr111720-4.c: New test.
* gcc.target/riscv/rvv/base/pr111720-5.c: New test.
* gcc.target/riscv/rvv/base/pr111720-6.c: New test.
* gcc.target/riscv/rvv/base/pr111720-7.c: New test.
* gcc.target/riscv/rvv/base/pr111720-8.c: New test.
* gcc.target/riscv/rvv/base/pr111720-9.c: New test.

We're always getting the lowpart here AFAICT, and it appears that the
right thing should happen if gen_lowpart_common fails (it returns NULL,
which bubbles up and is the right return value from get_stored_val if it
can't be optimized).


Did you want to use known_le so that you'd pick up the case when the two
modes are the same size?  Or was known_lt the test you really wanted
(and if so, why?).



OK using known_lt, or known_le.  If you decide to change to known_le, 
you'll need to bootstrap & regression test again on x86.
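The known_lt / known_le question boils down to whether a read of exactly the stored size may also be forwarded. With constant sizes (so the poly_int comparisons reduce to plain integer comparisons), the effect can be sketched as below. Forwarding takes the low part of the stored bytes, modelled with memcpy under the assumption of a little-endian target where the low part starts at the store's address; the helper names are hypothetical, not dse.cc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* May a read of READ_BYTES at the store's address be satisfied from the
   stored value?  ALLOW_EQUAL selects known_le-like (true) or
   known_lt-like (false) behaviour.  */
static bool
can_forward (size_t read_bytes, size_t store_bytes, bool allow_equal)
{
  return allow_equal ? read_bytes <= store_bytes : read_bytes < store_bytes;
}

/* Copy the low READ_BYTES of STORED into OUT.  Return false when
   forwarding is not allowed, mimicking a NULL return from
   get_stored_val.  */
static bool
forward_lowpart (const unsigned char *stored, size_t store_bytes,
                 unsigned char *out, size_t read_bytes, bool allow_equal)
{
  if (!can_forward (read_bytes, store_bytes, allow_equal))
    return false;
  memcpy (out, stored, read_bytes);
  return true;
}
```

The only difference between the two predicates is the equal-size case: a 32-byte read of a 32-byte store is forwarded with the known_le-like test and rejected with the known_lt-like one.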




jeff

