Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Andrew Pinski
On Mon, Dec 4, 2023 at 4:58 PM Vincenzo Palazzo
 wrote:
>
> This commit is specifically targeting enhancements in
> Nix support for GCC development. This initiative stems
> from the recognized need within our community for a more
> streamlined and efficient development process when using Nix.

I think this is the wrong place for this.

>
> Signed-off-by: Vincenzo Palazzo 
> ---
>  flake.lock | 60 ++
>  flake.nix  | 35 +++
>  2 files changed, 95 insertions(+)
>  create mode 100644 flake.lock
>  create mode 100644 flake.nix
>
> diff --git a/flake.lock b/flake.lock
> new file mode 100644
> index 000..de713ff0da9
> --- /dev/null
> +++ b/flake.lock
> @@ -0,0 +1,60 @@
> +{
> +  "nodes": {
> +"flake-utils": {
> +  "inputs": {
> +"systems": "systems"
> +  },
> +  "locked": {
> +"lastModified": 1694529238,
> +"narHash": "sha256-zsNZZGTGnMOf9YpHKJqMSsa0dXbfmxeoJ7xHlrt+xmY=",
> +"owner": "numtide",
> +"repo": "flake-utils",
> +"rev": "ff7b65b44d01cf9ba6a71320833626af21126384",
> +"type": "github"
> +  },
> +  "original": {
> +"owner": "numtide",
> +"repo": "flake-utils",
> +"type": "github"
> +  }
> +},
> +"nixpkgs": {
> +  "locked": {
> +"lastModified": 1696095070,
> +"narHash": "sha256-iDx02dT+OHYYgaRGJxp2HXvzSHkA9l8/3O8GJB2wttU=",
> +"owner": "nixos",
> +"repo": "nixpkgs",
> +"rev": "1f0e8ac1f9a783c4cfa0515483094eeff4315fe2",
> +"type": "github"
> +  },
> +  "original": {
> +"owner": "nixos",
> +"repo": "nixpkgs",
> +"type": "github"
> +  }
> +},
> +"root": {
> +  "inputs": {
> +"flake-utils": "flake-utils",
> +"nixpkgs": "nixpkgs"
> +  }
> +},
> +"systems": {
> +  "locked": {
> +"lastModified": 1681028828,
> +"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
> +"owner": "nix-systems",
> +"repo": "default",
> +"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
> +"type": "github"
> +  },
> +  "original": {
> +"owner": "nix-systems",
> +"repo": "default",
> +"type": "github"
> +  }
> +}
> +  },
> +  "root": "root",
> +  "version": 7
> +}
> diff --git a/flake.nix b/flake.nix
> new file mode 100644
> index 000..b0ff1915adc
> --- /dev/null
> +++ b/flake.nix
> @@ -0,0 +1,35 @@
> +{
> +  description = "gcc compiler";
> +
> +  inputs = {
> +nixpkgs.url = "github:nixos/nixpkgs";
> +flake-utils.url = "github:numtide/flake-utils";
> +  };
> +
> +  outputs = { self, nixpkgs, flake-utils }:
> +flake-utils.lib.eachDefaultSystem (system:
> +  let pkgs = nixpkgs.legacyPackages.${system};
> +  in {
> +packages = {
> +  default = pkgs.gnumake;
> +};
> +formatter = pkgs.nixpkgs-fmt;
> +
> +devShell = pkgs.mkShell {
> +  buildInputs = [
> +pkgs.gnumake
> +pkgs.gcc13
> +
> +pkgs.gmp
> +pkgs.libmpc
> +pkgs.mpfr
> +pkgs.isl
> +pkgs.pkg-config
> +pkgs.autoconf-archive
> +pkgs.autoconf
> +pkgs.automake
> +  ];
> +};
> +  }
> +);
> +}
> --
> 2.43.0
>


Re: [gcc15] nested functions in C

2023-12-04 Thread Martin Uecker
On Monday, 2023-12-04 at 19:51 +0100, Jakub Jelinek wrote:
> On Mon, Dec 04, 2023 at 01:27:32PM -0500, Siddhesh Poyarekar wrote:
> > [Branching this into a separate conversation to avoid derailing the patch,
> > which isn't directly related]
> > 
> > On 2023-12-04 12:21, Martin Uecker wrote:
> > > I do not really agree with that.  Nested functions can substantially
> > > improve code quality and in C can avoid type-unsafe use of
> > > void* pointers in callbacks. The code is often much better with
> > > nested functions than without.  Nested functions and lambdas
> > > (i.e. anonymous nested functions) are used in many languages
> > > because they make code better, and GNU's nested functions are no
> > > exception.
> > > 
> > > So I disagree with the idea that discouraging nested functions leads
> > > to better code - I think the exact opposite is true.
> > 
> > I would argue that GNU's nested functions *are* an exception because they're
> > like feathers stuck on a pig to try and make it fly; I think a significant
> > specification effort is required to actually make it a cleanly usable
> > feature.
> 
> Why?  The syntax doesn't seem to be something unexpected, and as C doesn't
> have lambdas, one can use the nested functions instead.
> The only problem is if you need to pass function pointers somewhere else
> (and the target doesn't have function descriptors or something similar); if it
> is only done to make code more readable compared to, say, the use of macros, I
> think the nested functions are better - one doesn't have to worry about
> multiple evaluations of argument side-effects etc.  And if everything is
> inlined and SRA optimized, there is no extra cost.
> The problem of passing it as a function pointer to other functions is
> common with C++: only lambdas which don't capture anything are actually
> convertible to a function pointer; for anything else you need a template and
> have to instantiate it for a particular lambda (which is something you can't
> do in C).

In C++ you can erase the type with std::function.  C is missing a 
function pointer type which can encapsulate the static chain on
all archs (not only for nested functions, also for language 
interoperability).

Martin
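
A minimal GNU C sketch of the contrast discussed above (illustrative only, not
from the thread): the nested-function callback keeps its context through the
static chain instead of a void* cookie, at the price of the trampoline
machinery mentioned earlier when its address is passed along.  The qsort_r
variant assumes the glibc signature.

  #define _GNU_SOURCE
  #include <stdlib.h>

  /* Classic C approach: the context travels through a type-unsafe void*.  */
  struct ctx { int key; };

  static int
  cmp_with_cookie (const void *a, const void *b, void *cookie)
  {
    const struct ctx *c = cookie;   /* nothing checks what cookie really is */
    return (*(const int *) a % c->key) - (*(const int *) b % c->key);
  }

  void
  sort_mod_cookie (int *v, size_t n, int key)
  {
    struct ctx c = { key };
    qsort_r (v, n, sizeof *v, cmp_with_cookie, &c);   /* glibc-style qsort_r */
  }

  /* GNU C nested function: the context (key) is captured via the static
     chain, so the callback keeps a fully typed body and plain qsort can be
     used without a cookie parameter.  */
  void
  sort_mod (int *v, size_t n, int key)
  {
    int cmp (const void *a, const void *b)
    {
      return (*(const int *) a % key) - (*(const int *) b % key);
    }
    qsort (v, n, sizeof *v, cmp);   /* taking cmp's address builds a trampoline */
  }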

> 



[PATCH 10/17] [APX NDD] Support APX NDD for and insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

For the NDD form of the AND insn, three splitter fixes are needed after
extending the legacy patterns.

1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so the
optimization splitters that generate a highpart zero_extract for QImode need
to be prohibited for the NDD pattern.

2. The legacy AND insn uses the r/qm/L constraints, and a post-reload splitter
transforms it into a zero_extend move. For the NDD form of AND the splitter is
not strict enough: it assumes such an AND has a const_int operand matching the
"L" constraint, while the NDD form of AND allows a const_int with any QImode
value. Restrict the splitter condition to match the "L" constraint so that it
strictly matches the zero-extend semantics.

3. The legacy AND insn adopts the r/0/Z constraints, and a splitter tries to
optimize such a form into a strict_lowpart QImode AND when the 7th bit is not
set. But the splitter wrongly converts the non-zext NDD form of AND with a
memory src; the strict_lowpart transform then matches alternative 1 of
*_slp_1 and generates *movstrict_1, so the zext semantics are dropped. This
can leave the high part of the dest uncleared and generate wrong code. Disable
the splitter when NDD is adopted and operands[0] and operands[1] are not equal.
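
As a rough illustration of point 2 (hypothetical examples, not taken from the
apx-ndd.c test): the "L" constraint matches only the zero-extension masks
(0xff, 0xffff, 0xffffffff), for which the AND really is a zero-extending move,
while the NDD form also accepts other QImode-sized immediates that must stay a
real AND.

  unsigned long
  and_as_zext (unsigned long x)
  {
    return x & 0xffff;   /* mask matches "L": may split to a zero-extend move */
  }

  unsigned long
  and_stays_and (unsigned long x)
  {
    return x & 0x7f;     /* QImode-sized mask that is not an "L" value:
                            must remain an AND, NDD form included */
  }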

gcc/ChangeLog:

* config/i386/i386.md (and3): Add NDD alternatives and adjust
output template.
(*anddi_1): Likewise.
(*and_1): Likewise.
(*andqi_1): Likewise.
(*andsi_1_zext): Likewise.
(*anddi_2): Likewise.
(*andsi_2_zext): Likewise.
(*andqi_2_maybe_si): Likewise.
(*and_2): Likewise.
(*and3_doubleword): Add NDD alternative, emit move for optimized
case if operands[0] not equal to operands[1].
(define_split for QI highpart AND): Prohibit splitter to split NDD
form AND insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.
(define_split for zero_extend and optimization): Prohibit splitter to
split NDD form AND insn to zero_extend insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add and test.
---
 gcc/config/i386/i386.md | 175 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 2 files changed, 127 insertions(+), 61 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 050779273a7..64944a1163d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11705,18 +11705,19 @@ (define_expand "and3"
   (operands[0], gen_lowpart (mode, operands[1]),
mode, mode, 1));
   else
-ix86_expand_binary_operator (AND, mode, operands);
+ix86_expand_binary_operator (AND, mode, operands,
+TARGET_APX_NDD);
 
   DONE;
 })
 
 (define_insn_and_split "*and3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(and:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, mode, operands)"
+  "ix86_binary_operator_ok (AND, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -11728,39 +11729,53 @@ (define_insn_and_split "*and3_doubleword"
   if (operands[2] == const0_rtx)
 emit_move_insn (operands[0], const0_rtx);
   else if (operands[2] == constm1_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else
-ix86_expand_binary_operator (AND, mode, [0]);
+ix86_expand_binary_operator (AND, mode, [0],
+TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
 emit_move_insn (operands[3], const0_rtx);
   else if (operands[5] == constm1_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_insn (operands[3], operands[4]);
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
 }
   else
-ix86_expand_binary_operator (AND, mode, [3]);
+ix86_expand_binary_operator (AND, mode, [3],
+TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,?k")
(and:DI
-(match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
-(match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
+(match_operand:DI 1 

[PATCH 17/17] [APX NDD] Support TImode shift for NDD

2023-12-04 Thread Hongyu Wang
TImode shifts are split by splitter functions, which assume that
operands[0] and operands[1] are the same. For the NDD alternative the
assumption may not hold, so add split functions for NDD that emit the NDD
form instructions, and omit the handling of the !64bit target split.

Although the NDD form allows a memory src, for the post-reload splitter there
is no extra register available to accept an NDD form shift, especially for
shld/shrd. So only accept the register alternative for the shift src under NDD.
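
For reference, the kind of source that reaches these doubleword splitters is a
plain 128-bit shift (illustrative sketch only, not the new
apx-ndd-ti-shift.c test):

  /* With -mapxf the new ix86_split_ashl_ndd/ix86_split_rshift_ndd paths may
     place the result in registers different from the input operands.  */
  unsigned __int128
  shl (unsigned __int128 x, int n)
  {
    return x << n;    /* TImode ashl -> shld plus DImode shift after splitting */
  }

  __int128
  sar (__int128 x, int n)
  {
    return x >> n;    /* TImode ashiftrt -> shrd plus DImode shift after splitting */
  }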

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
function to split NDD form lshift.
(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
prototype.
(ix86_split_rshift_ndd): Likewise.
* config/i386/i386.md (ashl3_doubleword): Add NDD
alternative, call ndd split function when operands[0]
not equal to operands[1].
(define_split for doubleword lshift): Likewise.
(define_peephole for doubleword lshift): Likewise.
(3_doubleword): Likewise for l/ashiftrt.
(define_split for doubleword l/ashiftrt): Likewise.
(define_peephole for doubleword l/ashiftrt): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-ti-shift.c: New test.
---
 gcc/config/i386/i386-expand.cc| 136 ++
 gcc/config/i386/i386-protos.h |   2 +
 gcc/config/i386/i386.md   |  56 ++--
 .../gcc.target/i386/apx-ndd-ti-shift.c|  91 
 4 files changed, 273 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index d4bbd33ce07..a53d69d5400 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6678,6 +6678,142 @@ ix86_split_lshr (rtx *operands, rtx scratch, 
machine_mode mode)
 }
 }
 
+/* Helper function to split TImode ashl under NDD.  */
+void
+ix86_split_ashl_ndd (rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+{
+  count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+  if (count >= half_width)
+   {
+ count = count - half_width;
+ if (count == 0)
+   {
+ if (!rtx_equal_p (high[0], low[1]))
+   emit_move_insn (high[0], low[1]);
+   }
+ else if (count == 1)
+   emit_insn (gen_adddi3 (high[0], low[1], low[1]));
+ else
+   emit_insn (gen_ashldi3 (high[0], low[1], GEN_INT (count)));
+
+ ix86_expand_clear (low[0]);
+   }
+  else if (count == 1)
+   {
+ rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+ rtx x4 = gen_rtx_LTU (TImode, x3, const0_rtx);
+ emit_insn (gen_add3_cc_overflow_1 (DImode, low[0],
+low[1], low[1]));
+ emit_insn (gen_add3_carry (DImode, high[0], high[1], high[1],
+x3, x4));
+   }
+  else
+   {
+ emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+ GEN_INT (count)));
+ emit_insn (gen_ashldi3 (low[0], low[1], GEN_INT (count)));
+   }
+}
+  else
+{
+  emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+ operands[2]));
+  emit_insn (gen_ashldi3 (low[0], low[1], operands[2]));
+  if (TARGET_CMOVE && scratch)
+   {
+ ix86_expand_clear (scratch);
+ emit_insn (gen_x86_shift_adj_1
+(DImode, high[0], low[0], operands[2], scratch));
+   }
+  else
+   emit_insn (gen_x86_shift_adj_2 (DImode, high[0], low[0], operands[2]));
+}
+}
+
+/* Helper function to split TImode l/ashr under NDD.  */
+void
+ix86_split_rshift_ndd (enum rtx_code code, rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+  bool ashr_p = code == ASHIFTRT;
+  rtx (*gen_shr)(rtx, rtx, rtx) = ashr_p ? gen_ashrdi3
+: gen_lshrdi3;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+{
+  count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+  if (ashr_p && (count == GET_MODE_BITSIZE (TImode) - 1))
+   {
+ emit_insn (gen_shr (high[0], high[1],
+ GEN_INT (half_width - 1)));
+ emit_move_insn (low[0], high[0]);
+   }
+  else if (count >= half_width)
+   {
+ if (ashr_p)
+   emit_insn (gen_shr (high[0], high[1],
+   GEN_INT (half_width - 1)));
+  

[PATCH v7] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-04 Thread Marek Polacek
On Fri, Dec 01, 2023 at 07:43:35PM -0500, Jason Merrill wrote:
> On 12/1/23 18:37, Marek Polacek wrote:
> > On Thu, Nov 30, 2023 at 06:34:01PM -0500, Jason Merrill wrote:
> > > On 11/23/23 11:46, Marek Polacek wrote:
> > > > v5 greatly simplifies the code.
> > > 
> > > Indeed, it's much cleaner now.
> > > 
> > > > I still need a new ff_ flag to signal that we can return immediately
> > > > after seeing an i-e expr.
> > > 
> > > That's still not clear to me:
> > > 
> > > > +  /* In turn, maybe promote the function we find ourselves in...  
> > > > */
> > > > +  if ((data->flags & ff_find_escalating_expr)
> > > > + && DECL_IMMEDIATE_FUNCTION_P (decl)
> > > > + /* ...but not if the call to DECL was constant; that is the
> > > > +"an immediate invocation that is not a constant expression"
> > > > +case.  */
> > > > + && (e = cxx_constant_value (stmt, tf_none), e == 
> > > > error_mark_node))
> > > > +   {
> > > > + /* Since we had to set DECL_ESCALATION_CHECKED_P before the 
> > > > walk,
> > > > +we call promote_function_to_consteval directly which 
> > > > doesn't
> > > > +check unchecked_immediate_escalating_function_p.  */
> > > > + if (current_function_decl)
> > > > +   promote_function_to_consteval (current_function_decl);
> > > > + *walk_subtrees = 0;
> > > > + return stmt;
> > > > +   }
> > > 
> > > This is the one use of ff_find_escalating_expr, and it seems redundant 
> > > with
> > > the code immediately below, where we use complain (derived from
> > > ff_mce_false) to decide whether to return immediately.  Can we remove this
> > > hunk and the flag, and merge find_escalating_expr with cp_fold_immediate?
> > 
> > Ah, that works!  Hopefully done now.
> > > I think you want to walk the function body for three-ish reasons:
> > > 1) at EOF, to check for escalation
> > > 2) at EOF, to check for errors
> > > 3) at error time, to explain escalation
> > > 
> > > It's not clear to me that we need a flag to distinguish between them. When
> > > we encounter an immediate-escalating expression E:
> > > 
> > > A) if we're in an immediate-escalating function, escalate and return E 
> > > (#1,
> > > #3).
> > > B) otherwise, if we're diagnosing, error and continue (#2).
> > > C) otherwise, return E (individual expression mce_unknown walk from
> > > constexpr.cc).
> > > 
> > > > @@ -1178,11 +1388,19 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, 
> > > > void *data_
> > > > )
> > > >*walk_subtrees = 0;
> > > >/* Don't return yet, still need the cp_fold below.  */
> > > >  }
> > > > -  cp_fold_immediate_r (stmt_p, walk_subtrees, data);
> > > > +  else
> > > > +   cp_fold_immediate_r (stmt_p, walk_subtrees, data);
> > > >   }
> > > > *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
> > > > +  /* For certain trees, like +foo(), the cp_fold below will remove the 
> > > > +,
> > > 
> > > s/below/above/?
> > 
> > Fixed.
> > > > +/* We've stashed immediate-escalating functions.  Now see if they 
> > > > indeed
> > > > +   ought to be promoted to consteval.  */
> > > > +
> > > > +void
> > > > +process_pending_immediate_escalating_fns ()
> > > > +{
> > > > +  /* This will be null for -fno-immediate-escalation.  */
> > > > +  if (!deferred_escalating_exprs)
> > > > +return;
> > > > +
> > > > +  for (auto e : *deferred_escalating_exprs)
> > > > +if (TREE_CODE (e) == FUNCTION_DECL && !DECL_ESCALATION_CHECKED_P 
> > > > (e))
> > > > +  cp_fold_immediate (_SAVED_TREE (e), mce_false, e);
> > > > +}
> > > > +
> > > > +/* We've escalated every function that could have been promoted to
> > > > +   consteval.  Check that we are not taking the address of a consteval
> > > > +   function.  */
> > > > +
> > > > +void
> > > > +check_immediate_escalating_refs ()
> > > > +{
> > > > +  /* This will be null for -fno-immediate-escalation.  */
> > > > +  if (!deferred_escalating_exprs)
> > > > +return;
> > > > +
> > > > +  for (auto ref : *deferred_escalating_exprs)
> > > > +{
> > > > +  if (TREE_CODE (ref) == FUNCTION_DECL)
> > > > +   continue;
> > > > +  tree decl = (TREE_CODE (ref) == PTRMEM_CST
> > > > +  ? PTRMEM_CST_MEMBER (ref)
> > > > +  : TREE_OPERAND (ref, 0));
> > > > +  if (DECL_IMMEDIATE_FUNCTION_P (decl))
> > > > +   taking_address_of_imm_fn_error (ref, decl);
> > > > +}
> > > > +
> > > > +  deferred_escalating_exprs = nullptr;
> > > >   }
> > > 
> > > Could these be merged, so you do a single loop of cp_fold_immediate over
> > > function bodies or non-function expressions?  I'd expect that to work.
> > 
> > We seem to walk the hash table in a random order so I can't use one loop,
> > otherwise we could hit  before escalating f.
> 
> Is that a problem, since we recurse if we see a function that is still
> unchecked?

It's a problem if the first thing we encounter in the loop is 

[PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.

Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate an xor insn.

gcc/ChangeLog:

* config/i386/i386.md (3): Add new alternative for NDD
and adjust output templates.
(*_1): Likewise.
(*qi_1): Likewise.
(*notxor_1): Likewise.
(*si_1_zext): Likewise.
(*notxorqi_1): Likewise.
(*_2): Likewise.
(*si_2_zext): Likewise.
(*si_2_zext_imm): Likewise.
(*si_1_zext_imm): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*one_cmplsi2_2_zext): Likewise.
(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
operands[3].
(*3_doubleword): Add NDD constraints, emit move for
optimized case if operands[0] != operands[1] or operands[4]
!= operands[5].
(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
form OR/XOR insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add or and xor test.
---
 gcc/config/i386/i386.md | 186 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  26 
 2 files changed, 143 insertions(+), 69 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 64944a1163d..62cd21ee3d4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12698,17 +12698,19 @@ (define_expand "3"
   && !x86_64_hilo_general_operand (operands[2], mode))
 operands[2] = force_reg (mode, operands[2]);
 
-  ix86_expand_binary_operator (, mode, operands);
+  ix86_expand_binary_operator (, mode, operands,
+  TARGET_APX_NDD);
   DONE;
 })
 
 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(any_or:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -12720,20 +12722,29 @@ (define_insn_and_split "*3_doubleword"
   split_double_mode (mode, [0], 3, [0], [3]);
 
   if (operands[2] == const0_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else if (operands[2] == constm1_rtx)
 {
   if ( == IOR)
emit_move_insn (operands[0], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, [0]);
+   ix86_expand_unary_operator (NOT, mode, [0],
+   TARGET_APX_NDD);
 }
   else
-ix86_expand_binary_operator (, mode, [0]);
+ix86_expand_binary_operator (, mode, [0],
+TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_insn (operands[3], operands[4]);
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
 }
   else if (operands[5] == constm1_rtx)
@@ -12741,37 +12752,43 @@ (define_insn_and_split "*3_doubleword"
   if ( == IOR)
emit_move_insn (operands[3], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, [3]);
+   ix86_expand_unary_operator (NOT, mode, [3],
+   TARGET_APX_NDD);
 }
   else
-ix86_expand_binary_operator (, mode, [3]);
+ix86_expand_binary_operator (, mode, [3],
+TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,r,r,?k")
(any_or:SWI248
-(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
-(match_operand:SWI248 2 "" "r,,k")))
+(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,rm,r,k")
+(match_operand:SWI248 2 "" "r,,r,,k")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
   "@
{}\t{%2, %0|%0, %2}
{}\t{%2, %0|%0, %2}
+   {}\t{%2, %1, %0|%0, %1, %2}
+   {}\t{%2, %1, %0|%0, %1, 

[PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

APX NDD provides an extra destination register operand for several
gpr-related legacy insns, so a new alternative with the "r" constraint
can be adopted for operand 1.

This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers since lea has a shorter encoding. For
add operations containing a mem, NDD will be adopted to save an extra move.

The legacy x86 binary-operation expand forces operands[0] and
operands[1] to be the same, so add a helper-function flag to allow the NDD
form pattern, where operands[0] and operands[1] can be different.
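
A rough sketch of the saved move (illustrative only; the asm in the comment is
an Intel-syntax guess following the dest, src1, src2 operand order used by the
output templates in this series, not verified compiler output):

  /* Legacy two-operand add needs dest == src1, e.g. roughly
         mov  rax, rdi
         add  rax, QWORD PTR [rsi]
     while an NDD alternative can write a fresh destination directly,
     e.g. roughly
         add  rax, rdi, QWORD PTR [rsi]
     saving the extra move when a memory operand is involved.  */
  long
  add_mem (long a, long *p)
  {
    return a + *p;
  }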

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
new use_ndd flag to check whether ndd can be used for this binop
and adjust operand emit.
(ix86_binary_operator_ok): Likewise.
(ix86_expand_binary_operator): Likewise, and avoid post-reload
expand generating the lea pattern when use_ndd is explicitly passed.
* config/i386/i386-options.cc (ix86_option_override_internal):
Prohibit apx subfeatures when not in 64bit mode.
* config/i386/i386-protos.h (ix86_binary_operator_ok):
Add use_ndd flag.
(ix86_fixup_binary_operands): Likewise.
(ix86_expand_binary_operator): Likewise.
* config/i386/i386.md (*add_1): Extend with new alternatives
to support NDD, and adjust output template.
(*addhi_1): Likewise.
(*addqi_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: New test.
---
 gcc/config/i386/i386-expand.cc  |  19 ++---
 gcc/config/i386/i386-options.cc |   2 +
 gcc/config/i386/i386-protos.h   |   6 +-
 gcc/config/i386/i386.md | 102 ++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  21 +
 5 files changed, 96 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4bd7d4f39c8..3ecda989cf8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1260,14 +1260,14 @@ ix86_swap_binary_operands_p (enum rtx_code code, 
machine_mode mode,
   return false;
 }
 
-
 /* Fix up OPERANDS to satisfy ix86_binary_operator_ok.  Return the
destination to use for the operation.  If different from the true
-   destination in operands[0], a copy operation will be required.  */
+   destination in operands[0], a copy operation will be required except
+   under TARGET_APX_NDD.  */
 
 rtx
 ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1307,7 +1307,7 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 src1 = force_reg (mode, src1);
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 src1 = force_reg (mode, src1);
 
   /* Improve address combine.  */
@@ -1338,11 +1338,11 @@ ix86_fixup_binary_operands_no_copy (enum rtx_code code,
 
 void
 ix86_expand_binary_operator (enum rtx_code code, machine_mode mode,
-rtx operands[])
+rtx operands[], bool use_ndd)
 {
   rtx src1, src2, dst, op, clob;
 
-  dst = ix86_fixup_binary_operands (code, mode, operands);
+  dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   src1 = operands[1];
   src2 = operands[2];
 
@@ -1352,7 +1352,8 @@ ix86_expand_binary_operator (enum rtx_code code, 
machine_mode mode,
 
   if (reload_completed
   && code == PLUS
-  && !rtx_equal_p (dst, src1))
+  && !rtx_equal_p (dst, src1)
+  && !use_ndd)
 {
   /* This is going to be an LEA; avoid splitting it later.  */
   emit_insn (op);
@@ -1451,7 +1452,7 @@ ix86_expand_vector_logical_operator (enum rtx_code code, 
machine_mode mode,
 
 bool
 ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
-rtx operands[3])
+rtx operands[3], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1475,7 +1476,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode 
mode,
 return false;
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 /* Support "andhi/andsi/anddi" as a zero-extending move.  */
 return (code == AND
&& (mode == HImode
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 877659229d2..27f078790e7 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2129,6 +2129,8 @@ ix86_option_override_internal (bool main_args_p,
 
   if (TARGET_APX_F && !TARGET_64BIT)
 error 

[PATCH 07/17] [APX NDD] Support APX NDD for sbb insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

Similar to *add3_doubleword, operands[1] may not be equal to operands[0], so an
extra move is required.

gcc/ChangeLog:

* config/i386/i386.md (*sub3_doubleword): Add new alternative for
NDD, and emit move when operands[0] not equal to operands[1].
(*sub3_doubleword_zext): Likewise.
(*subv4_doubleword): Likewise.
(*subv4_doubleword_1): Likewise.
(*subv4_overflow_1): Add NDD alternatives and adjust output
templates.
(*subv4_overflow_2): Likewise.
(@sub3_carry): Likewise.
(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*subsi3_carry_zext): Likewise.
(subborrow): Parse TARGET_APX_NDD to ix86_binary_operator_ok.
(subborrow_0): Likewise.
(*sub3_eq): Likewise.
(*sub3_ne): Likewise.
(*sub3_eq_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-sbb.c: New test.
---
 gcc/config/i386/i386.md | 160 
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c |   6 +
 2 files changed, 107 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ea5377a0b38..e2705ada31a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7776,12 +7776,13 @@ (define_expand "sub3"
TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7805,16 +7806,18 @@ (define_insn_and_split "*sub3_doubleword"
   TARGET_APX_NDD);
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*sub3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,r,r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,o")
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7828,7 +7831,8 @@ (define_insn_and_split "*sub3_doubleword_zext"
   (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 (const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (mode, [0], 2, [0], 
[3]);")
+  "split_double_mode (mode, [0], 2, [0], [3]);"
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r")
@@ -8162,14 +8166,15 @@ (define_insn_and_split "*subv4_doubleword"
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "0,0"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r"))
(sign_extend:
- (match_operand: 2 "nonimmediate_operand" "r,o")))
+ (match_operand: 2 "nonimmediate_operand" "r,o,r,o")))
  (sign_extend:
(minus: (match_dup 1) (match_dup 2)
-   (set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(minus: (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -8197,22 +8202,24 @@ (define_insn_and_split "*subv4_doubleword"
 (match_dup 5)))])]
 {
   split_double_mode (mode, [0], 3, [0], [3]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*subv4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "0"))
+ (match_operand: 1 "nonimmediate_operand" "0,ro"))
(match_operand: 3 

Re: [PATCH v4] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-04 Thread Manos Anagnostakis
On Mon, Dec 4, 2023, 21:22 Richard Sandiford <
richard.sandif...@arm.com> wrote:

> Manos Anagnostakis  writes:
> > This is an RTL pass that detects store forwarding from stores to larger
> loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for
> some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and
> enabled by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this
> option.
> >
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
> pass.
> > * config/aarch64/aarch64-protos.h
> (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
> option.
> >   (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v4:
> >   - I had problems to make cselib_subst_to_values work correctly
> > so I used cselib_lookup to implement the exact same behaviour and
> > record the store value at the time we iterate over it.
> >   - Removed the store/load_mem_addr check from is_forwarding as
> > unnecessary.
> >   - The pass is called on all optimization levels right now.
> >   - The threshold check should remain as it is as we only care for
> > the front element of the list. The comment above the check
> explains
> > why a single if is enough.
>
> I still think this is structurally better as a while.  There's no reason
> in principle why we wouldn't want to record the stores in:
>
> stp x0, x1, [x4, #8]
> ldp x0, x1, [x4, #0]
> ldp x2, x3, [x4, #16]
>
> and then the two stores should have the same distance value.
> I realise we don't do that yet, but still.
>
Ah, you mean forwarding from stp. I was a bit confused with what you meant
the previous time. This was not initially meant for this patch, but I think
it wouldn't take long to implement that before pushing this. It is your
call of course if I should include it.

>
> >   - The documentation changes requested.
> >   - Adjusted a comment.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 321 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 452 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 748430194f3..2ee3b61c4fa 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> >   cxx_target_objs="aarch64-c.o"
> >   d_target_objs="aarch64-d.o"
> >   extra_objs="aarch64-builtins.o aarch-common.o
> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o
> aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o
> cortex-a57-fma-steering.o aarch64-speculation.o
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> > + extra_objs="${extra_objs} aarch64-store-forwarding.o"
> >   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc
> 

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-04 Thread Hongtao Liu
On Mon, Dec 4, 2023 at 3:51 PM Uros Bizjak  wrote:
>
> On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu  wrote:
> >
> > On Fri, Dec 1, 2023 at 10:26 PM Richard Biener
> >  wrote:
> > >
> > > On Fri, Dec 1, 2023 at 3:39 AM liuhongt  wrote:
> > > >
> > > > > Hmm, I would suggest you put reg_needed into the class and accumulate
> > > > > over all vec_construct, with your patch you pessimize a single v32qi
> > > > > over two separate v16qi for example.  Also currently the whole block 
> > > > > is
> > > > > gated with INTEGRAL_TYPE_P but register pressure would be also
> > > > > a concern for floating point vectors.  finish_cost would then apply an
> > > > > adjustment.
> > > >
> > > > Changed.
> > > >
> > > > > 'target_avail_regs' is for GENERAL_REGS, does that include APX regs?
> > > > > I don't see anything similar for FP regs, but I guess the target 
> > > > > should know
> > > > > or maybe there's a #regs in regclass query already.
> > > > Haven't seen any, so I used the setting below.
> > > >
> > > > unsigned target_avail_sse = TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 
> > > > 8;
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > No big impact on SPEC2017.
> > > > Observed one big improvement in another benchmark by avoiding
> > > > vectorization with
> > > > a v32qi vec_construct which caused lots of spills.
> > > >
> > > > Ok for trunk?
> > >
> > > LGTM, let's see what x86 maintainers think.
> > +Honza and Uros.
> > Any comments?
>
> I have no comment on vector stuff, I think you are the most
> experienced developer in this area.
Thanks, committed.
>
> Uros.
>
> > >
> > > Richard.
> > >
> > > > For vec_construct, the components must be live at the same time if
> > > > they're not loaded from memory; when the number of those components
> > > > exceeds the available registers, spills happen. Try to account for that
> > > > with a rough estimation.
> > > > ??? Ideally, we should have an overall estimation of register pressure
> > > > if we know the live ranges of all variables.
> > > >
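As an illustrative aside (not from the thread), the concern above shows up in
source like the following, where SLP turns a contiguous store group into one
vector store and must build the vector from scalars that are not loaded from
memory, so all of them are live at once:

  /* Hypothetical sketch: building a v16qi from 16 computed/parameter scalars
     keeps 16 values live in GPRs right before the vec_construct; scaling this
     up to 32 lanes (v32qi) is the kind of case where the spills were
     observed.  */
  void
  store16 (unsigned char *restrict out,
           unsigned char a, unsigned char b, unsigned char c, unsigned char d,
           unsigned char e, unsigned char f, unsigned char g, unsigned char h,
           unsigned char i, unsigned char j, unsigned char k, unsigned char l,
           unsigned char m, unsigned char n, unsigned char o, unsigned char p)
  {
    out[0] = a;  out[1] = b;  out[2] = c;  out[3] = d;
    out[4] = e;  out[5] = f;  out[6] = g;  out[7] = h;
    out[8] = i;  out[9] = j;  out[10] = k; out[11] = l;
    out[12] = m; out[13] = n; out[14] = o; out[15] = p;
  }
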
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> > > > Count sse_reg/gpr_regs for components not loaded from memory.
> > > > (ix86_vector_costs::ix86_vector_costs): New constructor.
> > > > (ix86_vector_costs::m_num_gpr_needed[3]): New private member.
> > > > (ix86_vector_costs::m_num_sse_needed[3]): Ditto.
> > > > (ix86_vector_costs::finish_cost): Estimate overall register
> > > > pressure cost.
> > > > (ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
> > > > function.
> > > > ---
> > > >  gcc/config/i386/i386.cc | 54 ++---
> > > >  1 file changed, 50 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index 9390f525b99..dcaea6c2096 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -24562,15 +24562,34 @@ ix86_noce_conversion_profitable_p (rtx_insn 
> > > > *seq, struct noce_if_info *if_info)
> > > >  /* x86-specific vector costs.  */
> > > >  class ix86_vector_costs : public vector_costs
> > > >  {
> > > > -  using vector_costs::vector_costs;
> > > > +public:
> > > > +  ix86_vector_costs (vec_info *, bool);
> > > >
> > > >unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
> > > >   stmt_vec_info stmt_info, slp_tree node,
> > > >   tree vectype, int misalign,
> > > >   vect_cost_model_location where) override;
> > > >void finish_cost (const vector_costs *) override;
> > > > +
> > > > +private:
> > > > +
> > > > +  /* Estimate register pressure of the vectorized code.  */
> > > > +  void ix86_vect_estimate_reg_pressure ();
> > > > +  /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used 
> > > > for
> > > > + estimation of register pressure.
> > > > + ??? Currently it's only used by vec_construct/scalar_to_vec
> > > > + where we know it's not loaded from memory.  */
> > > > +  unsigned m_num_gpr_needed[3];
> > > > +  unsigned m_num_sse_needed[3];
> > > >  };
> > > >
> > > > +ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool 
> > > > costing_for_scalar)
> > > > +  : vector_costs (vinfo, costing_for_scalar),
> > > > +m_num_gpr_needed (),
> > > > +m_num_sse_needed ()
> > > > +{
> > > > +}
> > > > +
> > > >  /* Implement targetm.vectorize.create_costs.  */
> > > >
> > > >  static vector_costs *
> > > > @@ -24748,8 +24767,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > > > vect_cost_for_stmt kind,
> > > >  }
> > > >else if ((kind == vec_construct || kind == scalar_to_vec)
> > > >&& node
> > > > -  && SLP_TREE_DEF_TYPE (node) == vect_external_def
> > > > -  && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> > > > +  && SLP_TREE_DEF_TYPE (node) == vect_external_def)

[PATCH 05/17] [APX NDD] Support APX NDD for adc insns

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

Legacy adc patterns are commonly adopted for TImode add. When extending the
TImode add to the NDD version, operands[0] and operands[1] can be different, so
an extra move should be emitted if those patterns have an optimization when
adding const0_rtx.

NDD instructions automatically zero-extend the dest register to 64 bits, so the
zext patterns can adopt all NDD forms that have a memory src input.

gcc/ChangeLog:

* config/i386/i386.md (*add3_doubleword): Add ndd constraints, and
move operands[1] to operands[0] when they are not equal.
(*add3_doubleword_cc_overflow_1): Likewise.
(*add3_doubleword_zext): Add ndd constraints.
(*addv4_doubleword): Likewise.
(*addv4_doubleword_1): Likewise.
(addv4_overflow_1): Likewise.
(*addv4_overflow_2): Likewise.
(@add3_carry): Likewise.
(*add3_carry_0): Likewise.
(*addsi3_carry_zext): Likewise.
(addcarry): Likewise.
(addcarry_0): Likewise.
(*addcarry_1): Likewise.
(*add3_eq): Likewise.
(*add3_ne): Likewise.
(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-adc.c: New test.
---
 gcc/config/i386/i386.md | 191 
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c |  15 ++
 2 files changed, 134 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6b316e698bb..358a3857f89 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6291,12 +6291,12 @@ (define_expand "add3"
TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*add3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(plus:
- (match_operand: 1 "nonimmediate_operand" "%0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" 
"r,o,r,r")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, mode, operands)"
+  "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6316,24 +6316,34 @@ (define_insn_and_split "*add3_doubleword"
   split_double_mode (mode, [0], 3, [0], [3]);
   if (operands[2] == const0_rtx)
 {
+  /* Under NDD op0 and op1 may not equal, do not delete insn then.  */
+  bool emit_insn_deleted_note_p = true;
+  if (!rtx_equal_p (operands[0], operands[1]))
+   {
+ emit_move_insn (operands[0], operands[1]);
+ emit_insn_deleted_note_p = false;
+   }
   if (operands[5] != const0_rtx)
-   ix86_expand_binary_operator (PLUS, mode, [3]);
+   ix86_expand_binary_operator (PLUS, mode, [3],
+TARGET_APX_NDD);
   else if (!rtx_equal_p (operands[3], operands[4]))
emit_move_insn (operands[3], operands[4]);
-  else
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,r,r")
(plus:
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r")) 
- (match_operand: 1 "nonimmediate_operand" "0,0")))
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,m")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+   TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6349,7 +6359,8 @@ (define_insn_and_split "*add3_doubleword_zext"
   (match_dup 4))
 (const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
- "split_double_mode (mode, [0], 2, [0], [3]);")
+ "split_double_mode (mode, [0], 2, [0], [3]);"
+ [(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_concat"
   [(set (match_operand: 0 "register_operand" "=")
@@ -7411,14 +7422,14 @@ (define_insn_and_split "*addv4_doubleword"
(eq:CCO
  (plus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "%0,0"))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r"))
(sign_extend:
- (match_operand: 2 "nonimmediate_operand" "r,o")))
+ (match_operand: 2 "nonimmediate_operand" "r,o,r,o")))
  

Re: [PATCH] RISC-V: Add blocker for gather/scatter auto-vectorization

2023-12-04 Thread Robin Dapp
OK.

Regards
 Robin



[PATCH] c++: Fix parsing [[]][[]];

2023-12-04 Thread Jakub Jelinek
Hi!

When working on the previous patch I put [[]] [[]] asm (""); into a
testcase, but was surprised it wasn't parsed.
The problem is that when cp_parser_std_attribute_spec returns NULL, it
can mean two different things: either the next token(s) are neither
[[ nor alignas (in that case the caller should break from the loop),
or we parsed something like [[]] - a valid attribute specifier that
just didn't specify any attributes.

The following patch fixes that by adding another parameter to differentiate
between the cases; I guess another option would be to use some magic
tree value for the break case instead of NULL_TREE (but error_mark_node is
already taken and means something else).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or shall I go with some magic return which will never happen otherwise?
void_node?

2023-12-05  Jakub Jelinek  

* parser.cc (cp_parser_std_attribute_spec): Add ANY_P argument, set
what it points to initially to true and only if token is neither
CPP_OPEN_SQUARE nor RID_ALIGNAS CPP_KEYWORD set it to false.
(cp_parser_std_attribute_spec_seq): Adjust
cp_parser_std_attribute_spec caller.  If it returns NULL_TREE and
any_p is true, continue rather than break.

* g++.dg/cpp0x/gen-attrs-79.C: New test.

--- gcc/cp/parser.cc.jj 2023-12-04 20:23:53.225009856 +0100
+++ gcc/cp/parser.cc2023-12-04 20:49:21.160426104 +0100
@@ -2703,7 +2703,7 @@ static tree cp_parser_gnu_attribute_list
 static tree cp_parser_std_attribute
   (cp_parser *, tree);
 static tree cp_parser_std_attribute_spec
-  (cp_parser *);
+  (cp_parser *, bool *);
 static tree cp_parser_std_attribute_spec_seq
   (cp_parser *);
 static size_t cp_parser_skip_std_attribute_spec_seq
@@ -30265,11 +30265,12 @@ void cp_parser_late_contract_condition (
 conditional-expression ] ]  */
 
 static tree
-cp_parser_std_attribute_spec (cp_parser *parser)
+cp_parser_std_attribute_spec (cp_parser *parser, bool *any_p)
 {
   tree attributes = NULL_TREE;
   cp_token *token = cp_lexer_peek_token (parser->lexer);
 
+  *any_p = true;
   if (token->type == CPP_OPEN_SQUARE
   && cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_OPEN_SQUARE)
 {
@@ -30342,7 +30343,10 @@ cp_parser_std_attribute_spec (cp_parser
 
   if (token->type != CPP_KEYWORD
  || token->keyword != RID_ALIGNAS)
-   return NULL_TREE;
+   {
+ *any_p = false;
+ return NULL_TREE;
+   }
 
   cp_lexer_consume_token (parser->lexer);
   maybe_warn_cpp0x (CPP0X_ATTRIBUTES);
@@ -30414,9 +30418,16 @@ cp_parser_std_attribute_spec_seq (cp_par
 
   while (true)
 {
-  tree attr_spec = cp_parser_std_attribute_spec (parser);
+  bool any_p;
+  tree attr_spec = cp_parser_std_attribute_spec (parser, _p);
   if (attr_spec == NULL_TREE)
-   break;
+   {
+ /* Accept [[]][[]]; for which cp_parser_std_attribute_spec
+also returns NULL_TREE as there are no attributes.  */
+ if (any_p)
+   continue;
+ break;
+   }
   if (attr_spec == error_mark_node)
return error_mark_node;
 
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C.jj2023-12-04 
20:38:35.122574430 +0100
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-79.C   2023-12-04 20:38:29.468654143 
+0100
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++11 } }
+
+[[]] [[]];
+
+[[]] [[]] void
+foo ()
+{
+  [[]] [[]];
+}

Jakub



Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Vincenzo Palazzo
Ciao all,

>+1.  I think this is best left to the distros.

What do you mean? This is not a package, it is an env shell in order
to build and work on GCC on NixOS.

NixOS already has the packages for GCC.

Cheers,

   Vincent.

On Tue, Dec 5, 2023 at 2:07 AM Jeff Law  wrote:
>
>
>
> On 12/4/23 18:02, Andrew Pinski wrote:
> > On Mon, Dec 4, 2023 at 4:58 PM Vincenzo Palazzo
> >  wrote:
> >>
> >> This commit is specifically targeting enhancements in
> >> Nix support for GCC development. This initiative stems
> >> from the recognized need within our community for a more
> >> streamlined and efficient development process when using Nix.
> >
> > I think this is the wrong place for this.
> +1.  I think this is best left to the distros.
>
> jeff


Re: [patch-1, rs6000] enable fctiw on old archs [PR112707]

2023-12-04 Thread Kewen.Lin
Hi Haochen,

on 2023/12/1 10:41, HAO CHEN GUI wrote:
> Hi,
>   SImode in float register is supported on P7 above. It causes "fctiw"
> can be generated on old 32-bit processors as the output operand of

typo?  I guess you meant to say "can NOT"?

> fctiw insn is a SImode in float/double register. This patch fixes the
> problem by adding an expand and an insn pattern for fctiw. The output
> of new pattern is SFmode. When the target doesn't support SImode in
> float register, it calls the new pattern and convert the SFmode to
> SImode via stack.

Assuming that due to the inconsistent ISA support levels between stfiwx
and lfiw[az]x, it's not practical to support SImode in FP regs.

> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
> no regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: enable fctiw on old archs
> 
> The powerpc 32-bit processors (e.g. 5470) support the "fctiw" instruction,
> but the instruction can't be generated on such platforms as the insn is
> guarded by TARGET_POPCNTD.  The root cause is that SImode in a float register
> is only supported from Power7 on.  Actually the implementation of "fctiw" only
> needs stfiwx, which is supported by the old 32-bit processors.  This patch
> enables the "fctiw" expand for these processors.
> 
> gcc/
>   PR target/112707
>   * config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New.
>   (expand lrintsi2): New.
>   (insn lrintsi2): Rename to...
>   (lrintsi_internal): ...this, and remove guard TARGET_POPCNTD.
>   (lrintsi_internal2): New.
>   (stfiwx_sf): New.
> 
> gcc/testsuite/
>   PR target/112707
>   * gcc.target/powerpc/pr112707-1.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index d4337ce42a9..1b207522ad5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -90,6 +90,7 @@ (define_c_enum "unspec"
> UNSPEC_TLSTLS_PCREL
> UNSPEC_FIX_TRUNC_TF   ; fadd, rounding towards zero
> UNSPEC_STFIWX
> +   UNSPEC_STFIWX_SF
> UNSPEC_POPCNTB
> UNSPEC_FRES
> UNSPEC_SP_SET
> @@ -111,6 +112,7 @@ (define_c_enum "unspec"
> UNSPEC_PARITY
> UNSPEC_CMPB
> UNSPEC_FCTIW
> +   UNSPEC_FCTIW_SF
> UNSPEC_FCTID
> UNSPEC_LFIWAX
> UNSPEC_LFIWZX
> @@ -6722,11 +6724,39 @@ (define_insn "lrintdi2"
>"fctid %0,%1"
>[(set_attr "type" "fp")])
> 
> -(define_insn "lrintsi2"
> +(define_expand "lrintsi2"
>[(set (match_operand:SI 0 "gpc_reg_operand" "=d")
>   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
>  UNSPEC_FCTIW))]
> -  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
> +  "TARGET_HARD_FLOAT && TARGET_STFIWX"
> +{
> +  /* For those old archs in which SImode can't be hold in float registers,
> + call lrintsi_internal2 to put the result in SFmode then
> + convert it via stack.  */
> +  if (!TARGET_POPCNTD)
> +{
> +  rtx tmp = gen_reg_rtx (SFmode);
> +  emit_insn (gen_lrintsi_internal2 (tmp, operands[1]));

Considering some existing supports eg: "fix_truncsi2_stfiwx" adopting
DImode, I think we can do the similar thing here, ie:

  rtx tmp = gen_reg_rtx (DImode);
  emit_insn (gen_lrintsi_di (tmp, operands[1]));

> +  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
> +  emit_insn (gen_stfiwx_sf (stack, tmp));

  ...
  emit_insn (gen_stfiwx (stack, tmp));

Theoretically even if !TARGET_STFIWX, we can still save the fpr into
memory and load the appropriate 4 bytes from that, but TARGET_STFIWX (PPC)
is quite old already, introducing such complexity here seems not worthy.

> +  emit_move_insn (operands[0], stack);
> +  DONE;
> +}
> +})
> +
> +(define_insn "lrintsi_internal"

Nit: This can be unnamed, maybe something like "*lrintsi"?

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
> + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
> +UNSPEC_FCTIW))]
> +  "TARGET_HARD_FLOAT"

Nit: Add "&& TARGET_POPCNTD" to the condition.

> +  "fctiw %0,%1"
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "lrintsi_internal2"

can be: lrintsi_di

> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=d")
> + (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "")]
> +UNSPEC_FCTIW_SF))]

Use DI to replace SF here and just still use UNSPEC_FCTIW.

> +  "TARGET_HARD_FLOAT"

Add "&& !TARGET_POPCNTD" to the condition.

>"fctiw %0,%1"
>[(set_attr "type" "fp")])
> 
> @@ -6801,6 +6831,14 @@ (define_insn "stfiwx"
>[(set_attr "type" "fpstore")
> (set_attr "isa" "*,p8v")])
> 
> +(define_insn "stfiwx_sf"
> +  [(set (match_operand:SI 0 "memory_operand" "=Z")
> + (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
> +UNSPEC_STFIWX_SF))]
> +  "TARGET_STFIWX"
> +  "stfiwx %1,%y0"
> +  [(set_attr "type" "fpstore")])

Then this part isn't needed.

> +
>  ;; If we don't have a direct conversion to single precision, don't enable 
> this
>  

[pushed] c++: fix constexpr noreturn diagnostic

2023-12-04 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Mentioning a noreturn function does not involve an lvalue-rvalue
conversion.

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Fix
check for loading volatile lvalue.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-noreturn1.C: New test.
---
 gcc/cp/constexpr.cc  |  3 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-noreturn1.C | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-noreturn1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index b17e176aded..96c61666470 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9387,7 +9387,8 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
available, so we don't bother with switch tracking.  */
 return true;
 
-  if (TREE_THIS_VOLATILE (t) && want_rval)
+  if (TREE_THIS_VOLATILE (t) && want_rval
+  && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (t)))
 {
   if (flags & tf_error)
constexpr_error (loc, fundef_p, "lvalue-to-rvalue conversion of "
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-noreturn1.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-noreturn1.C
new file mode 100644
index 000..08c10e8dccb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-noreturn1.C
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -Winvalid-constexpr }
+
+// We were giving a wrong error about loading a volatile value instead of the
+// proper error about calling a non-constexpr function.
+
+[[noreturn]] void f();
+
+constexpr int g()
+{
+  return f(), 42; // { dg-message "call to non-'constexpr' function" }
+}

base-commit: b6abc5dbfa5342347828b9feb4d9060071ff819c
-- 
2.39.3



Re: [PATCH] htdocs/git.html: correct spelling and use git in example

2023-12-04 Thread Joseph Myers
On Fri, 1 Dec 2023, Jonny Grant wrote:

> 
> 
> On 30/11/2023 23:56, Joseph Myers wrote:
> > On Thu, 30 Nov 2023, Jonny Grant wrote:
> > 
> >> ChangeLog:
> >>
> >>htdocs/git.html: change example to use git:// and correct
> >>spelling repostiory -> repository .
> > 
> > git:// (unencrypted / unauthenticated) is pretty widely considered 
> > obsolescent, I'm not sure adding a use of it (as opposed to changing any 
> > existing examples to use a secure connection mechanism) is a good idea.
> > 
> 
> Hi Joseph
> 
> Thank you for your review.
> 
> Good point. I changed the ssh:// example because it doesn't work with 
> anonymous access.
> How about changing both to https:// ?

Using https:// makes sense for examples for anonymous access, yes.
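
(E.g. something like "git clone https://gcc.gnu.org/git/gcc.git", assuming 
the anonymous read-only URL the page already documents.)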

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Jeff Law




On 12/4/23 18:02, Andrew Pinski wrote:

On Mon, Dec 4, 2023 at 4:58 PM Vincenzo Palazzo
 wrote:


This commit is specifically targeting enhancements in
Nix support for GCC development. This initiative stems
from the recognized need within our community for a more
streamlined and efficient development process when using Nix.


I think this is wrong place for this.

+1.  I think this is best left to the distros.

jeff


Re: [PATCH] Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]

2023-12-04 Thread Paul Richard Thomas
Hi Harald,

The patch is OK for mainline.

Thanks

Paul


On Mon, 4 Dec 2023 at 22:47, Harald Anlauf  wrote:

> Dear all,
>
> the attached patch picks up an observation by Tobias that we did
> not specify the RESTRICT qualifier for optional arguments even
> if that was allowed.  In principle this might have prevented
> better optimization.
>
> While looking more closely, I found and fixed an issue with CLASS
> dummy arguments that mishandled this.  This revealed a few cases
> in the testsuite that were matching the wrong patterns...
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>
>


[r14-6114 Regression] FAIL: gcc.dg/tree-ssa/ssa-sink-16.c (test for excess errors) on Linux/x86_64

2023-12-04 Thread haochen.jiang
On Linux/x86_64,

de0ab339a795352c843f6e9b2dfce222f26588de is the first bad commit
commit de0ab339a795352c843f6e9b2dfce222f26588de
Author: Richard Biener 
Date:   Mon Dec 4 10:46:11 2023 +0100

tree-optimization/112827 - corrupt SCEV cache during SCCP

caused

FAIL: gcc.dg/tree-ssa/ssa-sink-16.c (internal compiler error: verify_gimple 
failed)
FAIL: gcc.dg/tree-ssa/ssa-sink-16.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6114/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-sink-16.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-sink-16.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-sink-16.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-sink-16.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the 
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]

2023-12-04 Thread Jakub Jelinek
Hi!

Seems in 2017 attribute-specifier-seq[opt] was added to asm-declaration
and the change was voted in as a DR.

The following patch implements it by parsing the attributes and warning
about them.
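
For example (illustrative only, not one of the new testcases), something
like:

  void foo () { [[gnu::unused]] asm (""); }

is now parsed as an asm-definition, with the attributes diagnosed under
-Wattributes instead of being rejected.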

I found one attribute parsing bug I'll send a fix for momentarily.

And there is another thing I wonder about: with -Wno-attributes= we are
supposed to ignore the attributes altogether, but we are actually still
warning about them when we emit these generic warnings about ignoring
all attributes which appertain to this and that (perhaps with some
exceptions we first remove from the attribute chain), like:
void foo () { [[foo::bar]]; }
with -Wattributes -Wno-attributes=foo::bar
Shouldn't we call some helper function in cases like this and warn
not when std_attrs (or however the attribute chain variable is named) is non-NULL,
but if it is non-NULL and contains at least one non-attribute_ignored_p
attribute?  cp_parser_declaration at least tries:
  if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs))
warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
OPT_Wattributes, "attribute ignored");
but attribute_ignored_p here checks the first attribute rather than the
whole chain.  So it will incorrectly not warn if there is an ignored
attribute followed by a non-ignored one.
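
Such a helper could look roughly like this (just a sketch, the name is
made up, using the tree overload of attribute_ignored_p):

  /* Sketch only: true if at least one attribute in the chain is not
     ignored via -Wno-attributes=.  */
  static bool
  any_nonignored_attribute_p (tree attrs)
  {
    for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
      if (!attribute_ignored_p (attr))
	return true;
    return false;
  }

and the warnings above would then test that instead of just
std_attrs != NULL_TREE.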

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-05  Jakub Jelinek  

PR c++/110734
* parser.cc (cp_parser_block_declaration): Implement C++ DR 2262
- Attributes for asm-definition.  Call cp_parser_asm_definition
even if RID_ASM token is only seen after sequence of standard
attributes.
(cp_parser_asm_definition): Parse standard attributes before
RID_ASM token and warn for them with -Wattributes.

* g++.dg/DRs/dr2262.C: New test.
* g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors
on attributes on asm definitions.
* g++.dg/gomp/attrs-11.C: Remove 2 expected errors.

--- gcc/cp/parser.cc.jj 2023-12-04 08:59:06.871357329 +0100
+++ gcc/cp/parser.cc2023-12-04 20:23:53.225009856 +0100
@@ -15398,7 +15398,6 @@ cp_parser_block_declaration (cp_parser *
   /* Peek at the next token to figure out which kind of declaration is
  present.  */
   cp_token *token1 = cp_lexer_peek_token (parser->lexer);
-  size_t attr_idx;
 
   /* If the next keyword is `asm', we have an asm-definition.  */
   if (token1->keyword == RID_ASM)
@@ -15452,22 +15451,36 @@ cp_parser_block_declaration (cp_parser *
   /* If the next token is `static_assert' we have a static assertion.  */
   else if (token1->keyword == RID_STATIC_ASSERT)
 cp_parser_static_assert (parser, /*member_p=*/false);
-  /* If the next tokens after attributes is `using namespace', then we have
- a using-directive.  */
-  else if ((attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1)) != 1
-  && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx,
-RID_USING)
-  && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx + 1,
-RID_NAMESPACE))
+  else
 {
-  if (statement_p)
-   cp_parser_commit_to_tentative_parse (parser);
-  cp_parser_using_directive (parser);
+  size_t attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1);
+  cp_token *after_attr = NULL;
+  if (attr_idx != 1)
+   after_attr = cp_lexer_peek_nth_token (parser->lexer, attr_idx);
+  /* If the next tokens after attributes is `using namespace', then we have
+a using-directive.  */
+  if (after_attr
+ && after_attr->keyword == RID_USING
+ && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx + 1,
+   RID_NAMESPACE))
+   {
+ if (statement_p)
+   cp_parser_commit_to_tentative_parse (parser);
+ cp_parser_using_directive (parser);
+   }
+  /* If the next token after attributes is `asm', then we have
+an asm-definition.  */
+  else if (after_attr && after_attr->keyword == RID_ASM)
+   {
+ if (statement_p)
+   cp_parser_commit_to_tentative_parse (parser);
+ cp_parser_asm_definition (parser);
+   }
+  /* Anything else must be a simple-declaration.  */
+  else
+   cp_parser_simple_declaration (parser, !statement_p,
+ /*maybe_range_for_decl*/NULL);
 }
-  /* Anything else must be a simple-declaration.  */
-  else
-cp_parser_simple_declaration (parser, !statement_p,
- /*maybe_range_for_decl*/NULL);
 }
 
 /* Parse a simple-declaration.
@@ -22424,6 +22437,7 @@ cp_parser_asm_definition (cp_parser* par
   bool invalid_inputs_p = false;
   bool invalid_outputs_p = false;
   required_token missing = RT_NONE;
+  tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
   location_t asm_loc = 

RE: [PATCH 3/3] MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are the same

2023-12-04 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Biener 
> Sent: Monday, December 4, 2023 6:22 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 3/3] MATCH: (convert)(zero_one !=/== 0/1) for outer
> type and zero_one type are the same
> 
> On Sat, Dec 2, 2023 at 7:38 AM Andrew Pinski 
> wrote:
> >
> > When I moved two_value to match.pd, I removed the check for the
> > {0,+-1} as I had placed it after the {0,+-1} case for cond in match.pd.
> > In the case of {0,+-1} and non-boolean, before we would optimize those
> > cases to just `(convert)a` but after we would get `(convert)(a != 0)`,
> > which was not handled anyway to just `(convert)a`.
> > So this adds a pattern to match `(convert)(zeroone != 0)` and simplify
> > to `(convert)zeroone`.
> >
> > Also this optimizes (convert)(zeroone == 0) into (zeroone^1) if the
> > type match. This can only be done on the gimple level as if zeroone
> > was defined by (a&1), fold will convert (a&1)^1 back into
> > `(convert)(zeroone == 0)` and an infinite loop will happen.
> 
> So fold converts (a&1)^1 to (convert)(a&1 == 0)?  Can we fix (remove) this
> instead or do we rely on that?

I have not tried to remove it but I will try to see if we depend on this.

Thanks,
Andrew


> 
> > Note the testcase pr69270.c needed a slight update due to not matching
> > exactly a scan pattern, this update makes it more robust and will
> > match before and afterwards and if there are other changes in this area too.
> >
> > Note the testcase gcc.target/i386/pr110790-2.c needs a slight update
> > for better code generation in LP64 bit mode.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> Otherwise OK.
> 
> Thanks,
> Richard.
> 
> > gcc/ChangeLog:
> >
> > PR tree-optimization/111972
> > PR tree-optimization/110637
> > * match.pd (`(convert)(zeroone !=/== CST)`): Match
> > and simplify to ((convert)zeroone){,^1}.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/pr110637-1.c: New test.
> > * gcc.dg/tree-ssa/pr110637-2.c: New test.
> > * gcc.dg/tree-ssa/pr110637-3.c: New test.
> > * gcc.dg/tree-ssa/pr111972-1.c: New test.
> > * gcc.dg/tree-ssa/pr69270.c: Update testcase.
> > * gcc.target/i386/pr110790-2.c: Update testcase.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd   | 21 +
> >  gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c | 10 +++
> > gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c | 13 +
> > gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c | 14 +
> > gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c | 34
> ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr69270.c|  4 +--
> >  gcc/testsuite/gcc.target/i386/pr110790-2.c | 16 --
> >  7 files changed, 108 insertions(+), 4 deletions(-)  create mode
> > 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> > 4d554ba4721..656b2c9edda 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3332,6 +3332,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE
> (@0)))
> >  (rcmp @0 @1
> >
> > +/* (type)([0,1]@a != 0) -> (type)a
> > +   (type)([0,1]@a == 1) -> (type)a
> > +   (type)([0,1]@a == 0) -> a ^ 1
> > +   (type)([0,1]@a != 1) -> a ^ 1.  */ (for eqne (eq ne)  (simplify
> > +  (convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
> > +  (if ((integer_zerop (@1) || integer_onep (@1)))
> > +   (if ((eqne == EQ_EXPR) ^ integer_zerop (@1))
> > +(convert @0)
> > +   /* a^1 can only be produced for gimple as
> > +  fold has the exact opposite transformation
> > +  for `(X & 1) ^ 1`.
> > +  See `Fold ~X & 1 as (X & 1) == 0.`
> > +  and `Fold (X ^ 1) & 1 as (X & 1) == 0.` in fold-const.cc.
> > +  Only do this if the types match as (type)(a == 0) is
> > +  canonical form normally, while `a ^ 1` is canonical when
> > +  there is no type change. */
> > +   (if (GIMPLE && types_match (type, TREE_TYPE (@0)))
> > +(bit_xor @0 { build_one_cst (type); } ))
> > +
> >  /* We can't reassociate at all for saturating types.  */  (if
> > (!TYPE_SATURATING (type))
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
> > new file mode 100644
> > index 000..3d03b0992a4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-optimized" } */ int f(int a) {
> > +int b = (a & 1)!=0;
> > +return b;
> > +}
> > +
> > +/* This should be optimized to just return (a & 1); 

[PATCH 14/17] [APX NDD] Support APX NDD for rotate insns

2023-12-04 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternative.
(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
---
 gcc/config/i386/i386.md | 79 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 20 +++
 2 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8bec8a63ba9..6398f544a17 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16662,13 +16662,15 @@ (define_insn "*bmi2_rorx3_1"
(set_attr "mode" "")])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(any_rotate:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16676,14 +16678,16 @@ (define_insn "*3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "{}\t%0";
   else
-   return "{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16733,13 +16737,14 @@ (define_insn "*bmi2_rorxsi3_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
- (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-(match_operand:QI 2 "nonmemory_operand" "cI,I"
+ (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+(match_operand:QI 2 "nonmemory_operand" "cI,I,cI"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16747,14 +16752,16 @@ (define_insn "*si3_1_zext"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "{l}\t%k0";
   else
-   return "{l}\t{%2, %k0|%k0, %2}";
+   return use_ndd ? "{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  : "{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16798,19 +16805,25 @@ (define_split
(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
-   (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
- (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m,r")
+   (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+  && !use_ndd)
 return "{}\t%0";
   else
-return "{}\t{%2, %0|%0, %2}";
+return use_ndd
+  ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
-  [(set_attr 

[PATCH 12/17] [APX NDD] Support APX NDD for left shift insns

2023-12-04 Thread Hongyu Wang
For left shift, there is an optimization, TARGET_DOUBLE_WITH_ADD, under which
shl by 1 can be optimized to add. As the NDD form of add requires the src
operand to be a register (NDD cannot take 2 memory sources), we currently
just keep using the NDD form of shift instead of add.

The optimization TARGET_SHIFT1 tries to drop the constant 1 to use a shorter
opcode, but under NDD the assembler will automatically use the shorter
encoding whether $1 is present or not, so NDD is not involved with it.

The doubleword insns for left shift call ix86_expand_ashl, which assumes all
shift-related patterns have the same operand[0] and operand[1]. We will
support these patterns in a standalone patch.

gcc/ChangeLog:

* config/i386/i386.md (*ashl3_1): Extend with new
alternatives to support NDD, limit the new alternative to
generate sal only, and adjust output template for NDD.
(*ashlsi3_1_zext): Likewise.
(*ashlhi3_1): Likewise.
(*ashlqi3_1): Likewise.
(*ashl3_cmp): Likewise.
(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*ashl3_cconly): Likewise.
(*ashl3_doubleword_highpart): Adjust codegen for NDD.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add tests for sal.
---
 gcc/config/i386/i386.md | 172 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  22 +++
 2 files changed, 136 insertions(+), 58 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 62cd21ee3d4..43be1364bff 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14467,10 +14467,19 @@ (define_insn_and_split 
"*ashl3_doubleword_highpart"
 {
   split_double_mode (mode, [0], 1, [0], [3]);
   int bits = INTVAL (operands[2]) - ( * BITS_PER_UNIT);
-  if (!rtx_equal_p (operands[3], operands[1]))
-emit_move_insn (operands[3], operands[1]);
-  if (bits > 0)
-emit_insn (gen_ashl3 (operands[3], operands[3], GEN_INT (bits)));
+  bool op_equal_p = rtx_equal_p (operands[3], operands[1]);
+  if (bits == 0)
+{
+  if (!op_equal_p)
+   emit_move_insn (operands[3], operands[1]);
+}
+  else
+{
+  if (!op_equal_p && !TARGET_APX_NDD)
+   emit_move_insn (operands[3], operands[1]);
+  rtx op_tmp = TARGET_APX_NDD ? operands[1] : operands[3];
+  emit_insn (gen_ashl3 (operands[3], op_tmp, GEN_INT (bits)));
+}
   ix86_expand_clear (operands[0]);
   DONE;
 })
@@ -14777,12 +14786,14 @@ (define_insn "*bmi2_ashl3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashl3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
-   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
- (match_operand:QI 2 "nonmemory_operand" "c,M,r,")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" 
"0,l,rm,k,rm")
+ (match_operand:QI 2 "nonmemory_operand" 
"c,M,r,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, mode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 4);
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -14797,18 +14808,25 @@ (define_insn "*ashl3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ /* For NDD form instructions related to TARGET_SHIFT1, the $1
+immediate do not need to be omitted as assembler will map it
+to use shorter encoding. */
+ && !use_ndd)
return "sal{}\t%0";
   else
-   return "sal{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,bmi2,")
+  [(set_attr "isa" "*,*,bmi2,,apx_ndd")
(set (attr "type")
  (cond [(eq_attr "alternative" "1")
  (const_string "lea")
(eq_attr "alternative" "2")
  (const_string "ishiftx")
+   (eq_attr "alternative" "4")
+ (const_string "ishift")
 (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
  (match_operand 0 "register_operand"))
 (match_operand 2 "const1_operand"))
@@ -14850,13 +14868,15 @@ (define_insn "*bmi2_ashlsi3_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*ashlsi3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(zero_extend:DI
- (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm")
-(match_operand:QI 2 "nonmemory_operand" "cI,M,r"
+ (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm,rm")
+

[RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Vincenzo Palazzo
This commit is specifically targeting enhancements in
Nix support for GCC development. This initiative stems
from the recognized need within our community for a more
streamlined and efficient development process when using Nix.

Signed-off-by: Vincenzo Palazzo 
---
 flake.lock | 60 ++
 flake.nix  | 35 +++
 2 files changed, 95 insertions(+)
 create mode 100644 flake.lock
 create mode 100644 flake.nix

diff --git a/flake.lock b/flake.lock
new file mode 100644
index 000..de713ff0da9
--- /dev/null
+++ b/flake.lock
@@ -0,0 +1,60 @@
+{
+  "nodes": {
+"flake-utils": {
+  "inputs": {
+"systems": "systems"
+  },
+  "locked": {
+"lastModified": 1694529238,
+"narHash": "sha256-zsNZZGTGnMOf9YpHKJqMSsa0dXbfmxeoJ7xHlrt+xmY=",
+"owner": "numtide",
+"repo": "flake-utils",
+"rev": "ff7b65b44d01cf9ba6a71320833626af21126384",
+"type": "github"
+  },
+  "original": {
+"owner": "numtide",
+"repo": "flake-utils",
+"type": "github"
+  }
+},
+"nixpkgs": {
+  "locked": {
+"lastModified": 1696095070,
+"narHash": "sha256-iDx02dT+OHYYgaRGJxp2HXvzSHkA9l8/3O8GJB2wttU=",
+"owner": "nixos",
+"repo": "nixpkgs",
+"rev": "1f0e8ac1f9a783c4cfa0515483094eeff4315fe2",
+"type": "github"
+  },
+  "original": {
+"owner": "nixos",
+"repo": "nixpkgs",
+"type": "github"
+  }
+},
+"root": {
+  "inputs": {
+"flake-utils": "flake-utils",
+"nixpkgs": "nixpkgs"
+  }
+},
+"systems": {
+  "locked": {
+"lastModified": 1681028828,
+"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
+"owner": "nix-systems",
+"repo": "default",
+"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
+"type": "github"
+  },
+  "original": {
+"owner": "nix-systems",
+"repo": "default",
+"type": "github"
+  }
+}
+  },
+  "root": "root",
+  "version": 7
+}
diff --git a/flake.nix b/flake.nix
new file mode 100644
index 000..b0ff1915adc
--- /dev/null
+++ b/flake.nix
@@ -0,0 +1,35 @@
+{
+  description = "gcc compiler";
+
+  inputs = {
+nixpkgs.url = "github:nixos/nixpkgs";
+flake-utils.url = "github:numtide/flake-utils";
+  };
+
+  outputs = { self, nixpkgs, flake-utils }:
+flake-utils.lib.eachDefaultSystem (system:
+  let pkgs = nixpkgs.legacyPackages.${system};
+  in {
+packages = {
+  default = pkgs.gnumake;
+};
+formatter = pkgs.nixpkgs-fmt;
+
+devShell = pkgs.mkShell {
+  buildInputs = [
+pkgs.gnumake
+pkgs.gcc13
+
+pkgs.gmp
+pkgs.libmpc
+pkgs.mpfr
+pkgs.isl
+pkgs.pkg-config
+pkgs.autoconf-archive
+pkgs.autoconf
+pkgs.automake
+  ];
+};
+  }
+);
+}
-- 
2.43.0



[PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns

2023-12-04 Thread Hongyu Wang
For shld/shrd insns, the old patterns use match_dup 0 as the shift src and use
+r*m as its constraint. To support NDD, new define_insns are added to handle the
NDD form patterns, with an extra input operand and the dest operand fixed in a register.

gcc/ChangeLog:

* config/i386/i386.md (x86_64_shld_ndd): New define_insn.
(x86_64_shld_ndd_1): Likewise.
(*x86_64_shld_ndd_2): Likewise.
(x86_shld_ndd): Likewise.
(x86_shld_ndd_1): Likewise.
(*x86_shld_ndd_2): Likewise.
(x86_64_shrd_ndd): Likewise.
(x86_64_shrd_ndd_1): Likewise.
(*x86_64_shrd_ndd_2): Likewise.
(x86_shrd_ndd): Likewise.
(x86_shrd_ndd_1): Likewise.
(*x86_shrd_ndd_2): Likewise.
(*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD.
(*x86_shld_shrd_1_nozext): Likewise.
(*x86_64_shrd_shld_1_nozext): Likewise.
(*x86_shrd_shld_1_nozext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-shld-shrd.c: New test.
---
 gcc/config/i386/i386.md   | 322 +-
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |  24 ++
 2 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6398f544a17..0af7e82deee 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14505,6 +14505,23 @@ (define_insn "x86_64_shld"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+ (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
+ (const_int 63)))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (minus:QI (const_int 64)
+ (and:QI (match_dup 3) (const_int 63 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")])
+
 (define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
@@ -14526,6 +14543,24 @@ (define_insn "x86_64_shld_1"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+  (match_operand:QI 3 "const_0_to_63_operand"))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (match_operand:QI 4 "const_0_to_255_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")
+   (set_attr "length_immediate" "1")])
+
+
 (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   [(set (match_operand:DI 0 "nonimmediate_operand")
(ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
@@ -14551,6 +14586,23 @@ (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   operands[4] = force_reg (DImode, operands[4]);
   emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], 
operands[2]));
 }
+  else if (TARGET_APX_NDD)
+{
+ rtx tmp = gen_reg_rtx (DImode);
+ if (MEM_P (operands[4]))
+   {
+operands[1] = force_reg (DImode, operands[1]);
+emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+  operands[2], operands[3]));
+   }
+ else if (MEM_P (operands[1]))
+   emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[1], operands[4],
+operands[3], operands[2]));
+ else
+   emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+operands[2], operands[3]));
+ emit_move_insn (operands[0], tmp);
+}
   else
{
  operands[1] = force_reg (DImode, operands[1]);
@@ -14583,6 +14635,33 @@ (define_insn_and_split "*x86_64_shld_2"
   (const_int 63 0)))
  (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_64_shld_ndd_2"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+   (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand")
+  (match_operand:QI 3 "nonmemory_operand"))
+   (lshiftrt:DI (match_operand:DI 2 "register_operand")
+

Re: [PATCH v6 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-04 Thread waffl3x
>> @@ -15402,6 +15450,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
>> complain,
>>   gcc_checking_assert (TYPE_MAIN_VARIANT (TREE_TYPE (ve))
>>== TYPE_MAIN_VARIANT (type));
>> SET_DECL_VALUE_EXPR (r, ve);
>> +   if (is_capture_proxy (t))
>> + type = TREE_TYPE (ve);

>That should have close to the same effect as the lambda_proxy_type
>adjustment I was talking about, since that function basically returns
>the TREE_TYPE of the COMPONENT_REF.  But the underlying problem is that
>finish_non_static_data_member assumes that 'object' is '*this', for
>which you can trust the cv-quals; for auto&&, you can't.
>capture_decltype has the same problem.  I'm attaching a patch to address
>this in both places.

Regarding this, was my change actually okay, and was your change
supposed to address it? I applied my patch to the latest commit in
master yesterday and started tests and whatnot with this change
commented out as I wasn't sure. It seems like my tests for constness of
captures no longer work with or without this change commented out.

If you wish I can go over everything again and figure out a new
solution with your changes but stepping through all this code was quite
a task that I'm weary of doing again. Even if the second time through
won't be so arduous I would like to avoid it.

You know what, I'll give it a go anyway but I don't want to spend too
much time on it, I still have a few tests to clean up and this crash to
fix.

template  void f()
{
   int i;
   [=](this T&& self){ return i; }(); // error, unrelated
}
int main() { f(); }

If this crash doesn't take too long (I don't think it will, it seems
straightforward enough) then I'll look at fixing the
captures-with-a-const-xobject-parameter bug the correct way.

Alex


Re: [PATCH] c/86869 - preserve address-space info when building qualified ARRAY_TYPE

2023-12-04 Thread Joseph Myers
On Mon, 4 Dec 2023, Richard Biener wrote:

> The following adjusts the C FE specific qualified type building
> to preserve address-space info also for ARRAY_TYPE.
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] tree-optimization/112827 - corrupt SCEV cache during SCCP

2023-12-04 Thread Patrick O'Neill
Relevant bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112848

Thanks,
Patrick

On Mon, Dec 4, 2023 at 11:20 PM Li, Pan2  wrote:

> Hi Richard,
>
> It looks like this patch result in one ICE for RISC-V backend for case
> tree-ssa.exp=ssa-sink-16.c, could you please help to double check about it?
> Any more information required please feel free to let me know. Thanks.
>
> compiler error: Segmentation fault
> 0x1903067 crash_signal
> ../../../../gcc/gcc/toplev.cc:316
> 0x111a24e loop_outer(loop const*)
> ../../../../gcc/gcc/cfgloop.h:549
> 0x1ac2143 find_uses_to_rename_use
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:424
> 0x1ac2295 find_uses_to_rename_stmt
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:464
> 0x1ac2456 find_uses_to_rename_bb
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:495
> 0x1ac2585 find_uses_to_rename
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:521
> 0x1ac267c rewrite_into_loop_closed_ssa_1
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:588
> 0x1ac2735 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
> ../../../../gcc/gcc/tree-ssa-loop-manip.cc:628
> 0x19682a3 repair_loop_structures
> ../../../../gcc/gcc/tree-cfgcleanup.cc:1190
> 0x196831d cleanup_tree_cfg(unsigned int)
> ../../../../gcc/gcc/tree-cfgcleanup.cc:1209
> 0x16e654b execute_function_todo
> ../../../../gcc/gcc/passes.cc:2057
> 0x16e534d do_per_function
> ../../../../gcc/gcc/passes.cc:1687
> 0x16e68b0 execute_todo
> ../../../../gcc/gcc/passes.cc:2142
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, December 4, 2023 7:54 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/112827 - corrupt SCEV cache during SCCP
>
> The following avoids corrupting the SCEV cache by my last change
> to propagate constant final values immediately.  The easiest fix
> is to keep a dead initialization around.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> PR tree-optimization/112827
> * tree-scalar-evolution.cc (final_value_replacement_loop):
> Do not release SSA name but keep a dead initialization around.
>
> * gcc.dg/torture/pr112827-1.c: New testcase.
> * gcc.dg/torture/pr112827-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/pr112827-1.c | 14 ++
>  gcc/testsuite/gcc.dg/torture/pr112827-2.c | 18 ++
>  gcc/tree-scalar-evolution.cc  |  9 +++--
>  3 files changed, 35 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr112827-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr112827-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr112827-1.c
> b/gcc/testsuite/gcc.dg/torture/pr112827-1.c
> new file mode 100644
> index 000..6838cbbe62f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr112827-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +
> +int a, b, c, d, e;
> +int main() {
> +  for (; c; c++) {
> +for (a = 0; a < 2; a++)
> +  ;
> +for (; b; b++) {
> +  e = d;
> +  d = a;
> +}
> +  }
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr112827-2.c
> b/gcc/testsuite/gcc.dg/torture/pr112827-2.c
> new file mode 100644
> index 000..a7a2a70211b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr112827-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +
> +short a, b[1], f;
> +char c, g;
> +int d, e;
> +int main() {
> +  for (; f; f++) {
> +for (d = 0; d < 2; d++)
> +  ;
> +if (a)
> +  for (g = 0; g < 2; g++)
> +for (c = 0; c < 2; c += b[d+g])
> +  ;
> +for (; e; e++)
> +  ;
> +  }
> +  return 0;
> +}
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 065bcd0743d..7556d89e9f8 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -3847,13 +3847,10 @@ final_value_replacement_loop (class loop *loop)
>def = unshare_expr (def);
>remove_phi_node (, false);
>
> -  /* Propagate constants immediately.  */
> +  /* Propagate constants immediately, but leave an unused
> initialization
> +around to avoid invalidating the SCEV cache.  */
>if (CONSTANT_CLASS_P (def))
> -   {
> - replace_uses_by (rslt, def);
> - release_ssa_name (rslt);
> - continue;
> -   }
> +   replace_uses_by (rslt, def);
>
>/* Create the replacement statements.  */
>gimple_seq stmts;
> --
> 2.35.3
>


[PATCH] tree-optimization/112827 - more SCEV cprop fixes

2023-12-04 Thread Richard Biener
The insert iterator can be corrupted by foldings done by replace_uses_by,
within this particular PHI replacement but also by subsequent ones.
Recompute the insert location before insertion instead.

This fixes an observed ICE of gcc.dg/tree-ssa/ssa-sink-16.c.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112827
PR tree-optimization/112848
* tree-scalar-evolution.cc (final_value_replacement_loop):
Compute the insert location for each insert.
---
 gcc/tree-scalar-evolution.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 065bcd0743d..38821ffb15a 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3739,7 +3739,6 @@ final_value_replacement_loop (class loop *loop)
 split_loop_exit_edge (exit);
 
   /* Set stmt insertion pointer.  All stmts are inserted before this point.  */
-  gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
 
   class loop *ex_loop
 = superloop_at_depth (loop,
@@ -3883,6 +3882,7 @@ final_value_replacement_loop (class loop *loop)
  gsi_next ();
}
}
+  gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
   gsi_insert_seq_before (, stmts, GSI_SAME_STMT);
   if (dump_file)
{
-- 
2.35.3


[RFC PATCH 0/1] Improved Nix Support for GCC Development

2023-12-04 Thread Vincenzo Palazzo
I am writing to submit a patch for consideration to be included 
in the mainline GCC repository. The patch aims to improve the ease of 
using Nix for GCC development, a challenge that several developers 
in our community have faced.

In the event that there is no current maintainer willing to take ownership 
of these changes, I am willing and able to assume the responsibility for 
maintaining this part of the codebase.

I believe that this patch will be a valuable addition to the GCC 
project, making it more accessible and user-friendly for a broader range 
of developers on NixOS. I appreciate your consideration of this contribution 
and am open to any feedback or suggestions for improvement.

Cheers,

Vincent

Vincenzo Palazzo (1):
  nix: add a simple flake nix shell

 flake.lock | 60 ++
 flake.nix  | 35 +++
 2 files changed, 95 insertions(+)
 create mode 100644 flake.lock
 create mode 100644 flake.nix

-- 
2.43.0



Re: [gcc15] nested functions in C

2023-12-04 Thread Martin Uecker
On Monday, 2023-12-04 at 15:35 -0500, Siddhesh Poyarekar wrote:
> On 2023-12-04 13:48, Martin Uecker wrote:
> > > I empathize with Jakub's stated use case though of keeping the C
> > > frontend support for testing purposes, but that could easily be done
> > > behind a flag, or by putting nested C func deprecation behind a flag.
> > 
> > I am relatively sure C will get some form of nested functions.
> > Maybe as anonymous nested functions, i.e. lambdas, but I do
> > not see a fundamental difference here (I personally like naming
> > things for clarity, so i prefer named nested functions)
> 
> If (assuming from them being called lambdas) they are primarily for 
> small functions without side-effects then it's already a significantly 
> stronger specification than what we have right now with C nested 
> functions.  That would end up enforcing what you demonstrate as the good 
> way to use nested functions.

The proposal we have seen for C23 (which was not accepted into
C23 mostly due to timing and lack of implementation experience)
was similar to C++'s lambdas and did not have any such restriction.

> 
> I suppose minimal, contained side-effects (such as atomically updating a 
> structure) may also constitute sound design, but that should be made 
> explicit in the language.

Updating some variable is useful, for example, for contractions, e.g.
summing over a certain range of values in an array.
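
For example (just a sketch using the existing GNU C extension):

  /* Sum of squares via a nested function that updates 'sum' in the
     enclosing frame.  */
  double sum_squares (const double *a, int n)
  {
    double sum = 0.0;
    void add (double x) { sum += x * x; }
    for (int i = 0; i < n; i++)
      add (a[i]);
    return sum;
  }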

> 
> > > I don't disagree for cases like -Warray-bounds,
> > > but for warnings/errors that are more deterministic in nature (like
> > > -Werror=trampolines), they're going to point at actual problems and
> > > larger projects and distributions will usually prefer to at least track
> > > them, if not actually fix them.  For Fedora we tend to provide macro
> > > overrides for packages that need to explicitly disable a security
> > > related flag.
> > 
> > In projects such as mine, this will lead to a lot of code
> > transformations as indicated above, i.e. much worse code.
> > 
> > One could get away with it, since nested functions are rarely
> > used, but I think this is bad, because a lot of code would
> > improve if it used them.
> 
> If nested functions are eventually going to make it into the C standard 
> then effort is probably better spent in porting the C nested functions 
> to use descriptors instead of executable stacks or heaps.

I submitted a patch for this a long time ago which was based
on the code for Ada that uses a bit in the pointer to differentiate
between conventional pointers and descriptors.

I would now prefer an approach that uses a qualifier on the
function type to indicate that the static chain has to be
set. A pointer to such a qualified function would be a descriptor
that consists of the address and the value for the static chain.

This would be useful for many things.
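
Conceptually, a pointer to such a qualified function would behave like
this explicit two-word descriptor (plain C, purely illustrative; the
qualifier would make the descriptor implicit):

  struct fn_desc {
    double (*code) (void *chain, double x);  /* code address  */
    void *chain;                             /* static chain  */
  };

  static inline double call_desc (struct fn_desc f, double x)
  {
    return f.code (f.chain, x);
  }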

Martin







Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Jeff Law




On 12/4/23 18:38, Vincenzo Palazzo wrote:

Ciao all,


+1.  I think this is best left to the distros.


What do you mean? This is not a package, it is an env shell in order
to build and work on GCC on NixOS.

Distro build procedures are not something the GCC project generally gets 
involved with.


jeff


Re: [PATCH] lra: Updates of biggest mode for hard regs [PR112278]

2023-12-04 Thread Vladimir Makarov



On 12/3/23 05:13, Richard Sandiford wrote:

[Gah.  In my head I'd sent this a few weeks ago, but it turns out
  that I hadn't even got to the stage of writing the changelog...]

LRA keeps track of the biggest mode for both hard registers and
pseudos.  The updates assume that the modes are ordered, i.e. that
we can tell whether one is no bigger than the other at compile time.

That is (or at least seemed to be) a reasonable restriction for pseudos.
But it isn't necessarily so for hard registers, since the uses of hard
registers can be logically distinct.  The testcase is an example of this.

The biggest mode of hard registers is also special for other reasons.
As the existing comment says:

   /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as
  part of a multi-word register.  In that case, just use the reg_rtx
  mode.  Do the same also if the biggest mode was larger than a register
  or we can not compare the modes.  Otherwise, limit the size to that of
  the biggest access in the function or to the natural mode at least.  */

This patch applies the same approach to the updates.

Tested on aarch64-linux-gnu (with and without SVE) and on x86_64-linux-gnu.
OK to install?


Sure.  Thank you for fixing this, Richard.




[PATCH 13/17] [APX NDD] Support APX NDD for right shift insns

2023-12-04 Thread Hongyu Wang
Similar to left shift, right shift does not need to omit $1 for the NDD form.

gcc/ChangeLog:

* config/i386/i386.md (ashr3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr3_1): Likewise for SI/DI mode.
(*lshr3_1): Likewise.
(*si3_1_zext): Likewise.
(*ashr3_1): Likewise for QI/HI mode.
(*lshrqi3_1): Likewise.
(*lshrhi3_1): Likewise.
(3_cmp): Likewise.
(*3_cconly): Likewise.
(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*highpartdisi2): Likewise.
(*si3_cmp_zext): Likewise.
(3_carry): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add l/ashiftrt tests.
---
 gcc/config/i386/i386.md | 232 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  24 +++
 2 files changed, 166 insertions(+), 90 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 43be1364bff..8bec8a63ba9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15803,39 +15803,45 @@ (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
 (define_insn "ashr3_cvt"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "*a,0")
+ (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
  (match_operand:QI 2 "const_int_operand")))
(clobber (reg:CC FLAGS_REG))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)-1
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, mode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, mode, operands,
+  TARGET_APX_NDD)"
   "@

-   sar{}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{}\t{%2, %0|%0, %2}
+   sar{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "")])
 
 (define_insn "*ashrsi3_cvt_zext"
-  [(set (match_operand:DI 0 "register_operand" "=*d,r")
+  [(set (match_operand:DI 0 "register_operand" "=*d,r,r")
(zero_extend:DI
- (ashiftrt:SI (match_operand:SI 1 "register_operand" "*a,0")
+ (ashiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "*a,0,rm")
   (match_operand:QI 2 "const_int_operand"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && INTVAL (operands[2]) == 31
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands,
+  TARGET_APX_NDD)"
   "@
{cltd|cdq}
-   sar{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{l}\t{%2, %k0|%k0, %2}
+   sar{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "SI")])
 
 (define_expand "@x86_shift_adj_3"
@@ -15877,13 +15883,15 @@ (define_insn "*bmi2_3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashr3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,r,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands)"
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ISHIFTX:
@@ -15891,14 +15899,16 @@ (define_insn "*ashr3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !use_ndd)
return "sar{}\t%0";
   else
-   return "sar{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "ishift,ishiftx")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr 

[PATCH] i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]

2023-12-04 Thread Jakub Jelinek
Hi!

The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
I added a while back to use smaller code than movabsq if possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates
an invalid insn (as a 0xfa1e0ff3 CONST_INT is not allowed as
x86_64_immediate_operand nor x86_64_zext_immediate_operand), and the
peephole2 even triggers on it again and again (this time with shift 0)
until it gives up.

The following patch fixes that.  As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-04  Jakub Jelinek  

PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.

--- gcc/config/i386/i386.md.jj  2023-12-01 08:10:42.287330513 +0100
+++ gcc/config/i386/i386.md 2023-12-04 16:22:23.497986229 +0100
@@ -2699,7 +2699,10 @@ (define_peephole2
  (clobber (reg:CC FLAGS_REG))])]
 {
   int shift = ctz_hwi (UINTVAL (operands[1]));
-  operands[1] = gen_int_mode (UINTVAL (operands[1]) >> shift, DImode);
+  rtx op1 = gen_int_mode (UINTVAL (operands[1]) >> shift, DImode);
+  if (ix86_endbr_immediate_operand (op1, VOIDmode))
+FAIL;
+  operands[1] = op1;
   operands[2] = gen_int_mode (shift, QImode);
 })
 
--- gcc/testsuite/gcc.dg/pr112845.c.jj  2023-12-04 16:25:31.228350449 +0100
+++ gcc/testsuite/gcc.dg/pr112845.c 2023-12-04 16:25:26.740413464 +0100
@@ -0,0 +1,9 @@
+/* PR target/112845 */
+/* { dg-do compile { target cet } } */
+/* { dg-options "-Os -fcf-protection" } */
+
+unsigned long long
+foo (void)
+{
+  return 0xfa1e0ff3ULL << 3;
+}

Jakub



Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Eli Schwartz
On 12/4/23 9:01 PM, Vincenzo Palazzo wrote:
> On Tue, Dec 5, 2023 at 2:54 AM Jeff Law  wrote:
>> Distro build procedures are not something the GCC project generally gets
>> involved with.
> 
> I see, but to me this does not look like a distro build procedure,
> because you can use it
> with any kind of system (OSX/UNIX) by using nix.


But you can do the same with various other distro build procedures too?

e.g. Gentoo Prefix allows you to install a full-blown gentoo anywhere
you like, "by using portage".

But also by the same token, I can just install pacman or rpm or dpkg on
any system, and use the recipe executor just without requiring a
database of installed packages.


> I disagree with you, simply because my patch is not building a package
> but is just giving
> an agnostic way to develop with GCC. Of course it is most useful with NixOS because
> it does not have apt or pacman or any other kind of package manager.


I'm not entirely sure what this statement means (unless you are saying
that nix isn't a package manager and NixOS doesn't have any package
manager)?

But I'd actually go one step further. It looks like this "flake.nix"
file is the NixOS specific equivalent of a README.md which says "to
install the software, you must first install XX, YY, and ZZ using your
system package manager. Often they will have names such as XX-devel and
suchlike".

Which for GCC would be https://gcc.gnu.org/install/prerequisites.html --
this page actually lists a bunch of things I don't see mentioned in your
"flake.nix" file so I suspect that it won't, in fact, produce a good
development environment for developing GCC.

I don't think it's the job of the GCC maintainers to maintain special
snowflake integrations with niche linux distros, whether those
integrations work or not. But, if it *was* the job of the GCC
maintainers, perhaps it would be better to make a script:

`tools/setup-development-env.sh $distro`

which could abstract away all of this for any distro, not just a niche one.


-- 
Eli Schwartz



[PATCH] lower-bitint: Make temporarily wrong IL less wrong [PR112843]

2023-12-04 Thread Jakub Jelinek
Hi!

As discussed in the PR, for the middle (on x86-64 65..128 bit) _BitInt
types like
  _1 = x_4(D) * 5;
where _1 and x_4(D) have _BitInt(128) type and x is PARM_DECL, the bitint
lowering pass wants to replace this with
  _13 = (int128_t) x_4(D);
  _12 = _13 * 5;
  _1 = (_BitInt(128)) _12;
where _13 and _12 have int128_t type and the ranger ICEs when the IL is
temporarily invalid:
during GIMPLE pass: bitintlower
pr112843.c: In function ‘foo’:
pr112843.c:7:1: internal compiler error: Segmentation fault
7 | foo (_BitInt (128) x, _BitInt (256) y)
  | ^~~
0x152943f crash_signal
../../gcc/toplev.cc:316
0x25c21c8 ranger_cache::range_of_expr(vrange&, tree_node*, gimple*)
../../gcc/gimple-range-cache.cc:1204
0x25cdcf9 fold_using_range::range_of_range_op(vrange&, 
gimple_range_op_handler&, fur_source&)
../../gcc/gimple-range-fold.cc:671
0x25cf9a0 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
../../gcc/gimple-range-fold.cc:602
0x25b5520 gimple_ranger::update_stmt(gimple*)
../../gcc/gimple-range.cc:564
0x16f1234 update_stmt_operands(function*, gimple*)
../../gcc/tree-ssa-operands.cc:1150
0x117a5b6 update_stmt_if_modified(gimple*)
../../gcc/gimple-ssa.h:187
0x117a5b6 update_stmt_if_modified(gimple*)
../../gcc/gimple-ssa.h:184
0x117a5b6 update_modified_stmt
../../gcc/gimple-iterator.cc:44
0x117a5b6 gsi_insert_after(gimple_stmt_iterator*, gimple*, gsi_iterator_update)
../../gcc/gimple-iterator.cc:544
0x25abc2f gimple_lower_bitint
../../gcc/gimple-lower-bitint.cc:6348

What the code does right now is, it first creates a new SSA_NAME (_12
above), adds the
  _1 = (_BitInt(128)) _12;
stmt after it (where it crashes, because _12 has no SSA_NAME_DEF_STMT yet),
then sets lhs of the previous stmt to _12 (this is also temporarily
incorrect, there are incompatible types involved in the stmt), later on
changes also operands and finally update_stmt it.

The following patch instead changes the lhs of the stmt before adding the
cast after it.  The question is if this is less or more wrong temporarily
(but the ICE is gone).
Yet another possibility would be to first adjust the operands of stmt
(without update_stmt), then set_lhs to a new lhs (still without
update_stmt), then add the cast after it and finally update_stmt (stmt).
Maybe that would be less wrong (still, before it is updated some chains
might think it is still the setter of _1 when it is not anymore).
Anyway, should I go with that order then instead of the patch below?

The reason I tweaked the lhs first is that the code then just uses gimple_op
and iterates over all ops; if the operands were adjusted before the lhs, it
would need to special-case which op to skip because it is the lhs (I'm using
gimple_get_lhs for the lhs, but this isn't done for GIMPLE_CALL nor
GIMPLE_PHI, so GIMPLE_ASSIGN or say GIMPLE_GOTO etc. are the only options,
so I could just start with op 1 rather than 0 for is_gimple_assign).

2023-12-04  Jakub Jelinek  

PR tree-optimization/112843
* gimple-lower-bitint.cc (gimple_lower_bitint): Change lhs of stmt
to lhs2 before building and inserting lhs = (cast) lhs2; assignment.

* gcc.dg/bitint-47.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-03 17:53:55.60482 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-04 14:39:20.352057389 +0100
@@ -6338,6 +6338,7 @@ gimple_lower_bitint (void)
int uns = TYPE_UNSIGNED (TREE_TYPE (lhs));
type = build_nonstandard_integer_type (prec, uns);
tree lhs2 = make_ssa_name (type);
+   gimple_set_lhs (stmt, lhs2);
gimple *g = gimple_build_assign (lhs, NOP_EXPR, lhs2);
if (stmt_ends_bb_p (stmt))
  {
@@ -6346,7 +6347,6 @@ gimple_lower_bitint (void)
  }
else
  gsi_insert_after (, g, GSI_SAME_STMT);
-   gimple_set_lhs (stmt, lhs2);
  }
  unsigned int nops = gimple_num_ops (stmt);
  for (unsigned int i = 0; i < nops; ++i)
--- gcc/testsuite/gcc.dg/bitint-47.c.jj 2023-12-04 14:53:19.784200724 +0100
+++ gcc/testsuite/gcc.dg/bitint-47.c2023-12-04 14:42:07.25164 +0100
@@ -0,0 +1,13 @@
+/* PR tree-optimization/112843 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 256
+_BitInt (256)
+foo (_BitInt (128) x, _BitInt (256) y)
+{
+  return x * 5 * y;
+}
+#else
+int x;
+#endif

Jakub



RE: [PATCH] tree-optimization/112827 - corrupt SCEV cache during SCCP

2023-12-04 Thread Li, Pan2
Hi Richard,

It looks like this patch result in one ICE for RISC-V backend for case 
tree-ssa.exp=ssa-sink-16.c, could you please help to double check about it?
Any more information required please feel free to let me know. Thanks.

compiler error: Segmentation fault
0x1903067 crash_signal
../../../../gcc/gcc/toplev.cc:316
0x111a24e loop_outer(loop const*)
../../../../gcc/gcc/cfgloop.h:549
0x1ac2143 find_uses_to_rename_use
../../../../gcc/gcc/tree-ssa-loop-manip.cc:424
0x1ac2295 find_uses_to_rename_stmt
../../../../gcc/gcc/tree-ssa-loop-manip.cc:464
0x1ac2456 find_uses_to_rename_bb
../../../../gcc/gcc/tree-ssa-loop-manip.cc:495
0x1ac2585 find_uses_to_rename
../../../../gcc/gcc/tree-ssa-loop-manip.cc:521
0x1ac267c rewrite_into_loop_closed_ssa_1
../../../../gcc/gcc/tree-ssa-loop-manip.cc:588
0x1ac2735 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
../../../../gcc/gcc/tree-ssa-loop-manip.cc:628
0x19682a3 repair_loop_structures
../../../../gcc/gcc/tree-cfgcleanup.cc:1190
0x196831d cleanup_tree_cfg(unsigned int)
../../../../gcc/gcc/tree-cfgcleanup.cc:1209
0x16e654b execute_function_todo
../../../../gcc/gcc/passes.cc:2057
0x16e534d do_per_function
../../../../gcc/gcc/passes.cc:1687
0x16e68b0 execute_todo
../../../../gcc/gcc/passes.cc:2142

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, December 4, 2023 7:54 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/112827 - corrupt SCEV cache during SCCP

The following avoids the SCEV cache corruption caused by my last change
to propagate constant final values immediately.  The easiest fix
is to keep a dead initialization around.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112827
* tree-scalar-evolution.cc (final_value_replacement_loop):
Do not release SSA name but keep a dead initialization around.

* gcc.dg/torture/pr112827-1.c: New testcase.
* gcc.dg/torture/pr112827-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr112827-1.c | 14 ++
 gcc/testsuite/gcc.dg/torture/pr112827-2.c | 18 ++
 gcc/tree-scalar-evolution.cc  |  9 +++--
 3 files changed, 35 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112827-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112827-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr112827-1.c 
b/gcc/testsuite/gcc.dg/torture/pr112827-1.c
new file mode 100644
index 000..6838cbbe62f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112827-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+int a, b, c, d, e;
+int main() {
+  for (; c; c++) {
+for (a = 0; a < 2; a++)
+  ;
+for (; b; b++) {
+  e = d;
+  d = a;
+}
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr112827-2.c 
b/gcc/testsuite/gcc.dg/torture/pr112827-2.c
new file mode 100644
index 000..a7a2a70211b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112827-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+short a, b[1], f;
+char c, g;
+int d, e;
+int main() {
+  for (; f; f++) {
+for (d = 0; d < 2; d++)
+  ;
+if (a)
+  for (g = 0; g < 2; g++)
+for (c = 0; c < 2; c += b[d+g])
+  ;
+for (; e; e++)
+  ;
+  }
+  return 0;
+}
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 065bcd0743d..7556d89e9f8 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3847,13 +3847,10 @@ final_value_replacement_loop (class loop *loop)
   def = unshare_expr (def);
   remove_phi_node (, false);
 
-  /* Propagate constants immediately.  */
+  /* Propagate constants immediately, but leave an unused initialization
+around to avoid invalidating the SCEV cache.  */
   if (CONSTANT_CLASS_P (def))
-   {
- replace_uses_by (rslt, def);
- release_ssa_name (rslt);
- continue;
-   }
+   replace_uses_by (rslt, def);
 
   /* Create the replacement statements.  */
   gimple_seq stmts;
-- 
2.35.3


Re: [gcc15] nested functions in C

2023-12-04 Thread Joseph Myers
On Mon, 4 Dec 2023, Siddhesh Poyarekar wrote:

> On 2023-12-04 13:48, Martin Uecker wrote:
> > > I empathize with Jakub's stated use case though of keeping the C
> > > frontend support for testing purposes, but that could easily be done
> > > behind a flag, or by putting nested C func deprecation behind a flag.
> > 
> > I am relatively sure C will get some form of nested functions.
> > Maybe as anonymous nested functions, i.e. lambdas, but I do
> > not see a fundamental difference here (I personally like naming
> > things for clarity, so i prefer named nested functions)
> 
> If (assuming from them being called lambdas) they are primarily for small
> functions without side-effects then it's already a significantly stronger
> specification than what we have right now with C nested functions.  That would
> end up enforcing what you demonstrate as the good way to use nested functions.

The key feature of lambdas (which failed to make it into C23) for this 
purpose is that you can't convert them to function pointers, which 
eliminates any need for trampolines.
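
For illustration, a minimal GNU C sketch of the case that does need a
trampoline today (illustration only; the function names are made up):

  #include <stdlib.h>

  void
  sort_shifted (int *a, size_t n, int shift)
  {
    /* Nested function capturing SHIFT from the enclosing frame.  */
    int cmp (const void *p, const void *q)
    {
      return (*(const int *) p >> shift) - (*(const int *) q >> shift);
    }

    /* Converting CMP to an ordinary function pointer is what forces GCC
       to materialize an executable trampoline on the stack so the pointer
       can carry the static chain; a lambda that cannot be converted to a
       function pointer avoids this entirely.  */
    qsort (a, n, sizeof *a, cmp);
  }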

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-04 Thread Marek Polacek
On Mon, Dec 04, 2023 at 04:49:29PM -0500, Jason Merrill wrote:
> On 12/4/23 15:23, Marek Polacek wrote:
> > +/* FN is not a consteval function, but may become one.  Remember to
> > +   escalate it after all pending templates have been instantiated.  */
> > +
> > +void
> > +maybe_store_immediate_escalating_fn (tree fn)
> > +{
> > +  if (unchecked_immediate_escalating_function_p (fn))
> > +remember_escalating_expr (fn);
> > +}
> 
> > +++ b/gcc/cp/decl.cc
> > @@ -18441,7 +18441,10 @@ finish_function (bool inline_p)
> >if (!processing_template_decl
> >&& !DECL_IMMEDIATE_FUNCTION_P (fndecl)
> >&& !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
> > -cp_fold_function (fndecl);
> > +{
> > +  cp_fold_function (fndecl);
> > +  maybe_store_immediate_escalating_fn (fndecl);
> > +}
> 
> I think maybe_store_, and the call to it from finish_function, are unneeded;
> we will have already decided whether we need to remember the function during
> the call to cp_fold_function.

'Tis true.
 
> OK with that change.

Here's what I pushed after another regtest.  Thanks!

-- >8 --
This patch implements P2564, whereby
certain functions are promoted to consteval.  For example:

  consteval int id(int i) { return i; }

  template <typename T>
  constexpr int f(T t)
  {
return t + id(t); // id causes f to be promoted to consteval
  }

  void g(int i)
  {
f (3);
  }

now compiles.  Previously the code was ill-formed: we would complain
that 't' in 'f' is not a constant expression.  Since 'f' is now
consteval, it means that the call to id(t) is in an immediate context,
so it doesn't have to produce a constant -- this is how we allow composition
of consteval functions.  But making 'f' consteval also means that
the call to 'f' in 'g' must yield a constant; failure to do so results
in an error.  I made the effort to have cc1plus explain to us what's
going on.  For example, calling f(i) produces this neat diagnostic:

w.C:11:11: error: call to consteval function 'f(i)' is not a constant expression
   11 |     f (i);
      |     ~~^~~
w.C:11:11: error: 'i' is not a constant expression
w.C:6:22: note: 'constexpr int f(T) [with T = int]' was promoted to an immediate function because its body contains an immediate-escalating expression 'id(t)'
    6 |     return t + id(t); // id causes f to be promoted to consteval
      |                ~~^~~

which hopefully makes it clear what's going on.

Implementing this proposal has been tricky.  One problem was delayed
instantiation: instantiating a function can set off a domino effect
where one call promotes a function to consteval but that then means
that another function should also be promoted, etc.

In v1, I addressed the delayed instantiation problem by instantiating
trees early, so that we can escalate functions right away.  That caused
a number of problems, and in certain cases, like consteval-prop3.C, it
can't work, because we need to wait till EOF to see the definition of
the function anyway.  Overeager instantiation tends to cause diagnostic
problems too.

In v2, I attempted to move the escalation to the gimplifier, at which
point all templates have been instantiated.  That attempt flopped,
however, because once we've gimplified a function, its body is discarded,
and as a consequence you can no longer evaluate a call to that function,
which is required for escalating, since escalation needs to decide whether
a call is a constant expression or not.

Therefore, we have to perform the escalation before gimplifying, but
after instantiate_pending_templates.  That's not easy because we have
no way to walk all the trees.  In the v2 patch, I use two vectors: one
to store function decls that may become consteval, and another to
remember references to immediate-escalating functions.  Unfortunately
the latter must also stash functions that call immediate-escalating
functions.  Consider:

  int g(int i)
  {
f(i); // f is immediate-escalating
  }

where g itself is not immediate-escalating, but we have to make sure
that if f gets promoted to consteval, we give an error.

A new option, -fno-immediate-escalation, is provided to suppress
escalating functions.

v2 also adds a new flag, DECL_ESCALATION_CHECKED_P, so that we don't
escalate a function multiple times, and so that we can distinguish between
explicitly consteval functions and functions that have been promoted
to consteval.

In v3, I removed one of the new vectors and changed the other one
to a hash set.  This version also contains numerous cleanups.

v4 merges find_escalating_expr_r into cp_fold_immediate_r.  It also
adds a new optimization in cp_fold_function.

v5 greatly simplifies the code.

v6 simplifies the code further and removes an ff_ flag.

v7 removes maybe_promote_function_to_consteval and further simplifies
cp_fold_immediate_r logic.

v8 removes maybe_store_immediate_escalating_fn.

PR c++/107687
PR c++/110997

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc 

[PATCH] RISC-V: Add blocker for gather/scatter auto-vectorization

2023-12-04 Thread Juzhe-Zhong
This patch fixes an ICE exposed by full coverage testing:

=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)
=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=dynamic 
--param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)
=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m4 ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)
=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m4 
--param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)
=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m8 ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)
=== g++: Unexpected fails for 
rv64gc_zve32f_zvfh_zfh lp64d medlow --param=riscv-autovec-lmul=m8 
--param=riscv-autovec-preference=fixed-vlmax ===
FAIL: g++.dg/pr106219.C  -std=gnu++14 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++17 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++20 (internal compiler error: in require, at 
machmode.h:313)
FAIL: g++.dg/pr106219.C  -std=gnu++98 (internal compiler error: in require, at 
machmode.h:313)

The root cause is that we can't extend RVVM4SImode into RVVM8DImode on zve32f.
Add a blocker to disable such auto-vectorization in this situation.

gcc/ChangeLog:

* config/riscv/autovec.md: Add blocker.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_p): New 
function.
* config/riscv/riscv-v.cc (gather_scatter_valid_offset_p): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-2.C: New test.

---
 gcc/config/riscv/autovec.md   | 24 -
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 18 +
 .../g++.target/riscv/rvv/autovec/bug-2.C  | 26 +++
 4 files changed, 57 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/autovec/bug-2.C

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2d727c2609b..b9f7aa204da 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -59,7 +59,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -74,7 +74,7 @@
(match_operand: 5 "vector_mask_operand")
(match_operand 6 "autovec_length_operand")
(match_operand 7 "const_0_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
   DONE;
@@ -89,7 +89,7 @@
(match_operand: 5 

Re: [gcc15] nested functions in C

2023-12-04 Thread Martin Uecker
Am Montag, dem 04.12.2023 um 21:33 + schrieb Joseph Myers:
> On Mon, 4 Dec 2023, Siddhesh Poyarekar wrote:
> 
> > On 2023-12-04 13:48, Martin Uecker wrote:
> > > > I empathize with Jakub's stated use case though of keeping the C
> > > > frontend support for testing purposes, but that could easily be done
> > > > behind a flag, or by putting nested C func deprecation behind a flag.
> > > 
> > > I am relatively sure C will get some form of nested functions.
> > > Maybe as anonymous nested functions, i.e. lambdas, but I do
> > > not see a fundamental difference here (I personally like naming
> > > things for clarity, so i prefer named nested functions)
> > 
> > If (assuming from them being called lambdas) they are primarily for small
> > functions without side-effects then it's already a significantly stronger
> > specification than what we have right now with C nested functions.  That 
> > would
> > end up enforcing what you demonstrate as the good way to use nested 
> > functions.
> 
> The key feature of lambdas (which failed to make it into C23) for this 
> purpose is that you can't convert them to function pointers, which 
> eliminates any need for trampolines.

And that also makes them useful only for template-like macro programming,
but not much else.  So my understanding was that this needs to be
addressed at some point.

Martin

> 



Re: [PATCH 2/6] c: Turn int-conversion warnings into permerrors

2023-12-04 Thread Kito Cheng
Both patches have landed; newlib trunk should be able to build with gcc
trunk now.

On Mon, Dec 4, 2023 at 3:45 PM Kito Cheng  wrote:
>
> RISC-V newlib patches sent, one for libgloss and another one for libm;
> the libm issue is because we don't have the right long double support.
> However, newlib added support for that a few months ago, and the porting
> effort is minor, so I just ported that to fix the issue :)
>
> https://sourceware.org/pipermail/newlib/2023/020725.html
> https://sourceware.org/pipermail/newlib/2023/020726.html


Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Vincenzo Palazzo
>Distro build procedures are not something the GCC project generally gets
involved with.

I see, but to me this does not look like a distro build procedure,
because you can use it with any kind of system (OSX/UNIX) by using Nix.

I disagree with you because my patch is not building a package but is
just giving an agnostic way to develop with GCC.  Of course it is most
useful with NixOS, because NixOS does not have apt or pacman or any other
kind of package manager.

Cheers,

   Vincent.

On Tue, Dec 5, 2023 at 2:54 AM Jeff Law  wrote:
>
>
>
> On 12/4/23 18:38, Vincenzo Palazzo wrote:
> > Ciao all,
> >
> >> +1.  I think this is best left to the distros.
> >
> > What do you mean? This is not a package, it is an env shell in order
> > to build and work on GCC on NixOS.
> >
> Distro build procedures are not something the GCC project generally gets
> involved with.
>
> jeff


Re: [PATCH v4] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-04 Thread Richard Sandiford
Manos Anagnostakis  writes:
> On Mon, 4 Dec 2023, 21:22 Richard Sandiford <richard.sandif...@arm.com> wrote:
>
>> Manos Anagnostakis  writes:
>> > This is an RTL pass that detects store forwarding from stores to larger
>> loads (load pairs).
>> >
>> > This optimization is SPEC2017-driven and was found to be beneficial for
>> some benchmarks,
>> > through testing on ampere1/ampere1a machines.
>> >
>> > For example, it can transform cases like
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldp  d31, d17, [sp, #312] # Large load from small store
>> >
>> > to
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldr  d31, [sp, #312]
>> > ldr  d17, [sp, #320]
>> >
>> > Currently, the pass is disabled by default on all architectures and
>> enabled by a target-specific option.
>> >
>> > If deemed beneficial enough for a default, it will be enabled on
>> ampere1/ampere1a,
>> > or other architectures as well, without needing to be turned on by this
>> option.
>> >
>> > Bootstrapped and regtested on aarch64-linux.
>> >
>> > gcc/ChangeLog:
>> >
>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
>> pass.
>> > * config/aarch64/aarch64-protos.h
>> (make_pass_avoid_store_forwarding): Declare.
>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
>> option.
>> >   (aarch64-store-forwarding-threshold): New param.
>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
>> > * doc/invoke.texi: Document new option and new param.
>> > * config/aarch64/aarch64-store-forwarding.cc: New file.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis 
>> > Co-Authored-By: Manolis Tsamis 
>> > Co-Authored-By: Philipp Tomsich 
>> > ---
>> > Changes in v4:
>> >   - I had problems to make cselib_subst_to_values work correctly
>> > so I used cselib_lookup to implement the exact same behaviour and
>> > record the store value at the time we iterate over it.
>> >   - Removed the store/load_mem_addr check from is_forwarding as
>> > unnecessary.
>> >   - The pass is called on all optimization levels right now.
>> >   - The threshold check should remain as it is as we only care for
>> > the front element of the list. The comment above the check
>> explains
>> > why a single if is enough.
>>
>> I still think this is structurally better as a while.  There's no reason
>> in principle we why wouldn't want to record the stores in:
>>
>> stp x0, x1, [x4, #8]
>> ldp x0, x1, [x4, #0]
>> ldp x2, x3, [x4, #16]
>>
>> and then the two stores should have the same distance value.
>> I realise we don't do that yet, but still.
>>
> Ah, you mean forwarding from stp. I was a bit confused with what you meant
> the previous time. This was not initially meant for this patch, but I think
> it wouldn't take long to implement that before pushing this. It is your
> call of course if I should include it.

No strong opinion either way, really.  It's definitely fine to check
only for STRs initially.  I'd just rather not bake that assumption into
places where we don't need to.

If you do check for STPs, it'd probably be worth committing the pass
without it at first, waiting until Alex's patch is in, and then doing
STP support as a separate follow-up patch.  That's because Alex's patch
will convert STPs to using a single double-width memory operand.

(Waiting shouldn't jeopardise the chances of the patch going in, since
the pass was posted well in time and the hold-up isn't your fault.)

> [...]
>> > +/* Return true if STORE_MEM_ADDR is forwarding to the address of
>> LOAD_MEM;
>> > +   otherwise false.  STORE_MEM_MODE is the mode of the MEM rtx
>> containing
>> > +   STORE_MEM_ADDR.  */
>> > +
>> > +static bool
>> > +is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode
>> store_mem_mode)
>> > +{
>> > +  /* Sometimes we do not have the proper value.  */
>> > +  if (!CSELIB_VAL_PTR (store_mem_addr))
>> > +return false;
>> > +
>> > +  gcc_checking_assert (MEM_P (load_mem));
>> > +
>> > +  rtx load_mem_addr = get_addr (XEXP (load_mem, 0));
>> > +  machine_mode load_mem_mode = GET_MODE (load_mem);
>> > +  load_mem_addr = cselib_lookup (load_mem_addr, load_mem_mode, 1,
>> > +  load_mem_mode)->val_rtx;
>>
>> Like I said in the previous review, it shouldn't be necessary to do any
>> manual lookup on the load address.  rtx_equal_for_cselib_1 does the
>> lookup itself.  Does that not work?
>>
> I thought you meant only that the if check was redundant here, which it
> was.

Ah, no, I meant the full lookup, sorry.

> I'll reply if 

Re: [PATCH v2] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-12-04 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>>> Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible 
>>> with
>>> existing binaries, gives better performance than locking atomics and is what
>>> most users expect.
>>
>> Please add a justification for why it's backwards compatible, rather
>> than just stating that it's so.
>
> This isn't any different than the LSE2 support which also switches some CPUs 
> to
> lock-free implementations. This is basically switching the rest. It trivially 
> follows
> from the fact that GCC always calls libatomic so that you switch all atomics 
> in a
> process. I'll add that to the description.

So I guess there's an implicit assumption that we don't support people
linking libatomic statically into a DSO and hiding the symbols.  Obviously
not great practice, but I wouldn't be 100% surprised if someone's done
that...

> Note the compatibility story is even better than this. We are also compatible
> with LLVM and future GCC versions which may inline these sequences.
>
>> Thanks for adding this.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95722
>> suggests that it's still an open question whether this is a correct thing
>> to do, but it sounds from Joseph's comment that he isn't sure whether
>> atomic loads from read-only data are valid.
>
> Yes it's not useful to do an atomic read if it is a read-only value... It 
> should
> be feasible to mark atomic types as mutable to force them to .data (see eg.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109553).
>
>> Linus's comment in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70490
>> suggests that a reasonable compromise might be to use a storing
>> implementation but not advertise that it is lock-free.  Also,
>> the comment above libat_is_lock_free says:
>>
>> /* Note that this can return that a size/alignment is not lock-free even if
>>all the operations that we use to implement the respective accesses 
>> provide
>>lock-free forward progress as specified in C++14:  Users likely expect
>>"lock-free" to also mean "fast", which is why we do not return true if, 
>> for
>>example, we implement loads with this size/alignment using a CAS.  */
>
> I don't believe lying about being lock-free like that is a good idea. When
> you use a faster lock-free implementation, you want to tell users about it
> (so they aren't forced to use nasty inline assembler hacks for example).
>
>> We don't use a CAS for the fallbacks, but like you say, we do use a
>> load/store exclusive loop.  So did you consider not doing this:
>
>> +/* State we have lock-free 128-bit atomics.  */
>> +#undef FAST_ATOMIC_LDST_16
>> +#define FAST_ATOMIC_LDST_161
>
> That would result in __atomic_is_lock_free incorrectly returning false.
> Note that __atomic_always_lock_free remains false for 128-bit since there
> is no inlining in the compiler, but __atomic_is_lock_free should be true.

I don't think it's so much lying/being incorrect as admitting that there
are shades of grey.  And the sources above suggest that there's no
consensus around saying that a load/store exclusive implementation of
an atomic load should advertise itself as lock-free.  x86 has:

/* Since load and store are implemented with CAS, they are not fast.  */
# undef FAST_ATOMIC_LDST_16
# define FAST_ATOMIC_LDST_160

I'm not comfortable with rejecting the opinions above based on my
limited knowledge of the area.

So the patch is OK from my POV without the FAST_ATOMIC_LDST_16
definition.  Please get approval from someone else if you want to
keep the definition, either as part of the initial patch or a separate
follow-on.

To be clear, I'd be happy with something that makes __atomic_is_lock_free
return true for LSE2, if that doesn't already happen (looks like it
doesn't happen?).
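
For concreteness, the user-visible query this decision affects is roughly the
following (a minimal sketch, not taken from the patch; the struct and function
names are made up):

  /* A 16-byte object.  Whether accesses to it report as lock-free is what
     the FAST_ATOMIC_LDST_16 / libat_is_lock_free discussion above feeds
     into.  */
  struct pair { long long lo, hi; };

  _Bool
  pair_is_lock_free (struct pair *p)
  {
    return __atomic_is_lock_free (sizeof *p, p);
  }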

Thanks,
Richard

>> -   /* RELEASE.  */
>> -5: ldxpres0, res1, [x5]
>> +   /* RELEASE/ACQ_REL/SEQ_CST.  */
>> +4: ldaxp   res0, res1, [x5]
>>  stlxp   w4, in0, in1, [x5]
>> -   cbnzw4, 5b
>> +   cbnzw4, 4b
>>  ret
>> +END (libat_exchange_16)
>
>> Please explain (here and in the commit message) why you're adding
>> acquire semantics to the RELEASE case.
>
> That merges the RELEASE with ACQ_REL/SEQ_CST cases to keep the code
> short and simple like much of the code. I've added a note in the commit msg.
>
> Cheers,
> Wilco
>
> Here is v2 - this also incorporates the PR111404 fix to compare-exchange:
>
> Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible 
> with
> existing binaries (as for these GCC always calls into libatomic, so all 
> 128-bit
> atomic uses in  a process are switched), gives better performance than locking
> atomics and is what most users expect.
>
> Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not 
> supported.
> This results in an implicit store which is invisible to software as long as 
> the
> 

Re: [gcc15] nested functions in C

2023-12-04 Thread Siddhesh Poyarekar

On 2023-12-04 13:51, Jakub Jelinek wrote:

Why?  The syntax doesn't seem to be something unexpected, and as C doesn't
have lambdas, one can use the nested functions instead.
The only problem is if you need to pass function pointers somewhere else
(and target doesn't have function descriptors or something similar), if it
is only done to make code more readable compared to say use of macros, I
think the nested functions are better, one doesn't have to worry about
multiple evaluations of argument side-effects etc.  And if everything is
inlined and SRA optimized, there is no extra cost.
The problem of passing it as a function pointer to other functions is
common with C++, only lambdas which don't capture anything actually can be
convertible to function pointer, for anything else you need a template and
instantiate it for a particular lambda (which is something you can't do in
C).


I think from a language standpoint, the general idea that nested 
functions are just any functions inside functions (which is how the C 
nested functions essentially behave) is too broad and they should be 
restricted to minimal implementations that, e.g. don't have side-effects 
or if they do, there's explicit syntactic sugar to make it clearer.
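
A minimal sketch of that kind of contained use (illustrative only; the names
are made up): a helper that captures a couple of locals, is never converted to
a function pointer, and therefore needs no trampoline once it is inlined:

  static int
  sum_clamped (const int *v, int n, int lo, int hi)
  {
    /* Nested helper capturing LO and HI; its address is never taken,
       so it can simply be inlined and no executable stack is needed.  */
    int clamp (int x) { return x < lo ? lo : x > hi ? hi : x; }

    int s = 0;
    for (int i = 0; i < n; i++)
      s += clamp (v[i]);
    return s;
  }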


If (as Martin stated earlier) nested functions are in fact going to
enter the C standard in the future, then maybe this discussion is moot and we
are probably better off implementing descriptor support for C to replace
the on-stack trampolines instead of adding -Werror=trampolines in a hurry.


Thanks,
Sid


RE: [PATCH] i386: Improve code generation for vector __builtin_signbit (x.x[i]) ? -1 : 0 [PR112816]

2023-12-04 Thread Liu, Hongtao



> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, December 5, 2023 3:01 PM
> To: Uros Bizjak ; Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] i386: Improve code generation for vector __builtin_signbit
> (x.x[i]) ? -1 : 0 [PR112816]
> 
> Hi!
> 
> On the testcase I've recently fixed I've noticed bad code generation, we emit
> pxor%xmm1, %xmm1
> psrld   $31, %xmm0
> pcmpeqd %xmm1, %xmm0
> pcmpeqd %xmm1, %xmm0
> or
> vpxor   %xmm1, %xmm1, %xmm1
> vpsrld  $31, %xmm0, %xmm0
> vpcmpeqd%xmm1, %xmm0, %xmm0
> vpcmpeqd%xmm1, %xmm0, %xmm2
> rather than
> psrad   $31, %xmm2
> or
> vpsrad  $31, %xmm1, %xmm2
> The following patch fixes that using a combiner splitter.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok.
> 
> 2023-12-04  Jakub Jelinek  
> 
>   PR target/112816
>   * config/i386/sse.md ((eq (eq (lshiftrt x elt_bits-1) 0) 0)): New
>   splitter to turn psrld $31; pcmpeq; pcmpeq into psrad $31.
> 
>   * gcc.target/i386/pr112816.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2023-12-04 09:00:12.722437462 +0100
> +++ gcc/config/i386/sse.md2023-12-04 13:22:38.565833465 +0100
> @@ -16614,6 +16614,18 @@ (define_insn_and_split "*ashrv1ti3_inter
>DONE;
>  })
> 
> +(define_split
> +  [(set (match_operand:VI248_AVX2 0 "register_operand")
> +(eq:VI248_AVX2
> +   (eq:VI248_AVX2
> + (lshiftrt:VI248_AVX2
> +   (match_operand:VI248_AVX2 1 "register_operand")
> +   (match_operand:SI 2 "const_int_operand"))
> + (match_operand:VI248_AVX2 3 "const0_operand"))
> +   (match_operand:VI248_AVX2 4 "const0_operand")))]
> +  "INTVAL (operands[2]) == GET_MODE_PRECISION (mode)
> - 1"
> +  [(set (match_dup 0) (ashiftrt:VI248_AVX2 (match_dup 1) (match_dup
> +2)))])
> +
>  (define_expand "rotlv1ti3"
>[(set (match_operand:V1TI 0 "register_operand")
>   (rotate:V1TI
> --- gcc/testsuite/gcc.target/i386/pr112816.c.jj   2023-12-04
> 13:31:51.215061445 +0100
> +++ gcc/testsuite/gcc.target/i386/pr112816.c  2023-12-04
> 13:34:14.008053097 +0100
> @@ -0,0 +1,27 @@
> +/* PR target/112816 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx512f -masm=att" } */
> +/* { dg-final { scan-assembler-times "psrad\t\\\$31," 2 } } */
> +/* { dg-final { scan-assembler-not "pcmpeqd\t" } } */
> +
> +#define N 4
> +struct S { float x[N]; };
> +struct T { int x[N]; };
> +
> +__attribute__((target ("no-sse3,sse2"))) struct T foo (struct S x) {
> +  struct T res;
> +  for (int i = 0; i < N; ++i)
> +res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
> +  return res;
> +}
> +
> +__attribute__((target ("avx2"))) struct T bar (struct S x) {
> +  struct T res;
> +  for (int i = 0; i < N; ++i)
> +res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
> +  return res;
> +}
> 
>   Jakub



[PATCH v2 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrt operations.

2023-12-04 Thread Jiahao Xu
LoongArch V1.1 adds support for approximate instructions, which are used along
with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations for
better throughput.
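
For reference, one Newton-Raphson refinement step computes the following
(a generic sketch of the standard math, not the exact sequence the patches
emit; x0 stands for the hardware estimate from the new frecipe/frsqrte
instructions):

  /* Refine x0 ~= 1/a:  x1 = x0 * (2 - a * x0).  */
  static inline float
  refine_recip (float a, float x0)
  {
    return x0 * (2.0f - a * x0);
  }

  /* Refine x0 ~= 1/sqrtf(a):  x1 = x0 * (1.5 - 0.5 * a * x0 * x0).  */
  static inline float
  refine_rsqrt (float a, float x0)
  {
    return x0 * (1.5f - 0.5f * a * x0 * x0);
  }

A division a/b then becomes a multiplied by the refined reciprocal of b, which
trades a little accuracy for throughput, hence the tie to the fast-math flags.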

The patches are modifications made based on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html.

Jiahao Xu (5):
  LoongArch: Add support for LoongArch V1.1 approximate instructions.
  LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
instructions.
  LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
  LoongArch: New options -mrecip and -mrecip= with ffast-math.
  LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
when -mrecip is enabled.

 gcc/config/loongarch/genopts/isa-evolution.in |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
 gcc/config/loongarch/larchintrin.h|  38 +++
 gcc/config/loongarch/lasx.md  |  89 ++-
 gcc/config/loongarch/lasxintrin.h |  34 +++
 gcc/config/loongarch/loongarch-builtins.cc|  66 +
 gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
 gcc/config/loongarch/loongarch-protos.h   |   2 +
 gcc/config/loongarch/loongarch-str.h  |   1 +
 gcc/config/loongarch/loongarch.cc | 252 +-
 gcc/config/loongarch/loongarch.h  |  18 ++
 gcc/config/loongarch/loongarch.md | 104 ++--
 gcc/config/loongarch/loongarch.opt|  15 ++
 gcc/config/loongarch/lsx.md   |  89 ++-
 gcc/config/loongarch/lsxintrin.h  |  34 +++
 gcc/config/loongarch/predicates.md|   8 +
 gcc/doc/extend.texi   |  18 ++
 gcc/doc/invoke.texi   |  54 
 gcc/testsuite/gcc.target/loongarch/divf.c |  10 +
 .../loongarch/larch-frecipe-builtin.c |  28 ++
 .../gcc.target/loongarch/recip-divf.c |   9 +
 .../gcc.target/loongarch/recip-sqrtf.c|  23 ++
 gcc/testsuite/gcc.target/loongarch/sqrtf.c|  24 ++
 .../loongarch/vector/lasx/lasx-divf.c |  13 +
 .../vector/lasx/lasx-frecipe-builtin.c|  30 +++
 .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
 .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
 .../loongarch/vector/lasx/lasx-recip.c|  24 ++
 .../loongarch/vector/lasx/lasx-rsqrt.c|  26 ++
 .../loongarch/vector/lasx/lasx-sqrtf.c|  29 ++
 .../loongarch/vector/lsx/lsx-divf.c   |  13 +
 .../vector/lsx/lsx-frecipe-builtin.c  |  30 +++
 .../loongarch/vector/lsx/lsx-recip-divf.c |  12 +
 .../loongarch/vector/lsx/lsx-recip-sqrtf.c|  28 ++
 .../loongarch/vector/lsx/lsx-recip.c  |  24 ++
 .../loongarch/vector/lsx/lsx-rsqrt.c  |  26 ++
 .../loongarch/vector/lsx/lsx-sqrtf.c  |  29 ++
 37 files changed, 1212 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c

-- 
2.20.1



Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-04 Thread chenglulu



在 2023/12/2 下午9:41, Xi Ruoyao 写道:

On Sat, 2023-12-02 at 20:44 +0800, chenglulu wrote:

@@ -657,12 +658,18 @@ abi_str (struct loongarch_abi abi)
     strlen (loongarch_abi_base_strings[abi.base]));
      else
    {
+  /* This situation has not yet occurred, so in order to avoid
the
+-Warray-bounds warning during C++ syntax checking, this part
+of the code is commented first.*/
+  /*

Just put a "gcc_unreachable ();" here?

Um, I just thought that the code can't go here, I will add a prompt
message here.:-(

If I read the code correctly, this is indeed unreachable so we can just
put gcc_unreachable() here.  But maybe I'm wrong.


I agree that if it runs this far, it's a problem with the code design.

In addition, I've sorted out the patches to remove the unnecessary guards,
and I will send the v2 version of the patch immediately.



Re: [gcc15] nested functions in C

2023-12-04 Thread Siddhesh Poyarekar

On 2023-12-04 13:48, Martin Uecker wrote:

I empathize with Jakub's stated use case though of keeping the C
frontend support for testing purposes, but that could easily be done
behind a flag, or by putting nested C func deprecation behind a flag.


I am relatively sure C will get some form of nested functions.
Maybe as anonymous nested functions, i.e. lambdas, but I do
not see a fundamental difference here (I personally like naming
things for clarity, so i prefer named nested functions)


If (assuming from them being called lambdas) they are primarily for 
small functions without side-effects then it's already a significantly 
stronger specification than what we have right now with C nested 
functions.  That would end up enforcing what you demonstrate as the good 
way to use nested functions.


I suppose minimal, contained side-effects (such as atomically updating a 
structure) may also constitute sound design, but that should be made 
explicit in the language.



I don't disagree for cases like -Warray-bounds,
but for warnings/errors that are more deterministic in nature (like
-Werror=trampolines), they're going to point at actual problems and
larger projects and distributions will usually prefer to at least track
them, if not actually fix them.  For Fedora we tend to provide macro
overrides for packages that need to explicitly disable a security
related flag.


In projects such as mine, this will lead to a lot of code
transformations as indicated above, i.e. much worse code.

One could get away with it, since nested functions are rarely
used, but I think this is bad, because a lot of code would
improve if it used them.


If nested functions are eventually going to make it into the C standard 
then effort is probably better spent in porting the C nested functions 
to use descriptors instead of executable stacks or heaps.


Thanks,
Sid


Re: [PATCH] c/89270 - honor registered_builtin_types in type_for_size

2023-12-04 Thread Joseph Myers
On Mon, 4 Dec 2023, Richard Biener wrote:

> The following fixes the intermediate conversions inserted by
> convert_to_integer when facing address-spaces and converts
> to their effective [u]intptr_t when they are registered_builtin_types
> by considering those also from c_common_type_for_size and not
> only from c_common_type_for_mode.
> 
> Bootstrap and regtest on x86_64-unknown-linux-gnu, OK?

OK, but I also think AVR should define its __int24 type using INT_N in 
avr-modes.def, which should also serve to avoid this problem.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Maintain a validity flag for REG_UNUSED notes [PR112760] (was Re: [PATCH] pro_and_epilogue: Call df_note_add_problem () if SHRINK_WRAPPING_ENABLED [PR112760])

2023-12-04 Thread Jakub Jelinek
On Mon, Dec 04, 2023 at 05:30:45PM +, Richard Sandiford wrote:
> > I don't think it's worth adding the note problem to shrink-wrapping
> > just for the regcprop code.  If we're prepared to take that compile-time
> > hit, we might as well run a proper (fast) DCE.
> 
> Here's a patch that tries to do that.  Boostrapped & regression tested
> on aarch64-linux-gnu.  Also tested on x86_64-linux-gnu for the testcase.
> (I'll run full x86_64-linux-gnu testing overnight.)
> 
> OK to install if that passes?  Not an elegant fix, but it's probably
> too much to hope for one of those.

Isn't this way too conservative though, basically limiting single_set
to ~ 15 out of the ~ 65 RTL passes (sure, it will still DTRT for
non-PARALLEL or just PARALLEL with clobbers/uses)?
Do we know about passes other than postreload which may invalidate
REG_UNUSED notes while not purging them altogether?
Given what postreload does, I bet cse/gcse might too.

If we add a RTL checking verification of the notes, we could know
immediately what other passes invalidate it.

So, couldn't we just set such flag at the end of such passes (or only if
they actually remove any redundant insns and thus potentially invalidate
them, perhaps during doing so)?

And on the x86 side, the question from the PR still stands, why is
vzeroupper pass placed exactly after reload and not postreload which
cleans stuff up after reload.

Jakub



[PATCH v2 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled.

2023-12-04 Thread Jiahao Xu
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and
rsqrtf.  The number of generated instructions is close to, or exceeds, the
maximum number of instructions that can be issued per cycle on LoongArch, so
vectorized loop unrolling is not performed for them.

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_vector_costs::determine_suggested_unroll_factor):
If m_has_recip is true, return an unroll factor of 1.
(loongarch_vector_costs::add_stmt_cost): Detect the use of approximate
instruction sequences.

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 2c06edcff92..0ca60e15ced 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3974,7 +3974,9 @@ protected:
   /* Reduction factor for suggesting unroll factor.  */
   unsigned m_reduc_factor = 0;
   /* True if the loop contains an average operation. */
-  bool m_has_avg =false;
+  bool m_has_avg = false;
+  /* True if the loop uses approximation instruction sequence.  */
+  bool m_has_recip = false;
 };
 
 /* Implement TARGET_VECTORIZE_CREATE_COSTS.  */
@@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor 
(loop_vec_info loop_vi
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-  if (m_has_avg)
+  if (m_has_avg || m_has_recip)
 return 1;
 
   /* Don't unroll if it's specified explicitly not to be unrolled.  */
@@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
}
 }
 
+  combined_fn cfn;
+  if (kind == vector_stmt
+  && stmt_info
+  && stmt_info->stmt)
+{
+  /* Detect the use of approximate instruction sequence.  */
+  if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT)
+ && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
+   switch (cfn)
+ {
+ case CFN_BUILT_IN_SQRTF:
+   m_has_recip = true;
+ default:
+   break;
+ }
+  else if (TARGET_RECIP_VEC_DIV
+  && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
+   {
+ machine_mode mode = TYPE_MODE (vectype);
+ switch (gimple_assign_rhs_code (stmt_info->stmt))
+   {
+   case RDIV_EXPR:
+ if (GET_MODE_INNER (mode) == SFmode)
+   m_has_recip = true;
+   default:
+ break;
+   }
+   }
+}
+
   return retval;
 }
 
-- 
2.20.1



[PATCH v2 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-12-04 Thread Jiahao Xu
When both the -mrecip and -mfrecipe options are enabled, use approximate
reciprocal instructions and approximate reciprocal square root instructions
with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations,
for better performance.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable.
(-mrecip, -mrecip): New options.
* config/loongarch/lasx.md (div3): New expander.
(*div3): Rename.
(sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New 
prototype.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.cc (loongarch_option_override_internal): 
Set
recip_mask for -mrecip and -mrecip= options.
(loongarch_emit_swrsqrtsf): New function.
(loongarch_emit_swdivsf): Ditto.
* config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, 
RECIP_MASK_SQRT
RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, 
RECIP_MASK_VEC_RSQRT
RECIP_MASK_ALL): New bitmasks.
(TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, 
TARGET_RECIP_VEC_DIV
TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests.
* config/loongarch/loongarch.md (sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/loongarch.opt (recip_mask): New variable.
(-mrecip, -mrecip): New options.
* config/loongarch/lsx.md (div3): New expander.
(*div3): Rename.
(sqrt2): New expander.
(*sqrt2): Rename.
(rsqrt2): New expander.
* config/loongarch/predicates.md (reg_or_vecotr_1_operand): New 
predicate.
* doc/invoke.texi (LoongArch Options): Document new options.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/divf.c: New test.
* gcc.target/loongarch/recip-divf.c: New test.
* gcc.target/loongarch/recip-sqrtf.c: New test.
* gcc.target/loongarch/sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-recip.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-recip.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test.

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 8af6cc6f532..cc1a9daf7cf 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h
 HeaderInclude
 config/loongarch/loongarch-str.h
 
+TargetVariable
+unsigned int recip_mask = 0
+
 ; ISA related options
 ;; Base ISA
 Enum
@@ -197,6 +200,14 @@ mexplicit-relocs
 Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
 Use %reloc() assembly operators (for backward compatibility).
 
+mrecip
+Target RejectNegative Var(loongarch_recip)
+Generate approximate reciprocal divide and square root for better throughput.
+
+mrecip=
+Target RejectNegative Joined Var(loongarch_recip_name)
+Control generation of reciprocal estimates.
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4310c4523d..f6f2feedbb3 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1194,7 +1194,25 @@ (define_insn "mul3"
   [(set_attr "type" "simd_fmul")
(set_attr "mode" "")])
 
-(define_insn "div3"
+(define_expand "div3"
+  [(set (match_operand:FLASX 0 "register_operand")
+(div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand")
+  (match_operand:FLASX 2 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  if (mode == V8SFmode
+&& TARGET_RECIP_VEC_DIV
+&& optimize_insn_for_speed_p ()
+&& flag_finite_math_only && !flag_trapping_math
+&& flag_unsafe_math_optimizations)
+  {
+loongarch_emit_swdivsf (operands[0], operands[1],
+   operands[2], V8SFmode);
+DONE;
+  }
+})
+
+(define_insn "*div3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
(div:FLASX (match_operand:FLASX 1 "register_operand" "f")
   (match_operand:FLASX 2 "register_operand" "f")))]
@@ -1223,7 +1241,23 @@ (define_insn "fnma4"
   [(set_attr "type" "simd_fmadd")
(set_attr "mode" "")])
 
-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:FLASX 0 

[PATCH v2 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

2023-12-04 Thread Jiahao Xu
Redefine the pattern for the [x]vfrecip instructions to use an rtx code instead
of an unspec, and enable the [x]vfrecip instructions to be generated during
auto-vectorization.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): 
Redefine
to new pattern name.
(CODE_FOR_lsx_vfrecip_s): Ditto.
(CODE_FOR_lasx_xvfrecip_d): Ditto.
(CODE_FOR_lasx_xvfrecip_s): Ditto.
(loongarch_expand_builtin_direct): For the vector recip instructions, 
construct a
temporary parameter const1_vector.
* config/loongarch/lsx.md (lsx_vfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/predicates.md (const_vector_1_operand): New 
predicate.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index c8edc1bfd76..e4310c4523d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1626,12 +1626,12 @@ (define_insn "lasx_xvfmina_"
   [(set_attr "type" "simd_fminmax")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvfrecip_"
+(define_insn "recip3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-   (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
- UNSPEC_LASX_XVFRECIP))]
+   (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "")
+ (match_operand:FLASX 2 "register_operand" "f")))]
   "ISA_HAS_LASX"
-  "xvfrecip.\t%u0,%u1"
+  "xvfrecip.\t%u0,%u2"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index b196e142d61..e0933537166 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -502,6 +502,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
 #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
+#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3
+#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -780,6 +782,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
 #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
+#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3
+#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
@@ -3024,6 +3028,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, 
rtx target, tree exp,
   if (has_target_p)
 create_output_operand ([opno++], target, TYPE_MODE (TREE_TYPE (exp)));
 
+  /* For the vector reciprocal instructions, we need to construct a temporary
+ parameter const1_vector.  */
+  switch (icode)
+{
+case CODE_FOR_recipv8sf3:
+case CODE_FOR_recipv4df3:
+case CODE_FOR_recipv4sf3:
+case CODE_FOR_recipv2df3:
+  loongarch_prepare_builtin_arg ([2], exp, 0);
+  create_input_operand ([1], CONST1_RTX (ops[0].mode), ops[0].mode);
+  return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p);
+
+default:
+  break;
+}
+
   /* Map the arguments to the other operands.  */
   gcc_assert (opno + call_expr_nargs (exp)
  == insn_data[icode].n_generator_args);
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index aeae1b1a622..06402e3b353 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1539,12 +1539,12 @@ (define_insn "lsx_vfmina_"
   [(set_attr "type" "simd_fminmax")
(set_attr "mode" "")])
 
-(define_insn "lsx_vfrecip_"
+(define_insn "recip3"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
-   (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-UNSPEC_LSX_VFRECIP))]
+   (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "")
+(match_operand:FLSX 2 "register_operand" "f")))]
   "ISA_HAS_LSX"
-  "vfrecip.\t%w0,%w1"
+  "vfrecip.\t%w0,%w2"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index d02e846cb12..f7796da10b2 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -227,6 +227,10 @@ (define_predicate "const_1_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
 
+(define_predicate "const_vector_1_operand"
+  (and (match_code "const_vector")
+   (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
 (define_predicate "reg_or_1_operand"
   (ior (match_operand 0 

[PATCH v2 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions.

2023-12-04 Thread Jiahao Xu
This patch adds define_insn/builtins/intrinsics for these instructions, and
adds the option -mfrecipe to control instruction generation.

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in (frecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
* config/loongarch/lasx.md (lasx_xvfrecipe_): New insn 
pattern.
(lasx_xvfrsqrte_): Ditto.
* config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic.
(__lasx_xvfrecipe_d): Ditto.
(__lasx_xvfrsqrte_s): Ditto.
(__lasx_xvfrsqrte_d): Ditto.
* config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates.
(LSX_EXT_BUILTIN): New macro.
(LASX_EXT_BUILTIN): Ditto.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
* config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate.
* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status 
for TARGET_FRECIPE.
* config/loongarch/loongarch.md (loongarch_frecipe_): New insn 
pattern.
(loongarch_frsqrte_): Ditto.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/lsx.md (lsx_vfrecipe_): New insn pattern.
(lsx_vfrsqrte_): Ditto.
* config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic.
(__lsx_vfrecipe_d): Ditto.
(__lsx_vfrsqrte_s): Ditto.
(__lsx_vfrsqrte_d): Ditto.
* doc/extend.texi: Add documentation for LoongArch new builtins and 
intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test.

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in 
b/gcc/config/loongarch/genopts/isa-evolution.in
index a6bc3f87f20..11a198b649f 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -1,3 +1,4 @@
+2  25  frecipe Support frecipe.{s/d} and frsqrte.{s/d} 
instructions.
 2  26  div32   Support div.w[u] and mod.w[u] instructions with 
inputs not sign-extended.
 2  27  lam-bh  Support am{swap/add}[_db].{b/h} instructions.
 2  28  lamcas  Support amcas[_db].{b/h/w/d} instructions.
diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
index e571ed27b37..028081cccfb 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
 }
 #endif
 
+#if defined(__loongarch64) &&  defined(TARGET_FRECIPE)
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_s (float _1)
+{
+  __builtin_loongarch_frecipe_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_d (double _1)
+{
+  __builtin_loongarch_frecipe_d ((double) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_s (float _1)
+{
+  __builtin_loongarch_frsqrte_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_d (double _1)
+{
+  __builtin_loongarch_frsqrte_d ((double) _1);
+}
+#endif
+
 /* Assembly instruction format:ui15.  */
 /* Data types in instruction templates:  USI.  */
 #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1))
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 116b30c0774..f6e5208a6f1 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -40,8 +40,10 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCVTL
   UNSPEC_LASX_XVFLOGB
   UNSPEC_LASX_XVFRECIP
+  UNSPEC_LASX_XVFRECIPE
   UNSPEC_LASX_XVFRINT
   UNSPEC_LASX_XVFRSQRT
+  UNSPEC_LASX_XVFRSQRTE
   UNSPEC_LASX_XVFCMP_SAF
   UNSPEC_LASX_XVFCMP_SEQ
   UNSPEC_LASX_XVFCMP_SLE
@@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
+;; Approximate Reciprocal Instructions.
+
+(define_insn "lasx_xvfrecipe_"
+  [(set (match_operand:FLASX 0 "register_operand" "=f")
+(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+ UNSPEC_LASX_XVFRECIPE))]
+  "ISA_HAS_LASX && TARGET_FRECIPE"
+  "xvfrecipe.\t%u0,%u1"
+  [(set_attr "type" "simd_fdiv")
+   (set_attr 

[PATCH v2 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

2023-12-04 Thread Jiahao Xu
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt<mode>2 to align with the standard
pattern name.  Define the function use_rsqrt_p to decide when to use the rsqrt optab.
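
As an illustration (mine, not part of the patch), the kind of scalar source
the rsqrt optab is queried for, assuming the fast-math style conditions
checked in use_rsqrt_p:

/* Hypothetical example: with -Ofast (finite math only, no trapping math,
   unsafe math optimizations) this division may now be expanded through the
   reciprocal square root pattern instead of a sqrt followed by a divide.  */
float
rsqrtf_example (float x)
{
  return 1.0f / __builtin_sqrtf (x);
}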

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(rsqrt2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name.
(CODE_FOR_lsx_vfrsqrt_s): Ditto.
(CODE_FOR_lasx_xvfrsqrt_d): Ditto.
(CODE_FOR_lasx_xvfrsqrt_s): Ditto.
* config/loongarch/loongarch.cc (use_rsqrt_p): New function.
(loongarch_optab_supported_p): Ditto.
(TARGET_OPTAB_SUPPORTED_P): New hook.
* config/loongarch/loongarch.md (*rsqrta): Remove.
(*rsqrt2): New insn pattern.
(*rsqrtb): Remove.
* config/loongarch/lsx.md (lsx_vfrsqrt_): Renamed to ..
(rsqrt2): .. this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f6e5208a6f1..c8edc1bfd76 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1646,10 +1646,10 @@ (define_insn "lasx_xvfrecipe_"
   [(set_attr "type" "simd_fdiv")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvfrsqrt_"
+(define_insn "rsqrt2"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-   (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
- UNSPEC_LASX_XVFRSQRT))]
+(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+ UNSPEC_LASX_XVFRSQRT))]
   "ISA_HAS_LASX"
   "xvfrsqrt.\t%u0,%u1"
   [(set_attr "type" "simd_fdiv")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index bf95a44c0d2..b196e142d61 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -500,6 +500,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h
 #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
+#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
+#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -776,6 +778,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu
 #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
+#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
+#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 57a20bec8a4..96a4b846f2d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11487,6 +11487,30 @@ loongarch_builtin_support_vector_misalignment 
(machine_mode mode,
  is_packed);
 }
 
+static bool
+use_rsqrt_p (void)
+{
+  return (flag_finite_math_only
+ && !flag_trapping_math
+ && flag_unsafe_math_optimizations);
+}
+
+/* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
+
+static bool
+loongarch_optab_supported_p (int op, machine_mode, machine_mode,
+optimization_type opt_type)
+{
+  switch (op)
+{
+case rsqrt_optab:
+  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+
+default:
+  return true;
+}
+}
+
 /* If -fverbose-asm, dump some info for debugging.  */
 static void
 loongarch_asm_code_end (void)
@@ -11625,6 +11649,9 @@ loongarch_asm_code_end (void)
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary
 
+#undef TARGET_OPTAB_SUPPORTED_P
+#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p
 
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 07beede8892..fd154b02e48 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -60,6 +60,7 @@ (define_c_enum "unspec" [
   UNSPEC_TIE
 
   ;; RSQRT
+  UNSPEC_RSQRT
   UNSPEC_RSQRTE
 
   ;; RECIP
@@ -1134,25 +1135,14 @@ (define_insn "sqrt2"
(set_attr "mode" "")
(set_attr "insn_count" "1")])
 
-(define_insn "*rsqrta"
+(define_insn "*rsqrt2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (div:ANYF (match_operand:ANYF 1 "const_1_operand" "")
- (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"]
-  "flag_unsafe_math_optimizations"
-  "frsqrt.\t%0,%2"
-  

[PATCH v4] RISC-V: Implement TLS Descriptors.

2023-12-04 Thread Tatsuyuki Ishi
This implements TLS Descriptors (TLSDESC) as specified in [1].

The 4-instruction sequence is implemented as a single RTX insn for
simplicity, but this can be revisited later if instruction scheduling or
more flexible RA is desired.

The default remains the traditional TLS model, but it can be configured
with --with-tls={trad,desc}. The choice can be revisited once toolchain
and libc support ships.

[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
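
For illustration only (not part of the patch), a minimal TLS access whose
lowering the new -mtls-dialect=desc option is meant to change:

/* Hypothetical example: in the general-dynamic model (e.g. when built with
   -fPIC), -mtls-dialect=desc obtains the address of this variable through
   the TLSDESC convention instead of a call to __tls_get_addr.  */
__thread int counter;

int
next_id (void)
{
  return ++counter;
}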

gcc/ChangeLog:
* config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
* config.gcc: Add --with_tls configuration option to change the
default TLS flavor.
* config/riscv/riscv.h: Add TARGET_TLSDESC determined from
-mtls-dialect and with_tls defaults.
* config/riscv/riscv-opts.h: Define enum riscv_tls_type for the
two TLS flavors.
* config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
* config/riscv/riscv.md: Add instruction sequence for TLSDESC.
* config/riscv/riscv.cc (riscv_symbol_insns): Add instruction
sequence length data for TLSDESC.
(riscv_legitimize_tls_address): Add lowering of TLSDESC.
* doc/install.texi: Document --with-tls for RISC-V.
* doc/invoke.texi: Document -mtls-dialect for RISC-V.
* testsuite/gcc.target/riscv/tls_1.x: Add TLSDESC GD test case.
* testsuite/gcc.target/riscv/tlsdesc.c: Same as above.
---
No regression in gcc tests for rv64gc, tested alongside the binutils and
glibc implementation. Tested with --with_tls=desc.

v2: Add with_tls configuration option, and a few readability improvements.
Added Changelog.
v3: Add documentation per Kito's suggestion.
Fix minor issues pointed out by Kito and Jeff.
Thanks Kito Cheng and Jeff Law for review.
v4: Add TLSDESC GD assembly test.
Rebase on top of trunk.

 gcc/config.gcc   | 15 ++-
 gcc/config/riscv/riscv-opts.h|  6 ++
 gcc/config/riscv/riscv-protos.h  |  5 +++--
 gcc/config/riscv/riscv.cc| 24 
 gcc/config/riscv/riscv.h |  9 +++--
 gcc/config/riscv/riscv.md| 20 +++-
 gcc/config/riscv/riscv.opt   | 14 ++
 gcc/doc/install.texi |  3 +++
 gcc/doc/invoke.texi  | 13 -
 gcc/testsuite/gcc.target/riscv/tls_1.x   |  5 +
 gcc/testsuite/gcc.target/riscv/tlsdesc.c | 12 
 11 files changed, 115 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/tls_1.x
 create mode 100644 gcc/testsuite/gcc.target/riscv/tlsdesc.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 748430194f3..8bb22e9f590 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2490,6 +2490,7 @@ riscv*-*-linux*)
# Force .init_array support.  The configure script cannot always
# automatically detect that GAS supports it, yet we require it.
gcc_cv_initfini_array=yes
+   with_tls=${with_tls:-trad}
;;
 riscv*-*-elf* | riscv*-*-rtems*)
tm_file="elfos.h newlib-stdint.h ${tm_file} riscv/elf.h"
@@ -2532,6 +2533,7 @@ riscv*-*-freebsd*)
# Force .init_array support.  The configure script cannot always
# automatically detect that GAS supports it, yet we require it.
gcc_cv_initfini_array=yes
+   with_tls=${with_tls:-trad}
;;
 
 loongarch*-*-linux*)
@@ -4658,7 +4660,7 @@ case "${target}" in
;;
 
riscv*-*-*)
-   supported_defaults="abi arch tune riscv_attribute isa_spec"
+   supported_defaults="abi arch tune riscv_attribute isa_spec tls"
 
case "${target}" in
riscv-* | riscv32*) xlen=32 ;;
@@ -4788,6 +4790,17 @@ case "${target}" in
;;
esac
fi
+   # Handle --with-tls.
+   case "$with_tls" in
+   "" \
+   | trad | desc)
+   # OK
+   ;;
+   *)
+   echo "Unknown TLS method used in --with-tls=$with_tls" 
1>&2
+   exit 1
+   ;;
+   esac
 
# Handle --with-multilib-list.
if test "x${with_multilib_list}" != xdefault; then
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 30efebbf07b..b2551968be0 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -138,4 +138,10 @@ enum stringop_strategy_enum {
 #define TARGET_MAX_LMUL
\
   (int) (riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul)
 
+/* TLS types.  */
+enum riscv_tls_type {
+  TLS_TRADITIONAL,
+  TLS_DESCRIPTORS
+};
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv-protos.h 

[PATCH] i386: Improve code generation for vector __builtin_signbit (x.x[i]) ? -1 : 0 [PR112816]

2023-12-04 Thread Jakub Jelinek
Hi!

On the testcase I've recently fixed I noticed bad code generation; we emit
pxor%xmm1, %xmm1
psrld   $31, %xmm0
pcmpeqd %xmm1, %xmm0
pcmpeqd %xmm1, %xmm0
or
vpxor   %xmm1, %xmm1, %xmm1
vpsrld  $31, %xmm0, %xmm0
vpcmpeqd%xmm1, %xmm0, %xmm0
vpcmpeqd%xmm1, %xmm0, %xmm2
rather than
psrad   $31, %xmm2
or
vpsrad  $31, %xmm1, %xmm2
The following patch fixes that using a combiner splitter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-04  Jakub Jelinek  

PR target/112816
* config/i386/sse.md ((eq (eq (lshiftrt x elt_bits-1) 0) 0)): New
splitter to turn psrld $31; pcmpeq; pcmpeq into psrad $31.

* gcc.target/i386/pr112816.c: New test.

--- gcc/config/i386/sse.md.jj   2023-12-04 09:00:12.722437462 +0100
+++ gcc/config/i386/sse.md  2023-12-04 13:22:38.565833465 +0100
@@ -16614,6 +16614,18 @@ (define_insn_and_split "*ashrv1ti3_inter
   DONE;
 })
 
+(define_split
+  [(set (match_operand:VI248_AVX2 0 "register_operand")
+(eq:VI248_AVX2
+ (eq:VI248_AVX2
+   (lshiftrt:VI248_AVX2
+ (match_operand:VI248_AVX2 1 "register_operand")
+ (match_operand:SI 2 "const_int_operand"))
+   (match_operand:VI248_AVX2 3 "const0_operand"))
+ (match_operand:VI248_AVX2 4 "const0_operand")))]
+  "INTVAL (operands[2]) == GET_MODE_PRECISION (mode) - 1"
+  [(set (match_dup 0) (ashiftrt:VI248_AVX2 (match_dup 1) (match_dup 2)))])
+
 (define_expand "rotlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
(rotate:V1TI
--- gcc/testsuite/gcc.target/i386/pr112816.c.jj 2023-12-04 13:31:51.215061445 
+0100
+++ gcc/testsuite/gcc.target/i386/pr112816.c2023-12-04 13:34:14.008053097 
+0100
@@ -0,0 +1,27 @@
+/* PR target/112816 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx512f -masm=att" } */
+/* { dg-final { scan-assembler-times "psrad\t\\\$31," 2 } } */
+/* { dg-final { scan-assembler-not "pcmpeqd\t" } } */
+
+#define N 4
+struct S { float x[N]; };
+struct T { int x[N]; };
+
+__attribute__((target ("no-sse3,sse2"))) struct T
+foo (struct S x)
+{
+  struct T res;
+  for (int i = 0; i < N; ++i)
+res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
+  return res;
+}
+
+__attribute__((target ("avx2"))) struct T
+bar (struct S x)
+{
+  struct T res;
+  for (int i = 0; i < N; ++i)
+res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
+  return res;
+}

Jakub



Re: [PATCH v7] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-04 Thread Jason Merrill

On 12/4/23 15:23, Marek Polacek wrote:

+/* FN is not a consteval function, but may become one.  Remember to
+   escalate it after all pending templates have been instantiated.  */
+
+void
+maybe_store_immediate_escalating_fn (tree fn)
+{
+  if (unchecked_immediate_escalating_function_p (fn))
+remember_escalating_expr (fn);
+}



+++ b/gcc/cp/decl.cc
@@ -18441,7 +18441,10 @@ finish_function (bool inline_p)
   if (!processing_template_decl
   && !DECL_IMMEDIATE_FUNCTION_P (fndecl)
   && !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
-cp_fold_function (fndecl);
+{
+  cp_fold_function (fndecl);
+  maybe_store_immediate_escalating_fn (fndecl);
+}


I think maybe_store_, and the call to it from finish_function, are 
unneeded; we will have already decided whether we need to remember the 
function during the call to cp_fold_function.


OK with that change.

Jason



Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-04 Thread Vincenzo Palazzo
However, I understand your point.

Cheers,

   Vincent.

On Tue, Dec 5, 2023 at 3:01 AM Vincenzo Palazzo
 wrote:
>
> >Distro build procedures are not something the GCC project generally gets
> involved with.
>
> I see, but to me this does not look like a distro build procedure,
> because you can use it
> with any kind of system (OSX/UNIX) by using nix.
>
> I disagree with you, simply because my patch is not building a package
> but just provides
> an agnostic way to develop GCC. Of course it is most useful with NixOS because
> it does not have apt or pacman or any other kind of package manager.
>
> Cheers,
>
>Vincent.
>
> On Tue, Dec 5, 2023 at 2:54 AM Jeff Law  wrote:
> >
> >
> >
> > On 12/4/23 18:38, Vincenzo Palazzo wrote:
> > > Ciao all,
> > >
> > >> +1.  I think this is best left to the distros.
> > >
> > > What do you mean? this is not a package, it is an env shell in order
> > > to build an work on GCC on NixOS.
> > >
> > Distro build procedures are not something the GCC project generally gets
> > involved with.
> >
> > jeff


[PATCH] Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]

2023-12-04 Thread Harald Anlauf
Dear all,

the attached patch picks up an observation by Tobias that we did
not specify the RESTRICT qualifier for optional arguments even
though that is allowed.  In principle this may have prevented
better optimization.

While looking more closely, I found and fixed an issue with CLASS
dummy arguments that mishandled this.  This revealed a few cases
in the testsuite that were matching the wrong patterns...
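
As a rough C-level analogy (my sketch, not the actual tree gfortran builds):
an OPTIONAL dummy may be absent (null), but without POINTER/TARGET attributes
it still cannot alias the other arguments, so its reference can carry the
restrict qualifier:

/* Rough analogy only; names are made up.  */
void
sub (int *__restrict x /* OPTIONAL */, int *__restrict y)
{
  if (x)
    *x = *y + 1;
}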

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From aa25d35cb866f7f333b656938224866a70b93a69 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 4 Dec 2023 22:44:53 +0100
Subject: [PATCH] Fortran: allow RESTRICT qualifier also for optional arguments
 [PR100988]

gcc/fortran/ChangeLog:

	PR fortran/100988
	* gfortran.h (IS_PROC_POINTER): New macro.
	* trans-types.cc (gfc_sym_type): Use macro in determination if the
	restrict qualifier can be used for a dummy variable.  Fix logic to
	allow the restrict qualifier also for optional arguments, and to
	not apply it to pointer or proc_pointer arguments.

gcc/testsuite/ChangeLog:

	PR fortran/100988
	* gfortran.dg/coarray_poly_6.f90: Adjust pattern.
	* gfortran.dg/coarray_poly_7.f90: Likewise.
	* gfortran.dg/coarray_poly_8.f90: Likewise.
	* gfortran.dg/missing_optional_dummy_6a.f90: Likewise.
	* gfortran.dg/pr100988.f90: New test.

Co-authored-by: Tobias Burnus  
---
 gcc/fortran/gfortran.h|  3 +
 gcc/fortran/trans-types.cc| 13 ++--
 gcc/testsuite/gfortran.dg/coarray_poly_6.f90  |  2 +-
 gcc/testsuite/gfortran.dg/coarray_poly_7.f90  |  2 +-
 gcc/testsuite/gfortran.dg/coarray_poly_8.f90  |  2 +-
 .../gfortran.dg/missing_optional_dummy_6a.f90 |  2 +-
 gcc/testsuite/gfortran.dg/pr100988.f90| 61 +++
 7 files changed, 74 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr100988.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index aa3f6cb70b4..a77441f38e7 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -4008,6 +4008,9 @@ bool gfc_may_be_finalized (gfc_typespec);
 #define IS_POINTER(sym) \
 	(sym->ts.type == BT_CLASS && sym->attr.class_ok && CLASS_DATA (sym) \
 	 ? CLASS_DATA (sym)->attr.class_pointer : sym->attr.pointer)
+#define IS_PROC_POINTER(sym) \
+	(sym->ts.type == BT_CLASS && sym->attr.class_ok && CLASS_DATA (sym) \
+	 ? CLASS_DATA (sym)->attr.proc_pointer : sym->attr.proc_pointer)

 /* frontend-passes.cc */

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 084b8c3ae2c..5b11ffc3cc9 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2327,8 +2327,8 @@ gfc_sym_type (gfc_symbol * sym, bool is_bind_c)
   else
 byref = 0;

-  restricted = !sym->attr.target && !sym->attr.pointer
-   && !sym->attr.proc_pointer && !sym->attr.cray_pointee;
+  restricted = (!sym->attr.target && !IS_POINTER (sym)
+		&& !IS_PROC_POINTER (sym) && !sym->attr.cray_pointee);
   if (!restricted)
 type = gfc_nonrestricted_type (type);

@@ -2384,11 +2384,10 @@ gfc_sym_type (gfc_symbol * sym, bool is_bind_c)
 	  || (sym->ns->proc_name && sym->ns->proc_name->attr.entry_master))
 	type = build_pointer_type (type);
   else
-	{
-	  type = build_reference_type (type);
-	  if (restricted)
-	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
-	}
+	type = build_reference_type (type);
+
+  if (restricted)
+	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
 }

   return (type);
diff --git a/gcc/testsuite/gfortran.dg/coarray_poly_6.f90 b/gcc/testsuite/gfortran.dg/coarray_poly_6.f90
index 53b80e442d3..344e12b4eff 100644
--- a/gcc/testsuite/gfortran.dg/coarray_poly_6.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_poly_6.f90
@@ -16,6 +16,6 @@ contains
   end subroutine foo
 end
 ! { dg-final { scan-tree-dump-times "foo \\(struct __class_MAIN___T_0_1t & restrict x, void \\* restrict caf_token.., integer\\(kind=\[48\]\\) caf_offset..\\)" 1 "original" } }
-! { dg-final { scan-tree-dump-times "bar \\(struct __class_MAIN___T_0_1t \\* x, void \\* restrict caf_token.., integer\\(kind=\[48\]\\) caf_offset..\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "bar \\(struct __class_MAIN___T_0_1t \\* restrict x, void \\* restrict caf_token.., integer\\(kind=\[48\]\\) caf_offset..\\)" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "bar \\(0B, 0B, 0\\);" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "foo \\(, y._data.token, \\(integer\\(kind=\[48\]\\)\\) class..._data.data - \\(integer\\(kind=\[48\]\\)\\) y._data.data\\);" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/coarray_poly_7.f90 b/gcc/testsuite/gfortran.dg/coarray_poly_7.f90
index 44f98e16e09..d8d83aea39b 100644
--- a/gcc/testsuite/gfortran.dg/coarray_poly_7.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_poly_7.f90
@@ -16,6 +16,6 @@ contains
   end subroutine foo
 end
 ! { dg-final { scan-tree-dump-times "foo \\(struct __class_MAIN___T_1_1t & restrict x, void \\* restrict 

Re: [PATCH v6 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-04 Thread waffl3x






On Monday, December 4th, 2023 at 9:35 PM, waffl3x  
wrote:


> 
> 
> >> @@ -15402,6 +15450,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
> >> complain,
> 
> > > gcc_checking_assert (TYPE_MAIN_VARIANT (TREE_TYPE (ve))
> > > == TYPE_MAIN_VARIANT (type));
> > > SET_DECL_VALUE_EXPR (r, ve);
> > > + if (is_capture_proxy (t))
> > > + type = TREE_TYPE (ve);
> 
> > That should have close to the same effect as the lambda_proxy_type
> > adjustment I was talking about, since that function basically returns
> > the TREE_TYPE of the COMPONENT_REF. But the underlying problem is that
> > finish_non_static_data_member assumes that 'object' is '*this', for
> > which you can trust the cv-quals; for auto&&, you can't.
> > capture_decltype has the same problem. I'm attaching a patch to address
> > this in both places.
> 
> 
> Regarding this, was my change actually okay, and was your change
> supposed to address it? I applied my patch to the latest commit in
> master yesterday and started tests and whatnot with this change
> commented out as I wasn't sure. It seems like my tests for constness of
> captures no longer works with or without this change commented out.
> 
> If you wish I can go over everything again and figure out a new
> solution with your changes but stepping through all this code was quite
> a task that I'm weary of doing again. Even if the second time through
> won't be so arduous I would like to avoid it.
> 
> You know what, I'll give it a go anyway but I don't want to spend too
> much time on it, I still have a few tests to clean up and this crash to
> fix.
> 
> template  void f()
> 
> {
> int i;
> [=](this T&& self){ return i; }(); // error, unrelated
> }
> int main() { f(); }
> 
> 
> If this crash doesn't take too long (I don't think it will, it seems
> straightforward enough) then I'll look at fixing the captures with a
> const xobject parameter bug the correct way.
> 
> Alex

WAIT Scratch that, I made a mistake, there's only a single case that is
broken, I read the test log wrong. Ah, I swear I'm cursed to realize
things the moment I hit the send button.

I have to take a closer look, I'll get back to you when I know more,
just trying to make sure you don't waste your time on this due to my
mistake.

Alex


[PATCH] c++: Further #pragma GCC unroll C++ fix [PR112795]

2023-12-04 Thread Jakub Jelinek
Hi!

When committing the #pragma GCC unroll patch, I found I forgot one spot
for diagnosing invalid unroll arguments - if the #pragma GCC unroll argument
is dependent and the pragma is placed before a range for loop, the unroll
tree (now the tree before it is converted to unsigned short) is saved into
RANGE_FOR_UNROLL and tsubst_stmt was RECURing on it, but did not diagnose
whether it was invalid, so we ICEd later in the middle-end when ANNOTATE_EXPR
had an unexpected argument.
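
A sketch of the kind of code affected (my example, not the new testcase):

// The unroll argument is dependent and the pragma precedes a range for.
void g (int);

template <int N>
void
f (int (&a)[4])
{
#pragma GCC unroll(N)
  for (auto i : a)
    g (i);
}

// Instantiating e.g. f<-1> used to ICE; with the patch it should be
// diagnosed like the other invalid unroll arguments.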

The following patch fixes that.  Or should I create some helper function
(and if so, what should it be called?) and call it from all of
cp_parser_pragma_unroll, tsubst_stmt (here) and tsubst_expr (ANNOTATE_EXPR)?
Another option is to diagnose it when we create the ANNOTATE_EXPRs, but
unfortunately that happens in 3 different places.  And at least for the
non-template case we'd have a worse location_t.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2023-12-04  Jakub Jelinek  

PR c++/112795
* pt.cc (tsubst_stmt) : Perform RANGE_FOR_UNROLL
value checking here as well.

* g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for
-std=gnu++98.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/ext/unroll-7.C: New test.
* g++.dg/ext/unroll-8.C: New test.

--- gcc/cp/pt.cc.jj 2023-12-04 08:59:06.0 +0100
+++ gcc/cp/pt.cc2023-12-04 10:49:38.149254907 +0100
@@ -18407,22 +18407,46 @@ tsubst_stmt (tree t, tree args, tsubst_f
complain, in_decl, decomp);
  }
 
+   tree unroll = RECUR (RANGE_FOR_UNROLL (t));
+   if (unroll)
+ {
+   HOST_WIDE_INT lunroll;
+   if (type_dependent_expression_p (unroll))
+ ;
+   else if (!INTEGRAL_TYPE_P (TREE_TYPE (unroll))
+|| (!value_dependent_expression_p (unroll)
+&& (!tree_fits_shwi_p (unroll)
+|| (lunroll = tree_to_shwi (unroll)) < 0
+|| lunroll >= USHRT_MAX)))
+ {
+   error_at (EXPR_LOCATION (RANGE_FOR_UNROLL (t)),
+ "%<#pragma GCC unroll%> requires an "
+ "assignment-expression that evaluates to a "
+ "non-negative integral constant less than %u",
+ USHRT_MAX);
+   unroll = integer_one_node;
+ }
+   else if (TREE_CODE (unroll) == INTEGER_CST)
+ {
+   unroll = fold_convert (integer_type_node, unroll);
+   if (integer_zerop (unroll))
+ unroll = integer_one_node;
+ }
+ }
+
if (processing_template_decl)
  {
RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
-   RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+   RANGE_FOR_UNROLL (stmt) = unroll;
RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t);
finish_range_for_decl (stmt, decl, expr);
if (decomp && decl != error_mark_node)
  cp_finish_decomp (decl, decomp);
  }
else
- {
-   tree unroll = RECUR (RANGE_FOR_UNROLL (t));
-   stmt = cp_convert_range_for (stmt, decl, expr, decomp,
-RANGE_FOR_IVDEP (t), unroll,
-RANGE_FOR_NOVECTOR (t));
- }
+ stmt = cp_convert_range_for (stmt, decl, expr, decomp,
+  RANGE_FOR_IVDEP (t), unroll,
+  RANGE_FOR_NOVECTOR (t));
 
bool prev = note_iteration_stmt_body_start ();
 RECUR (RANGE_FOR_BODY (t));
--- gcc/testsuite/g++.dg/ext/unroll-2.C.jj  2020-01-12 11:54:37.172401958 
+0100
+++ gcc/testsuite/g++.dg/ext/unroll-2.C 2023-12-04 10:17:00.390997063 +0100
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target c++11 } }
 // { dg-options "-O2 -fdump-tree-cunrolli-details" }
-// { dg-skip-if "range for" { *-*-* } { "-std=gnu++98" } { "" } }
 
 void
 foo (int ()[8], int *b, int *c)
--- gcc/testsuite/g++.dg/ext/unroll-3.C.jj  2020-01-12 11:54:37.172401958 
+0100
+++ gcc/testsuite/g++.dg/ext/unroll-3.C 2023-12-04 10:17:13.526813516 +0100
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target c++11 } }
 // { dg-options "-O2 -fdump-tree-cunrolli-details" }
-// { dg-skip-if "range for" { *-*-* } { "-std=gnu++98" } { "" } }
 
 template 
 void
--- gcc/testsuite/g++.dg/ext/unroll-7.C.jj  2023-12-04 10:17:53.481255222 
+0100
+++ gcc/testsuite/g++.dg/ext/unroll-7.C 2023-12-04 10:39:23.258115349 +0100
@@ -0,0 +1,45 @@
+// PR c++/112795
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-cunrolli-details" }
+
+void baz (int);
+constexpr int n = 3;
+constexpr int m = 7;
+
+template 
+void
+foo (int ()[3], T b)
+{
+#pragma GCC unroll(n)
+  for (auto i : a)
+baz (i);
+#pragma GCC unroll(m)
+  for (auto i : b)
+baz (i);
+}
+

[PATCH 06/17] [APX NDD] Support APX NDD for sub insns

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter and parse it.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/i386/i386.md (sub3): Add new alternatives for NDD
and adjust output templates.
(*sub_1): Likewise.
(*sub_2): Likewise.
(subv4): Likewise.
(*subv4): Likewise.
(subv4_1): Likewise.
(usubv4): Likewise.
(*sub_3): Likewise.
(*subsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*subsi_2_zext): Likewise.
(*subsi_3_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for ndd sub.
---
 gcc/config/i386/i386-expand.cc  |   5 +-
 gcc/config/i386/i386-protos.h   |   2 +-
 gcc/config/i386/i386.md | 155 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 4 files changed, 120 insertions(+), 55 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3ecda989cf8..93ecde4b4a8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1326,9 +1326,10 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 
 void
 ix86_fixup_binary_operands_no_copy (enum rtx_code code,
-   machine_mode mode, rtx operands[])
+   machine_mode mode, rtx operands[],
+   bool use_ndd)
 {
-  rtx dst = ix86_fixup_binary_operands (code, mode, operands);
+  rtx dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   gcc_assert (dst == operands[0]);
 }
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7dfeb6af225..481527872e8 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -111,7 +111,7 @@ extern void ix86_expand_vector_move_misalign (machine_mode, 
rtx[]);
 extern rtx ix86_fixup_binary_operands (enum rtx_code,
   machine_mode, rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-   machine_mode, rtx[]);
+   machine_mode, rtx[], bool = 
false);
 extern void ix86_expand_binary_operator (enum rtx_code,
 machine_mode, rtx[], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 358a3857f89..ea5377a0b38 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7772,7 +7772,8 @@ (define_expand "sub3"
(minus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
 (match_operand:SDWIM 2 "")))]
   ""
-  "ix86_expand_binary_operator (MINUS, mode, operands); DONE;")
+  "ix86_expand_binary_operator (MINUS, mode, operands,
+   TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
@@ -7798,7 +7799,10 @@ (define_insn_and_split "*sub3_doubleword"
   split_double_mode (mode, [0], 3, [0], [3]);
   if (operands[2] == const0_rtx)
 {
-  ix86_expand_binary_operator (MINUS, mode, [3]);
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  ix86_expand_binary_operator (MINUS, mode, [3],
+  TARGET_APX_NDD);
   DONE;
 }
 })
@@ -7827,25 +7831,36 @@ (define_insn_and_split "*sub3_doubleword_zext"
   "split_double_mode (mode, [0], 2, [0], 
[3]);")
 
 (define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r")
(minus:SWI
- (match_operand:SWI 1 "nonimmediate_operand" "0,0")
- (match_operand:SWI 2 "" ",")))
+ (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+ (match_operand:SWI 2 "" ",,r,")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
-  "sub{}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   TARGET_APX_NDD)"
+  "@
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
(set_attr "mode" "")])
 
 (define_insn "*subsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
- (minus:SI (match_operand:SI 1 "register_operand" "0")
-   (match_operand:SI 2 "x86_64_general_operand" "rBMe"
+   

[PATCH 16/17] [APX NDD] Support APX NDD for cmove insns

2023-12-04 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-cmov.c: New test.
---
 gcc/config/i386/i386.md  | 48 
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c | 16 +++
 2 files changed, 45 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0af7e82deee..853f53c2bb9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24412,47 +24412,56 @@ (define_split
(neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0])
 
 (define_insn "*movcc_noc"
-  [(set (match_operand:SWI248 0 "register_operand" "=r,r")
+  [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r")
(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:SWI248 2 "nonimmediate_operand" "rm,0")
- (match_operand:SWI248 3 "nonimmediate_operand" "0,rm")))]
+ (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r")
+ (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))]
   "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %0|%0, %2}
-   cmov%O2%c1\t{%3, %0|%0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %0|%0, %3}
+   cmov%O2%C1\t{%2, %3, %0|%0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "")])
 
 (define_insn "*movsicc_noc_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(if_then_else:DI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
  (zero_extend:DI
-   (match_operand:SI 2 "nonimmediate_operand" "rm,0"))
+   (match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r"))
  (zero_extend:DI
-   (match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+   (match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 (define_insn "*movsicc_noc_zext_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r")
(zero_extend:DI
  (if_then_else:SI (match_operator 1 "ix86_comparison_operator"
 [(reg FLAGS_REG) (const_int 0)])
-(match_operand:SI 2 "nonimmediate_operand" "rm,0")
-(match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+(match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r")
+(match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 
@@ -24477,14 +24486,15 @@ (define_split
 })
 
 (define_insn "*movqicc_noc"
-  [(set (match_operand:QI 0 "register_operand" "=r,r")
+  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
(if_then_else:QI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:QI 2 "register_operand" "r,0")
- (match_operand:QI 3 "register_operand" "0,r")))]
+ (match_operand:QI 2 "register_operand" "r,0,r")
+ (match_operand:QI 3 "register_operand" "0,r,r")))]
   "TARGET_CMOVE && !TARGET_PARTIAL_REG_STALL"
   "#"
-  [(set_attr "type" "icmov")
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "QI")])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c 
b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
new file mode 100644
index 000..459dc965342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -m64 -mapxf" } */
+/* { dg-final { scan-assembler-times "cmove\[^\n\r]*, %eax" 1 } } */
+/* 

[PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add

2023-12-04 Thread Hongyu Wang
NDD uses an EVEX prefix, so when a segment prefix is also applied the
instruction could exceed its 15-byte encoding limit, especially when adding
immediates. This can happen because the "e" constraint accepts any
UNSPEC_TPOFF/UNSPEC_NTPOFF constant: such a constant is an offset added to a
segment register, which is encoded using a segment prefix. Disable the use of
those *POFF constants in the NDD add alternatives with a new constraint.
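
For illustration (my example, not from the patch), the kind of source where
such *POFF constants show up:

/* Hypothetical illustration: accesses to __thread variables involve
   UNSPEC_TPOFF/UNSPEC_NTPOFF offsets combined with the %fs/%gs thread
   pointer, the combination the new "je" constraint keeps out of the
   NDD add alternatives.  */
static __thread long tls_counter;

long
bump (long n)
{
  tls_counter += n;
  return tls_counter;
}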

gcc/ChangeLog:

* config/i386/constraints.md (je): New constraint.
* config/i386/i386-protos.h (x86_poff_operand_p): New function to
check any *POFF constant in operand.
* config/i386/i386.cc (x86_poff_operand_p): New prototype.
* config/i386/i386.md (*add_1): Split out je alternative for add.
---
 gcc/config/i386/constraints.md |  5 +
 gcc/config/i386/i386-protos.h  |  1 +
 gcc/config/i386/i386.cc| 25 +
 gcc/config/i386/i386.md| 10 +-
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index cbee31fa40a..f4c3c3dd952 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -433,3 +433,8 @@ (define_address_constraint "jb"
 
 (define_register_constraint  "jc"
  "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_constraint  "je"
+  "@internal constant that do not allow any unspec global offsets"
+  (and (match_operand 0 "x86_64_immediate_operand")
+   (match_test "!x86_poff_operand_p (op)")))
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a9d0c568bba..7dfeb6af225 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -66,6 +66,7 @@ extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_evex_reg_mentioned_p (rtx [], int);
+extern bool x86_poff_operand_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3efeed396c4..3e670330ef6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23341,6 +23341,31 @@ x86_evex_reg_mentioned_p (rtx operands[], int nops)
   return false;
 }
 
+/* Return true when rtx operand does not contain any UNSPEC_*POFF related
+   constant to avoid APX_NDD instructions excceed encoding length limit.  */
+bool
+x86_poff_operand_p (rtx operand)
+{
+  if (GET_CODE (operand) == CONST)
+{
+  rtx op = XEXP (operand, 0);
+  if (GET_CODE (op) == PLUS)
+   op = XEXP (op, 0);
+   
+  if (GET_CODE (op) == UNSPEC)
+   {
+ int unspec = XINT (op, 1);
+ return (unspec == UNSPEC_NTPOFF
+ || unspec == UNSPEC_TPOFF
+ || unspec == UNSPEC_DTPOFF
+ || unspec == UNSPEC_GOTTPOFF
+ || unspec == UNSPEC_GOTNTPOFF
+ || unspec == UNSPEC_INDNTPOFF);
+   }
+}
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a73f6dcaec..6b316e698bb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6415,15 +6415,15 @@ (define_insn_and_split 
"*add3_doubleword_concat_zext"
  "split_double_mode (mode, [0], 1, [0], [5]);")
 
 (define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
(plus:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r")
- (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,re,BM")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r,m,r")
+ (match_operand:SWI48 2 "x86_64_general_operand" 
"re,BM,0,le,r,e,je,BM")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands,
TARGET_APX_NDD)"
 {
-  bool use_ndd = (which_alternative == 4 || which_alternative == 5);
+  bool use_ndd = (which_alternative >= 4);
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -6454,7 +6454,7 @@ (define_insn "*add_1"
: "add{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd")
(set (attr "type")
  (cond [(eq_attr "alternative" "3")
   (const_string "lea")
-- 
2.31.1



[PATCH v2] LoongArch: Add asm modifiers to the LSX and LASX directives in the doc.

2023-12-04 Thread chenxiaolong
gcc/ChangeLog:

* doc/extend.texi: Add the LSX and LASX asm operand modifiers to the doc.
* doc/md.texi: Refine the description of the 'f' constraint in the doc.
---
 gcc/doc/extend.texi | 47 +
 gcc/doc/md.texi |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 32ae15e1d5b..d87a079704c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11820,10 +11820,57 @@ The list below describes the supported modifiers and 
their effects for LoongArch
 @item @code{d} @tab Same as @code{c}.
 @item @code{i} @tab Print the character ''@code{i}'' if the operand is not a 
register.
 @item @code{m} @tab Same as @code{c}, but the printed value is @code{operand - 
1}.
+@item @code{u} @tab Print a LASX register.
+@item @code{w} @tab Print a LSX register.
 @item @code{X} @tab Print a constant integer operand in hexadecimal.
 @item @code{z} @tab Print the operand in its unmodified form, followed by a 
comma.
 @end multitable
 
+References to input and output operands in the assembler template of extended
+asm statements can use modifiers to affect the way the operands are formatted
+in the code output to the assembler.  For example, the following code uses the
+'w' modifier for LoongArch:
+
+@example
+test-asm.c:
+
+#include 
+
+__m128i foo (void)
+@{
+__m128i  a,b,c;
+__asm__ ("vadd.d %w0,%w1,%w2\n\t"
+   :"=f" (c)
+   :"f" (a),"f" (b));
+
+return c;
+@}
+
+@end example
+
+@noindent
+The compile command for the test case is as follows:
+
+@example
+gcc test-asm.c -mlsx -S -o test-asm.s
+@end example
+
+@noindent
+The assembly statement produces the following assembly code:
+
+@example
+vadd.d $vr0,$vr0,$vr1
+@end example
+
+This is a 128-bit vector addition instruction, @code{c} (referred to in the
+template string as %0) is the output, and @code{a} (%1) and @code{b} (%2) are
+the inputs.  @code{__m128i} is a vector data type defined in the  file
+@code{lsxintrin.h} (@xref{LoongArch SX Vector Intrinsics}).  The symbol '=f'
+represents a constraint using a floating-point register as an output type, and
+the 'f' in the input operand represents a constraint using a floating-point
+register operand, which can refer to the definition of a constraint
+(@xref{Constraints}) in gcc.
+
 @anchor{riscvOperandmodifiers}
 @subsubsection RISC-V Operand Modifiers
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 536ce997f01..2274da5ff69 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -2881,7 +2881,7 @@ $r1h
 @item LoongArch---@file{config/loongarch/constraints.md}
 @table @code
 @item f
-A floating-point register (if available).
+A floating-point or vector register (if available).
 @item k
 A memory operand whose address is formed by a base register and
 (optionally scaled) index register.
-- 
2.20.1



[PATCH v2 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-04 Thread Lulu Cheng
From: Xi Ruoyao 

We'll use HOST_WIDE_INT in LoongArch static properties in the following patches.

To keep the same readability as C99 designated initializers, create a
std::array-like data structure with a positional setter function, and add
field setter functions for the structs used in loongarch-def.cc.
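
A sketch of the intended use (hypothetical element names, just to show how the
chained setters keep the index visible the way a designated initializer would):

/* Illustration only, not code from the patch.  */
static loongarch_def_array<const char *, 3> cpu_names =
  loongarch_def_array<const char *, 3> ()
    .set (0, "native")
    .set (1, "la464")
    .set (2, "la664");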

Remove unneeded guards #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
in loongarch-def.h and loongarch-opts.h.


gcc/ChangeLog:

* config/loongarch/loongarch-def.h: Remove extern "C".
(loongarch_isa_base_strings): Declare as loongarch_def_array
instead of plain array.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(loongarch_cpu_strings): Likewise.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_isa): Add a constructor and field setter functions.
* config/loongarch/loongarch-opts.h (loongarch-defs.h): Do not
include for target libraries.
* config/loongarch/loongarch-tune.h (LOONGARCH_TUNE_H): Likewise.
(struct loongarch_rtx_cost_data): Likewise.
(struct loongarch_cache): Likewise.
(struct loongarch_align): Likewise.
* config/loongarch/t-loongarch: Compile loongarch-def.cc with the
C++ compiler.
* config/loongarch/loongarch-def-array.h: New file for a
std:array like data structure with position setter function.
* config/loongarch/loongarch-def.c: Rename to ...
* config/loongarch/loongarch-def.cc: ... here.
(loongarch_cpu_strings): Define as loongarch_def_array instead
of plain array.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_isa_base_strings): Likewise.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(abi_minimal_isa): Likewise.
(loongarch_rtx_cost_optimize_size): Use field setter functions
instead of designated initializers.
(loongarch_rtx_cost_data): Implement default constructor.
---
 gcc/config/loongarch/loongarch-def-array.h |  40 
 gcc/config/loongarch/loongarch-def.c   | 227 -
 gcc/config/loongarch/loongarch-def.cc  | 187 +
 gcc/config/loongarch/loongarch-def.h   |  55 ++---
 gcc/config/loongarch/loongarch-opts.cc |   7 +
 gcc/config/loongarch/loongarch-opts.h  |   5 +-
 gcc/config/loongarch/loongarch-tune.h  | 123 ++-
 gcc/config/loongarch/t-loongarch   |   4 +-
 8 files changed, 390 insertions(+), 258 deletions(-)
 create mode 100644 gcc/config/loongarch/loongarch-def-array.h
 delete mode 100644 gcc/config/loongarch/loongarch-def.c
 create mode 100644 gcc/config/loongarch/loongarch-def.cc

diff --git a/gcc/config/loongarch/loongarch-def-array.h 
b/gcc/config/loongarch/loongarch-def-array.h
new file mode 100644
index 000..bdb3e9c6a2b
--- /dev/null
+++ b/gcc/config/loongarch/loongarch-def-array.h
@@ -0,0 +1,40 @@
+/* A std::array like data structure for LoongArch static properties.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef _LOONGARCH_DEF_ARRAY_H
+#define _LOONGARCH_DEF_ARRAY_H 1
+
+template <class T, int N>
+class loongarch_def_array {
+private:
+  T arr[N];
+public:
+  loongarch_def_array () : arr{} {}
+
+  T &operator[] (int n) { return arr[n]; }
+  const T &operator[] (int n) const { return arr[n]; }
+
+  loongarch_def_array set (int idx, T &&value)
+  {
+    (*this)[idx] = value;
+    return *this;
+  }
+};
+
+#endif
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
deleted file mode 100644
index f22d488acb2..000
--- a/gcc/config/loongarch/loongarch-def.c
+++ /dev/null
@@ -1,227 

[PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386.md: (addsi_1_zext): Add new alternatives for
NDD and adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Likewise.
(*add_4): Likewise.
(*add_5): Likewise.
(*addv4): Likewise.
(*addv4_1): Likewise.
(*add3_cconly_overflow_1): Likewise.
(*add3_cc_overflow_1): Likewise.
(*addsi3_zext_cc_overflow_1): Likewise.
(*add3_cconly_overflow_2): Likewise.
(*add3_cc_overflow_2): Likewise.
(*addsi3_zext_cc_overflow_2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add more test.
---
 gcc/config/i386/i386.md | 310 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  53 ++--
 2 files changed, 232 insertions(+), 131 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cb227d19f40..2a73f6dcaec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6476,13 +6476,15 @@ (define_insn "*add_1"
 ;; patterns constructed from addsi_1 to match.
 
 (define_insn "addsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r")
(zero_extend:DI
- (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
-  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le"
+ (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,r,rm")
+  (match_operand:SI 2 "x86_64_general_operand" 
"rBMe,0,le,rBMe,re"
(clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+   TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3 || which_alternative == 4);
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -6490,11 +6492,13 @@ (define_insn "addsi_1_zext"
 
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{l}\t%k0";
+return use_ndd ? "inc{l}\t{%1, %k0|%k0, %1}"
+  : "inc{l}\t%k0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{l}\t%k0";
+ return use_ndd ? "dec{l}\t{%1, %k0|%k0, %1}"
+: "dec{l}\t%k0";
}
 
 default:
@@ -6504,12 +6508,15 @@ (define_insn "addsi_1_zext"
 std::swap (operands[1], operands[2]);
 
   if (x86_maybe_negate_const_int ([2], SImode))
-return "sub{l}\t{%2, %k0|%k0, %2}";
+return use_ndd ? "sub{l}\t{%2 ,%1, %k0|%k0, %1, %2}"
+  : "sub{l}\t{%2, %k0|%k0, %2}";
 
-  return "add{l}\t{%2, %k0|%k0, %2}";
+  return use_ndd ? "add{l}\t{%2 ,%1, %k0|%k0, %1, %2}"
+: "add{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
  (cond [(eq_attr "alternative" "2")
  (const_string "lea")
(match_operand:SI 2 "incdec_operand")
@@ -6811,37 +6818,42 @@ (define_insn "*add_2"
   [(set (reg FLAGS_REG)
(compare
  (plus:SWI
-   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,")
-   (match_operand:SWI 2 "" ",,0"))
+   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,,rm,r")
+   (match_operand:SWI 2 "" ",,0,r,"))
  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,,r,r")
(plus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (PLUS, mode, operands)"
+   && ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3 || which_alternative == 4);
   switch (get_attr_type (insn))
 {
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{}\t%0";
+return use_ndd ? "inc{}\t{%1, %0|%0, %1}"
+  : "inc{}\t%0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{}\t%0";
+ return use_ndd ? "dec{}\t{%1, %0|%0, %1}"
+: "dec{}\t%0";
}
 
 default:
   if (which_alternative == 2)
 std::swap (operands[1], operands[2]);
 
-  gcc_assert (rtx_equal_p (operands[0], operands[1]));
   if (x86_maybe_negate_const_int ([2], mode))
-return "sub{}\t{%2, %0|%0, %2}";
+return use_ndd ? "sub{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sub{}\t{%2, %0|%0, %2}";
 
-  return "add{}\t{%2, %0|%0, %2}";
+  return use_ndd ? "add{}\t{%2, %1, %0|%0, %1, %2}"
+: "add{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set (attr "type")
+  

[PATCH v2 2/2] LoongArch: Remove the definition of ISA_BASE_LA64V110 from the code.

2023-12-04 Thread Lulu Cheng
The instructions defined in the LoongArch Reference Manual v1.1 do not form a
single "v1.1" instruction set: a CPU released later may support only some of
the instructions described in the LoongArch Reference Manual v1.1. Therefore,
the macro ISA_BASE_LA64V110 and the related definitions are removed here.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Delete 
STR_ISA_BASE_LA64V110.
* config/loongarch/genopts/loongarch.opt.in: Likewise.
* config/loongarch/loongarch-cpu.cc (ISA_BASE_LA64V110_FEATURES): 
Delete macro.
(fill_native_cpu_config): Define a new variable hw_isa_evolution to record
the extended instruction set support read from cpucfg.
* config/loongarch/loongarch-def.cc: Set evolution at initialization.
* config/loongarch/loongarch-def.h (ISA_BASE_LA64V100): Delete.
(ISA_BASE_LA64V110): Likewise.
(N_ISA_BASE_TYPES): Likewise.
(defined): Likewise.
* config/loongarch/loongarch-opts.cc: Likewise.
* config/loongarch/loongarch-opts.h (TARGET_64BIT): Likewise.
(ISA_BASE_IS_LA64V110): Likewise.
* config/loongarch/loongarch-str.h (STR_ISA_BASE_LA64V110): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
---
 .../loongarch/genopts/loongarch-strings   |  1 -
 gcc/config/loongarch/genopts/loongarch.opt.in |  3 ---
 gcc/config/loongarch/loongarch-cpu.cc | 23 +--
 gcc/config/loongarch/loongarch-def.cc | 14 +++
 gcc/config/loongarch/loongarch-def.h  | 12 ++
 gcc/config/loongarch/loongarch-opts.cc|  3 ---
 gcc/config/loongarch/loongarch-opts.h |  4 +---
 gcc/config/loongarch/loongarch-str.h  |  1 -
 gcc/config/loongarch/loongarch.opt|  3 ---
 9 files changed, 19 insertions(+), 45 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index b2070c83ed0..7bc4824007e 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -30,7 +30,6 @@ STR_CPU_LA664   la664
 
 # Base architecture
 STR_ISA_BASE_LA64V100 la64
-STR_ISA_BASE_LA64V110 la64v1.1
 
 # -mfpu
 OPTSTR_ISA_EXT_FPUfpu
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 8af6cc6f532..483b185b059 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -32,9 +32,6 @@ Basic ISAs of LoongArch:
 EnumValue
 Enum(isa_base) String(@@STR_ISA_BASE_LA64V100@@) Value(ISA_BASE_LA64V100)
 
-EnumValue
-Enum(isa_base) String(@@STR_ISA_BASE_LA64V110@@) Value(ISA_BASE_LA64V110)
-
 ;; ISA extensions / adjustments
 Enum
 Name(isa_ext_fpu) Type(int)
diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index 622df47916f..4033320d0e1 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -23,7 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "tm.h"
 #include "diagnostic-core.h"
 
 #include "loongarch-def.h"
@@ -32,19 +31,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "loongarch-cpucfg-map.h"
 #include "loongarch-str.h"
 
-/* loongarch_isa_base_features defined here instead of loongarch-def.c
-   because we need to use options.h.  Pay attention on the order of elements
-   in the initializer becaue ISO C++ does not allow C99 designated
-   initializers!  */
-
-#define ISA_BASE_LA64V110_FEATURES \
-  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA \
-   | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS)
-
-int64_t loongarch_isa_base_features[N_ISA_BASE_TYPES] = {
-  /* [ISA_BASE_LA64V100] = */ 0,
-  /* [ISA_BASE_LA64V110] = */ ISA_BASE_LA64V110_FEATURES,
-};
 
 /* Native CPU detection with "cpucfg" */
 static uint32_t cpucfg_cache[N_CPUCFG_WORDS] = { 0 };
@@ -235,18 +221,20 @@ fill_native_cpu_config (struct loongarch_target *tgt)
   /* Use the native value anyways.  */
   preset.simd = tmp;
 
+
+  int64_t hw_isa_evolution = 0;
+
   /* Features added during ISA evolution.  */
   for (const auto : cpucfg_map)
if (cpucfg_cache[entry.cpucfg_word] & entry.cpucfg_bit)
- preset.evolution |= entry.isa_evolution_bit;
+ hw_isa_evolution |= entry.isa_evolution_bit;
 
   if (native_cpu_type != CPU_NATIVE)
{
  /* Check if the local CPU really supports the features of the base
 ISA of probed native_cpu_type.  If any feature is not detected,
 either GCC or the hardware is buggy.  */
- auto base_isa_feature = loongarch_isa_base_features[preset.base];
- if ((preset.evolution & base_isa_feature) != base_isa_feature)
+ if ((preset.evolution & hw_isa_evolution) != hw_isa_evolution)
warning (0,
 "detected base architecture 

[PATCH v2 0/2] Delete ISA_BASE_LA64V110 related definitions.

2023-12-04 Thread Lulu Cheng
1. Rebase Xi Ruoyao's patch to the latest commit:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636798.html

2. remove the #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
guards in loongarch-def.h and loongarch-opts.h as they'll be unneeded.

3. As described in the LoongArch Reference Manual v1.1:
The new functional subsets in each new version have independent identification
bits in the return value of the CPUCFG instruction. It is recommended that
software determine which features to use based on this information rather than
on the version number of the Loongson architecture.

So delete the ISA_BASE_LA64V110 related definitions here.


Lulu Cheng (1):
  LoongArch: Remove the definition of ISA_BASE_LA64V110 from the code.

Xi Ruoyao (1):
  LoongArch: Switch loongarch-def from C to C++ to make it possible.

 .../loongarch/genopts/loongarch-strings   |   1 -
 gcc/config/loongarch/genopts/loongarch.opt.in |   3 -
 gcc/config/loongarch/loongarch-cpu.cc |  23 +-
 gcc/config/loongarch/loongarch-def-array.h|  40 +++
 gcc/config/loongarch/loongarch-def.c  | 227 --
 gcc/config/loongarch/loongarch-def.cc | 193 +++
 gcc/config/loongarch/loongarch-def.h  |  67 +++---
 gcc/config/loongarch/loongarch-opts.cc|  10 +-
 gcc/config/loongarch/loongarch-opts.h |   9 +-
 gcc/config/loongarch/loongarch-str.h  |   1 -
 gcc/config/loongarch/loongarch-tune.h | 123 +-
 gcc/config/loongarch/loongarch.opt|   3 -
 gcc/config/loongarch/t-loongarch  |   4 +-
 13 files changed, 405 insertions(+), 299 deletions(-)
 create mode 100644 gcc/config/loongarch/loongarch-def-array.h
 delete mode 100644 gcc/config/loongarch/loongarch-def.c
 create mode 100644 gcc/config/loongarch/loongarch-def.cc

-- 
2.31.1



[PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-04 Thread Hongyu Wang
Under APX NDD, the previous TImode allocation scheme is a problem: TImode
values were allocated to overlapping consecutive pairs, like rax:rdi,
rdi:rdx.

This causes issues for all TImode NDD patterns.  With NDD we no longer
assume that arithmetic operations like add have a dependency between dest
and src1, so the write to the 1st highpart rdi can be overridden by the
2nd lowpart rdi when that 2nd lowpart has a different src as input; the
write to the 1st highpart rdi is then lost and causes miscompilation.

To resolve this, under TARGET_APX_NDD we only allow registers with an
even regno to be allocated for TImode, so TImode registers are always
allocated as non-overlapping pairs.

Inline assembly that forcibly allocates an __int128 to an odd-numbered
general register may still see errors.
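
For illustration, a minimal C sketch of the kind of code affected:

/* Illustrative sketch only.  A 128-bit add is split into two 64-bit
   instructions; under APX NDD dest may differ from src1, so if the
   highpart of one TImode value shares a register with the lowpart of
   another, the second write can clobber the first.  Restricting TImode
   to even register numbers keeps each value in a non-overlapping pair.  */
__int128 add128 (__int128 a, __int128 b)
{
  return a + b;   /* typically lowers to an add/adc pair on the halves */
}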

gcc/ChangeLog:

* config/i386/i386.cc (ix86_hard_regno_mode_ok): Restrict even regno
for TImode if APX NDD enabled.
---
 gcc/config/i386/i386.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 93a9cb556a5..3efeed396c4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20873,6 +20873,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
return true;
   return !can_create_pseudo_p ();
 }
+  /* With TImode we previously have assumption that src1/dest will use same
+ register, so the allocation of highpart/lowpart can be consecutive, and
+ 2 TImode insn would held their low/highpart in continuous sequence like
+ rax:rdx, rdx:rcx. This will not work for APX_NDD since NDD allows
+ different registers as dest/src1, when writes to 2nd lowpart will impact
+ the writes to 1st highpart, then the insn will be optimized out. So for
+ TImode pattern if we support NDD form, the allowed register number should
+ be even to avoid such mixed high/low part override. */
+  else if (TARGET_APX_NDD && mode == TImode)
+return regno % 2 == 0;
   /* We handle both integer and floats in the general purpose registers.  */
   else if (VALID_INT_MODE_P (mode)
   || VALID_FP_MODE_P (mode))
-- 
2.31.1



[PATCH 08/17] [APX NDD] Support APX NDD for neg insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
and adjust for NDD.
* config/i386/i386.md (neg2): Add new constraint for NDD and
adjust output template.
(*neg_1): Likewise.
(*neg2_doubleword): Likewise.
(*neg_2): Likewise.
(*neg_ccc_1): Likewise.
(*neg_ccc_2): Likewise.
(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*negsi_2_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add neg test.
---
 gcc/config/i386/i386-expand.cc  |  4 +-
 gcc/config/i386/i386-protos.h   |  5 +-
 gcc/config/i386/i386.cc |  5 +-
 gcc/config/i386/i386.md | 77 -
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 29 ++
 5 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 93ecde4b4a8..d4bbd33ce07 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1494,7 +1494,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode 
mode,
 
 void
 ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   bool matching_memory = false;
   rtx src, dst, op, clob;
@@ -1513,7 +1513,7 @@ ix86_expand_unary_operator (enum rtx_code code, 
machine_mode mode,
 }
 
   /* When source operand is memory, destination must match.  */
-  if (MEM_P (src) && !matching_memory)
+  if (!use_ndd && MEM_P (src) && !matching_memory)
 src = force_reg (mode, src);
 
   /* Emit the instruction.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 481527872e8..fa952409729 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -127,7 +127,7 @@ extern bool ix86_vec_interleave_v2df_operator_ok (rtx 
operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
 extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-   rtx[]);
+   rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
 extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
@@ -147,7 +147,8 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, 
machine_mode,
   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2]);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
+   bool = false);
 extern bool ix86_match_ccmode (rtx, machine_mode);
 extern bool ix86_match_ptest_ccmode (rtx);
 extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3e670330ef6..a3b628d2f6d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16209,11 +16209,12 @@ ix86_dep_by_shift_count (const_rtx set_insn, 
const_rtx use_insn)
 bool
 ix86_unary_operator_ok (enum rtx_code,
machine_mode,
-   rtx operands[2])
+   rtx operands[2],
+   bool use_ndd)
 {
   /* If one of operands is memory, source and destination must match.  */
   if ((MEM_P (operands[0])
-   || MEM_P (operands[1]))
+   || (!use_ndd && MEM_P (operands[1])))
   && ! rtx_equal_p (operands[0], operands[1]))
 return false;
   return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e2705ada31a..1a2fb116f01 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13282,13 +13282,14 @@ (define_expand "neg2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(neg:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NEG, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NEG, mode, operands,
+  TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*neg2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (neg: (match_operand: 1 "nonimmediate_operand" "0")))
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (neg: (match_operand: 1 "nonimmediate_operand" "0,ro")))

Re: [PATCH v2 00/17] Support Intel APX NDD

2023-12-04 Thread Hongtao Liu
On Tue, Dec 5, 2023 at 10:32 AM Hongyu Wang  wrote:
>
> Hi,
>
> APX NDD patches have been posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html
>
> Thanks to Hongtao's review, the V2 patch adds support for the zext
> semantics with memory input, as NDD by default clears the upper bits of
> dest for any operand size.
>
> We also support TImode shifts with new split helper functions, which
> allow an NDD-form split but still restrict memory src usage, since in
> the post-reload splitter the register numbers are already fixed and no
> new register can be used for shld/shrd.
>
> Also fixed several typos, formatting issues and redundant code.
Patches LGTM.  Please wait a few more days before committing in case
other folks have comments.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> OK for trunk?
>
> Hongyu Wang (8):
>   [APX NDD] Restrict TImode register usage when NDD enabled
>   [APX NDD] Disable seg_prefixed memory usage for NDD add
>   [APX NDD] Support APX NDD for left shift insns
>   [APX NDD] Support APX NDD for right shift insns
>   [APX NDD] Support APX NDD for rotate insns
>   [APX NDD] Support APX NDD for shld/shrd insns
>   [APX NDD] Support APX NDD for cmove insns
>   [APX NDD] Support TImode shift for NDD
>
> Kong Lingling (9):
>   [APX NDD] Support Intel APX NDD for legacy add insn
>   [APX NDD] Support APX NDD for optimization patterns of add
>   [APX NDD] Support APX NDD for adc insns
>   [APX NDD] Support APX NDD for sub insns
>   [APX NDD] Support APX NDD for sbb insn
>   [APX NDD] Support APX NDD for neg insn
>   [APX NDD] Support APX NDD for not insn
>   [APX NDD] Support APX NDD for and insn
>   [APX NDD] Support APX NDD for or/xor insn
>
>  gcc/config/i386/constraints.md|5 +
>  gcc/config/i386/i386-expand.cc|  164 +-
>  gcc/config/i386/i386-options.cc   |2 +
>  gcc/config/i386/i386-protos.h |   16 +-
>  gcc/config/i386/i386.cc   |   40 +-
>  gcc/config/i386/i386.md   | 2323 +++--
>  gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |6 +
>  .../gcc.target/i386/apx-ndd-shld-shrd.c   |   24 +
>  .../gcc.target/i386/apx-ndd-ti-shift.c|   91 +
>  gcc/testsuite/gcc.target/i386/apx-ndd.c   |  202 ++
>  12 files changed, 2149 insertions(+), 755 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c
>
> --
> 2.31.1
>


-- 
BR,
Hongtao


[PATCH] btf: avoid wrong DATASEC entries for extern vars [PR112849]

2023-12-04 Thread David Faust
The process of creating BTF_KIND_DATASEC records involves iterating
through variable declarations, determining which section they will be
placed in, and creating an entry in the appropriate DATASEC record
accordingly.

For variables without e.g. an explicit __attribute__((section)), we use
categorize_decl_for_section () to identify the appropriate named section
and corresponding BTF_KIND_DATASEC record.

This was incorrectly being done for 'extern' variable declarations as
well as non-extern ones, which meant that extern variable declarations
could result in BTF_KIND_DATASEC entries claiming the variable is
allocated in some section such as '.bss' without any knowledge whether
that is actually true. That resulted in errors building the Linux kernel
BPF selftests.

This patch corrects btf_collect_datasec () to avoid assuming a section
for extern variables, and only emit BTF_KIND_DATASEC entries for them if
they have a known section.
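
A minimal illustration (the same idea as the new btf-datasec-3.c test in
the patch; the variable names here are just examples):

/* Illustration only: which declarations should get DATASEC entries.  */
extern int VERSION __attribute__((section (".version"))); /* known section: DATASEC entry */
extern int ext_var;   /* extern, no known section: no DATASEC entry */
int defined_var;      /* defined here: ordinary DATASEC entry (.bss) */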

Bootstrapped + tested on x86_64-linux-gnu.
Tested on x86_64-linux-gnu host for bpf-unknown-none.

gcc/
PR debug/112849
* btfout.cc (btf_collect_datasec): Avoid incorrectly creating an
entry in a BTF_KIND_DATASEC record for extern variable decls without
a known section.

gcc/testsuite/
PR debug/112849
* gcc.dg/debug/btf/btf-datasec-3.c: New test.
---
 gcc/btfout.cc | 10 ++-
 .../gcc.dg/debug/btf/btf-datasec-3.c  | 27 +++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index a5e0d640e19..db4f1084f85 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -486,7 +486,15 @@ btf_collect_datasec (ctf_container_ref ctfc)
 
   /* Mark extern variables.  */
   if (DECL_EXTERNAL (node->decl))
-   dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
+   {
+ dvd->dvd_visibility = BTF_VAR_GLOBAL_EXTERN;
+
+ /* PR112849: avoid assuming a section for extern decls without
+an explicit section, which would result in incorrectly
+emitting a BTF_KIND_DATASEC entry for them.  */
+ if (node->get_section () == NULL)
+   continue;
+   }
 
   const char *section_name = get_section_name (node);
   if (section_name == NULL)
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
new file mode 100644
index 000..3c1c7a28c2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-3.c
@@ -0,0 +1,27 @@
+/* PR debug/112849
+   Test that we do not incorrectly create BTF_KIND_DATASEC entries for
+   extern decls with no known section.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+extern int VERSION __attribute__((section (".version")));
+
+extern int test_bss1;
+extern int test_data1;
+
+int test_bss2;
+int test_data2 = 2;
+
+int
+foo (void)
+{
+  test_bss2 = VERSION;
+  return test_bss1 + test_data1 + test_data2;
+}
+
+/* There should only be a DATASEC entries for VERSION out of the extern decls. 
 */
+/* { dg-final { scan-assembler-times "bts_type" 3 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 
'test_data2'\\)" 1 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 
'test_bss2'\\)" 1 } } */
+/* { dg-final { scan-assembler-times "bts_type: \\(BTF_KIND_VAR 'VERSION'\\)" 
1 } } */
-- 
2.42.0



[PATCH v2 00/17] Support Intel APX NDD

2023-12-04 Thread Hongyu Wang
Hi,

APX NDD patches have been posted at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html

Thanks to Hongtao's review, the V2 patch adds support for the zext
semantics with memory input, as NDD by default clears the upper bits of
dest for any operand size.

We also support TImode shifts with new split helper functions, which
allow an NDD-form split but still restrict memory src usage, since in
the post-reload splitter the register numbers are already fixed and no
new register can be used for shld/shrd.

Also fixed several typos, formatting issues and redundant code.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.

OK for trunk?

Hongyu Wang (8):
  [APX NDD] Restrict TImode register usage when NDD enabled
  [APX NDD] Disable seg_prefixed memory usage for NDD add
  [APX NDD] Support APX NDD for left shift insns
  [APX NDD] Support APX NDD for right shift insns
  [APX NDD] Support APX NDD for rotate insns
  [APX NDD] Support APX NDD for shld/shrd insns
  [APX NDD] Support APX NDD for cmove insns
  [APX NDD] Support TImode shift for NDD

Kong Lingling (9):
  [APX NDD] Support Intel APX NDD for legacy add insn
  [APX NDD] Support APX NDD for optimization patterns of add
  [APX NDD] Support APX NDD for adc insns
  [APX NDD] Support APX NDD for sub insns
  [APX NDD] Support APX NDD for sbb insn
  [APX NDD] Support APX NDD for neg insn
  [APX NDD] Support APX NDD for not insn
  [APX NDD] Support APX NDD for and insn
  [APX NDD] Support APX NDD for or/xor insn

 gcc/config/i386/constraints.md|5 +
 gcc/config/i386/i386-expand.cc|  164 +-
 gcc/config/i386/i386-options.cc   |2 +
 gcc/config/i386/i386-protos.h |   16 +-
 gcc/config/i386/i386.cc   |   40 +-
 gcc/config/i386/i386.md   | 2323 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |6 +
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |   24 +
 .../gcc.target/i386/apx-ndd-ti-shift.c|   91 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c   |  202 ++
 12 files changed, 2149 insertions(+), 755 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

-- 
2.31.1



Re: [PATCH v4] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-04 Thread Richard Sandiford
Manos Anagnostakis  writes:
> This is an RTL pass that detects store forwarding from stores to larger loads 
> (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some 
> benchmarks,
> through testing on ampere1/ampere1a machines.
>
> For example, it can transform cases like
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldp  d31, d17, [sp, #312] # Large load from small store
>
> to
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldr  d31, [sp, #312]
> ldr  d17, [sp, #320]
>
> Currently, the pass is disabled by default on all architectures and enabled 
> by a target-specific option.
>
> If deemed beneficial enough for a default, it will be enabled on 
> ampere1/ampere1a,
> or other architectures as well, without needing to be turned on by this 
> option.
>
> Bootstrapped and regtested on aarch64-linux.
>
> gcc/ChangeLog:
>
> * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> * config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
> Declare.
> * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
>   (aarch64-store-forwarding-threshold): New param.
> * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> * doc/invoke.texi: Document new option and new param.
> * config/aarch64/aarch64-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> Co-Authored-By: Manolis Tsamis 
> Co-Authored-By: Philipp Tomsich 
> ---
> Changes in v4:
>   - I had problems to make cselib_subst_to_values work correctly
> so I used cselib_lookup to implement the exact same behaviour and
> record the store value at the time we iterate over it.
>   - Removed the store/load_mem_addr check from is_forwarding as
> unnecessary.
>   - The pass is called on all optimization levels right now.
>   - The threshold check should remain as it is as we only care for
> the front element of the list. The comment above the check explains
> why a single if is enough.

I still think this is structurally better as a while.  There's no reason
in principle we why wouldn't want to record the stores in:

stp x0, x1, [x4, #8]
ldp x0, x1, [x4, #0]
ldp x2, x3, [x4, #16]

and then the two stores should have the same distance value.
I realise we don't do that yet, but still.

>   - The documentation changes requested.
>   - Adjusted a comment.
>
>  gcc/config.gcc|   1 +
>  gcc/config/aarch64/aarch64-passes.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h   |   1 +
>  .../aarch64/aarch64-store-forwarding.cc   | 321 ++
>  gcc/config/aarch64/aarch64.opt|   9 +
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/doc/invoke.texi   |  11 +-
>  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>  10 files changed, 452 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 748430194f3..2ee3b61c4fa 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -350,6 +350,7 @@ aarch64*-*-*)
>   cxx_target_objs="aarch64-c.o"
>   d_target_objs="aarch64-d.o"
>   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
> aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o 
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> + extra_objs="${extra_objs} aarch64-store-forwarding.o"
>   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
>   target_has_targetm_common=yes
>   ;;
> diff --git a/gcc/config/aarch64/aarch64-passes.def 
> b/gcc/config/aarch64/aarch64-passes.def
> index 6ace797b738..fa79e8adca8 100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -23,3 +23,4 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, 
> pass_track_speculation);
>  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
>  INSERT_PASS_BEFORE 

[PATCH 09/17] [APX NDD] Support APX NDD for not insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling 

For *one_cmplsi2_2_zext, it will be split to xor, so its NDD form will be
added together with xor NDD support.

gcc/ChangeLog:

* config/i386/i386.md (one_cmpl2): Add new constraints for NDD
and adjust output template.
(*one_cmpl2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*one_cmpl2_doubleword): Likewise.
(*one_cmpl2_2): Likewise.
(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add not test.
---
 gcc/config/i386/i386.md | 58 ++---
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 11 +
 2 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1a2fb116f01..050779273a7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14001,57 +14001,63 @@ (define_expand "one_cmpl2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(not:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NOT, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NOT, mode, operands,
+  TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*one_cmpl2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (not: (match_operand: 1 "nonimmediate_operand" "0")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (not: (match_operand: 1 "nonimmediate_operand" "0,ro")))]
+  "ix86_unary_operator_ok (NOT, mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
(not:DWIH (match_dup 1)))
(set (match_dup 2)
(not:DWIH (match_dup 3)))]
-  "split_double_mode (mode, [0], 2, [0], 
[2]);")
+  "split_double_mode (mode, [0], 2, [0], [2]);"
+  [(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*one_cmpl2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,?k")
-   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,k")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, mode, operands, TARGET_APX_NDD)"
   "@
not{}\t%0
+   not{}\t{%1, %0|%0, %1}
#"
-  [(set_attr "isa" "*,")
-   (set_attr "type" "negnot,msklog")
+  [(set_attr "isa" "*,apx_ndd,")
+   (set_attr "type" "negnot,negnot,msklog")
(set_attr "mode" "")])
 
 (define_insn "*one_cmplsi2_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,?k")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,?k")
(zero_extend:DI
- (not:SI (match_operand:SI 1 "register_operand" "0,k"]
-  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands)"
+ (not:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,k"]
+  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands,
+  TARGET_APX_NDD)"
   "@
not{l}\t%k0
+   not{l}\t{%1, %k0|%k0, %1}
#"
-  [(set_attr "isa" "x64,avx512bw_512")
-   (set_attr "type" "negnot,msklog")
-   (set_attr "mode" "SI,SI")])
+  [(set_attr "isa" "x64,apx_ndd,avx512bw_512")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "mode" "SI,SI,SI")])
 
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,?k")
-   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,k")))]
-  "ix86_unary_operator_ok (NOT, QImode, operands)"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,r,?k")
+   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, QImode, operands, TARGET_APX_NDD)"
   "@
not{b}\t%0
not{l}\t%k0
+   not{b}\t{%1, %0|%0, %1}
#"
-  [(set_attr "isa" "*,*,avx512f")
-   (set_attr "type" "negnot,negnot,msklog")
+  [(set_attr "isa" "*,*,apx_ndd,avx512f")
+   (set_attr "type" "negnot,negnot,negnot,msklog")
(set (attr "mode")
(cond [(eq_attr "alternative" "1")
 (const_string "SI")
-   (and (eq_attr "alternative" "2")
+   (and (eq_attr "alternative" "3")
 (match_test "!TARGET_AVX512DQ"))
 (const_string "HI")
   ]
@@ -14081,14 +14087,16 @@ (define_insn_and_split "*one_cmpl_1_slp"
 
 (define_insn "*one_cmpl2_2"
   [(set (reg FLAGS_REG)
-   (compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0"))
+   (compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,r")
(not:SWI (match_dup 1)))]
   "ix86_match_ccmode (insn, CCNOmode)
-   && 

Re: [PATCH v6 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-04 Thread waffl3x
On Monday, December 4th, 2023 at 9:39 PM, waffl3x  
wrote:

> On Monday, December 4th, 2023 at 9:35 PM, waffl3x waff...@protonmail.com 
> wrote:
>
>
>
> > > > @@ -15402,6 +15450,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
> > > > complain,
> >
> > > > gcc_checking_assert (TYPE_MAIN_VARIANT (TREE_TYPE (ve))
> > > > == TYPE_MAIN_VARIANT (type));
> > > > SET_DECL_VALUE_EXPR (r, ve);
> > > > + if (is_capture_proxy (t))
> > > > + type = TREE_TYPE (ve);
> >
> > > That should have close to the same effect as the lambda_proxy_type
> > > adjustment I was talking about, since that function basically returns
> > > the TREE_TYPE of the COMPONENT_REF. But the underlying problem is that
> > > finish_non_static_data_member assumes that 'object' is '*this', for
> > > which you can trust the cv-quals; for auto&&, you can't.
> > > capture_decltype has the same problem. I'm attaching a patch to address
> > > this in both places.
> >
> > Regarding this, was my change actually okay, and was your change
> > supposed to address it? I applied my patch to the latest commit in
> > master yesterday and started tests and whatnot with this change
> > commented out as I wasn't sure. It seems like my tests for constness of
captures no longer work with or without this change commented out.
> >
> > If you wish I can go over everything again and figure out a new
> > solution with your changes but stepping through all this code was quite
> > a task that I'm weary of doing again. Even if the second time through
> > won't be so arduous I would like to avoid it.
> >
> > You know what, I'll give it a go anyway but I don't want to spend too
> > much time on it, I still have a few tests to clean up and this crash to
> > fix.
> >
> > template  void f()
> >
> > {
> > int i;
> > [=](this T&& self){ return i; }(); // error, unrelated
> > }
> > int main() { f(); }
> >
> > If this crash doesn't take too long (I don't think it will, it seems
> > straightforward enough) then I'll look at fixing the captures with a
> > const xobject parameter bug the correct way.
> >
> > Alex
>
>
> WAIT Scratch that, I made a mistake, there's only a single case that is
> broken, I read the test log wrong. Ah, I swear I'm cursed to realize
> things the moment I hit the send button.
>
> I have to take a closer look, I'll get back to you when I know more,
> just trying to make sure you don't waste your time on this due to my
> mistake.
>
> Alex

tl;dr it wasn't important, I just have to fix my test.

Okay that was faster than I anticipated, but unfortunately I don't know
how to handle it. I think your change in finish_non_static_data_member
might have been too heavy handed, but I don't know if there's a middle
ground. Or that's what I was going to say until I tested my assumption
on godbolt.

void f(auto const& a) { a = 5; }

Clang, MSVC and GCC all accept this until it is actually instantiated.

So, the true answer to my test failing is to just instantiate the
template. The test in question that was failing looks like this.

auto f2 = [n = 5](this auto const&){ n = 10; }; // { dg-error {} }

With the way things were before, this actually worked, so my assumption
now is that diagnosing this before a template is instantiated would take
some significant reworking of how things are currently done. AND, I don't
even know if it's legal for us to make this diagnostic before
instantiation for either of these cases.

Hah, come to think of it, we can't, there could be an overloaded
operator= that this is valid for... how disappointing.

We can for lambdas since the type is not dependent (on the lambda
instantiation) but it just isn't worth the effort I reckon.

Whatever, moving on, spending time on these things always drains me
because I think "oh boy I can do something better" and finding out it's
just not possible sucks. It's worse when it's because I overlooked
something that's obvious in hindsight.

Oh well, only that crash left I believe.

Alex


Re: [PATCH v5] Introduce strub: machine-independent stack scrubbing

2023-12-04 Thread Alexandre Oliva
The recently-installed patch for interprocedural value-range propagation
enabled some folding that was not expected by the strub-const testcases,
causing them to fail.

I'm making the following adjustments to them to restore the behavior
they tested for, and to make them more robust against future
improvements of ivrp.

I intend to install this as part of the monster patch upthread.


--- a/gcc/testsuite/c-c++-common/torture/strub-const1.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-const1.c
@@ -1,18 +1,22 @@
 /* { dg-do compile } */
 /* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
 
-/* Check that, along with a strub const function call, we issue an asm 
statement
-   to make sure the watermark passed to it is held in memory before the call,
-   and another to make sure it is not assumed to be unchanged.  */
+/* Check that, along with a strub const function call, we issue an asm
+   statement to make sure the watermark passed to it is held in memory before
+   the call, and another to make sure it is not assumed to be unchanged.  f
+   should not be inlined into g, but if it were too simple it might be folded
+   by interprocedural value-range propagation.  */
+
+extern int __attribute__ ((__strub__ ("callable"), __const__)) c ();
 
 int __attribute__ ((__strub__, __const__))
-f() {
-  return 0;
+f () {
+  return c ();
 }
 
 int
-g() {
-  return f();
+g () {
+  return f ();
 }
 
 /* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */
--- a/gcc/testsuite/c-c++-common/torture/strub-const2.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-const2.c
@@ -6,17 +6,19 @@
before the call, and another to make sure it is not assumed to be
unchanged.  */
 
+extern int __attribute__ ((__strub__ ("callable"), __const__)) c ();
+
 int __attribute__ ((__strub__))
 #if ! __OPTIMIZE__
 __attribute__ ((__const__))
 #endif
-f() {
-  return 0;
+f () {
+  return c ();
 }
 
 int
-g() {
-  return f();
+g () {
+  return f ();
 }
 
 /* { dg-final { scan-ipa-dump-times "__asm__" 2 "strub" } } */


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [gcc15] nested functions in C

2023-12-04 Thread Jakub Jelinek
On Mon, Dec 04, 2023 at 01:27:32PM -0500, Siddhesh Poyarekar wrote:
> [Branching this into a separate conversation to avoid derailing the patch,
> which isn't directly related]
> 
> On 2023-12-04 12:21, Martin Uecker wrote:
> > I do not really agree with that.  Nested functions can substantially
> > improve code quality and in C can avoid type unsafe use of
> > void* pointers in callbacks. The code is often much better with
> > nested functions than without.  Nested functions and lambdas
> > (i.e. anonymous nested functions) are used in many languages
> > because they make code better and GNU's nested function are no
> > exception.
> > 
> > So I disagree with the idea that discouraging nested functions leads
> > to better code - I think the exact opposite is true.
> 
> I would argue that GNU's nested functions *are* an exception because they're
> like feathers stuck on a pig to try and make it fly; I think a significant
> specification effort is required to actually make it a cleanly usable
> feature.

Why?  The syntax doesn't seem to be something unexpected, and as C doesn't
have lambdas, one can use nested functions instead.
The only problem is if you need to pass function pointers somewhere else
(and the target doesn't have function descriptors or something similar).
If it is only done to make code more readable compared to, say, the use of
macros, I think nested functions are better: one doesn't have to worry
about multiple evaluations of argument side effects etc.  And if everything
is inlined and SRA optimized, there is no extra cost.
The problem of passing one as a function pointer to other functions exists
with C++ too: only lambdas which don't capture anything can actually be
converted to a function pointer; for anything else you need a template and
have to instantiate it for a particular lambda (which is something you
can't do in C).
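
A minimal GNU C sketch of that macro comparison (illustrative only):

/* Minimal sketch: a nested function avoids the double evaluation of
   argument side effects that a function-like macro can cause.  */
#include <stdio.h>

#define CLAMP_MACRO(x, hi) ((x) > (hi) ? (hi) : (x))  /* may evaluate x twice */

int main (void)
{
  int clamp (int x, int hi)   /* GNU C nested function; x is evaluated once */
  {
    return x > hi ? hi : x;
  }

  int i = 0, j = 0;
  int a = CLAMP_MACRO (i++, 10); /* i ends up incremented twice here */
  int b = clamp (j++, 10);       /* j is incremented exactly once */
  printf ("%d %d i=%d j=%d\n", a, b, i, j);
  return 0;
}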

Jakub



Re: [gcc15] nested functions in C

2023-12-04 Thread Martin Uecker
Am Montag, dem 04.12.2023 um 13:27 -0500 schrieb Siddhesh Poyarekar:
> [Branching this into a separate conversation to avoid derailing the 
> patch, which isn't directly related]
> 
> On 2023-12-04 12:21, Martin Uecker wrote:
> > I do not really agree with that.  Nested functions can substantially
> > improve code quality and in C can avoid type unsafe use of
> > void* pointers in callbacks. The code is often much better with
> > nested functions than without.  Nested functions and lambdas
> > (i.e. anonymous nested functions) are used in many languages
> > because they make code better and GNU's nested function are no
> > exception.
> > 
> > So I disagree with the idea that discouraging nested functions leads
> > to better code - I think the exact opposite is true.
> 
> I would argue that GNU's nested functions *are* an exception because 
> they're like feathers stuck on a pig to try and make it fly; I think a 
> significant specification effort is required to actually make it a 
> cleanly usable feature.  It *may* be possible to implement patterns that 
> use C nested functions well enough *and* result in readable code, but 
> IMO it is easier to write clunky and unmaintainable code with it.

I use them in my code a lot and I think they improve
code quality.  For example:

int foo_find(int N, struct foo in_array[N], const char *key)
{
  bool cond(struct foo *x)
  {
    return 0 == strcmp(x->name, key);
  }
  return find(N, in_array, cond);
}

is a lot cleaner and safer than what you need to write
without nested functions:

struct foo_find {
  const char *name;
};

int foo_cond(void *vdata, struct foo *a)
{
  struct foo_find *key = vdata;
  return 0 == strcmp(a->name, key->name);
}

int foo_find(int N, struct foo in_array[N], const char *key)
{
  struct foo_find data = { key };
  return find(N, in_array, foo_cond, &data);
}

and this is a toy example; the improvement gets more
substantial with more complicated logic.

> 
> I empathize with Jakub's stated use case though of keeping the C 
> frontend support for testing purposes, but that could easily be done 
> behind a flag, or by putting nested C func deprecation behind a flag.

I am relatively sure C will get some form of nested functions.
Maybe as anonymous nested functions, i.e. lambdas, but I do
not see a fundamental difference here (I personally like naming
things for clarity, so I prefer named nested functions).

> > I am generally wary of mitigations that may make exploitation of
> > buffer overflows a bit harder  while increasing the likelihood
> > of buffer overflows by reducing type safety and/or code quality.
> > 
> > But I would agree that trampolines are generally problematic. A
> > better strategy would be a wide function pointer type (as in Apple's
> > Blocks extension). Alternatively, an explicit way to obtain the
> > static chain for a nested function, which could be used with
> > __builtin_call_with_static_chain, could also work.
> > 
> > But in any case, I think it diminishes the value of -fhardened
> > if it requires source code changes, because then it is not as easy
> > to simply turn it on in larger projects / distributions.
> 
> I suppose you mean source code changes even in correct code just to 
> comply with the flag?  

Yes

> I don't disagree for cases like -Warray-bounds, 
> but for warnings/errors that are more deterministic in nature (like 
> -Werror=trampolines), they're going to point at actual problems and 
> larger projects and distributions will usually prefer to at least track 
> them, if not actually fix them.  For Fedora we tend to provide macro 
> overrides for packages that need to explicitly disable a security 
> related flag.

In projects such as mine, this will lead to a lot of code
transformations as indicated above, i.e. much worse code. 

One could get away with it, since nested functions are rarely
used, but I think this is bad, because a lot of code would
improve if it used them.

Martin

> 
> Thanks,
> Sid



Re: [PATCH] gettext: disable install, docs targets, libasprintf, threads

2023-12-04 Thread Tom Tromey
> "Arsen" == Arsen Arsenović  writes:

Arsen> Thanks.  I'll wait for the Binutils and GDB maintainers to weigh in
Arsen> before pushing (plus, I can't push there).

Seems fine to me.  Thank you.

Tom


[gcc15] nested functions in C

2023-12-04 Thread Siddhesh Poyarekar
[Branching this into a separate conversation to avoid derailing the 
patch, which isn't directly related]


On 2023-12-04 12:21, Martin Uecker wrote:

I do not really agree with that.  Nested functions can substantially
improve code quality and in C can avoid type unsafe use of
void* pointers in callbacks. The code is often much better with
nested functions than without.  Nested functions and lambdas
(i.e. anonymous nested functions) are used in many languages
because they make code better and GNU's nested functions are no
exception.

So I disagree with the idea that discouraging nested functions leads
to better code - I think the exact opposite is true.


I would argue that GNU's nested functions *are* an exception because 
they're like feathers stuck on a pig to try and make it fly; I think a 
significant specification effort is required to actually make it a 
cleanly usable feature.  It *may* be possible to implement patterns that 
use C nested functions well enough *and* result in readable code, but 
IMO it is easier to write clunky and unmaintainable code with it.


I empathize with Jakub's stated use case though of keeping the C 
frontend support for testing purposes, but that could easily be done 
behind a flag, or by putting nested C func deprecation behind a flag.



I am generally wary of mitigations that may make exploitation of
buffer overflows a bit harder  while increasing the likelihood
of buffer overflows by reducing type safety and/or code quality.

But I would agree that trampolines are generally problematic. A
better strategy would be a wide function pointer type (as in Apple's
Blocks extension). Alternatively, an explicit way to obtain the
static chain for a nested function, which could be used with
__builtin_call_with_static_chain, could also work.

But in any case, I think it diminishes the value of -fhardened
if it requires source code changes, because then it is not as easy
to simply turn it on in larger projects / distributions.


I suppose you mean source code changes even in correct code just to 
comply with the flag?  I don't disagree for cases like -Warray-bounds, 
but for warnings/errors that are more deterministic in nature (like 
-Werror=trampolines), they're going to point at actual problems and 
larger projects and distributions will usually prefer to at least track 
them, if not actually fix them.  For Fedora we tend to provide macro 
overrides for packages that need to explicitly disable a security 
related flag.


Thanks,
Sid


Re: [PATCH v2] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-12-04 Thread Wilco Dijkstra
Hi Richard,

>> Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible 
>> with
>> existing binaries, gives better performance than locking atomics and is what
>> most users expect.
>
> Please add a justification for why it's backwards compatible, rather
> than just stating that it's so.

This isn't any different than the LSE2 support which also switches some CPUs to
lock-free implementations. This is basically switching the rest. It trivially 
follows
from the fact that GCC always calls libatomic so that you switch all atomics in 
a
process. I'll add that to the description.

Note the compatibility story is even better than this. We are also compatible
with LLVM and future GCC versions which may inline these sequences.

> Thanks for adding this.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95722
> suggests that it's still an open question whether this is a correct thing
> to do, but it sounds from Joseph's comment that he isn't sure whether
> atomic loads from read-only data are valid.

Yes it's not useful to do an atomic read if it is a read-only value... It should
be feasible to mark atomic types as mutable to force them to .data (see eg.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109553).

> Linus's comment in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70490
> suggests that a reasonable compromise might be to use a storing
> implementation but not advertise that it is lock-free.  Also,
> the comment above libat_is_lock_free says:
>
> /* Note that this can return that a size/alignment is not lock-free even if
>    all the operations that we use to implement the respective accesses provide
>    lock-free forward progress as specified in C++14:  Users likely expect
>    "lock-free" to also mean "fast", which is why we do not return true if, for
>    example, we implement loads with this size/alignment using a CAS.  */

I don't believe lying about being lock-free like that is a good idea. When
you use a faster lock-free implementation, you want to tell users about it
(so they aren't forced to use nasty inline assembler hacks for example).

> We don't use a CAS for the fallbacks, but like you say, we do use a
> load/store exclusive loop.  So did you consider not doing this:

> +/* State we have lock-free 128-bit atomics.  */
> +#undef FAST_ATOMIC_LDST_16
> +#define FAST_ATOMIC_LDST_16    1

That would result in __atomic_is_lock_free incorrectly returning false.
Note that __atomic_always_lock_free remains false for 128-bit since there
is no inlining in the compiler, but __atomic_is_lock_free should be true.
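
A small C sketch of that distinction (assuming GCC on AArch64; the exact
results depend on the target and the libatomic in use):

/* Sketch: __atomic_always_lock_free is answered at compile time and is
   expected to remain false for 16 bytes (GCC does not inline them),
   while __atomic_is_lock_free may query libatomic at run time and, with
   this change, is expected to report true on AArch64.  */
#include <stdio.h>

int main (void)
{
  __attribute__ ((aligned (16))) unsigned __int128 v = 0;
  unsigned __int128 tmp;

  printf ("always lock-free: %d\n",
          (int) __atomic_always_lock_free (sizeof v, &v));
  printf ("is lock-free:     %d\n",
          (int) __atomic_is_lock_free (sizeof v, &v));

  __atomic_load (&v, &tmp, __ATOMIC_ACQUIRE);  /* dispatched via libatomic */
  return (int) tmp;
}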

> -   /* RELEASE.  */
> -5: ldxp    res0, res1, [x5]
> +   /* RELEASE/ACQ_REL/SEQ_CST.  */
> +4: ldaxp   res0, res1, [x5]
>  stlxp   w4, in0, in1, [x5]
> -   cbnz    w4, 5b
> +   cbnz    w4, 4b
>  ret
> +END (libat_exchange_16)

> Please explain (here and in the commit message) why you're adding
> acquire semantics to the RELEASE case.

That merges the RELEASE with ACQ_REL/SEQ_CST cases to keep the code
short and simple like much of the code. I've added a note in the commit msg.

Cheers,
Wilco

Here is v2 - this also incorporates the PR111404 fix to compare-exchange:

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries (as for these GCC always calls into libatomic, so all 128-bit
atomic uses in  a process are switched), gives better performance than locking
atomics and is what most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not 
supported.
This results in an implicit store which is invisible to software as long as the
given address is writeable (which will be true when using atomics in actual 
code).

Passes regress, OK for commit?

libatomic/
config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
(libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 
05439ce394b9653c9bcb582761ff7aaa7c8f9643..a099037179b3f1210145baea02a9d43418629813
 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,22 @@
.  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries (as we swap all uses
+   of 128-bit atomics via an ifunc) and gives better performance than locking
+   atomics.
+
+   128-bit atomic loads use a exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   

[PATCH v4] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-04 Thread Manos Anagnostakis
This is an RTL pass that detects store forwarding from stores to larger loads 
(load pairs).

This optimization is SPEC2017-driven and was found to be beneficial for some 
benchmarks,
through testing on ampere1/ampere1a machines.

For example, it can transform cases like

str  d5, [sp, #320]
fmul d5, d31, d29
ldp  d31, d17, [sp, #312] # Large load from small store

to

str  d5, [sp, #320]
fmul d5, d31, d29
ldr  d31, [sp, #312]
ldr  d17, [sp, #320]

Currently, the pass is disabled by default on all architectures and enabled by 
a target-specific option.

If deemed beneficial enough for a default, it will be enabled on 
ampere1/ampere1a,
or other architectures as well, without needing to be turned on by this option.

Bootstrapped and regtested on aarch64-linux.

gcc/ChangeLog:

* config.gcc: Add aarch64-store-forwarding.o to extra_objs.
* config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
* config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
Declare.
* config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
(aarch64-store-forwarding-threshold): New param.
* config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
* doc/invoke.texi: Document new option and new param.
* config/aarch64/aarch64-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
* gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
* gcc.target/aarch64/ldp_ssll_overlap.c: New test.

Signed-off-by: Manos Anagnostakis 
Co-Authored-By: Manolis Tsamis 
Co-Authored-By: Philipp Tomsich 
---
Changes in v4:
- I had problems to make cselib_subst_to_values work correctly
  so I used cselib_lookup to implement the exact same behaviour and
  record the store value at the time we iterate over it.
- Removed the store/load_mem_addr check from is_forwarding as
  unnecessary.
- The pass is called on all optimization levels right now.
- The threshold check should remain as it is as we only care for
  the front element of the list. The comment above the check explains
  why a single if is enough.
- The documentation changes requested.
- Adjusted a comment.

 gcc/config.gcc|   1 +
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 .../aarch64/aarch64-store-forwarding.cc   | 321 ++
 gcc/config/aarch64/aarch64.opt|   9 +
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  11 +-
 .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
 .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
 .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
 10 files changed, 452 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 748430194f3..2ee3b61c4fa 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -350,6 +350,7 @@ aarch64*-*-*)
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
+   extra_objs="${extra_objs} aarch64-store-forwarding.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 6ace797b738..fa79e8adca8 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -23,3 +23,4 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, 
pass_track_speculation);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
+INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d2718cc87b3..7d9dfa06af9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1050,6 +1050,7 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context 
*);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 

Re: [PATCH] gettext: disable install, docs targets, libasprintf, threads

2023-12-04 Thread Arsen Arsenović

Richard Biener  writes:

> OK.

Thanks.  I'll wait for the Binutils and GDB maintainers to weigh in
before pushing (plus, I can't push there).

Have a lovely day!
-- 
Arsen Arsenović




[PATCH] Maintain a validity flag for REG_UNUSED notes [PR112760] (was Re: [PATCH] pro_and_epilogue: Call df_note_add_problem () if SHRINK_WRAPPING_ENABLED [PR112760])

2023-12-04 Thread Richard Sandiford
Richard Sandiford  writes:
> Jakub Jelinek  writes:
>> On Sat, Dec 02, 2023 at 11:04:04AM +, Richard Sandiford wrote:
>>> I still maintain that so much stuff relies on the lack of false-positive
>>> REG_UNUSED notes that (whatever the intention might have been) we need
>>> to prevent the false positive.  Like Andrew says, any use of single_set
>>> is suspect if there's a REG_UNUSED note for something that is in fact used.
>>
>> The false positive REG_UNUSED in that case comes from
>> (insn 15 14 35 2 (set (reg:CCZ 17 flags)
>> (compare:CCZ (reg:DI 0 ax [111])
>> (reg:DI 1 dx [112]))) "pr112760.c":11:22 12 {*cmpdi_1}
>>  (expr_list:REG_UNUSED (reg:CCZ 17 flags)
>> (nil)))
>> (insn 35 15 36 2 (set (reg:CCZ 17 flags)
>> (compare:CCZ (reg:DI 0 ax [111])
>> (reg:DI 1 dx [112]))) "pr112760.c":11:22 12 {*cmpdi_1}
>>  (expr_list:REG_DEAD (reg:DI 1 dx [112])
>> (expr_list:REG_DEAD (reg:DI 0 ax [111])
>> (nil
>> ...
>> use of flags
>> Haven't verified what causes the redundant comparison, but postreload cse
>> then does:
>> 110if (!count && cselib_redundant_set_p (body))
>> 111  {
>> 112if (check_for_inc_dec (insn))
>> 113  delete_insn_and_edges (insn);
>> 114/* We're done with this insn.  */
>> 115goto done;
>> 116  }
>> So, we'd in such cases need to look up what instruction was the earlier
>> setter and if it has REG_UNUSED note, drop it.
>
> Hmm, OK.  I guess it's not as simple as I'd imagined.  cselib does have
> some code to track which instruction established which equivalence,
> but it doesn't currently record what we want, and it would be difficult
> to reuse that information here anyway.  Something "simple" like a map of
> register numbers to instructions, populated only for REG_UNUSED sets,
> would be enough, and low overhead.  But it's not very natural.
>
> Perhaps DF should maintain a flag to say "the current pass keeps
> notes up-to-date", with the assumption being that any pass that
> uses the notes problem does that.  Then single_set and the
> regcprop.cc uses can check that flag.
>
> I don't think it's worth adding the note problem to shrink-wrapping
> just for the regcprop code.  If we're prepared to take that compile-time
> hit, we might as well run a proper (fast) DCE.

Here's a patch that tries to do that.  Boostrapped & regression tested
on aarch64-linux-gnu.  Also tested on x86_64-linux-gnu for the testcase.
(I'll run full x86_64-linux-gnu testing overnight.)

OK to install if that passes?  Not an elegant fix, but it's probably
too much to hope for one of those.

Richard



PR112760 is a miscompilation caused by a stale, false-positive
REG_UNUSED note.  There were originally two consecutive,
identical instructions that set the CC flags.  The first
originally had a REG_UNUSED note, but postreload later deleted
the second in favour of the first, based on cselib_redundant_set_p.

Although in principle it would be possible to remove the note
when making the optimisation, the required bookkeeping wouldn't
fit naturally into what cselib already does.  Doing that would also
arguably be a change of policy.

This patch instead adds a global flag that says whether REG_UNUSED
notes are trustworthy.  The assumption is that any pass that calls
df_note_add_problem cares about REG_UNUSED notes and will keep them
sufficiently up-to-date to support the pass's use of things like
single_set.

gcc/
PR rtl-optimization/112760
* df.h (df_d::can_trust_reg_unused_notes): New member variable.
* df-problems.cc (df_note_add_problem): Set can_trust_reg_unused_notes
to true.
* passes.cc (execute_one_pass): Clear can_trust_reg_unused_notes
after each pass.
* rtlanal.cc (single_set_2): Check can_trust_reg_unused_notes.
* regcprop.cc (copyprop_hardreg_forward_1): Likewise.

gcc/testsuite/
* gcc.dg/pr112760.c: New test.
---
 gcc/df-problems.cc  |  1 +
 gcc/df.h|  4 
 gcc/passes.cc   |  3 +++
 gcc/regcprop.cc |  4 +++-
 gcc/rtlanal.cc  |  8 ++--
 gcc/testsuite/gcc.dg/pr112760.c | 22 ++
 6 files changed, 39 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112760.c

diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f..d2eb95d35ad 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -3782,6 +3782,7 @@ void
 df_note_add_problem (void)
 {
   df_add_problem (_NOTE);
+  df->can_trust_reg_unused_notes = true;
 }
 
 
diff --git a/gcc/df.h b/gcc/df.h
index 402657a7076..a405c000235 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -614,6 +614,10 @@ public:
   /* True if someone added or deleted something from regs_ever_live so
  that the entry and exit blocks need be reprocessed.  */
   bool redo_entry_and_exit;
+
+  /* True if REG_UNUSED notes are 

Re: [PATCH] gcc: Disallow trampolines when -fhardened

2023-12-04 Thread Martin Uecker
Am Montag, dem 04.12.2023 um 11:46 -0500 schrieb Siddhesh Poyarekar:
> On 2023-12-04 11:39, Andreas Schwab wrote:
> > On Dez 04 2023, Siddhesh Poyarekar wrote:
> > 
> > > For hardened code in C, I think we really should look to step away from
> > > nested functions instead of adding ways to continue supporting it. There's
> > > probably a larger conversation to be had about the utility of nested
> > > functions in general for C (and whether this GCC extension should be
> > > deprecated altogether in future), but I feel like the -fhardened subset
> > > gives us the opportunity to enforce at least a safe subset for now,
> > > possibly extending it in future.
> > 
> > Nested functions by itself don't need a trampoline, only if the address
> > of it is passed outside the containing function's scope (as a callback,
> > for example).
> 
> Yes, that's why I said that the conversation about deprecating the C 
> nested functions extension is a broader one (and hence for gcc 15) that 
> will likely involve the question of whether dropping the extension 
> altogether gives any benefit or if dropping support for on-stack 
> trampolines is sufficient.  On-heap trampolines are maybe slightly 
> better in that they don't need an executable stack, but defaulting to 
> on-heap trampolines for -fhardened seems like a lost opportunity to 
> enforce better user code.

I do not really agree with that.  Nested functions can substantially
improve code quality and in C can avoid type unsafe use of
void* pointers in callbacks. The code is often much better with
nested functions than without.  Nested functions and lambdas
(i.e. anonymous nested functions) are used in many languages
because they make code better, and GNU's nested functions are no
exception.
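
To make this concrete, here is a purely illustrative sketch (not code
from this thread): the nested comparator captures `base' with full type
checking, where a qsort_r-style interface would have to smuggle it
through an untyped void * argument.

#include <stdlib.h>

void
sort_by_distance (int *vals, size_t n, int base)
{
  /* GNU C nested function: reads `base' from the enclosing frame,
     no void * context argument needed.  */
  int cmp (const void *pa, const void *pb)
  {
    int a = *(const int *) pa, b = *(const int *) pb;
    return (abs (a - base) > abs (b - base))
           - (abs (a - base) < abs (b - base));
  }

  /* Passing cmp's address to qsort is what makes GCC materialise a
     trampoline for the static chain.  */
  qsort (vals, n, sizeof *vals, cmp);
}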

So I disagree with the idea that discouraging nested functions leads 
to better code - I think the exact opposite is true.

I am generally wary of mitigations that may make exploitation of
buffer overflows a bit harder  while increasing the likelihood
of buffer overflows by reducing type safety and/or code quality.

But I would agree that trampolines are generally problematic. A
better strategy would be a wide function pointer type (as in Apple's
Blocks extension). Alternatively, an explicit way to obtain the
static chain for a nested function, which could then be used with
__builtin_call_with_static_chain, could also work.
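
Roughly speaking, a wide function pointer is simply a (function,
environment) pair that travels together.  A minimal hand-written sketch
of the mechanism only -- without compiler support it gives neither the
type safety nor the capture convenience that Blocks or nested functions
provide:

struct icmp_closure
{
  int (*fn) (void *env, int a, int b);  /* the callback itself */
  void *env;                            /* its captured environment */
};

static inline int
icmp_call (struct icmp_closure c, int a, int b)
{
  return c.fn (c.env, a, b);
}

Nothing is generated at run time, so no trampoline and no executable
stack are needed.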

But in any case, I think it diminishes the value of -fhardened
if it requires source code changes, because then it is not as easy
to simply turn it on in larger projects / distributions.

Martin



> 
> Thanks,
> Sid



Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-12-04 Thread Maciej W. Rozycki
On Wed, 8 Nov 2023, Kito Cheng wrote:

> OK, then LGTM, thanks for the explanation :)

 Please don't top-post on a GCC mailing list (and preferably in off-list 
replies to such mailing list messages unless it's been agreed to somehow 
with the participants), as it makes it difficult to make context replies.

 Best practice is to reply inline, quoting the relevant original paragraph 
(or enough context) referred to above, and with all the other parts of the 
message replied to discarded.  We may even have it written down somewhere 
(though I haven't checked; in the old days it used to be assumed), and I 
do hope any sane modern MUA can handle it.

 Otherwise the discussion thread quickly grows into an illegible mess.

 So this change does indeed fix PR 112092; however, we now have an issue 
with several other test cases and the new `-mmovcc' option.  For example, 
vsetvl-13.c fails with "-mmovcc -mbranch-cost=8" test options, and the 
assembly produced looks like this:

        vsetvli a6,a6,e8,mf4,ta,ma
        snez    a5,a5
        neg     a5,a5
        and     a6,a5,a6
        not     a5,a5
        andi    a5,a5,55
        or      a5,a6,a5
        beq     a4,zero,.L10
        li      a6,0
        vsetvli zero,a5,e32,m1,tu,ma
.L4:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi    a6,a6,1
        bne     a4,a6,.L4
.L10:
        ret

As far as I can tell the code produced is legitimate, and for the record 
analogous assembly is produced with `-march=rv32gcv_zicond' too:

        vsetvli   a6,a6,e8,mf4,ta,ma
        czero.eqz a6,a6,a5
        li        a7,55
        czero.nez a5,a7,a5
        or        a5,a5,a6
        beq       a4,zero,.L10
        li        a6,0
        vsetvli   zero,a5,e32,m1,tu,ma
.L4:
        vle32.v   v1,0(a0)
        vle32.v   v1,0(a1)
        vle32.v   v1,0(a2)
        vse32.v   v1,0(a3)
        addi      a6,a6,1
        bne       a4,a6,.L4
.L10:
        ret

-- it's just that you can't see it in regression testing, because the 
test case overrides `-march='.  Presumably we do want to execute VSETVLI 
twice here, on the basis that avoiding the second one by means of branches 
would be more costly than simply executing it.

 Shall we just silence false failures like this with `-mno-movcc' then or 
shall we handle the conditional-move case somehow?
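
 (If we go the former route, I suppose it is just a matter of pinning the 
option in the affected tests, e.g. something along these lines -- a 
hypothetical directive placement only, the exact options vsetvl-13.c 
already uses may of course require adjusting:

/* { dg-additional-options "-mno-movcc" } */

though that would leave the `-mmovcc' code path itself uncovered.)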

 For reference, the plain branched assembly is as follows:

        li      a7,55
        beq     a5,zero,.L13
        vsetvli zero,a6,e32,m1,tu,ma
.L2:
        beq     a4,zero,.L11
        li      a5,0
.L4:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi    a5,a5,1
        bne     a4,a5,.L4
.L11:
        ret
.L13:
        vsetvli zero,a7,e32,m1,tu,ma
        j       .L2

  Maciej


[committed] Fix HImode load mnemonic on microblaze port

2023-12-04 Thread Jeff Law

The tester recently started failing va-arg-22.c on microblaze-linux:

gcc.c-torture/execute/va-arg-22.c   -O0  (test for excess errors)

It was failing with an undefined reference to "r7" at link time.  This 
was ultimately tracked down to a HImode load using (reg+reg) addressing 
mode, but which used the lhui instruction instead of lhu.  The "i" means 
it's supposed to be (reg+disp) so the assembler tried to interpret "r7" 
as an immediate/symbol.


The port uses %i as an output modifier to select between sh/shi 
and various other mnemonics for loads/stores.  The movhi pattern simply 
failed to use it for the two cases where it's loading from memory 
(interestingly enough it was used for stores).


Clearly we aren't using reg+reg much for HImode loads as this didn't fix 
anything else in the testsuite.


Installing on the trunk,
Jeff

commit b544ec681bdc9c48587d2e014f9559674097738a
Author: Jeff Law 
Date:   Mon Dec 4 10:06:49 2023 -0700

[committed] Fix HImode load mnemonic on microblaze port

The tester recently started failing va-arg-22.c on microblaze-linux:

gcc.c-torture/execute/va-arg-22.c   -O0  (test for excess errors)

It was failing with an undefined reference to "r7" at link time.  This was
ultimately tracked down to a HImode load using (reg+reg) addressing mode,
but which used the lhui instruction instead of lhu.  The "i" means it's
supposed to be (reg+disp) so the assembler tried to interpret "r7" as an
immediate/symbol.

The port uses %i as an output modifier to select between sh/shi and
various other mnemonics for loads/stores.  The movhi pattern simply
failed to use it for the two cases where it's loading from memory
(interestingly enough it was used for stores).

Clearly we aren't using reg+reg much for HImode loads as this didn't fix
anything else in the testsuite.

gcc/
* config/microblaze/microblaze.md (movhi): Use %i for half-word
loads to properly select between lhu/lhui.

diff --git a/gcc/config/microblaze/microblaze.md 
b/gcc/config/microblaze/microblaze.md
index 671667b537c..a8ee886d36b 100644
--- a/gcc/config/microblaze/microblaze.md
+++ b/gcc/config/microblaze/microblaze.md
@@ -1089,8 +1089,8 @@ (define_insn "*movhi_internal2"
   "@
addik\t%0,r0,%1\t# %X1
addk\t%0,%1,r0
-   lhui\t%0,%1
-   lhui\t%0,%1
+   lhu%i1\t%0,%1
+   lhu%i1\t%0,%1
sh%i0\t%z1,%0
sh%i0\t%z1,%0"
   [(set_attr "type" "arith,move,load,no_delay_load,store,no_delay_store")


Re: [PATCH] aarch64: fix eh_return-3.c test

2023-12-04 Thread Richard Sandiford
Szabolcs Nagy  writes:
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/eh_return-3.c: Fix when retaa is available.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/eh_return-3.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-3.c 
> b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
> index a17baa86501..d180fa7c455 100644
> --- a/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
> @@ -12,8 +12,12 @@
>  **   cbz x4, .*
>  **   add sp, sp, x5
>  **   br  x6
> +** (
>  **   hint    29 // autiasp
>  **   ret
> +** |
> +**   retaa
> +** )
>  **   mov x5, x0
>  **   mov x4, 1
>  **   mov x6, x1


Re: [PATCH] gm2: Fix mc/mc.flex compilation on Solaris

2023-12-04 Thread Gaius Mulley
Rainer Orth  writes:

> The recent warning changes broke gm2 bootstrap on Solaris:
>
> /vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex: In function 'handleFile':
> /vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex:297:21: error: implicit 
> declaration of function 'alloca' [-Wimplicit-function-declaration]
>   297 |   char *s = (char *)alloca (strlen (filename) + 2 + 1);
>   | ^~
>
> alloca needs <alloca.h> on Solaris, which isn't universally available.
> Since mc.flex doesn't include any config header, I chose to switch to
> __builtin_alloca instead.
>
> /vol/gcc/src/hg/master/local/gcc/m2/mc/mc.flex:332:19: error: implicit 
> declaration of function 'index' [-Wimplicit-function-declaration]
>   332 |   char   *p = index(sdate, '\n');
>   |   ^
>
> index is declared in <strings.h> on Solaris, again not a standard
> header.  I simply switched to using strchr to avoid that issue.
>
> Bootstrapped without regressions on i386-pc-solaris2.11,
> sparc-sun-solaris2.11, x86_64-pc-linux-gnu, and
> x86_64-apple-darwin23.1.0.
>
> Ok for trunk?
>
>   Rainer

yes, lgtm, - thanks for fixing index as well,

regards,
Gaius




Re: [PATCH] libiberty: Fix pex_unix_wait return type

2023-12-04 Thread Richard Sandiford
Rainer Orth  writes:
> The recent warning patches broke Solaris bootstrap:
>
> /vol/gcc/src/hg/master/local/libiberty/pex-unix.c:326:3: error: 
> initialization of 'pid_t (*)(struct pex_obj *, pid_t,  int *, struct pex_time 
> *, int,  const char **, int *)' {aka 'long int (*)(struct pex_obj *, long 
> int,  int *, struct pex_time *, int,  const char **, int *)'} from 
> incompatible pointer type 'int (*)(struct pex_obj *, pid_t,  int *, struct 
> pex_time *, int,  const char **, int *)' {aka 'int (*)(struct pex_obj *, long 
> int,  int *, struct pex_time *, int,  const char **, int *)'} 
> [-Wincompatible-pointer-types]
>   326 |   pex_unix_wait,
>   |   ^
> /vol/gcc/src/hg/master/local/libiberty/pex-unix.c:326:3: note: (near 
> initialization for 'funcs.wait')
>
> While pex_funcs.wait expects a function returning pid_t, pex_unix_wait
> currently returns int.  However, on Solaris pid_t is long for 32-bit,
> but int for 64-bit.
>
> This patch fixes this by having pex_unix_wait return pid_t as
> expected, and like every other variant already does.
>
> Bootstrapped without regressions on i386-pc-solaris2.11,
> sparc-sun-solaris2.11, x86_64-pc-linux-gnu, and
> x86_64-apple-darwin23.1.0.
>
> Ok for trunk?
>
>   Rainer

OK, thanks.

Richard


Re: [PATCH] gcc: Disallow trampolines when -fhardened

2023-12-04 Thread Siddhesh Poyarekar

On 2023-12-04 11:39, Andreas Schwab wrote:

> On Dez 04 2023, Siddhesh Poyarekar wrote:
>
>> For hardened code in C, I think we really should look to step away from
>> nested functions instead of adding ways to continue supporting it. There's
>> probably a larger conversation to be had about the utility of nested
>> functions in general for C (and whether this GCC extension should be
>> deprecated altogether in future), but I feel like the -fhardened subset
>> gives us the opportunity to enforce at least a safe subset for now,
>> possibly extending it in future.
>
> Nested functions by itself don't need a trampoline, only if the address
> of it is passed outside the containing function's scope (as a callback,
> for example).


Yes, that's why I said that the conversation about deprecating the C 
nested functions extension is a broader one (and hence for gcc 15) that 
will likely involve the question of whether dropping the extension 
altogether gives any benefit or if dropping support for on-stack 
trampolines is sufficient.  On-heap trampolines are maybe slightly 
better in that they don't need an executable stack, but defaulting to 
on-heap trampolines for -fhardened seems like a lost opportunity to 
enforce better user code.


Thanks,
Sid


Re: [PATCH] gcc: Disallow trampolines when -fhardened

2023-12-04 Thread Jakub Jelinek
On Mon, Dec 04, 2023 at 05:39:04PM +0100, Andreas Schwab wrote:
> On Dez 04 2023, Siddhesh Poyarekar wrote:
> 
> > For hardened code in C, I think we really should look to step away from
> > nested functions instead of adding ways to continue supporting it. There's
> > probably a larger conversation to be had about the utility of nested
> > functions in general for C (and whether this GCC extension should be
> > deprecated altogether in future), but I feel like the -fhardened subset
> > gives us the opportunity to enforce at least a safe subset for now,
> > possibly extending it in future.
> 
> Nested functions by itself don't need a trampoline, only if the address
> of it is passed outside the containing function's scope (as a callback,
> for example).

And only if the code to which it is passed can't be inlined back.
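
(For the record, a minimal illustration of the distinction, not taken
from any real testcase:

int
walk (int *a, int n, void (*for_each) (int *, int, void (*) (int)))
{
  int sum = 0;
  void add (int v) { sum += v; }   /* GNU C nested function */

  for (int i = 0; i < n; i++)
    add (a[i]);                    /* direct call: no trampoline needed */

  for_each (a, n, add);            /* address escapes: trampoline */
  return sum;
}

-- the direct calls never need a trampoline, only the escaping address
does, and if for_each is inlined and the call through the pointer is
folded back to a direct call, even that one can go away.)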

I'm afraid contained functions in Fortran or in Ada (whatever it is called
there) aren't going away any time soon and having the possibility to test it
also in C and not just Fortran/Ada is very useful at least from compiler
testing POV.

Jakub



[PATCH] libstdc++: Add test for LWG Issue 3897

2023-12-04 Thread Will Hawkins
Hello!

Thank you, as always, for the great work that you do on libstdc++. The
inout_ptr implementation properly handles the issue raised in LWG 3897
but it seems like having an explicit test might be a good idea.

I hope that this helps!
Will

-- >8 --

Add a test to verify that the implementation of inout_ptr is not
vulnerable to LWG Issue 3897.

libstdc++-v3/ChangeLog:

* testsuite/20_util/smartptr.adapt/inout_ptr/3.cc: New test
for LWG Issue 3897.

Signed-off-by: Will Hawkins 
---
 .../20_util/smartptr.adapt/inout_ptr/3.cc   | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/3.cc

diff --git a/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/3.cc 
b/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/3.cc
new file mode 100644
index 000..f9114dc57b5
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/smartptr.adapt/inout_ptr/3.cc
@@ -0,0 +1,17 @@
+// { dg-do run { target c++23 } }
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+// C++23 [inout.ptr.t] Class template inout_ptr_t
+// Verify that implementation handles LWG Issue 3897
+void nuller(int **p) {
+  *p = nullptr;
+}
+
+int main(int, char **) {
+  int *i = new int{5};
+  nuller(std::inout_ptr(i));
+
+  VERIFY(i == nullptr);
+}
-- 
2.41.0


