Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Richard Sandiford wrote:

> Sorry, still catching up on email, but: 
> 
> Richard Biener  writes:
> > We have optimize_vectors_before_lowering_p but we shouldn't even there
> > turn supported into not supported ops and as said, what's supported or
> > not cannot be finally decided (if it's only vcond and not vcond_mask
> > that is supported).  Also optimize_vectors_before_lowering_p is set
> > for a short time between vectorization and vector lowering and we
> > definitely do not want to turn supported vectorizer emitted stmts
> > into ones that we need to lower.  For GCC 15 we should see to move
> > vector lowering before vectorization (before loop optimization I'd
> > say) to close this particular hole (and also reliably ICE when the
> > vectorizer creates unsupported IL).  We also definitely want to
> > retire vcond expanders (no target I know of supports single-instruction
> > compare-and-select).
> 
> ...definitely agree with this FWIW.  Sounds like a much cleaner approach.
> 
> One of the main tricks that vcond*s tend to do is invert "difficult"
> comparisons and swap the data operands to match.  But I think we should
> move to a situation where targets don't provide comparison patterns
> that require an inversion, and instead move inversions to generic code.

Yes, that would be good as well - of course it will likely require fending
off some more simplification patterns.  Sounds like step #2 anyway, making
all ports happy with no vcond will definitely require some work unless
we force it by simply no longer using vcond ...

Richard.
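
As an illustration of the inversion trick discussed above, here is a minimal
C++ sketch (not GCC code).  It assumes a hypothetical target whose only vector
comparison is lane-wise equality; Vec4, select_eq and select_ne are names made
up for this example.  A "not equal" select is obtained by reusing the "equal"
select with the data operands swapped, which is the kind of rewrite that could
live in generic code instead of in each vcond expander.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane vector type, for illustration only.
using Vec4 = std::array<int, 4>;

// The one comparison the imaginary target supports:
// per lane, a == b ? x : y.
inline Vec4 select_eq(const Vec4& a, const Vec4& b,
                      const Vec4& x, const Vec4& y)
{
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (a[i] == b[i]) ? x[i] : y[i];
  return r;
}

// Per lane, a != b ? x : y, expressed by inverting the comparison
// and swapping the data operands to match.
inline Vec4 select_ne(const Vec4& a, const Vec4& b,
                      const Vec4& x, const Vec4& y)
{
  return select_eq(a, b, y, x);
}
```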


Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-07 Thread Richard Biener
On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui  wrote:
>
> On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
>  wrote:
> >
> > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > > >
> > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > equivalence does not hold for types other than signed integers.  When
> > > > > it comes to conditions, the former was translated to a combination of
> > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > latter.
> > > > >
> > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > > as defined behavior, specifically, wrapping around according to two's
> > > > > complement arithmetic.  This has implications for optimizations that
> > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > undefined.  Consider the example given:
> > > > >
> > > > > long long llmax = __LONG_LONG_MAX__;
> > > > > long long llmin = -llmax - 1;
> > > > >
> > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`,
> > > > > > which simplifies to `2 * llmax + 1`.  Given that `llmax` is the
> > > > > > maximum value for a `long long`, this calculation overflows in a
> > > > > > defined manner (wrapping around), which under `-fwrapv` is a legal
> > > > > > operation that produces a negative value due to two's complement
> > > > > > wraparound.  Therefore, `llmax - llmin < 0` is true.
> > > > >
> > > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > > enabled.  This is why this optimization pass is disabled under
> > > > > `-fwrapv`.
> > > > >
> > > > > This optimization pass must run before the Jump Threading pass and the
> > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > >
> > > > > (1)
> > > > >   int diff = x - y;
> > > > >   if (diff > 0)
> > > > > foo();
> > > > >   if (diff < 0)
> > > > > bar();
> > > > >
> > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > because we know the postcondition of the first condition is diff <= 0,
> > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > after this VRP, we get:
> > > > >
> > > > > (2)
> > > > >   int diff = x - y;
> > > > >   if (x > y)
> > > > > foo();
> > > > >   if (diff != 0)
> > > > > bar();
> > > > >
> > > > > This generates sub and test for the second condition and cmp for the
> > > > > first condition. However, if we apply this pass beforehand, we simply
> > > > > get:
> > > > >
> > > > > (3)
> > > > >   int diff = x - y;
> > > > >   if (x > y)
> > > > > foo();
> > > > >   if (x < y)
> > > > > bar();
> > > > >
> > > > > > In this code, diff will be eliminated as dead code, and sub and test
> > > > > will not be generated, which is more efficient.
> > > > >
> > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > (3) above are recognized as different, which prevents TCO.
> > > > >
> > > > > PR middle-end/113680
> > > >
> > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > the explicit code present in the forwprop pass or a new match.pd
> > > > pattern.  There's possible interaction with x - y value being used
> > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > a comparison against zero being possibly implemented by
> > > > a flag setting subtraction instruction.
> > > >
> > >
> > > Thank you so much for your review!  Although the forwprop pass runs
> > > multiple times, we might not need to apply this optimization multiple
> > > times.  Would it be acceptable to add such optimization?  More
> > > generally, I would like to know how to determine where to put
> > > optimization in the future.
> >
> > This kind of pattern matching expression simplification is best
> > addressed by patterns in match.pd though historically the forwprop
> > pass still catches cases not in match.pd in its
> > forward_propagate_into_comparison_1 (and callers).
> >
>
> I see.  When would patterns in match.pd be applied?  Around forwprop
> or somewhere else?  (Also, could you please tell me a document I can
> learn about these if it exists?)  I 
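
The -fwrapv counterexample from the quoted patch description can be checked
with a small C++ sketch.  It emulates wrapping subtraction through unsigned
arithmetic so that no signed overflow occurs; wrapping_sub and
diff_is_negative are helper names invented here, and the conversion back to
long long is assumed to be modular (guaranteed since C++20, and GCC's
documented behaviour before that).

```cpp
#include <climits>

// Two's complement wrapping subtraction, as -fwrapv would compute x - y.
// Done in unsigned arithmetic to avoid undefined signed overflow.
inline long long wrapping_sub(long long x, long long y)
{
  return static_cast<long long>(static_cast<unsigned long long>(x)
                                - static_cast<unsigned long long>(y));
}

// Evaluates (x - y) < 0 under wrapping semantics.
inline bool diff_is_negative(long long x, long long y)
{
  return wrapping_sub(x, y) < 0;
}
```

With x = LLONG_MAX and y = LLONG_MIN, diff_is_negative (x, y) is true while
x < y is false, so the two forms are only interchangeable when signed
overflow is undefined.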

[PATCH v2, RFC] fsra: gimple final sra pass for parameters and returns

2024-03-07 Thread Jiufu Guo
Hi,

As is known, there are a few PRs (meta-bug PR101926) about
accessing aggregate params/returns that are passed through registers.

We could use the current SRA pass in a special mode right before
RTL expansion for the incoming/outgoing part, as suggested in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637935.html

Compared to the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646625.html
this version enhances expand_ARG_PARTS so that bootstrap passes on aarch64,
and merges the previous three patches into one patch for review.

As mentioned in the previous version:
this patch uses the IFNs ARG_PARTS and SET_RET_PARTS for parameters
and returns, and expands the IFNs according to the incoming/outgoing
registers.

Bootstrapped/regtested on ppc64{,le}, x86_64 and aarch64.

Again, there are a few things that could be enhanced for this patch:
* Multi-register access
* Parameter access across calls
* Optimizing access to parameters that are in memory
* Checking more cases/targets

I would like to ask for comments to avoid major flaws.

BR,
Jeff (Jiufu Guo)


PR target/108073
PR target/65421
PR target/69143

gcc/ChangeLog:

* cfgexpand.cc (expand_value_return): Update for rtx eq checking.
(expand_return): Update for scalarized returns.
* internal-fn.cc (query_position_in_parallel): New function.
(construct_reg_seq): New function.
(get_incoming_element): New function.
(reference_alias_ptr_type): Extern declare.
(expand_ARG_PARTS): New IFN expand.
(store_outgoing_element): New function.
(expand_SET_RET_PARTS): New IFN expand.
(expand_SET_RET_LAST_PARTS): New IFN expand.
* internal-fn.def (ARG_PARTS): New IFN.
(SET_RET_PARTS): New IFN.
(SET_RET_LAST_PARTS): New IFN.
* passes.def (pass_sra_final): Add new pass.
* tree-pass.h (make_pass_sra_final): New function.
* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
(scan_function): Update for argument in fsra.
(find_var_candidates): Collect candidates for SRA_MODE_FINAL_INTRA.
(analyze_access_subtree): Update analyze for fsra.
(generate_subtree_copies): Update to generate new IFNs.
(final_intra_sra): New function.
(class pass_sra_final): New class.
(make_pass_sra_final): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Update instructions.
* gcc.target/powerpc/pr108073-1.c: New test.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421.c: New test.
* gcc.target/powerpc/pr69143.c: New test.

---
 gcc/cfgexpand.cc  |   6 +-
 gcc/internal-fn.cc| 254 ++
 gcc/internal-fn.def   |   9 +
 gcc/passes.def|   2 +
 gcc/tree-pass.h   |   1 +
 gcc/tree-sra.cc   | 157 ++-
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 ++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 +
 gcc/testsuite/gcc.target/powerpc/pr65421.c|  10 +
 gcc/testsuite/gcc.target/powerpc/pr69143.c|  23 ++
 11 files changed, 598 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index eef565eddb5..1ec6c2d8102 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3759,7 +3759,7 @@ expand_value_return (rtx val)
 
   tree decl = DECL_RESULT (current_function_decl);
   rtx return_reg = DECL_RTL (decl);
-  if (return_reg != val)
+  if (!rtx_equal_p (return_reg, val))
 {
   tree funtype = TREE_TYPE (current_function_decl);
   tree type = TREE_TYPE (decl);
@@ -3832,6 +3832,10 @@ expand_return (tree retval)
  been stored into it, so we don't have to do anything special.  */
   if (TREE_CODE (retval_rhs) == RESULT_DECL)
 expand_value_return (result_rtl);
+  /* return is scalarized by fsra: TODO use FLAG. */
+  else if (VAR_P (retval_rhs)
+  && rtx_equal_p (result_rtl, DECL_RTL (retval_rhs)))
+expand_null_return_1 ();
 
   /* If the result is an aggregate that is being returned in one (or more)
  registers, load the registers here.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..905ee7da005 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3394,6 +3394,260 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
 }
 }
 
+/* In the parallel rtx register series REGS, compute the register position for
+   given {BITPOS, 

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao
On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> 
> On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:
> > It should be better to extend the expected value before the ll/sc loop
> > (like what LLVM does), instead of repeating the extending in each
> > iteration.  Something like:
> 
> I wanted to do this at first, but it didn't work out.
> 
> But then I thought about it, and there are two benefits to putting it in
> the middle of ll/sc:
> 
> 1. If there is an operation that uses the $r4 register after this atomic
> operation, another register is required to store $r4.
> 
> 2. ll.w requires long cycles, so putting an addi.w command after ll.w
> won't make a difference.
> 
> So based on the above, I didn't try again, but directly made a
> modification like a patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] c++: Check module attachment instead of just purview when necessary [PR112631]

2024-03-07 Thread Nathaniel Shead
On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote:
> On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote:
> > On 11/20/23 04:47, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> > > access.
> > > 
> > > -- >8 --
> > > 
> > > Block-scope declarations of functions or extern values are not allowed
> > > when attached to a named module. Similarly, class member functions are
> > > not inline if attached to a named module. However, in both these cases
> > > we currently only check if the declaration is within the module purview;
> > > it is possible for such a declaration to occur within the module purview
> > > but not be attached to a named module (e.g. in an 'extern "C++"' block).
> > > This patch makes the required adjustments.
> > 
> > 
> > Ah I'd been puzzling over the default inlinedness of  member-fns of
> > block-scope structs.  Could you augment the testcase to make sure that's
> > right too?
> > 
> > Something like:
> > 
> > // dg-module-do link
> > export module Mod;
> > 
> > export auto Get () {
> >   struct X { void Fn () {} };
> >   return X();
> > }
> > 
> > 
> > ///
> > > import Mod;
> > void Frob () { Get().Fn(); }
> > 
> 
> I gave this a try and it indeed doesn't work correctly; 'Fn' needs to be
> marked 'inline' for this to link (whether or not 'Get' itself is
> inline). I've tried tracing the code to work out what's going on but
> I've been struggling to work out how all the different flags (e.g.
> TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN)
> interact, which flags we want to be set where, and where the decision of
> what function definitions to emit is actually made.
> 
> I did find that doing 'mark_used(decl)' on all member functions in
> block-scope structs seems to work however, but I wonder if that's maybe
> too aggressive or if there's something else we should be doing?

I got around to looking at this again, here's an updated version of this
patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

(I'm not sure if 'start_preparsed_function' is the right place to be
putting this kind of logic or if it should instead be going in
'grokfndecl', e.g. decl.cc:10761 where the rules for making local
functions have no linkage are initially determined, but I found this
easier to implement: happy to move around though if preferred.)

-- >8 --

Block-scope declarations of functions or extern values are not allowed
when attached to a named module. Similarly, class member functions are
not inline if attached to a named module. However, in both these cases
we currently only check if the declaration is within the module purview;
it is possible for such a declaration to occur within the module purview
but not be attached to a named module (e.g. in an 'extern "C++"' block).
This patch makes the required adjustments.

While implementing this we discovered that block-scope local functions
are not correctly emitted, causing link failures; this patch also
corrects some assumptions here and ensures that they are emitted when
needed.

PR c++/112631

gcc/cp/ChangeLog:

* cp-tree.h (named_module_attach_p): New function.
* decl.cc (start_decl): Check for attachment not purview.
(grokmethod): Likewise.
(start_preparsed_function): Ensure block-scope functions are
emitted in module interfaces.
* decl2.cc (determine_visibility): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/block-decl-1_a.C: New test.
* g++.dg/modules/block-decl-1_b.C: New test.
* g++.dg/modules/block-decl-2_a.C: New test.
* g++.dg/modules/block-decl-2_b.C: New test.
* g++.dg/modules/block-decl-2_c.C: New test.
* g++.dg/modules/block-decl-3.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h  |   2 +
 gcc/cp/decl.cc|  22 ++-
 gcc/cp/decl2.cc   |  23 +--
 gcc/testsuite/g++.dg/modules/block-decl-1_a.C |   9 ++
 gcc/testsuite/g++.dg/modules/block-decl-1_b.C |  10 ++
 gcc/testsuite/g++.dg/modules/block-decl-2_a.C | 143 ++
 gcc/testsuite/g++.dg/modules/block-decl-2_b.C |   8 +
 gcc/testsuite/g++.dg/modules/block-decl-2_c.C |  25 +++
 gcc/testsuite/g++.dg/modules/block-decl-3.C   |  21 +++
 9 files changed, 249 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-2_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-2_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/block-decl-3.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 14895bc6585..05913861e06 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7381,6 +7381,8 @@ inline bool module_attach_p ()
 
 

RE: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-03-07 Thread Li, Pan2
Thanks a lot for the coaching, it really saved my day. I will try usadd/ssadd,
covering both the scalar and vector (ISEL/widen_mult) modes, in v3.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, March 7, 2024 4:41 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation 
US_PLUS

On Thu, Mar 7, 2024 at 2:54 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > gen_int_libfunc will no longer make it emit libcalls for fixed point
> > modes, so this can't be correct
> > and there's no libgcc implementation for integer mode saturating ops,
> > so it's pointless to emit calls
> > to them.
>
> Got the point here: the OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS,
> "usadd", '3', gen_unsigned_fixed_libfunc)
> is designed for fixed point and cannot cover integer modes right now.

I think

OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
gen_unsigned_fixed_libfunc)

would work fine (just dropping the $Q).

> Given we have saturating integer ALU operations like those below, could you
> help coach me on the most reasonable way to represent them in the scalar as
> well as the vectorized part? Sorry, I am not familiar with this part and am
> still digging into how it works...

As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
the other cases.

As I said, use vectorizer patterns and possibly do instruction
selection at ISEL/widen_mult time.

Richard.

> uint32_t sat_uadd (uint32_t a, uint32_t b)
> {
>   uint32_t add = a + b;
>   return add | -(add < a);
> }
>
> sint32_t sat_sadd (sint32_t a, sint32_t b)
> {
>   sint32_t add = a + b;
>   sint32_t x = a ^ b;
>   sint32_t y = add ^ x;
>   return x < 0 ? add : (y >= 0 ? add : INT32_MAX + (x < 0));
> }
>
> uint32_t sat_usub (uint32_t a, uint32_t b)
> {
>   return a >= b ? a - b : 0;
> }
>
> sint32_t sat_ssub (sint32_t a, sint32_t b)
> {
>   sint32_t sub = a - b;
>   sint32_t x = a ^ b;
>   sint32_t y = sub ^ x;
>   return x >= 0 ? sub : (y >= 0 ? sub : INT32_MAX + (x < 0));
> }
>
> uint32_t sat_umul (uint32_t a, uint32_t b)
> {
>   uint64_t mul = (uint64_t)a * b;
>
>   return mul <= (uint64_t)UINT32_MAX ? (uint32_t)mul : UINT32_MAX;
> }
>
> sint32_t sat_smul (sint32_t a, sint32_t b)
> {
>   sint64_t mul = (sint64_t)a * b;
>
>   return mul >= (sint64_t)INT32_MIN && mul <= (sint64_t)INT32_MAX ? 
> (sint32_t)mul : INT32_MAX + ((a ^ b) < 0);
> }
>
> uint32_t sat_udiv (uint32_t a, uint32_t b)
> {
>   return a / b; // never overflow
> }
>
> sint32_t sat_sdiv (sint32_t a, sint32_t b)
> {
>   return a == INT32_MIN && b == -1 ? INT32_MAX : a / b;
> }
>
> sint32_t sat_abs (sint32_t a)
> {
>   return a >= 0 ? a : (a == INT32_MIN ? INT32_MAX : -a);
> }
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, March 5, 2024 4:41 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
> kito.ch...@gmail.com; jeffreya...@gmail.com
> Subject: Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation 
> US_PLUS
>
> On Tue, Mar 5, 2024 at 8:09 AM Li, Pan2  wrote:
> >
> > Thanks Richard for comments.
> >
> > > I do wonder what the existing usadd patterns with integer vector modes
> > > in various targets do?
> > > Those define_insn will at least not end up in the optab set I guess,
> > > so they must end up
> > > being either unused or used by explicit gen_* (via intrinsic
> > > functions?) or by combine?
> >
> > For usadd with vector modes, I think the backend like RISC-V try to 
> > leverage instructions
> > like Vector Single-Width Saturating Add(aka vsaddu.vv/x/i).
> >
> > > I think simply changing gen_*_fixed_libfunc to gen_int_libfunc won't
> > > work.  Since there's
> > > no libgcc support I'd leave it as gen_*_fixed_libfunc thus no library
> > > fallback for integers?
> >
> > Change to gen_int_libfunc follows other int optabs. I am not sure if it 
> > will hit the standard name usaddm3 for vector mode.
> > But the happy path for scalar modes works up to a point, please help to 
> > correct me if any misunderstanding.
>
> gen_int_libfunc will no longer make it emit libcalls for fixed point
> modes, so this can't be correct
> and there's no libgcc implementation for integer mode saturating ops,
> so it's pointless to emit calls
> to them.
>
> > #0  riscv_expand_usadd (dest=0x76a8c7c8, x=0x76a8c798, 
> > y=0x76a8c7b0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:10662
> > #1  0x029f142a in gen_usaddsi3 (operand0=0x76a8c7c8, 
> > operand1=0x76a8c798, operand2=0x76a8c7b0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.md:3848
> > #2  0x01751e60 in insn_gen_fn::operator() > rtx_def*> (this=0x4910e70 ) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/recog.h:441
> > #3  

[committed] libstdc++: Use std::from_chars to speed up parsing subsecond durations

2024-03-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

With std::from_chars we can parse subsecond durations much faster than
with std::num_get, as shown in the microbenchmarks below. We were using
std::num_get and std::numpunct in order to parse a number with the
locale's decimal point character. But we copy the chars from the input
stream into a new buffer anyway, so we can replace the locale's decimal
point with '.' in that buffer, and then we can use std::from_chars on
it.

Benchmark                Time       CPU   Iterations
----------------------------------------------------
from_chars_millisec    158 ns    158 ns      4524046
num_get_millisec       192 ns    192 ns      3644626
from_chars_microsec    164 ns    163 ns      4330627
num_get_microsec       205 ns    205 ns      3413452
from_chars_nanosec     173 ns    173 ns      4072653
num_get_nanosec        227 ns    227 ns      3105161
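
The approach described above can be sketched outside the library as follows.
This is a simplified illustration, not the chrono_io.h code: parse_seconds is
a made-up helper, it works on a plain std::string rather than an input
stream, and it assumes a standard library that provides the floating-point
overloads of std::from_chars (libstdc++ does since GCC 11).

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Parse text such as "12,345" as a number of seconds, where decimal_point
// is the locale's decimal point character.  Returns nullopt on failure.
inline std::optional<long double>
parse_seconds(std::string text, char decimal_point)
{
  // Normalise the locale's decimal point to '.', which from_chars expects.
  // (Simplification: every occurrence is replaced, not just the first.)
  for (char& c : text)
    if (c == decimal_point)
      c = '.';

  long double value{};
  const char* first = text.data();
  const char* last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value,
                                   std::chars_format::fixed);
  // Reject both conversion errors and trailing unparsed characters.
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return value;
}
```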

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_Parser::operator()): Use
std::from_chars to parse fractional seconds.
---
 libstdc++-v3/include/bits/chrono_io.h | 28 +--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index b8f0657bee9..412e8b83fb7 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -37,6 +37,7 @@
 #include  // ostringstream
 #include  // setw, setfill
 #include 
+#include  // from_chars
 
 #include 
 
@@ -3597,13 +3598,17 @@ namespace __detail
__err |= ios_base::eofbit;
  else
{
- auto& __np = use_facet>(__loc);
- auto __dp = __np.decimal_point();
+ _CharT __dp = '.';
+ if (__loc != locale::classic())
+   {
+ auto& __np = use_facet>(__loc);
+ __dp = __np.decimal_point();
+   }
  _CharT __c = _Traits::to_char_type(__i);
  if (__c == __dp)
{
  (void) __is.get();
- __buf.put(__c);
+ __buf.put('.');
  int __prec
= hh_mm_ss<_Duration>::fractional_width;
  do
@@ -3618,14 +3623,17 @@ namespace __detail
}
}
 
- if (!__is_failed(__err))
+ if (!__is_failed(__err)) [[likely]]
{
- auto& __ng = use_facet>(__loc);
- long double __val;
- ios_base::iostate __err2{};
- __ng.get(__buf, {}, __buf, __err2, __val);
- if (__is_failed(__err2)) [[unlikely]]
-   __err |= __err2;
+ long double __val{};
+ string __str = std::move(__buf).str();
+ auto __first = __str.data();
+ auto __last = __first + __str.size();
+ using enum chars_format;
+ auto [ptr, ec] = std::from_chars(__first, __last,
+  __val, fixed);
+ if ((bool)ec || ptr != __last) [[unlikely]]
+   __err |= ios_base::failbit;
  else
{
  duration __fs(__val);
-- 
2.43.2



[PATCH v1] VECT: Bugfix ICE for vectorizable_store when both len and mask

2024-03-07 Thread pan2 . li
From: Pan Li 

This patch fixes an ICE in vectorizable_store that occurs when both loop_masks
and loop_lens are involved.  The ICE looks like below with "-march=rv64gcv -O3".

during GIMPLE pass: vect
test.c: In function ‘d’:
test.c:6:6: internal compiler error: in vectorizable_store, at
tree-vect-stmts.cc:8691
6 | void d() {
  |  ^
0x37a6f2f vectorizable_store
.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691
0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*,
_slp_tree*, _slp_instance*, vec*)
.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242
0x1db5dca vect_analyze_loop_operations
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208
0x1db885b vect_analyze_loop_2
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041
0x1dba029 vect_analyze_loop_1
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481
0x1dbabad vect_analyze_loop(loop*, vec_info_shared*)
.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639
0x1e389d1 try_vectorize_loop_1
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066
0x1e38f3d try_vectorize_loop
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182
0x1e39230 execute
.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298

The masks and the lens cannot both be enabled when a loop is using partial
vectors.  Thus, we need to ensure that one is disabled when we would like to
record the other in check_load_store_for_partial_vectors.  For example, when
we try to record the loop len, we need to check whether the loop mask is
disabled.

The following test suites pass with this patch:
* The x86 bootstrap tests.
* The x86 full regression tests.
* The aarch64 full regression tests.
* The riscv full regression tests.

PR target/114195

gcc/ChangeLog:

* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Add
loop mask/len check before recording as they are mutually exclusive.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr114195-1.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/base/pr114195-1.c| 15 +++
 gcc/tree-vect-stmts.cc| 26 ++-
 2 files changed, 35 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
new file mode 100644
index 000..b0c9d5b81b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c
@@ -0,0 +1,15 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
+
+long a, b;
+extern short c[];
+
+void d() {
+  for (int e = 0; e < 35; e += 2) {
+a = ({ a < 0 ? a : 0; });
+b = ({ b < 0 ? b : 0; });
+
+c[e] = 0;
+  }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 14a3ffb5f02..624947ed271 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1502,6 +1502,8 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
  gather_scatter_info *gs_info,
  tree scalar_mask)
 {
+  gcc_assert (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo));
+
   /* Invariant loads need no special support.  */
   if (memory_access_type == VMAT_INVARIANT)
 return;
@@ -1521,9 +1523,17 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
   internal_fn ifn
= (is_load ? vect_load_lanes_supported (vectype, group_size, true)
   : vect_store_lanes_supported (vectype, group_size, true));
-  if (ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES)
+
+  /* When the loop_vinfo using partial vector,  we cannot enable both
+the fully mask and length simultaneously.  Thus, make sure the
+other one is disabled when record one of them.
+The same as other place for both the vect_record_loop_len and
+vect_record_loop_mask.  */
+  if ((ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES)
+   && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
-  else if (ifn == IFN_MASK_LOAD_LANES || ifn == IFN_MASK_STORE_LANES)
+  else if ((ifn == IFN_MASK_LOAD_LANES || ifn == IFN_MASK_STORE_LANES)
+   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
   scalar_mask);
   else
@@ -1549,12 +1559,14 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
   if (internal_gather_scatter_fn_supported_p (len_ifn, vectype,
  gs_info->memory_type,
  gs_info->offset_vectype,
- 

[committed] libstdc++: Fix parsing of fractional seconds [PR114244]

2024-03-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

When converting a chrono::duration to a result type with an
integer representation we should use chrono::round<_Duration> so that we
don't truncate towards zero. Rounding ensures that e.g. 0.001999s
becomes 2ms not 1ms.

We can also remove some redundant uses of chrono::duration_cast to
convert from seconds to _Duration, because the _Parser class template
requires _Duration type to be able to represent seconds without loss of
precision.

This also fixes a bug where no fractional part would be parsed for
chrono::duration because its period is ratio<1>. We should
also consider treat_as_floating_point when deciding whether to skip
reading a fractional part.
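
The difference between truncating and rounding can be seen in a small
standalone sketch.  The helper names truncated_ms and rounded_ms are invented
for this example; std::chrono::duration_cast and std::chrono::round are the
standard facilities (the latter available since C++17).

```cpp
#include <chrono>

// Convert a fractional-seconds value to whole milliseconds by
// truncating towards zero, as duration_cast does.
inline long long truncated_ms(long double secs)
{
  std::chrono::duration<long double> fs(secs);
  return std::chrono::duration_cast<std::chrono::milliseconds>(fs).count();
}

// Same conversion, but rounding to the nearest millisecond.
inline long long rounded_ms(long double secs)
{
  std::chrono::duration<long double> fs(secs);
  return std::chrono::round<std::chrono::milliseconds>(fs).count();
}
```

For the value from the commit message, truncated_ms (0.001999L) gives 1 while
rounded_ms (0.001999L) gives 2.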

libstdc++-v3/ChangeLog:

PR libstdc++/114244
* include/bits/chrono_io.h (_Parser::operator()): Remove
redundant uses of duration_cast. Use chrono::round to convert
long double value to durations with integer representations.
Check representation type when deciding whether to skip parsing
fractional seconds.
* testsuite/20_util/duration/114244.cc: New test.
* testsuite/20_util/duration/io.cc: Check that a floating-point
duration with ratio<1> precision can be parsed.
---
 libstdc++-v3/include/bits/chrono_io.h | 18 ++
 .../testsuite/20_util/duration/114244.cc  | 36 +++
 libstdc++-v3/testsuite/20_util/duration/io.cc | 12 +++
 3 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/duration/114244.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 82f2d39ec44..b8f0657bee9 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -3113,6 +3113,9 @@ namespace __detail
  unsigned __num = 0; // Non-zero for N modifier.
  bool __is_flag = false; // True if we're processing a % flag.
 
+ constexpr bool __is_floating
+   = treat_as_floating_point_v;
+
  // If an out-of-range value is extracted (e.g. 61min for %M),
  // do not set failbit immediately because we might not need it
  // (e.g. parsing chrono::year doesn't care about invalid %M values).
@@ -3195,7 +3198,7 @@ namespace __detail
  __d = day(__tm.tm_mday);
  __h = hours(__tm.tm_hour);
  __min = minutes(__tm.tm_min);
- __s = duration_cast<_Duration>(seconds(__tm.tm_sec));
+ __s = seconds(__tm.tm_sec);
}
}
  __parts |= _ChronoParts::_DateTime;
@@ -3564,8 +3567,8 @@ namespace __detail
  if (!__is_failed(__err))
__s = seconds(__tm.tm_sec);
}
- else if constexpr (ratio_equal_v>)
+ else if constexpr (_Duration::period::den == 1
+  && !__is_floating)
{
  auto __val = __read_unsigned(__num ? __num : 2);
  if (0 <= __val && __val <= 59) [[likely]]
@@ -3577,7 +3580,7 @@ namespace __detail
  break;
}
}
- else
+ else // Read fractional seconds
{
  basic_stringstream<_CharT> __buf;
  auto __digit = _S_try_read_digit(__is, __err);
@@ -3626,7 +3629,10 @@ namespace __detail
  else
{
  duration __fs(__val);
- __s = duration_cast<_Duration>(__fs);
+ if constexpr (__is_floating)
+   __s = __fs;
+ else
+   __s = chrono::round<_Duration>(__fs);
}
}
}
@@ -3737,7 +3743,7 @@ namespace __detail
{
  __h = hours(__tm.tm_hour);
  __min = minutes(__tm.tm_min);
- __s = duration_cast<_Duration>(seconds(__tm.tm_sec));
+ __s = seconds(__tm.tm_sec);
}
}
  __parts |= _ChronoParts::_TimeOfDay;
diff --git a/libstdc++-v3/testsuite/20_util/duration/114244.cc 
b/libstdc++-v3/testsuite/20_util/duration/114244.cc
new file mode 100644
index 000..55a7670522a
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/duration/114244.cc
@@ -0,0 +1,36 @@
+// { dg-do run { target c++20 } }
+// { dg-timeout-factor 2 }
+// { dg-require-namedlocale "en_US.ISO8859-1" }
+
+// PR libstdc++/114244 Need to use round when parsing fractional seconds
+
+#include 
+#include 
+#include 
+
+void
+test_pr114244()
+{
+  using namespace 

Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-03-07 Thread Jason Merrill

On 1/29/24 17:42, Patrick Palka wrote:

On Mon, 29 Jan 2024, Patrick Palka wrote:


On Fri, 26 Jan 2024, Jason Merrill wrote:


On 1/26/24 17:11, Jason Merrill wrote:

On 1/26/24 16:52, Jason Merrill wrote:

On 1/25/24 14:18, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?  This isn't a very satisfactory fix, but at least
it safely fixes these testcases I guess.  Note that there's
implementation disagreement about the second testcase, GCC always
accepted it but Clang/MSVC/icc reject it.


Because of trying to initialize int& from {c}; removing the extra braces
makes it work everywhere.

https://eel.is/c++draft/dcl.init#list-3.10 says that we always generate a
prvalue in this case, so perhaps we shouldn't recalculate if the
initializer is an init-list?


...but it seems bad to silently bind a const int& to a prvalue instead of
directly to the reference returned by the operator, as clang does if we add
const to the second testcase, so I think there's a defect in the standard
here.


Perhaps bullet 3.9 should change to "...its referenced type is
reference-related to E or scalar, ..."


Maybe for now also disable the maybe_valid heuristics in the case of an
init-list?


The first testcase is special because it's a C-style cast; seems like the
maybe_valid = false heuristics should be disabled if c_cast_p.


Thanks a lot for the pointers.  IIUC c_cast_p and LOOKUP_SHORTCUT_BAD_CONVS
should already be mutually exclusive, since the latter is set only when
computing argument conversions, so it shouldn't be necessary to check c_cast_p.

I suppose we could disable the heuristic for init-lists, but after some
digging I noticed that the heuristics were originally in the same spot they
are now until r5-601-gd02f620dc0bb3b moved them to get checked after
the recursive recalculation case in reference_binding, returning a bad
conversion instead of NULL.  (Then in r13-1755-g68f37670eff0b872 I moved
them back; IIRC that's why I felt confident that moving the checks was safe.)
Thus we didn't always accept the second testcase, we only started doing so in
GCC 5: https://godbolt.org/z/6nsEW14fh (sorry for missing this and saying we
always accepted it)

And indeed the current order of checks seems consistent with that of
[dcl.init.ref]/5.  So I wonder if we don't instead want to "complete"
the NULL-to-bad-conversion adjustment in r5-601-gd02f620dc0bb3b and
do:

gcc/cp/ChangeLog:

* call.cc (reference_binding): Set bad_p according to
maybe_valid_p in the recursive case as well.  Remove
redundant gcc_assert.

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 9de0d77c423..c4158b2af37 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2033,8 +2033,8 @@ reference_binding (tree rto, tree rfrom, tree expr, bool 
c_cast_p, int flags,
   sflags, complain);
if (!new_second)
  return bad_direct_conv ? bad_direct_conv : nullptr;
+   t->bad_p = !maybe_valid_p;


Oops, that should be |= not =.


Perhaps bullet 3.9 should change to "...its referenced type is
reference-related to E or scalar, ..."

conv = merge_conversion_sequences (t, new_second);
-   gcc_assert (maybe_valid_p || conv->bad_p);
return conv;
  }
  }

This'd mean we'd go back to rejecting the second testcase (only the
call, not the direct-init, interestingly enough), but that seems to be


In the second testcase, with the above fix initialize_reference silently
returns error_mark_node for the direct-init without issuing a
diagnostic, because in the error path convert_like doesn't find anything
wrong with the bad conversion.  So more changes need to be made if we
want to set bad_p in the recursive case of reference_binding it seems;
dunno if that's the path we want to go down?

On the other hand, disabling the badness checks in certain cases seems
to be undesirable as well, since AFAICT their current position is
consistent with [dcl.init.ref]/5?

So I wonder if we should just go with the safest thing at this stage,
which would be the original patch that removes the problematic assert?


I still think the assert is correct, and the problem is that 
maybe_valid_p is wrong; these cases turn out to be valid, so 
maybe_valid_p should be true.
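
For readers without the PR's testcases at hand, here is a minimal sketch of
the uncontroversial, brace-free form mentioned earlier in the thread. The
class C and the function name are hypothetical and are not taken from the
testcases; the sketch only assumes a class with a conversion function that
returns an lvalue reference.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical class with a conversion function returning an lvalue
// reference to its own member.
struct C {
    int value = 0;
    operator int&() { return value; }
};

// Without braces, the reference binds directly to the lvalue returned by
// the conversion function.
inline int& bind_without_braces(C& c)
{
    int& r = c;
    return r;
}

// The braced form discussed in the thread would be `int& r{c};`.  It is
// left out of this sketch because implementations disagree about it.

} // namespace sketch
```

Writing through the returned reference changes c.value, which shows that no
temporary is involved in the brace-free case.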


Jason



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:29 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
> >  wrote:
> > > > but can be something else, such as the above noted
> > > >
> > > >  (unspec:DI [
> > > >  (reg:CC 17 flags)
> > > >  ] UNSPEC_PUSHFL)
> > >
> > > But that is invalid RTL?  The only valid use of a CC is written as
> > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > stored in that cc reg".
> > >
> > > (And you can copy a CC reg around, but that is not a use ;-) )
> >
> > How can we describe a pushfl then?
>
> (unspec:DI [
> (compare:CC (reg:CC 17 flags) (const_int 0))
> ] UNSPEC_PUSHFL)
>
> or something like that?
>
> > It was changed to its current form
> > in [1], but I suspect that the code will try to update it even when
> > pushfl is implemented as a direct move from a register (as was defined
> > previously).
> >
> > BTW: Unspecs are used in a couple of other places for other targets [2].
> >
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html
>
> There is nothing wrong with unspecs.  You cannot use a CCmode value
> without comparing it to 0, though.  The exact kind of comparison
> determines what bits are valid (and have what meaning) in your CC reg,
> even!

The pushfl can be considered as a transparent move, separate bits have
no meaning there. I don't see why using unspec should be any different
than using "naked" register (please also see below, current source
code may update "naked" reg as well). What constitutes use is "(cmp:CC
(CC reg) (const_int 0))" around the register and I think that without
this RTX around the CC reg its use should not be updated in any way.

> > > > The source code that deals with the *user* of the CC register assumes
> > > > the former form, so it blindly tries to update the mode of the CC
> > > > register inside LT comparison RTX
> > >
> > > Would you like it better if there was an assert for this?  There are
> > > very many RTL requirements that aren't checked for currently :-/
> >
> > In this case - yes. Assert signals that something is unsupported (or
> > invalid), way better than silently corrupting some memory, reporting
> > the corruption only with checking enabled.
>
> Yeah.  The way RTL checking works makes this hard to do in most cases.
> Hrm.  (It cannot easily look at context, only inside of the current RTX).
>
> > > The unspec should have the CC compared with 0 as argument.
> >
> > But this does not do what pushfl does... It pushes the register to the 
> > stack.
>
> Can't you just describe the dataflow then, without an unspec?  An unspec
> by definition does some (unspecified) operation on the data.

Previously, it was defined as:

 (define_insn "*pushfl2"
   [(set (match_operand:W 0 "push_operand" "=<")
 (match_operand:W 1 "flags_reg_operand"))]

But Wmode, AKA SI/DImode, is not CCmode. And as said in my last
message, nothing prevents the current source code from trying to update the CC
register here.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote:
> On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
>  wrote:
> > > but can be something else, such as the above noted
> > >
> > >  (unspec:DI [
> > >  (reg:CC 17 flags)
> > >  ] UNSPEC_PUSHFL)
> >
> > But that is invalid RTL?  The only valid use of a CC is written as
> > cc-compared-to-0.  It means "the result of something compared to 0,
> > stored in that cc reg".
> >
> > (And you can copy a CC reg around, but that is not a use ;-) )
> 
> How can we describe a pushfl then?

(unspec:DI [
(compare:CC (reg:CC 17 flags) (const_int 0))
] UNSPEC_PUSHFL)

or something like that?

> It was changed to its current form
> in [1], but I suspect that the code will try to update it even when
> pushfl is implemented as a direct move from a register (as was defined
> previously).
> 
> BTW: Unspecs are used in a couple of other places for other targets [2].
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html

There is nothing wrong with unspecs.  You cannot use a CCmode value
without comparing it to 0, though.  The exact kind of comparison
determines what bits are valid (and have what meaning) in your CC reg,
even!

> > > The source code that deals with the *user* of the CC register assumes
> > > the former form, so it blindly tries to update the mode of the CC
> > > register inside LT comparison RTX
> >
> > Would you like it better if there was an assert for this?  There are
> > very many RTL requirements that aren't checked for currently :-/
> 
> In this case - yes. Assert signals that something is unsupported (or
> invalid), way better than silently corrupting some memory, reporting
> the corruption only with checking enabled.

Yeah.  The way RTL checking works makes this hard to do in most cases.
Hrm.  (It cannot easily look at context, only inside of the current RTX).

> > The unspec should have the CC compared with 0 as argument.
> 
> But this does not do what pushfl does... It pushes the register to the stack.

Can't you just describe the dataflow then, without an unspec?  An unspec
by definition does some (unspecified) operation on the data.


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
>
> On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
>  wrote:
> >
> > On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:
> >
> > [snip]
> >
> > > The part we want to fix deals with the *user* of the CC register. It
> > > is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> > > in the form of
> > >
> > > (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> > >
> > > but can be something else, such as the above noted
> > >
> > >  (unspec:DI [
> > >  (reg:CC 17 flags)
> > >  ] UNSPEC_PUSHFL)
> >
> > But that is invalid RTL?  The only valid use of a CC is written as
> > cc-compared-to-0.  It means "the result of something compared to 0,
> > stored in that cc reg".
> >
> > (And you can copy a CC reg around, but that is not a use ;-) )

Hm... under this premise, we can also say that any form that is not
cc-compared-to-0 is not a use. Consequently, if it is not a use, then
the CC reg should not be updated at its use location, so my v1 patch,
where we simply skip the update (but retain the combined insn),
actually makes sense.

In this concrete situation, we don't care about CC register mode in
the PUSHFL insn. And we should not change CC reg mode of the use,
because any other mode than the generic CCmode won't be recognized by
the insn pattern.
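
To make the "skip the update" idea concrete, here is a toy sketch. It does
not use GCC's rtx type or any combine.cc API: Node, is_cc_compared_to_zero
and update_cc_use_mode are invented for illustration, and the mode strings
are just labels. The point is only that the mode of the CC register is
changed when the use has the cc-compared-to-0 shape and left alone
otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Toy stand-in for an RTL expression (not GCC's rtx).
struct Node {
    std::string code;            // e.g. "lt", "unspec", "reg", "const_int"
    std::string mode;            // e.g. "CC", "CCGC", "DI"
    long value;                  // only meaningful for "const_int"
    std::vector<Node> operands;
};

// True only for the shape (cmp (reg ...) (const_int 0)).
inline bool is_cc_compared_to_zero(const Node& use)
{
    static const char* const comparisons[] = {"eq", "ne", "ge", "gt", "le", "lt"};
    bool is_comparison = false;
    for (const char* c : comparisons)
        if (use.code == c)
            is_comparison = true;
    return is_comparison
        && use.operands.size() == 2
        && use.operands[0].code == "reg"
        && use.operands[1].code == "const_int"
        && use.operands[1].value == 0;
}

// Change the mode of the CC register at its use, but only for a
// comparison against zero.  Returns whether anything was updated.
inline bool update_cc_use_mode(Node& use, const std::string& new_mode)
{
    if (!is_cc_compared_to_zero(use))
        return false;            // e.g. an unspec: leave the use untouched
    use.operands[0].mode = new_mode;
    return true;
}

} // namespace sketch
```

In this model a PUSHFL-like unspec keeps its generic CC mode, while an LT
comparison against zero gets the new mode.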

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:
>
> [snip]
>
> > The part we want to fix deals with the *user* of the CC register. It
> > is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> > in the form of
> >
> > (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> >
> > but can be something else, such as the above noted
> >
> >  (unspec:DI [
> >  (reg:CC 17 flags)
> >  ] UNSPEC_PUSHFL)
>
> But that is invalid RTL?  The only valid use of a CC is written as
> cc-compared-to-0.  It means "the result of something compared to 0,
> stored in that cc reg".
>
> (And you can copy a CC reg around, but that is not a use ;-) )

How can we describe a pushfl then? It was changed to its current form
in [1], but I suspect that the code will try to update it even when
pushfl is implemented as a direct move from a register (as was defined
previously).

BTW: Unspecs are used in a couple of other places for other targets [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html

>
> > The source code that deals with the *user* of the CC register assumes
> > the former form, so it blindly tries to update the mode of the CC
> > register inside LT comparison RTX
>
> Would you like it better if there was an assert for this?  There are
> very many RTL requirements that aren't checked for currently :-/

In this case - yes. Assert signals that something is unsupported (or
invalid), way better than silently corrupting some memory, reporting
the corruption only with checking enabled.

>
> > (some other nearby source code even
> > checks for (const_int 0) RTX). Obviously, this is not the case with
> > the former form, where the update tries to:
> >
> > SUBST (XEXP (*cc_use_loc, 0), ...)
> >
> > on unspec, which has no XEXP (..., 0).
> >
> > And *this* is what triggers RTX checking assert.
>
> The unspec should have the CC compared with 0 as argument.

But this does not do what pushfl does... It pushes the register to the stack.

Uros.


Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-07 Thread Segher Boessenkool
On Fri, Mar 08, 2024 at 03:01:04AM +0530, Ajit Agarwal wrote:
>  
> >> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.
> > 
> > What in here is from 2020?
> > 
> > Most things will be from 2024, too.  First publication date is what
> > counts.
> 
> Please let me know the second publication date.

Huh?  This code won't be published before 2024 (and it does not derive
from older code), so the only valid date in the copyright message is
2024.

> >> +  bool pair_check_register_operand (bool load_p, rtx reg_op,
> >> +  machine_mode mem_mode)
> >> +  {
> >> +if (load_p || reg_op || mem_mode)
> >> +  return false;
> >> +else
> >> +  return false;
> >> +  }
> > 
> > The compiler will have warned for this.  Please look at all compiler
> > (and other) warnings that you introduce.
> >
> 
> As far as my understanding I didn't see any extra warnings, 
> but I will surely cross check and solve that.

Hrm, apparently there is no -Wall -W warning for this?  But your code is
essentially

bool pair_check_register_operand (bool, rtx, machine_mode)
{
  return false;
}

> >> +// alias_walker that iterates over stores.
> >> +template
> >> +class store_walker : public def_walker
> > 
> > That is not a good comment.  You should describe parameters and return
> > values and that kind of thing.  That it walks over things is bloody
> > obvious from the name already :-)
> >
> 
> This part of code is taken from aarch64 load store fusion
> pass.  I have made the aarch64-ldp-fusion.cc into target independent code and 
> target dependent code. Target independent code is shared
> across all the architecture, In this case its rs6000 and aarch64.
> Target dependent code is implemented through pure virtual functions.

It is required to describe what a function is for, and all its arguments
and return values.  If the aarch64 code doesn't, it should be fixed.

Not only reviewers need this, anyone trying to use the code needs it,
too.

> >> +find_trailing_add (insn_info *insns[2],
> >> + const insn_range_info _range,
> >> + int initial_writeback,
> >> + rtx *writeback_effect,
> >> + def_info **add_def,
> >> + def_info *base_def,
> >> + poly_int64 initial_offset,
> >> + unsigned access_size);
> > 
> > That is way, way, way too many parameters.
> > 
> 
> This code I have taken from aarch64-ldp-fusion.cc.
> I have not changed anything here.

Don't copy not-so-good stuff unmodified?  It is unreviewable, to start
with, but probably not very usable later either.

> Could you please elaborate on how you want me
> to structure the patches.

*You* should know the code already, so you surely can figure out a nice
way to present it, so that it takes me LESS work to review this than it
took you to write it?

Making a patch series reviewable is part of the development effort.  It
is way less work if you start with this as the goal in mind.  It is less
work than writing (and debugging etc.) an omnibus patch, in the first
place!

Your goal is to make a patch series that will be reviewed and then seen
to be great stuff.  So it of course needs to *be* great stuff, but it
also needs to be presented in such a way that that is obvious.


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:

[snip]

> The part we want to fix deals with the *user* of the CC register. It
> is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> in the form of
> 
> (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> 
> but can be something else, such as the above noted
> 
>  (unspec:DI [
>  (reg:CC 17 flags)
>  ] UNSPEC_PUSHFL)

But that is invalid RTL?  The only valid use of a CC is written as
cc-compared-to-0.  It means "the result of something compared to 0,
stored in that cc reg".

(And you can copy a CC reg around, but that is not a use ;-) )

> The source code that deals with the *user* of the CC register assumes
> the former form, so it blindly tries to update the mode of the CC
> register inside LT comparison RTX

Would you like it better if there was an assert for this?  There are
very many RTL requirements that aren't checked for currently :-/

> (some other nearby source code even
> checks for (const_int 0) RTX). Obviously, this is not the case with
> the former form, where the update tries to:
> 
> SUBST (XEXP (*cc_use_loc, 0), ...)
> 
> on unspec, which has no XEXP (..., 0).
> 
> And *this* is what triggers RTX checking assert.

The unspec should have the CC compared with 0 as argument.


Segher


Re: [PATCH v14 00/26] Optimize more type traits

2024-03-07 Thread Ken Matsui
On Wed, Feb 28, 2024 at 11:32 AM Ken Matsui  wrote:
>
> Hi,
>
> This patch series implements __is_const, __is_volatile, __is_pointer,
> and __is_unbounded_array built-in traits, which were isolated from my
> previous patch series "Optimize type traits compilation performance"
> because they contained performance regressions.  I confirmed that this
> patch series does not cause any performance regression.  The main reasons
> for the performance regressions were the exhaustiveness of the benchmarks
> and the instability of the benchmark results.  Also, this patch series
> includes built-ins for add_pointer, remove_extent, remove_all_extents,
> add_lvalue_reference, add_rvalue_reference, decay, rank, is_invocable,
> and is_nothrow_invocable.  Here are the benchmark results:

Ping.  Ok for trunk or maybe for 15?
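
For reference, a small sketch of what some of the affected standard traits
compute, assuming C++17. Only the std:: traits are used for the checks; the
preprocessor probe for a __is_const built-in is an assumption about how one
might detect it, and has_is_const_builtin and dims are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

static_assert(std::is_const_v<const int>);
static_assert(!std::is_const_v<const int*>);   // the pointer itself is not const
static_assert(std::is_volatile_v<volatile int>);
static_assert(std::is_pointer_v<int*>);
static_assert(!std::is_pointer_v<int[3]>);
static_assert(std::is_same_v<std::add_pointer_t<int&>, int*>);
static_assert(std::is_same_v<std::remove_extent_t<int[2][3]>, int[3]>);
static_assert(std::is_same_v<std::remove_all_extents_t<int[2][3]>, int>);
static_assert(std::is_same_v<std::decay_t<const int[3]>, const int*>);
static_assert(std::is_invocable_v<int (*)(int), int>);

// Number of array dimensions of int[2][3].
constexpr std::size_t dims = std::rank_v<int[2][3]>;

// Probe for a compiler built-in; the result depends on the compiler and
// its version.
#if defined(__has_builtin)
#  if __has_builtin(__is_const)
constexpr bool has_is_const_builtin = true;
#  else
constexpr bool has_is_const_builtin = false;
#  endif
#else
constexpr bool has_is_const_builtin = false;
#endif

} // namespace sketch
```

The library traits give the same answers whether or not the built-ins are
available; the series only changes how cheaply the compiler computes them.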

>
> is_const: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_const.md#sat-dec-23-090605-am-pst-2023
> time: -4.36603%, peak memory: -0.300891%, total memory: -0.247934%
>
> is_const_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_const_v.md#sat-jun-24-044815-am-pdt-2023
> time: -2.86467%, peak memory: -1.0654%, total memory: -1.62369%
>
> is_volatile: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_volatile.md#sun-oct-22-091644-pm-pdt-2023
> time: -5.25164%, peak memory: -0.337971%, total memory: -0.247934%
>
> is_volatile_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_volatile_v.md#sat-dec-23-091518-am-pst-2023
> time: -4.06816%, peak memory: -0.609298%, total memory: -0.659134%
>
> is_pointer: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_pointer.md#sat-dec-23-124903-pm-pst-2023
> time: -2.47124%, peak memory: -2.98207%, total memory: -4.0811%
>
> is_pointer_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_pointer_v.md#sun-oct-22-122257-am-pdt-2023
> time: -4.71336%, peak memory: -2.25026%, total memory: -3.125%
>
> is_unbounded_array: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_unbounded_array.md#sun-oct-22-091644-pm-pdt-2023
> time: -6.33287%, peak memory: -0.602494%, total memory: -1.56035%
>
> is_unbounded_array_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_unbounded_array_v.md#sat-dec-23-010046-pm-pst-2023
> time: -1.50025%, peak memory: -1.07386%, total memory: -2.32394%
>
> add_pointer_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/add_pointer_t.md#wed-feb-28-060044-am-pst-2024
> time: -21.6673%, peak memory: -14.%, total memory: -17.4716%
>
> remove_extent_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/remove_extent_t.md#wed-feb-28-063021-am-pst-2024
> time: -14.4089%, peak memory: -2.71836%, total memory: -9.87013%
>
> remove_all_extents_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/remove_all_extents_t.md#wed-feb-28-064716-am-pst-2024
> time: -28.8941%, peak memory: -16.6981%, total memory: -23.6088%
>
> add_lvalue_reference_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/add_lvalue_reference_t.md#wed-feb-28-070023-am-pst-2024
> time: -33.8827%, peak memory: -24.9292%, total memory: -25.3043%
>
> add_rvalue_reference_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/add_rvalue_reference_t.md#wed-feb-28-070701-am-pst-2024
> time: -23.9186%, peak memory: -17.1311%, total memory: -19.5891%
>
> decay_t: 
> https://github.com/ken-matsui/gcc-bench/blob/main/decay_t.md#wed-feb-28-072330-am-pst-2024
> time: -42.4076%, peak memory: -29.2077%, total memory: -33.0914%
>
> rank: 
> https://github.com/ken-matsui/gcc-bench/blob/main/rank.md#wed-feb-28-074917-am-pst-2024
> time: -33.7312%, peak memory: -27.5885%, total memory: -34.5736%
>
> rank_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/rank_v.md#wed-feb-28-073632-am-pst-2024
> time: -40.7174%, peak memory: -16.4653%, total memory: -23.0131%
>
> is_invocable_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_invocable.md#wed-feb-28-111001-am-pst-2024
> time: -58.8307%, peak memory: -59.4966%, total memory: -59.8871%
> (This benchmark is not exhaustive as my laptop crashed with larger benchmarks)
>
> is_nothrow_invocable_v: 
> https://github.com/ken-matsui/gcc-bench/blob/main/is_nothrow_invocable.md#wed-feb-28-112414-am-pst-2024
> time: -70.4102%, peak memory: -62.5516%, total memory: -65.5853%
> (This benchmark is not exhaustive as my laptop crashed with larger benchmarks)
>
> Sincerely,
> Ken Matsui
>
> Ken Matsui (26):
>   c++: Implement __is_const built-in trait
>   libstdc++: Optimize std::is_const compilation performance
>   c++: Implement __is_volatile built-in trait
>   libstdc++: Optimize std::is_volatile compilation performance
>   c++: Implement __is_pointer built-in trait
>   libstdc++: Optimize std::is_pointer compilation performance
>   c++: Implement __is_unbounded_array built-in trait
>   libstdc++: Optimize std::is_unbounded_array compilation performance
>   c++: Implement __add_pointer built-in trait
>   libstdc++: Optimize std::add_pointer compilation performance
>   c++: Implement 

Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-07 Thread Ajit Agarwal
Hello Segher:

On 01/03/24 3:02 am, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 19, 2024 at 04:24:37PM +0530, Ajit Agarwal wrote:
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -518,7 +518,7 @@ or1k*-*-*)
>>  ;;
>>  powerpc*-*-*)
>>  cpu_type=rs6000
>> -extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
>> +extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
>> rs6000-vecload-fusion.o"
> 
> Line too long.

I will incorporate this change.
> 
>> +  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
>> + instruction.  */
>> +  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload);
> 
> That is not such a great name.  Any pass name with "analyze" is not so
> good -- the pass does much more than just "analyze" things!
> 

I will change that and incorporate that.

>> --- /dev/null
>> +++ b/gcc/config/rs6000/rs6000-vecload-fusion.cc
>> @@ -0,0 +1,701 @@
>> +/* Subroutines used to replace lxv with lxvp
>> +   for TARGET_POWER10 and TARGET_VSX,
> 
> The pass filename is not good then, either.
>

I will change and incorporate it.
 
>> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.
> 
> What in here is from 2020?
> 
> Most things will be from 2024, too.  First publication date is what
> counts.

Please let me know the second publication date.

> 
>> +   Contributed by Ajit Kumar Agarwal .
> 
> We don't say such things in the files normally.
>

Yes I will remove it.
 
>> +class rs6000_pair_fusion : public pair_fusion
>> +{
>> +public:
>> +  rs6000_pair_fusion (bb_info *bb) : pair_fusion (bb) {reg_ops = NULL;};
>> +  bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p);
>> +  bool pair_mem_ok_policy (rtx first_mem, bool load_p, machine_mode mode)
>> +  {
>> +return !(first_mem || load_p || mode);
>> +  }
> 
> It is much more natural to write this as
>   return !first_mem && !load && !mode;
> 
> (_p is wrong, this is not a predicate, it is not a function at all!)
> 

Surely I will do that.

> What is "!mode" for here?  How can VOIDmode happen here?  What does it
> mean?  This needs to be documented.
>
Yes I will document that.

 
>> +  bool pair_check_register_operand (bool load_p, rtx reg_op,
>> +machine_mode mem_mode)
>> +  {
>> +if (load_p || reg_op || mem_mode)
>> +  return false;
>> +else
>> +  return false;
>> +  }
> 
> The compiler will have warned for this.  Please look at all compiler
> (and other) warnings that you introduce.
>

As far as I understand, I didn't see any extra warnings, 
but I will certainly cross-check and fix that.
 
>> +rs6000_pair_fusion::is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool 
>> load_p)
>> +{
>> +  return !((reg_op && mem_mode) || load_p);
>> +}
> 
> For more complex logic, split it up into two or more conditional
> returns.
> 

Surely I will do that.
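
A minimal sketch of the suggested restructuring, with plain bools standing
in for the rtx and machine_mode arguments (an assumption made so the example
is self-contained; the function names are hypothetical). It checks
exhaustively that the split version agrees with the single expression.

```cpp
#include <cassert>

namespace sketch {

// The single-expression form from the patch under review.
constexpr bool single_expression(bool reg_op, bool mem_mode, bool load_p)
{
    return !((reg_op && mem_mode) || load_p);
}

// The same logic split into conditional returns.
constexpr bool split_returns(bool reg_op, bool mem_mode, bool load_p)
{
    if (load_p)
        return false;
    if (reg_op && mem_mode)
        return false;
    return true;
}

// Compare both forms over all eight input combinations.
constexpr bool forms_agree()
{
    for (int bits = 0; bits < 8; ++bits) {
        const bool reg_op = (bits & 1) != 0;
        const bool mem_mode = (bits & 2) != 0;
        const bool load_p = (bits & 4) != 0;
        if (single_expression(reg_op, mem_mode, load_p)
            != split_returns(reg_op, mem_mode, load_p))
            return false;
    }
    return true;
}

static_assert(forms_agree(), "both forms must compute the same result");

} // namespace sketch
```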

>> +// alias_walker that iterates over stores.
>> +template
>> +class store_walker : public def_walker
> 
> That is not a good comment.  You should describe parameters and return
> values and that kind of thing.  That it walks over things is bloody
> obvious from the name already :-)
>

This part of the code is taken from the aarch64 load/store fusion
pass.  I have split aarch64-ldp-fusion.cc into target-independent code and 
target-dependent code.  The target-independent code is shared
across all architectures, in this case rs6000 and aarch64.
The target-dependent code is implemented through pure virtual functions.

While doing this, I have not changed the target-independent code;
it is taken as-is from aarch64-ldp-fusion.cc.

This is how the comments were written there.

The target-dependent code lives in rs6000-vecload-fusion.cc and 
aarch64-ldp-fusion.cc.

The target-independent code is spread over 3 files.

gcc/pair-fusion-base.h

This file has the declaration of the pair_fusion base class and 
other class declarations, along with prototypes of the common
functions in gcc/pair-fusion-common.cc.

gcc/pair-fusion-common.cc

Here we have the common helper functions that are shared across all 
architectures and used inside the pair_fusion class member
functions.
 
gcc/pair-fusion.cc

This has the implementation of the member functions of the pair_fusion
class.

The architecture-dependent files are 
rs6000-vecload-fusion.cc and aarch64-ldp-fusion.cc.  They have the
implementation of the classes derived from pair_fusion,
with target-specific code added.
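
The split described above can be sketched roughly as follows. This is not
the code from the patch: the class names, the hook signatures and the
policies inside them are all invented for illustration. Only the general
shape (a shared base class with pure virtual target hooks, plus one derived
class per target) follows the description.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Target-independent part: shared driver logic plus pure virtual hooks.
class pair_fusion_base {
public:
    virtual ~pair_fusion_base() = default;

    // Target hook: may two adjacent accesses of this size (in bytes) be fused?
    virtual bool pair_ok(unsigned access_size, bool load_p) const = 0;

    // Target hook: name of the fused instruction (illustration only).
    virtual std::string paired_name(bool load_p) const = 0;

    // Shared logic: returns the fused instruction name, or "" if not fused.
    std::string try_fuse(unsigned access_size, bool load_p) const
    {
        if (!pair_ok(access_size, load_p))
            return "";
        return paired_name(load_p);
    }
};

// Target-dependent part for rs6000 (hypothetical policy: 16-byte loads only).
class rs6000_sketch : public pair_fusion_base {
public:
    bool pair_ok(unsigned access_size, bool load_p) const override
    {
        return load_p && access_size == 16;
    }
    std::string paired_name(bool) const override { return "lxvp"; }
};

// Target-dependent part for aarch64 (hypothetical policy: 4, 8 or 16 bytes).
class aarch64_sketch : public pair_fusion_base {
public:
    bool pair_ok(unsigned access_size, bool) const override
    {
        return access_size == 4 || access_size == 8 || access_size == 16;
    }
    std::string paired_name(bool load_p) const override
    {
        return load_p ? "ldp" : "stp";
    }
};

} // namespace sketch
```

The shared try_fuse never needs to know which target it runs for; each
target only overrides the hooks.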

>> +extern insn_info *
>> +find_trailing_add (insn_info *insns[2],
>> +   const insn_range_info _range,
>> +   int initial_writeback,
>> +   rtx *writeback_effect,
>> +   def_info **add_def,
>> +   def_info *base_def,
>> +   poly_int64 initial_offset,
>> +   unsigned access_size);
> 
> That is way, way, way too many parameters.
> 

This code I have taken from aarch64-ldp-fusion.cc.
I have not changed anything here.

> So:
> 
> * Better names please.
> * Better documentation, 

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:04 PM Uros Bizjak  wrote:

> The source code that deals with the *user* of the CC register assumes
> the former form, so it blindly tries to update the mode of the CC
> register inside LT comparison RTX (some other nearby source code even
> checks for (const_int 0) RTX). Obviously, this is not the case with
> the former form, where the update tries to:

Please read the above as:

... Obviously, this won't work with the former form, ...

Uros.


Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-03-07 Thread Patrick Palka
On Mon, 29 Jan 2024, Patrick Palka wrote:

> On Mon, 29 Jan 2024, Patrick Palka wrote:
> 
> > On Fri, 26 Jan 2024, Jason Merrill wrote:
> > 
> > > On 1/26/24 17:11, Jason Merrill wrote:
> > > > On 1/26/24 16:52, Jason Merrill wrote:
> > > > > On 1/25/24 14:18, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > > OK for trunk/13?  This isn't a very satisfactory fix, but at least
> > > > > > it safely fixes these testcases I guess.  Note that there's
> > > > > > implementation disagreement about the second testcase, GCC always
> > > > > > accepted it but Clang/MSVC/icc reject it.
> > > > > 
> > > > > Because of trying to initialize int& from {c}; removing the extra 
> > > > > braces
> > > > > > makes it work everywhere.
> > > > > 
> > > > > https://eel.is/c++draft/dcl.init#list-3.10 says that we always 
> > > > > generate a
> > > > > prvalue in this case, so perhaps we shouldn't recalculate if the
> > > > > initializer is an init-list?
> > > > 
> > > > ...but it seems bad to silently bind a const int& to a prvalue instead 
> > > > of
> > > > directly to the reference returned by the operator, as clang does if we 
> > > > add
> > > > const to the second testcase, so I think there's a defect in the 
> > > > standard
> > > > here.
> > > 
> > > Perhaps bullet 3.9 should change to "...its referenced type is
> > > reference-related to E or scalar, ..."
> > > 
> > > > Maybe for now also disable the maybe_valid heuristics in the case of an
> > > > init-list?
> > > > 
> > > > > The first testcase is special because it's a C-style cast; seems like 
> > > > > the
> > > > > maybe_valid = false heuristics should be disabled if c_cast_p.
> > 
> > Thanks a lot for the pointers.  IIUC c_cast_p and LOOKUP_SHORTCUT_BAD_CONVS
> > should already be mutually exclusive, since the latter is set only when
> > computing argument conversions, so it shouldn't be necessary to check 
> > c_cast_p.
> > 
> > I suppose we could disable the heuristic for init-lists, but after some
> > digging I noticed that the heuristics were originally in the same spot they
> > are now until r5-601-gd02f620dc0bb3b moved them to get checked after
> > the recursive recalculation case in reference_binding, returning a bad
> > conversion instead of NULL.  (Then in r13-1755-g68f37670eff0b872 I moved
> > them back; IIRC that's why I felt confident that moving the checks was 
> > safe.)
> > Thus we didn't always accept the second testcase, we only started doing so 
> > in
> > GCC 5: https://godbolt.org/z/6nsEW14fh (sorry for missing this and saying we
> > always accepted it)
> > 
> > And indeed the current order of checks seems consistent with that of
> > [dcl.init.ref]/5.  So I wonder if we don't instead want to "complete"
> > the NULL-to-bad-conversion adjustment in r5-601-gd02f620dc0bb3b and
> > do:
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (reference_binding): Set bad_p according to
> > maybe_valid_p in the recursive case as well.  Remove
> > redundant gcc_assert.
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index 9de0d77c423..c4158b2af37 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -2033,8 +2033,8 @@ reference_binding (tree rto, tree rfrom, tree expr, 
> > bool c_cast_p, int flags,
> >sflags, complain);
> > if (!new_second)
> >   return bad_direct_conv ? bad_direct_conv : nullptr;
> > +   t->bad_p = !maybe_valid_p;
> 
> Oops, that should be |= not =.
> 
> > conv = merge_conversion_sequences (t, new_second);
> > -   gcc_assert (maybe_valid_p || conv->bad_p);
> > return conv;
> >   }
> >  }
> > 
> > This'd mean we'd go back to rejecting the second testcase (only the
> > call, not the direct-init, interestingly enough), but that seems to be
> 
> In the second testcase, with the above fix initialize_reference silently
> returns error_mark_node for the direct-init without issuing a
> diagnostic, because in the error path convert_like doesn't find anything
> wrong with the bad conversion.  So more changes need to be made if we
> want to set bad_p in the recursive case of reference_binding it seems;
> dunno if that's the path we want to go down?
> 
> On the other hand, disabling the badness checks in certain cases seems
> to be undesirable as well, since AFAICT their current position is
> consistent with [dcl.init.ref]/5?
> 
> So I wonder if we should just go with the safest thing at this stage,
> which would be the original patch that removes the problematic assert?

Ping.

> 
> > the correct behavior anyway IIUC.  The testsuite is otherwise happy
> > with this change.
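
For readers following the thread, here is a minimal sketch of the kind of reference binding under discussion. It assumes a hypothetical class `C` with a conversion function to `int&`; it is not one of the PR113141 testcases, and all names in it are made up for illustration. Without the extra braces the reference binds directly to the object returned by the conversion function, which is uncontroversial. The braced form is left as a comment because, per the thread, implementations disagree on it.

```cpp
// Hypothetical illustration only; C and bump_through_reference are
// made-up names, not taken from the PR113141 testcases.
struct C
{
  int value = 0;
  operator int&() { return value; }   // conversion function yielding an lvalue
};

int bump_through_reference(C& c)
{
  int& r = c;        // binds directly to c.value via operator int&
  // int& r2 = {c};  // braced form discussed in the thread; left disabled
  r += 1;            // writes through the reference into c.value
  return c.value;
}
```

Because `r` refers to `c.value` itself rather than to a temporary, the increment is visible through `c`.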

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 6:39 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote:
> > On Thu, 7 Mar 2024, Uros Bizjak wrote:
> > > This is
> > >
> > > 3236  /* Just replace the CC reg with a new mode.  */
> > > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > > 3238  undobuf.other_insn = cc_use_insn;
> > >
> > > in combine.cc, where *cc_use_loc is
> > >
> > > (unspec:DI [
> > > (reg:CC 17 flags)
> > > ] UNSPEC_PUSHFL)
> > >
> > > combine assumes CC must be used inside of a comparison and uses XEXP 
> > > (..., 0)
>
> No.  It has established *this is the case* some time earlier.  Lines
> 3155 and on, what begins with
>   /* Many machines have insns that can both perform an
>  arithmetic operation and set the condition code.
>
> > > OK for trunk?
> >
> > Since you CCed me - looking at the code I wonder why we fatally fail.
>
> I did not get this email btw.  Some blip in email (on the sender's side)
> I guess?
>
> > The following might also fix the issue and preserve more of the
> > rest of the flow of the function.
>
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> > *i1, rtx_insn *i0,
> >
> >if (undobuf.other_insn == 0
> >   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> > -   &cc_use_insn)))
> > +   &cc_use_insn))
> > + && COMPARISON_P (*cc_use_loc))
>
> Line 3167 already is
>   && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
> so what in your backend is unusual?

When combine tries to combine instructions involving COMPARE RTX, e.g.:

(define_insn "*add_2"
  [(set (reg FLAGS_REG)
(compare
  (plus:SWI
(match_operand:SWI 1 "nonimmediate_operand" "%0,0,,rm,r")
(match_operand:SWI 2 "" ",,0,r,"))
  (const_int 0)))
   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,,r,r")
(plus:SWI (match_dup 1) (match_dup 2)))]

it also updates the *user* of the CC register. The *user* is e.g.:

(define_insn "*setcc_qi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
(match_operator:QI 1 "ix86_comparison_operator"
  [(reg FLAGS_REG) (const_int 0)]))]

where "ix86_comparison_operator" is one of EQ, NE, GE, LT ... RTX codes.

The part we want to fix deals with the *user* of the CC register. It
is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
in the form of

(LT:CCGC (reg:CCGC 17 flags) (const_int 0))

but can be something else, such as the above noted

 (unspec:DI [
 (reg:CC 17 flags)
 ] UNSPEC_PUSHFL)

The source code that deals with the *user* of the CC register assumes
the former form, so it blindly tries to update the mode of the CC
register inside LT comparison RTX (some other nearby source code even
checks for (const_int 0) RTX). Obviously, this is not the case with
the latter form, where the update tries to:

SUBST (XEXP (*cc_use_loc, 0), ...)

on unspec, which has no XEXP (..., 0).

And *this* is what triggers RTX checking assert.

Uros.
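
The failure mode described above can be pictured with a toy model. This is only a sketch under loud assumptions: `Node`, `Code`, `is_comparison` and `retarget_cc_mode` are hypothetical names, and the data structure does not mirror GCC's RTL representation or the actual combine.cc code. It only shows the idea of checking that the CC user is a comparison before touching its first operand.

```cpp
#include <vector>

// Toy expression codes; a stand-in for RTX codes, not the real ones.
enum class Code { Reg, ConstInt, Lt, Unspec };

// Toy expression node: a code, a mode tag, and operands.
struct Node
{
  Code code;
  int mode;
  std::vector<Node> ops;
};

// Hypothetical analogue of a "is this a comparison?" predicate.
bool is_comparison(const Node& n) { return n.code == Code::Lt; }

// Update the mode of the CC register operand only when the user really
// is a comparison; otherwise leave the user untouched and report failure.
bool retarget_cc_mode(Node& user, int new_mode)
{
  if (!is_comparison(user) || user.ops.empty())
    return false;
  user.ops[0].mode = new_mode;
  return true;
}
```

With such a guard, a comparison user gets its register operand updated, while an unspec-like user is rejected instead of being modified blindly.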


Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-03-07 Thread Richard Sandiford
Sorry, still catching up on email, but: 

Richard Biener  writes:
> We have optimize_vectors_before_lowering_p but we shouldn't even there
> turn supported into not supported ops and as said, what's supported or
> not cannot be finally decided (if it's only vcond and not vcond_mask
> that is supported).  Also optimize_vectors_before_lowering_p is set
> for a short time between vectorization and vector lowering and we
> definitely do not want to turn supported vectorizer emitted stmts
> into ones that we need to lower.  For GCC 15 we should see to move
> vector lowering before vectorization (before loop optimization I'd
> say) to close this particular hole (and also reliably ICE when the
> vectorizer creates unsupported IL).  We also definitely want to
> retire vcond expanders (no target I know of supports single-instruction
> compare-and-select).

...definitely agree with this FWIW.  Sounds like a much cleaner approach.

One of the main tricks that vcond*s tend to do is invert "difficult"
comparisons and swap the data operands to match.  But I think we should
move to a situation where targets don't provide comparison patterns
that require an inversion, and instead move inversions to generic code.

Richard
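
The "invert and swap" trick mentioned above can be sketched in scalar code. This is an illustration only, assuming a made-up target that supports an element-wise equality select but no inequality select; `vec`, `select_eq` and `select_ne` are hypothetical names, not GCC or target APIs.

```cpp
#include <array>
#include <cstddef>

template<std::size_t N>
using vec = std::array<int, N>;

// The "easy" operation the imagined target supports: a == b ? x : y.
template<std::size_t N>
vec<N> select_eq(const vec<N>& a, const vec<N>& b,
                 const vec<N>& x, const vec<N>& y)
{
  vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = (a[i] == b[i]) ? x[i] : y[i];
  return r;
}

// The "difficult" operation a != b ? x : y, implemented by inverting the
// comparison and swapping the two data operands.
template<std::size_t N>
vec<N> select_ne(const vec<N>& a, const vec<N>& b,
                 const vec<N>& x, const vec<N>& y)
{
  return select_eq(a, b, y, x);
}
```

Doing the inversion in one generic place, as suggested in the thread, would mean each target only has to provide the comparisons it supports natively.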


[committed] libstdc++: Do not define lock-free atomic aliases if not fully lock-free [PR114103]

2024-03-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The whole point of these typedefs is to guarantee lock-freedom, so if
the target has no such types, we shouldn't define the typedefs at all.

libstdc++-v3/ChangeLog:

PR libstdc++/114103
* include/bits/version.def (atomic_lock_free_type_aliases): Add
extra_cond to check for at least one always-lock-free type.
* include/bits/version.h: Regenerate.
* include/std/atomic (atomic_signed_lock_free)
(atomic_unsigned_lock_free): Only use always-lock-free types.
* src/c++20/tzdb.cc (time_zone::_Impl::RulesCounter): Don't use
atomic counter if lock-free aliases aren't available.
* testsuite/29_atomics/atomic/lock_free_aliases.cc: XFAIL for
targets without lock-free word-size compare_exchange.
---
 libstdc++-v3/include/bits/version.def  | 1 +
 libstdc++-v3/include/bits/version.h| 2 +-
 libstdc++-v3/include/std/atomic| 6 +++---
 libstdc++-v3/src/c++20/tzdb.cc | 7 ++-
 .../testsuite/29_atomics/atomic/lock_free_aliases.cc   | 1 +
 5 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 502961eb269..d298420121b 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -739,6 +739,7 @@ ftms = {
   values = {
 v = 201907;
 cxxmin = 20;
+extra_cond = "(__GCC_ATOMIC_INT_LOCK_FREE | __GCC_ATOMIC_LONG_LOCK_FREE | 
__GCC_ATOMIC_CHAR_LOCK_FREE) & 2";
   };
 };
 
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 7a6fbd35e2e..9107b45a484 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -819,7 +819,7 @@
 #undef __glibcxx_want_atomic_float
 
 #if !defined(__cpp_lib_atomic_lock_free_type_aliases)
-# if (__cplusplus >= 202002L)
+# if (__cplusplus >= 202002L) && ((__GCC_ATOMIC_INT_LOCK_FREE | 
__GCC_ATOMIC_LONG_LOCK_FREE | __GCC_ATOMIC_CHAR_LOCK_FREE) & 2)
 #  define __glibcxx_atomic_lock_free_type_aliases 201907L
 #  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_atomic_lock_free_type_aliases)
 #   define __cpp_lib_atomic_lock_free_type_aliases 201907L
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 559f8370459..1462cf5ec23 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -1774,13 +1774,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 = atomic>;
   using atomic_unsigned_lock_free
 = atomic>;
-# elif ATOMIC_INT_LOCK_FREE || !(ATOMIC_LONG_LOCK_FREE || 
ATOMIC_CHAR_LOCK_FREE)
+# elif ATOMIC_INT_LOCK_FREE == 2
   using atomic_signed_lock_free = atomic;
   using atomic_unsigned_lock_free = atomic;
-# elif ATOMIC_LONG_LOCK_FREE
+# elif ATOMIC_LONG_LOCK_FREE == 2
   using atomic_signed_lock_free = atomic;
   using atomic_unsigned_lock_free = atomic;
-# elif ATOMIC_CHAR_LOCK_FREE
+# elif ATOMIC_CHAR_LOCK_FREE == 2
   using atomic_signed_lock_free = atomic;
   using atomic_unsigned_lock_free = atomic;
 # else
diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index e03f4a5c32a..890a4c53e2d 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -651,7 +651,7 @@ namespace std::chrono
 template requires _Tp::is_always_lock_free
   struct RulesCounter<_Tp>
   {
-   atomic_signed_lock_free counter{0};
+   _Tp counter{0};
 
void
increment()
@@ -703,7 +703,12 @@ namespace std::chrono
   };
 #endif // __GTHREADS && __cpp_lib_atomic_wait
 
+#if __cpp_lib_atomic_lock_free_type_aliases
 RulesCounter rules_counter;
+#else
+RulesCounter rules_counter;
+#endif
+
 #else // TZDB_DISABLED
 _Impl(weak_ptr) { }
 struct {
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/lock_free_aliases.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic/lock_free_aliases.cc
index 372a63129ff..489d181d136 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/lock_free_aliases.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/lock_free_aliases.cc
@@ -1,5 +1,6 @@
 // { dg-do compile { target c++20 } }
 // { dg-add-options no_pch }
+// { dg-require-atomic-cmpxchg-word "PR libstdc++/114103" }
 
 #include 
 
-- 
2.43.2
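
A short sketch of how user code might cope with the aliases being absent, along the lines of the tzdb.cc change above. It assumes only standard facilities: the `__cpp_lib_atomic_lock_free_type_aliases` feature-test macro and the `ATOMIC_*_LOCK_FREE` macros, whose value 2 means "always lock-free" (1 means "sometimes"). The names `counter_type`, `increment` and `always_lock_free` are hypothetical.

```cpp
#include <atomic>

// Use the lock-free alias when the library provides it, otherwise fall
// back to a plain atomic<int> (which may or may not be lock-free).
#if defined(__cpp_lib_atomic_lock_free_type_aliases)
using counter_type = std::atomic_signed_lock_free;
#else
using counter_type = std::atomic<int>;
#endif

// Atomically add one and return the new value.
long long increment(counter_type& c)
{
  return static_cast<long long>(c.fetch_add(1)) + 1;
}

// The ATOMIC_*_LOCK_FREE macros are 0 (never), 1 (sometimes) or 2 (always).
// A plain truth test also accepts 1, so compare against 2 explicitly.
constexpr bool always_lock_free(int macro_value)
{
  return macro_value == 2;
}
```

The explicit comparison with 2 is the same distinction the patch makes when it replaces `# elif ATOMIC_INT_LOCK_FREE` with `# elif ATOMIC_INT_LOCK_FREE == 2`.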



Re: [PATCH] libstdc++: Better diagnostics for std::format errors

2024-03-07 Thread Jonathan Wakely
Pushed to trunk.

On Fri, 1 Mar 2024 at 15:09, Jonathan Wakely  wrote:

> Does the text of these new diagnostics look good?
>
> There are of course other ways for a type to be not-formattable (e.g.
> the formatter::format member doesn't return the right type or has some
> other kind of incorrect signature, or the formatter::parse member isn't
> constexpr) but we can't predict/detect them all reliably. This just
> attempts to give a user-friendly explanation for a couple of common
> mistakes. It should not have any false positives, because the
> basic_format_arg constructor requires __formattable_with<_Tp, _Context>
> so if either of these assertions fails, constructing __arg will fail
> too.  The static_assert only adds a more readable error for a
> compilation that's going to fail anyway.
>
> Tested x86_64-linux.
>
> -- >8 --
>
> This adds two new static_assert messages to the internals of
> std::make_format_args to give better diagnostics for invalid format
> args. Rather than just getting an error saying that basic_format_arg
> cannot be constructed, we get more specific errors for the cases where
> std::formatter isn't specialized for the type at all, and where it's
> specialized but only meets the BasicFormatter requirements and so can
> only format non-const arguments.
>
> Also add a test for the existing static_assert when constructing a
> format_string for non-formattable args.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/format (_Arg_store::_S_make_elt): Add two
> static_assert checks to give more user-friendly error messages.
> * testsuite/lib/prune.exp (libstdc++-dg-prune): Prune another
> form of "in requirements with" note.
> * testsuite/std/format/arguments/args_neg.cc: Check for
> user-friendly diagnostics for non-formattable types.
> * testsuite/std/format/string_neg.cc: Likewise.
> ---
>  libstdc++-v3/include/std/format   | 13 +++
>  libstdc++-v3/testsuite/lib/prune.exp  |  1 +
>  .../std/format/arguments/args_neg.cc  | 34 ++-
>  .../testsuite/std/format/string_neg.cc|  4 +++
>  4 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/format
> b/libstdc++-v3/include/std/format
> index ee189f9086c..1e839e88db4 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -3704,6 +3704,19 @@ namespace __format
> static _Element_t
> _S_make_elt(_Tp& __v)
> {
> + using _Tq = remove_const_t<_Tp>;
> + using _CharT = typename _Context::char_type;
> + static_assert(is_default_constructible_v>,
> +   "std::formatter must be specialized for the type "
> +   "of each format arg");
> + using __format::__formattable_with;
> + if constexpr (is_const_v<_Tp>)
> +   if constexpr (!__formattable_with<_Tp, _Context>)
> + if constexpr (__formattable_with<_Tq, _Context>)
> +   static_assert(__formattable_with<_Tp, _Context>,
> + "format arg must be non-const because its "
> + "std::formatter specialization has a "
> + "non-const reference parameter");
>   basic_format_arg<_Context> __arg(__v);
>   if constexpr (_S_values_only)
> return __arg._M_val;
> diff --git a/libstdc++-v3/testsuite/lib/prune.exp
> b/libstdc++-v3/testsuite/lib/prune.exp
> index 24a15ccad22..071dcf34c1e 100644
> --- a/libstdc++-v3/testsuite/lib/prune.exp
> +++ b/libstdc++-v3/testsuite/lib/prune.exp
> @@ -54,6 +54,7 @@ proc libstdc++-dg-prune { system text } {
>  regsub -all "(^|\n)\[^\n\]*:   . skipping \[0-9\]* instantiation
> contexts \[^\n\]*" $text "" text
>  regsub -all "(^|\n)\[^\n\]*:   in .constexpr. expansion \[^\n\]*"
> $text "" text
>  regsub -all "(^|\n)\[^\n\]*:   in requirements  .with\[^\n\]*" $text
> "" text
> +regsub -all "(^|\n)\[^\n\]*:   in requirements with\[^\n\]*" $text ""
> text
>  regsub -all "(^|\n)inlined from \[^\n\]*" $text "" text
>  # Why doesn't GCC need these to strip header context?
>  regsub -all "(^|\n)In file included from \[^\n\]*" $text "" text
> diff --git a/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
> b/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
> index 16ac3040146..ded56fe63ab 100644
> --- a/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
> +++ b/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
> @@ -6,7 +6,39 @@
>
>  std::string rval() { return "path/etic/experience"; }
>
> -void f()
> +void test_rval()
>  {
>(void)std::make_format_args(rval()); // { dg-error "cannot bind
> non-const lvalue reference" }
>  }
> +
> +void test_missing_specialization()
> +{
> +  struct X { };
> +  X x;
> +  (void)std::make_format_args(x); // { dg-error "here" }
> +// { dg-error "std::formatter must be specialized" "" { 
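
The nested `if constexpr` plus `static_assert` pattern used in the patch can be tried outside of `<format>`. The following is a sketch with hypothetical stand-ins (`Widget`, `Printer`, `printable_with`, `print_checked`); it is not the libstdc++ implementation and assumes C++17. The assertion is only instantiated in the one case it is meant to explain: a const argument whose non-const counterpart would have worked.

```cpp
#include <type_traits>

struct Widget { int v = 0; };

// Hypothetical stand-in for a formatter that only accepts non-const
// arguments (it takes a non-const reference).
struct Printer
{
  int operator()(Widget& w) const { return w.v; }
};

// Hypothetical analogue of a "formattable with" check.
template<typename T>
constexpr bool printable_with = std::is_invocable_v<const Printer&, T&>;

template<typename T>
int print_checked(T& value)
{
  using Tq = std::remove_const_t<T>;
  // Only reach the static_assert when T is const, T is not printable,
  // and the non-const version of T would have been printable.
  if constexpr (std::is_const_v<T>)
    if constexpr (!printable_with<T>)
      if constexpr (printable_with<Tq>)
        static_assert(printable_with<T>,
                      "argument must be non-const because the printer "
                      "takes a non-const reference");
  return Printer{}(value);
}
```

Calling `print_checked` with a `const Widget` would fail to compile with the friendlier message; as in the patch, the assertion only adds a more readable error to a compilation that fails anyway.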

Re: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

2024-03-07 Thread Patrick Palka
On Thu, 7 Mar 2024, Jason Merrill wrote:

> On 3/7/24 14:41, Patrick Palka wrote:
> > On Thu, 7 Mar 2024, Patrick Palka wrote:
> > 
> > > On Wed, 6 Mar 2024, Jason Merrill wrote:
> > > 
> > > > On 3/4/24 17:26, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > OK for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Alias templates are weird in that their specializations can appear in
> > > > > both decl_specializations and type_specializations.  They appear in
> > > > > the
> > > > > latter only at parse time via finish_template_type.  This should
> > > > > probably
> > > > > be revisited in GCC 15 since it seems sufficient to store them only in
> > > > > decl_specializations.
> > > > 
> > > > It looks like most all of lookup_template_class is wrong for alias
> > > > templates.
> > > > 
> > > > Can we move the alias template handling up higher and unconditionally
> > > > return
> > > > the result of tsubst?
> > > 
> > > This works nicely (although we have to use instantiate_alias_template
> > > directly instead of tsubst since tsubst would first substitute the
> > > uncoerced arguments into the generic DECL_TI_ARGS which breaks for
> > > parameter packs).  And it allows for some nice simplifications in
> > > the modules code which had to handle alias template specializations
> > > specially.
> > > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > 
> > > -- >8 --
> > > 
> > > Subject: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]
> > > 
> > > Alias templates are weird in that their specializations can appear in
> > > both decl_specializations and type_specializations.  They appear in the
> > > type table only at parse time via finish_template_type.  There seems
> > > to be no good reason for this, and the code paths end up stepping over
> > > each other in particular for a partial alias template instantiation such
> > > as A::key_arg in the below modules testcase: the type code path
> > > (lookup_template_class) wants to set TI_TEMPLATE to the most general
> > > template whereas the decl code path (tsubst_template_decl called during
> > > instantiation of A) already set TI_TEMPLATE to the partially
> > > instantiated TEMPLATE_DECL.  This ends up confusing modules which
> > > decides to stream the logically equivalent TYPE_DECL and TEMPLATE_DECL
> > > for this partial alias template instantiation separately.
> > > 
> > > This patch fixes this by making lookup_template_class dispatch to
> > > instantiate_alias_template early for alias template specializations.
> > > In turn we now only add such specializations to the decl table and
> > > not also the type table.  This admits some nice simplification in
> > > the modules code which otherwise has to cope with such specializations
> > > appearing in both tables.
> > > 
> > >   PR c++/103994
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * cp-tree.h (add_mergeable_specialization): Remove is_alias
> > >   parameter.
> > >   * module.cc (depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
> > >   (depset::is_alias): Remove.
> > >   (merge_kind::MK_tmpl_alias_mask): Remove.
> > >   (merge_kind::MK_alias_spec): Remove.
> > >   (merge_kind_name): Remove entries for alias specializations.
> > >   (trees_in::decl_value): Adjust add_mergeable_specialization
> > >   calls.
> > >   (trees_out::get_merge_kind) :
> > >   Use MK_decl_spec for alias template specializations.
> > >   (trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
> > >   removal.
> > >   (specialization_add): Don't allow alias templates when !decl_p.
> > >   (depset::hash::add_specializations): Remove now-dead code
> > >   accommodating alias template specializations in the type table.
> > >   * pt.cc (lookup_template_class): Dispatch early to
> > >   instantiate_alias_template for alias templates.  Simplify
> > >   accordingly.
> > >   (add_mergeable_specialization): Remove alias_p parameter and
> > >   simplify accordingly.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
> > >   * g++.dg/modules/tpl-alias-1_a.H: Likewise.
> > >   * g++.dg/modules/tpl-alias-2_a.H: New test.
> > >   * g++.dg/modules/tpl-alias-2_b.C: New test.
> > > ---
> > >   gcc/cp/cp-tree.h |  3 +-
> > >   gcc/cp/module.cc | 50 ++--
> > >   gcc/cp/pt.cc | 84 
> > >   gcc/testsuite/g++.dg/modules/pr99425-1_b.H   |  2 +-
> > >   gcc/testsuite/g++.dg/modules/tpl-alias-1_a.H |  2 +-
> > >   gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H | 15 
> > >   gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C |  9 +++
> > >   7 files changed, 69 insertions(+), 96 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C
> > > 
> > > diff --git a/gcc/cp/cp-tree.h 

[committed] libstdc++: Update expiry times for leap seconds lists

2024-03-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library, there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.

libstdc++-v3/ChangeLog:

* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.
---
 libstdc++-v3/include/std/chrono  | 2 +-
 libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index a59af34567c..3a9751781d2 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -3243,7 +3243,7 @@ namespace __detail
   };
   // The list above is known to be valid until (at least) this date
   // and only contains positive leap seconds.
-  const sys_seconds __expires(1703721600s); // 2023-12-28 00:00:00 UTC
+  const sys_seconds __expires(1735344000s); // 2024-12-28 00:00:00 UTC
 
 #if _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
   if (__ss > __expires)
diff --git a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc 
b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
index f5401a24526..5999635a89f 100644
--- a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
+++ b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
@@ -21,7 +21,7 @@ void
 test_load_leapseconds()
 {
   std::ofstream("leapseconds") << R"(
-# These are all the real leap seconds as of 2022:
+# These are all the real leap seconds as of 2024:
 Leap   1972Jun 30  23:59:60+   S
 Leap   1972Dec 31  23:59:60+   S
 Leap   1973Dec 31  23:59:60+   S
-- 
2.43.2
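
The two expiry constants in the diff can be cross-checked with a little calendar arithmetic. The sketch below uses a widely known days-from-civil-date computation; the function names are illustrative and nothing here comes from libstdc++ itself. It assumes C++14 or later for the `constexpr` functions.

```cpp
// Days between 1970-01-01 and the given proleptic Gregorian date.
constexpr long long days_from_civil(long long y, unsigned m, unsigned d)
{
  y -= (m <= 2) ? 1 : 0;                                  // year starts in March
  const long long era = (y >= 0 ? y : y - 399) / 400;     // 400-year cycle
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Seconds since the Unix epoch at 00:00:00 UTC on the given date.
constexpr long long seconds_at_midnight_utc(long long y, unsigned m, unsigned d)
{
  return days_from_civil(y, m, d) * 86400;
}

static_assert(seconds_at_midnight_utc(2023, 12, 28) == 1703721600, "old expiry");
static_assert(seconds_at_midnight_utc(2024, 12, 28) == 1735344000, "new expiry");
```

Both assertions hold: 2024-12-28 is 20085 days after the epoch, and 20085 * 86400 = 1735344000.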



[committed] libstdc++: Replace unnecessary uses of built-ins in testsuite

2024-03-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

I don't see why we should rely on __builtin_memset etc. in tests. We can
just include <cstring> and use the public API.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/allocator/default_init.cc: Use
std::memset instead of __builtin_memset.
* testsuite/23_containers/forward_list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/list/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_map/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/unordered_set/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/allocator/default_init.cc:
Likewise.
* testsuite/23_containers/vector/bool/allocator/default_init.cc:
Likewise.
* testsuite/29_atomics/atomic/compare_exchange_padding.cc:
Likewise.
* testsuite/util/atomic/wait_notify_util.h: Likewise.
---
 .../23_containers/deque/allocator/default_init.cc |  5 +++--
 .../forward_list/allocator/default_init.cc|  5 +++--
 .../23_containers/list/allocator/default_init.cc  |  5 +++--
 .../23_containers/map/allocator/default_init.cc   |  5 +++--
 .../23_containers/set/allocator/default_init.cc   |  5 +++--
 .../unordered_map/allocator/default_init.cc   |  5 +++--
 .../unordered_set/allocator/default_init.cc   |  5 +++--
 .../23_containers/vector/allocator/default_init.cc|  5 +++--
 .../vector/bool/allocator/default_init.cc |  5 +++--
 .../29_atomics/atomic/compare_exchange_padding.cc |  5 +++--
 libstdc++-v3/testsuite/util/atomic/wait_notify_util.h | 11 +--
 11 files changed, 35 insertions(+), 26 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/23_containers/deque/allocator/default_init.cc 
b/libstdc++-v3/testsuite/23_containers/deque/allocator/default_init.cc
index ce8c6ba8114..63ada98d048 100644
--- a/libstdc++-v3/testsuite/23_containers/deque/allocator/default_init.cc
+++ b/libstdc++-v3/testsuite/23_containers/deque/allocator/default_init.cc
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 using T = int;
@@ -34,7 +35,7 @@ void test01()
   typedef std::deque test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   test_type *tmp = ::new(buf._M_addr()) test_type;
 
@@ -49,7 +50,7 @@ void test02()
   typedef std::deque test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   test_type *tmp = ::new(buf._M_addr()) test_type();
 
diff --git 
a/libstdc++-v3/testsuite/23_containers/forward_list/allocator/default_init.cc 
b/libstdc++-v3/testsuite/23_containers/forward_list/allocator/default_init.cc
index 1865e39a885..d8a8bdf05a9 100644
--- 
a/libstdc++-v3/testsuite/23_containers/forward_list/allocator/default_init.cc
+++ 
b/libstdc++-v3/testsuite/23_containers/forward_list/allocator/default_init.cc
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 using T = int;
@@ -34,7 +35,7 @@ void test01()
   typedef std::forward_list test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   test_type *tmp = ::new(buf._M_addr()) test_type;
 
@@ -49,7 +50,7 @@ void test02()
   typedef std::forward_list test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   test_type *tmp = ::new(buf._M_addr()) test_type();
 
diff --git 
a/libstdc++-v3/testsuite/23_containers/list/allocator/default_init.cc 
b/libstdc++-v3/testsuite/23_containers/list/allocator/default_init.cc
index ab19ca7070c..cffad227bc0 100644
--- a/libstdc++-v3/testsuite/23_containers/list/allocator/default_init.cc
+++ b/libstdc++-v3/testsuite/23_containers/list/allocator/default_init.cc
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 using T = int;
@@ -34,7 +35,7 @@ void test01()
   typedef std::list test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   VERIFY( buf._M_ptr()->get_allocator().state != 0 );
 
@@ -51,7 +52,7 @@ void test02()
   typedef std::list test_type;
 
   __gnu_cxx::__aligned_buffer buf;
-  __builtin_memset(buf._M_addr(), ~0, sizeof(test_type));
+  std::memset(buf._M_addr(), ~0, sizeof(test_type));
 
   VERIFY( buf._M_ptr()->get_allocator().state != 0 );
 
diff --git a/libstdc++-v3/testsuite/23_containers/map/allocator/default_init.cc 
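
The idea behind these tests can be sketched with standard facilities only: poison some storage with `std::memset` from `<cstring>`, construct an object in it, and check that initialization really overwrote the poison. This is a simplified stand-in, not the testsuite code; `Counter` and `value_init_over_poison` are hypothetical names, and a plain aligned byte array replaces the `__gnu_cxx::__aligned_buffer` used in the real tests.

```cpp
#include <cstring>
#include <new>

// Simple aggregate standing in for an allocator with observable state.
struct Counter { int state; };

int value_init_over_poison()
{
  alignas(Counter) unsigned char buf[sizeof(Counter)];
  std::memset(buf, ~0, sizeof(Counter));       // fill storage with all-ones bytes

  // Value-initialization zero-initializes the aggregate, so the poison
  // written above must be gone afterwards.
  Counter* p = ::new (static_cast<void*>(buf)) Counter();
  const int result = p->state;
  p->~Counter();
  return result;
}
```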

Re: [PATCH] libatomic: Fix build for --disable-gnu-indirect-function [PR113986]

2024-03-07 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Fix libatomic build to support --disable-gnu-indirect-function on AArch64.
> Always build atomic_16.S and add aliases to the __atomic_* functions if
> !HAVE_IFUNC.

This description is too brief for me.  Could you say in detail how the
new scheme works?  E.g. the description doesn't explain:

> -if ARCH_AARCH64_HAVE_LSE128
> -AM_CPPFLAGS   = -DHAVE_FEAT_LSE128
> -endif

And what's the purpose of ARCH_AARCH64_HAVE_LSE128 after this change?

Is the indirection via ALIAS2 necessary?  Couldn't ENTRY just define
the __atomic_* symbols directly, as non-hidden, if we remove the
libat_ prefix?  That would make it easier to ensure that the lists
are kept up-to-date.

Shouldn't we skip the ENTRY_FEAT functions and existing aliases
if !HAVE_IFUNC?

I think it'd be worth (as a prepatch) splitting the file into two
#included subfiles, one that contains the base AArch64 routines and one
that contains the optimised versions.  The former would then be #included
for all builds while the latter would be specific to HAVE_IFUNC.

Thanks,
Richard

> Passes regress and bootstrap, OK for commit?
>
> libatomic:
> PR target/113986
> * Makefile.in: Regenerated.
> * Makefile.am: Make atomic_16.S not depend on HAVE_IFUNC.
> Remove predefine of HAVE_FEAT_LSE128.
> * config/linux/aarch64/atomic_16.S: Add __atomic_ aliases if 
> !HAVE_IFUNC. 
> * config/linux/aarch64/host-config.h: Correctly handle !HAVE_IFUNC.
>
> ---
>
> diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
> index 
> d49c44c7d5fbe83061fddd1f8ef4813a39eb1b8b..980677f353345c050f6cef2d57090360216c56cf
>  100644
> --- a/libatomic/Makefile.am
> +++ b/libatomic/Makefile.am
> @@ -130,12 +130,8 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
> _$(s)_.lo,$(SIZEOBJS)))
>  ## On a target-specific basis, include alternates to be selected by IFUNC.
>  if HAVE_IFUNC
>  if ARCH_AARCH64_LINUX
> -if ARCH_AARCH64_HAVE_LSE128
> -AM_CPPFLAGS   = -DHAVE_FEAT_LSE128
> -endif
>  IFUNC_OPTIONS = -march=armv8-a+lse
>  libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
> _$(s)_1_.lo,$(SIZEOBJS)))
> -libatomic_la_SOURCES += atomic_16.S
>  
>  endif
>  if ARCH_ARM_LINUX
> @@ -155,6 +151,10 @@ libatomic_la_LIBADD += $(addsuffix 
> _16_1_.lo,$(SIZEOBJS)) \
>  endif
>  endif
>  
> +if ARCH_AARCH64_LINUX
> +libatomic_la_SOURCES += atomic_16.S
> +endif
> +
>  libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
>  libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
>  
> diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
> index 
> 11c8ec7ba15ba7da5ef55e90bd836317bc270061..d9d529bc502d4ce7b9997640d5f40f5d5cc1232c
>  100644
> --- a/libatomic/Makefile.in
> +++ b/libatomic/Makefile.in
> @@ -90,17 +90,17 @@ build_triplet = @build@
>  host_triplet = @host@
>  target_triplet = @target@
>  @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach 
> s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
> -@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
> -@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
> +@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = $(foreach \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   s,$(SIZES),$(addsuffix \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   _$(s)_1_.lo,$(SIZEOBJS))) \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   _8_2_.lo,$(SIZEOBJS)) \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   tas_1_2_.lo
> -@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
> _8_1_.lo,$(SIZEOBJS))
> -@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix 
> _16_1_.lo,$(SIZEOBJS)) \
> +@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(addsuffix 
> _8_1_.lo,$(SIZEOBJS))
> +@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
> _16_1_.lo,$(SIZEOBJS)) \
>  @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix 
> _16_2_.lo,$(SIZEOBJS))
>  
> +@ARCH_AARCH64_LINUX_TRUE@am__append_5 = atomic_16.S
>  subdir = .
>  ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
>  am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
> @@ -156,8 +156,7 @@ am__uninstall_files_from_dir = { \
>}
>  am__installdirs = "$(DESTDIR)$(toolexeclibdir)"
>  LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_LTLIBRARIES)
> -@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__objects_1 =  \
> -@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@   atomic_16.lo
> +@ARCH_AARCH64_LINUX_TRUE@am__objects_1 = atomic_16.lo
>  am_libatomic_la_OBJECTS = gload.lo gstore.lo gcas.lo gexch.lo \
>   glfree.lo lock.lo init.lo fenv.lo fence.lo flag.lo \
>   $(am__objects_1)
> @@ -425,7 +424,7 @@ libatomic_la_LDFLAGS = $(libatomic_version_info) 
> $(libatomic_version_script) \
>   $(lt_host_flags) $(libatomic_darwin_rpath)
>  
>  libatomic_la_SOURCES = gload.c gstore.c gcas.c gexch.c glfree.c lock.c \
> - init.c fenv.c fence.c flag.c $(am__append_2)
> + init.c 

[PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-07 Thread Evgeny Karpov
Monday, March 4, 2024
Evgeny Karpov wrote:

>
> Changes from v1 to v2:
> Adjust the target name to aarch64-*-mingw* to exclude the big-endian target
> from support.
> Exclude 64-bit ISA.
> Rename enum calling_abi to aarch64_calling_abi.
> Move AArch64 MS ABI definitions FIXED_REGISTERS,
> CALL_REALLY_USED_REGISTERS, and STATIC_CHAIN_REGNUM from aarch64.h
> to aarch64-abi-ms.h.
> Rename TARGET_ARM64_MS_ABI to TARGET_AARCH64_MS_ABI.
> Exclude TARGET_64BIT from the aarch64 target.
> Exclude HAVE_GAS_WEAK.
> Set HAVE_GAS_ALIGNED_COMM to 1 by default.
> Use a reference from "x86 Windows Options" to "Cygwin and MinGW
> Options".
> Update commit descriptions to follow standard style.
> Rebase from 4th March 2024.

Hello,

It looks like the initial feedback has been addressed.
While unit testing for the x86_64-w64-mingw32 target is still in
progress, the first 4 patches do not obviously change other
targets, including aarch64-linux-gnu.
Could they be merged once stage 1 starts, 
or could it be done even now? 
Thanks!

Regards,
Evgeny



[PATCH] testsuite, darwin: improve check for -shared support

2024-03-07 Thread FX Coudert
The undefined symbols are allowed for C checks, but when
this is run as C++, the mangled foo() symbol is still
seen as undefined, and the testsuite thinks darwin does not
support -shared.

Pushed after approval by Iain Sandoe in PR114233

FX




0001-testsuite-darwin-improve-check-for-shared-support.patch


Re: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

2024-03-07 Thread Jason Merrill

On 3/7/24 14:41, Patrick Palka wrote:

On Thu, 7 Mar 2024, Patrick Palka wrote:


On Wed, 6 Mar 2024, Jason Merrill wrote:


On 3/4/24 17:26, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They appear in the
latter only at parse time via finish_template_type.  This should probably
be revisited in GCC 15 since it seems sufficient to store them only in
decl_specializations.


It looks like most all of lookup_template_class is wrong for alias templates.

Can we move the alias template handling up higher and unconditionally return
the result of tsubst?


This works nicely (although we have to use instantiate_alias_template
directly instead of tsubst since tsubst would first substitute the
uncoerced arguments into the generic DECL_TI_ARGS which breaks for
parameter packs).  And it allows for some nice simplifications in
the modules code which had to handle alias template specializations
specially.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Subject: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They appear in the
type table only at parse time via finish_template_type.  There seems
to be no good reason for this, and the code paths end up stepping over
each other in particular for a partial alias template instantiation such
as A::key_arg in the below modules testcase: the type code path
(lookup_template_class) wants to set TI_TEMPLATE to the most general
template whereas the decl code path (tsubst_template_decl called during
instantiation of A) already set TI_TEMPLATE to the partially
instantiated TEMPLATE_DECL.  This ends up confusing modules which
decides to stream the logically equivalent TYPE_DECL and TEMPLATE_DECL
for this partial alias template instantiation separately.

This patch fixes this by making lookup_template_class dispatch to
instantiate_alias_template early for alias template specializations.
In turn we now only add such specializations to the decl table and
not also the type table.  This admits some nice simplification in
the modules code which otherwise has to cope with such specializations
appearing in both tables.

PR c++/103994

gcc/cp/ChangeLog:

* cp-tree.h (add_mergeable_specialization): Remove is_alias
parameter.
* module.cc (depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
(depset::is_alias): Remove.
(merge_kind::MK_tmpl_alias_mask): Remove.
(merge_kind::MK_alias_spec): Remove.
(merge_kind_name): Remove entries for alias specializations.
(trees_in::decl_value): Adjust add_mergeable_specialization
calls.
(trees_out::get_merge_kind) :
Use MK_decl_spec for alias template specializations.
(trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
removal.
(specialization_add): Don't allow alias templates when !decl_p.
(depset::hash::add_specializations): Remove now-dead code
accommodating alias template specializations in the type table.
* pt.cc (lookup_template_class): Dispatch early to
instantiate_alias_template for alias templates.  Simplify
accordingly.
(add_mergeable_specialization): Remove alias_p parameter and
simplify accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
* g++.dg/modules/tpl-alias-1_a.H: Likewise.
* g++.dg/modules/tpl-alias-2_a.H: New test.
* g++.dg/modules/tpl-alias-2_b.C: New test.
---
  gcc/cp/cp-tree.h |  3 +-
  gcc/cp/module.cc | 50 ++--
  gcc/cp/pt.cc | 84 
  gcc/testsuite/g++.dg/modules/pr99425-1_b.H   |  2 +-
  gcc/testsuite/g++.dg/modules/tpl-alias-1_a.H |  2 +-
  gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H | 15 
  gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C |  9 +++
  7 files changed, 69 insertions(+), 96 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4469d965ef0..14895bc6585 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7642,8 +7642,7 @@ extern void walk_specializations  (bool,
 void *);
  extern tree match_mergeable_specialization(bool is_decl, spec_entry *);
  extern unsigned get_mergeable_specialization_flags (tree tmpl, tree spec);
-extern void add_mergeable_specialization(bool is_decl, bool is_alias,
-spec_entry *,
+extern void add_mergeable_specialization   

Re: [PATCH] c++/modules: inline namespace abi_tag streaming [PR110730]

2024-03-07 Thread Jason Merrill

On 3/6/24 21:12, Patrick Palka wrote:

On Wed, 6 Mar 2024, Jason Merrill wrote:


On 3/6/24 14:10, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

The unreduced testcase from this PR crashes at runtime ultimately
because we don't stream the abi_tag attribute on inline namespaces and
so the filesystem::current_path() call resolves to the non-C++11 ABI
version even though the C++11 ABI is active, leading to a crash when
destructing the call result (which contains an std::string member).

While we do stream the DECL_ATTRIBUTES of all decls that go through
the generic tree streaming routines, it seems namespaces are streamed
separately from other decls and we don't use the generic routines for
them.  So this patch makes us stream the abi_tag manually for (inline)
namespaces.


Why not stream all DECL_ATTRIBUTES of all namespaces?


AFAICT abi_tag and deprecated are the only attributes that we recognize
on a namespace, and for deprecated it should suffice to stream the
TREE_DEPRECATED flag instead of the actual attribute, so hardcoding
abi_tag streaming seems convenient.

If we wanted to stream all DECL_ATTRIBUTES of a namespace then we'd have
to assume up front what kind of tree arguments of the attributes can be,
e.g. an INTEGER_CST or a STRING_CST etc and implement streaming of these
trees within the bytes_in/out base classes instead of trees_in/out (we
only have access to a bytes_in/out object from read/write_namespaces).


Hunh, why don't we use trees_in/out for namespaces?

But in that case, the patch is OK.  You might use list_length (tags) 
instead of writing the counting loop yourself.  OK either way.


Jason



Re: [PATCH v2] c++: Redetermine whether to write vtables on stream-in [PR114229]

2024-03-07 Thread Jason Merrill

On 3/7/24 07:49, Nathaniel Shead wrote:

On Wed, Mar 06, 2024 at 08:59:16AM -0500, Jason Merrill wrote:

On 3/5/24 22:06, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, reading a variable definition always marks that decl as
DECL_NOT_REALLY_EXTERN, with anything else imported still being
considered external. This is not sufficient for vtables, however; for an
extern template, a vtable may be generated (and its definition emitted)
but nonetheless the vtable should only be emitted in the TU where that
template is actually instantiated.


Does the vtable go through import_export_decl?  I've been thinking that that
function (and import_export_class) need to be more module-aware. Would it
make sense to do that rather than stream DECL_NOT_REALLY_EXTERN?

Jason



Right. It doesn't go through 'import_export_decl' because when it's
imported, DECL_INTERFACE_KNOWN is already set. So it seems an obvious
fix here is to just ensure that we clear that flag on stream-in for
vtables (we can't do it generally as it seems to be needed to be kept on
various other kinds of declarations).

Linaro complained about the last version of this patch too on ARM;
hopefully this version is friendlier.

I might also spend some time messing around to see if I can implement
https://github.com/itanium-cxx-abi/cxx-abi/issues/170 later, but that
will probably have to be a GCC 15 change.


That's OK, but please add a FIXME in this change until then.

OK with that adjustment.


Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk if
Linaro doesn't complain about this patch?

-- >8 --

We currently always stream DECL_INTERFACE_KNOWN, which is needed since
many kinds of declarations already have their interface determined at
parse time.  But for vtables and type-info declarations we need to
re-evaluate on stream-in, as whether they need to be emitted or not
changes in each TU, so this patch clears DECL_INTERFACE_KNOWN on these
kinds of declarations so that they can go through 'import_export_decl'
again.

Note that the precise details of the virt-2 tests will need to change
when we implement the resolution of [1]; for now I just updated the test
to not fail with the new (current) semantics.

[1]: https://github.com/itanium-cxx-abi/cxx-abi/pull/171

PR c++/114229

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Redetermine
DECL_INTERFACE_KNOWN on stream-in for vtables and tinfo.

gcc/testsuite/ChangeLog:

* g++.dg/modules/virt-2_b.C: Update test to acknowledge that we
now emit vtables here too.
* g++.dg/modules/virt-3_a.C: New test.
* g++.dg/modules/virt-3_b.C: New test.
* g++.dg/modules/virt-3_c.C: New test.
* g++.dg/modules/virt-3_d.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc| 12 +++-
  gcc/testsuite/g++.dg/modules/virt-2_b.C |  5 ++---
  gcc/testsuite/g++.dg/modules/virt-3_a.C |  9 +
  gcc/testsuite/g++.dg/modules/virt-3_b.C |  6 ++
  gcc/testsuite/g++.dg/modules/virt-3_c.C |  3 +++
  gcc/testsuite/g++.dg/modules/virt-3_d.C |  7 +++
  6 files changed, 38 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_b.C
  create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_c.C
  create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_d.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f7e8b357fc2..d77286328f5 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5390,7 +5390,17 @@ trees_out::core_bools (tree t)
WB (t->decl_common.lang_flag_2);
WB (t->decl_common.lang_flag_3);
WB (t->decl_common.lang_flag_4);
-  WB (t->decl_common.lang_flag_5);
+
+  {
+   /* This is DECL_INTERFACE_KNOWN: We should redetermine whether
+  we need to import or export any vtables or typeinfo objects
+  on stream-in.  */
+   bool interface_known = t->decl_common.lang_flag_5;
+   if (VAR_P (t) && (DECL_VTABLE_OR_VTT_P (t) || DECL_TINFO_P (t)))
+ interface_known = false;
+   WB (interface_known);
+  }
+
WB (t->decl_common.lang_flag_6);
WB (t->decl_common.lang_flag_7);
WB (t->decl_common.lang_flag_8);
diff --git a/gcc/testsuite/g++.dg/modules/virt-2_b.C 
b/gcc/testsuite/g++.dg/modules/virt-2_b.C
index e041f0721f9..2bc5eced013 100644
--- a/gcc/testsuite/g++.dg/modules/virt-2_b.C
+++ b/gcc/testsuite/g++.dg/modules/virt-2_b.C
@@ -21,8 +21,7 @@ int main ()
return !(Visit () == 1);
  }
  
-// We do not emit Visitor vtable

-// but we do emit rtti here
-// { dg-final { scan-assembler-not {_ZTVW3foo7Visitor:} } }
+// Again, we emit Visitor vtable and rtti here
+// { dg-final { scan-assembler {_ZTVW3foo7Visitor:} } }
  // { dg-final { scan-assembler {_ZTIW3foo7Visitor:} } }
  // { dg-final { scan-assembler {_ZTSW3foo7Visitor:} } }
diff --git 

Re: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

2024-03-07 Thread Patrick Palka
On Thu, 7 Mar 2024, Patrick Palka wrote:

> On Wed, 6 Mar 2024, Jason Merrill wrote:
> 
> > On 3/4/24 17:26, Patrick Palka wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > OK for trunk?
> > > 
> > > -- >8 --
> > > 
> > > Alias templates are weird in that their specializations can appear in
> > > both decl_specializations and type_specializations.  They appear in the
> > > latter only at parse time via finish_template_type.  This should probably
> > > be revisited in GCC 15 since it seems sufficient to store them only in
> > > decl_specializations.
> > 
> > It looks like most all of lookup_template_class is wrong for alias 
> > templates.
> > 
> > Can we move the alias template handling up higher and unconditionally return
> > the result of tsubst?
> 
> This works nicely (although we have to use instantiate_alias_template
> directly instead of tsubst since tsubst would first substitute the
> uncoerced arguments into the generic DECL_TI_ARGS which breaks for
> parameter packs).  And it allows for some nice simplifications in
> the modules code which had to handle alias template specializations
> specially.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> -- >8 --
> 
> Subject: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]
> 
> Alias templates are weird in that their specializations can appear in
> both decl_specializations and type_specializations.  They appear in the
> type table only at parse time via finish_template_type.  There seems
> to be no good reason for this, and the code paths end up stepping over
> each other in particular for a partial alias template instantiation such
> as A::key_arg in the below modules testcase: the type code path
> (lookup_template_class) wants to set TI_TEMPLATE to the most general
> template whereas the decl code path (tsubst_template_decl called during
> instantiation of A) already set TI_TEMPLATE to the partially
> instantiated TEMPLATE_DECL.  This ends up confusing modules which
> decides to stream the logically equivalent TYPE_DECL and TEMPLATE_DECL
> for this partial alias template instantiation separately.
> 
> This patch fixes this by making lookup_template_class dispatch to
> instantiate_alias_template early for alias template specializations.
> In turn we now only add such specializations to the decl table and
> not also the type table.  This admits some nice simplification in
> the modules code which otherwise has to cope with such specializations
> appearing in both tables.
> 
>   PR c++/103994
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (add_mergeable_specialization): Remove is_alias
>   parameter.
>   * module.cc (depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
>   (depset::is_alias): Remove.
>   (merge_kind::MK_tmpl_alias_mask): Remove.
>   (merge_kind::MK_alias_spec): Remove.
>   (merge_kind_name): Remove entries for alias specializations.
>   (trees_in::decl_value): Adjust add_mergeable_specialization
>   calls.
>   (trees_out::get_merge_kind) :
>   Use MK_decl_spec for alias template specializations.
>   (trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
>   removal.
>   (specialization_add): Don't allow alias templates when !decl_p.
>   (depset::hash::add_specializations): Remove now-dead code
>   accommodating alias template specializations in the type table.
>   * pt.cc (lookup_template_class): Dispatch early to
>   instantiate_alias_template for alias templates.  Simplify
>   accordingly.
>   (add_mergeable_specialization): Remove alias_p parameter and
>   simplify accordingly.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
>   * g++.dg/modules/tpl-alias-1_a.H: Likewise.
>   * g++.dg/modules/tpl-alias-2_a.H: New test.
>   * g++.dg/modules/tpl-alias-2_b.C: New test.
> ---
>  gcc/cp/cp-tree.h |  3 +-
>  gcc/cp/module.cc | 50 ++--
>  gcc/cp/pt.cc | 84 
>  gcc/testsuite/g++.dg/modules/pr99425-1_b.H   |  2 +-
>  gcc/testsuite/g++.dg/modules/tpl-alias-1_a.H |  2 +-
>  gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H | 15 
>  gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C |  9 +++
>  7 files changed, 69 insertions(+), 96 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H
>  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 4469d965ef0..14895bc6585 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7642,8 +7642,7 @@ extern void walk_specializations(bool,
>void *);
>  extern tree match_mergeable_specialization   (bool is_decl, spec_entry *);
>  extern unsigned get_mergeable_specialization_flags (tree tmpl, tree spec);
> -extern void 

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-07 Thread Ken Matsui
On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
 wrote:
>
> On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
> >
> > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> >  wrote:
> > >
> > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > >
> > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > equivalence does not hold for types other than signed integers.  When
> > > > it comes to conditions, the former was translated to a combination of
> > > > sub and test, whereas the latter was translated to a single cmp.
> > > > Thus, this optimization pass tries to optimize the former to the
> > > > latter.
> > > >
> > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > as defined behavior, specifically, wrapping around according to two's
> > > > complement arithmetic.  This has implications for optimizations that
> > > > rely on the standard behavior of signed integers, where overflow is
> > > > undefined.  Consider the example given:
> > > >
> > > > long long llmax = __LONG_LONG_MAX__;
> > > > long long llmin = -llmax - 1;
> > > >
> > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > for a `long long`, this calculation overflows in a defined manner
> > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > produces a negative value due to two's complement wraparound.
> > > > Therefore, `llmax - llmin < 0` is true.
> > > >
> > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > enabled.  This is why this optimization pass is disabled under
> > > > `-fwrapv`.
> > > >
> > > > This optimization pass must run before the Jump Threading pass and the
> > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > >
> > > > (1)
> > > >   int diff = x - y;
> > > >   if (diff > 0)
> > > > foo();
> > > >   if (diff < 0)
> > > > bar();
> > > >
> > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > because we know the postcondition of the first condition is diff <= 0,
> > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > after this VRP, we get:
> > > >
> > > > (2)
> > > >   int diff = x - y;
> > > >   if (x > y)
> > > > foo();
> > > >   if (diff != 0)
> > > > bar();
> > > >
> > > > This generates sub and test for the second condition and cmp for the
> > > > first condition. However, if we apply this pass beforehand, we simply
> > > > get:
> > > >
> > > > (3)
> > > >   int diff = x - y;
> > > >   if (x > y)
> > > > foo();
> > > >   if (x < y)
> > > > bar();
> > > >
> > > > In this code, diff will be eliminated as dead code, and sub and test
> > > > will not be generated, which is more efficient.
> > > >
> > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > (3) above are recognized as different, which prevents TCO.
> > > >
> > > > PR middle-end/113680
> > >
> > > This shouldn't be done as a new optimization pass.  It fits either
> > > the explicit code present in the forwprop pass or a new match.pd
> > > pattern.  There's possible interaction with x - y value being used
> > > elsewhere and thus exposing a CSE opportunity as well as
> > > a comparison against zero being possibly implemented by
> > > a flag setting subtraction instruction.
> > >
> >
> > Thank you so much for your review!  Although the forwprop pass runs
> > multiple times, we might not need to apply this optimization multiple
> > times.  Would it be acceptable to add such optimization?  More
> > generally, I would like to know how to determine where to put
> > optimization in the future.
>
> This kind of pattern matching expression simplification is best
> addressed by patterns in match.pd though historically the forwprop
> pass still catches cases not in match.pd in its
> forward_propagate_into_comparison_1 (and callers).
>

I see.  When would patterns in match.pd be applied?  Around forwprop
or somewhere else?  (Also, could you please tell me a document I can
learn about these if it exists?)  I ask because this optimization
should be applied before the Jump Threading and VRP passes.
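
For reference, the -fwrapv counterexample from the patch description can
be checked with a small sketch like the one below.  The function names
are hypothetical and not part of the patch; the subtraction is done in
unsigned arithmetic so that the sketch itself never triggers signed
overflow, and it assumes the usual two's complement long long without
padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Sign of x - y computed with two's complement wraparound, i.e. what
   -fwrapv guarantees.  Done in unsigned arithmetic so that the sketch
   itself never executes a signed overflow.  */
bool
wrapped_diff_is_negative (long long x, long long y)
{
  unsigned long long d = (unsigned long long) x - (unsigned long long) y;
  return (d >> (sizeof d * CHAR_BIT - 1)) != 0;
}

/* Hypothetical self-check, not part of the patch.  */
void
check_sub_cmp_examples (void)
{
  /* No overflow: (x - y) < 0 agrees with x < y.  */
  assert (wrapped_diff_is_negative (1, 2) == (1 < 2));
  assert (wrapped_diff_is_negative (7, -3) == (7 < -3));

  /* Wraparound: llmax - llmin wraps to -1, so the sign test says
     "negative" although llmax < llmin is false.  */
  assert (wrapped_diff_is_negative (LLONG_MAX, LLONG_MIN));
  assert (!(LLONG_MAX < LLONG_MIN));
}
```

The last two assertions are the reason the transformation has to stay
disabled when signed overflow wraps.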

> > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> >
> > > Our VN pass has some tricks to anticipate CSE opportunities
> > > like this, but it's not done "properly" in the 

Re: [PATCH 2/2] aarch64: Add support for _BitInt

2024-03-07 Thread Richard Sandiford
"Andre Vieira (lists)"  writes:
> Hey,
>
> Dropped the first patch and dealt with the comments above, hopefully I 
> didn't miss any this time.
>
> --
>
> This patch adds support for C23's _BitInt for the AArch64 port when 
> compiling
> for little endianness.  Big Endianness requires further target-agnostic
> support and we therefore disable it for now.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (TARGET_C_BITINT_TYPE_INFO): Declare MACRO.
>   (aarch64_bitint_type_info): New function.
>   (aarch64_return_in_memory_1): Return large _BitInt's in memory.
>   (aarch64_function_arg_alignment): Adapt to correctly return the ABI
>   mandated alignment of _BitInt(N) where N > 128 as the alignment of
>   TImode.
>   (aarch64_composite_type_p): Return true for _BitInt(N), where N > 128.
>
> libgcc/ChangeLog:
>
>   * config/aarch64/t-softfp (softfp_extras): Add floatbitinthf,
>   floatbitintbf, floatbitinttf and fixtfbitint.
>   * config/aarch64/libgcc-softfp.ver (GCC_14.0.0): Add __floatbitinthf,
>   __floatbitintbf, __floatbitinttf and __fixtfbitint.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/bitint-alignments.c: New test.
>   * gcc.target/aarch64/bitint-args.c: New test.
>   * gcc.target/aarch64/bitint-sizes.c: New test.
>
>
> On 02/02/2024 14:46, Jakub Jelinek wrote:
>> On Thu, Jan 25, 2024 at 05:45:01PM +, Andre Vieira wrote:
>>> This patch adds support for C23's _BitInt for the AArch64 port when 
>>> compiling
>>> for little endianness.  Big Endianness requires further target-agnostic
>>> support and we therefore disable it for now.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64.cc (TARGET_C_BITINT_TYPE_INFO): Declare MACRO.
>>> (aarch64_bitint_type_info): New function.
>>> (aarch64_return_in_memory_1): Return large _BitInt's in memory.
>>> (aarch64_function_arg_alignment): Adapt to correctly return the ABI
>>> mandated alignment of _BitInt(N) where N > 128 as the alignment of
>>> TImode.
>>> (aarch64_composite_type_p): Return true for _BitInt(N), where N > 128.
>>>
>>> libgcc/ChangeLog:
>>>
>>> * config/aarch64/t-softfp: Add fixtfbitint, floatbitinttf and
>>> floatbitinthf to the softfp_extras variable to ensure the
>>> runtime support is available for _BitInt.
>> 
>> I think this lacks some config/aarch64/t-whatever.ver
>> additions.
>> See PR113700 for some more details.
>> We want the support routines for binary floating point <-> _BitInt
>> conversions in both libgcc.a and libgcc_s.so.1 and exported from the latter
>> too at GCC_14.0.0 symver, while decimal floating point <-> _BitInt solely in
>> libgcc.a (as with all the huge dfp/bid stuff).
>> 
>>  Jakub
>> 
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 16318bf925883ecedf9345e53fc0824a553b2747..9bd8d22f6edd9f6c77907ec383f9e8bf055cfb8b
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -6583,6 +6583,7 @@ aarch64_return_in_memory_1 (const_tree type)
>int count;
>  
>if (!AGGREGATE_TYPE_P (type)
> +  && TREE_CODE (type) != BITINT_TYPE
>&& TREE_CODE (type) != COMPLEX_TYPE
>&& TREE_CODE (type) != VECTOR_TYPE)
>  /* Simple scalar types always returned in registers.  */
> @@ -21895,6 +21896,11 @@ aarch64_composite_type_p (const_tree type,
>if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
>  return true;
>  
> +  if (type
> +  && TREE_CODE (type) == BITINT_TYPE
> +  && int_size_in_bytes (type) > 16)
> +return true;
> +

Think I probably said this before, but for the record: I don't think
the above code has any practical effect, but I agree it's probably better
to include it for completeness.

>if (mode == BLKmode
>|| GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
>|| GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
> @@ -28400,6 +28406,42 @@ aarch64_excess_precision (enum excess_precision_type 
> type)
>return FLT_EVAL_METHOD_UNPREDICTABLE;
>  }
>  
> +/* Implement TARGET_C_BITINT_TYPE_INFO.
> +   Return true if _BitInt(N) is supported and fill its details into *INFO.  
> */
> +bool
> +aarch64_bitint_type_info (int n, struct bitint_info *info)
> +{
> +  if (TARGET_BIG_END)
> +return false;
> +
> +  if (n <= 8)
> +info->limb_mode = QImode;
> +  else if (n <= 16)
> +info->limb_mode = HImode;
> +  else if (n <= 32)
> +info->limb_mode = SImode;
> +  else if (n <= 64)
> +info->limb_mode = DImode;
> +  else if (n <= 128)
> +info->limb_mode = TImode;
> +  else
> +/* The AAPCS for AArch64 defines _BitInt(N > 128) as an array with
> +   type {signed,unsigned} __int128[M] where M*128 >= N.  However, to be
> +   able to use libgcc's implementation to support large _BitInt's we need
> +   to use a LIMB_MODE that is no larger than 'long long'.  This is why we
> +   use DImode for our internal 

[PATCH] bpf: testsuite: fix unresolved test in memset-1.c

2024-03-07 Thread David Faust
The test was trying to do too much by both checking for an error, and
checking the resulting assembly. Of course, due to the error no asm was
produced, so the scan-asm went unresolved. Split it into two separate
tests to fix the issue.

Tested on x86_64-linux-gnu host for bpf-unknown-none target.

gcc/testsuite/

* gcc.target/bpf/memset-1.c: Move error test case to...
* gcc.target/bpf/memset-2.c: ... here. New test.
---
 gcc/testsuite/gcc.target/bpf/memset-1.c |  8 
 gcc/testsuite/gcc.target/bpf/memset-2.c | 22 ++
 2 files changed, 22 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/memset-2.c

diff --git a/gcc/testsuite/gcc.target/bpf/memset-1.c 
b/gcc/testsuite/gcc.target/bpf/memset-1.c
index 9e9f8eff028..7c4768c6e73 100644
--- a/gcc/testsuite/gcc.target/bpf/memset-1.c
+++ b/gcc/testsuite/gcc.target/bpf/memset-1.c
@@ -28,12 +28,4 @@ set_large (struct context *ctx)
   __builtin_memset (dest, 0xfe, 130);
 }
 
-void
-set_variable (struct context *ctx)
-{
-  void *data = (void *)(long)ctx->data;
-  char *dest = data;
-  __builtin_memset (dest, 0xbc, ctx->data_meta); /* { dg-error "could not 
inline call" } */
-}
-
 /* { dg-final { scan-assembler-times "call" 0 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/memset-2.c 
b/gcc/testsuite/gcc.target/bpf/memset-2.c
new file mode 100644
index 000..0602a1a277c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/memset-2.c
@@ -0,0 +1,22 @@
+/* Test that we error if memset cannot be expanded inline.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct context {
+ unsigned int data;
+ unsigned int data_end;
+ unsigned int data_meta;
+ unsigned int ingress;
+ unsigned int queue_index;
+ unsigned int egress;
+};
+
+
+void
+set_variable (struct context *ctx)
+{
+  void *data = (void *)(long)ctx->data;
+  char *dest = data;
+  __builtin_memset (dest, 0xbc, ctx->data_meta); /* { dg-error "could not 
inline call" } */
+}
-- 
2.43.0



[PATCH] bpf: add size threshold for inlining mem builtins

2024-03-07 Thread David Faust
BPF cannot fall back on library calls to implement memmove, memcpy and
memset, so we attempt to expand these inline always if possible.
However, this inline expansion was being attempted even for excessively
large operations, which could result in gcc consuming huge amounts of
memory and hanging.

Add a size threshold in the BPF backend below which to always expand
these operations inline, and introduce an option
-minline-memops-threshold= to control the threshold. Defaults to
1024 bytes.
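
As a rough illustration of the check, here is a standalone sketch that
is not the backend code.  The names plan_inline_memop and struct
inline_plan are hypothetical; the sketch assumes a power-of-two
alignment, for which the division below matches the shift by
ceil_log2 (align) used in bpf.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the option value; 1024 is the default
   named above.  */
enum { INLINE_MEMOPS_THRESHOLD_DEFAULT = 1024 };

/* Hypothetical result type: how many full-width accesses to emit and
   how many trailing bytes remain.  */
struct inline_plan
{
  unsigned iters;
  unsigned remainder;
};

/* Return true and fill *PLAN if an operation of SIZE_BYTES with
   power-of-two alignment ALIGN is small enough to expand inline.
   Return false if the size is above THRESHOLD.  */
bool
plan_inline_memop (unsigned long long size_bytes, unsigned align,
                   unsigned long long threshold, struct inline_plan *plan)
{
  if (size_bytes > threshold)
    return false;

  plan->iters = (unsigned) (size_bytes / align);
  plan->remainder = (unsigned) (size_bytes & (align - 1));
  return true;
}

/* Hypothetical self-check, not part of the patch.  */
void
check_inline_plan_examples (void)
{
  struct inline_plan plan;

  /* 130 bytes at 8-byte alignment: 16 full accesses plus 2 bytes.  */
  assert (plan_inline_memop (130, 8, INLINE_MEMOPS_THRESHOLD_DEFAULT, &plan));
  assert (plan.iters == 16 && plan.remainder == 2);

  /* A size equal to the threshold is still expanded inline.  */
  assert (plan_inline_memop (1024, 8, INLINE_MEMOPS_THRESHOLD_DEFAULT, &plan));

  /* Anything larger is rejected.  */
  assert (!plan_inline_memop (2048, 8, INLINE_MEMOPS_THRESHOLD_DEFAULT, &plan));
}
```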

Tested on x86_64-linux-gnu host for bpf-unknown-none target.
Fixes hang in test gcc.c-torture/compile/20050622-1.c for BPF, which
returns to (correctly) failing due to exceeding the eBPF stack limit.

gcc/

* config/bpf/bpf.cc (bpf_expand_cpymem, bpf_expand_setmem): Do
not attempt inline expansion if size is above threshold.
* config/bpf/bpf.opt (-minline-memops-threshold): New option.
* doc/invoke.texi (eBPF Options) <-minline-memops-threshold>:
Document.

gcc/testsuite/

* gcc.target/bpf/inline-memops-threshold-1.c: New test.
* gcc.target/bpf/inline-memops-threshold-2.c: New test.
---
 gcc/config/bpf/bpf.cc |  8 
 gcc/config/bpf/bpf.opt|  4 
 gcc/doc/invoke.texi   |  9 -
 .../gcc.target/bpf/inline-memops-threshold-1.c| 15 +++
 .../gcc.target/bpf/inline-memops-threshold-2.c| 14 ++
 5 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 0e33f4347ba..3f3dcb0a46b 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -1275,6 +1275,10 @@ bpf_expand_cpymem (rtx *operands, bool is_move)
   gcc_unreachable ();
 }
 
+  /* For sizes above threshold, always use a libcall.  */
+  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
+return false;
+
   unsigned iters = size_bytes >> ceil_log2 (align);
   unsigned remainder = size_bytes & (align - 1);
 
@@ -1347,6 +1351,10 @@ bpf_expand_setmem (rtx *operands)
   gcc_unreachable ();
 }
 
+  /* For sizes above threshold, always use a libcall.  */
+  if (size_bytes > (unsigned HOST_WIDE_INT) bpf_inline_memops_threshold)
+return false;
+
   unsigned iters = size_bytes >> ceil_log2 (align);
   unsigned remainder = size_bytes & (align - 1);
   unsigned inc = GET_MODE_SIZE (mode);
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index acfddebdad7..541ebe4dfc4 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -108,3 +108,7 @@ Enum(asm_dialect) String(normal) Value(ASM_NORMAL)
 
 EnumValue
 Enum(asm_dialect) String(pseudoc) Value(ASM_PSEUDOC)
+
+minline-memops-threshold=
+Target RejectNegative Joined UInteger Var(bpf_inline_memops_threshold) Init(1024)
+-minline-memops-threshold= Maximum size of memset/memmove/memcpy to inline, larger sizes will use a library call.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2390d478121..7a965631123 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -971,7 +971,7 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-mbig-endian -mlittle-endian
 -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
 -mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -msmov -mcpu=@var{version}
--masm=@var{dialect}}
+-masm=@var{dialect} -minline-memops-threshold=@var{bytes}}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -25700,6 +25700,13 @@ Outputs pseudo-c assembly dialect.
 
 @end table
 
+@opindex -minline-memops-threshold
+@item -minline-memops-threshold=@var{bytes}
+Specifies a size threshold in bytes at or below which memmove, memcpy
+and memset shall always be expanded inline.  Operations dealing with
+sizes larger than this threshold will be implemented using a library
+call instead of being expanded inline.  The default is @samp{1024}.
+
 @end table
 
 @node FR30 Options
diff --git a/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c b/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
new file mode 100644
index 000..c2ba4db5b7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-1.c
@@ -0,0 +1,15 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2" "-minline-memops-threshold=256"} */
+
+char buf[512];
+
+void
+mov_small (void)
+{
+  __builtin_memmove (buf, buf + 2, 255);
+}
+
+/* { dg-final { scan-assembler-not "call" } } */
+/* { dg-final { scan-assembler "ldxb" } } */
+/* { dg-final { scan-assembler "stxb" } } */
diff --git a/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c b/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c
new file mode 100644
index 000..190b10b579c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/inline-memops-threshold-2.c
@@ -0,0 +1,14 @@
+/* { 

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 12:22:04PM +0100, Uros Bizjak wrote:
> As I understood find_single_use, it is returning RTX iff DEST is used
> only a single time in an insn sequence following INSN.

Connected by a log_link even, yeah.

> We can reject the combination without worries of multiple uses.

Yup.  That is the whole point of find_single_use: if that test fails,
combine knows to get its paws off :-)


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 11:45:45AM +0100, Richard Biener wrote:
> The question is, whether a NULL cc_use_loc (find_single_use returning 
> NULL) means "there is no use" or it can mean "huh, don't know, maybe
> more than one, maybe I was too stupid to identify the single use".
> The implementation suggests it's all broken ;)

It specifically means there is not a *single* use (or it could not find
what it was, perhaps).  It does not mean there is no use.  There is not
much in combine that cares about dead code anyway, earlier passes should
have taken care of that ;-)

All as documented.
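
A toy illustration of that contract, for readers who have not looked at the function: the helper below is not find_single_use (which works on insns and log links), just a made-up `toy_find_single_use` over a list of register ids. It returns a null pointer both when there is no use and when there is more than one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each "use" is just the id of the register being read.
// Returns a pointer to the one use of REG, or nullptr when there is not
// exactly one, which covers both "no use" and "more than one use".
inline const int *
toy_find_single_use (const std::vector<int> &uses, int reg)
{
  const int *found = nullptr;
  for (const int &u : uses)
    if (u == reg)
      {
        if (found)
          return nullptr;   // second use: not a *single* use
        found = &u;
      }
  return found;             // nullptr here means no use at all
}
```

A caller therefore cannot tell the two null cases apart and has to treat null as "do not touch".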


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 11:22:12AM +0100, Richard Biener wrote:
> > > > Undo the combination if *cc_use_loc is not COMPARISON_P.

Why, anyway?  COMPARISON_P means things like LE.  It does not even
include actual RTX COMPARE.
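
To spell the distinction out, here is a heavily simplified model. The enums and `toy_comparison_p` are invented for illustration; the real classification comes from rtl.def, where the relational codes are split over more than one class.

```cpp
#include <cassert>

// Toy subset of RTL codes and classes; names are hypothetical.
enum toy_rtx_code { TOY_COMPARE, TOY_PLUS, TOY_LE, TOY_EQ };
enum toy_rtx_class { TOY_BIN_ARITH, TOY_COMM_ARITH, TOY_COMPARE_RESULT };

inline toy_rtx_class
toy_class_of (toy_rtx_code code)
{
  switch (code)
    {
    case TOY_LE:
    case TOY_EQ:
      return TOY_COMPARE_RESULT;   // relational operators
    case TOY_PLUS:
      return TOY_COMM_ARITH;
    case TOY_COMPARE:
    default:
      return TOY_BIN_ARITH;        // COMPARE is classed as plain arithmetic
    }
}

// Stand-in for COMPARISON_P: true only for relational operators.
inline bool
toy_comparison_p (toy_rtx_code code)
{
  return toy_class_of (code) == TOY_COMPARE_RESULT;
}
```

So a check of this kind accepts LE or EQ but rejects a COMPARE.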


Segher


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool
On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote:
> On Thu, 7 Mar 2024, Uros Bizjak wrote:
> > This is
> > 
> > 3236  /* Just replace the CC reg with a new mode.  */
> > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > 3238  undobuf.other_insn = cc_use_insn;
> > 
> > in combine.cc, where *cc_use_loc is
> > 
> > (unspec:DI [
> > (reg:CC 17 flags)
> > ] UNSPEC_PUSHFL)
> > 
> > combine assumes CC must be used inside of a comparison and uses XEXP (..., 0)

No.  It has established *this is the case* some time earlier.  Lines
3155 and on, what begins with
  /* Many machines have insns that can both perform an
 arithmetic operation and set the condition code.

> > OK for trunk?
> 
> Since you CCed me - looking at the code I wonder why we fatally fail.

I did not get this email btw.  Some blip in email (on the sender's side)
I guess?

> The following might also fix the issue and preserve more of the
> rest of the flow of the function.

> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>  
>if (undobuf.other_insn == 0
>   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> -   &cc_use_insn)))
> +   &cc_use_insn))
> + && COMPARISON_P (*cc_use_loc))

Line 3167 already is
  && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
so what in your backend is unusual?


Segher


Re: [PATCH] AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

2024-03-07 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>> It looks like this is really doing two things at once: disabling the
>> direct emission of LDP/STP Qs, and switching the GPR handling from using
>> pairs of DImode moves to single TImode moves.  At least, that seems to be
>> the effect of...
>
> No it still uses TImode for the !TARGET_SIMD case.
>
>> +   if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (size, 16))
>> + mode = mode_iter.require ();
>
>> ...hard-coding 16 here and...
>
> This only affects the Q register case.
>
>> -  if (size > 0 && size < copy_max / 2 && !STRICT_ALIGNMENT)
>> +  if (size > 0 && size < 16 && !STRICT_ALIGNMENT)
>
>> ...changing this limit from 8 to 16 for non-SIMD copies.
>>
>> Is that deliberate?  If so, please mention that kind of thing in the
>> covering note.  It sounded like this was intended to change the handling
>> of vector moves only.
>
> Yes it's deliberate. It now basically treats everything as blocks of 16 bytes
> which has a nice simplifying effect. I've added a note.
>
>> This means that, for GPRs, we are now effectively using the double-word
>> move patterns to get an LDP/STP indirectly, rather than directly as before.
>
> No, there is no difference here.
>
>> That seems OK, and I suppose might be slightly preferable to the current
>> code for things like:
>>
>>  char a[31], b[31];
>>  void f() { __builtin_memcpy(a, b, 31); }
>
> Yes an unaligned tail improves slightly by using blocks of 16 bytes.
> It's a very rare case, both -mgeneral-regs-only is rarely used, and most
> fixed-size copies are a nice multiple of 8.
>
>> But that raises the question: should we do the same thing for Q registers
>> and V2x16QImode?
>
> I don't believe it makes sense to use those complex types. And it likely
> blocks optimizations in a similar way as UNSPEC does.

A V2x16QImode move isn't particularly special as far as target-
independent code is concerned.  It's just an ordinary move of an
ordinary vector mode.  And the vector modes that we're picking here
generally have nothing to do with the source data.

But I'd forgotten about:

  /* On LE, for AdvSIMD, don't support anything other than POST_INC or
 REG addressing.  */
  if (advsimd_struct_p
  && TARGET_SIMD
  && !BYTES_BIG_ENDIAN
  && (code != POST_INC && code != REG))
return false;
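
Coming back to the 31-byte example: as I read the hunk, "blocks of 16 bytes" with an unaligned tail amounts to something like the sketch below. `toy_copy_blocks16` is a made-up helper that uses memcpy as a stand-in for a single 16-byte move; it is not the expander.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy N bytes (N >= 16) using only 16-byte block moves.  A tail shorter
// than 16 bytes is handled by one extra block that overlaps the previous
// one, instead of a cascade of 8/4/2/1-byte moves.
inline void
toy_copy_blocks16 (unsigned char *dst, const unsigned char *src, size_t n)
{
  assert (n >= 16);
  size_t off = 0;
  for (; off + 16 <= n; off += 16)
    std::memcpy (dst + off, src + off, 16);
  if (off < n)
    // Move the last block back so that it ends exactly at N.
    std::memcpy (dst + n - 16, src + n - 16, 16);
}
```

For n == 31 this is two moves, at offsets 0 and 15.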

> v2: Rebase to trunk
>
> The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC.
> Given the new LDP fusion pass is good at finding LDP opportunities, change the
> memcpy, memmove and memset expansions to emit single vector loads/stores.
> This fixes the regression and enables more RTL optimization on the standard
> memory accesses.  Handling of unaligned tail of memcpy/memmove is improved
> with -mgeneral-regs-only.  SPEC2017 performance improves slightly.  Codesize
> is a bit worse due to missed LDP opportunities as discussed in the PR.
>
> Passes regress, OK for commit?
>
> gcc/ChangeLog:
> PR target/113618
> * config/aarch64/aarch64.cc (aarch64_copy_one_block): Remove. 
> (aarch64_expand_cpymem): Emit single load/store only.
> (aarch64_set_one_block): Emit single stores only.
>
> gcc/testsuite/ChangeLog:
> PR target/113618
> * gcc.target/aarch64/pr113618.c: New test.

OK, thanks.

Richard

> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 16318bf925883ecedf9345e53fc0824a553b2747..0a28e033088a00818c6ed9fa8c15ecdee5a86c35 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -26465,33 +26465,6 @@ aarch64_progress_pointer (rtx pointer)
>return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
>  }
>  
> -typedef auto_vec<std::pair<rtx, rtx>, 12> copy_ops;
> -
> -/* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
> -static void
> -aarch64_copy_one_block (copy_ops &ops, rtx src, rtx dst,
> - int offset, machine_mode mode)
> -{
> -  /* Emit explict load/store pair instructions for 32-byte copies.  */
> -  if (known_eq (GET_MODE_SIZE (mode), 32))
> -{
> -  mode = V4SImode;
> -  rtx src1 = adjust_address (src, mode, offset);
> -  rtx dst1 = adjust_address (dst, mode, offset);
> -  rtx reg1 = gen_reg_rtx (mode);
> -  rtx reg2 = gen_reg_rtx (mode);
> -  rtx load = aarch64_gen_load_pair (reg1, reg2, src1);
> -  rtx store = aarch64_gen_store_pair (dst1, reg1, reg2);
> -  ops.safe_push ({ load, store });
> -  return;
> -}
> -
> -  rtx reg = gen_reg_rtx (mode);
> -  rtx load = gen_move_insn (reg, adjust_address (src, mode, offset));
> -  rtx store = gen_move_insn (adjust_address (dst, mode, offset), reg);
> -  ops.safe_push ({ load, store });
> -}
> -
>  /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
> from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
> rather than memcpy.  Return true iff we succeeded.  */
> @@ -26527,7 +26500,7 @@ 

Re: [PATCH] aarch64: Fix costing of manual bfi instructions

2024-03-07 Thread Richard Sandiford
Andrew Pinski  writes:
> This fixes the cost model for BFI instructions which don't
> use directly zero_extract on the LHS.
> aarch64_bfi_rtx_p does the heavy lifting by matching of
> the patterns.
>
> Note this alone does not fix PR 107270, it is a step in the right
> direction. There we get a zero_extend for the non-shifted part
> which we don't currently match.
>
> Built and tested on aarch64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_bfi_rtx_p): New function.
>   (aarch64_rtx_costs): For IOR, try calling aarch64_bfi_rtx_p.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 94 +++
>  1 file changed, 94 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3d8341c17fe..dc5c5c23cb3 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -13776,6 +13776,90 @@ aarch64_extr_rtx_p (rtx x, rtx *res_op0, rtx *res_op1)
>return false;
>  }
>  
> +/* Return true iff X is an rtx that will match an bfi instruction
> +   i.e. as described in the *aarch64_bfi5 family of patterns.
> +   OP0 and OP1 will be set to the operands of the insert involved
> +   on success and will be NULL_RTX otherwise.  */
> +
> +static bool
> +aarch64_bfi_rtx_p (rtx x, rtx *res_op0, rtx *res_op1)

I think it'd be slightly neater to pass an XEXP index in as well,
and use it...

> +{
> +  rtx op0, op1;
> +  scalar_int_mode mode;
> +
> +  *res_op0 = NULL_RTX;
> +  *res_op1 = NULL_RTX;
> +  if (!is_a <scalar_int_mode> (GET_MODE (x), &mode))
> +return false;
> +
> +  if (GET_CODE (x) != IOR)
> +return false;
> +
> +  unsigned HOST_WIDE_INT mask1;
> +  unsigned HOST_WIDE_INT shft_amnt;
> +  unsigned HOST_WIDE_INT mask2;
> +  rtx shiftop;
> +
> +  rtx iop0 = XEXP (x, 0);
> +  rtx iop1 = XEXP (x, 1);

...here as opno and 1 - opno.  That way we don't need to...

> +
> +  if (GET_CODE (iop0) == AND
> +  && CONST_INT_P (XEXP (iop0, 1))
> +  && GET_CODE (XEXP (iop0, 0)) != ASHIFT)
> +{
> +  op0 = XEXP (iop0, 0);
> +  mask1 = UINTVAL (XEXP (iop0, 1));
> +  shiftop = iop1;
> +}
> +  else if (GET_CODE (iop1) == AND
> +  && CONST_INT_P (XEXP (iop1, 1))
> +  && GET_CODE (XEXP (iop1, 0)) != ASHIFT)
> +{
> +  op0 = XEXP (iop1, 0);
> +  mask1 = UINTVAL (XEXP (iop1, 1));
> +  shiftop = iop0;
> +}
> +  else
> +return false;

...handle this both ways, and don't need to exclude ASHIFT.

Maybe some variation on "insert_op" would be better than "shiftop",
since the operand might not include a shift.
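
Roughly what I have in mind for the operand-order part, on a toy expression type rather than rtx. Everything here (`toy_expr`, `toy_match_one_way`, `toy_match`) is made up for illustration, and the sketch leaves out the mask and shift validation entirely.

```cpp
#include <cassert>

// Toy expression node; the real code works on rtx and XEXP.
enum toy_rtx_code { TOY_REG, TOY_CONST, TOY_AND, TOY_ASHIFT, TOY_IOR };
struct toy_expr
{
  toy_rtx_code code;
  const toy_expr *op[2];   // operands, unused for leaves
  unsigned long value;     // constant value for TOY_CONST
};

// Try to match X as an IOR whose operand OPNO is (and A const) and whose
// operand 1 - OPNO is the value being inserted.
inline bool
toy_match_one_way (const toy_expr *x, int opno,
                   const toy_expr **masked, const toy_expr **insert_op)
{
  if (x->code != TOY_IOR)
    return false;
  const toy_expr *and_op = x->op[opno];
  if (and_op->code != TOY_AND || and_op->op[1]->code != TOY_CONST)
    return false;
  *masked = and_op->op[0];
  *insert_op = x->op[1 - opno];
  return true;
}

// The caller tries both operand orders instead of duplicating the logic.
inline bool
toy_match (const toy_expr *x,
           const toy_expr **masked, const toy_expr **insert_op)
{
  return (toy_match_one_way (x, 0, masked, insert_op)
          || toy_match_one_way (x, 1, masked, insert_op));
}
```

The matcher body is written once, and the two operand orders become two calls.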

Looks generally good to me otherwise FWIW, but obviously GCC 15 material.

Thanks,
Richard

> +
> +  /* Shifted with no mask. */
> +  if (GET_CODE (shiftop) == ASHIFT
> +  && CONST_INT_P (XEXP (shiftop, 1)))
> +{
> +  shft_amnt = UINTVAL (XEXP (shiftop, 1));
> +  mask2 = HOST_WIDE_INT_M1U << shft_amnt;
> +  op1 = XEXP (shiftop, 0);
> +}
> +   else if (GET_CODE (shiftop) == AND
> + && CONST_INT_P (XEXP (shiftop, 1)))
> +{
> +  mask2 = UINTVAL (XEXP (shiftop, 1));
> +  if (GET_CODE (XEXP (shiftop, 0)) == ASHIFT
> +   && CONST_INT_P (XEXP (XEXP (shiftop, 0), 1)))
> + {
> +   op1 = XEXP (XEXP (shiftop, 0), 0);
> +   shft_amnt = UINTVAL (XEXP (XEXP (shiftop, 0), 1));
> + }
> +  else
> + {
> +   op1 = XEXP (shiftop, 0);
> +   shft_amnt = 0;
> + }
> +}
> +  else
> +return false;
> +
> +  if (shft_amnt >= GET_MODE_BITSIZE (mode))
> +return false;
> +
> +  if (!aarch64_masks_and_shift_for_bfi_p (mode, mask1, shft_amnt, mask2))
> +return false;
> +
> +  *res_op0 = op0;
> +  *res_op1 = op1;
> +  return true;
> +}
> +
>  /* Calculate the cost of calculating (if_then_else (OP0) (OP1) (OP2)),
> storing it in *COST.  Result is true if the total cost of the operation
> has now been calculated.  */
> @@ -14662,6 +14746,16 @@ cost_plus:
> return true;
>  }
>  
> +  if (aarch64_bfi_rtx_p (x, &op0, &op1))
> + {
> +   *cost += rtx_cost (op0, mode, IOR, 0, speed);
> +   *cost += rtx_cost (op0, mode, IOR, 1, speed);
> +   if (speed)
> + *cost += extra_cost->alu.bfi;
> +
> +   return true;
> + }
> +
>if (aarch64_extr_rtx_p (x, &op0, &op1))
>  {
> *cost += rtx_cost (op0, mode, IOR, 0, speed);


[PATCH] Fix libcc1plugin and libcp1plugin to avoid poisoned identifiers

2024-03-07 Thread Dimitry Andric
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632

Use INCLUDE_VECTOR before including system.h, instead of directly
including <vector>, to avoid running into poisoned identifiers.
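
The ordering issue can be shown in a single file. The sketch below is a toy stand-in for the relevant part of system.h, not a copy of it: it pulls in <vector> on request and only afterwards poisons an identifier. `toy_raw_alloc` and `toy_first_n` are made-up names used purely for illustration.

```cpp
#include <cassert>

// Toy stand-in for the relevant part of gcc/system.h.
#define INCLUDE_VECTOR            // request <vector>, as the patch does

#ifdef INCLUDE_VECTOR
# include <vector>                // included before anything is poisoned
#endif

// After this point the named token may not appear again in this file.
// toy_raw_alloc is a made-up identifier used only for this illustration.
#pragma GCC poison toy_raw_alloc

// A header included from here on that mentioned the poisoned token would
// be rejected; <vector> was already processed above, so using it is fine.
inline std::vector<int>
toy_first_n (int n)
{
  assert (n >= 0);
  std::vector<int> v;
  for (int i = 0; i < n; i++)
    v.push_back (i);
  return v;
}
```

Including a standard header after the poisoning point is what can trip over identifiers it uses internally.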

Signed-off-by: Dimitry Andric 
---
 libcc1/libcc1plugin.cc | 3 +--
 libcc1/libcp1plugin.cc | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
index 72d17c3b81c..e64847466f4 100644
--- a/libcc1/libcc1plugin.cc
+++ b/libcc1/libcc1plugin.cc
@@ -32,6 +32,7 @@
 #undef PACKAGE_VERSION
 
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "gcc-plugin.h"
 #include "system.h"
 #include "coretypes.h"
@@ -69,8 +70,6 @@
 #include "gcc-c-interface.h"
 #include "context.hh"
 
-#include <vector>
-
 using namespace cc1_plugin;
 
 
diff --git a/libcc1/libcp1plugin.cc b/libcc1/libcp1plugin.cc
index 0eff7c68d29..da68c5d0ac1 100644
--- a/libcc1/libcp1plugin.cc
+++ b/libcc1/libcp1plugin.cc
@@ -33,6 +33,7 @@
 #undef PACKAGE_VERSION
 
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "gcc-plugin.h"
 #include "system.h"
 #include "coretypes.h"
@@ -71,8 +72,6 @@
 #include "rpc.hh"
 #include "context.hh"
 
-#include <vector>
-
 using namespace cc1_plugin;
 
 
-- 
2.43.2



Re: [PATCH 2/2] aarch64: Support `{1.0f, 1.0f, 0.0, 0.0}` CST forming with fmov with a smaller vector type.

2024-03-07 Thread Richard Sandiford
Andrew Pinski  writes:
> This enables construction of V4SF CST like `{1.0f, 1.0f, 0.0f, 0.0f}`
> (and other fp enabled CSTs) by using `fmov v0.2s, 1.0` as the instruction
> is designed to zero out the other bits.
> This is a small extension on top of the code that creates fmov for the case
> where all but the first element are zero.

Similarly to the second reply to 1/2, I think we should handle this
by detecting when only the low 64 bits are nonzero, and then try to
construct a simd_immediate_info for the low 64 bits.  The technique
is more general than just floats.

The same thing would work for SVE too (if TARGET_SIMD).
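
As a rough sketch of the detection step only, assuming the constant is available as a little-endian byte image of a 128-bit vector (that representation and the name `toy_high_half_zero_p` are assumptions for this illustration, not existing aarch64.cc code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy check on the little-endian byte image of a 128-bit vector constant:
// returns true when every byte above the low 64 bits is zero, in which
// case only the low half would need to be materialised.
inline bool
toy_high_half_zero_p (const uint8_t (&bytes)[16])
{
  for (size_t i = 8; i < 16; i++)
    if (bytes[i] != 0)
      return false;
  return true;
}
```

The low 64 bits could then be handed to whatever already validates 64-bit immediates, independently of the element type.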

Thanks,
Richard

> Built and tested for aarch64-linux-gnu with no regressions.
>
>   PR target/113856
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (simd_immediate_info): Add bool to the
>   float mode constructor. Document modifier field for FMOV_SDH.
>   (aarch64_simd_valid_immediate): Recognize where the first half
>   of the const float vect is the same.
>   (aarch64_output_simd_mov_immediate): Handle the case where insn is
>   FMOV_SDH and modifier is MSL.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/fmov-zero-cst-3.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 34 ---
>  .../gcc.target/aarch64/fmov-zero-cst-3.c  | 28 +++
>  2 files changed, 57 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-zero-cst-3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c4386591a9b..89bd0c5e5a6 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -130,7 +130,7 @@ struct simd_immediate_info
>enum modifier_type { LSL, MSL };
>  
>simd_immediate_info () {}
> -  simd_immediate_info (scalar_float_mode, rtx, insn_type = MOV);
> +  simd_immediate_info (scalar_float_mode, rtx, insn_type = MOV, bool = false);
>simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
>  insn_type = MOV, modifier_type = LSL,
>  unsigned int = 0);
> @@ -153,6 +153,8 @@ struct simd_immediate_info
>  
>/* The kind of shift modifier to use, and the number of bits to shift.
>This is (LSL, 0) if no shift is needed.  */
> +  /* For FMOV_SDH, LSL says it is a single while MSL
> +  says if it is either .4h/.2s fmov. */
>modifier_type modifier;
>unsigned int shift;
>  } mov;
> @@ -173,12 +175,12 @@ struct simd_immediate_info
>  /* Construct a floating-point immediate in which each element has mode
> ELT_MODE_IN and value VALUE_IN.  */
>  inline simd_immediate_info
> -::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in, insn_type insn_in)
> +::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in, insn_type insn_in, bool firsthalfsame)
>: elt_mode (elt_mode_in), insn (insn_in)
>  {
>gcc_assert (insn_in == MOV || insn_in == FMOV_SDH);
>u.mov.value = value_in;
> -  u.mov.modifier = LSL;
> +  u.mov.modifier = firsthalfsame ? MSL : LSL;
>u.mov.shift = 0;
>  }
>  
> @@ -22944,10 +22946,23 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> || aarch64_float_const_representable_p (elt))
>   {
> bool valid = true;
> +   bool firsthalfsame = false;
> for (unsigned int i = 1; i < n_elts; i++)
>   {
> rtx elt1 = CONST_VECTOR_ENCODED_ELT (op, i);
> if (!aarch64_float_const_zero_rtx_p (elt1))
> + {
> +   if (i == 1)
> + firsthalfsame = true;
> +   if (!firsthalfsame
> +   || i >= n_elts/2
> +   || !rtx_equal_p (elt, elt1))
> + {
> +   valid = false;
> +   break;
> + }
> + }
> +   else if (firsthalfsame && i < n_elts/2)
>   {
> valid = false;
> break;
> @@ -22957,7 +22972,8 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
>   {
> if (info)
>   *info = simd_immediate_info (elt_float_mode, elt,
> -  simd_immediate_info::FMOV_SDH);
> +  simd_immediate_info::FMOV_SDH,
> +  firsthalfsame);
> return true;
>   }
>   }
> @@ -25165,8 +25181,16 @@ aarch64_output_simd_mov_immediate (rtx const_vector, unsigned width,
> real_to_decimal_for_mode (float_buf,
>   CONST_DOUBLE_REAL_VALUE (info.u.mov.value),
>   buf_size, buf_size, 1, info.elt_mode);
> -   if (info.insn == simd_immediate_info::FMOV_SDH)
> +   if (info.insn == simd_immediate_info::FMOV_SDH
> +   && info.u.mov.modifier 

Re: [PATCH 1/2] aarch64: Use fmov s/d/hN, FP_CST for some vector CST [PR113856]

2024-03-07 Thread Richard Sandiford
Richard Sandiford  writes:
> Andrew Pinski  writes:
>> Aarch64 has a way to form some floating point CSTs via the fmov instructions,
>> these instructions also zero out the upper parts of the registers so they can
>> be used for vector CSTs that have one non-zero constant that would be able
>> to be formed via the fmov in the first element.
>>
>> This implements this "small" optimization so these vector cst don't need to do
>> loads from memory.
>>
>> Built and tested on aarch64-linux-gnu with no regressions.
>>
>>  PR target/113856
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.cc (struct simd_immediate_info):
>>  Add FMOV_SDH to insn_type. For scalar_float_mode constructor
>>  add insn_in.
>>  (aarch64_simd_valid_immediate): Catch `{fp, 0...}` vector_cst
>>  and return a simd_immediate_info which uses FMOV_SDH.
>>  (aarch64_output_simd_mov_immediate): Support outputting
>>  fmov for FMOV_SDH.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/fmov-zero-cst-1.c: New test.
>>  * gcc.target/aarch64/fmov-zero-cst-2.c: New test.
>>
>> Signed-off-by: Andrew Pinski 
>> ---
>>  gcc/config/aarch64/aarch64.cc | 48 ++---
>>  .../gcc.target/aarch64/fmov-zero-cst-1.c  | 52 +++
>>  .../gcc.target/aarch64/fmov-zero-cst-2.c  | 19 +++
>>  3 files changed, 111 insertions(+), 8 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-zero-cst-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-zero-cst-2.c
>>
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 5dd0814f198..c4386591a9b 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -126,11 +126,11 @@ constexpr auto AARCH64_STATE_OUT = 1U << 2;
>>  /* Information about a legitimate vector immediate operand.  */
>>  struct simd_immediate_info
>>  {
>> -  enum insn_type { MOV, MVN, INDEX, PTRUE };
>> +  enum insn_type { MOV, FMOV_SDH, MVN, INDEX, PTRUE };
>>enum modifier_type { LSL, MSL };
>>  
>>simd_immediate_info () {}
>> -  simd_immediate_info (scalar_float_mode, rtx);
>> +  simd_immediate_info (scalar_float_mode, rtx, insn_type = MOV);
>>simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
>> insn_type = MOV, modifier_type = LSL,
>> unsigned int = 0);
>> @@ -145,7 +145,7 @@ struct simd_immediate_info
>>  
>>union
>>{
>> -/* For MOV and MVN.  */
>> +/* For MOV, FMOV_SDH and MVN.  */
>>  struct
>>  {
>>/* The value of each element.  */
>> @@ -173,9 +173,10 @@ struct simd_immediate_info
>>  /* Construct a floating-point immediate in which each element has mode
>> ELT_MODE_IN and value VALUE_IN.  */
>>  inline simd_immediate_info
>> -::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
>> -  : elt_mode (elt_mode_in), insn (MOV)
>> +::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in, insn_type insn_in)
>
> Nit: long line.
>
>> +  : elt_mode (elt_mode_in), insn (insn_in)
>>  {
>> +  gcc_assert (insn_in == MOV || insn_in == FMOV_SDH);
>>u.mov.value = value_in;
>>u.mov.modifier = LSL;
>>u.mov.shift = 0;
>> @@ -22932,6 +22933,35 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
>>return true;
>>  }
>>  }
>> +  /* See if we can use fmov d0/s0/h0 ... for the constant. */
>> +  if (n_elts >= 1
>
> This condition seems unnecessary.  n_elts can't be zero.
>
>> +  && (vec_flags & VEC_ADVSIMD)
>> +  && is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
>> +  && !CONST_VECTOR_DUPLICATE_P (op))
>
> I think we should also drop this.  I guess it's to undo:
>
>   if (CONST_VECTOR_P (op)
>   && CONST_VECTOR_DUPLICATE_P (op))
> n_elts = CONST_VECTOR_NPATTERNS (op);
>
> but we can use GET_MODE_NUNITS (mode) directly instead.
>
>> +{
>> +  rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
>> +  if (aarch64_float_const_zero_rtx_p (elt)
>> +  || aarch64_float_const_representable_p (elt))
>
> What's the effect of including aarch64_float_const_zero_rtx_p for the
> first element?  Does it change the code we generate for any cases
> involving +0.0?  Or is it more for -0.0?
>
>> +{
>> +  bool valid = true;
>> +  for (unsigned int i = 1; i < n_elts; i++)
>> +{
>> +  rtx elt1 = CONST_VECTOR_ENCODED_ELT (op, i);
>> +  if (!aarch64_float_const_zero_rtx_p (elt1))
>> +{
>> +  valid = false;
>> +  break;
>> +}
>> +}
>> +  if (valid)
>> +{
>> +  if (info)
>> +*info = simd_immediate_info (elt_float_mode, elt,
>> + simd_immediate_info::FMOV_SDH);
>> +  return true;
>> +}
>> +}
>> +}

Sorry to reply to myself almost immediately, but did you consider extending:

  scalar_float_mode elt_float_mode;
  if (n_elts == 1
   

Re: [PATCH 1/2] aarch64: Use fmov s/d/hN, FP_CST for some vector CST [PR113856]

2024-03-07 Thread Richard Sandiford
Andrew Pinski  writes:
> Aarch64 has a way to form some floating point CSTs via the fmov instructions,
> these instructions also zero out the upper parts of the registers so they can
> be used for vector CSTs that have one non-zero constant that would be able
> to be formed via the fmov in the first element.
>
> This implements this "small" optimization so these vector cst don't need to do
> loads from memory.
>
> Built and tested on aarch64-linux-gnu with no regressions.
>
>   PR target/113856
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (struct simd_immediate_info):
>   Add FMOV_SDH to insn_type. For scalar_float_mode constructor
>   add insn_in.
>   (aarch64_simd_valid_immediate): Catch `{fp, 0...}` vector_cst
>   and return a simd_immediate_info which uses FMOV_SDH.
>   (aarch64_output_simd_mov_immediate): Support outputting
>   fmov for FMOV_SDH.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/fmov-zero-cst-1.c: New test.
>   * gcc.target/aarch64/fmov-zero-cst-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 48 ++---
>  .../gcc.target/aarch64/fmov-zero-cst-1.c  | 52 +++
>  .../gcc.target/aarch64/fmov-zero-cst-2.c  | 19 +++
>  3 files changed, 111 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-zero-cst-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-zero-cst-2.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5dd0814f198..c4386591a9b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -126,11 +126,11 @@ constexpr auto AARCH64_STATE_OUT = 1U << 2;
>  /* Information about a legitimate vector immediate operand.  */
>  struct simd_immediate_info
>  {
> -  enum insn_type { MOV, MVN, INDEX, PTRUE };
> +  enum insn_type { MOV, FMOV_SDH, MVN, INDEX, PTRUE };
>enum modifier_type { LSL, MSL };
>  
>simd_immediate_info () {}
> -  simd_immediate_info (scalar_float_mode, rtx);
> +  simd_immediate_info (scalar_float_mode, rtx, insn_type = MOV);
>simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
>  insn_type = MOV, modifier_type = LSL,
>  unsigned int = 0);
> @@ -145,7 +145,7 @@ struct simd_immediate_info
>  
>union
>{
> -/* For MOV and MVN.  */
> +/* For MOV, FMOV_SDH and MVN.  */
>  struct
>  {
>/* The value of each element.  */
> @@ -173,9 +173,10 @@ struct simd_immediate_info
>  /* Construct a floating-point immediate in which each element has mode
> ELT_MODE_IN and value VALUE_IN.  */
>  inline simd_immediate_info
> -::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
> -  : elt_mode (elt_mode_in), insn (MOV)
> +::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in, insn_type insn_in)

Nit: long line.

> +  : elt_mode (elt_mode_in), insn (insn_in)
>  {
> +  gcc_assert (insn_in == MOV || insn_in == FMOV_SDH);
>u.mov.value = value_in;
>u.mov.modifier = LSL;
>u.mov.shift = 0;
> @@ -22932,6 +22933,35 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> return true;
>   }
>  }
> +  /* See if we can use fmov d0/s0/h0 ... for the constant. */
> +  if (n_elts >= 1

This condition seems unnecessary.  n_elts can't be zero.

> +  && (vec_flags & VEC_ADVSIMD)
> +  && is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
> +  && !CONST_VECTOR_DUPLICATE_P (op))

I think we should also drop this.  I guess it's to undo:

  if (CONST_VECTOR_P (op)
  && CONST_VECTOR_DUPLICATE_P (op))
n_elts = CONST_VECTOR_NPATTERNS (op);

but we can use GET_MODE_NUNITS (mode) directly instead.

> +{
> +  rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
> +  if (aarch64_float_const_zero_rtx_p (elt)
> +   || aarch64_float_const_representable_p (elt))

What's the effect of including aarch64_float_const_zero_rtx_p for the
first element?  Does it change the code we generate for any cases
involving +0.0?  Or is it more for -0.0?
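
Background for the question rather than an answer to it: the two zeros compare equal but have different bit patterns, so only +0.0 is an all-zero-bits constant. The helper below is a made-up illustration and assumes 32-bit IEEE floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the bit pattern of a float (assumes 32-bit IEEE-754 floats).
inline uint32_t
toy_float_bits (float f)
{
  static_assert (sizeof (float) == sizeof (uint32_t), "expects 32-bit float");
  uint32_t bits;
  std::memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Here `toy_float_bits (-0.0f)` has only the sign bit set, whereas `toy_float_bits (0.0f)` is zero.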

> + {
> +   bool valid = true;
> +   for (unsigned int i = 1; i < n_elts; i++)
> + {
> +   rtx elt1 = CONST_VECTOR_ENCODED_ELT (op, i);
> +   if (!aarch64_float_const_zero_rtx_p (elt1))
> + {
> +   valid = false;
> +   break;
> + }
> + }
> +   if (valid)
> + {
> +   if (info)
> + *info = simd_immediate_info (elt_float_mode, elt,
> +  simd_immediate_info::FMOV_SDH);
> +   return true;
> + }
> + }
> +}
>  
>/* If all elements in an SVE vector have the same value, we have a free
>   choice between using the element mode and using the container mode.
> @@ -25121,7 +25151,8 @@ aarch64_output_simd_mov_immediate (rtx const_vector, unsigned width,
>  
>if 

Re: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

2024-03-07 Thread Patrick Palka
On Wed, 6 Mar 2024, Jason Merrill wrote:

> On 3/4/24 17:26, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > OK for trunk?
> > 
> > -- >8 --
> > 
> > Alias templates are weird in that their specializations can appear in
> > both decl_specializations and type_specializations.  They appear in the
> > latter only at parse time via finish_template_type.  This should probably
> > be revisited in GCC 15 since it seems sufficient to store them only in
> > decl_specializations.
> 
> It looks like most all of lookup_template_class is wrong for alias templates.
> 
> Can we move the alias template handling up higher and unconditionally return
> the result of tsubst?

This works nicely (although we have to use instantiate_alias_template
directly instead of tsubst since tsubst would first substitute the
uncoerced arguments into the generic DECL_TI_ARGS which breaks for
parameter packs).  And it allows for some nice simplifications in
the modules code which had to handle alias template specializations
specially.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Subject: [PATCH] c++/modules: member alias tmpl partial inst [PR103994]

Alias templates are weird in that their specializations can appear in
both decl_specializations and type_specializations.  They appear in the
type table only at parse time via finish_template_type.  There seems
to be no good reason for this, and the code paths end up stepping over
each other in particular for a partial alias template instantiation such
as A::key_arg in the below modules testcase: the type code path
(lookup_template_class) wants to set TI_TEMPLATE to the most general
template whereas the decl code path (tsubst_template_decl called during
instantiation of A) already set TI_TEMPLATE to the partially
instantiated TEMPLATE_DECL.  This ends up confusing modules which
decides to stream the logically equivalent TYPE_DECL and TEMPLATE_DECL
for this partial alias template instantiation separately.

This patch fixes this by making lookup_template_class dispatch to
instantiate_alias_template early for alias template specializations.
In turn we now only add such specializations to the decl table and
not also the type table.  This admits some nice simplification in
the modules code which otherwise has to cope with such specializations
appearing in both tables.
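
For readers without the testcase at hand, here is a minimal sketch of what a
partially instantiated member alias template looks like.  A and key_arg follow
the names used in the description above, but the definitions here and the
make_key helper are assumptions made for illustration.  The snippet does not
use modules, so it does not reproduce the PR; it only shows the construct.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Illustrative only: a class template with a member alias template.
template<class T>
struct A {
  // Once A<int> is instantiated, A<int>::key_arg is an alias template whose
  // enclosing parameter T is already substituted while K is still open.
  template<class K>
  using key_arg = std::pair<T, K>;
};

// An alias template specialization is transparent: it names the aliased type.
static_assert (std::is_same<A<int>::key_arg<long>, std::pair<int, long>>::value,
	       "alias specialization names the underlying type");

// Hypothetical helper that uses the member alias in a dependent context,
// where both 'typename' and 'template' are required.
template<class T, class K>
typename A<T>::template key_arg<K>
make_key (T t, K k)
{
  return {t, k};
}
```

The point of interest for the patch is the intermediate entity: key_arg as a
member of A<int> is itself still a template, which is the partial
instantiation referred to above.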

PR c++/103994

gcc/cp/ChangeLog:

* cp-tree.h (add_mergeable_specialization): Remove is_alias
parameter.
* module.cc (depset::disc_bits::DB_ALIAS_SPEC_BIT): Remove.
(depset::is_alias): Remove.
(merge_kind::MK_tmpl_alias_mask): Remove.
(merge_kind::MK_alias_spec): Remove.
(merge_kind_name): Remove entries for alias specializations.
(trees_in::decl_value): Adjust add_mergeable_specialization
calls.
(trees_out::get_merge_kind) :
Use MK_decl_spec for alias template specializations.
(trees_out::key_mergeable): Simplify after MK_tmpl_alias_mask
removal.
(specialization_add): Don't allow alias templates when !decl_p.
(depset::hash::add_specializations): Remove now-dead code
accommodating alias template specializations in the type table.
* pt.cc (lookup_template_class): Dispatch early to
instantiate_alias_template for alias templates.  Simplify
accordingly.
(add_mergeable_specialization): Remove alias_p parameter and
simplify accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99425-1_b.H: s/alias/decl in dump scan.
* g++.dg/modules/tpl-alias-1_a.H: Likewise.
* g++.dg/modules/tpl-alias-2_a.H: New test.
* g++.dg/modules/tpl-alias-2_b.C: New test.
---
 gcc/cp/cp-tree.h |  3 +-
 gcc/cp/module.cc | 50 ++--
 gcc/cp/pt.cc | 84 
 gcc/testsuite/g++.dg/modules/pr99425-1_b.H   |  2 +-
 gcc/testsuite/g++.dg/modules/tpl-alias-1_a.H |  2 +-
 gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H | 15 
 gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C |  9 +++
 7 files changed, 69 insertions(+), 96 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-alias-2_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4469d965ef0..14895bc6585 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7642,8 +7642,7 @@ extern void walk_specializations  (bool,
 void *);
 extern tree match_mergeable_specialization (bool is_decl, spec_entry *);
 extern unsigned get_mergeable_specialization_flags (tree tmpl, tree spec);
-extern void add_mergeable_specialization(bool is_decl, bool is_alias,
-spec_entry *,
+extern void add_mergeable_specialization(bool is_decl, spec_entry *,

Re: [PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)

2024-03-07 Thread Jan Hubicka
> On Thu, Feb 08 2024, Martin Jambor wrote:
> > Hi,
> >
> > PR 113757 shows that the code which was meant to debug-reset and
> > remove SSAs defined by LHSs of calls redirected to
> > __builtin_unreachable can trigger also when speculative
> > devirtualization creates a call to a noreturn function (and since it
> > is noreturn, it does not bother dealing with its return value).
> >
> > What is more, it seems that the code handling this case is not really
> > necessary.  I feel slightly idiotic about this because I have a
> > feeling that I added it because of a failing test-case but I can
> > neither find the testcase nor a reason why the code in
> > cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
> > turns the SSA name into a default-def, a bit like IPA-SRA, but any
> > code dominated by a call to a noreturn is not dangerous when it comes
> > to its side-effects).  So this patch just removes the handling.
> >
> > Bootstrapped and tested on x86_64-linux and ppc64le-linux.  I have also
> > LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux.
> >
> > OK for master?
> >
> > Thanks,
> >
> > Martin
> >
> >
> > gcc/ChangeLog:
> >
> > 2024-02-07  Martin Jambor  
> >
> > PR ipa/113757
> > * tree-inline.cc (redirect_all_calls): Remove code adding SSAs to
> > id->killed_new_ssa_names.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2024-02-07  Martin Jambor  
> >
> > PR ipa/113757
> > * g++.dg/ipa/pr113757.C: New test.
OK,
thanks!
Honza
> > ---
> >  gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++
> >  gcc/tree-inline.cc  | 14 ++
> >  2 files changed, 16 insertions(+), 12 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C
> >
> > diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C b/gcc/testsuite/g++.dg/ipa/pr113757.C
> > new file mode 100644
> > index 000..885d4010a10
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ipa/pr113757.C
> > @@ -0,0 +1,14 @@
> > +// { dg-do compile }
> > +// { dg-options "-O2 -fPIC" }
> > +// { dg-require-effective-target fpic }
> > +
> > +long size();
> > +struct ll {  virtual int hh();  };
> > +ll  *slice_owner;
> > +int ll::hh() { __builtin_exit(0); }
> > +int nn() {
> > +  if (size())
> > +return 0;
> > +  return slice_owner->hh();
> > +}
> > +int (*a)() = nn;
> > diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> > index 75c10eb7dfc..cac41b4f031 100644
> > --- a/gcc/tree-inline.cc
> > +++ b/gcc/tree-inline.cc
> > @@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block bb)
> >gimple *stmt = gsi_stmt (si);
> >if (is_gimple_call (stmt))
> > {
> > - tree old_lhs = gimple_call_lhs (stmt);
> >   struct cgraph_edge *edge = id->dst_node->get_edge (stmt);
> >   if (edge)
> > {
> >   if (!id->killed_new_ssa_names)
> > id->killed_new_ssa_names = new hash_set (16);
> > - gimple *new_stmt
> > -   = cgraph_edge::redirect_call_stmt_to_callee (edge,
> > -   id->killed_new_ssa_names);
> > - if (old_lhs
> > - && TREE_CODE (old_lhs) == SSA_NAME
> > - && !gimple_call_lhs (new_stmt))
> > -   /* In case of IPA-SRA removing the LHS, the name should have
> > -  been already added to the hash.  But in case of redirecting
> > -  to builtin_unreachable it was not and the name still should
> > -  be pruned from debug statements.  */
> > -   id->killed_new_ssa_names->add (old_lhs);
> > + cgraph_edge::redirect_call_stmt_to_callee (edge,
> > +   id->killed_new_ssa_names);
> >  
> >   if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt))
> > gimple_purge_dead_eh_edges (bb);
> > -- 
> > 2.43.0


[PING^1][PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-03-07 Thread jeevitha
Ping!

please review.

Thanks & Regards
Jeevitha

On 23/02/24 3:04 pm, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PTImode attribute assists in generating even/odd register pairs on 128 bits.
> When the user specifies PTImode as an attribute, it breaks because there is no
> internal type to handle this mode.  We have created a tree node with a dummy
> type to handle PTImode.  We are not documenting this dummy type since users
> are not allowed to use this type externally.
> 
> 2024-02-23  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106895
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.
> 
> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 6698274031b..f553c72779e 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -756,6 +756,15 @@ rs6000_init_builtins (void)
>else
>  ieee128_float_type_node = NULL_TREE;
>  
> +  /* PTImode to get even/odd register pairs.  */
> +  intPTI_type_internal_node = make_node(INTEGER_TYPE);
> +  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
> +  layout_type (intPTI_type_internal_node);
> +  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
> +  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
> +   "__dummypti");
> +
>/* Vector pair and vector quad support.  */
>vector_pair_type_node = make_node (OPAQUE_TYPE);
>SET_TYPE_MODE (vector_pair_type_node, OOmode);
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 2291fe8d3a3..77bb937a28b 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2304,6 +2304,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_ptr_vector_quad,
>RS6000_BTI_ptr_long_long,
>RS6000_BTI_ptr_long_long_unsigned,
> +  RS6000_BTI_PTI,
>RS6000_BTI_MAX
>  };
>  
> @@ -2348,6 +2349,7 @@ enum rs6000_builtin_type_index
>  #define uintDI_type_internal_node (rs6000_builtin_types[RS6000_BTI_UINTDI])
>  #define intTI_type_internal_node  (rs6000_builtin_types[RS6000_BTI_INTTI])
>  #define uintTI_type_internal_node (rs6000_builtin_types[RS6000_BTI_UINTTI])
> +#define intPTI_type_internal_node (rs6000_builtin_types[RS6000_BTI_PTI])
>  #define float_type_internal_node  (rs6000_builtin_types[RS6000_BTI_float])
>  #define double_type_internal_node (rs6000_builtin_types[RS6000_BTI_double])
>  #define long_double_type_internal_node (rs6000_builtin_types[RS6000_BTI_long_double])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> new file mode 100644
> index 000..56547b7fa9d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> @@ -0,0 +1,15 @@
> +/* PR target/106895 */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify the following generates even/odd register pairs.  */
> +
> +typedef __int128 pti __attribute__((mode(PTI)));
> +
> +void
> +set128 (pti val, pti *mem)
> +{
> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
> +}
> +
> +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */
> 
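
As a reading aid for the dg-final line in the quoted test, the sketch below
models which register operands the scan pattern accepts.  It uses std::regex
instead of the Tcl regexp that DejaGnu actually runs, and scan_matches is a
made-up helper, so treat it as an approximation under those assumptions.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The dg-final pattern with the Tcl backslash quoting removed.  Like
// scan-assembler, this is an unanchored search over the given text.
bool
scan_matches (const std::string &asm_line)
{
  static const std::regex pattern ("stq [123]?[02468]");
  return std::regex_search (asm_line, pattern);
}
```

With this model, "stq 10,0(4)" matches and "stq 11,0(4)" does not.  One thing
the sketch makes visible is that the pattern is not anchored after the
register number, so an operand such as 21 would also match through the prefix
"stq 2".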


[PING^1][PATCH] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-03-07 Thread jeevitha
Ping!

please review.

Thanks & Regards
Jeevitha

On 26/02/24 11:13 am, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PR110040 exposes an issue concerning moves from vector registers to GPRs.
> There are two moves, one for upper 64 bits and the other for the lower
> 64 bits.  In the problematic test case, we are only interested in storing
> the lower 64 bits.  However, the instruction for copying the upper 64 bits
> is still emitted and is dead code.  This patch adds a splitter that splits
> apart the two move instructions so that DCE can remove the dead code after
> splitting.
> 
> 2024-02-26  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110040
>   * config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.
> 
> gcc/testsuite/
>   PR target/110040
>   * gcc.target/powerpc/pr110040-1.c: New testcase.
>   * gcc.target/powerpc/pr110040-2.c: New testcase.
> 
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 6111cc90eb7..78457f8fb14 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6706,3 +6706,19 @@
>"vmsumcud %0,%1,%2,%3"
>[(set_attr "type" "veccomplex")]
>  )
> +
> +(define_split
> +  [(set (match_operand:V1TI 0 "int_reg_operand")
> +   (match_operand:V1TI 1 "vsx_register_operand"))]
> +  "reload_completed
> +   && TARGET_DIRECT_MOVE_64BIT"
> +   [(pc)]
> +{
> +  rtx op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
> +  rtx op1 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  rtx op2 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  rtx op3 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  emit_insn (gen_vsx_extract_v2di (op0, op1, GEN_INT (0)));
> +  emit_insn (gen_vsx_extract_v2di (op2, op3, GEN_INT (1)));
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> new file mode 100644
> index 000..fb3bd254636
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> @@ -0,0 +1,14 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed long *dst, vector signed __int128 src)
> +{
> +  *dst = (signed long) src[0];
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> new file mode 100644
> index 000..f3aa22be4e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed int *dst, vector signed __int128 src)
> +{
> +  __builtin_vec_xst_trunc (src, 0, dst);
> +}
> 
> 


Re: nvptx: 'cuDeviceGetCount' failure is fatal

2024-03-07 Thread Tobias Burnus

Hi Thomas,

Thomas Schwinge wrote:
/* Return the number of GCN devices on the system. */  
  int

-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
  {
if (!init_hsa_context ())
  return 0;
+  /* Return -1 if no omp_requires_mask cannot be fulfilled but
+ devices were present.  */
+  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
+return -1;
return hsa_context.agent_count;
  }

...

OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?


I think the real question is: what does a 'cuDeviceGetCount' failure mean?

Does it mean a serious error – or could it just be a permissions issue 
such that the user has no device access but otherwise is fine?


Because if it is, e.g., a permission problem – just returning '0' (no 
devices) would seem to be the proper solution.


But if it is expected to be always something serious, well, then a fatal 
error makes more sense.


The possible return codes are:

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, 
CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE


which does not really help.

My impression is that 0 is usually returned if something goes wrong 
(e.g. with permissions) such that an error is a real exception. But all 
three choices seem to make about equally sense: either host fallback 
(with 0 or -1) or a fatal error.


Tobias
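
To make the three options above concrete, here is a minimal sketch of how a
device-count hook could map a failed query onto each of them.  This is a
hypothetical model, not libgomp or CUDA code: on_failure, report_device_count
and the query_ok flag are names invented for the illustration, and the fatal
case is modelled with an exception rather than a plugin-level abort.

```cpp
#include <cassert>
#include <stdexcept>

// The three policies discussed in the message above.
enum class on_failure { no_devices, unavailable, fatal };

// query_ok stands for a successful device-count query and count for the
// value it reported.  The return value is what the hook would hand back:
//   >= 0  number of usable devices,
//   -1    devices may exist but cannot be used.
int
report_device_count (bool query_ok, int count, on_failure policy)
{
  if (query_ok)
    return count;

  switch (policy)
    {
    case on_failure::no_devices:
      return 0;		/* Silent host fallback.  */
    case on_failure::unavailable:
      return -1;	/* Same convention as the -1 in the GCN hunk above.  */
    case on_failure::fatal:
      throw std::runtime_error ("device count query failed");
    }
  return 0;		/* Not reached.  */
}
```

Which policy is right depends on the open question in the thread, namely
whether a failed query can be something benign such as a permissions problem.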


[PATCH v2] contrib: Improve dg-extract-results.sh's Python detection

2024-03-07 Thread Sam James
'python' on some systems (e.g. SLES 15) might be Python 2. Prefer python3,
then python, then python2 (as the script still tries to work there).

contrib/ChangeLog:

* dg-extract-results.sh: Check for python3 before python. Check for
python2 last.
---
v2: Add python2 and drop EPYTHON.

 contrib/dg-extract-results.sh | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
index 00ef80046f74..9398de786125 100755
--- a/contrib/dg-extract-results.sh
+++ b/contrib/dg-extract-results.sh
@@ -28,14 +28,17 @@
 
 PROGNAME=dg-extract-results.sh
 
-# Try to use the python version if possible, since it tends to be faster.
+# Try to use the python version if possible, since it tends to be faster and
+# produces more stable results.
 PYTHON_VER=`echo "$0" | sed 's/sh$/py/'`
-if test "$PYTHON_VER" != "$0" &&
-   test -f "$PYTHON_VER" &&
-   python -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
- > /dev/null 2> /dev/null; then
-  exec python $PYTHON_VER "$@"
-fi
+for python in python3 python python2 ; do
+   if test "$PYTHON_VER" != "$0" &&
+  test -f "$PYTHON_VER" &&
+  ${python} -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
+> /dev/null 2> /dev/null; then
+ exec ${python} $PYTHON_VER "$@"
+   fi
+done
 
 usage() {
   cat <&2
-- 
2.44.0



[Ada] Fix PR ada/113979

2024-03-07 Thread Eric Botcazou
This is a regression present on all active branches: the compiler gives a 
bogus error on an allocator for an unconstrained array type declared with a 
Dynamic_Predicate because Apply_Predicate_Check is invoked directly on a 
subtype reference, which it cannot handle.

This moves the check to the resulting access value (after dereference) like in 
Expand_Allocator_Expression.

Tested on x86-64/Linux, applied on all active branches.


2024-03-07  Eric Botcazou  

PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
call Apply_Predicate_Check on the resulting access value if needed.


2024-03-07  Eric Botcazou  

* testsuite/gnat.dg/predicate15.adb: New test.

-- 
Eric Botcazou
diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 4f83cd4737a..e4a40414872 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -4657,8 +4657,6 @@ package body Exp_Ch4 is
  if Is_Array_Type (Dtyp) and then not No_Initialization (N) then
 Apply_Constraint_Check (Expression (N), Dtyp, No_Sliding => True);
 
-Apply_Predicate_Check (Expression (N), Dtyp);
-
 if Nkind (Expression (N)) = N_Raise_Constraint_Error then
Rewrite (N, New_Copy (Expression (N)));
Set_Etype (N, PtrT);
@@ -4752,6 +4750,8 @@ package body Exp_Ch4 is
 Rewrite (N, New_Occurrence_Of (Temp, Loc));
 Analyze_And_Resolve (N, PtrT);
 
+Apply_Predicate_Check (N, Dtyp, Deref => True);
+
  --  Case of no initialization procedure present
 
  elsif not Has_Non_Null_Base_Init_Proc (T) then
@@ -5119,6 +5119,8 @@ package body Exp_Ch4 is
Rewrite (N, New_Occurrence_Of (Temp, Loc));
Analyze_And_Resolve (N, PtrT);
 
+   Apply_Predicate_Check (N, Dtyp, Deref => True);
+
--  When designated type has Default_Initial_Condition aspects,
--  make a call to the type's DIC procedure to perform the
--  checks. Theoretically this might also be needed for cases
-- { dg-do compile }
-- { dg-options "-gnata" }

procedure Predicate15 is

   type Grid is array (Positive range <>) of Integer with
  Dynamic_Predicate => Grid'First = 1;

   type Grid_Ptr is access Grid;

   Data : Grid_Ptr := new Grid (1 .. 10);

begin
   null;
end;


Re: [PATCH] contrib: Improve dg-extract-results.sh's Python detection

2024-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2024 at 02:25:09PM +, Sam James wrote:
> Jakub Jelinek  writes:
> 
> > On Thu, Mar 07, 2024 at 02:16:37PM +, Sam James wrote:
> >> 'python' on some systems (e.g. SLES 15) might be Python 2. Prefer ${EPYTHON}
> >> if defined (used by Gentoo's python-exec wrapping), then python3, then python.
> >
> > I'd say EPYTHON is too distro specific, just use for python in python3 python ?
> > Other scripts just have
> > #!/usr/bin/env python3
> > as the first line and go with that.
> 
> Sure. Should I add python2 too as well (last), given the script nominally
> tries to work with it still?

Yes.

Jakub



Re: [PATCH] contrib: Improve dg-extract-results.sh's Python detection

2024-03-07 Thread Sam James
Jakub Jelinek  writes:

> On Thu, Mar 07, 2024 at 02:16:37PM +, Sam James wrote:
>> 'python' on some systems (e.g. SLES 15) might be Python 2. Prefer ${EPYTHON}
>> if defined (used by Gentoo's python-exec wrapping), then python3, then python.
>
> I'd say EPYTHON is too distro specific, just use for python in python3 python ?
> Other scripts just have
> #!/usr/bin/env python3
> as the first line and go with that.

Sure. Should I add python2 too as well (last), given the script nominally tries
to work with it still?

>
>> contrib/ChangeLog:
>> 
>> * dg-extract-results.sh: Check for python3 before python.
>> ---
>>  contrib/dg-extract-results.sh | 17 ++---
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>> 
>> diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
>> index 00ef80046f74..2d1cd76fe255 100755
>> --- a/contrib/dg-extract-results.sh
>> +++ b/contrib/dg-extract-results.sh
>> @@ -28,14 +28,17 @@
>>  
>>  PROGNAME=dg-extract-results.sh
>>  
>> -# Try to use the python version if possible, since it tends to be faster.
>> +# Try to use the python version if possible, since it tends to be faster and
>> +# produces more stable results.
>>  PYTHON_VER=`echo "$0" | sed 's/sh$/py/'`
>> -if test "$PYTHON_VER" != "$0" &&
>> -   test -f "$PYTHON_VER" &&
>> -   python -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
>> - > /dev/null 2> /dev/null; then
>> -  exec python $PYTHON_VER "$@"
>> -fi
>> +for python in ${EPYTHON:-python3} python ; do
>> +if test "$PYTHON_VER" != "$0" &&
>> +   test -f "$PYTHON_VER" &&
>> +   ${python} -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
>> + > /dev/null 2> /dev/null; then
>> +  exec ${python} $PYTHON_VER "$@"
>> +fi
>> +done
>>  
>>  usage() {
>>cat <&2
>> -- 
>> 2.44.0
>
>   Jakub


Re: [PATCH] contrib: Improve dg-extract-results.sh's Python detection

2024-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2024 at 02:16:37PM +, Sam James wrote:
> 'python' on some systems (e.g. SLES 15) might be Python 2. Prefer ${EPYTHON}
> if defined (used by Gentoo's python-exec wrapping), then python3, then python.

I'd say EPYTHON is too distro specific, just use for python in python3 python ?
Other scripts just have
#!/usr/bin/env python3
as the first line and go with that.

> contrib/ChangeLog:
> 
> * dg-extract-results.sh: Check for python3 before python.
> ---
>  contrib/dg-extract-results.sh | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
> index 00ef80046f74..2d1cd76fe255 100755
> --- a/contrib/dg-extract-results.sh
> +++ b/contrib/dg-extract-results.sh
> @@ -28,14 +28,17 @@
>  
>  PROGNAME=dg-extract-results.sh
>  
> -# Try to use the python version if possible, since it tends to be faster.
> +# Try to use the python version if possible, since it tends to be faster and
> +# produces more stable results.
>  PYTHON_VER=`echo "$0" | sed 's/sh$/py/'`
> -if test "$PYTHON_VER" != "$0" &&
> -   test -f "$PYTHON_VER" &&
> -   python -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
> - > /dev/null 2> /dev/null; then
> -  exec python $PYTHON_VER "$@"
> -fi
> +for python in ${EPYTHON:-python3} python ; do
> + if test "$PYTHON_VER" != "$0" &&
> +test -f "$PYTHON_VER" &&
> +${python} -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
> +  > /dev/null 2> /dev/null; then
> +   exec ${python} $PYTHON_VER "$@"
> + fi
> +done
>  
>  usage() {
>cat <&2
> -- 
> 2.44.0

Jakub



[PATCH] contrib: Improve dg-extract-results.sh's Python detection

2024-03-07 Thread Sam James
'python' on some systems (e.g. SLES 15) might be Python 2. Prefer ${EPYTHON}
if defined (used by Gentoo's python-exec wrapping), then python3, then python.

contrib/ChangeLog:

* dg-extract-results.sh: Check for python3 before python.
---
 contrib/dg-extract-results.sh | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
index 00ef80046f74..2d1cd76fe255 100755
--- a/contrib/dg-extract-results.sh
+++ b/contrib/dg-extract-results.sh
@@ -28,14 +28,17 @@
 
 PROGNAME=dg-extract-results.sh
 
-# Try to use the python version if possible, since it tends to be faster.
+# Try to use the python version if possible, since it tends to be faster and
+# produces more stable results.
 PYTHON_VER=`echo "$0" | sed 's/sh$/py/'`
-if test "$PYTHON_VER" != "$0" &&
-   test -f "$PYTHON_VER" &&
-   python -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
- > /dev/null 2> /dev/null; then
-  exec python $PYTHON_VER "$@"
-fi
+for python in ${EPYTHON:-python3} python ; do
+   if test "$PYTHON_VER" != "$0" &&
+  test -f "$PYTHON_VER" &&
+  ${python} -c 'import sys, getopt, re, io, datetime, operator; sys.exit (0 if sys.version_info >= (2, 6) else 1)' \
+> /dev/null 2> /dev/null; then
+ exec ${python} $PYTHON_VER "$@"
+   fi
+done
 
 usage() {
   cat <&2
-- 
2.44.0



Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Tobias Burnus

Hi Thomas,

first, I have the feeling we talk about (more or less) the same code 
region and use the same words – but we talk about rather different 
things. Thus, you confuse me (and possibly Andrew) – and my reply 
confuses you.


Thomas Schwinge wrote:

On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:

Thomas Schwinge wrote:

First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
is also not really desirable.

External users probably don't, but certainly all our internal testing is
setting it,


First, I doubt it – secondly, if it were true, it was broken for the 
last 5 years or so as we definitely did not notice fails due to not 
working offload devices. – Neither for AMD GCN nor ...



and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.


I did see it set in some places for AMD, but I do not see any
nvptx-specific environment variable that permits doing the same.


However:

  That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag


I think that's one of the problems here – you talk about 
suppress_host_fallback (implicit, original meaning), while I talk about 
the GCN_SUPPRESS_HOST_FALLBACK environment variable.


Besides all the talk about suppress_host_fallback, the
"'init_hsa_runtime_functions' is not fatal" part of the subject line seems to
be something to be considered (beyond the patches you already suggested).




If I run on my Linux system the system compiler with nvptx + gcn support
installed, I get (with a nvptx permission problem):

$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out

libgomp: GCN host fallback has been suppressed

And exit code = 1. The same result with '-foffload=disable' or with
'-foffload=nvptx-none'.

I can't tell if that's what you expect to see there, or not?


Well, obviously not that I get this error by default – and as your 
wording indicated that the internal variable will be always true – and 
not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I 
worry that I would get the error any time.



(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices.  That's what my patch achieves.)


I concur that the silent host fallback should happen by default (unless 
env vars tell otherwise) - at least when either no code was generated 
for the device (e.g. -foffload=disable) or when the vendor runtime 
library is not available or no device (be it no hardware or no permission).


That's the current behavior and if that remains, my main concern evaporates.

* * *


If we want to remove it, we can make it always false - but I am strongly
against making it always true.

I'm confused.  So you want the GCN and nvptx plugins to behave
differently in that regard?

No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.

Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
prevent the host fallback, but don't break somewhat common systems.

That's an orthogonal concept?


No – it's the same concept as the main use of the
GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time 
error instead of a silent host fallback.


But I have in the whole thread the feeling that – while talking about 
the same code region and throwing in the same words – we actually talk 
about completely different things.


Tobias


[COMMITTED] contrib: Update test_mklog to correspond to mklog

2024-03-07 Thread Filip Kastl
Hi,

the recent change to contrib/mklog.py broke contrib/test_mklog.py.  It modified
mklog.py to produce "Move to..." instead of "Moved to..." note in changelog for
files that were moved.  I've committed the fix as obvious.

Filip Kastl

-- 8< --

contrib/ChangeLog:

* test_mklog.py: "Moved to..." -> "Move to..."

Signed-off-by: Filip Kastl 
---
 contrib/test_mklog.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/test_mklog.py b/contrib/test_mklog.py
index b6210738e55..80e159fcca4 100755
--- a/contrib/test_mklog.py
+++ b/contrib/test_mklog.py
@@ -400,7 +400,7 @@ rename to gcc/ipa-icf2.c
 EXPECTED8 = '''\
 gcc/ChangeLog:
 
-   * ipa-icf.c: Moved to...
+   * ipa-icf.c: Move to...
* ipa-icf2.c: ...here.
 
 '''
-- 
2.43.0



Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs

On 07/03/2024 13:37, Thomas Schwinge wrote:

Hi Andrew!

On 2024-03-07T11:38:27+, Andrew Stubbs  wrote:

On 07/03/2024 11:29, Thomas Schwinge wrote:

On 2019-11-12T13:29:16+, Andrew Stubbs  wrote:

This patch contributes the GCN libgomp plugin, with the various
configure and make bits to go with it.


An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
different from the libgomp-level host-fallback execution):


--- /dev/null
+++ b/libgomp/plugin/plugin-gcn.c



+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;



+static void
+init_environment_variables (void)
+{
+  [...]
+  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
+suppress_host_fallback = true;
+  else
+suppress_host_fallback = false;



+/* Return true if the HSA runtime can run function FN_PTR.  */
+
+bool
+GOMP_OFFLOAD_can_run (void *fn_ptr)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+
+  init_kernel (kernel);
+  if (kernel->initialization_failed)
+goto failure;
+
+  return true;
+
+failure:
+  if (suppress_host_fallback)
+GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
+  return false;
+}


This originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?  (I suppose everyone
really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
actually remove 'suppress_host_fallback' (that is, make it
always-'true'), including removal of the 'can_run' hook?  (I suppose that
even in a future shared-memory "GCN" configuration, we're not expecting
to use this again; expecting always-'true' for 'can_run'?)


Now my actual issue: the libgomp GCN plugin then invented an additional
use of 'GCN_SUPPRESS_HOST_FALLBACK':


+/* Initialize hsa_context if it has not already been done.
+   Return TRUE on success.  */
+
+static bool
+init_hsa_context (void)
+{
+  hsa_status_t status;
+  int agent_index = 0;
+
+  if (hsa_context.initialized)
+return true;
+  init_environment_variables ();
+  if (!init_hsa_runtime_functions ())
+{
+  GCN_WARNING ("Run-time could not be dynamically opened\n");
+  if (suppress_host_fallback)
+   GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+  return false;
+}


That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
original purpose), and you have the libgomp GCN plugin configured, but
don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.

The libgomp nvptx plugin in such cases silently disables the
plugin/device (and thus lets libgomp proper do its thing), and I propose
we do the same here.  OK to push the attached
"GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
'init_hsa_runtime_functions' is not fatal"?


If you try to run the offload testsuite on a device that is not properly
configured then we want FAIL


Exactly, and that's what I'm working towards.  (Currently we're not
implementing that properly.)

But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling
'init_hsa_runtime_functions' relevant for that?  As you know, that
function only deals with dynamically loading 'libhsa-runtime64.so.1', and
failure to load that one (because it doesn't exist) should have the
agreed-upon behavior of *not* raising an error.  (Any other, later errors
should be fatal, I certainly agree.)


not pass-via-fallback. You're breaking that.


Sorry, I don't follow, please explain?


If the plugin load fails then libgomp will run in host-fallback. In that 
case, IIRC, this is the *only* opportunity we get to enforce 
GCN_SUPPRESS_HOST_FALLBACK. As far as I'm aware, that variable is 
internal, undocumented, meant for dev testing only. It says "I'm testing 
GCN features and if they're not working then I want to know about it."


Users should be using official OMP features.

Andrew


Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-07 Thread FX Coudert
> I think it's an obvious change ...

Thanks, pushed.

Dimitry, I suggest you post the second patch for review.

FX


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:
> Thomas Schwinge wrote:
>> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
>> different from the libgomp-level host-fallback execution):
>>> +failure:
>>> +  if (suppress_host_fallback)
>>> +GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
>>> +  return false;
>>> +}
>>
>> This originates in the libgomp HSA plugin, where the idea was -- in my
>> understanding -- that you wouldn't have device code available for all
>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>> you'd get those diagnosed.
>>
>> This has then been copied into the libgomp GCN plugin (see above).
>> However, is it really still applicable there; don't we assume that we're
>> generating device code for all relevant functions?  (I suppose everyone
>> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)
>
> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it 
> is also not really desirable.

External users probably don't, but certainly all our internal testing is
setting it, and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.  That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag (and does not have the "init"-error
behavior that I consider bogus, and try to remove from the libgomp GCN
plugin).

And, one step back: how is (the original meaning of)
'suppress_host_fallback = false' even supposed to work on non-shared
memory systems as currently implemented by the libgomp GCN plugin?

> If I run on my Linux system the system compiler with nvptx + gcn support
> installed, I get (with a nvptx permission problem):
>
> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>
> libgomp: GCN host fallback has been suppressed
>
> And exit code = 1. The same result with '-foffload=disable' or with 
> '-foffload=nvptx-none'.

I can't tell if that's what you expect to see there, or not?

(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices.  That's what my patch achieves.)

>> Should we thus
>> actually remove 'suppress_host_fallback' (that is, make it
>> always-'true'),
>
> If we want to remove it, we can make it always false - but I am strongly 
> against making it always true.

I'm confused.  So you want the GCN and nvptx plugins to behave
differently in that regard?  What is the rationale for that?  In
particular also regarding this whole concept of dynamic plugin-level
host-fallback execution being in conflict with our current non-shared
memory system configurations?


> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to 
> prevent the host fallback, but don't break somewhat common systems.

That's an orthogonal concept?


Grüße
 Thomas


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Thomas Schwinge
Hi Andrew!

On 2024-03-07T11:38:27+, Andrew Stubbs  wrote:
> On 07/03/2024 11:29, Thomas Schwinge wrote:
>> On 2019-11-12T13:29:16+, Andrew Stubbs  wrote:
>>> This patch contributes the GCN libgomp plugin, with the various
>>> configure and make bits to go with it.
>> 
>> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
>> different from the libgomp-level host-fallback execution):
>> 
>>> --- /dev/null
>>> +++ b/libgomp/plugin/plugin-gcn.c
>> 
>>> +/* Flag to decide if the runtime should suppress a possible fallback to host
>>> +   execution.  */
>>> +
>>> +static bool suppress_host_fallback;
>> 
>>> +static void
>>> +init_environment_variables (void)
>>> +{
>>> +  [...]
>>> +  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
>>> +suppress_host_fallback = true;
>>> +  else
>>> +suppress_host_fallback = false;
>> 
>>> +/* Return true if the HSA runtime can run function FN_PTR.  */
>>> +
>>> +bool
>>> +GOMP_OFFLOAD_can_run (void *fn_ptr)
>>> +{
>>> +  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
>>> +
>>> +  init_kernel (kernel);
>>> +  if (kernel->initialization_failed)
>>> +goto failure;
>>> +
>>> +  return true;
>>> +
>>> +failure:
>>> +  if (suppress_host_fallback)
>>> +GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
>>> +  return false;
>>> +}
>> 
>> This originates in the libgomp HSA plugin, where the idea was -- in my
>> understanding -- that you wouldn't have device code available for all
>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>> you'd get those diagnosed.
>> 
>> This has then been copied into the libgomp GCN plugin (see above).
>> However, is it really still applicable there; don't we assume that we're
>> generating device code for all relevant functions?  (I suppose everyone
>> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
>> actually remove 'suppress_host_fallback' (that is, make it
>> always-'true'), including removal of the 'can_run' hook?  (I suppose that
>> even in a future shared-memory "GCN" configuration, we're not expecting
>> to use this again; expecting always-'true' for 'can_run'?)
>> 
>> 
>> Now my actual issue: the libgomp GCN plugin then invented an additional
>> use of 'GCN_SUPPRESS_HOST_FALLBACK':
>> 
>>> +/* Initialize hsa_context if it has not already been done.
>>> +   Return TRUE on success.  */
>>> +
>>> +static bool
>>> +init_hsa_context (void)
>>> +{
>>> +  hsa_status_t status;
>>> +  int agent_index = 0;
>>> +
>>> +  if (hsa_context.initialized)
>>> +return true;
>>> +  init_environment_variables ();
>>> +  if (!init_hsa_runtime_functions ())
>>> +{
>>> +  GCN_WARNING ("Run-time could not be dynamically opened\n");
>>> +  if (suppress_host_fallback)
>>> +   GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
>>> +  return false;
>>> +}
>> 
>> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
>> original purpose), and you have the libgomp GCN plugin configured, but
>> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.
>> 
>> The libgomp nvptx plugin in such cases silently disables the
>> plugin/device (and thus lets libgomp proper do its thing), and I propose
>> we do the same here.  OK to push the attached
>> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
>> 'init_hsa_runtime_functions' is not fatal"?
>
> If you try to run the offload testsuite on a device that is not properly 
> configured then we want FAIL

Exactly, and that's what I'm working towards.  (Currently we're not
implementing that properly.)

But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling
'init_hsa_runtime_functions' relevant for that?  As you know, that
function only deals with dynamically loading 'libhsa-runtime64.so.1', and
failure to load that one (because it doesn't exist) should have the
agreed-upon behavior of *not* raising an error.  (Any other, later errors
should be fatal, I certainly agree.)
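
To make the point concrete, here is a minimal C sketch of just the
run-time loading step, under my reading of the code quoted above.  The
names 'init_result', 'current_behavior' and 'proposed_behavior' are
hypothetical and exist only for illustration; they are not part of
'plugin-gcn.c', and any later initialization errors are deliberately not
modelled.

```c
#include <stdbool.h>

/* Hypothetical outcome of the "can we dlopen the HSA run-time?" step.  */
enum init_result
{
  INIT_OK,              /* Run-time library loaded; keep initializing.  */
  INIT_PLUGIN_DISABLED, /* Report no devices; libgomp proper decides.  */
  INIT_FATAL            /* Abort the process.  */
};

/* Model of the quoted 'init_hsa_context' logic: a missing run-time
   library becomes fatal as soon as the environment variable is set.  */
static enum init_result
current_behavior (bool runtime_loadable, bool suppress_host_fallback)
{
  if (runtime_loadable)
    return INIT_OK;
  return suppress_host_fallback ? INIT_FATAL : INIT_PLUGIN_DISABLED;
}

/* Model of the proposal: a missing run-time library only disables the
   plugin, regardless of the environment variable.  */
static enum init_result
proposed_behavior (bool runtime_loadable, bool suppress_host_fallback)
{
  (void) suppress_host_fallback; /* Intentionally ignored at this step.  */
  return runtime_loadable ? INIT_OK : INIT_PLUGIN_DISABLED;
}
```

The two models differ only for 'runtime_loadable == false' with
'suppress_host_fallback == true'; every other combination is unchanged.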

> not pass-via-fallback. You're breaking that.

Sorry, I don't follow, please explain?


Grüße
 Thomas


Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread chenglulu



On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:


I wanted to do this at first, but it didn't work out.

But then I thought about it, and there are two benefits to putting it in 
the middle of ll/sc:


1. If there is an operation that uses the $r4 register after this atomic
operation, another register is required to store $r4.

2. ll.w takes many cycles, so putting an addi.w instruction after ll.w
won't make a difference.


So based on the above, I didn't try again, and directly made the
modification as in the patch.




Re: [PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-07 Thread Richard Biener
On Thu, Mar 7, 2024 at 1:25 PM Robin Dapp  wrote:
>
> Attached v2 combines the checks.
>
> Bootstrapped and regtested on x86 and power10, aarch64 still running.
> Regtested on riscv64.

LGTM.

> Regards
>  Robin
>
>
> Subject: [PATCH v2] vect: Do not peel epilogue for partial vectors.
>
> r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
> break but PR114196 shows that we also run into the problem without early
> break.  Therefore merge the condition into the topmost vectorization
> guard.
>
> gcc/ChangeLog:
>
> PR middle-end/114196
>
> * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Merge
> vectorization guards.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/pr114196.c: New test.
> * gcc.target/riscv/rvv/autovec/pr114196.c: New test.
> ---
>  gcc/testsuite/gcc.target/aarch64/pr114196.c   | 19 
>  .../gcc.target/riscv/rvv/autovec/pr114196.c   | 19 
>  gcc/tree-vect-loop-manip.cc   | 30 +--
>  3 files changed, 45 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr114196.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr114196.c 
> b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> new file mode 100644
> index 000..15e4b0e31b8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options { -O3 -fno-vect-cost-model -march=armv9-a 
> -msve-vector-bits=256 } } */
> +
> +unsigned a;
> +int b;
> +long *c;
> +
> +int
> +main ()
> +{
> +  for (int d = 0; d < 22; d += 4) {
> +  b = ({
> +   int e = c[d];
> +   e;
> +   })
> +  ? 0 : -c[d];
> +  a *= 3;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> new file mode 100644
> index 000..7ba9cbbed70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options { -O3 -fno-vect-cost-model -march=rv64gcv_zvl256b 
> -mabi=lp64d -mrvv-vector-bits=zvl } } */
> +
> +unsigned a;
> +int b;
> +long *c;
> +
> +int
> +main ()
> +{
> +  for (int d = 0; d < 22; d += 4) {
> +  b = ({
> +   int e = c[d];
> +   e;
> +   })
> +  ? 0 : -c[d];
> +  a *= 3;
> +  }
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f72da915103..56a6d8e4a8d 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2129,16 +2129,19 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info 
> loop_vinfo,
>   For mult, don't known how to generate
>   init_expr * pow (step, niters) for variable niters.
>   For neg, it should be ok, since niters of vectorized main loop
> - will always be multiple of 2.  */
> -  if ((!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> + will always be multiple of 2.
> + See also PR113163 and PR114196.  */
> +  if ((!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> +   || LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +   || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>&& induction_type != vect_step_op_neg)
>  {
>if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Peeling for epilogue is not supported"
>  " for nonlinear induction except neg"
> -" when iteration count is unknown.\n");
> +" when iteration count is unknown or"
> +" when using partial vectorization.\n");
>return false;
>  }
>
> @@ -2178,25 +2181,6 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
>return false;
>  }
>
> -  /* We can't support partial vectors and early breaks with an induction
> - type other than add or neg since we require the epilog and can't
> - perform the peeling.  The below condition mirrors that of
> - vect_gen_vector_loop_niters  where niters_vector_mult_vf_var then sets
> - step_vector to VF rather than 1.  This is what creates the nonlinear
> - IV.  PR113163.  */
> -  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> -  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> -  && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> -  && induction_type != vect_step_op_neg)
> -{
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"Peeling for epilogue is not supported"
> -" for nonlinear induction except neg"
> -" when VF is known and early breaks.\n");
> -  return false;
> -}
> -
>return true;
>  }
>
> --
> 

Re: [PATCH] analyzer: Fix up some -Wformat* warnings

2024-03-07 Thread David Malcolm
On Thu, 2024-03-07 at 09:30 +0100, Jakub Jelinek wrote:
> Hi!
> 
> I'm seeing warnings like
> ../../gcc/analyzer/access-diagram.cc: In member function ‘void
> ana::bit_size_expr::print(pretty_printer*) const’:
> ../../gcc/analyzer/access-diagram.cc:399:26: warning: unknown
> conversion type character ‘E’ in format [-Wformat=]
>   399 | pp_printf (pp, _("%qE bytes"), bytes_expr);
>   |  ^~~
> when building stage2/stage3 gcc.  While such warnings would be
> understandable when building stage1 because one could e.g. have some
> older host compiler which doesn't understand some of the format
> specifiers,
> the above seems to be because we have in pretty-print.h
> #ifdef GCC_DIAG_STYLE
> #define GCC_PPDIAG_STYLE GCC_DIAG_STYLE
> #else
> #define GCC_PPDIAG_STYLE __gcc_diag__
> #endif
> and use GCC_PPDIAG_STYLE e.g. for pp_printf, and while
> diagnostic-core.h has
> #ifndef GCC_DIAG_STYLE
> #define GCC_DIAG_STYLE __gcc_tdiag__
> #endif
> (and similarly various FE headers include their own GCC_DIAG_STYLE)
> when including pretty-print.h before diagnostic-core.h we end up
> with __gcc_diag__ style rather than __gcc_tdiag__ style, which I
> think
> is the right thing for the analyzer, because analyzer seems to use
> default_tree_printer everywhere:
> grep pp_format_decoder.*=.default_tree_printer analyzer/* | wc -l
> 57
> 
> The following patch fixes that by making sure diagnostic-core.h is
> included
> before pretty-print.h.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Yes, thanks
Dave



Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao
On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote:

> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> + return value of the val_without_const_folding will not be truncated and
> + will be passed directly to the function compare_exchange_strong.
> + However, the instruction 'bne' does not distinguish between 32-bit and
> + 64-bit operations.  so if the upper 32 bits of the register are not
> + extended by the 32nd bit symbol, then the comparison may not be valid
> + here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +  && GET_MODE (operands[2]) == SImode)
> +    {
> +  output_asm_insn ("addi.w\t%5,%2,0", operands);
> +  output_asm_insn ("bne\t%0,%5,2f", operands);

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..c21781947fd 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -234,11 +234,11 @@ (define_insn "atomic_exchange_short"
   "amswap%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_cas_value_strong"
+(define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=")
(match_operand:GPR 1 "memory_operand" "+ZC"))
(set (match_dup 1)
-   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+   (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ")
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
  (match_operand:SI 4 "const_int_operand")]  ;; mod_s
 UNSPEC_COMPARE_AND_SWAP))
@@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong"
   ""
 {
   return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
+"ll.\\t%0,%1\\n\\t"
 "bne\\t%0,%z2,2f\\n\\t"
 "or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
+"sc.\\t%5,%1\\n\\t"
 "beqz\\t%5,1b\\n\\t"
 "b\\t3f\\n\\t"
 "2:\\n\\t"
@@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap"
 operands[3], operands[4],
 operands[6]));
   else
-emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
- operands[3], operands[4],
- operands[6]));
+{
+  rtx (*cas)(rtx, rtx, rtx, rtx, rtx) =
+   TARGET_64BIT ? gen_atomic_cas_value_strongdi
+: gen_atomic_cas_value_strongsi;
+  rtx expect = operands[3];
+
+  if (mode == SImode
+ && TARGET_64BIT
+ && operands[3] != const0_rtx)
+   {
+ expect = gen_reg_rtx (DImode);
+ emit_insn (gen_extendsidi2 (expect, operands[3]));
+   }
+
+  emit_insn (cas (operands[1], operands[2], expect, operands[4],
+ operands[6]));
+}
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)

It produces:

slli.w  $r4,$r4,0
1:
ll.w$r14,$r3,0
bne $r14,$r4,2f
or  $r15,$zero,$r12
sc.w$r15,$r3,0
beqz$r15,1b
b   3f
2:
dbar0b10100
3:

for the test case and the compiled test case runs successfully.  I've
not done a full bootstrap yet though.
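
For readers following along, the underlying problem can be sketched in
plain C.  This is only a model, assuming that ll.w leaves the loaded
word sign-extended in the 64-bit register and that bne compares all 64
bits; the helper names below are hypothetical and do not correspond to
anything in the backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 64-bit register after ll.w: the 32-bit word from memory,
   sign-extended to 64 bits (assumption stated above).  */
static uint64_t
reg_after_ll_w (int32_t mem)
{
  return (uint64_t) (int64_t) mem;
}

/* Model of a 64-bit register holding an 'int' whose upper half was not
   sign-extended; here the upper 32 bits happen to be zero.  */
static uint64_t
reg_not_extended (int32_t v)
{
  return (uint64_t) (uint32_t) v;
}

/* Model of "addi.w rd,rs,0" or "slli.w rd,rs,0": keep the low 32 bits
   and sign-extend them to 64 bits.  */
static uint64_t
sign_extend_low32 (uint64_t reg)
{
  return ((reg & UINT64_C (0xffffffff)) ^ UINT64_C (0x80000000))
	 - UINT64_C (0x80000000);
}

/* Model of the bne test: the branch is not taken only when all 64 bits
   of both registers agree.  */
static bool
regs_equal (uint64_t a, uint64_t b)
{
  return a == b;
}
```

With an expected value of -1, 'regs_equal (reg_after_ll_w (-1),
reg_not_extended (-1))' is false even though the 32-bit values match,
while applying 'sign_extend_low32' to the expected value first makes the
comparison succeed.  Doing that once before the loop is the idea of the
suggestion above.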

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] c++: Redetermine whether to write vtables on stream-in [PR114229]

2024-03-07 Thread Nathaniel Shead
On Wed, Mar 06, 2024 at 08:59:16AM -0500, Jason Merrill wrote:
> On 3/5/24 22:06, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > -- >8 --
> > 
> > Currently, reading a variable definition always marks that decl as
> > DECL_NOT_REALLY_EXTERN, with anything else imported still being
> > considered external. This is not sufficient for vtables, however; for an
> > extern template, a vtable may be generated (and its definition emitted)
> > but nonetheless the vtable should only be emitted in the TU where that
> > template is actually instantiated.
> 
> Does the vtable go through import_export_decl?  I've been thinking that that
> function (and import_export_class) need to be more module-aware. Would it
> make sense to do that rather than stream DECL_NOT_REALLY_EXTERN?
> 
> Jason
> 

Right. It doesn't go through 'import_export_decl' because when it's
imported, DECL_INTERFACE_KNOWN is already set. So it seems an obvious
fix here is to just ensure that we clear that flag on stream-in for
vtables (we can't do it generally as it seems to be needed to be kept on
various other kinds of declarations).

Linaro complained about the last version of this patch too on ARM;
hopefully this version is friendlier.

I might also spend some time messing around to see if I can implement
https://github.com/itanium-cxx-abi/cxx-abi/issues/170 later, but that
will probably have to be a GCC 15 change.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk if
Linaro doesn't complain about this patch?

-- >8 --

We currently always stream DECL_INTERFACE_KNOWN, which is needed since
many kinds of declarations already have their interface determined at
parse time.  But for vtables and type-info declarations we need to
re-evaluate on stream-in, as whether they need to be emitted or not
changes in each TU, so this patch clears DECL_INTERFACE_KNOWN on these
kinds of declarations so that they can go through 'import_export_decl'
again.

Note that the precise details of the virt-2 tests will need to change
when we implement the resolution of [1]; for now I just updated the test
to not fail with the new (current) semantics.

[1]: https://github.com/itanium-cxx-abi/cxx-abi/pull/171

PR c++/114229

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Redetermine
DECL_INTERFACE_KNOWN on stream-in for vtables and tinfo.

gcc/testsuite/ChangeLog:

* g++.dg/modules/virt-2_b.C: Update test to acknowledge that we
now emit vtables here too.
* g++.dg/modules/virt-3_a.C: New test.
* g++.dg/modules/virt-3_b.C: New test.
* g++.dg/modules/virt-3_c.C: New test.
* g++.dg/modules/virt-3_d.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc| 12 +++-
 gcc/testsuite/g++.dg/modules/virt-2_b.C |  5 ++---
 gcc/testsuite/g++.dg/modules/virt-3_a.C |  9 +
 gcc/testsuite/g++.dg/modules/virt-3_b.C |  6 ++
 gcc/testsuite/g++.dg/modules/virt-3_c.C |  3 +++
 gcc/testsuite/g++.dg/modules/virt-3_d.C |  7 +++
 6 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/virt-3_d.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f7e8b357fc2..d77286328f5 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5390,7 +5390,17 @@ trees_out::core_bools (tree t)
   WB (t->decl_common.lang_flag_2);
   WB (t->decl_common.lang_flag_3);
   WB (t->decl_common.lang_flag_4);
-  WB (t->decl_common.lang_flag_5);
+
+  {
+   /* This is DECL_INTERFACE_KNOWN: We should redetermine whether
+  we need to import or export any vtables or typeinfo objects
+  on stream-in.  */
+   bool interface_known = t->decl_common.lang_flag_5;
+   if (VAR_P (t) && (DECL_VTABLE_OR_VTT_P (t) || DECL_TINFO_P (t)))
+ interface_known = false;
+   WB (interface_known);
+  }
+
   WB (t->decl_common.lang_flag_6);
   WB (t->decl_common.lang_flag_7);
   WB (t->decl_common.lang_flag_8);
diff --git a/gcc/testsuite/g++.dg/modules/virt-2_b.C 
b/gcc/testsuite/g++.dg/modules/virt-2_b.C
index e041f0721f9..2bc5eced013 100644
--- a/gcc/testsuite/g++.dg/modules/virt-2_b.C
+++ b/gcc/testsuite/g++.dg/modules/virt-2_b.C
@@ -21,8 +21,7 @@ int main ()
   return !(Visit () == 1);
 }
 
-// We do not emit Visitor vtable
-// but we do emit rtti here
-// { dg-final { scan-assembler-not {_ZTVW3foo7Visitor:} } }
+// Again, we emit Visitor vtable and rtti here
+// { dg-final { scan-assembler {_ZTVW3foo7Visitor:} } }
 // { dg-final { scan-assembler {_ZTIW3foo7Visitor:} } }
 // { dg-final { scan-assembler {_ZTSW3foo7Visitor:} } }
diff --git a/gcc/testsuite/g++.dg/modules/virt-3_a.C 
b/gcc/testsuite/g++.dg/modules/virt-3_a.C
new file mode 100644
index 

nvptx: 'cuDeviceGetCount' failure is fatal (was: [Patch] OpenMP: Move omp requires checks to libgomp)

2024-03-07 Thread Thomas Schwinge
Hi!

On 2022-06-08T05:56:02+0200, Tobias Burnus  wrote:
> [...] On the libgomp side: The devices which do not fulfill the requirements 
> are
> now filtered out.  [...]

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

>  /* Return the number of GCN devices on the system.  */
>  
>  int
> -GOMP_OFFLOAD_get_num_devices (void)
> +GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
>  {
>if (!init_hsa_context ())
>  return 0;
> +  /* Return -1 if no omp_requires_mask cannot be fulfilled but
> + devices were present.  */
> +  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
> +return -1;
>return hsa_context.agent_count;
>  }

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

>  int
> -GOMP_OFFLOAD_get_num_devices (void)
> +GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
>  {
> -  return nvptx_get_num_devices ();
> +  int num_devices = nvptx_get_num_devices ();
> +  /* Return -1 if no omp_requires_mask cannot be fulfilled but
> + devices were present.  */
> +  if (num_devices > 0 && omp_requires_mask != 0)
> +return -1;
> +  return num_devices;
>  }

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -4132,8 +4183,19 @@ gomp_target_init (void)
>  
>   if (gomp_load_plugin_for_device (_device, plugin_name))
> {
> - new_num_devs = current_device.get_num_devices_func ();
> - if (new_num_devs >= 1)
> + new_num_devs = current_device.get_num_devices_func (requires_mask);
> + if (new_num_devs < 0)
> +   {
> + [...]
> +   }
> + else if (new_num_devs >= 1)
> {
>   /* Augment DEVICES and NUM_DEVICES.  */

OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?


Grüße
 Thomas


From 8090da93cb00e4aa47a8b21b6548d739b2cebc49 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 13:18:23 +0100
Subject: [PATCH] nvptx: 'cuDeviceGetCount' failure is fatal

Per commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
"OpenMP: Move omp requires checks to libgomp", we're now using 'return -1'
from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes.  This
missed that via 'nvptx_get_num_devices', we could also 'return -1' for
'cuDeviceGetCount' failure.  Before, this meant (in 'gomp_target_init') to
silently ignore the plugin/device -- which also has been doubtful behavior.
Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to
other errors during device initialization.

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_get_num_devices):
	'cuDeviceGetCount' failure is fatal.
---
 libgomp/plugin/plugin-nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index ffb1db67d20..81b4a7f499a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -630,7 +630,7 @@ nvptx_get_num_devices (void)
 	}
 }
 
-  CUDA_CALL_ERET (-1, cuDeviceGetCount, );
+  CUDA_CALL_ASSERT (cuDeviceGetCount, );
   return n;
 }
 
-- 
2.34.1
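
The ambiguity described in the commit message can be sketched as
follows.  This is a simplified model and not the plugin code: the
'count_query_failed' parameter stands in for a 'cuDeviceGetCount'
failure, the 'fatal' out-parameter stands in for the process aborting,
and both function names are hypothetical.

```c
#include <stdbool.h>

/* Model of the hook's return value before the patch: -1 is produced
   both for a failed device-count query and for devices that cannot
   fulfill the 'omp requires' mask.  */
static int
get_num_devices_before (bool count_query_failed, int device_count,
			unsigned int omp_requires_mask)
{
  if (count_query_failed)
    return -1;
  if (device_count > 0 && omp_requires_mask != 0)
    return -1;
  return device_count;
}

/* Model after the patch: a failed query is reported as fatal (here via
   *FATAL, in the real plugin the process would not continue), so -1
   only ever means "requirements cannot be fulfilled".  */
static int
get_num_devices_after (bool count_query_failed, int device_count,
		       unsigned int omp_requires_mask, bool *fatal)
{
  *fatal = count_query_failed;
  if (count_query_failed)
    return 0; /* Placeholder; not reached in the real plugin.  */
  if (device_count > 0 && omp_requires_mask != 0)
    return -1;
  return device_count;
}
```

In the "before" model the caller cannot tell the two -1 cases apart; in
the "after" model it no longer has to.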



Re: [PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-07 Thread Robin Dapp
Attached v2 combines the checks.

Bootstrapped and regtested on x86 and power10, aarch64 still running.
Regtested on riscv64.

Regards
 Robin


Subject: [PATCH v2] vect: Do not peel epilogue for partial vectors.

r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
break but PR114196 shows that we also run into the problem without early
break.  Therefore merge the condition into the topmost vectorization
guard.

gcc/ChangeLog:

PR middle-end/114196

* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Merge
vectorization guards.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr114196.c: New test.
* gcc.target/riscv/rvv/autovec/pr114196.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/pr114196.c   | 19 
 .../gcc.target/riscv/rvv/autovec/pr114196.c   | 19 
 gcc/tree-vect-loop-manip.cc   | 30 +--
 3 files changed, 45 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr114196.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c

diff --git a/gcc/testsuite/gcc.target/aarch64/pr114196.c 
b/gcc/testsuite/gcc.target/aarch64/pr114196.c
new file mode 100644
index 000..15e4b0e31b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114196.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options { -O3 -fno-vect-cost-model -march=armv9-a -msve-vector-bits=256 } } */
+
+unsigned a;
+int b;
+long *c;
+
+int
+main ()
+{
+  for (int d = 0; d < 22; d += 4) {
+  b = ({
+   int e = c[d];
+   e;
+   })
+  ? 0 : -c[d];
+  a *= 3;
+  }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
new file mode 100644
index 000..7ba9cbbed70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options { -O3 -fno-vect-cost-model -march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl } } */
+
+unsigned a;
+int b;
+long *c;
+
+int
+main ()
+{
+  for (int d = 0; d < 22; d += 4) {
+  b = ({
+   int e = c[d];
+   e;
+   })
+  ? 0 : -c[d];
+  a *= 3;
+  }
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f72da915103..56a6d8e4a8d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2129,16 +2129,19 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
  For mult, don't known how to generate
  init_expr * pow (step, niters) for variable niters.
  For neg, it should be ok, since niters of vectorized main loop
- will always be multiple of 2.  */
-  if ((!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+ will always be multiple of 2.
+ See also PR113163 and PR114196.  */
+  if ((!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
+   || LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+   || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
   && induction_type != vect_step_op_neg)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Peeling for epilogue is not supported"
 " for nonlinear induction except neg"
-" when iteration count is unknown.\n");
+" when iteration count is unknown or"
+" when using partial vectorization.\n");
   return false;
 }
 
@@ -2178,25 +2181,6 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
   return false;
 }
 
-  /* We can't support partial vectors and early breaks with an induction
- type other than add or neg since we require the epilog and can't
- perform the peeling.  The below condition mirrors that of
- vect_gen_vector_loop_niters  where niters_vector_mult_vf_var then sets
- step_vector to VF rather than 1.  This is what creates the nonlinear
- IV.  PR113163.  */
-  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
-  && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
-  && induction_type != vect_step_op_neg)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Peeling for epilogue is not supported"
-" for nonlinear induction except neg"
-" when VF is known and early breaks.\n");
-  return false;
-}
-
   return true;
 }
 
-- 
2.43.2
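
As a rough illustration of what merging the guards changes, here is a
small C model of the two conditions touched by the patch.  It is a
sketch only: the booleans stand in for the LOOP_VINFO_* queries, the
enum and function names are hypothetical, and the other checks in
vect_can_peel_nonlinear_iv_p are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the nonlinear induction kinds.  */
enum iv_kind { IV_NEG, IV_MUL, IV_SHIFT };

/* Model of the two guards before the patch: partial vectors were only
   rejected in combination with early breaks.  */
static bool
old_guards_allow_peeling (bool niters_known, bool vf_constant,
			  bool partial_vectors, bool early_breaks,
			  enum iv_kind kind)
{
  if ((!niters_known || !vf_constant) && kind != IV_NEG)
    return false;
  if (early_breaks && vf_constant && partial_vectors && kind != IV_NEG)
    return false;
  return true;
}

/* Model of the merged guard after the patch: partial vectors are
   rejected for every nonlinear induction except neg, with or without
   early breaks.  */
static bool
merged_guard_allows_peeling (bool niters_known, bool vf_constant,
			     bool partial_vectors, enum iv_kind kind)
{
  if ((!vf_constant || partial_vectors || !niters_known)
      && kind != IV_NEG)
    return false;
  return true;
}
```

The PR114196 situation corresponds to a mult induction with partial
vectors and no early break: the old model lets it through, the merged
one rejects it.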



[PATCH] libstdc++: Document that _GLIBCXX_CONCEPT_CHECKS might be removed in future

2024-03-07 Thread Jonathan Wakely
Any objection to this update to make the docs reflect reality?

-- >8 --

The macro-based concept checks are unmaintained and do not support C++11
or later, so reject valid code. If nobody plans to update them we should
consider removing them. Alternatively, we could ignore the macro for
C++11 and later, so they have no effect and don't reject valid code.

libstdc++-v3/ChangeLog:

* doc/xml/manual/debug.xml: Document that concept checking might
be removed in future.
* doc/xml/manual/extensions.xml: Likewise.
---
 libstdc++-v3/doc/xml/manual/debug.xml  |  2 ++
 libstdc++-v3/doc/xml/manual/extensions.xml | 18 --
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/debug.xml b/libstdc++-v3/doc/xml/manual/debug.xml
index 42d4d32aa29..7f6d0876fc6 100644
--- a/libstdc++-v3/doc/xml/manual/debug.xml
+++ b/libstdc++-v3/doc/xml/manual/debug.xml
@@ -351,6 +351,8 @@
 
The Compile-Time
   Checks extension has compile-time checks for many algorithms.
+  These checks were designed for C++98 and have not been updated to work
+  with C++11 and later standards. They might be removed at a future date.
   
 
 
diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml b/libstdc++-v3/doc/xml/manual/extensions.xml
index d4fe2f509d4..490a50cc331 100644
--- a/libstdc++-v3/doc/xml/manual/extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/extensions.xml
@@ -77,8 +77,7 @@ extensions, be aware of two things:
   object file.  The checks are also cleaner and easier to read and
   understand.

-   They are off by default for all versions of GCC from 3.0 to 3.4 (the
-  latest release at the time of writing).
+   They are off by default for all GCC 3.0 and all later versions.
   They can be enabled at configure time with
   --enable-concept-checks.
   You can enable them on a per-translation-unit basis with
@@ -89,10 +88,17 @@ extensions, be aware of two things:

 
Please note that the concept checks only validate the requirements
-   of the old C++03 standard. C++11 was expected to have first-class
-   support for template parameter constraints based on concepts in the core
-   language. This would have obviated the need for the library-simulated concept
-   checking described above, but was not part of C++11.
+   of the old C++03 standard and reject some valid code that meets the relaxed
+   requirements of C++11 and later standards.
+   C++11 was expected to have first-class support for template parameter
+   constraints based on concepts in the core language.
+   This would have obviated the need for the library-simulated concept checking
+   described above, but was not part of C++11.
+   C++20 adds a different model of concepts, which is now used to constrain
+   some new parts of the C++20 library, e.g. the
+   ranges header and the new overloads in the
+   algorithm header for working with ranges.
+   The old library-simulated concept checks might be removed at a future date.

 
 
-- 
2.43.2



Re: GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools p

2024-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2024 at 12:53:31PM +0100, Thomas Schwinge wrote:
> >From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Thu, 7 Mar 2024 12:31:52 +0100
> Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
>  'libhsa-runtime64.so.1', 'libcuda.so.1'
> 
> If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
> libgomp plugin/device gets disabled, as before.  But if they are available,
> report any inconsistencies such as missing symbols, similar to how we fail in
> presence of other issues during device initialization.
> 
>   libgomp/
>   * plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
>   for missing symbols.
>   * plugin/plugin-nvptx.c (init_cuda_lib): Likewise.

Ok.

Jakub



GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patch

2024-03-07 Thread Thomas Schwinge
Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek  wrote:
> [...] If the nvptx libgomp plugin is installed, but libcuda.so.1
> can't be found, then the plugin behaves as if there are no PTX devices
> available.  [...]

ACK.

> --- libgomp/plugin/plugin-nvptx.c.jj  2017-01-13 12:07:56.0 +0100
> +++ libgomp/plugin/plugin-nvptx.c 2017-01-13 18:00:39.693284346 +0100

> +/* -1 if init_cuda_lib has not been called yet, false
> +   if it has been and failed, true if it has been and succeeded.  */
> +static char cuda_lib_inited = -1;
>  
> -  return desc;
> +/* Dynamically load the CUDA runtime library and initialize function
> +   pointers, return false if unsuccessful, true if successful.  */
> +static bool
> +init_cuda_lib (void)
> +{
> +  if (cuda_lib_inited != -1)
> +return cuda_lib_inited;
> +  const char *cuda_runtime_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
> +  cuda_lib_inited = false;
> +  if (h == NULL)
> +return false;

..., so this has to stay.

> +# undef CUDA_ONE_CALL
> +# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> +# define CUDA_ONE_CALL_1(call) \
> +  cuda_lib.call = dlsym (h, #call);  \
> +  if (cuda_lib.call == NULL) \
> +return false;

However, this (missing symbol) I'd like to make a fatal error, instead of
silently disabling the plugin/device.  OK to push the attached
"GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 
'libcuda.so.1'"?

> +  [...]
> +  cuda_lib_inited = true;
> +  return true;
>  }


Grüße
 Thomas


>From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 12:31:52 +0100
Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
 'libhsa-runtime64.so.1', 'libcuda.so.1'

If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
libgomp plugin/device gets disabled, as before.  But if they are available,
report any inconsistencies such as missing symbols, similar to how we fail in
presence of other issues during device initialization.

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
	for missing symbols.
	* plugin/plugin-nvptx.c (init_cuda_lib): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 3 ++-
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 464164afb03..338225db6f4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1382,9 +1382,10 @@ init_hsa_runtime_functions (void)
 #define DLSYM_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function); \
   if (hsa_fns.function##_fn == NULL) \
-return false;
+GOMP_PLUGIN_fatal ("'%s' is missing '%s'", hsa_runtime_lib, #function);
 #define DLSYM_OPT_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function);
+
   void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
   if (handle == NULL)
 return false;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3fd6cd42fa6..ffb1db67d20 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -127,7 +127,7 @@ init_cuda_lib (void)
 # define CUDA_ONE_CALL_1(call, allow_null)		\
   cuda_lib.call = dlsym (h, #call);	\
   if (!allow_null && cuda_lib.call == NULL)		\
-return false;
+GOMP_PLUGIN_fatal ("'%s' is missing '%s'", cuda_runtime_lib, #call);
 #include "cuda-lib.def"
 # undef CUDA_ONE_CALL
 # undef CUDA_ONE_CALL_1
-- 
2.34.1
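
To make the intended policy concrete, here is a minimal sketch in C. It is a model only: it does not call dlopen or dlsym, and struct fake_lib, fake_has_symbol and init_plugin are hypothetical names invented for this illustration. A NULL library stands for "dlopen failed", which keeps the old behaviour (plugin disabled); a library that loads but lacks a required symbol is the case the patch turns into a fatal error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a shared library: the symbols it exports.
   A NULL pointer to this struct models "library not installed".  */
struct fake_lib
{
  const char *const *exports;
  size_t n_exports;
};

enum init_result
{
  PLUGIN_DISABLED,	/* Library absent: behave as if no device exists.  */
  PLUGIN_FATAL,		/* Library present but inconsistent: hard error.  */
  PLUGIN_READY
};

/* Model of a symbol lookup: return nonzero if LIB exports NAME.  */
static int
fake_has_symbol (const struct fake_lib *lib, const char *name)
{
  for (size_t i = 0; i < lib->n_exports; i++)
    if (strcmp (lib->exports[i], name) == 0)
      return 1;
  return 0;
}

/* Decide what plugin initialization does under the proposed policy.  */
enum init_result
init_plugin (const struct fake_lib *lib,
	     const char *const *required, size_t n_required)
{
  if (lib == NULL)
    return PLUGIN_DISABLED;
  for (size_t i = 0; i < n_required; i++)
    if (!fake_has_symbol (lib, required[i]))
      return PLUGIN_FATAL;	/* Was a silent "return false" before.  */
  return PLUGIN_READY;
}
```

In the real plugins the PLUGIN_FATAL case corresponds to the GOMP_PLUGIN_fatal call in the hunks above.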



Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Tobias Burnus

Hi,

Thomas Schwinge wrote:

> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
> different from the libgomp-level host-fallback execution):
>
> > +failure:
> > +  if (suppress_host_fallback)
> > +GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> > +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
> > +  return false;
> > +}
>
> This originates in the libgomp HSA plugin, where the idea was -- in my
> understanding -- that you wouldn't have device code available for all
> 'fn_ptr's, and in that case transparently (shared-memory system!) do
> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
> you'd get those diagnosed.
>
> This has then been copied into the libgomp GCN plugin (see above).
> However, is it really still applicable there; don't we assume that we're
> generating device code for all relevant functions?  (I suppose everyone
> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)


First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it 
is also not really desirable.


If I use the system compiler on my Linux system, with nvptx + gcn support
installed, I get (with an nvptx permission problem):


$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out

libgomp: GCN host fallback has been suppressed

And exit code = 1. The same result with '-foffload=disable' or with 
'-foffload=nvptx-none'.



> Should we thus
> actually remove 'suppress_host_fallback' (that is, make it
> always-'true'),


If we want to remove it, we can make it always false - but I am strongly 
against making it always true.


Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to 
prevent the host fallback, but don't break somewhat common systems.


Tobias


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs

On 07/03/2024 11:29, Thomas Schwinge wrote:

> Hi!
>
> On 2019-11-12T13:29:16+, Andrew Stubbs  wrote:
> > This patch contributes the GCN libgomp plugin, with the various
> > configure and make bits to go with it.
>
> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
> different from the libgomp-level host-fallback execution):
>
> > --- /dev/null
> > +++ b/libgomp/plugin/plugin-gcn.c
>
> > +/* Flag to decide if the runtime should suppress a possible fallback to host
> > +   execution.  */
> > +
> > +static bool suppress_host_fallback;
>
> > +static void
> > +init_environment_variables (void)
> > +{
> > +  [...]
> > +  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
> > +suppress_host_fallback = true;
> > +  else
> > +suppress_host_fallback = false;
>
> > +/* Return true if the HSA runtime can run function FN_PTR.  */
> > +
> > +bool
> > +GOMP_OFFLOAD_can_run (void *fn_ptr)
> > +{
> > +  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
> > +
> > +  init_kernel (kernel);
> > +  if (kernel->initialization_failed)
> > +goto failure;
> > +
> > +  return true;
> > +
> > +failure:
> > +  if (suppress_host_fallback)
> > +GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> > +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
> > +  return false;
> > +}
>
> This originates in the libgomp HSA plugin, where the idea was -- in my
> understanding -- that you wouldn't have device code available for all
> 'fn_ptr's, and in that case transparently (shared-memory system!) do
> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
> you'd get those diagnosed.
>
> This has then been copied into the libgomp GCN plugin (see above).
> However, is it really still applicable there; don't we assume that we're
> generating device code for all relevant functions?  (I suppose everyone
> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
> actually remove 'suppress_host_fallback' (that is, make it
> always-'true'), including removal of the 'can_run' hook?  (I suppose that
> even in a future shared-memory "GCN" configuration, we're not expecting
> to use this again; expecting always-'true' for 'can_run'?)
>
>
> Now my actual issue: the libgomp GCN plugin then invented an additional
> use of 'GCN_SUPPRESS_HOST_FALLBACK':
>
> > +/* Initialize hsa_context if it has not already been done.
> > +   Return TRUE on success.  */
> > +
> > +static bool
> > +init_hsa_context (void)
> > +{
> > +  hsa_status_t status;
> > +  int agent_index = 0;
> > +
> > +  if (hsa_context.initialized)
> > +return true;
> > +  init_environment_variables ();
> > +  if (!init_hsa_runtime_functions ())
> > +{
> > +  GCN_WARNING ("Run-time could not be dynamically opened\n");
> > +  if (suppress_host_fallback)
> > +   GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> > +  return false;
> > +}
>
> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
> original purpose), and you have the libgomp GCN plugin configured, but
> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.
>
> The libgomp nvptx plugin in such cases silently disables the
> plugin/device (and thus lets libgomp proper do its thing), and I propose
> we do the same here.  OK to push the attached
> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to
> 'init_hsa_runtime_functions' is not fatal"?


If you try to run the offload testsuite on a device that is not properly 
configured then we want FAIL, not pass-via-fallback. You're breaking that.


Andrew


GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal (was: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin)

2024-03-07 Thread Thomas Schwinge
Hi!

On 2019-11-12T13:29:16+, Andrew Stubbs  wrote:
> This patch contributes the GCN libgomp plugin, with the various
> configure and make bits to go with it.

An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
different from the libgomp-level host-fallback execution):

> --- /dev/null
> +++ b/libgomp/plugin/plugin-gcn.c

> +/* Flag to decide if the runtime should suppress a possible fallback to host
> +   execution.  */
> +
> +static bool suppress_host_fallback;

> +static void
> +init_environment_variables (void)
> +{
> +  [...]
> +  if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK"))
> +suppress_host_fallback = true;
> +  else
> +suppress_host_fallback = false;

> +/* Return true if the HSA runtime can run function FN_PTR.  */
> +
> +bool
> +GOMP_OFFLOAD_can_run (void *fn_ptr)
> +{
> +  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
> +
> +  init_kernel (kernel);
> +  if (kernel->initialization_failed)
> +goto failure;
> +
> +  return true;
> +
> +failure:
> +  if (suppress_host_fallback)
> +GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> +  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
> +  return false;
> +}

This originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?  (I suppose everyone
really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)  Should we thus
actually remove 'suppress_host_fallback' (that is, make it
always-'true'), including removal of the 'can_run' hook?  (I suppose that
even in a future shared-memory "GCN" configuration, we're not expecting
to use this again; expecting always-'true' for 'can_run'?)


Now my actual issue: the libgomp GCN plugin then invented an additional
use of 'GCN_SUPPRESS_HOST_FALLBACK':

> +/* Initialize hsa_context if it has not already been done.
> +   Return TRUE on success.  */
> +
> +static bool
> +init_hsa_context (void)
> +{
> +  hsa_status_t status;
> +  int agent_index = 0;
> +
> +  if (hsa_context.initialized)
> +return true;
> +  init_environment_variables ();
> +  if (!init_hsa_runtime_functions ())
> +{
> +  GCN_WARNING ("Run-time could not be dynamically opened\n");
> +  if (suppress_host_fallback)
> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
> +  return false;
> +}

That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its
original purpose), and you have the libgomp GCN plugin configured, but
don't have 'libhsa-runtime64.so.1' available, you run into a fatal error.

The libgomp nvptx plugin in such cases silently disables the
plugin/device (and thus lets libgomp proper do its thing), and I propose
we do the same here.  OK to push the attached
"GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 
'init_hsa_runtime_functions' is not fatal"?


Grüße
 Thomas


>From f037d2d8274940f042633a0ecb18a53942c075f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 7 Mar 2024 10:43:15 +0100
Subject: [PATCH] GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to
 'init_hsa_runtime_functions' is not fatal

'GCN_SUPPRESS_HOST_FALLBACK' controls the libgomp GCN plugin's capability to
transparently use host-fallback execution for certain device functions; it
shouldn't control failure of libgomp GCN plugin initialization (which libgomp
handles fine: triggering use of a different plugin/device, or general
host-fallback execution, or fatal error, as applicable).

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Even with
	'GCN_SUPPRESS_HOST_FALLBACK' set, failure to
	'init_hsa_runtime_functions' is not fatal.
---
 libgomp/plugin/plugin-gcn.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 2771123252a..464164afb03 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1524,8 +1524,6 @@ init_hsa_context (void)
   if (!init_hsa_runtime_functions ())
 {
   GCN_WARNING ("Run-time could not be dynamically opened\n");
-  if (suppress_host_fallback)
-	GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
   return false;
 }
   status = hsa_fns.hsa_init_fn ();
-- 
2.34.1
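
As a rough illustration of what changes, here is a small C model of the runtime-loading step of init_hsa_context before and after this patch. It is a sketch under stated assumptions: the enum and both function names are hypothetical, and the rest of context initialization (hsa_init and so on) is left out.

```c
#include <stdbool.h>

enum plugin_outcome
{
  OUTCOME_OK,		/* Runtime loaded, initialization continues.  */
  OUTCOME_DISABLED,	/* Plugin reports failure; libgomp decides what next.  */
  OUTCOME_FATAL		/* The process is terminated.  */
};

/* Before the patch: the environment variable turns a missing runtime
   into a fatal error.  */
enum plugin_outcome
init_context_before (bool runtime_loaded, bool suppress_host_fallback)
{
  if (!runtime_loaded)
    return suppress_host_fallback ? OUTCOME_FATAL : OUTCOME_DISABLED;
  return OUTCOME_OK;
}

/* After the patch: the environment variable no longer matters here.  */
enum plugin_outcome
init_context_after (bool runtime_loaded, bool suppress_host_fallback)
{
  (void) suppress_host_fallback;
  return runtime_loaded ? OUTCOME_OK : OUTCOME_DISABLED;
}
```

The two functions differ only for the combination "runtime missing, variable set", which is the case under discussion in this thread.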



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 12:11 PM Richard Biener  wrote:
>
> On Thu, 7 Mar 2024, Jakub Jelinek wrote:
>
> > On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > > > Since you CCed me - looking at the code I wonder why we fatally fail.
> > > > The following might also fix the issue and preserve more of the
> > > > rest of the flow of the function.
> > > >
> > > > If that works I'd prefer it.  But I'll defer approval to the combine
> > > > maintainer which is Segher.
> > >
> > > Your patch is basically what v1 did [1], but it was suggested (in a
> > > reply by you ;) ) that we should stop the attempt to combine if we
> > > can't handle the use. So, the v2 patch undoes the combine and records
> > > a nice message in this case.
> >
> > My understanding of Richi's patch is that it treats the non-COMPARISON_P
> > the same as if find_single_use fails, which is a common case that certainly
> > has to be handled right and it doesn't seem that we are giving up completely
> > for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
> > *cc_use_loc as NULL cc_use_loc.
>
> The question is, whether a NULL cc_use_loc (find_single_use returning
> NULL) means "there is no use" or it can mean "huh, don't know, maybe
> more than one, maybe I was too stupid to identify the single use".
> The implementation suggests it's all broken ;)

As I understood find_single_use, it is returning RTX iff DEST is used
only a single time in an insn sequence following INSN.
find_single_use_1 returns RTX iff argument is used exactly once in
DEST. So, find_single_use returns RTX only when DEST is used exactly
once in a sequence following INSN.

We can reject the combination without worries of multiple uses.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Jakub Jelinek wrote:

> On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > > Since you CCed me - looking at the code I wonder why we fatally fail.
> > > The following might also fix the issue and preserve more of the
> > > rest of the flow of the function.
> > >
> > > If that works I'd prefer it.  But I'll defer approval to the combine
> > > maintainer which is Segher.
> > 
> > Your patch is basically what v1 did [1], but it was suggested (in a
> > reply by you ;) ) that we should stop the attempt to combine if we
> > can't handle the use. So, the v2 patch undoes the combine and records
> > a nice message in this case.
> 
> My understanding of Richi's patch is that it treats the non-COMPARISON_P
> the same as if find_single_use fails, which is a common case that certainly
> has to be handled right and it doesn't seem that we are giving up completely
> for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
> *cc_use_loc as NULL cc_use_loc.

The question is, whether a NULL cc_use_loc (find_single_use returning 
NULL) means "there is no use" or it can mean "huh, don't know, maybe
more than one, maybe I was too stupid to identify the single use".
The implementation suggests it's all broken ;)

Richard.


Re: [PATCH] ipa: Avoid excessive removing of SSAs (PR 113757)

2024-03-07 Thread Martin Jambor
Hello,

and ping please.

Martin


On Thu, Feb 08 2024, Martin Jambor wrote:
> Hi,
>
> PR 113757 shows that the code which was meant to debug-reset and
> remove SSAs defined by LHSs of calls redirected to
> __builtin_unreachable can trigger also when speculative
> devirtualization creates a call to a noreturn function (and since it
> is noreturn, it does not bother dealing with its return value).
>
> What is more, it seems that the code handling this case is not really
> necessary.  I feel slightly idiotic about this because I have a
> feeling that I added it because of a failing test-case but I can
> neither find the testcase nor a reason why the code in
> cgraph_edge::redirect_call_stmt_to_callee would not be sufficient (it
> turns the SSA name into a default-def, a bit like IPA-SRA, but any
> code dominated by a call to a noreturn is not dangerous when it comes
> to its side-effects).  So this patch just removes the handling.
>
> Bootstrapped and tested on x86_64-linux and ppc64le-linux.  I have also
> LTO-bootstrapped and LTO-profilebootstrapped the patch on x86_64-linux.
>
> OK for master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2024-02-07  Martin Jambor  
>
>   PR ipa/113757
>   * tree-inline.cc (redirect_all_calls): Remove code adding SSAs to
>   id->killed_new_ssa_names.
>
> gcc/testsuite/ChangeLog:
>
> 2024-02-07  Martin Jambor  
>
>   PR ipa/113757
>   * g++.dg/ipa/pr113757.C: New test.
> ---
>  gcc/testsuite/g++.dg/ipa/pr113757.C | 14 ++
>  gcc/tree-inline.cc  | 14 ++
>  2 files changed, 16 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr113757.C
>
> diff --git a/gcc/testsuite/g++.dg/ipa/pr113757.C b/gcc/testsuite/g++.dg/ipa/pr113757.C
> new file mode 100644
> index 000..885d4010a10
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr113757.C
> @@ -0,0 +1,14 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fPIC" }
> +// { dg-require-effective-target fpic }
> +
> +long size();
> +struct ll {  virtual int hh();  };
> +ll  *slice_owner;
> +int ll::hh() { __builtin_exit(0); }
> +int nn() {
> +  if (size())
> +return 0;
> +  return slice_owner->hh();
> +}
> +int (*a)() = nn;
> diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> index 75c10eb7dfc..cac41b4f031 100644
> --- a/gcc/tree-inline.cc
> +++ b/gcc/tree-inline.cc
> @@ -2984,23 +2984,13 @@ redirect_all_calls (copy_body_data * id, basic_block bb)
>gimple *stmt = gsi_stmt (si);
>if (is_gimple_call (stmt))
>   {
> -   tree old_lhs = gimple_call_lhs (stmt);
> struct cgraph_edge *edge = id->dst_node->get_edge (stmt);
> if (edge)
>   {
> if (!id->killed_new_ssa_names)
>   id->killed_new_ssa_names = new hash_set<tree> (16);
> -   gimple *new_stmt
> - = cgraph_edge::redirect_call_stmt_to_callee (edge,
> - id->killed_new_ssa_names);
> -   if (old_lhs
> -   && TREE_CODE (old_lhs) == SSA_NAME
> -   && !gimple_call_lhs (new_stmt))
> - /* In case of IPA-SRA removing the LHS, the name should have
> -been already added to the hash.  But in case of redirecting
> -to builtin_unreachable it was not and the name still should
> -be pruned from debug statements.  */
> - id->killed_new_ssa_names->add (old_lhs);
> +   cgraph_edge::redirect_call_stmt_to_callee (edge,
> + id->killed_new_ssa_names);
>  
> if (stmt == last && id->call_stmt && maybe_clean_eh_stmt (stmt))
>   gimple_purge_dead_eh_edges (bb);
> -- 
> 2.43.0


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Uros Bizjak wrote:

> On Thu, Mar 7, 2024 at 10:56 AM Richard Biener  wrote:
> >
> > On Thu, 7 Mar 2024, Uros Bizjak wrote:
> >
> > > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> > >
> > > internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
> > > have 'E' (rtx unspec) in try_combine, at combine.cc:3237
> > >
> > > This is
> > >
> > > 3236  /* Just replace the CC reg with a new mode.  */
> > > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > > 3238  undobuf.other_insn = cc_use_insn;
> > >
> > > in combine.cc, where *cc_use_loc is
> > >
> > > (unspec:DI [
> > > (reg:CC 17 flags)
> > > ] UNSPEC_PUSHFL)
> > >
> > > combine assumes CC must be used inside of a comparison and uses
> > > XEXP (..., 0) without checking on the RTX type of the argument.
> > >
> > > Undo the combination if *cc_use_loc is not COMPARISON_P.
> > >
> > > Also remove buggy and now redundant check for (const 0) RTX as part of
> > > the comparison.
> > >
> > > PR rtl-optimization/112560
> > >
> > > gcc/ChangeLog:
> > >
> > > * combine.cc (try_combine): Reject the combination
> > > if *cc_use_loc is not COMPARISON_P.
> > >
> > > Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
> > >
> > > OK for trunk?
> >
> > Since you CCed me - looking at the code I wonder why we fatally fail.
> > The following might also fix the issue and preserve more of the
> > rest of the flow of the function.
> >
> > If that works I'd prefer it.  But I'll defer approval to the combine
> > maintainer which is Segher.
> 
> Your patch is basically what v1 did [1], but it was suggested (in a
> reply by you ;) ) that we should stop the attempt to combine if we
> can't handle the use. So, the v2 patch undoes the combine and records
> a nice message in this case.

Ah, sorry.  I think I was merely curious how this works at all.
Your patch doesn't stop when find_single_use returns NULL
for multiple uses, but that case (as compared to "no other use")
should be handled the same as a !COMPARISON_P single-use.  The
original patch preserves the multi-use behavior which might be
a lot more unlikely than the !COMPARISON_P use case.

That said, in the original mail I questioned whether combine
handles the multi-use case correctly at all.  Looking at
find_single_use it seems to implement find_first_use.
Or get_use_of_single_use with the pretext that there actually
is only at most a single use (it doesn't seem to assert there
is one?).  Strange function, that.

But I don't know combine, I hope Segher can answer and pick a
patch.

Richard.

> Also, please note the removal of an existing crude hack that tries to
> reject non-comparison uses by looking for (const_int 0) in the use
> RTX, which is wrong as well.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638589.html
> 
> Uros.
> 
> >
> > Thanks,
> > Richard.
> >
> > diff --git a/gcc/combine.cc b/gcc/combine.cc
> > index a4479f8d836..e280cd72ec7 100644
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
> >
> >if (undobuf.other_insn == 0
> >   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> > -   &cc_use_insn)))
> > +   &cc_use_insn))
> > + && COMPARISON_P (*cc_use_loc))
> > {
> >   compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
> >   if (is_a <scalar_int_mode> (GET_MODE (i2dest), &mode))
> > @@ -3200,7 +3201,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
> >  the above simplify_compare_const() returned a new comparison
> >  operator.  undobuf.other_insn is assigned the CC use insn
> >  when modifying it.  */
> > - if (cc_use_loc)
> > + if (cc_use_loc && COMPARISON_P (*cc_use_loc))
> > {
> >  #ifdef SELECT_CC_MODE
> >   machine_mode new_mode
> >
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:37 AM Jakub Jelinek  wrote:
>
> On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > > Since you CCed me - looking at the code I wonder why we fatally fail.
> > > The following might also fix the issue and preserve more of the
> > > rest of the flow of the function.
> > >
> > > If that works I'd prefer it.  But I'll defer approval to the combine
> > > maintainer which is Segher.
> >
> > Your patch is basically what v1 did [1], but it was suggested (in a
> > reply by you ;) ) that we should stop the attempt to combine if we
> > can't handle the use. So, the v2 patch undoes the combine and records
> > a nice message in this case.
>
> My understanding of Richi's patch is that it treats the non-COMPARISON_P
> the same as if find_single_use fails, which is a common case that certainly
> has to be handled right and it doesn't seem that we are giving up completely
> for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
> *cc_use_loc as NULL cc_use_loc.

Please see the logic in my v1 patch. For COMPARISON_P (*cc_use_loc),
we execute the same code in the first hunk of the patch, but for
non-COMPARISON_P, my patch zeroes cc_use_loc. The cc_use_loc is used
only in the "if (cc_use_loc)" protected part, so clearing cc_use_loc
when !COMPARISON_P (*cc_use_loc) has exactly the same effect as adding
COMPARISON_P check to existing "if (cc_use_loc)": we can execute the
"if" part only when *cc_use_loc is a comparison.

The functionality of Richi's patch is exactly the same as my v1 patch
which was rejected for the reason mentioned in my previous post.
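
To spell out the difference between the two approaches, here is a toy C model. It is a sketch only: struct toy_rtx, enum cc_action and both functions are hypothetical and merely mirror the control flow discussed in this thread, not the actual combine.cc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an rtx: only records whether it is a comparison.  */
struct toy_rtx
{
  bool is_comparison;
};

enum cc_action
{
  CC_NO_USE,		/* No single use known: skip the CC-use rewrite.  */
  CC_REWRITE_USE,	/* Use is a comparison: operand 0 may be replaced.  */
  CC_REJECT_COMBINE	/* Undo the whole combination attempt.  */
};

/* v1 and Richard's variant: a non-comparison use is handled exactly
   like a NULL cc_use_loc.  */
enum cc_action
handle_use_v1 (const struct toy_rtx *cc_use)
{
  if (cc_use == NULL || !cc_use->is_comparison)
    return CC_NO_USE;
  return CC_REWRITE_USE;
}

/* v2: a non-comparison use makes the combination fail.  */
enum cc_action
handle_use_v2 (const struct toy_rtx *cc_use)
{
  if (cc_use == NULL)
    return CC_NO_USE;
  if (!cc_use->is_comparison)
    return CC_REJECT_COMBINE;
  return CC_REWRITE_USE;
}
```

The two variants agree for a NULL use and for a comparison use; they differ only for a single use that is not a comparison, such as the UNSPEC_PUSHFL case from the PR.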

Uros.


[PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-07 Thread Tejas Belagod
This patch fixes a bug where vect_recog_abd_pattern called vect_convert_output
with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of the LHS
of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
 gcc/tree-vect-patterns.cc|  5 ++---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c b/gcc/testsuite/gcc.dg/vect/pr114108.c
new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+{   
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U; 
+}   
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
   && !TYPE_UNSIGNED (abd_out_type))
 {
   tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
- unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, vectype_out);
+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
 }
 
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
-- 
2.25.1



[wwwdocs, committed] projects/gomp/: Fix typo, mark an item as implemented in GCC 14

2024-03-07 Thread Tobias Burnus

Found when glancing at it: A typo and an omission.
Committed. See https://gcc.gnu.org/projects/gomp/#omp5.2 for the result.

Tobias
commit f99d0f3a2c61ad6677170b9068d511c20ba1bfe1
Author: Tobias Burnus 
Date:   Thu Mar 7 11:40:57 2024 +0100

projects/gomp/: Fix typo, mark an item as implemented in GCC 14

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 8fdfb95a..b8f11508 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -708,7 +708,7 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-.terators in target update motion clauses and map clauses
+Iterators in target update motion clauses and map clauses
 No
 
   
@@ -729,7 +729,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 present argument to defaultmap clause
-No
+GCC14
 
   
   


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Jakub Jelinek
On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > Since you CCed me - looking at the code I wonder why we fatally fail.
> > The following might also fix the issue and preserve more of the
> > rest of the flow of the function.
> >
> > If that works I'd prefer it.  But I'll defer approval to the combine
> > maintainer which is Segher.
> 
> Your patch is basically what v1 did [1], but it was suggested (in a
> reply by you ;) ) that we should stop the attempt to combine if we
> can't handle the use. So, the v2 patch undoes the combine and records
> a nice message in this case.

My understanding of Richi's patch is that it treats the non-COMPARISON_P
the same as if find_single_use fails, which is a common case that certainly
has to be handled right and it doesn't seem that we are giving up completely
for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
*cc_use_loc as NULL cc_use_loc.

Jakub
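
The idea of treating a non-comparison use like a missing use can be illustrated outside of GCC.  The following is only a toy sketch: toy_expr, toy_comparison_p and toy_filter_cc_use are hypothetical names invented for illustration and are not GCC's rtx, COMPARISON_P or find_single_use.  It shows a filter that hands back the use location only when it holds a comparison, so that later "if (use_loc)" checks take the same path as when no single use was found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an expression node; this is not GCC's rtx.  */
enum toy_code { TOY_REG, TOY_EQ, TOY_NE, TOY_LT, TOY_UNSPEC };

struct toy_expr
{
  enum toy_code code;
  struct toy_expr *op0;  /* first operand; assumed valid only for comparisons */
};

/* Stand-in for a COMPARISON_P-style predicate: true only for the
   comparison codes of this toy model.  */
static int
toy_comparison_p (const struct toy_expr *x)
{
  assert (x != NULL);
  return x->code == TOY_EQ || x->code == TOY_NE || x->code == TOY_LT;
}

/* Return USE_LOC if it points at a comparison, otherwise NULL.  Callers
   that already handle a NULL location then never touch op0 of a
   non-comparison node.  */
static struct toy_expr **
toy_filter_cc_use (struct toy_expr **use_loc)
{
  if (use_loc != NULL && toy_comparison_p (*use_loc))
    return use_loc;
  return NULL;
}
```

With this shape, a node such as the unspec from the ICE report would simply be filtered out before any operand access, which corresponds to the behaviour described above for a NULL cc_use_loc.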



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:56 AM Richard Biener  wrote:
>
> On Thu, 7 Mar 2024, Uros Bizjak wrote:
>
> > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> >
> > internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
> > have 'E' (rtx unspec) in try_combine, at combine.cc:3237
> >
> > This is
> >
> > 3236  /* Just replace the CC reg with a new mode.  */
> > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > 3238  undobuf.other_insn = cc_use_insn;
> >
> > in combine.cc, where *cc_use_loc is
> >
> > (unspec:DI [
> > (reg:CC 17 flags)
> > ] UNSPEC_PUSHFL)
> >
> > combine assumes CC must be used inside of a comparison and uses XEXP (..., 
> > 0)
> > without checking on the RTX type of the argument.
> >
> > Undo the combination if *cc_use_loc is not COMPARISON_P.
> >
> > Also remove buggy and now redundant check for (const 0) RTX as part of
> > the comparison.
> >
> > PR rtl-optimization/112560
> >
> > gcc/ChangeLog:
> >
> > * combine.cc (try_combine): Reject the combination
> > if *cc_use_loc is not COMPARISON_P.
> >
> > Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
> >
> > OK for trunk?
>
> Since you CCed me - looking at the code I wonder why we fatally fail.
> The following might also fix the issue and preserve more of the
> rest of the flow of the function.
>
> If that works I'd prefer it.  But I'll defer approval to the combine
> maintainer which is Segher.

Your patch is basically what v1 did [1], but it was suggested (in a
reply by you ;) ) that we should stop the attempt to combine if we
can't handle the use. So, the v2 patch undoes the combine and records
a nice message in this case.

Also, please note the removal of an existing crude hack that tries to
reject non-comparison uses by looking for (const_int 0) in the use
RTX, which is wrong as well.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638589.html

Uros.

>
> Thanks,
> Richard.
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..e280cd72ec7 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> *i1, rtx_insn *i0,
>
>if (undobuf.other_insn == 0
>   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> -   _use_insn)))
> +   _use_insn))
> + && COMPARISON_P (*cc_use_loc))
> {
>   compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
>   if (is_a  (GET_MODE (i2dest), ))
> @@ -3200,7 +3201,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> *i1, rtx_insn *i0,
>  the above simplify_compare_const() returned a new comparison
>  operator.  undobuf.other_insn is assigned the CC use insn
>  when modifying it.  */
> - if (cc_use_loc)
> + if (cc_use_loc && COMPARISON_P (*cc_use_loc))
> {
>  #ifdef SELECT_CC_MODE
>   machine_mode new_mode
>


Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-07 Thread Richard Earnshaw (lists)
On 06/03/2024 20:28, Alexandre Oliva wrote:
> On Mar  1, 2024, "Richard Earnshaw (lists)"  wrote:
> 
>> On 01/03/2024 04:38, Alexandre Oliva wrote:
>>> Thanks for the review.
> 
>> For closure, Jakub has just pushed a patch to the generic code, so I
>> don't think we need this now.
> 
> ACK.  I see the c2x-stdarg-4.c test is now passing in our arm-eabi
> gcc-13 tree.  Thank you all.
> 
> Alas, the same nightly build showed a new riscv fail in c23-stdarg-6.c,
> that also got backported to gcc-13.  Presumably it's failing in the
> trunk as well, both riscv32-elf and riscv64-elf.
> 
> I haven't looked into whether it's a regression brought about by the
> patch or just a new failure mode that the new test exposed.  Either way,
> I'm not sure whether to link this new failure to any of the associated
> PRs or to file a new one, but, FTR, I'm going to look into it.
> 

I'd suggest a new PR.  It's easier to track than re-opening an existing one.

R.

> -- 
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/ 
> 
>    Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive



Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Uros Bizjak wrote:

> The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> 
> internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
> have 'E' (rtx unspec) in try_combine, at combine.cc:3237
> 
> This is
> 
> 3236  /* Just replace the CC reg with a new mode.  */
> 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> 3238  undobuf.other_insn = cc_use_insn;
> 
> in combine.cc, where *cc_use_loc is
> 
> (unspec:DI [
> (reg:CC 17 flags)
> ] UNSPEC_PUSHFL)
> 
> combine assumes CC must be used inside of a comparison and uses XEXP (..., 0)
> without checking on the RTX type of the argument.
> 
> Undo the combination if *cc_use_loc is not COMPARISON_P.
> 
> Also remove buggy and now redundant check for (const 0) RTX as part of
> the comparison.
> 
> PR rtl-optimization/112560
> 
> gcc/ChangeLog:
> 
> * combine.cc (try_combine): Reject the combination
> if *cc_use_loc is not COMPARISON_P.
> 
> Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
> 
> OK for trunk?

Since you CCed me - looking at the code I wonder why we fatally fail.
The following might also fix the issue and preserve more of the
rest of the flow of the function.

If that works I'd prefer it.  But I'll defer approval to the combine
maintainer which is Segher.

Thanks,
Richard.

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..e280cd72ec7 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
*i1, rtx_insn *i0,
 
   if (undobuf.other_insn == 0
  && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
-   _use_insn)))
+   _use_insn))
+ && COMPARISON_P (*cc_use_loc))
{
  compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
  if (is_a  (GET_MODE (i2dest), ))
@@ -3200,7 +3201,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
*i1, rtx_insn *i0,
 the above simplify_compare_const() returned a new comparison
 operator.  undobuf.other_insn is assigned the CC use insn
 when modifying it.  */
- if (cc_use_loc)
+ if (cc_use_loc && COMPARISON_P (*cc_use_loc))
{
 #ifdef SELECT_CC_MODE
  machine_mode new_mode



Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Iain Sandoe wrote:

> Hi Dimitry, folks,
> 
> > On 6 Mar 2024, at 23:02, Dimitry Andric  wrote:
> > 
> > On 6 Mar 2024, at 15:57, FX Coudert  wrote:
> >> 
> >>> Hmm I recall trying it and finding a problem - was there some different 
> >>> fix applied
> >>> in the end?
> >> 
> >> The bug is still open, I don't think a patch was applied, and I don't find 
> >> any email to the list stating what the problem could be.
> > 
> > The original patch (https://gcc.gnu.org/bugzilla/attachment.cgi?id=56010) 
> > still applies to the master branch.
> 
> I have retested this on various Darwin versions and confirm that it fixes the 
> bootstrap fail on x86_64-darwin23 and works OK on other versions (including 
> 32b hosts). 
> 
> +1 for applying this soon.

I think it's an obvious change ...

Richard.

> 
> 
> the second patch really needs to be posted separately to make review easier 
> (if we were not in stage 4, I'd say it's 'obvious' anyway).
> 
> thanks
> Iain
> 
> 
> > It turned out there is also a related problem in libcc1plugin.cc and 
> > libcp1plugin.cc , which is fixed by 
> > https://gcc.gnu.org/bugzilla/attachment.cgi?id=57639 :
> > 
> > commit 49222b98ac8e30a4a042ada0ece3d7df93f049d2
> > Author: Dimitry Andric 
> > Date:   2024-03-06T23:46:27+01:00
> > 
> > >Fix libcc1plugin and libcp1plugin to use INCLUDE_VECTOR before including
> > >system.h, instead of directly including <vector>, to avoid running into
> > >poisoned identifiers.
> > 
> > diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
> > index 72d17c3b81c..e64847466f4 100644
> > --- a/libcc1/libcc1plugin.cc
> > +++ b/libcc1/libcc1plugin.cc
> > @@ -32,6 +32,7 @@
> > #undef PACKAGE_VERSION
> > 
> > #define INCLUDE_MEMORY
> > +#define INCLUDE_VECTOR
> > #include "gcc-plugin.h"
> > #include "system.h"
> > #include "coretypes.h"
> > @@ -69,8 +70,6 @@
> > #include "gcc-c-interface.h"
> > #include "context.hh"
> > 
> > -#include <vector>
> > -
> > using namespace cc1_plugin;
> > 
> > 
> > diff --git a/libcc1/libcp1plugin.cc b/libcc1/libcp1plugin.cc
> > index 0eff7c68d29..da68c5d0ac1 100644
> > --- a/libcc1/libcp1plugin.cc
> > +++ b/libcc1/libcp1plugin.cc
> > @@ -33,6 +33,7 @@
> > #undef PACKAGE_VERSION
> > 
> > #define INCLUDE_MEMORY
> > +#define INCLUDE_VECTOR
> > #include "gcc-plugin.h"
> > #include "system.h"
> > #include "coretypes.h"
> > @@ -71,8 +72,6 @@
> > #include "rpc.hh"
> > #include "context.hh"
> > 
> > -#include <vector>
> > -
> > using namespace cc1_plugin;
> > 
> > 
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:

internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
have 'E' (rtx unspec) in try_combine, at combine.cc:3237

This is

3236  /* Just replace the CC reg with a new mode.  */
3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
3238  undobuf.other_insn = cc_use_insn;

in combine.cc, where *cc_use_loc is

(unspec:DI [
(reg:CC 17 flags)
] UNSPEC_PUSHFL)

combine assumes CC must be used inside of a comparison and uses XEXP (..., 0)
without checking on the RTX type of the argument.

Undo the combination if *cc_use_loc is not COMPARISON_P.

Also remove buggy and now redundant check for (const 0) RTX as part of
the comparison.

PR rtl-optimization/112560

gcc/ChangeLog:

* combine.cc (try_combine): Reject the combination
if *cc_use_loc is not COMPARISON_P.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

OK for trunk?

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..6dac9ffca85 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3184,11 +3184,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
  && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
_use_insn)))
{
- compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
- if (is_a  (GET_MODE (i2dest), ))
-   compare_code = simplify_compare_const (compare_code, mode,
-  , );
- target_canonicalize_comparison (_code, , , 1);
+ if (COMPARISON_P (*cc_use_loc))
+   {
+ compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
+ if (is_a  (GET_MODE (i2dest), ))
+   compare_code = simplify_compare_const (compare_code, mode,
+  , );
+ target_canonicalize_comparison (_code, , , 1);
+   }
+ else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "CC register not used in comparison.\n");
+ undo_all ();
+ return 0;
+   }
}
 
   /* Do the rest only if op1 is const0_rtx, which may be the
@@ -3221,9 +3231,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
}
 #endif
  /* Cases for modifying the CC-using comparison.  */
- if (compare_code != orig_compare_code
- /* ??? Do we need to verify the zero rtx?  */
- && XEXP (*cc_use_loc, 1) == const0_rtx)
+ if (compare_code != orig_compare_code)
{
  /* Replace cc_use_loc with entire new RTX.  */
  SUBST (*cc_use_loc,


RE: [PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-07 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, March 7, 2024 8:47 AM
> To: Robin Dapp 
> Cc: gcc-patches ; Tamar Christina
> 
> Subject: Re: [PATCH] vect: Do not peel epilogue for partial vectors 
> [PR114196].
> 
> On Wed, Mar 6, 2024 at 9:21 PM Robin Dapp  wrote:
> >
> > Hi,
> >
> > r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
> > break but PR114196 shows that we also run into the problem without early
> > break.  Therefore remove early break from the conditions.
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/114196
> >
> > * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Remove
> > early break check from guards.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/pr114196.c: New test.
> > * gcc.target/riscv/rvv/autovec/pr114196.c: New test.
> > ---
> >  gcc/testsuite/gcc.target/aarch64/pr114196.c   | 19 +++
> >  .../gcc.target/riscv/rvv/autovec/pr114196.c   | 19 +++
> >  gcc/tree-vect-loop-manip.cc   |  6 +++---
> >  3 files changed, 41 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr114196.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr114196.c
> b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> > new file mode 100644
> > index 000..15e4b0e31b8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-options { -O3 -fno-vect-cost-model -march=armv9-a -msve-vector-
> bits=256 } } */
> > +
> > +unsigned a;
> > +int b;
> > +long *c;
> > +
> > +int
> > +main ()
> > +{
> > +  for (int d = 0; d < 22; d += 4) {
> > +  b = ({
> > +   int e = c[d];
> > +   e;
> > +   })
> > +  ? 0 : -c[d];
> > +  a *= 3;
> > +  }
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> > new file mode 100644
> > index 000..7ba9cbbed70
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-options { -O3 -fno-vect-cost-model -march=rv64gcv_zvl256b -
> mabi=lp64d -mrvv-vector-bits=zvl } } */
> > +
> > +unsigned a;
> > +int b;
> > +long *c;
> > +
> > +int
> > +main ()
> > +{
> > +  for (int d = 0; d < 22; d += 4) {
> > +  b = ({
> > +   int e = c[d];
> > +   e;
> > +   })
> > +  ? 0 : -c[d];
> > +  a *= 3;
> > +  }
> > +}
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index f72da915103..c3cd20eef70 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -2183,9 +2183,9 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> loop_vinfo,
> >   perform the peeling.  The below condition mirrors that of
> >   vect_gen_vector_loop_niters  where niters_vector_mult_vf_var then sets
> >   step_vector to VF rather than 1.  This is what creates the nonlinear
> > - IV.  PR113163.  */
> > -  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > -  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> > + IV.  PR113163.
> > + This also happens without early breaks, see PR114196.  */
> 
> Can you instead reword to not mention early breaks, maybe instead
> say PR113163 (with early breaks), PR114196 (without)?
> 
> The dump message also needs adjustments, it mentions early breaks as
> well.
> 
> The comment says it matches a condition in vect_gen_vector_loop_niters
> but I can't see what that means ... Tamar?
> 

The comment was trying to say that this case is when you manage to get here:
https://github.com/gcc-mirror/gcc/blob/95b6ee96348041eaee9133f082b57f3e57ef0b11/gcc/tree-vect-loop-manip.cc#L2847

because that makes you fall into 
https://github.com/gcc-mirror/gcc/blob/95b6ee96348041eaee9133f082b57f3e57ef0b11/gcc/tree-vect-loop-manip.cc#L3528
 which creates the nonlinear IV variable.

The vect_step_op_neg exception is because vect_update_ivs_after_vectorizer can 
deal with that case specifically
https://github.com/gcc-mirror/gcc/blob/95b6ee96348041eaee9133f082b57f3e57ef0b11/gcc/tree-vect-loop-manip.cc#L2398

which is what the previous check is also explaining 
https://github.com/gcc-mirror/gcc/blob/95b6ee96348041eaee9133f082b57f3e57ef0b11/gcc/tree-vect-loop-manip.cc#L2133

If this also happens for non-early breaks it's just better to merge the check 
into the earlier one at 
https://github.com/gcc-mirror/gcc/blob/95b6ee96348041eaee9133f082b57f3e57ef0b11/gcc/tree-vect-loop-manip.cc#L2133

Tamar

> > +  if (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> >&& LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> >&& induction_type != vect_step_op_neg)
> >  {
> > --
> > 2.43.2


Re: [PATCH] bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because fix_crossing_unconditional_branches
> thinks that asm goto is an unconditional jump and removes it, replacing it
> with unconditional jump to one of the labels.
> This doesn't happen on x86 because the function in question isn't invoked
> there at all:
>   /* If the architecture does not have unconditional branches that
>  can span all of memory, convert crossing unconditional branches
>  into indirect jumps.  Since adding an indirect jump also adds
>  a new register usage, update the register usage information as
>  well.  */
>   if (!HAS_LONG_UNCOND_BRANCH)
> fix_crossing_unconditional_branches ();
> I think for the asm goto case, for the non-fallthru edge if any we should
> handle it like any other fallthru (and fix_crossing_unconditional_branches
> doesn't really deal with those, it only looks at explicit branches at the
> end of bbs and we are in cfglayout mode at that point) and for the labels
> we just pass the labels as immediates to the assembly and it is up to the
> user to figure out how to store them/branch to them or whatever they want to
> do.
> So, the following patch fixes this by not treating asm goto as a simple
> unconditional jump.
> 
> I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
> somewhere else, where outofcfglayout or whatever should actually create
> those indirect jumps on the crossing edges instead of adding normal
> unconditional jumps, I see e.g. in
> __attribute__((cold)) int bar (char *);
> __attribute__((hot)) int baz (char *);
> void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; 
> l1: baz (""); }
> void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: 
> return; l1: bar (""); goto l2; }
> with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
> just b .L? jumps which I believe are +-32MB, so if .text is larger than
> 32MB, it could fail to link, but this patch doesn't address that.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux and aarch64-linux, ok for
> trunk?

OK.

Thanks,
Richard.

> 2024-03-07  Jakub Jelinek  
> 
>   PR rtl-optimization/110079
>   * bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
>   asm goto.
> 
>   * gcc.dg/pr110079.c: New test.
> 
> --- gcc/bb-reorder.cc.jj  2024-01-03 11:51:32.0 +0100
> +++ gcc/bb-reorder.cc 2024-03-06 18:34:29.468016144 +0100
> @@ -2266,7 +2266,8 @@ fix_crossing_unconditional_branches (voi
> /* Make sure the jump is not already an indirect or table jump.  */
>  
> if (!computed_jump_p (last_insn)
> -   && !tablejump_p (last_insn, NULL, NULL))
> +   && !tablejump_p (last_insn, NULL, NULL)
> +   && !asm_noperands (PATTERN (last_insn)))
>   {
> /* We have found a "crossing" unconditional branch.  Now
>we must convert it to an indirect jump.  First create
> --- gcc/testsuite/gcc.dg/pr110079.c.jj2024-03-06 18:42:47.175250069 
> +0100
> +++ gcc/testsuite/gcc.dg/pr110079.c   2024-03-06 18:44:47.008620726 +0100
> @@ -0,0 +1,43 @@
> +/* PR rtl-optimization/110079 */
> +/* { dg-do compile { target lra } } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-freorder-blocks-and-partition" { target 
> freorder } } */
> +
> +int a;
> +__attribute__((cold)) int bar (char *);
> +__attribute__((hot)) int baz (char *);
> +
> +void
> +foo (void)
> +{
> +l1:
> +  while (a)
> +;
> +  bar ("");
> +  asm goto ("" : : : : l2);
> +  asm ("");
> +l2:
> +  goto l1;
> +}
> +
> +void
> +qux (void)
> +{
> +  asm goto ("" : : : : l1);
> +  bar ("");
> +  goto l1;
> +l1:
> +  baz ("");
> +}
> +
> +void
> +corge (void)
> +{
> +  asm goto ("" : : : : l1);
> +  baz ("");
> +l2:
> +  return;
> +l1:
> +  bar ("");
> +  goto l2;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] expand: Fix UB in choose_mult_variant [PR105533]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> As documented in the function comment, choose_mult_variant attempts to
> compute costs of 3 different cases, val, -val and val - 1.
> The -val case is actually only done if val fits into host int, so there
> should be no overflow, but the val - 1 case is done unconditionally.
> val is shwi (but inside of synth_mult already uhwi), so when val is
> HOST_WIDE_INT_MIN, val - 1 invokes UB.  The following patch fixes that
> by using val - HOST_WIDE_INT_1U, but I'm not really convinced it would
> DTRT for > 64-bit modes, so I've guarded it as well.  Though, arch
> would need to have really strange costs that something that could be
> expressed as x << 63 would be better expressed as (x * 0x7fff) + 1
> In the long term, I think we should just rewrite
> choose_mult_variant/synth_mult etc. to work on wide_int.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-07  Jakub Jelinek  
> 
>   PR middle-end/105533
>   * expmed.cc (choose_mult_variant): Only try the val - 1 variant
>   if val is not HOST_WIDE_INT_MIN or if mode has exactly
>   HOST_BITS_PER_WIDE_INT precision.  Avoid triggering UB while computing
>   val - 1.
> 
>   * gcc.dg/pr105533.c: New test.
> 
> --- gcc/expmed.cc.jj  2024-01-03 11:51:27.327789537 +0100
> +++ gcc/expmed.cc 2024-03-06 16:01:27.898693433 +0100
> @@ -3285,11 +3285,15 @@ choose_mult_variant (machine_mode mode,
>limit.latency = mult_cost - op_cost;
>  }
>  
> -  synth_mult (, val - 1, , mode);
> -  alg2.cost.cost += op_cost;
> -  alg2.cost.latency += op_cost;
> -  if (CHEAPER_MULT_COST (, >cost))
> -*alg = alg2, *variant = add_variant;
> +  if (val != HOST_WIDE_INT_MIN
> +  || GET_MODE_UNIT_PRECISION (mode) == HOST_BITS_PER_WIDE_INT)
> +{
> +  synth_mult (, val - HOST_WIDE_INT_1U, , mode);
> +  alg2.cost.cost += op_cost;
> +  alg2.cost.latency += op_cost;
> +  if (CHEAPER_MULT_COST (, >cost))
> + *alg = alg2, *variant = add_variant;
> +}
>  
>return MULT_COST_LESS (>cost, mult_cost);
>  }
> --- gcc/testsuite/gcc.dg/pr105533.c.jj2024-03-06 16:03:50.347757251 
> +0100
> +++ gcc/testsuite/gcc.dg/pr105533.c   2024-03-06 16:03:26.226084751 +0100
> @@ -0,0 +1,9 @@
> +/* PR middle-end/105533 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +long long
> +foo (long long x, long long y)
> +{
> +  return ((x < 0) & (y != 0)) * (-__LONG_LONG_MAX__ - 1);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
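
The undefined behaviour discussed in this thread can be reproduced in miniature.  The following is a minimal sketch, assuming long long as a stand-in for HOST_WIDE_INT and 64 as a stand-in for HOST_BITS_PER_WIDE_INT; may_try_val_minus_one and val_minus_one are hypothetical helpers, not functions from expmed.cc.  It shows that doing the subtraction in unsigned arithmetic wraps instead of overflowing, and mirrors the shape of the guard from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the guard in the patch: the val - 1 variant
   is tried unless val is the minimum value and the mode is not exactly
   64 bits wide (64 is an assumption standing in for
   HOST_BITS_PER_WIDE_INT).  */
static int
may_try_val_minus_one (long long val, unsigned precision)
{
  assert (precision > 0);
  return val != LLONG_MIN || precision == 64;
}

/* Compute val - 1 in unsigned arithmetic.  Signed LLONG_MIN - 1 would
   overflow, which is undefined; the unsigned subtraction wraps modulo
   2^N instead, which is well defined.  */
static unsigned long long
val_minus_one (long long val)
{
  return (unsigned long long) val - 1ULL;
}
```

For the minimum value the unsigned result has the same bit pattern as the maximum signed value, which is the wrapped result the multiplication synthesis would see.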


Re: [PATCH] vect: Do not peel epilogue for partial vectors [PR114196].

2024-03-07 Thread Richard Biener
On Wed, Mar 6, 2024 at 9:21 PM Robin Dapp  wrote:
>
> Hi,
>
> r14-7036-gcbf569486b2dec added an epilogue vectorization guard for early
> break but PR114196 shows that we also run into the problem without early
> break.  Therefore remove early break from the conditions.
>
> gcc/ChangeLog:
>
> PR middle-end/114196
>
> * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Remove
> early break check from guards.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/pr114196.c: New test.
> * gcc.target/riscv/rvv/autovec/pr114196.c: New test.
> ---
>  gcc/testsuite/gcc.target/aarch64/pr114196.c   | 19 +++
>  .../gcc.target/riscv/rvv/autovec/pr114196.c   | 19 +++
>  gcc/tree-vect-loop-manip.cc   |  6 +++---
>  3 files changed, 41 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr114196.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr114196.c 
> b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> new file mode 100644
> index 000..15e4b0e31b8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr114196.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options { -O3 -fno-vect-cost-model -march=armv9-a 
> -msve-vector-bits=256 } } */
> +
> +unsigned a;
> +int b;
> +long *c;
> +
> +int
> +main ()
> +{
> +  for (int d = 0; d < 22; d += 4) {
> +  b = ({
> +   int e = c[d];
> +   e;
> +   })
> +  ? 0 : -c[d];
> +  a *= 3;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> new file mode 100644
> index 000..7ba9cbbed70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr114196.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options { -O3 -fno-vect-cost-model -march=rv64gcv_zvl256b 
> -mabi=lp64d -mrvv-vector-bits=zvl } } */
> +
> +unsigned a;
> +int b;
> +long *c;
> +
> +int
> +main ()
> +{
> +  for (int d = 0; d < 22; d += 4) {
> +  b = ({
> +   int e = c[d];
> +   e;
> +   })
> +  ? 0 : -c[d];
> +  a *= 3;
> +  }
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f72da915103..c3cd20eef70 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2183,9 +2183,9 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
>   perform the peeling.  The below condition mirrors that of
>   vect_gen_vector_loop_niters  where niters_vector_mult_vf_var then sets
>   step_vector to VF rather than 1.  This is what creates the nonlinear
> - IV.  PR113163.  */
> -  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> -  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
> + IV.  PR113163.
> + This also happens without early breaks, see PR114196.  */

Can you instead reword to not mention early breaks, maybe instead
say PR113163 (with early breaks), PR114196 (without)?

The dump message also needs adjustments, it mentions early breaks as
well.

The comment says it matches a condition in vect_gen_vector_loop_niters
but I can't see what that means ... Tamar?

> +  if (LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
>&& LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>&& induction_type != vect_step_op_neg)
>  {
> --
> 2.43.2


Re: [PATCH] sccvn: Avoid UB in ao_ref_init_from_vn_reference [PR105533]

2024-03-07 Thread Richard Biener
On Thu, 7 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> When compiling libgcc or on e.g.
> int a[64];
> int p;
> 
> void
> foo (void)
> {
>   int s = 1;
>   while (p)
> {
>   s -= 11;
>   a[s] != 0;
> }
> }
> sccvn invokes UB in the compiler as detected by ubsan:
> ../../gcc/poly-int.h:1089:5: runtime error: left shift of negative value -40
> The problem is that we still use C++11..C++17 as the implementation language
> and in those C++ versions shifting negative values left is UB (well defined
> since C++20) and above in
>offset += op->off << LOG2_BITS_PER_UNIT;
> op->off is poly_int64 with -40 value (in libgcc with -8).
> I understand the offset_int << LOG2_BITS_PER_UNIT shifts but it is then well
> defined during underlying implementation which is done on the uhwi limbs,
> but for poly_int64 we use
>   offset += pop->off * BITS_PER_UNIT;
> a few lines earlier and I think that is both more readable in what it
> actually does and triggers UB only if there would be signed multiply
> overflow.  In the end, the compiler will treat them the same at least at the
> RTL level (at least, if not and they aren't the same cost, it should).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-07  Jakub Jelinek  
> 
>   PR middle-end/105533
>   * tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference) :
>   Multiply op->off by BITS_PER_UNIT instead of shifting it left by
>   LOG2_BITS_PER_UNIT.
> 
> --- gcc/tree-ssa-sccvn.cc.jj  2024-02-28 22:57:18.318658827 +0100
> +++ gcc/tree-ssa-sccvn.cc 2024-03-06 14:52:16.819229719 +0100
> @@ -1221,7 +1221,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
> if (maybe_eq (op->off, -1))
>   max_size = -1;
> else
> - offset += op->off << LOG2_BITS_PER_UNIT;
> + offset += op->off * BITS_PER_UNIT;
> break;
>  
>   case REALPART_EXPR:
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
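
The difference between the two forms can be shown in plain C, where left-shifting a negative value is likewise undefined.  This is only a sketch under stated assumptions: TOY_BITS_PER_UNIT is a hypothetical constant fixed at 8, and byte_to_bit_offset is an illustrative helper on int64_t, not the poly_int64 code in tree-ssa-sccvn.cc.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BITS_PER_UNIT 8  /* assumption: 8-bit storage units */

/* Convert a possibly negative byte offset to a bit offset.  Writing
   byte_off << 3 would be undefined for negative byte_off in C (and in
   C++ before C++20); the multiplication is defined for every input
   that does not overflow.  */
static int64_t
byte_to_bit_offset (int64_t byte_off)
{
  /* The only remaining failure mode is signed multiply overflow.  */
  assert (byte_off >= INT64_MIN / TOY_BITS_PER_UNIT
          && byte_off <= INT64_MAX / TOY_BITS_PER_UNIT);
  return byte_off * TOY_BITS_PER_UNIT;
}
```

The offsets mentioned in the report, -40 and -8, are ordinary inputs for the multiplied form.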


Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-03-07 Thread Richard Biener
On Thu, Mar 7, 2024 at 2:54 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > gen_int_libfunc will no longer make it emit libcalls for fixed point
> > modes, so this can't be correct
> > and there's no libgcc implementation for integer mode saturating ops,
> > so it's pointless to emit calls
> > to them.
>
> Got the point here: the OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, 
> "usadd", '3', gen_unsigned_fixed_libfunc)
> is designed for fixed point and cannot cover integer modes right now.

I think

OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
gen_unsigned_fixed_libfunc)

would work fine (just dropping the $Q).

> Given that we have saturating integer ALU operations like the ones below,
> could you advise on the most reasonable way to represent them in the
> scalar as well as the vectorized part? Sorry, I am not familiar with this
> area and am still digging into how it works...

As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
the other cases.

As I said, use vectorizer patterns and possibly do instruction
selection at ISEL/widen_mult time.

Richard.

> uint32_t sat_uadd (uint32_t a, uint32_t b)
> {
>   uint32_t add = a + b;
>   return add | -(add < a);
> }
>
> sint32_t sat_sadd (sint32_t a, sint32_t b)
> {
>   sint32_t add = a + b;
>   sint32_t x = a ^ b;
>   sint32_t y = add ^ x;
>   return x < 0 ? add : (y >= 0 ? add : INT32_MAX + (x < 0));
> }
>
> uint32_t sat_usub (uint32_t a, uint32_t b)
> {
>   return a >= b ? a - b : 0;
> }
>
> sint32_t sat_ssub (sint32_t a, sint32_t b)
> {
>   sint32_t sub = a - b;
>   sint32_t x = a ^ b;
>   sint32_t y = sub ^ x;
>   return x >= 0 ? sub : (y >= 0 ? sub : INT32_MAX + (x < 0));
> }
>
> uint32_t sat_umul (uint32_t a, uint32_t b)
> {
>   uint64_t mul = (uint64_t)a * b;
>
>   return mul <= (uint64_t)UINT32_MAX ? (uint32_t)mul : UINT32_MAX;
> }
>
> sint32_t sat_smul (sint32_t a, sint32_t b)
> {
>   sint64_t mul = (sint64_t)a * b;
>
>   return mul >= (sint64_t)INT32_MIN && mul <= (sint64_t)INT32_MAX
>     ? (sint32_t)mul : ((a ^ b) < 0 ? INT32_MIN : INT32_MAX);
> }
>
> uint32_t sat_udiv (uint32_t a, uint32_t b)
> {
>   return a / b; // never overflow
> }
>
> sint32_t sat_sdiv (sint32_t a, sint32_t b)
> {
>   return a == INT32_MIN && b == -1 ? INT32_MAX : a / b;
> }
>
> sint32_t sat_abs (sint32_t a)
> {
>   return a >= 0 ? a : (a == INT32_MIN ? INT32_MAX : -a);
> }
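
As a point of comparison for the scalar forms quoted above, here is a hedged sketch of the two saturating additions written so that no signed overflow occurs on the way. The names sat_add_u32 and sat_add_s32 are made up for this example; it says nothing about how .SAT_ADD is or will be matched in GCC.

```c
#include <stdint.h>

/* Unsigned saturating add: unsigned arithmetic wraps, so a wrapped sum
   is smaller than either operand.  */
static uint32_t
sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}

/* Signed saturating add.  Overflow is detected on the unsigned bit
   patterns, so the signed addition is only executed when it is known
   not to overflow.  */
static int32_t
sat_add_s32 (int32_t a, int32_t b)
{
  uint32_t ua = (uint32_t) a, ub = (uint32_t) b;
  uint32_t sum = ua + ub;

  /* Overflow happened iff the sign of the wrapped sum differs from the
     sign of both operands.  */
  if (((ua ^ sum) & (ub ^ sum)) >> 31)
    return a < 0 ? INT32_MIN : INT32_MAX;

  return a + b;
}
```

For example, sat_add_s32 (INT32_MAX, 1) saturates to INT32_MAX and sat_add_s32 (INT32_MIN, -1) saturates to INT32_MIN, while sat_add_s32 (-5, 3) is simply -2.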
>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Tuesday, March 5, 2024 4:41 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
> kito.ch...@gmail.com; jeffreya...@gmail.com
> Subject: Re: [PATCH v2] Draft|Internal-fn: Introduce internal fn saturation 
> US_PLUS
>
> On Tue, Mar 5, 2024 at 8:09 AM Li, Pan2  wrote:
> >
> > Thanks Richard for comments.
> >
> > > I do wonder what the existing usadd patterns with integer vector modes
> > > in various targets do?
> > > Those define_insn will at least not end up in the optab set I guess,
> > > so they must end up
> > > being either unused or used by explicit gen_* (via intrinsic
> > > functions?) or by combine?
> >
> > For usadd with vector modes, I think backends like RISC-V try to
> > leverage instructions such as Vector Single-Width Saturating Add
> > (aka vsaddu.vv/x/i).
> >
> > > I think simply changing gen_*_fixed_libfunc to gen_int_libfunc won't
> > > work.  Since there's
> > > no libgcc support I'd leave it as gen_*_fixed_libfunc thus no library
> > > fallback for integers?
> >
> > Changing to gen_int_libfunc follows the other int optabs. I am not sure
> > whether it will hit the standard name usaddm3 for vector modes, but the
> > happy path for scalar modes works up to a point; please correct me if I
> > have misunderstood anything.
>
> gen_int_libfunc will no longer make it emit libcalls for fixed point
> modes, so this can't be correct
> and there's no libgcc implementation for integer mode saturating ops,
> so it's pointless to emit calls
> to them.
>
> > #0  riscv_expand_usadd (dest=0x76a8c7c8, x=0x76a8c798, 
> > y=0x76a8c7b0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.cc:10662
> > #1  0x029f142a in gen_usaddsi3 (operand0=0x76a8c7c8, 
> > operand1=0x76a8c798, operand2=0x76a8c7b0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/config/riscv/riscv.md:3848
> > #2  0x01751e60 in insn_gen_fn::operator() > rtx_def*> (this=0x4910e70 ) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/recog.h:441
> > #3  0x0180f553 in maybe_gen_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> > ops=0x7fffd2c0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8232
> > #4  0x0180fa42 in maybe_expand_insn (icode=CODE_FOR_usaddsi3, 
> > nops=3, ops=0x7fffd2c0) at 
> > /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs.cc:8275
> > #5  0x0180fade in expand_insn (icode=CODE_FOR_usaddsi3, nops=3, 
> > ops=0x7fffd2c0) at 
> > 

Re:[pushed] [PATCH v1] LoongArch: testsuite:Fix problems with incorrect results in vector test cases.

2024-03-07 Thread chenglulu

Pushed to r14-9352.

On 2024/3/6 at 4:54 PM, chenxiaolong wrote:

In simd_correctness_check.h, the macro ASSERTEQ_64 checks the passed vector
values element by element as 64-bit data.  However, because it used the abs()
function, it only compared the lower 32 bits of each element, so this patch
replaces abs() with llabs().
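
A small sketch of why the narrower check can miss a mismatch. This is a hypothetical model, not the actual macro from simd_correctness_check.h: lanes_equal_low32 imitates a comparison that only sees the low 32 bits of the difference (the truncation is written as an explicit mask so the example stays well defined), and lanes_equal_full uses llabs() on the whole 64-bit difference, assuming the subtraction itself does not overflow.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a check that only looks at the low 32 bits of
   the difference between two 64-bit lanes.  */
static int
lanes_equal_low32 (int64_t expected, int64_t actual)
{
  uint64_t diff = (uint64_t) expected - (uint64_t) actual;
  return (diff & UINT32_MAX) == 0;
}

/* The same check on the full 64-bit difference, using llabs().  */
static int
lanes_equal_full (int64_t expected, int64_t actual)
{
  return llabs (expected - actual) == 0;
}
```

Two lanes that differ only in bit 32, such as 0x100000000 and 0, pass lanes_equal_low32 but are rejected by lanes_equal_full.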

However, two problems show up after this change:

1. FAIL in lasx-xvfrint_s.c and lsx-vfrint_s.c
Vector test cases that use __m{128,256} to define vector types operate on
32-bit primitive elements, so they should use ASSERTEQ_32 instead of
ASSERTEQ_64 to check for correctness.

2. FAIL in lasx-xvshuf_b.c and lsx-vshuf.c
The expected results set in the test cases are incorrect.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c: Replace
ASSERTEQ_64 with the macro ASSERTEQ_32.
* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Modify the expected
test results of some functions according to the function of the vector
instruction.
* gcc.target/loongarch/vector/lsx/lsx-vfrint_s.c: Same
modification as lasx-xvfrint_s.c.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Same
modification as lasx-xvshuf_b.c.
* gcc.target/loongarch/vector/simd_correctness_check.h: Use the llabs()
function instead of abs() to check the correctness of the results.
---
  .../loongarch/vector/lasx/lasx-xvfrint_s.c| 58 +--
  .../loongarch/vector/lasx/lasx-xvshuf_b.c | 14 ++---
  .../loongarch/vector/lsx/lsx-vfrint_s.c   | 50 
  .../loongarch/vector/lsx/lsx-vshuf.c  | 12 ++--
  .../loongarch/vector/simd_correctness_check.h |  2 +-
  5 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
index fbfe300eac4..4538528a67f 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvfrint_s.c
@@ -184,7 +184,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -203,7 +203,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -222,7 +222,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x01010101;

*((int *)&__m256_op0[6]) = 0x01010101;
@@ -241,7 +241,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -260,7 +260,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -279,7 +279,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x;

*((int *)&__m256_op0[6]) = 0x;
@@ -298,7 +298,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 (__LINE__, __m256_result, __m256_out);
+  ASSERTEQ_32 (__LINE__, __m256_result, __m256_out);
  
*((int *)&__m256_op0[7]) = 0x01010101;

*((int *)&__m256_op0[6]) = 0x01010101;
@@ -317,7 +317,7 @@ main ()
*((int *)&__m256_result[1]) = 0x;
*((int *)&__m256_result[0]) = 0x;
__m256_out = __lasx_xvfrintrne_s (__m256_op0);
-  ASSERTEQ_64 

Re:[pushed] [PATCH] LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.

2024-03-07 Thread chenglulu

Pushed to r14-9351.

On 2024/3/6 at 9:19 AM, Yang Yujie wrote:

gcc/ChangeLog:

* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.
---
  gcc/config.gcc   |  3 +++
  gcc/config/loongarch/linux.h |  4 +++-
  gcc/config/loongarch/musl.h  | 23 +++
  3 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/config/loongarch/musl.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a1480b72c46..3293be16699 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2538,6 +2538,9 @@ riscv*-*-freebsd*)
  
  loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
+case ${target} in
+ *-linux-musl*) tm_file="${tm_file} loongarch/musl.h"
+   esac
tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-multilib loongarch/t-linux"
diff --git a/gcc/config/loongarch/linux.h b/gcc/config/loongarch/linux.h
index 17d9f87537b..40d9ba6d405 100644
--- a/gcc/config/loongarch/linux.h
+++ b/gcc/config/loongarch/linux.h
@@ -21,7 +21,9 @@ along with GCC; see the file COPYING3.  If not see
   * This ensures that a compiler configured with --disable-multilib
   * can work in a multilib environment.  */
  
-#if defined(LA_DISABLE_MULTILIB) && defined(LA_DISABLE_MULTIARCH)
+#if !defined(LA_DEFAULT_TARGET_MUSL) \
+  && defined(LA_DISABLE_MULTILIB) \
+  && defined(LA_DISABLE_MULTIARCH)
  
#if DEFAULT_ABI_BASE == ABI_BASE_LP64D
  #define ABI_LIBDIR "lib64"
diff --git a/gcc/config/loongarch/musl.h b/gcc/config/loongarch/musl.h
new file mode 100644
index 000..fa43bc86606
--- /dev/null
+++ b/gcc/config/loongarch/musl.h
@@ -0,0 +1,23 @@
+/* Definitions for MUSL C library support.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#ifndef LA_DEFAULT_TARGET_MUSL
+#define LA_DEFAULT_TARGET_MUSL
+#endif




Re: [PATCH v2] RISC-V: Fix ICE in riscv vector costs

2024-03-07 Thread juzhe.zh...@rivai.ai
LGTM except a nit comment:

PR 114264 -> PR target/114264

No need to send V3, just commit it with this change.


juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-03-07 16:32
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw
Subject: [PATCH v2] RISC-V: Fix ICE in riscv vector costs
The following code can result in ICE:
-march=rv64gcv --param riscv-autovec-lmul=dynamic -O3
 
char *jpeg_difference7_input_buf;
void jpeg_difference7(int *diff_buf) {
  unsigned width;
  int samp, Rb;
  while (--width) {
Rb = samp = *jpeg_difference7_input_buf;
*diff_buf++ = -(int)(samp + (long)Rb >> 1);
  }
}
 
A biggest_mode update was missing in one branch, which triggers this assertion failure:
gcc_assert (biggest_size >= mode_size);
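
The failure mode can be modelled outside GCC with a running maximum that must be refreshed on every path. The sketch below is a toy under stated assumptions: struct operand, its fields and biggest_operand_size are invented for illustration and do not mirror compute_local_live_ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each operand has a size and either extends an
   existing live range or starts a new one.  */
struct operand
{
  unsigned size;
  int extends_existing;
};

static unsigned
biggest_operand_size (const struct operand *ops, size_t n)
{
  unsigned biggest = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ops[i].extends_existing)
        {
          /* The range would be extended here.  The maximum must be
             refreshed on this path too; leaving this update out is the
             kind of omission that makes the assertion below fire.  */
          if (ops[i].size > biggest)
            biggest = ops[i].size;
        }
      else
        {
          if (ops[i].size > biggest)
            biggest = ops[i].size;
        }
      /* Invariant: the running maximum covers every operand seen.  */
      assert (biggest >= ops[i].size);
    }
  return biggest;
}
```

If the update in the first branch is removed, an operand that extends an existing range and is wider than anything seen so far trips the assertion, which is the shape of the reported ICE.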
 
Tested on RV64 with no regressions.
 
PR 114264
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc: Fix ICE
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/pr114264.c: New test.
 
Signed-off-by: demin.han 
---
gcc/config/riscv/riscv-vector-costs.cc|  2 ++
.../gcc.dg/vect/costmodel/riscv/rvv/pr114264.c| 15 +++
2 files changed, 17 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114264.c
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 7c9840df4e9..f13a1296b31 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -413,6 +413,8 @@ compute_local_live_ranges (
  auto *r = get_live_range (live_ranges, arg);
  gcc_assert (r);
  (*r).second = MAX (point, (*r).second);
+   biggest_mode = get_biggest_mode (
+ biggest_mode, TYPE_MODE (TREE_TYPE (arg)));
}
}
  else
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114264.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114264.c
new file mode 100644
index 000..7853f292af7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr114264.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=dynamic" } */
+
+char *jpeg_difference7_input_buf;
+void
+jpeg_difference7 (int *diff_buf)
+{
+  unsigned width;
+  int samp, Rb;
+  while (--width)
+{
+  Rb = samp = *jpeg_difference7_input_buf;
+  *diff_buf++ = -(int) (samp + (long) Rb >> 1);
+}
+}
-- 
2.44.0
 
 

