Re: [PATCH 0/2] Condition coverage fixes

2024-04-06 Thread Jørgen Kvalsvik

On 06/04/2024 13:15, Jørgen Kvalsvik wrote:

On 06/04/2024 07:50, Richard Biener wrote:




On 05.04.2024 at 21:59, Jørgen Kvalsvik wrote:

Hi,

I propose these fixes for the current issues with the condition
coverage.

Rainer, I propose to simply delete the test with __sigsetjmp. I don't
think it actually detects anything reasonable any more, I kept it around
to prevent a regression. Since then I have built a lot of programs (with
optimization enabled) and not really seen this problem.

H.J., the problem you found with -O2 was really a problem of
tree-inlining, which was actually caught earlier by Jan [1]. It probably
warrants some more testing, but I could reproduce by tuning your test
case to use always_inline and not -O2 and trigger the error.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html


Ok


Thanks, committed.

I am wondering if the fn->cond_uids access (in tree-profile.cc) should 
always be guarded. Right now there is the assumption that if condition 
coverage is requested the map will exist and be populated, but as this 
shows there may be other circumstances where this is not true.


Or perhaps there should be a gcc_assert to (reliably) detect cases where 
the map is not constructed properly?
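
To make the two options concrete, a rough sketch (hypothetical code,
not the actual tree-profile.cc source; fn and cond are assumed from
context):

  /* Option 1: guard every access, so a missing map degrades
     gracefully.  */
  unsigned uid = 0;
  if (fn->cond_uids)
    if (unsigned *v = fn->cond_uids->get (cond))
      uid = *v;

  /* Option 2: assert the map exists whenever condition coverage is
     requested, so an improperly constructed map is caught early and
     reliably.  */
  gcc_assert (fn->cond_uids);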


Thanks,
Jørgen


I gave this some more thought, and realised I was too eager to fix the 
segfault. While trunk no longer crashes (at least on my x86_64 linux) 
the fix itself is bad. It copies the gcond -> uid mappings into the 
caller, but the stmts are deep copied into the caller, so no gcond will 
ever be a hit when we look up the condition_uids in tree-profile.cc.


I did a very quick prototype to confirm. By applying this patch:

@@ -2049,6 +2049,9 @@ copy_bb (copy_body_data *id, basic_block bb,

   copy_gsi = gsi_start_bb (copy_basic_block);

+  if (!cfun->cond_uids && id->src_cfun->cond_uids)
+    cfun->cond_uids = new hash_map <gcond *, unsigned> ();
+
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple_seq stmts;
@@ -2076,6 +2079,12 @@ copy_bb (copy_body_data *id, basic_block bb,
       if (gimple_nop_p (stmt))
         continue;

+      if (id->src_cfun->cond_uids && is_a <gcond *> (orig_stmt))
+        {
+          unsigned *v = id->src_cfun->cond_uids->get (as_a <gcond *> (orig_stmt));
+          if (v)
+            cfun->cond_uids->put (as_a <gcond *> (stmt), *v);
+        }
+


and this test program:

#include <stdio.h>

__attribute__((always_inline))
inline int
inlinefn (int a)
{
  if (a > 5)
    {
      printf ("a > 5\n");
      return a;
    }
  else
    printf ("a < 5, was %d\n", a);
  return a * a - 2;
}

int
mcdc027e (int a, int b)
{
  int y = inlinefn (a);
  return y + b;
}


gcov reports:

2:   18:mcdc027e (int a, int b)
condition outcomes covered 1/2
condition  0 not covered (true)
-:   19:{
2:   20:int y = inlinefn (a);
2:   21:return y + b;
-:   22:}

but without the patch, gcov prints nothing.

I am not sure if this approach is even ideal. Probably the most 
problematic part is the source line mapping, which is all messed up. I 
checked with gcov --branch-probabilities and it too reports the callee 
at the top of the caller.


If you think it is a good strategy I can clean up the prototype and 
submit a patch. I suppose the function _totals_ should be accurate, even 
if the source mapping is a bit surprising.


What do you think? I am open to other strategies, too.

Thanks,
Jørgen





Thanks,
Richard



Thanks,
Jørgen

Jørgen Kvalsvik (2):
  Remove unecessary and broken MC/DC compile test
  Copy condition->expr map when inlining [PR114599]

gcc/testsuite/gcc.misc-tests/gcov-19.c   | 11 -
gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25 
gcc/tree-inline.cc   | 20 +++-
3 files changed, 44 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c

--
2.30.2







[PATCH] s390: Fix s390_const_int_pool_entry_p and movdi peephole2 [PR114605]

2024-04-06 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because we initially have
a movti which loads the 0x3f803f80ULL TImode constant
from the constant pool.  Later on we split it into a pair of DImode
loads.  Now, for the first load (why just that one? though that is not
stage4 material) we trigger the peephole2 which uses
s390_const_int_pool_entry_p.
That function doesn't check the constant pool mode at all though; it
sees the constant pool entry at that address has a CONST_INT value and
just assumes that is the value to return.  That is especially wrong for
big-endian: if it is a DImode load from offset 0, it should be loading 0
rather than 0x3f803f80ULL.
The following patch adds checks: if we are extracting a MODE_INT mode,
the constant pool entry must have a MODE_INT mode as well; it punts if
the constant pool entry has a smaller mode size than the extraction one
(then it would be UB); if it has the same mode as before, it keeps using
what it did before; and if the constant pool entry has a larger mode
than the one being extracted, it uses simplify_subreg.  I'd have used
avoid_constant_pool_reference instead, which can also handle offsets
into the constant pool constants, but it can't handle UNSPEC_LTREF.

Another thing is that once that is fixed, we ICE when we extract a
constant like 0, because the ior insn predicate requires a non-0
constant.  So, the patch also fixes the peephole2 so that if either
32-bit half is zero, it uses a mere load of the constant into the
register rather than a pair of such a load and an ior.

Bootstrapped/regtested on s390x-linux, ok for trunk?

2024-04-06  Jakub Jelinek  

PR target/114605
* config/s390/s390.cc (s390_const_int_pool_entry_p): Punt
if mem doesn't have MODE_INT mode, or pool constant doesn't
have MODE_INT mode, or if pool constant mode is smaller than
mem mode.  If mem mode is different from pool constant mode,
try to simplify subreg.  If that doesn't work, punt, if it
does, use the simplified constant instead of the constant pool
constant.
* config/s390/s390.md (movdi from const pool peephole): If
either low or high 32-bit part is zero, just emit move insn
instead of move + ior.

* gcc.dg/pr114605.c: New test.

--- gcc/config/s390/s390.cc.jj  2024-03-14 14:07:34.088426911 +0100
+++ gcc/config/s390/s390.cc 2024-04-05 15:58:57.757057420 +0200
@@ -9984,7 +9984,7 @@ s390_const_int_pool_entry_p (rtx mem, HO
  - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)).
  - (mem (symbol_ref)).  */
 
-  if (!MEM_P (mem))
+  if (!MEM_P (mem) || GET_MODE_CLASS (GET_MODE (mem)) != MODE_INT)
 return false;
 
   rtx addr = XEXP (mem, 0);
@@ -9998,9 +9998,19 @@ s390_const_int_pool_entry_p (rtx mem, HO
 return false;
 
   rtx val_rtx = get_pool_constant (sym);
-  if (!CONST_INT_P (val_rtx))
+  machine_mode mode = get_pool_mode (sym);
+  if (!CONST_INT_P (val_rtx)
+  || GET_MODE_CLASS (mode) != MODE_INT
+  || GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (mem)))
 return false;
 
+  if (mode != GET_MODE (mem))
+{
+  val_rtx = simplify_subreg (GET_MODE (mem), val_rtx, mode, 0);
+  if (val_rtx == NULL_RTX || !CONST_INT_P (val_rtx))
+   return false;
+}
+
   if (val != nullptr)
 *val = INTVAL (val_rtx);
   return true;
--- gcc/config/s390/s390.md.jj  2024-01-03 11:51:54.638410489 +0100
+++ gcc/config/s390/s390.md 2024-04-05 16:17:17.322234553 +0200
@@ -2152,6 +2152,16 @@ (define_peephole2
   gcc_assert (ok);
   operands[2] = GEN_INT (val & 0xffffffff00000000ULL);
   operands[3] = GEN_INT (val & 0x00000000ffffffffULL);
+  if (operands[2] == const0_rtx)
+{
+  emit_move_insn (operands[0], operands[3]);
+  DONE;
+}
+  else if (operands[3] == const0_rtx)
+{
+  emit_move_insn (operands[0], operands[2]);
+  DONE;
+}
 })
 
 ;
--- gcc/testsuite/gcc.dg/pr114605.c.jj  2024-04-05 16:25:34.678505438 +0200
+++ gcc/testsuite/gcc.dg/pr114605.c 2024-04-05 16:25:10.388834268 +0200
@@ -0,0 +1,37 @@
+/* PR target/114605 */
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+
+typedef struct { const float *a; int b, c; float *d; } S;
+
+__attribute__((noipa)) void
+bar (void)
+{
+}
+
+__attribute__((noinline, optimize (2))) static void
+foo (S *e)
+{
+  const float *f;
+  float *g;
+  float h[4] = { 0.0, 0.0, 1.0, 1.0 };
+  if (!e->b)
+f = h;
+  else
+f = e->a;
+  g = &e->d[0];
+  __builtin_memcpy (g, f, sizeof (float) * 4);
+  bar ();
+  if (!e->b)
+if (g[0] != 0.0 || g[1] != 0.0 || g[2] != 1.0 || g[3] != 1.0)
+  __builtin_abort ();
+}
+
+int
+main ()
+{
+  float d[4];
+  S e = { .d = d };
+  foo (&e);
+  return 0;
+}

Jakub



New French PO file for 'gcc' (version 14.1-b20240218)

2024-04-06 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-14.1-b20240218.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [pushed] aarch64: Fix bogus cnot optimisation [PR114603]

2024-04-06 Thread Richard Biener
On Fri, Apr 5, 2024 at 3:52 PM Richard Sandiford
 wrote:
>
> aarch64-sve.md had a pattern that combined:
>
> cmpeq   pb.T, pa/z, zc.T, #0
> mov zd.T, pb/z, #1
>
> into:
>
> cnotzd.T, pa/m, zc.T
>
> But this is only valid if pa.T is a ptrue.  In other cases, the
> original would set inactive elements of zd.T to 0, whereas the
> combined form would copy elements from zc.T.
>
> This isn't a regression on a known testcase.  However, it's a nasty
> wrong code bug that could conceivably trigger for autovec code (although
> I've not been able to construct a reproducer so far).  That fix is also
> quite localised to the buggy operation.  I'd therefore prefer to push
> the fix now rather than wait for GCC 15.

Wrong-code bugs (and also rejects-valid or ice-on-valid) are always
exempt from regression-only fixing.  In practice every such bug will be
a regression, in this case back to when the combining pattern was
introduced (unless that was with the initial introduction of the port,
of course).

Richard.

> Tested on aarch64-linux-gnu & pushed.  I'll backport to branches if
> there is no fallout.
>
> Richard
>
> gcc/
> PR target/114603
> * config/aarch64/aarch64-sve.md (@aarch64_pred_cnot): Replace
> with...
> (@aarch64_ptrue_cnot): ...this, requiring operand 1 to be
> a ptrue.
> (*cnot): Require operand 1 to be a ptrue.
> * config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
> Use aarch64_ptrue_cnot for _x operations that are predicated
> with a ptrue.  Represent other _x operations as fully-defined _m
> operations.
>
> gcc/testsuite/
> PR target/114603
> * gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 25 ---
>  gcc/config/aarch64/aarch64-sve.md | 22 
>  .../aarch64/sve/acle/general/cnot_1.c | 23 +
>  3 files changed, 50 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 257ca5bf6ad..5be2315a3c6 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -517,15 +517,22 @@ public:
>expand (function_expander &e) const override
>{
>  machine_mode mode = e.vector_mode (0);
> -if (e.pred == PRED_x)
> -  {
> -   /* The pattern for CNOT includes an UNSPEC_PRED_Z, so needs
> -  a ptrue hint.  */
> -   e.add_ptrue_hint (0, e.gp_mode (0));
> -   return e.use_pred_x_insn (code_for_aarch64_pred_cnot (mode));
> -  }
> -
> -return e.use_cond_insn (code_for_cond_cnot (mode), 0);
> +machine_mode pred_mode = e.gp_mode (0);
> +/* The underlying _x pattern is effectively:
> +
> +dst = src == 0 ? 1 : 0
> +
> +   rather than an UNSPEC_PRED_X.  Using this form allows autovec
> +   constructs to be matched by combine, but it means that the
> +   predicate on the src == 0 comparison must be all-true.
> +
> +   For simplicity, represent other _x operations as fully-defined _m
> +   operations rather than using a separate bespoke pattern.  */
> +if (e.pred == PRED_x
> +   && gen_lowpart (pred_mode, e.args[0]) == CONSTM1_RTX (pred_mode))
> +  return e.use_pred_x_insn (code_for_aarch64_ptrue_cnot (mode));
> +return e.use_cond_insn (code_for_cond_cnot (mode),
> +   e.pred == PRED_x ? 1 : 0);
>}
>  };
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index eca8623e587..0434358122d 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3363,24 +3363,24 @@ (define_insn_and_split 
> "trunc2"
>  ;; - CNOT
>  ;; -
>
> -;; Predicated logical inverse.
> -(define_expand "@aarch64_pred_cnot<mode>"
> +;; Logical inverse, predicated with a ptrue.
> +(define_expand "@aarch64_ptrue_cnot<mode>"
>   [(set (match_operand:SVE_FULL_I 0 "register_operand")
>       (unspec:SVE_FULL_I
>         [(unspec:<VPRED>
>            [(match_operand:<VPRED> 1 "register_operand")
> -           (match_operand:SI 2 "aarch64_sve_ptrue_flag")
> +           (const_int SVE_KNOWN_PTRUE)
>            (eq:<VPRED>
> -             (match_operand:SVE_FULL_I 3 "register_operand")
> -             (match_dup 4))]
> +             (match_operand:SVE_FULL_I 2 "register_operand")
> +             (match_dup 3))]
>            UNSPEC_PRED_Z)
> -        (match_dup 5)
> -        (match_dup 4)]
> +        (match_dup 4)
> +        (match_dup 3)]
>         UNSPEC_SEL))]
>   "TARGET_SVE"
>   {
> -    operands[4] = CONST0_RTX (<MODE>mode);
> -    operands[5] = CONST1_RTX (<MODE>mode);
> +

Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

2024-04-06 Thread Richard Biener
On Fri, Apr 5, 2024 at 11:29 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> > The following avoids re-walking and re-combining the instructions
> > between i2 and i3 when the pattern of i2 doesn't change.
> >
> > Bootstrap and regtest running ontop of a reversal of
> > r14-9692-g839bc42772ba7a.
>
> Please include that in the patch (or series, preferably).

I'd like reversal to be considered independently of this patch which is why
I didn't include the reversal.  Of course without reversal this patch doesn't
make sense.

> > It brings down memory use from 9GB to 400MB and compile-time from
> > 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> > but has shown code generation regressions across architectures.
> >
> > OK to revert r14-9692-g839bc42772ba7a?
>
> No.
>
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(

My patch isn't a replacement for your solution.  Reversal is to address
the P1 regressions caused by the change.  My change offers to address
some of the symptoms shown with the testcase without disabling the
offending 2->2 combinations.

> We had this before: 3->2 combinations that leave an instruction
> identical to what was there before.  This was just a combination with
> context as well.  The only reason this wasn't a huge problem then
> already was because this is a 3->2 combination, even if it really is a
> 2->1 one it still is beneficial in all the same cases.  But in the new
> case it can iterate indefinitely -- well not quite, but some polynomial
> number of times, for a polynomial at least of degree three, possibly
> more :-(
>
> With this patch you need to show combine still is linear.  I don't think
> it is, but some deeper analysis might show it still is.

We have come to different conclusions as to what the issue is that the
testcase exposes in combine.  While you spotted a transform that
combine shouldn't have done (and I think I agree on that!) I identified
the fact that while the number of successful combines is linear in the
number of log-links (and thus linear in the size of a basic-block), the
number of combination _attempts_ appears to be O(log-links) times
O(successful combinations).  The testcase hits hard on this because of
those 2->2 combinations done but this problem persists with all N->2
combinations.  This "quadraticness" (I know you don't like to call it that
way) is because for each successful N->2 combination of insns
{I2, ..., I1} we try combining into all insns
between (and including) I2 and I1.  combine wants to retry combining
into I2 rightfully so but there's no good reason to retry _all_ of the
n intervening insns.  Yes, we want to retry all insns that have their
log-links point to I2 (and possibly more for second-level references,
dependent on how combine exactly identifies I2, I3 and I4).  Re-trying
all of them obviously works, but there's your quadraticness.

My patch makes this problem a little bit (in general) less often hit
since it avoids retrying iff I2 did not change.  I _think_ in that case
the log-links/notes shouldn't have changed either.  In any case I'm
disabling less of combine than what your patch did.

So again - is it OK to revert your patch?  Or do you expect you can
address the code generation regressions within the next week?  Since
the original issue is quite old postponing your solution to the next stage1
should be acceptable.  Currently those regressions will block the release
of GCC 14.

Do you agree that the patch I propose, while it doesn't solve any actual
issue (it doesn't fix the walking scheme nor does it avoid the combinations
combine shouldn't do), it helps in some cases and shouldn't cause code
generation regressions that your patch wouldn't have caused as well?
So is that change OK until we get your real solution implemented in a way
that doesn't cause regressions?

Thanks,
Richard.


[committed] d: Merge upstream dmd, druntime b65767825f, phobos 92dc5a4e9.

2024-04-06 Thread Iain Buclaw
Hi,

This patch merges the D front-end and runtime library with upstream dmd
b65767825f, and the standard library with phobos 92dc5a4e9.

Synchronizing with the upstream release of v2.108.0.

D front-end changes:

- Import dmd v2.108.0.

D runtime changes:

- Import druntime v2.108.0.

Phobos changes:

- Import phobos v2.108.0.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd b65767825f.
* dmd/VERSION: Bump version to v2.108.0.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime b65767825f.
* src/MERGE: Merge upstream phobos 92dc5a4e9.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/VERSION |  2 +-
 libphobos/libdruntime/MERGE   |  2 +-
 .../core/internal/array/duplication.d | 14 +-
 libphobos/src/MERGE   |  2 +-
 .../building_blocks/kernighan_ritchie.d   |  4 +-
 libphobos/src/std/net/curl.d  |  5 +-
 libphobos/src/std/typecons.d  | 47 ++-
 8 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index a00872ef864..dc47db87a80 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-855353a1d9e16d43e85b6cf2b03aef388619bd16
+b65767825f365dbc153457fc86e1054b03196c6d
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/VERSION b/gcc/d/dmd/VERSION
index 8ca452f8912..5868b874955 100644
--- a/gcc/d/dmd/VERSION
+++ b/gcc/d/dmd/VERSION
@@ -1 +1 @@
-v2.108.0-rc.1
+v2.108.0
diff --git a/libphobos/libdruntime/MERGE b/libphobos/libdruntime/MERGE
index a00872ef864..dc47db87a80 100644
--- a/libphobos/libdruntime/MERGE
+++ b/libphobos/libdruntime/MERGE
@@ -1,4 +1,4 @@
-855353a1d9e16d43e85b6cf2b03aef388619bd16
+b65767825f365dbc153457fc86e1054b03196c6d
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/libphobos/libdruntime/core/internal/array/duplication.d 
b/libphobos/libdruntime/core/internal/array/duplication.d
index eec6af92fef..9df84893bb9 100644
--- a/libphobos/libdruntime/core/internal/array/duplication.d
+++ b/libphobos/libdruntime/core/internal/array/duplication.d
@@ -21,9 +21,9 @@ U[] _dup(T, U)(scope T[] a) pure nothrow @trusted if 
(__traits(isPOD, T))
 {
 import core.stdc.string : memcpy;
 import core.internal.array.construction: _d_newarrayUPureNothrow;
-auto arr = _d_newarrayUPureNothrow!T(a.length, is(T == shared));
+auto arr = _d_newarrayUPureNothrow!U(a.length, is(U == shared));
 memcpy(cast(void*) arr.ptr, cast(const(void)*) a.ptr, T.sizeof * 
a.length);
-return *cast(U[]*) &arr;
+return arr;
 }
 }
 
@@ -358,3 +358,13 @@ U[] _dup(T, U)(T[] a) if (!__traits(isPOD, T))
 static assert(test!Copy());
 assert(test!Copy());
 }
+
+// https://issues.dlang.org/show_bug.cgi?id=24453
+@safe unittest
+{
+static inout(char)[] foo(ref scope return inout(char)[] s)
+{
+auto bla = s.idup;
+return s;
+}
+}
diff --git a/libphobos/src/MERGE b/libphobos/src/MERGE
index ff34bece2a3..a4f25db810e 100644
--- a/libphobos/src/MERGE
+++ b/libphobos/src/MERGE
@@ -1,4 +1,4 @@
-a2ade9dec49e70c6acd447df52321988a4c2fb9f
+92dc5a4e98591a0e6b0af4ff0f84f096fea09016
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/phobos repository.
diff --git 
a/libphobos/src/std/experimental/allocator/building_blocks/kernighan_ritchie.d 
b/libphobos/src/std/experimental/allocator/building_blocks/kernighan_ritchie.d
index 6883d33adae..167cf1bc6bc 100644
--- 
a/libphobos/src/std/experimental/allocator/building_blocks/kernighan_ritchie.d
+++ 
b/libphobos/src/std/experimental/allocator/building_blocks/kernighan_ritchie.d
@@ -647,7 +647,7 @@ fronting the GC allocator.
 import std.experimental.allocator.gc_allocator : GCAllocator;
 import std.typecons : Ternary;
 // KRRegion fronting a general-purpose allocator
-ubyte[1024 * 128] buf;
+align(KRRegion!().alignment) ubyte[1024 * 128] buf;
 auto alloc = fallbackAllocator(KRRegion!()(buf), GCAllocator.instance);
 auto b = alloc.allocate(100);
 assert(b.length == 100);
@@ -916,7 +916,7 @@ version (StdUnittest)
 @system unittest
 {   import std.typecons : Ternary;
 
-ubyte[1024] b;
+align(KRRegion!().alignment) ubyte[1024] b;
 auto alloc = KRRegion!()(b);
 
 auto k = alloc.allocate(128);
diff --git a/libphobos/src/std/net/curl.d b/libphobos/src/std/net/curl.d
index 6aec366c657..3f823013e65 100644
--- a/libphobos/src/std/net/curl.d
+++ b/libphobos/src/std/net/curl.d
@@ -2422,6 +2422,7 @@ struct HTTP
 import std.algorithm.searching : findSplit, startsWith;
 import 

[PATCH v1] Internal-fn: Introduce new internal function SAT_ADD

2024-04-06 Thread pan2 . li
From: Pan Li 

This patch would like to add the middle-end representation for the
saturating add, i.e. set the result of the add to the max when it
overflows.  It will take a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
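
Written out in plain C for uint8_t, the same branchless pattern looks
roughly as below (an illustrative sketch, not part of the patch; the
explicit casts matter because integer promotion evaluates x + y in int):

#include <stdint.h>

uint8_t sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);               /* wraps on overflow */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x); /* 0xff iff wrapped  */
  return sum | mask;                             /* saturates to 255  */
}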

The patch also implements SAT_ADD in the riscv backend as
a sample for both the scalar and vector cases.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;succ:   EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;succ:   EXIT
}

For vectorization, we leverage the existing vect pattern recog to find
the pattern similar to the scalar one, and let the vectorizer perform
the rest for the standard name usadd<m>3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract", which can be leveraged when expanding usadd<m>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, 
vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2
  vse64.v v1,0(a0)
  ...

To limit the patch size for review, only the unsigned version of
usadd<m>3 is involved here. The signed version will be covered
in follow-up patch(es).

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The aarch64 fully regression tests.
* The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd<m>3): New pattern expand
for unsigned SAT_ADD vector.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
decl to expand usadd<m>3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
emit the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd<m>3 for
vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
to expand usadd<m>3 for scalar.
* config/riscv/riscv.md (usadd<m>3): New pattern expand
for unsigned SAT_ADD scalar.
* config/riscv/vector.md: Allow VLS mode for vsaddu.
* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match and simplify.
* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
   

RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-04-06 Thread Li, Pan2
Kindly ping for this ICE.

Pan

-Original Message-
From: Li, Pan2  
Sent: Saturday, March 23, 2024 1:45 PM
To: Jeff Law ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: RE: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val

Thanks Jeff for comments.

> As Richi noted using validate_subreg here isn't great.  Does it work to 
> factor out this code from extract_low_bits
>
>>   if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
>>       || !int_mode_for_mode (mode).exists (&int_mode))
>>     return NULL_RTX;
>> 
>>   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>>     return NULL_RTX;
>>   if (!targetm.modes_tieable_p (int_mode, mode))
>>     return NULL_RTX;

> And use that in the condition (and in extract_low_bits rather than 
> duplicating the code)?

It can solve the ICE but will forbid all vector modes from going through
gen_lowpart.  Actually, only the case where the vector mode size is less
than the register's natural size will trigger the ICE.  Thus, how about
just adding one more condition before going to gen_lowpart, as below?

Feel free to correct me if there are any misunderstandings!

diff --git a/gcc/dse.cc b/gcc/dse.cc
index edc7a1dfecf..258d2ccc299 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1946,7 +1946,9 @@ get_stored_val (store_info *store_info, machine_mode 
read_mode,
 copy_rtx (store_info->const_rhs));
   else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
 && known_le (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
-&& targetm.modes_tieable_p (read_mode, store_mode))
+&& targetm.modes_tieable_p (read_mode, store_mode)
+/* It's invalid in validate_subreg if read_mode size is < reg natural.  */
+&& known_ge (GET_MODE_SIZE (read_mode), REGMODE_NATURAL_SIZE (read_mode)))
 read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
   else
 read_reg = extract_low_bits (read_mode, store_mode,

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, March 23, 2024 2:54 AM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Wang, Yanzhang ; Liu, Hongtao 
Subject: Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in 
get_stored_val



On 3/4/24 11:22 PM, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> But in the case of a vector modes, we can usually reinterpret the
>> underlying bits in whatever mode we want and do any of the usual
>> operations on those bits.
> 
> Yes, I think that is why we can allow vector modes in get_stored_val,
> if my understanding is correct.
> And then values in the different modes are returned by gen_lowpart.
> Unfortunately, some modes (smaller than a vector register, like V2SF
> or V2QI for vlen=128) are considered invalid by validate_subreg, which
> returns NULL_RTX and results in the final ICE.
That doesn't make a lot of sense to me.  Even for vlen=128 I would have 
expected that we can still use a subreg to access low bits.  After all 
we might have had a V16QI vector and done a reduction of some sort 
storing the result in the first element and we have to be able to 
extract that result and move it around.

I'm not real keen on a target workaround.  While extremely safe, I 
wouldn't be surprised if other ports could trigger the ICE and we'd end 
up patching up multiple targets for what is, IMHO, a more generic issue.

As Richi noted using validate_subreg here isn't great.  Does it work to 
factor out this code from extract_low_bits:


>   if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
>       || !int_mode_for_mode (mode).exists (&int_mode))
>     return NULL_RTX;
> 
>   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>     return NULL_RTX;
>   if (!targetm.modes_tieable_p (int_mode, mode))
>     return NULL_RTX;

And use that in the condition (and in extract_low_bits rather than 
duplicating the code)?

jeff

ps.  No need to apologize for the pings.  This completely fell off my radar.
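
For reference, the suggested factoring might look roughly like this (a
sketch only; the helper name is made up, and the checks are the ones
quoted from extract_low_bits above):

static bool
modes_tieable_for_lowpart_p (machine_mode mode, machine_mode src_mode)
{
  /* True if a value of SRC_MODE can be reinterpreted in MODE by going
     through the corresponding integer modes.  */
  scalar_int_mode int_mode, src_int_mode;

  return (int_mode_for_mode (src_mode).exists (&src_int_mode)
          && int_mode_for_mode (mode).exists (&int_mode)
          && targetm.modes_tieable_p (src_int_mode, src_mode)
          && targetm.modes_tieable_p (int_mode, mode));
}

get_stored_val could then use this in its condition, and
extract_low_bits could call it instead of the open-coded checks.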



Re: [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].

2024-04-06 Thread Xi Ruoyao
On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:
> +/* Alignment for functions loops and jumps for best performance.  For new
> +   uarchs the value should be measured via benchmarking.  See the 
> documentation
> +   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for 
> the
Better have two commas here, i.e. "-falign-functions, -falign-loops,
and -falign-jumps".

Otherwise it should be OK.

> +   format.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] x86: Use explicit shift count in double-precision shifts

2024-04-06 Thread Uros Bizjak
On Fri, Apr 5, 2024 at 5:56 PM H.J. Lu  wrote:
>
> Don't use implicit shift count in double-precision shifts in AT syntax
> since they aren't in Intel SDM.  Keep the 's' modifier for backward
> compatibility with inline asm statements.
>
> PR target/114590
> * config/i386/i386.md (x86_64_shld): Use explicit shift count in
> AT syntax.
> (x86_64_shld_ndd): Likewise.
> (x86_shld): Likewise.
> (x86_shld_ndd): Likewise.
> (x86_64_shrd): Likewise.
> (x86_64_shrd_ndd): Likewise.
> (x86_shrd): Likewise.
> (x86_shrd_ndd): Likewise.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6ac401154e4..bb2c72f3473 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14503,7 +14503,7 @@ (define_insn "x86_64_shld"
>   (and:QI (match_dup 2) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_64BIT"
> -  "shld{q}\t{%s2%1, %0|%0, %1, %2}"
> +  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "DI")
> @@ -14524,7 +14524,7 @@ (define_insn "x86_64_shld_ndd"
>   (and:QI (match_dup 3) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "DI")])
>
> @@ -14681,7 +14681,7 @@ (define_insn "x86_shld"
>   (and:QI (match_dup 2) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>""
> -  "shld{l}\t{%s2%1, %0|%0, %1, %2}"
> +  "shld{l}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "SI")
> @@ -14703,7 +14703,7 @@ (define_insn "x86_shld_ndd"
>   (and:QI (match_dup 3) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shld{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shld{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "SI")])
>
> @@ -15792,7 +15792,7 @@ (define_insn "x86_64_shrd"
>   (and:QI (match_dup 2) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_64BIT"
> -  "shrd{q}\t{%s2%1, %0|%0, %1, %2}"
> +  "shrd{q}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "DI")
> @@ -15813,7 +15813,7 @@ (define_insn "x86_64_shrd_ndd"
>   (and:QI (match_dup 3) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shrd{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shrd{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "DI")])
>
> @@ -15971,7 +15971,7 @@ (define_insn "x86_shrd"
>   (and:QI (match_dup 2) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>""
> -  "shrd{l}\t{%s2%1, %0|%0, %1, %2}"
> +  "shrd{l}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "SI")
> @@ -15993,7 +15993,7 @@ (define_insn "x86_shrd_ndd"
>   (and:QI (match_dup 3) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shrd{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shrd{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "SI")])
>
> --
> 2.44.0
>


Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-04-06 Thread Philipp Tomsich
On Sat 6. Apr 2024 at 06:52, Jeff Law  wrote:

>
>
> On 3/27/24 4:55 AM, Philipp Tomsich wrote:
> > Jeff,
> >
> > just a heads-up that that trunk (i.e., the soon-to-be GCC14) still
> > generates the suboptimal sequence:
> >https://godbolt.org/z/K9YYEPsvY
> Realistically it's too late to get this into gcc-14.


I didn’t expect this for 14, but wanted to make sure we didn’t forget about
it once the branch for 15 opens up.

Thanks,
Philipp.

>