Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Martin Uecker
On Tuesday, 2023-10-24 at 22:51 +0000, Qing Zhao wrote:
> 
> > On Oct 24, 2023, at 4:38 PM, Martin Uecker  wrote:
> > 
> > On Tuesday, 2023-10-24 at 20:30 +0000, Qing Zhao wrote:
> > > Hi, Sid,
> > > 
> > > I really appreciate your example and detailed explanation. Very helpful.
> > > I think this example is an excellent one to show (almost) all
> > > the issues we need to consider.
> > > 
> > > I slightly modified this example to make it compilable and
> > > runnable, as follows:
> > > (but I still cannot make the incorrect reordering or DSE happen;
> > > anyway, the potential for reordering is there…)
> > > 
> > >  1 #include 
> > >  2 struct A
> > >  3 {
> > >  4  size_t size;
> > >  5  char buf[] __attribute__((counted_by(size)));
> > >  6 };
> > >  7 
> > >  8 static size_t
> > >  9 get_size_from (void *ptr)
> > > 10 {
> > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > 12 }
> > > 13 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
> > > 21  return;
> > > 22 }
> > > 23 
> > > 24 int main ()
> > > 25 {
> > > 26  foo (20);
> > > 27  return 0;
> > > 28 }
> > > 
> > > With my GCC, it compiled and ran:
> > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > 20
> > > Situation 1: With O1 and above, the routine “get_size_from” was inlined
> > > into “foo”; therefore the call to __bdos is in the same routine as the
> > > instantiation of the object, and the TYPE information (and the counted_by
> > > attribute attached to the TYPE of the object) can be USED by the __bdos
> > > call to compute the final object size.
> > > 
> > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > -1
> > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into
> > > “foo”; therefore the call to __bdos is NOT in the same routine as the
> > > instantiation of the object.  As a result, the TYPE info and the attached
> > > counted_by info of the object can NOT be USED by the __bdos call.
> > > 
> > > Keep the above two situations in mind; we will refer to them below:
> > > 
> > > 1. First,  the problem we are trying to resolve is:
> > > 
> > > (Your description):
> > > 
> > > > the reordering of __bdos w.r.t. initialization of the size parameter 
> > > > but to also account for DSE of the assignment, we can abstract this 
> > > > problem to that of DFA being unable to see implicit use of the size 
> > > > parameter in the __bdos call.
> > > 
> > > basically is correct.  However, with the following exception:
> > > 
> > > The implicit use of the size parameter in the __bdos call is not always
> > > there; it ONLY exists WHEN __bdos can be evaluated to an expression of
> > > the size parameter in the “objsz” phase, i.e., “Situation 1” of the
> > > above example.
> > > In “Situation 2”, when __bdos does not see the TYPE of the real object,
> > > it does not see the counted_by information from the TYPE, and therefore
> > > it is not able to evaluate the size of the object through the counted_by
> > > information.  As a result, the implicit use of the size parameter in the
> > > __bdos call does NOT exist at all.  The optimizer can freely reorder the
> > > initialization of the size parameter with the __bdos call, since there
> > > is no data-flow dependency between the two.
> > > 
> > > With this exception in mind, we can see that your proposed “option 2”
> > > (making the type of size “volatile”) is too conservative: it will
> > > disable many optimizations unnecessarily, even though it’s safe and
> > > simple to implement.
> > > 
> > > As a compiler optimization person for many many years, I really don’t 
> > > want to take this approach at this moment.  -:)
> > > 
> > > 2. Some facts I’d like to mention:
> > > 
> > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE
> > > optimization stage. During the RTL stage, the __bdos call has already
> > > been replaced by an expression of the size parameter or a constant, so
> > > the data dependency is already explicit in the IR.  I believe that the
> > > data analysis in the RTL stage should pick up the data dependency
> > > correctly; no special handling is needed in RTL.
> > > 
> > > B. If the __bdos call cannot see the real object, it has no way to get
> > > the “counted_by” field from the TYPE of the real object. So, if we try to
> > > add the implicit use of the “counted_by” field to the __bdos call, the
> > > object instantiation should be in the same routine as the __bdos call.
> > > Both the FE and the gimplification phase are too early to do this work.
> > > 
> > > 3. Then, what’s the 

[RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-24 Thread Vineet Gupta
RV64 compare and branch instructions only support 64-bit operands.
The backend unconditionally generates zero/sign extends at Expand time
for compare-and-branch operands even if they are already extended,
e.g. function args, which the ABI guarantees to be sign-extended (at the
callsite).

And subsequently REE fails to eliminate them as
"missing definition(s)",
especially for function args, since they show up as lacking an explicit
definition.

So elide the sign extensions at Expand time for a subreg-promoted var
with an inner word-sized value which doesn't need explicit sign extending
(a fairly good representative of an incoming function arg).
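As a sketch, the pattern this targets (a hand-written illustration, not
taken from the patch's actual testing):

    /* On RV64 the psABI passes "int" sign-extended to 64 bits, so the
       sign extend the expander used to emit before the compare-and-branch
       is redundant; x shows up at Expand time as a subreg-promoted
       SImode value.  */
    extern void g (void);

    void
    f (int x)
    {
      if (x < 0)
        g ();
    }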

There are patches floating around to enhance REE and/or add new passes to
eliminate extensions, but it is always better not to generate them in the
first place when possible.  Given that Expand is so early, the eliminated
extend helps improve the outcomes of later passes such as combine, since
they have fewer operations/combinations to deal with.

The test used was the existing gcc.c-torture/execute/20020506-1.c at -O2
with zbb.  It eliminates the SEXT.W and an additional branch:

before                          after
------                          -----
test2:                          test2:
    sext.b  a0,a0
    blt     a0,zero,.L15
    bne     a1,zero,.L17        bne     a1,zero,.L17

This is marked RFC as I only ran a spot check with a directed test and
want to use Patrick's pre-commit CI to do the A/B testing for me.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_extend_comparands): Don't
sign-extend operand if subreg promoted with inner word size.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..a8d12717e43d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
}
   else
{
-	  *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+	  /* A subreg promoted SI of DI could be peeled to expose DI, eliding
+	     an unnecessary sign extension.  */
+	  if (GET_CODE (*op0) == SUBREG
+	      && SUBREG_PROMOTED_VAR_P (*op0)
+	      && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
+		 == GET_MODE_SIZE (word_mode))
+	    *op0 = XEXP (*op0, 0);
+	  else
+	    *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+
 	  if (*op1 != const0_rtx)
 	    *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
}
-- 
2.34.1



[PATCH] Improve tree_expr_nonnegative_p by using the ranger [PR111959]

2023-10-24 Thread Andrew Pinski
I noticed we were failing to optimize `a / (1 << b)` when
we know that a is nonnegative, but only via ranger information.
This adds use of the global ranger to tree_single_nonnegative_warnv_p
for SSA_NAME.
I didn't extend tree_single_nonnegative_warnv_p to use the ranger for
floating point, nor to use the local ranger, since I am not 100% sure it
would be safe at all of the places where tree_expr_nonnegative_p is used.
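As an illustration, the kind of code this enables (a hypothetical example
in the spirit of the new forwprop test):

    /* Once the ranger records a nonnegative global range for a (thanks
       to the __builtin_unreachable guard), a / (1 << b) can be folded
       to a >> b.  */
    int
    f (int a, int b)
    {
      if (a < 0)
        __builtin_unreachable ();
      return a / (1 << b);   /* now recognized as a >> b */
    }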

Note the pr80776-1.c testcase fails again due to vrp's bad handling of
setting global ranges from __builtin_unreachable.  It just happened to be
optimized before because global ranges were not used as much.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111959

gcc/ChangeLog:

* fold-const.cc (tree_single_nonnegative_warnv_p): Use
the global range to see if the SSA_NAME was nonnegative.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/forwprop-42.c: New test.
* gcc.dg/pr80776-1.c: xfail and update comment.
---
 gcc/fold-const.cc   | 36 +++--
 gcc/testsuite/gcc.dg/pr80776-1.c|  8 ++---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 15 +
 3 files changed, 46 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 40767736389..2a2a90230f5 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15047,15 +15047,33 @@ tree_single_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
   return RECURSE (TREE_OPERAND (t, 1)) && RECURSE (TREE_OPERAND (t, 2));
 
 case SSA_NAME:
-  /* Limit the depth of recursion to avoid quadratic behavior.
-This is expected to catch almost all occurrences in practice.
-If this code misses important cases that unbounded recursion
-would not, passes that need this information could be revised
-to provide it through dataflow propagation.  */
-  return (!name_registered_for_update_p (t)
- && depth < param_max_ssa_name_query_depth
- && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
- strict_overflow_p, depth));
+  {
+   /* For integral types, query the global range if possible.  */
+   if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
+ {
+   value_range vr;
+   if (get_global_range_query ()->range_of_expr (vr, t)
+   && !vr.varying_p () && !vr.undefined_p ())
+ {
+   /* If the range is nonnegative, return true. */
+   if (vr.nonnegative_p ())
+ return true;
+
+   /* If the range is non-positive, then return false. */
+   if (vr.nonpositive_p ())
+ return false;
+ }
+ }
+   /* Limit the depth of recursion to avoid quadratic behavior.
+  This is expected to catch almost all occurrences in practice.
+  If this code misses important cases that unbounded recursion
+  would not, passes that need this information could be revised
+  to provide it through dataflow propagation.  */
+   return (!name_registered_for_update_p (t)
+   && depth < param_max_ssa_name_query_depth
+   && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
+   strict_overflow_p, depth));
+  }
 
 default:
   return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE (t));
diff --git a/gcc/testsuite/gcc.dg/pr80776-1.c b/gcc/testsuite/gcc.dg/pr80776-1.c
index b9bce62d982..f3d47aeda36 100644
--- a/gcc/testsuite/gcc.dg/pr80776-1.c
+++ b/gcc/testsuite/gcc.dg/pr80776-1.c
@@ -18,14 +18,14 @@ Foo (void)
   if (! (0 <= i && i <= 99))
 __builtin_unreachable ();
 
-  /* Legacy evrp sets the range of i to [0, MAX] *before* the first conditional,
+  /* vrp1 sets the range of i to [0, MAX] *before* the first conditional,
  and to [0,99] *before* the second conditional.  This is because both
- evrp and VRP use trickery to set global ranges when this particular use of
+ vrp use trickery to set global ranges when this particular use of
  a __builtin_unreachable is in play (see uses of
  assert_unreachable_fallthru_edge_p).
 
- Setting these ranges at the definition site, causes VRP to remove the
+ Setting these ranges at the definition site, causes other passes to remove the
 unreachable code altogether, leaving the following sprintf unguarded.  This
 causes the bogus warning below.  */
-  sprintf (number, "%d", i); /* { dg-bogus "writing" "" } */
+  sprintf (number, "%d", i); /* { dg-bogus "writing" "" { xfail *-*-* } } */
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
new file mode 100644
index 000..4e5421ed4d4
--- /dev/null
+++ 

[PATCH] match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]

2023-10-24 Thread Andrew Pinski
This adds a match pattern for `a != C1 ? abs(a) : C2`, which gets simplified
to `abs(a)`.  If C1 was originally *_MIN, then change it over to use absu
instead of abs.
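The *_MIN condition matters because negating the minimum value overflows
in the signed type; roughly:

    /* For 32-bit int, the f1 test below is  x != INT_MIN ? abs(x) : INT_MIN.
       Folding that to a signed ABS_EXPR would rely on abs(INT_MIN), which
       overflows; ABSU_EXPR instead computes the magnitude in unsigned int
       (absu(INT_MIN) == 0x80000000), so (int) absu(x) is well-defined for
       every input, INT_MIN included.  */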

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111957

gcc/ChangeLog:

* match.pd (`a != C1 ? abs(a) : C2`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-40.c: New test.
---
 gcc/match.pd   | 10 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c | 25 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 5df04ebba77..370ee35de52 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5622,6 +5622,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (wi::eq_p (wi::bit_not (wi::to_wide (@1)), wi::to_wide (@2)))
   @3))
 
+/* X != C1 ? abs(X) : C2 simplifies to abs(x) when abs(C1) == C2. */
+(for op (abs absu)
+ (simplify
+  (cond (ne @0 INTEGER_CST@1) (op@3 @0) INTEGER_CST@2)
+  (if (wi::abs (wi::to_wide (@1)) == wi::to_wide (@2))
+   (if (op != ABSU_EXPR && wi::only_sign_bit_p (wi::to_wide (@1)))
+    (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+     (convert (absu:utype @0)))
+    @3))))
+
 /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
new file mode 100644
index 000..a9011ce97fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiopt" } */
+/* PR tree-optimization/111957 */
+
+int f(int a)
+{
+  if (a)
+return a > 0 ? a : -a;
+  return 0;
+}
+
+int f1(int x)
+{
+  int intmin = (-1u >> 1);
+  intmin = -intmin - 1;
+  if (x != intmin)
+return x > 0 ? x : -x;
+  return intmin;
+}
+
+/* { dg-final { scan-tree-dump-times "if " 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 2 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 1 "phiopt2" } } */
+/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 1 "phiopt2" } } */
-- 
2.34.1



PING^3 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-10-24 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html

BR,
Kewen

>> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> As Honza pointed out in [1], the current uses of the function
>>> optimize_function_for_speed_p in rs6000_option_override_internal
>>> are too early, since the query results from the functions
>>> optimize_function_for_{speed,size}_p could change later due
>>> to profile feedback, some function attribute handling, etc.
>>>
>>> This patch moves optimize_function_for_speed_p to all the
>>> places where the corresponding flags are used, following
>>> existing practice.  Maybe we can cache it somewhere at an
>>> appropriate time, but that's another matter.
>>>
>>> Comparing with v1 [2], this version added one test case for
>>> SAVE_TOC_INDIRECT, as Segher questioned and suggested, and it
>>> also considered the possibility of an explicit option (see test
>>> cases pr108184-2.c and pr108184-4.c).  I believe that, except
>>> for the intentional change on optimize_function_for_{speed,
>>> size}_p, there is no other functional change.
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
>>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609379.html
>>>
>>> Bootstrapped and regtested on powerpc64-linux-gnu P8,
>>> powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
>>> all optimize_function_for_speed_p uses.
>>> (fusion_gpr_load_p): Call optimize_function_for_speed_p along
>>> with TARGET_P8_FUSION_SIGN.
>>> (expand_fusion_gpr_load): Likewise.
>>> (rs6000_call_aix): Call optimize_function_for_speed_p along with
>>> TARGET_SAVE_TOC_INDIRECT.
>>> * config/rs6000/predicates.md (fusion_gpr_mem_load): Call
>>> optimize_function_for_speed_p along with TARGET_P8_FUSION_SIGN.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr108184-1.c: New test.
>>> * gcc.target/powerpc/pr108184-2.c: New test.
>>> * gcc.target/powerpc/pr108184-3.c: New test.
>>> * gcc.target/powerpc/pr108184-4.c: New test.
>>> ---
>>>  gcc/config/rs6000/predicates.md   |  5 +++-
>>>  gcc/config/rs6000/rs6000.cc   | 19 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr108184-1.c | 16 
>>>  gcc/testsuite/gcc.target/powerpc/pr108184-2.c | 15 +++
>>>  gcc/testsuite/gcc.target/powerpc/pr108184-3.c | 25 +++
>>>  gcc/testsuite/gcc.target/powerpc/pr108184-4.c | 24 ++
>>>  6 files changed, 97 insertions(+), 7 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-2.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-3.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-4.c
>>>
>>> diff --git a/gcc/config/rs6000/predicates.md 
>>> b/gcc/config/rs6000/predicates.md
>>> index a1764018545..9f84468db84 100644
>>> --- a/gcc/config/rs6000/predicates.md
>>> +++ b/gcc/config/rs6000/predicates.md
>>> @@ -1878,7 +1878,10 @@ (define_predicate "fusion_gpr_mem_load"
>>>
>>>/* Handle sign/zero extend.  */
>>>if (GET_CODE (op) == ZERO_EXTEND
>>> -  || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND))
>>> +  || (TARGET_P8_FUSION_SIGN
>>> + && GET_CODE (op) == SIGN_EXTEND
>>> + && (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN
>>> + || optimize_function_for_speed_p (cfun))))
>>>  {
>>>op = XEXP (op, 0);
>>>mode = GET_MODE (op);
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6ac3adcec6b..f47d21980a9 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -3997,8 +3997,7 @@ rs6000_option_override_internal (bool global_init_p)
>>>/* If we can shrink-wrap the TOC register save separately, then use
>>>   -msave-toc-indirect unless explicitly disabled.  */
>>>if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>> -  && flag_shrink_wrap_separate
>>> -  && optimize_function_for_speed_p (cfun))
>>> +  && flag_shrink_wrap_separate)
>>>  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>>
>>>/* Enable power8 fusion if we are tuning for power8, even if we aren't
>>> @@ -4032,7 +4031,6 @@ rs6000_option_override_internal (bool global_init_p)
>>>   zero extending load, and an explicit sign extension.  */
>>>if (TARGET_P8_FUSION
>>>&& !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
>>> -  && optimize_function_for_speed_p (cfun)
>>>&& optimize >= 3)
>>>  rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
>>>
>>> @@ -25690,7 +25688,10 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>>>
>>>   /* Can we 

PING^5 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-10-24 Thread Kewen.Lin
Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

 on 2022/11/24 17:15, Kewen Lin wrote:
> Hi,
>
> Following Segher's suggestion, this patch series reworks the
> function rs6000_emit_vector_compare for vector float and int
> in multiple steps; it's based on the previous attempts [1][2].
> As mentioned in [1], the need to rework this for float is to
> have a centralized place for vector float comparison handling,
> instead of supporting it dispersedly by swapping ops, reversing
> codes, etc.  It's also for a subsequent patch to handle
> comparison operators with or without trapping math (PR105480).
> With the handling of vector float reworked, we can further
> simplify the handling of vector int, as shown.
>
> For Segher's concern about whether this rework causes any
> assembly change, I constructed two testcases for vector float [3]
> and int [4] respectively before; they showed that most results
> are unchanged, except for a difference on LE and UNGT, which is
> demonstrably an improvement since it uses GE instead of GT ior EQ.
> The associated test case in patch 3/9 is a good example.
>
> Besides, w/ and w/o the whole patch series, I built the whole
> SPEC2017 at options -O3 and -Ofast separately and checked the
> differences in object assembly.  The results showed that
> most are unchanged, except for:
>
>   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
> 9 object files with differences.
>
>   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
> one and 527.cam4_r has 4 object files with differences.
>
> By looking into these differences, all significant differences
> are caused by the known improvement mentioned above transforming
> GT ior EQ to GE, which can also affect unrolling decisions due
> to insn count.  Some other trivial differences are branch
> target offset differences, nop differences for alignment, vsx
> register number differences, etc.
>
> I also evaluated the runtime performance for these changed
> benchmarks, the result is neutral.
>
> These patches are bootstrapped and regress-tested
> incrementally on powerpc64-linux-gnu P7 & P8, and
> powerpc64le-linux-gnu P9 & P10.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
> [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html
>
> Kewen Lin (9):
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
>
>  gcc/config/rs6000/rs6000.cc | 180 ++--
>  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
>  2 files changed, 74 insertions(+), 131 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c
>



[PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-10-24 Thread Kewen.Lin
Hi,

This is almost a repost of v2, which was posted at [1] in March,
except for:
  1) rebased on r14-4810, which is relatively up-to-date;
     some conflicts on the "int to bool" return type change have
     been resolved;
  2) adjusted the commit log a bit;
  3) fixed the misspelled "articial" with "artificial" somewhere.

--
*v2 comments*:

By addressing Alexander's comments, against v1 this
patch v2 mainly:

  - Rename no_real_insns_p to no_real_nondebug_insns_p;
  - Introduce enum rgn_bb_deps_free_action for three
kinds of actions to free deps;
  - Change function free_deps_for_bb_no_real_insns_p to
resolve_forw_deps which only focuses on forward deps;
  - Extend the handlings to cover dbg-cnt sched_block,
add one test case for it;
  - Move free_trg_info call in schedule_region to an
appropriate place.

One thing I'm not sure about is the change in function
sched_rgn_local_finish: currently the invocation of
sched_rgn_local_free is guarded with !sel_sched_p (),
so I just follow that; but the initialization of those
structures (in sched_rgn_local_init) isn't guarded
with !sel_sched_p (), which looks odd.

--

As PR108273 shows, when there is a block which has only
NOTE_P and LABEL_P insns in non-debug mode but some extra
DEBUG_INSN_P insns in debug mode, the DFA states after
scheduling it differ between debug mode and non-debug
mode.  In non-debug mode the block satisfies
no_real_insns_p and gets skipped, while in debug mode it
gets scheduled; even though it only has NOTE_P, LABEL_P
and DEBUG_INSN_P insns, the call to function
advance_one_cycle changes the DFA state.  PR108519 also
shows this issue can be exposed by some scheduler changes.

This patch changes function no_real_insns_p into function
no_real_nondebug_insns_p by taking debug insns into
account, which makes us not try to schedule blocks having
only NOTE_P, LABEL_P and DEBUG_INSN_P insns, resulting in
consistent DFA states between non-debug and debug mode.
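For reference, a simplified sketch of what the renamed predicate checks
(not the exact implementation):

    /* True if every insn from HEAD to TAIL inclusive is a note, label or
       debug insn, i.e. nothing that would advance the DFA state; such
       blocks are now skipped consistently in debug and non-debug mode.  */
    static bool
    no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
    {
      for (const rtx_insn *insn = head; ; insn = NEXT_INSN (insn))
        {
          if (!NOTE_P (insn) && !LABEL_P (insn) && !DEBUG_INSN_P (insn))
            return false;
          if (insn == tail)
            return true;
        }
    }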

Changing no_real_insns_p to no_real_nondebug_insns_p caused
an ICE when doing free_block_dependencies.  The root cause
is that we create dependencies for debug insns; those
dependencies are expected to be resolved during scheduling,
but they get skipped after this change.
By checking the code, it looks reasonable to skip computing
block dependences for no_real_nondebug_insns_p blocks.
There is also another issue, exposed by SPEC2017 bmks
builds at -O2 -g: we could skip scheduling some block
which already has its dependency graph built (so
dependencies computed and rgn_n_insns accumulated), and
then the later verification that the graph becomes
exhausted by scheduling would fail as follows:

  /* Sanity check: verify that all region insns were
     scheduled.  */
gcc_assert (sched_rgn_n_insns == rgn_n_insns);

, and also some forward deps wouldn't get resolved.

As Alexander pointed out, the current debug count handling
also suffers from a similar issue, so this patch handles
these two cases together: one is a block skipped due to
!dbg_cnt (sched_block), the other is a block which is not
no_real_nondebug_insns_p initially but becomes so due to
speculative scheduling.

This patch can be bootstrapped and regress-tested on
x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

I also verified this patch can pass SPEC2017 both intrate
and fprate bmks building at -g -O2/-O3.

Any thoughts?  Is it ok for trunk?

[1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
[2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html

BR,
Kewen
-
PR rtl-optimization/108273

gcc/ChangeLog:

* haifa-sched.cc (no_real_insns_p): Rename to ...
(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
no_real_nondebug_insns_p.
* sched-int.h (no_real_insns_p): Rename to ...
(no_real_nondebug_insns_p): ... this.
* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
(bb_deps_free_actions): New static variable.
(compute_block_dependences): Skip for no_real_nondebug_insns_p.
(resolve_forw_deps): New function.
(free_block_dependencies): Check bb_deps_free_actions and call
function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
(compute_priorities): Replace no_real_insns_p with
no_real_nondebug_insns_p.
(schedule_region): Replace no_real_insns_p with
no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
get dependencies computed before but skipped now, fix up count
sched_rgn_n_insns for it too.  Call free_trg_info when the block
gets scheduled, and move sched_rgn_local_finish after the loop
of free_block_dependencies loop.
(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.

Re: [PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-24 Thread Kewen.Lin
Hi,

on 2023/10/25 10:00, Jiufu Guo wrote:
> Hi,
> 
> Sometimes, a complicated constant is built via 3 (or more)
> instructions.  Generally speaking, it would not be as fast
> as loading it from the constant pool (as in a few
> discussions in PR63281).

I may have missed some previous discussions, but I'm curious why we
chose ">=3" here, as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c9
indicates that more than 3 (>3) should be considered
with this change.

> 
> For the concern that I raised in:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
> The micro-cases would not be the major concern. Because as
> Segher explained in:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
> It would just be about the benchmark method.
> 
> As tested on spec2017, for visible performance changes, we
> can find a runtime improvement on 500.perlbench_r of about
> ~1.8% (-O2) when supporting loading complicated constants
> from the constant pool, and no visible performance
> regression on other benchmarks.

The improvement on 500.perlbench_r looks to match what PR63281
mentioned, nice!  I'm curious which options and which kinds
of CPUs you have tested with.  Since this is a general change,
I'd expect we can test with P8/P9/P10 at O2/O3 (or Ofast) at
least.

BR,
Kewen

> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
>   PR target/63281
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>   complicate constant to memory.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
>   * gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
>   point.
>   * gcc.target/powerpc/pr106550.c: Likewise.
>   * gcc.target/powerpc/pr106550_1.c: Likewise.
>   * gcc.target/powerpc/pr87870.c: Update according to latest behavior.
>   * gcc.target/powerpc/pr93012.c: Likewise.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 16 
>  .../gcc.target/powerpc/const_anchors.c  |  5 ++---
>  .../gcc.target/powerpc/parall_5insn_const.c | 14 --
>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
>  gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
>  gcc/testsuite/gcc.target/powerpc/pr93012.c  |  4 +++-
>  7 files changed, 65 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 4690384cdbe..b9562f1ea0f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
> c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
> +
> +  /* If it can be stored to the constant pool and profitable.  */
> +  else if (base_reg_operand (dest, mode)
> +&& num_insns_constant (source, mode) > 2)
> + {
> +   rtx sym = force_const_mem (mode, source);
> +   if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +   && use_toc_relative_ref (XEXP (sym, 0), mode))
> + {
> +   rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
> +   sym = gen_const_mem (mode, toc);
> +   set_mem_alias_set (sym, get_TOC_alias_set ());
> + }
> +
> +   emit_insn (gen_rtx_SET (dest, sym));
> + }
>else
>   rs6000_emit_set_long_const (dest, c);
>break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..188744165f2 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -16,5 +16,4 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long b)
>if (b)
>  *a++ = C2;
>  }
> -
> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> index e3a9a7264cf..df0690b90be 100644
> --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> @@ -9,8 +9,18 @@
>  void __attribute__ ((noinline)) foo (unsigned long long *a)
>  {
>/* 2 lis + 2 ori + 1 rldimi for each constant.  */
> -  *a++ = 0x800aabcdc167fa16ULL;
> -  *a++ = 0x7543a876867f616ULL;
> +  {
> +register long long d asm("r0") = 0x800aabcdc167fa16ULL;
> +long long n;
> +asm("mr %0, %1" : "=r"(n) : "r"(d));
> +*a++ = 

Re: [PATCH 2/3]rs6000: using 'pli' to load 34bit-constant

2023-10-24 Thread Kewen.Lin
on 2023/10/25 10:00, Jiufu Guo wrote:
> Hi,
> 
> For constants with 16-bit values, 'li' or 'lis' can be used to generate
> the value.  For a 34-bit constant, 'pli' can generate the value.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
>   pli for 34bit constant.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index b23ff3d7917..4690384cdbe 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10530,7 +10530,11 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>    ud3 = (c >> 32) & 0xffff;
>    ud4 = (c >> 48) & 0xffff;
> 
> -  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
> +  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
> +{
> +  emit_move_insn (dest, GEN_INT (c));
> +}

Nit: unexpected formatting, no {} needed.

Is there any test case justifying this change?  I think a single "li" or "lis"
beats "pli", since the latter is a prefixed insn and puts more burden on insn
decoding.

BR,
Kewen

> +  else if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
>  emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
> 




Re: [PATCH 1/3]rs6000: update num_insns_constant for 2 insns

2023-10-24 Thread Kewen.Lin
Hi,

on 2023/10/25 10:00, Jiufu Guo wrote:
> Hi,
> 
> Trunk gcc supports more constants to be built via two instructions: e.g.
> "li/lis; xori/xoris/rldicl/rldicr/rldic".
> And then num_insns_constant should also be updated.
> 

Thanks for updating this.

> Bootstrap & regtest pass ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (can_be_built_by_lilis_and_rldicX): New
>   function.
>   (num_insns_constant_gpr): Update to return 2 for more cases.
>   (rs6000_emit_set_long_const): Update to use
>   can_be_built_by_lilis_and_rldicX.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 64 -
>  1 file changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index cc24dd5301e..b23ff3d7917 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -6032,6 +6032,9 @@ direct_return (void)
>return 0;
>  }
>  
> +static bool
> +can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT, int *, HOST_WIDE_INT *);
> +
>  /* Helper for num_insns_constant.  Calculate number of instructions to
> load VALUE to a single gpr using combinations of addi, addis, ori,
> oris, sldi and rldimi instructions.  */
> @@ -6044,35 +6047,41 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
>  return 1;
>  
>/* constant loadable with addis */
> -  else if ((value & 0xffff) == 0
> -&& (value >> 31 == -1 || value >> 31 == 0))
> +  if ((value & 0xffff) == 0 && (value >> 31 == -1 || value >> 31 == 0))
>  return 1;
>  
>/* PADDI can support up to 34 bit signed integers.  */
> -  else if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
> +  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
>  return 1;
>  
> -  else if (TARGET_POWERPC64)
> -{
> -  HOST_WIDE_INT low = sext_hwi (value, 32);
> -  HOST_WIDE_INT high = value >> 31;
> +  if (!TARGET_POWERPC64)
> +return 2;
>  
> -  if (high == 0 || high == -1)
> - return 2;
> +  HOST_WIDE_INT low = sext_hwi (value, 32);
> +  HOST_WIDE_INT high = value >> 31;
>  
> -  high >>= 1;
> +  if (high == 0 || high == -1)
> +return 2;
>  
> -  if (low == 0 || low == high)
> - return num_insns_constant_gpr (high) + 1;
> -  else if (high == 0)
> - return num_insns_constant_gpr (low) + 1;
> -  else
> - return (num_insns_constant_gpr (high)
> - + num_insns_constant_gpr (low) + 1);
> -}
> +  high >>= 1;
>  
> -  else
> +  HOST_WIDE_INT ud2 = (low >> 16) & 0xffff;
> +  HOST_WIDE_INT ud1 = low & 0xffff;
> +  if (high == -1 && ((!(ud2 & 0x8000) && ud1 == 0) || (ud1 & 0x8000)))
> +return 2;
> +  if (high == 0 && (ud1 == 0 || (!(ud1 & 0x8000))))
>  return 2;

I was thinking that, instead of enumerating all the cases in function
rs6000_emit_set_long_const, we could add one optional argument like
"int *num_insns = nullptr" to function rs6000_emit_set_long_const, and
when it's not nullptr, emit nothing but update the count instead.
That helps people remember to update num_insns when updating
rs6000_emit_set_long_const in the future, and it is also clearer
where the number comes from.
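Something like the following minimal sketch is what I have in mind (the
signature change and helper name are illustrative, not existing API):

    /* Sketch: when NUM_INSNS is non-null, count the insns the existing
       cases would emit instead of emitting them, so the count can never
       drift from the emission logic.  */
    static void
    rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c,
                                int *num_insns = nullptr)
    {
      auto count_or_emit = [&] (rtx pat)
        {
          if (num_insns)
            {
              (*num_insns)++;   /* Counting mode: emit nothing.  */
              return;
            }
          emit_insn (pat);
        };
      /* ... each emit_insn/emit_move_insn in the existing case analysis
         becomes a count_or_emit call, and num_insns_constant_gpr could
         then be implemented roughly as:
           int n = 0;
           rs6000_emit_set_long_const (dest, value, &n);
           return n;  */
    }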

Does it sound good to you?

BR,
Kewen

> +
> +  int shift;
> +  HOST_WIDE_INT mask;
> +  if (can_be_built_by_lilis_and_rldicX (value, &shift, &mask))
> +return 2;
> +
> +  if (low == 0 || low == high)
> +return num_insns_constant_gpr (high) + 1;
> +  if (high == 0)
> +return num_insns_constant_gpr (low) + 1;
> +  return (num_insns_constant_gpr (high) + num_insns_constant_gpr (low) + 1);
>  }
>  
>  /* Helper for num_insns_constant.  Allow constants formed by the
> @@ -10492,6 +10501,18 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
>return false;
>  }
>  
> +/* Combine the above checking functions for  li/lis;rldicX. */
> +
> +static bool
> +can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT c, int *shift,
> +   HOST_WIDE_INT *mask)
> +{
> +  return (can_be_built_by_li_lis_and_rotldi (c, shift, mask)
> +   || can_be_built_by_li_lis_and_rldicl (c, shift, mask)
> +   || can_be_built_by_li_lis_and_rldicr (c, shift, mask)
> +   || can_be_built_by_li_and_rldic (c, shift, mask));
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  */
> @@ -10538,10 +10559,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>GEN_INT ((ud2 ^ 0x) << 16)));
>  }
> -  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
> -|| can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
> -|| can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
> -|| can_be_built_by_li_and_rldic (c, &shift, &mask))
> +  else if 

[PATCH 3/3]rs6000: split complicate constant to constant pool

2023-10-24 Thread Jiufu Guo
Hi,

Sometimes, a complicated constant is built via 3 (or more)
instructions.  Generally speaking, it would not be as fast
as loading it from the constant pool (as in a few
discussions in PR63281).

For the concern that I raised in:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599676.html
The micro-cases would not be the major concern. Because as
Segher explained in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281#c18
It would just be about the benchmark method.

As tested on spec2017, for visible performance changes, we
can find a runtime improvement on 500.perlbench_r of about
~1.8% (-O2) when supporting loading complicated constants
from the constant pool, and no visible performance
regression on other benchmarks.
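For a concrete feel (illustrative; the constant is the one used in
parall_5insn_const.c below, and the assembly is the usual TOC-relative
load shape, not output copied from a build):

    /* Before: synthesized inline with 2 lis + 2 ori + 1 rldimi (5 insns).
       After the patch (constants needing >2 insns): one constant-pool
       load, roughly
           addis r9,r2,.LC0@toc@ha
           ld    r9,.LC0@toc@l(r9)  */
    unsigned long long
    f (void)
    {
      return 0x800aabcdc167fa16ULL;
    }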

Bootstrap & regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

PR target/63281

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
complicate constant to memory.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_anchors.c: Update to test final-rtl. 
* gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
point.
* gcc.target/powerpc/pr106550.c: Likewise.
* gcc.target/powerpc/pr106550_1.c: Likewise.
* gcc.target/powerpc/pr87870.c: Update according to latest behavior.
* gcc.target/powerpc/pr93012.c: Likewise.

---
 gcc/config/rs6000/rs6000.cc | 16 
 .../gcc.target/powerpc/const_anchors.c  |  5 ++---
 .../gcc.target/powerpc/parall_5insn_const.c | 14 --
 gcc/testsuite/gcc.target/powerpc/pr106550.c | 17 +++--
 gcc/testsuite/gcc.target/powerpc/pr106550_1.c   | 15 +--
 gcc/testsuite/gcc.target/powerpc/pr87870.c  |  5 -
 gcc/testsuite/gcc.target/powerpc/pr93012.c  |  4 +++-
 7 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4690384cdbe..b9562f1ea0f 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10292,6 +10292,22 @@ rs6000_emit_set_const (rtx dest, rtx source)
  c = sext_hwi (c, 32);
  emit_move_insn (lo, GEN_INT (c));
}
+
+  /* If it can be stored to the constant pool and profitable.  */
+  else if (base_reg_operand (dest, mode)
+  && num_insns_constant (source, mode) > 2)
+   {
+ rtx sym = force_const_mem (mode, source);
+ if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
+ && use_toc_relative_ref (XEXP (sym, 0), mode))
+   {
+ rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
+ sym = gen_const_mem (mode, toc);
+ set_mem_alias_set (sym, get_TOC_alias_set ());
+   }
+
+ emit_insn (gen_rtx_SET (dest, sym));
+   }
   else
rs6000_emit_set_long_const (dest, c);
   break;
diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
index 542e2674b12..188744165f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
+++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target has_arch_ppc64 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-rtl-final" } */
 
 #define C1 0x2351847027482577ULL
 #define C2 0x2351847027482578ULL
@@ -16,5 +16,4 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long b)
   if (b)
 *a++ = C2;
 }
-
-/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
+/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
index e3a9a7264cf..df0690b90be 100644
--- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
+++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
@@ -9,8 +9,18 @@
 void __attribute__ ((noinline)) foo (unsigned long long *a)
 {
   /* 2 lis + 2 ori + 1 rldimi for each constant.  */
-  *a++ = 0x800aabcdc167fa16ULL;
-  *a++ = 0x7543a876867f616ULL;
+  {
+register long long d asm("r0") = 0x800aabcdc167fa16ULL;
+long long n;
+asm("mr %0, %1" : "=r"(n) : "r"(d));
+*a++ = n;
+  }
+  {
+register long long d asm("r0") = 0x7543a876867f616ULL;
+long long n;
+asm("mr %0, %1" : "=r"(n) : "r"(d));
+*a++ = n;
+  }
 }
 
 long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c
index 74e395331ab..5eca2b2f701 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -1,12 +1,25 @@
 /* PR target/106550 */
 /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
 /* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target has_arch_ppc64 } */
 
 void
 foo (unsigned long long *a)
 {
-  *a++ = 

[PATCH 2/3]rs6000: using 'pli' to load 34bit-constant

2023-10-24 Thread Jiufu Guo
Hi,

For constants with 16-bit values, 'li' or 'lis' can be used to generate
the value.  For a 34-bit constant, 'pli' can generate the value.
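For example (a sketch; the actual code generation depends on prefixed-insn
support, e.g. -mcpu=power10):

    /* 0x123456789 needs 34 signed bits: without prefixed instructions it
       takes a multi-insn li/lis + ori (+ shift) sequence, while a single
       "pli 9,0x123456789" can materialize it.  */
    long long
    f (void)
    {
      return 0x123456789LL;
    }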

Bootstrap pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34bit constant.

---
 gcc/config/rs6000/rs6000.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b23ff3d7917..4690384cdbe 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10530,7 +10530,11 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   ud3 = (c >> 32) & 0xffff;
   ud4 = (c >> 48) & 0xffff;
 
-  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
+  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
+{
+  emit_move_insn (dest, GEN_INT (c));
+}
+  else if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
 emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
 
-- 
2.25.1



[PATCH 1/3]rs6000: update num_insns_constant for 2 insns

2023-10-24 Thread Jiufu Guo
Hi,

Trunk gcc supports building more constants via two instructions, e.g.
"li/lis; xori/xoris/rldicl/rldicr/rldic".
num_insns_constant should be updated accordingly.

Bootstrap & regtest pass ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_lilis_and_rldicX): New
function.
(num_insns_constant_gpr): Update to return 2 for more cases.
(rs6000_emit_set_long_const): Update to use
can_be_built_by_lilis_and_rldicX.

---
 gcc/config/rs6000/rs6000.cc | 64 -
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..b23ff3d7917 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -6032,6 +6032,9 @@ direct_return (void)
   return 0;
 }
 
+static bool
+can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT, int *, HOST_WIDE_INT *);
+
 /* Helper for num_insns_constant.  Calculate number of instructions to
load VALUE to a single gpr using combinations of addi, addis, ori,
oris, sldi and rldimi instructions.  */
@@ -6044,35 +6047,41 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 return 1;
 
   /* constant loadable with addis */
-  else if ((value & 0xffff) == 0
-  && (value >> 31 == -1 || value >> 31 == 0))
+  if ((value & 0xffff) == 0 && (value >> 31 == -1 || value >> 31 == 0))
 return 1;
 
   /* PADDI can support up to 34 bit signed integers.  */
-  else if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
+  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (value))
 return 1;
 
-  else if (TARGET_POWERPC64)
-{
-  HOST_WIDE_INT low = sext_hwi (value, 32);
-  HOST_WIDE_INT high = value >> 31;
+  if (!TARGET_POWERPC64)
+return 2;
 
-  if (high == 0 || high == -1)
-   return 2;
+  HOST_WIDE_INT low = sext_hwi (value, 32);
+  HOST_WIDE_INT high = value >> 31;
 
-  high >>= 1;
+  if (high == 0 || high == -1)
+return 2;
 
-  if (low == 0 || low == high)
-   return num_insns_constant_gpr (high) + 1;
-  else if (high == 0)
-   return num_insns_constant_gpr (low) + 1;
-  else
-   return (num_insns_constant_gpr (high)
-   + num_insns_constant_gpr (low) + 1);
-}
+  high >>= 1;
 
-  else
+  HOST_WIDE_INT ud2 = (low >> 16) & 0xffff;
+  HOST_WIDE_INT ud1 = low & 0xffff;
+  if (high == -1 && ((!(ud2 & 0x8000) && ud1 == 0) || (ud1 & 0x8000)))
+return 2;
+  if (high == 0 && (ud1 == 0 || (!(ud1 & 0x8000))))
 return 2;
+
+  int shift;
+  HOST_WIDE_INT mask;
+  if (can_be_built_by_lilis_and_rldicX (value, &shift, &mask))
+return 2;
+
+  if (low == 0 || low == high)
+return num_insns_constant_gpr (high) + 1;
+  if (high == 0)
+return num_insns_constant_gpr (low) + 1;
+  return (num_insns_constant_gpr (high) + num_insns_constant_gpr (low) + 1);
 }
 
 /* Helper for num_insns_constant.  Allow constants formed by the
@@ -10492,6 +10501,18 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
   return false;
 }
 
+/* Combine the above checking functions for  li/lis;rldicX. */
+
+static bool
+can_be_built_by_lilis_and_rldicX (HOST_WIDE_INT c, int *shift,
+ HOST_WIDE_INT *mask)
+{
+  return (can_be_built_by_li_lis_and_rotldi (c, shift, mask)
+ || can_be_built_by_li_lis_and_rldicl (c, shift, mask)
+ || can_be_built_by_li_lis_and_rldicr (c, shift, mask)
+ || can_be_built_by_li_and_rldic (c, shift, mask));
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10538,10 +10559,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
-  || can_be_built_by_li_and_rldic (c, &shift, &mask))
+  else if (can_be_built_by_lilis_and_rldicX (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
-- 
2.25.1



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Siddhesh Poyarekar

On 2023-10-24 18:51, Qing Zhao wrote:

Thanks for the proposal!

So what you suggested is:

For every x.buf, change it to a __builtin_with_size(x.buf, x.L) in the FE;
then the call to _bdos (x.buf, 1) will
become:

_bdos(__builtin_with_size(x.buf, x.L), 1)?

Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?


Oops, I think Martin and I fell off-list in a subthread.  I clarified 
that my comment was that any such annotation at the object reference is 
probably too late and hence not the right place for it; basically it has 
the same problems as option A in your comment.  A better place to 
reinforce such a relationship would be the allocation+initialization 
site instead.


Thanks,
Sid


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Siddhesh Poyarekar

On 2023-10-24 18:41, Qing Zhao wrote:




On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar  wrote:

On 2023-10-24 16:30, Qing Zhao wrote:

Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”;
therefore the call to __bdos is not in the same routine as the instantiation
of the object.  As a result, the TYPE info and the attached counted_by info of
the object can NOT be USED by the __bdos call.


But __bos/__bdos are barely useful without optimization; you need a minimum of 
-O1.  You're right that if the call is never inlined then we don't care because 
the __bdos call does not get expanded to obj->size.

However, the point of situation 2 is that the TYPE info cannot be used by the 
__bdos call *only for a while* (i.e. until the call gets inlined) and that 
window is an opportunity for the reordering/DSE to break things.
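Concretely, the window looks something like this (a hand-written
illustration, not a dump from an actual miscompile):

    obj->size = sz;                                    /* A: set the bound */
    n = __builtin_dynamic_object_size (obj->buf, 1);   /* B: implicitly reads A */

    /* Until B is inlined and expanded by the objsz pass, the IL carries
       no data dependence from B on A, so in principle a pass may sink A
       past B, or dead-store-eliminate A entirely; either breaks the
       computed size.  */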


The main point I tried to make with situation 2: there are situations where 
obj->size is not used at all by the __bdos; marking it as volatile is too 
conservative and unnecessarily prevents useful optimizations from happening.  -:)


Yes, that's the tradeoff.  However, maybe this is the point where Kees 
jumps in and say the kernel doesn't really care as much or something 
like that :)


Sid


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Qing Zhao


> On Oct 24, 2023, at 4:38 PM, Martin Uecker  wrote:
> 
> On Tuesday, 2023-10-24 at 20:30 +0000, Qing Zhao wrote:
>> Hi, Sid,
>> 
>> I really appreciate your example and detailed explanation. Very helpful.
>> I think this example is an excellent one to show (almost) all the
>> issues we need to consider.
>> 
>> I slightly modified this example to make it compilable and runnable,
>> as follows:
>> (but I still cannot make the incorrect reordering or DSE happen; anyway,
>> the potential for reordering is there…)
>> 
>>  1 #include 
>>  2 struct A
>>  3 {
>>  4  size_t size;
>>  5  char buf[] __attribute__((counted_by(size)));
>>  6 };
>>  7 
>>  8 static size_t
>>  9 get_size_from (void *ptr)
>> 10 {
>> 11  return __builtin_dynamic_object_size (ptr, 1);
>> 12 }
>> 13 
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>> 21  return;
>> 22 }
>> 23 
>> 24 int main ()
>> 25 {
>> 26  foo (20);
>> 27  return 0;
>> 28 }
>> 
>> With my GCC, it compiled and ran:
>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> 20
>> Situation 1: With O1 and above, the routine “get_size_from” was inlined into
>> “foo”; therefore the call to __bdos is in the same routine as the
>> instantiation of the object, and the TYPE information (and the counted_by
>> attribute attached to the TYPE of the object) can be USED by the __bdos
>> call to compute the final object size.
>> 
>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> -1
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into
>> “foo”; therefore the call to __bdos is NOT in the same routine as the
>> instantiation of the object.  As a result, the TYPE info and the attached
>> counted_by info of the object can NOT be USED by the __bdos call.
>> 
>> Keep the above two situations in mind; we will refer to them below:
>> 
>> 1. First,  the problem we are trying to resolve is:
>> 
>> (Your description):
>> 
>>> the reordering of __bdos w.r.t. initialization of the size parameter but to 
>>> also account for DSE of the assignment, we can abstract this problem to 
>>> that of DFA being unable to see implicit use of the size parameter in the 
>>> __bdos call.
>> 
>> basically is correct.  However, with the following exception:
>> 
>> The implicit use of the size parameter in the __bdos call is not always
>> there; it ONLY exists WHEN __bdos can be evaluated to an expression of the
>> size parameter in the “objsz” phase, i.e., “Situation 1” of the above
>> example.
>> In “Situation 2”, when __bdos does not see the TYPE of the real object,
>> it does not see the counted_by information from the TYPE, and therefore it
>> is not able to evaluate the size of the object through the counted_by
>> information.  As a result, the implicit use of the size parameter in the
>> __bdos call does NOT exist at all.  The optimizer can freely reorder the
>> initialization of the size parameter with the __bdos call since there is
>> no data-flow dependency between the two.
>> 
>> With this exception in mind, we can see that your proposed “option 2”
>> (making the type of size “volatile”) is too conservative: it will disable
>> many optimizations unnecessarily, even though it’s safe and simple to
>> implement.
>> 
>> As a compiler optimization person for many many years, I really don’t want 
>> to take this approach at this moment.  -:)
>> 
>> 2. Some facts I’d like to mention:
>> 
>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE
>> optimization stage.  During the RTL stage, the __bdos call has already been
>> replaced by an expression of the size parameter or a constant, so the data
>> dependency is already explicit in the IR.  I believe that the data
>> analysis in the RTL stage should pick up the data dependency correctly; no
>> special handling is needed in RTL.
>> 
>> B. If the __bdos call cannot see the real object, it has no way to get the
>> “counted_by” field from the TYPE of the real object. So, if we try to add
>> the implicit use of the “counted_by” field to the __bdos call, the object
>> instantiation should be in the same routine as the __bdos call.  Both the FE
>> and the gimplification phase are too early to do this work.
>> 
>> 3. Then, what’s the best approach to resolve this problem:
>> 
>> There were several suggestions so far:
>> 
>> A.  Add an additional argument, the size parameter, to __bdos,
>>      A.1, during FE;
>>      A.2, during gimplification phase;
>> B.  Encode the implicit USE in the type of size, to make the size
>> “volatile”;
>> C.  Encode the implicit USE in the type of buf, then update the
>> optimization passes 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Qing Zhao


> On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-24 16:30, Qing Zhao wrote:
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into
>> “foo”; therefore the call to __bdos is not in the same routine as the
>> instantiation of the object.  As a result, the TYPE info and the attached
>> counted_by info of the object can NOT be USED by the __bdos call.
> 
> But __bos/__bdos are barely useful without optimization; you need a minimum 
> of -O1.  You're right that if the call is never inlined then we don't care 
> because the __bdos call does not get expanded to obj->size.
> 
> However, the point of situation 2 is that the TYPE info cannot be used by the 
> __bdos call *only for a while* (i.e. until the call gets inlined) and that 
> window is an opportunity for the reordering/DSE to break things.

The main point of situation 2 I tried to make: there are situations where 
obj->size is not used at all by the __bdos call; marking it as volatile is too 
conservative and unnecessarily prevents useful optimizations from happening.  -:)
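
To make the hazard concrete, here is a minimal C sketch (a hand-written
illustration, not the t5.c from this thread; it assumes a compiler with
counted_by support) of the transformation that becomes legal when DFA sees
no use of obj->size in the not-yet-expanded __bdos call:

#include <stdlib.h>

struct A {
  size_t size;
  char buf[] __attribute__((counted_by(size)));
};

size_t
foo_after_bad_reordering (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz);
  /* Nothing in the IR ties the __bdos call to the size store below, so
     the optimizer may legally hoist the call above the store...  */
  size_t n = __builtin_dynamic_object_size (obj->buf, 1);
  /* ...leaving the store to happen too late to be observed.  */
  obj->size = sz;
  return n;
}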

Qing
> 
> Thanks.
> Sid



Re: [PATCH v2 3/3] c++: note other candidates when diagnosing deletedness

2023-10-24 Thread Jason Merrill

On 10/23/23 19:51, Patrick Palka wrote:

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

   deleted16.C: In function 'int main()':
   deleted16.C:10:4: error: use of deleted function 'void f(int)'
  10 |   f(0);
 |   ~^~~
   deleted16.C:5:6: note: declared here
   5 | void f(int) = delete;
 |  ^
   deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
   deleted16.C:6:6: note: candidate: 'void f(...)'
   6 | void f(...);
 |  ^
   deleted16.C:7:6: note: candidate: 'void f(int, int)'
   7 | void f(int, int);
 |  ^
   deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided

For now, these notes are disabled when a deleted special member
function is selected because it introduces a lot of new "cannot bind
reference" errors in the testsuite when noting non-viable candidates,
e.g. in cpp0x/initlist-opt1.C we would need to expect an error when
noting unviability of A(A&&).  (It'd be nice if we could downgrade such
errors into notes when noting candidates...)


What about my suggestion to make printing the other candidates an 
opt-in, with the normal output just suggesting the use of that option 
for more information, like -ftemplate-backtrace-limit?


Jason



Re: [PATCH v2 2/3] c++: remember candidates that we ignored

2023-10-24 Thread Jason Merrill

On 10/23/23 19:51, Patrick Palka wrote:

During overload resolution, we sometimes outright ignore a function from
the overload set and leave no trace of it in the candidates list, for
example when we find a perfect non-template candidate we discard all
function templates, or when the callee is a template-id we discard all
non-template functions.  We should still however make note of these
unviable functions when diagnosing overload resolution failure, but
that's not possible if they're not present in the returned candidates
list.

To that end, this patch reworks add_candidates to add such ignored
functions to the list.  The new rr_ignored rejection reason is somewhat
of a catch-all; we could perhaps split it up into more specific rejection
reasons, but I leave that as future work.


OK with the same unviable -> non-viable change as the 1/3 patch.

Jason



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-10-24 Thread Richard Sandiford
Robin Dapp  writes:
> The attached patch introduces a VCOND_MASK_LEN; it helps the riscv cases
> that were broken before, and x86, aarch64 and power bootstraps and
> testsuites look unchanged.
>
> I only went with the minimal number of new match.pd patterns and did not
> try stripping the length of a COND_LEN_OP in order to simplify the
> associated COND_OP.
>
> An important part that I'm not sure how to handle properly is -
> when we have a constant immediate length of e.g. 16 and the hardware
> also operates on 16 units, vector length masking is actually
> redundant and the vcond_mask_len can be reduced to a vec_cond.
> For those (if_then_else unsplit) we have a large number of combine
> patterns that fuse instruction which do not correspond to ifns
> (like widening operations but also more complex ones).
>
> Currently I achieve this in a most likely wrong way:
>
>   auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
>   bool full_len = len && known_eq (sz.coeffs[0], ilen);
>   if (!len || full_len)
>  "vec_cond"
>   else
>  "vcond_mask_len"

At first, this seemed like an odd place to fold away the length.
AFAIK the length in res_op is inherited directly from the original
operation, and so it isn't any more redundant after the fold than
it was before.  But I suppose the reason for doing it here is that
we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
lengths.  Doing that avoids the need to define an IFN_COND_FOO
equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
I think it deserves a comment.

But yeah, known_eq (sz.coeffs[0], ilen) doesn't look right.
If the target knows that the length is exactly 16 at runtime,
then it should set GET_MODE_NUNITS to 16.  So I think the length
is only redundant if known_eq (sz, ilen).

The calculation should take the bias into account as well.
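
A scalar model of that combined check might look like this (an illustration
only: it assumes a fixed-length mode, so the poly_int comparison collapses
to plain integers, and a len_load/len_store-style bias of 0 or -1):

#include <stdbool.h>

/* The length operand is redundant only when len + bias covers every unit
   of the mode -- the fixed-length analogue of known_eq (sz, ilen) after
   adjusting for the bias, not a test of the constant coefficient alone.  */
static bool
length_is_redundant (int nunits, int len, int bias)
{
  return len + bias == nunits;
}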

Any reason not to make IFN_COND_LEN_MASK a directly-mapped optab?
(I realise IFN_COND_MASK isn't, but that's used differently.)

Failing that, could the expansion use expand_fn_using_insn?

It generally looks OK to me otherwise FWIW, but it would be nice
to handle the fold programmatically in gimple-match*.cc rather
than having the explicit match.pd patterns.  Richi should review
the match.pd stuff though. ;)  (I didn't really look at it.)

Thanks,
Richard

> Another thing not done in this patch:  For vcond_mask we only expect
> register operands as mask and force to a register.  For a vcond_mask_len
> that results from a simplification with all-one or all-zero mask we
> could allow constant immediate vectors and expand them to simple
> len moves in the backend.
>
> Regards
>  Robin
>
> From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result in a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the mask is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternary operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type):
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when
>   length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Allow
>   matching of 6 and 7 parameters.
>   (gimple_match_op::set_op): Ditto.
>   (gimple_match_op::gimple_match_op): Always initialize len and
>   bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (expand_vec_cond_mask_len_optab_fn): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (OPTAB_CD): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md | 20 +
>  gcc/config/riscv/riscv-protos.h |  4 ++
>  gcc/doc/md.texi |  9 
>  gcc/gimple-match-exports.cc | 20 +++--
>  gcc/gimple-match.h  | 78 

Re: PR111754

2023-10-24 Thread Richard Sandiford
Hi,

Sorry the slow review.  I clearly didn't think this through properly
when doing the review of the original patch, so I wanted to spend
some time working on the code to get a better understanding of
the problem.

Prathamesh Kulkarni  writes:
> Hi,
> For the following test-case:
>
> typedef float __attribute__((__vector_size__ (16))) F;
> F foo (F a, F b)
> {
>   F v = (F) { 9 };
>   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> }
>
> Compiling with -O2 results in following ICE:
> foo.c: In function ‘foo’:
> foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
>   |  ^~
> 0x7f3185 wi::int_traits<std::pair<rtx_def*, machine_mode> >::decompose(long*, unsigned int, std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/rtl.h:2314
> 0x7f3185 wide_int_ref_storage<false, false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/wide-int.h:1089
> 0x7f3185 generic_wide_int<wide_int_ref_storage<false, false> >::generic_wide_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/wide-int.h:847
> 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> > >::poly_int<std::pair<rtx_def*, machine_mode> >(poly_int_full, std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/poly-int.h:467
> 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false, false> > >::poly_int<std::pair<rtx_def*, machine_mode> >(std::pair<rtx_def*, machine_mode> const&)
> ../../gcc/gcc/poly-int.h:453
> 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> ../../gcc/gcc/rtl.h:2383
> 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> ../../gcc/gcc/rtx-vector-builder.h:122
> 0xfd4e1b vector_builder<rtx_def*, machine_mode, rtx_vector_builder>::elt(unsigned int) const
> ../../gcc/gcc/vector-builder.h:253
> 0xfd4d11 rtx_vector_builder::build()
> ../../gcc/gcc/rtx-vector-builder.cc:73
> 0xc21d9c const_vector_from_tree
> ../../gcc/gcc/expr.cc:13487
> 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
> ../../gcc/gcc/expr.cc:11059
> 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> ../../gcc/gcc/expr.h:310
> 0xaee682 expand_return
> ../../gcc/gcc/cfgexpand.cc:3809
> 0xaee682 expand_gimple_stmt_1
> ../../gcc/gcc/cfgexpand.cc:3918
> 0xaee682 expand_gimple_stmt
> ../../gcc/gcc/cfgexpand.cc:4044
> 0xaf28f0 expand_gimple_basic_block
> ../../gcc/gcc/cfgexpand.cc:6100
> 0xaf4996 execute
> ../../gcc/gcc/cfgexpand.cc:6835
>
> IIUC, the issue is that fold_vec_perm returns a vector having float element
> type with res_nelts_per_pattern == 3, and later ICEs when it tries
> to derive element v[3], not present in the encoding, while trying to
> build rtx vector
> in rtx_vector_builder::build():
>  for (unsigned int i = 0; i < nelts; ++i)
> RTVEC_ELT (v, i) = elt (i);
>
> The attached patch tries to fix this by returning false from
> valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> input vector has non-integral element type, so for VLA vectors, it
> will only build result with dup sequence (nelts_per_pattern < 3) for
> non-integral element type.
>
> For VLS vectors, this will still work for stepped sequence since it
> will then use the "VLS exception" in fold_vec_perm_cst, and set:
> res_npattern = res_nelts and
> res_nelts_per_pattern = 1
>
> and fold the above case to:
> F foo (F a, F b)
> {
>[local count: 1073741824]:
>   return { 0.0, 9.0e+0, 0.0, 0.0 };
> }
>
> But I am not sure if this is entirely correct, since:
> tree res = out_elts.build ();
> will canonicalize the encoding and may result in a stepped sequence
> (vector_builder::finalize() may reduce npatterns at the cost of increasing
> nelts_per_pattern)  ?
>
> PS: This issue is now latent after PR111648 fix, since
> valid_mask_for_fold_vec_perm_cst with  sel = {1, 0, 1, ...} returns
> false because the corresponding pattern in arg0 is not a natural
> stepped sequence, and folds correctly using VLS exception. However, I
> guess the underlying issue of dealing with non-integral element types
> in fold_vec_perm_cst still remains ?
>
> The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
> and on x86_64-linux-gnu.

I think the problem is instead in the way that we're calculating
res_npatterns and res_nelts_per_pattern.

If the selector is a duplication of { a1, ..., an }, then the
result will be a duplication of n elements, regardless of the shape
of the other arguments.

Similarly, if the selector is { a1, ..., an } followed by a
duplication of { b1, ..., bn }, the result will be n elements followed
by a duplication of n elements, regardless of the shape of the other
arguments.

So for these two cases, res_npatterns and res_nelts_per_pattern
can come directly from the selector's encoding.
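
A toy fixed-length illustration of the first case (plain C arrays stand in
for GCC's pattern-encoded vectors here):

#include <stdio.h>

int
main (void)
{
  int in[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  int sel[2] = { 1, 0 };        /* selector: a duplication of { 1, 0 } */
  int out[8];

  /* Every group of 2 output lanes repeats the same 2 selected elements,
     so the result is itself a duplication of 2 elements -- its shape
     follows the selector's encoding, not the inputs'.  */
  for (int i = 0; i < 8; i++)
    out[i] = in[sel[i % 2]];

  for (int i = 0; i < 8; i++)
    printf ("%d ", out[i]);     /* 11 10 11 10 11 10 11 10 */
  printf ("\n");
  return 0;
}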

If:

(1) the selector is an n-pattern stepped sequence
(2) the stepped part of each pattern selects from the same input pattern
(3) the stepped part of each pattern does not select the first element
of the input pattern, or the full input pattern is stepped
(your previous 

Re: [PATCH v2 1/3] c++: sort candidates according to viability

2023-10-24 Thread Jason Merrill

On 10/23/23 19:51, Patrick Palka wrote:

The second patch in this series is new and ensures that the candidates
list isn't mysteriously missing some candidates when noting other
candidates due to deletedness.

-- >8 --

This patch:

   * changes splice_viable to move the non-viable candidates to the end
 of the list instead of removing them outright


Please consistently spell this "non-viable" rather than "unviable".


+tourney (struct z_candidate *, tsubst_flags_t complain)
  {
struct z_candidate *champ = candidates, *challenger;
int fate;
struct z_candidate *champ_compared_to_predecessor = nullptr;
+  struct z_candidate *champ_predecessor = nullptr;
+  struct z_candidate *challenger_predecessor = champ;


Rather than adding two more variables to keep synchronized, maybe we 
want champ and challenger to be **, like the tail variables in 
splice_viable?


Jason



Re: [PATCH] c++: build_new_1 and non-dep array size [PR111929]

2023-10-24 Thread Jason Merrill

On 10/24/23 13:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
like the right approach?

-- >8 --

This PR is another instance of NON_DEPENDENT_EXPR having acted as an
"analysis barrier" for middle-end routines, and now that it's gone we
may end up passing weird templated trees (that have a generic tree code)
to the middle-end which leads to an ICE.  In the testcase below the
non-dependent array size 'var + 42' is expressed as an ordinary
PLUS_EXPR, but whose operand types have different precisions -- long and
int respectively -- naturally because templated trees encode only the
syntactic form of an expression devoid of e.g. implicit conversions
(typically).  This type incoherency triggers a wide_int assert during
the call to size_binop in build_new_1 which requires the operand types
have the same precision.

This patch fixes this by replacing our incremental folding of 'size'
within build_new_1 with a single call to cp_fully_fold (which is a no-op
in template context) once 'size' is fully built.


This is OK, but we could probably also entirely skip a lot of the 
calculation in a template, since we don't care about any values.  Can we 
skip the entire if (array_p) block?



PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Use convert, build2, build3 instead of
fold_convert, size_binop and fold_build3 when building 'size'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28.C: New test.
---
  gcc/cp/init.cc  | 9 +
  gcc/testsuite/g++.dg/template/non-dependent28.C | 6 ++
  2 files changed, 11 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent28.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index d48bb16c7c5..56c1b5e9f5e 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3261,7 +3261,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
  
-  size = size_binop (MULT_EXPR, size, fold_convert (sizetype, nelts));
+  size = build2 (MULT_EXPR, sizetype, size, convert (sizetype, nelts));
  
if (TREE_CODE (cst_outer_nelts) == INTEGER_CST)
{
@@ -3344,7 +3344,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
/* Use a class-specific operator new.  */
/* If a cookie is required, add some extra space.  */
if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
-   size = size_binop (PLUS_EXPR, size, cookie_size);
+   size = build2 (PLUS_EXPR, sizetype, size, cookie_size);
else
{
  cookie_size = NULL_TREE;
@@ -3358,8 +3358,8 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
if (cxx_dialect >= cxx11 && flag_exceptions)
errval = throw_bad_array_new_length ();
if (outer_nelts_check != NULL_TREE)
-   size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
-   size, errval);
+   size = build3 (COND_EXPR, sizetype, outer_nelts_check, size, errval);
+  size = cp_fully_fold (size);
/* Create the argument list.  */
vec_safe_insert (*placement, 0, size);
/* Do name-lookup to find the appropriate operator.  */
@@ -3418,6 +3418,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
/* If size is zero e.g. due to type having zero size, try to
 preserve outer_nelts for constant expression evaluation
 purposes.  */
+  size = cp_fully_fold (size);
if (integer_zerop (size) && outer_nelts)
size = build2 (MULT_EXPR, TREE_TYPE (size), size, outer_nelts);
  
diff --git a/gcc/testsuite/g++.dg/template/non-dependent28.C b/gcc/testsuite/g++.dg/template/non-dependent28.C
new file mode 100644
index 000..3e45154f61d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent28.C
@@ -0,0 +1,6 @@
+// PR c++/111929
+
+template<class T>
+void f(long var) {
+  new int[var + 42];
+}




Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Siddhesh Poyarekar

On 2023-10-24 16:38, Martin Uecker wrote:

Here is another proposal:  Add a new builtin function

__builtin_with_size(x, size)

that returns x but behaves similarly to an allocation
function in that BDOS can look at the size argument
to discover the size.

The FE inserts this function when the field is accessed:

__builtin_with_size(x.buf, x.L);



In fact if we do this at the allocation site for x, it may also help 
with future warnings, where the compiler could flag a warning or error 
when it encounters this builtin but does not see an assignment to x.L.
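
A rough sketch of how that could look from the source side (the builtin is
only proposed at this point, so a pass-through macro stands in for it here;
the struct and field names follow the x.buf/x.L example above):

#include <stdlib.h>

/* Stand-in for the proposed builtin: semantically it just returns its
   first argument; the real builtin would additionally let __bdos derive
   the object size from the second argument.  */
#define __builtin_with_size(p, size) (p)

struct X { size_t L; char buf[]; };

char *
access_buf (struct X *x)
{
  /* What the FE would emit for an access to x->buf: the use of x->L is
     now explicit, so a store to x->L can be neither reordered past this
     point nor dead-store-eliminated.  */
  return __builtin_with_size (x->buf, x->L);
}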


Thanks,
Sid


Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Siddhesh Poyarekar

On 2023-10-24 16:30, Qing Zhao wrote:

Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is not in the same routine as the instantiation 
of the object. As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call.



But __bos/__bdos are barely useful without optimization; you need a 
minimum of -O1.  You're right that if the call is never inlined then we 
don't care because the __bdos call does not get expanded to obj->size.


However, the point of situation 2 is that the TYPE info cannot be used 
by the __bdos call *only for a while* (i.e. until the call gets inlined) 
and that window is an opportunity for the reordering/DSE to break things.


Thanks.
Sid


Re: [PATCH] c++: error with bit-fields and scoped enums [PR111895]

2023-10-24 Thread Marek Polacek
On Tue, Oct 24, 2023 at 04:46:02PM -0400, Jason Merrill wrote:
> On 10/24/23 12:18, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Here we issue a bogus error: invalid operands of types 'unsigned char:2'
> > and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
> > type to bool.
> > 
> > In build_static_cast_1, perform_direct_initialization_if_possible returns
> > NULL_TREE, because the invented declaration T t(e) fails, which is
> > correct.  So we go down to ocp_convert, which has code to deal with this
> > case:
> >/* We can't implicitly convert a scoped enum to bool, so convert
> >   to the underlying type first.  */
> >if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
> >  e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
> > but the SCOPED_ENUM_P is false since intype is <unnamed-unsigned:2>.
> > This could be fixed by using unlowered_expr_type.  But then
> > c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
> > unlowered_expr_type is a C++-only function.
> > 
> > Rather than adding a dummy unlowered_expr_type to C, I think we should
> > follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
> > to the bit-field and the resulting prvalue is used as the operand of the
> > static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
> > will get us an expression whose type is the enum.  (I thought we didn't
> > need decay_conversion because that does a whole lot more but using it
> > would make sense to me too.)
> 
> It's possible that we might want some of that more, particularly
> mark_rvalue_use; decay_conversion seems like the right answer.  OK with that
> change.

Makes total sense, thank you.  (I'd tested the version with decay_conversion
and it worked fine.)
 
> rvalue() would also make sense, though that seems to be missing a call to
> unlowered_expr_type at the moment.  In fact, after "otherwise, it's the
> lvalue-to-rvalue conversion" in decay_conv should probably just be a call to
> rvalue, with missing bits added to the latter function.

Sounds good; I hope I'll get to it next week.  I'm not going to make it
part of this patch so that I can backport this one to 13 and leave the
cleanup for trunk only.

Marek



Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Vineet Gupta

On 10/24/23 13:36, rep.dot@gmail.com wrote:

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?

The function below looks identical in v12 of the patch.
Why didn't you use common subexpressions?
ba

Using CSE here breaks aarch64 regressions, hence I have reverted it back 
to not use CSE.

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along how you configured that failing (cross-?)compiler and the 
command-line of a typical testcase that broke when manually CSEing the function 
below?


I was meaning to ask this before, but what exactly is the CSE issue, 
manually or whatever.


-Vineet


Re: [PATCH] c++: error with bit-fields and scoped enums [PR111895]

2023-10-24 Thread Jason Merrill

On 10/24/23 12:18, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we issue a bogus error: invalid operands of types 'unsigned char:2'
and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
type to bool.

In build_static_cast_1, perform_direct_initialization_if_possible returns
NULL_TREE, because the invented declaration T t(e) fails, which is
correct.  So we go down to ocp_convert, which has code to deal with this
case:
   /* We can't implicitly convert a scoped enum to bool, so convert
  to the underlying type first.  */
   if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
 e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
but the SCOPED_ENUM_P is false since intype is <unnamed-unsigned:2>.
This could be fixed by using unlowered_expr_type.  But then
c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
unlowered_expr_type is a C++-only function.

Rather than adding a dummy unlowered_expr_type to C, I think we should
follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
to the bit-field and the resulting prvalue is used as the operand of the
static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
will get us an expression whose type is the enum.  (I thought we didn't
need decay_conversion because that does a whole lot more but using it
would make sense to me too.)


It's possible that we might want some of that more, particularly 
mark_rvalue_use; decay_conversion seems like the right answer.  OK with 
that change.


rvalue() would also make sense, though that seems to be missing a call 
to unlowered_expr_type at the moment.  In fact, after "otherwise, it's 
the lvalue-to-rvalue conversion" in decay_conv should probably just be a 
call to rvalue, with missing bits added to the latter function.


Jason



Re: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread Harald Anlauf

Dear all,

Tobias argued in the PR that the testcase should actually be valid.
Therefore withdrawing the patch.

Sorry for expecting this to be a low-hanging fruit...

Harald

On 10/24/23 22:23, rep.dot@gmail.com wrote:

On 24 October 2023 21:25:01 CEST, Harald Anlauf  wrote:

Dear all,

the attached simple patch adds a forgotten check that an event handle
cannot be a coarray.  This case appears to have been overlooked in the
original fix for this PR.

I intend to commit as obvious within 24h unless there are comments.


diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
gfc_error ("The event handle at %L must not be an array element",
   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+   gfc_error ("The event handle at %L must not be a coarray",

ISTM that we usually do not mention "element" when talking about undue 
(co)array access.

Maybe we want to streamline this specific error message?

LGTM otherwise.
Thanks for your dedication!


+  &omp_clauses->detach->where);
else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
gfc_error ("The event handle at %L must not be part of "






Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Martin Uecker
Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
> Hi, Sid,
> 
> Really appreciate your example and detailed explanation. Very helpful.
> I think that this example is an excellent example to show (almost) all the 
> issues we need to consider.
> 
> I slightly modified this example to make it compilable and runnable, as 
> follows: 
> (but I still cannot make the incorrect reordering or DSE happening, anyway, 
> the potential reordering possibility is there…)
> 
>   1 #include 
>   2 struct A
>   3 {
>   4  size_t size;
>   5  char buf[] __attribute__((counted_by(size)));
>   6 };
>   7 
>   8 static size_t
>   9 get_size_from (void *ptr)
>  10 {
>  11  return __builtin_dynamic_object_size (ptr, 1);
>  12 }
>  13 
>  14 void
>  15 foo (size_t sz)
>  16 {
>  17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>  18  obj->size = sz;
>  19  obj->buf[0] = 2;
>  20  __builtin_printf ("%d\n", get_size_from (obj->buf));
>  21  return;
>  22 }
>  23 
>  24 int main ()
>  25 {
>  26  foo (20);
>  27  return 0;
>  28 }
> 
> With my GCC, it was compiled and worked:
> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> 20
> Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
> “foo”, therefore, the call to __bdos is in the same routine as the 
> instantiation of the object, and the TYPE information and the attached 
> counted_by attribute information in the TYPE of the object can be USED by the 
> __bdos call to compute the final object size. 
> 
> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> -1
> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
> therefore, the call to __bdos is not in the same routine as the instantiation 
> of the object. As a result, the TYPE info and the attached counted_by info of 
> the object can NOT be USED by the __bdos call. 
> 
> Keep in mind the above 2 situations; we will refer to them below:
> 
> 1. First,  the problem we are trying to resolve is:
> 
> (Your description):
> 
> >  the reordering of __bdos w.r.t. initialization of the size parameter but 
> > to also account for DSE of the assignment, we can abstract this problem to 
> > that of DFA being unable to see implicit use of the size parameter in the 
> > __bdos call.
> 
> basically is correct.  However, with the following exception:
> 
> The implicit use of the size parameter in the __bdos call is not always 
> there; it ONLY exists WHEN the __bdos can be evaluated to an expression 
> of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the 
> above example. 
>  In the “Situation 2”, when the __bdos does not see the TYPE of the real 
> object,  it does not see the counted_by information from the TYPE, therefore, 
>  it is not able to evaluate the size of the object through the counted_by 
> information.  As a result, the implicit use of the size parameter in the 
> __bdos call does NOT exist at all.  The optimizer can freely reorder the 
> initialization of the size parameter with the __bdos call since there is no 
> data flow dependency between these two. 
> 
> With this exception in mind, we can see that your proposed “option 2” (making 
> the type of size “volatile”) is too conservative; it will disable many 
> optimizations unnecessarily, even though it’s safe and simple to implement. 
> 
> As a compiler optimization person for many many years, I really don’t want to 
> take this approach at this moment.  -:)
> 
> 2. Some facts I’d like to mention:
> 
> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
> optimization stage. During RTL stage,  the __bdos call has already been 
> replaced by an expression of the size parameter or a constant, the data 
> dependency is explicitly in the IR already.  I believe that the data analysis 
> in RTL stage should pick up the data dependency correctly, No special 
> handling is needed in RTL.
> 
> B. If the __bdos call cannot see the real object, it has no way to get the 
> “counted_by” field from the TYPE of the real object. So, if we try to add the 
> implicit use of the “counted_by” field to the __bdos call, the object 
> instantiation should be in the same routine as the __bdos call.  Both the FE 
> and the gimplification phase are too early to do this work. 
> 
> 3. Then, what’s the best approach to resolve this problem:
> 
> There were several suggestions so far:
> 
> A.  Add an additional argument, the size parameter,  to __bdos, 
>   A.1, during FE;
>   A.2, during gimplification phase;
> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> C.  Encode the implicit USE  in the type of buf, then update the optimization 
> passes to use this implicit USE encoded in the type of buf.
> 
> As I explained in the above, 
> ** Approach A (both A.1 and A.2) does not work;
> ** 

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread rep . dot . nop
On 24 October 2023 09:36:22 CEST, Ajit Agarwal  wrote:
>Hello Bernhard:
>
>On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote:
>> On Mon, 23 Oct 2023 12:16:18 +0530
>> Ajit Agarwal  wrote:
>> 
>>> Hello All:
>>>
>>> Addressed below review comments in the version 11 of the patch.
>>> Please review and please let me know if its ok for trunk.
>> 
>> s/satisified/satisfied/
>> 
>
>I will fix this.

thanks!

>
 As said, I don't see why the below was not cleaned up before the V1 
 submission.
 Iff it breaks when manually CSEing, I'm curious why?
>> 
>> The function below looks identical in v12 of the patch.
>> Why didn't you use common subexpressions?
>> ba
>
>Using CSE here breaks aarch64 regressions, hence I have reverted it back 
>to not use CSE.

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along how you configured that failing (cross-?)compiler and the 
command-line of a typical testcase that broke when manually CSEing the function 
below?

I might have not completely understood the subtile intricacies of RTL 
re-entrancy, it seems?

thanks

   
>> +/* Return TRUE if reg source operand of zero_extend is argument 
>> registers
>> +   and not return registers and source and destination operand are same
>> +   and mode of source and destination operand are not same.  */
>> +
>> +static bool
>> +abi_extension_candidate_p (rtx_insn *insn)
>> +{
>> +  rtx set = single_set (insn);
>> +  machine_mode dst_mode = GET_MODE (SET_DEST (set));
>> +  rtx orig_src = XEXP (SET_SRC (set), 0);
>> +
>> +  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
>> +  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO 
>> (orig_src)))  
>> +return false;
>> +
>> +  /* Mode of destination and source should be different.  */
>> +  if (dst_mode == GET_MODE (orig_src))
>> +return false;
>> +
>> +  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
>> +  bool promote_p = abi_target_promote_function_mode (mode);
>> +
>> +  /* REGNO of source and destination should be same if not
>> +  promoted.  */
>> +  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
>> +return false;
>> +
>> +  return true;
>> +}
>> +  
>> 
>> 

 As said, please also rephrase the above (and everything else if it 
 obviously looks akin the above).
>> 
>> thanks



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-24 Thread Qing Zhao
Hi, Sid,

Really appreciate your example and detailed explanation. Very helpful.
I think that this example is an excellent example to show (almost) all the 
issues we need to consider.

I slightly modified this example to make it compilable and runnable, as 
follows: 
(but I still cannot make the incorrect reordering or DSE happening, anyway, the 
potential reordering possibility is there…)

  1 #include 
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7 
  8 static size_t
  9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf ("%d\n", get_size_from (obj->buf));
 21  return;
 22 }
 23 
 24 int main ()
 25 {
 26  foo (20);
 27  return 0;
 28 }

With my GCC, it was compiled and worked:
[opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
20
Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
“foo”, therefore, the call to __bdos is in the same routine as the 
instantiation of the object, and the TYPE information and the attached 
counted_by attribute information in the TYPE of the object can be USED by the 
__bdos call to compute the final object size. 

[opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
-1
Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is not in the same routine as the instantiation 
of the object. As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call. 

Keep in mind the above 2 situations; we will refer to them below:

1. First,  the problem we are trying to resolve is:

(Your description):

>  the reordering of __bdos w.r.t. initialization of the size parameter but to 
> also account for DSE of the assignment, we can abstract this problem to that 
> of DFA being unable to see implicit use of the size parameter in the __bdos 
> call.

basically is correct.  However, with the following exception:

The implicit use of the size parameter in the __bdos call is not always there; 
it ONLY exists WHEN the __bdos can be evaluated to an expression of the 
size parameter in the “objsz” phase, i.e., the “Situation 1” of the above 
example. 
 In the “Situation 2”, when the __bdos does not see the TYPE of the real 
object,  it does not see the counted_by information from the TYPE, therefore,  
it is not able to evaluate the size of the object through the counted_by 
information.  As a result, the implicit use of the size parameter in the __bdos 
call does NOT exist at all.  The optimizer can freely reorder the 
initialization of the size parameter with the __bdos call since there is no 
data flow dependency between these two. 

With this exception in mind, we can see that your proposed “option 2” (making 
the type of size “volatile”) is too conservative; it will disable many 
optimizations unnecessarily, even though it’s safe and simple to implement. 

As a compiler optimization person for many many years, I really don’t want to 
take this approach at this moment.  -:)

2. Some facts I’d like to mention:

A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
optimization stage. During RTL stage,  the __bdos call has already been 
replaced by an expression of the size parameter or a constant, the data 
dependency is explicitly in the IR already.  I believe that the data analysis 
in RTL stage should pick up the data dependency correctly, No special handling 
is needed in RTL.

B. If the __bdos call cannot see the real object, it has no way to get the 
“counted_by” field from the TYPE of the real object. So, if we try to add the 
implicit use of the “counted_by” field to the __bdos call, the object 
instantiation should be in the same routine as the __bdos call.  Both the FE 
and the gimplification phase are too early to do this work. 

3. Then, what’s the best approach to resolve this problem:

There were several suggestions so far:

A.  Add an additional argument, the size parameter,  to __bdos, 
  A.1, during FE;
  A.2, during gimplification phase;
B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
C.  Encode the implicit USE  in the type of buf, then update the optimization 
passes to use this implicit USE encoded in the type of buf.

As I explained in the above, 
** Approach A (both A.1 and A.2) does not work;
** Approach B will have big performance impact, I’d prefer not to take this 
approach at this moment.
** Approach C will be a lot of change in GCC, and also not very necessary since 
the ONLY implicit use of the size parameter is in the __bdos call when __bdos 
can see the real object.

So, all the above 

Re: [PATCH] libstdc++ Add cstdarg to freestanding

2023-10-24 Thread Jonathan Wakely
On Sun, 22 Oct 2023 at 21:06, Arsen Arsenović  wrote:

>
> "Paul M. Bendixen"  writes:
>
> > Updated patch, added the requested files, hopefully wrote the commit
> better.
>
> LGTM.  Jonathan?
>

Yup, looks good. I've pushed it to trunk with a tweaked changelog entry.
I'll backport it to gcc-13 soon too.

Thanks, Paul!


Re: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread rep . dot . nop
On 24 October 2023 21:25:01 CEST, Harald Anlauf  wrote:
>Dear all,
>
>the attached simple patch adds a forgotten check that an event handle
>cannot be a coarray.  This case appears to have been overlooked in the
>original fix for this PR.
>
>I intend to commit as obvious within 24h unless there are comments.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
   else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
gfc_error ("The event handle at %L must not be an array element",
   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+   gfc_error ("The event handle at %L must not be a coarray",

ISTM that we usually do not mention "element" when talking about undue 
(co)array access.

Maybe we want to streamline this specific error message?

LGTM otherwise.
Thanks for your dedication!


+  &omp_clauses->detach->where);
   else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
gfc_error ("The event handle at %L must not be part of "



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Robin Dapp
Changed as suggested.  The difference to v5 is thus:

+ if (cond_fn_p)
+   {
+ gcall *call = dyn_cast <gcall *> (use_stmt);
+ unsigned else_pos
+   = internal_fn_else_index (internal_fn (op.code));
+
+ for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
+   {
+ if (j == else_pos)
+   continue;
+ if (gimple_call_arg (call, j) == op.ops[opi])
+   cnt++;
+   }
+   }
+ else if (!is_gimple_debug (op_use_stmt)

as well as internal_fn_else_index.
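
For reference, a minimal source-level example of the kind of conditional
reduction this targets (a hand-written illustration, not a testcase taken
from the patch):

/* With the patch, ifcvt can express the if-converted loop body directly
   as res = .COND_ADD (mask, res, a[i], res) instead of a VEC_COND_EXPR
   feeding a separate PLUS, which the vectorizer can then turn into a
   single masked (or length-masked) reduction.  */
double
cond_reduc (const double *a, const double *b, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i++)
    if (b[i] < 0.0)
      res += a[i];
  return res;
}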

Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
aarch64 and x86 still running.

Regards
 Robin

From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 gcc/internal-fn.cc|  58 ++
 gcc/internal-fn.h |   1 +
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 +++--
 gcc/tree-vect-loop.cc | 193 ++
 gcc/tree-vectorizer.h |   2 +-
 9 files changed, 536 insertions(+), 55 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..018175261b9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn)
 }
 }
 
+int
+internal_fn_else_index (internal_fn fn)
+{
+  switch (fn)
+{
+case IFN_COND_NEG:
+case IFN_COND_NOT:
+case IFN_COND_LEN_NEG:
+case IFN_COND_LEN_NOT:
+  return 2;
+
+case IFN_COND_ADD:
+case IFN_COND_SUB:
+case IFN_COND_MUL:
+case IFN_COND_DIV:
+case IFN_COND_MOD:
+case IFN_COND_MIN:
+case IFN_COND_MAX:
+case IFN_COND_FMIN:
+case IFN_COND_FMAX:
+case IFN_COND_AND:
+case IFN_COND_IOR:
+case IFN_COND_XOR:
+case IFN_COND_SHL:
+case IFN_COND_SHR:
+case IFN_COND_LEN_ADD:
+case IFN_COND_LEN_SUB:
+case IFN_COND_LEN_MUL:
+case IFN_COND_LEN_DIV:
+case IFN_COND_LEN_MOD:
+case IFN_COND_LEN_MIN:
+case IFN_COND_LEN_MAX:
+case IFN_COND_LEN_FMIN:
+case IFN_COND_LEN_FMAX:
+case IFN_COND_LEN_AND:
+case IFN_COND_LEN_IOR:
+case IFN_COND_LEN_XOR:
+case IFN_COND_LEN_SHL:
+case IFN_COND_LEN_SHR:
+  return 3;
+
+case IFN_COND_FMA:
+case IFN_COND_FMS:
+case IFN_COND_FNMA:
+case IFN_COND_FNMS:
+case IFN_COND_LEN_FMA:
+case IFN_COND_LEN_FMS:
+case IFN_COND_LEN_FNMA:
+case IFN_COND_LEN_FNMS:
+  return 4;
+
+default:
+  return -1;
+}
+
+  return -1;
+}
+
 /* If FN takes a vector mask argument, return the index of that argument,
otherwise return -1.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..7d72f4db2d0 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ 

[PATCH] Fortran/OpenMP: event handle in task detach cannot be a coarray [PR104131]

2023-10-24 Thread Harald Anlauf
Dear all,

the attached simple patch adds a forgotten check that an event handle
cannot be a coarray.  This case appears to have been overlooked in the
original fix for this PR.

I intend to commit as obvious within 24h unless there are comments.

Thanks,
Harald

From 2b5ed32cacfe84dc4df74b4dccf16ac830d9eb98 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 24 Oct 2023 21:18:02 +0200
Subject: [PATCH] Fortran/OpenMP: event handle in task detach cannot be a
 coarray [PR104131]

gcc/fortran/ChangeLog:

	PR fortran/104131
	* openmp.cc (resolve_omp_clauses): Add check that event handle is
	not a coarray.

gcc/testsuite/ChangeLog:

	PR fortran/104131
	* gfortran.dg/gomp/pr104131-2.f90: New test.
---
 gcc/fortran/openmp.cc |  3 +++
 gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90 | 12 
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1cc65d7fa49..08081dacde4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8967,6 +8967,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   else if (omp_clauses->detach->symtree->n.sym->attr.dimension > 0)
 	gfc_error ("The event handle at %L must not be an array element",
 		   &omp_clauses->detach->where);
+  else if (omp_clauses->detach->symtree->n.sym->attr.codimension)
+	gfc_error ("The event handle at %L must not be a coarray",
+		   &omp_clauses->detach->where);
   else if (omp_clauses->detach->symtree->n.sym->ts.type == BT_DERIVED
 	   || omp_clauses->detach->symtree->n.sym->ts.type == BT_CLASS)
 	gfc_error ("The event handle at %L must not be part of "
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90 b/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90
new file mode 100644
index 000..3978a6ac31a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/pr104131-2.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fopenmp -fcoarray=single" }
+! PR fortran/104131 - event handle cannot be a coarray
+
+program p
+  use iso_c_binding, only: c_intptr_t
+  implicit none
+  integer, parameter :: omp_event_handle_kind = c_intptr_t
+  integer (kind=omp_event_handle_kind) :: x[*]
+!$omp task detach (x) ! { dg-error "The event handle at \\\(1\\\) must not be a coarray" }
+!$omp end task
+end
--
2.35.3



Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Iain Sandoe



> On 24 Oct 2023, at 20:03, Marek Polacek  wrote:
> 
> On Tue, Oct 24, 2023 at 10:34:22AM +0100, Iain Sandoe wrote:
>> hi Marek,
>> 
>>> On 24 Oct 2023, at 08:44, Iain Sandoe  wrote:
>>> On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
 
 On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>> 
>> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
>>> On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
 On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
  wrote:
 
>>> 
 and I tried Darwin (104) and that fails with
 
 *** Configuration aarch64-apple-darwin21.6.0 not supported
 
 Is anyone else able to build gcc on those machines, or test the attached
 patch?
>>> 
>>> We’re still working on upstreaming the aarch64 Darwin port - the devt. 
>>> branch
>>> is here; https://github.com/iains/gcc-darwin-arm64 (but it will be rebased 
>>> soon
>>> because we just upstreamed some dependencies).
>>> 
>>> In the meantime, I will put your patch into my test queue - hopefully before
>>> next week.
>> 
>> actually, I rebased already .. (but not pushed yet, pending testing).
>> 
>> aarch64-darwin21 bootstrapped fine with your patch (as did x86_64-darwin19)
> 
> Thank you so much Iain!
> 
>> ===
>> 
>> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
>> -o hc -fhardened -Whardened
>> cc1: warning: ‘_FORTIFY_SOURCE’ is not enabled by ‘-fhardened’ because 
>> optimizations are turned off [-Whardened]
>> 
>> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
>> -o hc -fhardened -Whardened -O
>> 
> 
> That looks correct.
> 
>> I’m about to run the testsuite, but if there’s something else to be tested 
>> please let me know (NOTE: I have not read the patch, just applied it and 
>> built).
> 
> I am mostly curious about the fhardened* tests, if they all pass.

No, some that require __PIE__=2 fail.

That is because Darwin has to handle PIE and PIC locally, because the way in 
which those options interact is different from Linux.  I need to amend Darwin’s 
handling to work together with fhardening on platform versions for which that’s 
relevant (but do not expect that to be too tricky).

For aarch64-darwin, PIE is mandatory so we had not even been considering it [we 
basically ignore the flag, because all it does is to create tool warnings] (I 
need to fix the output of the pp define tho).

For all x86_64, Darwin >= 20 warns about no-PIE, so we are also defaulting it on 
there.

Of course, none of this should affect these tests (it just means that 
fhardening will be a NOP for PIE on later Darwin).

I’ll look into these changes over the next few days if I have a chance; in any 
case, they do not need to be relevant to your patch.

Iain

> 
> Thanks,
> Marek



Re: [PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jose E. Marchesi


> On 10/24/23 06:41, Jose E. Marchesi wrote:
>> The length of gcov records is stored as a signed 32-bit number of
>> bytes.
>> Ok?
> OK.

Pushed.  Thanks.


Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Marek Polacek
On Tue, Oct 24, 2023 at 09:22:25AM +0200, Richard Biener wrote:
> On Mon, Oct 23, 2023 at 9:26 PM Marek Polacek  wrote:
> >
> > On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> > > On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
> > > >
> > > > On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > > > > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> > > > > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, 
> > > > > > > powerpc64le-unknown-linux-gnu,
> > > > > > > and aarch64-unknown-linux-gnu; ok for trunk?
> > > > > > >
> > > > > > > -- >8 --
> > > > > > > In 
> > > > > > > 
> > > > > > > I proposed -fhardened, a new umbrella option that enables a 
> > > > > > > reasonable set
> > > > > > > of hardening flags.  The read of the room seems to be that the 
> > > > > > > option
> > > > > > > would be useful.  So here's a patch implementing that option.
> > > > > > >
> > > > > > > Currently, -fhardened enables:
> > > > > > >
> > > > > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > > > > >   -D_GLIBCXX_ASSERTIONS
> > > > > > >   -ftrivial-auto-var-init=pattern
> > >
> > > I think =zero is much better here given the overhead is way
> > > cheaper and pointers get a more reliable behavior.
> >
> > Ok, changed now.
> >
> > > > > > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > > > > > >   -fstack-protector-strong
> > > > > > >   -fstack-clash-protection
> > > > > > >   -fcf-protection=full (x86 GNU/Linux only)
> > > > > > >
> > > > > > > -fhardened will not override options that were specified on the 
> > > > > > > command line
> > > > > > > (before or after -fhardened).  For example,
> > > > > > >
> > > > > > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > > > > >
> > > > > > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > > > > >
> > > > > > >   -fhardened -fstack-protector
> > > > > > >
> > > > > > > will not enable -fstack-protector-strong.
> > > > > > >
> > > > > > > In DW_AT_producer it is reflected only as -fhardened; it doesn't 
> > > > > > > expand
> > > > > > > to anything.  I think we need a better way to show what it 
> > > > > > > actually
> > > > > > > enables.
> > > > > >
> > > > > > I do think we need to find a solution here to solve asserting 
> > > > > > compliance.
> > > > >
> > > > > Fair enough.
> > > > >
> > > > > > Maybe we can have -Whardened that will diagnose any altering of
> > > > > > -fhardened by other options on the command-line or by missed target
> > > > > > implementations?  People might for example use -fstack-protector
> > > > > > but don't really want to make protection lower than requested with 
> > > > > > -fhardened.
> > > > > >
> > > > > > Any such conflict is much less appearant than when you use the
> > > > > > flags -fhardened composes.
> > > > >
> > > > > How about: --help=hardened says which options -fhardened attempts to
> > > > > enable, and -Whardened warns when it didn't enable an option?  E.g.,
> > > > >
> > > > >   -fstack-protector -fhardened -Whardened
> > > > >
> > > > > would say that it didn't enable -fstack-protector-strong because
> > > > > -fstack-protector was specified on the command line?
> > > > >
> > > > > If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
> > > > > list -z now, likewise for -z relro.
> > > > >
> > > > > Unclear if -Whardened should be enabled by default, but probably yes?
> > > >
> > > > Here's v2 which adds -Whardened (enabled by default).
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > >
> > > I think it's OK but I'd like to see a second ACK here.
> >
> > Thanks!
> >
> > > Can you see how our
> > > primary and secondary targets (+ host OS) behave here?
> >
> > That's very reasonable.  I tried to build gcc on Compile Farm 119 (AIX) but
> > that fails with:
> >
> > ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
> > ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
> > make[2]: *** [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: all] 
> > Error 1
> > make[2]: Leaving directory 
> > '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'
> >
> > and I tried Darwin (104) and that fails with
> >
> > *** Configuration aarch64-apple-darwin21.6.0 not supported
> >
> > Is anyone else able to build gcc on those machines, or test the attached
> > patch?
> >
> > > I think the
> > > documentation should elaborate a bit on expectations for non-Linux/GNU
> > > targets, specifically I think the default configuration for a target 
> > > should
> > > with -fhardened _not_ have any -Whardened diagnostics.  Maybe we can
> > > have a testcase for this?
> >
> > Sorry, I'm not sure how to test that.  I suppose if -fhardened enables
> > something not supported on those systems, and it's something for which
> > we have a configure test, then we 

Re: [PATCH v3] gcc: Introduce -fhardened

2023-10-24 Thread Marek Polacek
On Tue, Oct 24, 2023 at 10:34:22AM +0100, Iain Sandoe wrote:
> hi Marek,
> 
> > On 24 Oct 2023, at 08:44, Iain Sandoe  wrote:
> > On 23 Oct 2023, at 20:25, Marek Polacek  wrote:
> >> 
> >> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> >>> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>  
>  On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> >> On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> >>  wrote:
> >> 
> > 
> >> and I tried Darwin (104) and that fails with
> >> 
> >> *** Configuration aarch64-apple-darwin21.6.0 not supported
> >> 
> >> Is anyone else able to build gcc on those machines, or test the attached
> >> patch?
> > 
> > We’re still working on upstreaming the aarch64 Darwin port - the devt. 
> > branch
> > is here; https://github.com/iains/gcc-darwin-arm64 (but it will be rebased 
> > soon
> > because we just upstreamed some dependencies).
> > 
> > In the meantime, I will put your patch into my test queue - hopefully before
> > next week.
> 
> actually, I rebased already .. (but not pushed yet, pending testing).
> 
> aarch64-darwin21 bootstrapped fine with your patch (as did x86_64-darwin19)

Thank you so much Iain!
 
> ===
> 
> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
> -o hc -fhardened -Whardened
> cc1: warning: ‘_FORTIFY_SOURCE’ is not enabled by ‘-fhardened’ because 
> optimizations are turned off [-Whardened]
> 
> $ /opt/iains/aarch64-apple-darwin21/gcc-14-0-0/bin/gcc /source/test/hello.c 
> -o hc -fhardened -Whardened -O
> 

That looks correct.
 
> I’m about to run the testsuite, but if there’s something else to be tested 
> please let me know (NOTE: I have not read the patch, just applied it and 
> built).

I am mostly curious about whether the fhardened* tests all pass.

Thanks,
Marek



[PATCH] Add a late-combine pass [PR106594]

2023-10-24 Thread Richard Sandiford
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
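
For example (a made-up RTL sketch rather than pass output), given a
definition whose only use follows it:

  (set (reg 98) (mult:SI (reg 99) (reg 100)))
  (set (reg 101) (plus:SI (reg 98) (reg 102)))

the pass tries to substitute the definition into the use and delete it:

  (set (reg 101) (plus:SI (mult:SI (reg 99) (reg 100)) (reg 102)))

assuming the target has a suitable multiply-add pattern and the costs
work out.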

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
            (const_vector:V4SI [
                (const_int 0 [0]) repeated x4
            ])
            (const_int 1 [0x1]))) -1
     (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
        (subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
     (expr_list:REG_DEAD (reg:SI 120)
        (nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
        (reg:SI 118)) -1
     (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
        (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))  {*sqrtdf2_sse}
     (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
        (const_vector:V4SF [
            (const_double:SF 0.0 [0x0.0p+0]) repeated x4
        ])) -1
     (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
        (vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))
            (subreg:V2DF (reg:V4SF 108) 0)
            (const_int 1 [0x1]))) -1
     (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
        (subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
     (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

Richard


gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc: Enable it by default
at -O2 and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   5 +
 gcc/common/config/aarch64/aarch64-common.cc   | 

Re: [PATCH 3/3] rtl-ssa: Add new helper functions

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

This patch adds some RTL-SSA helper functions.  They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc.  I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.
I assumed that you'll end up with more code in there, so I'm certainly 
OK with having just one function in the file right now.


OK for the trunk.

jeff


Re: [PATCH 2/3] rtl-ssa: Extend make_uses_available

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.

OK
jeff


Re: [PATCH 1/3] rtl-ssa: Use frequency-weighted insn costs

2023-10-24 Thread Jeff Law




On 10/24/23 11:58, Richard Sandiford wrote:

rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall.  But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency.  (We already do something similar for SLP layouts.)

gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.

Agreed that it seems better.  OK.

Jeff


Re: [PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Vineet Gupta




On 10/24/23 10:03, Ajit Agarwal wrote:

Hello Vineet, Jeff and Bernhard:

This version 14 of the patch uses ABI interfaces to eliminate zero and sign
extensions.
This fixes aarch64 regression failures seen with aggressive CSE.


Once again, this information belongs between the two "---" lines that you
added for v6 and stopped updating.


And it seems the only code difference between v13 and v14 is

-  return tgt_mode == mode;
+  if (tgt_mode == mode)
+    return true;
+  else
+    return false;

How does that make any difference?

-Vineet



Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 14) of the patch, the following review comments are
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fix aarch64 regression failures.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see zero and sign extensions with missing
definitions.  The pass is improved to eliminate such zero and sign
extensions using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

 * ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
 using defined abi interfaces.
 (add_removable_extension): Use of defined abi interfaces for no
 reaching defs.
 (abi_extension_candidate_return_reg_p): New function.
 (abi_extension_candidate_p): New function.
 (abi_extension_candidate_argno_p): New function.
 (abi_handle_regs): New function.
 (abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C: New test.
---
changes since v6:
   - Added missing abi interfaces.
   - Rearranging and restructuring the code.
   - Removal of hard coded zero extend and sign extend in abi interfaces.
   - Relaxed different registers with source and destination in abi interfaces.
   - Using CSE in abi interfaces.
   - Fix aarch64 regressions.
   - Add Sign extension removal in abi interfaces.
   - Modified comments as per coding convention.
   - Modified code as per coding convention.
   - Fix RISC-V bootstrap failures.
---
  gcc/ree.cc| 147 +-
  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
  2 files changed, 154 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..f557b49b366 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
break;
  
-  gcc_assert (def != NULL);
+  if (def == NULL)
+    return NULL;
  
ref_chain = DF_REF_CHAIN (def);
  
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)

return src;
  }
  
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+    = targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
+  a) The source operand is an argument register and not a return register.
+  b) The modes of the source and destination operands are different.
+  c) If not promoted, the REGNOs of the source and destination are the same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+/* Return 

Re: [PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-24 Thread Jeff Law




On 10/24/23 09:26, Stefan Schulze Frielinghaus wrote:

Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not 
supported on this target
   237 | _BitInt(32) b32_v;
   | ^~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

Tested on s390x and x86_64.  Ok for mainline?

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.

OK
jeff


Re: [PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jeff Law




On 10/24/23 06:41, Jose E. Marchesi wrote:


The length of gcov records is stored as a signed 32-bit number of bytes.
Ok?

OK.
jeff


Re: [PATCH] recog/reload: Remove old UNARY_P operand support

2023-10-24 Thread Jeff Law




On 10/24/23 04:14, Richard Sandiford wrote:

reload and constrain_operands had some old code to look through unary
operators.  E.g. an operand could be (sign_extend (reg X)), and the
constraints would match the reg rather than the sign_extend.
This was previously used by the MIPS port.  But relying on it was a
recurring source of problems, so Eric and I removed it in the MIPS
rewrite from ~20 years back.  I don't know of any other port that used it.
I can't remember if other ports used this or not.  The most likely 
scenario would be a port from the mid/late 90s that started as a 32bit 
port and was extended to a 64bit port and has similar sign extension 
properties as MIPS.



PPC, sparc and s390 come immediately to mind.  I just checked their 
predicates.md files and they don't seem to have a predicate which would 
trigger this old code, even if they were reload targets.




Also, the constraints processing in LRA and IRA do not have direct
support for these embedded operators, so I think it was only ever a
reload-specific feature (and probably only a global/local+reload-specific
feature, rather than IRA+reload).
It was definitely specific to the old register allocator+reload 
implementation.  It pre-dates the introduction of IRA by many years.





Richard


gcc/
* recog.cc (constrain_operands): Remove

OK
jeff



[PATCH 3/3] rtl-ssa: Add new helper functions

2023-10-24 Thread Richard Sandiford
This patch adds some RTL-SSA helper functions.  They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc.  I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.
---
 gcc/Makefile.in|  1 +
 gcc/rtl-ssa/access-utils.h | 41 +
 gcc/rtl-ssa/accesses.cc| 63 ++
 gcc/rtl-ssa/movement.cc| 40 
 gcc/rtl-ssa/movement.h |  4 +++
 5 files changed, 149 insertions(+)
 create mode 100644 gcc/rtl-ssa/movement.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..91d6bfbea4d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1635,6 +1635,7 @@ OBJS = \
rtl-ssa/changes.o \
rtl-ssa/functions.o \
rtl-ssa/insns.o \
+   rtl-ssa/movement.o \
rtl-tests.o \
rtl.o \
rtlhash.o \
diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index 0d7a57f843c..f078625babf 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -33,6 +33,20 @@ accesses_include_hard_registers (const access_array &accesses)
   return accesses.size () && HARD_REGISTER_NUM_P (accesses.front ()->regno ());
 }
 
+// Return true if ACCESSES includes a reference to a non-fixed hard register.
+inline bool
+accesses_include_nonfixed_hard_registers (access_array accesses)
+{
+  for (access_info *access : accesses)
+{
+  if (!HARD_REGISTER_NUM_P (access->regno ()))
+   break;
+  if (!fixed_regs[access->regno ()])
+   return true;
+}
+  return false;
+}
+
 // Return true if sorted array ACCESSES includes an access to memory.
 inline bool
 accesses_include_memory (const access_array &accesses)
@@ -246,6 +260,22 @@ last_def (def_mux mux)
   return mux.last_def ();
 }
 
+// If INSN's definitions contain a single set, return that set, otherwise
+// return null.
+inline set_info *
+single_set_info (insn_info *insn)
+{
+  set_info *set = nullptr;
+  for (auto def : insn->defs ())
+    if (auto this_set = dyn_cast<set_info *> (def))
+  {
+   if (set)
+ return nullptr;
+   set = this_set;
+  }
+  return set;
+}
+
 int lookup_use (splay_tree &, insn_info *);
 int lookup_def (def_splay_tree &, insn_info *);
 int lookup_clobber (clobber_tree &, insn_info *);
@@ -539,6 +569,10 @@ insert_access (obstack_watermark &watermark,
   return T (insert_access_base (watermark, access1, accesses2));
 }
 
+// Return a copy of USES that drops any use of DEF.
+use_array remove_uses_of_def (obstack_watermark &, use_array uses,
+ def_info *def);
+
 // The underlying non-template implementation of remove_note_accesses.
 access_array remove_note_accesses_base (obstack_watermark &, access_array);
 
@@ -554,4 +588,11 @@ remove_note_accesses (obstack_watermark &watermark, T accesses)
   return T (remove_note_accesses_base (watermark, accesses));
 }
 
+// Return true if ACCESSES1 and ACCESSES2 have at least one resource in common.
+bool accesses_reference_same_resource (access_array accesses1,
+  access_array accesses2);
+
+// Return true if INSN clobbers the value of any resources in ACCESSES.
+bool insn_clobbers_resources (insn_info *insn, access_array accesses);
+
 }
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 1b25ecc3e23..510545a8bad 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1569,6 +1569,19 @@ rtl_ssa::insert_access_base (obstack_watermark &watermark,
   return builder.finish ();
 }
 
+// See the comment above the declaration.
+use_array
+rtl_ssa::remove_uses_of_def (obstack_watermark &watermark, use_array uses,
+def_info *def)
+{
+  access_array_builder uses_builder (watermark);
+  uses_builder.reserve (uses.size ());
+  for (use_info *use : uses)
+if (use->def () != def)
+  uses_builder.quick_push (use);
+  return use_array (uses_builder.finish ());
+}
+
 // See the comment above the declaration.
 access_array
 rtl_ssa::remove_note_accesses_base (obstack_watermark &watermark,
@@ -1587,6 +1600,56 @@ rtl_ssa::remove_note_accesses_base (obstack_watermark &watermark,
   return accesses;
 }
 
+// See the comment above the declaration.
+bool
+rtl_ssa::accesses_reference_same_resource (access_array accesses1,
+

[PATCH 2/3] rtl-ssa: Extend make_uses_available

2023-10-24 Thread Richard Sandiford
The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.
---
 gcc/rtl-ssa/accesses.cc | 37 +++--
 gcc/rtl-ssa/functions.h |  4 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index c35c7efb73d..1b25ecc3e23 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1303,6 +1303,33 @@ function_info::insert_temp_clobber (obstack_watermark &watermark,
   return insert_access (watermark, clobber, old_defs);
 }
 
+// See the comment above the declaration.
+bool
+function_info::remains_available_at_insn (const set_info *set,
+ insn_info *insn)
+{
+  auto *ebb = set->ebb ();
+  gcc_checking_assert (ebb == insn->ebb ());
+
+  def_info *next_def = set->next_def ();
+  if (next_def && *next_def->insn () < *insn)
+return false;
+
+  if (HARD_REGISTER_NUM_P (set->regno ())
+  && TEST_HARD_REG_BIT (m_clobbered_by_calls, set->regno ()))
+for (ebb_call_clobbers_info *call_group : ebb->call_clobbers ())
+  {
+   if (!call_group->clobbers (set->resource ()))
+ continue;
+
+   insn_info *call_insn = next_call_clobbers (*call_group, insn);
+   if (call_insn && *call_insn < *insn)
+ return false;
+  }
+
+  return true;
+}
+
 // See the comment above the declaration.
 bool
 function_info::remains_available_on_exit (const set_info *set, bb_info *bb)
@@ -1354,14 +1381,20 @@ function_info::make_use_available (use_info *use, bb_info *bb,
   if (is_single_dominating_def (def))
 return use;
 
-  // FIXME: Deliberately limited for fwprop compatibility testing.
+  if (def->ebb () == bb->ebb ())
+{
+  if (remains_available_at_insn (def, bb->head_insn ()))
+   return use;
+  return nullptr;
+}
+
   basic_block cfg_bb = bb->cfg_bb ();
   bb_info *use_bb = use->bb ();
   if (single_pred_p (cfg_bb)
   && single_pred (cfg_bb) == use_bb->cfg_bb ()
   && remains_available_on_exit (def, use_bb))
 {
-  if (def->ebb () == bb->ebb () || will_be_debug_use)
+  if (will_be_debug_use)
return use;
 
   resource_info resource = use->resource ();
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index ab253e750cb..ecb40fdaf57 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -121,6 +121,10 @@ public:
   // scope until the change has been aborted or successfully completed.
   obstack_watermark new_change_attempt () { return &m_temp_obstack; }
 
+  // SET and INSN belong to the same EBB, with SET occurring before INSN.
+  // Return true if SET is still available at INSN.
+  bool remains_available_at_insn (const set_info *set, insn_info *insn);
+
   // SET either occurs in BB or is known to be available on entry to BB.
   // Return true if it is also available on exit from BB.  (The value
   // might or might not be live.)
-- 
2.25.1



[PATCH 1/3] rtl-ssa: Use frequency-weighted insn costs

2023-10-24 Thread Richard Sandiford
rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall.  But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency.  (We already do something similar for SLP layouts.)
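
As a made-up illustration of where the weighting matters: suppose a
change rewrites two instructions, one in a block with the same profile
count as the entry block (weight 1.0) and one in a loop body with
weight 10.0.  If the old costs are 4 and 4 and the new costs are 6 and
2, the unweighted sums tie (4 + 4 = 6 + 2 = 8), but the weighted
comparison (4*1 + 4*10 = 44 vs. 6*1 + 2*10 = 26) prefers the version
that moves cost out of the hot loop.  The patch falls back to the
unweighted sums only when the weighted costs compare equal.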

gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.
---
 gcc/rtl-ssa/changes.cc | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 3e14069421c..aab532b9f26 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -34,6 +34,7 @@
 #include "emit-rtl.h"
 #include "cfghooks.h"
 #include "cfgrtl.h"
+#include "sreal.h"
 
 using namespace rtl_ssa;
 
@@ -171,18 +172,33 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
 {
   unsigned int old_cost = 0;
   unsigned int new_cost = 0;
+  sreal weighted_old_cost = 0;
+  sreal weighted_new_cost = 0;
+  auto entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
   for (insn_change *change : changes)
 {
   old_cost += change->old_cost ();
+  basic_block cfg_bb = change->bb ()->cfg_bb ();
+  bool for_speed = optimize_bb_for_speed_p (cfg_bb);
+  if (for_speed)
+   weighted_old_cost += (cfg_bb->count.to_sreal_scale (entry_count)
+ * change->old_cost ());
   if (!change->is_deletion ())
{
- basic_block cfg_bb = change->bb ()->cfg_bb ();
- change->new_cost = insn_cost (change->rtl (),
-   optimize_bb_for_speed_p (cfg_bb));
+ change->new_cost = insn_cost (change->rtl (), for_speed);
  new_cost += change->new_cost;
+ if (for_speed)
+   weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)
+ * change->new_cost);
}
 }
-  bool ok_p = (strict_p ? new_cost < old_cost : new_cost <= old_cost);
+  bool ok_p;
+  if (weighted_new_cost != weighted_old_cost)
+ok_p = weighted_new_cost < weighted_old_cost;
+  else if (strict_p)
+ok_p = new_cost < old_cost;
+  else
+ok_p = new_cost <= old_cost;
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "original cost");
@@ -192,6 +208,8 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
  fprintf (dump_file, " %c %d", sep, change->old_cost ());
  sep = '+';
}
+  if (weighted_old_cost != 0)
+   fprintf (dump_file, " (weighted: %f)", weighted_old_cost.to_double ());
   fprintf (dump_file, ", replacement cost");
   sep = '=';
   for (const insn_change *change : changes)
@@ -200,6 +218,8 @@ rtl_ssa::changes_are_worthwhile (array_slice changes,
fprintf (dump_file, " %c %d", sep, change->new_cost);
sep = '+';
  }
+  if (weighted_new_cost != 0)
+   fprintf (dump_file, " (weighted: %f)", weighted_new_cost.to_double ());
   fprintf (dump_file, "; %s\n",
   ok_p ? "keeping replacement" : "rejecting replacement");
 }
-- 
2.25.1



[PATCH 0/3] rtl-ssa: Various extensions for the late-combine pass

2023-10-24 Thread Richard Sandiford
This series adds some RTL-SSA enhancements that are needed
by the late-combine pass.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

Richard Sandiford (3):
  rtl-ssa: Use frequency-weighted insn costs
  rtl-ssa: Extend make_uses_available
  rtl-ssa: Add new helper functions

 gcc/Makefile.in|   1 +
 gcc/rtl-ssa/access-utils.h |  41 +++
 gcc/rtl-ssa/accesses.cc| 100 -
 gcc/rtl-ssa/changes.cc |  28 +--
 gcc/rtl-ssa/functions.h|   4 ++
 gcc/rtl-ssa/movement.cc|  40 +++
 gcc/rtl-ssa/movement.h |   4 ++
 7 files changed, 212 insertions(+), 6 deletions(-)
 create mode 100644 gcc/rtl-ssa/movement.cc

-- 
2.25.1



Re: [PATCH 6/6] rtl-ssa: Handle call clobbers in more places

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines.  I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function.  Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.

OK
jeff

---


Re: [PATCH 5/6] rtl-ssa: Calculate dominance frontiers for the exit block

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

The exit block can have multiple predecessors, for example if the
function calls __builtin_eh_return.  We might then need PHI nodes
for values that are live on exit.

RTL-SSA uses the normal dominance frontiers approach for calculating
where PHI nodes are needed.  However, dominance.cc only calculates
dominators for normal blocks, not the exit block.
calculate_dominance_frontiers likewise only calculates dominance
frontiers for normal blocks.

This patch fills in the “missing” frontiers manually.

gcc/
* rtl-ssa/internals.h (build_info::exit_block_dominator): New
member variable.
* rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
(bb_walker::bb_walker): Use it, moving the computation of the
dominator to...
(function_info::process_all_blocks): ...here.
(function_info::place_phis): Add dominance frontiers for the
exit block.

OK
jeff


[PATCH v2] AArch64: Improve immediate generation

2023-10-24 Thread Wilco Dijkstra
v2: Use check-function-bodies in tests

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
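
As a worked example (the constants below are those of the f5 test in
the patch, shown there in decimal): 0x3f3333333f333333 is not a valid
bitmask immediate, but it is the XOR of two values that are, since
0x3333333333333333 is the 4-bit pattern 0b0011 repeated and
0x0c0000000c000000 is a single 2-bit run repeated per 32-bit element:

	mov	x0, 0x3333333333333333
	eor	x0, x0, 0x0c0000000c000000	// x0 = 0x3f3333333f333333

The new code finds such splits by replicating each 16-bit chunk of the
constant across the register and testing whether both the replicated
value and its XOR with the original are bitmask immediates.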

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Change tests.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together. */
+  for (i = 0; i < 64; i += 16)
+   {
+ val2 = (val >> i) & mask;
+ val2 |= val2 << 16;
+ val2 |= val2 << 32;
+ if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+   break;
+   }
+
+  if (i != 64)
+   {
+ if (generate)
+   {
+ emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+ emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+   }
+ return 2;
+   }
 }
 
   /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c 
b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index ebc44d6dbc7287d907603d77d7b54496de177c4b..a1fc90ad73411ae8ed848fa321586afcb8d710aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -1,32 +1,64 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 /* Go from four moves to two.  */
 
+/*
+** foo:
+** mov w[0-9]+, 2576980377
+** movk x[0-9]+, 0x, lsl 32
+** ...
+*/
+
 int
 foo (long long x)
 {
-  return x <= 0x1998;
+  return x <= 0x9998;
 }
 
+/*
+** GT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 GT (unsigned int x)
 {
   return x > 0xfefe;
 }
 
+/*
+** LE:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 LE (unsigned int x)
 {
   return x <= 0xfefe;
 }
 
+/*
+** GE:
+** mov w[0-9]+, 4278190079
+** ...
+*/
+
 int
 GE (long long x)
 {
   return x >= 0xff00;
 }
 
+/*
+** LT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 LT (int x)
 {
@@ -35,6 +67,13 @@ LT (int x)
 
 /* Optimize the immediate in conditionals.  */
 
+/*
+** check:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 check (int x, int y)
 {
@@ -44,11 +83,15 @@ check (int x, int y)
   return x;
 }
 
+/*
+** tern:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
 int
 tern (int x)
 {
   return x >= 0xff00 ? 5 : -3;
 }
-
-/* baz produces one movk instruction.  */
-/* { dg-final { scan-assembler-times "movk" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c 
b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c0c3f3bf8c588f9661112a8b3f9a72c5ddff95c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+** mov x0, -6148914691236517206
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f1 (void)
+{
+  return 0x2aaaaaaaaaaaaaab;
+}
+
+/*
+** f2:
+** mov x0, -1085102592571150096
+** eor x0, x0, -2305843009213693951
+** ret
+*/
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+/*
+** f3:
+** mov x0, -3689348814741910324
+** eor x0, x0, -4611686018427387903
+** ret
+*/
+
+long f3 (void)
+{
+  return 0xccccccccccccccd;
+}
+
+/*
+** f4:
+** mov x0, -7378697629483820647
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f4 (void)
+{
+  return 0x1999999999999998;
+}
+
+/*
+** f5:
+** mov x0, 3689348814741910323
+** eor x0, x0, 864691128656461824
+** ret
+*/
+
+long f5 (void)
+{
+  return 0x3f3333333f333333;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c 
b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 0f931580817d78dc1cc58f03b251bd21bec71f59..63df7395edf9491720e3601848e15aa773c51e6d 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr106583.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c
@@ -1,41 +1,94 @@
-/* { dg-do assemble } */
-/* { dg-options "-O2 --save-temps" } */
+/* { dg-do compile } */
+/* { dg-options "-O2" } 

Re: [PATCH 4/6] rtl-ssa: Handle artifical uses of deleted defs

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

If an optimisation removes the last real use of a definition,
there can still be artificial uses left.  This patch removes
those uses too.

These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s).  Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.

gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/functions.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.

OK
jeff


Re: [PATCH 3/6] rtl-ssa: Fix ICE when deleting memory clobbers

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory.  We then need to update the DU chains to reflect
the removed clobber.

For registers this isn't a problem.  Clobbers of registers are just
momentary blips in the register's lifetime.  They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions.  Removing a clobber is therefore a
cheap, local operation.

In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers.  But removing a set and redirecting all uses
to a different set is a linear operation.  Doing it for potentially
every optimisation could lead to quadratic behaviour.

This patch therefore refrains from removing sets of memory that appear
to be redundant.  There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.

This is also a very rare event.  Usually we should try to optimise the
insn before the scratch memory has been allocated.

gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.

OK
jeff


Re: [PATCH 2/6] rtl-ssa: Create REG_UNUSED notes after all pending changes

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes.  function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.

The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.

gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.

OK
jeff


Re: [PATCH 1/6] rtl-ssa: Ensure global registers are live on exit

2023-10-24 Thread Jeff Law




On 10/24/23 04:50, Richard Sandiford wrote:

RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks.  But one case was missing.  DF does not add
artificial uses of global registers to the beginning or end
of a block.  Instead it marks them as used within every block
when computing LR and LIVE problems.

For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops.  We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.

Also, the previous live-on-exit handling only considered the exit
block itself.  It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.

gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit.  Handle any block with zero
successors like an exit block.

OK
jeff


[PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard:

This version 14 of the patch uses ABI interfaces to eliminate zero and sign
extensions.
This fixes aarch64 regression failures seen with aggressive CSE.

Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 14) of the patch, the following review comments are
incorporated.

a) Removal of hard-coded zero_extend and sign_extend in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
e) Addressed remaining review comments from Vineet.
f) Addressed review comments from Bernhard.
g) Fix aarch64 regression failures.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For the rs6000 target we see zero and sign extensions with missing
definitions.  The pass is improved to eliminate such zero and sign
extensions using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
using defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C: New test.
---
changes since v6:
  - Added missing abi interfaces.
  - Rearranging and restructuring the code.
  - Removal of hard coded zero extend and sign extend in abi interfaces.
  - Relaxed different registers with source and destination in abi interfaces.
  - Using CSE in abi interfaces.
  - Fix aarch64 regressions.
  - Add Sign extension removal in abi interfaces.
  - Modified comments as per coding convention.
  - Modified code as per coding convention.
  - Fix RISC-V bootstrap failures.
---
 gcc/ree.cc| 147 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 154 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..f557b49b366 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+    return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+    = targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
+  a) The source operand is an argument register and not a return register.
+  b) The modes of the source and destination operands are different.
+  c) If not promoted, the REGNOs of the source and destination are the same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and has
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+

[PATCH] c++: build_new_1 and non-dep array size [PR111929]

2023-10-24 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
like the right approach?

-- >8 --

This PR is another instance of NON_DEPENDENT_EXPR having acted as an
"analysis barrier" for middle-end routines, and now that it's gone we
may end up passing weird templated trees (that have a generic tree code)
to the middle-end which leads to an ICE.  In the testcase below the
non-dependent array size 'var + 42' is expressed as an ordinary
PLUS_EXPR whose operand types have different precisions -- long and
int respectively -- naturally, because templated trees (typically)
encode only the syntactic form of an expression, devoid of implicit
conversions.  This type incoherency triggers a wide_int assert during
the call to size_binop in build_new_1, which requires that the operand
types have the same precision.
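
To sketch the tree in question (my rendering, not compiler output): for
'new int[var + 42]' with 'var' of type long, the templated array size is

  PLUS_EXPR, type long
    VAR_DECL 'var', type long     (64-bit precision)
    INTEGER_CST 42, type int      (32-bit precision)

whereas outside a template the size would have been built from
'var + (long) 42', with both operands of type long.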

This patch fixes this by replacing our incremental folding of 'size'
within build_new_1 with a single call to cp_fully_fold (which is a no-op
in template context) once 'size' is fully built.

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Use convert, build2, build3 instead of
fold_convert, size_binop and fold_build3 when building 'size'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28.C: New test.
---
 gcc/cp/init.cc  | 9 +
 gcc/testsuite/g++.dg/template/non-dependent28.C | 6 ++
 2 files changed, 11 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent28.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index d48bb16c7c5..56c1b5e9f5e 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3261,7 +3261,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
   max_outer_nelts = wi::udiv_trunc (max_size, inner_size);
   max_outer_nelts_tree = wide_int_to_tree (sizetype, max_outer_nelts);
 
-  size = size_binop (MULT_EXPR, size, fold_convert (sizetype, nelts));
+  size = build2 (MULT_EXPR, sizetype, size, convert (sizetype, nelts));
 
   if (TREE_CODE (cst_outer_nelts) == INTEGER_CST)
{
@@ -3344,7 +3344,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
   /* Use a class-specific operator new.  */
   /* If a cookie is required, add some extra space.  */
   if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
-   size = size_binop (PLUS_EXPR, size, cookie_size);
+   size = build2 (PLUS_EXPR, sizetype, size, cookie_size);
   else
{
  cookie_size = NULL_TREE;
@@ -3358,8 +3358,8 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
   if (cxx_dialect >= cxx11 && flag_exceptions)
errval = throw_bad_array_new_length ();
   if (outer_nelts_check != NULL_TREE)
-   size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
-   size, errval);
+   size = build3 (COND_EXPR, sizetype, outer_nelts_check, size, errval);
+  size = cp_fully_fold (size);
   /* Create the argument list.  */
   vec_safe_insert (*placement, 0, size);
   /* Do name-lookup to find the appropriate operator.  */
@@ -3418,6 +3418,7 @@ build_new_1 (vec<tree, va_gc> **placement, tree type, tree nelts,
   /* If size is zero e.g. due to type having zero size, try to
 preserve outer_nelts for constant expression evaluation
 purposes.  */
+  size = cp_fully_fold (size);
   if (integer_zerop (size) && outer_nelts)
size = build2 (MULT_EXPR, TREE_TYPE (size), size, outer_nelts);
 
diff --git a/gcc/testsuite/g++.dg/template/non-dependent28.C 
b/gcc/testsuite/g++.dg/template/non-dependent28.C
new file mode 100644
index 000..3e45154f61d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent28.C
@@ -0,0 +1,6 @@
+// PR c++/111929
+
+template<class T>
+void f(long var) {
+  new int[var + 42];
+}
-- 
2.42.0.424.gceadf0f3cf



Re: [PATCH] c++: cp_stabilize_reference and non-dep exprs [PR111919]

2023-10-24 Thread Jason Merrill

On 10/23/23 19:49, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

After the removal of NON_DEPENDENT_EXPR, cp_stabilize_reference which
used to just exit early for NON_DEPENDENT_EXPR is now more prone to
passing a weird templated tree to middle-end routines, which leads to a
crash from contains_placeholder_p in the testcase below.  It seems the
best fix is to just disable cp_stabilize_reference when in a template
context like we already do for cp_save_expr; it seems SAVE_EXPR should
never appear in a templated tree (since e.g. tsubst doesn't handle it).


Hmm.  We don't want the result of cp_stabilize_reference (or 
cp_save_expr) to end up in the resulting trees in template context. 
Having a SAVE_EXPR in the result would actually be helpful for catching 
such a bug.


That said, the patch is OK.


PR c++/111919

gcc/cp/ChangeLog:

* tree.cc (cp_stabilize_reference): Do nothing when
processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent27.C: New test.
---
  gcc/cp/tree.cc  | 4 
  gcc/testsuite/g++.dg/template/non-dependent27.C | 8 
  2 files changed, 12 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent27.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index a3d61d3e7c9..417c92ba76f 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -408,6 +408,10 @@ bitfield_p (const_tree ref)
  tree
  cp_stabilize_reference (tree ref)
  {
+  if (processing_template_decl)
+/* As in cp_save_expr.  */
+return ref;
+
STRIP_ANY_LOCATION_WRAPPER (ref);
switch (TREE_CODE (ref))
  {
diff --git a/gcc/testsuite/g++.dg/template/non-dependent27.C 
b/gcc/testsuite/g++.dg/template/non-dependent27.C
new file mode 100644
index 000..cf7af6e6425
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent27.C
@@ -0,0 +1,8 @@
+// PR c++/111919
+
+int i[3];
+
+template<class T>
+void f() {
+  i[42 / (int) sizeof (T)] |= 0;
+}




[PATCH] c++: error with bit-fields and scoped enums [PR111895]

2023-10-24 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we issue a bogus error: invalid operands of types 'unsigned char:2'
and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
type to bool.

In build_static_cast_1, perform_direct_initialization_if_possible returns
NULL_TREE, because the invented declaration T t(e) fails, which is
correct.  So we go down to ocp_convert, which has code to deal with this
case:
  /* We can't implicitly convert a scoped enum to bool, so convert
 to the underlying type first.  */
  if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
but the SCOPED_ENUM_P check is false since intype is the lowered bit-field
type ('unsigned char:2' here), not the scoped enum.
This could be fixed by using unlowered_expr_type.  But then
c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
unlowered_expr_type is a C++-only function.

Rather than adding a dummy unlowered_expr_type to C, I think we should
follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
to the bit-field and the resulting prvalue is used as the operand of the
static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
will get us an expression whose type is the enum.  (I thought we didn't
need decay_conversion because that does a whole lot more but using it
would make sense to me too.)

PR c++/111895

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Call
convert_bitfield_to_declared_type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/scoped_enum12.C: New test.
---
 gcc/cp/typeck.cc   | 9 +
 gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C | 8 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f3dc80c40cf..50427090e5d 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8405,6 +8405,15 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p,
return expr;
   if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
expr = TREE_OPERAND (expr, 0);
+  /* [expr.static.cast]: "If the value is not a bit-field, the result
+refers to the object or the specified base class subobject thereof;
+otherwise, the lvalue-to-rvalue conversion is applied to the
+bit-field and the resulting prvalue is used as the operand of the
+static_cast."  There are no prvalue bit-fields; the l-to-r conversion
+will give us an object of the underlying type of the bit-field.  We
+can let convert_bitfield_to_declared_type convert EXPR to the desired
+type.  */
+  expr = convert_bitfield_to_declared_type (expr);
   return ocp_convert (type, expr, CONV_C_CAST, LOOKUP_NORMAL, complain);
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C 
b/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C
new file mode 100644
index 000..1d10431e6dc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/scoped_enum12.C
@@ -0,0 +1,8 @@
+// PR c++/111895
+// { dg-do compile { target c++11 } }
+
+enum class o_field : unsigned char { no, yes, different_from_s };
+struct fields {
+  o_field o : 2;
+};
+bool func(fields f) { return static_cast<bool>(f.o); }

base-commit: 99a6c1065de2db04d0f56f4b2cc89acecf21b72e
-- 
2.41.0



Re: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

2023-10-24 Thread Tobias Burnus

Hi PA, hello all,

First, I hesitate to review/approve a patch I am involved in; thus, I would
like it if someone could have a second look.

Regarding the patch itself:


On 20.10.23 16:02, Paul-Antoine Arras wrote:

Hi all,

The attached patch fixes a bug that causes valid OpenMP declare variant
directives and functions to be rejected with the following error (see
testcase):

[...]
Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible
types: Type mismatch in argument 'c_bv' (INTEGER(8)/TYPE(c_ptr))

The fix consists in special-casing this situation in gfc_compare_types().
OK for mainline?

...

Subject: [PATCH] Fortran: Fix incompatible types between INTEGER(8) and
  TYPE(c_ptr)

In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such a situation from turning into a compile error.

2023-10-20  Paul-Antoine Arras
  Tobias Burnus

gcc/fortran/ChangeLog:

  * interface.cc (gfc_compare_types): Return true in this situation.


That's a bad description. It makes sense when reading the commit log but if you
only read gcc/fortran/ChangeLog, 'this situation' is a dangling reference.


  gcc/fortran/ChangeLog.omp|  5 ++
  gcc/testsuite/ChangeLog.omp  |  4 ++


On mainline, the ChangeLog, not ChangeLog.omp, is used. This changelog is
automatically filled from the data in the commit log. Thus, there is no need
to include it in the patch.
(Besides: it keeps getting outdated by any other commit to that file.)

As you have a commit, running the following, possibly with the commit hash as
an argument (unless it is the last commit), will show what the nightly script
will use:

./contrib/gcc-changelog/git_check_commit.py -v -p

It is additionally a good check of whether you got the syntax right. (This is
run as a pre-commit hook.)

* * *

Regarding the patch, I think it will work, but I wonder whether we can do
better - esp. regarding c_ptr vs. c_funptr.

I started by looking at why the current code fails:


index e9843e9549c..8bd35fd6d22 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -705,12 +705,17 @@ gfc_compare_types (gfc_typespec *ts1, gfc_typespec *ts2)
-
-  if (((ts1->type == BT_INTEGER && ts2->type == BT_DERIVED)
-   || (ts1->type == BT_DERIVED && ts2->type == BT_INTEGER))
-  && ts1->u.derived && ts2->u.derived
-  && ts1->u.derived == ts2->u.derived)


This does not work because the pointers to the derived type are different:

(gdb) p *ts1
$10 = {type = BT_INTEGER, kind = 8, u = {derived = 0x30c66b0, cl = 0x30c66b0, 
pad = 51144368}, interface = 0x0, is_c_interop = 1, is_iso_c = 0, f90_type = 
BT_VOID, deferred = false,
  interop_kind = 0x0}

(gdb) p *ts2
$11 = {type = BT_DERIVED, kind = 0, u = {derived = 0x30c2930, cl = 0x30c2930, 
pad = 51128624}, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type = 
BT_UNKNOWN,
  deferred = false, interop_kind = 0x0}

The reason seems to be that they are freshly created
in different namespaces. Consequently, attr.use_assoc = 1
and the namespace is different, i.e.


(gdb) p ts1->u.derived->ns->proc_name->name
$18 = 0x76ff4138 "foo"

(gdb) p ts2->u.derived->ns->proc_name->name
$19 = 0x76ffc260 "foo_variant"

* * *

Having said this, I think we can combine the current
and the modified version, i.e.


+  if ((ts1->type == BT_INTEGER && ts2->type == BT_DERIVED
+   && ts1->f90_type == BT_VOID
+   && ts2->u.derived->ts.is_iso_c
+   && ts2->u.derived->ts.u.derived->ts.f90_type == BT_VOID)
+  || (ts2->type == BT_INTEGER && ts1->type == BT_DERIVED
+   && ts2->f90_type == BT_VOID
+   && ts1->u.derived->ts.is_iso_c
+   && ts1->u.derived->ts.u.derived->ts.f90_type == BT_VOID))


See the attached patch for a combined version, which now checks
whether from_intmod == INTMOD_ISO_C_BINDING and then compares
the names (to distinguish c_ptr and c_funptr). Those are unaffected
by 'use' renames; hence, we should be fine.

While the name pointers are identical in this example, I fear that
won't hold for some more complex indirect use via module use. Thus,
strcmp is used.

(gdb) p ts1->u.derived->name
$13 = 0x76ff4100 "c_ptr"

(gdb) p ts2->u.derived->name
$14 = 0x76ff4100 "c_ptr"
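
In other words, the shape of the check is roughly the following (a
hypothetical sketch only, using the gfc_typespec fields shown above;
this is not the attached patch):

/* Hypothetical sketch, not the attached patch: treat the INTEGER(8)
   "shadow" of an ISO_C_BINDING type (f90_type == BT_VOID) as
   compatible with the derived type itself.  strcmp on the symbol
   names keeps c_ptr and c_funptr distinct even when the name
   pointers are not shared.  */
static bool
iso_c_binding_compatible (gfc_typespec *ts_int, gfc_typespec *ts_derived)
{
  return (ts_int->type == BT_INTEGER
	  && ts_int->f90_type == BT_VOID
	  && ts_int->u.derived
	  && ts_derived->type == BT_DERIVED
	  && ts_derived->u.derived->from_intmod == INTMOD_ISO_C_BINDING
	  && strcmp (ts_int->u.derived->name,
		     ts_derived->u.derived->name) == 0);
}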

* * *

Additionally, I think it would be good to have a testcase which checks for
  c_funptr vs. c_ptr
mismatch.

Just changing c_ptr to c_funptr in the testcase (+ commenting the c_f_pointer)
prints:
  Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible types: 
Type mismatch in argument 'c_bv' (INTEGER(8)/TYPE(c_funptr))

I think that would be a good invalid testcase besides the valid one.

But with a tweak to produce better messages, it would give:
  Error: variant ‘foo_variant’ and base ‘foo’ at (1) have incompatible types: 
Type mismatch in argument 'c_bv' (TYPE(c_ptr)/TYPE(c_funptr))

cf. misc.cc in the 

[PATCH] config, aarch64: Use a more compatible sed invocation.

2023-10-24 Thread Iain Sandoe
Although this came up initially when working on the Darwin Arm64
port, it also breaks cross-compilers on platforms with non-GNU sed.

Tested on x86_64-darwin X aarch64-linux-gnu, aarch64-darwin,
aarch64-linux-gnu and x86_64-linux-gnu.  OK for master?
thanks,
Iain

--- 8< ---

Currently, the sed commands used to parse --with-{cpu,tune,arch} use a
GNU-specific extension to -e (recognising extended regexes).

This is failing on Darwin, which defaults to Posix behaviour for -e.
However, '-E' is accepted to indicate an extended RE.  Strictly, this
is also not really sufficient, since we should only require a Posix
sed (but -E seems to be supported by BSD derivatives).

gcc/ChangeLog:

* config.gcc: Pass -E to sed to indicate that we are using
extended REs.

Signed-off-by: Iain Sandoe 
---
 gcc/config.gcc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 606d3a8513e..a7216907261 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4199,8 +4199,8 @@ case "${target}" in
fi
for which in cpu arch tune; do
eval "val=\$with_$which"
-   base_val=`echo $val | sed -e 's/\+.*//'`
-   ext_val=`echo $val | sed -e 's/[a-z0-9.-]\+//'`
+   base_val=`echo $val | sed -E 's/\+.*//'`
+   ext_val=`echo $val | sed -E 's/[a-z0-9.-]+//'`
 
if [ $which = arch ]; then
  def=aarch64-arches.def
@@ -4232,9 +4232,9 @@ case "${target}" in
 
  while [ x"$ext_val" != x ]
  do
-   ext_val=`echo $ext_val | sed -e 's/\+//'`
-   ext=`echo $ext_val | sed -e 's/\+.*//'`
-   base_ext=`echo $ext | sed -e 's/^no//'`
+   ext_val=`echo $ext_val | sed -E 's/\+//'`
+   ext=`echo $ext_val | sed -E 's/\+.*//'`
+   base_ext=`echo $ext | sed -E 's/^no//'`
opt_line=`echo -e "$options_parsed" | \
grep "^\"$base_ext\""`
 
@@ -4245,7 +4245,7 @@ case "${target}" in
  echo "Unknown extension used in 
--with-$which=$val" 1>&2
  exit 1
fi
-   ext_val=`echo $ext_val | sed -e 
's/[a-z0-9]\+//'`
+   ext_val=`echo $ext_val | sed -E 's/[a-z0-9]+//'`
  done
 
  true
-- 
2.39.2 (Apple Git-143)



[PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-24 Thread Stefan Schulze Frielinghaus
Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not 
supported on this target
  237 | _BitInt(32) b32_v;
  | ^~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

Tested on s390x and x86_64.  Ok for mainline?

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.
---
 gcc/testsuite/gcc.misc-tests/godump-1.c | 12 
 gcc/testsuite/gcc.misc-tests/godump-2.c | 18 ++
 2 files changed, 18 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/godump-2.c

diff --git a/gcc/testsuite/gcc.misc-tests/godump-1.c 
b/gcc/testsuite/gcc.misc-tests/godump-1.c
index f359a657827..b661d04719c 100644
--- a/gcc/testsuite/gcc.misc-tests/godump-1.c
+++ b/gcc/testsuite/gcc.misc-tests/godump-1.c
@@ -234,18 +234,6 @@ const char cc_v1;
 cc_t cc_v2;
 /* { dg-final { scan-file godump-1.out "(?n)^var _cc_v2 _cc_t$" } } */
 
-_BitInt(32) b32_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32_v int32$" } } */
-
-_BitInt(64) b64_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b64_v int64$" } } */
-
-unsigned _BitInt(32) b32u_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32u_v uint32$" } } */
-
-_BitInt(33) b33_v;
-/* { dg-final { scan-file godump-1.out "(?n)^// var _b33_v INVALID-bitint-33$" 
} } */
-
 /*** pointer and array types ***/
 typedef void *vp_t;
 /* { dg-final { scan-file godump-1.out "(?n)^type _vp_t \\*byte$" } } */
diff --git a/gcc/testsuite/gcc.misc-tests/godump-2.c 
b/gcc/testsuite/gcc.misc-tests/godump-2.c
new file mode 100644
index 000..ed093c964ac
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/godump-2.c
@@ -0,0 +1,18 @@
+/* { dg-options "-c -fdump-go-spec=godump-2.out" } */
+/* { dg-do compile { target bitint } } */
+/* { dg-skip-if "not supported for target" { ! "alpha*-*-* s390*-*-* i?86-*-* 
x86_64-*-*" } } */
+/* { dg-skip-if "not supported for target" { ! lp64 } } */
+
+_BitInt(32) b32_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32_v int32$" } } */
+
+_BitInt(64) b64_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b64_v int64$" } } */
+
+unsigned _BitInt(32) b32u_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32u_v uint32$" } } */
+
+_BitInt(33) b33_v;
+/* { dg-final { scan-file godump-2.out "(?n)^// var _b33_v INVALID-bitint-33$" 
} } */
+
+/* { dg-final { remove-build-file "godump-2.out" } } */
-- 
2.41.0



Re: [PATCH] RISC-V: Add AVL propagation PASS for RVV auto-vectorization

2023-10-24 Thread Kito Cheng
> +using namespace rtl_ssa;
> +using namespace riscv_vector;
> +
> +/* The AVL propagation instructions and corresponding preferred AVL.
> +   It will be updated during the analysis.  */
> +static hash_map<insn_info *, rtx> *avlprops;

Maybe put into member data of pass_avlprop?

> +
> +const pass_data pass_data_avlprop = {
> +  RTL_PASS, /* type */
> +  "avlprop",/* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE,  /* tv_id */
> +  0,/* properties_required */
> +  0,/* properties_provided */
> +  0,/* properties_destroyed */
> +  0,/* todo_flags_start */
> +  0,/* todo_flags_finish */
> +};
> +
> +class pass_avlprop : public rtl_opt_pass
> +{
> +public:
> +  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) 
> {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) final override
> +  {
> +return TARGET_VECTOR && optimize > 0;
> +  }
> +  virtual unsigned int execute (function *) final override;
> +}; // class pass_avlprop
> +
> +static void
> +avlprop_init (void)

Maybe put into member function of pass_avlprop?

> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_analyze ();
> +  crtl->ssa = new function_info (cfun);

And take the function * from the incoming parameter of execute

> +  avlprops = new hash_map;
> +}
> +
> +static void
> +avlprop_done (void)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  if (crtl->ssa->perform_pending_updates ())
> +cleanup_cfg (0);
> +  delete crtl->ssa;
> +  crtl->ssa = nullptr;
> +  delete avlprops;
> +  avlprops = NULL;
> +}
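
Concretely, the suggestions above would make the skeleton look roughly
like this (an untested sketch; the member names are invented):

// Untested sketch only; member names invented.
class pass_avlprop : public rtl_opt_pass
{
public:
  pass_avlprop (gcc::context *ctxt) : rtl_opt_pass (pass_data_avlprop, ctxt) {}

  virtual bool gate (function *) final override
  {
    return TARGET_VECTOR && optimize > 0;
  }
  virtual unsigned int execute (function *fn) final override
  {
    init (fn);
    /* ... the propagation itself ...  */
    done ();
    return 0;
  }

private:
  void init (function *fn)
  {
    calculate_dominance_info (CDI_DOMINATORS);
    df_analyze ();
    crtl->ssa = new function_info (fn);
  }
  void done ()
  {
    free_dominance_info (CDI_DOMINATORS);
    if (crtl->ssa->perform_pending_updates ())
      cleanup_cfg (0);
    delete crtl->ssa;
    crtl->ssa = nullptr;
  }

  /* The map becomes member data instead of a file-scope global.  */
  hash_map<insn_info *, rtx> m_avlprops;
};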
> +
> +/* Helper function to get AVL operand.  */
> +static rtx
> +get_avl (insn_info *insn, bool avlprop_p)
> +{
> +  if (get_attr_avl_type (insn->rtl ()) == INVALID_ATTRIBUTE
> +  || get_attr_avl_type (insn->rtl ()) == VLS)
> +return NULL_RTX;
> +  if (avlprop_p)
> +{
> +  if (avlprops->get (insn))
> +   return (*avlprops->get (insn));
> +  else if (vlmax_avl_type_p (insn->rtl ()))
> +   return RVV_VLMAX;

I guess I didn't get why we need to handle vlmax_avl_type_p here?

> +}
> +  extract_insn_cached (insn->rtl ());
> +  return recog_data.operand[get_attr_vl_op_idx (insn->rtl ())];
> +}
> +
> +/* This is a straightforward pattern that is ALWAYS present in partial 
> auto-vectorization:
> +
> + VL = SELECT_AVL (AVL, ...)
> + V0 = MASK_LEN_LOAD (..., VL)
> + V1 = MASK_LEN_LOAD (..., VL)
> + V2 = V0 + V1 --- Missed LEN information.
> + MASK_LEN_STORE (..., V2, VL)
> +
> +   We prefer PLUS_EXPR (V0 + V1) instead of COND_LEN_ADD (V0, V1, dummy LEN)
> +   because:
> +
> + - Few code changes in Loop Vectorizer.
> + - Reuse the current clean flow of partial vectorization, That is, apply
> +   predicate LEN or MASK into LOAD/STORE operations and other special
> +   arithmetic operations (e.g. DIV), then do the whole vector register
> +   operation if it DOESN'T affect the correctness.
> +   Such flow is used by all other targets like x86, sve, s390, ... etc.
> + - PLUS_EXPR has better gimple optimizations than COND_LEN_ADD.
> +
> +   We propagate AVL from NON-VLMAX to VLMAX for gimple IR like PLUS_EXPR 
> which
> +   generates the VLMAX instruction due to missed LEN information. The later
> +   VSETVL PASS will elide the redundant vsetvls.
> +*/
> +
> +static rtx
> +get_autovectorize_preferred_avl (insn_info *insn)
> +{
> +  if (!vlmax_avl_p (get_avl (insn, true)) || !tail_agnostic_p (insn->rtl ()))
> +return NULL_RTX;

I would prefer adding a new attribute to make this simpler.

> +
> +  rtx use_avl = NULL_RTX;
> +  insn_info *avl_use_insn = nullptr;
> +  unsigned int ratio
> += calculate_ratio (get_sew (insn->rtl ()), get_vlmul (insn->rtl ()));
> +  for (def_info *def : insn->defs ())
> +{
> +  auto set = safe_dyn_cast<set_info *> (def);
> +  if (!set || !set->is_reg ())
> +   return NULL_RTX;
> +  for (use_info *use : set->all_uses ())
> +   {
> + if (!use->is_in_nondebug_insn ())
> +   return NULL_RTX;
> + insn_info *use_insn = use->insn ();
> + /* FIXME: Stop AVL propagation if any USE is not a RVV real
> +instruction. It should be totally enough for vectorized codes 
> since
> +they always locate at extended blocks.
> +
> +TODO: We can extend PHI checking for intrinsic codes if it
> +necessary in the future.  */
> + if (use_insn->is_artificial () || !has_vtype_op (use_insn->rtl ()))
> +   return NULL_RTX;
> + if (!has_vl_op (use_insn->rtl ()))
> +   continue;
> +
> + rtx new_use_avl = get_avl (use_insn, true);
> + if (!new_use_avl)
> +   return NULL_RTX;
> + if (!use_avl)
> +   use_avl = new_use_avl;
> + if (!rtx_equal_p (use_avl, new_use_avl)
> + || calculate_ratio (get_sew (use_insn->rtl ()),
> + get_vlmul (use_insn->rtl ()))
> +  

Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-10-24 Thread Richard Sandiford
Sorry for the slow review.  I had a look at the arm bits too, to get
some context for the target-independent bits.

Stamatis Markianos-Wright via Gcc-patches  writes:
> [...]
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 77e76336e94..74186930f0b 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -65,8 +65,8 @@ extern void arm_emit_speculation_barrier_function (void);
>  extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *);
>  extern bool arm_q_bit_access (void);
>  extern bool arm_ge_bits_access (void);
> -extern bool arm_target_insn_ok_for_lob (rtx);
> -
> +extern bool arm_target_bb_ok_for_lob (basic_block);
> +extern rtx arm_attempt_dlstp_transform (rtx);
>  #ifdef RTX_CODE
>  enum reg_class
>  arm_mode_base_reg_class (machine_mode);
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 6e933c80183..39d97ba5e4d 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -659,6 +659,12 @@ static const struct attribute_spec arm_attribute_table[]
> [...]
> +/* Wrapper function of arm_get_required_vpr_reg with TYPE == 1, so return
> +   something only if the VPR reg is an input operand to the insn.  */
> +
> +static rtx
> +ALWAYS_INLINE

Probably best to leave out the ALWAYS_INLINE.  That's generally only
appropriate for things that need to be inlined for correctness.

> +arm_get_required_vpr_reg_param (rtx_insn *insn)
> +{
> +  return arm_get_required_vpr_reg (insn, 1);
> +}
> [...]
> +/* Recursively scan through the DF chain backwards within the basic block
> +   and determine if any of the USEs of the original insn (or the USEs of
> +   the insns where they were DEF-ed, etc., recursively) were affected by
> +   implicit VPT predication of an MVE_VPT_UNPREDICATED_INSN_P in a
> +   dlstp/letp loop.  This function returns true if the insn is affected
> +   by implicit predication and false otherwise.
> +   Having such implicit predication on an unpredicated insn wouldn't in 
> itself
> +   block tail predication, because the output of that insn might then be used
> +   in a correctly predicated store insn, where the disabled lanes will be
> +   ignored.  To verify this we later call:
> +   `arm_mve_check_df_chain_fwd_for_implic_predic_impact`, which will check 
> the
> +   DF chains forward to see if any implicitly-predicated operand gets used in
> +   an improper way.  */
> +
> +static bool
> +arm_mve_check_df_chain_back_for_implic_predic
> +  (hash_map, bool>* safe_insn_map, rtx_insn *insn,
> +   rtx vctp_vpr_generated)
> +{
> +  bool* temp = NULL;
> +  if ((temp = safe_insn_map->get (INSN_UID (insn))))
> +return *temp;
> +
> +  basic_block body = BLOCK_FOR_INSN (insn);
> +  /* The circumstances under which an instruction is affected by "implicit
> + predication" are as follows:
> +  * It is an UNPREDICATED_INSN_P:
> + * That loads/stores from/to memory.
> + * Where any one of its operands is an MVE vector from outside the
> +   loop body bb.
> + Or:
> +  * Any of its operands, recursively backwards, are affected.  */
> +  if (MVE_VPT_UNPREDICATED_INSN_P (insn)
> +  && (arm_is_mve_load_store_insn (insn)
> +   || (arm_is_mve_across_vector_insn (insn)
> +   && !arm_mve_is_allowed_unpredic_across_vector_insn (insn
> +{
> +  safe_insn_map->put (INSN_UID (insn), true);
> +  return true;
> +}
> +
> +  df_ref insn_uses = NULL;
> +  FOR_EACH_INSN_USE (insn_uses, insn)
> +  {
> +/* If the operand is in the input reg set to the basic block,
> +   (i.e. it has come from outside the loop!), consider it unsafe if:
> +  * It's being used in an unpredicated insn.
> +  * It is a predicable MVE vector.  */
> +if (MVE_VPT_UNPREDICATED_INSN_P (insn)
> + && VALID_MVE_MODE (GET_MODE (DF_REF_REG (insn_uses)))
> + && REGNO_REG_SET_P (DF_LR_IN (body), DF_REF_REGNO (insn_uses)))
> +  {
> + safe_insn_map->put (INSN_UID (insn), true);
> + return true;
> +  }
> +/* Scan backwards from the current INSN through the instruction chain
> +   until the start of the basic block.  */
> +for (rtx_insn *prev_insn = PREV_INSN (insn);
> +  prev_insn && prev_insn != PREV_INSN (BB_HEAD (body));
> +  prev_insn = PREV_INSN (prev_insn))
> +  {
> + /* If a previous insn defines a register that INSN uses, then recurse
> +in order to check that insn's USEs.
> +If any of these insns return true as MVE_VPT_UNPREDICATED_INSN_Ps,
> +then the whole chain is affected by the change in behaviour from
> +being placed in dlstp/letp loop.  */
> + df_ref prev_insn_defs = NULL;
> + FOR_EACH_INSN_DEF (prev_insn_defs, prev_insn)
> + {
> +   if (DF_REF_REGNO (insn_uses) == DF_REF_REGNO (prev_insn_defs)
> +   && !arm_mve_vec_insn_is_predicated_with_this_predicate
> +(insn, vctp_vpr_generated)
> +   && 

Re: [PATCH v1 1/1] gcc: config: microblaze: fix cpu version check

2023-10-24 Thread Michael Eager

On 10/24/23 00:01, Frager, Neal wrote:

There is a microblaze cpu version 10.0 included in versal. If the
minor version is only a single digit, then the version comparison
will fail as version 10.0 will appear as 100 compared to version
6.00 or 8.30 which will calculate to values 600 and 830.
The issue can be seen when using the '-mcpu=10.0' option.
With this fix, versions with a single digit minor number such as
10.0 will be calculated as greater than versions with a smaller
major version number, but with two minor version digits.
By applying this fix, several incorrect warning messages will no
longer be printed when building the versal plm application, such as
the warning message below:
warning: '-mxl-multiply-high' can be used only with '-mcpu=v6.00.a'
or greater
Signed-off-by: Neal Frager 
---
   gcc/config/microblaze/microblaze.cc | 164 +---
   1 file changed, 76 insertions(+), 88 deletions(-)


Please add a test case.

--
Michael Eager


Hi Michael,

Would you mind helping me understand how to make a gcc test case for this patch?

This patch does not change the resulting binaries of a microblaze gcc build.
The output will be the same with or without the patch, so I do not have
anything in the binary itself to verify.

All that happens is that false warning messages will no longer be printed when
building with '-mcpu=10.0'.  Is there a way to test for warning messages?

In any case, please do not commit v1 of this patch.  I am going to work on 
making a v2 based on Mark’s feedback.



You can create a test case which passes -mcpu=10.0 and other options to GCC
and verify that the message is not generated after the patch is applied.



You can make all GCC warnings into errors with the "-Werror" option.
This means that the compile will fail if the warning is issued.



Take a look at gcc/testsuite/gcc.target/aarch64/bti-1.c for an example of using { dg-options 
"" } to specify command line options.



There is a test suite option (dg-warning) which checks that a particular source
line generates a warning message, but it isn't clear whether it is possible to
check that a warning is not issued.


Hi Michael,

Thanks to Mark Hatle's feedback, we have a much simpler solution to the problem.

The following change is actually all that is necessary.  Since we are just 
moving from
strcasecmp to strverscmp, does v2 of the patch need to have a test case to go 
with it?

-#define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
+#define MICROBLAZE_VERSION_COMPARE(VA,VB) strverscmp (VA, VB)

I assume there are already test cases that verify that strverscmp works 
correctly?
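
For reference, the behavioural difference is easy to see in isolation
(a minimal standalone illustration, not a DejaGnu testcase):

/* Minimal illustration, not a DejaGnu testcase: strverscmp compares
   runs of digits numerically, strcasecmp compares byte by byte.  */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <strings.h>

int
main (void)
{
  /* strverscmp: "10.0" > "8.30", since 10 > 8.  */
  printf ("%d\n", strverscmp ("10.0", "8.30") > 0);  /* prints 1 */
  /* strcasecmp: "10.0" < "8.30", since '1' < '8'.  */
  printf ("%d\n", strcasecmp ("10.0", "8.30") > 0);  /* prints 0 */
  return 0;
}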


Still need a test case to verify this fix.

--
Michael Eager


[RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-10-24 Thread exactlywb
From: Daniil Frolov 

PR 66487 is asking to provide sanitizer-like detection for C++ object lifetime
violations that are worked around with -fno-lifetime-dse in Firefox, LLVM,
OpenJade.

The discussion in the PR was centered around extending MSan, but MSan was not
ported to GCC (and requires rebuilding everything with instrumentation).

Instead, allow Valgrind to see lifetime boundaries by emitting client requests
alongside *this = { CLOBBER }.  The client request marks the "clobbered" memory as
undefined for Valgrind; clobbering assignments mark the beginning of ctor and
end of dtor execution for C++ objects.  Hence, attempts to read object storage
after the destructor, or "pre-initialize" its fields prior to the constructor
will be caught.
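
For instance, this class of bug (a hypothetical example, not taken from
the PR) becomes visible to Valgrind:

/* Hypothetical example of the bug class; not from the PR.  */
struct S
{
  int x;
  S () : x (42) {}
  ~S () {}   /* The *this = { CLOBBER } emitted after this marks the
		storage undefined for Valgrind.  */
};

int
use_after_destroy (S *p)
{
  p->~S ();
  return p->x;  /* Read after the clobber: reported once the
		   undefined value is used.  */
}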

Valgrind client requests are offered as macros that emit inline asm.  For use
in code generation, we need to wrap them in a built-in.  We know that implementing
such a built-in in libgcc is undesirable; ideally, the contents of libgcc should
not depend on the availability of external headers.  Suggestions for cleaner
solutions would be welcome.

gcc/ChangeLog:

* Makefile.in: Add gimple-valgrind.o.
* builtins.def (BUILT_IN_VALGRIND_MEM_UNDEF): Add new built-in.
* common.opt: Add new option.
* passes.def: Add new pass.
* tree-pass.h (make_pass_emit_valgrind): New function.
* gimple-valgrind.cc: New file.

libgcc/ChangeLog:

* Makefile.in: Add valgrind.o.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add option --enable-valgrind-annotations into libgcc
config.
* libgcc2.h (__valgrind_make_mem_undefined): New function.
* valgrind.c: New file.
---
 gcc/Makefile.in|   1 +
 gcc/builtins.def   |   1 +
 gcc/common.opt |   4 ++
 gcc/gimple-valgrind.cc | 124 +
 gcc/passes.def |   1 +
 gcc/tree-pass.h|   1 +
 libgcc/Makefile.in |   2 +-
 libgcc/config.in   |   9 +++
 libgcc/configure   |  79 ++
 libgcc/configure.ac|  48 
 libgcc/libgcc2.h   |   1 +
 libgcc/valgrind.c  |  50 +
 12 files changed, 320 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-valgrind.cc
 create mode 100644 libgcc/valgrind.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..ded6bdf1673 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1487,6 +1487,7 @@ OBJS = \
gimple-ssa-warn-access.o \
gimple-ssa-warn-alloca.o \
gimple-ssa-warn-restrict.o \
+   gimple-valgrind.o \
gimple-streamer-in.o \
gimple-streamer-out.o \
gimple-walk.o \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 5953266acba..42d34189f1e 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1064,6 +1064,7 @@ DEF_GCC_BUILTIN(BUILT_IN_VA_END, "va_end", 
BT_FN_VOID_VALIST_REF, ATTR_N
 DEF_GCC_BUILTIN(BUILT_IN_VA_START, "va_start", 
BT_FN_VOID_VALIST_REF_VAR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_VA_ARG_PACK, "va_arg_pack", BT_FN_INT, 
ATTR_PURE_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_VA_ARG_PACK_LEN, "va_arg_pack_len", 
BT_FN_INT, ATTR_PURE_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_VALGRIND_MEM_UNDEF, 
"__valgrind_make_mem_undefined", BT_FN_VOID_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN__EXIT, "_exit", BT_FN_VOID_INT, 
ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN__EXIT2, "_Exit", BT_FN_VOID_INT, 
ATTR_NORETURN_NOTHROW_LEAF_LIST)
 
diff --git a/gcc/common.opt b/gcc/common.opt
index f137a1f81ac..c9040386956 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2515,6 +2515,10 @@ starts and when the destructor finishes.
 flifetime-dse=
 Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization 
IntegerRange(0, 2)
 
+fvalgrind-emit-annotations
+Common Var(flag_valgrind_annotations,1)
+Emit Valgrind annotations with respect to object's lifetime.
+
 flive-patching
 Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
 
diff --git a/gcc/gimple-valgrind.cc b/gcc/gimple-valgrind.cc
new file mode 100644
index 000..8075e6404d4
--- /dev/null
+++ b/gcc/gimple-valgrind.cc
@@ -0,0 +1,124 @@
+/* Emit Valgrind client requests.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not 

Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-24 Thread Richard Earnshaw




On 08/09/2023 09:43, Christophe Lyon via Gcc-patches wrote:

The test declared 'int *carry;' and wrote to '*carry' without
initializing 'carry' first, leading to an attempt to write at address
zero, and a crash.

Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
as parameter.

2023-09-08  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.


OK.

R.


---
  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
  1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
index a8c6cce67c8..931c9d2f30b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
@@ -7,7 +7,7 @@
  
  volatile int32x4_t c1;

  volatile uint32x4_t c2;
-int *carry;
+int carry;
  
  int

  main ()
@@ -21,45 +21,45 @@ main ()
uint32x4_t inactive2 = vcreateq_u32 (0, 0);
  
   mve_pred16_t p = 0xFFFF;
 
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
 
   __builtin_arm_set_fpscr_nzcvqc (0);
 
-  c1 = vadcq (a1, b1, carry);
+  c1 = vadcq (a1, b1, &carry);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq (a2, b2, carry);
+  c2 = vadcq (a2, b2, &carry);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq (a1, b1, carry);
+  c1 = vsbcq (a1, b1, &carry);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq (a2, b2, carry);
+  c2 = vsbcq (a2, b2, &carry);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vadcq_m (inactive1, a1, b1, carry, p);
+  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq_m (inactive2, a2, b2, carry, p);
+  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
+  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
-  (*carry) = 0xFFFFFFFF;
+  carry = 0xFFFFFFFF;
   __builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
+  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
   if (__builtin_arm_get_fpscr_nzcvqc () & !0x20000000)
     __builtin_abort ();
  


Re: Re: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

2023-10-24 Thread Kito Cheng
Ok for gcc 13, but just wait one more week to make sure everything is fine,
as per gcc convention :)

Li Xu wrote on Tuesday, 2023-10-24 at 15:49:

> Committed to trunk. Thanks juzhe.
>
> --
> Li Xu
>
> >Ok for trunk (You can commit it to the trunk now).
> >
> >For GCC-13, I'd like to wait for kito's comment.
> >
> >Thanks.
> >
> >juzhe.zh...@rivai.ai
> >
> >From: Li Xu
> >Date: 2023-10-24 15:29
> >To: gcc-patches
> >CC: kito.cheng; palmer; juzhe.zhong
> >Subject: [PATCH v2] RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]
> >
> >Calling a vget/vset intrinsic without receiving a return value will cause
> >a crash, because in this case e.target is null.
> >This patch should be backported to releases/gcc-13.
> >
> >PR/target 111935
> >
> >gcc/ChangeLog:
> >
> >* config/riscv/riscv-vector-builtins-bases.cc: fix bug.
> >
> >gcc/testsuite/ChangeLog:
> >
> >* gcc.target/riscv/rvv/base/pr111935.c: New test.
> >---
> > .../riscv/riscv-vector-builtins-bases.cc  |  4 +++
> > .../gcc.target/riscv/rvv/base/pr111935.c  | 26 +++
> > 2 files changed, 30 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >
> >diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >index ab12e130907..0b1409a52e0 100644
> >--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> >@@ -1740,6 +1740,8 @@ public:
> >
> >   rtx expand (function_expander &e) const override
> >   {
> >+    if (!e.target)
> >+      return NULL_RTX;
> >     rtx dest = expand_normal (CALL_EXPR_ARG (e.exp, 0));
> >     gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (dest)));
> >     rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
> >@@ -1777,6 +1779,8 @@ public:
> >
> >   rtx expand (function_expander &e) const override
> >   {
> >+    if (!e.target)
> >+      return NULL_RTX;
> >     rtx src = expand_normal (CALL_EXPR_ARG (e.exp, 0));
> >     gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (src)));
> >     rtx index = expand_normal (CALL_EXPR_ARG (e.exp, 1));
> >diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >new file mode 100644
> >index 000..0b936d849a1
> >--- /dev/null
> >+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111935.c
> >@@ -0,0 +1,26 @@
> >+/* { dg-do compile } */
> >+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -Wno-psabi" } */
> >+
> >+#include "riscv_vector.h"
> >+
> >+inline vuint32m4_t __attribute__((__always_inline__)) transpose_indexes() {
> >+  static const uint32_t idx_[16] = {0, 4, 8, 12,
> >+                                    1, 5, 9, 13,
> >+                                    2, 6, 10, 14,
> >+                                    3, 7, 11, 15};
> >+  return __riscv_vle32_v_u32m4(idx_, 16);
> >+}
> >+
> >+void pffft_real_preprocess_4x4(const float *in) {
> >+  vfloat32m1_t r0 = __riscv_vle32_v_f32m1(in, 4);
> >+  vfloat32m4_t tmp = __riscv_vundefined_f32m4();
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 0, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 1, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 2, r0);
> >+  tmp = __riscv_vset_v_f32m1_f32m4(tmp, 3, r0);
> >+  tmp = __riscv_vrgather_vv_f32m4(tmp, transpose_indexes(), 16);
> >+  r0 = __riscv_vget_v_f32m4_f32m1(tmp, 0);
> >+}
> >+
> >+/* { dg-final { scan-assembler-times {vl[0-9]+re[0-9]+\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 10 } } */
> >+/* { dg-final { scan-assembler-times {vs[0-9]+r\.v\s+v[0-9]+,\s*0\([a-z]+[0-9]+\)} 8 } } */
> >--
> >2.17.1
> >
> >xu...@eswincomputing.com
>


Re: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-24 Thread juzhe.zh...@rivai.ai
Hi, Richard.

The assertion fails at this IR:

  _427 = _425 & _426;
  _429 = present$0_16(D) != 0;
  _430 = _425 & _429;
  _409 = _430 | _445;
  _410 = _409 | _449;
  _411 = .LOOP_VECTORIZED (3, 6);
  if (_411 != 0)
goto ; [100.00%]
  else
goto ; [100.00%]

   [local count: 3280550]:

   [local count: 29823181]:

pretmp_56 = .MASK_LOAD (_293, 32B, _427);   ---> causes the assertion failure.

You can take a look at this ifcvt IR and search for 'pretmp_56 = .MASK_LOAD (_293,
32B, _427);'.
RVV is totally the same IR as ARM SVE:
https://godbolt.org/z/rPbzfExWP 

I have been struggling with this issue for a few days but failed to figure out why.
I am sorry about that. Could you help me with it?

And I adjust the code as follows:
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..42f85839c6e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9843,10 +9843,17 @@ vectorizable_load (vec_info *vinfo,
   mask_index = internal_fn_mask_index (ifn);
   if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
+  slp_tree slp_op = NULL;
   if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
-				  &mask, NULL, &mask_dt, &mask_vectype))
+				  &mask,
+				  ifn == IFN_MASK_LEN_GATHER_LOAD ? &slp_op
+								  : NULL,
+				  &mask_dt, &mask_vectype))
return false;
+  if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+   gcc_unreachable ();
 }

As you can see, for all LOADs except MASK_LEN_GATHER_LOAD I pass 'NULL' as
before; only MASK_LEN_GATHER_LOAD passes '&slp_op'. It works fine for RVV, but I
don't think it is correct code; we may need to find another solution.

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-20 06:19
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
"juzhe.zh...@rivai.ai"  writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html 
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
> node has a known vectype by this point.  I think a comment would be useful.
>
> Since I didn't encounter a mismatched case in the RISC-V and x86 regressions,
> I fixed it in the V4 patch as follows:
> +  if (mask_index >= 0 && slp_node)
> + {
> +   bool match_p
> + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> +   gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, an ICE suddenly appeared today in the RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in 
> vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we encounter a case where mask_vectype is a boolean type and
> an external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> +  if (mask_index >= 0 && slp_node
> +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> +   /* We don't vectorize the boolean type external SLP mask.  */
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "incompatible vector types for invariants\n");
> +   return false;
> + }
>
> Bootstrap and Regression on x86 passed.
 
Why are external defs a problem though?  E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains
 
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   ...repeating...
 
The type of the mask can be driven by the code that needs it.
 
Thanks,
Richard
 
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes the following FAILs in the RISC-V regression:
>  
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
>  
> The root cause of these FAIL is that GCC SLP 

[committed] arc: Remove mpy_dest_reg_operand predicate

2023-10-24 Thread Claudiu Zissulescu
The mpy_dest_reg_operand is just a wrapper for
register_operand. Remove it.

gcc/

* config/arc/arc.md (mulsi3_700): Update pattern.
(mulsi3_v2): Likewise.
* config/arc/predicates.md (mpy_dest_reg_operand): Remove it.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.md| 6 +++---
 gcc/config/arc/predicates.md | 7 ---
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 22af0bf47dd..325e4f56b9b 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2293,7 +2293,7 @@ (define_insn "mulu64"
 ; registers, since it cannot be the destination of a multi-cycle insn
 ; like MPY or MPYU.
 (define_insn "mulsi3_700"
- [(set (match_operand:SI 0 "mpy_dest_reg_operand""=r, r,r,  r,r")
+ [(set (match_operand:SI 0 "register_operand""=r, r,r,  r,r")
(mult:SI (match_operand:SI 1 "register_operand"  "%0, r,0,  0,r")
 (match_operand:SI 2 "nonmemory_operand" "rL,rL,I,Cal,Cal")))]
  "TARGET_ARC700_MPY"
@@ -2306,8 +2306,8 @@ (define_insn "mulsi3_700"
 ; ARCv2 has no penalties between mpy and mpyu. So, we use mpy because of its
 ; short variant. LP_COUNT constraints are still valid.
 (define_insn "mulsi3_v2"
- [(set (match_operand:SI 0 "mpy_dest_reg_operand""=q,q, r, r,r,  r,  
r")
-   (mult:SI (match_operand:SI 1 "register_operand"  "%0,q, 0, r,0,  0,  c")
+ [(set (match_operand:SI 0 "register_operand""=q,q, r, r,r,  r,  
r")
+   (mult:SI (match_operand:SI 1 "register_operand"  "%0,q, 0, r,0,  0,  r")
 (match_operand:SI 2 "nonmemory_operand"  
"q,0,rL,rL,I,Cal,Cal")))]
  "TARGET_MULTI"
  "@
diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index e37d8844979..e0aef86fd24 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -23,13 +23,6 @@ (define_predicate "dest_reg_operand"
   return register_operand (op, mode);
 })
 
-(define_predicate "mpy_dest_reg_operand"
-  (match_code "reg,subreg")
-{
-  return register_operand (op, mode);
-})
-
-
 ;; Returns 1 if OP is a symbol reference.
 (define_predicate "symbolic_operand"
   (match_code "symbol_ref, label_ref, const")
-- 
2.30.2



[PATCH] gcov-io.h: fix comment regarding length of records

2023-10-24 Thread Jose E. Marchesi


The length of gcov records is stored as a signed 32-bit number of bytes.
Ok?
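
For illustration, a reader consumes each record as a tag plus a byte
count (a hypothetical struct, not the gcov-io API):

/* Hypothetical illustration, not the gcov-io API: a record is a 32-bit
   tag followed by LENGTH, a signed 32-bit count of payload bytes.  */
#include <stdint.h>

struct gcov_record_header
{
  uint32_t tag;    /* record kind; encodes the 4-level hierarchy */
  int32_t length;  /* number of bytes of payload that follow */
};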

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bfe4439d02d..e6f33e32652 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -101,7 +101,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
Records are not nested, but there is a record hierarchy.  Tag
numbers reflect this hierarchy.  Tags are unique across note and
data files.  Some record types have a varying amount of data.  The
-   LENGTH is the number of 4bytes that follow and is usually used to
+   LENGTH is the number of bytes that follow and is usually used to
determine how much data.  The tag value is split into 4 8-bit
fields, one for each of four possible levels.  The most significant
is allocated first.  Unused levels are zero.  Active levels are


Re: [x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-24 Thread Uros Bizjak
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle  wrote:
>
>
> The eagle-eyed may have spotted that my recent testcases for DImode shifts
> on x86_64 included -mno-stv in the dg-options.  This is because the
> Scalar-To-Vector (STV) pass currently transforms these shifts to use
> SSE vector operations, producing larger code even with -Os.  The issue
> is that the compute_convert_gain currently underestimates the size of
> instructions required for interunit moves, which is corrected with the
> patch below.
>
> For the simple test case:
>
> unsigned long long shl1(unsigned long long x) { return x << 1; }
>
> without this patch, GCC -m32 -Os -mavx2 currently generates:
>
> shl1:   push   %ebp  // 1 byte
> mov%esp,%ebp // 2 bytes
> vmovq  0x8(%ebp),%xmm0   // 5 bytes
> pop%ebp  // 1 byte
> vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
> vmovd  %xmm0,%eax// 4 bytes
> vpextrd $0x1,%xmm0,%edx  // 6 bytes
> ret  // 1 byte  = 24 bytes total
>
> with this patch, we now generate the shorter
>
> shl1:   push   %ebp // 1 byte
> mov%esp,%ebp// 2 bytes
> mov0x8(%ebp),%eax   // 3 bytes
> mov0xc(%ebp),%edx   // 3 bytes
> pop%ebp // 1 byte
> add%eax,%eax// 2 bytes
> adc%edx,%edx// 2 bytes
> ret // 1 byte  = 15 bytes total
>
> Benchmarking using CSiBE, shows that this patch saves 1361 bytes
> when compiling with -m32 -Os, and saves 172 bytes when compiling
> with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-23  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc (compute_convert_gain): Provide
> more accurate values (sizes) for inter-unit moves with -Os.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


[PATCH GCC13 backport] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

2023-10-24 Thread liuhongt
This is the backport patch for the releases/gcc-13 branch; the original patch
for main trunk is at [1].
The only difference between this backport patch and [1] is that GCC 13 doesn't
support auto_mpz, so this patch uses mpz_init/mpz_clear manually.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633661.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for backport to releases/gcc-13?

There's a loop in vect_peel_nonlinear_iv_init to compute init_expr *
pow (step_expr, skip_niters). When skip_niters is too big, compile time
explodes. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is a
power of 2; otherwise give up on vectorization when skip_niters >=
TYPE_PRECISION (TREE_TYPE (init_expr)).

Also give up on vectorization when niters_skip is negative, which will be the
case for fully masked loops.
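
As a worked example of the strength reduction (illustration only, with
an invented helper name):

/* Illustration only; invented helper.  With step_expr == 4
   (exact_log2 (4) == 2) and skip_niters == n:
     init * pow (4, n) == init << (2 * n),
   a constant-time shift instead of an n-iteration multiply loop.  */
unsigned int
peeled_iv_init (unsigned int init, unsigned int n)
{
  /* Caller must guarantee 2 * n < 32 to avoid undefined behaviour.  */
  return init << (2 * n);
}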

gcc/ChangeLog:

PR tree-optimization/111820
PR tree-optimization/111833
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
up vectorization for nonlinear iv vect_step_op_mul when
step_expr is not exact_log2 and niters is greater than
TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
for negative niters_skip which will be used by fully masked
loop.
(vect_can_advance_ivs_p): Pass whole phi_info to
vect_can_peel_nonlinear_iv_p.
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
init_expr * pow (step_expr, skipn) to init_expr
<< (log2 (step_expr) * skipn) when step_expr is exact_log2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111820-1.c: New test.
* gcc.target/i386/pr111820-2.c: New test.
* gcc.target/i386/pr111820-3.c: New test.
* gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
* gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
---
 .../gcc.target/i386/pr103144-mul-1.c  |  8 +++---
 .../gcc.target/i386/pr103144-mul-2.c  |  8 +++---
 gcc/testsuite/gcc.target/i386/pr111820-1.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-2.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-3.c| 16 +++
 gcc/tree-vect-loop-manip.cc   | 28 +--
 gcc/tree-vect-loop.cc | 21 +++---
 7 files changed, 98 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-3.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
index 640c34fd959..913d7737dcd 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
@@ -11,7 +11,7 @@ foo_mul (int* a, int b)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -23,7 +23,7 @@ foo_mul_const (int* a)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -34,7 +34,7 @@ foo_mul_peel (int* a, int b)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -46,6 +46,6 @@ foo_mul_peel_const (int* a)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
index 39fdea3a69d..b2ff186e335 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
@@ -16,12 +16,12 @@ avx2_test (void)
 
   __builtin_memset (epi32_exp, 0, N * sizeof (int));
   int b = 8;
-  v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, 
b * 729, b * 2187 };
+  v8si init = __extension__(v8si) { b, b * 4, b * 16, b * 64, b * 256, b * 
1024, b * 4096, b * 16384 };
 
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, , 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul (epi32_dst, b);
@@ -32,11 +32,11 @@ avx2_test (void)
   if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
 __builtin_abort ();
 
-  init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 };
+  init = __extension__(v8si) { 1, 4, 16, 64, 256, 1024, 4096, 16384 };
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, , 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul_const (epi32_dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr111820-1.c 
b/gcc/testsuite/gcc.target/i386/pr111820-1.c
new file mode 100644
index 000..50e960c39d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111820-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fno-tree-vrp -Wno-aggressive-loop-optimizations 
-fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump "Avoid 

Re: [PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Hongtao Liu
On Tue, Oct 24, 2023 at 6:10 PM Richard Sandiford
 wrote:
>
> The files changed in this patch had tests for masked and unmasked
> popcnt.  However, the mask inputs to the masked forms were undefined,
> and would be set to zero by init_regs.  Any combine-like pass that
> ran after init_regs could then fold the masked forms into the
> unmasked ones.  I saw this while testing the late-combine pass
> on x86.
>
> Tested on x86_64-linux-gnu.  OK to install?  (I didn't think this
> counted as obvious because there are other ways of initialising
> the mask.)
Maybe just move the definition of the mask outside of the functions, as
extern __mmask16 msk;
But of course your approach is also ok, so either way is ok with me.
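
For illustration, the alternative would look like this (a sketch
mirroring the first test; with msk defined out of line, no pass can
assume a value for it, so the masked popcnt cannot be folded away):

/* Sketch of the alternative, mirroring avx512bitalg-vpopcntb.c.  */
#include <immintrin.h>

extern __mmask16 msk;
extern __m512i z, z1;

int foo ()
{
  __m512i c = _mm512_popcnt_epi8 (z);
  asm volatile ("" : "+v" (c));
  c = _mm512_mask_popcnt_epi8 (z1, msk, z);
  asm volatile ("" : "+v" (c));
  return 0;
}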
>
> Richard
>
>
> gcc/testsuite/
> * gcc.target/i386/avx512bitalg-vpopcntb.c: Use an asm to define
> the mask.
> * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Likewise.
> * gcc.target/i386/avx512bitalg-vpopcntw.c: Likewise.
> * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Likewise.
> * gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Likewise.
> * gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c| 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c  | 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c| 1 +
>  gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c  | 1 +
>  gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 1 +
>  gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 1 +
>  6 files changed, 6 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> index 44b82c0519d..c52088161a0 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
> @@ -11,6 +11,7 @@ extern __m512i z, z1;
>  int foo ()
>  {
>__mmask16 msk;
> +  asm volatile ("" : "=k" (msk));
>__m512i c = _mm512_popcnt_epi8 (z);
>asm volatile ("" : "+v" (c));
>c = _mm512_mask_popcnt_epi8 (z1, msk, z);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> index 8c2dfaba9c6..7d11c6c4623 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
> @@ -16,6 +16,7 @@ int foo ()
>  {
>__mmask32 msk32;
>__mmask16 msk16;
> +  asm volatile ("" : "=k" (msk16), "=k" (msk32));
>__m256i c256 = _mm256_popcnt_epi8 (y);
>asm volatile ("" : "+v" (c256));
>c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> index 2ef8589f6c1..bc470415e9b 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
> @@ -11,6 +11,7 @@ extern __m512i z, z1;
>  int foo ()
>  {
>__mmask16 msk;
> +  asm volatile ("" : "=k" (msk));
>__m512i c = _mm512_popcnt_epi16 (z);
>asm volatile ("" : "+v" (c));
>c = _mm512_mask_popcnt_epi16 (z1, msk, z);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c 
> b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> index c976461b12e..3a6af3ed8a1 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
> @@ -16,6 +16,7 @@ int foo ()
>  {
>__mmask16 msk16;
>__mmask8 msk8;
> +  asm volatile ("" : "=k" (msk16), "=k" (msk8));
>__m256i c256 = _mm256_popcnt_epi16 (y);
>asm volatile ("" : "+v" (c256));
>c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c 
> b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> index b4d82f97032..0a54ae83055 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
> @@ -20,6 +20,7 @@ int foo ()
>  {
>__mmask16 msk;
>__mmask8 msk8;
> +  asm volatile ("" : "=k" (msk), "=k" (msk8));
>__m128i a = _mm_popcnt_epi32 (x);
>asm volatile ("" : "+v" (a));
>a = _mm_mask_popcnt_epi32 (x_1, msk8, x);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c 
> b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> index e87d6c999b6..c11e6e00998 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
> @@ -19,6 +19,7 @@ extern __m512i z, z_1;
>  int foo ()
>  {
>__mmask8 msk;
> +  asm volatile ("" : "=k" (msk));
>__m128i a = _mm_popcnt_epi64 (x);
>asm volatile ("" : "+v" (a));
>a = _mm_mask_popcnt_epi64 (x_1, msk, x);
> --
> 2.25.1
>


-- 
BR,
Hongtao


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-24 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 19 Oct 2023, Robin Dapp wrote:
>
>> Ugh, I didn't push yet because with a rebased trunk I am
>> seeing different behavior for some riscv testcases.
>> 
>> A reduction is not recognized because there is yet another
>> "double use" occurrence in check_reduction_path.  I guess it's
>> reasonable to loosen the restriction for conditional operations
>> here as well.
>> 
>> The only change to v4 therefore is:
>> 
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index ebab1953b9c..64654a55e4c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -4085,7 +4094,15 @@ pop:
>> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>> cnt++;
>> -  if (cnt != 1)
>> +
>> +  bool cond_fn_p = op.code.is_internal_fn ()
>> +   && (conditional_internal_fn_code (internal_fn (*code))
>> +   != ERROR_MARK);
>> +
>> +  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
>> +op1 twice (once as definition, once as else) in the same operation.
>> +Allow this.  */
>> +  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))
>> 
>> Bootstrapped and regtested again on x86, aarch64 and power10.
>> Testsuite on riscv unchanged.
>
> Hmm, why opi == 1 only?  I think
>
> # _1 = PHI <.., _4>
>  _3 = .COND_ADD (_1, _2, _1);
>  _4 = .COND_ADD (_3, _5, _3);
>
> would be fine as well.  I think we want to simply ignore the 'else' value
> of conditional internal functions.  I suppose we have unary, binary
> and ternary conditional functions - I miss a internal_fn_else_index,
> but I suppose it's always the last one?

Yeah, it was always the last one before the introduction of .COND_LEN.
I agree internal_fn_else_index would be useful now.

Thanks,
Richard

>
> I think a single use on .COND functions is also OK, even when on the
> 'else' value only?  But maybe that's not too important here.
>
> Maybe
>
>   gimple *op_use_stmt;
>   unsigned cnt = 0;
>   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
> if (.. op_use_stmt is conditional internal function ..)
>   {
> for (unsigned j = 0; j < gimple_call_num_args (call) - 1; ++j)
>   if (gimple_call_arg (call, j) == op.ops[opi])
> cnt++;
>   }
> else if (!is_gimple_debug (op_use_stmt)
> && (*code != ERROR_MARK
> || flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
>   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> cnt++;
>
> ?
>
>> Regards
>>  Robin
>> 
>> Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar 
>> reduction.
>> 
>> As described in PR111401 we currently emit a COND and a PLUS expression
>> for conditional reductions.  This makes it difficult to combine both
>> into a masked reduction statement later.
>> This patch improves that by directly emitting a COND_ADD/COND_OP during
>> ifcvt and adjusting some vectorizer code to handle it.
>> 
>> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
>> is true.
>> 
>> gcc/ChangeLog:
>> 
>>  PR middle-end/111401
>>  * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>>  if supported.
>>  (predicate_scalar_phi): Add whitespace.
>>  * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>>  (neutral_op_for_reduction): Return -0 for PLUS.
>>  (check_reduction_path): Don't count else operand in COND_OP.
>>  (vect_is_simple_reduction): Ditto.
>>  (vect_create_epilog_for_reduction): Fix whitespace.
>>  (vectorize_fold_left_reduction): Add COND_OP handling.
>>  (vectorizable_reduction): Don't count else operand in COND_OP.
>>  (vect_transform_reduction): Add COND_OP handling.
>>  * tree-vectorizer.h (neutral_op_for_reduction): Add default
>>  parameter.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>>  * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>>  * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>> ---
>>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
>>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
>>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>>  gcc/tree-if-conv.cc   |  49 +++--
>>  gcc/tree-vect-loop.cc | 168 ++
>>  gcc/tree-vectorizer.h |   2 +-
>>  7 files changed, 456 insertions(+), 51 deletions(-)
>>  create mode 100644 
>> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>> 
>> diff --git 
>> 

Re: [PATCH v25 25/33] libstdc++: Optimize std::is_function compilation performance

2023-10-24 Thread Jonathan Wakely
On Tue, 24 Oct 2023 at 03:16, Ken Matsui  wrote:

> This patch optimizes the compilation performance of std::is_function
> by dispatching to the new __is_function built-in trait.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_function): Use __is_function
> built-in trait.
> (is_function_v): Likewise. Optimize its implementation.
> (is_const_v): Move on top of is_function_v as is_function_v now
> depends on is_const_v.
>

I think I'd prefer to keep is_const_v where it is now, adjacent to
is_volatile_v, and move is_function_v after those.

i.e. like this (but with the additional changes to use the new built-in):

--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3198,8 +3198,8 @@ template 
   inline constexpr bool is_union_v = __is_union(_Tp);
 template 
   inline constexpr bool is_class_v = __is_class(_Tp);
-template 
-  inline constexpr bool is_function_v = is_function<_Tp>::value;
+// is_function_v is defined below, after is_const_v.
+
 template 
   inline constexpr bool is_reference_v = false;
 template 
@@ -3226,6 +3226,8 @@ template 
   inline constexpr bool is_volatile_v = false;
 template 
   inline constexpr bool is_volatile_v = true;
+template 
+  inline constexpr bool is_function_v = is_function<_Tp>::value;

 template 
   inline constexpr bool is_trivial_v = __is_trivial(_Tp);

The variable templates are currently defined in the order shown in the
standard, in the [meta.type.synop] synopsis, and in the [meta.unary.cat]
table. So let's move is_function_v later and add a comment saying why it's
not in the expected place.


[PATCH 4/6] rtl-ssa: Handle artificial uses of deleted defs

2023-10-24 Thread Richard Sandiford
If an optimisation removes the last real use of a definition,
there can still be artificial uses left.  This patch removes
those uses too.

These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s).  Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.
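
The new function below walks a use list while deleting from it, so it
caches the next pointer before unlinking each node.  A standalone
sketch of that idiom (plain C++, not GCC code):

    #include <cstdio>

    struct use_node { int id; use_node *next; };

    void
    remove_all (use_node *use)
    {
      while (use)
        {
          use_node *next_use = use->next;  // grab the successor first
          std::printf ("removing use %d\n", use->id);
          delete use;                      // now safe to free the node
          use = next_use;
        }
    }

    int
    main ()
    {
      use_node *u2 = new use_node{2, nullptr};
      use_node *u1 = new use_node{1, u2};
      remove_all (u1);
    }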

gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/functions.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.
---
 gcc/rtl-ssa/changes.cc  | 35 +--
 gcc/rtl-ssa/functions.h |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 5800f9dba97..3e14069421c 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -209,6 +209,35 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *> changes,
   return true;
 }
 
+// SET has been deleted.  Clean up all remaining uses.  Such uses are
+// either dead phis or now-redundant live-out uses.
+void
+function_info::process_uses_of_deleted_def (set_info *set)
+{
+  if (!set->has_any_uses ())
+return;
+
+  auto *use = *set->all_uses ().begin ();
+  do
+{
+  auto *next_use = use->next_use ();
+  if (use->is_in_phi ())
+   {
+ // This call will not recurse.
+ process_uses_of_deleted_def (use->phi ());
+ delete_phi (use->phi ());
+   }
+  else
+   {
+ gcc_assert (use->is_live_out_use ());
+ remove_use (use);
+   }
+  use = next_use;
+}
+  while (use);
+  gcc_assert (!set->has_any_uses ());
+}
+
 // Update the REG_NOTES of INSN, whose pattern has just been changed.
 static void
 update_notes (rtx_insn *insn)
@@ -695,7 +724,8 @@ function_info::change_insns (array_slice<insn_change *> changes)
 }
 
   // Remove all definitions that are no longer needed.  After the above,
-  // such definitions should no longer have any registered users.
+  // the only uses of such definitions should be dead phis and now-redundant
+  // live-out uses.
   //
   // In particular, this means that consumers must handle debug
   // instructions before removing a set.
@@ -704,7 +734,8 @@ function_info::change_insns (array_slice<insn_change *> changes)
   if (def->m_has_been_superceded)
{
	  auto *set = dyn_cast<set_info *> (def);
- gcc_assert (!set || !set->has_any_uses ());
+ if (set && set->has_any_uses ())
+   process_uses_of_deleted_def (set);
  remove_def (def);
}
 
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index 73690a0e63b..cd90b6aa9df 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -263,6 +263,7 @@ private:
   bb_info *create_bb_info (basic_block);
   void append_bb (bb_info *);
 
+  void process_uses_of_deleted_def (set_info *);
   insn_info *add_placeholder_after (insn_info *);
   void possibly_queue_changes (insn_change &);
   void finalize_new_accesses (insn_change &, insn_info *);
-- 
2.25.1



[PATCH 2/6] rtl-ssa: Create REG_UNUSED notes after all pending changes

2023-10-24 Thread Richard Sandiford
Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes.  function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.

The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.

gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.
---
 gcc/rtl-ssa/changes.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index de6222ae736..c73c23c86fb 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -586,8 +586,6 @@ function_info::apply_changes_to_insn (insn_change &change)
 
   insn->set_accesses (builder.finish ().begin (), num_defs, num_uses);
 }
-
-  add_reg_unused_notes (insn);
 }
 
 // Add a temporary placeholder instruction after AFTER.
@@ -733,9 +731,14 @@ function_info::change_insns (array_slice<insn_change *> changes)
}
 }
 
-  // Finally apply the changes to the underlying insn_infos.
+  // Apply the changes to the underlying insn_infos.
   for (insn_change *change : changes)
 apply_changes_to_insn (*change);
+
+  // Now that the insns and accesses are up to date, add any REG_UNUSED notes.
+  for (insn_change *change : changes)
+if (!change->is_deletion ())
+  add_reg_unused_notes (change->insn ());
 }
 
 // See the comment above the declaration.
-- 
2.25.1



[PATCH 6/6] rtl-ssa: Handle call clobbers in more places

2023-10-24 Thread Richard Sandiford
In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines.  I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function.  Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.
---
 gcc/rtl-ssa/access-utils.h | 27 +--
 gcc/rtl-ssa/accesses.cc| 25 +
 gcc/rtl-ssa/functions.cc   |  2 +-
 gcc/rtl-ssa/functions.h| 14 ++
 gcc/rtl-ssa/insns.cc   |  2 ++
 gcc/rtl-ssa/member-fns.inl |  9 +
 6 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index 84d386b7d8b..0d7a57f843c 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -127,24 +127,6 @@ set_with_nondebug_insn_uses (access_info *access)
   return nullptr;
 }
 
-// Return true if SET is the only set of SET->resource () and if it
-// dominates all uses (excluding uses of SET->resource () at points
-// where SET->resource () is always undefined).
-inline bool
-is_single_dominating_def (const set_info *set)
-{
-  return set->is_first_def () && set->is_last_def ();
-}
-
-// SET is known to be available on entry to BB.  Return true if it is
-// also available on exit from BB.  (The value might or might not be live.)
-inline bool
-remains_available_on_exit (const set_info *set, bb_info *bb)
-{
-  return (set->is_last_def ()
- || *set->next_def ()->insn () > *bb->end_insn ());
-}
-
 // ACCESS is known to be associated with an instruction rather than
 // a phi node.  Return which instruction that is.
 inline insn_info *
@@ -313,6 +295,15 @@ next_call_clobbers_ignoring (insn_call_clobbers_tree &tree, insn_info *insn,
   return tree->insn ();
 }
 
+// Search forwards from immediately after INSN for the first instruction
+// recorded in TREE.  Return null if no such instruction exists.
+inline insn_info *
+next_call_clobbers (insn_call_clobbers_tree &tree, insn_info *insn)
+{
+  auto ignore = [](const insn_info *) { return false; };
+  return next_call_clobbers_ignoring (tree, insn, ignore);
+}
+
 // If ACCESS is a set, return the first use of ACCESS by a nondebug insn I
 // for which IGNORE (I) is false.  Return null if ACCESS is not a set or if
 // no such use exists.
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 774ab9d99ee..c35c7efb73d 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1303,6 +1303,31 @@ function_info::insert_temp_clobber (obstack_watermark &watermark,
   return insert_access (watermark, clobber, old_defs);
 }
 
+// See the comment above the declaration.
+bool
+function_info::remains_available_on_exit (const set_info *set, bb_info *bb)
+{
+  if (HARD_REGISTER_NUM_P (set->regno ())
+  && TEST_HARD_REG_BIT (m_clobbered_by_calls, set->regno ()))
+{
+  insn_info *search_insn = (set->bb () == bb
+   ? set->insn ()
+   : bb->head_insn ());
+  for (ebb_call_clobbers_info *call_group : bb->ebb ()->call_clobbers ())
+   {
+ if (!call_group->clobbers (set->resource ()))
+   continue;
+
+ insn_info *insn = next_call_clobbers (*call_group, search_insn);
+ if (insn && insn->bb () == bb)
+   return false;
+   }
+}
+
+  return (set->is_last_def ()
+ || *set->next_def ()->insn () > *bb->end_insn ());
+}
+
 // A subroutine of make_uses_available.  Try to make USE's definition
 // available at the head of BB.  WILL_BE_DEBUG_USE is true if the
 // definition will be used only in debug instructions.
diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index c35d25dbf8f..8a8108baae8 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -32,7 +32,7 @@
 using namespace rtl_ssa;
 
 function_info::function_info (function *fn)
-  : m_fn (fn)
+  : m_fn (fn), m_clobbered_by_calls ()
 {
   // Force the alignment to be obstack_alignment.  Everything else is normal.
   

[PATCH 3/6] rtl-ssa: Fix ICE when deleting memory clobbers

2023-10-24 Thread Richard Sandiford
Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory.  We then need to update the DU chains to reflect
the removed clobber.

For registers this isn't a problem.  Clobbers of registers are just
momentary blips in the register's lifetime.  They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions.  Removing a clobber is therefore a
cheap, local operation.

In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers.  But removing a set and redirecting all uses
to a different set is a linear operation.  Doing it for potentially
every optimisation could lead to quadratic behaviour.

This patch therefore refrains from removing sets of memory that appear
to be redundant.  There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.

This is also a very rare event.  Usually we should try to optimise the
insn before the scratch memory has been allocated.

gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.
---
 gcc/rtl-ssa/changes.cc | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index c73c23c86fb..5800f9dba97 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -429,8 +429,18 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
   // Also keep any explicitly-recorded call clobbers, which are deliberately
   // excluded from the vec_rtx_properties.  Calls shouldn't move, so we can
   // keep the definitions in their current position.
+  //
+  // If the change describes a set of memory, but the pattern doesn't
+  // reference memory, keep the set anyway.  This can happen if the
+  // old pattern was a parallel that contained a memory clobber, and if
+  // the new pattern was recognized without that clobber.  Keeping the
+  // set avoids a linear-complexity update to the set's users.
+  //
+  // ??? We could queue an update so that these bogus clobbers are
+  // removed later.
   for (def_info *def : change.new_defs)
-if (def->m_has_been_superceded && def->is_call_clobber ())
+if (def->m_has_been_superceded
+   && (def->is_call_clobber () || def->is_mem ()))
   {
def->m_has_been_superceded = false;
def->set_insn (insn);
@@ -535,7 +545,7 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
}
 }
 
-  // Install the new list of definitions in CHANGE.
+  // Install the new list of uses in CHANGE.
   sort_accesses (m_temp_uses);
   change.new_uses = use_array (temp_access_array (m_temp_uses));
   m_temp_uses.truncate (0);
-- 
2.25.1



[PATCH 1/6] rtl-ssa: Ensure global registers are live on exit

2023-10-24 Thread Richard Sandiford
RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks.  But one case was missing.  DF does not add
artificial uses of global registers to the beginning or end
of a block.  Instead it marks them as used within every block
when computing LR and LIVE problems.

For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops.  We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.
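
As an illustration (a hedged sketch, not a testcase; the register
choice is target-specific):

    /* GNU C global register variable; r12 is a reasonable choice
       on x86_64.  */
    register long counter asm ("r12");

    void
    bump (void)
    {
      counter += 1;
    }

The store to counter has no reader in bump itself, so unless the
global register is treated as live on exit, the definition looks dead.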

Also, the previous live-on-exit handling only considered the exit
block itself.  It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.
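
The non-local goto case can be written with GNU C local labels and
nested functions; a minimal sketch (not from the testsuite):

    void
    outer (int x)
    {
      __label__ done;            /* GNU C local label */
      void helper (void)         /* GNU C nested function */
      {
        if (x)
          goto done;             /* non-local goto back into outer */
      }
      helper ();
     done:;
    }

The goto in helper transfers control straight back into outer, so the
block containing it has no path to helper's exit block.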

gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit.  Handle any block with zero
successors like an exit block.
---
 gcc/rtl-ssa/blocks.cc | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index ecce7a68c59..49c0d15b3cf 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -866,11 +866,14 @@ function_info::add_artificial_accesses (build_info &bi, df_ref_flags flags)
 
   start_insn_accesses ();
 
+  HARD_REG_SET added_regs = {};
   FOR_EACH_ARTIFICIAL_USE (ref, cfg_bb->index)
 if ((DF_REF_FLAGS (ref) & DF_REF_AT_TOP) == flags)
   {
unsigned int regno = DF_REF_REGNO (ref);
machine_mode mode = GET_MODE (DF_REF_REAL_REG (ref));
+   if (HARD_REGISTER_NUM_P (regno))
+ SET_HARD_REG_BIT (added_regs, regno);
 
// A definition must be available.
	gcc_checking_assert (bitmap_bit_p (&lr_info->in, regno)
@@ -879,10 +882,20 @@ function_info::add_artificial_accesses (build_info &bi, df_ref_flags flags)
m_temp_uses.safe_push (create_reg_use (bi, insn, { mode, regno }));
   }
 
-  // Track the return value of memory by adding an artificial use of
-  // memory at the end of the exit block.
-  if (flags == 0 && cfg_bb->index == EXIT_BLOCK)
+  // Ensure that global registers and memory are live at the end of any
+  // block that has no successors, such as the exit block and non-local gotos.
+  // Global registers have to be singled out because they are not part of
+  // the DF artificial use list (they are instead treated as used within
+  // every block).
+  if (flags == 0 && EDGE_COUNT (cfg_bb->succs) == 0)
 {
+  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+   if (global_regs[i] && !TEST_HARD_REG_BIT (added_regs, i))
+ {
+   auto mode = reg_raw_mode[i];
+   m_temp_uses.safe_push (create_reg_use (bi, insn, { mode, i }));
+ }
+
   auto *use = allocate<use_info> (insn, memory, bi.current_mem_value ());
   add_use (use);
   m_temp_uses.safe_push (use);
-- 
2.25.1



[PATCH 0/6] rtl-ssa: Various fixes needed for the late-combine pass

2023-10-24 Thread Richard Sandiford
Testing the late-combine pass showed a depressing number of
bugs in areas of RTL-SSA that hadn't been used much until now.
Most of them relate to doing things after RA.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard

Richard Sandiford (6):
  rtl-ssa: Ensure global registers are live on exit
  rtl-ssa: Create REG_UNUSED notes after all pending changes
  rtl-ssa: Fix ICE when deleting memory clobbers
  rtl-ssa: Handle artificial uses of deleted defs
  rtl-ssa: Calculate dominance frontiers for the exit block
  rtl-ssa: Handle call clobbers in more places

 gcc/rtl-ssa/access-utils.h | 27 ++---
 gcc/rtl-ssa/accesses.cc| 25 
 gcc/rtl-ssa/blocks.cc  | 60 ++
 gcc/rtl-ssa/changes.cc | 58 +++-
 gcc/rtl-ssa/functions.cc   |  2 +-
 gcc/rtl-ssa/functions.h| 15 ++
 gcc/rtl-ssa/insns.cc   |  2 ++
 gcc/rtl-ssa/internals.h|  4 +++
 gcc/rtl-ssa/member-fns.inl |  9 ++
 9 files changed, 158 insertions(+), 44 deletions(-)

-- 
2.25.1



Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-24 Thread Christophe Lyon
Ping?

On Mon, 2 Oct 2023 at 10:23, Christophe Lyon wrote:

> ping? maybe this counts as obvious?
>
>
> On Thu, 14 Sept 2023 at 11:13, Christophe Lyon 
> wrote:
>
>> ping?
>>
>> On Fri, 8 Sept 2023 at 10:43, Christophe Lyon 
>> wrote:
>>
>>> The test was declaring 'int *carry;' and wrote to '*carry' without
>>> initializing 'carry' first, leading to an attempt to write at address
>>> zero, and a crash.
>>>
>>> Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
>>> as parameter.
>>>
>>> 2023-09-08  Christophe Lyon  
>>>
>>> gcc/testsuite/
>>> * gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.
>>> ---
>>>  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
>>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>>
>>> diff --git
>>> a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> index a8c6cce67c8..931c9d2f30b 100644
>>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>>> @@ -7,7 +7,7 @@
>>>
>>>  volatile int32x4_t c1;
>>>  volatile uint32x4_t c2;
>>> -int *carry;
>>> +int carry;
>>>
>>>  int
>>>  main ()
>>> @@ -21,45 +21,45 @@ main ()
>>>uint32x4_t inactive2 = vcreateq_u32 (0, 0);
>>>
>>>mve_pred16_t p = 0x;
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vadcq (a1, b1, carry);
>>> +  c1 = vadcq (a1, b1, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vadcq (a2, b2, carry);
>>> +  c2 = vadcq (a2, b2, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vsbcq (a1, b1, carry);
>>> +  c1 = vsbcq (a1, b1, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vsbcq (a2, b2, carry);
>>> +  c2 = vsbcq (a2, b2, &carry);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vadcq_m (inactive1, a1, b1, carry, p);
>>> +  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vadcq_m (inactive2, a2, b2, carry, p);
>>> +  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
>>> +  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>> -  (*carry) = 0x;
>>> +  carry = 0x;
>>>__builtin_arm_set_fpscr_nzcvqc (0);
>>> -  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
>>> +  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
>>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>>  __builtin_abort ();
>>>
>>> --
>>> 2.34.1
>>>
>>>


Re: [PATCH 1/2] testsuite: Add and use thread_fence effective-target

2023-10-24 Thread Christophe Lyon
Ping?

On Mon, 2 Oct 2023 at 10:24, Christophe Lyon wrote:

> ping?
>
> On Sun, 10 Sept 2023 at 21:31, Christophe Lyon 
> wrote:
>
>> Some targets like arm-eabi with newlib and default settings rely on
>> __sync_synchronize() to ensure synchronization.  Newlib does not
>> implement it by default, to make users aware they have to take special
>> care.
>>
>> This makes a few tests fail to link.
>>
>> This patch adds a new thread_fence effective target (similar to the
>> corresponding one in libstdc++ testsuite), and uses it in the tests
>> that need it, making them UNSUPPORTED instead of FAIL and UNRESOLVED.
>>
>> 2023-09-10  Christophe Lyon  
>>
>> gcc/
>> * doc/sourcebuild.texi (Other attributes): Document thread_fence
>> effective-target.
>>
>> gcc/testsuite/
>> * g++.dg/init/array54.C: Require thread_fence.
>> * gcc.dg/c2x-nullptr-1.c: Likewise.
>> * gcc.dg/pr103721-2.c: Likewise.
>> * lib/target-supports.exp (check_effective_target_thread_fence):
>> New.
>> ---
>>  gcc/doc/sourcebuild.texi  |  4 
>>  gcc/testsuite/g++.dg/init/array54.C   |  1 +
>>  gcc/testsuite/gcc.dg/c2x-nullptr-1.c  |  1 +
>>  gcc/testsuite/gcc.dg/pr103721-2.c |  1 +
>>  gcc/testsuite/lib/target-supports.exp | 12 
>>  5 files changed, 19 insertions(+)
>>
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 1a78b3c1abb..a5f61c29f3b 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -2860,6 +2860,10 @@ Compiler has been configured to support link-time
>> optimization (LTO).
>>  Compiler and linker support link-time optimization relocatable linking
>>  with @option{-r} and @option{-flto} options.
>>
>> +@item thread_fence
>> +Target implements @code{__atomic_thread_fence} without relying on
>> +non-implemented @code{__sync_synchronize()}.
>> +
>>  @item naked_functions
>>  Target supports the @code{naked} function attribute.
>>
>> diff --git a/gcc/testsuite/g++.dg/init/array54.C
>> b/gcc/testsuite/g++.dg/init/array54.C
>> index f6be350ba72..5241e451d6d 100644
>> --- a/gcc/testsuite/g++.dg/init/array54.C
>> +++ b/gcc/testsuite/g++.dg/init/array54.C
>> @@ -1,5 +1,6 @@
>>  // PR c++/90947
>>  // { dg-do run { target c++11 } }
>> +// { dg-require-effective-target thread_fence }
>>
>>  #include 
>>
>> diff --git a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> index 4e440234d52..97a31c27409 100644
>> --- a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> +++ b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
>> @@ -1,5 +1,6 @@
>>  /* Test valid usage of C23 nullptr.  */
>>  /* { dg-do run } */
>> +// { dg-require-effective-target thread_fence }
>>  /* { dg-options "-std=c2x -pedantic-errors -Wall -Wextra
>> -Wno-unused-variable" } */
>>
>>  #include 
>> diff --git a/gcc/testsuite/gcc.dg/pr103721-2.c
>> b/gcc/testsuite/gcc.dg/pr103721-2.c
>> index aefa1f0f147..e059b1cfc2d 100644
>> --- a/gcc/testsuite/gcc.dg/pr103721-2.c
>> +++ b/gcc/testsuite/gcc.dg/pr103721-2.c
>> @@ -1,4 +1,5 @@
>>  // { dg-do run }
>> +// { dg-require-effective-target thread_fence }
>>  // { dg-options "-O2" }
>>
>>  extern void abort ();
>> diff --git a/gcc/testsuite/lib/target-supports.exp
>> b/gcc/testsuite/lib/target-supports.exp
>> index d353cc0aaf0..7ac9e7530cc 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -9107,6 +9107,18 @@ proc check_effective_target_sync_char_short { } {
>>  || [check_effective_target_mips_llsc] }}]
>>  }
>>
>> +# Return 1 if thread_fence does not rely on __sync_synchronize
>> +# library function
>> +
>> +proc check_effective_target_thread_fence {} {
>> +return [check_no_compiler_messages thread_fence executable {
>> +   int main () {
>> +   __atomic_thread_fence (__ATOMIC_SEQ_CST);
>> +   return 0;
>> +   }
>> +} ""]
>> +}
>> +
>>  # Return 1 if the target uses a ColdFire FPU.
>>
>>  proc check_effective_target_coldfire_fpu { } {
>> --
>> 2.34.1
>>
>>


[PATCH 2/4] rtl-ssa: Fix handling of deleted insns

2023-10-24 Thread Richard Sandiford
RTL-SSA queues up some invasive changes for later.  But sometimes
the insns involved in those changes can be deleted by later
optimisations, making the queued change unnecessary.  This patch
checks for that case.

gcc/
* rtl-ssa/changes.cc (function_info::perform_pending_updates): Check
whether an insn has been replaced by a note.
---
 gcc/rtl-ssa/changes.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 73ab3ccfd24..de6222ae736 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -983,7 +983,10 @@ function_info::perform_pending_updates ()
   for (insn_info *insn : m_queued_insn_updates)
 {
   rtx_insn *rtl = insn->rtl ();
-  if (JUMP_P (rtl))
+  if (NOTE_P (rtl))
+   // The insn was later optimized away, typically to a NOTE_INSN_DELETED.
+   ;
+  else if (JUMP_P (rtl))
{
  if (INSN_CODE (rtl) == NOOP_MOVE_INSN_CODE)
{
-- 
2.25.1



[PATCH 3/4] rtl-ssa: Don't insert after insns that can throw

2023-10-24 Thread Richard Sandiford
rtl_ssa::can_insert_after didn't handle insns that can throw.
Fixing that avoids a regression with a later patch.

gcc/
* rtl-ssa.h: Include cfgbuild.h.
* rtl-ssa/movement.h (can_insert_after): Replace is_jump with the
more comprehensive control_flow_insn_p.
---
 gcc/rtl-ssa.h  | 1 +
 gcc/rtl-ssa/movement.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
index 7355c6c4463..3a3c8b50ee2 100644
--- a/gcc/rtl-ssa.h
+++ b/gcc/rtl-ssa.h
@@ -49,6 +49,7 @@
 #include "obstack-utils.h"
 #include "mux-utils.h"
 #include "rtlanal.h"
+#include "cfgbuild.h"
 
 // Provides the global crtl->ssa.
 #include "memmodel.h"
diff --git a/gcc/rtl-ssa/movement.h b/gcc/rtl-ssa/movement.h
index d9945f49172..67370947dbd 100644
--- a/gcc/rtl-ssa/movement.h
+++ b/gcc/rtl-ssa/movement.h
@@ -61,7 +61,8 @@ move_earlier_than (insn_range_info range, insn_info *insn)
 inline bool
 can_insert_after (insn_info *insn)
 {
-  return insn->is_bb_head () || (insn->is_real () && !insn->is_jump ());
+  return (insn->is_bb_head ()
+	  || (insn->is_real () && !control_flow_insn_p (insn->rtl ())));
 }
 
 // Try to restrict move range MOVE_RANGE so that it is possible to
-- 
2.25.1



[PATCH 4/4] rtl-ssa: Avoid creating duplicated phis

2023-10-24 Thread Richard Sandiford
If make_uses_available was called twice for the same use,
we could end up trying to create duplicate definitions for
the same extended live range.

gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check
whether the requested phi already exists.
---
 gcc/rtl-ssa/blocks.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d46cbf1e388..ecce7a68c59 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -525,6 +525,11 @@ function_info::create_phi (ebb_info *ebb, resource_info 
resource,
 phi_info *
 function_info::create_degenerate_phi (ebb_info *ebb, set_info *def)
 {
+  // Allow the function to be called twice in succession for the same def.
+  def_lookup dl = find_def (def->resource (), ebb->phi_insn ());
+  if (set_info *set = dl.matching_set ())
+    return as_a<phi_info *> (set);
+
   access_info *input = def;
   phi_info *phi = create_phi (ebb, def->resource (), &input, 1);
   if (def->is_reg ())
-- 
2.25.1



[PATCH 1/4] rtl-ssa: Fix null deref in first_any_insn_use

2023-10-24 Thread Richard Sandiford
first_any_insn_use implicitly (but contrary to its documentation)
assumed that there was at least one use.

gcc/
* rtl-ssa/member-fns.inl (first_any_insn_use): Handle null
m_first_use.
---
 gcc/rtl-ssa/member-fns.inl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/member-fns.inl b/gcc/rtl-ssa/member-fns.inl
index c127fab8b98..3fdca14e0ef 100644
--- a/gcc/rtl-ssa/member-fns.inl
+++ b/gcc/rtl-ssa/member-fns.inl
@@ -215,7 +215,7 @@ set_info::last_nondebug_insn_use () const
 inline use_info *
 set_info::first_any_insn_use () const
 {
-  if (m_first_use->is_in_any_insn ())
+  if (m_first_use && m_first_use->is_in_any_insn ())
 return m_first_use;
   return nullptr;
 }
-- 
2.25.1



[PATCH 0/4] rtl-ssa: Some small, obvious fixes

2023-10-24 Thread Richard Sandiford
This series contains some small fixes to RTL-SSA.  Tested on
aarch64-linux-gnu & x86_64-linux-gnu, pushed as obvious.

Richard Sandiford (4):
  rtl-ssa: Fix null deref in first_any_insn_use
  rtl-ssa: Fix handling of deleted insns
  rtl-ssa: Don't insert after insns that can throw
  rtl-ssa: Avoid creating duplicated phis

 gcc/rtl-ssa.h  | 1 +
 gcc/rtl-ssa/blocks.cc  | 5 +
 gcc/rtl-ssa/changes.cc | 5 -
 gcc/rtl-ssa/member-fns.inl | 2 +-
 gcc/rtl-ssa/movement.h | 3 ++-
 5 files changed, 13 insertions(+), 3 deletions(-)

-- 
2.25.1



[PING] [PATCH 1/3] [GCC] arm: vld1q_types_x2 ACLE intrinsics

2023-10-24 Thread Ezra Sitorus
Ping


From: ezra.sito...@arm.com 
Sent: Friday, October 6, 2023 10:49 AM
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw; Kyrylo Tkachov
Subject: [PATCH 1/3] [GCC] arm: vld1q_types_x2 ACLE intrinsics

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vld1q intrinsic for arm32.
This patch adds the _x2 variants of the vld1q intrinsic. Tests use xN so that 
the latter variants (_x3, _x4) could be added.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
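
As a usage sketch (the function name is illustrative; this matches how
the intrinsic already behaves on aarch64), one _x2 call loads two
q-registers' worth of data from consecutive memory:

    #include <arm_neon.h>

    uint8x16x2_t
    load_pair (const uint8_t *p)
    {
      /* 32 contiguous bytes: fills .val[0] and .val[1].  */
      return vld1q_u8_x2 (p);
    }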

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
(vld1q_f16_x2, vld1q_f32_x2): New.
(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
(vld1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
* config/arm/neon.md (vld1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  67 +
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |  14 ++
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |  14 ++
 7 files changed, 247 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index cdfdb44259a..3eb41c6bdc8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10403,6 +10403,15 @@ vld1q_p64 (const poly64_t * __a)
   return (poly64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }

+__extension__ extern __inline poly64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10432,6 +10441,42 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }

+__extension__ extern __inline int8x16x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x2 (const int8_t * __a)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x2 (const int16_t * __a)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x2 (const int32_t * __a)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x2 (const int64_t * __a)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10448,6 +10493,26 @@ vld1q_f32 (const float32_t * __a)
   return (float32x4_t)__builtin_neon_vld1v4sf ((const __builtin_neon_sf *) 
__a);
 }

+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x2 (const float16_t * __a)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern 

Re: [PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Uros Bizjak
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford wrote:
>
> For the V2HI -> V2SI zero extension in:
>
>   typedef unsigned short v2hi __attribute__((vector_size(4)));
>   typedef unsigned int v2si __attribute__((vector_size(8)));
>   v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }
>
> ix86_expand_sse_extend would generate:
>
>(set (reg:V2HI 102)
> (const_vector:V2HI [(const_int 0 [0])
> (const_int 0 [0])]))
>(set (subreg:V8HI (reg:V2HI 101) 0)
> (vec_select:V8HI
>   (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
> (subreg:V8HI (reg:V2HI 102) 0))
>   (parallel [(const_int 0 [0])
>  (const_int 8 [0x8])
>  (const_int 1 [0x1])
>  (const_int 9 [0x9])
>  (const_int 2 [0x2])
>  (const_int 10 [0xa])
>  (const_int 3 [0x3])
>  (const_int 11 [0xb])])))
>   (set (reg:V2SI 100)
>(subreg:V2SI (reg:V2HI 101) 0))
> (expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))
>
> But using (subreg:V8HI (reg:V2HI 101) 0) as the destination of
> the vec_select means that only the low 4 bytes of the destination
> are stored.  Only the lower half of reg 100 is well-defined.
>
> Things tend to happen to work if the register allocator ties reg 101
> to reg 100.  But it caused problems with the upcoming late-combine pass
> because we propagated the set of reg 100 into its uses.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
> destination to be wider than the sources.  Take the mode from the
> first source.
> (ix86_expand_sse_extend): Pass the destination directly to
> ix86_split_mmx_punpck, rather than using a fresh register that
> is half the size.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 1eae9d7c78c..2361ff77af3 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -1110,7 +1110,9 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
>ix86_move_vector_high_sse_to_mmx (op0);
>  }
>
> -/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
> +/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  This is also used
> +   for a full unpack of OPERANDS[1] and OPERANDS[2] into a wider
> +   OPERANDS[0].  */
>
>  void
>  ix86_split_mmx_punpck (rtx operands[], bool high_p)
> @@ -1118,7 +1120,7 @@ ix86_split_mmx_punpck (rtx operands[], bool high_p)
>rtx op0 = operands[0];
>rtx op1 = operands[1];
>rtx op2 = operands[2];
> -  machine_mode mode = GET_MODE (op0);
> +  machine_mode mode = GET_MODE (op1);
>rtx mask;
>/* The corresponding SSE mode.  */
>machine_mode sse_mode, double_sse_mode;
> @@ -5660,7 +5662,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool 
> unsigned_p)
>gcc_unreachable ();
>  }
>
> -  ops[0] = gen_reg_rtx (imode);
> +  ops[0] = dest;
>
>ops[1] = force_reg (imode, src);
>
> @@ -5671,7 +5673,6 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool 
> unsigned_p)
>   ops[1], pc_rtx, pc_rtx);
>
>ix86_split_mmx_punpck (ops, false);
> -  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
>  }
>
>  /* Unpack SRC into the next wider integer vector type.  UNSIGNED_P is
> --
> 2.25.1
>


[PATCH] recog: Fix propagation into ASM_OPERANDS

2023-10-24 Thread Richard Sandiford
An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually.  This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.
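
For example (a hedged sketch of the situation, not a new testcase),
both outputs below become SETs in a single PARALLEL whose SET_SRCs are
the same shared ASM_OPERANDS:

    void
    f (int x, int *a, int *b)
    {
      asm ("" : "=r" (*a), "=r" (*b) : "r" (x));
    }

Propagating a substitution into each SET_SRC separately would
therefore visit the same ASM_OPERANDS twice.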

This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
ASM_OPERANDS.
---
 gcc/recog.cc | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index e12b4c9500e..3bd2d73c259 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1339,13 +1339,26 @@ insn_propagation::apply_to_pattern_1 (rtx *loc)
	 && apply_to_pattern_1 (&COND_EXEC_CODE (body)));
 
 case PARALLEL:
-  {
-   int last = XVECLEN (body, 0) - 1;
-   for (int i = 0; i < last; ++i)
-	  if (!apply_to_pattern_1 (&XVECEXP (body, 0, i)))
-   return false;
-	return apply_to_pattern_1 (&XVECEXP (body, 0, last));
-  }
+  for (int i = 0; i < XVECLEN (body, 0); ++i)
+   {
+	  rtx *subloc = &XVECEXP (body, 0, i);
+ if (GET_CODE (*subloc) == SET)
+   {
+ if (!apply_to_lvalue_1 (SET_DEST (*subloc)))
+   return false;
+ /* ASM_OPERANDS are shared between SETs in the same PARALLEL.
+Only process them on the first iteration.  */
+ if ((i == 0 || GET_CODE (SET_SRC (*subloc)) != ASM_OPERANDS)
+		  && !apply_to_rvalue_1 (&SET_SRC (*subloc)))
+   return false;
+   }
+ else
+   {
+ if (!apply_to_pattern_1 (subloc))
+   return false;
+   }
+   }
+  return true;
 
 case ASM_OPERANDS:
   for (int i = 0, len = ASM_OPERANDS_INPUT_LENGTH (body); i < len; ++i)
-- 
2.25.1



[PATCH] recog/reload: Remove old UNARY_P operand support

2023-10-24 Thread Richard Sandiford
reload and constrain_operands had some old code to look through unary
operators.  E.g. an operand could be (sign_extend (reg X)), and the
constraints would match the reg rather than the sign_extend.

This was previously used by the MIPS port.  But relying on it was a
recurring source of problems, so Eric and I removed it in the MIPS
rewrite from ~20 years back.  I don't know of any other port that used it.

Also, the constraints processing in LRA and IRA do not have direct
support for these embedded operators, so I think it was only ever a
reload-specific feature (and probably only a global/local+reload-specific
feature, rather than IRA+reload).

Keeping the checks caused problems for special memory constraints,
leading to:

  /* A unary operator may be accepted by the predicate, but it
 is irrelevant for matching constraints.  */
  /* For special_memory_operand, there could be a memory operand inside,
 and it would cause a mismatch for constraint_satisfied_p.  */
  if (UNARY_P (op) && op == extract_mem_from_operand (op))
op = XEXP (op, 0);

But inline asms are another source of problems.  Asms don't have
predicates, and so we can't use recog to decide whether a given change
to an asm gives a valid match.  We instead rely on constrain_operands as
something of a recog stand-in.  For an example like:

void
foo (int *ptr)
{
  asm volatile ("%0" :: "r" (-*ptr));
}

any attempt to propagate the negation into the asm would be allowed,
because it's the negated register that would be checked against the
"r" constraint.  This would later lead to:

error: invalid 'asm': invalid operand

The same thing happened in gcc.target/aarch64/vneg_s.c with the
upcoming late-combine pass.

Rather than add more workarounds, it seemed better just to delete
this code.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Remove UNARY_P handling.
* reload.cc (find_reloads): Likewise.
---
 gcc/recog.cc  | 15 ---
 gcc/reload.cc |  6 --
 2 files changed, 21 deletions(-)

diff --git a/gcc/recog.cc b/gcc/recog.cc
index 92f151248a6..e12b4c9500e 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3080,13 +3080,6 @@ constrain_operands (int strict, alternative_mask alternatives)
 
  earlyclobber[opno] = 0;
 
- /* A unary operator may be accepted by the predicate, but it
-is irrelevant for matching constraints.  */
- /* For special_memory_operand, there could be a memory operand inside,
-and it would cause a mismatch for constraint_satisfied_p.  */
- if (UNARY_P (op) && op == extract_mem_from_operand (op))
-   op = XEXP (op, 0);
-
  if (GET_CODE (op) == SUBREG)
{
  if (REG_P (SUBREG_REG (op))
@@ -3152,14 +3145,6 @@ constrain_operands (int strict, alternative_mask alternatives)
{
  rtx op1 = recog_data.operand[match];
  rtx op2 = recog_data.operand[opno];
-
- /* A unary operator may be accepted by the predicate,
-but it is irrelevant for matching constraints.  */
- if (UNARY_P (op1))
-   op1 = XEXP (op1, 0);
- if (UNARY_P (op2))
-   op2 = XEXP (op2, 0);
-
  val = operands_match_p (op1, op2);
}
 
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 2e57ebb3cac..07256b6cf2f 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3077,12 +3077,6 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
  enum constraint_num cn;
  enum reg_class cl;
 
- /* If the predicate accepts a unary operator, it means that
-we need to reload the operand, but do not do this for
-match_operator and friends.  */
- if (UNARY_P (operand) && *p != 0)
-   operand = XEXP (operand, 0);
-
  /* If the operand is a SUBREG, extract
 the REG or MEM (or maybe even a constant) within.
 (Constants can occur as a result of reg_equiv_constant.)  */
-- 
2.25.1



[PATCH] i386: Fix undefined masks in vpopcnt tests

2023-10-24 Thread Richard Sandiford
The files changed in this patch had tests for masked and unmasked
popcnt.  However, the mask inputs to the masked forms were undefined,
and would be set to zero by init_regs.  Any combine-like pass that
ran after init_regs could then fold the masked forms into the
unmasked ones.  I saw this while testing the late-combine pass
on x86.
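
As a sketch of the failure mode (hypothetical function, not one of the
tests): with msk never written, init_regs zeroes it, and a masked
popcnt with an all-zero mask just returns its first operand, so the
whole masked form can be folded away.  The empty asm instead gives the
mask a well-defined but unknown value:

    #include <immintrin.h>

    __attribute__ ((target ("avx512f,avx512bitalg")))
    __m512i
    masked_popcnt (__m512i src, __m512i a)
    {
      __mmask64 msk;
      asm volatile ("" : "=k" (msk));
      return _mm512_mask_popcnt_epi8 (src, msk, a);
    }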

Tested on x86_64-linux-gnu.  OK to install?  (I didn't think this
counted as obvious because there are other ways of initialising
the mask.)

Richard


gcc/testsuite/
* gcc.target/i386/avx512bitalg-vpopcntb.c: Use an asm to define
the mask.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Likewise.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Likewise.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c  | 1 +
 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 1 +
 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 1 +
 6 files changed, 6 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c 
b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
index 44b82c0519d..c52088161a0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
@@ -11,6 +11,7 @@ extern __m512i z, z1;
 int foo ()
 {
   __mmask16 msk;
+  asm volatile ("" : "=k" (msk));
   __m512i c = _mm512_popcnt_epi8 (z);
   asm volatile ("" : "+v" (c));
   c = _mm512_mask_popcnt_epi8 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c 
b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
index 8c2dfaba9c6..7d11c6c4623 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
@@ -16,6 +16,7 @@ int foo ()
 {
   __mmask32 msk32;
   __mmask16 msk16;
+  asm volatile ("" : "=k" (msk16), "=k" (msk32));
   __m256i c256 = _mm256_popcnt_epi8 (y);
   asm volatile ("" : "+v" (c256));
   c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c 
b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
index 2ef8589f6c1..bc470415e9b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
@@ -11,6 +11,7 @@ extern __m512i z, z1;
 int foo ()
 {
   __mmask16 msk;
+  asm volatile ("" : "=k" (msk));
   __m512i c = _mm512_popcnt_epi16 (z);
   asm volatile ("" : "+v" (c));
   c = _mm512_mask_popcnt_epi16 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c 
b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
index c976461b12e..3a6af3ed8a1 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
@@ -16,6 +16,7 @@ int foo ()
 {
   __mmask16 msk16;
   __mmask8 msk8;
+  asm volatile ("" : "=k" (msk16), "=k" (msk8));
   __m256i c256 = _mm256_popcnt_epi16 (y);
   asm volatile ("" : "+v" (c256));
   c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c 
b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
index b4d82f97032..0a54ae83055 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
@@ -20,6 +20,7 @@ int foo ()
 {
   __mmask16 msk;
   __mmask8 msk8;
+  asm volatile ("" : "=k" (msk), "=k" (msk8));
   __m128i a = _mm_popcnt_epi32 (x);
   asm volatile ("" : "+v" (a));
   a = _mm_mask_popcnt_epi32 (x_1, msk8, x);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c 
b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
index e87d6c999b6..c11e6e00998 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
@@ -19,6 +19,7 @@ extern __m512i z, z_1;
 int foo ()
 {
   __mmask8 msk; 
+  asm volatile ("" : "=k" (msk));
   __m128i a = _mm_popcnt_epi64 (x);
   asm volatile ("" : "+v" (a));
   a = _mm_mask_popcnt_epi64 (x_1, msk, x);
-- 
2.25.1



[PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Richard Sandiford
For the V2HI -> V2SI zero extension in:

  typedef unsigned short v2hi __attribute__((vector_size(4)));
  typedef unsigned int v2si __attribute__((vector_size(8)));
  v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }

ix86_expand_sse_extend would generate:

   (set (reg:V2HI 102)
(const_vector:V2HI [(const_int 0 [0])
(const_int 0 [0])]))
   (set (subreg:V8HI (reg:V2HI 101) 0)
(vec_select:V8HI
  (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
(subreg:V8HI (reg:V2HI 102) 0))
  (parallel [(const_int 0 [0])
 (const_int 8 [0x8])
 (const_int 1 [0x1])
 (const_int 9 [0x9])
 (const_int 2 [0x2])
 (const_int 10 [0xa])
 (const_int 3 [0x3])
 (const_int 11 [0xb])])))
  (set (reg:V2SI 100)
   (subreg:V2SI (reg:V2HI 101) 0))
(expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))

But using (subreg:V8HI (reg:V2HI 101) 0) as the destination of
the vec_select means that only the low 4 bytes of the destination
are stored.  Only the lower half of reg 100 is well-defined.

Things tend to happen to work if the register allocator ties reg 101
to reg 100.  But it caused problems with the upcoming late-combine pass
because we propagated the set of reg 100 into its uses.

Tested on x86_64-linux-gnu.  OK to install?

Richard


gcc/
* config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
destination to be wider than the sources.  Take the mode from the
first source.
(ix86_expand_sse_extend): Pass the destination directly to
ix86_split_mmx_punpck, rather than using a fresh register that
is half the size.
---
 gcc/config/i386/i386-expand.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1eae9d7c78c..2361ff77af3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1110,7 +1110,9 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
-/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  This is also used
+   for a full unpack of OPERANDS[1] and OPERANDS[2] into a wider
+   OPERANDS[0].  */
 
 void
 ix86_split_mmx_punpck (rtx operands[], bool high_p)
@@ -1118,7 +1120,7 @@ ix86_split_mmx_punpck (rtx operands[], bool high_p)
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  machine_mode mode = GET_MODE (op0);
+  machine_mode mode = GET_MODE (op1);
   rtx mask;
   /* The corresponding SSE mode.  */
   machine_mode sse_mode, double_sse_mode;
@@ -5660,7 +5662,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
   gcc_unreachable ();
 }
 
-  ops[0] = gen_reg_rtx (imode);
+  ops[0] = dest;
 
   ops[1] = force_reg (imode, src);
 
@@ -5671,7 +5673,6 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
  ops[1], pc_rtx, pc_rtx);
 
   ix86_split_mmx_punpck (ops, false);
-  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
 }
 
 /* Unpack SRC into the next wider integer vector type.  UNSIGNED_P is
-- 
2.25.1



[pushed] i386: Fix unprotected REGNO in aeswidekl_operation

2023-10-24 Thread Richard Sandiford
I hit an ICE in aeswidekl_operation while testing the late-combine
pass on x86.  The predicate tested REGNO without first testing REG_P.

Tested on x86_64-linux-gnu & pushed as obvious.

Richard


gcc/
* config/i386/predicates.md (aeswidekl_operation): Protect
REGNO check with REG_P.
---
 gcc/config/i386/predicates.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index ef49efdbde5..e3d55f0c502 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -2260,6 +2260,7 @@ (define_predicate "aeswidekl_operation"
  || GET_CODE (SET_SRC (elt)) != UNSPEC_VOLATILE
  || GET_MODE (SET_SRC (elt)) != V2DImode
  || XVECLEN (SET_SRC (elt), 0) != 1
+ || !REG_P (XVECEXP (SET_SRC (elt), 0, 0))
  || REGNO (XVECEXP (SET_SRC (elt), 0, 0)) != GET_SSE_REGNO (i))
return false;
 }
-- 
2.25.1


