Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-11 Thread Ajit Agarwal
Hello Kewen:

On 12/12/23 11:58 am, Kewen.Lin wrote:
> Hi Ajit,
> 
> on 2023/12/8 16:01, Ajit Agarwal wrote:
>> Hello Kewen:
>>
> 
> [snip...]
> 
>> With UNSPEC_MMA_EXTRACT I could generate the register pair, but the
>> code below is functionally incorrect.
>>
>> lxvp %vs0,0(%r4)
>> xxlor %vs32,%vs0,%vs0
>> xvf32ger 0,%vs34,%vs32
>> xvf32gerpp 0,%vs34,%vs33
>> xxmfacc 0
>> stxvp %vs2,0(%r3)
>> stxvp %vs0,32(%r3)
>> blr
>>
>>
>> Here is the RTL Code:
>>
>> (insn 19 4 20 2 (set (reg:OO 124 [ *ptr_4(D) ])
>> (mem:OO (reg/v/f:DI 122 [ ptr ]) [0 *ptr_4(D)+0 S16 A128])) -1
>>  (nil))
>> (insn 20 19 9 2 (set (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
>> (subreg:V16QI (reg:OO 124 [ *ptr_4(D) ]) 0)) -1
>>  (nil))
>> (insn 9 20 11 2 (set (reg:XO 119 [ _7 ])
>> (unspec:XO [
>> (reg/v:V16QI 123 [ src ])
>> (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
>> ] UNSPEC_MMA_XVF32GER)) 2195 {mma_xvf32ger}
>>  (expr_list:REG_DEAD (reg:OO 124 [ *ptr_4(D) ])
>> (nil)))
>> (insn 11 9 12 2 (set (reg:XO 120 [ _9 ])
>> (unspec:XO [
>> (reg:XO 119 [ _7 ])
>> (reg/v:V16QI 123 [ src ])
>> (reg:V16QI 125 [ MEM[(__vector unsigned char *)ptr_4(D) + 
>> 16B] ])
>> ] UNSPEC_MMA_XVF32GERPP)) 2209 {mma_xvf32gerpp}
>>  (expr_list:REG_DEAD (reg:V16QI 125 [ MEM[(__vector unsigned char 
>> *)ptr_4(D) + 16B] ])
>> (expr_list:REG_DEAD (reg/v:V16QI 123 [ src ])
>> (expr_list:REG_DEAD (reg:XO 119 [ _7 ])
>> (nil)
>> (insn 12 11 18 2 (set (mem:XO (reg:DI 126) [1 *dst_10(D)+0 S64 A128])
>> (reg:XO 120 [ _9 ])) 
>> "../gcc/testsuite/g++.target/powerpc/vecload.C":13:8 2182 {*movxo}
>>  (expr_list:REG_DEAD (reg:DI 126)
>> (expr_list:REG_DEAD (reg:XO 120 [ _9 ])
>> (nil
>> (note 18 12 0 NOTE_INSN_DELETED)
>>
>> r124 and r129 have conflicting live ranges, so IRA assigns them
>> different registers, which does not serve our purpose.
>>
>> Making r124 and r129 the same register is not possible for IRA either,
>> as r124 would then need both OOmode and V16QImode.
>>
>> Running this pass before IRA has the above issues; we can solve them
>> by running the pass after reload.
> 
> Could you also attach your latest WIP patch?  I'm going to look into the 
> extra move issue with it.
>

I have fixed the register allocator (IRA) pass to generate the register
pair; with that fix no extra move is generated, and the code now uses the
register pair.

Earlier you suggested using SUBREG V16QI (reg:OO 124) at the use, but that
also does not generate the register pair. I had to make changes in the IRA
register allocator to generate the register pair, and no extra moves are
generated either. I am testing the fix and will send it for code review very
soon.

Thanks for the help and suggestions.

Thanks & Regards
Ajit
 
> Thanks!
> 
> BR,
> Kewen


[PATCH] tree-optimization/112939 - VN PHI visiting and -ftrivial-auto-var-init

2023-12-11 Thread Richard Biener
The following builds upon the last fix, making sure we only value-number
to visited (un-)defs, otherwise prefer .VN_TOP.
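The intended PHI meet semantics can be sketched with a toy model (illustrative only; the names `PhiArg` and `meet` are invented here and this is not the actual sccvn code):

```cpp
#include <cassert>
#include <vector>

// Toy model of the PHI meet described above: VN_TOP acts as "no
// information"; an undefined arg may only become the PHI result if its
// definition was actually visited, otherwise we fall back to VN_TOP.
enum Kind { VN_TOP, UNDEF, VALUE };

struct PhiArg { Kind kind; bool visited; int val; };

// Returns a VALUE's val, -1 for VN_TOP, or -2 for a visited undef.
int meet (const std::vector<PhiArg> &args)
{
  bool seen_undef = false, seen_undef_visited = false;
  int sameval = -1;            // start from VN_TOP
  for (const PhiArg &a : args)
    {
      if (a.kind == VN_TOP)
        continue;
      if (a.kind == UNDEF)
        {
          seen_undef = true;
          seen_undef_visited |= a.visited;
          continue;
        }
      sameval = a.val;         // toy: assume all VALUE args agree
    }
  if (sameval == -1)
    // All args undefined or VN_TOP: only use an undef we have visited.
    return (seen_undef && seen_undef_visited) ? -2 : -1;
  return sameval;
}
```

With only an unvisited undef the meet stays at VN_TOP, which is the behavior the one-line change above restores.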

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112939
* tree-ssa-sccvn.cc (visit_phi): When all args are undefined
make sure we end up with a value that was visited, otherwise
fall back to .VN_TOP.

* gcc.dg/pr112939.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr112939.c | 23 +++
 gcc/tree-ssa-sccvn.cc   |  4 +++-
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112939.c

diff --git a/gcc/testsuite/gcc.dg/pr112939.c b/gcc/testsuite/gcc.dg/pr112939.c
new file mode 100644
index 000..7017beff30a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr112939.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
+
+int i;
+
+void f (void)
+{
+  for (;;)
+  {
+if (0)
+  for (;;)
+  {
+int *a;
+int *b = a;
+
+ l1:
+*b = (*b != 0) ? 0 : 2;
+  }
+
+if (i != 0)
+  goto l1;
+  }
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 11537fa3e0b..a178b768459 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5946,6 +5946,8 @@ visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
if (TREE_CODE (def) == SSA_NAME)
  {
tree val = SSA_VAL (def, &visited);
+   if (SSA_NAME_IS_DEFAULT_DEF (def))
+ visited = true;
if (!backedges_varying_p || !(e->flags & EDGE_DFS_BACK))
  def = val;
if (e->flags & EDGE_DFS_BACK)
@@ -6091,7 +6093,7 @@ visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
   /* If we saw only undefined values and VN_TOP use one of the
  undefined values.  */
   else if (sameval == VN_TOP)
-result = seen_undef ? seen_undef : sameval;
+result = (seen_undef && seen_undef_visited) ? seen_undef : sameval;
   /* First see if it is equivalent to a phi node in this block.  We prefer
  this as it allows IV elimination - see PRs 66502 and 67167.  */
   else if ((result = vn_phi_lookup (phi, backedges_varying_p)))
-- 
2.35.3


Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Andrew Pinski
On Mon, Dec 11, 2023 at 11:46 AM Richard Sandiford
 wrote:
>
> Jeff Law  writes:
> > On 11/27/23 05:12, Richard Sandiford wrote:
> >> check_asm_operands was inconsistent about how it handled "p" after
> >> RA compared to before RA.  Before RA it tested the address with a
> >> void (unknown) memory mode:
> >>
> >>  case CT_ADDRESS:
> >>/* Every address operand can be reloaded to fit.  */
> >>result = result || address_operand (op, VOIDmode);
> >>break;
> >>
> >> After RA it deferred to constrain_operands, which used the mode
> >> of the operand:
> >>
> >>  if ((GET_MODE (op) == VOIDmode
> >>   || SCALAR_INT_MODE_P (GET_MODE (op)))
> >>  && (strict <= 0
> >>  || (strict_memory_address_p
> >>   (recog_data.operand_mode[opno], op
> >>win = true;
> >>
> >> Using the mode of the operand matches reload's behaviour:
> >>
> >>else if (insn_extra_address_constraint
> >> (lookup_constraint (constraints[i])))
> >>  {
> >>address_operand_reloaded[i]
> >>  = find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
> >>  recog_data.operand[i],
> >>  recog_data.operand_loc[i],
> >>  i, operand_type[i], ind_levels, insn);
> >>
> >> It allowed the special predicate address_operand to be used, with the
> >> mode of the operand being the mode of the addressed memory, rather than
> >> the mode of the address itself.  For example, vax has:
> >>
> >> (define_insn "*movaddr<mode>"
> >>[(set (match_operand:SI 0 "nonimmediate_operand" "=g")
> >>  (match_operand:VAXfp 1 "address_operand" "p"))
> >> (clobber (reg:CC VAX_PSL_REGNUM))]
> >>"reload_completed"
> >>"mova %a1,%0")
> >>
> >> where operand 1 is an SImode expression that can address memory of
> >> mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
> >> but recog_data.operand_mode[1] is <MODE>mode.
> >>
> >> But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
> >> do this, and instead pass VOIDmode.  So I think this traditional use
> >> of address_operand is effectively an old-reload-only feature.
> >>
> >> And it seems like no modern port cares.  I think ports have generally
> >> moved to using different address constraints instead, rather than
> >> relying on "p" with different operand modes.  Target-specific address
> >> constraints post-date the code above.
> >>
> >> The big advantage of using different constraints is that it works
> >> for asms too.  And that (to finally get to the point) is the problem
> >> fixed in this patch.  For the aarch64 test:
> >>
> >>void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }
> >>
> >> everything up to and including RA required the operand to be a
> >> valid VOIDmode address.  But post-RA check_asm_operands and
> >> constrain_operands instead required it to be valid for
> >> recog_data.operand_mode[0].  Since asms have no syntax for
> >> specifying an operand mode that's separate from the operand itself,
> >> operand_mode[0] is simply Pmode (i.e. DImode).
> >>
> >> This meant that we required one mode before RA and a different mode
> >> after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
> >> more conservative/restricted range than DImode.  So if a post-RA pass
> >> tried to form a new address, it would use a laxer condition than the
> >> pre-RA passes.
> > This was initially a bit counter-intuitive, my first reaction was that a
> > wildcard mode is more general.  And that's true, but it necessarily
> > means the addresses accepted are more restrictive because any mode is
> > allowed.
>
> Right.  I should probably have a conservative, common subset.
>
> >> This happened with the late-combine pass that I posted in October:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
> >> which in turn triggered an error from aarch64_print_operand_address.
> >>
> >> This patch takes the (hopefully) conservative fix of using VOIDmode for
> >> asms but continuing to use the operand mode for .md insns, so as not
> >> to break ports that still use reload.
> > Sadly I didn't get as far as I would have liked in removing reload,
> > though we did get a handful of ports converted this cycle
> >
> >>
> >> Fixing this made me realise that recog_level2 was doing duplicate
> >> work for asms after RA.
> >>
> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> gcc/
> >>  * recog.cc (constrain_operands): Pass VOIDmode to
> >>  strict_memory_address_p for 'p' constraints in asms.
> >>  * rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
> >>  for asms.
> >>
> >> gcc/testsuite/
> >>  * gcc.target/aarch64/prfm_imm_offset_2.c: New test.
> > It all seems a bit hackish.  I don't think ports have 

[PATCH draft v2] sched: Don't skip empty block in scheduling [PR108273]

2023-12-11 Thread Kewen.Lin
Hi,

on 2023/11/22 17:30, Kewen.Lin wrote:
> on 2023/11/17 20:55, Alexander Monakov wrote:
>>
>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>>> to put it early in schedule_insns.
>>>
>>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>>> instead, since schedule_insns invokes haifa_sched_init; although the
>>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>>> ahead, they are all "setup" functions, so they shouldn't affect or be
>>> affected by this placement.
>>
>> I was worried because sched_init invokes df_analyze, and I'm not sure if
>> cfg_cleanup can invalidate it.
> 
> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> considers df, like compact_blocks checks df, try_optimize_cfg invokes
> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
> makes more sense.
> 
>>
>>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>>
>>> By looking into some failures, I found that although cleanup_cfg is executed,
>>> there would still be some empty blocks left; by analyzing a few failures,
>>> there are at least these cases:
>>>   1. empty function body
>>>   2. block holding a label for return.
>>>   3. block without any successor.
>>>   4. block which becomes empty after scheduling some other block.
>>>   5. block which looks mergeable with its always successor but left.
>>>   ...
>>>
>>> For 1 and 2, there is one single successor EXIT block, so I think they don't
>>> affect state transition; for 3, it's the same.  For 4, it depends on whether
>>> we can assume this kind of empty block has no chance to hold a debug insn
>>> (as any associated debug insn should be moved along), I'm not sure.  For 5,
>>> a reduced test case is:
>>
>> Oh, I should have thought of cases like these, really sorry about the slip
>> of attention, and thanks for showing a testcase for item 5. As Richard as
>> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
>> would be changing no_real_insns_p to always return false, and see if the
>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
>> a non-bootstrapped compiler are still informative).
> 
> As you suggested, I forced no_real_insns_p to return false all the time, some
> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
> encountered in those places, so the adjustments for most of them are just to
> consider NOTE_P or this kind of special block and so on.  One draft patch is
> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
> empty blocks and expose more issues).  The draft isn't qualified for code
> review but I hope it can provide some information on what kinds of changes
> are needed for the proposal.  If this is the direction which we all agree on,
> I'll further refine it and post a formal patch.  One thing I want to note is
> that this patch disable one assertion below:
> 
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>  }
> 
>/* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
>sched_finish_ready_list ();
> 
> Some cases can cause this assertion to fail; it's due to a mismatch between
> the to-be-scheduled and scheduled insn counts.  The reason it happens is that
> one block previously has only one INSN_P, but while scheduling some other
> blocks it gets moved as well; then we end up with an empty block, so the
> only NOTE_P insn gets counted at that point, but since this block isn't
> empty initially and NOTE_P gets skipped in a normal block, the
> to-be-scheduled count can't include it.  It can be fixed by special-casing
> this kind of block for counting, like initially recording which blocks are
> empty and, if a block isn't recorded as such, fixing up the count for it
> accordingly.  I'm not sure if someone may argue that all this complication
> makes the proposal beaten by the previous special-casing-debug-insn
> approach; looking forward to more comments.
> 

The attached one is the improved draft patch v2 for skipping empty BB, against
the previous draft, it does:
  1) use NONDEBUG_INSN_P for !DEBUG_INSN_P && !NOTE_P when it's appropriate;
  2) merge NOTE_P special handling into the one on DEBUG_INSN_P;
  3) fix exposed issue on broad testing on EBB;
  4) introduce rgn_init_empty_bb for mismatch count issue;
  5) add/refine some comments;

It's bootstrapped and regress-tested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.  I also tested with EBB turned on by default, one
issue in schedule_ebb got exposed and 

[PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-11 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases.  For example:

float a[1];
float t() { return a[0] + a[8000]; }

is compiled to:

la.local    $r13,a
la.local    $r12,a+32768
fld.s       $f1,$r13,0
fld.s       $f0,$r12,-768
fadd.s      $f0,$f1,$f0

by trunk.  But as we've explained in r14-4851, the following would be
better with -mexplicit-relocs=auto:

pcalau12i   $r13,%pc_hi20(a)
pcalau12i   $r12,%pc_hi20(a+32000)
fld.s   $f1,$r13,%pc_lo12(a)
fld.s   $f0,$r12,%pc_lo12(a+32000)
fadd.s  $f0,$f1,$f0

However the sliding-window algorithm just won't detect the pcalau12i/fld
pair to be optimized.  Using a define_insn_and_split in the combine pass
works around the issue.
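The limitation can be illustrated with a toy model (purely illustrative; the `Insn` representation and matcher functions are invented, not GCC's peephole2 or combine machinery): a window-based matcher only fires on adjacent insns, while a dataflow-based matcher pairs a definition with its use regardless of distance.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Insn { std::string op; int def; int use; };  // toy regs, -1 = none

// Sliding-window matcher: only fires when the pair is adjacent.
int peephole_pairs (const std::vector<Insn> &insns)
{
  int n = 0;
  for (size_t i = 0; i + 1 < insns.size (); i++)
    if (insns[i].op == "la.local" && insns[i + 1].op == "fld"
        && insns[i + 1].use == insns[i].def)
      n++;
  return n;
}

// Combine-style matcher: pairs each fld with the la.local defining its
// address register, however far apart they are.
int combine_pairs (const std::vector<Insn> &insns)
{
  int n = 0;
  for (size_t i = 0; i < insns.size (); i++)
    if (insns[i].op == "fld")
      for (size_t j = 0; j < i; j++)
        if (insns[j].op == "la.local" && insns[j].def == insns[i].use)
          { n++; break; }
  return n;
}
```

For the interleaved sequence from the example above (two `la.local` followed by two `fld`), the window matcher finds no pair at all, while the combine-style matcher finds both.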

gcc/ChangeLog:

* config/loongarch/loongarch.md (simple_load): New
define_insn_and_split.
(simple_load_off): Likewise.
(simple_load_ext): Likewise.
(simple_load_offext): Likewise.
(simple_store): Likewise.
(simple_store_off): Likewise.
(define_peephole2): Remove la.local/[f]ld peepholes.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 165 +-
 ...explicit-relocs-auto-single-load-store-2.c |  11 ++
 2 files changed, 98 insertions(+), 78 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 7b26d15aa4e..4009de408fb 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4033,101 +4033,110 @@ (define_insn "loongarch_crcc_w__w"
 ;;
 ;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
 ;; 3 instructions).
-(define_peephole2
-  [(set (match_operand:P 0 "register_operand")
-   (match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
-   (mem:LD_AT_LEAST_32_BIT (match_dup 0)))]
-  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
-   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
-   && (peep2_reg_dead_p (2, operands[0]) \
-   || REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2)
-   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
+(define_insn_and_split "simple_load"
+  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
+   (mem:LD_AT_LEAST_32_BIT
+ (match_operand:P 1 "symbolic_pcrel_operand" "")))]
+  "loongarch_pre_reload_split () \
+   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
+  "#"
+  ""
+  [(set (match_dup 0)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 2) (match_dup 1))))]
   {
-emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+operands[2] = gen_reg_rtx (Pmode);
+emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
   })
 
-(define_peephole2
-  [(set (match_operand:P 0 "register_operand")
-   (match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
-   (mem:LD_AT_LEAST_32_BIT (plus (match_dup 0)
-   (match_operand 3 "const_int_operand"))))]
-  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
-   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
-   && (peep2_reg_dead_p (2, operands[0]) \
-   || REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2)
-   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
+(define_insn_and_split "simple_load_off"
+  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
+   (mem:LD_AT_LEAST_32_BIT
+ (plus (match_operand:P 1 "symbolic_pcrel_operand" "")
+   (match_operand 2 "const_int_operand" ""))))]
+  "loongarch_pre_reload_split () \
+   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
+  "#"
+  ""
+  [(set (match_dup 0)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 2) (match_dup 1))))]
   {
-operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
-emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+HOST_WIDE_INT offset = INTVAL (operands[2]);
+operands[2] = gen_reg_rtx (Pmode);
+operands[1] = plus_constant (Pmode, operands[1], offset);
+emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
   })
 
-(define_peephole2
-  [(set (match_operand:P 0 "register_operand")
-   (match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:GPR 2 "register_operand")
-   (any_extend:GPR (mem:SUBDI (match_dup 0))))]
-  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
-   && (TARGET_CMODEL_NORMAL || 

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-11 Thread Kewen.Lin
Hi Ajit,

on 2023/12/8 16:01, Ajit Agarwal wrote:
> Hello Kewen:
> 

[snip...]

> With UNSPEC_MMA_EXTRACT I could generate the register pair, but the
> code below is functionally incorrect.
> 
> lxvp %vs0,0(%r4)
> xxlor %vs32,%vs0,%vs0
> xvf32ger 0,%vs34,%vs32
> xvf32gerpp 0,%vs34,%vs33
> xxmfacc 0
> stxvp %vs2,0(%r3)
> stxvp %vs0,32(%r3)
> blr
> 
> 
> Here is the RTL Code:
> 
> (insn 19 4 20 2 (set (reg:OO 124 [ *ptr_4(D) ])
> (mem:OO (reg/v/f:DI 122 [ ptr ]) [0 *ptr_4(D)+0 S16 A128])) -1
>  (nil))
> (insn 20 19 9 2 (set (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
> (subreg:V16QI (reg:OO 124 [ *ptr_4(D) ]) 0)) -1
>  (nil))
> (insn 9 20 11 2 (set (reg:XO 119 [ _7 ])
> (unspec:XO [
> (reg/v:V16QI 123 [ src ])
> (reg:V16QI 129 [orig:124 *ptr_4(D) ] [124])
> ] UNSPEC_MMA_XVF32GER)) 2195 {mma_xvf32ger}
>  (expr_list:REG_DEAD (reg:OO 124 [ *ptr_4(D) ])
> (nil)))
> (insn 11 9 12 2 (set (reg:XO 120 [ _9 ])
> (unspec:XO [
> (reg:XO 119 [ _7 ])
> (reg/v:V16QI 123 [ src ])
> (reg:V16QI 125 [ MEM[(__vector unsigned char *)ptr_4(D) + 
> 16B] ])
> ] UNSPEC_MMA_XVF32GERPP)) 2209 {mma_xvf32gerpp}
>  (expr_list:REG_DEAD (reg:V16QI 125 [ MEM[(__vector unsigned char 
> *)ptr_4(D) + 16B] ])
> (expr_list:REG_DEAD (reg/v:V16QI 123 [ src ])
> (expr_list:REG_DEAD (reg:XO 119 [ _7 ])
> (nil)
> (insn 12 11 18 2 (set (mem:XO (reg:DI 126) [1 *dst_10(D)+0 S64 A128])
> (reg:XO 120 [ _9 ])) 
> "../gcc/testsuite/g++.target/powerpc/vecload.C":13:8 2182 {*movxo}
>  (expr_list:REG_DEAD (reg:DI 126)
> (expr_list:REG_DEAD (reg:XO 120 [ _9 ])
> (nil
> (note 18 12 0 NOTE_INSN_DELETED)
> 
> r124 and r129 have conflicting live ranges, so IRA assigns them
> different registers, which does not serve our purpose.
> 
> Making r124 and r129 the same register is not possible for IRA either,
> as r124 would then need both OOmode and V16QImode.
> 
> Running this pass before IRA has the above issues; we can solve them
> by running the pass after reload.

Could you also attach your latest WIP patch?  I'm going to look into the extra 
move issue with it.

Thanks!

BR,
Kewen


PING^1 [PATCH] sched: Remove debug counter sched_block

2023-12-11 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636597.html

BR,
Kewen

on 2023/11/15 17:01, Kewen.Lin wrote:
> Hi,
> 
> on 2023/11/10 01:40, Alexander Monakov wrote:
> 
>> I agree with the concern. I hoped that solving the problem by skipping the BB
>> like the (bit-rotted) debug code needs to would be a minor surgery. As things
>> look now, it may be better to remove the non-working sched_block debug 
>> counter
>> entirely and implement a good solution for the problem at hand.
>>
> 
> According to this comment, I made and tested the below patch to remove the
> problematic debug counter:
> 
> Subject: [PATCH] sched: Remove debug counter sched_block
> 
> Currently the debug counter sched_block doesn't work well
> since we create dependencies for some insns and those
> dependencies are expected to be resolved during scheduling
> insns but they can get skipped once we are skipping some
> block while respecting sched_block debug counter.
> 
> For example, for the below test case:
> --
> int a, b, c, e, f;
> float d;
> 
> void
> g ()
> {
>   float h, i[1];
>   for (; f;)
> if (c)
>   {
>   d *= e;
>   if (b)
> {
>   float *j = i;
>   j[0] = 0;
> }
>   h = d;
>   }
>   if (h)
> a = i[0];
> }
> --
> ICE occurs with option "-O2 -fdbg-cnt=sched_block:1".
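The failure mode described above can be modeled abstractly (a toy sketch with invented names, not the scheduler's real data structures): each scheduled block resolves the dependences it produces, so cutting the block list short, as the debug counter did, leaves consumers waiting on dependences that are never resolved.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model: produced_by_block[b] holds the dependence ids that block b
// resolves when it is actually scheduled; sched_cutoff blocks are
// scheduled (a dbg_cnt-style cutoff).  Returns false when some consumed
// dependence is left unresolved, i.e. the situation that led to the ICE.
bool all_deps_resolved (const std::vector<std::set<int>> &produced_by_block,
                        const std::set<int> &consumed,
                        int sched_cutoff)
{
  std::set<int> resolved;
  for (int b = 0; b < (int) produced_by_block.size (); b++)
    if (b < sched_cutoff)                 // blocks past the cutoff are skipped
      resolved.insert (produced_by_block[b].begin (),
                       produced_by_block[b].end ());
  for (int d : consumed)
    if (!resolved.count (d))
      return false;
  return true;
}
```

With two blocks where the second produces a dependence someone consumes, scheduling both succeeds while stopping after one (the `sched_block:1` scenario) does not.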
> 
> As the discussion in [1], it seems that we think this debug
> counter is useless and can be removed.  It's also implied
> that if it's useful and used often, the above issue should
> have been cared about and resolved earlier.  So this patch
> is to remove this debug counter.
> 
> Bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635852.html
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
> 
> gcc/ChangeLog:
> 
>   * dbgcnt.def (sched_block): Remove.
>   * sched-rgn.cc (schedule_region): Remove the support of debug count
>   sched_block.
> ---
>  gcc/dbgcnt.def   |  1 -
>  gcc/sched-rgn.cc | 19 ++-
>  2 files changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 871cbf75d93..a8c4e61e13d 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -198,7 +198,6 @@ DEBUG_COUNTER (pre_insn)
>  DEBUG_COUNTER (prefetch)
>  DEBUG_COUNTER (registered_jump_thread)
>  DEBUG_COUNTER (sched2_func)
> -DEBUG_COUNTER (sched_block)
>  DEBUG_COUNTER (sched_breakdep)
>  DEBUG_COUNTER (sched_func)
>  DEBUG_COUNTER (sched_insn)
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..1c8acf5068a 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3198,20 +3198,13 @@ schedule_region (int rgn)
>current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> 
>curr_bb = first_bb;
> -  if (dbg_cnt (sched_block))
> -{
> -   int saved_last_basic_block = last_basic_block_for_fn (cfun);
> +  int saved_last_basic_block = last_basic_block_for_fn (cfun);
> 
> -   schedule_block (&curr_bb, bb_state[first_bb->index]);
> -   gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> -   sched_rgn_n_insns += sched_n_insns;
> -   realloc_bb_state_array (saved_last_basic_block);
> -   save_state_for_fallthru_edge (last_bb, curr_state);
> -}
> -  else
> -{
> -  sched_rgn_n_insns += rgn_n_insns;
> -}
> +  schedule_block (&curr_bb, bb_state[first_bb->index]);
> +  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> +  sched_rgn_n_insns += sched_n_insns;
> +  realloc_bb_state_array (saved_last_basic_block);
> +  save_state_for_fallthru_edge (last_bb, curr_state);
> 
>/* Clean up.  */
>if (current_nr_blocks > 1)
> --
> 2.39.1


PING^1 [PATCH] rs6000: New pass to mitigate SP float load perf issue on Power10

2023-12-11 Thread Kewen.Lin
Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636599.html

BR,
Kewen

on 2023/11/15 17:16, Kewen.Lin wrote:
> Hi,
> 
> As Power ISA defines, when loading a scalar single precision (SP)
> floating point from memory, we have the double precision (DP) format
> in target register converted from SP, it's unlike some other
> architectures which supports SP and DP in registers with their
> separated formats.  The scalar SP instructions operates on DP format
> value in register and round the result to fit in SP (but still
> keeping the value in DP format).
> 
> On Power10, a scalar SP floating point load insn will be cracked into
> two internal operations, one is to load the value, the other is to
> convert SP to DP format.  Comparing to those uncracked load like
> vector SP load, it has extra 3 cycles load-to-use penalty.  When
> evaluating some critical workloads, we found that for some cases we
> don't really need the conversion if all the involved operations are
> only with SP format.  In this case, we can replace the scalar SP
> loads with vector SP load and splat (no conversion), replace all
> involved computation with the corresponding vector operations (with
> Power10 slice-based design, we expect the latency of scalar operation
> and its equivalent vector operation is the same), that is to promote
> the scalar SP loads and their affected computation to vector
> operations.
> 
> For example for the below case:
> 
> void saxpy (int n, float a, float * restrict x, float * restrict y)
> {
>   for (int i = 0; i < n; ++i)
>   y[i] = a*x[i] + y[i];
> }
> 
> At -O2, the loop body would end up with:
> 
> .L3:
> lfsx 12,6,9// conv
> lfsx 0,5,9 // conv
> fmadds 0,0,1,12
> stfsx 0,6,9
> addi 9,9,4
> bdnz .L3
> 
> but it can be implemented with:
> 
> .L3:
> lxvwsx 0,5,9   // load and splat
> lxvwsx 12,6,9
> xvmaddmsp 0,1,12
> stxsiwx 0,6,9  // just store word 1 (BE ordering)
> addi 9,9,4
> bdnz .L3
> 
> Evaluated on Power10, the latter is 23% faster than the former.
> 
> So this patch is to introduce a pass to recognize such case and
> change the scalar SP operations with the appropriate vector SP
> operations when it's proper.
> 
> The processing of this pass starts from scalar SP loads, first it
> checks if it's valid, further checks all the stmts using its loaded
> result, then propagates from them.  This process of propagation
> mainly goes with function visit_stmt, which first checks the
> validity of the given stmt, then checks the feeders of use operands
> with visit_stmt recursively, finally checks all the stmts using the
> def with visit_stmt recursively.  The purpose is to ensure all
> propagated stmts are valid to be transformed with its equivalent
> vector operations.  For some special operands like constant or
> GIMPLE_NOP def ssa, record them as splatting candidates.  There are
> some validity checks like: if the addressing mode can satisfy index
> form with some adjustments, if there is the corresponding vector
> operation support, and so on.  Once all propagated stmts from one
> load are valid, they are transformed by function transform_stmt by
> respecting the information in stmt_info like sf_type, new_ops etc.
> 
> For example, for the below test case:
> 
>   _4 = MEM[(float *)x_13(D) + ivtmp.13_24 * 1];  // stmt1
>   _7 = MEM[(float *)y_15(D) + ivtmp.13_24 * 1];  // stmt2
>   _8 = .FMA (_4, a_14(D), _7);   // stmt3
>   MEM[(float *)y_15(D) + ivtmp.13_24 * 1] = _8;  // stmt4
> 
> The processing starts from stmt1, which is taken as valid, adds it
> into the chain, then processes its use stmt stmt3, which is also
> valid, iterating its operands _4 whose def is stmt1 (visited), a_14
> which needs splatting and _7 whose def stmt2 is to be processed.
> Then stmt2 is taken as a valid load and it's added into the chain.
> All operands _4, a_14 and _7 of stmt3 are processed well, then it's
> added into the chain.  Then it processes use stmts of _8 (result of
> stmt3), so checks stmt4 which is a valid store.  Since all these
> involved stmts are valid to be transformed, we get below finally:
> 
>   sf_5 = __builtin_vsx_lxvwsx (ivtmp.13_24, x_13(D));
>   sf_25 = __builtin_vsx_lxvwsx (ivtmp.13_24, y_15(D));
>   sf_22 = {a_14(D), a_14(D), a_14(D), a_14(D)};
>   sf_20 = .FMA (sf_5, sf_22, sf_25);
>   __builtin_vsx_stxsiwx (sf_20, ivtmp.13_24, y_15(D));
> 
> Since it needs to do some validity checks and adjustments where allowed,
> such as checking whether a scalar operation has the corresponding vector
> support, and considering that a scalar SP load allows reg + {reg, disp}
> addressing modes while vector SP load and splat only allow reg + reg, as
> well as the efficiency of getting UD/DF chains for the affected
> operations, we make this a gimple pass.
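The recursive propagation described above can be sketched as a toy model (the names `Stmt`, `Prop`, and the stmt kinds are invented for illustration; the real pass works on GIMPLE statements and rs6000 validity checks):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stmt: a kind, operand stmt ids, and user stmt ids.  Negative
// operand ids stand for "leaf" operands (constants / default defs),
// which become splatting candidates.
struct Stmt { std::string kind; std::vector<int> ops; std::vector<int> users; };

struct Prop
{
  const std::map<int, Stmt> &stmts;
  std::set<int> chain;    // stmts validated for the vector transform
  std::set<int> splats;   // leaf operands to be splatted

  bool valid_kind (const std::string &k)
  { return k == "sp_load" || k == "fma" || k == "sp_store"; }

  // Mirrors visit_stmt: validate this stmt, then recurse into the
  // feeders of its operands and into all users of its result.
  bool visit (int id)
  {
    if (chain.count (id))
      return true;
    const Stmt &s = stmts.at (id);
    if (!valid_kind (s.kind))
      return false;               // one invalid stmt poisons the chain
    chain.insert (id);
    for (int op : s.ops)
      {
        if (op < 0) { splats.insert (op); continue; }  // splat candidate
        if (!visit (op))
          return false;
      }
    for (int u : s.users)
      if (!visit (u))
        return false;
    return true;
  }
};
```

Modeling the saxpy walkthrough (stmt1/stmt2 loads feeding an FMA feeding a store), starting from stmt1 pulls all four stmts into the chain and records the scalar `a` operand as a splat candidate; a single unsupported user makes the whole chain fail, matching the all-or-nothing transform.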
> 
> Considering gimple_isel pass has some gimple massaging, this pass is
> placed just before that.  

[PATCH] Adjust vectorized cost for reduction.

2023-12-11 Thread liuhongt
x86 doesn't support horizontal reduction instructions; reduc_op_scal_m
is emulated with vec_extract_half + op (on half the vector length).
Take that into account when calculating the cost for vectorization.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact on SPEC2017 as measured on ICX.
Ok for trunk?
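The emulated-reduction cost model in the patch can be sketched numerically (a simplification; `sse_op` stands in for `ix86_cost->sse_op`, and the SPLIT_REGS adjustment is omitted):

```cpp
#include <cassert>

// General case: each step halves the active vector with
// vec_extract_hi + op, i.e. two sse_op-cost instructions per step,
// and there are log2(nunits) steps in total.
unsigned reduc_cost (unsigned nunits, unsigned sse_op)
{
  unsigned steps = 0;
  for (unsigned n = nunits; n > 1; n >>= 1)
    steps++;                       // == exact_log2 (nunits)
  return sse_op * steps * 2;
}

// QImode PLUS: PSADBW handles the final 16-byte reduction in one insn,
// plus vec_extract_hi + vpaddb per halving step above 16 bytes.
unsigned reduc_plus_qi_cost (unsigned len_bytes, unsigned sse_op)
{
  unsigned cost = sse_op;          // the PSADBW itself
  for (unsigned l = len_bytes; l > 16; l >>= 1)
    cost += 2 * sse_op;            // extract_hi + vpaddb
  return cost;
}
```

For example, a V4SF reduction costs 2 steps * 2 insns = 4 `sse_op` units, while a 64-byte QImode plus reduction costs 1 (PSADBW) + 2 halving steps * 2 = 5, matching the `exact_log2`-based formulas in the patch.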

gcc/ChangeLog:

PR target/112325
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Handle reduction vec_to_scalar.
(ix86_vector_costs::ix86_vect_reduc_cost): New function.
---
 gcc/config/i386/i386.cc | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4b6bad37c8f..02c9a5004a1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24603,6 +24603,7 @@ private:
 
   /* Estimate register pressure of the vectorized code.  */
   void ix86_vect_estimate_reg_pressure ();
+  unsigned ix86_vect_reduc_cost (stmt_vec_info, tree);
   /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
  estimation of register pressure.
  ??? Currently it's only used by vec_construct/scalar_to_vec
@@ -24845,6 +24846,12 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
if (TREE_CODE (op) == SSA_NAME)
  TREE_VISITED (op) = 0;
 }
+  /* This is a reduc_*_scal_m, x86 support reduc_*_scal_m with emulation.  */
+  else if (kind == vec_to_scalar
+  && stmt_info
+  && vect_is_reduction (stmt_info))
+stmt_cost = ix86_vect_reduc_cost (stmt_info, vectype);
+
   if (stmt_cost == -1)
 stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
@@ -24875,6 +24882,44 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return retval;
 }
 
+/* x86 doesn't support horizontal reduction instructions;
+   reduc_*_scal_m is emulated with vec_extract_hi + op.  */
+unsigned
+ix86_vector_costs::ix86_vect_reduc_cost (stmt_vec_info stmt_info,
+                                         tree vectype)
+{
+  gcc_assert (vectype);
+  unsigned cost = 0;
+  machine_mode mode = TYPE_MODE (vectype);
+  unsigned len = GET_MODE_SIZE (mode);
+
+  /* PSADBW is used for reduc_plus_scal_{v16qi, v8qi, v4qi}.  */
+  if (GET_MODE_INNER (mode) == E_QImode
+  && stmt_info
+  && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN
+  && gimple_assign_rhs_code (stmt_info->stmt) == PLUS_EXPR)
+{
+  cost = ix86_cost->sse_op;
+  /* vec_extract_hi + vpaddb for 256/512-bit reduc_plus_scal_v*qi.  */
+  if (len > 16)
+   cost += exact_log2 (len >> 4) * ix86_cost->sse_op * 2;
+}
+  else
+/* vec_extract_hi + op.  */
+cost = ix86_cost->sse_op * exact_log2 (TYPE_VECTOR_SUBPARTS (vectype)) * 2;
+
+  /* Count extra uops for TARGET_*_SPLIT_REGS.  NB: There's no target which
+     supports 512-bit vectors but has TARGET_AVX256/128_SPLIT_REGS.
+     ix86_vect_cost is not used since the reduction instruction sequence
+     consists of mixed vector-length instructions after vec_extract_hi.  */
+  if ((len == 64 && TARGET_AVX512_SPLIT_REGS)
+      || (len == 32 && TARGET_AVX256_SPLIT_REGS)
+      || (len == 16 && TARGET_SSE_SPLIT_REGS))
+cost += ix86_cost->sse_op;
+
+  return cost;
+}
+
 void
 ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
 {
-- 
2.31.1



PING^6 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-12-11 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html

BR,
Kewen

> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> As Honza pointed out in [1], the current uses of function
>> optimize_function_for_speed_p in rs6000_option_override_internal
>> are too early, since the query results from the functions
>> optimize_function_for_{speed,size}_p could be changed later due
>> to profile feedback and some function attributes handlings etc.
>>
>> This patch is to move optimize_function_for_speed_p to all the
>> use places of the corresponding flags, which follows the existing
>> practices.  Maybe we can cache it somewhere at an appropriate
>> timing, but that's another thing.
>>
>> Comparing with v1[2], this version added one test case for
>> SAVE_TOC_INDIRECT as Segher questioned and suggested, and it
>> also considered the possibility of explicit option (see test
>> cases pr108184-2.c and pr108184-4.c).  I believe that excepting
>> for the intentional change on optimize_function_for_{speed,
>> size}_p, there is no other function change.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609379.html
>>
>> Bootstrapped and regtested on powerpc64-linux-gnu P8,
>> powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
>>  all optimize_function_for_speed_p uses.
>>  (fusion_gpr_load_p): Call optimize_function_for_speed_p along
>>  with TARGET_P8_FUSION_SIGN.
>>  (expand_fusion_gpr_load): Likewise.
>>  (rs6000_call_aix): Call optimize_function_for_speed_p along with
>>  TARGET_SAVE_TOC_INDIRECT.
>>  * config/rs6000/predicates.md (fusion_gpr_mem_load): Call
>>  optimize_function_for_speed_p along with TARGET_P8_FUSION_SIGN.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr108184-1.c: New test.
>>  * gcc.target/powerpc/pr108184-2.c: New test.
>>  * gcc.target/powerpc/pr108184-3.c: New test.
>>  * gcc.target/powerpc/pr108184-4.c: New test.
>> ---
>>  gcc/config/rs6000/predicates.md   |  5 +++-
>>  gcc/config/rs6000/rs6000.cc   | 19 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108184-1.c | 16 
>>  gcc/testsuite/gcc.target/powerpc/pr108184-2.c | 15 +++
>>  gcc/testsuite/gcc.target/powerpc/pr108184-3.c | 25 +++
>>  gcc/testsuite/gcc.target/powerpc/pr108184-4.c | 24 ++
>>  6 files changed, 97 insertions(+), 7 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-4.c
>>
>> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
>> index a1764018545..9f84468db84 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -1878,7 +1878,10 @@ (define_predicate "fusion_gpr_mem_load"
>>
>>/* Handle sign/zero extend.  */
>>if (GET_CODE (op) == ZERO_EXTEND
>> -  || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND))
>> +  || (TARGET_P8_FUSION_SIGN
>> +  && GET_CODE (op) == SIGN_EXTEND
>> +  && (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN
>> +  || optimize_function_for_speed_p (cfun))))
>>  {
>>op = XEXP (op, 0);
>>mode = GET_MODE (op);
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 6ac3adcec6b..f47d21980a9 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -3997,8 +3997,7 @@ rs6000_option_override_internal (bool global_init_p)
>>/* If we can shrink-wrap the TOC register save separately, then use
>>   -msave-toc-indirect unless explicitly disabled.  */
>>if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>> -  && flag_shrink_wrap_separate
>> -  && optimize_function_for_speed_p (cfun))
>> +  && flag_shrink_wrap_separate)
>>  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>>/* Enable power8 fusion if we are tuning for power8, even if we aren't
>> @@ -4032,7 +4031,6 @@ rs6000_option_override_internal (bool global_init_p)
>>   zero extending load, and an explicit sign extension.  */
>>if (TARGET_P8_FUSION
>>&& !(rs6000_isa_flags_explicit & 

PING^8 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-12-11 Thread Kewen.Lin
Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

>>> on 2022/11/24 17:15, Kewen Lin wrote:
 Hi,

 Following Segher's suggestion, this patch series is to rework
 function rs6000_emit_vector_compare for vector float and int
 in multiple steps, it's based on the previous attempts [1][2].
 As mentioned in [1], the need to rework this for float is to
 make a centralized place for vector float comparison handlings
 instead of supporting with swapping ops and reversing code etc.
 dispersedly.  It's also for a subsequent patch to handle
 comparison operators with or without trapping math (PR105480).
 With the handling on vector float reworked, we can further make
 the handling on vector int simplified as shown.

 For Segher's concern about whether this rework causes any
 assembly change, I constructed two testcases for vector float[3]
 and int[4] respectively before; they showed that most cases are
 fine except for the difference on LE and UNGT, which is
 demonstrated as an improvement since it uses GE instead of GT ior
 EQ.  The associated test case in patch 3/9 is a good example.

 Besides, w/ and w/o the whole patch series, I built the whole
 SPEC2017 at options -O3 and -Ofast separately, and checked the
 differences in object assembly.  The result showed that most are
 unchanged, except for:

   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
 9 object files with differences.

   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
 one and 527.cam4_r has 4 object files with differences.

 By looking into these differences, all significant ones
 are caused by the known improvement mentioned above transforming
 GT ior EQ to GE, which can also affect unrolling decisions due
 to insn count.  Some other trivial differences are branch
 target offset differences, nop differences for alignment, VSX
 register number differences, etc.

 I also evaluated the runtime performance for these changed
 benchmarks, the result is neutral.

 These patches are bootstrapped and regress-tested
 incrementally on powerpc64-linux-gnu P7 & P8, and
 powerpc64le-linux-gnu P9 & P10.

 Is it ok for trunk?

 BR,
 Kewen
 -
 [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
 [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
 [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
 [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html

 Kewen Lin (9):
   rs6000: Rework vector float comparison in rs6000_emit_vector_compare 
 - p1
   rs6000: Rework vector float comparison in rs6000_emit_vector_compare 
 - p2
   rs6000: Rework vector float comparison in rs6000_emit_vector_compare 
 - p3
   rs6000: Rework vector float comparison in rs6000_emit_vector_compare 
 - p4
   rs6000: Rework vector integer comparison in 
 rs6000_emit_vector_compare - p1
   rs6000: Rework vector integer comparison in 
 rs6000_emit_vector_compare - p2
   rs6000: Rework vector integer comparison in 
 rs6000_emit_vector_compare - p3
   rs6000: Rework vector integer comparison in 
 rs6000_emit_vector_compare - p4
   rs6000: Rework vector integer comparison in 
 rs6000_emit_vector_compare - p5

  gcc/config/rs6000/rs6000.cc | 180 ++--
  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
  2 files changed, 74 insertions(+), 131 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

>>


PING^1 [PATCH] range: Workaround different type precision issue between _Float128 and long double [PR112788]

2023-12-11 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639140.html

BR,
Kewen

on 2023/12/4 17:49, Kewen.Lin wrote:
> Hi,
> 
> As PR112788 shows, on rs6000 with -mabi=ieeelongdouble, type _Float128
> has a different type precision (128) from that (127) of type long
> double, but they actually have the same underlying mode, so they have
> the same precision, as the mode indicates the same real type format
> ieee_quad_format.
> 
> It's not sensible to have such two types which have the same mode but
> different type precisions, some fix attempt was posted at [1].
> As the discussion there, there are some historical reasons and
> practical issues.  Considering we passed stage 1 and it also affected
> the build as reported, this patch tries to work around it temporarily.
> I thought of introducing a hook, but that seems a bit overkill;
> assuming that scalar float types with the same mode have the same
> precision looks sensible.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9 and
> powerpc64le-linux-gnu P9/P10.
> 
> Is it ok for trunk?
> 
> [1] 
> https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/
> 
> BR,
> Kewen
> 
>   PR tree-optimization/112788
> 
> gcc/ChangeLog:
> 
>   * value-range.h (range_compatible_p): Workaround same type mode but
>   different type precision issue for rs6000 scalar float types
>   _Float128 and long double.
> ---
>  gcc/value-range.h | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 33f204a7171..d0a84754a10 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -1558,7 +1558,13 @@ range_compatible_p (tree type1, tree type2)
>// types_compatible_p requires conversion in both directions to be useless.
>// GIMPLE only requires a cast one way in order to be compatible.
>// Ranges really only need the sign and precision to be the same.
> -  return (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
> -   && TYPE_SIGN (type1) == TYPE_SIGN (type2));
> +  return TYPE_SIGN (type1) == TYPE_SIGN (type2)
> +  && (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
> +  // FIXME: As PR112788 shows, for now on rs6000 _Float128 has
> +  // type precision 128 while long double has type precision 127
> +  // but both have the same mode so their precision is actually
> +  // the same, workaround it temporarily.
> +  || (SCALAR_FLOAT_TYPE_P (type1)
> +  && TYPE_MODE (type1) == TYPE_MODE (type2)));
>  }
>  #endif // GCC_VALUE_RANGE_H
> --
> 2.42.0
>


Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Hongtao Liu
On Tue, Dec 12, 2023 at 1:47 PM Jiang, Haochen via Gcc-regression
 wrote:
>
> > -Original Message-
> > From: Jiang, Haochen
> > Sent: Tuesday, December 12, 2023 9:11 AM
> > To: Andrew Pinski (QUIC) ; haochen.jiang
> > ; gcc-regress...@gcc.gnu.org; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > > -Original Message-
> > > From: Andrew Pinski (QUIC) 
> > > Sent: Tuesday, December 12, 2023 9:01 AM
> > > To: haochen.jiang ; Andrew Pinski (QUIC)
> > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > patc...@gcc.gnu.org; Jiang, Haochen 
> > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> > scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > > -Original Message-
> > > > From: haochen.jiang 
> > > > Sent: Monday, December 11, 2023 4:54 PM
> > > > To: Andrew Pinski (QUIC) ; gcc-
> > > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> > haochen.ji...@intel.com
> > > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > > > assembler-times shrq 2 on Linux/x86_64
> > > >
> > > > On Linux/x86_64,
> > > >
> > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> > commit
> > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > > Author: Andrew Pinski 
> > > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > > >
> > > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one 
> > > > type
> > are
> > > > the same
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> > >
> > >
> > > So I think this is a testsuite issue, in that shrx instruction is being 
> > > used here
> > > instead of just ` shrq` due to that instruction being enabled with `-
> > > march=cascadelake` .
> > > Can someone confirm that and submit a testcase change?
> >
> > I will do that today.
>
> I suppose we might just need to change the scan-asm from shrq to shr to cover
> shrx.
Please use shr\[qx\], not shr.
>
> Is that ok? If it is, I will commit a patch to change that.
>
> Thx,
> Haochen
>
> >
> > Thx,
> > Haochen
> >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > >
> > > > with GCC configured with
> > > >
> > > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-
> > zlib -
> > > > -with-demangler-in-ld --with-fpmath=sse --enable-
> > languages=c,c++,fortran --
> > > > enable-cet --without-isl --enable-libmpx x86_64-linux 
> > > > --disable-bootstrap
> > > >
> > > > To reproduce:
> > > >
> > > > $ cd {build_dir}/gcc && make check
> > > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > > target_board='unix{-m64\ -march=cascadelake}'"
> > > >
> > > > (Please do not reply to this email, for question about this report, 
> > > > contact
> > me at
> > > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > > related, disabling AVX512F in command line might save that.) (However,
> > > > please make sure that there is no potential problems with AVX512.)



-- 
BR,
Hongtao


Re: [Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-11 Thread Kewen.Lin
Hi,

on 2023/12/11 10:54, HAO CHEN GUI wrote:
> Hi,
>   This patch cleans up the pre-checking of expand_block_compare.  It does:
> 1. Assert that only P7 and above can enter this function, as that's
> already guarded by the expander.
> 2. Return false when optimizing for size.
> 3. Remove the P7 CPU test, as only P7 and above can enter this function
> and P7 LE is excluded by targetm.slow_unaligned_access.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Clean up pre-checking of expand_block_compare
> 
> gcc/
>   * gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
>   only P7 above can enter this function.  Return false when it's
>   optimized for size.  Remove P7 CPU test as only P7 above can enter
>   this function and P7 LE is excluded by the checking of
>   targetm.slow_unaligned_access on word_mode.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/memcmp_for_size.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index d4030854b2a..dff69e90d0c 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
>  bool
>  expand_block_compare (rtx operands[])
>  {
> +  gcc_assert (TARGET_POPCNTD);

Nit: Add one comment on why we expect TARGET_POPCNTD here.

> +
> +  if (optimize_insn_for_size_p ())
> +return false;
> +
> +  /* Allow this param to shut off all expansion.  */
> +  if (rs6000_block_compare_inline_limit == 0)
> +return false;
> +
>rtx target = operands[0];
>rtx orig_src1 = operands[1];
>rtx orig_src2 = operands[2];

Nit: Move these below closer to their uses.

> @@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
>if (TARGET_32BIT && TARGET_POWERPC64)
>  return false;
> 
> -  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
> -
> -  /* Allow this param to shut off all expansion.  */
> -  if (rs6000_block_compare_inline_limit == 0)
> -return false;
> -
> -  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
> - However slow_unaligned_access returns true on P7 even though the
> - performance of this code is good there.  */
> -  if (!isP7
> -  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
> -   || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
> -return false;
> -
> -  /* Unaligned l*brx traps on P7 so don't do this.  However this should
> - not affect much because LE isn't really supported on P7 anyway.  */
> -  if (isP7 && !BYTES_BIG_ENDIAN)

IMHO we'd better keep this check, since users are able to specify
-mno-strict-align on P7; that is, we can't guarantee it's always
strict-align and thus rejected by targetm.slow_unaligned_access below.

> +  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
> +if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
> + || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
>  return false;

This change makes us respect targetm.slow_unaligned_access more, I like it.

> 
>/* If this is not a fixed size compare, try generating loop code and
> @@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
>if (!IN_RANGE (bytes, 1, max_bytes))
>  return expand_compare_loop (operands);
> 
> -  /* The code generated for p7 and older is not faster than glibc
> - memcmp if alignment is small and length is not short, so bail
> - out to avoid those conditions.  */
> -  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
> -  && ((base_align == 1 && bytes > 16)
> -   || (base_align == 2 && bytes > 32)))
> -return false;

Why did you change this?  I didn't see any explanation above or am I missing?

> -
>rtx final_label = NULL;
> 
>if (use_vec)
> diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
> new file mode 100644
> index 000..c7e853ad593
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c

Nit: As the comment in another thread, it can be block-cmp-3.c or similar.

BR,
Kewen
 
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 4);
> +}




Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Andrew Pinski
On Mon, Dec 11, 2023, 21:48 Jiang, Haochen  wrote:

> > -Original Message-
> > From: Jiang, Haochen
> > Sent: Tuesday, December 12, 2023 9:11 AM
> > To: Andrew Pinski (QUIC) ; haochen.jiang
> > ; gcc-regress...@gcc.gnu.org; gcc-
> > patc...@gcc.gnu.org
> > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > > -Original Message-
> > > From: Andrew Pinski (QUIC) 
> > > Sent: Tuesday, December 12, 2023 9:01 AM
> > > To: haochen.jiang ; Andrew Pinski (QUIC)
> > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > patc...@gcc.gnu.org; Jiang, Haochen 
> > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> > scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > > -Original Message-
> > > > From: haochen.jiang 
> > > > Sent: Monday, December 11, 2023 4:54 PM
> > > > To: Andrew Pinski (QUIC) ; gcc-
> > > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> > haochen.ji...@intel.com
> > > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > > > assembler-times shrq 2 on Linux/x86_64
> > > >
> > > > On Linux/x86_64,
> > > >
> > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> > commit
> > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > > Author: Andrew Pinski 
> > > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > > >
> > > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one
> type
> > are
> > > > the same
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> > >
> > >
> > > So I think this is a testsuite issue, in that shrx instruction is
> being used here
> > > instead of just ` shrq` due to that instruction being enabled with `-
> > > march=cascadelake` .
> > > Can someone confirm that and submit a testcase change?
> >
> > I will do that today.
>
> I suppose we might just need to change the scan-asm from shrq to shr to
> cover
> shrx.
>
> Is that ok? If it is, I will commit a patch to change that.
>


From my point of view, that would be the correct approach but I cannot
approve it.

Thanks,
Andrew




> Thx,
> Haochen
>
> >
> > Thx,
> > Haochen
> >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > >
> > > > with GCC configured with
> > > >
> > > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-
> > zlib -
> > > > -with-demangler-in-ld --with-fpmath=sse --enable-
> > languages=c,c++,fortran --
> > > > enable-cet --without-isl --enable-libmpx x86_64-linux
> --disable-bootstrap
> > > >
> > > > To reproduce:
> > > >
> > > > $ cd {build_dir}/gcc && make check
> > > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > > target_board='unix{-m64\ -march=cascadelake}'"
> > > >
> > > > (Please do not reply to this email, for question about this report,
> contact
> > me at
> > > > haochen dot jiang at intel.com.) (If you met problems with
> cascadelake
> > > > related, disabling AVX512F in command line might save that.)
> (However,
> > > > please make sure that there is no potential problems with AVX512.)
>


RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen
> -Original Message-
> From: Jiang, Haochen
> Sent: Tuesday, December 12, 2023 9:11 AM
> To: Andrew Pinski (QUIC) ; haochen.jiang
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: Andrew Pinski (QUIC) 
> > Sent: Tuesday, December 12, 2023 9:01 AM
> > To: haochen.jiang ; Andrew Pinski (QUIC)
> > ; gcc-regress...@gcc.gnu.org; gcc-
> > patc...@gcc.gnu.org; Jiang, Haochen 
> > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > > -Original Message-
> > > From: haochen.jiang 
> > > Sent: Monday, December 11, 2023 4:54 PM
> > > To: Andrew Pinski (QUIC) ; gcc-
> > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> haochen.ji...@intel.com
> > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > On Linux/x86_64,
> > >
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> commit
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > Author: Andrew Pinski 
> > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > >
> > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type
> are
> > > the same
> > >
> > > caused
> > >
> > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> >
> >
> > So I think this is a testsuite issue, in that shrx instruction is being 
> > used here
> > instead of just ` shrq` due to that instruction being enabled with `-
> > march=cascadelake` .
> > Can someone confirm that and submit a testcase change?
> 
> I will do that today.

I suppose we might just need to change the scan-asm from shrq to shr to cover
shrx.

Is that ok? If it is, I will commit a patch to change that.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> >
> > Thanks,
> > Andrew
> >
> > >
> > > with GCC configured with
> > >
> > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-
> zlib -
> > > -with-demangler-in-ld --with-fpmath=sse --enable-
> languages=c,c++,fortran --
> > > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> > >
> > > To reproduce:
> > >
> > > $ cd {build_dir}/gcc && make check
> > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > target_board='unix{-m64\ -march=cascadelake}'"
> > >
> > > (Please do not reply to this email, for question about this report, 
> > > contact
> me at
> > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > related, disabling AVX512F in command line might save that.) (However,
> > > please make sure that there is no potential problems with AVX512.)


Re: [PATCH] Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) deosn't contains all SSE_REGS.

2023-12-11 Thread Hongtao Liu
On Fri, Dec 8, 2023 at 10:17 AM liuhongt  wrote:
>
> If the function doesn't clobber any SSE registers or only clobbers the
> 128-bit part, then vzeroupper isn't issued before the function exit.
> The status is not CLEAN but ANY after the function.
>
> Also for sibling_call, it's safe to issue a vzeroupper.  There could
> also be a missing vzeroupper since there's no mode_exit for
> sibling_call_p.
>
> Compared to the patch in the PR, this patch add sibling_call part.
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport?
Part of this has been approved in the PR, and for the sibling_call
part, I think it should be reasonable.
So I'm going to commit the patch.
>
> gcc/ChangeLog:
>
> PR target/112891
> * config/i386/i386.cc (ix86_avx_u128_mode_after): Return
> AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
> align with ix86_avx_u128_mode_needed.
> (ix86_avx_u128_mode_needed): Return AVX_U128_ClEAN for
> sibling_call.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr112891.c: New test.
> * gcc.target/i386/pr112891-2.c: New test.
> ---
>  gcc/config/i386/i386.cc| 22 +---
>  gcc/testsuite/gcc.target/i386/pr112891-2.c | 30 ++
>  gcc/testsuite/gcc.target/i386/pr112891.c   | 29 +
>  3 files changed, 78 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112891-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112891.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 7c5cab4e2c6..fe259cdb789 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -15038,8 +15038,12 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
>  vzeroupper if all SSE registers are clobbered.  */
>const function_abi &abi = insn_callee_abi (insn);
>if (vzeroupper_pattern (PATTERN (insn), VOIDmode)
> - || !hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
> -abi.mode_clobbers (V4DImode)))
> +	 /* Should be safe to issue a vzeroupper before sibling_call_p.
> +	    Also there is no mode_exit for sibling_call, so there could be
> +	    a missing vzeroupper for that.  */
> + || !(SIBLING_CALL_P (insn)
> +  || hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
> +abi.mode_clobbers (V4DImode))))
> return AVX_U128_ANY;
>
>return AVX_U128_CLEAN;
> @@ -15177,7 +15181,19 @@ ix86_avx_u128_mode_after (int mode, rtx_insn *insn)
>bool avx_upper_reg_found = false;
>note_stores (insn, ix86_check_avx_upper_stores, &avx_upper_reg_found);
>
> -  return avx_upper_reg_found ? AVX_U128_DIRTY : AVX_U128_CLEAN;
> +  if (avx_upper_reg_found)
> +   return AVX_U128_DIRTY;
> +
> +  /* If the function doesn't clobber any SSE registers or only clobbers
> +     the 128-bit part, then vzeroupper isn't issued before the function
> +     exit.  The status is not CLEAN but ANY after the function.  */
> +  const function_abi &abi = insn_callee_abi (insn);
> +  if (!(SIBLING_CALL_P (insn)
> +   || hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
> + abi.mode_clobbers (V4DImode))))
> +   return AVX_U128_ANY;
> +
>return AVX_U128_CLEAN;
>  }
>
>/* Otherwise, return current mode.  Remember that if insn
> diff --git a/gcc/testsuite/gcc.target/i386/pr112891-2.c b/gcc/testsuite/gcc.target/i386/pr112891-2.c
> new file mode 100644
> index 000..164c3985d50
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112891-2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3" } */
> +/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */
> +
> +void
> +__attribute__((noinline))
> +bar (double* a)
> +{
> +  a[0] = 1.0;
> +  a[1] = 2.0;
> +}
> +
> +double
> +__attribute__((noinline))
> +foo (double* __restrict a, double* b)
> +{
> +  a[0] += b[0];
> +  a[1] += b[1];
> +  a[2] += b[2];
> +  a[3] += b[3];
> +  bar (b);
> +  return a[5] + b[5];
> +}
> +
> +double
> +foo1 (double* __restrict a, double* b)
> +{
> +  double c = foo (a, b);
> +  return __builtin_exp (c);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr112891.c b/gcc/testsuite/gcc.target/i386/pr112891.c
> new file mode 100644
> index 000..dbf6c67948a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112891.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3" } */
> +/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */
> +
> +void
> +__attribute__((noinline))
> +bar (double* a)
> +{
> +  a[0] = 1.0;
> +  a[1] = 2.0;
> +}
> +
> +void
> +__attribute__((noinline))
> +foo (double* __restrict a, double* b)
> +{
> +  a[0] += b[0];
> +  a[1] += b[1];
> +  a[2] += b[2];
> +  a[3] += b[3];
> +  bar (b);
> +}
> +
> +double
> +foo1 (double* 

Re: [PATCH] untyped calls: enable target switching [PR112334]

2023-12-11 Thread Alexandre Oliva
On Dec 11, 2023, Jeff Law  wrote:

>> 
>> for  gcc/ChangeLog
>> PR target/112334
>> * builtins.h (target_builtins): Add fields for apply_args_size
>> and apply_result_size.
>> * builtins.cc (apply_args_size, apply_result_size): Cache
>> results in fields rather than in static variables.
>> (get_apply_args_size, set_apply_args_size): New.
>> (get_apply_result_size, set_apply_result_size): New.
> OK.

Thanks, I put this bit in.

>> untyped calls: use wrapper class type for implicit plus_one
>> Instead of get and set macros to apply a delta, use a single macro
>> that resorts to a temporary wrapper class to apply it.
>> To be combined (or not) with the previous patch.

> I'd be OK with this as well.

That conditional made me doubt that this was meant as approval, so I did
*not* put this one in ;-)

If there's firmer/broader buy-in, I'd be glad to put it in as well.

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] LoongArch: Fix warnings building libgcc

2023-12-11 Thread chenglulu



On 2023/12/12 at 9:58 AM, chenglulu wrote:


On 2023/12/10 at 12:38 AM, Xi Ruoyao wrote:

We are excluding loongarch-opts.h from target libraries, but now struct
loongarch_target and gcc_options are not declared in the target
libraries, causing:

In file included from ../.././gcc/options.h:8,
  from ../.././gcc/tm.h:49,
  from ../../../gcc/libgcc/fixed-bit.c:48:
../../../gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:57:41:
warning: 'struct gcc_options' declared inside parameter list will not
be visible outside of this definition or declaration
    57 |  struct gcc_options *opts,
   | ^~~

So exclude the declarations referring to the C++ structs as well.
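As a minimal reproducer of the pattern (hypothetical names, not the actual LoongArch headers): a struct first named inside a prototype's parameter list is scoped to that prototype, which is exactly what the warning flags; a file-scope forward declaration is the usual fix.

```c
#include <assert.h>

/* A struct first named inside a prototype's parameter list would be scoped
   to that prototype (the warning's complaint); declaring it at file scope
   first makes the prototype refer to the intended type.  */
struct gcc_options;                          /* forward declaration */

int configure (struct gcc_options *opts)
{
  return opts == 0;                          /* placeholder body */
}
```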

gcc/ChangeLog:

* config/loongarch/loongarch-opts.h (la_target): Move into #if
for loongarch-def.h.
(loongarch_init_target): Likewise.
(loongarch_config_target): Likewise.
(loongarch_update_gcc_opt_status): Likewise.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

I don't understand. I tested it when I submitted the code, but I 
didn't find this problem.


I think such a problem would cause compilation errors with 
--enable-bootstrap, right?




I've seen a warning message.

Thanks!



Re: [Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-11 Thread Kewen.Lin
Hi,

on 2023/12/11 09:49, HAO CHEN GUI wrote:
> Hi,
>   The patch corrects the definition of
> TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and change its name to a
> comprehensible name.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Correct definition of macro of fixed point efficient unaligned
> 
> Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
> guard whether a platform is efficient on fixed point unaligned load/store.
> It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled
> from P8 and can be disabled by the -mno-vsx option. So the definition is wrong.
> This patch corrects the problem and defines it as "!STRICT_ALIGNMENT", which
> is true on P7 BE and on P8 and above.

I agree it's unexpected that the current TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
is mapped to TARGET_EFFICIENT_UNALIGNED_VSX for scalar int access, but there is
one function rs6000_slow_unaligned_access (for the hook slow_unaligned_access);
I expect it can fit well into these places, so can we just adopt it?  IMHO it
helps long-term maintenance. 

> 
> gcc/
>   * config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
>   Rename to...
>   (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
>   !STRICT_ALIGNMENT.
>   * config/rs6000/rs6000-string.cc (select_block_compare_mode):
>   Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
>   TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
>   (select_block_compare_mode): Likewise.
>   (expand_block_compare_gpr): Likewise.
>   (expand_block_compare): Likewise.
>   (expand_strncmp_gpr_sequence): Likewise.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
>   * gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 44a946cd453..d4030854b2a 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
>else if (bytes == GET_MODE_SIZE (QImode))
>  return QImode;
>else if (bytes < GET_MODE_SIZE (SImode)
> -&& TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
> +&& TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
>  && offset >= GET_MODE_SIZE (SImode) - bytes)
>  /* This matches the case were we have SImode and 3 bytes
> and offset >= 1 and permits us to move back one and overlap
> @@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
> unwanted bytes off of the input.  */
>  return SImode;
>else if (word_mode_ok && bytes < UNITS_PER_WORD
> -&& TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
> +&& TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
>  && offset >= UNITS_PER_WORD-bytes)
>  /* Similarly, if we can use DImode it will get matched here and
> can do an overlapping read that ends at the end of the block.  */
> @@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
> unsigned int base_align,
>load_mode_size = GET_MODE_SIZE (load_mode);
>if (bytes >= load_mode_size)
>   cmp_bytes = load_mode_size;
> -  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
> +  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
>   {
> /* Move this load back so it doesn't go past the end.
>P8/P9 can do this efficiently.  */
> @@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
>/* The code generated for p7 and older is not faster than glibc
>   memcmp if alignment is small and length is not short, so bail
>   out to avoid those conditions.  */
> -  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
> +  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
>&& ((base_align == 1 && bytes > 16)
> || (base_align == 2 && bytes > 32)))
>  return false;
> @@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
> bytes_to_compare,
>load_mode_size = GET_MODE_SIZE (load_mode);
>if (bytes_to_compare >= load_mode_size)
>   cmp_bytes = load_mode_size;
> -  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
> +  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
>   {
> /* Move this load back so it doesn't go past the end.
>P8/P9 can do this efficiently.  */
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 326c45221e9..2f3a82942c1 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -483,9 +483,9 @@ extern int rs6000_vector_align[];
>  #define TARGET_NO_SF_SUBREG  TARGET_DIRECT_MOVE_64BIT
>  #define TARGET_ALLOW_SF_SUBREG   (!TARGET_DIRECT_MOVE_64BIT)
> 
> -/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
> -   loads are slow. */
> 

[PATCH] contrib: add git gcc-style alias

2023-12-11 Thread Jason Merrill
OK for trunk?

-- 8< --

I thought it could be easier to use check_GNU_style.py.  With this alias,
'git gcc-style' will take a git revision as argument instead of a file, or
check HEAD if no argument is given.
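The argument handling can be sketched as a stand-alone shell function (hypothetical name, not the alias itself): an optional revision may come first, or last after `-f FILE`, and defaults to HEAD.

```shell
# Hypothetical stand-in mimicking the alias's argument shuffling.
gcc_style_args() {
  arg=
  if [ $# -ge 1 ] && [ "$1" != "-f" ]; then arg="$1"; shift
  elif [ $# -eq 3 ]; then arg="$3"; set -- "$1" "$2"
  fi
  # The real alias pipes `git show $arg` into check_GNU_style.py "$@";
  # here we just report what would be checked.
  echo "rev=${arg:-HEAD} flags=$*"
}
```

So `git gcc-style` checks HEAD, `git gcc-style <rev>` checks that revision, and `git gcc-style -f <file> <rev>` keeps working.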

contrib/ChangeLog:

* gcc-git-customization.sh: Add git gcc-style alias.
---
 contrib/gcc-git-customization.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh
index 2e173e859d7..54bd35ea1aa 100755
--- a/contrib/gcc-git-customization.sh
+++ b/contrib/gcc-git-customization.sh
@@ -30,6 +30,11 @@ git config alias.gcc-backport '!f() { "`git rev-parse 
--show-toplevel`/contrib/g
 git config alias.gcc-fix-changelog '!f() { "`git rev-parse 
--show-toplevel`/contrib/git-fix-changelog.py" $@; } ; f'
 git config alias.gcc-mklog '!f() { "`git rev-parse 
--show-toplevel`/contrib/mklog.py" $@; } ; f'
 git config alias.gcc-commit-mklog '!f() { "`git rev-parse 
--show-toplevel`/contrib/git-commit-mklog.py" "$@"; }; f'
+git config alias.gcc-style '!f() {
+check=`git rev-parse --show-toplevel`/contrib/check_GNU_style.py;
+arg=; if [ $# -ge 1 ] && [ "$1" != "-f" ]; then arg="$1"; shift;
+elif [ $# -eq 3 ]; then arg="$3"; set -- "$1" "$2"; fi
+git show $arg | $check "$@" -; }; f'
 
 # Make diff on MD files use "(define" as a function marker.
 # Use this in conjunction with a .gitattributes file containing

base-commit: 074c6f15f7a28c620c756f18c2a310961de00539
-- 
2.39.3



Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 8:39 PM Hongyu Wang  wrote:
>
> > > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { 
> > > dg-warning "integer constant is so large that it is unsigned" "so large" 
> > > } */
> >
> > Just you can use (9223372036854775807LL + (__int128) 1) instead of 
> > 9223372036854775808
> > to avoid the warning.
> > The testcase will ICE without the patch even with that.
>
> Thanks for the hint! Will adjust when pushing the patch.
Ok.



-- 
BR,
Hongtao


[PATCH #2/2] strub: drop volatile from wrapper args [PR112938]

2023-12-11 Thread Alexandre Oliva
On Dec 11, 2023, Alexandre Oliva  wrote:

> (there's a #2/2 followup coming up that addresses the ??? comment added
> herein)

Here it is.  Also regstrapped on x86_64-linux-gnu, along with the
previous patch (that had also been regstrapped by itself).  I think this
would be a desirable thing to do (maybe also with TYPE_QUAL_ATOMIC), but
I'm a little worried about modifying the types of args of the original
function decl, the one that is about to become a wrapper.  This would be
visible at least in debug information.  OTOH, keeping the volatile in
the wrapper would serve no useful purpose whatsoever, it would likely
just make it slower, and such top-level qualifiers really only apply
within the function body, which the wrapper isn't.  Thoughts?  Ok to
install?


Drop volatile from argument types in internal strub wrappers that are
not made indirect.  Their volatility is only relevant within the body
of the function, and that body is moved to the wrapped function.
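Independent of strub, plain C already treats a top-level qualifier as a property of the function body only; a small sketch:

```c
#include <assert.h>

/* Top-level 'volatile' on a parameter is a property of the callee's body
   only: C drops it when determining function-type compatibility
   (C11 6.7.6.3p15), so these two declarations name the same function.  */
short f (volatile short x);            /* declaration callers might see */
short f (short x) { return x + 1; }    /* definition without the qualifier */
```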


for  gcc/ChangeLog

PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Drop volatile from
internal strub wrapper args.

for  gcc/testsuite/ChangeLog

PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: Check for dropped volatile
in wrapper.
---
 gcc/ipa-strub.cc   |   14 +++---
 gcc/testsuite/gcc.dg/strub-internal-volatile.c |5 +
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index 45294b0b46bcb..bab20c386bb01 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -2922,6 +2922,16 @@ pass_ipa_strub::execute (function *)
  if (nparmt)
adjust_ftype++;
}
+  else if (TREE_THIS_VOLATILE (parm))
+   {
+ /* Drop volatile from wrapper's arguments, they're just
+temporaries copied to the wrapped function.  ???  Should
+we drop TYPE_QUAL_ATOMIC as well?  */
+ TREE_TYPE (parm) = build_qualified_type (TREE_TYPE (parm),
+  TYPE_QUALS (TREE_TYPE (parm))
+  & ~TYPE_QUAL_VOLATILE);
+ TREE_THIS_VOLATILE (parm) = 0;
+   }
 
 /* Also adjust the wrapped function type, if needed.  */
 if (adjust_ftype)
@@ -3224,9 +3234,7 @@ pass_ipa_strub::execute (function *)
{
  tree tmp = arg;
  /* If ARG is e.g. volatile, we must copy and
-convert in separate statements.  ???  Should
-we drop volatile from the wrapper
-instead?  */
+convert in separate statements.  */
  if (!is_gimple_val (arg))
{
  tmp = create_tmp_reg (TYPE_MAIN_VARIANT
diff --git a/gcc/testsuite/gcc.dg/strub-internal-volatile.c 
b/gcc/testsuite/gcc.dg/strub-internal-volatile.c
index cdfca67616bc8..0ffa98d799d32 100644
--- a/gcc/testsuite/gcc.dg/strub-internal-volatile.c
+++ b/gcc/testsuite/gcc.dg/strub-internal-volatile.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-options "-fdump-ipa-strub" } */
 /* { dg-require-effective-target strub } */
 
 void __attribute__ ((strub("internal")))
@@ -8,3 +9,7 @@ f(volatile short) {
 void g(void) {
   f(0);
 }
+
+/* We drop volatile from the wrapper, and keep it in the wrapped f, so
+   the count remains 1.  */
+/* { dg-final { scan-ipa-dump-times "volatile" 1 "strub" } } */


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH V4 2/3] Using pli for constant splitting

2023-12-11 Thread Kewen.Lin
Hi,

on 2023/12/11 11:26, Jiufu Guo wrote:
> Hi,
> 
> For constant building, e.g. r120=0x, which does not fit 'li' or 'lis',
> 'pli' is used to build this constant via 'emit_move_insn'.
> 
> While for a complicated constant, e.g. 0xULL, when using
> 'rs6000_emit_set_long_const' to split the constant recursively, it fails to
> use 'pli' to build the half-part constant: 0x.
> 
> 'rs6000_emit_set_long_const' could be updated to use 'pli' to build half
> of the constant when necessary.  For example, for 0xULL,
> "pli 3,1717986918; rldimi 3,3,32,0" can be used.
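As an illustrative sketch of the enabling condition (not the actual SIGNED_INTEGER_34BIT_P macro from rs6000), a constant qualifies for a single pli when it fits a signed 34-bit immediate:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a constant qualifies for a single 'pli' when it is representable
   as a signed 34-bit immediate, i.e. lies in [-2^33, 2^33 - 1].  */
int signed_34bit_p (int64_t c)
{
  return c >= -(INT64_C (1) << 33) && c <= (INT64_C (1) << 33) - 1;
}
```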
> 
> Compare with previous:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639492.html
> This version is refreshed and the testcase name updated.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?

OK for trunk, thanks!

BR,
Kewen

> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
>   pli for 34bit constant.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/const-build-1.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  | 9 -
>  gcc/testsuite/gcc.target/powerpc/const-build-1.c | 9 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build-1.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 017000a4e02..531c40488b4 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10511,7 +10511,14 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c, int *num_insns)
>emit_insn (dest_or_insn);
>};
> 
> -  if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
> +  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
> +{
> +  /* li/lis/pli */
> +  count_or_emit_insn (dest, GEN_INT (c));
> +  return;
> +}
> +
> + if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
>  {
>/* li */
> diff --git a/gcc/testsuite/gcc.target/powerpc/const-build-1.c 
> b/gcc/testsuite/gcc.target/powerpc/const-build-1.c
> new file mode 100644
> index 000..7e35f8c507f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */
> +
> +unsigned long long msk66() { return 0xULL; }
> +
> +/* { dg-final { scan-assembler-times {\mpli\M} 1 } } */
> +/* { dg-final { scan-assembler-not {\mli\M} } } */
> +/* { dg-final { scan-assembler-not {\mlis\M} } } */




[pushed] analyzer: add more test coverage for tainted modulus

2023-12-11 Thread David Malcolm
Add more test coverage for r14-6349-g0bef72539e585d.

Pushed to trunk as r14-6444-g2900a77fe4e7d2.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/plugin.exp: Add taint-modulus.c to
analyzer_kernel_plugin.c tests.
* gcc.dg/plugin/taint-modulus.c: New test.
---
 gcc/testsuite/gcc.dg/plugin/plugin.exp  |  1 +
 gcc/testsuite/gcc.dg/plugin/taint-modulus.c | 75 +
 2 files changed, 76 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/taint-modulus.c

diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp 
b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index d6cccb269df..eebf96116ef 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -165,6 +165,7 @@ set plugin_test_list [list \
  taint-CVE-2011-0521-5-fixed.c \
  taint-CVE-2011-0521-6.c \
  taint-antipatterns-1.c \
+ taint-modulus.c \
  taint-pr112850.c \
  taint-pr112850-precise.c \
  taint-pr112850-too-complex.c \
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-modulus.c 
b/gcc/testsuite/gcc.dg/plugin/taint-modulus.c
new file mode 100644
index 000..81d968864e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/taint-modulus.c
@@ -0,0 +1,75 @@
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-effective-target analyzer } */
+
+/* Reduced from a -Wanalyzer-tainted-array-index false +ve
+   seen in the Linux kernel's sound/drivers/opl3/opl3_synth.c.  */
+
+extern unsigned long
+copy_from_user(void* to, const void* from, unsigned long n);
+
+struct sbi_patch
+{
+  unsigned char prog;
+  unsigned char bank;
+};
+struct fm_patch
+{
+  unsigned char prog;
+  unsigned char bank;
+  struct fm_patch* next;
+};
+struct snd_opl3
+{
+  struct fm_patch* patch_table[32];
+};
+int
+snd_opl3_load_patch(struct snd_opl3* opl3,
+int prog,
+int bank);
+struct fm_patch*
+snd_opl3_find_patch(struct snd_opl3* opl3,
+int prog,
+int bank,
+int create_patch);
+long
+snd_opl3_write(struct snd_opl3* opl3,
+   const char* buf,
+   long count)
+{
+  long result = 0;
+  int err = 0;
+  struct sbi_patch inst;
+  while (count >= sizeof(inst)) {
+if (copy_from_user(&inst, buf, sizeof(inst)))
+  return -14;
+err = snd_opl3_load_patch(opl3, inst.prog, inst.bank);
+if (err < 0)
+  break;
+result += sizeof(inst);
+count -= sizeof(inst);
+  }
+  return result > 0 ? result : err;
+}
+int
+snd_opl3_load_patch(struct snd_opl3* opl3,
+int prog,
+int bank)
+{
+  struct fm_patch* patch;
+  patch = snd_opl3_find_patch(opl3, prog, bank, 1);
+  if (!patch)
+return -12;
+  return 0;
+}
+struct fm_patch*
+snd_opl3_find_patch(struct snd_opl3* opl3, int prog, int bank, int 
create_patch)
+{
+  unsigned int key = (prog + bank) % 32;
+  struct fm_patch* patch;
+  for (patch = opl3->patch_table[key]; patch; patch = patch->next) { /* { 
dg-bogus "use of attacker-controlled value in array lookup" } */
+if (patch->prog == prog && patch->bank == bank)
+  return patch;
+  }
+  return ((void*)0);
+}
-- 
2.26.3



Re: [PATCH V4 1/3]rs6000: accurate num_insns_constant_gpr

2023-12-11 Thread Kewen.Lin
Hi Jeff,

on 2023/12/11 11:26, Jiufu Guo wrote:
> Hi,
> 
> Trunk gcc supports building more constants via two instructions:
> e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic".
> num_insns_constant should be updated accordingly.
> 
> Function "rs6000_emit_set_long_const" is used to build complicated
> constants, and "num_insns_constant_gpr" is used to compute how
> many instructions are needed to build the constant. So, these
> two functions should be aligned.
> 
> The idea of this patch is to reuse "rs6000_emit_set_long_const" to
> compute/record the instruction number (when computing the insn count,
> do not emit instructions).
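The reuse idea can be sketched in isolation (hypothetical names, not the rs6000 code): one builder either emits or, when handed a counter, merely counts, so the two paths can never diverge.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One builder serves both callers: handed a counter it performs a dry
// run (only counting), otherwise it actually "emits" the instructions
// (modelled here as strings).
void build_long_const (std::vector<std::string> &out, long long c,
                       int *num_insns = nullptr)
{
  auto count_or_emit = [&out, num_insns] (const std::string &insn) {
    if (num_insns)
      {
        ++*num_insns;          // dry run: bump the count, emit nothing
        return;
      }
    out.push_back (insn);      // real run: record the instruction
  };

  count_or_emit ("li");        // stand-ins for the real li/lis/ori/... logic
  if (c != (long long) (short) c)
    count_or_emit ("oris/rldimi");
}
```

Because both modes run the same selection logic, the count always matches what would actually be emitted.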
> 
> Compare with the previous version,
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639491.html
> this version updates a lambda usage and comments.
> 
> Bootstrap & regtest pass ppc64{,le}.
> Is this ok for trunk?

OK for trunk, thanks for the patience.

BR,
Kewen

> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
>   parameter to record number of instructions to build the constant.
>   (num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
>   num_insn.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 284 ++--
>  1 file changed, 146 insertions(+), 138 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index cee22c359f3..1e3d1f7fc08 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1115,7 +1115,7 @@ static tree rs6000_handle_longcall_attribute (tree *, 
> tree, tree, int, bool *);
>  static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool 
> *);
>  static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
>  static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
> -static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
> +static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, int * = nullptr);
>  static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
>  static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, 
> bool);
>  static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
> @@ -6054,21 +6054,9 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
>  
>else if (TARGET_POWERPC64)
>  {
> -  HOST_WIDE_INT low = sext_hwi (value, 32);
> -  HOST_WIDE_INT high = value >> 31;
> -
> -  if (high == 0 || high == -1)
> - return 2;
> -
> -  high >>= 1;
> -
> -  if (low == 0 || low == high)
> - return num_insns_constant_gpr (high) + 1;
> -  else if (high == 0)
> - return num_insns_constant_gpr (low) + 1;
> -  else
> - return (num_insns_constant_gpr (high)
> - + num_insns_constant_gpr (low) + 1);
> +  int num_insns = 0;
> +  rs6000_emit_set_long_const (nullptr, value, &num_insns);
> +  return num_insns;
>  }
>  
>else
> @@ -10494,14 +10482,13 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>  
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> -   lis, ori and shl instructions.  */
> +   lis, ori and shl instructions.  If NUM_INSNS is not NULL, then
> +   only increase *NUM_INSNS as the number of insns, and do not emit
> +   any insns.  */
>  
>  static void
> -rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
> +rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns)
>  {
> -  rtx temp;
> -  int shift;
> -  HOST_WIDE_INT mask;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>  
>ud1 = c & 0x;
> @@ -10509,168 +10496,189 @@ rs6000_emit_set_long_const (rtx dest, 
> HOST_WIDE_INT c)
>ud3 = (c >> 32) & 0x;
>ud4 = (c >> 48) & 0x;
>  
> -  if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
> -  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> -emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
> +  /* This lambda is used to emit one insn or just increase the insn count.
> + When counting the insn number, no need to emit the insn.  */
> +  auto count_or_emit_insn = [&num_insns] (rtx dest_or_insn, rtx src = 
> nullptr) {
> +if (num_insns)
> +  {
> + (*num_insns)++;
> + return;
> +  }
> +
> +if (src)
> +  emit_move_insn (dest_or_insn, src);
> +else
> +  emit_insn (dest_or_insn);
> +  };
>  
> -  else if ((ud4 == 0x && ud3 == 0x && (ud2 & 0x8000))
> -|| (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
> +  if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
> +  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
>  {
> -  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +  /* li */
> +  count_or_emit_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
> +  return;
> +}
> +
> +  rtx temp
> +  

[committed] MAINTAINERS: Update my email address

2023-12-11 Thread Feng Wang
ChangeLog:

* MAINTAINERS: Update my email address
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f3683ff03ec..bc47e30325b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -705,7 +705,7 @@ Marcel Vollweiler   

 Ville Voutilainen  
 Nenad Vukicevic
 Feng Wang  
-Feng Wang  s
+Feng Wang  
 Hongyu Wang
 Jiong Wang 
 Stephen M. Webb

-- 
2.17.1



[committed] RISC-V: Add avail interface into function_group_info

2023-12-11 Thread Feng Wang
Patch v3: Fix typo and remove the modification of rvv.exp.
Patch v2: Use a variadic macro and add the dependency to t-riscv.

In order to add other vector-related extensions, this patch adds
unsigned int (*avail) (void) to function_group_info to determine
whether to register an intrinsic based on the ISA info.
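A rough sketch of the resulting shape (hypothetical names, not the actual riscv-vector-builtins machinery):

```c
#include <assert.h>

/* Hypothetical shape: each function group carries an availability
   predicate, and the builtin machinery registers the group's intrinsics
   only when the predicate reports the required ISA support.  */
struct function_group_info
{
  const char *name;
  unsigned int (*avail) (void);
};

static unsigned int have_vector = 1;         /* stand-in for ISA state */

unsigned int riscv_vector_avail (void) { return have_vector; }

unsigned int should_register (const struct function_group_info *g)
{
  return g->avail ();
}
```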
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
Add AVAIL argument.
(read_vl): Using AVAIL argument default value.
(vlenb): Ditto.
(vsetvl): Ditto.
(vsetvlmax): Ditto.
(vle): Ditto.
(vse): Ditto.
(vlm): Ditto.
(vsm): Ditto.
(vlse): Ditto.
(vsse): Ditto.
(vluxei8): Ditto.
(vluxei16): Ditto.
(vluxei32): Ditto.
(vluxei64): Ditto.
(vloxei8): Ditto.
(vloxei16): Ditto.
(vloxei32): Ditto.
(vloxei64): Ditto.
(vsuxei8): Ditto.
(vsuxei16): Ditto.
(vsuxei32): Ditto.
(vsuxei64): Ditto.
(vsoxei8): Ditto.
(vsoxei16): Ditto.
(vsoxei32): Ditto.
(vsoxei64): Ditto.
(vleff): Ditto.
(vadd): Ditto.
(vsub): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vwaddu): Ditto.
(vwsubu): Ditto.
(vwadd): Ditto.
(vwsub): Ditto.
(vwcvt_x): Ditto.
(vwcvtu_x): Ditto.
(vzext): Ditto.
(vsext): Ditto.
(vadc): Ditto.
(vmadc): Ditto.
(vsbc): Ditto.
(vmsbc): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vnot): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vnsrl): Ditto.
(vnsra): Ditto.
(vncvt_x): Ditto.
(vmseq): Ditto.
(vmsne): Ditto.
(vmsltu): Ditto.
(vmslt): Ditto.
(vmsleu): Ditto.
(vmsle): Ditto.
(vmsgtu): Ditto.
(vmsgt): Ditto.
(vmsgeu): Ditto.
(vmsge): Ditto.
(vminu): Ditto.
(vmin): Ditto.
(vmaxu): Ditto.
(vmax): Ditto.
(vmul): Ditto.
(vmulh): Ditto.
(vmulhu): Ditto.
(vmulhsu): Ditto.
(vdivu): Ditto.
(vdiv): Ditto.
(vremu): Ditto.
(vrem): Ditto.
(vwmul): Ditto.
(vwmulu): Ditto.
(vwmulsu): Ditto.
(vmacc): Ditto.
(vnmsac): Ditto.
(vmadd): Ditto.
(vnmsub): Ditto.
(vwmaccu): Ditto.
(vwmacc): Ditto.
(vwmaccsu): Ditto.
(vwmaccus): Ditto.
(vmerge): Ditto.
(vmv_v): Ditto.
(vsaddu): Ditto.
(vsadd): Ditto.
(vssubu): Ditto.
(vssub): Ditto.
(vaaddu): Ditto.
(vaadd): Ditto.
(vasubu): Ditto.
(vasub): Ditto.
(vsmul): Ditto.
(vssrl): Ditto.
(vssra): Ditto.
(vnclipu): Ditto.
(vnclip): Ditto.
(vfadd): Ditto.
(vfsub): Ditto.
(vfrsub): Ditto.
(vfadd_frm): Ditto.
(vfsub_frm): Ditto.
(vfrsub_frm): Ditto.
(vfwadd): Ditto.
(vfwsub): Ditto.
(vfwadd_frm): Ditto.
(vfwsub_frm): Ditto.
(vfmul): Ditto.
(vfdiv): Ditto.
(vfrdiv): Ditto.
(vfmul_frm): Ditto.
(vfdiv_frm): Ditto.
(vfrdiv_frm): Ditto.
(vfwmul): Ditto.
(vfwmul_frm): Ditto.
(vfmacc): Ditto.
(vfnmsac): Ditto.
(vfmadd): Ditto.
(vfnmsub): Ditto.
(vfnmacc): Ditto.
(vfmsac): Ditto.
(vfnmadd): Ditto.
(vfmsub): Ditto.
(vfmacc_frm): Ditto.
(vfnmacc_frm): Ditto.
(vfmsac_frm): Ditto.
(vfnmsac_frm): Ditto.
(vfmadd_frm): Ditto.
(vfnmadd_frm): Ditto.
(vfmsub_frm): Ditto.
(vfnmsub_frm): Ditto.
(vfwmacc): Ditto.
(vfwnmacc): Ditto.
(vfwmsac): Ditto.
(vfwnmsac): Ditto.
(vfwmacc_frm): Ditto.
(vfwnmacc_frm): Ditto.
(vfwmsac_frm): Ditto.
(vfwnmsac_frm): Ditto.
(vfsqrt): Ditto.
(vfsqrt_frm): Ditto.
(vfrsqrt7): Ditto.
(vfrec7): Ditto.
(vfrec7_frm): Ditto.
(vfmin): Ditto.
(vfmax): Ditto.
(vfsgnj): Ditto.
(vfsgnjn): Ditto.
(vfsgnjx): Ditto.
(vfneg): Ditto.
(vfabs): Ditto.
(vmfeq): Ditto.
(vmfne): Ditto.
(vmflt): Ditto.
(vmfle): Ditto.
(vmfgt): Ditto.
(vmfge): Ditto.
(vfclass): Ditto.
(vfmerge): Ditto.
(vfmv_v): Ditto.
(vfcvt_x): Ditto.
(vfcvt_xu): Ditto.
(vfcvt_rtz_x): Ditto.
(vfcvt_rtz_xu): Ditto.
(vfcvt_f): Ditto.
(vfcvt_x_frm): Ditto.
(vfcvt_xu_frm): Ditto.
(vfcvt_f_frm): Ditto.
(vfwcvt_x): Ditto.
(vfwcvt_xu): Ditto.
(vfwcvt_rtz_x): Ditto.

[Committed] RISC-V: Move RVV POLY VALUE estimation from riscv.cc to riscv-v.cc[NFC]

2023-12-11 Thread Juzhe-Zhong
This patch moves the RVV POLY VALUE estimation from riscv.cc to riscv-v.cc
for easier future maintenance, like other target hook implementations.

Committed as it is obviously a code refinement.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (estimated_poly_value): New function.
* config/riscv/riscv-v.cc (estimated_poly_value): Ditto.
* config/riscv/riscv.cc (riscv_estimated_poly_value): Move RVV POLY 
VALUE estimation to riscv-v.cc

---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-v.cc | 47 +
 gcc/config/riscv/riscv.cc   | 44 +++---
 3 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 20bbb5b859c..85ab1db2088 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -608,6 +608,7 @@ int count_regno_occurrences (rtx_insn *, unsigned int);
 bool imm_avl_p (machine_mode);
 bool can_be_broadcasted_p (rtx);
 bool gather_scatter_valid_offset_p (machine_mode);
+HOST_WIDE_INT estimated_poly_value (poly_int64, unsigned int);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 944b37b5df7..01898cb4b8d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4927,4 +4927,51 @@ gather_scatter_valid_offset_p (machine_mode mode)
   return true;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known VLA width all three estimates are the same.
+   For generic VLA tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with VLA when it is a win
+   even for VLA vectorization.
+   When VLA width information is available VAL.coeffs[1] is multiplied by
+   the number of VLA chunks over the initial VLS bits.  */
+HOST_WIDE_INT
+estimated_poly_value (poly_int64 val, unsigned int kind)
+{
+  unsigned int width_source
+= BITS_PER_RISCV_VECTOR.is_constant ()
+   ? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+   : (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on TARGET_MIN_VLEN vectors and the maximum is based on
+ the architectural maximum of 65536 bits.  */
+  unsigned int min_vlen_bytes = TARGET_MIN_VLEN / 8 - 1;
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * min_vlen_bytes;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_min_vlen = width_source - TARGET_MIN_VLEN;
+  return val.coeffs[0] + val.coeffs[1] * over_min_vlen / TARGET_MIN_VLEN;
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3f111fa0393..69a8a503f30 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9604,51 +9604,15 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, 
unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
-/* Implement TARGET_ESTIMATED_POLY_VALUE.
-   Look into the tuning structure for an estimate.
-   KIND specifies the type of requested estimate: min, max or likely.
-   For cores with a known RVV width all three estimates are the same.
-   For generic RVV tuning we want to distinguish the maximum estimate from
-   the minimum and likely ones.
-   The likely estimate is the same as the minimum in that case to give a
-   conservative behavior of auto-vectorizing with RVV when it is a win
-   even for 128-bit RVV.
-   When RVV width information is available VAL.coeffs[1] is multiplied by
-   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+/* Implement TARGET_ESTIMATED_POLY_VALUE.  */
 
 static HOST_WIDE_INT
 riscv_estimated_poly_value (poly_int64 val,
poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
 {
-  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
-? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
-: (unsigned int) RVV_SCALABLE;
-
-  /* If there is no core-specific information then the minimum and likely
- values are based on 128-bit vectors and the maximum is based on
- the architectural maximum of 65536 bits.  */
-  if 

RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-11 Thread Zhu, Lipeng
On 2023/12/12 1:45, H.J. Lu wrote:
> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
> >
> > On 2023/12/9 23:23, Jakub Jelinek wrote:
> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> > > > This patch tries to introduce an rwlock and split the read/write
> > > > access to the unit_root tree and unit_cache with the rwlock
> > > > instead of the mutex, to increase CPU efficiency. In the
> > > > get_gfc_unit function, the percentage of calls that step into the
> > > > insert_unit function is around 30%; in most instances, we can get
> > > > the unit in the phase of reading the unit_cache or unit_root
> > > > tree. So splitting the read/write phases with an rwlock is an
> > > > approach to make it more parallel.
> > > >
> > > > BTW, the IPC metric gains around 9x on our test server with
> > > > 220 cores. The benchmark we used is
> > > > https://github.com/rwesson/NEAT
> > > >
> > > > libgcc/ChangeLog:
> > > >
> > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> > > > (__gthrw): New function.
> > > > (__gthread_rwlock_rdlock): New function.
> > > > (__gthread_rwlock_tryrdlock): New function.
> > > > (__gthread_rwlock_wrlock): New function.
> > > > (__gthread_rwlock_trywrlock): New function.
> > > > (__gthread_rwlock_unlock): New function.
> > > >
> > > > libgfortran/ChangeLog:
> > > >
> > > > * io/async.c (DEBUG_LINE): New macro.
> > > > * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> > > > (CHECK_RDLOCK): New macro.
> > > > (CHECK_WRLOCK): New macro.
> > > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> > > > (IN_RWLOCK_DEBUG_QUEUE): New macro.
> > > > (RDLOCK): New macro.
> > > > (WRLOCK): New macro.
> > > > (RWUNLOCK): New macro.
> > > > (RD_TO_WRLOCK): New macro.
> > > > (INTERN_RDLOCK): New macro.
> > > > (INTERN_WRLOCK): New macro.
> > > > (INTERN_RWUNLOCK): New macro.
> > > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> > > > a comment.
> > > > (unit_lock): Remove including associated internal_proto.
> > > > (unit_rwlock): New declarations including associated internal_proto.
> > > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on
> unit_rwlock
> > > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> > > > unit_lock.
> > > > * io/transfer.c (st_read_done_worker): Use WRLOCK and
> RWUNLOCK
> > > on
> > > > unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> > > > (st_write_done_worker): Likewise.
> > > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> > > > comment. Use unit_rwlock variable instead of unit_lock variable.
> > > > (get_gfc_unit_from_unit_root): New function.
> > > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on
> unit_rwlock
> > > > instead of LOCK and UNLOCK on unit_lock.
> > > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > > of
> > > > LOCK and UNLOCK on unit_lock.
> > > > (close_units): Likewise.
> > > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK
> on
> > > > unit_lock.
> > > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> > > > instead of LOCK and UNLOCK on unit_lock.
> > > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock
> instead
> > > > of LOCK and UNLOCK on unit_lock.
> > >
> > > Ok for trunk, thanks.
> > >
> > >   Jakub
> >
> > Thanks! Looking forward to landing to trunk.
> >
> > Lipeng Zhu
> 
> Pushed for you.
> 
> Thanks.
> 
> --
> H.J.

Thanks for everyone's patience and help, really appreciate that!

Lipeng Zhu


[committed] MAINTAINERS: Add myself to write after approval and DCO

2023-12-11 Thread Feng Wang
ChangeLog:

* MAINTAINERS: Add myself to write after approval
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dbcbadcfd7..f3683ff03ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -705,6 +705,7 @@ Marcel Vollweiler   

 Ville Voutilainen  
 Nenad Vukicevic
 Feng Wang  
+Feng Wang  s
 Hongyu Wang
 Jiong Wang 
 Stephen M. Webb

-- 
2.17.1



Re: [pushed] [PATCH v5] LoongArch: Fix eh_return epilogue for normal returns.

2023-12-11 Thread chenglulu

Pushed to r14-6440.

在 2023/12/8 下午6:01, Yang Yujie 写道:

On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
and restored in the function prologue and epilogue if the given function calls
__builtin_eh_return.  This causes the return value to be overwritten on normal
return paths and breaks a rare case of libgcc's _Unwind_RaiseException.

gcc/ChangeLog:

* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/eh_return-normal-return.c: New test.
---
  gcc/config/loongarch/loongarch-protos.h   |  2 +-
  gcc/config/loongarch/loongarch.cc | 34 -
  gcc/config/loongarch/loongarch.md | 23 ++-
  .../loongarch/eh_return-normal-return.c   | 38 +++
  4 files changed, 84 insertions(+), 13 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/eh_return-normal-return.c

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..af20b5d7132 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -60,7 +60,7 @@ enum loongarch_symbol_type {
  extern rtx loongarch_emit_move (rtx, rtx);
  extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
  extern void loongarch_expand_prologue (void);
-extern void loongarch_expand_epilogue (bool);
+extern void loongarch_expand_epilogue (int);
  extern bool loongarch_can_use_return_insn (void);
  
  extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 3545e66a10e..1277c0e9f72 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1015,7 +1015,8 @@ loongarch_save_restore_reg (machine_mode mode, int regno, 
HOST_WIDE_INT offset,
  
  static void

  loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
- loongarch_save_restore_fn fn)
+ loongarch_save_restore_fn fn,
+ bool skip_eh_data_regs_p)
  {
HOST_WIDE_INT offset;
  
@@ -1024,7 +1025,14 @@ loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,

for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
{
-   if (!cfun->machine->reg_is_wrapped_separately[regno])
+   /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
+  when returning normally from a function that calls
+  __builtin_eh_return.  In this case, these registers are saved but
+  should not be restored, or the return value may be clobbered.  */
+
+   if (!(cfun->machine->reg_is_wrapped_separately[regno]
+ || (skip_eh_data_regs_p
+ && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
  loongarch_save_restore_reg (word_mode, regno, offset, fn);
  
  	offset -= UNITS_PER_WORD;

@@ -1297,7 +1305,7 @@ loongarch_expand_prologue (void)
GEN_INT (-step1));
RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
size -= step1;
-  loongarch_for_each_saved_reg (size, loongarch_save_reg);
+  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
  }
  
/* Set up the frame pointer, if we're using one.  */

@@ -1382,11 +1390,13 @@ loongarch_can_use_return_insn (void)
return reload_completed && cfun->machine->frame.total_size == 0;
  }
  
-/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P

-   says which.  */
+/* Expand function epilogue using the following insn patterns:
+   "epilogue"  (style == NORMAL_RETURN)
+   "sibcall_epilogue" (style == SIBCALL_RETURN)
+   "eh_return" (style == EXCEPTION_RETURN) */
  
  void

-loongarch_expand_epilogue (bool sibcall_p)
+loongarch_expand_epilogue (int style)
  {
/* Split the frame into two.  STEP1 is the amount of stack we should
   deallocate before restoring the registers.  STEP2 is the amount we
@@ -1403,7 +1413,8 @@ loongarch_expand_epilogue (bool sibcall_p)
bool need_barrier_p
  = (get_frame_size () + cfun->machine->frame.arg_pointer_offset) != 0;
  
-  if (!sibcall_p && loongarch_can_use_return_insn ())

+  /* Handle simple returns.  */
+  if (style == NORMAL_RETURN && loongarch_can_use_return_insn ())
  {
emit_jump_insn (gen_return ());
return;
@@ -1479,7 +1490,9 @@ loongarch_expand_epilogue (bool sibcall_p)
  
/* Restore the registers.  */

loongarch_for_each_saved_reg (frame->total_size - step2,
-   loongarch_restore_reg);
+ 

[PATCH #1/2] strub: handle volatile promoted args in internal strub [PR112938]

2023-12-11 Thread Alexandre Oliva


When generating code for an internal strub wrapper, don't clear the
DECL_NOT_GIMPLE_REG_P flag of volatile args, and gimplify them both
before and after any conversion.

While at that, move variable TMP into narrower scopes so that it's
more trivial to track where ARG lives.

Regstrapped on x86_64-linux-gnu.  Ok to install?

(there's a #2/2 followup coming up that addresses the ??? comment added
herein)


for  gcc/ChangeLog

PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Handle promoted
volatile args in internal strub.  Simplify.

for  gcc/testsuite/ChangeLog

PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: New.
---
 gcc/ipa-strub.cc   |   29 +---
 gcc/testsuite/gcc.dg/strub-internal-volatile.c |   10 
 2 files changed, 31 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/strub-internal-volatile.c

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
index 8ec6824e8a802..45294b0b46bcb 100644
--- a/gcc/ipa-strub.cc
+++ b/gcc/ipa-strub.cc
@@ -3203,7 +3203,6 @@ pass_ipa_strub::execute (function *)
   i++, arg = DECL_CHAIN (arg), nparm = DECL_CHAIN (nparm))
{
  tree save_arg = arg;
- tree tmp = arg;
 
  /* Arrange to pass indirectly the parms, if we decided to do
 so, and revert its type in the wrapper.  */
@@ -3211,10 +3210,9 @@ pass_ipa_strub::execute (function *)
{
  tree ref_type = TREE_TYPE (nparm);
  TREE_ADDRESSABLE (arg) = true;
- tree addr = build1 (ADDR_EXPR, ref_type, arg);
- tmp = arg = addr;
+ arg = build1 (ADDR_EXPR, ref_type, arg);
}
- else
+ else if (!TREE_THIS_VOLATILE (arg))
DECL_NOT_GIMPLE_REG_P (arg) = 0;
 
  /* Convert the argument back to the type used by the calling
@@ -3223,16 +3221,31 @@ pass_ipa_strub::execute (function *)
 double to be passed on unchanged to the wrapped
 function.  */
  if (TREE_TYPE (nparm) != DECL_ARG_TYPE (nparm))
-   arg = fold_convert (DECL_ARG_TYPE (nparm), arg);
+   {
+ tree tmp = arg;
+ /* If ARG is e.g. volatile, we must copy and
+convert in separate statements.  ???  Should
+we drop volatile from the wrapper
+instead?  */
+ if (!is_gimple_val (arg))
+   {
+ tmp = create_tmp_reg (TYPE_MAIN_VARIANT
+   (TREE_TYPE (arg)), "arg");
+ gimple *stmt = gimple_build_assign (tmp, arg);
+ gsi_insert_after (, stmt, GSI_NEW_STMT);
+   }
+ arg = fold_convert (DECL_ARG_TYPE (nparm), tmp);
+   }
 
  if (!is_gimple_val (arg))
{
- tmp = create_tmp_reg (TYPE_MAIN_VARIANT
-   (TREE_TYPE (arg)), "arg");
+ tree tmp = create_tmp_reg (TYPE_MAIN_VARIANT
+(TREE_TYPE (arg)), "arg");
  gimple *stmt = gimple_build_assign (tmp, arg);
  gsi_insert_after (, stmt, GSI_NEW_STMT);
+ arg = tmp;
}
- vargs.quick_push (tmp);
+ vargs.quick_push (arg);
  arg = save_arg;
}
/* These strub arguments are adjusted later.  */
diff --git a/gcc/testsuite/gcc.dg/strub-internal-volatile.c 
b/gcc/testsuite/gcc.dg/strub-internal-volatile.c
new file mode 100644
index 0..cdfca67616bc8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strub-internal-volatile.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target strub } */
+
+void __attribute__ ((strub("internal")))
+f(volatile short) {
+}
+
+void g(void) {
+  f(0);
+}

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                 GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] LoongArch: Fix warnings building libgcc

2023-12-11 Thread chenglulu



在 2023/12/10 上午12:38, Xi Ruoyao 写道:

We are excluding loongarch-opts.h from target libraries, but now struct
loongarch_target and gcc_options are not declared in the target
libraries, causing:

In file included from ../.././gcc/options.h:8,
  from ../.././gcc/tm.h:49,
  from ../../../gcc/libgcc/fixed-bit.c:48:
../../../gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:57:41:
warning: 'struct gcc_options' declared inside parameter list will not
be visible outside of this definition or declaration
57 |  struct gcc_options *opts,
   | ^~~

So exclude the declarations referring to the C++ structs as well.

gcc/ChangeLog:

* config/loongarch/loongarch-opts.h (la_target): Move into #if
for loongarch-def.h.
(loongarch_init_target): Likewise.
(loongarch_config_target): Likewise.
(loongarch_update_gcc_opt_status): Likewise.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

I don't understand. I tested it when I submitted the code, but I didn't 
find this problem.


I think such a problem will cause compilation errors when 
--enable-bootstrap, right?





Re: Introduce -finline-stringops

2023-12-11 Thread Alexandre Oliva
On Dec 11, 2023, Sam James  wrote:

> Alexandre Oliva via Gcc-patches  writes:

>> On Jun  2, 2023, Alexandre Oliva  wrote:
>> 
>>> Introduce -finline-stringops
>> 
>> Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620472.html

> Should the docs for the x86-specific -minline-all-stringops refer to
> the new -finline-stringops?

I wouldn't oppose it, but I find it might be somewhat misleading.
-minline-all-stringops seems to be presented as an optimization option,
because on x86 that's viable, whereas on some targets -finline-stringops
will often generate horribly inefficient code just to avoid some
dependencies on libc.  Now, it's undeniable that there is a connection
between them, in terms of offered functionality if not in goal and
framing/presentation.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                 GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] multiflags: fix doc warning properly

2023-12-11 Thread Alexandre Oliva
On Dec 11, 2023, Joseph Myers  wrote:

> On Fri, 8 Dec 2023, Alexandre Oliva wrote:
>> @@ -20589,7 +20589,7 @@ allocation before or after interprocedural 
>> optimization.
>> This option enables multilib-aware @code{TFLAGS} to be used to build
>> target libraries with options different from those the compiler is
>> configured to use by default, through the use of specs (@xref{Spec
>> -Files}) set up by compiler internals, by the target, or by builders at
>> +Files}.) set up by compiler internals, by the target, or by builders at

> The proper change in this context is to use @pxref instead of @xref.

Oooh, nice!  Thank you!

Here's a presumably proper fix on top of the earlier one, then.  Tested
on x86_64-linux-gnu.  Ok to install?


Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case.  Thanks to Joseph Myers
for introducing me to it.


for  gcc/ChangeLog

* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.
---
 gcc/doc/invoke.texi |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7d15cf94821e3..ce4bb025d5144 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20588,8 +20588,8 @@ allocation before or after interprocedural optimization.
 @item -fmultiflags
 This option enables multilib-aware @code{TFLAGS} to be used to build
 target libraries with options different from those the compiler is
-configured to use by default, through the use of specs (@xref{Spec
-Files}.) set up by compiler internals, by the target, or by builders at
+configured to use by default, through the use of specs (@pxref{Spec
+Files}) set up by compiler internals, by the target, or by builders at
 configure time.
 
 Like @code{TFLAGS}, this allows the target libraries to be built for


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                 GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen
> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Tuesday, December 12, 2023 9:01 AM
> To: haochen.jiang ; Andrew Pinski (QUIC)
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Jiang, Haochen 
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: haochen.jiang 
> > Sent: Monday, December 11, 2023 4:54 PM
> > To: Andrew Pinski (QUIC) ; gcc-
> > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
> > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > On Linux/x86_64,
> >
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit commit
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > Author: Andrew Pinski 
> > Date:   Sat Nov 11 15:54:10 2023 -0800
> >
> > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type 
> > are
> > the same
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> 
> 
> So I think this is a testsuite issue, in that shrx instruction is being used 
> here
> instead of just ` shrq` due to that instruction being enabled with `-
> march=cascadelake` .
> Can someone confirm that and submit a testcase change?

I will do that today.

Thx,
Haochen

> 
> Thanks,
> Andrew
> 
> >
> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-zlib -
> > -with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact 
> > me at
> > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > related, disabling AVX512F in command line might save that.) (However,
> > please make sure that there is no potential problems with AVX512.)


RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: haochen.jiang 
> Sent: Monday, December 11, 2023 4:54 PM
> To: Andrew Pinski (QUIC) ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
> Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit commit
> 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> Author: Andrew Pinski 
> Date:   Sat Nov 11 15:54:10 2023 -0800
> 
> MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are
> the same
> 
> caused
> 
> FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2


So I think this is a testsuite issue, in that shrx instruction is being used 
here instead of just ` shrq` due to that instruction being enabled with 
`-march=cascadelake` .
Can someone confirm that and submit a testcase change?

Thanks,
Andrew

> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-zlib -
> -with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at
> haochen dot jiang at intel.com.) (If you met problems with cascadelake
> related, disabling AVX512F in command line might save that.) (However,
> please make sure that there is no potential problems with AVX512.)


[r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread haochen.jiang
On Linux/x86_64,

85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
commit 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
Author: Andrew Pinski 
Date:   Sat Nov 11 15:54:10 2023 -0800

MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are 
the same

caused

FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6420/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH] libgccjit: Fix GGC segfault when using -flto

2023-12-11 Thread Antoni Boucher
I'm not sure how to do this. I tried the following commands, but this
fails even on master:

../../gcc/configure --enable-host-shared --enable-languages=c,jit,c++,fortran,objc,lto --enable-checking=release --disable-werror --prefix=/opt/gcc

make bootstrap -j24
make -k check -j24

From what I can understand, the unexpected failures are in g++:

=== g++ Summary ===

# of expected passes		72790
# of unexpected failures	1
# of expected failures		1011
# of unsupported tests		3503

=== g++ Summary ===

# of expected passes		4750
# of unexpected failures	27
# of expected failures		16
# of unsupported tests		43


Am I doing something wrong?

On Fri, 2023-12-01 at 12:49 -0500, David Malcolm wrote:
> On Thu, 2023-11-30 at 17:13 -0500, Antoni Boucher wrote:
> > Here's the updated patch.
> > The failure was due to the test being in the test array while it
> > should
> > not have been there since it changes the context.
> 
> Thanks for the updated patch.
> 
> Did you do a full bootstrap and regression test with this one, or do
> you want me to?
> 
> Dave
> 



Re: [Patch] OpenMP: Minor '!$omp allocators' cleanup - and still: Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-11 Thread Andrew MacLeod



On 12/11/23 17:12, Thomas Schwinge wrote:

Hi!

This issue would've been prevented if we'd actually use a distinct C++
data type for GCC types, checkable at compile time -- I'm thus CCing
Andrew MacLeod for amusement or crying, "one more for the list!".  ;-\


Perhaps the time has come.  It is definitely under re-consideration
for next stage 1...


Andrew


(See

"[TTYPE] Strongly typed tree project. Original document circa 2017".)

On 2023-12-11T12:45:27+0100, Tobias Burnus  wrote:

I included a minor cleanup patch [...]

I intent to commit that patch as obvious, unless there are further comments.
OpenMP: Minor '!$omp allocators' cleanup
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -8361,8 +8361,10 @@ gfc_omp_call_add_alloc (tree ptr)
if (fn == NULL_TREE)
  {
fn = build_function_type_list (void_type_node, ptr_type_node, 
NULL_TREE);
+  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
+  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
+  fn = build_type_attribute_variant (fn, att);
fn = build_fn_decl ("GOMP_add_alloc", fn);
-/* FIXME: attributes.  */
  }
return build_call_expr_loc (input_location, fn, 1, ptr);
  }
@@ -8380,7 +8382,9 @@ gfc_omp_call_is_alloc (tree ptr)
fn = build_function_type_list (boolean_type_node, ptr_type_node,
NULL_TREE);
fn = build_fn_decl ("GOMP_is_alloc", fn);
-/* FIXME: attributes.  */
+  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
+  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
+  fn = build_type_attribute_variant (fn, att);
  }
return build_call_expr_loc (input_location, fn, 1, ptr);
  }

Pushed to master branch commit 453e0f45a49f425992bc47ff8909ed8affc29d2e
"Resolve ICE in 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'", see
attached.


Grüße
  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955




RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-11 Thread Tamar Christina
> > + vectype = truth_type_for (comp_type);
> 
> so this leaves the producer of the mask in the GIMPLE_COND and we
> vectorize the GIMPLE_COND as
> 
>   mask_1 = ...;
>   if (mask_1 != {-1,-1...})
> ..
> 
> ?  In principle only the mask producer needs a vector type and that
> adjusted by bool handling, the branch itself doesn't need any
> STMT_VINFO_VECTYPE.
> 
> As said I believe if you recognize a GIMPLE_COND pattern for conds
> that aren't bool != 0 producing the mask stmt this should be picked
> up by bool handling correctly already.
> 
> Also as said piggy-backing on the COND_EXPR handling in this function
> which has the condition split out into a separate stmt(!) might not
> completely handle things correctly and you are likely missing
> the tcc_comparison handling of the embedded compare.
> 

Ok, I've stopped piggy-backing on the COND_EXPR handling and created
vect_recog_gcond_pattern.  As you said in the previous email I've also
stopped setting the vectype for the gcond and instead use the type of the
operand.

Note that because the pattern doesn't apply if the condition was already an NE_EXPR
I do need the extra truth_type_for for that case.  Because in the case of e.g.

a = b > 4;
if (a != 0)

The producer of the mask is already outside of the cond but will not trigger
Boolean recognition.  That means that while the integral type is correct it
won't be a Boolean one and vectorizable_comparison expects a Boolean
vector.  Alternatively, we can remove that assert?  But that seems worse.

Additionally in the previous email you mention "adjusted Boolean statement".

I'm guessing you were referring to generating a COND_EXPR from the gcond.
So vect_recog_bool_pattern detects it?  The problem with that is that this
gets folded to x & 1 and doesn't trigger.  It also then blocks vectorization.
So instead I've not forced it.

> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +{
> > +  /* We build the reductions in a way to maintain as much parallelism 
> > as
> > +possible.  */
> > +  auto_vec workset (stmts.length ());
> > +
> > +  /* Mask the statements as we queue them up.  */
> > +  if (masked_loop_p)
> > +   for (auto stmt : stmts)
> > + workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +   mask, stmt, _gsi));
> > +  else
> > +   workset.splice (stmts);
> > +
> > +  while (workset.length () > 1)
> > +   {
> > + new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > + tree arg0 = workset.pop ();
> > + tree arg1 = workset.pop ();
> > + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +  _gsi);
> > + workset.quick_insert (0, new_temp);
> > +   }
> > +}
> > +  else
> > +new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > + lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +_gsi);
> 
> You didn't fix any of the code above it seems, it's still wrong.
> 

Apologies, I hadn't realized that the last argument to get_loop_mask was the 
index.

Should be fixed now. Is this closer to what you wanted?
The individual ops are now masked with separate masks. (See testcase when 
N=865).

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
(vect_recog_gcond_pattern): New.
(vect_vect_recog_func_ptrs): Use it.
* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 
..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+ 

Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2023 at 05:00:50PM -0500, Jason Merrill wrote:
> In discussion of PR71093 it came up that more clobber_kind options would be
> useful within the C++ front-end.
> 
> gcc/ChangeLog:
> 
>   * tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
>   CLOBBER_STORAGE_END.  Add CLOBBER_STORAGE_BEGIN,
>   CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
>   * gimple-lower-bitint.cc
>   * gimple-ssa-warn-access.cc
>   * gimplify.cc
>   * tree-inline.cc
>   * tree-ssa-ccp.cc: Adjust for rename.

I doubt the above style will make it through the pre-commit hook, but I
might be wrong.

> --- a/gcc/gimple-lower-bitint.cc
> +++ b/gcc/gimple-lower-bitint.cc
> @@ -806,7 +806,7 @@ bitint_large_huge::handle_operand (tree op, tree idx)
> && m_after_stmt
> && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
>   {
> -   tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (m_vars[p]), 
> CLOBBER_STORAGE_END);

This needs line wrapping I think.

> @@ -2100,7 +2100,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple 
> *stmt,
> ret = build_fold_addr_expr (var);
> if (!stmt_ends_bb_p (gsi_stmt (m_gsi)))
>   {
> -   tree clobber = build_clobber (m_limb_type, CLOBBER_EOL);
> +   tree clobber = build_clobber (m_limb_type, 
> CLOBBER_STORAGE_END);

This too.

> && flag_stack_reuse != SR_NONE)
>   {
> -   tree clobber = build_clobber (TREE_TYPE (t), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (t), CLOBBER_STORAGE_END);
> gimple *clobber_stmt;
> clobber_stmt = gimple_build_assign (t, clobber);
> gimple_set_location (clobber_stmt, end_locus);
> @@ -7417,7 +7417,7 @@ gimplify_target_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
>   {
> if (flag_stack_reuse == SR_ALL)
>   {
> -   tree clobber = build_clobber (TREE_TYPE (temp), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (temp), 
> CLOBBER_STORAGE_END);
> clobber = build2 (MODIFY_EXPR, TREE_TYPE (temp), temp, clobber);
> gimple_push_cleanup (temp, clobber, false, pre_p, true);
>   }

These too.

> diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
> index a4fc839a22d..cca3227fa89 100644
> --- a/gcc/tree-inline.cc
> +++ b/gcc/tree-inline.cc
> @@ -5136,7 +5136,7 @@ expand_call_inline (basic_block bb, gimple *stmt, 
> copy_body_data *id,
> && !is_gimple_reg (*varp)
> && !(id->debug_map && id->debug_map->get (p)))
>   {
> -   tree clobber = build_clobber (TREE_TYPE (*varp), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (*varp), 
> CLOBBER_STORAGE_END);
> gimple *clobber_stmt;
> clobber_stmt = gimple_build_assign (*varp, clobber);
> gimple_set_location (clobber_stmt, gimple_location (stmt));
> @@ -5208,7 +5208,7 @@ expand_call_inline (basic_block bb, gimple *stmt, 
> copy_body_data *id,
> && !is_gimple_reg (id->retvar)
> && !stmt_ends_bb_p (stmt))
>   {
> -   tree clobber = build_clobber (TREE_TYPE (id->retvar), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (id->retvar), 
> CLOBBER_STORAGE_END);
> gimple *clobber_stmt;
> clobber_stmt = gimple_build_assign (id->retvar, clobber);
> gimple_set_location (clobber_stmt, gimple_location (old_stmt));

And these.

Otherwise LGTM.

Jakub



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Marek Polacek
On Mon, Dec 11, 2023 at 05:00:50PM -0500, Jason Merrill wrote:
> On 12/11/23 14:21, Marek Polacek wrote:
> > On Mon, Dec 11, 2023 at 08:17:22PM +0100, Richard Biener wrote:
> > > 
> > > 
> > > > Am 11.12.2023 um 20:12 schrieb Jason Merrill :
> > > > Maybe something like this?  Or shall we write out the names like 
> > > > CLOBBER_OBJECT_START, CLOBBER_STORAGE_END, etc?
> > > 
> > > Yeah, the abbreviations look a bit confusing so spelling it out would be 
> > > better
> > 
> > What about pretty-print, should we keep
> > 
> >pp_string (pp, "(eol)");
> > 
> > or use the new, more specific description?
> 
> I think we tend toward terseness in pretty-print, though I don't feel
> strongly about it at all.
> 
> But we should handle the other cases as well.

Nice.  I think you're going to have to adjust gcc.dg/pr87052.c because
that checks for "(eol)".

> So, how about:

Obviously I can't approve but, FWIW, it looks good.  I don't see anything
in doc/ that needs updating.  Thanks.

> From e2e535a440a5447b45769e6630cca21d274108f1 Mon Sep 17 00:00:00 2001
> From: Jason Merrill 
> Date: Mon, 11 Dec 2023 11:35:31 -0500
> Subject: [PATCH] tree: add to clobber_kind
> To: gcc-patches@gcc.gnu.org
> 
> In discussion of PR71093 it came up that more clobber_kind options would be
> useful within the C++ front-end.
> 
> gcc/ChangeLog:
> 
>   * tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
>   CLOBBER_STORAGE_END.  Add CLOBBER_STORAGE_BEGIN,
>   CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
>   * gimple-lower-bitint.cc
>   * gimple-ssa-warn-access.cc
>   * gimplify.cc
>   * tree-inline.cc
>   * tree-ssa-ccp.cc: Adjust for rename.
>   * tree-pretty-print.cc: And handle new values.
> ---
>  gcc/tree-core.h   | 13 ++---
>  gcc/gimple-lower-bitint.cc|  8 
>  gcc/gimple-ssa-warn-access.cc |  2 +-
>  gcc/gimplify.cc   |  8 
>  gcc/tree-inline.cc|  4 ++--
>  gcc/tree-pretty-print.cc  | 19 +--
>  gcc/tree-ssa-ccp.cc   |  2 +-
>  7 files changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 04c04cf2f37..58aa598f3bb 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -986,12 +986,19 @@ enum annot_expr_kind {
>annot_expr_kind_last
>  };
>  
> -/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  */
> +/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  Other than _UNDEF, these 
> are
> +   in roughly sequential order.  */
>  enum clobber_kind {
>/* Unspecified, this clobber acts as a store of an undefined value.  */
>CLOBBER_UNDEF,
> -  /* This clobber ends the lifetime of the storage.  */
> -  CLOBBER_EOL,
> +  /* Beginning of storage duration, e.g. malloc.  */
> +  CLOBBER_STORAGE_BEGIN,
> +  /* Beginning of object lifetime, e.g. C++ constructor.  */
> +  CLOBBER_OBJECT_BEGIN,
> +  /* End of object lifetime, e.g. C++ destructor.  */
> +  CLOBBER_OBJECT_END,
> +  /* End of storage duration, e.g. free.  */
> +  CLOBBER_STORAGE_END,
>CLOBBER_LAST
>  };
>  
> diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
> index c55c32fb40d..84f92b6e654 100644
> --- a/gcc/gimple-lower-bitint.cc
> +++ b/gcc/gimple-lower-bitint.cc
> @@ -806,7 +806,7 @@ bitint_large_huge::handle_operand (tree op, tree idx)
> && m_after_stmt
> && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
>   {
> -   tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_EOL);
> +   tree clobber = build_clobber (TREE_TYPE (m_vars[p]), 
> CLOBBER_STORAGE_END);
> g = gimple_build_assign (m_vars[p], clobber);
> gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
> gsi_insert_after (&gsi, g, GSI_SAME_STMT);
> @@ -2063,7 +2063,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple 
> *stmt,
>tree ret = build_fold_addr_expr (var);
>if (!stmt_ends_bb_p (gsi_stmt (m_gsi)))
>   {
> -   tree clobber = build_clobber (atype, CLOBBER_EOL);
> +   tree clobber = build_clobber (atype, CLOBBER_STORAGE_END);
> g = gimple_build_assign (var, clobber);
> gsi_insert_after (&m_gsi, g, GSI_SAME_STMT);
>   }
> @@ -2100,7 +2100,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple 
> *stmt,
> ret = build_fold_addr_expr (var);
> if (!stmt_ends_bb_p (gsi_stmt (m_gsi)))
>   {
> -   tree clobber = build_clobber (m_limb_type, CLOBBER_EOL);
> +   tree clobber = build_clobber (m_limb_type, 
> CLOBBER_STORAGE_END);
> g = gimple_build_assign (var, clobber);
> gsi_insert_after (&m_gsi, g, GSI_SAME_STMT);
>   }
> @@ -3707,7 +3707,7 @@ bitint_large_huge::finish_arith_overflow (tree var, 
> tree obj, tree type,
>  }
>if (var)
>  {
> -  tree clobber = build_clobber (TREE_TYPE (var), CLOBBER_EOL);
> +  tree clobber = build_clobber (TREE_TYPE (var), 

Re: [Patch] OpenMP: Minor '!$omp allocators' cleanup - and still: Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-11 Thread Thomas Schwinge
Hi!

This issue would've been prevented if we'd actually use a distinct C++
data type for GCC types, checkable at compile time -- I'm thus CCing
Andrew MacLeod for amusement or crying, "one more for the list!".  ;-\
(See

"[TTYPE] Strongly typed tree project. Original document circa 2017".)

On 2023-12-11T12:45:27+0100, Tobias Burnus  wrote:
> I included a minor cleanup patch [...]
>
> I intend to commit that patch as obvious, unless there are further comments.

> OpenMP: Minor '!$omp allocators' cleanup

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -8361,8 +8361,10 @@ gfc_omp_call_add_alloc (tree ptr)
>if (fn == NULL_TREE)
>  {
>fn = build_function_type_list (void_type_node, ptr_type_node, 
> NULL_TREE);
> +  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
> +  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES 
> (fn));
> +  fn = build_type_attribute_variant (fn, att);
>fn = build_fn_decl ("GOMP_add_alloc", fn);
> -/* FIXME: attributes.  */
>  }
>return build_call_expr_loc (input_location, fn, 1, ptr);
>  }
> @@ -8380,7 +8382,9 @@ gfc_omp_call_is_alloc (tree ptr)
>fn = build_function_type_list (boolean_type_node, ptr_type_node,
>NULL_TREE);
>fn = build_fn_decl ("GOMP_is_alloc", fn);
> -/* FIXME: attributes.  */
> +  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
> +  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES 
> (fn));
> +  fn = build_type_attribute_variant (fn, att);
>  }
>return build_call_expr_loc (input_location, fn, 1, ptr);
>  }

Pushed to master branch commit 453e0f45a49f425992bc47ff8909ed8affc29d2e
"Resolve ICE in 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'", see
attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 453e0f45a49f425992bc47ff8909ed8affc29d2e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 11 Dec 2023 22:52:54 +0100
Subject: [PATCH] Resolve ICE in
 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'

Fix-up for recent commit 2505a8b41d3b74a545755a278f3750a29c1340b6
"OpenMP: Minor '!$omp allocators' cleanup", which caused:

{+FAIL: gfortran.dg/gomp/allocate-5.f90   -O  (internal compiler error: tree check: expected class 'type', have 'declaration' (function_decl) in gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)+}
[-PASS:-]{+FAIL:+} gfortran.dg/gomp/allocate-5.f90   -O  (test for excess errors)

..., and similarly in 'libgomp.fortran/allocators-1.f90',
'libgomp.fortran/allocators-2.f90', 'libgomp.fortran/allocators-3.f90',
'libgomp.fortran/allocators-4.f90', 'libgomp.fortran/allocators-5.f90'.

	gcc/fortran/
	* trans-openmp.cc (gfc_omp_call_is_alloc): Resolve ICE.
---
 gcc/fortran/trans-openmp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 95184920cf7..f7c73a5d273 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -8381,10 +8381,10 @@ gfc_omp_call_is_alloc (tree ptr)
 {
   fn = build_function_type_list (boolean_type_node, ptr_type_node,
  NULL_TREE);
-  fn = build_fn_decl ("GOMP_is_alloc", fn);
   tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
   att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
   fn = build_type_attribute_variant (fn, att);
+  fn = build_fn_decl ("GOMP_is_alloc", fn);
 }
   return build_call_expr_loc (input_location, fn, 1, ptr);
 }
-- 
2.34.1



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Jason Merrill

On 12/11/23 14:21, Marek Polacek wrote:

On Mon, Dec 11, 2023 at 08:17:22PM +0100, Richard Biener wrote:




Am 11.12.2023 um 20:12 schrieb Jason Merrill :
Maybe something like this?  Or shall we write out the names like 
CLOBBER_OBJECT_START, CLOBBER_STORAGE_END, etc?


Yeah, the abbreviations look a bit confusing so spelling it out would be better


What about pretty-print, should we keep

   pp_string (pp, "(eol)");

or use the new, more specific description?


I think we tend toward terseness in pretty-print, though I don't feel 
strongly about it at all.


But we should handle the other cases as well.

So, how about:

From e2e535a440a5447b45769e6630cca21d274108f1 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 11 Dec 2023 11:35:31 -0500
Subject: [PATCH] tree: add to clobber_kind
To: gcc-patches@gcc.gnu.org

In discussion of PR71093 it came up that more clobber_kind options would be
useful within the C++ front-end.

gcc/ChangeLog:

	* tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
	CLOBBER_STORAGE_END.  Add CLOBBER_STORAGE_BEGIN,
	CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
	* gimple-lower-bitint.cc
	* gimple-ssa-warn-access.cc
	* gimplify.cc
	* tree-inline.cc
	* tree-ssa-ccp.cc: Adjust for rename.
	* tree-pretty-print.cc: And handle new values.
---
 gcc/tree-core.h   | 13 ++---
 gcc/gimple-lower-bitint.cc|  8 
 gcc/gimple-ssa-warn-access.cc |  2 +-
 gcc/gimplify.cc   |  8 
 gcc/tree-inline.cc|  4 ++--
 gcc/tree-pretty-print.cc  | 19 +--
 gcc/tree-ssa-ccp.cc   |  2 +-
 7 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 04c04cf2f37..58aa598f3bb 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -986,12 +986,19 @@ enum annot_expr_kind {
   annot_expr_kind_last
 };
 
-/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  */
+/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  Other than _UNDEF, these are
+   in roughly sequential order.  */
 enum clobber_kind {
   /* Unspecified, this clobber acts as a store of an undefined value.  */
   CLOBBER_UNDEF,
-  /* This clobber ends the lifetime of the storage.  */
-  CLOBBER_EOL,
+  /* Beginning of storage duration, e.g. malloc.  */
+  CLOBBER_STORAGE_BEGIN,
+  /* Beginning of object lifetime, e.g. C++ constructor.  */
+  CLOBBER_OBJECT_BEGIN,
+  /* End of object lifetime, e.g. C++ destructor.  */
+  CLOBBER_OBJECT_END,
+  /* End of storage duration, e.g. free.  */
+  CLOBBER_STORAGE_END,
   CLOBBER_LAST
 };
 
diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
index c55c32fb40d..84f92b6e654 100644
--- a/gcc/gimple-lower-bitint.cc
+++ b/gcc/gimple-lower-bitint.cc
@@ -806,7 +806,7 @@ bitint_large_huge::handle_operand (tree op, tree idx)
 	  && m_after_stmt
 	  && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
 	{
-	  tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_EOL);
+	  tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_STORAGE_END);
 	  g = gimple_build_assign (m_vars[p], clobber);
 	  gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
 	  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
@@ -2063,7 +2063,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple *stmt,
   tree ret = build_fold_addr_expr (var);
   if (!stmt_ends_bb_p (gsi_stmt (m_gsi)))
 	{
-	  tree clobber = build_clobber (atype, CLOBBER_EOL);
+	  tree clobber = build_clobber (atype, CLOBBER_STORAGE_END);
 	  g = gimple_build_assign (var, clobber);
 	  gsi_insert_after (&m_gsi, g, GSI_SAME_STMT);
 	}
@@ -2100,7 +2100,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple *stmt,
 	  ret = build_fold_addr_expr (var);
 	  if (!stmt_ends_bb_p (gsi_stmt (m_gsi)))
 		{
-		  tree clobber = build_clobber (m_limb_type, CLOBBER_EOL);
+		  tree clobber = build_clobber (m_limb_type, CLOBBER_STORAGE_END);
 		  g = gimple_build_assign (var, clobber);
 		  gsi_insert_after (&m_gsi, g, GSI_SAME_STMT);
 		}
@@ -3707,7 +3707,7 @@ bitint_large_huge::finish_arith_overflow (tree var, tree obj, tree type,
 }
   if (var)
 {
-  tree clobber = build_clobber (TREE_TYPE (var), CLOBBER_EOL);
+  tree clobber = build_clobber (TREE_TYPE (var), CLOBBER_STORAGE_END);
   g = gimple_build_assign (var, clobber);
   gsi_insert_after (&m_gsi, g, GSI_SAME_STMT);
 }
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 1646bd1be14..f04c2530869 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -4364,7 +4364,7 @@ void
 pass_waccess::check_stmt (gimple *stmt)
 {
   if (m_check_dangling_p
-  && gimple_clobber_p (stmt, CLOBBER_EOL))
+  && gimple_clobber_p (stmt, CLOBBER_STORAGE_END))
 {
   /* Ignore clobber statements in blocks with exceptional edges.  */
   basic_block bb = gimple_bb (stmt);
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 342e43a7f25..ffc1882d22a 100644
--- a/gcc/gimplify.cc
+++ 

[r14-6412 Regression] FAIL: libgomp.fortran/allocators-5.f90 -O (test for excess errors) on Linux/x86_64

2023-12-11 Thread haochen.jiang
On Linux/x86_64,

2505a8b41d3b74a545755a278f3750a29c1340b6 is the first bad commit
commit 2505a8b41d3b74a545755a278f3750a29c1340b6
Author: Tobias Burnus 
Date:   Mon Dec 11 15:08:07 2023 +0100

OpenMP: Minor '!$omp allocators' cleanup

caused

FAIL: gfortran.dg/gomp/allocate-5.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: gfortran.dg/gomp/allocate-5.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocators-1.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: libgomp.fortran/allocators-1.f90   -O   scan-tree-dump-times original 
"__builtin_GOMP_free \\(sss, 0B\\);" 2
FAIL: libgomp.fortran/allocators-1.f90   -O   scan-tree-dump-times original 
"GOMP_add_alloc \\(sss\\);" 1
FAIL: libgomp.fortran/allocators-1.f90   -O   scan-tree-dump-times original "if 
\\(GOMP_is_alloc \\(sss\\)\\)" 2
FAIL: libgomp.fortran/allocators-1.f90   -O   scan-tree-dump-times original 
"sss = \\(integer\\(kind=4\\) \\*\\) __builtin_GOMP_alloc \\(4, 4, 
D\\.[0-9]+\\);" 1
FAIL: libgomp.fortran/allocators-1.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocators-2.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: libgomp.fortran/allocators-2.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocators-3.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: libgomp.fortran/allocators-3.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocators-4.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: libgomp.fortran/allocators-4.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocators-5.f90   -O  (internal compiler error: tree 
check: expected class 'type', have 'declaration' (function_decl) in 
gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)
FAIL: libgomp.fortran/allocators-5.f90   -O  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6412/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-5.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-5.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-5.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=gfortran.dg/gomp/allocate-5.f90 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-1.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-1.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-1.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-1.f90 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-2.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-2.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-2.f90 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-2.f90 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-3.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-3.f90 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/allocators-3.f90 

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-12-11 Thread Thomas Schwinge
Hi Andre!

On 2023-10-16T16:03:26+0100, "Andre Vieira (lists)" 
 wrote:
> Just a minor update to the patch, I had missed the libgomp testsuite, so
> had to make some adjustments there too.

Unfortunately, there appear to be a number of DejaGnu directive errors in
your test case changes -- do you not see those in your testing?

-PASS: gcc.dg/gomp/pr87887-1.c (test for excess errors)
+ERROR: gcc.dg/gomp/pr87887-1.c: syntax error in target selector ".-4" for 
" dg-warning 13 "unsupported return type ‘struct S’ for ‘simd’ functions" { 
target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/pr87887-1.c: syntax error in target selector ".-4" for 
" dg-warning 13 "unsupported return type ‘struct S’ for ‘simd’ functions" { 
target aarch64*-*-* } .-4 "

-UNSUPPORTED: gcc.dg/gomp/pr89246-1.c
+ERROR: gcc.dg/gomp/pr89246-1.c: syntax error in target selector ".-4" for 
" dg-warning 11 "unsupported argument type ‘__int128’ for ‘simd’ functions" { 
target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/pr89246-1.c: syntax error in target selector ".-4" for 
" dg-warning 11 "unsupported argument type ‘__int128’ for ‘simd’ functions" { 
target aarch64*-*-* } .-4 "

-PASS: gcc.dg/gomp/simd-clones-2.c (test for excess errors)
-PASS: gcc.dg/gomp/simd-clones-2.c scan-tree-dump optimized 
"(?n)^__attribute__\\(\\(omp declare simd \\(notinbranch aligned\\(2:32\\)\\), 
omp declare simd \\(inbranch uniform\\(2\\) linear\\(1:66\\)\\)\\)\\)$"
-[...]
-PASS: gcc.dg/gomp/simd-clones-2.c scan-tree-dump optimized 
"_ZGVeN16vvva32_addit"
+ERROR: gcc.dg/gomp/simd-clones-2.c: unmatched open quote in list for " 
dg-final 19 { scan-tree-dump "_ZGVnN2ua32vl_setArray" "optimized { target 
aarch64*-*-* } } "
+ERROR: gcc.dg/gomp/simd-clones-2.c: unmatched open quote in list for " 
dg-final 19 { scan-tree-dump "_ZGVnN2ua32vl_setArray" "optimized { target 
aarch64*-*-* } } "

-PASS: libgomp.c/declare-variant-1.c (test for excess errors)
-PASS: libgomp.c/declare-variant-1.c scan-ltrans-tree-dump-not optimized 
"f04 \\(x"
-PASS: libgomp.c/declare-variant-1.c scan-ltrans-tree-dump-times optimized 
"f01 \\(x" 4
-PASS: libgomp.c/declare-variant-1.c scan-ltrans-tree-dump-times optimized 
"f03 \\(x" 14
-PASS: libgomp.c/declare-variant-1.c scan-tree-dump-times gimple "f04 \\(x" 
2
+ERROR: libgomp.c/declare-variant-1.c: unknown dg option: \} for "}"
+ERROR: libgomp.c/declare-variant-1.c: unknown dg option: \} for "}"

..., and the following change also doesn't look quite right:

> --- a/libgomp/testsuite/libgomp.fortran/declare-simd-1.f90
> +++ b/libgomp/testsuite/libgomp.fortran/declare-simd-1.f90
> @@ -1,5 +1,5 @@
>  ! { dg-do run { target vect_simd_clones } }
> -! { dg-options "-fno-inline" }
> +! { dg-options "-fno-inline -cpp -D__aarch64__" }


Grüße
 Thomas


> gcc/ChangeLog:
>
>  * config/aarch64/aarch64.cc (lane_size): New function.
>  (aarch64_simd_clone_compute_vecsize_and_simdlen): Determine
> simdlen according to NDS rule
>  and reject combination of simdlen and types that lead to
> vectors larger than 128bits.
>
> gcc/testsuite/ChangeLog:
>
>  * lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
>  * c-c++-common/gomp/declare-variant-14.c: Adapt test for aarch64.
>  * c-c++-common/gomp/pr60823-1.c: Likewise.
>  * c-c++-common/gomp/pr60823-2.c: Likewise.
>  * c-c++-common/gomp/pr60823-3.c: Likewise.
>  * g++.dg/gomp/attrs-10.C: Likewise.
>  * g++.dg/gomp/declare-simd-1.C: Likewise.
>  * g++.dg/gomp/declare-simd-3.C: Likewise.
>  * g++.dg/gomp/declare-simd-4.C: Likewise.
>  * g++.dg/gomp/declare-simd-7.C: Likewise.
>  * g++.dg/gomp/declare-simd-8.C: Likewise.
>  * g++.dg/gomp/pr88182.C: Likewise.
>  * gcc.dg/declare-simd.c: Likewise.
>  * gcc.dg/gomp/declare-simd-1.c: Likewise.
>  * gcc.dg/gomp/declare-simd-3.c: Likewise.
>  * gcc.dg/gomp/pr87887-1.c: Likewise.
>  * gcc.dg/gomp/pr87895-1.c: Likewise.
>  * gcc.dg/gomp/pr89246-1.c: Likewise.
>  * gcc.dg/gomp/pr99542.c: Likewise.
>  * gcc.dg/gomp/simd-clones-2.c: Likewise.
>  * gcc.dg/vect/vect-simd-clone-1.c: Likewise.
>  * gcc.dg/vect/vect-simd-clone-2.c: Likewise.
>  * gcc.dg/vect/vect-simd-clone-4.c: Likewise.
>  * gcc.dg/vect/vect-simd-clone-5.c: Likewise.
>  * gcc.dg/vect/vect-simd-clone-8.c: Likewise.
>  * gfortran.dg/gomp/declare-simd-2.f90: Likewise.
>  * gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
>  * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>  * gfortran.dg/gomp/pr79154-1.f90: Likewise.
>  * gfortran.dg/gomp/pr83977.f90: Likewise.
>
> libgomp/testsuite/ChangeLog:
>
>  * libgomp.c/declare-variant-1.c: Adapt test for aarch64.
>  * libgomp.fortran/declare-simd-1.f90: Likewise.
> 

[PATCH] analyzer: fix uninitialized bitmap [PR112955]

2023-12-11 Thread David Malcolm
In r14-5566-g841008d3966c0f I added a new ctor for
feasibility_state, but failed to call bitmap_clear
on m_snodes_visited.

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Verified fix manually with valgrind on gcc.dg/analyzer/data-model-20.c.
Pushed to trunk as r14-6434-g6008b80b25d718.

gcc/analyzer/ChangeLog:
PR analyzer/112955
* engine.cc (feasibility_state::feasibility_state): Initialize
m_snodes_visited.
---
 gcc/analyzer/engine.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 1f930a21eb37..03750815939a 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -4870,6 +4870,7 @@ feasibility_state::feasibility_state (const region_model &model,
 : m_model (model),
   m_snodes_visited (sg.m_nodes.length ())
 {
+  bitmap_clear (m_snodes_visited);
 }
 
 feasibility_state &
-- 
2.26.3



Re: v2 [C PATCH] Fix regression causing ICE for structs with VLAs [PR 112488]

2023-12-11 Thread Joseph Myers
On Sat, 9 Dec 2023, Martin Uecker wrote:

> Fix regression causing ICE for structs with VLAs [PR 112488]
> 
> A previous patch that fixed several ICEs related to size expressions
> of VM types (PR c/70418, ...) caused a regression for structs where
> a DECL_EXPR is no longer generated although required.  We now call
> add_decl_expr, introduced by the previous patch, from finish_struct.
> The function is revised with a new argument so that it does not set
> the TYPE_NAME for the type to the DECL_EXPR in this specific case.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] multiflags: fix doc warning

2023-12-11 Thread Joseph Myers
On Fri, 8 Dec 2023, Alexandre Oliva wrote:

> @@ -20589,7 +20589,7 @@ allocation before or after interprocedural 
> optimization.
>  This option enables multilib-aware @code{TFLAGS} to be used to build
>  target libraries with options different from those the compiler is
>  configured to use by default, through the use of specs (@xref{Spec
> -Files}) set up by compiler internals, by the target, or by builders at
> +Files}.) set up by compiler internals, by the target, or by builders at

The proper change in this context is to use @pxref instead of @xref.


-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Jonathan Wakely
CCing Tom's current address, as he's not @redhat.com now.

On Mon, 11 Dec 2023, 19:24 Nate Eldredge,  wrote:

> On Mon, 11 Dec 2023, Nate Eldredge wrote:
>
> > To fix, we need something like `__args._M_old = __val;` inside the loop
> in
> > __atomic_wait_address(), so that we always wait on the exact value that
> the
> > predicate __pred() rejected.  Again, there are similar instances in
> > atomic_timed_wait.h.
>
> Thinking through this, there's another problem.  The main loop in
> __atomic_wait_address() would then be:
>
>while (!__pred(__val))
>  {
>__args._M_old = __val;
>__detail::__wait_impl(__wait_addr, &__args);
>__val = __vfn();
>  }
>
> The idea being that we only call __wait_impl to wait on a value that the
> predicate said was unacceptable.  But looking for instance at its caller
> __atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate
> passed in is _S_do_try_acquire(), whose code is:
>
>  _S_do_try_acquire(__detail::__platform_wait_t* __counter,
>__detail::__platform_wait_t __old) noexcept
>  {
>if (__old == 0)
>  return false;
>
>return __atomic_impl::compare_exchange_strong(__counter,
>  __old, __old - 1,
>  memory_order::acquire,
>
>  memory_order::relaxed);
>  }
>
> It returns false if the value passed in was unacceptable (i.e. zero), *or*
> if it was nonzero (let's say 1) but the compare_exchange failed because
> another thread swooped in and modified the semaphore counter.  In that
> latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which
> is likewise bad.  If the counter is externally changed back to 1 just
> before we call __platform_wait (that's the futex call), we would go to
> sleep waiting on a semaphore that is already available: deadlock.
>
> I guess there's a couple ways to fix it.
>
> You could have the "predicate" callback instead return a tri-state value:
> "all done, stop waiting" (like current true), "value passed is not
> acceptable" (like current false), and "value was acceptable but something
> else went wrong".  Only the second case should result in calling
> __wait_impl().  In the third case, __atomic_wait_address() should
> just reload the value (using __vfn()) and loop again.
>
> Or, make the callback __pred() a pure predicate that only tests its input
> value for acceptability, without any other side effects.  Then have
> __atomic_wait_address() simply return as soon as __pred(__val) returns
> true.  It would be up to the caller to actually decrement the semaphore or
> whatever, and to call __atomic_wait_address() again if this fails.  In
> that case, __atomic_wait_address should probably return the final value
> that was read, so the caller can immediately do something like a
> compare-exchange using it, and not have to do an additional load and
> predicate test.
>
> Or, make __pred() a pure predicate as before, and give
> __atomic_wait_address yet one more callback function argument, call it
> __taker(), whose job is to acquire the semaphore etc, and have
> __atomic_wait_address call it after __pred(__val) returns true.
>
> --
> Nate Eldredge
> n...@thatsmathematics.com
>
>


Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Richard Sandiford
Jeff Law  writes:
> On 11/27/23 05:12, Richard Sandiford wrote:
>> check_asm_operands was inconsistent about how it handled "p" after
>> RA compared to before RA.  Before RA it tested the address with a
>> void (unknown) memory mode:
>> 
>>  case CT_ADDRESS:
>>/* Every address operand can be reloaded to fit.  */
>>result = result || address_operand (op, VOIDmode);
>>break;
>> 
>> After RA it deferred to constrain_operands, which used the mode
>> of the operand:
>> 
>>  if ((GET_MODE (op) == VOIDmode
>>   || SCALAR_INT_MODE_P (GET_MODE (op)))
>>  && (strict <= 0
>>  || (strict_memory_address_p
>>   (recog_data.operand_mode[opno], op
>>win = true;
>> 
>> Using the mode of the operand matches reload's behaviour:
>> 
>>else if (insn_extra_address_constraint
>> (lookup_constraint (constraints[i])))
>>  {
>>address_operand_reloaded[i]
>>  = find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
>>  recog_data.operand[i],
>>  recog_data.operand_loc[i],
>>  i, operand_type[i], ind_levels, insn);
>> 
>> It allowed the special predicate address_operand to be used, with the
>> mode of the operand being the mode of the addressed memory, rather than
>> the mode of the address itself.  For example, vax has:
>> 
>> (define_insn "*movaddr"
>>[(set (match_operand:SI 0 "nonimmediate_operand" "=g")
>>  (match_operand:VAXfp 1 "address_operand" "p"))
>> (clobber (reg:CC VAX_PSL_REGNUM))]
>>"reload_completed"
>>"mova %a1,%0")
>> 
>> where operand 1 is an SImode expression that can address memory of
>> mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
>> but recog_data.operand_mode[1] is mode.
>> 
>> But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
>> do this, and instead pass VOIDmode.  So I think this traditional use
>> of address_operand is effectively an old-reload-only feature.
>> 
>> And it seems like no modern port cares.  I think ports have generally
>> moved to using different address constraints instead, rather than
>> relying on "p" with different operand modes.  Target-specific address
>> constraints post-date the code above.
>> 
>> The big advantage of using different constraints is that it works
>> for asms too.  And that (to finally get to the point) is the problem
>> fixed in this patch.  For the aarch64 test:
>> 
>>void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }
>> 
>> everything up to and including RA required the operand to be a
>> valid VOIDmode address.  But post-RA check_asm_operands and
>> constrain_operands instead required it to be valid for
>> recog_data.operand_mode[0].  Since asms have no syntax for
>> specifying an operand mode that's separate from the operand itself,
>> operand_mode[0] is simply Pmode (i.e. DImode).
>> 
>> This meant that we required one mode before RA and a different mode
>> after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
>> more conservative/restricted range than DImode.  So if a post-RA pass
>> tried to form a new address, it would use a laxer condition than the
>> pre-RA passes.
> This was initially a bit counter-intuitive, my first reaction was that a 
> wildcard mode is more general.  And that's true, but it necessarily 
> means the addresses accepted are more restrictive because any mode is 
> allowed.

Right.  I should probably have said: a conservative, common subset.

>> This happened with the late-combine pass that I posted in October:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
>> which in turn triggered an error from aarch64_print_operand_address.
>> 
>> This patch takes the (hopefully) conservative fix of using VOIDmode for
>> asms but continuing to use the operand mode for .md insns, so as not
>> to break ports that still use reload.
> Sadly I didn't get as far as I would have liked in removing reload, 
> though we did get a handful of ports converted this cycle
>
>> 
>> Fixing this made me realise that recog_level2 was doing duplicate
>> work for asms after RA.
>> 
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> 
>> Richard
>> 
>> 
>> gcc/
>>  * recog.cc (constrain_operands): Pass VOIDmode to
>>  strict_memory_address_p for 'p' constraints in asms.
>>  * rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
>>  for asms.
>> 
>> gcc/testsuite/
>>  * gcc.target/aarch64/prfm_imm_offset_2.c: New test.
> It all seems a bit hackish.  I don't think ports have had much success 
> using 'p' through the decades.  I think I generally ended up having to 
> go with distinct constraints rather than relying on 'p'.
>
> OK for the trunk, but ewww.

Thanks, pushed.  And yeah, eww is fair.  I'd be happy for this to 

Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Nate Eldredge

On Mon, 11 Dec 2023, Nate Eldredge wrote:

To fix, we need something like `__args._M_old = __val;` inside the loop in 
__atomic_wait_address(), so that we always wait on the exact value that the 
predicate __pred() rejected.  Again, there are similar instances in 
atomic_timed_wait.h.


Thinking through this, there's another problem.  The main loop in 
__atomic_wait_address() would then be:


  while (!__pred(__val))
{
  __args._M_old = __val;
  __detail::__wait_impl(__wait_addr, &__args);
  __val = __vfn();
}

The idea being that we only call __wait_impl to wait on a value that the 
predicate said was unacceptable.  But looking for instance at its caller 
__atomic_semaphore::_M_acquire() in bits/semaphore_base.h, the predicate 
passed in is _S_do_try_acquire(), whose code is:


_S_do_try_acquire(__detail::__platform_wait_t* __counter,
  __detail::__platform_wait_t __old) noexcept
{
  if (__old == 0)
return false;

  return __atomic_impl::compare_exchange_strong(__counter,
__old, __old - 1,
memory_order::acquire,
memory_order::relaxed);
}

It returns false if the value passed in was unacceptable (i.e. zero), *or* 
if it was nonzero (let's say 1) but the compare_exchange failed because 
another thread swooped in and modified the semaphore counter.  In that 
latter case, __atomic_wait_address() would pass 1 to __wait_impl(), which 
is likewise bad.  If the counter is externally changed back to 1 just 
before we call __platform_wait (that's the futex call), we would go to 
sleep waiting on a semaphore that is already available: deadlock.


I guess there's a couple ways to fix it.

You could have the "predicate" callback instead return a tri-state value: 
"all done, stop waiting" (like current true), "value passed is not 
acceptable" (like current false), and "value was acceptable but something 
else went wrong".  Only the second case should result in calling 
__wait_impl().  In the third case, __atomic_wait_address() should 
just reload the value (using __vfn()) and loop again.


Or, make the callback __pred() a pure predicate that only tests its input 
value for acceptability, without any other side effects.  Then have 
__atomic_wait_address() simply return as soon as __pred(__val) returns 
true.  It would be up to the caller to actually decrement the semaphore or 
whatever, and to call __atomic_wait_address() again if this fails.  In 
that case, __atomic_wait_address should probably return the final value 
that was read, so the caller can immediately do something like a 
compare-exchange using it, and not have to do an additional load and 
predicate test.


Or, make __pred() a pure predicate as before, and give 
__atomic_wait_address yet one more callback function argument, call it 
__taker(), whose job is to acquire the semaphore etc, and have 
__atomic_wait_address call it after __pred(__val) returns true.


--
Nate Eldredge
n...@thatsmathematics.com



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Marek Polacek
On Mon, Dec 11, 2023 at 08:17:22PM +0100, Richard Biener wrote:
> 
> 
> > Am 11.12.2023 um 20:12 schrieb Jason Merrill :
> > Maybe something like this?  Or shall we write out the names like 
> > CLOBBER_OBJECT_START, CLOBBER_STORAGE_END, etc?
> 
> Yeah, the abbreviations look a bit confusing so spelling it out would be 
> better

What about pretty-print, should we keep

  pp_string (pp, "(eol)");

or use the new, more specific description?

Marek



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Richard Biener



> Am 11.12.2023 um 20:12 schrieb Jason Merrill :
> 
> On 12/11/23 03:02, Richard Biener wrote:
>>> On Sun, 10 Dec 2023, Jason Merrill wrote:
>>> On 12/10/23 05:22, Richard Biener wrote:
> Am 09.12.2023 um 21:13 schrieb Jason Merrill :
> 
> On 11/2/23 21:18, Nathaniel Shead wrote:
>> Bootstrapped and regtested on x86-64_pc_linux_gnu.
>> I'm not entirely sure if the change I made to have destructors clobber with
>> CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to
>> have broken by doing this and I wasn't able to find anything else that
>> really depended on this distinction other than a warning pass. Otherwise I
>> could experiment with a new clobber kind for destructor calls.
> 
> It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is
> expiring at that point as well, which a (pseudo-)destructor does not imply;
> it's perfectly valid to destroy an object and then create another in the
> same storage.
> 
> We probably do want another clobber kind for end of object lifetime.
> And/or one for beginning of object lifetime.
 
 There's not much semantically different between UNDEF and end of object but
 not storage lifetime?  At least for what middle-end optimizations do.
>>> 
>>> That's fine for the middle-end, but Nathaniel's patch wants to distinguish
>>> between the clobbers at beginning and end of object lifetime in order to
>>> diagnose stores to an out-of-lifetime object in constexpr evaluation.
>> Ah, I see.  I did want to add CLOBBER_SOL (start-of-life) when working
>> on PR90348, but I always fail to finish working on that stack-slot sharing
>> issue.  But it would be for the storage life, not object life, also
>> added by gimplification.
>>> One option might be to remove the clobber at the beginning of the
>>> constructor; are there any useful optimizations enabled by that, or is it
>>> just pedantically breaking people's code?
>> It's allowing DSE to the object that was live before the new one.  Not
>> all objects require explicit destruction (which would get you a clobber)
>> before storage can be re-used.
 EOL is used by stack slot sharing and that operates on the underlying
 storage, not individual objects live in it.
>>> 
>>> I wonder about changing the name to EOS (end of storage [duration]) to avoid
>>> similar confusion with object lifetime?
>> EOS{L,D}?  But sure, better names (and documentation) are appreciated.
> 
> Maybe something like this?  Or shall we write out the names like 
> CLOBBER_OBJECT_START, CLOBBER_STORAGE_END, etc?

Yeah, the abbreviations look a bit confusing so spelling it out would be better

Richard 

> 
> <0001-tree-add-to-clobber_kind.patch>


[pushed] testsuite: update mangling

2023-12-11 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Since r14-6064-gc3f281a0c1ca50 this test was checking for the wrong
mangling, but it still passed on targets that support ABI compatibility
aliases.  Let's avoid generating those aliases when checking mangling.

gcc/ChangeLog:

* common.opt: Add comment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-explicit-inst1.C: Specify ABI v18.
* g++.dg/cpp2a/concepts-explicit-inst1a.C: New test.
---
 gcc/common.opt|  1 +
 .../g++.dg/cpp2a/concepts-explicit-inst1.C|  1 +
 .../g++.dg/cpp2a/concepts-explicit-inst1a.C   | 24 +++
 3 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1a.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 5eb5ecff04b..d263a959df3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1020,6 +1020,7 @@ Driver Undocumented
 ;
 ; 19: Emits ABI tags if needed in structured binding mangled names.
 ; Ignores cv-quals on [[no_unique_address]] members.
+; Mangles constraints on function templates.
 ; Default in G++ 14.
 ;
 ; Additional positive integers will be assigned as new versions of
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1.C
index 5cbf64a8cd3..b66e919e880 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1.C
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++20 } }
+// { dg-additional-options "-fabi-version=18 -fabi-compat-version=18" }
 // { dg-final { scan-assembler "_Z1gI1XEvT_" } }
 // { dg-final { scan-assembler "_Z1gI1YEvT_" } }
 // { dg-final { scan-assembler "_Z1gIiEvT_" } }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1a.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1a.C
new file mode 100644
index 00000000000..feb31f9e24c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst1a.C
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options "-fabi-version=0 -fabi-compat-version=0" }
+// { dg-final { scan-assembler "_Z1gITk1C1YEvT_" } }
+// { dg-final { scan-assembler "_Z1gITk1D1XEvT_" } }
+// { dg-final { scan-assembler "_Z1gIiEvT_" } }
+
+template<typename T>
+  concept C = __is_class(T);
+
+template<typename T>
+  concept D = C<T> && __is_empty(T);
+
+struct X { };
+struct Y { int n; };
+
+template<typename T> void g(T) { } // #1
+template<C T> void g(T) { } // #2
+template<D T> void g(T) { } // #3
+
+template void g(int); // Instantiate #1
+template void g(X); // Instantiate #3
+template void g(Y); // Instantiate #2
+
+int main() { }

base-commit: 889341a897d3d2e9fb09de1a1c5e764a2c03424f
prerequisite-patch-id: 6f3182b395d62bae9a7b1930edc21d52611c328c
-- 
2.39.3



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Jason Merrill

On 12/11/23 03:02, Richard Biener wrote:

On Sun, 10 Dec 2023, Jason Merrill wrote:


On 12/10/23 05:22, Richard Biener wrote:

Am 09.12.2023 um 21:13 schrieb Jason Merrill :

On 11/2/23 21:18, Nathaniel Shead wrote:

Bootstrapped and regtested on x86-64_pc_linux_gnu.
I'm not entirely sure if the change I made to have destructors clobber
with
CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to
have
broken by doing this and I wasn't able to find anything else that really
depended on this distinction other than a warning pass. Otherwise I could
experiment with a new clobber kind for destructor calls.


It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is
expiring at that point as well, which a (pseudo-)destructor does not imply;
it's perfectly valid to destroy an object and then create another in the
same storage.

We probably do want another clobber kind for end of object lifetime. And/or
one for beginning of object lifetime.


There's not much semantically different between UNDEF and end of object but
not storage lifetime?  At least for what middle-end optimizations do.


That's fine for the middle-end, but Nathaniel's patch wants to distinguish
between the clobbers at beginning and end of object lifetime in order to
diagnose stores to an out-of-lifetime object in constexpr evaluation.


Ah, I see.  I did want to add CLOBBER_SOL (start-of-life) when working
on PR90348, but I always fail to finish working on that stack-slot sharing
issue.  But it would be for the storage life, not object life, also
added by gimplification.


One option might be to remove the clobber at the beginning of the constructor;
are there any useful optimizations enabled by that, or is it just pedantically
breaking people's code?


It's allowing DSE to the object that was live before the new one.  Not
all objects require explicit destruction (which would get you a clobber)
before storage can be re-used.


EOL is used by stack slot sharing and that operates on the underlying
storage, not individual objects live in it.


I wonder about changing the name to EOS (end of storage [duration]) to avoid
similar confusion with object lifetime?


EOS{L,D}?  But sure, better names (and documentation) are appreciated.


Maybe something like this?  Or shall we write out the names like 
CLOBBER_OBJECT_START, CLOBBER_STORAGE_END, etc?



From 14f71a9479bd0cf4249c8c9e917a9caf3eac8c82 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 11 Dec 2023 11:35:31 -0500
Subject: [PATCH] tree: add to clobber_kind
To: gcc-patches@gcc.gnu.org

In discussion of PR71093 it came up that more clobber_kind options would be
useful within the C++ front-end.

gcc/ChangeLog:

	* tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
	CLOBBER_EOSD.  Add CLOBBER_BOSD, CLOBBER_BOBL, CLOBBER_EOBL.
	* gimple-lower-bitint.cc
	* gimple-ssa-warn-access.cc
	* gimplify.cc
	* tree-inline.cc
	* tree-pretty-print.cc
	* tree-ssa-ccp.cc: Adjust for rename.
---
 gcc/tree-core.h   | 13 ++---
 gcc/gimple-lower-bitint.cc|  8 
 gcc/gimple-ssa-warn-access.cc |  2 +-
 gcc/gimplify.cc   |  8 
 gcc/tree-inline.cc|  4 ++--
 gcc/tree-pretty-print.cc  |  2 +-
 gcc/tree-ssa-ccp.cc   |  2 +-
 7 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 04c04cf2f37..bdf14605c91 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -986,12 +986,19 @@ enum annot_expr_kind {
   annot_expr_kind_last
 };
 
-/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  */
+/* The kind of a TREE_CLOBBER_P CONSTRUCTOR node.  Other than _UNDEF, these are
+   in roughly sequential order.  */
 enum clobber_kind {
   /* Unspecified, this clobber acts as a store of an undefined value.  */
   CLOBBER_UNDEF,
-  /* This clobber ends the lifetime of the storage.  */
-  CLOBBER_EOL,
+  /* Beginning of storage duration, e.g. malloc.  */
+  CLOBBER_BOSD,
+  /* Beginning of object lifetime, e.g. C++ constructor.  */
+  CLOBBER_BOBL,
+  /* End of object lifetime, e.g. C++ destructor.  */
+  CLOBBER_EOBL,
+  /* End of storage duration, e.g. free.  */
+  CLOBBER_EOSD,
   CLOBBER_LAST
 };
 
diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
index c55c32fb40d..00c3a5b20a8 100644
--- a/gcc/gimple-lower-bitint.cc
+++ b/gcc/gimple-lower-bitint.cc
@@ -806,7 +806,7 @@ bitint_large_huge::handle_operand (tree op, tree idx)
 	  && m_after_stmt
 	  && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
 	{
-	  tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_EOL);
+	  tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_EOSD);
 	  g = gimple_build_assign (m_vars[p], clobber);
 	  gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
	  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
@@ -2063,7 +2063,7 @@ bitint_large_huge::handle_operand_addr (tree op, gimple *stmt,
   tree ret = build_fold_addr_expr (var);
   if (!stmt_ends_bb_p 

Re: [PING ^3][PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-12-11 Thread jeevitha
Ping!

Please review 

On 13/11/23 8:38 pm, jeevitha wrote:
> Ping!
> 
> please review.
> 
> Thanks & Regards
> Jeevitha
> 
> On 25/08/23 7:49 am, Peter Bergner wrote:
>> On 8/24/23 12:35 PM, Michael Meissner wrote:
>>> On Thu, Jul 20, 2023 at 10:05:28AM +0530, jeevitha wrote:
 gcc/
PR target/110411
* config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
to hold PTImode type.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
for PTImode type.
>>>
>>> It is good as far as it goes, but I suspect we will eventually need to
>>> extend it.  In particular, the reason people need PTImode is they need
>>> the even/odd register layout.  What you've done enables users to declare
>>> this value.
>>
>> Sure, it could be extended, but that is not what this patch is about.
>> It's purely to allow the kernel team access to the guaranteed even/odd
>> register layout for some inline asm code.  Any extension would be a
>> follow-on patch to this.
>>
>>
>>
>> On 8/9/23 3:48 AM, Kewen.Lin wrote:
>>> IIUC, this builtin type registering makes this type expose to users, so
>>> I wonder if we want to actually expose this type for users' uses.
>>> If yes, we need to update the documentation (and not sure if the current
>>> name is good enough);

Is the current name acceptable if we're not going to document the type?

>>
>> Segher, Mike, Jeevitha and I talked about the patch and Segher mentioned
>> that under some conditions, it's fine to keep the type undocumented.
>> Hopefully he'll weigh in on whether this particular patch is one of
>> those cases or not.  
>>
>>
>> Peter

Thanks & Regards
Jeevitha


Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-11 Thread Marek Polacek
On Mon, Dec 11, 2023 at 02:02:52PM -0500, Jason Merrill wrote:
> On 12/11/23 03:49, FX Coudert wrote:
> > Hi Marek,
> > 
> > The patch is causing three failures on x86_64-apple-darwin21:
> > 
> > > FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
> > > _Z1gI1XEvT_
> > > FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
> > > _Z1gI1YEvT_
> 
> I think these are from my r14-6064-gc3f281a0c1ca50 rather than Marek's
> patch.  The mangling of those two functions changed, but the test looking
> for the old mangling still passed on targets that support ABI compatibility
> aliases.  I'll fix the tests.

Sounds plausible, thanks.  If the tests still fail with 
-fno-immediate-escalation,
it's the mangling changes.
 
> Jason
> 
> > > FAIL: g++.dg/cpp2a/consteval-prop6.C -std=c++20 at line 58 (test for 
> > > warnings, line 57)
> > 
> > How could I help debug this?
> > 
> > FX
> > 
> 

Marek



Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-11 Thread Jason Merrill

On 12/11/23 03:49, FX Coudert wrote:

Hi Marek,

The patch is causing three failures on x86_64-apple-darwin21:


FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
_Z1gI1XEvT_
FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
_Z1gI1YEvT_


I think these are from my r14-6064-gc3f281a0c1ca50 rather than Marek's 
patch.  The mangling of those two functions changed, but the test 
looking for the old mangling still passed on targets that support ABI 
compatibility aliases.  I'll fix the tests.


Jason


FAIL: g++.dg/cpp2a/consteval-prop6.C -std=c++20 at line 58 (test for warnings, 
line 57)


How could I help debug this?

FX





Re: Introduce -finline-stringops (was: Re: [RFC] Introduce -finline-memset-loops)

2023-12-11 Thread Sam James


Alexandre Oliva via Gcc-patches  writes:

> On Jun  2, 2023, Alexandre Oliva  wrote:
>
>> Introduce -finline-stringops
>
> Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620472.html

Should the docs for the x86-specific -minline-all-stringops refer to
the new -finline-stringops?

thanks,
sam


Re: [PATCH V2] RISC-V: XFAIL scan dump fails for autovec PR111311

2023-12-11 Thread Edwin Lu



On 12/10/2023 9:37 PM, Kito Cheng wrote:

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
index b8a2bd1ebbd..6f2fc56c82c 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
@@ -24,4 +24,4 @@ unsigned test()
return sum;
  }

-/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail 
vect_variable_length } } } */
+/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail { vect_variable_length 
&& aarch64*-*-* } } } } */

aarch64?


I found the patch which added the xfail vect_variable_length originating 
from https://inbox.sourceware.org/gcc-patches/mptfszz8vxw@arm.com/. 
The patch mentioned how it was tested on aarch64 which is why I added 
it. Should I change it to ! riscv_v?


Edwin



[PATCH v2 FYI] -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]

2023-12-11 Thread Alexandre Oliva
On Dec 11, 2023, Richard Biener  wrote:

> you can use .exists (&move_mode) here to ...

Aah, yeah, and that should help avoid the noisy as_a conversions too,
which I could replace with require(), or drop altogether.

>> +  || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_move_mode))
>> + != incr * BITS_PER_UNIT))

Unfortunately, here it can't quite be dropped; GET_MODE_SIZE on a
machine_mode isn't suitable for the !=, but with .require() we know it's
a scalar_int_mode and thus != on its bitsize is fine.

> I'll note that int_mode_for_size and smallest_int_mode_for_size
> are not semantically equivalent in what they can return.  In
> particular it seems you are incrementing by iter_incr even when
> formerly smallest_int_mode_for_size would have returned a
> larger than necessary mode, resulting in at least inefficient
> code, copying/comparing pieces multiple times.

If we get a mode that isn't exactly the expected size, we go for
BLKmode, so it should be fine and efficient.  Unless machine modes are
not powers of two multiples of BITS_PER_UNIT, then things may get a
little weird, not so much because of repeated copying/comparing, but
because of inefficiencies in the block copy/compare operations with
block sizes that are not a good fit for such hypothetical (?) GCC
targets.  I guess we can cross that bridge if we ever get to it.

> So int_mode_for_size looks more correct.

*nod*.  IIRC I had it at first (very long ago), and went for the
smallest_ when transitioning to the machine_mode type hierarchy revamp.

> OK with the above change.

Here's what I've regstrapped on x86_64-linux-gnu, and will install
shortly.  Thanks!


smallest_int_mode_for_size may abort when the requested mode is not
available.  Call int_mode_for_size instead, that signals the
unsatisfiable request in a more graceful way.


for  gcc/ChangeLog

PR middle-end/112784
* expr.cc (emit_block_move_via_loop): Call int_mode_for_size
for maybe-too-wide sizes.
(emit_block_cmp_via_loop): Likewise.

for  gcc/testsuite/ChangeLog

PR middle-end/112784
* gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
---
 gcc/expr.cc|   20 +---
 .../i386/avx512cd-inline-stringops-pr112784.c  |   12 
 2 files changed, 21 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 6da51f2aca296..076ba706537aa 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2449,15 +2449,14 @@ emit_block_move_via_loop (rtx x, rtx y, rtx size,
 }
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_move_mode
-= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_move_mode) != incr * BITS_PER_UNIT)
+  opt_scalar_int_mode int_move_mode
+    = int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_move_mode.exists (&move_mode)
+      || GET_MODE_BITSIZE (int_move_mode.require ()) != incr * BITS_PER_UNIT)
 {
   move_mode = BLKmode;
   gcc_checking_assert (can_move_by_pieces (incr, align));
 }
-  else
-move_mode = int_move_mode;
 
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
   y_addr = force_operand (XEXP (y, 0), NULL_RTX);
@@ -2701,16 +2700,15 @@ emit_block_cmp_via_loop (rtx x, rtx y, rtx len, tree 
len_type, rtx target,
   iter = gen_reg_rtx (iter_mode);
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_cmp_mode
-= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_cmp_mode) != incr * BITS_PER_UNIT
-  || !can_compare_p (NE, int_cmp_mode, ccp_jump))
+  opt_scalar_int_mode int_cmp_mode
+    = int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_cmp_mode.exists (&cmp_mode)
+      || GET_MODE_BITSIZE (int_cmp_mode.require ()) != incr * BITS_PER_UNIT
+      || !can_compare_p (NE, cmp_mode, ccp_jump))
 {
   cmp_mode = BLKmode;
   gcc_checking_assert (incr != 1);
 }
-  else
-cmp_mode = int_cmp_mode;
 
   /* Save the base addresses.  */
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c 
b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
new file mode 100644
index 0000000000000..c81f99c693c24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512cd -finline-stringops" } */
+
+struct S {
+  int e;
+} __attribute__((aligned(128)));
+
+int main() {
+  struct S s1;
+  struct S s2;
+  int v = __builtin_memcmp(&s1, &s2, sizeof(s1));
+}


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-11 Thread H.J. Lu
On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>
> On 2023/12/9 23:23, Jakub Jelinek wrote:
> > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
> > > This patch try to introduce the rwlock and split the read/write to
> > > unit_root tree and unit_cache with rwlock instead of the mutex to
> > > increase CPU efficiency. In the get_gfc_unit function, the percentage
> > > to step into the insert_unit function is around 30%, in most
> > > instances, we can get the unit in the phase of reading the unit_cache
> > > or unit_root tree. So split the read/write phase by rwlock would be an
> > > approach to make it more parallel.
> > >
> > > BTW, the IPC metrics can gain around 9x in our test server with 220
> > > cores. The benchmark we used is https://github.com/rwesson/NEAT
> > >
> > > libgcc/ChangeLog:
> > >
> > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
> > > (__gthrw): New function.
> > > (__gthread_rwlock_rdlock): New function.
> > > (__gthread_rwlock_tryrdlock): New function.
> > > (__gthread_rwlock_wrlock): New function.
> > > (__gthread_rwlock_trywrlock): New function.
> > > (__gthread_rwlock_unlock): New function.
> > >
> > > libgfortran/ChangeLog:
> > >
> > > * io/async.c (DEBUG_LINE): New macro.
> > > * io/async.h (RWLOCK_DEBUG_ADD): New macro.
> > > (CHECK_RDLOCK): New macro.
> > > (CHECK_WRLOCK): New macro.
> > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
> > > (IN_RWLOCK_DEBUG_QUEUE): New macro.
> > > (RDLOCK): New macro.
> > > (WRLOCK): New macro.
> > > (RWUNLOCK): New macro.
> > > (RD_TO_WRLOCK): New macro.
> > > (INTERN_RDLOCK): New macro.
> > > (INTERN_WRLOCK): New macro.
> > > (INTERN_RWUNLOCK): New macro.
> > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> > > a comment.
> > > (unit_lock): Remove including associated internal_proto.
> > > (unit_rwlock): New declarations including associated internal_proto.
> > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> > > unit_lock.
> > > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK
> > on
> > > unit_rwlock instead of LOCK and UNLOCK on unit_lock.
> > > (st_write_done_worker): Likewise.
> > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
> > > comment. Use unit_rwlock variable instead of unit_lock variable.
> > > (get_gfc_unit_from_unit_root): New function.
> > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
> > > instead of LOCK and UNLOCK on unit_lock.
> > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > of
> > > LOCK and UNLOCK on unit_lock.
> > > (close_units): Likewise.
> > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
> > > unit_lock.
> > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
> > > instead of LOCK and UNLOCK on unit_lock.
> > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
> > > of LOCK and UNLOCK on unit_lock.
> >
> > Ok for trunk, thanks.
> >
> >   Jakub
>
> Thanks! Looking forward to landing to trunk.
>
> Lipeng Zhu

Pushed for you.

Thanks.

-- 
H.J.


Re: [PATCH] Testsuite: restrict test to nonpic targets

2023-12-11 Thread Mike Stump
On Dec 11, 2023, at 12:29 AM, FX Coudert  wrote:
> 
> The test is currently failing on x86_64-apple-darwin. This patch requires 
> nonpic, as suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112297 
> by Andrew Pinski.
> 
> OK to commit?

Ok.

Re: Backport of "fixincludes: Update darwin_flt_eval_method for macOS 14"

2023-12-11 Thread FX Coudert
> Yes, OK (build fixes are on my list, but you got to it first).

Backported to 13 as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=87e6cc0103369f8891c3c3a516f4d93187c2c12b
Backported to 12 as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=65595b02668c99edcfd5aedac984ebcbb64a1685

FX

[PATCH v3 6/6] libgomp: fine-grained pinned memory allocator

2023-12-11 Thread Andrew Stubbs

This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available).  In future,
this allocator will also be used for Unified Shared Memory.  Both memories
are incompatible with the system malloc because allocated memory cannot
share a page with memory allocated for other purposes.

This means that small allocations will no longer consume an entire page of
pinned memory.  Unfortunately, it also means that pinned memory pages will
never be unmapped (although they may be reused).

The implementation is not perfect; there are various corner cases (especially
related to extending onto new pages) where allocations and reallocations may
be sub-optimal, but it should still be a step forward in support for small
allocations.

I have considered using libmemkind's "fixed" memory but rejected it for three
reasons: 1) libmemkind may not always be present at runtime, 2) there's no
currently documented means to extend a "fixed" kind one page at a time
(although the code appears to have an undocumented function that may do the
job, and/or extending libmemkind to support the MAP_LOCKED mmap flag with its
regular kinds would be straight-forward), 3) USM benefits from having the
metadata located in different memory and using an external implementation makes
it hard to guarantee this.

libgomp/ChangeLog:

* Makefile.am (libgomp_la_SOURCES): Add usmpin-allocator.c.
* Makefile.in: Regenerate.
* config/linux/allocator.c: Include unistd.h.
(pin_ctx): New variable.
(ctxlock): New variable.
(linux_init_pin_ctx): New function.
(linux_memspace_alloc): Use usmpin-allocator for pinned memory.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.h (usmpin_init_context): New prototype.
(usmpin_register_memory): New prototype.
(usmpin_alloc): New prototype.
(usmpin_free): New prototype.
(usmpin_realloc): New prototype.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust for new behaviour.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-8.c: New test.
* usmpin-allocator.c: New file.
---
 libgomp/Makefile.am  |   2 +-
 libgomp/Makefile.in  |   7 +-
 libgomp/config/linux/allocator.c |  91 --
 libgomp/libgomp.h|  10 +
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c | 127 
 libgomp/usmpin-allocator.c   | 319 +++
 6 files changed, 523 insertions(+), 33 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/usmpin-allocator.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 1871590596d..9d41ed886d1 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -72,7 +72,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
 	oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
 	priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c target-indirect.c
+	oacc-target.c target-indirect.c usmpin-allocator.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 56a6beab867..96fa9faf6a4 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
 	oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
 	oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
 	affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-	oacc-target.lo target-indirect.lo $(am__objects_1)
+	oacc-target.lo target-indirect.lo usmpin-allocator.lo \
+	$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +553,8 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c target-indirect.c $(am__append_3)
+	oacc-target.c target-indirect.c usmpin-allocator.c \
+	$(am__append_3)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -786,6 +788,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/teams.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/time.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/usmpin-allocator.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/work.Plo@am__quote@
 
 .c.o:
diff 

[PATCH v3 4/6] openmp: -foffload-memory=pinned

2023-12-11 Thread Andrew Stubbs

Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.

In this mode the ompx_pinned_memory_alloc feature is disabled as it is not
needed and may conflict.

gcc/ChangeLog:

* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map: Add GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def  |  3 +
 gcc/omp-low.cc| 66 +++
 libgomp/config/linux/allocator.c  | 26 
 libgomp/libgomp.map   |  1 +
 .../libgomp.c-c++-common/alloc-pinned-1.c | 28 
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++
 6 files changed, 187 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index ed78d49d205..54ea7380722 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -473,3 +473,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+		  "GOMP_enable_pinned_mode",
+		  BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index dd802ca37a6..455c5897577 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14592,6 +14592,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+return;
+  visited = true;
+
+  /* Create a new function like this:
+ 
+   static void __attribute__((constructor))
+   __set_pinned_mode ()
+   {
+ GOMP_enable_pinned_mode ();
+   }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+  NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		   void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14648,6 +14710,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
 finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 269d0d607d8..57278b1af91 100644
--- a/libgomp/config/linux/allocator.c
+++ 

[PATCH v3 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Andrew Stubbs

Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.

The design adds a device independent plugin API for allocating pinned memory,
and then implements it for NVPTX.  At present, the other supported devices do
not have equivalent capabilities (or requirements).

libgomp/ChangeLog:

* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp.texi: Adjust the docs for the pinned trait.
* libgomp_g.h (GOMP_enable_pinned_mode): New prototype.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/config/linux/allocator.c | 137 ++-
 libgomp/libgomp-plugin.h |   2 +
 libgomp/libgomp.h|   4 +
 libgomp/libgomp.texi |  11 +-
 libgomp/libgomp_g.h  |   1 +
 libgomp/plugin/plugin-nvptx.c|  42 ++
 libgomp/target.c | 136 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c |  45 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c |  44 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c |  35 -
 13 files changed, 487 insertions(+), 48 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 57278b1af91..8d681b5ec50 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -36,6 +36,11 @@
 
 /* Implement malloc routines that can handle pinned memory on Linux.

+   Given that pinned memory is typically used to help host <-> device memory
+   transfers, we attempt to allocate such memory using a device (really:
+   libgomp plugin), but fall back to mmap plus mlock if no suitable device is
+   available.
+
It's possible to use mlock on any heap memory, but using munlock is
problematic if there are multiple pinned allocations on the same page.
Tracking all that manually would be possible, but adds overhead. This may
@@ -49,6 +54,7 @@
 #define _GNU_SOURCE
 #include <sys/mman.h>
 #include <string.h>
+#include <assert.h>
 #include "libgomp.h"
 
 static bool always_pinned_mode = false;
@@ -65,45 +71,87 @@ GOMP_enable_pinned_mode ()
 always_pinned_mode = true;
 }
 
+static int using_device_for_page_locked
+  = /* uninitialized */ -1;
+
 static void *
-linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin,
+		  bool init0)
 {
-  (void)memspace;
+  gomp_debug (0, "%s: memspace=%llu, size=%llu, pin=%d, init0=%d\n",
+	  __FUNCTION__, (unsigned long long) memspace,
+	  (unsigned long long) size, pin, init0);
+
+  void *addr;
 
   /* Explicit pinning may not be required.  */
   pin = pin && !always_pinned_mode;
 
   if (pin)
 {
-  /* Note that mmap always returns zeroed memory and is therefore also a
-	 suitable implementation of calloc.  */
-  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
-			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-  if (addr == MAP_FAILED)
-	return NULL;
-
-  if (mlock (addr, size))
+  int using_device
+	= __atomic_load_n (&using_device_for_page_locked,
+			   

[PATCH v3 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-12-11 Thread Andrew Stubbs

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.

libgomp/ChangeLog:

* allocator.c (omp_max_predefined_alloc): Update.
(predefined_alloc_mapping): Add ompx_pinned_mem_alloc entry.
(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* libgomp.texi: Document ompx_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c   |  58 ++
 libgomp/libgomp.texi  |   7 +-
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 103 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 +
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
 7 files changed, 268 insertions(+), 20 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 666adf9a3a9..6c69c4f008f 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include 
 #endif
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
The defaults (no override) are to return NULL for pinned memory requests
@@ -78,6 +78,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc (implementation defined). */
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc (implementation defined). */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc (implementation defined). */
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
 };
 
 #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
@@ -623,8 +624,10 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size,
-allocator_data && allocator_data->pinned);
+	  int pinned = (allocator_data
+			? allocator_data->pinned
+			: allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
   if (ptr == NULL)
 	goto fail;
@@ -645,7 +648,8 @@ retry:
 fail:;
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		 || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -760,6 +764,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
   memspace = predefined_alloc_mapping[data->allocator];
+  pinned = (data->allocator == ompx_pinned_mem_alloc);
 }
 
   MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
@@ -933,8 +938,10 @@ retry:
 	  memspace = (allocator_data
 		  ? allocator_data->memspace
 		  : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size,
- allocator_data && allocator_data->pinned);
+	  int pinned = (allocator_data
+			? allocator_data->pinned
+			: allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size, pinned);
 	}
   if (ptr == NULL)
 	goto fail;
@@ -955,7 +962,8 @@ retry:
 fail:;
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		 || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1165,11 +1173,14 @@ retry:
   else
 #endif
   if (prev_size)
-	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-data->size, new_size,
-(free_allocator_data
- && free_allocator_data->pinned),
-allocator_data->pinned);
+	{
+	  int was_pinned = (free_allocator_data
+			? free_allocator_data->pinned
+			: free_allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+  data->size, new_size, was_pinned,
+  allocator_data->pinned);
+	}
   else
 	new_ptr = MEMSPACE_ALLOC 

[PATCH v3 3/6] openmp: Add -foffload-memory

2023-12-11 Thread Andrew Stubbs

Add a new option.  It's inactive until I add some follow-up patches.

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5eb5ecff04b..a008827cfa2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2332,6 +2332,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index fe5b868fb4f..fb4bf37ba24 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -218,6 +218,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43341fe6e5e..f6a7459bda7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted
 -flax-vector-conversions  -fms-extensions
--foffload=@var{arg}  -foffload-options=@var{arg}
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
 -fopenacc  -fopenacc-dim=@var{geom}
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
@@ -2766,6 +2766,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @opindex fopenacc
 @cindex OpenACC accelerator programming
 @item -fopenacc


[PATCH v3 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-12-11 Thread Andrew Stubbs
This patch series is a rework of the v2 series I posted in August:

https://patchwork.sourceware.org/project/gcc/list/?series=23763&state=%2A&archive=both

This version addresses most of the review comments from Tobias, but
after discussion with Tobias and Thomas we've decided to skip the
nice-to-have proposed initialization improvement in the interest of
getting the job done, for now.

Otherwise, some bugs have been fixed and few other clean-ups have been
made, but the series retains the same purpose and structure.

This series no longer has any out-of-tree dependencies, now that the
low-latency allocator patch have been committed.

An older, less compact, version of these patches is already applied to
the devel/omp/gcc-13 (OG13) branch.

OK for mainline?

Andrew

Andrew Stubbs (5):
  libgomp: basic pinned memory on Linux
  libgomp, openmp: Add ompx_pinned_mem_alloc
  openmp: Add -foffload-memory
  openmp: -foffload-memory=pinned
  libgomp: fine-grained pinned memory allocator

Thomas Schwinge (1):
  libgomp, nvptx: Cuda pinned memory

 gcc/common.opt|  16 +
 gcc/coretypes.h   |   7 +
 gcc/doc/invoke.texi   |  16 +-
 gcc/omp-builtins.def  |   3 +
 gcc/omp-low.cc|  66 
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   7 +-
 libgomp/allocator.c   |  95 --
 libgomp/config/gcn/allocator.c|  21 +-
 libgomp/config/linux/allocator.c  | 243 +
 libgomp/config/nvptx/allocator.c  |  21 +-
 libgomp/libgomp-plugin.h  |   2 +
 libgomp/libgomp.h |  14 +
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp.texi  |  17 +-
 libgomp/libgomp_g.h   |   1 +
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/plugin/plugin-nvptx.c |  42 +++
 libgomp/target.c  | 136 
 .../libgomp.c-c++-common/alloc-pinned-1.c |  28 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  | 141 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  | 146 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  | 189 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  | 184 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 129 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 128 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c  | 127 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +
 libgomp/usmpin-allocator.c| 319 ++
 31 files changed, 2127 insertions(+), 56 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/usmpin-allocator.c

-- 
2.41.0



[PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-11 Thread Andrew Stubbs

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
(MEMSPACE_VALIDATE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c  |  65 +---
 libgomp/config/gcn/allocator.c   |  21 +--
 libgomp/config/linux/allocator.c | 111 +
 libgomp/config/nvptx/allocator.c |  21 +--
 libgomp/libgomp.texi |   3 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 115 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 120 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 156 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 150 ++
 9 files changed, 716 insertions(+), 46 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index a8a80f8028d..666adf9a3a9 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -38,27 +38,30 @@
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
+   The defaults (no override) are to return NULL for pinned memory requests
+   and pass through to the regular OS calls otherwise.
The following definitions (ab)use comma operators to avoid unused
variable errors.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
-  malloc (((void)(MEMSPACE), (SIZE)))
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (((void)(MEMSPACE), (SIZE))))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
-  calloc (1, (((void)(MEMSPACE), (SIZE))))
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, (((void)(MEMSPACE), (SIZE)))))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
-  realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE))))
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+   ((PIN) || (OLDPIN) ? NULL \
+   : realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE)))))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
-  free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  if (PIN) free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
 #endif
 #ifndef MEMSPACE_VALIDATE
-#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
-  (((void)(MEMSPACE), (void)(ACCESS), 1))
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS, PIN) \
+  (PIN ? 0 : ((void)(MEMSPACE), (void)(ACCESS), 1))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -439,12 +442,8 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 }
 #endif
 
-  /* No support for this so far.  */
-  if (data.pinned)
-return omp_null_allocator;
-
   /* Reject unsupported memory spaces.  */
-  if (!MEMSPACE_VALIDATE (data.memspace, data.access))
+  if (!MEMSPACE_VALIDATE (data.memspace, data.access, data.pinned))
 return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -586,7 +585,8 @@ retry:
 	}
   else
 #endif
-	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			  allocator_data->pinned);
   if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -623,7 

[pushed] c++: add fixed testcase [PR63378]

2023-12-11 Thread Patrick Palka
We accept this testcase since r12-4453-g79802c5dcc043a.

PR c++/63378

gcc/testsuite/ChangeLog:

* g++.dg/template/fnspec3.C: New test.
---
 gcc/testsuite/g++.dg/template/fnspec3.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/fnspec3.C

diff --git a/gcc/testsuite/g++.dg/template/fnspec3.C 
b/gcc/testsuite/g++.dg/template/fnspec3.C
new file mode 100644
index 000..c36cb17751d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/fnspec3.C
@@ -0,0 +1,20 @@
+// PR c++/63378
+// { dg-do compile { target c++11 } }
+
+template<typename T>
+struct B { };
+
+template<typename T>
+struct A {
+private:
+  template<typename U>
+  static B<U> g();
+
+public:
+  template<typename U>
+  auto f() -> decltype(g<U>());
+};
+
+template<>
+template<>
+auto A<int>::f<int>() -> B<int>;
-- 
2.43.0.76.g1a87c842ec



Re: Ping: [PATCH] Add a late-combine pass [PR106594]

2023-12-11 Thread Robin Dapp
Hi Richard,

I have tested the new pass on riscv64 and while it did exhibit some
regressions, none of them are critical.  Mostly, test expectations
will need to be adjusted - no new execution failures.

As mentioned in the initial discussion it does help us get the
behavior we want but, as of now, seems to propagate/combine a bit
more than I expected.  I suppose a bit of register-pressure tuning
will still be required in order to get the behavior we want.
It will also force us to properly set latencies/costs for the
register-file-crossing vector instructions.

All in all I would be very glad to see this get in :)

Regards
 Robin



Re: [RFC] RISC-V: Support RISC-V Profiles in -march option.

2023-12-11 Thread Jeff Law




On 11/20/23 12:14, Jiawei wrote:

Supports RISC-V profiles[1] in -march option.

By default, the profile is expected at the start of the input string, before other formal extensions.

[1]https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc (struct riscv_profiles):
   New struct.
 (riscv_subset_list::parse_profiles): New function.
 (riscv_subset_list::parse): New table.
 * config/riscv/riscv-subset.h: New prototype.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/arch-29.c: New test.
 * gcc.target/riscv/arch-30.c: New test.
 * gcc.target/riscv/arch-31.c: New test.

---
  gcc/common/config/riscv/riscv-common.cc  | 58 +++-
  gcc/config/riscv/riscv-subset.h  |  2 +
  gcc/testsuite/gcc.target/riscv/arch-29.c |  5 ++
  gcc/testsuite/gcc.target/riscv/arch-30.c |  5 ++
  gcc/testsuite/gcc.target/riscv/arch-31.c |  5 ++
  6 files changed, 81 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-29.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-30.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-31.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 5111626157b..30617e619b1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -165,6 +165,12 @@ struct riscv_ext_version
int minor_version;
  };
  
+struct riscv_profiles

+{
+  const char * profile_name;
+  const char * profile_string;
+};

Just a formatting nit, no space between the '*' and the field name.


@@ -348,6 +354,28 @@ static const struct riscv_ext_version riscv_combine_info[] 
=
{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
  };
  
+static const riscv_profiles riscv_profiles_table[] =

+{
+  {"RVI20U64", "rv64i"},
+  {"RVI20U32", "rv32i"},
+  /*Currently we don't have zicntr,ziccif,ziccrse,ziccamoa,
+zicclsm,za128rs yet.  */
Is it actually useful to note the extensions not included?  I don't 
think the profiles are supposed to change once ratified.



+  {"RVA22U64", "rv64imafdc_zicsr_zihintpause_zba_zbb_zbs_" 
\
Note the trailing "_", was that intentional?  None of the other entries 
have a trailing "_".




@@ -927,6 +955,31 @@ riscv_subset_list::parsing_subset_version (const char *ext,
return p;
  }
  
+const char *
+riscv_subset_list::parse_profiles (const char * p){
+  for (int i = 0; riscv_profiles_table[i].profile_name != NULL; ++i) {
+const char* match = strstr(p, riscv_profiles_table[i].profile_name);
+const char* plus_ext = strchr(p, '+');
+/* Find profile at the begin.  */
+if (match != NULL && match == p) {
+  /* If there's no '+' sign, return the profile_string directly.  */
+  if(!plus_ext)
+   return riscv_profiles_table[i].profile_string;
+  /* If there's a '+' sign, concatenate profiles with other ext.  */
+  else {
+   size_t arch_len = strlen(riscv_profiles_table[i].profile_string) +
+   strlen(plus_ext);
+   static char* result = new char[arch_len + 2];
+   strcpy(result, riscv_profiles_table[i].profile_string);
+   strcat(result, "_");
+   strcat(result, plus_ext + 1); /* skip the '+'.  */
+   return result;
+  }
+}
+  }
+  return p;
+}

This needs a function comment.

The open curly should always be on a line by itself which is going to 
require reindenting all this code.  Comments go on separate lines rather 
than appending them to an existing line.



I think the consensus in the Tuesday patchwork meeting was that while 
there are concerns about profiles, those concerns shouldn't prevent this 
patch from going forward.  So if you could fix the formatting problems as 
well as the trailing "_" issue noted above and repost, it would be 
appreciated.


Thanks,

Jeff


Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-11 Thread Vincenzo Palazzo
Hi all,

> Are you backing down from that opinion and deciding that this proposal
> is, indeed, after all specific to NixOS and only NixOS and is neither
> needed nor used on any other distro?

I may be misreading the conversation, so let's restart it.

Why should my RFC live inside a distro's repository? What makes this a
distro build package, rather than a developer configuration for setting up
a development environment?

Cheers,

   Vincent.

On Tue, Dec 5, 2023 at 1:43 PM Eli Schwartz  wrote:
>
> On 12/5/23 5:35 AM, Vincenzo Palazzo wrote:
> >>> I see, but to me, this do not look like a distro build procedure,
> >>> because you can use
> >>> with any kind of system (OSX/UNIX) by using nix.
> >>
> >> But you can do the same with various other distro build procedures too?
> >> e.g. Gentoo Prefix allows you to install a full-blown Gentoo anywhere
> >> you like, "by using portage".
> >
> > With a single difference: on Gentoo you are allowed to put stuff in the
> > global path and use it from a terminal like `$ pacman -Sy foo`. On NixOS
> > you do not, because the development environment is used for that.
> >
> > So all the dependencies that gcc requires to build cannot be installed
> > in the NixOS global path (e.g. libc), so on NixOS you must define the
> > development environment, otherwise you cannot do the build. On all the
> > other systems that you mention, you can.
>
>
> And yet, it seems your original point was that this doesn't qualify as a
> "distro build procedure" because nix isn't specific to NixOS.
>
> Are you backing down from that opinion and deciding that this proposal
> is, indeed, after all specific to NixOS and only NixOS and is neither
> needed nor used on any other distro?
>
>
> > Please note that the flake.nix does not define how to build gcc, but
> > just what dependencies gcc requires in order to contribute to the
> > compiler. In other words, if you run the flake.nix on NixOS or any
> > other system, you do not have gcc installed on your system; that is
> > the job of the distro.
>
>
> Its lack of completeness is surely an issue, but not the issue at hand
> *here*. Why do you think the lack of completeness is a supporting
> argument, rather than an opposing argument?
>
>
> --
> Eli Schwartz
>


Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Maciej W. Rozycki
On Mon, 11 Dec 2023, Jeff Law wrote:

> > This happened with the late-combine pass that I posted in October:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
> > which in turn triggered an error from aarch64_print_operand_address.
> > 
> > This patch takes the (hopefully) conservative fix of using VOIDmode for
> > asms but continuing to use the operand mode for .md insns, so as not
> > to break ports that still use reload.
> Sadly I didn't get as far as I would have liked in removing reload, though we
did get a handful of ports converted this cycle.

 The VAX port isn't ready for LRA yet, as not only does LRA produce 
notably worse RISC-like code, ignoring all the architecture's address mode 
features (unenthusiastically acceptable), but it also causes testsuite 
regressions of the ICE kind (unacceptable).

 I did hope to at least start work on it in this release cycle, but there 
has been this outstanding issue of broken exception unwinding, which makes 
C++ unusable except for odd cases such as with GCC itself where 
exceptions are not used.  This unwinder issue obviously has to take 
precedence as it cripples the usability of code produced by the compiler 
even for developer's use, e.g. native VAX/GDB is mostly broken and even 
VAX/gdbserver quits with a crash.

 I can try and see if I can find some time over the festive period to 
move the VAX port forward in either respect.

  Maciej


Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-12-11 Thread Jeff Law




On 11/20/23 16:54, David Malcolm wrote:

On Mon, 2023-11-20 at 16:38 -0700, Jeff Law wrote:



On 11/20/23 15:46, David Malcolm wrote:

On Fri, 2023-11-17 at 14:09 -0700, Jeff Law wrote:



On 11/17/23 14:08, Antoni Boucher wrote:

In contrast with the other frontends, libgccjit can be executed
multiple times in a row in the same process.

Yup.  I'm aware of that.  Even so calling init_emit_once more
than
one
time still seems wrong.


There are two approaches we follow when dealing with state stored
in
global variables:
(a) clean it all up via the various functions called from
toplev::finalize
(b) make it effectively constant once initialized, with idempotent
initialization

The multiple in-process executions of libgccjit could pass in
different
code-generation options.  Does the RTL-initialization logic depend
anywhere on flags passed in, because if so, we're probably going to
need to re-run the initialization.

The INIT_EXPANDERS code would be the most concerning as it's
implementation is totally hidden and provided by the target. I
wouldn't
be at all surprised if one or more do something evil in there.  That
probably needs to be evaluated on a target by target basis.

The rest really do look like single init, even in a JIT environment
kinds of things -- ie all the shared constants in RTL.


I think Antoni's patch can be described as implementing "single init",
in that it ensures that at least part of init_emit_once is single init.

Is the posted patch OK by you, or do we need to rework things, and if
the latter, what would be the goal?
What I'm struggling with is perhaps a problem of naming.  Conceptually 
"init_emit_once" in my mind should be called once and only once.  If I 
read Antoni's change correctly, we call it more than once.  That just 
feels conceptually wrong -- add to it the opaqueness of INIT_EXPANDERS 
and it feels even more wrong -- we don't know what's going on behind the 
scenes in there.


jeff


Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Tobias Burnus

On 11.12.23 15:31, Thomas Schwinge wrote:

On 2023-12-08T17:44:14+0100, Tobias Burnus  wrote:

On 08.12.23 15:09, Thomas Schwinge wrote:

On 22/11/2023 17:07, Tobias Burnus wrote:

Let's start with the patch itself:

--- a/libgomp/target.c
+++ b/libgomp/target.c
...
+static struct gomp_device_descr *
+get_device_for_page_locked (void)
+{
+ gomp_debug (0, "%s\n",
+ __FUNCTION__);
+
+ struct gomp_device_descr *device;
+#ifdef HAVE_SYNC_BUILTINS
+ device
+   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
+ if (device == (void *) -1)
+   {
+ gomp_debug (0, " init\n");
+
+ gomp_init_targets_once ();
+
+ device = NULL;
+ for (int i = 0; i < num_devices; ++i)

Given that this function just sets a single variable based on whether the
page_locked_host_alloc_func function pointer exists, wouldn't it be much
simpler to just do all this handling in   gomp_target_init  ?

@Thomas, care to comment on this?

  From what I remember, we cannot assume that 'gomp_target_init' has
already been done when we get here; therefore 'gomp_init_targets_once' is
being called here.  We may get to 'get_device_for_page_locked' via
host-side OpenMP, in code that doesn't contain any OpenMP 'target'
offloading things.  Therefore, this was (a) necessary to make that work,
and (b) did seem to be a useful abstraction to me.

I am not questioning the "gomp_init_targets_once ();" but I am wondering
whether only 'gomp_init_targets_once()' should remain, without the
locking + loading dance - and then just set that single variable inside
gomp_target_init.

Ah, I see, thanks.


If you reach here w/o target set up, the "gomp_init_targets_once ();"
would ensure it gets initialized with all the other code inside
gomp_target_init.

And if gomp_target_init() was called before, gomp_init_targets_once()
will just return without doing anything and you are also fine.

Yes, I suppose we could do it that way.  'get_device_for_page_locked'
could then, after 'gomp_init_targets_once', unconditionally return
'device_for_page_locked' (even without '__atomic_load', right?).

Yes, that was my idea.

A disadvantage is that the setup of 'device_for_page_locked' (in
'gomp_target_init') and use of it (in 'get_device_for_page_locked') is
then split apart.  I guess I don't have a strong opinion on that one.
;-)


But pro is that it avoids the #ifdef HAVE_SYNC_BUILTINS, avoiding a "-1"
initialization of using_device_for_page_locked, atomic loads all over
the place etc.

Thus, I prefer this option – but I also don't have a strong opinion,
either.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] untyped calls: enable target switching [PR112334]

2023-12-11 Thread Jeff Law




On 12/1/23 08:10, Alexandre Oliva wrote:

On Dec  1, 2023, Alexandre Oliva  wrote:


Also tested on arm-eabi, but it's *not* enough (or needed) to fix the
PR, there's another bug lurking there, with a separate patch coming
up.


Here it is.



The computation of apply_args_size and apply_result_size is saved in a
static variable, so that the corresponding _mode arrays are
initialized only once.  That is not compatible with switchable
targets, and ARM's arm_set_current_function, by saving and restoring
target globals, exercises this problem with a testcase such as that in
the PR, in which more than one function in the translation unit calls
__builtin_apply or __builtin_return, respectively.

This patch moves the _size statics into the target_builtins array,
with a bit of ugliness over _plus_one so that zero initialization of
the struct does the right thing.

Regstrapped on x86_64-linux-gnu, tested on arm-eabi with and without the
upthread patch.  It fixes the hardcfr fails either way.  As for the
ugliness, there's a follow up patch below that attempts to alleviate it
a little (also regstrapped and tested), but I'm not sure we want to go
down that path.  WDYT?

It's a wart, but doesn't seem too bad to me.




for  gcc/ChangeLog

PR target/112334
* builtins.h (target_builtins): Add fields for apply_args_size
and apply_result_size.
* builtins.cc (apply_args_size, apply_result_size): Cache
results in fields rather than in static variables.
(get_apply_args_size, set_apply_args_size): New.
(get_apply_result_size, set_apply_result_size): New.

OK.




untyped calls: use wrapper class type for implicit plus_one

Instead of get and set macros to apply a delta, use a single macro
that resorts to a temporary wrapper class to apply it.

To be combined (or not) with the previous patch.

I'd be OK with this as well.

jeff



Re: [PATCH] aarch64: Fix wrong code for bfloat when f16 is enabled [PR 111867]

2023-12-11 Thread Richard Sandiford
Andrew Pinski  writes:
> The problem here is when f16 is enabled, movbf_aarch64 accepts `Ufc`
> as a constraint:
>  [ w, Ufc ; fconsts , fp16  ] fmov\t%h0, %1
> But that is for fmov values and in this case fmov represents f16 rather than 
> bfloat16 values.
> This means we would get the wrong value in the register.
>
> Built and tested for aarch64-linux-gnu with no regressions.  Also tested with 
> `-march=armv9-a+sve2,
> gcc.dg/torture/bfloat16-basic.c and gcc.dg/torture/bfloat16-builtin.c no 
> longer fail.
>
> gcc/ChangeLog:
>
>   PR target/111867
>   * config/aarch64/aarch64.cc (aarch64_float_const_representable_p): For 
> BFmode,
>   only accept +0.0.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5cffdabc62e..d48f5a1ba4b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23904,6 +23904,7 @@ aarch64_float_const_representable_p (rtx x)
>  
>r = *CONST_DOUBLE_REAL_VALUE (x);
>  
> +
>/* We cannot represent infinities, NaNs or +/-zero.  We won't
>   know if we have +zero until we analyse the mantissa, but we
>   can reject the other invalid values.  */

Seems like a stray change.

OK without that, thanks.

Richard

> @@ -23911,6 +23912,10 @@ aarch64_float_const_representable_p (rtx x)
>|| REAL_VALUE_MINUS_ZERO (r))
>  return false;
>  
> +  /* For BFmode, only handle 0.0. */
> +  if (GET_MODE (x) == BFmode)
> +return real_iszero (, false);
> +
>/* Extract exponent.  */
>r = real_value_abs ();
>exponent = REAL_EXP ();


Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-11 Thread Jeff Law




On 12/11/23 02:26, Jiufu Guo wrote:


Hi,

Thanks for your quick reply!

Jeff Law  writes:


On 12/10/23 20:07, Jiufu Guo wrote:


I'm having a bit of a hard time convincing myself this is correct
though.  I can't see how rewriting the load to read the source of the
prior store is unsafe.  If that fixes a problem, then it would seem
like we've gone wrong before here -- perhaps failing to use the fusage
loads to "kill" any available stores to the same or aliased memory
locations.

As you said the later one, call's fusage would killing the previous
store. It is a kind of case like:

 134: [argp:SI+0x8]=r134:SI
 135: [argp:SI+0x4]=0x1
 136: [argp:SI]=r132:SI
 137: ax:SI=call [`memset'] argc:0xc
 REG_CALL_DECL `memset'
 REG_EH_REGION 0

This call insn is:
(call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
   (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  ) [0 __builtin_memset S1 A8])
   (const_int 12 [0xc]))) "pr102798.c":23:22 1086 {*sibcall_value}
(expr_list:REG_UNUSED (reg:SI 0 ax)
   (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
)
   (expr_list:REG_EH_REGION (const_int 0 [0])
   (nil
   (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 
[0x4])) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) 
(const_int 8 [0x8])) [0  S4 A32]))
   (nil)

The stores in "insns 134-136" are used by the call. "check_mem_read_rtx"
would prevent them to eliminated.

Right.  But unless I read something wrong, the patch wasn't changing
store removal, it was changing whether or not we forwarded the source
of the store into the destination of a subsequent load from the same
address.

"check_mem_read_rtx" has another behavior which checks the mem
and adds read_info to insn_info->read_rec. "read_rec" could prevent
the "store" from being eliminated during the dse's global alg. This
patch leverages this behavior.
And to avoid the "mem on fusage" to be replaced by leading store's rhs
"replace_read" was disabled if the mem is on the call's fusage.

Ah, so not only do we want to avoid the call to replace_read, but also avoid 
the early return.

By avoiding the early return, we proceed into later code which "kills"
the tracked store, thus avoiding the problem.  Right?

It is similar, I would say.  There is "leading code" as below:
   /* Look at all of the uses in the insn.  */
   note_uses ( (insn), check_mem_read_use, bb_info);

This checks possible loads in the "insn" and "kills" the tracked
stores if needed.
But "note_uses" does not check the fusage of the call insn.
So, this patch runs "check_mem_read" for the "use mem"s
on fusage as well.
OK for the trunk.  Please double-check that older BZ, and if that issue 
is fixed as well, add it to the commit log.  Thanks for walking me 
through the details.


jeff


Re: [PATCH] wrong code on m68k with -mlong-jump-table-offsets and -malign-int (PR target/112413)

2023-12-11 Thread Jeff Law




On 12/11/23 05:51, Mikael Pettersson wrote:

On m68k the compiler assumes that the PC-relative jump-via-jump-table
instruction and the jump table are adjacent with no padding in between.

When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte
nop may be inserted before the jump table, causing the jump to add the
fetched offset to the wrong PC base and thus jump to the wrong address.

Fixed by referencing the jump table via its label. On the test case
in the PR the object code change is (the moveal at 16 is the nop):

 a:  6536bcss 42 
 c:  e588lsll #2,%d0
 e:  203b 0808   movel %pc@(18 ,%d0:l),%d0
-  12:  4efb 0802   jmp %pc@(16 ,%d0:l)
+  12:  4efb 0804   jmp %pc@(18 ,%d0:l)
16:  284cmoveal %a4,%a4
18:   0020   orib #32,%d0
1c:   002c   orib #44,%d0

Bootstrapped and tested on m68k-linux-gnu, no regressions.

Note: I don't have commit rights, so I would need assistance applying this.

2023-12-11  Mikael Pettersson  

PR target/112413
* config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For
TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table
via its label.
* config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise.
* config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.

Thanks.  Installed.

jeff


Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Jeff Law




On 11/27/23 05:12, Richard Sandiford wrote:

check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA.  Before RA it tested the address with a
void (unknown) memory mode:

case CT_ADDRESS:
  /* Every address operand can be reloaded to fit.  */
  result = result || address_operand (op, VOIDmode);
  break;

After RA it deferred to constrain_operands, which used the mode
of the operand:

if ((GET_MODE (op) == VOIDmode
 || SCALAR_INT_MODE_P (GET_MODE (op)))
&& (strict <= 0
|| (strict_memory_address_p
 (recog_data.operand_mode[opno], op))))
  win = true;

Using the mode of the operand matches reload's behaviour:

   else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

It allowed the special predicate address_operand to be used, with the
mode of the operand being the mode of the addressed memory, rather than
the mode of the address itself.  For example, vax has:

(define_insn "*movaddr"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(match_operand:VAXfp 1 "address_operand" "p"))
(clobber (reg:CC VAX_PSL_REGNUM))]
   "reload_completed"
   "mova %a1,%0")

where operand 1 is an SImode expression that can address memory of
mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
but recog_data.operand_mode[1] is the VAXfp mode.

But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
do this, and instead pass VOIDmode.  So I think this traditional use
of address_operand is effectively an old-reload-only feature.

And it seems like no modern port cares.  I think ports have generally
moved to using different address constraints instead, rather than
relying on "p" with different operand modes.  Target-specific address
constraints post-date the code above.

The big advantage of using different constraints is that it works
for asms too.  And that (to finally get to the point) is the problem
fixed in this patch.  For the aarch64 test:

   void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }

everything up to and including RA required the operand to be a
valid VOIDmode address.  But post-RA check_asm_operands and
constrain_operands instead required it to be valid for
recog_data.operand_mode[0].  Since asms have no syntax for
specifying an operand mode that's separate from the operand itself,
operand_mode[0] is simply Pmode (i.e. DImode).

This meant that we required one mode before RA and a different mode
after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
more conservative/restricted range than DImode.  So if a post-RA pass
tried to form a new address, it would use a laxer condition than the
pre-RA passes.
This was initially a bit counter-intuitive, my first reaction was that a 
wildcard mode is more general.  And that's true, but it necessarily 
means the addresses accepted are more restrictive because any mode is 
allowed.




This happened with the late-combine pass that I posted in October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
which in turn triggered an error from aarch64_print_operand_address.

This patch takes the (hopefully) conservative fix of using VOIDmode for
asms but continuing to use the operand mode for .md insns, so as not
to break ports that still use reload.
Sadly I didn't get as far as I would have liked in removing reload, 
though we did get a handful of ports converted this cycle.




Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
It all seems a bit hackish.  I don't think ports have had much success 
using 'p' through the decades.  I think I generally ended up having to 
go with distinct constraints rather than relying on 'p'.


OK for the trunk, but ewww.

jeff


Ping: [PATCH] Add a late-combine pass [PR106594]

2023-12-11 Thread Richard Sandiford
Ping

---

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) -1
 (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
(subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
(reg:SI 118)) -1
 (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
(sqrt:DF (mem/c:DF (symbol_ref:DI ("d2") {*sqrtdf2_sse}
 (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0]) repeated x4
])) -1
 (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
(vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI 
("d2")
(subreg:V2DF (reg:V4SF 108) 0)
(const_int 1 [0x1]))) -1
 (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
(subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
 (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

Richard


gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc: Enable it by default
at -O2 and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   5 +
 

Ping: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Richard Sandiford
Ping

---

check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA.  Before RA it tested the address with a
void (unknown) memory mode:

case CT_ADDRESS:
  /* Every address operand can be reloaded to fit.  */
  result = result || address_operand (op, VOIDmode);
  break;

After RA it deferred to constrain_operands, which used the mode
of the operand:

if ((GET_MODE (op) == VOIDmode
 || SCALAR_INT_MODE_P (GET_MODE (op)))
&& (strict <= 0
|| (strict_memory_address_p
 (recog_data.operand_mode[opno], op))))
  win = true;

Using the mode of the operand matches reload's behaviour:

  else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

It allowed the special predicate address_operand to be used, with the
mode of the operand being the mode of the addressed memory, rather than
the mode of the address itself.  For example, vax has:

(define_insn "*movaddr"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(match_operand:VAXfp 1 "address_operand" "p"))
   (clobber (reg:CC VAX_PSL_REGNUM))]
  "reload_completed"
  "mova %a1,%0")

where operand 1 is an SImode expression that can address memory of
mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
but recog_data.operand_mode[1] is the VAXfp mode.

But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
do this, and instead pass VOIDmode.  So I think this traditional use
of address_operand is effectively an old-reload-only feature.

And it seems like no modern port cares.  I think ports have generally
moved to using different address constraints instead, rather than
relying on "p" with different operand modes.  Target-specific address
constraints post-date the code above.

The big advantage of using different constraints is that it works
for asms too.  And that (to finally get to the point) is the problem
fixed in this patch.  For the aarch64 test:

  void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }

everything up to and including RA required the operand to be a
valid VOIDmode address.  But post-RA check_asm_operands and
constrain_operands instead required it to be valid for
recog_data.operand_mode[0].  Since asms have no syntax for
specifying an operand mode that's separate from the operand itself,
operand_mode[0] is simply Pmode (i.e. DImode).

This meant that we required one mode before RA and a different mode
after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
more conservative/restricted range than DImode.  So if a post-RA pass
tried to form a new address, it would use a laxer condition than the
pre-RA passes.

This happened with the late-combine pass that I posted in October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
which in turn triggered an error from aarch64_print_operand_address.

This patch takes the (hopefully) conservative fix of using VOIDmode for
asms but continuing to use the operand mode for .md insns, so as not
to break ports that still use reload.

Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
---
 gcc/recog.cc   | 18 +++---
 gcc/rtl-ssa/changes.cc |  4 +++-
 .../gcc.target/aarch64/prfm_imm_offset_2.c |  2 ++
 3 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_2.c

diff --git a/gcc/recog.cc b/gcc/recog.cc
index eaab79c25d7..bff7be1aec1 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3191,13 +3191,17 @@ constrain_operands (int strict, alternative_mask 
alternatives)
   strictly valid, i.e., that all pseudos requiring hard regs
   have gotten them.  We also want to make sure we have a
   valid mode.  */
-   if ((GET_MODE (op) == VOIDmode
-|| SCALAR_INT_MODE_P (GET_MODE (op)))
-   && (strict <= 0
-   || (strict_memory_address_p
-(recog_data.operand_mode[opno], op))))
- win = true;
-  

Re: [pushed] configure, libquadmath: Remove unintended AC_CHECK_LIBM [PR111928]

2023-12-11 Thread Jakub Jelinek
On Mon, Oct 23, 2023 at 02:18:39PM +0100, Iain Sandoe wrote:
> This is a partial reversion of r14-4825-g6a6d3817afa02b to remove an
> unintended change.
> 
> Tested with x86_64-linux X arm-none-eabi (and x86_64-darwin X arm-none-eabi)
> and native x86_64-darwin bootstrap.  Also reported by the OP to fix the
> issue, pushed to trunk, apologies for the breakage,
> Iain
> 
> --- 8< ---
> 
> This was a rebase error, that managed to pass testing on Darwin and
> Linux (but fails on bare metal).
> 
>   PR libquadmath/111928
> 
> libquadmath/ChangeLog:
> 
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Remove AC_CHECK_LIBM.

I'm afraid this change is very harmful on Linux.
libquadmath.so.0 had on Linux since forever
 0x0001 (NEEDED) Shared library: [libm.so.6]
entry (and it should have it, because it has undefined relocations against
libm.so.6 entrypoints: at least signgam and sqrt, on powerpc64le also
__sqrtieee128).
But with this change it no longer has.
This e.g. breaks libtool build on powerpc64le, where the dynamic linker
crashes during sqrt related IFUNC resolution.

Jakub



[PATCH v4] aarch64: SVE/NEON Bridging intrinsics

2023-12-11 Thread Richard Ball
ACLE has added intrinsics to bridge between SVE and Neon.

The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.

This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq

gcc/ChangeLog:

* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc 
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines 
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines 
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definition.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_): Likewise.
* config/aarch64/aarch64.cc 
(aarch64_init_builtins): Add call to init_neon_sve_builtins.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include of the
arm_neon_sve_bridge header file.
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.

Re: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
...oh, and keep the current approach if riscv-autovec-lmul=dynamic.
Makes perfect sense - thanks!

On Mon, Dec 11, 2023 at 3:01 PM 钟居哲  wrote:

> I think we should leave it to user choice.
>
> --param=riscv-autovec-lmul=m1/m2/m4/m8/dynamic.
>
> So use TARGET_MAX_LMUL should be more reasonable.
>
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* Sergei Lewis 
> *Date:* 2023-12-11 22:58
> *To:* juzhe.zh...@rivai.ai
> *CC:* gcc-patches ; Robin Dapp
> ; Kito.cheng ; jeffreyalaw
> 
> *Subject:* Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
> The thinking here is that using the largest possible LMUL when we know the
> operation will fit in fewer registers potentially leaves performance on the
> table - indirectly, due to the unnecessarily increased register pressure,
> and also directly, depending on the implementation.
>
> On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai <
> juzhe.zh...@rivai.ai> wrote:
>
>> Hi, Thanks for contributing this.
>>
>> +/* Select appropriate LMUL for a single vector operation based on
>> +   byte size of data to be processed.
>> +   On success, return true and populate lmul_out.
>> +   If length_in is too wide for a single vector operation, return false
>> +   and leave lmul_out unchanged.  */
>> +
>> +static bool
>> +select_appropriate_lmul (HOST_WIDE_INT length_in,
>> +HOST_WIDE_INT &lmul_out)
>> +{
>>
>> I don't think we need this, you only need to use TARGET_MAX_LMUL
>>
>>
>> --
>> juzhe.zh...@rivai.ai
>>
>


Re: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread 钟居哲
I think we should leave it to user choice.

--param=riscv-autovec-lmul=m1/m2/m4/m8/dynamic.

So use TARGET_MAX_LMUL should be more reasonable.



juzhe.zh...@rivai.ai
 
From: Sergei Lewis
Date: 2023-12-11 22:58
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Robin Dapp; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
The thinking here is that using the largest possible LMUL when we know the 
operation will fit in fewer registers potentially leaves performance on the 
table - indirectly, due to the unnecessarily increased register pressure, and 
also directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai  
wrote:
Hi, Thanks for contributing this.

+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+HOST_WIDE_INT &lmul_out)
+{
I don't think we need this, you only need to use TARGET_MAX_LMUL




juzhe.zh...@rivai.ai


Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
The thinking here is that using the largest possible LMUL when we know the
operation will fit in fewer registers potentially leaves performance on the
table - indirectly, due to the unnecessarily increased register pressure,
and also directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai 
wrote:

> Hi, Thanks for contributing this.
>
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +HOST_WIDE_INT &lmul_out)
> +{
>
> I don't think we need this, you only need to use TARGET_MAX_LMUL
>
>
> --
> juzhe.zh...@rivai.ai
>


Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Thomas Schwinge
Hi!

On 2023-12-08T17:44:14+0100, Tobias Burnus  wrote:
> On 08.12.23 15:09, Thomas Schwinge wrote:
>>> On 22/11/2023 17:07, Tobias Burnus wrote:
 Let's start with the patch itself:
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> ...
> +static struct gomp_device_descr *
> +get_device_for_page_locked (void)
> +{
> + gomp_debug (0, "%s\n",
> + __FUNCTION__);
> +
> + struct gomp_device_descr *device;
> +#ifdef HAVE_SYNC_BUILTINS
> + device
> +   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
> + if (device == (void *) -1)
> +   {
> + gomp_debug (0, " init\n");
> +
> + gomp_init_targets_once ();
> +
> + device = NULL;
> + for (int i = 0; i < num_devices; ++i)
 Given that this function just sets a single variable based on whether the
 page_locked_host_alloc_func function pointer exists, wouldn't it be much
 simpler to just do all this handling in   gomp_target_init  ?
>>> @Thomas, care to comment on this?
>>  From what I remember, we cannot assume that 'gomp_target_init' has
>> already been done when we get here; therefore 'gomp_init_targets_once' is
>> being called here.  We may get to 'get_device_for_page_locked' via
>> host-side OpenMP, in code that doesn't contain any OpenMP 'target'
>> offloading things.  Therefore, this was (a) necessary to make that work,
>> and (b) did seem to be a useful abstraction to me.
>
> I am not questioning the "gomp_init_targets_once ();" but I am wondering
> whether only 'gomp_init_targets_once()' should remain without the
> locking + loading dance - and then just set that single variable inside
> gomp_target_init.

Ah, I see, thanks.

> If you reach here w/o target set up, the "gomp_init_targets_once ();"
> would ensure it gets initialized with all the other code inside
> gomp_target_init.
>
> And if gomp_target_init() was called before, gomp_init_targets_once()
> will just return without doing anything and your are also fine.

Yes, I suppose we could do it that way.  'get_device_for_page_locked'
could then, after 'gomp_init_targets_once', unconditionally return
'device_for_page_locked' (even without '__atomic_load', right?).
A disadvantage is that the setup of 'device_for_page_locked' (in
'gomp_target_init') and use of it (in 'get_device_for_page_locked') is
then split apart.  I guess I don't have a strong opinion on that one.
;-)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] Add myself to write after approval

2023-12-11 Thread Paul Iannetta
Hi,

I would like to add myself to write after approval; is it ok for
master?

Thanks,
Paul Iannetta

---8<---

ChangeLog:

* MAINTAINERS: Add myself to write after approval

Signed-off-by: Paul Iannetta 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dbcbadcfd7..971a33873bb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -471,6 +471,7 @@ Dominique d'Humieres

 Andy Hutchinson
 Joel Hutton
 Lewis Hyatt
+Paul Iannetta  
 Roland Illig   
 Meador Inge
 Bernardo Innocenti 
-- 
2.35.1.500.gb896f729e2







Re: [PATCH 1/2] analyzer: Remove check of unsigned_char in maybe_undo_optimize_bit_field_compare.

2023-12-11 Thread David Malcolm
On Mon, 2023-12-11 at 09:04 +0100, Richard Biener wrote:
> On Sun, Dec 10, 2023 at 8:57 PM Andrew Pinski
>  wrote:
> > 
> > From: Andrew Pinski 
> > 
> > The check for the type seems unnecessary and gets in the way
> > sometimes.
> > Also with a patch I am working on for match.pd, it causes a failure
> > to happen.
> > Before my patch the IR was:
> >   _1 = BIT_FIELD_REF ;
> >   _2 = _1 & 1;
> >   _3 = _2 != 0;
> >   _4 = (int) _3;
> >   __analyzer_eval (_4);
> > 
> > Where _2 was an unsigned char type.
> > And After my patch we have:
> >   _1 = BIT_FIELD_REF ;
> >   _2 = (int) _1;
> >   _3 = _2 & 1;
> >   __analyzer_eval (_3);
> > 
> > But in this case, the BIT_AND_EXPR is in an int type.
> > 
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no
> > regressions.

Yes...

> 
> OK (hope it's OK that I approve this).

...and yes.

Dave



Re: [PATCH V3 4/4] OpenMP: Permit additional selector properties

2023-12-11 Thread Tobias Burnus

This patch LGTM.

Likewise 'LGTM' for patches 1/4 and 2/4, in line with my previous
comments. (Those are unchanged from the previous round.)

Thanks for the patches!

I still have to look at 3/4, which is large and did see some changes
between v2 and v3. (Overall they seem to be really nice!)

Tobias

On 07.12.23 16:52, Sandra Loosemore wrote:

This patch adds "hpe" to the known properties for the "vendor" selector,
and support for "acquire" and "release" for "atomic_default_mem_order".

gcc/ChangeLog
  * omp-general.cc (vendor_properties): Add "hpe".
  (atomic_default_mem_order_properties): Add "acquire" and "release".
  (omp_context_selector_matches): Handle "acquire" and "release".

gcc/testsuite/ChangeLog
  * c-c++-common/gomp/declare-variant-2.c: Don't expect error on
  "acquire" and "release".
  * gfortran.dg/gomp/declare-variant-2a.f90: Likewise.
---
  gcc/omp-general.cc| 10 --
  gcc/testsuite/c-c++-common/gomp/declare-variant-2.c   |  4 ++--
  gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90 |  4 ++--
  3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 5f0cb041ffa..4f7c83fbd2c 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1126,12 +1126,12 @@ const char *omp_tss_map[] =
  static const char *const kind_properties[] =
{ "host", "nohost", "cpu", "gpu", "fpga", "any", NULL };
  static const char *const vendor_properties[] =
-  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "ibm", "intel",
+  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "hpe", "ibm", "intel",
  "llvm", "nvidia", "pgi", "ti", "unknown", NULL };
  static const char *const extension_properties[] =
{ NULL };
  static const char *const atomic_default_mem_order_properties[] =
-  { "seq_cst", "relaxed", "acq_rel", NULL };
+  { "seq_cst", "relaxed", "acq_rel", "acquire", "release", NULL };

  struct omp_ts_info omp_ts_map[] =
{
@@ -1551,6 +1551,12 @@ omp_context_selector_matches (tree ctx)
else if (!strcmp (prop, "acq_rel")
 && omo != OMP_MEMORY_ORDER_ACQ_REL)
  return 0;
+   else if (!strcmp (prop, "acquire")
+&& omo != OMP_MEMORY_ORDER_ACQUIRE)
+ return 0;
+   else if (!strcmp (prop, "release")
+&& omo != OMP_MEMORY_ORDER_RELEASE)
+ return 0;
  }
break;
  case OMP_TRAIT_DEVICE_ARCH:
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c 
b/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
index 97285fa3b74..bc3f443379f 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
@@ -105,9 +105,9 @@ void f50 (void);  /* { 
dg-error "expected '\\\}' before '\\(' token" "" {
  void f51 (void);/* { dg-error 
"expected '\\\}' before '\\(' token" "" { target c } .-1 } */
  #pragma omp declare variant (f1) match(implementation={atomic_default_mem_order})   /* 
{ dg-error "expected '\\(' before '\\\}' token" } */
  void f52 (void);
-#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})   /* { dg-error "incorrect 
property 'acquire' of 'atomic_default_mem_order' selector" } */
+#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})
  void f53 (void);
-#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(release)})   /* { dg-error "incorrect 
property 'release' of 'atomic_default_mem_order' selector" } */
+#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(release)})
  void f54 (void);
  #pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(foobar)})   /* { dg-error "incorrect 
property 'foobar' of 'atomic_default_mem_order' selector" } */
  void f55 (void);
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90 
b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
index 56de1177789..edc9b27f884 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
@@ -29,10 +29,10 @@ contains
  !$omp declare variant (f1) match(implementation={vendor("foobar")}) ! { dg-warning 
"unknown property '.foobar.' of 'vendor' selector" }
end subroutine
subroutine f53 ()
-!$omp declare variant (f1) match(implementation={atomic_default_mem_order(acquire)}) 
 ! { dg-error "incorrect property 'acquire' of 'atomic_default_mem_order' 
selector" }
+!$omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})
end subroutine
subroutine f54 ()
-!$omp declare variant (f1) match(implementation={atomic_default_mem_order(release)}) 
 ! { dg-error 

Re: [PATCH v2] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2023 at 02:35:52PM +0100, Marc Poulhiès wrote:
> On non-hosted targets, cstdlib may not be sufficient to have abort
> defined, but it should be for std::abort.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/excess-precision-12.C: call std::abort instead of 
> abort.
> ---
> Changed from calling __builtin_abort to std::abort, as advised.
> 
> Ok for master?

Ok.

Jakub



[PATCH v2] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Marc Poulhiès
On non-hosted targets, cstdlib may not be sufficient to have abort
defined, but it should be for std::abort.

gcc/testsuite/ChangeLog:

* g++.target/i386/excess-precision-12.C: call std::abort instead of 
abort.
---
Changed from calling __builtin_abort to std::abort, as advised.

Ok for master?

Thanks,
Marc

 gcc/testsuite/g++.target/i386/excess-precision-12.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.target/i386/excess-precision-12.C 
b/gcc/testsuite/g++.target/i386/excess-precision-12.C
index dff48c07c8b..7cfd15d6136 100644
--- a/gcc/testsuite/g++.target/i386/excess-precision-12.C
+++ b/gcc/testsuite/g++.target/i386/excess-precision-12.C
@@ -13,8 +13,8 @@ main (void)
   unsigned long long int u = (1ULL << 63) + 1;
 
   if ((f <=> u) >= 0)
-abort ();
+std::abort ();
 
   if ((u <=> f) <= 0)
-abort ();
+std::abort ();
 }
-- 
2.43.0



Re: [PATCH] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Marc Poulhiès
Hello,

> Why wouldn't they have abort and what else does __builtin_abort () expand
> to?

It expands to abort but works around the "abort is undeclared" error.

> There are 2000+ other tests in gcc.target/i386/ which call abort (),
> not __builtin_abort (), after including  directly or indirectly
> or declaring it themselves.  This test in particular includes 
>
> Does whatever target you are running this into provide just std::abort ()
> and not abort (); from ?  If so, perhaps it should call
> std::abort (); instead of abort ().

You are correct, std::abort() is a better solution. cstdlib does not
include stdlib.h because I'm on a non-hosted target. I'll send a
refreshed patch.

Thanks,
Marc

