Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Toru Kisuki via Gcc-patches
Hi Jeff,


Thank you for taking care of it.


Toru


From: Jeff Law 
Sent: Monday, June 19, 2023 7:55 PM
To: Richard Biener; Toru Kisuki
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with 
-fsignaling-nans



On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:
> On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>>
>> With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify 'x 
>> + 0.0' to 'x'.
>>
>
> OK if you bootstrapped / tested this change.
I suspect Toru doesn't have write access, so I went ahead and did an
x86 bootstrap & regression test, which passed.  The ChangeLog entry
needed fleshing out a bit, and I fixed a minor whitespace problem in the
patch itself.

Pushed to the trunk.
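
For readers following along, here is a minimal sketch of why the fold is not value-preserving once signaling NaNs matter. It assumes IEEE 754 binary64 doubles and a target where moving a double between memory and registers does not quiet a NaN (e.g. x86-64 SSE or AArch64). The helper names make_snan, is_quiet_bit_set and add_zero_raises_invalid are made up for this illustration and are not part of the patch. It only shows the runtime semantics of the addition; it does not exercise the compiler's folding logic.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>
#include <cstring>

// Build a binary64 signaling NaN from its bit pattern: exponent all ones,
// quiet bit (bit 51) clear, non-zero payload.  (Hypothetical helper.)
inline double make_snan() {
    const std::uint64_t bits = 0x7FF0000000000001ULL;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// True if the quiet bit (bit 51 of the significand field) is set.
inline bool is_quiet_bit_set(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return ((bits >> 51) & 1u) != 0;
}

// Evaluate x + 0.0 at run time and report whether FE_INVALID was raised.
// The volatile objects keep the addition from being folded or moved across
// the calls that clear and test the exception flags.
inline bool add_zero_raises_invalid(double x, double *out) {
    volatile double in = x;
    volatile double zero = 0.0;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double sum = in + zero;
    const bool raised = std::fetestexcept(FE_INVALID) != 0;
    *out = sum;
    return raised;
}
```

With a signaling NaN as input, the addition raises the invalid exception and yields a quiet NaN, so replacing "x + 0.0" by "x" would lose both effects; with an ordinary value such as 1.5 nothing is raised and the value is unchanged.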


jeff


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 22:52, Tamar Christina wrote:


It's a bit hackish, but could we reject the stack pointer for operand1 in the
stack-tie?  And if we do so, does it help?


Yeah, this one I had to defer until later this week to look at more closely,
because what I'm wondering about is whether the optimization should apply to
frame-related RTX as well.

Looking at the description of RTX_FRAME_RELATED_P, it seems this optimization
may end up de-optimizing RISC targets by creating an offset that is larger than
the offset which can be used from SP, forcing reload to spill, i.e. sometimes
the move was explicitly done.  So perhaps it should not apply to
RTX_FRAME_RELATED_P insns in find_oldest_value_reg and
copyprop_hardreg_forward_1?

Other parts of this pass already seem to bail out in similar situations.  So I
needed to write some testcases to check what would happen in these cases, hence
the deferral to later in the week.

Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
better in general to me.  The cases where we're looking to clean things 
up aren't really in the prologue/epilogue, but instead in the main body 
after register elimination has turned fp into sp + offset, thus making 
all kinds of things no longer valid.


jeff


RE: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jeff Law 
> Sent: Tuesday, June 20, 2023 3:17 AM
> To: Andrew Pinski ; Thiago Jung Bauermann
> 
> Cc: Manolis Tsamis ; Philipp Tomsich
> ; Richard Biener ;
> Palmer Dabbelt ; Kito Cheng ;
> gcc-patches@gcc.gnu.org; Tamar Christina 
> Subject: Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack
> pointer if possible.
> 
> 
> 
> On 6/19/23 17:48, Andrew Pinski wrote:
> > On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski 
> wrote:
> >>
> >> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> >>  wrote:
> >>>
> >>>
> >>> Hello Manolis,
> >>>
> >>> Philipp Tomsich  writes:
> >>>
>  On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >
> > On 5/25/23 06:35, Manolis Tsamis wrote:
> >> Propagation of the stack pointer in cprop_hardreg is currently
> >> forbidden in all cases, due to maybe_mode_change returning NULL.
> >> Relax this restriction and allow propagation when no mode change is
> requested.
> >>
> >> gcc/ChangeLog:
> >>
> >>   * regcprop.cc (maybe_mode_change): Enable stack pointer
> propagation.
> > Thanks for the clarification.  This is OK for the trunk.  It looks
> > generic enough to have value going forward now rather than waiting.
> 
>  Rebased, retested, and applied to trunk.  Thanks!
> >>>
> >>> Our CI found a couple of tests that started failing on aarch64-linux
> >>> after this commit. I was able to confirm manually that they don't
> >>> happen in the commit immediately before this one, and also that
> >>> these failures are still present in today's trunk.
> >>>
> >>> I have testsuite logs for last good commit, first bad commit and
> >>> current trunk here:
> >>>
> >>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >>>
> >>> Could you please check?
> >>>
> >>> These are the new failures:
> >>>
> >>> Running gcc:gcc.target/aarch64/aarch64.exp ...
> >>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
> >>
> >> So for the above before this change we had:
> >> ```
> >> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65
> {*movdi_aarch64}
> >>   (nil))
> >> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (expr_list:REG_DEAD (reg:DI 11 x11)
> >>  (nil)))
> >> ```
> >>
> >> After we get:
> >> ```
> >> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 31 sp [11]) repeated x2
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (nil))
> >> ```
> >> Which seems to be ok, except we still have:
> >> .cfi_def_cfa_register 11
> >>
> >> That is because on:
> >> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> >>  (plus:DI (reg:DI 12 x12)
> >>  (const_int 272 [0x110])))
> >> "stack-check-prologue-16.c":16:1
> >> 153 {*adddi3_aarch64}
> >>   (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> >>  (nil)))
> >>
> >> We record x11 but never update it though that came before the mov for
> >> x11 ... So it seems like cprop_hardreg had no idea it needed to
> >> update it.
> >>
> >> I suspect the other testcases are just propagation of sp into the
> >> stores and such and just need updating. But the above testcase seems to be
> >> getting broken CFI, though I don't know how to fix it.

Yeah, we noticed the failures internally but left them broken since we have an
upcoming AArch64 patch which requires them to be updated anyway and are
rolling up the updates into that patch. 

> >
> > The code from aarch64.cc:
> > ```
> >    /* This is done to provide unwinding information for the stack
> >       adjustments we're about to do, however to prevent the optimizers
> >       from removing the R11 move and leaving the CFA note (which would be
> >       very wrong) we tie the old and new stack pointer together.
> >       The tie will expand to nothing but the optimizers will not touch
> >       the instruction.  */
> >    rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
> >    emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
> >    emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
> >
> >    /* We want the CFA independent of the stack pointer for the
> >       duration of the loop.  */
> >    add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
> >    RTX_FRAME_RELATED_P (insn) = 1;
> > ```
> >
> > Well except now with this change, the optimizers touch this
> > instruction. Maybe the move instruction should not be a move but an
> > unspec so optimizers don't know what 

[PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-19 Thread Di Zhao OS via Gcc-patches
This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There
are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
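
As a rough illustration of what a wider reassociation width means at the source level, the sketch below contrasts one serial chain of fused multiply-adds with four independent chains that are combined at the end. The function names dot_width1 and dot_width4 are made up for this example, and the code is not what the reassoc pass emits; it only shows the shape of the transformation. Regrouping floating-point additions can change rounding in general, so the two functions are only guaranteed to agree when every intermediate value is exactly representable, as in small integer-valued inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Width 1: each fma depends on the previous one, forming one long chain.
// Precondition: a.size() == b.size().
inline double dot_width1(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = std::fma(a[i], b[i], acc);
    return acc;
}

// Width 4: four independent accumulators that can be in flight at once,
// summed pairwise after the loop.  Precondition: a.size() == b.size().
inline double dot_width4(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            acc[k] = std::fma(a[i + k], b[i + k], acc[k]);
    for (; i < n; ++i)  // leftover elements go into the first chain
        acc[0] = std::fma(a[i], b[i], acc[0]);
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

The trade-off is a shorter dependency chain per accumulator against extra additions at the end, which is consistent with a net gain that still leaves a few benchmarks slightly slower.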

Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d16565b5581..301c9f6c0cd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
   "32:12", /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
-  1,   /* fma_reassoc_width.  */
+  4,   /* fma_reassoc_width.  */
   2,   /* vec_reassoc_width.  */
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
-- 
2.25.1




[PATCH] RISC-V: Optimize codegen of VLA SLP

2023-06-19 Thread Juzhe-Zhong
Recently, I figured out a better approach to codegen for VLA stepped vectors.

Here is a detailed description:

Case 1:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 37] + 1;
      a[i * 8 + 1] = b[i * 8 + 37] + 2;
      a[i * 8 + 2] = b[i * 8 + 37] + 3;
      a[i * 8 + 3] = b[i * 8 + 37] + 4;
      a[i * 8 + 4] = b[i * 8 + 37] + 5;
      a[i * 8 + 5] = b[i * 8 + 37] + 6;
      a[i * 8 + 6] = b[i * 8 + 37] + 7;
      a[i * 8 + 7] = b[i * 8 + 37] + 8;
    }
}

We need to generate the stepped vector:
NPATTERNS = 8.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }

Before this patch:
vid.v    v4         ;; {0,1,2,3,4,5,6,7,...}
vsrl.vi  v4,v4,3    ;; {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
li       a3,8       ;; {8}
vmul.vx  v4,v4,a3   ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

After this patch:
vid.v    v4                    ;; {0,1,2,3,4,5,6,7,...}
vand.vi  v4,v4,-8(-NPATTERNS)  ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}
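
The equivalence used in Case 1 can be checked with a small scalar model. This is only a sketch: the vector instructions are modelled as loops over a std::vector of 8-bit elements, the names vid, stepped_shift_mul and stepped_and are invented for the illustration, and NPATTERNS is assumed to be a power of two (passed as its base-2 logarithm).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::uint8_t>;

// Models vid.v: element i holds the index i (truncated to 8 bits).
inline Vec vid(std::size_t n) {
    Vec v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<std::uint8_t>(i);
    return v;
}

// Old sequence: vsrl.vi by log2(NPATTERNS), then vmul.vx by NPATTERNS.
inline Vec stepped_shift_mul(std::size_t n, unsigned log2_npatterns) {
    Vec v = vid(n);
    const unsigned npatterns = 1u << log2_npatterns;
    for (auto& e : v)
        e = static_cast<std::uint8_t>((e >> log2_npatterns) * npatterns);
    return v;
}

// New sequence: a single vand.vi with the immediate -NPATTERNS, which
// clears the low log2(NPATTERNS) bits of every index.
inline Vec stepped_and(std::size_t n, unsigned log2_npatterns) {
    Vec v = vid(n);
    const int npatterns = 1 << log2_npatterns;
    for (auto& e : v)
        e = static_cast<std::uint8_t>(e & -npatterns);
    return v;
}
```

Shifting right and multiplying back by a power of two rounds each index down to a multiple of NPATTERNS, which is the same as masking off the low bits, so both functions return identical vectors.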

Case 2:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 2] + 2;
      a[i * 8 + 2] = b[i * 8 + 1] + 3;
      a[i * 8 + 3] = b[i * 8 + 0] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 6] + 6;
      a[i * 8 + 6] = b[i * 8 + 5] + 7;
      a[i * 8 + 7] = b[i * 8 + 4] + 8;
    }
} 

We need to generate the stepped vector:
NPATTERNS = 4.
{ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

Before this patch:
li       a6,134221824
slli     a6,a6,5
addi     a6,a6,3    ;; 64-bit: 0x000300020001
vmv.v.x  v6,a6      ;; {3, 2, 1, 0, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vsrl.vi  v4,v4,2    ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
li       a3,4       ;; {4}
vmul.vx  v4,v4,a3   ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
vadd.vv  v4,v4,v6   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

After this patch:
li       a3,-536875008
slli     a3,a3,4
addi     a3,a3,1
slli     a3,a3,16
vmv.v.x  v2,a3      ;; {3, 1, -1, -3, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vadd.vv  v4,v4,v2   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }
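
The Case 2 rewrite relies on the fact that adding a repeating per-lane offset (pattern[j] - j) to vid gives the same result as adding the pattern to the rounded-down index. A scalar model of both sequences, assuming a non-empty pattern, is sketched below; stepped_via_base, pattern_offsets and stepped_via_offsets are names invented for this illustration, and plain ints stand in for the vector elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IndexVec = std::vector<int>;

// Old sequence: broadcast the pattern, compute (vid >> log2(NP)) * NP,
// then add the two vectors.
inline IndexVec stepped_via_base(std::size_t n, const IndexVec& pattern) {
    const std::size_t np = pattern.size();
    IndexVec out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<int>((i / np) * np) + pattern[i % np];
    return out;
}

// Per-lane offsets that the new sequence broadcasts: pattern[j] - j.
inline IndexVec pattern_offsets(const IndexVec& pattern) {
    IndexVec diff(pattern.size());
    for (std::size_t j = 0; j < pattern.size(); ++j)
        diff[j] = pattern[j] - static_cast<int>(j);
    return diff;
}

// New sequence: vid plus the broadcast offsets, with no shift or multiply.
inline IndexVec stepped_via_offsets(std::size_t n, const IndexVec& pattern) {
    const IndexVec diff = pattern_offsets(pattern);
    IndexVec out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<int>(i) + diff[i % diff.size()];
    return out;
}
```

For the pattern {3, 2, 1, 0} the offsets come out as {3, 1, -1, -3}, matching the comment on the vmv.v.x line, and both functions produce {3, 2, 1, 0, 7, 6, 5, 4, ...}.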

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 78 ---
 .../riscv/rvv/autovec/partial/slp-1.c |  2 +
 .../riscv/rvv/autovec/partial/slp-16.c| 24 ++
 .../riscv/rvv/autovec/partial/slp_run-16.c| 66 
 4 files changed, 125 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-16.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 79c0337327d..aa143c864d6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1128,7 +1128,7 @@ expand_const_vector (rtx target, rtx src)
builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
 }
   builder.finalize ();
-  
+
   if (CONST_VECTOR_DUPLICATE_P (src))
 {
   /* Handle the case with repeating sequence that NELTS_PER_PATTERN = 1
@@ -1204,61 +1204,49 @@ expand_const_vector (rtx target, rtx src)
   if (builder.single_step_npatterns_p ())
{
  /* Describe the case by choosing NPATTERNS = 4 as an example.  */
- rtx base, step;
+ insn_code icode;
+
+ /* Step 1: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+ rtx vid = gen_reg_rtx (builder.mode ());
+ rtx vid_ops[] = {vid};
+ icode = code_for_pred_series (builder.mode ());
+ emit_vlmax_insn (icode, RVV_MISC_OP, vid_ops);
+
  if (builder.npatterns_all_equal_p ())
{
  /* Generate the variable-length vector following this rule:
 { a, a, a + step, a + step, a + step * 2, a + step * 2, ...}
   E.g. { 0, 0, 8, 8, 16, 16, ... } */
- /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
- base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+ /* Step 2: VID AND -NPATTERNS:
+{ 0&-4, 1&-4, 2&-4, 3 &-4, 4 &-4, 5 &-4, 6 &-4, 7 &-4, ... }
+ */
+ rtx imm
+   = gen_int_mode (-builder.npatterns (), builder.inner_mode ());
+ rtx and_ops[] = {target, vid, imm};
+ icode = code_for_pred_scalar (AND, builder.mode ());
+ emit_vlmax_insn (icode, RVV_BINOP, and_ops);
}
  else
{
  /* Generate the 

Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-06-19 Thread HAO CHEN GUI via Gcc-patches
HP,
  It makes sense. I will update the patch.

Thanks
Gui Haochen

On 2023/6/20 8:07, Hans-Peter Nilsson wrote:
> On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:
> 
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
>> (set_attr "dot" "yes")
>> (set_attr "length" "4,4,8")])
>>
>> +(define_peephole2
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "")
>> +(compare:CC (match_operand:P 1 "int_reg_operand" "")
>> +(const_int 0)))
>> +   (set (match_operand:P 0 "int_reg_operand" "")
> 
> A random comment from the sideline: I'd suggest to remove the 
> (empty) constraints string from your peephole2's.
> 
> It can be a matter of port-specific-taste but it seems removing 
> them would be consistent with the other peephole2's in 
> rs6000.md.
> 
> (In this matter, I believe the examples in md.texi are bad.)
> 
> brgds, H-P


Re: [PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/6/20 02:54, Carl Love wrote:
> 
> Kewen, GCC maintainers:
> 
> Version 6, Fixed missing change log entry.  Changed builtin id names as
> requested.  Missed making the change on the last version.  Fixed
> comment in the three test cases.  Reran regression suite on Power 10,
> no regressions.
> 
> Version 5, Tested the patch on P9 BE per request.  Fixed up test case
> to get the correct expected values for BE and LE.  Fixed typos. 
> Updated the doc/extend.texi to clarify the vector arguments.  Changed
> test file names per request.  Moved builtin defs next to related
> definitions.  Renamed new mode_attr. Removed new mode_iterator, used
> existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
> Fixed up overloaded definitions per request.
> 
> Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
> cases to rs6000_expand_builtin.  Merged the new define_insn definitions
> with the existing definitions.  Renamed the builtins by removing the
> __builtin_ prefix from the names.  Fixed the documentation for the
> builtins.  Updated the test files to check the desired instructions
> were generated.  Retested patch on Power 10 with no regressions.
> 
> Version 3, was able to get the overloaded version of scalar_insert_exp
> to work and the change to xsxexpqp_f128_ define instruction to
> work with the suggestions from Kewen.  
> 
> Version 2, I have addressed the various comments from Kewen.  I had
> issues with adding an additional overloaded version of
> scalar_insert_exp with vector arguments.  The overload infrastructure
> didn't work with a mix of scalar and vector arguments.  I did rename
> the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
> it similar to the existing builtin.  I also wasn't able to get the
> suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
> I left the two simpler definitions.
> 
> The patch adds three new builtins to extract the significand and
> exponent of an IEEE float 128-bit value where the builtin argument is a
> vector.  Additionally, a builtin to insert the exponent into an IEEE
> float 128-bit vector argument is added.  These builtins were requested
> since there is no clean and optimal way to transfer between a vector
> and a scalar IEEE 128 bit value.
> 
> The patch has been tested on Power 9 BE and Power 10 LE with no
> regressions.  Please let me know if the patch is acceptable or not. 
> Thanks.

OK for trunk with some nits fixed in changelog (sorry that I didn't catch
all of them in previous review, but I don't think you need to post
a new version).  Thanks!

> 
>Carl
> 
> 
> rs6000: Add builtins for IEEE 128-bit floating point values
> 
> Add support for the following builtins:
> 
>  __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
>  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
>  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> __vector unsigned long long);
> 
> The instructions used in the builtins operate on vector registers.  Thus
> the result must be moved to a scalar type.  There is no clean, performant
> way to do this.  The user code typically needs the result as a vector
> anyway.
> 
> gcc/
>   * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
>   Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
>   Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.


Missing:
"Rename CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti."
"Rename CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti."
"Rename CODE_FOR_xsiexpqp_tf to CODE_FOR_xsiexpqp_tf_di."
"Rename CODE_FOR_xsiexpqp_kf to CODE_FOR_xsiexpqp_kf_di."

>   (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
>   CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
>   * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
>__builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>   builtin definitions.

Should be with correct names:

(__builtin_vsx_scalar_extract_exp_to_vec,
__builtin_vsx_scalar_extract_sig_to_vec,
__builtin_vsx_scalar_insert_exp_vqp):

>   Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsexpqp_kf_di,
>   xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
>   * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
>   Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
>   overloaded instance. Update comments.
>   * config/rs6000/rs6000-overload.def
>   (__builtin_vec_scalar_insert_exp): Add new overload definition with
>   vector arguments.
>   (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
>   overloaded definitions.
>   * config/vsx.md (V2DI_DI): New mode iterator.
>   (DI_to_TI): New mode attribute.
>   Rename xsxexpqp_ to sxexpqp__.
>   Rename xsxsigqp_ to xsxsigqp__.
>   

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 17:48, Andrew Pinski wrote:

On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski  wrote:


On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
 wrote:



Hello Manolis,

Philipp Tomsich  writes:


On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:


On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

  * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

Thanks for the clarification.  This is OK for the trunk.  It looks
generic enough to have value going forward now rather than waiting.


Rebased, retested, and applied to trunk.  Thanks!


Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1


So for the above before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
 (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
  (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
 (unspec:BLK [
 (reg:DI 11 x11)
 (reg/f:DI 31 sp)
 ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
  (expr_list:REG_DEAD (reg:DI 11 x11)
 (nil)))
```

After we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
 (unspec:BLK [
 (reg:DI 31 sp [11]) repeated x2
 ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
  (nil))
```
Which seems to be ok, except we still have:
.cfi_def_cfa_register 11

That is because on:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
 (plus:DI (reg:DI 12 x12)
 (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
153 {*adddi3_aarch64}
  (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
 (nil)))

We record x11 but never update it though that came before the mov for
x11 ... So it seems like cprop_hardreg had no idea it needed to update
it.

I suspect the other testcases are just propagation of sp into the
stores and such and just need updating. But the above testcase seems to be
getting broken CFI, though I don't know how to fix it.


The code from aarch64.cc:
```
   /* This is done to provide unwinding information for the stack
  adjustments we're about to do, however to prevent the optimizers
  from removing the R11 move and leaving the CFA note (which would be
  very wrong) we tie the old and new stack pointer together.
  The tie will expand to nothing but the optimizers will not touch
  the instruction.  */
   rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
   emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
   emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

   /* We want the CFA independent of the stack pointer for the
  duration of the loop.  */
   add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
   RTX_FRAME_RELATED_P (insn) = 1;
```

Well except now with this change, the optimizers touch this
instruction. Maybe the move instruction should not be a move but an
unspec so optimizers don't know what the move was.
Adding Tamar to the CC who added this code to aarch64 originally for
comments on the above understanding here.
It's a bit hackish, but could we reject the stack pointer for operand1 
in the stack-tie?  And if we do so, does it help?


jeff


RE: Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of ???
Sent: Tuesday, June 20, 2023 7:15 AM
To: Jeff Law ; gcc-patches 
Cc: kito.cheng ; palmer ; rdapp.gcc 

Subject: Re: Re: [PATCH] RISC-V: Fix fails of testcases

>> Presumably the target selector in the dg-do ensures we only build/run
>> these on the appropriate targets now and we don't need explicitly -march
>> arguments?
Yes. 

>> Assuming that's correct, this is fine for the trunk.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 07:13
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix fails of testcases
 
 
On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
> -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
> excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for 
> '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: 
> Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicitly -march 
arguments?
 
Assuming that's correct, this is fine for the trunk.
 
jeff
 


RE: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Tuesday, June 20, 2023 2:04 AM
To: 钟居哲 ; 丁乐华 ; gcc-patches 

Cc: Wang, Yanzhang ; kito.cheng 
; palmer ; rdapp.gcc 

Subject: Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify 
code



On 6/18/23 07:16, 钟居哲 wrote:
> Thanks for cleaning up the code for the future ABI support patch.
> Let's wait for comments from Jeff or Robin.
Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff


Re: [committed] libstdc++: Optimize std::to_array for trivial types [PR110167]

2023-06-19 Thread Patrick Palka via Gcc-patches
On Fri, 9 Jun 2023, Jonathan Wakely via Libstdc++ wrote:

> Tested powerpc64le-linux. Pushed to trunk.
> 
> This makes sense to backport after some soak time on trunk.
> 
> -- >8 --
> 
> As reported in PR libstdc++/110167, std::to_array compiles extremely
> slowly for very large arrays. It needs to instantiate a very large
> specialization of std::index_sequence and then create a very large
> aggregate initializer from the pack expansion. For trivial types we can
> simply default-initialize the std::array and then use memcpy to copy the
> values. For non-trivial types we need to use the existing
> implementation, despite the compilation cost.
> 
> As also noted in the PR, using a generic lambda instead of the
> __to_array helper compiles faster since gcc-13. It also produces
> slightly smaller code at -O1, due to additional inlining. The code at
> -Os, -O2 and -O3 seems to be the same. This new implementation requires
> __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
> since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/110167
>   * include/std/array (to_array): Initialize arrays of trivial
>   types using memcpy. For non-trivial types, use lambda
>   expressions instead of a separate helper function.
>   (__to_array): Remove.
>   * testsuite/23_containers/array/creation/110167.cc: New test.
> ---
>  libstdc++-v3/include/std/array| 53 +--
>  .../23_containers/array/creation/110167.cc| 14 +
>  2 files changed, 51 insertions(+), 16 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> 
> diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
> index 70280c1beeb..b791d86ddb2 100644
> --- a/libstdc++-v3/include/std/array
> +++ b/libstdc++-v3/include/std/array
> @@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return std::move(std::get<_Int>(__arr));
>  }
>  
> -#if __cplusplus > 201703L
> +#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L
>  #define __cpp_lib_to_array 201907L
> -
> -  template
> -constexpr array, sizeof...(_Idx)>
> -__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>)
> -{
> -  if constexpr (_Move)
> - return {{std::move(__a[_Idx])...}};
> -  else
> - return {{__a[_Idx]...}};
> -}
> -
>template
>  [[nodiscard]]
>  constexpr array, _Nm>
> @@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(!is_array_v<_Tp>);
>static_assert(is_constructible_v<_Tp, _Tp&>);
>if constexpr (is_constructible_v<_Tp, _Tp&>)
> - return __to_array(__a, make_index_sequence<_Nm>{});
> -  __builtin_unreachable(); // FIXME: see PR c++/91388
> + {
> +   if constexpr (is_trivial_v<_Tp> && _Nm != 0)

redundant _Nm != 0 test?

> + {
> +   array, _Nm> __arr;
> +   if (!__is_constant_evaluated() && _Nm != 0)
> + __builtin_memcpy(__arr.data(), __a, sizeof(__a));
> +   else
> + for (size_t __i = 0; __i < _Nm; ++__i)
> +   __arr._M_elems[__i] = __a[__i];
> +   return __arr;
> + }
> +   else
> + return [&__a](index_sequence<_Idx...>) {
> +   return array, _Nm>{{ __a[_Idx]... }};
> + }(make_index_sequence<_Nm>{});
> + }
> +  else
> + __builtin_unreachable(); // FIXME: see PR c++/91388
>  }
>  
>template
> @@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(!is_array_v<_Tp>);
>static_assert(is_move_constructible_v<_Tp>);
>if constexpr (is_move_constructible_v<_Tp>)
> - return __to_array<1>(__a, make_index_sequence<_Nm>{});
> -  __builtin_unreachable(); // FIXME: see PR c++/91388
> + {
> +   if constexpr (is_trivial_v<_Tp>)
> + {
> +   array, _Nm> __arr;
> +   if (!__is_constant_evaluated() && _Nm != 0)
> + __builtin_memcpy(__arr.data(), __a, sizeof(__a));
> +   else
> + for (size_t __i = 0; __i < _Nm; ++__i)
> +   __arr._M_elems[__i] = std::move(__a[__i]);

IIUC this std::move is unnecessary for trivial arrays?

> +   return __arr;
> + }
> +   else
> + return [&__a](index_sequence<_Idx...>) {
> +   return array, _Nm>{{ std::move(__a[_Idx])... }};
> + }(make_index_sequence<_Nm>{});
> + }
> +  else
> + __builtin_unreachable(); // FIXME: see PR c++/91388
>  }
>  #endif // C++20
>  
> diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc 
> b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> new file mode 100644
> index 000..c2aecc911bd
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> @@ -0,0 +1,14 @@
> +// { dg-options "-std=gnu++20" }
> +// { dg-do compile { target c++20 } }
> +
> +// PR 
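
To make the trivial-type fast path discussed above concrete, here is a simplified stand-alone sketch. It is not the libstdc++ implementation: to_array_sketch and to_array_expand are hypothetical names, the functions are deliberately not constexpr (so no __is_constant_evaluated branch is needed), and only the lvalue overload is shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Fallback for non-trivial element types: build the aggregate initializer
// from a pack expansion over an index sequence.
template <typename T, std::size_t N, std::size_t... I>
std::array<std::remove_cv_t<T>, N>
to_array_expand(T (&a)[N], std::index_sequence<I...>) {
    return {{a[I]...}};
}

// Hypothetical stand-in for the lvalue overload of std::to_array.
template <typename T, std::size_t N>
std::array<std::remove_cv_t<T>, N> to_array_sketch(T (&a)[N]) {
    static_assert(!std::is_array_v<T>, "nested arrays are not supported");
    static_assert(std::is_constructible_v<T, T&>, "elements must be copyable");
    if constexpr (std::is_trivial_v<T>) {
        // Trivial elements: default-initialize, then copy the bytes.
        // No index_sequence of length N is instantiated on this path.
        std::array<std::remove_cv_t<T>, N> arr;
        std::memcpy(arr.data(), a, sizeof(a));
        return arr;
    } else {
        return to_array_expand(a, std::make_index_sequence<N>{});
    }
}
```

For a trivial element type, moving an element normally selects the same trivial copy, so an rvalue overload could share the memcpy branch unchanged; that is the point behind the std::move question above.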

Re: [PATCH v6 0/4] P1689R5 support

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/17/23 10:43, Ben Boeckel wrote:

On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote:

I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g.

FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors)
Excess errors:
/home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal
compiler error: Segmentation fault
0x19e2f3c crash_signal
 /home/jason/gt/gcc/toplev.cc:314
0x340f3f8 mkdeps::vec::size() const
 /home/jason/gt/libcpp/mkdeps.cc:57
0x340dc1f apply_vpath
 /home/jason/gt/libcpp/mkdeps.cc:194
0x340e08e deps_add_dep(mkdeps*, char const*)
 /home/jason/gt/libcpp/mkdeps.cc:318
0xea7b51 module_client::open_module_client(unsigned int, char const*,
mkdeps*, void (*)(char const*), char const*)
 /home/jason/gt/gcc/cp/mapper-client.cc:291
0xef2ba8 make_mapper
 /home/jason/gt/gcc/cp/module.cc:14042
0xf0896c get_mapper(unsigned int, mkdeps*)
 /home/jason/gt/gcc/cp/module.cc:3977
0xf032ac name_pending_imports
 /home/jason/gt/gcc/cp/module.cc:19623
0xf03a7d preprocessed_module(cpp_reader*)
 /home/jason/gt/gcc/cp/module.cc:19817
0xe85104 module_token_cdtor(cpp_reader*, unsigned long)
 /home/jason/gt/gcc/cp/lex.cc:548
0xf467b2 cp_lexer_new_main
 /home/jason/gt/gcc/cp/parser.cc:756
0xfc1e3a c_parse_file()
 /home/jason/gt/gcc/cp/parser.cc:49725
0x11c5bf5 c_common_parse_file()
 /home/jason/gt/gcc/c-family/c-opts.cc:1268


Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I
think I got misled by `make check` returning a zero exit code even if
there are failures.


Aha!

Patches 3 and 4 could also use testcases.

Jason



Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-06-19 Thread Hans-Peter Nilsson
On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:

> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
> (set_attr "dot" "yes")
> (set_attr "length" "4,4,8")])
> 
> +(define_peephole2
> +  [(set (match_operand:CC 2 "cc_reg_operand" "")
> + (compare:CC (match_operand:P 1 "int_reg_operand" "")
> + (const_int 0)))
> +   (set (match_operand:P 0 "int_reg_operand" "")

A random comment from the sideline: I'd suggest to remove the 
(empty) constraints string from your peephole2's.

It can be a matter of port-specific-taste but it seems removing 
them would be consistent with the other peephole2's in 
rs6000.md.

(In this matter, I believe the examples in md.texi are bad.)

brgds, H-P


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski  wrote:
>
> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
>  wrote:
> >
> >
> > Hello Manolis,
> >
> > Philipp Tomsich  writes:
> >
> > > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> > >>
> > >> On 5/25/23 06:35, Manolis Tsamis wrote:
> > > >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > >> > restriction and allow propagation when no mode change is requested.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> > >> > propagation.
> > >> Thanks for the clarification.  This is OK for the trunk.  It looks
> > >> generic enough to have value going forward now rather than waiting.
> > >
> > > Rebased, retested, and applied to trunk.  Thanks!
> >
> > Our CI found a couple of tests that started failing on aarch64-linux
> > after this commit. I was able to confirm manually that they don't happen
> > in the commit immediately before this one, and also that these failures
> > are still present in today's trunk.
> >
> > I have testsuite logs for last good commit, first bad commit and current
> > trunk here:
> >
> > https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >
> > Could you please check?
> >
> > These are the new failures:
> >
> > Running gcc:gcc.target/aarch64/aarch64.exp ...
> > FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times 
> > mov\\tx11, sp 1
>
> So for the above before this change we had:
> ```
> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>  (nil))
> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> (unspec:BLK [
> (reg:DI 11 x11)
> (reg/f:DI 31 sp)
> ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>  (expr_list:REG_DEAD (reg:DI 11 x11)
> (nil)))
> ```
>
> After we get:
> ```
> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> (unspec:BLK [
> (reg:DI 31 sp [11]) repeated x2
> ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>  (nil))
> ```
> Which seems to be ok, except we still have:
> .cfi_def_cfa_register 11
>
> That is because on:
> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> (plus:DI (reg:DI 12 x12)
> (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
> 153 {*adddi3_aarch64}
>  (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> (nil)))
>
> We record x11 there but never update it, even though that insn comes
> before the mov for x11. So it seems cprop_hardreg had no idea it needed
> to update the note.
>
> I suspect the other testcases just show sp being propagated into the
> stores and so on, and only need their expected output updated. But the
> above testcase seems to end up with broken CFI, and I don't know how to
> fix it.

The code from aarch64.cc:
```
  /* This is done to provide unwinding information for the stack
 adjustments we're about to do, however to prevent the optimizers
 from removing the R11 move and leaving the CFA note (which would be
 very wrong) we tie the old and new stack pointer together.
 The tie will expand to nothing but the optimizers will not touch
 the instruction.  */
  rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
  emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
  emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

  /* We want the CFA independent of the stack pointer for the
 duration of the loop.  */
  add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
  RTX_FRAME_RELATED_P (insn) = 1;
```

Well, except that with this change the optimizers do touch this
instruction. Maybe the move should not be a plain move but an unspec,
so that the optimizers don't know what it does.
Adding Tamar, who originally added this code to aarch64, to the CC for
comments on my understanding above.
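
The failure mode discussed above can be sketched with a toy model. This is not GCC's RTL or the regcprop pass: `Insn`, `propagate_copies` and `dangling_cfa_notes` are hypothetical names, and the propagation is deliberately naive (it ignores redefinitions). It only shows how rewriting uses and deleting a copy can leave a note naming a register that nothing sets any more.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction, NOT GCC's RTL.
struct Insn {
  std::string def;                // register written ("" if none)
  std::string src;                // register copied from ("" if not a plain copy)
  std::vector<std::string> uses;  // registers read
  std::string cfa_note;           // register named by a CFA note ("" if none)
  bool deleted;                   // true once the pass has removed the insn
};

// Naive forward copy propagation: for every plain copy, rewrite later uses
// of the destination to the source and delete the copy. Notes are ignored,
// which is the point of the illustration.
void propagate_copies(std::vector<Insn> &insns) {
  for (std::size_t i = 0; i < insns.size(); ++i) {
    Insn &mv = insns[i];
    if (mv.src.empty())
      continue;
    for (std::size_t j = i + 1; j < insns.size(); ++j)
      for (std::string &u : insns[j].uses)
        if (u == mv.def)
          u = mv.src;
    mv.deleted = true;
  }
}

// Return the registers named in CFA notes that no surviving insn defines.
std::vector<std::string> dangling_cfa_notes(const std::vector<Insn> &insns) {
  std::vector<std::string> out;
  for (const Insn &n : insns) {
    if (n.deleted || n.cfa_note.empty())
      continue;
    bool defined = false;
    for (const Insn &d : insns)
      if (!d.deleted && d.def == n.cfa_note)
        defined = true;
    if (!defined)
      out.push_back(n.cfa_note);
  }
  return out;
}
```

Modelled on the three insns quoted in the thread (the add carrying the note for x11, the mov of sp into x11, and the stack tie), the pass leaves the tie reading sp twice while the note still names x11.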

Thanks,
Andrew


>
> Thanks,
> Andrew Pinski
>
>
> >
> > Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> > FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> > -fno-stack-protector  check-function-bodies caller_pred
> > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> > #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: 

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
 wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich  writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> >> > propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, 
> sp 1

So for the above before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
(reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
 (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
(unspec:BLK [
(reg:DI 11 x11)
(reg/f:DI 31 sp)
] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
 (expr_list:REG_DEAD (reg:DI 11 x11)
(nil)))
```

After we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
(unspec:BLK [
(reg:DI 31 sp [11]) repeated x2
] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
 (nil))
```
Which seems to be ok, except we still have:
.cfi_def_cfa_register 11

That is because on:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
(plus:DI (reg:DI 12 x12)
(const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
153 {*adddi3_aarch64}
 (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
(nil)))

We record x11 there but never update it, even though that insn comes
before the mov for x11. So it seems cprop_hardreg had no idea it needed
to update the note.

I suspect the other testcases just show sp being propagated into the
stores and so on, and only need their expected output updated. But the
above testcase seems to end up with broken CFI, and I don't know how to
fix it.

Thanks,
Andrew Pinski


>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 

Re: Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread 钟居哲
>> Presumably the target selector in the dg-do ensures we only build/run
>> these on the appropriate targets now and we don't need explicit -march
>> arguments?
Yes. 

>> Assuming that's correct, this is fine for the trunk.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 07:13
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix fails of testcases
 
 
On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
> -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
> excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for 
> '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: 
> Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicit -march 
arguments?
 
Assuming that's correct, this is fine for the trunk.
 
jeff
 


Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 17:04, Juzhe-Zhong wrote:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
Excess errors:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
compilation terminated.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicit -march 
arguments?


Assuming that's correct, this is fine for the trunk.

jeff


[PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Juzhe-Zhong
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
Excess errors:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
compilation terminated.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.

---
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c| 2 +-
 .../riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
index 82bf6d674ec..dd22dae5eb9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */
+/* { dg-additional-options "-std=c99 -Wno-pedantic" } */
 
 #include 
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
index a0b2cf97afe..db54acc6535 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run {target { riscv_zvfh_hw } } } */
-/* { dg-additional-options "-march=rv64gcv_zvfh -Wno-pedantic" } */
+/* { dg-additional-options "-Wno-pedantic" } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
index 7e5e0e69d51..bf04a3d029e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */
+/* { dg-additional-options "-std=c99 -Wno-pedantic" } */
 
 #include 
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
index bf514f9426b..df8363e0428 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_zvfh_hw } } } */
-/* { dg-additional-options "-march=rv64gcv_zvfh -Wno-pedantic" } */
+/* { dg-additional-options "-Wno-pedantic" } */
 
 #include 
 
-- 
2.36.1



[PR target/110201] Fix operand types for various scalar crypto insns

2023-06-19 Thread Jeff Law via Gcc-patches


A handful of the scalar crypto instructions are supposed to take a 
constant integer argument 0..3 inclusive.  A suitable constraint was 
created and used for this purpose (D03), but the operand's predicate is 
"register_operand".  That's just wrong.


This patch adds a new predicate "const_0_3_operand" and fixes the 
relevant insns to use it.  One could argue the constraint is redundant 
now (and you'd be correct).  I wouldn't lose sleep if someone wanted 
that removed, in which case I'll spin up a V2.


The testsuite was broken in a way that made it consistent with the 
compiler, so the tests passed, when they really should have been issuing 
errors all along.


This patch adjusts the existing tests so that they all expect a 
diagnostic on the invalid operand usage (including out of range 
constants).  It adds new tests with proper constants, testing the 
extremes of valid values.
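
As a rough illustration of what the new predicate checks, here is a small sketch outside the machine-description language. `Operand`, `in_range` and this `const_0_3_operand` function are hypothetical stand-ins for illustration, not GCC's internal types; the only thing taken from the patch is the rule itself: the operand must be a constant integer within 0..3 inclusive, so a register is rejected.

```cpp
#include <cstdint>

// Inclusive range test on both ends, in the spirit of an IN_RANGE check.
constexpr bool in_range(std::int64_t value, std::int64_t lower,
                        std::int64_t upper) {
  return value >= lower && value <= upper;
}

// Hypothetical operand model: either a register or an integer constant.
struct Operand {
  bool is_const_int;   // false means "some register"
  std::int64_t value;  // meaningful only when is_const_int is true
};

// Accepts only constant integers 0, 1, 2 or 3.
constexpr bool const_0_3_operand(const Operand &op) {
  return op.is_const_int && in_range(op.value, 0, 3);
}
```

With this rule a variable argument such as `bs` in the old tests is rejected, as are the constants -1 and 4, while 0 and 3 (the extremes used in the new tests) are accepted.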


OK for the trunk, or should we remove the D03 constraint?

Jeff

PR target/110201
gcc/
* config/riscv/predicates.md (const_0_3_operand): New predicate.
* config/riscv/crypto.md (riscv_aes32dsi): Use new predicate.
(riscv_aes32dsmi, riscv_aes32esi, riscv_aes32esmi): Likewise.
(riscv_sm4ed_, riscv_sm4ks_"
   [(set (match_operand:X 0 "register_operand" "=r")
 (unspec:X [(match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "register_operand" "r")
-  (match_operand:SI 3 "register_operand" "D03")]
+  (match_operand:SI 3 "const_0_3_operand" "D03")]
   UNSPEC_SM4_ED))]
   "TARGET_ZKSED"
   "sm4ed\t%0,%1,%2,%3"
@@ -404,7 +404,7 @@ (define_insn "riscv_sm4ks_"
   [(set (match_operand:X 0 "register_operand" "=r")
 (unspec:X [(match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "register_operand" "r")
-  (match_operand:SI 3 "register_operand" "D03")]
+  (match_operand:SI 3 "const_0_3_operand" "D03")]
   UNSPEC_SM4_KS))]
   "TARGET_ZKSED"
   "sm4ks\t%0,%1,%2,%3"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 04ca6ceabc7..7aed71b5123 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -45,6 +45,10 @@ (define_predicate "const_csr_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 31)")))
 
+(define_predicate "const_0_3_operand"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
+
 (define_predicate "csr_operand"
   (ior (match_operand 0 "const_csr_operand")
(match_operand 0 "register_operand")))
diff --git a/gcc/testsuite/gcc.target/riscv/zknd32-2.c 
b/gcc/testsuite/gcc.target/riscv/zknd32-2.c
new file mode 100644
index 000..f8e68c6e56b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zknd32-2.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc_zknd -mabi=ilp32d" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+
+#include 
+
+int32_t foo1(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,0);
+}
+
+int32_t foo2(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,0);
+}
+
+int32_t foo3(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,3);
+}
+
+int32_t foo4(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,3);
+}
+
+/* { dg-final { scan-assembler-times "aes32dsi" 2 } } */
+/* { dg-final { scan-assembler-times "aes32dsmi" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zknd32.c 
b/gcc/testsuite/gcc.target/riscv/zknd32.c
index 5fcc66da901..7370a2c1812 100644
--- a/gcc/testsuite/gcc.target/riscv/zknd32.c
+++ b/gcc/testsuite/gcc.target/riscv/zknd32.c
@@ -6,13 +6,30 @@
 
 int32_t foo1(int32_t rs1, int32_t rs2, int bs)
 {
-return __builtin_riscv_aes32dsi(rs1,rs2,bs);
+return __builtin_riscv_aes32dsi(rs1,rs2,bs); /* { dg-error "invalid 
argument to built-in function" } */
 }
 
 int32_t foo2(int32_t rs1, int32_t rs2, int bs)
 {
-return __builtin_riscv_aes32dsmi(rs1,rs2,bs);
+return __builtin_riscv_aes32dsmi(rs1,rs2,bs); /* { dg-error "invalid 
argument to built-in function" } */
 }
 
-/* { dg-final { scan-assembler-times "aes32dsi" 1 } } */
-/* { dg-final { scan-assembler-times "aes32dsmi" 1 } } */
+int32_t foo3(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,-1); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo4(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,-1); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo5(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,4); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo6(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,4); /* { dg-error "invalid 
argument to built-in function" } */
+}
diff --git 

Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 15:45, 钟居哲 wrote:

Hi, Jeff.
Thanks for the comment.

I added INCLUDE_ALGORITHM because I use std::min.
Compilation failed when I didn't add INCLUDE_ALGORITHM.

Is INCLUDE_ALGORITHM so expensive that you don't want it?


It just stood out as unexpected.  There are no concerns with std::min and 
the like.
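
For context, std::min is declared in the standard header <algorithm>, which is why a file using it needs that header pulled in one way or another. A minimal standalone sketch (the helper name `clamp_element_count` is made up for illustration and does not come from the patch):

```cpp
#include <algorithm>  // declares std::min

// Hypothetical helper: limit a requested element count to a maximum.
int clamp_element_count(int requested, int maximum) {
  return std::min(requested, maximum);
}
```

Inside GCC sources the header is normally requested by defining INCLUDE_ALGORITHM before including system.h rather than by a direct #include, which matches what the patch does.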


Jeff



Re: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-19 Thread 钟居哲
Hi, this patch refactors the code in tree-vect-stmts.cc that generates the 
gimple IR.

I realize the code changes a lot, and I am not sure whether you are happy 
with that.

Originally, the code looked like this:

if (final_mask)
  {
    generate IFN_MASK_LOAD...
  }
else if (loop_len)
  {
    generate IFN_LEN_LOAD...
    handle BIAS
  }
else
  {
    NORMAL_LOAD
  }

Now it is refactored to:

if (final_mask || loop_len)
  {
    if (get_len_load_store ().exists ())
      {
        /* LEN_MASK_LOAD or LEN_LOAD */
        get len...
        if (LEN_MASK_LOAD)
          {
            get mask...
            generate IFN_LEN_MASK_LOAD...
          }
        else
          {
            generate IFN_LEN_LOAD...
          }
        handle BIAS
      }
    else
      {
        gcc_assert (final_mask)
        /* MASK_LOAD */
      }
  }
else
  {
    NORMAL_LOAD
  }

I refactored it because LEN_MASK_LOAD and LEN_LOAD share some common 
code.
Avoiding the duplication keeps the code reasonable.
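
The refactored decision can be sketched as a small standalone function. This is only a model of the control flow described above: `LoadKind`, `LoadContext` and `choose_load_kind` are hypothetical names, and the two `target_*` flags stand in for the target queries, which are not modelled here.

```cpp
#include <cassert>

enum class LoadKind { LenMaskLoad, LenLoad, MaskLoad, NormalLoad };

// Hypothetical summary of what the loop and the target provide.
struct LoadContext {
  bool has_final_mask;       // a mask is available for this access
  bool has_loop_len;         // a loop length is available
  bool target_has_len_op;    // some length-based load exists on the target
  bool target_has_len_mask;  // the combined length+mask form exists
};

LoadKind choose_load_kind(const LoadContext &c) {
  if (c.has_final_mask || c.has_loop_len) {
    if (c.target_has_len_op) {
      // Shared path: the length (and later the bias) is handled once here.
      return c.target_has_len_mask ? LoadKind::LenMaskLoad
                                   : LoadKind::LenLoad;
    }
    // Without a length-based op, only a masked load is possible.
    assert(c.has_final_mask);
    return LoadKind::MaskLoad;
  }
  return LoadKind::NormalLoad;
}
```

The point of the shape is that the two length-based variants share one branch, so the length and bias handling is written once instead of being duplicated.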

Bootstrap and regression testing are under way.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-20 00:17
To: gcc-patches
CC: rguenther; richard.sandiford; Ju-Zhe Zhong
Subject: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
From: Ju-Zhe Zhong 
 
This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I refactored the gimple IR build to make the code cleaner.
 
gcc/ChangeLog:
 
* internal-fn.cc (expand_partial_store_optab_fn): Add 
LEN_MASK_{LOAD,STORE} vectorizer support.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* optabs-query.cc (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
(vectorizable_load): Ditto.
 
---
gcc/internal-fn.cc |  35 +-
gcc/optabs-query.cc|  25 +++-
gcc/tree-vect-stmts.cc | 259 +
3 files changed, 213 insertions(+), 106 deletions(-)
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
{
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_STORE_LANES:
   return 2;
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
   return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
{
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+ = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+ {
+   /* Try LEN_MASK_LOAD.  */
+   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+ }
+  else
+ {
+   /* Try LEN_MASK_STORE.  */
+   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+ }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_argno = 4;
+}
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
return -1;
 }
diff --git a/gcc/optabs-query.cc 

Re: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread 钟居哲
Hi, Jeff.
Thanks for the comment.

I added INCLUDE_ALGORITHM because I use std::min.
Compilation failed when I didn't add INCLUDE_ALGORITHM.

Is INCLUDE_ALGORITHM so expensive that you don't want it?


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 02:25
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors
 
 
On 6/18/23 17:06, Juzhe-Zhong wrote:
> This is a proposal patch that is **NOT** ready to push, since after
> this patch the total number of machine modes will exceed 255, which
> causes an ICE in LTO:
>internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290
Right.  Note that an ack from Jakub or Richi will be sufficient for the 
LTO fixes to go forward.
 
 
> 
> We need to add VLS modes for the following reasons:
> 1. Enhance GNU vector codegen:
> For example:
>   typedef int32_t vnx8si __attribute__ ((vector_size (32)));
> 
>   __attribute__ ((noipa)) void
>   f_vnx8si (int32_t * in, int32_t * out)
>   {
> vnx8si v = *(vnx8si*)in;
> *(vnx8si *) out = v;
>   }
> 
> compile option: --param=riscv-autovec-preference=scalable
>  before this patch:
>  f_vnx8si:
>  ld  a2,0(a0)
>  ld  a3,8(a0)
>  ld  a4,16(a0)
>  ld  a5,24(a0)
>  addi sp,sp,-32
>  sd  a2,0(a1)
>  sd  a3,8(a1)
>  sd  a4,16(a1)
>  sd  a5,24(a1)
>  addi sp,sp,32
>  jr  ra
> 
> After this patch:
> f_vnx8si:
>  vsetivli zero,8,e32,m2,ta,ma
>  vle32.v v2,0(a0)
>  vse32.v v2,0(a1)
>  ret
> 
> 2. Enhance VLA SLP:
> void
> f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
> {
>for (int i = 0; i < 100; ++i)
>  {
>a[i * 8] = b[i * 8] + c[i * 8];
>a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
>a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
>a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
>a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
>a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
>a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
>a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
>  }
> }
> 
> 
> ..
> Loop body:
>   ...
>   vrgatherei16.vv...
>   ...
> 
> Tail:
>  lbu a4,792(a1)
>  lbu a5,792(a2)
>  addw a5,a5,a4
>  sb  a5,792(a0)
>  lbu a5,793(a2)
>  addw a5,a5,a4
>  sb  a5,793(a0)
>  lbu a4,794(a1)
>  lbu a5,794(a2)
>  addw a5,a5,a4
>  sb  a5,794(a0)
>  lbu a5,795(a2)
>  addw a5,a5,a4
>  sb  a5,795(a0)
>  lbu a4,796(a1)
>  lbu a5,796(a2)
>  addw a5,a5,a4
>  sb  a5,796(a0)
>  lbu a5,797(a2)
>  addw a5,a5,a4
>  sb  a5,797(a0)
>  lbu a4,798(a1)
>  lbu a5,798(a2)
>  addw a5,a5,a4
>  sb  a5,798(a0)
>  lbu a5,799(a2)
>  addw a5,a5,a4
>  sb  a5,799(a0)
>  ret
> 
> The tail elements need VLS modes to vectorize like ARM SVE:
> 
> f:
>  mov x3, 0
>  cntb x5
>  mov x4, 792
>  whilelo p7.b, xzr, x4
> .L2:
>  ld1b z31.b, p7/z, [x1, x3]
>  ld1b z30.b, p7/z, [x2, x3]
>  trn1 z31.b, z31.b, z31.b
>  add z31.b, z31.b, z30.b
>  st1b z31.b, p7, [x0, x3]
>  add x3, x3, x5
>  whilelo p7.b, x3, x4
>  b.any   .L2
> Tail:
>  ldr b31, [x1, 792]
>  ldr b27, [x1, 794]
>  ldr b28, [x1, 796]
>  dup v31.8b, v31.b[0]
>  ldr b29, [x1, 798]
>  ldr d30, [x2, 792]
>  ins v31.b[2], v27.b[0]
>  ins v31.b[3], v27.b[0]
>  ins v31.b[4], v28.b[0]
>  ins v31.b[5], v28.b[0]
>  ins v31.b[6], v29.b[0]
>  ins v31.b[7], v29.b[0]
>  add v31.8b, v30.8b, v31.8b
>  str d31, [x0, 792]
>  ret
> 
> Notice that ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.
 
 
 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
> GNU vectors.
>  (ADJUST_ALIGNMENT): Ditto.
>  (ADJUST_BYTESIZE): Ditto.
> 
>  (ADJUST_PRECISION): Ditto.
>  (VECTOR_MODES): Ditto.
>  * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
>  (get_regno_alignment): Ditto.
>  * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
>  (const_vlmax_p): Ditto.
>  (legitimize_move): Ditto.
>  (get_vlmul): Ditto.
>  (get_regno_alignment): Ditto.
>  (get_ratio): Ditto.
>  (get_vector_mode): Ditto.
>  * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
>  * config/riscv/riscv.cc 

Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/6/23 16:50, Ben Boeckel wrote:

Unicode does not support such values because they are unrepresentable in
UTF-16.
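
The limit follows from how UTF-16 surrogate pairs work: a pair carries 20 payload bits on top of the base 0x10000, so the largest encodable value is 0x10FFFF. A short sketch of that arithmetic (the function names are illustrative only, not libcpp's API, and the range check deliberately does not look at the surrogate range itself):

```cpp
#include <cstdint>

// Largest value a UTF-16 surrogate pair can encode:
// base 0x10000, plus 10 high payload bits and 10 low payload bits.
constexpr std::uint32_t utf16_max_codepoint() {
  return 0x10000u + (0x3FFu << 10) + 0x3FFu;
}

static_assert(utf16_max_codepoint() == 0x10FFFFu,
              "UTF-16 tops out at U+10FFFF");

// Hypothetical helper: true when `cp` does not exceed the Unicode limit.
constexpr bool codepoint_in_unicode_range(std::uint32_t cp) {
  return cp <= utf16_max_codepoint();
}
```

So U+10FFFF itself is still valid, and 0x110000 is the first value that must be rejected.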


Pushed.


libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
  libcpp/charset.cc | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
int err = one_utf8_to_cppchar (, , );
if (err)
return false;
+
+  /* Additionally, Unicode declares that all codepoints above 0x10FFFF are
+invalid because they cannot be represented in UTF-16.
+
+Reject such values.*/
+  if (cp > 0x10FFFF)
+   return false;
  }
/* No problems encountered.  */
return true;




Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-19 Thread Jason Merrill via Gcc-patches

On 5/12/23 10:24, Ben Boeckel wrote:

On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:

I notice that the actual flags are all -fdep-*, though some of them are
-fdeps-* here, and the internal variables all seem to be fdeps_*.  I
lean toward harmonizing on "deps", I think.


Done.


I don't love the three separate options, but I suppose it's fine.  I'd
prefer "target" instead of "output".


Done.


It should be possible to omit both -file and -target and get reasonable
defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.


`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

 %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}


%{!fdeps-file: but yes.


would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).


I would think it could default to %b.o?

I had quite a few more comments on the v5 patch that you didn't respond 
to here or address in the v6 patch; did your mail client hide them from you?


Jason



[PATCH 12/14] OpenACC: "declare create" fixes wrt. "allocatable" variables

2023-06-19 Thread Julian Brown
This patch fixes a case revealed by the previous patch where a synthetic
"acc data" region created for a "declare create" variable could interact
strangely with lexical inheritance behaviour.  In fact, it doesn't seem
right to create the "acc data" region for allocatable variables at all
-- doing so means that a data region is likely to be created for an
unallocated variable.

The fix is not to add such variables to the synthetic "acc data" region
at all, and defer to the code that performs "enter data"/"exit data"
for them when allocated/deallocated on the host instead. Then, "declare
create" variables are implicitly turned into "present" clauses on in-scope
offload regions.

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Handle "declare create" for
scalar allocatable variables.
(gfc_trans_omp_clauses): Don't include allocatable vars in synthetic
"acc data" region created for "declare create" variables.  Mark such
variables with the "oacc declare create" attribute instead.  Don't
create ALWAYS_POINTER mapping for target-to-host updates of declare
create variables.
(gfc_trans_oacc_declare): Handle empty clause list.

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Handle "oacc declare
create" attribute.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-create-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-create-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-create-3.f90: New test.
---
 gcc/fortran/trans-openmp.cc   | 45 ---
 gcc/gimplify.cc   |  8 
 .../libgomp.oacc-fortran/declare-create-1.f90 | 21 +
 .../libgomp.oacc-fortran/declare-create-2.f90 | 25 +++
 .../libgomp.oacc-fortran/declare-create-3.f90 | 25 +++
 5 files changed, 119 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 1a14d2bc068..819d79cda28 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1619,7 +1619,16 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
   orig_decl = decl;
 
   c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
+  if (openacc
+ && GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
+ && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_PRESENT)
+   /* This allows "declare create" to work for scalar allocatables.  The
+  resulting mapping nodes are:
+force_present(*var) firstprivate_pointer(var)
+  which is the same as an explicit "present" clause gives.  */
+   OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  else
+   OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
   OMP_CLAUSE_DECL (c4) = decl;
   OMP_CLAUSE_SIZE (c4) = size_int (0);
   decl = build_fold_indirect_ref (decl);
@@ -4588,6 +4597,29 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
  if (!n->sym->attr.referenced)
continue;
 
+ /* We do not want to include allocatable vars in a synthetic
+"acc data" region created for "!$acc declare create" vars.
+Such variables are handled by augmenting allocate/deallocate
+statements elsewhere (with
+"acc enter data declare_allocate(...)", etc.).  */
+ if (op == EXEC_OACC_DECLARE
+ && n->u.map_op == OMP_MAP_ALLOC
+ && n->sym->attr.allocatable
+ && n->sym->attr.oacc_declare_create)
+   {
+ tree tree_var = gfc_get_symbol_decl (n->sym);
+ if (!lookup_attribute ("oacc declare create",
+DECL_ATTRIBUTES (tree_var)))
+   DECL_ATTRIBUTES (tree_var)
+ = tree_cons (get_identifier ("oacc declare create"),
+  NULL_TREE, DECL_ATTRIBUTES (tree_var));
+ /* We might need to turn what would normally be a
+"firstprivate" mapping into a "present" mapping.  For the
+latter, we need the decl to be addressable.  */
+ TREE_ADDRESSABLE (tree_var) = 1;
+ continue;
+   }
+
  bool always_modifier = false;
  tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP);
  tree node2 = NULL_TREE;
@@ -4780,7 +4812,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
  tree orig_decl = decl;
 

[PATCH 11/14] OpenACC: Reimplement "inheritance" for lexically-nested offload regions

2023-06-19 Thread Julian Brown
This patch reimplements "lexical inheritance" for OpenACC offload regions
inside "data" regions, allowing e.g. this to work:

  int *ptr;
  [...]
  #pragma acc data copyin(ptr[10:2])
  {
#pragma acc parallel
{ ... }
  }

Here, the "copyin" is mirrored on the inner "acc parallel" as
"present(ptr[10:2])" -- allowing code within the parallel to use that
section of the array even though the mapping is implicit.
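
As a fuller illustration of the fragment above, here is a self-contained sketch.  The function name, the array size and the "copy(sum)" clause are assumptions added for the example and do not come from the patch or its testsuite.  When built without -fopenacc the pragmas are ignored and the code runs on the host with the same result.

```c
#include <stdlib.h>

/* Hypothetical example function: sum two elements of a heap array inside
   a parallel region that is lexically nested in a data region.  */
int
sum_section (void)
{
  int *ptr = malloc (16 * sizeof (int));
  if (!ptr)
    return -1;
  for (int i = 0; i < 16; i++)
    ptr[i] = i;

  int sum = 0;
#pragma acc data copyin(ptr[10:2])
  {
    /* No data clause for ptr here: the section mapped by the enclosing
       "data" region is expected to be inherited as present(ptr[10:2]).  */
#pragma acc parallel copy(sum)
    {
      sum = ptr[10] + ptr[11];
    }
  }

  free (ptr);
  return sum;
}
```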

In terms of implementation, this works by expanding mapping nodes for
"acc data" to include pointer mappings that might be needed by inner
offload regions. The resulting mapping group is then copied to the inner
offload region as needed, rewriting the first node to "force_present".
The pointer mapping nodes are then removed from the "acc data" later
during gimplification.

For OpenMP, pointer mapping nodes on equivalent "omp data" regions are
not needed, so remain suppressed during expansion.

2023-06-16  Julian Brown  

gcc/c-family/
* c-omp.cc (c_omp_address_inspector::expand_array_base): Don't omit
pointer nodes for OpenACC.

gcc/
* gimplify.cc (omp_tsort_mark, omp_mapping_group): Move before
gimplify_omp_ctx. Add constructor to omp_mapping_group.
(gimplify_omp_ctx): Add DECL_DATA_CLAUSE field.
(new_omp_context, delete_omp_context): Initialise and free above field.
(omp_gather_mapping_groups_1): Use constructor for omp_mapping_group.
(gimplify_scan_omp_clauses): Record mappings that might be lexically
inherited.  Don't remove
GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE yet.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Implement lexical inheritance
behaviour for OpenACC.
(gimplify_adjust_omp_clauses): Remove
GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE here
instead, after lexical inheritance is done.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: Re-enable scan test.
* gfortran.dg/goacc/pr70828.f90: Likewise.
* gfortran.dg/goacc/assumed-size.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: Un-XFAIL.
* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: Un-XFAIL.
---
 gcc/c-family/c-omp.cc |  13 +-
 gcc/gimplify.cc   | 208 +-
 .../c-c++-common/goacc/acc-data-chain.c   |   4 +-
 .../gfortran.dg/goacc/assumed-size.f90|  35 +++
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90   |   3 +-
 .../libgomp.oacc-c-c++-common/pr70828-2.c |   2 -
 .../libgomp.oacc-c-c++-common/pr70828.c   |   2 -
 .../libgomp.oacc-fortran/pr70828-2.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-3.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-4.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-5.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-6.f90|   2 -
 .../libgomp.oacc-fortran/pr70828.f90  |   2 -
 13 files changed, 202 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/assumed-size.f90

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index e55b2aec920..291a26293ef 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4313,7 +4313,8 @@ c_omp_address_inspector::expand_array_base (tree c,
/* The code handling "firstprivatize_array_bases" in gimplify.cc is
   relevant here.  What do we need to create for arrays at this
   stage?  (This condition doesn't feel quite right.  FIXME?)  */
-   if (!target
+   if (openmp
+   && !target
&& (TREE_CODE (TREE_TYPE (addr_tokens[i + 1]->expr))
== ARRAY_TYPE))
  break;
@@ -4324,7 +4325,7 @@ c_omp_address_inspector::expand_array_base (tree c,
   virtual_origin);
tree data_addr = omp_accessed_addr (addr_tokens, i + 1, expr);
c2 = build_omp_clause (loc, OMP_CLAUSE_MAP);
-   if (decl_p && target)
+   if (decl_p && (!openmp || target))
  OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
else
  {
@@ -4375,9 +4376,11 @@ c_omp_address_inspector::expand_array_base (tree c,
tree data_addr = omp_accessed_addr (addr_tokens, last_access, expr);
c2 = build_omp_clause (loc, OMP_CLAUSE_MAP);
/* For OpenACC, use FIRSTPRIVATE_POINTER for decls even on non-compute
-  regions (e.g. "acc data" constructs).  It'll be removed anyway in
-  gimplify.cc, but doing it this way 

[PATCH 10/14] OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc

2023-06-19 Thread Julian Brown
This patch has been separated out from the C++ "declare mapper"
support patch.  It contains just the gimplify.cc rearrangement
work, mostly moving gimplification from gimplify_scan_omp_clauses
to gimplify_adjust_omp_clauses for map clauses.

The motivation for doing this was that we don't know if we need to
instantiate mappers implicitly until the body of an offload region has
been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the
un-gimplified form of clauses to sort by base-pointer dependencies after
mapper instantiation has taken place.

The patch also reimplements the "present" clause sorting code to avoid
another sorting pass on mapping nodes.
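
The single-pass idea can be sketched outside of GCC as a stable partition of a singly-linked list into four buckets using tail pointers, followed by a splice.  This is only an illustration under simplified assumptions: the types, the enumerators and both function names are hypothetical stand-ins, not the definitions in gimplify.cc.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the mapping kinds.  The enumerator values
   double as bucket indices, listed in the order of the final list.  */
enum kind { K_PRESENT_TOFROM, K_PRESENT_ALLOC, K_TOFROM, K_ALLOC };

struct group
{
  enum kind kind;
  int id;
  struct group *next;
};

/* Split INLIST into four sub-lists in one pass, keeping the original
   order within each, then concatenate them.  */
struct group *
segregate (struct group *inlist)
{
  struct group *heads[4] = { NULL, NULL, NULL, NULL };
  struct group **tails[4] = { &heads[0], &heads[1], &heads[2], &heads[3] };

  for (struct group *w = inlist; w;)
    {
      struct group *next = w->next;
      *tails[w->kind] = w;        /* Append to the bucket's list.  */
      w->next = NULL;
      tails[w->kind] = &w->next;  /* The tail is now this node's link.  */
      w = next;
    }

  /* Splice from the back so that empty buckets are skipped naturally.  */
  *tails[2] = heads[3];
  *tails[1] = heads[2];
  *tails[0] = heads[1];
  return heads[0];
}

#define MAX_GROUPS 16

/* Test helper: build a list from KINDS, segregate it, and store the
   resulting order of node ids in OUT.  Returns the node count.  */
int
segregate_ids (const enum kind *kinds, int n, int *out)
{
  struct group nodes[MAX_GROUPS];
  if (n < 0 || n > MAX_GROUPS)
    return -1;
  for (int i = 0; i < n; i++)
    {
      nodes[i].kind = kinds[i];
      nodes[i].id = i;
      nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
  int count = 0;
  for (struct group *g = segregate (n ? &nodes[0] : NULL); g; g = g->next)
    out[count++] = g->id;
  return count;
}
```

Because each tail pointer refers either to the bucket's head slot or to the last node's link, the splice needs no special case for empty buckets.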

2023-06-16  Julian Brown  

gcc/
* gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups.
(gimplify_scan_omp_clauses): Use mapping group functionality to
iterate through mapping nodes.  Remove most gimplification of
OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables
splay tree.
(gimplify_adjust_omp_clauses): Move most gimplification of
OMP_CLAUSE_MAP nodes here.

gcc/testsuite/
* gfortran.dg/gomp/map-12.f90: Adjust scan output.
---
 gcc/gimplify.cc   | 670 --
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |   2 +-
 2 files changed, 378 insertions(+), 294 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9ce1f5b983a..e21e9d99cc9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -9779,10 +9779,15 @@ omp_tsort_mapping_groups (vec *groups,
   return outlist;
 }
 
-/* Split INLIST into two parts, moving groups corresponding to
-   ALLOC/RELEASE/DELETE mappings to one list, and other mappings to another.
-   The former list is then appended to the latter.  Each sub-list retains the
-   order of the original list.
+/* Split INLIST into four parts:
+
+ - "present" to/from groups
+ - "present" alloc groups
+ - other to/from groups
+ - other alloc/release/delete groups
+
+   These sub-lists are then concatenated together to form the final list.
+   Each sub-list retains the order of the original list.
Note that ATTACH nodes are later moved to the end of the list in
gimplify_adjust_omp_clauses, for target regions.  */
 
@@ -9790,7 +9795,9 @@ static omp_mapping_group *
 omp_segregate_mapping_groups (omp_mapping_group *inlist)
 {
   omp_mapping_group *ard_groups = NULL, *tf_groups = NULL;
+  omp_mapping_group *pa_groups = NULL, *ptf_groups = NULL;
   omp_mapping_group **ard_tail = &ard_groups, **tf_tail = &tf_groups;
+  omp_mapping_group **pa_tail = &pa_groups, **ptf_tail = &ptf_groups;
 
   for (omp_mapping_group *w = inlist; w;)
 {
@@ -9809,6 +9816,20 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
  ard_tail = &w->next;
  break;
 
+   case GOMP_MAP_PRESENT_ALLOC:
+ *pa_tail = w;
+ w->next = NULL;
+ pa_tail = &w->next;
+ break;
+
+   case GOMP_MAP_PRESENT_FROM:
+   case GOMP_MAP_PRESENT_TO:
+   case GOMP_MAP_PRESENT_TOFROM:
+ *ptf_tail = w;
+ w->next = NULL;
+ ptf_tail = &w->next;
+ break;
+
default:
  *tf_tail = w;
  w->next = NULL;
@@ -9820,8 +9841,10 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
 
   /* Now splice the lists together...  */
   *tf_tail = ard_groups;
+  *pa_tail = tf_groups;
+  *ptf_tail = pa_groups;
 
-  return tf_groups;
+  return ptf_groups;
 }
 
 /* Given a list LIST_P containing groups of mappings given by GROUPS, reorder
@@ -11673,119 +11696,30 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
break;
   }
 
-  if (code == OMP_TARGET
-  || code == OMP_TARGET_DATA
-  || code == OMP_TARGET_ENTER_DATA
-  || code == OMP_TARGET_EXIT_DATA)
-{
-  vec *groups;
-  groups = omp_gather_mapping_groups (list_p);
-  if (groups)
-   {
- hash_map *grpmap;
- grpmap = omp_index_mapping_groups (groups);
+  vec *groups = omp_gather_mapping_groups (list_p);
+  hash_map *grpmap = NULL;
+  unsigned grpnum = 0;
+  tree *grp_start_p = NULL, grp_end = NULL_TREE;
 
- omp_resolve_clause_dependencies (code, groups, grpmap);
- omp_build_struct_sibling_lists (code, region_type, groups, &grpmap,
- list_p);
-
- omp_mapping_group *outlist = NULL;
- bool enter_exit = (code == OMP_TARGET_ENTER_DATA
-|| code == OMP_TARGET_EXIT_DATA);
-
- /* Topological sorting may fail if we have duplicate nodes, which
-we should have detected and shown an error for already.  Skip
-sorting in that case.  */
- if (seen_error ())
-   goto failure;
-
- delete grpmap;
- delete groups;
-
- /* Rebuild now we have struct sibling lists.  */
- groups = omp_gather_mapping_groups (list_p);
- grpmap = omp_index_mapping_groups (groups);
-
-

[PATCH 08/14] OpenMP: Pointers and member mappings

2023-06-19 Thread Julian Brown
This patch changes the mapping node arrangement used for array components
of derived types, e.g.:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

This will currently be mapped using three mapping nodes:

  GOMP_MAP_TO tvar%arrptr   (the descriptor)
  GOMP_MAP_TOFROM *tvar%arrptr%data (the actual array data)
  GOMP_MAP_ALWAYS_POINTER tvar%arrptr%data  (a pointer to the array data)

This follows OMP 5.0, 2.19.7.1 (or OpenMP 5.2, 5.8.3) "map Clause":

  "If a list item in a map clause is an associated pointer and the
   pointer is not the base pointer of another list item in a map clause
   on the same construct, then it is treated as if its pointer target
   is implicitly mapped in the same clause. For the purposes of the map
   clause, the mapped pointer target is treated as if its base pointer
   is the associated pointer."

However, we can also write this:

  map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):

  "For map clauses on map-entering constructs, if any list item has a base
   pointer for which a corresponding pointer exists in the data environment
   upon entry to the region and either a new list item or the corresponding
   pointer is created in the device data environment on entry to the region,
   then:
   1. [Fortran] The corresponding pointer variable is associated with
  a pointer target that has the same rank and bounds as the pointer
  target of the original pointer, such that the corresponding list item
  can be accessed through the pointer in a target region.
   2. The corresponding pointer variable becomes an attached pointer
  for the corresponding list item."

With this patch you can write the above mappings, and the mapping nodes
used to map pointers to array sections (with descriptors) now look
like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8))   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data  (bias 3, etc.)
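
The "size 8-3+1" annotation is the element count of the inclusive section (3:8).  A small sketch of that arithmetic follows; the helper names are hypothetical, and the 4-byte element size used in the note after the code is an assumption for a default-kind integer.

```c
#include <stddef.h>

/* Number of elements in the Fortran-style inclusive section (LO:HI).
   A section whose upper bound is below its lower bound is zero-sized.  */
size_t
section_count (long lo, long hi)
{
  return hi < lo ? 0 : (size_t) (hi - lo) + 1;
}

/* Size in bytes of the section, given the size of one element.  */
size_t
section_bytes (long lo, long hi, size_t elem_size)
{
  return section_count (lo, hi) * elem_size;
}
```

For (3:8) this gives 6 elements, or 24 bytes with 4-byte elements.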

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer --
so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT         tvar                  (len: 1)
  GOMP_MAP_TO_PSET        tvar%arr              (size: 64, etc.)  <--. moved here
  [...]                                                              |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  _____________________|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtvar  (len: 2)
  GOMP_MAP_TO_PSET        dtvar(i)%arrptr
  GOMP_MAP_TO_PSET        dtvar(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtvar(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtvar(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtvar(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtvar(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  #pragma omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO             dv

  GOMP_MAP_TO             *dv%arr%data
  GOMP_MAP_TO_PSET        dv%arr <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH  dv%arr%data

To accommodate recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.


[PATCH 09/14] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-06-19 Thread Julian Brown
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).
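
The rule being enforced can be sketched as follows.  This is not libgomp's detection code, only an illustration of the condition under stated assumptions: each component mapping in a group is taken to carry the address of the struct element it was indexed from, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an element of the array of structs.  */
struct elem
{
  int *arrptr;
  int n;
};

/* Return true if every variably-indexed component access in the group
   refers to the same struct element, i.e. all indices were equal at run
   time.  BASES holds the address of the element behind each access.  */
bool
same_struct_base (const void *const *bases, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (bases[i] != bases[0])
      return false;
  return true;
}
```

With i == j both accesses share one base and the check passes; with different i and j it fails, which corresponds to the case where the runtime now reports an error instead of producing a wrong result.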

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support.

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_deep_map_kind_p): Add GOMP_MAP_STRUCT_UNORD.

gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
 gcc/fortran/trans-openmp.cc   |   1 +
 gcc/gimplify.cc   | 110 ++
 gcc/omp-low.cc|   1 +
 gcc/tree-pretty-print.cc  |   3 +
 include/gomp-constants.h  |   6 +
 libgomp/oacc-mem.c|   6 +-
 libgomp/target.c  |  60 +-
 .../map-arrayofstruct-1.c |  38 ++
 .../map-arrayofstruct-2.c |  58 +
 .../map-arrayofstruct-3.c |  68 +++
 .../libgomp.fortran/map-subarray-5.f90|  54 +
 11 files changed, 378 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index a108f718ffa..1a14d2bc068 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2961,6 +2961,7 @@ gfc_omp_deep_map_kind_p (tree clause)
 case GOMP_MAP_FORCE_TOFROM:
 case GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT:
 case GOMP_MAP_STRUCT:
+case GOMP_MAP_STRUCT_UNORD:
 case GOMP_MAP_ALWAYS_POINTER:
 case GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION:
 case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION:
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index da81582da1c..9ce1f5b983a 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8952,7 +8952,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
 
 static tree
 extract_base_bit_offset (tree base, poly_int64 *bitposp,
-poly_offset_int *poffsetp)
+poly_offset_int *poffsetp,
+bool *variable_offset)
 {
   tree offset;
   poly_int64 bitsize, bitpos;
@@ -8970,10 +8971,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp,
   if (offset && poly_int_tree_p (offset))
 {
   poffset = wi::to_poly_offset (offset);
-  

[PATCH 14/14] OpenACC: Improve implicit mapping for non-lexically nested offload regions

2023-06-19 Thread Julian Brown
This patch enables use of the OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for
OpenACC.

This allows code like this to work correctly:

  int arr[100];
  [...]
  #pragma acc enter data copyin(arr[20:10])

  /* No explicit mapping of 'arr' here.  */
  #pragma acc parallel
  { /* use of arr[20:10]... */ }

  #pragma acc exit data copyout(arr[20:10])

Otherwise, the implicit "copy" ("present_or_copy") on the parallel
corresponds to the whole array, and that fails at runtime when the
subarray is mapped.
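
A complete version of the fragment above might look like the sketch below.  The function name, the loop body and the final host-side sum are assumptions added so that the example is self-contained; the new implicit-mapping-1.c test may look different.  When built without -fopenacc the pragmas are ignored and the code runs on the host with the same result.

```c
/* Hypothetical example function: write to a subarray that was mapped by
   an unstructured "enter data", from a parallel region that has no
   explicit data clause for the array.  */
int
fill_section (void)
{
  int arr[100] = { 0 };

#pragma acc enter data copyin(arr[20:10])

  /* No explicit mapping of 'arr' here.  */
#pragma acc parallel
  {
    for (int i = 20; i < 30; i++)
      arr[i] = i;
  }

#pragma acc exit data copyout(arr[20:10])

  /* Only elements 20..29 are expected to be nonzero.  */
  int sum = 0;
  for (int i = 0; i < 100; i++)
    sum += arr[i];
  return sum;
}
```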

The numbering of the GOMP_MAP_IMPLICIT bit clashes with the OpenACC
"non-contiguous" dynamic array support, so the GOMP_MAP_NONCONTIG_ARRAY_P
macro has been adjusted to account for that.

This behaviour relates to upstream OpenACC issue 490 (not yet resolved).

2023-06-16  Julian Brown  

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Set
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P for OpenACC also.

gcc/testsuite/
* c-c++-common/goacc/combined-reduction.c: Adjust scan output.
* c-c++-common/goacc/reduction-1.c: Likewise.
* c-c++-common/goacc/reduction-2.c: Likewise.
* c-c++-common/goacc/reduction-3.c: Likewise.
* c-c++-common/goacc/reduction-4.c: Likewise.
* c-c++-common/goacc/reduction-10.c: Likewise.
* gfortran.dg/goacc/loop-tree-1.f90: Likewise.

include/
* gomp-constants.h (GOMP_MAP_NONCONTIG_ARRAY_P): Tweak condition.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c: New test.
---
 gcc/gimplify.cc   |  5 +---
 .../c-c++-common/goacc/combined-reduction.c   |  2 +-
 .../c-c++-common/goacc/reduction-1.c  |  4 ++--
 .../c-c++-common/goacc/reduction-10.c |  9 +++
 .../c-c++-common/goacc/reduction-2.c  |  4 ++--
 .../c-c++-common/goacc/reduction-3.c  |  4 ++--
 .../c-c++-common/goacc/reduction-4.c  |  4 ++--
 .../gfortran.dg/goacc/loop-tree-1.f90 |  2 +-
 include/gomp-constants.h  |  3 ++-
 .../implicit-mapping-1.c  | 24 +++
 10 files changed, 42 insertions(+), 19 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 0706f130ebb..1e90d2ed031 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13413,10 +13413,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
  gcc_unreachable ();
}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
-  /* Setting of the implicit flag for the runtime is currently disabled for
-OpenACC.  */
-  if ((gimplify_omp_ctxp->region_type & ORT_ACC) == 0)
-   OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1;
+  OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1;
   if (DECL_SIZE (decl)
  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
{
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
index ecf23f59d66..40b93acc9ea 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
@@ -25,5 +25,5 @@ main ()
 
 /* { dg-final { scan-tree-dump-times "omp target oacc_parallel reduction.+:v1. map.tofrom:v1" 1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "omp target oacc_kernels map.force_tofrom:n .len: 4.. map.force_tofrom:v1 .len: 4.." 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "omp target oacc_kernels map.force_tofrom:n .len: 4..implicit.. map.force_tofrom:v1 .len: 4..implicit.." 1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 35bfc868708..d9e3c380b8e 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -68,5 +68,5 @@ main(void)
 }
 
 /* Check that default copy maps are generated for loop reductions.  */
-/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 7 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\\[implicit\\\]\\)" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\\[implicit\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-10.c b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
index 579aa561479..36c330e9267 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-10.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
@@ -87,7 +87,8 @@ main(void)
 
 /* Check that default copy maps are generated for loop reductions.  */
 /* { 

[PATCH 13/14] OpenACC: Allow implicit uses of assumed-size arrays in offload regions

2023-06-19 Thread Julian Brown
This patch reimplements the functionality of the previously-reverted
patch "Assumed-size arrays with non-lexical data mappings". The purpose
is to support implicit uses of assumed-size arrays for Fortran when those
arrays have already been mapped on the target some other way (e.g. by
"acc enter data").

This relates to upstream OpenACC issue 489 (not yet resolved).

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Treat implicitly-mapped
assumed-size arrays as zero-sized for OpenACC, rather than an error.

gcc/testsuite/
* gfortran.dg/goacc/assumed-size.f90: Don't expect error.

libgomp/
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90: New
test.
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90: New
test.
---
 gcc/fortran/trans-openmp.cc   | 16 ++--
 .../gfortran.dg/goacc/assumed-size.f90|  4 +-
 .../nonlexical-assumed-size-1.f90 | 28 +
 .../nonlexical-assumed-size-2.f90 | 40 +++
 4 files changed, 82 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 819d79cda28..230cebf250b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,6 +1587,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
 return;
 
   tree decl = OMP_CLAUSE_DECL (c);
+  bool assumed_size = false;
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be
  mapped explicitly using array sections.  */
@@ -1597,9 +1598,14 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
 {
-  error_at (OMP_CLAUSE_LOCATION (c),
-   "implicit mapping of assumed size array %qD", decl);
-  return;
+  if (openacc)
+   assumed_size = true;
+  else
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+ return;
+   }
 }
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
@@ -1654,7 +1660,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
   else
{
  OMP_CLAUSE_DECL (c) = decl;
- OMP_CLAUSE_SIZE (c) = NULL_TREE;
+ OMP_CLAUSE_SIZE (c) = assumed_size ? size_zero_node : NULL_TREE;
+ if (assumed_size)
+   OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION (c) = 1;
}
   if (TREE_CODE (TREE_TYPE (orig_decl)) == REFERENCE_TYPE
  && (GFC_DECL_GET_SCALAR_POINTER (orig_decl)
diff --git a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
index 4fced2e70c9..12f44c4743a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
@@ -4,7 +4,8 @@
 ! exit data, respectively.
 
 ! This does not appear to be supported by the OpenACC standard as of version
-! 3.0.  Check for an appropriate error message.
+! 3.0.  There is however real-world code that relies on this working, so we
+! make an attempt to support it.
 
 program test
   implicit none
@@ -26,7 +27,6 @@ subroutine dtest (a, n)
   !$acc enter data copyin(a(1:n))
 
   !$acc parallel loop
-! { dg-error {implicit mapping of assumed size array 'a'} "" { target *-*-* } .-1 }
   do i = 1, n
  a(i) = i
   end do
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
new file mode 100644
index 00000000000..4b61e1cee9b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+!$acc enter data copyin(arr(1:10))
+
+!$acc serial
+arr(5) = 5
+!$acc end serial
+
+!$acc exit data copyout(arr(1:10))
+
+end subroutine subr
+end program p
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
new file mode 100644
index 00000000000..daf7089915f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
@@ -0,0 +1,40 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+! At first glance, it might not be obvious how this works.  The "enter data"
+! and "exit data" operations expand to a pair 

[PATCH 05/14] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

2023-06-19 Thread Julian Brown
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clauses and finish_omp_clauses, in preparation for the
following patch (to clarify the diff a little).

2022-09-13  Julian Brown  

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
---
 gcc/c/c-typeck.cc   | 615 +-
 gcc/cp/semantics.cc | 788 ++--
 2 files changed, 707 insertions(+), 696 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 9591d67251e..2cfe2174bab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15520,321 +15520,326 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_TO:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE__CACHE_:
- t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST)
-   {
- grp_start_p = pc;
- grp_sentinel = OMP_CLAUSE_CHAIN (c);
+ {
+   t = OMP_CLAUSE_DECL (c);
+   if (TREE_CODE (t) == TREE_LIST)
+ {
+   grp_start_p = pc;
+   grp_sentinel = OMP_CLAUSE_CHAIN (c);
 
- if (handle_omp_array_sections (c, ort))
-   remove = true;
- else
-   {
- t = OMP_CLAUSE_DECL (c);
- if (!omp_mappable_type (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "array section does not have mappable type "
-   "in %qs clause",
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- else if (TYPE_ATOMIC (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "%<_Atomic%> %qE in %qs clause", t,
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- while (TREE_CODE (t) == ARRAY_REF)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   {
- do
-   {
- t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == MEM_REF
- || TREE_CODE (t) == INDIRECT_REF)
-   {
- t = TREE_OPERAND (t, 0);
- STRIP_NOPS (t);
- if (TREE_CODE (t) == POINTER_PLUS_EXPR)
-   t = TREE_OPERAND (t, 0);
-   }
-   }
- while (TREE_CODE (t) == COMPONENT_REF
-|| TREE_CODE (t) == ARRAY_REF);
-
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
- && OMP_CLAUSE_MAP_IMPLICIT (c)
- && (bitmap_bit_p (&map_head, DECL_UID (t))
- || bitmap_bit_p (&map_field_head, DECL_UID (t))
- || bitmap_bit_p (&map_firstprivate_head,
-  DECL_UID (t))))
-   {
- remove = true;
- break;
-   }
- if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
-   break;
- if (bitmap_bit_p (&map_head, DECL_UID (t)))
-   {
- if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in motion "
- "clauses", t);
- else if (ort == C_ORT_ACC)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in data "
- "clauses", t);
- else
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in map "
- "clauses", t);
- remove = true;
-   }
- else
-   {
- bitmap_set_bit (&map_head, DECL_UID (t));
- bitmap_set_bit (&map_field_head, DECL_UID (t));
-   }
-   }
-   }
- if 

[PATCH 07/14] OpenMP: implicitly map base pointer for array-section pointer components

2023-06-19 Thread Julian Brown
Following from discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html

and also upstream OpenMP issue 342, this patch changes mapping for array
sections of pointer components on compute regions like this:

  #pragma omp target map(s.ptr[0:10])
  {
...use of 's'...
  }

so the base pointer 's.ptr' is implicitly mapped, and thus pointer
attachment happens.  This is subtly different in the "enter data"
case, e.g:

  #pragma omp target enter data map(s.ptr[0:10])

if 's.ptr' (or the whole of 's') is not present on the target before
the directive is executed, the array section is copied to the target
but pointer attachment does *not* take place, since 's' (or 's.ptr')
is not mapped implicitly for "enter data".

To get a pointer attachment with "enter data", you can do, e.g:

  #pragma omp target enter data map(s.ptr, s.ptr[0:10])

  #pragma omp target
  {
...implicit use of 's'...
  }

That is, once the attachment has happened, implicit mapping of 's'
and uses of 's.ptr[...]' work correctly in the target region.
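
As a rough illustration of the two cases described above, here is a minimal
C sketch.  The struct layout and the names 'struct S', 'make_s',
'sum_compute_region' and 'sum_after_enter_data' are made up for this example
and are not taken from the patch or its testcases.  The explicit "to:" and
"release:" map types are likewise a choice made for the sketch.  When built
without -fopenmp the pragmas are ignored and the code simply runs on the
host, which should give the same result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct with a pointer component, for illustration only.  */
struct S
{
  int *ptr;
  int n;
};

/* Allocate ten ints and fill them with 0..9.  */
static struct S
make_s (void)
{
  struct S s;
  s.n = 10;
  s.ptr = (int *) malloc (s.n * sizeof (int));
  assert (s.ptr != NULL);
  for (int i = 0; i < s.n; i++)
    s.ptr[i] = i;
  return s;
}

/* Compute region: only the array section is mapped explicitly.  With the
   behaviour described above, the base pointer 's.ptr' is mapped implicitly,
   so pointer attachment takes place.  */
int
sum_compute_region (void)
{
  struct S s = make_s ();
  int sum = 0;

#pragma omp target map(s.ptr[0:10]) map(tofrom: sum)
  {
    for (int i = 0; i < 10; i++)
      sum += s.ptr[i];
  }

  free (s.ptr);
  return sum;
}

/* "enter data": 's.ptr' is listed explicitly next to the array section so
   that attachment happens; the later target region then maps 's'
   implicitly.  */
int
sum_after_enter_data (void)
{
  struct S s = make_s ();
  int sum = 0;

#pragma omp target enter data map(to: s.ptr, s.ptr[0:10])

#pragma omp target map(tofrom: sum)
  {
    for (int i = 0; i < 10; i++)
      sum += s.ptr[i];
  }

#pragma omp target exit data map(release: s.ptr, s.ptr[0:10])

  free (s.ptr);
  return sum;
}
```

Both functions add up the values 0..9, so each is expected to return 45
whether or not the region is actually offloaded.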

ChangeLog

2022-12-12  Julian Brown  

gcc/
* gimplify.cc (omp_accumulate_sibling_list): Don't require
explicitly-mapped base pointer for compute regions.

gcc/testsuite/
* c-c++-common/gomp/target-implicit-map-2.c: Update expected scan output.

libgomp/
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c/target-22.c: Remove explicit base pointer
mappings.
---
 gcc/gimplify.cc   |  9 ++--
 .../c-c++-common/gomp/target-implicit-map-2.c |  3 +-
 .../target-implicit-map-2.c   |  2 +
 .../target-implicit-map-5.c   | 50 +++
 .../libgomp.c-c++-common/target-map-zlas-1.c  | 36 +
 libgomp/testsuite/libgomp.c/target-22.c   |  3 +-
 6 files changed, 97 insertions(+), 6 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-zlas-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9be5d9c5328..6a43c792450 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10696,6 +10696,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
   poly_int64 cbitpos;
   tree ocd = OMP_CLAUSE_DECL (grp_end);
   bool openmp = !(region_type & ORT_ACC);
+  bool target = (region_type & ORT_TARGET) != 0;
   tree *continue_at = NULL;
 
   while (TREE_CODE (ocd) == ARRAY_REF)
@@ -10800,9 +10801,9 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
}
 
  /* For OpenMP semantics, we don't want to implicitly allocate
-space for the pointer here.  A FRAGILE_P node is only being
-created so that omp-low.cc is able to rewrite the struct
-properly.
+space for the pointer here for non-compute regions (e.g. "enter
+data").  A FRAGILE_P node is only being created so that
+omp-low.cc is able to rewrite the struct properly.
 For references (to pointers), we want to actually allocate the
 space for the reference itself in the sorted list following the
 struct node.
@@ -10810,6 +10811,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 mapping of the attachment point, but not otherwise.  */
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
@@ -11122,6 +11124,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
diff --git a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c 
b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
index 5ba1d7efe08..72df5b1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
@@ -49,4 +49,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(tofrom:a 
\[len: [0-9]+\]\[implicit\]\)} "gimple" } } */
 
-/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 1\]\) map\(alloc:a\.ptr \[len: 0\]\) map\(tofrom:\*_[0-9]+ \[len: 
[0-9]+\]\) map\(attach:a\.ptr \[bias: 0\]\)} "gimple" } } */
+/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 

[PATCH 03/14] Revert "Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)"

2023-06-19 Thread Julian Brown
This reverts commit a84b89b8f070f1efe86ea347e98d57e6bc32ae2d.

Relevant tests are temporarily disabled or XFAILed.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.cc (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.

gcc/fortran/
Revert:
* trans-openmp.cc: Handle implicit "present".

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: Partly disable test.
* gfortran.dg/goacc/pr70828.f90: Likewise.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: XFAIL test.
* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: XFAIL test.
---
 gcc/fortran/trans-openmp.cc   |  10 +-
 gcc/gimplify.cc   | 143 +-
 gcc/omp-low.cc|  10 +-
 .../c-c++-common/goacc/acc-data-chain.c   |   4 +-
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90   |   3 +-
 .../libgomp.oacc-c-c++-common/pr70828-2.c |   2 +
 .../libgomp.oacc-c-c++-common/pr70828.c   |   2 +
 .../libgomp.oacc-fortran/pr70828-2.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-3.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-4.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-5.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-6.f90|   2 +
 .../libgomp.oacc-fortran/pr70828.f90  |   2 +
 13 files changed, 28 insertions(+), 158 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 96e91a3bc50..809b96bc220 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,13 +1587,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  An exception is if the array is
- mapped explicitly in an enclosing data construct for OpenACC, in which
- case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
- error.  */
-  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
-  && TREE_CODE (decl) == PARM_DECL
+  /* Assumed-size arrays can't be mapped implicitly, they have to be
+ mapped explicitly using array sections.  */
+  if (TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 80f1f3a657f..e3384c7f65b 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -218,17 +218,6 @@ enum gimplify_defaultmap_kind
   GDMK_POINTER
 };
 
-/* Used to record clauses representing array slices on data directives that
-   may affect implicit mapping semantics on enclosed OpenACC parallel/kernels
-   regions.  PSET is used for Fortran array slices with array descriptors,
-   or NULL otherwise.  */
-struct oacc_array_mapping_info
-{
-  tree mapping;
-  tree pset;
-  tree pointer;
-};
-
 struct gimplify_omp_ctx
 {
   struct gimplify_omp_ctx *outer_context;
@@ -250,7 +239,6 @@ struct gimplify_omp_ctx
   bool in_for_exprs;
   bool ompacc;
   int defaultmap[5];
-  hash_map *decl_data_clause;
 };
 
 struct privatize_reduction
@@ -485,7 +473,6 @@ new_omp_context (enum omp_region_type region_type)
   c->defaultmap[GDMK_AGGREGATE] = GOVD_MAP;
   c->defaultmap[GDMK_ALLOCATABLE] = GOVD_MAP;
   c->defaultmap[GDMK_POINTER] = GOVD_MAP;
-  c->decl_data_clause = NULL;
 
   return c;
 }
@@ -498,8 +485,6 @@ delete_omp_context (struct gimplify_omp_ctx *c)
   splay_tree_delete (c->variables);
   delete c->privatized_types;
   c->loop_iter_var.release ();
-  if (c->decl_data_clause)
-delete c->decl_data_clause;
   XDELETE (c);
 }
 
@@ -11235,41 +11220,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
case OMP_TARGET:
  break;
case OACC_DATA:
- {
-   tree base_ptr = OMP_CLAUSE_CHAIN (c);
-   tree pset = NULL;
-   

[PATCH 01/14] Revert "Assumed-size arrays with non-lexical data mappings"

2023-06-19 Thread Julian Brown
This reverts commit 72733f6e6f6ec1bb9884fea8bfbebd3de03d9374.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.cc (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
---
 gcc/fortran/trans-openmp.cc | 22 +-
 gcc/gimplify.cc | 14 --
 gcc/omp-low.cc  |  5 -
 3 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e8f3b24e5f8..e55c8292d05 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1588,18 +1588,10 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree decl = OMP_CLAUSE_DECL (c);
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  For OpenACC this restriction is lifted
- if the array has already been mapped:
-
-   - Using a lexically-enclosing data region: in that case we see the
- GOMP_MAP_FORCE_PRESENT mapping kind here.
-
-   - Using a non-lexical data mapping ("acc enter data").
-
- In the latter case we change the mapping type to GOMP_MAP_FORCE_PRESENT.
- This raises an error for OpenMP in the caller
- (gimplify.c:gimplify_adjust_omp_clauses_1).  OpenACC will raise a runtime
- error if the implicitly-referenced assumed-size array is not mapped.  */
+ explicitly using array sections.  An exception is if the array is
+ mapped explicitly in an enclosing data construct for OpenACC, in which
+ case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
+ error.  */
   if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
   && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
@@ -1607,7 +1599,11 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
-OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+{
+  error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+  return;
+}
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
 return;
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 09c596f026e..3729b986801 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -12828,26 +12828,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
   *list_p = clause;
   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
   gimplify_omp_ctxp = ctx->outer_context;
-  gomp_map_kind kind = (code == OMP_CLAUSE_MAP) ? OMP_CLAUSE_MAP_KIND (clause)
-   : (gomp_map_kind) GOMP_MAP_LAST;
   /* Don't call omp_finish_clause on implicitly added OMP_CLAUSE_PRIVATE
  in simd.  Those are only added for the local vars inside of simd body
  and they don't need to be e.g. default constructible.  */
   if (code != OMP_CLAUSE_PRIVATE || ctx->region_type != ORT_SIMD) 
 lang_hooks.decls.omp_finish_clause (clause, pre_p,
(ctx->region_type & ORT_ACC) != 0);
-  /* Allow OpenACC to have implicit assumed-size arrays via FORCE_PRESENT,
- which should work as long as the array has previously been mapped
- explicitly on the target (e.g. by "enter data").  Raise an error for
- OpenMP.  */
-  if (lang_GNU_Fortran ()
-  && code == OMP_CLAUSE_MAP
-  && (ctx->region_type & ORT_ACC) == 0
-  && kind == GOMP_MAP_TOFROM
-  && OMP_CLAUSE_MAP_KIND (clause) == GOMP_MAP_FORCE_PRESENT)
-error_at (OMP_CLAUSE_LOCATION (clause),
- "implicit mapping of assumed size array %qD",
- OMP_CLAUSE_DECL (clause));
   if (gimplify_omp_ctxp)
 for (; clause != chain; clause = OMP_CLAUSE_CHAIN (clause))
   if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 3424eba2217..59143d8efe5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14353,11 +14353,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
  s = OMP_CLAUSE_SIZE (c);
if (s == NULL_TREE)
  s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
-   /* Fortran assumed-size arrays have zero size because the type is
-  incomplete.  Set the size to one to allow the runtime to remap
-  any existing data that is already present on the accelerator.  */
-   if (s == NULL_TREE && is_gimple_omp_oacc (ctx->stmt))
- s = integer_one_node;

[PATCH 02/14] Revert "Fix references declared in lexically-enclosing OpenACC data region"

2023-06-19 Thread Julian Brown
This reverts commit c9cd2bac6a5127a01c6f47e5636a926ac39b5e21.

2023-06-16  Julian Brown  

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.
---
 gcc/fortran/trans-openmp.cc |  2 +-
 gcc/gimplify.cc | 55 +
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e55c8292d05..96e91a3bc50 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1611,7 +1611,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   tree present = gfc_omp_check_optional_argument (decl, true);
   tree orig_decl = NULL_TREE;
-  if (DECL_P (decl) && POINTER_TYPE_P (TREE_TYPE (decl)))
+  if (POINTER_TYPE_P (TREE_TYPE (decl)))
 {
   if (!gfc_omp_privatize_by_reference (decl)
  && !GFC_DECL_GET_SCALAR_POINTER (decl)
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 3729b986801..80f1f3a657f 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -227,7 +227,6 @@ struct oacc_array_mapping_info
   tree mapping;
   tree pset;
   tree pointer;
-  tree ref;
 };
 
 struct gimplify_omp_ctx
@@ -11248,9 +11247,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  }
if (base_ptr
&& OMP_CLAUSE_CODE (base_ptr) == OMP_CLAUSE_MAP
-   && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALLOC
-|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER))
&& OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
&& ((OMP_CLAUSE_MAP_KIND (base_ptr)
 == GOMP_MAP_FIRSTPRIVATE_POINTER)
@@ -11269,19 +11265,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
ai.mapping = unshare_expr (c);
ai.pset = pset ? unshare_expr (pset) : NULL;
ai.pointer = unshare_expr (base_ptr);
-   ai.ref = NULL_TREE;
-   if (TREE_CODE (base_addr) == INDIRECT_REF
-   && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base_addr, 0)))
-   == REFERENCE_TYPE))
- {
-   base_addr = TREE_OPERAND (base_addr, 0);
-   tree ref_clause = OMP_CLAUSE_CHAIN (base_ptr);
-   gcc_assert ((OMP_CLAUSE_CODE (ref_clause)
-== OMP_CLAUSE_MAP)
-   && (OMP_CLAUSE_MAP_KIND (ref_clause)
-   == GOMP_MAP_POINTER));
-   ai.ref = unshare_expr (ref_clause);
- }
ctx->decl_data_clause->put (base_addr, ai);
  }
if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
@@ -12464,15 +12447,11 @@ gomp_oacc_needs_data_present (tree decl)
   && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
 return NULL;
 
-  tree type = TREE_TYPE (decl);
-  if (TREE_CODE (type) == REFERENCE_TYPE)
-type = TREE_TYPE (type);
-
-  if (TREE_CODE (type) != ARRAY_TYPE
-  && TREE_CODE (type) != POINTER_TYPE
-  && TREE_CODE (type) != RECORD_TYPE
-  && (TREE_CODE (type) != POINTER_TYPE
- || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
+  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != RECORD_TYPE
+  && (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+ || TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != ARRAY_TYPE))
 return NULL;
 
   decl = get_base_address (decl);
@@ -12626,12 +12605,6 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
 {
   tree mapping = array_info->mapping;
   tree pointer = array_info->pointer;
-  gomp_map_kind presence_kind = GOMP_MAP_FORCE_PRESENT;
-  bool no_alloc = (OMP_CLAUSE_CODE (mapping) == OMP_CLAUSE_MAP
-  && OMP_CLAUSE_MAP_KIND (mapping) == GOMP_MAP_IF_PRESENT);
-
-  if (no_alloc || omp_check_optional_argument (decl, false))
-presence_kind = GOMP_MAP_IF_PRESENT;
 
   if (code == OMP_CLAUSE_FIRSTPRIVATE)
/* Oops, we have the wrong type of clause.  Rebuild it.  */
@@ -12639,15 +12612,14 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
 

[PATCH 04/14] Revert "openmp: Handle C/C++ array reference base-pointers in array sections"

2023-06-19 Thread Julian Brown
This reverts commit 3385743fd2fa15a2a750a29daf6d4f97f5aad0ae.

2023-06-16  Julian Brown  

Revert:
2022-02-24  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-enter-data-1.c: Adjust testcase.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
---
 gcc/c/c-typeck.cc | 27 +
 gcc/cp/semantics.cc   | 28 +
 .../c-c++-common/gomp/target-enter-data-1.c   |  3 +-
 .../libgomp.c-c++-common/ptr-attach-2.c   | 60 ---
 4 files changed, 3 insertions(+), 115 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.c-c++-common/ptr-attach-2.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 450214556f9..9591d67251e 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14113,10 +14113,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
maybe_zero_len = true;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -14238,9 +14234,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (non_contiguous)
{
@@ -14288,23 +14281,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
-  tree aref = t;
-  for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
   first = c_fully_fold (first, false, NULL);
   OMP_CLAUSE_DECL (c) = first;
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
@@ -14339,8 +14315,7 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  break;
}
   tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF
- || TREE_CODE (t) == INDIRECT_REF)
+  if (TREE_CODE (t) == COMPONENT_REF)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e7bda6fa060..93ff7cf5e1b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5605,10 +5605,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (processing_template_decl && maybe_zero_len)
return false;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -5728,9 +5724,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (!processing_template_decl)
{
@@ -5782,24 +5775,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
- tree aref = t;
- for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = convert_from_reference (aref);
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
  OMP_CLAUSE_DECL (c) = first;
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
return false;
@@ -5841,8 +5816,7 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  bool reference_always_pointer = true;
  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
  OMP_CLAUSE_MAP);
- 

[PATCH 00/14] [og13] OpenMP/OpenACC: map clause and OMP gimplify rework

2023-06-19 Thread Julian Brown
This series (for the og13 branch) is a rebased and merged version of
the first few patches of the series previously sent upstream for mainline:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

The series contains patches 1-6 and the parts of 8 (C++
"declare mapper" support) that pertain to reorganisation of
gimplify.cc:gimplify_{scan,adjust}_omp_clauses.

The series also contains reversions and rewrites of several patches
that needed adjustment in order to fit in with the new clause-processing
arrangements.

Tested with offloading to AMD GCN. I will apply shortly.

Thanks,

Julian

Julian Brown (14):
  Revert "Assumed-size arrays with non-lexical data mappings"
  Revert "Fix references declared in lexically-enclosing OpenACC data
region"
  Revert "Fix implicit mapping for array slices on lexically-enclosing
data constructs (PR70828)"
  Revert "openmp: Handle C/C++ array reference base-pointers in array
sections"
  OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
{c_}finish_omp_clause
  OpenMP/OpenACC: Rework clause expansion and nested struct handling
  OpenMP: implicitly map base pointer for array-section pointer
components
  OpenMP: Pointers and member mappings
  OpenMP/OpenACC: Unordered/non-constant component offset runtime
diagnostic
  OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc
  OpenACC: Reimplement "inheritance" for lexically-nested offload
regions
  OpenACC: "declare create" fixes wrt. "allocatable" variables
  OpenACC: Allow implicit uses of assumed-size arrays in offload regions
  OpenACC: Improve implicit mapping for non-lexically nested offload
regions

 gcc/c-family/c-common.h   |   74 +-
 gcc/c-family/c-omp.cc |  837 -
 gcc/c/c-parser.cc |   17 +-
 gcc/c/c-typeck.cc |  773 ++--
 gcc/cp/parser.cc  |   17 +-
 gcc/cp/pt.cc  |4 +-
 gcc/cp/semantics.cc   | 1065 +++---
 gcc/fortran/dependency.cc |  128 +
 gcc/fortran/dependency.h  |1 +
 gcc/fortran/gfortran.h|1 +
 gcc/fortran/trans-openmp.cc   |  376 +-
 gcc/gimplify.cc   | 2239 
 gcc/omp-general.cc|  424 +++
 gcc/omp-general.h |   69 +
 gcc/omp-low.cc|   23 +-
 .../c-c++-common/goacc/acc-data-chain.c   |2 +-
 .../c-c++-common/goacc/combined-reduction.c   |2 +-
 .../c-c++-common/goacc/reduction-1.c  |4 +-
 .../c-c++-common/goacc/reduction-10.c |9 +-
 .../c-c++-common/goacc/reduction-2.c  |4 +-
 .../c-c++-common/goacc/reduction-3.c  |4 +-
 .../c-c++-common/goacc/reduction-4.c  |4 +-
 gcc/testsuite/c-c++-common/gomp/clauses-2.c   |2 +-
 gcc/testsuite/c-c++-common/gomp/target-50.c   |2 +-
 .../c-c++-common/gomp/target-enter-data-1.c   |4 +-
 .../c-c++-common/gomp/target-implicit-map-2.c |3 +-
 .../g++.dg/gomp/static-component-1.C  |   23 +
 gcc/testsuite/gcc.dg/gomp/target-3.c  |2 +-
 .../gfortran.dg/goacc/assumed-size.f90|   35 +
 .../gfortran.dg/goacc/loop-tree-1.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-9.f90  |2 +-
 .../gfortran.dg/gomp/map-subarray-2.f90   |   57 +
 .../gfortran.dg/gomp/map-subarray.f90 |   40 +
 gcc/tree-pretty-print.cc  |3 +
 gcc/tree.h|8 +
 include/gomp-constants.h  |9 +-
 libgomp/oacc-mem.c|6 +-
 libgomp/target.c  |   91 +-
 libgomp/testsuite/libgomp.c++/baseptrs-3.C|  275 ++
 libgomp/testsuite/libgomp.c++/baseptrs-4.C| 3154 +
 libgomp/testsuite/libgomp.c++/baseptrs-5.C|   62 +
 libgomp/testsuite/libgomp.c++/class-array-1.C |   59 +
 libgomp/testsuite/libgomp.c++/target-48.C |   32 +
 libgomp/testsuite/libgomp.c++/target-49.C |   37 +
 .../libgomp.c-c++-common/baseptrs-1.c |   50 +
 .../libgomp.c-c++-common/baseptrs-2.c |   70 +
 .../map-arrayofstruct-1.c |   38 +
 .../map-arrayofstruct-2.c |   58 +
 .../map-arrayofstruct-3.c |   68 +
 .../target-implicit-map-2.c   |2 +
 .../target-implicit-map-5.c   |   50 +
 .../libgomp.c-c++-common/target-map-zlas-1.c  |   36 +
 .../libgomp.fortran/map-subarray-2.f90|  108 +
 .../libgomp.fortran/map-subarray-3.f90|   62 +
 .../libgomp.fortran/map-subarray-4.f90|   35 +
 .../libgomp.fortran/map-subarray-5.f90|   54 +
 .../libgomp.fortran/map-subarray-6.f90|   26 +
 

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Richard Sandiford via Gcc-patches
Jeff Law  writes:
> On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
>> IVOPTs has strip_offset which suffers from the same issues regarding
>> integer overflow that split_constant_offset did but the latter was
>> fixed quite some time ago.  The following implements strip_offset
>> in terms of split_constant_offset, removing the redundant and
>> incorrect implementation.
>> 
>> The implementations are not exactly the same, strip_offset relies
>> on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
>> simply assumes those do not happen and truncates them.  By
>> the same means strip_offset also handles POLY_INT_CSTs but
>> split_constant_offset does not.  Massaging the latter to
>> behave like strip_offset in those cases might be the way to go?
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> Comments?
>> 
>> Thanks,
>> Richard.
>> 
>>  PR tree-optimization/110243
>>  * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
>>  (strip_offset): Make it a wrapper around split_constant_offset.
>> 
>>  * gcc.dg/torture/pr110243.c: New testcase.
> Your call -- IMHO you know this code far better than I.

+1, but LGTM FWIW.  I couldn't see anything obvious (and valid)
that strip_offset_1 handles and split_constant_offset doesn't.

Thanks,
Richard
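
For readers unfamiliar with the class of problem being discussed, the
following C sketch shows why a constant offset cannot always be pulled out
through a conversion when the inner addition may wrap.  It is not the
PR110243 testcase and does not reproduce the IVOPTs code; the function names
are invented for illustration, and it assumes 'unsigned long long' is wider
than 'unsigned int'.

```c
#include <assert.h>
#include <limits.h>

/* The addition is done in 'unsigned int' and may wrap before widening.  */
unsigned long long
add_then_widen (unsigned int x)
{
  return (unsigned long long) (x + 1u);
}

/* The constant has been split off and applied after the conversion.  */
unsigned long long
widen_then_add (unsigned int x)
{
  return (unsigned long long) x + 1u;
}

/* Returns nonzero if splitting the offset out of the conversion preserves
   the value for this particular X.  */
int
split_preserves_value (unsigned int x)
{
  /* The sketch only makes sense if the wide type is really wider.  */
  assert (sizeof (unsigned long long) > sizeof (unsigned int));
  return add_then_widen (x) == widen_then_add (x);
}

/* Returns nonzero if the two forms disagree at the wrap-around point.  */
int
split_breaks_at_uint_max (void)
{
  return !split_preserves_value (UINT_MAX);
}
```

The two forms agree for most inputs but differ when x is UINT_MAX, which is
why an offset-splitting routine has to be careful about conversions and
overflow.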


Re: [PATCH v2] RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/14/23 01:57, Jin Ma wrote:

To prevent interrupt functions from changing the FCSR, it needs to be saved
at the beginning of the function and restored at the end.
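
The idea can be sketched on the host with a plain variable standing in for
the FCSR.  This is only a simulation for illustration: 'fake_fcsr' and the
handler names are hypothetical, real hardware uses CSR accesses (the patch
adds frcsr/fscsr patterns) rather than memory, and the compiler emits the
save and restore itself in the prologue and epilogue.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the RISC-V FCSR; on real hardware this is a CSR.  */
static uint32_t fake_fcsr;

/* Models floating-point work inside the handler: it sets the inexact
   flag (NX, bit 0) and switches the rounding mode field (bits 7:5)
   to round-towards-zero (0b001).  */
static void
handler_body (void)
{
  fake_fcsr |= 0x1u;
  fake_fcsr = (fake_fcsr & ~0xe0u) | (0x1u << 5);
}

/* Handler without any save/restore: the interrupted code sees the
   modified register afterwards.  */
void
handler_unprotected (void)
{
  handler_body ();
}

/* Handler with the save/restore described above.  */
void
handler_protected (void)
{
  uint32_t saved = fake_fcsr;	/* Prologue: read the register.  */
  handler_body ();
  fake_fcsr = saved;		/* Epilogue: write it back.  */
}

/* Run HANDLER with the register preset to INITIAL and return the value
   the interrupted code would observe afterwards.  */
uint32_t
fcsr_after (void (*handler) (void), uint32_t initial)
{
  assert (handler != 0);
  fake_fcsr = initial;
  handler ();
  return fake_fcsr;
}
```

With the protected handler the interrupted code observes its original value;
with the unprotected one it sees the flag and rounding-mode changes made by
the handler.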

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for 
FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt 
functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.

Thanks.  I pushed this to the trunk.
jeff
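
The save/restore idea can be sketched in portable C. This is only a model: the FCSR is simulated by an ordinary variable, and the handler and wrapper names are hypothetical. The actual patch emits the save and restore in the function prologue and epilogue.

```c
#include <stdint.h>

/* Simulated floating-point control and status register.  On real RISC-V
   hardware this state lives in the FCSR CSR, not in a C variable.  */
static uint32_t sim_fcsr;

/* Hypothetical handler body that clobbers rounding-mode and flag bits.  */
static void
handler_body (void)
{
  sim_fcsr = 0xffu;
}

/* Wrapper mimicking what the prologue and epilogue of an interrupt
   function would do: save the register on entry, restore it on exit.  */
static void
interrupt_entry (void (*body) (void))
{
  uint32_t saved = sim_fcsr;
  body ();
  sim_fcsr = saved;
}
```

With the wrapper in place, the interrupted code sees the same rounding mode and exception flags after the handler returns as it did before.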


[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches


Kewen, GCC maintainers:

Version 6, Fixed missing change log entry.  Changed builtin id names as
requested.  Missed making the change on the last version.  Fixed
comment in the three test cases.  Reran regression suite on Power 10,
no regressions.

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.
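
As a rough illustration of what extracting and inserting the exponent means at the bit level, here is a portable sketch over the raw IEEE 754 binary128 encoding. It does not use the builtins above; the struct and function names are hypothetical, and it handles only the stored fraction field, so how the hardware instruction treats the implicit leading bit is not modeled here.

```c
#include <stdint.h>

/* Hypothetical container for the raw bits of an IEEE 754 binary128 value:
   HI holds the sign (bit 63), the 15-bit biased exponent (bits 62..48) and
   the top 48 fraction bits; LO holds the low 64 fraction bits.  */
struct f128_bits
{
  uint64_t hi;
  uint64_t lo;
};

#define F128_EXP_SHIFT 48
#define F128_EXP_MASK  UINT64_C (0x7fff)

/* Return the 15-bit biased exponent field.  */
static uint64_t
f128_extract_exp (struct f128_bits v)
{
  return (v.hi >> F128_EXP_SHIFT) & F128_EXP_MASK;
}

/* Return the stored fraction bits with sign and exponent cleared.  */
static struct f128_bits
f128_extract_frac (struct f128_bits v)
{
  v.hi &= (UINT64_C (1) << F128_EXP_SHIFT) - 1;
  return v;
}

/* Replace the exponent field of V with the low 15 bits of EXP.  */
static struct f128_bits
f128_insert_exp (struct f128_bits v, uint64_t exp)
{
  v.hi &= ~(F128_EXP_MASK << F128_EXP_SHIFT);
  v.hi |= (exp & F128_EXP_MASK) << F128_EXP_SHIFT;
  return v;
}
```

For example, the encoding of 1.0 has hi = 0x3fff000000000000, so its biased exponent is 0x3fff; inserting 0x4000 yields the encoding of 2.0.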

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 

Re: [PATCH] Introduce hardbool attribute for C

2023-06-19 Thread Bernhard Reutner-Fischer via Gcc-patches
On 16 June 2023 07:35:27 CEST, Alexandre Oliva via Gcc-patches 
 wrote:

index 0..634feaed4deef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/hardbool-err.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+typedef _Bool __attribute__ ((__hardbool__))
+hbbl; /* { dg-error "integral types" } */
+
+typedef double __attribute__ ((__hardbool__))
+hbdbl; /* { dg-error "integral types" } */
+
+enum x;
+typedef enum x __attribute__ ((__hardbool__))
+hbenum; /* { dg-error "integral types" } */
+
+struct s;
+typedef struct s __attribute__ ((__hardbool__))
+hbstruct; /* { dg-error "integral types" } */
+
+typedef int __attribute__ ((__hardbool__ (0, 0)))
+hb00; /* { dg-error "different values" } */
+
+typedef int __attribute__ ((__hardbool__ (4, 16))) hb4x;
+struct s {
+ hb4x m:2;
+}; /* { dg-error "is a GCC extension|different values" } */
+/* { dg-warning "changes value" "warning" { target *-*-* } .-1 } */
+
+hb4x __attribute__ ((vector_size (4 * sizeof (hb4x))))
+vvar; /* { dg-error "invalid vector type" } */

Arm-chair, tinfoil hat still on, didn't look closely, hence:

I don't see explicit tests with _Complex nor __complex__. Would we want to 
check these here, or are they handled through the "underlying" tests above?

I'd welcome a fortran interop note in the docs as hinted previously to cover 
out of the box behavior. It's probably reasonably unlikely but better be safe 
than sorry?
cheers,
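
For context, the idea behind a hardened boolean can be modeled in plain C without the attribute. This is a sketch under assumptions: the encodings 0x5a and 0xa5 and all names below are hypothetical, and the behavior modeled (exactly two distinct valid representations, anything else is rejected) is my reading of the tests quoted above, not the attribute's implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned char hb_t;

/* Hypothetical encodings.  The two values must differ, which mirrors the
   "different values" diagnostic exercised by the quoted test.  */
enum { HB_FALSE = 0x5a, HB_TRUE = 0xa5 };

/* Encode an ordinary boolean.  */
static hb_t
hb_from_bool (bool b)
{
  return b ? HB_TRUE : HB_FALSE;
}

/* Return true if RAW is one of the two valid representations.  */
static bool
hb_valid_p (hb_t raw)
{
  return raw == HB_FALSE || raw == HB_TRUE;
}

/* Decode RAW, aborting on a corrupted representation instead of treating
   every nonzero value as true.  */
static bool
hb_to_bool (hb_t raw)
{
  if (!hb_valid_p (raw))
    abort ();
  return raw == HB_TRUE;
}
```

A value such as 1, which an ordinary boolean would read as true, is rejected here because it matches neither encoding.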


Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> > 



> Hi Carl,
> 
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
> 
> Missing an entry for DI_to_TI.

Oops, missed that.  Sorry, fixed.

> > 



> 
> >  
> >const signed long long __builtin_vsx_scalar_extract_expq
> > (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > +  vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
> 
> As I pointed out previously, the related id is VSEEQP, since both of
> them

Oops, I guess I forgot to change that.  Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF?  It's also consistent with
> what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.
> 
> >  
> >const signed __int128 __builtin_vsx_scalar_extract_sigq
> > (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > +  vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
> 
> Similar to the above, s/VSESIGKF/VSESQPV/
 
Changed to VSESQPV.
> 
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned
> > __int128, \
> >   unsigned long
> > long);
> > -VSIEQP xsiexpqp_kf {}
> > +VSIEQP xsiexpqp_kf_di {}
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
> >unsigned
> > long long);
> >  VSIEQPF xsiexpqpf_kf {}
> >  
> > +  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> > +VSIEQPV xsiexpqp_kf_v2di {}
> > +
> >const signed int __builtin_vsx_scalar_test_data_class_qp
> > (_Float128, \
> >  const
> > int<7>);
> >  VSTDCQP xststdcqp_kf {}
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..11060f697db 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >128-bit variant of built-in function.  */
> > if (GET_MODE_PRECISION (arg1_mode) > 64)
> >   {
> > -   /* If first argument is of float variety, choose variant
> > -  that expects __ieee128 argument.  Otherwise, expect
> > -  __int128 argument.  */
> > +   /* If first argument is of float variety, choose the
> > variant that
> > +  expects __ieee128 argument.  If the first argument is
> > vector
> > +  int, choose the variant that expects vector unsigned
> > +  __int128 argument.  Otherwise, expect scalar __int128
> > argument.
> > +   */
> > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> >   instance_code = RS6000_BIF_VSIEQPF;
> > +   else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT)
> > + instance_code = RS6000_BIF_VSIEQPV;
> > else
> >   instance_code = RS6000_BIF_VSIEQP;
> >   }
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..05a5ca6a04d 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -4515,6 +4515,18 @@
> >  VSIEQP
> >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> > long long);
> >  VSIEQPF
> > +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
> > +VSIEQPV
> > +
> > +[VEC_VSEEV, scalar_extract_exp_to_vec, \
> > +__builtin_vec_scalar_extract_exp_to_vector]
> > +  vull __builtin_vec_scalar_extract_exp_to_vector (_Float128);
> > +VSEEXPKF
> > +
> 
> Need to update if the above changes.

changed 
> 
> > +[VEC_VSESV, scalar_extract_sig_to_vec, \
> > +__builtin_vec_scalar_extract_sig_to_vector]
> > +  vuq __builtin_vec_scalar_extract_sig_to_vector (_Float128);
> > +VSESIGKF
> >  
> 
> Ditto.

changed

> 



> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-
> > exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-
> > 8.c
> > new file mode 100644
> > index 000..e24e09012d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-8.c
> > @@ -0,0 +1,58 @@
> > +/* { dg-do run { target { powerpc*-*-* } } } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p9vector_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -save-temps" } */
> > +
> > +#include 
> > +#include 
> > +
> > +#if 

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:

IVOPTs has strip_offset which suffers from the same issues regarding
integer overflow that split_constant_offset did but the latter was
fixed quite some time ago.  The following implements strip_offset
in terms of split_constant_offset, removing the redundant and
incorrect implementation.

The implementations are not exactly the same, strip_offset relies
on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
simply assumes those do not happen and truncates them.  By
the same means strip_offset also handles POLY_INT_CSTs but
split_constant_offset does not.  Massaging the latter to
behave like strip_offset in those cases might be the way to go?

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Comments?

Thanks,
Richard.

PR tree-optimization/110243
* tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
(strip_offset): Make it a wrapper around split_constant_offset.

* gcc.dg/torture/pr110243.c: New testcase.

Your call -- IMHO you know this code far better than I.

jeff


Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 17:06, Juzhe-Zhong wrote:

This patch is a proposal and is **NOT** ready to push, since
after this patch the total number of machine modes will exceed 255, which will cause an ICE
in LTO:
   internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290
Right.  Note that an ack from Jakub or Richi will be sufficient for the 
LTO fixes to go forward.





We need to add VLS modes for the following reasons:
1. Enhance GNU vectors codegen:
For example:
  typedef int32_t vnx8si __attribute__ ((vector_size (32)));

  __attribute__ ((noipa)) void
  f_vnx8si (int32_t * in, int32_t * out)
  {
vnx8si v = *(vnx8si*)in;
*(vnx8si *) out = v;
  }

compile option: --param=riscv-autovec-preference=scalable
 before this patch:
 f_vnx8si:
 ld  a2,0(a0)
 ld  a3,8(a0)
 ld  a4,16(a0)
 ld  a5,24(a0)
 addi sp,sp,-32
 sd  a2,0(a1)
 sd  a3,8(a1)
 sd  a4,16(a1)
 sd  a5,24(a1)
 addi sp,sp,32
 jr  ra

After this patch:
f_vnx8si:
 vsetivli zero,8,e32,m2,ta,ma
 vle32.v v2,0(a0)
 vse32.v v2,0(a1)
 ret

2. Enhance VLA SLP:
void
f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
{
   for (int i = 0; i < 100; ++i)
 {
   a[i * 8] = b[i * 8] + c[i * 8];
   a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
   a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
   a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
   a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
   a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
   a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
   a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
 }
}


..
Loop body:
  ...
  vrgatherei16.vv...
  ...

Tail:
 lbu a4,792(a1)
 lbu a5,792(a2)
 addw a5,a5,a4
 sb  a5,792(a0)
 lbu a5,793(a2)
 addw a5,a5,a4
 sb  a5,793(a0)
 lbu a4,794(a1)
 lbu a5,794(a2)
 addw a5,a5,a4
 sb  a5,794(a0)
 lbu a5,795(a2)
 addw a5,a5,a4
 sb  a5,795(a0)
 lbu a4,796(a1)
 lbu a5,796(a2)
 addw a5,a5,a4
 sb  a5,796(a0)
 lbu a5,797(a2)
 addw a5,a5,a4
 sb  a5,797(a0)
 lbu a4,798(a1)
 lbu a5,798(a2)
 addw a5,a5,a4
 sb  a5,798(a0)
 lbu a5,799(a2)
 addw a5,a5,a4
 sb  a5,799(a0)
 ret

The tail elements need VLS modes to vectorize like ARM SVE:

f:
 mov x3, 0
 cntb x5
 mov x4, 792
 whilelo p7.b, xzr, x4
.L2:
 ld1b z31.b, p7/z, [x1, x3]
 ld1b z30.b, p7/z, [x2, x3]
 trn1 z31.b, z31.b, z31.b
 add z31.b, z31.b, z30.b
 st1b z31.b, p7, [x0, x3]
 add x3, x3, x5
 whilelo p7.b, x3, x4
 b.any   .L2
Tail:
 ldr b31, [x1, 792]
 ldr b27, [x1, 794]
 ldr b28, [x1, 796]
 dup v31.8b, v31.b[0]
 ldr b29, [x1, 798]
 ldr d30, [x2, 792]
 ins v31.b[2], v27.b[0]
 ins v31.b[3], v27.b[0]
 ins v31.b[4], v28.b[0]
 ins v31.b[5], v28.b[0]
 ins v31.b[6], v29.b[0]
 ins v31.b[7], v29.b[0]
 add v31.8b, v30.8b, v31.8b
 str d31, [x0, 792]
 ret

Notice that ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.






gcc/ChangeLog:

 * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
GNU vectors.
 (ADJUST_ALIGNMENT): Ditto.
 (ADJUST_BYTESIZE): Ditto.

 (ADJUST_PRECISION): Ditto.
 (VECTOR_MODES): Ditto.
 * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
 (get_regno_alignment): Ditto.
 * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
 (const_vlmax_p): Ditto.
 (legitimize_move): Ditto.
 (get_vlmul): Ditto.
 (get_regno_alignment): Ditto.
 (get_ratio): Ditto.
 (get_vector_mode): Ditto.
 * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
 * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Ditto.
 (VLS_ENTRY): Ditto.
 (riscv_v_ext_mode_p): Ditto.
 (riscv_hard_regno_nregs): Ditto.
 (riscv_hard_regno_mode_ok): Ditto.
 * config/riscv/riscv.md: Ditto.
 * config/riscv/vector-iterators.md: Ditto.
 * config/riscv/vector.md: Ditto.
 * config/riscv/autovec-vls.md: New file.

---
So I expected we were going to have to define some static length 
patterns at some point.  So this isn't a huge surprise.








diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6421e933ca9..6fc1c433069 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ 

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Richard Biener via Gcc-patches



> On 19.06.2023 at 19:03, Alexander Monakov wrote:
> 
> 
> Ping. OK for trunk?

Ok if the FE maintainers do not object within 48h.

Thanks,
Richard 

>> On Mon, 5 Jun 2023, Alexander Monakov wrote:
>> 
>> Ping for the front-end maintainers' input.
>> 
>>> On Mon, 22 May 2023, Richard Biener wrote:
>>> 
>>> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
>>>  wrote:
 
 Implement -ffp-contract=on for C and C++ without changing default
 behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
>>> 
>>> The documentation changes mention the defaults are changed for
>>> standard modes, I suppose you want to remove that hunk.
>>> 
 gcc/c-family/ChangeLog:
 
* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.
 
 gcc/ChangeLog:
 
* common.opt (fp_contract_mode) [on]: Remove fallback.
* config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
* doc/invoke.texi (-ffp-contract): Update.
* trans-mem.cc (diagnose_tm_1): Skip internal function calls.
 ---
 gcc/c-family/c-gimplify.cc | 78 ++
 gcc/common.opt |  3 +-
 gcc/config/sh/sh.md|  2 +-
 gcc/doc/invoke.texi|  8 ++--
 gcc/trans-mem.cc   |  3 ++
 5 files changed, 88 insertions(+), 6 deletions(-)
 
 diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
 index ef5c7d919f..f7635d3b0c 100644
 --- a/gcc/c-family/c-gimplify.cc
 +++ b/gcc/c-family/c-gimplify.cc
 @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-ubsan.h"
 #include "tree-nested.h"
 #include "context.h"
 +#include "tree-pass.h"
 +#include "internal-fn.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
 @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
 body)
   return bind;
 }
 
 +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
 +
 +static bool
 +fma_supported_p (enum internal_fn fn, tree type)
 +{
 +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
 +}
 +
 /* Gimplification of expression trees.  */
 
 /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
 @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
 ATTRIBUTE_UNUSED,
break;
   }
 
 +case PLUS_EXPR:
 +case MINUS_EXPR:
 +  {
 +   tree type = TREE_TYPE (*expr_p);
 +   /* For -ffp-contract=on we need to attempt FMA contraction only
 +  during initial gimplification.  Late contraction across 
 statement
 +  boundaries would violate language semantics.  */
 +   if (SCALAR_FLOAT_TYPE_P (type)
 +   && flag_fp_contract_mode == FP_CONTRACT_ON
 +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
 +   && fma_supported_p (IFN_FMA, type))
 + {
 +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
 +
 +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
 +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
 +
 +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
 +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
 + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
 +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
 + {
 +   std::swap (op0_p, op1_p);
 +   std::swap (neg_mul, neg_add);
 + }
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
 + {
 +   op0_p = &TREE_OPERAND (*op0_p, 0);
 +   neg_mul = !neg_mul;
 + }
 +   if (TREE_CODE (*op0_p) != MULT_EXPR)
 + break;
 +   auto_vec<tree, 3> ops (3);
 +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
 +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
 +   ops.quick_push (*op1_p);
 +
 +   enum internal_fn ifn = IFN_FMA;
 +   if (neg_mul)
 + {
 +   if (fma_supported_p (IFN_FNMA, type))
 + ifn = IFN_FNMA;
 +   else
 + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
 + }
 +   if (neg_add)
 + {
 +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
 IFN_FNMS;
 +   if (fma_supported_p (ifn2, type))
 + ifn = ifn2;
 +   else
 + 
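
To see why contraction is observable at all, here is a small sketch comparing a multiply that is rounded before the add with a single-rounding evaluation. The function names are hypothetical. The fused variant is only emulated by computing in double, which is exact for the float inputs used in the example, and it is not a general substitute for fma.

```c
/* Product rounded to float before the addition, so no contraction takes
   place.  The volatile store keeps the compiler from fusing the two
   operations.  */
static float
mul_then_add (float a, float b, float c)
{
  volatile float product = a * b;
  return product + c;
}

/* Emulates a single-rounding multiply-add by computing in double, where
   the product of two floats is exact.  */
static float
fused_like (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);
}
```

With a = b = 0x1.000002p0f and c = -0x1.000004p0f, the exact product is 1 + 2^-22 + 2^-46. Rounding it to float drops the 2^-46 term, so mul_then_add returns 0, while the single-rounding evaluation returns 2^-46.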

Re: Tiny phiprop compile time optimization

2023-06-19 Thread Richard Biener via Gcc-patches



> On 19.06.2023 at 20:08, Andrew Pinski via Gcc-patches
> wrote:
> 
> On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
>  wrote:
>> 
>>> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>>> 
>>> Hi,
>>> this patch avoids unnecessary post dominator and update_ssa in phiprop.
>>> 
>>> Bootstrapped/regtested x86_64-linux, OK?
>>> 
>>> gcc/ChangeLog:
>>> 
>>>  * tree-ssa-phiprop.cc (propagate_with_phi): Add 
>>> post_dominators_computed;
>>>  compute post dominators lazily.
>>>  (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
>>>  (pass_phiprop::execute): Update; return TODO_update_ssa if something
>>>  changed.
>>> 
>>> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
>>> index 3cb4900b6be..87e3a2ccf3a 100644
>>> --- a/gcc/tree-ssa-phiprop.cc
>>> +++ b/gcc/tree-ssa-phiprop.cc
>>> @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
>>> 
>>> static bool
>>> propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
>>> - size_t n)
>>> + size_t n, bool *post_dominators_computed)
>>> {
>>>   tree ptr = PHI_RESULT (phi);
>>>   gimple *use_stmt;
>>> @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
>>> phiprop_d *phivn,
>>>   gimple *def_stmt;
>>>   tree vuse;
>>> 
>>> +  if (!*post_dominators_computed)
>>> +{
>>> +   calculate_dominance_info (CDI_POST_DOMINATORS);
>>> +   *post_dominators_computed = true;
>> 
>> I think you can save the parameter by using dom_info_available_p () here
>> and ...
>> 
>>> + }
>>> +
>>>   /* Only replace loads in blocks that post-dominate the PHI node.  That
>>>  makes sure we don't end up speculating loads.  */
>>>   if (!dominated_by_p (CDI_POST_DOMINATORS,
>>> @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
>>>   0, /* properties_provided */
>>>   0, /* properties_destroyed */
>>>   0, /* todo_flags_start */
>>> -  TODO_update_ssa, /* todo_flags_finish */
>>> +  0, /* todo_flags_finish */
>>> };
>>> 
>>> class pass_phiprop : public gimple_opt_pass
>>> @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
>>>   gphi_iterator gsi;
>>>   unsigned i;
>>>   size_t n;
>>> +  bool post_dominators_computed = false;
>>> 
>>>   calculate_dominance_info (CDI_DOMINATORS);
>>> -  calculate_dominance_info (CDI_POST_DOMINATORS);
>>> 
>>>   n = num_ssa_names;
>>>   phivn = XCNEWVEC (struct phiprop_d, n);
>>> @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
>>>   if (bb_has_abnormal_pred (bb))
>>>  continue;
>>>   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
>>> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
>>> + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
>>> +  &post_dominators_computed);
>>> }
>>> 
>>>   if (did_something)
>>> @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
>>> 
>>>   free (phivn);
>>> 
>>> -  free_dominance_info (CDI_POST_DOMINATORS);
>>> +  if (post_dominators_computed)
>>> +free_dominance_info (CDI_POST_DOMINATORS);
>> 
>> unconditionally free_dominance_info here.
>> 
>>> -  return 0;
>>> +  return did_something ? TODO_update_ssa : 0;
>> 
>> I guess that change is following general practice and good to catch
>> undesired changes (update_ssa will exit early when there's nothing
>> to do anyway).
> 
> I wonder if TODO_update_ssa_only_virtuals should be used here rather
> than TODO_update_ssa as the code produces ssa names already and just
> adds memory loads/stores. But I could be wrong.

I guess it should be able to update virtual SSA form itself.  But it’s been 
some time since I wrote the pass …

> 
> Thanks,
> Andrew Pinski
> 
> 
>> 
>> OK with those changes.


Re: Tiny phiprop compile time optimization

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>
> > Hi,
> > this patch avoids unnecessary post dominator and update_ssa in phiprop.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiprop.cc (propagate_with_phi): Add 
> > post_dominators_computed;
> >   compute post dominators lazily.
> >   (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
> >   (pass_phiprop::execute): Update; return TODO_update_ssa if something
> >   changed.
> >
> > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> > index 3cb4900b6be..87e3a2ccf3a 100644
> > --- a/gcc/tree-ssa-phiprop.cc
> > +++ b/gcc/tree-ssa-phiprop.cc
> > @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
> >
> >  static bool
> >  propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> > - size_t n)
> > + size_t n, bool *post_dominators_computed)
> >  {
> >tree ptr = PHI_RESULT (phi);
> >gimple *use_stmt;
> > @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> > phiprop_d *phivn,
> >gimple *def_stmt;
> >tree vuse;
> >
> > +  if (!*post_dominators_computed)
> > +{
> > +   calculate_dominance_info (CDI_POST_DOMINATORS);
> > +   *post_dominators_computed = true;
>
> I think you can save the parameter by using dom_info_available_p () here
> and ...
>
> > + }
> > +
> >/* Only replace loads in blocks that post-dominate the PHI node.  
> > That
> >   makes sure we don't end up speculating loads.  */
> >if (!dominated_by_p (CDI_POST_DOMINATORS,
> > @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
> >0, /* properties_provided */
> >0, /* properties_destroyed */
> >0, /* todo_flags_start */
> > -  TODO_update_ssa, /* todo_flags_finish */
> > +  0, /* todo_flags_finish */
> >  };
> >
> >  class pass_phiprop : public gimple_opt_pass
> > @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
> >gphi_iterator gsi;
> >unsigned i;
> >size_t n;
> > +  bool post_dominators_computed = false;
> >
> >calculate_dominance_info (CDI_DOMINATORS);
> > -  calculate_dominance_info (CDI_POST_DOMINATORS);
> >
> >n = num_ssa_names;
> >phivn = XCNEWVEC (struct phiprop_d, n);
> > @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
> >if (bb_has_abnormal_pred (bb))
> >   continue;
> >for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
> > - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> > + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
> > +  &post_dominators_computed);
> >  }
> >
> >if (did_something)
> > @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
> >
> >free (phivn);
> >
> > -  free_dominance_info (CDI_POST_DOMINATORS);
> > +  if (post_dominators_computed)
> > +free_dominance_info (CDI_POST_DOMINATORS);
>
> unconditionally free_dominance_info here.
>
> > -  return 0;
> > +  return did_something ? TODO_update_ssa : 0;
>
> I guess that change is following general practice and good to catch
> undesired changes (update_ssa will exit early when there's nothing
> to do anyway).

I wonder if TODO_update_ssa_only_virtuals should be used here rather
than TODO_update_ssa as the code produces ssa names already and just
adds memory loads/stores. But I could be wrong.

Thanks,
Andrew Pinski


>
> OK with those changes.


Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 07:16, 钟居哲 wrote:

Thanks for cleaning up the code for the future ABI support patch.
Let's wait for comments from Jeff or Robin.

Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff


Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:

On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
 wrote:


Hi,


With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify 'x + 
0.0' to 'x'.



OK if you bootstrapped / tested this change.
I suspect Toru doesn't have write access.  So I went ahead and did an 
x86 bootstrap & regression test, which passed.  The ChangeLog entry 
needed fleshing out a bit, and I fixed a minor whitespace problem in the 
patch itself.


Pushed to the trunk.


jeff
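
A small sketch of why 'x + 0.0' is not an identity: the addition turns a signaling NaN into a quiet one, and it turns -0.0 into +0.0. The helper names are hypothetical. The NaN part assumes IEEE 754 hardware where a set top fraction bit marks a quiet NaN (as on x86-64 and AArch64) and a build without -ffast-math or similar options.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double.  */
static double
bits_to_double (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Reinterpret a double as its 64-bit pattern.  */
static uint64_t
double_to_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Perform x + 0.0 at run time and return the raw result bits.  The
   volatile objects keep the compiler from evaluating it at compile time.  */
static uint64_t
add_zero_bits (uint64_t in_bits)
{
  volatile double x = bits_to_double (in_bits);
  volatile double r = x + 0.0;
  return double_to_bits (r);
}
```

Passing the signaling NaN pattern 0x7ff0000000000001 should return a NaN with bit 51 set, and passing the pattern of -0.0 returns the pattern of +0.0 under the default rounding mode. Replacing the addition with 'x' would hide both effects.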


Re: [PATCH] debug/110295 - mixed up early/late debug for member DIEs

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/19/23 06:15, Richard Biener wrote:

When we process a scope typedef during early debug creation and
we have already created a DIE for the type when the decl is
TYPE_DECL_IS_STUB and this DIE is still in limbo we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, eventually picking up the now completed
type and creating DIEs for the members.  Instead this is currently
deferred to the second time we come here, when we annotate the
DIEs with locations late where now the type DIE is no longer
in limbo and we fall through doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.
---
  gcc/dwarf2out.cc  |  3 ++-
  gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, 
dw_die_ref context_die)
  
if (die != NULL && die->die_parent == NULL)

  add_child_die (context_die, die);


I wonder about reorganizing the function a bit to unify this parent 
setting with the one a bit below, which already falls through to 
gen_decl_die:



  if (decl && DECL_P (decl))
{
  die = lookup_decl_die (decl);

  /* Early created DIEs do not have a parent as the decls refer 
 to the function as DECL_CONTEXT rather than the BLOCK.  */

  if (die && die->die_parent == NULL)
{
  gcc_assert (in_lto_p);
  add_child_die (context_die, die);
}
}


OK either way.

Jason



Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Manolis Tsamis
On Mon, Jun 19, 2023 at 7:57 PM Thiago Jung Bauermann
 wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich  writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> >> > propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, 
> sp 1
>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
> z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: 

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Alexander Monakov via Gcc-patches


Ping. OK for trunk?

On Mon, 5 Jun 2023, Alexander Monakov wrote:

> Ping for the front-end maintainers' input.
> 
> On Mon, 22 May 2023, Richard Biener wrote:
> 
> > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> >  wrote:
> > >
> > > Implement -ffp-contract=on for C and C++ without changing default
> > > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
> > 
> > The documentation changes mention the defaults are changed for
> > standard modes, I suppose you want to remove that hunk.
> > 
> > > gcc/c-family/ChangeLog:
> > >
> > > * c-gimplify.cc (fma_supported_p): New helper.
> > > (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
> > > contraction.
> > >
> > > gcc/ChangeLog:
> > >
> > > * common.opt (fp_contract_mode) [on]: Remove fallback.
> > > * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
> > > * doc/invoke.texi (-ffp-contract): Update.
> > > * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
> > > ---
> > >  gcc/c-family/c-gimplify.cc | 78 ++
> > >  gcc/common.opt |  3 +-
> > >  gcc/config/sh/sh.md|  2 +-
> > >  gcc/doc/invoke.texi|  8 ++--
> > >  gcc/trans-mem.cc   |  3 ++
> > >  5 files changed, 88 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > > index ef5c7d919f..f7635d3b0c 100644
> > > --- a/gcc/c-family/c-gimplify.cc
> > > +++ b/gcc/c-family/c-gimplify.cc
> > > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "c-ubsan.h"
> > >  #include "tree-nested.h"
> > >  #include "context.h"
> > > +#include "tree-pass.h"
> > > +#include "internal-fn.h"
> > >
> > >  /*  The gimplification pass converts the language-dependent trees
> > >  (ld-trees) emitted by the parser into language-independent trees
> > > @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
> > > body)
> > >return bind;
> > >  }
> > >
> > > +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
> > > +
> > > +static bool
> > > +fma_supported_p (enum internal_fn fn, tree type)
> > > +{
> > > +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> > > +}
> > > +
> > >  /* Gimplification of expression trees.  */
> > >
> > >  /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
> > > @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
> > > ATTRIBUTE_UNUSED,
> > > break;
> > >}
> > >
> > > +case PLUS_EXPR:
> > > +case MINUS_EXPR:
> > > +  {
> > > +   tree type = TREE_TYPE (*expr_p);
> > > +   /* For -ffp-contract=on we need to attempt FMA contraction only
> > > +  during initial gimplification.  Late contraction across 
> > > statement
> > > +  boundaries would violate language semantics.  */
> > > +   if (SCALAR_FLOAT_TYPE_P (type)
> > > +   && flag_fp_contract_mode == FP_CONTRACT_ON
> > > +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
> > > +   && fma_supported_p (IFN_FMA, type))
> > > + {
> > > +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
> > > +
> > > > +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
> > > > +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
> > > +
> > > +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
> > > +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
> > > + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
> > > +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + {
> > > +   std::swap (op0_p, op1_p);
> > > +   std::swap (neg_mul, neg_add);
> > > + }
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
> > > + {
> > > > +   op0_p = &TREE_OPERAND (*op0_p, 0);
> > > +   neg_mul = !neg_mul;
> > > + }
> > > +   if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + break;
> > > > +   auto_vec<tree, 3> ops (3);
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
> > > +   ops.quick_push (*op1_p);
> > > +
> > > +   enum internal_fn ifn = IFN_FMA;
> > > +   if (neg_mul)
> > > + {
> > > +   if (fma_supported_p (IFN_FNMA, type))
> > > + ifn = IFN_FNMA;
> > > +   else
> > > + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
> > > + }
> > > +   if (neg_add)
> > > + {
> > > +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
> > > IFN_FNMS;
> > > +   if (fma_supported_p (ifn2, type))
> > > + ifn = ifn2;
> > > +   else
> > > + ops[2] = 
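
To make the "within one expression, not across statements" rule in the patch comment concrete, here is a minimal sketch in plain C++. It does not use any GCC internals; the function names `fused` and `separate_statements` are hypothetical, and `std::fma` only stands in for what a contracted `x * y + z` may compute.

```cpp
#include <cassert>
#include <cmath>

// What contraction inside a single expression may produce for x * y + z:
// one fused multiply-add with a single rounding at the end.
inline double
fused (double x, double y, double z)
{
  return std::fma (x, y, z);
}

// The product is rounded to double in its own statement before the addition.
// `volatile` is used here only to force that intermediate rounding no matter
// which -ffp-contract default the compiler applies to this example.
inline double
separate_statements (double x, double y, double z)
{
  volatile double t = x * y;
  return t + z;
}
```

With x = y = 1 + 2^-27 and z = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54. The fused form keeps the 2^-54 term, while the two-statement form rounds it away before the addition, so the two results differ.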

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Thiago Jung Bauermann via Gcc-patches


Hello Manolis,

Philipp Tomsich  writes:

> On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
>> > in all cases, due to maybe_mode_change returning NULL. Relax this
>> > restriction and allow propagation when no mode change is requested.
>> >
>> > gcc/ChangeLog:
>> >
>> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
>> > propagation.
>> Thanks for the clarification.  This is OK for the trunk.  It looks
>> generic enough to have value going forward now rather than waiting.
>
> Rebased, retested, and applied to trunk.  Thanks!

Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 
1

Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
-fno-stack-protector  check-function-bodies caller_pred
FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
#8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - 
z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler 

[PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-19 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I also refactored the gimple IR build code to make it cleaner.

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Add 
LEN_MASK_{LOAD,STORE} vectorizer support.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* optabs-query.cc (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  35 +-
 gcc/optabs-query.cc|  25 +++-
 gcc/tree-vect-stmts.cc | 259 +
 3 files changed, 213 insertions(+), 106 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_STORE_LANES:
   return 2;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
   return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+   = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_argno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 276f8408dd7..4394d391200 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -566,11 +566,14 @@ can_vec_mask_load_store_p (machine_mode mode,
   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
+  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
   machine_mode vmode;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
+return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing
+  || convert_optab_handler (len_op, mode, mask_mode)
+   != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
  the mask load/store supported.  */
@@ -584,7 +587,9 @@ can_vec_mask_load_store_p (machine_mode mode,
   vmode = targetm.vectorize.preferred_simd_mode (smode);
   if (VECTOR_MODE_P (vmode)
   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-  && 

Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jan Hubicka via Gcc-patches
> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
> 
> > On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
> > wrote:
> > > - if (max_size() - size() < __n)
> > > -   __throw_length_error(__N(__s));
> > > + const size_type __max_size = max_size();
> > > + // On 64bit systems vectors can not reach overflow by growing
> > > + // by small sizes; before this happens, we will run out of memory.
> > > + if (__builtin_constant_p(__n)
> > > + && __builtin_constant_p(__max_size)
> > > + && sizeof(ptrdiff_t) >= 8
> > > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
> >
> > Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
> >
> 
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
> for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
> 
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it

It is really quite an interesting performance problem which does affect real
code. The extra extern call counts (especially since it is seen as
3 calls by the inliner).

This is _M_check_len after early optimizations (so as seen by inline
heuristics):

  <bb 2> [local count: 1073741824]:
  _15 = this_7(D)->D.26656._M_impl.D.25963._M_finish;
  _14 = this_7(D)->D.26656._M_impl.D.25963._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  <bb 4> [local count: 1073312329]:
  D.27696 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27696(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27696 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto <bb 8>; [35.00%]
  else
    goto <bb 7>; [65.00%]

  <bb 7> [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  <bb 8> [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

So a lot of code that is essentially semantically equivalent to:

   return __size + MAX_EXPR (__n, __size)

at least with the default allocator.
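
As a rough illustration of that equivalence, here is a minimal standalone sketch of the length computation. The free function `check_len_sketch` and its explicit `max_size` parameter are hypothetical names chosen for this example; this is not the libstdc++ member function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical standalone model of the growth computation.
//   size:     current number of elements
//   n:        number of elements about to be added
//   max_size: largest element count the allocator claims to support
std::size_t
check_len_sketch (std::size_t size, std::size_t n, std::size_t max_size)
{
  // A container never holds more than max_size elements, so the
  // subtraction below cannot wrap.
  assert (size <= max_size);

  // The one potentially-throwing path; this is the extra call that the
  // inliner has to account for.
  if (max_size - size < n)
    throw std::length_error ("check_len_sketch");

  // Grow geometrically: at least double, or by n if n is larger.
  const std::size_t len = size + std::max (size, n);

  // Clamp if the addition wrapped around or went past max_size.
  return (len < size || len > max_size) ? max_size : len;
}
```

When neither the throw nor the clamp triggers, the result is exactly `size + max (n, size)`, which is the short form given above.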

The early inliner decides that it is not a good idea to inline this early.
At this stage we inline mostly calls where we expect code to get
smaller after inlining, and since the function contains another
uninlinable call, this does not seem likely.

With -O3 we will inline it later at IPA stage, but only when the code is
considered hot.
With -O2 we decide to keep it offline if the unit contains multiple
calls to the function; otherwise we inline it since it wins in the code
size estimation model.

The problem is that _M_check_len is used by _M_realloc_insert, which later
feeds the result to the allocator.  There is extra redundancy since the
allocator can call std::__throw_bad_array_new_length and
std::__throw_bad_alloc for bad sizes, but _M_check_len will not produce
them, which is something we won't work out before inlining it.

As a result _M_realloc_insert is seen as a very large function by the
inliner heuristics (71 instructions).  Functions that are not
declared inline are inlined if smaller than 15 instructions with -O2
and 30 instructions with -O3, so we don't inline.

This hurts common loops that use a vector as a stack and call push_back
in an inner loop.  Not inlining prevents SRA, and we end up saving and
loading the end-of-vector pointer on every iteration of the loop.

The following testcase:

#include <vector>
#include <utility>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
  std::vector<std::pair<uint32_t, uint32_t>> stack;
  stack.push_back (pair);
  while (!stack.empty()) {
    std::pair<uint32_t, uint32_t> cur = stack.back();
    stack.pop_back();
    if (!cur.first)
      {
        cur.second++;
        stack.push_back (cur);
      }
    if (cur.second > 1)
      break;
  }
}
int
main()
{
  for (int i = 0; i < 1; i++)
    test();
}

For me this runs in 0.5s with _M_realloc_insert not inlined and in 0.048s with
_M_realloc_insert inlined.  Clang inlines it even at -O2 and takes
0.063s.  I believe this is the reason why the jpegxl library is slower
when built with GCC, and since such loops are quite common in, say, a
DFS walk, I think it is a frequent problem.
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only 

[PATCH] rs6000, __builtin_set_fpscr_rn add retrun value

2023-06-19 Thread Carl Love via Gcc-patches
GCC maintainers:


The GLibC team requested a builtin to replace the mffscrn and mffscrni inline
asm instructions in the GLibC code.  Previously there was discussion on adding
builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void. 
So, changing the return type to double and returning the  FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functionally equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backwardly
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
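
The intended "return the old fields, then update RN" behaviour can be modelled in portable C++ as a rough sketch. This is not the rs6000 implementation and does not call the builtin. The struct `FpscrModel`, the two mask constants and the bit layout in the comments are assumptions made for this illustration, and the real builtin returns these bits as a double rather than an integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model, not the rs6000 implementation.
// Assumed layout of a 64-bit FPSCR image, least-significant-bit numbering:
//   RN in bits 1:0, NI/XE/ZE/UE/OE/VE in bits 7:2, DRN in bits 34:32.
constexpr std::uint64_t kRnMask = 0x3;
constexpr std::uint64_t kReturnedFieldsMask = 0x00000007000000FFULL;

struct FpscrModel
{
  std::uint64_t bits;

  // Return the selected fields as they were before the call, then
  // replace only the RN field.  Masking new_rn to two bits is a choice
  // made by this model.
  std::uint64_t
  set_rn (std::uint64_t new_rn)
  {
    const std::uint64_t old_fields = bits & kReturnedFieldsMask;
    bits = (bits & ~kRnMask) | (new_rn & kRnMask);
    return old_fields;
  }
};
```

A caller that ignores the return value sees the same effect as with the old void builtin, which is the backward-compatibility property described above.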

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Delete.
(__builtin_set_fpscr_rn_i): New builtin definition.
(__builtin_set_fpscr_rn_d): New builtin definition.
* config/rs6000/rs6000-overload.def (__builtin_set_fpscr_rn): New
overloaded definition.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn_d): New define_expand.
(rs6000_set_fpscr_rn_i): Renamed from rs6000_set_fpscr_rn.  Added
return argument.  Updated to use new rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Add new tests to check
double return value.  Add tests for overloaded double argument.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   6 +
 gcc/config/rs6000/rs6000.md   | 122 ---
 gcc/doc/extend.texi   |  25 ++-
 .../powerpc/test_fpscr_rn_builtin.c   | 143 +-
 5 files changed, 262 insertions(+), 41 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 289a37998b1..30e0b0bb06d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -237,8 +237,11 @@
   const __ibm128 __builtin_pack_ibm128 (double, double);
 PACK_IF packif {ibm128}
 
-  void __builtin_set_fpscr_rn (const int[0,3]);
-SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
+  double __builtin_set_fpscr_rn_i (const int[0,3]);
+SET_FPSCR_RN_I rs6000_set_fpscr_rn_i {nosoft}
+
+  double __builtin_set_fpscr_rn_d (double);
+SET_FPSCR_RN_D rs6000_set_fpscr_rn_d {nosoft}
 
   const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
 UNPACK_IF unpackif {ibm128}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 

Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely  wrote:

> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
>
>> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
>> wrote:
>> > - if (max_size() - size() < __n)
>> > -   __throw_length_error(__N(__s));
>> > + const size_type __max_size = max_size();
>> > + // On 64bit systems vectors can not reach overflow by growing
>> > + // by small sizes; before this happens, we will run out of memory.
>> > + if (__builtin_constant_p(__n)
>> > + && __builtin_constant_p(__max_size)
>> > + && sizeof(ptrdiff_t) >= 8
>> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>>
>> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>>
>
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp).
> But for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
>
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only call _M_realloc_insert in the
> case where we've already decided a reallocation is definitely needed).
>
> Would this version of _M_check_len help?
>
>   size_type
>   _M_check_len(size_type __n, const char* __s) const
>   {
> const size_type __size = size();
> const size_type __max_size = max_size();
>
> if (__is_same(allocator_type, allocator<_Tp>)
>   && __size > __max_size / 2)
>

This check is wrong for C++17 and older standards, because max_size()
changed value in C++20.

In C++17 it was PTRDIFF_MAX / sizeof(T) but in C++20 it's SIZE_MAX /
sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
would not hold.
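
To put numbers on this, here is a small sketch that models a 32-bit target with fixed-width types so the values are the same on any host. The helper names are hypothetical, and the two formulas are simply the ones stated in the paragraph above, not taken from the libstdc++ sources.

```cpp
#include <cassert>
#include <cstdint>

// PTRDIFF_MAX / sizeof(T) on a modelled 32-bit target.
constexpr std::uint32_t
max_size_cxx17 (std::uint32_t elem_size)
{
  return static_cast<std::uint32_t> (INT32_MAX) / elem_size;
}

// SIZE_MAX / sizeof(T) on a modelled 32-bit target.
constexpr std::uint32_t
max_size_cxx20 (std::uint32_t elem_size)
{
  return UINT32_MAX / elem_size;
}

// True if a vector holding `count` elements would violate the
// "size <= max_size / 2" assumption for the given bound.
constexpr bool
exceeds_half (std::uint32_t count, std::uint32_t max_size)
{
  return count > max_size / 2;
}
```

For 1-byte elements, half of the first bound is about 1 GiB, which a 32-bit process can plausibly allocate, so a vector of 1,200,000,000 chars would break the assumption. Half of the second bound is about 2 GiB, and the same vector stays within it.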


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
P.S. please CC libstd...@gcc.gnu.org for all libstdc++ patches.

On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely  wrote:

> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
>
>> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
>> wrote:
>> > - if (max_size() - size() < __n)
>> > -   __throw_length_error(__N(__s));
>> > + const size_type __max_size = max_size();
>> > + // On 64bit systems vectors can not reach overflow by growing
>> > + // by small sizes; before this happens, we will run out of memory.
>> > + if (__builtin_constant_p(__n)
>> > + && __builtin_constant_p(__max_size)
>> > + && sizeof(ptrdiff_t) >= 8
>> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>>
>> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>>
>
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp).
> But for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
>
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it
> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only call _M_realloc_insert in the
> case where we've already decided a reallocation is definitely needed).
>
> Would this version of _M_check_len help?
>
>   size_type
>   _M_check_len(size_type __n, const char* __s) const
>   {
> const size_type __size = size();
> const size_type __max_size = max_size();
>
> if (__is_same(allocator_type, allocator<_Tp>)
>   && __size > __max_size / 2)
> >   __builtin_unreachable(); // Assume std::allocator can't fill memory.
> else if (__size > __max_size)
>   __builtin_unreachable();
>
> if (__max_size - __size < __n)
>   __throw_length_error(__N(__s));
>
> const size_type __len = __size + (std::max)(__size, __n);
> return (__len < __size || __len > __max_size) ? __max_size : __len;
>   }
>
> This only applies to std::allocator, not user-defined allocators (because
> we don't know their semantics). It also seems like less of a big hack!
>
>
>


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:

> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches
> wrote:
> > - if (max_size() - size() < __n)
> > -   __throw_length_error(__N(__s));
> > + const size_type __max_size = max_size();
> > + // On 64bit systems vectors can not reach overflow by growing
> > + // by small sizes; before this happens, we will run out of memory.
> > + if (__builtin_constant_p(__n)
> > + && __builtin_constant_p(__max_size)
> > + && sizeof(ptrdiff_t) >= 8
> > + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>
> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>

For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
for a user-defined allocator that has a silly max_size(), yes, that's
possible.
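
To make the overflow point concrete, here is a minimal sketch with plain integers. The helper names are made up for this example, and it assumes, as stated above, that std::allocator's max_size() is size_t(-1) / sizeof(_Tp).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::allocator<T>::max_size() as described
// above: the largest element count whose byte size still fits in size_t.
inline std::size_t sane_max_size(std::size_t elem_size)
{
  assert(elem_size != 0);
  return SIZE_MAX / elem_size;
}

// True if count * elem_size would wrap around in size_t arithmetic.
inline bool byte_size_wraps(std::size_t count, std::size_t elem_size)
{
  assert(elem_size != 0);
  return count > SIZE_MAX / elem_size;
}
```

With sane_max_size() the product can never wrap, whereas an allocator whose max_size() simply returns SIZE_MAX wraps for any element larger than one byte.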

I still don't really understand why any change is needed here. The PR says
that the current _M_check_len brings in the EH code, but how/why does that
happen? The __throw_length_error function is not inline, it's defined in
libstdc++.so, so why isn't it just an extern call? Is the problem that it
makes _M_check_len potentially-throwing? Because that's basically the
entire point of _M_check_len: to throw the exception that is required by
the C++ standard. We need to be very careful about removing that required
throw! And after we call _M_check_len we call allocate unconditionally, so
_M_realloc_insert can always throw (we only call _M_realloc_insert in the
case where we've already decided a reallocation is definitely needed).

Would this version of _M_check_len help?

  size_type
  _M_check_len(size_type __n, const char* __s) const
  {
    const size_type __size = size();
    const size_type __max_size = max_size();

    if (__is_same(allocator_type, allocator<_Tp>)
          && __size > __max_size / 2)
      __builtin_unreachable(); // Assume std::allocator can't fill memory.
    else if (__size > __max_size)
      __builtin_unreachable();

    if (__max_size - __size < __n)
      __throw_length_error(__N(__s));

    const size_type __len = __size + (std::max)(__size, __n);
    return (__len < __size || __len > __max_size) ? __max_size : __len;
  }

This only applies to std::allocator, not user-defined allocators (because
we don't know their semantics). It also seems like less of a big hack!
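
For anyone who wants to experiment with the growth policy outside of std::vector, the following is a stand-alone sketch of the same arithmetic. The function name check_len and its explicit parameters are invented for the example, and the __builtin_unreachable() hints are replaced by a plain assert, so this shows only the length computation and the required throw, not the optimization itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Stand-alone rendering of the length computation in the proposed
// _M_check_len.  size, max_size and n are passed in explicitly; in
// std::vector they come from size(), max_size() and the number of
// elements being added.
inline std::size_t
check_len(std::size_t size, std::size_t max_size, std::size_t n,
          const char* what)
{
  // The proposal marks the opposite case as unreachable.
  assert(size <= max_size);

  // The throw required by the C++ standard.
  if (max_size - size < n)
    throw std::length_error(what);

  // Grow geometrically, but by at least n elements.
  const std::size_t len = size + std::max(size, n);

  // Clamp on wrap-around or when the new length exceeds max_size.
  return (len < size || len > max_size) ? max_size : len;
}
```

For example, check_len(4, 100, 1, "...") doubles to 8, check_len(60, 100, 1, "...") is clamped to 100, and check_len(99, 100, 2, "...") throws std::length_error.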


[committed] recog: Change return type of predicate functions from int to bool

2023-06-19 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool and change return type of
split_all_insns_noflow to void.

gcc/ChangeLog:

* recog.h (check_asm_operands): Change return type from int to bool.
(insn_invalid_p): Ditto.
(verify_changes): Ditto.
(apply_change_group): Ditto.
(constrain_operands): Ditto.
(constrain_operands_cached): Ditto.
(validate_replace_rtx_subexp): Ditto.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(added_clobbers_hard_reg_p): Ditto.
(peep2_regno_dead_p): Ditto.
(peep2_reg_dead_p): Ditto.
(store_data_bypass_p): Ditto.
(if_test_bypass_p): Ditto.
* rtl.h (split_all_insns_noflow): Change
return type from unsigned int to void.
* genemit.cc (output_added_clobbers_hard_reg_p): Change return type
of generated added_clobbers_hard_reg_p from int to bool and adjust
function body accordingly.  Change "used" variable type from
int to bool.
* recog.cc (check_asm_operands): Change return type
from int to bool and adjust function body accordingly.
(insn_invalid_p): Ditto.  Change "is_asm" variable to bool.
(verify_changes): Change return type from int to bool.
(apply_change_group): Change return type from int to bool
and adjust function body accordingly.
(validate_replace_rtx_subexp): Change return type from int to bool.
(validate_replace_rtx): Ditto.
(validate_replace_rtx_part): Ditto.
(validate_replace_rtx_part_nosimplify): Ditto.
(constrain_operands_cached): Ditto.
(constrain_operands): Ditto.  Change "lose" and "win"
variables type from int to bool.
(split_all_insns_noflow): Change return type from unsigned int
to void and adjust function body accordingly.
(peep2_regno_dead_p): Change return type from int to bool.
(peep2_reg_dead_p): Ditto.
(peep2_find_free_register): Change "success"
variable type from int to bool
(store_data_bypass_p_1): Change return type from int to bool.
(store_data_bypass_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 33c9ec05d6f..1ce0564076d 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -688,26 +688,27 @@ output_added_clobbers_hard_reg_p (void)
 {
   struct clobber_pat *clobber;
   struct clobber_ent *ent;
-  int clobber_p, used;
+  int clobber_p;
+  bool used;
 
-  printf ("\n\nint\nadded_clobbers_hard_reg_p (int insn_code_number)\n");
+  printf ("\n\nbool\nadded_clobbers_hard_reg_p (int insn_code_number)\n");
   printf ("{\n");
   printf ("  switch (insn_code_number)\n");
   printf ("{\n");
 
   for (clobber_p = 0; clobber_p <= 1; clobber_p++)
 {
-  used = 0;
+  used = false;
   for (clobber = clobber_list; clobber; clobber = clobber->next)
if (clobber->has_hard_reg == clobber_p)
  for (ent = clobber->insns; ent; ent = ent->next)
{
  printf ("case %d:\n", ent->code_number);
- used++;
+ used = true;
}
 
   if (used)
-   printf ("  return %d;\n\n", clobber_p);
+   printf ("  return %s;\n\n", clobber_p ? "true" : "false");
 }
 
   printf ("default:\n");
diff --git a/gcc/recog.cc b/gcc/recog.cc
index fd09145d45e..37432087812 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -133,7 +133,7 @@ asm_labels_ok (rtx body)
 /* Check that X is an insn-body for an `asm' with operands
and that the operands mentioned in it are legitimate.  */
 
-int
+bool
 check_asm_operands (rtx x)
 {
   int noperands;
@@ -142,7 +142,7 @@ check_asm_operands (rtx x)
   int i;
 
   if (!asm_labels_ok (x))
-return 0;
+return false;
 
   /* Post-reload, be more strict with things.  */
   if (reload_completed)
@@ -156,9 +156,9 @@ check_asm_operands (rtx x)
 
   noperands = asm_noperands (x);
   if (noperands < 0)
-return 0;
+return false;
   if (noperands == 0)
-return 1;
+return true;
 
   operands = XALLOCAVEC (rtx, noperands);
   constraints = XALLOCAVEC (const char *, noperands);
@@ -171,10 +171,10 @@ check_asm_operands (rtx x)
   if (c[0] == '%')
c++;
   if (! asm_operand_ok (operands[i], c, constraints))
-   return 0;
+   return false;
 }
 
-  return 1;
+  return true;
 }
 
 /* Static data for the next two routines.  */
@@ -212,8 +212,8 @@ static int temporarily_undone_changes = 0;
 
If IN_GROUP is zero, this is a single change.  Try to recognize the insn
or validate the memory reference with the change applied.  If the result
-   is not valid for the machine, suppress the change and return zero.
-   Otherwise, perform the change and return 1.  */
+   is not valid for the machine, suppress the change and return false.
+   Otherwise, perform the change and return true.  */
 
 static bool
 validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group,
@@ -232,7 +232,7 @@ validate_change_1 (rtx 

RE: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Monday, June 19, 2023 9:10 PM
To: juzhe.zh...@rivai.ai; Li Xu ; gcc-patches 

Cc: kito.cheng ; palmer 
Subject: Re: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement



On 6/19/23 00:05, juzhe.zh...@rivai.ai wrote:
> LGTM.
OK
jeff


RE: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 7:45 PM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64



On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:
> 
> LGTM
ACK for the trunk.
jeff


[PATCH v2] combine: Narrow comparison of memory and constant

2023-06-19 Thread Stefan Schulze Frielinghaus via Gcc-patches
Comparisons between memory and constants might be done in a smaller mode
resulting in smaller constants which might finally end up as immediates
instead of in the literal pool.

For example, on s390x a non-symmetric comparison like
  x <= 0x3fff
results in the constant being spilled to the literal pool and an 8 byte
memory comparison is emitted.  Ideally, an equivalent comparison
  x0 <= 0x3f
where x0 is the most significant byte of x, is emitted where the
constant is smaller and more likely to materialize as an immediate.

Similarly, comparisons of the form
  x >= 0x4000
can be shortened into x0 >= 0x40.
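
The equivalence can be checked with ordinary integers. The sketch below assumes x is a 64-bit unsigned value and that the constants are the full 8-byte values 0x3fffffffffffffff and 0x4000000000000000; all function names are made up for the example and have nothing to do with combine.cc.

```cpp
#include <cassert>
#include <cstdint>

// Most significant byte of a 64-bit value (x0 in the description above).
inline std::uint8_t top_byte(std::uint64_t x)
{
  return static_cast<std::uint8_t>(x >> 56);
}

// Wide comparison and its narrowed form for "x <= 0x3fff...ff".
inline bool le_wide(std::uint64_t x)   { return x <= 0x3fffffffffffffffULL; }
inline bool le_narrow(std::uint64_t x) { return top_byte(x) <= 0x3f; }

// Wide comparison and its narrowed form for "x >= 0x4000...00".
inline bool ge_wide(std::uint64_t x)   { return x >= 0x4000000000000000ULL; }
inline bool ge_narrow(std::uint64_t x) { return top_byte(x) >= 0x40; }

// Checks both equivalences for one value.
inline bool narrowing_agrees(std::uint64_t x)
{
  return le_wide(x) == le_narrow(x) && ge_wide(x) == ge_narrow(x);
}

// Spot-checks values around the boundary between 0x3f.. and 0x40...
inline void check_boundaries()
{
  const std::uint64_t samples[] = {
    0, 0x3f00000000000000ULL, 0x3fffffffffffffffULL,
    0x4000000000000000ULL, UINT64_MAX
  };
  for (std::uint64_t x : samples)
    assert(narrowing_agrees(x));
}
```

The narrowing works because the low 56 bits of the constant are all ones in the <= case and all zeros in the >= case, so they cannot influence the result.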

Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
Note, the new tests show that for the mentioned little-endian targets
the optimization does not materialize since either the costs of the new
instructions are higher or they do not match.  Still ok for mainline?

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Narrow comparison of
memory and constant.
(try_combine): Adapt new function signature.
(simplify_comparison): Adapt new function signature.

gcc/testsuite/ChangeLog:

* gcc.dg/cmp-mem-const-1.c: New test.
* gcc.dg/cmp-mem-const-2.c: New test.
* gcc.dg/cmp-mem-const-3.c: New test.
* gcc.dg/cmp-mem-const-4.c: New test.
* gcc.dg/cmp-mem-const-5.c: New test.
* gcc.dg/cmp-mem-const-6.c: New test.
* gcc.target/s390/cmp-mem-const-1.c: New test.
---
 gcc/combine.cc| 79 +--
 gcc/testsuite/gcc.dg/cmp-mem-const-1.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-2.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-3.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-4.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-5.c| 17 
 gcc/testsuite/gcc.dg/cmp-mem-const-6.c| 17 
 .../gcc.target/s390/cmp-mem-const-1.c | 24 ++
 8 files changed, 200 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-1.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-2.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-3.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-4.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-5.c
 create mode 100644 gcc/testsuite/gcc.dg/cmp-mem-const-6.c
 create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 5aa0ec5c45a..56e15a93409 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code, 
machine_mode, rtx,
 static int recog_for_combine (rtx *, rtx_insn *, rtx *);
 static rtx gen_lowpart_for_combine (machine_mode, rtx);
 static enum rtx_code simplify_compare_const (enum rtx_code, machine_mode,
-rtx, rtx *);
+rtx *, rtx *);
 static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
 static void update_table_tick (rtx);
 static void record_value_for_reg (rtx, rtx_insn *, rtx);
@@ -3185,7 +3185,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
           compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
           if (is_a <scalar_int_mode> (GET_MODE (i2dest), &mode))
             compare_code = simplify_compare_const (compare_code, mode,
-                                                   op0, &op1);
+                                                   &op0, &op1);
           target_canonicalize_comparison (&compare_code, &op0, &op1, 1);
         }
 
@@ -11796,13 +11796,14 @@ gen_lowpart_for_combine (machine_mode omode, rtx x)
(CODE OP0 const0_rtx) form.
 
The result is a possibly different comparison code to use.
-   *POP1 may be updated.  */
+   *POP0 and *POP1 may be updated.  */
 
 static enum rtx_code
 simplify_compare_const (enum rtx_code code, machine_mode mode,
-   rtx op0, rtx *pop1)
+   rtx *pop0, rtx *pop1)
 {
   scalar_int_mode int_mode;
+  rtx op0 = *pop0;
   HOST_WIDE_INT const_op = INTVAL (*pop1);
 
   /* Get the constant we are comparing against and turn off all bits
@@ -11987,6 +11988,74 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   break;
 }
 
+  /* Narrow non-symmetric comparison of memory and constant as e.g.
+ x0...x7 <= 0x3fff into x0 <= 0x3f where x0 is the most
+ significant byte.  Likewise, transform x0...x7 >= 0x4000 into
+ x0 >= 0x40.  */
+  if ((code == LEU || code == LTU || code == GEU || code == GTU)
+  && is_a <scalar_int_mode> (GET_MODE (op0), &int_mode)
+  && MEM_P (op0)
+  && !MEM_VOLATILE_P (op0)
+  /* The optimization makes only sense for constants which are big enough
+so that we have a chance to chop off something at all.  */
+  && (unsigned HOST_WIDE_INT) const_op > 0xff
+  /* Ensure that we do not overflow during normalization.  */
+  && (code 

RE: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 9:51 PM
To: 钟居哲 ; Li, Pan2 ; gcc-patches 

Cc: rdapp.gcc ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64



On 6/18/23 07:14, 钟居哲 wrote:
> Thanks for fixing it for me.
> LGTM now.
OK for the trunk.
jeff


Re: ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2023-06-19 Thread Lewis Hyatt via Gcc-patches
May I please ping this one? FWIW, it's 10 months old now without any feedback.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html

Most of the changes are just adapting the testsuite to look for the
improved diagnostic location. Otherwise it's a handful of lines in
libcpp and it just changes this:

t.cpp:1: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  |

to this:

t.cpp:1:9: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  | ^

which closes out PR66290. Thank you!

-Lewis

On Thu, Jan 12, 2023 at 6:31 PM Lewis Hyatt  wrote:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html
> May I please ping this one again? It will enable closing out the PR. Thanks!
>
> -Lewis
>
> On Thu, Dec 1, 2022 at 9:22 AM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> >
> > May I please ping this one? Thanks!
> > I have also re-attached the rebased patch here.
> >
> > -Lewis
> >
> > On Wed, Oct 12, 2022 at 06:37:50PM -0400, Lewis Hyatt wrote:
> > > Hello-
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > >
> > > Since Jeff was kind enough to ack one of my other preprocessor patches
> > > today, I have become emboldened to ping this one again too :). Would
> > > anyone have some time to take a look at it please? Thanks!
> > >
> > > -Lewis
> > >
> > > On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
> > > >
> > > > Hello-
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > > > May I please ping this patch? Thank you.
> > > >
> > > > -Lewis
> > > >
> > > > On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> > > > >
> > > > >
> > > > > > When libcpp reports diagnostics whose locus is a macro name (such as
> > > > > > for -Wunused-macros), it uses the location in the cpp_macro object
> > > > > > that was stored by _cpp_new_macro. This is currently set to
> > > > > > pfile->directive_line, which contains the line number only and no
> > > > > > column information. This patch changes the stored location to the
> > > > > > src_loc for the token defining the macro name, which includes the
> > > > > > location and range information.
> > > > >
> > > > > libcpp/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * macro.cc (_cpp_create_definition): Add location argument.
> > > > > * internal.h (_cpp_create_definition): Adjust prototype.
> > > > > * directives.cc (do_define): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > > (do_undef): Stop passing inferior location to
> > > > > > cpp_warning_with_line; the default from cpp_warning is better.
> > > > > (cpp_pop_definition): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > * pch.cc (cpp_read_state): Likewise.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * c-c++-common/cpp/macro-ranges.c: New test.
> > > > > > * c-c++-common/cpp/line-2.c: Adapt to check for column
> > > > > > information on macro-related libcpp warnings.
> > > > > * c-c++-common/cpp/line-3.c: Likewise.
> > > > > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > > > > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > > > > * c-c++-common/pragma-diag-14.c: Likewise.
> > > > > * c-c++-common/pragma-diag-15.c: Likewise.
> > > > > * g++.dg/modules/macro-2_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_e.C: Likewise.
> > > > > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > > > > * gcc.dg/builtin-redefine.c: Likewise.
> > > > > * gcc.dg/cpp/Wunused.c: Likewise.
> > > > > * gcc.dg/cpp/redef2.c: Likewise.
> > > > > * gcc.dg/cpp/redef3.c: Likewise.
> > > > > * gcc.dg/cpp/redef4.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > > > > * gcc.dg/cpp/undef2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > > > > ---
> > > > >
> > > > > Notes:
> > > > > Hello-
> > > > >
> > > > > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was 
> > > > > originally
> > > > > about the entirely wrong location for -Wunused-macros in C++ 
> > > > > mode, which
> > > > > behavior was fixed by r13-1903, but before closing it out I 
> > > > > wanted to also
> > > 

Re: [PATCH] combine: Narrow comparison of memory and constant

2023-06-19 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Mon, Jun 12, 2023 at 03:29:00PM -0600, Jeff Law wrote:
> 
> 
> On 6/12/23 01:57, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> > Comparisons between memory and constants might be done in a smaller mode
> > resulting in smaller constants which might finally end up as immediates
> > instead of in the literal pool.
> > 
> > For example, on s390x a non-symmetric comparison like
> >x <= 0x3fff
> > results in the constant being spilled to the literal pool and an 8 byte
> > memory comparison is emitted.  Ideally, an equivalent comparison
> >x0 <= 0x3f
> > where x0 is the most significant byte of x, is emitted where the
> > constant is smaller and more likely to materialize as an immediate.
> > 
> > Similarly, comparisons of the form
> >x >= 0x4000
> > can be shortened into x0 >= 0x40.
> > 
> > I'm not entirely sure whether combine is the right place to implement
> > something like this.  In my first try I implemented it in
> > TARGET_CANONICALIZE_COMPARISON but then thought other targets might
> > profit from it, too.  simplify_context::simplify_relational_operation_1
> > seems to be the wrong place since code/mode may change.  Any opinions?
> > 
> > gcc/ChangeLog:
> > 
> > * combine.cc (simplify_compare_const): Narrow comparison of
> > memory and constant.
> > (try_combine): Adapt new function signature.
> > (simplify_comparison): Adapt new function signature.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/cmp-mem-const-1.c: New test.
> > * gcc.target/s390/cmp-mem-const-2.c: New test.
> This does seem more general than we'd want to do in the canonicalization
> hook.  So thanks for going the extra mile and doing a generic
> implementation.
> 
> 
> 
> 
> > @@ -11987,6 +11988,79 @@ simplify_compare_const (enum rtx_code code, 
> > machine_mode mode,
> > break;
> >   }
> > +  /* Narrow non-symmetric comparison of memory and constant as e.g.
> > + x0...x7 <= 0x3fff into x0 <= 0x3f where x0 is the most
> > + significant byte.  Likewise, transform x0...x7 >= 0x4000 
> > into
> > + x0 >= 0x40.  */
> > +  if ((code == LEU || code == LTU || code == GEU || code == GTU)
> > +  && is_a <scalar_int_mode> (GET_MODE (op0), &int_mode)
> > +  && MEM_P (op0)
> > +  && !MEM_VOLATILE_P (op0)
> > +  && (unsigned HOST_WIDE_INT)const_op > 0xff)
> > +{
> > +  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT)const_op;
> > +  enum rtx_code adjusted_code = code;
> > +
> > +  /* If the least significant bit is already zero, then adjust the
> > +comparison in the hope that we hit cases like
> > +  op0  <= 0x3dfe
> > +where the adjusted comparison
> > +  op0  <  0x3dff
> > +can be shortened into
> > +  op0' <  0x3d.  */
> > +  if (code == LEU && (n & 1) == 0)
> > +   {
> > + ++n;
> > + adjusted_code = LTU;
> > +   }
> > +  /* or e.g. op0 < 0x4020  */
> > +  else if (code == LTU && (n & 1) == 0)
> > +   {
> > + --n;
> > + adjusted_code = LEU;
> > +   }
> > +  /* or op0 >= 0x4001  */
> > +  else if (code == GEU && (n & 1) == 1)
> > +   {
> > + --n;
> > + adjusted_code = GTU;
> > +   }
> > +  /* or op0 > 0x3fff.  */
> > +  else if (code == GTU && (n & 1) == 1)
> > +   {
> > + ++n;
> > + adjusted_code = GEU;
> > +   }
> > +
> > +  scalar_int_mode narrow_mode_iter;
> > +  bool lower_p = code == LEU || code == LTU;
> > +  bool greater_p = !lower_p;
> > +  FOR_EACH_MODE_UNTIL (narrow_mode_iter, int_mode)
> > +   {
> > + unsigned nbits = GET_MODE_PRECISION (int_mode)
> > + - GET_MODE_PRECISION (narrow_mode_iter);
> > + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << nbits) - 1;
> > + unsigned HOST_WIDE_INT lower_bits = n & mask;
> > + if ((lower_p && lower_bits == mask)
> > + || (greater_p && lower_bits == 0))
> > +   {
> > + n >>= nbits;
> > + break;
> > +   }
> > +   }
> > +
> > +  if (narrow_mode_iter < int_mode)
> > +   {
> > + poly_int64 offset = BYTES_BIG_ENDIAN
> > +   ? 0
> > +   : GET_MODE_SIZE (int_mode)
> > + - GET_MODE_SIZE (narrow_mode_iter);
> Go ahead and add some parentheses here.  I'd add one pair around the whole
> RHS of that assignment.  The '?' and ':' would line up under the 'B' in that
> case.  Similarly add them around the false arm of the ternary.  The '-' will
> line up under the 'G'.

Done.

> 
> Going to trust you got the little endian adjustment correct here ;-)

Sadly, I gave it a try on x64, aarch64, and powerpc64le, and in all cases
the resulting instructions were rejected, either because the costs were
higher or because the new instructions failed to match.  So far I have
therefore tested this thoroughly only on s390x.

> 
> 
> > /* Compute some predicates to 

Re: [PATCH] Improved SUBREG simplifications in simplify-rtx.cc's simplify_subreg.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 04:22, Roger Sayle wrote:


An x86 backend improvement that I'm working on results in combine attempting
to recognize:

(set (reg:DI 87 [ xD.2846 ])
  (ior:DI (subreg:DI (ashift:TI (zero_extend:TI (reg:DI 92))
(const_int 64 [0x40])) 0)
  (reg:DI 91)))

where the lowpart SUBREG has difficulty seeing through the (hi<<64)
that the lowpart must be zero.  Rather than work around this in the
backend, the better fix is to teach simplify-rtx that
lowpart((hi<<64)|lo) -> lo and highpart((hi<<64)|lo) -> hi, so that
all backends benefit.  Reducing the number of places where the
middle-end generates a SUBREG of something other than REG is a
good thing.
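
The identities are easy to check with ordinary integers. The sketch below is a scaled-down analogue that uses a 64-bit value built from two 32-bit halves in place of the TImode value built from two DImode registers; the function names are made up for the example and are not part of simplify-rtx.cc.

```cpp
#include <cassert>
#include <cstdint>

// Scaled-down analogue of (hi << 64) | lo: a 64-bit "double word"
// built from two 32-bit halves.
inline std::uint64_t join(std::uint32_t hi, std::uint32_t lo)
{
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

inline std::uint32_t lowpart(std::uint64_t x)
{
  return static_cast<std::uint32_t>(x);
}

inline std::uint32_t highpart(std::uint64_t x)
{
  return static_cast<std::uint32_t>(x >> 32);
}

// Analogue of the RTL above: IOR of lowpart(zero_extend(hi) << 32)
// with another word.  The shifted lowpart is always zero, so the
// result is just the other operand.
inline std::uint32_t ior_with_shifted(std::uint32_t hi, std::uint32_t other)
{
  const std::uint32_t shifted_low =
    lowpart(static_cast<std::uint64_t>(hi) << 32);
  assert(shifted_low == 0);
  return shifted_low | other;
}
```

Here lowpart(join(hi, lo)) yields lo, highpart(join(hi, lo)) yields hi, and ior_with_shifted(hi, other) always yields other, which is the simplification combine was missing.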

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures, except for pr78904-1b.c, for which a backend
solution has just been proposed.  Ok for mainline?


2023-06-18  Roger Sayle  

gcc/ChangeLog
 * simplify-rtx.cc (simplify_subreg):  Optimize lowpart SUBREGs
 of ASHIFT to const0_rtx with sufficiently large shift count.
 Optimize highpart SUBREGs of ASHIFT as the shift operand when
 the shift count is the correct offset.  Optimize SUBREGs of
 multi-word logic operations if the SUBREGs of both operands
 can be simplified.

OK
Jeff


Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 07:14, 钟居哲 wrote:

Thanks for fixing it for me.
LGTM now.

OK for the trunk.
jeff


RE: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Ok for trunk?

And a reminder to myself that this PATCH should be committed before the RVV 
widen reduction one.

Pan

From: 钟居哲 
Sent: Sunday, June 18, 2023 9:15 PM
To: Li, Pan2 ; gcc-patches 
Cc: rdapp.gcc ; Jeff Law ; Li, Pan2 
; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64

Thanks for fixing it for me.
LGTM now.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-18 10:57
To: gcc-patches
CC: juzhe.zhong; 
rdapp.gcc; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV float reduction in ZVE32/64
From: Pan Li <pan2...@intel.com>

The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
}

This causes a problem. For example, for zve32 we call
code_for_reduc (max, VNx1HF, VNx1HF), which returns the code for
ZVE128+ rather than the one for ZVE32.

This patch merges the 3 patterns into one pattern and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32
becomes code_for_reduc (max, VNx1HF, VNx2HF), so the correct code for ZVE32
is returned as expected.
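
The shadowing effect can be modelled with a tiny first-match lookup table. This is only a sketch: the Entry type, the lookup helper and the table contents are invented for illustration, the return modes VNx4HF and VNx8HF and the *_pattern names are assumptions, and none of this is GCC's real generated code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a generated code_for_* lookup.
struct Entry
{
  std::string code, mode1, mode2, insn;
};

// Returns the first entry whose key matches, like a chain of ifs.
inline std::string lookup(const std::vector<Entry>& table,
                          const std::string& code,
                          const std::string& mode1,
                          const std::string& mode2)
{
  for (const Entry& e : table)
    if (e.code == code && e.mode1 == mode1 && e.mode2 == mode2)
      return e.insn;
  return "";
}

// Three patterns sharing one key: only the first is ever reachable.
inline std::vector<Entry> ambiguous_table()
{
  return {
    {"max", "VNx1HF", "VNx1HF", "pred_reduc_maxvnx1hfvnx16hf"}, // ZVE128+
    {"max", "VNx1HF", "VNx1HF", "pred_reduc_maxvnx1hfvnx8hf"},  // ZVE64
    {"max", "VNx1HF", "VNx1HF", "pred_reduc_maxvnx1hfvnx4hf"},  // ZVE32
  };
}

// Keying on the return vector mode as well makes every entry reachable.
// VNx2HF for ZVE32 is taken from the description; the other two return
// modes and all three insn names are assumed for illustration.
inline std::vector<Entry> disambiguated_table()
{
  return {
    {"max", "VNx1HF", "VNx8HF", "zve128_pattern"},
    {"max", "VNx1HF", "VNx4HF", "zve64_pattern"},
    {"max", "VNx1HF", "VNx2HF", "zve32_pattern"},
  };
}

// With the shared key, a ZVE32 request silently gets the ZVE128+ entry.
inline void check_shadowing()
{
  assert(lookup(ambiguous_table(), "max", "VNx1HF", "VNx1HF")
         == "pred_reduc_maxvnx1hfvnx16hf");
  assert(lookup(disambiguated_table(), "max", "VNx1HF", "VNx2HF")
         == "zve32_pattern");
}
```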

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

PR target/110277

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
ret_mode.
* config/riscv/vector-iterators.md: Add VHF, VSF, VDF,
VHF_LMUL1, VSF_LMUL1, VDF_LMUL1, and remove unused attr.
* config/riscv/vector.md (@pred_reduc_): Removed.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_): New pattern.
(@pred_reduc_): Ditto.
(@pred_reduc_): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110277-1.c: New test.
* gcc.target/riscv/rvv/base/pr110277-1.h: New test.
* gcc.target/riscv/rvv/base/pr110277-2.c: New test.
* gcc.target/riscv/rvv/base/pr110277-2.h: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |   5 +-
gcc/config/riscv/vector-iterators.md  | 128 +++---
gcc/config/riscv/vector.md| 363 +++---
.../gcc.target/riscv/rvv/base/pr110277-1.c|   9 +
.../gcc.target/riscv/rvv/base/pr110277-1.h|  33 ++
.../gcc.target/riscv/rvv/base/pr110277-2.c|  11 +
.../gcc.target/riscv/rvv/base/pr110277-2.h|  33 ++
7 files changed, 366 insertions(+), 216 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110277-2.h

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 53bd0ed2534..27545113996 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1400,8 +1400,7 @@ public:
 machine_mode ret_mode = e.ret_mode ();
 /* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
-if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
-   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+if (GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
   return e.use_exact_insn (
code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
 else
@@ -1435,7 +1434,7 @@ public:
  rtx expand (function_expander &e) const override
   {
 return e.use_exact_insn (
-  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.vector_mode ()));
+  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e2c8ade98eb..6169116482a 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -967,6 +967,33 @@ (define_mode_iterator VDI [
   (VNx16DI 

RE: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks, Jakub, for reviewing. Sorry for the confusion; I will give PATCH v3
a try.

Pan

-Original Message-
From: Jakub Jelinek  
Sent: Monday, June 19, 2023 5:17 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; rguent...@suse.de
Subject: Re: [PATCH] RISC-V: Fix out of range memory access of machine mode 
table

On Mon, Jun 19, 2023 at 05:05:48PM +0800, pan2...@intel.com wrote:
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (
> +    MAX_MACHINE_MODE);

Incorrect formatting.  And, see my other mail, this is wrong.

> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }

And these two are wrong as well.  The value passed to bp_pack_enum
has to match the one used on bp_unpack_enum.  But that is not the case
after your changes.  You stream out with the host MAX_MACHINE_MODE, and
stream in for normal LTO with the same value (ok), but for offloading
targets (nvptx, amdgcn) with a different MAX_MACHINE_MODE.  That will
immediately result in LTO streaming being out of sync and ICEs all around.
The reason for using 1 << 8 there was exactly to make it interoperable for
offloading.  What could perhaps be done is that you stream out the
host MAX_MACHINE_MODE value somewhere and stream it in inside of
lto_input_mode_table before you allocate the table.  But that streamed-in
host MAX_MACHINE_MODE has to be remembered somewhere and used e.g. in
bp_unpack_machine_mode instead of MAX_MACHINE_MODE.
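
To show why the two sides have to agree, here is a minimal sketch of a bit stream in which the number of bits per value is derived from the enum bound. BitStream, bits_for, pack_enum and unpack_enum are hypothetical helpers written for this example; they are not GCC's bitpack_d interface, and the exact bit-width rule is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bit stream.  Values are stored using just enough bits
// for the range [0, bound).
struct BitStream
{
  std::vector<bool> bits;
  std::size_t pos = 0;
};

// Number of bits needed to represent every value below bound.
inline unsigned bits_for(unsigned bound)
{
  assert(bound > 0 && bound <= (1u << 16));
  unsigned n = 0;
  while ((1u << n) < bound)
    ++n;
  return n;
}

// Appends value, least significant bit first.
inline void pack_enum(BitStream& s, unsigned bound, unsigned value)
{
  assert(value < bound);
  for (unsigned i = 0, n = bits_for(bound); i < n; ++i)
    s.bits.push_back((value >> i) & 1u);
}

// Reads back one value; the reader must pass the writer's bound,
// otherwise it consumes a different number of bits per value.
inline unsigned unpack_enum(BitStream& s, unsigned bound)
{
  unsigned value = 0;
  for (unsigned i = 0, n = bits_for(bound); i < n; ++i)
    if (s.pos < s.bits.size() && s.bits[s.pos++])
      value |= 1u << i;
  return value;
}
```

If the writer packs 5 and then 9 with a bound of 256 (8 bits each), a reader using 256 gets 5 and 9 back. A reader using a bound of 100 (7 bits each) happens to get the first value right but is then one bit out of step and no longer reads 9, which is the out-of-sync situation described above.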

Jakub



Re: [PATCH v2] RISC-V: Fix VWEXTF iterator requirement

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 00:05, juzhe.zh...@rivai.ai wrote:

LGTM.

OK
jeff


RE: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks Jeff, will commit this one after the RVV float reduction PATCH (reviewed 
by Juzhe already).

Pan

-Original Message-
From: Jeff Law  
Sent: Monday, June 19, 2023 7:45 PM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64



On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:
> 
> LGTM
ACK for the trunk.
jeff


[PATCH] [i386] Reject too large vectors for partial vector vectorization

2023-06-19 Thread Richard Biener via Gcc-patches
The following works around the lack of the x86 backend making the
vectorizer compare the costs of the different possible vector
sizes the backend advertises through the vector_modes hook.  When
enabling masked epilogues or main loops then this means we will
select the preferred vector mode which is usually the largest even
for loops that do not iterate close to the times the vector has
lanes.  When not using masking the vectorizer would reject any
mode resulting in a VF bigger than the number of iterations
but with masking they are simply masked out.

So this overloads the finish_cost function and matches for
the problematic case, forcing a high cost to make us try a
smaller vector size.
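
As an illustration only, the rejection test can be sketched outside the
compiler as below.  The helpers exact_log2_sketch and ceil_log2_sketch
are hypothetical stand-ins written for this example (they are not the
GCC functions), and reject_masked_main_loop is a made-up name for the
comparison used in the patch.

```cpp
#include <cstdint>

// Stand-in: log2 of x if x is a power of two, otherwise -1.
static int exact_log2_sketch(std::uint64_t x) {
  if (x == 0 || (x & (x - 1)) != 0) return -1;
  int n = 0;
  while ((x >>= 1) != 0) ++n;
  return n;
}

// Stand-in: smallest n with 2^n >= x (capped at 63 to keep shifts valid).
static int ceil_log2_sketch(std::uint64_t x) {
  int n = 0;
  while (n < 63 && (std::uint64_t{1} << n) < x) ++n;
  return n;
}

// True when the vectorization factor is "too large" for a masked main
// loop with a known iteration count, mirroring the comparison in the patch.
bool reject_masked_main_loop(std::uint64_t vf, std::uint64_t niters) {
  return exact_log2_sketch(vf) > ceil_log2_sketch(niters);
}
```

For a loop with 4 iterations of int, a VF of 16 (512-bit) or 8 (256-bit)
is rejected while a VF of 4 (128-bit) is accepted, which is the outcome
the first new testcase scans for.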


Bootstrapped and tested on x86_64-unknown-linux-gnu.  This should
avoid regressing 525.x264_r with partial vector epilogues and
instead improves it by 25% with -march=znver4 (need to re-check
that, that was true with some earlier attempt).

This falls short of enabling cost comparison in the x86 backend
which I also considered doing for --param vect-partial-vector-usage=1
but which will also cause a much larger churn and compile-time
impact (but it should be bearable as seen with aarch64).

I've filed PR110310 for an oddity I noticed around vectorizing
epilogues, I failed to adjust things for the case in that PR.

I'm using INT_MAX to fend off the vectorizer, I wondered if
we should be able to signal that with a bool return value of
finish_cost?  Though INT_MAX seems to work fine.

Does this look reasonable?

Thanks,
Richard.

* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Overload.  For masked main loops make sure the vectorization
factor isn't more than double the number of iterations.


* gcc.target/i386/vect-partial-vectors-1.c: New testcase.
* gcc.target/i386/vect-partial-vectors-2.c: Likewise.
---
 gcc/config/i386/i386.cc   | 26 +++
 .../gcc.target/i386/vect-partial-vectors-1.c  | 13 ++
 .../gcc.target/i386/vect-partial-vectors-2.c  | 12 +
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b20cb86b822..32851a514a9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23666,6 +23666,7 @@ class ix86_vector_costs : public vector_costs
  stmt_vec_info stmt_info, slp_tree node,
  tree vectype, int misalign,
  vect_cost_model_location where) override;
+  void finish_cost (const vector_costs *) override;
 };
 
 /* Implement targetm.vectorize.create_costs.  */
@@ -23918,6 +23919,31 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
   return retval;
 }
 
+void
+ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
+{
+  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
+  if (loop_vinfo && !m_costing_for_scalar)
+{
+  /* We are currently not asking the vectorizer to compare costs
+between different vector mode sizes.  When using predication
+that will end up always choosing the prefered mode size even
+if there's a smaller mode covering all lanes.  Test for this
+situation and artificially reject the larger mode attempt.
+???  We currently lack masked ops for sub-SSE sized modes,
+so we could restrict this rejection to AVX and AVX512 modes
+but error on the safe side for now.  */
+  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+ && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+ && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
+ > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
+   m_costs[vect_body] = INT_MAX;
+}
+
+  vector_costs::finish_cost (scalar_costs);
+}
+
 /* Validate target specific memory model bits in VAL. */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c 
b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
new file mode 100644
index 000..3834720e8e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl -mprefer-vector-width=512 --param 
vect-partial-vector-usage=1" } */
+
+void foo (int * __restrict a, int *b)
+{
+  for (int i = 0; i < 4; ++i)
+a[i] = b[i] + 42;
+}
+
+/* We do not want to optimize this using masked AVX or AVX512
+   but unmasked SSE.  */
+/* { dg-final { scan-assembler-not "\[yz\]mm" } } */
+/* { dg-final { scan-assembler "xmm" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c 
b/gcc/testsuite/gcc.target/i386/vect-partial-vectors-2.c
new file mode 100644
index 000..4ab2cbc4203
--- 

Re: [PATCH v2] RISC-V: Bugfix for RVV widenning reduction in ZVE32/64

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 01:01, juzhe.zh...@rivai.ai wrote:


LGTM

ACK for the trunk.
jeff


Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
 wrote:
>
> Hi,
>
>
> With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify 'x 
> + 0.0' to 'x'.
>

OK if you bootstrapped / tested this change.

Thanks,
Richard.

> GCC Bugzilla : Bug 110305
>
>
> gcc/ChangeLog:
>
> 2023-06-19  Toru Kisuki  
>
> * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
>
> ---
>  gcc/simplify-rtx.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index e152918b0f1..cc96b36ad4e 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2698,7 +2698,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
> code,
>  when x is NaN, infinite, or finite and nonzero.  They aren't
>  when x is -0 and the rounding mode is not towards -infinity,
>  since (-0) + 0 is then 0.  */
> -  if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
> +  if (!HONOR_SIGNED_ZEROS (mode) && !HONOR_SNANS (mode)
> +  && trueop1 == CONST0_RTX (mode))
> return op0;
>
>/* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
> --
> 2.38.1
>
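
For context, a small sketch of why folding 'x + 0.0' to 'x' is not
neutral when x is a signaling NaN: the addition is expected to quiet the
NaN and raise the invalid exception, and the fold would drop both.  The
names AddResult and add_zero_to_snan are made up for this example, and
whether the flag is actually observed depends on the target and on
options such as -fsignaling-nans, so treat the flag as informational.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

struct AddResult {
  double sum;           // value of x + 0.0
  bool invalid_raised;  // whether FE_INVALID was set after the addition
};

AddResult add_zero_to_snan() {
  // volatile keeps the compiler from folding the addition at compile time.
  volatile double x = std::numeric_limits<double>::signaling_NaN();
  volatile double zero = 0.0;
  std::feclearexcept(FE_INVALID);
  volatile double sum = x + zero;
  const bool raised = std::fetestexcept(FE_INVALID) != 0;
  return {sum, raised};
}
```

The result is a NaN in either case; the observable difference between
performing the addition and returning x unchanged lies in the exception
flag and in the quiet/signaling state of the value.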


Re: [PATCH V6] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 00:56, Robin Dapp wrote:

If the pattern is not allowed to fail, then what code enforces the bias
argument's restrictions?  I don't see it in the generic expander code.


I have no idea, since this is just copied from len_load/len_store, which is
s390 target-dependent stuff.

I have sent V7 patch with fixing doc by following your suggestion.



We have:

signed char
internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
{
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);

   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
   if (insn_operand_matches (icode, 3, GEN_INT (0)))
return 0;
   if (insn_operand_matches (icode, 3, GEN_INT (-1)))
return -1;
 }

   return VECT_PARTIAL_BIAS_UNSUPPORTED;

Ah.  That's not where I expected to find it.  Thanks for pointing it out.

Jeff


Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 1:30 PM Richard Biener
 wrote:
>
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > > C++ has assume attribute that expands into
> > > >   if (conditional)
> > > > __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > > there are no complaints.
> > >
> > > I think we also had the request to not account the condition feeding
> > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart do we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > those statements and then use its result in the analysis.
>
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
>
>  a = a + 1;
>  if (a > 10)
>__builtin_unreachable ();
>  if (a < 5)
>__builtin_unreachable ();
>
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.
>
> > Also question is how much we can rely on middle-end optimizing out
> > unreachables.  For example:
> > int a;
> > int b[3];
> > test()
> > {
> >   if (a > 0)
> > {
> >   for (int i = 0; i < 3; i++)
> >   b[i] = 0;
> >   __builtin_unreachable ();
> > }
> > }
> >
> > IMO can be optimized to empty function.  I believe we used to have code
> > in tree-cfgcleanup to remove statements just before
> > __builtin_unreachable which can not terminate execution, but perhaps it
> > existed only in my local tree?
>
> I think we rely on DCE/DSE here and explicit unreachable () pruning after
> VRP picked up things (I think it simply gets the secondary effect optimizing
> the condition it created the range for in the first pass).
>
> DSE is apparently not able to kill the stores, I will fix that.  I think
> DCE can, but only for non-aliased stores.
>
> > We could also perhaps declare unreachable NOVOPs which would make DSE to
> > remove the stores.
>
> But only because of a bug in DSE ... it also removes them if that
> __builtin_unreachable () is GIMPLE_RESX.

Oh, and __builtin_unreachable is already 'const' and thus without any VOPs.  The
issue in DSE is that DSE will not run into __builtin_unreachable because it has
no VOPs.  Instead DSE relies on eventually seeing a VUSE for all paths leaving
a function, it doesn't have a way to consider __builtin_unreachable killing all
memory (it would need a VDEF for that).

It might be possible to record which virtual operands are live at BBs without
successors (but the VUSE on returns was an attempt to avoid the need for that).

So there's no easy way to fix DSE here.

> > >
> > > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
> >
> > You are right.  I tested it with -funreachable-traps but it does not do
> > what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
> >
> > Also if I try to call it by hand I get:
> >
> > jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps 
> > -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> > t.c: In function ‘test’:
> > t.c:9:13: warning: implicit declaration of function 
> > ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? 
> > [-Wimplicit-function-declaration]
> > 9 | __builtin_unreachable_trap ();
> >   | ^~
> >   | __builtin_unreachable trap
> >
> > Which is not as helpful as it is trying to be.
> > >
> > > > +ret = true;
> > > > +done:
> > > > +  for (basic_block vbb:visited_bbs)
> > > > +cache[vbb->index] = (unsigned char)ret + 1;
> > > > +  return ret;
> > > > +}
> > > > +
> > > >  /* Analyze function body for NODE.
> > > > EARLY indicates run from early optimization pipeline.  */
> > > >
> > > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, 
> > > > bool early)
> > > >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN 
> > > > (cfun)->count;
> > > >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > > >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > > +  auto_vec cache;
> > > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> > >
> > > A sbitmap with two bits per entry would be more space efficient here.  
> > > bitmap
> > > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > > adding the corresponding to 

[committed] amdgcn: minimal V64TImode vector support

2023-06-19 Thread Andrew Stubbs
This patch adds just enough TImode vector support to use them for moving 
data about. This is primarily for the use of divmodv64di4, which will 
use TImode to return a pair of DImode values.


The TImode vectors have no other operators defined, and there are no 
hardware instructions to support this mode, beyond load and store.


Committed to mainline, and OG13 will follow shortly.
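
As a scalar analogy for returning a pair of DImode values in one TImode
value, here is a minimal sketch.  It assumes a compiler that provides
unsigned __int128 (GCC and Clang on 64-bit targets do), and the choice
of which half holds the quotient is an assumption made for illustration,
not taken from the divmodv64di4 implementation.

```cpp
#include <cstdint>

using u128 = unsigned __int128;

// Pack quotient (low 64 bits) and remainder (high 64 bits) into one
// 128-bit value.  den must be nonzero.
u128 udivmod_packed(std::uint64_t num, std::uint64_t den) {
  const std::uint64_t q = num / den;
  const std::uint64_t r = num % den;
  return (static_cast<u128>(r) << 64) | q;
}

std::uint64_t quotient_of(u128 packed) {
  return static_cast<std::uint64_t>(packed);
}

std::uint64_t remainder_of(u128 packed) {
  return static_cast<std::uint64_t>(packed >> 64);
}
```

The packed value is only ever moved, stored, loaded and split, which is
why move-style support for the mode is enough for this use.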

Andrew

amdgcn: minimal V64TImode vector support

Just enough support for TImode vectors to exist, load, store, move,
without any real instructions available.

This is primarily for the use of divmodv64di4, which uses TImode to
return a pair of DImode values.

gcc/ChangeLog:

* config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function.
* config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators.
(V_MOV, V_MOV_ALT): Likewise.
(scalar_mode, SCALAR_MODE): Add TImode.
(vnsi, VnSI, vndi, VnDI): Likewise.
(vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV.
(mov, mov_unspec): Use V_MOV.
(*mov_4reg): New insn.
(mov_exec): New 4reg variant.
(mov_sgprbase): Likewise.
(reload_in, reload_out): Use V_MOV.
(vec_set): Likewise.
(vec_duplicate): New 4reg variant.
(vec_extract): Likewise.
(vec_extract): Rename to ...
(vec_extract): ... this, and use V_MOV.
(vec_extract_nop): New 4reg variant.
(fold_extract_last_): Use V_MOV.
(vec_init): Rename to ...
(vec_init): ... this, and use V_MOV.
(gather_load, gather_expr,
gather_insn_1offset, gather_insn_1offset_ds,
gather_insn_2offsets): Use V_MOV.
(scatter_store, scatter_expr,
scatter_insn_1offset,
scatter_insn_1offset_ds,
scatter_insn_2offsets): Likewise.
(maskloaddi, maskstoredi, mask_gather_load,
mask_scatter_store): Likewise.
* config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p.
(gcn_hard_regno_mode_ok): Likewise.
(GEN_VNM): Add TImode support.
(USE_TI): New macro. Separate TImode operations from non-TImode ones.
(gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode,
V8TImode, and V2TImode.
(print_operand):  Add 'J' and 'K' print codes.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 287ce17d422..3befb2b7caa 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -136,6 +136,17 @@ vgpr_2reg_mode_p (machine_mode mode)
   return (mode == DImode || mode == DFmode);
 }
 
+/* Return true if MODE is valid for four VGPR registers.  */
+
+inline bool
+vgpr_4reg_mode_p (machine_mode mode)
+{
+  if (VECTOR_MODE_P (mode))
+mode = GET_MODE_INNER (mode);
+
+  return (mode == TImode);
+}
+
 /* Return true if MODE can be handled directly by VGPR operations.  */
 
 inline bool
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 7290cdc2fd0..284dda73da9 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -96,6 +96,10 @@ (define_mode_iterator V_2REG_ALT
   V32DI V32DF
   V64DI V64DF])
 
+; Vector modes for four vector registers
+(define_mode_iterator V_4REG [V2TI V4TI V8TI V16TI V32TI V64TI])
+(define_mode_iterator V_4REG_ALT [V2TI V4TI V8TI V16TI V32TI V64TI])
+
 ; Vector modes with native support
 (define_mode_iterator V_noQI
  [V2HI V2HF V2SI V2SF V2DI V2DF
@@ -136,7 +140,7 @@ (define_mode_iterator SV_SFDF
   V32SF V32DF
   V64SF V64DF])
 
-; All of above
+; All modes in which we want to do more than just moves.
 (define_mode_iterator V_ALL
  [V2QI V2HI V2HF V2SI V2SF V2DI V2DF
   V4QI V4HI V4HF V4SI V4SF V4DI V4DF
@@ -175,97 +179,113 @@ (define_mode_iterator SV_FP
   V32HF V32SF V32DF
   V64HF V64SF V64DF])
 
+; All modes that need moves, including those without many insns.
+(define_mode_iterator V_MOV
+ [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+  V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+  V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+  V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+  V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+  V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+(define_mode_iterator V_MOV_ALT
+ [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+  V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+  V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+  V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+  V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+  V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+
 (define_mode_attr scalar_mode
-  [(QI "qi") (HI "hi") (SI "si")
+  [(QI "qi") (HI 

[committed] amdgcn: Delete inactive libfuncs

2023-06-19 Thread Andrew Stubbs
There were implementations for HImode division in libgcc, but there were 
no matching libfuncs defined in the compiler, so the code was inactive 
(GCC only defines SImode and DImode, by default, and amdgcn only adds 
TImode explicitly).


On trying to activate it I find that the definition of 
TARGET_PROMOTE_FUNCTION_MODE causes all unsigned HImode values to be 
sign-extended to SImode when calling libfuncs, thus breaking the values 
(presumably because they don't have a prototype?). I can't see an 
obvious advantage for having these functions for scalars, at this time, 
so I'm just deleting them ahead of adding divmod and vector implementations.


Committed to mainline, and OG13 will follow shortly.
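
To illustrate the breakage described above, here is a minimal sketch of
what happens when an unsigned 16-bit argument is sign-extended rather
than zero-extended before a 32-bit division.  The function names are
hypothetical and model the effect only; they are not the libgcc entry
points.

```cpp
#include <cstdint>

// Correct widening for an unsigned HImode value.
std::uint32_t zero_extend(std::uint16_t v) {
  return v;
}

// What the callee sees if the value is promoted as if it were signed.
std::uint32_t sign_extend(std::uint16_t v) {
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int16_t>(v)));
}

// Unsigned division done on the widened operands, truncated back to 16 bits.
std::uint16_t udiv_widened(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>(a / b);
}
```

With 0xFFFF / 2, zero extension gives 0x7FFF as expected, while sign
extension turns the dividend into 0xFFFFFFFF and the truncated quotient
comes out as 0xFFFF.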

Andrew

amdgcn: Delete inactive libfuncs

The HImode libfuncs weren't called and trying to enable them fails because
TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness
isn't known.

libgcc/ChangeLog:

* config/gcn/lib2-gcn.h (QItype, UQItype, HItype, UHItype): Delete.
(__divhi3, __modhi3, __udivhi3, __umodhi3): Delete.
* config/gcn/t-amdgcn: Don't build lib2-divmod-hi.c.
* config/gcn/lib2-divmod-hi.c: Removed.

diff --git a/libgcc/config/gcn/lib2-divmod-hi.c 
b/libgcc/config/gcn/lib2-divmod-hi.c
deleted file mode 100644
index f4584aabcd9..000
--- a/libgcc/config/gcn/lib2-divmod-hi.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
-   Contributed by Altera and Mentor Graphics, Inc.
-
-This file is free software; you can redistribute it and/or modify it
-under the terms of the GNU General Public License as published by the
-Free Software Foundation; either version 3, or (at your option) any
-later version.
-
-This file is distributed in the hope that it will be useful, but
-WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-General Public License for more details.
-
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
-
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-.  */
-
-#include "lib2-gcn.h"
-
-/* 16-bit HI divide and modulo as used in gcn.  */
-
-static UHItype
-udivmodhi4 (UHItype num, UHItype den, word_type modwanted)
-{
-  UHItype bit = 1;
-  UHItype res = 0;
-
-  while (den < num && bit && !(den & (1L<<15)))
-{
-  den <<=1;
-  bit <<=1;
-}
-  while (bit)
-{
-  if (num >= den)
-   {
- num -= den;
- res |= bit;
-   }
-  bit >>=1;
-  den >>=1;
-}
-  if (modwanted)
-return num;
-  return res;
-}
-
-
-HItype
-__divhi3 (HItype a, HItype b)
-{
-  word_type neg = 0;
-  HItype res;
-
-  if (a < 0)
-{
-  a = -a;
-  neg = !neg;
-}
-
-  if (b < 0)
-{
-  b = -b;
-  neg = !neg;
-}
-
-  res = udivmodhi4 (a, b, 0);
-
-  if (neg)
-res = -res;
-
-  return res;
-}
-
-
-HItype
-__modhi3 (HItype a, HItype b)
-{
-  word_type neg = 0;
-  HItype res;
-
-  if (a < 0)
-{
-  a = -a;
-  neg = 1;
-}
-
-  if (b < 0)
-b = -b;
-
-  res = udivmodhi4 (a, b, 1);
-
-  if (neg)
-res = -res;
-
-  return res;
-}
-
-
-UHItype
-__udivhi3 (UHItype a, UHItype b)
-{
-  return udivmodhi4 (a, b, 0);
-}
-
-
-UHItype
-__umodhi3 (UHItype a, UHItype b)
-{
-  return udivmodhi4 (a, b, 1);
-}
-
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 645245b2128..67ad9bafc19 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -27,10 +27,6 @@
 
 /* Types.  */
 
-typedef char QItype __attribute__ ((mode (QI)));
-typedef unsigned char UQItype __attribute__ ((mode (QI)));
-typedef short HItype __attribute__ ((mode (HI)));
-typedef unsigned short UHItype __attribute__ ((mode (HI)));
 typedef int SItype __attribute__ ((mode (SI)));
 typedef unsigned int USItype __attribute__ ((mode (SI)));
 typedef int DItype __attribute__ ((mode (DI)));
@@ -48,10 +44,6 @@ extern SItype __divsi3 (SItype, SItype);
 extern SItype __modsi3 (SItype, SItype);
 extern USItype __udivsi3 (USItype, USItype);
 extern USItype __umodsi3 (USItype, USItype);
-extern HItype __divhi3 (HItype, HItype);
-extern HItype __modhi3 (HItype, HItype);
-extern UHItype __udivhi3 (UHItype, UHItype);
-extern UHItype __umodhi3 (UHItype, UHItype);
 extern SItype __mulsi3 (SItype, SItype);
 
 #endif /* LIB2_GCN_H */
diff --git a/libgcc/config/gcn/t-amdgcn b/libgcc/config/gcn/t-amdgcn
index 38bde54a096..e64953e6185 100644
--- a/libgcc/config/gcn/t-amdgcn
+++ b/libgcc/config/gcn/t-amdgcn
@@ -1,6 +1,5 @@
 LIB2ADD += $(srcdir)/config/gcn/atomic.c \
   $(srcdir)/config/gcn/lib2-divmod.c \
-  

Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
>
> > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > > this was suggested earlier somewhere, but I can not find the thread.
> > > C++ has assume attribute that expands into
> > >   if (conditional)
> > > __builtin_unreachable ()
> > > We do not want to account the conditional in inline heuristics since
> > > we know that it is going to be optimized out.
> > >
> > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > there are no complaints.
> >
> > I think we also had the request to not account the condition feeding
> > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > complex range comparisons here.  Also ...
>
> I was thinking of this: it depends on how smart do we want to get.
> We also have dead conditionals guarding clobbers, predicts and other
> stuff.  In general we can use mark phase of cd-dce telling it to ignore
> those statements and then use its result in the analysis.

Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
which might be a way to do this (of course we cannot use that 1:1).  Also
then consider

 a = a + 1;
 if (a > 10)
   __builtin_unreachable ();
 if (a < 5)
   __builtin_unreachable ();

and a has more than one use but both are going away.  So indeed a
more global analysis would be needed to get the full benefit.
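
A compilable version of that snippet, as a sketch: the two guards carry
range information only and are expected to generate no code once
optimized, which is why counting them (and the statements feeding only
them) in the inline size estimate overstates the cost.  The function
name bump is made up for this example, and calling it with a value
outside 4..9 is undefined behaviour.

```cpp
// Valid only for 4 <= a <= 9; anything else reaches __builtin_unreachable.
int bump(int a) {
  a = a + 1;
  if (a > 10)
    __builtin_unreachable();
  if (a < 5)
    __builtin_unreachable();
  return a;
}
```

Here the increment also feeds the return value, so it stays live; in the
case discussed above all remaining uses are the guards themselves.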

> Also question is how much we can rely on middle-end optimizing out
> unreachables.  For example:
> int a;
> int b[3];
> test()
> {
>   if (a > 0)
> {
>   for (int i = 0; i < 3; i++)
>   b[i] = 0;
>   __builtin_unreachable ();
> }
> }
>
> IMO can be optimized to empty function.  I believe we used to have code
> in tree-cfgcleanup to remove statements just before
> __builtin_unreachable which can not terminate execution, but perhaps it
> existed only in my local tree?

I think we rely on DCE/DSE here and explicit unreachable () pruning after
VRP picked up things (I think it simply gets the secondary effect optimizing
the condition it created the range for in the first pass).

DSE is apparently not able to kill the stores, I will fix that.  I think
DCE can, but only for non-aliased stores.

> We could also perhaps declare unreachable NOVOPs which would make DSE to
> remove the stores.

But only because of a bug in DSE ... it also removes them if that
__builtin_unreachable () is GIMPLE_RESX.

> >
> > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
>
> You are right.  I tested it with -funreachable-traps but it does not do
> what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
>
> Also if I try to call it by hand I get:
>
> jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps 
> -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> t.c: In function ‘test’:
> t.c:9:13: warning: implicit declaration of function 
> ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? 
> [-Wimplicit-function-declaration]
> 9 | __builtin_unreachable_trap ();
>   | ^~
>   | __builtin_unreachable trap
>
> Which is not as helpful as it is trying to be.
> >
> > > +ret = true;
> > > +done:
> > > +  for (basic_block vbb:visited_bbs)
> > > +cache[vbb->index] = (unsigned char)ret + 1;
> > > +  return ret;
> > > +}
> > > +
> > >  /* Analyze function body for NODE.
> > > EARLY indicates run from early optimization pipeline.  */
> > >
> > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, 
> > > bool early)
> > >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > +  auto_vec cache;
> > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> >
> > A sbitmap with two bits per entry would be more space efficient here.  
> > bitmap
> > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > adding the corresponding to sbitmap.h would likely ease use there as well.
>
> I did not know about the chunk API which is certainly nice :)
> sbitmap will always allocate, while here we stay on stack for small
> functions and I am not sure how much extra bit operations would not
> offset extra memset, but overall I think it is all in noise.

Ah, yeah.
> Honza
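
For reference, a two-bits-per-entry cache of the kind suggested above
could look roughly like this.  It is a standalone sketch with a made-up
class name (TwoBitCache), not the bitmap/sbitmap chunk API, and it uses
the same encoding as the patch: 0 for "not computed", ret + 1 otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class TwoBitCache {
 public:
  // n entries, two bits each, all initially 0 ("not computed").
  explicit TwoBitCache(std::size_t n) : words_((n * 2 + 63) / 64, 0) {}

  unsigned get(std::size_t i) const {
    return static_cast<unsigned>((words_[i / 32] >> ((i % 32) * 2)) & 3u);
  }

  void set(std::size_t i, unsigned v) {
    std::uint64_t& w = words_[i / 32];
    const unsigned shift = static_cast<unsigned>((i % 32) * 2);
    w = (w & ~(std::uint64_t{3} << shift))
        | (static_cast<std::uint64_t>(v & 3u) << shift);
  }

 private:
  std::vector<std::uint64_t> words_;  // 32 entries per 64-bit word
};
```

This uses a quarter of the memory of one byte per basic block, at the
price of a shift and mask per access and, in this sketch, a heap
allocation even for small functions, which is the trade-off noted above.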


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches wrote:
> - if (max_size() - size() < __n)
> -   __throw_length_error(__N(__s));
> + const size_type __max_size = max_size();
> + // On 64bit systems vectors can not reach overflow by growing
> + // by small sizes; before this happens, we will run out of memory.
> + if (__builtin_constant_p(__n)
> + && __builtin_constant_p(__max_size)
> + && sizeof(ptrdiff_t) >= 8
> + && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)

Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?

Jakub
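
One way to sidestep the question, sketched here with hypothetical helper
names, is to compare against the threshold divided by the element size
instead of multiplying.  Whether the product can actually wrap depends
on what max_size() returns for the allocator in use; the sketch only
shows that the two forms differ when it does.

```cpp
#include <cstdint>

// Naive form: the multiplication can wrap around in 64-bit arithmetic.
constexpr bool reaches_naive(std::uint64_t max_size, std::uint64_t elem_size,
                             std::uint64_t threshold) {
  return max_size * elem_size >= threshold;
}

// Division form: max_size * elem_size >= threshold is equivalent to
// max_size >= ceil(threshold / elem_size), with no multiplication.
// elem_size must be nonzero.
constexpr bool reaches_safe(std::uint64_t max_size, std::uint64_t elem_size,
                            std::uint64_t threshold) {
  return max_size >= threshold / elem_size
                         + (threshold % elem_size != 0 ? 1u : 0u);
}
```

With max_size = 2^61 and an element size of 8, the product is 2^64,
which wraps to 0, so the naive test against 2^60 fails although the true
product is far above the threshold; the division form gives the
expected answer.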



RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 19, 2023 11:19 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] Remove -save-temps from tests using -flto
> > 
> > On Mon, 19 Jun 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, June 19, 2023 7:28 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Tamar Christina 
> > > > Subject: [PATCH] Remove -save-temps from tests using -flto
> > > >
> > > > The following removes -save-temps that doesn't seem to have any good
> > > > reason from tests that also run with -flto added.  That can cause
> > > > ltrans files to race with other multilibs tested and I'm frequently
> > > > seeing linker complaints that the architecture doesn't match here.
> > > >
> > > > I'm not sure whether the .ltrans.o files end up in a non gccN/
> > > > specific directory or if we end up sharing the same dir for
> > > > different multilibs (not sure if it's easily possible to avoid that).
> > > >
> > > > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > > >
> > > > Tamar, was there any reason to use -save-temps here?
> > >
> > > At the time I was getting unresolved errors from these without it.
> > > But perhaps that's something to do with dejagnu versions?
> > 
> > I don't know.  Can you check if there's an issue on your side when
> > removing -save-temps?
> 
> Nope no issues, all tests still pass.

Pushed then.

Richard.


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jan Hubicka via Gcc-patches
> > -   if (max_size() - size() < __n)
> > - __throw_length_error(__N(__s));
> > +   // On 64bit systems vectors of small sizes can not
> > +   // reach overflow by growing by small sizes; before
> > +   // this happens, we will run out of memory.
> > +   if (__builtin_constant_p (sizeof (_Tp))
> >
> 
> This shouldn't be here, of course sizeof is a constant.
OK :)
> 
> No space before the opening parens, libstdc++ doesn't follow GNU style.
Fixed.
> 
> 
> 
> > +   && __builtin_constant_p (__n)
> > +   && sizeof (ptrdiff_t) >= 8
> > +   && __n < max_size () / 2)
> >
> 
> This check is not OK. As I said in Bugzilla just now, max_size() depends on
> the allocator, which could return something much smaller than PTRDIFF_MAX.
> You can't make this assumption for all specializations of std::vector.
> 
> If Alloc::max_size() == 100 and this->size() == 100 then this function
> needs to throw length_error for *any* n. In the general case you cannot
> remove size() from this condition.
> 
> For std::allocator it's safe to assume that max_size() is related to
> PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.

Here is the updated version.  I simply test max_size with
__builtin_constant_p and check that it is large enough.  For that we
need to copy it into a temporary variable, since we constant-fold
__builtin_constant_p (function (x)) early, before the function gets
inlined.

I also added __builtin_unreachable to let the compiler determine the
return value range, as discussed in the PR.

Honza
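
For readers without the header at hand, the general (non-constant) path
of the check amounts to the following standalone sketch.  The free
function check_len and its parameters are made up for illustration; the
real member uses size(), max_size() and __throw_length_error.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Returns the new capacity when growing a container of `size` elements
// by at least `n`, never exceeding `max_size`.  Assumes size <= max_size.
std::size_t check_len(std::size_t size, std::size_t n, std::size_t max_size) {
  if (max_size - size < n)
    throw std::length_error("check_len");

  // Grow by the larger of the current size (doubling) and the request.
  const std::size_t len = size + std::max(size, n);
  // Clamp on wrap-around or when doubling would exceed the limit.
  return (len < size || len > max_size) ? max_size : len;
}
```

With a limit of 100, growing 10 elements by 1 yields 20, growing 60
elements by 1 clamps to 100, and growing 100 elements by anything
throws.  The fast path in the patch skips the throw and the clamp only
when both n and the limit are compile-time constants and the limit is
too large to be reached in practice.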

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..7a1966405ca 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1895,11 +1895,29 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   size_type
   _M_check_len(size_type __n, const char* __s) const
   {
-   if (max_size() - size() < __n)
- __throw_length_error(__N(__s));
+   const size_type __max_size = max_size();
+   // On 64bit systems vectors can not reach overflow by growing
+   // by small sizes; before this happens, we will run out of memory.
+   if (__builtin_constant_p(__n)
+   && __builtin_constant_p(__max_size)
+   && sizeof(ptrdiff_t) >= 8
+   && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
+   && __n < __max_size / 2)
+ {
+   const size_type __len = size() + (std::max)(size(), __n);
+   // let compiler know that __len has sane value range.
+   if (__len < __n || __len >= __max_size)
+ __builtin_unreachable();
+   return __len;
+ }
+   else
+ {
+   if (__max_size - size() < __n)
+ __throw_length_error(__N(__s));
 
-   const size_type __len = size() + (std::max)(size(), __n);
-   return (__len < size() || __len > max_size()) ? max_size() : __len;
+   const size_type __len = size() + (std::max)(size(), __n);
+   return (__len < size() || __len > __max_size) ? __max_size : __len;
+ }
   }
 
   // Called by constructors to check initial size.


RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 11:19 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Remove -save-temps from tests using -flto
> 
> On Mon, 19 Jun 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 19, 2023 7:28 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Tamar Christina 
> > > Subject: [PATCH] Remove -save-temps from tests using -flto
> > >
> > > The following removes -save-temps that doesn't seem to have any good
> > > reason from tests that also run with -flto added.  That can cause
> > > ltrans files to race with other multilibs tested and I'm frequently
> > > seeing linker complaints that the architecture doesn't match here.
> > >
> > > I'm not sure whether the .ltrans.o files end up in a non gccN/
> > > specific directory or if we end up sharing the same dir for
> > > different multilibs (not sure if it's easily possible to avoid that).
> > >
> > > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Tamar, was there any reason to use -save-temps here?
> >
> > At the time I was getting unresolved errors from these without it.
> > But perhaps that's something to do with dejagnu versions?
> 
> I don't know.  Can you check if there's an issue on your side when removing -
> save-temps?

Nope no issues, all tests still pass.

Regards,
Tamar
> 
> Richard.
> 
> > Tamar
> >
> > >
> > >   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
> > >   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
> > >  9 files changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > index e9ec9603af6..e6810433d70 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > index 06c103d3885..f83078b5d51 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } }
> > > */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" }
> > > */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git 

[PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Toru Kisuki via Gcc-patches
Hi,


With -O3 -fsignaling-nans -fno-signed-zeros, the compiler should not simplify
'x + 0.0' to 'x'.
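
A minimal sketch of why the fold is observable, assuming IEEE 754 arithmetic and a target where plain loads and stores do not quiet a signaling NaN (the helper name add_zero_raises_invalid is made up for illustration and is not part of the patch): adding 0.0 to a signaling NaN raises the invalid exception, whereas returning x unchanged would not.

```cpp
#include <cfenv>
#include <limits>   // only needed for the usage described below

// Hypothetical helper, not part of the patch: reports whether evaluating
// x + 0.0 at run time raises FE_INVALID.  The volatile accesses keep the
// compiler from folding the addition away at compile time.
bool add_zero_raises_invalid(double x)
{
  volatile double in = x;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double out = in + 0.0;
  (void) out;
  return std::fetestexcept(FE_INVALID) != 0;
}
```

Under those assumptions, calling it with std::numeric_limits<double>::signaling_NaN() should report the exception, while an ordinary value such as 1.0 should not.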


GCC Bugzilla : Bug 110305


gcc/ChangeLog:

2023-06-19  Toru Kisuki  

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):

---
 gcc/simplify-rtx.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index e152918b0f1..cc96b36ad4e 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2698,7 +2698,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
 when x is NaN, infinite, or finite and nonzero.  They aren't
 when x is -0 and the rounding mode is not towards -infinity,
 since (-0) + 0 is then 0.  */
-  if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
+  if (!HONOR_SIGNED_ZEROS (mode) && !HONOR_SNANS (mode)
+  && trueop1 == CONST0_RTX (mode))
return op0;

   /* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
--
2.38.1



Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c' (was: [committed] libgomp.c/target-51.c: Accept more error-msg variants in dg-output (was: Re: [committed] libgomp: Fix OMP_TARGET_OFFLOAD

2023-06-19 Thread Thomas Schwinge
Hi!

On 2023-06-19T10:02:58+0200, Tobias Burnus  wrote:
> On 16.06.23 22:42, Thomas Schwinge wrote:
>> I see the new tests PASS, but with offloading enabled (nvptx) also see:
>>
>>  PASS: libgomp.c/target-51.c (test for excess errors)
>>  PASS: libgomp.c/target-51.c execution test
>>  [-PASS:-]{+FAIL:+} libgomp.c/target-51.c output pattern test
>>
>> ... due to:
>>
>>  Output was:
>>
>>  libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be 
>> used for offloading
>>
>>  Should match:
>>  .*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not 
>> found.*
>
> Thanks for the report. I can offer yet another wording for the same program – 
> and also
> with nvptx enabled:
>
> libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used 
> for offloading
>
> And I can also offer (which is already in the testcase with "! 
> offload_device"):
>
> libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is 
> available
>
> I think I will just match "..., but .*" without distinguishing 
> check_effective_target_* ...
>
> ... which I now did in commit r14-1926-g01fe115ba7eafe (see also attached 
> patch).

Pushed commit de2d3b69eefde005759279d6739d9a0dbd2a05cc
"Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'",
see attached.


Regards
 Thomas


> * * *
>
> With offloading, there are simply too many possibilities:
>
> * Not compiled with offloading support - vs. with (ENABLE_OFFLOADING)
> * Support compiled in but either compiler or library support not installed
>(requires configuring with --enable-offload-defaulted)
> * Offloading libgomp plugins there but no CUDA or hsa runtime libraries
> * The latter being installed but no device available
>
> Plus -foffload=disable or only enabling an (at runtime) unavailable or
> unsupported device type or other issues like CUDA and device present but
> an issue with the kernel driver (or similar half-broken states) or ...
>
> [And with remote testing issues related to dg-set-target-env-var and only
> few systems supporting offloading, a full test coverage is even harder.]
>
> Tobias


From de2d3b69eefde005759279d6739d9a0dbd2a05cc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 19 Jun 2023 12:20:15 +0200
Subject: [PATCH] Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'

ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}"

Fix-up for recent commit 01fe115ba7eafebcf97bbac9e157038a003d0c85
"libgomp.c/target-51.c: Accept more error-msg variants in dg-output".

	libgomp/
	* testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax
	error.
---
 libgomp/testsuite/libgomp.c/target-51.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c/target-51.c b/libgomp/testsuite/libgomp.c/target-51.c
index db0363bfc14..7ff8122861f 100644
--- a/libgomp/testsuite/libgomp.c/target-51.c
+++ b/libgomp/testsuite/libgomp.c/target-51.c
@@ -9,7 +9,7 @@
 
 /* See comment in target-50.c/target-50.c for why the output differs.  */
 
-/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*" } } */
+/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but .*" } */
 
 int
 main ()
-- 
2.34.1



RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 19, 2023 7:28 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Tamar Christina 
> > Subject: [PATCH] Remove -save-temps from tests using -flto
> > 
> > The following removes -save-temps that doesn't seem to have any good
> > reason from tests that also run with -flto added.  That can cause ltrans 
> > files to
> > race with other multilibs tested and I'm frequently seeing linker complaints
> > that the architecture doesn't match here.
> > 
> > I'm not sure whether the .ltrans.o files end up in a non gccN/ specific 
> > directory
> > or if we end up sharing the same dir for different multilibs (not sure if 
> > it's easily
> > possible to avoid that).
> > 
> > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > 
> > Tamar, was there any reason to use -save-temps here?
> 
> At the time I was getting unresolved errors from these without it.
> But perhaps that's something to do with dejagnu versions?

I don't know.  Can you check if there's an issue on your side when
removing -save-temps?

Richard.

> Tamar
> 
> > 
> > * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
> > * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
> > * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
> >  9 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > index e9ec9603af6..e6810433d70 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > index 06c103d3885..f83078b5d51 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > index 059bfb3ae62..e33a824df07 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > index 059bfb3ae62..e33a824df07 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > 
> >  #include 
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > index 91b82fb5988..8895d5c263c 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> >  /* { dg-do run } */
> > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > +/* { dg-additional-options "-O3 -fdump-tree-dce 

Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Jan Hubicka via Gcc-patches
> On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > this was suggested earlier somewhere, but I can not find the thread.
> > C++ has assume attribute that expands into
> >   if (conditional)
> > __builtin_unreachable ()
> > We do not want to account the conditional in inline heuristics since
> > we know that it is going to be optimized out.
> >
> > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > there are no complaints.
> 
> I think we also had the request to not account the condition feeding
> stmts (if they only feed it and have no side-effects).  libstdc++ has
> complex range comparisons here.  Also ...

I was thinking of this: it depends on how smart we want to get.
We also have dead conditionals guarding clobbers, predicts and other
stuff.  In general we can use the mark phase of cd-dce, telling it to ignore
those statements, and then use its result in the analysis.

Another question is how much we can rely on the middle-end optimizing out
unreachables.  For example:
int a;
int b[3];
void test()
{
  if (a > 0)
    {
      for (int i = 0; i < 3; i++)
        b[i] = 0;
      __builtin_unreachable ();
    }
}
IMO this can be optimized to an empty function.  I believe we used to have
code in tree-cfgcleanup to remove statements just before
__builtin_unreachable which can not terminate execution, but perhaps it
existed only in my local tree?
We could also perhaps declare unreachable as NOVOPS, which would let DSE
remove the stores.
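
For reference, the guard pattern under discussion can be sketched as follows.  This is only an illustration: assume_true and last_index are made-up names, and __builtin_unreachable is the GCC/Clang builtin.  The conditional exists purely to pass a fact to the optimizer and is expected to disappear from the generated code, which is why counting it in the inline heuristics overestimates the function size.

```cpp
// Hypothetical helper mirroring what an assume-style guard expands to:
// a conditional whose only job is to reach __builtin_unreachable.
inline void assume_true(bool cond)
{
  if (!cond)
    __builtin_unreachable();
}

// The guard promises len > 0, so the subtraction cannot wrap around.
// The comparison feeding the guard has no other use.
unsigned last_index(unsigned len)
{
  assume_true(len > 0);
  return len - 1;
}
```

Callers must of course only pass values that satisfy the guard; violating it is undefined behaviour.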

> 
> ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

You are right.  I tested it with -funreachable-traps, but it does not do
what I expected; I need -fsanitize=unreachable -fsanitize-trap=unreachable.

Also if I try to call it by hand I get:

jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
t.c: In function ‘test’:
t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
9 | __builtin_unreachable_trap ();
  | ^~
  | __builtin_unreachable trap

Which is not as helpful as it is trying to be.
> 
> > +ret = true;
> > +done:
> > +  for (basic_block vbb:visited_bbs)
> > +cache[vbb->index] = (unsigned char)ret + 1;
> > +  return ret;
> > +}
> > +
> >  /* Analyze function body for NODE.
> > EARLY indicates run from early optimization pipeline.  */
> >
> > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool 
> > early)
> >const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> >order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > +  auto_vec cache;
> > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> 
> A sbitmap with two bits per entry would be more space efficient here.  bitmap
> has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> adding the corresponding to sbitmap.h would likely ease use there as well.

I did not know about the chunk API, which is certainly nice :)
sbitmap will always allocate, while here we stay on the stack for small
functions, and I am not sure whether the extra bit operations would be
offset by the cheaper memset, but overall I think it is all in the noise.
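
To make the trade-off concrete, here is a rough sketch of a two-bits-per-entry cache.  It is not GCC's sbitmap or bitmap API; TwoBitCache is a made-up class, and the state encoding (0 = not computed, 1 = false, 2 = true) is assumed from the "(unsigned char)ret + 1" store in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: packs one 2-bit state per entry, 32 entries per word.
class TwoBitCache
{
public:
  explicit TwoBitCache(std::size_t n) : words_((n + 31) / 32, 0) {}

  // Returns the 2-bit state stored for entry i.
  unsigned get(std::size_t i) const
  {
    return static_cast<unsigned>((words_[i / 32] >> ((i % 32) * 2)) & 3u);
  }

  // Overwrites the 2-bit state of entry i with the low two bits of v.
  void set(std::size_t i, unsigned v)
  {
    const unsigned shift = static_cast<unsigned>((i % 32) * 2);
    const std::uint64_t mask = std::uint64_t{3} << shift;
    words_[i / 32] = (words_[i / 32] & ~mask)
                     | (static_cast<std::uint64_t>(v & 3u) << shift);
  }

private:
  std::vector<std::uint64_t> words_;
};
```

This uses a quarter of the memory of one byte per block, at the cost of a shift and mask on every access and, in this sketch, a heap allocation even for small functions.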

Honza


[PATCH] debug/110295 - mixed up early/late debug for member DIEs

2023-06-19 Thread Richard Biener via Gcc-patches
When we process a scope typedef during early debug creation and
we have already created a DIE for the type when the decl is
TYPE_DECL_IS_STUB and this DIE is still in limbo we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, eventually picking up the now completed
type and creating DIEs for the members.  Instead this is currently
deferred to the second time we come here, when we annotate the
DIEs with locations late, where now the type DIE is no longer
in limbo and we fall through doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.
---
 gcc/dwarf2out.cc  |  3 ++-
 gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, 
dw_die_ref context_die)
 
   if (die != NULL && die->die_parent == NULL)
 add_child_die (context_die, die);
-  else if (TREE_CODE (decl_or_origin) == IMPORTED_DECL)
+
+  if (TREE_CODE (decl_or_origin) == IMPORTED_DECL)
 {
   if (early_dwarf)
dwarf2out_imported_module_or_decl_1 (decl_or_origin, DECL_NAME 
(decl_or_origin),
diff --git a/gcc/testsuite/g++.dg/debug/pr110295.C 
b/gcc/testsuite/g++.dg/debug/pr110295.C
new file mode 100644
index 000..10cad557095
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/pr110295.C
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-options "-g" }
+
+template <typename T>
+struct QCachedT
+{
+  void operator delete(void *, T *) {}
+};
+template<int>
+void exercise()
+{
+  struct thing_t
+: QCachedT<thing_t>
+  {
+  };
+  thing_t *list[1];
+  new thing_t; // { dg-warning "" }
+}
+int main() { exercise<1>(); }
-- 
2.35.3


Re: [libstdc++] Improve M_check_len

2023-06-19 Thread Jonathan Wakely via Gcc-patches
On Sun, 18 Jun 2023 at 19:37, Jan Hubicka  wrote:

> Hi,
> _M_check_len is used in vector reallocations. It computes __n + __s but
> also checks for the case that (__n + __s) * sizeof (Tp) would overflow
> ptrdiff_t.  Since we know that __s is the size of an already allocated
> memory block, if __n is not too large this will never happen on 64bit
> systems since memory is not that large.  This patch adds
> __builtin_constant_p checks for this case.  This reduces the size of the
> fully inlined push_back function that is critical for loops that are
> controlled by a std::vector based stack.
>
> With the patch to optimize std::max and to handle SRA candidates, we
> now fully inline push_back with -O3 (not with -O2); however, there are
> still quite a few silly things, for example:
>
>   //  _78 is original size of the allocated vector.
>
>   _76 = stack$_M_end_of_storage_177 - _142;
>   _77 = _76 /[ex] 8;
>   _78 = (long unsigned int) _77;
>   _79 = MAX_EXPR <_78, 1>;
>   _80 = _78 + _79; // this is result of _M_check_len doubling the
> allocated vector size.
>   if (_80 != 0)// result will always be non-zero.
> goto ; [54.67%]
>   else
> goto ; [45.33%]
>
>[local count: 30795011]:
>   if (_80 > 1152921504606846975)  // doubling successfully allocated
> memory will never get so large.
> goto ; [10.00%]
>   else
> goto ; [90.00%]
>
>[local count: 3079501]:
>   if (_80 > 2305843009213693951)  // I wonder if we really want to have
> two different throws
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 1539750]:
>   std::__throw_bad_array_new_length ();
>
>[local count: 1539750]:
>   std::__throw_bad_alloc ();
>
>[local count: 27715510]:
>   _108 = _80 * 8;
>   _109 = operator new (_108);
>
> Maybe we want to add an assumption that the result of the function is
> never greater than max_size to get rid of the two checks above.  However,
> this will still be recognized only after inlining and will continue to
> confuse inliner heuristics.
>
> Bootstrapped/regtested x86_64-linux.  I am not too familiar with libstdc++
> internals,
> so would welcome comments and ideas.
>
> libstdc++-v3/ChangeLog:
>
> PR tree-optimization/110287
> * include/bits/stl_vector.h: Optimize _M_check_len for constantly
> sized
> types and allocations.
>
> diff --git a/libstdc++-v3/include/bits/stl_vector.h
> b/libstdc++-v3/include/bits/stl_vector.h
> index 70ced3d101f..3ad59fe3e2b 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -1895,11 +1895,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>size_type
>_M_check_len(size_type __n, const char* __s) const
>{
> -   if (max_size() - size() < __n)
> - __throw_length_error(__N(__s));
> +   // On 64bit systems vectors of small sizes can not
> +   // reach overflow by growing by small sizes; before
> +   // this happens, we will run out of memory.
> +   if (__builtin_constant_p (sizeof (_Tp))
>

This shouldn't be here, of course sizeof is a constant.

No space before the opening parens, libstdc++ doesn't follow GNU style.



> +   && __builtin_constant_p (__n)
> +   && sizeof (ptrdiff_t) >= 8
> +   && __n < max_size () / 2)
>

This check is not OK. As I said in Bugzilla just now, max_size() depends on
the allocator, which could return something much smaller than PTRDIFF_MAX.
You can't make this assumption for all specializations of std::vector.

If Alloc::max_size() == 100 and this->size() == 100 then this function
needs to throw length_error for *any* n. In the general case you cannot
remove size() from this condition.

For std::allocator it's safe to assume that max_size() is related to
PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.
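
To illustrate the point with a sketch (SmallAlloc and push_back_throws_at_limit are hypothetical names invented for this example, not anything in libstdc++): an allocator may cap max_size() far below PTRDIFF_MAX, and then growth by a single element has to throw once the vector is full.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical allocator that forwards to std::allocator but reports a
// tiny max_size().
template <typename T>
struct SmallAlloc
{
  using value_type = T;

  SmallAlloc() = default;
  template <typename U>
  SmallAlloc(const SmallAlloc<U> &) noexcept {}

  T *allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T *p, std::size_t n) { std::allocator<T>().deallocate(p, n); }

  std::size_t max_size() const noexcept { return 100; }
};

template <typename T, typename U>
bool operator==(const SmallAlloc<T> &, const SmallAlloc<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const SmallAlloc<T> &, const SmallAlloc<U> &) { return false; }

// Fills a vector up to the allocator's limit, then tries to grow by one.
bool push_back_throws_at_limit()
{
  std::vector<int, SmallAlloc<int>> v(100);
  try
    {
      v.push_back(0);   // size() == max_size(), so this must not succeed
    }
  catch (const std::length_error &)
    {
      return true;
    }
  return false;
}
```

A fast path that only looks at __n and skips the size() check would silently grow past the limit in this situation.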



> + return size() + (std::max)(size(), __n);
>
> +   else
> + {
> +   if (max_size() - size() < __n)
> + __throw_length_error(__N(__s));
>
> -   const size_type __len = size() + (std::max)(size(), __n);
> -   return (__len < size() || __len > max_size()) ? max_size() : __len;
> +   const size_type __len = size() + (std::max)(size(), __n);
> +   return (__len < size() || __len > max_size()) ? max_size() :
> __len;
> + }
>}
>
>// Called by constructors to check initial size.
>
>


Re: [PATCH] [contrib] validate_failures.py: Don't consider summary line in wrong place

2023-06-19 Thread Thiago Jung Bauermann via Gcc-patches


Jeff Law  writes:

> On 6/16/23 06:02, Thiago Jung Bauermann via Gcc-patches wrote:
>> contrib/ChangeLog:
>>  * testsuite-management/validate_failures.py (IsInterestingResult):
>>  Add result_set argument and use it.  Adjust callers.
> Thanks.  I pushed this to the trunk.

Thank you!

-- 
Thiago


RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 7:28 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina 
> Subject: [PATCH] Remove -save-temps from tests using -flto
> 
> The following removes -save-temps that doesn't seem to have any good
> reason from tests that also run with -flto added.  That can cause ltrans 
> files to
> race with other multilibs tested and I'm frequently seeing linker complaints
> that the architecture doesn't match here.
> 
> I'm not sure whether the .ltrans.o files end up in a non gccN/ specific 
> directory
> or if we end up sharing the same dir for different multilibs (not sure if 
> it's easily
> possible to avoid that).
> 
> Parallel testing on x86_64-unknown-linux-gnu in progress.
> 
> Tamar, was there any reason to use -save-temps here?

At the time I was getting unresolved errors from these without it.
But perhaps that's something to do with dejagnu versions?

Tamar

> 
>   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
>   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
>  9 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> index e9ec9603af6..e6810433d70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> index 06c103d3885..f83078b5d51 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> index 91b82fb5988..8895d5c263c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> index 59f339fb8c5..77d4deb633c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for 

Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 05:05:48PM +0800, pan2...@intel.com wrote:
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc (
> +MAX_MACHINE_MODE);

Incorrect formatting.  And, see my other mail, this is wrong.

> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }

And these two are wrong as well.  The value passed to bp_pack_enum
has to match the one used on bp_unpack_enum.  But that is not the case
after your changes.  You stream out with the host MAX_MACHINE_MODE, and
stream in for normal LTO with the same value (ok), but for offloading
targets (nvptx, amdgcn) with a different MAX_MACHINE_MODE.  That will
immediately result in LTO streaming being out of sync and ICEs all around.
The reason for using 1 << 8 there was exactly to make it interoperable for
offloading.  What could be perhaps done is that you stream out the
host MAX_MACHINE_MODE value somewhere and stream it in inside of
lto_input_mode_table before you allocate the table.  But, that streamed
in host max_machine_mode has to be remembered somewhere and used e.g. in
bp_unpack_machine_mode instead of MAX_MACHINE_MODE.
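
A toy illustration of the desynchronisation (this is not GCC's bitpack_d API; BitStream, bits_for, pack_enum and unpack_enum are made-up names, and the bit layout is an assumption chosen for simplicity): the number of bits consumed depends on the bound, so a reader using a different bound from the writer misreads every value that follows.

```cpp
#include <cstddef>
#include <vector>

// Toy bit stream, least significant bit first.
struct BitStream
{
  std::vector<bool> bits;
  std::size_t pos = 0;
};

// Number of bits needed to represent values in [0, bound).
unsigned bits_for(unsigned bound)
{
  unsigned n = 0;
  while ((1u << n) < bound)
    ++n;
  return n;
}

// Writes value using as many bits as bound requires.
void pack_enum(BitStream &s, unsigned value, unsigned bound)
{
  for (unsigned i = 0; i < bits_for(bound); ++i)
    s.bits.push_back((value >> i) & 1u);
}

// Reads back as many bits as bound requires (stops at end of stream).
unsigned unpack_enum(BitStream &s, unsigned bound)
{
  unsigned value = 0;
  for (unsigned i = 0; i < bits_for(bound) && s.pos < s.bits.size(); ++i)
    value |= static_cast<unsigned>(s.bits[s.pos++]) << i;
  return value;
}
```

Packing 5 and then 7 with bound 256 and unpacking with bound 256 gives 5 and 7 back.  Unpacking the same stream with bound 300 reads nine bits for the first value and yields 261, because the low bit of the second value is pulled into the first.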

Jakub



Re: [PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, 19 Jun 2023, pan2...@intel.com wrote:

> From: Pan Li 
> 
> We already extended the machine mode from 8 to 16 bits, but one place
> in the tree-streamer was missed: it still has a hard-coded array of
> size 256 for the machine modes.
> 
> In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
> value of MAX_MACHINE_MODE grows as more and more modes are added,
> while the machine mode array in tree-streamer still has size 256.
> 
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table will unexpectedly touch memory out of range.
> 
> This patch takes MAX_MACHINE_MODE as the size of the array in
> tree-streamer, to make sure there is no potential unexpected memory
> access in the future.

Please review more carefully:

void
lto_input_mode_table (struct lto_file_decl_data *file_data)
{
...
  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)

reads 8 bits again.

  ibit = bp_unpack_value (&bp, 8);
  fbit = bp_unpack_value (&bp, 8);

likewise.

Also file_data->mode_table is indexed by the _host_ mode, so you
have to allocate enough space to fill in all streamed modes, but
you are using the target's MAX_MACHINE_MODE here.  I think we
need to stream the host's MAX_MACHINE_MODE.

Richard.


> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * lto-streamer-in.cc (lto_input_mode_table): Use
>   MAX_MACHINE_MODE for memory allocation.
>   * tree-streamer.cc: Use MAX_MACHINE_MODE for array size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
>   (bp_pack_machine_mode): Ditto.
>   (bp_unpack_machine_mode): Ditto.
> ---
>  gcc/lto-streamer-in.cc | 3 ++-
>  gcc/tree-streamer.cc   | 2 +-
>  gcc/tree-streamer.h| 7 ---
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index 2cb83406db5..102b7e18526 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
> *file_data)
>  internal_error ("cannot read LTO mode table from %s",
>   file_data->file_name);
>  
> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (
> +MAX_MACHINE_MODE);
>file_data->mode_table = table;
>const struct lto_simple_header_with_strings *header
>  = (const struct lto_simple_header_with_strings *) data;
> diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
> index ed65a7692e3..a28ef9c7920 100644
> --- a/gcc/tree-streamer.cc
> +++ b/gcc/tree-streamer.cc
> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> During streaming in, we translate the on the disk mode using this
> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
> depending on the mode_table content.  */
> -unsigned char streamer_mode_table[1 << 8];
> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  
>  /* Check that all the TS_* structures handled by the streamer_write_* and
> streamer_read_* routines are exactly ALL the structures defined in
> diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
> index 170d61cf20b..be3a1938e76 100644
> --- a/gcc/tree-streamer.h
> +++ b/gcc/tree-streamer.h
> @@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
>  void streamer_write_integer_cst (struct output_block *, tree);
>  
>  /* In tree-streamer.cc.  */
> -extern unsigned char streamer_mode_table[1 << 8];
> +extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>hashval_t, unsigned *);
> @@ -108,7 +108,7 @@ inline void
>  bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>  {
>streamer_mode_table[mode] = 1;
> -  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
>  }
>  
>  inline machine_mode
> @@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
>  {
>return (machine_mode)
>  ((class lto_input_block *)
> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> + bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
> + MAX_MACHINE_MODE)];
>  }
>  
>  #endif  /* GCC_TREE_STREAMER_H  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 19, 2023 at 08:40:58AM +, Richard Biener wrote:
> You also have to fix bp_pack_machine_mode/bp_unpack_machine_mode which
> stream exactly the values in [0, (1 << 8) - 1].
> 
> CCing Jakub who invented this code.

For stream-out, all it stores is a bool flag whether the mode is streamed
out, and on stream in it contains a mapping table between host and
offloading modes.
For stream in, it actually isn't used despite the comment maybe suggesting
it is, so I guess using MAX_MACHINE_MODE for it is ok.

As you said,
inline void
bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
{
  streamer_mode_table[mode] = 1;
  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
}

inline machine_mode
bp_unpack_machine_mode (struct bitpack_d *bp)
{
  return (machine_mode)
   ((class lto_input_block *)
bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
}
needs changing for the case when MAX_MACHINE_MODE > 256, but far more
places make similar assumptions:
E.g. lto_write_mode_table has
  bp_pack_value (&bp, m, 8);
(if MAX_MACHINE_MODE > 256, this can't encode all modes),
  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
(ditto).
lto_input_mode_table has e.g.
  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
  file_data->mode_table = table;
Here we need to decide if we keep requiring that offloading architectures
still have MAX_MACHINE_MODE <= 256 or not.  Currently the offloading
arches are nvptx and amdgcn, intelmic support has been removed.
If yes, table can have unsigned char elements, but its size actually depends
on the number of modes on the host side, so lto_write_mode_table would need
to stream out the host MAX_MACHINE_MODE value and lto_input_mode_table
would need to stream it in and use it instead of the 1 << 8 size above.
If not, mode_table would need to change from unsigned char * to unsigned
short *, or do so just conditionally depending on whether
MAX_MACHINE_MODE <= 256 or not.
Then
  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
{
...
  machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
again hardcodes 8 bits; that needs to match how many bits the host
compiler packs in lto_write_mode_table.
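
The table sizing and element type choice can be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
names toy_target_max_mode, toy_mode_elt, toy_mode_pair and
build_mode_table are hypothetical, the mode counts are made up, and none
of this is the actual GCC code.  The table is sized by the host mode
count (as it would be after streaming that value in), while the element
type is picked from the reader's own mode count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical mode count of the compiler reading the stream.
constexpr unsigned toy_target_max_mode = 300;

// Narrowest element type able to hold any mode number of the reader:
// one byte if at most 256 modes exist, two bytes otherwise.
using toy_mode_elt
  = std::conditional<(toy_target_max_mode <= 256),
                     std::uint8_t, std::uint16_t>::type;

// One streamed entry: a host mode number and the reader's matching mode.
struct toy_mode_pair
{
  unsigned host_mode;
  unsigned target_mode;
};

// Build the translation table.  It is indexed by HOST mode, so its length
// must be the host's mode count, not the reader's.
inline std::vector<toy_mode_elt>
build_mode_table (unsigned host_max_mode,
                  const std::vector<toy_mode_pair> &pairs)
{
  std::vector<toy_mode_elt> table (host_max_mode, 0);
  for (const toy_mode_pair &p : pairs)
    {
      assert (p.host_mode < host_max_mode);
      assert (p.target_mode < toy_target_max_mode);
      table[p.host_mode] = static_cast<toy_mode_elt> (p.target_mode);
    }
  return table;
}
```

With a host that has 400 modes, build_mode_table (400, ...) yields a
400-entry table even though the reader itself only knows 300 modes.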

Jakub



RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init

2023-06-19 Thread Li, Pan2 via Gcc-patches
Thanks Richard for the review. I went through the occurrences of (1 << 8)
and found another one besides bp. PATCH v2 is updated as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622151.html

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, June 19, 2023 4:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang ; 
kito.ch...@gmail.com; Jakub Jelinek 
Subject: RE: [PATCH v1] RISC-V: Fix out of range memory access when lto mode 
init

On Mon, 19 Jun 2023, Li, Pan2 wrote:

> Adding Richard Biener for review, sorry for the inconvenience.
> 
> Pan
> 
> -Original Message-
> From: Li, Pan2  
> Sent: Monday, June 19, 2023 4:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; Li, 
> Pan2 ; Wang, Yanzhang ; 
> kito.ch...@gmail.com
> Subject: [PATCH v1] RISC-V: Fix out of range memory access when lto mode init
> 
> From: Pan Li 
> 
> We have already extended the machine mode from 8 to 16 bits, but one
> place in the tree-streamer was missed: it still has a hard-coded
> machine mode array of size 256.
> 
> In the LTO pass we memset the array by MAX_MACHINE_MODE entries, and the
> value of MAX_MACHINE_MODE grows as more and more modes are added,
> while the machine mode array in tree-streamer stays at 256.
> 
> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
> lto_output_init_mode_table writes past the end of the array.
> 
> This patch uses MAX_MACHINE_MODE as the size of the array in
> tree-streamer, so that no such out-of-range memory access can
> happen in the future.

You also have to fix bp_pack_machine_mode/bp_unpack_machine_mode which
stream exactly the values in [0, (1 << 8) - 1].

CCing Jakub who invented this code.

Richard.


> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * tree-streamer.cc (streamer_mode_table): Use MAX_MACHINE_MODE
>   as array size.
>   * tree-streamer.h (streamer_mode_table): Ditto.
> ---
>  gcc/tree-streamer.cc | 2 +-
>  gcc/tree-streamer.h  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
> index ed65a7692e3..a28ef9c7920 100644
> --- a/gcc/tree-streamer.cc
> +++ b/gcc/tree-streamer.cc
> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> During streaming in, we translate the on the disk mode using this
> table.  For normal LTO it is set to identity, for ACCEL_COMPILER
> depending on the mode_table content.  */
> -unsigned char streamer_mode_table[1 << 8];
> +unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  
>  /* Check that all the TS_* structures handled by the streamer_write_* and
> streamer_read_* routines are exactly ALL the structures defined in
> diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
> index 170d61cf20b..51a292c8d80 100644
> --- a/gcc/tree-streamer.h
> +++ b/gcc/tree-streamer.h
> @@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
>  void streamer_write_integer_cst (struct output_block *, tree);
>  
>  /* In tree-streamer.cc.  */
> -extern unsigned char streamer_mode_table[1 << 8];
> +extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>hashval_t, unsigned *);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] tree-optimization/110298 - CFG cleanup and stale nb_iterations

2023-06-19 Thread Richard Biener via Gcc-patches
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names.  But we do this only after cleaning
up the CFG which in turn can end up accessing it.  Fixed by
swapping the two.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.

* gcc.dg/torture/pr110298.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110298.c | 20 
 gcc/tree-ssa-loop-ivcanon.cc|  7 ---
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110298.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110298.c 
b/gcc/testsuite/gcc.dg/torture/pr110298.c
new file mode 100644
index 000..139f5c77d89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110298.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+int a, b, c, d, e;
+int f() {
+  c = 0;
+  for (; c >= 0; c--) {
+d = 0;
+for (; d <= 0; d++) {
+  e = 0;
+  for (; d + c + e >= 0; e--)
+;
+  a = 1;
+  b = 0;
+  for (; a; ++b)
+a *= 2;
+  for (; b + d >= 0;)
+return 0;
+}
+  }
+}
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 6a962a9f503..491b57ec0f1 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1520,15 +1520,16 @@ tree_unroll_loops_completely (bool may_increase_size, 
bool unroll_outer)
}
  BITMAP_FREE (fathers);
 
+ /* Clean up the information about numbers of iterations, since
+complete unrolling might have invalidated it.  */
+ scev_reset ();
+
  /* This will take care of removing completely unrolled loops
 from the loop structures so we can continue unrolling now
 innermost loops.  */
  if (cleanup_tree_cfg ())
update_ssa (TODO_update_ssa_only_virtuals);
 
- /* Clean up the information about numbers of iterations, since
-complete unrolling might have invalidated it.  */
- scev_reset ();
  if (flag_checking && loops_state_satisfies_p (LOOP_CLOSED_SSA))
verify_loop_closed_ssa (true);
}
-- 
2.35.3


[PATCH] RISC-V: Fix out of range memory access of machine mode table

2023-06-19 Thread Pan Li via Gcc-patches
From: Pan Li 

We have already extended the machine mode from 8 to 16 bits, but one
place in the tree-streamer was missed: it still has a hard-coded
machine mode array of size 256.

In the LTO pass we memset the array by MAX_MACHINE_MODE entries, and the
value of MAX_MACHINE_MODE grows as more and more modes are added,
while the machine mode array in tree-streamer stays at 256.

Then, when MAX_MACHINE_MODE is greater than 256, the memset in
lto_output_init_mode_table writes past the end of the array.

This patch uses MAX_MACHINE_MODE as the size of the array in
tree-streamer, so that no such out-of-range memory access can
happen in the future.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* lto-streamer-in.cc (lto_input_mode_table): Use
MAX_MACHINE_MODE for memory allocation.
* tree-streamer.cc: Use MAX_MACHINE_MODE for array size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Ditto.
(bp_unpack_machine_mode): Ditto.
---
 gcc/lto-streamer-in.cc | 3 ++-
 gcc/tree-streamer.cc   | 2 +-
 gcc/tree-streamer.h| 7 ---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2cb83406db5..102b7e18526 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1985,7 +1985,8 @@ lto_input_mode_table (struct lto_file_decl_data 
*file_data)
 internal_error ("cannot read LTO mode table from %s",
file_data->file_name);
 
-  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (
+MAX_MACHINE_MODE);
   file_data->mode_table = table;
   const struct lto_simple_header_with_strings *header
 = (const struct lto_simple_header_with_strings *) data;
diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
index ed65a7692e3..a28ef9c7920 100644
--- a/gcc/tree-streamer.cc
+++ b/gcc/tree-streamer.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
During streaming in, we translate the on the disk mode using this
table.  For normal LTO it is set to identity, for ACCEL_COMPILER
depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
+unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 
 /* Check that all the TS_* structures handled by the streamer_write_* and
streamer_read_* routines are exactly ALL the structures defined in
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 170d61cf20b..be3a1938e76 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
 void streamer_write_integer_cst (struct output_block *, tree);
 
 /* In tree-streamer.cc.  */
-extern unsigned char streamer_mode_table[1 << 8];
+extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -108,7 +108,7 @@ inline void
 bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
 {
   streamer_mode_table[mode] = 1;
-  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, mode);
 }
 
 inline machine_mode
@@ -116,7 +116,8 @@ bp_unpack_machine_mode (struct bitpack_d *bp)
 {
   return (machine_mode)
   ((class lto_input_block *)
-   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode,
+   MAX_MACHINE_MODE)];
 }
 
 #endif  /* GCC_TREE_STREAMER_H  */
-- 
2.34.1



Re: Do not account __builtin_unreachable guards in inliner

2023-06-19 Thread Richard Biener via Gcc-patches
On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this was suggested earlier somewhere, but I cannot find the thread.
> C++ has an assume attribute that expands into
>   if (conditional)
> __builtin_unreachable ()
> We do not want to account the conditional in inline heuristics since
> we know that it is going to be optimized out.
>
> Bootstrapped/regtested x86_64-linux, will commit it later today if
> there are no complaints.

I think we also had the request to not account the condition feeding
stmts (if they only feed it and have no side-effects).  libstdc++ has
complex range comparisons here.  Also ...

> gcc/ChangeLog:
>
> * ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
> (analyze_function_body): Do not account conditionals guarding
> builtin_unreachable calls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/ipa/fnsummary-1.c: New test.
>
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index a5f5a50c8a5..987da29ec34 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
>return false;
>  }
>
> +/* Return true if BB is builtin_unreachable.
> +   We skip empty basic blocks, debug statements, clobbers and predicts.
> +   CACHE is used to memoize already analyzed blocks.  */
> +
> +static bool
> +builtin_unreachable_bb_p (basic_block bb, vec<unsigned char> &cache)
> +{
> +  if (cache[bb->index])
> +return cache[bb->index] - 1;
> +  gimple_stmt_iterator si;
> +  auto_vec  visited_bbs;
> +  bool ret = false;
> +  while (true)
> +{
> +  bool empty_bb = true;
> +  visited_bbs.safe_push (bb);
> +  cache[bb->index] = 3;
> +  for (si = gsi_start_nondebug_bb (bb);
> +  !gsi_end_p (si) && empty_bb;
> +  gsi_next_nondebug (&si))
> +   {
> + if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
> + && !gimple_clobber_p (gsi_stmt (si))
> + && !gimple_nop_p (gsi_stmt (si)))
> +   {
> + empty_bb = false;
> + break;
> +   }
> +   }
> +  if (!empty_bb)
> +   break;
> +  else
> +   bb = single_succ_edge (bb)->dest;
> +  if (cache[bb->index])
> +   {
> + ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
> + goto done;
> +   }
> +}
> +  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
> +  || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))

... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

> +ret = true;
> +done:
> +  for (basic_block vbb:visited_bbs)
> +cache[vbb->index] = (unsigned char)ret + 1;
> +  return ret;
> +}
> +
>  /* Analyze function body for NODE.
> EARLY indicates run from early optimization pipeline.  */
>
> @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool 
> early)
>const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> +  auto_vec cache;
> +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));

An sbitmap with two bits per entry would be more space efficient here.  bitmap
has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience;
adding the corresponding functions to sbitmap.h would likely ease use there as well.
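
As a rough sketch of the two-bits-per-entry idea (this is not GCC's
sbitmap or bitmap API; the class two_bit_cache and its members are
hypothetical names for illustration), each 64-bit word holds 32 entries,
which is enough for the four cache states used in the patch (0 unknown,
1 false, 2 true, 3 in progress):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size array of 2-bit values packed 32 per 64-bit word.
class two_bit_cache
{
public:
  explicit two_bit_cache (std::size_t n)
    : words_ ((n + 31) / 32, 0), size_ (n)
  {
  }

  // Return the 2-bit value stored for entry I.
  unsigned get (std::size_t i) const
  {
    assert (i < size_);
    const unsigned shift = static_cast<unsigned> (i % 32) * 2;
    return static_cast<unsigned> ((words_[i / 32] >> shift) & 3u);
  }

  // Store the 2-bit value V for entry I, overwriting any previous value.
  void set (std::size_t i, unsigned v)
  {
    assert (i < size_ && v < 4);
    const unsigned shift = static_cast<unsigned> (i % 32) * 2;
    words_[i / 32] &= ~(std::uint64_t{3} << shift);
    words_[i / 32] |= std::uint64_t{v} << shift;
  }

  // Storage used, in bytes.
  std::size_t bytes () const
  {
    return words_.size () * sizeof (std::uint64_t);
  }

private:
  std::vector<std::uint64_t> words_;
  std::size_t size_;
};
```

For 100 basic blocks this uses 32 bytes instead of the 100 bytes a
one-byte-per-block vector would take.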

>for (n = 0; n < nblocks; n++)
>  {
>bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> @@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool 
> early)
> }
> }
>
> + /* Conditionals guarding __builtin_unreachable will be
> +optimized out.  */
> + if (gimple_code (stmt) == GIMPLE_COND)
> +   {
> + edge_iterator ei;
> + edge e;
> + FOR_EACH_EDGE (e, ei, bb->succs)
> +   if (builtin_unreachable_bb_p (e->dest, cache))
> + {
> +   if (dump_file)
> + fprintf (dump_file,
> +  "\t\tConditional guarding 
> __builtin_unreachable; ignored\n");
> +   this_time = 0;
> +   this_size = 0;
> +   break;
> + }
> +   }
> +
>   /* TODO: When conditional jump or switch is known to be constant, 
> but
>  we did not translate it into the predicates, we really can 
> account
>  just maximum of the possible paths.  */
> diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c 
> b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> new file mode 100644
> index 000..a0ece0c300b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-options "-O2 -fdump-ipa-fnsummary"  } */
> +int
> +test(int a)
> +{
> +   if (a > 10)
> +   __builtin_unreachable ();
> +}
> +/* { dg-final { scan-ipa-dump "Conditional guarding 
