Re: [PATCH] Analyze niter for until-wrap condition [PR101145]

2021-07-01 Thread guojiufu via Gcc-patches

On 2021-07-02 08:51, Bin.Cheng wrote:

On Thu, Jul 1, 2021 at 10:15 PM guojiufu via Gcc-patches
 wrote:


On 2021-07-01 20:35, Richard Biener wrote:
> On Thu, 1 Jul 2021, Jiufu Guo wrote:
>
>> For code like:
>> unsigned foo(unsigned val, unsigned start)
>> {
>>   unsigned cnt = 0;
>>   for (unsigned i = start; i > val; ++i)
>> cnt++;
>>   return cnt;
>> }
>>
>> The number of iterations should be about UINT_MAX - start.
>
> For
>
> unsigned foo(unsigned val, unsigned start)
> {
>   unsigned cnt = 0;
>   for (unsigned i = start; i >= val; ++i)
> cnt++;
>   return cnt;
> }
>
> and val == 0 the loop never terminates.  I don't see anywhere
> in the patch that you disregard GE_EXPR and I remember
> the code handles GE as well as GT?  From a quick look this is
> also not covered by a testcase you add - not exactly sure
> how it would materialize in a miscompilation.

In number_of_iterations_cond, there is code:
if (code == GE_EXPR || code == GT_EXPR
|| (code == NE_EXPR && integer_zerop (iv0->step)))
   {
 std::swap (iv0, iv1);
 code = swap_tree_comparison (code);
   }
It converts "GT/GE" (i >= val) to "LT/LE" (val <= i),
and LE (val <= i) is converted to LT (val - 1 < i).
So, the code is added to number_of_iterations_lt.

But this patch leads to a miscompilation for unsigned "i >= val" because
of the transforms above: converting LE (val <= i) to LT (val - 1 < i)
is not appropriate when val - 1 wraps (e.g. where val = 0).

I don't know where the exact code is, but IIRC, number_of_iteration
handles boundary conditions when transforming <= into <.  You may
check it out.

Yes, in number_of_iterations_le, there is code to check MAX/MIN
if (integer_nonzerop (iv0->step))
  assumption = fold_build2 (NE_EXPR, boolean_type_node,
iv1->base, TYPE_MAX_VALUE (type));
else
  assumption = fold_build2 (NE_EXPR, boolean_type_node,
iv0->base, TYPE_MIN_VALUE (type));

Checking why this code does not help.



Thanks for pointing this out!

I will investigate a way to handle this correctly.
One possible way may be just to return false for this kind of LE.

IIRC, it checks the boundary conditions, either returns false or
simply introduces more assumptions.

Thanks! Adding more assumptions would help.
The code below also runs into an infinite loop; more assumptions may
help with this code.


__attribute__ ((noinline))
unsigned foo(unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; val <= i; i+=16)
cnt++;
  return cnt;
}

foo (4, 8);

Thanks again!


BR,
Jiufu Guo


Any suggestions?

>
>> There is function adjust_cond_for_loop_until_wrap which
>> handles similar work for const bases.
>> Like adjust_cond_for_loop_until_wrap, this patch enhance
>> function number_of_iterations_cond/number_of_iterations_lt
>> to analyze number of iterations for this kind of loop.
>>
>> Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
>>
>> gcc/ChangeLog:
>>
>>  PR tree-optimization/101145
>>  * tree-ssa-loop-niter.c
>>  (number_of_iterations_until_wrap): New function.
>>  (number_of_iterations_lt): Invoke above function.
>>  (adjust_cond_for_loop_until_wrap):
>>  Merge to number_of_iterations_until_wrap.
>>  (number_of_iterations_cond): Update invokes for
>>  adjust_cond_for_loop_until_wrap and number_of_iterations_lt.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR tree-optimization/101145
>>  * gcc.dg/vect/pr101145.c: New test.
>>  * gcc.dg/vect/pr101145.inc: New test.
>>  * gcc.dg/vect/pr101145_1.c: New test.
>>  * gcc.dg/vect/pr101145_2.c: New test.
>>  * gcc.dg/vect/pr101145_3.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/vect/pr101145.c   | 187
>> +
>>  gcc/testsuite/gcc.dg/vect/pr101145.inc |  63 +
>>  gcc/testsuite/gcc.dg/vect/pr101145_1.c |  15 ++
>>  gcc/testsuite/gcc.dg/vect/pr101145_2.c |  15 ++
>>  gcc/testsuite/gcc.dg/vect/pr101145_3.c |  15 ++
>>  gcc/tree-ssa-loop-niter.c  | 150 +++-
>>  6 files changed, 380 insertions(+), 65 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c
>> b/gcc/testsuite/gcc.dg/vect/pr101145.c
>> new file mode 100644
>> index 000..74031b031cf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
>> @@ -0,0 +1,187 @@
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-options "-O3 -fdump-tree-vect-details" } */
>> +#include 
>> +
>> +unsigned __attribute__ ((noinline))
>> +foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned
>> n)
>> +{
>> +  while (n < ++l)
>> +*a++ = *b++ + 1;
>> +  return l;
>> 

Re: [PATCH] [i386] Clear odata for aes(enc|dec)(wide)?kl intrinsics

2021-07-01 Thread Hongyu Wang via Gcc-patches
Updated patch with minor change to move the variable declaration after comment.

Hongtao, could you help check-in the patch?

Hongyu Wang wrote on Thu, Jul 1, 2021 at 4:16 PM:
>
> > Change some keylocker insn to Keylocker aesenc/aesdec in comments.
> > others LGTM.
>
> Changed.
>
> Forgot to mention bootstrapped and regression tested on
> x86_64-linux-gnu {,-m32}.
>
> Attached file is the patch I'm going to check in.
>
> Hongtao Liu via Gcc-patches wrote on Thu, Jul 1, 2021 at 4:07 PM:
> >
> > On Thu, Jul 1, 2021 at 3:51 PM Hongyu Wang  wrote:
> > >
> > > > For Keylocker aesenc/aesdec intrinsics, the current implementation
> > > > moves idata to odata unconditionally, which causes a safety issue when
> > > > the instruction hits a runtime error. So we add a branch to clear
> > > > odata when ZF is set after instruction execution.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-expand.c (ix86_expand_builtin):
> > > > Add branch to clear odata when ZF is set for aesdecenc_expand
> > > and wideaesdecenc_expand.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/keylocker-aesdec128kl.c: Update test.
> > > * gcc.target/i386/keylocker-aesdec256kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesdecwide128kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesenc128kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesenc256kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
> > > * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386-expand.c | 33 ---
> > >  .../gcc.target/i386/keylocker-aesdec128kl.c   |  2 ++
> > >  .../gcc.target/i386/keylocker-aesdec256kl.c   |  2 ++
> > >  .../i386/keylocker-aesdecwide128kl.c  |  9 +
> > >  .../i386/keylocker-aesdecwide256kl.c  |  9 +
> > >  .../gcc.target/i386/keylocker-aesenc128kl.c   |  2 ++
> > >  .../gcc.target/i386/keylocker-aesenc256kl.c   |  2 ++
> > >  .../i386/keylocker-aesencwide128kl.c  |  9 +
> > >  .../i386/keylocker-aesencwide256kl.c  |  9 +
> > >  9 files changed, 72 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > > index e9763eb5b3e..de85f256fee 100644
> > > --- a/gcc/config/i386/i386-expand.c
> > > +++ b/gcc/config/i386/i386-expand.c
> > > @@ -11556,6 +11556,9 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
> > > subtarget,
> > >
> > >  aesdecenc_expand:
> > >
> > > +  rtx_code_label *ok_label;
> > > +  rtx tmp;
> > > +
> > >arg0 = CALL_EXPR_ARG (exp, 0); // __m128i *odata
> > >arg1 = CALL_EXPR_ARG (exp, 1); // __m128i idata
> > >arg2 = CALL_EXPR_ARG (exp, 2); // const void *p
> > > @@ -11586,10 +11589,21 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
> > > subtarget,
> > >if (target == 0)
> > > target = gen_reg_rtx (QImode);
> > >
> > > -  pat = gen_rtx_EQ (QImode, gen_rtx_REG (CCZmode, FLAGS_REG),
> > > -   const0_rtx);
> > > -  emit_insn (gen_rtx_SET (target, pat));
> > > +  /* NB: For some keylocker insn, ZF will be set when runtime
> > > +error occurs. Then the output should be cleared for safety. */
> > Change some keylocker insn to Keylocker aesenc/aesdec in comments.
> > others LGTM.
> > > +  tmp = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > +  pat = gen_rtx_EQ (QImode, tmp, const0_rtx);
> > > +  ok_label = gen_label_rtx ();
> > > +  emit_cmp_and_jump_insns (tmp, const0_rtx, NE, 0, GET_MODE (tmp),
> > > +  true, ok_label);
> > > +  /* Usually the runtime error seldom occur, so predict OK path as
> > > +hotspot to optimize it as fallthrough block. */
> > > +  predict_jump (REG_BR_PROB_BASE * 90 / 100);
> > > +
> > > +  emit_insn (gen_rtx_SET (op1, const0_rtx));
> > >
> > > +  emit_label (ok_label);
> > > +  emit_insn (gen_rtx_SET (target, pat));
> > >emit_insn (gen_rtx_SET (op0, op1));
> > >
> > >return target;
> > > @@ -11644,8 +11658,17 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
> > > subtarget,
> > >if (target == 0)
> > > target = gen_reg_rtx (QImode);
> > >
> > > -  pat = gen_rtx_EQ (QImode, gen_rtx_REG (CCZmode, FLAGS_REG),
> > > -   const0_rtx);
> > > +  tmp = gen_rtx_REG (CCZmode, FLAGS_REG);
> > > +  pat = gen_rtx_EQ (QImode, tmp, const0_rtx);
> > > +  ok_label = gen_label_rtx ();
> > > +  emit_cmp_and_jump_insns (tmp, const0_rtx, NE, 0, GET_MODE (tmp),
> > > +  true, ok_label);
> > > +  predict_jump (REG_BR_PROB_BASE * 90 / 100);
> > > +
> > > +  for (i = 0; i < 8; i++)
> > > +   emit_insn (gen_rtx_SET (xmm_regs[i], const0_rtx));
> > > +
> > > +  emit_label (ok_label);
> > >emit_insn (gen_rtx_SET 

Re: [PATCH] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-01 Thread Fangrui Song

On 2021-07-02, Kito Cheng wrote:

It was undocumented before, but is already used in the Linux kernel, so
the LLVM community suggested that we document it, making it a
supported, documented, non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* doc/md.texi (Machine Constraints): Document the 'S' constraint
for RISC-V.
---
gcc/doc/md.texi | 3 +++
1 file changed, 3 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..b776623e8a5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
@item A
An address that is held in a general-purpose register.

+@item S
+A constant call address.
+
@end table

@item RX---@file{config/rx/constraints.md}
--
2.31.1



Thanks for the patch!

To clarify, 'S' is used by the aarch64 port of the Linux kernel.
(https://github.com/ClangBuiltLinux/linux/issues/13)

It was proposed as one way to make  __vdso_rt_sigreturn in
arch/riscv/kernel/signal.c less hacky but we have agreed that riscv
should just use the existing mechanism (e.g.
arch/arm64/kernel/vdso/gen_vdso_offsets.sh) as used by a few other
ports.

That said, 'S' is still useful as it enables flexible modifiers (e.g.
`%got_pcrel_hi, %pcrel_hi`) in inline asm.


[PATCH] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-01 Thread Kito Cheng
It was undocumented before, but is already used in the Linux kernel, so
the LLVM community suggested that we document it, making it a
supported, documented, non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* doc/md.texi (Machine Constraints): Document the 'S' constraint
for RISC-V.
---
 gcc/doc/md.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..b776623e8a5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
 @item A
 An address that is held in a general-purpose register.
 
+@item S
+A constant call address.
+
 @end table
 
 @item RX---@file{config/rx/constraints.md}
-- 
2.31.1



[PATCH] i386: Disable param ira-consider-dup-in-all-alts [PR100328]

2021-07-01 Thread Kewen.Lin via Gcc-patches
Hi,

With Hongtao's help (thanks), we got the SPEC2017 performance
evaluation results on x86_64 (see [1]): the new parameter
ira-consider-dup-in-all-alts has negative effects on i386.
Since we observed that it benefits the aarch64 and rs6000 ports,
the param is set to 1 by default; this patch disables it
explicitly on i386 to avoid performance degradation there.

Bootstrapped & regtested on x86_64-redhat-linux.

Is it ok for trunk?

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573986.html
-
From 457c7b3032e20ea0f9d8c8d2980e7da6daeedb13 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Mon, 21 Jun 2021 22:51:09 -0500
Subject: [PATCH 2/2] i386: Disable param ira-consider-dup-in-all-alts
 [PR100328]

According to Hongtao's SPEC2017 performance evaluation results here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573986.html
the new parameter ira-consider-dup-in-all-alts has negative
effects on i386, so this patch disables it explicitly on
i386.

Bootstrapped & regtested on x86_64-redhat-linux.

gcc/ChangeLog:

PR rtl-optimization/100328
* config/i386/i386-options.c (ix86_option_override_internal):
Set param_ira_consider_dup_in_all_alts to 0.
---
 gcc/config/i386/i386-options.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 0eccb549c22..7a35c468da3 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2831,6 +2831,8 @@ ix86_option_override_internal (bool main_args_p,
   if (ix86_indirect_branch != indirect_branch_keep)
 SET_OPTION_IF_UNSET (opts, opts_set, flag_jump_tables, 0);
 
+  SET_OPTION_IF_UNSET (opts, opts_set, param_ira_consider_dup_in_all_alts, 0);
+
   return true;
 }
 
-- 
2.17.1



Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-07-01 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2021/6/30 11:42 PM, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
>>> On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu  wrote:

 On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin  wrote:
>
> Hi!
>
> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> PR100328 has some details about this issue, I am trying to
>> brief it here.  In the hottest function LBM_performStreamCollideTRT
>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
>> insn has two flavors: FLOAT_REG and VSX_REG, the VSX_REG reg
>> class have 64 registers whose foregoing 32 ones make up the
>> whole FLOAT_REG.  There are some differences for these two
>> flavors, taking "*fma4_fpr" as example:
>>
>> (define_insn "*fma4_fpr"
>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
>>   (fma:SFDF
>> (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
>> (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
>> (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
>>
>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>> //  (f/d) => A floating point register, aka. FLOAT_REG.
>>
>> So for VSX_REG, we only have the destructive form, when VSX_REG
>> alternative being used, the operand 2 or operand 3 is required
>> to be the same as operand 0.  reload has to take care of this
>> constraint and create some non-free register copies if required.
>>
>> Assuming one fma insn looks like:
>>   op0 = FMA (op1, op2, op3)
>>
>> The best regclass of them are VSX_REG, when op1,op2,op3 are all dead,
>> IRA simply creates three shuffle copies for them (here the operand
>> order matters, since with the same freq, the one with smaller number
>> takes preference), but IMO both op2 and op3 should take higher priority
>> in copy queue due to the matching constraint.
>>
>> I noticed that there is one function ira_get_dup_out_num, which meant
>> to create this kind of constraint copy, but the below code looks to
>> refuse to create if there is an alternative which has valid regclass
>> without spilled need.
>>
>>   default:
>>   {
>> enum constraint_num cn = lookup_constraint (str);
>> enum reg_class cl = reg_class_for_constraint (cn);
>> if (cl != NO_REGS
>> && !targetm.class_likely_spilled_p (cl))
>>   goto fail
>>
>>...
>>
>> I cooked one patch attached to make ira respect this kind of matching
>> constraint guarded with one parameter.  As I stated in the PR, I was
>> not sure this is on the right track.  The RFC patch is to check the
>> matching constraint in all alternatives, if there is one alternative
>> with matching constraint and matches the current preferred regclass
>> (or best of allocno?), it will record the output operand number and
>> further create one constraint copy for it.  Normally it can get the
>> priority against shuffle copies and the matching constraint will get
>> satisfied with higher possibility, reload doesn't create extra copies
>> to meet the matching constraint or the desirable register class when
>> it has to.
>>
>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay
>> as shuffle copies, and later any of A,B,C,D gets assigned by one
>> hardware register which is a VSX register (VSX_REG) but not a FP
>> register (FLOAT_REG), which means it has to pay costs once we can NOT
>> go with VSX alternatives, so at that time it's important to respect
>> the matching constraint then we can increase the freq for the remaining
>> copies related to this (A/B, A/C, A/D).  This idea requires some side
>> tables to record some information and seems a bit complicated in the
>> current framework, so the proposed patch aggressively emphasizes the
>> matching constraint at the time of creating copies.
>>
>
> Comparing with the original patch (v1), this patch v3 has
> considered: (this should be v2 for this mail list, but bump
> it to be consistent as PR's).
>
>   - Excluding the case where for one preferred register class
> there can be two or more alternatives, one of them has the
> matching constraint, while another doesn't have.  So for
> the given operand, even if it's assigned by a hardware reg
> which doesn't meet the matching constraint, it can simply
> use the alternative which doesn't have matching constraint
> so no register move is needed.  One typical case is
> define_insn *mov_internal2 on rs6000.  So we
> shouldn't create constraint copy for it.
>
>   - The possible free register move in the 

[PATCH v4] ira: Support more matching constraint forms with param [PR100328]

2021-07-01 Thread Kewen.Lin via Gcc-patches
Hi Vladimir,

on 2021/6/30 11:24 PM, Vladimir Makarov wrote:
> 
> On 2021-06-28 2:26 a.m., Kewen.Lin wrote:
>> Hi!
>>
>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> PR100328 has some details about this issue, I am trying to
>>> brief it here.  In the hottest function LBM_performStreamCollideTRT
>>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
>>> insn has two flavors: FLOAT_REG and VSX_REG, the VSX_REG reg
>>> class have 64 registers whose foregoing 32 ones make up the
>>> whole FLOAT_REG.  There are some differences for these two
>>> flavors, taking "*fma4_fpr" as example:
>>>
>>> (define_insn "*fma4_fpr"
>>>    [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
>>> (fma:SFDF
>>>   (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
>>>   (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
>>>   (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
>>>
>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
>>> //  (f/d) => A floating point register, aka. FLOAT_REG.
>>>
>>> So for VSX_REG, we only have the destructive form, when VSX_REG
>>> alternative being used, the operand 2 or operand 3 is required
>>> to be the same as operand 0.  reload has to take care of this
>>> constraint and create some non-free register copies if required.
>>>
>>> Assuming one fma insn looks like:
>>>    op0 = FMA (op1, op2, op3)
>>>
>>> The best regclass of them are VSX_REG, when op1,op2,op3 are all dead,
>>> IRA simply creates three shuffle copies for them (here the operand
>>> order matters, since with the same freq, the one with smaller number
>>> takes preference), but IMO both op2 and op3 should take higher priority
>>> in copy queue due to the matching constraint.
>>>
>>> I noticed that there is one function ira_get_dup_out_num, which meant
>>> to create this kind of constraint copy, but the below code looks to
>>> refuse to create if there is an alternative which has valid regclass
>>> without spilled need.
>>>
>>>    default:
>>> {
>>>   enum constraint_num cn = lookup_constraint (str);
>>>   enum reg_class cl = reg_class_for_constraint (cn);
>>>   if (cl != NO_REGS
>>>   && !targetm.class_likely_spilled_p (cl))
>>>     goto fail
>>>
>>>  ...
>>>
>>> I cooked one patch attached to make ira respect this kind of matching
>>> constraint guarded with one parameter.  As I stated in the PR, I was
>>> not sure this is on the right track.  The RFC patch is to check the
>>> matching constraint in all alternatives, if there is one alternative
>>> with matching constraint and matches the current preferred regclass
>>> (or best of allocno?), it will record the output operand number and
>>> further create one constraint copy for it.  Normally it can get the
>>> priority against shuffle copies and the matching constraint will get
>>> satisfied with higher possibility, reload doesn't create extra copies
>>> to meet the matching constraint or the desirable register class when
>>> it has to.
>>>
>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay
>>> as shuffle copies, and later any of A,B,C,D gets assigned by one
>>> hardware register which is a VSX register (VSX_REG) but not a FP
>>> register (FLOAT_REG), which means it has to pay costs once we can NOT
>>> go with VSX alternatives, so at that time it's important to respect
>>> the matching constraint then we can increase the freq for the remaining
>>> copies related to this (A/B, A/C, A/D).  This idea requires some side
>>> tables to record some information and seems a bit complicated in the
>>> current framework, so the proposed patch aggressively emphasizes the
>>> matching constraint at the time of creating copies.
>>>
>> Comparing with the original patch (v1), this patch v3 has
>> considered: (this should be v2 for this mail list, but bump
>> it to be consistent as PR's).
>>
>>    - Excluding the case where for one preferred register class
>>  there can be two or more alternatives, one of them has the
>>  matching constraint, while another doesn't have.  So for
>>  the given operand, even if it's assigned by a hardware reg
>>  which doesn't meet the matching constraint, it can simply
>>  use the alternative which doesn't have matching constraint
>>  so no register move is needed.  One typical case is
>>  define_insn *mov_internal2 on rs6000.  So we
>>  shouldn't create constraint copy for it.
>>
>>    - The possible free register move in the same register class,
>>  disable this if so since the register move to meet the
>>  constraint is considered as free.
>>
>>    - Making it on by default, suggested by Segher & Vladimir, we
>>  hope to get rid of the parameter if the benchmarking result
>>  looks good on major targets.
>>
>>    - Tweaking cost when either of matching constraint two sides
>>  is hardware register.  Before this patch, the 

[PATCH 2/2] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-01 Thread Paul A. Clarke via Gcc-patches
Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-01  Paul A. Clarke  

gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c: Copy
from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
(__VSX_SSE2__): Define.
---
 .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
 .../gcc.target/powerpc/sse4_1-ceilps.c|  33 +
 .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
 .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
 .../gcc.target/powerpc/sse4_1-check.h |   4 +
 .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
 .../gcc.target/powerpc/sse4_1-round.h |  27 
 .../gcc.target/powerpc/sse4_1-round2.h|  27 
 .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
 9 files changed, 412 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index ..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.ep+50,  0x1.fp+50 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.2p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.4p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.0p+52 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+51 } },
+   { -0x1.ep+51, -0x1.ep+51 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.2p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index ..417ac76d8aa9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128

[PATCH 1/2] rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-01 Thread Paul A. Clarke via Gcc-patches
2021-07-01  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.
---
 gcc/config/rs6000/smmintrin.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fa17a8b2f478..0c0b0dd7c1e3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -212,4 +212,32 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
 #endif
-- 
2.27.0



[PATCH 0/2] Add SSE4.1 "ceil" intrinsics

2021-07-01 Thread Paul A. Clarke via Gcc-patches
Instead of copying some tests from gcc/testsuite/gcc.target/i386,
I created new tests.  The i386 tests in question used rand() to
generate the input data and assembly to compute the rounded values.
Using rand() for testing seems wrong, and the assembly is obviously
not portable.  I use static data, primarily exercising the edges of
dynamic ranges (where fractions start to be unrepresentable).

Paul A. Clarke (2):
  rs6000: Add support for SSE4.1 "ceil" intrinsics
  rs6000: Add tests for SSE4.1 "ceil" intrinsics

 gcc/config/rs6000/smmintrin.h |  28 +
 .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
 .../gcc.target/powerpc/sse4_1-ceilps.c|  33 +
 .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
 .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
 .../gcc.target/powerpc/sse4_1-check.h |   4 +
 .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
 .../gcc.target/powerpc/sse4_1-round.h |  27 
 .../gcc.target/powerpc/sse4_1-round2.h|  27 
 .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
 10 files changed, 440 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

-- 
2.27.0



Re: [PATCH PR100740]Fix overflow check in simplifying exit cond comparing two IVs.

2021-07-01 Thread Bin.Cheng via Gcc-patches
On Thu, Jul 1, 2021 at 8:19 PM Richard Biener
 wrote:
>
> On Mon, Jun 7, 2021 at 4:35 PM Richard Biener
>  wrote:
> >
> > On Sun, Jun 6, 2021 at 12:01 PM Bin.Cheng  wrote:
> > >
> > > On Wed, Jun 2, 2021 at 3:28 PM Richard Biener via Gcc-patches
> > >  wrote:
> > > >
> > > > On Tue, Jun 1, 2021 at 4:00 PM bin.cheng via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > As described in patch summary, this fixes the wrong code issue by 
> > > > > adding overflow-ness
> > > > > check for iv1.step - iv2.step.
> > > > >
> > > > > Bootstrap and test on x86_64.  Any comments?
> > > >
> > > > + bool wrap_p = TYPE_OVERFLOW_WRAPS (step_type);
> > > > + if (wrap_p)
> > > > +   {
> > > > + tree t = fold_binary_to_constant (GE_EXPR, step_type,
> > > > +   iv0->step, iv1->step);
> > > > + wrap_p = integer_zerop (t);
> > > > +   }
> > > >
> > > > I think we can't use TYPE_OVERFLOW_WRAPS/TYPE_OVERFLOW_UNDEFINED since
> > > > that's only relevant for expressions written by the user - we're
> > > > computing iv0.step - iv1.step
> > > > which can even overflow when TYPE_OVERFLOW_UNDEFINED (in fact we may not
> > > > even generate this expression then!).  So I think we have to do sth like
> > > >
> > > >/* If the iv0->step - iv1->step wraps, fail.  */
> > > >if (!operand_equal_p (iv0->step, iv1->step)
> > > >&& (TREE_CODE (iv0->step) != INTEGER_CST || TREE_CODE
> > > > (iv1->step) != INTEGER_CST)
> > > >&& !wi::gt (wi::to_widest (iv0->step), wi::to_widest (iv1->step))
> > > >  return false;
> > > >
> > > > which only handles equality and all integer constant steps. You could
> > > Thanks for the suggestion.  I realized that we have LE/LT/NE
> > > conditions here, and for LE/LT what we need to check is iv0/iv1
> > > converge to each other, rather than diverge.  Also steps here can only
> > > be constants, so there is no need to use range information.
> >
> > Ah, that simplifies things.
> >
> > + if (tree_int_cst_lt (iv0->step, iv1->step))
> > +   return false;
> >
> > so it looks to me that iv?->step can be negative which means we should
> > verify that abs(iv0->step - iv1->step) <= abs (iv0->step), correct?
> >
> >tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
> >iv0->step, iv1->step);
> > ...
> > + if (TREE_CODE (step) != INTEGER_CST)
> > +   return false;
> >
> > note fold_binary_to_constant will return NULL if the result is not
> > TREE_CONSTANT (which would also include symbolic constants
> > like  - ).  It wasn't checked before, of course but since we're
> > touching the code we might very well be checking for NULL step ...
> > (or assert it is not for documentation purposes).
> >
> > That said, if iv0->step and iv1->step are known INTEGER_CSTs
> > (I think they indeed are given the constraints we impose on
> > simple_iv_with_niters).
> >
> > That said, with just a quick look again it looks to me the
> > IV1 {<=,<} IV2 transform to IV1 - IV2step {<=,<} IV2base
> > is OK whenever the effective step magnitude on the IV1'
> > decreases, thus abs(IV1.step - IV2.step) <= abs(IV1.step)
> > since then IV1 is still guaranteed to not overflow.  But
> > for example {0, +, 1} and {10, -, 1} also converge if the
> > number of iterations is less than 10 but they would not pass
> > this criteria.  So I'm not sure "convergence" is a good wording
> > here - but maybe I'm missing some critical piece of understanding
> > here.
> >
> > But in any case it looks like we're on the right track ;)
>
> Just to pick up things where we left them (and seeing the patch to
> until-wrap which reminded me), I've dug in myself a bit and
> came to the following conclusion.
>
> The b0 + s0 < b1 + s1 -> b0 + s0 - s1 < b1 transform is only
> valid if b0 + s0 - s1 does not wrap which we can only ensure
> by ensuring that b0 + s0 and b1 + s1 do not wrap (which we
> already check) but also - what we're missing - that (s0 - s1)
> makes b0 still evolve in the same direction (thus s0-s1 has the
> same sign as s0) and that its magnitude is less than that of s0.
>
> Extra cases could be handled if we have an upper bound for
> the number of iterations from other sources, not sure if that's
> worth checking.
>
> Thus I am testing the attached now.
>
> Hope you don't mind - and I of course welcome any comments.
Oh, not at all.  Your help on these issues is greatly appreciated.

As for the change:

> + if (tree_int_cst_sign_bit (iv0->step) != tree_int_cst_sign_bit 
> (step)
> + || wi::geu_p (wi::abs (wi::to_wide (step)),
> +   wi::abs (wi::to_wide (iv0->step
It returns false on condition "{base, 5}_iv0 < {base, -1}_iv1", but
this is like the "convergence" case I mentioned and could be analyzed?

> +   return false;
> +   }


Thanks,
bin
>
> Thanks,
> Richard.
>
> 
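
A minimal sketch of the validity condition Richard states above (s0 - s1
must keep the sign of s0 and have a strictly smaller magnitude), written
on plain integers.  The name combined_step_ok and the use of long long are
assumptions for illustration; the patch itself works on trees and wide_int,
and this sketch assumes the steps are small enough that s0 - s1 cannot
overflow.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: return true if rewriting
     {b0, +, s0} < {b1, +, s1}   as   {b0, +, s0 - s1} < b1
   keeps the combined IV moving in the same direction as s0, with a
   strictly smaller step magnitude.  Assumes s0 - s1 does not overflow.  */
bool combined_step_ok (long long s0, long long s1)
{
  long long step = s0 - s1;

  /* The combined step must have the same sign bit as s0.  */
  if ((s0 < 0) != (step < 0))
    return false;

  /* Its magnitude must be strictly less than that of s0.  */
  if (llabs (step) >= llabs (s0))
    return false;

  return true;
}
```

With this check, the example {base, 5} < {base, -1} from the reply is
rejected: s0 - s1 is 6, whose magnitude exceeds that of s0 = 5.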

[PATCH] consider parameter names in -Wvla-parameter (PR 97548)

2021-07-01 Thread Martin Sebor via Gcc-patches

-Wvla-parameter relies on operand_equal_p() with OEP_LEXICOGRAPHIC
set to compare VLA bounds for equality.  But operand_equal_p()
doesn't consider decl names, and so nontrivial expressions that
refer to the same function parameter are considered unequal by
the function, leading to false positives.

The attached fix solves the problem by adding a new flag bit,
OEP_DECL_NAME, to the set of flags that control the function.  When
the bit is set, the function considers distinct decls with
the same name equal.  The caller is responsible for ensuring
that the otherwise distinct decls appear in a context where they
can be assumed to refer to the same entity.  The only caller that
sets the flag is the -Wvla-parameter checker.

In addition, the patch strips nops from the VLA bound to avoid
false positives with meaningless casts.

Tested on x86_64-linux.

Martin
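
A minimal sketch of the kind of redeclaration the description above refers
to: the same function declared twice with a VLA bound that is a nontrivial
expression of a parameter.  The function name fill is made up for
illustration, and the reproducers in the PRs may differ.

```c
/* First declaration: the bound n + 1 refers to the parameter n.  */
void fill (int n, int a[n + 1]);

/* Definition: the bound is spelled identically, but its n is a distinct
   PARM_DECL, so a comparison that ignores decl names sees a mismatch.  */
void fill (int n, int a[n + 1])
{
  for (int i = 0; i <= n; i++)
    a[i] = i;
}
```

Both bounds name the same formal parameter, which is the situation in which
comparing decls by name is meant to be safe.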
Resolves:

PR c/101289 - bogus -Wvla-parameter warning when using const for vla param
PR c/97548 -  bogus -Wvla-parameter on a bound expression involving a parameter

gcc/c-family/ChangeLog:

	* c-warn.c (warn_parm_array_mismatch): Use OEP_DECL_NAME.

gcc/c/ChangeLog:

	* c-decl.c (get_parm_array_spec): Strip nops.

gcc/ChangeLog:

	* fold-const.c (operand_compare::operand_equal_p): Handle OEP_DECL_NAME.
	(operand_compare::verify_hash_value): Same.
	* tree-core.h (OEP_DECL_NAME): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wvla-parameter-12.c: New test.

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index 34959590b37..552a29f9944 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -3646,7 +3646,8 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms)
 	  /* The VLA bounds don't refer to other function parameters.
 		 Compare them lexicographically to detect gross mismatches
 		 such as between T[foo()] and T[bar()].  */
-	  if (operand_equal_p (newbnd, curbnd, OEP_LEXICOGRAPHIC))
+	  if (operand_equal_p (newbnd, curbnd,
+   OEP_DECL_NAME | OEP_LEXICOGRAPHIC))
 		continue;
 
 	  if (warning_at (newloc, OPT_Wvla_parameter,
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 983d65e930c..234ee16fe4a 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -5865,6 +5865,7 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
 
   /* Each variable VLA bound is represented by a dollar sign.  */
   spec += "$";
+  STRIP_NOPS (nelts);
   vbchain = tree_cons (NULL_TREE, nelts, vbchain);
 }
 
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index dfccbaec683..e0cc9b1a485 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3499,11 +3499,26 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
 
 case tcc_declaration:
   /* Consider __builtin_sqrt equal to sqrt.  */
-  return (TREE_CODE (arg0) == FUNCTION_DECL
-	  && fndecl_built_in_p (arg0) && fndecl_built_in_p (arg1)
-	  && DECL_BUILT_IN_CLASS (arg0) == DECL_BUILT_IN_CLASS (arg1)
-	  && (DECL_UNCHECKED_FUNCTION_CODE (arg0)
-		  == DECL_UNCHECKED_FUNCTION_CODE (arg1)));
+  if (TREE_CODE (arg0) == FUNCTION_DECL)
+	return (fndecl_built_in_p (arg0) && fndecl_built_in_p (arg1)
+		&& DECL_BUILT_IN_CLASS (arg0) == DECL_BUILT_IN_CLASS (arg1)
+		&& (DECL_UNCHECKED_FUNCTION_CODE (arg0)
+		== DECL_UNCHECKED_FUNCTION_CODE (arg1)));
+
+  if (DECL_P (arg0)
+	  && (flags & OEP_DECL_NAME)
+	  && (flags & OEP_LEXICOGRAPHIC))
+	{
+	  /* Consider decls with the same name equal.  The caller needs
+	 to make sure they refer to the same entity (such as a function
+	 formal parameter).  */
+	  tree a0name = DECL_NAME (arg0);
+	  tree a1name = DECL_NAME (arg1);
+	  const char *a0ns = a0name ? IDENTIFIER_POINTER (a0name) : NULL;
+	  const char *a1ns = a1name ? IDENTIFIER_POINTER (a1name) : NULL;
+	  return a0ns && a1ns && strcmp (a0ns, a1ns) == 0;
+	}
+  return false;
 
 case tcc_exceptional:
   if (TREE_CODE (arg0) == CONSTRUCTOR)
@@ -3914,14 +3929,14 @@ bool
 operand_compare::verify_hash_value (const_tree arg0, const_tree arg1,
 unsigned int flags, bool *ret)
 {
-  /* When checking, verify at the outermost operand_equal_p call that
- if operand_equal_p returns non-zero then ARG0 and ARG1 has the same
- hash value.  */
+  /* When checking and unless comparing DECL names, verify that if
+ the outermost operand_equal_p call returns non-zero then ARG0
+ and ARG1 have the same hash value.  */
   if (flag_checking && !(flags & OEP_NO_HASH_CHECK))
 {
   if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
 	{
-	  if (arg0 != arg1)
+	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
 	{
 	  inchash::hash hstate0 (0), hstate1 (0);
 	  hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e15e6c651f0..58eaf3933e4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -902,7 +902,10 @@ enum operand_equal_flag {
   OEP_BITWISE = 128,
   /* For OEP_ADDRESS_OF of COMPONENT_REFs, only consider 

Re: [PATCH] Analyze niter for until-wrap condition [PR101145]

2021-07-01 Thread Bin.Cheng via Gcc-patches
On Thu, Jul 1, 2021 at 10:15 PM guojiufu via Gcc-patches
 wrote:
>
> On 2021-07-01 20:35, Richard Biener wrote:
> > On Thu, 1 Jul 2021, Jiufu Guo wrote:
> >
> >> For code like:
> >> unsigned foo(unsigned val, unsigned start)
> >> {
> >>   unsigned cnt = 0;
> >>   for (unsigned i = start; i > val; ++i)
> >> cnt++;
> >>   return cnt;
> >> }
> >>
> >> The number of iterations should be about UINT_MAX - start.
> >
> > For
> >
> > unsigned foo(unsigned val, unsigned start)
> > {
> >   unsigned cnt = 0;
> >   for (unsigned i = start; i >= val; ++i)
> > cnt++;
> >   return cnt;
> > }
> >
> > and val == 0 the loop never terminates.  I don't see anywhere
> > in the patch that you disregard GE_EXPR and I remember
> > the code handles GE as well as GT?  From a quick look this is
> > also not covered by a testcase you add - not exactly sure
> > how it would materialize in a miscompilation.
>
> In number_of_iterations_cond, there is code:
> if (code == GE_EXPR || code == GT_EXPR
> || (code == NE_EXPR && integer_zerop (iv0->step)))
>{
>  std::swap (iv0, iv1);
>  code = swap_tree_comparison (code);
>}
> It converts "GT/GE" (i >= val) to "LT/LE" (val <= i),
> and LE (val <= i) is converted to LT (val - 1 < i).
> So, the code is added to number_of_iterations_lt.
>
> But, this patch leads mis-compilation for unsigned "i >= val" as
> above transforms: converting LE (val <= i) to LT (val - 1 < i)
> seems not appropriate (e.g where val=0).
I don't know where the exact code is, but IIRC, number_of_iteration
handles boundary conditions when transforming <= into <.  You may
check it out.

> Thanks for pointing out this!!!
>
> I would investigate a way to handle this correctly.
> A possible way maybe just to return false for this kind of LE.
IIRC, it checks the boundary conditions, either returns false or
simply introduces more assumptions.
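
A minimal sketch of the two points under discussion, using uint8_t instead
of unsigned so that the wrap is reached quickly.  All three function names
are hypothetical and nothing here is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Run `for (i = start; i > val; ++i) cnt++;` on an 8-bit unsigned IV.
   The loop ends when i wraps to 0, since 0 > val is false for any val.  */
unsigned simulate_gt (uint8_t val, uint8_t start)
{
  unsigned cnt = 0;
  for (uint8_t i = start; i > val; ++i)
    cnt++;
  return cnt;
}

/* Closed form for the same loop: i takes the values start .. UINT8_MAX
   when start > val, and the body never runs otherwise.  */
unsigned closed_form_gt (uint8_t val, uint8_t start)
{
  return start > val ? (unsigned) UINT8_MAX - start + 1 : 0;
}

/* Check whether `val <= i` and `val - 1 < i` agree for one pair of
   8-bit unsigned values.  They disagree when val is 0, because
   val - 1 wraps to UINT8_MAX.  */
bool le_matches_lt_minus_one (uint8_t val, uint8_t i)
{
  return (val <= i) == ((uint8_t) (val - 1) < i);
}
```

The mismatch at val == 0 is the boundary case raised above: `i >= 0` always
holds for an unsigned i, so that loop has no finite iteration count.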
>
> Any suggestions?
>
> >
> >> There is function adjust_cond_for_loop_until_wrap which
> >> handles similar work for const bases.
> >> Like adjust_cond_for_loop_until_wrap, this patch enhance
> >> function number_of_iterations_cond/number_of_iterations_lt
> >> to analyze number of iterations for this kind of loop.
> >>
> >> Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/101145
> >>  * tree-ssa-loop-niter.c
> >>  (number_of_iterations_until_wrap): New function.
> >>  (number_of_iterations_lt): Invoke above function.
> >>  (adjust_cond_for_loop_until_wrap):
> >>  Merge to number_of_iterations_until_wrap.
> >>  (number_of_iterations_cond): Update invokes for
> >>  adjust_cond_for_loop_until_wrap and number_of_iterations_lt.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  PR tree-optimization/101145
> >>  * gcc.dg/vect/pr101145.c: New test.
> >>  * gcc.dg/vect/pr101145.inc: New test.
> >>  * gcc.dg/vect/pr101145_1.c: New test.
> >>  * gcc.dg/vect/pr101145_2.c: New test.
> >>  * gcc.dg/vect/pr101145_3.c: New test.
> >> ---
> >>  gcc/testsuite/gcc.dg/vect/pr101145.c   | 187
> >> +
> >>  gcc/testsuite/gcc.dg/vect/pr101145.inc |  63 +
> >>  gcc/testsuite/gcc.dg/vect/pr101145_1.c |  15 ++
> >>  gcc/testsuite/gcc.dg/vect/pr101145_2.c |  15 ++
> >>  gcc/testsuite/gcc.dg/vect/pr101145_3.c |  15 ++
> >>  gcc/tree-ssa-loop-niter.c  | 150 +++-
> >>  6 files changed, 380 insertions(+), 65 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c
> >> b/gcc/testsuite/gcc.dg/vect/pr101145.c
> >> new file mode 100644
> >> index 000..74031b031cf
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
> >> @@ -0,0 +1,187 @@
> >> +/* { dg-require-effective-target vect_int } */
> >> +/* { dg-options "-O3 -fdump-tree-vect-details" } */
> >> +#include <limits.h>
> >> +
> >> +unsigned __attribute__ ((noinline))
> >> +foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned
> >> n)
> >> +{
> >> +  while (n < ++l)
> >> +*a++ = *b++ + 1;
> >> +  return l;
> >> +}
> >> +
> >> +unsigned __attribute__ ((noinline))
> >> +foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l,
> >> unsigned)
> >> +{
> >> +  while (UINT_MAX - 64 < ++l)
> >> +*a++ = *b++ + 1;
> >> +  return l;
> >> +}
> >> +
> >> +unsigned __attribute__ ((noinline))
> >> +foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned
> >> n)
> >> +{
> >> +  l = UINT_MAX - 32;
> >> +  while (n < ++l)
> >> +*a++ = *b++ + 1;
> >> +  return l;
> >> +}
> >> +
> >> +unsigned __attribute__ ((noinline))
> >> +foo_3 (int *__restrict__ a, int *__restrict__ b, 

Re: [PATCH 2/4] allow poisoning input_location in ranges it should not be used

2021-07-01 Thread Trevor Saunders
On Thu, Jul 01, 2021 at 11:40:55AM -0400, David Malcolm via Gcc-patches wrote:
> On Thu, 2021-07-01 at 14:53 +0200, Richard Biener wrote:
> > On Thu, Jul 1, 2021 at 12:16 PM Trevor Saunders <
> > tbsau...@tbsaunde.org> wrote:
> > > 
> > > On Wed, Jun 30, 2021 at 11:13:23AM -0400, David Malcolm wrote:
> > > > On Wed, 2021-06-30 at 01:35 -0400, Trevor Saunders wrote:
> > > > > This makes it possible to assert if input_location is used
> > > > > during the
> > > > > lifetime
> > > > > of a scope.  This will allow us to find places that currently
> > > > > use it
> > > > > within a
> > > > > function and its callees, or prevent adding uses within the
> > > > > lifetime
> > > > > of a
> > > > > function after all existing uses are removed.
> > > > > 
> > > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > > > > 
> > > > > Trev
> > > > 
> > > > [...snip...]
> > > > 
> > > > > diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> > > > > index d58586f2526..3f68d1d79eb 100644
> > > > > --- a/gcc/diagnostic.c
> > > > > +++ b/gcc/diagnostic.c
> > > > > @@ -1835,7 +1835,7 @@ internal_error (const char *gmsgid, ...)
> > > > >    auto_diagnostic_group d;
> > > > >    va_list ap;
> > > > >    va_start (ap, gmsgid);
> > > > > -  rich_location richloc (line_table, input_location);
> > > > > +  rich_location richloc (line_table, UNKNOWN_LOCATION);
> > > > >    diagnostic_impl (, NULL, -1, gmsgid, , DK_ICE);
> > > > >    va_end (ap);
> > > > > 
> > > > 
> > > > I actually make use of this in the analyzer: the analyzer sets
> > > > input_location to stmt->location when analyzing a given stmt -
> > > > that
> > > > way, if the analyzer ICEs, the ICE is shown at the code construct
> > > > that
> > > > crashed the analyzer.
> > > > 
> > > > This behavior is useful to me, and would be lost with the
> > > > proposed
> > > > patch.
> > > 
> > > I made this change because otherwise if the compiler ICE's while
> > > access
> > > to input_location is blocked we end up infinitely recursing
> > > complaining
> > > we can't access it while trying to say where the last error was.  I
> > > was
> > > nervous about the change before, and now I agree we need something
> > > else.
> > > 
> > > > Is there a better way of doing what I'm doing?
> > > > 
> > > > Is the long-term goal of the patch kit to reduce our reliance on
> > > > global
> > > > variables?  Are we ultimately still going to need a variable for
> > > > "where
> > > > to show the ICE if gcc crashes"?  (perhaps stashing it in the
> > > > diagnostic_context???)
> > > 
> > > Yes, the goal is ultimately removal of global state, however I'm
> > > not
> > > > really sure what the better approach to your problem is, after all
> > > even
> > > moving it to the diagnostic context is sort of a global state, and
> > > sort
> > > > of duplicates input_location.  That said it is somewhat more
> > > > constrained, so if it removes usage of input_location perhaps it's
> > > worthwhile?
> > 
> > Reduction of global state is of course good - but in particular
> > input_location
> > should be something only used during parsing because it's a quite
> > broken concept otherwise.  And fiddling with it tends to be quite
> > fragile...
> > for example see g:7d6f7e92c3b737736a2d8ff97a71af9f230c2f88
> > for the "fun" you can have with "stale" values in input_location ...
> 
> Yeah.  Another example, from the analyzer, is
> g:2fbea4190e76a59c4880727cf84706fe083c00ae (PR 93349)

So, one rather useful thing this patch allows doing, even if you apply it
locally, is that you can block off access to input_location in the same
scope you currently save and restore it, and then once you've fixed up
all the asserts you find with a bootstrap and test cycle, you can be
somewhat confident nothing uses it and the save and restore dance can
go.  I may have a look at doing that with some of these.

> 
> > IMHO users should have their own "copy", for example the gimplifier
> > instead of mucking with and using input_location could use a
> > similar state in its gimplify_ctx.
> 
> Some ideas (not necessarily good ones):
> 
> (a) the diagnostic_context could have an ice_location field, and use
> that in internal_error (and maybe an RAII class for setting/clearing
> it).

If this is useful to people as it seems to be, this seems sensible, my
one concern is when exactly should ice_location be set? other than "it
should match input_location".  There's still the global state of the
diagnostic context, but at least it's more understandable and isolated.
On the other hand it offers the opportunity to have a stack of
ice_locations, if you walk up the stack looking at all the RAII objects,
perhaps that would be useful, especially with things like template
instantiation?
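
A minimal sketch of idea (a) combined with the stack variant suggested
above, written in C so that push and pop are explicit; in GCC itself an
RAII class would pair them.  Every name here (diag_ctx_sketch,
ice_location_push and so on) is hypothetical and does not exist in GCC.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int location_sketch_t;  /* stand-in for location_t */
#define UNKNOWN_LOC_SKETCH 0u            /* stand-in for UNKNOWN_LOCATION */
#define ICE_STACK_MAX 16

/* Hypothetical slice of a diagnostic context holding a stack of
   "where to report an ICE" locations.  */
struct diag_ctx_sketch
{
  location_sketch_t ice_stack[ICE_STACK_MAX];
  size_t depth;
};

/* Enter a scope: record the location an ICE should be reported at.  */
void ice_location_push (struct diag_ctx_sketch *dc, location_sketch_t loc)
{
  assert (dc->depth < ICE_STACK_MAX);
  dc->ice_stack[dc->depth++] = loc;
}

/* Leave a scope: restore the enclosing scope's location.  */
void ice_location_pop (struct diag_ctx_sketch *dc)
{
  assert (dc->depth > 0);
  dc->depth--;
}

/* Location internal_error would use: innermost scope, or unknown.  */
location_sketch_t ice_location_current (const struct diag_ctx_sketch *dc)
{
  return dc->depth ? dc->ice_stack[dc->depth - 1] : UNKNOWN_LOC_SKETCH;
}
```

Walking ice_stack from the top down would give the chain of enclosing
locations mentioned above.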

> (b) move input_location to diagnostic_context, and add:
>#define input_location (global_dc->x_input_location)
> or:
>#define input_location (global_dc->x_default_location)
> which add an indirection everywhere.  I don't love these ideas, in that
> we 

Re: [RS6000] Adjust testcases for power10 instructions

2021-07-01 Thread Alan Modra via Gcc-patches
On Thu, Jul 01, 2021 at 04:47:21PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 01, 2021 at 10:59:15PM +0930, Alan Modra wrote:
> > * lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.
> 
> Mike added this already, please make sure to not add it twice :-)

Yup, rebasing took it out of my patch and a little edit took it out of
my changelog.

> [...]
> > gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.
> 
> (It still allows older as well, so "Also match" maybe?)

OK.

> Did you make sure all of these are correct and expected?

Yes, they still are.  I checked that there was a corresponding
testsuite regression fix for each change too.

>  Are the
> testcases still strict enough.

I think so.

> , or should you add -mno-pcrel to the
> options, instead? Or maybe test both -mpcrel and -mno-pcrel?  Etc.

I think adding -mno-pcrel would be a bad idea, since it would reduce
power10 code coverage, and you'll get both by simply running the
testsuite on power10 and say, power9.

> 
> > * gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.
> 
> You also add a "p?", is that expected?  Should be in the changelog
> then :-)

It was in the changelog..  I mentioned lvsl-lvsr.c twice (which I
suppose might fall foul of the changelog commit checking).  Changing
to

* gcc.target/powerpc/lvsl-lvsr.c: Likewise.  Avoid file name match.

> > -/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
> 
> 
> > @@ -1,12 +1,12 @@
> >  /* Test expected code generation for lvsl and lvsr on little endian.
> > -   Note that lvsl and lvsr are each produced once, but the filename
> > -   causes them to appear twice in the file.  */
> > +   Note that \s is used in the lvsl/lvsr matches so we don't match
> > +   on '.file "lvsl-lvsr.c"'.  */
> 
> Even better is to not put the instruction names in the filename, but
> heh, maybe that would be too simple ;-)
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Update gen_autofdo_event.py and gcc-auto-profile

2021-07-01 Thread Andi Kleen via Gcc-patches



On 7/1/2021 4:40 PM, Eugene Rozenfeld wrote:

gen_autofdo_event.py was stumbling on models with stepping so
I updated the script to handle this case similar to the code in
https://github.com/andikleen/pmu-tools/blob/c6a5f63aede19def8886d6a8b74d7a55c38ca947/event_download.py

The second change was to tolerate cases when the CPU supports PEBS but the
perf command with /p fails. This can happen in, e.g., a virtual machine.
In newer perf :P would work too, but that's not available in older perf, 
so I guess that's ok.


I regenerated gcc-auto-profile using the updated script.



Thanks. Looks good to me.


-Andi




[PATCH] Update gen_autofdo_event.py and gcc-auto-profile

2021-07-01 Thread Eugene Rozenfeld via Gcc-patches
gen_autofdo_event.py was stumbling on models with stepping so
I updated the script to handle this case similar to the code in
https://github.com/andikleen/pmu-tools/blob/c6a5f63aede19def8886d6a8b74d7a55c38ca947/event_download.py

The second change was to tolerate cases when the CPU supports PEBS but the
perf command with /p fails. This can happen in, e.g., a virtual machine.

I regenerated gcc-auto-profile using the updated script.

contrib/ChangeLog:

* gen_autofdo_event.py: handle stepping, non-working PEBS

gcc/ChangeLog:

* config/i386/gcc-auto-profile: regenerate
---
 contrib/gen_autofdo_event.py | 54 ++--
 gcc/config/i386/gcc-auto-profile | 41 +++-
 2 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index c97460c61c6..1eb6f1d6d85 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -46,20 +46,29 @@ args = ap.parse_args()
 
 eventmap = collections.defaultdict(list)
 
-def get_cpu_str():
-    with open('/proc/cpuinfo', 'r') as c:
-        vendor, fam, model = None, None, None
-        for j in c:
-            n = j.split()
-            if n[0] == 'vendor_id':
-                vendor = n[2]
-            elif n[0] == 'model' and n[1] == ':':
-                model = int(n[2])
-            elif n[0] == 'cpu' and n[1] == 'family':
-                fam = int(n[3])
-            if vendor and fam and model:
-                return "%s-%d-%X" % (vendor, fam, model), model
-    return None, None
+def get_cpustr():
+    cpuinfo = os.getenv("CPUINFO")
+    if cpuinfo is None:
+        cpuinfo = '/proc/cpuinfo'
+    f = open(cpuinfo, 'r')
+    cpu = [None, None, None, None]
+    for j in f:
+        n = j.split()
+        if n[0] == 'vendor_id':
+            cpu[0] = n[2]
+        elif n[0] == 'model' and n[1] == ':':
+            cpu[2] = int(n[2])
+        elif n[0] == 'cpu' and n[1] == 'family':
+            cpu[1] = int(n[3])
+        elif n[0] == 'stepping' and n[1] == ':':
+            cpu[3] = int(n[2])
+        if all(v is not None for v in cpu):
+            break
+    # stepping for SKX only
+    stepping = cpu[0] == "GenuineIntel" and cpu[1] == 6 and cpu[2] == 0x55
+    if stepping:
+        return "%s-%d-%X-%X" % tuple(cpu)
+    return "%s-%d-%X" % tuple(cpu)[:3]
 
 def find_event(eventurl, model):
 print >>sys.stderr, "Downloading", eventurl
@@ -81,7 +90,7 @@ def find_event(eventurl, model):
 return found
 
 if not args.all:
-    cpu, model = get_cpu_str()
+    cpu = get_cpustr()
     if not cpu:
         sys.exit("Unknown CPU type")
 
@@ -94,7 +103,8 @@ for j in u:
 n = j.rstrip().split(',')
 if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
 if args.all:
-vendor, fam, model = n[0].split("-")
+components = n[0].split("-")
+model = components[2]
 model = int(model, 16)
 cpufound += 1
 found += find_event(baseurl + n[2], model)
@@ -146,7 +156,17 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
exit 1 ;;'''
 print "esac"
-print 'exec perf record -e $E -b "$@"'
+print "set -x"
+print 'if ! perf record -e $E -b "$@" ; then'
+print '  # PEBS may not actually be working even if the processor supports it'
+print '  # (e.g., in a virtual machine). Trying to run without /p.'
+print '  set +x'
+print '  echo >&2 "Retrying without /p."'
+print '  E="$(echo "${E}" | sed -e \'s/\/p/\//\')"'
+print '  set -x'
+print '  exec perf record -e $E -b "$@"'
+print ' set +x'
+print 'fi'
 
 if cpufound == 0 and not args.all:
 sys.exit('CPU %s not found' % cpu)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5da5c63cd84..56f64cbff1f 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -1,7 +1,7 @@
 #!/bin/sh
-# profile workload for gcc profile feedback (autofdo) using Linux perf
-# auto generated. to regenerate for new CPUs run
-# contrib/gen_autofdo_event.py --shell --all in gcc source
+# Profile workload for gcc profile feedback (autofdo) using Linux perf.
+# Auto generated. To regenerate for new CPUs run
+# contrib/gen_autofdo_event.py --script --all in gcc source
 
 # usages:
 # gcc-auto-profile program (profile program and children)
@@ -10,7 +10,7 @@
 # gcc-auto-profile --kernel -a sleep X (profile kernel)
 # gcc-auto-profile --all -a sleep X(profile kernel and user space)
 
-# identify branches taken event for CPU
+# Identify branches taken event for CPU.
 #
 
 FLAGS=u
@@ -37,7 +37,12 @@ case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
   egrep "^model\s*:" /proc/cpuinfo | head -n1` in
 model*:\ 55|\
 model*:\ 77|\
-model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
+model*:\ 76|\
+model*:\ 92|\

Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-01 Thread Jacob Lifshay via Gcc-patches
On Thu, Jul 1, 2021, 15:28 H.J. Lu via llvm-dev 
wrote:

> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers 
> wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1
> registers.
> >
> > That restricts use of _Float16 to processors with SSE.  Is that what we
> > want in the ABI, or should _Float16 be available with base 32-bit x86
> > architecture features only, much like _Float128 and the decimal FP types
>
> Yes, _Float16 requires XMM registers.
>
> > are?  (If it is restricted to SSE, we can of course ensure relevant
> libgcc
> > functions are built with SSE enabled, and likewise in glibc if that gains
> > _Float16 functions, though maybe with some extra complications to get
> > relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE
> since we need to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>

Umm, if you just need to load/store 16-bit scalars in XMM registers you can
use pextrw and pinsrw which don't require AVX. f16x8 can use any of the
standard full-register load/stores.

https://gcc.godbolt.org/z/ncznr9TM1

Jacob
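
A minimal sketch of the pinsrw/pextrw route, assuming an x86 target with
SSE2 and the intrinsics from <emmintrin.h>, which typically map to those
instructions; on other targets the fallback branch only copies the bits.
The function name roundtrip_through_xmm is made up for illustration, and
the sketch moves the 16 raw bits only, without any _Float16 arithmetic.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Place a 16-bit pattern into lane 0 of an XMM register and read it
   back, without relying on AVX or AVX512FP16.  */
uint16_t roundtrip_through_xmm (uint16_t bits)
{
#if defined(__SSE2__)
  /* Insert the 16-bit value into lane 0 of a zeroed vector.  */
  __m128i v = _mm_insert_epi16 (_mm_setzero_si128 (), bits, 0);
  /* Extract lane 0; the intrinsic returns it zero-extended in an int.  */
  return (uint16_t) _mm_extract_epi16 (v, 0);
#else
  /* Assumption: no SSE2 available, so just pass the bits through.  */
  return bits;
#endif
}
```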


Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-01 Thread Craig Topper via Gcc-patches
On Thu, Jul 1, 2021 at 4:02 PM H.J. Lu via llvm-dev 
wrote:

> On Thu, Jul 1, 2021 at 3:40 PM Joseph Myers 
> wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu wrote:
> >
> > > BTW, _Float16 software emulation may require more than just SSE
> > > since we need to do _Float16 load and store with XMM registers.
> > > There is no 16bit load/store for XMM registers without AVX512FP16.
> >
> > You should be able to make the move go via general-purpose registers (for
> > example) if you can't do a direct 16-bit load/store for XMM registers.
> >
>
> There is no 16bit move between GPRs and XMM registers without
> AVX512FP16.
>
>
Isn't PINSRW supported since SSE1?


>
> --
> H.J.
> ___
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>


Re: [PATCH] Add optional _Float16 support

2021-07-01 Thread H.J. Lu via Gcc-patches
On Thu, Jul 1, 2021 at 3:40 PM Joseph Myers  wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu wrote:
>
> > BTW, _Float16 software emulation may require more than just SSE
> > since we need to do _Float16 load and store with XMM registers.
> > There is no 16bit load/store for XMM registers without AVX512FP16.
>
> You should be able to make the move go via general-purpose registers (for
> example) if you can't do a direct 16-bit load/store for XMM registers.
>

There is no 16bit move between GPRs and XMM registers without
AVX512FP16.


-- 
H.J.


Re: [PATCH] Add optional _Float16 support

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, H.J. Lu wrote:

> BTW, _Float16 software emulation may require more than just SSE
> since we need to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.

You should be able to make the move go via general-purpose registers (for 
example) if you can't do a direct 16-bit load/store for XMM registers.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add optional _Float16 support

2021-07-01 Thread H.J. Lu via Gcc-patches
On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
>
> > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> That restricts use of _Float16 to processors with SSE.  Is that what we
> want in the ABI, or should _Float16 be available with base 32-bit x86
> architecture features only, much like _Float128 and the decimal FP types

Yes, _Float16 requires XMM registers.

> are?  (If it is restricted to SSE, we can of course ensure relevant libgcc
> functions are built with SSE enabled, and likewise in glibc if that gains
> _Float16 functions, though maybe with some extra complications to get
> relevant testcases to run whenever possible.)
>

_Float16 functions in libgcc should be compiled with SSE enabled.

BTW, _Float16 software emulation may require more than just SSE
since we need to do _Float16 load and store with XMM registers.
There is no 16bit load/store for XMM registers without AVX512FP16.

-- 
H.J.


Re: [PATCH 1/2] Implement basic block path solver.

2021-07-01 Thread Jeff Law via Gcc-patches




On 6/28/2021 10:21 AM, Aldy Hernandez wrote:

This is the main basic block path solver for use in the ranger-based
backwards threader.  Given a path of BBs, the class can solve the final
conditional or any SSA name used in calculating the final conditional.

The main API is:

// This class is a basic block path solver.  Given a set of BBs
// indicating a path through the CFG, range_in_path() will return the
// range of an SSA as if the BBs in the path would have been executed
// in order.
//
// Only SSA names passed in IMPORTS are precomputed, and can be
// queried.
//
// Note that the blocks are in reverse order, thus the exit block is
// path[0].

class path_solver
{
public:
   path_solver (gimple_ranger &ranger);
   virtual ~path_solver ();
   void precompute_ranges (const vec<basic_block> *path,
                           const bitmap_head *imports);
   void range_in_path (irange &, tree name);
   void range_in_path (irange &, gimple *);
};

gcc/ChangeLog:

 * Makefile.in (OBJS): Add tree-ssa-path-solver.o.
* tree-ssa-path-solver.cc: New file.
* tree-ssa-path-solver.h: New file.
---
  gcc/Makefile.in |   1 +
  gcc/tree-ssa-path-solver.cc | 310 
  gcc/tree-ssa-path-solver.h  |  85 ++
  3 files changed, 396 insertions(+)
  create mode 100644 gcc/tree-ssa-path-solver.cc
  create mode 100644 gcc/tree-ssa-path-solver.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ebf26442992..66cc5f9529e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1644,6 +1644,7 @@ OBJS = \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
tree-ssa-operands.o \
+   tree-ssa-path-solver.o \
tree-ssa-phiopt.o \
tree-ssa-phiprop.o \
tree-ssa-pre.o \
diff --git a/gcc/tree-ssa-path-solver.cc b/gcc/tree-ssa-path-solver.cc
new file mode 100644
index 000..1e2c37cff78
--- /dev/null
+++ b/gcc/tree-ssa-path-solver.cc
@@ -0,0 +1,310 @@
+/* Basic block path solver.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfganal.h"
+#include "value-range.h"
+#include "gimple-range.h"
+#include "tree-pretty-print.h"
+#include "tree-ssa-path-solver.h"
+#include "ssa.h"
+
+// Internal construct to help facilitate debugging of solver.
+#define DEBUG_SOLVER getenv("DEBUG")
Shouldn't this really be a property of what pass is using the solver and 
whether or not the appropriate dump flag is on for that pass?




+
+path_solver::path_solver (gimple_ranger &ranger)
+  : m_ranger (ranger)
+{
+  m_cache = new ssa_global_cache;
+  m_has_cache_entry = BITMAP_ALLOC (NULL);
+  m_path = NULL;
+}
+
+path_solver::~path_solver ()
+{
+  BITMAP_FREE (m_has_cache_entry);
+  delete m_cache;
+}
Do we need to clean up any other members in here? m_has_cache_entry, 
m_path, m_imports, m_ranger?


path_solver::range_of_expr has a comment asking if the call to set_cache 
is necessary.  Can you resolve that one way or the other?


+// Initialize the current path to PATH. The current block is set to
+// the entry block to the path.
+//
+// Note that the blocks are in reverse order, so the exit block is
+// path[0].
+
+void
+path_solver::set_path (const vec<basic_block> *path)
+{
+  gcc_checking_assert (path->length () > 1);
+  m_path = path;
+  m_pos = m_path->length () - 1;
+  bitmap_clear (m_has_cache_entry);
+}
What's our position on ownership of PATH here?  Can our caller delete 
it?  Can we modify it?  Who releases it?  I realize you may be 
interfacing with some nonsense code I wrote eons ago ;-)




+
+// Return the range of the result of PHI in R.
+
+void
+path_solver::ssa_range_in_phi (irange &r, gphi *phi)
+{
+  tree name = gimple_phi_result (phi);
+  basic_block bb = gimple_bb (phi);
+
+  // We experimented with querying ranger's range_on_entry here, but
+  // the performance penalty was too high, for hardly any improvements.
+  if (at_entry ())
+{
+  r.set_varying (TREE_TYPE (name));
+  return;
+}
+
+  basic_block prev = prev_bb ();
+  edge e_in = find_edge (prev, bb);
+  for (size_t i = 0; i < gimple_phi_num_args (phi); ++i)
It's probably not important in practice, but you're going to end up 
calling gimple_phi_num_args every iteration of 

Re: [PATCH] Add optional _Float16 support

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:

> 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.

That restricts use of _Float16 to processors with SSE.  Is that what we 
want in the ABI, or should _Float16 be available with base 32-bit x86 
architecture features only, much like _Float128 and the decimal FP types 
are?  (If it is restricted to SSE, we can of course ensure relevant libgcc 
functions are built with SSE enabled, and likewise in glibc if that gains 
_Float16 functions, though maybe with some extra complications to get 
relevant testcases to run whenever possible.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] input.c: move file caching globals to a new file_cache class

2021-07-01 Thread David Malcolm via Gcc-patches
This moves some global state from input.c to a new file_cache class,
of which an instance is owned by global_dc.  Various state is also
made private.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as b544c348e13ad33d55f0d954370ab1fb0f2bf683.

gcc/ChangeLog:
* diagnostic.h (diagnostic_context::m_file_cache): New field.
* input.c (class fcache): Rename to...
(class file_cache_slot): ...this, making most members private and
prefixing fields with "m_".
(file_cache_slot::get_file_path): New accessor.
(file_cache_slot::get_use_count): New accessor.
(file_cache_slot::missing_trailing_newline_p): New accessor.
(file_cache_slot::inc_use_count): New.
(fcache_buffer_size): Move to...
(file_cache_slot::buffer_size): ...here.
(fcache_line_record_size): Move to...
(file_cache_slot::line_record_size): ...here.
(fcache_tab): Delete, in favor of global_dc->m_file_cache.
(fcache_tab_size): Move to file_cache::num_file_slots.
(diagnostic_file_cache_init): Update for move of fcache_tab
to global_dc->m_file_cache.
(diagnostic_file_cache_fini): Likewise.
(lookup_file_in_cache_tab): Convert to...
(file_cache::lookup_file): ...this.
(diagnostics_file_cache_forcibly_evict_file): Update for move of
fcache_tab to global_dc->m_file_cache, moving most of
implementation to...
(file_cache::forcibly_evict_file): ...this new function and...
(file_cache_slot::evict): ...this new function.
(evicted_cache_tab_entry): Convert to...
(file_cache::evicted_cache_tab_entry): ...this.
(add_file_to_cache_tab): Convert to...
(file_cache::add_file): ...this, moving bulk of implementation
to...
(file_cache_slot::create): ...this new function.
(file_cache::file_cache): New.
(file_cache::~file_cache): New.
(lookup_or_add_file_to_cache_tab): Convert to...
(file_cache::lookup_or_add_file): ...this new function.
(fcache::fcache): Rename to...
(file_cache_slot::file_cache_slot): ...this, adding "m_" prefixes
to fields.
(fcache::~fcache): Rename to...
(file_cache_slot::~file_cache_slot): ...this, adding "m_" prefixes
to fields.
(needs_read): Convert to...
(file_cache_slot::needs_read_p): ...this.
(needs_grow): Convert to...
(file_cache_slot::needs_grow_p): ...this.
(maybe_grow): Convert to...
(file_cache_slot::maybe_grow): ...this.
(read_data): Convert to...
(file_cache_slot::read_data): ...this.
(maybe_read_data): Convert to...
(file_cache_slot::maybe_read_data): ...this.
(get_next_line): Convert to...
(file_cache_slot::get_next_line): ...this.
(goto_next_line): Convert to...
(file_cache_slot::goto_next_line): ...this.
(read_line_num): Convert to...
(file_cache_slot::read_line_num): ...this.
(location_get_source_line): Update for moving of globals to
global_dc->m_file_cache.
(location_missing_trailing_newline): Likewise.
* input.h (class file_cache_slot): New forward decl.
(class file_cache): New.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic.h |   3 +
 gcc/input.c  | 459 +++
 gcc/input.h  |  33 
 3 files changed, 301 insertions(+), 194 deletions(-)

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 1b9d6b1f64d..086bc4f903f 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -136,6 +136,9 @@ struct diagnostic_context
   /* Where most of the diagnostic formatting work is done.  */
   pretty_printer *printer;
 
+  /* Cache of source code.  */
+  file_cache *m_file_cache;
+
   /* The number of times we have issued diagnostics.  */
   int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
 
diff --git a/gcc/input.c b/gcc/input.c
index 9e39e7df83c..de20d983d2c 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -32,9 +32,29 @@ along with GCC; see the file COPYING3.  If not see
 
 /* This is a cache used by get_next_line to store the content of a
file to be searched for file lines.  */
-class fcache
+class file_cache_slot
 {
 public:
+  file_cache_slot ();
+  ~file_cache_slot ();
+
+  bool read_line_num (size_t line_num,
+ char ** line, ssize_t *line_len);
+
+  /* Accessors.  */
+  const char *get_file_path () const { return m_file_path; }
+  unsigned get_use_count () const { return m_use_count; }
+  bool missing_trailing_newline_p () const
+  {
+return m_missing_trailing_newline;
+  }
+
+  void inc_use_count () { m_use_count++; }
+
+  void create (const char *file_path, FILE *fp, unsigned highest_use_count);
+  void evict ();
+
+ private:
   /* These are information used to store a line boundary.  */
   class line_info
   {
@@ -61,36 

Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, Richard Biener via Gcc-patches wrote:

> > C++ FE doesn't support _Float16, and the place float/double are
> > handled is in convert.c(which is GENERIC?), that's why I decided to do
> > it in the backend.

I think there ought to be a preliminary patch series adding whatever 
_FloatN support is relevant to the C++ front end - covering at least those 
types that have modes different from float / double / long double, even if 
you don't cover all the _FloatN / _FloatNx types (e.g. _Float32 as 
distinct from float), and ensuring the corresponding constant suffixes are 
also accepted in C++ in whatever way makes sense for that language.  (As I 
noted in bug 85518, there are ICEs in name mangling when such types escape 
into C++ code at present.)

When this was discussed on the gcc list in March, Jonathan Wakely at least 
supported making _Float16 available in C++, even if no C++ 
front-end maintainers contributed to that discussion.

> Yes, but we can easily add a pattern to match.pd, sth like
> 
> (for sq (SQRT)
>  (simplify
>   (convert (sq@1 (convert @0)))
>   (if (types_match (type, TREE_TYPE (@0))
>&& TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))

(With a more complicated precision condition, see convert.c for details.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RS6000] Adjust testcases for power10 instructions

2021-07-01 Thread Segher Boessenkool
Hi!

On Thu, Jul 01, 2021 at 10:59:15PM +0930, Alan Modra wrote:
>   * lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.

Mike added this already, please make sure to not add it twice :-)

[...]
>   gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.

(It still allows older as well, so "Also match" maybe?)

Did you make sure all of these are correct and expected?  Are the
testcases still strict enough, or should you add -mno-pcrel to the
options, instead?  Or maybe test both -mpcrel and -mno-pcrel?  Etc.

>   * gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.

You also add a "p?", is that expected?  Should be in the changelog
then :-)

> -/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */


> @@ -1,12 +1,12 @@
>  /* Test expected code generation for lvsl and lvsr on little endian.
> -   Note that lvsl and lvsr are each produced once, but the filename
> -   causes them to appear twice in the file.  */
> +   Note that \s is used in the lvsl/lvsr matches so we don't match
> +   on '.file "lvsl-lvsr.c"'.  */

Even better is to not put the instruction names in the filename, but
heh, maybe that would be too simple ;-)


Segher


Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:

> The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> require -mavx512fp16, we need to compile complex _Float16 functions in
> libgcc without -mavx512fp16.  Complex _Float16 performance is very
> important for our _Float16 usage.   _Float16 performance has to be
> very fast.  There should be no emulation anywhere when -mavx512fp16
> is used.   That is why _Float16 is available only with -mavx512fp16.

You could build IFUNC versions of the libgcc functions (like float128 on 
powerpc64le), to be fast (modulo any IFUNC overhead) when run on 
AVX512FP16 hardware.  Or arrange for different libcall names to be used 
depending on the instruction set features available, and build those 
functions under multiple names, to be fast when the application is built 
with -mavx512fp16.

Since the HCmode libgcc functions just convert to/from SFmode and do all 
their computations on SFmode (to avoid intermediate overflows / 
cancellation resulting in inaccuracy), an F16C version may make sense as 
well (assuming use of the F16C conversion instructions is still efficient 
once you allow for zeroing the unused parts of the vector register, if 
necessary to avoid spurious exceptions from converting junk data in those 
parts of the register).
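
The remark about computing in SFmode to avoid intermediate overflow and cancellation can be illustrated with a stand-in pair of types. What follows is only a sketch, not the libgcc code: it assumes float as the narrow type and double as the wide type (because _Float16 may not be usable with the reader's compiler), and both function names are hypothetical. It computes just the real part of a complex product, once entirely in the narrow type and once after widening.

```cpp
#include <cmath>

// Real part of (a + b*i) * (c + d*i), computed entirely in the narrow type.
// The products a*c and b*d can overflow even when their difference is small.
float real_part_narrow(float a, float b, float c, float d)
{
    volatile float ac = a * c;  // volatile: force each product to be rounded to float
    volatile float bd = b * d;
    return ac - bd;
}

// The same computation after widening to double (a stand-in for the
// HCmode -> SFmode widening described above); only the result is narrowed.
float real_part_widened(float a, float b, float c, float d)
{
    double ac = static_cast<double>(a) * c;
    double bd = static_cast<double>(b) * d;
    return static_cast<float>(ac - bd);
}
```

With all four inputs equal to 1e30f, the narrow version computes inf - inf and returns NaN, while the widened version returns 0, since both products fit in double and cancel exactly.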

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16).

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:

> +/* Optimize for code like (_Float16) __builtin_ceif ((float) f16)
> +   since it's not handled in frontend.  */

Much the same comments apply as for sqrt.  But in this case, the 
conversion code is in match.pd - right now, specific to pairs of types 
(float, double) and (float, long double).  And it's logically valid for 
any pair of same-radix floating-point types, the values of one of which 
are a subset of the values of the other (a strict subset, for it actually 
to be an interesting optimization).  (So when making it apply to more 
general types, take care that it does *not* apply to the __ibm128 / 
_Float128 pair on powerpc64le, in either order, because neither of those 
types has values a subset of the other.)

(Also, the match.pd code isn't handling roundeven at present, but that 
should be a trivial addition to it.)
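
As a small illustration of the subset condition, here is a sketch for the (float, double) pair that match.pd is said above to handle already. The function names are hypothetical and this is not the match.pd transformation itself. Every float value is exactly representable as a double, and the ceiling of a float value is itself representable as a float, so the two forms are expected to agree.

```cpp
#include <cmath>

// ceil computed after widening to double and narrowing the result back,
// i.e. the shape (float) ceil ((double) f).
float ceil_via_double(float f)
{
    return static_cast<float>(std::ceil(static_cast<double>(f)));
}

// ceil computed directly in the narrow type.
float ceil_direct(float f)
{
    return std::ceil(f);  // selects the float overload
}
```

The same reasoning does not carry over to a pair such as __ibm128 / _Float128, where neither type's values are a subset of the other's.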

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16).

2021-07-01 Thread Joseph Myers
On Thu, 1 Jul 2021, liuhongt via Gcc-patches wrote:

> +/* Optimize for code like (_Float16) __builtin_sqrtf ((float) f16)
> +   since it's not handled in frontend.  */

If correct, it *should* be handled in front end (well, middle-end).  See 
what convert.c:convert_to_real_1 does, with a long comment about when it's 
safe for sqrt (the comment says it's safe when P1 >= P2*2+2, which is true 
for SFmode and HFmode).
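
The arithmetic behind that condition can be checked directly. This is a minimal sketch, assuming IEEE binary16 has an 11-bit significand (hard-coded below, since FLT16_MANT_DIG may not be defined by every compiler); the function name is hypothetical and only restates the inequality from the comment.

```cpp
#include <cfloat>

// Significand precision of IEEE binary16 (HFmode); an assumption hard-coded
// here rather than taken from FLT16_MANT_DIG.
constexpr int kHalfMantDig = 11;

// The condition quoted from the convert_to_real_1 comment: narrowing a sqrt
// computed at precision p1 to precision p2 is safe when p1 >= p2 * 2 + 2.
constexpr bool sqrt_narrowing_is_safe(int p1, int p2)
{
    return p1 >= p2 * 2 + 2;
}
```

For SFmode and HFmode this gives 24 >= 11 * 2 + 2 = 24, which holds with no margin; for double and float it gives 53 >= 50.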

The issue (apart from convert_to_real_1 being earlier than this really 
ought to be done - something based on match.pd would be better - but you 
can ignore that for now) would be the limitation earlier in that code to 
the modes of float and double:

  /* Disable until we figure out how to decide whether the functions are
 present in runtime.  */
  /* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
  if (optimize
  && (TYPE_MODE (type) == TYPE_MODE (double_type_node)
  || TYPE_MODE (type) == TYPE_MODE (float_type_node)))

In this case, you *don't* have the sqrtf16 function in the runtime library 
(adding _Float16 support to glibc would be good, but runs into various 
other complications that would need considering, especially questions of 
how if at all it can be added on an architecture before the minimum GCC 
version for building glibc for that architecture is recent enough to 
support _Float16 for that architecture).  So effectively what you'd need 
is some way of saying "__builtin_sqrtf16 is available", where "available" 
for now means "will be expanded inline", i.e. some combination of 
!flag_math_errno and instruction set features.  That's not really within 
the scope of what the libc_has_function hook does, but it could maybe be 
extended to take information about the exact function in question, or 
another similar hook could be added.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Add optional _Float16 support

2021-07-01 Thread H.J. Lu via Gcc-patches
1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
---
 low-level-sys-info.tex | 57 +-
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..82956e3 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers 
to a
 \subsubsection{Fundamental Types}
 
 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types.  \code{__float80},
+scalar types and the processor scalar types.  \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.
 
@@ -79,22 +80,25 @@ scalar types and the processor scalar types.  
\code{__float80},
 & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
 & \texttt{\textit{any-type} (*)()} & & \\
 \hline
-Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
 \cline{2-5}
-point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 
& 16-bit (IEEE-754) \\
 \cline{2-5}
-& \texttt{__float80}$^{\dagger\dagger}$  & 12 & 4 & 80-bit extended 
(IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{float} & 4 & 4 & single (IEEE-754) \\
+\cline{2-5}
+Floating- & \texttt{double} & 8
+   & 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\
+\cline{2-5}
+point & \texttt{__float80}$^{\dagger\dagger}$  & 16 & 16 & 80-bit extended 
(IEEE-754) \\
+& \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$  & 16 & 16 
& 80-bit extended (IEEE-754) \\
 \cline{2-5}
 & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended 
(IEEE-754) \\
 \hline
-Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
 \cline{2-5}
-Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
-point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & 
& \\
+Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
+Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ 
& & & \\
 \cline{2-5}
-& \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 
80-bit extended (IEEE-754) \\
+point & \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 
80-bit extended (IEEE-754) \\
 & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
 \cline{2-5}
 & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 
128-bit extended (IEEE-754) \\
@@ -125,6 +129,8 @@ The \texttt{long double} type is 64-bit, the same as the 
\texttt{double}
 type, on the Android{\texttrademark} platform.  More information on the
 Android{\texttrademark} platform is available from
 \url{http://www.android.com/}.}\\
+\multicolumn{5}{p{13cm}}{\myfontsize 
$^{\dagger\dagger\dagger\dagger\dagger\dagger}$
+The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\
   \end{tabular}
 }
 \end{table}
@@ -323,6 +329,7 @@ at the time of the call.
 \begin{table}
 \Hrule
   \caption{Register Usage}
+  \myfontsize
   \label{fig-reg-usage}
   \begin{center}
 \begin{tabular}{l|p{8.35cm}|l}
@@ -346,13 +353,29 @@ of some 64bit return types & No \\
 \EBP & callee-saved register; optionally used as frame pointer & Yes \\
 \ESI & callee-saved register & yes \\
 \EDI & callee-saved register & yes \\
-\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return
-\code{__m128}, \code{__m256} parameters & No\\
-\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass
-\code{__m128}, & No \\
-\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\
-\reg{xmm3}--\reg{xmm7},& scratch registers & No \\
-\reg{ymm3}--\reg{ymm7} & & \\
+\reg{xmm0} & scratch register; also used to pass the first \code{__m128}
+ parameter and return \code{__m128}, \code{_Float16},
+the real part of \code{_Complex _Float16} & No \\
+\reg{ymm0} & scratch register; also used to pass the first \code{__m256}
+ parameter and return \code{__m256} & No \\
+\reg{zmm0} & scratch register; also used to pass the first \code{__m512}
+ parameter and return \code{__m512} & No \\
+\reg{xmm1} & scratch register; also used to pass the second \code{__m128}
+ parameter and return the imaginary part of
+\code{_Complex _Float16} & No \\
+\reg{ymm1} & scratch register; also used to pass the second \code{__m256}
+ parameters & No \\
+\reg{zmm1} & scratch register; also used to pass 

Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-01 Thread Joseph Myers
Some general comments, following what I said on libc-alpha:


1. Can you confirm that the ABI being used for 64-bit, for _Float16 and 
_Complex _Float16 argument passing and return, follows the current x86_64 
ABI document?


2. Can you confirm that if you build with this instruction set extension 
enabled by default, and run GCC tests for a corresponding (emulated?) 
processor, all the existing float16 tests in the testsuite are enabled and 
PASS (both compilation and execution) (both 64-bit and 32-bit testing)?


3. There's an active 32-bit ABI mailing list (ia32-...@googlegroups.com).  
If you want to support _Float16 in the 32-bit case, please work with it to 
get the corresponding ABI documented (using only memory and 
general-purpose registers seems like a good idea, so that the ABI can be 
supported for the base architecture without depending on SSE registers 
being present).  In the absence of 32-bit ABI support it might be better 
to disable the HFmode support for 32-bit.


4. Support for _Float16 really ought not to depend on whether a particular 
instruction set extension is present, just like with other floating-point 
types; it makes sense, as an API, for all x86 processors (and like many 
APIs, it will be faster on some processors than on others).  More specific 
points here are:

(a) Basic arithmetic (+-*/) can be done by converting to SFmode, doing 
arithmetic there and converting back to HFmode; the results of doing so 
will be correctly rounded.  Indeed, I think optabs.c handles that 
automatically when operations are available on a wider mode but not on the 
desired mode (but you'd need to check carefully that all the expected 
conversions do occur).

(b) Conversions to/from all other floating-point modes will always be 
needed, whether in hardware or in software.

(c) In the F16C (Ivy Bridge and later) case, where you have hardware 
conversions to/from float (only), it's fine to convert to double (or long 
double) via float.  (On efficiency grounds, widening from HFmode to TFmode 
should be a pure software operation, that should be faster than having an 
intermediate conversion to SFmode when the SFmode-to-TFmode conversion is 
a software operation.)

(d) In the F16C case (where there are hardware conversions only from 
SFmode, not from wider modes), conversion *from* DFmode (or XFmode or 
TFmode) to HFmode should be a software operation, to avoid double 
rounding; an intermediate conversion to SFmode would be incorrect.

(e) It's OK for conversions to/from integer modes to go via SFmode 
(although I don't know if that's efficient or not).  Any case where a 
conversion from integer to SFmode is inexact would overflow HFmode, so 
there are no double rounding issues.

(f) In the F16C case, it seems the hardware instructions only work on 
vectors, not scalars, so care would need to be taken to use them for 
scalar conversions only if the other elements of the vector register are 
known to be safe to convert without raising any exceptions (e.g. all zero 
bits, or -fno-trapping-math in effect).

(g) If concerned about efficiency of intermediate truncations on 
processors without hardware _Float16 arithmetic, look at 
aarch64_excess_precision; you have the option of using excess precision 
for _Float16 by default, though that only really helps for C given the 
lack of excess precision support in the C++ front end.  (Enabling this can 
cause trouble for code that only expects C99/C11 values of 
FLT_EVAL_METHOD, however; see the -fpermitted-flt-eval-methods option for 
more details.)
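
The double-rounding hazard in point (d) can be shown with integer significands alone. This is a sketch under stated assumptions, not compiler or libgcc code: the helper names are hypothetical, exponents are ignored, and the input is the 31-bit significand of x = 1 + 2^-11 + 2^-30 scaled by 2^30. Rounding goes either straight to 11 bits (HFmode precision) or first to 24 bits (SFmode precision) and then to 11.

```cpp
#include <cassert>
#include <cstdint>

// Drop the low `drop` bits of an integer significand, rounding to nearest
// with ties to even (the default IEEE rounding mode).
std::uint64_t round_off_low_bits(std::uint64_t sig, unsigned drop)
{
    assert(drop >= 1 && drop < 64);
    const std::uint64_t half = std::uint64_t{1} << (drop - 1);
    const std::uint64_t rem = sig & ((std::uint64_t{1} << drop) - 1);
    std::uint64_t q = sig >> drop;
    if (rem > half || (rem == half && (q & 1)))
        ++q;
    return q;
}

// 31-bit significand of x = 1 + 2^-11 + 2^-30, scaled by 2^30.
constexpr std::uint64_t kX =
    (std::uint64_t{1} << 30) + (std::uint64_t{1} << 19) + 1;

// Round straight to an 11-bit significand (binary16 precision).
std::uint64_t to_half_directly(std::uint64_t sig31)
{
    return round_off_low_bits(sig31, 20);
}

// Round to a 24-bit significand (binary32) first, then to 11 bits.
std::uint64_t to_half_via_float(std::uint64_t sig31)
{
    return round_off_low_bits(round_off_low_bits(sig31, 7), 13);
}
```

For kX the direct path yields 1025 (that is, 1 + 2^-10) while the path through 24 bits yields 1024 (that is, 1.0): the first rounding discards the 2^-30 term, leaving an exact tie that then rounds to even.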


5. Suppose that in some cases you do disable _Float16 support (whether 
that's just for 32-bit until the ABI has been defined, or also in the 
absence of instruction set support despite my comments above).  Then the 
way you do that in this patch series, enabling the type in 
ix86_scalar_mode_supported_p and ix86_libgcc_floating_mode_supported_p and 
giving an error later in ix86_expand_move, is a bad idea.

Errors in expanders are generally problematic (they don't have good 
location information available).  But apart from that, ordinary user code 
should be able to tell whether _Float16 is supported by testing whether 
e.g. __FLT16_MANT_DIG__ is defined (like float.h does), or by including 
float.h (with __STDC_WANT_IEC_60559_TYPES_EXT__ defined) and then testing 
whether one of the FLT16_* macros is defined, or in a configure test by 
just declaring something using the _Float16 type.  Patch 1 changes 
check_effective_target_float16 to work around your technique for disabling 
_Float16 in ix86_expand_move, but it should be considered a stable user 
API that any of the above methods can be used in user code to check for 
_Float16 support - user code shouldn't need to know implementation details 
that you need to do something that will go through ix86_expand_move to see 
whether _Float16 is supported or not (and user code shouldn't need to use 
a configure test at all for this, testing FLT16_* after including 

Re: [PATCH] soft-fp: Update soft-fp from glibc

2021-07-01 Thread H.J. Lu via Gcc-patches
On Thu, Jul 1, 2021 at 1:12 PM Jeff Law  wrote:
>
>
>
> On 7/1/2021 1:28 PM, H.J. Lu via Gcc-patches wrote:
> > This patch is updating soft-fp from glibc:
> >
> > 1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
> > 2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.
> >
> > These are needed by x86 _Float16 support.
> Presumably these are lifted verbatim from glibc?  If so, clearly an ACK

That is correct.  I am checking it in.

> for the trunk.
>
> jeff
>

Thanks.

-- 
H.J.


Re: [PING][PATCH 2/4] remove %G and %K from calls in front end and middle end (PR 98512)

2021-07-01 Thread Martin Sebor via Gcc-patches

On 6/30/21 5:35 PM, David Malcolm wrote:

On Wed, 2021-06-30 at 13:45 -0600, Martin Sebor wrote:

On 6/30/21 9:39 AM, Martin Sebor wrote:

Ping.  Attached is the same patch rebased on top of the latest trunk.


Please see the attached patch instead.  The previous one had a typo
in it.



As discussed in the review of Aldy's recent changes to the backwards
threader, he has run into the same bug the patch fixes.  Getting this
patch set reviewed and approved would be helpful in keeping him from
having to work around the bug.

https://gcc.gnu.org/pipermail/gcc/2021-June/236608.html

On 6/10/21 5:27 PM, Martin Sebor wrote:

This diff removes the uses of %G and %K from all warning_at() calls
throughout GCC front end and middle end.  The inlining context is
included in diagnostic output whenever it's present.






Thanks for writing the patch.

I went through the full patch, though my eyes may have glazed over in
places at all of the %G and %K removals.  I *think* you got them mostly
correct, apart from the following possible issues and nits...


diff --git a/gcc/expr.c b/gcc/expr.c
index 025033c9ecf..b9fe1cf91d7 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c


[...]


@@ -11425,10 +11425,10 @@ expand_expr_real_1 (tree exp, rtx target, 
machine_mode tmode,
 DECL_ATTRIBUTES (fndecl))) != NULL)
  {
const char *ident = lang_hooks.decl_printable_name (fndecl, 1);
-   warning_at (tree_nonartificial_location (exp),
+   warning_at (EXPR_LOCATION (exp),


Are we preserving the existing behavior for
__attribute__((__artificial__)) here?
Is this behavior handled somewhere else in the patch kit?


Yes.  The warning infrastructure (set_inlining_locations) uses
the location of the site into which the statement has been inlined
regardless of whether the inlined function is artificial.




OPT_Wattribute_warning,
-   "%Kcall to %qs declared with attribute warning: %s",
-   exp, identifier_to_locale (ident),
+   "call to %qs declared with attribute warning: %s",
+   identifier_to_locale (ident),
TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
  }
  


[...]


diff --git a/gcc/gimple-ssa-warn-restrict.c b/gcc/gimple-ssa-warn-restrict.c
index 02771e4cd60..efb8db98393 100644
--- a/gcc/gimple-ssa-warn-restrict.c
+++ b/gcc/gimple-ssa-warn-restrict.c


[...]


@@ -1689,7 +1689,7 @@ maybe_diag_access_bounds (gimple *call, tree func, int 
strict,
  const builtin_memref &ref, offset_int wroff,
  bool do_warn)
  {
-  location_t loc = gimple_or_expr_nonartificial_location (call, ref.ptr);
+  location_t loc = gimple_location (call);


Likewise here.


@@ -2065,7 +2065,7 @@ check_bounds_or_overlap (range_query *query,
}
  }
  
-  location_t loc = gimple_or_expr_nonartificial_location (call, dst);
+  location_t loc = gimple_location (call);


Likewise here.

[...]


diff --git a/gcc/testsuite/g++.dg/warn/Wdtor1.s 
b/gcc/testsuite/g++.dg/warn/Wdtor1.s
new file mode 100644
index 000..e69de29bb2d


Is this an empty .s file?  Was this a misfire with "git add"?


It must have been, yes.



[...]

@@ -90,8 +90,8 @@ NOIPA void warn_g2 (struct A *p)
g2 (p);
  }
  
-// { dg-message "inlined from 'g2'" "" { target *-*-* } 0 }
-// { dg-message "inlined from 'warn_g2'" "" { target *-*-* } 0 }
+// { dg-message "inlined from 'g2'" "note on line 93" { target *-*-* } 0 }
+// { dg-message "inlined from 'warn_g2'" "note on line 94" { target *-*-* } 0 }


You've added descriptions to disambiguate all of the various directives
on line 0, which is good, but I don't like the use of line numbers in
the descriptions, since it will get very confusing if the numbering
changes.

Would it work to use the message text as the description e.g.

   // { dg-message "inlined from 'warn_g2'" "inlined from 'warn_g2'" { target 
*-*-* } 0 }

or somesuch?


It would certainly work, they're just informational labels printed
by DejaGnu when the assertions fail.  I added them to help me see
what they went with while working with the test.  I'm not concerned
about the line numbers changing.  If they do and someone notices,
they can update them, the same way they might want to if they
rename the functions they're inlined into.





diff --git a/gcc/testsuite/gcc.dg/Wfree-nonheap-object-5.c 
b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-5.c
new file mode 100644
index 000..979e1e3d78f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-5.c
@@ -0,0 +1,45 @@
+/* Similar to Wfree-nonheap-object-4.c but without system headers:
+   verify that warnings for the same call site from distinct callers
+   include the correct function names in the inlining stack.
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+struct A
+{
+  void *p;
+};
+
+void f0 (struct A *p)
+{
+  __builtin_free (p->p);  

Re: [PATCH] soft-fp: Update soft-fp from glibc

2021-07-01 Thread Jeff Law via Gcc-patches




On 7/1/2021 1:28 PM, H.J. Lu via Gcc-patches wrote:

This patch is updating soft-fp from glibc:

1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.

These are needed by x86 _Float16 support.
Presumably these are lifted verbatim from glibc?  If so, clearly an ACK 
for the trunk.


jeff



Re: [PATCH] rs6000: Add MMA __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins

2021-07-01 Thread Peter Bergner via Gcc-patches
On 7/1/21 1:01 PM, Segher Boessenkool wrote:
> On Wed, Jun 30, 2021 at 04:56:04PM -0500, Peter Bergner wrote:
>> LLVM added the __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins.
>> The following patch adds support for them to GCC so that we stay in sync
>> with LLVM.
> 
> This should be documented somewhere.

Done.


>> +  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR, TREE_TYPE 
>> (ptr), ptr, offset));
> 
> Line (much) too long.

Fixed.


>> +  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR, TREE_TYPE 
>> (ptr), ptr, offset));
> 
> Same.

Fixed.


> Can BU_MMA_LD be used only for lxvp?  Name it BU_MMA_PAIR_LD then?  Same
> for _ST ofc.

Yes, it's used only for lxvp and stxvp and uses our MMA specific
__vector_pair types too.  Macro name changed.
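
For readers without a Power10 toolchain, the memory access that the fold
builds (a simple mem ref of ptr plus a byte offset) can be modelled in
portable C.  This is only a sketch: pair_model_t, lxvp_model and
stxvp_model are hypothetical names standing in for __vector_pair and the
two built-ins, and the argument order is taken from the gimple_call_arg
indices in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the 32-byte __vector_pair type.  */
typedef struct { unsigned char bytes[32]; } pair_model_t;

/* Model of __builtin_vsx_lxvp (offset, ptr): load one pair from the
   address formed by adding a byte offset to PTR.  */
pair_model_t
lxvp_model (size_t offset, const pair_model_t *ptr)
{
  pair_model_t result;
  memcpy (&result, (const unsigned char *) ptr + offset, sizeof result);
  return result;
}

/* Model of __builtin_vsx_stxvp (src, offset, ptr): store SRC at the
   address formed by adding a byte offset to PTR.  */
void
stxvp_model (pair_model_t src, size_t offset, pair_model_t *ptr)
{
  memcpy ((unsigned char *) ptr + offset, &src, sizeof src);
}
```

Note that the offset is in bytes, not in units of pairs, which matches
the POINTER_PLUS_EXPR used in the fold.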



> The patch is okay for trunk.

Below is the updated patch which is bootstrapping now.  I'll commit it
if it shows no regressions.


> For the backports it is okay if Bill has looked at this patch as well.

Bill has not seen the patch.  I'm not sure when/if he'll get a chance
to either.


Peter




gcc/
* config/rs6000/rs6000-builtin.def (BU_MMA_PAIR_LD, BU_MMA_PAIR_ST):
New macros.
(__builtin_vsx_lxvp, __builtin_vsx_stxvp): New built-ins.
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Expand
lxvp and stxvp built-ins.
(mma_init_builtins): Handle lxvp and stxvp built-ins.
(builtin_function_type): Likewise.
* doc/extend.texi (__builtin_vsx_lxvp, __builtin_vsx_stxvp): Document.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-7.c: New test.
* gcc.target/powerpc/mma-builtin-8.c: New test.


diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index d7ce4de421e..6270444ef70 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -484,6 +484,25 @@
 | RS6000_BTC_SENARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+#define BU_MMA_PAIR_LD(ENUM, NAME, ATTR)   \
+  RS6000_BUILTIN_M (VSX_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_vsx_" NAME,  /* NAME */  \
+   RS6000_BTM_MMA, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY\
+| RS6000_BTC_GIMPLE),  \
+   CODE_FOR_nothing)   /* ICODE */
+
+#define BU_MMA_PAIR_ST(ENUM, NAME, ATTR)   \
+  RS6000_BUILTIN_M (VSX_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_vsx_" NAME,  /* NAME */  \
+   RS6000_BTM_MMA, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_TERNARY   \
+| RS6000_BTC_VOID  \
+| RS6000_BTC_GIMPLE),  \
+   CODE_FOR_nothing)   /* ICODE */
+
 /* ISA 2.05 (power6) convenience macros. */
 /* For functions that depend on the CMPB instruction */
 #define BU_P6_2(ENUM, NAME, ATTR, ICODE)   \
@@ -3253,6 +3272,9 @@ BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, 
"__builtin_cfstring", RS6000_BTM_ALWAYS,
 BU_P10V_VSX_1 (XVCVBF16SPN, "xvcvbf16spn", MISC, vsx_xvcvbf16spn)
 BU_P10V_VSX_1 (XVCVSPBF16, "xvcvspbf16",   MISC, vsx_xvcvspbf16)
 
+BU_MMA_PAIR_LD (LXVP,  "lxvp", MISC)
+BU_MMA_PAIR_ST (STXVP, "stxvp",PAIR)
+
 BU_MMA_1 (XXMFACC, "xxmfacc",  QUAD, mma_xxmfacc)
 BU_MMA_1 (XXMTACC, "xxmtacc",  QUAD, mma_xxmtacc)
 BU_MMA_1 (XXSETACCZ,   "xxsetaccz",MISC, mma_xxsetaccz)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b67789845a5..6115e3b34d9 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11913,6 +11913,32 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator 
*gsi)
   gsi_replace_with_seq (gsi, new_seq, true);
   return true;
 }
+  else if (fncode == VSX_BUILTIN_LXVP)
+{
+  push_gimplify_context (true);
+  tree offset = gimple_call_arg (stmt, 0);
+  tree ptr = gimple_call_arg (stmt, 1);
+  tree lhs = gimple_call_lhs (stmt);
+  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
+  TREE_TYPE (ptr), ptr, offset));
+  gimplify_assign (lhs, mem, &new_seq);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, new_seq, true);
+  return true;
+}
+  else if (fncode == VSX_BUILTIN_STXVP)
+{
+  

Re: [PATCH 1/4] introduce diagnostic infrastructure changes (PR 98512)

2021-07-01 Thread Martin Sebor via Gcc-patches

On 6/30/21 4:55 PM, David Malcolm wrote:

On Tue, 2021-06-15 at 17:00 -0600, Martin Sebor wrote:

On 6/11/21 11:04 AM, David Malcolm wrote:

On Thu, 2021-06-10 at 17:26 -0600, Martin Sebor wrote:

This diff introduces the diagnostic infrastructure changes to
support
controlling warnings at any call site in the inlining stack and
printing
the inlining context without the %K and %G directives.
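
As an illustration of what "controlling warnings at any call site in the
inlining stack" is meant to allow, here is a minimal user-level sketch.
The function names are made up for the example, and whether the pragma
at the call site is honored depends on having a compiler with this
series applied; the code itself is valid and warning-free either way.

```c
#include <stdlib.h>

/* Hypothetical helper: a warning about the free () call would be
   reported here, with the callers listed as "inlined from" notes.  */
static inline void
release_sketch (void *p)
{
  free (p);
}

int
caller_sketch (void)
{
  char *buf = malloc (16);
  if (!buf)
    return -1;
  buf[0] = 'x';

  /* Suppress the warning for this one call site in the inlining stack
     rather than at the location of the free () call itself.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
  release_sketch (buf);
#pragma GCC diagnostic pop

  return 0;
}
```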


Thanks for working on this, looks very promising.


Improve warning suppression for inlined functions.

Resolves:
PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at
declaration site
PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in
conjunction with alias attribute


Am I right in thinking that you add test coverage for both of these
in
patch 2 of the kit?


Yes, the tests depend on the changes in patch 2 (some existing tests
fail with just patch 1 applied because the initial location passed
to warning_t() is different than with it).




[...]






Yep, thanks.  Please see the attached revision.

Martin


Various nits inline below:


diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index d58586f2526..3a22d4d26a6 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -991,51 +991,93 @@ print_parseable_fixits (pretty_printer *pp, rich_location 
*richloc,
pp_set_prefix (pp, saved_prefix);
  }
  
-/* Update the diag_class of DIAGNOSTIC based on its location

-   relative to any
+/* Update the inlininig context in CONTEXT for a DIAGNOSTIC.  */

  ^

It's inlining_info now, so please update this comment.


Inlining_info is the name of the struct (per your request) that
captures what's commonly referred to as the inlining context:
the context from which a function call is made and into which
it's inlined.  Another common name for it is inlining stack.
These are also the terms I used to document the purpose
of the struct in diagnostic.h.  As far as I know, no one refers
to it as the "info of a function call", and while that's not wrong
per se, I don't think it helps make the comment clearer.

I've made the requested naming changes against my better judgment.




+
+static void
+update_inlining_context (diagnostic_context *context,
+diagnostic_info *diagnostic)


...and please rename to "get_any_inlining_info".


Done (with the same reservation as above).




+{
+  auto &ilocs = diagnostic->m_iinfo.m_ilocs;
+
+  if (context->set_locations_cb)
+{
+  /* Retrieve the locations into which the expression about to be
+diagnosed has been inlined, including those of all the callers
+all the way down the inlining stack.  */
+  context->set_locations_cb (context, diagnostic);
+  location_t loc = diagnostic->m_iinfo.m_ilocs.last ();




+  if (diagnostic->m_iinfo.m_ao && loc != UNKNOWN_LOCATION)
+   diagnostic->message.set_location (0, loc, SHOW_RANGE_WITH_CARET);


What is the purpose of the above two lines of code?
(I believe it's to replace the %G/%K stuff, right?)
Please can you add a suitable comment.


The purpose of this code was to restore the caret in case it was
suppressed for a rich location by a call to add_range().  I didn't
note down the test that failed and that prompted me to add it (it's
what I was chasing down when I noticed the semi_embedded_vec and
rich_location bugs), but removing it now doesn't trigger any
failures anymore.


+}
+  else
+{
+  /* When there's no callback use just the one location provided
+by the caller of the diagnostic function.  */
+  location_t loc = diagnostic_location (diagnostic);
+  ilocs.safe_push (loc);
+  diagnostic->m_iinfo.m_allsyslocs = in_system_header_at (loc);
+}
+}
+
+/* Update the kind of DIAGNOSTIC based on its location(s), including
+   any of those in its inlining context, relative to any

^^^

"stack" rather than "context" here; I think we're overusing the word "context".


I don't foresee anyone getting confused by either but again, no
point in spending time arguing about it.  I used stack instead.



...>>   /* Generate a URL string describing CWE.  The caller is 
responsible for

@@ -1129,6 +1171,9 @@ static bool
  diagnostic_enabled (diagnostic_context *context,
diagnostic_info *diagnostic)
  {
+  /* Update the inlining context for this diagnostic.  */
+  update_inlining_context (context, diagnostic);


Please rename as described above.


Done.

The revised patch is in the attachment.  I plan to go with it unless
there are requests for code changes.

Martin
Improve warning suppression for inlined functions [PR 98512].

Resolves:
PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site
PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in conjunction with alias attribute

gcc/ChangeLog:

	* diagnostic.c (get_any_inlining_info): New.
	(update_effective_level_from_pragmas): Handle inlining context.
	

Re: [Patch] Fortran: Fix bind(C) character length checks

2021-07-01 Thread Sandra Loosemore

On 7/1/21 11:08 AM, Tobias Burnus wrote:

Hi all,

this patch came up when discussing Sandra's TS29113 patch internally.
There is presumably also some overlap with José's patches.

This patch tries to rectify the BIND(C) CHARACTER handling on the
diagnostic side, only. That is: what to accept and what
to reject for which Fortran standard.


FAOD, my TS29113 testsuite just posted includes only tests for bind(c) 
character features that were added by TS29113, which are all failing 
(without this new patch from Tobias) due to the incorrect compile-time 
error.  Those places in the tests are all currently marked with dg-bogus 
referencing PR 92482.


I've been thinking we need to add a further large batch of tests to 
systematically exercise all the other character-related bind(c) 
functionality; all the combinations of character type, character length, 
dummy argument flavor, call direction, and in/out add up to perhaps 600 
more things that need to be tested.  :-(  Of course a single testcase 
can test multiple things so it wouldn't require 600 actual new 
testcases.  Anyway, I thought there was little point in starting work on 
that until the compiler was actually accepting the more complicated 
dummy argument forms.


I'm a bit backlogged on other things right now but I'll give this patch 
a spin when I have time and if it's approved I'll adjust the TS29113 
testsuite dg-bogus patterns or whatever accordingly, before reposting a 
cleaned-up version.


-Sandra


[PATCH] soft-fp: Update soft-fp from glibc

2021-07-01 Thread H.J. Lu via Gcc-patches
This patch is updating soft-fp from glibc:

1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.

These are needed by x86 _Float16 support.
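
For orientation only, the value mapping that __extendhfxf2 has to
implement can be sketched in plain C.  This is not the soft-fp code
(which uses the FP_EXTEND macros and raises exceptions); the helper
names below are hypothetical, and the sketch only decodes a binary16 bit
pattern into a long double.

```c
#include <math.h>
#include <stdint.h>

/* Multiply X by 2**E exactly, using repeated doubling or halving.  */
static long double
scale_pow2_sketch (long double x, int e)
{
  for (; e > 0; e--)
    x *= 2.0L;
  for (; e < 0; e++)
    x /= 2.0L;
  return x;
}

/* Decode an IEEE binary16 bit pattern: 1 sign bit, 5 exponent bits
   (bias 15), 10 fraction bits.  */
long double
half_bits_to_long_double (uint16_t h)
{
  int sign = (h >> 15) & 1;
  int exp = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;
  long double mag;

  if (exp == 0)
    /* Zero or subnormal: frac * 2**-24.  */
    mag = scale_pow2_sketch ((long double) frac, -24);
  else if (exp == 0x1f)
    /* Infinity or NaN.  */
    mag = frac ? (long double) NAN : (long double) INFINITY;
  else
    /* Normal: (1024 + frac) * 2**(exp - 15 - 10).  */
    mag = scale_pow2_sketch ((long double) (frac | 0x400), exp - 25);

  return sign ? -mag : mag;
}
```

Every binary16 value is exactly representable in the wider format, so
the extension never rounds; only the truncating direction has to.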
---
 libgcc/soft-fp/extendhfxf2.c | 53 
 libgcc/soft-fp/truncxfhf2.c  | 52 +++
 2 files changed, 105 insertions(+)
 create mode 100644 libgcc/soft-fp/extendhfxf2.c
 create mode 100644 libgcc/soft-fp/truncxfhf2.c

diff --git a/libgcc/soft-fp/extendhfxf2.c b/libgcc/soft-fp/extendhfxf2.c
new file mode 100644
index 000..1cb5fef9477
--- /dev/null
+++ b/libgcc/soft-fp/extendhfxf2.c
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE extended.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+XFtype
+__extendhfxf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_E (R);
+  XFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_EXTEND (E, H, 4, 1, R, A);
+#else
+  FP_EXTEND (E, H, 2, 1, R, A);
+#endif
+  FP_PACK_RAW_E (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/libgcc/soft-fp/truncxfhf2.c b/libgcc/soft-fp/truncxfhf2.c
new file mode 100644
index 000..688ad24523d
--- /dev/null
+++ b/libgcc/soft-fp/truncxfhf2.c
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into IEEE half.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "extended.h"
+
+HFtype
+__truncxfhf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_H (R);
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (H, E, 1, 4, R, A);
+#else
+  FP_TRUNC (H, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
-- 
2.31.1



Re: [PATCH] rs6000: Add MMA __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins

2021-07-01 Thread Peter Bergner via Gcc-patches
On 7/1/21 1:01 PM, Segher Boessenkool wrote:
> On Wed, Jun 30, 2021 at 04:56:04PM -0500, Peter Bergner wrote:
>> LLVM added the __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins.
>> The following patch adds support for them to GCC so that we stay in sync
>> with LLVM.
> 
> This should be documented somewhere.

Hmmm, I had that change in one of my trees, but must have dropped
it when moving to another system/tree and didn't notice!  Doh!
Let me add that back and make the other changes you suggested
and then I'll repost and test here.  Thanks for catching that!

Peter




Re: [PATCH] rs6000: Add MMA __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins

2021-07-01 Thread Segher Boessenkool
Hi!

On Wed, Jun 30, 2021 at 04:56:04PM -0500, Peter Bergner wrote:
> LLVM added the __builtin_vsx_lxvp and __builtin_vsx_stxvp built-ins.
> The following patch adds support for them to GCC so that we stay in sync
> with LLVM.

This should be documented somewhere.

> +  else if (fncode == VSX_BUILTIN_LXVP)
> +{
> +  push_gimplify_context (true);
> +  tree offset = gimple_call_arg (stmt, 0);
> +  tree ptr = gimple_call_arg (stmt, 1);
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR, TREE_TYPE 
> (ptr), ptr, offset));

Line (much) too long.

> +  else if (fncode == VSX_BUILTIN_STXVP)
> +{
> +  push_gimplify_context (true);
> +  tree src = gimple_call_arg (stmt, 0);
> +  tree offset = gimple_call_arg (stmt, 1);
> +  tree ptr = gimple_call_arg (stmt, 2);
> +  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR, TREE_TYPE 
> (ptr), ptr, offset));

Same.

Can BU_MMA_LD be used only for lxvp?  Name it BU_MMA_PAIR_LD then?  Same
for _ST ofc.

The patch is okay for trunk.  For the backports it is okay if Bill has
looked at this patch as well.  Thanks!


Segher


[committed] libstdc++: Improvements to Doxygen markup

2021-07-01 Thread Jonathan Wakely via Gcc-patches
This attempts to improve the doxygen output to work around what seems to
be some bugs in doxygen (issues 8635 and 8638).

The @addtogroup command doesn't work for entities inside a nested
namespace (see 8635) so we need to close and reopen groups on entering
and leaving nested namespaces. This fixes the problem that
chrono::duration and chrono::time_point were not documented in the
"Time" documentation group. I am unable to make the path classes appear
as part of their relevant groups (File System and Filesystem TS), nor
the contents of  or . I have made some minor
improvements to the docs for those types, including starting to address
PR 97001 by adding @since to the doxygen comments.

This change also excludes the  header from
Doxygen processing, so we don't get an unwanted "Networking-ts" group
in the documentation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/doxygen/doxygroups.cc: Fix docs for std::literals.
* doc/doxygen/user.cfg.in: Exclude the Networking TS header.
Add some more predefined macros.
* include/bits/fs_fwd.h: Move @addtogroup commands inside
namespaces. Add better documentation.
* include/bits/fs_path.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/bits/fs_path.h: Likewise.
* include/ext/throw_allocator.h: Fix typo and improve docs.
* include/std/chrono: Move @addtogroup commands.
* include/std/system_error: Move @addtogroup commands.
* libsupc++/exception: Improve documentation.
* libsupc++/exception.h: Add @since documentation.

Tested powerpc64le-linux. Committed to trunk.

commit f2ce64b53fa76a4c192fe51b2f6c5a863a3b1241
Author: Jonathan Wakely 
Date:   Thu Jul 1 00:30:54 2021

libstdc++: Improvements to Doxygen markup

This attempts to improve the doxygen output to work around what seems to
be some bugs in doxygen (issues 8635 and 8638).

The @addtogroup command doesn't work for entities inside a nested
namespace (see 8635) so we need to close and reopen groups on entering
and leaving nested namespaces. This fixes the problem that
chrono::duration and chrono::time_point were not documented in the
"Time" documentation group. I am unable to make the path classes appear
as part of their relevant groups (File System and Filesystem TS), nor
the contents of  or . I have made some minor
improvements to the docs for those types, including starting to address
PR 97001 by adding @since to the doxygen comments.

This change also excludes the  header from
Doxygen processing, so we don't get an unwanted "Networking-ts" group
in the documentation.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/doxygen/doxygroups.cc: Fix docs for std::literals.
* doc/doxygen/user.cfg.in: Exclude the Networking TS header.
Add some more predefined macros.
* include/bits/fs_fwd.h: Move @addtogroup commands inside
namespaces. Add better documentation.
* include/bits/fs_path.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/bits/fs_path.h: Likewise.
* include/ext/throw_allocator.h: Fix typo and improve docs.
* include/std/chrono: Move @addtogroup commands.
* include/std/system_error: Move @addtogroup commands.
* libsupc++/exception: Improve documentation.
* libsupc++/exception.h: Add @since documentation.

diff --git a/libstdc++-v3/doc/doxygen/doxygroups.cc 
b/libstdc++-v3/doc/doxygen/doxygroups.cc
index 506cb145b35..42f7d447520 100644
--- a/libstdc++-v3/doc/doxygen/doxygroups.cc
+++ b/libstdc++-v3/doc/doxygen/doxygroups.cc
@@ -19,7 +19,7 @@
 /** @namespace std
  *  @brief ISO C++ entities toplevel namespace is std.
 */
-/** @namespace std
+/** @namespace std::literals
  *  @brief ISO C++ inline namespace for literal suffixes.
 */
 /** @namespace std::__detail
@@ -43,7 +43,7 @@
  *  @ingroup extensions
 */
 /** @namespace __gnu_cxx::__detail
- *  @brief Implementation details not part of the namespace __gnu_cxx 
+ *  @brief Implementation details not part of the namespace __gnu_cxx
  *  interface.
 */
 /** @namespace __gnu_internal
diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 3510c736789..349b9ec9c36 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -1092,7 +1092,8 @@ RECURSIVE  = NO
 # Note that relative paths are relative to the directory from which doxygen is
 # run.
 
-EXCLUDE= Makefile
+EXCLUDE= Makefile \
+ include/experimental/bits/net.h
 
 # The EXCLUDE_SYMLINKS tag can be used to select whether or not files or
 # directories that are symbolic links (a Unix file system feature) are excluded
@@ 

Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Richard Biener via Gcc-patches
On July 1, 2021 4:40:11 PM GMT+02:00, Michael Matz  wrote:
>Hello,
>
>I haven't followed this thread too closely, in particular I haven't seen
>why the current form of the .DEFERRED_INIT call was chosen or suggested,
>but it triggered my "well, that's obviously wrong" gut feeling; so sorry
>for stating something which might be obvious to thread participants here.
>Anyway:
>
>On Thu, 1 Jul 2021, Richard Sandiford via Gcc-patches wrote:
>
>> >> It's not a bug in SSA, it's at most a missed optimization there.
>> >
>> > I still think that SSA cannot handle block-scoped variable correctly
>> > in this case, it should not move the variable outside of the block
>> > scope. -:)
>> >
>> >>  But with -ftrivial-auto-var-init it becomes a correctness issue.
>> 
>> I might have lost track of what “it” is here.  Do you mean the
>> propagation, or the fact that we have a PHI in the first place?
>> 
>> For:
>> 
>> unsigned int
>> f (int n)
>> {
>>   unsigned int res = 0;
>>   for (int i = 0; i < n; i += 1)
>> {
>>   unsigned int foo;
>>   foo += 1;
>
>Note that here foo is used uninitialized.  That is the thing from which
>everything else follows.  Garbage in, garbage out.  It doesn't make much
>sense to argue that the generated PHI node on this loop (generated because
>of the exposed-upward use of foo) is wrong, or right, or should better be
>something else.  The input program was broken, so anything makes sense.
>
>That is the same with Qing's shorter testcase:
>
>  6   for (i = 0; i < a; i++) {
>  7 if (__extension__({int size2;
>  8 size2 = ART_INIT (size2, 2);
>
>Uninitialized use of size2 right there.  And it's the same for the use of
>.DEFERRED_INIT as the patch does:
>
>{
>  int size2;
>  size2 = .DEFERRED_INIT (size2, 2);
>  size2 = 4;
>  _1 = size2 > 5;
>  D.2240 = (int) _1;
>}
>
>Argument of the pseudo-function-call to .DEFERRED_INIT uninitialized ->
>boom.
>
>You can't solve any of this by fiddling with how SSA rewriting behaves at
>a large scale.  You need to change this and only this specific
>interpretation of a use.  Or preferably not generate it to start with.
>
>> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
>> so there should be no loop-carried dependency here.
>> 
>> So yeah, if “it” meant that the fact that variables live too long,
>> then I agree that becomes a correctness issue with this feature,
>> rather than just an odd quirk.  Could we solve it by inserting
>> a clobber at the end of the variable's scope, or something like that?
>
>It would possibly make GCC not generate a PHI on this broken input, yes.
>But why would that be an improvement?
>
>> > Agreed, I made such a change yesterday, and this works around the
>> > current issue.
>> >
>> > temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>> >
>> > To:
>> >
>> > temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
>> 
>> I think we're going round in circles here though.  The point of having
>> the undefined input was so that we would continue to warn about undefined
>> inputs.  The above issue doesn't seem like a good justification for
>> dropping that.
>
>If you want to rely on the undefined use for error reporting then you must
>only generate an undefined use when there was one before, you can't just
>insert new undefined uses.  I don't see how it could be otherwise, as soon
>as you introduce new undefined uses you can and will run into GCC making
>use of the undefinedness, not just this particular issue with lifetime and
>PHI nodes which might be "solved" by clobbers.
>
>I think it's a chicken-and-egg problem: you can't add undefined uses, for
>which you need to know if there was one, but the facility is supposed to
>help detecting if there is one to start with.

But because of the place we insert .DEFERRED_INIT the use will always be
uninitialized (or we generated wrong code), so the use is somewhat pointless
and thus IMHO getting rid of it is the correct thing to do.
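
To make the intended semantics concrete, here is a well-defined variant
of the quoted testcase, written as a sketch.  It assumes zero
initialization and assumes that the elided rest of the loop body
accumulates foo into res; the function name is made up.  The explicit
"= 0" stands in for the .DEFERRED_INIT call, which needs no use of the
old value of foo.

```c
/* Each entry into the block gives foo a fresh, initialized value, so
   nothing is carried from one iteration to the next and no PHI for foo
   is needed at the loop header.  */
unsigned int
f_zero_init_sketch (int n)
{
  unsigned int res = 0;
  for (int i = 0; i < n; i += 1)
    {
      unsigned int foo = 0;   /* models foo = .DEFERRED_INIT (size, zero) */
      foo += 1;
      res += foo;             /* assumed continuation of the testcase */
    }
  return res;
}
```

With a loop-carried foo the result would instead grow quadratically in
n, which is the miscompilation being discussed.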

Richard. 

>
>Ciao,
>Michael.



[PATCH] BTF: Support for BTF_KIND_FLOAT

2021-07-01 Thread David Faust via Gcc-patches
Add BTF_KIND_FLOAT, a new BTF type kind which has recently stabilized in
the linux kernel [1]. This kind is used for encoding floating point
types, and is of particular use when generating BTF for some s390
arch-specific kernel headers.
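
As a quick illustration of why the kind mask has to change, the info
word encoding can be sketched as below.  The SKETCH_ macros are
hypothetical names; the values (kind 16 for FLOAT, a 5-bit kind field
starting at bit 24, vlen in the low 16 bits) are assumed to mirror the
kernel's UAPI header rather than copied from this patch.

```c
#include <stdint.h>

#define SKETCH_BTF_KIND_FLOAT 16

/* Old extraction: 4-bit kind field (bits 24-27).  */
#define SKETCH_BTF_INFO_KIND_OLD(info) (((info) >> 24) & 0x0f)
/* New extraction: 5-bit kind field (bits 24-28).  */
#define SKETCH_BTF_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define SKETCH_BTF_INFO_VLEN(info) ((info) & 0xffff)

/* Build an info word from a kind and a vlen, leaving kind_flag clear.  */
uint32_t
sketch_btf_info (uint32_t kind, uint32_t vlen)
{
  return ((kind & 0x1f) << 24) | (vlen & 0xffff);
}
```

Kind 16 does not fit in four bits, so the old mask would read a FLOAT
record back as kind 0.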

Also update some BTF tests which previously used floating point types to
check correct behavior for types with no BTF representation.

[1]: 
https://github.com/torvalds/linux/commit/b1828f0b04828aa8cccadf00a702f459caefeed9

bootstrapped and tested on x86_64-linux-gnu, ok to install?

include/ChangeLog:

* btf.h (struct btf_type): Update bit usage comment.
(BTF_INFO_KIND): Update bit mask.
(BTF_KIND_FLOAT): New define.
(BTF_KIND_MAX): Update.

gcc/ChangeLog:

* btfout.c (get_btf_kind): Support BTF_KIND_FLOAT.
(btf_asm_type): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/btf/btf-float-1.c: New test.
* gcc.dg/debug/btf/btf-function-3.c: Use different unrepresentable type.
* gcc.dg/debug/btf/btf-struct-2.c: Likewise.
* gcc.dg/debug/btf/btf-variables-2.c: Likewise.
---
 gcc/btfout.c  |  2 ++
 gcc/testsuite/gcc.dg/debug/btf/btf-float-1.c  | 20 +++
 .../gcc.dg/debug/btf/btf-function-3.c |  2 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c |  2 +-
 .../gcc.dg/debug/btf/btf-variables-2.c|  2 +-
 include/btf.h |  9 +
 6 files changed, 30 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-float-1.c

diff --git a/gcc/btfout.c b/gcc/btfout.c
index e58c969825a..8cdd9905fb6 100644
--- a/gcc/btfout.c
+++ b/gcc/btfout.c
@@ -124,6 +124,7 @@ get_btf_kind (uint32_t ctf_kind)
   switch (ctf_kind)
 {
 case CTF_K_INTEGER:  return BTF_KIND_INT;
+case CTF_K_FLOAT:   return BTF_KIND_FLOAT;
 case CTF_K_POINTER:  return BTF_KIND_PTR;
 case CTF_K_ARRAY:return BTF_KIND_ARRAY;
 case CTF_K_FUNCTION: return BTF_KIND_FUNC_PROTO;
@@ -627,6 +628,7 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
   switch (btf_kind)
 {
 case BTF_KIND_INT:
+case BTF_KIND_FLOAT:
 case BTF_KIND_STRUCT:
 case BTF_KIND_UNION:
 case BTF_KIND_ENUM:
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-float-1.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-float-1.c
new file mode 100644
index 000..6876df04158
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-float-1.c
@@ -0,0 +1,20 @@
+/* Tests for BTF floating point type kinds. We expect a single record for each
+   of the base types: float, double and long double.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x1000\[\t 
\]+\[^\n\]*btt_info" 3 } } */
+
+/* { dg-final { scan-assembler-times "ascii \"float.0\"\[\t 
\]+\[^\n\]*btf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"double.0\"\[\t 
\]+\[^\n\]*btf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"long double.0\"\[\t 
\]+\[^\n\]*btf_string" 1 } } */
+
+float a;
+float b = 1.5f;
+
+double c;
+double d = -99.9;
+
+long double e;
+long double f = 1000.01;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-function-3.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-function-3.c
index 35f96a2152c..c83b823d22f 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-function-3.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-function-3.c
@@ -16,7 +16,7 @@
 /* Exactly one function parameter should have type_id=0.  */
 /* { dg-final { scan-assembler-times "\[\t \]0\[\t \]+\[^\n\]*farg_type" 1 } } 
*/
 
-int foo (int a, float f, long b)
+int foo (int a, float __attribute__((__vector_size__(16))) f, long b)
 {
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c
index 24514fcb31e..c3aff09ed9a 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c
@@ -14,6 +14,6 @@
 struct with_float
 {
   int a;
-  float f;
+  float __attribute__((__vector_size__(16))) f;
   char c;
 } instance;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-variables-2.c 
b/gcc/testsuite/gcc.dg/debug/btf/btf-variables-2.c
index 0f9742e9ac5..db0bdd7be16 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-variables-2.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-variables-2.c
@@ -16,7 +16,7 @@
 /* { dg-final { scan-assembler-times "ascii \"myst.0\"\[\t 
\]+\[^\n\]*btf_string" 1 } } */
 
 int foo;
-float bar;
+float __attribute__((__vector_size__(16))) bar;
 int baz[10];
 
 struct st
diff --git a/include/btf.h b/include/btf.h
index 3b1a4c3093f..bf720ee49b4 100644
--- a/include/btf.h
+++ b/include/btf.h
@@ -62,8 +62,8 @@ struct btf_type
   uint32_t info;   /* Encoded kind, variant length, kind flag:
   - bits  0-15: vlen
   - bits 16-23: unused
-  - bits 24-27: kind
-  - bits 28-30: unused
+  

Re: [wwwdocs] gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)

2021-07-01 Thread Tobias Burnus

Now committed as 5c17042e880a5d1a3eb261f73e1b9da0c1aa2641
https://gcc.gnu.org/gcc-12/changes.html

Tobias

On 29.06.21 18:38, Julian Brown wrote:

On Tue, 29 Jun 2021 17:34:00 +0200
Tobias Burnus  wrote:


This documents AMD GCN's new much-more complete TI-mode
(__int128_t) support, that was as v2 just posted by Julian
and should get committed very soon.

Thank you!


gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index b854c4e6..599443e7 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -62,6 +62,14 @@ a work-in-progress.
OpenACC. It warns about potentially suboptimal choices related
to OpenACC parallelism.

+  The offload target code generation for OpenMP and OpenACC can
now
+  be better adjused using the new 
Typo, "adjused".

Julian

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


[Patch] Fortran: Fix bind(C) character length checks

2021-07-01 Thread Tobias Burnus

Hi all,

this patch came up when discussing Sandra's TS29113 patch internally.
There is presumably also some overlap with José's patches.

This patch tries to rectify the BIND(C) CHARACTER handling on the
diagnostic side, only. That is: what to accept and what
to reject for which Fortran standard.


The rules are:

* [F2003-F2018] Interoperable is character(len=1)
  → F2018, 18.3.1  Interoperability of intrinsic types
  (General, unchanged)

* Fortran 2008: In some cases, const-length chars are
  permitted as well:
  → F2018, 18.3.4  Interoperability of scalar variables
  → F2018, 18.3.5  Interoperability of array variables
  → F2018, 18.3.6  Interoperability of procedures and procedure interfaces
 [= F2008, 15.3.{4,5,6}]
For global vars with bind(C), 18.3.4 + 18.3.5 apply directly (TODO: Add
support, not in this patch)
For passed-by ref dummy arguments, 18.3.4 + 18.3.5 are referenced in
- F2008: R1229  proc-language-binding-spec is language-binding-spec
 C1255 (R1229) 
- F2018, F2018, C1554

While it is not very clearly spelt out, I regard 'char parm[4]'
interoperable with 'character(len=4) :: a', 'character(len=2) :: b(2)'
and 'character(len=1) :: c(4)' for both global variables and for
dummy arguments.

* Fortran 2018/TS29113:  Uses additionally CFI array descriptor
  - allocatable, pointer:  must be len=:
  - nonallocatable/nonpointer: len=* → implies array descriptor also
for assumed-size/explicit-size/scalar arguments.
  - All which are passed by an array descriptor already without further
restrictions: assumed-shape, assumed-rank, i.e. len= seems
to be also fine
→ 18.3.6 under item (5) bullet point 2 and 3 plus (6).
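
To illustrate the storage argument from the C side, here is a minimal
sketch (the type and function names are hypothetical, not from the patch).
Assuming contiguous character storage without padding, the three Fortran
declarations mentioned above all occupy the same four bytes as
'char parm[4]':

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical C-side views of the three Fortran declarations.
using len4_scalar = char[4];     // character(len=4) :: a
using len2_array2 = char[2][2];  // character(len=2) :: b(2)
using len1_array4 = char[4][1];  // character(len=1) :: c(4)

// All three views have the same size as 'char parm[4]'.
constexpr bool same_size()
{
  return sizeof(len4_scalar) == 4
         && sizeof(len2_array2) == 4
         && sizeof(len1_array4) == 4;
}

// Copy "abcd" into the 2x2 view and read it back through the flat view.
bool same_bytes()
{
  len2_array2 b;
  std::memcpy(b, "abcd", 4);
  len4_scalar a;
  std::memcpy(a, b, 4);
  // b[1][0] is the third byte, i.e. the first character of b(2).
  return std::memcmp(a, "abcd", 4) == 0 && b[1][0] == 'c';
}
```

This only shows that the byte layout matches; whether the standard
actually blesses each combination is the question discussed above.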


I hope I got the conditions right. I also fixed an issue with
character(len=5) :: str – the code in trans-expr.c did crash for
scalars  (decl.c did not check any constraints for arrays).
I believe the condition is wrong and for len= no descriptor
is used.

Any comments, remarks?
OK for mainline?

Tobias

PS: Still to do are global variables and the implementation of the sorries.
PPS: At other places like with VALUE or for function return values,
Fortran still requires len=1 with Bind(C).

Fortran: Fix bind(C) character length checks

gcc/fortran/ChangeLog:

	* decl.c (gfc_verify_c_interop_param): Update for F2008 + F2018
	changes; reject unsupported bits with 'Error: Sorry,'.
	* trans-expr.c (gfc_conv_procedure_call): Fix condition to
	for using CFI descriptor with characters.

gcc/testsuite/ChangeLog:

	* gfortran.dg/iso_c_binding_char_1.f90: Update dg-error.
	* gfortran.dg/pr32599.f03: Use -std=f2003 + update comment.
	* gfortran.dg/bind_c_char_10.f90: New test.
	* gfortran.dg/bind_c_char_6.f90: New test.
	* gfortran.dg/bind_c_char_7.f90: New test.
	* gfortran.dg/bind_c_char_8.f90: New test.
	* gfortran.dg/bind_c_char_9.f90: New test.

 gcc/fortran/decl.c | 107 -
 gcc/fortran/trans-expr.c   |  18 +-
 gcc/testsuite/gfortran.dg/bind_c_char_10.f90   | 480 +
 gcc/testsuite/gfortran.dg/bind_c_char_6.f90| 262 +++
 gcc/testsuite/gfortran.dg/bind_c_char_7.f90| 261 +++
 gcc/testsuite/gfortran.dg/bind_c_char_8.f90| 249 +++
 gcc/testsuite/gfortran.dg/bind_c_char_9.f90| 188 
 gcc/testsuite/gfortran.dg/iso_c_binding_char_1.f90 |   2 +-
 gcc/testsuite/gfortran.dg/pr32599.f03  |   8 +-
 9 files changed, 1551 insertions(+), 24 deletions(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 413c7a75e0c..4a9f74306ff 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1552,20 +1552,109 @@ gfc_verify_c_interop_param (gfc_symbol *sym)
 	}
 
   /* Character strings are only C interoperable if they have a
- length of 1.  */
-  if (sym->ts.type == BT_CHARACTER && !sym->attr.dimension)
+	 length of 1.  However, as argument they are either interoperable
+	 when passed as descriptor (which requires len=: or len=*) or
+	 when having a constant length or are always passed by
+	 descriptor.  */
+	  if (sym->ts.type == BT_CHARACTER)
 	{
 	  gfc_charlen *cl = sym->ts.u.cl;
-	  if (!cl || !cl->length || cl->length->expr_type != EXPR_CONSTANT
-  || mpz_cmp_si (cl->length->value.integer, 1) != 0)
+
+	  if (sym->attr.allocatable || sym->attr.pointer)
 		{
-		  gfc_error ("Character argument %qs at %L "
-			 "must be length 1 because "
-			 "procedure %qs is BIND(C)",
-			 sym->name, &sym->declared_at,
-			 sym->ns->proc_name->name);
+		  /* F2018, 18.3.6 (6).  */
+		  if (!sym->ts.deferred)
+		{
+		  gfc_error ("Allocatable and pointer character dummy "
+ "argument %qs at %L must have deferred length "
+ "as procedure %qs is BIND(C)", sym->name,
+ 

Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Eli Zaretskii via Gcc-patches
> Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> From: Martin Liška 
> Date: Thu, 1 Jul 2021 18:04:24 +0200
> 
> > Emacs doesn't hide the period.  But there shouldn't be a period to
> > begin with, since it's the middle of a sentence.  The correct way of
> > writing this in Texinfo is to have some punctuation: a comma or a
> > semi-colon, after the closing brace, like this:
> > 
> >This is the warning level of @ref{e,,-Wshift-overflow3}, and …
> 
> I don't see why we should put a comma after an option reference.

You explained it yourself later on:

> It's all related to Texinfo. Sphinx generates e.g.
> Enabled by @ref{7,,-Wall} and something else.
> 
> as documented here:
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040ref.html
> 
> Then it ends with the following info output:
> 
>   Enabled by *note -Wall: 7. and something else.
> 
> So the period is added by Texinfo. If I put comma after a reference, then
> the period is not added there.
  ^

So the purpose of having the comma there is to avoid having a period
in the middle of a sentence, which is added by makeinfo (because the
Info readers need that).  Having a comma there may seem a bit
redundant, but having a period will definitely look like a typo, if
not confuse the heck out of the reader, especially if you want to use
these inline cross-references so massively.

> > I don't think the GCC manuals should necessarily be bound by the
> > Sphinx standards.  Where those standards are sub-optimal, it is
> > perfectly okay for GCC (and other projects) to deviate.  GCC and other
> > GNU manuals used a certain style and convention for decades, so
> > there's more than enough experience and tradition to build on.
> 
> What type of conventions and style are going to change with conversion to
> Sphinx?

Anything that is different from the style conventions described in the
Texinfo manual.  We have many such conventions.

> Do you see any of them worse than what we have now?

I didn't bother reading the Sphinx guidelines yet, and don't know when
(and if) I will have time for that.  I do think the comparison should
be part of the job of moving to Sphinx.

> > I will no longer pursue this point, but let me just say that I
> > consider it a mistake to throw away all the experience collected using
> > Texinfo just because Sphinx folks have other traditions and
> > conventions.  It might be throwing the baby out with the bathwater.
> > 
> 
> Again, please show concrete examples. What you describe is very
> theoretical.

We've already seen one: the style of writing inline cross-references
with the equivalent of @ref.  We also saw another: the way you
converted the menus.  It is quite clear to me that there will be
others.  So I'm not sure why you need more evidence that this could be
a real issue.

But maybe all of this is intentional: maybe the GCC project
consciously and deliberately decided to move away from the GNU
documentation style and conventions, and replace them with whatever
the Sphinx and RST conventions are?  In that case, there's no reason
for me to even mention these aspects.


[x86] Change EH pointer encodings to PC relative on Windows

2021-07-01 Thread Eric Botcazou
Hi,

a big difference between ELF and PE-COFF is that, with the latter, you can 
build position-independent executables or DLLs without generating PIC; as a 
matter of fact, flag_pic has historically been forced to 0 for 32-bit:

/* Don't allow flag_pic to propagate since gas may produce invalid code
   otherwise.  */

#undef  SUBTARGET_OVERRIDE_OPTIONS
#define SUBTARGET_OVERRIDE_OPTIONS					\
do {									\
  flag_pic = TARGET_64BIT ? 1 : 0;					\
} while (0)

The reason is that the linker builds a .reloc section that collects all the 
absolute relocations in the generated binary, and the loader uses them to 
relocate it at load time if need be (e.g. if --dynamicbase is enabled).

Up to binutils 2.35, the GNU linker didn't build the .reloc section for 
executables and defaulted to --enable-auto-image-base for DLLs, which means 
that DLLs had an essentially unique load address and, therefore, need not be 
relocated by the loader in most cases.

With binutils 2.36 and later, the GNU linker builds a .reloc section for 
executables (thus making them PIE), --enable-auto-image-base is disabled and 
--dynamicbase is enabled by default, which means that essentially all the 
binaries are relocated at load time.

This badly breaks the 32-bit compiler configured to use DWARF-2 EH (the 64-bit 
compiler always uses the native SEH) because the loader corrupts the .eh_frame 
section when processing the relocations contained in the .reloc section.

The attached patch simply forces the PIC encodings for EH on PE-COFF targets.
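
For readers who want to see the effect on the chosen encoding, here is a
simplified, standalone sketch of the decision. The DW_EH_PE_* values are
the standard DWARF EH pointer-encoding constants; the struct and its
fields are hypothetical stand-ins for the target state the real hook
consults, and the code-model conditions are collapsed into two booleans.

```cpp
// Standard DWARF EH pointer-encoding constants.
constexpr int DW_EH_PE_absptr   = 0x00;
constexpr int DW_EH_PE_udata4   = 0x03;
constexpr int DW_EH_PE_sdata4   = 0x0b;
constexpr int DW_EH_PE_sdata8   = 0x0c;
constexpr int DW_EH_PE_pcrel    = 0x10;
constexpr int DW_EH_PE_indirect = 0x80;

// Hypothetical stand-in for the target state consulted by the real hook.
struct target_sketch
{
  bool flag_pic;      // -fpic/-fPIC in effect
  bool pecoff;        // PE-COFF object format
  bool fits_4_bytes;  // assumption: a 4-byte displacement is enough
  bool small_model;   // assumption: absolute addresses fit in 32 bits
};

int preferred_eh_format(const target_sketch &t, bool global)
{
  // With the patch, PE-COFF takes the PC-relative path even without PIC.
  if (t.flag_pic || t.pecoff)
    {
      int type = t.fits_4_bytes ? DW_EH_PE_sdata4 : DW_EH_PE_sdata8;
      return (global ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | type;
    }

  if (t.small_model)
    return DW_EH_PE_udata4;

  return DW_EH_PE_absptr;
}
```

In this model a 32-bit PE-COFF target without flag_pic gets
pcrel|sdata4 instead of an absolute encoding, so the .eh_frame entries
no longer depend on the load address.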

Tested on x86/Windows, OK for all active branches?


2021-07-01  Eric Botcazou  

* config/i386/i386.c (asm_preferred_eh_data_format): Always use the
PIC encodings for PE-COFF targets.

-- 
Eric Botcazou
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2fbaae7cd02..cff26909292 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21930,10 +21930,12 @@ ix86_stack_protect_fail (void)
After all, the relocation needed is the same as for the call insn.
Whether or not a particular assembler allows us to enter such, I
guess we'll have to see.  */
+
 int
 asm_preferred_eh_data_format (int code, int global)
 {
-  if (flag_pic)
+  /* PE-COFF is effectively always -fPIC because of the .reloc section.  */
+  if (flag_pic || TARGET_PECOFF)
 {
   int type = DW_EH_PE_sdata8;
   if (!TARGET_64BIT
@@ -21942,9 +21944,11 @@ asm_preferred_eh_data_format (int code, int global)
 	type = DW_EH_PE_sdata4;
   return (global ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | type;
 }
+
   if (ix86_cmodel == CM_SMALL
   || (ix86_cmodel == CM_MEDIUM && code))
 return DW_EH_PE_udata4;
+
   return DW_EH_PE_absptr;
 }
 


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Jul 1, 2021, at 11:23 AM, Richard Sandiford  
> wrote:
> 
> Michael Matz  writes:
>> Hello,
>> 
>> On Thu, 1 Jul 2021, Richard Sandiford wrote:
>> 
>>> Well, it does feel like this is pressing the reset button on a thread
>>> whose message count is already in the high double figures.  There's a
>>> risk that restarting the discussion now is going to repeat a lot of the
>>> same ground, probably at similar length.  (But I realise that the sheer
>>> length of the thread is probably also the main reason for not having
>>> followed it closely. :-))
>> 
>> Yeah, I thought of not sending the mail because of that.  But it itched 
>> too itchy ;-)
>> 
>>> (2) continue to treat uses of uninitialised automatic variables as
>>>(semantically) undefined behaviour
>>> 
>>> When this was discussed on the clang list, (2) was a key requirement
>>> and was instrumental in getting the feature accepted.
>>> ... 
>>> since the behaviour is at least deterministic.  But from a debugging QoI
>>> perspective, this should not happen.  We should generate the same code
>>> as for:
>>> 
>>>   int foo = ;
>>>   if (flag)
>>> foo = 1;
>>>   x = foo;
>>> 
>>> However, because of (2), we should still generate a 
>>> -Wmaybe-uninitialized warning for the “x = foo” assignment.
>>> 
>>> This is not a combination we support at the moment, so something needs 
>>> to give.  The question of course is what.
>> 
>> Nothing needs to give.  You add the initialization with , 
>> _without an uninitialized use_, i.e. just:
>> 
>>   foo = .deferred_init(pattern)
>> 
>> or the like.  Then you let the optimizers do their thing.  If 'foo' is 
>> provably initialized (or rather overwritten in this setting) on all paths 
>> from above init to each use, that init will be removed.  IOW: if the init 
>> remains then not all paths are provably initializing.  I.e. the warning 
>> code can be triggered based on the simple existence of the deferred_init 
>> initializer.
>> 
>> Basically the .deferred_init acts as a "here the scope starts" marker.  
>> That's indeed what you want, not the "here the scope ends" clobber.  If a 
>> use reaches the scope-start marker you have an (possibly) uninitialized 
>> use and warn.
>> 
>> (I'm obviously missing something because the above idea seems natural, so 
>> was probably already discussed and rejected, but why?)
> 
> The hope was that we wouldn't even need to treat DEFERRED_INIT specially
> when analysing uninitialised uses.  The argument would be a standard
> uninitialised use, just like any other.
> 
> But that was before this issue came up.  It also sounds like Qing is now
> handling DEFERRED_INIT specially anyway.  (I've lost track of why,
> but see above about reopening that discussion. :-))

The reason I handled DEFERRED_INIT specially in the uninitialized warning
analysis was that relying only on the uninitialized argument in the
.DEFERRED_INIT call was not enough for the analysis.

The special handling was introduced in the 1st version of the patch.  (I
don’t remember the exact details of why it would not work without
specially handling .DEFERRED_INIT.)

Thanks

Qing
> 
> So it sounds like that's indeed where we've ended up.  I don't think
> the original principle was inherently bad though.
> 
> Thanks,
> Richard



Re: [PATCH] PING implement pre-c++20 contracts

2021-07-01 Thread Andrew Sutton via Gcc-patches
> > I think this version addresses most of your concerns.
>
> Thanks, looking good.  I'll fuss with it a bit and commit it soon.

Awesome!

Andrew


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Richard Sandiford via Gcc-patches
Michael Matz  writes:
> Hello,
>
> On Thu, 1 Jul 2021, Richard Sandiford wrote:
>
>> Well, it does feel like this is pressing the reset button on a thread
>> whose message count is already in the high double figures.  There's a
>> risk that restarting the discussion now is going to repeat a lot of the
>> same ground, probably at similar length.  (But I realise that the sheer
>> length of the thread is probably also the main reason for not having
>> followed it closely. :-))
>
> Yeah, I thought of not sending the mail because of that.  But it itched 
> too itchy ;-)
>
>> (2) continue to treat uses of uninitialised automatic variables as
>> (semantically) undefined behaviour
>> 
>> When this was discussed on the clang list, (2) was a key requirement
>> and was instrumental in getting the feature accepted.
>> ... 
>> since the behaviour is at least deterministic.  But from a debugging QoI
>> perspective, this should not happen.  We should generate the same code
>> as for:
>> 
>>int foo = ;
>>if (flag)
>>  foo = 1;
>>x = foo;
>> 
>> However, because of (2), we should still generate a 
>> -Wmaybe-uninitialized warning for the “x = foo” assignment.
>> 
>> This is not a combination we support at the moment, so something needs 
>> to give.  The question of course is what.
>
> Nothing needs to give.  You add the initialization with , 
> _without an uninitialized use_, i.e. just:
>
>foo = .deferred_init(pattern)
>
> or the like.  Then you let the optimizers do their thing.  If 'foo' is 
> provably initialized (or rather overwritten in this setting) on all paths 
> from above init to each use, that init will be removed.  IOW: if the init 
> remains then not all paths are provably initializing.  I.e. the warning 
> code can be triggered based on the simple existence of the deferred_init 
> initializer.
>
> Basically the .deferred_init acts as a "here the scope starts" marker.  
> That's indeed what you want, not the "here the scope ends" clobber.  If a 
> use reaches the scope-start marker you have an (possibly) uninitialized 
> use and warn.
>
> (I'm obviously missing something because the above idea seems natural, so 
> was probably already discussed and rejected, but why?)

The hope was that we wouldn't even need to treat DEFERRED_INIT specially
when analysing uninitialised uses.  The argument would be a standard
uninitialised use, just like any other.

But that was before this issue came up.  It also sounds like Qing is now
handling DEFERRED_INIT specially anyway.  (I've lost track of why,
but see above about reopening that discussion. :-))

So it sounds like that's indeed where we've ended up.  I don't think
the original principle was inherently bad though.

Thanks,
Richard


Fix PR ada/101094

2021-07-01 Thread Eric Botcazou
This is a minor regression present on mainline and 11 branch, whereby the 
value of the Enum_Rep attribute is always unsigned.
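
As a rough analogy in C++ (not Ada, and with hypothetical names), this
is what goes wrong when a negative representation value is read through
an unsigned type of the same size instead of a type with the signedness
of the enumeration:

```cpp
#include <cstdint>

// A hypothetical enumeration with a negative representation value,
// stored in 8 bits.
enum class level : std::int8_t { low = -1, mid = 0, high = 1 };

// Read the representation through an unsigned type of the same size
// (analogous to the regression).
long rep_as_unsigned(level v)
{
  return static_cast<std::uint8_t>(static_cast<std::int8_t>(v));
}

// Read it through a type with the same signedness as the enumeration
// (analogous to the fix).
long rep_as_signed(level v)
{
  return static_cast<std::int8_t>(v);
}
```

Here rep_as_unsigned(level::low) yields 255 whereas
rep_as_signed(level::low) yields -1.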

Tested on x86-64/Linux, applied on the mainline and 11 branch.


2021-07-01  Eric Botcazou  

PR ada/101094
* exp_attr.adb (Get_Integer_Type): Return an integer type with the
same signedness as the input type.

-- 
Eric Botcazou
diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 400398dcbc5..9dd3d9d9726 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -1851,14 +1851,13 @@ package body Exp_Attr is
   --
 
   function Get_Integer_Type (Typ : Entity_Id) return Entity_Id is
- Siz : constant Uint := Esize (Base_Type (Typ));
+ Siz : constant Uint := Esize (Base_Type (Typ));
 
   begin
  --  We need to accommodate invalid values of the base type since we
- --  accept them for Enum_Rep and Pos, so we reason on the Esize. And
- --  we use an unsigned type since the enumeration type is unsigned.
+ --  accept them for Enum_Rep and Pos, so we reason on the Esize.
 
- return Small_Integer_Type_For (Siz, Uns => True);
+ return Small_Integer_Type_For (Siz, Uns => Is_Unsigned_Type (Typ));
   end Get_Integer_Type;
 
   -


[c-family] Fix duplicate name issues in output of -fdump-ada-spec #2

2021-07-01 Thread Eric Botcazou
This extends the type name conflict detection mechanism to variables.

Tested on x86-64/Linux, applied on the mainline.
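
As an aside on the dump_ada_array_domains entry in the ChangeLog below,
here is a simplified model of the separator logic (all names are
hypothetical and the real function prints through a pretty_printer).
The point it illustrates, as far as I can tell from the hunk, is that
the "first" flag has to be cleared on every path that prints a
dimension, including the one that prints "size_t":

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model: a dimension either has a known
// domain (printed as "0 .. N") or none (printed as "size_t").
struct dim_sketch
{
  bool has_domain;
  int max;
};

std::string format_domains(const std::vector<dim_sketch> &dims)
{
  std::string out = "(";
  bool first = true;

  for (const dim_sketch &d : dims)
    {
      if (!first)
        out += ", ";

      if (d.has_domain)
        out += "0 .. " + std::to_string(d.max);
      else
        out += "size_t";

      // Cleared on both paths; forgetting it on the "size_t" path
      // would drop the separator before the next dimension.
      first = false;
    }

  out += ")";
  return out;
}
```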


2021-07-01  Eric Botcazou  

c-family/
* c-ada-spec.c (check_name): Rename into...
(check_type_name_conflict): ...this.  Minor tweak.
(dump_ada_function_declaration): Adjust to above renaming.
(dump_ada_array_domains): Fix oversight.
(dump_ada_declaration): Call check_type_name_conflict for variables.

-- 
Eric Botcazou
diff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index a2669c604a9..ea52be6f5e6 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -1540,9 +1540,8 @@ dump_ada_import (pretty_printer *buffer, tree t, int spc)
otherwise in BUFFER.  */
 
 static void
-check_name (pretty_printer *buffer, tree t)
+check_type_name_conflict (pretty_printer *buffer, tree t)
 {
-  const char *s;
   tree tmp = TREE_TYPE (t);
 
   while (TREE_CODE (tmp) == POINTER_TYPE && !TYPE_NAME (tmp))
@@ -1550,6 +1549,8 @@ check_name (pretty_printer *buffer, tree t)
 
   if (TREE_CODE (tmp) != FUNCTION_TYPE)
 {
+  const char *s;
+
   if (TREE_CODE (tmp) == IDENTIFIER_NODE)
 	s = IDENTIFIER_POINTER (tmp);
   else if (!TYPE_NAME (tmp))
@@ -1641,7 +1642,7 @@ dump_ada_function_declaration (pretty_printer *buffer, tree func,
 	{
 	  if (DECL_NAME (arg))
 	{
-	  check_name (buffer, arg);
+	  check_type_name_conflict (buffer, arg);
 	  pp_ada_tree_identifier (buffer, DECL_NAME (arg), NULL_TREE,
   false);
 	  pp_string (buffer, " : ");
@@ -1710,7 +1711,8 @@ dump_ada_function_declaration (pretty_printer *buffer, tree func,
 static void
 dump_ada_array_domains (pretty_printer *buffer, tree node, int spc)
 {
-  int first = 1;
+  bool first = true;
+
   pp_left_paren (buffer);
 
   for (; TREE_CODE (node) == ARRAY_TYPE; node = TREE_TYPE (node))
@@ -1724,7 +1726,7 @@ dump_ada_array_domains (pretty_printer *buffer, tree node, int spc)
 
 	  if (!first)
 	pp_string (buffer, ", ");
-	  first = 0;
+	  first = false;
 
 	  if (min)
 	dump_ada_node (buffer, min, NULL_TREE, spc, false, true);
@@ -1738,7 +1740,10 @@ dump_ada_array_domains (pretty_printer *buffer, tree node, int spc)
 	pp_string (buffer, "0");
 	}
   else
-	pp_string (buffer, "size_t");
+	{
+	  pp_string (buffer, "size_t");
+	  first = false;
+	}
 }
   pp_right_paren (buffer);
 }
@@ -3152,8 +3157,9 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
   if (need_indent)
 	INDENT (spc);
 
-  if (TREE_CODE (t) == FIELD_DECL && DECL_NAME (t))
-	check_name (buffer, t);
+  if ((TREE_CODE (t) == FIELD_DECL || TREE_CODE (t) == VAR_DECL)
+	  && DECL_NAME (t))
+	check_type_name_conflict (buffer, t);
 
   /* Print variable/type's name.  */
   dump_ada_node (buffer, t, t, spc, false, true);


[c-family] Improve packed record layout support with -fdump-ada-spec

2021-07-01 Thread Eric Botcazou
We cannot fully support packed record layout in -fdump-ada-spec, as packing 
in C and Ada does not behave the same, so we issue a warning.  But some simple 
cases are OK and can actually be handled without much work.
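
For reference, a small sketch of what packing does on the C side, using
the same shape as the testcase (the struct names are hypothetical, and
the packed attribute is the GCC/Clang spelling):

```cpp
#include <cstddef>

// Natural layout: 't' is placed at the next offset aligned for int.
struct natural_s
{
  char c;
  int  t;
};

// Packed layout: no padding between the members.
struct __attribute__((packed)) packed_s
{
  char c;
  int  t;
};

constexpr std::size_t natural_offset = offsetof(natural_s, t);
constexpr std::size_t packed_offset  = offsetof(packed_s, t);
```

In the packed variant 't' sits at offset 1 and is therefore generally
misaligned, which is presumably related to why the "aliased" keyword is
dropped for components of such records.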

Tested on x86-64/Linux, applied on the mainline.


2021-07-01  Eric Botcazou  

c-family/
* c-ada-spec.c (packed_layout): New global variable.
(dump_ada_declaration): Set it upon seeing a packed record type.
Do not put the "aliased" keyword if it is set.
(dump_ada_structure): Add Pack aspect if it is set and clear it.


2021-07-01  Eric Botcazou  

* c-c++-common/dump-ada-spec-14.c: Adjust dg-warning directive.

-- 
Eric Botcazou
diff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index ea52be6f5e6..827bcc76830 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -2038,6 +2038,7 @@ is_float128 (tree node)
 }
 
 static bool bitfield_used = false;
+static bool packed_layout = false;
 
 /* Recursively dump in BUFFER Ada declarations corresponding to NODE of type
TYPE.  SPC is the indentation level.  LIMITED_ACCESS indicates whether NODE
@@ -2851,14 +2852,14 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
 		return 1;
 	  }
 
-	/* ??? Packed record layout is not supported.  */
+	/* Packed record layout is not fully supported.  */
 	if (TYPE_PACKED (TREE_TYPE (t)))
 	  {
-		warning_at (DECL_SOURCE_LOCATION (t), 0,
-			"unsupported record layout");
+		warning_at (DECL_SOURCE_LOCATION (t), 0, "packed layout");
 		pp_string (buffer, "pragma Compile_Time_Warning (True, ");
-		pp_string (buffer, "\"probably incorrect record layout\");");
+		pp_string (buffer, "\"packed layout may be incorrect\");");
 		newline_and_indent (buffer, spc);
+		packed_layout = true;
 	  }
 
 	if (orig && TYPE_NAME (orig))
@@ -2951,7 +2952,8 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
 
 	  pp_string (buffer, " : ");
 
-	  if (TREE_CODE (TREE_TYPE (TREE_TYPE (t))) != POINTER_TYPE)
+	  if (TREE_CODE (TREE_TYPE (TREE_TYPE (t))) != POINTER_TYPE
+	  && !packed_layout)
 	pp_string (buffer, "aliased ");
 
 	  if (TYPE_NAME (TREE_TYPE (t)))
@@ -3185,7 +3187,8 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
 	  if (TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE
 	  && (TYPE_NAME (TREE_TYPE (t))
 		  || (TREE_CODE (TREE_TYPE (t)) != INTEGER_TYPE
-		  && TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE)))
+		  && TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE))
+	  && !packed_layout)
 	pp_string (buffer, "aliased ");
 
 	  if (TREE_READONLY (t) && TREE_CODE (t) != FIELD_DECL)
@@ -3352,7 +3355,7 @@ dump_ada_structure (pretty_printer *buffer, tree node, tree type, bool nested,
   pp_string (buffer, "Unchecked_Union => True");
 }
 
-  if (bitfield_used)
+  if (bitfield_used || packed_layout)
 {
   char buf[32];
   pp_comma (buffer);
@@ -3363,6 +3366,7 @@ dump_ada_structure (pretty_printer *buffer, tree node, tree type, bool nested,
   sprintf (buf, "Alignment => %d", TYPE_ALIGN (node) / BITS_PER_UNIT);
   pp_string (buffer, buf);
   bitfield_used = false;
+  packed_layout = false;
 }
 
   if (nested)
diff --git a/gcc/testsuite/c-c++-common/dump-ada-spec-14.c b/gcc/testsuite/c-c++-common/dump-ada-spec-14.c
index bfdec6181b5..291eea8ba70 100644
--- a/gcc/testsuite/c-c++-common/dump-ada-spec-14.c
+++ b/gcc/testsuite/c-c++-common/dump-ada-spec-14.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fdump-ada-spec" } */
 
-struct __attribute__((packed)) S /* { dg-warning "unsupported record layout" } */
+struct __attribute__((packed)) S /* { dg-warning "packed layout" } */
 {
   char c;
   int  t;


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Richard Sandiford wrote:

> Well, it does feel like this is pressing the reset button on a thread
> whose message count is already in the high double figures.  There's a
> risk that restarting the discussion now is going to repeat a lot of the
> same ground, probably at similar length.  (But I realise that the sheer
> length of the thread is probably also the main reason for not having
> followed it closely. :-))

Yeah, I thought of not sending the mail because of that.  But it itched 
too itchy ;-)

> (2) continue to treat uses of uninitialised automatic variables as
> (semantically) undefined behaviour
> 
> When this was discussed on the clang list, (2) was a key requirement
> and was instrumental in getting the feature accepted.
> ... 
> since the behaviour is at least deterministic.  But from a debugging QoI
> perspective, this should not happen.  We should generate the same code
> as for:
> 
>int foo = ;
>if (flag)
>  foo = 1;
>x = foo;
> 
> However, because of (2), we should still generate a 
> -Wmaybe-uninitialized warning for the “x = foo” assignment.
> 
> This is not a combination we support at the moment, so something needs 
> to give.  The question of course is what.

Nothing needs to give.  You add the initialization with , 
_without an uninitialized use_, i.e. just:

   foo = .deferred_init(pattern)

or the like.  Then you let the optimizers do their thing.  If 'foo' is 
provably initialized (or rather overwritten in this setting) on all paths 
from above init to each use, that init will be removed.  IOW: if the init 
remains then not all paths are provably initializing.  I.e. the warning 
code can be triggered based on the simple existence of the deferred_init 
initializer.

Basically the .deferred_init acts as a "here the scope starts" marker.  
That's indeed what you want, not the "here the scope ends" clobber.  If a 
use reaches the scope-start marker you have a (possibly) uninitialized 
use and warn.

(I'm obviously missing something because the above idea seems natural, so 
was probably already discussed and rejected, but why?)
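
To make the two cases concrete at source level, here is a sketch in
which an explicit store of a fill pattern stands in for the
.deferred_init call. The pattern value and the function names are
hypothetical; the real pattern would be chosen by the compiler.

```cpp
// Hypothetical fill pattern, for illustration only.
constexpr unsigned fill_pattern = 0xFEFEFEFEu;

// The pattern store is overwritten on the 'flag' path only, so it
// survives optimization and reaches the use: this is the case that
// should warn.
unsigned partially_initialized(bool flag)
{
  unsigned foo = fill_pattern;  // stands in for foo = .deferred_init (...)
  if (flag)
    foo = 1;
  return foo;
}

// Every path overwrites the pattern store, so it is dead and can be
// removed by DCE/DSE: nothing is left to warn about.
unsigned fully_initialized(bool flag)
{
  unsigned foo = fill_pattern;  // dead store
  if (flag)
    foo = 1;
  else
    foo = 2;
  return foo;
}
```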

> > Garbage in, garbage out.  It makes not too much sense to argue that
> > the generated PHI node on this loop (generated because of the
> > exposed-upward use of foo) is wrong, or right, or should better be
> > something else.  The input program was broken, so anything makes sense.
> 
> Yeah, without the new option it's GIGO.  But I think it's still possible
> to rank different forms of garbage output in terms of how self-consistent
> they are.
> 
> IMO it makes no sense to say that “foo” is upwards-exposed to somewhere
> where “foo” doesn't exist.  Using “foo_N(D)” doesn't have that problem,
> so I think it's possible to say that it's “better” garbage output. :-)
> 
> And the point is that with the new option, this is no longer garbage
> input as far as SSA is concerned: carrying the old value of “foo”
> across iterations would not be implementing the option correctly.
> How SSA handles this becomes a correctness issue.

I disagree.  The notion of "foo doesn't exist" simply doesn't exist (ugh!) 
in SSA; SSA is not the issue here.  The notion of ranges of where foo does 
or doesn't exist are scopes, that's orthogonal; so you want to encode that 
info somehow in a way compatible with SSA (!).  Doing that 
with explicit instructions is sensible (like marking scope endings with 
the clobber insn was sensible).  But those instructions simply need to not 
work against other invariants established and required by SSA.  Don't 
introduce new undefined uses and you're fine.

> > I think it's a chicken egg problem: you can't add undefined uses, for 
> > which you need to know if there was one, but the facility is supposed to 
> > help detecting if there is one to start with.
> 
> The idea is that .DEFERRED_INIT would naturally get optimised away by
> DCE/DSE if the variable is provably initialised before it's used.

Well, that's super.  So, why would you want or need the uninitialized use 
in the argument of .DEFERRED_INIT?


Ciao,
Michael.


Re: [PATCH 2/4] allow poisoning input_location in ranges it should not be used

2021-07-01 Thread David Malcolm via Gcc-patches
On Thu, 2021-07-01 at 11:40 -0400, David Malcolm wrote:
> On Thu, 2021-07-01 at 14:53 +0200, Richard Biener wrote:
> > On Thu, Jul 1, 2021 at 12:16 PM Trevor Saunders <
> > tbsau...@tbsaunde.org> wrote:
> > > 
> > > On Wed, Jun 30, 2021 at 11:13:23AM -0400, David Malcolm wrote:
> > > > On Wed, 2021-06-30 at 01:35 -0400, Trevor Saunders wrote:
> > > > > This makes it possible to assert if input_location is used
> > > > > during the lifetime of a scope.  This will allow us to find
> > > > > places that currently use it within a function and its
> > > > > callees, or prevent adding uses within the lifetime of a
> > > > > function after all existing uses are removed.
> > > > > 
> > > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > > > > 
> > > > > Trev
> > > > 
> > > > [...snip...]
> > > > 
> > > > > diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> > > > > index d58586f2526..3f68d1d79eb 100644
> > > > > --- a/gcc/diagnostic.c
> > > > > +++ b/gcc/diagnostic.c
> > > > > @@ -1835,7 +1835,7 @@ internal_error (const char *gmsgid,
> > > > > ...)
> > > > >    auto_diagnostic_group d;
> > > > >    va_list ap;
> > > > >    va_start (ap, gmsgid);
> > > > > -  rich_location richloc (line_table, input_location);
> > > > > +  rich_location richloc (line_table, UNKNOWN_LOCATION);
> > > > >    diagnostic_impl (, NULL, -1, gmsgid, , DK_ICE);
> > > > >    va_end (ap);
> > > > > 
> > > > 
> > > > I actually make use of this in the analyzer: the analyzer sets
> > > > input_location to stmt->location when analyzing a given stmt -
> > > > that way, if the analyzer ICEs, the ICE is shown at the code
> > > > construct that crashed the analyzer.
> > > > 
> > > > This behavior is useful to me, and would be lost with the
> > > > proposed patch.
> > > 
> > > I made this change because otherwise if the compiler ICE's while
> > > access to input_location is blocked we end up infinitely
> > > recursing complaining we can't access it while trying to say
> > > where the last error was.  I was nervous about the change before,
> > > and now I agree we need something else.
> > > 
> > > > Is there a better way of doing what I'm doing?
> > > > 
> > > > Is the long-term goal of the patch kit to reduce our reliance
> > > > on global variables?  Are we ultimately still going to need a
> > > > variable for "where to show the ICE if gcc crashes"?  (perhaps
> > > > stashing it in the diagnostic_context???)
> > > 
> > > Yes, the goal is ultimately removal of global state, however I'm
> > > not really sure what the better approach to your problem is,
> > > after all even moving it to the diagnostic context is sort of a
> > > global state, and sort of duplicates input_location.  That said
> > > it is somewhat more constrained, so if it removes usage of
> > > input_location perhaps it's worthwhile?
> > 
> > Reduction of global state is of course good - but in particular
> > input_location should be something only used during parsing
> > because it's a quite broken concept otherwise.  And fiddling with
> > it tends to be quite fragile... for example see
> > g:7d6f7e92c3b737736a2d8ff97a71af9f230c2f88 for the "fun" you can
> > have with "stale" values in input_location ...
> 
> Yeah.  Another example, from the analyzer, is
> g:2fbea4190e76a59c4880727cf84706fe083c00ae (PR 93349)
> 
> 
> > IMHO users should have their own "copy", for example the gimplifier
> > instead of mucking with and using input_location could use a
> > similar state in its gimplify_ctx.
> 
> Some ideas (not necessarily good ones):
> 
> (a) the diagnostic_context could have an ice_location field, and use
> that in internal_error (and maybe an RAII class for setting/clearing
> it).
> 
> (b) move input_location to diagnostic_context, and add:
>    #define input_location (global_dc->x_input_location)
> or:
>    #define input_location (global_dc->x_default_location)
> which add an indirection everywhere.  I don't love these ideas, in
> that we already overuse the preprocessor IMHO.
> 
> Trevor: BTW, if you're looking for global state to eliminate, it
> might be nice to move the globals in input.c for caching source
> lines (fcache_tab etc) into a new source_cache class, and have the
> diagnostic_context own it via a new "source_cache *" field.

Actually, I just found an old working copy on my drive that does most
of this; I'll try cleaning it up for gcc 12 and see if I can get it to
bootstrap.

Dave



Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Martin Liška

On 7/1/21 5:44 PM, Eli Zaretskii wrote:

>> Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
>> From: Martin Liška 
>> Date: Thu, 1 Jul 2021 16:14:30 +0200
>>
>>>> If I understand the notes correct, the '.' should be also hidden by e.g. Emacs.
>>>
>>> No, it doesn't.  The actual text in the Info file is:
>>>
>>>  *note -std: f.‘=iso9899:1990’
>>>
>>> and the period after " f" isn't hidden.  Where does that "f" come from
>>> and what is its purpose here? can it be removed (together with the
>>> period)?
>>
>> It's name of the anchor used for the @ref. The names are automatically generated
>> by makeinfo. So there's an example:
>>
>> This is the warning level of @ref{e,,-Wshift-overflow3} and …
>>
>> becomes in info:
>> This is the warning level of *note -Wshift-overflow3: e. and …
>>
>> I can ask the question at Sphinx, the Emacs script should hide that.


> Emacs doesn't hide the period.  But there shouldn't be a period to
> begin with, since it's the middle of a sentence.  The correct way of
> writing this in Texinfo is to have some punctuation: a comma or a
> semi-colon, after the closing brace, like this:
>
>    This is the warning level of @ref{e,,-Wshift-overflow3}, and …

I don't see why we should put a comma after an option reference.

> Does Sphinx somehow generate the period if there's no comma, or does
> it do it unconditionally, i.e. even if there is a punctuation after
> the closing brace?


It's all related to Texinfo. Sphinx generates e.g.
Enabled by @ref{7,,-Wall} and something else.

as documented here:
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040ref.html

Then it ends with the following info output:

 Enabled by *note -Wall: 7. and something else.

So the period is added by Texinfo. If I put a comma after a reference, then
the period is not added.




>>> This actually raises a more general issue with this Sphinx porting
>>> initiative: what will be the canonical style guide for maintaining the
>>> GCC manual in Sphinx, or more generally for writing GNU manuals in
>>> Sphinx?  For Texinfo, we have the Texinfo manual, which both documents
>>> the language and provides style guidelines for how to use Texinfo for
>>> producing good manuals.  Contributors to GNU manuals are using those
>>> guidelines for many years.  Is there, or will there be, an equivalent
>>> style guide for Sphinx?  If not, how will the future contributors to
>>> the GCC manuals know what are the writing and style conventions?
>>
>> No, I'm not planning any extra style guide. We will use the standard Sphinx RST
>> manual and one can find many tutorials about how to do it.
>
> Are you sure everything there is good for our manuals?  Did you
> compare the style conventions there with what we have in the Texinfo
> manual?


I'm not sure, but I kept the conversion from Texinfo to RST files as close
as possible.
I find the documentation markup natural, and it matches how we write
documentation in Texinfo.



> Moreover, this means people who contribute to other manuals will now
> have to learn two different styles, no?  And that's in addition to
> learning one more language.


Yes, people will have to learn RST. Similarly, with the git conversion, people
had to learn another version control system (we used svn before).




>>> That is why I recommended to discuss this on the Texinfo list: that's
>>> the place where such guidelines are discussed, and where we have
>>> experts who understand the effects and consequences of using this or
>>> that style.  The current style in GNU manuals is to have the menus as
>>> we see them in the existing GCC manuals: with a short description.
>>> Maybe there are good reasons to deviate from that style, but
>>> shouldn't this be at least presented and discussed, before the
>>> decision is made?  GCC developers are not the only ones who will be
>>> reading the future GCC manuals.
>>
>> That seems to me a subtle adjustment and it's standard way how people generate
>> TOC in Sphinx. See e.g. the Linux kernel documentation:
>> https://www.kernel.org/doc/html/latest/
>
> I don't think the GCC manuals should necessarily be bound by the
> Sphinx standards.  Where those standards are sub-optimal, it is
> perfectly okay for GCC (and other projects) to deviate.  GCC and other
> GNU manuals used a certain style and convention for decades, so
> there's more than enough experience and tradition to build on.


What kinds of conventions and style are going to change with the conversion to
Sphinx?
Do you see any of them as worse than what we have now?



> I will no longer pursue this point, but let me just say that I
> consider it a mistake to throw away all the experience collected using
> Texinfo just because Sphinx folks have other traditions and
> conventions.  It might be throwing the baby out with the bathwater.



Again, please show concrete examples. What you describe is very theoretical.

Thanks,
Martin


[pushed] Darwin: Define a suitable section name for CTF [PR101283]

2021-07-01 Thread Iain Sandoe
Hi,

This is a placeholder name ahead of any CTF implementation on
LLVM (which sets Darwin ABI).  Ideally, we would get agreement
on this choice (or any replacement) before GCC12 is shipped.

tested on Darwin18,
pushed to master, thanks,
Iain

PR debug/101283 - Several tests fail on Darwin with -gctf

PR debug/101283

gcc/ChangeLog:

* config/darwin.h (CTF_INFO_SECTION_NAME): New.
---
 gcc/config/darwin.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index d2b2c141c8e..b7c3af3b3fa 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -1115,4 +1115,8 @@ extern void darwin_driver_init (unsigned int *,struct cl_decoded_option **);
 # endif
 #endif
 
+/* CTF support.  */
+#undef CTF_INFO_SECTION_NAME
+#define CTF_INFO_SECTION_NAME "__CTF,__ctf,regular,debug"
+
 #endif /* CONFIG_DARWIN_H */
-- 
2.24.1




Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Eli Zaretskii via Gcc-patches
> Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> From: Martin Liška 
> Date: Thu, 1 Jul 2021 16:14:30 +0200
> 
> >> If I understand the notes correct, the '.' should be also hidden by e.g. 
> >> Emacs.
> > 
> > No, it doesn't.  The actual text in the Info file is:
> > 
> > *note -std: f.‘=iso9899:1990’
> > 
> > and the period after " f" isn't hidden.  Where does that "f" come from
> > and what is its purpose here? can it be removed (together with the
> > period)?
> 
> It's name of the anchor used for the @ref. The names are automatically
> generated by makeinfo. So there's an example:
> 
> This is the warning level of @ref{e,,-Wshift-overflow3} and …
> 
> becomes in info:
> This is the warning level of *note -Wshift-overflow3: e. and …
> 
> I can ask the question at Sphinx, the Emacs script should hide that.

Emacs doesn't hide the period.  But there shouldn't be a period to
begin with, since it's the middle of a sentence.  The correct way of
writing this in Texinfo is to have some punctuation: a comma or a
semi-colon, after the closing brace, like this:

  This is the warning level of @ref{e,,-Wshift-overflow3}, and …

Does Sphinx somehow generate the period if there's no comma, or does
it do it unconditionally, i.e. even if there is a punctuation after
the closing brace?

> > This actually raises a more general issue with this Sphinx porting
> > initiative: what will be the canonical style guide for maintaining the
> > GCC manual in Sphinx, or more generally for writing GNU manuals in
> > Sphinx?  For Texinfo, we have the Texinfo manual, which both documents
> > the language and provides style guidelines for how to use Texinfo for
> > producing good manuals.  Contributors to GNU manuals are using those
> > guidelines for many years.  Is there, or will there be, an equivalent
> > style guide for Sphinx?  If not, how will the future contributors to
> > the GCC manuals know what are the writing and style conventions?
> 
> No, I'm not planning any extra style guide. We will use the standard Sphinx RST
> manual and one can find many tutorials about how to do it.

Are you sure everything there is good for our manuals?  Did you
compare the style conventions there with what we have in the Texinfo
manual?

Moreover, this means people who contribute to other manuals will now
have to learn two different styles, no?  And that's in addition to
learning one more language.

> > That is why I recommended to discuss this on the Texinfo list: that's
> > the place where such guidelines are discussed, and where we have
> > experts who understand the effects and consequences of using this or
> > that style.  The current style in GNU manuals is to have the menus as
> > we see them in the existing GCC manuals: with a short description.
> > Maybe there are good reasons to deviate from that style, but
> > shouldn't this be at least presented and discussed, before the
> > decision is made?  GCC developers are not the only ones who will be
> > reading the future GCC manuals.
> > 
> 
> That seems to me a subtle adjustment and it's standard way how people generate
> TOC in Sphinx. See e.g. the Linux kernel documentation:
> https://www.kernel.org/doc/html/latest/

I don't think the GCC manuals should necessarily be bound by the
Sphinx standards.  Where those standards are sub-optimal, it is
perfectly okay for GCC (and other projects) to deviate.  GCC and other
GNU manuals used a certain style and convention for decades, so
there's more than enough experience and tradition to build on.

I will no longer pursue this point, but let me just say that I
consider it a mistake to throw away all the experience collected using
Texinfo just because Sphinx folks have other traditions and
conventions.  It might be throwing the baby out with the bathwater.


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Qing Zhao via Gcc-patches
Hi, Michael,

> On Jul 1, 2021, at 9:40 AM, Michael Matz  wrote:
> 
> Hello,
> 
> I haven't followed this thread too closely, in particular I haven't seen 
> why the current form of the .DEFERRED_INIT call was chosen or suggested, 
> but it triggered my "well, that's obviously wrong" gut feeling; so sorry 
> for stating something which might be obvious to thread participants here.  
> Anyway:
> 
> On Thu, 1 Jul 2021, Richard Sandiford via Gcc-patches wrote:
> 
>>>> It's not a bug in SSA, it's at most a missed optimization there.
>>> 
>>> I still think that SSA cannot handle block-scoped variable correctly 
>>> in this case, it should not move the variable out side of the block 
>>> scope. -:)
>>> 
>>>> But with -ftrivial-auto-var-init it becomes a correctness issue.
>> 
>> I might have lost track of what “it” is here.  Do you mean the
>> propagation, or the fact that we have a PHI in the first place?
>> 
>> For:
>> 
>> unsigned int
>> f (int n)
>> {
>>  unsigned int res = 0;
>>  for (int i = 0; i < n; i += 1)
>>{
>>  unsigned int foo;
>>  foo += 1;
> 
> Note that here foo is used uninitialized.  That is the thing from which 
> everything else follows.  Garbage in, garbage out.  It makes not too much 
> sense to argue that the generated PHI node on this loop (generated because 
> of the exposed-upward use of foo) is wrong, or right, or should better be 
> something else.  The input program was broken, so anything makes sense.

Yes, the PHI node was generated because of the upward-exposed use of “foo”.
This makes sense.

However, the PHI node should not be placed outside the live scope of “foo”;
that is wrong.

IMO, even though there is an uninitialized variable, that is no excuse for the
compiler not to maintain correct variable scopes.


> 
> That is the same with Qing's shorter testcase:
> 
>  6   for (i = 0; i < a; i++) {
>  7 if (__extension__({int size2;
>  8 size2 = ART_INIT (size2, 2);
> 
> Uninitialized use of size2 right there.  And it's the same for the use of 
> .DEFERRED_INIT as the patch does:
> 
> {
>  int size2;
>  size2 = .DEFERRED_INIT (size2, 2);
>  size2 = 4;
>  _1 = size2 > 5;
>  D.2240 = (int) _1;
> }
> 
> Argument of the pseudo-function-call to .DEFERRED_INIT uninitialized -> 
> boom.
> 
> You can't solve any of this by fiddling with how SSA rewriting behaves at 
> a large scale.  You need to change this and only this specific 
> interpretation of a use.  Or preferably not generate it to start with.
> 
>> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
>> so there should be no loop-carried dependency here.
>> 
>> So yeah, if “it” meant that the fact that variables live too long,
>> then I agree that becomes a correctness issue with this feature,
>> rather than just an odd quirk.  Could we solve it by inserting
>> a clobber at the end of the variable's scope, or something like that?
> 
> It would possibly make GCC not generate a PHI on this broken input, yes.  
> But why would that be an improvement?

By adding clobbers at the end of the variable’s scope, the compiler will
keep the correct variable scope during this stage, and prevent any further
incorrect transformations from happening.
IMO this should be a nice improvement.  -:)

Thanks.

Qing
> 
>>> Agreed, I have made such change yesterday, and this work around the 
>>> current issue.
>>> 
>>> temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>>> 
>>> To:
>>> 
>>> temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
>> 
>> I think we're going round in circles here though.  The point of having
>> the undefined input was so that we would continue to warn about undefined
>> inputs.  The above issue doesn't seem like a good justification for
>> dropping that.
> 
> If you want to rely on the undefined use for error reporting then you must 
> only generate an undefined use when there was one before, you can't just 
> insert new undefined uses.  I don't see how it could be otherwise, as soon 
> as you introduce new undefined uses you can and will run into GCC making 
> use of the undefinedness, not just this particular issue with lifetime and 
> PHI nodes which might be "solved" by clobbers.
> 
> I think it's a chicken egg problem: you can't add undefined uses, for 
> which you need to know if there was one, but the facility is supposed to 
> help detecting if there is one to start with.
> 
> 
> Ciao,
> Michael.



Re: [PATCH 2/4] allow poisoning input_location in ranges it should not be used

2021-07-01 Thread David Malcolm via Gcc-patches
On Thu, 2021-07-01 at 14:53 +0200, Richard Biener wrote:
> On Thu, Jul 1, 2021 at 12:16 PM Trevor Saunders <
> tbsau...@tbsaunde.org> wrote:
> > 
> > On Wed, Jun 30, 2021 at 11:13:23AM -0400, David Malcolm wrote:
> > > On Wed, 2021-06-30 at 01:35 -0400, Trevor Saunders wrote:
> > > > This makes it possible to assert if input_location is used
> > > > during the
> > > > lifetime
> > > > of a scope.  This will allow us to find places that currently
> > > > use it
> > > > within a
> > > > function and its callees, or prevent adding uses within the
> > > > lifetime
> > > > of a
> > > > function after all existing uses are removed.
> > > > 
> > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > > > 
> > > > Trev
> > > 
> > > [...snip...]
> > > 
> > > > diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> > > > index d58586f2526..3f68d1d79eb 100644
> > > > --- a/gcc/diagnostic.c
> > > > +++ b/gcc/diagnostic.c
> > > > @@ -1835,7 +1835,7 @@ internal_error (const char *gmsgid, ...)
> > > >    auto_diagnostic_group d;
> > > >    va_list ap;
> > > >    va_start (ap, gmsgid);
> > > > -  rich_location richloc (line_table, input_location);
> > > > +  rich_location richloc (line_table, UNKNOWN_LOCATION);
> > > >    diagnostic_impl (, NULL, -1, gmsgid, , DK_ICE);
> > > >    va_end (ap);
> > > > 
> > > 
> > > > I actually make use of this in the analyzer: the analyzer sets
> > > > input_location to stmt->location when analyzing a given stmt -
> > > > that way, if the analyzer ICEs, the ICE is shown at the code
> > > > construct that crashed the analyzer.
> > > > 
> > > > This behavior is useful to me, and would be lost with the
> > > > proposed patch.
> > 
> > > I made this change because otherwise, if the compiler ICEs while
> > > access to input_location is blocked, we end up infinitely
> > > recursing: complaining that we can't access it while trying to
> > > say where the last error was.  I was nervous about the change
> > > before, and now I agree we need something else.
> > 
> > > Is there a better way of doing what I'm doing?
> > > 
> > > > Is the long-term goal of the patch kit to reduce our reliance
> > > > on global variables?  Are we ultimately still going to need a
> > > > variable for "where to show the ICE if gcc crashes"?  (perhaps
> > > > stashing it in the diagnostic_context???)
> > 
> > > Yes, the goal is ultimately removal of global state; however, I'm
> > > not really sure what the better approach to your problem is.
> > > After all, even moving it to the diagnostic context is sort of
> > > global state, and sort of duplicates input_location.  That said,
> > > it is somewhat more constrained, so if it removes usage of
> > > input_location perhaps it's worthwhile?
> 
> Reduction of global state is of course good - but in particular
> input_location should be something only used during parsing,
> because it's a quite broken concept otherwise.  And fiddling with
> it tends to be quite fragile... for example see
> g:7d6f7e92c3b737736a2d8ff97a71af9f230c2f88 for the "fun" you can
> have with "stale" values in input_location ...

Yeah.  Another example, from the analyzer, is
g:2fbea4190e76a59c4880727cf84706fe083c00ae (PR 93349)


> IMHO users should have their own "copy", for example the gimplifier
> instead of mucking with and using input_location could use a
> similar state in its gimplify_ctx.

Some ideas (not necessarily good ones):

(a) the diagnostic_context could have an ice_location field, and use
that in internal_error (and maybe an RAII class for setting/clearing
it).

(b) move input_location to diagnostic_context, and add:
   #define input_location (global_dc->x_input_location)
or:
   #define input_location (global_dc->x_default_location)
which add an indirection everywhere.  I don't love these ideas, in that
we already overuse the preprocessor IMHO.
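
Idea (a) above could look roughly like the following.  This is only a
minimal sketch under stated assumptions: sketch_diagnostic_context,
auto_ice_location, location_for_ice and analyze_stmt_sketch are
hypothetical names invented for illustration, and location_t and
UNKNOWN_LOCATION are simplified stand-ins rather than the real GCC
definitions.

```cpp
// Hypothetical sketch of idea (a); none of these names exist in GCC.
#include <cassert>

typedef unsigned int location_t;               // stand-in for GCC's location_t
static const location_t UNKNOWN_LOCATION = 0;  // stand-in value

// Stand-in for diagnostic_context, reduced to the one proposed field.
struct sketch_diagnostic_context
{
  location_t ice_location = UNKNOWN_LOCATION;
};

// RAII helper: set ice_location for a scope, restore the old value on exit.
class auto_ice_location
{
public:
  auto_ice_location (sketch_diagnostic_context &dc, location_t loc)
  : m_dc (dc), m_saved (dc.ice_location)
  {
    m_dc.ice_location = loc;
  }
  ~auto_ice_location () { m_dc.ice_location = m_saved; }

  auto_ice_location (const auto_ice_location &) = delete;
  auto_ice_location &operator= (const auto_ice_location &) = delete;

private:
  sketch_diagnostic_context &m_dc;
  location_t m_saved;
};

// What an internal_error-like function would consult instead of
// input_location.
inline location_t
location_for_ice (const sketch_diagnostic_context &dc)
{
  return dc.ice_location;
}

// Usage: a pass wraps its per-statement work in a guard.
inline void
analyze_stmt_sketch (sketch_diagnostic_context &dc, location_t stmt_loc)
{
  auto_ice_location guard (dc, stmt_loc);
  // A crash in here would be reported at stmt_loc.
  assert (location_for_ice (dc) == stmt_loc);
}
```

Because the guard saves and restores the previous value, nested scopes
behave sensibly and no stale location is left behind once the statement
has been processed.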

Trevor: BTW, if you're looking for global state to eliminate, it might
be nice to move the globals in input.c for caching source lines
(fcache_tab etc) into a new source_cache class, and have the
diagnostic_context own it via a new "source_cache *" field.

Dave



Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Richard Sandiford via Gcc-patches
Michael Matz  writes:
> Hello,
>
> I haven't followed this thread too closely, in particular I haven't seen 
> why the current form of the .DEFERRED_INIT call was chosen or suggested, 
> but it triggered my "well, that's obviously wrong" gut feeling; so sorry 
> for stating something which might be obvious to thread participants here.  

Well, it does feel like this is pressing the reset button on a thread
whose message count is already in the high double figures.  There's a
risk that restarting the discussion now is going to repeat a lot of the
same ground, probably at similar length.  (But I realise that the sheer
length of the thread is probably also the main reason for not having
followed it closely. :-))

To recap, the purpose of the option is to do two things simultaneously:

(1) make the object code behave “as if” automatic variables were
initialised with a fill pattern, before any explicit initialisation

(2) continue to treat uses of uninitialised automatic variables as
(semantically) undefined behaviour

When this was discussed on the clang list, (2) was a key requirement
and was instrumental in getting the feature accepted.  The option is
specifically *not* trying to change language semantics or create
a fork in the language.

The option is both a security and a debugging feature.  As a security
feature, it would be legitimate to optimise:

   int foo;
   if (flag)
 foo = 1;
   x = foo;

into:

   x = 1;

since the behaviour is at least deterministic.  But from a debugging QoI
perspective, this should not happen.  We should generate the same code
as for:

   int foo = <fill pattern>;
   if (flag)
 foo = 1;
   x = foo;

However, because of (2), we should still generate a -Wmaybe-uninitialized
warning for the “x = foo” assignment.

This is not a combination we support at the moment, so something needs
to give.  The question of course is what.
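
To make requirement (1) concrete, here is a small model of the "as if"
behaviour.  It is a sketch only: FILL_BYTE is an arbitrary value chosen
for illustration, filled_int and read_foo are hypothetical helpers, and
the explicit initialisation stands in for code the compiler would insert
itself.

```cpp
// Illustrative model only: FILL_BYTE is an arbitrary assumption, and the
// explicit fill stands in for initialisation the compiler would insert.
#include <cassert>
#include <cstring>

static const unsigned char FILL_BYTE = 0xAA;  // hypothetical fill pattern

// The value an int holds when every one of its bytes is FILL_BYTE.
inline int
filled_int ()
{
  int v;
  std::memset (&v, FILL_BYTE, sizeof v);
  return v;
}

// Models requirement (1): behave "as if" foo started out holding the
// fill pattern before any explicit initialisation.
inline int
read_foo (bool flag)
{
  int foo = filled_int ();  // models the implicit fill
  if (flag)
    foo = 1;
  return foo;               // fill pattern when flag is false, never folded to 1
}
```

In this model read_foo (false) returns the fill pattern rather than 1,
which is the debugging behaviour described above.  Requirement (2) is
about the original source, where foo has no initialiser and the read
should still be diagnosed.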

> Anyway:
>
> On Thu, 1 Jul 2021, Richard Sandiford via Gcc-patches wrote:
>
>> >> It's not a bug in SSA, it's at most a missed optimization there.
>> >
>> > I still think that SSA cannot handle block-scoped variable correctly 
>> > in this case, it should not move the variable out side of the block 
>> > scope. -:)
>> >
>> >>  But with -ftrivial-auto-var-init it becomes a correctness issue.
>> 
>> I might have lost track of what “it” is here.  Do you mean the
>> propagation, or the fact that we have a PHI in the first place?
>> 
>> For:
>> 
>> unsigned int
>> f (int n)
>> {
>>   unsigned int res = 0;
>>   for (int i = 0; i < n; i += 1)
>> {
>>   unsigned int foo;
>>   foo += 1;
>
> Note that here foo is used uninitialized.  That is the thing from which 
> everything else follows.

Right, that was the point.

> Garbage in, garbage out.  It makes not too much sense to argue that
> the generated PHI node on this loop (generated because of the
> exposed-upward use of foo) is wrong, or right, or should better be
> something else.  The input program was broken, so anything makes sense.

Yeah, without the new option it's GIGO.  But I think it's still possible
to rank different forms of garbage output in terms of how self-consistent
they are.

IMO it makes no sense to say that “foo” is upwards-exposed to somewhere
where “foo” doesn't exist.  Using “foo_N(D)” doesn't have that problem,
so I think it's possible to say that it's “better” garbage output. :-)

And the point is that with the new option, this is no longer garbage
input as far as SSA is concerned: carrying the old value of “foo”
across iterations would not be implementing the option correctly.
How SSA handles this becomes a correctness issue.
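
The difference can be modelled in ordinary C++, based on the f() example
from earlier in the thread.  This is a sketch under assumptions: FILL is
an arbitrary stand-in for the fill pattern, and both function names are
hypothetical.

```cpp
// Sketch only: FILL is an arbitrary stand-in for the fill pattern.
#include <cassert>

static const unsigned int FILL = 0xAAAAAAAAu;  // hypothetical fill value

// Intended behaviour with the option: foo is a fresh variable on every
// iteration, so it starts from the fill pattern each time.
inline unsigned int
f_fresh_each_iteration (int n)
{
  unsigned int res = 0;
  for (int i = 0; i < n; i += 1)
    {
      unsigned int foo = FILL;  // models the per-iteration implicit fill
      foo += 1;
      res = foo;
    }
  return res;
}

// What a loop-carried PHI for foo amounts to: the old value of foo
// survives from one iteration to the next.
inline unsigned int
f_loop_carried (int n)
{
  unsigned int res = 0;
  unsigned int foo = FILL;      // filled once, outside the loop
  for (int i = 0; i < n; i += 1)
    {
      foo += 1;
      res = foo;
    }
  return res;
}
```

For n == 3 the first version returns FILL + 1 and the second returns
FILL + 3, so the two readings are observably different once the fill
pattern is part of the defined behaviour.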

> That is the same with Qing's shorter testcase:
>
>   6   for (i = 0; i < a; i++) {
>   7 if (__extension__({int size2;
>   8 size2 = ART_INIT (size2, 2);
>
> Uninitialized use of size2 right there.  And it's the same for the use of 
> .DEFERRED_INIT as the patch does:
>
> {
>   int size2;
>   size2 = .DEFERRED_INIT (size2, 2);
>   size2 = 4;
>   _1 = size2 > 5;
>   D.2240 = (int) _1;
> }
>
> Argument of the pseudo-function-call to .DEFERRED_INIT uninitialized -> 
> boom.
>
> You can't solve any of this by fiddling with how SSA rewriting behaves at 
> a large scale.  You need to change this and only this specific 
> interpretation of a use.  Or preferably not generate it to start with.
>
>> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
>> so there should be no loop-carried dependency here.
>> 
>> So yeah, if “it” meant that the fact that variables live too long,
>> then I agree that becomes a correctness issue with this feature,
>> rather than just an odd quirk.  Could we solve it by inserting
>> a clobber at the end of the variable's scope, or something like that?
>
> It would possibly make GCC not generate a PHI on this broken input, yes.  
> But why would that be an improvement?

Hopefully the above answers this.

>> > Agreed, I have made such change yesterday, and this work around the 
>> > current issue.
>> >

Re: [PATCH] c++: unqualified member template in constraint [PR101247]

2021-07-01 Thread Patrick Palka via Gcc-patches
On Thu, 1 Jul 2021, Jason Merrill wrote:

> On 6/30/21 5:27 PM, Patrick Palka wrote:
> > Here any_template_parm_r is failing to mark the template parameters
> > that're implicitly used by the unqualified use of 'd' inside the constraint,
> > because the code to do so assumes each level of a template parameter
> > list points to the corresponding primary template, but here the
> > parameter level for A in the out-of-line definition of A::B does not
> > (nor do the parameter levels for A and C in the definition of A::C),
> > which causes us to overlook the sharing.
> > 
> > So it seems we can't in general depend on the TREE_TYPE of a template
> > parameter level being non-empty here.  This patch partially fixes this
> > by rewriting the relevant part of any_template_parm_r to not depend on
> > the TREE_TYPE of outer levels.  We still depend on the innermost level
> > to point to the innermost primary template, so unfortunately we still
> > crash on the commented out lines in the below testcase.  (The problem
> > there ultimately seems to be in push_template_decl, where we consider
> > the out-of-line definition of A::C to not be primary since
> > template_parm_scope_p is false, so DECL_PRIMARY_TEMPLATE never gets set.
> > Fixing that might not be safe enough to backport, but hopefully this
> > partial fix is.)
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
> > range-v3 and cmcstl2, does this look OK for trunk/11?
> 
> OK.  Are you looking at fixing the commented-out line in a separate patch?

Thanks a lot.  Yes, I'm going to try to make us set
DECL_PRIMARY_TEMPLATE (probably from push_template_decl) when defining a
class-scope class template out-of-line.

> 
> > PR c++/101247
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.c (any_template_parm_r) : Rewrite to
> > use common_enclosing_class and to not depend on the TREE_TYPE
> > of outer levels pointing to the corresponding primary template.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-memtmpl4.C: New test.
> > ---
> >   gcc/cp/pt.c   | 23 ---
> >   .../g++.dg/cpp2a/concepts-memtmpl4.C  | 28 +++
> >   2 files changed, 33 insertions(+), 18 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index db769d59951..a8fdd2e177e 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -10735,24 +10735,11 @@ any_template_parm_r (tree t, void *data)
> > {
> > /* If T is a member template that shares template parameters with
> >ctx_parms, we need to mark all those parameters for mapping.  */
> > -   tree dparms = DECL_TEMPLATE_PARMS (t);
> > -   tree cparms = ftpi->ctx_parms;
> > -   while (TMPL_PARMS_DEPTH (dparms) > ftpi->max_depth)
> > - dparms = TREE_CHAIN (dparms);
> > -   while (TMPL_PARMS_DEPTH (cparms) > TMPL_PARMS_DEPTH (dparms))
> > - cparms = TREE_CHAIN (cparms);
> > -   while (dparms
> > -  && (TREE_TYPE (TREE_VALUE (dparms))
> > -  != TREE_TYPE (TREE_VALUE (cparms
> > - dparms = TREE_CHAIN (dparms),
> > -   cparms = TREE_CHAIN (cparms);
> > -   if (dparms)
> > - {
> > -   int ddepth = TMPL_PARMS_DEPTH (dparms);
> > -   tree dargs = TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT
> > (t)));
> > -   for (int i = 0; i < ddepth; ++i)
> > - WALK_SUBTREE (TMPL_ARGS_LEVEL (dargs, i+1));
> > - }
> > +   if (tree dtmpl = TREE_TYPE (INNERMOST_TEMPLATE_PARMS
> > (ftpi->ctx_parms)))
> > + if (tree com = common_enclosing_class (DECL_CONTEXT (t),
> > +DECL_CONTEXT (dtmpl)))
> > +   if (tree ti = CLASSTYPE_TEMPLATE_INFO (com))
> > + WALK_SUBTREE (TI_ARGS (ti));
> > }
> > break;
> >   diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C
> > new file mode 100644
> > index 000..625149e5025
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C
> > @@ -0,0 +1,28 @@
> > +// PR c++/101247
> > +// { dg-do compile { target concepts } }
> > +// A variant of concepts-memtmpl3.C where f is defined outside A's
> > definition.
> > +
> > +template  struct A {
> > +  template  static constexpr bool d = true;
> > +  struct B;
> > +  template  struct C;
> > +};
> > +
> > +template 
> > +struct A::B {
> > +  template  static void f(c) requires d;
> > +};
> > +
> > +template 
> > +template 
> > +struct A::C {
> > +  template  static void f(c) requires d;
> > +  static void g() requires d;
> > +};
> > +
> > +int main()
> > +{
> > +  A::B::f(0);
> > +  A::C::f(0);
> > +  // A::C::g();
> > +}
> > 
> 
> 



Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Qing Zhao via Gcc-patches


> On Jul 1, 2021, at 9:09 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>> On Jul 1, 2021, at 1:48 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Jun 30, 2021 at 9:15 PM Qing Zhao via Gcc-patches
>>>  wrote:
 
 
 
> On Jun 30, 2021, at 1:59 PM, Richard Biener  wrote:
> 
> On June 30, 2021 8:07:43 PM GMT+02:00, Qing Zhao  
> wrote:
>> 
>> 
>>> On Jun 30, 2021, at 12:36 PM, Richard Biener 
>> wrote:
>>> 
>>> On June 30, 2021 7:20:18 PM GMT+02:00, Andrew Pinski
>>  wrote:
 On Wed, Jun 30, 2021 at 8:47 AM Qing Zhao via Gcc-patches
  wrote:
> 
> I came up with a very simple testing case that can repeat the same
 issue:
> 
> [qinzhao@localhost gcc]$ cat t.c
> extern void bar (int);
> void foo (int a)
> {
> int i;
> for (i = 0; i < a; i++) {
> if (__extension__({int size2;
> size2 = 4;
> size2 > 5;}))
> bar (a);
> }
> }
 
 You should show the full dump,
 What we have is the following:
 
 
 
 size2_3 = PHI 
  :
 
 size2_12 = .DEFERRED_INIT (size2_3, 2);
 size2_13 = 4;
 
 So CCP decides to propagate 4 into the PHI and then decides
>> size2_1(D)
 is undefined so size2_3 is then considered 4 and propagates it into
 the .DEFERRED_INIT.
>>> 
>>> Which means the DEFERED_INIT is inserted at the wrong place.
>> 
>> Then, where is the correct place for “.DEFERRED_INIT(size2,2)?
>> 
>> The variable “size2” is a block scope variable which is declared inside
>> the “if” condition:
> 
> But that's obviously not how it behaves
> During into SSA phase since we're inserting a PHI for it - and we're 
> inserting it because of the use in the DEFERED_INIT call. I suppose you 
> need to fiddle with the SSA rewrite and avoid treating the use as a use 
> but only for the purpose of inserting PHIs...
 
>>>> Please see my other email on the new small testing case without 
>>>> -ftrivial-auto-var-init.  The same issue in SSA with that testing case 
>>>> even without -ftrivial-auto-var-init.
>>>> It looks like an existing bug to me in SSA.
>>> 
>>> It's not a bug in SSA, it's at most a missed optimization there.
>> 
>> I still think that SSA cannot handle block-scoped variable correctly in this 
>> case, it should not move the variable out side of the block scope. -:)
>> 
>>> But
>>> with -ftrivial-auto-var-init it
>>> becomes a correctness issue.
> 
> I might have lost track of what “it” is here.  Do you mean the propagation,
> or the fact that we have a PHI in the first place?

I mean the PHI being placed outside the live scope of the corresponding variable.

> 
> For:
> 
> unsigned int
> f (int n)
> {
>  unsigned int res = 0;
>  for (int i = 0; i < n; i += 1)
>{
>  unsigned int foo;
>  foo += 1;
>  res = foo;
>}
>  return res;
> }
> 
> we generate:
> 
>  unsigned int foo;
>  int i;
>  unsigned int res;
>  unsigned int _8;
> 
>   :
>  res_4 = 0;
>  i_5 = 0;
>  goto ; [INV]
> 
>   :
>  foo_10 = foo_3 + 1;
>  res_11 = foo_10;
>  i_12 = i_2 + 1;
> 
>   :
>  # res_1 = PHI 
>  # i_2 = PHI 
>  # foo_3 = PHI 
>  if (i_2 < n_7(D))
>goto ; [INV]
>  else
>goto ; [INV]
> 
>   :
>  _8 = res_1;
>  return _8;
> 
> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
> so there should be no loop-carried dependency here.
> 
> So yeah, if “it” meant the fact that variables live too long,
> then I agree that becomes a correctness issue with this feature,
> rather than just an odd quirk.

Yes, I think so.

>  Could we solve it by inserting
> a clobber at the end of the variable's scope, or something like that?
Yes, I think that will be a good solution.
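
To make the expected behavior concrete, here is a minimal sketch (not taken
from the thread) of the loop above with the block-scoped variable given a
defined value each time its scope is entered.  The name f_zero_init is
hypothetical, and the explicit "= 0" only stands in for what a
zero-initializing mode of -ftrivial-auto-var-init is assumed to do:

```c
/* Sketch only: the loop from the example, with `foo` given a defined
   value every time its scope is entered.  The explicit "= 0" models a
   compiler-inserted zero initialization (an assumption, not code from
   the patch).  */
unsigned int
f_zero_init (int n)
{
  unsigned int res = 0;
  for (int i = 0; i < n; i += 1)
    {
      unsigned int foo = 0;   /* fresh object on each iteration */
      foo += 1;               /* always 0 + 1 */
      res = foo;
    }
  return res;                 /* 1 if the body ran at least once, else 0 */
}
```

If the value of foo were instead carried from one iteration to the next, the
result would depend on the previous iteration, which is the concern being
discussed here.
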
> 
>>> I think the idea avoiding the USE of the variable in .DEFERRED_INIT
>>> and instead passing the init size is a good one and should avoid this
>>> case (hopefully).
>> 
>> 
>> Agreed, I made such a change yesterday, and it works around the current 
>> issue.
>> 
>> temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>> 
>> To:
>> 
>> temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
> 
> I think we're going round in circles here though.  The point of having
> the undefined input was so that we would continue to warn about undefined
> inputs.  The above issue doesn't seem like a good justification for
> dropping that.

Actually, in the current patch, .DEFERRED_INIT calls are handled specially 
by the uninitialized-warning analysis: the call itself is treated as the 
undefined input. 
Therefore, even after I made the above change, all the uninitialized warnings 
are still emitted.

So I think this change is fine with respect to keeping the uninitialized 
warnings.

Qing


> 
> Thanks,
> Richard
> 
> 



Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Qing Zhao via Gcc-patches


> On Jul 1, 2021, at 9:04 AM, Richard Biener  wrote:
> 
> On Thu, Jul 1, 2021 at 3:45 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jul 1, 2021, at 1:48 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Jun 30, 2021 at 9:15 PM Qing Zhao via Gcc-patches
>>>  wrote:
 
 
 
> On Jun 30, 2021, at 1:59 PM, Richard Biener  wrote:
> 
> On June 30, 2021 8:07:43 PM GMT+02:00, Qing Zhao  
> wrote:
>> 
>> 
>>> On Jun 30, 2021, at 12:36 PM, Richard Biener 
>> wrote:
>>> 
>>> On June 30, 2021 7:20:18 PM GMT+02:00, Andrew Pinski
>>  wrote:
 On Wed, Jun 30, 2021 at 8:47 AM Qing Zhao via Gcc-patches
  wrote:
> 
> I came up with a very simple testing case that can repeat the same
 issue:
> 
> [qinzhao@localhost gcc]$ cat t.c
> extern void bar (int);
> void foo (int a)
> {
> int i;
> for (i = 0; i < a; i++) {
> if (__extension__({int size2;
> size2 = 4;
> size2 > 5;}))
> bar (a);
> }
> }
 
 You should show the full dump,
 What we have is the following:
 
 
 
 size2_3 = PHI 
  :
 
 size2_12 = .DEFERRED_INIT (size2_3, 2);
 size2_13 = 4;
 
 So CCP decides to propagate 4 into the PHI and then decides
>> size2_1(D)
 is undefined so size2_3 is then considered 4 and propagates it into
 the .DEFERRED_INIT.
>>> 
>>> Which means the .DEFERRED_INIT is inserted at the wrong place.
>> 
>> Then, where is the correct place for “.DEFERRED_INIT(size2,2)”?
>> 
>> The variable “size2” is a block scope variable which is declared inside
>> the “if” condition:
> 
> But that's obviously not how it behaves
> during the into-SSA phase, since we're inserting a PHI for it - and we're 
> inserting it because of the use in the .DEFERRED_INIT call. I suppose you 
> need to fiddle with the SSA rewrite and avoid treating the use as a use, 
> but only for the purpose of inserting PHIs...
 
 Please see my other email on the new small testing case without 
 -ftrivial-auto-var-init.  The same issue in SSA with that testing case 
 even without -ftrivial-auto-var-init.
 It looks like an existing bug to me in SSA.
>>> 
>>> It's not a bug in SSA, it's at most a missed optimization there.
>> 
>> I still think that SSA cannot handle block-scoped variables correctly in this 
>> case; it should not move the variable outside of the block scope. -:)
> 
> Well, the SSA rewriter simply does not "understand" scopes and thus
> keeps registers live across
> the scope boundaries which results in unnecessary PHI nodes in this
> particular case.  That's of
> course because "scopes" no longer exist at this point in the IL, they
> are at most approximated
> by CLOBBERs we put at the end of scopes, but we don't do that for everything.

Then, this looks like a bug to me.
If scope information is approximated by CLOBBERs at this stage, then the 
compiler should add CLOBBERs at the proper places to preserve it. 
Moving a variable out of its live scope does not seem to be a correct 
transformation.

> 
> For
> 
> void foo ()
> {
> lab:
>{
>  int i;
>  i += 1;
>}
>  goto lab;
> }
> 
> we get
> 
>   :
>  # i_1 = PHI 

Here, the variable “i” has already been moved out of its live scope, 
which does not seem correct.

Qing
> lab:
>  i_3 = i_1 + 1;
>  // predicted unlikely by goto predictor.
>  goto ; [INV]
> 
> but
> 
>   :
> lab:
>  i_3 = i_2(D) + 1;
>  // predicted unlikely by goto predictor.
>  goto ; [INV]
> 
> would have been correct as  well.
> 
> Richard.
> 
>>> But
>>> with -ftrivial-auto-var-init it
>>> becomes a correctness issue.  I think the idea avoiding the USE of the
>>> variable in .DEFERRED_INIT
>>> and instead passing the init size is a good one and should avoid this
>>> case (hopefully).
>> 
>> 
>> Agreed, I made such a change yesterday, and it works around the current 
>> issue.
>> 
>> temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>> 
>> To:
>> 
>> temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
>> 
>> Thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
 Let me know if I still miss anything
 
 Qing
> 
> You might be able to construct a testcase which has a use before the real 
> init, where the optimistic CCP propagation would then defeat the 
> .DEFERRED_INIT anyway.
> 
> I'd need to play with the actual patch to find a good solution to this 
> problem.
> 
> Richard.
> 
 
>> 



[PATCH v5 06/11] x86: Add tests for piecewise move and store

2021-07-01 Thread H.J. Lu via Gcc-patches
* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memcpy-17.c: Likewise.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
---
 .../gcc.target/i386/pieces-memcpy-10.c | 16 
 .../gcc.target/i386/pieces-memcpy-11.c | 17 +
 .../gcc.target/i386/pieces-memcpy-12.c | 16 
 .../gcc.target/i386/pieces-memcpy-13.c | 16 
 .../gcc.target/i386/pieces-memcpy-14.c | 17 +
 .../gcc.target/i386/pieces-memcpy-15.c | 16 
 .../gcc.target/i386/pieces-memcpy-16.c | 16 
 .../gcc.target/i386/pieces-memcpy-7.c  | 15 +++
 .../gcc.target/i386/pieces-memcpy-8.c  | 14 ++
 .../gcc.target/i386/pieces-memcpy-9.c  | 14 ++
 .../gcc.target/i386/pieces-memset-1.c  | 16 
 .../gcc.target/i386/pieces-memset-10.c | 16 
 .../gcc.target/i386/pieces-memset-11.c | 16 
 .../gcc.target/i386/pieces-memset-12.c | 16 
 .../gcc.target/i386/pieces-memset-13.c | 16 
 .../gcc.target/i386/pieces-memset-14.c | 16 
 .../gcc.target/i386/pieces-memset-15.c | 16 
 .../gcc.target/i386/pieces-memset-16.c | 16 
 .../gcc.target/i386/pieces-memset-17.c | 16 
 .../gcc.target/i386/pieces-memset-18.c | 16 
 .../gcc.target/i386/pieces-memset-19.c | 17 +
 .../gcc.target/i386/pieces-memset-2.c  | 12 
 .../gcc.target/i386/pieces-memset-20.c | 17 +
 .../gcc.target/i386/pieces-memset-21.c | 17 +
 .../gcc.target/i386/pieces-memset-22.c | 17 +
 .../gcc.target/i386/pieces-memset-23.c | 17 +
 .../gcc.target/i386/pieces-memset-24.c | 17 +
 .../gcc.target/i386/pieces-memset-25.c | 17 +
 

[PATCH v5 11/11] x86: Also pass -mno-sse to vect8-ret.c

2021-07-01 Thread H.J. Lu via Gcc-patches
Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.
---
 gcc/testsuite/gcc.target/i386/vect8-ret.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect8-ret.c 
b/gcc/testsuite/gcc.target/i386/vect8-ret.c
index 2b2b81ecf7a..6ace07e6e0c 100644
--- a/gcc/testsuite/gcc.target/i386/vect8-ret.c
+++ b/gcc/testsuite/gcc.target/i386/vect8-ret.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ia32 && { ! *-*-vxworks* } } } } */
-/* { dg-options "-mmmx -mvect8-ret-in-mem" } */
+/* { dg-options "-mmmx -mno-sse -mvect8-ret-in-mem" } */
 
 #include 
 
-- 
2.31.1



[PATCH v5 10/11] x86: Update gcc.target/i386/incoming-11.c

2021-07-01 Thread H.J. Lu via Gcc-patches
Expect no stack realignment since we no longer realign stack when
copying data.

* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c 
b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1



[PATCH v5 09/11] x86: Also pass -mno-avx to sw-1.c for ia32

2021-07-01 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c 
b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" 
} */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include 
-- 
2.31.1



[PATCH v5 04/11] x86: Update piecewise move and store

2021-07-01 Thread H.J. Lu via Gcc-patches
We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload and MOVE_MAX can vary, depending
on compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed, since no vector register spill is
required for piecewise move and store.  Since stack_realign_needed is
set to true by checking stack_alignment_estimated, which is set by pseudo
vector register usage, we also need to check stack_realign_needed to
eliminate the frame pointer.
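
The distinction drawn in point 2 can be sketched in plain C.  This is only
an illustration under stated assumptions: every name below (the SKETCH_ and
sketch_ identifiers, the option struct and its fields) is hypothetical and
does not reproduce the actual i386.h definitions.

```c
#include <stddef.h>

/* Compile-time upper bound: usable as an array size inside a struct,
   so it must not depend on command-line options.  */
#define SKETCH_MAX_MOVE_MAX 64

/* Hypothetical stand-in for per-target state sized by the constant.  */
struct sketch_target_state
{
  unsigned char scratch[SKETCH_MAX_MOVE_MAX];
};

/* Hypothetical option flags; the real backend consults its own target
   flags, which are not reproduced here.  */
struct sketch_options
{
  int has_512bit_vectors;
  int has_256bit_vectors;
  int has_128bit_vectors;
};

/* Option-dependent limit: bytes of the widest integer a vector register
   can hold, falling back to the word size.  */
size_t
sketch_move_max (const struct sketch_options *opts, size_t word_bytes)
{
  if (opts->has_512bit_vectors)
    return 64;
  if (opts->has_256bit_vectors)
    return 32;
  if (opts->has_128bit_vectors)
    return 16;
  return word_bytes;
}
```

In this sketch the option-dependent value never exceeds the constant bound,
which is what allows the constant to size fixed storage while the limit
actually used varies with the selected options.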

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
by vector register.
(MOVE_MAX): Defined to MOVE_MAX_PIECES.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-24.c: Likewise.
* gcc.target/i386/pr90773-25.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
XMM movd to store 4 bytes.
* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
YMM registers.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
* gcc.target/i386/pr100865-10b.c: Likewise.
---
 gcc/config/i386/i386.c   | 21 --
 gcc/config/i386/i386.h   | 40 
 gcc/testsuite/gcc.target/i386/pr100865-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-10a.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-10b.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr100865-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4a.c  |  6 +--
 gcc/testsuite/gcc.target/i386/pr100865-4b.c  |  8 ++--
 gcc/testsuite/gcc.target/i386/pr90773-1.c| 10 ++---
 gcc/testsuite/gcc.target/i386/pr90773-14.c   |  4 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c   |  6 +--
 gcc/testsuite/gcc.target/i386/pr90773-16.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-24.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-25.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c|  2 +-
 17 files changed, 77 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f436794f65c..3dfb3a6f2dc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7953,8 +7953,17 @@ ix86_finalize_stack_frame_flags (void)
  assumed stack realignment might be needed or -fno-omit-frame-pointer
  is used, but in the end nothing that needed the stack alignment had
  been spilled nor stack access, clear frame_pointer_needed and say we
- don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+ don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign
+   || (!flag_omit_frame_pointer && optimize)
+   || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && crtl->sp_is_unchanging
@@ -10418,7 +10427,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
  /* FALLTHRU */
case E_OImode:
case E_XImode:
- if (!standard_sse_constant_p (x, mode))
+ if (!standard_sse_constant_p (x, mode)
+ && GET_MODE_SIZE (TARGET_AVX512F
+   ? XImode
+   

[PATCH v5 07/11] x86: Also pass -mno-avx to pr72839.c

2021-07-01 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c 
b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1



[PATCH v5 05/11] x86: Add AVX2 tests for PR middle-end/90773

2021-07-01 Thread H.J. Lu via Gcc-patches
PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
* gcc.target/i386/pr90773-26.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-26.c | 21 +
 5 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-26.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c 
b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c 
b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c 
b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c 
b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } 
} */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-26.c 
b/gcc/testsuite/gcc.target/i386/pr90773-26.c
new file mode 100644
index 000..b2513c3a9c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-26.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 
32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1
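
The store sequences that the 33-byte and 34-byte memset tests above scan for
(one 32-byte store at offset 0, then a 1-byte or 2-byte store at offset 32)
are consistent with a greedy split into power-of-two pieces.  The following
is a minimal sketch of that splitting, not the by-pieces code in GCC; the
names split_into_pieces and struct piece are hypothetical, and overlapping
final stores are deliberately not modeled.

```c
#include <stddef.h>

/* One store: SIZE bytes written at byte OFFSET.  */
struct piece
{
  size_t offset;
  size_t size;
};

/* Split LEN bytes into non-overlapping power-of-two stores no wider than
   MAX_PIECE, which must itself be a power of two.  Writes the pieces to
   OUT (capacity CAP) and returns how many were written, or 0 if
   MAX_PIECE is 0 or CAP is too small.  */
size_t
split_into_pieces (size_t len, size_t max_piece, struct piece *out,
                   size_t cap)
{
  size_t n = 0;
  size_t offset = 0;

  if (max_piece == 0)
    return 0;

  while (offset < len)
    {
      size_t remaining = len - offset;
      size_t size = max_piece;

      /* Pick the widest power-of-two store that still fits.  */
      while (size > remaining)
        size /= 2;

      if (n == cap)
        return 0;
      out[n].offset = offset;
      out[n].size = size;
      n++;
      offset += size;
    }
  return n;
}
```

With max_piece set to 32, a length of 33 yields pieces (0, 32) and (32, 1),
and a length of 34 yields (0, 32) and (32, 2).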



[PATCH v5 00/11] Allow TImode/OImode/XImode in op_by_pieces operations

2021-07-01 Thread H.J. Lu via Gcc-patches
Changes in the v5 patches:

1. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
2. Use vec_duplicate, instead of adding TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE, to expand memset if available.

Changes in the v4 patches:

1. Define x86 MAX_MOVE_MAX to 64, which is the constant maximum number
of bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define x86 MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
bytes we can move from memory to memory in one reasonably fast instruction.
The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
must be a constant, independent of compiler options, since it is used in
reload.h to define struct target_reload and MOVE_MAX can vary, depending
on compiler options.

Changes in the v3 patches:

1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
into the generic part and the x86 part.


1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate QImode value to TImode/OImode/XImode
value for memset.
2. x86: Avoid stack realignment when copying data
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

            Before     After
libc.so     1906572    1906444

Some code sequence differences in libc.so are:

:
...
jne                                   | jne

test   %r15,%r15                        test   %r15,%r15
je                                    | je

mov    %r13d,(%r14)                     mov    %r13d,(%r14)
lea    0x10(%r14),%rdi                  lea    0x10(%r14),%rdi
mov    $0x1,%ecx                        mov    $0x1,%ecx
mov    %r13d,%edx                       mov    %r13d,%edx
mov    %r15,0x40(%r12)                  mov    %r15,0x40(%r12)
mov    %r15,%rsi                        mov    %r15,%rsi
call                                    call

lea    0xa2f9b(%rip),%rax #           | lea    0xa2fab(%rip),%rax #
xor    %esi,%esi                        xor    %esi,%esi
mov    %ebp,%edi                        mov    %ebp,%edi
mov    %rax,0x8(%r12)                   mov    %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax                  movzwl 0x12(%rsp),%eax
mov    $0x8,%edx                      <
lea    0xc(%rsp),%rcx                   lea    0xc(%rsp),%rcx
mov    %r14,0x48(%r12)                <
add    $0x40,%r14                     <
mov    $0x4,%r8d                        mov    $0x4,%r8d
                                      > movq   $0x0,0x1d0(%r14)
                                      > mov    $0x8,%edx
rol    $0x8,%ax                         rol    $0x8,%ax
mov    %ebp,(%r12)                    | mov    %r14,0x48(%r12)
movq   $0x0,0x190(%r14)               | add    $0x40,%r14
mov    %ax,0x4(%r12)                  <
mov    %r14,0x30(%r12)                  mov    %r14,0x30(%r12)
                                      > mov    %ax,0x4(%r12)
                                      > mov    %ebp,(%r12)
movl   $0x1,0xc(%rsp)                   movl   $0x1,0xc(%rsp)
call                                    call

mov    %r12,%rdi                        mov    %r12,%rdi
movabs $0x101010101010101,%rdx        <
test   %eax,%eax                        test   %eax,%eax
mov    $0xff,%eax                       mov    $0xff,%eax
cmove  %eax,%ebx                        cmove  %eax,%ebx
movzbl %bl,%eax                       | movd   %ebx,%xmm0
mov    %ebx,0xc(%rsp)                   mov    %ebx,0xc(%rsp)
mov    %rax,%rsi

[PATCH v5 08/11] x86: Also pass -mno-avx to cold-attribute-1.c

2021-07-01 Thread H.J. Lu via Gcc-patches
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or
ZMM registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c 
b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include 
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1



[PATCH v5 02/11] x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX

2021-07-01 Thread H.J. Lu via Gcc-patches
Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

PR middle-end/90773
* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/config/i386/i386.c |  6 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-16.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-17.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr90773-18.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr90773-19.c | 14 ++
 6 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2fbaae7cd02..f436794f65c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23163,7 +23163,8 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
 }
 }
 
-/* Return a scratch register in MODE for vector load and store.  */
+/* Implement the TARGET_GEN_MEMSET_SCRATCH_RTX hook.  Return a scratch
+   register in MODE for vector load and store.  */
 
 rtx
 ix86_gen_scratch_sse_rtx (machine_mode mode)
@@ -24082,6 +24083,9 @@ static bool ix86_libc_has_fast_function (int fcode 
ATTRIBUTE_UNUSED)
 #undef TARGET_LIBC_HAS_FAST_FUNCTION
 #define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function
 
+#undef TARGET_GEN_MEMSET_SCRATCH_RTX
+#define TARGET_GEN_MEMSET_SCRATCH_RTX ix86_gen_scratch_sse_rtx
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c 
b/gcc/testsuite/gcc.target/i386/pr90773-15.c
new file mode 100644
index 000..c0a96fed892
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%edi, %xmm\[0-9\]+" 
1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]+%xmm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+%dil, 16\\(%\[\^,\]+\\)" 1 
} } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c 
b/gcc/testsuite/gcc.target/i386/pr90773-16.c
new file mode 100644
index 000..d2d1ec6141c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, -1, 17);
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeqd" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]+%xmm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$-1, 16\\(%\[\^,\]+\\)" 
1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c 
b/gcc/testsuite/gcc.target/i386/pr90773-17.c
new file mode 100644
index 000..6c8da7d24ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 19);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]+%xmm\[0-9\]+, 
\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovd\[\\t \]+%xmm\[0-9\]+, 
15\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-18.c 
b/gcc/testsuite/gcc.target/i386/pr90773-18.c
new file mode 100644
index 000..b0687abbe01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 12, 9);
+}
+
+/* { dg-final { scan-assembler-times "movabsq\[\\t \]+\\\$868082074056920076, 
%r" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 
\\(%\[\^,\]+\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 
4\\(%\[\^,\]+\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$12, 

[PATCH v5 01/11] Rewrite memset with TARGET_GEN_MEMSET_SCRATCH_RTX

2021-07-01 Thread H.J. Lu via Gcc-patches
1. Rewrite builtin_memset_read_str/builtin_memset_gen_str to use vector
broadcast to duplicate QI value to TI/OI/XI value for memset.
2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.
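
As a scalar analogue of the duplication in point 1, replicating one byte
across a wider integer can be written as a multiplication by a constant
with every byte set to 1.  This is a sketch for illustration only, not the
vec_duplicate path the patch uses; the function names are hypothetical.
The values it produces for the byte 12 match the immediates that the
pr90773-18.c test in this series scans for.

```c
#include <stdint.h>

/* Replicate the low byte of C into every byte of a 64-bit value.  */
uint64_t
broadcast_byte_u64 (int c)
{
  return (uint64_t) (unsigned char) c * UINT64_C (0x0101010101010101);
}

/* Replicate the low byte of C into every byte of a 32-bit value.  */
uint32_t
broadcast_byte_u32 (int c)
{
  return (uint32_t) (unsigned char) c * UINT32_C (0x01010101);
}
```

For c == 12 the 64-bit result is 868082074056920076 (0x0c0c0c0c0c0c0c0c) and
the 32-bit result is 202116108 (0x0c0c0c0c).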

PR middle-end/90773
* builtins.c (gen_memset_value_from_prev): New function.
(gen_memset_broadcast): Likewise.
(builtin_memset_read_str): Use gen_memset_value_from_prev
and gen_memset_broadcast.
(builtin_memset_gen_str): Likewise.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.
---
 gcc/builtins.c | 123 +
 gcc/doc/tm.texi|   5 ++
 gcc/doc/tm.texi.in |   2 +
 gcc/target.def |   7 +++
 4 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e5e39386a93..e938d610f12 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6639,26 +6639,111 @@ expand_builtin_strncpy (tree exp, rtx target)
   return NULL_RTX;
 }
 
-/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
-   bytes from constant string DATA + OFFSET and return it as target
-   constant.  If PREV isn't nullptr, it has the RTL info from the
+/* Return the RTL of a register in MODE generated from PREV in the
previous iteration.  */
 
-rtx
-builtin_memset_read_str (void *data, void *prevp,
-HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
-scalar_int_mode mode)
+static rtx
+gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
 {
+  rtx target = nullptr;
   by_pieces_prev *prev = (by_pieces_prev *) prevp;
   if (prev != nullptr && prev->data != nullptr)
 {
   /* Use the previous data in the same mode.  */
   if (prev->mode == mode)
return prev->data;
+
+  rtx prev_rtx = prev->data;
+  machine_mode prev_mode = prev->mode;
+  unsigned int word_size = GET_MODE_SIZE (word_mode);
+  if (word_size < GET_MODE_SIZE (prev->mode)
+ && word_size > GET_MODE_SIZE (mode))
+   {
+ /* First generate subreg of word mode if the previous mode is
+wider than word mode and word mode is wider than MODE.  */
+ prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
+ prev_mode, 0);
+ prev_mode = word_mode;
+   }
+  if (prev_rtx != nullptr)
+   target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
 }
+  return target;
+}
+
+/* Return the RTL of a register in MODE broadcasted from DATA.  */
+
+static rtx
+gen_memset_broadcast (rtx data, scalar_int_mode mode)
+{
+  /* Skip if regno_reg_rtx isn't initialized.  */
+  if (!regno_reg_rtx)
+return nullptr;
+
+  rtx target = nullptr;
+
+  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
+  machine_mode vector_mode;
+  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
+gcc_unreachable ();
+
+  enum insn_code icode = optab_handler (vec_duplicate_optab,
+   vector_mode);
+  if (icode != CODE_FOR_nothing)
+{
+  rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
+  if (CONST_INT_P (data))
+   {
+ /* Use the move expander with CONST_VECTOR.  */
+ rtx const_vec = gen_const_vec_duplicate (vector_mode, data);
+ emit_move_insn (reg, const_vec);
+   }
+  else
+   {
+
+ class expand_operand ops[2];
+ create_output_operand (&ops[0], reg, vector_mode);
+ create_input_operand (&ops[1], data, QImode);
+ expand_insn (icode, 2, ops);
+ if (!rtx_equal_p (reg, ops[0].value))
+   emit_move_insn (reg, ops[0].value);
+   }
+  target = lowpart_subreg (mode, reg, vector_mode);
+}
+
+  return target;
+}
+
+/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
+   bytes from constant string DATA + OFFSET and return it as target
+   constant.  If PREV isn't nullptr, it has the RTL info from the
+   previous iteration.  */
 
+rtx
+builtin_memset_read_str (void *data, void *prev,
+HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
+scalar_int_mode mode)
+{
+  rtx target;
   const char *c = (const char *) data;
-  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
+  char *p;
+
+  /* Don't use the previous value if size is 1.  */
+  if (GET_MODE_SIZE (mode) != 1)
+{
+  target = gen_memset_value_from_prev (prev, mode);
+  if (target != nullptr)
+   return target;
+
+  p = XALLOCAVEC (char, GET_MODE_SIZE (QImode));
+  memset (p, *c, GET_MODE_SIZE (QImode));
+  rtx src = c_readstr (p, QImode);
+  target = gen_memset_broadcast (src, mode);
+  if (target != nullptr)
+   return target;
+}
+
+  p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
 
   memset (p, *c, GET_MODE_SIZE 

[PATCH v5 03/11] x86: Avoid stack realignment when copying data

2021-07-01 Thread H.J. Lu via Gcc-patches
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Call
ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
data from one memory location to another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c   |  4 +++-
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 5c9170e3a1d..6b009d523a5 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -602,7 +602,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   && !register_operand (op0, mode)
   && !register_operand (op1, mode))
 {
-  emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+  rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
+  emit_move_insn (tmp, op1);
+  emit_move_insn (op0, tmp);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (&this_context);
+  __builtin_memcpy (&this_context, &cur_context,
+   sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((&cur_context)->ra);
+  uw_install_context_1 (&cur_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1



Re: [PATCH] Add gnu::diagnose_as attribute

2021-07-01 Thread Jason Merrill via Gcc-patches

On 7/1/21 5:28 AM, Matthias Kretz wrote:

On Tuesday, 22 June 2021 22:12:42 CEST Jason Merrill wrote:

On 6/22/21 4:01 PM, Matthias Kretz wrote:

On Tuesday, 22 June 2021 21:52:16 CEST Jason Merrill wrote:

For alias templates, you probably want the attribute only on the
templated class, not on the instantiations.


Oh good point. My current patch does not allow the attribute on alias
templates. Consider:

template 

struct X {};

template 

using foo [[gnu::diagnose_as]] = X;

I have no idea how this could work. I would have to set the attribute for
an implicit partial specialization (not that I know of the existence of
such a thing)? I.e. X would have to be diagnosed as foo,
but X would have to be diagnosed as X, not foo.

So if anything it should only support alias templates if they are strictly
"renaming" the type. I.e. their template parameters must match up exactly.
Can I constrain the attribute like this?


Yes.  You can check that with get_underlying_template.

Or you could support the above by putting the attribute on the
instantiation with the TEMPLATE_INFO for foo rather than a simple name.


Question, given:

   template  using foo = bar;

The diagnose_as attribute handler isn't called until e.g. `foo` is
instantiated.


You probably want to adjust is_late_template_attribute to change that.


Meaning that even after the declaration of the alias template
`bar` will not be diagnosed as `foo`, which happens only after the
first use of `foo`. I find that more confusing than helpful, even if the
expectation would be that users only use the alias template.

So do you still expect alias templates to support diagnose_as? And if yes, how
could I handle the attribute so that the diagnose_as attribute is applied to
`bar` on declaration of `foo`?





Re: [PATCH] c++: unqualified member template in constraint [PR101247]

2021-07-01 Thread Jason Merrill via Gcc-patches

On 6/30/21 5:27 PM, Patrick Palka wrote:

Here any_template_parm_r is failing to mark the template parameters
that're implicitly used by the unqualified use of 'd' inside the constraint,
because the code to do so assumes each level of a template parameter
list points to the corresponding primary template, but here the
parameter level for A in the out-of-line definition of A::B does not
(nor do the parameter levels for A and C in the definition of A::C),
which causes us to overlook the sharing.

So it seems we can't in general depend on the TREE_TYPE of a template
parameter level being non-empty here.  This patch partially fixes this
by rewriting the relevant part of any_template_parm_r to not depend on
the TREE_TYPE of outer levels.  We still depend on the innermost level
to point to the innermost primary template, so unfortunately we still
crash on the commented out lines in the below testcase.  (The problem
there ultimately seems to be in push_template_decl, where we consider
the out-of-line definition of A::C to not be primary since
template_parm_scope_p is false, so DECL_PRIMARY_TEMPLATE never gets set.
Fixing that might not be safe enough to backport, but hopefully this
partial fix is.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
range-v3 and cmcstl2, does this look OK for trunk/11?


OK.  Are you looking at fixing the commented-out line in a separate patch?


PR c++/101247

gcc/cp/ChangeLog:

* pt.c (any_template_parm_r) : Rewrite to
use common_enclosing_class and to not depend on the TREE_TYPE
of outer levels pointing to the corresponding primary template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-memtmpl4.C: New test.
---
  gcc/cp/pt.c   | 23 ---
  .../g++.dg/cpp2a/concepts-memtmpl4.C  | 28 +++
  2 files changed, 33 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index db769d59951..a8fdd2e177e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10735,24 +10735,11 @@ any_template_parm_r (tree t, void *data)
{
/* If T is a member template that shares template parameters with
   ctx_parms, we need to mark all those parameters for mapping.  */
-   tree dparms = DECL_TEMPLATE_PARMS (t);
-   tree cparms = ftpi->ctx_parms;
-   while (TMPL_PARMS_DEPTH (dparms) > ftpi->max_depth)
- dparms = TREE_CHAIN (dparms);
-   while (TMPL_PARMS_DEPTH (cparms) > TMPL_PARMS_DEPTH (dparms))
- cparms = TREE_CHAIN (cparms);
-   while (dparms
-  && (TREE_TYPE (TREE_VALUE (dparms))
-  != TREE_TYPE (TREE_VALUE (cparms
- dparms = TREE_CHAIN (dparms),
-   cparms = TREE_CHAIN (cparms);
-   if (dparms)
- {
-   int ddepth = TMPL_PARMS_DEPTH (dparms);
-   tree dargs = TI_ARGS (get_template_info (DECL_TEMPLATE_RESULT (t)));
-   for (int i = 0; i < ddepth; ++i)
- WALK_SUBTREE (TMPL_ARGS_LEVEL (dargs, i+1));
- }
+   if (tree dtmpl = TREE_TYPE (INNERMOST_TEMPLATE_PARMS (ftpi->ctx_parms)))
+ if (tree com = common_enclosing_class (DECL_CONTEXT (t),
+DECL_CONTEXT (dtmpl)))
+   if (tree ti = CLASSTYPE_TEMPLATE_INFO (com))
+ WALK_SUBTREE (TI_ARGS (ti));
}
break;
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C b/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C

new file mode 100644
index 000..625149e5025
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-memtmpl4.C
@@ -0,0 +1,28 @@
+// PR c++/101247
+// { dg-do compile { target concepts } }
+// A variant of concepts-memtmpl3.C where f is defined outside A's definition.
+
+template  struct A {
+  template  static constexpr bool d = true;
+  struct B;
+  template  struct C;
+};
+
+template 
+struct A::B {
+  template  static void f(c) requires d;
+};
+
+template 
+template 
+struct A::C {
+  template  static void f(c) requires d;
+  static void g() requires d;
+};
+
+int main()
+{
+  A::B::f(0);
+  A::C::f(0);
+  // A::C::g();
+}





[PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-01 Thread Hafiz Abid Qadeer
Currently, if we look at the debug information for offload kernel
regions, it looks something like this:

void foo (void)
{
#pragma acc kernels
  {

  }
}

DW_TAG_compile_unit
  DW_AT_name("")

  DW_TAG_subprogram // notional parent function (foo) with no code range

DW_TAG_subprogram // offload function foo._omp_fn.0

There is an artificial compile unit. It contains a parent subprogram which
has the offload function as its child.  The parent function makes sense in
host code, where it actually exists and has an address range.  But in
offload code it does not exist, and the generated DWARF has no address
range for it.

When debuggers read the DWARF for offload code, they see a function with no
address range and discard it along with its children, which include the
offload function.  This results in a poor debugging experience for offload
code.

This patch tries to solve this problem by making offload kernels children of
the "artificial" compile unit instead of a non-existent parent function.  This
not only improves the debugging experience but also reflects reality better
in the debug info.
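
The re-parenting idea can be sketched outside of GCC on a plain tree.  This
is a simplified illustration under stated assumptions, not the code in the
patch: struct die, add_child and reparent_ranged_children are hypothetical
names, and GCC's real DIE representation is laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DWARF DIE.  */
struct die
{
  const char *name;
  bool has_pc_range;          /* True if the DIE has an address range.  */
  struct die *parent;
  struct die *first_child;
  struct die *next_sibling;
};

/* Link CHILD in as the first child of PARENT.  */
static void
add_child (struct die *parent, struct die *child)
{
  child->parent = parent;
  child->next_sibling = parent->first_child;
  parent->first_child = child;
}

/* If NOTIONAL has no address range, move each of its children that does
   have one up to NOTIONAL's own parent.  Return the number moved.  */
static int
reparent_ranged_children (struct die *notional)
{
  if (notional->has_pc_range || notional->parent == NULL)
    return 0;

  int moved = 0;
  struct die **link = &notional->first_child;
  while (*link != NULL)
    {
      struct die *child = *link;
      if (child->has_pc_range)
        {
          *link = child->next_sibling;    /* Unlink from NOTIONAL.  */
          add_child (notional->parent, child);
          moved++;
        }
      else
        link = &child->next_sibling;
    }
  return moved;
}
```

With a compile unit containing foo, and foo containing foo._omp_fn.0, the
call on foo leaves the kernel as a direct child of the compile unit.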

Patch was tested on x86_64 with amdgcn offload. Debug behavior was
tested with rocgdb.

gcc/

* gcc/dwarf2out.c (notional_parents_list): New file variable.
(gen_subprogram_die): Record offload kernel functions in
notional_parents_list.
(fixup_notional_parents): New function.
(dwarf2out_finish): Call fixup_notional_parents.
(dwarf2out_c_finalize): Reset notional_parents_list.
---
 gcc/dwarf2out.c | 68 +++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 80acf165fee..769bb7fc4a8 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3506,6 +3506,11 @@ static GTY(()) limbo_die_node *limbo_die_list;
DW_AT_{,MIPS_}linkage_name once their DECL_ASSEMBLER_NAMEs are set.  */
 static GTY(()) limbo_die_node *deferred_asm_name;
 
+/* A list of DIEs which represent parents of nested offload kernels.  These
+   functions exist on the host side but not in the offloaded code.  But they
+   still show up as parents of the offload kernels in DWARF.  */
+static GTY(()) limbo_die_node *notional_parents_list;
+
 struct dwarf_file_hasher : ggc_ptr_hash<dwarf_file_data>
 {
   typedef const char *compare_type;
@@ -23652,8 +23657,23 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
  if (fde->dw_fde_begin)
{
  /* We have already generated the labels.  */
- add_AT_low_high_pc (subr_die, fde->dw_fde_begin,
- fde->dw_fde_end, false);
+ add_AT_low_high_pc (subr_die, fde->dw_fde_begin,
+ fde->dw_fde_end, false);
+
+/* Offload kernel functions are nested within a parent function
+   that doesn't actually exist in the offload object.  GDB
+   will ignore the function and everything nested within it as
+   the function does not have an address range.  We mark the
+   parent functions here and will later fix them.  */
+if (lookup_attribute ("omp target entrypoint",
+  DECL_ATTRIBUTES (decl)))
+  {
+limbo_die_node *node = ggc_cleared_alloc<limbo_die_node> ();
+node->die = subr_die->die_parent;
+node->created_for = decl;
+node->next = notional_parents_list;
+notional_parents_list = node;
+  }
}
  else
{
@@ -31881,6 +31901,46 @@ flush_limbo_die_list (void)
 }
 }
 
+/* Fix up notional parent functions (which do not actually exist) so that
+   a function with no address range is not the parent of a function *with*
+   address ranges.  Otherwise debuggers see the parent function without a
+   code range and discard it along with its children, which here include
+   functions that have an address range.
+
+   Typically this occurs when we have an offload kernel, where the parent
+   function only exists in the host-side portion of the code.  */
+
+static void
+fixup_notional_parents (void)
+{
+  limbo_die_node *node;
+
+  for (node = notional_parents_list; node; node = node->next)
+{
+  dw_die_ref notional_parent = node->die;
+  /* The dwarf at this moment looks like this
+DW_TAG_compile_unit
+  DW_AT_name   ("")
+
+  DW_TAG_subprogram // parent function with no code range
+
+DW_TAG_subprogram // offload function 1
+...
+DW_TAG_subprogram // offload function n
+Our aim is to make offload function children of CU.  */
+  if (notional_parent
+ && notional_parent->die_tag == DW_TAG_subprogram
+ && !(get_AT (notional_parent, DW_AT_low_pc)
+ || get_AT (notional_parent, DW_AT_ranges)))
+
+   {
+ dw_die_ref cu = notional_parent->die_parent;

Re: [PATCH] PING implement pre-c++20 contracts

2021-07-01 Thread Jason Merrill via Gcc-patches

On 6/26/21 10:23 AM, Andrew Sutton wrote:

Hi Jason,

I ended up taking over this work from Jeff (CC'd on his existing email
address). I scraped all the contracts changes into one big patch
against master. See attached. The ChangeLog.contracts files list the
sum of changes for the patch, not the full history of the work.

I think this version addresses most of your concerns.


Thanks, looking good.  I'll fuss with it a bit and commit it soon.


There are a few big changes.

The first is that we treat contracts like any other attribute, which
gets interesting at times. For example, in duplicate_decl, we have to
do a little dance to make sure the target merge_attributes doesn't
copy attributes between the new and old decls in seemingly arbitrary
order. Friends are also a bit funny because the attributes aren't
attached by cplus_decl_attributes at the point declarations are
merged, so we have to defer comparisons.

Contracts are always parsed where they appear, except on member
functions. For postconditions with result variables (e.g., [[post r:
...]]), we temporarily declare r as if 'auto r' and then update it
later when we've computed the function's return type. (I feel like
this was kind of overlooked in N4820... it's generally impossible to
assign a type to 'r' given the position of contract attributes in the
declarator: 'auto f(int n) [[post r: q]] -> int'. It's worse in GCC
since the return type is computed in grokdeclarator, well after
contract attributes have been parsed).

On a related note, the handling of postconditions involving deduced
return type was completely rewritten. Everything happens in
apply_deduced_return_type, which seems right. I think
check_return_expr is where the postcondition is actually invoked.

We no longer instantiate contract attributes until absolutely
necessary in regenerate_decl_from_template. That seems to work well...
at least it does after I discovered we were quietly rewriting contract
lists every time we removed contracts from an old declaration or from
a template specialization. This also gets rid of the need to have
unshare_templates, which had a FIXME note attached.

Lastly, we only ever generate pre/post checks for actual functions,
not function templates. I also simplified a lot of the logic around
associating pre/post check functions, so that it's only set exactly
once when we start analyzing function definitions.

I believe Jeff addressed some of the ABI concerns and COMDAT-related questions.

On the issue of copy_fn_decl vs. copy_fndecl_with_name... I didn't
change that. The latter sends the function to middle end for codegen,
which we really don't want at the point we make the copy.

I think we're probably still breaking NRVO. I didn't get a chance to
look at that.



On Fri, May 28, 2021 at 9:18 AM Jeff Chapman  wrote:


Hello again :)  Wanted to shoot a quick status update. Some github issues have
been created for points of feedback, and we've been working on addressing them.
A few changes have been pushed to the contracts-jac-alt branch, while there's
also an active more in depth rewrite branch. Some specific comments inline
below.

On 5/17/21, Jason Merrill  wrote:

On 5/14/21 4:54 PM, Jason Merrill wrote:

On 4/30/21 1:44 PM, Jeff Chapman wrote:

Hello! Looping back around to this. re:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567334.html

On 3/25/21, Jason Merrill  wrote:

On 3/1/21 8:12 AM, Jeff Chapman wrote:

On 1/18/21, Jason Merrill  wrote:

On 1/4/21 9:58 AM, Jeff Chapman wrote:

Ping. re:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561135.html


https://github.com/lock3/gcc/tree/contracts-jac-alt



Why is some of the code in c-family?  From the modules merge there is
now a cp_handle_option function that you could add the option
handling
to, and I don't see a reason for cxx-contracts.c to be in c-family/
rather than cp/.


I've been pushing changes that address the points raised and wanted to
ping to see if there's more feedback and give a status summary. The
notable change is making sure the implementation plays nicely with
modules and a mangling change that did away with a good chunk of code.

The contracts code outside of cp/ has been moved into it, and the
contract attributes have been altered to work with language
independent
handling code. More of the contracts code can probably still be
moved to
cxx-contracts which I'll loop back to after other refactoring. The
naming, spelling, and comments (or lack thereof) have been addressed.


Sounds good.  I plan to get back to this when GCC 11 branches, which
should be mid-April.


Please let me know if you see any more issues when you pick it back up.
Particularly in modules interop, since I don't think you've had a chance
to look at that yet.

Completed another merge with master earlier this week which didn't bring
to light any new issues or regressions, but 

Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Martin Liška wrote:

> On 7/1/21 3:33 PM, Eli Zaretskii wrote:
> > > Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> > > From: Martin Liška 
> > > Date: Thu, 1 Jul 2021 14:44:10 +0200
> > > 
> > > > It helps some, but not all of the issues disappear.  For example,
> > > > stuff like this is still hard to read:
> > > > 
> > > > To select this standard in GCC, use one of the options -ansi
> > > >-
> > > > -std.‘=c90’ or -std.‘=iso9899:1990’
> > > >    
> > > 
> > > If I understand the notes correctly, the '.' should also be hidden by e.g.
> > > Emacs.
> > 
> > No, it doesn't.  The actual text in the Info file is:
> > 
> > *note -std: f.‘=iso9899:1990’
> > 
> > and the period after " f" isn't hidden.  Where does that "f" come from
> > and what is its purpose here? can it be removed (together with the
> > period)?
> 
> It's the name of the anchor used for the @ref. The names are automatically
> generated by makeinfo. Here's an example:
> 
> This is the warning level of @ref{e,,-Wshift-overflow3} and …
> 
> becomes in info:
> This is the warning level of *note -Wshift-overflow3: e. and …
> 
> I can ask the question at Sphinx, the Emacs script should hide that.

Not everyone reads info within Emacs; even if there's an emacs solution to 
postprocess info pages to make them nicer we can't rely on that.  It must 
look sensible without that.  In this case it seems that already the 
generated .texinfo input to makeinfo is bad, where does the 'e' (or 'f') 
come from?  The original texinfo file simply contains:

  @option{-std=iso9899:1990}

so that's what perhaps should be generated, or maybe the anchor in @ref is 
optional and could be left out if it doesn't provide any info (a single 
character as anchor doesn't seem very useful)?

> > > > > We can adjust 'emph' formatting to nil, what do you think?
> > > > 
> > > > Something like that, yes.  But the problem is: how will you format it
> > > > instead?  The known alternatives, _foo_ and *foo* both use punctuation
> > > > characters, which will get in the way similarly to the quotes.  Can
> > > > you format those in caps, like makeinfo does?
> > > 
> > > You are fully right, info is a very simple format and it uses wrapping
> > > characters for formatting (by default * and _). So, I don't have any
> > > elegant solution.
> > 
> > Well, it sounds from another mail that you did find a solution: to
> > up-case the string in @var.
> 
> I don't know. Some of them can be e.g. keywords and using upper-case 
> does not seem to me feasible.

Then that needs to be different already in the input, so that the 
directive that (in info) capitalizes is only used in contexts where that 
makes sense.  People reading info pages will know that an all-caps word 
often means a syntactic variable/placeholder, so that should be preserved.


Ciao,
Michael.


Re: [PATCH] Return true/false instead of 1/0 from generic predicates.

2021-07-01 Thread Jeff Law via Gcc-patches




On 7/1/2021 8:55 AM, Uros Bizjak via Gcc-patches wrote:

No functional changes.

2021-07-01  Uroš Bizjak  

gcc/
 * recog.c (general_operand): Return true/false instead of 1/0.
 (register_operand): Ditto.
 (immediate_operand): Ditto.
 (const_int_operand): Ditto.
 (const_scalar_int_operand): Ditto.
 (const_double_operand): Ditto.
 (push_operand): Ditto.
 (pop_operand): Ditto.
 (memory_operand): Ditto.
 (indirect_operand): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.
Thanks for tackling this.   I fully support this kind of cleanup without 
the need for waiting on an approval.  If there's oddball target fallout, 
my tester will catch it.


Jeff



[PATCH] i386: Return true/false instead of 1/0 from predicates.

2021-07-01 Thread Uros Bizjak via Gcc-patches
No functional changes.

2021-07-01  Uroš Bizjak  

gcc/
* config/i386/predicates.md (ix86_endbr_immediate_operand):
Return true/false instead of 1/0.
(movq_parallel): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index e7a896874d6..c4b35c82506 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -145,16 +145,16 @@ (define_predicate "ix86_endbr_immediate_operand"
unsigned HOST_WIDE_INT val = TARGET_64BIT ? 0xfa1e0ff3 : 0xfb1e0ff3;
 
if (imm == val)
-return 1;
+return true;
 
/* NB: Encoding is byte based.  */
if (TARGET_64BIT)
 for (; imm >= val; imm >>= 8)
   if (imm == val)
-return 1;
+return true;
   }
 
-  return 0;
+  return false;
 })
 
 ;; Return true if VALUE can be stored in a sign extended immediate field.
@@ -1559,15 +1559,15 @@ (define_predicate "movq_parallel"
   unsigned HOST_WIDE_INT ei;
 
   if (!CONST_INT_P (er))
-   return 0;
+   return false;
   ei = INTVAL (er);
   if (i < nelt2 && ei != i)
-   return 0;
+   return false;
   if (i >= nelt2 && (ei < nelt || ei >= nelt << 1))
-   return 0;
+   return false;
 }
 
-  return 1;
+  return true;
 })
 
 ;; Return true if OP is a vzeroall operation, known to be a PARALLEL.


[PATCH] Return true/false instead of 1/0 from generic predicates.

2021-07-01 Thread Uros Bizjak via Gcc-patches
No functional changes.

2021-07-01  Uroš Bizjak  

gcc/
* recog.c (general_operand): Return true/false instead of 1/0.
(register_operand): Ditto.
(immediate_operand): Ditto.
(const_int_operand): Ditto.
(const_scalar_int_operand): Ditto.
(const_double_operand): Ditto.
(push_operand): Ditto.
(pop_operand): Ditto.
(memory_operand): Ditto.
(indirect_operand): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/recog.c b/gcc/recog.c
index 9d880433e6f..2114df8c0d1 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -1420,12 +1420,12 @@ general_operand (rtx op, machine_mode mode)
   if (GET_MODE (op) == VOIDmode && mode != VOIDmode
   && GET_MODE_CLASS (mode) != MODE_INT
   && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
-return 0;
+return false;
 
   if (CONST_INT_P (op)
   && mode != VOIDmode
   && trunc_int_for_mode (INTVAL (op), mode) != INTVAL (op))
-return 0;
+return false;
 
   if (CONSTANT_P (op))
 return ((GET_MODE (op) == VOIDmode || GET_MODE (op) == mode
@@ -1439,7 +1439,7 @@ general_operand (rtx op, machine_mode mode)
  OP's mode must match MODE if MODE specifies a mode.  */
 
   if (GET_MODE (op) != mode)
-return 0;
+return false;
 
   if (code == SUBREG)
 {
@@ -1452,7 +1452,7 @@ general_operand (rtx op, machine_mode mode)
 get cleaned up by cleanup_subreg_operands.  */
   if (!reload_completed && MEM_P (sub)
  && paradoxical_subreg_p (op))
-   return 0;
+   return false;
 #endif
   /* Avoid memories with nonzero SUBREG_BYTE, as offsetting the memory
  may result in incorrect reference.  We should simplify all valid
@@ -1463,7 +1463,7 @@ general_operand (rtx op, machine_mode mode)
   if (!reload_completed
  && maybe_ne (SUBREG_BYTE (op), 0)
  && MEM_P (sub))
-   return 0;
+   return false;
 
   if (REG_P (sub)
  && REGNO (sub) < FIRST_PSEUDO_REGISTER
@@ -1474,7 +1474,7 @@ general_operand (rtx op, machine_mode mode)
 operand reload presentation.  LRA needs to treat them as
 valid.  */
  && ! LRA_SUBREG_P (op))
-   return 0;
+   return false;
 
   /* FLOAT_MODE subregs can't be paradoxical.  Combine will occasionally
 create such rtl, and we must reject it.  */
@@ -1486,7 +1486,7 @@ general_operand (rtx op, machine_mode mode)
 mode.  */
  && ! lra_in_progress 
  && paradoxical_subreg_p (op))
-   return 0;
+   return false;
 
   op = sub;
   code = GET_CODE (op);
@@ -1501,7 +1501,7 @@ general_operand (rtx op, machine_mode mode)
   rtx y = XEXP (op, 0);
 
   if (! volatile_ok && MEM_VOLATILE_P (op))
-   return 0;
+   return false;
 
   /* Use the mem's mode, since it will be reloaded thus.  LRA can
 generate move insn with invalid addresses which is made valid
@@ -1509,10 +1509,10 @@ general_operand (rtx op, machine_mode mode)
 transformations.  */
   if (lra_in_progress
   || memory_address_addr_space_p (GET_MODE (op), y, MEM_ADDR_SPACE (op)))
-   return 1;
+   return true;
 }
 
-  return 0;
+  return false;
 }
 
 /* Return true if OP is a valid memory address for a memory reference
@@ -1552,10 +1552,10 @@ register_operand (rtx op, machine_mode mode)
 but currently it does result from (SUBREG (REG)...) where the
 reg went on the stack.)  */
   if (!REG_P (sub) && (reload_completed || !MEM_P (sub)))
-   return 0;
+   return false;
 }
   else if (!REG_P (op))
-return 0;
+return false;
   return general_operand (op, mode);
 }
 
@@ -1574,7 +1574,7 @@ bool
 scratch_operand (rtx op, machine_mode mode)
 {
   if (GET_MODE (op) != mode && mode != VOIDmode)
-return 0;
+return false;
 
   return (GET_CODE (op) == SCRATCH
  || (REG_P (op)
@@ -1596,12 +1596,12 @@ immediate_operand (rtx op, machine_mode mode)
   if (GET_MODE (op) == VOIDmode && mode != VOIDmode
   && GET_MODE_CLASS (mode) != MODE_INT
   && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
-return 0;
+return false;
 
   if (CONST_INT_P (op)
   && mode != VOIDmode
   && trunc_int_for_mode (INTVAL (op), mode) != INTVAL (op))
-return 0;
+return false;
 
   return (CONSTANT_P (op)
  && (GET_MODE (op) == mode || mode == VOIDmode
@@ -1618,13 +1618,13 @@ bool
 const_int_operand (rtx op, machine_mode mode)
 {
   if (!CONST_INT_P (op))
-return 0;
+return false;
 
   if (mode != VOIDmode
   && trunc_int_for_mode (INTVAL (op), mode) != INTVAL (op))
-return 0;
+return false;
 
-  return 1;
+  return true;
 }
 
 #if TARGET_SUPPORTS_WIDE_INT
@@ -1634,7 +1634,7 @@ bool
 const_scalar_int_operand (rtx op, machine_mode mode)
 {
   if (!CONST_SCALAR_INT_P (op))
-return 0;
+return false;
 
   if (CONST_INT_P (op))
 return const_int_operand (op, mode);
@@ -1646,10 +1646,10 @@ 

Re: [PATCH] tree-optimization/101280 - revise interchange fix for PR101173

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Richard Biener wrote:

> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index 43045c5455e..43ef112a2d0 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -1043,8 +1043,11 @@ tree_loop_interchange::valid_data_dependences (unsigned i_idx, unsigned o_idx,
>   continue;
>  
> /* Be conservative, skip case if either direction at i_idx/o_idx
> -  levels is not '=' (for the inner loop) or '<'.  */
> -   if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0)
> +  levels is not '=' or '<'.  */
> +   if (dist_vect[i_idx] < 0
> +   || (DDR_REVERSED_P (ddr) && dist_vect[i_idx] > 0)
> +   || dist_vect[o_idx] < 0
> +   || (DDR_REVERSED_P (ddr) && dist_vect[o_idx] > 0))

Hmm, if DDR_REVERSED_P matters here, then it should matter for all arms.  
IOW: < 0 should be tested only when !DDR_REVERSED_P, not always:

   if ((!DDR_REVERSED_P (ddr) && dist_vect[i_idx] < 0)
   || (DDR_REVERSED_P (ddr) && dist_vect[i_idx] > 0)
   || (!DDR_REVERSED_P (ddr) && dist_vect[o_idx] < 0)
   || (DDR_REVERSED_P (ddr) && dist_vect[o_idx] > 0))

(what you have effectively written is a condition that allows only 0 when 
DDR_REVERSED_P)
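
To make that parenthetical concrete, here is a small standalone sketch in plain C (not GCC code). It models the two conditions for a single entry of the distance vector; the function names skip_original, skip_suggested and demo are made up for illustration, and `reversed` stands in for DDR_REVERSED_P (ddr).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the quoted patch, for one distance entry:
   "< 0" is tested regardless of the reversed flag.  */
bool
skip_original (bool reversed, int dist)
{
  return dist < 0 || (reversed && dist > 0);
}

/* Condition as suggested in the reply: "< 0" is tested only when
   the dependence is not reversed.  */
bool
skip_suggested (bool reversed, int dist)
{
  return (!reversed && dist < 0) || (reversed && dist > 0);
}

/* Hypothetical self-check: with reversed set, the original form
   rejects every nonzero distance, the suggested form only positive
   ones.  */
void
demo (void)
{
  assert (skip_original (true, -1) && skip_original (true, 1));
  assert (!skip_original (true, 0));
  assert (!skip_suggested (true, -1) && skip_suggested (true, 1));
}
```

For the non-reversed case both forms agree, so the difference only shows up when the flag is set.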


Ciao,
Michael.


Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-07-01 Thread Martin Sebor via Gcc-patches

On 7/1/21 2:01 AM, Christophe LYON wrote:


On 30/06/2021 21:56, Martin Sebor via Gcc-patches wrote:

On 6/11/21 8:46 AM, Martin Sebor wrote:

On 6/11/21 3:58 AM, Richard Sandiford wrote:

Martin Sebor via Gcc-patches  writes:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7b37e1b602c..7cdc824730c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13242,13 +13242,8 @@ bounds_check (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high,

    lane = INTVAL (operand);
    if (lane < low || lane >= high)
-    {
-  if (exp)
-    error ("%K%s %wd out of range %wd - %wd",
-   exp, desc, lane, low, high - 1);
-  else
-    error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
-    }
+    error_at (EXPR_LOCATION (exp),
+  "%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
  }
  /* Bounds-check lanes.  */


This part doesn't look safe: “exp” is null when called from 
arm_const_bounds.


Doh!  Yes, will fix, thanks.


Attached is an updated patch with the test above restored.
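
As a rough illustration of why the null test matters, the sketch below uses stand-in types only: struct expr, location_t, UNKNOWN_LOC, report_location and demo are all hypothetical and are not GCC's real tree or location_t machinery. It just shows the shape of guarding a possibly-null expression before reading a location from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT GCC's real types or macros.  */
typedef int location_t;
#define UNKNOWN_LOC 0

struct expr
{
  location_t loc;
};

/* Pick the location to report at.  EXP may be null (as it is for
   some callers in the thread above), so it must be tested before it
   is dereferenced.  */
location_t
report_location (const struct expr *exp, location_t fallback)
{
  return exp != NULL ? exp->loc : fallback;
}

/* Hypothetical self-check covering both the null and non-null case.  */
void
demo (void)
{
  struct expr e = { 42 };
  assert (report_location (&e, UNKNOWN_LOC) == 42);
  assert (report_location (NULL, UNKNOWN_LOC) == UNKNOWN_LOC);
}
```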

Christophe, if you could apply it on top of patches 1 and 2 and run
the aarch64/arm tests that would be great!

Patch 1:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573859.html
Patch 2:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/574088.html



Hi,


I hope I got it right, but there are a few regressions on aarch64/arm:


Thanks!



http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r12-1929-gf6bc9d9bddad7f9e3aad939bb6750770ac67f003-martin.patch

The patch I applied is 
https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r12-1929-gf6bc9d9bddad7f9e3aad939bb6750770ac67f003-martin.patch/commit.txt



The regressions are the same on aarch64/arm:

   Executed from: gcc.dg/format/format.exp
 gcc.dg/format/c90-printf-1.c%s with NULL (test for warnings, line 243)
 gcc.dg/format/c90-printf-1.c(test for excess errors)
 gcc.dg/format/c90-printf-1.c   -DWIDE  %s with NULL (test for warnings, line 243)
 gcc.dg/format/c90-printf-1.c   -DWIDE  (test for excess errors)


The location of the warning has improved here leading to the test
failing where it looks for it in the wrong place (wrong column).
This is (or will be) fixed in the final patch by adjusting the test.




   Executed from: gcc.dg/dg.exp
 gcc.dg/pr79214.c  (test for warnings, line 25)
 gcc.dg/pr79214.c  (test for warnings, line 30)
 gcc.dg/pr79214.c  (test for warnings, line 35)
 gcc.dg/pr79214.c  (test for warnings, line 40)
 gcc.dg/pr79214.c  (test for warnings, line 45)
 gcc.dg/pr79214.c  (test for warnings, line 52)
 gcc.dg/pr79214.c  (test for warnings, line 59)
 gcc.dg/pr79214.c  (test for warnings, line 66)
 gcc.dg/pr79214.c  (test for warnings, line 73)
 gcc.dg/pr79214.c  (test for warnings, line 80)
 gcc.dg/pr79214.c  (test for warnings, line 87)


This is also expected (on all targets) and something I'll deal with
before committing.




If you want to see the corresponding gcc.log, you can click on "log" in 
the red cells of the top-level report.


Your results validate the aarch64 and arm changes so they're all
I needed.

Thanks again for your help!

Martin




HTH


Christophe



Thanks
Martin




Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Michael Matz
Hello,

I haven't followed this thread too closely, in particular I haven't seen 
why the current form of the .DEFERRED_INIT call was chosen or suggested, 
but it triggered my "well, that's obviously wrong" gut feeling; so sorry 
for stating something which might be obvious to thread participants here.  
Anyway:

On Thu, 1 Jul 2021, Richard Sandiford via Gcc-patches wrote:

> >> It's not a bug in SSA, it's at most a missed optimization there.
> >
> > I still think that SSA cannot handle block-scoped variable correctly 
> > in this case, it should not move the variable out side of the block 
> > scope. -:)
> >
> >>  But with -ftrivial-auto-var-init it becomes a correctness issue.
> 
> I might have lost track of what “it” is here.  Do you mean the 
> propagation, or the fact that we have a PHI in the first place?
> 
> For:
> 
> unsigned int
> f (int n)
> {
>   unsigned int res = 0;
>   for (int i = 0; i < n; i += 1)
> {
>   unsigned int foo;
>   foo += 1;

Note that here foo is used uninitialized.  That is the thing from which 
everything else follows.  Garbage in, garbage out.  It makes not too much 
sense to argue that the generated PHI node on this loop (generated because 
of the exposed-upward use of foo) is wrong, or right, or should better be 
something else.  The input program was broken, so anything makes sense.

That is the same with Qing's shorter testcase:

  6   for (i = 0; i < a; i++) {
  7 if (__extension__({int size2;
  8 size2 = ART_INIT (size2, 2);

Uninitialized use of size2 right there.  And it's the same for the use of 
.DEFERRED_INIT as the patch does:

{
  int size2;
  size2 = .DEFERRED_INIT (size2, 2);
  size2 = 4;
  _1 = size2 > 5;
  D.2240 = (int) _1;
}

Argument of the pseudo-function-call to .DEFERRED_INIT uninitialized -> 
boom.

You can't solve any of this by fiddling with how SSA rewriting behaves at 
a large scale.  You need to change this and only this specific 
interpretation of a use.  Or preferably not generate it to start with.

> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
> so there should be no loop-carried dependency here.
> 
> So yeah, if “it” meant the fact that variables live too long,
> then I agree that becomes a correctness issue with this feature,
> rather than just an odd quirk.  Could we solve it by inserting
> a clobber at the end of the variable's scope, or something like that?

It would possibly make GCC not generate a PHI on this broken input, yes.  
But why would that be an improvement?

> > Agreed, I have made such change yesterday, and this work around the 
> > current issue.
> >
> > temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
> >
> > To:
> >
> > temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
> 
> I think we're going round in circles here though.  The point of having
> the undefined input was so that we would continue to warn about undefined
> inputs.  The above issue doesn't seem like a good justification for
> dropping that.

If you want to rely on the undefined use for error reporting then you must 
only generate an undefined use when there was one before, you can't just 
insert new undefined uses.  I don't see how it could be otherwise, as soon 
as you introduce new undefined uses you can and will run into GCC making 
use of the undefinedness, not just this particular issue with lifetime and 
PHI nodes which might be "solved" by clobbers.

I think it's a chicken-and-egg problem: you can't add undefined uses, for 
which you need to know if there was one, but the facility is supposed to 
help detecting if there is one to start with.


Ciao,
Michael.


Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Martin Liška

On 7/1/21 3:33 PM, Eli Zaretskii wrote:

Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
From: Martin Liška 
Date: Thu, 1 Jul 2021 14:44:10 +0200


It helps some, but not all of the issues disappear.  For example,
stuff like this is still hard to read:

To select this standard in GCC, use one of the options -ansi
   -
-std.‘=c90’ or -std.‘=iso9899:1990’
   


If I understand the notes correctly, the '.' should also be hidden by e.g. Emacs.


No, it doesn't.  The actual text in the Info file is:

*note -std: f.‘=iso9899:1990’

and the period after " f" isn't hidden.  Where does that "f" come from
and what is its purpose here? can it be removed (together with the
period)?


It's the name of the anchor used for the @ref. The names are automatically generated
by makeinfo. Here is an example:

This is the warning level of @ref{e,,-Wshift-overflow3} and …

becomes in info:
This is the warning level of *note -Wshift-overflow3: e. and …

I can ask the question at Sphinx, the Emacs script should hide that.




About ‘=iso9899:1990’, yes, it's a :samp: and that is how Sphinx wraps it by
default.


Why is it a separate :samp:?  IMO, the correct markup is to make the
entire string -std=iso9899:1990 have the same typeface.  In Texinfo,
I'd use

@option{-std=iso9899:1990}


We can adjust 'emph' formatting to nil, what do you think?


Something like that, yes.  But the problem is: how will you format it
instead?  The known alternatives, _foo_ and *foo* both use punctuation
characters, which will get in the way similarly to the quotes.  Can
you format those in caps, like makeinfo does?


You are fully right, info is a very simple format and it uses wrapping characters
for formatting (by default * and _). So, I don't have any elegant solution.


Well, it sounds from another mail that you did find a solution: to
up-case the string in @var.


I don't know. Some of them can be e.g. keywords, and using upper case does not
seem feasible to me.




Note also that nodes are now called by the same name as the section,
which means node names generally got much longer.  Is that really a
good idea?


Well, I intentionally removed these and used simple TOC tree links
which take display text for a section title.


I would suggest to discuss these decisions first, perhaps on the
Texinfo mailing list?  I'm accustomed to these short descriptions, but
I'm not sure how important they are for others.


Well, it was a decision made during the transition of the Texinfo files to Sphinx.
I don't see why it should be discussed in the Texinfo community.


This actually raises a more general issue with this Sphinx porting
initiative: what will be the canonical style guide for maintaining the
GCC manual in Sphinx, or more generally for writing GNU manuals in
Sphinx?  For Texinfo, we have the Texinfo manual, which both documents
the language and provides style guidelines for how to use Texinfo for
producing good manuals.  Contributors to GNU manuals are using those
guidelines for many years.  Is there, or will there be, an equivalent
style guide for Sphinx?  If not, how will the future contributors to
the GCC manuals know what are the writing and style conventions?


No, I'm not planning any extra style guide. We will use the standard Sphinx RST
manual, and one can find many tutorials about how to do it.



That is why I recommended to discuss this on the Texinfo list: that's
the place where such guidelines are discussed, and where we have
experts who understand the effects and consequences of using this or
that style.  The current style in GNU manuals is to have the menus as
we see them in the existing GCC manuals: with a short description.
Maybe there are good reasons to deviate from that style, but
shouldn't this be at least presented and discussed, before the
decision is made?  GCC developers are not the only ones who will be
reading the future GCC manuals.



That seems to me a subtle adjustment, and it's the standard way people generate
a TOC in Sphinx. See e.g. the Linux kernel documentation:
https://www.kernel.org/doc/html/latest/

Martin


Re: [PATCH] Analyze niter for until-wrap condition [PR101145]

2021-07-01 Thread guojiufu via Gcc-patches

On 2021-07-01 20:35, Richard Biener wrote:

On Thu, 1 Jul 2021, Jiufu Guo wrote:


For code like:
unsigned foo(unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i > val; ++i)
cnt++;
  return cnt;
}

The number of iterations should be about UINT_MAX - start.
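
As a sanity check on the "about UINT_MAX - start" estimate, the sketch below runs the loop from the message next to a closed form. The helper names until_wrap_niter and demo are made up for illustration and are not part of the patch. The closed form assumes start > val: i then counts up from start through UINT_MAX, wraps to 0, and the test 0 > val fails for every unsigned val.

```c
#include <assert.h>
#include <limits.h>

/* The loop from the message, unchanged apart from indentation.  */
unsigned
foo (unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i > val; ++i)
    cnt++;
  return cnt;
}

/* Hypothetical closed form: if start > val the loop visits
   start, start + 1, ..., UINT_MAX and then stops after wrapping
   to 0; otherwise it does not iterate at all.  */
unsigned
until_wrap_niter (unsigned val, unsigned start)
{
  return start > val ? UINT_MAX - start + 1 : 0;
}

/* Hypothetical self-check.  The start value is kept close to
   UINT_MAX so the loop only runs a handful of times.  */
void
demo (void)
{
  assert (foo (5, UINT_MAX - 10) == until_wrap_niter (5, UINT_MAX - 10));
  assert (foo (7, 3) == until_wrap_niter (7, 3));
}
```

So the exact count under that assumption is UINT_MAX - start + 1, which matches the "about" in the description.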


For

unsigned foo(unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i >= val; ++i)
cnt++;
  return cnt;
}

and val == 0 the loop never terminates.  I don't see anywhere
in the patch that you disregard GE_EXPR and I remember
the code handles GE as well as GT?  From a quick look this is
also not covered by a testcase you add - not exactly sure
how it would materialize in a miscompilation.


In number_of_iterations_cond, there is code:
   if (code == GE_EXPR || code == GT_EXPR
|| (code == NE_EXPR && integer_zerop (iv0->step)))
  {
std::swap (iv0, iv1);
code = swap_tree_comparison (code);
  }
It converts "GT/GE" (i >= val) to "LT/LE" (val <= i),
and LE (val <= i) is converted to LT (val - 1 < i).
So, the code is added to number_of_iterations_lt.

But this patch leads to a miscompilation for unsigned "i >= val" with the
transforms above: converting LE (val <= i) to LT (val - 1 < i)
is not appropriate when val - 1 wraps (e.g. where val = 0).
Thanks for pointing this out!

I will investigate a way to handle this correctly.
A possible way may be just to return false for this kind of LE.

Any suggestions?
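
The boundary case can be checked in isolation with a small sketch. The names le_form, lt_form and demo are made up for illustration; the two comparisons are the ones discussed above. In unsigned arithmetic 0 - 1 wraps to UINT_MAX, so the rewritten LT form is never true for val == 0 while the original LE form is always true.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Original condition after the operand swap: val <= i.  */
bool
le_form (unsigned val, unsigned i)
{
  return val <= i;
}

/* Rewritten condition: val - 1 < i.  For val == 0 the subtraction
   wraps to UINT_MAX.  */
bool
lt_form (unsigned val, unsigned i)
{
  return val - 1 < i;
}

/* Hypothetical self-check: the two forms agree for val != 0 and
   disagree for every i when val == 0.  */
void
demo (void)
{
  assert (le_form (5, 4) == lt_form (5, 4));
  assert (le_form (5, 5) == lt_form (5, 5));
  assert (le_form (0, 0) && !lt_form (0, 0));
  assert (le_form (0, UINT_MAX) && !lt_form (0, UINT_MAX));
}
```

This is why the LE-to-LT step needs an assumption that the bound is not at the type's minimum (or maximum, for the mirrored case) before it can be applied.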




There is a function adjust_cond_for_loop_until_wrap which
handles similar work for const bases.
Like adjust_cond_for_loop_until_wrap, this patch enhances
the functions number_of_iterations_cond/number_of_iterations_lt
to analyze the number of iterations for this kind of loop.

Bootstrap and regtest pass on powerpc64le, is this ok for trunk?

gcc/ChangeLog:

PR tree-optimization/101145
* tree-ssa-loop-niter.c
(number_of_iterations_until_wrap): New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap):
Merge to number_of_iterations_until_wrap.
(number_of_iterations_cond): Update invokes for
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.

gcc/testsuite/ChangeLog:

PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr101145.c   | 187 +
 gcc/testsuite/gcc.dg/vect/pr101145.inc |  63 +
 gcc/testsuite/gcc.dg/vect/pr101145_1.c |  15 ++
 gcc/testsuite/gcc.dg/vect/pr101145_2.c |  15 ++
 gcc/testsuite/gcc.dg/vect/pr101145_3.c |  15 ++
 gcc/tree-ssa-loop-niter.c  | 150 +++-
 6 files changed, 380 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145.inc
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr101145_3.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c b/gcc/testsuite/gcc.dg/vect/pr101145.c
new file mode 100644
index 000..74031b031cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
@@ -0,0 +1,187 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O3 -fdump-tree-vect-details" } */
+#include <limits.h>
+
+unsigned __attribute__ ((noinline))
+foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (UINT_MAX - 64 < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  l = UINT_MAX - 32;
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{  // infinite
+  while (0 <= ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  //no loop
+  l = UINT_MAX;
+  while (n < ++l)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+bar (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (--l < n)
+    *a++ = *b++ + 1;
+  return l;
+}
+
+unsigned __attribute__ ((noinline))
+bar_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
+{
+  while (--l < 64)
+    *a++ = *b++ + 1;
+  return l;
+}
+

Re: [PATCH v6 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-07-01 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 1, 2021 at 2:42 PM H.J. Lu  wrote:
>
> Hi Uros,
>
> On Thu, Jul 1, 2021 at 1:32 AM Hongtao Liu  wrote:
> >
> > On Tue, Jun 29, 2021 at 6:16 AM H.J. Lu  wrote:
> > >
> > > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > > operands to vector broadcast from an integer with AVX.
> > > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> > > won't increase stack alignment requirement and blocks transformation by
> > > the combine pass.
> > >
> > > A small benchmark:
> > >
> > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> > >
> > > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> > >
> > > $ make
> > > gcc -g -I. -O2   -c -o test.o test.c
> > > gcc -g   -c -o memory.o memory.S
> > > gcc -g   -c -o broadcast.o broadcast.S
> > > gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
> > > gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> > > ./test
> > > memory  : 147215
> > > broadcast   : 121213
> > > vec_dup_sse2: 171366
> > > $
> > >
> > > broadcast is also smaller:
> > >
> > > $ size memory.o broadcast.o
> > >    text    data     bss     dec     hex filename
> > >     132       0       0     132      84 memory.o
> > >     122       0       0     122      7a broadcast.o
> > > $
> > >
> > > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> > > from memory.
> > > 4. Update avx512f_cond_move.c to expect integer broadcast.
> > >
> > > A small benchmark:
> > >
> > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
> > >
> > > shows that integer broadcast is faster than embedded memory broadcast:
> > >
> > > $ make
> > > gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
> > > gcc -g   -c -o memory.o memory.S
> > > gcc -g   -c -o broadcast.o broadcast.S
> > > gcc -o test test.o memory.o broadcast.o
> > > ./test
> > > memory  : 425538
> > > broadcast   : 375260
> > > $
> > >
> > > gcc/
> > >
> > > PR target/100865
> > > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> > > New prototype.
> > > (ix86_byte_broadcast): New function.
> > > (ix86_convert_const_wide_int_to_broadcast): Likewise.
> > > (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
> > > size is 16 bytes or bigger.
> > > (ix86_broadcast_from_integer_constant): New function.
> > > (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
> > > to broadcast if mode size is 16 bytes or bigger.
> > > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> > > prototype.
> > > * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/100865
> > > * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
> > > broadcast.
> > > * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> > > * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> > > * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> > > * gcc.target/i386/avx512f_cond_move.c: Also pass
> > > -mprefer-vector-width=512 and expect integer broadcast.
> > Those tests change LGTM since it's faster than the older version, and
> > I remember uros has already reviewed other parts, right?
>
> Uros, does this version look OK?

Yes, LGTM.

My main concern was around embedded broadcast, but as shown above,
using temporary reg is faster.

Uros.

> Thanks.
>
> > > * gcc.target/i386/pr100865-1.c: New test.
> > > * gcc.target/i386/pr100865-2.c: Likewise.
> > > * gcc.target/i386/pr100865-3.c: Likewise.
> > > * gcc.target/i386/pr100865-4a.c: Likewise.
> > > * gcc.target/i386/pr100865-4b.c: Likewise.
> > > * gcc.target/i386/pr100865-5a.c: Likewise.
> > > * gcc.target/i386/pr100865-5b.c: Likewise.
> > > * gcc.target/i386/pr100865-6a.c: Likewise.
> > > * gcc.target/i386/pr100865-6b.c: Likewise.
> > > * gcc.target/i386/pr100865-6c.c: Likewise.
> > > * gcc.target/i386/pr100865-7a.c: Likewise.
> > > * gcc.target/i386/pr100865-7b.c: Likewise.
> > > * gcc.target/i386/pr100865-7c.c: Likewise.
> > > * gcc.target/i386/pr100865-8a.c: Likewise.
> > > * gcc.target/i386/pr100865-8b.c: Likewise.
> > > * gcc.target/i386/pr100865-8c.c: Likewise.
> > > * gcc.target/i386/pr100865-9a.c: Likewise.
> > > * gcc.target/i386/pr100865-9b.c: Likewise.
> > > * gcc.target/i386/pr100865-9c.c: Likewise.
> > > * gcc.target/i386/pr100865-10a.c: Likewise.
> > > * gcc.target/i386/pr100865-10b.c: Likewise.
> > > * gcc.target/i386/pr100865-11a.c: Likewise.
> > > * gcc.target/i386/pr100865-11b.c: Likewise.
> > > * gcc.target/i386/pr100865-11c.c: Likewise.
> > > * gcc.target/i386/pr100865-12a.c: 

Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
>> On Jul 1, 2021, at 1:48 AM, Richard Biener  
>> wrote:
>> 
>> On Wed, Jun 30, 2021 at 9:15 PM Qing Zhao via Gcc-patches
>>  wrote:
>>> 
>>> 
>>> 
 On Jun 30, 2021, at 1:59 PM, Richard Biener  wrote:
 
 On June 30, 2021 8:07:43 PM GMT+02:00, Qing Zhao  
 wrote:
> 
> 
>> On Jun 30, 2021, at 12:36 PM, Richard Biener 
> wrote:
>> 
>> On June 30, 2021 7:20:18 PM GMT+02:00, Andrew Pinski
>  wrote:
>>> On Wed, Jun 30, 2021 at 8:47 AM Qing Zhao via Gcc-patches
>>>  wrote:
 
 I came up with a very simple testing case that can repeat the same
>>> issue:
 
 [qinzhao@localhost gcc]$ cat t.c
 extern void bar (int);
 void foo (int a)
 {
 int i;
 for (i = 0; i < a; i++) {
  if (__extension__({int size2;
  size2 = 4;
  size2 > 5;}))
  bar (a);
 }
 }
>>> 
>>> You should show the full dump,
>>> What we have is the following:
>>> 
>>> 
>>> 
>>> size2_3 = PHI 
>>>  :
>>> 
>>> size2_12 = .DEFERRED_INIT (size2_3, 2);
>>> size2_13 = 4;
>>> 
>>> So CCP decides to propagate 4 into the PHI and then decides
> size2_1(D)
>>> is undefined so size2_3 is then considered 4 and propagates it into
>>> the .DEFERRED_INIT.
>> 
>> Which means the DEFERED_INIT is inserted at the wrong place.
> 
> Then, where is the correct place for “.DEFERRED_INIT(size2,2)?
> 
> The variable “size2” is a block scope variable which is declared inside
> the “if” condition:
 
 But that's obviously not how it behaves
 During into SSA phase since we're inserting a PHI for it - and we're 
 inserting it because of the use in the DEFERED_INIT call. I suppose you 
 need to fiddle with the SSA rewrite and avoid treating the use as a use 
 but only for the purpose of inserting PHIs...
>>> 
>>> Please see my other email on the new small testing case without 
>>> -ftrivial-auto-var-init.  The same issue in SSA with that testing case even 
>>> without -ftrivial-auto-var-init.
>>> It looks like an existing bug to me in SSA.
>> 
>> It's not a bug in SSA, it's at most a missed optimization there.
>
> I still think that SSA cannot handle block-scoped variable correctly in this 
> case, it should not move the variable out side of the block scope. -:)
>
>>  But
>> with -ftrivial-auto-var-init it
>> becomes a correctness issue.

I might have lost track of what “it” is here.  Do you mean the propagation,
or the fact that we have a PHI in the first place?

For:

unsigned int
f (int n)
{
  unsigned int res = 0;
  for (int i = 0; i < n; i += 1)
{
  unsigned int foo;
  foo += 1;
  res = foo;
}
  return res;
}

we generate:

  unsigned int foo;
  int i;
  unsigned int res;
  unsigned int _8;

   :
  res_4 = 0;
  i_5 = 0;
  goto ; [INV]

   :
  foo_10 = foo_3 + 1;
  res_11 = foo_10;
  i_12 = i_2 + 1;

   :
  # res_1 = PHI 
  # i_2 = PHI 
  # foo_3 = PHI 
  if (i_2 < n_7(D))
goto ; [INV]
  else
goto ; [INV]

   :
  _8 = res_1;
  return _8;

IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
so there should be no loop-carried dependency here.

So yeah, if “it” meant the fact that variables live too long,
then I agree that becomes a correctness issue with this feature,
rather than just an odd quirk.  Could we solve it by inserting
a clobber at the end of the variable's scope, or something like that?

>> I think the idea avoiding the USE of the variable in .DEFERRED_INIT
>> and instead passing the init size is a good one and should avoid this
>> case (hopefully).
>
>
> Agreed, I have made such change yesterday, and this work around the current 
> issue.
>
> temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>
> To:
>
> temp = .DEFERRED_INIT (SIZE_of_temp, init_type)

I think we're going round in circles here though.  The point of having
the undefined input was so that we would continue to warn about undefined
inputs.  The above issue doesn't seem like a good justification for
dropping that.

Thanks,
Richard




Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Richard Biener via Gcc-patches
On Thu, Jul 1, 2021 at 3:45 PM Qing Zhao  wrote:
>
>
>
> > On Jul 1, 2021, at 1:48 AM, Richard Biener  
> > wrote:
> >
> > On Wed, Jun 30, 2021 at 9:15 PM Qing Zhao via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >>> On Jun 30, 2021, at 1:59 PM, Richard Biener  wrote:
> >>>
> >>> On June 30, 2021 8:07:43 PM GMT+02:00, Qing Zhao  
> >>> wrote:
> 
> 
> > On Jun 30, 2021, at 12:36 PM, Richard Biener 
>  wrote:
> >
> > On June 30, 2021 7:20:18 PM GMT+02:00, Andrew Pinski
>   wrote:
> >> On Wed, Jun 30, 2021 at 8:47 AM Qing Zhao via Gcc-patches
> >>  wrote:
> >>>
> >>> I came up with a very simple testing case that can repeat the same
> >> issue:
> >>>
> >>> [qinzhao@localhost gcc]$ cat t.c
> >>> extern void bar (int);
> >>> void foo (int a)
> >>> {
> >>> int i;
> >>> for (i = 0; i < a; i++) {
> >>>  if (__extension__({int size2;
> >>>  size2 = 4;
> >>>  size2 > 5;}))
> >>>  bar (a);
> >>> }
> >>> }
> >>
> >> You should show the full dump,
> >> What we have is the following:
> >>
> >>
> >>
> >> size2_3 = PHI 
> >>  :
> >>
> >> size2_12 = .DEFERRED_INIT (size2_3, 2);
> >> size2_13 = 4;
> >>
> >> So CCP decides to propagate 4 into the PHI and then decides
>  size2_1(D)
> >> is undefined so size2_3 is then considered 4 and propagates it into
> >> the .DEFERRED_INIT.
> >
> > Which means the DEFERED_INIT is inserted at the wrong place.
> 
>  Then, where is the correct place for “.DEFERRED_INIT(size2,2)?
> 
>  The variable “size2” is a block scope variable which is declared inside
>  the “if” condition:
> >>>
> >>> But that's obviously not how it behaves
> >>> During into SSA phase since we're inserting a PHI for it - and we're 
> >>> inserting it because of the use in the DEFERED_INIT call. I suppose you 
> >>> need to fiddle with the SSA rewrite and avoid treating the use as a use 
> >>> but only for the purpose of inserting PHIs...
> >>
> >> Please see my other email on the new small testing case without 
> >> -ftrivial-auto-var-init.  The same issue in SSA with that testing case 
> >> even without -ftrivial-auto-var-init.
> >> It looks like an existing bug to me in SSA.
> >
> > It's not a bug in SSA, it's at most a missed optimization there.
>
> I still think that SSA cannot handle block-scoped variable correctly in this 
> case, it should not move the variable out side of the block scope. -:)

Well, the SSA rewriter simply does not "understand" scopes and thus
keeps registers live across the scope boundaries, which results in
unnecessary PHI nodes in this particular case.  That's of course
because "scopes" no longer exist at this point in the IL; they are at
most approximated by CLOBBERs we put at the end of scopes, but we
don't do that for everything.

For

void foo ()
{
lab:
{
  int i;
  i += 1;
}
  goto lab;
}

we get

   :
  # i_1 = PHI 
lab:
  i_3 = i_1 + 1;
  // predicted unlikely by goto predictor.
  goto ; [INV]

but

   :
lab:
  i_3 = i_2(D) + 1;
  // predicted unlikely by goto predictor.
  goto ; [INV]

would have been correct as well.

Richard.

> >  But
> > with -ftrivial-auto-var-init it
> > becomes a correctness issue.  I think the idea avoiding the USE of the
> > variable in .DEFERRED_INIT
> > and instead passing the init size is a good one and should avoid this
> > case (hopefully).
>
>
> Agreed, I have made such change yesterday, and this work around the current 
> issue.
>
> temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
>
> To:
>
> temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
>
> Thanks.
>
> Qing
> >
> > Richard.
> >
> >> Let me know if I still miss anything
> >>
> >> Qing
> >>>
> >>> You might be able to construct a testcase which has a use before the real 
> >>> init where then the optimistic CCP propagation will defeat the 
> >>> DEFERED_INIT otherwise.
> >>>
> >>> I'd need to play with the actual patch to find a good solution to this 
> >>> problem.
> >>>
> >>> Richard.
> >>>
> >>
>


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Qing Zhao via Gcc-patches


> On Jul 1, 2021, at 1:48 AM, Richard Biener  wrote:
> 
> On Wed, Jun 30, 2021 at 9:15 PM Qing Zhao via Gcc-patches
>  wrote:
>> 
>> 
>> 
>>> On Jun 30, 2021, at 1:59 PM, Richard Biener  wrote:
>>> 
>>> On June 30, 2021 8:07:43 PM GMT+02:00, Qing Zhao  
>>> wrote:
 
 
> On Jun 30, 2021, at 12:36 PM, Richard Biener 
 wrote:
> 
> On June 30, 2021 7:20:18 PM GMT+02:00, Andrew Pinski
  wrote:
>> On Wed, Jun 30, 2021 at 8:47 AM Qing Zhao via Gcc-patches
>>  wrote:
>>> 
>>> I came up with a very simple testing case that can repeat the same
>> issue:
>>> 
>>> [qinzhao@localhost gcc]$ cat t.c
>>> extern void bar (int);
>>> void foo (int a)
>>> {
>>> int i;
>>> for (i = 0; i < a; i++) {
>>>  if (__extension__({int size2;
>>>  size2 = 4;
>>>  size2 > 5;}))
>>>  bar (a);
>>> }
>>> }
>> 
>> You should show the full dump,
>> What we have is the following:
>> 
>> 
>> 
>> size2_3 = PHI 
>>  :
>> 
>> size2_12 = .DEFERRED_INIT (size2_3, 2);
>> size2_13 = 4;
>> 
>> So CCP decides to propagate 4 into the PHI and then decides
 size2_1(D)
>> is undefined so size2_3 is then considered 4 and propagates it into
>> the .DEFERRED_INIT.
> 
> Which means the DEFERED_INIT is inserted at the wrong place.
 
 Then, where is the correct place for “.DEFERRED_INIT(size2,2)?
 
 The variable “size2” is a block scope variable which is declared inside
 the “if” condition:
>>> 
>>> But that's obviously not how it behaves
>>> During into SSA phase since we're inserting a PHI for it - and we're 
>>> inserting it because of the use in the DEFERED_INIT call. I suppose you 
>>> need to fiddle with the SSA rewrite and avoid treating the use as a use but 
>>> only for the purpose of inserting PHIs...
>> 
>> Please see my other email on the new small testing case without 
>> -ftrivial-auto-var-init.  The same issue in SSA with that testing case even 
>> without -ftrivial-auto-var-init.
>> It looks like an existing bug to me in SSA.
> 
> It's not a bug in SSA, it's at most a missed optimization there.

I still think that SSA cannot handle block-scoped variables correctly in this
case; it should not move the variable outside of the block scope. -:)

>  But
> with -ftrivial-auto-var-init it
> becomes a correctness issue.  I think the idea avoiding the USE of the
> variable in .DEFERRED_INIT
> and instead passing the init size is a good one and should avoid this
> case (hopefully).


Agreed, I made such a change yesterday, and it works around the current
issue.

temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)

To:

temp = .DEFERRED_INIT (SIZE_of_temp, init_type)

Thanks.

Qing
> 
> Richard.
> 
>> Let me know if I still miss anything
>> 
>> Qing
>>> 
>>> You might be able to construct a testcase which has a use before the real 
>>> init where then the optimistic CCP propagation will defeat the DEFERED_INIT 
>>> otherwise.
>>> 
>>> I'd need to play with the actual patch to find a good solution to this 
>>> problem.
>>> 
>>> Richard.
>>> 
>> 



Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Eli Zaretskii via Gcc-patches
> Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> From: Martin Liška 
> Date: Thu, 1 Jul 2021 14:44:10 +0200
> 
> > It helps some, but not all of the issues disappear.  For example,
> > stuff like this is still hard to read:
> > 
> >To select this standard in GCC, use one of the options -ansi
> >   -
> >-std.‘=c90’ or -std.‘=iso9899:1990’
> >   
> 
> If I understand the notes correctly, the '.' should also be hidden by e.g. 
> Emacs.

No, it doesn't.  The actual text in the Info file is:

   *note -std: f.‘=iso9899:1990’

and the period after " f" isn't hidden.  Where does that "f" come from
and what is its purpose here?  Can it be removed (together with the
period)?

> About ‘=iso9899:1990’, yes, it's a :samp: and that's how it's wrapped by Sphinx by 
> default.

Why is it a separate :samp:?  IMO, the correct markup is to make the
entire string -std=iso9899:1990 have the same typeface.  In Texinfo,
I'd use

   @option{-std=iso9899:1990}

> >> We can adjust 'emph' formatting to nil, what do you think?
> > 
> > Something like that, yes.  But the problem is: how will you format it
> > instead?  The known alternatives, _foo_ and *foo* both use punctuation
> > characters, which will get in the way similarly to the quotes.  Can
> > you format those in caps, like makeinfo does?
> 
> You are fully right, info is a very simple format and it uses wrapping
> characters for formatting (by default * and _). So, I don't have any
> elegant solution.

Well, it sounds from another mail that you did find a solution: to
up-case the string in @var.

> >>> Note also that nodes are now called by the same name as the section,
> >>> which means node names generally got much longer.  Is that really a
> >>> good idea?
> >>
> >> Well, I intentionally removed these and used simple TOC tree links
> >> which take display text for a section title.
> > 
> > I would suggest to discuss these decisions first, perhaps on the
> > Texinfo mailing list?  I'm accustomed to these short descriptions, but
> > I'm not sure how important they are for others.
> 
> Well, it was a decision made during the transition of texinfo files into Sphinx.
> I don't see why it should be discussed in the Texinfo community.

This actually raises a more general issue with this Sphinx porting
initiative: what will be the canonical style guide for maintaining the
GCC manual in Sphinx, or more generally for writing GNU manuals in
Sphinx?  For Texinfo, we have the Texinfo manual, which both documents
the language and provides style guidelines for how to use Texinfo for
producing good manuals.  Contributors to GNU manuals have been using those
guidelines for many years.  Is there, or will there be, an equivalent
style guide for Sphinx?  If not, how will future contributors to
the GCC manuals know what the writing and style conventions are?

That is why I recommended to discuss this on the Texinfo list: that's
the place where such guidelines are discussed, and where we have
experts who understand the effects and consequences of using this or
that style.  The current style in GNU manuals is to have the menus as
we see them in the existing GCC manuals: with a short description.
Maybe there are good reasons to deviate from that style, but
shouldn't this be at least presented and discussed, before the
decision is made?  GCC developers are not the only ones who will be
reading the future GCC manuals.


[RS6000] Adjust testcases for power10 instructions

2021-07-01 Thread Alan Modra via Gcc-patches
Bootstrapped and regression tested powerpc64le-linux power9 and
power10.  OK for mainline?

* lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.
* gcc.dg/pr56727-2.c,
gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c,
gcc.target/powerpc/fold-vec-load-vec_xl-char.c,
gcc.target/powerpc/fold-vec-load-vec_xl-double.c,
gcc.target/powerpc/fold-vec-load-vec_xl-float.c,
gcc.target/powerpc/fold-vec-load-vec_xl-int.c,
gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c,
gcc.target/powerpc/fold-vec-load-vec_xl-short.c,
gcc.target/powerpc/fold-vec-splat-floatdouble.c,
gcc.target/powerpc/fold-vec-splat-longlong.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c,
gcc.target/powerpc/fold-vec-store-vec_xst-char.c,
gcc.target/powerpc/fold-vec-store-vec_xst-double.c,
gcc.target/powerpc/fold-vec-store-vec_xst-float.c,
gcc.target/powerpc/fold-vec-store-vec_xst-int.c,
gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c,
gcc.target/powerpc/fold-vec-store-vec_xst-short.c,
gcc.target/powerpc/lvsl-lvsr.c,
gcc.target/powerpc/ppc-eq0-1.c,
gcc.target/powerpc/ppc-ne0-1.c,
gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.
* gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index c54369ed25e..f055116772a 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -18,4 +18,4 @@ void h ()
 
 /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */
 /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && ilp32 } } } } */
-/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && lp64 } } } } */
+/* { dg-final { scan-assembler {bl f(\n\s*nop|@notoc\n)} { target { powerpc*-*-linux* && lp64 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
index 246f38fa6d1..1cff4550f28 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
@@ -25,6 +25,6 @@ main1 (void)
with no word loads (lw, lwu, lwz, lwzu, or their indexed forms)
or word stores (stw, stwu, stwx, stwux, or their indexed forms).  */
 
-/* { dg-final { scan-assembler "\t(lvx|lxv|lvsr|stxv)" } } */
+/* { dg-final { scan-assembler "\t(lvx|p?lxv|lvsr|p?stxv)" } } */
 /* { dg-final { scan-assembler-not "\tlwz?u?x? " { xfail { powerpc-ibm-aix* } } } } */
 /* { dg-final { scan-assembler-not "\tstwu?x? " } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
index 9b199c219bf..104710700c8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
@@ -36,4 +36,4 @@ BUILD_VAR_TEST( test10, vector unsigned char, signed long long, vector unsigned
 BUILD_VAR_TEST( test11, vector unsigned char, signed int, vector unsigned char);
 BUILD_CST_TEST( test12, vector unsigned char, 8, vector unsigned char);
 
-/* { dg-final { scan-assembler-times {\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mlvx\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mlvx\M|\mplxv\M} 12 } } */

Re: [PATCH] Optimize macro: make it more predictable

2021-07-01 Thread Martin Liška

On 10/23/20 1:47 PM, Martin Liška wrote:

Hey.


Hello.

I deferred the patch for GCC 12. Since then, I have worked on the options code
and feel more familiar with the option handling. So ...



This is a follow-up of the discussion that happened in thread about 
no_stack_protector
attribute: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545916.html

The current optimize attribute works in the following way:
- 1) we take current global_options as base
- 2) maybe_default_options is called for the currently selected optimization 
level, which
  means all rules in default_options_table are executed
- 3) attribute values are applied (via decode_options)

So step 2) is problematic: -O2 -fno-omit-frame-pointer combined with 
__attribute__((optimize("-fno-stack-protector")))
basically ends up as -O2 -fno-stack-protector, because -fomit-frame-pointer is 
enabled by default at -O1 and above:
 /* -O1 and -Og optimizations.  */
 { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
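
The ordering problem in step 2) can be modelled with a toy example. This is only a sketch: `struct toy_options` and the `toy_*` helpers are hypothetical names invented for illustration, not GCC's real option machinery, and the only default modelled is the single table entry quoted above.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified option state (not GCC's gcc_options). */
struct toy_options {
  int opt_level;            /* -O<n> */
  bool omit_frame_pointer;  /* -f[no-]omit-frame-pointer */
  bool stack_protector;     /* -f[no-]stack-protector */
};

/* Step 2: re-apply the per-level defaults.  Only the quoted entry
   { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 } is modelled.  */
static void
toy_apply_level_defaults (struct toy_options *o)
{
  if (o->opt_level >= 1)
    o->omit_frame_pointer = true;
}

/* Step 3: apply the attribute value; here always -fno-stack-protector.  */
static void
toy_apply_attribute (struct toy_options *o)
{
  o->stack_protector = false;
}

/* The behaviour described as problematic: steps 1), 2) and 3) in order.  */
struct toy_options
toy_attr_with_defaults (struct toy_options cmdline)
{
  struct toy_options o = cmdline;  /* step 1: start from global options */
  toy_apply_level_defaults (&o);   /* step 2: clobbers explicit -fno-... */
  toy_apply_attribute (&o);        /* step 3 */
  return o;
}

/* The behaviour the patch aims for: the attribute is simply appended
   to the command line, without re-running the level defaults.  */
struct toy_options
toy_attr_appended (struct toy_options cmdline)
{
  struct toy_options o = cmdline;
  toy_apply_attribute (&o);
  return o;
}
```

For a command line modelled as { 2, false, true } (-O2 -fno-omit-frame-pointer with the stack protector on), toy_attr_with_defaults turns omit_frame_pointer back on, while toy_attr_appended keeps it off and only disables the stack protector.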

My patch handles that, and the optimize attribute now really behaves the same as 
appending the attribute value
to the command line. So far so good. We should also reflect that in the 
documentation entry, which is quite
vague right now:


^^^ all these are still valid arguments, plus I'm adding a new test-case that 
tests that.



"""
The optimize attribute is used to specify that a function is to be compiled 
with different optimization options than specified on the command line.
"""


I addressed that with documentation changes, which should make it clearer to users. 
Moreover, I noticed that we declare the 'optimize' attribute
as something not for production use:

"The optimize attribute should be used for debugging purposes only. It is not 
suitable in production code."

Are we sure about the statement? I know that e.g. glibc uses that.



and we may want to handle -Ox in the attribute in a special way. I guess many 
macro/pragma users expect that

-O2 -ftree-vectorize and __attribute__((optimize(1))) will end with -O1 and not
with -ftree-vectorize -O1 ?


The situation with 'target' attribute is different. When parsing the attribute, 
we intentionally drop all existing target flags:

$ cat -n gcc/config/i386/i386-options.c
...
  1245        if (opt == IX86_FUNCTION_SPECIFIC_ARCH)
  1246          {
  1247            /* If arch= is set,  clear all bits in x_ix86_isa_flags,
  1248               except for ISA_64BIT, ABI_64, ABI_X32, and CODE16
  1249               and all bits in x_ix86_isa_flags2.  */
  1250            opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
  1251                                       | OPTION_MASK_ABI_64
  1252                                       | OPTION_MASK_ABI_X32
  1253                                       | OPTION_MASK_CODE16);
  1254            opts->x_ix86_isa_flags_explicit &= (OPTION_MASK_ISA_64BIT
  1255                                                | OPTION_MASK_ABI_64
  1256                                                | OPTION_MASK_ABI_X32
  1257                                                | OPTION_MASK_CODE16);
  1258            opts->x_ix86_isa_flags2 = 0;
  1259            opts->x_ix86_isa_flags2_explicit = 0;
  1260          }
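
The effect of the masking in the listing above can be shown with a small stand-alone sketch. The `TOY_MASK_*` bit positions and the `toy_clear_isa_for_arch` helper are assumptions made up for this illustration; the real OPTION_MASK_* values are generated during the GCC build and differ from these.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define TOY_MASK_ISA_64BIT (UINT64_C (1) << 0)
#define TOY_MASK_ABI_64    (UINT64_C (1) << 1)
#define TOY_MASK_ABI_X32   (UINT64_C (1) << 2)
#define TOY_MASK_CODE16    (UINT64_C (1) << 3)
#define TOY_MASK_ISA_EXT   (UINT64_C (1) << 4)  /* stand-in for any ISA extension */

/* Keep only the ABI/mode bits and drop every other ISA bit, as the
   "arch=" handling does for x_ix86_isa_flags.  */
uint64_t
toy_clear_isa_for_arch (uint64_t isa_flags)
{
  return isa_flags & (TOY_MASK_ISA_64BIT
                      | TOY_MASK_ABI_64
                      | TOY_MASK_ABI_X32
                      | TOY_MASK_CODE16);
}
```

Calling toy_clear_isa_for_arch (TOY_MASK_ABI_64 | TOY_MASK_ISA_EXT) yields just TOY_MASK_ABI_64: an extension enabled earlier (for example on the command line) does not survive the "arch=" attribute in this model.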

That seems logical because target attribute is used for e.g. ifunc 
multi-versioning and one needs
to be sure all existing ISA flags are dropped. However, I noticed clang behaves 
differently:

$ cat hreset.c
#pragma GCC target "arch=geode"
#include <immintrin.h>
void foo(unsigned int eax)
{
  _hreset (eax);
}

$ clang hreset.c -mhreset  -c -O2 -m32
$ gcc hreset.c -mhreset  -c -O2 -m32
In file included from /home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/12.0.0/include/x86gprintrin.h:97,
                 from /home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/12.0.0/include/immintrin.h:27,
                 from hreset.c:2:
hreset.c: In function ‘foo’:
/home/marxin/bin/gcc/lib64/gcc/x86_64-pc-linux-gnu/12.0.0/include/hresetintrin.h:39:1: error: inlining failed in call to ‘always_inline’ ‘_hreset’: target specific option mismatch
   39 | _hreset (unsigned int __EAX)
      | ^~~
hreset.c:5:3: note: called from here
    5 |   _hreset (eax);
      |   ^

Anyway, I think the current target attribute handling should be preserved.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin



I'm also planning to take a look at the target macro/attribute, I expect 
similar problems:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97469

Thoughts?
Thanks,
Martin


From eaa1892cbe32c6fe73de7708aa17be2d3917bceb Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 2 Jun 2021 08:44:37 +0200
Subject: [PATCH] Append target/optimize attr to the current cmdline.

gcc/c-family/ChangeLog:

	* c-common.c (parse_optimize_options): Combine optimize
	options with what was provided on the command line.


Re: [PATCH] Change the type of predicates to bool.

2021-07-01 Thread Richard Biener via Gcc-patches
On Thu, Jul 1, 2021 at 3:07 PM Uros Bizjak  wrote:
>
> On Wed, Jun 30, 2021 at 12:50 PM Richard Biener
>  wrote:
> >
> > On Wed, Jun 30, 2021 at 10:47 AM Uros Bizjak via Gcc-patches
> >  wrote:
> > >
> > > This RFC patch changes the type of predicates to bool. However, some
> > > of the targets (e.g. x86) use indirect functions to call the
> > > predicates, so without the local change, the build fails. Putting the
> > > patch through CI bots should weed out the problems, but I have no
> > > infrastructure to do it myself.
> >
> > I'd say thanks for the work - note building some cc1 crosses should
> > catch 99% of the fallout (just configure $target-linux/elf and make all-gcc)
>
> Thanks for the hint, I have tested the patch on arm-eabi, {x86_64,
> i386, aarch64, mips, m68k, h8300}-elf and {ppc64le, hppa, s390, ia64,
> riscv, sh, sparc}-linux. The fallout, fixed by the attached v1 patch,
> was surprisingly small, so I hope there remain no (otherwise easily
> fixable) build errors.

OK.

Thanks,
Richard.

> 2021-07-01  Uroš Bizjak  
>
> gcc/
> * genpreds.c (write_predicate_subfunction):
> Change the type of written subfunction to bool.
> (write_one_predicate_function):
> Change the type of written function to bool.
> (write_tm_preds_h): Ditto.
> * recog.h (*insn_operand_predicate_fn): Change the type to bool.
> * recog.c (general_operand): Change the type to bool.
> (address_operand): Ditto.
> (register_operand): Ditto.
> (pmode_register_operand): Ditto.
> (scratch_operand): Ditto.
> (immediate_operand): Ditto.
> (const_int_operand): Ditto.
> (const_scalar_int_operand): Ditto.
> (const_double_operand): Ditto.
> (nonimmediate_operand): Ditto.
> (nonmemory_operand): Ditto.
> (push_operand): Ditto.
> (pop_operand): Ditto.
> (memory_operand): Ditto.
> (indirect_operand): Ditto.
> (ordered_comparison_operator): Ditto.
> (comparison_operator): Ditto.
>
> * config/i386/i386-expand.c (ix86_expand_sse_cmp):
> Change the type of indirect predicate function to bool.
>
> * config/rs6000/rs6000.c (easy_vector_constant):
> Change the type to bool.
>
> * config/mips/mips-protos.h (m16_based_address_p):
> Change the type of operand 3 to bool.
>
> OK for the trunk?
>
> Uros.


[PATCH] Change the type of predicates to bool.

2021-07-01 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 30, 2021 at 12:50 PM Richard Biener
 wrote:
>
> On Wed, Jun 30, 2021 at 10:47 AM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > This RFC patch changes the type of predicates to bool. However, some
> > of the targets (e.g. x86) use indirect functions to call the
> > predicates, so without the local change, the build fails. Putting the
> > patch through CI bots should weed out the problems, but I have no
> > infrastructure to do it myself.
>
> I'd say thanks for the work - note building some cc1 crosses should
> catch 99% of the fallout (just configure $target-linux/elf and make all-gcc)

Thanks for the hint, I have tested the patch on arm-eabi, {x86_64,
i386, aarch64, mips, m68k, h8300}-elf and {ppc64le, hppa, s390, ia64,
riscv, sh, sparc}-linux. The fallout, fixed by the attached v1 patch,
was surprisingly small, so I hope there remain no (otherwise easily
fixable) build errors.
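
As a rough illustration of why callers that go through a function pointer need the local change: in C, `int (*)(...)` and `bool (*)(...)` are incompatible pointer types, so a pointer declared with the old return type no longer matches the predicates. The sketch below uses hypothetical stand-ins (`toy_rtx`, `toy_mode`, `toy_*_operand`) rather than GCC's real rtx and machine_mode types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for rtx and machine_mode.  */
typedef struct toy_rtx { int is_vector; int is_memory; } toy_rtx;
typedef int toy_mode;

/* Predicates now return bool instead of int.  */
bool
toy_vector_operand (const toy_rtx *op, toy_mode mode)
{
  (void) mode;
  return op->is_vector != 0;
}

bool
toy_nonimmediate_operand (const toy_rtx *op, toy_mode mode)
{
  (void) mode;
  return op->is_vector != 0 || op->is_memory != 0;
}

/* The pointer type must follow the predicates' new return type.  */
typedef bool (*toy_predicate_fn) (const toy_rtx *, toy_mode);

/* Select a predicate indirectly, similar in spirit to the
   ix86_expand_sse_cmp hunk in the patch.  */
toy_predicate_fn
toy_pick_predicate (bool vector_mode)
{
  return vector_mode ? toy_vector_operand : toy_nonimmediate_operand;
}
```

Had `toy_predicate_fn` kept an `int` return type, the conditional expression in toy_pick_predicate would be diagnosed as an incompatible pointer conversion, which is the kind of fallout the cross builds are meant to catch.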

2021-07-01  Uroš Bizjak  

gcc/
* genpreds.c (write_predicate_subfunction):
Change the type of written subfunction to bool.
(write_one_predicate_function):
Change the type of written function to bool.
(write_tm_preds_h): Ditto.
* recog.h (*insn_operand_predicate_fn): Change the type to bool.
* recog.c (general_operand): Change the type to bool.
(address_operand): Ditto.
(register_operand): Ditto.
(pmode_register_operand): Ditto.
(scratch_operand): Ditto.
(immediate_operand): Ditto.
(const_int_operand): Ditto.
(const_scalar_int_operand): Ditto.
(const_double_operand): Ditto.
(nonimmediate_operand): Ditto.
(nonmemory_operand): Ditto.
(push_operand): Ditto.
(pop_operand): Ditto.
(memory_operand): Ditto.
(indirect_operand): Ditto.
(ordered_comparison_operator): Ditto.
(comparison_operator): Ditto.

* config/i386/i386-expand.c (ix86_expand_sse_cmp):
Change the type of indirect predicate function to bool.

* config/rs6000/rs6000.c (easy_vector_constant):
Change the type to bool.

* config/mips/mips-protos.h (m16_based_address_p):
Change the type of operand 3 to bool.

OK for the trunk?

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e9763eb5b3e..76d6afd6d9d 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3571,7 +3571,7 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
 
   cmp_op0 = force_reg (cmp_ops_mode, cmp_op0);
 
-  int (*op1_predicate)(rtx, machine_mode)
+  bool (*op1_predicate)(rtx, machine_mode)
 = VECTOR_MODE_P (cmp_ops_mode) ? vector_operand : nonimmediate_operand;
 
   if (!op1_predicate (cmp_op1, cmp_ops_mode))
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 2cf4ed50292..51b82b1458d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -366,7 +366,7 @@ extern bool umips_12bit_offset_address_p (rtx, machine_mode);
 extern bool mips_9bit_offset_address_p (rtx, machine_mode);
 extern bool lwsp_swsp_address_p (rtx, machine_mode);
 extern bool m16_based_address_p (rtx, machine_mode,
-int (*)(rtx_def*, machine_mode)); 
+bool (*)(rtx_def*, machine_mode));
 extern rtx mips_expand_thread_pointer (rtx);
 extern void mips16_expand_get_fcsr (rtx);
 extern void mips16_expand_set_fcsr (rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 075c156ae13..f3e5f95b8d4 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1146,7 +1146,7 @@ static bool set_to_load_agen (rtx_insn *,rtx_insn *);
 static bool insn_terminates_group_p (rtx_insn *, enum group_termination);
 static bool insn_must_be_first_in_group (rtx_insn *);
 static bool insn_must_be_last_in_group (rtx_insn *);
-int easy_vector_constant (rtx, machine_mode);
+bool easy_vector_constant (rtx, machine_mode);
 static rtx rs6000_debug_legitimize_address (rtx, rtx, machine_mode);
 static rtx rs6000_legitimize_tls_address (rtx, enum tls_model);
 #if TARGET_MACHO
diff --git a/gcc/genpreds.c b/gcc/genpreds.c
index 63fac0c7d34..9d9715f3d2f 100644
--- a/gcc/genpreds.c
+++ b/gcc/genpreds.c
@@ -110,7 +110,7 @@ process_define_predicate (md_rtx_info *info)
 
becomes
 
-   static inline int basereg_operand_1(rtx op, machine_mode mode)
+   static inline bool basereg_operand_1(rtx op, machine_mode mode)
{
  if (GET_CODE (op) == SUBREG)
op = SUBREG_REG (op);
@@ -151,7 +151,7 @@ write_predicate_subfunction (struct pred_data *p)
 
   p->exp = and_exp;
 
-  printf ("static inline int\n"
+  printf ("static inline bool\n"
           "%s_1 (rtx op ATTRIBUTE_UNUSED, machine_mode mode ATTRIBUTE_UNUSED)\n",
  p->name);
   rtx_reader_ptr->print_md_ptr_loc (p->c_block);
@@ -651,7 +651,7 @@ write_one_predicate_function (struct pred_data *p)
 
   /* A normal predicate can legitimately not look at machine_mode
  if it accepts only CONST_INTs and/or 

Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-01 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > The main issue is complex _Float16 functions in libgcc.  If _Float16 doesn't
> > require -mavx512fp16, we need to compile complex _Float16 functions in
> > libgcc without -mavx512fp16.  Complex _Float16 performance is very
> > important for our _Float16 usage.   _Float16 performance has to be
> > very fast.  There should be no emulation anywhere when -mavx512fp16
> > is used.   That is why _Float16 is available only with -mavx512fp16.
> 
> It should be possible to emulate scalar _Float16 using _Float32 with a
> reasonable performance trade-off.  I think users caring for _Float16
> performance will use vector intrinsics anyway since for scalar code
> _Float32 code will likely perform the same (at double storage cost)

Only if it is allowed to have excess precision for _Float16.  If not, then
one would need to (expensively?) round after every operation at least.
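
To make the difference concrete, here is a small sketch of the same effect one level up, using float emulated in double as a stand-in for _Float16 emulated in _Float32 (an analogy chosen so the code builds on any C compiler; the function names are made up for illustration). Rounding after every operation and keeping excess precision until the end give different results:

```c
/* Emulation without excess precision: round to the narrow type
   after every single operation.  */
float
sum_rounded_each_step (float a, float b, float c)
{
  float t = (float) ((double) a + (double) b);  /* round after 1st add */
  return (float) ((double) t + (double) c);     /* round after 2nd add */
}

/* Emulation with excess precision: keep the wide intermediate and
   round only once at the end.  */
float
sum_excess_precision (float a, float b, float c)
{
  double t = (double) a + (double) b;  /* intermediate stays in double */
  return (float) (t + (double) c);     /* single final rounding */
}
```

With a = 16777216.0f (2^24) and b = c = 1.0f, the first variant returns 16777216.0f because each intermediate sum lands halfway between two floats and rounds back to even, while the second returns 16777218.0f. The extra conversions in the first variant are the per-operation rounding cost mentioned above.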

Jakub


